Datasets and documents
Inquiry within the BigData@Heart consortium resulted in a total of 56 participating datasets relating to 28 PIs of 13 different institutes, organizations or companies. PIs were identified and contacted for retrieval of documents. Of the 28 PIs we contacted by e-mail and/or telephone, 20 (71%) responded, covering 31 of the 56 datasets (55%). Eight datasets were removed from the list because they did not contribute patient-level data to BigData@Heart.
For 24 of the 48 remaining datasets (50%) we were sent some form of documentation or reference to conditions for data sharing. For 17 of these 24 datasets actual documentation was provided. For the other 7 datasets a statement was received by e-mail with a reference to policy or legislation. Our team received 60 documents consisting of: 26 informed consent forms (often including patient information), 15 patient information forms, 8 data transfer and data access agreements (DTAs/DAAs), 4 study protocols, 3 blank case report forms (CRFs) of clinical trials, 2 policy statements and 2 questionnaires for participants. After initial screening, blank CRFs and questionnaires were excluded because they did not refer to any conditions for data sharing, leaving 55 documents for further analysis.
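The counts reported above can be cross-checked with simple arithmetic. The following is a minimal Python sketch for that purpose only; it is not part of the original analysis, and the variable names and the rounding helper `pct` are illustrative assumptions.

```python
# Cross-check of the counts reported in the text (illustrative only).
datasets_total = 56
datasets_removed = 8          # did not contribute patient-level data
datasets_eligible = datasets_total - datasets_removed

# Documents received, by type, as listed in the text.
documents = {
    "informed consent forms": 26,
    "patient information forms": 15,
    "DTAs/DAAs": 8,
    "study protocols": 4,
    "blank CRFs": 3,
    "policy statements": 2,
    "participant questionnaires": 2,
}
# Document types excluded after initial screening.
excluded = {"blank CRFs", "participant questionnaires"}

documents_received = sum(documents.values())
documents_analysed = sum(
    n for kind, n in documents.items() if kind not in excluded
)


def pct(part, whole):
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


print("eligible datasets:", datasets_eligible)
print("documents received:", documents_received)
print("documents analysed:", documents_analysed)
print("PI response rate (%):", pct(20, 28))
print("datasets covered by responders (%):", pct(31, datasets_total))
print("datasets with documentation (%):", pct(24, datasets_eligible))
```

Under these assumptions, the totals agree with the figures given in the text: 48 datasets after removal, 60 documents received and 55 retained for analysis.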
Analysis of prespecified conditions for data sharing
We extracted information from the 55 identified documents along the lines of the following five key elements: statements on data sharing, purpose limitation, level of de-identification, terms of issuance and reference to ‘policy otherwise’. An overview of the conditions for data sharing as stated in the ethico-legal documentation received for 17 datasets can be found in Additional file 1: Table S1.
The documentation on most datasets included a statement about data sharing. Statements about data sharing were more common among non-experimental, non-commercial studies (cohorts) than among clinical trials performed by industry. For none of the clinical trials was an explicit mention of data sharing for future health research found. In most documents data sharing was described as the “conditional” sharing of the data collected during the study with “third parties”. Conditions and third parties were not further specified in a number of documents.
Purpose limitation
For use of stored data, most datasets restrict the permitted use to “scientific research”. For such studies, some patient information forms mention that it “might be necessary to work with commercial companies”. One document states that data will never be sold to commercial companies. Another declares that the database is established by a non-profit organization, but that the results from collaboration with a commercial company may become property of that company and may be exploited for commercial purposes. Patients are reminded that they have no claim to property rights in such cases. Most documents pertaining to observational studies/registries restrict use of stored data to scientific research within the scope of the primary research activities only, either by limiting use to a disease area (e.g., cardiovascular disease) or by requiring “relevance” to the dataset or original study itself. For most datasets, it appears that use of stored data is limited to the questions specified in a pre-determined research plan that is submitted for approval by the primary study team. Some industry-sponsored trials state that property rights may be shared or transferred to another sponsor/owner (without specifying use).
Level of de-identification
Many documents explicitly mention “coded data” as a condition for data sharing. The key that gives access to the directly identifiable personal data is described as remaining with 1) only the research team, 2) only one researcher from the team, or 3) only the treating clinician. Coded data is mostly described in informed consent and patient information forms as “you will only be identified by a number” or “you will be given a special code that identifies you”. According to different informed consent documents, use of “coded data” may indicate “use without revealing your identity”, that “your identification will be removed”, “that it is unlikely that anyone will be able to identify you”, that “you cannot be recognized by it (removal of full name and address)” or that “your data is anonymous”.
Industry-sponsored trials sometimes state that: “all personal data that leaves your doctor’s site will be anonymized/in anonymous form”. In documents where data sharing with third parties is explicitly addressed, various (descriptions of) levels of de-identification are mentioned. For example, some state that “your data will be shared in such a way that the data cannot be traced back to you”, that “data is issued with unique pseudonyms as patient identifiers” or that “requested data have been made (fully) anonymous”. In some cases, where “full anonymization” is not considered possible, data is “de-identified to the fullest extent possible to ensure data is unidentifiable”. Only two clinical trials explicitly mention which personal identifiers will be removed in order to de-identify the data. Some datasets only allow sharing of aggregate data, not individualized data. One dataset mentions that, in some instances, sharing of personal data is unavoidable, and that a separate processing agreement will need to be signed.
Terms of issuance
For use of stored data, most datasets require interested users to submit a formal request to the original study team. In many cases, data transfer agreements (DTAs) are used to bind users to terms and conditions. General templates are used as well as specific DTAs issued per project. All DTAs mention the required level of de-identification and state that users are not permitted to re-identify patients or share the data with persons other than those directly working on the specific research project for which approval was granted. In a number of DTAs, use of the data for purposes other than the research objectives outlined in the application is prohibited. Many research teams also specify conditions with respect to publication of the results generated from the data. According to a few DTAs, users have to agree to someone from the original team being involved in the study, for example, in the analyses or as an author of the publication. DTAs often include responsibilities with respect to data security. Sometimes this responsibility is placed on the user, sometimes on the provider and sometimes on both. Only one dataset requires express written informed consent from the data subjects for secondary use.
Compliance with ‘policy otherwise’
Most DTAs include a paragraph that refers to compliance with what we will call ‘policy otherwise’. Here, it is relevant to note that the GDPR only became applicable on 25 May 2018, meaning that the reviewed documents are unlikely to refer to this regulation. The vast majority of transfer agreements do make reference to national legislation. A general statement often encountered in DTAs is that data will be used in accordance with “all the applicable local laws, regulations, statutes and guidelines which are applicable to the recipient’s use of the data”. In informed consent forms, reference is made more generally to legislation (“data is kept confidential within the limits of the law”). For international clinical trials, sponsors often mention in patient information forms that participants “should be aware that some countries may not offer the same level of privacy protection as [they] are used to in the country where [they] live or where this study is conducted”. Two datasets refer to their own institute-specific/study-specific privacy regulations.