Data Sharing Under the General Data Protection Regulation

Supplemental Digital Content is available in the text.

The General Data Protection Regulation (GDPR) became binding law in all European Union (EU) Member States in May 2018. 1,2 As a Regulation, it is, in principle, directly applicable in all EU Member States, superseding existing Member State laws. It thus represents a significant step toward harmonizing EU data protection laws. 3 The GDPR applies to the processing of personal data across a large range of contexts, including biomedical research and vast areas of translational and clinical research activities, although it was probably not drafted with all the latter purposes in mind. As such, it has instigated multiple discussions among researchers and, in several cases, has raised major concerns, particularly, but not exclusively, in the genomics community (reviewed by Townend, 3 Phillips, 4 and Hallinan 5 ).
With the present intense need for international sharing of scientific data and bio-samples, actively supported by major publishers and research funders, 6,7 GDPR requirements, particularly with respect to specific consent for data and bio-sample sharing, generate confusion and uncertainty and create conflicts between Regulation and Research Ethics. 8 Fearing legal and social sanctions, combined with the threat of huge penalties for violating the GDPR, scientists have become reluctant to exchange data and bio-samples for secondary research. 9 Increasingly, data and bio-sample use is restricted (eg, to specific hospitals, research institutions, or regions). In addition, the basic principle of scientific publishing that all relevant research data must be made accessible to ensure transparency and data reuse is frequently violated. Public health emergencies (such as the Ebola and Zika virus outbreaks and especially the ongoing coronavirus disease 2019 [COVID-19] pandemic) exemplify the need to facilitate swift and safe data sharing around the globe, without legal delays. 10,11 This article aims to review the main principles of the GDPR, in particular those pertinent to consent, as related to biomedical research; in this context, it discusses the practical problems related to data and bio-sample sharing that are currently encountered by researchers and perceived to be linked directly or indirectly to the GDPR; it also provides suggestions for interpretations and potential adaptations of the Regulation to better fit the practical realities of current biomedical research.

CONSENT AND PERSONAL DATA IN THE GDPR
Focusing on the most relevant issues, basic definitions and major aspects of the Regulation with direct impact on biomedical research are summarized in Table 1.
Processing of personal data concerning health is a priori prohibited (Article 9). Exemptions can be made based on consent, which, as shown (Table 1), involves a clear, affirmative action in the GDPR (Article 4.11). This action cannot simply be assumed based on failure to opt out (eg, by not refusing preticked boxes or default settings). In addition, the scope of consent covers data processing for one or more specific purposes (Article 6). "Specific" is not explicitly defined, but interpretations suggest that "the objectives of research, the principal investigator and the project's duration are specified". 12 Exceptions allowing research without consent exist, based on Member State law for "archiving purposes in the public interest, scientific or historical research purposes" (Articles 6, 9). Alternatively, there is the possibility of obtaining consent for purposes not fully specified in advance (what we would call "broad" consent), related to "certain areas of scientific research, when in keeping with recognized ethical standards" (Recital 33). Nevertheless, there is confusion about the extent to which these exceptions could be applicable. 13 Both directly identifiable and pseudonymized data used by researchers should be treated as personal data (Table 1), based on the presumed residual risk of research subject identification in the case of pseudonymization (summarized by Rumbold and Pierscionek 14 ). Anonymized data are not affected or regulated by the GDPR. To ensure anonymization, simply changing the identity label is insufficient. Multiple techniques exist for further data perturbation to avoid (re)identification based on the variables listed. 15
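The distinction between coding a record and anonymizing it can be illustrated with a minimal Python sketch. It assumes a keyed hash (HMAC) as the coding step, which is one common choice and is not prescribed by the GDPR; the field names, the key, and the `pseudonymize` helper are hypothetical.

```python
import hashlib
import hmac


def pseudonymize(record: dict, secret_key: bytes, id_field: str = "patient_id") -> dict:
    """Return a copy of the record with the direct identifier replaced by a keyed code.

    The secret key plays the role of the "additional information" that must be
    kept separately; whoever holds it can re-create the link to the person.
    """
    identifier = str(record[id_field]).encode("utf-8")
    code = hmac.new(secret_key, identifier, hashlib.sha256).hexdigest()[:16]
    coded = {field: value for field, value in record.items() if field != id_field}
    coded["pseudonym"] = code
    return coded


# Hypothetical key and record, for illustration only.
key = b"held-by-the-data-controller-only"
record = {"patient_id": "NL-000123", "birth_year": 1957, "postcode": "3511", "egfr": 48}

coded = pseudonymize(record, key)

# The identity label is gone, but birth year and postcode are untouched and
# could still be matched against another dataset. This is why coded data
# remain personal data and why relabeling alone is not anonymization.
remaining_quasi_identifiers = [f for f in ("birth_year", "postcode") if f in coded]
```

In this sketch the same identifier and key always give the same code, so follow-up data from one participant can still be linked, which is the practical appeal of pseudonymization for longitudinal studies.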

Table 1

Personal data: any information relating to an identified or identifiable natural person (data subject) (Article 4 [1] of the GDPR). In practical terms, the definition covers all types of data and information that have some connection to a specific, identifiable person. The connection can exist (1) on the basis of the content of the data or information itself (for example, in the case of biometric identifiers) or (2) on the basis of the possibility of combining data with other data sets, which would allow a connection to be made to a specific individual.
Pseudonymous data: data that have been produced via pseudonymization (Article 4 [5] of the GDPR). The latter means the processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject without the use of additional information, provided that such additional information is kept separate. Practically, pseudonymization involves coding of data. Pseudonymous data are still considered personal (Recital 26).
Anonymous data: data from which no connection to a specific identifiable person can be drawn, based on either the specific data alone or through linking to other datasets. Anonymous data fall outside the applicability of the GDPR (Recital 26).
COVID-19 indicates coronavirus disease 2019; EU, European Union; and GDPR, General Data Protection Regulation.

CURRENT STATUS IN BIOMEDICAL RESEARCH DATA AND BIO-SAMPLE SHARING
Sharing is an undisputed ethical obligation of any investigator (including those conducting patient-centered research) and is a prerequisite for scientific and clinical advancement. 16 To frame this basic condition, pertinent ethical norms exist, including study monitoring by independent ethics committees, aiming to protect human subjects while, at the same time, promoting research collaboration (for a historical perspective, see Phillips 4 ). The need for sharing in research becomes even more obvious and critical when studying rare diseases (such as rare cardiac diseases 17 ), which requires pooling of resources to allow meaningful research. In addition, and regardless of disease incidence rates, the increasing diversity and complexity of molecular profiles and pathophysiologic phenotypes underscore the need for adopting a systems integrative approach to define associations and causes and, ultimately, biology-driven biomarkers and therapeutic targets. As an example, meta-analysis of genome-wide association datasets enables the identification of cardiovascular or kidney disease risk factors or confounders that are too rare to be revealed at the individual study level. 18,19 Likewise, fueling artificial intelligence algorithms with sufficiently large, multi-level datasets (clinical, pathophysiological, imaging, molecular) offers the unique opportunity to understand disease in a holistic manner. 20,21 Along the same lines, data linkage can lead to better disease predictors, which was recently shown to be of particular value in stroke research. 22 Such large-scale holistic analyses can only be realized via data sharing and data reuse and are actively supported by major medical associations (eg, reflected in the establishment of standards for data collection by the American Heart Association/American College of Cardiology Task Force on Data Standards 21 ) and research funders, such as the EU Framework Program Horizon 2020, the National Institutes of Health, United States, and others. GDPR-triggered restrictions linked to the requirements for specific consent generate uncertainty and slow down or even prohibit research activities. Contributing to this problem are the exponential developments in molecular technologies, which allow high-resolution multidimensional analyses of a biological sample, making it hard, or even impossible in the case of high-resolution genetic information, to truly anonymize study participants. 13,23 Having to comply with an extensive (88 pages) legal framework increases the need for legal advice. As a result, collectively, valuable data and clinical samples may remain unused or significantly underused, decreasing the power of the analyses.
To receive guidance on this critical issue of data and bio-sample use and sharing, as linked to study participant consent requirements in the context of the GDPR, several relevant translational networks (

REFLECTIONS ON IMPLEMENTING GDPR IN SCIENTIFIC DATA AND SAMPLE SHARING
The main questions/problems currently distressing researchers and respective answers deduced from the Regulation and discussions with legal experts are summarized in Table 2.
The answers provided (Table 2) highlight some main impediments perceived to be directly or indirectly imposed by the GDPR on biomedical research data and bio-sample sharing: (1) Although the initial aim of the GDPR was to harmonize practices in the EU Member States, hence facilitating data sharing, a significant part of decision-making is still left to the Member States, increasing confusion and bureaucratic complexity. As a consequence, implementation of multinational collaborative projects relying on data and bio-sample sharing and mobility frequently faces de facto regulatory deadlocks, which, in our experience, at the very least slow down progress. Even more restrictions apply in the case of research involving partners outside the EU (including collaborations with non-EU European countries, Japan, or the United States), requiring a "lawful basis for making the transfer". 9 (2) Consent to data use, ensuring protection of participants, undoubtedly constitutes a pillar of clinical research ethics and the principal condition to obtain ethical approval. However, the stated need for specific consent raises doubts as to the legitimacy of the use of valuable stored data and bio-samples covered by pre-GDPR consent forms. Calling for national regulation to legitimate processing, when the specific consent is not fully available or cannot be readily obtained, opens Pandora's box in terms of the complexity of the legal landscape. The high complexity in relation to the scope of consent is illustrated by the problems that arise when existing data and bio-samples, collected with the purpose of identifying specific biomarkers, for instance, for cardiovascular disease, would be considered as controls in a study targeting the identification of specific chronic kidney disease biomarkers.
In the absence of an explicit specification of this latter prospect in the original consent referring to cardiovascular disease, the use of such data and samples in chronic kidney disease research may be, a priori, prohibited. A possibility may exist to allow reuse with a supplemental consent. If this is not possible, reuse may be possible if a relevant national law has been enacted. However, in this latter case, use of Member State law to legitimate processing may end up causing issues in cases of collaborations requiring cross-border data and bio-sample exchange involving multiple EU Member States, as national laws do not always align.
(3) A potential solution in the latter case could be the use of data and bio-samples in an anonymized manner, hence being exempt from GDPR and (re)consent requirements. However, considering the frequent availability of multiparametric, clinical, pathological, and molecular data characterizing a study participant, as mentioned above, it is reasonable to assume that true anonymization is only possible at the cost of significant loss of information (perturbation of data), especially in the context of -omics and Big Data analyses. 13,23 Furthermore, anonymization may be incompatible with many current research objectives (such as the study of disease progression in a longitudinal, individualized manner, as required to establish treatment in a personalized approach).
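The trade-off between anonymization and information content can be made concrete with a small Python sketch. It uses k-anonymity with generalization of quasi-identifiers purely as an illustration; the article does not name a specific method, and the records, field names, and helper functions below are hypothetical.

```python
from collections import Counter


def smallest_group(rows, quasi_identifiers):
    """Size of the smallest set of rows sharing the same quasi-identifier values.

    This is the k in k-anonymity: k = 1 means at least one participant is
    unique on these fields and therefore easier to single out.
    """
    counts = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(counts.values())


def generalize(row):
    """Coarsen age to a decade band and postcode to its first two digits."""
    coarse = dict(row)
    coarse["age"] = f"{row['age'] // 10 * 10}s"
    coarse["postcode"] = row["postcode"][:2] + "**"
    return coarse


# Hypothetical study records.
rows = [
    {"age": 57, "postcode": "3511", "diagnosis": "CKD"},
    {"age": 52, "postcode": "3512", "diagnosis": "CVD"},
    {"age": 58, "postcode": "3584", "diagnosis": "CKD"},
    {"age": 64, "postcode": "1011", "diagnosis": "CVD"},
    {"age": 61, "postcode": "1012", "diagnosis": "CKD"},
    {"age": 69, "postcode": "1098", "diagnosis": "CVD"},
]
quasi_identifiers = ["age", "postcode"]

k_before = smallest_group(rows, quasi_identifiers)
coarse_rows = [generalize(row) for row in rows]
k_after = smallest_group(coarse_rows, quasi_identifiers)
```

Here every participant is unique before generalization, while afterwards each shares their age band and postcode prefix with two others. The price is that exact age is no longer available for analysis, and with high-dimensional molecular data far coarser perturbation would be needed to reach a comparable group size.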

SOLUTIONS
Specification of the impediments forms the basis for a fruitful discussion toward resolution or improvement. Potential ways forward might include (summarized in Table 3) the following: (1) Similar to other areas in medicine, 25 public/general population education and engagement in research pipelines is critical. This would target a clear understanding of the definitions, benefits, and risks associated with broad and other emerging forms of consent, namely meta-consent (allowing the research participant to define what type of consent may be provided in the future, and when and how 26 ) or dynamic consent (through web platforms 27 ). In addition, it would explain residual risks for reidentification. There is an apparent, ever-increasing willingness of individuals to share their data for research advancement, reflected in the existence of large personal genome projects (such as the Harvard or Canadian, United Kingdom, Austrian, Korean, and other national personal genome projects; https://www.personalgenomes.org/). Interactions of researchers and patients should be intensified in the form of large patient-linked networks (for example, the European Kidney Health Alliance, involving Kidney Patients and Foundations, Nephrologists and Nurses, http://ekha.eu/; or the US National Patient-Centered Clinical Research Network [PCORnet]), consisting of health care centers as well as patient organizations (Patient-Powered Research Networks; https://pcornet.org/patientcentric/), and via regular local outreach initiatives led by academic health centers, with the aim of achieving patient-centered informed consent that anticipates future developments as much as possible.
(2) Addressing issues related to secondary use of data and the need for respective re-consent is the most challenging problem, as, for the time being, this seems to be in many cases subject to national law. In our opinion, this cannot be solved by an ad hoc administrative, legal, or policy change only; it must be the topic of a broad multidisciplinary discussion involving researchers, patient advocacy groups and the general public, ethics experts, regulators, and politicians, targeting clarity and harmonization that go beyond national boundaries as much as possible, in line with the international character of high-level research. An example in this direction is the design of the Final Rule, the latest update of the federal policy for the protection of human subjects in the United States, which emerged after being subject to commenting by the general public, various agencies, and expert stakeholders. 28 Our proposal is that the lead to open this discussion be taken by a Consortium of several acknowledged societies involved in the field, including patient advocacy groups; these should work closely with relevant authorities at the EU level (EU Directorate General [DG] for Research and Innovation, the European Commissioner for Research, Innovation and Science, and the Committee of Members of European Parliament for Industry, Research and Energy), and eventually also involve the respective national authorities, toward harmonizing interpretations of the law, aiming at minimizing boundaries impeding cross-border research in the EU.
(3) In this context, the generation of a standardized consent form (broad consent allowing generic future research activities), acceptable to all stakeholders involved, would be of enormous benefit. Such efforts do exist in the case of biobanking, for example, in Germany, 12 but should be expanded to a harmonized European (and, possibly, international) level. Such a form should inform research participants that complete anonymization may not always be feasible (eg, in the case of whole genome sequencing) and that pseudonymization will be applied to minimize the risk of reidentification. As a step in this direction, a template consent form, using a multi-omics study as a test case, is provided (Data Supplement). It is adapted from, and borrows text from, existing consent forms (Health Innovation and Research Institute of the Ghent University Hospital, Belgium 29 ; and Human Cancer Models Initiative, United States 30 ). This form is intended to serve as a basis for discussion with ethics/legal experts and research participants/patient representatives for further improvement (including abbreviating and simplifying), to obtain a legally and ethically sound template that meets the needs of both research participants and researchers.

Table 2

Question: (1) Can data be shared and reused after publication?
Answer: Yes, if the prospect of data sharing and further use was included in the original consent form. If not specified in the original consent, exceptions may be allowed depending on the inability to re-consent. Obtaining legitimation for exceptions can be complex and will be governed by the GDPR (Articles 9 [2][i], [g], and [j]) and the national laws of individual Member States.
Question: (2) Can follow-up (monitoring over time) data be used and shared?
Answer: Yes, if the research participant has specifically consented. Exceptions may be applicable, as above (point 1).
Question: (3) Can biobanked data and bio-samples be shared and used?
Answer: A positive or negative response by default is not possible. It depends on the specifics of the situation. 9 The issue relates to the use of data and samples in broad fields of research, regularly linked to broad consent. The GDPR does not use the term broad consent but is also not entirely opposed to the concept of broad use (Recital 33). Major existing international biobanking instruments are supportive of broad consent (eg, the World Medical Association [WMA] Declaration of Taipei). National laws (potentially applicable under the GDPR Article 9 [2][j] or Article 9 [4]) frequently differ in their requirements for data and sample use from biobanks.
Question: (4) Are data and bio-sample restrictions to specific researchers/institutes allowed by the GDPR?
Answer: Not addressed by the GDPR. The issue relates to privacy and to data and bio-sample ownership, with divergent views in different countries. 24
Question: (5) How can we generate anonymized data for further sharing?
Answer: Multiple anonymization techniques exist, depending on the type of identifiers and the dimension of the dataset. 15 Anonymization requires well-trained personnel with awareness of the subject matter and of the individual importance of each data field.
GDPR indicates General Data Protection Regulation.
(4) The frequently observed restriction of data and bio-samples to specific institutions or countries, although not instructed by the GDPR, is often the consequence of fear of violating the GDPR. Basic research ethics through implementation of FAIR principles (Findable, Accessible, Interoperable, and Reusable data 16 ) and compliance with privacy and data protection principles should be observed and can be monitored, as recently described. 31 Systems such as DataSHIELD, for simultaneous analysis of multisource individual data without actual data transfer, 32 data hubs allowing highly regulated access, 3 or blockchain platforms offering decentralized data access and interoperability 33 provide data sharing solutions that help maintain data integrity and patient security. Both the patient and scientific communities would benefit from a harmonized European roadmap on this issue.
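The idea of analyzing multisource data without transferring individual records can be sketched in a few lines of Python. This is a simplified illustration of the general principle and not the DataSHIELD interface: each site releases only aggregate statistics, and a coordinator combines them. The class, the function names, the measurements, and the `min_count` disclosure threshold are all assumptions made for this example.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class SiteSummary:
    """Aggregates that leave a site; no individual-level values are included."""
    n: int
    total: float
    total_sq: float


def summarize_locally(values, min_count=5):
    """Runs inside each institution and refuses to release very small groups."""
    if len(values) < min_count:
        raise ValueError("too few participants to release an aggregate")
    return SiteSummary(len(values), sum(values), sum(v * v for v in values))


def pooled_mean_sd(summaries):
    """Runs at the coordinator; sees only the per-site aggregates."""
    n = sum(s.n for s in summaries)
    total = sum(s.total for s in summaries)
    total_sq = sum(s.total_sq for s in summaries)
    mean = total / n
    variance = (total_sq - n * mean * mean) / (n - 1)  # sample variance
    return mean, sqrt(max(variance, 0.0))


# Hypothetical measurements held at two institutions.
site_a = [48.0, 52.0, 61.0, 55.0, 47.0]
site_b = [66.0, 70.0, 59.0, 63.0, 72.0, 58.0]

summaries = [summarize_locally(site_a), summarize_locally(site_b)]
mean, sd = pooled_mean_sd(summaries)
```

The pooled mean and standard deviation equal those computed on the combined data, yet no participant-level value crosses an institutional boundary. Whether such aggregates count as anonymous in a given setting would still need to be assessed case by case.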

CONCLUSIONS
The collective experience accrued by now renders the time ripe to assess whether or not the GDPR is on the right track with respect to scientific data and bio-sample sharing and facilitating adherence to the basic research ethics of data sharing. Although, in principle, the GDPR is a step toward regulatory harmonization, based on the issues presented above, adjustments are urgently needed. These should (1) increase harmonization by minimizing differences caused by exceptions based on national laws; (2) seek explicit endorsement of the concept of broad consent; and, consequently, (3) better define the roadmap for secondary use of data and bio-samples at the European (and, if possible, international) level. The lead to formulate the working draft should be taken by a team of acknowledged learned societies in the field, including patient advocacy groups, working closely with experts at the EU level; finalization should follow a period of commenting by a broad multi-stakeholder audience. This process should evolve in parallel with promoting engagement and education of the public in the relevant definitions (of, eg, broad [specific, meta-, or dynamic] consent; data sharing; residual risk for re-identification), led by academic health centers on a local level and amplified by large patient-centered multidisciplinary networks. We hope that this article will serve as a catalyst for this broad discussion involving all major stakeholders, toward optimizing the GDPR to facilitate biomedical research and to produce social benefit and welfare.

Table 3

Exceptions based on national law should be restricted/eliminated and harmonization should be pursued, considering the international dimension of research.
A document outlining optimal interpretations of the GDPR should be prepared by acknowledged learned societies in the field, working closely with relevant authorities and patient groups; the draft will be subjected to commenting by representatives of all involved stakeholders (patients, regulators, legal officers, scientists) before finalization.
Apparent need for specific consent complicates or prohibits secondary data use.
Obtain explicit endorsement for the concept of broad consent and other emerging consent forms (such as meta- or dynamic consent).
Education of the general public on different forms of consent and associated risks should be intensified locally (via activities led by local academic health centers) and internationally (via large patient-centered networks). The general public should be actively involved in finalizing consent forms and ideal GDPR interpretations (described above).
Generate a standardized consent form template, allowing generic future research activities.
Fear of violating the GDPR restricts data and sample sharing.
Harmonize basic research ethics (including FAIR principles) with privacy and data protection principles.
A harmonized roadmap for safe data sharing and for monitoring compliance with privacy and data protection principles using state-of-the-art tools should be defined.
EU indicates European Union; and GDPR, General Data Protection Regulation.