Facilitating Harmonization of Variables in Framingham, MESA, ARIC, and REGARDS Studies Through a Metadata Repository

Originally published https://doi.org/10.1161/CIRCOUTCOMES.123.009938. Circulation: Cardiovascular Quality and Outcomes. 2023;16

Abstract

BACKGROUND:

High-quality research in cardiovascular prevention, as in other fields, requires inclusion of a broad range of data sets from different sources. Integrating and harmonizing different data sources are essential to increase generalizability, sample size, and representation of understudied populations—strengthening the evidence for the scientific questions being addressed.

METHODS:

Here, we describe an effort to build an open-access repository and interactive online portal for researchers to access the metadata and code harmonizing data from 4 well-known cohort studies—the REGARDS (Reasons for Geographic and Racial Differences in Stroke) study, FHS (Framingham Heart Study), MESA (Multi-Ethnic Study of Atherosclerosis), and ARIC (Atherosclerosis Risk in Communities) study. We introduce a methodology and a framework used for preprocessing and harmonizing variables from multiple studies.

RESULTS:

We provide a real-case study and step-by-step guidance to demonstrate the practical utility of our repository and interactive web page. In addition to our successful development of such an open-access repository and interactive web page, this exercise in harmonizing data from multiple cohort studies has revealed several key themes. These themes include the importance of careful preprocessing and harmonization of variables, the value of creating an open-access repository to facilitate collaboration and reproducibility, and the potential for using harmonized data to address important scientific questions and disparities in cardiovascular disease research.

CONCLUSIONS:

By integrating and harmonizing these large-scale cohort studies, such a repository may improve the statistical power and representation of understudied cohorts, enabling development and validation of risk prediction models, identification and investigation of risk factors, and creating a platform for racial disparities research.

REGISTRATION:

URL: https://precision.heart.org/duke-ninds.

WHAT IS KNOWN

  • Stroke is the fifth leading cause of death in the United States and there is a critical need to continue to test and improve risk prediction models for primary stroke.

  • Accuracy of risk prediction models is based on the quality, sample size, and diversity of the data sets that are used to build the models.

  • Leveraging, integrating, and harmonizing various data sources are critical to increase sample size and improve the representativeness and generalizability of the cohorts used for training and validation of risk prediction models.

WHAT THE STUDY ADDS

  • This study democratizes access to the processes and methods used to harmonize data and visualize how different cohort variables are brought together as a means to better understand and improve the resources and tools that researchers are looking for.

  • We describe an effort to build an open-access repository and interactive online portal for researchers to access the metadata and code harmonizing data from 4 well-known cohort studies—the REGARDS (Reasons for Geographic and Racial Differences in Stroke) study, FHS (Framingham Heart Study), MESA (Multi-Ethnic Study of Atherosclerosis), and ARIC (Atherosclerosis Risk in Communities) study.

  • The online portal can be found at https://precision.heart.org/duke-ninds.

  • Harmonizing 4 well-known studies would enable researchers to increase the statistical power and potential for using harmonized data to address important scientific questions and disparities in cardiovascular disease research.

Stroke is the fifth leading cause of death in the United States.1 Stroke primary prevention guidelines call for more research to validate risk assessment tools across age, sex, and race/ethnic groups.2 To that end, we are assessing the performance of existing risk prediction models recommended for primary stroke, and developing and validating a new machine learning-based risk prediction model for primary stroke for comparison. The validity and generalizability of this work depend on access to sufficiently large, high-quality, well-characterized data sets representative of the populations of interest. Unfortunately, as in many other fields, relevant data are scattered across many different, often heterogeneous data sets. Thus, leveraging and integrating the various existing open data sources is critical to increase sample size and improve the representativeness and generalizability of the cohorts used for training and validation of risk prediction models. Commonly available open data sources, for example, FHS (Framingham Heart Study), MESA (Multi-Ethnic Study of Atherosclerosis),3 and ARIC (Atherosclerosis Risk in Communities), often contain variables with the same label that are defined differently, which poses challenges for data integration and harmonization. Addressing this requires data harmonization, including (1) mapping similar variables across data sets, (2) selecting groups of variables that represent generalized concepts, and (3) combining, converting, and pooling concept-mapped variables into a cohesive harmonized variable for use in analysis (Figure 1).4,5 We chose to democratize access to the processes and methods used to harmonize the data, and to visualize how different cohort variables are brought together, as a means to better understand and improve the resources and tools that researchers are looking for.

Figure 1. The data harmonization process. Study data variables collected from different sources need to be mapped to one another (step 1), classified into the generalized concepts they represent (step 2), and transformed into unified harmonized variables (step 3) for analysis.

Interoperability and reusability are critical components of the scientific process, and key for the verification and advancement of scientific results.5–8 A consistent challenge with accessing and analyzing data in the ecosystem of open data is the lack of clear and rigorous documentation and tools that support reuse and interoperability. This led to the Findable, Accessible, Interoperable, and Reusable guiding principles for data management.9–12 However, the conceptualization, mappings, and transformation of data are rarely published in an interoperable, computational manner that facilitates collection to improve future search and reuse.10,11

The objective of this work was to create a high-quality data repository that improves the standardization, accessibility, and reusability of variables harmonized from commonly used observational studies to predict stroke. This is accomplished by creating a Findable, Accessible, Interoperable, and Reusable metadata repository documenting the harmonization process and the definitions of well-defined stroke outcome and predictor variables. In this article, we demonstrate the methodology used for preprocessing and harmonizing variables from multiple studies, including the REGARDS (Reasons for Geographic and Racial Differences in Stroke) study,13 FHS,14 MESA,3 and ARIC study.15 The Methods section describes the processes used to integrate and harmonize the variables across studies. The Results section provides users a step-by-step guide to using the metadata repository through an interactive web page that is still in beta-testing mode in an interoperable cloud-based virtual environment on the American Heart Association’s (AHA) Precision Medicine Platform.12,16,17

METHODS

In accordance with the AHA Journal’s implementation of the Transparency and Openness Promotion Guidelines, we have established a public URL, where we will make the metadata, methods used in the analysis, and documentation available to all. The URL is https://precision.heart.org/duke-ninds. Access to the raw data in each of the 4 longitudinal studies is under the purview of National Institutes of Health (ARIC, FHS, and MESA) and the University of Alabama Birmingham (REGARDS). The overall workflow for harmonizing data across observational studies and providing a metadata repository via an interactive site is illustrated in Figure 2.

Figure 2. Overall workflow for documenting the process of harmonizing data across observational studies and providing the resulting metadata through an interactive site and metadata repository.

Data Sources

We obtained data from 4 cohort studies—REGARDS, FHS, MESA, and ARIC. Institutional review board approval was obtained for access to retrospective data in all studies.

  1. REGARDS is a national, population-based, longitudinal study of risk factors for stroke in adults 45 years or older. A total of 30 239 participants were recruited between January 2003 and October 2007 from all over the United States with a higher concentration of participants from the Southeastern United States.13,18

  2. FHS is an ongoing, longitudinal study that began in 1948 to investigate cardiovascular disease risk factors. It initially recruited an original cohort of 5209 participants from Framingham, MA, and later added an offspring cohort of 5124 participants, who have been assessed 9 times between 1971 and 2014. The study has been widely used as a resource for understanding the risk factors and progression of cardiovascular disease.19

  3. The MESA study is a prospective cohort designed to investigate the prevalence and progression of subclinical cardiovascular disease in community dwelling adults. The study assessed a diverse, population-based sample of 6814 asymptomatic men and women aged 45 to 84 years between 2000 and 2018. The study recruited participants from 6 field centers across the United States, including Wake Forest University, Columbia University, Johns Hopkins University, University of Minnesota, Northwestern University, and University of California, Los Angeles. The cohort consists of 38% White individuals, 28% African American, 22% Hispanic, and 12% Asian individuals (primarily Chinese). The study conducted 6 examinations since July 2000, with each examination period occurring every 18 to 24 months.20

  4. The ARIC study is a multisite, prospective, biracial cohort study that aims to investigate the causes of atherosclerosis and the clinical outcomes in adults from 4 US communities (Washington County, MD; Forsyth County, NC; Jackson, MS; and Minneapolis, MN). The study cohort consists of a randomly selected sample of ≈4000 individuals aged 45 to 64 years from a defined population in each community. In total, 15 792 Black and White adults were enrolled. These participants were re-examined every 3 years, with the first examination (baseline) occurring in 1987 to 1989, the second in 1990 to 1992, the third in 1993 to 1995, and the fourth in 1996 to 1998; the fifth examination was conducted in 2009.15

Data Extraction

Documentation for the FHS, MESA, and ARIC study variables was obtained through the database of Genotypes and Phenotypes (dbGaP).21 Documentation for the REGARDS study13 was obtained directly through the University of Alabama Birmingham. The documentation and metadata for each individual study variable were extracted, harmonized, and used to provide the following information for each variable: (1) study name; (2) dataset name (the collection of a subset of study variables stored in a file); and (3) description. The time period when variables were measured was extracted from the dataset descriptions and manually annotated. To extract variable documentation for the FHS, MESA, and ARIC studies, we parsed the Extensible Markup Language variable reports provided with the data from dbGaP. For the REGARDS study, a combination of structured plain text files and PDF files was parsed to extract variable documentation.
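The extraction step can be illustrated with a minimal Python sketch (the authors used R, so this is an analogue and not their code). It assumes a simplified, hypothetical report layout: the element and attribute names (data_table, variable, var_name, study_name, description) and the sample values are placeholders, and real dbGaP variable reports carry more fields than shown here.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified variable report used only for illustration.
# Real dbGaP reports contain more attributes and nesting than this.
SAMPLE_REPORT = """
<data_table study_name="EXAMPLE_STUDY" name="example_exam1">
  <variable var_name="AFIB">
    <description>Atrial fibrillation on ECG</description>
  </variable>
  <variable var_name="SBP">
    <description>Systolic blood pressure (mm Hg)</description>
  </variable>
</data_table>
"""


def extract_variable_docs(xml_text):
    """Return one record per variable: study, dataset, variable, description."""
    root = ET.fromstring(xml_text.strip())
    records = []
    for var in root.iter("variable"):
        records.append({
            "study": root.get("study_name"),
            "dataset": root.get("name"),
            "variable": var.get("var_name"),
            # findtext returns None when the element is absent
            "description": (var.findtext("description") or "").strip(),
        })
    return records


docs = extract_variable_docs(SAMPLE_REPORT)
for doc in docs:
    print(doc)
```

Each record carries the three pieces of information listed above (study name, dataset name, and description) alongside the variable name.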

Data Harmonization

We conducted data harmonization on stroke outcomes, including stroke subtypes, and their risk factors across the 4 cohorts, based on previous stroke risk model studies. The current AHA/American Stroke Association prevention of stroke guidelines recommend use of risk prediction models to optimize screening and interventions.2 The Framingham Stroke Risk Profile (Framingham Stroke) estimates the 10-year risk of developing stroke using key risk factors identified through epidemiological studies.22,23 Additional prediction tools have been introduced, including the revised Framingham Stroke,24 a risk stratification that requires only self-reported measures from the REGARDS study,25 and the Pooled Cohort Equation,26 although the Pooled Cohort Equation was designed to estimate 10-year risk of atherosclerotic cardiovascular disease, defined as myocardial infarction, or any stroke or death from cardiovascular causes. Although there are limited data comparing the performance of these prediction tools in estimating risk of stroke, especially among subgroups defined by sex, race, and age, we included stroke risk factors as defined in these previous stroke risk model studies. Algorithms proposed to date have relied on traditional regression techniques, and evidence on the potential added predictive value of more complex machine learning algorithms is lacking.

The stroke risk factors included age, sex, race, smoking status, medical history of cardiovascular disease, atrial fibrillation, diabetes, hypertension medications, systolic blood pressure, total cholesterol, high-density lipoprotein cholesterol, prior myocardial infarction, education level, and general health. To curate these stroke outcomes and risk factors, we first identified the raw variables from the 4 cohorts that can be mapped to each of the underlying risk factors, and then converted and pooled those raw variables into a cohesive harmonized variable. Additionally, we relied on the National Institute of Neurological Disorders and Stroke Common Data Elements for curation of stroke outcomes and risk factors, where available.
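The two-stage curation described above (map raw variables to a risk factor, then convert and pool them) can be sketched as follows. This is an illustrative Python example and not the authors' harmonization code: the study labels, raw variable names, and codings are all hypothetical.

```python
# Hypothetical raw variable names and codings, for illustration only.
# Each study records atrial fibrillation under a different name and coding.
MAPPINGS = {
    "STUDY_A": {"variable": "afib_ecg", "recode": {1: True, 0: False}},
    "STUDY_B": {"variable": "AF_SELF", "recode": {"Y": True, "N": False}},
}


def harmonize_afib(study, record):
    """Convert one raw record to the harmonized atrial fibrillation value."""
    rule = MAPPINGS[study]
    raw = record.get(rule["variable"])
    # Missing or unrecognized codes become None rather than being guessed.
    return rule["recode"].get(raw)


raw_rows = [
    ("STUDY_A", {"afib_ecg": 1}),
    ("STUDY_A", {"afib_ecg": 0}),
    ("STUDY_B", {"AF_SELF": "Y"}),
    ("STUDY_B", {"AF_SELF": "unknown"}),
]

# Pool the converted values from all studies into one harmonized variable.
pooled = [
    {"study": study, "atrial_fibrillation": harmonize_afib(study, row)}
    for study, row in raw_rows
]
print(pooled)
```

Keeping the per-study rule in a data structure, separate from the conversion function, means the same table can drive both the harmonization and its documentation.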

Metadata Repository and Interactive Website

The repository and interactive website aim to provide 2 functionality levels to end users: (1) to view and search documentation of harmonized variables and the study variables used to create them; and (2) to make metadata and code findable, accessible, interoperable, and reusable.

To implement functionality level 1, we included a harmonized variable ID and definition, and mappings for each harmonized variable to the following: (1) general concepts and standard Systematized Nomenclature of Medicine Clinical Terms27 or National Institute of Neurological Disorders and Stroke Common Data Elements28 terms; and (2) variables in 1 or multiple datasets in each cohort. Clinical experts manually reviewed the definitions of common data elements and study-specific and cohort-specific variables and verified that the definitions were sufficiently congruent. To understand how variables from a specific study were combined and transformed, we generated a network display for each harmonized variable depicting how each study variable was mapped and transformed.
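One way such a network display can be driven is sketched below in Python. The nodes-and-edges layout follows the general style of the elements JSON accepted by Cytoscape.js, but the portal's actual schema is not described in this article; the field names (kind, description, transform) and the example variables are assumptions made for illustration.

```python
import json


def build_network_elements(harmonized_id, study_variables):
    """Build nodes and edges linking study variables to one harmonized variable."""
    nodes = [{"data": {"id": harmonized_id, "kind": "harmonized"}}]
    edges = []
    for sv in study_variables:
        node_id = f'{sv["study"]}:{sv["dataset"]}:{sv["variable"]}'
        nodes.append({
            "data": {
                "id": node_id,
                "kind": "study_variable",
                "description": sv["description"],
            }
        })
        # The edge records how the study variable was transformed.
        edges.append({
            "data": {
                "id": f"{node_id}->{harmonized_id}",
                "source": node_id,
                "target": harmonized_id,
                "transform": sv["transform"],
            }
        })
    return {"nodes": nodes, "edges": edges}


# Hypothetical study variables mapped to one harmonized variable.
example_variables = [
    {"study": "STUDY_A", "dataset": "exam1", "variable": "afib_ecg",
     "description": "Atrial fibrillation on ECG", "transform": "1/0 to yes/no"},
    {"study": "STUDY_B", "dataset": "baseline", "variable": "AF_SELF",
     "description": "Self-reported atrial fibrillation", "transform": "Y/N to yes/no"},
]

elements = build_network_elements("atrial_fibrillation", example_variables)
print(json.dumps(elements, indent=2))
```

Storing the transformation on the edge and the documentation on the node matches the behavior described for the display, in which nodes and edges can each be inspected for additional detail.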

To implement functionality level 2, we chose to deploy this resource on the AHA Precision Medicine Platform.12,17 Users request datasets from dbGaP and access these data in the cloud-based workspaces on the AHA Precision Medicine Platform. These workspaces are Health Insurance Portability and Accountability Act and Federal Risk and Authorization Management Program certified, and are equipped with data analysis tools such as Python and R. Users can also bring their own data and install additional software and tools.12,17

To meet the findable, accessible, interoperable, and reusable guiding principles, this repository provides the following: (1) detailed definitions for harmonized concepts, including references and mappings to standards; (2) clear documentation detailing which specific study variables were mapped to concepts, documentation for each study variable mapped, and the transformations required for harmonization; and (3) access to code available for use on personal computers and in interoperable workspace for reuse and development.

Code

The XML package in R29 was used to load and parse documentation contained in Extensible Markup Language documents, and the read_html function from the xml2 package was used to extract all documentation text.29,30 The tabulizer package in R (version 3.5) was used to extract documentation from PDF files.31 For the table/metadata repository, we use the datatables.net package. For the node network visualization, we use the cytoscape package.

The full set of documentation in the metadata repository and the code to harmonize the variable depicted can be accessed in a workspace and on GitHub (https://github.com/duke-harmonization/manual_harmonization).

RESULTS

The online portal can be accessed at https://precision.heart.org/duke-ninds. Researchers interested in studying cardiovascular disease and integrating data across the REGARDS study, FHS, MESA, and ARIC study can leverage this open platform to accelerate their work and better understand how the same variable is defined across all cohorts. An overview of the observational studies used, the total variables mapped, and the total stroke outcome and risk factor concepts represented is shown in the Table.

Table. Overview of Harmonized Observational Study Variables

Study      Concepts Represented (N)   Datasets (N)   Variables Mapped (N)
REGARDS    51                         5              54
MESA       49                         7              174
ARIC       50                         43             310
FHS        52                         24             327

Breakdown of the concepts, datasets, and variables from each study in the harmonization. Concepts Represented refers to the number (N) of unique harmonized variables; Datasets indicates the number of unique dataset labels in dbGaP used; and Variables Mapped refers to the number of unique individual variables used in the harmonization. ARIC indicates Atherosclerosis Risk in Communities study; FHS, Framingham Heart Study; MESA, Multi-Ethnic Study of Atherosclerosis; and REGARDS, Reasons for Geographic and Racial Differences in Stroke study.
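The three counts defined in the Table note can be derived from a list of variable-to-concept mappings, as in the following Python sketch. The mapping rows and their field names are hypothetical and do not reproduce the repository's data; only the counting logic is the point.

```python
from collections import defaultdict

# Hypothetical mapping rows: one row per study variable mapped to a concept.
MAPPING_ROWS = [
    {"study": "STUDY_A", "dataset": "exam1", "variable": "afib_ecg",
     "concept": "atrial_fibrillation"},
    {"study": "STUDY_A", "dataset": "exam1", "variable": "sbp1",
     "concept": "systolic_blood_pressure"},
    {"study": "STUDY_A", "dataset": "exam2", "variable": "sbp2",
     "concept": "systolic_blood_pressure"},
    {"study": "STUDY_B", "dataset": "baseline", "variable": "AF_SELF",
     "concept": "atrial_fibrillation"},
]


def summarize(rows):
    """Count unique concepts, datasets, and variables for each study."""
    seen = defaultdict(
        lambda: {"concepts": set(), "datasets": set(), "variables": set()}
    )
    for row in rows:
        study = seen[row["study"]]
        study["concepts"].add(row["concept"])
        study["datasets"].add(row["dataset"])
        # A variable is identified by its dataset and name together.
        study["variables"].add((row["dataset"], row["variable"]))
    return {
        name: {key: len(values) for key, values in sets.items()}
        for name, sets in seen.items()
    }


summary = summarize(MAPPING_ROWS)
print(summary)
```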

The online portal, as shown in Figure 3, includes the main harmonized variable section where researchers can easily browse or search for variables of interest, such as race, age, and others. The interface also features a side panel that provides quick access to relevant resources for the researcher’s convenience. The following steps provide a guide to using the metadata repository through an interactive web page on the AHA Precision Medicine Platform.

Figure 3. Screenshot of online portal.

Step 1: Data access request. Because we do not have the rights to grant access to individual-level data, the first step for researchers is to submit a data access request to dbGaP. This can be done by clicking the dbGaP Access Request button on the site, which will redirect investigators to the National Institutes of Health dbGaP repository. Please note that access to the REGARDS cohort is currently only available through the University of Alabama at Birmingham.

Step 2: Variable curation. The variable curation process involves harmonizing stroke outcome and risk factor variables across studies. Researchers can navigate to the Harmonized Variables section and search for the variable of interest. Figure 4 presents a case study to demonstrate 1 harmonized variable, atrial fibrillation, and shows what researchers can find in this platform. The portal will display a table of raw variables from each study that were grouped together to create the harmonized variable. Researchers can then explore additional information about individual study variables, such as the source study, dataset, and the fields used for harmonization. For each harmonized variable, users can also visualize a network display to understand how each raw variable in the original studies was mapped and transformed to this harmonized variable. By clicking on a node, users will see a table displaying the variable name, dataset description, and the variable description.

Figure 4. Application of a use case. We select a harmonized variable, for example, Atrial Fibrillation, to demonstrate additional information used to create the harmonized variable and how it may be visualized through a node network. The nodes and edges in the network can be toggled to display additional variable documentation and how the variables were transformed. https://precision.heart.org/duke-ninds.

Step 3: Validation. Because the National Institute of Neurological Disorders and Stroke (NINDS) Common Data Elements contain standards used by the ARIC study to define stroke and transient ischemic attack, the validation of curated stroke outcome and variables also relied on the NINDS Common Data Elements where available. To validate data harmonization, clinical experts can navigate to the Standardization section in the top panel of the repository and click Mapping Table. This will display the mapping from the curated variable to NINDS and Systematized Nomenclature of Medicine. The experts can manually review the definitions of Common Data Elements and those of the study-specific and cohort-specific variables to ensure they are sufficiently congruent.

The platform also includes hyperlinked documentation and resource repositories to enable easy access to additional outside resources and content.

DISCUSSION

We have created a metadata repository with an interactive website for researchers to facilitate data harmonization of variables from the FHS, MESA, ARIC, and REGARDS studies. This resource provides access to the source code for data harmonization, showing reusable and interoperable mappings, conceptualization, and transformation of study variables; researchers can run this code to create the harmonized variables in their own studies. This repository also provides access to analysis code for risk prediction models such as stroke outcome prediction. By combining and harmonizing variables from multiple observational studies, this resource addresses the demand for combined data sets by increasing overall sample size and increasing representation of historically underrepresented populations. This enables researchers to explore research questions that were previously unattainable due to limited sample sizes or lack of diversity in study populations. A few scientific examples of how harmonizing studies accelerates research include expanding the pool of risk prediction factors, improving modeling techniques to address observed racial disparities, conducting sensitivity analyses and evaluating model performance stratified by age, gender, and race/ethnicity groups, and improving model performance for predicting new-onset stroke and outcomes.32

Other repositories exist that support discoverability, comparison, and dynamic reuse.21,33 However, the scope of these repositories is limited to requesting access to data and currently few, if any, exist that are centered around facilitating integration and curation. Global repositories such as dbGaP and the Biologic Specimen and Data Repository Information Coordinating Center have been developed to support reuse of data from an individual observational study21,33; however, lack of in-depth documentation, semantic interoperability, and standardization remains a challenge for discovery and comparison of variables when harmonizing multiple observational studies for predictive models.

A challenge in building this metadata repository has been automating the extraction and transformation of all cohort variable documentation to limit manual creation of the documentation and definitions of each harmonized variable. We were able to partially overcome this by creating a generalized framework to automatically parse the extensible markup language files from dbGaP to extract the documentation. For studies documented in less structured formats, such as PDFs or Word documents, we created pipelines to extract metadata information. An additional challenge has been to determine the amount of information users would like and the way in which they would like it displayed. Beta-testing and focus groups are currently underway to receive feedback from users and modify the site accordingly.

We continue to work on scalability and on automating the extraction and documentation of each harmonized variable, while also iteratively testing and developing the user interface for the metadata repository. As we finalize the fully harmonized data, we plan to enable users to interactively explore the harmonized data and create aggregate statistics, and we also provide resources for reusing the harmonization code and accompanying metadata.

Although the current work focused on 4 cohorts for stroke risk prediction, the pipeline can easily be extended to harmonize variables for other diseases and additional data. To integrate more studies into the platform, we require the variables in the study to be documented in a way that can be parsed to extract metadata, which can be in the form of computer-readable structured data file formats such as extensible markup language, comma separated values, Microsoft Excel, etc. Once this metadata is extracted, the underlying JavaScript Object Notation used as the data storage for the platform can be modified to seamlessly update tables, figures, and links to include information from the new study. We are committed to learning from the research community on how to extend this portal.
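The extension step can be sketched in Python as follows. The structure of the platform's JavaScript Object Notation store is not described in this article, so the keys used here (studies, harmonized_variables, sources) and the example study are hypothetical; the sketch only shows the general idea of adding a new study's extracted metadata and its concept mappings to an existing store.

```python
import json


def add_study(store_json, study, variable_docs, mappings):
    """Return an updated JSON store that includes a new study.

    variable_docs: list of extracted variable documentation records.
    mappings: dict of harmonized concept -> list of study variable names.
    """
    store = json.loads(store_json)
    store.setdefault("studies", {})[study] = variable_docs
    harmonized = store.setdefault("harmonized_variables", {})
    for concept, variables in mappings.items():
        sources = harmonized.setdefault(concept, {}).setdefault("sources", [])
        for variable in variables:
            sources.append({"study": study, "variable": variable})
    return json.dumps(store, indent=2, sort_keys=True)


# Hypothetical existing store with one study already mapped.
existing = json.dumps({
    "studies": {"STUDY_A": [{"variable": "afib_ecg"}]},
    "harmonized_variables": {
        "atrial_fibrillation": {
            "sources": [{"study": "STUDY_A", "variable": "afib_ecg"}]
        }
    },
})

updated = json.loads(add_study(
    existing,
    "STUDY_C",
    [{"variable": "AFIB_HX", "description": "History of atrial fibrillation"}],
    {"atrial_fibrillation": ["AFIB_HX"]},
))
print(sorted(updated["studies"]))
```

Because tables, figures, and links are generated from the store, a new study appears in the portal once its records are added, without changes to the display code.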

ARTICLE INFORMATION

Acknowledgments

The Framingham Heart Study is conducted and supported by the National Heart, Lung, and Blood Institute (NHLBI) in collaboration with Boston University (Contract No. N01-HC-25195 and HHSN268201500001I). This article was not prepared in collaboration with investigators of the Framingham Heart Study and does not necessarily reflect the opinions or views of the Framingham Heart Study, Boston University, or NHLBI. The original data used for this analysis can be found at database of Genotypes and Phenotypes (dbGaP) using the dbGaP accession number phs000007.v32 L. The MESA (Multi-Ethnic Study of Atherosclerosis) and the MESA SHARe project are conducted and supported by the NHLBI in collaboration with MESA investigators. Support for MESA is provided by contracts HHSN268201500003I, N01-HC-95159, N01-HC-95160, N01-HC-95161, N01-HC-95162, N01-HC-95163, N01-HC95164, N01-HC-95165, N01-HC-95166, N01-HC-95167, N01-HC-95168, N01-HC-95169, UL1-TR-001079, UL1-TR000040, UL1-TR-001420, UL1-TR-001881, and DK063491. The original data used for this analysis can be found at dbGaP using the dbGaP accession phs000209.v13. The ARIC (Atherosclerosis Risk in Communities) study has been funded in whole or in part with Federal funds from the National Heart, Lung, and Blood Institute, National Institutes of Health, Department of Health and Human Services, under contract numbers (HHSN268201700001I, HHSN268201700002I, HHSN268201700003I, HHSN268201700004I, and HHSN268201700005I). The authors thank the staff and participants of the ARIC study for their important contributions. The original data used for this analysis can be found at dbGaP, accession: phs000280.v7. While the REGARDS (Reasons for Geographic and Racial Differences in Stroke) data used in this study were obtained from Suzanne E. Judd, the data can be found at dbGaP using the dbGaP accession number phs002719.v1.p1.
The REGARDS Genome-Wide Association Studies study was supported by the National Institutes of Health (NIH) NHLBI grant R01HL136666. The parent REGARDS study is supported by a cooperative agreement U01 NS041588 from the National Institute of Neurological Disorders and Stroke, National Institutes of Health, US Department of Health and Human Services.

Nonstandard Abbreviations and Acronyms

AHA: American Heart Association
ARIC: Atherosclerosis Risk in Communities study
dbGaP: database of Genotypes and Phenotypes
FHS: Framingham Heart Study
MESA: Multi-Ethnic Study of Atherosclerosis
REGARDS: Reasons for Geographic and Racial Differences in Stroke

Disclosures

Drs Hall, Mallya, Zhao, and V. Manchanda are employees of the American Heart Association (AHA). The other authors report no conflicts.

Footnotes

*P. Mallya and L. Stevens contributed equally.

Supplemental Material is available at https://www.ahajournals.org/doi/suppl/10.1161/CIRCOUTCOMES.123.009938.

Correspondence to: Jennifer L. Hall, PhD, American Heart Association, Dallas, TX.

REFERENCES

  • 1. Benjamin EJ, Muntner P, Alonso A, Bittencourt MS, Callaway CW, Carson AP, Chamberlain AM, Chang AR, Cheng S, Das SR, et al; American Heart Association Council on Epidemiology and Prevention Statistics Committee and Stroke Statistics Subcommittee. Heart disease and stroke statistics-2019 update: a report from the American Heart Association.Circulation. 2019; 139:e56–e528. doi: 10.1161/CIR.0000000000000659LinkGoogle Scholar
  • 2. Meschia JF, Bushnell C, Boden-Albala B, Braun LT, Bravata DM, Chaturvedi S, Creager MA, Eckel RH, Elkind MS, Fornage M, et al; American Heart Association Stroke Council. Guidelines for the primary prevention of stroke: a statement for healthcare professionals from the American Heart Association/American Stroke Association.Stroke. 2014; 45:3754–3832. doi: 10.1161/STR.0000000000000046LinkGoogle Scholar
  • 3. Bild DE, Bluemke DA, Burke GL, Detrano R, Diez Roux AV, Folsom AR, Greenland P, Jacob DR, Kronmal R, Liu K, et al. Multi-Ethnic Study of Atherosclerosis: objectives and design.Am J Epidemiol. 2002; 156:871–881. doi: 10.1093/aje/kwf113CrossrefMedlineGoogle Scholar
  • 4. Fortier I, Burton PR, Robson PJ, Ferretti V, Little J, L’Heureux F, Deschenes M, Knoppers BM, Doiron D, Keers JC, et al. Quality, quantity and harmony: the DataSHaPER approach to integrating data across bioclinical studies.Int J Epidemiol. 2010; 39:1383–1393. doi: 10.1093/ije/dyq139CrossrefMedlineGoogle Scholar
  • 5. Fortier I, Doiron D, Little J, Ferretti V, L’Heureux F, Stolk RP, Knoppers BM, Hudson TJ, Burton PR; International Harmonization Initiative. Is rigorous retrospective harmonization possible? Application of the DataSHaPER approach across 53 large studies.Int J Epidemiol. 2011; 40:1314–1328. doi: 10.1093/ije/dyr106CrossrefMedlineGoogle Scholar
  • 6. Boeckhout M, Zielhuis GA, Bredenoord AL. The FAIR guiding principles for data stewardship: fair enough?Eur J Hum Genet. 2018; 26:931–936. doi: 10.1038/s41431-018-0160-0CrossrefMedlineGoogle Scholar
  • 7. Munafo MR, Nosek BA, Bishop DVM, Button KS, Chambers CD, du Sert NP, Simonsohn U, Wagenmakers EJ, Ware JJ, Ioannidis JPA. A manifesto for reproducible science.Nat Hum Behav. 2017; 1:0021. doi: 10.1038/s41562-016-0021CrossrefMedlineGoogle Scholar
  • 8. Stilp AM, Emery LS, Broome JG, Buth EJ, Khan AT, Laurie CA, Wang FF, Wong Q, Chen D, D’Augustine CM, et al. A system for phenotype harmonization in the national heart, lung, and blood institute trans-omics for precision medicine (TOPMed) program.Am J Epidemiol. 2021; 190:1977–1992. doi: 10.1093/aje/kwab115CrossrefMedlineGoogle Scholar
  • 9. Almugbel R, Hung LH, Hu J, Almutairy A, Ortogero N, Tamta Y, Yeung KY. Reproducible Bioconductor workflows using browser-based interactive notebooks and containers.J Am Med Inform Assoc. 2018; 25:4–12. doi: 10.1093/jamia/ocx120CrossrefMedlineGoogle Scholar
  • 10. Mons B, Neylon C, Velterop J, Dumontier M, da Silva Santos LOB, Wilkinson MD. Cloudy, increasingly FAIR; revisiting the FAIR Data guiding principles for the European Open Science Cloud.Inf Serv Use. 2017; 37:49–56. doi: 10.3233/ISU-170824CrossrefGoogle Scholar
  • 11. Pugliese P, Knell C, Christoph J. Exchange of clinical and omics data according to FAIR principles: a review of open source solutions.Methods Inf Med. 2020; 59:e13–e20. doi: 10.1055/s-0040-1712968CrossrefMedlineGoogle Scholar
  • 12. Stevens LM, de Lemos JA, Das SR, Rutan C, Alger HM, Elkind MSV, Zhao J, Iyer K, Figueroa CA, Hall JL. American Heart Association precision medicine platform addresses challenges in data sharing.Circ Cardiovasc Qual Outcomes. 2021; 14:e007949. doi: 10.1161/CIRCOUTCOMES.121.007949LinkGoogle Scholar
  • 13. Howard VJ, Cushman M, Pulley L, Gomez CR, Go RC, Prineas RJ, Graham A, Moy CS, Howard G. The Reasons for Geographic and Racial Differences in Stroke study: objectives and design. Neuroepidemiology. 2005; 25:135–143. doi: 10.1159/000086678
  • 14. Dawber TR, Meadors GF, Moore FE. Epidemiological approaches to heart disease: the Framingham Study. Am J Public Health Nations Health. 1951; 41:279–281. doi: 10.2105/ajph.41.3.279
  • 15. The ARIC investigators. The Atherosclerosis Risk in Communities (ARIC) study: design and objectives. Am J Epidemiol. 1989; 129:687–702. doi: 10.1093/oxfordjournals.aje.a115184
  • 16. Houser SR. The American Heart Association’s new institute for precision cardiovascular medicine. Circulation. 2016; 134:1913–1914. doi: 10.1161/CIRCULATIONAHA.116.022138
  • 17. Kass-Hout TA, Stevens LM, Hall JL. American Heart Association precision medicine platform. Circulation. 2018; 137:647–649. doi: 10.1161/CIRCULATIONAHA.117.032041
  • 18. Armstrong ND, Srinivasasainagendra V, Patki A, Tanner RM, Hidalgo BA, Tiwari HK, Limdi NA, Lange EM, Lange LA, Arnett DK, et al. Genetic contributors of incident stroke in 10,700 African Americans with hypertension: a meta-analysis from the Genetics of Hypertension Associated Treatments and Reasons for Geographic and Racial Differences in Stroke studies. Front Genet. 2021; 12:781451. doi: 10.3389/fgene.2021.781451
  • 19. Mahmood SS, Levy D, Vasan RS, Wang TJ. The Framingham Heart Study and the epidemiology of cardiovascular diseases: a historical perspective. Lancet. 2014; 383:999–1008. doi: 10.1016/S0140-6736(13)61752-3
  • 20. Olson JL, Bild DE, Kronmal RA, Burke GL. Legacy of MESA. Glob Heart. 2016; 11:269–274. doi: 10.1016/j.gheart.2016.08.004
  • 21. National Center for Biotechnology Information. dbGaP database interface. Accessed September 7, 2023. http://www.ncbi.nlm.nih.gov/gap/
  • 22. D’Agostino RB, Wolf PA, Belanger AJ, Kannel WB. Stroke risk profile: adjustment for antihypertensive medication. The Framingham Study. Stroke. 1994; 25:40–43. doi: 10.1161/01.str.25.1.40
  • 23. Wolf PA, D’Agostino RB, Belanger AJ, Kannel WB. Probability of stroke: a risk profile from the Framingham Study. Stroke. 1991; 22:312–318. doi: 10.1161/01.str.22.3.312
  • 24. Dufouil C, Beiser A, McLure LA, Wolf PA, Tzourio C, Howard VJ, Westwood AJ, Himali JJ, Sullivan L, Aparicio HJ, et al. Revised Framingham stroke risk profile to reflect temporal trends. Circulation. 2017; 135:1145–1159. doi: 10.1161/CIRCULATIONAHA.115.021275
  • 25. Howard G, McClure LA, Moy CS, Howard VJ, Judd SE, Yuan Y, Long DL, Muntner P, Safford MM, Kleindorfer DO. Self-reported stroke risk stratification: Reasons for Geographic and Racial Differences in Stroke study. Stroke. 2017; 48:1737–1743. doi: 10.1161/STROKEAHA.117.016757
  • 26. Andrus B, Lacaille D. 2013 ACC/AHA guideline on the assessment of cardiovascular risk. J Am Coll Cardiol. 2014; 63:2886. doi: 10.1016/j.jacc.2014.02.606
  • 27. SNOMED International. Accessed September 8, 2023. https://www.snomed.org
  • 28. National Institute of Neurological Disorders and Stroke. NINDS Common Data Elements. Accessed April 27, 2023. https://www.ninds.nih.gov/ninds-common-data-elements
  • 29. Lang DT, CRAN Team. XML: tools for parsing and generating XML within R and S-Plus. 2019. Accessed September 8, 2023. https://cran.r-project.org/web/packages/XML/index.html
  • 30. Wickham H, Hester J, Ooms J. xml2: Parse XML. Accessed September 8, 2023. https://cran.r-project.org/web/packages/xml2/index.html
  • 31. Leeper TJ. Tabulizer: Bindings for Tabula PDF Table Extractor Library. 2018. Accessed September 8, 2023. https://scholar.google.com/scholar_lookup?title=tabulizer:+Bindings+for+tabula+PDF+table+extractor+library&publication_year=2018&
  • 32. Hong C, Pencina MJ, Wojdyla DM, Hall JL, Judd SE, Cary M, Engelhard MM, Berchuck S, Xian Y, D’Agostino R, et al. Predictive accuracy of stroke risk prediction models across black and white race, sex, and age groups. JAMA. 2023; 329:306–317. doi: 10.1001/jama.2022.24683
  • 33. NIH. BioLINCC Resource Overview. NIH; 2022. Accessed July 23, 2020.