
Computable Phenotype Implementation for a National, Multicenter Pragmatic Clinical Trial

Lessons Learned From ADAPTABLE
Originally published Cardiovascular Quality and Outcomes. 2020;13:e006292



Many large-scale cardiovascular clinical trials are plagued with escalating costs and low enrollment. Implementing a computable phenotype, which is a set of executable algorithms, to identify a group of clinical characteristics derivable from electronic health records or administrative claims records, is essential to successful recruitment in large-scale pragmatic clinical trials. This methods paper provides an overview of the development and implementation of a computable phenotype in ADAPTABLE (Aspirin Dosing: a Patient-Centric Trial Assessing Benefits and Long-Term Effectiveness)—a pragmatic, randomized, open-label clinical trial testing the optimal dose of aspirin for secondary prevention of atherosclerotic cardiovascular disease events.

Methods and Results:

A multidisciplinary team developed and tested the computable phenotype to identify adults ≥18 years of age with a history of atherosclerotic cardiovascular disease without safety concerns around using aspirin and meeting trial eligibility criteria. Using the computable phenotype, investigators identified over 650 000 potentially eligible patients from the 40 participating sites from Patient-Centered Outcomes Research Network—a network of Clinical Data Research Networks, Patient-Powered Research Networks, and Health Plan Research Networks. Leveraging diverse recruitment methods, sites enrolled 15 076 participants from April 2016 to June 2019. During the process of developing and implementing the ADAPTABLE computable phenotype, several key lessons were learned. The accuracy and utility of a computable phenotype are dependent on the quality of the source data, which can be variable even with a common data model. Local validation and modification were required based on site factors, such as recruitment strategies, data quality, and local coding patterns. Sustained collaboration among a diverse team of researchers is needed during computable phenotype development and implementation.


The ADAPTABLE computable phenotype served as an efficient method to recruit patients in a multisite pragmatic clinical trial. This process of development and implementation will be informative for future large-scale, pragmatic clinical trials.


URL:; Unique identifier: NCT02697916.

Cardiovascular clinical trials are often plagued by escalating costs, challenging regulations, and declining site-level participation.1,2 One source of high costs and complexity is the lack of efficient methods for the identification and enrollment of participants. The electronic health record (EHR) represents one tool to facilitate clinical trial recruitment, potentially at a lower cost.3 The EHR is particularly important for pragmatic clinical trials, which are studies performed within the context of routine care on diverse populations.4

The national Patient-Centered Outcomes Research Network (PCORnet), similar to other data networks,5–8 leverages a standardized common data model with a single, overarching governance process. PCORnet is a national network of 13 Clinical Data Research Networks, 2 Health Plan Research Networks, and 21 Patient-Powered Research Networks with the goal of enabling investigators to use electronic health data to support clinical research across broad, diverse populations.9 With the adoption of a standardized data model across multiple institutional participants, a single, well-written query developed by one institution can run successfully across all institutions to identify large cohorts of potentially eligible patients.

ADAPTABLE (Aspirin Dosing: a Patient-Centric Trial Assessing Benefits and Long-Term Effectiveness) is the flagship, pragmatic, Patient-Centered Outcomes Research Institute–funded clinical trial evaluating the effectiveness and safety of 81 versus 325 mg aspirin doses in 15 000 patients with a history of atherosclerotic cardiovascular disease (ASCVD).10 Leveraging a computable phenotype was central to the pragmatic nature of ADAPTABLE. A computable phenotype is the product of using an executable set of algorithms to identify specific measurable constructs present in patient records.11 A computable phenotype translates study eligibility criteria into a query, enabling the efficient identification of potentially eligible patients for research.9,11–14 The components of a computable phenotype include the presence or absence of diseases, procedures, patterns of medication use, laboratory results, and clinical events.15

In this methods paper, we describe the development, implementation, and lessons learned from the ADAPTABLE computable phenotype. The lessons were identified through discussions of the ADAPTABLE computable phenotype working group and other ADAPTABLE team members, including clinical investigators, informaticians, data scientists, and health services researchers. Data were also used from an internal ADAPTABLE survey of participating sites administered by the trial coordinating center in 2016.

Summary of ADAPTABLE

PCORnet operates as a distributed research network where each Clinical Data Research Network, Patient-Powered Research Network, or Health Plan Research Network retains control of its data but makes it available to others for querying. Partner institutions harmonize the structure of their data through the use of a common data model to enable this interoperability. The version of the PCORnet Common Data Model implemented at the time of computable phenotype development and initial execution (v3.1) included 15 data tables. The specifications of the PCORnet Common Data Model are available online.16

The Common Data Model contains a focused set of discrete data elements including demographics, diagnoses, procedures, laboratory data, vital signs, smoking status, medication orders, and dispensing data. With only 15 tables, the Common Data Model includes only a subset of all data available in the EHR or claims-based system.

Computable Phenotype Development

The major eligibility criteria for ADAPTABLE include a history of ASCVD (defined as ischemic heart disease), the presence of ≥1 enrichment factor that elevates the risk for cardiovascular events, and the absence of contraindications to aspirin use, such as prior severe gastrointestinal bleeding or aspirin allergy. After study protocol finalization in October 2015, the Duke Clinical Research Institute, serving as the trial coordinating center, started computable phenotype development and leveraged prior work by the Mid-South Clinical Data Research Network. The Mid-South Clinical Data Research Network had developed and validated a coronary heart disease computable phenotype with excellent performance: a positive predictive value of 98.5% and a sensitivity of 94.6%.17 The next step of development was the creation of technical specifications based on the ADAPTABLE protocol language.

As shown in Figure 1, concepts that appear in the inclusion and exclusion criteria were generally divided into 3 categories: (1) concepts directly obtained from Common Data Model fields (eg, EHR-derived smoking status, systolic blood pressure), (2) concepts requiring generated code lists from coded medical terminologies stored in Common Data Model fields (eg, medication usage), and (3) concepts not readily available from Common Data Model fields (eg, prior coronary angiography with ≥75% stenosis of at least 1 epicardial coronary vessel). Code lists for the second category were created using a number of different tools, most of which are publicly available,18–21 and were derived from encounter diagnosis and procedure codes. Codes from problem lists were not used due to data quality concerns. Codes from structured medical history are not mapped to a table in the Common Data Model.
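The three categories above can be illustrated with a minimal Python sketch. It is not the ADAPTABLE code: the `Patient` record, the placeholder code lists, and the `screen` function are all hypothetical, and enrichment factors are omitted for brevity. The sketch only shows how a directly mapped field, a code-list lookup, and a concept that is unavailable in the data model might each be handled.

```python
from dataclasses import dataclass, field

# Hypothetical placeholder code lists (category 2); NOT the ADAPTABLE lists.
ASCVD_CODES = {"DX-ASCVD-1", "DX-ASCVD-2", "PX-REVASC-1"}
CONTRAINDICATION_CODES = {"DX-GI-BLEED-1", "DX-ASPIRIN-ALLERGY-1"}

# Category 3 concepts are not in the data model, so the sketch only lists
# them for follow-up (for example, chart review or patient questions).
UNAVAILABLE_CONCEPTS = ["coronary angiography with >=75% stenosis"]


@dataclass
class Patient:
    """Hypothetical, simplified patient record."""
    patient_id: str
    age: int  # category 1: read directly from a data model field
    encounter_codes: set = field(default_factory=set)  # category 2 input


def screen(patient: Patient, min_age: int = 18) -> dict:
    """Return a screening result for one patient."""
    meets_age = patient.age >= min_age
    has_ascvd = bool(patient.encounter_codes & ASCVD_CODES)
    has_contraindication = bool(patient.encounter_codes & CONTRAINDICATION_CODES)
    return {
        "patient_id": patient.patient_id,
        "potentially_eligible": meets_age and has_ascvd and not has_contraindication,
        "confirm_manually": list(UNAVAILABLE_CONCEPTS),
    }


if __name__ == "__main__":
    cohort = [
        Patient("p1", 64, {"DX-ASCVD-1"}),
        Patient("p2", 70, {"DX-ASCVD-2", "DX-GI-BLEED-1"}),
    ]
    for p in cohort:
        print(screen(p))
```

Keeping the unavailable concepts as an explicit list, rather than silently dropping them, makes it visible which criteria still need confirmation outside the query.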


Figure 1. ADAPTABLE (Aspirin Dosing: a Patient-Centric Trial Assessing Benefits and Long-Term Effectiveness) inclusion and exclusion criteria, versions 1 and 2. The inclusion and exclusion for the trial at the time of initial computable phenotype development included data elements that were directly mapped to the common data model, identified via sets of code lists, or not available in the common data model. The new criteria added in version 2 are annotated with blue arrows. ASCVD indicates atherosclerotic cardiovascular disease; CABG, coronary artery bypass grafting; GI, gastrointestinal; LDL, low-density lipoprotein; and PCI, percutaneous coronary intervention.

This computable phenotype was expected to be a base phenotype that individual sites would customize locally. Sites had the option to run the customized computable phenotype on their PCORnet datamart and include the addition of local data not available in the Common Data Model. Most sites adapted the code to run exclusively on local systems instead of their PCORnet datamart. In winter 2015/2016, the computable phenotype was posted on GitHub22 and presented to participating ADAPTABLE sites on multiple conference calls. After this release, each site tested, revised, and validated the computable phenotype locally.

Computable Phenotype Updates

In October 2016, the Patient-Centered Outcomes Research Institute approved a revised protocol, which included a number of additions to the inclusion/exclusion criteria (Figure 1). Based on previously published work,17,23 additional chart review testing the proposed phenotype changes, and extensive steering committee discussions, the definition of ASCVD was expanded from requiring prior myocardial infarction, coronary revascularization, or severe coronary artery disease on coronary angiogram to include patients with a history of clinical ASCVD, defined as a history of chronic ischemic heart disease, coronary artery disease, or ASCVD and identified using International Classification of Diseases codes. This expansion was a response to sites that indicated that prior coronary angiography results were neither available in the PCORnet Common Data Model nor readily available as coded data in their EHR systems. Moreover, for the health system sites, these patients often had tests and procedures at other institutions that were not documented in their EHR systems. This change, along with a broadened list of enrichment factors, was incorporated into the computable phenotype and more than doubled the eligible patient population.

The coordinating center posted updates to the computable phenotype via GitHub throughout the duration of the trial. Sites making changes to their base code were encouraged to post their work to GitHub or share it with other sites and the coordinating center.

Computable Phenotype Implementation and Patient Enrollment

All sites leveraged the computable phenotype as the main part of their strategy to identify eligible participants. The first phenotype was run at 19 sites across 8 participating Clinical Data Research Networks. The computable phenotype identified 174 785 eligible patients. The second phenotype was run at 40 sites across 9 participating Clinical Data Research Networks and 1 Health Plan Research Network, and it identified 657 215 eligible patients. A total of 15 076 participants were enrolled from April 2016 to June 2019. Figure 2 illustrates the trial recruitment metrics, stratified by the Clinical Data Research Networks and Health Plan Research Network recruitment efforts.


Figure 2. ADAPTABLE (Aspirin Dosing: a Patient-Centric Trial Assessing Benefits and Long-Term Effectiveness) recruitment metrics. The computable phenotype identified a large pool of potentially eligible participants at both clinical data research networks and health plan research networks. CDRN indicates clinical data research network; and HPRN, health plan research network.

The recruitment strategies include mailed letters, secure messaging, phone calls, EHR-based trial recruitment tools, social media campaigns, and traditional in-clinic recruitment. The details of the recruitment strategies and results will be reported separately. The section below describes the 4 principal lessons learned from the implementation of the computable phenotype.

Lessons Learned

Choose Your Data Sources Based on Local Data Quality and Study Needs

Ideally, in a multisite trial in a distributed research data network like ADAPTABLE, a single computable phenotype could be run that would accurately identify eligible participants. However, during the creation of the computable phenotype, the development team recognized that completeness and quality of the data can greatly influence the performance of a computable phenotype,24 and they designed an approach to account for these issues. For example, using a standardized data model like the Common Data Model does not guarantee that all of the data are the same (or normalized in informatics terms). The use of a common data model ensures all participating sites will have a local version (or instance) of a database with the same set of tables having the same field names, all populated according to a set of rules (or specifications). On a quarterly basis, the PCORnet Coordinating Center facilitates data characterization at each site through a series of queries, which are reviewed for each datamart by the Coordinating Center and local team and include assessments for data completeness, data plausibility, and data model conformance.25 This ensures a baseline level of data quality needed to execute a distributed query like the ADAPTABLE computable phenotype. Despite following a common set of rules and undergoing systematic data curation processes, the content of the Common Data Model often differs among institutions. These differences arise from the makeup of the participating institutions (eg, academic facilities versus community-based care centers), clinical practice and coding styles, the interval over which electronic data are available, or the relative balance of inpatient and ambulatory data.

The underlying source data, which may come from either EHR systems or administrative/billing systems, can affect content, and the data quality can differ based on the EHR system in use. Many of the clinical criteria in the computable phenotype were extracted from all available codes in the diagnosis and procedure tables in the Common Data Model at each site; however, sites, based on guidance from the PCORnet Coordinating Center and their local source systems, decided whether to load codes from EHR or administrative/billing systems or both. The specific extraction (E), transformation (T), and loading (L) practices used to move data from the source system into the data model can also affect the data content. While the Common Data Model ensures a computable phenotype will run at all participating sites, the output of the computable phenotype, in terms of the true number of patients meeting the desired criteria at each site, needs to be interpreted with caution.

Local factors contribute to the general quality of the available data, which in turn will affect the accuracy of the computable phenotype. For example, some EHR systems facilitate mapping key structured data elements, such as medications and laboratory results to the standards used in the Common Data Model and by the computable phenotype. Sites with EHR systems more conducive to accurate mappings, in general, have better data quality and specifically may have more complete data to use in the execution of the computable phenotype. Some sites have more accurate death data through linkage to state or national vital records, whereas others rely exclusively on health system data. These differences led to variations in the ability to exclude deceased patients. Another factor is the frequency of updating the data used for the computable phenotype. Sites in PCORnet refresh their datamarts approximately every 3 months. If the computable phenotype is run directly on the Common Data Model datamart, new data between the time of refresh and the time of computable phenotype execution are not included. This concept, referred to as data latency, has implications for accurately identifying eligible patients for the trial.
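The effect of data latency can be made concrete with a short sketch. The function names and the example dates below are hypothetical; the only element taken from the text is the idea that events occurring between the last datamart refresh and the phenotype run are invisible to a query executed on the datamart.

```python
from datetime import date


def data_latency_days(last_refresh: date, run_date: date) -> int:
    """Days between the last datamart refresh and the phenotype run."""
    if run_date < last_refresh:
        raise ValueError("run_date must not precede last_refresh")
    return (run_date - last_refresh).days


def events_missed(event_dates, last_refresh: date, run_date: date):
    """Events recorded after the refresh but on or before the run date."""
    return [d for d in event_dates if last_refresh < d <= run_date]


if __name__ == "__main__":
    refresh = date(2016, 1, 1)   # hypothetical refresh date
    run = date(2016, 3, 1)       # hypothetical execution date
    events = [date(2015, 12, 20), date(2016, 2, 10)]
    print(data_latency_days(refresh, run))
    print(events_missed(events, refresh, run))
```

In this illustration, a new diagnosis or a death recorded during the gap would not affect the query result until the next refresh, which is one reason sites might prefer to run the phenotype on local source systems.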

Understanding these challenges with EHR data quality and common data models is essential during computable phenotype development. In ADAPTABLE, these challenges were addressed by creating a base algorithm that was then tested and customized locally at the sites by teams of clinical leaders, clinical informaticians, researchers, and data experts. In most cases, site teams elected to customize and validate the computable phenotype on local source systems, in part, due to concerns with data latency and need to access data not included in the Common Data Model. These additional data enhanced the identification of eligible participants and facilitated recruitment.

Recognize and Accept Translation Limitations

Ideally, all of the inclusion and exclusion criteria in ADAPTABLE could be precisely mapped to electronic data elements. Unfortunately, the translation from written criteria into valid and actionable code may not be feasible for multiple reasons. Categorizing a disease, which is inherently complex, into codes creates an opportunity for information loss. For example, a condition such as 3-vessel artery disease is not reliably identified in a single code or set of codes. Instead, valid and reliable identification would likely be based on several structured and unstructured data elements within the patient record. Moreover, some clinical constructs may require advanced computing techniques to fully extract (eg, the use of natural language processing to extract left ventricular ejection fraction from unstructured documents).

When identifying patients with a particular disease, the concept of time in EHR data is important and requires attention during computable phenotype development. The time associated with the first diagnosis code reflects the time of entry and not necessarily the time of onset of disease. Diagnosis codes often persist in a patient’s record well after the disease or condition has resolved, whereas procedure codes remain fixed to a single time point in the record. A patient may be admitted with a suspected diagnosis of acute myocardial infarction, which is appropriately entered as a diagnosis in the EHR at the time of admission. However, subsequent inpatient evaluation may have ruled out acute myocardial infarction and established an alternative diagnosis such as pericarditis. Yet, the diagnosis code for acute myocardial infarction may persist in the medical record for an indefinite period of time.

Another challenge can arise from using procedure codes. For example, the PCORnet datamarts predominantly include data after 2010. The first version of the ADAPTABLE computable phenotype used procedural codes—International Classification of Diseases and Current Procedural Terminology—to identify patients with a history of coronary artery bypass grafting surgery or history of percutaneous intervention. However, if a patient had a coronary artery bypass grafting or percutaneous intervention in 2005, those procedural codes would not necessarily be available in the PCORnet datamarts. A clinician seeing that patient after 2010 would likely only use clinical codes for chronic ischemic heart disease; thus, querying only for procedure codes would miss these patients. The second version of the computable phenotype improved identification of these patients via expanding the definition to include history of chronic ischemic heart disease, coronary artery disease, or ASCVD.
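The difference between the two phenotype versions for such a patient can be sketched as follows. This is an illustration only: the codes are placeholders rather than real International Classification of Diseases or Current Procedural Terminology codes, and the hard cutoff year is a simplifying assumption standing in for datamarts that "predominantly" hold data after 2010.

```python
# Simplifying assumption: nothing before this year is visible in the datamart.
FIRST_VISIBLE_YEAR = 2010

# Hypothetical placeholder codes, not the ADAPTABLE code lists.
PROCEDURE_CODES = {"PX-CABG", "PX-PCI"}
CHRONIC_DX_CODES = {"DX-CHRONIC-IHD", "DX-CAD", "DX-ASCVD"}


def visible_codes(history):
    """Codes from (code, year) pairs that fall inside the datamart window."""
    return {code for code, year in history if year >= FIRST_VISIBLE_YEAR}


def matches_v1(history) -> bool:
    """Version 1 style: procedure codes only."""
    return bool(visible_codes(history) & PROCEDURE_CODES)


def matches_v2(history) -> bool:
    """Version 2 style: procedure codes or chronic disease diagnosis codes."""
    return bool(visible_codes(history) & (PROCEDURE_CODES | CHRONIC_DX_CODES))


if __name__ == "__main__":
    # Bypass surgery in 2005, followed years later for chronic disease.
    patient = [("PX-CABG", 2005), ("DX-CHRONIC-IHD", 2014)]
    print(matches_v1(patient), matches_v2(patient))
```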

For these reasons (and many more), determining and defining incident and prevalent disease is particularly problematic. The ADAPTABLE computable phenotype accounted for these challenges by using previously tested definitions of diseases, including broad criteria to identify patients with ASCVD, allowing for local validation and customization, and building in confirmation of key inclusion and exclusion criteria during the enrollment process.

Validate and Tailor Locally

In a multisite pragmatic clinical trial using a computable phenotype, the study leadership has to provide guidance on validation. In ADAPTABLE, the coordinating center asked each site to develop its own validation strategy for the computable phenotype. Nearly all participating ADAPTABLE sites validated the computable phenotype. Validation practices varied in terms of the process, data availability, and number of charts reviewed. The Health Plan Research Network validation, for example, focused on optimizing positive predictive value of the computable phenotype because the network only has access to claims data and could not routinely review patient charts. Their implementation of the computable phenotype limited the sample to patients with codes for acute myocardial infarction or revascularization to minimize risks of misclassification. Validation of the Health Plan Research Network computable phenotype through chart review of 185 patients demonstrated a positive predictive value of 90.8%.26

Health system sites developed and executed their own validation plan based on local factors, including previous experience identifying patients with coronary artery disease for trials, site recruitment strategies, and resource constraints. Based on local validation efforts, certain sites further optimized the computable phenotype by including additional criteria on their computable phenotype and leveraging local data beyond that available in the Common Data Model. For example, some sites increased the minimum age requirements to reduce the likelihood of identifying patients with congenital heart disease. Other sites sought to improve the accuracy of clinical ASCVD identification by including additional requirements, such as at least 2 instances of ASCVD diagnostic codes, a history of an aspirin prescription, or documentation of an ASCVD International Classification of Diseases code by a cardiologist. Due to the heterogeneity in validation methods across sites and the decentralized nature of the Common Data Model, we are unable to describe computable phenotype performance across all study sites.
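The kinds of local tailoring described above can be expressed as parameters on a base rule. The sketch below is hypothetical: the function, its parameter names, the placeholder codes, and the example thresholds are illustrative assumptions, not settings reported by any ADAPTABLE site.

```python
# Hypothetical placeholder diagnosis codes, not the ADAPTABLE code lists.
ASCVD_DX_CODES = {"DX-ASCVD-1", "DX-ASCVD-2"}


def tailored_match(age, dx_code_history, min_age=18, min_code_instances=1):
    """Apply a base rule with site-adjustable thresholds.

    dx_code_history is a list with one entry per coded encounter, so a
    repeated code counts as a separate instance.
    """
    instances = sum(1 for code in dx_code_history if code in ASCVD_DX_CODES)
    return age >= min_age and instances >= min_code_instances


if __name__ == "__main__":
    history = ["DX-ASCVD-1", "DX-OTHER"]
    # Base rule: one qualifying code is enough.
    print(tailored_match(45, history))
    # A stricter local variant: at least 2 instances of a qualifying code.
    print(tailored_match(45, history, min_code_instances=2))
    # Another local variant: a higher (illustrative) minimum age.
    print(tailored_match(25, history, min_age=30))
```

Exposing the thresholds as arguments keeps the shared base rule intact while letting each site record exactly how its variant differs.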

Providing flexibility for the validation of the computable phenotype locally has benefits. First, the coordinating center does not have direct access to the data. Second, local investigators and data scientists understand their data best and are ideal for designing a validation process that aligns well within their own system. Third, the goal of the phenotype may vary among sites. For example, a site that plans to send out mass mailings to all participants may adapt the algorithm to optimize positive predictive values. In contrast, a site planning for only in-clinic recruitment with chart review by a research assistant may elect to optimize sensitivity. Thus, it would be challenging for the coordinating study team to provide significant oversight over the validation process. However, not having guidelines for the validation process will lead to variation in the extent and quality of validation processes across sites, which may introduce bias into the study sample selection. The extent of the guidelines for validation of the computable phenotype should depend on its complexity, resources dedicated to validation, and ongoing discussions between local investigators and the coordinating study team.
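The tradeoff between optimizing positive predictive value and optimizing sensitivity can be illustrated with the standard definitions applied to chart-review counts. All counts below are invented for illustration and do not come from ADAPTABLE; the two "variants" are hypothetical stand-ins for a stricter and a broader local algorithm.

```python
def positive_predictive_value(true_pos: int, false_pos: int) -> float:
    """Share of flagged patients who are truly eligible."""
    flagged = true_pos + false_pos
    if flagged == 0:
        raise ValueError("no flagged patients")
    return true_pos / flagged


def sensitivity(true_pos: int, false_neg: int) -> float:
    """Share of truly eligible patients who are flagged."""
    eligible = true_pos + false_neg
    if eligible == 0:
        raise ValueError("no eligible patients")
    return true_pos / eligible


if __name__ == "__main__":
    # Hypothetical chart-review counts for two local variants.
    variants = {
        "strict (e.g., mass mailing)": {"tp": 90, "fp": 10, "fn": 60},
        "broad (e.g., in-clinic review)": {"tp": 140, "fp": 60, "fn": 10},
    }
    for name, c in variants.items():
        ppv = positive_predictive_value(c["tp"], c["fp"])
        sens = sensitivity(c["tp"], c["fn"])
        print(f"{name}: PPV={ppv:.2f}, sensitivity={sens:.2f}")
```

Note that estimating sensitivity requires reviewing patients the algorithm did not flag, which is one reason a site without routine chart access might report positive predictive value only.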

Questions as a Safety Net

A computable phenotype will never capture all criteria of interest, and it will suffer from tradeoffs between sensitivity and specificity. To protect the safety of participants and the integrity of the trial, it is essential to recognize the limitations of the study’s computable phenotype within and across sites and understand whether additional screening is needed using mechanisms such as chart review or asking self-reported questions to potential participants either in person, over the phone, or via a patient portal. If the limitations of the phenotype are identified, they can be used to inform the types of questions patients should be asked before enrollment. To this end, during the participant enrollment process using an online portal in ADAPTABLE, each participant is asked a series of questions related to key inclusion and exclusion criteria. For the participating Health Plan Research Network, these questions were essential because their data were exclusively claims based and lacked information about exclusion factors such as medication allergies.

Within ADAPTABLE, potentially eligible patients were given the opportunity to confirm their eligibility by answering questions in a patient portal (Table). During the enrollment period from April 2016 to June 2019, 32 087 patients entered the portal and 18 497 completed the eligibility questions (the remaining patients dropped out of the portal before answering questions). As shown in the Table, the computable phenotype did not identify 896 unique patients with self-reported exclusions to the trial, representing a 4.8% failure rate. Failure to identify current use of a contraindicated medication accounted for over half of these cases (543/896). An additional 1342 patients were deemed ineligible because they self-reported an inability to change their aspirin dose. Beyond matching into the computable phenotype and answering the portal screening questions, additional eligibility confirmation occurred locally, although these practices varied by site and recruitment strategy.

Table. Eligibility Questions Answered by Potential Participants in Online Portal During Enrollment

Portal Question | Affirmative Patient Response*
1. Are you allergic to aspirin? | 158 (0.85%)
2. Have you had a severe bleeding problem in the past with aspirin? | 282 (1.5%)
3. Are you currently taking any of these anticoagulant medications?† | 543 (2.9%)
Total, unique noneligible‡ | 896 (4.8%)

*Denominator=18 497 patients entering the portal and responding to eligibility questions.

†Brilinta (ticagrelor), Coumadin (warfarin), Eliquis (apixaban), Pradaxa (dabigatran), Savaysa (edoxaban), and Xarelto (rivaroxaban).

‡Potential participants may be represented in >1 category.
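The percentages in the Table can be reproduced from the reported counts. The short script below uses only numbers stated in the text and the Table; the variable names are arbitrary, and the "excess" figure is simply the sum of the three category counts minus the unique total, which reflects patients counted in more than one category.

```python
RESPONDENTS = 18_497        # answered the eligibility questions
ENTERED_PORTAL = 32_087     # entered the portal
UNIQUE_INELIGIBLE = 896     # unique patients with a self-reported exclusion

AFFIRMATIVE = {
    "aspirin allergy": 158,
    "prior severe bleeding with aspirin": 282,
    "current listed medication use": 543,
}


def pct(numerator: int, denominator: int) -> float:
    """Percentage of denominator represented by numerator."""
    return 100 * numerator / denominator


if __name__ == "__main__":
    for label, count in AFFIRMATIVE.items():
        print(f"{label}: {pct(count, RESPONDENTS):.2f}%")
    print(f"unique ineligible: {pct(UNIQUE_INELIGIBLE, RESPONDENTS):.1f}%")
    print(f"medication share of exclusions: {pct(543, UNIQUE_INELIGIBLE):.1f}%")
    print(f"completed questions: {pct(RESPONDENTS, ENTERED_PORTAL):.1f}%")
    print(f"excess over unique total: {sum(AFFIRMATIVE.values()) - UNIQUE_INELIGIBLE}")
```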


The wide-scale adoption of EHRs and the development of national, electronic data research networks with standardized data models have created the opportunity to revitalize the clinical enterprise infrastructure and optimize the research process. Many challenges experienced by ADAPTABLE investigators mirror those described by other groups using EHR and claims data to develop computable phenotypes and conduct observational epidemiological studies and clinical trials over the last 30 years.12,13,27,28

The ADAPTABLE computable phenotype successfully identified over 650 000 potentially eligible participants for the trial and helped the trial achieve its recruitment goal of 15 000 patients. Although computable phenotypes can enhance the speed and efficiency of recruitment efforts, there are many challenges with their implementation within national data research networks. Through the experience with ADAPTABLE, several key recommendations were identified that can inform the future use of computable phenotypes (Figure 3).


Figure 3. Key recommendations from ADAPTABLE (Aspirin Dosing: a Patient-Centric Trial Assessing Benefits and Long-Term Effectiveness) computable phenotype implementation. EHR indicates electronic health record.

Historically, EHR data have not been collected to support clinical research but instead are collected for clinical care and billing. As such, it is important for researchers to maintain reasonable expectations about the quality of EHR data and its utility for secondary research.29–37 Before implementation of a computable phenotype, sites may require additional data quality exercises to improve the quality of clinical constructs identified by the computable phenotype. Beyond data quality issues, important clinical elements are not always easily extractable from the EHR. Instead, some data are stored in text fields, which are difficult, if not impossible, to parse consistently. Computable phenotypes may benefit from using a combination of EHR and non-EHR data sources, such as state or national death data or disease registries. Administrative claims data have not historically been utilized to identify eligible research participants. While these systems typically lack clinical granularity, they do capture qualifying clinical events across health systems.

As learned in ADAPTABLE, a computable phenotype is not a static program. Instead, it should be prepared to respond to changes. Throughout the course of the recruitment period, the phenotype should respond to changes in source data or alterations to the study protocol. If the quality of source data changes, the phenotype should be prepared to respond, in kind. In addition, validation efforts along with recruitment efforts should inform changes to subsequent computable phenotype iterations. For example, if through validation efforts, a diagnosis code contributes significantly to false positive rates then removing the code might be necessary. Alternatively, if the computable phenotype fails to identify sufficient patients for recruitment, trial leadership should consider altering the phenotype to broaden the eligible pool.
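One way such validation feedback might be operationalized is to compute a positive predictive value per qualifying code from chart-review results and flag codes that fall below a chosen threshold. The sketch below is hypothetical: the review data, the placeholder codes, and the 0.8 threshold are assumptions for illustration, not values used in ADAPTABLE.

```python
from collections import defaultdict


def per_code_ppv(reviews):
    """Compute PPV per code.

    reviews is an iterable of (qualifying_code, confirmed) pairs, where
    confirmed is True if chart review confirmed the condition.
    """
    reviewed = defaultdict(int)
    confirmed = defaultdict(int)
    for code, is_confirmed in reviews:
        reviewed[code] += 1
        if is_confirmed:
            confirmed[code] += 1
    return {code: confirmed[code] / reviewed[code] for code in reviewed}


def codes_to_reconsider(reviews, min_ppv=0.8):
    """Codes whose PPV is below the (assumed) acceptable threshold."""
    return sorted(c for c, ppv in per_code_ppv(reviews).items() if ppv < min_ppv)


if __name__ == "__main__":
    # Hypothetical chart-review results for two placeholder codes.
    reviews = (
        [("DX-A", True)] * 9 + [("DX-A", False)]
        + [("DX-B", True)] * 2 + [("DX-B", False)] * 3
    )
    print(per_code_ppv(reviews))
    print(codes_to_reconsider(reviews))
```

A flagged code is a candidate for discussion rather than automatic removal, since dropping it may also reduce the number of truly eligible patients identified.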

Implementing a high-quality computable phenotype requires collaboration between the study coordinating center, the data teams, and study sites. Within each study site, adaptation and implementation of the computable phenotype require close partnership between clinical leaders and data/informatics experts. Their work must also be guided by their respective compliance policies. Each team member brings different expertise, ranging from understanding clinical workflows, designing appropriate data and information technology architecture, and addressing data quality issues to the technical skills needed to write, test, and customize computable phenotypes. Supporting a diverse group requires adequate resources for personnel and meetings.

A better understanding of electronic data quality along with developing methodologies to account for data quality issues all are important areas for future development and research. Another key area of research is the development of methodology to combine structured and unstructured data elements into a portable computable phenotype across multiple sites.38–40 Developing the infrastructure to integrate health system and claims data across large, diverse populations and the methods to develop a single computable phenotype that runs across both types of data will help advance the capacity to conduct large, efficient pragmatic clinical trials. Lastly, improvement in EHR usability and workflow to better capture data, either at the point-of-care by clinicians or with natural language processing pipelines, could improve clinical care and increase the efficiency of pragmatic clinical trials. This type of work involves a diverse team, including human factors experts.

This report has important limitations. The ADAPTABLE trial is an ongoing PCORnet demonstration study, and these lessons reflect the initial experience of sites participating in this trial. Similarly, the computable phenotype used in ADAPTABLE was designed specifically for the PCORnet Common Data Model and may not be easily extensible to other data models or analytical programs other than SAS. Due to the heterogeneity in validation efforts along with the decentralized nature of PCORnet, we are unable to report computable phenotype performance characteristics. However, validation from the Health Plan Research Network and prior work identifying patients with coronary heart disease demonstrated a positive predictive value of over 90%.17,26 Finally, the results presented herein are descriptive reflections of a diverse team of researchers working on the ADAPTABLE computable phenotype. While the results are a product of a diverse authorship group (in terms of site location and position within the study), they are descriptive and serve to provide broad recommendations. Future research on the ADAPTABLE computable phenotype will focus on quantitative analysis to provide more specific recommendations.


To our knowledge, this is the first report on the process of developing an EHR-based computable phenotype and the lessons learned during its implementation in a large, national US cardiovascular clinical trial. The lessons learned from deploying and implementing the computable phenotype may be generalizable to other clinical studies. In conclusion, the most important lesson learned is that the challenges of computable phenotype implementation must be met with a degree of flexibility, or adaptability, that enables changes based on the data, resources, sites, and patients.


We thank the ADAPTABLE study (Aspirin Dosing: a Patient-Centric Trial Assessing Benefits and Long-Term Effectiveness) leadership team, including Drs Richard Platt, Adrian Hernandez, Robert Harrington, and Russell Rothman. In addition, we thank the Patient-Centered Outcomes Research Network (PCORnet) coordinating center and the PCORnet Distributed Research Network Operations. The base code for the ADAPTABLE computable phenotype is available on GitHub (see reference 22).


*Dr Ahmad and I.M. Ricket contributed equally to this work as first authors.

This manuscript was sent to Karin H. Humphries, DSc, Guest Editor, for review by expert referees, editorial decision, and final disposition.

Faraz S. Ahmad, MD, MS, Northwestern University Feinberg School of Medicine, 676 N St Clair St, Suite 600, Chicago, IL 60611.


  • 1. Eapen ZJ, Vavalle JP, Granger CB, Harrington RA, Peterson ED, Califf RM. Rescuing clinical trials in the United States and beyond: a call for action. Am Heart J. 2013; 165:837–847. doi: 10.1016/j.ahj.2013.02.003
  • 2. Jones WS, Roe MT, Antman EM, Pletcher MJ, Harrington RA, Rothman RL, Oetgen WJ, Rao SV, Krucoff MW, Curtis LH, Hernandez AF, Masoudi FA. The changing landscape of randomized clinical trials in cardiovascular disease. J Am Coll Cardiol. 2016; 68:1898–1907. doi: 10.1016/j.jacc.2016.07.781
  • 3. Mc Cord KA, Ewald H, Ladanie A, Briel M, Speich B, Bucher HC, Hemkens LG; RCD for RCTs Initiative and the Making Randomized Trials More Affordable Group. Current use and costs of electronic health records for clinical trial research: a descriptive study. CMAJ Open. 2019; 7:E23–E32. doi: 10.9778/cmajo.20180096
  • 4. Weinfurt KP, Hernandez AF, Coronado GD, DeBar LL, Dember LM, Green BB, Heagerty PJ, Huang SS, James KT, Jarvik JG, Larson EB, Mor V, Platt R, Rosenthal GE, Septimus EJ, Simon GE, Staman KL, Sugarman J, Vazquez M, Zatzick D, Curtis LH. Pragmatic clinical trials embedded in healthcare systems: generalizable lessons from the NIH Collaboratory. BMC Med Res Methodol. 2017; 17:144. doi: 10.1186/s12874-017-0420-7
  • 5. Hripcsak G, Duke JD, Shah NH, Reich CG, Huser V, Schuemie MJ, Suchard MA, Park RW, Wong IC, Rijnbeek PR, van der Lei J, Pratt N, Norén GN, Li YC, Stang PE, Madigan D, Ryan PB. Observational Health Data Sciences and Informatics (OHDSI): opportunities for observational researchers. Stud Health Technol Inform. 2015; 216:574–578.
  • 6. Psaty BM, Breckenridge AM. Mini-Sentinel and regulatory science–big data rendered fit and functional. N Engl J Med. 2014; 370:2165–2167. doi: 10.1056/NEJMp1401664
  • 7. Forrow S, Campion DM, Herrinton LJ, Nair VP, Robb MA, Wilson M, Platt R. The organizational structure and governing principles of the Food and Drug Administration’s Mini-Sentinel pilot program. Pharmacoepidemiol Drug Saf. 2012; 21(suppl 1):12–17. doi: 10.1002/pds.2242
  • 8. Platt R, Carnahan RM, Brown JS, Chrischilles E, Curtis LH, Hennessy S, Nelson JC, Racoosin JA, Robb M, Schneeweiss S, Toh S, Weiner MG. The U.S. Food and Drug Administration’s Mini-Sentinel program: status and direction. Pharmacoepidemiol Drug Saf. 2012; 21(suppl 1):1–8. doi: 10.1002/pds.2343
  • 9. Richesson RL, Smerek MM, Blake Cameron C. A framework to support the sharing and reuse of computable phenotype definitions across health care delivery and clinical research applications. EGEMS (Wash DC). 2016; 4:1232. doi: 10.13063/2327-9214.1232
  • 10. Hernandez AF, Fleurence RL, Rothman RL. The ADAPTABLE Trial and PCORnet: shining light on a new research paradigm. Ann Intern Med. 2015; 163:635–636. doi: 10.7326/M15-1460
  • 11. Richesson RL, Hammond WE, Nahm M, Wixted D, Simon GE, Robinson JG, Bauck AE, Cifelli D, Smerek MM, Dickerson J, Laws RL, Madigan RA, Rusincovitch SA, Kluchar C, Califf RM. Electronic health records based phenotyping in next-generation clinical trials: a perspective from the NIH Health Care Systems Collaboratory. J Am Med Inform Assoc. 2013; 20:e226–e231. doi: 10.1136/amiajnl-2013-001926
  • 12. Richesson RL, Sun J, Pathak J, Kho AN, Denny JC. Clinical phenotyping in selected national networks: demonstrating the need for high-throughput, portable, and computational methods. Artif Intell Med. 2016; 71:57–61. doi: 10.1016/j.artmed.2016.05.005
  • 13. Pathak J, Kho AN, Denny JC. Electronic health records-driven phenotyping: challenges, recent advances, and perspectives. J Am Med Inform Assoc. 2013; 20:e206–e211. doi: 10.1136/amiajnl-2013-002428
  • 14. Boxwala AA, Peleg M, Tu S, Ogunyemi O, Zeng QT, Wang D, Patel VL, Greenes RA, Shortliffe EH. GLIF3: a representation format for sharable computer-interpretable clinical practice guidelines. J Biomed Inform. 2004; 37:147–161. doi: 10.1016/j.jbi.2004.04.002
  • 15. Richesson RL, Smerek M, Rusincovitch S, Zozus MN, Chaudhuri PS, Hammond WE, Califf RM, Simon G, Green Beverly, Kahn M, Laws R. Electronic Health Records-Based Phenotyping. In: Rethinking Clinical Trials: A Living Textbook of Pragmatic Clinical Trials. Bethesda, MD: NIH Healthcare Systems Research Collaboratory; 2014.
  • 16. PCORnet. Common Data Model (CDM) Specification, Version 5.1. Accessed April 2, 2020.
  • 17. Roumie CL, Patel NJ, Muñoz D, Bachmann J, Stahl A, Case R, Leak C, Rothman R, Kripalani S. Design and outcomes of the Patient Centered Outcomes Research Institute coronary heart disease cohort study. Contemp Clin Trials Commun. 2018; 10:42–49. doi: 10.1016/j.conctc.2018.03.001
  • 18. RxNav. Accessed April 2, 2020.
  • 19. National Drug Code Directory. Accessed April 2, 2020.
  • 20. 2016 ICD-10-PCS and GEMs. Accessed April 2, 2020.
  • 21. Healthcare Cost and Utilization Project Clinical Classification Software (CCS) for ICD-9-CM. Accessed April 2, 2020.
  • 22. ADAPTABLETRIAL/PHENOTYPE. Accessed April 2, 2020.
  • 23. Roumie CR, Shirley-Rice J, Kripalani S. MidSouth CDRN. Coronary Heart Disease Algorithm. Vanderbilt University. PheKB; 2014. Accessed April 2, 2020.
  • 24. Kahn MG, Raebel MA, Glanz JM, Riedlinger K, Steiner JF. A pragmatic framework for single-site and multisite data quality assessment in electronic health record-based clinical research. Med Care. 2012; 50(suppl):S21–S29. doi: 10.1097/MLR.0b013e318257dd67
  • 25. Qualls LG, Phillips TA, Hammill BG, Topping J, Louzao DM, Brown JS, Curtis LH, Marsolo K. Evaluating foundational data quality in the national Patient-Centered Clinical Research Network (PCORnet®). EGEMS (Wash DC). 2018; 6:3. doi: 10.5334/egems.199
  • 26. Fishman E, Barron J, Dinh J, Jones WS, Marshall A, Merkh R, Robertson H, Haynes K. Validation of a claims-based algorithm identifying eligible study subjects in the ADAPTABLE pragmatic clinical trial. Contemp Clin Trials Commun. 2018; 12:154–160. doi: 10.1016/j.conctc.2018.11.001
  • 27. Weng C, Tu SW, Sim I, Richesson R. Formal representation of eligibility criteria: a literature review. J Biomed Inform. 2010; 43:451–467. doi: 10.1016/j.jbi.2009.12.004
  • 28. Richesson RL, Green BB, Laws R, Puro J, Kahn MG, Bauck A, Smerek M, Van Eaton EG, Zozus M, Hammond WE, Stephens KA, Simon GE. Pragmatic (trial) informatics: a perspective from the NIH Health Care Systems Research Collaboratory. J Am Med Inform Assoc. 2017; 24:996–1001. doi: 10.1093/jamia/ocx016
  • 29. Köpcke F, Trinczek B, Majeed RW, Schreiweis B, Wenk J, Leusch T, Ganslandt T, Ohmann C, Bergh B, Röhrig R, Dugas M, Prokosch HU. Evaluation of data completeness in the electronic health record for the purpose of patient recruitment into clinical trials: a retrospective analysis of element presence. BMC Med Inform Decis Mak. 2013; 13:37. doi: 10.1186/1472-6947-13-37
  • 30. Weiner JP, Fowles JB, Chan KS. New paradigms for measuring clinical performance using electronic health records. Int J Qual Health Care. 2012; 24:200–205. doi: 10.1093/intqhc/mzs011
  • 31. Weiskopf NG, Weng C. Methods and dimensions of electronic health record data quality assessment: enabling reuse for clinical research. J Am Med Inform Assoc. 2013; 20:144–151. doi: 10.1136/amiajnl-2011-000681
  • 32. Weiskopf NG, Hripcsak G, Swaminathan S, Weng C. Defining and measuring completeness of electronic health records for secondary use. J Biomed Inform. 2013; 46:830–836. doi: 10.1016/j.jbi.2013.06.010
  • 33. Weiner MG, Embi PJ. Toward reuse of clinical data for research and quality improvement: the end of the beginning? Ann Intern Med. 2009; 151:359–360. doi: 10.7326/0003-4819-151-5-200909010-00141
  • 34. Dugas M, Lange M, Müller-Tidow C, Kirchhof P, Prokosch HU. Routine data from hospital information systems can support patient recruitment for clinical studies. Clin Trials. 2010; 7:183–189. doi: 10.1177/1740774510363013
  • 35. Hersh WR, Weiner MG, Embi PJ, Logan JR, Payne PR, Bernstam EV, Lehmann HP, Hripcsak G, Hartzog TH, Cimino JJ, Saltz JH. Caveats for the use of operational electronic health record data in comparative effectiveness research. Med Care. 2013; 51(8 suppl 3):S30–S37. doi: 10.1097/MLR.0b013e31829b1dbd
  • 36. Ahmad FS, Chan C, Rosenman MB, Post WS, Fort DG, Greenland P, Liu KJ, Kho AN, Allen NB. Validity of cardiovascular data from electronic sources: the Multi-Ethnic Study of Atherosclerosis and HealthLNK. Circulation. 2017; 136:1207–1216. doi: 10.1161/CIRCULATIONAHA.117.027436
  • 37. Capurro D, Yetisgen M, van Eaton E, Black R, Tarczy-Hornoch P. Availability of structured and unstructured clinical data for comparative effectiveness research and quality improvement: a multisite assessment. EGEMS (Wash DC). 2014; 2:1079. doi: 10.13063/2327-9214.1079
  • 38. Sharma H, Mao C, Zhang Y, Vatani H, Yao L, Zhong Y, Rasmussen L, Jiang G, Pathak J, Luo Y. Developing a portable natural language processing based phenotyping system. BMC Med Inform Decis Mak. 2019; 19(suppl 3):78. doi: 10.1186/s12911-019-0786-z
  • 39. Kang T, Zhang S, Tang Y, Hruby GW, Rusanov A, Elhadad N, Weng C. EliIE: an open-source information extraction system for clinical trial eligibility criteria. J Am Med Inform Assoc. 2017; 24:1062–1071. doi: 10.1093/jamia/ocx019
  • 40. Hripcsak G, Albers DJ. Next-generation phenotyping of electronic health records. J Am Med Inform Assoc. 2013; 20:117–121. doi: 10.1136/amiajnl-2012-001145