Genome-Wide Polygenic Score, Clinical Risk Factors, and Long-Term Trajectories of Coronary Artery Disease
Abstract
Objective:
To determine the relationship of a genome-wide polygenic score for coronary artery disease (GPSCAD) with lifetime trajectories of CAD risk, directly compare its predictive capacity to traditional risk factors, and assess its interplay with the Pooled Cohort Equations (PCE) clinical risk estimator.
Approach and Results:
We studied GPSCAD in 28 556 middle-aged participants of the Malmö Diet and Cancer Study, of whom 4122 (14.4%) developed CAD over a median follow-up of 21.3 years. A pronounced gradient in lifetime risk of CAD was observed—16% for those in the lowest GPSCAD decile to 48% in the highest. We evaluated the discriminative capacity of the GPSCAD—as assessed by change in the C-statistic from a baseline model including age and sex—among 5685 individuals with PCE risk estimates available. The increment for the GPSCAD (+0.045, P<0.001) was higher than for any of 11 traditional risk factors (range +0.007 to +0.032). Minimal correlation was observed between GPSCAD and 10-year risk defined by the PCE (r=0.03), and addition of GPSCAD improved the C-statistic of the PCE model by 0.026. A significant gradient in lifetime risk was observed for the GPSCAD, even among individuals within a given PCE clinical risk stratum. We replicated key findings—noting strikingly consistent results—in 325 003 participants of the UK Biobank.
Conclusions:
GPSCAD—a risk estimator available from birth—stratifies individuals into varying trajectories of clinical risk for CAD. Implementation of GPSCAD may enable identification of high-risk individuals early in life, decades in advance of manifest risk factors or disease.
Highlights
Among 28 556 individuals with median follow-up of 21.3 years, the genome-wide polygenic score for coronary artery disease is a powerful predictor of lifetime risk for coronary artery disease.
The genome-wide polygenic score for coronary artery disease adds more predictive information—as assessed by change in C-statistic from a baseline model including age and sex—than any of 11 traditional risk factors.
The genome-wide polygenic score for coronary artery disease provides additional discrimination and predictive information on top of clinical risk assessed by the Pooled Cohort Equations.
Genome-wide polygenic scores (GPSs) are a new quantitative predictor of inherited risk, integrating information from many common sites of DNA variation.1 For coronary artery disease (CAD), we recently validated and tested a genetic predictor (GPSCAD) comprising 6.6 million variants.2 Within cross-sectional studies, GPSCAD identifies individuals in the extreme tail of the risk distribution who have several-fold increased risk, and others who enjoy substantial inborn protection.
A genetic predictor has potential advantages over the current approach to cardiovascular risk prediction, which is predicated on laboratory and clinical factors typically assessed once individuals reach middle age. GPSCAD can be ascertained early in life, potentially giving individuals a head start of several decades in adopting lifestyle or pharmacological changes that attenuate the development of atherosclerosis. By contrast, the current clinical model used in the United States—based on the Pooled Cohort Equations (PCE)—does not allow for risk estimation before 40 years of age.3 Strikingly, a recent analysis suggested that a simple risk estimator using variables available at birth—age, sex, and a polygenic score—provided equivalent risk discrimination for predicting incident CAD events as the PCE estimator.4
Here, we address 3 key areas of uncertainty. First, we determine the relationship between the GPSCAD and lifetime trajectory of risk for CAD in a middle-aged cohort with prolonged follow-up. Second, we directly compare the predictive capacity of the GPSCAD to other traditional risk factors. Third, we determine the relationship of the GPSCAD to a commonly used risk predictor—the PCE—and assess the interplay of PCE-based and genetic risk in impacting risk of CAD.
To explore these questions, we analyze data from 28 556 participants of the Malmö Diet and Cancer Study (free of CAD at baseline and with median follow-up of 21 years) and replicate key results in 325 003 UK Biobank participants. The added value of this study comes from the extensive follow-up of the Malmö Diet and Cancer Study, which allows evaluation of the lifelong effects of the GPSCAD on CAD, a question that has not yet been fully explored.
Materials and Methods
Because of the sensitive nature of the data collected for the Malmö Diet and Cancer Study, requests to access the data set from qualified researchers trained in human subject confidentiality protocols may be sent to Lund University at [email protected]. All UK Biobank data are made available to qualified researchers, with applications reviewed centrally. Additional information is available at https://www.ukbiobank.ac.uk/.
Study Populations
The Malmö Diet and Cancer Study is a prospective population-based cohort that enrolled 30 447 participants between 1991 and 1996 ranging in age from 44 to 73 years. Baseline information on lifestyle and clinical factors was collected using an extensive questionnaire as previously described.5 Assessment of family history of CAD was based on self-report of a first-degree relative who suffered a myocardial infarction. Among these 30 447 participants, 28 556 (94%) who had genetic data available and were free of CAD at time of enrollment were analyzed in the present study. A subset of 5685 participants was previously randomly selected to comprise the Malmö Diet and Cancer Cardiovascular Cohort, and these participants underwent assessment of cholesterol concentrations on fasting blood samples. The ethics committee at Lund University approved study protocols, and informed consent was obtained from all participants.
The UK Biobank is a prospective cohort study that recruited >500 000 participants from the United Kingdom between 2006 and 2010. Age at baseline examination ranged between 40 and 69. Individuals completed extensive questionnaires about sociodemographic, lifestyle, and health-related factors and completed a range of physical measures. Disease outcomes were ascertained based on linkage to Hospital Episode Statistics and Office of Population Censuses and Surveys Classification of Surgical Operations codes and the mortality register from the Office of National Statistics.6 Here, we analyzed 325 003 participants who were free from CAD at baseline and had not been included in the validation dataset previously used to derive GPSCAD. Written informed consent was obtained from study participants, and this research was conducted using the UK Biobank Resource under Application Number 7089.
Assessment of Incident Coronary Artery Disease Events
Within the Malmö Diet and Cancer Study, CAD was defined as fatal or nonfatal myocardial infarction, coronary artery bypass graft surgery, percutaneous coronary intervention, or death due to CAD. The Swedish Hospital Discharge Register, the Swedish Cause of Death Register and the Swedish Coronary Angiography and Angioplasty Registry were used to identify CAD cases.7,8 Myocardial infarction was defined based on either International Classification of Diseases, Ninth Revision (ICD-9) code 410 or Tenth Revision (ICD-10) code I21. Information about coronary artery bypass surgery was obtained from the national Swedish classification systems of surgical procedures and defined as procedure codes 3065, 3066, 3068, 3080, 3092, 3105, 3127, or 3158 (the Op6 system) or procedure code FN (the KKÅ97 system). Percutaneous coronary intervention status was obtained from the Swedish Coronary Angiography and Angioplasty Registry. Death due to CAD was defined as ICD-9 codes 412 and 414 or ICD-10 codes I22, I23, and I25. Incident event adjudication was available through December 31, 2016.
Within the UK Biobank, incident CAD was similarly defined based on hospitalization with or death due to ICD-10 codes for acute or subsequent myocardial infarction (I21, I22, I23, I24.1, and I25.2); or hospitalization with ICD-9 codes for myocardial infarction (410, 411, and 412); or hospitalization with OPCS-4 (Office of Population Censuses and Surveys) codes for coronary artery bypass grafting (K40, K41, and K45) or coronary angioplasty with or without stenting (K49, K50.2, and K75).
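The UK Biobank code lists above can be expressed as a small lookup. The following Python sketch is illustrative only and is not the authors' pipeline: the code lists are copied from the text, while the record layout `(iso_date, system, code)`, the function names, and the choice to match codes by prefix after stripping dots are assumptions made here. It also ignores the distinction the text draws between hospitalization and death records.

```python
# Code lists copied from the UK Biobank definition in the text.
# Prefix matching after removing dots is an assumption of this sketch.
UKB_CAD_CODES = {
    "icd10": ("I21", "I22", "I23", "I24.1", "I25.2"),
    "icd9": ("410", "411", "412"),
    "opcs4": ("K40", "K41", "K45", "K49", "K50.2", "K75"),
}


def _normalize(code: str) -> str:
    """Upper-case a code and drop dots and spaces, e.g. 'i25.2' -> 'I252'."""
    return code.replace(".", "").replace(" ", "").upper()


def is_cad_code(system: str, code: str) -> bool:
    """Return True if `code` falls under any listed CAD code for `system`."""
    prefixes = UKB_CAD_CODES.get(system.lower(), ())
    normalized = _normalize(code)
    return any(normalized.startswith(_normalize(p)) for p in prefixes)


def first_cad_event(records):
    """Return the earliest CAD record, or None if there is none.

    `records` is an iterable of (iso_date, system, code) tuples; this
    layout is hypothetical. ISO dates sort correctly as strings.
    """
    hits = [r for r in records if is_cad_code(r[1], r[2])]
    return min(hits, key=lambda r: r[0]) if hits else None
```

Prefix matching treats a listed three-character code as covering all of its subcodes, whereas the more specific entries (I24.1, I25.2, K50.2) match only themselves and their extensions.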
Assessment of GPSCAD
Genotyping of Malmö Diet and Cancer Study participants was performed using the Illumina GSA v1 genotyping array. Of 29 304 samples that underwent genotyping and were free from CAD at baseline, 28 556 (97%) were retained after quality control procedures that removed low-quality samples (discordance between reported and genetically inferred sex, low call rate [<90%], and sample duplicates). With respect to genetic variants, quality control was performed with removal of those not in Hardy-Weinberg equilibrium (P<1×10−15). Imputation was then performed using the Haplotype Reference Consortium reference panel.9 The previously reported GPSCAD was computed using 6 234 207 of the 6 630 150 (94%) variants with high-quality imputation results available, as defined by information score (INFO) > 0.3. For each participant, the raw GPSCAD was generated by multiplying the genotype dosage for each risk-increasing allele by its respective weight and then summing across all variants in the score using PLINK2 software.10 To enable adjustment for genetic ancestry, principal components of ancestry were computed using the EIGENSOFT software package.11,12 The computed raw GPSCAD was ancestry-adjusted by taking the residual of a linear regression model that predicted GPSCAD using the first ten principal components as performed previously.13
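The scoring and ancestry-adjustment steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the PLINK2 or EIGENSOFT workflow the authors used: the function names are hypothetical, the ordinary least squares fit is done with a small hand-written solver, and the final Z-score step is included only because later analyses express the score in SD units.

```python
import statistics


def raw_score(dosages, weights):
    """Sum of risk-allele dosage times per-variant weight for one person."""
    return sum(d * w for d, w in zip(dosages, weights))


def _solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def ancestry_adjust(scores, pcs):
    """Residuals from regressing scores on an intercept plus the PCs.

    `pcs` holds one row of principal-component values per person
    (ten columns in the paper; any number works here).
    """
    design = [[1.0] + list(row) for row in pcs]
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(design, scores)) for i in range(k)]
    beta = _solve(xtx, xty)
    return [y - sum(b * x for b, x in zip(beta, r)) for r, y in zip(design, scores)]


def z_scores(values):
    """Center and scale values to mean 0 and (population) SD 1."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]
```

Because the regression includes an intercept, the adjusted scores have mean zero and are uncorrelated with each principal component in the sample used for the fit.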
Genotyping of UK Biobank participants was performed using the custom UK BiLEVE Axiom Array or the closely related UK Biobank Axiom Array. Of the 489 212 genotyped samples, 487 320 were retained after quality control procedures that removed samples with low genotype quality (outliers for heterozygosity or genotype missing rates, discordance between reported and genetically inferred sex, putative sex chromosome aneuploidy, or withdrawal of informed consent). Details of sample and variant quality control measures have been previously reported.6 Imputation was performed using IMPUTE4 and the Haplotype Reference Consortium,9 the UK10K,14 and the 1000 Genomes Phase 3,15 after performing sample and variant quality control (n=487 320). Of the 487 320 high-quality samples, our study was restricted to the 325 003 who were free from CAD at baseline and had not previously been used to validate our GPSCAD. The GPSCAD was generated from 6 630 150 variants with high imputation quality (INFO>0.3). The raw GPSCAD was generated by multiplying the genotype dosage of each CAD risk-increasing allele for each variant by its respective weight and then summing across all variants in the score using PLINK2 software.10 The raw GPSCAD was ancestry-adjusted by taking the residual of a linear regression model that predicted GPSCAD using the first ten principal components.
Statistical Analysis
All analyses in the Malmö Diet and Cancer Study and the UK Biobank were performed after exclusion of individuals with CAD at time of enrollment. Cox proportional hazard models adjusted for age and sex were used to assess time-to-event relationship between clinical risk factors and GPSCAD with incident CAD events. To assess the lifetime risk of CAD by GPSCAD, we performed Cox proportional hazard analyses adjusted for sex and using age as the underlying time scale. Lifetime cumulative absolute risks up to age 90 years within deciles of GPSCAD were derived from cumulative event curves from Cox models adjusted for sex and using age as the time scale.16 The distributions of Z-score transformed GPSCAD and baseline measures of clinical risk factors, including systolic blood pressure, blood apolipoprotein B and A1 levels, and body mass index, were compared. In addition, the cumulative lifetime CAD risk up to age 90 was assessed within percentiles of GPSCAD and baseline measures of clinical risk factors. Competing risk analysis accounting for non-CAD death was performed using the Fine and Gray method.17
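The lifetime-risk estimate described above can be written out under the standard Cox model form. The notation below is introduced here for illustration and is not taken from the paper: a is attained age, d indexes the GPSCAD decile (with the coefficient for the reference decile fixed at 0), s is a sex indicator with population mean \bar{s}, and a_min is the youngest age at enrollment. The authors' exact estimator may differ.

```latex
\begin{align}
h(a \mid d, s) &= h_0(a)\,\exp(\beta_d + \gamma s) \\
\widehat{\Pr}(\text{CAD by age } 90 \mid d)
  &= 1 - \exp\!\left\{-\hat{H}_0(90)\,\exp\!\left(\hat{\beta}_d + \hat{\gamma}\,\bar{s}\right)\right\},
  \qquad H_0(90) = \int_{a_{\min}}^{90} h_0(u)\,du
\end{align}
```

Because participants enter the risk set at their enrollment age, the baseline cumulative hazard accrues only from a_min onward, so the estimate is conditional on being free of CAD at that age.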
To assess and compare the discriminative capacity of the GPSCAD with clinical risk factors, we obtained Harrell C-statistics in the Malmö Diet and Cancer Cardiovascular Cohort using Cox proportional hazard analysis over a 10-year follow-up period. This analysis was limited to the Malmö Diet and Cancer Cardiovascular Cohort as total and HDL (high-density lipoprotein)-cholesterol were only measured there. The C-statistics of individual clinical risk factors or GPSCAD were assessed on top of a baseline model of age, sex, and principal components. Additionally, clinical risk factors were used to calculate the 10-year atherosclerotic cardiovascular disease risk using the PCE. The C-statistics were then obtained for the PCE and for a model including both PCE and GPSCAD. To obtain P values for differences in prediction models, we used partial likelihood ratio tests. The 10-year absolute risk and lifetime risk up to 90 years of age were obtained in PCE risk categories (low: <5%, borderline: ≥5% and <7.5%, intermediate: ≥7.5% and <20%, and high: ≥20%) and genetic risk as defined by quintiles of GPSCAD (low: bottom quintile, intermediate: middle 3 quintiles, and high: top quintile) as performed previously.18 To allow for direct comparison between PCE and GPSCAD, we additionally obtained risks in PCE categories defined by quintiles of risk (low: bottom quintile, intermediate: middle 3 quintiles, and high: top quintile). Interactions between GPSCAD and PCE categories were analyzed by introducing multiplicative terms in Cox proportional hazard models. Net reclassification improvement was calculated from risks predicted using Cox proportional hazard models over a 10-year follow-up period. The net reclassification improvement was assessed after addition of GPSCAD to PCE in 4-risk categories (<5%, ≥5% and <7.5%, ≥7.5% and <20%, and ≥20%). CIs were determined using bootstrapping. Key results were replicated in the UK Biobank study.
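For readers unfamiliar with the Harrell C-statistic used above, the following Python sketch shows how it is computed for right-censored data once follow-up is truncated at a fixed horizon. It is an illustration of the general definition rather than the authors' R code; the function names are hypothetical, the risk score stands in for the linear predictor of a fitted Cox model, and pairs with tied follow-up times are skipped as a simplification.

```python
def censor_at_horizon(times, events, horizon):
    """Truncate follow-up at `horizon` (e.g. 10 years).

    Events after the horizon are treated as censored at the horizon.
    """
    new_times = [min(t, horizon) for t in times]
    new_events = [bool(e) and t <= horizon for t, e in zip(times, events)]
    return new_times, new_events


def harrell_c(times, events, risk_scores):
    """Harrell's C-statistic for right-censored data.

    A pair is comparable when the person with the shorter follow-up had
    an event. It is concordant when that person also has the higher risk
    score; tied scores count as half. Tied times are skipped.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to no discrimination and 1.0 to perfect ranking, which is why the increments reported here (for example, from a baseline model to one that adds GPSCAD) are small in absolute terms. The double loop is quadratic in sample size, so this form suits illustration rather than cohorts of this scale.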
A 2-sided P<0.05 was considered statistically significant. All analyses were performed in R version 3.5.
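The categorical net reclassification improvement described in the preceding paragraph can be illustrated with a short Python sketch (the analyses themselves were run in R). The category boundaries are those stated in the text; the function names are hypothetical, and the sketch assumes that 10-year event status is fully observed for everyone, which ignores the censoring that the authors' Cox-based risks account for.

```python
import bisect

# Boundaries from the text: <5%, >=5% and <7.5%, >=7.5% and <20%, >=20%.
PCE_CUTPOINTS = (0.05, 0.075, 0.20)


def risk_category(risk, cutpoints=PCE_CUTPOINTS):
    """Map a predicted risk to 0 (low), 1 (borderline), 2 (intermediate), 3 (high)."""
    return bisect.bisect_right(cutpoints, risk)


def categorical_nri(old_risks, new_risks, events, cutpoints=PCE_CUTPOINTS):
    """Net reclassification improvement when moving from old to new risks.

    NRI = [P(up | event) - P(down | event)]
        + [P(down | no event) - P(up | no event)]
    """
    up_e = down_e = n_e = 0
    up_n = down_n = n_n = 0
    for old, new, event in zip(old_risks, new_risks, events):
        move = risk_category(new, cutpoints) - risk_category(old, cutpoints)
        if event:
            n_e += 1
            up_e += move > 0
            down_e += move < 0
        else:
            n_n += 1
            up_n += move > 0
            down_n += move < 0
    if n_e == 0 or n_n == 0:
        raise ValueError("need at least one event and one non-event")
    return (up_e - down_e) / n_e + (down_n - up_n) / n_n
```

Here `old_risks` would hold the PCE-only predictions and `new_risks` the predictions from the model that adds GPSCAD. A positive value means that, on balance, people with events moved to higher categories and people without events moved to lower ones.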
Results
To determine the relationship between GPSCAD and trajectories of risk for CAD, we analyzed 28 556 participants from the Malmö Diet and Cancer Study who were free of CAD at enrollment. Mean age was 57.9 years (interquartile range [IQR], 51.1–64.0) and 11 063 (38.7%) were male. The majority of the participants were of European ancestry, as confirmed by genetic principal component analysis. Over a median follow-up of 21.3 years (IQR, 16.1–23.1), an incident CAD event was noted in 4122 (14.4%) of the participants. As expected, individuals who went on to suffer a CAD event were enriched for traditional risk factors as compared with those who remained free of CAD (Table).
Characteristic | Incident CAD (n=4122) | No Incident CAD (n=24 434) | P Value* |
---|---|---|---|
Age, y | 60.7±7.0 | 57.4±7.6 | 9×10−215 |
Male sex, n (%) | 2455 (59.6) | 8608 (35.2) | 7×10−195 |
European ancestry, n (%) | 4093 (99.3) | 24193 (99.0) | 0.35 |
Current smoker, n (%) | 1283 (33.4) | 6329 (27.5) | 4×10−52 |
Family history of CAD, n (%) | 1702 (41.3) | 8174 (33.5) | 4×10−27 |
Body mass index, kg/m2 | 26.7±4.1 | 25.7±4.0 | 6×10−31 |
Systolic blood pressure, mm Hg | 148±20 | 140±20 | 2×10−64 |
Diastolic blood pressure, mm Hg | 88±10 | 85±10 | 1×10−32 |
Use of antihypertensives, n (%) | 1006 (24.4) | 3429 (14.0) | 6×10−44 |
Use of lipid-lowering, n (%) | 188 (4.6) | 424 (1.7) | 2×10−17 |
Diabetes mellitus, n (%) | 381 (9.2) | 855 (3.5) | 4×10−56 |
Apolipoprotein B, mg/dL | 116.8±26.2 | 105.5±25.8 | 6×10−85 |
Apolipoprotein A1, mg/dL | 149.0±26.8 | 158.5±28.2 | 1×10−39 |
We next stratified individuals into deciles of the GPSCAD, noting a striking gradient in risk of CAD. The lifetime risk ranged from 16.3% (95% CI, 15.3%–18.3%) for those in the lowest decile to 47.7% (95% CI, 44.8–50.5%) for those in the highest decile (Figure 1). In a survival model adjusted for age and sex, the hazard ratio for top versus bottom decile was 3.67 (95% CI, 3.17–4.26; P=3×10−66). Expressed as a continuous variable, hazard ratio per SD increment in GPSCAD was 1.45 (95% CI, 1.40–1.49; P=1×10−124). Competing risk analysis accounting for non-CAD related mortality provided similar results—hazard ratios of 3.66 (95% CI, 3.15–4.25; P=2×10−65) for top versus bottom decile and 1.45 (95% CI, 1.40–1.49; P=5×10−118) per SD.

Figure 1. Lifetime risk of coronary artery disease (CAD) in the Malmö Diet and Cancer Study according to decile of the genome-wide polygenic score (GPS). A, Cumulative incidence curves for incident CAD are displayed according to decile of the GPSCAD. Cumulative events were obtained from Cox proportional hazard models using age as the underlying time scale and standardized for sex using the population mean. B, Lifetime risk of CAD, defined as cumulative risk by 90 y of age, is plotted according to deciles of GPSCAD. Error bars reflect 95% CI.
This gradient in risk was observed despite modest differences in traditional risk factors across GPSCAD deciles (Table I in the Data Supplement)—for example, average systolic blood pressure was 142 mm Hg for those in the top decile versus 140 for those in the bottom decile, and average apolipoprotein B was 112 mg/dL for those in the top decile versus 102 for those in the bottom decile.
To place GPSCAD into the context of traditional risk factors, we next plotted the distribution of GPSCAD stratified by those who went on to develop CAD versus those who did not (Figure I in the Data Supplement). As expected for a complex disease, the distribution of GPSCAD was right-shifted for those who developed disease, but with significant overlap with those who remained free of disease. Average value for GPSCAD—expressed in units of a Z-score (SDs from the mean)—was 0.30 for those who had an event versus −0.05 for those who did not (P=1×10−124). A similar pattern was observed for traditional risk factors including systolic blood pressure, apolipoprotein B, apolipoprotein A1, and body mass index.
Risk gradient across the distribution of GPSCAD was more pronounced than for traditional risk factors. For example, the lifetime risk of developing CAD ranged from 13.9% to 54.2% across percentiles of the GPSCAD (Figure II in the Data Supplement). Analogous results for traditional risk factors included a gradient of 11.7% to 41.6% for systolic blood pressure, 16.1% to 49% for apolipoprotein B, 50.4% to 20.9% for apolipoprotein A1 (higher values for apolipoprotein A1 are associated with lower risk), and 22.8% to 45.7% for body mass index.
To allow direct comparison of GPSCAD with the PCE risk estimator, we focused this analysis in the subset of 5685 participants from the Malmö Diet and Cancer Study selected at random to be part of the Cardiovascular Cohort, in whom cholesterol concentrations had been measured. Among these individuals, 815 (14.3%) developed an incident CAD event over a median follow-up of 23.2 years (IQR, 17.6–24.2). Traditional risk factors were enriched in those who developed incident CAD, with a pattern similar to the overall cohort (Table II in the Data Supplement).
GPSCAD had a higher discriminative capacity—as assessed by the C-statistic—than any of the traditional risk factors. We first evaluated a baseline model which included age, sex, and the first 10 principal components of genetic ancestry, yielding a C-statistic of 0.714. Each of 11 traditional risk factors was then added (individually) to this baseline model, with the resulting C-statistic ranging from 0.720 to 0.746 (Figure 2). Representative examples include systolic blood pressure (0.746), LDL (low-density lipoprotein)-cholesterol (0.731), and family history (0.721). By contrast, the addition of GPSCAD led to a C-statistic of 0.759.

Figure 2. Discriminative capacity of the genome-wide polygenic score (GPS) and clinical risk factors in the Malmö Diet and Cancer Cardiovascular Cohort. The C-statistic was first obtained for a baseline model of age, sex, and the first 10 principal components (PCs) using a Cox proportional hazard model with follow-up time as the underlying time scale. Next, the C-statistic was calculated after additional inclusion of individual clinical risk factors, the genome-wide polygenic score (GPSCAD), the Pooled Cohort Equations (PCE) risk estimator, and both GPSCAD and PCE.
To further study the relationship of GPSCAD with incident CAD events, we next constructed a multivariable model including the GPSCAD, family history, and clinical risk factors. Within this model, the GPSCAD remained strongly associated with incident events (hazard ratio per SD increment 1.45 [95% CI, 1.34–1.56]; P=2×10−22). By comparison, hazard ratio per SD increment for systolic blood pressure was 1.29 and hazard ratio for family history was 1.37 (Table III in the Data Supplement).
The GPSCAD had minimal correlation with the PCE risk estimator and added significantly to risk discrimination. Consistent with previous reports, there was minimal correlation between GPSCAD and 10-year risk predicted by the PCE (correlation coefficient 0.03; P=0.08; Figure III in the Data Supplement). Additionally, the GPSCAD had weak correlation with clinical risk factors when studied individually. Absolute Pearson r coefficients ranged between 0.01 and 0.1 for body mass index, systolic blood pressure, apolipoprotein B, apolipoprotein A1, and for total-, HDL-, and LDL-cholesterol (Table IV in the Data Supplement).
With respect to discriminative capacity, the C-statistic of the PCE, which integrates several risk factors assessed in middle age, was 0.776 versus 0.759 for the GPSCAD. Combining the GPSCAD with the PCE yielded a C-statistic of 0.802, significantly higher than that of either risk estimator considered in isolation (P=1×10−12 and 5×10−15).
Within any given stratum of risk predicted by the PCE risk estimator, a significant gradient in risk was noted according to GPSCAD (Figure 3 and Table V in the Data Supplement). For example, among those at intermediate clinical risk according to the PCE (10-year predicted risk of 7.5%–20%), the observed 10-year risk of incident CAD was 3.3% for those in the lowest quintile of the GPSCAD versus 8.4% for those in the highest quintile. When extended to lifetime risk of CAD, observed risk was 18.1% for those in the lowest quintile versus 41.3% for those in the highest quintile. There was no significant interaction between the GPSCAD and PCE risk categories in prediction of either 10-year (Pinteraction=0.56) or lifetime (Pinteraction=0.93) risk. A similar pattern was observed when PCE was stratified into 3 risk categories defined by quintiles (low, bottom quintile; intermediate, middle 3 quintiles; and high, top quintile; Figure IV in the Data Supplement). Adding the GPSCAD to the PCE significantly improved the overall reclassification accuracy. The 10-year net reclassification improvement was 0.165 (95% CI, 0.076–0.182) in a 4-category risk assessment (<5%, 5%–7.5%, 7.5%–20%, and ≥20%; Table VI in the Data Supplement).

Figure 3. Ten-year and lifetime risk of coronary artery disease (CAD) according to Pooled Cohort Equations risk category and genome-wide polygenic score (GPS) in the Malmö Diet and Cancer Cardiovascular Cohort. Participants of the Malmö Diet and Cancer Cardiovascular Cohort were first stratified into low (<5%), borderline (≥5% and <7.5%), intermediate (≥7.5% and <20%), and high (≥20%) 10-year risk of atherosclerotic cardiovascular disease categories using the Pooled Cohort Equations clinical risk estimator. Next, individuals were stratified into low (bottom quintile), intermediate (quintiles 2–4), or high (top quintile) polygenic risk according to the GPSCAD. A, Ten-year risk of incident CAD is displayed, based on a Cox proportional hazard model using follow-up time as the time scale and standardized for age and sex using the population mean. B, Lifetime risk of CAD, defined as cumulative risk by 90 y of age, is displayed, based on a Cox proportional hazard model using age as the underlying time scale and standardized for sex using the population mean.
To replicate the key findings in an independent cohort, we next analyzed 325 003 participants of the UK Biobank who were free of CAD at time of enrollment. Mean age was 56.8 years (IQR, 50.3–63.5) and 143 538 (44.2%) were male. Over a median follow-up of 8.1 years (IQR, 7.4–8.8), an incident CAD event was observed in 7708 (2.4%). The majority of the participants had self-reported European (93.6%) ancestry. The study included a minority of self-reported African (1.8%), South Asian (2.1%), Chinese (0.3%), and other (2.2%) ancestries. Traditional risk factors were enriched among these 7708 participants as compared to those who remained free of CAD (Table VII in the Data Supplement). Mean Z-score of the GPSCAD was 0.39 for those who developed incident CAD versus −0.01 for those who did not (P=2×10−39), and hazard ratio per SD increment in the GPSCAD was 1.53 (95% CI, 1.49–1.56; P=5×10−303).
Within the UK Biobank cohort, the C-statistic of the baseline model of age, sex, and principal components of genetic ancestry was 0.730. As noted in the Malmö Diet and Cancer Study, addition of GPSCAD led to significant improvement in discriminative capacity, in this case to 0.756—again higher than any single traditional risk factor (Figure 4). Even more striking, the model including only variables available at birth—age, sex, and GPSCAD—outperformed the PCE risk estimator, C-statistics of 0.756 versus 0.748, respectively (P=3×10−110). Addition of the GPSCAD to the PCE risk estimator increased the C-statistic from 0.748 to 0.768 (P=3×10−243).

Figure 4. Discriminative capacity of the genome-wide polygenic score (GPS) and clinical risk factors in the UK Biobank. The C-statistic was first obtained for a baseline model of age, sex, and the first 10 principal components using a Cox proportional hazard model with follow-up time as the underlying time scale. Next, the C-statistic was calculated after additional inclusion of individual clinical risk factors, the Pooled Cohort Equations risk estimator, and the GPS.
As in the Malmö study, GPSCAD in the UK Biobank remained strongly associated with incident CAD in a multivariable model that included family history and clinical risk factors. Within this model, hazard ratio per SD increment for GPSCAD was 1.46 (95% CI, 1.42–1.49). By comparison, hazard ratio per SD increment for systolic blood pressure was 1.19, and hazard ratio for family history was 1.41 (Table VIII in the Data Supplement).
Among UK Biobank participants, we again noted minimal correlation between GPSCAD and 10-year risk predicted by the PCE (correlation coefficient, 0.007; P=0.0001), and a risk gradient within each of the PCE risk strata was noted according to GPSCAD (Figure 5 and Table IX in the Data Supplement). For example, among those at intermediate clinical risk according to the PCE (10-year predicted risk of 7.5%–20%), the observed 10-year risk of incident coronary artery disease was 2.8% for those in the lowest quintile of the GPSCAD versus 8.3% for those in the highest quintile. A similar pattern was observed when PCE was stratified into 3 risk categories defined by quintiles (low, bottom quintile; intermediate, middle 3 quintiles; and high, top quintile; Figure V in the Data Supplement). The net reclassification improvement was 0.085 (95% CI, 0.071–0.098) in a 4-category risk assessment (<5%, 5%–7.5%, 7.5%–20%, and ≥20%; Table X in the Data Supplement).

Figure 5. Ten-year risk of coronary artery disease (CAD) according to Pooled Cohort Equations risk category and genome-wide polygenic score (GPS) in the UK Biobank. Participants of the UK Biobank were first stratified into low (<5%), borderline (≥5% and <7.5%), intermediate (≥7.5% and <20%), and high (≥20%) 10-year risk of atherosclerotic cardiovascular disease categories using the Pooled Cohort Equations clinical risk estimator. Next individuals were stratified into low (bottom quintile), intermediate (quintiles 2–4), or high (top quintile) polygenic risk according to the GPSCAD. The 10-year risk of incident coronary artery disease is displayed based on a Cox proportional hazard model using follow-up time as the time scale and standardized for age and sex using the population mean.
Discussion
In the Malmö Diet and Cancer Study including 28 556 participants followed for a median of >21 years, we find that GPSCAD is a powerful predictor of lifetime risk for CAD. The GPSCAD has a greater discriminative capacity than any of several traditional risk factors assessed in middle age and refines risk estimation even within strata of risk predicted by PCE. These findings were replicated in 325 003 participants of the UK Biobank.
These results support 5 key conclusions. First, inborn DNA variation, as quantified by GPSCAD, can stratify the population into very different trajectories of lifetime risk of CAD. With respect to lifetime risk, 16% of those in the lowest decile of GPSCAD ultimately manifested CAD versus 48% of those in the highest decile. This result is largely in keeping with a recent analysis of the FinnGen cohort that similarly enabled analysis over long-term follow-up.19 The incremental discriminative capacity beyond age and sex was higher for GPSCAD than for any single traditional risk factor, including those, such as elevated blood pressure, that typically do not emerge until middle age.
Second, there was almost no correlation between GPSCAD and the PCE clinical risk estimator (Pearson correlation coefficients of 0.03 and 0.007 in the Malmö Diet and Cancer Study and the UK Biobank, respectively). This lack of correlation is largely explained by the heavy reliance of the PCE risk estimates on age and sex as their most important predictors, neither of which varies according to the GPSCAD. Beyond traditional risk factors, UK Biobank participants with a high GPSCAD had a modestly increased burden of additional risk-enhancing factors, including increased inflammation as assessed by C-reactive protein concentrations or rates of rheumatoid arthritis, and more frequent history of preeclampsia (Table XI in the Data Supplement). Moreover, genetic variants most heavily weighted in the GPSCAD quantify risk related to many pathways not currently measured in clinical practice, including transendothelial migration, cellular proliferation, and vascular tone.20
Third, these results confirm and extend recent analyses in middle-aged individuals that noted an increment in the C-statistic of about 0.02 when adding a polygenic score to the PCE risk estimate. Here, we observe an increment of 0.026 in the Malmö Diet and Cancer Study and 0.020 in the UK Biobank. An almost identical result, 0.021, was recently reported in the Multi-Ethnic Study of Atherosclerosis.21 By contrast, the authors noted no such improvement in the ARIC study (Atherosclerosis Risk in Communities).21 However, PCE-related analyses in the ARIC study are subject to significant bias, since this was one of the cohorts used in its derivation. To address this shortcoming, the authors chose to study only follow-up time not used previously as part of the PCE derivation, starting follow-up time analysis at the fourth visit rather than at enrollment, leading to the exclusion of 35% of participants and 73% of the incident CAD events. We hypothesized that this analytic choice may have underestimated the incremental value of the polygenic score. We re-analyzed ARIC including all participants at the baseline visit and we found that adding GPSCAD to the PCE increased the C-statistic by 0.013, from 0.726 to 0.739 (P=8.6×10−29; Table XII in the Data Supplement). As such, across 4 studies, the increment in the C-statistic seems to be consistently about 0.02 when adding a polygenic score to the PCE risk estimate.
Fourth, the value of GPSCAD may be even more pronounced if assessed early in life, providing a several-decade head start for disease prevention before clinical risk factors set in. This is particularly true for CAD, where early adulthood exposures play a key role in atherosclerotic progression. We previously noted that a high GPSCAD can be offset by either adherence to a healthy lifestyle or statin therapy.18,22,23 More recently, post hoc analyses of clinical trials of PCSK9 (proprotein convertase subtilisin/kexin type 9) inhibition extended results from earlier statin studies, confirming that individuals with a high polygenic score derive the greatest absolute and relative benefit from LDL-cholesterol-lowering therapies.24,25
Fifth, beyond CAD, future clinical practice may include a single, low-cost genetic assessment early in life that would enable polygenic score calculation for a range of important diseases.2 Representative examples and associated interventions might include wearable screening for those at high genetic risk of atrial fibrillation, and earlier cancer screenings tailored to inherited risk.
Additional research efforts will be important to build the evidence base necessary for GPSCAD to be integrated into routine clinical practice. One important next step is to integrate polygenic scores with additional genetic and nongenetic risk estimators to better inform clinical decision making. An early example is our recent demonstration that GPSCAD, which quantifies risk from the common variant background, powerfully modifies the risk of CAD conferred by monogenic familial hypercholesterolemia mutations.26 A second key research need is to prospectively demonstrate that disclosure of GPSCAD can motivate lifestyle changes, more efficient use of screening technologies such as coronary artery calcification, or medication initiation or adherence.
Limitations
First, the PCE clinical risk estimator was optimized to predict a composite of CAD and stroke outcomes rather than CAD alone, which may have led to modest decreases in performance. Second, the GPSCAD integrates information from common DNA variation, but rare variants—such as those related to familial hypercholesterolemia—and lifestyle factors are known to play an important role in certain individuals as well. Third, the majority of individuals studied here were of European ancestry—ongoing efforts to expand genetic research in more diverse populations will enable assessment and improvement of transethnic transferability.13 Fourth, it is likely that the utility of a GPS is even more pronounced if assessed before middle age—additional studies of young adults with long-term follow-up are an important next step.
Conclusions
We demonstrate that GPSCAD, an estimator available from birth, can stratify individuals into varying trajectories of clinical risk even late in life, improves risk discrimination more than other clinical risk scores, and allows for refined risk estimation within any given stratum of risk predicted by the current clinical estimator.
ARIC | Atherosclerosis Risk in Communities |
CAD | coronary artery disease |
GPS | genome-wide polygenic score |
GPSCAD | genome-wide polygenic score for coronary artery disease |
HDL | high-density lipoprotein |
ICD | International Classification of Diseases |
IQR | interquartile range |
LDL | low-density lipoprotein |
PCE | Pooled Cohort Equations |
Acknowledgments
We thank the participants of the Malmö Diet and Cancer Study and the UK Biobank.
Sources of Funding
G. Hindy was supported by the Swedish Research Council. O. Melander was supported by the European Research Council (Advanced Grant nr 885003), Knut and Alice Wallenberg Foundation, the Göran Gustafsson Foundation, the Swedish Heart and Lung Foundation, the Swedish Research Council, the Novo Nordisk Foundation, Region Skåne and Skåne University Hospital, and the Swedish Foundation for Strategic Research for IRC15-0067. M. Orho-Melander was supported by the European Research Council (Consolidator grant nr 649021), the Swedish Research Council, the Swedish Heart and Lung Foundation, Region Skåne, the Swedish Diabetes Foundation, the Novo Nordisk Foundation, the Albert Påhlsson Foundation, and the Swedish Foundation for Strategic Research for IRC15-0067. Genotyping of the Malmö Diet and Cancer Study was supported by the Regeneron Genetics Center. Dr Khera was supported by grants 1K08HG010155 and 5UM1HG008895 from the National Human Genome Research Institute, a Hassenfeld Scholar Award from Massachusetts General Hospital, a Merkin Institute Fellowship from the Broad Institute of MIT and Harvard, and a sponsored research agreement from IBM Research.
Disclosures
K. Ng is an employee of IBM Research. L.A. Lotta and A. Baras are employees of the Regeneron Genetics Center. S. Kathiresan is an employee of Verve Therapeutics and holds equity in Verve Therapeutics, Maze Therapeutics, Catabasis, and San Therapeutics. He is a member of the scientific advisory boards for Regeneron Genetics Center and Corvidia Therapeutics; he has served as a consultant for Acceleron, Eli Lilly, Novartis, Merck, Novo Nordisk, Novo Ventures, Ionis, Alnylam, Aegerion, Haug Partners, Noble Insights, Leerink Partners, Bayer Healthcare, Illumina, Color Genomics, MedGenome, Quest, and Medscape; he reports patents related to a method of identifying and treating a person having a predisposition to or afflicted with cardiometabolic disease (20180010185) and a genetics risk predictor (20190017119). Dr Khera has served as a consultant to Sanofi, Medicines Company, Maze Pharmaceuticals, Navitor Pharmaceuticals, Verve Therapeutics, Amgen, and Color Genomics; received speaking fees from Illumina and the Novartis Institute for Biomedical Research; received sponsored research agreements from the Novartis Institute for Biomedical Research and IBM Research; and reports a patent related to a genetic risk predictor (20190017119). The other authors report no conflicts.
References
1. Evans DM, Visscher PM, Wray NR. Harnessing the information contained within genome-wide association studies to improve individual prediction of complex disease risk. Hum Mol Genet. 2009;18:3525–3531. doi: 10.1093/hmg/ddp295
2. Khera AV, Chaffin M, Aragam KG, Haas ME, Roselli C, Choi SH, Natarajan P, Lander ES, Lubitz SA, Ellinor PT, et al. Genome-wide polygenic scores for common diseases identify individuals with risk equivalent to monogenic mutations. Nat Genet. 2018;50:1219–1224. doi: 10.1038/s41588-018-0183-z
3. Goff DC, Lloyd-Jones DM, Bennett G, Coady S, D’Agostino RB, Gibbons R, Greenland P, Lackland DT, Levy D, O’Donnell CJ, et al. 2013 ACC/AHA guideline on the assessment of cardiovascular risk: a report of the American College of Cardiology/American Heart Association Task Force on Practice Guidelines. J Am Coll Cardiol. 2014;63(25 Pt B):2935–2959. doi: 10.1016/j.jacc.2013.11.005
4. Elliott J, Bodinier B, Bond TA, Chadeau-Hyam M, Evangelou E, Moons KGM, Dehghan A, Muller DC, Elliott P, Tzoulaki I. Predictive accuracy of a polygenic risk score-enhanced prediction model vs a clinical risk score for coronary artery disease. JAMA. 2020;323:636–645. doi: 10.1001/jama.2019.22241
5. Berglund G, Elmstähl S, Janzon L, Larsson SA. The Malmo Diet and Cancer Study. Design and feasibility. J Intern Med. 1993;233:45–51. doi: 10.1111/j.1365-2796.1993.tb00647.x
6. Bycroft C, Freeman C, Petkova D, Band G, Elliott LT, Sharp K, Motyer A, Vukcevic D, Delaneau O, O’Connell J, et al. The UK Biobank resource with deep phenotyping and genomic data. Nature. 2018;562:203–209. doi: 10.1038/s41586-018-0579-z
7. Ludvigsson JF, Andersson E, Ekbom A, Feychting M, Kim JL, Reuterwall C, Heurgren M, Olausson PO. External review and validation of the Swedish national inpatient register. BMC Public Health. 2011;11:450. doi: 10.1186/1471-2458-11-450
8. James SK, Stenestrand U, Lindback J, Carlsson J, Schersten F, Nilsson T, Wallentin L, Lagerqvist B; SCAAR Study Group. Long-term safety and efficacy of drug-eluting versus bare-metal stents in Sweden. N Engl J Med. 2009;360:1933–1945. doi: 10.1056/NEJMoa0809902
9. McCarthy S, Das S, Kretzschmar W, Delaneau O, Wood AR, Teumer A, Kang HM, Fuchsberger C, Danecek P, Sharp K, et al; Haplotype Reference Consortium. A reference panel of 64,976 haplotypes for genotype imputation. Nat Genet. 2016;48:1279–1283. doi: 10.1038/ng.3643
10. Chang CC, Chow CC, Tellier LC, Vattikuti S, Purcell SM, Lee JJ. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience. 2015;4:7. doi: 10.1186/s13742-015-0047-8
11. Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D. Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet. 2006;38:904–909. doi: 10.1038/ng1847
12. Patterson N, Price AL, Reich D. Population structure and eigenanalysis. PLoS Genet. 2006;2:e190. doi: 10.1371/journal.pgen.0020190
13. Khera AV, Chaffin M, Zekavat SM, Collins RL, Roselli C, Natarajan P, Lichtman JH, D’Onofrio G, Mattera J, Dreyer R, et al. Whole-genome sequencing to characterize monogenic and polygenic contributions in patients hospitalized with early-onset myocardial infarction. Circulation. 2019;139:1593–1602. doi: 10.1161/CIRCULATIONAHA.118.035658
14. Walter K, Min JL, Huang J, et al. The UK10K project identifies rare variants in health and disease. Nature. 2015;526:82–90.
15. Auton A, Brooks LD, Durbin RM, Garrison EP, Kang HM, Korbel JO, Marchini JL, McCarthy S, McVean GA, Abecasis GR; 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature. 2015;526:68–74. doi: 10.1038/nature15393
16. Beiser A, D’Agostino RB, Seshadri S, Sullivan LM, Wolf PA. Computing estimates of incidence, including lifetime risk: Alzheimer’s disease in the Framingham Study. The practical incidence estimators (PIE) macro. Stat Med. 2000;19:1495–1522. doi: 10.1002/(sici)1097-0258(20000615/30)19:11/12<1495::aid-sim441>3.0.co;2-e
17. Fine JP, Gray RJ. A proportional hazards model for the subdistribution of a competing risk. J Am Stat Assoc. 1999;94:496–509.
18. Khera AV, Emdin CA, Drake I, Natarajan P, Bick AG, Cook NR, Chasman DI, Baber U, Mehran R, Rader DJ, et al. Genetic risk, adherence to a healthy lifestyle, and coronary disease. N Engl J Med. 2016;375:2349–2358. doi: 10.1056/NEJMoa1605086
19. Mars N, Koskela JT, Ripatti P, Kiiskinen TTJ, Havulinna AS, Lindbohm JV, Ahola-Olli A, Kurki M, Karjalainen J, Palta P, et al; FinnGen. Polygenic and clinical risk scores and their impact on age at onset and prediction of cardiometabolic diseases and common cancers. Nat Med. 2020;26:549–557. doi: 10.1038/s41591-020-0800-0
20. Khera AV, Kathiresan S. Genetics of coronary artery disease: discovery, biology and clinical translation. Nat Rev Genet. 2017;18:331–344. doi: 10.1038/nrg.2016.160
21. Mosley JD, Gupta DK, Tan J, Yao J, Wells QS, Shaffer CM, Kundu S, Robinson-Cohen C, Psaty BM, Rich SS, et al. Predictive accuracy of a polygenic risk score compared with a clinical risk score for incident coronary heart disease. JAMA. 2020;323:627–635. doi: 10.1001/jama.2019.21782
22. Mega JL, Stitziel NO, Smith JG, Chasman DI, Caulfield M, Devlin JJ, Nordio F, Hyde C, Cannon CP, Sacks F, et al. Genetic risk, coronary heart disease events, and the clinical benefit of statin therapy: an analysis of primary and secondary prevention trials. Lancet. 2015;385:2264–2271. doi: 10.1016/S0140-6736(14)61730-X
23. Natarajan P, Young R, Stitziel NO, Padmanabhan S, Baber U, Mehran R, Sartori S, Fuster V, Reilly DF, Butterworth A, et al. Polygenic risk score identifies subgroup with higher burden of atherosclerosis and greater relative benefit from statin therapy in the primary prevention setting. Circulation. 2017;135:2091–2101. doi: 10.1161/CIRCULATIONAHA.116.024436
24. Marston NA, Kamanu FK, Nordio F, Gurmu Y, Roselli C, Sever PS, Pedersen TR, Keech AC, Wang H, Lira Pineda A, et al. Predicting benefit from evolocumab therapy in patients with atherosclerotic disease using a genetic risk score: results from the FOURIER trial. Circulation. 2020;141:616–623. doi: 10.1161/CIRCULATIONAHA.119.043805
25. Damask A, Steg PG, Schwartz GG, Szarek M, Hagström E, Badimon L, Chapman MJ, Boileau C, Tsimikas S, Ginsberg HN, et al; Regeneron Genetics Center and the ODYSSEY OUTCOMES Investigators. Patients with high genome-wide polygenic risk scores for coronary artery disease may receive greater clinical benefit from alirocumab treatment in the ODYSSEY OUTCOMES trial. Circulation. 2020;141:624–636. doi: 10.1161/CIRCULATIONAHA.119.044434
26. Fahed AC, Wang M, Homburger JR, Patel AP, Bick AG, Neben CL, Lai C, Brockman D, Philippakis A, Ellinor PT, Cassa CA, Lebo M, Ng K, Lander ES, Zhou AY, Kathiresan S, Khera AV. Polygenic background modifies penetrance of monogenic variants for tier 1 genomic conditions. Nat Commun. 2020;11:3635. doi: 10.1038/s41467-020-17374-3