Using Polygenic Risk Scores for Prioritizing Individuals at Greatest Need of a Cardiovascular Disease Risk Assessment

Background: The aim of this study was to provide quantitative evidence of the use of polygenic risk scores for systematically identifying individuals for invitation for full formal cardiovascular disease (CVD) risk assessment.

Methods and Results: A total of 108 685 participants aged 40 to 69 years, with measured biomarkers, linked primary care records, and genetic data in UK Biobank were used for model derivation and population health modeling. Prioritization tools using age, polygenic risk scores for coronary artery disease and stroke, and conventional risk factors for CVD available within longitudinal primary care records were derived using sex-specific Cox models. We modeled the implications of initiating guideline-recommended statin therapy after prioritizing individuals for invitation to a formal CVD risk assessment. If primary care records were used to prioritize individuals for formal risk assessment using age- and sex-specific thresholds corresponding to 5% false-negative rates, then the numbers of men and women needed to be screened to prevent 1 CVD event are 149 and 280, respectively. In contrast, adding polygenic risk scores to both prioritization and formal assessments, and selecting thresholds to capture the same number of events, resulted in a number needed to screen of 116 for men and 180 for women.

Conclusions: Using both polygenic risk scores and primary care records to prioritize individuals at highest risk of a CVD event for a formal CVD risk assessment can prioritize those who most need interventions more efficiently than using primary care records alone. This could lead to better allocation of resources by reducing the number of risk assessments in primary care while still preventing the same number of CVD events.


Ethics Approval
This research has been conducted using the UK Biobank Resource under Application Number 26865. Data from the Clinical Practice Research Datalink (CPRD) were obtained under license from the UK Medicines and Healthcare Products Regulatory Agency (protocol 162RMn2).

Data Availability
All data files are available from the UK Biobank and CPRD databases.

UK Biobank Data Source
UKB (UK Biobank) is a prospective cohort study with detailed baseline information, genetic data, and linked primary care record data available for 177 359 individuals in England recruited between 2006 and 2010. 27 Genetic data were sequenced using a genome-wide array of ≈826 000 markers with imputation to ≈96 million markers. 27 Primary care data were provided by the Phoenix Partnership, Egton Medical Information Systems, and Vision GP system suppliers. 28 Data were linked with secondary care admissions from Hospital Episode Statistics and death records from the Office for National Statistics. For this study, primary care records were restricted to those measured between April 1, 2004 (the introduction of the Quality and Outcomes Framework), and the UKB baseline survey. To assess the impact of PRSs as a prioritization tool and compare with primary care records, our primary analyses were restricted to individuals with the complete genetic data necessary for calculating the PRS, at least 1 primary care record, and no prior CVD or statin initiation before UKB baseline. Individuals contributing to the PRS derivation were also excluded. Data from the UKB were used to derive CVD risk tools and to model the implications of prioritizing individuals for formal assessment (Figure S1). All individuals gave informed consent.

CPRD Data Source
The Clinical Practice Research Datalink (CPRD) is a large UK primary care database containing primary care records 28 with linked information from Hospital Episode Statistics and death records from the Office for National Statistics. The most recent 5-year primary care records available were extracted for 870 486 individuals who were still alive and without prior CVD on January 1, 2014, and had no statins throughout follow-up until May 31, 2019, the end of data availability (Figure S2). Data from CPRD were used to rescale estimated CVD risks in UKB participants to address the healthy cohort effect (Figure S1). All individuals gave informed consent.

CLINICAL PERSPECTIVE
What Is New?
• Data from a large prospective cohort study were used to assess the public health impact of using polygenic risk scores for systematically identifying individuals for invitation for full formal cardiovascular disease risk assessment.
• Our study directly compared the use of tools derived using primary care records or polygenic risk scores in the same population.
• Our study quantified a 20% to 35% reduction in the number needed to screen to prevent 1 CVD event by using polygenic risk scores at both the invitation and full formal CVD assessment stage.
What Are the Clinical Implications?
• These results provide quantitative evidence in support of current guidelines in England to systematically prioritize individuals using existing primary care records.
• Using polygenic risk scores may be beneficial in identifying high-risk individuals before a full formal risk assessment, which could lead to better allocation of resources by reducing the number of formal risk assessments in primary care while still preventing the same number of cardiovascular disease events.

Nonstandard Abbreviations and Acronyms
CPRD: Clinical Practice Research Datalink
NNS: number needed to screen
PRS: polygenic risk score
UKB: UK Biobank

Outcomes
CVD was defined as the first ever incident of fatal or nonfatal events of coronary heart disease (including angina and myocardial infarction), ischemic heart disease, and stroke (code lists provided in Table S1), appearing in the linked Hospital Episode Statistics and Office for National Statistics databases during follow-up.

Risk Factors
Two PRSs for coronary artery disease and stroke, constructed using a meta-score approach and external summary statistics from large genome-wide association studies, 20,29 were used as independent variables. Conventional risk factors (as in the QRISK2 score 4) were selected: age; sex; ethnicity; Townsend score; smoking status (current/ever smoker); history of diabetes (type 1 or type 2 or history of diabetes medication); family history of CVD; history of chronic kidney disease (stages 4 and 5); history of atrial fibrillation; history of blood pressure treatment; history of rheumatoid arthritis; total and high-density lipoprotein (HDL) cholesterol; systolic blood pressure (SBP); body mass index (BMI); and age interactions with Townsend score, history of diabetes, family history of CVD, history of atrial fibrillation, history of blood pressure treatment, SBP, and BMI.

Statistical Modeling
Sex-specific Cox models were used to derive 3 different prioritization tools for estimating 10-year CVD prioritization risk using primary care and genetic data from UKB. First, we derived a prioritization tool with linear predictors of baseline age, coronary artery disease PRS, 20 and stroke PRS. 29 Age interactions were considered but were not statistically significant at the 5% level. Second, we derived a prioritization tool with predictors using longitudinal primary care records. To handle missing values, the tool was derived in 2 stages: in the first stage, we used sex-specific multivariate mixed-effect regression models on longitudinal risk factor measurements for SBP, total and HDL cholesterol, and BMI to estimate current risk factor values (Data S1); in the second stage, we derived sex-specific Cox models with the estimated current risk factor values for SBP, total and HDL cholesterol, and BMI, and the most recent primary care measurements for the remaining QRISK2 risk factors. Third, we derived a prioritization tool with both PRSs and primary care records, using the 2-stage approach described above with the addition of linear predictors for the coronary artery disease PRS and stroke PRS in the second-stage Cox models. For each of these 3 tools, the model is used to identify individuals crossing a minimum 10-year risk threshold to be invited for a formal assessment.
Sex-specific Cox models were also used to derive 2 formal risk assessment models for predicting 10-year formal assessment CVD risk using risk factor measurements recorded at the UKB baseline survey. First, we rederived a model based on QRISK2 predictors; and second, we derived a model based on QRISK2 predictors enhanced with the coronary artery disease PRS and stroke PRS.
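As a rough illustration of how a fitted Cox model yields a 10-year risk that can be compared with an invitation threshold, the sketch below uses the standard relation risk = 1 - S0(10)^exp(linear predictor), with predictors centered at their means. The coefficients, centering values, and baseline survival are hypothetical placeholders chosen for illustration; they are not estimates from this study.

```python
import math

def ten_year_risk(x, beta, x_mean, s0_10):
    """10-year risk from a Cox model: 1 - S0(10) ** exp(beta . (x - x_mean))."""
    lp = sum(b * (xi - m) for b, xi, m in zip(beta, x, x_mean))
    return 1.0 - s0_10 ** math.exp(lp)

# Hypothetical values for illustration only (age, CAD PRS, stroke PRS).
BETA = [0.07, 0.30, 0.10]   # assumed log hazard ratios
X_MEAN = [56.0, 0.0, 0.0]   # assumed centering values
S0_10 = 0.95                # assumed baseline 10-year survival

def should_invite(x, threshold):
    """Invite for formal assessment when the prioritization risk meets the threshold."""
    return ten_year_risk(x, BETA, X_MEAN, S0_10) >= threshold
```

An individual at the centering values has a risk equal to 1 minus the baseline survival, and a higher PRS raises the estimated risk, which is the behavior a prioritization threshold relies on.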
All models were validated using 10-fold cross validation, and prognostic ability was quantified using Harrell's C-index to measure discrimination.
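For readers unfamiliar with the discrimination measure, a minimal pure-Python sketch of Harrell's C-index follows. It is an illustrative implementation, not the code used in the study (the analyses were run in R); it ignores pairs with tied follow-up times and counts ties in predicted risk as half-concordant.

```python
def harrell_c_index(times, events, risks):
    """Harrell's C: share of usable pairs in which the earlier event has the higher risk.

    times  : follow-up time per individual
    events : 1 if the event was observed, 0 if censored
    risks  : predicted risk (higher means an earlier expected event)
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is usable only if the earlier time is an observed event
        for j in range(n):
            if times[i] < times[j]:
                usable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # ties in predicted risk count as half
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable
```

A value of 0.5 corresponds to no discrimination and 1.0 to perfect ranking of individuals by time to event.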

Population Health Modeling
Population health modeling was conducted to compare the population health impact of (1) prioritizing using a primary care records-based tool followed by a formal assessment with conventional risk factors, (2) prioritizing using a PRS and age-based tool followed by a formal assessment with conventional risk factors and PRSs, and (3) prioritizing using both PRSs and primary care records, followed by a formal assessment with conventional risk factors and PRSs (Figure 1). As UKB consists of healthier individuals than the general primary care population in England, we rescaled each model's estimated CVD risks so that the distribution of estimated risks was consistent with risks calculated using age group- and sex-specific risk factor levels obtained from CPRD and the published QRISK2 score, to better reflect the CVD risk assessment program in the general population (Data S2; Table S2 and Table S3). Details of the rescaling method have been described elsewhere. 30,31 A hypothetical population of 100 000 individuals (50 000 men and 50 000 women) from the United Kingdom was created; the population age structure was obtained using data from the Office for National Statistics in 2015, 32 and the number of expected CVD events was calculated using age group- and sex-specific incidence rates from CPRD (Table S3; Figure S3). We assumed a policy of statin initiation for individuals with a predicted 10-year formal assessment CVD risk of ≥10%, as currently recommended by National Institute for Health and Care Excellence guidelines, and a 20% reduction in CVD risk with statin therapy. 33,34 The population health impact for each of the 3 prioritization tools was modeled using age- and sex-specific prioritization thresholds in 2 ways. First, we selected prioritization thresholds to limit the formal risk assessment false-negative rate to 5%. Second, we selected prioritization thresholds for the tools using PRSs such that the number of events identified would equal that obtained when prioritizing with primary care records (Table S4).
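The false-negative-rate thresholds can be sketched as follows for a single age group and sex stratum: rank the prioritization risks of individuals who go on to have an event, then take the largest threshold that leaves at most the target share of them below it. This is a minimal sketch of that description; the function names and the rule that individuals with risk at or above the threshold are prioritized are assumptions made for illustration.

```python
import math

def fnr_threshold(case_risks, fnr=0.05):
    """Largest threshold leaving at most `fnr` of future cases below it.

    case_risks: prioritization risks for individuals who later had an event.
    Individuals with risk >= threshold are treated as prioritized (assumed rule).
    """
    ranked = sorted(case_risks)
    k = math.floor(fnr * len(ranked))  # number of cases allowed to be missed
    return ranked[k]

def missed_fraction(case_risks, threshold):
    """Share of future cases falling below the threshold (not prioritized)."""
    return sum(r < threshold for r in case_risks) / len(case_risks)
```

In practice the function would be applied separately within each age group and sex, giving one threshold per stratum.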
Summary metrics were estimated for the number needed to screen (NNS) to prevent 1 CVD event, the number of CVD events identified, and the number needed to invite to prevent 1 CVD event. We assumed 50% statin compliance and a 50% invitation uptake of a formal assessment if inviting all individuals. 35,36 We further assumed an increased invitation uptake of 55% if individuals were prioritized for an invitation to a formal assessment. Bootstrap 95% CIs were calculated using 500 iterations.
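The text does not state which type of bootstrap interval was used. As one plausible reading, the sketch below assumes a percentile bootstrap with 500 resamples; the function name, the seeding, and the choice of the percentile method are illustrative assumptions rather than the study's procedure.

```python
import random

def bootstrap_ci(data, statistic, n_iter=500, alpha=0.05, seed=0):
    """Percentile bootstrap interval for `statistic` (assumed method)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    stats = sorted(
        statistic([data[rng.randrange(n)] for _ in range(n)])  # resample with replacement
        for _ in range(n_iter)
    )
    lower = stats[int((alpha / 2) * n_iter)]
    upper = stats[int((1 - alpha / 2) * n_iter) - 1]
    return lower, upper
```

Here `statistic` would be a function that recomputes the summary metric of interest, such as the NNS, on each resampled set of individuals.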
In sensitivity analyses, we repeated population health analyses for all individuals, including those without a primary care record for any 1 of SBP, HDL, total cholesterol, or BMI, where those without a record were all invited for formal assessment (Table S3). We also repeated analyses assuming a 5% formal risk assessment threshold, in addition to age- and sex-specific prioritization thresholds selected to correspond to 2.5% false-negative rates.
Analyses were conducted in R x64 3.6.1 (R Foundation for Statistical Computing, Vienna, Austria). This study follows the RECORD statement (Table S5). 37

Population Characteristics
For our primary analysis, we identified 108 685 individuals in UKB with genetic data and a primary care record for at least 1 of SBP, HDL, total cholesterol, and BMI (Figure S4). All individuals had complete information for the conventional risk factors necessary to calculate a 10-year formal CVD risk at baseline survey.
The mean age at baseline was 56.2 years (SD, 8.0) for men and 56.1 years (SD, 7.8) for women. During mean follow-up of 8.2 years, there were 1838 incident cardiovascular events (Table 1). Compared with the measurements observed at the UKB baseline survey, the measurements recorded in primary care records were lower for SBP and total cholesterol and, although similar for current/ever smoking status and history of diabetes, were less concordant for the remaining disease statuses. The oldest primary care record was on average 3.8 years before baseline.

Model Performance and Comparison
Hazard ratios in the prioritization tools and formal assessment models for the same predictors were similar (Tables S6 and S7). All models had good discriminatory performance, with higher performance in women (Table 2). The greatest performance was observed in the model using conventional risk factors and PRSs in men (C-index, 0.716; 95% CI, 0.702-0.730) and in women (C-index, 0.742; 95% CI, 0.722-0.762).
The estimated 10-year risks between the primary care records-only prioritization tool and the formal assessment model using conventional risk factors were highly correlated (correlation coefficients, 0.75 for men and 0.80 for women). In contrast, the estimated 10-year risks between the PRS+age prioritization tool and the formal assessment model using conventional risk factors and PRS were less highly correlated (correlation coefficients, 0.67 for men and women), and the estimated 10-year risks between the PRS and primary care records-based prioritization tool and the formal assessment model using conventional risk factors with PRS were more highly correlated (correlation coefficients, 0.82 for men and women; Table 3). Rescaled 10-year risk estimates between all models were similar (Figure S5).

Population Health Modeling
In our representative population of 100 000 individuals aged 40 to 69, 3573 men and 1808 women would experience a CVD event over the next 10 years. If conventional risk factors were used to formally assess the whole population, 2426 (67.9%) men and 801 (44.3%) women would be identified at high risk (Figure 2, Table S8). Assuming statin therapy would be initiated on high-risk individuals and no other preventive interventions implemented, the NNS to prevent 1 CVD event in men and women would be 103 (95% CI, 100-107) and 312 (95% CI, 288-334), respectively. If the primary care records-based prioritization tool was first used to prioritize formal assessment in the population, then 2335 (65.3%) men and 785 (43.4%) women would be identified at high risk (Figure 2, Table S8). The NNS to prevent 1 event would reduce to 149 (95% CI, 143-155) in men and 280 (95% CI, 259-301) in women (27.7% and 55.1% reduction, respectively).
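The arithmetic behind these metrics can be sketched as follows. The definitions used here are assumptions made for illustration: events prevented are taken as events identified at high risk multiplied by the assumed 20% risk reduction and by statin compliance, the NNS as individuals formally assessed per event prevented, and the number needed to invite as the NNS divided by invitation uptake.

```python
def events_prevented(events_identified, risk_reduction=0.20, compliance=0.50):
    """Events prevented among high-risk individuals who start and adhere to statins."""
    return events_identified * risk_reduction * compliance

def number_needed_to_screen(n_screened, events_identified, **kw):
    """Assumed definition: individuals formally assessed per event prevented."""
    return n_screened / events_prevented(events_identified, **kw)

def number_needed_to_invite(nns, uptake=0.50):
    """Assumed definition: invitations required per event prevented, given uptake."""
    return nns / uptake
```

Under these assumptions, formally assessing all 50 000 men with 2426 events identified gives an NNS of about 103 with full statin compliance and about 206 with 50% compliance; a fall from about 206 to 149 corresponds to the 27.7% reduction quoted for men.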
If the PRS+age prioritization tool was first used to prioritize formal assessment in the population, then 78.8% of men and 74.8% of women would be prioritized and, among them, 2356 (65.9%) men and 813 (45.0%) women with CVD events over the next 10 years would be classified at high risk (Figure 2, Table S9). The NNS to prevent 1 event would reduce to 167 (95% CI, 161-174) in men and 460 (95% CI, 423-491) in women (18.1% and 22.3% reduction, respectively). If the PRS and primary care records-based prioritization tool was first used to prioritize formal assessment in the population, then 2367 (66.3%) men and 825 (45.6%) women would be classified at high risk (Figure 2, Table S10). The NNS to prevent 1 event would reduce to 127 (95% CI, 122-132) in men and 255 (95% CI, 234-273) in women (37.7% and 56.9% reduction, respectively).
When prioritization thresholds were chosen such that all strategies identified the same number of events as prioritizing using primary care records (Table S4), prioritizing using PRS and age resulted in an NNS of 164 (95% CI, 157-170) in men and 446 (95% CI, 410-480) in women. Prioritizing using PRSs and primary care records resulted in an NNS of 116 (95% CI, 111-121) in men and 180 (95% CI, 166-193) in women (Table 4, Figure S6). Compared with using primary care records, prioritizing using PRSs and primary care records led to statistically significant differences in the NNS at the 5% level for all groups except women aged 40 to 49 years.

Sensitivity Analysis
In sensitivity analyses including all individuals (ie, including 15 324 individuals without a primary care record for any 1 of SBP, total cholesterol, HDL cholesterol, or BMI; Table S11), we found comparable results for the PRS-based prioritization tool and the primary care-based prioritization tool in men and women. As expected, the NNS increased when prioritizing with primary care records, especially among the youngest group (Tables S12 through S14, Figure S7).
In sensitivity analyses assuming a 5% formal risk assessment threshold, in addition to age- and sex-specific prioritization thresholds selected to correspond to 2.5% false-negative rates (Table S15), we found significant improvements in the number of events identified, as expected. Consequently, this reduces the NNS when formally assessing all individuals, using either conventional risk factors or conventional risk factors enhanced with PRSs (Tables S16 through S18). While prioritization can still reduce the overall NNS and the NNS among the youngest, the differences are smaller than when using a 10% formal risk assessment threshold.

DISCUSSION
This study has rigorously assessed the impact of using PRSs both alone and in combination with traditional risk factors for systematically prioritizing individuals for a formal CVD risk assessment. Comparing against current recommendations of using existing primary care records, we found that adding PRSs to both prioritization and formal assessment improves their correlation. This subsequently leads to higher efficiency and effectiveness, especially among younger individuals. Consequently, augmenting primary care records with PRSs reduces the NNS by ≈20% and 35% in men and women, respectively, relative to using primary care records alone while identifying the same number of events. In contrast, using only PRS and age in a prioritization tool leads to a larger NNS. These results support the addition of PRSs to primary care records to prioritize individuals at highest risk for a formal CVD risk assessment, which could lead to better allocation of resources by reducing the number of assessments.
This study provided a comparison of prioritization tools using longitudinal primary care records or PRSs within a population in England aged between 40 and 69 years who are currently invited for a National Health Service Health Check to assess their individual risk of CVD. We demonstrated the benefits of PRSs not only by measuring model discrimination but also by evaluating the health impact if implemented within this population. Compared with previous studies that generally focused on the role of PRSs in a formal CVD risk assessment model, 20,21,29,38 our study has uniquely assessed their role in a prioritization tool, in conjunction with a CVD risk model. We have also shown that if PRSs were widely available, their inclusion in a prioritization tool could improve its effectiveness, especially in younger individuals, by reducing the reliance on primary care records.
The benefit of prioritizing a subgroup of individuals at low absolute risk to increase efficiency echoes other studies, which have also shown that selecting a smaller proportion of younger, low-risk individuals can lead to dramatically reduced costs while resulting in more quality-adjusted life years gained. 39 Using a prioritization tool could also efficiently help reduce the concerning backlog in health checks caused by the COVID-19 pandemic, 40 where the number of people invited to health checks in England declined by 82% between the end of 2019 and 2020, 41-43 while still preventing nearly the same number of CVD events. 46-48

Strengths and Limitations
Our study has several strengths. To our knowledge, this study is the first to directly compare how using different data types for a prioritization tool can impact on the CVD risk assessment program in England. This was possible due to the unique data linkage of primary care records along with a baseline survey in UKB.
We derived the PRS-based prioritization tools using 2 current and well-documented PRSs that have been shown to improve model performance independent of traditional CVD risk factors. We also took advantage of the sporadically observed longitudinal primary care records when deriving the primary care records-based prioritization tools, by estimating current risk factor values using a multivariate mixed model. While QRISK2, which replaces missing values with age-, sex-, and ethnicity-specific population average values, could have been used as a prioritization tool, we chose to optimize the available data in primary care records to reduce possible overinflation of the information from PRSs. We would expect greater improvements if augmenting PRSs in CVD risk scores with fewer risk factors. Another strength of this study is the use of 10-fold cross validation to correct for overoptimism that may exist in our analyses, as we derived and conducted the population health modeling in the same individuals. Further, we used rescaling methods to adjust the 10-year risk estimates for all of the models to minimize the healthy selection bias when deriving models in UKB and to ensure that results were representative of the general population of England. Such rescaling methods could be adapted toward other countries.
However, several potential limitations exist. First, while we used primary care records that were no more than 6.5 years old before baseline, the mean risk factor levels between primary care records and at the UKB baseline differed within the same individuals, which could lead to a different distribution of 10-year risk estimates. This may also weaken the correlations between the prioritization tool and formal risk assessment models reported. Second, we determined the number of events identified in the population health modeling by calculating the model's sensitivity in the UKB and translating to a hypothetical population; due to the low number of events in the UKB, the sensitivity of each model may be limited in accuracy, especially in younger age groups with fewer events. In addition, any uncertainties are propagated within the population health modeling. Third, PRSs for cardiovascular disease are still under active development, and while we use 2 extensively studied and validated PRSs, there are likely more powerful PRSs soon to be available. 49 Fourth, the age range of the population health modeling was limited to between 40 and 69 years due to the use of UK Biobank. This restricts the population health modeling and, in particular, limits the ability to investigate the early prioritization capabilities of PRSs (which are fixed at conception). Fifth, we have focused on estimating the differences in a primary care population in England. Further work should generalize the findings to other countries and their respective health care systems. Sixth, we assumed a constant 20% reduction in risk due to statins, which is unlikely in practice, where reductions may be greater in those with a greater genetic risk. 50 Seventh, while data from CPRD are generally representative of the primary care attending population in the United Kingdom in terms of age, sex, and ethnicity, the CPRD data used do not have comprehensive coverage in the North and East of England. 28,51 Eighth, we did not model the combined effects of other preventative interventions, such as lifestyle advice. It is likely that the benefits of communicating polygenic risk may lead to beneficial lifestyle changes, which may impact health outcomes. 52 Finally, we acknowledge our use of International Classification of Diseases, Tenth Revision (ICD-10) codes to identify CVD outcomes may have missed some events, although this is unlikely to affect our between-model comparisons.

Age group- and sex-specific prioritization thresholds when prioritizing using primary care records were defined as the level such that the expected false-negative rate is controlled to be 5%. Age group- and sex-specific prioritization thresholds when prioritizing using the PRS or the PRS and primary care records tool were selected to result in the same number of events identified as when prioritizing using primary care records. NNI and NNS assume 100% statin compliance. NNI assumes a 50% invitation uptake if assessing without a prioritization tool, and a 55% invitation uptake if assessing with a prioritization tool. Bootstrap 95% CIs were calculated using 500 iterations. NNI indicates number needed to invite; NNS, number needed to screen; and PRS, polygenic risk score.

CONCLUSIONS
Population health guidelines in England recommend individuals at higher estimated risk of CVD be prioritized for formal risk assessment. Our results show that incorporating PRSs improves the correlation between prioritization tools and formal CVD risk assessment models. In particular, the use of PRSs together with primary care records to prioritize individuals at highest risk of a CVD event for a formal CVD risk assessment has the ability to efficiently prioritize those who need interventions the most, which could lead to better allocation of resources by reducing the number of formal risk assessments in primary care.

ARTICLE INFORMATION
The four random intercepts, one per risk factor, are correlated between risk factors, whereas the four residual error terms, one per risk factor, are uncorrelated.
A mixed effects model was chosen to take into account the sporadic nature of electronic health records and to model the correlations between the risk factors. In addition, the model needs only a minimum of one recorded measurement of any one risk factor to estimate all four of the risk factors.
The model assumes that all risk factors jointly follow a multivariate normal distribution. Inference based on the multivariate normal distribution may often be reasonable even if multivariate normality does not hold, especially in the context of imputation of missing data 53 and regression calibration 54,55.

Data S2.
Rescaling of prioritisation tool and formal risk assessment tool risks for population health modelling.
Our aim was to validate each prioritisation tool to estimate the health impact in a general population in England. We used UK Biobank because of its detailed measurements at baseline, which were used to estimate a 10-year formal assessment risk, as well as the genetic data and linked historical primary care records necessary for the formal assessment model using conventional risk factors and PRS, and for the prioritisation tools derived using primary care records and/or PRS. The breadth of the data allowed for a direct comparison of each prioritisation tool in the same individuals.
However, UKB participants have been shown to be healthier than the general population both in terms of risk factor levels and CVD incidence rates. Deriving and modelling the health impact of all prioritisation tools and formal risk tools in UK Biobank without adjustments would lead to a biased distribution of estimated 10-year risks, skewed to the right and narrower than the distribution observed in the general population. To use UK Biobank more accurately for population health modelling, the distribution of estimated 10-year risks was rescaled.
Rescaling was completed for each tool and by sex, using methods similar to those previously described 30, and allowed the mean level of predicted risks based on UKB data to match what was observed in CPRD. We used sex-specific mean risk factor levels calculated from the Clinical Practice Research Datalink (CPRD) between the years 2014 and 2019 within 5-year age groups to estimate the predicted risk in the general population by fitting the average risk factor levels into the published QRISK2 risk model (Table S2).

Table S4. Age- and sex-specific prioritisation thresholds chosen for population health modelling.
Abbreviations: FNR, false negative rate; PRS, polygenic risk score.
Age group- and sex-specific 5% FNR prioritisation thresholds were defined as the level such that the expected false negative rate of the formal risk assessment is controlled to be 5%. The prioritisation thresholds were chosen by first ranking the estimated 10-year CVD risks from each prioritisation tool amongst individuals with a future CVD event. The FNR threshold was selected as the maximum estimated risk such that 5% of individuals with a future event would not be prioritised (i.e. were lower than the threshold).
Age group- and sex-specific 'equivalent events prioritisation thresholds', when prioritising using the PRS or the PRS and primary care records tool, were chosen such that the number of events identified would be similar to that when prioritising with primary care records using a 5% FNR prioritisation threshold.

Table S10. Number needed to invite and screen to prevent one event, and number of events identified when prioritising with PRS and primary care records in a hypothetical population of 100,000 individuals in England.

Table S13. Number needed to invite and screen to prevent one event, and number of events identified when prioritising using PRS + age, including all individuals without a primary care record for any one of SBP, HDL, total cholesterol or BMI, in a hypothetical population of 100,000 individuals in England.

Table S14. Number needed to invite and screen to prevent one event, and number of events identified when prioritising using PRS and primary care records, including all individuals without a primary care record for any one of SBP, HDL, total cholesterol or BMI, in a hypothetical population of 100,000 individuals in England.

Table S15. Age- and sex-specific prioritisation thresholds chosen for population health modelling with 2.5% false negative rate prioritisation thresholds.
Abbreviations: FNR, false negative rate; PRS, polygenic risk score.
Age group- and sex-specific 2.5% FNR prioritisation thresholds were defined as the level such that the expected false negative rate of the formal risk assessment is controlled to be 2.5%. The prioritisation thresholds were chosen by first ranking the estimated 10-year CVD risks from each prioritisation tool amongst individuals with a future CVD event. The FNR threshold was selected as the maximum estimated risk such that 2.5% of individuals with a future event would not be prioritised (i.e. were lower than the threshold).

Abbreviations: BMI, body mass index; CVD, cardiovascular disease.
Highlighted in red were individuals without necessary primary care records to calculate incidence rates for sensitivity analyses and were included for sensitivity analysis incidence rates calculation.
Number needed to invite and screen to prevent one event and number of events identified after prioritisation and formal assessment in a hypothetical population of 100,000 individuals in England, with prioritisation thresholds selected to identify the same number of events as when prioritising with primary care records using thresholds that control the false negative rate at 5%.
Abbreviations: NNS, number needed to screen; NNI, number needed to invite; PRS, polygenic risk score.
95% confidence intervals are represented by vertical lines. Age group- and sex-specific prioritisation thresholds were defined as the level such that the expected false negative rate was controlled to be 5%. NNI and NNS assume 50% statin compliance, and half of all individuals invited for formal assessment attend.
Abbreviations: HDL, high-density lipoprotein; NNI, number needed to invite; NNS, number needed to screen; PRS, polygenic risk score; SBP, systolic blood pressure.

Figure 1. Flowchart of the implementation of a prioritization tool for formal cardiovascular disease assessments. BMI indicates body mass index; CAD, coronary artery disease; CVD, cardiovascular disease; HDL, high-density lipoprotein; and PRS, polygenic risk score.

Figure 2. Number needed to invite, number needed to screen, and number of events identified after prioritizing for a formal CVD assessment, in a hypothetical population of 100 000 individuals in England. 95% CIs are represented by vertical lines. Age group- and sex-specific prioritization thresholds were defined as the level such that the expected false-negative rate is controlled to be 5%. NNI and NNS assume 50% statin compliance and that half of all individuals invited for formal assessment attend. CVD indicates cardiovascular disease; NNI, number needed to invite; NNS, number needed to screen; and PRS, polygenic risk score.

Figure S2. Flowchart showing selection of patient records for generating summary statistics from CPRD.

Figure S3. Illustration of UK Biobank data used in analysis.
Figure S7. Number needed to invite, number needed to screen and number of events identified after prioritising for a formal CVD assessment, including all individuals without a primary care record for any one of SBP, HDL, total cholesterol or BMI, in a hypothetical population of 100,000 individuals in England.
Legend: No prioritisation tool used: formal assessment with conventional risk factors; Prioritisation with primary care records & formal assessment with conventional risk factors; No prioritisation tool used: formal assessment with conventional risk factors + PRS; Prioritisation with PRS + age & formal assessment with conventional risk factors + PRS; Prioritisation with primary care records + PRS & formal assessment with conventional risk factors + PRS.

Table 1. Key Characteristics of Individuals in UK Biobank Baseline Survey and Linked Primary Care Records
BMI indicates body mass index; CVD, cardiovascular disease; HDL, high-density lipoprotein; and PRS, polygenic risk score. *Risk factor values in both baseline and primary care records if 1 was missing. †Risk factor values for primary care records estimated using multivariate mixed-effect model.

Table 2. C Indices of Prioritization Tools and Formal CVD Risk Assessment Tools in UK Biobank
CIs from each model for the prediction of 10-year cardiovascular disease by sex and for the combined population in UK Biobank after 10-fold cross validation. CVD indicates cardiovascular disease; and PRS, polygenic risk score.

Table 3. Correlation of Predicted 10-Year Risks Between Prioritization Tools and Formal Assessment Tools by Sex in the Derivation Data Set
PRS indicates polygenic risk score.

Table 4. Number Needed to Invite and Screen to Prevent 1 Event and Number of Events Identified After Prioritization and Formal Assessment in a Hypothetical Population of 100 000 Individuals in England, With Prioritization Thresholds Selected to Identify the Same Number of Events if Prioritizing With Primary Care Records With Prioritization Thresholds Controlling the False Negative Rate to 5%
Age structure of hypothetical population extrapolated from Office for National Statistics, England, United Kingdom, 2015. Expected events at 10 years based on extrapolation of incidence rates from CPRD, 2014-2019.
This allows us to calculate scaling factors to rescale each prioritisation tool and formal risk assessment model to have a distribution similar to what would be expected in the

Table S11. Summary of number of individuals without primary care records in UK Biobank.
Prioritisation with primary care records requires at least one CVD risk factor of: systolic blood pressure, total cholesterol, HDL cholesterol and/or BMI.

Table S17. Number needed to invite and screen to prevent one event, and number of events identified when prioritising with PRS + age in a hypothetical population of 100,000 individuals in England, assuming a 5% formal risk assessment threshold.
Table S18. Number needed to invite and screen to prevent one event, and number of events identified when prioritising with PRS and primary care records in a hypothetical population of 100,000 individuals in England, assuming a 5% formal risk assessment threshold.
Figure S1. Flowchart showing data sources used for model derivation for estimated risks and population health modelling.
Abbreviations: CPRD, Clinical Practice Research Datalink; ONS, Office for National Statistics.
Remove patients whose data are unacceptable, whose gender is not male or female, or who are outside of England. Remove individuals who exited before or at the start of the study or age 40, and who started after the study exit or age 85.
• Study entry date was the latest of:
o The date 6 months after the individual registered at a general practice
o The date that the individual turned 30 years of age
o The date that the data for the practice were up to standard
o The date for enhanced data quality usage in English general practice, which was defined as the time that the national Quality and Outcomes Framework (QOF) was introduced (1st April 2004)
• Study exit date was the earliest of:
o The date of deregistration at the practice
o The individual's death
o The date that the individual turned 95 years of age
o The last contact date for the practice with CPRD
o The administration end date (31st May 2019)
N = 9,459,190
2,525,156 individuals in CPRD with linkage
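The study entry and exit rules listed above amount to taking the latest of several candidate dates and the earliest of several others. The following is a minimal sketch of that logic, not the authors' implementation: the function names, argument names, and the month-shifting helper are hypothetical. The two fixed dates are taken from the flowchart text, and the age limits are left as parameters that default to the values given in the lists.

```python
import calendar
from datetime import date

QOF_START = date(2004, 4, 1)    # QOF introduction date, from the flowchart
ADMIN_END = date(2019, 5, 31)   # administration end date, from the flowchart


def add_months(d, months):
    """Shift a date by whole months, clamping the day to the month's length."""
    total = d.year * 12 + (d.month - 1) + months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def study_entry(registration, birth, practice_up_to_standard, entry_age=30):
    """Latest of the candidate entry dates."""
    return max(
        add_months(registration, 6),       # 6 months after registration
        add_months(birth, 12 * entry_age), # date the entry age is reached
        practice_up_to_standard,
        QOF_START,
    )


def study_exit(birth, practice_last_contact, deregistration=None,
               death=None, exit_age=95):
    """Earliest of the candidate exit dates; missing dates are skipped."""
    candidates = [add_months(birth, 12 * exit_age),
                  practice_last_contact, ADMIN_END]
    candidates += [d for d in (deregistration, death) if d is not None]
    return min(candidates)
```

Individuals whose computed exit date is on or before their entry date would then be dropped, in line with the exclusion step described in the flowchart.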