Machine Learning Outperforms ACC/AHA CVD Risk Calculator in MESA
Studies have demonstrated that the current US guidelines based on American College of Cardiology/American Heart Association (ACC/AHA) Pooled Cohort Equations Risk Calculator may underestimate risk of atherosclerotic cardiovascular disease (
Methods and Results
We developed a
What Is New?
The 2013 American College of Cardiology/American Heart Association (ACC/AHA) Pooled Cohort Equations risk calculator has been shown to be inaccurate in certain populations.
Using the same risk variables, we developed a Machine Learning‐based risk calculator in the MESA (Multi‐Ethnic Study of Atherosclerosis) cohort and validated it in the FLEMENGHO study (Flemish Study on Environment, Genes and Health Outcomes).
What Are the Clinical Implications?
The Machine Learning Risk Calculator outperformed the ACC/AHA Risk Calculator by recommending less drug therapy, yet missing fewer cardiovascular disease events.
These findings demonstrate the potential of Machine Learning to improve cardiovascular risk prediction and assist medical decision‐making.
Approximately every 20 seconds an American will have a heart attack or stroke. Of 790 000 heart attacks each year, 580 000 are new attacks in asymptomatic individuals. Similarly, of 610 000 strokes each year, 425 000 events are first‐time. The economic cost of these unpredicted cardiovascular events is tens of billions of dollars annually.1 Despite the grave nature of the problem, many of these events could be prevented if a more accurate tool for early detection of high‐risk individuals became available.
The traditional method of cardiovascular disease (CVD) risk assessment is based on measuring traditional risk factors and predicting events over 10 years or a lifetime. Numerous studies have shown that current 10‐year risk calculators, including the 2013 American College of Cardiology/American Heart Association (ACC/AHA) Pooled Cohort Equations Risk Calculator,2 often overestimate cardiovascular events and, in women and certain ethnic groups, may underestimate risk.3 The existing approach to CVD risk assessment desperately needs an overhaul. A consensus report from the Society for Heart Attack Prevention and Eradication (SHAPE) Task Force concluded that a comprehensive assessment of plaque, blood, and myocardial vulnerability factors is needed for an accurate prediction of CVD events. The task force further noted that despite major advances in the treatment of coronary heart disease patients, a large number of victims of the disease who are apparently healthy die suddenly without prior symptoms.8 Clearly, the available screening and diagnostic methods are insufficient to identify the victims before the event occurs; therefore, short‐term risk prediction is much needed. To reach this goal, a stepwise multi‐phase approach is warranted that includes maximizing the long‐term predictive value of traditional risk factors using Machine Learning (ML), gathering unique data on asymptomatic subjects who, shortly after an exam with blood testing, experience an ASCVD event, and applying ML to all available clinical data, including genomic, proteomic, and others, to detect the vulnerable patient.
In this paper, we report on our initial effort at advancing the field by using the same risk factors used by existing risk calculators but using ML as a new tool instead of the traditional statistical tools used for the existing risk calculator.2 Rapid growth in information technology and computing power in recent years has spurred the emergence of ML and its applications in our day to day life from automated personal assistants to self‐driving cars. The medical community has begun taking advantage of these new possibilities to improve medical care.
ML is generally categorized into 2 types, supervised and unsupervised.9 Here, we report the use of a supervised ML model developed for predicting CVD risk and guiding the decision of who should be recommended CVD preventive statin therapy. Unlike traditional statistical prediction methods, which operate with certain assumptions of linearity and force the predictive models to behave accordingly, the ML algorithm we used (Support Vector Machine, SVM) does not follow such assumptions and, instead, relies on learning the intrinsic properties or patterns of a given data set. The use of ML in medicine is clearly lagging behind other fields and remains experimental. Although several ML‐based predictive models related to medical and cardiovascular fields have been published,10 we are not aware of any approved by the FDA (Food and Drug Administration) for CVD prevention.
Our study serves as a step towards addressing this unmet need. Using ML and the same risk factors used by ACC/AHA Risk Calculator, we aimed to improve CVD risk stratification. We tested this approach in MESA (the Multi‐Ethnic Study of Atherosclerosis)18 and also used FLEMENGHO (the Flemish Study on Environment, Genes and Health Outcomes)19 for external validation of our findings.
The data, analytic methods, and study materials will not be made available to other researchers for purposes of reproducing the results or replicating the procedure.
Initiated in July 2000 to investigate the prevalence, correlation, and progression of subclinical CVD in individuals who did not exhibit known cardiovascular issues, MESA is a US community‐based prospective cohort study of 6814 white, black, Hispanic, or Chinese American men and women aged 45 to 84 years free of clinically apparent CVD at baseline (2000–2002). Study details have been previously published.18 All participants gave their written informed consent and the institutional review boards at all participating centers approved the study. The MESA study group followed the cohort yearly for up to 13 years from baseline (median, 11.1 years) and monitored for incidence of cardiovascular events. Thus, the MESA data set used in this study includes baseline characteristics data and which participants have experienced CVD events during the follow‐up period.
For this study, we excluded 69 (1.0%) subjects with missing risk factor data. Next, we excluded 286 (4.2%) who were >79 years at baseline; this step was performed because the ACC/AHA Risk Calculator was designed for those between 40 and 79 years of age. Thus, our study population was comprised of 6459 participants.
ACC/AHA Pooled Cohort Equations Risk Calculator
The 2013 ACC/AHA Pooled Cohort Equations calculator was designed to estimate 10‐year risk of atherosclerotic cardiovascular disease, defined as heart attack, CHD death, or stroke. ACC/AHA risk estimates were computed using the subjects’ baseline characteristics and available published equations.2 We grouped the scores into 2 preselected 10‐year risk prediction categories: (1) low risk (<7.5%) and (2) high risk (≥7.5%) categories, for determining which subjects the risk calculator would have recommended statin treatment, based on the 2013 ACC/AHA guideline on the treatment of blood cholesterol to reduce atherosclerotic cardiovascular risk.20 Because the follow‐up for the MESA data is 13 years, we linearly transformed the 10‐year risk of the base models into a 13‐year risk. Thus, the risk threshold for statin eligibility becomes 9.75% for 13 years.
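The linear rescaling of the statin threshold described above can be sketched in a few lines. This is a minimal illustration, assuming risk scales proportionally with follow-up time as the text states; the function names `scale_threshold` and `is_statin_eligible` are hypothetical and not part of the study's software.

```python
def scale_threshold(threshold: float, horizon_years: float, base_years: float = 10.0) -> float:
    """Linearly rescale a risk threshold from base_years to horizon_years."""
    return threshold * horizon_years / base_years


def is_statin_eligible(predicted_risk: float, threshold: float) -> bool:
    """High risk (statin eligible) when the predicted risk meets or exceeds the threshold."""
    return predicted_risk >= threshold


# 7.5% over 10 years becomes 9.75% over the 13-year MESA follow-up.
threshold_13y = scale_threshold(0.075, 13)
print(f"13-year threshold: {threshold_13y:.4f}")

# Hypothetical subjects with 13-year risks of 8% and 12%.
print(is_statin_eligible(0.08, threshold_13y), is_statin_eligible(0.12, threshold_13y))
```

The same proportional factor (13/10) would apply to each subject's 10‐year risk estimate if the scores, rather than the threshold, were rescaled.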
Machine Learning Risk Calculator
An overview of the ML approach is shown in Figure 1. The algorithm begins with the initial cohort, which is then split into training and prediction sets. Because of the imbalanced nature of data (many more samples without an event than samples with an event), during training, data are augmented using NEATER (a method for the filtering of oversampled data using non‐cooperative game theory),21 and then an SVM22 classifier is trained. When a new unseen sample appears during prediction, the ML model determines in which category it belongs.
We built 8 ML‐based models for “Hard CVD” events and 8 models for “All CVD” events. For each event type we built models for each sex (ie, males and females) and, within each sex, 1 model per ethnicity (ie, white, Chinese, black, and Hispanic), giving 2×4=8 models per event type. Thus, there were 16 models in total.
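The model count follows from crossing the 2 event types, 2 sexes, and 4 ethnicities. A minimal sketch of that bookkeeping is shown here; the tuple keys and the helper `select_model_key` are illustrative assumptions about how such models might be indexed, not the study's implementation.

```python
from itertools import product

OUTCOMES = ("Hard CVD", "All CVD")
SEXES = ("male", "female")
ETHNICITIES = ("white", "Chinese", "black", "Hispanic")

# One model per (outcome, sex, ethnicity) combination.
model_keys = list(product(OUTCOMES, SEXES, ETHNICITIES))


def select_model_key(outcome: str, sex: str, ethnicity: str) -> tuple:
    """Return the key of the subgroup model that applies to a subject."""
    key = (outcome, sex, ethnicity)
    if key not in model_keys:
        raise KeyError(f"no model for subgroup {key}")
    return key


print(len(model_keys))
print(select_model_key("Hard CVD", "female", "Hispanic"))
```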
The SVM method is one of the most powerful learning algorithms for binary classification problems such as the problem stated in this article.23 SVM is given a training set of examples (or inputs) belonging to 2 classes (eg, positive and negative, event and no event), with associated labels (or output values), and it finds the maximum‐margin boundary dividing the 2 classes. This dividing boundary is called a hyperplane and achieves maximum discrimination.
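To make the maximum‐margin idea concrete, the sketch below scores two candidate hyperplanes on a toy 2‐dimensional data set and reports their geometric margins. It assumes a linear decision function for simplicity, whereas the study's SVM may use a non‐linear kernel; the data points and both hyperplanes are invented for illustration.

```python
import math


def decision_value(w, b, x):
    """Signed score w.x + b; its sign gives the predicted class."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def predict(w, b, x):
    """Return +1 (event) or -1 (no event)."""
    return 1 if decision_value(w, b, x) >= 0 else -1


def geometric_margin(w, b, points, labels):
    """Smallest signed distance from any sample to the hyperplane."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(y * decision_value(w, b, x) / norm for x, y in zip(points, labels))


# Toy data: two "no event" samples and two "event" samples.
points = [(0.0, 0.0), (1.0, 0.0), (3.0, 3.0), (4.0, 2.0)]
labels = [-1, -1, 1, 1]

# Two hyperplanes that both separate the classes.
margin_a = geometric_margin((1.0, 1.0), -3.5, points, labels)
margin_b = geometric_margin((1.0, 0.0), -2.0, points, labels)
print(f"margin A = {margin_a:.3f}, margin B = {margin_b:.3f}")
```

Both hyperplanes classify every toy sample correctly, but an SVM would prefer the one with the larger margin, since it leaves more room for unseen samples near the boundary.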
During training, the ML model was provided with the baseline values of the same 9 risk factors (“predictor variables,” in ML parlance) used by the ACC/AHA Risk Calculator, namely the following: age, sex, ethnicity, total cholesterol, high‐density lipoprotein cholesterol, systolic blood pressure, treatment for hypertension, history of diabetes mellitus, and smoking status. “Hard CVD” events included myocardial infarction, fatal CHD, stroke, and stroke death. “All CVD” events included the above list, plus congestive heart failure, transient ischemic attack, peripheral vascular disease, resuscitated cardiac arrest, percutaneous transluminal coronary angioplasties, probable angina, other revascularization, other CVD death, other atherosclerotic death, and cardiac bypass graft surgery.
Performance was evaluated on real data samples that the model had not seen before. Moreover, 2‐fold cross validation was used to validate the method, and the results were averaged over all possible configurations. By using 2‐fold cross validation, we obtain an unbiased estimate of model performance, with no apparent significant loss of modeling or prediction capability arising from the imbalanced nature of the data. More details on the 2‐fold cross validation that was performed can be found in the following sections.
For visualization purposes, we projected the high‐dimensional feature space into a 3D feature space using the Principal Component Analysis. Because of the high dimensionality of the input training data, the decision hyperplane between the class samples with an event and the class samples without an event is transformed into a hyper‐surface. An example of the 3D hyper‐surface for the MESA male group for classifying “Hard CVD” and “All CVD” events can be seen in Figure S1.
We cast the problem of event prediction as a binary classification problem. We denote the positive class “+1” (ie, subjects with an event) as the minority class, and the negative class “−1” (ie, subjects without an event) as the majority class. The MESA data are severely imbalanced in terms of outcomes, that is, the size of the minority class is much smaller than the size of the majority class, and, as a result, the decision boundary for ML methods would be severely biased and could result in poor performance. To cope with this skewed class distribution issue, we selected the NEATER algorithm,21 a data augmentation algorithm that is based on filtering oversampled data using non‐cooperative game theory. We elected to use this algorithm based on its ability to effectively increase the performance of the classifier as well as its tendency to avoid overfitting, which can be inevitable when using other oversampling techniques. NEATER is able to handle data sets of an imbalanced nature and generate new data. A detailed description of how NEATER works can be found in Data S1.
The main advantage of NEATER is that it makes no prior assumptions about the data, while it reaches high accuracy for both the minority and the majority classes. It is also important to note that NEATER is used for data augmentation only for training purposes and never during prediction.
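NEATER itself is described in Data S1 and is not reproduced here. As a stand‐in, the sketch below uses plain random oversampling (duplicating minority‐class samples until the classes are balanced) to illustrate the general idea of augmenting only the training data; it omits NEATER's game‐theoretic filtering step, and the function name and toy data are assumptions for illustration.

```python
import random


def random_oversample(samples, labels, minority_label=1, seed=0):
    """Duplicate randomly chosen minority samples until both classes are the same size.

    A simple substitute for NEATER: no synthetic samples, no filtering.
    Apply to the training split only, never to the data used for prediction.
    """
    rng = random.Random(seed)
    minority = [s for s, y in zip(samples, labels) if y == minority_label]
    if not minority:
        raise ValueError("no minority-class samples to oversample")
    n_majority = sum(1 for y in labels if y != minority_label)
    n_extra = max(n_majority - len(minority), 0)
    extra = [rng.choice(minority) for _ in range(n_extra)]
    return list(samples) + extra, list(labels) + [minority_label] * n_extra


# Toy training split: 2 events (+1) and 8 non-events (-1).
train_x = [(60, 140), (72, 155)] + [(50 + i, 120) for i in range(8)]
train_y = [1, 1] + [-1] * 8

aug_x, aug_y = random_oversample(train_x, train_y)
print(aug_y.count(1), aug_y.count(-1))
```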
Two‐Fold Cross Validation
To ensure and increase the model's robustness and ability to generalize under unknown samples, we employed 2‐fold cross validation to randomly split the original data set into 2 equally sized halves, a training set to train the model, and a test set to evaluate it. This type of cross validation has been widely used in the machine learning literature for predicting high‐risk individuals (more details about the 2‐fold cross validation can be found in Data S2).28
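The split‐and‐swap procedure can be sketched as follows. This is a minimal illustration that assumes each half serves once as the training set and once as the test set, with the two scores averaged; the callback `train_and_score` is a hypothetical placeholder for model fitting and evaluation.

```python
import random


def two_fold_split(n_samples, seed=0):
    """Randomly split sample indices into two halves of (nearly) equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    half = n_samples // 2
    return indices[:half], indices[half:]


def two_fold_cross_validate(n_samples, train_and_score, seed=0):
    """Train on one half and test on the other, then swap roles and average."""
    fold_a, fold_b = two_fold_split(n_samples, seed)
    scores = [train_and_score(fold_a, fold_b), train_and_score(fold_b, fold_a)]
    return sum(scores) / len(scores)


fold_a, fold_b = two_fold_split(10)

# Placeholder scorer: reports the test-set size instead of fitting a model.
mean_score = two_fold_cross_validate(10, lambda train_idx, test_idx: len(test_idx))
print(sorted(fold_a), sorted(fold_b), mean_score)
```

Because the two halves are disjoint, no test sample is ever seen during training in either configuration.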
To test the generalizability of the ML models, and also to check for potential overfitting, we tested the ML risk calculator on an external data set drawn from the FLEMENGHO study.
FLEMENGHO recruited 2940 white participants between ages 20 and 90 years from August 1985 to December 2005 who were free of clinical CVD at baseline. FLEMENGHO studies a random population sample stratified by sex and age from a geographically defined area in Northern Belgium. All participants provided their written informed consent and local institutional review board approved the study protocol. FLEMENGHO study details have been previously published.19
For the FLEMENGHO study, we excluded 104 (3.5%) subjects with missing risk factor data. Then, 1488 (50.6%) subjects who were <45 years or >79 years at baseline were excluded; this step was performed because the ML Risk Calculator was trained in the MESA cohort for those between 45 and 79 years of age. The final study population for the external validation comprised 1348 subjects. In FLEMENGHO, 621 (21.2%) subjects were followed up for >13 years. Of them, 180 (6.1%) subjects had a CVD event. We treated these samples as “no events” and included them in the study population as such; the reason for this step is that the ML Risk Calculator was trained on a 13‐year follow‐up period while FLEMENGHO's follow‐up was >13 years. Thus, to have a fair comparison, this step was necessary for our analysis.
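The age filter and the 13‐year relabeling can be sketched as below. This is an illustration only: it assumes a time‐to‐event value is available for each subject and relabels events occurring after the horizon as “no event”, which is one plausible reading of the step described above; the function names and example values are hypothetical.

```python
def in_age_range(age_years, low=45, high=79):
    """Keep subjects within the age range the ML model was trained on."""
    return low <= age_years <= high


def event_within_horizon(had_event, years_to_event, horizon_years=13.0):
    """Count an event only if it occurred within the prediction horizon."""
    return bool(had_event) and years_to_event <= horizon_years


# Hypothetical subjects: (age at baseline, had event, years to event or end of follow-up)
subjects = [(52, True, 5.0), (61, True, 15.2), (70, False, 20.0), (38, True, 4.0)]

labels = [
    1 if event_within_horizon(had_event, years) else -1
    for age, had_event, years in subjects
    if in_age_range(age)
]
print(labels)
```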
We performed an analysis to determine the sensitivity, specificity, accuracy, and C‐statistic of the ACC/AHA Risk Calculator based on the prediction equations and the 7.5% 10‐year risk threshold described previously. Next, we analyzed the performance of the ML Risk Calculator, compared its performance metrics to those of the ACC/AHA Risk Calculator, and calculated categorical net reclassification improvement (NRI) values for paired models. A paired t test was used to compare the population means and compute the P values. All statistical tests were 2‐tailed, and P<0.05 was considered significant.
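Sensitivity, specificity, and accuracy follow directly from the confusion‐matrix counts. The sketch below applies the standard definitions to the counts reported in Table 2 for the ACC/AHA Risk Calculator on “Hard CVD” events in the full cohort (FN 114, FP 2606, TP 366, TN 3373); the function name is an illustrative choice.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard metrics from true/false positive and negative counts."""
    return {
        "sensitivity": tp / (tp + fn),   # events correctly flagged as high risk
        "specificity": tn / (tn + fp),   # non-events correctly flagged as low risk
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


acc_aha_hard = classification_metrics(tp=366, fp=2606, tn=3373, fn=114)
for name, value in acc_aha_hard.items():
    print(f"{name}: {value:.2f}")
```

Rounded to 2 decimals, these values agree with the 0.76 sensitivity, 0.56 specificity, and 0.58 accuracy reported for that row.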
Overall, 480 (7.4%) “Hard CVD” and 976 (15.1%) “All CVD” events occurred in the MESA study population (N=6459) during the 13‐year follow‐up period. Baseline characteristics of the study population and subgroups of interest are reported in Table 1. “Hard CVD” events included 221 myocardial infarctions, 71 CHD deaths, 178 strokes, and 10 stroke deaths. “All CVD” events included 221 myocardial infarctions, 71 CHD deaths, 178 strokes, 10 stroke deaths, 176 angina‐driven revascularizations, 11 resuscitated cardiac arrests, 8 other atherosclerotic deaths, 38 other CVD deaths, 111 congestive heart failures (CHF), 53 peripheral vascular diseases (PVD), 22 percutaneous transluminal coronary angioplasties, 9 coronary bypass grafts, 61 transient ischemic attacks, and 7 other revascularizations.
|Characteristic||All (N=6459)||Hard CVD (n=480)||All CVD (n=976)||ACC/AHA <9.75% 13‐y risk (n=3487)||ACC/AHA ≥9.75% 13‐y risk (n=2972)||ML: Low Risk (13‐y) (n=5724)||ML: High Risk (13‐y) (n=735)|
|Male, n%||3060 (47.4%)||282 (58.7%)||590 (60.4%)||1254 (36.0%)||1806 (60.8%)||2601 (45.4%)||459 (62.4%)|
|Female, n%||3399 (52.6%)||198 (41.3%)||386 (39.6%)||2233 (64.0%)||1166 (39.2%)||3123 (54.6%)||276 (37.6%)|
|White||2484 (38.5%)||187 (39.0%)||413 (42.3%)||1439 (41.2%)||1045 (35.2%)||2197 (38.4%)||287 (39.0%)|
|Asian||767 (11.9%)||35 (7.3%)||67 (6.9%)||446 (12.8%)||321 (10.8%)||697 (12.2%)||70 (9.5%)|
|Black||1780 (27.5%)||138 (28.7%)||282 (28.9%)||794 (22.8%)||986 (33.2%)||1573 (27.5%)||207 (28.2%)|
|Hispanic||1428 (22.1%)||120 (25.0%)||214 (21.9%)||808 (23.2%)||620 (20.8%)||1257 (21.9%)||171 (23.3%)|
|Total cholesterol, mg/dL||194.4±35.8||194.6±34.2||193.2±37.3||194.9±34.9||193.8±36.8||194.7±36.6||192.3±30.9|
|High‐density lipoprotein cholesterol, mg/dL||50.9±14.8||47.8±13.9||48.1±13.6||52.6±15.0||48.8±14.3||51.5±15.0||46.5±12.3|
|Systolic blood pressure, mm Hg||125.9±21.1||136.3±22.2||134.5±21.7||116.9±16.4||136.6±21.1||124.7±20.9||135.8±20.0|
|Hypertension, n%||2351 (36.4%)||243 (50.6%)||510 (52.2%)||735 (21.1%)||1616 (54.4%)||1974 (34.5%)||377 (51.3%)|
|Diabetes mellitus, n%||729 (11.3%)||107 (22.3%)||217 (22.2%)||127 (3.6%)||602 (20.3%)||653 (11.4%)||76 (10.3%)|
|Current smoking||869 (13.5%)||92 (19.2%)||169 (17.3%)||387 (11.1%)||482 (16.2%)||762 (13.3%)||107 (14.6%)|
|Prior smoking||2365 (36.6%)||180 (37.5%)||419 (42.9%)||1192 (34.2%)||1173 (39.5%)||2073 (36.2%)||292 (39.7%)|
|Never||3225 (49.9%)||208 (43.3%)||388 (39.8%)||1908 (54.7%)||1317 (44.3%)||2889 (50.5%)||336 (45.7%)|
|Family history heart attack, n%ᵃ||2593 (40.1%)||239 (49.8%)||482 (49.4%)||1364 (39.1%)||1229 (41.3%)||2231 (39.0%)||362 (49.2%)|
|Coronary artery calcification, Agatstonᵃ||138.8±408.6||316.1±577.2||389.9±759.0||41.0±160.8||253.4±555.2||121.0±380.1||274.0±563.7|
Risk Calculator Performance Comparison
Table 2 presents the sensitivity, specificity, and accuracy of the risk calculators for the prediction of “Hard CVD” and “All CVD” events, respectively. The ACC/AHA Risk Calculator achieved 0.76 sensitivity, 0.56 specificity, and 0.58 accuracy for predicting “Hard CVD” events and 0.75 sensitivity, 0.59 specificity, and 0.62 accuracy for predicting “All CVD” events. In comparison, the ML Risk Calculator for the prediction of “Hard CVD” events had higher sensitivity (0.86), specificity (0.95), and accuracy (0.94). For the prediction of “All CVD” events, ML Risk Calculator sensitivity was increased to 0.96 but with a slight decrease in specificity (0.87) and accuracy (0.89).
|Group||Event||Model||Sn (95% CI)||P Value||Sp (95% CI)||P Value||FN||FP||TP||TN||Acc (95% CI)||P Value||NRI (95% CI)||P Value|
|Males||Hard CVD||ACC/AHA Risk Calculator||0.86±0.1 (0.81–0.90)||–||0.44±0.1 (0.42–0.46)||–||40||1564||242||1214||0.48±0.1 (0.46–0.49)||–||–||–|
|Males||Hard CVD||ML Risk Calculator||0.90±0.1 (0.86–0.94)||≤0.001||0.93±0.1 (0.92–0.94)||≤0.001||27||204||255||2574||0.92±0.1 (0.91–0.93)||≤0.001||0.53 (0.51–0.55)||≤0.001|
|Males||All CVD||ACC/AHA Risk Calculator||0.84±0.1 (0.81–0.87)||–||0.47±0.1 (0.45–0.49)||–||96||1312||494||1158||0.54±0.1 (0.52–0.56)||–||–||–|
|Males||All CVD||ML Risk Calculator||0.97±0.1 (0.96–0.99)||≤0.001||0.82±0.1 (0.80–0.84)||≤0.001||15||443||575||2027||0.85±0.1 (0.84–0.86)||≤0.001||0.48 (0.46–0.50)||≤0.001|
|Females||Hard CVD||ACC/AHA Risk Calculator||0.63±0.1 (0.56–0.69)||–||0.67±0.1 (0.66–0.69)||–||74||1042||124||2159||0.67±0.1 (0.66–0.69)||–||–||–|
|Females||Hard CVD||ML Risk Calculator||0.79±0.1 (0.72–0.84)||≤0.001||0.96±0.1 (0.95–0.97)||≤0.001||42||120||156||3081||0.95±0.1 (0.94–0.96)||≤0.001||0.45 (0.43–0.47)||≤0.001|
|Females||All CVD||ACC/AHA Risk Calculator||0.62±0.1 (0.57–0.67)||–||0.69±0.1 (0.68–0.71)||–||146||926||240||2087||0.68±0.1 (0.67–0.70)||–||–||–|
|Females||All CVD||ML Risk Calculator||0.93±0.1 (0.90–0.95)||≤0.001||0.92±0.1 (0.91–0.93)||≤0.001||28||247||358||2766||0.92±0.1 (0.91–0.93)||≤0.001||0.54 (0.52–0.55)||≤0.001|
|All||Hard CVD||ACC/AHA Risk Calculator||0.76±0.1 (0.72–0.80)||–||0.56±0.1 (0.55–0.58)||–||114||2606||366||3373||0.58±0.1 (0.57–0.59)||–||–||–|
|All||Hard CVD||ML Risk Calculator||0.86±0.1 (0.82–0.89)||≤0.001||0.95±0.1 (0.94–0.96)||≤0.001||69||324||411||5655||0.94±0.1 (0.93–0.95)||≤0.001||0.49 (0.48–0.50)||≤0.001|
|All||All CVD||ACC/AHA Risk Calculator||0.75±0.1 (0.72–0.78)||–||0.59±0.1 (0.56–0.61)||–||242||2238||734||3245||0.62±0.1 (0.60–0.63)||–||–||–|
|All||All CVD||ML Risk Calculator||0.96±0.1 (0.94–0.97)||≤0.001||0.87±0.1 (0.86–0.88)||≤0.001||43||690||933||4793||0.89±0.1 (0.88–0.89)||≤0.001||0.50 (0.48–0.51)||≤0.001|
The number of false negatives (ie, subjects who are classified by the Risk Calculator as “low risk” but do experience a CVD event) and false positives (ie, subjects who are classified as “high risk” but do not experience a CVD event), and the categorical net reclassification improvement (NRI) between each base ACC/AHA model and its corresponding ML model using the same risk factors, are also shown in Table 2. Across all subjects, the ML Risk Calculator achieved an NRI of 0.49 for “Hard CVD” events and 0.50 for “All CVD” events relative to the ACC/AHA Risk Calculator.
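For a 2‐category classification, the categorical NRI reduces to the gain in sensitivity plus the gain in specificity. The sketch below applies that standard definition to the pooled “Hard CVD” counts reported in Table 2 for all subjects; it is an illustration under that assumption, and the exact NRI variant and averaging used in the study are not specified in the text. The pooled counts give roughly 0.48, close to but not identical to the reported 0.49, which may reflect fold‐wise averaging.

```python
def categorical_nri(old, new):
    """Two-category NRI: net correct reclassification among events plus non-events.

    `old` and `new` are dicts of tp, fp, tn, fn counts for the same subjects.
    """
    n_events = old["tp"] + old["fn"]
    n_nonevents = old["tn"] + old["fp"]
    if (new["tp"] + new["fn"], new["tn"] + new["fp"]) != (n_events, n_nonevents):
        raise ValueError("both models must be evaluated on the same subjects")
    net_events = (new["tp"] - old["tp"]) / n_events          # moved up correctly
    net_nonevents = (new["tn"] - old["tn"]) / n_nonevents    # moved down correctly
    return net_events + net_nonevents


acc_aha = {"tp": 366, "fp": 2606, "tn": 3373, "fn": 114}
ml_model = {"tp": 411, "fp": 324, "tn": 5655, "fn": 69}

nri_hard = categorical_nri(acc_aha, ml_model)
print(f"NRI (pooled counts) = {nri_hard:.3f}")
```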
To investigate the potential impact of removing the statin users, we performed a sensitivity analysis after excluding statin users from the data set and found similar results for ML Risk Calculator performance. For example, for “Hard CVD” events, the ML Risk Calculator had AUC of 0.92 and NRI of 0.46 when statin users were excluded, as compared with AUC of 0.92 and NRI of 0.49 when statin users were included. The baseline characteristics of the study population and subgroups of interest, when statin users were excluded from the analysis, and the performance metrics of the risk calculators are reported in Tables S1 and S2, respectively.
Figure 2 depicts the receiver operating characteristic curves of the different models. It can be observed that all ML models attain high discrimination ability between events and no events with respect to the base models. In particular, the ML Risk Calculator, for “Hard CVD” events, achieved an average AUC of 0.92 and for “All CVD” events, the AUC was 0.94. The ACC/AHA Risk Calculator for “Hard CVD” events achieved an average AUC of 0.71, and for “All CVD” events, the AUC was 0.72. The corresponding receiver operating characteristic curves, when statin users were excluded from the analysis, are depicted in Figure S2.
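The AUC summarizes a receiver operating characteristic curve as the probability that a randomly chosen subject with an event receives a higher risk score than a randomly chosen subject without one. A minimal pairwise sketch of that definition is shown below; the scores and labels are toy values, not study data, and the quadratic pairwise loop is chosen for clarity rather than speed.

```python
def auc(scores, labels, positive_label=1):
    """Area under the ROC curve via pairwise comparison (ties count as half)."""
    pos = [s for s, y in zip(scores, labels) if y == positive_label]
    neg = [s for s, y in zip(scores, labels) if y != positive_label]
    if not pos or not neg:
        raise ValueError("need at least one sample from each class")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_labels = [-1, -1, 1, 1]
print(auc(toy_scores, toy_labels))
```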
Statin Eligibility and Missed Treatment Opportunities
Figure 3 shows the performance characteristics of the risk calculators for addressing 2 clinically relevant issues, ie, determining statin eligibility and avoiding missed treatment opportunities. Almost half of the study population (46.0%) were determined by the ACC/AHA calculator to be statin eligible. In contrast, for “Hard CVD” events, the ML calculator deemed only 11.4% to be at high risk and statin eligible. For “All CVD” events, the ML calculator determined 25.1% to be statin eligible.
Regarding missed treatment opportunities (false negatives), the ACC/AHA calculator also performed poorly, as 23.8% of “Hard CVD” events occurred in individuals for whom the ACC/AHA calculator would not have recommended a statin. The ML Risk Calculator fared better, with only 14.4% of “Hard CVD” events and only 4.4% of “All CVD” events occurring in individuals for whom the ML calculator would not have recommended a statin. The breakdown of the missed “Hard CVD” and “All CVD” events comparing the ML Risk Calculator with the ACC/AHA Risk Calculator is shown in Figure S3.
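Both quantities can be derived from the Table 2 counts for all subjects: the statin‐eligible fraction is the share of subjects classified as high risk, (TP+FP)/N, and the missed‐event fraction is FN/(TP+FN). The short sketch below recomputes them; the function names are illustrative.

```python
def statin_eligible_fraction(tp, fp, tn, fn):
    """Share of the cohort classified as high risk (recommended statin)."""
    return (tp + fp) / (tp + fp + tn + fn)


def missed_event_fraction(tp, fn):
    """Share of events occurring in subjects classified as low risk."""
    return fn / (tp + fn)


# "Hard CVD" counts for all subjects, as reported in Table 2.
acc_eligible = statin_eligible_fraction(tp=366, fp=2606, tn=3373, fn=114)
ml_eligible = statin_eligible_fraction(tp=411, fp=324, tn=5655, fn=69)
acc_missed = missed_event_fraction(tp=366, fn=114)
ml_missed = missed_event_fraction(tp=411, fn=69)

print(f"eligible: ACC/AHA {acc_eligible:.1%}, ML {ml_eligible:.1%}")
print(f"missed:   ACC/AHA {acc_missed:.2%}, ML {ml_missed:.2%}")
```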
The baseline characteristics of the FLEMENGHO external validation study population are reported in Table S3.
Table 3 provides the sensitivity, specificity, and accuracy of the risk calculators using the MESA data set for training purposes only and FLEMENGHO as an external validation set. The ML Risk Calculator tested on the “White Race” FLEMENGHO data set achieved a sensitivity of 0.74, specificity of 0.87, and accuracy of 0.84, much higher than the ACC/AHA Risk Calculator (sensitivity 0.63, specificity 0.69, and accuracy 0.68). Also, for comparison purposes, we used the “White Race” sub‐cohort of MESA for both training and testing. In this setup, the ML Risk Calculator tested on the “White Race” MESA cohort had a sensitivity of 0.84, specificity of 0.96, and accuracy of 0.95, while the ACC/AHA Risk Calculator achieved 0.73 sensitivity, 0.59 specificity, and 0.60 accuracy. The NRI of the ML Risk Calculator over the ACC/AHA Risk Calculator was 0.29 when tested on the FLEMENGHO data set and 0.48 when tested on the “White Race” MESA cohort. Also, Tables S4 and S5 show the performance metrics of the ML Risk Calculator trained and tested on the FLEMENGHO cohort, and trained on the “White Race” FLEMENGHO cohort and tested on the “White Race” MESA cohort, respectively.
|Cohort||Model||Sn (95% CI)||P Value||Sp (95% CI)||P Value||FN||FP||TP||TN||Acc (95% CI)||P Value||NRI (95% CI)||P Value|
|Train and test on White Race MESA||ACC/AHA Risk Calculator||0.85±0.1 (0.77–0.91)||–||0.45±0.1 (0.42–0.48)||–||16||602||91||488||0.48±0.1 (0.46–0.51)||–||–||–|
|Train and test on White Race MESA||ML Risk Calculator||0.90±0.1 (0.82–0.95)||≤0.001||0.98±0.1 (0.97–0.99)||≤0.001||11||22||96||1068||0.97±0.1 (0.96–0.98)||0.002||0.58 (0.55–0.60)||≤0.001|
|Train on White Race MESA and test on FLEMENGHO||ACC/AHA Risk Calculator||0.74±0.1 (0.66–0.80)||–||0.55±0.1 (0.50–0.59)||–||41||234||114||283||0.59±0.1 (0.55–0.63)||–||–||–|
|Train on White Race MESA and test on FLEMENGHO||ML Risk Calculator||0.86±0.1 (0.80–0.91)||≤0.001||0.82±0.1 (0.79–0.85)||≤0.001||21||92||134||425||0.83±0.1 (0.80–0.86)||0.001||0.39 (0.36–0.43)||0.004|
|Train and test on White Race MESA||ACC/AHA Risk Calculator||0.58±0.1 (0.49–0.68)||–||0.72±0.1 (0.70–0.75)||–||34||336||46||871||0.71±0.1 (0.69–0.74)||–||–||–|
|Train and test on White Race MESA||ML Risk Calculator||0.77±0.1 (0.65–0.85)||≤0.001||0.94±0.1 (0.93–0.96)||≤0.001||19||68||61||1139||0.93±0.1 (0.92–0.95)||0.002||0.41 (0.40–0.46)||≤0.001|
|Train on White Race MESA and test on FLEMENGHO||ACC/AHA Risk Calculator||0.48±0.1 (0.39–0.57)||–||0.82±0.1 (0.78–0.85)||–||57||103||53||463||0.76±0.1 (0.73–0.79)||–||–||–|
|Train on White Race MESA and test on FLEMENGHO||ML Risk Calculator||0.56±0.1 (0.47–0.66)||≤0.001||0.91±0.1 (0.88–0.93)||≤0.001||48||52||62||514||0.85±0.1 (0.82–0.88)||0.002||0.17 (0.14–0.20)||0.004|
|Train and test on White Race MESA||ACC/AHA Risk Calculator||0.73±0.1 (0.66–0.79)||–||0.59±0.1 (0.57–0.61)||–||50||938||137||1359||0.60±0.1 (0.58–0.62)||–||–||–|
|Train and test on White Race MESA||ML Risk Calculator||0.84±0.1 (0.78–0.89)||≤0.001||0.96±0.1 (0.95–0.97)||≤0.001||30||90||157||2207||0.95±0.1 (0.94–0.96)||0.001||0.48 (0.46–0.50)||≤0.001|
|Train on White Race MESA and test on FLEMENGHO||ACC/AHA Risk Calculator||0.63±0.1 (0.57–0.69)||–||0.69±0.1 (0.66–0.72)||–||98||337||167||746||0.68±0.1 (0.65–0.70)||–||–||–|
|Train on White Race MESA and test on FLEMENGHO||ML Risk Calculator||0.74±0.1 (0.68–0.79)||≤0.001||0.87±0.1 (0.85–0.89)||≤0.001||69||144||196||939||0.84±0.1 (0.82–0.86)||0.001||0.29 (0.27–0.31)||0.004|
Figure S4 illustrates the discrimination properties of the ML Risk Calculator compared with the ACC/AHA Risk Calculator for the “White Race” MESA and the FLEMENGHO cohorts. When tested on the FLEMENGHO data set, the ML Risk Calculator achieved an AUC of 0.81, whereas the ACC/AHA Risk Calculator achieved an AUC of 0.70. When tested on the “White Race” MESA cohort, the ML Risk Calculator trained on the same cohort achieved an average AUC of 0.91 and the ACC/AHA Risk Calculator achieved an AUC of 0.71, indicating that the ML models can more accurately classify those with and without an event.
In this report, we present a new ML‐based risk calculator which uses the same 9 traditional risk factors (ie, age, sex, ethnicity, total cholesterol, high‐density lipoprotein cholesterol, systolic blood pressure, treatment for hypertension, diabetes mellitus, and smoking) used by the ACC/AHA Risk Calculator. Despite using identical input, our ML Risk Calculator attained a significantly higher accuracy than the ACC/AHA Risk Calculator. It detected 13% more high‐risk individuals and recommended 25% less unnecessary statin therapy in low‐risk individuals. Unlike the ACC/AHA Risk Calculator, which is only designed for predicting “Hard CVD” events, our ML Risk Calculator performed well for predicting both Hard and All CVD events. Furthermore, the ML Risk Calculator performed well both for males and females, with NRI improvement values of 0.53 and 0.48 for males, and 0.45 and 0.54 for females, for “Hard CVD” and “All CVD” events, respectively.
To prevent methodological biases and overfitting, we applied a 2‐fold internal cross validation and subsequently tested the model on an independent external data set, FLEMENGHO. Moreover, to address the inherent problem of class imbalance, which is a common problem in many cohort‐based studies, we used the NEATER algorithm. Detailed technical analyses of our validation methodologies and the treatment of the class imbalance problem can be found in Data S1. Additionally, characteristics of the synthetic data generated by NEATER for the “Male White Race” MESA subgroup can be seen in Table S6.
We trained our ML model with and without statin users. The results of the sensitivity analysis with and without statin users were not significantly different. Refer to Table S1 for details.
We attribute the superior performance of our ML model to its flexibility and non‐linear function. ML maps the data into a multidimensional space where various separating planes are evaluated and ultimately a “hyperplane” is found. Additionally, the ability to train the ML model with artificially created events using data augmentation techniques such as NEATER can further empower ML over the traditional statistical methods. Our 2‐fold cross validation technique assured the independence of testing samples from the training samples. Overall, ML‐based prediction models are more versatile and capable than statistical models. As our ML model is exposed to more longitudinal data, including those from which ACC/AHA Risk Calculator was derived, we anticipate a more robust risk calculator. We also plan to introduce new predictor variables such as coronary calcium score and other biomarkers to our model that are expected to further improve its predictive power. Although 10‐year risk is the status quo for risk prediction, the ability to predict events in a shorter term (eg, 1‐year) is highly desired. Such a short‐term risk predictor can open doors for new prophylactic therapies. This development is the focus of the SHAPE initiative titled “Machine Learning Vulnerable Patient: Developing an Artificial Intelligence‐based Forecast System for Prediction of Heart Attacks within 12 Months”.
Study Strengths and Limitations
A major strength of our study is that we created our ML model based on a robust 13‐year follow up data set from MESA, which ranks as the best multiethnic study of atherosclerosis in the world. Unlike data in national registries, population surveys, or other healthcare management databases, MESA meets the highest standards of research quality data, which is key for developing reliable machine learning models. Another strength of our study is that we used human expertise with advanced knowledge of the field to supervise and fine‐tune the machine learning models. Yet another strength of our study is the use of oversampling techniques to maximize ML training. Finally, we validated our ML model both internally (2‐fold cross validation) and externally (testing on FLEMENGHO cohort).
As for limitations, although the MESA cohort comprises a large population of different ethnicities, it suffers from a low event rate in subgroups, which in turn limits the predictive power and generalizability of our ML Risk Calculator. Because of such a low number of events, our ML models may not be reliable in other populations or other countries. Although the external validation results were promising, FLEMENGHO is not a multiethnic cohort; therefore, our ML model needs to be validated in other multiethnic data sets. Without validating our model across a large number of US and international cohorts, we are unable to claim performance equal to what we have seen in MESA and FLEMENGHO. Another limitation is that MESA's age range is 45 to 84 years, which limits the applicability of our ML Risk Calculator for prediction of events in individuals who fall outside this age range. Moreover, the impact of other risk factors or biomarkers in the prediction of cardiovascular events was not considered in this study. Furthermore, we did not refit the ACC/AHA Risk Calculator to the MESA data set, which could likely have increased its accuracy in MESA. We decided not to refit because we aimed to compare the ML model with the exact model recommended by the ACC/AHA Pooled Cohort Equations.
Finally, we would like to reiterate that the major limitation of our ML method is that it was created and validated based on only 2 data sets (MESA and FLEMENGHO), while the ACC/AHA Risk Calculator was derived from several data sets. To this end, further studies are underway to validate these findings in other large multiethnic and multinational cohorts.
In conclusion, we developed a new ML Risk Calculator based on MESA, a multiethnic, community‐based cohort of men and women studied for incident atherosclerotic cardiovascular disease. We used the same variables as the ACC/AHA Risk Calculator yet achieved a much higher predictive accuracy. Further studies are underway to validate this new ML Risk Calculator in other cohorts. As we introduce more data to our ML Risk Calculator, particularly data on cases in which events occurred weeks or months after data collection rather than years or decades, the “holy grail” of short‐term CVD risk prediction may be within reach.
Sources of Funding
The work of Dr Kakadiaris and Dr Vrigkas has been funded in part by the UH Hugh Roy and Lillie Cranz Cullen Endowment Fund. The MESA study was supported by contracts N01‐HC‐95159, N01‐HC‐95160, N01‐HC‐95161, N01‐HC‐95162, N01‐HC‐95163, N01‐HC‐95164, N01‐HC‐95165, N01‐HC‐95166, N01‐HC‐95167, N01‐HC‐95168, and N01‐HC‐95169 from the National Heart, Lung, and Blood Institute, and by grants UL1‐TR‐000040, UL1‐TR‐001079, and UL1‐RR‐025005 from the National Center for Research Resources.
This study was conducted as part of an international initiative led by the Society for Heart Attack Prevention and Eradication (SHAPE) to advance CVD risk assessment. The authors acknowledge the scientific advice of SHAPE's Scientific Advisory Board and thank Ahmed Gul for his administrative assistance with references and formatting. The authors also thank the other investigators, the staff, and the participants of the MESA study for their valuable contributions. A full list of participating MESA investigators and institutions can be found at http://www.mesa-nhlbi.org.
- 1 Benjamin EJ, Blaha MJ, Chiuve SE, Cushman M, Das SR, Deo R, de Ferranti SD, Floyd J, Fornage M, Gillespie C, Isasi CR, Jiménez MC, Jordan LC, Judd SE, Lackland D, Lichtman JH, Lisabeth L, Liu S, Longenecker CT, Mackey RH, Matsushita K, Mozaffarian D, Mussolino ME, Nasir K, Neumar RW, Palaniappan L, Pandey DK, Thiagarajan RR, Reeves MJ, Ritchey M, Rodriguez CJ, Roth GA, Rosamond WD, Sasson C, Towfighi A, Tsao CW, Turner MB, Virani SS, Voeks JH, Willey JZ, Wilkins JT, Wu JH, Alger HM, Wong SS, Muntner P; American Heart Association Statistics Committee and Stroke Statistics Subcommittee. Heart disease and stroke statistics‐2017 update: a report from the American Heart Association. Circulation. 2017; 135:e146–e603.
- 2 Goff DC, Lloyd‐Jones DM, Bennett G, Coady S, D'Agostino RB, Gibbons R, Greenland P, Lackland DT, Levy D, O'Donnell CJ, Robinson JG, Schwartz JS, Shero ST, Smith SC, Sorlie P, Stone NJ, Wilson PW, Jordan HS, Nevo L, Wnek J, Anderson JL, Halperin JL, Albert NM, Bozkurt B, Brindis RG, Curtis LH, DeMets D, Hochman JS, Kovacs RJ, Ohman EM, Pressler SJ, Sellke FW, Shen WK, Tomaselli GF; American College of Cardiology/American Heart Association Task Force on Practice Guidelines. 2013 ACC/AHA guideline on the assessment of cardiovascular risk: a report of the American College of Cardiology/American Heart Association Task Force on Practice Guidelines. Circulation. 2014; 129:S49–S73.
- 3 Ridker PM, Cook NR. Statins: new American guidelines for prevention of cardiovascular disease. Lancet. 2013; 382:1762–1765.
- 4 Muntner P, Colantonio LD, Cushman M, Goff DC, Howard G, Howard VJ, Kissela B, Levitan EB, Lloyd‐Jones DM, Safford MM. Validation of the atherosclerotic cardiovascular disease pooled cohort risk equations. JAMA. 2014; 311:1406–1415.
- 5 Kavousi M, Leening MJ, Nanchen D, Greenland P, Graham IM, Steyerberg EW, Ikram MA, Stricker BH, Hofman A, Franco OH. Comparison of application of the ACC/AHA guidelines, Adult Treatment Panel III guidelines, and European Society of Cardiology guidelines for cardiovascular disease prevention in a European Cohort. JAMA. 2014; 311:1416–1423.
- 6 DeFilippis AP, Young R, Carrubba CJ, McEvoy JW, Budoff MJ, Blumenthal RS, Kronmal RA, McClelland RL, Nasir K, Blaha MJ. An analysis of calibration and discrimination among multiple cardiovascular risk scores in a modern multiethnic cohort. Ann Intern Med. 2015; 162:266–275.
- 7 DeFilippis AP, Young R, McEvoy JW, Michos ED, Sandfort V, Kronmal RA, McClelland RL, Blaha MJ. Risk score overestimation: the impact of individual cardiovascular risk factors and preventive therapies on the performance of the American Heart Association‐American College of Cardiology‐Atherosclerotic Cardiovascular Disease risk score in a modern multi‐ethnic cohort. Eur Heart J. 2017; 38:598–608.
- 8 Naghavi M, Falk E, Hecht HS, Jamieson MJ, Kaul S, Berman D, Fayad Z, Budoff MJ, Rumberger J, Naqvi TZ, Shaw LJ, Faergeman O, Cohn J, Bahr R, Koenig W, Demirovic J, Arking D, Herrera VLM, Badimon J, Goldstein JA, Rudy Y, Airaksinen J, Schwartz RS, Riley WA, Mendes RA, Douglas P, Shah PK. From vulnerable plaque to vulnerable patient—Part III: executive summary of the screening for heart attack prevention and education (SHAPE) task force report. Am J Cardiol. 2006; 98:2–15.
- 9 Sajda P. Machine learning for detection and diagnosis of disease. Annu Rev Biomed Eng. 2006; 8:537–565.
- 10 Parthiban L, Subramanian R. Intelligent heart disease prediction system using CANFIS and genetic algorithm. Int J Biol Life Sci. 2007; 3:157–160.
- 11 Kourou K, Exarchos TP, Exarchos KP, Karamouzis MV, Fotiadis DI. Machine learning applications in cancer prognosis and prediction. Comput Struct Biotechnol J. 2015; 13:8–17.
- 12 Vidyasagar M. Identifying predictive features in drug response using machine learning: opportunities and challenges. Annu Rev Pharmacol Toxicol. 2015; 55:15–34.
- 13 Vock DM, Wolfson J, Bandyopadhyay S, Adomavicius G, Johnson PE, Vazquez‐Benitez G, O'Connor PJ. Adapting machine learning techniques to censored time‐to‐event health record data: a general‐purpose approach using inverse probability of censoring weighting. J Biomed Inform. 2016; 61:119–131.
- 14 Araki T, Ikeda N, Shukla D, Jain PK, Londhe ND, Shrivastava VK, Banchhor SK, Saba L, Nicolaides A, Shafique S, Laird JR, Suri JS. PCA‐based polling strategy in machine learning framework for coronary artery disease risk assessment in intravascular ultrasound: a link between carotid and coronary grayscale plaque morphology. Comput Methods Programs Biomed. 2016; 128:137–158.
- 15 Deo RC. Machine learning in medicine. Circulation. 2015; 132:1920–1930.
- 16 Motwani M, Dey D, Berman DS, Germano G, Achenbach S, Al‐Mallah MH, Andreini D, Budoff MJ, Cademartiri F, Callister TQ, Chang HJ, Chinnaiyan K, Chow BJ, Cury RC, Delago A, Gomez M, Gransar H, Hadamitzky M, Hausleiter J, Hindoyan N, Feuchtner G, Kaufmann PA, Kim YJ, Leipsic J, Lin FY, Maffei E, Marques H, Pontone G, Raff G, Rubinshtein R, Shaw LJ, Stehli J, Villines TC, Dunning A, Min JK, Slomka PJ. Machine learning for prediction of all‐cause mortality in patients with suspected coronary artery disease: a 5‐year multicentre prospective registry analysis. Eur Heart J. 2017; 38:500–507.
- 17 Ambale‐Venkatesh B, Yang X, Wu CO, Liu K, Hundley WG, McClelland RL, Gomes AS, Folsom AR, Shea S, Guallar E, Bluemke DA, Lima JA. Cardiovascular event prediction by machine learning: the multi‐ethnic study of atherosclerosis. Circ Res. 2017; 121:1092–1101.
- 18 Bild DE, Bluemke DA, Burke GL, Detrano R, Diez Roux AV, Folsom AR, Greenland P, Jacob DR, Kronmal R, Liu K, Nelson JC, O'Leary D, Saad MF, Shea S, Szklo M, Tracy RP. Multi‐ethnic study of atherosclerosis: objectives and design. Am J Epidemiol. 2002; 156:871–881.
- 19 Stolarz‐Skrzypek K, Kuznetsova T, Thijs L, Tikhonoff V, Seidlerová J, Richart T, Jin Y, Olszanecka A, Malyutina S, Casiglia E, Filipovský J, Kawecka‐Jaszcz K, Nikitin Y, Staessen JA. Fatal and nonfatal outcomes, incidence of hypertension, and blood pressure changes in relation to urinary sodium excretion. JAMA. 2011; 305:1777–1785.
- 20 Stone NJ, Robinson JG, Lichtenstein AH, Bairey Merz CN, Blum CB, Eckel RH, Goldberg AC, Gordon D, Levy D, Lloyd‐Jones DM, McBride P, Schwartz JS, Shero ST, Smith SC, Watson K, Wilson PWF. 2013 ACC/AHA guideline on the treatment of blood cholesterol to reduce atherosclerotic cardiovascular risk in adults: a report of the American College of Cardiology/American Heart Association Task Force on Practice Guidelines. J Am Coll Cardiol. 2014; 63:2889–2934.
- 21 Almogahed BA, Kakadiaris IA. Neater: filtering of over‐sampled data using non‐cooperative game theory. Soft Comput. 2015; 19:3301–3322.
- 22 Cristianini N, Shawe‐Taylor J. An Introduction to Support Vector Machines and Other Kernel‐Based Learning Methods. Cambridge, United Kingdom: Cambridge University Press; 2000.
- 23 Jabeen A, Ahmad N, Raza K. Machine learning‐based state‐of‐the‐art methods for the classification of RNA‐seq data. In: Dey N, Ashour AS, Borra S, eds. Classification in BioApps: Automation of Decision Making. Cham: Springer International Publishing; 2018:133–172.
- 24 Melacci S, Belkin M. Laplacian support vector machines trained in the primal. J Mach Learn Res. 2011; 12:1149–1184.
- 25 Baesens B, Van Gestel T, Viaene S, Stepanova M, Suykens J, Vanthienen J. Benchmarking state‐of‐the‐art classification algorithms for credit scoring. J Oper Res Soc. 2003; 54:627–635.
- 26 Xiangying W, Yixin Z. Statistical learning theory and state of the art in SVM. Proceedings of IEEE International Conference on Cognitive Informatics. 2003:55–59.
- 27 Boser BE, Guyon IM, Vapnik VN. A training algorithm for optimal margin classifiers. Proceedings of 5th Annual Workshop on Computational Learning Theory. 1992:144–152.
- 28 Weng SF, Reps J, Kai J, Garibaldi JM, Qureshi N. Can machine‐learning improve cardiovascular risk prediction using routine clinical data? PLoS One. 2017; 12:e0174944.
- 29 Isler Y, Narin A, Ozer M. Comparison of the effects of cross‐validation methods on determining performances of classifiers used in diagnosing congestive heart failure. Measurement Science Review. 2015; 15:196–201.
- 30 Wiens AD, Inan OT. Accelerometer body sensor network improves systolic time interval assessment with wearable ballistocardiography. Proceedings of Annual International Conference of the IEEE Engineering in Medicine and Biology Society. 2015:1833–1836.
- 31 Alcaraz R, Martínez A, Rieta JJ. Role of the P‐wave high frequency energy and duration as noninvasive cardiovascular predictors of paroxysmal atrial fibrillation. Comput Methods Programs Biomed. 2015; 119:110–119.
- 32 Chang C‐C, Lin C‐J. LIBSVM: a library for support vector machines. ACM Trans Intell Syst Technol. 2011; 2:1–27.