
Machine Learning–Based Risk Assessment for Cancer Therapy–Related Cardiac Dysfunction in 4300 Longitudinal Oncology Patients

Originally published in Journal of the American Heart Association. 2020;9:e019628. https://doi.org/10.1161/JAHA.120.019628

Abstract

Background

The growing awareness of cardiovascular toxicity from cancer therapies has led to the emerging field of cardio‐oncology, which centers on preventing, detecting, and treating patients with cardiac dysfunction before, during, or after cancer treatment. Early detection and prevention of cancer therapy–related cardiac dysfunction (CTRCD) play important roles in precision cardio‐oncology.

Methods and Results

This retrospective study included 4309 cancer patients between 1997 and 2018 whose laboratory tests and cardiovascular echocardiographic variables were collected from the Cleveland Clinic institutional electronic medical record database (Epic Systems). Among these patients, 1560 (36%) were diagnosed with at least 1 type of CTRCD, and 838 (19%) developed CTRCD after cancer therapy (de novo). We posited that machine learning algorithms can be implemented to predict CTRCDs in cancer patients according to clinically relevant variables. Classification models were trained and evaluated for 6 types of cardiovascular outcomes, including coronary artery disease (area under the receiver operating characteristic curve [AUROC], 0.821; 95% CI, 0.815–0.826), atrial fibrillation (AUROC, 0.787; 95% CI, 0.782–0.792), heart failure (AUROC, 0.882; 95% CI, 0.878–0.887), stroke (AUROC, 0.660; 95% CI, 0.650–0.670), myocardial infarction (AUROC, 0.807; 95% CI, 0.799–0.816), and de novo CTRCD (AUROC, 0.802; 95% CI, 0.797–0.807). Model generalizability was further confirmed using time‐split data. Model inspection revealed several clinically relevant variables significantly associated with CTRCDs, including age, hypertension, glucose levels, left ventricular ejection fraction, creatinine, and aspartate aminotransferase levels.

Conclusions

This study suggests that machine learning approaches offer powerful tools for cardiac risk stratification in oncology patients by utilizing large‐scale, longitudinal patient data from healthcare systems.

Nonstandard Abbreviations and Acronyms

AUPR

area under the precision‐recall curve

AUROC

area under the receiver operating characteristic curve

CTRCD

cancer therapy–related cardiac dysfunction

GB

gradient tree boosting

LR

logistic regression

ML

machine learning

RF

random forest

SMOTE

Synthetic Minority Oversampling Technique

SVM

support vector machine

Clinical Perspective

What Is New?

  • This study presents the first, large‐scale machine learning–based approach to evaluate complications between cancer therapies and cardiovascular diseases using cardiovascular echocardiographic and laboratory test variables from over 4300 longitudinal cancer patients.

  • We developed machine learning models with high performance and verified the generalizability using time‐split data to simulate real‐world scenarios and found that combining both laboratory test and echocardiographic variables resulted in the highest performance.

  • We identified and validated multiple clinically relevant variables associated with cancer therapy–related cardiac dysfunction using learned weight analysis of the optimal machine learning models.

What Are the Clinical Implications?

  • We demonstrate the potential clinical implication of using a machine learning method to predict 6 types of cancer therapy–related cardiac dysfunction, including heart failure, atrial fibrillation, coronary artery disease, myocardial infarction, stroke, and de novo cancer therapy–related cardiac dysfunction.

  • These machine learning models offer potential tools for risk assessment of cancer therapy–related cardiac dysfunction in cardio‐oncology clinical practices.

Cardiovascular disease (CVD) is the leading cause of death and the second leading cause of morbidity in cancer survivors after recurrent malignancy in the United States.1 Comorbidity between CVD and cancer suggests underlying shared disease pathogeneses, which can be both genetic and environmental. One critical issue regarding environmental factors is that CVD can be associated with various treatments for cancer itself. First recognized in the 1960s,2 cancer therapy–related cardiac dysfunction (CTRCD) has been increasingly diagnosed and investigated.3, 4, 5, 6, 7, 8 For example, a growing number of cancer survivors (>5 million) are at risk for cardiotoxicity caused by anthracycline therapy received years or even decades earlier for various types of cancer.9

Through the success of basic and translational research, cancer survivors have become one of the largest growing subsets of patients in the US healthcare system.10 Currently, there are over 16.9 million cancer survivors in the United States. This number is projected to reach more than 22.1 million by 2030.11 Increasing numbers of oncology patients are facing CTRCD risks as cancer survival improves. The growing awareness of cardiovascular toxicity by cancer treatment has led to the emerging field of cardio‐oncology, which centers on preventing, detecting, and treating patients with cardiovascular toxicity from cancer treatment. However, precise prediction and prevention of cardiovascular toxicity in individual cancer patients or survivors has proven elusive. Further, while basic and translational research studies continue, experimental assays in animal models are limited by significant functional disparities between animal and human cardiomyocytes. Development of novel methodologies or tools, such as computational approaches, would offer unique opportunities for cardio‐oncology by utilizing the accumulated longitudinal clinical data available from healthcare systems.

In recent years, machine learning (ML) has been increasingly used for cardiovascular studies, such as for the prediction of drug‐induced cardiovascular complications,12, 13 cardiac resynchronization therapy response prediction,14 risk assessment of cardiovascular events after acute myocardial infarction (MI),15, 16 and claims data–based mortality risk predictions.17 As more longitudinal clinical data are accumulated for oncology patients, ML presents a great opportunity to use these data to build predictive models in clinical practices.18, 19

In this study, we hypothesized that supervised ML models could accurately predict the risk for developing several cardiovascular outcomes in cancer patients. Specifically, we applied ML models to the prediction of 6 types of cardiac outcomes, namely heart failure (HF), atrial fibrillation (AF), coronary artery disease (CAD), MI, stroke, and de novo CTRCD. We also determined several clinically relevant variables associated with these outcomes.

Methods

All data used in this study are available from the corresponding author on reasonable request and the approval of the institutional review board. The code can be found at https://github.com/ChengF-Lab/CO-ML.

Study Design

Figure 1 shows the overview of the study design. We integrated both cardiovascular echocardiographic and laboratory testing variables from over 4300 longitudinal cancer patients. We developed and evaluated ML models to assist in the risk assessment of CTRCDs. We systematically tested 5 classification methods: k‐nearest neighbors, logistic regression (LR), support vector machine (SVM), random forest (RF), and gradient tree boosting (GB). For the feature sets, we tested: (1) laboratory tests only, (2) echocardiography only, and (3) laboratory tests and echocardiography combined. The generalizability of these models was verified by time‐based data split. We also interrogated the final models to uncover clinically relevant variables associated with CTRCDs using learned weight analysis.

Figure 1. Overview of the study design.

We integrated both cardiovascular echocardiographic and laboratory testing variables from over 4300 longitudinal cancer patients for the prediction of 6 outcomes, including heart failure (HF), atrial fibrillation (AF), coronary artery disease (CAD), myocardial infarction (MI), stroke, and de novo cancer therapy–related cardiac dysfunction (CTRCD). We systematically tested 5 classification methods: k‐nearest neighbors (k‐NN), logistic regression (LR), support vector machine (SVM), random forest (RF), and gradient tree boosting (GB). For the feature sets, we tested laboratory test variables only, echocardiographic variables only, and laboratory test and echocardiographic variables combined.

Study Population and Data Preparation

This study was reviewed and approved by the institutional review board and the patients gave informed consent. We extracted the clinical data of over 4600 oncology patients receiving cancer therapies from our institutional electronic medical health record database. All adult patients with cancer referred to the cardio‐oncology service at the Cleveland Clinic from 1997 to 2018 were included. Five outcomes, including HF, AF, CAD, MI, and stroke, were extracted using International Classification of Diseases, Ninth and Tenth Revision (ICD‐9, ICD‐10), diagnosis codes and were manually checked by looking at patient charts on EPIC for accuracy (Epic Systems Corporation). Both inpatient and outpatient codes were included in this study. An additional outcome, de novo CTRCD, was also examined in this study. According to the diagnosis date of these 5 cardiac events, we identified the cardiac events that were diagnosed before cancer therapy as preexisting cardiac events and those after as de novo CTRCD. All variables were collected per patient based on the entirety of all available data. All patients had 2 sets of clinical variables: laboratory tests and echocardiographic variables. Laboratory test results included variables such as estimated glomerular filtration rate, glycated hemoglobin, glucose, calcium, total protein, and many others. Echocardiographic data included variables such as left ventricular ejection fraction, left ventricular end‐systolic volume index, and left ventricular end‐diastolic volume index. Since available echocardiographic data were longitudinal, we extracted several features for each echocardiographic variable: maximum of all follow‐ups, minimum of all follow‐ups, slope of all follow‐ups, maximum increase within 3 months, and maximum decrease within 3 months (see Table S1 for a list of the variables). Finally, clinical variables were used as features to build ML models among 6 types of cardiovascular outcomes. 
After removing patients with >6 missing variables, the final data set contained 4309 patients (see Table 1 for the characteristics of the cohort).
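The five per-variable longitudinal features described above (maximum, minimum, slope, and the largest increase and decrease within 3 months) can be sketched as follows. This is an illustration under stated assumptions, not the study's code: the function name `longitudinal_features` is hypothetical, "3 months" is taken as 90 days, the slope is an ordinary least-squares fit of value against days since the first measurement, and the increase and decrease are computed between any two measurements that fall within the window.

```python
from datetime import date

def longitudinal_features(visits, window_days=90):
    """Summarize one echocardiographic variable from (date, value) pairs."""
    visits = sorted(visits)
    days = [(d - visits[0][0]).days for d, _ in visits]
    values = [v for _, v in visits]
    n = len(values)

    # Ordinary least-squares slope of value versus time (units per day).
    mean_t = sum(days) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in days)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(days, values))
    slope = sxy / sxx if sxx > 0 else 0.0

    # Largest rise and largest drop between an earlier and a later
    # measurement taken no more than `window_days` apart (assumed definition).
    max_inc, max_dec = 0.0, 0.0
    for i in range(n):
        for j in range(i + 1, n):
            if days[j] - days[i] <= window_days:
                change = values[j] - values[i]
                max_inc = max(max_inc, change)
                max_dec = max(max_dec, -change)

    return {
        "max": max(values),
        "min": min(values),
        "slope": slope,
        "max_increase_3mo": max_inc,
        "max_decrease_3mo": max_dec,
    }

# Made-up left ventricular ejection fraction readings for one patient.
lvef = [
    (date(2015, 1, 10), 60.0),
    (date(2015, 3, 1), 52.0),
    (date(2015, 9, 15), 55.0),
    (date(2016, 2, 1), 50.0),
]
features = longitudinal_features(lvef)
```

In this toy series only the first two readings are within 90 days of each other, so the 3-month decrease is the 8-point drop between them and the 3-month increase stays at zero.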


Table 1. Characteristics of the Entire Cardio‐Oncology Cohort

Variables                           Cohort (N=4309)
Basic characteristics
  Age, y                            61.1±13.7*
  Sex
    Female                          2552 (59)
    Male                            1757 (41)
  Body mass index, kg/m²            28.3±7.3
  Tobacco use                       2162 (50)
  Alcohol use                       1995 (48)
  Family history                    1548 (36)
Comorbidity characteristics
  Hypertension                      2450 (57)
  Hyperlipidemia                    1877 (44)
  Diabetes mellitus                 974 (23)
  Chest pain                        1724 (40)
  Shortness of breath               1523 (35)
  Fatigue                           2202 (51)
Cardiac outcomes
  CTRCD                             1560 (36)
  HF                                596 (14)
  AF                                653 (15)
  CAD                               673 (16)
  MI                                193 (4)
  Stroke                            275 (6)
  Preexisting CVD                   722 (17)
  de novo CTRCD                     838 (19)
Cancer therapy
  Chemotherapy                      4011 (93)
  Radiation                         1969 (46)
  Chemotherapy and radiation        1780 (41)
  Anthracycline                     1764 (41)
  Cyclophosphamide                  1567 (36)
  Trastuzumab                       822 (19)

AF indicates atrial fibrillation; CAD, coronary artery disease; CTRCD, cancer therapy–related cardiac dysfunction; CVD, cardiovascular disease; HF, heart failure; and MI, myocardial infarction.

*Continuous variables are reported as mean±SD.

Categorical variables are reported as number (percentage).

Classifier Development and Evaluation

Our first goal was to identify the optimal classification method and feature set combination. To do this, we systematically tested all of the combinations of 5 classification methods and 3 feature sets. For each outcome, we adopted a training‐validation‐test procedure, repeated 100 times. In each iteration, all patients were randomly split into a training set (81%), a validation set (9%), and a test set (10%). The training and validation sets were used in a grid search (Table S2) to identify the optimal hyperparameters for each classification method and feature set combination. Then, these 2 sets were merged and used to train the final model with the optimal hyperparameters, and the final model was evaluated using the test set. See Figure S1 for the detailed workflow of method and feature selection. All classification models were trained using the Python package scikit‐learn.20 We tested the effect of balancing the data sets using Synthetic Minority Oversampling Technique (SMOTE) implemented in the Python package imbalanced‐learn.21
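One iteration of the training, validation, and test procedure can be sketched with scikit-learn as below. This is a minimal sketch, not the study's pipeline: the data are synthetic, the helper name `one_iteration` is hypothetical, the grid over the regularization strength `C` stands in for the grid in Table S2, and standardizing the features before logistic regression is an assumption.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def one_iteration(X, y, seed, c_grid=(0.01, 0.1, 1.0, 10.0)):
    # Hold out 10% as the test set, then 10% of the remaining 90%
    # (9% of all patients) as the validation set; 81% is left for training.
    X_rest, X_test, y_rest, y_test = train_test_split(
        X, y, test_size=0.10, random_state=seed)
    X_train, X_val, y_train, y_val = train_test_split(
        X_rest, y_rest, test_size=0.10, random_state=seed)

    def fit(C, X_fit, y_fit):
        # Scaling is an assumption; it is not stated in the text.
        model = make_pipeline(StandardScaler(),
                              LogisticRegression(C=C, max_iter=1000))
        return model.fit(X_fit, y_fit)

    # Grid search: keep the C with the best validation AUROC.
    best_C = max(c_grid, key=lambda C: roc_auc_score(
        y_val, fit(C, X_train, y_train).predict_proba(X_val)[:, 1]))

    # Merge training and validation sets, refit, score once on the test set.
    final = fit(best_C, X_rest, y_rest)
    test_auroc = roc_auc_score(y_test, final.predict_proba(X_test)[:, 1])
    return best_C, test_auroc, (len(y_train), len(y_val), len(y_test))

# Synthetic stand-in for the cohort: 1000 patients, about 20% with the outcome.
X, y = make_classification(n_samples=1000, n_features=20,
                           weights=[0.8, 0.2], random_state=0)
results = [one_iteration(X, y, seed) for seed in range(5)]  # the study repeated 100 times
aurocs = np.array([r[1] for r in results])
```

Averaging `aurocs` over the repeats gives the kind of mean and SD reported per outcome; scoring the test set only once per repeat keeps it out of the hyperparameter choice.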

To test the generalizability of our ML models, we adopted a time‐based data split strategy to simulate real‐world scenarios, in which models used to predict new patients (external validation set) are built on data from the past. Specifically, we selected January 1, 2017 (2017.1.1) as the cutoff time point, as it produced subsequent test sets with reasonable sizes. Patients who received cancer therapies before 2017.1.1 were used as the training set, and those who received cancer therapies after 2017.1.1 were used as the test set. The detailed workflow of this strategy is provided in Figure S2.
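The time-based split amounts to partitioning patients by therapy start date around the cutoff. A small sketch follows; the record layout and the field name `therapy_start` are hypothetical, and assigning a patient who started exactly on the cutoff date to the test set is an assumption, since the text only says "before" and "after".

```python
from datetime import date

CUTOFF = date(2017, 1, 1)

def time_split(patients, cutoff=CUTOFF):
    """Train on patients treated before the cutoff, test on the rest."""
    train = [p for p in patients if p["therapy_start"] < cutoff]
    # Patients starting on the cutoff date itself go to the test set (assumption).
    test = [p for p in patients if p["therapy_start"] >= cutoff]
    return train, test

# Made-up records for illustration only.
patients = [
    {"id": 1, "therapy_start": date(2009, 6, 3)},
    {"id": 2, "therapy_start": date(2016, 12, 31)},
    {"id": 3, "therapy_start": date(2017, 1, 1)},
    {"id": 4, "therapy_start": date(2018, 4, 20)},
]
train, test = time_split(patients)
```

Unlike a random split, no patient in the test set was treated earlier than any training patient, which mimics applying a model built on past data to new patients.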

Model Criteria to Determine Predictive Variables

Next, we sought to understand which clinically relevant variables were significantly associated with CTRCD and further contributed to the high performance of ML models. We examined the weights of the 100 final LR models for each outcome. LR learns a weight for each feature, and the prediction is the sum of the products of each weight and feature pair, squashed using a sigmoid function. We identified the clinically relevant variables based on 2 criteria: (1) the absolute coefficient of variation (the ratio of SD to mean) was low, to ensure small fluctuation of the weight across the 100 repeats; (2) the absolute weight compared with the extremum weight for that outcome (relative weight) was high. We used 0.5 and 0.3 as the 2 cutoffs:

$$\left|\frac{\sigma(w_i)}{\mu(w_i)}\right| < 0.5, \qquad \frac{|\mu(w_i)|}{\max_{j \in T}\left[\operatorname{sgn}(\mu(w_i))\,\mu(w_j)\right]} > 0.3$$

where T denotes the feature set, w_i denotes the learned weight for feature i, μ(w_i) and σ(w_i) denote the mean and SD of w_i across the 100 repeats, and sgn is the sign function.
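The two selection criteria can be sketched in a few lines of Python. This is one reading of the description rather than the authors' code: the function name `stable_strong_features` is hypothetical, the sample SD is used, and the "extremum weight" is taken to be the largest mean weight with the same sign as the feature being tested.

```python
from statistics import mean, stdev

def stable_strong_features(weights, cv_cutoff=0.5, rw_cutoff=0.3):
    """weights maps each feature to its learned weights across the repeats."""
    mu = {f: mean(w) for f, w in weights.items()}
    sd = {f: stdev(w) for f, w in weights.items()}
    selected = {}
    for f in weights:
        if mu[f] == 0:
            continue  # coefficient of variation is undefined for a zero mean
        cv = abs(sd[f] / mu[f])
        sign = 1 if mu[f] > 0 else -1
        # Extremum in the same direction as this feature's mean weight.
        extremum = max(sign * m for m in mu.values())
        rw = abs(mu[f]) / extremum
        if cv < cv_cutoff and rw > rw_cutoff:
            selected[f] = {"cv": cv, "relative_weight": rw}
    return selected

# Made-up weights from three repeats (the study used 100).
weights = {
    "age": [0.9, 1.0, 1.1],      # stable and large: kept
    "lvef": [-0.6, -0.5, -0.7],  # stable, largest negative weight: kept
    "noisy": [0.4, -0.3, 0.2],   # sign flips between repeats: dropped
    "weak": [0.05, 0.05, 0.06],  # stable but small relative weight: dropped
}
selected = stable_strong_features(weights)
```

Because the extremum is taken separately for positive and negative weights, a protective variable such as a higher ejection fraction is compared with the strongest protective weight rather than with the strongest risk-increasing one.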

To verify the clinically relevant variables uncovered by examining the LR weights, we tested the hazard ratios (95% CIs) of the clinically relevant variables for the de novo CTRCD. The Wald χ2 test was used to evaluate the variables with statistically significant coefficients. In addition, the log‐rank test was used for global significance evaluation. The hazard analyses were performed with the survival (v2.44‐1.1) and survminer (v0.4.6) packages on R 3.6.1.

Statistical Analysis

To evaluate the performance of ML models, we used 2 metrics: area under the receiver operating characteristic curve (AUROC) and area under the precision‐recall curve (AUPR). AUROC and AUPR were computed using the metrics.roc_auc_score and metrics.average_precision_score functions from the scikit‐learn Python package. For the comparison of the performances of the laboratory test and echocardiographic feature sets, we applied a 2‐sided paired sample t test using the AUROCs of the test sets from 100 iterations. P<0.05 was considered statistically significant. The t test was performed using the stats.ttest_rel function from the SciPy Python package.22 We applied χ2 test for the categorical variables to verify their associations with the outcomes. Kolmogorov‐Smirnov test was used for the continuous variables. These 2 statistical analyses were performed by stats.chi2_contingency and stats.ks_2samp from the SciPy Python package.
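The metric and test functions named above can be exercised on toy data as follows. The labels, scores, and per-iteration AUROCs are synthetic values chosen only to show the calls; they are not study results.

```python
import numpy as np
from scipy import stats
from sklearn import metrics

# Toy test set: 1 = patient developed the outcome, 0 = did not.
y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])  # predicted probabilities

# 3 of the 4 positive/negative pairs are ranked correctly, so AUROC is 0.75.
auroc = metrics.roc_auc_score(y_true, y_score)
# Average precision is the AUPR estimate used by scikit-learn.
aupr = metrics.average_precision_score(y_true, y_score)

# Paired comparison of two feature sets across the same 100 random splits.
rng = np.random.default_rng(0)
echo_auroc = 0.85 + 0.01 * rng.standard_normal(100)              # synthetic
lab_auroc = echo_auroc - 0.12 + 0.01 * rng.standard_normal(100)  # synthetic
t_stat, p_value = stats.ttest_rel(echo_auroc, lab_auroc)
```

The paired form of the t test fits this design because both feature sets are evaluated on the same test split in each iteration, so the per-iteration differences are what carry the comparison.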

Results

Overview of the Classifier Performance

In this study, we built a large, longitudinal cardio‐oncology cohort with 4309 oncology patients collected from our institutional electronic medical record database (Table 1). The median age was 61.1 years (interquartile range [IQR], 53.8–70.5 years) for the overall population. Six types of cardiac events, including HF (n=596), AF (n=653), CAD (n=673), MI (n=193), stroke (n=275), and de novo CTRCD (n=838), were evaluated. In total, 1560 (36%) patients had at least 1 type of diagnosed cardiac event, among whom 722 (17%) patients had preexisting cardiac events/disease before cancer therapy, while 838 (19%) patients developed de novo CTRCD afterward. Among all of the patients, 4011 (93%) were treated with chemotherapy and 1969 (46%) were treated with radiation. For chemotherapy, 1764 (41%) patients were treated with anthracycline drugs (including doxorubicin, idarubicin, daunorubicin, and epirubicin), 1567 (36%) were treated with cyclophosphamide, and 822 (19%) patients were treated with trastuzumab. A list of all therapies can be found in Table S3. Two sets of clinical variables were used to build the ML models: laboratory tests (such as estimated glomerular filtration rate, glycated hemoglobin, glucose, calcium, and total protein) and echocardiographic variables (such as left ventricular ejection fraction, left ventricular end‐diastolic volume index, and left ventricular end‐systolic volume index). Table S1 lists all of the variables used in this study.

We conducted a systematic evaluation of 5 ML algorithms (k‐nearest neighbors, LR, SVM, RF, and GB) and 3 feature sets (laboratory tests only, echocardiography only, or both combined). The average performance and SD for each outcome based on the 100 iterations are listed in Table S4 (AUROC) and Table S5 (AUPR). LR, RF, and GB achieved the first‐tier performance, followed by SVM, then k‐nearest neighbors. Although LR, RF, and GB performed similarly, LR achieved the highest AUROC for 5 outcomes and, for HF, an AUROC comparable to that of GB, which performed best for that outcome. LR was therefore selected as the optimal classification method for all further analyses.

Figure 2 shows the overall performance for LR models. The AUROCs were 0.882 (95% CI, 0.878–0.887) for HF, 0.787 (95% CI, 0.782–0.792) for AF, 0.821 (95% CI, 0.815–0.826) for CAD, 0.807 (95% CI, 0.799–0.816) for MI, 0.660 (95% CI, 0.650–0.670) for stroke, and 0.802 (95% CI, 0.797–0.807) for de novo CTRCD. All AUPRs were at least 2‐fold of their respective baselines of random classifiers. The precision‐recall curve shows the trade‐off between precision and recall: here, precision is the fraction of patients predicted to have the disease who actually developed it, and recall is the fraction of all patients who developed the disease who were correctly predicted. For a random classifier, precision is constant across all recall levels (a horizontal line in the precision‐recall plot), which gives a baseline AUPR equal to the percentage of patients with the outcome in the cohort. The AUPRs compared with their respective baselines were 0.651 (95% CI, 0.641–0.661) versus 0.138 for HF, 0.401 (95% CI, 0.392–0.411) versus 0.151 for AF, 0.481 (95% CI, 0.469–0.492) versus 0.156 for CAD, 0.220 (95% CI, 0.206–0.234) versus 0.045 for MI, 0.138 (95% CI, 0.131–0.146) versus 0.064 for stroke, and 0.592 (95% CI, 0.583–0.601) versus 0.234 for de novo CTRCD.
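The fold improvement over the random-classifier baseline follows directly from the AUPR and baseline pairs reported above. The short calculation below uses only those reported values, plus the HF count and cohort size from Table 1 to show that the baseline is the outcome prevalence; the variable names are illustrative.

```python
# (AUPR, random-classifier baseline) as reported for each outcome.
reported = {
    "HF": (0.651, 0.138),
    "AF": (0.401, 0.151),
    "CAD": (0.481, 0.156),
    "MI": (0.220, 0.045),
    "stroke": (0.138, 0.064),
    "de novo CTRCD": (0.592, 0.234),
}

# Fold improvement of each model over its baseline.
fold = {outcome: aupr / base for outcome, (aupr, base) in reported.items()}

# The baseline is the prevalence of the outcome, e.g. 596 HF cases of 4309.
hf_baseline = round(596 / 4309, 3)

for outcome, value in fold.items():
    print(f"{outcome}: {value:.2f}-fold over baseline")
```

Stroke has the smallest margin (about 2.2-fold) and MI the largest (about 4.9-fold), which shows why AUPR is read against prevalence rather than against a fixed value such as 0.5.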

Figure 2. Performances for the 6 outcomes in receiver operating characteristic (A through F) and precision‐recall (G through L) curves using logistic regression and the combined feature set.

For each subplot, light‐colored lines correspond to the 100 iterations; the saturated‐colored line is the average of the 100 iterations; background indicates mean±SD; the grey dotted line indicates the baseline of a random classifier. The area under the receiver operating characteristic curves (AUROCs) and area under the precision‐recall curves (AUPRs) shown are the averages. AF indicates atrial fibrillation; CAD, coronary artery disease; CTRCD, cancer therapy–related cardiac dysfunction; HF, heart failure; and MI, myocardial infarction.

Combining Echocardiographic and Laboratory Test Variables Showed the Best Performance

Next, we examined the complementary effect of different feature sets on the model performance. Based on the 100 iterations, we found that while echocardiographic or laboratory test variables alone were predictive, inclusion of both types of data synergistically improved performance of the models (Figure 3 and Figure S3). Moreover, we showed that laboratory test and echocardiographic features performed differently among the outcomes (2‐sided paired t test). Echocardiographic features outperformed laboratory test features for HF (0.854 versus 0.729, P<0.001), MI (0.766 versus 0.746, P=0.003), and de novo CTRCD (0.742 versus 0.733, P=0.04). Laboratory test features outperformed echocardiographic features for AF (0.760 versus 0.700, P<0.001), CAD (0.797 versus 0.702, P<0.001), and stroke (0.656 versus 0.617, P<0.001). In summary, combining both echocardiographic and laboratory test variables showed the best performance.

Figure 3. Comparison of the performances of laboratory test and echocardiographic feature sets.

A through F, When using the combined feature set, the models outperformed those that used either feature set individually. A, D, and F, Echocardiographic features showed significantly better performances for heart failure (HF), myocardial infarction (MI), and de novo cancer therapy–related cardiac dysfunction (CTRCD) than laboratory test. B, C, and E, Laboratory test features significantly outperformed echocardiographic features for atrial fibrillation (AF), coronary artery disease (CAD), and stroke. P values were calculated using 2‐sided paired sample t test. AUROC indicates area under the receiver operating characteristic curve; and AUPR, area under the precision‐recall curve.

Generalizability of the Models

An important aspect of ML models is real‐world generalizability. The patients were further split by dates—those with cancer therapy start dates before 2017.1.1 (see Methods) as the training set and those with start dates after 2017.1.1 as the test set. The results show that for all 6 outcomes, the AUROCs ranged from 0.913 for HF to 0.656 for MI (Figure 4 and Table S6). All AUPRs were higher than their corresponding baselines as well (Figure S4), indicating high generalizability of ML models in the prediction of CTRCD for new patients in real‐world clinical practices.

Figure 4. Evaluation of the model generalizability using time‐split data.

The receiver operating characteristic curve for each outcome is shown. Dotted line indicates the theoretical baseline performance of a random classifier. Patients were split by the date January 1, 2017. Patients who received cancer therapies before this date were used for model training, and patients who received cancer therapies after this date comprised the test sets. Logistic regression and the combined feature set were used. All models achieved moderate to high performances, suggesting a high generalizability of the models. AF indicates atrial fibrillation; AUROC, area under the receiver operating characteristic curve; CAD, coronary artery disease; CTRCD, cancer therapy–related cardiac dysfunction; HF, heart failure; and MI, myocardial infarction.

Clinical Interpretability of the Models

We next interrogated what the LR models learned from the data to determine associations between clinical variables and the CTRCD outcomes. We examined the model weights of the 100 final models for each outcome. Using the mean and SD of the weight, we derived 2 metrics, coefficient of variation and relative weight (see Methods), to identify the features that have stable and relatively large absolute weights throughout the 100 iterations. Figure 5A shows the 23 variables that were predictive of at least 1 cardiovascular outcome; the actual values of the weights in the LR models can be found in Table S7. Age was most predictive for all 6 outcomes, followed by hypertension and left ventricular ejection fraction, which were also predictive for the 6 outcomes. The predictive variables for each outcome can be found in Table S8. Using Cox proportional hazards model analysis for de novo CTRCD, left ventricular ejection fraction and risk factors such as sex, age, and hypertension were verified as predictive based on their hazard ratios (Figure 5B). The distributions of the 23 variables among the patients further illustrated the clinical relevance of the variables uncovered by LR model weight analysis (Figure 5C and 5D, Figures S5 and S6).

Figure 5. Clinically relevant variables uncovered by weight examination of the final logistic regression models.

A, Twenty‐three predictive variables for at least 1 outcome (marked by an “X” in the grid). Color gradient indicates that, as the value of the variable increases, the risk for the outcome increases (red) or decreases (green). B, Cox proportional hazards model analysis was performed for de novo cancer therapy–related cardiac dysfunction (CTRCD), which verified the clinically relevant variables using the machine learning method. C, Distributions of 6 continuous variables by the outcomes (P values were computed by Kolmogorov‐Smirnov test). D, Distributions of 5 categorical variables (P values were computed by χ2 test). +/− indicates whether the patients have the symptoms (row) or the outcomes (column). AF indicates atrial fibrillation; AST, aspartate aminotransferase; CAD, coronary artery disease; HF, heart failure; LVEF, left ventricular ejection fraction; LVESVi, left ventricular end‐systolic volume index; and MI, myocardial infarction.

Impact of Cancer Treatment Types on the Models

We next examined whether cancer treatment information can affect the model performances by conducting 2 separate experiments.

In the first experiment, we sought to determine whether our models could be applied to patients receiving specific types of cancer treatment. We generated 5 subpopulations (Table 1) based on whether the patients were treated with the following cancer therapies: (1) chemotherapy, (2) radiation therapy, (3) chemotherapy and radiation therapy, (4) anthracycline, and (5) trastuzumab. We also found high AUROCs in the prediction of de novo CTRCD across the different types of cancer therapy (Figure 6). Specifically, the AUROCs were 0.779 (95% CI, 0.771–0.787) for anthracycline and 0.764 (95% CI, 0.746–0.783) for trastuzumab.

Figure 6. Performances for de novo cancer therapy–related cardiac dysfunction for patients with different cancer therapies.

A through F, Receiver operating characteristic curves. G through L, Precision‐recall curves. F and L, The model performances using all of the patients with de novo cancer therapy–related cardiac dysfunction (CTRCD) as comparison. For each subplot, light‐colored lines correspond to the 100 iterations; saturated‐colored line is the average of the 100 iterations; background indicates mean±SD; grey dotted line indicates the baseline of a random classifier. The area under the receiver operating characteristic curves (AUROCs) and area under the precision‐recall curves (AUPRs) shown are the averages.

In the second experiment, we examined whether cancer therapy information used as features can improve model performances. We included 4 additional categorical features: the usage of chemotherapy, radiation, anthracycline, or trastuzumab. We found that incorporating treatment information yielded only a marginal, statistically nonsignificant improvement in model performance (AUROC: 0.805 versus 0.802; P>0.1, t test) (Figure S7).

Discussion

In this study, we built predictive ML models for cardiac risk assessment among 6 types of cardiovascular outcomes, including HF, AF, CAD, MI, stroke, and de novo CTRCD. Based on 100 model iterations, all outcomes achieved moderate to high AUROCs, ranging from 0.882 for HF to 0.660 for stroke (Figure 2). In addition, models built using time‐split data demonstrated a high generalizability of our models for potential clinical implementation (Figure 4).

By comparing the model performances using different feature sets, we found that both laboratory test variables and echocardiographic variables contributed to the overall high performance. When laboratory test data were used alone, all outcomes still achieved moderate to high AUROCs (Figure 3), with 5 of the AUROCs >0.7 and 1 at 0.66. In addition, by comparing the performances of laboratory test and echocardiographic feature sets, we found that for HF, MI, and de novo CTRCD, echocardiographic features significantly outperformed laboratory test features. For AF, CAD, and stroke, laboratory test features performed better than echocardiographic features (Figure 3 and Figure S3). These highly predictive models offer potential approaches for cardio‐oncology clinical practice. Oncologists referred these patients to the cardio‐oncology services based on professional assessment of clinical factors such as cardiac symptoms, preexisting cardiac diseases, or cardiovascular risk factors. The models trained on laboratory test data could assist in referral decisions, with or without incorporation of echocardiographic data.

To understand which specific variables contributed to model performance, we examined the learned weights for the features (Figure 5, Figures S5 and S6). We found that increased creatinine level was associated with high risk of cancer treatment–associated HF. In the general population, creatinine elevation in patients with HF is associated with increased mortality.23 Creatinine is the metabolic product of creatine that is excreted in the urine.24 An elevated glucose level is commonly found in patients with acute MI.25 Studies have also shown that high glucose level is associated with high mortality risk in patients with MI.26 Our results showed that a higher glucose level was associated with higher risks of cancer treatment–associated MI. Other risk factors, such as sex, hypertension, and age, were also verified. Men have a higher risk of heart disease than women.27, 28, 29, 30, 31 Age is a well‐known CVD risk factor,32, 33, 34 and it was identified for all 6 outcomes. Hypertension is another strong risk factor for many types of CVDs.35, 36, 37 To summarize, by looking at the learned weights of the LR models, we uncovered the clinically relevant variables that were strong predictors for the CTRCD outcomes in the oncology cohort.

The skewness of the cardiovascular events in the data sets, especially in MI and stroke, could negatively affect the performances. Therefore, we tested this issue using SMOTE.38 As shown in Figure S8, LR did not benefit from the resampling. The resampling marginally improved the performance of other methods for certain outcomes, such as k‐nearest neighbors for MI, SVM for MI, and SVM for AF. However, the improved models still did not outperform LR. We also experimented with stacking the output of these models. We found that stacking LR, RF, and GB achieved a marginal improvement compared with using LR alone (Figure S9; HF, AF, and stroke). In summary, these observations suggest a low risk of data skewness in our current models, especially for LR models; yet, potential further improvements by combining techniques such as stacking and resampling, and perhaps by a meta‐classifier trained using the output of the models, are achievable in the future.
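Stacking LR, RF, and GB with a meta-classifier trained on their outputs could look like the sketch below. It assumes scikit-learn's `StackingClassifier` with a logistic regression meta-classifier and 5-fold internal cross-validation; the text does not say how the stacking was actually implemented, and the data here are synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for one outcome (about 15% positives).
X, y = make_classification(n_samples=600, n_features=15,
                           weights=[0.85, 0.15], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

stack = StackingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("gb", GradientBoostingClassifier(random_state=0)),
    ],
    # Meta-classifier trained on out-of-fold probabilities of the base models.
    final_estimator=LogisticRegression(max_iter=1000),
    stack_method="predict_proba",
    cv=5,
)
stack.fit(X_train, y_train)
stacked_auroc = roc_auc_score(y_test, stack.predict_proba(X_test)[:, 1])
```

Using out-of-fold predictions for the meta-classifier keeps the base models' training-set fit from leaking into the second stage, which matters when the expected gain over LR alone is marginal.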

Our future work includes several directions. First, we will continue to improve the models as more data are gathered, since we noticed marginally increased model performances when the training sizes increased (Figure S10), suggesting the importance of large‐scale cohorts for ML studies. Our models may also be improved with a more model‐specific variable selection procedure to further reduce risk of “overfitting.” When we tested the effect of limiting variables to a certain period (ie, variable collected within 1, 5, and 10 years of the first diagnosis for the outcome), we found that the models performed similarly, although certain outcomes may be slightly improved (Figure S11). Second, we are actively incorporating imaging data39, 40 directly using convolutional neural networks to improve performance of models further. Third, we plan to integrate ML‐based risk assessment with online tools for use in clinical practice.

Limitations

We acknowledge several potential limitations in the current study. First, because of the retrospective nature of this study and the potential risk of patient selection bias, model performance may be overestimated for real‐world use, even though model generalizability was evaluated with time‐split data as the external validation set. Although each ICD‐9/10 diagnosis code was manually reviewed by a physician for accuracy, potential errors in ICD‐9/10 codes may influence the performance of the ML models. In addition, while our models can output a probability for each outcome, they have not been explicitly programmed to predict risk levels. This could be considered in the next iteration of the models, in which a system of risk‐based tertiles or quartiles could potentially be implemented based on our data.41, 42
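
One way such a tertile or quartile system could be derived from model output is sketched below. This is an illustration only, assuming cut points are taken from the empirical quantiles of the predicted probabilities; the function name `risk_strata` and the probabilities are hypothetical and are not study data.

```python
from bisect import bisect_right
from statistics import quantiles


def risk_strata(probabilities, n_groups=4):
    """Assign each predicted probability to a risk group (1 = lowest risk).

    Cut points are the empirical quantiles of the supplied probabilities,
    so n_groups=3 gives tertiles and n_groups=4 gives quartiles.
    """
    cuts = quantiles(probabilities, n=n_groups, method="inclusive")
    # A probability equal to a cut point falls into the higher group.
    return [bisect_right(cuts, p) + 1 for p in probabilities]


# Hypothetical predicted probabilities for eight patients.
predicted = [0.05, 0.10, 0.20, 0.30, 0.40, 0.60, 0.80, 0.90]
quartile_groups = risk_strata(predicted, n_groups=4)
tertile_groups = risk_strata(predicted, n_groups=3)
```

In practice the cut points would likely be fixed on a reference cohort and then applied unchanged to new patients, so that a given probability always maps to the same risk level.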

We did not include feature interactions as additional features for the LR modeling. Some risk factors are known to interact with others, such as sex and diabetes mellitus.43 A potential improvement would be to include these interactions as features. However, doing so could introduce a large number of features and potentially increase the risk of model overfitting. In addition, some of the classification methods we evaluated have the capacity to capture such interactions, but they did not outperform LR.
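
The trade‐off described above can be made concrete with a short sketch: pairwise interaction terms are products of feature pairs, and their number grows quadratically with the number of features. The helper `add_pairwise_interactions` and the feature names are hypothetical, chosen only to mirror the sex and diabetes mellitus example.

```python
from itertools import combinations
from math import comb


def add_pairwise_interactions(row):
    """Return a copy of row (feature name -> value) with all pairwise products added."""
    expanded = dict(row)
    for a, b in combinations(sorted(row), 2):
        expanded[f"{a}*{b}"] = row[a] * row[b]
    return expanded


# Hypothetical patient record with three features.
patient = {"sex_male": 1, "diabetes": 1, "age_scaled": 0.8}
expanded = add_pairwise_interactions(patient)

# Number of pairwise terms that 100 base features would add.
n_terms_for_100_features = comb(100, 2)
```

Three base features yield only three interaction terms, but 100 base features would yield 4950, which illustrates why adding all interactions can raise the risk of overfitting.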

We were able to identify several clinically relevant variables that were stable, strong predictors of the outcomes. However, this method cannot reveal all of the relevant factors. When 2 features are linearly related (multicollinearity), their final learned weights may fluctuate and will depend on the initial randomization of the weights. These features will have high absolute coefficients of variation, and their contributions to the observed outcomes cannot easily be inferred using this method.
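
The stability check described above can be sketched as follows, assuming the weights from repeated training runs have been collected per feature. The weights, feature names, and the cutoff `CV_THRESHOLD` are hypothetical values chosen for illustration; the study does not report them here.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Return the sample standard deviation divided by the absolute mean."""
    m = mean(values)
    if m == 0:
        return float("inf")
    return stdev(values) / abs(m)


# Hypothetical learned weights from five repeated fits (not study data).
# feature_a and feature_b are assumed to be collinear: their weights swing
# between runs while their sum stays near 0.55.
weights_by_run = {
    "age": [0.52, 0.50, 0.49, 0.51, 0.48],
    "feature_a": [0.90, -0.40, 0.10, 0.75, -0.60],
    "feature_b": [-0.35, 0.95, 0.45, -0.20, 1.15],
}

CV_THRESHOLD = 0.5  # assumed cutoff for calling a weight "stable"
cv = {name: coefficient_of_variation(w) for name, w in weights_by_run.items()}
stable = sorted(name for name, value in cv.items() if value <= CV_THRESHOLD)
```

Here only `age` would be flagged as a stable predictor. The two collinear features are excluded even though their combined contribution is consistent, which is the limitation noted in the text.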

Last, although we applied L2 regularization when training the LR models, the models could still potentially overfit. To overcome this, we could filter the features to remove irrelevant ones, which could be performed through variance analysis, mutual information, or L1 regularization.
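
As an illustration of the simplest of these filters, the sketch below flags features whose variance falls at or below a cutoff. It is a minimal example under stated assumptions: the helper `low_variance_features`, the threshold, and the feature values are hypothetical, and a meaningful threshold depends on how each feature is scaled.

```python
from statistics import pvariance


def low_variance_features(columns, threshold=0.01):
    """Return the names of features whose population variance is <= threshold."""
    return sorted(
        name for name, values in columns.items() if pvariance(values) <= threshold
    )


# Hypothetical feature columns for five patients (not study data).
columns = {
    "constant_flag": [1, 1, 1, 1, 1],
    "nearly_constant": [1.00, 1.00, 1.01, 1.00, 1.00],
    "creatinine": [0.8, 1.1, 1.4, 0.9, 2.0],
}
to_drop = low_variance_features(columns)
kept = sorted(name for name in columns if name not in to_drop)
```

A variance filter ignores the outcome entirely, so it can only remove uninformative constants; mutual information or L1 regularization would be needed to remove features that vary but are unrelated to the outcome.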

Conclusions

ML models were built for each of 6 CTRCD outcomes for the oncology population based on a systematic evaluation of 5 classification methods and 3 feature sets. These models showed moderate to high performance and real‐world generalizability using time‐split data. We found that laboratory test and echocardiographic variables were each associated with different outcomes. We uncovered several clinically relevant variables associated with CTRCD, offering potential predictive factors and biomarkers for cardio‐oncology clinical practice. Future versions of our models can include risk stratification in tertiles or quartiles to help with clinical decision‐making and to impact patient outcomes. To this end, we are currently developing free online outcomes and risk calculators that integrate our models for shared decision‐making. Our findings suggest that ML tools hold promise for cardiac risk assessment for patients before, during, or after cancer treatment by integrating large‐scale, longitudinal patient data from healthcare systems.

Sources of Funding

This work was supported by the National Heart, Lung, and Blood Institute of the National Institutes of Health under award numbers K99 HL138272 and R00 HL138272 to F.C.

Disclosures

None.

Footnotes

* Correspondence to: Patrick Collier, MD, PhD, FESC, Sydell and Arnold Miller Family Heart and Vascular Institute, Cleveland Clinic, Cleveland, OH; or Feixiong Cheng, PhD, Lerner Research Institute, Cleveland Clinic, Cleveland, OH.

*Dr Zhou, Dr Hou, and Dr Hussain contributed equally to this work.

Supplementary Material for this article is available at https://www.ahajournals.org/doi/suppl/10.1161/JAHA.120.019628


References

  • 1 Curtin SC. Trends in cancer and heart disease death rates among adults aged 45–64: United States, 1999–2017. Natl Vital Stat Rep. 2019; 68:1–8.
  • 2 Tan C, Tasaka H, Yu KP, Murphy ML, Karnofsky DA. Daunomycin, an antitumor antibiotic, in the treatment of neoplastic disease. Clinical evaluation with special reference to childhood leukemia. Cancer. 1967; 20:333–353.
  • 3 Steinherz LJ. Cardiac toxicity 4 to 20 years after completing anthracycline therapy. JAMA. 1991; 266:1672–1677.
  • 4 Hancock SL, Hoppe RT, Tucker MA. Factors affecting late mortality from heart disease after treatment of Hodgkin’s disease. JAMA. 1993; 270:1949–1955.
  • 5 Fedele P, Orlando L, Schiavone P, Ciccarese M, Forcignanò RC, Calvani N, Marino A, Nacci A, Sponziello F, Mazzoni E, et al. Clinical outcomes and cardiac safety of continuous antiHer2 therapy in c‐erbB2‐positive metastatic breast cancer patients. J Chemother. 2013; 25:369–375.
  • 6 Hahn VS, Lenihan DJ, Ky B. Cancer therapy‐induced cardiotoxicity: basic mechanisms and potential cardioprotective therapies. J Am Heart Assoc. 2014; 3:e000665. DOI: 10.1161/JAHA.113.000665.
  • 7 Pituskin E, Mackey JR, Koshman S, Jassal D, Pitz M, Haykowsky MJ, Pagano JJ, Chow K, Thompson RB, Vos LJ, et al. Multidisciplinary approach to novel therapies in cardio‐oncology research (MANTICORE 101‐Breast): a randomized trial for the prevention of trastuzumab‐associated cardiotoxicity. J Clin Oncol. 2017; 35:870–877.
  • 8 Lee J, Hur H, Lee JW, Youn HJ, Han K, Kim NW, Jung SY, Kim Z, Kim KS, Lee MH, et al. Long‐term risk of congestive heart failure in younger breast cancer survivors: a nationwide study by the SMARTSHIP group. Cancer. 2020; 126:181–188.
  • 9 Brown SA, Sandhu N, Herrmann J. Systems biology approaches to adverse drug effects: the example of cardio‐oncology. Nat Rev Clin Oncol. 2015; 12:718–731.
  • 10 Siegel RL, Miller KD, Jemal A. Cancer statistics, 2019. CA Cancer J Clin. 2019; 69:7–34.
  • 11 Miller KD, Nogueira L, Mariotto AB, Rowland JH, Yabroff KR, Alfano CM, Jemal A, Kramer JL, Siegel RL. Cancer treatment and survivorship statistics, 2019. CA Cancer J Clin. 2019; 69:363–385.
  • 12 Cai C, Fang J, Guo P, Wang Q, Hong H, Moslehi J, Cheng F. In silico pharmacoepidemiologic evaluation of drug‐induced cardiovascular complications using combined classifiers. J Chem Inf Model. 2018; 58:943–956.
  • 13 Cai C, Guo P, Zhou Y, Zhou J, Wang Q, Zhang F, Fang J, Cheng F. Deep learning‐based prediction of drug‐induced cardiotoxicity. J Chem Inf Model. 2019; 59:1073–1084.
  • 14 Feeny AK, Rickard J, Patel D, Toro S, Trulock KM, Park CJ, Labarbera MA, Varma N, Niebauer MJ, Sinha S, et al. Machine learning prediction of response to cardiac resynchronization therapy: Improvement versus current guidelines. Circ Arrhythm Electrophysiol. 2019; 12:e007316. DOI: 10.1161/CIRCEP.119.007316.
  • 15 Wang Y, Li J, Zheng X, Jiang Z, Hu S, Wadhera RK, Bai X, Lu J, Wang Q, Li Y, et al. Risk factors associated with major cardiovascular events 1 year after acute myocardial infarction. JAMA Netw Open. 2018; 1:e181079.
  • 16 Xu B, Kocyigit D, Grimm R, Griffin BP, Cheng F. Applications of artificial intelligence in multimodality cardiovascular imaging: a state‐of‐the‐art review. Prog Cardiovasc Dis. 2020; 63:367–376.
  • 17 Krumholz HM, Coppi AC, Warner F, Triche EW, Li SX, Mahajan S, Li Y, Bernheim SM, Grady J, Dorsey K, et al. Comparative effectiveness of new approaches to improve mortality risk models from Medicare claims data. JAMA Netw Open. 2019; 2:e197314.
  • 18 Keane PA, Topol EJ. With an eye to AI and autonomous diagnosis. NPJ Digit Med. 2018; 1:40.
  • 19 Rajkomar A, Dean J, Kohane I. Machine learning in medicine. N Engl J Med. 2019; 380:1347–1358.
  • 20 Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, et al. Scikit‐learn: machine learning in Python. J Mach Learn Res. 2011; 12:2825–2830.
  • 21 Lemaître G, Nogueira F, Aridas CK. Imbalanced‐learn: a Python toolbox to tackle the curse of imbalanced datasets in machine learning. J Mach Learn Res. 2017; 18:1–5.
  • 22 Virtanen P, Gommers R, Oliphant TE, Haberland M, Reddy T, Cournapeau D, Burovski E, Peterson P, Weckesser W, Bright J, et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat Methods. 2020; 17:261–272.
  • 23 Smith GL, Vaccarino V, Kosiborod M, Lichtman JH, Cheng S, Watnick SG, Krumholz HM. Worsening renal function: what is a clinically meaningful change in creatinine during hospitalization with heart failure? J Card Fail. 2003; 9:13–25.
  • 24 Perrone RD, Madias NE, Levey AS. Serum creatinine as an index of renal function: new insights into old concepts. Clin Chem. 1992; 38:1933–1953.
  • 25 Kosiborod M. Blood glucose and its prognostic implications in patients hospitalised with acute myocardial infarction. Diabetes Vasc Dis Res. 2008; 5:269–275.
  • 26 Ishihara M. Acute hyperglycemia in patients with acute myocardial infarction. Circ J. 2012; 76:563–571.
  • 27 Kannel WB, Hjortland MC, McNamara PM, Gordon T. Menopause and risk of cardiovascular disease: the Framingham study. Ann Intern Med. 1976; 85:447–452.
  • 28 Njølstad I, Arnesen E, Lund‐Larsen PG. Smoking, serum lipids, blood pressure, and sex differences in myocardial infarction: a 12‐year follow‐up of the Finnmark study. Circulation. 1996; 93:450–456.
  • 29 Vitale C, Fini M, Speziale G, Chierchia S. Gender differences in the cardiovascular effects of sex hormones. Fundam Clin Pharmacol. 2010; 24:675–685.
  • 30 Dallongeville J, De Bacquer D, Heidrich J, De Backer G, Prugger C, Kotseva K, Montaye M, Amouyel P. Gender differences in the implementation of cardiovascular prevention measures after an acute coronary event. Heart. 2010; 96:1744–1749.
  • 31 De Smedt D, De Bacquer D, De Sutter J, Dallongeville J, Gevaert S, De Backer G, Bruthans J, Kotseva K, Reiner Ž, Tokgözoğlu L, et al. The gender gap in risk factor control: effects of age and education on the control of cardiovascular risk factors in male and female coronary patients. The EUROASPIRE IV study by the European Society of Cardiology. Int J Cardiol. 2016; 209:284–290.
  • 32 Castelli WP. Epidemiology of coronary heart disease: the Framingham study. Am J Med. 1984; 76:4–12.
  • 33 Rich‐Edwards JW, Manson JE, Hennekens CH, Buring JE. The primary prevention of coronary heart disease in women. N Engl J Med. 1995; 332:1758–1766.
  • 34 Dhingra R, Vasan RS. Age as a risk factor. Med Clin North Am. 2012; 96:87–91.
  • 35 Wu CY, Hu HY, Chou YJ, Huang N, Chou YC, Li CP. High blood pressure and all‐cause and cardiovascular disease mortalities in community‐dwelling older adults. Medicine (Baltimore). 2015; 94:e2160.
  • 36 Stevens SL, Wood S, Koshiaris C, Law K, Glasziou P, Stevens RJ, McManus RJ. Blood pressure variability and cardiovascular disease: systematic review and meta‐analysis. BMJ. 2016; 354:i4098.
  • 37 Kjeldsen SE. Hypertension and cardiovascular risk: general aspects. Pharmacol Res. 2018; 129:95–99.
  • 38 Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP. SMOTE: synthetic minority over‐sampling technique. J Artif Intell Res. 2002; 16:321–357.
  • 39 Henglin M, Stein G, Hushcha PV, Snoek J, Wiltschko AB, Cheng S. Machine learning approaches in cardiovascular imaging. Circ Cardiovasc Imaging. 2017; 10:e005614.
  • 40 Zhang J, Gajjala S, Agrawal P, Tison GH, Hallock LA, Beussink‐Nelson L, Lassen MH, Fan E, Aras MA, Jordan CR, et al. Fully automated echocardiogram interpretation in clinical practice: feasibility and diagnostic accuracy. Circulation. 2018; 138:1623–1635.
  • 41 Kang Y, Assuncao BL, Denduluri S, McCurdy S, Luger S, Lefebvre B, Carver J, Scherrer‐Crosbie M. Symptomatic heart failure in acute leukemia patients treated with anthracyclines. JACC CardioOncol. 2019; 1:208–217.
  • 42 Abdel‐Qadir H, Thavendiranathan P, Austin PC, Lee DS, Amir E, Tu JV, Fung K, Anderson GM. Development and validation of a multivariable prediction model for major adverse cardiovascular events after early stage breast cancer: a population‐based cohort study. Eur Heart J. 2019; 40:3913–3920.
  • 43 Wakabayashi I. Gender differences in cardiovascular risk factors in patients with coronary artery disease and those with type 2 diabetes. J Thorac Dis. 2017; 9:E503–E506.
