Deep Learning to Predict Cardiac Magnetic Resonance–Derived Left Ventricular Mass and Hypertrophy From 12-Lead ECGs
Circulation: Cardiovascular Imaging
Abstract
Background:
Classical methods for detecting left ventricular (LV) hypertrophy (LVH) using 12-lead ECGs are insensitive. Deep learning models using ECG to infer cardiac magnetic resonance (CMR)-derived LV mass may improve LVH detection.
Methods:
Within 32 239 individuals of the UK Biobank prospective cohort who underwent CMR and 12-lead ECG, we trained a convolutional neural network to predict CMR-derived LV mass using 12-lead ECGs (left ventricular mass-artificial intelligence [LVM-AI]). In independent test sets (UK Biobank [n=4903] and Mass General Brigham [MGB, n=1371]), we assessed correlation between LVM-AI predicted and CMR-derived LV mass and compared LVH discrimination using LVM-AI versus traditional ECG-based rules (ie, Sokolow-Lyon, Cornell, lead aVL rule, or any ECG rule). In the UK Biobank and an ambulatory MGB cohort (MGB outcomes, n=28 612), we assessed associations between LVM-AI predicted LVH and incident cardiovascular outcomes using age- and sex-adjusted Cox regression.
Results:
LVM-AI predicted LV mass correlated with CMR-derived LV mass in both test sets, although correlation was greater in the UK Biobank (r=0.79) versus MGB (r=0.60, P<0.001 for both). When compared with any ECG rule, LVM-AI demonstrated similar LVH discrimination in the UK Biobank (LVM-AI c-statistic 0.653 [95% CI, 0.608–0.698] versus any ECG rule c-statistic 0.618 [95% CI, 0.574–0.663], P=0.11) and superior discrimination in MGB (0.621 [95% CI, 0.592–0.649] versus 0.588 [95% CI, 0.564–0.611], P=0.02). LVM-AI-predicted LVH was associated with incident atrial fibrillation, myocardial infarction, heart failure, and ventricular arrhythmias.
Conclusions:
Deep learning-inferred LV mass estimates from 12-lead ECGs correlate with CMR-derived LV mass, associate with incident cardiovascular disease, and may improve LVH discrimination compared to traditional ECG rules.
Introduction
See Editorial by Faro and Sengupta
Left ventricular (LV) hypertrophy (LVH) is defined as pathologically increased LV mass1 and predicts adverse cardiovascular events including atrial fibrillation (AF)2 and heart failure (HF).3 ECGs are common, inexpensive, and have been used to infer the presence of LVH for decades using amplitude-based rules.4,5 Yet, studies consistently demonstrate that ECG-based LVH rules have limited sensitivity.6 Cardiac magnetic resonance (CMR) provides accurate and reproducible quantification of cardiac structure and now represents the gold-standard for LVH diagnosis.7
Deep learning architectures are a subset of machine learning algorithms capable of modeling multiple nonlinear interactions present within complex data.8 A potential role for deep learning on clinical data is to leverage information present within data types available at scale to infer rich structural features typically available only through complex, expensive, or invasive diagnostics.9 CMR is costly, time-consuming, and not universally available. Conversely, ECG is inexpensive, ubiquitous, and may contain sufficiently rich information to infer cardiac structure.
Recent work has demonstrated that LV mass estimation using deep learning on 12-lead ECG is feasible,9,10 but previous studies have utilized echocardiogram-based LV mass,9,10 were not designed to assess for associations between predicted LV mass and incident cardiovascular outcomes, and developed models within modestly sized and retrospectively ascertained health care–related datasets, which are subject to selection bias and may have limited generalizability.
In this study, we analyzed a unique dataset of over 35 000 individuals in the UK Biobank prospective cohort study who underwent acquisition of both 12-lead ECG and CMR and trained a deep learning model to infer CMR-derived LV mass using 12-lead ECG (left ventricular mass-artificial intelligence [LVM-AI]). We then compared the performance of LVM-AI to established ECG-based criteria4,5 for LVH diagnosis in independent test sets from the UK Biobank and an external healthcare system (Mass General Brigham, MGB) and assessed whether LVM-AI predicted LVH was associated with incident cardiovascular events.
Methods
Data Availability
UK Biobank data are publicly available by application (www.ukbiobank.ac.uk). MGB data contain protected health information and cannot be shared publicly. The code underlying LVM-AI is accessible at https://github.com/broadinstitute/ml4h/tree/master/model_zoo/left_ventricular_mass_from_ecg_student_and_mri_teacher.
Derivation Sample
The UK Biobank is a prospective cohort of 502 629 participants recruited between 2006 and 2010.11 Briefly, ≈9.2 million individuals aged 40 to 69 living within 25 miles of the 22 assessment centers in England, Wales, and Scotland were invited, and 5.4% participated in the baseline assessment. Extensive questionnaires, physical measures, and biological samples were collected at recruitment, with multimodal imaging obtained in a large subset. All participants are followed for health outcomes through linkage to national datasets. Participants provided written informed consent. The UK Biobank was approved by the UK Biobank Research Ethics Committee (reference number 11/NW/0382). Use of UK Biobank (application 7089) and MGB data was approved by the local MGB Institutional Review Board.
Baseline Assessment
For this analysis, we included all individuals who underwent both resting 12-lead ECG and CMR contemporaneously during the UK Biobank imaging assessment. Demographics including age, sex, and race, and physical measurements including height, weight, and body mass index (BMI) were obtained at the imaging assessment or at the most closely preceding study visit.
The UK Biobank CMR protocol has been described previously.12 Briefly, all CMRs were acquired on a clinical wide-bore 1.5 Tesla scanner (MAGNETOM Aera, Syngo Platform VD13A, Siemens Healthcare, Erlangen, Germany) and used balanced steady-state free precession with typical parameters.
Data Processing
Resting 12-lead ECG data were downloaded as XML files and converted into arrays of lead amplitude sequence data. CMR images were downloaded as DICOM slabs and converted into arrays of 3-dimensional voxel data. We utilized a validated deep learning model (ML4Hseg)13 to extract LV mass from the CMR images, which served as the ground truth for LVM-AI. A conceptual overview of the study is shown in Figure 1.
LVM-AI
After setting aside a random sample (n=4903) as an internal test set (UK Biobank Test, Figure 2), we trained LVM-AI within 32 239 individuals with paired CMR and 12-lead ECG. LVM-AI is a one-dimensional convolutional neural network designed to infer LV mass using 12-lead ECG (Figure I in the Data Supplement). LVM-AI was provided with the entire 10 seconds of the 12-lead ECG waveform as well as participant age, sex, and BMI. Given the clinical importance of diagnosing LVH (ie, elevated LV mass), LVM-AI utilized a loss function giving additional weight to errors at the high extreme of the LV mass distribution.14
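The idea of a loss that penalizes errors more heavily at the high end of the LV mass distribution can be sketched as follows. This is a minimal illustration, not the published LVM-AI loss: the log-cosh base is borrowed from the unweighted comparison model described in the sensitivity analyses, and the `threshold` and `high_weight` parameters are hypothetical choices made only for this example.

```python
import numpy as np


def logcosh(error):
    """Numerically stable log(cosh(error)), evaluated element-wise."""
    abs_err = np.abs(error)
    # Identity: log(cosh(x)) = |x| + log(1 + exp(-2|x|)) - log(2)
    return abs_err + np.log1p(np.exp(-2.0 * abs_err)) - np.log(2.0)


def weighted_logcosh_loss(y_true, y_pred, threshold, high_weight=2.0):
    """Mean log-cosh loss with extra weight on targets above `threshold`.

    `threshold` and `high_weight` are illustrative assumptions; the actual
    weighting scheme used to train LVM-AI is not specified in the text.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    weights = np.where(y_true > threshold, high_weight, 1.0)
    return float(np.mean(weights * logcosh(y_pred - y_true)))


# Same absolute error (10 g) on a low and a high LV mass target.
unweighted = weighted_logcosh_loss([80.0, 200.0], [90.0, 190.0], threshold=150.0, high_weight=1.0)
weighted = weighted_logcosh_loss([80.0, 200.0], [90.0, 190.0], threshold=150.0, high_weight=2.0)
```

With `high_weight=1.0` the function reduces to an ordinary mean log-cosh loss, which makes the two training regimes directly comparable in a sketch like this.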
ECG Rules
We sought to compare LVM-AI to the lead aVL (R wave in lead aVL >1.1 mV),4 Sokolow and Lyon,5 and Cornell voltage4 rules for diagnosing LVH using 12-lead ECG (Table I in the Data Supplement). Although ECG rules adjusted for additional clinical factors have been proposed,15 we focused on the original rules since they are commonly used.6 However, in secondary analyses we assessed the performance of a Cornell voltage product adjusted for age, sex, BMI, and hypertension.15
To apply ECG rules, we extracted lead-specific R and S wave amplitudes from the 12-lead ECG XML files. To validate ECG rule calculation, we developed a plotting function16 to reconstruct 12-lead ECG waveforms for visual interpretation (Figure II in the Data Supplement). Two cardiologists (S.K. and J.P.P.) manually assessed the accuracy of extracted amplitudes and presence of LVH by each rule. Both per-lead (98.7%–100%) and per-rule (98.7%–99.3%) accuracy met prespecified criteria (>90%). Interrater agreement on a per-lead (Gwet’s AC1 statistic,17 0.98) and per-rule (range, 0.91–0.96) basis was excellent (Tables II and III in the Data Supplement). Given that ECG-based LVH rules have not been validated in the setting of left bundle branch block, left anterior fascicular block, and limb lead reversal, we excluded these tracings from ECG rule analyses.6
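As a rough illustration of how amplitude-based rules are applied once R and S wave amplitudes have been extracted, the sketch below evaluates the three rules for a single tracing. Only the lead aVL threshold (>1.1 mV) is stated in the text; the Sokolow-Lyon (SV1 + larger of RV5 or RV6 ≥3.5 mV) and Cornell voltage (RaVL + SV3 >2.8 mV in men, >2.0 mV in women) thresholds are the commonly cited values and are assumed here, so they may differ in detail from Table I in the Data Supplement. The function name and argument names are hypothetical.

```python
def lvh_by_ecg_rules(r_avl, s_v1, s_v3, r_v5, r_v6, sex):
    """Apply three voltage rules to one ECG.

    All amplitudes are positive magnitudes in mV; `sex` is "male" or "female".
    Thresholds other than the aVL rule are commonly cited values (assumed).
    """
    avl_rule = r_avl > 1.1
    sokolow_lyon = (s_v1 + max(r_v5, r_v6)) >= 3.5
    cornell_cutoff = 2.8 if sex == "male" else 2.0
    cornell = (r_avl + s_v3) > cornell_cutoff
    return {
        "aVL": avl_rule,
        "sokolow_lyon": sokolow_lyon,
        "cornell": cornell,
        # "Any ECG rule" is positive when at least one rule is met.
        "any": avl_rule or sokolow_lyon or cornell,
    }


example = lvh_by_ecg_rules(r_avl=1.2, s_v1=1.0, s_v3=1.0, r_v5=1.5, r_v6=1.4, sex="female")
```

Tracings with left bundle branch block, left anterior fascicular block, or limb lead reversal would be filtered out before calling a function like this, in line with the exclusions described above.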
Disease Associations
We assessed for associations between LVM-AI predicted LV mass and incident AF, myocardial infarction (MI), HF, and ventricular arrhythmias (VA) within 35 350 participants with follow-up clinical data available after the imaging assessment. Diseases were defined using self-report and inpatient International Classification of Diseases-9/International Classification of Diseases-10 codes (updated through March 31, 2021, Table IV in the Data Supplement). Follow-up started at ECG acquisition and spanned until the earliest of an event, death, or last follow-up. Last follow-up was dependent upon the availability of linked hospital data and was therefore defined as March 31, 2020 for participants enrolled in England (99.6%), October 31, 2016 for participants enrolled in Scotland (0.2%), and February 29, 2016 for participants enrolled in Wales (0.2%).
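The follow-up definition above (start at ECG acquisition, stop at the earliest of event, death, or country-specific last follow-up) can be sketched as below. The censoring dates are taken from the text; the function name, argument names, and the use of 365.25 days per year are assumptions for illustration, and individuals whose ECG falls after their censoring date are not handled.

```python
from datetime import date

# Administrative censoring dates by country of enrollment, as stated in the text.
LAST_FOLLOW_UP = {
    "England": date(2020, 3, 31),
    "Scotland": date(2016, 10, 31),
    "Wales": date(2016, 2, 29),
}


def follow_up(ecg_date, country, event_date=None, death_date=None):
    """Return (years of follow-up, event indicator) for one participant."""
    candidates = [LAST_FOLLOW_UP[country]]
    if death_date is not None:
        candidates.append(death_date)
    if event_date is not None:
        candidates.append(event_date)
    stop = min(candidates)
    # An event counts only if it occurs on or before the end of follow-up.
    had_event = event_date is not None and event_date <= stop
    years = (stop - ecg_date).days / 365.25
    return years, had_event


# An event recorded after the Welsh censoring date is treated as censored.
years, had_event = follow_up(date(2015, 1, 1), "Wales", event_date=date(2017, 1, 1))
```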
External Validation
We tested LVM-AI in the external MGB health care–related dataset (Figure 2). First, we assessed correlation between LVM-AI predicted LV mass and CMR-derived LV mass in a sample of individuals from the MGB Biobank with both ECG and CMR performed within 1 year of each other (MGB Test). Second, we compared the accuracy of LVM-AI predicted LVH to traditional ECG-based rules in MGB Test. Third, we assessed for associations between LVM-AI predicted LVH and incident AF, MI, HF, and VA in a previously described MGB-based dataset independent of MGB Test (MGB Outcomes).18 Disease definitions have been previously validated (Table IV in the Data Supplement).18,19 External validation methods are described in detail in Methods in the Data Supplement.
Statistical Analyses
We calculated the Pearson correlation and mean absolute error between LVM-AI predicted LV mass and CMR-derived LV mass. We assessed agreement using Bland-Altman plots.20 We quantified calibration using the mean within-individual difference between LVM-AI predicted and CMR-derived LV mass. To compare LVM-AI to traditional ECG rules, we calculated sensitivity, specificity, positive predictive value, negative predictive value, and c-statistic of each using CMR-derived LVH as the reference. We also generated contingency tables and performed net reclassification analyses. In all analyses, LVH was defined as indexed LV mass >72 g/m2 (men) and >55 g/m2 (women).21 The sex-specific 90th percentile of LV mass index was a secondary LVH definition. Indexing for body surface area was performed using the DuBois formula.22 CIs were generated using the exact method and test characteristics were compared using 1000-sample bootstrapping.
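The LVH definition and test characteristics described above can be sketched in a few lines. The DuBois formula (BSA in m² = 0.007184 × weight^0.425 × height^0.725, with weight in kg and height in cm) and the sex-specific cutoffs are as cited; the function names and the boolean-list interface are hypothetical, and CIs and bootstrap comparisons are omitted.

```python
def dubois_bsa(height_cm, weight_kg):
    """Body surface area (m^2) by the DuBois formula."""
    return 0.007184 * (weight_kg ** 0.425) * (height_cm ** 0.725)


# Primary LVH cutoffs for indexed LV mass (g/m^2), as stated in the text.
LVH_CUTOFF = {"male": 72.0, "female": 55.0}


def has_lvh(lv_mass_g, height_cm, weight_kg, sex):
    """True if LV mass indexed to BSA exceeds the sex-specific cutoff."""
    return lv_mass_g / dubois_bsa(height_cm, weight_kg) > LVH_CUTOFF[sex]


def diagnostic_metrics(predicted, reference):
    """Sensitivity, specificity, PPV, and NPV from paired boolean labels."""
    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p and r)
    fp = sum(1 for p, r in pairs if p and not r)
    fn = sum(1 for p, r in pairs if not p and r)
    tn = sum(1 for p, r in pairs if not p and not r)

    def ratio(num, den):
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


bsa = dubois_bsa(height_cm=170.0, weight_kg=70.0)  # roughly 1.8 m^2
metrics = diagnostic_metrics([True, True, False, False], [True, False, True, False])
```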
To assess LVM-AI behavior, we produced saliency maps depicting areas of the ECG having the largest gradients (ie, greatest influence on LV mass predictions). For each individual, we additionally identified the ECG lead having the highest absolute gradient, as a surrogate for the most influential ECG lead on that individual’s LV mass estimate.
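A gradient-based saliency map and the "most influential lead" summary can be illustrated with a toy model. The sketch below approximates gradients by central finite differences so that it runs with NumPy alone; in practice gradients of a neural network would come from automatic differentiation. The lead ordering, function names, and the toy linear model are assumptions for illustration only.

```python
import numpy as np

# Assumed lead ordering for a (12, n_samples) ECG array.
LEADS = ["I", "II", "III", "aVR", "aVL", "aVF", "V1", "V2", "V3", "V4", "V5", "V6"]


def saliency_map(model, ecg, eps=1e-4):
    """Approximate d(model output)/d(input) for every ECG sample.

    `model` is any callable mapping a (12, n_samples) array to a scalar.
    """
    ecg = np.asarray(ecg, dtype=float)
    grad = np.zeros_like(ecg)
    for idx in np.ndindex(ecg.shape):
        up = ecg.copy()
        up[idx] += eps
        down = ecg.copy()
        down[idx] -= eps
        grad[idx] = (model(up) - model(down)) / (2.0 * eps)
    return grad


def most_influential_lead(grad):
    """Name of the lead containing the largest absolute gradient."""
    per_lead_max = np.max(np.abs(grad), axis=1)
    return LEADS[int(np.argmax(per_lead_max))]


# Toy linear "model" whose largest weight sits in lead V5 (row index 10).
toy_weights = np.zeros((12, 8))
toy_weights[10, 3] = 5.0
toy_weights[0, 1] = -1.0


def toy_model(x):
    return float(np.sum(toy_weights * x))


toy_grad = saliency_map(toy_model, np.ones((12, 8)))
top_lead = most_influential_lead(toy_grad)
```

For a linear model the gradient equals the weight matrix, so the toy example makes it easy to check that the saliency map highlights the intended lead.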
We assessed associations between LVM-AI predicted LVH and incident AF, MI, HF, and VA using Cox proportional hazards models adjusted for age and sex. We built similar models using LVM-AI predicted LV mass index as a continuous exposure. The proportional hazards assumption was assessed by inspecting Schoenfeld residuals. Substantial deviations from proportional hazards (observed for age and sex only) were modeled using interaction terms including strata of person-time. We plotted cumulative risk of events within strata of LVM-AI predicted LVH using the Kaplan-Meier method. Given evidence that anatomic LVH may provide complementary prognostic information to ECG rule-based LVH,23 we fit analogous Cox proportional hazards models using (1) the ECG rules and (2) both the ECG rules and LVM-AI predicted LVH as exposures of interest.
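For reference, the age- and sex-adjusted model described above has the standard Cox proportional hazards form shown below. The notation is ours rather than the authors', and the interaction terms with strata of person-time used to handle non-proportionality for age and sex are omitted for simplicity.

```latex
h(t \mid x) = h_0(t)\,\exp\!\left(\beta_{\mathrm{LVH}}\,\mathrm{LVH}
            + \beta_{\mathrm{age}}\,\mathrm{age}
            + \beta_{\mathrm{sex}}\,\mathrm{sex}\right),
\qquad
\mathrm{HR}_{\mathrm{LVH}} = e^{\beta_{\mathrm{LVH}}}
```

Here $h_0(t)$ is the unspecified baseline hazard and LVH is a binary indicator of LVM-AI predicted LVH. As a worked reading, a hazard ratio of 1.84 corresponds to $\beta_{\mathrm{LVH}} = \ln 1.84 \approx 0.61$.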
We performed sensitivity analyses to assess robustness of our results. First, we trained a version of LVM-AI taking ECG alone as input. Second, we trained a version of LVM-AI utilizing an unweighted logcosh function regression loss (a loss function giving equal weight to errors at either extreme of the LV mass distribution). Third, we compared LVM-AI with a modified Cornell voltage product adjusted for age, sex, BMI, and hypertension.15
Results
LVM-AI Derivation
A total of 37 142 individuals had both CMR-derived LV mass and 12-lead ECG available. Setting aside 4903 individuals for UK Biobank Test, we trained LVM-AI on a total of 32 239 participants (Figure 2). Individuals in the training set had a mean age of 64.2±7.5 years and 52% were female. The mean CMR-derived LV mass index was 47.0±9.6 g/m2. Other characteristics are shown in Table 1.
Characteristic | UKBB training set (N=32 239) | UKBB test set (N=4903) | MGB test set (N=1371) | MGB ambulatory (N=28 612) |
---|---|---|---|---|
Age | 64.2±7.5 | 63.6±7.7 | 55.5±14.6 | 62.3±10.4 |
Female | 16 591 (51.5%) | 2573 (52.5%) | 630 (46.0%) | 15 012 (52.5%) |
Race/ethnicity | … | … | … | … |
White participant | 31 166 (96.7%) | 4763 (97.1%) | 1178 (86.0%) | 24 812 (86.7%) |
Black participant | 221 (0.7%) | 24 (0.5%) | 85 (6.2%) | 1055 (3.7%) |
Hispanic or Latino | … | … | 31 (2.3%) | 794 (2.8%) |
Asian or Pacific Islander | 449 (1.4%) | 54 (1.1%) | 31 (2.3%) | 625 (2.2%) |
Mixed | 153 (0.5%) | 21 (0.4%) | … | … |
Other | 163 (0.5%) | 24 (0.5%) | 20 (1.5%) | 406 (1.4%) |
Unknown | 87 (0.3%) | 17 (0.3%) | 26 (1.9%) | 920 (3.2%) |
Systolic blood pressure, mm Hg | 138±18 | 137±18 | 126±19 | 131±18 |
Diastolic blood pressure, mm Hg | 79±10 | 79±10 | 76±12 | 76±10 |
HTN | 9893 (30.7%) | 1413 (28.9%) | 333 (24.3%) | 11 807 (41.3%) |
Diabetes | 1214 (3.8%) | 150 (3.1%) | 261 (19.0%) | 5170 (18.1%) |
Heart failure | 177 (0.55%) | 24 (0.49%) | 173 (12.6%) | 2499 (8.7%) |
Myocardial infarction | 652 (2.0%) | 91 (1.9%) | 181 (13.2%) | 3278 (11.5%) |
CMR-derived LV mass, g | 89.1±24.8 | 89.3±24.3 | 120.8±48.7 | … |
CMR-derived LV mass index, g/m2 | 47.0±9.6 | 47.3±9.4 | 61.1±21.5 | … |
CMR indicates cardiac magnetic resonance; HTN, hypertension; LV, left ventricular; MGB, Mass General Brigham; and UKBB, UK Biobank.
LVM-AI Validation
Individuals in UK Biobank Test (mean age, 63.6±7.7, 53% female) were similar in composition to the training set, whereas individuals in MGB Test (mean age, 55.5±14.6, 46% female) had substantially greater cardiac comorbidity (Table 1). The mean CMR-derived LV mass index was 47.3±9.4 g/m2 in UK Biobank Test and 61.1±21.5 g/m2 in MGB Test. The prevalence of CMR-derived LVH was 126/4903 (2.6%) in UK Biobank Test and 454/1327 (34%) in MGB Test.
In UK Biobank Test, when compared with CMR-derived LV mass, LVM-AI demonstrated good correlation (r=0.79 [95% CI, 0.78–0.80], P<0.001), accuracy (mean absolute error 12.6 g [95% CI, 12.3–12.9]), and calibration (within-individual mean difference −3.1 g [95% CI, −3.5 to −2.6], Figure 3). Within MGB Test, LVM-AI demonstrated moderate correlation (r=0.48 [95% CI, 0.44–0.52], P<0.001) but poor accuracy (mean absolute error 117.0 g [95% CI, 111.7–122.3]), due largely to systematic overestimation (within-individual mean difference −111.8 g [95% CI, −117.3 to −106.2]). After linear recalibration (Methods in the Data Supplement), correlation (r=0.60 [95% CI, 0.57–0.64], P<0.001) and accuracy (mean absolute error, 28.4 g [95% CI, 27.1–29.8]) were improved (Figure 3). In both test sets, correlation was slightly lower for indexed LV mass (UK Biobank r=0.63 [95% CI, 0.61–0.64]; MGB r=0.51 [95% CI, 0.47–0.55], Figure 3). Bland-Altman plots demonstrated greater agreement in UK Biobank Test (95% limits of agreement −15.1 to 18.6 g/m2) than MGB Test (95% limits of agreement −36.3 to 36.3 g/m2), as well as a tendency to make conservative estimation errors in MGB (Figure III in the Data Supplement). Sex-stratified distributions of actual and predicted LV mass are shown in Figure IV in the Data Supplement. Plots depicting learned embeddings from LVM-AI are shown in Figure V in the Data Supplement. Saliency maps demonstrated that components of the ECG waveform plausibly relevant for LV mass estimation (eg, P wave, early portion of QRS complex) had the greatest impact on LV mass estimates (Figure 4 and Figure VI in the Data Supplement). On an individual basis, the ECG lead exerting the greatest influence on LV mass estimates was most frequently V5 (97.4%), followed by V4 (2.4%) then V1 (0.3%).
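The agreement measures reported above (Pearson correlation, mean absolute error, mean difference, and Bland-Altman 95% limits of agreement) can be computed as in the sketch below, which also shows one simple form of linear recalibration. Several points are assumptions: the difference is taken as predicted minus reference, the limits use mean ± 1.96 SD, and the recalibration is a single least-squares line fit. The actual recalibration procedure is described only in the Data Supplement and presumably differs, since a single global linear map cannot change Pearson correlation, whereas the reported correlation changed after recalibration.

```python
import numpy as np


def agreement_metrics(pred, ref):
    """Correlation, MAE, mean difference, and Bland-Altman limits of agreement."""
    pred = np.asarray(pred, dtype=float)
    ref = np.asarray(ref, dtype=float)
    diff = pred - ref  # sign convention assumed: predicted minus reference
    bias = float(diff.mean())
    sd = float(diff.std(ddof=1))
    return {
        "r": float(np.corrcoef(pred, ref)[0, 1]),
        "mae": float(np.mean(np.abs(diff))),
        "mean_difference": bias,
        "limits_of_agreement": (bias - 1.96 * sd, bias + 1.96 * sd),
    }


def fit_linear_recalibration(pred, ref):
    """Least-squares slope and intercept mapping predictions onto the reference."""
    slope, intercept = np.polyfit(np.asarray(pred, dtype=float),
                                  np.asarray(ref, dtype=float), deg=1)
    return float(slope), float(intercept)


# Synthetic example: predictions that are systematically too high.
ref = np.array([80.0, 100.0, 120.0, 140.0])
pred = 2.0 * ref + 50.0
slope, intercept = fit_linear_recalibration(pred, ref)
before = agreement_metrics(pred, ref)
after = agreement_metrics(slope * pred + intercept, ref)
```

In this synthetic case the recalibration removes the systematic error and reduces the mean absolute error, while the correlation is identical before and after, which illustrates the caveat in the lead-in.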
In total, 4417 (90.0%) in UK Biobank Test and 1062 (77.5%) in MGB Test had a 12-lead ECG suitable for ECG rule calculation (Figure 2). LVM-AI demonstrated moderate LVH discrimination (UK Biobank c-statistic, 0.653 [95% CI, 0.608–0.698]; MGB 0.621 [95% CI, 0.592–0.649]), which was favorable when compared individually to Sokolow-Lyon, Cornell, and lead aVL criteria (P<0.001 for all, Figure 5 and Table V in the Data Supplement). LVH discrimination using LVM-AI was similar to the presence of any ECG rule in UK Biobank Test (c-statistic, 0.618 [95% CI, 0.574–0.663]; P=0.11), but significantly greater than any ECG rule in MGB Test (0.588 [95% CI, 0.564–0.611]; P=0.02).
When compared with any ECG rule, LVM-AI had greater sensitivity and specificity in UK Biobank Test (sensitivity 34% [95% CI, 25–44] versus 32% [95% CI, 24–42]; specificity 96% [95% CI, 96–97] versus 91% [95% CI, 90–92]), and greater sensitivity but lower specificity in MGB Test (sensitivity 41% [95% CI, 36–46] versus 24% [95% CI, 20–29]; specificity 83% [95% CI, 80–86] versus 93% [95% CI, 91–95], Figure 5). When compared with individual ECG rules, LVM-AI had greater sensitivity, with comparable or moderately lower specificity (Figure 5; Table V in the Data Supplement). Net reclassification improvement using LVM-AI versus any ECG rule was 0.071 (95% CI, −0.016 to 0.17) in UK Biobank Test, and 0.067 (95% CI, 0.0072–0.13) in MGB Test (Table VI in the Data Supplement), with increased case detection in both sets (UK Biobank 1.9% [95% CI, −7.6 to 13]; MGB 17% [95% CI, 12–22]).
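The relationship between these test characteristics, the c-statistic, and the net reclassification improvement can be checked with simple arithmetic. For a binary classifier, the c-statistic equals the mean of sensitivity and specificity, and the two-category net reclassification improvement equals the change in sensitivity plus the change in specificity. The sketch below applies these identities to the rounded percentages reported above; whether the authors computed their estimates in exactly this way is an assumption, and the function names are hypothetical.

```python
def binary_c_statistic(sensitivity, specificity):
    """Area under the ROC curve for a single-threshold (binary) classifier."""
    return (sensitivity + specificity) / 2.0


def binary_nri(sens_new, spec_new, sens_old, spec_old):
    """Two-category net reclassification improvement of new versus old test."""
    return (sens_new - sens_old) + (spec_new - spec_old)


# Rounded values from the text: LVM-AI versus any ECG rule.
ukb_c_lvm_ai = binary_c_statistic(0.34, 0.96)       # about 0.65
mgb_c_lvm_ai = binary_c_statistic(0.41, 0.83)       # about 0.62
ukb_nri = binary_nri(0.34, 0.96, 0.32, 0.91)        # about 0.07
mgb_nri = binary_nri(0.41, 0.83, 0.24, 0.93)        # about 0.07
```

These values agree, within rounding, with the reported c-statistics (0.653 and 0.621) and net reclassification improvements (0.071 and 0.067). The sensitivity differences (about 2% and 17%) likewise match the reported increases in case detection.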
LVM-AI and Incident Events
In the UK Biobank and MGB Outcomes samples, LVM-AI predicted LVH was associated with incident AF (hazard ratio, 1.84 [95% CI, 1.29–2.63] in UK Biobank; 1.34 [95% CI, 1.21–1.50] in MGB), MI (hazard ratio 1.80 [95% CI, 1.09–2.96]; 1.29 [95% CI, 1.15–1.46]), HF (3.97 [95% CI, 2.70–5.84]; 1.49 [95% CI, 1.37–1.62]), and VA (3.16 [95% CI, 1.62–6.18]; 1.71 [95% CI, 1.47–1.99]). Associations were similar using LVH defined as the 90th percentile of LV mass index and using LV mass index as a continuous variable (Table 2 and Figure VII in the Data Supplement). Cumulative risk curves stratified by presence of LVM-AI predicted LVH are shown in Figure 6 and Figure VIII in the Data Supplement.
Outcome | N events/N total† | Follow-up, y (Q1, Q3) | LVMI (per 1 SD), hazard ratio (95% CI)* | LVH (UKBB cutoff), hazard ratio (95% CI)* | LVH (90th percentile), hazard ratio (95% CI)* |
---|---|---|---|---|---|
UK Biobank | |||||
Atrial fibrillation | 376/34242 | 2.3 (1.3–3.7) | 1.30 (1.18–1.43) | 1.84 (1.29–2.63) | 1.45 (1.08–1.95) |
Myocardial infarction | 193/34454 | 2.3 (1.3–3.8) | 1.38 (1.19–1.59) | 1.80 (1.09–2.96) | 1.51 (1.01–2.27) |
Heart failure | 182/35077 | 2.3 (1.3–3.8) | 1.50 (1.40–1.60) | 3.97 (2.69–5.84) | 3.44 (2.47–4.79) |
Ventricular arrhythmias | 69/35213 | 2.3 (1.3–3.8) | 1.43 (1.25–1.64) | 3.16 (1.62–6.18) | 3.05 (1.76–5.27) |
MGB | |||||
Atrial fibrillation | 4661/28612 | 11.3 (6.4–14.2) | 1.08 (1.02–1.14) | 1.34 (1.21–1.50) | 1.20 (1.10–1.30) |
Myocardial infarction | 2134/25334 | 11.9 (7.4–14.3) | 1.20 (1.11–1.30) | 1.29 (1.15–1.46) | 1.44 (1.22–1.71) |
Heart failure | 4042/26113 | 11.4 (6.7–14.3) | 1.22 (1.15–1.29) | 1.49 (1.37–1.62) | 1.70 (1.52–1.89) |
Ventricular arrhythmias | 1165/27547 | 10.6 (7.8–14.3) | 1.35 (1.22–1.50) | 1.71 (1.47–1.99) | 2.22 (1.82–2.70) |
LVH indicates left ventricular hypertrophy; LVM-AI, left ventricular mass-artificial intelligence; LVMI, left ventricular mass index; MGB, Mass General Brigham; and UKBB, UK Biobank.
* Hazard ratios obtained using Cox proportional hazards models adjusted for age and sex.
† Includes individuals without the prevalent condition at imaging assessment (UK Biobank) or start of cohort follow-up (MGB).
In secondary analyses, associations between ECG rule-based LVH and incident events varied by specific rule, although the presence of LVH by any ECG rule was consistently associated with incident AF and HF (Table VII in the Data Supplement). In models including both LVM-AI predicted and ECG rule-based LVH, LVM-AI predicted LVH was independently associated with incident events (Table VIII in the Data Supplement). Cumulative risk curves stratified by the presence of LVH using LVM-AI and any ECG rule are shown in Figure IX in the Data Supplement.
Sensitivity Analyses
An age, sex, BMI, and hypertension-adjusted Cornell voltage product had lower sensitivity and specificity than LVM-AI (Table IX in the Data Supplement and Figure X in the Data Supplement). In the UK Biobank, a version of LVM-AI trained using ECG alone (ie, without age, sex, or BMI) had similar correlation with CMR-derived LV mass (r=0.72 [95% CI, 0.70–0.73]) and diagnostic performance for LVH (c-statistic 0.654 [95% CI, 0.608–0.700]). In contrast, LVM-AI trained using an unweighted loss function had slightly higher correlation (r=0.82 [95% CI, 0.81–0.83]) but substantially worse LVH diagnostic performance (c-statistic 0.528 [95% CI, 0.505–0.552]). Performance of the secondary LVM-AI models is summarized in Figures XI and XII in the Data Supplement.
Discussion
In a prospective community-based sample of over 30 000 individuals with CMR and 12-lead ECG, we developed LVM-AI, a deep learning model that estimates CMR-derived LV mass using 12-lead ECG waveforms. When assessed in 2 independent test sets, LVM-AI appeared more sensitive for the presence of LVH on CMR as compared with traditional ECG rules applied individually or in aggregate. Importantly, LVM-AI predicted LVH was consistently associated with incident cardiovascular events. Our findings demonstrate the potential of deep learning on medical data available at scale to recapitulate structural information otherwise obtainable only through advanced imaging, as well as the potential to transfer such models across disparate clinical settings.
Our findings support and extend previous work using deep learning to infer cardiac structure from ECG. Tison et al9 and Kwon et al10 used neural networks to predict the presence of increased LV mass on echocardiography. Differences in LVH discrimination observed in Tison et al (c-statistic 0.870) and Kwon et al (0.868) versus our study (0.654) may reflect the impact of sample composition on model performance. Specifically, retrospective ascertainment of ECGs (for inference) and echocardiograms (for LVH definition) performed for clinical reasons in prior studies may enrich for pathology and potentially introduce selection bias. It is also possible that differences in performance may be related to varying model architectures. Nevertheless, we submit that training on prospectively collected ECG and CMR (the gold-standard for LV mass measurement)7 has the potential to reduce model bias.
Our results suggest that deep learning–based LV mass estimation may improve the yield of LVH screening using 12-lead ECG. Since antihypertensive treatment is a low-cost, well-tolerated intervention that can lead to LVH regression and improved outcomes,27 it is critical for ECG-based LVH screening tools to be sufficiently sensitive. To this end, LVM-AI demonstrated increased case detection when compared with ECG rules applied individually or in aggregate. At the same time, overall discrimination for CMR-derived LVH using LVM-AI remained modest, and improved methods for discriminating LVH are warranted. Nevertheless, even modest improvements in performance may be substantial when applied at scale and leveraging the potential for automation. Whether deep learning models explicitly trained to exhibit certain test characteristics (eg, very high sensitivity) are better suited for specific clinical applications merits further study.
Our results demonstrate that deep learning models can be transferred across populations with varying characteristics, although performance seems to decline. We transferred LVM-AI, which was trained in a prospective community-based cohort, to an independent health care–related dataset in which the prevalence of LVH and related comorbidities were frequently over twice as high. Although initial model predictions in MGB required linear recalibration, LVM-AI predicted LV mass correlated with CMR-derived LV mass and LVM-AI discriminated CMR-based LVH better than traditional ECG rules. Nevertheless, model accuracy, calibration, and agreement with CMR-derived LV mass were noticeably lower within MGB, suggesting that model generalizability may have been constrained by limited diversity in patient characteristics within the UK Biobank training set, or alternatively overfitting. Future work is warranted to further evaluate expected declines in model performance when transferring across settings, and whether training on data from multiple settings leads to more generalizable models.
The current study underscores the potential for deep learning models using raw ECG data to produce clinically relevant output. Recent studies have shown that ECG-based deep learning models can discriminate individuals at higher risk for short-term outcomes including death28 or AF.29 In our study, the presence of LVM-AI predicted LVH was associated with substantially increased risks of incident cardiovascular events over many years. Effect sizes for AF and VA were similar to those reported previously using imaging-based LVH,1,3 whereas those for HF and MI were slightly lower.2,30 Notably, effect sizes were larger in the UK Biobank as opposed to MGB, which may reflect less accurate CMR-derived LV mass estimation in MGB, or a higher risk population in MGB, in which the relative effect of LV mass on outcomes may be smaller. Interestingly, when added to the presence of ECG rule-based LVH, LVM-AI—a surrogate for anatomic LVH—remained independently associated with outcomes. Such findings demonstrate the added prognostic value of LVM-AI and are consistent with previous reports suggesting that ECG-based LVH may be an electrophysiological risk marker comprising elements independent of ventricular anatomy.23 We anticipate that future deep learning models may add even further value if they can characterize additional aspects of cardiac structure and function that can be difficult to quantify using current imaging techniques.
Our findings must be interpreted in the context of study design. First, LVM-AI was trained within the UK Biobank, a sample enriched for health and socioeconomic status and having a relatively low prevalence of LVH, which may have impacted model generalizability. Nevertheless, we observed reasonable accuracy in an external healthcare-related dataset, although we note initial LVM-AI estimates were poorly calibrated and required linear adjustment. On balance, our findings demonstrate that portability is feasible, but model training within populations most similar to those in which implementation is intended may optimize performance. Second, in the absence of manually annotated CMR images, we utilized a segmentation algorithm to derive CMR-based LV mass. Although the algorithm is accurate,13 imperfect LV mass estimates may have impacted the performance of LVM-AI. Third, LVM-AI is a black box model. However, saliency maps demonstrated that components of the ECG waveform plausibly relevant for LV mass estimation had the greatest impact on predicted LV mass.
In summary, using prospectively collected ECG and CMR data from a sizeable community-based cohort, we developed LVM-AI, a deep learning algorithm that estimates CMR-derived LV mass with fair accuracy using 12-lead ECG. We validated LVM-AI in 2 independent samples including a health care dataset and demonstrated improved diagnostic performance compared with traditional ECG-based rules applied individually or in aggregate. LVM-AI predicted LVH was associated with increased risk of cardiovascular events independently of ECG rule-based LVH. Our findings highlight the utility of deep learning to leverage clinical data available at scale to infer cardiac structural information otherwise requiring dedicated imaging to characterize.
Supplemental Material
File (ecg_lvh_supplement_final.pdf)
References
1.
Bluemke DA, Kronmal RA, Lima JA, Liu K, Olson J, Burke GL, Folsom AR. The relationship of left ventricular mass and geometry to incident cardiovascular events: the MESA (multi-ethnic study of atherosclerosis) study. J Am Coll Cardiol. 2008;52:2148–2155. doi: 10.1016/j.jacc.2008.09.014
2.
Chrispin J, Jain A, Soliman EZ, Guallar E, Alonso A, Heckbert SR, Bluemke DA, Lima JA, Nazarian S. Association of electrocardiographic and imaging surrogates of left ventricular hypertrophy with incident atrial fibrillation: MESA (multi-ethnic study of atherosclerosis). J Am Coll Cardiol. 2014;63:2007–2013. doi: 10.1016/j.jacc.2014.01.066
3.
Kawel-Boehm N, Kronmal R, Eng J, Folsom A, Burke G, Carr JJ, Shea S, Lima JAC, Bluemke DA. Left ventricular mass at MRI and long-term risk of cardiovascular events: the multi-ethnic study of atherosclerosis (MESA). Radiology. 2019;293:107–114. doi: 10.1148/radiol.2019182871
4.
Casale PN, Devereux RB, Kligfield P, Eisenberg RR, Miller DH, Chaudhary BS, Phillips MC. Electrocardiographic detection of left ventricular hypertrophy: development and prospective validation of improved criteria. J Am Coll Cardiol. 1985;6:572–580. doi: 10.1016/s0735-1097(85)80115-7
5.
Sokolow M, Lyon TP. The ventricular complex in left ventricular hypertrophy as obtained by unipolar precordial and limb leads. Am Heart J. 1949;37:161–186. doi: 10.1016/0002-8703(49)90562-1
6.
Pewsner D, Jüni P, Egger M, Battaglia M, Sundström J, Bachmann LM. Accuracy of electrocardiography in diagnosis of left ventricular hypertrophy in arterial hypertension: systematic review. BMJ. 2007;335:711. doi: 10.1136/bmj.39276.636354.AE
7.
Lenstrup M, Kjaergaard J, Petersen CL, Kjaer A, Hassager C. Evaluation of left ventricular mass measured by 3D echocardiography using magnetic resonance imaging as gold standard. Scand J Clin Lab Invest. 2006;66:647–657. doi: 10.1080/00365510600892233
8.
Deo RC. Machine learning in medicine. Circulation. 2015;132:1920–1930. doi: 10.1161/CIRCULATIONAHA.115.001593
9.
Tison GH, Zhang J, Delling FN, Deo RC. Automated and interpretable patient ECG profiles for disease detection, tracking, and discovery. Circ Cardiovasc Qual Outcomes. 2019;12:e005289. doi: 10.1161/CIRCOUTCOMES.118.005289
10.
Kwon JM, Jeon KH, Kim HM, Kim MJ, Lim SM, Kim KH, Song PS, Park J, Choi RK, Oh BH. Comparing the performance of artificial intelligence and conventional diagnosis criteria for detecting left ventricular hypertrophy using electrocardiography. Europace. 2020;22:412–419. doi: 10.1093/europace/euz324
11.
Littlejohns TJ, Sudlow C, Allen NE, Collins R. UK Biobank: opportunities for cardiovascular research. Eur Heart J. 2019;40:1158–1166. doi: 10.1093/eurheartj/ehx254
12.
Petersen SE, Matthews PM, Francis JM, Robson MD, Zemrak F, Boubertakh R, Young AA, Hudson S, Weale P, Garratt S, et al. UK Biobank’s cardiovascular magnetic resonance protocol. J Cardiovasc Magn Reson. 2016;18:8. doi: 10.1186/s12968-016-0227-4
13.
Khurshid S, Friedman SF, Pirruccello JP, Di Achille P, Diamant N, Anderson CD, Ellinor PT, Batra P, Ho JE, Philippakis AA, Lubitz SA. Deep learning to estimate cardiac magnetic resonance–derived left ventricular mass. Cardiovasc Digit Health J. 2021;S2666693621000232. doi: 10.1016/j.cvdhj.2021.03.001
14.
Rodrigues J, Zellner A. Weighted balanced loss function and estimation of the mean time to failure. Commun Stat Theory Methods. 1994;23:3609–3616.
15.
Norman JE, Levy D. Adjustment of ECG left ventricular hypertrophy criteria for body mass index and age improves classification accuracy. The effects of hypertension and obesity. J Electrocardiol. 1996;29(suppl):241–247. doi: 10.1016/s0022-0736(96)80070-7
16.
ML4CVD Group. Machine Learning for Health (ML4H). GitHub. 2020. Accessed March 1, 2021. Available at: https://github.com/broadinstitute/ml.
17.
Gwet KL. Computing inter-rater reliability and its variance in the presence of high agreement. Br J Math Stat Psychol. 2008;61(Pt 1):29–48. doi: 10.1348/000711006X126600
18.
Hulme OL, Khurshid S, Weng LC, Anderson CD, Wang EY, Ashburner JM, Ko D, McManus DD, Benjamin EJ, Ellinor PT, et al. Development and validation of a prediction model for atrial fibrillation using electronic health records. JACC Clin Electrophysiol. 2019;5:1331–1341. doi: 10.1016/j.jacep.2019.07.016
19.
Khurshid S, Choi SH, Weng LC, Wang EY, Trinquart L, Benjamin EJ, Ellinor PT, Lubitz SA. Frequency of cardiac rhythm abnormalities in a half million adults. Circ Arrhythm Electrophysiol. 2018;11:e006273. doi: 10.1161/CIRCEP.118.006273
20.
Bland JM, Altman DG. Measuring agreement in method comparison studies. Stat Methods Med Res. 1999;8:135–160. doi: 10.1177/096228029900800204
21.
Petersen SE, Aung N, Sanghvi MM, Zemrak F, Fung K, Paiva JM, Francis JM, Khanji MY, Lukaschuk E, Lee AM, et al. Reference ranges for cardiac structure and function using cardiovascular magnetic resonance (CMR) in Caucasians from the UK Biobank population cohort. J Cardiovasc Magn Reson. 2017;19:18. doi: 10.1186/s12968-017-0327-9
22.
Du Bois D, Du Bois EF. A formula to estimate the approximate surface area if height and weight be known. 1916. Nutrition. 1989;5:303–311.
23.
Leigh JA, O’Neal WT, Soliman EZ. Electrocardiographic left ventricular hypertrophy as a predictor of cardiovascular disease independent of left ventricular anatomy in subjects aged ≥65 years. Am J Cardiol. 2016;117:1831–1835. doi: 10.1016/j.amjcard.2016.03.020
24.
Python Core Team. Python: a dynamic, open source programming language. Python Software Foundation. 2015. Accessed October 9, 2019. Available at: https://www.python.org/.
25.
Abadi M, Agarwal A, Barham P, Brevdo E, Chen Z, Citro C, Corrado GS. TensorFlow: large-scale machine learning on heterogeneous systems. 2016. arXiv:1603.04467v2.
26.
R Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing; 2015. Accessed October 9, 2019. Available at: https://www.R-project.org/.
27.
Okin PM, Devereux RB, Jern S, Kjeldsen SE, Julius S, Nieminen MS, Snapinn S, Harris KE, Aurup P, Edelman JM, et al; LIFE Study Investigators. Regression of electrocardiographic left ventricular hypertrophy during antihypertensive treatment and the prediction of major cardiovascular events. JAMA. 2004;292:2343–2349. doi: 10.1001/jama.292.19.2343
28.
Raghunath S, Ulloa Cerna AE, Jing L, vanMaanen DP, Stough J, Hartzel DN, Leader JB, Kirchner HL, Stumpe MC, Hafez A, et al. Prediction of mortality from 12-lead electrocardiogram voltage data using a deep neural network. Nat Med. 2020;26:886–891. doi: 10.1038/s41591-020-0870-z
29.
Raghunath S, Pfeifer JM, Ulloa-Cerna AE, Nemani A, Carbonati T, Jing L, vanMaanen DP, Hartzel DN, Ruhl JA, Lagerman BF, et al. Deep neural networks can predict new-onset atrial fibrillation from the 12-lead ECG and help identify those at risk of atrial fibrillation-related stroke. Circulation. 2021;143:1287–1298. doi: 10.1161/CIRCULATIONAHA.120.047829
30.
Haider AW, Larson MG, Benjamin EJ, Levy D. Increased left ventricular mass and hypertrophy are associated with increased risk for sudden death. J Am Coll Cardiol. 1998;32:1454–1459. doi: 10.1016/s0735-1097(98)00407-0
Copyright
© 2021 American Heart Association, Inc.
History
Received: 27 December 2020
Accepted: 21 April 2021
Published in print: June 2021
Published online: 15 June 2021
Disclosures
Dr Pirruccello has consulted for Maze Therapeutics. Dr Philippakis receives research support from Bayer AG, IBM, Intel, and Verily and has consulted for Novartis and Rakuten. Dr Ho receives research support from Bayer AG and Gilead Sciences and has received research supplies from EcoNugenics. Dr Friedman receives research support from Bayer AG and IBM. Dr Anderson receives research support from Bayer AG and has consulted for ApoPharma, Inc. Dr Batra receives research support from Bayer AG and IBM and consults for Novartis. Dr Lubitz receives research support from Bristol Myers Squibb/Pfizer, Bayer AG, Boehringer Ingelheim, and Fitbit; has consulted for Bristol Myers Squibb/Pfizer and Bayer AG; and participates in a research collaboration with IBM. Dr Ellinor receives research support from Bayer AG and has consulted for Bayer AG, Novartis, MyoKardia, and Quest Diagnostics.
Sources of Funding
Dr Khurshid is supported by NIH (T32HL007208). Dr Pirruccello is supported by a John S. LaDue Memorial Fellowship. Dr Ho is supported by NIH (R01HL134893/R01HL140224/K24HL153669). Dr Lubitz is supported by NIH (1R01HL139731) and American Heart Association (AHA) (18SFRN34250007). Dr Ellinor is supported by NIH (1R01HL092577/R01HL128914/K24HL105780), AHA (18SFRN34110082), and the Foundation Leducq (14CVD01). Dr Anderson is supported by NIH (R01NS103924) and AHA (18SFRN34250007).