
Unsupervised Cluster Analysis of Patients With Aortic Stenosis Reveals Distinct Population With Different Phenotypes and Outcomes

Originally published Cardiovascular Imaging. 2020;13:e009707



There is a lack of studies investigating the heterogeneity of patients with aortic stenosis (AS). We explored whether cluster analysis identifies distinct subgroups with different prognostic significances in AS.


Newly diagnosed patients with moderate or severe AS were prospectively enrolled between 2013 and 2016 (n=398, mean 71 years, 55% male). Among demographics, laboratory, and echocardiography parameters (n=32), 11 variables were selected through dimension reduction and used for unsupervised clustering. Phenotypes and causes of mortality were compared between the clusters.


Three clusters with markedly different features were identified. Cluster 1 (n=60) was predominantly associated with cardiac dysfunction, cluster 2 (n=86) consisted of elderly with comorbidities, especially end-stage renal disease, whereas cluster 3 (n=252) demonstrated neither cardiac dysfunction nor comorbidities. Although AS severity did not differ, there was a significant difference in adverse outcomes between the clusters during a median 2.4 years follow-up (mortality rate, 13.3% versus 19.8% versus 6.0% for cluster 1, 2, and 3, P<0.001). Particularly, compared with cluster 3, cluster 1 was associated with only cardiac mortality (adjusted hazard ratio, 7.37 [95% CI, 2.00–27.13]; P=0.003), whereas cluster 2 was associated with higher noncardiac mortality (adjusted hazard ratio, 3.35 [95% CI, 1.26–8.90]; P=0.015). Phenotypes and association of clusters with specific outcomes were reproduced in an independent validation cohort (n=262).


Unsupervised cluster analysis of patients with AS revealed 3 distinct groups with different causes of death. This provides a new perspective in the categorization of patients with AS that takes into account comorbidities and extravalvular cardiac dysfunction.

Clinical Perspective

Machine learning with the use of unsupervised cluster analysis enables us to explore the possible heterogeneity within a disease category. Among 398 patients with significant aortic stenosis (AS), we identified 3 groups by model-based clustering that can be interpreted as follows: cardiac dysfunction, comorbidities, and healthy AS. The severity of AS was similar across the clusters, but outcomes differed markedly; the comorbidities group demonstrated the highest all-cause mortality and was associated with noncardiac as well as cardiac mortality, whereas the cardiac dysfunction group was associated with only cardiac mortality. The association of clusters with distinct patterns of outcomes was reproduced in a separate validation cohort of 262 patients. The result provides a new perspective on phenotyping AS by cluster analysis, which emphasizes the role of comorbidities and extravalvular cardiac dysfunction. Future work should explore whether cluster analysis can improve risk stratification and serve as a criterion for determining the management strategy in AS.


The prevalence of aortic stenosis (AS) continues to increase.1 Severe AS is fatal without aortic valve replacement (AVR) and the evolution of surgical AVR and transcatheter AVR has significantly improved the prognosis.2,3 With the increased burden of comorbidities4 and expanded treatment options,5 phenotypes and outcomes of patients with AS may be more heterogeneous than expected.

Current guidelines categorize AS into 4 stages according to its severity based on echocardiographic aortic valve assessments and symptoms.6 However, the clinical course of AS is variable, with complex interactions between patient characteristics and associated cardiac and noncardiac diseases, factors that are critical when deciding the timing and type of intervention.6,7 Moreover, there is growing evidence that extravalvular cardiac damage, such as myocardial fibrosis, can vary substantially among patients with AS and is associated with adverse outcomes.8 These findings imply that the current valve-oriented classification may not capture the heterogeneity within AS, and that sophisticated phenotyping with multiple factors may have additive value.

Recently, machine learning has been adopted in cardiovascular research.9 Unsupervised cluster analysis categorizes complex entities without investigators’ supervision by segregating samples into homogeneous groups based on each cluster’s dissimilarities.9 This helps to unveil meaningful phenotypes within a disease that has previously been considered homogeneous. Cluster analysis has proved valuable in phenotyping several cardiovascular diseases10–12 but has not been applied to AS.

We hypothesized that there might be clinically distinct AS clusters. We aimed to explore whether unsupervised cluster analysis can identify clinically relevant groups among AS with different outcomes and causes of death.


The study materials are available from the corresponding authors on reasonable request. The overall scheme of the study is depicted in Figure 1 and more detailed methods are available in Methods in the Data Supplement.

Figure 1. Overall scheme of the study design. Left, data from patients with aortic stenosis (AS) were prepared, and pivotal variables for clustering were selected. The correlation matrix between 32 candidate variables is shown. With dimensionality reduction, 11 variables were selected (marked in red text). Middle, model-based clustering was performed. For simple visualization, the classification plot was projected onto the subspace by the 2 most significant dimensions (upper). The density per cluster is depicted by the first dimension (lower). Right, each cluster’s phenotype was interpreted. Outcomes were compared between the clusters, and distinct associations sought. AVAi indicates aortic valve area index; AV TVI, aortic valve time-velocity integral; AV Valsalva diameter, aortic valve sinus of Valsalva diameter; BMI, body mass index; BUN, blood urea nitrogen; DBP, diastolic blood pressure; HR, heart rate; IVSD, interventricular septal thickness, end-diastole; LA volume, left atrial volume; LVEDV, left ventricular end-diastolic volume; LVEF, left ventricular ejection fraction; LVESV, left ventricular end-systolic volume; LVIDD, left ventricular internal dimension, end-diastole; LVIDS, left ventricular internal dimension, end-systole; LV mass, left ventricular mass; LVOT diameter, left ventricular outflow tract diameter; LVPWD, left ventricular posterior wall thickness, end-diastole; mean PG, transaortic mean pressure gradient; SBP, systolic blood pressure; ST junction, sinotubular junction diameter; TR velocity, tricuspid regurgitation jet velocity; Vmax, transaortic peak velocity; and WBC, white blood cell count.

Study Design and Cohort

A prospective cohort of newly diagnosed patients with moderate or severe AS at a tertiary university hospital (Seoul National University Hospital, Korea) between 2013 and 2016 was used for derivation of the unsupervised clustering (n=398). We used another separate dataset of patients with AS with a different enrollment period, from 2010 to 2012 at the same institution (n=262), for validation of the clusters. Details on the cohort characteristics, inclusion/exclusion criteria, and definition of comorbidities are in Method in the Data Supplement.

The study protocol was approved by the institutional review board, and all participants in the derivation and validation cohort provided informed consent.

Outcome Assessment

All-cause mortality was the primary outcome. Secondary outcomes included cardiac mortality, noncardiac mortality, and death after AVR. Cardiac mortality was defined as sudden cardiac arrest, death from heart failure or myocardial infarction, or death related to AVR. Noncardiac mortality was defined as death from any cause other than the cardiac causes.

Mortality with no identifiable cause of death from the death certificate was classified as indeterminate. Follow-up data were available in 94% of the derivation, and 100% of the validation cohort.

Variable Preparation for Cluster Analysis

Variables used for clustering were recruited from clinical or echocardiographic domains that are either routinely obtained in the assessment of AS, used for risk stratification, or have prognostic value.7,13–15 Table 1 summarizes phenotypic domains and variables used for the analysis.

Table 1. Phenotype Domains and Variables

Phenotype domain | Variables
Physical exam | Systolic blood pressure, diastolic blood pressure, heart rate,* body mass index*
Laboratory data | White blood cell count,* hemoglobin,* platelet count,* blood urea nitrogen, creatinine*
Echocardiography data |
 Left heart geometry | LV end-systolic diameter, LV end-diastolic diameter, LV end-systolic volume, LV end-diastolic volume, LV septal thickness, LV posterior wall thickness, LV mass index, left atrial volume*
 LV systolic function | Ejection fraction,* cardiac index
 LV diastolic function | E-wave,* A-wave,* e’-wave, a’-wave
 Aortic valve | Aortic valve area index, peak aortic jet velocity, mean aortic pressure gradient, aortic valve time velocity integral, LV outflow tract diameter, sinus of Valsalva diameter, sinotubular junction diameter
 Other | Tricuspid regurgitation jet velocity*

LV indicates left ventricle.

*The 11 variables used in the final clustering, selected through the dimension reduction steps.

The missing values were imputed with the missForest algorithm,16 with appropriate imputation error (Figure I in the Data Supplement).16,17 Then, we selected pivotal variables for clustering through dimension reduction using the Pearson correlation coefficient and the Bayesian information criterion (BIC), which penalizes model complexity. A larger BIC indicates stronger support for the corresponding model. Details of variable preparation are described in Method in the Data Supplement.
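The two quantities named above can be sketched in a few lines. The example below is illustrative Python rather than the R code used in the study, and it assumes the "larger is better" form of the BIC (twice the log-likelihood minus a penalty of k·ln n), which matches the statement that a larger BIC is preferred. The log-likelihoods, parameter counts, and the toy vectors are hypothetical values chosen only to show the arithmetic.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def bic(log_likelihood, n_params, n_obs):
    """BIC in the 'larger is better' convention: 2*logL - k*ln(n)."""
    return 2.0 * log_likelihood - n_params * math.log(n_obs)


# Two perfectly collinear toy variables (hypothetical data).
r = pearson([1, 2, 3, 4], [2, 4, 6, 8])

# Hypothetical fits to n=398 patients: the richer model has a higher
# log-likelihood but pays a larger complexity penalty.
n_obs = 398
bic_simple = bic(log_likelihood=-5200.0, n_params=40, n_obs=n_obs)
bic_complex = bic(log_likelihood=-5150.0, n_params=120, n_obs=n_obs)
preferred = "simple" if bic_simple > bic_complex else "complex"
```

In this toy comparison the simpler model is preferred because its gain in penalty outweighs the loss in likelihood, which is the trade-off the BIC is meant to capture.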

Unsupervised Clustering

For the primary cluster analysis, we applied model-based clustering,18,19 which has been broadly adopted in previous studies.10–12 The number of clusters and specific geometric model were chosen based on BIC,10–12,18,19 as well as the integrated complete-data likelihood.18,19 Clustering was performed independently from the outcome data. After allocating individuals to each cluster, we compared phenotypes and outcomes between clusters, followed by interpretation of its clinical relevance.
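As a rough illustration of what model-based clustering does, the sketch below fits a 2-component Gaussian mixture to one variable with the expectation-maximization algorithm. It is a deliberate simplification written in Python for illustration: the study fitted a multivariate mixture to 11 variables in R, and the data, starting values, and function names here are hypothetical.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_two_component_gmm(data, n_iter=100, var_floor=1e-6):
    """Fit a 1-D, 2-component Gaussian mixture by EM; return parameters and labels."""
    n = len(data)
    grand_mean = sum(data) / n
    grand_var = sum((x - grand_mean) ** 2 for x in data) / n
    weights = [0.5, 0.5]
    means = [min(data), max(data)]          # crude but deterministic start
    variances = [grand_var, grand_var]
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each observation.
        resp = []
        for x in data:
            dens = [weights[k] * normal_pdf(x, means[k], variances[k]) for k in range(2)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: update mixing weights, means, and variances.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var_k = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var_k, var_floor)
    labels = [0 if r[0] >= r[1] else 1 for r in resp]
    return weights, means, variances, labels


# Hypothetical standardized measurements forming two well-separated groups.
values = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9]
weights, means, variances, labels = fit_two_component_gmm(values)
```

In practice the same fit would be repeated for several numbers of components and covariance structures, and the candidate with the best BIC retained.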

In addition, we used another popular clustering algorithm, agglomerative hierarchical clustering (Ward's method), to investigate whether it produces clusters of similar meaning. The number of clusters was determined based on 30 indices for hierarchical clustering. We also compared phenotypes and outcomes within these clusters. Details on model-based and hierarchical clustering are in Method in the Data Supplement.

Validation of the Findings From the Derivation Cohort

To validate the generalizability of the clustering, we used data from a separate validation cohort (n=262). The data were imputed and normalized using the same method as in the derivation cohort (Figure II in the Data Supplement). Cluster prediction was determined by multivariate observations (the variables used for clustering) based on Gaussian finite mixture models derived from the model-based clustering.10,19
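Assigning a new patient to a cluster under a fitted Gaussian mixture amounts to computing the posterior probability of each component and taking the largest. The Python sketch below shows this for two standardized variables with diagonal covariances, which is an assumption made to keep the example short. The mixing weights approximate the derivation cluster proportions (60, 86, and 252 of 398), whereas the means, variances, and the example patient are hypothetical and are not the fitted values from the study.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log-density of a Gaussian with diagonal covariance."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def predict_cluster(x, weights, means, variances):
    """Return (1-based cluster label, posterior probabilities) for observation x."""
    log_joint = [
        math.log(w) + log_gaussian_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    # Log-sum-exp for numerical stability.
    peak = max(log_joint)
    log_norm = peak + math.log(sum(math.exp(lj - peak) for lj in log_joint))
    posterior = [math.exp(lj - log_norm) for lj in log_joint]
    return posterior.index(max(posterior)) + 1, posterior


# Hypothetical mixture over (hemoglobin z-score, LVEF z-score).
weights = [0.15, 0.22, 0.63]
means = [[0.0, -1.5], [-1.5, -0.3], [0.4, 0.4]]
variances = [[1.0, 1.5], [0.8, 0.8], [0.5, 0.4]]

# Hypothetical validation patient with markedly low hemoglobin, average LVEF.
label, posterior = predict_cluster([-2.0, 0.0], weights, means, variances)
```

With these illustrative parameters the patient falls into the second component, mirroring how a validation patient with low hemoglobin would be drawn toward the comorbidities cluster.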

Statistical Analysis

Continuous variables are presented as mean±SD and categorical variables as numbers (percentages). The difference between continuous variables was analyzed using the ANOVA or Kruskal-Wallis test, and for categorical variables, either the χ² or Fisher's exact test. Kaplan-Meier curves were plotted with the duration from the enrollment to the last follow-up or death and compared with the log-rank test.
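For readers less familiar with the Kaplan-Meier estimator, a minimal Python sketch is given here. At each time with at least one death, the running survival probability is multiplied by the fraction of at-risk patients who survive that time, and censored patients simply leave the risk set. The follow-up times and event indicators are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(event_time, survival_probability), ...] for right-censored data.

    times  : follow-up time for each patient
    events : 1 if the patient died at that time, 0 if censored
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        leaving = sum(1 for ti in times if ti == t)  # deaths plus censored at t
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical follow-up (years) for 5 patients; 0 marks a censored observation.
follow_up = [1, 2, 2, 3, 4]
died = [1, 1, 0, 1, 0]
curve = kaplan_meier(follow_up, died)
```

Here survival steps down to 0.8, 0.6, and 0.3 at years 1, 2, and 3, and the censored patient at year 2 reduces the risk set without producing a step.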

Cox proportional hazard analyses were performed to evaluate the association between the outcomes and clusters. In the derivation cohort, multivariate Cox models were adjusted only for basic characteristics (age, sex, and body mass index) to avoid possible overfitting with the small number of events. Patients with missing outcome data were excluded from the survival and Cox analyses.

To further evaluate the prognostic and discriminative utility of the clusters, we compared the predictability for 3-year outcomes between models with and without the cluster variable.20,21 The base model was built with the variables significant in univariate Cox analysis for all-cause mortality, with missing values <10%. We calculated C statistics, net reclassification improvement, and integrated discrimination improvement to compare the prediction accuracy of the 2 models. The statistical inference for comparing the 2 models was conducted by the perturbation-resampling method by Uno,20,21 which uses inverse probability of censoring weights. The truncation time was set at 3 years and the P value was obtained with 1000 perturbation samples.21 This analysis was conducted with the survIDINRI package of R software. Additionally, we performed the same analysis using the CURRENT-AS risk score22 as a reference standard.
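The two reclassification measures can be illustrated with a simplified Python sketch. It treats the 3-year outcome as a fully observed binary event and therefore ignores censoring, unlike the inverse-probability-weighted estimators used in the study. The predicted risks and outcomes are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def idi(p_old, p_new, events):
    """Integrated discrimination improvement, ignoring censoring.

    Gain in mean predicted risk among events minus the gain among nonevents.
    """
    ev = [i for i, e in enumerate(events) if e == 1]
    ne = [i for i, e in enumerate(events) if e == 0]
    gain_events = mean([p_new[i] for i in ev]) - mean([p_old[i] for i in ev])
    gain_nonevents = mean([p_new[i] for i in ne]) - mean([p_old[i] for i in ne])
    return gain_events - gain_nonevents


def continuous_nri(p_old, p_new, events):
    """Category-free net reclassification improvement, ignoring censoring."""
    ev = [i for i, e in enumerate(events) if e == 1]
    ne = [i for i, e in enumerate(events) if e == 0]
    up_ev = sum(1 for i in ev if p_new[i] > p_old[i]) / len(ev)
    down_ev = sum(1 for i in ev if p_new[i] < p_old[i]) / len(ev)
    up_ne = sum(1 for i in ne if p_new[i] > p_old[i]) / len(ne)
    down_ne = sum(1 for i in ne if p_new[i] < p_old[i]) / len(ne)
    return (up_ev - down_ev) + (down_ne - up_ne)


# Hypothetical 3-year risks from a base model and a base+cluster model.
events = [1, 1, 0, 0, 0]
risk_base = [0.3, 0.4, 0.2, 0.3, 0.1]
risk_with_cluster = [0.5, 0.45, 0.1, 0.3, 0.1]

idi_value = idi(risk_base, risk_with_cluster, events)
nri_value = continuous_nri(risk_base, risk_with_cluster, events)
```

Both measures are positive when the new model raises predicted risk for patients who die and lowers it for those who survive.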

All analyses were done with R (Vienna, Austria) and its packages (Table I in the Data Supplement). A P value <0.05 was considered statistically significant.


Study Population

Between 2013 and 2016, 441 patients with AS were recruited. Among these, 43 patients were excluded because of a history of myocardial infarction or coronary artery bypass graft surgery (n=21), a history of valve surgery (n=12), or other valvular diseases greater than or equal to moderate degree (n=10). The remaining 398 patients constituted the final derivation cohort.

Variable Selection and Optimal Number of Clusters

From the initial 32 variables, 11 variables (6 clinical, 5 imaging parameters) were selected through the dimension reduction step (Table 1). A heatmap of the selected variables is shown in Figure IIIA in the Data Supplement. In model-based clustering, the VVE model with 3 clusters had the maximum BIC and integrated complete-data likelihood values (Figure IIIB and IIIC in the Data Supplement), so we took it as the optimal model and number of clusters. The BIC of the final model using the 11 selected variables was substantially improved compared with the model with all variables (n=32), supporting the validity of the dimension reduction step (Table II in the Data Supplement). The relative importance of the 11 variables for cluster assignment was assessed by the McFadden pseudo-R² (Method in the Data Supplement), and the variables ranked as follows: hemoglobin, tricuspid regurgitant jet velocity, creatinine, left atrial volume, E-wave velocity, left ventricular ejection fraction, body mass index, heart rate, A-wave velocity, platelet, and white blood cell count (Figure IV in the Data Supplement).
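For reference, the McFadden pseudo-R² in its standard form compares the log-likelihood of a fitted model with that of an intercept-only model. The expression below states that general definition; how it was applied to each variable in this study is described in the Data Supplement and is not reproduced here.

```latex
R^2_{\text{McFadden}} = 1 - \frac{\ln \hat{L}_{\text{model}}}{\ln \hat{L}_{\text{null}}}
```

Here $\hat{L}_{\text{model}}$ is the maximized likelihood of the model that includes the predictor and $\hat{L}_{\text{null}}$ is that of the intercept-only model, so values closer to 1 indicate that the predictor explains more of the cluster assignment.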

Comparison of Clinical and Echocardiography Parameters Between Clusters

The baseline characteristics of the clusters were compared (Table 2). Cluster 2 consisted predominantly of lean, elderly patients with more prevalent comorbidities, particularly end-stage renal disease, with the lowest glomerular filtration rate and hemoglobin level. Patients in cluster 3 were the youngest, least symptomatic, and had the least comorbidities among the 3 groups. In cluster 1, the most notable finding was the highest prevalence of atrial fibrillation.

Table 2. Baseline Characteristics of Study Participants by Clusters in the Derivation Cohort

Variable | Cluster 1 (n=60) | Cluster 2 (n=86) | Cluster 3 (n=252) | P Value
Age, y | 72.2±10.0 | 74.9±10.0 | 70.8±10.3 | 0.002
Male, n (%) | 30 (50.0) | 54 (62.8) | 135 (53.6) | 0.232
Body mass index, kg/m² | 26.3±5.1 | 22.8±2.9 | 24.1±2.8 | <0.001
NYHA functional class, n (%) | | | | 0.005
 I | 16 (26.7) | 26 (30.2) | 104 (41.3) |
 II | 29 (48.3) | 40 (46.5) | 121 (48.0) |
 III | 12 (20.0) | 19 (22.1) | 25 (9.9) |
 IV | 3 (5.0) | 1 (1.2) | 2 (0.8) |
Smoking, n (%) | 11 (18.3) | 22 (25.6) | 49 (19.4) | 0.428
Systolic blood pressure, mm Hg | 131.9±18.3 | 135.4±24.9 | 131.7±17.4 | 0.435
Diastolic blood pressure, mm Hg | 69.9±11.3 | 69.7±12.0 | 71.4±10.8 | 0.224
Heart rate, beats per minute | 75.7±20.5 | 69.2±11.7 | 65.7±9.8 | <0.001
Comorbidities, n (%) | | | |
 Hypertension | 46 (76.7) | 66 (76.7) | 167 (66.3) | 0.090
 Diabetes mellitus | 18 (30.0) | 33 (38.4) | 63 (25.0) | 0.059
 Dyslipidemia | 14 (23.3) | 18 (20.9) | 80 (31.7) | 0.105
 Coronary artery disease | 7 (11.7) | 11 (12.8) | 16 (6.3) | 0.117
 Atrial fibrillation | 11 (18.3) | 2 (2.3) | 4 (1.6) | <0.001
 Stroke | 7 (11.7) | 8 (9.3) | 15 (6.0) | 0.251
 Pulmonary disease | 6 (10.0) | 16 (18.6) | 22 (8.7) | 0.040
 Liver cirrhosis | 2 (3.3) | 4 (4.7) | 1 (0.4) | 0.021
 End-stage renal disease | 0 (0.0) | 18 (20.9) | 2 (0.8) | <0.001
 WBC count, ×10³/μL | 7.2±1.7 | 6.9±3.8 | 6.8±1.9 | 0.023
 Hemoglobin, g/dL | 12.8±1.9 | 10.7±1.5 | 13.4±1.5 | <0.001
 Platelet count, ×10³/μL | 222.6±65.0 | 197.8±99.8 | 213.7±51.6 | 0.002
 Blood urea nitrogen, mg/dL | 18.5±6.9 | 26.7±13.8 | 16.8±5.2 | <0.001
 Creatinine, mg/dL | 0.9±0.2 | 2.5±2.6 | 0.9±0.2 | <0.001
 eGFR, mL/min per 1.73 m² | 79.4±18.0 | 54.5±38.6 | 82.8±21.4 | <0.001

The data are presented as mean±SD for continuous variables and number (percentage) for categorical variables. eGFR indicates estimated glomerular filtration rate; NYHA, New York Heart Association; and WBC, white blood cell.

Regarding echocardiographic evaluation, cluster 1 had significantly depressed left ventricular ejection fraction, more left ventricular hypertrophy, and more severe diastolic dysfunction (Table 3). In contrast, cluster 3 had the most preserved cardiac function and structure. Cluster 2 had values (ie, left ventricular mass index, left atrial volume) that fell between those of clusters 1 and 3. Notably, there was no difference in AS severity between the 3 clusters. Overall, the 3 clusters could be characterized as follows: cardiac dysfunction for cluster 1, comorbidities for cluster 2, and healthy AS for cluster 3.

Table 3. Echocardiography Parameters of Study Participants by Clusters in the Derivation Cohort

Variable | Cluster 1 (n=60) | Cluster 2 (n=86) | Cluster 3 (n=252) | P Value
Left heart geometry | | | |
 LV end-systolic diameter, mm | 32.0±7.4 | 31.6±5.0 | 28.8±3.6 | <0.001
 LV end-diastolic diameter, mm | 50.1±7.1 | 49.3±5.0 | 47.3±4.8 | <0.001
 LV end-systolic volume, mL | 59.4±46.5 | 47.7±21.4 | 38.4±14.6 | <0.001
 LV end-diastolic volume, mL | 126.9±61.8 | 116.1±36.6 | 105.6±34.1 | 0.017
 LV septal thickness, mm | 11.7±2.0 | 11.1±1.7 | 11.0±2.0 | 0.043
 LV posterior wall thickness, mm | 11.3±1.7 | 10.7±1.5 | 10.6±1.6 | 0.010
 LV mass index, g/m² | 135.0±42.9 | 128.9±32.1 | 115.6±31.2 | <0.001
 Left atrial volume, mL | 115.5±41.9 | 99.9±31.9 | 80.3±22.9 | <0.001
LV systolic function and hemodynamics | | | |
 LV ejection fraction, % | 56.6±12.6 | 60.0±8.0 | 63.8±5.0 | <0.001
 Cardiac index, L/min/m² | 3.6±1.3 | 3.4±7.2 | 3.3±7.6 | 0.127
Diastolic function | | | |
 E-wave, m/s | 0.9±0.3 | 0.8±0.3 | 0.6±0.2 | <0.001
 A-wave, m/s | 0.9±0.4 | 1.0±0.3 | 0.9±0.2 | 0.014
 e’-wave, cm/s | 4.5±1.5 | 4.2±1.3 | 4.5±1.4 | 0.250
 a’-wave, cm/s | 6.4±2.1 | 7.3±1.9 | 7.8±1.7 | <0.001
 E/A ratio | 1.3±1.0 | 0.8±0.4 | 0.7±0.3 | <0.001
 E/e’ ratio | 23.3±11.9 | 20.3±10.1 | 15.2±6.2 | <0.001
 TR jet velocity, m/s | 2.9±0.6 | 2.7±0.4 | 2.4±0.2 | <0.001
Diastolic dysfunction, n (%) | | | | <0.001
 Normal | 11 (18.3) | 23 (26.7) | 115 (45.6) |
 Grade I | 2 (3.3) | 6 (7.0) | 36 (14.3) |
 Grade II | 25 (41.7) | 47 (54.7) | 82 (32.5) |
 Grade III | 20 (33.3) | 7 (8.1) | 4 (1.6) |
 Indeterminate | 2 (3.3) | 3 (3.5) | 15 (6.0) |
Aortic valve | | | |
 AS severity, n (%) | | | | 0.830
  Moderate | 27 (45.0) | 43 (50.0) | 119 (47.2) |
  Severe | 33 (55.0) | 43 (50.0) | 133 (52.8) |
 Aortic valve area index, cm²/m² | 0.5±0.2 | 0.5±0.1 | 0.5±0.1 | 0.826
 Peak aortic jet velocity, m/s | 4.3±0.9 | 4.1±0.9 | 4.2±0.8 | 0.348
 Mean pressure gradient, mm Hg | 45.3±19.6 | 42.5±20.6 | 44.2±18.2 | 0.351
 Aortic valve TVI, cm | 99.1±29.0 | 99.0±28.6 | 98.5±25.2 | 0.944
 LV outflow tract diameter, mm | 21.2±2.0 | 21.3±1.6 | 21.3±1.8 | 0.961
 Sinus of Valsalva diameter, mm | 33.6±4.2 | 33.6±4.3 | 34.0±4.4 | 0.772
 Sinotubular junction diameter, mm | 28.2±4.7 | 27.6±4.1 | 28.4±4.5 | 0.076

The data are presented as mean±SD for continuous variables and number (percentage) for categorical variables. AS indicates aortic stenosis; LV, left ventricle; TR, tricuspid regurgitation; and TVI, time velocity integral.

Clinical Outcomes of Each Cluster in the Derivation Cohort

Clinical outcomes were markedly different per cluster. During a median 2.4 years (interquartile range, 1.3–3.4 years) follow-up, there were 40 mortality cases (14 cardiac mortality, 20 noncardiac mortality, and 6 indeterminate; Table III in the Data Supplement). Cluster 2 had the highest all-cause mortality, followed by cluster 1 (P<0.001; Figure 2A). The cumulative incidence of cardiac mortality was the highest in cluster 1 and also significantly increased in cluster 2 compared with cluster 3 (P=0.005; Figure 2B). However, noncardiac mortality occurred predominantly only in cluster 2 (P=0.001; Figure 2C).

Figure 2. Adverse outcomes according to the clusters in the derivation cohort. Survival free from (A) all-cause, (B) cardiac, (C) noncardiac, and (D) post-aortic valve replacement (AVR) mortality. *P<0.05 by pairwise comparison with cluster 3.

In unadjusted Cox analysis, when compared with cluster 3, all-cause mortality risk was higher in cluster 2, as well as in cluster 1 with marginal significance (Table 4). Cluster 1 had the strongest risk of cardiac mortality (HR, 6.44 [95% CI, 1.82–22.83]; P=0.004) but no association with noncardiac mortality, whereas the risk of noncardiac mortality was significantly elevated in cluster 2 (HR, 4.51 [95% CI, 1.78–11.45]; P=0.002). Cluster 2 also had a higher cardiac mortality risk (HR, 3.67 [95% CI, 0.92–14.72]; P=0.066). The association of cluster 1 with cardiac mortality and cluster 2 with noncardiac mortality was consistent in the adjusted Cox analysis (Table 4).
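The reported hazard ratios, confidence intervals, and P values can be cross-checked against each other. The Python sketch below assumes Wald-type 95% CIs that are symmetric on the log scale, which is the usual output of a Cox model but is not stated explicitly in the text. Under that assumption the standard error of the log hazard ratio is recovered from the CI width, and a two-sided P value follows from the normal distribution.

```python
import math


def p_from_hr_ci(hr, lower, upper, z_crit=1.96):
    """Recover (SE of log HR, z statistic, two-sided P) from an HR and its 95% CI."""
    se = (math.log(upper) - math.log(lower)) / (2.0 * z_crit)
    z = math.log(hr) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return se, z, p


# Unadjusted estimates versus cluster 3, as reported in the text.
_, _, p_c1_cardiac = p_from_hr_ci(6.44, 1.82, 22.83)     # reported P=0.004
_, _, p_c2_noncardiac = p_from_hr_ci(4.51, 1.78, 11.45)  # reported P=0.002
_, _, p_c2_cardiac = p_from_hr_ci(3.67, 0.92, 14.72)     # reported P=0.066
```

The recovered values agree with the reported P values to within rounding, which supports the internal consistency of the estimates.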

Table 4. Association of Clusters With Adverse Outcomes in Cox Proportional Hazard Analysis in the Derivation Cohort

Variable | Cluster 1 (n=60) | Cluster 2 (n=86) | Cluster 3 (n=252) | P Value
Outcome, n (%) | | | |
 All-cause mortality | 8 (13.3) | 17 (19.8) | 15 (6.0) | <0.001
 Cardiac mortality | 6 (10.0) | 4 (4.7) | 4 (1.6) | 0.005
 Noncardiac mortality | 2 (3.3) | 10 (11.6) | 8 (3.2) | 0.001
 Death after AVR* | 2 (5.7) | 6 (13.3) | 6 (5.2) | 0.161
Unadjusted HR (95% CI) | | | |
 All-cause mortality | 2.30 (0.98–5.43) | 4.04 (2.01–8.09) | 1 |
 Cardiac mortality | 6.44 (1.82–22.83) | 3.67 (0.92–14.72) | 1 |
 Noncardiac mortality | 1.07 (0.22–5.05) | 4.51 (1.78–11.45) | 1 |
 Death after AVR* | 1.06 (0.21–5.24) | 2.74 (0.88–8.51) | 1 |
Adjusted HR (95% CI)‡ | | | |
 All-cause mortality | 2.61 (1.08–6.31) | 2.95 (1.44–6.07) | 1 |
 Cardiac mortality | 7.37 (2.00–27.13) | 2.75 (0.67–11.36) | 1 |
 Noncardiac mortality | 1.20 (0.25–5.77) | 3.35 (1.26–8.90) | 1 |
 Death after AVR* | 1.00 (0.19–5.36) | 2.69 (0.83–8.67) | 1 |

The data are presented as number (percentage) and HR (95% CI). Mortality data were available for 58 in cluster 1 (96.7%), 81 in cluster 2 (94.2%), and 236 in cluster 3 (93.7%). Three patients in cluster 2 and 3 patients in cluster 3 had indeterminate causes of mortality and were omitted from the Cox analysis. AVR indicates aortic valve replacement; and HR, hazard ratio.

*AVR was performed in 196 (49%) patients.


‡Adjusted for age, sex, and body mass index.

During follow-up, 196 (49%) received AVR, of which 32 were transcatheter AVR and 164 surgical AVR. There was no notable difference in the proportion and types of AVR between the clusters (Figure V in the Data Supplement). Cluster 2 had the worst survival after AVR, although not statistically different (Figure 2D).

Incremental Predictive Value of the Clusters for the Prediction of Outcomes

The baseline prediction models for 3-year outcomes were constructed using the risk factors identified from the univariate Cox analysis. Univariate and multivariate Cox analyses for these variables are shown in Table IV in the Data Supplement. The addition of the cluster variable to the base model showed a significant integrated discrimination improvement and net reclassification improvement for 3-year all-cause mortality (C statistics 0.762 versus 0.788; integrated discrimination improvement 0.029, P=0.020; net reclassification improvement 0.294, P=0.032), as well as for noncardiac mortality (Table 5). For cardiac mortality, the predictability was improved based on integrated discrimination improvement (Table 5). The result was consistent when the CURRENT-AS risk score22 was used as a reference standard (Table V in the Data Supplement).

Table 5. Comparison of Risk Prediction Models With and Without the Cluster Variable for 3-year Outcomes in the Derivation Cohort

Outcome | C statistic, Base Model | C statistic, Base Model+Cluster | IDI | NRI
All-Cause Mortality | 0.762 | 0.788 | 0.029, P=0.020 | 0.294, P=0.032
Cardiac Mortality | 0.816 | 0.822 | 0.045, P=0.044 | 0.382, P=0.210
Noncardiac Mortality | 0.767 | 0.808 | 0.047, P=0.016 | 0.313, P=0.036

The base prediction model included age, hypertension, diabetes mellitus, end-stage renal disease, left ventricular ejection fraction, and peak aortic jet velocity. IDI indicates integrated discrimination improvement; and NRI, net reclassification improvement.

Clinical Outcomes of the Clusters in the Independent Validation Cohort

A separate dataset of 262 patients with AS was used to validate the findings from the derivation cohort. The differences between the 2 cohorts are described in Method and Table VI in the Data Supplement. The cluster prediction for the validation cohort was performed using the same 11 variables identified from the derivation cohort. As in the derivation cohort, cluster 2 in the validation cohort (V-cluster 2) included elderly patients with end-stage renal disease, and cluster 1 in the validation cohort (V-cluster 1) had reduced left ventricular ejection fraction and more frequent atrial fibrillation (Table VII in the Data Supplement).

The pattern of outcomes was reproduced in the validation cohort (Figure 3). During a median 4.3 years (interquartile range, 0.8–6.5 years) follow-up, 113 mortalities occurred (41 cardiac cause, 59 noncardiac cause, and 13 indeterminate). The all-cause mortality rate was the highest in V-cluster 2, followed by V-cluster 1 (Figure 3A). Both V-cluster 1 and 2 had increased cardiac mortality compared with V-cluster 3 (Figure 3B). However, the majority of noncardiac deaths occurred in V-cluster 2 (Figure 3C), as well as the post-AVR deaths (Figure 3D).

Figure 3. Adverse outcomes according to the clusters in the validation cohort. Survival free from (A) all-cause, (B) cardiac, (C) noncardiac, and (D) post-aortic valve replacement (AVR) mortality. *P<0.05 by pairwise comparison with V-cluster 3.

In Cox analysis of the validation cohort, V-cluster 1 was associated with an increased risk of cardiac mortality, whereas V-cluster 2 had an increased risk of both cardiac and noncardiac mortality (Table VIII in the Data Supplement), findings consistent with those from the derivation cohort.

Alternative Cluster Analysis Using Hierarchical Clustering

We further investigated whether an alternative clustering algorithm produces similar results. The derivation cohort data were reanalyzed with agglomerative hierarchical clustering. Three clusters were identified in the hierarchical clustering by the majority rule (Figure VI in the Data Supplement). The characteristics of each cluster generally corresponded with the original clusters from model-based clustering, in that cluster 2 in hierarchical clustering (H-cluster 2) had more frequent comorbidities and H-cluster 1 was characterized by cardiac dysfunction, whereas H-cluster 3 had neither (Table IX in the Data Supplement).

The overall trend of outcomes was similar to that of the model-based clustering. H-cluster 2 had the highest all-cause, noncardiac, and post-AVR mortality rate (Figure VII in the Data Supplement). Both H-cluster 1 and 2 had more cardiac mortality compared with H-cluster 3, although the higher rate of cardiac mortality in H-cluster 1 was less prominent (P=0.078).


There are 3 main findings in the current study. First, unsupervised cluster analysis successfully demonstrated 3 groups of patients with moderate or severe AS with distinct phenotypes. Second, each cluster had markedly different patterns of outcomes and causes of death. Third, the result was replicated in the validation cohort. This study provides a guide on how patients with AS can be phenotypically categorized and what kinds of outcomes would be expected within these specific groups.

Cardiologists are apt to focus on echocardiography parameters of the valve in patients with AS. However, a variety of prognostic features have been identified beyond the valve that are not included in the current staging.7,8,14,15,23 A recent study reported that an AS classification based on extravalvular cardiac damage (ie, left ventricular hypertrophy, tricuspid regurgitation) has significant prognostic value in predicting outcomes after AVR, whereas AS severity measurements (ie, aortic valve area) were not associated with adverse events.23 While these diverse factors have been investigated individually through the conventional hypothesis-driven approach, a more integrative, data-driven cluster analysis can be a powerful way to explore heterogeneity.9–12

In our cluster analysis, each of the 3 clusters demonstrated distinct phenotypes and can be interpreted as follows: cardiac dysfunction for cluster 1, comorbidities for cluster 2, and healthy AS for cluster 3. Notably, echocardiographic variables that were pivotal for clustering were tricuspid regurgitant jet velocity, left atrial volume, E-, and A-wave velocity, and left ventricular ejection fraction (Figure IV in the Data Supplement), whereas none of the AS severity indices were critical for clustering. Despite the similar degree of AS severity throughout the groups, cluster 1 was characterized by various structural and functional cardiac dysfunctions, which led to the worst cardiac consequences. The result implies the substantial prognostic value within these cardiac imaging markers in AS outside the diseased aortic valve.14,23

Importantly, cluster 2 had the worst prognosis in all-cause mortality, a difference mainly seen in noncardiac causes. Although noncardiac comorbidities, such as malignancy or infection, account for almost half of the actual deaths in AS,24 they have not received enough attention. Factors associated with noncardiac death are age, low body mass index, anemia, and dialysis,24 which were predominant features of cluster 2. These conditions, particularly end-stage renal disease, are also poor prognosticators even after AVR,25 which is in line with the lowest post-AVR survival of cluster 2. Collectively, our results highlight that specific types of death are more closely related to specific groups of patients with AS, and that noncardiac death should not be neglected, particularly in those with significant comorbidities.

With the similar phenotypes for the corresponding clusters, the pattern of outcome and the leading cause of death were reproduced in an independent cohort (n=262), providing the external clinical validity and generalizability.10 Notably, the marginally increased cardiac mortality risk of cluster 2 in the derivation cohort became more evident in the validation cohort, as well as the worst post-AVR prognosis (Figure 3B and 3D). This further supported the characteristic outcomes of the clusters (cluster 1—cardiac mortality, cluster 2—noncardiac and cardiac mortality, and death after AVR).

Our study suggests that a different therapeutic and surveillance strategy may be needed for each cluster, as a step toward precision-medicine in AS. In particular, cluster analysis with deeply phenotyped data could be utilized for future trials of AS. With rapid advances in AVR techniques, recent studies have tried to address the optimal timing of intervention, specifically for patients with moderate or asymptomatic severe AS.26,27 Along with this clinical demand, there are ongoing efforts for a more sophisticated risk stratification using clinical factors,22 diverse imaging parameters,23 and biomarkers,27 to identify individuals for whom the benefits of AVR outweigh the risks.27 The cluster-based analysis, which segregates patients into distinct phenotypes and relevant adverse outcomes, may offer a novel target for AVR among the heterogeneous AS patients for future trials. For instance, given a similar favorable post-AVR prognosis to the healthy AS group, cluster 1 (cardiac dysfunction group) might be a candidate for a more active AVR, while it may not completely resolve the dismal outcome in cluster 2 (comorbidities group; Figure 3D). The clinical implications of cluster-based approaches may also be enhanced by incorporating a broader range of data, including proteomics or cardiac magnetic resonance. These hypotheses need to be tested in future work.


First, the optimal clustering method can be difficult to determine. Although we found 3 distinct clusters using model-based clustering, whose results generally corresponded to those of hierarchical clustering, other algorithms may yield different results. We adopted model-based clustering because it is likelihood based and statistically more flexible; for instance, the number of clusters can be chosen by the Bayesian information criterion (BIC) or a likelihood ratio test.9–12 Second, patients were enrolled from a single institution, and validation is warranted in datasets from other institutions. Additionally, there were several differences between the derivation and validation cohorts. Nevertheless, the phenotypes and outcomes were reproduced in the validation cohort, supporting the robustness of our results. Third, the multivariate Cox analysis may be limited by the small number of events in the derivation cohort. Last, some variables had missing values. However, they were imputed with an established statistical method,16 the appropriateness of which has been verified in previous studies.17
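To illustrate how a likelihood-based criterion can choose the number of clusters, the following is a minimal sketch and not the analysis performed in this study. It fits 1-dimensional Gaussian mixtures by expectation-maximization to a synthetic toy sample and compares candidate cluster counts by BIC. The function names (`fit_gmm_1d`, `bic`, `select_k`, `toy_sample`), the toy data, the initialization scheme, and the variance floor are all assumptions made for illustration; the study itself clustered 11 variables, which this 1-dimensional example does not reproduce. BIC is written here so that lower values are better; some software, including the mclust package listed in the references, reports it with the opposite sign.

```python
import math
from statistics import NormalDist


def log_normal_pdf(v, mean, var):
    """Log density of a univariate normal distribution."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (v - mean) ** 2 / var)


def fit_gmm_1d(x, k, n_iter=200, var_floor=1e-2):
    """Fit a k-component 1-D Gaussian mixture by EM; return the log-likelihood."""
    n = len(x)
    xs = sorted(x)
    grand_mean = sum(x) / n
    grand_var = sum((v - grand_mean) ** 2 for v in x) / n
    # Deterministic start (an assumption): means at evenly spaced quantiles.
    means = [xs[int((j + 0.5) * n / k)] for j in range(k)]
    variances = [grand_var / k ** 2] * k
    weights = [1.0 / k] * k
    loglik = -math.inf
    for _ in range(n_iter):
        # E-step: responsibilities, computed in log space for stability.
        resp, loglik = [], 0.0
        for v in x:
            logs = [math.log(w) + log_normal_pdf(v, m, s2)
                    for w, m, s2 in zip(weights, means, variances)]
            top = max(logs)
            log_total = top + math.log(sum(math.exp(l - top) for l in logs))
            loglik += log_total
            resp.append([math.exp(l - log_total) for l in logs])
        # M-step: update weights, means, and variances from responsibilities.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * v for r, v in zip(resp, x)) / nj
            spread = sum(r[j] * (v - means[j]) ** 2 for r, v in zip(resp, x))
            # The floor keeps a component from collapsing onto a single point.
            variances[j] = max(spread / nj, var_floor)
    # Log-likelihood evaluated in the final E-step.
    return loglik


def bic(loglik, k, n):
    """BIC with the 'lower is better' convention."""
    n_params = 3 * k - 1  # k means, k variances, k - 1 free mixing weights
    return -2.0 * loglik + n_params * math.log(n)


def select_k(x, candidates=range(1, 6)):
    """Return the candidate k with the lowest BIC, plus all scores."""
    scores = {k: bic(fit_gmm_1d(x, k), k, len(x)) for k in candidates}
    return min(scores, key=scores.get), scores


def toy_sample():
    """Synthetic data: three well-separated groups of 100 points each."""
    std = NormalDist()
    quantiles = [std.inv_cdf((i + 0.5) / 100) for i in range(100)]
    return [center + q for center in (0.0, 10.0, 20.0) for q in quantiles]


best_k, scores = select_k(toy_sample())
print("selected number of clusters:", best_k)
```

On this deliberately well-separated toy sample, the criterion should favor 3 components, because the gain in likelihood from adding a fourth or fifth component does not offset the penalty for the extra parameters. Real clinical data are rarely this clean, which is one reason different algorithms or criteria may disagree.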


Unsupervised cluster analysis of patients with AS demonstrated 3 distinct phenotypes with significantly different outcomes. These results offer a new way of grouping patients with AS, emphasizing the role of comorbidities and extravalvular cardiac dysfunction in association with different causes of death.


The Data Supplement is available at

Sungho Won, PhD, Department of Public Health Sciences, Seoul National University, 1, Gwanak-ro, Gwanak-gu, Seoul 08826, South Korea. Email
Seung-Pyo Lee, MD, PhD, Division of Cardiology, Department of Internal Medicine, Seoul National University Hospital, 101, Daehak-ro, Jongno-gu, Seoul 03080, South Korea. Email


  • 1. d’Arcy JL, Coffey S, Loudon MA, Kennedy A, Pearson-Stuttard J, Birks J, Frangou E, Farmer AJ, Mant D, Wilson J, et al. Large-scale community echocardiographic screening reveals a major burden of undiagnosed valvular heart disease in older people: the OxVALVE Population Cohort Study. Eur Heart J. 2016; 37:3515–3522. doi: 10.1093/eurheartj/ehw229
  • 2. Makkar RR, Fontana GP, Jilaihawi H, Kapadia S, Pichard AD, Douglas PS, Thourani VH, Babaliaros VC, Webb JG, Herrmann HC, et al; PARTNER Trial Investigators. Transcatheter aortic-valve replacement for inoperable severe aortic stenosis. N Engl J Med. 2012; 366:1696–1704. doi: 10.1056/NEJMoa1202277
  • 3. Varadarajan P, Kapoor N, Bansal RC, Pai RG. Survival in elderly patients with severe aortic stenosis is dramatically improved by aortic valve replacement: Results from a cohort of 277 patients aged > or =80 years. Eur J Cardiothorac Surg. 2006; 30:722–727. doi: 10.1016/j.ejcts.2006.07.028
  • 4. Faggiano P, Frattini S, Zilioli V, Rossi A, Nistri S, Dini FL, Lorusso R, Tomasi C, Cas LD. Prevalence of comorbidities and associated cardiac diseases in patients with valve aortic stenosis. Potential implications for the decision-making process. Int J Cardiol. 2012; 159:94–99. doi: 10.1016/j.ijcard.2011.02.026
  • 5. Reardon MJ, Van Mieghem NM, Popma JJ, Kleiman NS, Søndergaard L, Mumtaz M, Adams DH, Deeb GM, Maini B, Gada H, et al; SURTAVI Investigators. Surgical or transcatheter aortic-valve replacement in intermediate-risk patients. N Engl J Med. 2017; 376:1321–1331. doi: 10.1056/NEJMoa1700456
  • 6. Nishimura RA, Otto CM, Bonow RO, Carabello BA, Erwin JP, Guyton RA, O’Gara PT, Ruiz CE, Skubas NJ, Sorajja P, et al; ACC/AHA Task Force Members. 2014 AHA/ACC guideline for the management of patients with valvular heart disease: executive summary: a report of the American College of Cardiology/American Heart Association Task Force on practice guidelines. Circulation. 2014; 129:2440–2492. doi: 10.1161/CIR.0000000000000029
  • 7. Vahanian A, Otto CM. Risk stratification of patients with aortic stenosis. Eur Heart J. 2010; 31:416–423. doi: 10.1093/eurheartj/ehp575
  • 8. Lee H, Park JB, Yoon YE, Park EA, Kim HK, Lee W, Kim YJ, Cho GY, Sohn DW, Greiser A, et al. Noncontrast myocardial T1 mapping by cardiac magnetic resonance predicts outcome in patients with aortic stenosis. JACC Cardiovasc Imaging. 2018; 11:974–983. doi: 10.1016/j.jcmg.2017.09.005
  • 9. Dey D, Slomka PJ, Leeson P, Comaniciu D, Shrestha S, Sengupta PP, Marwick TH. Artificial intelligence in cardiovascular imaging: JACC state-of-the-art review. J Am Coll Cardiol. 2019; 73:1317–1335. doi: 10.1016/j.jacc.2018.12.054
  • 10. Shah SJ, Katz DH, Selvaraj S, Burke MA, Yancy CW, Gheorghiade M, Bonow RO, Huang CC, Deo RC. Phenomapping for novel classification of heart failure with preserved ejection fraction. Circulation. 2015; 131:269–279. doi: 10.1161/CIRCULATIONAHA.114.010637
  • 11. Katz DH, Deo RC, Aguilar FG, Selvaraj S, Martinez EE, Beussink-Nelson L, Kim KA, Peng J, Irvin MR, Tiwari H, et al. Phenomapping for the identification of hypertensive patients with the myocardial substrate for heart failure with preserved ejection fraction. J Cardiovasc Transl Res. 2017; 10:275–284. doi: 10.1007/s12265-017-9739-z
  • 12. Shah RV, Yeri AS, Murthy VL, Massaro JM, D’Agostino R, Freedman JE, Long MT, Fox CS, Das S, Benjamin EJ, et al. Association of multiorgan computed tomographic phenomap with adverse cardiovascular health outcomes: the Framingham heart study. JAMA Cardiol. 2017; 2:1236–1246. doi: 10.1001/jamacardio.2017.3145
  • 13. Baumgartner H, Hung J, Bermejo J, Chambers JB, Edvardsen T, Goldstein S, Lancellotti P, LeFevre M, Miller F, Otto CM. Recommendations on the echocardiographic assessment of aortic valve stenosis: a focused update from the European Association of Cardiovascular Imaging and the American society of echocardiography. J Am Soc Echocardiogr. 2017; 30:372–392. doi: 10.1016/j.echo.2017.02.009
  • 14. Asami M, Lanz J, Stortecky S, Räber L, Franzone A, Heg D, Hunziker L, Roost E, Siontis GC, Valgimigli M, et al. The impact of left ventricular diastolic dysfunction on clinical outcomes after transcatheter aortic valve replacement. JACC Cardiovasc Interv. 2018; 11:593–601. doi: 10.1016/j.jcin.2018.01.240
  • 15. O’Brien SM, Feng L, He X, Xian Y, Jacobs JP, Badhwar V, Kurlansky PA, Furnary AP, Cleveland JC, Lobdell KW, et al. The society of thoracic surgeons 2018 adult cardiac surgery risk models: part 2-statistical methods and results. Ann Thorac Surg. 2018; 105:1419–1428. doi: 10.1016/j.athoracsur.2018.03.003
  • 16. Stekhoven DJ, Bühlmann P. MissForest–non-parametric missing value imputation for mixed-type data. Bioinformatics. 2012; 28:112–118. doi: 10.1093/bioinformatics/btr597
  • 17. Wei R, Wang J, Su M, Jia E, Chen S, Chen T, Ni Y. Missing value imputation approach for mass spectrometry-based metabolomics data. Sci Rep. 2018; 8:663. doi: 10.1038/s41598-017-19120-0
  • 18. Fraley C, Raftery AE. Model-based clustering, discriminant analysis, and density estimation. J Am Stat Assoc. 2002; 97:611–631.
  • 19. Scrucca L, Fop M, Murphy TB, Raftery AE. mclust 5: clustering, classification and density estimation using gaussian finite mixture models. R J. 2016; 8:289–317.
  • 20. Uno H, Cai T, Pencina MJ, D’Agostino RB, Wei LJ. On the C statistics for evaluating overall adequacy of risk prediction procedures with censored survival data. Stat Med. 2011; 30:1105–1117. doi: 10.1002/sim.4154
  • 21. Uno H, Tian L, Cai T, Kohane IS, Wei LJ. A unified inference procedure for a class of measures to assess improvement in risk prediction systems with survival data. Stat Med. 2013; 32:2430–2442. doi: 10.1002/sim.5647
  • 22. Minamino-Muta E, Kato T, Morimoto T, Taniguchi T, Ando K, Kanamori N, Murata K, Kitai T, Kawase Y, Miyake M, et al. A risk prediction model in asymptomatic patients with severe aortic stenosis: CURRENT-AS risk score. Eur Heart J Qual Care Clin Outcomes. 2019; 6:166–174. doi: 10.1093/ehjqcco/qcz044
  • 23. Généreux P, Pibarot P, Redfors B, Mack MJ, Makkar RR, Jaber WA, Svensson LG, Kapadia S, Tuzcu EM, Thourani VH, et al. Staging classification of aortic stenosis based on the extent of cardiac damage. Eur Heart J. 2017; 38:3351–3358. doi: 10.1093/eurheartj/ehx381
  • 24. Minamino-Muta E, Kato T, Morimoto T, Taniguchi T, Shiomi H, Nakatsuma K, Shirai S, Ando K, Kanamori N, Murata K, et al. Causes of death in patients with severe aortic stenosis: an observational study. Sci Rep. 2017; 7:14723. doi: 10.1038/s41598-017-15316-6
  • 25. Thourani VH, Keeling WB, Sarin EL, Guyton RA, Kilgo PD, Dara AB, Puskas JD, Chen EP, Cooper WA, Vega JD, et al. Impact of preoperative renal dysfunction on long-term survival for patients undergoing aortic valve replacement. Ann Thorac Surg. 2011; 91:1798–1806; discussion 1806. doi: 10.1016/j.athoracsur.2011.02.015
  • 26. Kang DH, Park SJ, Lee SA, Lee S, Kim DH, Kim HK, Yun SC, Hong GR, Song JM, Chung CH, et al. Early surgery or conservative care for asymptomatic aortic stenosis. N Engl J Med. 2020; 382:111–119. doi: 10.1056/NEJMoa1912846
  • 27. Lancellotti P, Vannan MA. Timing of intervention in aortic stenosis. N Engl J Med. 2020; 382:191–193. doi: 10.1056/NEJMe1914382