Incremental Value of a Panel of Serum Metabolites for Predicting Risk of Atherosclerotic Cardiovascular Disease

Cardiovascular diseases (CVDs) are the leading causes of mortality and morbidity worldwide, accounting for 17.3 million deaths per year.1 The American College of Cardiology/American Heart Association 10-year atherosclerotic CVD risk score is a sex- and race-specific single multivariable risk assessment tool used to estimate the 10-year CVD risk of an individual based on age, sex, and traditional risk factors (TRFs), including high-density lipoprotein and total cholesterol, blood pressure, blood pressure medications, smoking, and type 2 diabetes.1 These factors contribute considerably to disease risk, although they may not identify at-risk individuals before disease onset.2,3 Previous studies found circulating metabolites predictive of cardiovascular traits, mostly using linear approaches and a limited number of metabolites.3–5 By combining the effects of a larger number of individual biomarkers, TRFs, and environmental variables, we applied a machine learning technique to identify a metabolite panel cross-sectionally associated with estimated atherosclerotic CVD (eASCVD) risk and longitudinally predictive of CVD mortality and morbidity in a population-based cohort with independent replication, to gain further insights into the metabolic pathways underlying CVD risk.
The data used in this study are held by the Department of Twins Research at King's College London. The data can be released to bona fide researchers using our normal procedures overseen by the Wellcome Trust and its guidelines as part of our core funding (https://twinsuk.ac.uk/resources-for-researchers/access-our-data/). The scripts in R and all the necessary information to replicate the findings reported in this article are publicly available at https://github.com/ananogal1/ASCVD-metabolite-panel.
The flowchart of the study design is depicted in the Figure (A). We included women from TwinsUK1 with fasting serum metabolomic profiling (533 metabolites; Metabolon) along with eASCVD,1 TRFs, diet (healthy eating index),1 menopause status, and physical activity at 2 time points 6 years apart (SD=2) (Figure [B]). Individuals with prevalent CVD were excluded. TwinsUK participants provided written informed consent, and the study was approved by the St. Thomas' Hospital Research Ethics Committee (REC Ref: EC04/015).
Metabolites were inverse normalized, and missing values were imputed using minimum run-day measures. For each metabolite, we calculated residuals by running linear regressions adjusting for age, body mass index, menopause status, diet, and physical activity. To identify a metabolite panel associated with eASCVD, we built random forest models on the residuals at each time point, splitting the data set into training and test sets (80:20). We tuned hyperparameters using adaptive resampling search and used 5-fold cross-validation and node purity to select the optimal number of predictors. We identified the predictors common to the 2 time points and examined their effect on model prediction using Shapley additive explanations plots. Common metabolites with concordant effects at both time points were included in the eASCVD metabolites panel. Results were replicated in 295 women from PREDICT-1 (Personalised Responses to Dietary Composition Trial).1 We further tested the incremental area under the curve (AUC) value of the eASCVD metabolites panel in predicting incident cardiac disease (including congestive heart disease, angina, atrial fibrillation, and coronary heart disease) and CVD mortality (through record linkage with the Office for National Statistics [ONS]) in independent sets of 50 to 134 individuals (follow-up, 5.6 years [SD, 2.2 years]). Finally, we explored the pathways in which the identified metabolites were involved using Ingenuity Pathway Analysis (QIAGEN; Fisher exact test, false discovery rate [Benjamini-Hochberg] <0.05).
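The preprocessing steps described above (minimum run-day imputation, inverse normalization, and covariate adjustment) might be sketched in Python as follows. This is an illustration only and not the authors' R code: the function names, the choice to impute before transforming, the (rank - 0.5)/n offset, and the toy values are all assumptions made for the example.

```python
from statistics import NormalDist


def impute_min_per_run_day(values, run_days):
    """Replace missing values (None) with the minimum observed on the same run day.

    Assumes every run day has at least one observed value.
    """
    minima = {}
    for v, d in zip(values, run_days):
        if v is not None:
            minima[d] = min(v, minima.get(d, v))
    return [minima[d] if v is None else v for v, d in zip(values, run_days)]


def inverse_normal_transform(values):
    """Rank-based inverse normal transform; tied values share their average rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    nd = NormalDist()
    # Assumed offset: (rank - 0.5) / n keeps every quantile strictly in (0, 1).
    return [nd.inv_cdf((r - 0.5) / n) for r in ranks]


def residualize(y, covariates):
    """Residuals of an ordinary least-squares fit of y on an intercept plus covariates."""
    X = [[1.0] + list(row) for row in covariates]
    p = len(X[0])
    # Augmented normal equations: (X'X | X'y).
    A = [[sum(row[a] * row[b] for row in X) for b in range(p)]
         + [sum(row[a] * yi for row, yi in zip(X, y))] for a in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [x - f * z for x, z in zip(A[r], A[c])]
    beta = [A[a][p] / A[a][a] for a in range(p)]
    return [yi - sum(b * x for b, x in zip(beta, row)) for row, yi in zip(X, y)]


if __name__ == "__main__":
    # Hypothetical toy data: one metabolite, six samples, two run days.
    raw = [2.0, None, 5.0, 3.0, None, 4.0]
    days = ["d1", "d1", "d1", "d2", "d2", "d2"]
    age_bmi = [[50, 24.1], [55, 27.3], [60, 22.8], [52, 30.2], [58, 25.5], [63, 26.0]]
    z = inverse_normal_transform(impute_min_per_run_day(raw, days))
    print(residualize(z, age_bmi))
```

In the study, the residuals of each metabolite (adjusted for age, body mass index, menopause status, diet, and physical activity) were the inputs to the random forest models; the sketch stops at that point and does not attempt to reproduce the model fitting or hyperparameter tuning.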
The random forest models on residuals in 1066 TwinsUK women adjusted for age, body mass index, menopause, physical activity, and diet identified 100 and 67 predictors of eASCVD at time points 1 and 2, respectively, of which 25 overlapped. Of these, 21 had concordant effects at both time points and were included in the eASCVD metabolites panel. After adjusting for family, the panel explained 12.7% of the variance in eASCVD in the test set and 13.6% in PREDICT-1. When further adjusting for TRFs, the panel explained 9.3% in the test set and 8.5% in PREDICT-1. Among the metabolites identified, 9 were positively associated with eASCVD, whereas 12 were negatively associated (Figure [C]). The peptide phenylalanyltryptophan, the lipid choline phosphate, and the amino acid 4-hydroxyphenylpyruvate were the most important contributors (Figure [C]) [...] (Figure [D]). Finally, pathway enrichment analysis highlighted the involvement (false discovery rate range = 0.01-0.02) of the metabolites positively associated with eASCVD in the biosynthesis of 4-hydroxyphenylpyruvate, choline, phosphatidylcholine and glucocorticoids, sphingomyelin metabolism, tyrosine degradation, and phospholipases. Moreover, the panel was enriched (false discovery rate range = 0.001-0.04) in metabolites related to cardiac inflammation, dysfunction damage, and infarction.
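The panel-selection rule (keep predictors found at both time points whose effects point in the same direction, which reduced 25 overlapping predictors to 21) can be illustrated with a short Python sketch. The function name, the use of a signed effect per metabolite, and the placeholder metabolite names and values are hypothetical and do not come from the study data.

```python
def concordant_panel(effects_t1, effects_t2):
    """Return metabolites selected at both time points whose effects share a sign.

    Each argument maps a metabolite name to a signed effect on predicted eASCVD
    (for example, the direction read from a Shapley additive explanations plot).
    """
    shared = effects_t1.keys() & effects_t2.keys()
    return sorted(m for m in shared if effects_t1[m] * effects_t2[m] > 0)


if __name__ == "__main__":
    # Hypothetical signed effects; names are placeholders, not real metabolites.
    time_point_1 = {"metabolite_a": 0.4, "metabolite_b": -0.2,
                    "metabolite_c": 0.1, "metabolite_d": 0.3}
    time_point_2 = {"metabolite_a": 0.1, "metabolite_b": 0.5,
                    "metabolite_c": 0.2, "metabolite_e": -0.3}
    print(concordant_panel(time_point_1, time_point_2))
```

Here metabolite_b is dropped because its direction flips between time points, and metabolite_d and metabolite_e are dropped because each was selected at only one time point.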
Here, we report for the first time a panel of serum metabolites correlated with eASCVD that explains 9.3% of the variance not already explained by environmental factors and TRFs. The panel further improved prediction of incident cardiac disease and CVD mortality over and above conventional risk factors, thereby opening new research avenues. Metabolites positively associated with eASCVD are enriched in pathways previously linked with atherosclerotic CVD.2 The sphingomyelin:phosphatidylcholine ratio, choline and glucocorticoid biosynthesis, tyrosine degradation, and phospholipases have been shown to increase CVD risk and/or mortality risk.2,4,5 Therefore, this study sheds light on the metabolites behind these pathways.
Limitations include the homogeneous ethnicity and women-only composition of the samples, the lack of longitudinal data in PREDICT-1, and the limited number of CVD events. However, we benefit from cross-sectional ASCVD data, independent data sets to test the panel's predictive power, and independent replication. Our results illustrate how metabolic profiling along with machine learning might identify novel biomarkers implicated in CVD, which are crucial for early diagnosis and treatment.