
Association Between Coffee Intake and Incident Heart Failure Risk

A Machine Learning Analysis of the FHS, the ARIC Study, and the CHS
Originally published Heart Failure. 2021;14:e006799



Coronary heart disease, heart failure (HF), and stroke are complex diseases with multiple phenotypes. While many risk factors for these diseases are well known, investigation of as-yet unidentified risk factors may improve risk assessment and patient adherence to prevention guidelines. We investigated the diet domain in FHS (Framingham Heart Study), CHS (Cardiovascular Health Study), and the ARIC study (Atherosclerosis Risk in Communities) to identify potential lifestyle and behavioral factors associated with coronary heart disease, HF, and stroke.


We used machine learning feature selection based on random forest analysis to identify potential risk factors associated with coronary heart disease, stroke, and HF in FHS. We evaluated the significance of selected variables using univariable and multivariable Cox proportional hazards analysis adjusted for known cardiovascular risks. Findings from FHS were then validated using CHS and ARIC.


We identified multiple dietary and behavioral risk factors for cardiovascular disease outcomes including marital status, red meat consumption, whole milk consumption, and coffee consumption. Among these dietary variables, increasing coffee consumption was associated with decreasing long-term risk of HF congruently in FHS, ARIC, and CHS.


Higher coffee intake was found to be associated with reduced risk of HF in all three studies. Further study is warranted to better define the role, possible causality, and potential mechanism of coffee consumption as a potential modifiable risk factor for HF.

What Is New?

  • Little is known about the risk of developing heart failure (HF) associated with dietary components and the potential benefits of changing intake of specific foods.

  • By using machine learning in FHS (Framingham Heart Study), we identified several dietary factors that might be associated with risk of HF.

  • We found in 3 large, well-known studies (FHS, CHS [Cardiovascular Health Study], and the ARIC study [Atherosclerosis Risk in Communities]) that increased coffee consumption appeared to correlate with reduced risk of developing HF later in life.

  • Additional work is needed to determine whether modulating coffee intake could affect future risk of developing HF.

What Are the Clinical Implications?

  • Controlling for known risk factors, increased coffee consumption was found to be associated with reduced risk of HF in 3 large, longitudinal epidemiological studies (FHS, CHS, and ARIC). The mechanism of this association is unclear, but limited analysis in FHS and CHS suggests caffeine may be an important contributor. The high prevalence of coffee consumption in society suggests further study is warranted to better define the role, possible causality, and potential mechanism of coffee and caffeine consumption as potential modifiable risk factors for HF.


See Editorial by Vest

Coronary heart disease (CHD), heart failure (HF), and stroke are among the top causes of death attributable to cardiovascular disease (CVD) in the United States.1 Risk factors for CHD, HF, and stroke have been previously identified and incorporated into predictive models to provide quantitative assessments for individual risk of developing disease and support development of personalized CVD prevention strategies.2–7 Although widely used, these models consider a relatively limited set of patient characteristics, and there may remain as-yet unidentified risk markers, which could improve accuracy of risk prediction and possibly represent opportunities for improved CVD prevention.4,8,9 Additional lifestyle and behavioral risk factors such as diet have been identified since the initial development of CVD risk models such as the Framingham Heart Score, which are used widely in primary care. Understanding and validation of the association of these factors in CVD has the potential to improve understanding of cardiovascular risk and aid in patient adherence to lifestyle and behavioral therapies.2

Epidemiological studies like the FHS (Framingham Heart Study) collected thousands of patient characteristics. Validation and identification of risk factors for complex diseases such as CHD, HF, and stroke are made difficult by the large number and variety of potentially relevant patient characteristics such as comorbid conditions, lifestyle, and patient behavior. Traditionally, epidemiological statistical approaches use a hypothesis-driven framework to reduce the number of factors evaluated in a given model by selecting factors using clinical expertise. Although this approach is effective in focusing the analysis and reducing false discovery, clinical bias can impact the features evaluated, potentially excluding unanticipated predictors of disease.10 Using machine learning–based feature selection to identify factors potentially important to disease risk can be advantageous because it allows for the assessment of large numbers of patient characteristics in a comparatively unbiased manner, reduces false positives, and can potentially pick up patterns that otherwise may be missed when using a hypothesis-based approach.11–14 The ability of machine learning methods to analyze large sets of features in an automated fashion is one of the several advantages of machine learning that has fueled its adoption in data analytics. Longitudinal studies such as FHS, the ARIC study (Atherosclerosis Risk in Communities), and the CHS (Cardiovascular Health Study) enrolled thousands of patients with relatively high event rates over decades of follow-up. These study qualities provide an excellent foundation to explore utilizing machine learning to identify CVD risk factors beyond those used in current predictive models.

The objective of this analysis was to use supervised machine learning to identify potential variables important to assessing risk of incident CHD, stroke, and HF in a hypothesis-free, data-driven manner. We then evaluated the significance of association between these features and CVD adjusting for known risk factors and validated these findings in the ARIC and CHS. We hypothesized that feature selection through supervised machine learning would identify potentially novel risk factors for CHD, HF, and stroke.


Data Sources

This study was completed under an approved expedited Colorado Multiple Institution Review Board protocol (No. 15-1193). FHS, CHS, and ARIC clinical data were obtained from the National Heart, Lung, and Blood Institute Biologic Specimen and Data Repository Information Coordinating Center (BioLINCC, Calverton, MD). Because of the sensitive nature of the data collected for this study, requests to access the dataset from qualified researchers trained in human subject confidentiality protocols may be completed through BioLINCC. FHS, CHS, and ARIC were prospective longitudinal cohort studies designed to investigate the incidence, survival rate, and determinants of CVD. All studies were community based, investigated multiple incident CVD end points, and included at least 10 years of follow-up from the exam used as baseline for this analysis. Study design, response rates, and methodologies of each study are reported in detail elsewhere.15–17 Participants in FHS (n=5209) were between the ages of 30 and 62 years and assessed every 2 years. Participants who attended FHS Exam 14 and who had not yet had a CVD event were used in this analysis (n=2732). Exam 14 was used as the reference date for all time-to-event analyses. CHS participants (n=5888) were above the age of 65 years and assessed annually for ≈10 years. Of the 5888 CHS participants, the 3704 participants without prior CVD who had complete data for the dietary factors identified during feature selection were used in analysis. ARIC (n=15 792) enrolled individuals aged 45 to 64 years without prior CVD and consisted of 4 exams conducted every 3 years and a fifth exam ≈25 years after enrollment. From the ARIC study, 14 925 participants had data for follow-up CVD events and were used for analysis. FHS Exam 14 occurred during 1975 to 1978 with mean follow-up of 16.7±9.8 years (maximum, 36.3 years) thereafter. ARIC began in 1987 and CHS in 1989, indicating a temporal overlap between all three studies. 
Patients in CHS on average were older than patients in FHS and ARIC and had higher rates of comorbid conditions and outcome events. Patients in ARIC, while on average 10 years younger than FHS patients, had similar rates of smoking status, hypertension, and diabetes but trended lower in prevalence of outcome events. Baseline characteristics among participant subgroups were compared using the χ2 and Mann-Whitney U test for categorical and continuous variables, respectively (Table 1). All analyses were performed using the R statistical package (version 3.5; R Foundation for Statistical Computing, Vienna, Austria). Random forest analysis was completed using the caret package, and Cox proportional hazards analysis was completed using the survival package.18,19

Table 1. Overview Statistics of Coffee Consumption and Known Risk Factors of CVD

| Characteristic | Framingham Exam 14 (n=2732) | ARIC baseline (n=14 925) | CHS baseline (n=3704) |
| --- | --- | --- | --- |
| Women | 1602 (59) | 8153 (55) | 1625 (49) |
| Age, y | 66 (61–73) | 54 (49–59) | 71 (67–75) |
| Systolic BP, mm Hg | 137 (125–150) | 119 (108–131) | 134 (121–149) |
| HDL cholesterol, mg/dL | 48 (39–58) | 48 (39–61) | 50 (42–62) |
| Total cholesterol, mg/dL | 227 (202–258) | 212 (186–239) | 210 (185–235) |
| BMI, kg/m2 | 25.9 (23.5–28.7) | 26.9 (24.0–30.4) | 26.1 (23.6–29.0) |
| CVD risk score | 10.2 (5.9–17.5) | 4.2 (2.2–7.8) | 11.0 (6.7–17.6) |
| Hypertension, n (%) | 880 (32) | 3812 (26) | 1772 (47) |
| Current smoking, n (%) | 820 (30) | 3951 (26) | 385 (10) |
| Diabetes mellitus, n (%) | 176 (7) | 1509 (10) | 604 (16) |
| Coffee intake, cups/d; median (Q1–Q3) | 2 (1–3) | 1 (0–2.5) | 0.36 (0–1) |
| Incident outcomes | | | |
| CVD, n (%) | 1172 (43) | 4401 (30) | 2736 (73) |
| HF, n (%) | 625 (23) | 2324 (17) | 1698 (46) |
| Stroke, n (%) | 461 (17) | 1127 (8) | 1147 (31) |
| CHD, n (%) | 706 (31) | 3033 (20) | 2199 (59) |

Values are n (%) and median (quartile 1 to quartile 3). ARIC indicates Atherosclerosis Risk in Communities; BMI, body mass index; BP, blood pressure; CHD, coronary heart disease; CHS, Cardiovascular Health Study; CVD, cardiovascular disease; HDL, high-density lipoprotein; and HF, heart failure.

*P<0.001 for all characteristics.


Outcomes of interest were time to incident CHD, HF, and stroke. All outcomes were adjudicated per individual study protocols, and data transformations for harmonization are given in Tables I through IV in the Data Supplement. Incident HF in FHS was defined using the Framingham HF Diagnostic Criteria.20 In ARIC, incident HF was defined using the International Classification of Diseases, Ninth Revision, Clinical Modification codes upon hospital discharge. In CHS, HF was adjudicated based on ejection fraction (when available), signs, symptoms, clinical tests, and medical therapy. Code for the analyses performed is available from the first author upon reasonable request.

Feature Selection

We used FHS Exam 14 because it was the first clinical exam to include dietary variables, and it had been used in the development of prior FHS risk models.7 In total, 222 variables were recorded at Exam 14, which included conventional variables such as age, sex, blood pressure, and others (Table I in the Data Supplement). Variables with >15% missing values were excluded, and samples with complete cases for the remaining variables were used in feature selection.21 Of the remaining 204 variables, 16 were dietary factors and 13 were nondietary lifestyle behaviors. Participants were included in the analysis only if they had no missing data from any of the 204 variables.
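
The two-stage filter described above (drop sparse variables, then keep complete cases) can be sketched as follows. This is a minimal illustration in Python rather than the R code used for the study; the function name, the toy records, and the use of `None` to mark a missing value are all assumptions made for the example.

```python
def filter_variables(rows, max_missing=0.15):
    """Drop variables with too many missing values, then keep complete cases.

    rows: list of dicts mapping variable name -> value (None means missing).
    """
    n = len(rows)
    variables = list(rows[0].keys())
    # Stage 1: keep a variable only if at most 15% of its values are missing.
    kept = [
        v for v in variables
        if sum(r[v] is None for r in rows) / n <= max_missing
    ]
    # Stage 2: keep participants with no missing value in any kept variable.
    complete = [
        {v: r[v] for v in kept}
        for r in rows
        if all(r[v] is not None for v in kept)
    ]
    return kept, complete


# Toy data (hypothetical): 'rare_lab' is missing in 4 of 10 participants (40%)
# and is dropped; 'coffee' is missing in 1 of 10 (10%) and is kept.
rows = [{"age": 60 + i, "coffee": i % 3, "rare_lab": 1.0} for i in range(10)]
for i in range(4):
    rows[i]["rare_lab"] = None
rows[9]["coffee"] = None

kept, complete = filter_variables(rows)
```

Applying the variable filter first matters: requiring complete cases across all 222 variables before removing sparse ones would discard far more participants.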

Patient characteristics potentially important for predicting incident CHD, HF, and stroke were identified using random forest analysis. For optimal feature selection, we used 10-fold cross validation with 5 repeats.14,18,22 Candidate variables for use in time-to-event analysis were the top 20% predictors based on importance metrics across all outcomes in the random forest model (Table I in the Data Supplement). The majority of nondietary and nonlifestyle variables in the top 20% were collinearly related to known risk factors.
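
The resampling scheme and the top-20% cutoff can be sketched as below. This is an illustrative Python outline, not the caret workflow used in the study: the helper names, the importance scores, and the rounding rule for the cutoff (ceiling) are assumptions, and the random forest fit itself is omitted.

```python
import math
import random


def repeated_kfold(n, k=10, repeats=5, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation, repeated."""
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        for fold in range(k):
            test = idx[fold::k]  # every k-th shuffled index forms one fold
            held_out = set(test)
            train = [i for i in idx if i not in held_out]
            yield train, test


def top_fraction(importance, fraction=0.20):
    """Return the names of the highest-ranked variables by importance score."""
    k = max(1, math.ceil(fraction * len(importance)))
    ranked = sorted(importance, key=importance.get, reverse=True)
    return ranked[:k]


# 10 folds x 5 repeats gives 50 resampled fits per outcome.
splits = list(repeated_kfold(100))

# Hypothetical importance scores for one outcome (not values from the study).
hf_importance = {
    "age": 100.0, "systolic_bp": 81.0, "total_cholesterol": 64.0,
    "coffee": 52.0, "whole_milk": 40.0, "eggs": 33.0, "cheese": 21.0,
    "marital_status": 18.0, "height": 9.0, "exam_weekday": 1.0,
}
candidates = top_fraction(hf_importance)
```

In practice the importance scores would be averaged over the 50 resampled fits before ranking, which makes the ranking less sensitive to any single train/test split.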

Evaluation of Feature Significance

Significance, magnitude, and direction of association between candidate dietary factors and outcomes of interest were assessed using multivariable Cox proportional hazards analysis. The randomized nature of both the random forest methods applied and importance metric calculated does control for potential confounding and collinearity in machine learning experiments. Variables with importance scores in the top 20% were assessed for collinearity and compared with known risk factors when assessing which variables to include as covariates in the Cox proportional hazards analysis. The association between each variable in the top 20% of importance and the outcomes was assessed for collinearity and strength of association with the outcome. Given that the majority of nondietary risk factors with importance scores in the top 20% were collinearly related to known risk factors and some known risk factors were only modestly associated with the outcome, we chose to use the FHS CVD risk score to provide good coverage of the probabilities for known risk factors while also accounting for known risks with weaker association.23 Models for individual CHD, HF, and stroke outcomes were calibrated as presented in the original FHS CVD risk score publication by D’Agostino et al.5 We chose to use a risk score when performing multivariable analysis to account for collinearity and the impact of the combinations of known risk factors over the impact of each individual factor alone.24 P<0.05 was considered significant throughout.
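
To make the Cox step concrete, the sketch below fits a one-covariate proportional hazards model by Newton-Raphson on the partial likelihood. It is a teaching sketch in plain Python, not the R survival package used in the study: the function name and the eight toy participants are hypothetical, and the multivariable models described above would add the FHS CVD risk score as a second covariate.

```python
import math


def cox_univariable(times, events, x, iterations=25):
    """Fit a one-covariate Cox model by Newton-Raphson on the partial likelihood.

    Returns (beta, standard_error); the hazard ratio per unit of x is exp(beta).
    """
    beta, info = 0.0, 0.0
    for _ in range(iterations):
        score, info = 0.0, 0.0
        for i, (t_i, d_i) in enumerate(zip(times, events)):
            if not d_i:
                continue  # censored participants contribute only to risk sets
            # Risk set: everyone still under observation at time t_i.
            risk = [j for j, t_j in enumerate(times) if t_j >= t_i]
            w = [math.exp(beta * x[j]) for j in risk]
            s0 = sum(w)
            s1 = sum(wj * x[j] for wj, j in zip(w, risk))
            s2 = sum(wj * x[j] ** 2 for wj, j in zip(w, risk))
            score += x[i] - s1 / s0                 # first derivative term
            info += s2 / s0 - (s1 / s0) ** 2        # negative second derivative
        step = score / info
        beta += step
        if abs(step) < 1e-10:
            break
    return beta, 1.0 / math.sqrt(info)


# Toy cohort (hypothetical): follow-up in years, event flag (0 = censored),
# and coffee intake in cups/day. Earlier events cluster at lower intake.
times = [1, 2, 3, 4, 5, 6, 7, 8]
events = [1, 1, 1, 1, 1, 1, 1, 0]
cups = [0, 1, 0, 2, 0, 3, 1, 2]

beta, se = cox_univariable(times, events, cups)
hazard_ratio = math.exp(beta)
ci_95 = (math.exp(beta - 1.96 * se), math.exp(beta + 1.96 * se))
```

A hazard ratio below 1 per cup/day means that each additional daily cup is associated with a proportionally lower instantaneous risk of the event, which is how the per-cup estimates in the Results are expressed.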


Baseline exams of CHS and ARIC were used for validation of the findings from FHS. Where possible, dietary factors significantly associated with outcomes of interest in FHS were harmonized with comparable variables in CHS and ARIC (Tables II through IV in the Data Supplement). Associations between dietary factors and clinical outcomes were then validated in CHS and ARIC. All outcomes, traditional risk factors, and dietary factors harmonized between FHS, ARIC, and CHS for validation are given in Tables II through IV in the Data Supplement. The first and senior authors (L.M.S. and D.P.K.) each have full access to all relevant data and take responsibility for the integrity of the analysis.


Feature Selection

Baseline characteristics of analyzed participants for all three studies are summarized in Table 1. The decision trees from the random forest models containing the 204 potential data measurements at Exam 14 with CHD, stroke, and HF outcomes were investigated to assess the importance of potential risk factors of CHD, stroke, and HF. There were 35 common risk factors across all outcomes that were ranked in the top 20% of important features by random forest analysis for either CHD, HF, or stroke (Table 2). Among these features were known risk factors such as blood pressure, age, and cholesterol. Given the potential for behavioral modification with dietary factors over nonmodifiable factors such as the number of dead siblings, we selected dietary factors in the top 20% of variables ranked by importance to be evaluated further. Dietary factors including consumption of whole milk, red meat, eggs, alcohol, cheese, coffee, and decaffeinated coffee were also ranked in the top 20% most important to risk of all CVD outcomes as were other lifestyle factors including marital status (Table 2). To evaluate and validate the association of coffee consumption with HF and stroke, the coffee consumption values from CHS and ARIC were converted to cups per day. For ARIC, coffee consumption was reported as rarely/never, a few cups/month, 1 cup/week, 2 to 4 cups/week, 5 to 6 cups/week, 1 cup/day, 2 to 3 cups/day, 4 to 5 cups/day, and >6 cups/day, which were transformed to 0, 0.07, 0.14, 0.43, 0.79, 1, 2.5, 5, and 6.5 cups/day, respectively. CHS frequencies were never, 5 to 10 cups/year, 1 to 3 cups/month, 1 to 4 cups/week, and nearly every day, which were transformed to 0, 0.021, 0.07, 0.36, and 1 cup/day, respectively. The definition of what constituted red meat and level of consumption in FHS were ambiguous and could not be satisfactorily harmonized with ARIC and CHS, preventing confident validation. Therefore, red meat consumption was not investigated beyond initial importance. 
All other dietary factors were further investigated to assess magnitude and direction of risk using univariable and multivariable analysis.
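
The text does not state how the cups/day values were derived, but the CHS values are consistent with taking the midpoint of each reported range and dividing by the length of the period. The sketch below checks that reading; the 7-, 30-, and 365-day period lengths and the helper name are assumptions, not details from the study.

```python
# Assumed period lengths in days (not stated in the source).
DAYS_PER = {"day": 1.0, "week": 7.0, "month": 30.0, "year": 365.0}


def cups_per_day(low, high, period):
    """Midpoint of a reported frequency range, expressed in cups per day."""
    return (low + high) / 2.0 / DAYS_PER[period]


# CHS response categories paired with the cups/day values reported in the text.
chs = [
    ("5 to 10 cups/year", cups_per_day(5, 10, "year"), 0.021),
    ("1 to 3 cups/month", cups_per_day(1, 3, "month"), 0.07),
    ("1 to 4 cups/week", cups_per_day(1, 4, "week"), 0.36),
]
for label, derived, reported in chs:
    print(f"{label}: derived {derived:.3f}, reported {reported}")
```

Because the highest CHS category ("nearly every day") is mapped to 1 cup/day, CHS intake is effectively capped at 1 cup/day, which limits dose-response comparisons with FHS and ARIC.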

Table 2. Variables Ranked in Top 20% From Random Forest Decision Trees and Importance Score for All Outcomes: Cardiovascular Disease, Coronary Heart Disease, Stroke, and Heart Failure (Displayed in Ranked Order)


* Dietary factors.

Evaluation of Feature Significance

Of the dietary factors identified by random forest that were evaluated using Cox proportional hazards analysis, coffee consumption was the only factor that remained significantly associated with any of the outcomes. Increasing caffeinated coffee consumption was found to be significantly associated with reduced risk of HF (hazard ratio [HR], 0.95 per cup/day [95% CI, 0.91–0.99]; P=0.02) and stroke (HR, 0.94 per cup/day [95% CI, 0.89–0.99]; P=0.02) but not CHD (P=0.21) or CVD (P=0.59). Adjusted for the FHS CVD risk score, increasing caffeinated coffee consumption remained significantly associated with decreased risk of HF (HR, 0.95 per cup/day [95% CI, 0.90–1.00]; P=0.03) but not stroke (P=0.33).
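
As a consistency check on how these estimates relate to one another, the standard error of the log hazard ratio can be recovered from a reported 95% CI and used to reproduce the P value. This is a back-of-the-envelope sketch assuming a Wald interval on the log scale; because the published HR and CI are rounded, the result is approximate. The function name is hypothetical.

```python
import math


def wald_from_ci(hr, lower, upper, z_crit=1.959964):
    """Recover the log-HR standard error, z statistic, and two-sided P value."""
    se = (math.log(upper) - math.log(lower)) / (2.0 * z_crit)
    z = math.log(hr) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail probability
    return se, z, p


# Univariable FHS estimate for HF reported above: HR 0.95 (95% CI, 0.91-0.99).
se, z, p = wald_from_ci(0.95, 0.91, 0.99)

# Under the log-linear model, the HR for k cups/day is the per-cup HR raised
# to the power k (here k = 3).
hr_three_cups = 0.95 ** 3
```

The recovered P value is close to the reported P=0.02. The same model implies that a per-cup HR of 0.95 corresponds to an HR of about 0.86 for 3 cups/day compared with none.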


Results of univariable and multivariable survival analysis for HF in all three studies are shown in the Figure. In univariable analysis, increasing coffee consumption was significantly associated with decreased risk of HF in both CHS (HR, 0.86 per cup/day [95% CI, 0.78–0.96]; P=0.005) and ARIC (HR, 0.98 per cup/day [95% CI, 0.96–0.99]; P=0.048). When adjusted for the FHS risk scores, coffee consumption remained significantly associated with HF in CHS (HR, 0.88 per cup/day [95% CI, 0.79–0.97]; P=0.01). In ARIC, coffee consumption showed a trend toward association with HF in multivariable analysis (HR, 0.98 per cup/day [95% CI, 0.96–1.00]; P=0.06). To investigate dose response, participant characteristics according to quartiles of caffeinated coffee consumption (0, 1, 2, and ≥3 cups per day) are shown in Table 3. Compared with no coffee consumption, risk of HF was similar in participants drinking 1 cup/day (P=0.19) but reduced in participants drinking 2 cups/day (HR, 0.69 [95% CI, 0.55–0.87]; P<0.001) and ≥3 cups/day (HR, 0.71 [95% CI, 0.58–0.89]; P<0.001). A dose-response threshold for reduction in risk was not found; however, sample sizes at higher consumption rates were too small to evaluate higher coffee consumption levels individually, and CHS did not report coffee consumption with enough granularity to categorize participants into consumption beyond 1 cup/day.

Table 3. Clinical Characteristics and Outcomes of Coffee Consumption and Known Risk Factors of CVD by Quartile of Coffee Consumption

| Characteristic | 0 cups/d | 1 cup/d | 2 cups/d | ≥3 cups/d |
| --- | --- | --- | --- | --- |
| Participants, n (%) | 8809 (41) | 5130 (24) | 4056 (19) | 3347 (16) |
| Women, n (%) | 4874 (55) | 2719 (53) | 2106 (52) | 1648 (49) |
| Age, y | 59 (51–67) | 62 (54–69) | 56 (50–61) | 56 (50–61) |
| Systolic BP, mm Hg | 125 (112–139) | 127 (115–143) | 121 (109–134) | 119 (108–133) |
| HDL cholesterol, mg/dL | 49 (39–61) | 50 (40–62) | 49 (39–61) | 48 (38–59) |
| Total cholesterol, mg/dL | 212 (187–239) | 214 (188–242) | 214 (188–241) | 217 (190–244) |
| BMI, kg/m2 | 26.8 (24.0–30.4) | 26.7 (23.9–30.1) | 26.4 (23.8–29.5) | 26.1 (23.5–29.2) |
| CVD risk score | 5.6 (2.8–11.0) | 6.9 (3.5–12.8) | 4.5 (2.4–8.9) | 5.1 (2.6–9.3) |
| Hypertension, n (%) | 3040 (35) | 1847 (36) | 959 (24) | 592 (18) |
| Current smoking, n (%) | 1529 (17) | 1139 (22) | 1134 (28) | 1339 (40) |
| Diabetes mellitus, n (%) | 1112 (13) | 644 (13) | 329 (8) | 193 (6) |
| Incident outcomes | | | | |
| CVD, n (%) | 3546 (40) | 2305 (45) | 1262 (31) | 1145 (34) |
| HF, n (%) | 2037 (23) | 1387 (27) | 627 (15) | 566 (17) |
| Stroke, n (%) | 1242 (14) | 824 (16) | 355 (9) | 296 (9) |
| CHD, n (%) | 2636 (30) | 1643 (32) | 865 (21) | 757 (23) |

N (%) and median (quartile 1 to quartile 3). BMI indicates body mass index; BP, blood pressure; CHD, coronary heart disease; CVD, cardiovascular disease; HDL, high-density lipoprotein; and HF, heart failure.

*P<0.001 across all quartiles.


Figure. Association between coffee and incident heart failure in FHS (Framingham Heart Study), CHS (Cardiovascular Health Study), and the ARIC study (Atherosclerosis Risk in Communities).

To investigate the possible role of caffeine in the association between coffee consumption and HF risk, we performed additional analyses with respect to decaffeinated coffee consumption (FHS and CHS) and caffeine intake (FHS, CHS, and ARIC). Caffeinated versus decaffeinated tea consumption was not separated in FHS Exam 14 or baseline CHS and, therefore, was not considered. Data from FHS Exam 20 (n=867) and CHS Exam 8 (n=1903) were used because these were the first to report estimated caffeine consumption.

Decaffeinated coffee consumption was significantly associated with increased risk of HF in multivariable analysis in FHS (HR, 1.10 per cup/day [95% CI, 1.03–1.17]; P=0.004) but not in CHS (P=0.63). All three studies showed a concordant inverse relationship between caffeine intake in 100-mg doses (1 cup coffee or 2 cups black tea) and risk of HF. In FHS, increased caffeine consumption was found to be significantly associated with reduced risk of HF in both univariable (HR, 0.93/100 mg caffeine [95% CI, 0.86–0.98]; P=0.02) and multivariable analyses (HR, 0.92/100 mg caffeine [95% CI, 0.86–0.98]; P=0.01). In CHS, caffeine consumption was significantly associated with reduced risk of HF in univariable analyses (HR, 0.96/100 mg caffeine [95% CI, 0.92–0.99]; P=0.02) and showed a trend toward reduced HF risk in multivariable analysis (HR, 0.97/100 mg caffeine [95% CI, 0.93–1.00]; P=0.07). In ARIC, increased caffeine consumption was also associated with significantly reduced risk of HF in univariable (HR, 0.98/100 mg caffeine [95% CI, 0.97–0.99]; P=0.01) and multivariable analyses (HR, 0.99/100 mg caffeine [95% CI, 0.97–0.99]; P=0.049).


HF incidence, HF hospitalizations, and societal costs continue to increase despite decreasing CHD and stroke mortality rates.1,25 Although much is known about modifiable risk factors for ischemic CVD, opportunities for reducing HF incidence are less clear, likely, in part, because a substantial fraction of HF is nonischemic in etiology. Using random forest feature selection applied to data from FHS, we found multiple dietary and behavioral risk factors including marital status, red meat consumption, whole milk consumption, and coffee consumption that may be associated with CHD, HF, or stroke. Evaluation of these features showed that higher reported coffee consumption was associated with decreased long-term risk of HF concordantly in FHS, ARIC, and CHS. Previous studies primarily focused on composite CVD outcomes or CHD and CVD mortality, whereas relatively few studies have reported an association between coffee consumption and HF risk. This analysis expands those findings to include the relationship between decreased risk of HF and higher coffee consumption. The mechanism of this association is unclear, but limited analysis in FHS and CHS suggested that caffeine may be an important contributor. The pervasive consumption of coffee in modern society and the high potential for dietary modification that could reduce HF risk suggest further exploration of the role of caffeine and coffee in development of HF is warranted.

An association between caffeinated coffee consumption and reduced risk of CHD mortality has been previously reported in elderly participants without hypertension.26,27 In FHS, elderly individuals who consumed any caffeinated coffee had a 43% reduction in CHD deaths compared with those who never consumed coffee. Similarly, an analysis of the National Health and Nutrition Examination Survey revealed that individuals ≥65 years of age who did not have severe hypertension also had a dose-dependent decrease in CVD and cardiovascular mortality associated with higher coffee consumption.26,27 Another prospective epidemiological study found that consumption of coffee, green tea, and oolong tea and total caffeine intake was associated with reduced stroke and mortality from CVD in Japanese men and women.28 More recently, a review of 201 meta-analyses found that increasing daily coffee consumption was associated with decreased CVD mortality and all-cause mortality.29 A systematic review of 351 observational studies of healthy adults, adolescents, and pregnant women found that consumption of ≤400 mg (4 cups of coffee or 8 cups of black tea) of caffeine/day was not associated with cardiovascular toxicity in adults.30 A meta-analysis of 53 studies found a nonlinear association between long-term coffee consumption and CVD risk, where 3 to 5 cups/day of coffee was significantly associated with decreased CVD risk, compared with none, light (1–2 cups/day), or heavy (≥6 cups/day) caffeinated coffee consumption. However, the authors speculated that heavy coffee consumption analysis may have been confounded by increased smoking rates, and decaffeinated coffee consumption was not associated with elevated CVD risk.31 Finally, a meta-analysis incorporating 5 prospective studies comprising 6522 HF events and 140 220 participants and investigating HF risk and coffee (caffeinated and decaffeinated) consumption observed a statistically significant J-shaped relationship between coffee and HF. 
Compared with no consumption, the strongest inverse association was seen for 4 servings/day with a potentially higher risk above this level of consumption.32 When considering doses of 1, 2, and ≥3 cups per day in this analysis, we did not observe a similar J-shaped curve. Coffee contains higher amounts of caffeine than any other dietary product in addition to containing many other constituents such as potassium, niacin, magnesium, or tocopherols that could contribute to this association.33 Our results suggest that caffeine is an important contributor, given that increased estimated caffeine consumption irrespective of source was associated with decreased HF risk in all three studies.

Increasing decaffeinated coffee consumption was associated with increased risk of HF, although this was only shown in FHS. As with prior studies, interpretation of the association between decaffeinated coffee and incident CVD was limited by much lower reported coffee consumption.31 Association between increasing decaffeinated coffee consumption and increased CVD risk could also be due to unobserved latent or confounding factors, such as individuals with other CVD risk factors switching from regular to decaffeinated coffee or concomitant high-risk behaviors such as smoking.31,34 Additionally, methods for decaffeinating coffee can involve the addition of harmful chemicals, which could be affecting the association between increased decaffeinated coffee consumption and increased risk of HF.35

The potential of intentional higher coffee consumption as a means of reducing HF risk cannot be determined from this analysis. It remains possible that coffee consumption is a marker or proxy for another behavior or dietary factor that reduces HF risk. Consequently, intentional or prescribed increase in coffee intake for the purposes of reducing HF risk cannot be recommended based on our results. However, the pervasive popularity of coffee worldwide suggests great potential for reducing CVD risk through dietary modification if the association is true and highlights the importance of future clinical studies to validate these observations.


The observational and retrospective nature of the data, much of which relies on patient recall, introduces significant uncertainty regarding data quality and the strong possibility of unmeasured confounders. For example, the data specifically do not distinguish between the type of coffee consumed (eg, type of bean/organic/mold content), use of additives (eg, sugar or creamer), brewing method (eg, drip versus espresso), or timing of consumption (eg, with breakfast versus after dinner/before bedtime), which could further impact the associations between coffee and clinical outcomes. Associations between coffee consumption and key CVD risk factors not present in the Framingham risk models could also impact results. Data regarding caffeine intake were estimated based on patient-reported dietary intake and not collected uniformly. Because correlation does not imply causality, a prospective randomized or cohort control trial would ideally be used to validate these findings.


Machine learning feature selection identified coffee consumption as an important risk factor for subsequent development of HF. Higher coffee consumption and caffeine intake were associated with reduced risk of HF in 3 large, well-known epidemiological studies, although decaffeinated coffee was not. Further study is warranted to better define the mechanism and role of coffee consumption as a potential modifiable risk factor for HF.

Nonstandard Abbreviations and Acronyms


ARIC: Atherosclerosis Risk in Communities
CHD: coronary heart disease
CHS: Cardiovascular Health Study
CVD: cardiovascular disease
FHS: Framingham Heart Study
HF: heart failure
HR: hazard ratio


We would like to thank Carsten Görg and Chris Roeder for their assistance in preparing this article.

Supplemental Materials

Tables I–IV

Disclosures None.


The Data Supplement is available at


Correspondence to: David P. Kao, MD, University of Colorado Anschutz Medical School, 12700 E 19th Ave, Campus Box B-139, Aurora, CO 80045.

The American Heart Association Precision Medicine Platform was used for data analysis.


  • 1. Benjamin EJ, Muntner P, Alonso A, Bittencourt MS, Callaway CW, Carson AP, Chamberlain AM, Chang AR, Cheng S, Das SR, et al; American Heart Association Council on Epidemiology and Prevention Statistics Committee and Stroke Statistics Subcommittee. Heart disease and stroke statistics-2019 update: a report from the American Heart Association. Circulation. 2019; 139:e56–e528. doi: 10.1161/CIR.0000000000000659
  • 2. Goff DC, Lloyd-Jones DM, Bennett G, Coady S, D’Agostino RB, Gibbons R, Greenland P, Lackland DT, Levy D, O’Donnell CJ, et al; American College of Cardiology/American Heart Association Task Force on Practice Guidelines. 2013 ACC/AHA guideline on the assessment of cardiovascular risk: a report of the American College of Cardiology/American Heart Association Task Force on Practice Guidelines. Circulation. 2014; 129(25 suppl 2):S49–S73. doi: 10.1161/01.cir.0000437741.48606.98
  • 3. Brindle PM, McConnachie A, Upton MN, Hart CL, Davey Smith G, Watt GC. The accuracy of the Framingham risk-score in different socioeconomic groups: a prospective study. Br J Gen Pract. 2005; 55:838–845.
  • 4. Lloyd-Jones DM. Strengths and limitations of the ASCVD risk score and what should go in the risk discussion. Am Coll Cardiol. 2014. Accessed January 22, 2021.
  • 5. D’Agostino RB, Vasan RS, Pencina MJ, Wolf PA, Cobain M, Massaro JM, Kannel WB. General cardiovascular risk profile for use in primary care: the Framingham Heart Study. Circulation. 2008; 117:743–753. doi: 10.1161/CIRCULATIONAHA.107.699579
  • 6. Kannel WB, D’Agostino RB, Silbershatz H, Belanger AJ, Wilson PW, Levy D. Profile for estimating risk of heart failure. Arch Intern Med. 1999; 159:1197–1204. doi: 10.1001/archinte.159.11.1197
  • 7. D’Agostino RB, Wolf PA, Belanger AJ, Kannel WB. Stroke risk profile: adjustment for antihypertensive medication. The Framingham Study. Stroke. 1994; 25:40–43. doi: 10.1161/01.str.25.1.40
  • 8. Damen JA, Hooft L, Schuit E, Debray TP, Collins GS, Tzoulaki I, Lassale CM, Siontis GC, Chiocchia V, Roberts C, et al. Prediction models for cardiovascular disease risk in the general population: systematic review. BMJ. 2016; 353:i2416. doi: 10.1136/bmj.i2416
  • 9. Goldstein BA, Navar AM, Pencina MJ, Ioannidis JP. Opportunities and challenges in developing risk prediction models with electronic health records data: a systematic review. J Am Med Inform Assoc. 2017; 24:198–208. doi: 10.1093/jamia/ocw042
  • 10. Oakden-Rayner L, Carneiro G, Bessen T, Nascimento JC, Bradley AP, Palmer LJ. Precision radiology: predicting longevity using feature engineering and deep learning methods in a radiomics framework. Sci Rep. 2017; 7:1648. doi: 10.1038/s41598-017-01931-w
  • 11. Awada W, Khoshgoftaar TM, Dittman D, Wald R, Napolitano A. A review of the stability of feature selection techniques for bioinformatics data. In: 2012 IEEE 13th International Conference on Information Reuse Integration (IRI). 2012. p. 356–363.
  • 12. Breiman L. Random forests. Mach Learn. 2001; 45:5–32.
  • 13. Syarif I, Zaluska E, Prugel-Bennett A, Wills G. Application of bagging, boosting and stacking to intrusion detection. In: Perner P, ed. Machine Learning and Data Mining in Pattern Recognition. Springer; 2012:593–602.
  • 14. Saeys Y, Inza I, Larrañaga P. A review of feature selection techniques in bioinformatics. Bioinformatics. 2007; 23:2507–2517. doi: 10.1093/bioinformatics/btm344
  • 15. Framingham Heart Study.
  • 16. The Atherosclerosis Risk in Communities Study (ARIC). NHLBI Obesity Research - NHLBI, NIH. Accessed January 22, 2021.
  • 17. Fried LP, Borhani NO, Enright P, Furberg CD, Gardin JM, Kronmal RA, Kuller LH, Manolio TA, Mittelmark MB, Newman A. The Cardiovascular Health Study: design and rationale. Ann Epidemiol. 1991; 1:263–276. doi: 10.1016/1047-2797(91)90005-w
  • 18. Kuhn M, Wing J, Weston S, Williams A, Keefer C, Engelhardt A, Cooper T, Mayer Z, Kenkel B, et al; R Core Team. caret: Classification and Regression Training [Internet]. 2017.
  • 19. Therneau TM, Lumley T (original S->R port and maintainer until 2009). survival: Survival Analysis [Internet]. 2017. Accessed January 22, 2021.
  • 20. McKee PA, Castelli WP, McNamara PM, Kannel WB. The natural history of congestive heart failure: the Framingham Study.N Engl J Med. 1971; 285:1441–1446. doi: 10.1056/NEJM197112232852601CrossrefMedlineGoogle Scholar
  • 21. Pedersen AB, Mikkelsen EM, Cronin-Fenton D, Kristensen NR, Pham TM, Pedersen L, Petersen I. Missing data and multiple imputation in clinical epidemiological research.Clin Epidemiol. 2017; 9:157–166. doi: 10.2147/CLEP.S129785CrossrefMedlineGoogle Scholar
  • 22. Luo W, Phung D, Tran T, Gupta S, Rana S, Karmakar C, Shilton A, Yearwood J, Dimitrova N, Ho TB, et al. Guidelines for developing and reporting machine learning predictive models in biomedical research: a multidisciplinary view.J Med Internet Res. 2016; 18:e323. doi: 10.2196/jmir.5870CrossrefMedlineGoogle Scholar
  • 23. Arbogast PG, Ray WA. Performance of disease risk scores, propensity scores, and traditional multivariable outcome regression in the presence of multiple confounders.Am J Epidemiol. 2011; 174:613–620. doi: 10.1093/aje/kwr143CrossrefMedlineGoogle Scholar
  • 24. Concato J, Feinstein AR, Holford TR. The risk of determining risk with multivariable models.Ann Intern Med. 1993; 118:201–210. doi: 10.7326/0003-4819-118-3-199302010-00009CrossrefMedlineGoogle Scholar
  • 25. Folsom AR, Yamagishi K, Hozawa A, Chambless LE; Atherosclerosis Risk in Communities Study Investigators. Absolute and attributable risks of heart failure incidence in relation to optimal risk factors.Circ Heart Fail. 2009; 2:11–17. doi: 10.1161/CIRCHEARTFAILURE.108.794933LinkGoogle Scholar
  • 26. Greenberg JA, Chow G, Ziegelstein RC. Caffeinated coffee consumption, cardiovascular disease, and heart valve disease in the elderly (from the Framingham Study).Am J Cardiol. 2009; 102:1502–8.CrossrefGoogle Scholar
  • 27. Greenberg JA, Dunbar CC, Schnoll R, Kokolis R, Kokolis S, Kassotis J. Caffeinated beverage intake and the risk of heart disease mortality in the elderly: a prospective analysis.Am J Clin Nutr. 2007; 85:392–398. doi: 10.1093/ajcn/85.2.392CrossrefMedlineGoogle Scholar
  • 28. Mineharu Y, Koizumi A, Wada Y, Iso H, Watanabe Y, Date C, Yamamoto A, Kikuchi S, Inaba Y, Toyoshima H, et al; JACC Study Group. Coffee, green tea, black tea and oolong tea consumption and risk of mortality from cardiovascular disease in Japanese men and women.J Epidemiol Community Health. 2011; 65:230–240. doi: 10.1136/jech.2009.097311CrossrefMedlineGoogle Scholar
  • 29. Poole R, Kennedy OJ, Roderick P, Fallowfield JA, Hayes PC, Parkes J. Coffee consumption and health: umbrella review of meta-analyses of multiple health outcomes.BMJ. 2017; 359:j5024. doi: 10.1136/bmj.j5024CrossrefMedlineGoogle Scholar
  • 30. Wikoff D, Welsh BT, Henderson R, Brorby GP, Britt J, Myers E, Goldberger J, Lieberman HR, O’Brien C, Peck J, et al. Systematic review of the potential adverse effects of caffeine consumption in healthy adults, pregnant women, adolescents, and children.Food Chem Toxicol. 2017; 109(Pt 1):585–648. doi: 10.1016/j.fct.2017.04.002CrossrefMedlineGoogle Scholar
  • 31. Ding M, Bhupathiraju SN, Satija A, van Dam RM, Hu FB. Long-term coffee consumption and risk of cardiovascular disease: a systematic review and a dose-response meta-analysis of prospective cohort studies.Circulation. 2014; 129:643–659. doi: 10.1161/CIRCULATIONAHA.113.005925LinkGoogle Scholar
  • 32. Mostofsky E, Rice MS, Levitan EB, Mittleman MA. Habitual coffee consumption and risk of heart failure: a dose-response meta-analysis.Circ Heart Fail. 2012; 5:401–405. doi: 10.1161/CIRCHEARTFAILURE.112.967299LinkGoogle Scholar
  • 33. You DC, Kim YS, Ha AW, Lee YN, Kim SM, Kim CH, Lee SH, Choi D, Lee JM. Possible health effects of caffeinated coffee consumption on Alzheimer’s disease and cardiovascular disease.Toxicol Res. 2011; 27:7–10. doi: 10.5487/TR.2011.27.1.007CrossrefMedlineGoogle Scholar
  • 34. van Dusseldorp M, Smits P, Thien T, Katan MB. Effect of decaffeinated versus regular coffee on blood pressure. A 12-week, double-blind trial.Hypertension. 1989; 14:563–569.LinkGoogle Scholar
  • 35. Ramalakshmi K, Raghavan B. Caffeine in coffee: its removal. Why and how?Crit Rev Food Sci Nutr. 1999; 39:441–456. doi: 10.1080/10408699991279231CrossrefMedlineGoogle Scholar