Epigenetic Patterns in Blood Associated With Lipid Traits Predict Incident Coronary Heart Disease Events and Are Enriched for Results From Genome-Wide Association Studies

Supplemental Digital Content is available in the text.

Participants were asked to bring in their current medication containers. Weight was measured to the nearest pound with the participant wearing only a gown without slippers or shoes, standing in the middle of the scale (Detecto Scale, Worchester Scale) with weight equally distributed on both feet. Standing height was measured to the nearest ¼ inch, with the participant barefoot or wearing thin socks, using a vertical mounted stadiometer. BMI was calculated as weight in kg divided by height in meters squared. Peripheral blood samples were collected in the morning from participants after an eight-hour fast. TC, HDL-C and TG were measured via an enzymatic colorimetric assay (Roche Hitachi 911, Roche Diagnostics) and LDL-C was calculated by the Friedewald equation. CHD was defined as a fatal or non-fatal myocardial infarction (MI), coronary death, revascularization procedure (percutaneous transluminal coronary angioplasty or coronary artery bypass graft) or coronary insufficiency (unstable angina). All CHD events were reviewed and adjudicated by a physician endpoint committee.
PIVUS is a prospective community-based cohort of participants from Uppsala, Sweden. All men and women at age 70 living in Uppsala in 2001 were invited to participate. The 1,016 participants (50% women) have been extensively phenotyped, as described previously 8 , and on the Internet (www.medsci.uu.se/pivus/). The eligible sample for investigation was from the 1,016 enrolled patients at 70 years of age and conducted between the years of 2001-2003. Lipid traits were measured at Uppsala University Hospital using routine medical chemistry methods. LDL-C was calculated by the Friedewald equation. The participants have been reexamined at ages 75 and 80, and their morbidity and mortality has been followed via national registers and journal review. Clinical diagnoses by journal review of CVD up to 10 years after baseline were used to define disease events. In the present study, we combined acute fatal or non-fatal MI and revascularization procedure (percutaneous transluminal coronary angioplasty or coronary artery bypass graft) into a composite atherosclerotic CHD endpoint.

Replication cohorts
The LBC 1921 and 1936 are two longitudinal studies of ageing [9][10][11] . They derive from the Scottish Mental Surveys of 1932 and 1947, respectively, when nearly all 11-year old children in Scotland completed a test of general cognitive ability 9 . Survivors living in the Lothian area of Scotland were recruited in late-life at mean age 79 for LBC1921 (n=550) and mean age 70 for LBC1936 (n=1,091). Follow-up has taken place at ages 70, 73, and 76 in LBC1936 and ages 79, 83, 87, and 90 in LBC1921. The eligible sample for investigation was collected between at age 70 for LBC1936 and age 79 for LBC1921. Collected data include genetic information, longitudinal epigenetic information, longitudinal brain imaging (LBC1936), and numerous blood biomarkers, anthropomorphic and lifestyle measures. Serum cholesterol was measured as part of a blood analysis profile. Non-fasting blood was drawn on the day of cognitive assessment and analysed within 24 h in serum stored at 4 °C using an enzymatic Quinoneimine dye method measuring at 500 nm, at the Western General Hospital, Edinburgh. For LBC1921 HDL-C and LDL-C were not available.

GOLDN:
The National Heart, Lung, and Blood Institute GOLDN study was designed to identify genetic determinants of lipid response to two interventions (a high-fat meal challenge and fenofibrate treatment for 3 weeks). The GOLDN study has been previously described in detail in Irvin et al 12 . Briefly, the study ascertained and recruited families from the Family Heart Study at two centres, Minneapolis, MN and Salt Lake City, UT, who self-reported to be white. Only families with at least two siblings were recruited for a total of 1,327 individuals. Volunteers were required to withhold lipid-lowering agents (pharmaceuticals or nutraceuticals) for at least 4 weeks prior to the initial visit to be eligible. A total of 1,053 met all eligibility requirements. The study protocol was approved by Institutional Review Boards at the University of Minnesota, University of Utah, and Tufts University/New England Medical Center. For the current study, we evaluated fasting blood lipids among 991 participants for whom baseline epigenetic data were available. Lipids were measured before the diet and drug intervention. Participants were asked to fast for ≥12 hours and abstain from alcohol intake for ≥24 hours. TG was measured by a glycerol-blanked enzymatic method (Trig/GB, Roche Diagnostics Corporation, Indianapolis, IN). TC was measured using a cholesterol esterase-cholesterol oxidase reaction (Chol R1, Roche Diagnostics Corporation) on the Roche/Hitachi 911 Automatic Analyzer (Roche Diagnostics Corporation). The same reaction was also used to measure HDL-C after precipitation of non-HDL-C with magnesium/dextran. LDL-C was measured by a homogeneous direct method (LDL Direct Liquid Select™ Cholesterol Reagent, Equal Diagnostics, Exton, PA). Data on medical history, physical activity and other lifestyle factors such as alcohol intake, smoking status, and diet were collected using an interviewer-administered questionnaire. Weight was measured by a beam balance and height was ascertained by a stadiometer. BMI was calculated as weight in kilograms divided by height in meters squared. After exclusion of replicates a total of 1,002 study participants had methylation data available for quality control procedures. Three samples were excluded based on poor bisulphite conversion efficiency, twelve samples due to low pass rate of CpG sites (<98.5% with a detection P>0.01) and a further six samples based on low SNP genotype match (>1 SNP mismatches) between genotypes from the methylation array and Omni/Metabochip genotyping chips leaving 981 individuals with adequate methylation data available for analyses. Following removal of participants with abnormal leukocyte cell counts (>10x10 9 cells/L; n=14) methylation data from 967 individuals remained for analyses. The signal intensities for the methylated and unmethylated state were then quantile normalised for each probe type separately, and beta values were calculated. LBC 1921 and 1936: Detailed information about the collection and QC steps undertaken on the LBC methylation data has been reported previously 16 . Briefly, the Infinium HumanMethylation450 BeadChip (Illumina Inc, San Diego, CA) was used to measure DNA methylation in whole blood of consenting participants. Background correction was performed using the R minfi package 17 and QC was used to remove probes with a low detection rate (<95% detection rate at P<0.01), probes with low quality (manual inspection), samples with a low call rate (samples with <450,000 probes detected at P<0.01), and samples with a poor match between genotypes and SNP control probes (cross-checked using wateRmelon package 14 ) or incorrect predicted sex. Post QC, DNA methylation data were available for 446 LBC1921 participants at age 79 and for 920 LBC1936 participants at age. The LBC methylation data are available at European Genome-Phenome Archive under accession number EGAS00001000910. GOLDN: DNA was extracted from CD4+ T-cells harvested from stored buffy coats using antibody-linked Invitrogen Dynabeads 18 . Stored buffy coats were collected at the same time lipid concentrations were measured. We lysed cells captured on the beads and extracted DNA using DNeasy kits (Qiagen, Venlo, Netherlands) and methylation was assayed across ~470,00 autosomal CpG sites using the Illumina Infinium Human Methylation450 Beadchip (Illumina, San Diego, CA). For each assay, 500ng of DNA was treated with sodium bisulfite (EZ DNA, Zymo Research, Irvine, CA) prior to standard Illumina amplification, hybridization, and imaging steps. The resulting intensity files were analyzed with Illumina's GenomeStudio which generated beta scores (i.e. the proportion of total signal from the methylation specific probe or color channel) and "detection P-values" (the probability that the total intensity for a given probe falls within the background signal intensity). Beta scores with an associated detection P-value greater than 0.01 were removed and samples with more than 1.5% missing data points were eliminated from further analysis. Furthermore, any CpG probes where more than 10% of samples failed to yield adequate intensity were removed. A total of 58 samples were removed. The filtered beta scores were then subjected to batch normalization with the ComBat package for R software in non-parametric mode 19 . We performed the normalization in parallel on random subsets of 20,000 CpGs per run where each array of 12 samples was used as a "batch." These methods have been extensively described in Absher et al and the utility of ComBat to correct for batch effects in comparison to other programs is reported 20,21 . To correct for probe chemistry, we separately normalized probes from the Infinium I and II chemistries and subsequently adjusted the β scores for Infinium II probes using the equation derived from fitting a second order polynomial to the observed methylation values across all pairs of probes located <50bp apart (within-chemistry correlations >0.99), where one probe was Infinium I and one was Infinium II. Finally, we eliminated any CpGs where the probe sequence mapped either to a location that did not match the annotation file or to more than one locus. We identified such markers by re-aligning all probes (with unconverted Cs) to the human reference genome. After these quality control procedures, there were methylation data from 461,281 CpGs. Principal components (PCs) based on the beta scores of all autosomal CpGs passing QC were generated using the prcomp function in R (V 2.12.1) and used to adjust for cell purity in association analysis. Deconvolution estimated CD4+ T-cell percentages were calculated adapting the method of Abbas et al 22 . Predicted CD4+ T-cell percent purity was highly correlated with PC1 (r 2 =0.85, P=4E-293) but not other PCs, thus supporting the usefulness of methylation PCs in adjusting for cell purity in our analysis. The GOLDN methylation data are available at dbGaP under the accession number phs000741.v1.p1.

Association of methylation of blood cell-derived DNA with lipid levels FHS:
Linear mixed effects regression models were conducted to test the association between site-specific DNA methylation and each lipid phenotype (TC, HDL-C, LDL-C, TG) individually. Participants currently on lipid lowering medication were excluded. The primary model was adjusted for age, sex and technical covariates. The secondary model was additionally adjusted for BMI. Technical covariates included chip, row, column (specified as random effects) and two methylation principal components (PCs) to adjust for unmeasured batch effects. Cell count heterogeneity was accounted for by adjusting for imputed cell counts obtained by the Houseman method 23 . The linear mixed effect models were run with the pedigreemm package in R (version 3.1), which additionally accounts for the family correlation structure in the FHS. Familial relatedness was obtained by reported relationships and genetic similarity calculated by identity-by-descent (IBD) probabilities. For any inconsistencies between reported and IBD relationships, relationships obtained from IBD probabilities were utilized. After removing individuals on lipid-lowering medication and thus with missing phenotype, a total of 1,494 participants were considered in the association analyses. PIVUS: The associations between normalised DNA methylation beta values and phenotypes were modelled by a linear mixed effect model, using R 24 and the lmer function (lme4 package), fitted by maximum-likelihood assuming a normally distributed error term. Models were adjusted for age, sex and predicted white cell counts (estimated from the DNA methylation data using the Houseman algorithm 23 as implemented in R package minfi for Illumina HumanMethylation450 17 ) as fixed effects and chip, chip row and chip column as random effects. A likelihood ratio test was used to assess the significance of the phenotype effect. The p-value of the phenotype effect in each model was calculated from the Chi-square distribution with 1 degree of freedom using -2log(likelihood ratio) as the test statistic. After removing individuals on lipid-lowering medication (n=155), a total of 812 individuals were considered in the association analyses. LBC 1936 and 1921: Linear regression modelling was used to assess the association between DNA methylation (outcome variable) and the lipid traits (predictor variable). Covariates included age, sex, and measured white blood cell counts (eosinophils, basophils, neutrophils, monocytes, and lymphocytes). Additional adjustments were made for BMI in secondary models. Participants were excluded from the analyses if they were taking lipid-lowering medication. All statistical analyses were performed using R software (http://cran.rproject.org/). After removing individuals on lipid-lowering medication and thus with missing phenotype, a total of 654, 588, 592 and 588 participants were considered in the association analyses of TC, LDL-C, HDL-C and TG, respectively for LBC1936. For LBC1921, a total of 380 and 376 were considered in the association analyses of TC and TG, respectively. GOLDN: Associations between normalised methylation beta values at each CpG site and lipid traits were analysed using mixed linear regression models adjusted for age, gender, study site, and 4 methylation PCs (as a proxy for cell purity) as fixed effects and family structure as a random effect using the R kinship package (lmekin function). A second set of models additionally adjusted for BMI. After removing four observations due to missing phenotype or covariate data, a total of 991 participants were considered in the association analysis.

Gene expression profiling
Gene expression data were available for participants in the FHS. RNA was extracted from whole blood using the PAXgene Blood RNA System Kit (Qiagen, Venlo, Netherlands) with mRNA expression profiling assessed using the Affymetrix Human Exon 1.0 ST GeneChip platform. Gene expression data were normalised using robust multichip average methods 27 with quality control measures as previously described 13 . Cell count proportions were derived from gene expression markers in this sample set as there was overlap between gene expression measures and directly measured cell counts (lymphocytes, monocytes, neutrophils, basophils, and eosinophils) from a sample of 2,280 Third Generation FHS participants obtained during the second examination cycle (2008-2011). Internal validation using training and testing datasets achieved an r 2 > 0.8 in the majority of cell lines (except basophils). The FHS gene expression data are available at dbGaP under the accession number phs000363.v12.p9.

Metabolomic profiling
Metabolomics data were available for participants in the PIVUS. Untargeted metabolomic profiling of serum samples was measured in duplicates as described previously 28 . In brief, 1 μl of sample was analysed on Acquity UPLC coupled to a Xevo G2 Q-TOFMS (Waters Corporation, Milford, Massachusetts, USA) and raw data was processed using XCMS software 29 for detection, alignment, grouping, and imputation of features. For normalisation of data metabolic feature intensities were log-transformed and an ANOVA-type normalisation applied. Fragmentation spectra were reconstructed from metabolic features with strong correlation and similar retention time and metabolites were identified from spectra. Only annotated metabolites (n=229) were used in analysis in relation to DNA methylation. Raw spectra from mass spectrometry analysis and annotated metabolites intensities are available in Metabolights (http://www.ebi.ac.uk/metabolights/) with accession number MTBLS90.