
Epigenetic Signatures of Cigarette Smoking

Originally published in Circulation: Cardiovascular Genetics. 2016;9:436–447



DNA methylation leaves a long-term signature of smoking exposure and is one potential mechanism by which tobacco exposure predisposes to adverse health outcomes, such as cancers, osteoporosis, lung, and cardiovascular disorders.

Methods and Results—

To comprehensively determine the association between cigarette smoking and DNA methylation, we conducted a meta-analysis of genome-wide DNA methylation assessed using the Illumina BeadChip 450K array on 15 907 blood-derived DNA samples from participants in 16 cohorts (including 2433 current, 6518 former, and 6956 never smokers). Comparing current versus never smokers, 2623 cytosine–phosphate–guanine sites (CpGs), annotated to 1405 genes, were statistically significantly differentially methylated at a Bonferroni threshold of P<1×10−7 (18 760 CpGs at false discovery rate <0.05). Genes annotated to these CpGs were enriched for associations with several smoking-related traits in genome-wide association studies, including pulmonary function, cancers, inflammatory diseases, and heart disease. Comparing former versus never smokers, 185 of the CpGs that differed between current and never smokers were significant at P<1×10−7 (2623 CpGs at false discovery rate <0.05), indicating a pattern of persistent altered methylation, with attenuation, after smoking cessation. Transcriptomic integration identified effects on gene expression at many differentially methylated CpGs.


Cigarette smoking has a broad impact on genome-wide methylation that, at many loci, persists many years after smoking cessation. Many of the differentially methylated genes were novel genes with respect to biological effects of smoking and might represent therapeutic targets for prevention or treatment of tobacco-related diseases. Methylation at these sites could also serve as sensitive and stable biomarkers of lifetime exposure to tobacco smoke.


Cigarette smoking is a major causal risk factor for various diseases, including cancers, cardiovascular disease, chronic obstructive pulmonary disease,1 and osteoporosis.1 Worldwide cessation campaigns and legislative actions have been accompanied by a reduction in the number of cigarette smokers and corresponding increases in the number of former smokers. In the United States, there are more former smokers than current smokers.1 Despite the decline in the prevalence of smoking in many countries, it remains the leading preventable cause of death in the world, accounting for ≈6 million deaths each year.2

Clinical Perspective on p 447

Even decades after cessation, cigarette smoking confers long-term risk of diseases including some cancers, chronic obstructive pulmonary disease, and stroke.1 The mechanisms for these long-term effects are not well understood. DNA methylation changes have been proposed as one possible explanation.

DNA methylation seems to reflect exposure to a variety of lifestyle factors,3 including cigarette smoking. Several studies have shown reproducible associations between tobacco smoking and altered DNA methylation at multiple cytosine–phosphate–guanine (CpG) sites (CpGs).4–15 Some DNA methylation sites associated with tobacco smoking have also been localized to genes related to coronary heart disease5 and pulmonary disease.16 Some studies have found differently associated CpGs in smokers versus nonsmokers.8,11 Consortium-based meta-analyses have been extremely successful in identifying genetic variants associated with numerous phenotypes, but large-scale meta-analyses of genome-wide DNA methylation data have not yet been widely used. It is likely that additional loci differentially methylated in response to cigarette smoking remain to be discovered by meta-analyzing data across larger sample sizes comprising multiple cohorts. Loci differentially methylated with respect to smoking may serve as biomarkers of lifetime smoking exposure. They may also shed light on the molecular mechanisms by which tobacco exposure predisposes to multiple diseases.

A recent systematic review13 analyzed published findings from 14 epigenome-wide association studies of smoking exposure that used DNA methylation platforms with varying degrees of coverage and varying phenotypic definitions. Among these were 12 studies (comprising 4750 subjects) that used the more comprehensive Illumina Human Methylation BeadChip 450K array (Illumina 450K), which includes and greatly expands on the coverage of the earlier 27K platform. The review compared only statistically significant published results; it was not a meta-analysis and therefore could not identify signals that did not reach statistical significance in individual studies.17

In the current study, we meta-analyzed association results between DNA methylation and cigarette smoking in 15 907 individuals from 16 cohorts in the CHARGE consortium (Cohorts for Heart and Aging Research in Genomic Epidemiology) using a harmonized analysis. Methylation was measured on DNA extracted from blood samples using the Illumina Human Methylation BeadChip 450K array. In separate analyses, we compared current smokers and former smokers with never smokers and characterized how the smoking-related CpG methylation associations persisted with increasing duration of smoking cessation among former smokers. We integrated information from genome-wide association studies (GWAS) and gene expression data to gain insight into the potential functional relevance of our findings for human diseases. Finally, we conducted analyses to identify pathways that may explain the molecular effects of cigarette exposure on tobacco-related diseases.

Materials and Methods

Study Participants

This study comprised a total of 15 907 participants from 16 cohorts of the Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) Consortium (Table I in the Data Supplement). The 16 participating cohorts are ARIC, FHS Offspring, KORA F4, GOLDN, LBC 1921, LBC 1936, NAS, Rotterdam, Inchianti, GTP, CHS European Ancestry (EA), CHS African Ancestry (AA), GENOA, EPIC Norfolk, EPIC, and MESA (Multi-Ethnic Study of Atherosclerosis). Of these participants, 12 161 are of EA and 3746 are of AA. The study was approved by the institutional review committees of each cohort, and all participants provided written informed consent for genetic research.

DNA Methylation Sample and Measurement

For most studies, methylation was measured on DNA extracted from whole blood, but some studies used CD4+ T cells or monocytes (Table I in the Data Supplement). In all studies, DNA was bisulfite converted using the Zymo EZ DNA methylation kit and assayed for methylation using the Infinium HumanMethylation 450 BeadChip, which contains 485 512 CpG sites. Details of genomic DNA preparation, bisulfite conversion, and methylation assay for each cohort can be found in the Data Supplement.

Raw methylated and total probe intensities were extracted using the Illumina GenomeStudio methylation module. The methylated signal (M) and unmethylated signal (U) were preprocessed with various software tools, primarily DASEN from wateRmelon18 and BMIQ,19 both implemented in R. Methylation beta (β) values were defined as β=M/(M+U). Each cohort followed its own quality-control protocols, removing poor-quality or outlier samples and excluding low-quality CpG sites (detection P value >0.01). Each cohort evaluated batch effects and controlled for them in the analysis. Details of these processes can be found in the Data Supplement.
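The β-value definition and the detection-P filter above can be illustrated with a short sketch. This is not any cohort's actual pipeline: the function name and list-based inputs are hypothetical, and no DASEN or BMIQ normalization is applied. Some preprocessing tools also add a small constant offset to the denominator; this sketch follows the definition β=M/(M+U) exactly.

```python
from typing import List, Optional, Sequence


def beta_values(
    methylated: Sequence[float],
    unmethylated: Sequence[float],
    detection_p: Sequence[float],
    p_cutoff: float = 0.01,
) -> List[Optional[float]]:
    """Return beta = M / (M + U) for each probe, or None when the probe fails QC.

    Hypothetical helper: inputs are raw per-probe signals for a single sample.
    """
    betas: List[Optional[float]] = []
    for m, u, p in zip(methylated, unmethylated, detection_p):
        total = m + u
        # Exclude low-quality measurements (detection P > cutoff) and empty signals.
        if p > p_cutoff or total <= 0:
            betas.append(None)
        else:
            betas.append(m / total)
    return betas


# The third probe fails the detection-P filter (0.05 > 0.01).
print(beta_values([3000, 500, 800], [1000, 4500, 200], [0.001, 0.0001, 0.05]))
```

Masked probes are returned as None rather than dropped, so the output stays aligned with the input probe order.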

Smoking Phenotype Definition

Self-reported cigarette-smoking status was divided into 3 categories. Current smokers were defined as those who had smoked at least 1 cigarette a day within the 12 months before the blood draw; former smokers were those who had ever smoked at least 1 cigarette a day but had stopped at least 12 months before the blood draw; and never smokers were those who reported never having smoked. Pack years were calculated from self-report as the average number of cigarettes smoked per day divided by 20, multiplied by the number of years of smoking; never smokers were assigned zero. A few cohorts recorded the number of years since each former smoker had stopped smoking.
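The phenotype definitions above translate into two small functions. This is a minimal sketch: the input fields (an ever-daily-smoker flag and months since quitting) are hypothetical stand-ins for whatever self-report items each cohort actually collected.

```python
from typing import Optional


def smoking_status(ever_daily_smoker: bool, months_since_quit: Optional[float]) -> str:
    """Classify self-reported smoking status with the 12-month rule described above.

    ever_daily_smoker: ever smoked at least 1 cigarette a day (hypothetical field).
    months_since_quit: months between quitting and the blood draw; None if the
    participant is still smoking (hypothetical field).
    """
    if not ever_daily_smoker:
        return "never"
    if months_since_quit is None or months_since_quit < 12:
        return "current"
    return "former"


def pack_years(cigarettes_per_day: float, years_smoked: float) -> float:
    """(Average cigarettes per day / 20) x years of smoking; 0 for never smokers."""
    return cigarettes_per_day / 20.0 * years_smoked


print(smoking_status(True, None), smoking_status(True, 36), smoking_status(False, None))
print(pack_years(30, 10))  # 1.5 packs a day for 10 years
```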

Cohort-Specific Analyses and Meta-Analysis

Each cohort analyzed its data using at least 2 linear mixed-effect models. Each model was run separately for each CpG site. Model 1 is as follows:


where blood count comprises the fractions of CD4+ T cells, CD8+ T cells, NK cells, monocytes, and eosinophils, either measured or estimated using the Houseman et al method.20 The blood count adjustment was performed only in cohorts with whole-blood and leukocyte samples. Familial relationships were also accounted for in the model when applicable (eg, for FHS; see the Data Supplement for details). Acknowledging that each cohort may be influenced by a unique set of technical factors, we allowed each cohort to choose its own technical covariates. Model 2 added body mass index to model 1 because body mass index is associated with methylation at some loci, making it a potential confounder.21 Only 3 cohorts participated in the model 2 analysis: FHS, KORA, and NAS. Model 3 substituted pack years for smoking status. Only 3 cohorts participated in the model 3 analysis: FHS, Rotterdam, and Inchianti. The pack-year analysis was performed on only 2 subsets: current versus never smokers and former versus never smokers. Combining all 3 categories would have required accurate records of time of quitting, which among these 3 cohorts were available only for FHS. To investigate cell type differences, model 4 removed the blood counts from model 1. Only 3 cohorts participated in this analysis: FHS, KORA, and NAS. All models were run with the lme4 package22 in R,23 except for FHS (see the Data Supplement for details).
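Because the model 1 equation is not reproduced in this text, the following is only a simplified, hypothetical sketch of a per-CpG regression of methylation on smoking status with covariates. It deliberately swaps the linear mixed-effects models fitted with lme4 for ordinary least squares with fixed effects only (no familial random effects), and the covariates used here (age and sex) are illustrative assumptions, not the study's exact covariate set.

```python
from typing import List, Sequence


def ols(y: Sequence[float], X: Sequence[Sequence[float]]) -> List[float]:
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(p)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        if abs(A[col][col]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        scale = A[col][col]
        A[col] = [v / scale for v in A[col]]
        for r in range(p):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] for i in range(p)]


def smoking_coefficient(
    methylation: Sequence[float],
    smoker: Sequence[int],
    covariates: Sequence[Sequence[float]],
) -> float:
    """Fit methylation ~ 1 + smoker + covariates for one CpG; return the smoker coefficient."""
    X = [[1.0, float(s), *c] for s, c in zip(smoker, covariates)]
    return ols(methylation, X)[1]


# Hypothetical toy data for one CpG; covariates are (age, sex).
smoker = [1, 0, 1, 0, 0, 1]
covs = [(50, 0), (60, 1), (55, 1), (70, 0), (45, 1), (65, 0)]
meth = [0.5 - 0.02 * s + 0.001 * a + 0.01 * x for s, (a, x) in zip(smoker, covs)]
print(round(smoking_coefficient(meth, smoker, covs), 6))
```

Because the toy methylation values were generated without noise, the recovered smoking coefficient is approximately −0.02. In a real analysis this fit would be repeated for every CpG site.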

Meta-analysis was performed to combine the results from all cohorts. Because of the variability of available CpG sites after quality-control steps, we excluded CpG sites that were available in <3 cohorts. The remaining 485 381 CpG sites were then meta-analyzed with a random-effects model using the following formula:

$$E_i = \mu + s_i + e_i$$

where $E_i$ is the observed effect of study $i$, $\mu$ is the main smoking effect, $s_i$ is the between-study error for study $i$, and $e_i$ is the within-study error for study $i$, with both $s_i$ and $e_i$ assumed to be normally distributed. The model was fitted by restricted maximum likelihood using R's metafor24 package. P values were adjusted for multiple testing using the false discovery rate (FDR) method of Benjamini and Hochberg.25 We also report results at the Bonferroni-corrected threshold of 1×10−7 (≈0.05/485 381).
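The random-effects pooling and the FDR adjustment described above can be sketched in a few lines. This is an illustrative sketch, not the authors' code: it estimates the between-study variance τ² with the closed-form DerSimonian-Laird moment estimator, whereas the study used REML in metafor, so τ² (and therefore the pooled estimate) can differ somewhat. The Benjamini-Hochberg step is the standard step-up procedure. Function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def random_effects_dl(
    effects: Sequence[float], ses: Sequence[float]
) -> Tuple[float, float, float, float]:
    """Pool per-cohort estimates E_i with standard errors under E_i = mu + s_i + e_i.

    Uses the DerSimonian-Laird moment estimator for tau^2 = Var(s_i).
    Returns (mu, se_mu, tau2, two_sided_p).
    """
    k = len(effects)
    w = [1.0 / se**2 for se in ses]  # inverse-variance (fixed-effect) weights
    sw = sum(w)
    mu_fixed = sum(wi * e for wi, e in zip(w, effects)) / sw
    q = sum(wi * (e - mu_fixed) ** 2 for wi, e in zip(w, effects))  # Cochran's Q
    c = sw - sum(wi**2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    w_re = [1.0 / (se**2 + tau2) for se in ses]  # random-effects weights
    mu = sum(wi * e for wi, e in zip(w_re, effects)) / sum(w_re)
    se_mu = math.sqrt(1.0 / sum(w_re))
    p = math.erfc(abs(mu / se_mu) / math.sqrt(2.0))  # two-sided normal P value
    return mu, se_mu, tau2, p


def benjamini_hochberg(pvals: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Step up from the largest P value, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print(random_effects_dl([-0.020, -0.018, -0.025], [0.004, 0.005, 0.006]))
print(benjamini_hochberg([0.01, 0.04, 0.03]))
print(0.05 / 485381)  # Bonferroni threshold used above, about 1.03e-07
```

In a genome-wide run, `random_effects_dl` would be called once per CpG and the resulting P values passed together to `benjamini_hochberg`.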

The regression coefficient β (from meta-analysis) is interpretable as the difference in mean methylation between current and never smokers. We multiplied these by 100 to represent the percentage methylation difference where methylation ranges from 0% to 100%.

Literature Review to Identify Genes Previously Associated With Smoking and Methylation

We used the same literature search strategy published previously.26 A broad query of NCBI's PubMed literature database using medical subject heading (MeSH) terms (“((((DNA Methylation[Mesh]) OR methylation)) AND ((Smoking[Mesh]) OR smoking))”) yielded 775 results when initially performed on January 8, 2015, and 789 results when repeated on March 1, 2015, to update the results. Abstracts were reviewed to determine whether studies met the inclusion criteria: (1) performed in healthy human populations, (2) agnostically examined >1000 CpG sites at a time, (3) considered only cigarette exposure, and (4) publicly reported P values and gene annotations. A total of 25 publications met the inclusion criteria; they are listed in the fourth supplementary table of Joubert et al.26 CpG-level results (P values and gene annotations) for sites showing genome-wide statistically significant associations (FDR <0.05) were extracted, yielding 1185 genes previously associated with adult or maternal smoking. All CpGs annotated to these 1185 genes were marked as previously found.

Gene-Set Enrichment Analysis

Gene-set enrichment analysis27 was performed on the significant findings through its website to determine putative functions of the CpG sites. We selected the gene ontology biological process collection (C5-BP) and collected all categories with FDR <0.05 (≤100 categories).

Enrichment Analysis for Localization to Different Genomic Features

Enrichment analysis on genomic features was performed using the annotation file supplied by Illumina (version 1.2; downloaded from the manufacturer's website), which contains information on CpG location relative to genes (ie, body, first exon, 3′ UTR, 5′ UTR, within 200 base pairs of the transcriptional start site [TSS200], and within 1500 base pairs of the transcriptional start site [TSS1500]), the relation of each CpG site to a CpG island (ie, island, northern shelf, northern shore, southern shelf, and southern shore), whether the CpG site is known to be in a differentially methylated region, and whether the CpG site is known to be in an enhancer or a DNase I hypersensitive site. Enrichment for each feature was assessed with a 1-sided Fisher exact test using R's fisher.test.
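For a single genomic feature, the 1-sided Fisher exact test reduces to an upper-tail hypergeometric probability on a 2×2 table. The sketch below computes that tail directly with exact integer arithmetic; for 2×2 tables it should agree with R's `fisher.test(..., alternative = "greater")`. The table layout and function name are our own illustrative choices. For genome-scale counts (hundreds of thousands of CpGs), a library routine such as SciPy's `scipy.stats.fisher_exact` is more practical than this exact big-integer sum.

```python
from math import comb


def fisher_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided (enrichment) Fisher exact P for the 2x2 table [[a, b], [c, d]].

    Rows: significant / non-significant CpGs; columns: in feature / not in feature.
    Returns P(X >= a) under the hypergeometric null with all margins fixed.
    """
    n = a + b + c + d
    row1 = a + b  # significant CpGs
    col1 = a + c  # CpGs located in the feature
    upper = min(row1, col1)
    total = comb(n, col1)
    tail = sum(comb(row1, x) * comb(n - row1, col1 - x) for x in range(a, upper + 1))
    return tail / total


# Hypothetical counts: 12 of 100 significant CpGs vs 40 of 900 other CpGs in a feature.
print(fisher_greater(12, 88, 40, 860))
```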

GWAS Analysis

We intersected our results with single-nucleotide polymorphisms (SNPs) with GWAS P values ≤5×10−8 in the National Human Genome Research Institute GWAS catalog (accessed November 2, 2015).28 The catalog contained 9777 SNPs annotated to 7075 genes associated with 865 phenotypes at P≤5×10−8. The gene for each significant CpG was obtained from the annotation file supplied by Illumina. Enrichment analysis was performed on a per-gene basis using a 1-sided Fisher exact test.
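A per-gene enrichment test starts from a 2×2 table of gene counts. The sketch below builds that table from gene sets; the choice of background set (for example, all genes annotated on the array) is an assumption, because the text does not state which gene universe was used. The gene labels in the usage line are placeholders, not results.

```python
from typing import Set, Tuple


def gene_overlap_table(
    significant: Set[str], gwas_hits: Set[str], background: Set[str]
) -> Tuple[int, int, int, int]:
    """2x2 counts (a, b, c, d) for a gene-level enrichment test.

    a: significant and GWAS-associated; b: significant only;
    c: GWAS-associated only; d: neither. Genes outside `background` are ignored.
    """
    sig = significant & background
    gwas = gwas_hits & background
    a = len(sig & gwas)
    b = len(sig - gwas)
    c = len(gwas - sig)
    d = len(background) - a - b - c
    return a, b, c, d


# Placeholder gene labels for illustration only.
universe = {"GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E"}
print(gene_overlap_table({"GENE_A", "GENE_B"}, {"GENE_B", "GENE_C"}, universe))
```

The resulting counts would then be passed to a 1-sided Fisher exact test.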

For bone mineral phenotype enrichment, we included all SNPs containing terms bone mineral density or osteoporosis. For cardiovascular disease, we included all SNPs containing terms cardiovascular disease, stroke, coronary disease, cardiomyopathy, or myocardial infarction. For cardiovascular disease risk factors, we included all SNPs containing terms blood pressure, cholesterol, diabetes, obesity, or hypertension. For overall cancer enrichment, we included all SNPs containing terms cancer, carcinoma, or lymphoma, while removing those pertaining to cancer treatment effects. For overall pulmonary phenotype enrichment, we included all SNPs containing terms pulmonary disease, pulmonary function, emphysema, asthma, or airflow obstruction.
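The keyword rules above amount to case-insensitive substring matching against catalog trait descriptions. The sketch below encodes the listed terms; the group labels are our own. The text does not say how traits describing cancer treatment effects were identified, so the exclusion terms in `CANCER_TREATMENT_TERMS` are hypothetical placeholders. Plain substring matching is deliberately simple, and matches may still need manual review.

```python
from typing import Dict, List, Tuple

# Search terms taken from the text above; group labels are illustrative.
TRAIT_GROUPS: Dict[str, Tuple[str, ...]] = {
    "bone mineral": ("bone mineral density", "osteoporosis"),
    "cardiovascular disease": (
        "cardiovascular disease", "stroke", "coronary disease",
        "cardiomyopathy", "myocardial infarction",
    ),
    "cardiovascular risk factors": (
        "blood pressure", "cholesterol", "diabetes", "obesity", "hypertension",
    ),
    "cancer": ("cancer", "carcinoma", "lymphoma"),
    "pulmonary": (
        "pulmonary disease", "pulmonary function", "emphysema",
        "asthma", "airflow obstruction",
    ),
}

# Hypothetical exclusion terms for traits describing cancer treatment effects.
CANCER_TREATMENT_TERMS: Tuple[str, ...] = ("response to", "treatment", "chemotherapy", "toxicity")


def classify_trait(trait: str) -> List[str]:
    """Return every phenotype group whose search terms occur in the trait description."""
    text = trait.lower()
    groups: List[str] = []
    for group, terms in TRAIT_GROUPS.items():
        if not any(term in text for term in terms):
            continue
        if group == "cancer" and any(term in text for term in CANCER_TREATMENT_TERMS):
            continue  # drop cancer-treatment traits from the cancer group
        groups.append(group)
    return groups


for t in ("Systolic blood pressure", "Lung adenocarcinoma", "Response to chemotherapy in breast cancer"):
    print(t, "->", classify_trait(t))
```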

Analysis of Persistence of Methylation Signals With Time Since Quitting Smoking Among Former Smokers

We examined whether smoking–methylation associations were attenuated over time in the FHS cohort, which had ascertained smoking status longitudinally for >35 years. The analysis was performed on 7 dichotomous variables, indicating cessation of smoking for 5, 10, 15, 20, 25, and 30 years versus never smokers. For example, for the 5-year cessation variable, participants who had quit smoking ≥5 years earlier were coded as 1, never smokers were coded as 0, and current smokers were excluded. For this analysis, we used the pedigreemm package29 with the same set of covariates as in the primary analysis. Sites with P<0.002 across all 7 variables were deemed to differ significantly from never-smoker levels.
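The dichotomous cessation variables described above can be built as follows. This is a minimal sketch, using the cessation thresholds listed in the text as defaults; one assumption is ours: former smokers who quit more recently than a given threshold are excluded from that contrast, because the text specifies only how never and current smokers are coded. The pedigreemm mixed model itself is not reproduced here.

```python
from typing import Dict, List, Optional, Sequence


def cessation_indicator(
    status: str, years_since_quit: Optional[float], threshold: float
) -> Optional[int]:
    """1 = former smoker who quit >= threshold years earlier; 0 = never smoker;
    None = excluded (current smokers and, by assumption, more recent quitters)."""
    if status == "never":
        return 0
    if status == "former" and years_since_quit is not None and years_since_quit >= threshold:
        return 1
    return None


def build_indicators(
    statuses: Sequence[str],
    years_since_quit: Sequence[Optional[float]],
    thresholds: Sequence[float] = (5, 10, 15, 20, 25, 30),
) -> Dict[float, List[Optional[int]]]:
    """One dichotomous variable per cessation threshold."""
    return {
        t: [cessation_indicator(s, y, t) for s, y in zip(statuses, years_since_quit)]
        for t in thresholds
    }


print(build_indicators(["never", "former", "former", "current"], [None, 12.0, 3.0, None], thresholds=(5, 10, 15)))
```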

Methylation by Expression Analysis

To determine the transcriptomic associations of each significant CpG site, we queried the FHS gene-level methylation-by-expression database at genome-wide FDR <0.05. This database was constructed from 2262 individuals from the FHS Offspring cohort attending examination cycle 8 (2005–2008) who had both whole-blood DNA methylation and transcriptomic data based on the Affymetrix Human Exon 1.0 ST Array. Enrichment analysis was performed using a 1-sided Fisher exact test. A CpG site and a transcript were considered associated in cis if the CpG site lies within 500 kilobases of the transcript's start location.
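The cis definition above is a simple distance rule. In the sketch below, requiring the same chromosome and treating the 500-kb boundary as inclusive are our assumptions; the text states only the 500-kilobase window. The coordinates in the usage lines are hypothetical.

```python
from typing import Iterable, List, Tuple

# (cpg_id, cpg_chrom, cpg_pos, transcript_id, tx_chrom, tx_start)
Pair = Tuple[str, str, int, str, str, int]


def is_cis(cpg_chrom: str, cpg_pos: int, tx_chrom: str, tx_start: int, window_bp: int = 500_000) -> bool:
    """True if the CpG lies on the same chromosome within window_bp of the transcript start."""
    return cpg_chrom == tx_chrom and abs(cpg_pos - tx_start) <= window_bp


def split_cis_trans(pairs: Iterable[Pair]) -> Tuple[List[Tuple[str, str]], List[Tuple[str, str]]]:
    """Partition associated CpG-transcript pairs into cis and trans lists of IDs."""
    cis: List[Tuple[str, str]] = []
    trans: List[Tuple[str, str]] = []
    for cpg_id, c_chr, c_pos, tx_id, t_chr, t_start in pairs:
        (cis if is_cis(c_chr, c_pos, t_chr, t_start) else trans).append((cpg_id, tx_id))
    return cis, trans


# Hypothetical pairs: same chromosome within 500 kb, then a different chromosome.
print(split_cis_trans([("cgA", "5", 373_378, "tx1", "5", 600_000), ("cgB", "2", 1_000, "tx2", "3", 1_000)]))
```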

Analysis of Ethnic Discrepancy Between AA and EA Cohorts

Meta-analysis of the current versus never smoker results of EA cohorts (FHS, KORA, GOLDN, LBC 1921, LBC 1936, NAS, Rotterdam, Inchianti, EPIC, EPIC Norfolk, MESA, and CHS-EA) was performed separately from those of AA cohorts (ARIC, GTP, GENOA, and CHS-AA).

Analysis of Sample Types for DNA Extraction

Meta-analysis was performed on the results from cohorts with whole blood/buffy coat samples (FHS, KORA, LBC 1921, LBC 1936, NAS, Rotterdam, Inchianti, GTP, CHS-EA, CHS-AA, ARIC, GENOA, EPIC, and EPIC Norfolk). Because the CD4+ samples (GOLDN) and CD14+ samples (MESA) each came from a single cohort, they were not meta-analyzed. Correlations of results across cell types were computed for CpG sites with FDR <0.05 in at least one cell type.

Results

Table 1 displays the characteristics of participants in the meta-analysis. The proportion of participants reporting current smoking ranged from 4% to 33% across the different study populations. The characteristics of the participants within each cohort are provided in Table I in the Data Supplement.

Table 1. Participant Characteristics

Characteristic | Current Smokers (n=2433) | Former Smokers (n=6518) | Never Smokers (n=6956)
Sex, % men | 46.3 | 55.6 | 31.7
Age, y* | 57.7±7.7 | 64.8±8.2 | 61.2±9.7
BMI, kg/m²* | 27.3±5.4 | 28.7±5.0 | 28.6±5.3


BMI indicates body mass index.

*Weighted mean±pooled SD across cohorts

Current Versus Never Smokers

In the meta-analysis of current cigarette smokers (n=2433) versus never smokers (n=6956), 2623 CpGs annotated to 1405 genes met Bonferroni significance after correction for 485 381 tests (P<1×10−7). At a genome-wide FDR <0.05, 18 760 CpGs annotated to 7201 genes were differentially methylated. The genomic inflation factor30 was moderate (λ=1.32; Figure I in the Data Supplement), which is consistent with a large number of sites being affected by smoking. Our results support many previously reported loci,7,8,11,13 including CpGs annotated to AHRR, RARA, F2RL3, and LRRN3 (Table II in the Data Supplement). As expected, cg05575921, annotated to AHRR and the top CpG in most previous studies of smoking, was highly significant in our meta-analysis (P=4.6×10−26; ranked 36th; Table II in the Data Supplement) and also had the largest effect size (−18% difference in methylation), comparable to effect sizes in previous studies.18 Of the 18 760 CpGs significant at FDR <0.05, 16 673 (annotated to 6720 genes) have not previously been reported to be associated with cigarette smoking; these include 1500 of the 2623 CpGs that met Bonferroni significance. Table 2 shows the 25 CpGs with the lowest P values overall and the 25 with the lowest P values among novel findings. Table II in the Data Supplement provides the complete list of all CpGs that were significantly differentially methylated (FDR <0.05) in the analysis of current versus never smokers. Adding body mass index to the model did not appreciably alter the results (Figure II in the Data Supplement).
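The inflation factor λ quoted above is conventionally computed with the genomic-control definition: the median of the observed 1-df χ² statistics divided by the median expected under the null (about 0.455). The sketch below assumes that ref 30 uses this standard definition, and that the inputs are two-sided P values. When many sites carry true signal, as here, λ>1 partly reflects genuine associations rather than bias alone.

```python
from statistics import NormalDist, median
from typing import Sequence


def inflation_factor(pvals: Sequence[float]) -> float:
    """Genomic-control lambda: median observed 1-df chi-square over its null median.

    Each two-sided P is converted to chi-square = z**2 with z = Phi^-1(P / 2); using
    P / 2 (rather than 1 - P / 2) avoids rounding to 1.0 for very small P values.
    Requires 0 < P <= 1.
    """
    nd = NormalDist()
    chisq = [nd.inv_cdf(p / 2.0) ** 2 for p in pvals]
    null_median = nd.inv_cdf(0.25) ** 2  # median of chi-square(1), about 0.455
    return median(chisq) / null_median


# Evenly spaced P values (no signal) give lambda close to 1.
uniform = [(i + 0.5) / 1001 for i in range(1001)]
print(round(inflation_factor(uniform), 3))
```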

Table 2. Most Statistically Significant CpG Sites That Were Associated With Current Vs Never-Smoker Status

Probe ID | Chromosome | Location | Gene Symbol* | Regression Coefficient | SE | P | FDR
25 most significant CpG sites
cg16145216 | 1 | 42 385 662 | HIVEP3 | 0.0298 | 0.0020 | 6.7×10−48 | 3.3×10−42
cg19406367 | 1 | 66 999 929 | SGIP1 | 0.0175 | 0.0013 | 7×10−44 | 1.7×10−38
cg05603985 | 1 | 2 161 049 | SKI | −0.0122 | 0.0009 | 1.8×10−43 | 2.8×10−38
cg14099685 | 11 | 47 546 068 | CUGBP1 | −0.0124 | 0.0009 | 1.5×10−42 | 1.8×10−37
cg12513616 | 5 | 177 370 977 |  | −0.0262 | 0.0020 | 6.1×10−41 | 5.9×10−36
cg03792876 | 16 | 73 243 |  | −0.0182 | 0.0014 | 7.2×10−38 | 5.9×10−33
cg01097768 | 5 | 378 854 | AHRR | −0.0166 | 0.0013 | 6.8×10−35 | 4.7×10−30
cg26856289 | 1 | 24 307 516 | SFRS13A | −0.0163 | 0.0013 | 8.6×10−35 | 5.2×10−30
cg07954423 | 9 | 130 741 881 | FAM102A | −0.0134 | 0.0011 | 1.2×10−34 | 6.3×10−30
cg01940273 | 2 | 233 284 934 |  | −0.0815 | 0.0067 | 2×10−34 | 9.8×10−30
cg01083131 | 16 | 67 877 413 | THAP11; CENPT | −0.0155 | 0.0013 | 3.7×10−34 | 1.6×10−29
cg01017464 | 18 | 47 018 095 | SNORD58A; SNORD58B; RPL17 | −0.0172 | 0.0014 | 1.9×10−33 | 7.6×10−29
cg06121808 | 2 | 113 404 678 | SLC20A1 | −0.0143 | 0.0012 | 2.1×10−32 | 7.9×10−28
cg10062919 | 17 | 38 503 802 | RARA | −0.0128 | 0.0011 | 9.2×10−32 | 3.2×10−27
cg20066188 | 22 | 37 678 791 | CYTH4 | −0.0252 | 0.0022 | 1.6×10−31 | 5.2×10−27
cg04551776 | 5 | 393 366 | AHRR | −0.0244 | 0.0021 | 5.8×10−31 | 1.8×10−26
cg11152412 | 15 | 74 927 688 | EDC3 | −0.0077 | 0.0007 | 1.8×10−30 | 5×10−26
cg00073090 | 19 | 1 265 879 |  | −0.0196 | 0.0017 | 4.2×10−30 | 1.1×10−25
cg11902777 | 5 | 368 843 | AHRR | −0.0201 | 0.0018 | 9.1×10−30 | 2.3×10−25
cg25212453 | 17 | 1 509 953 | SLC43A2 | −0.0101 | 0.0009 | 1.4×10−29 | 3.5×10−25
cg04956244 | 17 | 38 511 592 | RARA | 0.0122 | 0.0011 | 1.5×10−29 | 3.5×10−25
cg13951797 | 16 | 2 204 381 | TRAF7 | −0.0153 | 0.0014 | 1.6×10−29 | 3.5×10−25
cg11028075 | 10 | 97 200 911 | SORBS1 | 0.0175 | 0.0016 | 1.7×10−29 | 3.6×10−25
cg11700584† | 14 | 50 088 544 | RPL36AL; MGAT2 | −0.0151 | 0.0013 | 3.4×10−29 | 6.8×10−25
cg11263997 | 11 | 70 257 280 | CTTN | 0.0050 | 0.0005 | 4.3×10−29 | 8.4×10−25
25 most significant novel CpG sites
cg11700584 | 14 | 50 088 544 | RPL36AL; MGAT2 | −0.0151 | 0.0013 | 3.4×10−29 | 6.8×10−25
cg22417733 | 6 | 153 303 409 | FBXO5 | −0.0171 | 0.0015 | 1.5×10−28 | 2.7×10−24
cg08118908 | 16 | 15 787 920 | NDE1 | 0.0053 | 0.0005 | 5.4×10−26 | 7.1×10−22
cg14003265 | 9 | 139 796 499 | TRAF2 | −0.0106 | 0.0010 | 3.2×10−25 | 3.7×10−21
cg02556393 | 3 | 168 866 705 | MECOM | −0.0162 | 0.0016 | 2.8×10−24 | 2.6×10−20
cg01218206 | 11 | 116 933 977 | SIK3 | −0.0150 | 0.0015 | 3.1×10−23 | 2.5×10−19
cg04987734 | 14 | 103 415 873 | CDC42BPB | 0.0149 | 0.0015 | 9.0×10−23 | 6.8×10−19
cg27118035 | 16 | 31 891 978 | ZNF267 | 0.0136 | 0.0014 | 2.4×10−22 | 1.7×10−18
cg18450254 | 3 | 64 200 005 | PRICKLE2 | 0.0120 | 0.0013 | 2.3×10−21 | 1.3×10−17
cg06753787 | 2 | 220 074 208 | ZFAND2B | 0.0063 | 0.0007 | 3.2×10−21 | 1.8×10−17
cg18158306 | 12 | 133 135 032 | FBRSL1 | 0.0102 | 0.0011 | 6.2×10−21 | 3.2×10−17
cg19093370 | 17 | 17 110 180 | PLD6 | 0.0198 | 0.0021 | 8.7×10−21 | 4.4×10−17
cg09182189 | 1 | 1 709 203 | NADK | −0.0104 | 0.0011 | 2.0×10−20 | 9.2×10−17
cg18369990 | 2 | 112 941 244 | FBLN7 | 0.0116 | 0.0013 | 2.3×10−20 | 1.1×10−16
cg24578857 | 17 | 17 110 207 | PLD6 | 0.0200 | 0.0022 | 3.1×10−20 | 1.4×10−16
cg20408402 | 10 | 72 362 452 | PRF1 | 0.0085 | 0.0009 | 7.6×10−20 | 3.1×10−16
cg04673446 | 22 | 39 879 951 | MGAT3 | 0.0060 | 0.0007 | 2.0×10−19 | 8.0×10−16
cg06803614 | 1 | 40 133 581 | NT5C1A | −0.0088 | 0.0010 | 2.1×10−19 | 8.3×10−16
cg16274678 | 1 | 154 127 952 | TPM3; NUP210L | −0.0152 | 0.0017 | 2.9×10−19 | 1.1×10−15
cg07286341 | 5 | 176 923 805 | PDLIM7 | −0.0077 | 0.0009 | 3.4×10−19 | 1.3×10−15
cg20674424 | 3 | 186 503 527 | MIR1248; EIF4A2; SNORA81 | −0.0091 | 0.0010 | 4.2×10−19 | 1.5×10−15
cg02279625 | 15 | 78 384 520 | SH2D7 | 0.0105 | 0.0012 | 4.8×10−19 | 1.7×10−15
cg03485667 | 16 | 75 143 200 | ZNRF1 | −0.0168 | 0.0019 | 5.0×10−19 | 1.8×10−15
cg03531211 | 6 | 32 920 102 | HLA-DMA | −0.0108 | 0.0012 | 7.5×10−19 | 2.5×10−15
cg09940677 | 14 | 103 415 458 | CDC42BPB | 0.0081 | 0.0009 | 1.0×10−18 | 3.2×10−15

CpG indicates cytosine–phosphate–guanine; and FDR, false discovery rate.

*CpG sites without gene names are intergenic. These are all included in all the analyses.

†Not previously discovered by other studies.

Methylation can be either reduced or increased at CpG sites in response to smoking. For the 53.2% of FDR-significant CpGs with increased methylation in current smokers, the mean difference in methylation between current and never smokers was 0.5% (SD=0.37%; range, 0.06%–7.3%). For the 46.8% of CpGs with decreased methylation in current smokers, the mean difference was 0.65% (SD=0.56%; range, 0.04%–18%). The volcano plot can be found in Figure III in the Data Supplement.

We did not observe a correlation between the number of significant CpGs per gene and gene size, number of exons, or coverage of the methylation platform. A formal enrichment test for each of the 7201 genes with respect to gene length or number of exons identified associations for only 3 genes (AHRR, PRRT1, and TNF). However, given the robust findings for a specific CpG in AHRR in multiple previous studies4,7,9 and in our own, and the key role of AHRR in the AHR pathway, which is crucial in the response to polyaromatic hydrocarbons such as those produced by smoking,31 it seems unlikely that the AHRR findings are false positives. Likewise, there is strong support in the literature for PRRT132 and TNF.33 The enrichment test for methylation platform coverage identified the same 3 genes.

In a subset of 3 cohorts (1827 subjects), we investigated the association of the number of pack years smoked with the 18 760 CpGs that were differentially methylated (FDR <0.05) between current versus never smokers. Significant dose responses were observed for 11 267 CpGs (60.1%) at FDR <0.05 (Table III in the Data Supplement).

To investigate the pathways implicated by these genes, we performed a gene-set enrichment analysis34 on the annotated genes. The results suggested that cigarette smoking is associated with potential changes in numerous vital molecular processes, such as signal transduction (FDR=2.8×10−79), protein metabolic processes (FDR=1.2×10−43), and transcription pathways (FDR=8.4×10−31). The complete list of 99 enriched molecular processes can be found in Table IV in the Data Supplement.

Former Versus Never Smokers

Meta-analysis of former (n=6518) versus never smokers (n=6956), restricted to the 18 760 CpG sites that were differentially methylated in current versus never smokers, identified 2568 CpGs annotated to 1326 genes at FDR <0.05 (Table V in the Data Supplement). Of these, 185 CpGs (annotated to 149 genes) also met Bonferroni correction for the 18 760 tests (P<0.05/18 760≈2.67×10−6). There was no evidence of inflation30 (λ=0.98; Figure IV in the Data Supplement). We also confirmed previously reported findings for CpGs annotated to AHRR, RARA, and LRRN3.7,8,11,13 For the 2568 CpGs that remained significantly differentially methylated in former versus never smokers, effect sizes were all weaker than in the analysis of current versus never smokers (61.2%±15.3% weaker). Results for the top 25 CpGs are displayed in Table 3. Adding body mass index to the model did not appreciably alter the results (Figure V in the Data Supplement). A volcano plot can be found in Figure VI in the Data Supplement. In a subset of 3 cohorts (3349 subjects), analyses using pack years confirmed a significant dose response for 1804 of the 2568 CpGs (70%), annotated to 942 genes, at FDR <0.05 (Table VI in the Data Supplement).
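The restricted Bonferroni threshold and the attenuation of effect sizes can be reproduced arithmetically. The text does not give the formula behind "61.2%±15.3% weaker"; the sketch below assumes it is the mean±SD of the per-CpG quantity 100×(1 − β_former/β_current), where the β values are the meta-analysis coefficients from the two comparisons. Function names are hypothetical.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test P-value threshold controlling the family-wise error rate at alpha."""
    return alpha / n_tests


def percent_attenuation(beta_current: float, beta_former: float) -> float:
    """How much weaker the former-vs-never effect is, as % of the current-vs-never effect."""
    return 100.0 * (1.0 - beta_former / beta_current)


def summarize_attenuation(pairs: Sequence[Tuple[float, float]]) -> Tuple[float, float]:
    """Mean and SD of per-CpG attenuation over (beta_current, beta_former) pairs."""
    values = [percent_attenuation(bc, bf) for bc, bf in pairs]
    return mean(values), stdev(values)


print(bonferroni_threshold(0.05, 18760))  # about 2.67e-06
# cg01940273: -0.0815 (current vs never, Table 2) and -0.0234 (former vs never, Table 3)
print(round(percent_attenuation(-0.0815, -0.0234), 1))
```

For cg01940273, this gives roughly 71% attenuation. Under the assumed definition, averaging the same quantity over all 2568 CpGs would correspond to the summary figure reported above.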

Table 3. Twenty-Five Most Statistically Significant CpG Sites That Were Associated With Former Versus Never Smoker Status

Probe ID | Chromosome | Location | Gene Symbol* | Regression Coefficient | SE | P | FDR
cg01940273 | 2 | 233 284 934 |  | −0.0234 | 0.0013 | 9.6×10−73 | 1.8×10−68
cg25189904 | 1 | 68 299 493 | GNG12 | −0.0283 | 0.0021 | 3.5×10−40 | 3.3×10−36
cg12803068 | 7 | 45 002 919 | MYO1G | 0.0191 | 0.0017 | 9.3×10−31 | 5.8×10−27
cg19572487 | 17 | 38 476 024 | RARA | −0.0159 | 0.0014 | 2.2×10−30 | 1.0×10−26
cg11554391 | 5 | 321 320 | AHRR | −0.0091 | 0.0008 | 1.0×10−28 | 3.9×10−25
cg05951221 | 2 | 233 284 402 |  | −0.0396 | 0.0036 | 1.1×10−27 | 3.2×10−24
cg23771366 | 11 | 86 510 998 | PRSS23 | −0.0167 | 0.0015 | 1.2×10−27 | 3.2×10−24
cg26764244 | 1 | 68 299 511 | GNG12 | −0.0119 | 0.0011 | 2.3×10−27 | 5.4×10−24
cg05575921 | 5 | 373 378 | AHRR | −0.0406 | 0.0038 | 8.2×10−27 | 1.7×10−23
cg11660018 | 11 | 86 510 915 | PRSS23 | −0.0157 | 0.0015 | 4.3×10−26 | 8.1×10−23
cg21566642 | 2 | 233 284 661 |  | −0.0434 | 0.0041 | 1.0×10−25 | 1.7×10−22
cg11902777 | 5 | 368 843 | AHRR | −0.0063 | 0.0006 | 2.8×10−25 | 4.3×10−22
cg26850624 | 5 | 429 559 | AHRR | 0.0118 | 0.0011 | 3.1×10−25 | 4.4×10−22
cg03636183 | 19 | 17 000 585 | F2RL3 | −0.0267 | 0.0026 | 8.9×10−25 | 1.2×10−21
cg15693572 | 3 | 22 412 385 |  | 0.0190 | 0.0019 | 1.5×10−23 | 1.9×10−20
cg17924476 | 5 | 323 794 | AHRR | 0.0148 | 0.0016 | 4.0×10−20 | 4.7×10−17
cg12513616 | 5 | 177 370 977 |  | −0.0072 | 0.0008 | 2.4×10−19 | 2.7×10−16
cg07339236 | 20 | 50 312 490 | ATP9A | −0.0062 | 0.0007 | 1.4×10−18 | 1.4×10−15
cg06126421 | 6 | 30 720 080 |  | −0.0365 | 0.0042 | 3.0×10−18 | 3.0×10−15
cg14624207 | 11 | 68 142 198 | LRP5 | −0.0070 | 0.0008 | 5.0×10−18 | 4.7×10−15
cg00706683 | 2 | 233 251 030 | ECEL1P2 | 0.0101 | 0.0012 | 1.4×10−17 | 1.2×10−14
cg23351584 | 11 | 86 512 100 | PRSS23 | −0.0048 | 0.0006 | 7.0×10−17 | 6.0×10−14
cg02583484 | 12 | 54 677 008 | HNRNPA1 | −0.0062 | 0.0008 | 1.0×10−15 | 8.5×10−13
cg05302489 | 6 | 31 760 426 | VARS | 0.0079 | 0.0010 | 2.5×10−15 | 2.0×10−12
cg01442064 | 4 | 5 713 450 | EVC | −0.0055 | 0.0007 | 3.3×10−15 | 2.4×10−12

CpG indicates cytosine–phosphate–guanine; and FDR, false discovery rate.

*CpG sites without gene names are intergenic. These are all included in all the analyses.

The gene-set enrichment analysis27 in the former versus never smoker analyses on all 1326 genes revealed enrichment for genes associated with protein metabolic processes (FDR=1.1×10−23), RNA metabolic processes (FDR=1.4×10−17), and transcription pathways (FDR=3.9×10−18; Table VII in the Data Supplement). The gene-set enrichment analysis on the 942 genes for which the 1804 CpGs exhibited dose responses with pack years also revealed similar pathways to those summarized in Table VII in the Data Supplement, except with weaker enrichment FDR values.

In 2648 Framingham Heart Study participants with ≤30 years of prospectively collected smoking data, we examined the 2568 CpGs that were differentially methylated in meta-analysis of former versus never smokers and explored their associations with time since smoking cessation. Methylation levels of most CpGs returned toward that of never smokers within 5 years of smoking cessation. However, 36 CpGs annotated to 19 genes, including TIAM2, PRRT1, AHRR, F2RL3, GNG12, LRRN3, APBA2, MACROD2, and PRSS23, did not return to never-smoker levels even after 30 years of smoking cessation (Figure; Table 4).

Table 4. The Top 36 Most Statistically Significant CpG Sites That Did Not Return to Never-Smoker Levels 30 Y After Smoking Cessation in the Framingham Heart Study (n=2648)

Probe ID | Chromosome | Location | Gene Symbol* | P
cg05951221 | 2 | 233 284 402 |  | 3.2×10−15
cg06644428 | 2 | 233 284 112 |  | 1.2×10−14
cg05575921 | 5 | 373 378 | AHRR | 6.5×10−14
cg21566642 | 2 | 233 284 661 |  | 8.6×10−10
cg03636183 | 19 | 17 000 585 | F2RL3 | 5.7×10−7
cg06126421 | 6 | 30 720 080 |  | 1.3×10−6
cg01940273 | 2 | 233 284 934 |  | 1.9×10−6
cg23771366 | 11 | 86 510 998 | PRSS23 | 3.1×10−6
cg17272563 | 6 | 32 116 548 | PRRT1 | 4.4×10−6
cg23916896 | 5 | 368 804 | AHRR | 1.3×10−5
cg11660018 | 11 | 86 510 915 | PRSS23 | 1.3×10−5
cg08118908 | 16 | 15 787 920 | NDE1 | 3.0×10−5
cg13937905 | 12 | 53 612 551 | RARG | 1.5×10−4
cg24172324 | 2 | 232 258 363 |  | 1.7×10−4
cg10780313 | 6 | 33 501 379 |  | 2.0×10−4
cg14027333 | 6 | 32 116 317 | PRRT1 | 2.1×10−4
cg11245297 | 19 | 8 117 898 | CCL25 | 2.1×10−4
cg01692968 | 9 | 108 005 349 |  | 3.1×10−4
cg00706683 | 2 | 233 251 030 | ECEL1P2 | 3.4×10−4
cg25317941 | 2 | 233 351 153 | ECEL1 | 4.0×10−4
cg25189904 | 1 | 68 299 493 | GNG12 | 4.0×10−4
cg14179389 | 1 | 92 947 961 | GFI1 | 4.7×10−4
cg13641317 | 3 | 127 255 552 |  | 4.9×10−4
cg19847577 | 15 | 29 213 748 | APBA2 | 5.1×10−4
cg14239618 | 7 | 110 281 356 |  | 5.8×10−4
cg25955180 | 6 | 32 116 538 | PRRT1 | 6.3×10−4
cg00774149 | 3 | 52 255 721 | TLR9 | 6.4×10−4
cg21351392 | 6 | 161 607 487 | AGPAT4 | 7.1×10−4
cg11902777 | 5 | 368 843 | AHRR | 7.6×10−4
cg07251887 | 17 | 73 641 809 | LOC100130933; RECQL5 | 7.7×10−4
cg19382157 | 7 | 2 124 566 | MAD1L1 | 8.9×10−4
cg19925780 | 1 | 101 509 557 |  | 1.1×10−3
cg03679544 | 6 | 155 537 972 | TIAM2 | 1.1×10−3
cg08559712 | 20 | 16 030 674 | MACROD2 | 1.3×10−3
cg09837977 | 7 | 110 731 201 | LRRN3; IMMP2L | 1.3×10−3
cg00931843 | 6 | 155 442 993 | TIAM2 | 1.4×10−3

CpG indicates cytosine–phosphate–guanine.

*CpG sites without gene names are intergenic. These are all included in all the analyses.


Figure. Trajectories of cytosine–phosphate–guanine (CpG) sites that did not return to never-smoker levels within 30 y after cessation.

The EPIC studies included cancer cases plus noncancer controls analyzed together, adjusting for cancer status. The other studies were population-based samples not selected for disease status. To evaluate residual confounding by cancer status after adjustment, we repeated the meta-analysis without the EPIC studies. The effect estimates were highly correlated: Pearson ρ=0.99 for current versus never smoking and 0.98 for former smoking versus never.
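This sensitivity check compares per-CpG effect estimates from the full meta-analysis with those from a rerun that excludes the EPIC studies. The sketch below is a minimal version of that comparison, assuming both result sets are keyed by probe ID; the dictionaries and the function name are hypothetical.

```python
import math
from typing import Dict


def correlate_effects(full: Dict[str, float], reduced: Dict[str, float]) -> float:
    """Pearson correlation of per-CpG effect estimates shared by two meta-analyses."""
    shared = sorted(full.keys() & reduced.keys())
    if len(shared) < 2:
        raise ValueError("need at least 2 shared CpGs")
    x = [full[k] for k in shared]
    y = [reduced[k] for k in shared]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical estimates; CpGs present in only one result set are ignored.
full_run = {"cgA": -0.020, "cgB": 0.012, "cgC": -0.041, "cgD": 0.005}
without_epic = {"cgA": -0.019, "cgB": 0.013, "cgC": -0.040}
print(round(correlate_effects(full_run, without_epic), 4))
```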

Enrichment Analysis for Genes Identified in GWAS of Smoking-Related Phenotypes

To identify potential relevance of the differentially methylated genes to smoking-related phenotypes, we determined whether these genes had been associated with smoking-related phenotypes in the National Human Genome Research Institute-EBI GWAS Catalog28 (accessed November 2, 2015). The catalog contained 9777 SNPs annotated to 7075 genes associated with 865 phenotypes at P≤5×10−8. Of the 7201 genes (mapped by 18 760 CpG sites) significantly differentially methylated in current versus never smokers, we found overlap with 1791 genes (4187 CpGs are mapped to these) associated in GWAS with 700 phenotypes (enrichment P=2.4×10−52). We identified smoking-related traits using the 2014 US Surgeon General’s report.1 Enrichment results for a selection of smoking-related phenotypes, including coronary heart disease and its risk factors, various cancers, inflammatory diseases, osteoporosis, and pulmonary traits, are available in Table 5. We also performed the same enrichment analysis on the 2568 CpGs associated with former versus never-smoking status. We identified enrichment for coronary heart disease, pulmonary traits, and some cancers (Table 5). More detailed results are available in Tables VIII and IX in the Data Supplement. Differentially methylated genes in relation to smoking status that are associated in GWAS with coronary heart disease or coronary heart disease risk factors are available in Table X in the Data Supplement. We also performed enrichment analyses on phenotypes that have no clear relationships to smoking, such as male pattern baldness (P=0.0888), myopia (P=0.1070), thyroid cancer (P=0.2406), and testicular germ cell tumor (P=0.3602) and did not find significant enrichment.

Table 5. Enrichment of CpGs for Genome-Wide Association Study Phenotypes That Are Regarded as Causally Related to Cigarette Smoking1

GWAS phenotype                            Enrichment P value
Current vs never smoking
  CHD and stroke                          0.0028
    Ischemic stroke                       0.0095
  CHD risk factors                        1.2×10⁻¹²
    Blood pressure/hypertension           8.1×10⁻⁶
    Diastolic blood pressure              6.1×10⁻⁵
    Systolic blood pressure               0.0008
    High-density lipoprotein              0.0009
    Type 2 diabetes mellitus              0.0106
  Rheumatoid arthritis                    2.9×10⁻⁵
  Bone mineral density and osteoporosis   0.0467
  All pulmonary traits                    2.8×10⁻⁶
    All COPD                              0.0295
    Moderate-to-severe COPD               0.0156
    Pulmonary function                    0.0044
  Crohn disease                           9.5×10⁻⁷
  Primary biliary cirrhosis               3.4×10⁻⁶
  Inflammatory bowel disease              3.5×10⁻⁵
  Ulcerative colitis                      9.8×10⁻⁵
  All cancer                              8.0×10⁻¹⁵
    Lung adenocarcinoma                   0.0015
    Colorectal cancer                     0.0014
Former vs never smoking
  CHD risk factors                        7.6×10⁻⁵
    Blood pressure/hypertension           5.8×10⁻⁵
    Diastolic blood pressure              0.0021
    Systolic blood pressure               0.0002
  Rheumatoid arthritis                    6.3×10⁻⁵
  All pulmonary traits                    0.0217
  Inflammatory bowel disease              5.2×10⁻⁶
  Crohn disease                           0.0064
  All cancer                              7.8×10⁻⁶

CHD indicates coronary heart disease; COPD, chronic obstructive pulmonary disease; and CpG, cytosine–phosphate–guanine.

Enrichment Analysis for Genomic Features

We examined the localization of the differentially methylated CpGs to different genomic regions, including CpG islands, gene bodies, known differentially methylated regions, and sites identified by the ENCODE project as likely to be functionally important, such as DNase I hypersensitive sites and enhancers (see the Methods section for details). We performed this analysis separately for the CpGs related to current smoking and those related to former smoking (Table XI in the Data Supplement). Trends were similar for the 2 sets of CpGs, although power to detect enrichment was much greater for the larger set of 18 760 CpGs related to current smoking. There was no enrichment for CpG islands. In contrast, significant enrichment was observed for island shores, gene bodies, DNase I hypersensitive sites, and enhancers.
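Enrichment for a genomic feature can be framed as a 2×2 table that cross-classifies CpGs as significant or not and as inside or outside the feature (for example, island shores). The sketch below computes an odds ratio with a Woolf (log-scale) 95% confidence interval. It is an illustrative assumption and not necessarily the test behind Table XI in the Data Supplement. Only the totals (18 760 significant CpGs among 485 381 analyzed) come from the text. The split by feature membership is hypothetical.

```python
import math


def feature_odds_ratio(sig_in, sig_out, other_in, other_out, z=1.96):
    """Odds ratio (with Woolf 95% CI) that a significant CpG lies in a genomic feature,
    relative to non-significant CpGs.

    sig_in / sig_out     -- significant CpGs inside / outside the feature
    other_in / other_out -- non-significant CpGs inside / outside the feature
    """
    if min(sig_in, sig_out, other_in, other_out) == 0:
        raise ValueError("zero cell: use an exact test or a continuity correction")
    log_or = math.log((sig_in * other_out) / (sig_out * other_in))
    se = math.sqrt(1 / sig_in + 1 / sig_out + 1 / other_in + 1 / other_out)
    return math.exp(log_or), (math.exp(log_or - z * se), math.exp(log_or + z * se))


# Totals from the text: 18 760 significant CpGs among 485 381 analyzed.
# The counts inside the feature (sig_in, other_in) are hypothetical.
sig_total, all_total = 18_760, 485_381
sig_in = 6_000
other_in = 120_000
odds_ratio, (lower, upper) = feature_odds_ratio(
    sig_in, sig_total - sig_in, other_in, (all_total - sig_total) - other_in
)
print(f"OR = {odds_ratio:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```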

Transcriptomic Integration

Of the 18 760 CpG sites significantly associated with current smoking in the meta-analysis, 1430 were associated in cis with the expression of 924 genes at FDR <0.05 in whole-blood samples from 2262 Framingham Heart Study participants (enrichment P=3.6×10⁻²¹⁵; Table XII in the Data Supplement). Of these, 424 CpGs associated with the expression of 285 genes were replicated at FDR <0.0001 in 1264 CD14+ monocyte samples from the Multi-Ethnic Study of Atherosclerosis (MESA).35 These genes are involved in pathways similar to those described earlier (Table XIII in the Data Supplement).
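Several results in this section are reported at a false discovery rate (FDR) threshold. Assuming the reported FDR values were obtained with the Benjamini–Hochberg step-up procedure (reference 25), they can be computed as in the minimal sketch below. The P values in the example are hypothetical, and the function name is ours.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P values (q values), returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices from smallest to largest P
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity:
    # q_(i) = min over j >= i of p_(j) * m / j, capped at 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical CpG-expression association P values.
pvals = [0.0004, 0.03, 0.2, 0.001, 0.04]
qvals = benjamini_hochberg(pvals)
significant = [q < 0.05 for q in qvals]
print(significant)
```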

Comparison Between AA and EA

Meta-analysis of current versus never smokers in 11 cohorts of EA participants (n=6750) yielded 10 977 CpGs annotated to 4940 genes at FDR <0.05. Meta-analysis of the smaller data set of 4 cohorts of AA participants (n=2639) yielded 3945 CpGs annotated to 2088 genes at FDR <0.05. For the 12 927 CpGs significant in at least one ancestry, the effect estimates were highly correlated between the 2 ancestry groups (Spearman ρ=0.89). Results by ancestry are shown in Table XIV in the Data Supplement.

We performed the same ancestry-stratified analysis for former versus never smokers (Table XV in the Data Supplement). Meta-analysis of EA participants yielded 2045 CpG sites annotated to 1081 genes at FDR <0.05, and meta-analysis of AA participants yielded 329 CpG sites annotated to 178 genes at FDR <0.05. For the union of CpGs significant in at least one ancestry (2234 CpGs), the effect estimates were correlated between the 2 ancestry groups (Spearman ρ=0.75). Of note, one of the CpG sites identified in the ancestry-stratified analysis, cg00706683 (annotated to ECEL1P2), had not returned to never-smoker levels 30 years after smoking cessation (Table 4).

To compare results by ancestry more directly, removing the power advantage of the larger EA sample, we performed a meta-analysis on a subset of EA cohorts (the Framingham Heart Study, Rotterdam Study, and KORA) chosen so that the total number of smokers, the major determinant of power, matched that of the AA cohorts. In this subset, the effect estimates showed correlations similar to those in the complete analyses (Spearman ρ=0.87 for current versus never smokers and 0.79 for former versus never smokers), suggesting that the difference in the number of statistically significant CpGs between ancestries reflects the greater power of the EA cohorts.
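The ancestry comparisons use Spearman correlation. This is the Pearson correlation of ranks, so it is insensitive to monotone differences in scale between the EA and AA effect estimates. A short sketch follows, in which tied values receive their average rank. The effect estimates in the example are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical effect estimates for the same CpGs in EA and AA meta-analyses.
ea = [-0.080, -0.030, 0.010, 0.025, -0.050, 0.004]
aa = [-0.060, -0.035, 0.012, 0.020, -0.020, 0.001]
print(f"Spearman rho = {spearman(ea, aa):.2f}")
```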

Cell Type Adjustment

In studies based on whole blood or on leukocytes from the buffy coat of whole blood, we adjusted our main analyses for white blood cell fractions, either measured directly or estimated with a published method.20 Reassuringly, results before and after cell type adjustment were highly comparable: the correlation of regression coefficients before and after adjustment was 0.85 for the current versus never-smoker analysis (Figure VII in the Data Supplement) and 0.93 for the former versus never-smoker analysis (Figure VIII in the Data Supplement). In addition, in 2 cohorts we had results from specific cell fractions: CD4+ cells in GOLDN and CD14+ cells in MESA. For former versus never smokers, the correlations between results from buffy coat and from CD4+ or CD14+ cells were generally high (ρ>0.74; Table XVI in the Data Supplement).
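The before-and-after comparison can be illustrated by fitting the same linear model for one CpG with and without a cell-fraction covariate. The sketch below uses simulated data. In the simulation, smoking shifts a single hypothetical cell fraction that itself affects methylation, so the unadjusted coefficient is biased toward zero and the adjusted coefficient recovers the simulated direct effect. The sketch assumes cell fractions are already available (measured or estimated) and does not implement the Houseman et al estimation method (reference 20). The variable names and effect sizes are illustrative assumptions.

```python
import numpy as np


def smoking_coefficient(methylation, smoking, covariates=None):
    """OLS coefficient for smoking in methylation ~ 1 + smoking [+ covariates]."""
    columns = [np.ones(len(smoking)), np.asarray(smoking, dtype=float)]
    if covariates is not None:
        cov = np.asarray(covariates, dtype=float)
        if cov.ndim == 1:  # a single covariate given as a 1-D array
            cov = cov[:, None]
        columns.extend(cov.T)  # one design-matrix column per covariate
    X = np.column_stack(columns)
    beta, *_ = np.linalg.lstsq(X, np.asarray(methylation, dtype=float), rcond=None)
    return beta[1]


# Simulated data (all values are assumptions for illustration).
rng = np.random.default_rng(0)
n = 2000
smoking = rng.integers(0, 2, size=n)                             # 1 = current smoker
cell_fraction = 0.60 + 0.05 * smoking + rng.normal(0, 0.05, n)   # smoking shifts this fraction
methylation = 0.50 - 0.03 * smoking + 0.40 * cell_fraction + rng.normal(0, 0.02, n)

unadjusted = smoking_coefficient(methylation, smoking)
adjusted = smoking_coefficient(methylation, smoking, covariates=cell_fraction)
print(f"unadjusted: {unadjusted:.4f}  adjusted: {adjusted:.4f}")
```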

Methylation Profile Across CpG Sites

We examined methylation profiles in the FHS as a representative cohort. The profile across all 485 381 analyzed CpG sites is shown in Figure IX in the Data Supplement, and the profile across the 18 760 CpG sites significantly associated with current versus never smoking is shown in Figure X in the Data Supplement. These plots indicate that CpG sites with a narrow dynamic range were largely not statistically significant in our analyses.
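Dynamic range can be summarized per CpG as the spread between low and high percentiles of its methylation β values across samples. The sketch below uses the 5th-to-95th percentile range. The percentile choice, the 0.05 cutoff for flagging narrow-range sites, and the CpG names are assumptions, not definitions taken from the study.

```python
from statistics import quantiles


def dynamic_range(betas, low_pct=5, high_pct=95):
    """Spread between two percentiles of one CpG's methylation beta values across samples."""
    cuts = quantiles(betas, n=100, method="inclusive")  # cuts[k - 1] is the k-th percentile
    return cuts[high_pct - 1] - cuts[low_pct - 1]


def narrow_range_cpgs(beta_by_cpg, threshold=0.05):
    """CpG IDs whose 5th-95th percentile range falls below a (hypothetical) threshold."""
    return [cpg for cpg, betas in beta_by_cpg.items() if dynamic_range(betas) < threshold]


# Hypothetical beta values for two CpGs across eight samples.
beta_by_cpg = {
    "cpg_variable": [0.42, 0.55, 0.61, 0.38, 0.70, 0.47, 0.66, 0.51],
    "cpg_flat": [0.91, 0.92, 0.90, 0.91, 0.92, 0.91, 0.90, 0.91],
}
print(narrow_range_cpgs(beta_by_cpg))
```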


We performed a genome-wide meta-analysis of blood-derived DNA methylation in 15 907 individuals across 16 cohorts and identified a broad epigenome-wide impact of cigarette smoking, with 18 760 statistically significant CpGs (FDR <0.05) annotated to >7000 genes, or roughly one third of known human genes. These genes in turn affect multiple molecular mechanisms and are implicated in smoking-related phenotypes and diseases. In addition to confirming previous findings from smaller studies, we detected >16 000 novel CpGs differentially methylated in relation to cigarette smoking. Many of the annotated genes have not previously been implicated in the biological effects of tobacco exposure. The large number of genes implicated in this well-powered meta-analysis might at first glance raise concerns about false positives. However, given the widespread impact of smoking on disease outcomes across many organ systems and across the life span,1 the identification of a large number of genes at genome-wide significance is not surprising. In addition, our findings were consistent across the 16 cohorts (Tables II and V in the Data Supplement), and we accounted for between-study variability by using random-effects meta-analysis, which is conservative when heterogeneity is present.36 The implicated genes are mainly involved in core molecular machinery, such as transcription and translation. Furthermore, differential methylation at a subset of CpGs persisted, often for decades, after smoking cessation.
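Random-effects meta-analysis adds a between-study variance, τ², to each cohort's sampling variance before inverse-variance weighting. This widens the pooled confidence interval when cohorts disagree. The sketch below uses the DerSimonian–Laird moment estimator of τ², a common choice. The estimator actually used for this meta-analysis is specified in the Methods section and may differ, and the per-cohort estimates in the example are hypothetical.

```python
import math


def dersimonian_laird(estimates, std_errors):
    """Random-effects pooled estimate, its standard error, and tau^2 (DerSimonian-Laird)."""
    k = len(estimates)
    if k < 2 or k != len(std_errors):
        raise ValueError("need at least 2 studies, each with a standard error")
    w = [1 / se**2 for se in std_errors]                   # fixed-effect (inverse-variance) weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))  # Cochran's Q
    c = sum_w - sum(wi**2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c)                     # between-study variance, floored at 0
    w_re = [1 / (se**2 + tau2) for se in std_errors]       # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    return pooled, math.sqrt(1 / sum(w_re)), tau2


# Hypothetical per-cohort effect estimates (methylation difference) and standard errors for one CpG.
betas = [-0.021, -0.018, -0.030, -0.012, -0.025]
ses = [0.004, 0.006, 0.008, 0.005, 0.007]
pooled, se, tau2 = dersimonian_laird(betas, ses)
print(f"pooled = {pooled:.4f}, SE = {se:.4f}, tau^2 = {tau2:.2e}")
```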

We found that genes differentially methylated in relation to smoking are enriched for variants associated in GWAS with smoking-related diseases,1 including osteoporosis, colorectal cancer, chronic obstructive pulmonary disease, pulmonary function, cardiovascular disease, and rheumatoid arthritis. The enrichment of smoking-associated CpGs in genes associated with rheumatoid arthritis is noteworthy because DNA methylation is one of the proposed molecular mechanisms underlying this disease.37 It is also interesting that the most significant association of smoking with methylation was for HIVEP3 (also known as Schnurri-3), the mammalian homolog of the Drosophila zinc finger adapter protein Shn.38 This gene regulates bone formation, an important determinant of osteoporosis, which was one of the enriched GWAS phenotypes.

When we examined time since smoking cessation, we found that most of the CpG sites differentially methylated in current versus never smokers returned to never-smoker levels within 5 years of cessation. This is consistent with the observation that the risks of many smoking-related diseases revert to nonsmoking levels within this period. Our results also indicate that cigarette smoking induces long-lasting alterations in DNA methylation at some CpGs. Although speculative, it is possible that persistent methylation changes at some loci contribute to the risks of conditions that remain elevated after smoking cessation.

In all but 2 of our 14 cohorts, DNA was extracted from the entire circulating leukocyte population. Thus, the results could be confounded by effects of smoking on leukocyte subtype proportions. When we adjusted for cell type, the results changed little.

Our significant results are highly enriched for CpG sites associated with the expression of nearby genes (ie, in cis). This is despite the fact that a single measurement of gene expression in blood is probably subject to considerably more within-subject variability than DNA methylation,39 which limits our ability to detect such correlations. Differential DNA methylation at many of the CpGs we identified in relation to smoking status may therefore affect nearby gene expression. Our analysis of genomic regions further supports a potential functional impact: we observed enrichment for regions of likely functional importance, such as island shores, gene bodies, DNase I hypersensitive sites, and enhancers, but no enrichment for CpG islands. These results reinforce previous findings that island shores, enhancers, and DNase I hypersensitive sites are more dynamic (ie, more susceptible to methylation changes) than CpG islands,40 which may be more resistant to abrupt changes in DNA methylation in response to environmental exposures.41 Thus, our results suggest that many of the smoking-associated CpG sites may have regulatory effects.

Changes in methylation patterns may suggest mechanisms by which exposure to tobacco smoke affects several disease processes, but DNA methylation profiles can also serve as biomarkers of exposure to tobacco smoke. Cotinine is a biomarker only of recent smoking, whereas DNA methylation signals have the potential to serve as robust biomarkers of smoking history.9,42 Indeed, several such markers have already been identified.5,42,43 The large number of persistently modified CpGs may be useful for developing even more robust biomarkers that objectively quantify long-term exposure to cigarette smoke. Such biomarkers could support prediction of risk for health outcomes in settings where smoking history is unavailable or incomplete, and could help validate self-reported never-smoker status. Furthermore, our analyses of both former and current smokers show dose-dependent effects at many CpGs (Tables III and VII in the Data Supplement). Methylation-based biomarkers could therefore be informative for investigating dose–response relationships with disease end points. This matters because smokers often underreport the amount they smoke, both currently and historically.
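One common way to combine several smoking-associated CpGs into a single exposure score is a weighted sum of standardized methylation values. The sketch below is illustrative only. The CpG identifiers and weights are hypothetical placeholders, not markers or coefficients from this study. In practice, weights would be estimated in a training data set and the standardization reference fixed in advance.

```python
from statistics import mean, stdev


def methylation_smoking_score(samples, weights):
    """Weighted sum of per-CpG z-scores for each sample.

    samples -- list of dicts mapping CpG ID -> methylation beta value
    weights -- dict mapping CpG ID -> weight (hypothetical; would come from a training set)
    Standardization here uses the same samples; a deployed score would fix reference means/SDs.
    """
    reference = {}
    for cpg in weights:
        values = [sample[cpg] for sample in samples]
        # stdev needs >= 2 samples; a CpG with zero spread would cause division by zero below.
        reference[cpg] = (mean(values), stdev(values))
    return [
        sum(w * (sample[cpg] - reference[cpg][0]) / reference[cpg][1] for cpg, w in weights.items())
        for sample in samples
    ]


# Hypothetical CpGs: one loses methylation with smoking (negative weight), one gains it.
weights = {"cpg_hypo": -1.0, "cpg_hyper": 0.5}
samples = [
    {"cpg_hypo": 0.88, "cpg_hyper": 0.30},  # pattern resembling a never smoker
    {"cpg_hypo": 0.80, "cpg_hyper": 0.33},
    {"cpg_hypo": 0.65, "cpg_hyper": 0.38},  # pattern resembling a current smoker
]
print([round(s, 2) for s in methylation_smoking_score(samples, weights)])
```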

It is possible that smoking-related conditions or correlated exposures contribute to some of the methylation signatures identified. However, nearly all of our studies are population based and composed predominantly of healthy individuals not selected for smoking-related disease. Given the number, strength, and reproducibility of findings for smoking across the literature and among our diverse cohorts from several countries, the likelihood that these associations are confounded by other exposures or conditions related to smoking is greatly reduced.

Our study has several potential limitations. First, the cross-sectional design limits our ability to study the time course of smoking effects. Second, we analyzed methylation in DNA from blood, which is readily accessible. Although blood-derived DNA reveals a strong and robust signature of cigarette smoking, studies in target tissues for smoking-related diseases (eg, heart and lung) would be of additional interest. Third, our analyses could not distinguish direct effects of smoking from indirect effects mediated by smoking-induced changes in cell metabolism, organ function, inflammation, or injury that could in turn influence methylation. Nonetheless, with 16 contributing studies from different countries, this is the largest examination to date of the effects of smoking on DNA methylation.

In conclusion, we identified an order of magnitude more sites across the genome differentially methylated in relation to smoking than had previously been reported. Many of these signals persist long after smoking cessation, providing potential biomarkers of smoking history. These findings may provide new insights into the molecular mechanisms underlying the protean effects of smoking on human health and disease.


We would like to thank Bonnie R. Joubert, PhD, and Frank Day, PhD, of the National Institute of Environmental Health Sciences (Research Triangle Park, NC) and Jianping Jin, PhD, of Westat (Durham, NC) for expert computational assistance. Additional Acknowledgments can be found in the Data Supplement.


From the Institute for Aging Research, Hebrew SeniorLife (R.J., D.P.K.), Department of Medicine, Beth Israel Deaconess Medical Center (R.J., D.P.K.), Channing Division of Network Medicine, Brigham and Women’s Hospital (D.L.D.), and Department of Psychiatry (K.J.R.), Harvard Medical School, Boston, MA; Population Sciences Branch, National Heart, Lung, and Blood Institute (R.J., T.H., C.L., M.M.M., C.Y., D.L.) and Laboratory of Neurogenetics, National Institute on Aging (D.G.H., A.B.S.), National Institutes of Health, Bethesda, MD; Framingham Heart Study, MA (R.J., T.H., C.L., M.M.M., C.Y., D.L.); Department of Preventive Medicine, Icahn School of Medicine at Mount Sinai, New York, NY (A.C.J.); Centre for Cognitive Ageing and Cognitive Epidemiology (R.E.M., P.M.V., J.M.S., I.J.D.), Centre for Genomic and Experimental Medicine, Institute of Genetics and Molecular Medicine (R.E.M.), Alzheimer Scotland Dementia Research Centre (J.M.S.), and Department of Psychology, University of Edinburgh (I.J.D.), United Kingdom; Queensland Brain Institute (R.E.M., R.H.S., A.F.M., P.M.V., N.R.W.) and University of Queensland Diamantina Institute, Translational Research Institute (A.F.M., P.M.V.), University of Queensland, Brisbane, Australia; Epidemiology and Public Health Group, Institute of Biomedical and Clinical Science, University of Exeter Medical School, United Kingdom (L.C.P., D.M.); Department of Epidemiology and Prevention, Division of Public Health Sciences (L.M.R., C.J.R., Y.L.), Department of Biostatistical Sciences, Division of Public Health Sciences (K.L.), and Department of Internal Medicine (J.D.), Wake Forest School of Medicine, Winston-Salem, NC; Department of Internal Medicine (P.R.M., A.G.U., J.B.J.v.M.), Department of Clinical Chemistry (P.R.M.), and Department of Epidemiology (A.H.), Erasmus University Medical Center, Rotterdam, The Netherlands; Division of Biostatistics (W.G.) and Division of Epidemiology and Community Health (J.S.P.), School of Public Health, University of Minnesota, Minneapolis; Research Unit of Molecular Epidemiology, Institute of Epidemiology II, Helmholtz Zentrum Muenchen, Munich, Germany (T.X., S.K., A.P., R.W.-S., M.W.); MRC Epidemiology Unit, Institute of Metabolic Science, University of Cambridge, United Kingdom (C.E.E., N.J.W., K.K.O.); Department of Epidemiology, University of Alabama at Birmingham (S.A., J.S., M.R.I., D.K.A.); Autonomous Metropolitan University-Iztapalapa, Mexico City, Mexico (H.M.-M.); International Agency for Research on Cancer, Lyon, France (H.M.-M., S.A., Z.H., I.R.); Department of Epidemiology, School of Public Health (J.A.S., W.Z., E.B.W., S.L.R.K.) and Research Center for Group Dynamics, Institute for Social Research (E.B.W.), University of Michigan, Ann Arbor; Cardiovascular Health Research Unit, Departments of Medicine, Epidemiology, and Health Services (J.A.B., B.M.P., B.R.S.), Center for Lung Biology, Division of Pulmonary and Critical Care Medicine, Department of Medicine (S.A.G.) and Cardiovascular Health Research Unit, Division of Cardiology, Department of Epidemiology (N.S.), University of Washington, Seattle; Department of Environmental Health, Rollins School of Public Health, Emory University, Atlanta, GA (R.D.); School of Public Health, University of California, Berkeley (P.Y., E.W.D.); HudsonAlpha Institute for Biotechnology, Huntsville, AL (D.M.A.); Clinical Research Branch, National Institute on Aging, Baltimore, MD (L.F.); Human Genetics Center, School of Public Health (J.B., M.L.G., M.F.) and School of Biomedical Informatics (D.Z.), The University of Texas Health Science Center at Houston; Department of Cardiology, Boston Children’s Hospital, Boston, MA (M.M.M.); Division of Cancer Epidemiology, German Cancer Research Center (DKFZ) Heidelberg (M.B.); MRC/PHE Centre for Environment and Health, School of Public Health, Imperial College London, United Kingdom (P.V.); HuGeF Foundation, Torino, Italy (P.V.); Department of Epidemiology (J.S., A.A.B.) and Department of Environmental Health (A.A.B.), Harvard T.H. Chan School of Public Health, Boston, MA; Department of Preventive Medicine and the Robert H. Lurie Comprehensive Cancer Center, Feinberg School of Medicine, Northwestern University, Chicago, IL (L.H.); VA Normative Aging Study, VA Boston Healthcare System & Department of Medicine, Boston University School of Medicine, Boston, MA (P.S.V.); Geriatric Unit, Azienda Sanitaria di Firenze, Florence, Italy (S.B.); Division of Nephrology & Hypertension, Mayo Clinic, Rochester, MN (S.T.T.); Department of Psychiatry and Behavioral Sciences (E.B.B., A.K.S., K.J.R.); Department of Human Genetics, Emory University School of Medicine, Atlanta, GA (K.N.C.); Department of Translational Research in Psychiatry, Max-Planck Institute of Psychiatry, Munich, Germany (T.K., E.B.B.); Division of Depression & Anxiety Disorders, McLean Hospital, Belmont, MA (T.K., K.J.R.); Group Health Research Institute, Group Health Cooperative, Seattle, WA (B.M.P.); Institute for Translational Genomics & Population Sciences, Los Angeles BioMedical Research Institute (K.D.T.), Division of Genomic Outcomes, Department of Pediatrics, Harbor-UCLA Medical Center, Torrance (K.D.T.); Departments of Pediatrics, Medicine, and Human Genetics, UCLA, Los Angeles, CA (K.D.T.); Harvard School of Public Health (L.L.); Boston University School of Medicine (G.T.O.); and Epidemiology Branch, Department of Health and Human Services, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, NC (S.J.L.).


Guest Editor for this article was Christopher Semsarian, MBBS, PhD, MPH.

*Drs Joehanes, Just, Marioni, Pilling, Reynolds, Guan, Xu, Elks, Aslibekyan, Moreno-Macias, J.A. Smith, Brody, Dhingra, and P.R. Mandaviya contributed equally as first authors.

†Drs Conneely, Sotoodehnia, Kardia, Melzer, Baccarelli, van Meurs, Romieu, Arnett, Ong, Y. Liu, Waldenberger, Deary, Fornage, Levy, and London contributed equally as senior authors.

The Data Supplement is available at

Correspondence to Stephanie J. London, MD, DrPH, Epidemiology Branch, Department of Health and Human Services, National Institute of Environmental Health Sciences, National Institutes of Health, PO Box 12233, Room A306, Research Triangle Park, NC 27709. E-mail


1. National Center for Chronic Disease Prevention and Health Promotion (US) Office on Smoking and Health. The Health Consequences of Smoking—50 Years of Progress: A Report of the Surgeon General. Atlanta, GA: Centers for Disease Control and Prevention (US); 2014.
2. World Health Organization. WHO global report on trends in prevalence of tobacco smoking 2015.
3. Szarc vel Szic K, Declerck K, Vidaković M, Vanden Berghe W. From inflammaging to healthy aging by dietary lifestyle choices: is epigenetics the key to personalized nutrition? Clin Epigenetics. 2015; 7:33. doi: 10.1186/s13148-015-0068-2.
4. Breitling LP, Yang R, Korn B, Burwinkel B, Brenner H. Tobacco-smoking-related differential DNA methylation: 27K discovery and replication. Am J Hum Genet. 2011; 88:450–457. doi: 10.1016/j.ajhg.2011.03.003.
5. Breitling LP, Salzmann K, Rothenbacher D, Burwinkel B, Brenner H. Smoking, F2RL3 methylation, and prognosis in stable coronary heart disease. Eur Heart J. 2012; 33:2841–2848. doi: 10.1093/eurheartj/ehs091.
6. Wan ES, Qiu W, Baccarelli A, Carey VJ, Bacherman H, Rennard SI, et al. Cigarette smoking behaviors and time since quitting are associated with differential DNA methylation across the human genome. Hum Mol Genet. 2012; 21:3073–3082. doi: 10.1093/hmg/dds135.
7. Wan ES, Qiu W, Carey VJ, Morrow J, Bacherman H, Foreman MG, et al. Smoking-associated site-specific differential methylation in buccal mucosa in the COPDGene study. Am J Respir Cell Mol Biol. 2015; 53:246–254. doi: 10.1165/rcmb.2014-0103OC.
8. Zeilinger S, Kühnel B, Klopp N, Baurecht H, Kleinschmidt A, Gieger C, et al. Tobacco smoking leads to extensive genome-wide changes in DNA methylation. PLoS One. 2013; 8:e63812. doi: 10.1371/journal.pone.0063812.
9. Shenker NS, Ueland PM, Polidoro S, van Veldhoven K, Ricceri F, Brown R, et al. DNA methylation as a long-term biomarker of exposure to tobacco smoke. Epidemiology. 2013; 24:712–716. doi: 10.1097/EDE.0b013e31829d5cb3.
10. Shenker NS, Polidoro S, van Veldhoven K, Sacerdote C, Ricceri F, Birrell MA, et al. Epigenome-wide association study in the European Prospective Investigation into Cancer and Nutrition (EPIC-Turin) identifies novel genetic loci associated with smoking. Hum Mol Genet. 2013; 22:843–851. doi: 10.1093/hmg/dds488.
11. Guida F, Sandanger TM, Castagné R, Campanella G, Polidoro S, Palli D, et al. Dynamics of smoking-induced genome-wide methylation changes with time since smoking cessation. Hum Mol Genet. 2015; 24:2349–2359. doi: 10.1093/hmg/ddu751.
12. Qiu W, Wan E, Morrow J, Cho MH, Crapo JD, Silverman EK, et al. The impact of genetic variation and cigarette smoke on DNA methylation in current and former smokers from the COPDGene study. Epigenetics. 2015; 10:1064–1073. doi: 10.1080/15592294.2015.1106672.
13. Gao X, Jia M, Zhang Y, Breitling LP, Brenner H. DNA methylation changes of whole blood cells in response to active smoking exposure in adults: a systematic review of DNA methylation studies. Clin Epigenetics. 2015; 7:113. doi: 10.1186/s13148-015-0148-3.
14. Shah S, Bonder MJ, Marioni RE, Zhu Z, McRae AF, Zhernakova A, et al; BIOS Consortium. Improving phenotypic prediction by combining genetic and epigenetic associations. Am J Hum Genet. 2015; 97:75–85. doi: 10.1016/j.ajhg.2015.05.014.
15. Beane J, Sebastiani P, Liu G, Brody JS, Lenburg ME, Spira A. Reversible and permanent effects of tobacco smoke exposure on airway epithelial gene expression. Genome Biol. 2007; 8:R201. doi: 10.1186/gb-2007-8-9-r201.
16. Wauters E, Janssens W, Vansteenkiste J, Decaluwé H, Heulens N, Thienpont B, et al. DNA methylation profiling of non-small cell lung cancer reveals a COPD-driven immune-related signature. Thorax. 2015; 70:1113–1122. doi: 10.1136/thoraxjnl-2015-207288.
17. Garg AX, Hackam D, Tonelli M. Systematic review and meta-analysis: when one study is just not enough. Clin J Am Soc Nephrol. 2008; 3:253–260. doi: 10.2215/CJN.01430307.
18. Pidsley R, Y Wong CC, Volta M, Lunnon K, Mill J, Schalkwyk LC. A data-driven approach to preprocessing Illumina 450K methylation array data. BMC Genomics. 2013; 14:293. doi: 10.1186/1471-2164-14-293.
19. Teschendorff AE, Marabita F, Lechner M, Bartlett T, Tegner J, Gomez-Cabrero D, et al. A beta-mixture quantile normalization method for correcting probe design bias in Illumina Infinium 450 k DNA methylation data. Bioinformatics. 2013; 29:189–196. doi: 10.1093/bioinformatics/bts680.
20. Houseman EA, Accomando WP, Koestler DC, Christensen BC, Marsit CJ, Nelson HH, et al. DNA methylation arrays as surrogate measures of cell mixture distribution. BMC Bioinformatics. 2012; 13:86. doi: 10.1186/1471-2105-13-86.
21. Dick KJ, Nelson CP, Tsaprouni L, Sandling JK, Aïssi D, Wahl S, et al. DNA methylation and body-mass index: a genome-wide analysis. Lancet. 2014; 383:1990–1998. doi: 10.1016/S0140-6736(13)62674-4.
22. Bates D, Mächler M, Bolker B, Walker S. Fitting linear mixed-effects models using lme4. J Stat Softw. 2015; 67:1–48.
23. R Development Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Development Core Team; 2010.
24. Viechtbauer W. Conducting meta-analyses in R with the metafor package. J Stat Softw. 2010; 36:1–48.
25. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Series B Methodol. 1995; 57:289–300. doi: 10.2307/2346101.
26. Joubert BR, Felix JF, Yousefi P, Bakulski KM, Just AC, Breton C, et al. DNA methylation in newborns and maternal smoking in pregnancy: genome-wide consortium meta-analysis. Am J Hum Genet. 2016; 98:680–696. doi: 10.1016/j.ajhg.2016.02.019.
27. Su AI, Wiltshire T, Batalov S, Lapp H, Ching KA, Block D, et al. A gene atlas of the mouse and human protein-encoding transcriptomes. Proc Natl Acad Sci USA. 2004; 101:6062–6067. doi: 10.1073/pnas.0400782101.
28. Hindorff LA, Sethupathy P, Junkins HA, Ramos EM, Mehta JP, Collins FS, et al. Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc Natl Acad Sci USA. 2009; 106:9362–9367. doi: 10.1073/pnas.0903103106.
29. Vazquez AI, Bates DM, Rosa GJ, Gianola D, Weigel KA. Technical note: an R package for fitting generalized linear mixed models in animal breeding. J Anim Sci. 2010; 88:497–504. doi: 10.2527/jas.2009-1952.
30. Devlin B, Roeder K. Genomic control for association studies. Biometrics. 1999; 55:997–1004.
31. Martey CA, Baglole CJ, Gasiewicz TA, Sime PJ, Phipps RP. The aryl hydrocarbon receptor is a regulator of cigarette smoke induction of the cyclooxygenase and prostaglandin pathways in human lung fibroblasts. Am J Physiol Lung Cell Mol Physiol. 2005; 289:L391–L399. doi: 10.1152/ajplung.00062.2005.
32. Teschendorff AE, Yang Z, Wong A, Pipinikas CP, Jiao Y, Jones A, et al. Correlation of smoking-associated DNA methylation changes in buccal cells with DNA methylation changes in epithelial cancer. JAMA Oncol. 2015; 1:476–485. doi: 10.1001/jamaoncol.2015.1053.
33. Campesi I, Carru C, Zinellu A, Occhioni S, Sanna M, Palermo M, et al. Regular cigarette smoking influences the transsulfuration pathway, endothelial function, and inflammation biomarkers in a sex-gender specific manner in healthy young humans. Am J Transl Res. 2013; 5:497–509.
34. Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci USA. 2005; 102:15545–15550. doi: 10.1073/pnas.0506580102.
35. Liu Y, Ding J, Reynolds LM, Lohman K, Register TC, De La Fuente A, et al. Methylomics of gene expression in human monocytes. Hum Mol Genet. 2013; 22:5065–5074. doi: 10.1093/hmg/ddt356.
36. Han B, Eskin E. Random-effects model aimed at discovering associations in meta-analysis of genome-wide association studies. Am J Hum Genet. 2011; 88:586–598. doi: 10.1016/j.ajhg.2011.04.014.
37. Liu Y, Aryee MJ, Padyukov L, Fallin MD, Hesselberg E, Runarsson A, et al. Epigenome-wide association data implicate DNA methylation as an intermediary of genetic risk in rheumatoid arthritis. Nat Biotechnol. 2013; 31:142–147. doi: 10.1038/nbt.2487.
38. Jones DC, Wein MN, Oukka M, Hofstaetter JG, Glimcher MJ, Glimcher LH. Regulation of adult bone mass by the zinc finger adapter protein Schnurri-3. Science. 2006; 312:1223–1227. doi: 10.1126/science.1126313.
39. Suderman M, Pappas JJ, Borghol N, Buxton JL, McArdle WL, Ring SM, et al. Lymphoblastoid cell lines reveal associations of adult DNA methylation with childhood and current adversity that are distinct from whole blood associations. Int J Epidemiol. 2015; 44:1331–1340. doi: 10.1093/ije/dyv168.
40. Ziller MJ, Gu H, Müller F, Donaghey J, Tsai LT, Kohlbacher O, et al. Charting a dynamic DNA methylation landscape of the human genome. Nature. 2013; 500:477–481. doi: 10.1038/nature12433.
41. Ivanova E, Chen JH, Segonds-Pichon A, Ozanne SE, Kelsey G. DNA methylation at differentially methylated regions of imprinted genes is resistant to developmental programming by maternal nutrition. Epigenetics. 2012; 7:1200–1210. doi: 10.4161/epi.22141.
42. Zhang Y, Schöttker B, Florath I, Stock C, Butterbach K, Holleczek B, et al. Smoking-associated DNA methylation biomarkers and their predictive value for all-cause and cardiovascular mortality. Environ Health Perspect. 2016; 124:67–74. doi: 10.1289/ehp.1409020.
43. Zhang Y, Yang R, Burwinkel B, Breitling LP, Brenner H. F2RL3 methylation as a biomarker of current and lifetime smoking exposures. Environ Health Perspect. 2014; 122:131–137. doi: 10.1289/ehp.1306937.


We combined data from 16 cohorts (15 907 individuals) that examined genome-wide methylation, a type of epigenetic modification, in blood DNA in relation to smoking status. In this large-scale meta-analysis, thousands of DNA methylation cytosine–phosphate–guanine sites were associated with current versus never-smoking status. These methylation signals reside in genes that are associated with numerous diseases caused by cigarette smoking, such as cardiovascular diseases and certain cancers. Of the thousands of cytosine–phosphate–guanine sites differentially methylated in current versus never smokers, >10% were also significantly associated with former versus never-smoking status. Although many of these former-smoker methylation signals return to never-smoker levels within 5 years of quitting, a substantial proportion remain differentially methylated even after 30 years of cessation. We also found widespread evidence that many differentially methylated sites are related to gene expression, suggesting a functional impact on the genome. Furthermore, in our analyses, these smoking-related DNA methylation signals affect genes important to fundamental molecular pathways, such as signal transduction, protein metabolic processes, and transcription. In conclusion, cigarette smoking has a widespread and long-lasting impact on DNA methylation, which is one potential mechanism by which tobacco exposure predisposes to numerous adverse health outcomes.

