Identification of 64 Novel Genetic Loci Provides an Expanded View on the Genetic Architecture of Coronary Artery Disease
Coronary artery disease (CAD) is a complex phenotype driven by genetic and environmental factors. Ninety-seven genetic risk loci have been identified to date, but the identification of additional susceptibility loci might be important to enhance our understanding of the genetic architecture of CAD.
To expand the number of genome-wide significant loci, catalog functional insights, and enhance our understanding of the genetic architecture of CAD.
Methods and Results:
We performed a genome-wide association study in 34 541 CAD cases and 261 984 controls of UK Biobank resource followed by replication in 88 192 cases and 162 544 controls from CARDIoGRAMplusC4D. We identified 75 loci that replicated and were genome-wide significant (P<5×10−8) in meta-analysis, 13 of which had not been reported previously. Next, to further identify novel loci, we identified all promising (P<0.0001) loci in the CARDIoGRAMplusC4D data and performed reciprocal replication and meta-analyses with UK Biobank. This led to the identification of 21 additional novel loci reaching genome-wide significance (P<5×10−8) in meta-analysis. Finally, we performed a genome-wide meta-analysis of all available data revealing 30 additional novel loci (P<5×10−8) without further replication. The increase in sample size by UK Biobank raised the number of reconstituted gene sets from 4.2% to 13.9% of all gene sets to be involved in CAD. For the 64 novel loci, 155 candidate causal genes were prioritized, many without an obvious connection to CAD. Fine mapping of the 161 CAD loci generated lists of credible sets of single causal variants and genes for functional follow-up. Genetic risk variants of CAD were linked to development of atrial fibrillation, heart failure, and death.
We identified 64 novel genetic risk loci for CAD and performed fine mapping of all 161 risk loci to obtain a credible set of causal variants. The large expansion of reconstituted gene sets argues in favor of an expanded omnigenic model view on the genetic architecture of CAD.
Coronary artery disease (CAD) is the predominant cause of ischemic heart disease often leading to myocardial infarction and a leading cause of death. Globally, deaths because of ischemic heart disease increased by 16.6% from 2005 to 2015 to 8.9 million deaths. However, the age-standardized mortality rates are decreasing (fell by 12.8%)1 because of preventive and treatment strategies established on evolving knowledge of the underlying pathophysiology of CAD.
CAD is a complex disease, resulting from numerous additive and interacting contributions in an individual’s environment and lifestyle in combination with their underlying genetic architecture. Since the first genome-wide association studies (GWAS) for CAD in 2007,2–4 multiple additional studies with progressively larger sample sizes identified 97 genome-wide significant genetic loci associated with CAD5–10 at the time of analysis. The continuous effort to identify additional loci associated with CAD and share these early with the scientific community is important, especially to enhance our understanding of the biological underpinnings of CAD and to catalyze the development of drugs. A comprehensive understanding of the genetic architecture of CAD is also essential to enable precision medicine approaches by identifying subgroups of patients at increased risk of CAD or its complications and might identify those with a specific driving pathophysiology in whom a particular therapeutic or preventive approach would be most useful.11
To further our knowledge of the genetic architecture of CAD, we performed a de novo GWAS of the UK Biobank resource and meta-analyses with CARDIoGRAMplusC4D data. Our approach led to the identification of 64 novel loci associated with CAD, expanding the grand total to 161. These loci were interrogated using bioinformatic approaches to catalog and interpret the potential biological relevance of our findings. We also performed network and gene-set analyses and proposed the omnigenic model to explain our findings. This expanding resource is now available for other investigators to help to further elucidate the underlying biology and relevance.
The data that support the findings of this study are available from the corresponding author on reasonable request. The de novo GWAS analysis and meta-analysis have been posted on Mendeley (doi:10.17632/2zdd47c94h.1; doi:10.17632/gbbsrpx6bs.1). A summary of the methods is provided below, and a more detailed description of the experimental procedures is provided in the Online Data Supplement.
Study Design and Samples
The study design consisted of a reciprocal 2-stage sequential discovery and replication approach (Online Figure I), which provided the most robust statistical evidence, followed by an overall meta-analysis of all available data, for which no replication data were available in this study. First, using the UK Biobank resource, we conducted a GWAS to discover single-nucleotide polymorphisms (SNPs) associated with CAD. In stage 2, we took forward all promising SNPs reaching nominal significance (P<0.0001) for replication in CARDIoGRAMplusC4D data. Replicating SNPs (P<0.05 after Bonferroni adjustment) were meta-analyzed and considered true when surpassing the genome-wide significance threshold (P<5×10−8). The reciprocal stage 1 entailed the identification of all promising SNPs (P<0.0001) in CARDIoGRAMplusC4D and replication in UK Biobank (P<0.05 after Bonferroni adjustment) followed by meta-analysis. Again, SNPs replicating and surpassing the genome-wide significance threshold were considered true. A sentinel SNP in a locus was defined as the most significant variant in a 1-Mb region that was independent from other sentinel SNPs (r2<0.1). A locus was defined as a region of 1 Mb at either side of the sentinel SNP. A locus was considered novel if the sentinel SNP was not within a 1-Mb window (at either side) of earlier reported genome-wide significant SNPs (Online Table I). Finally, we performed a genome-wide meta-analysis of the UK Biobank resource and CARDIoGRAMplusC4D to identify additional CAD-associated loci (P<5×10−8 in meta-analysis). A potential sample overlap between the UK Biobank and cohorts of CARDIoGRAMplusC4D was estimated to be <0.1%; no evidence was found that this biased the test statistics (Online Data Supplement).
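The decision rules in this design can be written down compactly. The following is a minimal sketch, not the authors' pipeline: the `Snp` record, the function names, and the choice to divide 0.05 by the number of SNPs taken forward (`n_tests`) as the Bonferroni adjustment are assumptions made for illustration; only the thresholds (P<0.0001, P<5×10−8, and the 1-Mb window) come from the text.

```python
from dataclasses import dataclass

PROMISING_P = 1e-4            # "promising" discovery threshold from the text
GENOME_WIDE_P = 5e-8          # genome-wide significance threshold from the text
LOCUS_WINDOW_BP = 1_000_000   # 1 Mb at either side of the sentinel SNP


@dataclass
class Snp:
    """Hypothetical record holding the P values of one sentinel SNP."""
    rsid: str
    chrom: str
    pos: int              # base-pair position
    p_discovery: float
    p_replication: float
    p_meta: float


def is_replicated(snp, n_tests):
    """Promising in discovery, Bonferroni-significant in replication,
    and genome-wide significant in the meta-analysis."""
    return (
        snp.p_discovery < PROMISING_P
        and snp.p_replication < 0.05 / n_tests  # assumed form of the adjustment
        and snp.p_meta < GENOME_WIDE_P
    )


def is_novel(snp, known_positions):
    """True if no previously reported SNP lies within 1 Mb on the same chromosome.

    `known_positions` is an iterable of (chromosome, position) pairs.
    """
    return all(
        chrom != snp.chrom or abs(pos - snp.pos) > LOCUS_WINDOW_BP
        for chrom, pos in known_positions
    )
```

A sentinel SNP would count toward the replicated novel loci only if both `is_replicated` and `is_novel` return `True`. The independence criterion (r2<0.1) needs linkage disequilibrium data and is left out of this sketch.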
Candidate Genes and Insights in Biology
Candidate causal genes at each of the loci were prioritized based on proximity, expression quantitative trait locus (eQTL) data, DEPICT analyses (Data-Driven Expression-Prioritized Integration for Complex Traits),12 and long-range chromatin interactions of variants with gene promoters (Online Data Supplement).8,13 Summary information of genes was obtained via queries in GeneCards, EntrezGene, UniProt, and Tocris. The Mouse Genomic Informatics database was used for obtaining insights into mammalian phenotypes associated with disruption of candidate genes. DEPICT was also used to test for enrichment of gene sets and identify relevant tissues and cell types. Ingenuity pathway analysis (June 2017 release) was performed to strengthen the biological relevancy of the novel loci.
Insights in Loci by Associations With Other Phenotypes
The GWAS catalog was queried and a phenome scan was performed by intersecting the identified loci with the GWAS catalog and by testing the association of the newly identified SNPs with a wide range of phenotypes using linear or logistic regression analysis in UK Biobank (Online Data Supplement). Genetic risk scores (GRS) were constructed using effect estimates obtained from the CARDIoGRAMplusC4D data as described previously.8 Multivariable Cox proportional hazards models were fitted for quintiles of the GRS in the UK Biobank resource, to assess the extent to which the GRS could predict new-onset atrial fibrillation/flutter and heart failure.
Regulatory DNA and Fine Mapping of Probable Causal Variants
To systematically characterize the functional, cellular, and regulatory contribution of genetic variation, we used GARFIELD,14 analyzing the enrichment of genome-wide association summary statistics in tissue-specific functional elements at given significance thresholds. Probabilistic Annotation Integrator was used to fine-map loci by integrating genetic association signal strength with genomic functional annotation data.15 We explored the potential target genes of these candidate causal variants by determining their direct effects on protein function (missense variants) and by evaluating evidence connecting a causal variant in a 3′ untranslated region (3′ UTR) to gene expression (eQTL) or physical interactions (Hi-C) with the promoter of an eQTL gene. Potential causal mechanisms of the candidate causal variants were determined based on (1) missense variation, (2) chromatin interaction between the causal variant and the promoter of a gene for which the causal variant was also significantly associated with gene expression by eQTL analyses, or (3) variants overlapping a 3′ UTR that were also significantly associated with expression of the gene to which that 3′ UTR belongs. In addition, for genes/mechanisms to be prioritized by eQTL analyses and chromatin interactions or 3′ UTR overlap, the respective causal variant was required to be in an enhancer region.
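Fine mapping of this kind is usually summarized as a credible set per locus. The sketch below shows the generic construction, assuming per-variant posterior probabilities are already available. It is not the algorithm of Probabilistic Annotation Integrator itself, and the function name and input format are hypothetical.

```python
def credible_set(posteriors, level=0.95):
    """Smallest set of variants whose summed posterior probability reaches `level`.

    `posteriors` maps a variant ID to its posterior probability of being
    causal; the values are assumed to sum to about 1 within one locus.
    """
    ranked = sorted(posteriors.items(), key=lambda kv: kv[1], reverse=True)
    chosen, total = [], 0.0
    for variant, prob in ranked:
        chosen.append(variant)
        total += prob
        if total >= level:
            break
    return chosen
```

When one variant alone carries at least 95% of the posterior probability, the credible set contains that single variant.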
Genome-Wide Analyses of 34 541 Cases and 261 984 Controls
The stage 1 GWAS analysis in UK Biobank (34 541 cases and 261 984 controls; Online Table II) of 7 947 838 SNPs revealed 630 suggestive SNPs (P<0.0001) in 442 loci (Online Table III). Eighty-six independent SNPs in 75 loci both replicated (P<0.05 Bonferroni adjusted) in stage 2 in ≤88 192 cases and 162 544 controls of CARDIoGRAMplusC4D, and achieved genome-wide significance (P<5×10−8) with no evidence of heterogeneity of effects (Phet≥0.10). Thirteen of the 75 loci are not established CAD-associated loci (Table 1).
| Cytoband | Position | Lead SNP | A1 | A2 | Freq | Variant Function | Candidate Genes | Resource | OR (95% CI) | P Value |
|---|---|---|---|---|---|---|---|---|---|---|
| 10q23.1 | 82251514 | rs17680741 | T | C | 0.72 | Intronic | TSPAN14,*†§ MAT1A,† FAM213A‡ | UK | 1.05 (1.03–1.06) | 2.3×10−11 |
Next, we reanalyzed the data from the MetaboChip meta-analysis of CARDIoGRAMplusC4D,9 the CARDIoGRAMplusC4D 1000 Genomes meta-analysis,7 and the CARDIoGRAM Exome array data16 to identify the promising SNPs (P<0.0001). We identified 568 promising SNPs located in 375 loci (Online Table IV). One hundred and thirteen independent SNPs in 96 loci both replicated (P<0.05 Bonferroni adjusted) in stage 2, UK Biobank, and achieved genome-wide significance in meta-analysis (P<5×10−8), including 21 additional novel loci (Table 1; Online Table V).
Finally, we performed a meta-analysis of CARDIoGRAMplusC4D9 and the CARDIoGRAMplusC4D 1000 Genomes meta-analysis7 with UK Biobank and identified 30 additional loci for which no replication test was available (Table 1; Online Table VI), increasing the total number of genome-wide significant CAD loci to 161 (Online Figure II). The novel variants were common (>5%, except for 1, rs112635299 near SERPINA1). Online Figure III shows the regional association plot of each novel locus. For some variants, a dominant or recessive linkage model appears to be a better fit compared with an additive model (Online Table VII). Complete summary statistics of all SNPs in UK Biobank and the UK Biobank CARDIoGRAMplusC4D meta-analysis are available for download at www.cardiomics.net.
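To illustrate how two sets of summary statistics can be combined and checked for heterogeneity, here is a minimal sketch of an inverse-variance-weighted fixed-effect meta-analysis of two studies. This model is an assumption chosen for illustration; the method actually used is described in the Online Data Supplement and may differ. The function name and inputs (per-study log odds ratios and standard errors) are hypothetical.

```python
import math


def fixed_effect_meta(beta1, se1, beta2, se2):
    """Inverse-variance-weighted fixed-effect meta-analysis of two studies.

    Returns (beta, se, p_meta, p_het), where p_het is the P value of
    Cochran's Q with 1 degree of freedom (two studies).
    """
    w1, w2 = 1.0 / se1**2, 1.0 / se2**2
    beta = (w1 * beta1 + w2 * beta2) / (w1 + w2)
    se = math.sqrt(1.0 / (w1 + w2))
    z = beta / se
    # Two-sided normal P value: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_meta = math.erfc(abs(z) / math.sqrt(2.0))
    # Cochran's Q follows a chi-square distribution with 1 df under
    # homogeneity; its survival function is erfc(sqrt(Q / 2)).
    q = w1 * (beta1 - beta) ** 2 + w2 * (beta2 - beta) ** 2
    p_het = math.erfc(math.sqrt(q / 2.0))
    return beta, se, p_meta, p_het
```

Under the criteria stated in this section, a variant would pass when `p_meta` is below 5×10−8 and `p_het` is at least 0.10.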
Candidate Genes and Deeper Insights Into Biology
To disentangle whether associations were driven more by acute myocardial infarction as opposed to stable CAD, we performed multinomial logistic regression analyses for all genome-wide significant (P<5×10−8) loci in UK Biobank. In total, 17 666 of 34 541 CAD individuals were diagnosed with myocardial infarction. None of the novel loci and only 2 previously identified variants (rs9349379 and rs10947789) appear to be driven mainly by their association with myocardial infarction rather than stable CAD (false discovery rate [FDR], P<0.05; Online Table VIII).
We further explored the potential biology of the 64 novel CAD-associated loci by prioritizing 155 candidate causal genes in these loci: 69 genes were in proximity (the nearest gene and any additional gene within 10 kb) of the lead variant, 9 genes contained coding genetic variation in linkage disequilibrium (r2>0.8) with the lead variant (Online Table IX), 50 genes were selected based on eQTL analyses (Online Table X), 64 genes showed significant chromatin interactions (Hi-C) between the genetic variant and promoter of the gene (Online Table XI), and 60 genes were prioritized based on DEPICT analyses (Online Table XII). Of the 155 candidate genes, 63 were prioritized by multiple methods of identification, which may be used to prioritize candidate causal genes. A summary of the current function annotation of each novel candidate gene is provided in Online Table XIII, and knowledge on pharmacological compounds and nutrients influencing these genes is provided in Online Table XIV. Next, we performed a systematic search in the Mouse Genome Informatics database to identify the effect of mutations in orthologous genes for these candidate causal genes (details in Online Table XV). 
In brief, we identified 34 genes for which mutations produced at least 1 cardiovascular system phenotype (AGT, ARHGAP42, BACH1, CALCRL, CASQ2, CCM2, CDC123, CDKN1A, FIGN, FOXC1, GIT1, GNPAT, HCRT, HSD17B12, MAP1S, MAP3K1, MSANTD1, NGF, NPHP3, PCIF1, PDS5B, PLCG1, PLEKHA1, PPP2R3A, PRDM16, PRKCE, RAC1, SEMA5A, SH3PXD2A, TFPI, TIPARP, TMEM106B, VEGFA, and ZFPM2) and 34 genes that affected other potentially plausible traits linked to CAD, including metabolic/lipid/adipose/weight abnormalities (AGT, CORO6, FIGN, GIT1, KAT2A, NGF, PPP2R3A, NPHP3, SH3PXD2A, TMEM106B, VEGFA, ZHX3, OPTN, FAM213A, DNAJC7, and COPRS), abnormalities in inflammation or white blood cells (DHX58, FHL3, HNRNPD, PLCG2, PRDM16, TFPI, VEGFA, ZNF335, PRKCE, MYO1G, RAC1, and ARID4A), and abnormalities in platelets or coagulation (FHL3, PLCG2, TFPI, VEGFA, DST, and KLF4).
Novel Insights From Pathway Analyses
Ingenuity pathway analysis restricted to the 155 candidate causal genes confirmed that these are enriched for effects on the cardiovascular system and cell cycle functions (Online Table XVI). Pathway insights provided by the DEPICT framework identified 1525 reconstituted gene sets that could be captured in 156 meta gene sets (Online Table XVII). The 4 most significant metasets were complete embryonic lethality during organogenesis, blood vessel development, anemia, and SRC PPI subnetwork. The platelet α-granule lumen, SRC PPI subnetwork, blood vessel development, and hemostasis had the largest betweenness centrality, an indicator of a node’s centrality in the network. The tissue enrichment analyses by DEPICT indicated blood vessels as the most relevant tissue (P=4×10−7); 41 additional tissues or cell types were significantly enriched at FDR<0.05 (Online Table XVIII). We compared the contribution of novel information with previous work. The previous CARDIoGRAMplusC4D analysis led to 457 reconstituted gene sets (at FDR<0.05); the addition of the intermediate UK Biobank data set of 150 k individuals identified a total of 889 significant gene sets, substantially fewer than the current 1525 gene sets (Figure 1; Online Table XVII). Considering all 10 968 possible gene sets, this study represents an increase from 4.17% to 13.90% of all gene sets involved in CAD since the 1000 Genomes analysis of CARDIoGRAMplusC4D in 2015. The number of genes implicated by DEPICT at the FDR<0.05 level increased from 94 in the previous data to 540.
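The reported share of gene sets follows directly from the counts given above (457 and 1525 significant gene sets out of 10 968). A short check, with a hypothetical helper name:

```python
TOTAL_GENE_SETS = 10_968  # all possible gene sets, as stated in the text


def percent_of_gene_sets(n_significant, total=TOTAL_GENE_SETS):
    """Percentage of all gene sets that are significant at FDR<0.05."""
    return 100.0 * n_significant / total


previous = percent_of_gene_sets(457)    # earlier CARDIoGRAMplusC4D analysis
current = percent_of_gene_sets(1525)    # current study
print(f"{previous:.1f}% -> {current:.1f}%")
```

Rounded to one decimal place, these are the 4.2% and 13.9% quoted in the abstract.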
Insights in Loci by Associations With Other Phenotypes
To increase our understanding of potentially mediating mechanisms at the genetic variant level, we searched the GWAS catalog for previously reported variants. Of the 64 novel loci, 23 loci were in linkage disequilibrium (r2>0.6) with genetic variants previously reported to be associated with other traits surpassing the genome-wide significant (P<5×10−8) threshold (Online Table XIX). We found associations with anthropometric measurements (rs6905288, rs1591805, rs3936511, and rs840616), antineutrophil antibody-associated vasculitis (rs112635299), angiotensinogen measurements (rs699), coffee consumption (rs13723), C-reactive protein (rs667920), pulmonary function (rs61848342, rs13723, and rs112635299), fibrinogen levels (rs67920, rs16844401, and rs2074158), glomerular filtration rate (rs12500824), high-density lipoprotein cholesterol (rs667920, rs10512861, and rs6905288), low-density lipoprotein cholesterol (rs10512861), total cholesterol (rs6997340), triglycerides (rs667920, rs3936511, rs6905288, and rs6997340), diabetes mellitus (rs1591805 and rs3936511), blood pressure indices (rs260020, rs17080091, rs61776719, rs7696431, and rs1317507), transferrin levels (rs6997340), QRS amplitude (rs13723), abdominal aortic aneurysm (rs885150 and rs3827066), adiponectin measurements (rs6905288), and age at menarche (rs1591805); full details can be found in Online Table XIX. We also explored the association of the 64 lead SNPs with a range of traits in UK Biobank resource. Consistent with the GWAS-catalog search and in keeping with earlier observations in established CAD loci, several of our novel loci were associated with hyperlipidemia, blood pressure traits, diabetes mellitus, and anthropometric traits (Figure 2). For example, rs6905288 (VEGFA) was also associated with waist-to-hip ratio and hyperlipidemia, and rs61776719 (FHL3 and UTP11L) was also closely associated with pulse pressure in UK Biobank. 
Interestingly, we observed that 15 of 64 loci were associated with platelet counts.
Genetic Risk for CAD, and Association With CAD Risk Factors and Outcome
To explore potential clinical relevance, we constructed a GRS in which each variant was weighted by its effect in CARDIoGRAMplusC4D, multiplying the effect size by the number of effect alleles of that variant in each individual, and divided this GRS into quintiles. The associations with many different traits and diseases from the UK Biobank are visualized in Figure 2. The risk of a future diagnosis of atrial fibrillation and heart failure in UK Biobank participants was higher in quintile 5 individuals as compared with quintile 1 (hazard ratio, 1.18 [95% confidence interval, 1.10–1.27; P=1.2×10−6] and 1.59 [95% confidence interval, 1.43–1.77; P=3.3×10−18], respectively; Online Figure IV). In addition, all-cause mortality and especially cardiovascular mortality was higher in individuals of quintile 5 compared with quintile 1 (hazard ratio, 1.12 [95% confidence interval, 1.06–1.19; P=4×10−4] and 1.94 [95% confidence interval, 1.70–2.21; P=2×10−23], respectively; Online Figure IV).
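The weighted score and its division into quintiles can be sketched as follows. This is an illustrative reading of the description above, not the authors' code: the function names, the dictionary inputs, and the rank-based quintile assignment are assumptions, and the Cox models are not reproduced here.

```python
def genetic_risk_score(dosages, betas):
    """Weighted GRS: sum over variants of effect size times effect-allele count.

    `betas` maps variant ID -> effect size (for example a log odds ratio);
    `dosages` maps variant ID -> number of effect alleles (0 to 2) carried
    by one individual.
    """
    return sum(betas[rsid] * dosages[rsid] for rsid in betas)


def assign_quintiles(scores):
    """Return a quintile label (1 to 5) per score, by rank within the cohort."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    quintile = [0] * len(scores)
    for rank, i in enumerate(order):
        quintile[i] = rank * 5 // len(scores) + 1
    return quintile
```

Individuals labeled 5 carry the highest genetic risk and would be compared with those labeled 1 in the survival models.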
Role of Regulatory DNA and Fine Mapping of Candidate Causal Variants
Across the genome, virtually all tissues showed significant enrichment of DNase I hypersensitivity sites, providing limited indications for involved biology (Figure 3A and 3B). Minimal differential enrichment of functional elements for the identified genetic loci was observed in blood vessels and liver. To facilitate future functional studies directed at causal variants and molecular mechanisms, we prioritized variants via the probabilistic framework of Probabilistic Annotation Integrator. Because no clear differential enrichment was observed for tissue-specific functional elements, we focused on DNA annotations from the study of Finucane et al17 that are not specific for tissue or cell types. Probabilistic Annotation Integrator determined the significance of each annotation to be causal (Figure 3C and 3D), and a model was constructed using linkage disequilibrium information, P value distribution, and information on coding variation, conservation, and H3K4me1 sites to prioritize potential causal SNPs of all 161 (known and novel) loci. This analysis yielded 28 variants at the ≥95% confidence level, for which we prioritized candidate genes (Online Table XX; Table 2).
| Cytoband | Causal Variant | No. of SNPs in Locus | MAF (EUR) | GWAS P Value | Posterior P Value | Annotation | Candidate Gene/Mechanism |
|---|---|---|---|---|---|---|---|
| 1p13.3 | rs602633 | 67 | 0.21 | 3.6×10−58 | 1.00 | Downstream | SORT1†‡(130)§(51.1), SARS‡(11.6), PSRC1(152), CELSR2‡(108), ATXN7L2‡(11.7) |
| 2q35 | rs2571445 | 50 | 0.40 | 1.6×10−12 | 0.97 | Missense (T) | TNS1*†‡(121.5), DIRC3‡(7.9) |
| 3p21.31 | rs7633770‖ | 49 | 0.44 | 1.1×10−08 | 0.97 | Intergenic | ALS2CL‡(8.6), RTP3§(49.6), LTF§(49.6) |
| 6p24.1 | rs9349379 | 268 | 0.41 | 2.7×10−76 | 1.00 | Intronic | EDN1†‡(2.2)§(23.9), TBC1D7‡(15.9), PHACTR1‡(55.7), GFOD1‡(8.1) |
| 7p13 | rs2107732‖ | 11 | 0.10 | 3.6×10−08 | 0.98 | Missense (T) | CCM2*†§(9.6), MYO1G§(22.9) |
| 7q32.2 | rs11556924 | 95 | 0.38 | 1.4×10−23 | 1.00 | Missense (D) | ZC3HC1*†, NRF1§(38), KLF14§(216.9) |
| 11p15.4 | rs11601507‖ | 3 | 0.06 | 5.6×10−13 | 1.00 | Missense (D) | TRIM5*†, OR52N1§(45), TRIM6§(49), OR52B6§(49) |
| 19q13.32 | rs7412 | 39 | 0.07 | 2.1×10−35 | 1.00 | Missense (D) | APOE*†, APOC2§(64.1), CLPTM1§(64.1), APOC4§(64.1) |
| 20q11.22 | rs867186 | >500 | 0.10 | 6.8×10−12 | 0.97 | Missense (T) | PROCR*†‡(20.3), TRPC4AP†‡(42.9)§(116), GGT7‡(4.6), EDEM2‡(7.9), NCOA6§(75.1), HMGB3P1§(75.1) |
For example, rs974819 was prioritized as causal variant and could be linked to PDGFD by Hi-C evidence and eQTL data in relevant tissues (Online Figure V). In total, 15 of the 28 fine-mapped loci could be pinpointed to 1 single potential causal mechanism implicating a single gene. For 2 loci, there were 2 potential causal mechanisms (TRPC4AP/PROCR and MRPS6/SLC5A3) with equal evidence.
The present study is the largest genetic association study of CAD performed to date. We report on the primary results and downstream bioinformatic analyses of the meta-analysis of de novo GWAS data derived from the UK Biobank combined with existing data from CARDIoGRAMplusC4D, leading to the inclusion of ≤122 733 cases and 424 528 controls. This study contributes to the existing literature by reporting 64 novel genetic loci representing about 40% of all 161 GWAS-identified CAD loci to date.18 For the novel loci, a detailed catalog of 155 candidate genes (based on proximity, gene-expression data, coding variation, and physical chromatin interaction) is provided. We demonstrate that the increase in significantly associated CAD loci results in a large expansion of implicated reconstituted gene networks, from 4% to almost 14%. Finally, by integrating genetic association strength, linkage disequilibrium, and functional annotation data, we performed fine mapping of all 161 CAD loci, providing a novel credible list of causal variants and plausible genes to be prioritized for functional validation.
The number of novel genetic loci reported in this single article (64) is exceptionally large compared with previous articles, including those of CARDIoGRAMplusC4D and others reporting on 10 to 15 novel loci each.2–10 Thirty-four of the 64 loci are significant in a robust reciprocal replication strategy between CARDIoGRAMplusC4D and the UK Biobank, and another 30 are genome-wide significant in the overall meta-analysis, which is commonly considered sufficient evidence.7,10 The obvious reason for the large number of novel loci is the considerable number of novel CAD cases and non-CAD controls compared with these earlier efforts, combined with less heterogeneity in samples, collection, and definitions used. By increasing the sample size, more loci can be identified, more genes can be implicated, and more gene networks or pathways can be constructed. Not only is the increase of associated loci in the past decade rapidly outpacing functional validation, but even our understanding of biological networks seems insufficient to accommodate the increased number of GWAS hits under the conceptual polygenic model. This can be illustrated by the large increase of reconstituted gene networks observed in our study. For the first time, we show that almost 14% of all existing gene networks are involved in the complex CAD trait (Figure 1), and this will only increase when further samples are added to the GWAS, making it increasingly difficult to consider these all to be key pathways. In our data, we also observed genetic association signals to be spread across most of the genome, and many of the novel 155 candidate genes do not have an obvious connection to CAD. In addition, virtually all cell types showed significant enrichment of DNase I hypersensitivity and other functional elements. These notions are all supportive of the omnigenic model, which has recently been proposed by the Pritchard team, suggesting that prevailing conceptual models for complex diseases are incomplete.
The omnigenic model hypothesizes that all gene-regulatory networks are sufficiently interconnected such that all genes expressed in disease-relevant cells can influence the function of core disease-related genes and a major proportion of heritability can be explained by effects of genes outside key pathways.19 It is questionable whether further increasing the GWAS sample size will resolve the outstanding issues concerning our incomplete understanding of cellular regulatory networks and our ability to differentiate core genes from peripheral genes. If the omnigenic model is indeed correct, detailed mapping of cell-specific regulatory networks will be essential to understand CAD.
To facilitate functional research based on our findings, we not only provided extensive bioinformatic analyses of coding variation, gene expression, and chromatin interactions for the 64 novel loci but also performed novel fine mapping and presented statistically convincing arguments for causal genetic variants at 28 loci, linking 19 genes in the 161 CAD loci. In the known loci, these genes included APOE, PCSK9, ANGPTL4, and SORT1, all implicated as core genes in lipid metabolism. Recently, PCSK9 has been validated in clinical trials,20 and functional studies are also supporting a key role for SORT1.21 More recently, EDN1 has been identified as the likely causal gene in the pathogenesis of CAD instead of the nearby PHACTR1.22 In the novel loci, we found evidence for causal variants linked to FNDC3B (Fibronectin Type III Domain Containing 3B), CCM2 (CCM2 Scaffolding Protein), and TRIM5 (Tripartite Motif Containing 5). The functional link between these genes and CAD is not obvious and remains to be determined. FNDC3B has been suggested to function as a positive regulator of adipogenesis.23 CCM2 has been implicated in abnormal vascular morphogenesis in the brain, leading to cerebral cavernous malformations24 but is also expressed in the heart. Although its effect in the coronary arteries has not been investigated, Ccm2 knockdown in mouse brain endothelial cells leads to increased monolayer permeability, decreased tubule formation, and reduced cell migration after wound healing.25 TRIM5 has been suggested to promote innate immune signaling, and its activity is amplified by retroviral infections.26 All SNP-gene mechanisms proposed in this article should be validated experimentally. Also, the analyses were restricted to variants available in the Haplotype Reference Consortium imputation panel.
Although this is the largest imputation panel to date, it only comprised SNPs; future fine-mapping efforts are necessary that include non-SNPs as well, such as indels, to cover the additional aspects of the human variation landscape. However, a 95% credible set that contains just 1 potential causal variant per locus provides a first starting point for generating new hypotheses and scientific explorations.
In our current work, we validated our previous finding that these genetic variants of CAD also predict the risk of atrial fibrillation and heart failure8 and extended it to all-cause death. We also aimed to differentiate between stable CAD and acute myocardial infarction by performing multinomial logistic regression analyses. Most loci were not driven by 1 clinical presentation specifically. However, for 2 previously identified loci (rs9349379 [EDN1] and rs10947789 [KCNK5]), we found statistical evidence that these loci may be driven by acute myocardial infarction and not stable CAD. For this observation, too, functional hypotheses are to be developed and tested. Our variants might be driven mainly by nonfatal CAD, and different variants might exist for fatal heart disease.
Some limitations of the current work are to be acknowledged. This work is based on statistical evidence and does not provide functional experimental validation. The genetic variants identified and the genes prioritized require further direct investigation in future studies to elucidate their role and function in the development and progression of CAD. However, in the short term, these data open up new possibilities to improve quantitative measures of genetic risk prediction. Recent data suggest that instead of operating in a deterministic fashion, high genetic risk is indeed modifiable by lifestyle,27 pharmacotherapy,28 and also by incorporation of genetic risk into shared decision-making sessions with patients.29
In conclusion, our GWAS, meta-analyses, and bioinformatic analyses provide several novel insights into the biology of CAD. We report 64 novel loci, link 155 candidate genes, and performed fine mapping of all old and novel loci, providing a credible list of causal genetic variants. However, with the ever-increasing sample size, our work is the first to indicate that an omnigenic model may be more appropriate to accommodate the complex genetic architecture of CAD, compared with a polygenic model. In addition to an expanded view, it also suggests new methods and tools are required to further our understanding of CAD biology through genetics.
CAD: coronary artery disease
CCM2: CCM2 scaffolding protein
DEPICT: data-driven expression-prioritized integration for complex traits
eQTL: expression quantitative trait locus
FNDC3B: fibronectin type III domain containing 3B
GWAS: genome-wide association study
TRIM5: tripartite motif containing 5
This research has been conducted using the UK Biobank resource under application numbers 12006 and 15031. We thank the CARDIoGRAMplusC4D investigators for making their data publicly available. We would like to thank the Center for Information Technology of the University of Groningen for their support and for providing access to the Peregrine high-performance computing cluster.
Sources of Funding
N. Verweij is supported by Marie Sklodowska-Curie GF (call: H2020-MSCA-IF-2014; project identifier: 661395) and an NWO VENI grant (016.186.125). We acknowledge the support from the Netherlands Cardiovascular Research Initiative—an initiative with support of the Dutch Heart Foundation, CVON2015-17 EARLY-SYNERGY.
Novelty and Significance
What Is Known?
Coronary artery disease (CAD) is a multifactorial disease with a substantial heritable component.
Genome-wide association studies in the past decade have identified 97 loci associated with CAD and are believed to provide biological insights into key pathways under the presumption of a polygenic model.
What New Information Does This Article Contribute?
We identified 64 additional loci associated with CAD. We fine-mapped all new and known loci to provide evidence in support of the causal role of the genetic variants or genes in CAD.
Network analyses suggest a complex genetic architecture of CAD, which might not be fully captured by the prevailing polygenic model of CAD.
This work lends support to the omnigenic model, which proposes that associated genetic variants might not necessarily lie in key disease pathways. Instead, all gene-regulatory networks may be sufficiently interrelated such that all expressed genes, including those outside key disease pathways, may influence key disease-related genes.
CAD, a leading cause of death, is a complex multifactorial disease. Genome-wide association studies of CAD have offered new biological insights and contributed to risk prediction and the identification of druggable targets. We performed a large systematic meta-analysis of genome-wide association studies, involving 122 733 cases and 424 528 controls, and identified 64 new genetic loci associated with CAD. Fine mapping of all known and novel CAD loci highlighted potential causal single-nucleotide polymorphism–gene mechanisms. A large proportion of all biological pathways and a plethora of human tissues were found to be associated with CAD without an obvious biological explanation. This finding could indicate that the polygenic model may not hold up as sample sizes for CAD genetics continue to increase, and that the omnigenic model may be more appropriate to accommodate the increasing complexity. This study underscores the importance of dedicated, tissue-specific mechanistic studies. New methods and tools are required to advance our understanding of the genetic mechanisms influencing the development and progression of CAD.
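As a rough illustration of the numbers summarized above, the following Python sketch tallies the cohort sizes and novel-locus counts reported in the abstract and shows how combining two cohorts can push a variant past the genome-wide significance threshold. The inverse-variance fixed-effect formula is an assumption made for illustration (this section does not state which meta-analysis method was used), and the function name `fixed_effect_meta` and the example effect sizes are hypothetical, not study data.

```python
import math

# Cohort sizes as reported in the abstract.
ukb_cases, ukb_controls = 34_541, 261_984   # UK Biobank
c4d_cases, c4d_controls = 88_192, 162_544   # CARDIoGRAMplusC4D

total_cases = ukb_cases + c4d_cases
total_controls = ukb_controls + c4d_controls

# Novel loci from the three discovery stages reported in the abstract.
novel_loci = 13 + 21 + 30

GENOME_WIDE_P = 5e-8  # significance threshold used in the study


def fixed_effect_meta(estimates):
    """Combine (beta, standard error) pairs by inverse-variance weighting.

    Assumed method for illustration only; returns the pooled beta,
    its standard error, and a two-sided normal p-value.
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, p


# Hypothetical per-cohort estimates for one variant (not real data).
hypothetical = [(0.05, 0.01), (0.05, 0.01)]
_, _, p_single = fixed_effect_meta(hypothetical[:1])
_, _, p_pooled = fixed_effect_meta(hypothetical)

print(total_cases, total_controls, novel_loci)
print(p_single < GENOME_WIDE_P, p_pooled < GENOME_WIDE_P)
```

In this hypothetical case the variant misses the threshold in a single cohort but passes it once both are pooled, which is the general reason larger combined samples reveal additional loci.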