

Systems Genetics Analysis of Genome-Wide Association Study Reveals Novel Associations Between Key Biological Processes and Coronary Artery Disease

and CARDIoGRAM consortium
Originally published https://doi.org/10.1161/ATVBAHA.115.305513 Arteriosclerosis, Thrombosis, and Vascular Biology. 2015;35:1712–1722



Genome-wide association studies have identified multiple genetic variants affecting the risk of coronary artery disease (CAD). However, individually these explain only a small fraction of the heritability of CAD and for most, the causal biological mechanisms remain unclear. We sought to obtain further insights into potential causal processes of CAD by integrating large-scale GWA data with expertly curated databases of core human pathways and functional networks.

Approaches and Results—

Using pathways (gene sets) from Reactome, we carried out a 2-stage gene set enrichment analysis strategy. From a meta-analyzed discovery cohort of 7 CAD genome-wide association study data sets (9889 cases/11 089 controls), nominally significant gene sets were tested for replication in a meta-analysis of 9 additional studies (15 502 cases/55 730 controls) from the Coronary ARtery DIsease Genome wide Replication and Meta-analysis (CARDIoGRAM) Consortium. A total of 32 of 639 Reactome pathways tested showed convincing association with CAD (replication P<0.05). These pathways resided in 9 of 21 core biological processes represented in Reactome, and included pathways relevant to extracellular matrix (ECM) integrity, innate immunity, axon guidance, and signaling by PDGF (platelet-derived growth factor), NOTCH, and the transforming growth factor-β/SMAD receptor complex. Many of these pathways had strengths of association comparable to those observed in lipid transport pathways. Network analysis of unique genes within the replicated pathways further revealed several interconnected functional and topologically interacting modules representing novel associations (eg, semaphorin-regulated axonal guidance pathway) besides confirming known processes (lipid metabolism). The connectivity in the observed networks was statistically significant compared with random networks (P<0.001). Network centrality analysis (degree and betweenness) further identified genes (eg, NCAM1, FYN, FURIN, etc) likely to play critical roles in the maintenance and functioning of several of the replicated pathways.


These findings provide novel insights into how genetic variation, interpreted in the context of biological processes and functional interactions among genes, may help define the genetic architecture of CAD.

Meta-analyses of genome-wide association studies (GWAS) involving tens of thousands of subjects have provided a wealth of new information on the genetic basis of coronary artery disease (CAD), yet common susceptibility variants that have achieved genome-wide significance explain only a small fraction of the heritability of CAD (≈10.6%).1,2 It has been proposed that much of the residual genetic risk may be attributable to rare variants with large effect.3,4 However, recent simulation, exome sequencing, and fine mapping studies of established GWAS loci support the hypothesis that joint contributions from common variants with modest effects are likely to account for a sizeable fraction of the missing heritability of complex diseases.5–7

It is likely that many more common variants are linked to CAD but have not achieved genome-wide significance in GWAS because of small effect size or lower allele frequency and insufficient sample size. However, based on the premise that clinically informative polymorphisms related to complex disease occur in systems of closely interacting genes,8 even weakly associated variants may provide important information about the biological basis of disease when such variants cluster within a common functional module or pathway. One common approach for pathway-based analysis of genomic data is gene set enrichment analysis (GSEA), originally developed and extensively used for the analysis of gene expression data.9 In 2007, Wang et al10 described a modified version of GSEA, designed to analyze genome-wide single-nucleotide polymorphism (SNP) associations rather than gene expression data. Since then, several other GSEA methods have been developed for this purpose.11–15 The common goal of these analytic algorithms is to identify a subset of genes whose variants collectively demonstrate strong association with a trait of interest even if the component SNPs individually exhibit relatively modest or nonsignificant association. Importantly, pathway analysis can also place the set of validated SNPs for a trait of interest into a broader and clearer biological context. A natural extension of this list-based pathway approach is the interrogation of molecular networks to unravel the architecture underlying complex diseases. A molecular network is based on interactions among biomolecules (genes, proteins, metabolites, etc), where such interactions can take various forms (protein–protein interactions, coexpression, gene regulation, functional interactions, etc).
Efforts at the characterization of disease-associated genes reveal that genes associated with the same or similar disorders tend to occupy similar neighborhoods in molecular networks through physical or functional modules.16,17 Furthermore, the study of network topology suggests that key disease-related genes differ from other genes in terms of their network connectivity and network centrality properties.17 Finally, molecular networks provide 2 distinct enhancements over traditional pathway-based approaches: (1) they provide additional information on interactions among gene subsets within a given pathway, and (2) they allow for the identification of interactions between components of different biological pathways. Through these analyses, one is able to draw a clearer picture of the functional connectivities that influence pathway functions, and how multiple pathways may interact with one another to influence a phenotype.
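The SNP-based gene set enrichment idea described above can be illustrated with a minimal sketch. It assumes each gene is represented by the largest association statistic among the SNPs mapped to it, and that enrichment is measured by a weighted Kolmogorov-Smirnov-like running sum over the ranked gene list. The function names, inputs, and the `weight` parameter are illustrative assumptions; specific tools, including i-GSEA4GWAS, differ in their scoring and permutation details, which are not reproduced here.

```python
def gene_scores(snp_stats, snp_to_gene):
    """Represent each gene by the largest statistic among its mapped SNPs.

    snp_stats:   {snp_id: association statistic, e.g. -log10(P)}
    snp_to_gene: {snp_id: iterable of gene names the SNP maps to}
    """
    best = {}
    for snp, stat in snp_stats.items():
        for gene in snp_to_gene.get(snp, ()):
            best[gene] = max(best.get(gene, float("-inf")), stat)
    return best


def enrichment_score(scores, gene_set, weight=1.0):
    """Maximum of a weighted running sum over genes ranked by score.

    The sum steps up at genes in the set (in proportion to their score)
    and steps down at genes outside the set.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    hits = [g for g in ranked if g in gene_set]
    if not hits or len(hits) == len(ranked):
        return 0.0
    hit_total = sum(abs(scores[g]) ** weight for g in hits)
    miss_step = 1.0 / (len(ranked) - len(hits))
    running, best = 0.0, 0.0
    for g in ranked:
        if g in gene_set:
            running += abs(scores[g]) ** weight / hit_total
        else:
            running -= miss_step
        best = max(best, running)
    return best
```

In practice the significance of such a score is judged against a null distribution obtained by permutation, a step this sketch omits.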

Several studies have applied molecular networks for generating insights from GWAS data8,18–20 in disorders, such as schizophrenia, multiple sclerosis, and prostate cancer. However, most of these approaches have relied mainly on protein–protein interaction networks, thereby missing the rich mechanistic information available from traditional biological pathway repositories and networks based on functional interactions. In this study, we have coupled the advantages of a well-curated biological pathway repository with a similarly curated functional interaction network to identify mechanism-based processes that may underlie the genetic architecture of CAD. First, to identify novel associations between established biological mechanisms and CAD, we have carried out a 2-stage pathway-based GSEA analysis of 16 GWAS data sets for CAD using the i-GSEA4GWAS (http://gsea4gwas.psych.ac.cn/inputPage.jsp) tool15 and the Reactome pathway database.21 Collectively, these GWAS include >25 000 subjects with CAD and >66 000 controls. We have then taken the replicated pathways as a starting point to explore functional interactions within and between pathways via interrogation of molecular interacting networks. Finally, we have characterized the CAD-associated genes based on their topological properties within these networks as a way of prioritizing gene candidates for functional follow-up studies.

Materials and Methods

Materials and Methods are available in the online-only Data Supplement. Briefly, using pathways (gene sets) from Reactome, we first carried out a 2-stage gene set enrichment analysis strategy. From a meta-analyzed discovery cohort of 7 CAD GWAS data sets (9889 cases/11 089 controls), nominally significant gene sets were tested for replication in a meta-analysis of 9 additional studies (15 502 cases/55 730 controls) from the CARDIoGRAM Consortium (Table 1). Genes from the replicated pathways were then mapped onto well-curated interaction networks (Figure 1).


Figure 1. Analytic approach. Schematic of analytic approach as described in detail in the Materials and Methods section of this article. FDR indicates false discovery rate; PID, pathway interaction database; and SNP, single-nucleotide polymorphism.

Table 1. Demographics of Discovery and Replication Cohorts

| GWAS Data Set | No. of Cases/Controls | Age (mean±SD), Cases/Controls | % Men, Cases/Controls | % MI, Cases |
|---|---|---|---|---|
| Stage 1 studies | | | | |
| GerMIFs I | 875/1644 | 50.2±7.8/62.6±10.0 | 50.6/49.2 | 100 |
| GerMIFs II | 1222/1298 | 51.4±7.5/51.2±11.9 | 66.9/51.7 | 100 |
| GerMIFs III (KORA) | 1157/1748 | 58.6±8.7/55.9±10.7 | 79.9/51.1 | 100 |
| Total stage 1 | 9889/11 089 | | | |
| Stage 2 studies | | | | |
| CHARGE | 2287/22 024 | 60.0±7.9/63.1±8.0 | 66.6/40.4 | 48.0 |
| deCODE CAD | 6640/27 611 | 74.8±11.8/53.7±21.5 | 63.7/38.1 | 54.7 |
| LURIC/AtheroRemo 1 | 652/213 | 61.0±11.8/58.3±12.1 | 79.7/54.0 | 71.9 |
| LURIC/AtheroRemo 2 | 486/296 | 63.7±9.4/56.4±12.7 | 76.6/51.4 | 79.0 |
| Total stage 2 | 12 501/55 730 | | | |
| Total stages 1 and 2 | 25 491/66 819 | | | |

CAD indicates coronary artery disease; GWAS, genome-wide association study; and WTCCC, Wellcome Trust Case Control Consortium.


Significant Pathways

A total of 85 of the 639 Reactome pathways tested in stage 1 achieved a gene set enrichment P<0.05 at a false discovery rate (FDR) <0.25. Thirty-two of these 85 pathways were further replicated in stage 2 at a nominal P<0.05 (Table 2). When the replicated pathways were compared with the full pathway content of Reactome, at least 1 pathway replicated in each of 9 of the 21 core Reactome-defined biological processes. These included the core processes of metabolism, signal transduction, developmental biology, ECM organization, immune system, metabolism of proteins, cell–cell communication, transmembrane transport of small molecules, and gene expression (Figure 2). Because of the hierarchical organization of Reactome pathways, several replicated pathways were nested within larger gene sets, either completely or partially (Figure II in the online-only Data Supplement). This hierarchical structure enabled us to identify instances of pathway selectivity. For example, although the CRMPs (collapsin response mediator proteins) in Sema3A (semaphorin 3A) signaling, Sema4D in semaphorin signaling, and Sema3A PAK (p21 protein activated kinase)-dependent axon repulsion pathways all nested completely within the Semaphorin Interactions pathway, only the first was significantly replicated (P<0.001), whereas the other 2 pathways were not. To put the identified pathways in a broader context, we have also listed the nonreplicated pathways that share similar levels of hierarchy as the replicated pathways in Table I (online-only Data Supplement).
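The 2-stage selection described above (stage 1 enrichment P<0.05 at FDR<0.25, followed by stage 2 nominal P<0.05) can be sketched as a simple filter. The function name, the input layout, and the toy pathway names and values below are assumptions made for illustration; they are not results from the study.

```python
def two_stage_replication(stage1, stage2, p_cut=0.05, fdr_cut=0.25):
    """Select discovery candidates, then keep those that replicate.

    stage1: {pathway: (enrichment P value, FDR)} from the discovery analysis
    stage2: {pathway: enrichment P value} from the replication analysis
    """
    candidates = [name for name, (p, fdr) in stage1.items()
                  if p < p_cut and fdr < fdr_cut]
    # A candidate missing from stage 2 is treated as not replicated.
    replicated = [name for name in candidates
                  if stage2.get(name, 1.0) < p_cut]
    return candidates, replicated


# Invented toy values, for illustration only.
stage1 = {
    "pathway_a": (0.010, 0.10),
    "pathway_b": (0.030, 0.40),  # fails the FDR threshold
    "pathway_c": (0.040, 0.20),
    "pathway_d": (0.200, 0.60),  # fails both thresholds
}
stage2 = {"pathway_a": 0.004, "pathway_c": 0.300}

candidates, replicated = two_stage_replication(stage1, stage2)
```

Here `pathway_a` and `pathway_c` pass the discovery thresholds, and only `pathway_a` replicates.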

Table 2. List of Replicated Reactome Pathways Enriched for Genetic Association to Coronary Artery Disease

Reactome Pathway | Enrichment P Value | FDR | No. of Genes/Pathway

Pathway names are listed in column 1; column 2 lists the nominal P value for pathway enrichment; column 3 lists the corresponding FDR; and column 4 records the number of genes in each pathway. FDR indicates false discovery rate; NCAM, neural cell adhesion molecule; and TGF, transforming growth factor.


Figure 2. Replicated Reactome pathways for coronary artery disease using i-GSEA4GWAS with a 100 kb mapping interval. Replicated pathways are represented in a hierarchical Reactome pathway diagram. Top-level pathways, representing core biological processes, are listed to the left, and sublevels corresponding to each top level are illustrated progressively to the right. The 9 top-level pathways that contain at least 1 replicated pathway (top-level or sublevels) are shown. No sublevel pathways are shown to the right of the last replicated pathway. Pathways are color coded according to their gene-set enrichment P value from the replication stage as indicated in the legend. A P<0.05 corresponds to a false discovery rate <12.5%. Pathways containing <10 or >200 genes were not tested. Replicated pathways with >50% overlap of genes with other replicated pathways are also identified as indicated in the legend. HDL indicates high-density lipoprotein; NCAM, neural cell adhesion molecule; and TGF, transforming growth factor.

About a third of the 32 replicated pathways were also significant in stage 2 (P<0.05) after correcting for linkage disequilibrium (LD) between the SNPs, by analyzing SNPs pruned genome-wide at either r²>0.5 or r²>0.2 (Table II in the online-only Data Supplement). The pathways common to all 3 analyses (unpruned SNPs and both pruned SNP sets) were Toll receptor cascades; degradation of the ECM; lipid digestion, mobilization, and transport; and lipoprotein metabolism. Although the association of these pathways may be of higher confidence, pruning of SNPs may also lead to loss of power, both because it substantially reduces the number of SNPs (to 5% to 15% of unpruned SNPs) and because the pruning was agnostic to the actual CAD SNP association P values. Hence, for downstream gene and network analyses we chose to use the full set of 32 pathways that replicated with the unpruned list of SNPs.

Finally, we examined the possible effect of LD among genes leading to inflated significance scores for the replicated pathways by considering the extent of LD among the gene-tagging (best scoring) SNPs for all genes in a pathway. The extent of LD among the most significant SNPs was found to be minimal. Specifically, of all the SNPs tested, we found only 2 SNP pairs with an r²>0.8, observed across 3 pathways. Even at the more permissive r² threshold of 0.2, only 4 SNP pairs were observed across 5 pathways (Table III in the online-only Data Supplement).
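The pairwise LD check described above can be illustrated with a short sketch. The source does not state how r² was computed. As an assumption, the code below uses the squared Pearson correlation of allele counts (0, 1, or 2 copies per individual), a common estimate for unphased genotypes. The SNP names and genotype vectors are hypothetical.

```python
from itertools import combinations


def genotype_r2(x, y):
    """Squared Pearson correlation between two allele-count vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic SNP carries no correlation information
    return sxy * sxy / (sxx * syy)


def pairs_above(genotypes, threshold):
    """List SNP pairs whose r2 exceeds the threshold."""
    return [(a, b) for a, b in combinations(sorted(genotypes), 2)
            if genotype_r2(genotypes[a], genotypes[b]) > threshold]


# Hypothetical gene-tagging SNPs genotyped in 6 individuals.
genotypes = {
    "snp1": [0, 1, 2, 1, 0, 2],
    "snp2": [0, 1, 2, 1, 0, 2],  # identical to snp1, so r2 = 1
    "snp3": [0, 2, 0, 2, 1, 1],  # uncorrelated with snp1, so r2 = 0
}
high_ld_pairs = pairs_above(genotypes, 0.8)
```

Running the same function with a threshold of 0.2 gives the more permissive count of pairs.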

Gene and Pathway Prioritization

The 32 replicated pathways contained a total of 770 unique genes that were taggable by at least 1 SNP (no SNP tags were available for 83 genes). Figure SIII (online-only Data Supplement) summarizes the proportion of genes within the replicated pathways that were associated with CAD. All replicated pathways contained ≥50% genes above the significance threshold (range, 50.0%–92.3%), confirming that the pathway findings were driven by the combined contributions of multiple genes in each pathway rather than by large effects from a small minority of genes. For comparison purposes, we also analyzed a synthetic pathway derived from genes within the CARDIoGRAM loci reaching genome-wide significance. This synthetic pathway contained the second highest proportion of genes reaching the significance threshold.

Network Analysis

Statistical Evaluation of Network

A total of 770 genes from the replicated pathways were mapped to the InWeb PPI (protein–protein interaction) network, and the observed network connectivity parameters (degree and number of edges) were compared with those of random networks of similar size and degree distribution. A network of direct interactions could be created with 620 genes (assuming a minimum interaction size of 2 genes). The resulting network (Figure SIV) differed significantly from random networks: it contained 3726 direct edges compared with only 1548 edges expected by chance (P<0.001), and the observed average connectivity per gene (degree) was 12, compared with an expected 5.8 from random networks (P<0.001). These results indicate that the networks constructed from the replicated pathway genes are unlikely to have arisen by chance.
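The comparison with random networks can be sketched as follows. The average degree of an undirected network is 2E/N, so 3726 edges among 620 genes gives roughly 12, consistent with the value reported above. The rest of the sketch is a simplification. It draws random gene sets uniformly from a background network, whereas the study used random networks of similar size and degree distribution, a matching step that is not reproduced here. All function names are illustrative assumptions.

```python
import random


def average_degree(n_edges, n_nodes):
    """Mean degree of an undirected network: each edge touches 2 nodes."""
    return 2 * n_edges / n_nodes


def induced_edges(edges, genes):
    """Count edges of the background network with both ends in `genes`."""
    genes = set(genes)
    return sum(1 for a, b in edges if a in genes and b in genes)


def random_edge_counts(edges, all_genes, set_size, n_random, seed=0):
    """Edge counts for uniformly sampled gene sets (not degree-matched)."""
    rng = random.Random(seed)
    pool = sorted(all_genes)
    return [induced_edges(edges, rng.sample(pool, set_size))
            for _ in range(n_random)]


def empirical_p(observed, null_values):
    """One-sided empirical P value with the usual +1 correction."""
    exceed = sum(1 for v in null_values if v >= observed)
    return (exceed + 1) / (len(null_values) + 1)


# Figures taken from the text: 3726 edges among 620 connected genes.
observed_mean_degree = average_degree(3726, 620)  # about 12.0
```

With this estimator, the smallest attainable P value depends on the number of random networks drawn. For example, 999 random networks with none reaching the observed edge count give 0.001.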

Mapping of Replicated Pathway Genes to an Interaction Network

Although this PPI-based analysis provided confidence that the networks derived from the replicated pathway genes are unlikely to arise from chance, it allows only limited insights into the various biological mechanisms impacted by these pathways. Thus, to identify networks that contain more relevant information on biological processes (including PPI), the genes from the replicated pathways were mapped to a functionally interacting network curated and maintained at Reactome. A total of 733 genes could be mapped to the larger network. This subnetwork was further clustered to reveal within-network modules. Clustering resulted in the identification of 17 clusters with 10 clusters containing >10 gene members (Figure 3; Table SIV). Within each cluster, a diverse array of interactions (reactions, complex formation, activation, etc) was represented by the edges connecting the genes (nodes), as exemplified in Figure V (online-only Data Supplement) for the genes in clusters 8 and 9. We also observed considerable interconnectivity between the clusters; for example, the links between cluster 4 and other clusters are highlighted in Figure 3 (additional intercluster connectivities for each of the remaining clusters are shown in Figure VI in the online-only Data Supplement). Enrichment analysis within each cluster using Gene Ontology (http://www.geneontology.org/) identified several cluster-specific overrepresentations of biological processes, as further highlighted in Figure 3. The following are some notable examples of functional enrichment within the clusters (FDR<0.001): innate immunity (clusters 1 and 4), Notch signaling (cluster 6), ECM organization (cluster 7), lipid metabolism (cluster 8), and axon guidance (cluster 9). The full list of all significantly overrepresented GO-BP terms (FDR<0.001) is provided in Table V (online-only Data Supplement).


Figure 3. Functionally interacting network modules constructed from genes belonging to the replicated, CAD-associated pathways. Functional interactions among the genes from all replicated pathways were analyzed and clustered by the ReactomeFI (http://chianti.ucsd.edu/cyto_web/plugins/displayplugininfo.php?name=Reactome%20FIs) tool and visualized in Cytoscape. Genes are represented as nodes and interactions among genes are represented as edges. The parent network was further analyzed to yield subnetwork clusters; each cluster is shown separately and color coded for clarity. Intercluster connectivity is exemplified in red for cluster 4. The top GO-BP terms that are enriched in each cluster are listed in the blue boxes. For each cluster, all terms are at false discovery rates <0.0001 and contain a minimum of 10 genes (unless otherwise indicated in parentheses). A maximum of 10 GO-BP terms are shown for each cluster. Genes that were not linked to at least one other gene were excluded from the network diagram. TNF indicates tumor necrosis factor.

Gene and Pathway Prioritization Based on Network Topology

Network topology provides vital information toward the understanding of network architecture and performance and allows for the prioritization of genes based on their topological characteristics within the network. Thus, we interrogated the topological properties of the networks derived from the replicated pathways. Specifically, we investigated 2 key node centrality measures, namely degree and betweenness because of their reported significance in biological networks as drivers for gene/protein essentiality (see online-only Data Supplement for additional information on degree and betweenness).22 For this purpose, the replicated pathways were first converted into Reactome functional interaction networks (conversion was successful for 29 pathways, with the exclusion of collagen formation, metabolism of polyamines, and organic cation–anion zwitterion transport pathways) and subsequently analyzed for the above 2 node centrality measures. Figure 4 depicts the betweenness centrality measures for a merged network derived from 2 pathways related to cell–cell interactions (neural cell adhesion molecule [NCAM] signaling for neurite outgrowth and CRMPs in Sema3a signaling). In this network, the NCAM1 and Fyn proteins display large betweenness centrality and act as bridges connecting multiple other proteins in the network. Some additional genes with GWAS association P<0.001 that occupy potentially critical positions (betweenness >100) in a subset of the replicated pathways include FURIN (component of degradation of ECM, ECM organization, signaling by NOTCH1 pathways), MMP1 (degradation of ECM and ECM organization pathways), and RPS6KA5 (Toll receptor cascades and NCAM signaling for neurite outgrowth pathways). Results for the remaining pathways are shown in Figure VII and Table VI (online-only Data Supplement).
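The 2 node centrality measures used above can be illustrated on a toy network. The sketch below computes degree and unnormalized betweenness for an unweighted, undirected graph using Brandes' algorithm. The study itself used Centiscape 2.0 within Cytoscape, whose exact conventions (for example, normalization) may differ. The node names and edges are hypothetical and do not represent real gene interactions.

```python
from collections import deque


def build_adjacency(edges):
    """Undirected adjacency sets from a list of (node, node) pairs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def degree(adj):
    """Number of direct neighbors of each node."""
    return {node: len(nbrs) for node, nbrs in adj.items()}


def betweenness(adj):
    """Brandes' algorithm: shortest-path betweenness, unnormalized."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in order of decreasing distance from s.
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was counted once from each endpoint.
    return {v: b / 2 for v, b in score.items()}


# Hypothetical star-shaped network: one hub bridging 4 peripheral nodes.
toy = build_adjacency([("hub", "g1"), ("hub", "g2"),
                       ("hub", "g3"), ("hub", "g4")])
toy_degree = degree(toy)
toy_betweenness = betweenness(toy)
```

In this toy network the hub lies on the only shortest path between each of the 6 pairs of peripheral nodes, so it has degree 4 and betweenness 6, whereas every peripheral node has betweenness 0. This mirrors the bridging role described for high-betweenness genes.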


Figure 4. Topology-based network analysis in replicated pathways. Topological relationships among genes are shown for a merged Reactome functional interaction network created in Cytoscape from 2 replicated pathways associated with cell–cell interactions (neural cell adhesion molecule [NCAM] signaling for neurite outgrowth and CRMPs in Sema3a signaling). Genes (nodes) in the network are color coded by their replication P values (deep red, P<0.001; lighter red, 0.001<P<0.01; lightest red, 0.01<P<0.05; white, P>0.05) and sized by their betweenness network centrality score (calculated via Centiscape 2.0). The individual gene names and their betweenness scores are listed beside the network diagram. Betweenness scores are not calculated for genes that do not connect to at least 1 other gene in the network (these genes are indicated with #N/A for betweenness).


Despite the recent successes of large GWAS meta-analyses,1,2 the genetic architecture of CAD remains poorly understood and the identified loci explain only a small proportion of genetic risk. By integrating GWAS data with expertly curated databases of core human pathways as well as gene and reaction-based functional networks, we sought to obtain novel insights into the potential causal processes of coronary atherosclerosis. The large size of the combined discovery and replication samples (>25 000 CAD cases and >66 000 controls) and the 2-step discovery-replication strategy increase confidence in the results. This analysis implicates 32 core human pathways representing 9 distinct biological processes as being most etiologically relevant to CAD.

Notably, many replicated pathways from the 2-stage GWAS analysis strategy converged on processes regulating cellular growth, migration, and proliferation, such as signaling by the transforming growth factor-β receptor and signaling by PDGF, pathways previously intensively investigated for their functional role in coronary atherosclerosis. By combining GWAS-based findings with such a priori information, we obtained evidence that genetic variation in a critical number of genes representing these pathways contributes to the heritability of CAD. Moreover, these data support hypotheses that alterations in these pathways are potentially causally related to CAD. Specifically, transforming growth factor-β is known to control cell proliferation, cell migration, matrix synthesis, wound contraction, calcification, and the immune response, all of which are major components of the atherosclerotic process.23 PDGF is expressed in every cell type of the atherosclerotic arterial wall, as well as in infiltrating inflammatory cells,24 and plays a key role in the migration of vascular smooth muscle cells from the media into the intima and their subsequent proliferation. Although both pathways have been studied in animal models, animal data are often conflicting or inadequate and there are no data related to modulation of these pathways in humans. Several pathways related to the integrity of the ECM were also highly significant, including ECM organization, degradation of the ECM, and cell–ECM interactions. The ECM not only maintains the structural integrity of vessel wall plaques but also participates in several key events, such as cell migration, lipoprotein retention, and thrombosis, that are critically linked to plaque stability.25

Two of the axon guidance pathway subclasses, CRMPs in Sema3A signaling and NCAM signaling for neurite outgrowth, also replicated. The axon guidance pathways modulate diverse biological phenomena, including cellular adhesion, migration, proliferation, differentiation, survival, and synaptic plasticity through the participation of highly conserved families of guidance molecules, including netrins, slits, semaphorins, and ephrins, and their cognate receptors.26 Neural guidance cues such as netrin-1 and semaphorins have important roles outside the nervous system. Oksala et al27 provide compelling evidence that netrin-1 is secreted by macrophage foam cells in atherosclerotic plaques and acts to inhibit emigration of these cells out of lesions by causing dysregulation of the actin cytoskeleton. Wanschel et al28 reported that NTN1 is downregulated in atherosclerotic plaques and its expression correlates negatively with inflammatory markers and M2 signals. Like netrin-1, semaphorin 3A, encoded by SEMA3A, one of the top-ranked genes in this analysis, is also expressed in coronary artery endothelial cells and potently inhibits chemokine-directed migration of human monocytes.29,30 This study also provides further supportive evidence for a causal role of innate immunity in atherosclerosis or plaque rupture, with significant pathways including both Toll receptor cascades and initial triggering of complement. Innate immune responses mounted by macrophages and other immune cells recruited to the arterial wall in response to an inflammatory challenge have a major role in the initiation of atherosclerosis.31

An important advance encompassed in the current work is our further examination of the topological characteristics of genes comprising the replicated gene sets and the potential implication of topology on biological function. Specifically, we applied the Reactome FI tool to identify gene sets related to biological processes, such as innate immunity, cell adhesion, and lipid metabolism that were further reorganized into functionally interacting networks and subnetwork clusters demonstrating a high degree of interconnectedness. Network clustering, followed by pathway enrichment analysis on the identified clusters via Gene Ontology, generated new insights on interrelationships among the enriched pathways, not available through our initial traditional gene set analysis. For example, whereas the lipid-metabolizing genes were largely concentrated in a single cluster (cluster 8), genes related to innate immunity were, by contrast, distributed within 3 separate clusters (clusters 0, 1, and 4), along with other biological processes, highlighting the possibility of extensive interactions among these processes. Finally, through analysis of such networks, we were further able to evaluate the possible criticality of genes in network function, based on the degree and betweenness centrality properties of the network genes.

Collectively, these additional analytic approaches provide important insights into the interrelationships among genes that are not usually available through conventional gene set enrichment analysis, and could assist in the formation of testable hypotheses on areas of robustness and vulnerability in functional networks otherwise not intuitively evident. For example, topological analysis implicated a potential role for the axonal growth related pathways in CAD with NCAM1 being a major hub in a network, including plexins (PLXNA1 and PLXNA2), neuropilin-1, as well as adhesion molecules (CNTN2) and several members of the collagen family relevant to the ECM of the vessel wall (Figure 4). These data support the concept that neuronal guidance cues have important roles in both arteriogenesis32,33 and atherosclerosis by regulating macrophage retention in plaques.27,29,30 Other studies demonstrate that semaphorin 3A and its receptors, neuropilin-1 and -2, plexins A1/A2/A3 are highly expressed in human monocyte-derived macrophages and play a role in induction of macrophage apoptosis.34

Despite these plausible observations, we are cognizant that betweenness is only one of several network centrality measures that could play critical roles in network function. Because the fields of network biology and network pharmacology are both still evolving, our findings should be considered hypothesis-generating rather than conclusive evidence of the importance of one gene or pathway over another. Functional testing is necessary as the next step and can take several forms, including (1) overexpression or knockdown of medium to high betweenness genes in target pathways (eg, NCAM1 and FYN for the network in Figure 4) in CAD-relevant cell models (eg, human coronary artery endothelial or smooth muscle cells, macrophages, etc) to interrogate their effects on cell function (cell migration, lipid accumulation, etc); (2) testing the effects of candidate genes (eg, NCAM1 and FURIN) in knockout or overexpression mouse models (generated by somatic manipulation or transgene creation) on lesion formation (similar to studies on candidate GWAS genes for lipoprotein metabolism35–38); (3) statistical epistasis analysis, limited to genes within a replicated pathway, to uncover functionally important interactions underlying the genetic basis of atherosclerosis; and (4) prioritizing gene products from replicated pathways based on the availability of pharmacological agents against them, and testing these for potential benefits in animal models of atherosclerosis (an approach successfully demonstrated in the identification of memory-modulating drugs39). We hope our approach stimulates extensive further discussion on how to experimentally interrogate CAD-related networks and pathways.

We acknowledge potential caveats pertaining to this study. First, the number of pathways identified and replicated was modest, but the pathways are biologically plausible. In the discovery analysis, 85 of the 639 (13%) pathways tested were significant at P<0.05 (and FDR <25%), with at least 50% of the genes in any given pathway being individually significant at P<0.05. A total of 32 of these 85 (37%) pathways achieved replication, a number somewhat lower than expected (75%) given the FDR threshold used in the discovery phase to select pathways for testing in the replication sample. This may reflect the less stringent criteria for age of onset of CAD cases applied in some of the replication studies as well as study-specific differences in inclusion/exclusion criteria and adjudication of outcomes, leading to increased sample heterogeneity.1 Our study also highlights several generic issues that currently impose limitations on the conduct and interpretation of pathway analyses.40 Some of these issues pertain to (1) the mapping of SNPs to genes, (2) choosing the optimum pathway analysis tool for GWAS, (3) consequences of the permutation scheme used in i-GSEA4GWAS, and (4) the effects of inter-SNP linkage disequilibrium on pathway analysis results. An additional caveat is the potential for bias in the network and topological analyses because of limitations in the extent and type of experimental data available in the source databases. We have provided a further detailed discussion of issues related to pathway and network analysis in the Results section of the online-only Data Supplement.
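The expected replication rate quoted above can be reconstructed with simple arithmetic. The reading assumed here is that an FDR threshold of 0.25 in the discovery stage implies that up to 25% of selected pathways may be false positives, so roughly 75% would be expected to replicate if the replication stage had full power. That interpretation is an assumption of this sketch, not a statement from the source.

```python
tested, discovered, replicated = 639, 85, 32
fdr_cut = 0.25

discovery_rate = discovered / tested             # about 0.133
expected_rate = 1 - fdr_cut                      # 0.75 under the assumption above
expected_count = expected_rate * discovered      # 63.75 pathways
observed_rate = replicated / discovered          # about 0.376
shortfall = expected_count - replicated          # about 32 fewer than expected

print(f"discovery: {discovery_rate:.1%}, "
      f"expected replication: {expected_rate:.0%}, "
      f"observed replication: {observed_rate:.1%}")
```

The gap between the expected and observed rates is what the heterogeneity among replication studies, discussed above, is offered to explain.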

This is an area of emerging methodology and different approaches can yield complementary findings. Our findings extend gene-centric verification of CAD GWAS loci41 and the findings recently reported by CARDIoGRAM+C4D, which applied Ingenuity network analysis only to the top 239 candidate genes.2 In another recently published study based on this large-scale meta-analysis of GWAS for CAD, we used a different approach.42 Rather than a location-based approach to map SNPs to genes, we used expression quantitative trait locus (eQTL) data from CAD-related tissues and primary cells to link CAD SNPs to their empirically defined target genes. We then created data-driven, tissue-specific gene expression networks from a multitude of human and mouse experiments.42 These networks relied heavily on available gene expression data and did not involve other types of interactions, such as protein–protein interactions or biochemical reactions. In contrast, the present analysis is based on gene-to-SNP mapping methods for gene set enrichment rather than eQTL data, and our analysis of the topological relationships among genes in the filtered, replicated pathways using Reactome FI and the pathway interaction database covers a more extensive array of molecular interactions, thus revealing important aspects that the gene expression-based networks failed to capture. It is encouraging that these 2 approaches have yielded consistent results in terms of core processes related to lipid metabolism, immune system, Notch-HLH transcription, and PPAR signaling. However, here we have identified additional biologically relevant pathways, including ECM integrity, transforming growth factor-β signaling, and axon guidance, the latter being of particular interest given recent laboratory findings.27–30,32–34 Many of these pathways had strengths of association comparable to those observed in known pathways related to lipoprotein metabolism.

The findings of this extensive but preliminary analysis do not imply causality. However, the utility of the integrative approach in elucidating the genetic bases of disease has been demonstrated by studies in several complex phenotypes. For example, in an investigation of the WTCCC (Wellcome Trust Case Control Consortium) Crohn disease GWAS data set, only 3 genes at 2 loci showed GWAS significant signals, but pathway analysis carried out by Wang et al11 identified the 20-gene IL-12/IL-23 pathway as associated with Crohn disease, an association that remained significant even when the 2 original loci were removed.43 In a similar vein, Holmans et al44 provided supporting evidence for the immunogenetic origins of Parkinson disease by identifying the regulation of leukocyte/lymphocyte activation and cytokine-mediated signaling as conferring increased susceptibility to Parkinson disease, although none of the SNPs linked to genes within these pathways had achieved GWAS significance. In contrast, pathway analysis studies have had little success in generating new biological insights for other disorders, including type 2 diabetes mellitus. Because of this variability, extensive mechanistic and functional validation of pathway and interactome-derived networks at multiple levels will be essential. An example of systematic experimental perturbation of interactome networks to understand cancer predisposition has been presented in the study by Rozenblatt-Rosen et al,45 and a framework for network inference and validation based on gene knock-down has been proposed in Olsen et al.46

In summary, the present analysis has provided potential new insights into mechanisms underlying atherosclerosis and its clinical sequelae. The results of this investigation suggest a possible link between several core human biological processes and CAD, including several with and several without a substantial body of previous experimental evidence. Further study of the genes within the highlighted pathways may facilitate the development of novel testable hypotheses that could ultimately improve our understanding of atherosclerosis.

Nonstandard Abbreviations and Acronyms

CAD: coronary artery disease
ECM: extracellular matrix
FDR: false discovery rate
CARDIoGRAM: Coronary ARtery DIsease Genome wide Replication and Meta-analysis
SNP: single-nucleotide polymorphism
GSEA: gene-set enrichment analysis
GWAS: genome-wide association study


From the Program in Cardiovascular and Metabolic Disorders (S.G.) and Centre for Computational Biology (S.G.), Duke-NUS Graduate Medical School, Singapore, Singapore; Department of Cardiovascular and Metabolic Research, Biomedical Biotechnology Research Institute, North Carolina Central University, Durham (S.G., J.V.); Department of Cardiovascular Sciences, University of Leicester, Glenfield Hospital, Leicester, UK (C.P.N., N.J.S.); Institut für Integrative und Experimentelle Genomik (IIEG), Universität zu Lübeck, Lübeck, Germany (C.W., J.E.); DZHK (German Research Centre for Cardiovascular Research), Partner Site Hamburg, Kiel, Lübeck, Germany (C.W., J.E.); Broad Institute of Harvard and MIT, Cambridge, MA (A.V.S., S.K.); Department of Integrative Biology and Physiology, University of California, Los Angeles (V.-P.M., X.Y.); Atherogenomics Laboratory (M.N., R.M.P.), John and Jennifer Ruddy Canadian Cardiovascular Research Centre (A.F.R.S., R.M.P.), and Division of Cardiology (R.M.P.), University of Ottawa Heart Institute, Ottawa, Canada; Clinic for General and Interventional Cardiology, University Heart Center Hamburg, Germany (S.B.); National Heart, Lung, and Blood Institute’s Framingham Heart Study, MA (C.O.D.); Mannheim Institute of Public Health, Social and Preventive Medicine, University of Heidelberg, Germany (W.M.); Synlab Academy, Mannheim, Germany (W.M.); Science Center, Tampere University Hospital, Tampere, Finland (R.L.); Cardiovascular Research Institute, Washington Hospital Center (S.E.E.); Department of Medicine, Duke University Medical Center, Durham, NC (S.H.S., C.B.G.); Cleveland Clinic, OH (S.L.H.); Cardiology Division, Center for Human Genetic Research (S.K.) 
and Cardiovascular Research Center (S.K.), Massachusetts General Hospital, Harvard Medical School, Boston; Cardiovascular Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia (M.P.R.); Department of Medicine, Stanford University School of Medicine, CA (T.Q., T.L.A.); National Institute for Health Research (NIHR) Leicester Cardiovascular Biomedical Research Unit, Glenfield Hospital, Leicester, United Kingdom (N.J.S.); Deutsches Herzzentrum München, Technische Universität München, Munich, Germany (H.S.); and DZHK (German Research Centre for Cardiovascular Research), Partner Site Munich Heart Alliance, München, Germany (C.P.N., H.S.).


We thank all the individuals who contributed to these multicentered studies. A full list of the investigators who are part of the CARDIoGRAM Consortium is provided in the S1 Material of the online-only Data Supplement. A full list of the investigators who contributed to the generation of the Wellcome Trust data is available from http://www.wtccc.org.uk.


For the author affiliations, please see the Appendix section.

*These authors contributed equally to this article.

The online-only Data Supplement is available with this article at http://atvb.ahajournals.org/lookup/suppl/doi:10.1161/ATVBAHA.115.305513/-/DC1.

Correspondence to Ruth McPherson, MD, PhD, Division of Cardiology, University of Ottawa Heart Institute, 40 Ruskin St-H4203, Ottawa, Canada K1Y 4W7. E-mail ; or Sujoy Ghosh, PhD, Duke-NUS Graduate Medical School, Center for Computational Biology, 8 College Rd, Singapore 169857, Singapore. E-mail ; or Themistocles Assimes, MD, PhD, Stanford University School of Medicine, Population Health Sciences Bldg, Suite 300, 1070 Arastradero Rd, Palo Alto, CA 94304. E-mail


  • 1. Schunkert H, König IR, Kathiresan S, et al; Cardiogenics; CARDIoGRAM Consortium. Large-scale association analysis identifies 13 new susceptibility loci for coronary artery disease. Nat Genet. 2011; 43:333–338. doi: 10.1038/ng.784.
  • 2. Deloukas P, Kanoni S, Willenborg C, et al. Large-scale association analysis identifies new risk loci for coronary artery disease. Nat Genet. 2012; 45:25–33.
  • 3. Dickson SP, Wang K, Krantz I, Hakonarson H, Goldstein DB. Rare variants create synthetic genome-wide associations. PLoS Biol. 2010; 8:e1000294. doi: 10.1371/journal.pbio.1000294.
  • 4. Cirulli ET, Goldstein DB. Uncovering the roles of rare variants in common disease through whole-genome sequencing. Nat Rev Genet. 2010; 11:415–425. doi: 10.1038/nrg2779.
  • 5. Visscher PM, Brown MA, McCarthy MI, Yang J. Five years of GWAS discovery. Am J Hum Genet. 2012; 90:7–24. doi: 10.1016/j.ajhg.2011.11.029.
  • 6. Yang J, Benyamin B, McEvoy BP, Gordon S, Henders AK, Nyholt DR, Madden PA, Heath AC, Martin NG, Montgomery GW, Goddard ME, Visscher PM. Common SNPs explain a large proportion of the heritability for human height. Nat Genet. 2010; 42:565–569. doi: 10.1038/ng.608.
  • 7. Stahl EA, Wegmann D, Trynka G, et al; Diabetes Genetics Replication and Meta-analysis Consortium; Myocardial Infarction Genetics Consortium. Bayesian inference analyses of the polygenic architecture of rheumatoid arthritis. Nat Genet. 2012; 44:483–489. doi: 10.1038/ng.2232.
  • 8. Jia P, Wang L, Fanous AH, Pato CN, Edwards TL, Zhao Z; International Schizophrenia Consortium. Network-assisted investigation of combined causal signals from genome-wide association studies in schizophrenia. PLoS Comput Biol. 2012; 8:e1002587. doi: 10.1371/journal.pcbi.1002587.
  • 9. Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES, Mesirov JP. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A. 2005; 102:15545–15550. doi: 10.1073/pnas.0506580102.
  • 10. Wang K, Li M, Bucan M. Pathway-based approaches for analysis of genomewide association studies. Am J Hum Genet. 2007; 81:1278–1283. doi: 10.1086/522374.
  • 11. Wang K, Li M, Hakonarson H. Analysing biological pathways in genome-wide association studies. Nat Rev Genet. 2010; 11:843–854. doi: 10.1038/nrg2884.
  • 12. Wang L, Jia P, Wolfinger RD, Chen X, Zhao Z. Gene set analysis of genome-wide association studies: methodological issues and perspectives. Genomics. 2011; 98:1–8. doi: 10.1016/j.ygeno.2011.04.006.
  • 13. Segrè AV; DIAGRAM Consortium; MAGIC investigators, Groop L, Mootha VK, Daly MJ, Altshuler D. Common inherited variation in mitochondrial genes is not enriched for associations with type 2 diabetes or related glycemic traits. PLoS Genet. 2010; 6:e1001058. doi: 10.1371/journal.pgen.1001058.
  • 14. Nam D, Kim J, Kim SY, Kim S. GSA-SNP: a general approach for gene set analysis of polymorphisms. Nucleic Acids Res. 2010; 38(web server issue):W749–W754. doi: 10.1093/nar/gkq428.
  • 15. Zhang K, Cui S, Chang S, Zhang L, Wang J. i-GSEA4GWAS: a web server for identification of pathways/gene sets associated with traits by applying an improved gene set enrichment analysis to genome-wide association study. Nucleic Acids Res. 2010; 38(web server issue):W90–W95. doi: 10.1093/nar/gkq324.
  • 16. Oti M, Brunner HG. The modular nature of genetic diseases. Clin Genet. 2007; 71:1–11. doi: 10.1111/j.1399-0004.2006.00708.x.
  • 17. Feldman I, Rzhetsky A, Vitkup D. Network properties of genes harboring inherited disease mutations. Proc Natl Acad Sci U S A. 2008; 105:4323–4328. doi: 10.1073/pnas.0701722105.
  • 18. Baranzini SE, Galwey NW, Wang J, Khankhanian P, Lindberg R, Pelletier D, Wu W, Uitdehaag BM, Kappos L, Polman CH, Matthews PM, Hauser SL, Gibson RA, Oksenberg JR, Barnes MR; GeneMSA Consortium. Pathway and network-based analysis of genome-wide association studies in multiple sclerosis. Hum Mol Genet. 2009; 18:2078–2090. doi: 10.1093/hmg/ddp120.
  • 19. Lu C, Latourelle J, O’Connor GT, Dupuis J, Kolaczyk ED. Network-guided sparse regression modeling for detection of gene-by-gene interactions. Bioinformatics. 2013; 29:1241–1249. doi: 10.1093/bioinformatics/btt139.
  • 20. Wang L, Matsushita T, Madireddy L, Mousavi P, Baranzini SE. PINBPA: cytoscape app for network analysis of GWAS data. Bioinformatics. 2015; 31:262–264. doi: 10.1093/bioinformatics/btu644.
  • 21. Matthews L, Gopinath G, Gillespie M, et al. Reactome knowledgebase of human biological pathways and processes. Nucleic Acids Res. 2009; 37(database issue):D619–D622. doi: 10.1093/nar/gkn863.
  • 22. Yu H, Kim PM, Sprecher E, Trifonov V, Gerstein M. The importance of bottlenecks in protein networks: correlation with gene essentiality and expression dynamics. PLoS Comput Biol. 2007; 3:e59. doi: 10.1371/journal.pcbi.0030059.
  • 23. Toma I, McCaffrey TA. Transforming growth factor-β and atherosclerosis: interwoven atherogenic and atheroprotective aspects. Cell Tissue Res. 2012; 347:155–175. doi: 10.1007/s00441-011-1189-3.
  • 24. Raines EW. PDGF and cardiovascular disease. Cytokine Growth Factor Rev. 2004; 15:237–254. doi: 10.1016/j.cytogfr.2004.03.004.
  • 25. Katsuda S, Kaji T. Atherosclerosis and extracellular matrix. J Atheroscler Thromb. 2003; 10:267–274.
  • 26. Schmidt EF, Strittmatter SM. The CRMP family of proteins and their role in Sema3A signaling. Adv Exp Med Biol. 2007; 600:1–11. doi: 10.1007/978-0-387-70956-7_1.
  • 27. van Gils JM, Derby MC, Fernandes LR, et al. The neuroimmune guidance cue netrin-1 promotes atherosclerosis by inhibiting the emigration of macrophages from plaques. Nat Immunol. 2012; 13:136–143. doi: 10.1038/ni.2205.
  • 28. Oksala N, Pärssinen J, Seppälä I, Raitoharju E, Kholova I, Ivana K, Hernesniemi J, Lyytikäinen LP, Levula M, Mäkelä KM, Sioris T, Kähönen M, Laaksonen R, Hytönen V, Lehtimäki T. Association of neuroimmune guidance cue netrin-1 and its chemorepulsive receptor UNC5B with atherosclerotic plaque expression signatures and stability in human(s): Tampere Vascular Study (TVS). Circ Cardiovasc Genet. 2013; 6:579–587. doi: 10.1161/CIRCGENETICS.113.000141.
  • 29. Wanschel A, Seibert T, Hewing B, Ramkhelawon B, Ray TD, van Gils JM, Rayner KJ, Feig JE, O’Brien ER, Fisher EA, Moore KJ. Neuroimmune guidance cue Semaphorin 3E is expressed in atherosclerotic plaques and regulates macrophage retention. Arterioscler Thromb Vasc Biol. 2013; 33:886–893. doi: 10.1161/ATVBAHA.112.300941.
  • 30. van Gils JM, Ramkhelawon B, Fernandes L, Stewart MC, Guo L, Seibert T, Menezes GB, Cara DC, Chow C, Kinane TB, Fisher EA, Balcells M, Alvarez-Leite J, Lacy-Hulbert A, Moore KJ. Endothelial expression of guidance cues in vessel wall homeostasis dysregulation under proatherosclerotic conditions. Arterioscler Thromb Vasc Biol. 2013; 33:911–919. doi: 10.1161/ATVBAHA.112.301155.
  • 31. Hansson GK, Hermansson A. The immune system in atherosclerosis. Nat Immunol. 2011; 12:204–212. doi: 10.1038/ni.2001.
  • 32. Serini G, Valdembri D, Zanivan S, Morterra G, Burkhardt C, Caccavari F, Zammataro L, Primo L, Tamagnone L, Logan M, Tessier-Lavigne M, Taniguchi M, Püschel AW, Bussolino F. Class 3 semaphorins control vascular morphogenesis by inhibiting integrin function. Nature. 2003; 424:391–397. doi: 10.1038/nature01784.
  • 33. Lanahan A, Zhang X, Fantin A, Zhuang Z, Rivera-Molina F, Speichinger K, Prahst C, Zhang J, Wang Y, Davis G, Toomre D, Ruhrberg C, Simons M. The neuropilin 1 cytoplasmic domain is required for VEGF-A-dependent arteriogenesis. Dev Cell. 2013; 25:156–168. doi: 10.1016/j.devcel.2013.03.019.
  • 34. Ji JD, Park-Min KH, Ivashkiv LB. Expression and function of semaphorin 3A and its receptors in human monocyte-derived macrophages. Hum Immunol. 2009; 70:211–217. doi: 10.1016/j.humimm.2009.01.026.
  • 35. Bauer RC, Stylianou IM, Rader DJ. Functional validation of new pathways in lipoprotein metabolism identified by human genetics. Curr Opin Lipidol. 2011; 22:123–128. doi: 10.1097/MOL.0b013e32834469b3.
  • 36. Musunuru K, Strong A, Frank-Kamenetsky M, et al. From noncoding variant to phenotype via SORT1 at the 1p13 cholesterol locus. Nature. 2010; 466:714–719. doi: 10.1038/nature09266.
  • 37. Kjolby M, Andersen OM, Breiderhoff T, Fjorback AW, Pedersen KM, Madsen P, Jansen P, Heeren J, Willnow TE, Nykjaer A. Sort1, encoded by the cardiovascular risk locus 1p13.3, is a regulator of hepatic lipoprotein export. Cell Metab. 2010; 12:213–223. doi: 10.1016/j.cmet.2010.08.006.
  • 38. Burkhardt R, Toh SA, Lagor WR, Birkeland A, Levin M, Li X, Robblee M, Fedorov VD, Yamamoto M, Satoh T, Akira S, Kathiresan S, Breslow JL, Rader DJ. Trib1 is a lipid- and myocardial infarction-associated gene that regulates hepatic lipogenesis and VLDL production in mice. J Clin Invest. 2010; 120:4410–4414. doi: 10.1172/JCI44213.
  • 39. Papassotiropoulos A, Gerhards C, Heck A, et al. Human genome-guided identification of memory-modulating drugs. Proc Natl Acad Sci U S A. 2013; 110:E4369–E4374. doi: 10.1073/pnas.1314478110.
  • 40. Khatri P, Sirota M, Butte AJ. Ten years of pathway analysis: current approaches and outstanding challenges. PLoS Comput Biol. 2012; 8:e1002375. doi: 10.1371/journal.pcbi.1002375.
  • 41. Erbilgin A, Civelek M, Romanoski CE, Pan C, Hagopian R, Berliner JA, Lusis AJ. Identification of CAD candidate genes in GWAS loci and their expression in vascular cells. J Lipid Res. 2013; 54:1894–1905. doi: 10.1194/jlr.M037085.
  • 42. Mäkinen VP, Civelek M, Meng Q, et al; Coronary ARtery DIsease Genome-Wide Replication And Meta-Analysis (CARDIoGRAM) Consortium. Integrative genomics reveals novel molecular pathways and gene networks for coronary artery disease. PLoS Genet. 2014; 10:e1004502. doi: 10.1371/journal.pgen.1004502.
  • 43. Wang K, Zhang H, Kugathasan S, et al. Diverse genome-wide association studies associate the IL12/IL23 pathway with Crohn Disease. Am J Hum Genet. 2009; 84:399–405. doi: 10.1016/j.ajhg.2009.01.026.
  • 44. Holmans P, Moskvina V, Jones L, et al; International Parkinson’s Disease Genomics Consortium. A pathway-based analysis provides additional support for an immune-related genetic susceptibility to Parkinson’s disease. Hum Mol Genet. 2013; 22:1039–1049. doi: 10.1093/hmg/dds492.
  • 45. Rozenblatt-Rosen O, Deo RC, Padi M, et al. Interpreting cancer genomes using systematic host network perturbations by tumour virus proteins. Nature. 2012; 487:491–495. doi: 10.1038/nature11288.
  • 46. Olsen C, Fleming K, Prendergast N, Rubio R, Emmert-Streib F, Bontempi G, Haibe-Kains B, Quackenbush J. Inference and validation of predictive gene networks from biomedical literature and gene expression data. Genomics. 2014; 103:329–336. doi: 10.1016/j.ygeno.2014.03.004.


Genome-wide association studies have identified >45 loci associated with coronary artery disease (CAD) risk but provide limited insight into causal mechanisms. Furthermore, the identified signals explain little more than 10% of the predicted heritability of CAD. Part of this missing heritability is likely because many more common variants are linked to CAD but have not achieved genome-wide significance in genome-wide association studies because of small effect size or lower allele frequency and insufficient sample size. However, even weakly associated variants may provide important information about the biological basis of disease when such variants cluster within a common functional module or pathway. By integrating genome-wide association study data with extensive databases on core biological processes, we have identified novel biological pathways relevant to the pathogenesis of CAD. These findings provide new insight into how genetic variation, interpreted in the context of biological processes and functional interactions among genes, may help define the genetic architecture of CAD.