Epigenomic and Transcriptomic Dynamics During Human Heart Organogenesis
There is growing evidence that common variants and rare sequence alterations in regulatory sequences can result in birth defects or predisposition to disease. Congenital heart defects are the most common birth defect and have a clear genetic component, yet only a third of cases can be attributed to structural variation in the genome or a mutation in a gene. The remaining unknown cases could be caused by alterations in regulatory sequences.
Identify regulatory sequences and gene expression networks that are active during organogenesis of the human heart. Determine whether these sites and networks are enriched for disease-relevant genes and associated genetic variation.
Methods and Results:
We characterized ChromHMM (chromatin state) and gene expression dynamics during human heart organogenesis. We profiled 7 histone modifications in embryonic hearts from each of 9 distinct Carnegie stages (13–14, 16–21, and 23), annotated chromatin states, and compared these maps to over 100 human tissues and cell types. We also generated RNA-sequencing data, performed differential expression, and constructed weighted gene coexpression networks. We identified 177 412 heart enhancers; 12 395 had not been previously annotated as strong enhancers. We identified 92% of all functionally validated heart-positive enhancers (n=281; 7.5× enrichment; P<2.2×10−16). Integration of these data demonstrated novel heart enhancers are enriched near genes expressed more strongly in cardiac tissue and are enriched for variants associated with ECG measures and atrial fibrillation. Our gene expression network analysis identified gene modules strongly enriched for heart-related functions, regulatory control by heart-specific enhancers, and putative disease genes.
Well-connected hub genes with heart-specific expression targeted by embryonic heart-specific enhancers are likely disease candidates. Our functional annotations will allow for better interpretation of whole genome sequencing data in the large number of patients affected by congenital heart defects.
Editorial, see p 1156
In This Issue, see p 1119
Meet the First Author, see p 1120
Congenital heart defects (CHDs) are among the most common birth defects, affecting ≈1% of live births worldwide and remain the leading cause of infant mortality in developed nations.1–3 Despite evidence suggesting a strong genetic component, ≈60% of CHD cases remain unexplained.4,5 Environmental causes, familial forms of CHD, and de novo damaging mutations in over 400 genes each explain <10% of cases, while chromosomal abnormalities including aneuploidies and large structural variations are implicated in over 20% of CHD.6–10 These findings suggest that CHD arises through combinations of otherwise benign mutations in a large number of genes, unappreciated genetic-environmental interactions, or disruption of regulatory sequences that control heart development. There are indications that regulatory regions are causative for CHDs. Patients homozygous for rare variation in a heart-specific regulatory sequence controlling the cardiac TF (transcription factor) TBX5 (T-box TF 5) have isolated CHD.11 However, the extent to which developmental regulatory sequences could systematically contribute to CHD has not been explored.
Recent analysis of gene expression from the heart at multiple stages of human development indicates the dynamics of gene expression occur primarily during the embryonic period of human development or the first 8 postconception weeks, aligning well with known structural and functional changes in the developing heart.5,12,13 Due to the limited nature of embryonic and fetal tissues and difficulty with performing functional genomics on small amounts of primary material, several groups have used directed differentiation approaches in human and mouse stem cells to model cardiogenesis, providing insight into global binding of multiple cardiac TFs, conserved cardiac regulatory networks, and cooperativity among these specific TFs.14–18 However, these data have not been systematically compared with human cardiomyocytes, and it remains unclear how faithfully these systems reflect normal human heart development at the gene regulatory level. Regulatory annotation of the human genome from the ENCODE Project and Roadmap Epigenome includes primary tissues of the human heart. However, these data are exclusively from fetal and adult stages after these major molecular and morphological changes have occurred. These 2 large consortia used chromatin immunoprecipitation followed by next-generation sequencing (ChIP-seq) of several histone modifications in combination with machine learning approaches in over 100 tissues and cell types.19,20 This approach resulted in the learning of several hidden Markov models of ChromHMM (chromatin state) using the combined tissue dataset.
The number of states, and, therefore, the complexity of the model, depends upon the histone modifications used. A 15-state model can be learned from the histone modifications H3K4me1, H3K4me3, H3K9me3, H3K27me3, and H3K36me3 (frequently associated with active enhancers, active promoters, stable heterochromatin, facultative heterochromatin and active transcription, respectively). When H3K27ac (associated with active chromatin with functions ranging from active transcription to active enhancers) is included, an increasingly complex model of 18 different chromatin states can be called. The addition of 5 other histone modifications (H2A.Z [histone 2A variant Z], H3K4me2, H3K9ac, H3K79me2, H4K20me1) and DNA accessibility (DNAse hypersensitivity) provide the ability to call 25 chromatin states. Through the use of such an ensemble, multitissue approach, various types of active and repressed features are mapped onto the genome and cell type–specific or stage-specific enhancers can be readily distinguished.
Here, we describe the systematic characterization of chromatin states from human embryonic hearts at 4 to 8 postconception weeks using hidden Markov models of chromatin state in a manner that allows for direct comparison with over 100 tissues profiled by Roadmap Epigenome. We are able to confirm previously validated in vivo heart enhancers and identify thousands of previously unknown embryonic heart–specific regulatory sequences. Integration of chromatin and gene expression through coexpression analysis identified groups of genes that are coordinately expressed during early heart development and likely regulated by these heart-specific enhancers. Our multilayered analysis also predicts new genes to be involved in CHD that warrant further study. These resources are available from the Genome Expression Omibus, as a public track hub on the University of California Santa Cruz (UCSC) Genome Browser or directly from our laboratory website (https://cotney.research.uchc.edu/heart).
The authors declare that anonymized data have been made publicly available at the Gene Expression Omnibus and can be accessed at via accession numbers GSE137731 (ChIP-seq) and GSE138799 (RNA sequencing [RNA-seq]). Because of the sensitive nature of the raw sequencing data collected for this study, requests to access those data from qualified researchers trained in human subject confidentiality protocols and approved by an institutional review board may be sent to the corresponding author. All other supporting data and materials presented within this article and in the Data Supplement are available from the corresponding author upon reasonable request.
Human Tissue Samples
Use of human embryonic tissue was reviewed and approved by the Human Subjects Protection Program at UConn Health (UCHC 710-2-13-14-03). Human embryonic heart tissue was collected, staged, and provided by the Joint Medical Research Council (MRC)/Wellcome Trust Human Developmental Biology Resource (www.hdbr.org). Tissues were flash frozen upon collection and stored at −80°C. Hearts were homogenized by mechanical disruption and divided between samples for RNA-seq and ChIP-seq. For ChIP-seq, tissues were fixed by incubation in 1% formaldehyde for 15 minutes at room temperature with agitation before being quenched by addition of 2.5 M glycine to a concentration of 150 mM with rotation/agitation for 10 minutes. For RNA-seq, homogenized tissue was added to Qiazol (Qiagen) and flash frozen.
A detailed description of the methods and analysis is provided in Expanded Materials and Methods in the Data Supplement.
For a complete listing of all software, public datasets, and relevant materials, please see the Major Resources Table in the Data Supplement.
Chromatin State Profiling of Human Embryonic Heart Development
As demonstrated by Roadmap Epigenome, ensemble approaches using many different biochemical markers are required to more completely characterize how the genome is utilized in a given biological condition. To systematically profile chromatin states of the developing human embryonic heart, we first used ChIP-seq using antibodies against 7 histone H3 posttranslational modifications on individual human hearts from 18 embryos spanning the organogenesis phase of heart development (Figure 1A; Online Figure I; Online Table I). Overall, these raw data were of high quality with high correlation between experiments performed with the same antibody across separate individual embryos and generally clustered by marks associated with genome activation (H3K4me1, H3K4me2, H3K4me3, H3K27ac, and H3K36me3) versus repression (H3K9me3 and H3K27me3; Figure 1B; Online Figure IIA). We uniformly processed these data to allow direct comparison of 127 human cell types profiled by Roadmap Epigenome and 21 human developing craniofacial samples. We then used an imputation approach to generate a complete set of 12 global epigenomic signals for each sample using chromatin signal imputation as previously described by Roadmap Epigenome and used by our group for human craniofacial development.19,21 The imputed signals correlated well with primary signals and predominantly clustered by known biological function as was observed in Roadmap Epigenome and human craniofacial epigenomic data (Figure 1B; Online Figure IIB).
Having uniform epigenomic datasets for each heart sample across the developmental series, we then applied the previously generated 15-, 18-, and 25-state models of chromatin activity developed by Roadmap Epigenome to segment the genome into chromatin states. The individual state classifications and color coding for each model are provided in Online Figure III for easy reference. The number of segments identified for each of the 25 chromatin states was similar across all of our 18 samples (Figure 1C). The pattern of chromatin state segments identified in our human embryonic heart samples was similar to that identified in 127 tissues from Roadmap Epigenome, with the one exception of significantly increased numbers of poised promoter segments (state 22 from 25-state model; Figure 1D). These findings suggest our imputation and segmentation approaches generated data globally similar to all other human tissues from Roadmap Epigenome, allowing us to make direct comparisons of chromatin state utilization between tissue types and stages.
Genomic enrichments and biological features of individual chromatin states have been thoroughly explored in adult tissues but have yet to be extensively explored during organogenesis.19,22 Two groups have recently published resources aimed at comprehensive identification of active regulatory sequences in heart development.23,24 However, they have profiled a limited set of active functional marks from tissue after human heart organogenesis has largely completed or applied a novel computational framework tuned for prediction of heart regulatory sequences that has not been extensively compared across human tissues or more generalized functional annotation tools.23–25 While these resources have both shown enrichment for in vivo developmental heart enhancers, they did not characterize other regulatory states such as repression. Therefore, we aimed to understand what the various chromatin states we have annotated mean biologically during heart development. To achieve this, we characterized structural and functional enrichments genome wide for each chromatin state and compared them to regions identified in a previously published compendium of heart enhancers and the EMERGE prediction framework.
We first characterized the positions of each chromatin state segment relative to known functional positions in the genome. We found as expected that segments of the genome annotated as state 1 from the 25-state model (1_TssA) previously classified as active transcription start sites (TSS) were located primarily within 10 base pairs of a TSS based on GENCODE, version 25, annotation of the human genome (Online Figure IVA). Promoter-associated states 2 through 4 (2_PromU, 3_PromD1, and 4_PromD2) and bivalent promoter state 23 (PromBiv) were generally located within 1000 bp of known TSS. The remaining states from 5 to 21 were progressively more distant from known TSS with heterochromatic segment annotations (21_Het) being the most distant. Transcription-associated states (5_Tx5, 6_Tx, 7_Tx3, and 8_TxWk) and regulatory states located in introns of active genes (9_TxReg, 10_TxEnh5, 11_TxEnh3, and 12_TxEnhW) were closer to TSS than other active regulatory states.
Among the 7 putative enhancer states (13_EnhA1, 14_EnhA2, 15_EnhAF, 16_EnhW1, 17_EnhW2, 18_EnhAc, and 19_Dnase), those segments of the genome annotated to be strong enhancer states (states 13 through 15) were closer to TSS than weak enhancer states (states 16, 17, and 19) with state 13 being systematically the closest to known TSS. State 18 (18_EnhAc), which was defined primarily by H3K27ac signals alone, showed similar distributions of distances relative to TSS as other strong enhancer states (Online Figure IVA). Regions assembled in a compendium of heart enhancers by Dickel et al showed similar distributions of distance to TSS as strong enhancer states 14, 15, and 18 but were more distant than state 13. These states were driven in the original Roadmap model by H3K27ac and H3K4me1 with the Dickel resource being heavily biased toward H3K27ac and its depositor P300. The strongest enhancer state, state 13, was identified in Roadmap Epigenome by many more features including the presence of H3K27ac, H3K4me1, H3K4me2, and DNase and conversely the absence of H3K9me3, H3K27me3, H3K36me3, and H3K79me2, which are uniquely captured in our study. Interestingly, regulatory regions predicted by EMERGE showed a bimodal distribution with a significant fraction of regions located directly overlapping known TSS (Online Figure IVA).
When we analyzed the size of segments annotated as each chromatin state, we saw similar trends across all states with median values typically around 1 kb. Larger distributions were observed at heterochromatic regions (21_Het) and portions of the genome with no detectable activity or quiescent (25_Quies). When we analyzed the Dickel compendium and regions annotated by EMERGE, we observed significant excursions from the values we observed for our chromatin state segments. The Dickel compendium identified uniformly larger regions than enhancer chromatin states identified from our data, while most regions identified by EMERGE were significantly smaller (Online Figure IVB).
Sequence conservation has been frequently used to identify functional regions of the genome, particularly those with gene regulatory function in intronic or intergenic spaces of vertebrate genomes.26–29 Regions associated with active TSS or assigned active or bivalent promoter states were most conserved based on multispecies alignments from 100 vertebrate species (Online Figure IVC).30 Heterochromatic and polycomb repressed states were the least conserved among the 25 chromatin states. Among putative enhancer chromatin states, strong enhancer state 13, weak enhancer state 16, and DNase accessible regions state 19 were the most conserved. Median conservation scores of regions identified by Dickel et al were comparable to weak enhancer state 16. Notably EMERGE regions showed among the lowest median conservation scores across all categories greater only than enhancer state 15.
While conservation is an indicator of selective pressure across species, a large portion of the human genome does not show significant conservation across all vertebrates and might be uniquely involved in human disease. Several groups have attempted to bridge this knowledge gap and developed machine learning techniques to predict deleteriousness of all 8.6 billion possible single-nucleotide variants in the human genome.31–34 These approaches leverage many different functional genomics datasets, TF-binding site predictions, and sequence conservation, among others and are informative for identifying regulatory variants linked to disease. When we interrogated our chromatin states with 2 of these scoring metrics (Combined Annotation Dependent Depletion [CADD] and Linear Model Inference of Natural Selection From Interspersed Genomically Coherent Elements [LINSIGHT]), states 20 (ZNF_Repeats) and 21 (heterochromatin) had the lowest median scores (Online Figure IVD and IVE). TSS and promoter-associated states 1, 2, 3, and 23 had the highest scores predicted by these methods. Transcribed regions (states 5 through 8) despite being nearer TSS sites had uniformly depressed Combined Annotation Dependent Depletion and Linear Model Inference of Natural Selection From Interspersed Genomically Coherent Elements scores. Strong enhancer state 13 had the highest median Combined Annotation Dependent Depletion and Linear Model Inference of Natural Selection From Interspersed Genomically Coherent Elements scores among distal regulatory chromatin states, significantly higher than all other enhancer states and regions annotated by both Dickel et al and EMERGE (Online Figure IVD and IVE).
Combined, these results strongly corroborate the functional labels assigned to each of the chromatin states applied by Roadmap Epigenome. These systematic analyses revealed many expected distributions for chromatin states relative to known features of the genome and elevated sequence conservation among active regulatory regions. Our results also revealed a number of similarities between our annotations and previous databases of heart regulatory sequences. However, our enhancer chromatin states, particularly the strong categories, are significantly more conserved and harbor significantly more positions predicted to be involved in human disease than either of these 2 catalogs of putative heart regulatory sequences. While indicative of function, these metrics are largely generic and not necessarily related to the heart. Thus, we next aimed to understand the direct relevance of these chromatin state annotations for heart development and disease.
Identification of Novel Human Embryonic Heart Enhancers
Comparison of enhancers across many different tissues has shown them to be far more tissue specific in their activation than promoters.19 To determine whether the enhancer segments from our developing heart chromatin state segmentations are indeed tissue specific, we aimed to examine analogous chromatin state annotations quantitatively across as many tissues as possible. To achieve this, we first assembled all segments identified as an enhancer chromatin state based on the 25-state chromatin model in any tissue profiled by Roadmap Epigenome, this study, previously generated human craniofacial development samples, and chromatin data for nuclei isolated from fetal, infant, and adult cardiomyocytes.19,21,35 In our human embryonic heart samples alone, we identified a total of 177 412 segments that were reproducibly annotated as 1 of 6 putative enhancer states (states 13–18, EnhA1, EnhA2, EnhAF, EnhW1, EnhW2, and EnhAc) in at least 2 samples (Online Table II).
The chromatin state segmentations are powerful tools for genome-wide annotation, but they have limited utility on their own as the quantitative nature of the underlying data is discarded. To overcome this obstacle, we leveraged H3K27ac signals from each of these samples since it has been frequently shown to be highly tissue specific in its distribution and generally associated with enhancer activation.19,36 Due to the overall better performance of normalized, imputed signals relative to primary ChIP-seq data, we extracted imputed H3K27ac P signals from 174 samples at 444 413 enhancer segments across the genome.37 We found the strongest global correlations between related tissue types, such as immune cell types, brain region tissues, and the embryonic heart samples. These data separated largely into adult versus embryonic groups and subsequently by tissue type (Figure 2A; Online Figure VA and VB). The embryonic heart samples clustered well with one another and also showed strong global correlation in H3K27ac signals with the fetal and adult heart tissue samples profiled by Roadmap Epigenome. Interestingly, the isolated cardiomyocyte nuclei formed their own distinct cluster and do not separate into adult versus developing samples. They also showed lower global correlation values with the heart tissues profiled by Roadmap Epigenome suggesting isolation of nuclei had significant effects on chromatin state or organization in those experiments. A tSNE (t-distributed stochastic neighborhood embedding) projection of these data further confirmed these findings (Figure 2A) and showed the distinct nature of the isolated cardiomyocyte H3K27ac data (Online Figure VB). Given the potential for strong batch effects in the cardiomyocyte nuclei data due to isolation and sorting of nuclei before ChIP and the decreased correlation with fetal and adult heart samples, we excluded these samples from downstream analysis. These results confirmed the early developmental stage identity and the cardiac origins of the tissues profiled in an unbiased, global fashion and further demonstrated the tissue-specific nature of the putative enhancer sequences and H3K27ac signals.
To further test the validity of our annotations, particularly the putative enhancer states, we assessed the number of experimentally tested and validated developmental enhancers by the Vista Enhancer Browser captured by each of our chromatin states from the 25-state model (Figure 2B).36 Overall, we identified 92% of all validated heart-positive enhancers (n=281) from the Vista Enhancer Browser, a 7.5-fold enrichment versus active enhancers lacking activity in the heart (heart negative; P<2.2×10−16). When we compared each of the chromatin states ability to specifically identify heart enhancers, we observed as expected that enhancer chromatin states captured a significantly larger fraction of heart-positive enhancers than other chromatin states (Figure 2B). Strong enhancer state 13 showed the greatest fraction of overlap with validated heart enhancers and the greatest fold difference relative to overlap with heart-negative enhancers. The heart specificity of state 13 was higher than all other chromatin states and significantly greater than regulatory regions annotated by EMERGE or Dickel et al.
Interestingly, weak enhancer states 16 and 17, DNase accessibility state 19, and regions harboring repressive chromatin marks including poised and bivalent promoters and polycomb repressed states showed an opposite trend in capturing heart-positive versus heart-negative enhancers. This indicated that these 2 weak enhancer states (16 and 17) are not informative for heart regulatory sequences and generally should be interpreted with caution in other tissues where large numbers of validated enhancers do not yet exist. When we compared the performance of chromatin states identified by the 15- and 18-state models, we found decreasing levels of specificity as number of states decreased (Online Figure VIA and VIB). Strong enhancer segments identified by the 18-state model showed slightly higher levels of specificity versus EMERGE and Dickel et al (Online Figure VIB). Enhancers identified by the 15-state model performed the poorest by this metric with EMERGE showing significantly higher specificity (Online Figure VIA).
These results suggest the 25-state chromatin model is best able to identify heart-specific regulatory sequences, particularly state 13 enhancers. However, this analysis does not take into account the ranking of enhancers for heart specificity provided by both Dickel et al and EMERGE. It is thus possible that higher ranking regulatory sequences in either resource may be better able to identify true heart positives. To test whether this was indeed the case, we measured the ability of strong enhancer states from the 25-state model to recover heart-positive versus heart-negative enhancers compared with performance of ranked lists from both EMERGE and Dickel et al. Area under the curve for receiver operating characteristic curves revealed consistently higher area under the curve for enhancers identified by the 25-state model across all time points versus EMERGE with the exception of the earliest Carnegie Stage (CS) 13 samples. Enhancers identified by Dickel et al had the lowest area under the curve values in this analysis (Online Figure VIIC).
When we directly compared the regions identified by our chromatin states and those by EMERGE, we found the most significant overlaps of EMERGE peaks with active TSS and active promoter annotations (Online Figure VIIA). The EMERGE signal was overall highly enriched near TSS, and subsequently, the active TSS and promoter states (states 1–4) had the highest EMERGE scores (Online Figure VIIB). When we compared the overlap of all chromatin state segments with the Dickel compendium, we found much less overlap with active TSS states and higher percentages of sequences that were annotated to be weak enhancer states or even quiescent in human embryonic development (Online Figure VIIIA). Together, these results suggest that the highest scoring EMERGE peaks are concentrated close to known TSS and the 25-state chromatin model is better able to identify tissue-specific regulatory sequences than both EMERGE and Dickel compendium.
While these results suggest that our chromatin state segmentations are capable of identifying sequences with heart regulatory capacity, the Vista Enhancer Database has been largely constructed by testing elements identified with H3K27ac deposition or P300 binding. The Dickel compendium is based almost entirely on these 2 marks, and the EMERGE framework has been tuned to find positives from this database. To determine whether our chromatin state segmentations are capable of annotating regulatory sequences identified by different means, we turned to a large set of sequences characterized by binding of 7 cardiac TFs in fetal and adult mouse hearts.18 This work demonstrated that regions bound by ≥5 TFs showed significantly more activity than regions with H3K27ac signal but lacking TF binding when systematically tested using a massively parallel reporter assay (MPRA).
When we interrogated regions tested by MPRA that could be identified in the human genome, we found the highest median RNA-to-DNA ratios for enhancer state segments in our 25-state segmentations. Specifically, we found significantly higher MPRA activity relative to negative controls in strong enhancer states 14, 15, and 18 (Online Figure IVF). Weak enhancer state MPRA signals were not significantly different from negative controls. Neither the Dickel compendium nor EMERGE regions showed significantly different MPRA signals from negative controls. These findings indicate our chromatin state segmentations are capable of identifying regions that are driven by combinatorial TF binding and not necessarily by H3K27ac.
Thus far, our analyses have indicated the strong enhancer states (states 13–15 and 18) are the most specific for developing heart. We reasoned that segments annotated as strong enhancer states in the developing heart but not so in other tissues might reveal previously unknown heart-relevant gene regulatory information. When we compared our full set of strong embryonic heart enhancer segments with analogous annotations from all tissues in Roadmap Epigenome, we identified 12 359 segments that had not been previously annotated as strong enhancer chromatin states. We refer to these putative enhancers hereafter as embryonic heart–specific enhancer segments (EHEs; Online Table II). When we compared these with regions identified in the Dickel compendium with prenatal biased scoring enhancers (enhancers with a scoring ratio >2 prenatal/postnatal) or enhancers in the Dickel compendium identified in the single human fetal sample, we observed small but significant sharing of annotations (Online Figure VIIIB through VIIIE). Combined, these results indicated that the vast majority of our putative EHEs (75.6%) have never been previously identified in any other human tissue or stages of heart development.
Unfortunately, only a handful of the putative EHEs have been functionally tested, making it difficult to assess their relevance for heart development from an in vivo perspective. While such putative enhancers might be novel, we reasoned they could potentially target genes known to be involved in heart development or function. To address this, we assigned EHEs to the single nearest gene within 1 Mb using the Genomic Regions Enrichment of Annotations Tool and found significant enrichment of biological processes related to heart development and morphogenesis (Figure 2C; Online Table III). The associated mouse phenotypes are heavily fortified with those related to abnormal development, morphology, size, and function of the heart. Additionally, the enriched molecular functions contain multiple terms related to microfibril and tubulin binding along with voltage-gated channel activity in the atrioventricular node.
We next analyzed the sequence content of the EHEs for enrichment of TF-binding sites using Hypergeometric Optimization of Motif Enrichment (Figure 2D; Online Table IV). We identified significantly enriched motifs that matched binding sites of TFs involved in heart development such as the GATA family, the MEF2 (myocyte enhancer factor-2) family, and TBX20 (T-box TF 20) among others (Figure 2D, top).38,39 When we performed de novo motif enrichment using Hypergeometric Optimization of Motif Enrichment, we saw enrichment of the same families of TFs with the addition of factors also known to contribute to heart development, such as NKX2-5 (NK2 homeobox 5) and SMAD (SMA and mothers against decapentaplegic gene family) signaling (Figure 2D, bottom). Through this global analysis, we have identified TFs likely involved in human embryonic heart enhancer activation. Functional and motif enrichment within these EHEs identified from this multitissue, unbiased analysis demonstrated the power of these approaches and suggested that many of the novel sequences we have identified are bona fide enhancers and likely important for normal heart development.
Differential Motif Utilization of Embryonic Heart Enhancers Across Development
Having demonstrated enrichment of heart-related TF-binding sites in putative EHEs, we wondered whether this was a general trend across all of heart development or if there were temporal shifts in enhancer activation modulated by different TFs during the embryonic period of heart development. To simplify our analysis, we consolidated the developmental time course into 3 general periods: early (CS13), mid (CS16–17), and late (CS23) development (Figure 3A). We identified differentially activated putative enhancer segments across this trajectory by comparing H3K27ac and H3K4me2 signals at all reproducible enhancer segments between each of the 3 stages of embryonic heart development (Figure 3B and 3C; Online Figure IX).
For putative enhancer segments differentially activated based on H3K27ac signals, we observed the greatest differences in motif enrichment between early and later stages of development (Figure 3D). Putative enhancer segments active early were specifically enriched for pluripotency-related TFs like SOX2 and OCT4 and multiple members of the KLF (Kruppel-like factor) and Forkhead families of TFs. Motifs enriched in late putative enhancer segments included many zinc finger–containing TFs and multiple members of the T-box, GATA, and PAX (paired box) families of TFs. Notably, enhancer segments more strongly active in the mid period of heart embryonic development based on H3K4me2 signals showed the most pronounced enrichment of TF motifs (Figure 3E). Many of the same TF motifs enriched in early and late putative enhancer segments based on H3K27ac were shifted to enrichment in the mid period, suggesting dynamics in TF utilization related to chromatin state.
Based on these differences in differentially activated putative enhancer segments, we hypothesized that the location of these segments and subsequently the genes that they control might uncover distinct patterns of biological pathway utilization during this developmental series. We assigned differentially activated putative enhancer segments from each comparison to the nearest gene and identified the most significantly enriched gene ontology terms (Figure 3F). In all comparisons, terms associated with cardiac chamber and septum morphogenesis and development were highly enriched. Genes near putative enhancer segments most strongly active during the early period were uniquely enriched for pathways related to bone growth and fibroblast growth factor (FGF) receptor signaling. Terms associated with cardiomyocyte differentiation, muscle cell development, and muscle cell contraction were observed for genes assigned putative enhancers more strongly active in the late period. These results agree with known dynamics in relative ratios of cardiomyocytes versus other cell types during these different developmental periods.40–42 Our bulk tissue experiments are incapable of disentangling such cell type–specific regulatory element utilization. These differentially active putative enhancer lists are likely enriched for such sequences but require single-cell chromatin accessibility methods to confirm.
Identification of Embryonic Heart–Specific Super Enhancers and Long-Range Chromatin Interactions
To better understand the role these putative enhancer segments might play in cardiac-related diseases, we started by interrogating the EHEs for known heart-related loci. Defects in the cardiac homeobox TF NKX2-5 have been implicated in many different CHDs including conotruncal malformations, septal defects, and atrioventricular conduction block.43–47 Most human patients with NKX2-5–associated CHD have heterozygous loss-of-function mutations, and haploinsufficient mice have significantly depleted protein levels in the heart.48 Depletion or complete knockout of NKX2-5 results in dysregulation of many downstream target genes in mice and human cardiomyocytes, suggesting it is a master regulator of cardiac development.48,49 Therefore, identifying regulatory sequences that control NKX2-5 expression specifically in the developing heart could be valuable information for understanding the unknown genetic causes of CHDs.
When we inspected the genomic locus containing this gene, we found it was surrounded by a plethora of strong enhancer state segments including an EHE immediately downstream of the coding exons (Figure 4A). There are a number of strong enhancer segments identified uniformly from CS14 to CS23 ≈50 kb upstream that are largely repressed in fetal and adult hearts. These regions are particularly interesting as they cannot be readily identified through sequence conservation based on comparisons of 100 vertebrate genomes (Figure 4A). Moreover, these coordinately activated strong enhancer segments are predicted as an embryonic heart super enhancer region that encompasses over 200 kbs surrounding NKX2-5 (n=4215; Online Table II). This is ≈4× larger than any other super enhancer annotation for this region based on fetal or adult human heart samples. Such regions have been associated with tissue-specification loci, further reinforcing the central role NKX2-5 plays in heart development and the novel information our resource has identified.50–52
We then inspected the locus harboring SCN5A, which has been implicated in multiple cardiac diseases and specifically known to cause ≈20% to 30% of cases of Brugada syndrome (Online Figure XA).53,54 Here, we observed several clusters of strong enhancer segments upstream, within, and downstream of the gene in heart tissues at multiple stages (adult, fetal, and embryonic tissue). The downstream cluster of strong enhancer segments has been previously annotated as a super enhancer region in mouse cardiomyocytes and deletions had significant effects on SNC5A expression in the heart.55 Our analysis confirmed this finding but also annotated a super enhancer that encompassed the strong enhancer segment cluster upstream of the gene. A strong enhancer segment in the 16th intron completely encompassed a validated enhancer with specific activity in the mouse embryonic heart. Additionally, a 13-kb linkage disequilibirium (LD) block (R2≥0.95, D′=1, 1000G CEU) that contains multiple variants associated with cardiac phenotype–related variants overlaps this tested enhancer region and adjacent loci (Online Figure XA, dashed box). These variants included rs41312411 associated with establishment of resting heart rate and P-wave duration, rs3922844 associated with establishment of ECG traits and measures, and rs11708996 associated with Brugada syndrome.56–58
Variants in potential regulatory sequences across this entire locus have been tested for effects on enhancer activity in cultured cardiomyocytes.59 When we overlaid in vitro reporter assay data for putative regulatory sequences harboring alternative alleles on our chromatin state segmentations, we generally observed the strongest effects for sequences overlapping strong putative enhancer segments (Online Figure XA). Sequences that overlapped quiescent regions in our segmentations had minimal effects on reporter gene expression indicating they likely have limited regulatory capacity in vivo. In addition to these well studied loci, we found that other genes important for normal cardiac function and development, such as HAND2 and MYOCD, were within super enhancers (Online Figure XB and XC).
Having demonstrated that genes important for normal cardiac function reside within super enhancers and EHEs are enriched for heart-relevant biology, we reasoned heart-specific super enhancers could also identify important, previously unknown regulatory landscapes or important cardiac genes. When we compared super enhancer regions globally between the embryonic heart and analogous annotations from many different tissues using dbSUPER, we identified 1611 human embryonic heart–specific super enhancers (Online Table II).60 These regions were strongly enriched for genes related to transcription regulation or roles in cell junctions (Online Table II). One such example is the cardiac T-box TF TBX20, which is required for cardiomyocyte proliferation in mice and linked to dilated cardiomyopathy in humans.61–63 The large noncoding region adjacent to this gene contains 60 putative enhancers, nearly one-third are EHEs, and multiple heart-positive in vivo enhancers (Figure 4B). The specific nature of our annotations is readily apparent at this location with the distal noncoding region, and the sequences surrounding the TBX20 TSS are sparsely annotated across 127 tissues and cell types in Roadmap Epigenome.
Another putative embryonic heart–specific super enhancer of note is ≈200 kb in length and resides in the large noncoding region upstream of gap junction protein GJA1. Sites throughout this ≈1-Mb region form long-range interactions with GJA1 in human induced pluripotent stem cell–derived cardiomyocytes.64 Deletion of heart-specific enhancers of Gja1 in mice is sufficient to decrease its expression, which has in turn been previously linked to arrhythmias.25,65 This set of embryonic heart–specific super enhancers includes many additional loci that are not currently known to play a role in cardiac development making them good candidates for future study.
While the examples above demonstrated cis-regulatory landscapes surrounding a single gene, as indicated for enhancers of GJA1, such regulatory sequences can interact with their targets over long distances through chromatin looping. Such loops can be difficult to predict in silico, and given the tissue-specific nature of enhancers, appropriate tissues or cell types have to be utilized to identify biologically relevant interactions. Although the developing heart is made up of many different cell types, recent single-cell RNA-seq analysis of CS16 hearts revealed a significant proportion of cardiomyocytes.66 We, therefore, hypothesized that interaction data from cardiomyocytes might allow us to better understand the physical relationship between our chromatin state segmentations and target genes in a cardiac relevant context. Thus we integrated our annotations with previously published high-resolution promoter capture Hi-C data from induced pluripotent stem cell–derived cardiomyocytes.64 We overlapped the distal anchor points from cardiomyocytes with the functional annotations from our human embryonic heart samples using Roadmap adult brain samples as a control. We then calculated the fold enrichment of interactions for each chromatin state (Figure 4C). The largest enrichments in distal anchor points were in strong enhancer and transcription regulatory states (specifically states 9–11 and 13–15) for both embryonic heart and brain. However, the largest degree of specificity between embryonic heart versus brain in these states was identified for strong enhancer states 13 and 14. Little-to-no significant change in fold enrichment of interactions between these 2 tissues was seen in poised, repressed, or quiescent states (states 22–25). Overall, these findings confirmed many groups’ observations that strong, tissue-specific enhancers interact with their target gene promoters in a tissue-specific fashion and further demonstrated the relevance of our annotations for human embryonic heart development.67,68
Systematic Enrichment of Heart Phenotype and Defect-Associated Variants in Embryonic Heart Enhancers
Since we observed overlap of putative enhancer segments with disease-associated variants at known cardiac disease–related genes, we set out to determine whether this finding was generalizable across all strong putative enhancer segments regardless of their proximity to a known heart gene. We compiled variants significantly associated with variation in normal heart phenotypes and CHDs from the genome-wide association study (GWAS) catalog. We then assessed enrichment of these variants in strong enhancer state annotations from all tissues profiled by Roadmap Epigenome and the developing human heart. We observed significant enrichment of variants associated with systolic blood pressure, electrocardiograph traits and measures, resting heart rate, QRS characteristics, and QT interval in strong enhancer segments identified in most embryonic heart samples (Figure 5A through 5D; Online Figure XIB). The earliest samples we profiled (CS13) did not show enrichment for any of these traits suggesting that gene regulatory programs influencing these phenotypes are not active until after 4.5 postconception weeks. As a negative control, we interrogated variants associated with Crohn disease that have been previously shown to be enriched in enhancers identified in immune cell types.69 We confirmed this finding and saw minimal enrichment of these variants in strong enhancer segments identified during human heart development (Online Figure XIC). We then turned to common variants associated with CHD incidence (Online Figure XID). Unexpectedly, we found that once we corrected P for multiple testing, no set of tissue putative enhancer segments were significantly enriched.
Recent findings in atrial fibrillation (AF) have suggested that sites accessible during fetal heart development are enriched for variants associated with this disorder.70,72,73 However, the functional nature of these sites is unknown, as significant enrichment in fetal heart was observed only for H3K4me1 peaks—a mark typically associated with poised enhancers.74 This could suggest that enhancers primed in fetal heart development but not yet fully active are important for AF, but it is unclear whether regulatory elements active even earlier or in different chromatin states may play a role. The availability of full summary statistics for this particular disease phenotype allowed us to leverage more systematic, genome-wide analysis of >8 million positions in the genome in a linkage disequilibrium aware fashion.69
To address this issue, we retrieved full summary statistics for a large, multiethnic GWAS for AF and assessed enrichment of associated variants across all 25 chromatin states from all tissues in Roadmap Epigenome, craniofacial development, and embryonic heart development. Our findings confirm enrichment of AF-associated variants in fetal and adult heart active regulatory sequences. Surprisingly, we observed enrichment of AF-associated variants across virtually all embryonic heart strong enhancer states (Figure 5E; Online Figure XIA and XIF). The greatest fold enrichment was observed in strong enhancer segments (Hidden Markov model of chromatin state 13) active at the earliest time points we profiled (CS13; OR, 6.65; P=8.69×10−06). Using this approach, we also observed enrichment for variants associated with resting heart rate, QRS interval, and P-wave duration consistently across strong enhancer states identified in the embryonic human heart (Online Figure XIH through XIJ).
Using full GWAS summary statistics from 2 immune-related diseases (systematic lupus erythematous and Crohn disease, as negative controls), we did not observe enrichment in any embryonic heart enhancer states (Figure 5F; Online Figure XIG through XIK). Conversely, we observed enrichment of lupus-associated variants in active TSS (state 1) and strong enhancer state segments from immune-related cell types (Online Figure XIE).69,75 Surprisingly, when we assessed all chromatin states identified in embryonic heart samples, we found significant enrichment of lupus-associated variants uniformly across polycomb repressed segments (state 24) and bivalent promoter segments (state 23) identified in the embryonic and fetal heart but not adult heart samples (Figure 5F).
Gene Expression Profiling of Embryonic Heart Development
Thus far, we have only analyzed putative regulatory sequences but not discerned what, if any, effects they may have on gene expression. To begin to understand how chromatin state changes during heart development influence gene expression, we profiled the transcriptomes of 3 biological replicates at each of 8 distinct embryonic stages (CS13, 16–21, and 23), largely overlapping the time points profiled for chromatin state. This window of time has the greatest dynamics of gene expression based on comparisons of transcriptomes of developing hearts from multiple species.12 To leverage a large number of previously published data sets, we processed our data using a uniform analysis pipeline used by the recount2 database.76,77
Processing data in this way allowed us to directly compare our data to all tissues profiled by Genotype-Tissue Expression Project (GTEx), including 489 samples from adult heart tissue.78 To assess the validity of this approach, we first clustered all samples from GTEx (n=9662) and our embryonic heart samples (n=24) based on expression of 36 990 genes, using t-distributed stochastic neighborhood embedding.79 We observed generally good clustering of adult samples by tissue type, including subregions of the brain and the ventricle and atrium of the adult heart. Embryonic heart samples were tightly clustered and distinct from adult heart samples and other GTEx tissues (Online Figure XIIA). Principal component analysis of only the embryonic heart samples showed good clustering by stage and minimal effects from sex and RNA quality in the first 2 components (Online Figure XIIC through XIIE). Overall, these results suggest that our expression data are of high quality and likely informative for understanding early human heart development.
Genes that are expressed in a limited number of tissues are more likely to be disease-related genes than those with broader expression patterns.80,81 Based on these general trends, we hypothesized that genes expressed specifically during embryonic heart development are likely involved in cardiac defects. To evaluate this, we used a measure of specificity based on the Gini coefficient—a metric originally used to measure income inequality, which accurately identified genes with tissue- and cell-type specific expression.82–84 We identified 347 genes with elevated Gini coefficients (>0.5) and the highest expression in embryonic heart (Figure 6A). These genes were strongly enriched for genes involved in heart development including TBX5, IRX4, HAND1, HAND2, and FGF12 (Figure 6B, bottom; Online Figure XIIB; Online Table V).
The same tissue-specific functional trends were observed when we examined genes with elevated Gini coefficients and the highest expression in either the brain or the spleen, demonstrating the unbiased nature of this analysis and the specificity of our findings (Figure 6B, top). Genes with the highest degree of embryonic heart specificity (Gini, ≥0.9) included known heart developmental TFs (NKX2-5, NKX2-6, and TBX20), myosin light chain genes (MYL3, MYL4, and MYL7), the long noncoding RNA BANCR, and the sinoatrial node–associated channel gene, HCN4 (Online Figure XIIB). The single highest Gini coefficient gene assigned to embryonic heart was LRRC10—a leucine-rich repeat–containing protein previously identified as having cardiomyocyte-specific expression and linked to human dilated cardiomyopathy.85,86
When we analyzed this list of genes for potential regulatory effects, we found that more than half of the most significant predicted upstream regulators were genes from this list including MYOCD, TBX5, TBX20, HAND2, GATA4, NKX2-5, and NKX2-6 (Online Table VI). Consistent with these findings, we discovered that promoters of embryonic heart elevated Gini coefficient genes were significantly enriched for conserved NKX2-5 binding site motifs (Online Table VI). These findings suggest highly connected direct regulatory effects among these embryonic heart–specific genes. Additionally, we observed EHEs were enriched for motifs of many embryonic heart elevated Gini coefficient genes (GATA4, GATA6, TBX20, HAND2, and NKX [NK homeobox]; Figure 2D; Online Table IV). Combined, these findings show that the Gini coefficient effectively identifies many heart-related disease genes, enables novel inference of regulatory relationships between these genes, and implicates new genes in heart-related disease phenotypes.
Thus far, our gene expression analyses combined all of our samples into a single embryonic heart category, but dynamics of heart gene expression have been reported to be the greatest during the time points we have profiled.12 To characterize these dynamics, we first set out to identify genes that are differentially expressed in a pairwise fashion between each of the stages of embryonic heart development. We identified 7167 differentially expressed between all sets of time points (Online Figure XIII). The majority of differentially expressed genes were identified relative to the CS13 time point. Few genes were differentially expressed between time points CS17 to CS23, supporting the finding that gene expression dynamics are restricted to early heart development in humans.12 Hierarchical clustering of these differentially expressed genes identified 4 major groups of genes that show a wave of expression across this developmental series (Figure 6C; Online Figure XIIIA and XIIIB). Genes expressed most strongly in early samples were enriched for functions related to general embryo development/morphogenesis and cholesterol biosynthesis (Figure 6D, left). These genes were also enriched for functions related to neuron generation and differentiation suggesting similar early developmental architecture between the heart and brain (Online Table VII). Genes more strongly expressed at the end of the embryonic period were enriched for functions related to ion channel activity, growth factor binding, extracellular matrix organization, and intracellular signaling (Figure 6D, right; Online Table VII). The most significant enrichment of genes related to heart contraction and cell-cell adhesion were found in genes most strongly expressed at the CS17 time point (Online Figure XIVA, right). These results indicate we have captured much of the dynamics of gene expression during early heart development but that many important events likely happen even earlier than we can currently examine.
Regulatory Effects of Embryonic Heart Enhancers on Gene Expression
Since we found that EHEs are strongly enriched for binding sites of heart-related TFs and systematically located near genes with heart-related functions, we next wanted to understand the impact such regulatory elements might have on embryonic heart–specific gene expression. To address this question, we first assigned embryonic heart–specific enhancers to the single nearest gene. For genes that were assigned an EHE, most had a modest number assigned (mean, 4), but some genes had >30. When we interrogated gene expression, we found that an increasing number of EHE assignments was associated with greater differences in gene expression in embryonic heart tissues relative to all other tissues from GTEx (Figure 7A). Specifically, genes with >15 EHEs had significantly higher gene expression within embryonic heart samples than all other GTEx tissues (Figure 7A; Online Figure XIVB). We then asked if these EHEs could be associated with embryonic heart–specific expression by measuring the distance of each enhancer to high Gini coefficient genes identified in the embryonic heart from above. We found significant enrichments of EHEs near embryonic heart–specific genes relative to random permutations of the same number of enhancers across distances ranging from 5 to 100 kb (Figure 7B).
These results coupled with the enrichment of physical interactions of putative embryonic heart strong enhancers with promoters active in cardiomyocytes suggested a direct role for EHEs in driving embryonic heart–specific gene expression. Additionally, the sequence content of these enhancers indicated they are in turn regulated by many of these embryonic heart–specific genes pointing to the existence of coregulated networks of genes activated in early heart development.
Gene Coexpression Networks Reveal Trajectories of Coordinated Expression During Early Heart Development
Genes that are coexpressed across development have been proposed to share similar regulatory mechanisms and form networks that are important for normal development.80 Genes coexpressed during early brain development, particularly those that have correlations with many different genes or hub genes, are involved in Autism Spectrum Disorder (ASD) risk.87–89 We, therefore, hypothesized that identifying coexpression networks and resulting hub genes could reveal novel candidate CHD genes. To build a coexpression network during embryonic heart development in an unsupervised and unbiased fashion, we used weighted gene coexpression network analysis (WGCNA) using all 24 embryonic heart samples we have profiled.90 Using this approach, 26 122 genes were distributed across 29 modules, 20 of which showed enrichment for at least one gene ontology category (Online Table VIII). We identified modules with gene ontology enrichments expected for early heart development such as embryonic patterning (green, 2573 genes), muscle cell differentiation (brown, 4945 genes), and sarcomere assembly (violet, 1267 genes; Figure 7C).
Multidimensional scaling of the module eigengenes revealed intermodule coexpression, suggesting some modules were more closely related in their expression than others (Figure 7C). Comparison of trajectories of expression of eigengenes from each module revealed 4 groups of gene expression patterns that reflect positioning of each module in multidimensional scaling space (Figure 7D).90 Groups 1 and 3, which are diametrically opposed to one another in multidimensional scaling space, showed downward and upward trends of expression throughout the embryonic period, respectively. Groups 2 and 4, which are also opposite one another in multidimensional scaling space, showed multiphasic but offset patterns of expression. Group 2 showed a particularly strong wave-like pattern between CS16 and CS20. When we performed gene set enrichment analyses across gene expression from CS16, CS18, and CS20, we readily observed cyclical enrichment of a number of pathways (Online Figure XV). These included heart valve development, tissue migration, mitochondrial gene expression, and several metabolic processes.
Significance Tests Give Context to WGCNA of Early Developing Heart
To further characterize and validate the WGCNA network, we evaluated the enrichment of several curated gene lists. As we demonstrated above, binding sites for TFs expressed specifically in the embryonic heart were significantly enriched in embryonic heart–specific enhancers. This suggested that coordinated regulation between enhancer sites, the proteins that bind them, and the genes they target might be in action during early heart development. Five modules were significantly enriched for elevated Gini coefficient genes, while 2 modules were significantly enriched for known CHD genes (Benjamini-Hochberg adjusted P<0.05; Figure 8A). All of these modules were significantly enriched for genes assigned at least 1 EHE. Analysis of our modules with single-cell RNA-seq on a 6.5 postconception week human heart revealed that despite performing bulk RNA experiments, our modules are organized in cell type–relevant patterns.66 Analysis of genes that identify the major cell types from CS16 heart revealed that many of the modules in groups 1 through 3 of our network are significantly enriched for embryonic heart cell type–specific genes (Figure 8A). For example, the brown4 and violet modules that are enriched for gene ontologies related to the sarcomere and muscle cell development (Figure 7C) concordantly have significant enrichment for the cardiomyocyte cell types (Figure 8A). The combined enrichment of both specific gene expression and specific enhancer activation in these modules suggest the network we have constructed is particularly meaningful during embryonic heart development (Figure 8A).
To confirm that the WGCNA network is indeed uniquely able to identify heart-relevant biology, we leverage a coexpression network constructed for the developing human brain.87 Analysis of embryonic heart–specific genes on this network supported this hypothesis as only 2 modules show enrichment for heart high Gini genes (Online Figure XVI). No modules were significantly enriched for known CHD genes. Analysis of the CS16 heart cell-type associated genes revealed modules primarily in group 5 with enrichment, but overall, the network lacked the cell-type specificity patterns observed on the heart WGCNA network (Online Figure XVI).
Disease Relevance of WGCNA
Functional enrichments and significance tests revealed brown4 and violet modules to be important for normal heart development; however, they lacked enrichment for known CHD-causing genes. Together, these findings indicated these modules are important for normal heart development and thus we hypothesized might harbor novel CHD genes. To address this question, we leveraged measures of selection against loss-of-function variants from whole exome and whole genome sequencing of over 140 thousand healthy controls by the Genome Aggregation Database.91 Genes scoring in the lowest 2 deciles of the loss-of-function observed/expected upper bound fraction (LOEUF) measure were shown to be enriched for known haploinsufficient genes and genes essential for survival in cell culture models. When we analyzed our gene modules, we found significant enrichment of LOEUF decile 1 and 2 genes in the violet and brown4 modules, as well as many modules within expression groups 1 through 4 but not those outside these groups.
Notably, the violet module is strongly enriched for sarcomere-related genes, and the known CHD gene NKX2-5 was a top scoring hub node (Figure 8B). This TF has been previously implicated in cardiac cell structure and function, particularly in sarcomere organization, and falls into the second decile of LOEUF scores.92–94 Given the purported importance of hub genes from WGCNA analyses in disease and a critical cardiac TF being identified as both a hub gene and intolerant to disruption, we wondered whether hub genes in our network might be generally intolerant to loss-of-function mutations in otherwise healthy individuals.89,95–97 When we interrogated hub genes across the entire network, we found significant enrichment of low LOEUF score genes and significant depletion of genes from the tenth decile (Figure 8C). Genes that have not been implicated in CHD but are characterized by high connectivity in our network, heart-specific expression, and low LOEUF scores thus represent novel candidate CHD genes (Online Table IX).
WGCNA reveals NKX2-5 regulatory program.
Throughout this work, many of our results have confirmed a central role of NKX2-5 in human heart organogenesis. Its binding sites are enriched in both EHEs and the promoters of genes with heart-specific expression. In hESC (human embryonic stem cell)-derived cardiomyocytes, NKX2-5 bound the promoters and directly regulated the expression of many genes involved in heart development, specifically those involved in voltage-gated ion channel activity.49 Two prominent genes in the violet module that are directly regulated by NKX2-5 in human cardiomyocytes are the TFs HEY2 and IRX4 (Figure 8B; Online Figure XVII). Both of these have been shown to play a role in ventricular myogenesis in mice and are linked to heart abnormalities in humans.98–100
The existence of these direct targets in the same module built in an unsupervised fashion from only gene expression data suggests that these modules may reflect direct, physical connections of TFs and their target genes. To determine whether these observations were representative of larger patterns of regulation, we analyzed all WGCNA modules for enrichment of the 228 genes directly regulated by NKX2-5 in human cardiomyocytes.49 We found that indeed the violet module is enriched for direct regulatory targets, which were all well-connected hub genes, assigned embryonic heart-specific enhancers, and had elevated heart-specific expression values (Gini, ≥0.5; Figure 8B).
We also found enrichment of NKX2-5 target genes in 4 other modules (green, lightgreen, skyblue3, and mediumpurple3) that are enriched for embryonic patterning, ion channel function, and mitosis (Figure 7C; Online Figure XVIII; Online Table X). Genes that have the same characteristics as NKX2-5 in our networks, specifically high connectivity, tissue-specific expression, and potential regulation by heart-specific enhancers (Online Table X), are likely integral to normal heart development. Indeed, several genes identified in this data-driven fashion have been clearly linked to CHD as reported in Online Mendelian Inheritance in Man (OMIM) including NKX2-6 and TNNI3K, while others have been implicated in mouse or are pending further findings in human to be considered a CHD gene, such as IRX4, HEY2, and HCN4.
Finally, to further understand the functional relevance of the gene coexpression network, we reasoned that genes might be coexpressed with one another to physically interact. When we assessed our modules for protein-protein interactions, we found that 15 of our modules contained significant protein-protein interactions. This was a highly significant result relative to randomly constructed modules (Figure 8D), indicating this network has identified coherent connections for both gene regulation and physical interactions of proteins.
Here, we have jointly profiled chromatin state and gene expression during the organogenesis phase of human heart development. Our efforts have dramatically increased the number of regulatory sequences that are predicted to be active in the developing human heart. We have validated the function and specificity of these regulatory elements by leveraging a variety of publicly available data sources including TF binding, in vivo enhancer assays, genome-wide association studies, and chromosome conformation.
We have identified tens of thousands of novel regulatory sequences likely active specifically in the developing human heart. Even though these are newly identified sequences, they display characteristics familiar to the heart development field. For example, they are enriched with binding sites for families of TFs long appreciated to play a role in heart development such as GATA, MEF2, and NKX and are located in close proximity to genes with documented roles in mammalian heart development. Our systematic comparisons of gene expression between the developing heart and 25 other human tissues have allowed us to identify a set of genes with high specificity for expression during heart organogenesis. The embryonic heart–specific enhancers we have identified are enriched near these developing heart-specific genes. Furthermore, the number of heart-specific enhancers that can be assigned to a gene is correlated with relative expression level compared with all available human tissues. These findings strongly support a direct role for these regulatory sequences in driving embryonic heart–specific gene expression patterns.
In addition to individual regulatory sequences, we have characterized localized landscapes of coactivated enhancers commonly referred to as super enhancers. Large regulatory landscapes have been implicated in tissue specification and tumorigenesis and encompass genes important for these processes. In our data, we identified thousands of super enhancers across all time points of embryonic heart development, 1611 of which had not been previously annotated (Online Table II). These include embryonic heart–specific super enhancers near important heart genes, TBX20, GATA4, HAND1, HCN4, IRX3, IRX4, and IRX6, among others. Additionally, a significant majority of genes we identified with strong heart-specific expression, including UNC45B, METTL11B, MYL3, MYL4, MYL7, and AFP, are directly adjacent or completely encompassed by super enhancers active in the embryonic heart. These findings demonstrate the large number of regulatory inputs that control tissue-specification genes during heart development and suggest a high level of redundancy to maintain proper gene control.
Numerous studies have shown tissue-specific regulatory regions are enriched for variants associated with tissue-specific phenotypes and diseases.21,69,75 Our analysis strongly confirmed these previous results. Specifically, we showed variants associated with quantitative differences in a variety of heart phenotypes such as resting heart rate and pattern of beating were enriched most strongly in embryonic heart enhancers. Most strikingly, we observed consistent enrichment of variants associated with atrial fibrillation in strong enhancer segments across all of embryonic heart development. Recent studies found significant enrichment for AF-associated index variants in accessible chromatin from fetal heart and speculated on a fetal origin of AF.72,73 In the data we have presented here, variants that confer risk of AF are in established enhancers active even earlier, specifically those active at 4.5 weeks post-conception during the embryonic period of development. This suggests either AF causes a reversion to an embryonic pattern of gene regulation or predisposition to adult onset AF is established extremely early in development. Further study of this phenomenon is clearly warranted and could indicate AF should be considered a CHD.
One of the most interesting findings of the GWAS analysis came from analysis of the autoimmune disorder lupus that we had initially intended as a control. Heart disease is a frequent complication in lupus patients and is typically attributed to inflammation of the heart due to lupus-specific antibodies that direct immune cells to attack this organ. These results could suggest that inflammation activates portions of the genome that are normally repressed in heart development or that inappropriate activation of regions during heart development predisposes one to these complications. Importantly, these results revealed that repressed regions can also be systematically enriched for specific disease-relevant loci, which has not been previously described for chromatin state segmentation data.
While we confirmed enrichment of a number of cardiac disease and phenotype-associated variants in the strong enhancer segments, the original motivation of this study to identify enhancers that harbor variants for CHD risk proved untenable. Despite the common nature of the defects, our results indicate either common variants are not informative for risk of CHD with regard to regulatory element function or these GWAS studies were insufficiently powered to identify such effects. Larger numbers of whole genome sequences from CHD patients will be needed to determine whether rare variants in heart-specific enhancer segments play a significant role in CHD incidence.
We have for the first time jointly analyzed gene expression and chromatin state for embryonic human heart development. Our WGCNA identified modules of genes in an unbiased way, yet when we analyzed these groups of genes, we uncovered coherent biological functions and expression characteristics across this developmental trajectory. We found that a subset of modules is significantly enriched for both heart-specific gene expression and heart-specific enhancer activation. Many of these same modules are also enriched for known CHD genes, and well-connected or hub genes in these modules are generally intolerant to loss-of-function mutations in otherwise healthy individuals. When we systematically interrogated these networks with binding and functional data for the cardiac TF NKX2-5, we uncovered clear physical regulatory connections both within the NKX2-5–containing module and other modules. These NKX2-5 connected modules were enriched for functions this gene has been suggested to regulate, including activation of sarcomere and ion channel genes and repression of neurogenesis.46,49,93,101–103 These findings demonstrate that our networks represent real biological connections relevant to heart development. Genes with characteristics similar to NKX2-5, such as specificity of expression, highly connected to other genes in WGNCA modules, regulated by heart-specific enhancers, and low tolerance to gene disruption, are, therefore, prime candidates for CHD genes. The gene list we provide in Online Table IX is thus of particular interest for further research and potential CHD diagnostic applications.
We provide all these datasets openly in commonly used formats that are directly comparable to other large consortia. These can be downloaded from Gene Expression Omnibus, retrieved via public track hub functionality on the UCSC Genome Browser, or directly from the Cotney Lab website. Our Gini index scoring for over 36 000 genes from 31 tissues is broadly useful for studies of all of these tissues and identifies both housekeeping and highly tissue-specific genes. We anticipate the regulatory regions and genes we have identified will be useful in research on the regulation of heart development, can be used as tissue-specific drivers of reporters, and provide much needed functional context for noncoding variation in clinical whole genome sequencing.104
ChromHMM state 1 active transcription start site
ChromHMM state 2 promoter upstream transcription start site
ChromHMM state 3 promoter downstream transcription start site 1
ChromHMM state 4 promoter downstream transcription start site 2
ChromHMM state 5 transcribed 5' preferential
ChromHMM state 6 strong transcription
ChromHMM state 7 transcribed 3' preferential
ChromHMM state 8 weak transcription
ChromHMM state 9 transcribed and regulatory
ChromHMM state 10 transcribed 5' preferential and enhancer
ChromHMM state 11 transcribed 3' preferential and enhancer
ChromHMM state 12 transcribed and weak enhancer
ChromHMM state 13 active enhancer 1
ChromHMM state 14 active enhancer 2
ChromHMM state 15 active enhancer flank
ChromHMM state 16 weak enhancer 1
ChromHMM state 17 weak enhancer 2
ChromHMM state 18 primary H3K27ac.possible enhancer
ChromHMM state 19 primary DNase
ChromHMM state 20 ZNF genes and repeats
ChromHMM state 21 heterochromatin
ChromHMM state 22 poised promoter
ChromHMM state 23 bivalent promoter
ChromHMM state 24 PolyComb repressed
ChromHMM state 25 quiescent
congenital heart defect
chromatin immunoprecipitation followed by next-generation sequencing
embryonic heart.specific enhancer segments
Genotype-Tissue Expression Project
genome-wide association study
histone 2A variant Z
histone H3 lysine 4 monomethylation
histone H3 lysine 4 dimethylation
histone H3 lysine 4 trimethylation
histone H3 lysine 9 acetylation
histone H3 lysine 9 trimethylation
histone H3 lysine 27 acetylation
histone H3 lysine 27 trimethylation
histone H3 lysine 36 trimethylation
histone H3 lysine 79 dimethylation
histone H4 lysine 20 monomethylation
loss-of-function observed/expected upper bound fraction
myocyte enhancer factor-2
massively parallel reporter assay
NK2 homeobox 5
SMA and mothers against decapentaplegic gene family
T-box transcription factor 5
T-box transcription factor 20
transcription start site
weighted gene coexpression network analysis
The human embryonic material was provided by the Joint Medical Research Council (MRC) (G0700089)/Wellcome Trust (GR082557) Human Developmental Biology Resource. We thank Ion Moraru, Steven G. King, and Mike Wilson for computer resource support. We also thank John T. Hinson, Bruce Liang, Stefan Pinter, Nagham Khouri Farah, and Emma Wentworth Winchester for critical feedback during the preparation of the manuscript. J. Cotney conceived and designed the study; J. VanOudenhove and A. Wilderman performed wet laboratory experiments; J. VanOudenhove, T.N. Yankee, and J. Cotney performed statistical analysis; J. Cotney directed the project; J. VanOudenhove, T.N. Yankee, and J. Cotney drafted the article.
Sources of Funding
This work was supported by the National Institutes of Health grant R35GM119465 (to J. Cotney).
J. Cotney is listed as the inventor on a patent application submitted by the University of Connecticut related to the gene panel described in Online Table IX. The other authors reported no conflicts.
Expanded Methods Section
Online Figures I–XVIII
Online Tables I–X
van der Linde D, Konings EE, Slager MA, Witsenburg M, Helbing WA, Takkenberg JJ, Roos-Hesselink JW. Birth prevalence of congenital heart disease worldwide: a systematic review and meta-analysis.J Am Coll Cardiol. 2011; 58:2241–2247. doi: 10.1016/j.jacc.2011.08.025CrossrefMedlineGoogle Scholar
Liu Y, Chen S, Zühlke L, Black GC, Choy MK, Li N, Keavney BD. Global birth prevalence of congenital heart defects 1970-2017: updated systematic review and meta-analysis of 260 studies.Int J Epidemiol. 2019; 48:455–463. doi: 10.1093/ije/dyz009CrossrefMedlineGoogle Scholar
- 3. Centers for Disease Control and Prevention (CDC). Racial differences by gestational age in neonatal deaths attributable to congenital heart defects --- United States, 2003-2006.MMWR Morb Mortal Wkly Rep. 2010; 59:1208–11.MedlineGoogle Scholar
Cowan JR, Ware SM. Genetics and genetic testing in congenital heart disease.Clin Perinatol. 2015; 42:373–393, ix. doi: 10.1016/j.clp.2015.02.009CrossrefMedlineGoogle Scholar
Zaidi S, Brueckner M. Genetics and genomics of congenital heart disease.Circ Res. 2017; 120:923–940. doi: 10.1161/CIRCRESAHA.116.309140LinkGoogle Scholar
Soemedi R, Wilson IJ, Bentham J, Darlay R, Töpf A, Zelenika D, Cosgrove C, Setchfield K, Thornborough C, Granados-Riveron J,. Contribution of global rare copy-number variants to the risk of sporadic congenital heart disease.Am J Hum Genet. 2012; 91:489–501. doi: 10.1016/j.ajhg.2012.08.003CrossrefMedlineGoogle Scholar
Jin SC, Homsy J, Zaidi S, Lu Q, Morton S, DePalma SR, Zeng X, Qi H, Chang W, Sierant MC,. Contribution of rare inherited and de novo variants in 2,871 congenital heart disease probands.Nat Genet. 2017; 49:1593–1601. doi: 10.1038/ng.3970CrossrefMedlineGoogle Scholar
Glessner JT, Bick AG, Ito K, Homsy J, Rodriguez-Murillo L, Fromer M, Mazaika E, Vardarajan B, Italia M, Leipzig J,. Increased frequency of de novo copy number variants in congenital heart disease by integrative analysis of single nucleotide polymorphism array and exome sequence data.Circ Res. 2014; 115:884–896. doi: 10.1161/CIRCRESAHA.115.304458LinkGoogle Scholar
Homsy J, Zaidi S, Shen Y, Ware JS, Samocha KE, Karczewski KJ, DePalma SR, McKean D, Wakimoto H, Gorham J,. De novo mutations in congenital heart disease with neurodevelopmental and other congenital anomalies.Science. 2015; 350:1262–1266. doi: 10.1126/science.aac9396CrossrefMedlineGoogle Scholar
Zaidi S, Choi M, Wakimoto H, Ma L, Jiang J, Overton JD, Romano-Adesman A, Bjornson RD, Breitbart RE, Brown KK,. De novo mutations in histone-modifying genes in congenital heart disease.Nature. 2013; 498:220–223. doi: 10.1038/nature12141CrossrefMedlineGoogle Scholar
Smemo S, Campos LC, Moskowitz IP, Krieger JE, Pereira AC, Nobrega MA. Regulatory variation in a TBX5 enhancer leads to isolated congenital heart disease.Hum Mol Genet. 2012; 21:3255–3263. doi: 10.1093/hmg/dds165CrossrefMedlineGoogle Scholar
Cardoso-Moreira M, Halbert J, Valloton D, Velten B, Chen C, Shao Y, Liechti A, Ascenção K, Rummel C, Ovchinnikova S,. Gene expression across mammalian organ development.Nature. 2019; 571:505–509. doi: 10.1038/s41586-019-1338-5CrossrefMedlineGoogle Scholar
Schoenwolf GC, Bleyl SB, Brauer PR, Francis-West PH.Larsen’s Human Embryology. 4th ed. Ann Arbor, MI: Churchill Livingstone/Elsevier; 2009.Google Scholar
Paige SL, Thomas S, Stoick-Cooper CL, Wang H, Maves L, Sandstrom R, Pabon L, Reinecke H, Pratt G, Keller G,. A temporal chromatin signature in human embryonic stem cells identifies regulators of cardiac development.Cell. 2012; 151:221–232. doi: 10.1016/j.cell.2012.08.027CrossrefMedlineGoogle Scholar
Wamstad JA, Alexander JM, Truty RM, Shrikumar A, Li F, Eilertson KE, Ding H, Wylie JN, Pico AR, Capra JA,. Dynamic and coordinated epigenetic regulation of developmental transitions in the cardiac lineage.Cell. 2012; 151:206–220. doi: 10.1016/j.cell.2012.07.035CrossrefMedlineGoogle Scholar
Stone NR, Gifford CA, Thomas R, Pratt KJB, Samse-Knapp K, Mohamed TMA, Radzinsky EM, Schricker A, Ye L, Yu P,. Context-specific transcription factor functions regulate epigenomic and transcriptional dynamics during cardiac reprogramming.Cell Stem Cell. 2019; 25:87–102.e9. doi: 10.1016/j.stem.2019.06.012CrossrefMedlineGoogle Scholar
Ang YS, Rivas RN, Ribeiro AJS, Srivas R, Rivera J, Stone NR, Pratt K, Mohamed TMA, Fu JD, Spencer CI,. Disease model of GATA4 mutation reveals transcription factor cooperativity in human cardiogenesis.Cell. 2016; 167:1734–1749.e22. doi: 10.1016/j.cell.2016.11.033CrossrefMedlineGoogle Scholar
Akerberg BN, Gu F, VanDusen NJ, Zhang X, Dong R, Li K, Zhang B, Zhou B, Sethi I, Ma Q,. A reference map of murine cardiac transcription factor chromatin occupancy identifies dynamic and conserved enhancers.Nat Commun. 2019; 10:4907. doi: 10.1038/s41467-019-12812-3CrossrefMedlineGoogle Scholar
Kundaje A, Meuleman W, Ernst J, Bilenky M, Yen A, Heravi-Moussavi A, Kheradpour P, Zhang Z, Wang J, Ziller MJ,; Roadmap Epigenomics Consortium. Integrative analysis of 111 reference human epigenomes.Nature. 2015; 518:317–330. doi: 10.1038/nature14248CrossrefMedlineGoogle Scholar
- 20. Consortium EP. An integrated encyclopedia of DNA elements in the human genome.Nature. 2012; 489:57–74. doi: 10.1038/nature11247CrossrefMedlineGoogle Scholar
Wilderman A, VanOudenhove J, Kron J, Noonan JP, Cotney J. High-resolution epigenomic atlas of human embryonic craniofacial development.Cell Rep. 2018; 23:1581–1597. doi: 10.1016/j.celrep.2018.03.129CrossrefMedlineGoogle Scholar
Chadwick LH. The NIH roadmap epigenomics program data resource.Epigenomics. 2012; 4:317–324. doi: 10.2217/epi.12.18CrossrefMedlineGoogle Scholar
Dickel DE, Barozzi I, Zhu Y, Fukuda-Yuzawa Y, Osterwalder M, Mannion BJ, May D, Spurrell CH, Plajzer-Frick I, Pickle CS,. Genome-wide compendium and functional assessment of in vivo heart enhancers.Nat Commun. 2016; 7:12923. doi: 10.1038/ncomms12923CrossrefMedlineGoogle Scholar
van Duijvenboden K, de Boer BA, Capon N, Ruijter JM, Christoffels VM. EMERGE: a flexible modelling framework to predict genomic regulatory elements from genomic signatures.Nucleic Acids Res. 2016; 44:e42. doi: 10.1093/nar/gkv1144CrossrefMedlineGoogle Scholar
van Ouwerkerk AF, Bosada FM, van Duijvenboden K, Hill MC, Montefiori LE, Scholman KT, Liu J, de Vries AAF, Boukens BJ, Ellinor PT,. Identification of atrial fibrillation associated genes and functional non-coding variants.Nat Commun. 2019; 10:4755. doi: 10.1038/s41467-019-12721-5CrossrefMedlineGoogle Scholar
Pennacchio LA, Ahituv N, Moses AM, Prabhakar S, Nobrega MA, Shoukry M, Minovitsky S, Dubchak I, Holt A, Lewis KD,. In vivo enhancer analysis of human conserved non-coding sequences.Nature. 2006; 444:499–502. doi: 10.1038/nature05295CrossrefMedlineGoogle Scholar
Visel A, Prabhakar S, Akiyama JA, Shoukry M, Lewis KD, Holt A, Plajzer-Frick I, Afzal V, Rubin EM, Pennacchio LA. Ultraconservation identifies a small subset of extremely constrained developmental enhancers.Nat Genet. 2008; 40:158–160. doi: 10.1038/ng.2007.55CrossrefMedlineGoogle Scholar
Prabhakar S, Visel A, Akiyama JA, Shoukry M, Lewis KD, Holt A, Plajzer-Frick I, Morrison H, Fitzpatrick DR, Afzal V,. Human-specific gain of function in a developmental enhancer.Science. 2008; 321:1346–1350. doi: 10.1126/science.1159974CrossrefMedlineGoogle Scholar
Pollard KS, Salama SR, King B, Kern AD, Dreszer T, Katzman S, Siepel A, Pedersen JS, Bejerano G, Baertsch R,. Forces shaping the fastest evolving regions in the human genome.PLoS Genet. 2006; 2:e168. doi: 10.1371/journal.pgen.0020168CrossrefMedlineGoogle Scholar
Pollard KS, Hubisz MJ, Rosenbloom KR, Siepel A. Detection of nonneutral substitution rates on mammalian phylogenies.Genome Res. 2010; 20:110–121. doi: 10.1101/gr.097857.109CrossrefMedlineGoogle Scholar
Rentzsch P, Witten D, Cooper GM, Shendure J, Kircher M. CADD: predicting the deleteriousness of variants throughout the human genome.Nucleic Acids Res. 2018; 47:D886–D894. doi: 10.1093/nar/gky1016CrossrefGoogle Scholar
Kircher M, Witten DM, Jain P, O’Roak BJ, Cooper GM, Shendure J. A general framework for estimating the relative pathogenicity of human genetic variants.Nat Genet. 2014; 46:310–315. doi: 10.1038/ng.2892CrossrefMedlineGoogle Scholar
Zhou J, Troyanskaya OG. Predicting effects of noncoding variants with deep learning-based sequence model.Nat Methods. 2015; 12:931–934. doi: 10.1038/nmeth.3547CrossrefMedlineGoogle Scholar
Huang YF, Gulko B, Siepel A. Fast, scalable prediction of deleterious noncoding variants from functional and population genomic data.Nat Genet. 2017; 49:618–624. doi: 10.1038/ng.3810CrossrefMedlineGoogle Scholar
Gilsbach R, Schwaderer M, Preissl S, Grüning BA, Kranzhöfer D, Schneider P, Nührenberg TG, Mulero-Navarro S, Weichenhan D, Braun C,. Distinct epigenetic programs regulate cardiac myocyte development and disease in the human heart in vivo.Nat Commun. 2018; 9:391. doi: 10.1038/s41467-017-02762-zCrossrefMedlineGoogle Scholar
Visel A, Minovitsky S, Dubchak I, Pennacchio LA. VISTA enhancer browser–a database of tissue-specific human enhancers.Nucleic Acids Res. 2007; 35:D88–D92. doi: 10.1093/nar/gkl822CrossrefMedlineGoogle Scholar
Ernst J, Kellis M. Large-scale imputation of epigenomic datasets for systematic annotation of diverse human tissues.Nat Biotechnol. 2015; 33:364–376. doi: 10.1038/nbt.3157CrossrefMedlineGoogle Scholar
Olson EN. Gene regulatory networks in the evolution and development of the heart.Science. 2006; 313:1922–1927. doi: 10.1126/science.1132292CrossrefMedlineGoogle Scholar
Whitcomb J, Gharibeh L, Nemer M. From embryogenesis to adulthood: critical role for GATA factors in heart development and function.IUBMB Life. 2020; 72:53–67. doi: 10.1002/iub.2163CrossrefMedlineGoogle Scholar
Buijtendijk MFJ, Barnett P, van den Hoff MJB. Development of the human heart.Am J Med Genet C Semin Med Genet. 2020; 184:7–22. doi: 10.1002/ajmg.c.31778CrossrefMedlineGoogle Scholar
Sylva M, van den Hoff MJB, Moorman AFM. Development of the human heart.Am J Med Genet Part A. 2014; 164:1347–1371. doi: 10.1002/ajmg.a.35896CrossrefGoogle Scholar
Monaghan MG, Linneweh M, Liebscher S, Van Handel B, Layland SL, Schenke-Layland K. Endocardial-to-mesenchymal transformation and mesenchymal cell colonization at the onset of human cardiac valve development.Development. 2016; 143:473–482. doi: 10.1242/dev.133843CrossrefMedlineGoogle Scholar
Gutierrez-Roelens I, De Roy L, Ovaert C, Sluysmans T, Devriendt K, Brunner HG, Vikkula M. A novel CSX/NKX2-5 mutation causes autosomal-dominant AV block: are atrial fibrillation and syncopes part of the phenotype?Eur J Hum Genet. 2006; 14:1313–1316. doi: 10.1038/sj.ejhg.5201702CrossrefMedlineGoogle Scholar
McElhinney DB, Geiger E, Blinder J, Benson DW, Goldmuntz E. NKX2. 5mutations in patients with congenital heart disease.J Am Coll Cardiol. 2003; 42:1650–1655. doi: 10.1016/j.jacc.2003.05.004CrossrefMedlineGoogle Scholar
Goldmuntz E, Geiger E, Benson DW. NKX2.5 mutations in patients with tetralogy of fallot.Circulation. 2001; 104:2565–2568. doi: 10.1161/hc4601.098427LinkGoogle Scholar
Schott JJ, Benson DW, Basson CT, Pease W, Silberbach GM, Moak JP, Maron BJ, Seidman CE, Seidman JG. Congenital heart disease caused by mutations in the transcription factor NKX2-5.Science. 1998; 281:108–111. doi: 10.1126/science.281.5373.108CrossrefMedlineGoogle Scholar
Nakashima Y, Yanez DA, Touma M, Nakano H, Jaroszewicz A, Jordan MC, Pellegrini M, Roos KP, Nakano A. Nkx2-5 suppresses the proliferation of atrial myocytes and conduction system.Circ Res. 2014; 114:1103–1113. doi: 10.1161/CIRCRESAHA.114.303219LinkGoogle Scholar
Jay PY, Rozhitskaya O, Tarnavski O, Sherwood MC, Dorfman AL, Lu Y, Ueyama T, Izumo S. Haploinsufficiency of the cardiac transcription factor Nkx2-5 variably affects the expression of putative target genes.FASEB J. 2005; 19:1495–1497. doi: 10.1096/fj.04-3064fjeCrossrefMedlineGoogle Scholar
Anderson DJ, Kaplan DI, Bell KM, Koutsis K, Haynes JM, Mills RJ, Phelan DG, Qian EL, Leitoguinho AR, Arasaratnam D,. NKX2-5 regulates human cardiomyogenesis via a HEY2 dependent transcriptional network.Nat Commun. 2018; 9:1373. doi: 10.1038/s41467-018-03714-xCrossrefMedlineGoogle Scholar
Whyte WA, Orlando DA, Hnisz D, Abraham BJ, Lin CY, Kagey MH, Rahl PB, Lee TI, Young RA. Master transcription factors and mediator establish super-enhancers at key cell identity genes.Cell. 2013; 153:307–319. doi: 10.1016/j.cell.2013.03.035CrossrefMedlineGoogle Scholar
Lovén J, Hoke HA, Lin CY, Lau A, Orlando DA, Vakoc CR, Bradner JE, Lee TI, Young RA. Selective inhibition of tumor oncogenes by disruption of super-enhancers.Cell. 2013; 153:320–334. doi: 10.1016/j.cell.2013.03.036CrossrefMedlineGoogle Scholar
Pott S, Lieb JD. What are super-enhancers?Nat Genet. 2015; 47:8–12. doi: 10.1038/ng.3167CrossrefMedlineGoogle Scholar
Vatta M, Dumaine R, Varghese G, Richard TA, Shimizu W, Aihara N, Nademanee K, Brugada R, Brugada J, Veerakul G,. Genetic and biophysical basis of sudden unexplained nocturnal death syndrome (SUNDS), a disease allelic to brugada syndrome.Hum Mol Genet. 2002; 11:337–345. doi: 10.1093/hmg/11.3.337CrossrefMedlineGoogle Scholar
Kapplinger JD, Tester DJ, Alders M, Benito B, Berthet M, Brugada J, Brugada P, Fressart V, Guerchicoff A, Harris-Kerr C,. An international compendium of mutations in the SCN5A-encoded cardiac sodium channel in patients referred for brugada syndrome genetic testing.Heart Rhythm. 2010; 7:33–46. doi: 10.1016/j.hrthm.2009.09.069CrossrefMedlineGoogle Scholar
Man JCK, Mohan RA, Boogaard MVD, Hilvering CRE, Jenkins C, Wakker V, Bianchi V, Laat W, Barnett P, Boukens BJ,. An enhancer cluster controls gene activity and topology of the SCN5A-SCN10A locus in vivo.Nat Commun. 2019; 10:4943. doi: 10.1038/s41467-019-12856-5CrossrefMedlineGoogle Scholar
Eppinga RN, Hagemeijer Y, Burgess S, Hinds DA, Stefansson K, Gudbjartsson DF, van Veldhuisen DJ, Munroe PB, Verweij N, van der Harst P. Identification of genomic loci associated with resting heart rate and shared genetic predictors with all-cause mortality.Nat Genet. 2016; 48:1557–1563. doi: 10.1038/ng.3708CrossrefMedlineGoogle Scholar
Wojcik GL, Graff M, Nishimura KK, Tao R, Haessler J, Gignoux CR, Highland HM, Patel YM, Sorokin EP, Avery CL,. Genetic analyses of diverse populations improves discovery for complex traits.Nature. 2019; 570:514–518. doi: 10.1038/s41586-019-1310-4CrossrefMedlineGoogle Scholar
Bezzina CR, Barc J, Mizusawa Y, Remme CA, Gourraud JB, Simonet F, Verkerk AO, Schwartz PJ, Crotti L, Dagradi F,. Common variants at SCN5A-SCN10A and HEY2 are associated with Brugada syndrome, a rare disease with high risk of sudden cardiac death.Nat Genet. 2013; 45:1044–1049. doi: 10.1038/ng.2712CrossrefMedlineGoogle Scholar
Kapoor A, Lee D, Zhu L, Soliman EZ, Grove ML, Boerwinkle E, Arking DE, Chakravarti A. Multiple SCN5A variant enhancers modulate its cardiac gene expression and the QT interval.Proc Natl Acad Sci USA. 2019; 116:10636–10645. doi: 10.1073/pnas.1808734116CrossrefMedlineGoogle Scholar
Khan A, Zhang X. dbSUPER: a database of super-enhancers in mouse and human genome.Nucleic Acids Res. 2016; 44:D164–D171. doi: 10.1093/nar/gkv1002CrossrefMedlineGoogle Scholar
Boogerd CJ, Zhu X, Aneas I, Sakabe N, Zhang L, Sobreira DR, Montefiori L, Bogomolovas J, Joslin AC, Zhou B,. Tbx20 is required in mid-gestation cardiomyocytes and plays a central role in atrial development.Circ Res. 2018; 123:428–442. doi: 10.1161/CIRCRESAHA.118.311339LinkGoogle Scholar
Kirk EP, Sunde M, Costa MW, Rankin SA, Wolstein O, Castro ML, Butler TL, Hyun C, Guo G, Otway R,. Mutations in cardiac T-box factor gene TBX20 are associated with diverse cardiac pathologies, including defects of septation and valvulogenesis and cardiomyopathy.Am J Hum Genet. 2007; 81:280–291. doi: 10.1086/519530CrossrefMedlineGoogle Scholar
Mittal A, Sharma R, Prasad R, Bahl A, Khullar M. Role of cardiac TBX20 in dilated cardiomyopathy.Mol Cell Biochem. 2016; 414:129–136. doi: 10.1007/s11010-016-2666-5CrossrefMedlineGoogle Scholar
Montefiori LE, Sobreira DR, Sakabe NJ, Aneas I, Joslin AC, Hansen GT, Bozek G, Moskowitz IP, McNally EM, Nóbrega MA. A promoter interaction map for cardiovascular disease genetics.Elife. 2018; 7:e35788. doi: 10.7554/eLife.35788CrossrefMedlineGoogle Scholar
Jansen JA, van Veen TA, de Jong S, van der Nagel R, van Stuijvenberg L, Driessen H, Labzowski R, Oefner CM, Bosch AA, Nguyen TQ,. Reduced Cx43 expression triggers increased fibrosis due to enhanced fibroblast activity.Circ Arrhythm Electrophysiol. 2012; 5:380–390. doi: 10.1161/CIRCEP.111.966580LinkGoogle Scholar
Asp M, Giacomello S, Larsson L, Wu C, Fürth D, Qian X, Wärdell E, Custodio J, Reimegård J, Salmén F,. A spatiotemporal organ-wide gene expression and cell atlas of the developing human heart.Cell. 2019; 179:1647–1660.e19. doi: 10.1016/j.cell.2019.11.025CrossrefMedlineGoogle Scholar
Kieffer-Kwon KR, Tang Z, Mathe E, Qian J, Sung MH, Li G, Resch W, Baek S, Pruett N, Grøntved L,. Interactome maps of mouse gene regulatory domains reveal basic principles of transcriptional regulation.Cell. 2013; 155:1507–1520. doi: 10.1016/j.cell.2013.11.039CrossrefMedlineGoogle Scholar
DeMare LE, Leng J, Cotney J, Reilly SK, Yin J, Sarro R, Noonan JP. The genomic landscape of cohesin-associated chromatin interactions.Genome Res. 2013; 23:1224–1234. doi: 10.1101/gr.156570.113CrossrefMedlineGoogle Scholar
Iotchkova V, Ritchie GRS, Geihs M, Morganella S, Min JL, Walter K, Timpson NJ, Dunham I, Birney E, Soranzo N; UK10K Consortium. GARFIELD classifies disease-relevant genomic features through integration of functional annotations with association signals.Nat Genet. 2019; 51:343–353. doi: 10.1038/s41588-018-0322-6CrossrefMedlineGoogle Scholar
Roselli C, Chaffin MD, Weng LC, Aeschbacher S, Ahlberg G, Albert CM, Almgren P, Alonso A, Anderson CD, Aragam KG,. Multi-ethnic genome-wide association study for atrial fibrillation.Nat Genet. 2018; 50:1225–1233. doi: 10.1038/s41588-018-0133-9CrossrefMedlineGoogle Scholar
Bentham J, Morris DL, Graham DSC, Pinder CL, Tombleson P, Behrens TW, Martín J, Fairfax BP, Knight JC, Chen L,. Genetic association analyses implicate aberrant regulation of innate and adaptive immunity genes in the pathogenesis of systemic lupus erythematosus.Nat Genet. 2015; 47:1457–1464. doi: 10.1038/ng.3434CrossrefMedlineGoogle Scholar
Nielsen JB, Fritsche LG, Zhou W, Teslovich TM, Holmen OL, Gustafsson S, Gabrielsen ME, Schmidt EM, Beaumont R, Wolford BN,. Genome-wide study of atrial fibrillation identifies seven risk loci and highlights biological pathways and regulatory elements involved in cardiac development.Am J Hum Genet. 2018; 102:103–115. doi: 10.1016/j.ajhg.2017.12.003CrossrefMedlineGoogle Scholar
Nielsen JB, Thorolfsdottir RB, Fritsche LG, Zhou W, Skov MW, Graham SE, Herron TJ, McCarthy S, Schmidt EM, Sveinbjornsson G,. Biobank-driven genomic discovery yields new insight into atrial fibrillation biology.Nat Genet. 2018; 50:1234–1239. doi: 10.1038/s41588-018-0171-3CrossrefMedlineGoogle Scholar
Creyghton MP, Cheng AW, Welstead GG, Kooistra T, Carey BW, Steine EJ, Hanna J, Lodato MA, Frampton GM, Sharp PA,. Histone H3K27ac separates active from poised enhancers and predicts developmental state.Proc Natl Acad Sci USA. 2010; 107:21931–21936. doi: 10.1073/pnas.1016071107CrossrefMedlineGoogle Scholar
Maurano MT, Humbert R, Rynes E, Thurman RE, Haugen E, Wang H, Reynolds AP, Sandstrom R, Qu H, Brody J,. Systematic localization of common disease-associated variation in regulatory DNA.Science. 2012; 337:1190–1195. doi: 10.1126/science.1222794CrossrefMedlineGoogle Scholar
Collado-Torres L, Nellore A, Kammers K, Ellis SE, Taub MA, Hansen KD, Jaffe AE, Langmead B, Leek JT. Reproducible RNA-seq analysis using recount2.Nat Biotechnol. 2017; 35:319–321. doi: 10.1038/nbt.3838CrossrefMedlineGoogle Scholar
Nellore A, Collado-Torres L, Jaffe AE, Alquicira-Hernández J, Wilks C, Pritt J, Morton J, Leek JT, Langmead B. Rail-RNA: scalable analysis of RNA-seq splicing and coverage.Bioinformatics. 2017; 33:4033–4040. doi: 10.1093/bioinformatics/btw575CrossrefMedlineGoogle Scholar
- 78. GTex Consortium. The genotype-tissue expression (GTEx) project.Nat Genet. 2013; 45:580–585. doi: 10.1038/ng.2653CrossrefMedlineGoogle Scholar
Van Der Maaten L, Hinton G. Visualizing data using t-SNEJrnl of Mach Learning Research. 2008; 9:2579–2605.Google Scholar
Lage K, Hansen NT, Karlberg EO, Eklund AC, Roque FS, Donahoe PK, Szallasi Z, Jensen TS, Brunak S. A large-scale analysis of tissue-specific pathology and gene expression of human disease genes and complexes.Proc Natl Acad Sci USA. 2008; 105:20870–20875. doi: 10.1073/pnas.0810772105CrossrefMedlineGoogle Scholar
Aguet F, Brown AA, Castel SE, Davis JR, He Y, Jo B, Mohammadi P, Park YS, Parsana P, Segrè AV,. Genetic effects on gene expression across human tissues.Nature. 2017; 550:204–213. doi: 10.1038/nature24277CrossrefMedlineGoogle Scholar
Jiang L, Chen H, Pinello L, Yuan GC. GiniClust: detecting rare cell types from single-cell gene expression data with Gini index.Genome Biol. 2016; 17:144. doi: 10.1186/s13059-016-1010-4CrossrefMedlineGoogle Scholar
Gini C. IL diverso accrescimento Delle Classi Sociali E La concentrazione Della Ricchezza.G degli Econ. 1909; 38:27–83.Google Scholar
O’Hagan S, Wright Muelas M, Day PJ, Lundberg E, Kell DB. GeneGini: assessment via the Gini coefficient of reference “housekeeping” genes and diverse human transporter expression profiles.Cell Syst. 2018; 6:230–244.e1. doi: 10.1016/j.cels.2018.01.003CrossrefMedlineGoogle Scholar
Kim KH, Kim TG, Micales BK, Lyons GE, Lee Y. Dynamic expression patterns of leucine-rich repeat containing protein 10 in the heart.Dev Dyn. 2007; 236:2225–2234. doi: 10.1002/dvdy.21225CrossrefMedlineGoogle Scholar
Qu XK, Yuan F, Li RG, Xu L, Jing WF, Liu H, Xu YJ, Zhang M, Liu X, Fang WY,. Prevalence and spectrum of LRRC10 mutations associated with idiopathic dilated cardiomyopathy.Mol Med Rep. 2015; 12:3718–3724. doi: 10.3892/mmr.2015.3843CrossrefMedlineGoogle Scholar
Werling DM, Pochareddy S, Choi J, An JY, Sheppard B, Peng M, Li Z, Dastmalchi C, Santpere G, Sousa AMM,. Whole-genome and RNA sequencing reveal variation and transcriptomic coordination in the developing human prefrontal cortex.Cell Rep. 2020; 31:107489. doi: 10.1016/j.celrep.2020.03.053CrossrefMedlineGoogle Scholar
Parikshak NN, Luo R, Zhang A, Won H, Lowe JK, Chandran V, Horvath S, Geschwind DH. Integrative functional genomic analyses implicate specific molecular pathways and circuits in autism.Cell. 2013; 155:1008–1021. doi: 10.1016/j.cell.2013.10.031CrossrefMedlineGoogle Scholar
Willsey AJ, Sanders SJ, Li M, Dong S, Tebbenkamp AT, Muhle RA, Reilly SK, Lin L, Fertuzinhos S, Miller JA,. Coexpression networks implicate human midfetal deep cortical projection neurons in the pathogenesis of autism.Cell. 2013; 155:997–1007. doi: 10.1016/j.cell.2013.10.020CrossrefMedlineGoogle Scholar
Langfelder P, Horvath S. WGCNA: an R package for weighted correlation network analysis.BMC Bioinformatics. 2008; 9:559. doi: 10.1186/1471-2105-9-559CrossrefMedlineGoogle Scholar
Karczewski KJ, Francioli LC, Tiao G, Cummings BB, Alföldi J, Wang Q, Collins RL, Laricchia KM, Ganna A, Birnbaum DP,; Genome Aggregation Database Consortium. The mutational constraint spectrum quantified from variation in 141,456 humans.Nature. 2020; 581:434–443. doi: 10.1038/s41586-020-2308-7CrossrefMedlineGoogle Scholar
Kasahara H, Ueyama T, Wakimoto H, Liu MK, Maguire CT, Converso KL, Kang PM, Manning WJ, Lawitts J, Paul DL,. Nkx2.5 homeoprotein regulates expression of gap junction protein connexin 43 and sarcomere organization in postnatal cardiomyocytes.J Mol Cell Cardiol. 2003; 35:243–256. doi: 10.1016/s0022-2828(03)00002-6CrossrefMedlineGoogle Scholar
Pashmforoush M, Lu JT, Chen H, Amand TS, Kondo R, Pradervand S, Evans SM, Clark B, Feramisco JR, Giles W,. Nkx2-5 pathways and congenital heart disease; loss of ventricular myocyte lineage specification leads to progressive cardiomyopathy and complete heart block.Cell. 2004; 117:373–386. doi: 10.1016/s0092-8674(04)00405-2CrossrefMedlineGoogle Scholar
Terada R, Warren S, Lu JT, Chien KR, Wessels A, Kasahara H. Ablation of Nkx2-5 at mid-embryonic stage results in premature lethality and cardiac malformation.Cardiovasc Res. 2011; 91:289–299. doi: 10.1093/cvr/cvr037CrossrefMedlineGoogle Scholar
Liu C, Chang H, Li XH, Qi YF, Wang JO, Zhang Y, Yang XH. Network meta-analysis on the effects of DNA damage response-related gene mutations on overall survival of breast cancer based on TCGA database.J Cell Biochem. 2017; 118:4728–4734. doi: 10.1002/jcb.26140CrossrefMedlineGoogle Scholar
Yin L, Cai Z, Zhu B, Xu C. Identification of key pathways and genes in the dynamic progression of HCC based on WGCNA.Genes (Basel). 2018; 9:92. doi: 10.3390/genes9020092CrossrefMedlineGoogle Scholar
Langfelder P, Mischel PS, Horvath S. When is hub gene selection better than standard meta-analysis?PLoS One. 2013; 8:e61505. doi: 10.1371/journal.pone.0061505CrossrefMedlineGoogle Scholar
Hartman ME, Liu Y, Zhu WZ, Chien WM, Weldy CS, Fishman GI, Laflamme MA, Chin MT. Myocardial deletion of transcription factor CHF1/Hey2 results in altered myocyte action potential and mild conduction system expansion but does not alter conduction system function or promote spontaneous arrhythmias.FASEB J. 2014; 28:3007–3015. doi: 10.1096/fj.14-251728CrossrefMedlineGoogle Scholar
Koibuchi N, Chin MT. CHF1/Hey2 plays a pivotal role in left ventricular maturation through suppression of ectopic atrial gene expression.Circ Res. 2007; 100:850–855. doi: 10.1161/01.RES.0000261693.13269.bfLinkGoogle Scholar
Bruneau BG, Bao ZZ, Tanaka M, Schott JJ, Izumo S, Cepko CL, Seidman JG, Seidman CE. Cardiac expression of the ventricle-specific homeobox gene Irx4 is modulated by Nkx2-5 and dHand.Dev Biol. 2000; 217:266–277. doi: 10.1006/dbio.1999.9548CrossrefMedlineGoogle Scholar
Furtado MB, Wilmanns JC, Chandran A, Tonta M, Biben C, Eichenlaub M, Coleman HA, Berger S, Bouveret R, Singh R,. A novel conditional mouse model for Nkx2-5 reveals transcriptional regulation of cardiac ion channels.Differentiation. 2016; 91:29–41. doi: 10.1016/j.diff.2015.12.003CrossrefMedlineGoogle Scholar
Reamon-Buettner SM, Borlak J. Somatic NKX2-5 mutations as a novel mechanism of disease in complex congenital heart disease.J Med Genet. 2004; 41:684–690. doi: 10.1136/jmg.2003.017483CrossrefMedlineGoogle Scholar
Benson DW, Silberbach GM, Kavanaugh-McHugh A, Cottrill C, Zhang Y, Riggs S, Smalls O, Johnson MC, Watson MS, Seidman JG,. Mutations in the cardiac transcription factor NKX2.5 affect diverse cardiac developmental pathways.J Clin Invest. 1999; 104:1567–1573. doi: 10.1172/JCI8154CrossrefMedlineGoogle Scholar
Blankvoort S, Witter MP, Noonan J, Cotney J, Kentros C. Marked diversity of unique cortical enhancers enables neuron-specific tools by enhancer-driven gene expression.Curr Biol. 2018; 28:2103–2114.e5. doi: 10.1016/j.cub.2018.05.015CrossrefMedlineGoogle Scholar
Cotney JL, Noonan JP. Chromatin immunoprecipitation with fixed animal tissues and preparation for high-throughput sequencing.Cold Spring Harb Protoc. 2015; 2015:419. doi: 10.1101/pdb.err087585CrossrefMedlineGoogle Scholar
Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2.Nat Methods. 2012; 9:357–359. doi: 10.1038/nmeth.1923CrossrefMedlineGoogle Scholar
Landt SG, Marinov GK, Kundaje A, Kheradpour P, Pauli F, Batzoglou S, Bernstein BE, Bickel P, Brown JB, Cayting P,. ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia.Genome Res. 2012; 22:1813–1831. doi: 10.1101/gr.136184.111CrossrefMedlineGoogle Scholar
Feng J, Liu T, Qin B, Zhang Y, Liu XS. Identifying ChIP-seq enrichment using MACS.Nat Protoc. 2012; 7:1728–1740. doi: 10.1038/nprot.2012.101CrossrefMedlineGoogle Scholar
Ernst J, Kellis M. ChromHMM: automating chromatin-state discovery and characterization.Nat Methods. 2012; 9:215–216. doi: 10.1038/nmeth.1906CrossrefMedlineGoogle Scholar
Ramírez F, Dündar F, Diehl S, Grüning BA, Manke T. deepTools: a flexible platform for exploring deep-sequencing data.Nucleic Acids Res. 2014; 42:W187–W191. doi: 10.1093/nar/gku365CrossrefMedlineGoogle Scholar
Stark R, Brown G. DiffBind: differential binding analysis of ChIP-Seq peak datatle.2016; 100:1–29.Google Scholar
Heinz S, Benner C, Spann N, Bertolino E, Lin YC, Laslo P, Cheng JX, Murre C, Singh H, Glass CK. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities.Mol Cell. 2010; 38:576–589. doi: 10.1016/j.molcel.2010.05.004CrossrefMedlineGoogle Scholar
McLean CY, Bristor D, Hiller M, Clarke SL, Schaar BT, Lowe CB, Wenger AM, Bejerano G. GREAT improves functional interpretation of cis-regulatory regions.Nat Biotechnol. 2010; 28:495–501. doi: 10.1038/nbt.1630CrossrefMedlineGoogle Scholar
Collado-Torres L, Nellore A, Jaffe AE. Recount workflow: accessing over 70,000 human RNA-seq samples with bioconductor.F1000Research. 2017; 6:1558. doi: 10.12688/f1000research.12223.1CrossrefMedlineGoogle Scholar
Novelty and Significance
What Is Known?
Gene expression during mammalian development is controlled by many tissue and time point–specific regulatory sequences known as enhancers.
Heart enhancers have been identified in a single human fetal heart and several infant and adult hearts, all of which are derived after major steps in heart patterning.
Genes that are coexpressed during development of tissues and have low rates of mutations in healthy adult humans have been implicated in developmental disorders and disease.
What New Information Does This Article Contribute?
Annotation of genome-wide chromatin states based on histone modifications during the embryonic period of human heart development revealed thousands of sequences not previously predicted to be active in the heart.
Sequences predicted to be strong enhancers during heart organogenesis are enriched with cardiac TF (transcription factor)-binding sites and systematically located near genes with known roles in the developing mammalian heart or shown in this study to have highly specific expression in the developing human heart.
Putative strong enhancer sequences in the embryonic heart are enriched for common sequence variants associated with risk for atrial fibrillation but not congenital heart defects.
Comprehensive analysis of gene expression in the developing heart revealed organized patterns of coexpression that are enriched for heart-relevant biology.
Highly connected hub genes in the coexpression network are strongly enriched for genes with low mutational rates in healthy adult humans and likely disease candidates.
Most patients affected by congenital heart defects do not have defects in other tissues—a common feature of defects caused by damaged regulatory sequences or enhanceropathies. However, enhancers active during the organogenesis phase of human heart development when most congenital heart defects are thought to arise have not been identified. We have systematically characterized ChromHMM (chromatin state) in human embryonic hearts and revealed over 12 000 novel putative enhancers. These enhancers harbored all the hallmarks of tissue-specific enhancers: enrichment of cardiac-restricted TF-binding sites, located near genes with heart-specific expression patterns, and demonstrated heart-specific activity in catalogs of experimentally tested enhancers. Enhancers predicted to be strongly active in the developing human heart were significantly enriched for common variants associated with risk for atrial fibrillation demonstrating potential embryonic origins of this disorder. Construction of networks based on genes with similar trends of expression across heart development revealed a cohort of genes regulated by the canonical cardiac TF NKX2-5. These coexpression networks uncovered over 200 genes that have similar characteristics as NKX2-5: coexpression with many other genes, specific expression in the heart, and intolerance to mutation. These genes are strong candidate disease genes warranting further study.