Investigation of Copy Number Variation in South African Patients With Congenital Heart Defects

Background: Congenital heart disease (CHD) is a leading non-infectious cause of pediatric morbidity and mortality worldwide. Although the etiology of CHD is poorly understood, genetic factors including copy number variants (CNVs) contribute to the risk of CHD in individuals of European ancestry. The presence of rare CNVs in African CHD populations is unknown. This study aimed to identify pathogenic and likely pathogenic CNVs in South African patients with CHD. Methods: Genotyping was performed on 90 patients with nonsyndromic CHD using the Affymetrix CytoScan HD platform. These data were used to identify large, rare CNVs in known CHD-associated genes and candidate genes. Results: We identified eight CNVs overlapping known CHD-associated genes (GATA4, CRKL, TBX1, FLT4, B3GAT3, NSD1) in six patients. The analysis also revealed CNVs encompassing five candidate genes likely to play a role in the development of CHD (DGCR8, KDM2A, JARID2, FSTL1, CYFIP1) in five patients. One patient was found to have a 47,XXY karyotype. We report a total discovery yield of 6.7%, with 5.6% of the cohort carrying pathogenic or likely pathogenic CNVs expected to cause the observed phenotypes. Conclusions: In this study, we show that chromosomal microarray is an effective technique for identifying CNVs in African patients diagnosed with CHD, with results similar to those of previous CHD genetic studies in Europeans. Novel potential CHD genes were also identified, indicating the value of genetic studies of CHD in ancestrally diverse populations.

their legal guardians or next of kin where appropriate, as well as a peripheral blood sample for DNA extraction.

Genotyping and filtering
Genotyping was performed using the Affymetrix CytoScan HD platform (Affymetrix, Santa Clara, CA, USA). Chromosomal microarray (CMA) data that did not pass quality control criteria (mean absolute pairwise difference ≤ 0.25, SNP quality control ≥ 15.0, waviness-standard deviation ≤ 0.12) were excluded from further analysis. Germline CNVs were called using Affymetrix Power Tools software v2.11.0, based on human genome build hg19. CNVs with > 70% overlap with centromeres, telomeres or regions of segmental duplication were excluded, as were CNVs of low quality, as assessed using Affymetrix Chromosome Analysis Suite (confidence < 0.85, marker count < 50, mean marker distance < 15 kb). CNVs > 100 kb in size were extracted from the data and assessed, as described below, for their frequencies and overlap with a CHD gene panel. Mosaic CNVs were not investigated in this study.
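The filtering steps described above can be summarized in a short Python sketch. This is an illustration only, not the pipeline used in the study: the class and field names (`SampleQC`, `CNV`, `excluded_overlap`) are hypothetical, the overlap fraction with centromeres, telomeres and segmental duplications is assumed to be precomputed, and the mean-marker-distance criterion is omitted because this sketch does not assume a direction for that threshold.

```python
from dataclasses import dataclass


@dataclass
class SampleQC:
    """Per-array quality metrics (hypothetical field names)."""
    mapd: float          # mean absolute pairwise difference
    snp_qc: float        # SNP quality control score
    waviness_sd: float   # waviness standard deviation


@dataclass
class CNV:
    """One CNV call on hg19 coordinates (hypothetical field names)."""
    chrom: str
    start: int
    end: int
    confidence: float
    marker_count: int
    excluded_overlap: float  # fraction overlapping centromere/telomere/segdup


def passes_sample_qc(qc: SampleQC) -> bool:
    # Thresholds as stated in the text: MAPD <= 0.25, SNPQC >= 15.0,
    # waviness SD <= 0.12.
    return qc.mapd <= 0.25 and qc.snp_qc >= 15.0 and qc.waviness_sd <= 0.12


def keep_cnv(cnv: CNV) -> bool:
    # Keep calls that are not dominated by excluded regions (> 70% overlap),
    # are not flagged as low quality (confidence < 0.85, marker count < 50),
    # and are larger than 100 kb.
    size = cnv.end - cnv.start
    return (
        cnv.excluded_overlap <= 0.70
        and cnv.confidence >= 0.85
        and cnv.marker_count >= 50
        and size > 100_000
    )


# Illustrative values only; these are not data from the study.
example_calls = [
    CNV("chr8", 11_000_000, 11_500_000, 0.90, 120, 0.10),
    CNV("chr8", 11_000_000, 11_050_000, 0.90, 120, 0.10),  # too small
    CNV("chr22", 18_000_000, 19_000_000, 0.95, 300, 0.80),  # mostly segdup
]
retained = [c for c in example_calls if keep_cnv(c)]
```

Applying sample-level quality control before CNV-level filtering mirrors the order given in the text, so that calls from failed arrays are never considered.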

CNV control populations
Two published control datasets were used: UK Biobank healthy controls (n = 472,378) 51 and a healthy pediatric African population from Tanzania (n = 3,463). 52 The UK Biobank data were used as described previously. 53 CNVs occurring at a frequency of < 0.01 in either of the control populations were considered rare and analyzed further.
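As a minimal sketch of the rarity filter, the function below converts carrier counts into control frequencies and applies the 0.01 threshold. The function name, the `require_both` parameter and the use of carrier counts as input are assumptions made for illustration. The text does not specify how a patient CNV was matched to control CNVs, and "either" can be read as "at least one" or as "each" control population, so both readings are exposed rather than one being presented as the authors' rule.

```python
UKB_N = 472_378   # UK Biobank healthy controls (from the text)
TZ_N = 3_463      # Tanzanian pediatric controls (from the text)


def is_rare(ukb_carriers: int, tz_carriers: int,
            threshold: float = 0.01, require_both: bool = True) -> bool:
    """Return True if the CNV is rare under the chosen reading.

    require_both=True  : frequency must be below threshold in both cohorts.
    require_both=False : frequency below threshold in at least one cohort.
    """
    below = [
        ukb_carriers / UKB_N < threshold,
        tz_carriers / TZ_N < threshold,
    ]
    return all(below) if require_both else any(below)
```

Because the Tanzanian cohort is much smaller, 35 carriers (35 / 3,463 ≈ 0.0101) already exceed the threshold there, whereas the UK Biobank would need more than 4,700 carriers to do so.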

CHD gene list
Large, rare CNVs were then compared to a list of CHD-associated genes (Supplemental Table II). To identify CNVs occurring in gene regions known to be associated with CHD, all CNVs that overlapped with one or more CHD genes listed in the Genomics England PanelApp gene panel for non-syndromic CHD were investigated (available at https://panelapp.genomicsengland.co.uk/panels/212/). The Genomics England PanelApp is a publicly available crowdsourcing tool that represents a consensus of causative genes for many diseases. The genes in each panel are rated according to a traffic-light system, in which green represents genes with a high level of evidence for the gene-disease association, amber represents those with a moderate level of evidence, and red represents genes with minimal evidence. A list of the CHD genes in the panel alongside their genetic rating is presented in Supplemental Table II. These classifications are used to guide clinical interpretation, so all classifications (green, amber, red) were considered in this investigation to maximize discovery. 3 The PanelApp list was supplemented in-house with additional known and suspected genes for syndromic and non-syndromic CHD that have been associated with CHD in at least one published paper. CNVs overlapping any of these genes, from PanelApp or the in-house list, were prioritized for further investigation and assessed using ACMG criteria for pathogenicity. 19
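The gene-overlap step can be illustrated with a simple interval comparison. The sketch below is not the authors' implementation: `Interval`, `overlaps` and `genes_hit` are hypothetical names, half-open coordinates are assumed, any overlap of at least one base counts as a hit (the text does not state a minimum overlap), and the panel entries are placeholder genes with invented coordinates rather than real hg19 positions.

```python
from typing import Dict, List, NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # inclusive
    end: int    # exclusive (half-open interval, an assumption of this sketch)


def overlaps(a: Interval, b: Interval) -> bool:
    # Two half-open intervals on the same chromosome overlap when each
    # starts before the other ends.
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def genes_hit(cnv: Interval, panel: Dict[str, Interval]) -> List[str]:
    # Return the names of all panel genes overlapped by the CNV, sorted
    # for reproducible output.
    return sorted(name for name, locus in panel.items() if overlaps(cnv, locus))


# Placeholder panel: gene names and coordinates are invented for illustration.
panel = {
    "GENE_A": Interval("chr1", 1_000_000, 1_050_000),
    "GENE_B": Interval("chr1", 1_200_000, 1_260_000),
    "GENE_C": Interval("chr2", 1_000_000, 1_050_000),
}
cnv = Interval("chr1", 900_000, 1_210_000)
hits = genes_hit(cnv, panel)  # GENE_A and GENE_B; GENE_C is on another chromosome
```

For a panel of a few hundred genes, this linear scan per CNV is adequate; an interval tree or a sorted sweep would only be needed for much larger gene sets.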

Identification of candidate CHD genes
Once CNVs overlapping known CHD-associated genes were analyzed, candidate CHD genes were identified by filtering the remaining genes using ExAC probability of loss-of-function intolerance (pLI) ≥ 0.8. 54 To further refine this list, genes were considered candidates for CHD if expression in the embryonic mouse heart was reported in the Gene Expression Database