Spontaneous Coronary Artery Dissection

Supplemental Digital Content is available in the text.


SCAD tiered gene list
We compiled a tiered list of genes of interest to SCAD based on publicly available gene lists and literature searches. The tiers indicate the current level of evidence for the association of each gene with SCAD. Tier 1 genes (n=6) comprise those that either harbour rare variants reported as pathogenic or likely pathogenic in multiple SCAD patients, or an enrichment of rare missense variants in cases compared to controls. Tier 2 genes (n=124) comprise those that harbour rare, presumed clinically relevant, variants that were found in a single SCAD patient or among patients with connective tissue or vascular disorders, or common variants associated with these disorders. Tier 3 genes (n=303) represent genes of interest contributing to relevant phenotypes in mice (Supplementary Table 1).

Identification and interpretation of pathogenic variants
For the UK SCAD cohort, pathogenic and likely pathogenic SNVs, indels, and SVs were identified using automated filtering followed by manual review and classification according to ACMG guidelines (14).
In detail, first, automated filtering identified SNVs and indels in the 384 SCAD cases that fulfilled all the following criteria:
- DRAGEN Status = PASS
- Quality score (QUAL) >= 30 in all carriers
- Position is covered >=10X in >99% of cases
- gnomAD minor allele frequency (popmax exomes) <= 0.001 (0.1%; in autosomes this is equivalent to 1 in 500 individuals sampled from a general population) (15)
- Minor allele frequency in 384 cases < 0.05 (5%)
- Affects a SCAD tier 1 or tier 2 gene
- Variant is 'high impact' in a gene for which loss of function is a known disease mechanism OR variant has previously been reported as DM in HGMD pro v2019.2 or pathogenic or likely pathogenic with no conflicts in ClinVar (accessed May 2019) (16,17)
Next, we performed manual review and classification of SNVs and indels according to ACMG guidelines (18). The following factors were considered:
- Affected transcript(s)
- Proximity of protein truncating variants to the 3' end of the gene
- Consistency of variant consequence with previously reported pathogenic variants in the gene
- Consistency of variant genotype with previously reported mode of inheritance of the gene
- Consistency of SCAD phenotype with previously reported phenotype associated with the gene
- If the variant has been previously reported as pathogenic, the original literature was reviewed wherever possible to ensure the classification within HGMD/ClinVar was consistent with the original report
Potentially pathogenic and likely pathogenic structural variants were also investigated. First, automated filtering identified the SVs in the 384 SCAD cases that fulfilled the following criteria:
- Deletion, because deletions generally have a higher probability than other SV types of negatively impacting the function of a gene (19,20)
- Overlaps with CCDS region of SCAD tier 1 and 2 genes
- Frequency within SCAD cohort < 0.05 (5%) and frequency in external datasets (DGV and 1000 Genomes (19,20)) < 0.01 (1%)
- Supported by more than one caller among Parliament2's ensemble of callers
- Not flagged as low quality by Parliament2
Next, we performed manual review of SVs that passed automated filtering, considering the following factors in addition to all the factors already described for SNVs and indels:
- < 100 high quality heterozygous SNV calls within boundaries of deletion
- Supported by both drop in coverage AND insert size/split read data
- Looks high-confidence upon manual inspection of the reads using Integrative Genomics Viewer (21)
Assessment of statistical differences between clinical endpoints and individuals with pathogenic/likely pathogenic variants compared to those without was performed using Fisher's Exact Tests.
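The automated SNV/indel criteria described above amount to a conjunction of per-variant checks. The following is a minimal sketch of that logic only, not the pipeline used in the study: the `Variant` record and all of its field names are hypothetical, and only the numeric thresholds are taken from the text.

```python
from dataclasses import dataclass
from typing import Optional

# Thresholds taken from the automated filtering criteria in the text.
MIN_QUAL = 30
MIN_FRACTION_CASES_COVERED = 0.99
MAX_GNOMAD_POPMAX_AF = 0.001
MAX_COHORT_AF = 0.05


@dataclass
class Variant:
    """Hypothetical variant record; field names are illustrative assumptions."""
    filter_status: str             # caller FILTER value, e.g. "PASS"
    min_carrier_qual: float        # lowest QUAL across all carriers
    fraction_cases_10x: float      # fraction of cases covered >=10X at the position
    gnomad_popmax_af: float        # gnomAD exomes popmax allele frequency
    cohort_af: float               # allele frequency within the SCAD cases
    gene_tier: Optional[int]       # SCAD tier of the affected gene, None if untiered
    high_impact_in_lof_gene: bool  # high impact, in a gene with a known LoF mechanism
    reported_pathogenic: bool      # DM in HGMD, or P/LP without conflicts in ClinVar


def passes_automated_filter(v: Variant) -> bool:
    """Return True only if the variant meets every automated criterion."""
    return (
        v.filter_status == "PASS"
        and v.min_carrier_qual >= MIN_QUAL
        and v.fraction_cases_10x > MIN_FRACTION_CASES_COVERED
        and v.gnomad_popmax_af <= MAX_GNOMAD_POPMAX_AF
        and v.cohort_af < MAX_COHORT_AF
        and v.gene_tier in (1, 2)
        and (v.high_impact_in_lof_gene or v.reported_pathogenic)
    )
```

Variants passing such a predicate would still require the manual review steps listed above; the sketch covers the automated stage only.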
For the Victor Chang Cardiac Research Institute SCAD cohort, automated filtering of variants in the genes identified in the University of Leicester SCAD cohort was performed using VPOT (22) with the following parameters:
- gnomAD genomes (popmax) minor allele frequency < 0.01
- Missense, nonsense, frameshift, or splicing variants
Loss-of-function variants in these genes were considered, as well as missense variants reported as pathogenic or likely pathogenic in either ClinVar or HGMD. Variants thus identified were confirmed by manual inspection using IGV (21).

Collapsing analysis
We selected the subset of 357 SCAD cases in the University of Leicester cohort and 13,722 UK Biobank controls who had high quality sequencing data, are unrelated, and of European ancestry. This selection aimed to minimise the risk of confounding technical artefacts. In detail, the following sample-level criteria for inclusion in the exome-wide collapsing analyses were applied:
- No evidence of contamination (VerifyBamID FREEMIX < 0.04) (23)
- Good coverage (for cases, average coverage of CCDS > 1st percentile and < 99th percentile; for controls, >= 95% of CCDS with read depth >= 10X and average read depth >=37X and <=130X)
- Percentage of reads that map to reference genome > 1st percentile (cases only)
- Unrelated (i.e. exclude one of each related pair, up to 3rd degree, calculated using PLINK http://pngu.mgh.harvard.edu/purcell/plink/) (24)
- Predicted to be of European ancestry by Peddy (25)
- Concordance between self-declared and genetic prediction of sex
- Ancestry of controls PC1 or PC2 <= 3SDs from mean of those of cases
- Down-sampled controls to harmonise sex and menopause status of cohort with that of cases
On average, at least 10-fold coverage was achieved for 96.7% and 96.6% of the 34.07 megabase pairs (Mbp) of the Consensus Coding Sequence (CCDS; release 22) for case and control subjects respectively. To alleviate confounding effects attributable to differential coverage, we only considered qualifying variants (QVs) affecting a pruned set of 33.13 Mbp (97.2%) of CCDS sites equally represented in HLI WGS and UK Biobank WES data.
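The ancestry criterion for controls (PC1 or PC2 within 3 SDs of the case mean) can be illustrated with a short sketch. It assumes, as one reading of the criterion, that a control is retained only when both PC1 and PC2 fall within the range; the function name and input layout are hypothetical.

```python
from statistics import mean, stdev


def controls_within_case_pc_range(case_pcs, control_pcs, n_sd=3.0):
    """Return indices of controls whose PC1 and PC2 both lie within
    n_sd standard deviations of the corresponding case means.

    case_pcs and control_pcs are sequences of (PC1, PC2) pairs.
    Requiring both components to pass is an assumption of this sketch.
    """
    centres = []
    spreads = []
    for component in range(2):
        values = [pcs[component] for pcs in case_pcs]
        centres.append(mean(values))
        spreads.append(stdev(values))

    kept = []
    for index, pcs in enumerate(control_pcs):
        in_range = all(
            abs(pcs[component] - centres[component]) <= n_sd * spreads[component]
            for component in range(2)
        )
        if in_range:
            kept.append(index)
    return kept
```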
Qualifying variants (QVs) are the subset of rare, high-quality, coding SNVs/indels that are considered during collapsing analysis. We used eleven distinct QV models. Selection of QVs was achieved by imposing a series of variant-level filters. Some of these filters were applied to all QV models and some were specific to the eleven distinct QV models. These filters are detailed below.
For all QV models:
After QVs had been selected, the number of cases with at least one QV versus the number with no QVs was compared with the corresponding counts in controls using the two-tailed Fisher's exact test. Our study-wide significance threshold, after Bonferroni correction for the number of genes and models tested, was α = 0.05/(10 × 18,659) = 2.7×10⁻⁷. Although we retain the genome-wide Bonferroni correction as our official significance cut-off (p<2.7×10⁻⁷), we also assigned a tissue-specific adjusted alpha of p<4.1×10⁻⁶, considering only the 12,069 genes with expression levels above 1.5 TPM based on the mean TPM value for the "Artery - Coronary" tissue subtype in the GTEx database (accessed 19/11/2019) and not correcting for the 10 differing models. With the goal of identifying a more refined subset of the most highly expressed coronary artery tissue genes, we further focused on the top decile (10%) (n=1,928 genes; adjusted p<2.6×10⁻⁵).
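The three significance thresholds and the per-gene test can be reproduced with a short standard-library sketch. This is an illustration under stated assumptions, not the software used in the study: the function names are hypothetical, and the two-tailed p-value uses the common convention of summing the probabilities of all tables no more likely than the observed one.

```python
from math import exp, lgamma


def bonferroni_alpha(n_tests, alpha=0.05):
    """Bonferroni-adjusted significance threshold for n_tests tests."""
    return alpha / n_tests


def _log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def fisher_exact_two_tailed(a, b, c, d):
    """Two-tailed Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    a, b: cases with / without a QV; c, d: controls with / without a QV.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def log_prob(x):
        # Hypergeometric log-probability of x carriers among the cases.
        return (_log_comb(row1, x) + _log_comb(row2, col1 - x)
                - _log_comb(total, col1))

    observed = log_prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are as likely as, or less likely than, the observed one.
    p_value = sum(
        exp(log_prob(x))
        for x in range(lowest, highest + 1)
        if log_prob(x) <= observed + 1e-7
    )
    return min(1.0, p_value)


# Thresholds quoted in the text.
study_wide_alpha = bonferroni_alpha(10 * 18659)   # genes x models
tissue_alpha = bonferroni_alpha(12069)            # expressed genes only
top_decile_alpha = bonferroni_alpha(1928)         # top-decile genes only
```

A gene would be declared study-wide significant under a given QV model when `fisher_exact_two_tailed(...)` for its carrier counts falls below `study_wide_alpha`.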

Review of highly ranked non-significant collapsing analysis results
Genes that were highly ranked in the collapsing analysis but did not achieve study-wide statistical significance were manually evaluated by AAB and TRW. Genes were assessed for reported function, involvement in human disease, human tissue expression, and mouse phenotype using GeneCards, OMIM, Human Protein Atlas, and MGI, as well as a broader literature review (27-30).

Mantis-ml
We also employed mantis-ml v1.5.4 (31), an automated gene prioritisation tool that considers a wealth of publicly available resources to objectively assign probabilities to genes of unknown relevance given an input set of seed genes; here 130 SCAD tier 1 and tier 2 genes (Supplementary Table 1). Automatic feature compilation was performed by providing the following disease/phenotype terms in the input configuration file: heart, cardio, aortic, aorta, coronary, vascular, artery, dissection, fibromuscular, kidney, vessel and connective tissue. Mantis-ml was trained using six different classifiers: Extra Trees, XGBoost, Random Forest, Gradient Boosting, Support Vector Classifier and feed-forward Deep Neural Net.
Once the mantis-ml genome-wide probabilities of being a SCAD gene were generated, we performed a hypergeometric test to determine whether the top-ranked collapsing analysis genes (i.e. genes achieving a p<0.05 in the collapsing analyses) were significantly enriched for the top 5% of mantis-ml SCAD-predicted genes. A statistically significant result from the hypergeometric test would highlight that there are disease-ascertained genes among the top of the collapsing results and the specific genes most likely to be contributing to that enrichment. In parallel, and in addition to generating a permutation-based null, we also performed the hypergeometric enrichment test using the synonymous genetic model to define our empirical null controlling for the underlying case-control configurations.
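The hypergeometric enrichment test described above asks how likely it is to see at least the observed overlap between two gene lists by chance. The following is a minimal sketch of that upper-tail calculation; the function and parameter names are hypothetical, and it does not reproduce the permutation-based or synonymous-model nulls.

```python
from math import comb


def hypergeom_enrichment_p(n_genes, n_top_predicted, n_top_collapsing, n_overlap):
    """Upper-tail hypergeometric p-value P(X >= n_overlap).

    n_genes:          all genes tested
    n_top_predicted:  genes in the top 5% of mantis-ml predictions
    n_top_collapsing: genes with p<0.05 in the collapsing analysis
    n_overlap:        genes present in both lists
    """
    largest_possible = min(n_top_predicted, n_top_collapsing)
    n_other = n_genes - n_top_predicted
    favourable = sum(
        comb(n_top_predicted, k) * comb(n_other, n_top_collapsing - k)
        for k in range(n_overlap, largest_possible + 1)
    )
    return favourable / comb(n_genes, n_top_collapsing)
```

Exact integer arithmetic is used throughout, so the only rounding occurs in the final division.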

Gene-set enrichment analysis
We assessed potential enrichment in gene-sets using Megagene (https://github.com/QuanliWang/MegaCollapsing) (32). Briefly, we applied a logistic regression model in which the tally of genes containing QVs is compared between cases and controls, correcting for sex, the number of synonymous QVs each individual has in the gene-set, and the exome-wide tally of QVs each individual has in the QV model. We tested a total of 9,339 gene-sets for each QV model.
Apart from the four SCAD gene-sets, these gene-sets are standardised and designed to be disease-agnostic. Gene-sets containing two different genes with overlapping CCDS regions were excluded.
Gene-sets comprise the following:
Ten different genetic models tested. MTR = missense tolerance ratio. PTV = protein truncating variant.
