Genetic Basis and Prognostic Value of Exercise QT Dynamics

Supplemental Digital Content is available in the text.

(1) Construction of three averaged ECG waveforms by aligning and averaging all heart beats within a 15s window at three stages along the exercise stress test: (A) resting (pre-exercise, rest), (B) peak-exercise (ex), and (C) 50s recovery from exercise (rec).
(2) From each averaged ECG waveform, we measured the RT interval as the interval between the R-peak and the end of the T-wave using the tangent method 5 (Supplemental Fig. 1). The corresponding RR intervals were calculated as the median of all RR intervals within the 15s window.

Genetic quality control
To study the genetic basis and risk of QT dynamics, we used genotyped data of individuals from European ancestry from UKB. Genotyping was performed by UKB using the Applied Biosystems UK BiLEVE Axiom Array or the UKB AxiomTM Array 6 . SNVs were imputed centrally by UKB using the Haplotype Reference Consortium (HRC) and UK10K/1000Genomes haplotype resource panels 6 . We applied genetic quality control (QC) to exclude individuals with bad genotype quality, provided by UKB, i.e. high missingness or heterozygosity and discordance between the self-reported sex, and the sex inferred from the genotypes were excluded (N = 2,655) 6 . We then restricted our dataset to individuals of European ancestry only (N = 71,713). This was achieved using the k-means function in R as a clustering algorithm to select clusters according to information from the first two principal components (PC1 and PC2). The k-means algorithm partitions the points into k groups such that the sum of squares from points to the assigned cluster centres is minimised. Then, we applied k-means separately to cluster according to each of PC1 and PC2, and initially only with k=4, for a 4-way clustering, to correspond to the 4 main ethnic clusters within UKB: White, African, Asian and Chinese. We then created an overall clustering, according to the intersections of the PC1-4means-clustering and the PC2-4means-clustering, so that participants were only categorised as "White" overall, if they were contained in the "White" cluster for both PC1 and PC2. Next, we created an overall "Mixed / Other" cluster, for any participants, whose clustering differed between PC1 and PC2. Finally, we combined the PCA ancestry clusters with the self-reported ethnicity. Individuals were only included if the results PCA-clustering results matched the self-reported ancestry.

Phenotypic quality control
We next excluded all individuals with pre-existing medical conditions known to affect QT dynamics. This included atrial fibrillation, history of myocardial infarction or heart failure, (supra)-ventricular tachycardia, atrioventricular nodal re-entrant tachycardia, second or third degree atrioventricular block, bundle branch block and use of a pacemaker.
We also excluded individuals with an extreme heart rate during rest (<40 or >120 bpm), peak exercise (>200 bpm) or at 1 min post-exercise (<40 or >200 bpm), individuals with very small changes in RR interval (inverse of heart rate) from rest to exercise, or exercise to recovery was (∆ <10 ms), and individuals with poor quality ECG recording.

Genome wide association analyses
Following genetic and phenotypic quality control, 51,884 and 51,503 individuals of European ancestry from the exercise cohort were included in the genome wide association analyses for QT dynamics during exercise and recovery ( Figure 1). We applied inverse-normal transformations before genetic analyses to correct for skewed distributions of the QT dynamics markers (Supplemental Fig. 3). Model SNVs were selected from the genotyped SNVs, required for the subsequent GWASs using PLINK 1.9 7 . This selection was based on the Since we did not have access to an independent study that could serve as a replication study, we randomly divided our dataset into discovery (N ~ 30,000) and replication (N ~ 22,000) datasets and removed individuals with kinship coefficient > 0.088 9 . We compiled all SNVs with P < 1 × 10 -6 from the discovery analysis and mapped them to individual loci based on genomic distance of > 500 Kb to each side of the lead SNV. If multiple SNVs fit the selection criteria for a single region, only the SNV with the smallest P value was considered for follow up. As a QC step, for each trait we reviewed the selected SNVs to check for unrealistically high effect sizes, large standard errors, and none were observed. Regional association plots were produced for all selected SNVs and these were carefully reviewed. For each trait, replication was confirmed if the P-value in the replication cohort was lower than the Bonferroni threshold and the effect size was in the same direction observed in discovery analyses in the replication cohort.
In addition to the replication analyses, we also performed a full dataset GWAS for the three traits. Additional loci, considering one lead SNV per 1 Mb region, for each trait reaching a genome-wide significance threshold (P ≤ 5 × 10 −8 ) from the full dataset GWAS were identified. In addition to the replication study, we also performed a GWAS in the full datasets and sex-stratified GWASs to identify additional loci.

Conditional analysis
For each trait, we examined the existence of SNVs independent to lead SNVs but tagging the same loci by applying genome-wide complex trait analysis (GCTA) 10 for all validated and genome-wide significant loci from the full dataset GWAS. A secondary signal would be declared if: (i) the newly identified SNV original P value was lower than 1x10 -6 ; (ii) there was less than a 1.5-fold difference between the lead SNV and secondary association P values on a -log10 scale, i.e., if -log10(Plead)/-log10(Psec) < 1.5; and (iii) there was less than a 1.5-fold difference between the main association and conditional association P values on a -log10 scale, i.e., if -log10(Psec)/-log10(Pcond) < 1.5.

Percent variance
For each trait, the percent of variance explained by all genome-wide significant variants and the secondary signals was derived by generating the residuals from the regression model of each trait against the covariates used in each respective genetic model. We then fitted a second linear model for the trait residuals with all the identified variants plus the top ten principal components to assess the variance explained.

Sex-stratified analyses
We performed sex-stratified analyses in the full cohort (QT dynamics during exercise: 28,449 females and 24,412 males; QT dynamics during recovery: 28,689 females, 24,172 males) including the same covariates in the regression model as the full dataset analyses but excluding sex. We then looked for genetic variants that were significantly associated with any of the traits in one of the sex-specific cohorts, but not in the primary analysis.

Overlap with reported loci for resting QT
To determine whether loci overlapped with resting QT interval, we downloaded all reported SNVs for resting QT from the NHGRI GWAS catalog. At the time of writing, the catalogue had not been updated with the latest results from Bihlmayer et al. 11 and Van Setten et al. 12 , which we added to the list resulting in 43 loci in total by mapping individual loci based on genomic distance of > 500 Kb to each side of the reported SNVs.. We then filtered out variants that were not genome or exome wide significant (P < 5 x 10 -8 or P < 2 x 10 -7 , respectively). Then for each known variant, we calculated the pairwise LD for all these variants which ever was the larger, was considered as the known locus.

Bioinformatics analyses
We performed comprehensive bioinformatics analyses to annotate loci at both variant (all SNVs in linkage disequilibrium (LD), r 2 ≥0.8, were considered) and gene level. The LD was calculated using genetic data from UKB within a 8Mb region (± 4Mb) around each known variant using PLINK v2 7 . For each lead SNV, we annotated the nearest genes and genes in which SNVs in LD (r 2 >0.4) with the lead SNV are located using University of California, Santa Cruz (UCSC) known genes. Variant effect predictor (VEP) analyses determined the effect of the variants, including the impact of amino acid substitutions 13 . We also investigated whether any of the discovered GWAS signals co-localized with genetic variants that regulate expression in adrenal, heart and brain tissues using expression quantitative trait locus (eQTL) signals from the Genotype-Tissue Expression (GTEx) dataset version 7 14 using COLOC software 15 . Potential target genes of regulatory SNVs were identified using long-range chromatin interaction (Hi-C) data 16 . In addition, DEPICT 17 (Data-driven Expression-Prioritised Integration for Complex Traits) analyses prioritized likely causal genes and tissues.
Enrichments with FDR < 0.05 were deemed significant. A literature review examined all identified genes at a locus in order to prioritize which candidate genes were used as an input for g:profiler 18 . Finally, trait pleiotropy was assessed by querying associated SNVs for other traits using PhenoScanner 19 . We also queried gene-specific animal models using International Mouse Phenotyping Consortium and the Mouse Genome Informatics database 20 .

Survival analyses
We conducted a survival analyses to investigate the prognostic value of QT dynamics during exercise and recovery in 56,643 individuals without a history of CV disease (Figure 1).
The following covariates were included: age, sex, diabetes, high cholesterol, body mass index (BMI), systolic blood pressure (SBP), diastolic blood pressure (DBP), resting heart rate, heart rate increase/decrease during exercise and recovery, and the resting heart rate-corrected QT  Table 2.
Two-tailed Mann-Whitney or Chi-squared tests were used to evaluate which variables were significantly different for individuals reaching the endpoints. Variables for which differences were statistically significant (P < 0.05) were included in univariate and, if differences remained significant, in the multivariate Cox regression analyses to investigate their predictive value.
Continuous variables were standardized to have a mean of 0 and a standard deviation of 1.
Proportional hazards assumptions were tested using Schoenfeld residuals. A P-value of < 0.05 was considered statistically significant. Statistical analyses were performed with R version 3.5.1 using the survival package.

Genetic risk scores
To investigate whether the genetic variants associated with QT dynamics during exercise and recovery modulate CV risk in healthy individuals, we constructed weighted genetic risk scores (GRSs) using all identified lead and secondary SNVs. Beta coefficients from the replication analyses were used as weights. We next evaluated whether the GRSs were associated with CV risk in 357,822 unrelated individuals from UKB comprised of European ancestry who were not included in the GWAS, passed genetic QC, and were free of a previous history of CV events (Figure 1). The impact of the GRS was evaluated by comparing the incidence of CV events in the top 5% vs bottom 5% of the GRS distribution using the Chi-square test.

Supplemental Figures
Supplemental Figure 1.

Derivation of average heart beat and QRST-waveform.
The end of the T-wave (Tend) was measured using the Tangent method. The QT interval was measured between QRS onset and the end of the T-wave. The RT interval was measured from R-peak to the end of the T-wave.

Supplemental Figure 2: Schematic illustration of the QT dynamics measurements during exercise and recovery.
Signal-averaged ECGs were obtained at three timepoints during the exercise protocol: rest (preexercise), peak-exercise, and recovery. At each time point, we computed the signals averaged ECG waveform to measure the RT interval. The QT dynamics during exercise was then approximated by the change in RT interval between rest and peak-exercise divided by the corresponding change in the RR interval. Similarly, the QT dynamics during recovery was approximated by dividing the change in RT interval between recovery and peak-exercise by the corresponding change in the RR interval. The RR interval corresponded to the median RR interval of all heart beats used to compute the averaged ECG waveform.

Histograms of QT dynamics during exercise (A) and QT dynamics during recovery (B).
The red line shows the estimated distribution shape.
Supplemental Figure 4. Association results of the QT dynamics GWAS in the full data. Supplemental Figure 6A. Regional plots for all lead variants for QT dynamics during exercise in the full analysis.
Supplemental Figure 6B. Regional plot for the sex-specific lead variant for QT dynamics during exercise in the full analysis of male data.
Supplemental Figure 6C. Regional plots for all lead variants for QT dynamics during recovery in the full analysis.
Supplemental Figure 7A.   Table 2. ICD-10 codes used in follow-up analysis Supplemental Table 3. Sex-stratified analyses for QT dynamics during exercise Supplemental Table 4. Overlap between loci associated with QT dynamics during exercise and recovery, and resting QT interval. Supplemental Table 5. Phenoscanner lookup for novel variants associated with QT dynamics during exercise from loci not overlapping with QT interval. Variants reported by from peer-reviewed journals (A) and from Neale's lab (B). Supplemental Table 6A. Annotation of lead and conditionally independent variants and their proxies (r2>0.8) of QT dynamics during exercise loci using Variant Effect Predictor. Supplemental Table 6B. Annotation of lead and conditionally independent variants and their proxies (r2>0.8) of QT dynamics during recovery loci using Variant Effect Predictor. Supplemental Figure 7. Chromatin interaction (Hi-C) results for all lead and conditionally independent variants or their proxies (r2>0.8) for QT dynamics loci. Supplemental Figure 8. Significant cis-eQTLs for QT dynamics associated variants Supplemental Figure 9A. DEPICT tissue and cell type enrichment results across QT dynamics during exercise Supplemental Figure 9B. DEPICT gene prioritization results for QT dynamics during exercise Supplemental Figure 10. Candidate genes at loci associated with QT dynamics during exercise and recovery Supplemental Figure 11A. g:profiler results for QT dynamics during exercise. Input genes are the candidate genes for QT dynamics during exercise highlighted in Supplementary Table 11. Supplemental Figure 11B. g:profiler results for QT dynamics during recovery. Input genes are the candidate genes for QT dynamics during exercise highlighted in Supplementary Table 11.