Incorporating Latent Variables Using Nonnegative Matrix Factorization Improves Risk Stratification in Brugada Syndrome

Background A combination of clinical and electrocardiographic risk factors is used for risk stratification in Brugada syndrome. In this study, we tested the hypothesis that incorporating latent variables that capture interrelations among risk variables, extracted using nonnegative matrix factorization, can improve risk stratification compared with logistic regression. Methods and Results This was a retrospective cohort study of patients presenting with Brugada electrocardiographic patterns between 2000 and 2016 in Hong Kong, China. The primary outcome was spontaneous ventricular tachycardia/ventricular fibrillation. The external validation cohort included patients from 3 countries. A total of 149 patients with Brugada syndrome (84% males, median age of presentation 50 [38–61] years) were included. Compared with the nonarrhythmic group (n=117, 79%), the spontaneous ventricular tachycardia/ventricular fibrillation group (n=32, 21%) was more likely to have syncope (69% versus 37%, P=0.001) and atrial fibrillation (16% versus 4%, P=0.023) and had longer QTc intervals (424 [399–449] versus 408 [386–425] milliseconds; P=0.020). No difference in QRS interval was observed (108 [98–114] versus 102 [95–110] milliseconds, P=0.104). Logistic regression found that syncope (odds ratio, 3.79; 95% CI, 1.64–8.74; P=0.002), atrial fibrillation (odds ratio, 4.15; 95% CI, 1.12–15.36; P=0.033), QRS duration (odds ratio, 1.03; 95% CI, 1.002–1.06; P=0.037), and QTc interval (odds ratio, 1.02; 95% CI, 1.01–1.03; P=0.009) were significant predictors of spontaneous ventricular tachycardia/ventricular fibrillation. Increasing the number of latent variables of these electrocardiographic indices incorporated by nonnegative matrix factorization from n=0 (logistic regression alone) to n=6 improved the area under the receiver operating characteristic curve from 0.71 to a maximum of 0.80. In the external validation cohort (n=227), the model improved the area under the curve from 0.64 to 0.71.
Conclusions Nonnegative matrix factorization improves the prediction of arrhythmic outcomes by extracting latent features that capture interrelations among different variables.

Brugada syndrome (BrS) is an arrhythmogenic entity characterized by a high propensity for ventricular tachycardia/ventricular fibrillation (VT/VF) or sudden cardiac death (SCD). [1][2][3] Subjects with a spontaneous type 1 electrocardiographic (ECG) pattern and aborted SCD or syncope of arrhythmic origin are at the highest risk for future arrhythmic events and are advised to receive an implantable cardioverter-defibrillator. 4 However, emerging evidence underscores our limited ability to risk-stratify patients with BrS, particularly those who are asymptomatic. In a study of 50 individuals with SCD, most BrS-related SCDs (72%) occurred in asymptomatic individuals. 5 In the SABRUS (Survey on Arrhythmic Events in Brugada Syndrome) registry, only 75% of patients who experienced an arrhythmic event after receiving a prophylactic implantable cardioverter-defibrillator met the 2013 class II indications of the Heart Rhythm Society, European Heart Rhythm Association, and Asia Pacific Heart Rhythm Society (syncope or inducible arrhythmias during programmed ventricular stimulation), 6 suggesting that risk stratification still needs to be improved.
Although BrS was initially described as a primary electrical disease, 2,3 recent evidence has demonstrated the presence of structural abnormalities located mainly in the epicardium of the right ventricular outflow tract. 7,8 Two principal mechanisms have been proposed to explain the pathophysiologic basis of BrS: the depolarization hypothesis, based on slow conduction in the right ventricular outflow tract, and the repolarization hypothesis, based on transmural dispersion of right ventricular action potential morphology driven by loss of the spike-and-dome action potential in the right ventricular epicardium. 9 To date, both repolarization and depolarization abnormalities have been associated with the development of VF in patients with BrS. 10 Several risk stratification models incorporating clinical risk factors and ECG variables have been developed; however, they do not assess the interrelations between risk variables. In this study, we used nonnegative matrix factorization (NMF) to extract latent variables capturing the inherent interrelations among risk variables. NMF is a group of algorithms in multivariate analysis and linear algebra in which a matrix V is factorized, usually into 2 matrices W and H such that V ≈ WH, with the property that all 3 matrices have no negative elements. This nonnegativity makes the resulting matrices easier to interpret. This approach enabled us to test the hypothesis that incorporating latent variables improves outcome prediction compared with logistic regression alone.
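To make the factorization V ≈ WH concrete, the following is a minimal sketch using the classical Lee-Seung multiplicative update rules for the Frobenius-norm objective. This choice of algorithm is an illustrative assumption: the text does not state which NMF solver or objective the study used, and the function name `nmf_multiplicative` and the toy matrix are hypothetical.

```python
import numpy as np


def nmf_multiplicative(V, k, n_iter=500, eps=1e-10, seed=0):
    """Factorize a nonnegative matrix V (m x n) as V ~ W @ H (hypothetical helper).

    Uses Lee-Seung multiplicative updates for ||V - WH||_F^2. W is m x k and
    H is k x n; both start from nonnegative random values and stay nonnegative
    because each update multiplies by a ratio of nonnegative quantities.
    """
    V = np.asarray(V, dtype=float)
    if np.any(V < 0):
        raise ValueError("NMF requires a nonnegative input matrix")
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, k))
    H = rng.random((k, n))
    for _ in range(n_iter):
        # Update H with W fixed, then W with H fixed; eps avoids division by zero.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Toy example: 4 "patients" x 3 "variables" already scaled to [0, 1] (made-up values).
V = np.array([
    [1.0, 0.2, 0.6],
    [0.0, 0.9, 0.4],
    [1.0, 0.1, 0.7],
    [0.0, 1.0, 0.3],
])
W, H = nmf_multiplicative(V, k=2)
print(np.round(W @ H, 2))  # reconstruction of V from 2 latent components
```

Each row of W gives one row of V as a nonnegative mixture of the k rows of H, which is what makes the latent components additive and comparatively easy to inspect.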

Inclusion of Study Subjects
This retrospective study received ethics approval from the Joint Chinese University of Hong Kong-New Territories East Cluster Clinical Research Ethics Committee (NTEC-CUHK) and the Medical Ethical Review Committee of the Evangelismos General Hospital of Athens. Data on patients with BrS with a spontaneous or drug-induced (ajmaline 1 mg/kg or flecainide 2 mg/kg) type 1 BrS ECG pattern from Hong Kong, China, were retrospectively analyzed. The analyses for our cohort (the training set) were validated against an external cohort (the validation set) of patients with BrS from Athens, Greece; Dalian, China; Guangzhou, China; and Osaka, Japan. The anonymized databases have been made available by the investigators at Zenodo: https://zenodo.org/record/3266179; https://zenodo.org/record/3465811. [11][12][13] The ECG diagnosis of BrS was strictly based on the recommendations of the 2015 European Society of Cardiology guidelines for the management of patients with ventricular arrhythmias and the prevention of SCD. 4 Structural heart disease was excluded in all subjects. The following demographic and clinical details were extracted from medical case records: age, sex, aborted SCD, syncopal symptoms, spontaneous VT/VF, and inducible VT/VF during programmed ventricular stimulation. Programmed stimulation from the right ventricular apex was performed at 3 drive cycle lengths (600, 500, and 430 milliseconds) with up to triple extrastimuli (minimum coupling interval of 200 milliseconds). Inducible ventricular arrhythmia was defined as any ventricular arrhythmia (VT/VF) causing syncope/circulatory collapse or requiring intervention for its termination.

CLINICAL PERSPECTIVE
What Is New?
• Nonnegative matrix factorization was used to extract latent features between clinical variables (initial type 1 pattern, syncope, atrial fibrillation) and electrocardiographic variables (QRS duration, QTc interval) in Brugada syndrome.
• Application of nonnegative matrix factorization improved prediction of incident ventricular tachycardia/ventricular fibrillation compared with logistic regression.
What Are the Clinical Implications?
• Incorporation of information on inherent higher-order interrelations among variables improved prediction of ventricular tachycardia/ventricular fibrillation in Brugada syndrome.

Abbreviations: BrS, Brugada syndrome; NMF, nonnegative matrix factorization; SCD, sudden cardiac death.

Electrocardiographic Variables
The 12-lead ECGs were recorded at a paper speed of 25 mm/s with an amplification of 10 mm/mV (sampling rate: 10 seconds of ECG at 500 Hz, filters: 0.5-100 Hz). The following automated measurements were extracted from the baseline ECG records: heart rate, PR interval, QRS duration reflecting total depolarization time (beginning of Q to the end of S), and QT interval reflecting total repolarization time, both uncorrected and corrected for heart rate using Bazett's formula (QTc).
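Bazett's correction divides the measured QT interval by the square root of the R-R interval expressed in seconds. The sketch below assumes RR is derived from the automated heart rate as 60/HR; the ECG system may instead measure RR directly, and the function name `qtc_bazett` is hypothetical.

```python
import math


def qtc_bazett(qt_ms: float, heart_rate_bpm: float) -> float:
    """Heart-rate-corrected QT in ms by Bazett's formula: QTc = QT / sqrt(RR).

    RR is the R-R interval in seconds, derived here (assumption) as 60 / HR.
    """
    if qt_ms <= 0 or heart_rate_bpm <= 0:
        raise ValueError("QT and heart rate must be positive")
    rr_s = 60.0 / heart_rate_bpm
    return qt_ms / math.sqrt(rr_s)


print(round(qtc_bazett(400, 60), 1))  # 400.0 (RR = 1 s, so no correction)
print(round(qtc_bazett(400, 75), 1))  # 447.2 (RR = 0.8 s)
```

At 60 beats/min the correction is neutral; at faster rates the shorter RR interval inflates QTc relative to the raw QT.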

Statistical Analysis
Data were expressed as median [interquartile range]. Differences between groups were tested using the Kruskal-Wallis 1-way analysis of variance on ranks. For each ECG variable, the optimal cutoff value for predicting spontaneous VT/VF was determined from receiver operating characteristic curve analysis using the Youden index, ie, the value maximizing the sum of sensitivity and specificity. The predictive value of each ECG variable was represented by the area under the receiver operating characteristic curve (AUC). Logistic regression was performed to determine the predictive value of different variables for spontaneous VT/VF; the results were presented as odds ratios (ORs) with 95% CIs. A P value <0.05 was considered statistically significant.
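The cutoff, AUC, and OR computations described above can be sketched with NumPy alone. This is a minimal illustration, not the study's code: the helper names are hypothetical, a case is assumed to be called positive when its value is at or above the cutoff, the AUC is computed through its Mann-Whitney interpretation, the OR CI uses the usual Wald form exp(β ± 1.96·SE), and the QTc values are made up.

```python
import numpy as np


def youden_cutoff(scores, labels):
    """Return (cutoff, J) maximizing Youden's J = sensitivity + specificity - 1.

    A case is called positive when score >= cutoff; every observed value is tried.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    best_cut, best_j = None, -np.inf
    for cut in np.unique(scores):
        pred = scores >= cut
        sens = np.mean(pred[labels])     # true-positive rate
        spec = np.mean(~pred[~labels])   # true-negative rate
        j = sens + spec - 1.0
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut, best_j


def auc_mann_whitney(scores, labels):
    """ROC AUC as P(random positive scores higher than random negative); ties count 0.5."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    diff = scores[labels][:, None] - scores[~labels][None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size


def odds_ratio_ci(beta, se, z=1.96):
    """Odds ratio and Wald 95% CI from a logistic-regression coefficient and its SE."""
    return np.exp(beta), np.exp(beta - z * se), np.exp(beta + z * se)


# Made-up QTc values (ms) and outcomes (1 = spontaneous VT/VF), for illustration only.
qtc = [395, 402, 410, 415, 420, 425, 430, 445]
vtvf = [0, 0, 0, 0, 1, 0, 1, 1]
cut, j = youden_cutoff(qtc, vtvf)
print(f"cutoff={cut:.0f} ms, J={j:.2f}")              # cutoff=420 ms, J=0.80
print(f"AUC={auc_mann_whitney(qtc, vtvf):.3f}")       # AUC=0.933
```

Maximizing J is equivalent to maximizing sensitivity plus specificity, because the two differ only by the constant 1.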

Nonnegative Matrix Factorization
An NMF approach was used to capture inherent interrelations among risk variables, and the resulting latent variables were used to develop a machine learning model for predicting spontaneous VT/VF. First, we constructed a matrix V representing the interrelations between clinical variables (initial type 1 pattern, syncope, atrial fibrillation [AF]) and ECG variables (QRS duration, QTc interval). Second, NMF was used to decompose V into a matrix W multiplied by a matrix H, for different numbers of components (ie, the number of latent variables generated). The generated latent variables were then combined with the risk variables as inputs to logistic regression.
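A minimal scikit-learn sketch of this two-step pipeline follows, under loudly stated assumptions. The text does not specify how V encodes the interrelations, so here V is simply the patient-by-variable matrix rescaled to [0, 1], and each patient's row of W is used as that patient's latent variables; the latent variables are concatenated with the scaled risk variables and passed to logistic regression. The helper `make_nmf_logistic` and the synthetic cohort are hypothetical, and `MinMaxScaler(clip=True)` requires scikit-learn 0.24 or later.

```python
import numpy as np
from sklearn.decomposition import NMF
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import FunctionTransformer, MinMaxScaler


def make_nmf_logistic(d, seed=0):
    """Logistic regression on the risk variables plus d NMF latent variables.

    Assumption: V is the patients x variables matrix rescaled to [0, 1], so each
    patient's row of W (patients x d) serves as that patient's latent variables.
    """
    features = FeatureUnion([
        ("raw", FunctionTransformer()),  # identity: keep the original risk variables
        ("latent", NMF(n_components=d, init="random", max_iter=2000, random_state=seed)),
    ])
    return Pipeline([
        ("scale", MinMaxScaler(clip=True)),  # NMF needs nonnegative input
        ("features", features),
        ("logit", LogisticRegression(max_iter=1000)),  # L2-penalized by default
    ])


# Synthetic cohort (made-up): type 1 pattern, syncope, AF, QRS (ms), QTc (ms).
rng = np.random.default_rng(0)
n = 149
X = np.column_stack([
    rng.integers(0, 2, n),
    rng.integers(0, 2, n),
    rng.random(n) < 0.07,
    rng.normal(104, 10, n),
    rng.normal(410, 25, n),
]).astype(float)
logit = -2.0 + 1.3 * X[:, 1] + 1.4 * X[:, 2] + 0.02 * (X[:, 4] - 410)
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

model = make_nmf_logistic(d=4).fit(X, y)
augmented = model[:-1].transform(X)  # 5 scaled risk variables + 4 latent variables
print(augmented.shape)  # (149, 9)
```

Wrapping scaling and NMF inside the pipeline means that, on new patients, H stays fixed from training and only each new patient's latent coordinates are solved for.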

Clinical Characteristics
This study cohort consisted of 149 patients with BrS from Hong Kong, China. The baseline demographic and clinical characteristics are presented in Table 1. The median age was 50 [38-61] years, and 84% of the subjects were male. Syncope occurred in 65 (44%) patients and spontaneous VT/VF in 32 (21%). AF was found in 10 (7%) patients. An initial spontaneous type 1 ECG pattern was recorded in 79 patients (53%). An implantable cardioverter-defibrillator was implanted in 47 (32%) subjects. Electrophysiological studies were performed in 44 (30%) patients, of whom 28 (64%) had a positive test.

Regression Analysis With Latent Variables Extracted by NMF
Next, we constructed a variable matrix and generated latent variables by performing NMF on it to predict spontaneous VT/VF. The total number of components, denoted d, is the number of extracted latent variables. We then combined the d latent variables with the 4 ECG variables of the 4-point score system and used them to fit a logistic regression model. The regression performance for each number of latent variables d is given in Table 4 and was compared with that of the baseline model (ie, logistic regression without latent variables). Performance was evaluated using the AUC and 3 common metrics: precision, recall, and F1 score. Two-fold cross-validation was used to reduce the risk of overfitting.
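The evaluation described above can be sketched as a 2-fold stratified cross-validation over d, with the AUC computed from predicted probabilities and precision, recall, and F1 computed at a 0.5 probability threshold. The threshold, the shuffled stratified split, refitting the scaler and NMF inside each training fold (so nothing leaks from the held-out fold), the synthetic data, and the helper names `build_model` and `evaluate` are all assumptions for illustration; the printed numbers will not reproduce Table 4. `init="random"` is used because scikit-learn's NNDSVD-based initializations require d to be no larger than the number of input variables, which d=6 would exceed here.

```python
import numpy as np
from sklearn.decomposition import NMF
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, precision_score, recall_score, roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import FunctionTransformer, MinMaxScaler


def build_model(d):
    """d = 0 gives plain logistic regression; d > 0 appends d NMF latent variables."""
    features = FunctionTransformer()  # identity transform for the baseline
    if d > 0:
        features = FeatureUnion([
            ("raw", FunctionTransformer()),
            ("latent", NMF(n_components=d, init="random", max_iter=2000, random_state=0)),
        ])
    return Pipeline([
        ("scale", MinMaxScaler(clip=True)),  # keep NMF input nonnegative
        ("features", features),
        ("logit", LogisticRegression(max_iter=1000)),
    ])


def evaluate(X, y, d, seed=0):
    """2-fold stratified CV; scaling and NMF are refit inside each training fold."""
    cv = StratifiedKFold(n_splits=2, shuffle=True, random_state=seed)
    prob = cross_val_predict(build_model(d), X, y, cv=cv, method="predict_proba")[:, 1]
    pred = (prob >= 0.5).astype(int)  # assumed decision threshold
    return {
        "AUC": roc_auc_score(y, prob),
        "precision": precision_score(y, pred, zero_division=0),
        "recall": recall_score(y, pred, zero_division=0),
        "F1": f1_score(y, pred, zero_division=0),
    }


# Synthetic cohort (made-up): type 1 pattern, syncope, AF, QRS (ms), QTc (ms).
rng = np.random.default_rng(0)
n = 149
X = np.column_stack([
    rng.integers(0, 2, n),
    rng.integers(0, 2, n),
    rng.random(n) < 0.07,
    rng.normal(104, 10, n),
    rng.normal(410, 25, n),
]).astype(float)
logit = -2.0 + 1.3 * X[:, 1] + 1.4 * X[:, 2] + 0.02 * (X[:, 4] - 410)
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

for d in [0, 2, 3, 4, 5, 6]:
    scores = evaluate(X, y, d)
    print(f"d={d}: " + ", ".join(f"{k}={v:.2f}" for k, v in scores.items()))
```

With only 2 folds, each model is trained on roughly half of the cohort, so the estimates are conservative but can vary noticeably with the random split; repeating the split with different seeds would show that variability.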
Using logistic regression as the baseline model, the AUC was 0.71 (Table 4, left side). Incorporating d=2, 3, 4, 5, and 6 additional latent variables by NMF increased the AUC to 0.72, 0.73, 0.80, 0.79, and 0.73, respectively. Thus, adding NMF-derived latent variables improved prediction performance over the baseline logistic regression model, with the best performance at d=4. Our models were also validated using the external cohort from Athens, Greece; Dalian, China; Guangzhou, China; and Osaka, Japan (Table 4, right side). For the external cohort, the AUC was 0.64 using logistic regression and increased to 0.68, 0.68, 0.70, 0.71, and 0.69 with d=2, 3, 4, 5, and 6 additional latent variables, respectively, with the best performance at d=5.

DISCUSSION
The main findings of this study are that (1) syncope, AF, QRS duration, and QTc interval were significant predictors of spontaneous VT/VF; and (2) incorporating information on inherent higher-order interrelations among variables improved prediction of VT/VF.
The pathogenesis of BrS remains controversial. Both depolarization (conduction delay within the right ventricular outflow tract, prolonged and fractionated potentials) and repolarization (imbalance of epicardial/endocardial repolarizing currents) abnormalities have been suggested to play crucial roles in the arrhythmogenesis of BrS. [14][15][16] Previous studies have addressed the prognostic significance of specific depolarization and repolarization ECG markers. Among depolarization markers, a prolonged QRS duration has been associated with an increased risk of arrhythmic events, 17 and QRS fragmentation has been associated with a 3.9-fold increase in the risk of future arrhythmic events 18 and a worse prognosis 19 in BrS. A wide and/or large S-wave in lead I has been suggested as a powerful predictor of ventricular arrhythmias, possibly reflecting delayed activation of the right ventricular outflow tract. 20 Among repolarization markers, a prolonged QTc interval, which reflects delayed cellular repolarization, has been associated with an increased risk of VT/VF and SCD in BrS. 21 Other significant ECG predictors include an early repolarization pattern 22 and a low-voltage type 1 ECG. 23 AF 24 and syncope 25,26 are observed in patients with BrS and are predictive of spontaneous VT/VF, [27][28][29] as also seen in our study.
In this study, we used NMF to extract latent variables and used this information for risk stratification. This improved the AUC over the benchmark logistic regression model in both the training and the external validation cohorts. Only a few groups have applied NMF to ECG signal analysis. First, Yao et al reported that sparsity-constrained NMF improved the classification of ECGs showing right bundle branch block, left bundle branch block, premature ventricular contractions, or paced rhythms, and of ECGs without these abnormalities, using data from the Massachusetts Institute of Technology-Beth Israel Hospital database. 30 Second, Guyot et al used NMF to preprocess long-term ECGs and showed that it simultaneously performed 3 tasks: denoising, baseline wander removal, and R-peak detection. 31 To our knowledge, this is the first application of NMF in patients with BrS, demonstrating that it can improve the accuracy of electrocardiographic depolarization and repolarization indices in classifying patients into spontaneous VT/VF and nonarrhythmic groups.

Limitations
Several limitations of this study should be recognized. First, this is a retrospective study and is therefore susceptible to the biases inherent in this type of study. Second, the electrocardiographic indices were measured at baseline and do not reflect the full dynamic nature of the electrophysiological changes in BrS. Risk stratification may be improved further with markers such as restitution indices or markers measured during stress [32][33][34] or by assessing the temporal variability of different ECG markers. 35 Finally, the sample size was moderate (376 patients in total: 149 in the training cohort and 227 in the validation cohort), and our findings need to be validated in larger prospective studies.

CONCLUSIONS
The present findings suggest that nonnegative matrix factorization improves the prediction of spontaneous VT/VF in BrS by extracting latent features that capture interrelations among different variables.