Development and Validation of 3‐Year Atrial Fibrillation Prediction Models Using Electronic Health Record With or Without Standardized Electrocardiogram Diagnosis and a Performance Comparison Among Models

Background Improved prediction of atrial fibrillation (AF) may allow earlier interventions to prevent stroke and to reduce mortality and morbidity from other AF-related complications. We developed a clinically feasible and accurate AF prediction model using electronic health records and computerized ECG interpretation.
Methods and Results A total of 671 318 patients were screened from 3 tertiary hospitals. After careful exclusion of cases with missing values and a prior AF diagnosis, AF prediction models were developed from the derivation cohort of 25 584 patients without AF at baseline. In the internal/external validation cohort of 117 523 patients, the model using 6 clinical features and 5 ECG diagnoses showed the highest performance for 3-year new-onset AF prediction (C-statistic, 0.796 [95% CI, 0.785–0.806]). A more simplified model using age, sex, and 5 ECG diagnoses (atrioventricular block, fusion beats, marked sinus arrhythmia, supraventricular premature complex, and wide QRS complex) had comparable predictive power (C-statistic, 0.777 [95% CI, 0.766–0.788]). The simplified model showed a similar or better predictive performance than the previous models. In the subgroup analysis, the models performed relatively better in patients without risk factors. Specifically, the predictive power was lower in patients with heart failure or decreased renal function.
Conclusions Although the 3-year AF prediction model using both clinical and ECG variables showed the highest performance, the simplified model using age, sex, and 5 ECG diagnoses also had comparable predictive power with broad applicability for incident AF.

and there are no clinical symptoms despite a similarly poor outcome in asymptomatic and symptomatic patients. [2][3][4] A previous meta-analysis demonstrated that the overall AF detection rate was 11.5% after an ischemic stroke or a transient ischemic attack. 5 If AF can be diagnosed or predicted earlier, the incidence and serious consequences of strokes can be substantially reduced by appropriate AF management. 6 Clinical risk factors that are important in AF development have been introduced. Several novel biomarkers have also been reported to show comparable accuracy for predicting AF. 7 However, these biomarkers are not yet widely applied in real-world practice because of their limited availability and cost. The pathophysiological characteristics of AF include complex and heterogeneous mechanisms, which makes it difficult to develop simple and clinically available AF prediction estimates that can be easily applied in real-world practice with sufficient accuracy.
The introduction of electronic health records (EHRs) has made it easier to assemble large clinical data sets, and studies using these data have been actively conducted. Hulme et al proposed an AF prediction model using EHR data. 8 They developed their EHR-AF risk score based on 16 selected variables; its C-statistic was 0.76. An external validation study that also evaluated other AF prediction models reported C-indexes of 0.80 for EHR-AF, 0.80 for the Cohorts for Heart and Aging Research in Genomic Epidemiology Model for Atrial Fibrillation (CHARGE-AF), 0.68 for C2HEST (coronary artery disease or chronic obstructive pulmonary disease; hypertension; elderly; systolic heart failure [2 points]; and thyroid disease [hyperthyroidism; 1 point]), and 0.72 for CHA2DS2-VASc (congestive heart failure, hypertension, age of ≥75 years, diabetes, stroke or transient ischemic attack, vascular disease, age of 65 to 74 years, and sex category). 9 These models predict the occurrence of AF using only simple information that can be obtained through questionnaires or physical measurements, such as disease history, smoking status, and blood pressure. Considering that EHRs include the results of various tests performed in hospitals, the predictive power for AF occurrence can be expected to improve when this information is used.
The 12-lead ECG is a basic test used to evaluate the electrophysiological state of the heart. Modern ECG machines provide computerized ECG diagnoses comparable to the physicians' interpretations. 10,11 Automated ECG interpretation is cost- and time-effective, with minimized intraobserver and interobserver variability. The present study aimed to apply computerized standard ECG diagnosis to the development of an AF prediction model and to validate its performance.

METHODS
All data and supporting materials have been provided with the published article. Study patients were identified from the EHRs of the 3 tertiary hospitals (Korea University Anam Hospital for model derivation and internal validation, n=397 905; and Korea University Guro/Ansan Hospital for external validation, n=133 813/139 600, respectively). EHR data from a single hospital were used for the development and internal validation of the AF prediction algorithm. External validation was performed using data from 2 other hospitals located in different districts and cities. The study protocol was approved by the institutional review boards of each institute. Written informed consent was waived because of the retrospective study design of anonymized data, with minimal risk to the patients. The study complied with the principles of the Declaration of Helsinki.
For the development and internal validation, 397 905 patients who underwent ECG recordings between January 1, 2014, and December 31, 2017, were screened (Figure 1). Baseline covariates were

CLINICAL PERSPECTIVE
What Is New?
• For improved atrial fibrillation prediction, it is important to balance simplicity and applicability when combining ECG, biomarkers, and clinical risk factors; possible combinations were derived, and their performance was compared.
• With age, sex, and 5 automated ECG diagnoses, a simplified 3-year atrial fibrillation prediction model showed a comparable performance, especially in patients without closely associated comorbidities, such as heart failure and stroke.
What Are the Clinical Implications?
• Depending on the clinical situation, either the simplified model or the full model can predict 3-year atrial fibrillation with improved performance.

Ascertainment of Clinical Characteristics
The clinical characteristics of patients were extracted from the diagnosis code, clinical diagnosis name, medication and prescription history, outpatient charts, hospital discharge records, and examination records in the EHR. The detection of AF during the follow-up period was based on diagnosis codes and ECG reports. The validated algorithm for AF detection was adopted from a previous study. 12 AF was detected when (1) AF was documented on ECG, AF/flutter ablation, or cardioversion, and (2) ≥2 hospital visits were recorded with the International Classification of Diseases, Ninth Revision (ICD-9), diagnosis codes for AF until censored by death or the last follow-up. The positive predictive value of this algorithm for AF was 92%.
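The two-part detection rule described above can be expressed as a small predicate. The following is a minimal sketch only: the `PatientRecord` class and its field names are hypothetical and are not taken from the validated algorithm, and the rule is combined with "and" exactly as worded in the text.

```python
from dataclasses import dataclass


@dataclass
class PatientRecord:
    """Hypothetical per-patient follow-up summary (field names are assumptions)."""
    af_on_ecg: bool                     # AF documented on any follow-up ECG
    af_ablation_or_cardioversion: bool  # AF/flutter ablation or cardioversion recorded
    visits_with_af_code: int            # hospital visits carrying an AF diagnosis code


def has_incident_af(rec: PatientRecord) -> bool:
    # Criterion 1: AF documented on ECG, or by ablation or cardioversion.
    documented = rec.af_on_ecg or rec.af_ablation_or_cardioversion
    # Criterion 2: at least 2 hospital visits recorded with an AF diagnosis code.
    coded = rec.visits_with_af_code >= 2
    return documented and coded
```

Censoring at death or last follow-up is assumed to have been applied when the record was summarized, so it does not appear in the predicate.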
Potential clinical risk factors were selected on the basis of previous studies. 8,13,14 These included age, sex, smoking status, alcohol consumption, height, body weight, systolic and diastolic blood pressure, underlying comorbidities (hypertension, diabetes, dyslipidemia, chronic kidney disease [CKD], thyroid disease, heart failure, valvular heart disease, coronary artery disease, stroke, peripheral arterial disease, and chronic obstructive pulmonary disease), medications (antihypertensive medication and insulin), and laboratory findings (glucose, hemoglobin A1c, total cholesterol, low-density lipoprotein cholesterol, high-density lipoprotein cholesterol, and creatinine). For each variable, the value recorded closest to the index date within the preceding year was used.
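Selecting the measurement closest to the index date within a 1-year lookback window can be sketched as follows. This is an illustration under stated assumptions: the function name, the 365-day window length, and the inclusion of measurements taken on the index date itself are choices made for the example and are not specified in the text.

```python
from datetime import date, timedelta
from typing import Optional, Sequence, Tuple


def baseline_value(
    measurements: Sequence[Tuple[date, float]],
    index_date: date,
    lookback_days: int = 365,  # assumed length of the "1 year" window
) -> Optional[float]:
    """Return the value measured closest to (and not after) the index date."""
    window_start = index_date - timedelta(days=lookback_days)
    eligible = [(d, v) for d, v in measurements if window_start <= d <= index_date]
    if not eligible:
        return None  # no measurement in the window: treated as missing
    # The latest eligible date is the one closest to the index date.
    return max(eligible, key=lambda dv: dv[0])[1]
```

A patient with no measurement in the window yields `None`, which corresponds to a missing value in this sketch.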
Clinical risk factors were defined by combining diagnostic codes (the International Classification of Diseases, Tenth Revision [ICD-10], codes) and laboratory test results. Hypertension was defined when patients were diagnosed with I10 to I15 of the ICD-10 codes or treated with antihypertensive drugs. Diabetes was defined when patients had been diagnosed with E10 to E14 of the ICD-10 codes, were treated with oral hypoglycemic drugs or insulin, or had a fasting plasma glucose level ≥126 mg/dL or hemoglobin A1c ≥6.5%. Patients with dyslipidemia were defined as those diagnosed with E78.0.6, E78.8, E78.9, E88.8, or E88.9 of the ICD-10 codes, treated with any lipid-lowering agent, or with a total cholesterol level ≥240 mg/dL, low-density lipoprotein level ≥160 mg/dL, high-density lipoprotein level <40 mg/dL, or triglyceride level ≥200 mg/dL. Patients who had been diagnosed
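The hypertension and diabetes definitions given above combine code ranges, treatment, and laboratory thresholds, and can be sketched as simple rules. The function and parameter names below are hypothetical, and matching on the first 3 characters of an ICD-10 code is an assumption about how the ranges were applied.

```python
from typing import Iterable, Optional


def in_icd10_range(code: str, letter: str, low: int, high: int) -> bool:
    """True if a code such as 'I10' or 'E11.9' falls within letter+low..letter+high."""
    code = code.strip().upper()
    if len(code) < 3 or code[0] != letter or not code[1:3].isdigit():
        return False
    return low <= int(code[1:3]) <= high


def has_hypertension(icd10_codes: Iterable[str], on_antihypertensive: bool) -> bool:
    # I10 to I15, or treatment with antihypertensive drugs.
    return on_antihypertensive or any(in_icd10_range(c, "I", 10, 15) for c in icd10_codes)


def has_diabetes(
    icd10_codes: Iterable[str],
    on_glucose_lowering_drug: bool,
    fasting_glucose_mg_dl: Optional[float] = None,
    hba1c_percent: Optional[float] = None,
) -> bool:
    # E10 to E14, or oral hypoglycemic drugs/insulin, or a laboratory threshold.
    if on_glucose_lowering_drug or any(in_icd10_range(c, "E", 10, 14) for c in icd10_codes):
        return True
    if fasting_glucose_mg_dl is not None and fasting_glucose_mg_dl >= 126:
        return True
    return hba1c_percent is not None and hba1c_percent >= 6.5
```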

Standardization of Computerized ECG Diagnosis
The hospitals participating in this study used 12-lead ECG machines from 3 vendors (GE Medical System, Philips Medical Systems, and Nihon Kohden). ECG machines automatically generated ECG diagnoses and ancillary descriptions using the approved computerized algorithm of each vendor.
For example, atrioventricular block was defined in cases with variable atrioventricular blocks, including second-degree atrioventricular block, complete heart block, or atrioventricular dissociation. Fusion beats were ectopically shaped QRS complexes within 100 ms of the expected RR interval. Marked sinus arrhythmia was defined as a range of RR intervals exceeding 40% of the average RR interval. The supraventricular premature complex was a premature, normally shaped QRS complex without the preceding P waves. A wide QRS complex indicated a wide QRS rhythm (QRS duration >120 ms and ventricular rate between 40 and 120 beats per minute). The ECG findings of the ECG diagnosis and ancillary descriptions were in free-text format and placed in the "statement" section of the original XML files. These free texts were transformed into the terminology of the systematized nomenclature of medicine-clinical terms (SNOMED CT) and its cross-referenced terminology of the observational medical outcomes partnership-common data model (OMOP-CDM).
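Two of the numeric criteria quoted above can be written as small checks. This is a simplified sketch and not a vendor algorithm: the function names are hypothetical, the additional conditions that a commercial interpreter applies (for example, confirming sinus rhythm first) are omitted, and treating the rate bounds as inclusive is an assumption.

```python
from typing import Sequence


def is_marked_sinus_arrhythmia(rr_intervals_ms: Sequence[float]) -> bool:
    """Range of RR intervals exceeding 40% of the average RR interval."""
    mean_rr = sum(rr_intervals_ms) / len(rr_intervals_ms)
    rr_range = max(rr_intervals_ms) - min(rr_intervals_ms)
    return rr_range > 0.40 * mean_rr


def is_wide_qrs_rhythm(qrs_duration_ms: float, ventricular_rate_bpm: float) -> bool:
    """QRS duration >120 ms with a ventricular rate between 40 and 120 beats per minute."""
    return qrs_duration_ms > 120 and 40 <= ventricular_rate_bpm <= 120
```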
SNOMED CT is a systematic, international, standardized medical terminology system that is used to effectively record and communicate clinical data in EHRs to improve patient care in major countries, such as the United States, Canada, and the United Kingdom. 15 OMOP provides a common data schema with its own coding system, adopting international standard terminologies such as SNOMED CT. Because the CDM includes various international standard terminologies, it could enhance international big data analysis research through multicenter distributed network research and various clinical studies. 16 Practically, standard terms and code mapping for ECG reports were performed using web-based software, which is an integrated algorithm using cosine similarity and rule-based hierarchy (available at cdal.korea.ac.kr/ECG2CDM). This software is optimized for mapping ECG reports from the 3 vendors (GE, Philips, and Nihon Kohden) into standard terms and codes of the OMOP-CDM. The overall accuracy of the software was >99% for all 3 ECG machine vendors. OMOP-CDM terms and codes can also be easily converted to SNOMED CT codes and terms using the CONCEPT table at http://athena.ohdsi.org. For example, the OMOP-CDM concept name "ECG normal" (concept identifier: 4065279) originated from the SNOMED CT name "electrocardiogram normal (finding)" (SNOMED code 164854000). Both the OMOP-CDM concept identifier "4065279" and SNOMED code 164854000 define "normal ECG." Finally, the ECG database of the present study included 130 ECG diagnoses.
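To illustrate the cosine-similarity part of such a mapping, the sketch below matches a free-text ECG statement to the closest entry in a list of standard terms using bag-of-words vectors. It is not the ECG2CDM implementation: the tokenization, the function names, and the candidate list are assumptions, and the rule-based hierarchy mentioned above is not modeled.

```python
import math
import re
from collections import Counter
from typing import Sequence


def _token_counts(text: str) -> Counter:
    # Lowercase bag-of-words representation of a statement.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: str, b: str) -> float:
    va, vb = _token_counts(a), _token_counts(b)
    dot = sum(va[t] * vb[t] for t in va)
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def map_statement(statement: str, standard_terms: Sequence[str]) -> str:
    """Return the standard term most similar to the vendor free-text statement."""
    return max(standard_terms, key=lambda term: cosine_similarity(statement, term))


# Illustrative candidate list; only "ECG normal" is a concept name quoted in the text.
terms = ["ECG normal", "atrioventricular block", "wide QRS complex"]
best = map_statement("Normal ECG", terms)
```

Word order does not affect a bag-of-words vector, so "Normal ECG" and "ECG normal" are treated as identical, which is one reason this kind of similarity suits vendor-specific phrasing.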

Statistical Analysis
A Cox proportional hazards regression model was used to develop the 3-year AF prediction model. Clinical risk factors for multivariable analysis were selected when the P value of the univariate analysis was <0.1. ECG diagnoses were selected when their prevalence was >0.02% and the P value of the univariate analysis was <0.1. With the selected ECG features and 10 clinical features (age, sex, hypertension, diabetes, dyslipidemia, heart failure, valvular heart disease, coronary artery disease, stroke, and CKD), multivariable Cox proportional hazards regression models were fitted with a backward elimination approach, retaining variables that satisfied a significance level of 0.05. Several multivariable Cox proportional hazards regression models were developed. The C-statistic and net reclassification index were used to estimate the performance of each model.
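The screening step for ECG diagnoses (prevalence >0.02% and univariate P<0.1) amounts to a simple filter. The sketch below assumes the univariate P values have already been computed elsewhere; the function name, the input layout, and the example counts are hypothetical.

```python
from typing import Dict, List, Tuple


def select_ecg_candidates(
    stats: Dict[str, Tuple[int, float]],  # diagnosis -> (patient count, univariate P value)
    n_patients: int,
    min_prevalence: float = 0.0002,  # 0.02%
    max_p: float = 0.1,
) -> List[str]:
    """Keep diagnoses that are frequent enough and pass the univariate threshold."""
    return [
        name
        for name, (count, p_value) in stats.items()
        if count / n_patients > min_prevalence and p_value < max_p
    ]


# Made-up counts and P values, for illustration only.
example = {"diagnosis_a": (30, 0.01), "diagnosis_b": (1, 0.001), "diagnosis_c": (50, 0.4)}
selected = select_ecg_candidates(example, n_patients=10_000)
```

The retained candidates would then enter the multivariable model with backward elimination, which is not shown here.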
The models were compared with previously published models. These included the following: (1) CHARGE-AF included 11 variables (age [ 14 ; and (3) EHR-AF included 16 variables (age, sex, race, smoking, height, weight, hypertension, diastolic blood pressure, dyslipidemia, CKD, thyroid disease, coronary artery disease, vascular disease, transient ischemic attack, heart failure, and valvular heart disease). 8 Discrimination of the models was evaluated using C-statistic for time-to-event data. Net reclassification index analysis was used to estimate performance of each model.
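For readers unfamiliar with the C-statistic for time-to-event data, the following is a minimal sketch of Harrell's concordance index. It is an illustration rather than the estimator used in the study's SAS and R analyses: tied event times are simply skipped, and the function name is an assumption.

```python
from typing import Sequence


def harrell_c(times: Sequence[float], events: Sequence[bool], risk: Sequence[float]) -> float:
    """Fraction of comparable pairs in which the earlier event has the higher predicted risk."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is comparable only if the earlier time is an observed event
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of patients by predicted risk.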
The cumulative new-onset AF incidence was demonstrated using Kaplan-Meier survival curves with a log-rank test. In this analysis, patients in the validation cohorts were divided into 3 groups of 3-year new-onset AF risks (<1%, 1%-3%, and >3%), which were calculated using the models developed in this study.
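Assigning patients to the 3 risk groups is a threshold lookup on the predicted 3-year risk. In this sketch the group labels are hypothetical, and placing risks of exactly 1% and exactly 3% in the intermediate group is an assumption drawn from the "1%-3%" wording.

```python
def risk_group(predicted_3yr_risk: float) -> str:
    """Map a predicted 3-year AF risk (as a proportion) to a risk group."""
    if predicted_3yr_risk < 0.01:
        return "low"
    if predicted_3yr_risk <= 0.03:
        return "intermediate"
    return "high"
```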
Comparisons between the groups were performed using the independent Student t-test or Mann-Whitney test for continuous variables and the χ 2 test or Fisher exact test for categorical variables. All tests were 2 tailed, and P<0.05 was considered statistically significant. All statistical analyses were performed using SAS version 9.4 (SAS Institute) and R version 3.6.1, with the rms, survminer, and survival packages.

Baseline Characteristics of the Derivation, Internal Validation, and External Validation Cohorts
The baseline characteristics of the patients are presented in Table 1. Despite the different baseline characteristics of the external validation cohort, caused by different data sources, there were no significant differences in terms of AF incidence (1.1% in the internal validation cohort versus 1.2% in the external validation cohorts 1 and 2).

Development of 3-Year New-Onset AF Prediction Models
Clinical risk factors and ECG diagnoses were screened in the derivation cohort using univariate Cox regression analysis (Table S1). Four types of multivariable AF prediction models were developed. The predicted probability of AF by time t takes the standard form for a Cox model:

$$\hat{P}(t) = 1 - S_0(t)^{\exp\left(\sum_{i=1}^{k} \beta_i X_i - \sum_{i=1}^{k} \beta_i \bar{X}_i\right)}$$

where $S_0(t)$ denotes the baseline survival rate at time t, $\beta_i$ denotes the regression coefficient for each predictor, $X_i$ denotes the value of each predictor, $\bar{X}_i$ denotes the mean value of each predictor, and k denotes the number of risk factors. Hence, using the coefficients from the model, we computed the probability of developing AF within 3 years as below. For example, the 3-year AF prediction formula for model 2 (simplified model with ECG diagnosis) is:
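The general computation described by these definitions (baseline survival, coefficients, predictor values, and predictor means) can be sketched in a few lines. This is the standard Cox risk calculation and not the published model 2 formula: the baseline survival, coefficients, and means in the usage example are placeholder numbers chosen for illustration and must not be read as study results.

```python
import math
from typing import Dict


def cox_risk(
    baseline_survival: float,  # S0(t) at the prediction horizon
    coefficients: Dict[str, float],
    values: Dict[str, float],
    means: Dict[str, float],
) -> float:
    """Risk = 1 - S0(t) ** exp(sum(beta_i * (x_i - mean_i)))."""
    linear_predictor = sum(
        beta * (values[name] - means[name]) for name, beta in coefficients.items()
    )
    return 1.0 - baseline_survival ** math.exp(linear_predictor)


# Placeholder inputs (hypothetical, for illustration only).
betas = {"age": 0.05, "male": 0.30}
avg = {"age": 55.0, "male": 0.5}
risk_at_mean = cox_risk(0.99, betas, avg, avg)
risk_older_man = cox_risk(0.99, betas, {"age": 75.0, "male": 1.0}, avg)
```

A patient whose predictors all equal the cohort means has a linear predictor of 0, so the predicted risk reduces to 1 minus the baseline survival.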

Validation of 3-Year New-Onset AF Prediction Models
Model 1 (ECG diagnosis model) showed the lowest performance, and model 3 (full model with ECG diagnosis) showed the highest performance, in the derivation cohort (Table 3 and Figure 2). This suggests that ECG diagnosis alone is insufficient to predict AF incidence. Model 2 (simplified model with ECG diagnosis) (C-statistic, 0.777 [95% CI, 0.766-0.788]) showed results comparable to those of model 3 (C-statistic, 0.796 [95% CI, 0.785-0.806]). Model 4 (full model without ECG diagnosis) (C-statistic, 0.793 [95% CI, 0.783-0.804]) also showed results comparable to those of model 3. The proposed models showed a similar prediction performance in the internal and 2 external validation cohorts compared with the derivation cohort. In addition, model 2 and model 3 showed higher net reclassification index values compared with model 1 (Table 4). More importantly, model 2 showed reclassification of AF prediction comparable to that of model 3. Only 2.4% of patients were better reclassified when the other clinical information was added to model 2.
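As background for the net reclassification index, the sketch below computes a categorical version from risk categories assigned by an old and a new model. Which variant the study used is not stated in this section, so this is an assumption; categories are encoded as integers (for example, 0 = low, 1 = intermediate, 2 = high), and the function name is hypothetical.

```python
from typing import Sequence


def net_reclassification_index(
    old_category: Sequence[int], new_category: Sequence[int], events: Sequence[bool]
) -> float:
    """NRI = [P(up|event) - P(down|event)] + [P(down|nonevent) - P(up|nonevent)]."""
    up_e = down_e = n_e = 0
    up_n = down_n = n_n = 0
    for old, new, event in zip(old_category, new_category, events):
        if event:
            n_e += 1
            up_e += new > old
            down_e += new < old
        else:
            n_n += 1
            up_n += new > old
            down_n += new < old
    if n_e == 0 or n_n == 0:
        raise ValueError("need at least one event and one nonevent")
    return (up_e - down_e) / n_e + (down_n - up_n) / n_n
```

Moving patients with events upward and patients without events downward both increase the index.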
The receiver operating characteristic curve showed that the area under the curve (AUC) of model 3 was the highest at 0.8 (Figure 2), followed by model 2 (AUC, 0.73), model 4 (AUC, 0.68), and model 1 (AUC, 0.47). The AUC values for each model were statistically significant. The calibration plots are shown in Figure 3. The calibration of the proposed models for AF incidence prediction was good: the black line mostly coincided with the gray line, which represents perfect calibration.

Robustness of the 3-Year New-Onset AF Prediction Models
To evaluate the robustness of the proposed models, a subgroup analysis was performed for model 2 (simplified model with ECG diagnosis), model 3 (full model with ECG diagnosis), and model 4 (full model without ECG diagnosis) in the validation cohort (Table 5). Overall, the models developed in this study showed better predictive power in young, female patients and in the group without AF risk factors, such as hypertension and diabetes. The group with the highest C-statistic was the group without hypertension. The group with the lowest C-statistic among the clinical risk factor groups was the patient group with heart failure in model 2, and the group with an estimated glomerular filtration rate of <30 mL/min per 1.73 m2 in models 3 and 4. These results suggest that separate AF prediction models for patients with CKD and heart failure may need to be developed.
Next, we divided the patients into 3 groups according to the estimated AF risk derived from the proposed models (low-risk group, <1%; intermediate-risk group, 1%-3%; and high-risk group, >3%). Kaplan-Meier curves showed time-dependent linear and increased AF incidence across the 3 groups in both model 2 and model 3 (Figure 4). In model 3, the cumulative 3-year AF incidence rate was 0.51%, 2.72%, and 6.12% in the low-risk, intermediate-risk, and high-risk groups, respectively. In model 2, the cumulative 3-year AF incidence rate was 0.52%, 2.71%, and 5.13% in the low-risk, intermediate-risk, and high-risk groups, respectively. In model 3, compared with the low-risk group, the intermediate-risk group had a 5.39-fold increased hazard for AF incidence (95% CI, 4.78-6.07; P<0.001) and the high-risk group had a 12.36-fold increased hazard for AF incidence (95% CI, 10.62-14.38; P<0.001). In model 2, compared with the low-risk group, the intermediate-risk group had a 5.22-fold increased hazard for AF incidence (95% CI, 4.65-5.87; P<0.001), and the high-risk group had a 10.01-fold increased hazard for AF incidence (95% CI, 8.29-12.07; P<0.001).
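The cumulative incidence read from a Kaplan-Meier curve at a fixed horizon can be computed with the product-limit estimator. The sketch below is a plain-Python illustration with made-up follow-up data; it is not the survival/survminer code used in the analysis, and the function name is an assumption.

```python
from typing import Sequence


def km_cumulative_incidence(
    times: Sequence[float], events: Sequence[bool], horizon: float
) -> float:
    """1 minus the Kaplan-Meier survival estimate at the given horizon."""
    survival = 1.0
    event_times = sorted({t for t, e in zip(times, events) if e and t <= horizon})
    for t in event_times:
        at_risk = sum(1 for x in times if x >= t)  # still under observation at t
        n_events = sum(1 for x, e in zip(times, events) if e and x == t)
        survival *= 1.0 - n_events / at_risk
    return 1.0 - survival


# Made-up follow-up times (years) and event indicators for 4 patients.
incidence = km_cumulative_incidence([1, 2, 3, 3], [True, False, True, False], horizon=3)
```

Applying the function separately to the low-risk, intermediate-risk, and high-risk groups would give one cumulative incidence per group at the 3-year horizon.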

Comparison of the Proposed Models With the Previously Published Models (EHR-AF, CHARGE-AF, and C2HEST)
The AF predictive power of the models in the present study and the previously developed models, such as EHR-AF, CHARGE-AF, and C2HEST, was compared (Figure 5). We adopted all the predictors in the previously developed models. The C-statistic was highest in

DISCUSSION
In the present study, we developed clinically relevant models for 3-year new-onset AF prediction. Because AF diagnosis was confirmed by ECG, we initially aimed to predict AF using only ECG data (model 1). With a systematic statistical approach, the following 5 ECG diagnoses were chosen: atrioventricular block, fusion beats, sinus arrhythmia (marked), supraventricular premature complex, and wide QRS complex. By adding minimal demographic variables (age and sex) to the 5 ECG diagnoses, model 2 (simplified model with ECG diagnosis) showed improved C-statistics (0.785 in the derivation cohort, and 0.777 in the total validation cohort). Its prediction efficacy was comparable to that of the full model with ECG diagnosis (model 3) or without ECG diagnosis (model 4), which included 6 clinical variables (age, male sex, CKD, heart failure, valvular heart disease, and previous stroke).
More accurate new-onset AF prediction is possible with more data and clinical information. Nevertheless, a simplification of the model would be helpful, considering that there are practical limitations in obtaining a wide variety of accurate clinical information in clinical practice. Recently, attempts have been made to apply this clinical information more easily using EHRs. For this, the format and data of the EHRs should be standardized and managed consistently. In this study, an ECG diagnosis-focused model was presented by using the automated readout of an ECG machine, which was transformed into a standardized ECG diagnosis.
in the OMOP-CDM database, which was transformed from EHR (AUC, 0.800). 18 More recently, Grout et al developed an AF prediction model using 10 variables from the EHR (C-statistic, 0.81). 19 All the prior EHR-driven AF prediction models used clinical variables that mainly originated from diagnosis codes or simple body measurements. The models of the present study adopted the detailed information of 130 ECG diagnoses that were generated and transformed from the computerized ECG machine. In the past, ECG was usually performed in hospitals, but the recent development and wide use of personal ECG devices markedly improved the accessibility to ECG information. The previous CHARGE-AF model showed the potential of ECG information to improve AF prediction in its augmented model, which added only 2 ECG-related variables (left ventricular hypertrophy and PR interval). 13 The present study proposes 5 ECG diagnoses carefully extracted from 130 ECG diagnoses as important predictors of new-onset AF incidence.
The AF development process might be summarized as abnormal atrial substrates and abnormal triggers through diverse pathways. 20 Before clinical AF diagnosis, electrical abnormalities associated with these processes might be found at various distant points, such as heart failure and valvular heart diseases. It can be inferred that the ECG findings predicting AF imply electrical or structural cardiac abnormalities that contribute to AF development. The proposed ECG findings for AF prediction in the present study included atrial abnormal triggers (fusion beats, supraventricular premature complex, and wide QRS rhythm) as well as abnormal substrate (marked sinus arrhythmia and atrioventricular block).
In the present study, a model using ECG diagnosis (model 2) along with minimal demographic information of age and sex (men) showed an AF prediction potential comparable to those of the prior predictive models using >10 clinical variables that can be obtained through a physician's questionnaire. Thus, model 2 using ECG information could have a crucial advantage in real-world applications. ECG information can be easily obtained without extra cost or effort through a computerized interpretation system. Although many clinicians agree that AF risk prediction is important, they are often skeptical of gathering information for prediction. This is probably because they might not have found the results of the previous conventional AF prediction models, which adopted clinical variables, to be significantly different from what they already knew empirically. Finally, the ECG-driven AF prediction model (model 2) can be applied to health checkups for the general population. Moreover, its performance was better in subjects without AF risk factors than in those with AF risk factors.

Figure 5. Comparison of the proposed models with EHR-AF, CHARGE-AF, and C2HEST (coronary artery disease or chronic obstructive pulmonary disease, hypertension, elderly, systolic heart failure, and thyroid disease) in the total validation cohort population. C-statistics of EHR-AF, CHARGE-AF, and C2HEST were calculated from the total validation cohort. The original coefficients of the selected variables in EHR-AF, CHARGE-AF, and C2HEST were applied.

LIMITATIONS
There are several limitations to the present study. First, in our data, ≈87% of the eligible candidates had missing values, so we excluded them. These excluded candidates were relatively younger and healthier than the candidates without missing values (Table S2). There is a potential risk of selection bias. Second, this was a retrospective study. Patients with AF at the time of study enrollment were not screened by the Holter test but only by 12-lead ECG. Thus, patients with subclinical AF were not completely excluded at the time of enrollment. However, AF occurrence showed a linear relationship (Figure 4), suggesting that subclinical AF was sufficiently excluded. Third, only 12-lead ECG data were used in the present study. Holter test results, which contain more ECG data, were not used. If Holter monitors or wearable ECG devices were used for new-onset AF modeling, an algorithm with better predictive power could be developed. Finally, the present study adopted an ECG diagnosis transformed from a computerized ECG interpretation system. It did not use the original raw ECG waveform data; if the models had been built on the original waveform data, the results might have been different or even improved.

CONCLUSIONS
The present study developed predictive models for 3-year AF occurrence using EHR data, including ECG diagnosis. Although the model that included many variables had the best predictive power, the model that simply added age and sex to the ECG diagnosis also showed a comparable predictive power with broad applicability for incident AF.