Development and Validation of Prediction Models for Severe Complications After Acute Ischemic Stroke: A Study Based on the Stroke Registry of Northwestern Germany

Background The treatment of stroke has been undergoing rapid changes. As treatment options progress, identifying patients at risk of complications becomes increasingly important. Available models, however, have frequently been built on data that are no longer representative of today's care, in particular with respect to acute stroke management. Our aim was to build and validate prediction models for 4 clinically important, severe outcomes after stroke. Methods and Results We used German registry data from 152 710 patients with acute ischemic stroke obtained in 2016 (development) and 2017 (validation). We considered potential predictors that were available at admission and focused on in‐hospital mortality, intracranial mass effect, secondary intracerebral hemorrhage, and deep vein thrombosis as outcomes. Prediction and calibration performance in the validation cohort were assessed for the following 4 statistical approaches: logistic regression with backward selection, l1‐regularized logistic regression, k‐nearest neighbor, and gradient boosting classifier. In‐hospital mortality and intracranial mass effects could be predicted with high accuracy (both areas under the curve, 0.90 [95% CI, 0.90–0.90]), whereas the areas under the curve for intracerebral hemorrhage (0.80 [95% CI, 0.80–0.80]) and deep vein thrombosis (0.73 [95% CI, 0.73–0.73]) were considerably lower. Stroke severity was the overall most important predictor. Models based on gradient boosting achieved better performance than those based on logistic regression for all outcomes; however, area under the curve estimates differed by a maximum of 0.02. Conclusions We validated prediction models for 4 severe outcomes after acute ischemic stroke based on routinely collected, recent clinical data. Model performance was superior to previously proposed approaches. These predictions may help to identify patients at risk early after stroke and thus facilitate an individualized level of care.


Data S2. Complete case analyses
We performed complete case analyses and thus excluded subjects with missing information for any of the considered variables. This led to the exclusion of 1,270 out of 76,019 patients for the year 2016 (1.7%) and of 5,378 out of 76,691 patients for the year 2017 (7.0%). The higher exclusion rate in 2017 (7.0% vs. 1.7%) was primarily due to an increased frequency of missing information on "Swallowing impairments at admission" (3.0% in 2017 vs. 0.4% in 2016), "Speech impairments at admission" (2.1% vs. 0.2%), "Language impairments at admission" (1.4% vs. 0.1%), and the comorbidities hypercholesterolemia (1.8% vs. 0.4%) and prior myocardial infarction (1.7% vs. 0.2%). This increase between 2016 and 2017 may be explained by the fact that these items could be recorded as "present/absent/not possible to determine" in 2017, but only as "present/absent" in 2016.
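As an illustration, a complete case analysis of this kind amounts to dropping all rows with missing values in the considered variables. The following is a minimal sketch; the data frame and variable names are hypothetical toy examples, not taken from the registry:

```python
import pandas as pd

def complete_cases(df: pd.DataFrame, columns: list) -> pd.DataFrame:
    """Keep only rows with no missing values in the given columns."""
    return df.dropna(subset=columns)

# Toy cohort with hypothetical variable names; values are illustrative only.
# "Not possible to determine" responses are coded as missing (None/NaN).
cohort = pd.DataFrame({
    "age": [71, 84, 66, 79],
    "swallowing_impairment": [0, None, 1, 0],
    "speech_impairment": [1, 0, None, 0],
})

included = complete_cases(cohort, ["swallowing_impairment", "speech_impairment"])
excluded_fraction = 1 - len(included) / len(cohort)
```

In this toy example, two of the four rows carry a missing value and are excluded, giving an excluded fraction of 0.5; in the study the analogous fractions were 1.7% (2016) and 7.0% (2017).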
A comparison of included and excluded patients based on the complete case criterion can be found in Tables S2-S3. In both 2016 and 2017, excluded patients were slightly older, comprised more female patients, and, in 2017, had a slightly higher median Rankin Score than included patients. Included and excluded subjects in the 2016 derivation cohort, used for model training, differed only by very small absolute amounts; differences were slightly more pronounced in the 2017 validation cohort. Importantly, however, differences between included subjects in 2016 and 2017 in key characteristics (age, sex, Rankin Scale) and in the rates of the four adverse outcomes were very small in magnitude. This latter aspect is reassuring with respect to the suitability of the 2017 cohort as validation cohort.

Downsampling
The downsampling step was motivated by pronounced class imbalances for all four of our adverse outcomes (rates: 5% mortality, 1.7% ICP, 1.8% ICH and 0.4% DVT). Downsampling the majority class, as implemented in our study, has been shown to improve classifier performance on minority class cases, i.e., those patients that are usually of interest.4
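A minimal sketch of such a downsampling step: all minority class cases are kept, and the majority class is randomly subsampled. The 1:1 target ratio and toy data below are illustrative assumptions, not the registry implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def downsample_majority(X, y, ratio=1.0):
    """Keep all minority (y == 1) cases; randomly subsample the majority
    (y == 0) class to `ratio` times the minority class size."""
    minority_idx = np.flatnonzero(y == 1)
    majority_idx = np.flatnonzero(y == 0)
    n_keep = int(ratio * len(minority_idx))
    keep_majority = rng.choice(majority_idx, size=n_keep, replace=False)
    keep = np.concatenate([minority_idx, keep_majority])
    rng.shuffle(keep)  # avoid ordering all positives before all negatives
    return X[keep], y[keep]

# Imbalanced toy data: ~2% positives, mimicking outcome rates in the 0.4-5% range.
X = rng.normal(size=(1000, 3))
y = (rng.random(1000) < 0.02).astype(int)
X_bal, y_bal = downsample_majority(X, y)
```

With `ratio=1.0` the resulting training set is balanced: it contains every positive case and an equally sized random draw of negatives.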

Variable Normalization
As some of the approaches used, i.e., the l1-regularized logistic regression and the k-nearest neighbor classifier, are sensitive to variable scales, we normalized input variables by subtracting the sample mean and dividing by the sample standard deviation computed in the training set, and applied the same normalization to the test set.
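The key point is that the normalization statistics are computed on the training set only and then reused for the test set, so that no test-set information leaks into model fitting. A sketch with toy data (the zero-variance guard is our own illustrative addition):

```python
import numpy as np

def fit_normalizer(X_train):
    """Compute per-feature mean and standard deviation on the training set."""
    mu = X_train.mean(axis=0)
    sigma = X_train.std(axis=0)
    sigma[sigma == 0] = 1.0  # guard against constant features
    return mu, sigma

def apply_normalizer(X, mu, sigma):
    """Standardize X using training-set statistics (also applied to test data)."""
    return (X - mu) / sigma

rng = np.random.default_rng(42)
X_train = rng.normal(loc=5.0, scale=2.0, size=(200, 4))
X_test = rng.normal(loc=5.0, scale=2.0, size=(50, 4))

mu, sigma = fit_normalizer(X_train)
X_train_n = apply_normalizer(X_train, mu, sigma)
X_test_n = apply_normalizer(X_test, mu, sigma)  # same mu/sigma as for training
```

After this step the training features have mean 0 and standard deviation 1, while the test features are close to, but not exactly, standardized.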

Feature importance
The feature importance computed for the gradient boosting classifier quantifies how much each variable improves the prediction performance, based on the total decrease in Gini impurity (i.e., increase in node purity) attributable to splits on that variable. The Gini index itself describes the total variance across the classes; a low value indicates that a node contains predominantly observations from a single class.47
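The quantities described above can be illustrated directly. The toy splits below are our own examples; summing such per-split gains over all splits that use a given feature (and normalizing) is how impurity-based importances are typically derived:

```python
import numpy as np

def gini_impurity(labels):
    """Gini impurity of a node: 1 - sum_k p_k^2; 0 when the node is pure."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - float(np.sum(p ** 2))

def gini_gain(parent, left, right):
    """Decrease in Gini impurity achieved by a split, with children
    weighted by their share of the parent's observations."""
    n = len(parent)
    weighted_children = (len(left) / n) * gini_impurity(left) \
                      + (len(right) / n) * gini_impurity(right)
    return gini_impurity(parent) - weighted_children

parent = [0, 0, 1, 1]                             # maximally mixed: impurity 0.5
gain_perfect = gini_gain(parent, [0, 0], [1, 1])  # pure children -> gain 0.5
gain_useless = gini_gain(parent, [0, 1], [0, 1])  # still mixed   -> gain 0.0
```

A split that separates the classes perfectly removes all impurity, whereas a split whose children are as mixed as the parent contributes nothing to the importance of the feature it uses.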