Predicting Mortality and Functional Outcomes after Ischemic Stroke: External Validation of a Prognostic Model
1. Introduction
Accurate prediction of outcomes after stroke is important, particularly if the predictive model can be applied to patient care [1] [2] and to relaying information to patients and families [2]. Modeling is also useful if it provides insights into the mechanisms of post-stroke recovery and a basis for stratification in recovery trials that might improve post-stroke outcomes.
Prediction of functional outcomes after ischemic stroke is challenging. There is a plethora of stroke-related clinical and imaging data to consider. To date, predictive models have differed in the factors used to build them [3] - [11], and testing for external validity in independent cohorts of patients has been uncommon [12].
We previously derived predictive models for 3-month mortality and 3-month modified Rankin Score (mRS) after acute ischemic stroke using a cohort of patients from 1999 [10]. The purpose of the current study was to test the external validity of those models. Here, we used two independent datasets, from 2005 and 2010, that include comprehensive outcome measurements in well-characterized cohorts of ischemic stroke patients.
2. Methods
Source of Data:
This work was undertaken as part of the Greater Cincinnati/Northern Kentucky Stroke Study (GCNKSS), a 5-county population-based study that tracks the regional incidence of stroke and case fatality. The Institutional Review Boards at all participating institutions approved this study, and detailed methods have been described previously [13] [14] [15]. Our original models were derived using data from a cohort of 451 GCNKSS subjects with ischemic stroke occurring during calendar year 1999. Two additional cohorts were enrolled in the GCNKSS: one whose strokes occurred in 2005 and another whose strokes occurred in 2010. We followed the Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD) statement checklist [16] in analyzing and reporting this study.
Subject ascertainment and follow up:
Case identification was similar for all three study periods. As with the 1999 cohort, all residents of the GCNK study region who had an ischemic stroke and were seen at any of the 17 hospitals in the study area were eligible for enrollment in the 2005 and 2010 cohorts. The primary reason for non-enrollment was discharge before the patient could be contacted for consent. Trained study nurses abstracted the following for each case: demographics, presenting symptoms, functional status before stroke, social, family, and medical histories, medications, testing and laboratory results, and imaging studies. Stroke team physicians reviewed each abstract and all available imaging studies to verify that each case was a stroke and to classify the stroke subtype.
These cohorts were followed for three months to determine both survival and short-term functional outcome, which was assessed via an initial interview and a 3-month interview. The methodology of the initial abstraction of data and subsequent interviews was similar to that used in the derivation cohort [10] . Each consented patient or proxy underwent an initial face-to-face structured interview with a research nurse. At 3 months after stroke, research nurses telephoned patients or proxies and asked about vital status, poststroke hospitalizations and medical contacts other than simple office visits. The mRS was used to determine the functional status of each surviving patient.
The Modified Rankin Score (mRS) is a widely used scale for assessing functional outcome after stroke, and it measures the patient's degree of independence. The scale consists of 6 grades, from 0 to 5. A grade of 0 corresponds to no symptoms and 5 corresponds to severe disability. A separate category of 6 is usually added for patients who die [17].
Predictors:
The predictors considered in model derivation have been detailed previously [10]. The original mortality prediction model included age and post-stroke mRS. The original mRS prediction model included age, diabetes, severe white matter hyperintensity (WMH), pre-stroke mRS, post-stroke mRS, and NIH Stroke Scale (NIHSS) at presentation. The validated retrospective NIHSS (rNIHSS) scoring method was used to compute the NIHSS [18]. Post-stroke mRS was estimated at hospital discharge, or at 30 days after onset if the patient was still hospitalized. The degree of WMH was assessed visually using the 4-level Fazekas ordinal scale (none, mild, moderate, or severe) [19]. To maintain consistency, a single reader (B.K.) graded WMH in all three cohorts (derivation and validation). If a patient had both CT and MRI, the MRI study was used to grade white matter disease on the same Fazekas scale.
Outcome measures:
The two main outcomes of interest were: 1) death within 3 months after stroke (3-month mortality), and 2) mRS at 3 months. Each surviving cohort member or their proxy was interviewed at 3 months after stroke to assign a 3-month mRS.
Statistical Methods:
The procedures for deriving the original models are described extensively elsewhere [10]. A logistic regression model was used for 3-month mortality, and a linear regression model was used for 3-month mRS. The 6-level mRS is a coarse measure of an underlying continuous distribution of functional status, so that underlying distribution is appropriately modeled using linear regression [10].
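The sketch below illustrates the two model forms. It shows how a fixed set of coefficients turns one patient's predictors into a 3-month prediction. The logistic model passes a linear predictor through the inverse logit to give a probability of death. The linear model returns a continuous value on the mRS scale. The class names, argument names, and every coefficient value used with them are hypothetical placeholders. The published estimates are given in Table 2 and are not reproduced here.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class MortalityModel:
    """Logistic model for 3-month death (coefficients are HYPOTHETICAL inputs)."""
    intercept: float
    b_age: float        # change in log-odds per year of age
    b_post_mrs: float   # change in log-odds per point of post-stroke mRS

    def linear_predictor(self, age: float, post_mrs: int) -> float:
        return self.intercept + self.b_age * age + self.b_post_mrs * post_mrs

    def probability(self, age: float, post_mrs: int) -> float:
        # Inverse logit maps the linear predictor to a probability in (0, 1).
        return 1.0 / (1.0 + math.exp(-self.linear_predictor(age, post_mrs)))


@dataclass(frozen=True)
class MrsModel:
    """Linear model for 3-month mRS (coefficients are HYPOTHETICAL inputs)."""
    intercept: float
    b_age: float
    b_diabetes: float
    b_severe_wmh: float
    b_pre_mrs: float
    b_post_mrs: float
    b_nihss: float

    def predict(self, age: float, diabetes: int, severe_wmh: int,
                pre_mrs: int, post_mrs: int, nihss: float) -> float:
        # Continuous prediction on the mRS scale; diabetes and severe_wmh are 0/1.
        return (self.intercept + self.b_age * age + self.b_diabetes * diabetes
                + self.b_severe_wmh * severe_wmh + self.b_pre_mrs * pre_mrs
                + self.b_post_mrs * post_mrs + self.b_nihss * nihss)
```

In external validation with "coefficients fixed at their original values", these objects are built once from the derivation estimates and then simply evaluated on every validation patient.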
External validity of the model predictions was tested using measures of agreement between predicted and observed values, discrimination, and calibration. For the 3-month mortality logistic model, overall performance was assessed with the Brier score [20], a measure of average prediction error, and with Tjur's R2 [21], a measure of explained variation. The Brier score takes values between 0 (perfect) and 1 (worst), and lower scores indicate better predictions. Tjur's R2 is closely related to the traditional R2 for linear models, with an upper bound of 1 (perfect) and a lower bound of 0 (worst). Discrimination was assessed with the area under the receiver operating characteristic curve (AUC). An AUC of 0.5 indicates no discrimination, and an AUC of 1.0 indicates perfect discrimination. For the 3-month mRS linear model, overall performance was assessed with R2 and root mean squared error (RMSE). Calibration is the agreement between model predictions and observed outcomes. For both models, it was evaluated by plotting predicted against observed outcomes; a perfectly calibrated model has an intercept of 0 and a slope of 1. We also tested whether model parameters needed to be re-estimated in the validation cohorts. Missing data were handled by complete-case analysis, as in the original model development. Analyses were performed with SAS 9.3 (SAS Institute Inc., Cary, NC) and R version 3.2.4 (The R Foundation for Statistical Computing).
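The performance measures named above can be written out directly, which makes their definitions concrete. The following is a minimal pure-Python sketch, not the SAS/R code used in the study. The AUC is computed through its rank (Mann-Whitney) interpretation. For the linear model, R2 is taken as 1 - SS_res/SS_tot with predictions held fixed. That is our assumption, because the text does not state which R2 definition was used for the validation cohorts, and this version can fall below 0 when external predictions are poor.

```python
import math
from typing import List, Sequence, Tuple


def _split(y: Sequence[int], p: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Predicted probabilities for events (y == 1) and non-events (y == 0)."""
    events = [pi for yi, pi in zip(y, p) if yi == 1]
    non_events = [pi for yi, pi in zip(y, p) if yi == 0]
    return events, non_events


def brier_score(y: Sequence[int], p: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)


def tjur_r2(y: Sequence[int], p: Sequence[float]) -> float:
    """Mean predicted probability among events minus that among non-events."""
    events, non_events = _split(y, p)
    return sum(events) / len(events) - sum(non_events) / len(non_events)


def auc(y: Sequence[int], p: Sequence[float]) -> float:
    """Probability that a random event outranks a random non-event (ties = 1/2),
    which equals the area under the ROC curve."""
    events, non_events = _split(y, p)
    wins = 0.0
    for pe in events:
        for pn in non_events:
            wins += 1.0 if pe > pn else 0.5 if pe == pn else 0.0
    return wins / (len(events) * len(non_events))


def r2(y: Sequence[float], yhat: Sequence[float]) -> float:
    """1 - SS_res / SS_tot using fixed (not refitted) predictions."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, yhat))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def rmse(y: Sequence[float], yhat: Sequence[float]) -> float:
    """Root mean squared error, in mRS points for the functional-outcome model."""
    return math.sqrt(sum((yi - fi) ** 2 for yi, fi in zip(y, yhat)) / len(y))
```

For example, with made-up outcomes `y = [0, 0, 1, 1]` and predictions `p = [0.1, 0.4, 0.35, 0.8]`, three of the four event/non-event pairs are ranked correctly, so `auc(y, p)` is 0.75.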
3. Results
Participants
In 2005, we prospectively identified a cohort of 460 ischemic stroke subjects. Three subjects did not have post-stroke mRS assessed and were excluded, leaving 457 patients in the 3-month mortality validation dataset. In this cohort, 29 patients died within 3 months after stroke, five had missing WMH data, and 38 were lost to follow-up, leaving 388 patients in the 3-month mRS validation dataset. In 2010, we prospectively identified a cohort of 504 ischemic stroke subjects. Eight patients were missing post-stroke mRS, leaving 496 patients in the 3-month mortality validation dataset. In this cohort, 45 patients died within 3 months after stroke, 1 had missing WMH data, and 56 were lost to follow-up, leaving 402 patients in the 3-month mRS validation dataset.
Derivation versus validation dataset
Table 1 summarizes the characteristics of the subjects in the 3-month mortality and 3-month mRS derivation and validation cohorts. Compared with the derivation cohort, the validation cohorts had fewer nonwhite patients, lower NIHSS and post-stroke mRS, and fewer patients with prior strokes. They also had more patients with a history of smoking and hypertension. Of note, the proportion of patients who underwent MRI was significantly higher in 2005 (73%) and 2010 (79%) than in 1999 (27%; p < 0.01).
Table 1. Characteristics of patients included in the 3-month mortality and the 3-month mRS models by development and validation dataset.
Data shown as count (percentage) unless noted otherwise. Denominators vary due to missing data.
3-month mortality
The 3-month mortality was 6% in 1999, 6% in 2005, and 7% in 2010. The derived 3-month mortality model included two predictors, age and post-stroke mRS, and had a Brier score of 0.049 and an AUC of 0.80. We applied the derived model to the validation datasets with all regression coefficients fixed at their original values. It performed well, with a Brier score of 0.045 for 2005 and 0.053 for 2010. The AUC was 0.86 (95% CI: 0.79, 0.93) for 2005 and 0.84 (95% CI: 0.76, 0.92) for 2010 (Figure 1(a) and Figure 1(b)). As the calibration plots illustrate, predicted probabilities tended to be lower than observed. We then refitted the model to the validation cohorts and allowed the coefficients to be re-estimated. The coefficients for age and post-stroke mRS did not differ significantly from the original estimates, so neither coefficient needed to be re-estimated in either validation dataset. After re-estimating the intercept, model performance improved, with observed and predicted probabilities falling on the ideal line with intercept 0 and slope 1 (Figure 1(c) and Figure 1(d)).
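The intercept re-estimation described above is often called recalibration-in-the-large. In this procedure, the original model's linear predictor is kept as a fixed offset, and only a shift in the intercept is estimated from the new data. The sketch below does this with a one-parameter Newton-Raphson fit of the logistic likelihood. It is an illustration under that assumption, not the authors' SAS/R code. The function names are hypothetical, and the method assumes that the validation data contain both deaths and survivors.

```python
import math
from typing import Sequence


def expit(x: float) -> float:
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def recalibrate_intercept(y: Sequence[int], lp: Sequence[float],
                          tol: float = 1e-10, max_iter: int = 100) -> float:
    """Estimate the intercept shift `a` in P(y = 1) = expit(a + lp).

    `lp` is the ORIGINAL model's linear predictor for each validation patient,
    held fixed as an offset, so the age and post-stroke mRS slopes are unchanged.
    Requires at least one event and one non-event, otherwise no finite MLE exists.
    """
    a = 0.0
    for _ in range(max_iter):
        p = [expit(a + l) for l in lp]
        score = sum(yi - pi for yi, pi in zip(y, p))   # first derivative of log-likelihood
        info = sum(pi * (1.0 - pi) for pi in p)        # negative second derivative
        step = score / info
        a += step                                      # Newton-Raphson update
        if abs(step) < tol:
            break
    return a
```

At the solution the score is zero, so the mean recalibrated probability equals the observed mortality proportion. If the original model under-predicts, as reported here, the estimated shift is positive. In R, the same fit would typically be written as `glm(y ~ offset(lp), family = binomial)`.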
3-month mRS model
The original 3-month mRS model included six predictors (age, diabetes, severe WMH, pre-stroke mRS, post-stroke mRS, and rNIHSS) and had an R2 of 0.48 and an RMSE of 1.10. We applied the derived model to the validation datasets with all regression coefficients fixed at their original values. It performed well, with an R2 of 0.57 for 2005 and 0.50 for 2010 and an RMSE of 0.85 for 2005 and 1.05 for 2010 (Figure 2(a) and Figure 2(b)). Compared with the observed 3-month mRS, predicted values tended to be high in both validation datasets. When the regression coefficients were allowed to vary, the estimated coefficients for pre-stroke mRS, post-stroke mRS, and NIHSS were reasonably similar to the original estimates. However, the effects of age, diabetes, and severe WMH differed significantly in the 2005 cohort, and the effects of age and severe WMH differed in the 2010 cohort (Table 2). After re-estimating the regression coefficients, model performance improved, with observed and predicted values falling close to the ideal line with intercept 0 and slope 1 (Figure 2(c) and Figure 2(d)).
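The over-prediction and the target line (intercept 0, slope 1) described above can be summarized numerically by regressing observed 3-month mRS on predicted mRS. The sketch below shows one simple way to do this for a linear model. The paper reports calibration plots rather than these summary numbers, so the function names and the use of an ordinary least-squares calibration line are illustrative assumptions.

```python
from typing import Sequence, Tuple


def calibration_line(observed: Sequence[float],
                     predicted: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of observed = intercept + slope * predicted.

    A perfectly calibrated model gives intercept 0 and slope 1.
    """
    n = len(observed)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    sxx = sum((p - mean_p) ** 2 for p in predicted)
    sxy = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    slope = sxy / sxx
    intercept = mean_o - slope * mean_p
    return intercept, slope


def mean_residual(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Average of observed minus predicted; negative when the model over-predicts."""
    return sum(o - p for o, p in zip(observed, predicted)) / len(observed)
```

For instance, if a model predicts mRS values of 1, 2, 3, and 4 for patients whose observed mRS is 0, 1, 2, and 3, the slope is 1, the intercept is -1, and the mean residual is -1. This is the pattern of a model whose predictions are uniformly one point too high.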
Figure 1. Calibration and re-calibration plots of the 3-month mortality prediction model. (For comparison, the derivation model had an AUC = 0.80, Brier score = 0.049, and R2 = 0.10.)
Figure 2. Calibration and revision plots of the original 3-month mRS prediction model. (For comparison, the 1999 derivation model had an R2 = 0.48 and RMSE = 1.10.)
Table 2. Coefficients ± standard error in the 3-month mortality model (logistic regression) and the 3-month mRS model (linear regression), estimated from the derivation data and re-estimated using the validation data.
4. Discussion
Our originally derived 3-month mortality and functional outcome predictive models performed well in two test cohorts of ischemic stroke subjects. A major strength of our study is that we externally validated our original models in two separate, large, independent cohorts. These cohorts were enrolled using methods similar to those of the derivation study. Many models and clinical scores predict outcomes after stroke [12] [22] [23], but these are rarely derived from longitudinal datasets with comprehensive measurements made at different time points. All three patient cohorts in our study were drawn from a stroke population that is more representative of the US population than cohorts from randomized clinical trials or administrative databases [15].
We tested models that separately predict mortality and functional outcomes. A model that combines mortality and morbidity into a single endpoint is harder to interpret. It also loses clinically relevant information that is useful in counseling patients and their caregivers about both short- and long-term prognosis.
In contrast to the originally derived model [10], severe WMH burden was not significantly associated with poor outcome in the validation cohorts. This finding is discordant with other studies suggesting that WMH may play an important role in predicting post-stroke outcomes [24] [25]. One possible reason is a change in imaging. Our original model was developed in a 1999 cohort in which CT was the predominant imaging modality, whereas imaging in 2005 and 2010 was predominantly MRI, and CT is less sensitive than MRI for detecting WMH burden. Our finding is consistent with Reid et al., who showed that CT-derived imaging variables, including WMH, did not improve the performance of stroke outcome prediction models [26]. An alternative explanation is our use of visual rating scales rather than quantitative volumetric assessment of WMH. Our data do not resolve the controversy over the utility of WMH in predicting stroke outcomes. The role of chronic small vessel disease in determining post-stroke outcome should be explored further in prognostic models that use more sophisticated quantitative imaging biomarkers.
Our study has important limitations. As with the derivation dataset, the validation datasets are a convenience sample of stroke patients who survived the early days after their stroke, so survival bias may be present. These models therefore apply to patients who survive the first few days after stroke and remain hospitalized in the acute phase. Another limitation is that our imaging variables included only severe WMH. They did not cover the full spectrum of chronic small vessel disease (microbleeds, lacunar infarcts, atrophy) or infarct volume. The studies available for imaging analysis included both CT and MRI, which led to heterogeneity in WMH grading because CT can underestimate white matter disease burden. Lastly, some data were missing, which could introduce bias if individuals with missing data are not representative of the original cohort. However, only 1% of patients in 2005 and 2% in 2010 had missing data and were excluded from the 3-month mortality models. Almost all of the missing data in the 3-month mRS models were due to missing outcomes, and we chose not to impute outcomes.
The demonstrated validity of the models predicting mortality and functional outcome strengthens our contention that they have both clinical and research utility. They will be useful tools for stratifying cohorts or balancing treatment groups in epidemiological studies and clinical trials, and for prognostication in clinical practice. However, any population-based model has a general limitation when it is applied to individual patients. These models should be seen as a practical aid for informing patients and clinicians, not as a substitute for clinical judgment and medical decision making.
5. Conclusion
Our models accurately predict 3-month mortality and functional outcome in two independent study cohorts after minor re-calibration. These models provide insight into post-stroke recovery. They may also be useful in counseling patients and their families, in designing clinical trials, and in epidemiological research.