Risk Factors and Prediction of Stroke in a Population with High Prevalence of Diabetes: The Strong Heart Study

Background and Objective American Indians have a high prevalence of diabetes and a higher incidence of stroke than whites and blacks in the U.S. Stroke risk prediction models based on data from American Indians would be of clinical and public health value. Methods and Results A total of 3483 (2043 women) Strong Heart Study participants free of stroke at baseline were followed from 1989 to 2010 for incident stroke. Overall, 297 stroke cases (179 women) were identified. Cox models with stroke-free time and risk factors recorded at baseline were used to develop stroke risk prediction models. Discrimination and calibration of the models were assessed by an analogous C-statistic (C) and a version of the Hosmer-Lemeshow statistic (HL), respectively, and the models were validated internally with bootstrapping methods. Results Age, smoking status, alcohol consumption, waist circumference, hypertension status, antihypertensive therapy, fasting plasma glucose, diabetes medications, high/low density lipoproteins, urinary albumin/creatinine ratio, history of coronary heart disease/heart failure, atrial fibrillation, or left ventricular hypertrophy, and parental history of stroke were identified as the significant optimal risk factors for incident stroke. Discussion The models produced C = 0.761 and HL = 4.668 (p = 0.792) for women, and C = 0.765 and HL = 9.171 (p = 0.328) for men, showing good discrimination and calibration. Conclusions Our stroke risk prediction models provide a mechanism for stroke risk assessment designed for American Indians. The models may also be useful to other populations with high prevalence of obesity and/or diabetes for screening individuals for risk of incident stroke and designing prevention programs.


Introduction
Stroke is a major health care challenge in American Indians (AIs). Recent data indicate that AIs have a higher incidence of stroke than whites and blacks in the US [1]. Stroke is one of the leading causes of death as well as disability among AIs [2] [3]. Cigarette smoking, diabetes mellitus (DM), and high blood pressure are well-documented modifiable risk factors for stroke [4]. We previously reported that risk factors for stroke among the AI population included age, high blood pressure, smoking, albuminuria, and diabetes [1]. Among them, DM (48.8%) and albuminuria (29.6%) were the prominent factors related to future stroke [1] [5] as well as coronary heart disease (CHD) [6] [7] in AIs. A stroke prediction model utilizing routinely collected variables will assist providers who care for AIs in evaluating the risk of stroke in their patients and assist communities in designing more effective and targeted interventions. Several stroke risk-assessment tools have been developed, including the widely used Framingham Risk Profile [8] [9] [10]. However, the contributions of certain common risk factors for incident stroke vary across populations [11]. Further, some risk factors/correlates have not previously been included; for example, albuminuria has been found to be significantly and independently associated with almost all chronic diseases, such as DM [12], hypertension (HTN) [13], and CHD [6] [7], in AIs. It is important to include these risk factors in the stroke prediction models for AIs.
This article presents gender-specific stroke risk prediction equations based on longitudinal data from the Strong Heart Study (SHS) during 1989-2010. A "risk calculator" from the equations will be developed for individuals to input their values of the risk factors and instantly obtain a probability (risk) of developing stroke in 10 years (will be available on the SHS Web site: http://strongheart.ouhsc.edu).

Study Population
The SHS is a population-based cohort study of cardiovascular disease (CVD) and its risk factors in AI tribes/communities in southwestern Oklahoma, central Arizona, and North and South Dakota. Participants (n = 3516; 2056 women) aged 45 to 74 years underwent baseline examination from 1989 to 1992. The design, inclusion and exclusion criteria of participants, survey methods, and laboratory techniques of the SHS have been described in detail [14] [15] along with methods of definition and identification of first stroke [1] [16]. Participants in the present analysis (3483; 2043 women) had no history of stroke or stroke-like events at the baseline examination. Among them, 297 (179 women) suffered an incident stroke during an average follow-up of 15.04 years (inter-quartile range 9.7-20.2 years) through the end of 2010. The study was approved by Institutional Review Boards of the participating institutions and tribes as well as the Indian Health Service. Informed consent was obtained from all participants.

Baseline Characteristics
Information on demographic factors, medical history, medication use, and personal health habits was collected by interview. A physical examination was conducted and fasting blood samples were collected for laboratory tests including lipids and lipoproteins. Anthropometric measurements were taken and sitting blood pressure (1st and 5th Korotkoff sounds) was measured three times consecutively using mercury sphygmomanometers (WA Baum Co) after five minutes of rest [17]. The average of the 2nd and 3rd systolic and diastolic blood pressure measurements was used in the analyses. HTN status was defined by the Seventh Joint National Committee on Hypertension criteria [18]: HTN if systolic blood pressure (SBP) ≥ 140 mmHg or diastolic blood pressure (DBP) ≥ 90 mmHg or on antihypertensive therapy, normal if SBP < 120 mmHg and DBP < 80 mmHg, and pre-hypertension (Pre-HTN) otherwise. DM status was defined by the American Diabetes Association diagnosis and classification guidelines [19]: DM if fasting plasma glucose (FPG) ≥ 7.0 mmol/L (126 mg/dL) or on diabetes medications, impaired fasting glucose (IFG) (or prediabetes) if 5.6 mmol/L (100 mg/dL) ≤ FPG < 7.0 mmol/L, and normal fasting plasma glucose (NFG) if FPG < 5.6 mmol/L. Micro- and macro-albuminuria were defined as urinary albumin/creatinine ratios of 30-299 mg/g and ≥ 300 mg/g, respectively. Current smoking status was defined as smoking currently, smoking regularly, and having smoked at least 100 cigarettes in one's entire life until the date of interview. Estimated glomerular filtration rate (eGFR) was derived based on serum creatinine that was recalibrated to an isotope dilution mass spectrometry (IDMS)-traceable serum creatinine assay [20] and using the CKD-EPI (Chronic Kidney Disease Epidemiology Collaboration) formula [21].
Participants who had CHD or congestive heart failure (HF), atrial fibrillation (AFIB), or left ventricular hypertrophy (LVH) by electrocardiography before or at the baseline examination were considered as having a history of CHD/HF, AFIB, or LVH, respectively.
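The categorical definitions given in this section can be written down compactly. The following is a minimal sketch in Python, not the study's own code (the analyses were run in SAS); the function names are hypothetical, FPG is taken in mg/dL, and a UACR between 299 and 300 mg/g is assumed to count as microalbuminuria.

```python
def htn_status(sbp, dbp, on_htn_rx):
    """Blood pressure category following the JNC 7 rule quoted in the text."""
    if sbp >= 140 or dbp >= 90 or on_htn_rx:
        return "HTN"
    if sbp < 120 and dbp < 80:
        return "normal"
    return "pre-HTN"


def dm_status(fpg_mg_dl, on_dm_rx):
    """Glycemic category following the ADA rule quoted in the text (FPG in mg/dL)."""
    if fpg_mg_dl >= 126 or on_dm_rx:
        return "DM"
    if fpg_mg_dl >= 100:
        return "IFG"
    return "NFG"


def albuminuria_status(uacr_mg_g):
    """Albuminuria category from the urinary albumin/creatinine ratio (mg/g).

    Assumption: values in the gap between 299 and 300 mg/g are treated as micro.
    """
    if uacr_mg_g >= 300:
        return "macroalbuminuria"
    if uacr_mg_g >= 30:
        return "microalbuminuria"
    return "normal"


if __name__ == "__main__":
    # Values of the illustrative man described in the Appendix.
    print(htn_status(183, 114, True), dm_status(110, False), albuminuria_status(160.6))
```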

Outcome Variables
All study participants without a prior history of stroke at the baseline examination were under follow-up surveillance for incident stroke events occurring between the date of the baseline examination and December 31, 2010. Mortality and morbidity follow-up data were available in 99.8% and 99.2% of participants, respectively.

Fatal Stroke
Fatal events included deaths judged to be due to definite and possible stroke. Deaths occurring during the follow-up were confirmed through Indian Health Service or private hospital records and through direct contact by study personnel with participants' families or other informants [14] [15] [22]. The process of ascertaining stroke deaths has been reported previously [1] [16] [22]. All possible stroke-related deaths were reviewed by physician members of the Strong Heart Study Mortality Review Committee and then reviewed by neurologists (D.O.W., J.P.W.) or, since 2004, by a cardiologist focused on stroke (J.R.K.) for confirmation using previously described criteria [23] that differentiated eight subtypes of stroke-related events [cardioembolic infarction, subarachnoid hemorrhage, intraparenchymal hemorrhage, lacunar infarction, other unknown infarction, transient ischemic attack (TIA), unknown type of stroke, atherothrombotic infarction].

Nonfatal Stroke
The process to confirm nonfatal stroke was similar to that for fatal stroke. Neurologists (D.O.W., J.P.W.) and later the cardiologist (J.R.K.) made up the adjudication review committee and provided the final diagnosis for non-fatal events (definite and possible non-fatal strokes) that occurred from the date of the baseline examination to Dec. 31, 2010 [14] [16] [22] [23]. Stroke event sub-types used are the same as described for fatal stroke. If more than one event happened in the same individual, the date of the earliest was considered to be the first stroke date.

Statistical Methods
Overall incidence rates (per 1000 person-years) of stroke and their 95% confidence intervals, and incidence rates by stroke type, gender, age group (45-54, 55-64, and 65-74 years old) and center (South/North Dakota, Oklahoma, and Arizona), were estimated by dividing the total number of observed stroke events by the total stroke-free follow-up time (person-years) in the respective group. Stroke incidences by gender among sub-categories of each potential baseline risk factor were also estimated. Cox proportional-hazards models were used to assess univariate associations of individual risk factors with incident stroke after adjusting for age. A Cox model with competing risks [24] was used in sensitivity analyses. A p-value of <0.05 was considered to be statistically significant.
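As an illustration of the rate calculation, the sketch below divides events by person-years and scales to 1000 person-years. The text does not state how the 95% confidence intervals were obtained; the log-scale Poisson approximation used here is an assumption, the function name is hypothetical, and the example counts are made up rather than taken from Table 1.

```python
import math


def incidence_rate_per_1000(events, person_years, z=1.96):
    """Return (rate, lower, upper) per 1000 person-years.

    Assumption: events are Poisson distributed and the interval is built on
    the log scale, i.e. rate * exp(+/- z / sqrt(events)).
    """
    if events <= 0 or person_years <= 0:
        raise ValueError("events and person_years must be positive")
    rate = events / person_years * 1000.0
    half_width = z / math.sqrt(events)
    return rate, rate * math.exp(-half_width), rate * math.exp(half_width)


if __name__ == "__main__":
    # Hypothetical group: 50 strokes over 10,000 stroke-free person-years.
    rate, lower, upper = incidence_rate_per_1000(50, 10_000)
    print(f"{rate:.2f} per 1000 person-years (95% CI {lower:.2f}-{upper:.2f})")
```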

Development of Prediction Equations
Cox proportional-hazard models were also used to assess the simultaneous association of multiple risk factors with incident stroke and to develop gender-specific stroke prediction models. Backward variable selection [24] with a significance level of 0.05 was used to select optimal sets of baseline risk factors for incident stroke. The potential risk factors included age, body mass index (BMI), waist circumference (WAIST), SBP, DBP, antihypertensive therapy (denote its indicator function as HTNRX, HTNRX = 1 if on antihypertensive therapy and = 0 if not), smoking status, physical activity, alcohol consumption, FPG, diabetes medications (denote its indicator function as DMRX, DMRX = 1 if on diabetes medications and = 0 if not), urinary albumin/creatinine ratio (UACR), eGFR, low-density lipoprotein cholesterol (LDL-C), high-density lipoprotein cholesterol (HDL-C), triglyceride (TG), history of CHD/HF, parental history of CVD, stroke, DM or HTN, history of or electrocardiogram-evident atrial fibrillation (AFIB) and left ventricular hypertrophy (LVH), as well as categorizations of these variables such as DM status (Yes/No; or DM, IFG and NFG), HTN status (HTN, pre-HTN, normal), and albuminuria status (macroalbuminuria, microalbuminuria, normal). Logarithmic transformation of skewed variables was applied if needed. For the significant risk factors selected for the models, their interactions were also considered and further selected for their possible additional contributions.

Discrimination, Calibration, and Validation of the Prediction Equations
An analogous C-statistic [7] [25] was calculated to evaluate the discrimination ability of the stroke prediction models in separating those who developed stroke from those who did not. This C-statistic is analogous to the area under the receiver operating characteristic curve (ROC curve) based on a logistic regression. A C-statistic value of ≥0.7 indicates good discrimination ability, and the closer the C value is to 1.0, the better the discrimination ability. A version of the Hosmer-Lemeshow χ² statistic (HL-statistic) [7] [25] was computed to assess model calibration ability (or how closely the predicted probabilities reflected actual risk). Participants were divided into deciles according to their predicted probabilities of stroke in 10 years using the proposed prediction model, and the HL-statistic was calculated to compare the differences between the predicted and actual proportions of stroke events. HL-statistic values of <20 are considered good calibration.
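The exact form of the "analogous C-statistic" is given in [7] [25] and is not reproduced here. As a rough illustration of what a concordance measure for censored data does, the sketch below assumes a Harrell-type definition: among pairs in which one person has a stroke while the other is still stroke-free, it counts how often the person with the earlier event has the higher predicted risk. The function name and the toy data are hypothetical.

```python
def concordance_index(times, events, risk_scores):
    """Harrell-type concordance for right-censored data (an assumed stand-in).

    A pair (i, j) is usable when i has an observed event and j is still
    event-free at that time (times[j] > times[i]). Ties in predicted risk
    count as half-concordant; ties in time are ignored.
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored subjects cannot anchor a comparison
        for j in range(n):
            if times[j] > times[i]:
                usable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


if __name__ == "__main__":
    # Toy data: years of stroke-free follow-up, event indicator, predicted risk.
    times = [2.0, 4.0, 6.0, 8.0]
    events = [1, 1, 0, 0]
    scores = [0.9, 0.7, 0.3, 0.1]
    print(concordance_index(times, events, scores))
```

A value of 0.5 corresponds to predictions no better than chance, which is why the text treats values of 0.7 or more as good discrimination.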
In addition, the stroke prediction models were validated internally with the use of bootstrapping methods [7] [25]. Samples of the same size (n = 3483) as the original cohort were taken 1000 times from the original cohort with replacement. Then the "optimism" [7] [25] for the C-statistic or the p-value for the HL-statistic was calculated based on these 1000 bootstrap samples. A "bootstrap-corrected statistic" was then evaluated as "the statistic from the model" minus "the optimism for the statistic". A bootstrap-corrected statistic from a model is a nearly unbiased estimate of the expected value of the statistic from external validations of the model, with smaller "optimism" values indicating better validity of the statistic [25]. All analyses were conducted with SAS 9.4 (SAS Institute Inc., Cary, NC, USA).

Results
Table 1 shows estimated incidence rates (per 1000 person-years) of stroke for all SHS participants without prior stroke. There was no significant gender difference, but there were significant center differences among Arizona, Oklahoma and the Dakotas, with the Dakotas the highest followed by Oklahoma and Arizona. The incidence rate increased significantly with age. Among identified stroke types, incidence rates were highest for cardioembolic infarction, followed by other unknown infarction, lacunar infarction and intraparenchymal hemorrhage.
Gender-specific stroke incidence rates by sub-categories of each potential baseline risk factor and its univariate association with incident stroke after adjusting for age are shown in Table 2. Age and, after adjusting for age, smoking, HTN, DM, albuminuria, history of CHD/HF, and AFIB were each significantly associated with incident stroke in both women and men. Alcohol consumption, HDL-C, history of LVH, and parental history of stroke were significant risk factors for women only. There was no significant univariate association of incident stroke with BMI, WAIST, physical activity, LDL-C, TG, eGFR, or parental history of CVD/DM/HTN in either women or men (data not shown).
Among these associations, after adjusting for age, women with DM had, for example, a 2.25-fold higher risk than women without DM, and men with DM a 1.65-fold higher risk than men without DM; those with macroalbuminuria or microalbuminuria had, respectively, a 3.39- or 1.66-fold higher risk than those with normal UACR in women, and a 3.29- or 1.70-fold higher risk in men.
Gender-specific stroke prediction models are shown in Table 3. In women, age, current smoking, alcohol consumption, DBP and SBP as well as their interaction with HTN treatments, UACR, the interaction of FPG and diabetes medications, HDL-C, history of CHD/HF, LVH and AFIB, and parental history of stroke were significantly associated with incident stroke. In men, age, WAIST, current smoking, DBP and SBP as well as their interaction with HTN treatments, Pre-HTN, UACR, diabetes medications, LDL-C, and history of CHD/HF were significantly associated with incident stroke.
An illustration of how to use the models in Table 3 to predict the 10-year risk of incident stroke for a stroke-free individual with measured risk factors or covariates is given in the Appendix.
In women, assuming the other measures in the model are the same, those with low to moderate alcohol consumption (1-14 drinks per week), for example, had a 50% lower risk compared with the others, and risk was 2.5% higher per 10 mg/dl higher FPG among participants on diabetes medications. All terms related to blood pressure in Equation (3) (Appendix) can be rearranged as 0.02441*DBP*HTNRX + 0.00224*DBP*(1 − HTNRX) + 0.01424*SBP*(1 − HTNRX). Therefore, the associations of blood pressure with incident stroke differ between those with and without antihypertensive therapy. In women, the median UACR values in the three sub-categories (normal, microalbuminuria, and macroalbuminuria) were 7, 66, and 1492 mg/g, respectively. If we use these medians as the respective reference levels of UACR in the three sub-categories, then based on the relationship between coefficient and hazard ratio in a Cox proportional-hazard model [24], the hazard ratios of macroalbuminuria vs. microalbuminuria, macroalbuminuria vs. normal, and microalbuminuria vs. normal will be 1.174 [= Exp((Log(1492) − Log(66)) × 0.11852), where 0.11852 is the estimated coefficient for Log(UACR) in the model for women, Table 3], 1.315, and 1.120, respectively.
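The hazard ratios quoted for the UACR medians in women can be checked numerically. In the sketch below the coefficient and medians are taken from the text, while the function name is hypothetical. The reported value of 1.174 is reproduced when Log is read as the base-10 logarithm (the natural logarithm would give about 1.45), so base 10 is assumed here. The other two ratios come out within about 0.003 of the reported 1.315 and 1.120, possibly because the medians are rounded.

```python
import math

# Estimated coefficient for Log(UACR) in the model for women (from the text).
BETA_LOG_UACR_WOMEN = 0.11852

# Median UACR (mg/g) by albuminuria category in women (from the text).
MEDIAN_UACR_WOMEN = {"normal": 7.0, "micro": 66.0, "macro": 1492.0}


def uacr_hazard_ratio(uacr_a, uacr_b, beta=BETA_LOG_UACR_WOMEN):
    """Hazard ratio for UACR level a vs. level b, other covariates held equal.

    Assumption: Log in the model is the base-10 logarithm.
    """
    return math.exp(beta * (math.log10(uacr_a) - math.log10(uacr_b)))


if __name__ == "__main__":
    m = MEDIAN_UACR_WOMEN
    print(f"macro vs. micro:  {uacr_hazard_ratio(m['macro'], m['micro']):.3f}")
    print(f"macro vs. normal: {uacr_hazard_ratio(m['macro'], m['normal']):.3f}")
    print(f"micro vs. normal: {uacr_hazard_ratio(m['micro'], m['normal']):.3f}")
```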
For men, assuming the other measures in the model are the same, the estimated hazard ratios of different levels vs. their respective reference level, or the hazard ratio per unit change for each variable, can be interpreted similarly. In addition, the age-related term in Equation (4) (Appendix) is 0.10268 × age − 0.91966 × I(age ≥ 65). Assuming the other measures in the model are the same, based on the relationship between coefficient and hazard ratio in a Cox proportional-hazard model [24], this means that stroke risk is 67% [= Exp(5 × 0.10268) − 1] higher for every 5 years of higher age. The association of age with incident stroke risk depends upon the term 0.10268*age for those aged <65, and the term 0.10268*age − 0.91966 for those aged 65 or older. Similarly, the hazard ratios of macroalbuminuria vs. microalbuminuria, macroalbuminuria vs. normal, and microalbuminuria vs. normal, based on the three medians (5, 70, and 873 mg/g for the normal, microalbuminuria, and macroalbuminuria sub-categories in men, respectively), are 1.159, 1.354, and 1.168, respectively.
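The age term for men can be evaluated directly from the two coefficients quoted above. This is a small sketch with hypothetical function names; it reproduces the 67% figure for two ages that fall on the same side of 65, which is the comparison the text describes.

```python
import math

# Age-related coefficients in the model for men (from the text).
BETA_AGE = 0.10268
BETA_AGE_65_PLUS = -0.91966


def age_term(age):
    """Contribution of age to the linear predictor: 0.10268*age - 0.91966*I(age >= 65)."""
    return BETA_AGE * age + (BETA_AGE_65_PLUS if age >= 65 else 0.0)


def age_hazard_ratio(age_a, age_b):
    """Hazard ratio for age a vs. age b, all other covariates held equal."""
    return math.exp(age_term(age_a) - age_term(age_b))


if __name__ == "__main__":
    # Five years of higher age within the same age band (both under 65).
    hr = age_hazard_ratio(60, 55)
    print(f"HR = {hr:.3f}, i.e. {100 * (hr - 1):.0f}% higher risk")
```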
The C-statistics from the models for women and men are 0.761 and 0.765, respectively, indicating good discrimination ability. The respective HL-statistics of 4.668 (p = 0.792) and 9.171 (p = 0.328) show good calibration ability of the models. We also applied the Framingham 2008 [9] and American College of Cardiology (ACC)/American Heart Association (AHA) 2013 [10] prediction models (with published estimated coefficients for risk factors and values of the baseline function at t = 10) to predict stroke risk in AIs. The Framingham 2008 prediction models produced a C-statistic = 0.701 and an HL-statistic = 109.73 (p < 0.0001) for women, and C = 0.706 and HL-statistic = 281.9 (p < 0.0001) for men. The ACC/AHA 2013 prediction models for White produced a C-statistic = 0.705 and an HL-statistic = 29.82 (p = 0.00023) for women, and C = 0.709 and HL-statistic = 82.3 (p < 0.0001) for men, while those for Black produced a C-statistic = 0.705 and an HL-statistic = 91.2 (p < 0.0001) for women, and C = 0.711 and HL-statistic = 80.6 (p < 0.0001) for men. The predicted decile-specific mean risks in men and women from the Framingham 2008 and ACC/AHA 2013 (for White) models are also shown in Figure 1.
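The p-values attached to the HL-statistics can be reproduced from a χ² distribution. The degrees of freedom are not stated in the text; the sketch below assumes 8 (ten deciles minus two), which matches the reported p-values, and uses the closed-form upper tail that exists for even degrees of freedom so that no statistics library is needed. The function name is hypothetical.

```python
import math


def chi2_upper_tail_even_df(x, df=8):
    """P(X > x) for a chi-square variable with an even number of degrees of freedom.

    Uses exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!.
    Assumption: df = 8 for the decile-based HL-statistic.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


if __name__ == "__main__":
    for label, hl in [("women", 4.668), ("men", 9.171), ("ACC/AHA 2013 White, women", 29.82)]:
        print(f"{label}: HL = {hl}, p = {chi2_upper_tail_even_df(hl):.5f}")
```

Under this assumption the ACC/AHA 2013 (White) value of 29.82 in women corresponds to p of about 0.00023.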
To explore the performance of the generated models in predicting the risk of non-hemorrhagic incident strokes only, a sensitivity analysis was conducted by treating incident hemorrhagic stroke as a competing risk (and hence as a censored event competing with non-hemorrhagic incident stroke) [24]. The generated models produced C = 0.763 and HL-statistic = 4.877 (p = 0.7706) for women, and C = 0.771 and HL-statistic = 5.558 (p = 0.6966) for men. Compared with the respective values for all incident strokes shown in Table 3, discrimination for non-hemorrhagic incident strokes was slightly better in both women and men and the HL-statistic was lower in men, while the HL-statistic in women was similar (4.877 vs. 4.668).

Discussion
The new prediction models for incident stroke based on data routinely acquired in a clinical setting should prove to be helpful for care providers to evaluate stroke risk of their patients. Of perhaps equal importance, they will allow providers to further reinforce preventive measures such as smoking cessation, preventing or managing diabetes, and controlling blood pressure and LDL levels.
Some of these risk factors, such as age, smoking status, SBP, DBP, HTN status, DM status, history of CHD/HF, AFIB, and LVH, have also been reported as stroke risk factors [28]. Among the risk factors in our models, albuminuria is notably and significantly associated with incident stroke in AIs. We found that SHS participants who had macroalbuminuria or microalbuminuria had, respectively, 3.39 or 1.66 times higher risk of incident stroke than those with normal UACR in women, and 3.29 or 1.70 times in men, in the age-adjusted univariate analyses (Table 2). After adjusting for the other risk factors in the models (Table 3), these hazard ratios were 1.315 and 1.120 in women and 1.354 and 1.168 in men, as explained in the Results section. The hazard ratios of macroalbuminuria vs. normal UACR were almost equal to those of AFIB vs. no AFIB. Given that AFIB is a well-known, significant and crucial risk factor for incident stroke [8], the considerable association of albuminuria with stroke in this population cannot be ignored. The significant terms for diabetes medications in men and for the interaction of FPG with diabetes medications in women remained in the final models, which shows that DM is significantly associated with incident stroke risk and suggests that controlling FPG, especially in those with DM and on diabetes medications, is very important in preventing incident stroke. Our models identified significant independent contributions and combined effects of these risk factors in predicting risk of incident stroke after adjusting for the other risk factors in the respective models. Our models were also somewhat better at predicting non-hemorrhagic strokes than total strokes. This is likely because the majority of stroke cases were non-hemorrhagic strokes.
There are some interesting gender differences in this study. Table 2 and Table 3 show that low to moderate alcohol consumption (1-14 drinks per week) may be protective against incident stroke in women only. The beneficial effect of low to moderate alcohol consumption in women is consistent with previous findings, but the lack of a significant association for men contradicts reports in the literature [27]. HDL-C was associated univariately and multivariately with incident stroke only in women, while LDL-C was associated only in men. The reasons for these gender differences are unclear and require further investigation.
Our models had improved predictive value compared to either the Framingham 2008 [9] or ACC/AHA 2013 [10] models when examined in AIs. The lower performance of the Framingham or ACC/AHA models may be affected by their miscalibration [29] (that is, the average predicted risks from these models are not close to the stroke event rate in AIs). The 10-year stroke event rates in AIs were 0.043 for women and 0.050 for men, while the average predicted risks from the Framingham models were 0.128 and 0.139, and from the ACC/AHA models (for White) 0.078 and 0.142, respectively. The miscalibration can also be seen in Figure 1 and from their large HL-statistics and respective significant p-values mentioned in the Results section.
We did not use a reclassification statistic such as net reclassification improvement (NRI) [30] to compare our models with those reported in the literature, such as the Framingham or ACC/AHA models. The reasons include the reported issues related to miscalibration in the clinical use of a risk equation in different populations (as discussed above), the difficulty of comparing models that were developed with different outcomes or population groups, and uncertainty about how to draw proper 10-year stroke risk cutoff points [29]. In addition, the NRI is the difference of Youden indexes from two models for a binary classification with a cutoff probability. The problems associated with NRI include concerns about statistical invalidity in real and simulated data, inadequate accounting for clinically important differences in shifts among risk categories if there are three or more risk categories, and other controversies [29] [30] [31] [32].

Conclusion
Our generated stroke prediction models based on the data from the SHS provide a stroke risk appraisal specific for a population with high prevalence of obesity, diabetes, and renal disease. With the increasing incidence and prevalence of obesity and diabetes in the US, we believe that our generated prediction models would provide an additional helpful assessment tool for other similar populations. Although our generated stroke prediction models are internally validated, they should be tested and validated in other populations.
Men: (4) where I(.) is the indicator function, which equals 1 if the condition in the parentheses is met and 0 otherwise.
To illustrate the use of the models in Table 3 and Equations (1) to (4) to predict the risk of incident stroke in 10 years, consider a stroke-free man who is a 60.5-year-old smoker with waist circumference = 100 cm and SBP/DBP = 183/114 mmHg, who takes hypertension medications, does not use DM medications, and has UACR = 160.6 mg/g, LDL-C = 162 mg/dl, and a history of CHD/HF. Applying Equation (4) gives the summation term in Equation (1), and by Equations (2) and (1) his probability (risk) of developing stroke in 10 years is 60.3%, where S0(10) (= 0.999942632, Table 3) is the baseline stroke-free time function evaluated at t = 10 for the model. The predicted probability of 60.3% is about 5 times the average probability (12.7%, Table A1) of developing incident stroke in 10 years for a man of this age. This calculation can easily be carried out in an MS Excel worksheet or directly with the stroke risk calculator that will be created on the SHS web site.
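Equations (1) to (4) are not reproduced in this text, so the worked example cannot be recomputed from the coefficients. As a hedged sketch, the code below assumes the standard Cox form, risk = 1 − S0(10)^exp(LP), where LP is the summation term (the sum of coefficient × covariate products). Under that assumption it back-solves the value of LP implied by the reported 60.3% risk. The function names are hypothetical, and the implied LP is a derived illustration, not a value reported by the study.

```python
import math

# Baseline stroke-free function at t = 10 for men (from the text, Table 3).
S0_10_MEN = 0.999942632


def ten_year_risk(linear_predictor, s0=S0_10_MEN):
    """10-year stroke risk, assuming risk = 1 - S0(10) ** exp(LP)."""
    return 1.0 - s0 ** math.exp(linear_predictor)


def implied_linear_predictor(risk, s0=S0_10_MEN):
    """Invert the assumed risk formula to recover the summation term LP."""
    return math.log(math.log(1.0 - risk) / math.log(s0))


if __name__ == "__main__":
    lp = implied_linear_predictor(0.603)
    print(f"implied summation term: {lp:.3f}")
    print(f"risk recovered from it: {ten_year_risk(lp):.3f}")
```

The same function could back a spreadsheet or web calculator of the kind the text mentions, once the gender-specific coefficients from Table 3 are supplied.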

Table A1
Average predicted probability (risk) of developing incident stroke in 10 years.

Figure 1
Calibration by deciles of model-based predicted probabilities of stroke event in 10 years. "KM" denotes observed risk (by the Kaplan-Meier method). "Model" denotes the decile-specific mean risk predicted by the models in Table 3, "FS2008" that predicted by the Framingham 2008 models, and "ACC2013" that predicted by the ACC/AHA 2013 models (for White).

Table 3
Cox proportional hazards models for stroke-free time. The unit used to calculate the hazard ratio is 5 years for age, and 10 mg/dl for FPG and HDL-C.