Analysis of Hospital Mortality Data: The Role of DRG's

Background: Factors associated with hospital mortality are usually identified, and their effects quantified, through statistical modeling. To guide the choice of the best statistical model, we first quantify the predictive ability of each model and then use the CIHI index to see whether hospital policy needs any change.
Objectives: The main purpose of this study was to compare three statistical models in evaluating the association between hospital mortality and two risk factors, namely the subject's age at admission and the length of stay, adjusting for the effect of Diagnosis Related Groups (DRG).
Methods: We used several SAS procedures to quantify the effect of DRG on the variability in hospital mortality. The models compared are the logistic regression model (which ignores the DRG effect), the Generalized Estimating Equation (GEE) approach, which takes the within-DRG clustering effect into account (but treats the within-cluster correlation as a nuisance parameter), and the Generalized Linear Mixed Model (GLIMMIX). We showed that the GLIMMIX is superior to the other models because it properly accounts for the clustering effect of DRG.
Results: The GLM procedure showed that the proportional contribution of DRG is 16%. All three models showed a significant and increasing trend in mortality (P < 0.0001) with respect to the two risk factors (age at admission and hospital length of stay). It was also clear that the CIHI index did not differ under the three models. We re-estimated the model parameters after dichotomizing the risk factors at the optimal cut-off points obtained from the ROC curve. The parameter estimates and their significance did not change.
How to cite this paper: Shoukri, M.M., Algahtani, S.N., Eldali, A.M., AlMarzouqi, M.R. and Al-Ageel, S.M. (2019) Analysis of Hospital Mortality Data: The Role of DRG's. Open Journal of Statistics, 9, 62-73. https://doi.org/10.4236/ojs.2019.91006
Received: December 24, 2018. Accepted: January 22, 2019. Published: January 25, 2019.
Copyright © 2019 by author(s) and Scientific Research Publishing Inc. This work is licensed under the Creative Commons Attribution International License (CC BY 4.0). http://creativecommons.org/licenses/by/4.0/


Introduction
The ability to gauge hospital performance using patient outcome data depends upon many factors. In principle, the outcome needs to reflect features that are directly affected by the quality of hospital care, such as mortality, readmission rates, and patient and employee satisfaction. Beyond this, however, there are a number of important data and statistical considerations:
1) Data must be available and used to adjust for differences in patient health at admission across different hospitals (case-mix differences). These adjustments are required to ensure that variations in reported performance reflect hospitals' contributions to their patients' outcomes rather than the intrinsic difficulty of the patients they treat. Needless to say, the performance of the adjustments depends on the type and quality of the available data.
2) In distinct contrast to the previous point, reported performance should not adjust away differences related to the quality of the hospital. For example, if "presence of a special dialysis care unit" is systematically associated with better survival following organ failure, a hospital's reported performance should capture the benefit provided by that unit; as a consequence, such a hospital-level characteristic should not influence the risk adjustment.
3) The reported performance measure should be little affected by the variability associated with rates based on small numbers of cases.
In this report we address technical statistical issues associated with the KFSHRC hospital mortality data. The salient point is that there is no consensus to guide the choice of an appropriate statistical model. We therefore base the analysis on well-established statistical models. To enhance the traditional modeling techniques we consider: the use of more flexible models incorporating Diagnosis Related Groups (DRG) adjustment [1]; the use of statistical distributions that do not belong to the well-known Gaussian family used in hierarchical, random effects models; an evaluation of the effectiveness of current outlier detection methods; and the production of an ensemble of hospital-specific Standardized Mortality Ratios (HSMR) that accurately estimates the true, underlying distribution of ratios [2]. Discussions with clinicians and other quality experts concluded that risk adjustment should not reflect hospital characteristics; such characteristics should be used only to reduce confounding of the case-mix/risk relation. Statistical models are available for each of these operations. Developing and implementing such models has become feasible with the adoption of the SAS software and the acquisition of its relevant components.
In Section 2 we define what is meant by DRG, and in Section 3 we describe the data that were made available to us, with mortality as the primary outcome, at the King Faisal Specialist Hospital and Research Center (KFSHRC). In Section 4 we compare the models, and in Section 5 we discuss the quantitative merits of these models, followed by recommendations.

The Importance of Incorporating DRG's within the Proposed Models
The Diagnosis Related Groups (DRGs) were first developed at Yale University in 1975. The main objective was to group patients with similar treatments and conditions for comparative studies. DRGs were designed to be homogeneous units of hospital activity to which binding prices could be attached. A central theme in the advocacy of DRGs was that this reimbursement system would, by constraining the hospitals, oblige their administrators to alter the behavior of the physicians and surgeons comprising their medical staffs. Hospitals were forced to leave the nearly risk-free world of cost reimbursement and face the uncertain financial consequences associated with the provision of health care. DRGs were also designed to provide practice pattern information that administrators could use to influence individual physician behavior.
In 2007, author Rick Mayes [3] described DRGs as: ...the single most influential postwar innovation in medical financing: Medicare's prospective payment system (PPS). Inexorably rising medical inflation and deep economic deterioration forced policymakers in the late 1970s to pursue radical reform of Medicare to keep the program from insolvency.
In the USA the most significant change in health policy since Medicare and Medicaid's passage in 1965 went virtually unnoticed by the general public [4].
Nevertheless, the change was nothing short of revolutionary. For the first time, the federal government gained the upper hand in its financial relationship with the hospital industry. Medicare's new prospective payment system with DRGs triggered a shift in the balance of political and economic power between the providers of medical care (hospitals and physicians) and those who paid for it, power that providers had successfully accumulated for more than half a century.
From a statistical viewpoint, DRG's are considered artificial clusters of subjects. Krumholz et al. [5] discussed several factors that should be considered when assessing hospital quality. These relate to differences in the chronic and clinical acuity of patients at hospital presentation, the numbers of patients treated at a hospital, the frequency of the outcome studied, the extent to which the outcome reflects a hospital quality signal, and the form of the performance metric used to assess hospital quality. However, issues related to DRG have not been considered as factors of importance. Since the outcome of interest is hospital mortality, any attempt to derive risk-adjusted mortality that does not take into account the relative importance of DRG will produce biased estimates [6].
The performance measure is reported as the ratio in Equation (1). The denominator of Equation (1) results from applying a model that adjusts/standardizes for an ensemble of patient-level, pre-admission risk factors, rather than only demographic factors such as age and gender, as is typical in epidemiological applications. The statistical issues arising in the estimation of the standardized death rate and the SMR are identical because the latter is simply the hospital-specific value divided by the expected number of deaths computed from the postulated risk model.
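
As a minimal sketch of how an observed-to-expected ratio of this kind is commonly computed, the Python snippet below assumes that the expected number of deaths is the sum of the patient-level death probabilities predicted by a risk model. This is an illustration under that assumption, not the computation used in the study; the function name and the toy values are hypothetical.

```python
def standardized_mortality_ratio(died, predicted_risk):
    """Observed deaths divided by expected deaths.

    died           : list of 0/1 discharge outcomes (1 = dead)
    predicted_risk : list of model-based death probabilities, one per patient
    """
    if len(died) != len(predicted_risk):
        raise ValueError("inputs must have the same length")
    observed = sum(died)
    # Assumption: expected deaths = sum of the predicted patient-level risks.
    expected = sum(predicted_risk)
    if expected <= 0:
        raise ValueError("expected number of deaths must be positive")
    return observed / expected


# Hypothetical toy data: 2 deaths observed, 2.5 deaths expected.
outcomes = [1, 0, 0, 1, 0]
risks = [0.9, 0.2, 0.1, 0.8, 0.5]
smr = standardized_mortality_ratio(outcomes, risks)
```

A ratio below one would suggest fewer deaths than the risk model predicts for the hospital's case mix, and a ratio above one would suggest more.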

Study Design
Hospital discharge status records from 2014 through 2016 were extracted from the hospital medical records. For each subject, the age at admission, length of stay, and DRG membership were included in this cross-sectional retrospective design. The study was reviewed and approved by the Institutional Review Board at the King Faisal Specialist Hospital and Research Center (KFSHRC).

Dependent Variable
Discharge status (dead/alive) is the dependent variable. Because the outcome follows a Bernoulli distribution, the log-odds of death were calculated in the analytical cohort. The evaluation process must be based on an effective risk adjustment. Though one might wish to have additional information on patient attributes and clinical severity, even with currently available data we should evaluate whether a more flexible risk adjustment model will improve performance. Patient characteristics (clinical and demographic) are of three types: measured and accounted for, measurable but not accounted for, and difficult or impossible to measure. Prudence dictates that risk adjustments should include pre-admission medical conditions, but whether or not to include demographic attributes is a policy decision.

Statistical Analysis
Univariate and descriptive statistics were used to profile the study covariates, including the frequency distribution of the top twelve DRG's, as shown in Table 1.
Because of the binary nature of the outcome of interest (patient's status at discharge), we fitted logistic regression models to estimate the change in level (intercept) and trend (slope) of the log-odds of death with respect to age at admission and length of stay. Each model was adjusted to account for the clustering effect of DRG.
Three statistical estimation procedures in SAS (GLM, GEE, GLIMMIX) were used to account for the correlation between responses within a DRG and the heterogeneity across individuals in the study. The intra-class correlation was calculated from a one-way ANOVA using the GLM procedure in SAS. Data management and analyses were carried out in PC-SAS (v9.4) [7], with an a priori Type I error rate set at 0.01. The analyses produced point estimates and 95% confidence intervals of the odds ratios, whether the two covariates entered the models as continuous or as categorical variables.
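
For readers less familiar with how an odds ratio and its confidence interval follow from a fitted log-odds coefficient, the short Python sketch below shows the usual Wald-type calculation. It is an illustration only, not the SAS output of the study; the function name and the coefficient values are hypothetical.

```python
import math


def odds_ratio_ci(beta, se, z=1.959964):
    """Return (OR, lower, upper) for a log-odds coefficient and its standard error.

    z defaults to the normal quantile for a two-sided 95% interval.
    """
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Hypothetical coefficient for a one-unit increase in a covariate.
or_hat, lower, upper = odds_ratio_ci(beta=0.05, se=0.01)
```

The interval is symmetric on the log-odds scale and therefore asymmetric around the odds ratio itself.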

Results
The study included 191,943 discharges, of which 184,907 patients were discharged alive and 7,046 died.
The summary statistics are outlined in Table 1 and Table 2.
Let k denote the number of DRGs in the database and $k_i$ the size of the i-th DRG. The estimated intra-cluster correlation, obtained from the one-way ANOVA using the GLM procedure in SAS (Shoukri [8]), is given in Equation (2), where MSBD and MSWD are, respectively, the between-DRG and within-DRG mean squares. Summary statistics for age at admission and length of stay are presented in Table 2. Note that the standard deviation formula uses the number of observations minus one as the divisor, which makes the corresponding variance estimator unbiased for the population parameter (Shoukri [8]).
The main purpose of using the GLM procedure, which assumes independent responses, is to produce a point estimator of the within-cluster correlation (Shoukri [8]).
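
A minimal Python sketch of one standard form of the one-way ANOVA estimator of the intra-cluster correlation for unequal cluster sizes is given below. It is an illustration rather than the SAS code used in the study, and it is an assumption that Equation (2) takes this form; the function name `anova_icc`, the adjusted cluster size `n0`, and the toy data are hypothetical.

```python
def anova_icc(clusters):
    """One-way ANOVA estimator of the intra-cluster correlation.

    clusters : list of non-empty lists; each inner list holds the responses
               (e.g. 0/1 death indicators) of one cluster (DRG).
    """
    k = len(clusters)
    sizes = [len(c) for c in clusters]
    n_total = sum(sizes)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two clusters and some replication")

    grand_mean = sum(sum(c) for c in clusters) / n_total
    means = [sum(c) / len(c) for c in clusters]

    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means))
    ss_within = sum((y - m) ** 2 for c, m in zip(clusters, means) for y in c)

    msb = ss_between / (k - 1)        # between-cluster mean square (MSBD)
    msw = ss_within / (n_total - k)   # within-cluster mean square (MSWD)

    # Adjusted "average" cluster size that accounts for unequal cluster sizes.
    n0 = (n_total - sum(n * n for n in sizes) / n_total) / (k - 1)

    return (msb - msw) / (msb + (n0 - 1) * msw)


# Hypothetical toy data: three clusters of four binary outcomes each.
toy = [[1, 1, 1, 0], [0, 0, 0, 0], [1, 0, 1, 1]]
icc = anova_icc(toy)
```

With equal cluster sizes `n0` reduces to the common cluster size, so the expression becomes the familiar balanced-design estimator.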

The Effect of Dichotomization
Measurements of continuous variables are made in all branches of epidemiology, aiding in the diagnosis and treatment of patients. In clinical practice it is helpful to label individuals as having or not having an attribute, such as being "old" or "young", or having a "long stay", depending on the number of days.
Dichotomization of continuous variables is also common in clinical research, but it has serious statistical drawbacks, as it reduces the precision of the estimated effect sizes. Though grouping may help data presentation, notably in tables, categorization is unnecessary for statistical analysis. Here we consider the impact of converting continuous data to two groups (dichotomizing), as this is the most common approach in clinical research.
Within each model we estimated the log-odds for each effect as an effect size, using both continuous and categorized covariates. We found that the GLIMMIX has an advantage over the logistic regression and the GEE models. We calculated the optimal split for age at admission (AAA) and length of stay (LOS) using the Receiver Operating Characteristic (ROC) curve. Figure 1 and Table 3 show that the optimal cut-off for LOS is 160 days, with a corresponding area under the curve of 73%. This means that the risk of death is significantly higher among patients hospitalized for more than 160 days than among those who stay less than 160 days (corrected P-value = 0.0001). Additionally, Figure 2 and Table 4 show that the optimal split for AAA is 53 years. The area under the ROC curve corresponding to the dichotomized covariate AAA is 65%.
Dichotomizing leads to several problems. Firstly, much information is lost, as can be seen from the increase in the estimated standard errors of the odds ratios. Moreover, the point estimates of the odds ratios are inflated as well and are therefore potentially biased. For example, in Table 5 the estimated odds ratio of the dichotomized LOS is 14.53, while its value is 1.024 when LOS is measured on the continuous scale under the same model. The same remark holds if the fitted model is the GEE, as shown in Table 6. The estimates are somewhat more stable under the GLIMMIX, and the results are shown in Table 7. One may conclude that dichotomization may increase the risk of a positive result being a false positive.
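
The ROC-based choice of a cut-off point can be sketched as follows. The text does not state which optimality criterion was used, so the Python snippet below assumes Youden's index (sensitivity + specificity - 1), a common choice; the function name and the toy data are hypothetical, and the quadratic-time loop is meant for illustration rather than for a data set of this size.

```python
def youden_cutoff(values, died):
    """Return (cutoff, J) maximizing sensitivity + specificity - 1.

    A patient is classified as high risk when value > cutoff.
    """
    n_pos = sum(died)
    n_neg = len(died) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both outcome classes must be present")

    best_cutoff, best_j = None, -1.0
    for c in sorted(set(values)):
        # True positives: deaths above the cut-off; true negatives: survivors at or below it.
        tp = sum(1 for v, d in zip(values, died) if d == 1 and v > c)
        tn = sum(1 for v, d in zip(values, died) if d == 0 and v <= c)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:
            best_cutoff, best_j = c, j
    return best_cutoff, best_j


# Hypothetical lengths of stay (days) and discharge status (1 = dead).
los = [2, 5, 7, 30, 45, 60, 3, 90]
dead = [0, 0, 0, 1, 1, 1, 0, 1]
cutoff, j = youden_cutoff(los, dead)
```

In this toy example the two outcome groups are perfectly separated, which real mortality data would not be; the sketch only shows the mechanics of scanning candidate cut-offs.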

Categorical Age:
As an alternative to dichotomization, we categorized age in a meaningful way, as follows:
Group 1: Age less than 14 years
Group 2: Age between 15 and 30 years
Group 3: Age between 31 and 59 years
Group 4: Age above 60 years
When we plotted the mortality rate, with 95% confidence limits, against the 4 age categories, as shown in Figure 3, there was an increasing trend in mortality. The CIHI index, computed as NUM/DEN, was less than unity, which indicates that the hospital risk-adjusted mortality meets the CIHI criteria for quality.

Discussion
Although the fitted models produced odds ratio estimates whose changes and trends were in the same direction and of the same significance, the magnitude of the point estimates and the length of the confidence intervals varied. The logistic regression clearly produced the narrowest confidence intervals. This should be expected, since this model ignores the correlation structure among responses within each DRG, and hence its standard errors are under-estimated.
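
A rough way to see why ignoring within-cluster correlation understates standard errors is the familiar design-effect rule of thumb, 1 + (m - 1)ρ, which applies to a cluster-level quantity (such as a mean, or a covariate that is constant within cluster) when clusters have a common size m. The Python sketch below is a hedged illustration of that rule only; it is not part of the study's analysis, the function names are hypothetical, and the cluster size of 50 is invented.

```python
import math


def design_effect(cluster_size, icc):
    """Variance inflation factor 1 + (m - 1) * rho for equal-sized clusters."""
    return 1.0 + (cluster_size - 1.0) * icc


def inflate_se(naive_se, cluster_size, icc):
    """Scale a standard error computed under independence by sqrt(design effect)."""
    return naive_se * math.sqrt(design_effect(cluster_size, icc))


# Hypothetical illustration: clusters of 50 patients and an intra-cluster correlation of 0.16.
deff = design_effect(50, 0.16)
adjusted_se = inflate_se(0.01, 50, 0.16)
```

Even a modest intra-cluster correlation can inflate the variance considerably when clusters are large, which is consistent with the narrower intervals produced by the model that ignores clustering.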
The GEE, introduced by Liang and Zeger [10], is suitable for the analysis of clustered data. The GEE estimation produced point estimates of similar magnitude but relatively less precise confidence intervals. The GEE is supposed to produce consistent estimates even if the within-DRG correlation structure is misspecified. In our case we assigned an exchangeable working correlation to represent the average within-DRG heterogeneity. Finally, the GLIMMIX produced an entirely different set of estimates, and an estimate of an

Figure 1. The ROC curve for LOS.

Figure 2. The ROC curve for AAA.

Figure 3. Death rate versus the 4 age categories.
thus logistic regression is a suitable approach for including the effects of patient-level characteristics. With flexible modeling of covariate influences, the model would produce a valid risk adjustment, and there is no reason to replace the logistic by another function.
Another reason is to preserve the stochastic process and hierarchical structure of the data, and to develop an effective risk adjustment. Because patient-specific outcomes are binary (death indicator), a Bernoulli model operating at the patient level is appropriate. Risk adjustment and stabilization should adopt this model and

Table 1. The ICD9 diagnoses for the most frequent DRG's in the hospital database.

Table 2. Summary statistics for age at admission (AAA) and length of stay (LOS) presented by discharge status (Alive, Dead). (a) Status = Alive; (b) Status = Dead.

Table 3. Area under the curve for the LOS optimal cut-off point.

Table 4. Area under the curve, test result variable(s) for AAA.

Table 5. Estimating the odds ratios using the logistic regression models.

Table 6. Estimating the odds ratios using the GEE models.

Table 7. Estimating the odds ratios using the GLIMMIX models.
groups moved up. The one-degree-of-freedom Cochran-Armitage test for trend was quite significant, with p-value < 0.001.

The odds ratio estimate of LOS, which is highly correlated with age, has improved and in fact is almost identical to the estimated odds ratio when age was taken as a continuous covariate. There is almost no change in the between-DRG variance component estimate, 23.56. The scaled deviance is less than one, indicating that the model has captured the effect of the measured and the unmeasured covariates. The value of −2 Res log pseudo-likelihood (which is equivalently defined as the AIC) indicates that the model goodness of fit is also acceptable. Under the GLIMMIX model, whose results are summarized in Table 8, the CIHI [9] index of hospital performance when mortality is the outcome of inter-

Table 8. GLIMMIX: Age is categorized into 4 groups with group 4 being the reference.