Modelling and predicting low count child asthma hospital readmissions using General Additive Models

Background: Daily paediatric asthma readmissions within 28 days are a good example of a low count time series and not easily amenable to common time series methods used in studies of asthma seasonality and time trends. We sought to model and predict daily trends of childhood asthma readmissions over time in Victoria, Australia. Methods: We used a database of 75,000 childhood asthma admissions from the Department of Health, Victoria, Australia in 1997-2009. Daily admissions over time were modeled using a semi parametric Generalized Additive Model (GAM) and by sex and age group. Predictions were also estimated by using these models. Results: N = 2401 asthma readmissions within 28 days occurred during study period. Of these, n = 1358 (57%) were boys. Overall, seasonal peaks occurred in winter (30.5%) followed by autumn (28.6%) and then spring (24.6%) (p < 0.0005). Day of the week and month were significantly associated with trends in readmission. Smooth function of time was significant (p < 0.0005) and indicated declining trends in readmissions in 2001-2002 and then increasing, returning to roughly initial levels. Predictions suggested readmissions would continue to increase by 5% per year with boys in the 2 to 5 years age group experiencing the largest increase. Conclusions: GAMs are reliable methods for low count time series such as repeat admissions. Our model implied: health services may need to be revised to accommodate for seasonal peaks in readmission especially for younger age groups.


INTRODUCTION
Asthma is the most common long-term medical condition in children [1].Children have higher rates of asthma hospitalization than adults.In Australia, children aged 0 -14 years account for 58% of total asthma admissions [2].Asthma hospital readmissions attract disproportionate amounts of resources from healthcare systems worldwide [3,4], including in Australia [5].Asthma readmissions are an indicator of asthma severity, psychological comorbidity, the effectiveness of hospital and community health services and also developments in hospital admission criteria, diagnostic practice and/or emergency department management [6].Seasonality is an important marker of total environmental load or triggers such as high pollen exposure and respiratory virus infections which are associated with asthma hospital admissions [7].Being aware of the days/times of the year when childhood asthma readmissions are expected to increase may assist in the achievement of better and more efficient health service planning.
The utility of predictive models for management of paediatric hospital readmissions in general and their benefit for health services research have been demonstrated [8].Predictive models for paediatric asthma in particular have also been developed.Some models have used Cox regressions [9] or logistic regressions [10][11][12][13][14] with demographic or clinical attributes as predictors of readmissions and are therefore important for use in case management.These models predict population at risk but do not have the capacity to predict readmission frequencies or their timing, and are therefore of little use for hospital management resource planning.Models based on mathematical neural networking techniques have also been used to predict paediatric asthma admissions frequencies [15,16].However, neural networks have limited capacities to demonstrate associations with seasonality and long term trends [16].
The analysis of seasonality and time trends arise naturally within a time series framework.Many studies of childhood asthma hospital admissions have developed time series models for examining the associations between aeroallergens, pollution, weather and viral infections [17][18][19][20][21][22].However, none of these models have been used to make predictions.Furthermore, only one study of childhood asthma readmissions has used a time series model, however, the model's capability to predict future readmission counts was not tested [23].In this study, we aimed to developed a time series model of childhood asthma readmissions, using the Generalized Additive Model (GAM) framework [24] to predict daily childhood asthma readmissions overall, as well as for males and females separately.

Data
We used daily childhood asthma hospital admissions in Victoria between 1997 and 2009 from the Department of Human Services (DHS).The data collection methodology has been described elsewhere [25].Children between the ages of 2 and 18 were included in the study.ICD-9 codes (493) up to 1998 and ICD-10 codes (J45 or J46) for the remainder of the data to 2009 were used to determine admissions with a principal diagnosis of asthma.The daily count of readmissions was defined as the daily number of readmissions that occurred within 28 days of the separation date of their respective index admission [26,27].

Statistical Methods
We assumed that the mean varied smoothly over time, and that a Poisson process underlay the association between the mean and observed counts.We modeled the mean using a GAM [24].As covariates we included a time variable that was used to capture any trend in the data, while a month variable modeled any seasonality.The day of the week was also included, to cover possible differences in hospital service usage on weekends compared to the rest of the week [28].Our final model was: Mean (daily readmission count) = f 1 (time) + f 2 (month) + f 3 (day of week) + error term, with the mean and covariates on a log scale.f 1 was a smooth non-parametric function, obtained as a combination of smoothing spline basis functions.f 2 and f 3 were linear functions, built using model coefficient matrices for parametric components of the model.In other words, we employed a semiparametric model.
We assumed a conditional Poisson distribution and, although the results are not shown here, χ 2 tests gave us no evidence to reject this assumption for either male or female readmissions separately and, therefore, for all readmissions.We also tried fitting models assuming an underlying negative binomial distribution for the outcome, but this had a poor fit to the data.To cope with data over-dispersion, we used a self-adjusting over-dispersion parameter.The GAM method we used guarded against over-fitting by incorporating a penalized regression spline approach, with increasing penalty with increasing curve "wiggleness", that is, its second derivative.The choice of a low spline basis dimension also guards against over-fitting by reducing smoothing parameters and hence reducing model inflexibility and increasing the effective degrees of freedom [29].By experimenting, we found that comparatively low spline basis dimensions of between 5 and 15 coped with this sort of sparse data adequately.We chose counts as the outcome rather than readmission rates, usually defined as the number of readmissions/number of admissions, because a preliminary exploration indicated that counts were a more accurate marker of population rates.Also as part of our initial exploration, we compared our final model to a fully parametric model.That is, not only time, but also day of the week and day of the year (replacing month) were modeled with GAM fitted smooth penalised regression splines instead of assuming an a priori distribution and estimating their parametric components.Using day of the year (1 -365 or 366) did not impose any assumption of monthly seasonality and so can be regarded as a test of whether seasonality is actually substantiated by the data.
We validated our model predictions by training the model on the first 11 years of data and then using the model to predict the final two years.We used χ 2 goodness-of-fit tests to compare these predictions to the actual data.Using the full 13 years of data, we forecasted readmission counts one year beyond the end of the study.We also included 95% error limits around the predictions.We constructed our graphs using either the Lattice package [30] in the open source language R, version 2.13 [5] or just using the usual plot functions in R, and Stata 10.1 (Statacorp, USA).All of the GAMs were fitted using the MGCV package in R [29].Statistical analyses were done in R or Stata using a two-sided significance level of 0.05.
In Figure 1, we display the time series graphs of the total daily readmission counts for each of the study fiscal years (1 st July to 30 th June).These are typical of low count time series, that is, infrequently occurring and low magnitude counts.The distribution of daily readmission counts had a very low range, between 0 and 5, with the majority of counts being 0, including the median (Table 1).The sparseness of the data can be appreciated by observing that the 75 th percentile was 1 and even the 99 th was only 3. For the boys' and girls' distributions separately, their respective 75 th percentiles were even lower, both being equal to 0. Even on a per-month aggregate basis, the counts were as low as 2. Runs of consecutive zero readmission days were as long as 54 days for boys' and 48 days for girls' readmissions with respective inter quartile ranges and median (IQR) of (1, 3, 5) and (2, 4, 7).Consecutive zero readmission days for total readmissions had IQR (1, 2, 3) with a maximum length of 27 days.On a monthly basis, there were as many as 12 distinct blocks (interspersed by at least 1 readmission) of these consecutive zero runs for total, male and female readmissions.The IQR for blocks of consecutive zero runs per month for total and boys' was (4, 7, 10) and for girls' readmissions (4,7,9).
In Figures 2(a) to (c), we display readmissions, stratified by age and gender, as yearly counts, rates per 100,000 population and readmission rates, respectively.A close correspondence between counts and population rates for all age groups can be seen, however, similarity between population and readmission rates is present only for 2 -5 year olds while the older age groups show a stark contrast.Although not shown here, regression analyses indicated a linear correspondence between counts and population rates but not between readmission rates and population rates.

Semi Parametric Models
We fitted our model to total readmissions and to male and female readmissions separately.The model showed that both month and day of the week had significant associations with mean daily readmission counts.Higher counts were expected in March, May, June and November, and the lowest counts in January.This pattern held for both males and females with males experiencing higher peaks.The day of the week had more of an impact for girls than boys.Overall more readmissions expected earlier in the week, from Sunday onwards, and the fewest on Saturday.
The fitted model produced a dispersion scaling parameter which was very close to unity, further justifying the assumption of a Poisson process and explaining why the negative binomial was such a bad fit.The smooth function of time was also significant (p < 0.0005), and indicated that after an initial trend of declining readmission counts until about 2001-2002, they subsequently began to increase returning to roughly initial levels (Figure 3).
Exploration with auto correlation plots and Durbin-Watson tests indicated a weak but statistically significant auto correlation at lag 1 of 0.1, 0.08 and 0.06 for total, male and female readmissions respectively and so violated the assumption of independence of the daily readmission counts.In order to adjust for non independence, lag one values were included as part of the GAM.They made no substantial difference to the results and were finally not included due to the predictive role of our model.

Non Parametric Models
The non parametric model also demonstrated time, day of the week as well as day of year to be significantly associated with readmission counts.The day of year variable was in complete agreement with the parametric monthly variable of our final model for both timing and mplitude of peaks and troughs throughout the year.a

Model Validation
We validated the model predictions by training the model with the first 11 years of data then using the model to predict the final two years of data.The predictions were subsequently aggregated by month and compared to the actual data.χ 2 goodness-of-fit tests indicated that there was no statistical difference between the predictions and the data (p = 0.65).In comparison, when using the mean of the training data as a predictor of the final 2 years of data, χ 2 tests indicated that the monthly aggregated mean predictions were not a good fit to the data (p = 0.03).We also used this method to test the parametric model's fit to the data and found very similar results to the semi parametric model's but with slightly higher χ 2 values and so the parametric model did not have as good a fit to the data as the semi parametric  model.As the semi parametric model outperformed both the mean and the non parametric model in predicting the final 2 years of readmission data this gave us some confidence in using it to predict future readmissions.Semi parametric models were also fitted to data subsets consisting of the 6 combinations of age group and sex displayed in Figure 2. Of these subsets, the largest number of total readmissions was for 2 to 5 year old boys, N = 829, and the smallest was for 13 to 18 year old girls, N = 253.In the presence of these extremely small numbers, the GAM was still sensitive to seasonality, day of the week and the time trend, with differences in the values of these covariates for the various subsets.The results of the χ 2 goodness-of-fit tests validated the fit and prediction, but were only marginally better than the mean.Due to their extremely small numbers and the marginal improvement on the mean, we could not consider the models reliable for these 6 subsets.
Table 2 shows the actual data for 2009, in fortnightly aggregates, alongside the semi parametric GAM predictions for the same period.The data predictions were obtained with the same method used for model validation.The model was trained on the first 11 years of data and then used to predict the final two years of data.The predictions largely follow the previous year's seasonal patterns, but with smoothed troughs and peaks.There are major differences in fortnights 18 & 19 and 25 & 26 representing March and June respectively.From Figure 1, we see that the March 2009 counts were unusually high and the June 2009 counts unusually low for those times of the year and so these differences may be due to normal variability or to the smoothing effect of the model.Table 2 shows that the confidence intervals for predictions after fortnight 17 tended to be the widest which indicated a decreasing reliability with an increasing predictive length.These predictions were for two years and so predictions of not more than one year are likely to be more reliable.However, overall, the predictions are in keeping with the seasonal character of the data.

Future Predictions
To obtain future predictions we trained the model on the full 13 years of data and then predicted 12 months ahead.
In Figure 4, we graphed the predicted readmissions one year after study end.The predictions largely follow the previous year's seasonal patterns, but with smoothed troughs and peaks.As with the 2009 data predictions, there are differences in June and in March probably due to the same factors as explained in the validation section.The predictions are similar with respect to seasonal patterns but they also suggest an increasing trend in readmissions.The predictions indicate that readmissions will continue to grow by up to 5% a year on average overall.Although not shown here, time trends for each of the three age groups indicated that the youngest age group, that is 2 -5 years, will experience the largest increase and especially the boys.

DISCUSSION
This study is the first to predict childhood asthma hospital readmission counts using a semi-parametric generalized additive model.The model suggested an increasing time trend in readmission counts from about 2003-2004 to study end in June 2009, and indicated that certain months of the year and days of the week were more likely to register higher readmissions.The model's predictions suggested that this increasing trend continued into the year after the end of the study.The total forecasts for 2010 suggest a return close to the high yearly totals at the commencement of the study.Even the lowest point of the 95% CI for the 2010 prediction is in keeping with this increasing trend.
To the best of our knowledge, only one other study has used time series methods to examine paediatric asthma readmissions [23].The authors used Holt's method, a weighting scheme for univariate time series with a linear trend [31], but did not report the method's capacity for predictions.Annual readmission rates were their focus, and therefore their examinations of seasonality and day of the week were limited.The semi-parametric GAM approach has been recommended as a means of adjusting for any confounding associations of seasonality or day of the week in studies of air pollution and weather factors and daily counts of deaths or respiratory hospital admissions [32,33].It has also been used both for this purpose [20] and to model seasonality [21] for paediatric hospital admissions.However, although none of these studies had produced predictions, they indicated GAM's utility in modeling seasonality and day of the week associations with respiratory hospital admissions.

Comparisons to Admissions
Although Lincoln and colleagues [21] investigated admissions and not readmissions, their study is comparable to ours due to the mutual focus on seasonality and day of the week, some overlap with the study period (they used data from the period 1994-2000) and the Australian setting-Sydney, New South Wales (NSW).Lincoln and colleagues [21] also found, similarly to our study, that there tended to be greater numbers of admissions earlier in the week from Sunday onwards.The day of week association is likely an artefact of population interface with the Australian medical system rather than an intrinsic feature of asthma exacerbation.Choice of week end or week start is quite arbitrary.General practitioners are less likely to be available on weekends and perhaps parents hope children will improve with rest over the weekend.However, it is important to adjust for it so as to tease out any latent time trends and seasonality in daily data [28].
In the two studies, peaks in admissions and readmissions occurred at almost identical times of the year.The main difference in timing was with the May/June peak in readmissions.May/June corresponded with weeks 18 -24.The Lincoln study [21] had an admission peak in June but in weeks 20 -22 and not in May.The striking difference with these peaks concerned the relative amplitudes.The risk for admissions in the August peak did not reach statistical significance.However for readmissions we found that the May/June peak was far greater and more sustained than the other three peaks and, compared to March, the August peak was slightly higher and the November peak slightly lower.Furthermore, the November readmissions peak was high for boys but low for girls.Why should admission and readmission peaks coincide?Perhaps they reflect differences in timing of seasonal influences due to location and time of study.A weekly analysis, for comparison purposes, showed that weekly readmission in 28 days rates can be as high as 21% with IQR (2.7%, 4.2%, 6.5%).Lincoln and colleagues [21] did not differentiate admissions or readmis-sions within 28 days and so readmissions may have significantly augmented admission peaks.If this is so, the children who readmitted within 28 days displayed a greater and more sustained sensitivity to possible seasonal influences and experienced exacerbations earlier and more often.Our data also showed that, of the children who experienced a readmission within 28 days, 9% of them experienced another within the same 28 days.

Further Modeling Techniques
Artificial neural networks have been used to predict paediatric asthma admissions in Baltimore, USA [15] and Athens, Greece [16].This approach produced accurate predictions of admissions, but due to the nature of this technique, it was not able to statistically assess seasonal, day of the week and time trend associations.Both studies predicted only one week ahead and neither provided long term predictions as have we.The Greek study ran over 4 years, 2001-2004, and considered age groups but not sex.The US study examined 14 years of data between 1986 and 1999 and considered age group, sex, ethnicity and compared the city of Baltimore to the rest of the state.Both studies described differences in frequencies of admissions based on seasonality, age group and sex which are similar to our findings.However, the authors of the Greek study noted that their approach was not reliable for the smaller numbers in the 5 -14 age group.Given that the numbers of daily admissions for this group were far in excess of the numbers of total readmissions for our study, artificial neural networks may not be reliable for modeling low counts.

Strengths
Our study has a number of strengths.Firstly, our model was based on a large and comprehensive database that included all paediatric admissions for all Victorian public and private acute hospitals for the fiscal years 1997-2009.While admission criteria may vary both within and between hospitals, this has the potential to introduce random error which would push our estimates towards the null.Although we were modeling a low count time series, the length of the study period provided sufficient power to train our model and extend it into the future.Our model was sensitive to the seasonality, day of the week and time trend aspects of hospital readmissions.It achieved this in the presence of very low counts, even when the counts were reduced due to stratification by age group and sex.It was also sufficiently flexible to be able to reliably forecast readmission counts at fortnightly and monthly windows and can be extended for up to a year with these time frames.In forecasting future counts, the model showed itself robust to past outlier counts.Our method provided a way of modeling intrinsically difficult low count time series which may also prove useful in other settings besides child hospital asthma readmissions.
Previous studies examining the modeling of time series counts have compiled a check list of desired model features: flexibility to incorporate pronounced dependence structures in the data; ability to account for any overdispersion; easy to use link functions allowing for easy inference of covariate parameters; procedures for model fitting; diagnostic tools for assessing model adequacy; ease of forecasting [28,34].Our model possessed these features further displaying its potential to be successfully applied in other settings.Furthermore, the semi parametric and fully parametric methods applied here, can be used respectively for parameter or observation driven models as categorised by Jung et al. [34].It has been demonstrated that GAM computational methods may not be able to adjust for covariate correlation due to a back-fitting approach.This would result in low coefficient standard errors and therefore unreliably low p-values [35].However, this does not apply to our model as the R package we used does not use back-fitting but instead fits penalized regression splines at once thus making the reliable calculation of the standard errors of the coefficient covariances possible [29].This gives us confidence in the associations with season, day of the week and time trend detected by our model.

Limitations
One limitation of the study is that, once we assumed that seasonality is a marker for total environmental load, we could not differentiate between the various components of the environment such as weather, pollution, allergens and viral infections that may be associated with child asthma readmissions.However, the strong seasonal component in readmissions, indicated by our model, demon-strated the importance of the environment and showed that seasonal timing would be a good place to start an investigation of possible asthma exacerbation triggers.Although our model did not produce predictions on a hospital or area basis, it can still assist planning by health services and clinical management by indicating when adjustments to current management practices could be considered.
The predictive function of our model assumed a linear extrapolation into the future.This assumption may be good enough in the short term but may not hold for very long term predictions.However, 13 years of data gave us some reason to expect the not too distant future to be similar.At the very least, our model has the capacity to be updated or refined as new data come to hand.This model does not take individual risk factors into account but is still useful on an individual basis in that an individual who is prone to asthma exacerbations can be alerted beforehand of periods of a higher overall risk.

Conclusion
We have provided a reliable way of modeling low count time series such as daily childhood asthma hospital readmissions.Our model showed: there had been a strong seasonal impact on child asthma hospital readmissions; an increasing trend in readmissions until June, 2009; readmissions would continue to increase by up to 5% per year on average overall, with the 2 -5 years age group experiencing the largest increase, especially for boys.This implies: clinical services may need to revise procedures and be alerted to possible greater risk of readmission at certain times of the year, especially for younger age groups; health services management may need to adjust resource allocation and planning.

Figure 1 .
Figure 1.Time series graphs of all daily readmissions for each study year.

Figure 2 .
Figure 2. (a) Annual count of readmission within 28 days; (b) Annual rate of readmission within 28 days per 100,000 population; (c) Annual rate of readmission within 28 days per total number of admissions.

Figure 3 .
Figure 3. Fitted semi parametric GAM and time trend to boys' (a) and girls' (b) daily readmission counts.Dotted lines represent 95% error limits.The horizontal lines are the overall means for the respective data subsets.

Figure 4 .
Figure 4. Cumulative fortnightly predictions for 1 year after study end (2010) compared to the final study year (2009).

Table 1 .
Distribution of counts of readmissions within 28 days for fiscal years 1997-2009.
χ 2 test for difference in seasonal counts p < 0.00005

Table 2 .
Total readmissions within 28 days, in fortnight aggregates, for 2009 are compared to semi parametric GAM predictions with a 95% CI for the same period.