Forecasting Diabetes Patients' Attendance at Al-Baha Hospitals Using Autoregressive Fractionally Integrated Moving Average (ARFIMA) Models

Diabetes has become a concern in both developed and developing countries, with a growing number of patients reported in Ministry of Health records. This paper discusses the use of the Autoregressive Fractionally Integrated Moving Average (ARFIMA) technique to model diabetes patients' attendance at Al-Baha hospitals using monthly time series data. The data analyzed in this paper are monthly counts of diabetes patients covering the period January 2006-December 2016, collected from the General Directorate of Health Affairs, Al-Baha region. The ARFIMA approach was applied to the data through model identification, estimation, diagnostic checking and forecasting. Hurst test results and the ACF confirmed long-memory behavior in the diabetic patients' data, and the fractional difference parameter of the diabetes series was estimated as d = 0.44. Moreover, unit root tests indicated that the fractionally differenced diabetes series is stationary. Furthermore, the ARFIMA(1, 0.44, 0) model showed the smallest values of the AIC and BIC model selection criteria, so this model was chosen as an adequate representation of the data. Finally, diagnostic checks confirmed that the ARFIMA model was appropriate, and ARFIMA is highly recommended for modeling and forecasting this type of data.


Introduction
The increasing number of diabetic patients has become a major challenge for the General Directorate of Health Affairs in Al-Baha, Kingdom of Saudi Arabia; studying this phenomenon is therefore important. Diabetes is a common disease worldwide that can lead to various systemic diseases and high mortality. It is characterized by high sugar levels in the blood and urine, and it is usually diagnosed by means of a glucose tolerance test (GTT). There are three types of diabetes mellitus [1]. Type 1 diabetes mellitus results from the body's failure to produce sufficient insulin and often occurs among children and young people. Type 2 diabetes mellitus results from resistance to insulin, often initially with normal or increased levels of circulating insulin. The third type, gestational diabetes, occurs when pregnant women without a previous history of diabetes develop high blood glucose levels. Diabetes is a major health challenge: globally, approximately 463 million people are estimated to be living with diabetes, and it causes about 4.2 million deaths per year [2]. Al-Baha Health Affairs has launched the "Diabetes Friend" initiative, targeting 1500 diabetics across the region, including children, school students, and the elderly. Health Affairs serves about 20,000 diabetics through the Diabetes Center of King Fahad Hospital-Al-Baha and the diabetes clinics of the region's hospitals, in addition to follow-up at healthcare centers [3].
Saudi researchers have made growing efforts to study and analyze the incidence of diabetes, especially in the Al-Baha region. In this study, autoregressive fractionally integrated moving average time series models are applied to data on diabetes patients in Al-Baha hospitals, with the objective of determining which of these models provides accurate predictions of diabetes patient numbers in the Kingdom of Saudi Arabia, based on model selection criteria such as AIC and BIC. The study also hypothesizes that the number of diabetes patients attending Al-Baha hospitals trends upward over time, which leads to long-memory behavior in the data. In this study, we identify the order of ARFIMA models, estimate the parameters, and produce forecasts based on the selected model. The paper is organized as follows. In Section 3, we briefly present the theoretical framework of ARFIMA models. Empirical results are discussed in Section 4. Finally, the conclusion is presented in Section 7.

Literature Review
Numerous forecasting models have been proposed for modeling and forecasting the number of patients for many diseases; however, very few papers apply time series models to diabetes incidence. Earnest et al. (2005) used autoregressive integrated moving average (ARIMA) models to predict the number of beds occupied during a severe acute respiratory syndrome (SARS) outbreak in a tertiary hospital, using hospital admission and isolation-bed occupancy data from Tan Tock Seng Hospital for the period 14 March 2003 to 31 May 2003. They found that the ARIMA(1, 0, 3) model described and predicted the number of beds occupied during the SARS outbreak well, and they also provided three-day forecasts of the number of beds required [4]. [...] Juaben Municipality, they found that ARIMA(2, 1, 1), an autoregressive process of order 2 with differencing of order 1 and a moving average of order 1, was the best fit for the secondary data. Using the obtained model, they produced forecasts for the next two years, from 2014 to 2016. Pan et al. [5] (2016) used the ARIMA model to forecast the number of patients with an epidemic disease. They used actual daily patient counts for the epidemic disease between January and August 2014, 223 days in total, obtained from a real-life CDC (Center for Disease Control) source. They identified ARIMA(7, 1, 0) as the best-fitting time series model. Villani

ARFIMA Models
The classical approach to modeling time series data is the Box-Jenkins methodology, applied according to whether the series is stationary or not. If the series exhibits long memory, predictions based on the identified and estimated Box-Jenkins models may not be dependable [8], [9]. Time series with the long-memory property can be better modeled by the ARFIMA(p, d, q) model, which was presented by Granger and [10]. They showed that long-memory series of any span can be modeled using the Extended Maximum Likelihood (EML) estimation method. In general, the Box-Jenkins ARIMA process is written as [11] [12]:

$$\phi(B)\,(1-B)^{d}\,x_t = \theta(B)\,\varepsilon_t,$$

where $x_t$ is the time series, $d$ is a nonnegative integer giving the order of differencing needed to achieve stationarity, $B$ is the backward shift (lag) operator with $B x_t = x_{t-1}$, $\phi$ denotes the autoregressive parameters, $\theta$ denotes the moving average parameters, and $\varepsilon_t$ is a white-noise error term.

Let $\{x_t\}$ be a stochastic process. The ARFIMA process of order (p, d, q) [13], denoted ARFIMA(p, d, q), with mean zero and constant variance can be written in backward shift operator notation as [14]:

$$\phi(B)\,(1-B)^{d}\,x_t = \theta(B)\,\varepsilon_t,$$

where

$$\phi(B) = 1 - \phi_1 B - \phi_2 B^2 - \cdots - \phi_p B^p, \qquad \theta(B) = 1 + \theta_1 B + \theta_2 B^2 + \cdots + \theta_q B^q.$$

The fractional differencing operator is defined by the binomial expansion [15]:

$$(1-B)^{d} = \sum_{k=0}^{\infty} \frac{\Gamma(k-d)}{\Gamma(-d)\,\Gamma(k+1)}\, B^{k},$$

with $\Gamma(\cdot)$ denoting the gamma function. The parameter $d$ may take any real value; when it is not an integer, the process is called fractionally integrated. When $d = 0$, the time series is a stationary and invertible ARMA process with geometrically bounded autocorrelations [8]. An ARFIMA(p, d, q) process with $0 < d < 0.5$ is a stationary long-memory process whose autocorrelation function $\rho_k$ decays slowly (hyperbolically) as the lag $k \to \infty$.
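The weights of the fractional differencing operator can be computed directly. The following is a minimal Python sketch, assuming NumPy is available; the helper names `frac_diff_weights` and `frac_diff` are hypothetical, and the infinite filter is truncated at the start of the sample (pre-sample values treated as zero), which is one common convention and not necessarily the one used by the R routines behind this paper.

```python
import math

import numpy as np


def frac_diff_weights(d: float, n_weights: int) -> np.ndarray:
    """Coefficients pi_k of (1 - B)^d = sum_k pi_k B^k for k = 0..n_weights-1.

    Uses the recursion pi_0 = 1, pi_k = pi_{k-1} * (k - 1 - d) / k, which is
    equivalent to pi_k = Gamma(k - d) / (Gamma(-d) * Gamma(k + 1)) for non-integer d.
    """
    w = np.empty(n_weights)
    w[0] = 1.0
    for k in range(1, n_weights):
        w[k] = w[k - 1] * (k - 1 - d) / k
    return w


def frac_diff(x, d: float) -> np.ndarray:
    """Apply the truncated filter (1 - B)^d to the series x.

    y_t = sum_{k=0}^{t} pi_k * x_{t-k}; values before the sample start are treated as 0.
    """
    x = np.asarray(x, dtype=float)
    w = frac_diff_weights(d, len(x))
    return np.array([w[: t + 1] @ x[t::-1] for t in range(len(x))])


if __name__ == "__main__":
    d = 0.44
    w = frac_diff_weights(d, 5)
    # Cross-check the recursion against the gamma-function form of the weights.
    gamma_form = [math.gamma(k - d) / (math.gamma(-d) * math.gamma(k + 1)) for k in range(5)]
    print(np.allclose(w, gamma_form))            # True
    # With d = 1 the filter is the ordinary first difference (first value kept as is).
    print(frac_diff([1.0, 2.0, 4.0, 7.0], 1.0))  # [1. 1. 2. 3.]
```

The recursion avoids evaluating $\Gamma(-d)$ at every lag, and the $d = 1$ case shows that the ordinary Box-Jenkins difference is a special case of the same filter. For a series with a large mean, such as monthly patient counts, one would usually demean the data before applying the truncated filter.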
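According to [16], for the fractionally integrated noise case ARFIMA(0, d, 0) with $-0.5 < d < 0.5$, $d \neq 0$, the autocorrelation function can be expressed as:

$$\rho_k = \frac{\Gamma(k+d)\,\Gamma(1-d)}{\Gamma(k-d+1)\,\Gamma(d)}, \qquad k = 1, 2, \ldots$$

and the partial autocorrelation function as [17]:

$$\phi_{kk} = \frac{d}{k-d}, \qquad k = 1, 2, \ldots$$

The Hurst parameter $H$ measures the strength of long-range dependence in a time series. For an ARFIMA process it is related to the fractional parameter by $d = H - 0.5$, so $0.5 < H < 1$ corresponds to long memory ($0 < d < 0.5$).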
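The contrast between slowly decaying ARFIMA autocorrelations and the geometric decay of a short-memory model can be checked numerically. This is a minimal sketch using only the Python standard library; the function names are hypothetical, the ACF is computed with the recursion $\rho_k = \rho_{k-1}(k-1+d)/(k-d)$ (which follows from the gamma-function formula via $\Gamma(x+1) = x\,\Gamma(x)$), and the AR(1) comparison is added for illustration only.

```python
import math


def arfima_0d0_acf(d: float, max_lag: int) -> list:
    """Theoretical ACF rho_0..rho_max_lag of ARFIMA(0, d, 0), -0.5 < d < 0.5.

    Uses rho_k = rho_{k-1} * (k - 1 + d) / (k - d), which follows from
    rho_k = Gamma(k + d) Gamma(1 - d) / (Gamma(k - d + 1) Gamma(d)).
    """
    rho = [1.0]
    for k in range(1, max_lag + 1):
        rho.append(rho[-1] * (k - 1 + d) / (k - d))
    return rho


def arfima_0d0_pacf(d: float, max_lag: int) -> list:
    """Theoretical PACF phi_kk = d / (k - d) for k = 1..max_lag."""
    return [d / (k - d) for k in range(1, max_lag + 1)]


def ar1_acf(phi: float, max_lag: int) -> list:
    """ACF of a stationary AR(1) process, rho_k = phi**k (geometric decay)."""
    return [phi ** k for k in range(max_lag + 1)]


if __name__ == "__main__":
    d = 0.44
    long_mem = arfima_0d0_acf(d, 50)
    # AR(1) with the same lag-1 autocorrelation, for comparison.
    short_mem = ar1_acf(long_mem[1], 50)
    for k in (1, 5, 10, 25, 50):
        print(f"lag {k:2d}: ARFIMA {long_mem[k]:.4f}   AR(1) {short_mem[k]:.6f}")
    # Large-lag behaviour: rho_k is close to Gamma(1-d)/Gamma(d) * k**(2d - 1).
    print(math.gamma(1 - d) / math.gamma(d) * 50 ** (2 * d - 1))
```

Both models start from the same lag-1 autocorrelation, $d/(1-d)$, which is also $\phi_{11}$, but the ARFIMA column shrinks roughly like $k^{2d-1}$ while the AR(1) column shrinks geometrically.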

Estimating the Hurst Parameter
There are several methods for estimating the Hurst parameter of a FARIMA model, the most prominent of which is the rescaled range (R/S) method [18].

The Rescaled Range R/S Method
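Journal of Data Analysis and Information Processing
The rescaled range (R/S) technique was first presented by Hurst, who defined the range $R(t, m)$ as [19]:

$$R(t,m) = \max_{1 \le u \le m} \sum_{i=1}^{u} \left(x_{t+i} - \bar{x}_{t,m}\right) - \min_{1 \le u \le m} \sum_{i=1}^{u} \left(x_{t+i} - \bar{x}_{t,m}\right), \qquad \bar{x}_{t,m} = \frac{1}{m}\sum_{i=1}^{m} x_{t+i},$$

where $t$ is the discrete integer-valued time and $m$ is the time span. The standard deviation of the process over the same span, $S(t, m)$, is:

$$S(t,m) = \sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(x_{t+i} - \bar{x}_{t,m}\right)^{2}}.$$

Normalizing the range by the standard deviation allows the ranges of different processes to be compared over long periods. Hurst found the following empirical power-law relation between the range $R(t, m)$ and the standard deviation $S(t, m)$:

$$\frac{R(t,m)}{S(t,m)} \approx c\, m^{H},$$

where $H$ is the Hurst parameter ($0 < H < 1$) and $c$ is a finite positive constant that does not depend on the span $m$. Taking logarithms of both sides gives a linear relation from which $H$ and $c$ can be estimated [20]:

$$\log\frac{R(t,m)}{S(t,m)} = \log c + H \log m. \tag{13}$$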
Equation (13) is known as the pox diagram of R/S: when log(R/S) is plotted against log m, the slope of the fitted straight line estimates H.
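The R/S procedure and the pox-diagram regression can be sketched as follows. This is a minimal NumPy illustration under assumptions the paper does not specify: window sizes doubling from 8, non-overlapping blocks, R/S averaged over the blocks of each window size, and an ordinary least-squares line through the log-log points. The function names are hypothetical.

```python
import numpy as np


def rescaled_range(block: np.ndarray) -> float:
    """R/S for one block: range of cumulative deviations from the block mean
    divided by the (population) standard deviation. Returns nan if S = 0."""
    s = block.std()
    if s == 0:
        return float("nan")
    y = np.cumsum(block - block.mean())
    return (y.max() - y.min()) / s


def hurst_rs(x, min_window: int = 8) -> tuple:
    """Estimate H as the slope of log(mean R/S) on log(m) (the pox-diagram fit).

    Window sizes are m = min_window, 2*min_window, ... up to len(x) // 2; each
    size uses non-overlapping blocks and R/S is averaged over those blocks.
    Returns (H, c) from log(R/S) = log(c) + H * log(m).
    """
    x = np.asarray(x, dtype=float)
    log_m, log_rs = [], []
    m = min_window
    while m <= len(x) // 2:
        vals = [rescaled_range(x[i * m:(i + 1) * m]) for i in range(len(x) // m)]
        vals = [v for v in vals if np.isfinite(v)]  # skip constant blocks
        if vals:
            log_m.append(np.log(m))
            log_rs.append(np.log(np.mean(vals)))
        m *= 2
    if len(log_m) < 2:
        raise ValueError("series too short for the chosen min_window")
    h, log_c = np.polyfit(log_m, log_rs, 1)
    return float(h), float(np.exp(log_c))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    noise = rng.standard_normal(4096)
    h_noise, _ = hurst_rs(noise)
    h_walk, _ = hurst_rs(np.cumsum(noise))
    # White noise should give H near 0.5 (R/S is biased upward in small windows);
    # a random walk is strongly persistent and gives H near 1.
    print(f"white noise H = {h_noise:.2f}, random walk H = {h_walk:.2f}")
```

Because $d = H - 0.5$ for an ARFIMA process, an R/S estimate of $H$ also gives a rough indication of $d$. The classical R/S estimator is biased in short samples, so with about 130 monthly observations its value is best treated as indicative rather than exact.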

Empirical Results
This section discusses the empirical results of applying ARFIMA models to data on the diabetes patients who attended Al-Baha hospitals during the period from January 2006 to November 2016, covering long-memory testing, identification, estimation, and diagnostic checking, all carried out with the R statistical software. The time series plot of diabetes patients attending Al-Baha hospitals from January 2006 to November 2016 is shown in Figure 1.
Figure 1 shows that the number of diabetes patients fluctuates, with a slight increase starting in 2008 and rising to 4308 patients in February 2014, followed by a drop to 245 patients in June 2014; the series then fluctuates steadily until the end of the study period in 2016. The descriptive statistics of diabetes patients attending Al-Baha hospitals during the period from January 2006 to November 2016 are reported in Table 1.
From Table 1, the mean and standard deviation of the number of diabetes patients attending Al-Baha hospitals are 878.18 and 651.786, respectively. The Jarque-Bera test statistic is 130.14 with a p-value of 0.000, which indicates that the distribution of diabetes patients attending Al-Baha hospitals is not normal. Figure 2 shows the autocorrelation function (ACF) and partial autocorrelation function (PACF) of diabetes patients attending Al-Baha hospitals from January 2006 to November 2016.
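The normality check reported from Table 1 can be reproduced with a few lines. The sketch below is an illustration of the standard Jarque-Bera statistic (hypothetical function name `jarque_bera`, NumPy for the sample moments); it relies on the fact that the chi-square distribution with 2 degrees of freedom has survival function $e^{-x/2}$, so no statistics library is needed for the p-value.

```python
import math

import numpy as np


def jarque_bera(x) -> tuple:
    """Jarque-Bera normality test; returns (JB statistic, p-value).

    JB = n/6 * (S**2 + (K - 3)**2 / 4), with skewness S and kurtosis K computed
    from population (divisor-n) central moments. Under normality JB is
    approximately chi-square with 2 degrees of freedom, whose survival
    function is exp(-JB / 2).
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    dev = x - x.mean()
    m2 = np.mean(dev ** 2)
    m3 = np.mean(dev ** 3)
    m4 = np.mean(dev ** 4)
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    return float(jb), math.exp(-jb / 2.0)


if __name__ == "__main__":
    # p-value implied by the statistic reported in Table 1; it rounds to 0.000.
    print(math.exp(-130.14 / 2.0))
    # Small symmetric example: zero skewness, so only the kurtosis term contributes.
    print(jarque_bera([1, 2, 3, 4, 5]))
```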
The autocorrelation function starts with large positive spikes and decays gradually toward zero at increasing lags, while the partial autocorrelation function shows large positive spikes that cut off to zero after lag 5. These results indicate that the series of diabetes patients attending Al-Baha hospitals is non-stationary and [...] of the diabetes series level is stationary. Since both the correlogram and the Hurst exponent obtained by rescaled range analysis confirmed the presence of long memory in the data of diabetes patients attending Al-Baha hospitals, and the fractional difference was estimated, the findings indicate that the autoregressive fractionally integrated time series model is appropriate for modeling and forecasting the diabetes patients data. To build the ARFIMA model, the fractional difference value d = 0.44 (0 < d < 0.5) is used, and the fractionally differenced diabetes series is generated. Several ARFIMA(p, 0.44, q) models with the fractional parameter fixed at 0.44 are estimated and compared in Table 6 in order to select an appropriate and parsimonious candidate model for forecasting the time series.
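The candidate comparison can be sketched for the pure autoregressive case. This is a simplified illustration, not the estimation behind Table 6: it assumes NumPy, keeps d fixed, considers only ARFIMA(p, d, 0) candidates (no MA terms), fits each AR part by conditional least squares on the fractionally differenced series, and computes Gaussian AIC and BIC on a common effective sample. The function names and the synthetic data are hypothetical.

```python
import numpy as np


def frac_diff(x, d: float) -> np.ndarray:
    """Truncated (1 - B)^d filter; pre-sample values treated as 0."""
    x = np.asarray(x, dtype=float)
    w = np.empty(len(x))
    w[0] = 1.0
    for k in range(1, len(x)):
        w[k] = w[k - 1] * (k - 1 - d) / k
    return np.array([w[: t + 1] @ x[t::-1] for t in range(len(x))])


def ar_ols_ic(w: np.ndarray, p: int, p_max: int) -> dict:
    """Fit AR(p) with intercept to w by least squares; return Gaussian AIC/BIC.

    Every order uses the same effective sample w[p_max:], so criteria are comparable.
    """
    n = len(w)
    y = w[p_max:]
    cols = [np.ones(n - p_max)] + [w[p_max - j:n - j] for j in range(1, p + 1)]
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n_eff = len(y)
    sigma2 = resid @ resid / n_eff
    loglik = -0.5 * n_eff * (np.log(2 * np.pi * sigma2) + 1.0)
    k = p + 2  # AR coefficients + intercept + innovation variance
    return {"p": p, "coef": beta, "aic": -2 * loglik + 2 * k,
            "bic": -2 * loglik + k * np.log(n_eff)}


def select_arfima_p_d_0(x, d: float, p_max: int = 4) -> list:
    """Compare ARFIMA(p, d, 0) candidates, p = 0..p_max, with d held fixed."""
    w = frac_diff(x, d)
    return [ar_ols_ic(w, p, p_max) for p in range(p_max + 1)]


if __name__ == "__main__":
    # Synthetic ARFIMA(1, 0.44, 0) series: AR(1) noise passed through (1 - B)^(-0.44).
    rng = np.random.default_rng(1)
    e = rng.standard_normal(1000)
    u = np.zeros_like(e)
    for t in range(1, len(e)):
        u[t] = 0.6 * u[t - 1] + e[t]
    x = frac_diff(u, -0.44)
    for fit in select_arfima_p_d_0(x, 0.44):
        print(f"ARFIMA({fit['p']}, 0.44, 0): AIC = {fit['aic']:.1f}, BIC = {fit['bic']:.1f}")
```

Holding d fixed reduces each candidate to an ordinary AR regression on the differenced series, which mirrors the "fixed fractional parameter" comparison described above; a full treatment would also include MA terms and maximum-likelihood estimation.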
Table 6 shows that the ARFIMA(1, 0.44, 0) model has the smallest values of the AIC and BIC model selection criteria. This model assumes that the diabetes patients' data follow an autoregressive process of order 1 and a moving average of order 0, with a fractional difference of order 0.44. Table 7 reports the ARFIMA(1, 0.44, 0) parameter estimates for diabetes patients attending Al-Baha hospitals; the autoregressive parameter estimate is statistically significant at the 0.05 level, so this model appears to fit well (Figure 3). Both ADF and PP tests confirmed that the series level is non-stationary, whereas the first difference is stationary. Hurst test results and the ACF confirmed long-memory behavior in the diabetes patients' data, the fractional difference parameter of the diabetes series was estimated as d = 0.44, and unit root tests indicated that the fractionally differenced diabetes series is stationary. Several models were considered for the diabetic patients' data; according to the model selection criteria, the ARFIMA(1, 0.44, 0) model shows the smallest AIC and BIC values, so this model is chosen to represent the data. Diagnostic checks confirm that ARFIMA(1, 0.44, 0) is an adequate and parsimonious model for diabetes patients attending Al-Baha hospitals. These findings indicate that ARFIMA is highly recommended for modeling and forecasting this type of diabetic patient data.
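The final steps, fitting the selected ARFIMA(1, 0.44, 0) model, checking its residuals, and forecasting, can be sketched as follows. This is a minimal NumPy illustration, not the R procedure used in the paper: d is held fixed rather than estimated jointly, the AR(1) part is fitted by least squares, the Ljung-Box portmanteau statistic stands in for the unspecified diagnostic check, forecasts invert the truncated fractional filter, and all function names and data are hypothetical.

```python
import numpy as np


def frac_weights(d: float, n: int) -> np.ndarray:
    """First n coefficients pi_0..pi_{n-1} of (1 - B)^d."""
    w = np.empty(n)
    w[0] = 1.0
    for k in range(1, n):
        w[k] = w[k - 1] * (k - 1 - d) / k
    return w


def frac_diff(x, d: float) -> np.ndarray:
    """Truncated (1 - B)^d filter; pre-sample values treated as 0."""
    x = np.asarray(x, dtype=float)
    w = frac_weights(d, len(x))
    return np.array([w[: t + 1] @ x[t::-1] for t in range(len(x))])


def fit_arfima_1d0(x, d: float):
    """ARFIMA(1, d, 0) with d fixed: least squares of w_t on (1, w_{t-1}), w = (1 - B)^d x."""
    w = frac_diff(x, d)
    X = np.column_stack([np.ones(len(w) - 1), w[:-1]])
    beta, *_ = np.linalg.lstsq(X, w[1:], rcond=None)
    resid = w[1:] - X @ beta
    return float(beta[0]), float(beta[1]), resid


def ljung_box_q(resid, h: int) -> float:
    """Ljung-Box Q = n(n+2) * sum_{k=1}^{h} r_k^2 / (n - k) on the residuals.

    Compare with a chi-square critical value with h - 1 degrees of freedom
    (one AR parameter estimated; d is held fixed here).
    """
    e = np.asarray(resid, dtype=float) - np.mean(resid)
    n = len(e)
    denom = e @ e
    q = sum((e[k:] @ e[:-k] / denom) ** 2 / (n - k) for k in range(1, h + 1))
    return float(n * (n + 2) * q)


def forecast_arfima_1d0(x, d: float, c: float, phi: float, steps: int) -> np.ndarray:
    """Forecast w with the AR(1) recursion, then invert the fractional filter
    via x_t = w_t - sum_{k=1}^{t} pi_k x_{t-k}."""
    hist = list(np.asarray(x, dtype=float))
    n = len(hist)
    pi = frac_weights(d, n + steps)
    w_prev = frac_diff(hist, d)[-1]
    out = []
    for h in range(steps):
        t = n + h                    # index of the value being forecast
        w_prev = c + phi * w_prev    # AR(1) forecast of the differenced series
        past = np.array(hist[::-1])  # x_{t-1}, x_{t-2}, ..., x_0
        x_t = w_prev - pi[1:t + 1] @ past
        hist.append(x_t)
        out.append(x_t)
    return np.array(out)


if __name__ == "__main__":
    # Synthetic data only (not the Al-Baha series): AR(1) noise integrated with d = 0.44.
    rng = np.random.default_rng(2)
    eps = rng.standard_normal(500)
    u = np.zeros_like(eps)
    for t in range(1, len(eps)):
        u[t] = 0.6 * u[t - 1] + eps[t]
    x = frac_diff(u, -0.44)
    c, phi, resid = fit_arfima_1d0(x, 0.44)
    print(f"phi = {phi:.3f}, Ljung-Box Q(12) = {ljung_box_q(resid, 12):.2f}")
    print(forecast_arfima_1d0(x, 0.44, c, phi, steps=6))
```

With d = 0 the forecast function reduces to a plain AR(1) forecast, which is a convenient sanity check; with d = 0.44 each forecast depends on the whole history through the slowly decaying weights, which is how the long-memory structure carries into the predictions.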