
Thalassemia is a major health problem in Iraq; despite a prevention programme, there has been no decrease in the prevalence of the disease, owing to a lack of awareness, implying that genetic counselling has failed. This failure has been attributed to a lack of recognition of the problems related to Thalassemia, unorganised teamwork and services, a lack of knowledge and insufficient numbers of extension workers, a lack of Thalassemia support groups, and inadequate research into Thalassemia prevention and control. The Autoregressive Integrated Moving Average (ARIMA) model has become a major forecasting tool in many applications. The ARIMA methodology introduced by Box and Jenkins (1970) is among the most effective approaches for analysing time-series data. In this study, we used the Box-Jenkins methodology to build an ARIMA model to forecast the number of people with Thalassemia for the period 2016-2018, using the database of the Maysan Health Centre for Thalassemia in Maysan Province, Iraq. After model selection, the best model for forecasting was ARIMA (0, 1, 1), and this model was used to forecast Thalassemia cases.

Thalassemia is known in Arabic as "Mediterranean disease", the region including the Middle East and the Arab Gulf states. Thalassemia, which is widespread in Iraq and other Middle East countries, is one of the best-known genetic diseases, leading to severe anaemia and other long-term complications. It requires a regular blood transfusion every three to four weeks, accompanied by pain for the patients and great suffering for their families. On top of this, continuous blood transfusion also leads to the accumulation of iron in vital organs of the body, such as the liver and heart, which causes serious complications. Cases of this disease are divided into two types. Thalassemia Minor characterises people who carry the disease but live a normal life and do not complain of any symptoms. The second type is Thalassemia Major, which can occur in the offspring of two carriers of Thalassemia Minor: there is a 25% likelihood that a child is not affected by the disease, a 25% likelihood of the severe form, and a 50% likelihood of Thalassemia Minor [

To forecast the analysed time series we use modern methods such as ARIMA models, because they are among the models that can analyse large time-series data and forecast future cases.

The pioneers in this area were Box and Jenkins, who popularised an approach combining the moving average and the autoregressive models (1970). An ARMA (p, q) model is a combination of AR (p) and MA (q) models, and is suitable for univariate time-series modelling. In an AR (p) model the future value of a variable is assumed to be a linear combination of p past observations and a random error, together with a constant term. Mathematically, the AR (p) model can be expressed as [

$$y_t = c + \sum_{i=1}^{p} \varphi_i y_{t-i} + \varepsilon_t = c + \varphi_1 y_{t-1} + \varphi_2 y_{t-2} + \cdots + \varphi_p y_{t-p} + \varepsilon_t \quad (1)$$

Here $y_t$ and $\varepsilon_t$ are respectively the actual value and the random error (or random shock) at time period t, $\varphi_i \ (i = 1, 2, \cdots, p)$ are model parameters and c is a constant. The integer constant p is known as the order of the model. Sometimes the constant term is omitted for simplicity. Usually, the parameters of an AR process are estimated from the given time series using the Yule-Walker equations. Just as an AR (p) model regresses against past values of the series, an MA (q) model uses past errors as the explanatory variables. The MA (q) model is given by [

$$y_t = \mu + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j} + \varepsilon_t = \mu + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \cdots + \theta_q \varepsilon_{t-q} + \varepsilon_t \quad (2)$$

Here $\mu$ is the mean of the series, $\theta_j \ (j = 1, 2, \cdots, q)$ are the model parameters and q is the order of the model. The random shocks are assumed to be a white noise process, i.e. a sequence of independent and identically distributed (i.i.d.) random variables with zero mean and a constant variance $\sigma^2$. Generally, the random shocks are assumed to follow a normal distribution. Thus, conceptually, a moving average model is a linear regression of the current observation of the time series against the random shocks of one or more prior observations. Fitting an MA model to a time series is more complicated than fitting an AR model because in the former case the random error terms are not observable. Autoregressive (AR) and moving average (MA) models can be effectively combined to form a general and useful class of time-series models, known as ARMA models. Mathematically, an ARMA (p, q) model is represented as

$$y_t = c + \varepsilon_t + \sum_{i=1}^{p} \varphi_i y_{t-i} + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j} \quad (3)$$
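As noted above, the parameters of an AR process are usually estimated via the Yule-Walker equations. A minimal sketch of this idea (the AR(2) coefficients, the sample size and the `yule_walker` helper below are illustrative assumptions, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate an AR(2) process: y_t = 0.6*y_{t-1} + 0.2*y_{t-2} + eps_t
phi_true = (0.6, 0.2)
n = 20000
eps = rng.standard_normal(n)
y = np.zeros(n)
for t in range(2, n):
    y[t] = phi_true[0] * y[t - 1] + phi_true[1] * y[t - 2] + eps[t]

def yule_walker(series, p):
    """Estimate AR(p) coefficients by solving the Yule-Walker equations."""
    x = series - series.mean()
    m = len(x)
    # Sample autocovariances gamma(0), ..., gamma(p)
    gamma = np.array([np.dot(x[: m - k], x[k:]) / m for k in range(p + 1)])
    # Toeplitz system  Gamma * phi = [gamma(1), ..., gamma(p)]
    Gamma = np.array([[gamma[abs(i - j)] for j in range(p)] for i in range(p)])
    return np.linalg.solve(Gamma, gamma[1:])

phi_hat = yule_walker(y, 2)
```

The Toeplitz structure of the system comes directly from the stationarity of the autocovariances; with a long enough sample the estimates recover the simulated coefficients closely.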

Usually ARMA models are manipulated using the lag operator notation. The lag or backshift operator is defined as $L y_t = y_{t-1}$. Polynomials of the lag operator, or lag polynomials, are used to represent ARMA models as follows [

$$\mathrm{AR}(p)\ \text{model}: \varepsilon_t = \varphi(L) y_t \quad (4)$$

$$\mathrm{MA}(q)\ \text{model}: y_t = \theta(L) \varepsilon_t \quad (5)$$

$$\mathrm{ARMA}(p, q)\ \text{model}: \varphi(L) y_t = \theta(L) \varepsilon_t \quad (6)$$

Here $\varphi(L) = 1 - \sum_{i=1}^{p} \varphi_i L^i$ and $\theta(L) = 1 + \sum_{j=1}^{q} \theta_j L^j$.

It can be shown that an important property of an AR (p) process is that it can always be written in terms of an MA (∞) process. For an MA (q) process to be invertible, all the roots of the equation $\theta(L) = 0$ must lie outside the unit circle. This condition is known as the invertibility condition for an MA process.
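The invertibility condition can be checked numerically: $\theta(L) = 0$ is a polynomial equation, so it suffices to verify that every root lies outside the unit circle. A hypothetical helper (not from the paper) using NumPy's root finder:

```python
import numpy as np

def is_invertible(theta):
    """Check the MA(q) invertibility condition.

    theta(L) = 1 + theta_1*L + ... + theta_q*L^q must have all of its
    roots strictly outside the unit circle.
    """
    # numpy.roots expects coefficients ordered from the highest degree
    # down: theta_q*L^q + ... + theta_1*L + 1
    coeffs = list(reversed([1.0] + list(theta)))
    roots = np.roots(coeffs)
    return bool(np.all(np.abs(roots) > 1.0))
```

For example, an MA(1) with $\theta_1 = 0.5$ has the single root $L = -2$ and is invertible, while $\theta_1 = 2$ gives the root $L = -0.5$ and is not.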

In both statistics and econometrics, an autoregressive integrated moving average (ARIMA) model is a generalisation of the autoregressive moving average (ARMA) model. These models are fitted to time-series data, either to better understand the data or to forecast future points in the series (forecasting). They are applied in cases where the data show evidence of non-stationarity, where an initial differencing step (corresponding to the integrated part of the model) can be applied to reduce the non-stationarity [

$$x_t = \theta_0 + \phi_1 x_{t-1} + \phi_2 x_{t-2} + \cdots + \phi_p x_{t-p} + e_t - \theta_1 e_{t-1} - \theta_2 e_{t-2} - \cdots - \theta_q e_{t-q} \quad (7)$$

where $x_t$ and $e_t$ are the actual value and the random error at time t, respectively, and $\phi_i \ (i = 1, 2, \cdots, p)$ and $\theta_j \ (j = 1, 2, \cdots, q)$ are model parameters. p and q are integers, often referred to as the orders of the autoregressive and moving average polynomials respectively.

・ Analysis of the series: The first step in the modelling process is to check the stationarity of the time-series data.

・ Identification of the model: This step aims to detect periodicity and to identify the orders of the (seasonal) autoregressive and moving average terms. This stage includes calculation of the estimated autocorrelation function (ACF) and the estimated partial autocorrelation function (PACF); these functions measure the statistical dependence between observations of the data.

・ Estimation of ARIMA parameters: The estimation of ARIMA parameters is achieved by the nonlinear least squares method. The values of the model coefficients are determined according to a particular criterion, such as the maximum likelihood criterion. It can be shown that the likelihood function associated with a correctly specified ARIMA model, used to determine the maximum likelihood estimates of the parameters, contains all the useful information in the data series about the model's parameters.

・ Diagnostic checking: At this stage it is verified that the errors represent a stationary process and that the residuals are white noise (independent, normally distributed, with stable mean and variance). The tests used to validate the model are based on the estimated residuals: it is checked whether they are autocorrelated. If there is autocorrelation, the model is not correctly specified; the dependencies between the series components are specified incompletely, and we have to return to the model identification step and try another model. Otherwise, the model is adequate and can be used to make predictions over a given time horizon.

・ Forecasting.
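The first two steps above (differencing to obtain stationarity, then inspecting the ACF for model identification) can be sketched as follows. The random-walk input and the `difference`/`acf` helper names are illustrative assumptions, not the study's data or code:

```python
import numpy as np

def difference(y, d=1):
    """Apply d-th order differencing to reduce non-stationarity."""
    for _ in range(d):
        y = np.diff(y)
    return y

def acf(y, nlags):
    """Sample autocorrelation function rho(0), ..., rho(nlags)."""
    x = y - y.mean()
    n = len(x)
    gamma0 = np.dot(x, x) / n
    return np.array([np.dot(x[: n - k], x[k:]) / n / gamma0
                     for k in range(nlags + 1)])

rng = np.random.default_rng(1)
y = np.cumsum(rng.standard_normal(500))  # simulated non-stationary series
z = difference(y, d=1)                   # stationary after first differencing
rho = acf(z, nlags=10)                   # inspect for AR/MA order hints
```

A sharp cut-off in the ACF after lag q suggests an MA(q) component; a slow decay together with a cut-off in the PACF after lag p suggests an AR(p) component.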

$$BIC = n \ln \hat{\sigma}_a^2 - (n - M) \ln (1 - M/n) + M \ln (n) + M \ln \left[ \left( \hat{\sigma}_Y^2 / \hat{\sigma}_a^2 - 1 \right) / M \right] \quad (8)$$

where:

p: model order,

n: the number of observations,

M: The number of parameters,

$\hat{\sigma}_Y^2$: the estimated variance of the series,

$\hat{\sigma}_a^2$: the estimated error variance,

$$\hat{\sigma}_a^2 = \sum_{t=1}^{n} (y_t - \hat{y}_t)^2 / (n - p) \quad (9)$$

$$AIC(M) = n \ln \hat{\sigma}_a^2 + 2M \quad (10)$$

or

$$AIC(p, q) = \ln \hat{\sigma}_a^2 + 2(p + q)/n \quad (11)$$

where:

M = p + q; p, q: model orders; n: number of observations; $\hat{\sigma}_a^2$: the estimated error variance.
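As an illustration, the criteria of Eqs. (8) and (10) can be computed directly from the estimated variances. The function names and numeric inputs below are hypothetical, chosen only to show the arithmetic:

```python
import math

def aic(sigma2_a, n, M):
    """AIC as in Eq. (10): n * ln(sigma_a^2) + 2M."""
    return n * math.log(sigma2_a) + 2 * M

def bic(sigma2_a, sigma2_y, n, M):
    """BIC as in Eq. (8), with estimated series and error variances."""
    return (n * math.log(sigma2_a)
            - (n - M) * math.log(1 - M / n)
            + M * math.log(n)
            + M * math.log((sigma2_y / sigma2_a - 1) / M))
```

Both criteria trade goodness of fit (a smaller error variance) against a penalty on the number of parameters M; the model with the smallest criterion value is preferred.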

This study is based on time-series data provided by the hereditary blood disease centre in Maysan, Iraq, for people diagnosed with Thalassemia over the period 2005-2015.

The first step in the modelling process is to check the stationarity of the time-series data. This is done by inspecting the graph of the data or the autocorrelation and partial autocorrelation functions [

The first stage of ARIMA model building is to identify whether the variable being forecasted is stationary in the time series or not. Stationarity can be defined through the autocovariance function as covariance stationarity, or weak stationarity; in the literature, stationarity usually means weak stationarity unless otherwise specified. The time series $(x_t, t \in \mathbb{Z})$, where $\mathbb{Z}$ is the set of integers, is said to be stationary if:

$$\operatorname{var}(x_t) < \infty \quad \forall t \in \mathbb{Z}$$

$$E x_t = \mu \quad \forall t \in \mathbb{Z}$$

$$\gamma_x(s, t) = \gamma_x(s + h, t + h) \quad \forall s, t, h \in \mathbb{Z}$$

Thus $\{x_t\}$ must have three features: finite variance, a constant first moment, and a second moment $\gamma_x(s, t)$ that depends only on $(t - s)$ and not on s or t individually. In light of the last point, we can rewrite the autocovariance function of a stationary process as

$$\gamma_x(h) = \operatorname{Cov}(x_t, x_{t+h}) \quad \text{for } t, h \in \mathbb{Z} \quad (12)$$

Also, when $x_t$ is stationary we must have

$$\gamma_x(h) = \gamma_x(-h) \quad (13)$$

When $h = 0$, $\gamma_x(0) = \operatorname{Cov}(x_t, x_t)$ is the variance of $x_t$, so the autocorrelation function for a stationary time series $x_t$ is defined to be

$$\rho_x(h) = \gamma_x(h) / \gamma_x(0) \quad (14)$$
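Equations (12)-(14) translate directly into sample estimators. This sketch (the white-noise input and the `autocov` helper name are assumptions for illustration) also exploits the symmetry property of Eq. (13):

```python
import numpy as np

def autocov(x, h):
    """Sample autocovariance gamma_x(h) = Cov(x_t, x_{t+h}), Eq. (12)."""
    x = x - x.mean()
    n = len(x)
    h = abs(h)                    # gamma_x(h) = gamma_x(-h), Eq. (13)
    return np.dot(x[: n - h], x[h:]) / n

rng = np.random.default_rng(2)
x = rng.standard_normal(1000)     # a stationary (white noise) series

gamma0 = autocov(x, 0)            # gamma_x(0) is the variance of x_t
rho1 = autocov(x, 1) / gamma0     # autocorrelation at lag 1, Eq. (14)
```

For white noise the lag-1 autocorrelation should be close to zero, while $\rho_x(0) = 1$ by construction.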

In

First difference: $Z_t = y_t - y_{t-1}$, where $t = 2, 3, \cdots, n$

Second difference: $Z_t = (y_t - y_{t-1}) - (y_{t-1} - y_{t-2})$, where $t = 3, 4, \cdots, n$

As a result we obtained a first-order differenced time series.
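The two differencing formulas can be verified on a toy series (the numbers below are illustrative, not the study's data):

```python
import numpy as np

y = np.array([3.0, 5.0, 4.0, 8.0, 7.0])

# First difference: z_t = y_t - y_{t-1}
z1 = y[1:] - y[:-1]

# Second difference: z_t = (y_t - y_{t-1}) - (y_{t-1} - y_{t-2}),
# i.e. the first difference applied twice
z2 = z1[1:] - z1[:-1]
```

Each round of differencing shortens the series by one observation, which is why the index starts at t = 2 for the first difference and t = 3 for the second.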

Using the augmented Dickey-Fuller (ADF) test [

Our null hypothesis (H0) in the test is that the time-series data is non-stationary, while the alternative hypothesis (Ha) is that the series is stationary. The hypothesis is then tested by performing appropriate differencing of the data to order d and applying the ADF test to the differenced time-series data. First-order differencing (d = 1) means we generate a table of differences of the current and immediately previous values, $\Delta x_t = x_t - x_{t-1}$. The ADF test result, obtained using the Eviews program, is shown below in

We therefore reject H0 and conclude that the alternative hypothesis holds, i.e. the series is stationary in its mean and variance. Thus, there is no need for further differencing of the time series and we adopt d = 1 for our ARIMA (p, d, q) model. This test enables us to go further in the steps of ARIMA model development, i.e. to find suitable values of p in AR and q in MA in our model. For that, we need to examine the correlogram and partial correlogram of the stationary (first-order differenced) time series. The Dickey-Fuller test is based on three simple regression equations, which assume the errors follow a first-order autoregressive pattern; these equations are:

$$\Delta x_t = \alpha_1 x_{t-1} + e_t \quad (15)$$

$$\Delta x_t = \alpha_0 + \alpha_1 x_{t-1} + e_t \quad (16)$$

$$\Delta x_t = \alpha_0 + \alpha_1 x_{t-1} + \beta t + e_t \quad (17)$$

where:

$\Delta$: the first-difference operator, i.e. $\Delta x_t = x_t - x_{t-1}$,

$e_t$: a white noise process.
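A bare-bones version of the regression in Eq. (16) can be written with ordinary least squares. This is a sketch for intuition only: the statistic must be compared against Dickey-Fuller critical values (not the usual t-distribution), and the simulated series below are assumptions, not the study's data:

```python
import numpy as np

def df_test_stat(x):
    """t-statistic of alpha_1 in Eq. (16): dx_t = a0 + a1*x_{t-1} + e_t.

    A strongly negative statistic is evidence against the unit-root
    null hypothesis H0 (non-stationarity).
    """
    dx = np.diff(x)
    X = np.column_stack([np.ones(len(dx)), x[:-1]])
    beta, *_ = np.linalg.lstsq(X, dx, rcond=None)
    resid = dx - X @ beta
    sigma2 = resid @ resid / (len(dx) - 2)
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[1] / np.sqrt(cov[1, 1])

rng = np.random.default_rng(3)
walk = np.cumsum(rng.standard_normal(300))  # unit root: statistic near 0
stationary = rng.standard_normal(300)       # statistic strongly negative
```

For the random walk the statistic stays close to zero (H0 not rejected), while for the stationary series it is far below the critical values, mirroring the decision rule applied to the DTHALASSEMIA series.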

| Augmented Dickey-Fuller test statistic | | t-statistic | Prob. |
|---|---|---|---|
| | | −11.38261 | 0.0000 |
| Test critical values | 1% level | −3.482035 | |
| | 5% level | −2.884109 | |
| | 10% level | −2.578884 | |

Augmented Dickey-Fuller test equation. Dependent variable: D(DTHALASSEMIA). Method: Least Squares. Sample (adjusted): 2005M05-2015M12.

| Variable | Coefficient | Std. error | t-statistic | Prob. |
|---|---|---|---|---|
| DTHALASSEMIA(−1) | −2.176864 | 0.191245 | −11.38261 | 0.0000 |
| D(DTHALASSEMIA(−1)) | 0.699137 | 0.145316 | 4.811158 | 0.0000 |
| DTHALASSEMIA(−2) | 0.381127 | 0.085537 | 4.455672 | 0.0000 |
| C | 0.094297 | 0.109530 | 0.860923 | 0.3909 |

| R-squared | 0.719809 | Mean dependent var | 0.046875 |
|---|---|---|---|
| Adjusted R-squared | 0.713030 | S.D. dependent var | 2.310058 |
| S.E. of regression | 1.237487 | Akaike info criterion | 3.294795 |
| Sum squared resid | 189.8905 | Schwarz criterion | 3.383920 |
| Log likelihood | 206.8669 | Hannan-Quinn criterion | 3.331007 |
| F-statistic | 106.1852 | Durbin-Watson stat | 1.998406 |
| Prob(F-statistic) | 0.000000 | | |

Seen from the

$$Z_t - Z_{t-1} = c + a_t - \theta a_{t-1}, \quad |\theta| < 1 \quad (18)$$

Following the common practice, we shall assume c = 0. Since the model is invertible, the π weights are $\pi_i = \theta^{i-1}(1 - \theta)$ for $i \geq 1$. Thus

$$Z_t - Z_{t-1} = 0.05577677 + a_t - 0.98928205 \, a_{t-1}$$
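Given estimated coefficients, forecasts from an ARIMA (0, 1, 1) model of the form in Eq. (18) follow a simple recursion: the one-step forecast corrects the last value by the last residual, and each further step adds the constant c (future shocks are forecast as zero). The last observation and residual below are hypothetical illustration values, not taken from the study's data:

```python
def arima011_forecast(last_y, last_resid, c, theta, horizon):
    """h-step forecasts from Z_t - Z_{t-1} = c + a_t - theta*a_{t-1}.

    One step ahead:  Z_{T+1} = Z_T + c - theta*a_T
    Further steps:   Z_{T+h} = Z_{T+1} + (h - 1)*c
    """
    forecasts = []
    y = last_y + c - theta * last_resid   # one-step-ahead forecast
    forecasts.append(y)
    for _ in range(horizon - 1):
        y = y + c                         # later steps grow by c only
        forecasts.append(y)
    return forecasts

# Coefficients as estimated in the text; last_y and last_resid are
# hypothetical values used only to demonstrate the recursion.
f = arima011_forecast(last_y=8.0, last_resid=0.1,
                      c=0.05577677, theta=0.98928205, horizon=3)
```

The positive constant is what produces the gradual upward drift visible in the 2016-2018 forecast table.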

| Models | AIC | BIC |
|---|---|---|
| ARIMA (1, 1, 0) | 89.7921 | −65.0102 |
| ARIMA (0, 1, 1) | 38.5633 | −122.7428 |

The results, obtained by applying the statistical programs Eviews, Gretl and Minitab to our data and charts, show an increase in Thalassemia cases in the coming years from 2016 to 2018, according to the forecasts below.

| Obs | Forecast ARIMA (1, 1, 0) | Forecast ARIMA (0, 1, 1) |
|---|---|---|
| 2016:01 | 6.94 | 8.17 |
| 2016:02 | 6.94 | 8.46 |
| 2016:03 | 6.82 | 8.28 |
| 2016:04 | 7.41 | 8.82 |
| 2016:05 | 9.27 | 10.00 |
| 2016:06 | 8.01 | 9.07 |
| 2016:07 | 7.77 | 8.43 |
| 2016:08 | 8.10 | 8.40 |
| 2016:09 | 5.65 | 7.48 |
| 2016:10 | 5.53 | 8.07 |
| 2016:11 | 5.76 | 7.78 |
| 2016:12 | 8.08 | 8.58 |
| 2017:01 | 7.04 | 9.03 |
| 2017:02 | 7.04 | 9.32 |
| 2017:03 | 7.54 | 9.14 |
| 2017:04 | 8.91 | 9.68 |
| 2017:05 | 10.85 | 10.87 |
| 2017:06 | 8.61 | 9.94 |
| 2017:07 | 6.82 | 9.30 |
| 2017:08 | 8.08 | 9.28 |
| 2017:09 | 5.32 | 8.37 |
| 2017:10 | 5.82 | 8.96 |
| 2017:11 | 4.79 | 8.67 |
| 2017:12 | 8.05 | 9.46 |
| 2018:01 | 6.99 | 9.92 |
| 2018:02 | 6.99 | 10.22 |
| 2018:03 | 7.14 | 10.04 |
| 2018:04 | 8.07 | 10.58 |
| 2018:05 | 9.96 | 11.78 |
| 2018:06 | 8.26 | 10.85 |
| 2018:07 | 7.33 | 10.21 |
| 2018:08 | 8.06 | 10.19 |
| 2018:09 | 5.47 | 9.28 |
| 2018:10 | 5.62 | 9.87 |
| 2018:11 | 5.29 | 10.39 |
| 2018:12 | 8.02 | 9.969 |

| Model | FPE |
|---|---|
| ARIMA (0, 1, 1) | 0.03256 |
| ARIMA (1, 1, 0) | 0.06542 |

These forecast numbers are high compared to previous years, with the highest forecast of patients occurring in May 2018. Some months during the period 2005-2015 did not record any cases, while in the forecasting period 2016-2018 every month is expected to have some patients.

Final Prediction Error (FPE): a good estimate of the prediction error for a model with n parameters is given by the final prediction error [

$$FPE = \hat{\sigma}_r^2(N, \hat{\beta}) \, (N + n + 1)/(N - n - 1) \quad (19)$$

$\hat{\sigma}_r^2$: the variance of the residuals.

N is the number of values in the estimation data set.
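Equation (19) is a one-line computation; the inputs below are illustrative only:

```python
def fpe(sigma2_r, N, n):
    """Final prediction error, Eq. (19): sigma_r^2 * (N + n + 1)/(N - n - 1).

    sigma2_r: variance of the residuals
    N: number of values in the estimation data set
    n: number of model parameters
    """
    return sigma2_r * (N + n + 1) / (N - n - 1)
```

The multiplier exceeds one, so the FPE inflates the in-sample residual variance to penalise extra parameters; the model with the smaller FPE (here ARIMA (0, 1, 1)) is preferred.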

From

1) The ARIMA model was suitable for application to the Thalassemia data and for analysis of other similar medical data.

2) By applying the ARIMA model, we were able to forecast future cases easily and accurately.

3) Cases of Thalassemia will increase in the coming years, which means that, currently, no serious efforts are being made to control or treat this disease in Iraq.

1) Conduct more similar studies using data from other provinces of Iraq to help overcome the disease.

2) Carry out pre-marriage tests to detect carriers and limit the future incidence.

3) Bone marrow transplantation is the most suitable treatment for the disease, but it is so costly for patients that governmental support would be very helpful.

4) Care should be taken to use less painful and safer equipment during blood transfusions, to relieve pain and prevent infection by bacterial and viral diseases.

Alsudani, R.S.A. and Liu, J.C. (2017) The Use of Some of the Information Criterion in Determining the Best Model for Forecasting of Thalassemia Cases Depending on Iraqi Patient Data Using ARIMA Model. Journal of Applied Mathematics and Physics, 5, 667-679. https://doi.org/10.4236/jamp.2017.53056