In this paper, the Box-Jenkins modelling procedure is used to identify an ARIMA model and to produce forecasts from it. We consider monthly malaria cases among children aged 1 to under 5 years, recorded by the Ministry of Health (Kabwe District), Zambia, for the period 2009 to 2013. The model-building process involves three steps: tentative identification of a model from the ARIMA class, estimation of parameters in the identified model, and diagnostic checks. Results show that an appropriate model is an ARIMA (1, 0, 0), because the ACF decays exponentially and the PACF has a spike at lag 1, which is characteristic of this model. The forecasted malaria cases for January and February 2014 are 220 and 265, respectively.

Malaria remains one of the leading causes of human morbidity and mortality, with high rates in Africa and Asia. Reference [

The Box-Jenkins approach to forecasting was first described by the statisticians George Box and Gwilym Jenkins and was developed as a direct result of their experience with forecasting problems in business, economic, and control engineering applications [

The practical significance of this study is that forecasting models which predict the expected number of malaria cases in advance allow timely prevention and control measures to be planned effectively, such as eliminating vector breeding places, spraying insecticides, and creating public awareness.

The model-building process involves three steps.

1) Tentative identification of a model from the ARIMA class.

2) Estimation of parameters in the identified model.

3) Diagnostic checks.

Tentative identification of a model: at this stage we use two graphical devices, the estimated autocorrelation function (ACF) and the estimated partial autocorrelation function (PACF), as guides for choosing one or more appropriate Autoregressive Integrated Moving Average (ARIMA) models.

Estimation of parameters in the identified model: at this stage we obtain precise estimates of the coefficients of the model chosen at the identification stage.

Diagnostic checks: these help determine whether an estimated model is statistically adequate.

If the tentatively identified model passes the diagnostic tests, it is ready to be used for forecasting. If it does not, the diagnostic tests should indicate how the model ought to be modified, and a new cycle of identification, estimation and diagnosis is performed. Once the series is stationary, a basic model can be identified. Three basic models exist: AR (autoregressive), MA (moving average) and the combined ARMA. When regular differencing is applied together with AR and MA terms, the model is referred to as ARIMA, with the “I” indicating “integrated”. The general ARIMA (p, d, q) model is defined as

where_{t} is a Gaussian
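For reference, the ARIMA (p, d, q) model is conventionally written in backshift notation as below. The symbols $B$, $\phi_i$, $\theta_j$ and $\varepsilon_t$, and the sign convention on the moving-average polynomial, follow common textbook usage and are assumptions here; the authors' own notation may differ.

$$
\phi(B)\,(1 - B)^{d} X_t = \theta(B)\,\varepsilon_t, \qquad
\phi(B) = 1 - \phi_1 B - \cdots - \phi_p B^{p}, \qquad
\theta(B) = 1 + \theta_1 B + \cdots + \theta_q B^{q},
$$

where $B X_t = X_{t-1}$ and $\varepsilon_t$ is white noise. Under this convention, the ARIMA (1, 0, 0) case with a constant reduces to $X_t = c + \phi_1 X_{t-1} + \varepsilon_t$.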

The paper is organized as follows. In Section 2, we give a brief survey of previous work (literature review). In Section 3, we present the data set used in this paper. In Section 4, we discuss the modelling approach together with the model used in this paper. The forecasting results are presented in Section 5 and the conclusion in Section 6.

A brief survey on previous work provides the context of this paper.

Reference [

The data from

The first step in this time series analysis is to plot the observations against time. Such graphs are called time plots, and they reveal important features of the series such as trend, seasonality, outliers and discontinuities. The input data must be adjusted to form a stationary series, one whose values vary more or less uniformly about a fixed level over time. Trends can be removed by “regular differencing”, that is, by computing the difference between every two successive values, which yields a differenced series with the overall trend removed. If a single differencing does not achieve stationarity, it may be repeated, although more than two regular differencings are rarely needed. Where irregularities continue to appear in the differenced series, log or inverse transformations can be applied to stabilize the series, so that the remaining residual plot displays values close to zero and without any pattern. This is the error term, equivalent to pure white noise [
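Regular differencing as described above can be sketched in a few lines of Python. The helper name `difference` is hypothetical, and the short sample is the first six monthly counts for 2009 from the data table, used only to illustrate the arithmetic.

```python
def difference(series, d=1):
    """Apply regular differencing d times: y_t = x_t - x_{t-1}."""
    values = [float(v) for v in series]
    for _ in range(d):
        # Each pass shortens the series by one observation.
        values = [later - earlier for earlier, later in zip(values, values[1:])]
    return values


# First six monthly counts of 2009 (illustrative subset only).
sample = [833, 533, 378, 330, 309, 206]

first_diff = difference(sample, d=1)
second_diff = difference(sample, d=2)

print("first differences: ", first_diff)
print("second differences:", second_diff)
```

Each round of differencing costs one observation, which is one practical reason to difference no more often than stationarity requires.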

A visual inspection of the time series plot in

Two graphical devices, the autocorrelation function (ACF) and the partial autocorrelation function (PACF), are used as guides for choosing one or more appropriate Autoregressive Integrated Moving Average (ARIMA) models.

Month | 2009 | 2010 | 2011 | 2012 | 2013 |
---|---|---|---|---|---|
1 | 833 | 491 | 325 | 278 | 306 |
2 | 533 | 320 | 327 | 248 | 461 |
3 | 378 | 372 | 499 | 329 | 527 |
4 | 330 | 445 | 434 | 532 | 913 |
5 | 309 | 577 | 597 | 616 | 682 |
6 | 206 | 539 | 415 | 207 | 206 |
7 | 502 | 737 | 235 | 88 | 117 |
8 | 403 | 505 | 279 | 99 | 70 |
9 | 698 | 398 | 272 | 146 | 136 |
10 | 559 | 449 | 112 | 115 | 173 |
11 | 351 | 442 | 131 | 129 | 129 |
12 | 339 | 279 | 259 | 149 | 153 |

Source: Ministry of Health-Kabwe District Community Medical Office (KDCMO).
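The sample ACF and PACF used at the identification stage can be computed directly from the monthly counts in the table above. The sketch below is a minimal pure-Python version, not the authors' software: the function names are hypothetical, the PACF is obtained with the Durbin-Levinson recursion, and the bound 1.96/√n is the usual approximate 95% limit for a white noise series.

```python
import math

# Monthly malaria cases read column by column from the table (Jan 2009 to Dec 2013).
cases = [
    833, 533, 378, 330, 309, 206, 502, 403, 698, 559, 351, 339,  # 2009
    491, 320, 372, 445, 577, 539, 737, 505, 398, 449, 442, 279,  # 2010
    325, 327, 499, 434, 597, 415, 235, 279, 272, 112, 131, 259,  # 2011
    278, 248, 329, 532, 616, 207, 88, 99, 146, 115, 129, 149,    # 2012
    306, 461, 527, 913, 682, 206, 117, 70, 136, 173, 129, 153,   # 2013
]


def sample_acf(series, max_lag):
    """Sample autocorrelations r_0, ..., r_max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(v * v for v in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        for k in range(max_lag + 1)
    ]


def sample_pacf(acf):
    """Partial autocorrelations at lags 1..K from r_0..r_K (Durbin-Levinson)."""
    pacf = []
    prev = []  # coefficients phi_{k-1,1}, ..., phi_{k-1,k-1}
    for k in range(1, len(acf)):
        num = acf[k] - sum(prev[j] * acf[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(prev[j] * acf[j + 1] for j in range(k - 1))
        phi_kk = num / den
        # Update the lower-order coefficients, then append the new one.
        prev = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf


acf = sample_acf(cases, max_lag=12)
pacf = sample_pacf(acf)
bound = 1.96 / math.sqrt(len(cases))  # approximate 95% limit

for lag in range(1, 13):
    print(f"lag {lag:2d}  ACF {acf[lag]: .3f}  PACF {pacf[lag - 1]: .3f}")
print(f"approximate 95% bound: +/-{bound:.3f}")
```

An ACF that dies away gradually together with a PACF that falls inside the bound after lag 1 is the pattern usually read as pointing to an AR(1) model.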

Equation (2) can now be used to estimate the parameter by least squares estimation. Reference [

We view this as a regression model with predictor variable X_{t}; least squares estimation then proceeds by minimizing the sum of the squared differences between the observed and fitted values. The estimators are

Calculations in Excel show that:
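As a hedged illustration of the estimation step, and not a reproduction of the authors' Excel calculation, the sketch below fits an AR(1) model of the assumed form X_t = c + φ·X_{t-1} + e_t by ordinary least squares, regressing each observation on its predecessor. The function name `fit_ar1_least_squares` and the noise-free toy series are hypothetical and chosen so that the true coefficients are known.

```python
def fit_ar1_least_squares(series):
    """Least squares fit of X_t = c + phi * X_{t-1} + e_t.

    Returns the pair (c, phi), using the closed-form simple
    regression estimators for slope and intercept.
    """
    x_prev = [float(v) for v in series[:-1]]  # predictor: previous value
    x_next = [float(v) for v in series[1:]]   # response: current value
    m = len(x_prev)
    mean_prev = sum(x_prev) / m
    mean_next = sum(x_next) / m
    sxy = sum((a - mean_prev) * (b - mean_next) for a, b in zip(x_prev, x_next))
    sxx = sum((a - mean_prev) ** 2 for a in x_prev)
    phi = sxy / sxx
    c = mean_next - phi * mean_prev
    return c, phi


# Toy series generated exactly by X_t = 2 + 0.5 * X_{t-1}, starting at 10.
toy = [10.0]
for _ in range(7):
    toy.append(2.0 + 0.5 * toy[-1])

c_hat, phi_hat = fit_ar1_least_squares(toy)
print(f"c = {c_hat:.6f}, phi = {phi_hat:.6f}")
```

Applied to the 60 monthly counts, the same function would give least squares estimates for the malaria series; those values are not reproduced here.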

Verification of the goodness of fit of any model should include a test of whether the residuals form a white noise process. Several goodness-of-fit checks have been carried out for our model in this paper.
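One simple white noise check is to compare the residual autocorrelations with the approximate 95% bound ±1.96/√n. The sketch below is a minimal illustration under that assumption; the function names and the residual values are hypothetical and are not the residuals of the model fitted in this paper.

```python
import math


def residual_autocorrelations(residuals, max_lag):
    """Sample autocorrelations of the residuals at lags 1..max_lag."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [r - mean for r in residuals]
    c0 = sum(v * v for v in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        for k in range(1, max_lag + 1)
    ]


def lags_outside_bounds(residuals, max_lag):
    """Lags whose autocorrelation exceeds the approximate 95% bound."""
    bound = 1.96 / math.sqrt(len(residuals))
    acf = residual_autocorrelations(residuals, max_lag)
    return [lag for lag, r in enumerate(acf, start=1) if abs(r) > bound]


# Hypothetical residuals, for illustration only.
example_residuals = [12.0, -8.0, 5.0, -3.0, 9.0, -11.0, 4.0, -6.0, 2.0, -4.0]

print("mean residual:", sum(example_residuals) / len(example_residuals))
print("lags outside bound:", lags_outside_bounds(example_residuals, max_lag=3))
```

An empty list of flagged lags is consistent with white noise residuals, while several flagged lags suggest that the model has left structure unexplained.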

The histogram shows that the average of the residuals is approximately 0. QQ plots are an effective tool for assessing normality. The QQ plot of residual observations in

The autocorrelation plot shows (see

Our other diagnostic check is to inspect a scatter plot of the residuals over time in

The Box-Jenkins approach to forecasting a stationary time series is relatively simple. The forecast value of

and

The general form of the forecast equation is therefore

for
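For an AR(1) model of the assumed form X_t = c + φ·X_{t-1} + e_t, forecasts are obtained by applying the fitted equation recursively, with each forecast feeding the next. The sketch below illustrates this. The function name `forecast_ar1` and the parameter values are hypothetical, chosen for easy arithmetic; they are not the coefficients estimated in this paper.

```python
def forecast_ar1(last_value, c, phi, steps):
    """Recursive AR(1) forecasts: each step applies x_hat = c + phi * previous."""
    forecasts = []
    previous = float(last_value)
    for _ in range(steps):
        previous = c + phi * previous
        forecasts.append(previous)
    return forecasts


# Hypothetical parameters, for illustration only.
c_example, phi_example = 2.0, 0.5
path = forecast_ar1(last_value=10.0, c=c_example, phi=phi_example, steps=3)
print(path)

# For |phi| < 1 the forecasts approach the long-run mean c / (1 - phi).
print("long-run mean:", c_example / (1.0 - phi_example))
```

Because the forecasts move geometrically from the last observation towards the long-run mean, an AR(1) model is most informative over short horizons such as one or two months ahead.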

The ARIMA (1, 0, 0) model developed in this paper is intended to predict future monthly malaria cases from the cases observed in previous years. The results also indicate that malaria cases will continue to occur in the near future if appropriate intervention measures are not initiated in time. The potential implication of this study is that forecasting models which predict the expected number of malaria cases in advance allow timely prevention and control measures to be planned effectively, such as eliminating vector breeding places, spraying insecticides, and creating public awareness. The study also provides a model for anticipating needs and allocating appropriate resources to sustain a steady decrease in malaria. The ARIMA approach used in this paper can also be applied to other diseases, such as Ebola. These results can further be used to inform travelers about malaria risk so that they take the necessary precautions.

In this paper, the Box-Jenkins modelling procedure was used to identify an ARIMA model and to produce forecasts from it. We considered monthly malaria cases among children aged 1 to under 5 years, recorded by the Ministry of Health (Kabwe District), Zambia, for the period 2009 to 2013. Results show that an appropriate model is an ARIMA (1, 0, 0), because the ACF decays exponentially and the PACF has a spike at lag 1, which is characteristic of this model. The forecasted malaria cases for January and February 2014 are 220 and 265, respectively. Finally, the study can be extended to a wider area of Zambia, and further research can evaluate the effectiveness of integrating the forecasting model into the existing disease control program in terms of its impact on reducing disease occurrence. These questions will be studied elsewhere.

The authors are thankful to the Ministry of Health for providing the data, to the Department of Mathematics and Statistics, Mulungushi University, for the use of its resources, and to all the people who commented on this paper.

Stanley Jere, Edwin Moyo (2016) Modelling Epidemiological Data Using Box-Jenkins Procedure. Open Journal of Statistics, 06, 295-302. doi: 10.4236/ojs.2016.62025