Forecasting Measles Immunization Coverage Using ARIMA Model

This study aimed to find a model to forecast monthly measles immunization coverage using Autoregressive Integrated Moving Average (ARIMA). The monthly registered data for measles immunization coverage from January 2014 to December 2018 were used for the development of the model. The best model with the smallest Normalized Bayesian Information Criterion (BIC) of 8.673 is ARIMA (0, 1, 0). ARIMA (0, 1, 0) was used to forecast the monthly measles immunization coverage for the next 36 months from January 2018 to December 2020. The results obtained prove that this model can be used for forecasting future immunization coverage and will help decision-makers to establish strategies, priorities, and proper use of immunization resources.


Introduction
Measles is a highly contagious respiratory disease caused by a virus which belongs to the paramyxovirus family. It is an airborne disease which spreads quickly through the coughs and sneezes of infected people. It may also be spread through direct contact with the mouth or nasal secretions. Even though the safe and effective vaccine is available, it remains a notable cause of death among young children worldwide. In 2017, there were about 110,000 global deaths related to measles, and mostly in children under the age of five [1]. In the Philippines, the rate of measles vaccination has declined steadily from 80 percent in 2008 to below 70 percent in 2017. As a result, many children have become susceptible to measles infection. World Health Organization estimates that 2.6 mil-lion Filipino children under the age of five years are vulnerable to measles [2].
In the Philippines, Expanded Program on Immunization (EPI) was established in 1976, one dose of measles vaccine given at nine months of age has been included in the program, then later on introduced nationwide in 1983 [3]. The measles vaccine is available free of cost in government health centers, with the first dose is given when a child is nine months old, and the second dose is given when a child is 12 months old [4].
This study aimed to propose a forecasting model for monthly measles infant immunization coverage. The forecasting result generated by the model can serve as the basis for preparing the accurate monthly volume of measles vaccine. By doing so, this will help to achieve a high level of coverage and improve measles immunization planning and program.
A time-series approach with autoregressive integrated moving average (ARIMA) modeling is used to provide a monthly forecast of immunization coverage with precision. It is a combination of three statistical models. It uses the Autoregressive, Integrated, and Moving Average (ARIMA) model for statistical information. ARIMA model is commonly used in analysis and forecasting. It is the most efficient forecasting technique in social science and used extensively for time series [5]. In medical applications, time series forecasting models have been successfully applied to predict the progress of the disease, estimate the mortality rate, and assess the time-dependent risk [6].
Several studies were conducted to forecast immunization coverage. Authors in [7] utilized ARIMA and Back-Propagation Neural Networks (BPNN) to forecast the annual vaccine demand of a specific vaccine. Similarly, researcher in [8] use ARIMA analysis in predicting vaccine demand in advance through analysis of progress of birth of newborn babies in Korea, while [9] use Artificial Neural Network (ANN) to expose the trend and proposing the forecasting model for monthly pentavalent infant immunization coverage and [10] use ARIMA and Neural Network to propose computer-based forecast model to build a decision support system aiming to forecast the annual vaccine demand for specific vaccines of Taiwan. However, none of them are focused on forecasting measles immunization coverage and provided three-year forecasts. Moreover, this study will contribute to existing knowledge of applying the ARIMA model in forecasting immunization coverage.

Methodology
The research methodology used in this study is summarized below. The study used monthly registered data for measles immunization coverage from January 2014 to December 2018 total time series 60 was taken from the City Health Office of Cabanatuan. ARIMA is used to uncover the measles immunization coverage trend and the development of the forecasting model using IBM SPSS.

Data Source
The dataset for this study was obtained from Cabanatuan City Health Office,

ARIMA Model
Autoregressive Integrated Moving Average model is popular and widely used stochastic time series model. It is a combination of three statistical models. An ARIMA model is denoted as an ARIMA model (p, d, q), where p is the number of autoregressive terms, d is the degree of differencing involve, and q is the number of moving-average terms [11] [12] [13]. This study follows the Box and Jenkins methodology [14], which composed of four main steps as shown in Figure 1.
1) Model Identification. In this step, the order of respective time series model is determined, the order of differencing "d", the order of AR process which is "p" and order of MA process is "q". This mainly leads to the use of graphical procedures such as plotting the series, the ACF and PACF.
2) Estimation of Model Parameters. Involves the estimation of the parameters of the different models using previously available data and proceeds to the first selection of models using the Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) values.
3) Diagnostic checks (model checking). This step determines whether the model(s) specified and estimated is valid. Thus, the residual values are normally distributed, no significant periodicity is present in the residual, and the residual series is uncorrelated. The process will start again if this step is not fulfilled.

4)
Forecasting. In this step, the model is now ready to be tested on real data to forecast future values of immunization coverage.

Results and Discussions
This study applies Box-Jenkins methodology for the identification, estimation, diagnostic checking, and forecasting of the univariate time-series data on monthly measles immunization coverage. The graph of the raw data is shown in Figure 2.

Model Identification
The first stage in developing an ARIMA model is to determine if the series is stationary. The graph below ( Figure 2) depicts that the series has a large variance.
To reduce the variance, the "raw data" was transformed into its natural logarithmic equivalent. The graph of the log (immunization) is shown below. Figure 2 and Figure 3 show that the mean of the series is a function of time   Differencing the natural log-transformed series can transform the series from non-stationary series to stationary. The graph below shows that the "first difference of the log-transformed series" is stationary (Figure 4).
The ADF test for the first difference of the natural log-transformed data confirms that a stationary series can be achieved after the first differencing.
The ACF and PACF of the first difference natural log-transformed series are shown below. The height of bars in the ACF represents the autocorrelation coefficients. The two parallel lines represent the confidence interval. All bars fall inside the confidence interval, indicating that the autocorrelations are statistically non-significant at 5% level. This means that an ARIMA with Moving Average component of order 0 can be tentatively considered, that is, MA (0) ( Figure 5).
The height of bars in the PACF represents the partial autocorrelation coefficients. Though the partial autocorrelation coefficients fall within the confidence level, that is, the partial autocorrelation coefficients are statistically nonsignificant at 5%, some are larger than the others. The PACF suggests an ARIMA model with AR of order 0 can be considered. AR of orders 1, 4, 8, and 9 can also

Estimation of Model Parameters
After the estimation of the parameters, the Root Mean Square Errors (RMSE) and Normalized BIC are summarized as shown in Table 2. The smaller the RMSE and Normalized BIC, the better the model. In this study, the normalized BIC is used because its value is not a function of the number of parameters. Using

Normality Distributed Residuals
Shapiro-Wilk test was used to test if the "residuals" of the ARIMA (0, 1, 0) is normally distributed. Shapiro-Wilk statistic is statistically nonsignificant (SW = 0.973, df = 59, p = 0.219), indicating that the residuals have no departure from normality. Figure 7 Histogram and Figure 8 Normal Q-Q plot support the normality.

No Autocorrelations in the Residuals
The Ljung-Box Q was used to test the absence of autocorrelations as a whole.
The Ljung-Box Q is statistically nonsignificant (Ljung-Box Q = 13.778, p = 0.743), indicating the absence of autocorrelations. The bars of the PACF and ACF plots are within the 95% confidence limits, indicating the absence of autocorrelations ( Figure 9).

Forecast
ARIMA (0, 1, 0) was used to forecast the monthly measles immunization coverage for the next 36 months from January 2018 to December 2020. Figure 10 shows the actual data (blue line) and fitted/forecasted values (red line) using the    Table 3 shows forecast values of immunization coverage from January 2018 to December 2021 using ARIMA (0, 1, 0).

Conclusion
In this study, the ARIMA model was developed to forecast monthly measles immunization for the next 36 months from January 2019 to December 2021. The monthly registered data for measles immunization coverage from January 2014 to December 2018 were used for the development of the model. The best model  0, 1, 0). The results obtained prove that this model can be used for forecasting future immunization coverage and will help decision-makers to establish strategies, priorities, and proper use of immunization resources in Cabanatuan City. As future work, the researcher will develop other models by using the neural network approach to compare it with the ARIMA model.