This study aimed to find a suitable model for forecasting the appropriate stock of vaccines to avoid shortages and oversupply. Auto-Regressive Integrated Moving Average (ARIMA) and Multilayer Perceptron Neural Network (MLPNN) models were used to forecast the time series data. The models were developed from monthly vaccination coverage recorded from January 2014 to December 2019. The dataset consists of 72 monthly observations: the first 60 months, from January 2014 to December 2018, were used for model fitting, and the remaining 12 months, from January 2019 to December 2019, were used to test forecast accuracy. The most suitable forecast model was selected based on the lowest Root Mean Square Error (RMSE) and Mean Absolute Error (MAE) values. The results show that the MLPNN model outperformed the ARIMA model in forecasting monthly demand for vaccines. These results can help policymakers make better use of vaccination resources.

The demand for vaccines in the Philippines has increased significantly due to outbreaks and birth rates. This increases the storage, transport capacity, and handling requirements for vaccines, making the immunization supply chain complex and challenging to manage. A vaccine demand forecasting tool can help address a host of problems, such as limited storage space, low stock, overstocking, and maintenance costs. Forecasting is an essential part of management’s decision-making activities and plays a vital role in many areas of the company [

This study compares the Multilayer Perceptron Neural Network (MLPNN) and Auto-Regressive Integrated Moving Average (ARIMA) models for vaccine demand forecasting. The more suitable forecasting model was selected based on the lowest Root Mean Square Error (RMSE) and Mean Absolute Error (MAE) values. The forecasts generated by the best model will serve as a basis for decisions that improve immunization planning and programs.

The Autoregressive Integrated Moving Average (ARIMA) model is considered the most advanced and robust approach among the traditional forecasting models [

The Multilayer Perceptron Neural Network (MLPNN) is the most commonly used ANN model for time series forecasting. It is a feed-forward neural network consisting of an input layer, a hidden layer, and an output layer [

This study adopted the Cross-Industry Standard Process for Data Mining (CRISP-DM) methodology. It is a process model that provides a fluid framework for devising, creating, building, testing, and deploying machine learning solutions [

1) Research Understanding. The research work described in this study aims to develop a vaccine forecasting model using Autoregressive Integrated Moving Average (ARIMA) and Multilayer Perceptron Neural Network (MLPNN) models and to compare their results by evaluating the forecast performance of the models used.

2) Data Understanding. The dataset for this study was obtained from the Cabanatuan City Health Office. The variable studied is the number of infants in Cabanatuan City receiving vaccines over the 72-month period from January 2014 to December 2019. This study selected BCG as the experimental vaccine for the demand forecasting implementation.

3) Data Preparation. In this step, the data were divided into training and testing sets for the models. The series was split into a training set containing the first 85% of the values and a testing set containing the last 15%. This step also applied data transformations, such as identifying and treating outliers and constructing and decomposing the time series.

4) Modeling and Forecasting. In this step, the training data set was used to train the statistical and machine learning models. The auto.arima() function of the forecast package in R was used to fit an ARIMA model. This algorithm automates the tuning of the ARIMA model by using a stepwise search to traverse the model space and select the model with the smallest Akaike’s Information Criterion (AIC) [
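For reference, the selection criterion named in this step can be written in its general textbook form. The symbols $k$ (number of estimated parameters) and $\hat{L}$ (maximized likelihood of the fitted model) are introduced here for illustration and do not appear in the original text:

$$\mathrm{AIC} = 2k - 2\ln(\hat{L})$$

A lower AIC indicates a better trade-off between goodness of fit and model complexity, which is why the stepwise search keeps the candidate with the smallest value.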

On the other hand, the mlp() function of nnfor package [

5) Evaluation. In this step, the results of the ARIMA and MLPNN models are evaluated to determine each model’s accuracy. The models are evaluated using two performance measures: the Root Mean Square Error (RMSE) and the Mean Absolute Error (MAE). The model with the lower values of these measures is considered the more accurate [

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n} (P_i - O_i)^2}{n}} \tag{1}$$

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| A_i - B_i \right| \tag{2}$$
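The two measures can be illustrated with a minimal sketch. The study itself worked in R; the Python below is only an illustration, the function names `rmse` and `mae` are hypothetical, and the numbers are made up rather than taken from the study's data.

```python
import math


def rmse(predicted, observed):
    """Root Mean Square Error: square root of the mean squared difference."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


def mae(predicted, observed):
    """Mean Absolute Error: mean of the absolute differences."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(predicted)


# Made-up values for illustration only (not the study's data).
observed = [100, 120, 90, 110]
predicted = [98, 125, 90, 104]

print(rmse(predicted, observed))
print(mae(predicted, observed))
```

Because RMSE squares each error before averaging, it penalizes large misses more heavily than MAE, so reporting both gives a fuller picture of forecast accuracy.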

The technologies used to carry out this study are briefly listed below:

1) Microsoft Excel 2019 was used to save the raw data in .csv format, pre-process the data, and perform initial checks of the dataset.

2) The forecasting models were developed in the R programming language using the RStudio Desktop IDE. R has several forecasting packages for secure handling of data from time series [

3) All the software mentioned above ran on a Microsoft Windows 10 Pro machine.

The monthly vaccination data of Cabanatuan City, Nueva Ecija, from January 2014 to December 2019 were used in this study. The 72 monthly observations were divided into two parts for training and forecasting. The first part consists of the 60 monthly observations from January 2014 to December 2018, which were used for model fitting. The remaining 12 months of data, from January 2019 to December 2019, were used as an out-of-sample set, with the Root Mean Square Error (RMSE) and Mean Absolute Error (MAE) measuring the selected model’s forecast accuracy. RStudio Desktop and the R programming language were used to build the ARIMA and MLPNN models. The following steps were used for data preparation:

1) Load Data Set and Libraries

The first step in modeling and forecasting is to install and load the necessary packages. The data set is then loaded from the .csv file, as shown in

2) Detection and Treatment of Outlier data

Outliers can greatly affect the quality of forecasting, so identifying and treating them is essential. The first step in determining whether outliers appear in the data is to draw a box plot. The box plot is a graphical tool based on quartiles and the interquartile range, which defines an upper limit and a lower limit beyond which any data point is considered an outlier. Values lying within the lower and upper limits are considered statistically normal and are therefore retained for forecasting.
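The box plot limits described above can be sketched as follows. This is a minimal illustration, assuming the conventional 1.5 × IQR rule; the text does not state the multiplier or the quartile convention, and quartile definitions differ slightly between tools. The function names and the monthly counts are hypothetical.

```python
import statistics


def iqr_limits(values, k=1.5):
    """Return (lower, upper) box plot limits using the k * IQR rule (k is an assumption)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def find_outliers(values, k=1.5):
    """Return (index, value) pairs lying outside the box plot limits."""
    lower, upper = iqr_limits(values, k)
    return [(i, v) for i, v in enumerate(values) if v < lower or v > upper]


# Made-up monthly counts for illustration only (not the study's data).
monthly_counts = [210, 198, 205, 220, 215, 202, 480, 208, 212, 199, 207, 214]

print(iqr_limits(monthly_counts))
print(find_outliers(monthly_counts))
```

The sketch only flags candidate outliers. How a flagged value is treated (for example, replaced or capped) is a separate decision that the text does not specify.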

3) Constructing and Decomposing Time Series Data

This step includes the creation and decomposition of time series to determine the patterns, cycle, and seasonality of the period.

a) Time series construction

In R, the ts() function was used to construct the time series with a specified frequency. This analysis used frequencies of 12 and 1 to represent the monthly and yearly series, respectively.

b) Time series decomposition

Time series decomposition procedures were carried out to identify the pattern and seasonal factors of vaccination coverage.
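As an illustration of what a decomposition produces, the sketch below implements a classical additive decomposition with a centred moving average. The text does not say which decomposition procedure was used, so the additive form, the function name `decompose_additive`, and the synthetic series are all assumptions made for this example.

```python
def decompose_additive(series, period=12):
    """Split a series into trend, seasonal, and remainder parts (additive form)."""
    n = len(series)
    if n < 2 * period:
        raise ValueError("need at least two full periods of data")
    half = period // 2

    # Trend: centred moving average; None where the window does not fit.
    trend = [None] * n
    for t in range(half, n - half):
        window = series[t - half : t + half + 1]
        if period % 2 == 0:
            # Even period: half weight on the two end points (2 x m average).
            total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        else:
            total = sum(window)
        trend[t] = total / period

    # Seasonal: mean detrended value for each position in the cycle,
    # shifted so that the effects sum to zero over one period.
    effects = []
    for k in range(period):
        detrended = [
            series[t] - trend[t] for t in range(k, n, period) if trend[t] is not None
        ]
        effects.append(sum(detrended) / len(detrended))
    mean_effect = sum(effects) / period
    effects = [e - mean_effect for e in effects]
    seasonal = [effects[t % period] for t in range(n)]

    # Remainder: what is left after removing trend and seasonal parts.
    remainder = [
        None if trend[t] is None else series[t] - trend[t] - seasonal[t]
        for t in range(n)
    ]
    return trend, seasonal, remainder


# Synthetic 72-month series: linear trend plus a fixed monthly pattern (not real data).
pattern = [5, -3, 2, 0, -4, 6, -1, 3, -2, -5, 1, -2]
series = [200 + 0.5 * t + pattern[t % 12] for t in range(72)]

trend, seasonal, remainder = decompose_additive(series, period=12)
print(trend[36], seasonal[:12])
```

On this synthetic input the procedure recovers the linear trend and the monthly pattern, which makes it easy to check that each component means what its name suggests.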

4) Creating Training and Testing Data

The data set consists of 72 monthly observations from January 2014 to December 2019. The dataset was split into training and testing data. The training dataset consists of 59 monthly observations from January 2014 to November 2018, while the testing data set consists of 12 monthly observations from December 2018 to November 2019. The code and result for splitting the dataset are shown in
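A chronological split of this kind can be sketched as below. This is a hedged Python illustration rather than the study's R code; the helper names `month_labels` and `chronological_split` are hypothetical, and month labels stand in for the actual observations.

```python
def month_labels(start_year, start_month, count):
    """Return 'YYYY-MM' labels for `count` consecutive months."""
    labels = []
    for k in range(count):
        months = (start_month - 1) + k
        labels.append(f"{start_year + months // 12}-{months % 12 + 1:02d}")
    return labels


def chronological_split(series, n_train, n_test):
    """Split a series in time order: first n_train points, then the next n_test."""
    if n_train + n_test > len(series):
        raise ValueError("not enough observations for the requested split")
    return series[:n_train], series[n_train : n_train + n_test]


# Labels stand in for the 72 monthly observations (January 2014 to December 2019).
labels = month_labels(2014, 1, 72)
train, test = chronological_split(labels, n_train=59, n_test=12)

print(train[0], train[-1], len(train))
print(test[0], test[-1], len(test))
```

Time series data are split in order rather than at random so that the model is always tested on months that come after those it was trained on. With the counts stated in this step, 59 and 12 observations account for 71 of the 72 months, so December 2019 falls outside both sets.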

1) ARIMA(p, d, q) Model

The order of the ARIMA(p, d, q) model was determined, and the best-fitting model was identified. The autocorrelation function (ACF) and partial autocorrelation function (PACF) are the tools used to select the order of an ARIMA(p, d, q) model. Alternatively, the auto.arima() function from the forecast package in R finds the order (p, d, q) automatically.

The ARIMA(p, d, q) model that is suitable for the monthly series is ARIMA(1, 0, 0)(0, 1, 1) [

The ARIMA(1, 0, 0)(0, 1, 1) [
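To make the notation concrete, a seasonal model of order (1, 0, 0)(0, 1, 1) can be written with the backshift operator $B$, where $B y_t = y_{t-1}$. The form below is a sketch under two assumptions that the text does not confirm: a seasonal period of 12, suggested by the monthly frequency used when constructing the series, and no drift term. The symbols $\phi_1$, $\Theta_1$, and $\varepsilon_t$ are introduced here for illustration, and the moving-average term uses the plus-sign convention.

$$(1 - \phi_1 B)(1 - B^{12})\, y_t = (1 + \Theta_1 B^{12})\, \varepsilon_t$$

Here $\phi_1$ is the non-seasonal autoregressive coefficient, $(1 - B^{12})$ is one seasonal difference, $\Theta_1$ is the seasonal moving-average coefficient, and $\varepsilon_t$ is white noise.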

2) MLPNN Model

The Multilayer Perceptron Neural Network (MLPNN) model was implemented using the nnfor package in R. The MLPNN was trained on 59 observations with five hidden nodes, 20 repetitions, and univariate lags (1, 2, 10, 11, 12). The MSE for monthly vaccination coverage is 31.7939. The MLPNN model architecture, which consists of one input layer, one hidden layer, and one output layer, is shown in

The fitted MLPNN model was used to forecast the next 12 months of BCG vaccination coverage from December 2018 to November 2019.
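The way lagged inputs feed a single-hidden-layer network can be sketched as follows. This is an illustration only, not the nnfor implementation: the tanh activation, the random untrained weights, and the helper names `make_lagged` and `mlp_forward` are assumptions, and the series is a placeholder. Only the lag set and the number of hidden nodes come from the text.

```python
import math
import random

LAGS = (1, 2, 10, 11, 12)   # univariate lags reported in the text
N_HIDDEN = 5                # hidden nodes reported in the text


def make_lagged(series, lags=LAGS):
    """Build (inputs, targets): each target y[t] is paired with y[t - lag] for every lag."""
    max_lag = max(lags)
    inputs, targets = [], []
    for t in range(max_lag, len(series)):
        inputs.append([series[t - lag] for lag in lags])
        targets.append(series[t])
    return inputs, targets


def mlp_forward(x, w_hidden, b_hidden, w_out, b_out):
    """One forward pass: input -> hidden layer (tanh, assumed) -> linear output."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


# Placeholder series of 59 values standing in for the training observations.
series = [float(t) for t in range(59)]
inputs, targets = make_lagged(series)

# Random, untrained weights purely to show the shapes involved.
rng = random.Random(0)
w_hidden = [[rng.uniform(-0.1, 0.1) for _ in LAGS] for _ in range(N_HIDDEN)]
b_hidden = [0.0] * N_HIDDEN
w_out = [rng.uniform(-0.1, 0.1) for _ in range(N_HIDDEN)]
output = mlp_forward(inputs[0], w_hidden, b_hidden, w_out, 0.0)

print(len(inputs), inputs[0], targets[0], output)
```

Because the longest lag is 12, the first 12 observations can only serve as inputs, so 59 training observations yield 47 input-target pairs. In practice the network would be trained and the inputs scaled before any forecast is made.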

This study compares the results obtained from the MLPNN model with those of the ARIMA model. To provide a clearer understanding of the performance of the selected methods, the models’ accuracy measures are shown in

Models | RMSE | MAE |
---|---|---|
ARIMA(1, 0, 0)(0, 1, 1) [ | 94.68 | 64.04 |
MLPNN | 5.63 | 2.45 |

The results show that both performance metrics, RMSE and MAE, are lower for the MLPNN model. The smaller the error values, the better the model’s performance. It can therefore be concluded that the MLPNN model performs better than the ARIMA model in forecasting BCG vaccination coverage.

The goal of this research was to find a suitable model for forecasting the appropriate stock of vaccines to avoid shortages and oversupply. The MLPNN and ARIMA models were used to forecast monthly vaccine demand using data from January 2014 to December 2019. The more suitable forecasting method was then chosen using the RMSE and MAE accuracy measures. The results showed that the MLPNN model is superior to the ARIMA model in forecasting monthly vaccine demand. This result is consistent with previous literature that uses MLPNN and ARIMA in forecasting [

The authors are grateful to the staff, directors, and administrators of the Cabanatuan City Health Office, Philippines, for their guidance, assistance with data collection, and support in making this research possible.

The authors declare no conflicts of interest regarding the publication of this paper.

Alegado, R.T. and Tumibay, G.M. (2020) Statistical and Machine Learning Methods for Vaccine Demand Forecasting: A Comparative Analysis. Journal of Computer and Communications, 8, 37-49. https://doi.org/10.4236/jcc.2020.810005