Artificial Neural Networks for COVID-19 Time Series Forecasting

The COVID-19 pandemic has become the greatest worldwide threat, as it spreads rapidly among individuals in most countries around the world. This study concerns the problem of daily prediction of new COVID-19 cases in Italy, aiming to find the best predictive model for the daily number of infections in countries with a large number of confirmed cases. Finding the most accurate forecasting model would help allocate medical resources, handle the spread of the pandemic and better prepare health care systems. We compare the forecasting performance of linear and nonlinear forecasting models using daily COVID-19 data for the period between 22 February 2020 and 10 January 2022. We discuss various forecasting approaches, including an Autoregressive Integrated Moving Average (ARIMA) model, a Nonlinear Autoregressive Neural Network (NARNN) model, a TBATS model and Exponential Smoothing, applied to these data, and compare their accuracy using the data collected from 26 March 2020 to 04 April 2020, choosing the model with the lowest Mean Absolute Percentage Error (MAPE) value. Since the linear models do not easily follow the nonlinear patterns of daily confirmed COVID-19 cases, we also considered an Artificial Neural Network (ANN), a class of models that has been successfully applied to nonlinear forecasting problems. The selected model has been used for daily prediction of COVID-19 cases for the next 20 days without any additional intervention. The prediction model can be applied to other countries struggling with the COVID-19 pandemic and to any possible future pandemics.

3 years, it has been having a significant worldwide negative impact on all fields.
Thus, predicting future COVID-19 infections can be extremely useful, as it may enhance public health decision-making, including decisions on interventions against the spread of the pandemic. Using appropriate models and consistently making accurate projections can help countries to better allocate their resources and prepare for the future. Estimating possible future values of the pandemic, in terms of the number of infection cases, the evolution of the spread of the virus or the number of deaths, can help countries prepare their health care systems, whether they are among the most affected by the pandemic or have only recently been struggling with its spread.
Many models for forecasting the global and local spread of infection cases have been developed since the beginning of the pandemic. In this article, we provide forecasts for the new confirmed COVID-19 cases in Italy using four different time-series forecasting models and compare their performance to analyze the progression of the cases based on the daily reported data. We aim to forecast total confirmed COVID-19 cases through a comparison of the performance of these models and provide an analysis of the forecast errors, with the objective of forming a clear expectation of future cases and improving the preparedness of health care systems.
The purpose of our work is to determine the best forecasting model for the spread of Coronavirus infection data in a certain region for a given period of time.
Several studies have tried to predict the evolution of the COVID-19 pandemic using a variety of models. Khan and Gupta [1] applied an ARIMA (1, 1, 0) and a nonlinear autoregressive (NAR) model to Indian COVID-19 infection cases for a daily prediction of new cases 50 days ahead, preferring the linear ARIMA model over the NAR model because the most recent Indian COVID-19 new cases followed a linear trend. Batista [2] used the logistic model to predict the number of cases in China, South Korea and the rest of the world during the first semester of 2020, before the second wave occurred. Abotaleb and Makarovskikh [3] predicted future COVID-19 cases in Russia through a hybrid system, considering linear models (ARIMA and Exponential Smoothing) and nonlinear models (BATS, TBATS) for data collected until March 2021. Safi and Sanusi [4] applied an ARIMA model to predict COVID-19 cases for data collected during the first and the second pandemic wave, dividing the time series into two parts. Gecili et al. [5] applied ARIMA, Smoothing Spline and TBATS models to COVID-19 pandemic data for the USA and Italy for the period February-April 2020, preferring the first two linear models to the third. Salaheldin and Abotaleb [6] chose the exponential growth model over ARIMA for making predictions on daily COVID-19 cases in China, Italy and the USA, not considering the nonlinear models as possible forecasting models, due to the fact that in these countries COVID-19 new cases had a nonlinear trend. Tian et al. [7]

Methodology
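We considered data published online from Superior Health Institute on Epide-

The Mean Absolute Percentage Error (MAPE) is defined as

$$\mathrm{MAPE} = \frac{100}{n} \sum_{t=1}^{n} \left| \frac{A_t - F_t}{A_t} \right|$$

where n is the total number of observations, A_t is the actual value and F_t is the forecast value.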
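As an illustration of the accuracy measure, the following minimal Python sketch computes the MAPE from two equally long sequences. The function name `mape` and the input checks are our own assumptions and are not part of the original analysis, which was carried out in R.

```python
def mape(actual, forecast):
    """Mean Absolute Percentage Error, in percent.

    actual   -- observed values A_t (must be non-zero)
    forecast -- forecast values F_t, same length as `actual`
    """
    if len(actual) != len(forecast) or len(actual) == 0:
        raise ValueError("actual and forecast must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    n = len(actual)
    # Average of |A_t - F_t| / |A_t| over all observations, scaled to percent.
    return 100.0 / n * sum(abs((a - f) / a) for a, f in zip(actual, forecast))
```

Because each error is divided by the actual value, the measure is scale-free, which is what allows it to be compared across different forecasting methods applied to the same series.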

ARIMA Model
The first model is ARIMA (Auto-Regressive Integrated Moving Average), which is the most common model for time series forecasting. It represents a time series as a function of its own past values (lags) and of the lagged forecast errors, in order to forecast future values. An ARIMA model is composed of three terms, p, d and q, where p is the order of the Auto-Regressive (AR) term and refers to the number of lags of y used as predictors, q is the order of the Moving Average (MA) term and refers to the number of lagged errors used as predictors, and d is the number of differencing operations required to make the time series stationary. More than one differencing step may be required, depending on the complexity of the series. The most common approach to making a series stationary is to subtract the previous value from the current value. So, d is the minimum number of differencing operations needed to make the series stationary, and if the time series is already stationary, then d = 0.
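The differencing step described above can be sketched as follows. This is a minimal Python illustration under our own naming (`difference` is a hypothetical helper), not the routine used internally by any ARIMA implementation.

```python
def difference(series, d=1):
    """Apply first-order differencing d times.

    Each pass replaces the series by y_t - y_{t-1}, so the result
    is d elements shorter than the input. With d = 0 the series is
    returned unchanged (as a new list).
    """
    result = list(series)
    for _ in range(d):
        # Subtract the previous value from the current value.
        result = [curr - prev for prev, curr in zip(result, result[1:])]
    return result
```

For example, a quadratic sequence such as 1, 4, 9, 16, 25 becomes 3, 5, 7, 9 after one pass and the constant sequence 2, 2, 2 after two, which illustrates why some series need d > 1.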
The principal objective of the ARIMA model is to forecast future values by recognizing the stochastic mechanism of the time series. Although ARIMA is widely used for time series analysis, choosing appropriate orders for its components is not easy, so we determined the orders automatically, using the auto.arima function from the forecast package in R, which returns the best ARIMA model. This includes identifying the most suitable lags for the AR and MA components and deciding whether the variable needs differencing to induce stationarity. The model that best fitted our time series data was ARIMA (2, 1, 2). The model fit was evaluated using the Akaike Information Criterion (AIC):

$$\mathrm{AIC} = 2k - 2\ln(L)$$

where L indicates the likelihood and k is the number of parameters.

Once the model has been identified and its parameters have been estimated, it can be used for forecasting purposes. Its adequacy is checked using statistical tests and residual plots, which can also be used to analyze the suitability of various models to historical data.
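To make the role of an information criterion in order selection concrete, the sketch below computes the AIC from a log-likelihood and picks the candidate with the smallest value. It is only an illustration: the helper names and the dictionary layout are assumptions of ours, the candidate models must be fitted elsewhere, and auto.arima itself performs a more elaborate search.

```python
def aic(log_likelihood, k):
    """Akaike Information Criterion: AIC = 2k - 2 ln(L).

    log_likelihood -- ln(L), the maximized log-likelihood of the model
    k              -- number of estimated parameters
    """
    return 2 * k - 2 * log_likelihood


def select_by_aic(candidates):
    """Return the name of the candidate with the lowest AIC.

    candidates -- dict mapping a model label to a (log_likelihood, k) pair
    """
    return min(candidates, key=lambda name: aic(*candidates[name]))
```

A lower AIC indicates a better trade-off between goodness of fit and the number of parameters, so a model with a slightly lower likelihood can still be preferred if it is more parsimonious.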

Holt's Linear Trend
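The linear exponential smoothing model (Holt's method) uses two smoothing parameters to forecast future values: the first parameter is used for the overall level and the second for the trend.

$$\hat{y}_{t+h} = l_t + h\, b_t$$
$$l_t = \alpha y_t + (1 - \alpha)(l_{t-1} + b_{t-1})$$
$$b_t = \beta (l_t - l_{t-1}) + (1 - \beta) b_{t-1}$$

The first equation gives the h-step-ahead forecast, the second indicates the level equation and the third indicates the trend equation, where α indicates the smoothing parameter for the level, 0 ≤ α ≤ 1, β is the smoothing parameter for the trend, 0 ≤ β ≤ 1, y_t is the observed value at time t, l_t indicates the level of the time series at time t, and b_t is the time series trend at time t.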
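A minimal Python sketch of Holt's linear trend recursions is given below. The initialization (level set to the first observation, trend set to the first difference) is one common convention that we assume here for illustration; fitted implementations usually estimate the initial states and the parameters α and β from the data.

```python
def holt_forecast(y, alpha, beta, horizon):
    """Forecast `horizon` steps ahead with Holt's linear trend method.

    y       -- observed series (at least two values)
    alpha   -- smoothing parameter for the level, 0 <= alpha <= 1
    beta    -- smoothing parameter for the trend, 0 <= beta <= 1
    """
    if len(y) < 2:
        raise ValueError("at least two observations are required")
    # Assumed initialization: first value as level, first difference as trend.
    level = y[0]
    trend = y[1] - y[0]
    for value in y[1:]:
        previous_level = level
        # Level equation: blend the new observation with the previous projection.
        level = alpha * value + (1 - alpha) * (previous_level + trend)
        # Trend equation: blend the latest level change with the previous trend.
        trend = beta * (level - previous_level) + (1 - beta) * trend
    # Forecast equation: extend the last level along the last trend.
    return [level + h * trend for h in range(1, horizon + 1)]
```

On a perfectly linear series the recursions reproduce the line exactly, so the forecasts simply continue it; on noisy data, larger values of α and β make the level and trend react faster to recent observations.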

Nonlinear Autoregressive Neural Networks (NARNN)
Artificial Neural Networks are forecasting models inspired by biological neural networks. They identify and model nonlinear relationships between the response variable and its predictors. A collection of neurons, grouped in input, hidden and output layers to form the artificial network, can perform a large number of complex tasks quite efficiently [8]. This makes ANNs a powerful tool, able to learn from previous examples and improve their performance, which gives them the ability to analyze new data based on previous results. Artificial Neural Networks are nonlinear models that map a set of input variables into a set of output variables, through hidden layers of neurons.
An ANN is composed of several layers:
- The first layer, known as the input layer, is the one that takes the data in input.
- The last layer, called the output layer, gives the results of the analysis or the solution to the problem.
- The data flow from the input layer to the output layer through one or more intermediate layers called hidden layers. This is where the data are analyzed and the requested outputs are produced. The nodes of the hidden layers detect the features in the pattern of the data and the nonlinear relationships between them. Then, the requested output is sent from the hidden layer to the output layer.
In designing a neural network, we must determine the following variables:
- The number of input nodes: corresponds to the number of variables of the input layer used to predict future values. In a time series forecasting problem, the number of input nodes corresponds to the number of lagged observations taken into consideration for the forecasting. It is preferred to use a small number of input nodes that is still sufficient to unveil the features of the data, as too few or too many input nodes can affect the learning or prediction capability of the network [9].
- The number of hidden layers and hidden nodes: usually, one hidden layer is enough for most forecasting problems. Two or more hidden layers may be preferred when a network with one hidden layer would require too many nodes, which can lead to unsatisfactory results or overfitting problems.
- The number of output nodes: depends directly on the considered problem. In a time series forecasting problem, the number of output nodes corresponds to the forecasting horizon, which can be one-step-ahead (using one output node) or multi-step-ahead.
There are two ways of making multi-step-ahead forecasts: the iterative method, in which the forecasted values are iteratively used as inputs for the next periods' forecasts and only one output node is necessary, and the direct method, which requires several output nodes to directly forecast each step into the future [9].
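The iterative method of multi-step-ahead forecasting can be sketched as follows. The function below is a minimal illustration in Python: `one_step_model` stands for any already fitted one-step-ahead predictor (a hypothetical callable, assumed here to take the last `n_lags` values and return the next one), and it is not tied to a specific neural network library.

```python
def iterative_forecast(history, one_step_model, n_lags, horizon):
    """Multi-step-ahead forecasting with a single-output model.

    history        -- observed series, at least n_lags values long
    one_step_model -- callable mapping a list of n_lags values to the next value
    n_lags         -- number of lagged observations used as inputs
    horizon        -- number of steps to forecast
    """
    if len(history) < n_lags:
        raise ValueError("history is shorter than the number of lags")
    window = list(history[-n_lags:])
    forecasts = []
    for _ in range(horizon):
        next_value = one_step_model(window)
        forecasts.append(next_value)
        # Slide the window: drop the oldest input and feed the forecast back in.
        window = window[1:] + [next_value]
    return forecasts
```

Since each forecast becomes an input for the following step, forecast errors can accumulate along the horizon; the direct method avoids this feedback at the cost of one output node per step.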
In our study, the NAR network was developed using the nnetar function of the R "forecast" package, which fits a neural network model to a time series [8], developed by Hyndman, O'Hara, and Wang. A NNAR (p, k) model, where p indicates the number of non-seasonal lags used as inputs and k the number of nodes in the hidden layer, can be described as an AR process with nonlinear functions. We chose a (28-5-1) network, with 28 lags as input nodes and 5 hidden layer nodes (Figure 4). It has the form of a feedforward three-layer ANN, where neurons have a one-way connection with the neurons of the next layers [10]. The data set was divided into a training set (70%) and a testing set (15%), while the last 8 days' data were used for the validation.
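To illustrate how a NNAR (p, k) structure relates to the data, the sketch below builds the lagged input/target pairs for p lags and counts the weights of a fully connected network with one hidden layer and bias terms. Both helpers are hypothetical names of our own and only approximate what a fitting routine does internally; scaling and training are omitted.

```python
def lag_matrix(series, p):
    """Build (inputs, targets) for an autoregressive network with p lags.

    Row i of `inputs` holds p consecutive observations and
    `targets[i]` is the observation that immediately follows them.
    """
    if len(series) <= p:
        raise ValueError("series must be longer than the number of lags")
    inputs = [list(series[t - p:t]) for t in range(p, len(series))]
    targets = [series[t] for t in range(p, len(series))]
    return inputs, targets


def count_weights(n_inputs, n_hidden, n_outputs=1):
    """Number of weights in a one-hidden-layer feedforward network.

    Each hidden node has one weight per input plus a bias, and each
    output node has one weight per hidden node plus a bias.
    """
    return (n_inputs + 1) * n_hidden + (n_hidden + 1) * n_outputs
```

Under these assumptions, a 28-5-1 network has (28 + 1) × 5 + (5 + 1) × 1 = 151 weights, which gives a sense of how many parameters must be estimated from the daily series.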

TBATS Model
The third was the TBATS (Trigonometric Exponential smoothing state-space model with Box-Cox transformation, ARMA errors, Trend and Seasonal component) model, which uses a combination of Fourier terms with an exponential smoothing state-space model and a Box-Cox transformation, in a completely automated manner. The unit of time used in modeling was day. The forecasting performance of all these models was evaluated using the mean absolute percentage error (MAPE), while the model fits were evaluated using AIC (Akaike Information Criterion), reported in Table 1.
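The Box-Cox transformation used within TBATS can be illustrated with the short Python sketch below. Here the transformation parameter `lam` is supplied by hand, which is an assumption made for illustration; in an automated TBATS fit this parameter is chosen by the procedure itself.

```python
import math


def box_cox(y, lam):
    """Box-Cox transform of a positive value y with parameter lam."""
    if y <= 0:
        raise ValueError("the Box-Cox transformation requires positive values")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1) / lam


def inv_box_cox(z, lam):
    """Inverse Box-Cox transform, mapping z back to the original scale."""
    if lam == 0:
        return math.exp(z)
    return (lam * z + 1) ** (1 / lam)
```

The transformation stabilizes the variance of a series whose fluctuations grow with its level, and forecasts produced on the transformed scale are mapped back with the inverse function.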

Results
Selection and accuracy measures for the forecasting models are reported in Table 1.
We chose the best forecasting model according to the MAPE value (Mean Absolute Percentage Error), as it is recommended as a unit for comparing accuracy when different methods are applied to the same time series. We considered the most accurate model to be the one with the lowest MAPE value for the considered period. The NNAR model has the lowest MAPE for the considered period (14.178%).
In Table 2, we report the MAPE for the last 8 days (testing data) for the cumulative COVID-19 data. We can observe that the NNAR model is again the best one for forecasting new COVID-19 cases in Italy. This confirms our choice of the best model for our time series.
We performed the forecasting for confirmed COVID-19 cases in Italy using the above models. We conducted a 20-days-ahead forecast (until 30 January 2022) and compared the forecasts with the testing data for 8 days (02 January 2022-10 January 2022). We applied the forecasting models to the confirmed cases for Italy for the last 8 days and compared the results with the actual COVID-19 data. We calculated the MAPE values from the absolute percentage differences between the actual data and the forecast values. The MAPE values for each forecasting model are reported in Table 3. Based on our analysis, we concluded that the predictions of the models were close to the real data. In particular, the NNAR model gave more accurate predictions, as its MAPE values were lower than those of the other models. For this model we observed decreasing MAPE values, in particular over the last 6 days of the testing period, where they decreased from about 13% to 1%. For the other predictive models, we observed higher MAPE values. ARIMA had a worse predictive performance for the first 4 days and the last 2 days, while TBATS was the worst forecasting model when comparing the MAPE values over the 8 days of testing data.
A visual representation of the forecasting is shown in the above figures. As can be seen from the graph, the predicted values follow the trend and the seasonality of our time series testing data. The confidence interval (marked in blue in Figure 5) indicates the range within which the forecasts can vary. If we compare the values of the eight days used as a test set, we notice that there are significant differences between the values predicted by the ARIMA model and the values observed in the collected data. This is emphasized by the value of the MAPE for the eight-day test set, which for the ARIMA model reaches 18.058%.
Figure 6 shows a graphical representation of the ARIMA (2, 1, 2) model error tests. From the error curve, it can be seen that the ARIMA model selected through the auto.arima function shows normally distributed errors with a relatively low autocorrelation between them.
For the construction of the NARNN model, the data were divided into two sets: a training set and a testing set. The training set was used to create the model, while the test set was used for the evaluation of the created model [11]. The network structure was chosen based on the results of Zhang et al. [9], who showed, through simulation, that the best network structure corresponds to one hidden layer with a maximum of two neurons. Since the network with 5 hidden neurons

Conclusions and Discussions
In