A Prediction Method Based on Improved Echo State Network for COVID-19 Nonlinear Time Series

This paper proposes a prediction method for COVID-19 nonlinear time series based on an improved Echo State Network. The method improves the Echo State Network in terms of the reservoir topology and the output weight matrix, and adopts an ABC (Artificial Bee Colony) algorithm based on crossover and crowding strategies to optimize the parameters. The proposed method is then simulated, and the results show that it has stronger prediction ability for COVID-19 nonlinear time series.


Introduction
The pneumonia caused by a novel coronavirus infection that broke out in Wuhan, Hubei Province, China, in December 2019 is known as COVID-19. A large number of nonlinear time series reflect the development trend of the epidemic, including newly confirmed cases, newly suspected cases, cumulative confirmed cases, existing suspected cases, cure rate and mortality rate. Predicting these nonlinear time series makes it possible to learn the development trend of COVID-19 in time, so that relevant personnel can take corresponding measures and strengthen prevention and control before the epidemic becomes severe, which is of vital importance for protecting people's lives.
The prediction method based on the Echo State Network (ESN) [1] is currently one of the main methods to predict nonlinear time series. The ESN is characterized by using a large-scale random sparsely connected network called "reservoir" as the hidden layer to deal with nonlinear and unstable time series.
Only the output weight from the reservoir to the output layer needs to be trained, which simplifies the training process of the network and avoids both the tendency to fall into local optima and the complex training algorithms of traditional neural networks [2]. At present, some scholars have begun to research improved ESN methods, mainly focusing on reservoir topology optimization and output weight optimization [3] [4].
In terms of the reservoir topology, the reservoir of the traditional ESN is a random network, which makes model training undirected. To solve this problem, Li et al. [5] proposed adopting an NW small world Echo State Network, with both randomness and regularity, as the reservoir of the ESN to predict nonlinear time series. This improves the adaptability and prediction accuracy of the prediction model, but the node connections of the NW small world Echo State Network are deterministic, so its prediction accuracy is limited for time series that are time-varying and ambiguous.
In terms of the output weight, the traditional ESN calculates the output weight with the pseudo-inverse method, which is prone to multicollinearity when solving high-dimensional linear regression [6]. To solve this problem, Wang et al. [7] [8] proposed using Ridge regression, Lasso regression and other linear regression methods to calculate the output weight by adding the $L_2$ norm and the $L_1$ norm. However, Ridge regression and Lasso regression are biased estimators that impose a greater penalty on larger output weights, so the model prediction is prone to overfitting [9]. An asymptotically unbiased regularization method is therefore needed to improve the prediction accuracy and generalization performance of the prediction model. Common asymptotically unbiased regularization methods include the SCAD (Smooth Clipped Absolute Deviation) regularization method [10] and the MCP (Minimax Concave Penalty) regularization method [11]. At present, some scholars have successfully applied the SCAD regularization method to the output weight optimization of the small world Echo State Network [12], which improves its prediction accuracy for nonlinear time series. However, the MCP regularization method has not yet been used to optimize the output weight of the small world Echo State Network. The penalty function of the MCP regularization method is minimax concave, which allows it to penalize both large and small output weights more appropriately, and it is more suitable for processing multi-dimensional nonlinear data [13].
Therefore, this paper proposes a prediction method based on improved Echo State Network for COVID-19 nonlinear time series, which improves the ESN from the optimization of the reservoir topology and the optimization of the output weight to improve the prediction accuracy for the COVID-19 nonlinear time series.

A Prediction Method Based on Improved Echo State Network for COVID-19 Nonlinear Time Series
The prediction method based on improved Echo State Network for COVID-19 nonlinear time series includes three parts: optimization of the reservoir topology, optimization of output weight and optimization of parameters.

Optimization of the Reservoir Topology
The improved small world network is used as the reservoir of the ESN to obtain the Small World Echo State Network (SWESN). Its topological structure is shown in Figure 1. It can be seen from Figure 1 that the SWESN has three layers: an input layer, a hidden layer (reservoir) and an output layer. The internal weight matrix $W_x$ of the reservoir in the improved small world network is obtained by establishing a functional relationship between the edge probability and the distance between nodes, and it does not change once determined. The edge probability value $p$ decreases exponentially as the distance between nodes increases. Here $p$ denotes the connection weight between nodes and its value range is [0, 1], $d$ denotes the Euclidean distance between nodes, $\alpha$ is used to adjust the distance sensitivity, and $\beta$ is used to adjust the overall density of the network.
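The distance-dependent construction described above can be sketched as follows. The functional form $p = \beta e^{-\alpha d}$ is an assumption chosen to match the description (exponential decay in $d$, with $\alpha$ controlling distance sensitivity and $\beta$ controlling density). The placement of nodes in the unit square, the uniform weight range and the names `edge_probability` and `build_reservoir` are hypothetical and serve only as an illustration.

```python
import math
import random

def edge_probability(d, alpha, beta):
    """Assumed form p = beta * exp(-alpha * d), clipped to [0, 1]."""
    return min(1.0, beta * math.exp(-alpha * d))

def build_reservoir(n_nodes, alpha, beta, seed=0):
    """Return (positions, W_x) for a distance-dependent random reservoir."""
    rng = random.Random(seed)
    # Hypothetical embedding: nodes placed uniformly in the unit square.
    pos = [(rng.random(), rng.random()) for _ in range(n_nodes)]
    W_x = [[0.0] * n_nodes for _ in range(n_nodes)]
    for i in range(n_nodes):
        for j in range(n_nodes):
            if i == j:
                continue  # no self-connections in this sketch
            d = math.dist(pos[i], pos[j])  # Euclidean distance between nodes
            if rng.random() < edge_probability(d, alpha, beta):
                # Hypothetical choice: connection weights drawn from [-1, 1].
                W_x[i][j] = rng.uniform(-1.0, 1.0)
    return pos, W_x

pos, W_x = build_reservoir(n_nodes=20, alpha=3.0, beta=0.8, seed=42)
n_edges = sum(1 for row in W_x for w in row if w != 0.0)
print(f"{n_edges} connections among 20 nodes")
```

Nearby nodes are connected more often than distant ones, which gives the clustered structure expected of a small world reservoir. Rescaling $W_x$ to a target spectral radius, which is common practice for ESNs, is left out of this sketch.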
In the SWESN, the input variable, the state variable and the output variable $y(t)$ are defined at each time $t$; the activation function $f$ usually takes the hyperbolic tangent function $\tanh$; and the network is parameterized by the input weight matrix, the internal weight matrix $W_x$ of the reservoir and the output weight matrix $W_{out}$. The input weight matrix is randomly generated and does not change once determined.
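As an illustration of how the reservoir state evolves, the following minimal sketch uses the standard ESN update $x(t+1) = \tanh(W_{in} u(t+1) + W_x x(t))$. This form is an assumption taken from the common ESN formulation, not from the equations of this paper, and the names `run_reservoir`, `U`, `W_in` and `x0` are hypothetical.

```python
import numpy as np

def run_reservoir(U, W_in, W_x, x0=None):
    """Collect reservoir states for an input sequence U of shape (T, n_in).

    Assumed update: x(t+1) = tanh(W_in @ u(t+1) + W_x @ x(t)).
    """
    n_res = W_x.shape[0]
    x = np.zeros(n_res) if x0 is None else x0
    states = np.empty((U.shape[0], n_res))
    for t, u in enumerate(U):
        x = np.tanh(W_in @ u + W_x @ x)  # fixed W_in and W_x, never trained
        states[t] = x
    return states

rng = np.random.default_rng(0)
U = rng.normal(size=(100, 1))               # toy one-dimensional input
W_in = rng.uniform(-0.5, 0.5, size=(30, 1))
W_x = rng.uniform(-0.1, 0.1, size=(30, 30))
X = run_reservoir(U, W_in, W_x)
print(X.shape)
```

The collected state matrix is what the output weight is later fitted on, since only the readout is trained.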

Optimization of Output Weight
The output weight matrix $W_{out}$ is obtained during training, that is, $W_{out}$ is the matrix that minimizes the objective function, as shown in Equation (4), and it is obtained by the least squares method, as shown in Equation (5), where $(X, Y)$ denotes the training sample and $X^{\dagger}$ denotes the pseudo-inverse of $X$. The process of optimizing the output weight is shown in Figure 2.
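A minimal sketch of the pseudo-inverse solution is given below. It assumes that the rows of $X$ are collected reservoir states and the rows of $Y$ are the corresponding targets, so that predictions are computed as `X @ W_out`; the paper's own matrix orientation may differ. The function name `train_output_weights` is hypothetical.

```python
import numpy as np

def train_output_weights(X, Y):
    """Least squares readout W_out = pinv(X) @ Y.

    Assumed shapes: X is (T, n_res), Y is (T, n_out), W_out is (n_res, n_out).
    """
    return np.linalg.pinv(X) @ Y

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 5))        # toy state matrix
W_true = rng.normal(size=(5, 1))
Y = X @ W_true                      # noise-free toy targets
W_out = train_output_weights(X, Y)
print(np.allclose(W_out, W_true))
```

When the columns of $X$ are strongly correlated, this solution becomes unstable, which is the multicollinearity issue that motivates adding a penalty term.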
It can be seen from Figure 2 that the output weight matrix optimization process has three steps. First, the objective function of the output weight with the MCP penalty term is obtained by MCP (Minimax Concave Penalty). Then, an objective function that is differentiable at the origin is obtained by LQA (Local Quadratic Approximation). Finally, the optimized output weight is obtained by Ridge regression.
The MCP penalty is singular (non-differentiable) at the origin and can therefore produce sparse solutions. When $|\theta| > \gamma\lambda$, the penalty becomes constant, so its derivative is 0 and large coefficients are not shrunk further, which satisfies the approximately unbiased estimation of the variable $\theta$. In the MCP function, $\gamma$ and $\lambda$ are adjustable hyperparameters ($\gamma > 2$, $\lambda > 0$), and $\theta$ is a parameter vector, which denotes the output weight $W_{out}$ in this paper. The objective function of the output weight with the MCP penalty term is then obtained, as shown in Equation (7), and the estimated matrix $\hat{W}_{out}$ is the matrix that minimizes the objective function with the MCP penalty term, as shown in Equation (8), where $J$ denotes the number of variables and $\rho_{\lambda,\gamma}$ denotes the penalty function.
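For reference, the standard definition of the MCP penalty can be sketched as below: $\rho_{\lambda,\gamma}(\theta) = \lambda|\theta| - \theta^2/(2\gamma)$ for $|\theta| \le \gamma\lambda$ and $\gamma\lambda^2/2$ otherwise. This is the usual textbook form and is assumed to agree with the paper's own MCP equation; the function names are hypothetical.

```python
def mcp_penalty(theta, lam, gamma):
    """Standard MCP penalty for a single coefficient theta."""
    a = abs(theta)
    if a <= gamma * lam:
        return lam * a - a * a / (2.0 * gamma)
    return 0.5 * gamma * lam * lam  # constant for large coefficients

def mcp_derivative(theta, lam, gamma):
    """Derivative of the penalty with respect to |theta|."""
    # Equals lam - |theta| / gamma up to the threshold gamma * lam, then 0.
    return max(0.0, lam - abs(theta) / gamma)

for theta in (0.0, 1.0, 3.0, 10.0):
    print(theta, mcp_penalty(theta, 1.0, 3.0), mcp_derivative(theta, 1.0, 3.0))
```

The derivative starts at $\lambda$ near the origin, as with Lasso, and falls linearly to zero at $|\theta| = \gamma\lambda$, so large output weights receive no additional shrinkage.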
LQA is chosen to approximately decompose the MCP penalty function and obtain an approximate solution of the model, since the MCP penalty function is not differentiable at the origin. In the resulting approximation, $D$ denotes the number of non-zero elements in $W_{out}$, and the estimated output weight matrix $\hat{W}_{out}$ can be obtained by repeatedly executing the Ridge regression solution to Equation (10), as shown in Equation (11). Finally, the optimized output weight matrix $W_{out}$ is obtained by iterating Equation (11).
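The iteration can be illustrated with a generic LQA scheme, in which the penalty is replaced around the current estimate by a quadratic with weights $\rho'(|\theta_j|)/|\theta_j|$, giving a ridge-type update. The sketch below rests on several assumptions that are not taken from the paper: a single output vector `y`, the objective $\frac{1}{2}\|y - Xw\|^2 + \sum_j \rho_{\lambda,\gamma}(w_j)$, a lightly regularized least squares start, and a threshold `eps` below which coefficients are fixed at zero. The function names are hypothetical.

```python
import numpy as np

def mcp_derivative(abs_theta, lam, gamma):
    """Derivative of the standard MCP penalty with respect to |theta|."""
    return np.maximum(0.0, lam - abs_theta / gamma)

def lqa_mcp_ridge(X, y, lam, gamma, n_iter=50, eps=1e-8, tol=1e-8):
    """Iterated ridge-type solution of an MCP-penalized least squares problem."""
    n_feat = X.shape[1]
    # Assumed starting point: lightly regularized least squares.
    w = np.linalg.solve(X.T @ X + 1e-6 * np.eye(n_feat), X.T @ y)
    for _ in range(n_iter):
        active = np.abs(w) > eps          # coefficients not yet set to zero
        w_new = np.zeros_like(w)
        if active.any():
            Xa = X[:, active]
            a = np.abs(w[active])
            # LQA weights: rho'(|w_j|) / |w_j| on the diagonal.
            D = np.diag(mcp_derivative(a, lam, gamma) / a)
            w_new[active] = np.linalg.solve(Xa.T @ Xa + D, Xa.T @ y)
        converged = np.max(np.abs(w_new - w)) < tol
        w = w_new
        if converged:
            break
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))
w_true = np.array([3.0, 0.0, -2.0, 0.0])
y = X @ w_true + 0.01 * rng.normal(size=60)
print(np.round(lqa_mcp_ridge(X, y, lam=0.1, gamma=3.0), 3))
```

In this toy run the coefficients that are truly zero are driven to zero, while the large ones are left essentially unshrunk because their LQA weight vanishes beyond $\gamma\lambda$.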

Optimization of Parameters
In this paper, an ABC (Artificial Bee Colony) algorithm based on crossover and crowding strategies is adopted to optimize the $\gamma$ and $\lambda$ of MCP. The ABC algorithm is a global optimization algorithm based on swarm intelligence with fast convergence speed. Its intuition comes from the honey-gathering behavior of bee colonies: bees with different divisions of labor find the best solution to the problem by sharing and exchanging information. The crossover strategy is integrated to expand the search range over the whole parameter space, and the crowding strategy is integrated to eliminate similar solutions within the population. The training process of the ABC algorithm based on crossover and crowding strategies is shown in Figure 3, and the steps are as follows:
Step 1: Initialize. Set the size of the food source population as $Z$, the maximum number of iterations of the population as $Q$, the maximum evolution threshold as $H$, the crossover probability as $C$, the crowding factor as $P$, the crowding number as $P_a$, the current iteration count $q = 0$ and the current evolution threshold $h = 0$, and initialize $\gamma$ and $\lambda$ as the initial food sources (randomly generate a group of uniformly distributed adjustable hyperparameter combinations);
Step 2: Calculate the fitness value $F$ according to the training error of the sample;
Step 4: Perform the crowding strategy. First, randomly select $P$ food sources as crowding factors after normalizing $\gamma$ and $\lambda$, and then calculate the difference between the other food sources and the crowding factors and sort the differences in ascending order. Finally, eliminate the first $P_a$ food sources and randomly generate $P_a$ new food sources to ensure that the population size does not change;
Step 5: Perform the crossover strategy. First, randomly select two food sources from the population and determine their number of digits, and then repeat the following crossover operation until every digit of the food sources has been processed, yielding two new food sources: select the $i$-th digit of the food sources and randomly generate $j \in [0, 1]$. If $j > C$, the digit is left unchanged; otherwise, perform the crossover operation (exchange the $i$-th values of the two food sources);
Step 6: Eliminate the worst food sources and reinitialize the stagnant food sources. First, sort the food sources in the current population in descending order of fitness value, and then reinitialize a certain proportion of the food sources with the lowest fitness values in the current population. Finally, record and store the food sources of the current generation in a set. If a food source already exists in the set, set $h = h + 1$. When $h = H$, set $h = 0$ and retain the optimal solution of the current generation. If the optimal solution of the current generation is better than the historical optimal solution, it replaces the historical one; otherwise, the historical optimal solution is retained and entered into the next generation population;
Step 7: Let $q = q + 1$. If $q \le Q$, jump to Step 2; otherwise, jump to Step 8;
Step 8: Output the historical optimal solution and end the training.
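The crowding and crossover steps above can be sketched as follows. The text does not fix several details, so this sketch makes assumptions: each food source is a pair $(\gamma, \lambda)$ whose two values play the role of the digits, the difference to the crowding factors is measured as the smallest Euclidean distance after normalization, and the search bounds are hypothetical. The function names `crossover` and `crowding` are also hypothetical.

```python
import math
import random

def crossover(a, b, C, rng):
    """Position-wise crossover of two food sources with probability C."""
    a, b = list(a), list(b)
    for i in range(len(a)):
        j = rng.random()          # j in [0, 1)
        if j <= C:                # if j > C the i-th value is left unchanged
            a[i], b[i] = b[i], a[i]
    return a, b

def crowding(population, P, P_a, bounds, rng):
    """Replace the P_a food sources closest to P randomly chosen crowding factors."""
    def normalize(src):
        return [(v - lo) / (hi - lo) for v, (lo, hi) in zip(src, bounds)]

    idx = list(range(len(population)))
    factors = rng.sample(idx, P)
    others = [i for i in idx if i not in factors]

    def gap(i):
        # Assumed measure: smallest normalized distance to any crowding factor.
        ni = normalize(population[i])
        return min(math.dist(ni, normalize(population[f])) for f in factors)

    others.sort(key=gap)          # ascending: most similar sources first
    new_pop = [list(s) for s in population]
    for i in others[:P_a]:
        # Regenerate uniformly so that the population size does not change.
        new_pop[i] = [rng.uniform(lo, hi) for lo, hi in bounds]
    return new_pop

rng = random.Random(0)
bounds = [(2.0, 10.0), (0.0, 1.0)]   # hypothetical ranges for gamma and lambda
pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(8)]
pop = crowding(pop, P=2, P_a=3, bounds=bounds, rng=rng)
child_a, child_b = crossover(pop[0], pop[1], C=0.5, rng=rng)
print(child_a, child_b)
```

Replacing the sources nearest to the crowding factors removes near-duplicate solutions, while crossover recombines $\gamma$ and $\lambda$ values from different sources to widen the search.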
B. T. Liu et al.

Test Data
The proposed prediction method is applied to the COVID-19 nonlinear time series for one-step prediction. The COVID-19 nonlinear time series is derived from the National Epidemiological Map Platform, including newly confirmed cases, newly suspected cases, cure rate and mortality rate in China.

Simulation Analysis
Simulation tests and analysis were carried out after normalization of the COVID-19 nonlinear time series. In the NRMSE used to evaluate the predictions, $N$ denotes the length of the predicted time series, $Y_n$ denotes the target data, $\bar{Y}$ denotes the average of the target data and $\hat{Y}_n$ denotes the predicted data. The ESN and its improved variants, seven methods in total, were each used for 30 simulation experiments, and the error results were averaged. The predicted NRMSE results are shown in Tables 1-4. It can be seen from Tables 1-4 that the prediction method proposed in this paper has lower NRMSE than the other six prediction methods for newly confirmed cases, newly suspected cases, cure rate and mortality rate, which shows that the proposed method has stronger prediction ability for COVID-19 nonlinear time series.
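One common definition of the NRMSE that uses exactly these quantities is $\sqrt{\sum_{n=1}^{N} (\hat{Y}_n - Y_n)^2 / \sum_{n=1}^{N} (Y_n - \bar{Y})^2}$. The sketch below implements this variant as an assumption; the normalization used in the paper may differ, and the function name `nrmse` is hypothetical.

```python
import math

def nrmse(target, predicted):
    """NRMSE as squared error relative to the spread of the target (assumed variant)."""
    n = len(target)
    mean = sum(target) / n
    num = sum((p - y) ** 2 for y, p in zip(target, predicted))
    den = sum((y - mean) ** 2 for y in target)
    return math.sqrt(num / den)

target = [1.0, 2.0, 3.0, 4.0]
print(nrmse(target, target))            # perfect prediction
print(nrmse(target, [2.5] * 4))         # always predicting the mean
```

Under this definition a value of 0 corresponds to a perfect prediction, and a value of 1 corresponds to always predicting the mean of the target data, so lower values indicate better predictions.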

Conclusion
In this paper, a prediction method based on improved Echo State Network for