Hybrid Support Vector Regression with Parallel Co-Evolution Algorithm Based on GA and PSO for Forecasting Monthly Rainfall

Accurate and timely monthly rainfall forecasting is a major challenge for the scientific community in hydrological research, for example in river management projects and the design of flood warning systems. Support Vector Regression (SVR) is a very useful precipitation prediction model. In this paper, a novel parallel co-evolution algorithm that hybridizes the Genetic Algorithm (GA) and Particle Swarm Optimization (PSO), namely SVRGAPSO, is presented to determine appropriate SVR parameters for monthly rainfall prediction. The parallel co-evolutionary framework iterates a GA population and a PSO population simultaneously and provides a mechanism for information exchange between the two populations to overcome premature convergence to local optima. The proposed technique is applied to rainfall forecasting to test its generalization capability and to compare it with competing techniques, namely SVRPSO (SVR with PSO), SVRGA (SVR with GA), and the plain SVR model. The empirical results indicate that SVRGAPSO has superior generalization capability, with the lowest prediction error values, and can significantly improve rainfall forecasting accuracy. Therefore, the SVRGAPSO model is a promising alternative for rainfall forecasting.


Introduction
How to cite this paper: Wu, J.S. and Xie, Y.S. (2019) Hybrid Support Vector Regression with Parallel Co-Evolution Algorithm Based on GA and PSO for Forecasting Monthly Rainfall. Journal of Software Engineering and Applications, 12, 524-539.
Monthly rainfall time series exhibit non-stationary characteristics; that is, their statistical distributions change over time. The structural changes of monthly rainfall may be caused by various atmospheric physical processes, such as atmospheric physics, temperature physics, the pressure field and the sea temperature field. Accurate and timely monthly rainfall forecasting is therefore one of the most difficult tasks in the hydrological cycle for both water quantity and quality management [1] [2] [3]. Several recent studies have developed monthly rainfall forecasting methods based on atmospheric physics models. However, quantitative rainfall forecasting with such models is extremely difficult, because it involves many nonlinear variables that are interconnected in a very complicated way, and because of the volume of rainfall calculation involved [4] [5] [6]. The support vector machine (SVM), developed by Vapnik and his colleagues, is an important machine learning tool based on statistical learning theory and the principle of structural risk minimization. With the introduction of Vapnik's ε-insensitive loss function, the regression form of SVM, called support vector regression (SVR), has received increasing attention for solving nonlinear estimation problems [7] [8]. Because SVR is a specific type of learning algorithm characterized by the capacity control of the decision function, the use of the kernel function and the sparsity of the solution, it has been used for regression estimation, including monthly rainfall forecasting. These characteristics make SVR a promising alternative to traditional regression estimation approaches.
Although SVR was proposed relatively recently as a technique for machine learning problems, the literature on it is vast and growing. When SVR is used for regression estimation, many important research questions remain, such as how to choose its optimal parameters. Well-chosen kernel parameters improve the accuracy of the SVR estimate, whereas inappropriate parameters lead to over-fitting or under-fitting when SVR is applied to actual precipitation prediction. SVR hyper-parameters are often obtained through trial and error by the operator, so the effectiveness of an SVR application depends strongly on the operator's experience [7] [8] [9]. If the user is not careful, the model easily over-fits: it may do well in reproducing past events but be unable to predict future ones [10] [11]. Most studies depend on a cross-validation set to tune the parameters of the kernel function. It is therefore worthwhile to develop a parameter selection method that makes SVR less dependent on the skill of the experimenter.
Recently, several studies have optimized the parameters of the Gaussian kernel function with evolutionary methods, such as the Genetic Algorithm (GA) and Particle Swarm Optimization (PSO) [7] [8] [9] [10] [11], and achieved good application results. In this paper, a novel co-evolution algorithm based on the standard GA and an optimized PSO algorithm is presented as an efficient training algorithm for the parameters of the SVR kernel function. The aim is to overcome the shortcomings of the standard PSO algorithm and GA, which easily fall into local solutions and have low optimization efficiency. We use the co-evolution of "GA-PSO", in which GA and PSO iterate alongside each other so that the two populations can co-evolve, in order to search out high-quality association rules in the high-dimensional data set. The investigation presented in this paper is motivated by a desire to remedy the inefficiency of the search algorithms mentioned above in determining the parameters of the SVR model, using parallel co-evolution based on GA and PSO for monthly rainfall forecasting.
The present study proposes a novel parallel co-evolution algorithm that combines GA with PSO to optimize the SVR parameters, namely SVRGAPSO, based on a mechanism of information interaction between the GA and PSO populations as they iterate. Our approach determines the optimal kernel parameter values for the SVR model in monthly rainfall forecasting. The rainfall data of Nanning, Guangxi, China, are used as a case study to show the improvement in predictive accuracy and generalization capability achieved by the proposed SVRGAPSO model. Among the monthly rainfall forecasting models built with different approaches, the SVRGAPSO model achieves better generalization performance than the other regression estimation approaches. The rest of this study is organized as follows. Section 2 describes the SVRGAPSO, its ideas and procedures. For further illustration, different models are employed for rainfall forecasting analysis in Section 3, and conclusions are drawn in the final section.

Support Vector Regression
The basic ideas of SVR for the case of regression are introduced briefly. Suppose we are given training data $\{(x_i, y_i)\}_{i=1}^{N}$, where $x_i$ is the input vector, $y_i$ is the output value and $N$ is the total number of training samples [12]. The regression function is formulated as follows:

$$f(x) = \omega \cdot \phi(x) + b$$

where $f(x)$ denotes the forecasting values; $\phi(x)$ denotes the high-dimensional feature space, which is non-linearly mapped from the input space $x$; and the coefficients $\omega$ and $b$ are adjustable. The coefficients $\omega$ and $b$ can be estimated by minimizing the regularized risk function:

$$R(C) = C \, \frac{1}{N} \sum_{i=1}^{N} L_{\varepsilon}\big(y_i, f(x_i)\big) + \frac{1}{2} \|\omega\|^2, \qquad L_{\varepsilon}\big(y, f(x)\big) = \max\big(0, \, |y - f(x)| - \varepsilon\big)$$

Therefore, the objective of SVR is to include the training patterns inside an $\varepsilon$-insensitive tube while keeping the norm $\|\omega\|$ as small as possible. The parameter $\varepsilon$ bounds the difference between actual values and values calculated from the regression function that is tolerated without penalty. This difference can be viewed as a tube around the regression function.
In the regularized risk function, the loss term measures the empirical risk, and $C$ is a parameter determining the trade-off between the empirical risk and the model flatness. After the quadratic optimization problem with inequality constraints is solved, the SVR is given by:

$$f(x) = \sum_{i=1}^{N} (\alpha_i - \alpha_i^{*}) \, K(x_i, x) + b$$

where $\alpha_i$ and $\alpha_i^{*}$ are the Lagrangian multipliers associated with the constraints and $K(x_i, x)$ is called the kernel function. As the kernel function defines the feature space in which the decision function is constructed, exploring useful kernel functions constitutes a significant topic in SVR application. The most widely used kernel function is the Gaussian radial basis function (RBF) with the parameter $\sigma$, which is commonly written as:

$$K(x_i, x_j) = \exp\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right)$$

By using kernel functions to compute the dot product in feature space, SVR can efficiently and effectively construct many types of nonlinear functions for regression estimation. The Gaussian RBF kernel is not only easy to implement but also capable of non-linearly mapping the training data into an infinite-dimensional space. Thus, it is suitable for dealing with nonlinear relationships.
Therefore, the Gaussian RBF kernel function is specified in this study.
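As a minimal sketch of the quantities discussed in this section, the following Python functions evaluate a Gaussian RBF kernel, the ε-insensitive loss and the kernel expansion of a trained SVR. The function names are illustrative, and the kernel scaling exp(-||x - z||^2 / (2σ^2)) is an assumption; other packages parameterize the width differently (LIBSVM, for instance, uses gamma in exp(-gamma * |u - v|^2)).

```python
import math

def rbf_kernel(x, z, sigma):
    """Gaussian RBF kernel exp(-||x - z||^2 / (2 * sigma^2)) (assumed scaling)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def eps_insensitive_loss(y_true, y_pred, eps):
    """Zero inside the tube of half-width eps, linear outside it."""
    return max(0.0, abs(y_true - y_pred) - eps)

def svr_predict(x, support_vectors, dual_coefs, b, sigma):
    """Kernel expansion f(x) = sum_i coef_i * K(x_i, x) + b,
    where coef_i stands for (alpha_i - alpha_i*)."""
    return sum(c * rbf_kernel(sv, x, sigma)
               for sv, c in zip(support_vectors, dual_coefs)) + b
```

An error smaller than ε contributes nothing to the empirical risk, which is why a larger ε leaves fewer training points as support vectors.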
SVR based on the radial basis kernel function has three parameters to be determined: C, which trades off the model flatness against the degree of training error; ε, the width of the insensitive loss function; and σ, the bandwidth of the Gaussian kernel function. For example, if C is too large (tending to infinity), then the objective is to minimize the empirical risk only. Parameter ε controls the width of the ε-insensitive zone and hence the number of support vectors (SVs) employed in the regression. A larger ε value implies that fewer SVs are employed; thus, the regression function is simpler [

Genetic Algorithm and Particle Swarm Optimization
The genetic algorithm (GA) is an adaptive optimization technique developed by Holland, based on natural evolution and survival of the fittest, that works on a population of individuals [17]. GA has been successfully applied to many optimization problems in scientific and engineering fields owing to its versatility and robustness. However, GA has two major shortcomings, slow convergence and a tendency to become trapped in local optima, which are mainly caused by the reduction of population diversity [18] [19].
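To make the GA operators concrete, the sketch below shows one possible real-coded crossover and mutation in Python. Arithmetic crossover and uniform mutation are assumptions chosen for illustration; the paper does not state which operators were used, and the function names are hypothetical.

```python
import random

def crossover(parent_a, parent_b, p_cross, rng):
    """Arithmetic crossover (assumed operator): with probability p_cross,
    blend the two parents gene by gene; otherwise copy them unchanged."""
    if rng.random() >= p_cross:
        return list(parent_a), list(parent_b)
    w = rng.random()
    child_a = [w * a + (1 - w) * b for a, b in zip(parent_a, parent_b)]
    child_b = [(1 - w) * a + w * b for a, b in zip(parent_a, parent_b)]
    return child_a, child_b

def mutate(individual, p_mut, bounds, rng):
    """Uniform mutation (assumed operator): each gene is redrawn inside
    its (low, high) bounds with probability p_mut."""
    return [rng.uniform(lo, hi) if rng.random() < p_mut else g
            for g, (lo, hi) in zip(individual, bounds)]
```

Mutation is the operator that reintroduces diversity once the population has clustered, which is the mechanism behind the diversity loss described above.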
PSO is a stochastic, population-based optimization algorithm introduced by James Kennedy and Russell C. Eberhart [20]. It has been used to solve real-time problems and has aroused researchers' interest owing to its flexibility and efficiency. PSO has many advantages, such as easy exchange and storage of information among all particles, a simple structure, quick convergence and easy implementation, and it has gained much attention and wide application in solving continuous nonlinear optimization problems.
However, the PSO algorithm depends greatly on its initial values, and swarm diversity drops rapidly as the number of iterations increases, which causes the swarm to become trapped in a local optimum, i.e., premature convergence; the global search capacity is affected accordingly. For high-dimensional multi-modal problems in particular, premature convergence occurs easily [21] [22] [23].
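The standard PSO update can be sketched in Python as follows. This is an illustrative sketch, not the authors' implementation: the linearly decreasing inertia weight schedule and the clipping of positions to the search bounds are assumptions, and the function names are hypothetical. The default values (inertia weight between 0.9 and 0.1, learning factors of 2.0) follow the parameter settings reported in the experiments section.

```python
import random

def inertia_weight(iteration, max_iter, w_max=0.9, w_min=0.1):
    """Linearly decreasing inertia weight (assumed schedule)."""
    return w_max - (w_max - w_min) * iteration / max_iter

def pso_step(position, velocity, p_best, g_best, w, bounds, rng, c1=2.0, c2=2.0):
    """One velocity and position update for a single particle.

    p_best is the particle's own best position, g_best the swarm's best.
    Positions are clipped to the (low, high) bounds of each dimension.
    """
    new_pos, new_vel = [], []
    for x, v, pb, gb, (lo, hi) in zip(position, velocity, p_best, g_best, bounds):
        v_new = w * v + c1 * rng.random() * (pb - x) + c2 * rng.random() * (gb - x)
        x_new = min(max(x + v_new, lo), hi)
        new_vel.append(v_new)
        new_pos.append(x_new)
    return new_pos, new_vel
```

When every particle's personal best coincides with the global best, both attraction terms vanish and only the decaying inertia term remains, which illustrates how the swarm loses diversity and converges prematurely.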

Parallel Co-Evolution Algorithm
The concept of co-evolution was first proposed by Ehrlich and Raven, who discussed the evolution between plants and herbivorous insects [24] [25] [26]. Its core idea is that the interaction of populations is an indispensable condition for the survival of each. Over a long-term evolutionary process, the populations are interdependent and coordinated, and they improve both individual and overall performance. A co-evolution algorithm uses multiple populations to change the traditional mode of searching for optimal solutions, which can avoid the defects of the dimension reduction method, local optima and premature convergence. In this paper, a novel parallel co-evolution of GA combined with PSO is presented for the parameter optimization problem, in which GA and PSO iterate alongside each other. With the co-evolution concept, the two populations can co-evolve in order to search out high-quality association rules in the high-dimensional data set. To achieve this, this paper designs an information exchange mechanism, named interoperability, that lets information pass between the two populations to achieve the purpose of co-evolution.

The Developed SVRGAPSO Approach
In the fitness function of Equation (6), $x_i$ is the i-th training sample, $N$ is the number of training data samples, $y_i$ is the actual value and $\hat{y}_i$ is the predicted value. The optimal parameter setting is critical to the predictive performance of the SVR model. In this paper, a parallel co-evolutionary algorithm based on GA combined with PSO is employed to simultaneously optimize the SVR parameters and the kernel function parameter, namely SVRGAPSO. Figure 1 illustrates the process of the SVRGAPSO algorithm for SVR optimization in rainfall modelling. The details of the proposed SVRGAPSO are described as follows:
Step1: Generate the initial populations. Two initial populations are randomly generated according to the target database. POP 1 and POP 2 use the search strategies of PSO and GA, respectively, to search for association rules. The two populations use the same coding rules, fitness function, population size and maximum number of generations. This paper uses real coding, in which the number of elements in an array of real numbers corresponds to the number of fields in the transaction database, and the element values represent the attribute values of the fields.
Step2: Initialize the two populations with the GA and PSO parameters: number of iterations, crossover probability, mutation probability, particle velocities and particle positions.
Step3: Input the training data and calculate the fitness according to Equation (6); G best and P best are determined by a simple comparison of fitness values. We then compare the fitness value of the global best individual Gpso in POP 1 with that of the best individual Gga in POP 2. The individual with the larger fitness value replaces the best individual of the other population, as the basis for the next generation of evolution.
The adjustment strategy for the crossover probability is given in Equation (7).
Step4: Check whether the termination condition is met. If the number of iterations has reached the maximum, the algorithm ends and outputs its result as described in Step 5; otherwise it continues with the next generation.
Step5: The velocities and positions of POP 1 are updated according to PSO, and POP 2 is evolved according to GA, to produce the next generation. Once the termination condition is met, the best solution is output as the optimal parameter setting for the SVR model, and the test samples are input to evaluate the prediction performance of the SVR model.
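The steps above can be sketched as a single loop in Python. This is a simplified illustration under several assumptions, not the authors' MATLAB implementation: the crossover probability is kept fixed rather than adapted by Equation (7), the GA uses binary tournament selection with elitism, arithmetic crossover and uniform mutation, and `toy_fitness` is a hypothetical stand-in for the fitness of an SVR trained with the parameters (C, ε, σ). The search bounds are likewise illustrative.

```python
import random

def coevolve(fitness, bounds, pop_size=40, max_iter=100,
             p_cross=0.8, p_mut=0.05, c1=2.0, c2=2.0, seed=0):
    """Maximize `fitness` with a PSO population and a GA population that
    exchange their best individual once per generation."""
    rng = random.Random(seed)
    dim = len(bounds)

    def random_individual():
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    # Step1/Step2: two random populations of equal size.
    pos = [random_individual() for _ in range(pop_size)]      # POP 1 (PSO)
    vel = [[0.0] * dim for _ in range(pop_size)]
    p_best = [list(p) for p in pos]
    p_fit = [fitness(p) for p in pos]
    ga_pop = [random_individual() for _ in range(pop_size)]   # POP 2 (GA)

    for it in range(max_iter):                                # Step4: stop at max_iter
        # Step3: evaluate, then let the better of the two best individuals
        # replace the best individual of the other population.
        ga_fit = [fitness(ind) for ind in ga_pop]
        i_pso = max(range(pop_size), key=lambda k: p_fit[k])
        i_ga = max(range(pop_size), key=lambda k: ga_fit[k])
        if p_fit[i_pso] > ga_fit[i_ga]:
            ga_pop[i_ga], ga_fit[i_ga] = list(p_best[i_pso]), p_fit[i_pso]
        else:
            p_best[i_pso], p_fit[i_pso] = list(ga_pop[i_ga]), ga_fit[i_ga]
        g_best = list(p_best[i_pso])

        # Step5 (PSO part): update POP 1 with a linearly decreasing inertia weight.
        w = 0.9 - (0.9 - 0.1) * it / max_iter
        for k in range(pop_size):
            for d, (lo, hi) in enumerate(bounds):
                vel[k][d] = (w * vel[k][d]
                             + c1 * rng.random() * (p_best[k][d] - pos[k][d])
                             + c2 * rng.random() * (g_best[d] - pos[k][d]))
                pos[k][d] = min(max(pos[k][d] + vel[k][d], lo), hi)
            value = fitness(pos[k])
            if value > p_fit[k]:
                p_best[k], p_fit[k] = list(pos[k]), value

        # Step5 (GA part): elitism, binary tournament, crossover, mutation.
        def tournament():
            a, b = rng.randrange(pop_size), rng.randrange(pop_size)
            return ga_pop[a] if ga_fit[a] >= ga_fit[b] else ga_pop[b]

        next_pop = [list(ga_pop[i_ga])]
        while len(next_pop) < pop_size:
            mother, father = tournament(), tournament()
            if rng.random() < p_cross:
                mix = rng.random()
                child = [mix * x + (1 - mix) * y for x, y in zip(mother, father)]
            else:
                child = list(mother)
            child = [rng.uniform(lo, hi) if rng.random() < p_mut else g
                     for g, (lo, hi) in zip(child, bounds)]
            next_pop.append(child)
        ga_pop = next_pop

    # Output the best individual found by either population.
    best = max(p_best + ga_pop, key=fitness)
    return best, fitness(best)

def toy_fitness(params):
    """Hypothetical stand-in for the SVR fitness; larger is better."""
    c, eps, sigma = params
    error = ((c - 10.0) / 100.0) ** 2 + (eps - 0.1) ** 2 + (sigma - 1.5) ** 2
    return 1.0 / (1.0 + error)

BOUNDS = [(0.1, 100.0), (0.001, 1.0), (0.01, 10.0)]   # assumed ranges for C, eps, sigma
best_params, best_fitness = coevolve(toy_fitness, BOUNDS)
```

In a real application `toy_fitness` would train an SVR with the candidate parameters and return a fitness derived from its prediction error. Because the exchange copies the better individual into the other population every generation, neither population can remain stuck at a solution worse than the best one the other has found.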

Application and Experiments Analysis
The SVRGAPSO approach was developed on a PC with the following features: Intel Core i7-8550U 1.80 GHz CPU, 32.0 GB RAM, the Windows 10 operating system and the MATLAB R2019a development environment. In this paper, the GA and PSO parameters are set as follows: the number of iterations is 100; the population size is 40; the crossover probability is 0.80; the mutation probability is 0.05; the minimum inertia weight is 0.1; the maximum inertia weight is 0.9; and the learning rate is 2.0.

Independent Variables of the Monthly Rainfall Model
The selection of independent variables is very important for a rainfall forecasting model. In this paper, the most commonly used variable selection method in meteoro-

Criteria for Evaluating Model Performance
This paper uses the following evaluation metrics to measure the performance of the proposed model: root mean square error (RMSE), mean absolute percentage error (MAPE) and coefficient of efficiency (CE), whose definitions can be found in many papers [8]. For the purpose of comparison with the same 12 input variables, we have also built three other monthly rainfall forecasting models: a pure SVR model, an SVR whose parameters are evolved by PSO alone (named SVRPSO) and an SVR whose parameters are evolved by GA alone (named SVRGA). To build the SVR rainfall forecasting model, the LIBSVM package proposed by Chang and Lin is adopted [27]; all SVR parameters are based on the RBF kernel type and chosen by the trial-and-error method. The parameters with the minimum testing RMSE are taken as optimal. The optimal parameters are based on the best testing and va-
For building the SVRPSO and SVRGA rainfall forecasting models, PSO is used to search for the optimal SVR parameter values as presented by Chen K., et al. [28], and GA is used to search for the optimal SVR parameter values as presented by Li W. M., et al. [15]. The monthly precipitation forecasting models are established by evolutionary selection of the optimal SVR parameters, based on the best testing and validation result (minimum RMSE), with GA and PSO respectively. These results are compared with the results of the co-evolutionary SVR, SVRGAPSO, to illustrate the performance of the different evolutionary algorithms.
four different models in Nanning, Guangxi from January 1992 to December 2011. Figure 5 shows a graphical representation of the testing data results of precipitation using four different models in Nanning, Guangxi from January 2012 to December 2017.
The main reason is that GA and PSO easily fall into local optima and cannot evolve to the optimal parameters. SVR is also prone to over-fitting through cross-validation, resulting in poor prediction results.
During the iterative process of GAPSO, GA and PSO exchange information between the two populations. The co-evolution algorithm is not only superior in search quality but also has a significant advantage in its ability to jump out of local optimal solutions and avoid premature convergence. We therefore obtain the global optimum of the SVR parameters with greater probability.
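The three evaluation criteria named in this section can be computed as in the following Python sketch. The formulas are the commonly used definitions and are an assumption here, since the paper defers the exact expressions to [8]; in particular, CE is taken to be one minus the ratio of the squared prediction error to the variance of the observations about their mean.

```python
import math

def rmse(actual, predicted):
    """Root mean square error."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

def mape(actual, predicted):
    """Mean absolute percentage error in percent; assumes no zero actual values."""
    n = len(actual)
    return 100.0 / n * sum(abs((a - p) / a) for a, p in zip(actual, predicted))

def coefficient_of_efficiency(actual, predicted):
    """CE = 1 - SSE / SST; 1 is a perfect fit, 0 matches predicting the mean."""
    mean_actual = sum(actual) / len(actual)
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    sst = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - sse / sst
```

Lower RMSE and MAPE and a CE closer to 1 indicate a better forecast. MAPE is undefined for months with zero observed rainfall, so such months need separate handling.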

Results Analysis
From the experiments presented in this paper we can draw the following con-

Conclusion
The rainfall system is one of the most active dynamic weather systems. This paper presents a parallel co-evolution algorithm in which GA and PSO exchange information between the two populations during evolutionary iteration to determine the parameters of SVR for rainfall forecasting. In terms of empirical results, across different models and evaluation criteria for the monthly rainfall test cases, the proposed SVRGAPSO forecasting technique performs best. In all testing cases, the RMSE of the proposed technique is the lowest and the CE is the highest, indicating that the SVRGAPSO forecasting technique can be used as a viable solution for monthly rainfall time series forecasting.