Stock index forecasting is regarded as a challenging task in financial time-series prediction. In this paper, the non-linear support vector regression (SVR) method was optimized for application to stock index prediction. The parameters (C, σ) of the SVR models were selected by three different methods: grid search (GRID), particle swarm optimization (PSO) and genetic algorithm (GA). The optimized parameters were used to predict the opening price of the test samples. The predictive results showed that the SVR model with GRID (GRID-SVR), the SVR model with PSO (PSO-SVR) and the SVR model with GA (GA-SVR) were able to reproduce the time-dependent trend of the stock index and achieved good prediction accuracy. The GA-SVR model gave the minimum root mean square error (RMSE) of 15.630 and the minimum mean absolute percentage error (MAPE) of 0.39%, with the corresponding optimal parameters (C, σ) identified as (45.422, 0.012). These modeling results provide a theoretical and technical reference for investors designing trading strategies.

Stock index forecast is a non-linear dynamic system. There are many factors affecting the stock index, which goes with the complex fluctuation [

A variety of machine learning methods have been applied to stock index forecasting. They perform well owing to their capacity for self-organization, self-learning and nonlinear approximation [

The CSI 300 index is a capitalization-weighted stock market index designed to replicate the performance of 300 stocks traded on the Shanghai and Shenzhen stock exchanges. We established SVR calibration models to predict the opening price of the CSI 300 index. To find the optimal parameter combination (C, σ), we used the grid search (GRID) method, particle swarm optimization (PSO) and the genetic algorithm (GA) for parameter selection. The opening price of the CSI 300 index was then predicted using the best parameters of the SVR model with GRID (GRID-SVR), the SVR model with PSO (PSO-SVR) and the SVR model with GA (GA-SVR). The rest of the paper is organized as follows. Section 1 describes the methodology, Section 2 illustrates the SVR modeling process and prediction results, and Section 3 contains the conclusions.

Daily trading data of the CSI 300 index were obtained from the Wind Financial Terminal One-Stop Platform. The daily trading data from January 4, 2013 to November 30, 2016 were selected as the sample set (949 trading days in total). The data originally include eight variables: opening price, highest price, lowest price, closing price, charge rate, volume, turnover and the margin balance of the China stock markets. We further calculated the 5-day average charge rate, the 20-day average charge rate, the 5-day average volume and the 20-day average volume as four new indicators to identify the index-changing trend. SVR models were established on the 949 samples with 12 variables to predict the opening price of the next day. The CSI 300 index daily opening price is shown in

Four fifths of the samples were selected for training and the remaining one fifth for testing. The calibration models were trained on the training samples, and their parameters were optimized. The trained models with their optimized parameters were then applied to predict the opening price of the test samples. The prediction performance was evaluated using the root mean square error (RMSE) and the mean absolute percentage error (MAPE), defined as follows:

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{t=1}^{n} (y_t - \hat{y}_t)^2}{n-1}}, \qquad \mathrm{MAPE} = \frac{1}{n} \sum_{t=1}^{n} \frac{|y_t - \hat{y}_t|}{|y_t|}$$

where $y_t$ represents the real opening price, $\hat{y}_t$ represents the predicted value of the opening price and $n$ is the total number of samples.
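The two metrics can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the study: the function names `rmse` and `mape` and the three sample prices are made up for the example, and the RMSE follows the n − 1 denominator given above.

```python
import math


def rmse(actual, predicted):
    """Root mean square error with the n - 1 denominator used in the text."""
    n = len(actual)
    if n < 2 or n != len(predicted):
        raise ValueError("need two equal-length sequences with at least 2 values")
    squared_error = sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted))
    return math.sqrt(squared_error / (n - 1))


def mape(actual, predicted):
    """Mean absolute percentage error as a fraction (0.0039 means 0.39%)."""
    n = len(actual)
    if n == 0 or n != len(predicted):
        raise ValueError("need two equal-length, non-empty sequences")
    return sum(abs(y - y_hat) / abs(y) for y, y_hat in zip(actual, predicted)) / n


# Made-up opening prices, for illustration only (not data from the study).
actual = [3300.0, 3320.0, 3310.0]
predicted = [3290.0, 3330.0, 3310.0]

example_rmse = rmse(actual, predicted)
example_mape = mape(actual, predicted)
```

Multiplying the value returned by `mape` by 100 gives the percentage form in which MAPE is reported.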

Stock index prediction requires establishing an optimal prediction function based on historical stock data and other influencing factors, in order to calculate the stock index price and reveal the trend of the stock index. The function is defined as follows:

$$y_{t+1} = f(x_1, x_2, \cdots, x_t),$$

where $y_{t+1}$ represents the next-day opening price and $x_1, x_2, \cdots, x_t$ are the input samples.

The SVR algorithm is used to estimate the function. The input data are mapped onto a high-dimensional feature space using a radial basis function (RBF) kernel. SVR is formulated as the following minimization problem,

$$\min_{\omega,\, \xi_i,\, \xi_i^*} \; \frac{1}{2} \|\omega\|^2 + C \sum_{i=1}^{l} \left( \xi_i + \xi_i^* \right),$$

where $\omega$ is the coefficient vector, and $\xi_i$ and $\xi_i^*$ represent the relaxation factors. This optimization formulation can be transformed into the dual problem, and its solution is given by

$$f(x) = \sum_{i=1}^{l} (\alpha_i^* - \alpha_i) K(x_i, x^*) + b^*,$$

with

$$K(x, x^*) = \exp\left( -\frac{\|x - x^*\|^2}{2\sigma^2} \right),$$

where the Lagrange multipliers $(\alpha_i, \alpha_i^*)$ are bounded by C, $K(x, x^*)$ is the RBF kernel and σ represents the kernel width. C and σ are tuned separately in the GRID-SVR, PSO-SVR and GA-SVR models to compare their predictive capabilities.
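The kernel and the dual-form prediction function can be written out directly. The sketch below is illustrative only: `rbf_kernel`, `svr_predict` and `sigma_to_gamma` are hypothetical helper names, and the support vector, coefficient and bias in the example are made up. The conversion helper is included because LIBSVM and scikit-learn write the RBF kernel as exp(−γ‖x − z‖²), so a width σ in the form above corresponds to γ = 1/(2σ²).

```python
import math


def rbf_kernel(x, z, sigma):
    """K(x, z) = exp(-||x - z||^2 / (2 * sigma^2))."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


def sigma_to_gamma(sigma):
    """Convert a kernel width sigma to the gamma of exp(-gamma * ||x - z||^2)."""
    return 1.0 / (2.0 * sigma ** 2)


def svr_predict(x, support_vectors, dual_coefs, bias, sigma):
    """Evaluate f(x) = sum_i (alpha_i^* - alpha_i) * K(x_i, x) + b.

    dual_coefs[i] holds the difference (alpha_i^* - alpha_i) for support
    vector i; all inputs here are illustrative, not fitted values.
    """
    weighted = sum(
        coef * rbf_kernel(sv, x, sigma)
        for sv, coef in zip(support_vectors, dual_coefs)
    )
    return weighted + bias


# Made-up one-support-vector model, for illustration only.
example_prediction = svr_predict(
    x=[1.0, 2.0],
    support_vectors=[[1.0, 2.0]],
    dual_coefs=[2.0],
    bias=1.0,
    sigma=0.5,
)
```

Because the query point coincides with the single support vector, the kernel evaluates to 1 and the prediction reduces to the coefficient plus the bias.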

The best parameter combination (C, σ) is determined according to the minimum mean square error (MSE) under 5-fold cross validation. The MSE is defined as follows,

$$\mathrm{MSE} = \frac{1}{n} \sum_{t=1}^{n} (y_t - \hat{y}_t)^2.$$
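A 5-fold cross-validated MSE can be sketched as follows. Several details are assumptions made for the example rather than facts from the study: the folds are contiguous and unshuffled, the per-fold MSE values are averaged, and the `fit`/`predict` callables are a hypothetical interface. The mean predictor stands in for an SVR model purely so the sketch runs on its own.

```python
def k_fold_indices(n_samples, k=5):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_val_mse(fit, predict, X, y, k=5):
    """Average the held-out MSE over k folds.

    fit(X_train, y_train) returns a model; predict(model, x) returns a number.
    """
    fold_mse = []
    for held_out in k_fold_indices(len(y), k):
        held = set(held_out)
        train_idx = [i for i in range(len(y)) if i not in held]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        errors = [(y[i] - predict(model, X[i])) ** 2 for i in held_out]
        fold_mse.append(sum(errors) / len(errors))
    return sum(fold_mse) / k


# Stand-in model (assumption): always predict the training mean.
def fit_mean(X_train, y_train):
    return sum(y_train) / len(y_train)


def predict_mean(model, x):
    return model


# Made-up constant targets, so the cross-validated MSE is exactly zero.
X_demo = [[float(i)] for i in range(10)]
y_demo = [1.0] * 10
demo_mse = cross_val_mse(fit_mean, predict_mean, X_demo, y_demo, k=5)
```

In a parameter search, this score would be computed once per candidate (C, σ) pair and the pair with the smallest value retained.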

In the grid search process, C and σ were each pre-set in the tuning range $\{2^{-8}, 2^{-7.5}, \cdots, 2^{7.5}, 2^{8}\}$. Together they form a two-dimensional grid of candidate pairs. The optimal parameters (C, σ) are determined by searching for the minimum MSE over this grid.
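The grid described above contains 33 values per parameter (exponents from −8 to 8 in steps of 0.5), that is, 33 × 33 = 1089 candidate pairs. A minimal sketch of the exhaustive search follows. The names `exponent_grid`, `grid_search` and `toy_objective` are hypothetical, and the toy objective is an arbitrary stand-in for the cross-validated MSE, with its minimum placed at (2³, 2⁻²) for demonstration only.

```python
import math


def exponent_grid(low=-8.0, high=8.0, step=0.5):
    """Return [2^low, 2^(low + step), ..., 2^high]."""
    count = int(round((high - low) / step)) + 1
    return [2.0 ** (low + i * step) for i in range(count)]


def grid_search(objective, c_values, sigma_values):
    """Evaluate every (C, sigma) pair and return (best_score, C, sigma)."""
    best = None
    for c in c_values:
        for sigma in sigma_values:
            score = objective(c, sigma)
            if best is None or score < best[0]:
                best = (score, c, sigma)
    return best


def toy_objective(c, sigma):
    """Stand-in for the cross-validated MSE (assumption, not a real error surface)."""
    return (math.log2(c) - 3.0) ** 2 + (math.log2(sigma) + 2.0) ** 2


values = exponent_grid()
best_score, best_c, best_sigma = grid_search(toy_objective, values, values)
```

The cost of this approach grows with the product of the two grid sizes, which is the motivation for trying heuristic searches such as PSO and GA.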

In the PSO search process, a group of particles (random solutions of C and σ) were randomly initialized [

$$v_d^i(t+1) = \omega \cdot v_d^i(t) + c_1 \cdot \left( p_{\text{best}}^i(t) - x_d^i(t) \right) + c_2 \cdot \left( g_{\text{best}}^i(t) - x_d^i(t) \right),$$

$$x_d^i(t+1) = x_d^i(t) + v_d^i(t+1),$$

where $x_d$ represents the position of the particle, $v_d$ represents the velocity of the particle, t is the iteration number, i is the index of the particle, ω is the velocity weight, and $c_1$ and $c_2$ are learning factors. In this study, $c_1$ and $c_2$ were both set to 2 and ω was set to 0.5. The globally optimal and individually optimal velocity and position of the particle swarm were found over 100 iterations, from which the corresponding optimized parameters (C, σ) were determined.
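The update rule can be sketched as a small optimizer. Several choices below are assumptions for illustration and are not stated in the text: the search runs over (log2 C, log2 σ) within [−8, 8], positions are clamped to those bounds, the swarm has 20 particles, and each attraction term is scaled by a uniform random factor r1 or r2 as in the commonly used form of PSO (setting r1 = r2 = 1 recovers the update exactly as written above). The toy objective replaces the cross-validated MSE.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iter=100,
                 w=0.5, c1=2.0, c2=2.0, seed=0):
    """Minimize objective(position) with a basic particle swarm."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    p_best = [p[:] for p in pos]
    p_best_cost = [objective(p) for p in pos]
    g_idx = min(range(n_particles), key=lambda i: p_best_cost[i])
    g_best, g_best_cost = p_best[g_idx][:], p_best_cost[g_idx]
    history = [g_best_cost]

    for _ in range(n_iter):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()  # random factors (assumption)
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (p_best[i][d] - pos[i][d])
                             + c2 * r2 * (g_best[d] - pos[i][d]))
                # Move the particle, then clamp it to the search bounds.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            cost = objective(pos[i])
            if cost < p_best_cost[i]:
                p_best[i], p_best_cost[i] = pos[i][:], cost
                if cost < g_best_cost:
                    g_best, g_best_cost = pos[i][:], cost
        history.append(g_best_cost)
    return g_best, g_best_cost, history


def toy_objective(position):
    """Stand-in for the cross-validated MSE over (log2 C, log2 sigma)."""
    log_c, log_sigma = position
    return (log_c - 3.0) ** 2 + (log_sigma + 2.0) ** 2


pso_bounds = [(-8.0, 8.0), (-8.0, 8.0)]
pso_best, pso_cost, pso_history = pso_minimize(toy_objective, pso_bounds)
```

The returned `pso_history` records the best cost after each iteration, which is the quantity a convergence plot of the search would display.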

In the GA search process, a population of chromosomes was randomly initialized [

The GRID, PSO and GA methods were respectively applied for SVR parameters optimization. The flow charts of the experimental algorithms are shown in

The trading data of the CSI 300 index from January 4, 2013 to November 30, 2016 were prepared for establishing the SVR calibration models. The records of the first 759 consecutive days were used as training samples and the records of the remaining 190 days as test samples. The twelve variables of each sample were input to the SVR training models and a series of next-day opening prices was predicted. The parameters (C, σ) of the SVR models were optimized by GRID, PSO and GA, respectively.
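The 759/190 partition follows from taking the first four fifths of the 949 records in time order. A minimal sketch, in which `chronological_split` is a hypothetical helper and integer day indices stand in for the actual records:

```python
def chronological_split(samples, train_fraction=0.8):
    """Split time-ordered samples into an earlier training part and a later test part."""
    n_train = int(len(samples) * train_fraction)  # floor of 949 * 0.8 is 759
    return samples[:n_train], samples[n_train:]


# Day indices 0..948 stand in for the 949 daily records (illustration only).
days = list(range(949))
train_days, test_days = chronological_split(days)
```

Keeping the split chronological means every test day comes after every training day, so no future information reaches the model during training.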

During the GRID-SVR modeling process, the parameters (C, σ) were tuned for searching the minimum MSE. A larger value of C and a smaller value of σ generated a smaller MSE. The MSE contours are shown in ^{8} and σ reaches 2^{−7} (the solid point in

For the PSO-SVR models, an initial group of particles was randomly generated and then the positions and velocities of particles were globally and individually updated by 100 iterative computations. The MSE convergence process is shown in

In the GA-SVR parameter tuning, a group of candidate solutions was randomly initialized and iteratively evolved into a more appropriate group of solutions by genetic selection, crossover and mutation.
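The selection, crossover and mutation loop can be sketched as follows. The text does not give the GA settings, so everything specific here is an assumption chosen for illustration: real-valued chromosomes over (log2 C, log2 σ) in [−8, 8], binary tournament selection, arithmetic crossover, uniform random mutation, one elite individual per generation, and the population size and rates shown. The toy objective again replaces the cross-validated MSE.

```python
import random


def ga_minimize(objective, bounds, pop_size=20, n_gen=100,
                crossover_rate=0.8, mutation_rate=0.1, seed=0):
    """Minimize objective(chromosome) with a simple real-coded genetic algorithm."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    best, best_cost, history = None, float("inf"), []

    def tournament(costs):
        # Binary tournament: pick two individuals at random, keep the fitter one.
        a, b = rng.randrange(pop_size), rng.randrange(pop_size)
        return pop[a] if costs[a] <= costs[b] else pop[b]

    for _ in range(n_gen):
        costs = [objective(ind) for ind in pop]
        i_best = min(range(pop_size), key=lambda i: costs[i])
        if costs[i_best] < best_cost:
            best, best_cost = pop[i_best][:], costs[i_best]
        history.append(best_cost)

        next_pop = [best[:]]  # elitism: carry the best solution forward unchanged
        while len(next_pop) < pop_size:
            p1, p2 = tournament(costs), tournament(costs)
            if rng.random() < crossover_rate:
                mix = rng.random()  # arithmetic crossover of the two parents
                child = [mix * a + (1.0 - mix) * b for a, b in zip(p1, p2)]
            else:
                child = p1[:]
            for d, (lo, hi) in enumerate(bounds):
                if rng.random() < mutation_rate:
                    child[d] = rng.uniform(lo, hi)  # uniform random mutation
                child[d] = min(hi, max(lo, child[d]))
            next_pop.append(child)
        pop = next_pop
    return best, best_cost, history


def toy_objective(chromosome):
    """Stand-in for the cross-validated MSE over (log2 C, log2 sigma)."""
    log_c, log_sigma = chromosome
    return (log_c - 3.0) ** 2 + (log_sigma + 2.0) ** 2


ga_bounds = [(-8.0, 8.0), (-8.0, 8.0)]
ga_best, ga_cost, ga_history = ga_minimize(toy_objective, ga_bounds)
```

With elitism, the best cost recorded in `ga_history` never increases from one generation to the next.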

In summary, the optimal parameters (C, σ) of the SVR models were determined separately by GRID, PSO and GA optimization. The GRID-SVR, PSO-SVR and GA-SVR calibration models with their corresponding optimal values of (C, σ) were applied to predict the validation samples. The parameters and prediction results are both presented in

obtained low values of RMSE and MAPE indicate that the GRID, PSO and GA optimization methods were acceptable for parametric optimization of SVR models, while the GA-SVR model performed best in validation. It provided the lowest RMSE of 15.630 and the lowest MAPE of 0.39%. The comparison between the real and the GA-SVR predicted opening prices is depicted in

| Model Type | Model Parameter C | Model Parameter σ | Predictive RMSE | Predictive MAPE (%) |
|---|---|---|---|---|
| GRID Optimization | 256 | 0.008 | 17.954 | 0.47 |
| PSO Optimization | 60.576 | 0.010 | 16.179 | 0.41 |
| GA Optimization | 45.422 | 0.012 | 15.630 | 0.39 |

In this study, SVR models with GRID, PSO and GA parametric optimization were applied to predict the opening price of the CSI 300 index. The optimal parameters (C, σ) were selected as (256, 0.008), (60.576, 0.010) and (45.422, 0.012) for the GRID-SVR, PSO-SVR and GA-SVR models, respectively. The optimized SVR models were applied to the validation samples, giving predictive RMSE values ranging from 15.630 to 17.954 and MAPE values ranging from 0.39% to 0.47%. The results showed that the GRID-SVR, PSO-SVR and GA-SVR calibration models were feasible for predicting the short-term trend of the opening price, and that GA-SVR gave the most accurate prediction. The modeling performance provides a theoretical and technical reference for investors designing trading strategies.

This work was supported by the National Natural Science Foundation of China (61505037) and the Natural Science Foundation of Guangxi (2016GXNSFBA380077, 2015GXNSFBA139259).

Chen, J.C., Chen, H.Z., Huo, Y.J. and Gao, W.T. (2017) Application of SVR Models in Stock Index Forecast Based on Different Parameter Search Methods. Open Journal of Statistics, 7, 194-202. https://doi.org/10.4236/ojs.2017.72015