Support Vector Machine Prediction Modeling for Automobile Ownership
1. Introduction
Accurate prediction of China's automobile ownership can inform decisions on a series of urgent social problems, such as traffic, environmental pollution, and energy use, that are caused by the growth of automobile ownership in China.
Predictive modeling today mostly combines a mathematical model with computer simulation, and the key to this approach is accurately determining the model parameters or functional relationships in the mathematical model.
Classical system identification methods, including least squares and Kalman filtering, have been applied in various fields. To reduce modeling errors and improve control behavior in transient and steady-state operation, Anian Brosch used a recursive least-squares forecasting approach [1] for current control. Shao et al. proposed a direct least-squares method [2] to handle collinear data and big data. Fabio Tinazzi et al. described historical current measurements with a recursive least-squares algorithm [3] to predict the future behavior of the current for a limited set of voltage vectors. A least-squares predictive model [4] using the discrete cosine transform was proposed to predict the hourly air quality index. A partial least-squares regression model [5] was presented for sea ice concentration prediction. Xing et al. used a Kalman filter [6] based on a fractional-order model to estimate the state of charge of lithium-ion batteries. Mathieu Brunot presented a Kalman-type recursive estimator [7] for discrete-time systems whose measurement noise is modeled by Gaussian uniform mixtures. Martincevic, A. and M. Vasak used a constrained Kalman filter [8] to identify semi-physical building thermal models. Xiao et al. used an improved extended Kalman filter [9] for intelligent vehicle localization. A neuron-based Kalman filter [10] was developed to address the problems that arise when identifying actual systems and when using the conventional Kalman filter for empirical parameter estimation.
However, these classical methods have inherent defects: they depend on a mathematical model of the research object, and the convergence and accuracy of the algorithm depend on the initial values of the parameter estimates. These defects hinder the further development of traditional system identification methods and limit their application. The artificial neural network [11] makes up for these defects: it does not depend on a mathematical model of the research object, its stability and accuracy depend little on the initial values of the parameter estimates, and its strong nonlinear mapping ability can effectively describe the input-output response of the nonlinear dynamic system underlying the automobile ownership prediction model. However, the artificial neural network suffers from insufficient generalization performance, local extrema, and the curse of dimensionality. As a newer artificial intelligence method based on statistical learning theory and the principle of structural risk minimization, the support vector machine (SVM) [12] avoids the local extremum problem that neural network methods cannot escape, generalizes better, and handles the curse of dimensionality. However, there is little literature on applying support vector machines to automobile ownership prediction modeling. The contribution of this paper is therefore a new application of SVM rather than a new method.
The learning samples required for automobile ownership prediction modeling are obtained from the statistical data on automobile ownership and its impact factors. The nonlinear functional relationship in the automobile ownership prediction model is regressed using a support vector machine, and the established model is used to predict automobile ownership in different years. Because the inputs and outputs of the learning samples differ greatly in order of magnitude, which would reduce prediction accuracy, the statistical data on automobile ownership and its impact factors are normalized, and the prediction results are mapped back by inverse normalization. The comparison between the prediction results and the statistical data shows that the automobile ownership prediction model generalizes well and that SVM is a useful method for constructing such a model.
2. Research Method
SVM is mainly used for pattern recognition and function regression. When used for pattern recognition, it is called support vector classification (SVC) and when used for function regression, it is called support vector regression (SVR).
The algorithm of SVR [12] can be described as follows.
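Step 1: Construct the training set $T = \{(x_1, y_1), \ldots, (x_l, y_l)\}$, where $x_i \in \mathbb{R}^n$ is an input vector and $y_i \in \mathbb{R}$ is the corresponding output, $i = 1, \ldots, l$.
Step 2: Select the appropriate accuracy parameter ε, penalty parameter C, and kernel function $K(x, x')$.
Step 3: Solve the following optimization problem.
$$
\begin{aligned}
\min_{\alpha,\,\alpha^*} \quad & \frac{1}{2}\sum_{i,j=1}^{l}(\alpha_i^* - \alpha_i)(\alpha_j^* - \alpha_j)K(x_i, x_j) + \varepsilon\sum_{i=1}^{l}(\alpha_i^* + \alpha_i) - \sum_{i=1}^{l} y_i(\alpha_i^* - \alpha_i) \\
\text{s.t.} \quad & \sum_{i=1}^{l}(\alpha_i - \alpha_i^*) = 0, \qquad 0 \le \alpha_i,\ \alpha_i^* \le C, \quad i = 1, \ldots, l
\end{aligned}
\tag{1}
$$
The optimal solution $\bar{\alpha} = (\bar{\alpha}_1, \bar{\alpha}_1^*, \ldots, \bar{\alpha}_l, \bar{\alpha}_l^*)^{T}$ is obtained from Eq. (1).
Step 4: Construct the decision function.
$$
f(x) = \sum_{i=1}^{l}(\bar{\alpha}_i^* - \bar{\alpha}_i)K(x_i, x) + \bar{b}
\tag{2}
$$
Select the $\bar{\alpha}_j$ or $\bar{\alpha}_k^*$ located in the open interval $(0, C)$. If $\bar{\alpha}_j$ is chosen, then
$$
\bar{b} = y_j - \sum_{i=1}^{l}(\bar{\alpha}_i^* - \bar{\alpha}_i)K(x_i, x_j) + \varepsilon
\tag{3}
$$
If $\bar{\alpha}_k^*$ is chosen, then
$$
\bar{b} = y_k - \sum_{i=1}^{l}(\bar{\alpha}_i^* - \bar{\alpha}_i)K(x_i, x_k) - \varepsilon
\tag{4}
$$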
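As a rough illustration of Step 4, the sketch below evaluates an SVR decision function once the dual coefficients are known. It assumes the common sign convention in which the prediction is the sum of (alpha_star[i] - alpha[i]) * K(x_i, x) plus a bias, and it assumes a Gaussian (RBF) kernel; the paper does not state which kernel was used. The names rbf_kernel, svr_decision, bias_from_alpha, and the parameter gamma are illustrative and do not come from the paper.

```python
import math


def rbf_kernel(u, v, gamma=1.0):
    """Gaussian (RBF) kernel exp(-gamma * ||u - v||^2); an assumed kernel choice."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)


def svr_decision(x, train_x, alpha, alpha_star, b, kernel=rbf_kernel):
    """Evaluate f(x) = sum_i (alpha_star[i] - alpha[i]) * K(x_i, x) + b."""
    weighted = sum(
        (a_star - a) * kernel(x_i, x)
        for x_i, a, a_star in zip(train_x, alpha, alpha_star)
    )
    return weighted + b


def bias_from_alpha(j, train_x, train_y, alpha, alpha_star, eps, kernel=rbf_kernel):
    """Bias when alpha[j] lies strictly inside its box constraint.

    Under the assumed convention, sample j then sits exactly on the tube
    boundary f(x_j) - y_j = eps, which gives b = y_j - sum_i(...) + eps.
    """
    kernel_sum = svr_decision(train_x[j], train_x, alpha, alpha_star, 0.0, kernel)
    return train_y[j] - kernel_sum + eps
```

Only samples with a nonzero difference alpha_star[i] - alpha[i] contribute to the sum, which is why the solution can be evaluated from the support vectors alone.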
Among the structural parameters of SVR, the accuracy parameter ε follows a clear rule: as ε increases, the solution becomes sparser and the fitting accuracy decreases; conversely, the smaller ε is, the higher the fitting accuracy. When ε decreases to 0, the number of support vectors required by SVR reaches its maximum, which is almost the number of training samples, and learning efficiency suffers. A larger penalty parameter C favors the fitting accuracy on the samples but reduces the confidence and generalization performance of the learning machine. A larger C also penalizes noisy samples more heavily, but eliminating too many samples distorts the regression. Conversely, a smaller C may not fit the samples well enough, which likewise reduces the generalization performance of the learning machine.
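The effect of ε on sparsity can be illustrated with the ε-insensitive loss that standard ε-SVR minimizes: residuals smaller than ε cost nothing, so samples strictly inside the tube do not become support vectors. The sketch below is a minimal illustration with made-up toy values (not data from this paper); the function names are illustrative.

```python
def eps_insensitive_loss(y_true, y_pred, eps):
    """Loss is zero inside the tube |y_true - y_pred| <= eps, linear outside."""
    return max(0.0, abs(y_true - y_pred) - eps)


def count_on_or_outside_tube(targets, predictions, eps):
    """Count samples whose residual is at least eps.

    Only these samples can end up as support vectors, so the count is an
    upper bound on the number of support vectors for a given fit.
    """
    return sum(1 for t, p in zip(targets, predictions) if abs(t - p) >= eps)


# Toy values chosen only for illustration.
toy_targets = [1.0, 2.0, 3.0, 4.0]
toy_predictions = [1.05, 2.3, 2.9, 4.0]
counts = {
    eps: count_on_or_outside_tube(toy_targets, toy_predictions, eps)
    for eps in (0.0, 0.2, 0.5)
}
```

With ε = 0 every toy sample counts as a candidate support vector, and the count shrinks as the tube widens, which matches the trade-off between sparsity and fitting accuracy described above.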
3. Results and Discussion
In this paper, the training sample pairs required by SVR for automobile ownership prediction modeling were constructed from automobile ownership and its impact factors, and the statistical data on automobile ownership and its impact factors were obtained from the National Bureau of Statistics.
The difference in the order of magnitude between automobile ownership and its impact factors can seriously affect the accuracy of automobile ownership prediction modeling. Therefore, in this paper, the statistical data on automobile ownership and its impact factors were first normalized as a pre-processing step.
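The normalization and inverse normalization steps can be sketched as follows. The paper does not state which normalization was used; this sketch assumes min-max scaling to the interval [0, 1], applied to each variable separately, and the function names are illustrative.

```python
def fit_min_max(values):
    """Return the (minimum, maximum) of one variable for later scaling."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot scale a constant series")
    return lo, hi


def normalize(values, lo, hi):
    """Map values linearly so that lo -> 0 and hi -> 1."""
    return [(v - lo) / (hi - lo) for v in values]


def denormalize(scaled, lo, hi):
    """Inverse of normalize: map scaled values back to the original units."""
    return [s * (hi - lo) + lo for s in scaled]
```

The same (lo, hi) pair that scaled the output variable has to be reused when predictions are mapped back to the original units. Predictions for later years can fall outside [0, 1] when the series keeps growing, and the inverse mapping still applies because it is linear.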
The impact factors of automobile ownership were used as the learning sample inputs. They comprise seven indicators: gross national income, GDP per capita, total import and export, disposable income per urban resident, steel production, road passenger traffic, and total social consumer goods. Automobile ownership was used as the learning sample output.
The statistical data on automobile ownership and its impact factors before and after normalization are shown in Table 1 and Table 2.
Among the SVR structural parameters, the penalty parameter C is chosen as 105 and the accuracy parameter ε is chosen as 0. To verify the generalization performance of the automobile ownership regression function constructed by SVR, a learning sample set is constructed from the automobile ownership and impact factor data for the period from 1994 to 2004, while a verification sample set is constructed from the data for the period from 2005 to 2012.
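The chronological split described above can be sketched as follows. The records here are placeholders that carry only the year, since the table values are not reproduced in this sketch; the tuple layout and the function name are illustrative.

```python
def split_by_year(records, last_training_year):
    """Split (year, inputs, output) records into learning and verification sets.

    Records up to and including last_training_year go to the learning set;
    later records go to the verification set, so no future data leaks into
    training.
    """
    train = [r for r in records if r[0] <= last_training_year]
    valid = [r for r in records if r[0] > last_training_year]
    return train, valid


# Placeholder records: one per year from 1994 to 2012, with no data attached.
records = [(year, None, None) for year in range(1994, 2013)]
train, valid = split_by_year(records, 2004)
```

With annual data this yields 11 learning samples (1994 to 2004) and 8 verification samples (2005 to 2012), so the model is checked on years it has not seen.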
After the inverse normalization of the automobile ownership prediction results, the predicted values of automobile ownership for the period from 2005 to 2012 are obtained, and the comparison of the predicted values and the statistical results is shown in Figure 1.
Figure 1. Automobile ownership prediction based on SVR. (a) Comparison between automobile ownership estimated values and statistical results; (b) Automobile ownership prediction error.
Table 1. Statistical data on automobile ownership and its impact factors.
Table 2. Normalized statistical data on automobile ownership and its impact factors.
As seen in Figure 1, the constructed automobile ownership prediction model generalizes well, and SVM is an attractive tool for automobile ownership prediction modeling.
4. Conclusion
In this paper, an SVR-based modeling method for automobile ownership prediction was proposed, the learning sample set was constructed from the statistical data on automobile ownership and its impact factors, and the decision function for automobile ownership prediction was developed. The comparison of the prediction results with the statistical data supports the SVR-based automobile ownership prediction modeling method. However, there is no theoretical basis for selecting the penalty parameter C and the accuracy parameter ε in SVR, and arbitrariness in their selection directly affects the accuracy of automobile ownership prediction modeling. Therefore, future work will use optimization algorithms, such as genetic algorithms, to optimize C and ε and improve the accuracy of SVR-based automobile ownership prediction modeling.
Acknowledgements
This work is supported by the National Natural Science Foundation of China (Grant No. 51609132).