Prediction of Rural Residents’ Consumption Expenditure Based on Lasso and Adaptive Lasso Methods

When a model contains a large number of variables, the Lasso method and the Adaptive Lasso method can select variables effectively. This paper predicts rural residents’ consumption expenditure in China using the Lasso method and the Adaptive Lasso method respectively. The results show that both methods choose appropriate variables effectively and accurately, but the Adaptive Lasso method outperforms the Lasso method in prediction accuracy and has a smaller prediction error. This indicates that, for variable selection and parameter estimation, the Adaptive Lasso method is better than the Lasso method.


Introduction
Consumption, investment and exports have long been referred to as the "troika" of economic growth. In China, investment and exports have been the main drivers of economic growth for a long time. Compared with investment and exports, however, the level of residents' consumption in China is very low, especially rural residents' consumption, which has been depressed for a long time. The government has therefore stressed several times the need to focus on expanding domestic demand, especially consumer demand. Discovering the main factors that influence rural residents' consumption in China is thus of great practical significance.
There are many empirical studies of consumer spending in China. For example, using regression analysis on panel data for 31 Chinese provinces from 2000 to 2011, Zhao [1] found that the main factor influencing rural residents' consumption level is the net income of rural residents. Liu and Wu [2] analyzed the relationship between local government expenditure on livelihood and rural consumption from a macro perspective, in terms of both total and category expenditure, and carried out an empirical test using panel data for 31 Chinese provinces from 1998 to 2011. Chu and Yan [3] established a fixed-effect variable intercept model of the effect of local government expenditure on rural residents' consumption demand. They found that local government expenditure on supporting agriculture has a positive effect on rural residents' consumption demand, whereas the effect of transfer payments is not significant. Chen and Liu [4] set up models of how different types of rural credit affect rural residents' consumption and found that, in both the short term and the long term, consumer credit boosts rural consumption more than production credit does. Li [5] empirically analyzed the relationship between social security expenditure and rural consumption in China. Wen and Meng [6] used the ELES model to analyze the structure of Chinese rural residents' consumption and its evolution over time, covering the comparison of urban and rural consumption structures, rural residents' marginal propensity to consume, marginal budget share, basic consumer spending, actual consumer spending, the income elasticity of consumption demand and changes in consumption structure. Hu, Tian and Xia [7] empirically analyzed the impact of China's fiscal expenditure for supporting agriculture on rural residents' consumption using annual time series data from 1978 to 2010. Yu and Zhang [8] built a multivariate prediction model that combines the Lasso method with a BP neural network and used it to predict the consumption expenditure of China's urban and rural residents.
In this paper, we use the lars and msgps packages of the R statistical software to model and forecast rural residents' consumption expenditure in China with the Lasso method and the Adaptive Lasso method respectively.

Lasso Method and Adaptive Lasso Method
Suppose the data are $(x_{i1}, x_{i2}, \ldots, x_{ip}; y_i)$, $i = 1, 2, \ldots, n$, and consider the usual linear model

$$y_i = \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij} + \varepsilon_i, \quad i = 1, 2, \ldots, n, \tag{1}$$

where $\varepsilon_i$ is the random error term. Model (1) is usually written in the following form:

$$Y = \beta_0 + X\beta + \varepsilon, \tag{2}$$

where $Y_{n \times 1}$ is the response vector, $X_{n \times p}$ is the independent variable matrix, $\beta_0$ is the constant and $\beta_{p \times 1}$ is the coefficient vector. Assume that the data have been standardized and write $\beta = (\beta_1, \beta_2, \ldots, \beta_p)^T$. The Lasso estimate is defined as

$$(\hat{\beta}_0, \hat{\beta}) = \arg\min \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Big)^2 \quad \text{subject to} \quad \sum_{j=1}^{p} |\beta_j| \le s, \tag{3}$$

where $s \ge 0$ is the penalty parameter. The optimal solution of (3) is called the Lasso solution, and the entire set of Lasso solutions can be obtained by varying $s$. This paper uses k-fold CV and Mallows' $C_p$ criterion to choose the best model.
k-fold CV is a common method for evaluating models. It divides the observations into k roughly equal parts and then, in turn, uses k − 1 parts as the training set for fitting and the remaining part as the test set. After k rounds this gives k test-set mean square errors, which are averaged. The same steps are repeated for every candidate model, and the model with the minimum average mean square error is selected.
The $C_p$ criterion is another standard for assessing a regression model. If p independent variables are selected from the k available independent variables to take part in the regression, the $C_p$ criterion is defined, in its standard form, as

$$C_p = \frac{SSE_p}{\hat{\sigma}^2} - n + 2p, \tag{4}$$

where $SSE_p$ is the residual sum of squares of the model with the p selected variables and $\hat{\sigma}^2$ is the error variance estimated from the model with all k variables. This paper therefore chooses the model with the minimum $C_p$.
The Lasso method selects variables and estimates the unknown parameters at the same time, and it handles multicollinearity in the model well, especially for high-dimensional data.
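The Lasso fit and the k-fold CV choice described above can be sketched in a few lines of Python. This is an illustration only, not the R lars routine used in this paper: it assumes centered, standardized data, uses the penalized (Lagrangian) form with a penalty `lam` in place of the bound s, uses interleaved rather than random folds, and all function names and the toy data are hypothetical.

```python
import random

def soft_threshold(z, t):
    """Soft-thresholding operator used in the Lasso coordinate update."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_cd(X, y, lam, n_iter=100):
    """Minimize (1/2n) * RSS + lam * sum_j |b_j| by coordinate descent.
    Assumes centered data (no intercept) and no all-zero columns."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)  # residual y - Xb, with b = 0 at the start
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # covariance of column j with the partial residual
            rho = sum(X[i][j] * (resid[i] + X[i][j] * b[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - b[j]
            for i in range(n):
                resid[i] -= X[i][j] * delta
            b[j] = new_bj
    return b

def mse(X, y, b):
    """Mean square prediction error of coefficients b on (X, y)."""
    preds = [sum(xj * bj for xj, bj in zip(row, b)) for row in X]
    return sum((yi - pi) ** 2 for yi, pi in zip(y, preds)) / len(y)

def kfold_cv(X, y, lams, k=5):
    """Return the penalty with the smallest average test MSE over k folds."""
    n = len(y)
    folds = [set(range(f, n, k)) for f in range(k)]  # interleaved folds
    avg = {}
    for lam in lams:
        errs = []
        for test in folds:
            train = [i for i in range(n) if i not in test]
            b = lasso_cd([X[i] for i in train], [y[i] for i in train], lam)
            errs.append(mse([X[i] for i in test], [y[i] for i in test], b))
        avg[lam] = sum(errs) / k
    return min(avg, key=avg.get), avg

# Toy data (hypothetical): only the first two of five variables matter.
random.seed(0)
X = [[random.gauss(0, 1) for _ in range(5)] for _ in range(60)]
y = [3.0 * r[0] - 2.0 * r[1] + random.gauss(0, 0.5) for r in X]
lams = [0.01, 0.1, 0.5, 1.0, 5.0]
best_lam, cv_errors = kfold_cv(X, y, lams)
coef = lasso_cd(X, y, best_lam)
```

A very large penalty drives every coefficient to exactly zero, which is how the Lasso performs variable selection; the CV step then trades this sparsity against the test-set error.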
The Adaptive Lasso estimate is defined as

$$(\hat{\beta}_0^{*}, \hat{\beta}^{*}) = \arg\min \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Big)^2 \quad \text{subject to} \quad \sum_{j=1}^{p} \hat{w}_j |\beta_j| \le s. \tag{5}$$

In (5), $\hat{w}_j = 1 / |\hat{\beta}_j|^{\gamma}$ is the weight coefficient, $\gamma > 0$ is an adjustment parameter and $\hat{\beta}_j$ is the initial estimate of the parameter $\beta_j$, for which the least squares estimate, the ridge estimate or the Lasso estimate can be used. The optimal solution of (5) is called the Adaptive Lasso solution, and all Adaptive Lasso solutions can be obtained by varying $s$. Mallows' $C_p$ criterion, the AIC criterion, the GCV criterion and the BIC criterion can then be used to choose the best model [9].
Whereas the Lasso method applies the same weight to all coefficients, the Adaptive Lasso method gives different weights to different variables: a variable with a larger regression coefficient is penalized with a smaller weight, and a variable with a smaller regression coefficient is penalized with a larger weight. This remedies two shortcomings of the Lasso method, namely that its variable selection does not in general satisfy model selection consistency and that its parameter estimates do not attain the $\sqrt{n}$ convergence rate, so the selected variables are more accurate. For this purpose, this paper builds forecasting models based on the Lasso method and the Adaptive Lasso method respectively to forecast rural residents' consumption expenditure in China.
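The weighting idea above can be illustrated with a small Python sketch. It is not the msgps implementation: it assumes centered data, uses the penalized form with a hypothetical penalty `lam` instead of the bound s, takes the least squares estimate (obtained here by running the same solver with zero penalty) as the initial estimate, and adds a small `eps` to avoid division by zero. All names and data are hypothetical.

```python
import random

def soft_threshold(z, t):
    """Soft-thresholding operator."""
    return z - t if z > t else z + t if z < -t else 0.0

def weighted_lasso_cd(X, y, lam, weights, n_iter=200):
    """Coordinate descent for (1/2n) * RSS + lam * sum_j weights[j] * |b_j|.
    Assumes centered data (no intercept) and no all-zero columns."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            rho = sum(X[i][j] * (resid[i] + X[i][j] * b[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam * weights[j]) / col_sq[j]
            delta = new_bj - b[j]
            for i in range(n):
                resid[i] -= X[i][j] * delta
            b[j] = new_bj
    return b

def adaptive_lasso(X, y, lam, gamma=1.0, eps=1e-8):
    """Two-stage fit: initial estimate, then a weighted Lasso."""
    p = len(X[0])
    # Zero penalty reduces the solver to least squares (the initial estimate).
    init = weighted_lasso_cd(X, y, 0.0, [0.0] * p)
    # Large initial coefficient -> small weight; small coefficient -> large weight.
    weights = [1.0 / (abs(bj) ** gamma + eps) for bj in init]
    return weighted_lasso_cd(X, y, lam, weights), weights

# Toy data (hypothetical): only the first two of five variables matter.
random.seed(1)
X = [[random.gauss(0, 1) for _ in range(5)] for _ in range(80)]
y = [3.0 * r[0] - 2.0 * r[1] + random.gauss(0, 0.5) for r in X]
coef, weights = adaptive_lasso(X, y, lam=0.1)
```

In this toy setting the irrelevant variables receive large weights and are removed, while the two relevant coefficients are only lightly shrunk.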

Variable Selection and Data Sources
In this paper, on the basis of economic theory and the research of Yu and Zhang [8], 16 variables that influence rural residents' consumption expenditure (y) are selected. The 16 variables are listed in Table 1.
In this paper, the dependency ratio data are from the Statistical Yearbook of China Population. The interest rate data are from the website of the People's Bank of China. The interest rate used is the one-year rate stipulated by the central bank; if several rates applied within one year, their weighted average is used, with each rate weighted by the share of the 12 months during which it was in force. Data for the other variables are from the China Statistical Yearbook for the period 1981-2015.
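The month-weighted average interest rate described above amounts to simple arithmetic. The sketch below uses a hypothetical function name and hypothetical rates, not values from the People's Bank of China.

```python
def annual_rate(periods):
    """Month-weighted average of the rates in force during one year.

    periods: list of (rate_percent, months_in_force); months must total 12.
    """
    total_months = sum(months for _, months in periods)
    if total_months != 12:
        raise ValueError("months in force must sum to 12")
    return sum(rate * months / 12 for rate, months in periods)

# Hypothetical year: 2.25% for 3 months, then 2.75% for 9 months.
rate = annual_rate([(2.25, 3), (2.75, 9)])
print(rate)  # 2.625
```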

The Prediction of Rural Residents' Consumption Expenditure Model Based on Lasso Method
From Figure 1, only 24 steps are needed to obtain all the Lasso solutions; when the parameter s = 1, all variables enter the model. Because the value of k-fold CV is larger than the value of the $C_p$ criterion, this paper selects the optimal Lasso solution according to Mallows' $C_p$ criterion. The minimum $C_p$ value is reached at step 20, where the model is optimal, and 11 variables are finally chosen. From the results, inflation is the most important factor influencing rural residents' consumption expenditure. Highway mileage, residents' disposable income, annual fixed asset investment and tertiary industry output value have a positive impact on rural residents' consumer spending, whereas the GDP growth rate, young dependency ratio, employment figure, income distribution gap, consumption habits and post and telecommunications business have a negative effect on rural residents' consumption expenditure. The other factors do not significantly affect rural residents' consumption expenditure and are not selected into the model. LASSO model: [equation missing]. On this basis, this paper predicts Chinese rural residents' consumer spending from 2008 to 2014; the forecasting results are shown in Table 2.
Table 1 (fragment): $x_5$, the tertiary industry output value; $x_6$, the annual fixed asset investment; $x_7$, the interest rate; $x_{13}$, income distribution gap; $x_{14}$, spending habits; the first industrial output value and employment figure also appear, with their indices missing.
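Choosing the step with the minimum $C_p$, as done in this section, can be sketched as follows. The sketch assumes the standard form $C_p = SSE_p / \hat{\sigma}^2 - n + 2p$; the function names and all numbers are hypothetical and are not the values behind Figure 1.

```python
def mallows_cp(sse_p, p, n, sigma2_full):
    """Mallows' C_p for a model with p selected variables.

    sse_p: residual sum of squares of the p-variable model
    sigma2_full: error variance estimated from the full model
    """
    return sse_p / sigma2_full - n + 2 * p

def best_step(steps, n, sigma2_full):
    """steps: list of (step_number, n_selected_variables, sse).
    Returns (minimum C_p, step_number)."""
    scored = [(mallows_cp(sse, p, n, sigma2_full), step) for step, p, sse in steps]
    return min(scored)

# Hypothetical solution path with four steps.
path = [(1, 1, 400.0), (2, 2, 120.0), (3, 3, 58.0), (4, 4, 57.0)]
cp_min, step = best_step(path, n=30, sigma2_full=2.0)
print(cp_min, step)  # 5.0 3
```

Adding the fourth variable lowers the residual sum of squares only slightly, so the 2p term outweighs the gain and step 3 is kept.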
From Table 2, the forecasts of rural residents' consumption expenditure are generally underestimated, but in terms of relative error the estimates are fairly stable and the prediction error decreases year by year, which shows that the Lasso model fits well.
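The relative error used to judge the forecasts can be computed as in the sketch below. The function name and the figures are hypothetical and do not reproduce Table 2; they only mimic the pattern described above, with every forecast below the actual value and the error shrinking over time.

```python
def relative_errors(actual, predicted):
    """Signed relative error (predicted - actual) / actual for each year."""
    return [(p - a) / a for a, p in zip(actual, predicted)]

# Hypothetical actual and forecast values for three consecutive years.
actual = [100.0, 200.0, 400.0]
predicted = [90.0, 190.0, 392.0]
errors = relative_errors(actual, predicted)

underestimated = all(e < 0 for e in errors)          # forecasts below actual values
abs_errors = [abs(e) for e in errors]
shrinking = all(x > y for x, y in zip(abs_errors, abs_errors[1:]))
mape = sum(abs_errors) / len(abs_errors)              # mean absolute relative error
```

The same function applied to the forecasts of two methods gives a direct comparison of their accuracy through the mean absolute relative error.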

The Prediction of Rural Residents' Consumption Expenditure Model Based on Adaptive Lasso Method
Using the Adaptive Lasso method (see Figure 2), this paper selects 13 variables. From the results, inflation is the most important factor influencing rural residents' consumption expenditure. Highway mileage, the interest rate, residents' disposable income, expenditure on social security and annual fixed asset investment have a positive impact on rural residents' consumer spending, whereas the GDP growth rate, young dependency ratio, education, employment figure, income distribution gap, consumption habits and post and telecommunications business have a negative effect on rural residents' consumption expenditure. The other factors do not significantly affect rural residents' consumption expenditure and are not selected into the model. On this basis, this paper predicts Chinese rural residents' consumer spending from 2008 to 2014; the forecasting results are shown in Table 3.
From Table 3, the predictions of rural residents' consumption expenditure are also generally underestimated, but in terms of relative error the estimates are fairly stable and the prediction error decreases year by year. Most importantly, the predictions of the Adaptive Lasso method are closer to the real values than those of the Lasso method, which shows that the Adaptive Lasso model fits better than the Lasso model.

Figure 1. The variable selection steps with the Lasso method.

Figure 2. The variable selection with the Adaptive Lasso method.

Table 1. The set of independent variables.

Table 2. Rural residents' consumer spending predictions based on the Lasso method.

Table 3. Rural residents' consumer spending predictions based on the Adaptive Lasso method.