The Empirical Research of Relationship between Consumption and Income for Chinese Urban Residents

This paper studies the clustering analysis of panel data, the specification test of the panel data model, and its parameter estimation. Based on a clustering analysis of the panel data, we chose to study the relationship between consumption and income for eight income levels of Chinese urban residents from 2007 to 2012. Using the analysis of covariance for panel data models, we built a variable coefficient panel data model and estimated its parameters. The model identifies the relationship between consumption and income in recent years. From the estimation results, we conclude that income disparities have an important influence on urban residents' consumption behavior.


Introduction
Panel data are two-dimensional data observed over time and across individuals simultaneously [1]; that is, multiple cross sections are taken along a time series, and the same sample units are observed on each cross section. As society develops, models built only on time series data or only on cross section data can no longer address increasingly complex economic problems. In addition, with the development of computer technology and the internet, panel data have become easier and easier to obtain.
Building a model on panel data has several advantages over using time series data or cross section data alone. First, a panel data model can estimate unobservable individual effects and time effects at the same time, so it is more efficient. Second, panel data provide more information, which increases the degrees of freedom of the model, reduces multicollinearity among the explanatory variables, and ultimately improves the accuracy of parameter estimation [2]. Third, a panel data model is better suited to complicated economic problems.
Since the 1970s, a large number of theoretical and empirical analyses of panel data have appeared [3] [4]. The theory of the general panel data model is mature [5]-[8]. Bai [9] summarized the specification, statistical tests and recent progress of panel data models. Many papers have discussed the relationship between consumption and income [10]-[12], but their data are not recent. This paper uses recent panel data and combines clustering analysis with panel data modeling, so its conclusions are more consistent with current reality.
This paper preprocessed the consumption and income panel data of eight income levels of Chinese urban residents from 2002 to 2012, carried out clustering analysis on the panel data, and concluded that the structures of consumption and income were the same from 2007 to 2012. Based on the analysis of covariance for panel data models, we then built a variable coefficient panel data model on the consumption and income panel data of the eight income levels from 2007 to 2012. Finally, we used Eviews 7.0 to estimate the parameters of the model and analyzed the results.

Clustering Analysis of Panel Data
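Consider the panel data $x_{it}$, $i = 1, \ldots, N$; $t = 1, \ldots, T$. If we use the distance between cross sections to measure similarity, we obtain a $T \times T$ similarity matrix, which is symmetric. The similarity matrix is as follows:

$$(\delta_{ts})_{T \times T} = \begin{pmatrix} \delta_{11} & \delta_{12} & \cdots & \delta_{1T} \\ \delta_{21} & \delta_{22} & \cdots & \delta_{2T} \\ \vdots & \vdots & \ddots & \vdots \\ \delta_{T1} & \delta_{T2} & \cdots & \delta_{TT} \end{pmatrix}$$

Here $\delta_{ts}$ is a measure of dissimilarity between the $t$-th cross section and the $s$-th cross section, that is, a measure of distance. When two time sections are very similar, its value is close to zero.
Several commonly used distances between cross sections are as follows.
1) Euclidean Distance: $\delta_{ts} = \left( \sum_{i=1}^{N} (x_{it} - x_{is})^2 \right)^{1/2}$
2) Squared Euclidean Distance: $\delta_{ts} = \sum_{i=1}^{N} (x_{it} - x_{is})^2$
3) Minkowski Distance: $\delta_{ts} = \left( \sum_{i=1}^{N} |x_{it} - x_{is}|^p \right)^{1/p}$
4) Manhattan Distance: $\delta_{ts} = \sum_{i=1}^{N} |x_{it} - x_{is}|$. Manhattan distance is the special case of the Minkowski distance with p = 1.
5) Chebyshev Distance: $\delta_{ts} = \max_{1 \le i \le N} |x_{it} - x_{is}|$
The clustering analysis of panel data can divide the time sections into several groups. Building a model on one of these groups allows the unobservable time effect to be ignored, which is important in applications. Zhu and Chen [13] studied the clustering analysis of panel data and its application, focusing on clustering in the cross-section dimension.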
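The distance measures listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the panel is assumed to be stored as a nested list `panel[i][t]` (individual `i`, period `t`), and all function names are hypothetical, not taken from the paper or from SPSS.

```python
# Illustrative sketch: pairwise distances between the time sections of a panel.
# Assumed layout (not from the paper): panel[i][t] = value for individual i, period t.

def section(panel, t):
    """Return the cross section (all individuals) observed in period t."""
    return [row[t] for row in panel]

def minkowski(a, b, p):
    """Minkowski distance of order p between two cross sections."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

def euclidean(a, b):
    return minkowski(a, b, 2)

def squared_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def manhattan(a, b):
    # Special case of the Minkowski distance with p = 1.
    return minkowski(a, b, 1)

def chebyshev(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

def distance_matrix(panel, dist=euclidean):
    """T x T symmetric matrix of distances between time sections."""
    T = len(panel[0])
    sections = [section(panel, t) for t in range(T)]
    return [[dist(sections[t], sections[s]) for s in range(T)] for t in range(T)]

if __name__ == "__main__":
    # Made-up toy panel with N = 2 individuals and T = 3 periods.
    toy = [[1.0, 2.0, 5.0],
           [2.0, 2.0, 9.0]]
    for row in distance_matrix(toy):
        print(row)
```

Passing a different `dist` function (for example `chebyshev`) to `distance_matrix` switches the measure without changing the rest of the procedure.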
The basic principle of the clustering analysis is as follows. For the panel data $x_{it}$, $i = 1, \ldots, N$; $t = 1, \ldots, T$, we first treat each cross section as its own class, giving a total of $T$ classes. Second, using one of the distances above, we compute the similarity matrix of the panel data and merge the two nearest time sections into one class, leaving $T - 1$ classes. Then, using the similarity matrix again, we merge the two nearest classes, leaving $T - 2$ classes. Proceeding in the same way, we eventually merge all $T$ time sections into one class.
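The merging procedure described above can be sketched as follows. The text does not say how the distance between two merged classes is measured, so this sketch assumes single linkage (the distance between the closest members); that choice, and the function name `agglomerate`, are assumptions for illustration and may differ from the SPSS settings used in the paper.

```python
# Illustrative sketch of agglomerative clustering of time sections.
# Assumption: single linkage; the paper does not state the linkage rule.

def agglomerate(dist):
    """Cluster T time sections given a T x T distance matrix.

    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    where each cluster is a tuple of time-section indices.
    """
    clusters = [(t,) for t in range(len(dist))]  # start with T singleton classes
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(dist[t][s] for t in clusters[a] for s in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        merged = tuple(sorted(clusters[a] + clusters[b]))
        # Replace the two merged classes by their union: one class fewer.
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return merges

if __name__ == "__main__":
    # Made-up distances: sections 0 and 1 are close, section 2 is far away.
    toy = [[0.0, 1.0, 10.0],
           [1.0, 0.0, 9.0],
           [10.0, 9.0, 0.0]]
    for step in agglomerate(toy):
        print(step)
```

With T time sections the history always contains T - 1 merges, matching the step-by-step reduction from T classes to one.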

Analysis of Covariance
To build a model on panel data, we must first determine the form of the model. The general panel data model is as follows:

$$y_{it} = \alpha_i + x_{it} \beta_i + u_{it}, \quad i = 1, \ldots, N, \; t = 1, \ldots, T. \tag{1}$$

Here $x_{it}$ is a $1 \times K$ vector, $\beta_i$ is a $K \times 1$ vector, and $K$ is the number of explanatory variables. $\alpha_i$ is the intercept; its value depends on the individual, and it is treated here as a fixed parameter to be estimated. $u_{it}$ is a random error term that is uncorrelated with the explanatory variables, has mean zero and variance $\sigma_u^2$, and is independent and identically distributed. Model (1) commonly takes one of three forms:
1) when $\alpha_i = \alpha_j$ and $\beta_i = \beta_j$, model (1) is called the basic model or mixed regression model;
2) when $\alpha_i \neq \alpha_j$ and $\beta_i = \beta_j$, model (1) is called the variable intercept model;
3) when $\alpha_i \neq \alpha_j$ and $\beta_i \neq \beta_j$, model (1) is called the variable coefficient model.
The common test for determining the model form is the analysis of covariance, also called the F test. The test contains two main hypotheses.
Hypothesis 1: The slopes are the same, but the intercepts are not the same. The model is:

$$y_{it} = \alpha_i + x_{it} \beta + u_{it}, \quad i = 1, \ldots, N, \; t = 1, \ldots, T. \tag{2}$$

Hypothesis 2: The intercepts and slopes are the same across cross sections and over time. The model is:

$$y_{it} = \alpha + x_{it} \beta + u_{it}. \tag{3}$$

Following the method of parameter constraint tests, we can construct test statistics for the two hypotheses. The test statistics for hypothesis 1 and hypothesis 2 are, respectively:

$$F_1 = \frac{(S_2 - S_1) / [(N-1)K]}{S_1 / [NT - N(K+1)]}, \qquad F_2 = \frac{(S_3 - S_1) / [(N-1)(K+1)]}{S_1 / [NT - N(K+1)]},$$

where $S_1$, $S_2$ and $S_3$ are the sums of squared residuals for models (1), (2) and (3), respectively, under ordinary least squares.
When hypothesis 1 is correct, $F_1 \sim F[(N-1)K, \; N(T-K-1)]$; when hypothesis 2 is correct, $F_2 \sim F[(N-1)(K+1), \; N(T-K-1)]$. If we accept hypothesis 2, we do not need to test hypothesis 1, and we should build model (3). If we reject hypothesis 2, we should test hypothesis 1. If we accept hypothesis 1, we should build model (2). If we reject hypothesis 1, we should build model (1).
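The two-step decision rule above can be sketched in Python. The sketch assumes the standard covariance-analysis statistics, F1 = [(S2 - S1) / ((N-1)K)] / [S1 / (N(T-K-1))] and F2 = [(S3 - S1) / ((N-1)(K+1))] / [S1 / (N(T-K-1))], where S1, S2 and S3 are the residual sums of squares of the variable coefficient, variable intercept and basic models. The function names are hypothetical, and critical values are supplied by the caller (for example from an F table) rather than computed.

```python
# Illustrative sketch of the analysis-of-covariance (F test) model selection.
# Assumption: the standard F statistics stated in the lead-in paragraph.

def covariance_f_statistics(s1, s2, s3, n, t, k):
    """Return ((F1, df1), (F2, df2)); each df is (numerator, denominator).

    s1, s2, s3: residual sums of squares of the variable coefficient,
    variable intercept and basic (pooled) models.
    """
    df_den = n * (t - k - 1)      # residual degrees of freedom of model (1)
    df1 = (n - 1) * k             # restrictions under hypothesis 1
    df2 = (n - 1) * (k + 1)       # restrictions under hypothesis 2
    f1 = ((s2 - s1) / df1) / (s1 / df_den)
    f2 = ((s3 - s1) / df2) / (s1 / df_den)
    return (f1, (df1, df_den)), (f2, (df2, df_den))

def choose_model(f1, f2, crit1, crit2):
    """Apply the decision rule: test hypothesis 2 first, then hypothesis 1."""
    if f2 <= crit2:
        return "basic model"                 # accept hypothesis 2
    if f1 <= crit1:
        return "variable intercept model"    # accept hypothesis 1
    return "variable coefficient model"      # reject both hypotheses

if __name__ == "__main__":
    # Made-up residual sums of squares, for illustration only.
    (f1, df1), (f2, df2) = covariance_f_statistics(32.0, 39.0, 60.0, n=8, t=6, k=1)
    print(f1, df1, f2, df2)
```

Keeping the critical values as inputs avoids depending on a statistics library; with SciPy available they could be obtained from the F distribution instead.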

The Parameter Estimation of Variable Coefficient Panel Data Model
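The fixed effect variable coefficient model (1) can be rewritten as:

$$y_{it} = \tilde{x}_{it} b_i + u_{it}, \quad i = 1, \ldots, N, \; t = 1, \ldots, T, \tag{4}$$

where $\tilde{x}_{it} = (1, x_{it})$ is a $1 \times (K+1)$ vector and $b_i = (\alpha_i, \beta_i')'$ is a $(K+1) \times 1$ vector. The matrix form of (4) is:

$$Y = XB + U,$$

where $Y = (y_1', \ldots, y_N')'$ with $y_i = (y_{i1}, \ldots, y_{iT})'$, $X = \mathrm{diag}(X_1, \ldots, X_N)$ with $X_i = (\tilde{x}_{i1}', \ldots, \tilde{x}_{iT}')'$, $B = (b_1', \ldots, b_N')'$, and $U = (u_1', \ldots, u_N')'$ with $u_i = (u_{i1}, \ldots, u_{iT})'$.
The fixed effect variable coefficient model is also called the seemingly unrelated regression model. It assumes that the coefficients of each individual do not change over time, and it was put forward by Zellner in 1962. The choice of parameter estimation method depends on the random disturbance term. If $E(u_i u_j') = 0$ for $i \neq j$, model (4) can be estimated by ordinary least squares, the classic method for single-equation econometric models. That is, we can take the time series of each individual as a sample and estimate each $b_i$ separately by ordinary least squares, or estimate $B = (b_1', \ldots, b_N')'$ jointly by generalized least squares; the two sets of estimates coincide. If $E(u_i u_j') \neq 0$ for $i \neq j$, we can use generalized least squares to estimate $B$. Writing $\Omega_{ij} = E(u_i u_j')$, the covariance matrix of $U$ is

$$V = \begin{pmatrix} \Omega_{11} & \Omega_{12} & \cdots & \Omega_{1N} \\ \Omega_{21} & \Omega_{22} & \cdots & \Omega_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ \Omega_{N1} & \Omega_{N2} & \cdots & \Omega_{NN} \end{pmatrix}.$$

So the generalized least squares estimate of the parameters is:

$$\hat{B}_{GLS} = (X' V^{-1} X)^{-1} X' V^{-1} Y.$$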
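As a short worked check of the claim that separate ordinary least squares and joint generalized least squares coincide when disturbances are uncorrelated across individuals, consider the stacked form $Y = XB + U$ with block-diagonal $X = \mathrm{diag}(X_1, \ldots, X_N)$, where $X_i$ is the $T \times (K+1)$ regressor matrix of individual $i$ (including a column of ones) and $y_i$ is its $T \times 1$ response vector. The sketch additionally assumes $E(u_i u_i') = \sigma_i^2 I_T$; the symbols $V$, $X_i$ and $\sigma_i^2$ are notation introduced here for illustration.

$$V = \mathrm{diag}(\sigma_1^2 I_T, \ldots, \sigma_N^2 I_T), \qquad V^{-1} = \mathrm{diag}(\sigma_1^{-2} I_T, \ldots, \sigma_N^{-2} I_T)$$

$$X' V^{-1} X = \mathrm{diag}(\sigma_1^{-2} X_1' X_1, \ldots, \sigma_N^{-2} X_N' X_N), \qquad X' V^{-1} Y = \begin{pmatrix} \sigma_1^{-2} X_1' y_1 \\ \vdots \\ \sigma_N^{-2} X_N' y_N \end{pmatrix}$$

$$\hat{B}_{GLS} = (X' V^{-1} X)^{-1} X' V^{-1} Y = \begin{pmatrix} (X_1' X_1)^{-1} X_1' y_1 \\ \vdots \\ (X_N' X_N)^{-1} X_N' y_N \end{pmatrix}$$

The factors $\sigma_i^{-2}$ cancel block by block, so the $i$-th block of the joint estimate is exactly the ordinary least squares estimate computed from individual $i$'s own time series.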

The Empirical Research
According to Keynes's consumption theory, total consumption is a function of total income. As is well known, there is a stable and interdependent relationship between consumption and income; that is, income is the decisive factor influencing consumption. We can express this relationship with regression theory and build the linear model $C = \alpha + \beta Y$ on consumption and income. Here $C$ is the per capita consumption expenditure, $Y$ is the per capita disposable income, $\alpha$ is the intercept, and $\beta$ is the marginal consumption propensity, whose value lies between 0 and 1.
As society develops, panel data are becoming easier to obtain and panel data models are becoming more common. We can therefore build a panel data model on the income and consumption panel data, and study how the marginal consumption propensity and the intercept differ among individuals. Based on the empirical analysis, we can then put forward feasible suggestions.

Data Introduction and Preprocessing
The modeling data are the per capita disposable income and the per capita cash expenditure of eight income levels of Chinese urban residents from 2002 to 2012. To eliminate the effect of rising prices, we set the CPI of 2002 to 100 and recalculated the CPI from 2002 to 2012. Dividing the original data by the recalculated CPI and multiplying by 100, we obtained per capita disposable income panel data and per capita cash expenditure panel data with the price effect removed. Using SPSS 19.0, we carried out clustering analysis on each of the two panel data sets. The cluster trees are compared below.
From Figure 1, the per capita disposable income from 2007 to 2012 falls into the same cluster. From Figure 2, the per capita cash expenditure from 2007 to 2012 also falls into the same cluster. Therefore, we can build a panel data model on the per capita disposable income and the per capita cash expenditure of the eight income levels of Chinese urban residents from 2007 to 2012. Table 1 and Table 2 give the two original panel data sets from 2007 to 2012. Because China's CPI is calculated with the previous year as the base period (100) rather than with a fixed base period, we needed to recalculate the CPI with 2007 as the base. The results are shown in Table 3. Before the panel data in Table 1 and Table 2 could be put into the model, the price effect had to be removed; that is, the original data were divided by the corresponding recalculated CPI in Table 3 and multiplied by 100.
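The CPI recalculation and deflation steps can be sketched as below. The sketch assumes the published series gives each year's CPI with the previous year as 100, and chains these into a fixed-base index. The function names and all numbers are made up for illustration; they are not the values in Table 3.

```python
# Illustrative sketch: chain previous-year=100 CPI values into a fixed-base
# index, then deflate nominal values. All numbers below are made up.

def rebase_cpi(chain_cpi):
    """Convert previous-year=100 CPI values into a fixed-base index.

    chain_cpi[0] belongs to the base year and is ignored (the base is set
    to 100); chain_cpi[j] for j >= 1 is year j's CPI with year j-1 as 100.
    """
    fixed = [100.0]
    for c in chain_cpi[1:]:
        fixed.append(fixed[-1] * c / 100.0)
    return fixed

def deflate(nominal, fixed_cpi):
    """Divide nominal values by the fixed-base CPI and multiply by 100."""
    return [v / c * 100.0 for v, c in zip(nominal, fixed_cpi)]

if __name__ == "__main__":
    chain = [104.0, 110.0, 100.0, 105.0]         # hypothetical chained CPI
    fixed = rebase_cpi(chain)
    print(fixed)
    print(deflate([1000.0, 2200.0, 2200.0, 2310.0], fixed))
```

The same two functions apply to each income group's income series and expenditure series in turn.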

Build Model
Because the structures of consumption and income from 2007 to 2012 belong to the same type, we can treat the model parameters as unaffected by time. The form is:

$$y_{it} = \alpha_i + \beta_i x_{it} + u_{it}, \quad i = 1, \ldots, 8, \; t = 2007, \ldots, 2012.$$

Here $y_{it}$ is the per capita cash expenditure of the $i$-th income group in the $t$-th year, and $x_{it}$ is the per capita disposable income of the $i$-th income group in the $t$-th year. Both panel data sets have had the price effect removed, with the CPI of 2007 set to 100. In addition, because the model studies each income group's own data, the parameters can be treated as fixed parameters to be estimated; that is, the model is a fixed effect model.

Model Identification
Using Eviews 7.0, we calculated the sums of squared residuals of the variable coefficient model, the variable intercept model and the basic model under ordinary least squares (the results are in Table 4). Substituting them together with N = 8, T = 6 and K = 1 into the test statistics $F_2$ and $F_1$, and comparing with the critical values at the significance level $\alpha = 0.05$, we determined the model form. With these values of N, T and K, $F_2$ has (14, 32) degrees of freedom and $F_1$ has (7, 32) degrees of freedom. Both hypotheses were rejected in this comparison, so we determined the model to be the fixed effect variable coefficient model.
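For readers without Eviews, the three residual sums of squares can be sketched in plain Python for the single-regressor case (K = 1). This is only an illustration: the within (demeaned) regression used for the variable intercept model is a standard way to fit that model and is assumed here, the function names are hypothetical, and the data are made up.

```python
# Illustrative sketch: residual sums of squares of the three candidate models
# for one explanatory variable (K = 1). Data layout X[i][t], Y[i][t] is assumed.

def simple_ols(x, y):
    """OLS fit of y = a + b*x; returns (a, b, sum of squared residuals)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    ssr = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    return a, b, ssr

def residual_sums(X, Y):
    """Return (S1, S2, S3) for the variable coefficient, variable intercept
    and basic (pooled) models."""
    # S1: variable coefficient model, one separate regression per individual.
    s1 = sum(simple_ols(xi, yi)[2] for xi, yi in zip(X, Y))

    # S2: variable intercept model. Demeaning each individual's data removes
    # its own intercept; the common slope is then fitted on the demeaned data.
    dx, dy = [], []
    for xi, yi in zip(X, Y):
        mx, my = sum(xi) / len(xi), sum(yi) / len(yi)
        dx += [v - mx for v in xi]
        dy += [v - my for v in yi]
    beta = sum(a * b for a, b in zip(dx, dy)) / sum(a * a for a in dx)
    s2 = sum((b - beta * a) ** 2 for a, b in zip(dx, dy))

    # S3: basic model, a single regression on all observations pooled.
    flat_x = [v for xi in X for v in xi]
    flat_y = [v for yi in Y for v in yi]
    s3 = simple_ols(flat_x, flat_y)[2]
    return s1, s2, s3

if __name__ == "__main__":
    # Made-up data: two individuals with different intercepts and slopes.
    X = [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]]
    Y = [[2.0, 4.0, 6.0], [13.0, 14.0, 15.0]]
    print(residual_sums(X, Y))
```

Because the three models are nested, the sums always satisfy S1 <= S2 <= S3, which is what makes the F statistics non-negative.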

Parameter Estimation
Assuming that the random disturbance terms of different cross-section individuals are uncorrelated, we can take the time series of each individual as a sample and use ordinary least squares to estimate $\beta_i$. The parameter estimation results are as follows.
From Table 5, the marginal consumption propensity decreases and the intercept increases as the income level rises. From Table 6, the goodness of fit of the model is as high as 99.9%, which indicates that the fixed effect variable coefficient model fits well. The F statistic also passed the significance test, which indicates that the regression equation is significant as a whole and that the regression coefficients are significant. This shows that income has a significant effect on consumption at each income level. The DW statistic is close to 2, so there is no first-order autocorrelation in the random error term $u_{it}$, which is consistent with the model assumptions; the modeling process and the results are therefore credible.
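The diagnostics discussed above (goodness of fit and the Durbin-Watson statistic) can be sketched for a single income group as follows. This is a per-group illustration with made-up data and a hypothetical function name; Eviews reports these statistics for the estimated system as a whole, so the numbers need not match Table 6.

```python
# Illustrative sketch: OLS fit of y = alpha + beta*x for one group, with
# R-squared and the Durbin-Watson statistic. Data below are made up.

def fit_with_diagnostics(x, y):
    """Return the intercept, slope, R-squared and Durbin-Watson statistic."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    beta = sxy / sxx                      # marginal consumption propensity
    alpha = my - beta * mx                # intercept
    resid = [yi - alpha - beta * xi for xi, yi in zip(x, y)]
    ssr = sum(e * e for e in resid)
    sst = sum((yi - my) ** 2 for yi in y)
    r2 = 1.0 - ssr / sst
    # Durbin-Watson: squared successive residual differences over SSR.
    # Values near 2 suggest no first-order autocorrelation.
    dw = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, n)) / ssr
    return {"alpha": alpha, "beta": beta, "r2": r2, "dw": dw}

if __name__ == "__main__":
    income = [1.0, 2.0, 3.0, 4.0]         # hypothetical deflated income
    spending = [1.0, 3.0, 2.0, 4.0]       # hypothetical deflated expenditure
    print(fit_with_diagnostics(income, spending))
```

Note that the Durbin-Watson ratio is undefined when the fit is exact (zero residual sum of squares), so the sketch assumes data with some residual variation.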

Results Analysis
From the parameter estimation results, the following conclusions can be drawn:
1) The marginal consumption propensity differs clearly across income levels, and it decreases as the income level rises. This shows that income disparity is indeed a decisive factor influencing consumption: the higher the income, the weaker the marginal desire to consume. This is consistent with the principle of "diminishing marginal returns" in economics.
2) The intercept increases as the income level rises. This shows that the absolute consumption level of urban residents rises with income.
3) In general, the marginal consumption propensity at every income level is over 50%. This shows that, whatever their income level, residents have a strong desire to consume, although different income levels may direct their consumption differently.

Conclusion
Panel data models can analyze practical problems along both the time dimension and the individual dimension, so they are being applied more and more widely. The general theory of panel data is relatively mature, and a general linear panel data model was applied in this paper. Using the intercepts and marginal consumption propensities of the variable coefficient panel data model, we can distinguish the recent spending habits of different income levels, and then introduce different policies to stimulate consumption. However, this paper did not subdivide consumption into categories such as food, clothing and household goods. Including these categories in the model would make the results more useful for stimulating consumption, and the general panel data model can accommodate this extension. Additionally, nonclassical panel data models, such as the dynamic panel data model and the nonlinear dynamic panel data model, still need to be studied. Long and Zhang [14] studied the theory and application of the dynamic panel data model, but its parametric and nonparametric estimation still needs further study.

Table 1. Per capita disposable income of Chinese urban residents (RMB).

Table 2. Per capita cash expenditure of Chinese urban residents (RMB).

Table 3. Recalculated CPI values with the CPI of 2007 set to 100.

Table 4. Sums of squared residuals of the three models.

Table 5. The estimation results of the variable coefficient model.

Table 6. The statistical results of the variable coefficient model.