
This paper studied the clustering analysis of panel data, the specification test for panel data models, and the estimation of their parameters. After carrying out clustering analysis on the panel data, we chose to study the relationship between consumption and income for eight income levels of Chinese urban residents from 2007 to 2012. Based on the analysis of covariance for panel data models, we built a variable coefficient panel data model and estimated its parameters. The model identifies the relationship between consumption and income in recent years. From the estimation results, we concluded that income disparities have an important influence on the consumption behavior of urban residents.

Panel data are two-dimensional data observed over time and across cross sections at the same time [

Building a model on panel data has several advantages over using time series data or cross-sectional data alone. First, a panel data model can estimate unobservable individual effects and time effects at the same time, so it is more efficient. Second, panel data provide more information, which increases the degrees of freedom of the model, reduces multicollinearity among the explanatory variables, and ultimately improves the accuracy of parameter estimation [

Since the 1970s, a large number of theoretical and empirical analyses of panel data have appeared [

This paper preprocessed the consumption and income panel data for eight income levels of Chinese urban residents from 2002 to 2012, carried out clustering analysis on the panel data, and concluded that the structures of consumption and income were the same from 2007 to 2012. Using the analysis of covariance for panel data models, we then built a variable coefficient panel data model on the consumption and income panel data for the eight income levels from 2007 to 2012. Finally, we used Eviews 7.0 to estimate the parameters of the model and analyzed the results.

The panel data

Several commonly used measures of the distance between cross sections are listed below:

1) Euclidean Distance:

2) Squared Euclidean Distance:

3) Minkowski Distance:

4) Manhattan Distance:

5) Chebyshev Distance:
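The five distances listed above can be sketched in code using their standard textbook definitions. This is a minimal illustration, assuming each cross section is given as a list of numbers of equal length; the function names and the two sample vectors are hypothetical and are not taken from the paper.

```python
from math import fsum


def minkowski(x, y, q):
    """Minkowski distance of order q between two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("cross sections must have the same length")
    return fsum(abs(a - b) ** q for a, b in zip(x, y)) ** (1.0 / q)


def euclidean(x, y):
    """Euclidean distance: the Minkowski distance with q = 2."""
    return minkowski(x, y, 2)


def squared_euclidean(x, y):
    """Squared Euclidean distance: sum of squared differences."""
    if len(x) != len(y):
        raise ValueError("cross sections must have the same length")
    return fsum((a - b) ** 2 for a, b in zip(x, y))


def manhattan(x, y):
    """Manhattan distance: the Minkowski distance with q = 1."""
    return minkowski(x, y, 1)


def chebyshev(x, y):
    """Chebyshev distance: the largest absolute coordinate difference."""
    if len(x) != len(y):
        raise ValueError("cross sections must have the same length")
    return max(abs(a - b) for a, b in zip(x, y))


# Two hypothetical cross sections observed on the same three individuals.
x = [1.0, 2.0, 3.0]
y = [4.0, 6.0, 3.0]
print(euclidean(x, y), squared_euclidean(x, y), manhattan(x, y), chebyshev(x, y))
```

The Euclidean and Manhattan distances are written as special cases of the Minkowski distance, which makes the relationship between the five measures explicit.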

Clustering analysis of panel data can divide the time sections into several groups. A model built on a single group can ignore the unobservable time effect, which is of practical importance in applications. Zhu and Chen [

The basic principle of clustering analysis is: for the panel data

To build a model on panel data, we must first determine the form of the model. The general panel data model is as follows:

Among them,

The common special cases of model (1) are as follows:

1) when

2) when

3) when

The common test for determining the model form is the analysis of covariance, also called the F test. The test involves two main hypotheses:

Hypothesis 1: The slopes are the same, but the intercepts are not the same. The model is:

Hypothesis 2: The intercepts and slopes are the same in different cross sections and different time series. The model is:

Following the method of testing parameter restrictions, we can construct test statistics for the two hypotheses above^{1}. The test statistics for hypothesis 1 and hypothesis 2 are, respectively:

Among them,

When hypothesis 1 is correct,

The fixed effect variable coefficient model (1) can be rewritten as:

Among them,

The matrix form is:

Among them,

The fixed effect variable coefficient model is also called the seemingly unrelated regression model. It assumes that the coefficients of each individual do not change over time. It was put forward by Zellner in 1962. The choice of parameter estimation method depends on the random disturbance term^{2}. If

So the generalized least squares estimate of the parameters is:

According to the consumption theory of Keynes, total consumption is a function of total income. As is well known, there is a stable and interdependent relationship between consumption and income; namely, income is the decisive factor influencing consumption. We can express this relationship with regression theory and build the linear model

As society develops, panel data are becoming easier to obtain and panel data models are becoming more common. We can therefore build a panel data model on the income and consumption panel data, and study how the marginal consumption propensity and the intercept term differ among individuals. Based on the empirical analysis, we can put forward feasible suggestions.

The modeling data are the per capita disposable income and the per capita cash expenditure for eight income levels of Chinese urban residents from 2002 to 2012^{3}. To eliminate the effect of rising prices^{4}, we set the cpi of 2002 to 100 and recalculated the cpi from 2002 to 2012. Dividing the original data by the recalculated cpi and multiplying by 100, we obtained per capita disposable income panel data and per capita cash expenditure panel data with the effect of rising prices removed. Using SPSS 19.0, we carried out clustering analysis on each of the two panel data sets. The following is the comparison of the cluster trees.

From

Because the structures of consumption and income from 2007 to 2012 belong to the same type, we can treat the model parameters as unaffected by time. The form is:

Among them,

Income levels | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 |
---|---|---|---|---|---|---|
Poor households | 3357.9 | 3734.4 | 4197.6 | 4739.2 | 5398.2 | 6520 |
Lowest income households | 4210.1 | 4753.6 | 5253.2 | 5948.1 | 6876.1 | 8215.1 |
Lower income households | 6504.6 | 7363.3 | 8162.1 | 9285.3 | 10672 | 12488.6 |
Lower middle income households | 8900.5 | 10195.6 | 11243.6 | 12702.1 | 14498.3 | 16761.4 |
Middle income households | 12042.3 | 13984.2 | 15399.9 | 17224 | 19544.9 | 22419.1 |
Upper middle income households | 16385.8 | 19254.1 | 21018.0 | 23188.9 | 26420 | 29813.7 |
Higher income households | 22233.6 | 26250.1 | 28386.5 | 31044 | 35579.2 | 39605.2 |
Highest income households | 36784.5 | 43613.8 | 46826.1 | 51431.6 | 58841.9 | 63824.2 |

Income levels | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 |
---|---|---|---|---|---|---|
Poor households | 3447.7 | 3862.7 | 4256.8 | 4715.3 | 5575.6 | 6366.8 |
Lowest income households | 4036.3 | 4532.9 | 4900.6 | 5471.8 | 6431.9 | 7301.4 |
Lower income households | 5634.2 | 6195.3 | 6743.1 | 7360.2 | 8509.3 | 9610.4 |
Lower middle income households | 7123.7 | 7993.7 | 8738.8 | 9649.2 | 10872.8 | 12280.8 |
Middle income households | 9097.4 | 10344.7 | 11309.7 | 12609.4 | 14028.2 | 15719.9 |
Upper middle income households | 11570.4 | 13316.6 | 14964.4 | 16140.4 | 18160.9 | 19830.2 |
Higher income households | 15297.7 | 17888.2 | 19263.9 | 21000.4 | 23906.2 | 25796.9 |
Highest income households | 23337.3 | 26982.1 | 29004.4 | 31761.6 | 35183.6 | 37661.7 |

Year | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 |
---|---|---|---|---|---|---|
cpi | 100 | 105.6 | 104.6 | 107.9 | 113.6 | 116.7 |

rising factor of price, taking the cpi of 2007 as 100. In addition, because the model studies each income group's own data, the parameters can be regarded as fixed parameters to be estimated; that is, the model is a fixed effect model.
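The price adjustment described in this paper (divide by the rebased cpi, then multiply by 100) can be sketched as follows. This is a minimal illustration under two assumptions: that the first data table above holds nominal per capita disposable income, and that the cpi row is already rebased to 2007 = 100. The helper names `rebase` and `deflate` and the two-value index passed to `rebase` are hypothetical.

```python
def rebase(index, base_pos=0):
    """Rescale a price index so that the entry at base_pos equals 100."""
    base = index[base_pos]
    return [100.0 * value / base for value in index]


def deflate(nominal, cpi):
    """Express nominal amounts in base-year prices: value / cpi * 100."""
    if len(nominal) != len(cpi):
        raise ValueError("series must cover the same years")
    return [value / c * 100.0 for value, c in zip(nominal, cpi)]


# cpi for 2007-2012 with 2007 = 100, as given in the cpi table.
cpi_2007_base = [100, 105.6, 104.6, 107.9, 113.6, 116.7]

# Poor households, 2007-2012 (assumed: nominal per capita disposable income).
income_poor = [3357.9, 3734.4, 4197.6, 4739.2, 5398.2, 6520]

real_income_poor = deflate(income_poor, cpi_2007_base)
print([round(v, 1) for v in real_income_poor])

# Rebasing a hypothetical index quoted on some other base year.
print(rebase([120.0, 126.72]))
```

Because the base-year index equals 100, the base-year value is unchanged, while later years are scaled down by the cumulative price increase.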

Using Eviews 7.0, we calculated the sums of squared residuals of the variable coefficient model, the variable intercept model, and the basic model under ordinary least squares (the results are given in the table of sums of squared residuals), computed F_{2} and F_{1}, and compared them with the critical values at the significance level

The values of

Comparing with the critical value:

From the comparison above, we determine that the model is a fixed effect variable coefficient model.
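As a rough check of this specification test, the sketch below recomputes the two analysis-of-covariance F statistics from the sums of squared residuals reported in this paper (978,731 for the variable coefficient model, 2,882,910 for the variable intercept model, and 8,294,394 for the basic model), with N = 8 income levels, T = 6 years, and K = 1 explanatory variable. The formulas are the textbook form of these statistics, which we assume matches the statistics defined earlier; the function and argument names are illustrative.

```python
def covariance_f_stats(s1, s2, s3, n, t, k):
    """Return (F2, F1) for the panel-data analysis of covariance.

    s1: sum of squared residuals of the variable coefficient model
    s2: sum of squared residuals of the variable intercept model
    s3: sum of squared residuals of the basic (pooled) model
    n, t, k: number of individuals, periods, and explanatory variables
    """
    df_resid = n * (t - k - 1)   # residual degrees of freedom of the unrestricted model
    df_f2 = (n - 1) * (k + 1)    # restrictions: equal intercepts and equal slopes
    df_f1 = (n - 1) * k          # restrictions: equal slopes only
    unrestricted_var = s1 / df_resid
    f2 = ((s3 - s1) / df_f2) / unrestricted_var
    f1 = ((s2 - s1) / df_f1) / unrestricted_var
    return f2, f1


f2, f1 = covariance_f_stats(978731, 2882910, 8294394, n=8, t=6, k=1)
print(f"F2 = {f2:.3f} with degrees of freedom (14, 32)")
print(f"F1 = {f1:.3f} with degrees of freedom (7, 32)")
```

Each statistic is compared with the F critical value for its degrees of freedom at the chosen significance level. A value of F2 above its critical value rejects hypothesis 2, and a value of F1 above its critical value then rejects hypothesis 1, which leads to the variable coefficient model.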

Assuming that the random disturbance terms are uncorrelated across cross-section individuals, we can take each individual's time series as a sample and use ordinary least squares to estimate

From ^{5}, which is consistent with the hypothesis, so the modeling process and the results are credible.
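The per-individual least squares step can be illustrated for a single income level. The sketch below is not the Eviews procedure used in this paper; it is a plain ordinary least squares fit of price-adjusted expenditure on price-adjusted income for the poor households group, assuming that the first data table holds nominal income and the second holds nominal cash expenditure. The function name `ols_line` is hypothetical.

```python
def ols_line(x, y):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# cpi for 2007-2012 with 2007 = 100, as given in the cpi table.
cpi = [100, 105.6, 104.6, 107.9, 113.6, 116.7]

# Poor households, 2007-2012 (assumed nominal values from the two data tables).
income = [3357.9, 3734.4, 4197.6, 4739.2, 5398.2, 6520]
spending = [3447.7, 3862.7, 4256.8, 4715.3, 5575.6, 6366.8]

# Remove the price effect before fitting.
real_income = [v / c * 100.0 for v, c in zip(income, cpi)]
real_spending = [v / c * 100.0 for v, c in zip(spending, cpi)]

intercept, slope = ols_line(real_income, real_spending)
print(f"marginal consumption propensity = {slope:.4f}")
print(f"group intercept = {intercept:.2f}")
```

Under these assumptions the fitted slope is close to the marginal consumption propensity reported for poor households. The intercept of a separate regression is the group's own total intercept, so it need not equal the value in the Intercept column, which may be reported relative to a common constant.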

From the parameter estimation results, the following conclusions can be drawn:

1) The marginal consumption propensity differs markedly across income levels, and it decreases as the income level rises. This shows that income disparity is a decisive factor influencing consumption: the higher the income, the weaker the marginal desire to consume. This is consistent with the notion of “diminishing marginal returns” in economics.

Model forms | Variable coefficient model | Variable intercept model | Basic model |
---|---|---|---|
Sum of squared residuals | 978,731 | 2,882,910 | 8,294,394 |

Income levels | Marginal consumption propensity | Intercept |
---|---|---|
Poor households | 0.916095 | −1431.855 |
Lowest income households | 0.80146 | −1154.611 |
Lower income households | 0.628855 | −324.979 |
Lower middle income households | 0.625534 | −265.0764 |
Middle income households | 0.619132 | −166.872 |
Upper middle income households | 0.606672 | −71.13295 |
Higher income households | 0.59452 | 370.0824 |
Highest income households | 0.505467 | 3044.444 |

Cross-section fixed (dummy variables) | | | |
---|---|---|---|
R-squared | 0.999665 | Mean dependent var | 12181.09 |
Adjusted R-squared | 0.999508 | S.D. dependent var | 7882.601 |
S.E. of regression | 174.8867 | Akaike info criterion | 13.42735 |
Sum squared resid | 978,731 | Schwarz criterion | 14.05109 |
Log likelihood | −306.2565 | Hannan-Quinn criter | 13.66306 |
F-statistic | 6363.363 | Durbin-Watson stat | 1.97787 |
Prob(F-statistic) | 0.000000 | | |

2) The intercept term increases as the income level rises. This shows that the absolute consumption level of urban residents increases with income.

3) In general, the marginal consumption propensity at every income level is over 50%. This shows that residents' desire to consume is high regardless of income level, although different income levels may pursue different types of consumption.

Panel data models can analyze practical problems along both the time and the individual dimensions, so they are being applied more and more widely. The general theory of panel data is relatively mature, and a general linear panel data model was applied in this paper. Using the intercept term and the marginal consumption propensity of the variable coefficient panel data model, we can distinguish the spending habits of different income levels in recent years and then introduce different policies to stimulate consumption. However, this paper did not subdivide consumption into categories such as food, clothing, and household goods. Adding these categories to the model would make the results more useful for stimulating consumption, and a general panel data model could accommodate this extension. Additionally, nonclassical panel data models, such as dynamic panel data models and nonlinear dynamic panel data models, still need to be studied. Long and Zhang [

The authors would like to thank Hongfu Pan and Guoshuai Wang for their assistance. They recommended the journal to us and explained the online submission process. We finished the paper with their support.