Time and Space Analysis of House Price in Mainland China in the Last 10 Years

This paper uses time series analysis, geographic information, cluster analysis, causality tests and other techniques to analyze housing prices in mainland China over the past 10 years. When provinces are clustered with K-means according to how their house price growth compares with that of neighboring provinces, provinces of the same category are contiguous on the map. The areas with the highest and the lowest prices over the time series are also contiguous on the map. A time-series-based causality test finds that the growth rate of house prices in six provinces was affected by the surrounding provinces, and that the growth rate of house prices in one province affected the growth rate in its neighboring provinces.


Introduction
In recent years, house prices in all provinces, autonomous regions and municipalities across China have shown a clear upward trend, but the pattern of increase varies considerably across regions. For example, the average sales price of residential commercial housing in Shandong Province increased from 2904.14 Yuan/m² (2007) to 5855 Yuan/m² (2016), with an overall growth rate of 6.45%; however, the average sales price of residential commercial housing in the Ningxia Hui Autonomous Region shows a lower rate of increase, rising from 2722.58 Yuan/m² (2007) to 5485 Yuan/m² (2016), with an overall growth rate of 14.78%. This paper uses time series analysis, geographic information, cluster analysis, causality tests and other techniques to study the relationship of house prices among all provinces, autonomous regions and municipalities (except Hainan Province) across mainland China.
Previous research in China has demonstrated the practicability of using time series analysis to study house prices. Xiuli Wu and Feng Zhang (2007), taking house prices in Guangzhou as their subject, divided Guangzhou into highly bustling, moderately bustling and non-bustling regions and built a separate time series model for house price prediction in each [1]. This research showed that time series analysis is a reasonable way to analyze the trend of house prices. Li He (2014) used time series analysis to predict the commercial residential housing price index in Beijing in 2014 [2]. Li Li and Jinwen Wu (2016) applied a time series model to the relationship between land prices and house prices [3]. Xuyi Xie (2006) used time series analysis to examine the view that house prices have a more apparent impact on land prices in the long term [4]. Foreign scholars also use this method to analyze house prices in other economies. Willcocks (2009) used time series analysis to study house prices in the UK [5].
Spatial analysis is also widely recognized in house price analysis. Introducing spatial variables into economic models captures location-related effects that are otherwise easily ignored. Modern developments in GIS technology have made it much more convenient to study economics across regions. Zhixiong Mei and Xia Li (2007), taking house prices in Dongguan as their subject, plotted the prices of residential houses on a grid and superimposed buffer zones around major roads on the house price distribution map in order to discover and explain the spatial variation in house prices in Dongguan [6]. Can (1998) used GIS technology to analyze the spatial relationship between the housing market and the mortgage market [7].
This paper conducts time series analysis and spatial analysis of house prices in mainland China in the past 10 years.

Data Sources
There are two main sources of our data. The first is the house prices from 2007 to 2016 in each province, obtained from the National Bureau of Statistics website. According to the System of Statistics Reports on Real Estate Development (2018), the data are collected from all legal entities engaged in real estate development and management. All surveyed entities report their data monthly through the direct network reporting system [8] (Table 1).
The second data source is a list of the provinces adjacent to each province (Table 2), compiled from the map of China. Since Table 1 does not contain data for Hong Kong, Macau, and Chinese Taipei, these three regions are excluded from the map analysis that follows.

Geographic Information Technology
Geographic Information System (GIS) is a special-purpose digital database in which a common spatial coordinate system is the primary means of reference.
A comprehensive GIS requires a means of: data input, from maps, aerial photos, satellites, surveys, and other sources; data storage, retrieval, and query; data transformation, analysis, and modeling, including spatial statistics; and data reporting, such as maps, reports, and plans [9].
This paper uses the map of China as geographical reference to analyze data in house prices of provinces in mainland China.

Time Series Analysis and Granger Causality Test
A time series is a sequence of values of a variable ordered at equal time intervals. Time series analysis has two main purposes: understanding the underlying drivers and structure of the observed data and finding suitable models; and predicting and monitoring [10]. This article uses the Granger causality test [11].
Analysis tool: This article uses the lmtest package for Granger causality testing. The code is implemented in R version 3.5.

Cluster Analysis
Cluster analysis refers to the process of grouping a collection of physical or abstract objects into multiple classes of similar objects. The goal of cluster analysis is to classify data on the basis of similarity [12].
Chen Jian (2007) briefly introduced the concept and principle of clustering analysis algorithms [13]. Cluster analysis, also known as group analysis or point group analysis, is a multivariate statistical method for studying classification, and mainly consists of hierarchical clustering and iterative clustering. This paper uses KMeans from sklearn.cluster for K-means clustering and hierarchy from scipy.cluster for hierarchical clustering. The code is implemented in Python 3.6. Sklearn is a commonly used third-party Python module for machine learning, and Scipy is a commonly used third-party Python module for data analysis. Both can be installed via pip.
Across the four clusters, the average ratio of a province's annual rate of increase of residential housing prices to the average annual rate of increase among its neighboring provinces is 0.754 for the first-cluster provinces, 0.962 for the second-cluster provinces, 1.187 for the third-cluster provinces, and 2.146 for the fourth-cluster provinces.

Monographic Analysis
House prices in the first-cluster provinces rise relatively slowly compared with their neighboring provinces. These provinces, for example Hebei Province, Jiangsu Province, and the Guangxi Zhuang Autonomous Region, are mostly neighbors of the fourth-cluster provinces. They have been left behind in the process of urbanization amid China's rapid urban development. These provinces also show a net outflow of population. The people who emigrate from the first-cluster provinces mainly end up in more economically advanced regions, especially the fourth-cluster provinces. This movement of population decreases the rigid demand for housing in the first-cluster provinces while increasing it in the fourth-cluster provinces, which accelerates house price growth in the fourth-cluster provinces and slows it in the first-cluster provinces.
In the second-cluster provinces, the rates of house price increase are almost the same as those of their neighboring provinces. These provinces are mainly located in the Northeast, the North, the Northwest, the West, and the Southwest. There is little difference in economic development between a second-cluster province and its neighbors, and population flow among them is not significant.
The rates of house price increase in the third-cluster provinces are slightly higher than those of their neighboring provinces, but the differences are small. The third cluster includes Sichuan Province, Shaanxi Province, and Hubei Province in central China, Liaoning Province in the Northeast, and Zhejiang Province in the East.
The fourth-cluster provinces stand out clearly on the map as the three cores of house price increase: Guangdong Province, Shanghai Municipality, and Beijing Municipality. They are also the cores of economic advancement and the most highly urbanized regions in mainland China.
2) Hierarchical clustering of house price growth rates
On the basis of Table 1, the house price growth rate of each province from 2008 to 2016 is calculated. Applying hierarchical clustering to these nine-year growth rates yields Figure 2.
From Figure 2, we can draw the following conclusions. Shanghai Municipality, Beijing Municipality, Jiangsu Province, and Tianjin Municipality can be sorted into one cluster. All of them have highly developed economies, and their large net migration inflows and active housing property investment give them similar patterns of house price increase. In addition, when the central government starts controlling house prices, the policies applied to these provinces are similar.
Tibet, an ethnic minority autonomous region, has a large minority population. Population flows into and out of Tibet are relatively rigid. Its cultural and political differences from other provinces make it a special case.
The other provinces are similar in their patterns of house price fluctuation.

Analysis of the Highest and Lowest Price Maps in Time Series
The average selling price of residential commercial housing across the provinces of mainland China is highly polarized. The two municipalities of Beijing and Shanghai have the highest and second-highest average selling prices of residential commercial housing. Several of the provinces with low house prices are located in the Northwest of China. They lie far from the ocean and at high altitudes, with vast areas of grassland, plateau and desert and an arid environment. Harsh natural environments and economic conditions make these provinces less suitable for human habitation and lower the quality of residential housing there. Guizhou Province, Hunan Province, and Jiangxi Province are located in south-central China, at some distance from major coastal cities such as Shenzhen in Guangdong Province, Xiamen in Fujian Province, and Hangzhou in Zhejiang Province. With rapid economic development in their neighboring provinces, these relatively under-developed inland provinces have large net emigration to the coastal provinces. The resulting relatively low rigid demand in Guizhou, Hunan, and Jiangxi contributes to depressed house prices in these provinces.

Pulling Analysis with Neighboring Provinces
First, calculate the growth rate of house prices in each province from 2008 to 2016. Then, according to Table 2, calculate the growth rate of house prices among each province's neighboring provinces from 2008 to 2016 (the mean value over all neighbors). Finally, apply the Granger causality test to the two time series of house price growth rates, that of each province and that of its neighboring provinces.
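The final step can be illustrated with a minimal Python sketch. The paper itself runs the test with the lmtest package in R; the version below is only an illustration of the underlying idea, comparing a regression of one series on its own lags with a regression that also includes lags of the other series. The function names `lagged` and `granger_f_stat` and the synthetic series are assumptions made for this example, not the authors' code or data.

```python
import numpy as np

def lagged(series, order):
    """Columns hold the series lagged by 1..order steps, aligned with series[order:]."""
    n = len(series)
    return np.column_stack([series[order - j : n - j] for j in range(1, order + 1)])

def granger_f_stat(effect, cause, order=1):
    """F statistic for H0: lags of `cause` add nothing to predicting `effect`
    beyond the lags of `effect` itself."""
    effect = np.asarray(effect, dtype=float)
    cause = np.asarray(cause, dtype=float)
    target = effect[order:]
    const = np.ones((len(target), 1))
    restricted = np.hstack([const, lagged(effect, order)])        # own lags only
    unrestricted = np.hstack([restricted, lagged(cause, order)])  # plus lags of the other series

    def rss(design):
        # Ordinary least squares fit, then residual sum of squares.
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        return float(resid @ resid)

    rss_r, rss_u = rss(restricted), rss(unrestricted)
    df_resid = len(target) - unrestricted.shape[1]
    return ((rss_r - rss_u) / order) / (rss_u / df_resid)

# Synthetic illustration (not the paper's data): `province` follows
# `neighbors` with a one-step lag plus a little noise.
rng = np.random.default_rng(0)
neighbors = rng.normal(size=200)
province = np.empty(200)
province[0] = 0.0
province[1:] = 0.8 * neighbors[:-1] + 0.1 * rng.normal(size=199)

f_forward = granger_f_stat(province, neighbors, order=1)  # neighbors -> province
f_reverse = granger_f_stat(neighbors, province, order=1)  # province -> neighbors
```

A large F statistic in one direction and a small one in the other is the pattern the test looks for. Converting the statistic to a P value requires the F distribution with `order` and `df_resid` degrees of freedom, which is omitted here.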
1) Pulled by neighboring provinces
The calculation results show that four provinces are significant with the delay parameter Order = 1, namely Henan, Fujian, Yunnan and Jilin. With the delay parameter Order = 2, two are significant, Tianjin and Shanghai, as shown in Table 4.
In Table 4, the P values of Henan and Yunnan provinces are less than 0.001, which is highly significant. The actual growth rate curves of house prices are plotted in Figure 4.
Figure 4 shows that as the growth rate in the neighboring provinces rose, flattened, declined and rose again, the growth rate of housing prices in Henan Province responded with corresponding changes after a very short interval. For Yunnan, as the growth rate in the neighboring provinces declined, rose, flattened, declined, rose and fell, the growth rate of housing prices in Yunnan Province also responded, with a lag of about one year.
Two clustering methods, K-means clustering and hierarchical clustering, are used in this paper. The K-means clustering method [14] is a kind of iterative clustering. The K-means algorithm takes as input the number of clusters k and a data set containing n data objects, and outputs k clusters chosen to minimize the within-cluster variance. In other words, it divides the n data objects into k clusters such that the similarity of objects within the same cluster is high and the similarity of objects in different clusters is low. Hierarchical clustering is a general term for a class of algorithms that either continuously merge clusters from the bottom up or continuously split clusters from the top down to form nested clusters. This hierarchy of classes is represented by a "tree" [15]. The agglomerative clustering algorithm is a hierarchical clustering algorithm whose principle is very simple. In the beginning, each data point forms its own cluster; then the two clusters closest to each other are found and combined into one, and this step is repeated until the preset number of clusters is reached.
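Both procedures can be sketched in a few lines of dependency-free Python. The paper uses KMeans from sklearn and hierarchy from scipy; the functions below (`kmeans_1d`, `agglomerative_1d`) are hypothetical helpers written for illustration on one-dimensional values. The deterministic initialisation and the single-linkage distance between clusters are assumptions of this sketch, since the text does not specify them.

```python
def kmeans_1d(values, k, iterations=100):
    """Lloyd's algorithm on scalars; returns (labels, centers)."""
    ordered = sorted(values)
    # Assumption of this sketch: spread the initial centers over the sorted data.
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = []
    for _ in range(iterations):
        # Assignment step: each value joins its nearest center.
        labels = [min(range(k), key=lambda c: (v - centers[c]) ** 2) for v in values]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centers.append(sum(members) / len(members) if members else centers[c])
        if new_centers == centers:  # converged
            break
        centers = new_centers
    return labels, centers

def agglomerative_1d(values, n_clusters):
    """Bottom-up merging; cluster distance is the smallest gap between
    members (single linkage, an assumption of this sketch)."""
    clusters = [[v] for v in values]  # every point starts as its own cluster
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                gap = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or gap < best[0]:
                    best = (gap, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]
    return clusters

# Hypothetical ratios, not values from the paper.
ratios = [0.7, 0.8, 1.0, 1.1, 2.1, 2.2]
labels, centers = kmeans_1d(ratios, k=3)
merged = agglomerative_1d(ratios, n_clusters=3)
```

On this toy input both methods recover the same three groups. On real data they can differ, because K-means minimizes within-cluster variance while agglomerative merging depends on the chosen linkage.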

Figure 1. Cluster display comparing house prices in neighboring provinces.

Table 1. Average sales price of commercial housing in each province from 2007 to 2016 (yuan/square meter).

Table 2. List of adjacent provinces of mainland China.
Granger causality is a statistical concept based on predictive causality. According to the Granger causality test, if X1 Granger-causes X2, then the historical values of X1 should contain information that helps predict future values of X2. The mathematical formulation is based on linear regression modeling of stochastic processes. This paper uses the Granger causality test to analyze the changes in housing prices in mainland China over the past decade and the potential factors affecting them.
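One common way to write the linear regression behind the test is given below. The notation (lag order p, coefficients a_i and b_i, error term ε_t) is introduced here for illustration and is not taken from the paper.

```latex
X_2(t) = a_0 + \sum_{i=1}^{p} a_i \, X_2(t-i) + \sum_{i=1}^{p} b_i \, X_1(t-i) + \varepsilon_t ,
\qquad
H_0 : b_1 = b_2 = \dots = b_p = 0 .
```

Rejecting H_0 means that the lagged values of X1 improve the prediction of X2 beyond what the lagged values of X2 already provide, which is what is meant by X1 Granger-causing X2.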

Table 3. Comparison with surrounding provinces (average).
Chongqing 1.056 1.013 1.088 1.180 1.074 1.067 1.133 1.112 1.079 1.019

Modern Economy

1) Kmeans clustering of house price growth rate and ratio of surrounding provinces
On the basis of Table 1: a) calculate the growth rate of house prices, obtaining the growth rate of house prices in each province from 2008 to 2016; b) using the information on neighboring provinces in Table 2, calculate the nine-year average growth rate of housing prices among the neighboring provinces of each province; c) calculate the ratio of the growth rate of house prices in each province to the average of the surrounding provinces, which gives Table 3. According to Figure 1, all provinces, autonomous regions and municipalities in mainland China, except Hainan Province, are put into four clusters in the choropleth map. Each cluster is summarized by the average value of the annual rate of increase of residential housing prices within a province divided by that of its neighboring provinces.
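Steps a) to c) can be sketched in Python as follows. This is a minimal illustration on made-up data, not the authors' code. It assumes that the growth rate is the fractional year-over-year change and that the ratio is taken between multi-year mean growth rates; the function names and the provinces "A", "B" and "C" are hypothetical.

```python
from statistics import mean

def growth_rates(prices):
    """Step a): fractional year-over-year change, price[t] / price[t-1] - 1."""
    return [later / earlier - 1 for earlier, later in zip(prices, prices[1:])]

def ratio_to_neighbors(prices_by_province, neighbors):
    """Steps b) and c): mean growth of a province divided by the mean
    growth of its adjacent provinces."""
    avg_growth = {
        province: mean(growth_rates(series))
        for province, series in prices_by_province.items()
    }
    result = {}
    for province, adjacent in neighbors.items():
        neighbor_avg = mean(avg_growth[n] for n in adjacent)  # step b)
        result[province] = avg_growth[province] / neighbor_avg  # step c)
    return result

# Hypothetical toy data: three provinces, three years of prices each.
prices = {
    "A": [100.0, 110.0, 121.0],   # grows 10% per year
    "B": [100.0, 105.0, 110.25],  # grows 5% per year
    "C": [100.0, 120.0, 144.0],   # grows 20% per year
}
adjacent = {"A": ["B", "C"], "B": ["A"], "C": ["A"]}

ratios = ratio_to_neighbors(prices, adjacent)
```

A ratio below 1 means the province's prices grew more slowly than those of its neighbors, and a ratio above 1 means they grew faster. These ratios are the quantity that the K-means step then groups into clusters.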

Table 4. Granger causality test driven by neighboring provinces (only listed with significant P values).