Analysis of the Characteristics of City Scale Distribution and Evolutionary Trends in China

Based on urban resident population statistics from 2005 to 2018, this paper analyzes the distribution and evolution of city scale in China. City samples are screened according to threshold criteria, and empirical methods including the City Primacy Index, the Rank-Scale Rule, the Gini coefficient of city scale, Kernel Density Estimation and the Markov transfer matrix are applied. The results show that: the most populous city in China has obvious advantages; the population distribution is concentrated in high-order cities and accords with the Rank-Scale Rule; the economic scale of cities is concentrated, the gap between the economic development levels of different types of cities is large, and the megacities are more attractive, which to a certain extent limits the scale development of the remaining cities; China's city population is increasing, but the gap between the population scale of other cities and that of the most populous city remains large, and the structure of city population scale is not yet reasonable; megacities and megalopolises largely keep their original scale levels, and scale transition between the two types of cities is rather difficult. Finally, based on an explanatory framework of the dynamics of city scale evolution, policy recommendations are proposed to promote a more balanced distribution of city scale.


Introduction
City scale is a core topic in city research, and an accurate understanding of the evolutionary dynamics of China's city scale can provide strong technical support arguing that the factors affecting economic development also have an impact on the evolution of city scale distribution trends [9]; Sen-Sheng Li et al. applied a joint cubic equation model in economics and found that the urban-rural income gap also has an impact on the city scale distribution [10]; Yan Zhou et al. improved the land use model for city scale development by adding land constraint parameters in terms of population growth and land expansion, deriving a quantitative relationship between the rate of land expansion and the rate of population growth [11]. Meanwhile, some scholars have explored the characteristics of city scale distribution in different regions of China, such as the Poyang Lake City Cluster [12] and the Central Liaoning City Cluster [13], and concluded that city scale is unevenly distributed in each region.
However, few of the aforementioned studies combine multiple research methods such as Zipf's law [14], kernel density estimation [15], and the Markov transfer matrix [16] [17]. Together with the limited accuracy of the indicators used to characterize city scale in empirical studies on the distribution and dynamic evolution of city scale in China, and the limitations of administrative statistical units, this leads to discrepancies between the estimated and true city scale. Therefore, this paper will consider the distribution characteristics

City Primacy
The American geographer Jefferson (1939) regarded the first city as the most populous city in a region, with a population about twice that of the second city. He therefore proposed the ratio of the population of the first city to that of the second city to represent primacy, i.e., $S_2$. Later, scholars extended it to the four-city index ($S_4$) and the eleven-city index ($S_{11}$) [18], which are calculated as follows:

$$S_2 = \frac{P_1}{P_2}, \qquad S_4 = \frac{P_1}{P_2 + P_3 + P_4}, \qquad S_{11} = \frac{2P_1}{P_2 + P_3 + \cdots + P_{11}} \tag{1}$$

where $P_1, P_2, P_3, \ldots$ denote the populations of the cities ranked 1st, 2nd, 3rd, …, respectively. According to Felix Auerbach's (1913) Theory of City Rank Scale Distribution [19], ideally $S_2$ is 2, and $S_4$ and $S_{11}$ are 1.
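As an illustration, the three primacy indices can be computed with a few lines of Python. This is a minimal sketch that assumes the commonly used definitions $S_2 = P_1/P_2$, $S_4 = P_1/(P_2+P_3+P_4)$ and $S_{11} = 2P_1/(P_2+\cdots+P_{11})$; the function name `primacy_indices` and the demonstration populations are hypothetical and are not taken from the paper's data.

```python
from typing import Dict, Sequence


def primacy_indices(populations: Sequence[float]) -> Dict[str, float]:
    """Return the two-city, four-city and eleven-city primacy indices."""
    # Rank the cities from most to least populous.
    p = sorted(populations, reverse=True)
    if len(p) < 11:
        raise ValueError("at least 11 cities are needed to compute S11")
    return {
        "S2": p[0] / p[1],
        "S4": p[0] / sum(p[1:4]),        # P2 + P3 + P4
        "S11": 2 * p[0] / sum(p[1:11]),  # P2 + ... + P11
    }


# Hypothetical populations following an ideal rank-size pattern P_i = P_1 / i.
ideal = [1200.0 / rank for rank in range(1, 12)]
indices = primacy_indices(ideal)
print(indices)
```

For the idealized input above, $S_2$ equals 2 while $S_4$ and $S_{11}$ come out slightly below 1, which is in line with the ideal values quoted in the text.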

The Rank-Scale Rule test
The American linguist George Kingsley Zipf (1935) established the city Rank-Scale Rule, known as the Zipf distribution. The general relationship is:

$$P_i = P_1 \cdot R_i^{-q} \tag{2}$$

Taking logarithms of both sides of expression (2), we get:

$$\ln P_i = \ln P_1 - q \ln R_i \tag{3}$$

where $P_1$ is the population of the most populous city; $P_2, P_3, P_4, \ldots$ refer to the populations of the 2nd, 3rd, 4th, etc., cities, respectively; $R_i$ represents the rank of city $i$; and $q$ denotes the Zipf index.
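The Zipf index can be estimated by an ordinary least squares fit in log-log space. The sketch below assumes the log-linear form $\ln P_i = \ln P_1 - q \ln R_i$ and uses NumPy's `polyfit`; the function name `fit_zipf` and the synthetic populations are hypothetical, and the paper's own estimation routine may differ.

```python
import numpy as np


def fit_zipf(populations):
    """Estimate the Zipf index q and R^2 from ln P_i = ln P_1 - q ln R_i."""
    # Sort populations in descending order so that rank 1 is the largest city.
    p = np.sort(np.asarray(populations, dtype=float))[::-1]
    ranks = np.arange(1, len(p) + 1)
    x, y = np.log(ranks), np.log(p)
    # First-degree least squares fit: y = slope * x + intercept.
    slope, intercept = np.polyfit(x, y, 1)
    fitted = slope * x + intercept
    ss_res = np.sum((y - fitted) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return -slope, 1.0 - ss_res / ss_tot


# Synthetic (hypothetical) populations generated with q = 1.1.
demo = [2400.0 * r ** -1.1 for r in range(1, 31)]
q, r2 = fit_zipf(demo)
print(f"q = {q:.3f}, R^2 = {r2:.4f}")
```

A fitted $q > 1$ would be read as concentration in high-order cities, and $R^2$ gives a rough check on how well the rank-scale rule describes the sample.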

City Gini Coefficient
Marshall first introduced the city Gini coefficient for the study of city scale (Equation (4)). Here, $n$ represents the number of cities and $G$ represents the city Gini coefficient, which can be expressed as PG or EG. PG refers to the Gini coefficient of city population scale.

Kernel Density Estimation (KDE)
Rosenblatt (1955) and Emanuel Parzen (1962) proposed Kernel Density Estimation (KDE), a nonparametric estimation method. The general expression is:

$$f(x) = \frac{1}{nh} \sum_{i=1}^{n} k\left(\frac{x_i - x}{h}\right) \tag{5}$$

where $f(x)$ is the probability density function to be estimated and $k(\cdot)$ is the kernel function, which is symmetric and satisfies $\int k(x)\,dx = 1$; $n$ is the number of observations and $h$ is the bandwidth. The choice of $h$ affects both the smoothness of the estimated function and the fit of the model, and in principle it should minimize the mean square error. In this paper, the commonly used Epanechnikov kernel function is selected, and $h$ is chosen automatically in MATLAB according to the characteristics of the sample data.
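To make the estimator concrete, the following sketch evaluates a kernel density estimate of the form $f(x) = \frac{1}{nh}\sum_i k\left(\frac{x_i - x}{h}\right)$ with the Epanechnikov kernel in plain NumPy. It does not reproduce MATLAB's automatic bandwidth selection: the bandwidth `h` is fixed by hand as an assumption, and the sample values and function names are hypothetical.

```python
import numpy as np


def epanechnikov(u):
    """Epanechnikov kernel: 0.75 * (1 - u^2) on [-1, 1], zero elsewhere."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)


def kde(grid, samples, h):
    """Evaluate the kernel density estimate on every point of `grid`."""
    grid = np.asarray(grid, dtype=float)
    samples = np.asarray(samples, dtype=float)
    # Pairwise scaled distances, shape (len(grid), len(samples)).
    u = (samples[None, :] - grid[:, None]) / h
    return epanechnikov(u).sum(axis=1) / (len(samples) * h)


# Hypothetical city populations (in units of 10,000 people).
samples = np.array([520.0, 610.0, 700.0, 750.0, 900.0, 1300.0, 2100.0])
h = 200.0  # assumed bandwidth, chosen by hand for illustration only
grid = np.linspace(samples.min() - h, samples.max() + h, 2001)
density = kde(grid, samples, h)

# Trapezoidal integration: the estimated density should integrate to about 1.
area = float(np.sum((density[1:] + density[:-1]) / 2.0 * np.diff(grid)))
print(f"area under the estimated density: {area:.4f}")
```

A smaller `h` produces a rougher curve with sharper peaks and a larger `h` a smoother one, which is why the bandwidth choice matters for the interpretation of peak height and spread.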

Markov Transfer Matrix
The city scale data are divided into $k$ classes. The probability of cities falling in each class, and the probability of cities transferring between classes, are then calculated; this process can be approximated as a Markov process.
The Markov transfer matrix is denoted by $M$, where $M_{ij}$ denotes the one-step transfer probability of a city belonging to rank $i$ in time $t$ converting to rank $j$ in the next period:

$$M_{ij} = \frac{n_{ij}}{n_i}$$

where $n_{ij}$ denotes the number of cities converting from rank $i$ in time $t$ to rank $j$ in time $t+1$, and $n_i$ is the number of all rank $i$ cities in time $t$. The process of city scale evolution is accompanied by the rise and fall of each city scale class, i.e., by changes in the mobility of different classes of cities. In this paper, two mobility measures are used. The first is the SMI (Shorrocks Mobility Index) as defined by Shorrocks [20]:

$$\mathrm{SMI} = \frac{k - \operatorname{tr}(M)}{k - 1} \tag{6}$$

The value of SMI depends on the trace of $M$, i.e., on the probabilities of cities remaining in their original class. The second is a newly constructed mobility measure, the DMI (Direction Mobility Index), defined in expression (7). The former part of expression (7) indicates the mobility index of cities moving to higher levels, i.e., the upward mobility level, and the latter part indicates the downward mobility level. Each part carries a weight for mobility across levels: the more levels a move crosses, the harder it is, so the greater the weight it is given. DMI > 0 indicates that cities are more likely to move to higher levels. DMI < 0 indicates that cities are more mobile downward, i.e., cities are underdeveloped in later stages and their scale levels are decreasing.
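The construction of the transfer matrix and the first mobility measure can be sketched as follows. The code assumes $M_{ij} = n_{ij}/n_i$ and the standard Shorrocks form $\mathrm{SMI} = (k - \operatorname{tr}(M))/(k - 1)$; the DMI is not implemented because its weights are not specified here. Function names, the 0-based class labels and the example data are hypothetical.

```python
import numpy as np


def transition_matrix(classes_t, classes_t1, k):
    """Estimate M[i, j] = n_ij / n_i from class labels at times t and t+1."""
    counts = np.zeros((k, k))
    for i, j in zip(classes_t, classes_t1):
        counts[i, j] += 1  # one city moving from class i to class j
    row_sums = counts.sum(axis=1, keepdims=True)
    # Classes with no cities at time t keep an all-zero row.
    return np.divide(counts, row_sums,
                     out=np.zeros_like(counts), where=row_sums > 0)


def shorrocks_index(m):
    """Shorrocks mobility index (k - tr(M)) / (k - 1)."""
    k = m.shape[0]
    return (k - np.trace(m)) / (k - 1)


# Hypothetical example with k = 2 classes (0 and 1) and eight cities:
# one city moves from class 0 to class 1, all others keep their class.
classes_t = [0, 0, 0, 0, 1, 1, 1, 1]
classes_t1 = [0, 0, 0, 1, 1, 1, 1, 1]
M = transition_matrix(classes_t, classes_t1, k=2)
smi = shorrocks_index(M)
print(M)
print(f"SMI = {smi:.2f}")
```

An SMI close to 0 means that most cities stay in their original class, while larger values indicate more frequent transitions between classes.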

Sample Selection and Data Sources
Most studies on city scale use one of two types of data. The first is the "population of municipal districts" in the City Statistical Yearbook, which covers the entire population of city areas together with a large number of rural areas and therefore tends to overestimate city scale. The second is based on the statistical caliber of the household-registered population and uses the "city non-agricultural population", which lacks systematic and accurate statistics on the transient and mobile population and therefore tends to underestimate city scale. Accordingly, choosing either "municipal population" or "non-agricultural population" can lead to errors in measuring city scale. This paper uses the city resident population from the national census data for 2005 to 2018. To portray the mobility characteristics of cities more accurately, a population scale of 5 million and the national GDP per capita (billion yuan) of the given year are jointly used as the lower threshold. Defining thresholds matters for the study in two ways. First, using both the population of 5 million and the national GDP per capita as the lower threshold filters out cities that meet both the population scale and a certain level of economic development, so that they are easily distinguished from other cities and their economic links are more evident. Second, cities gradually prosper or decay in the process of development, and the threshold lets cities whose resident population exceeds it enter the sample, while cities that later fall below it drop out of the sample.
large city (population 1 to 5 million, of which type I large cities have 3 to 5 million and type II large cities have 1 to 3 million); 4) medium city (population 500,000 to 1 million); 5) small city (population less than 500,000, including type I small cities with 200,000 to 500,000 and type II small cities with fewer than 200,000). On this basis, the distribution of city population class scale is obtained in this paper (see Table 1 and Figure 1).
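The threshold screening described in this section can be sketched in a few lines. This is only an illustration under assumptions: a city-year enters the sample when its resident population is at least 5 million and its GDP per capita is at least the national figure for that year. The `CityYear` record, the function `screen_sample`, the units and all values are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CityYear:
    name: str
    population_10k: float   # resident population, in units of 10,000 people
    gdp_per_capita: float   # same unit as the national GDP per capita


def screen_sample(records: List[CityYear],
                  national_gdp_per_capita: float,
                  min_population_10k: float = 500.0) -> List[str]:
    """Return the names of cities that pass both lower thresholds."""
    return [
        r.name
        for r in records
        if r.population_10k >= min_population_10k
        and r.gdp_per_capita >= national_gdp_per_capita
    ]


# Hypothetical records for a single year.
records = [
    CityYear("City A", 800.0, 60000.0),  # passes both thresholds
    CityYear("City B", 450.0, 90000.0),  # population below 5 million
    CityYear("City C", 600.0, 30000.0),  # GDP per capita below national level
]
sample = screen_sample(records, national_gdp_per_capita=50000.0)
print(sample)
```

Running the screening year by year lets a city enter the sample once it passes both thresholds and drop out again if it later falls below either one.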
As seen in Table 1, Shijiazhuang withdrew from the sample because its per capita gross product was lower than the national per capita gross product in the same year. This was due to its unhealthy industrial structure, which relies mainly on light industry, with most workers employed in purely manual occupations that have no significant advantages over other industries. Moreover, the foundation of its Internet industry is quite weak. Most critically, Shijiazhuang has only one main line, the Beijing-Guangzhou Railway, in its territory, and as a provincial capital it has no advantage in geographical location, which further explains the lack of development momentum that took Shijiazhuang out of the sample.
Following the visualization of the data in Table 1, Figure 1 Beijing and Shanghai, which shows that the population and economic scale of Chengdu are rapidly developing. Combining Table 1 and Table 2, it can be concluded that Zhengzhou has developed steadily, ascending from a medium-scale megalopolis to a megacity. Zhengzhou has been a transportation hub between south and north since ancient times and has unparalleled geographical advantages. The implementation of a diversified economic layout in recent years has led to the development of heavy industry, primary manufacturing, and labor-intensive enterprises in Zhengzhou, attracting a large number of mid- to high-end talents and giving it the fastest-growing GDP and the fastest-improving economic ranking of any major northern city. Throughout the sample period, Guangzhou and Shenzhen showed the strongest momentum and maintained a continuously high growth rate, which is inextricably linked to their industrial structure that emphasizes science and technology innovation.

Evolution of the City Primacy Index
Equation (1) was further applied to calculate the trends of $S_2$, $S_4$ and $S_{11}$ from the 2005 to 2018 population data (see Figure 4). As can be noted from

City Rank-Scale Rule test
Bringing the population data into Equation ( analysis is small from $R^2 > 0.9$, indicating that the regression model based on the rank-scale rule fits the actual city population data effectively. Meanwhile, $q > 1$ indicates that the city scale distribution is in the concentration stage. It also reflects that high-order cities such as Shanghai, Beijing, Guangzhou and Shenzhen have strong economic and technological power and significant advantages, which is why the concentration of population scale is obvious, while low-order cities such as Zhangzhou, Xiangyang and Yueyang have a smaller scale and a lower development level. As a whole, $q$ shows a rising trend, suggesting that the population scale of high-order cities expands faster than that of low-order cities, giving rise to a trend of large-scale dispersion and local concentration in the city scale distribution. The strong attractiveness of some megacities has produced a monopoly effect within certain ranges: it exerts a driving effect but also a restrictive one, which results in a large difference in the speed of development between the high-order cities and the rest and causes disproportionate city development.

Analysis of City Scale Distribution Based on Gini Coefficient
Application of Equation (4)

Kernel Density Distribution of City Scale
We estimated the kernel density of the city scale data for 2005, 2010, 2015 and 2018 in MATLAB and obtained the city scale kernel density distribution (Figure 7), with city population scale on the horizontal axis and kernel density on the vertical axis. The results show the following. First, the kernel density curve of city scale gradually shifts rightward over time, which reflects the high urbanization rate and the rapid growth of city population scale in China.
Second, the higher the peak of the kernel density curve and the steeper the curve on the right side of the peak, the more concentrated the distribution of the data [21]. Figure 7 shows that the peak of the kernel density curve of city scale gradually decreases, reflecting that the population scale of some megalopolises is increasing year by year and that their gap with the megacities is narrowing.
Third, the kernel density curve in the figure is mainly decentralized and single-peaked, and the peaks show a tendency to spread. This demonstrates that the population tends to be more and more regionally concentrated and that the scale distribution among cities is uneven; however, the change in the number of megalopolises also indicates that this unevenness is gradually being adjusted.

Dynamic Evolutionary Analysis of City Scale
In (Table 3). The main city transformations in this paper are: I → I, I → II, II → I, II → II. In Table 3 The mobility measure index of the Markov transfer matrix (Figure 8) is calculated by Equation (6) and Equation (7), and then combined with the distribution and analysis of the city class scale in Table 1.
In the beginning, the overall level of mobility tends to increase, with an in- Zhangzhou is located on the west coast of the Strait, and its good ecological environment allows it to maintain a high economic growth rate, so that its population grew faster and it became a megalopolis in 2018.
2) The Rank-Scale Rule fits the data of Chinese cities well and conforms to their distribution characteristics, that is, the population distribution is concentrated in high rank-order cities and the number of low rank-order cities is small.

Research Findings
Shanghai and Chongqing have played a central role in regional development and have an obvious radiation effect on the surrounding cities. At this stage, the distribution of city population scale in China shows a trend in which the most populous city keeps growing, which means that the unreasonable distribution of city population scale is becoming more serious.

Suggestions for Measures
Based on the constructed explanatory framework of the dynamic mechanism of city scale evolution (Figure 9), this paper proposes specific measures in four aspects: government measures, economic development, spatial pattern, and social livelihood, respectively: the central cities and grow into new regional economic growth points, so that they can give full play to their role as bridges, allowing small cities to establish close economic ties with the central cities, promoting the flow of industrial capital between cities, and realizing regional industrial upgrading and industrial transfer, thus reducing city unemployment and raising city wage levels.
3) In terms of social livelihood, the service functions of the city should be optimized: more employment opportunities, quality medical and educational resources, considerable income, and well-developed service facilities are important factors in attracting middle- and high-end talent. In addition, improving social network facilities and forming a safe cyberspace help cities develop.
4) The spatial pattern is the geographical basis of city development. The natural conditions and topographic features of China lead to an uneven distribution of population. Improving city transportation conditions, especially the external transportation of cities whose geographical location obstructs communication with the outside world, improves transportation accessibility, reduces the spatial distance between cities, allows wider access to resources between cities, and thus promotes the development of cities with relatively weak economies.
Generally speaking, the samples in this paper were selected by setting thresholds based on the existing criteria for classifying city scale. However, different thresholds and statistical calibers can cause differences in research results on city scale. Future work needs to pay close attention to the problem of city sample selection and use multiple samples for comparative analysis as far as possible. In addition, attention should be paid to changes in the administrative divisions of individual cities. Lastly, data sources should be expanded and more big data analysis tools should be used to conduct a more sophisticated study of the evolution of city scale.