A Preliminary Study on Spatial Spread Risk of Epidemics by Analyzing the Urban Subway Mobility Data

The prevention and treatment of epidemics has always been an urgent problem for humankind. Because of their special spatial structure, huge passenger flow, and high passenger mobility, subway lines have become areas with a high risk of epidemic transmission. However, no recent study has addressed epidemic transmission in a subway network at the urban scale. In this article, we take a big-data perspective and study the epidemic transmission risk in the Beijing subway network using urban subway mobility data. By reintegrating and mining these data, we make a preliminary assessment of the transmission risk in the subway lines in terms of passenger behavior, station features, route features, and an individual case, on the basis of the subway network structure. The study has practical significance for tracking and preventing epidemics at an early stage.


Introduction
Throughout history, epidemics have been a serious threat to human health, and their prevention and treatment has always been an urgent problem. With MERS currently spreading, preventing and suppressing similar epidemics is a significant responsibility of governments and medical departments. From the viewpoint of systems science, an epidemic outbreak can be understood as a complex diffusion process in a population. Modeling and assessing this process helps us understand the mechanism by which epidemics spread and provides a basis for epidemic analysis, simulation, and intervention [1].
Traditional epidemic assessment and prediction methods are based on differential equations; the most mature and commonly used models are SIR, SIRS, and SIS. In recent years, research on the spread of epidemics has been deepened and refined [2]-[11]. Models have moved from simple compartmental models to complex network models that incorporate relationship networks, which has greatly improved the reliability and plausibility of the results. At the same time, the comprehensive improvement of traffic networks and the popularization of transportation (air, rail, road, etc.) have made contact between people easier and more complex. As a result, epidemic transmission in transportation networks has gradually attracted researchers' attention [12].
However, research on transportation networks has mostly applied modeling methods to aviation networks and has focused on the national and global scales. No recent study has addressed epidemic transmission in a subway network at the urban scale. It is therefore meaningful and valuable to reconstruct subway flow information and to build risk assessments for different routes and stations using a big-data approach.

Data
When taking the subway, passengers use either the Beijing Municipal Administration & Communication Card or a one-way ticket, so the card machines in the stations record a large amount of card information every day. In this article, we use data from the official card system. The data set contains one day of smart card data, from one day in February 2014. The original data contain 3,249,333 records. To facilitate the subsequent analysis, we eliminate invalid information from the original data and save the result in our own format. The final data set contains 1,630,213 records, whose content is shown in Table 1.
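The cleaning step can be pictured with a minimal sketch. The field names (`card_id`, `entry_station`, `entry_time`, `exit_station`, `exit_time`) and the validity rules below are assumptions made for illustration only; the record format actually used is the one in Table 1, and the filtering criteria applied in the study are not stated.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Trip:
    """One cleaned journey record (hypothetical layout, not Table 1)."""
    card_id: str
    entry_station: int
    entry_time: int   # minutes after midnight
    exit_station: int
    exit_time: int    # minutes after midnight


def clean(raw_rows: Iterable[dict]) -> List[Trip]:
    """Keep only rows that form a complete, plausible journey."""
    trips = []
    for row in raw_rows:
        try:
            trip = Trip(
                card_id=str(row["card_id"]),
                entry_station=int(row["entry_station"]),
                entry_time=int(row["entry_time"]),
                exit_station=int(row["exit_station"]),
                exit_time=int(row["exit_time"]),
            )
        except (KeyError, TypeError, ValueError):
            continue  # missing or malformed field
        # Assumed validity rules: exit must follow entry, and the
        # journey must connect two different stations.
        if trip.exit_time <= trip.entry_time:
            continue
        if trip.entry_station == trip.exit_station:
            continue
        trips.append(trip)
    return trips


# Made-up sample rows, for illustration only.
sample_rows = [
    {"card_id": "c1", "entry_station": 3, "entry_time": 480,
     "exit_station": 9, "exit_time": 510},
    {"card_id": "c2", "entry_station": 3, "entry_time": 480},
    {"card_id": "c3", "entry_station": 5, "entry_time": 600,
     "exit_station": 7, "exit_time": 590},
    {"card_id": "c4", "entry_station": 8, "entry_time": 700,
     "exit_station": 8, "exit_time": 705},
]
cleaned = clean(sample_rows)
```

Only the first sample row survives: the others lack an exit, have an exit before the entry, or start and end at the same station.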

Method
To assess the risk of epidemic spread at both the macroscopic and microscopic levels, it is necessary to make full use of the existing data resources. This study uses C++ programs to mine and count the general passenger flow information and the hidden information behind it.

Passenger Behavior
The number of stops and the travel time (cost time) of the passengers follow a normal distribution, as shown in Figure 1, with a probability density function of the form $f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$, where $\mu$ is the mean and $\sigma$ the standard deviation. Most passengers prefer journeys of medium length (about 10 stops or 30 minutes); 90% of passengers travel fewer than 20 stops and spend less than 60 minutes. Most passengers (86%) take the subway only once a day. A small share (13%) take the subway twice a day, and most of these (69%) make a round trip. Passengers who take three or more trips account for only 1% and can be neglected.
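As a rough illustration of how such a distribution could be summarized, the sketch below fits a normal distribution to a list of travel times and compares the empirical share of journeys under 60 minutes with the share implied by the fit. The sample values are made up, and the helper names `fit_normal` and `share_below` are hypothetical; this is not the procedure used to produce Figure 1.

```python
from statistics import NormalDist, mean, pstdev


def fit_normal(samples):
    """Fit a normal distribution by matching the sample mean and spread."""
    return NormalDist(mu=mean(samples), sigma=pstdev(samples))


def share_below(samples, threshold):
    """Empirical fraction of samples strictly below the threshold."""
    return sum(1 for s in samples if s < threshold) / len(samples)


# Made-up travel times in minutes, for illustration only.
cost_minutes = [12, 18, 25, 28, 30, 31, 33, 38, 45, 62]

dist = fit_normal(cost_minutes)
empirical_share = share_below(cost_minutes, 60)  # counted from the data
modelled_share = dist.cdf(60)                    # implied by the fit
density_at_30 = dist.pdf(30)                     # value of f(x) at x = 30
```

Comparing `empirical_share` with `modelled_share` gives a quick check of how well the normal assumption describes the tail of long journeys.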
The results show that most passengers choose short or medium-distance journeys and that most of them take the subway only once a day. Therefore, there is a large turnover of people in the subway lines every day. If there is an infectious case, cross-transmission may easily emerge in this region.
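The trip-frequency statistics above can be obtained by grouping records by card. The following is a minimal sketch; the tuple layout `(card_id, origin, destination)` is an assumed simplification of the record format, and treating a round trip as "the second journey reverses the first" is our assumption about the definition.

```python
from collections import Counter, defaultdict


def trips_per_card(trips):
    """Map number of journeys in the day -> number of cards."""
    per_card = Counter(card for card, _, _ in trips)
    return Counter(per_card.values())


def round_trip_share(trips):
    """Among cards with exactly two journeys, the fraction whose
    second journey reverses the first (assumed round-trip definition)."""
    by_card = defaultdict(list)
    for card, origin, dest in trips:
        by_card[card].append((origin, dest))
    twice = [legs for legs in by_card.values() if len(legs) == 2]
    if not twice:
        return 0.0
    reversed_pairs = sum(
        1 for (o1, d1), (o2, d2) in twice if o1 == d2 and d1 == o2
    )
    return reversed_pairs / len(twice)


# Made-up journeys, for illustration only.
sample_trips = [
    ("A", 1, 2), ("A", 2, 1),   # round trip
    ("B", 1, 3), ("B", 3, 4),   # two journeys, not a round trip
    ("C", 5, 6),
    ("D", 7, 8),
]
frequency = trips_per_card(sample_trips)
share = round_trip_share(sample_trips)
```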
Figure 2 shows the distribution of the number of arriving passengers who exit at each station, where each station number on the abscissa represents one station. A total of 10 stations receive more than 20,000 passengers, 37 stations receive between 10,000 and 20,000, 88 stations receive between 5,000 and 10,000, and 100 stations receive fewer than 5,000. The stations with more than 10,000 arriving passengers together account for close to half of the total. This shows that the stations where passengers choose to get off are relatively concentrated. After analyzing the locations of these stations, we find that they are in areas with train stations, transportation hubs, commercial centers, medical institutions, places of interest, and residential areas.
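A grouping of this kind can be sketched as follows. The bin edges follow the thresholds quoted above; how a count that falls exactly on an edge is assigned is not stated in the text, so the boundary handling here is an assumption, as are the function names and the sample counts.

```python
from bisect import bisect_right
from collections import Counter

BIN_EDGES = [5_000, 10_000, 20_000]
BIN_LABELS = ["<5000", "5000-9999", "10000-19999", ">=20000"]


def bin_stations(arrivals_by_station):
    """Count how many stations fall into each arrival-volume bin.
    A value equal to an edge goes into the higher bin (assumption)."""
    bins = Counter()
    for count in arrivals_by_station.values():
        bins[BIN_LABELS[bisect_right(BIN_EDGES, count)]] += 1
    return bins


def share_at_busy_stations(arrivals_by_station, threshold=10_000):
    """Fraction of all arrivals that occur at stations at or above
    the threshold."""
    total = sum(arrivals_by_station.values())
    busy = sum(c for c in arrivals_by_station.values() if c >= threshold)
    return busy / total if total else 0.0


# Made-up arrival counts, for illustration only.
arrivals = {"S1": 25_000, "S2": 12_000, "S3": 7_000, "S4": 300}
station_bins = bin_stations(arrivals)
busy_share = share_at_busy_stations(arrivals)
```

In practice `arrivals` would be built by counting the exit station of every cleaned record, for example with `Counter(exit_station for each record)`.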
Figure 2 also shows the distribution of the total passenger flow through each station (passengers passing through, entering, and exiting). A total of 22 stations have a flow of more than 200,000, 52 stations between 100,000 and 200,000, 61 stations between 50,000 and 100,000, and 100 stations less than 50,000. Stations with a large passenger flow are relatively concentrated in the central region. In total, 74 stations have a flow of more than 100,000. The 10 stations with the highest flow are all large transfer stations with a daily flow of more than 250,000, which matches our expectations. In the following analysis, we assume that, on average, the greater the passenger flow at a station, the greater the risk of infectious disease transmission in the station area.
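Counting the total flow requires a route for every journey, since passengers passing through a station leave no card record there. The sketch below assumes each journey has already been assigned a station sequence (for instance its shortest route); the function name `station_flow` and the sample routes are hypothetical.

```python
from collections import Counter


def station_flow(routed_trips):
    """Total flow per station: every station on a journey's route is
    counted once for that journey, whether the passenger enters,
    passes through, or exits there."""
    flow = Counter()
    for route in routed_trips:
        flow.update(set(route))
    return flow


# Made-up routes (station sequences), for illustration only.
routes = [
    ["A", "B", "C"],
    ["B", "C", "D"],
    ["C", "B"],
]
flow = station_flow(routes)
busiest = flow.most_common(2)   # the two stations with the largest flow
```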

Station Risk Assessment
For the passenger sources of a given destination station (taking Beijing Railway Station as an example), the distribution is shown in Figure 3. Only 6 stations send more than 1,000 passengers to Beijing Railway Station, 21 stations send between 500 and 1,000, and 208 stations send fewer than 500. The passenger sources are therefore relatively concentrated, with most passengers coming from 27 stations. After analyzing the locations of these stations, we find that they are all in areas with many residential communities, such as TIANTONGYUAN, HUILONGGUAN, and PINGGUOYUAN. This can be understood as passengers traveling to Beijing Railway Station from their homes.
For passengers departing from the same station, the distribution of destination stations is shown in Figure 4 and is relatively dispersed. Only 1 station receives more than 1,000 passengers, 6 stations receive between 500 and 1,000, and 228 stations receive fewer than 500. Thus only 7 stations receive relatively many passengers, and these stations are again in areas with train stations, transportation hubs, commercial centers, medical institutions, places of interest, and residential areas.
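Both views, sources for a destination and destinations for a source, can be read off a single origin-destination count. The sketch below is one way to organize it; the helper names and the passenger counts are made up for illustration and do not reproduce the values in Figures 3 and 4.

```python
from collections import Counter, defaultdict


def od_matrix(trips):
    """Nested counts: od[origin][destination] = number of journeys."""
    od = defaultdict(Counter)
    for origin, dest in trips:
        od[origin][dest] += 1
    return od


def sources_of(od, dest):
    """Passenger sources for one destination station."""
    return Counter({o: row[dest] for o, row in od.items() if row[dest] > 0})


def destinations_of(od, origin):
    """Passenger destinations for one origin station."""
    return Counter(od.get(origin, Counter()))


# Made-up journeys as (origin, destination) pairs, for illustration only.
sample = (
    [("TIANTONGYUAN", "BEIJING_RAILWAY_STATION")] * 3
    + [("HUILONGGUAN", "BEIJING_RAILWAY_STATION")] * 2
    + [("BEIJING_RAILWAY_STATION", "TIANTONGYUAN")]
)
od = od_matrix(sample)
sources = sources_of(od, "BEIJING_RAILWAY_STATION")
dests = destinations_of(od, "BEIJING_RAILWAY_STATION")
```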

Route Risk Assessment
Because different routes exist, passengers may reach the same destination in different ways. Our analysis shows that the passenger flow varies greatly between the different routes connecting two stations. In general, station pairs fall into two categories: those with a clear shortest route and those with competing routes. The statistical results for the two categories differ significantly, as shown in Figure 5.
For the first category, we take HAIDIANHUANGZHUANG to WUDAOKOU as an example. The red, grey, and yellow lines represent the shortest, second-shortest, and third-shortest routes. Nearly all passengers (95.6%) choose a single route, the shortest one, which matches common sense. For the second category, we take CAISHIKOU to CHAOYANGMEN as an example, with the same color coding. Unlike the one-sided distribution in the first example, there is no marked difference between the passenger numbers on the three routes. This is reasonable because the three routes are of similar length.
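The shortest, second-shortest, and third-shortest routes between two stations can be enumerated as sketched below. The travel route algorithm used in the study is not described, so this is only one possible approach: it lists every simple path by depth-first search and sorts by travel time, which is practical only for small networks. The toy network and its travel times are made up.

```python
def simple_paths(graph, src, dst, path=None):
    """Yield every route from src to dst that visits no station twice."""
    path = [src] if path is None else path
    if src == dst:
        yield list(path)
        return
    for nxt in graph.get(src, {}):
        if nxt not in path:
            path.append(nxt)
            yield from simple_paths(graph, nxt, dst, path)
            path.pop()


def route_time(graph, route):
    """Total travel time along a route, summing the edge weights."""
    return sum(graph[a][b] for a, b in zip(route, route[1:]))


def k_shortest_routes(graph, src, dst, k=3):
    """The k routes with the smallest total travel time."""
    routes = sorted(simple_paths(graph, src, dst),
                    key=lambda r: route_time(graph, r))
    return routes[:k]


# Toy network: graph[a][b] is the travel time in minutes (made up).
toy = {
    "A": {"B": 2, "C": 5},
    "B": {"A": 2, "C": 2, "D": 6},
    "C": {"A": 5, "B": 2, "D": 2},
    "D": {"B": 6, "C": 2},
}
top3 = k_shortest_routes(toy, "A", "D", k=3)
times = [route_time(toy, r) for r in top3]
```

On a full subway network, a dedicated k-shortest-paths algorithm would be preferable to exhaustive enumeration.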
Therefore, when the route actually chosen by a passenger cannot be determined, the transmission risk can be judged from the proportion of passengers choosing each route: a route chosen by a high proportion of passengers carries a high level of transmission risk and belongs to the area with high epidemic transmission risk.
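One way to turn route proportions into a station-level quantity is to spread each origin-destination flow over its candidate routes according to their shares. The sketch below is our illustration of that idea, not a formula taken from the study; the station names, the flow of 1,000 passengers, and the alternative route are made up, and only the 95.6% share echoes the example above.

```python
from collections import defaultdict


def expected_station_load(od_flow, route_shares):
    """Expected passengers per station when each OD flow is split
    across its candidate routes in proportion to the route shares.

    od_flow:      {(origin, dest): passengers}
    route_shares: {(origin, dest): [(route, share), ...]}
    """
    load = defaultdict(float)
    for od, passengers in od_flow.items():
        for route, share in route_shares[od]:
            for station in route:
                load[station] += passengers * share
    return dict(load)


# Hypothetical station pair with a dominant shortest route.
od_flow = {("A", "D"): 1000}
route_shares = {
    ("A", "D"): [
        (["A", "B", "D"], 0.956),   # shortest route
        (["A", "C", "D"], 0.044),   # assumed alternative
    ],
}
load = expected_station_load(od_flow, route_shares)
```

Stations on the dominant route (`B` here) receive almost the entire flow, while stations on the rarely chosen route (`C`) receive very little.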

Individual Case Tracking
According to the route matching principle, by comparing the potential routes with the travel time, we can determine the route actually taken by a given infectious individual and the new contacts at each station, as shown below: Beijing South Railway Station (218 The total number of people with potential contact with the tracked infectious case is 580. In this way, we can obtain the urban subway mobility data of the contacts, which helps to identify suspected close contacts and is important for taking corresponding measures. Part of the contacts' card information is shown in Table 2.
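The two steps, route matching and contact identification, can be sketched under strong simplifying assumptions. Here a route is matched by picking the candidate whose expected travel time is closest to the observed one, every hop is assumed to take the same number of minutes, and two passengers count as contacts if they are at the same station within a short time window. None of these rules, names, or parameter values come from the study; they only make the idea concrete.

```python
def match_route(candidates, observed_minutes):
    """Pick the candidate route whose expected travel time is closest
    to the observed one. candidates: list of (route, expected_minutes)."""
    return min(candidates, key=lambda c: abs(c[1] - observed_minutes))[0]


def station_times(route, entry_time, minutes_per_hop):
    """Estimated time at each station, assuming equal time per hop."""
    return {s: entry_time + i * minutes_per_hop for i, s in enumerate(route)}


def trace_contacts(case, others, minutes_per_hop=3, window=2):
    """For each station on the case's route, list the cards first met
    there, i.e. present at that station within `window` minutes."""
    case_times = station_times(case["route"], case["entry_time"],
                               minutes_per_hop)
    seen = set()
    new_by_station = {}
    for station, t_case in case_times.items():   # follows route order
        new_here = []
        for card_id, trip in others.items():
            if card_id in seen:
                continue
            t_other = station_times(trip["route"], trip["entry_time"],
                                    minutes_per_hop).get(station)
            if t_other is not None and abs(t_other - t_case) <= window:
                new_here.append(card_id)
                seen.add(card_id)
        new_by_station[station] = new_here
    return new_by_station


# Made-up example: times are minutes after midnight.
candidates = [(["S1", "S2", "S3"], 6), (["S1", "S9", "S3"], 14)]
case = {"route": match_route(candidates, observed_minutes=7),
        "entry_time": 480}
others = {
    "c1": {"route": ["S1", "S2"], "entry_time": 481},
    "c2": {"route": ["S0", "S2", "S3"], "entry_time": 481},
    "c3": {"route": ["S3", "S4"], "entry_time": 520},
}
contacts = trace_contacts(case, others)
total_contacts = sum(len(cards) for cards in contacts.values())
```

Summing the per-station lists gives the total number of potential contacts for the tracked case, analogous to the total reported above.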

Conclusion
In this study, we take a big-data perspective and make a preliminary analysis of the epidemic transmission risk in the Beijing subway lines in terms of passenger behavior, stations, routes, and an individual case. We summarize the passengers' travel behavior and the regularities it follows. We also analyze the source and destination features of the stations by counting the urban subway mobility data, which yields a macroscopic assessment of the epidemic transmission risk of the stations. Subsequently, using the travel route algorithm and the route matching principle, we obtain the passengers' actual travel routes and a macroscopic assessment of the epidemic transmission risk of different routes. Finally, we track a hypothetical case and determine its range of influence, which provides a microscopic risk assessment. This article is a preliminary exploration of these questions and has yielded some valuable conclusions. Future work can deepen and refine the analysis using this method.

Figure 1. The distribution of the cost time and number of stops of the passengers.

Figure 2. The number of arriving and exiting passengers and the total passenger flow.

Figure 3. The hot spot diagram of the passenger sources.

Figure 4. The hot spot diagram of the passenger destinations.

Figure 5. The distribution of passengers across routes.

Table 1. The final single data record format.

Table 2. Part of the contact individuals' card information.