Contributing Factors for Train Delays during Morning Rush Hour in Japanese Metropolitan Areas

The present study aims to reveal the contributing factors for train delays in the Tokyo metropolitan area by conducting statistical analyses that focus on passenger trains and use data concerning train cars, stations, passengers, tracks and working timetables as explanatory variables. Two types of statistical analyses were conducted: standard multiple regression analysis, with “average delay time” (the quantitative condition of delays) as the objective variable, and logistic regression analysis, with “occurrence of delays” (the qualitative condition) as the objective variable. The results of the logistic regression analysis quantitatively indicated that direct operations may increase the delay occurrence rate. Direct operations are therefore regarded as a contributing factor for train delays in metropolitan areas in recent years. Additionally, it was confirmed that the concentration of demand on terminal stations is also a contributing factor for train delays. On the other hand, direct operations clearly contribute to improving the convenience of passengers as well as the operational efficiency of train cars. Therefore, it would be ideal to resolve delays by easing the concentration of demand, which may be accomplished by recommending off-peak commuting and adjusting the working timetables.


How to cite this paper: Ohshima, K. and Yamamoto, K. (2020) Contributing Factors for Train Delays during Morning Rush Hour in Japanese Metropolitan Areas.

Introduction
In the metropolitan areas of many countries, commutable zones have spread into the suburbs with urbanization, and train lines are congested and delayed during rush hour. In Japan in particular, many train lines in the metropolitan areas have an intense commuter rush every morning, and a large number of train lines are delayed during rush hour. However, the frequency and duration of such delays vary depending on the characteristics of each train line. According to the Ministry of Land, Infrastructure, Transport and Tourism [1], in the Tokyo metropolitan area, the number of days on which delay certificates were issued during 20 weekdays in 2016 ranged from a minimum of 1.4 days to a maximum of 19.1 days. Additionally, mutual direct operations between train lines have increased in recent years, and further expansion of such operations can be expected in the future. Due to this increase, it has also become common for an incident in one location to affect the entire metropolitan area.
Furthermore, according to Tokyo Metro Co., Ltd. [2], the number of passengers at specific stations in central Tokyo has increased due to mutual direct operations, and congestion within station premises has become even more significant. Therefore, in order to improve the convenience of the train line network in metropolitan areas, it is essential to analyze the contributing factors for delays from multiple perspectives, with the characteristics of each train line in mind. Quantitative analyses are extremely important in searching for these contributing factors. Therefore, focusing on passenger trains, the present study aims to reveal the contributing factors for train delays in Japanese metropolitan areas by conducting statistical analyses. These contributing factors will be clarified using data concerning train cars, stations, passengers, tracks and working timetables as explanatory variables. Additionally, by preparing data for both single train lines and entire direct operation sections, the contributing factors can be identified based on the current conditions of metropolitan train networks.

Related Work
The present study is categorized as a study of train delays in metropolitan areas. In this category, the preceding studies can be divided into two groups: studies on the modeling of passengers' behaviors, and studies on the characteristics of train delays focusing on specific lines and train line networks. In Japan, because the train line network is tremendously complicated and congestion has become a serious problem, there are many preceding studies in both groups. The following are representative examples of studies closely related to the present study.
Regarding the studies related to the modeling of passengers' behaviors, in Japan, Uematsu et al. (2009) [3] analyzed the causes of delays and developed a simulation system that analyzes the delay occurrence and influence mechanism using an agent model. More specifically, they modeled the movement of passengers and developed a system that simulates delays due to concentration of de-
Compared with the preceding studies in the related fields mentioned above, the present study, focusing on passenger trains, demonstrates originality by conducting statistical analyses of various kinds of data concerning train cars, stations, passengers, tracks and working timetables for many train lines in metropolitan areas, in addition to conducting quantitative analyses of potential contributing factors for train delays. Furthermore, the present study demonstrates usefulness in that statistical analyses of the above data for many train lines make it possible to clearly grasp the degree of effect of each contributing factor for train delays. Accordingly, based on the analysis results, the present study can provide detailed information for countermeasures against train delays in the Japanese metropolitan area, which has a complicated train line network and serious congestion.

Framework and Process
In Section 4, the data on train lines, which are used as explanatory variables, and the data on delays, which are used as explained variables, are gathered and processed for use in statistical analyses. Next, in Section 5, statistical analyses are conducted based on the data gathered and processed in Section 4, and the potential causes of delays are discussed.

Method
In order to quantitatively grasp the contributing factors for train delays, the present study conducts 2 types of statistical analyses: the standard multiple regression analysis and the logistic regression analysis. The objective variable of the former is "average delay time", which indicates the quantitative situation of delays, while that of the latter is "number of days with the occurrence of delays", which indicates the qualitative situation of delays. There are 3 methods for combining explanatory variables: the all-possible regression method, the variable specification method, and the sequential selection method. Because many explanatory variables are needed to reveal the contributing factors for train delays in Japanese metropolitan areas, which makes it difficult to select the best variables by hand, the present study uses the sequential selection method. This method in turn has 3 types: the forward selection method, the backward elimination method, and the stepwise method. Among these, the stepwise method is used in the present study, as it has the highest possibility of obtaining an efficient combination of variables and has been the most used method in preceding studies in the related fields. Furthermore, the stepwise method adopting the Akaike Information Criterion (AIC) [31] has the advantage of a clear process for selecting the appropriate variables based on a constant standard.
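The stepwise procedure described above can be sketched as follows. This is a minimal illustration in plain Python, not the R workflow used in the study: the variable names and data values are hypothetical, the AIC is computed in the common least-squares form n·ln(RSS/n) + 2k (up to an additive constant), and each step adds or removes whichever single variable lowers the AIC the most.

```python
import math

def ols_rss(X, y):
    """Residual sum of squares of an ordinary least-squares fit of y on X."""
    n, p = len(X), len(X[0])
    # Augmented normal equations [X'X | X'y].
    A = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(p)]
         + [sum(X[i][a] * y[i] for i in range(n))] for a in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [v - f * w for v, w in zip(A[r], A[col])]
    beta = [A[r][p] / A[r][r] for r in range(p)]
    return sum((y[i] - sum(b * x for b, x in zip(beta, X[i]))) ** 2
               for i in range(n))

def aic(data, y, cols):
    """AIC of a linear model with an intercept and the given columns."""
    n = len(y)
    X = [[1.0] + [data[c][i] for c in cols] for i in range(n)]
    rss = max(ols_rss(X, y), 1e-12)  # guard against log(0) on a perfect fit
    return n * math.log(rss / n) + 2 * (len(cols) + 1)

def stepwise(data, y):
    """Add or drop one variable per step while the AIC keeps decreasing."""
    selected = []
    best = aic(data, y, selected)
    while True:
        trials = []
        for c in data:
            if c in selected:
                trial = [s for s in selected if s != c]   # try removing c
            else:
                trial = selected + [c]                    # try adding c
            trials.append((aic(data, y, trial), trial))
        score, trial = min(trials, key=lambda t: t[0])
        if score >= best - 1e-9:
            return selected, best
        selected, best = trial, score

# Hypothetical toy data: delay depends mainly on "capacity_per_train".
data = {
    "capacity_per_train": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "train_types": [3, 1, 4, 1, 5, 9, 2, 6, 5, 3],
}
y = [2.1, 3.8, 6.05, 8.15, 9.9, 12.2, 13.85, 16.0, 18.1, 19.95]

selected, best = stepwise(data, y)
print(selected, round(best, 2))
```

Starting from the intercept-only model and allowing both additions and removals at every step is what distinguishes the stepwise method from pure forward selection or backward elimination.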

Target Train Lines
In the present study, the Tokyo metropolitan area, which is the largest metropolitan area in Japan and has a tremendously complicated train line network and serious congestion, is selected as the target. The Tokyo metropolitan area consists of eight prefectures: Tokyo Metropolis and Kanagawa, Chiba, Saitama, Yamanashi, Tochigi, Gunma and Ibaraki Prefectures. Because the range of train lines in the Tokyo metropolitan area is very large, it is necessary to grasp the outline of the target train lines selected in the present study. Therefore, Figure 1 shows a schematic diagram of the target train lines.
As shown in Figure 1, the present study targets 55 train lines of 17 railway companies in the Tokyo metropolitan area. However, as the train line network in the Tokyo metropolitan area is tremendously complicated, it is difficult to display all train lines in a single figure. Therefore, Figure 1 shows the schematic diagram of the target train lines excluding subway lines. As shown in Figure 1, the Yamanote Line (the Tokyo Loop Line) surrounds the central part of Tokyo Metropolis, and most train lines extend radially from sub-centers such as Shinjuku to the suburban areas.
Journal of Transportation Technologies

Collection of Delay Data
For data concerning delays, the delay certificates that are available on the website of each railway company were used. The delay time displayed on the delay certificates was recorded, and if there was no delay certificate, the delay time was recorded as 0 minutes. Kariyazaki et al. (2010) [16] indicated, based on delay certificates, that the number of delays was significantly higher on weekdays than on weekends, and was especially high in the morning. Therefore, weekday mornings are set as the target in the present study. The specific time period targeted in the present study was set from the first train to 10 am. Additionally, the target period was set to the 21 weekdays in June 2018, because in Japan there are no holidays in June, which minimizes the difference between the days of the week, and because the effect of the weather can be eliminated due to the good weather around this time.
Regarding "average delay time", which is an explained variable, the largest value is 8.6 minutes, the smallest is 0 minutes, and the average value is 8.5 minutes in June 2018. For "number of days with the occurrence of delays", the largest value was 21 days, the smallest value was 0 days, and the average value was 12.1 days. During the target period, 1 out of the 55 target train lines, a short train line (length of 13 km) with no direct operation, had no delays. On the other hand, one train line had delays every day during the target period, which was significantly more than the other lines. This train line has 2 direct operations, the length of its direct operation section is 173.8 km, its transportation capacity per train during peak hours is 1372.7 people/train, and it has 70 stations. Therefore, it can be said that this train line is not only large in scale in the first place but also conducts direct operations.
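As an illustration of how the two explained variables could be derived from delay certificates for one train line, the following minimal sketch may help. The certificate values are hypothetical, and the sketch assumes that "average delay time" is averaged over all 21 target weekdays, with days that have no certificate counted as 0 minutes; the source does not state the exact averaging rule.

```python
from datetime import date

# The 21 weekdays (Monday to Friday) of June 2018.
WEEKDAYS = [date(2018, 6, d) for d in range(1, 31)
            if date(2018, 6, d).weekday() < 5]

def summarize_delays(certificates, weekdays=WEEKDAYS):
    """Return (average delay in minutes, number of days with a delay).

    certificates maps a date to the delay in minutes shown on that day's
    delay certificate; days without a certificate count as 0 minutes.
    """
    delays = [certificates.get(day, 0) for day in weekdays]
    average_delay = sum(delays) / len(delays)
    delay_days = sum(1 for minutes in delays if minutes > 0)
    return average_delay, delay_days

# Hypothetical certificates for one train line.
certificates = {
    date(2018, 6, 1): 10,
    date(2018, 6, 4): 5,
    date(2018, 6, 5): 15,
}
average_delay, delay_days = summarize_delays(certificates)
print(len(WEEKDAYS), round(average_delay, 2), delay_days)
```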

Explanatory Variables
In order to consider the effect of train cars, stations, passengers, tracks and working timetables (My LINE Tokyo Timetable [32]) on train operations, the 10 explanatory variables shown in Table 1 were selected. Table 1 enumerates these explanatory variables together with the data sources.
In the following part, the details of the explanatory variables shown in Table 1 are explained.
1) Transportation capacity for each train during peak hours (unit: people/train)
This is a variable concerning the total passenger capacity of train cars and is the transportation capacity during peak hours in the most congested section divided by the number of operating trains per hour (variable 7).

2) Number of stations
This is the number of stations on the target train lines.

3) Transported passengers per hour during peak hours (unit: people/hour)
This is a variable indicating the number of passengers on trains during peak hours in the most congested section.

4) Number of stairs and escalators in terminal stations
This is a variable concerning stations, namely the number of stairs and escalators on the platforms of terminal stations. In principle, the station on each line with the highest number of passengers on its platforms was selected as the terminal station.

5) Length of train lines (unit: km)
This is a variable concerning the length of each train line. Working kilometers were used as the value to indicate the length.
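
The derivation of variable 1) above is a simple division, sketched below for concreteness. The function name and the input values are hypothetical, and the peak-hour capacity is assumed to be expressed in people/hour.

```python
def capacity_per_train(peak_capacity_per_hour, trains_per_hour):
    """Transportation capacity for each train during peak hours (people/train).

    peak_capacity_per_hour: capacity in the most congested section (people/hour)
    trains_per_hour: number of operating trains per hour during peak hours
    """
    if trains_per_hour <= 0:
        raise ValueError("trains_per_hour must be positive")
    return peak_capacity_per_hour / trains_per_hour

# Hypothetical line: 30,000 people/hour carried by 24 trains/hour.
print(capacity_per_train(30000, 24))
```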

Setting Direct Operation Sections
In order to consider the recent increase of direct operations in metropolitan areas, the present study will adopt explanatory variables concerning the entire direct operation section. The standard for direct operation sections was set as "sections in which trains run on the applicable train line within the target period", and the direct operation sections were set based on the My Line Tokyo Timetable [32].

Results and Discussion
In this section, R is used to check the explanatory variables for multicollinearity and to reveal the contributing factors for train delays in Japanese metropolitan areas by conducting 2 types of statistical analyses. R is a free, open-source programming language and software environment for statistical analysis. Using R, the standard multiple regression analysis and the logistic regression analysis are conducted by setting "average delay time", which indicates the quantitative condition of delays, and "occurrence of delays", which indicates the qualitative condition, as the respective objective variables. Additionally, the 10 explanatory variables shown in Table 1 are adopted in both analyses.
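The source does not state how multicollinearity was checked. One common screening approach, shown here only as an assumption-laden sketch in Python rather than R, is to compute the pairwise Pearson correlation between explanatory variables and flag pairs above a chosen threshold. The variable names, data values, and the 0.9 threshold are all hypothetical.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def collinear_pairs(variables, threshold=0.9):
    """Return (name_a, name_b, r) for pairs with |r| at or above threshold."""
    flagged = []
    for (name_a, xa), (name_b, xb) in combinations(variables.items(), 2):
        r = pearson(xa, xb)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, r))
    return flagged

# Hypothetical values for three explanatory variables on five train lines.
variables = {
    "length_km": [10, 20, 30, 40, 50],
    "n_stations": [5, 11, 14, 21, 24],
    "train_types": [3, 1, 4, 1, 5],
}
for name_a, name_b, r in collinear_pairs(variables):
    print(name_a, name_b, round(r, 3))
```

A flagged pair suggests that the two variables carry largely the same information, so including both in a regression can make the individual coefficients unstable.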

Variables Selection for the Standard Multiple Regression Analysis
As a result of using the stepwise method by increasing and decreasing variables in the standard multiple regression analysis, "transportation capacity for each train during peak hours", "number of stairs and escalators in terminal stations", "number of trains according to type", and "length of direct operation sections" were selected as explanatory variables.

Evaluation and Discussion of the Standard Multiple Regression Analysis Results
Table 2 shows the result of the multiple regression analysis. The discussion of each explanatory variable is as shown below.

1) Transportation capacity for each train during peak hours
Trains with large transportation capacity are trains with more cars. If there are more train cars, the distribution of passengers becomes unbalanced. Therefore, even if the number of passengers is not extremely high, getting on and off the train may take longer, resulting in the train being delayed.

2) Number of stairs and escalators in terminal stations
If there are many stairs and escalators installed in the terminal station, it can be considered that the demand is concentrated on that station. Therefore, delays may be caused as getting on and off at the terminal takes time.

3) Number of trains according to type
Train lines with severe congestion issues tend to have fewer train types. In particular, many subway lines operate only local trains, so the regression coefficient for the number of train types is negative.

4) Length of direct operation sections
As the frequency of accidents and trouble arising naturally increases when operating sections are longer, the average delay time also becomes longer.
Based on the information above, the average delay time is considered to increase

Variables Selection for the Logistic Regression Analysis
As a result of using the stepwise method by increasing and decreasing variables in the logistic regression analysis, "transported number of passengers per hour during peak hours", "number of stairs and escalators in terminal stations", "the average number of train lines", "number of operating trains per hour during peak hours", "number of trains according to type", "number of lines with direct operation", and "length of direct operation sections" were selected as explanatory variables. Compared with the standard multiple regression analysis, the logistic regression analysis has 3 additional explanatory variables: "the average number of train lines", "number of operating trains per hour during peak hours" and "number of lines with direct operation". Table 3 shows the result of the logistic regression analysis. The discussion of each explanatory variable is as shown below.

Evaluation and Discussion of the Logistic Regression Analysis Results
1) Transported number of passengers per hour during peak hours
The regression coefficient is extremely close to 0, which indicates that the number of transported passengers does not directly cause delays, although changes in the number of transported passengers affect the occurrence rate of delays.

2) Number of stairs and escalators in terminal stations
As with the discussion of the standard multiple regression analysis, the occur
as trains will be more likely to slow down as they get closer to a train in front. As a result, the 2 above events cancel each other out and the regression coefficient becomes closer to 0.

5) Number of trains according to type
As with the discussion of the standard multiple regression analysis, train lines with few train types such as subway lines are more easily congested and have a higher occurrence rate of delays.

6) Number of lines with direct operation
Most lines conducting direct operations pass through the city center. As delays occur more frequently in city centers with high demand, train lines that conduct direct operations passing through the city center have more delays as the delays of trains in front affect the trains behind.

7) Length of direct operation sections
Though the occurrence rate of accidents and trouble becomes higher when operation sections become longer, the regression coefficient for the length of direct operation sections was low, as such occurrences are still less frequent than small-scale delays due to congestion. Additionally, as the p-value is high, the length of direct operation sections cannot be said to directly affect the occurrence rate of delays.
Based on the information presented above, the concentration of demand and the number of lines with direct operation are highly correlated with the occurrence rate of delays. This suggests that the increase in direct operations may affect the occurrence of delays.
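To make the interpretation of logistic regression coefficients concrete, the following sketch converts coefficients into a predicted delay occurrence rate and an odds ratio. The intercept and coefficient values are hypothetical and are not taken from Table 3; the variable name `direct_lines` is likewise only illustrative.

```python
import math

def delay_probability(intercept, coefficients, values):
    """Predicted delay occurrence rate from a fitted logistic model."""
    z = intercept + sum(coefficients[name] * values[name]
                        for name in coefficients)
    return 1.0 / (1.0 + math.exp(-z))

def odds_ratio(coefficient, delta=1.0):
    """Multiplicative change in the odds of a delay when a variable rises by delta."""
    return math.exp(coefficient * delta)

# Hypothetical model: one explanatory variable, number of lines with direct operation.
intercept = -1.0
coefficients = {"direct_lines": 0.5}

p_none = delay_probability(intercept, coefficients, {"direct_lines": 0})
p_two = delay_probability(intercept, coefficients, {"direct_lines": 2})
print(round(p_none, 3), round(p_two, 3), round(odds_ratio(0.5), 3))
```

A positive coefficient gives an odds ratio above 1, meaning that each additional unit of the variable multiplies the odds of a delay by that factor; a coefficient close to 0 gives an odds ratio close to 1, meaning little change in the occurrence rate.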

Conclusions
The present study revealed the contributing factors for train delays in the Tokyo metropolitan area, Japan, by conducting statistical analyses focusing on passenger trains. More specifically, these factors were grasped using data concerning train cars, stations, passengers, tracks and working timetables as explanatory variables. Additionally, by preparing data for both single train lines and entire direct operation sections, the contributing factors were identified according to the current conditions of metropolitan train networks.
The present study conducted 2 types of statistical analyses, the standard multiple regression analysis and the logistic regression analysis, by setting "average delay time", which indicates the quantitative condition of delays, and "occurrence of delays", which indicates the qualitative condition, as the respective objective variables. Comparing the 2 sets of results, the logistic regression analysis had 3 additional explanatory variables: "the average number of train lines", "number of operating trains per hour during peak hours", and "number of lines with direct operation". The results of the logistic regression analysis quantitatively indicated that direct operations may increase the delay occurrence rate. Direct operations are therefore regarded as a contributing factor for train delays in the Tokyo metropolitan area in recent years. Additionally, it was confirmed that the concentration of demand on terminal stations is also a contributing factor for train delays. On the other hand, direct operations clearly contribute to improving the convenience of passengers as well as the operational efficiency of train cars. Moreover, as direct operations make it possible for passengers to arrive at their destination without transferring at terminal stations, direct operations can also be expected to ease the concentration of demand. Therefore, it would be ideal to resolve delays by easing the concentration of demand, which may be accomplished by recommending off-peak commuting and adjusting the working timetables.
In the future, it will be necessary to prepare more explanatory variables and further consider the characteristics of each train line in analyses. Additionally, as the data in the present study were gathered when the weather was good, the effect of weather is not sufficiently reflected in the data. The effect of such contributing factors must be considered by extending the period for collecting data. In this way, it is a task for future research projects to improve analytical accuracy.