Modeling Crash Risk at Rail-Highway Grade Crossings by Track Class

The Federal Railroad Administration (FRA) Web Based Accident Prediction System (WBAPS) is used by federal, state and local agencies to get a preliminary idea of safety at a rail-highway grade crossing. It is an interactive and user-friendly tool used to make funding decisions. WBAPS is almost three decades old and involves a three-step approach, which makes it difficult to interpret the contribution of the variables included in the model. It also does not directly account for regional/local developments and technological advancements pertaining to signals and signs implemented at rail-highway grade crossings. Further, characteristics of a rail-highway grade crossing vary by track class, which is not explicitly considered by WBAPS. This research, therefore, examines and develops a method and models to estimate crashes at rail-highway grade crossings by track class using regional/local level data. The method and models, developed for each track class as well as for all track classes together, are based on data for the state of North Carolina. Linear models, as well as count models based on Poisson and Negative Binomial (NB) distributions, were tested for applicability. Negative binomial models were found to be the best fit for the data used in this research. Models for each track class have better goodness of fit statistics compared to the model considering data for all track classes together. This is primarily because traffic, design, and operational characteristics at rail-highway grade crossings are different for each track class. The findings from statistical models in this research are supported by model validation.


Introduction
A rail-highway grade crossing works as an at-grade junction to allow for traffic number of crashes at a rail-highway grade crossing.
The three-step process of the FRA crash prediction formula includes 128 explanatory variables but has its own shortcomings. The formula was developed almost three decades ago and has been used since without much improvement (apart from updating coefficients). The formula only gives a preliminary idea to the decision makers to allocate resources. It is quite complex and difficult to interpret in terms of the most influential factors of safety at a rail-highway grade crossing. It does not take into consideration regional/local level geographic and other site-specific data such as sight-distance, highway congestion, local topography, and passenger exposure (train or vehicle).
The causes of crashes, driver behavior, geometric features, topographical conditions, and the presence of safety devices at rail-highway grade crossings vary from one state to another in the United States. As an example, North Carolina has active warning devices at more than 50% of its public rail-highway grade crossings. Therefore, the warning device criteria in the FRA rail-highway grade crossing crash prediction formula may be of little use to identify hazardous rail-highway grade crossings in North Carolina. The formula could also be simplified if an analysis is performed and a method/model developed using state or regional-level data.
WBAPS does not explicitly consider track class. Since design and operational characteristics vary by track class, developing models by track class may yield more meaningful results and assist rail practitioners. This research, therefore, focuses on the development of rail-highway grade crossing crash prediction models, using regional/local level data, by track class as well as considering data for all track classes.

Literature Review
Researchers have adopted various methods to develop crash prediction formulas for rail-highway grade crossing safety improvement. Negative Binomial (NB) crash prediction models were developed for rail-highway grade crossings using a simple one-step process [7]. An NB distribution-based model was also found to be the best fit for the data to identify rail-highway grade crossing blackspots for three categories (passive, flashing lights, and gates) [8].
The relation between the number of crashes and characteristics of rail-highway grade crossings was also observed through the use of a gamma distribution-based model [9]. The results from that study showed that crashes would increase with an increase in the total traffic volume and the average daily train volume. Further, the proximity of an industrial area and the time between signal and gate activation were observed to be associated with higher crash frequencies [9].
Zero-inflated models were also developed to examine the role of factors affecting rail-highway grade crossing crashes and to tackle data scarcity even with a large number of rail-highway grade crossings [10] [11]. The literature also documents the application of logistic regression models to observe the trends in the number of crashes at rail-highway grade crossings over time [5]. Stepwise regression analysis was also adopted to develop the rail-highway grade crossing crash prediction formula that aids in prioritizing signal improvements [12].
The above discussion on models to estimate crash risk at rail-highway grade crossings clearly indicates that a single statistical distribution may not be applicable to all datasets or locations. It also emphasizes the need to develop a method and models that best fit the data, accounting for factors at the regional/local level.
Literature also documents research to develop methods or examine the effect of countermeasures on rail-highway grade crossing safety. Park and Saccomanno [13] examined the interactions between various countermeasures (such as warning devices and the posted speed limit) on safety at rail-highway grade crossings. They also studied the effect of a less explored combination of countermeasures and control measures (highway class) on crash frequency using a sequential analytical strategy. This strategy combines the tree-based regression stratification of data with generalized linear regression models [14]. Saccomanno and Lai [15] categorized the rail-highway grade crossing inventory variables into non-linear factors and assigned scores. The scores were used to cluster rail-highway grade crossings and then develop a separate model for each cluster using explanatory variables relevant to that cluster [15]. A Bayesian data fusion method was used to tackle the problem of sparse crash data when evaluating countermeasure effectiveness. The method used previous research inferences for countermeasure effectiveness along with a calibrated model of the study area to generate a collision response and probability distribution for each countermeasure [16].
The type of countermeasure also plays a role in safety at rail-highway grade crossings. As an example, Yan et al. [17] showed that stop-sign treatment is an effective countermeasure to improve safety at rail-highway grade crossings. Likewise, upgrading flashing lights to gates on a single track may be more effective than at a rail-highway grade crossing along multiple tracks. However, the train speed variation did not have much influence on the effectiveness of the upgrade [18].
While some new models for rail-highway crash prediction were developed, past research did not primarily account for regional/local factors, which influence the crash trends to a great extent. The states (California, Texas, Illinois, Georgia, and New York) that have been chosen for research in the past are usually the ones with high train traffic. States such as North Carolina with relatively less train traffic may have different types of challenges. Considering such diverse geographic patterns is important. Therefore, this research aims at developing an approach to predict crashes at a rail-highway grade crossing based on regional/local data. Also, unlike most of the prediction models developed in the past, the models developed in this research do not make use of crash history information. This is mainly because crashes at rail-highway grade crossings in the study area are rare events, which makes the crash history of little use when predicting crashes in the future. Such an approach will also help when planning, designing and building new tracks with rail-highway grade crossings.
Funds available and countermeasures implemented at rail-highway grade crossings vary based on train activity levels and track design characteristics in addition to the risk. These characteristics differ for each track class. Analyzing and modeling by track class could therefore yield better results than developing models considering data for all track classes in a region. This research addresses the aforementioned aspects to add to the current state of knowledge on safety at rail-highway grade crossings.

Methodology
The methodology to model crash risk at rail-highway grade crossings is comprised of five steps. Each of those steps is discussed next in detail.

Selection of Rail-Highway Grade Crossings
The selection of rail-highway grade crossings needs to be performed so as to have the best representative sample of the population of all the rail-highway grade crossings in the study region. The selection should comprise rail-highway grade crossings with zero as well as more than zero crashes. Likewise, the representative sample should have a fair distribution of rail-highway grade crossings pertaining to all the track classes.

Selection of Explanatory Variables
The explanatory variables considered should represent the characteristics of the highway, the rail track and the types of warning devices at the rail-highway grade crossings. This research tried to use as few warning device variables as possible so as to avoid endogeneity, which here means that the crash experience at a rail-highway grade crossing may itself be the reason a particular warning device is installed there. The selection of the variables in this research is mainly based on the correlations between the variables and the dependent variable ("crashes per five years") and amongst the other variables considered for the analysis.

Development of Crash Risk Estimation Models
The dependent variable for all the models is the "number of crashes per five years" at a rail-highway grade crossing. Crash count models were primarily explored in this research. Poisson, NB and Gamma log-link distribution-based models are the popular count models. While count models provide a sensible output, they suffer from certain limitations. The Poisson model assumes the mean and variance to be equal, while the NB model is capable of handling data with variance greater than the mean (over-dispersed). The Gamma model, however, is capable of dealing with both over-dispersed and under-dispersed data.
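The distinction between equal, over- and under-dispersion can be illustrated with a minimal sketch. The function name and the crash counts below are hypothetical and are not taken from the study data; the check simply compares the sample mean with the sample variance.

```python
from statistics import mean, variance


def dispersion_summary(crash_counts):
    """Return (mean, sample variance, label) for a list of crash counts."""
    m = mean(crash_counts)
    v = variance(crash_counts)  # sample variance (n - 1 in the denominator)
    if v > m:
        label = "over-dispersed"   # NB is the usual candidate
    elif v < m:
        label = "under-dispersed"
    else:
        label = "equi-dispersed"   # the Poisson assumption holds exactly
    return m, v, label


# Hypothetical five-year crash counts at ten crossings (illustration only).
example_counts = [0, 0, 0, 0, 1, 0, 2, 0, 0, 5]
print(dispersion_summary(example_counts))
```

In practice the comparison is made on the full dataset and, ideally, conditional on the explanatory variables, so this raw check is only a first screening step.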
In this research, the analysis was conducted using SPSS software [19], in which the Gamma model excludes the zeroes in the dependent variable while modeling. As both the zero as well as non-zero values of the dependent variable are crucial, the use of a Gamma log-link distribution-based model has been excluded in this research. Researchers have considered zero-inflated models when studying crash data in the past. The zero-inflated NB model could be a special case of the NB model, and the difference in performance might be trivial [20].
For this reason, only Poisson and NB distributions are discussed in this research.
Akaike's Information Criterion (AIC) was used to assess the quality of various statistical models developed from the same data. The statistic provides an estimate of the information that is lost when a particular model is used to represent the process that generated the data. Given a set of candidate models for the data, the best model is the one with the minimum AIC value.
The Corrected Akaike's Information Criterion (AICC) was also checked to ensure that the model does not tend to over-fit the results. In general, the difference between AIC and AICC should be as low as possible.
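As a brief illustration, the two criteria can be computed from a fitted model's log-likelihood using their standard definitions, AIC = 2k - 2 ln L and AICC = AIC + 2k(k + 1)/(n - k - 1), where k is the number of estimated parameters and n is the number of observations. The function names and the numbers in the example call are assumptions for illustration, not values from this research.

```python
def aic(log_likelihood, k):
    """Akaike's Information Criterion: 2k - 2 ln L (smaller is better)."""
    return 2 * k - 2 * log_likelihood


def aicc(log_likelihood, k, n):
    """Small-sample corrected AIC; the correction vanishes as n grows."""
    if n - k - 1 <= 0:
        raise ValueError("AICC requires n > k + 1")
    return aic(log_likelihood, k) + (2 * k * (k + 1)) / (n - k - 1)


# Hypothetical model: log-likelihood -100 with 3 parameters and 545 crossings.
print(aic(-100.0, 3), aicc(-100.0, 3, 545))
```

Because the correction term shrinks as n grows relative to k, a small gap between AIC and AICC suggests that the sample is large enough for the number of parameters estimated.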
In addition, the likelihood ratio Chi-Square and Deviance values were also computed and considered to assess the goodness-of-fit of the developed models.
The probability value of the selected explanatory variables was also tested at a 95% confidence level (significance value ≤ 0.05).
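The two goodness-of-fit quantities mentioned above can be sketched as follows. The likelihood ratio Chi-Square is taken here in its usual form, twice the gain in log-likelihood of the fitted model over an intercept-only model. The deviance is shown for the Poisson case only, as the simplest instance; the NB deviance has a different form that also involves the dispersion parameter. Function names and inputs are hypothetical.

```python
from math import log


def lr_chi_square(ll_intercept_only, ll_fitted):
    """Likelihood ratio Chi-Square of a fitted model against the null model."""
    return 2.0 * (ll_fitted - ll_intercept_only)


def poisson_deviance(observed, fitted_means):
    """Poisson deviance: 2 * sum(y * ln(y / mu) - (y - mu)), with 0 * ln(0) = 0."""
    total = 0.0
    for y, mu in zip(observed, fitted_means):
        term = y * log(y / mu) if y > 0 else 0.0
        total += term - (y - mu)
    return 2.0 * total


# Hypothetical observed counts and fitted means (illustration only).
print(lr_chi_square(-120.0, -100.0))
print(poisson_deviance([0, 1, 2], [0.5, 1.0, 2.0]))
```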

Validation of the Models
The best-fitting model was then validated using data set aside for model validation (not used for the model development). The number of crashes at each selected rail-highway grade crossing is computed and compared with the actual number of crashes at the rail-highway grade crossing. To test the predictability of models compared to WBAPS, the number of crashes was compared to the analogous term "number of collisions per year" from the WBAPS output.
A t-test was then conducted to check whether the two groups of data belong to the same population. The null hypothesis is that the two groups being tested are not statistically different, while the alternate hypothesis is that the two groups are statistically different. The null hypothesis is rejected if the P-value is less than 0.05 (at a 95% confidence level).

Data
The data collected for this research includes two databases: 1) the FRA Office of Safety rail-highway grade crossing inventory, and 2) the FRA Office of Safety crash/incident database, both for the state of North Carolina. The rail-highway grade crossing inventory provides site-specific details of the rail-highway crossing and highway characteristics: the number of daily through trains, warning devices, annual average daily traffic (AADT), and the posted highway speed limit. The crash history data is available for each year. This database includes details of each incident at any of the operational rail-highway grade crossings in that year. The database also includes the type of railway equipment involved in the crash (freight train, passenger train, and inspection car) and the circumstance of the crash (if the rail-user was struck by the train or vice-versa). Crash history from the year 2009 to the year 2013 was considered to develop models in this research. Only rail-highway grade crossings where conditions remained the same over this five-year period were selected for analysis and modeling.
The rail-highway grade crossings were identified using the unique rail-highway grade crossing ID number. This number is a common element in both the databases. The rail-highway grade crossing ID was used to merge the rail-highway grade crossing inventory data with the crash frequency information from the crash/incident database to generate a database that was used for further analysis. Almost 97% of rail-highway grade crossings in the study area have warning devices installed at them. In such a case, including variables related to warning devices may pose endogeneity issues. In this study, endogeneity would imply that the presence of warning devices results in zero crashes (the most frequent crash count at rail-highway grade crossings), so that zero crashes would be attributed to the warning devices installed at these locations. Hence, warning device variables were not included in the models as far as possible. They were also observed to be correlated to other variables considered for modeling in this research.
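The merge on the crossing ID can be sketched as below. The field names (`crossing_id`, `year`, `crashes_5yr`) and the sample records are hypothetical and do not reflect the actual FRA database schema. Crossings with no matching crash record receive a count of zero, which keeps the zero-crash crossings in the analysis database.

```python
from collections import Counter


def merge_inventory_with_crashes(inventory, crash_records,
                                 first_year=2009, last_year=2013):
    """Attach a five-year crash count to every inventory row by crossing ID."""
    counts = Counter(
        record["crossing_id"]
        for record in crash_records
        if first_year <= record["year"] <= last_year
    )
    return [
        {**row, "crashes_5yr": counts.get(row["crossing_id"], 0)}
        for row in inventory
    ]


# Hypothetical records (illustration only).
inventory = [{"crossing_id": "A1", "aadt": 12500},
             {"crossing_id": "B2", "aadt": 800}]
crashes = [{"crossing_id": "A1", "year": 2010},
           {"crossing_id": "A1", "year": 2013},
           {"crossing_id": "A1", "year": 2015}]  # outside the study period
print(merge_inventory_with_crashes(inventory, crashes))
```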
All rail-highway grade crossings without data for a five-year period were removed from the database and further analysis. In addition, only public and at-grade rail-highway grade crossings were retained in the database.
The data had certain variables that were categorical in nature. For example, "highway near crossing" had four fields: less than 75 ft, 75 to 200 ft, 200 to 500 ft, and no highway nearby. These variables were reduced to indicator variables, i.e., one variable for each of the four fields. Also, AADT was converted to a rate per 10,000 vehicles. All other continuous variables were used in the analysis without any changes.
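The recoding described above can be sketched as follows. The indicator codes L75, B75-200, B200-500 and NHWY follow the variable list of this paper, while the input field names (`highway_near`, `aadt`), the output name `AADT_10K` and the sample record are assumptions made for illustration.

```python
# Indicator code -> category label of the "highway near crossing" field.
HIGHWAY_NEAR_LEVELS = {
    "L75": "less than 75 ft",
    "B75-200": "75 to 200 ft",
    "B200-500": "200 to 500 ft",
    "NHWY": "no highway nearby",
}


def encode_crossing(record):
    """Return the four 0/1 indicators and AADT expressed per 10,000 vehicles."""
    encoded = {
        code: int(record["highway_near"] == level)
        for code, level in HIGHWAY_NEAR_LEVELS.items()
    }
    encoded["AADT_10K"] = record["aadt"] / 10_000
    return encoded


# Hypothetical crossing (illustration only).
print(encode_crossing({"highway_near": "75 to 200 ft", "aadt": 12500}))
```

When all four indicators are entered in a model with an intercept, one of them must be dropped to avoid perfect collinearity; which one serves as the reference category is a modeling choice.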
Based on the FRA guidelines [21], the following range of train time table speed
Overall, the dataset considered had 681 rail-highway grade crossings in track class 1; 1432 rail-highway grade crossings in track class 2; 870 rail-highway grade crossings in track class 3; 656 rail-highway grade crossings in track class 4; and 133 rail-highway grade crossings in track class 5. About 20% of the rail-highway grade crossings were randomly selected for each track class and set aside for the model validation.
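Setting aside about 20% of the crossings within each track class can be sketched as a simple stratified random split. The function name, the `track_class` field, the rounding rule and the fixed seed are assumptions for illustration; the source does not state how the random selection was implemented.

```python
import random


def split_by_track_class(crossings, holdout_fraction=0.2, seed=42):
    """Randomly hold out a fraction of crossings within each track class."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    by_class = {}
    for crossing in crossings:
        by_class.setdefault(crossing["track_class"], []).append(crossing)

    development, validation = [], []
    for track_class in sorted(by_class):
        group = by_class[track_class][:]
        rng.shuffle(group)
        n_holdout = round(holdout_fraction * len(group))
        validation.extend(group[:n_holdout])
        development.extend(group[n_holdout:])
    return development, validation


# Class sizes as reported for the dataset; the records themselves are dummies.
sizes = {1: 681, 2: 1432, 3: 870, 4: 656, 5: 133}
dummy = [{"id": f"{tc}-{i}", "track_class": tc}
         for tc, n in sizes.items() for i in range(n)]
development, validation = split_by_track_class(dummy)
print(len(development), len(validation))
```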
The data from all the track classes were combined to compare the results of each track class model with a model based on the data for all track classes taken together. The correlation between the variables considered, as well as with the number of observed crashes during the five-year period, was examined by constructing a Pearson correlation coefficient matrix. A Pearson correlation coefficient matrix was examined for the all track class dataset as well as for the dataset of each track class.
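A Pearson correlation coefficient matrix of the kind described above can be sketched with the standard definition of the coefficient. The column names and values in the example are hypothetical; the original analysis was carried out in SPSS, so this is only an illustrative equivalent.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / sqrt(ss_x * ss_y)


def correlation_matrix(columns):
    """Nested dict of pairwise correlations for a {name: values} mapping."""
    names = list(columns)
    return {a: {b: pearson_r(columns[a], columns[b]) for b in names}
            for a in names}


# Hypothetical columns (illustration only).
data = {"TRFLN": [2, 2, 4, 4, 6],
        "AADT_10K": [0.1, 0.3, 1.2, 0.9, 2.5],
        "crashes_5yr": [0, 0, 1, 0, 2]}
matrix = correlation_matrix(data)
print(round(matrix["TRFLN"]["crashes_5yr"], 3))
```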

Variable Selection to Develop Models
The maximum train time table speed was considered as a key variable influencing safety and risk at the rail-highway grade crossings. The maximum train time table speed was included in the analysis and modeling process for the model considering data for all rail-highway grade crossings. The number of main tracks and AADT were also forced into the models. However, the maximum train time table speed was not forced into the model for each track class as the track class is based on the maximum train time table speed. The AADT and/or the number of main tracks were selected as the key variables influencing safety risk at the rail-highway grade crossing in this case. The variables that were found not to be correlated to the key variables (at a 95% confidence level) were identified and used in the development of models (only if correlated with the observed number of crashes during the five-year period).
Table 2 summarizes the explanatory variables selected based on correlation to develop models for each track class.

Analysis and Results
Track classes 1, 2 and 3 have rail-highway grade crossings with mostly zero or one main track while track classes 4 and 5 have a few rail-highway grade crossings with one or two main tracks. There is a fairly low number of rail-highway grade crossings in any of the classes with three or four main tracks. Also, a higher number of reported crashes is found at rail-highway grade crossings with one main track. This can be mainly due to the abundance of rail-highway grade crossings in this category (# of main tracks = 1) or may be due to some other unexplainable factor. More than 90% of the rail-highway grade crossings with two main tracks have two quadrant gates.

Modeling Based on All Track Class Data
Models were first developed considering data for all the track classes together.
The model developed is shown in Table 3.
The number of main tracks is positively correlated to crashes, while the highway speed limit has a negative coefficient (possibly because warning devices and signals are provided at rail-highway grade crossings with the higher posted speed limit on the highway). The model also has a negative intercept. The significance value for the likelihood ratio Chi-Square is less than 0.01.

Modeling Based on Each Track Class Data
The crash distribution, the mean and the variance for each track class are shown in Table 4. The data for track classes 1 and 5 were found to be under-dispersed, while the data for track classes 2, 3, and 4 were found to be over-dispersed. Since the model based on all track classes data and three out of five track classes have variance greater than the mean, NB distribution-based models are developed for each class and summarized in Table 5.
The total number of trains, the number of main tracks, the total number of switching trains, the percent of heavy vehicles, and the number of traffic lanes generally have a positive coefficient, while no highway near the rail-highway grade crossing has a negative coefficient. All the models have a negative intercept indicating that the number of crashes per year would be very low (almost zero).
The negative coefficient for the number of main tracks in the case of track class 5 could be attributed to the warning devices and signals implemented at such rail-highway grade crossings.
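The role of the intercept and the coefficient signs can be illustrated with a short sketch, assuming the usual log link for count models, under which the expected number of crashes is exp(intercept + sum of coefficient times variable value). The coefficient values below are hypothetical and are not the estimates reported in Table 5.

```python
from math import exp


def expected_crashes(intercept, coefficients, values):
    """Expected five-year crash count under a log-link count model."""
    linear_predictor = intercept + sum(
        coefficients[name] * values[name] for name in coefficients
    )
    return exp(linear_predictor)


# Hypothetical coefficients (illustration only, not the fitted values).
coefficients = {"main_tracks": 0.5, "NHWY": -0.4}
baseline = expected_crashes(-3.0, coefficients, {"main_tracks": 0, "NHWY": 0})
two_tracks = expected_crashes(-3.0, coefficients, {"main_tracks": 2, "NHWY": 0})
print(round(baseline, 4), round(two_tracks, 4))
```

With a negative intercept, the baseline prediction exp(intercept) lies below one crash per five years, and each positive coefficient multiplies that baseline by exp(coefficient) per unit increase in the variable.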

Model Validation
The model validation was performed using data set aside for each track class.
The computed WBAPS collision per year was converted to a five-year scale by multiplying the value by 5 (assuming conditions remain constant over the five-year period) to assist with the comparison. The t-test was performed by comparing the difference between the predictions from the developed model and the observed number of crashes with the difference between predictions from WBAPS and the observed number of crashes.
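The comparison described above can be sketched as follows. The source does not state which form of the t-test was used or whether the differences were signed, so this sketch assumes signed differences (prediction minus observation) and an unequal-variance (Welch) t-statistic; all function names and numbers are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def wbaps_to_five_year(collisions_per_year):
    """Scale WBAPS annual predictions to the five-year study period."""
    return [5 * rate for rate in collisions_per_year]


def prediction_errors(predicted, observed):
    """Signed difference between predicted and observed five-year crashes."""
    return [p - o for p, o in zip(predicted, observed)]


def welch_t(sample_a, sample_b):
    """Welch t-statistic for two samples with possibly unequal variances."""
    standard_error = sqrt(variance(sample_a) / len(sample_a)
                          + variance(sample_b) / len(sample_b))
    return (mean(sample_a) - mean(sample_b)) / standard_error


# Hypothetical validation data for four crossings (illustration only).
observed = [0, 1, 0, 2]
model_5yr = [0.2, 0.6, 0.1, 1.1]
wbaps_5yr = wbaps_to_five_year([0.02, 0.10, 0.03, 0.15])
model_err = prediction_errors(model_5yr, observed)
wbaps_err = prediction_errors(wbaps_5yr, observed)
print(mean(model_err), mean(wbaps_err), abs(welch_t(model_err, wbaps_err)))
```

The significance value would then be obtained from the t-distribution with the appropriate degrees of freedom, which a statistical package reports directly.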

Conclusions
The NB model for each track class was found to be the best fitting model to predict the number of crashes at rail-highway grade crossings. The total number of trains, the presence of stop lines, the number of traffic lanes, the percentage of trucks, the number of main tracks, the total number of switching trains and no highway near the rail-highway grade crossing are critical explanatory variables to model crash risk by track class at rail-highway grade crossings.
The variables in each track class are different from one another, which supports the view that rail-highway grade crossings for each track class must be considered separately when modeling crash risk.
The comparison of WBAPS with the developed model outputs suggests that these models give a more conservative picture of the number of crashes. It also shows that track class is a critical factor related to the risk at a rail-highway grade crossing. The track class largely governs the number of crashes at rail-highway grade crossings and should thus always be considered when addressing rail-highway grade crossing safety problems.
The models suffer from certain limitations as they have been developed using the available data, which is very scarce. In the models based on track class, there are classes in which only a marginal number of rail-highway grade crossings exist and so a very accurate estimate could not be made.
In the absence of funds or to enhance design standards, the agencies make the decision of closing some rail-highway grade crossings. This leads to an increase in the vehicular traffic and, hence, the risk at the other nearby rail-highway grade crossings. There are also other factors that contribute to crash reduction which could not be accommodated in the models developed in this research and are potential topics for future research. These include driver behavior at rail-highway grade crossings, driving under the influence of alcohol, and rail-highway safety awareness among users.

1 = no signs or signals; 2 = other signs or signals; 3 = crossbucks; 4 = stop signs; 5 = special active warning devices; 6 = wigwags, bells; 7 = flashing lights; 8 = all other gates (two and three quadrant gates*); 9 = four quadrant (full barrier) gates
BL: # of bells
NS: 1 = no signs or signals; 0 = at least one sign or signal
SGEQ: Is track equipped with train signals? 1 = yes, 0 = no
OS: Development type open; 1 = yes, 0 = no
RS: Development type residential; 1 = yes, 0 = no
COM: Development type commercial; 1 = yes, 0 = no
INDUS: Development type industrial; 1 = yes, 0 = no
INST: Development type institutional; 1 = yes, 0 = no
STPL: If stop lines are present; 1 = yes, 0 = no
RRX: If rail road crossing symbol is present; 1 = yes, 0 = no
NMK: If there are no pavement markings; 1 = yes, 0 = no
STPL: If there are stop lines and rail-road crossing signals; 1 = yes, 0 = no
L75: If the highway is less than 75 ft away; 1 = yes, 0 = no
B200-500: If the highway is in the vicinity of 200 to 500 feet; 1 = yes, 0 = no
B75-200: If the highway is in the vicinity of 75 ft to 200 feet; 1 = yes, 0 = no
NHWY: If there is no highway nearby; 1 = yes, 0 = no
TRFLN: # of traffic lanes
STHWY: Is crossing on state highway? 1 = yes, 0 = no
AADT: Average annual daily traffic
PCTRK: % of truck traffic
SCHLB: Average number of school buses passing through the crossing on a school day
WHISTB: If there is a whistle ban; 1 = 24 hr, 0
*Three quadrant gates: gates at a rail-highway grade crossing along with a median on the approach to the rail-highway grade crossing that only has a gate on the entrance lane.
DOI: 10.4236/jtts.2019.93016, Journal of Transportation Technologies
Four quadrant gates seem to be rarely installed at rail-highway grade crossings in the study area. The warning devices across track classes are justified as track class is related to the speed of the train.
The AIC and AICC are equal to each other for both the models. However, the NB distribution-based model has marginally lower AIC, AICC, and Deviance values than the Poisson distribution-based model. Further, the computed variance is greater than the mean. Therefore, the NB distribution-based model is considered to better fit the data used in this research.

Table 1 summarizes the list of variables considered in this research.

Table 1. Variables considered for analysis and modeling.

Table 2. Variables considered for developing each track class model.
Track class 1 has two-quadrant gates and crossbucks installed at 38.3% and 30.9% of the total rail-highway grade crossings, respectively. Similarly, track class 2 has 51.4% and 31.5% rail-highway grade crossings with two quadrant gates and crossbucks, respectively. Track class 1 has 15.6% rail-highway grade crossings with flashing lights installed, while track class 2 has a higher number, i.e., 121 rail-highway grade crossings with flashing lights installed. Further, for track class 3 and above, more rail-highway grade crossings have flashing lights and gates installed at them rather than just crossbucks. Track class 5 has 92.7% of its total rail-highway grade crossings with two quadrant gates.

Table 3. All track class data model.
*C is coefficient and P is probability or significance value.

From Table 5, the explanatory variables that have an effect on crashes at rail-highway grade crossings vary by track class. The AIC and AICC are reasonably close to each other for each track class model. The significance value for the likelihood ratio Chi-Square is less than or equal to 0.01 for each track class model. The AIC, AICC, and Deviance values are lower than the corresponding values for the model based on all track class data. This indicates that developing models for each track class may lower prediction errors and improve accuracy compared to the all track class data model.

Table 4. Descriptive statistics of data based on track class.

Table 5. Models by track class.

Table 6 shows the mean, standard deviation, significance value, and the absolute value of T-statistic comparing the model output from this research and WBAPS output for each track class. The mean difference for the developed model is lower than WBAPS for track class 1 and track class 2 (also shown in

Table 6. Comparison of errors.