Run-off-road crashes in the United States have become a major cause of serious injuries and fatalities. A significant portion of run-off-road crashes are single vehicle crashes that occur due to collisions with fixed objects and overturning. These crashes tend to be more severe than other types of crashes. Single vehicle run-off-road crashes that occurred between 2004 and 2008 were extracted from the Kansas Accident Reporting System (KARS) database to identify the important factors that affected their severity. Driver, vehicle, road, crash, and environment related factors that influence crash severity were identified by using binary logit models. Three models were developed, each taking a different level of crash severity as the response variable. The first model, which takes fatal or incapacitating crashes as the response variable, appears to fit the data better than the other two models. The variables found to increase the severity of run-off-road crashes are driver related factors such as driver ejection, being an older driver, alcohol involvement, license state, driver being at fault, and medical condition of the driver; road related factors such as speed, asphalt road surface, and dry road condition; time related factors such as crashes occurring between 6 pm and midnight; environment related factors such as daylight; vehicle related factors such as being an SUV or a motorcycle, the vehicle being destroyed or disabled, and the vehicle maneuver being straight or passing; and fixed object types such as trees and ditches.

Run-off-road (ROR) crashes in the United States have become a major cause of serious injuries and fatalities. Statistics based on Fatality Analysis Reporting System (FARS) data from 2008 illustrated that fatalities due to ROR crashes in the United States were about one-third of total traffic fatalities [

The reasons for the occurrence of ROR crashes vary. They include overturning or overcorrection during the operation of the vehicle; collision with roadside features such as trees, guardrails, ditches, poles, sign supports, utility poles, culverts, fences, embankments, and hydrants; and driver related factors such as alcohol involvement, speeding, medical condition, falling asleep, and careless driving. Several other factors may also be directly or indirectly related to the occurrence of ROR crashes. However, ROR crashes that involve collision with fixed objects account for a major portion of the total number of ROR crashes, and fixed-object crashes are also responsible for the more severe ROR crashes. According to the National Highway Traffic Safety Administration (NHTSA, 2003), collisions with fixed objects and non-collisions account for 19% of all reported crashes but result in 44% of all fatal crashes [

The economic costs associated with this type of crash are high. The importance of the roadside safety problem has been recognized by different organizations, and efforts have been made to reduce the types of errors most likely to cause roadside crashes. The societal costs associated with roadside crashes must be recognized before cost-effective strategies can be developed to improve roadside safety. Therefore, it is necessary to identify the factors associated with single vehicle ROR crashes so that effective remedies can be developed to reduce their severity. With this aim, the study focuses on single vehicle ROR crashes in Kansas that occurred between 2004 and 2008. Driver, road, environment, crash, and vehicle related factors that influence crash severity are identified by using logistic regression analysis, after an initial frequency analysis of different crash characteristics.

The primary objective of this study was to identify the important driver, vehicle, road, environment, and crash related factors that influence the severity of single vehicle ROR crashes. First, important crash characteristics related to single vehicle ROR crashes were identified. Then, the way these factors affect the severity of ROR crashes was analyzed by developing binary logit models. Logistic regression analysis is used to investigate the association between a number of explanatory variables and a single response variable, crash severity. The factors identified in this study are expected to help in developing potential countermeasures that will ultimately reduce the severity as well as the number of single vehicle ROR crashes.

Logistic regression and other related statistical methods are common in severity modeling. Several studies have adopted this kind of model to examine the association between crash characteristics and crash severity. Liu et al., in a study examining different factors affecting crash severity on gravel roads, used a binary logit model [

In a study by Dissanayake on ROR crashes involving young drivers, a sequential binary logistic regression model was developed to identify the roadway, driver, environmental, and vehicle related factors that affect crash severity [

Another study, conducted in 1999 by Lee and Mannering, used a nested logit model to analyze ROR crash severity and different observable characteristics [

Spainhour et al., in a study on fatal ROR crashes involving overcorrection, used a binary logistic regression model to examine the association between human, roadway, vehicle, and environmental factors and overcorrection as opposed to traditional ROR crashes [

Ten years of crash data, from 1999 to 2008, were obtained from the Kansas Accident Reporting System (KARS) database for this study. KARS, a Microsoft Access based database, contains all police reported crashes in Kansas. Once the data had been obtained, ROR crashes were extracted from the database based on an established definition. In this study, ROR crashes are defined as crashes in which a vehicle leaves the roadway, encroaches upon the median, shoulder, or beyond, and either overturns, collides with a fixed object, or is involved in a head-on or sideswipe crash with an opposing vehicle; or crashes in which the first harmful event occurs off the roadway or, in the case of divided highway sections, in the median off the roadway. After the extraction of ROR crashes based on this definition, single-vehicle ROR crashes were found to comprise more than 85% of the total ROR crashes. Therefore, only single vehicle ROR crashes are considered for further analysis in this study.

KARS database includes important crash characteristics related to environment, time, roadway, vehicle, occupants, etc.; it also has different driver, vehicle, roadway, and environment related contributory causes. Important crash characteristics are extracted and crash severity distribution (fatal, incapacitating, non-incapacitating, possible, no injury) has been determined for each single vehicle ROR crash. Crash severity was identified as the highest injury severity sustained by an individual involved in the crash. As an example, if there is at least one person fatally injured in a crash, then the crash is identified as a fatal crash and similarly, if there is at least one incapacitating injury but no fatality resulting from a crash then it is identified as an incapacitating crash. For the purpose of analysis, single vehicle ROR crashes for five years from 2004 to 2008 were combined together and one of the five severity levels was assigned to each crash.
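The rule that a crash takes the highest injury severity sustained by anyone involved can be sketched in a few lines. This is a minimal illustration, not the KARS processing itself: the function name, the string labels, and the handling of a crash with no recorded injuries are assumptions made here.

```python
# Severity levels ordered from least to most severe, as listed in the text.
SEVERITY_ORDER = [
    "no injury",
    "possible",
    "non-incapacitating",
    "incapacitating",
    "fatal",
]
RANK = {level: i for i, level in enumerate(SEVERITY_ORDER)}


def crash_severity(person_injuries):
    """Return the highest injury severity among the people in one crash."""
    if not person_injuries:
        # Assumption: a crash with no recorded injuries counts as 'no injury'.
        return "no injury"
    return max(person_injuries, key=RANK.__getitem__)


# One fatality makes the crash fatal, whatever the other injuries are.
example = crash_severity(["possible", "fatal", "no injury"])
```

Keeping the ordering in a single list means the five levels are ranked in one place only.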

Among the environment related factors considered in this study, the highest percentage of ROR crashes occurs in good weather conditions. Daylight and dark conditions account for almost equal percentages of total ROR crashes. Asphalt road surfaces account for more than half of the crashes across the different road surface types. Among road surface characteristics and conditions, the largest number of crashes occurs on straight and level roads and on dry road surfaces. Autos account for about 48% of crashes. Among the vehicle maneuvers considered, two-thirds of the crashes occur when vehicles are traveling straight.

When the functionality of crash involved vehicles is considered, about half of the vehicles involved in ROR crashes are found to be disabled. Male drivers are involved more often than female drivers. Younger drivers (aged between 16 and 24 years) account for more than one-third of total crashes. Alcohol involvement is found in 13% of ROR crashes. Usage of safety equipment by drivers is high, at 87%. Among the time related factors, one-third of crashes occur during the weekend, and about 25% of crashes occur between 6 pm and midnight.

Initially, as many variables (driver, vehicle, environment, roadway, and time related) as possible were included in the modeling, on the expectation that the quality of the model would improve, up to a certain level, as the number of variables increased. The variables were selected based on previous studies and on the assumption that a particular variable would affect the severity of ROR crashes. The descriptions of the 43 explanatory variables considered for the modeling are provided along with their statistics in Table 2. All the explanatory variables are binary except SPEED, which is treated as a continuous variable. Binary variables take the value 0 or 1; for example, if a crash occurs during the weekend, the variable WEEKEND is assigned the value "1", otherwise it is assigned "0". Three binary logistic regression models were developed with crash severity as the response variable. The models are described as follows:

1) FATAL_INCAP (Binary response = 1 if the observation is a fatal or incapacitating crash; = 0 otherwise, i.e., non-incapacitating, possible, or no injury)

2) FATAL_INCAP_NON-INCAP (Binary response = 1 if the observation is a fatal, incapacitating, or non-incapacitating crash; = 0 otherwise, i.e., possible or no injury)

3) INJURY (Binary response = 1 if the observation is a fatal, incapacitating, non-incapacitating, or possible injury crash; = 0 otherwise, i.e., no injury)
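The three response definitions above can be sketched as a single coding function. The response names follow the text; the function name and the severity label strings are assumptions made for illustration.

```python
FATAL = "fatal"
INCAP = "incapacitating"
NON_INCAP = "non-incapacitating"
POSSIBLE = "possible"
NO_INJURY = "no injury"

ALL_LEVELS = {FATAL, INCAP, NON_INCAP, POSSIBLE, NO_INJURY}


def response_variables(severity):
    """Code the three binary responses for one crash from its severity level."""
    if severity not in ALL_LEVELS:
        raise ValueError(f"unknown severity level: {severity!r}")
    return {
        "FATAL_INCAP": int(severity in {FATAL, INCAP}),
        "FATAL_INCAP_NON-INCAP": int(severity in {FATAL, INCAP, NON_INCAP}),
        "INJURY": int(severity in {FATAL, INCAP, NON_INCAP, POSSIBLE}),
    }


# A non-incapacitating crash is a 0 for the first model and a 1 for the others.
coded = response_variables(NON_INCAP)
```

The responses are nested: every crash coded 1 for FATAL_INCAP is also coded 1 for the other two, which is why the event counts grow from the first model to the third.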

As the aim of the study was to develop models to predict the severity of ROR crashes, logistic regression was identified as the most suitable approach to identify the important factors. As the response variable, crash severity, is dichotomous, ordinary linear regression will not fit properly as the dichotomous dependent variable violates assumptions of homoscedasticity and normality of the error term [

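For k explanatory variables and n observations (i = 1, 2, …, n), the model takes the form

$$
\log\left[\frac{p_i}{1 - p_i}\right] = \alpha + \beta' X_i
$$

where,

p_{i} = Prob(y_{i} = y_{1} | X_{i}) is the response probability to be modeled
y_{1} is the first ordered level of y
α = Intercept parameter
β = Vector of slope parameters
X_{i} = Vector of explanatory variables.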
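As a small illustration of how the binary logit model turns an intercept α, a slope vector β, and a vector of explanatory variables into a response probability, the sketch below evaluates the inverse of the logit link. The coefficient values are hypothetical and chosen only for the example; they are not estimates from this study.

```python
import math


def response_probability(alpha, beta, x):
    """Probability of the modeled response when log(p / (1 - p)) = alpha + beta . x."""
    eta = alpha + sum(b * v for b, v in zip(beta, x))
    return 1.0 / (1.0 + math.exp(-eta))


def log_odds(p):
    """Logit transform: the log of the odds p / (1 - p)."""
    return math.log(p / (1.0 - p))


# Hypothetical coefficients for illustration only (not from the study).
alpha = -3.0
beta = [0.8, 0.02]  # a binary indicator and a continuous variable such as SPEED
x = [1, 55]         # indicator present, speed of 55

p = response_probability(alpha, beta, x)
# Applying the logit to p recovers the linear predictor alpha + beta . x.
linear_predictor = log_odds(p)
```

Because the probability is a logistic function of the linear predictor, it always stays between 0 and 1. Ordinary linear regression on a dichotomous response cannot guarantee this.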

The statistical analysis software SAS was used to obtain the maximum likelihood estimates of the model parameters with PROC LOGISTIC.

The odds of an event are the ratio of the expected number of times that the event will occur to the expected number of times it will not occur. For a dichotomous explanatory variable, x, which takes the value 1 or 0, the odds ratio is the ratio of the odds of the event when x = 1 to the odds of the event when x = 0. This can be illustrated by the formula below [

where OR = Odds Ratio
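As a worked step, the standard textbook form of the odds ratio for a single dichotomous variable x under a binary logit model can be written out as follows. The notation π(x) for the probability of the event given x is introduced here for illustration and is not taken from the source.

$$
\pi(x) = \frac{e^{\alpha + \beta x}}{1 + e^{\alpha + \beta x}}
\quad\Longrightarrow\quad
\frac{\pi(x)}{1 - \pi(x)} = e^{\alpha + \beta x}
$$

$$
\text{OR} = \frac{\pi(1) / [1 - \pi(1)]}{\pi(0) / [1 - \pi(0)]} = \frac{e^{\alpha + \beta}}{e^{\alpha}} = e^{\beta}
$$

Under these assumptions, the odds ratio of a binary explanatory variable is the exponential of its logit coefficient. A positive coefficient therefore corresponds to an odds ratio above 1, and a negative coefficient to an odds ratio below 1.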

Before fitting the model, it was necessary to check if there exists any association between the explanatory variables. Any linear dependency between explanatory variables is called multi-collinearity. If there is any multicollinearity between the explanatory variables, the independent effects of those variables on the outcome might not be achieved. Although multi-collinearity doesn’t bias the coefficients, it makes the coefficients more unstable [

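The value of the correlation coefficient r_{xy} ranges between -1, which indicates strong negative correlation between two explanatory variables, and +1, which indicates strong positive correlation between two explanatory variables. Usually, variables with correlation coefficients greater than 0.5 are identified as having multi-collinearity effects between them, although there is no hard and fast rule [

Multi-collinearity can also be assessed with Tolerance and the Variance Inflation Factor (VIF). Tolerance is obtained by regressing each explanatory variable on the remaining explanatory variables, computing the coefficient of determination, R^{2}, and subtracting R^{2} from 1. If the estimated coefficient of determination is R_{i}^{2} for an explanatory variable x_{i}, then Tolerance and VIF will be calculated in the following manner [

$$
\text{Tolerance}_i = 1 - R_i^2, \qquad \text{VIF}_i = \frac{1}{\text{Tolerance}_i} = \frac{1}{1 - R_i^2}
$$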

Low tolerances indicate high multi-collinearity. In this study, variables with tolerances below 0.4 were removed from the model.
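The screening step can be sketched for the simplest case of two explanatory variables. The data below are made up for illustration. The sketch also relies on a simplification: when a variable is regressed on a single other variable, R^{2} equals the squared correlation coefficient. With more than two explanatory variables, R^{2} would come from a multiple regression instead. The 0.4 tolerance cutoff is the one stated in the text.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def tolerance_and_vif(r_squared):
    """Tolerance = 1 - R^2 and VIF = 1 / Tolerance for one explanatory variable."""
    tolerance = 1.0 - r_squared
    return tolerance, 1.0 / tolerance


# Two hypothetical binary indicators that mostly move together.
x1 = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
x2 = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

r = pearson_r(x1, x2)
# Simplification: with one other regressor, R^2 is the squared correlation.
tolerance, vif = tolerance_and_vif(r ** 2)
flagged = tolerance < 0.4  # cutoff used in the text
```

In this made-up example the two indicators are strongly correlated, so the tolerance falls below the cutoff and one of the two variables would be a candidate for removal.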

The results of the three crash severity models are presented in Table 3. Explanatory variables that were significant at the 95% confidence level were included in the models, and their corresponding odds ratios are presented in parentheses. Each model was developed by entering all the variables initially and then removing them one at a time when a variable was found not to be significant.
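The variable selection procedure described here, entering all variables and removing non-significant ones one at a time, can be sketched as a backward elimination loop. This is only an outline under stated assumptions. The `fit_p_values` callable stands in for refitting the model (for example in SAS) and is hypothetical. Removing the least significant variable first is an assumed ordering that the text does not specify. The p-values in the stub are invented.

```python
def backward_eliminate(variables, fit_p_values, alpha=0.05):
    """Remove non-significant variables one at a time until all have p < alpha.

    fit_p_values is a caller-supplied (hypothetical) function that refits the
    model on the given variables and returns a {variable: p_value} mapping.
    """
    kept = list(variables)
    while kept:
        p_values = fit_p_values(kept)
        # Assumed ordering: drop the least significant variable first.
        worst = max(kept, key=lambda name: p_values[name])
        if p_values[worst] < alpha:
            break  # every remaining variable is significant
        kept.remove(worst)
    return kept


def stub_fit(names):
    """Stand-in for a model refit; the p-values are invented for illustration."""
    invented = {"EJECT": 0.001, "SPEED": 0.010, "WEEKEND": 0.620, "OTHER": 0.200}
    return {name: invented[name] for name in names}


selected = backward_eliminate(["EJECT", "SPEED", "WEEKEND", "OTHER"], stub_fit)
```

In a real analysis the p-values change after each refit, which is why the loop calls the fitting function again on every pass rather than filtering once.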

The models use 72,181 crash records in total. Of these, 3,791 are fatal or incapacitating crashes; 16,418 are fatal, incapacitating, or non-incapacitating crashes; and 24,232 are injury crashes of any level (fatal, incapacitating, non-incapacitating, or possible injury).

The first model, where the response variable is FATAL_INCAP (crash severity is either fatal or incapacitating), has 25 significant explanatory variables. The sign of a coefficient indicates the direction of a variable's effect: a positive coefficient increases the probability of the modeled crash severity, and a negative coefficient decreases it. Twenty-one explanatory variables have positive coefficients, which means that a fatal or incapacitating crash becomes more likely when one or more of these 21 factors are involved. Of these 21 variables, 6 are driver related (driver ejection, older driver, alcohol involvement, license state, drivers at fault, and drivers' medical condition), 3 are road related (speed, asphalt road surface, and dry road condition), 1 is environment related (daylight), 3 are crash related (accident location, overturning crashes, and time: between 6 pm and midnight), 6 are vehicle related (SUVs, motorcycles, vehicle destroyed, vehicle disabled, vehicle straight, and vehicle passing), and 2 are fixed object types (tree and ditch). Usage of safety equipment, road character (straight and level road), young drivers (drivers aged between 16 and 24 years), and vehicle registration state (if the involved vehicle is registered in Kansas) have negative coefficients, which suggests that these 4 variables decrease the probability of fatal or incapacitating crashes. The odds ratios presented in parentheses measure the factor by which the odds of the modeled crash severity change when the variable is present. For example, the explanatory variable EJECT has an odds ratio of 8.656 in the first model, so the odds of a fatal or incapacitating crash are 8.656 times higher when drivers are ejected or trapped than when they are not, assuming that the rest of the factors remain the same.
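An odds ratio multiplies the odds of an outcome, and the resulting change in probability depends on the starting point. The sketch below applies the reported EJECT odds ratio of 8.656 to a baseline probability. The baseline of 0.05 is a hypothetical value chosen for illustration, and the function name is an assumption.

```python
import math


def apply_odds_ratio(baseline_probability, odds_ratio):
    """Probability after multiplying the baseline odds by an odds ratio."""
    odds = baseline_probability / (1.0 - baseline_probability)
    new_odds = odds * odds_ratio
    return new_odds / (1.0 + new_odds)


OR_EJECT = 8.656                 # odds ratio reported for EJECT in the first model
beta_eject = math.log(OR_EJECT)  # logit coefficient implied by that odds ratio

baseline = 0.05                  # hypothetical baseline probability
p_ejected = apply_odds_ratio(baseline, OR_EJECT)

# The ratio of probabilities is smaller than the ratio of odds.
probability_ratio = p_ejected / baseline
```

The probability ratio approaches the odds ratio only when the baseline probability is very small, which is why the two should not be used interchangeably.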

The second model, where the response variable is FATAL_INCAP_NON-INCAP (crash severity is fatal, incapacitating, or non-incapacitating), has 37 significant explanatory variables. Among them, 32 variables have positive coefficients: all 21 variables with positive coefficients in the first model, together with license compliance, drivers too fast for conditions, drivers falling asleep, evasive action of drivers, concrete road surface, weather, vans, vehicle age, and the fixed object types of utility pole, median barrier, and guard rail. This indicates that they increase the probability of fatal, incapacitating, or non-incapacitating crashes. Male, usage of safety equipment, road character (straight and level road), vehicle turn, and vehicle registration state have negative coefficients; thus, these 5 variables decrease the probability of fatal, incapacitating, or non-incapacitating crashes.

In the third and last model, the response variable covers all four levels of injury, and the model has 39 significant variables. All the variables with positive coefficients in the second model, except accident location, have positive coefficients in the third model as well. In addition, younger drivers, drivers failing to give time and attention, and the vehicle body type of auto are found to have positive coefficients. The number and type of variables with negative coefficients are the same in the last model as in the second. Accident location, which has a positive coefficient in the first and second models, is not significant in the last model. Restriction compliance, road construction maintenance, and weekend are not significant in any of the three models. Usage of safety equipment, road character (straight and level road), and vehicle registration state have negative coefficients in all three models, which means that they decrease crash severity at every level considered. Twenty explanatory variables have positive coefficients in all three models: driver ejection, older driver, alcohol involvement, license state, drivers at fault, and medical condition of drivers (driver related); speed, asphalt road surface, and dry road condition (road related); overturning crashes and time, i.e., crashes occurring between 6 pm and midnight (crash related); daylight (environment related); SUVs, motorcycles, vehicle destroyed, vehicle disabled, vehicle straight, and vehicle passing (vehicle related); and tree and ditch (fixed object types). Ejection of drivers, medical condition of drivers, motorcycles as vehicle body type, vehicle destroyed, and vehicle disabled have very large odds ratios in all three models, indicating strong positive effects on crash severity.

^{*}NS = Not Significant.

The goodness-of-fit statistics show that the first model is more suitable than the other two models. The first model has the smallest number of significant predictors. As a result, its prediction equation is much simpler, with fewer variables, and a higher percentage is predicted well than with the other two models. Akaike's information criterion (AIC), the Schwarz criterion (SC), and the -2 log likelihood (-2LL) are the lowest for the first model. The lower these three statistics are, the better the model fits the data [

Model fit was also assessed using R^{2}. The R^{2} is the highest (0.2772) for the last model and the lowest (0.1227) for the first model. However, the adjusted R^{2} (Max-rescaled R^{2}) is similar for all three models, at 0.35 for the first and second models and 0.38 for the last model, which indicates that all the models fit the data adequately.
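The fit statistics discussed here can all be derived from the log likelihoods of the intercept-only model and the fitted model. The sketch below uses the commonly documented definitions: AIC = -2LL + 2p and SC = -2LL + p ln(n), where p counts the intercept and slopes, together with the generalized R^{2} and its max-rescaled version. The event count and sample size come from the text. The fitted-model log likelihood and parameter count are hypothetical placeholders, not values from the study.

```python
import math


def null_log_likelihood(n_events, n_obs):
    """Log likelihood of an intercept-only binary model."""
    p = n_events / n_obs
    return n_events * math.log(p) + (n_obs - n_events) * math.log(1.0 - p)


def fit_statistics(log_lik_null, log_lik_model, n_obs, n_params):
    """Fit statistics; n_params counts the intercept plus all slopes."""
    neg2ll = -2.0 * log_lik_model
    aic = neg2ll + 2.0 * n_params
    sc = neg2ll + n_params * math.log(n_obs)
    # Generalized R^2 and its maximum attainable value for this data set.
    r2 = 1.0 - math.exp(2.0 * (log_lik_null - log_lik_model) / n_obs)
    r2_max = 1.0 - math.exp(2.0 * log_lik_null / n_obs)
    return {
        "-2LL": neg2ll,
        "AIC": aic,
        "SC": sc,
        "R2": r2,
        "max_rescaled_R2": r2 / r2_max,
    }


# Counts for the first model are taken from the text.
ll_null = null_log_likelihood(3791, 72181)
# Hypothetical fitted log likelihood and parameter count, for illustration only.
stats = fit_statistics(ll_null, ll_null - 4000.0 * -1.0, 72181, 26)
```

The maximum attainable R^{2} is well below 1 when the modeled outcome is rare, which is one reason the max-rescaled value can be similar across models even when the unscaled R^{2} differs.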

The study developed binary logit models to determine the important factors associated with the severity of single vehicle ROR crashes. Three models were established by using 72,181 crash records, and 43 explanatory variables were used to identify how they influenced ROR crash severity. Twenty variables are positively associated with ROR crash severity in all three models, which means that all of them increase the severity of ROR crashes. These variables are driver related factors such as driver ejection, older driver, alcohol involvement, license state, drivers at fault, and medical condition of the drivers; road related factors such as speed, asphalt road surface, and dry road condition; crash related factors such as overturning and crashes occurring between 6 pm and midnight; daylight as an environment related factor; vehicle related factors such as SUVs, motorcycles, vehicle destroyed, vehicle disabled, vehicle straight, and vehicle passing; and tree and ditch as fixed object types. Usage of safety equipment, straight and level road, and vehicle registration state are found to decrease crash severity in all three models. Three variables, restriction compliance, road construction maintenance, and weekend, are not significant in any of the three models. The variables found to have positive or negative associations with crash severity in this study are consistent with previous studies on ROR crashes. Five variables, ejection of drivers, medical condition of drivers, motorcycles as vehicle body type, vehicle destroyed, and vehicle disabled, have particularly strong effects on crash severity, as their odds ratios are distinctly higher in all three models. Among the three models, the first one is found to be better than the other two when the fit statistics are compared. The logistic regression model is a useful tool for identifying the factors that affect crash severity and could be considered to provide more accurate estimations than other methods. The variables identified in this study as influencing crash severity can help in developing appropriate countermeasures to reduce the severity of single vehicle ROR crashes.

Statistical models are useful in determining the association between different factors as well as contributory causes and crash frequency, type and severity. This study focuses only on crash severity. The crash database that has been used in this study to develop the model is based on police recorded crash reports, which might raise the question of accuracy [