An Alternative Regression-Based Approach to Estimate the Crash Modification Factors of Multiple Treatments Using Before-and-After Data

Before-and-after methods have been used effectively in road safety studies to estimate Crash Modification Factors (CMFs) of individual treatments as well as of multiple treatments on roadways. Since the common practice is to apply multiple treatments to road segments, it is important to have a method to estimate the CMF of each individual treatment so that its contribution to improving road safety can be identified. Even though researchers have introduced methods to combine multiple CMFs or to isolate the safety effectiveness of an individual treatment from CMFs developed for multiple treatments, those methods have to be tested before they are used. This study considered two multiple treatments: 1) safety edge with lane widening and 2) adding 2 ft paved shoulders with shoulder rumble strips and/or asphalt resurfacing. The objectives of this research are to propose a regression-based method to estimate individual CMFs, to estimate CMFs using the before-and-after Empirical Bayes (EB) method, and to compare the results. The results showed that a large sample size gives accurate predictions with smaller standard errors and p-values for the considered treatments. Also, the results obtained from the regression method are similar to those from the EB method even though the values are not exactly the same. Finally, it was seen that the safety edge treatment reduces crashes by 15%-25% and adding 2 ft shoulders with rumble strips reduces crashes by 25%-49%.


Introduction
Motor traffic injuries are one of the predominant causes of fatalities in the world as well as in the United States [1] [2]. More than 32,000 fatalities per year were reported in the United States from 2009 to 2014 [3]. Furthermore, lane-departure crashes account for approximately 54% of the total motor vehicle fatalities in the United States. Similar to the national level, Kansas experienced more than 350 motor vehicle fatalities per year from 2009 to 2014, and nearly 60% of those were due to lane-departure crashes [4]. Therefore, many different treatments, such as lane widening, paved shoulders, rumble strips, safety edge treatments, chevrons, and cable median barriers, have been implemented on Kansas road segments to reduce lane-departure crashes as well as all crashes. This research estimates the safety effectiveness of two treatments, safety edge treatments and adding 2 ft paved shoulders, on Kansas rural two-lane road segments where the dates of implementation of those treatments are known. However, those treatments were implemented together with other treatments. Therefore, efforts were made to isolate the safety effectiveness of the individual treatments.
Crash Modification Factors (CMFs) were used to estimate the safety effectiveness of the considered treatments. Before-and-after methods such as the Empirical Bayes (EB) method have been proven to be effective in estimating CMFs of treatments where the date of implementation is known, yet those methods estimate the combined CMF if the considered road segments had multiple treatments. Even though the relationships between individual treatments and combined treatments have been investigated, it is difficult to find the exact relationship between those treatments for a given region due to differences in the various factors affecting crashes. Therefore, an alternative regression-based method was introduced in this study. Generalized linear regression modeling with a negative binomial error structure was used to fit the regression models using before and after data. Furthermore, the before-and-after EB method was used to estimate CMFs for multiple treatments. Commonly used methods of estimating combined CMFs for multiple treatments from individual CMFs were then used to isolate the safety effectiveness of safety edge treatments and of adding 2 ft paved shoulders. Finally, results from both approaches were compared, and the advantages and limitations are discussed in the results and conclusions.

Literature Review
Different methods have been used by researchers to develop CMFs, and many of those methods are summarized in the Highway Safety Manual (HSM) [5]. These methods can be divided into two broad categories, namely before-and-after and cross-sectional study approaches. The major difference between the two approaches is that the before-and-after approach requires data for both the before and after periods of the treatment. Therefore, the date of implementation of the treatment is required. In cross-sectional studies, the date of implementation is not required [5].
Of the many methods for estimating CMFs from before-and-after data, the EB method has been shown to provide accurate results by accounting for the regression-to-the-mean effect [6] [7] [8] [9]. However, if a road segment received multiple treatments at the same time, the EB method usually estimates the combined CMF due to the multiple treatments. [12].

Predicting CMFs for Multiple Treatments
-Organize CMFs based on crash types and their applications into groups.
-Previous experience and expert judgment.
-Apply a weightage factor to a multiplication of CMFs.
-Assume independence between the treatments and take the product of all the CMFs.
-Apply only the most effective CMF.
However, none of these methods has been proven to be effective in all regions. Therefore, each method should be tested before it is applied in a region other than those where it has been proven to be effective.
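The mechanical rules in the list above can be sketched in a few lines. This is a minimal illustration only, not the procedure of any cited study: the function names and the example CMF values are hypothetical, and the form of the weighting (a single multiplier applied to the product) is an assumption.

```python
from math import prod

def combine_product(cmfs):
    """Assume independent treatments: combined CMF is the product."""
    return prod(cmfs)

def combine_most_effective(cmfs):
    """Apply only the most effective (smallest) CMF."""
    return min(cmfs)

def combine_weighted_product(cmfs, weight):
    """Apply a weighting factor to the product of CMFs (assumed form)."""
    return weight * prod(cmfs)

# Hypothetical CMFs for two treatments applied together.
example_cmfs = [0.90, 0.80]
print(combine_product(example_cmfs))
print(combine_most_effective(example_cmfs))
print(combine_weighted_product(example_cmfs, weight=1.1))
```

The three rules can give noticeably different combined CMFs for the same inputs, which is one reason each rule needs to be tested regionally before it is relied on.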

Past Studies on Considered Treatments
This study estimates the combined CMFs for safety edge treatments with lane widening and for adding 2 ft paved shoulders with asphalt resurfacing and/or shoulder rumble strips, as well as the individual CMFs for safety edge, lane widening, adding 2 ft paved shoulders, asphalt resurfacing, and adding shoulder rumble strips. Previous literature on those treatments was therefore reviewed so that the results can be compared with other studies.

Safety Edge Treatment
A safety edge enables drivers who have drifted off the highway to re-enter the travel lane safely [13]. It has been shown to reduce all crashes, run-off-road crashes, and pavement drop-off crashes in many states including Iowa [14]. A study conducted in Iowa using the before-and-after EB method showed that implementing the safety edge treatment reduced all non-intersection crashes by 13% and fatal and injury crashes by 16%. It also found that, due to safety edge treatments, total run-off-road (ROR) crashes were reduced by 12% and injury ROR crashes by 8% [15]. However, a study conducted in Georgia, Indiana, and New York using the EB method and the cross-sectional method showed mixed effects of implementing safety edge treatments on two-lane road segments.
Results obtained from the EB method for Georgia and Indiana showed that safety edge treatments reduced total crashes and fatal and injury crashes by up to 11% and 44%, total ROR crashes and fatal and injury ROR crashes by up to 14% and 46%, and drop-off-related all crashes and fatal and injury crashes by up to 10% and 38%, respectively. The cross-sectional study showed that safety edge treatments on two-lane roads in Georgia, Indiana, and New York reduced total crashes and fatal and injury crashes by up to 48% and 70%, total ROR crashes and fatal and injury ROR crashes by up to 57% and 81%, and drop-off-related all crashes and fatal and injury crashes by up to 70% and 86%, respectively. However, the standard errors for the safety edge treatments were larger in the cross-sectional method than in the EB method [16].

Adding Paved Shoulders
The literature indicates that adding paved shoulders can both reduce and increase crashes on two-lane and four-lane road segments. A study conducted in Kansas using the cross-sectional method showed that 2 ft paved shoulders reduce lane-departure crashes by 12%-18% and 11%-34% on rural undivided tangent and curved road segments, respectively. The study also showed that 2 ft paved shoulders reduce fatal and injury lane-departure crashes by 6%-16% and 7%-21% on tangent and curved road segments in Kansas, respectively [19]. Even though much of the literature indicates that paved shoulders reduce crashes, a few studies conclude that paved shoulders are positively associated with crashes. A study conducted in Illinois estimated the safety effectiveness of adding and widening paved shoulders on rural multilane and two-lane road segments. The results showed that widening paved shoulders from 4 and 6 ft to 8 ft increased shoulder-related fatal crashes by 4%-7% and reduced injury crashes by 3%-7%. The study also showed that adding 6 ft or 8 ft paved shoulders increased shoulder-related fatal crashes by 8%-10% and injury crashes by 5%-8% [20].

Asphalt Resurfacing
Asphalt resurfacing is done to improve the road condition and to increase the serviceability of the road. Same as for many other treatments, asphalt resurfacing

Data and Methodology
This section summarizes how the data were prepared for the proposed regression-based method and the EB method. Furthermore, it summarizes the methodology for estimating CMFs with both methods.

Data
Crash-related information, including the location, year, and severity of each crash, was obtained from the Kansas Crash Analysis and Reporting System (KCARS) database, and the geometric and traffic-related characteristics of each road segment in the before and after periods of the treatments were extracted from the Control Section Analysis System (CANSYS), which is the Kansas state highway system database. Three years of before data and three years of after data were extracted for the considered treatments, excluding the year of the treatment. Three roads with a total length of 72 miles were identified as having the safety edge treatment with lane widening. Twelve roads with a total length of 461 miles were identified as having 2 ft paved shoulders added with asphalt resurfacing and/or shoulder rumble strips. Figure 1 illustrates the data preparation for the proposed method using a hypothetical example, which assumes that the considered road segment had road resurfacing and 2 ft paved shoulders with shoulder rumble strips.

Data Preparation
As shown in Figure 1, before the treatments were applied the road segment did not have 2 ft paved shoulders, shoulder rumble strips, or asphalt resurfacing. Therefore, those variables are zero in the before period, as shown in Figure 1(b), route number 1a. The considered road segment then received all the treatments at the same time, so the corresponding treatment variables are one in the after period, as shown in Figure 1(b), route number 1b. Segment length remains the same since the same road segment was considered; however, AADT and the number of lane-departure crashes vary, as shown in Figure 1(b), route numbers 1a and 1b. Since the before and after characteristics of the same road segment were considered, the effect of driver behaviour and environment-related characteristics, which are not considered in the SPF of the EB method, can be minimized.
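The before-and-after coding described above can be sketched as follows. The field names and all numeric values are hypothetical placeholders, not data from the study; the sketch only shows how one treated segment becomes two records (the 1a and 1b rows) whose treatment indicators switch from 0 to 1.

```python
TREATMENTS = ("paved_shoulder_2ft", "shoulder_rumble_strips",
              "asphalt_resurfacing")

def before_after_records(route, length_mi, before, after, applied):
    """Return the 'before' and 'after' rows for one treated segment.

    before/after: dicts of period-specific values (AADT, crashes).
    applied: names of the treatments implemented on the segment.
    """
    rows = []
    for suffix, period, is_after in (("a", before, 0), ("b", after, 1)):
        row = {"route": f"{route}{suffix}", "length_mi": length_mi}
        row.update(period)
        # Indicators are 0 in the before period, 1 after if applied.
        for t in TREATMENTS:
            row[t] = is_after if t in applied else 0
        rows.append(row)
    return rows

rows = before_after_records(
    route=1, length_mi=5.0,
    before={"aadt": 1200, "ld_crashes": 4},   # hypothetical values
    after={"aadt": 1300, "ld_crashes": 2},    # hypothetical values
    applied=TREATMENTS,
)
for r in rows:
    print(r)
```

Stacking such row pairs for every treated segment gives a dataset in which each treatment indicator varies within a segment while the segment's unobserved characteristics stay fixed.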

Model Development
A generalized linear regression model with a negative binomial error structure was employed to estimate the individual effect of each treatment.
In the model, y = n × 1 vector of observed crashes; β = p × 1 vector of estimated regression parameters corresponding to the geometric design and traffic volume related independent variables; X = n × p known model matrix of geometric design and traffic volume related independent variables; ε = n × 1 vector of random errors.
The mean-variance relationship of the negative binomial distribution can be expressed as shown in Equation (2).
$$\mathrm{Var}(y) = E(y) + k\,[E(y)]^{2} \qquad (2)$$
where $k$ is the dispersion parameter.
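As a minimal sketch of the negative binomial assumption, the code below evaluates the variance implied by a given mean and dispersion parameter, and the log-probability of a crash count under the same parameterization (mean mu, variance mu + k·mu²). The function names are illustrative assumptions, and this is not the software used to fit the models in this study.

```python
from math import lgamma, log, exp

def nb_variance(mean, k):
    """Variance implied by Var(y) = E(y) + k * E(y)**2."""
    return mean + k * mean ** 2

def nb_log_pmf(y, mean, k):
    """Log-probability of count y for a negative binomial with
    the given mean (> 0) and dispersion k (> 0)."""
    r = 1.0 / k                      # shape parameter
    return (lgamma(y + r) - lgamma(r) - lgamma(y + 1)
            + y * log(k * mean / (1.0 + k * mean))
            - r * log(1.0 + k * mean))

def nb_log_likelihood(counts, means, k):
    """Sum of log-probabilities; this is the quantity that
    maximum likelihood estimation maximizes."""
    return sum(nb_log_pmf(y, m, k) for y, m in zip(counts, means))

print(nb_variance(2.0, 0.5))   # 2 + 0.5 * 4 = 4.0
```

As k approaches zero the variance approaches the mean, which is the Poisson case; a larger k indicates stronger overdispersion in the crash counts.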
The maximum likelihood method was used to estimate the coefficients of the regression model.
When developing models for the road segments with safety edge treatment and lane widening, crashes per year per segment were considered as the response variable. Access control, terrain type, segment length, posted speed limit, the percentage of heavy vehicles, AADT, average lane width, and presence of safety edge treatment were considered as explanatory variables. Similarly, crashes per year per segment were considered as the response variable when developing models for adding 2 ft paved shoulders with asphalt resurfacing and/or shoulder rumble strips. Segment length, access control, average lane width, terrain type, posted speed limit, the percentage of heavy vehicles, AADT, the presence of 2 ft paved shoulders only, the presence of 2 ft paved shoulders with shoulder rumble strips, and asphalt resurfacing were considered as the explanatory variables. When developing both models, influential points were identified using DFBETAS, which indicates how much influence one observation has on the determination of a particular regression coefficient [27].

Before-and-After EB Method
The same dataset used to develop the regression models was used to estimate CMFs using the EB method. Since all the road segments with the considered treatments are rural two-lane undivided road segments, the safety performance function (SPF) given in the HSM was used to predict average crash frequencies for base conditions on the respective road segment in the before and after periods, as shown in Equation (4). Since it was decided to develop CMFs for all crash types and for lane-departure crashes, the proportion of lane-departure crashes to all crashes was calculated using reference sites. Also, the proportions of fatal and injury crashes to all crashes were calculated so that these models can be used to predict lane-departure crashes as well as fatal and injury crashes. When selecting reference sites, variables such as access control, lane width, shoulder type, shoulder width, rumble strips, posted speed limit, AADT, and percentage of heavy vehicles were used. Equation (5) was used to predict the average crash frequency for a specific year after calibrating the SPF for local conditions using the calculated calibration factors, and Equation (6) was used to estimate the expected number of crashes during the before period [5] [8]. Finally, CMFs were calculated using the expected number of crashes in the after period and the observed crash frequencies.
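The EB steps described above can be sketched in code. The equations themselves are not reproduced in this text, so the sketch uses the commonly published HSM forms as an assumption: the rural two-lane SPF N = AADT × L × 365 × 10⁻⁶ × e^(−0.312) with overdispersion k = 0.236/L, the EB weight w = 1/(1 + k × predicted before-period crashes), and the bias-adjusted ratio of observed to expected after-period crashes. The function names and the dictionary layout are hypothetical, and the predictions passed in are assumed to be already calibrated and summed over each period.

```python
from math import exp

def spf_rural_two_lane(aadt, length_mi):
    """Base-condition crashes per year (HSM rural two-lane form, assumed)."""
    return aadt * length_mi * 365e-6 * exp(-0.312)

def eb_cmf(sites):
    """Each site is a dict with length_mi, pred_before, pred_after
    (calibrated predicted crashes summed over each period),
    obs_before and obs_after. Returns (CMF, standard error)."""
    exp_after_sum = obs_after_sum = var_sum = 0.0
    for s in sites:
        k = 0.236 / s["length_mi"]                  # overdispersion
        w = 1.0 / (1.0 + k * s["pred_before"])      # EB weight
        exp_before = w * s["pred_before"] + (1 - w) * s["obs_before"]
        r = s["pred_after"] / s["pred_before"]      # period adjustment
        exp_after_sum += exp_before * r
        var_sum += r ** 2 * exp_before * (1 - w)
        obs_after_sum += s["obs_after"]
    ratio = obs_after_sum / exp_after_sum           # unadjusted CMF
    adj = 1.0 + var_sum / exp_after_sum ** 2        # bias adjustment
    cmf = ratio / adj
    se = (ratio ** 2 * (1 / obs_after_sum + var_sum / exp_after_sum ** 2)
          / adj ** 2) ** 0.5
    return cmf, se
```

A CMF below 1 from this calculation indicates fewer observed after-period crashes than would have been expected without the treatment; with multiple simultaneous treatments, the value is the combined CMF.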

Estimating CMFs for Individual Treatments
Three commonly used methods were identified to estimate the combined effect of treatments from their individual CMFs, as shown in Equations (7), (8) and (9).
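Because the equations are not reproduced here, the following is only a sketch of the simplest case. If the combined CMF is assumed to be the product of independent individual CMFs, an unknown individual CMF can be recovered by dividing the combined CMF by the CMFs of the other treatments. The function name and the numbers are hypothetical, and the other methods would need their own inversions.

```python
from math import prod

def isolate_cmf(combined_cmf, other_cmfs):
    """Individual CMF implied by a combined CMF, assuming
    combined = individual * product(other_cmfs)."""
    return combined_cmf / prod(other_cmfs)

# Hypothetical: combined CMF 0.72, other treatment's CMF 0.90.
print(isolate_cmf(0.72, [0.90]))   # about 0.80
```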

Results and Discussion
To understand the main characteristics of the selected road segments, descriptive statistics were calculated and are shown in Table 1. The roads which had safety edge treatments with lane widening are low-volume roads. The roads which had 2 ft paved shoulders with asphalt resurfacing and/or shoulder rumble strips have higher traffic volumes than the roads with safety edge treatments. Segment lengths also span a wider range on the roads which had 2 ft paved shoulders with asphalt resurfacing and/or shoulder rumble strips than on the roads with safety edge treatments.

Regression Method to Estimate Individual CMFs
Separate models were developed using SAS 9.4 for each combined treatment as mentioned in the methodology [28]. Results of the two models are shown in Table 2 with their standard errors and p-values.
When developing the models, the variance inflation factor (VIF) was checked in both models, and the variables selected for the models had VIF values of less than 5. Therefore, it was concluded that there are no multicollinearity effects between the explanatory variables.
Based on the results presented in Table 2, it can be seen that the dispersion For an example, consider Figure 1(b). However, it was seen that if the sample size is large the models give smaller p-values for both all crashes and fatal and injury crash models as in Model 2.
Estimated regression parameters were used to develop CMFs, and CMF = exp(β) was used to back-transform the estimated regression parameters.
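The back-transformation can be illustrated with a short sketch. Only CMF = exp(β) comes from the text; the delta-method standard error, the 1.96 multiplier for an approximate 95% interval, the function name, and the example coefficient are assumptions added for illustration.

```python
from math import exp

def cmf_from_coefficient(beta, se, z=1.96):
    """Back-transform a treatment coefficient from a log-link model."""
    cmf = exp(beta)
    return {
        "cmf": cmf,
        "se": cmf * se,                 # delta-method approximation
        "ci": (exp(beta - z * se), exp(beta + z * se)),
        "pct_change": (cmf - 1.0) * 100.0,
    }

# Hypothetical coefficient and standard error.
result = cmf_from_coefficient(beta=-0.20, se=0.08)
print(result)
```

A negative coefficient gives a CMF below 1, that is, a crash reduction; an interval that includes 1 corresponds to a treatment effect that is not statistically significant at the chosen level.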

CMFs Estimated Using Before-and-After EB Method
CMFs were estimated using the before-and-after EB method to check whether they are similar to the CMFs estimated using the regression method. The method given in the HSM was used to develop models as shown in the methodology. Calibration factors were calculated using the reference sites. Calibration factors for the before and after periods were found to be 1.37 and 1.29, respectively. Also, the proportion of lane-departure crashes to all crash types was 0.50, and the fatal and injury crash proportions in the before and after periods were 0.24 and 0.22, respectively. The estimated CMFs for the considered multiple treatments are shown in Table 4.
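As a rough sketch of how such factors enter the prediction, the code below computes a calibration factor as the ratio of total observed to total predicted crashes over reference sites and then scales a base SPF prediction to one crash type. The ratio form and the multiplicative use of the proportion are assumptions in the spirit of the HSM procedure; the reference-site counts and the base prediction are hypothetical, while 1.37 and 0.50 are the values reported above.

```python
def calibration_factor(observed, predicted):
    """Ratio of total observed to total SPF-predicted crashes
    over the reference sites."""
    return sum(observed) / sum(predicted)

def predicted_crashes(n_spf, calibration, proportion=1.0):
    """Calibrated prediction, optionally scaled to one crash type."""
    return n_spf * calibration * proportion

# Hypothetical reference-site totals.
c = calibration_factor(observed=[5, 9, 7], predicted=[4.0, 6.0, 5.0])
print(c)                                     # 21 / 15 = 1.4

# Reported before-period factor (1.37) and lane-departure share (0.50)
# applied to a hypothetical base prediction of 2.0 crashes per year.
print(predicted_crashes(2.0, 1.37, 0.50))    # 1.37
```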
Based on the results it can be seen that the CMFs estimated for safety edge treatments with lane widening are not significant except for all crashes.
Table 4 shows the estimated CMFs for the combined treatments. Finally, the individual CMFs were calculated using Equation

Estimated Individual CMFs
Since the main focus was to identify the safety effectiveness of safety edge treatments ... Table 3. Table 5 shows the calculated individual CMFs for safety edge treatment and adding 2 ft paved shoulders on the considered road segments.
It is seen that the individual CMFs estimated for all crashes from the regression method, shown in Table 3(a), are similar to the CMFs estimated for safety edge treatments on the road segments with 1 ft lane widening using method 3 shown in

Conclusions
Note: a CMFs were estimated using before-and-after EB method. b CMFs were estimated using regression parameters for increasing lane width in Table 2. c CMFs were estimated using the methods shown in Equation (7), (8)
Developing CMFs using the before-and-after EB method is one of the widely used practices among safety engineers to identify the safety effectiveness of a treatment or of multiple treatments. However, additional methods are required to determine the individual safety effectiveness of the combined treatments, and some of the widely used methods are provided in Equations (7), (8) and (9). It is unclear which method should be used for a given geographic region. Therefore, this study employed an alternative method based on regression models to estimate individual CMFs where multiple treatments have been implemented at the same time. This method has many advantages and some limitations which should be addressed in future research.
One of the advantages is that if the considered treatment was implemented together with one or more other treatments, this approach can be used to identify the individual safety effectiveness of each treatment. Hence, decisions can be made on whether to implement these treatments individually or collectively. Even though there are methods to account for multiple treatments, as shown in Equations (7), (8) and (9), those methods require the CMFs of the other treatments, which might not be available for the given region, facility type, or crash types considered. In such cases, the regression method can be applied directly, since it does not require CMFs for the other treatments implemented at the same time as the treatment of interest. Even when CMFs are available for the other treatments, they represent the average safety effectiveness of the specific treatment on similar road segments, which is not necessarily the same as on the considered road segments. The regression method, in contrast, estimates the safety effectiveness of the other treatments specific to the considered road segments. Since many explanatory variables are considered when developing the regression models, the models also play the role of the SPFs in the EB method and provide accurate crash predictions. Since the models are developed using the same road segments with their before and after characteristics assigned, the effect of confounding variables which are not included in crash frequency models, such as drivers' culture, the demographic distribution of the drivers, and land use patterns, can be minimized.
Even though there are many advantages of using this regression approach to estimate individual CMFs, there are some limitations which need to be addressed in future research. Since this approach requires regression modeling, a sufficient sample size is necessary. If the sample size is small, the developed regression model will have larger p-values for important explanatory variables such as the treatment, as in Model 1. If the sample size is relatively large, p-values will be smaller; hence the important variables become significant at higher confidence levels, as in Model 2. Furthermore, if the crash counts on the considered road segments are distributed over a narrow range, the models developed from such samples tend to give larger p-values, as in the fatal and injury crash models. However, in both Model 1 and Model 2, the standard error of the treatment effect, from which the CMF is derived, is larger than in the respective EB method. Therefore, it is necessary to find the optimum sample size at which such models give significant p-values with lower standard errors. Finally, this method is not useful if the treatments were not implemented at the same time.