Accurate classification and prediction of future traffic conditions are essential for developing effective strategies for congestion mitigation on highway systems. Speed distribution is one of the traffic stream parameters that has been used to quantify traffic conditions. Previous studies have shown that a multi-modal probability distribution of speeds gives excellent results when simultaneously evaluating congested and free-flow traffic conditions. However, most of these previous analytical studies do not incorporate influencing factors when characterizing these conditions. This study evaluates the impact of traffic occupancy on the multi-state speed distribution using the Bayesian Dirichlet Process Mixtures of Generalized Linear Models (DPM-GLM). Further, the study uses the Bayesian change-point detection (BCD) technique to estimate the speed cut-point values that separate the traffic states into homogeneous groups. The study used one year of archived traffic data (2015) collected on Florida’s Interstate 295 freeway corridor. Information criteria results revealed three traffic states, which were identified as free-flow, transitional flow condition (congestion onset/offset), and the congested condition. The findings of the DPM-GLM indicated that, in all estimated states, traffic speed decreases as traffic occupancy increases. Comparison of the influence of traffic occupancy between traffic states showed that traffic occupancy has more impact on the free-flow and congested states than on the transitional flow condition. With respect to estimating the threshold speed value, the results of the BCD model revealed promising findings in characterizing levels of traffic congestion.

Speed is one of the important parameters in traffic flow analysis. Hence, understanding its characteristics is essential in the application of intelligent transport systems and for measuring the consistency of the traffic performance of a highway system. Furthermore, the speed distribution is useful in simulation and theoretical derivations regarding different traffic performance measures such as speed reliability and variability. The accurate estimation and prediction of speed are essential for traffic operators, planners, and traveler information systems [

Several factors influence the distribution of the traffic speed on freeways. These factors can be grouped into time-variant and time-invariant factors. The time-invariant factors include road geometric characteristics (e.g., posted speed limit, lane width, pavement condition, number of lanes, etc.) while the time-variant factors include traffic conditions (i.e., traffic flow and density), vehicle mix, incidents, and driving characteristics [

The main objective of this paper is to provide a quantitative analysis of traffic congestion using the mixture characteristics of the traffic speed distribution. In the modeling process, each traffic speed record is assumed to come from a hidden traffic state, which is linearly related to traffic occupancy. The corresponding impact of traffic occupancy on the expected travel speed in each state is therefore identified. More specifically, the study uses the Bayesian Dirichlet Process Mixtures of Generalized Linear Models (DPM-GLM) to cluster these states. The Dirichlet process mixture (DPM) classifies the hidden state by categorizing the GLM of each state. In addition, the study uses the Bayesian change-point detection (BCD) model to estimate the possible threshold speed value for each of the states. The threshold value is assumed to separate traffic states into homogeneous groups; thus, this procedure facilitates classification of the traffic condition. The BCD model is estimated using a Bayesian approach, which gives the posterior distribution of the threshold values as well as the uncertainty of the estimates. To check the consistency of the cut-points estimated by this approach, the classification error method, which minimizes the false positive rate in each state, is also used to estimate optimal thresholds. The posterior distributions of the model parameters for both DPM-GLM and BCD are fitted by the Metropolis-Hastings MCMC sampler. The study uses archived traffic data collected throughout 2015 on an Interstate 295 corridor located in Jacksonville, Florida.

Most of the early analytical studies modeling the characteristics of speed assume that speed follows a single-mode (unimodal) distribution [

Several studies have applied the multi-modal distribution to characterize different traffic conditions. For instance, the study in [

It is worth mentioning the studies that are most closely related to ours. The study in [

In all of the closely related literature mentioned above, the expectation-maximization (EM) approach was used to estimate the parameters. The EM method is susceptible to converging to local optima and to over-fitting. In this study, the Markov Chain Monte Carlo (MCMC) approach, which treats the model parameters as distributions, is used. Apart from mitigating the over-fitting problem, the posterior distribution of the parameters estimated by this method can be updated easily when new data become available. In addition, it incorporates prior knowledge regarding the speed distribution [

In the commonly used finite mixture models, the expected mean values of the observations, such as speed, in each mixture component are constants. In this study, the conventional method is extended such that the mean depends on the explanatory variables,

The GLM parameters are linear predictors given by:

| Parameter/variable | Definition |
|---|---|
| DP (α, H) | random probability density function coming from the Dirichlet distribution with parameters α and H |
| H | the base distribution |
| α | the concentration parameter |
| G | the random distribution drawn from the Dirichlet process DP (α, H) |
| | the parameter of the G distribution, which follows a stick-breaking process (SBP) |
| | the regression parameters |
| | the vector of predictors |
| | the variance in the model |
| N | the Gaussian distribution |
| S_{i} | the speed observation |
| | the mixing proportion |
| | a Dirac delta function concentrated at |
| k | the number of mixture components |
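For orientation, a Dirichlet process mixture of Gaussian linear models built from the ingredients in the table can be written in the following standard hierarchical form. This is only a sketch: the symbols $\beta_k$, $x_i$, $\sigma_k^2$, $\pi_k$ and $\delta_{\theta_k}$ are conventional choices assumed here, and they are not necessarily the notation used by the authors.

```latex
\begin{aligned}
G &\sim \mathrm{DP}(\alpha, H), \qquad
G = \sum_{k=1}^{\infty} \pi_k \, \delta_{\theta_k}, \qquad
\theta_k = (\beta_k, \sigma_k^2) \sim H, \\
S_i \mid x_i, G &\sim \sum_{k=1}^{\infty} \pi_k \,
N\!\left(x_i^{\top} \beta_k,\; \sigma_k^2\right).
\end{aligned}
```

Under this form, each mixture component $k$ has its own intercept and occupancy slope, so the expected speed in a state changes with occupancy rather than being a constant.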

The above DPM-GLM is implemented using the stick-breaking process (SBP). The SBP involves breaking a unit length stick into disjoint pieces repeatedly [
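The stick-breaking construction can be illustrated with a short simulation. The sketch below assumes the usual formulation, in which each break fraction is drawn from a Beta(1, α) distribution, and truncates the process at a fixed number of components. The function name, the truncation level, and the value of α are illustrative choices, not values from the study.

```python
import random


def stick_breaking_weights(alpha, num_components, rng):
    """Return truncated stick-breaking weights (mixing proportions)."""
    weights = []
    remaining = 1.0  # length of the stick that has not been assigned yet
    for _ in range(num_components - 1):
        fraction = rng.betavariate(1.0, alpha)  # share broken off the remainder
        weights.append(fraction * remaining)
        remaining *= 1.0 - fraction
    weights.append(remaining)  # last piece takes what is left, so the sum is 1
    return weights


rng = random.Random(0)
weights = stick_breaking_weights(alpha=1.0, num_components=10, rng=rng)
print([round(w, 3) for w in weights])
print("sum of weights:", sum(weights))
```

A small α tends to put most of the stick into the first few pieces (few effective states), while a large α spreads the mass over many components.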

Estimating the posterior distribution of the hierarchical Bayesian model is analytically challenging, as it involves a high-dimensional integral in the marginal likelihood [
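To show how sampling sidesteps this integral, the following is a minimal random-walk Metropolis-Hastings sketch for a far simpler problem than the DPM-GLM: the posterior of a single mean speed under a Normal likelihood with known standard deviation and a flat prior. The data values, the proposal step size, and the burn-in length are made up for illustration.

```python
import math
import random


def log_posterior(mu, data, sigma):
    """Log posterior of mu up to a constant (Normal likelihood, flat prior)."""
    return -sum((y - mu) ** 2 for y in data) / (2.0 * sigma ** 2)


def metropolis_hastings(data, sigma, start, step, n_iter, rng):
    """Random-walk Metropolis-Hastings sampler for the mean mu."""
    samples = []
    current = start
    current_lp = log_posterior(current, data, sigma)
    for _ in range(n_iter):
        proposal = current + rng.gauss(0.0, step)  # symmetric proposal
        proposal_lp = log_posterior(proposal, data, sigma)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, proposal_lp - current_lp)):
            current, current_lp = proposal, proposal_lp
        samples.append(current)
    return samples


speeds = [64.0, 66.0, 65.0, 67.0, 63.0]  # hypothetical speeds in mph
rng = random.Random(0)
draws = metropolis_hastings(speeds, sigma=5.0, start=60.0, step=2.0,
                            n_iter=6000, rng=rng)
kept = draws[1000:]  # discard burn-in
posterior_mean = sum(kept) / len(kept)
print("posterior mean of mu:", round(posterior_mean, 2))
```

Only ratios of posterior densities are needed, so the normalizing constant (the marginal likelihood) never has to be computed.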

In this study, three information criteria, namely the Bayesian information criterion (BIC), the Akaike information criterion (AIC), and the Deviance Information Criterion (DIC), were used to select the optimal number of mixture states. All information criteria balance model complexity (i.e., the number of parameters required) against predictive accuracy to identify the most appropriate model. The model with the smallest score is selected as the best model among a set of candidates. The BIC is defined as:

$$\mathrm{BIC} = k \ln(n) - 2 \ln(L)$$

The AIC is given as:

$$\mathrm{AIC} = 2k - 2 \ln(L)$$

where k is the number of estimated parameters, L is the maximized likelihood of the model, and n is the number of observations.
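As a small worked example of this selection rule, the sketch below computes both scores from the standard definitions BIC = k ln(n) − 2 ln(L) and AIC = 2k − 2 ln(L) and picks the candidate with the smallest value. The log-likelihoods, parameter counts, and sample size are invented for illustration and are not results from the study.

```python
import math


def aic(log_likelihood, k):
    """Akaike information criterion from the maximized log-likelihood."""
    return 2 * k - 2 * log_likelihood


def bic(log_likelihood, k, n):
    """Bayesian information criterion from the maximized log-likelihood."""
    return k * math.log(n) - 2 * log_likelihood


# Hypothetical candidates: number of states -> (log-likelihood, parameters).
candidates = {2: (-1250.0, 8), 3: (-1190.0, 12), 4: (-1188.0, 16)}
n_obs = 500  # hypothetical number of observations

aic_scores = {m: aic(ll, k) for m, (ll, k) in candidates.items()}
bic_scores = {m: bic(ll, k, n_obs) for m, (ll, k) in candidates.items()}
best_by_aic = min(aic_scores, key=aic_scores.get)
best_by_bic = min(bic_scores, key=bic_scores.get)
print("best by AIC:", best_by_aic, "best by BIC:", best_by_bic)
```

In this made-up case the fourth state improves the likelihood only slightly, so the complexity penalty outweighs the gain and both criteria prefer three states.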

In Bayesian statistics, the DIC is commonly used for the goodness of fit test [
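A commonly used definition is DIC = D̄ + p_D, where D̄ is the posterior mean of the deviance D(θ) = −2 ln p(y | θ) and p_D = D̄ − D(θ̄) is the effective number of parameters. The sketch below applies that definition to a toy Normal model with known standard deviation; the data and the posterior draws are placeholders rather than output from the study's models.

```python
import math


def deviance(mu, data, sigma):
    """Deviance -2 * log-likelihood of a Normal model with known sigma."""
    log_lik = sum(
        -0.5 * math.log(2.0 * math.pi * sigma ** 2)
        - (y - mu) ** 2 / (2.0 * sigma ** 2)
        for y in data
    )
    return -2.0 * log_lik


def dic(mu_samples, data, sigma):
    """Return (DIC, p_D) computed from posterior draws of mu."""
    mean_deviance = sum(deviance(m, data, sigma) for m in mu_samples) / len(mu_samples)
    mu_bar = sum(mu_samples) / len(mu_samples)  # posterior mean of mu
    p_d = mean_deviance - deviance(mu_bar, data, sigma)
    return mean_deviance + p_d, p_d


speeds = [64.0, 66.0, 65.0, 67.0, 63.0]  # hypothetical speeds in mph
mu_draws = [64.0, 65.0, 66.0]            # placeholder posterior draws
dic_value, p_d = dic(mu_draws, speeds, sigma=5.0)
print("DIC:", round(dic_value, 3), "p_D:", round(p_d, 3))
```

In practice the draws would come from the MCMC sampler, and the candidate model with the smallest DIC would be preferred.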

Similar to a clustering task, Bayesian change-point detection (BCD) identifies a threshold value/location that divides data into distinct homogeneous groups. The process of detecting a change-point is well established in time series problems, where the main purpose is to identify a shift in trend such that the patterns before and after a threshold value are different [

In computing the change-point using the Bayesian approach, an assumption about the distribution of the data is needed. Herein, we analyzed the problem considering a linear regression with a normality assumption. The following model indicates an example of the change-point detection problem with two switch points [

where switch point refers to the speed where the pattern changes, a is the speed less than

The above model can be easily modified to infer the change-point in the explanatory variables instead of the response variable. Prior to estimating the model parameters, the number of change-points was inferred from the optimal number of clusters established through the information criterion methods (see model selection section). Afterward, the above model parameters are estimated via the Bayesian approach. In particular, we implemented in PyMC3 [

We also considered the classification error method that minimizes the false positive rate in each component to estimate the threshold values [
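One way to picture a classification-error threshold is sketched below for two Gaussian speed components. It is an interpretation, not the authors' exact procedure: the cut-point is taken as the speed that minimizes the total probability of assigning an observation to the wrong component. The component weights, means, and standard deviations are hypothetical.

```python
import math


def normal_cdf(x, mean, sd):
    """Cumulative distribution function of a Normal(mean, sd)."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def misclassification_rate(threshold, low, high):
    """Total error when speeds below the threshold are assigned to `low`.

    `low` and `high` are (weight, mean, sd) tuples for the slower and
    faster component, respectively.
    """
    w_lo, m_lo, s_lo = low
    w_hi, m_hi, s_hi = high
    low_above = w_lo * (1.0 - normal_cdf(threshold, m_lo, s_lo))
    high_below = w_hi * normal_cdf(threshold, m_hi, s_hi)
    return low_above + high_below


def best_threshold(low, high, step=0.1):
    """Grid search between the two component means."""
    n_steps = int(round((high[1] - low[1]) / step))
    grid = [low[1] + i * step for i in range(n_steps + 1)]
    return min(grid, key=lambda t: misclassification_rate(t, low, high))


congested = (0.5, 40.0, 5.0)  # hypothetical (weight, mean mph, sd mph)
free_flow = (0.5, 60.0, 5.0)
threshold = best_threshold(congested, free_flow)
print("cut-point speed:", round(threshold, 1))
```

With equal weights and equal spreads the cut-point falls midway between the two means; unequal weights or spreads shift it toward the smaller or tighter component.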

The study used traffic data from a 4.8-mile corridor of the Interstate 295 freeway (

The archived traffic data for analysis were provided by the Regional Integrated Transportation Information System (RITIS). The dataset is composed of spot speed and traffic occupancy collected from microwave vehicle detectors (MVD) and aggregated at 15-minute intervals. The data cover the period from January 1, 2015 through December 31, 2015. Weekends, holidays, and days on which incidents (crashes, work zones, etc.) happened were omitted from the dataset to reduce variability. The average speed from the MVD was calculated and considered to represent the link travel speed.

The corridor travel speed was estimated using Equation (9).

where n represents the number of detectors on a link (five (5) detectors are used), and

the morning peak hour occurred between 7 a.m. and 8 a.m. while the evening peak hour occurred between 5 p.m. and 6 p.m. Closer examination of

Mixture modeling requires a stationary stochastic process. However, speed data are noisy in nature and are usually nonstationary over a long analysis window. A common remedy is to divide the speed data into time intervals that are approximately stationary and then apply mixture models to account for the heterogeneity in the speed data [

The model selection results based on AIC and BIC criteria are shown in

| Id | Mixture components | DIC, morning peak hours (6 a.m. - 9 a.m.) | DIC, evening peak hours (3 p.m. - 7 p.m.) |
|---|---|---|---|
| 1 | 2 | 24,588 | 23,711 |
| 2 | 3 | 8,592 | 22,992 |
| 3 | 4 | 24,330 | 27,834 |

Note: DIC is the Deviance Information Criterion.

Nonetheless, the distribution characteristics of congestion onset and congestion offset are similar, and the two are therefore considered as one state in our study, i.e., the transitional flow condition.

| Peak period | Id | Traffic state | Parameter | Mean | Std. | MC error | 95% BCI |
|---|---|---|---|---|---|---|---|
| Morning peak hours (6 a.m. - 9 a.m.) | 1 | Free-flow condition | Intercept | 79.01 | 2.73 | 0.2673 | 78.72, 79.35 |
| Morning peak hours (6 a.m. - 9 a.m.) | 1 | Free-flow condition | Occupancy | −0.16 | 0.00 | 0.00 | −0.16, −0.16 |
| Morning peak hours (6 a.m. - 9 a.m.) | 2 | Congestion onset/offset | Intercept | 70.86 | 0.17 | 0.008 | 70.10, 71.54 |
| Morning peak hours (6 a.m. - 9 a.m.) | 2 | Congestion onset/offset | Occupancy | −0.03 | 0.00 | 0.00 | −0.04, −0.03 |
| Morning peak hours (6 a.m. - 9 a.m.) | 3 | Congested condition | Intercept | 66.32 | 0.69 | 0.07 | 64.91, 67.49 |
| Morning peak hours (6 a.m. - 9 a.m.) | 3 | Congested condition | Occupancy | −0.15 | 0.00 | 0.00 | −0.16, −0.15 |
| Evening peak hours (3 p.m. - 7 p.m.) | 1 | Free-flow condition | Intercept | 73.60 | 0.34 | 0.030 | 72.92, 74.19 |
| Evening peak hours (3 p.m. - 7 p.m.) | 1 | Free-flow condition | Occupancy | −0.13 | 0.003 | 0.0002 | −0.13, −0.12 |
| Evening peak hours (3 p.m. - 7 p.m.) | 2 | Congestion onset/offset | Intercept | 70.99 | 0.101 | 0.007 | 70.80, 71.19 |
| Evening peak hours (3 p.m. - 7 p.m.) | 2 | Congestion onset/offset | Occupancy | −0.029 | 0.001 | 0.0001 | −0.032, −0.026 |
| Evening peak hours (3 p.m. - 7 p.m.) | 3 | Congested condition | Intercept | 46.63 | 0.540 | 0.05 | 45.59, 47.65 |
| Evening peak hours (3 p.m. - 7 p.m.) | 3 | Congested condition | Occupancy | −0.036 | 0.010 | 0.001 | −0.059, −0.015 |

Note: BCI is the Bayesian credible interval; Std. is the standard deviation of the posterior distribution; MC error is the Monte Carlo error.
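To make the coefficients concrete, the following sketch plugs the posterior means for the morning peak hours into the linear predictor, expected speed = intercept + slope × occupancy. The occupancy value of 10 is a hypothetical input, and treating occupancy as a percentage and speed as mph is an assumption about the units.

```python
# Posterior mean (intercept, occupancy slope) for the morning peak hours,
# copied from the table of DPM-GLM estimates.
MORNING_COEFFICIENTS = {
    "free-flow": (79.01, -0.16),
    "congestion onset/offset": (70.86, -0.03),
    "congested": (66.32, -0.15),
}


def expected_speed(state, occupancy):
    """Expected speed in a given state at the given occupancy."""
    intercept, slope = MORNING_COEFFICIENTS[state]
    return intercept + slope * occupancy


occupancy = 10.0  # hypothetical occupancy value
for state in MORNING_COEFFICIENTS:
    print(state, round(expected_speed(state, occupancy), 2))
```

The occupancy slopes for the free-flow and congested states are roughly five times larger in magnitude than the slope for the onset/offset state, which is the contrast reported for the morning peak hours.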

offset condition. A similar pattern was seen during the evening peak hours.

The mixture-component results show that the morning peak hours had the highest proportion of free-flow speed data (60%), followed by congestion onset/offset speed (27%), with congested speed the least (13%). In contrast, the evening peak hours had the highest percentage of data in congestion onset/offset (nearly 89%), with the least data in the congested state (0.43%). Comparatively, the morning peak hours experience more congestion than the evening peak hours (

To assess the effectiveness of the Bayesian change-point detection (BCD) approach, the model was first tested on simulated data before being applied to the traffic dataset. The results were reasonable, given that the estimated parameters were close to the actual parameters. The model was then applied to the traffic data to detect the speed threshold values.
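A simulated-data check of this kind can be illustrated with a deliberately simplified stand-in for the study's model: a single change point in the mean of a sequence, Normal noise with known standard deviation, flat priors on the two segment means, and a uniform prior on the change-point location. Under those assumptions the segment means integrate out analytically, so the posterior over the location can be computed exactly without MCMC. All numbers below are simulated.

```python
import math
import random


def change_point_posterior(values, sigma):
    """Posterior over tau, the number of points before the mean shifts."""
    n = len(values)
    log_post = []
    for tau in range(1, n):
        score = 0.0
        for segment in (values[:tau], values[tau:]):
            m = len(segment)
            mean = sum(segment) / m
            ss = sum((v - mean) ** 2 for v in segment)
            # Log marginal likelihood of the segment with its mean integrated
            # out under a flat prior (terms constant in tau are dropped).
            score += -ss / (2.0 * sigma ** 2) - 0.5 * math.log(m)
        log_post.append(score)
    peak = max(log_post)
    weights = [math.exp(s - peak) for s in log_post]
    total = sum(weights)
    return [w / total for w in weights]  # entry i is P(tau = i + 1 | data)


rng = random.Random(1)
true_tau = 60
simulated = ([rng.gauss(45.0, 3.0) for _ in range(true_tau)]
             + [rng.gauss(68.0, 3.0) for _ in range(40)])
posterior = change_point_posterior(simulated, sigma=3.0)
estimated_tau = posterior.index(max(posterior)) + 1
print("true tau:", true_tau, "estimated tau:", estimated_tau)
```

Because the true location is known, the estimate can be compared directly with it, which is the kind of validation described above before the model is applied to field data.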

Using the classification error method, the morning peak hours yielded a cut-point speed of 47 mph between the congested and congestion onset/offset states, while the free-flow condition was estimated to have a speed greater than 64 mph (

Comparing the estimates from the classification error method and the BCD method, the threshold values are close to one another, with only small differences. Moreover, these findings are consistent with findings reported in the literature. For instance, research conducted by [

The main objective of this paper was to provide a quantitative analysis of traffic congestion using the mixture characteristics of the traffic speed distribution. In the modeling process, each speed record was assumed to come from a hidden traffic state, which is linearly related to traffic occupancy. The study used the Bayesian Dirichlet Process Mixtures of Generalized Linear Models (DPM-GLM) to achieve this task. Furthermore, the study used the Bayesian change-point detection (BCD) approach to estimate the possible threshold speed value for the established states, which separates the states into homogeneous groups. In addition, the classification error method, which minimizes the error in each mixture component, was used for comparison with the BCD results.

To accomplish the study, data collected from the Interstate 295 freeway corridor in Jacksonville, Florida were used. The archived traffic data used in the analysis were collected in the corridor using microwave vehicle detectors (MVD) and were aggregated at 15-minute intervals. The data cover the period from January 1, 2015 through December 31, 2015.

According to the information criteria analysis, three traffic states were identified as the optimal number of mixture states, providing the best trade-off between model complexity and predictive accuracy. These states correspond to free-flow, transitional flow condition (congestion onset/offset), and the congested condition. The results of the mixture components indicated that the proportion of congested speed is greater for the morning peak hours (13%) than for the evening peak hours. Furthermore, congestion onset/offset speed and free-flow speed had the highest proportion among the components during the evening peak hours and the morning peak hours, respectively. The change-point detection approach demonstrated that it can be used to estimate the cut-point speed in order to classify different traffic states. The model results indicate cut-point speeds of 47 mph and 48 mph between the congested and congestion onset/offset states during the morning peak hours and evening peak hours, respectively. The free-flow state is estimated at speeds greater than 64 mph and 66 mph for the morning peak hours and evening peak hours, respectively.

The proposed approach can be used to accurately identify clusters of low-speed regimes and thereby better detect congestion. The approach can be used both in a retrospective analysis of historical data and in a prospective evaluation to identify congestion in real time. Practically, dissemination of this information to the public is very important so that regular and non-regular commuters can make well-informed decisions in order to avoid delays and congestion.

It is important to note that the data used in this study were aggregated at 15-minute intervals. It is not clear whether similar conclusions can be generalized to other data aggregation intervals (such as 5-minute or 1-hour intervals). Future studies may consider using different time intervals in the analysis. Furthermore, this study focused on evaluating the impact of traffic occupancy in characterizing traffic congestion. However, other factors also influence traffic conditions; therefore, the impact of other time-varying factors such as incidents, vehicle mix, weather, and driving characteristics should be considered in future studies. It is also recommended that this methodology be extended to longer corridors and large-scale road networks.

Kidando, E., Moses, R., Ozguven, E.E. and Sando, T. (2017) Evaluating Traffic Congestion Using the Traffic Occupancy and Speed Distribution Relationship: An Application of Bayesian Dirichlet Process Mixtures of Generalized Linear Model. Journal of Transportation Technologies, 7, 318-335. https://doi.org/10.4236/jtts.2017.73021