Configuration for Predicting Travel-Time Using Wavelet Packets and Support Vector Regression

Travel-time prediction has gained significance over the years especially in urban areas due to increasing traffic congestion. In this paper, the basic building blocks of the travel-time prediction models are discussed, with a small review of the previous work. A model for the travel-time prediction on freeways based on wavelet packet decomposition and support vector regression (WDSVR) is proposed, which used the multi-resolution and equivalent frequency distribution ability of the wavelet transform to train the support vector machines. The results are compared against the classical support vector regression (SVR) method. Our results indicated that the wavelet reconstructed coefficient when used as an input to the support vector machine for regression performed better (with selected wavelets only), when compared with the support vector regression model (without wavelet decomposition) with a prediction horizon of 45 minutes and more. The data used in this paper was taken from the California Department of Transportation (Caltrans) of District 12 with a detector density of 2.73, experiencing daily peak hours except most weekends. The data was stored for a period of 214 days accumulated over 5-minute intervals over a distance of 9.13 miles. The results indicated MAPE ranging from 12.35% to 14.75% against the classical SVR method with MAPE ranging from 12.57% to 15.84% with a prediction horizon of 45 minutes to 1 hour. The basic criteria for selection of wavelet basis for preprocessing the inputs of support vector machines are also explored to filter the set of wavelet families for the WDSVR model. Finally, a configuration of travel-time prediction on freeways is presented with interchangeable prediction methods.


Introduction
Accurate travel-time forecast information has become a fundamental component of all ATIS (Advanced Traffic Information Systems). Currently, drivers demand an accurate travel-time calculator that can forecast their commute time in advance. This forecast is even more significant in the morning and evening hours, when commuters face jammed freeways and want to avoid the peak-hour congestion. Drivers prefer precise information about future traffic conditions to manage their route. Presently, most of the State Department traffic websites provide the current traffic conditions; some sites even calculate a forecast of the travel time based on historical and/or current data by employing a suitable algorithm [1,2].
The travel-time is dependent on multiple factors that are related through a complex-dependent relationship with one another. Such factors include weather conditions, driver behavior, and time of the day. This complex dependence makes the traffic data both non-linear and non-stationary. Consequently, accurate prediction of travel time becomes a challenging task.
Travel-time prediction methods can be classified from different perspectives, as shown in Figure 1. While a brief overview of all types is given in Section 2, the focus of this paper is on improving a short-term data-driven prediction method.
Table 1 shows a brief overview of the prior art in this area. The prediction horizons in Table 1 range from 5 minutes to 60 minutes. However, lower forecast horizons are not very useful for commuters in a real-world scenario, as there are delays involved in every module of the travel-time prediction process; the process diagram of the prediction process is shown in Figure 2.
Artificial Intelligence methods were extensively used in travel-time prediction [7-10]. Most of this work was concentrated on short-term travel-time prediction (prediction horizon less than 60 minutes), mainly using the artificial neural network (ANN) technique. On the other hand, machine learning methods such as support vector regression (SVR), which have shown superior performance when compared with other traditional methods for prediction of non-linear data, have not been applied aggressively in the area of travel-time prediction.
Support vector machines, since their inception by Vapnik [11,12], were extensively used in classification and prediction problems. SVM uses a simple geometric interpretation and gives a sparse solution. The solution of SVM is also global and unique, as SVM employs the structural-risk-minimization principle. The support vector regression method [13] approaches the linear regression forecast by addressing it as a convex optimization problem (details in Section 4). Its performance in financial time series forecast [14], bioinformatics [15] and various other areas of research also makes it a viable method in intelligent transportation systems (ITS) applications. SVR was first applied as a forecasting tool in ITS by Wu [5], who predicted short-term travel time on the basis of past and current values. Recently, Wang [16] used a wavelet kernel support vector machine for regression to predict traffic flow in ITS applications.
In recent years, many researchers decomposed time series into more informative domains, such as the wavelet transform [17] and the S-transform [18], as an input to the SVR, which showed more accurate results than the non-decomposed method. This improved performance of SVR, along with the ability of SVR to predict non-linear data, formed the motivation of our research to explore the effectiveness of travel-time prediction using wavelet transformed travel-time values as an input to SVR.
The rest of the paper is organized as follows: the problem statement, along with some highlights of the past research, is given in Section 2. Wavelet theory and support vector regression are explained in Sections 3 and 4, respectively. In Section 5 the proposed model is explained. Then we show the results of our model in Section 6. Finally, the paper is concluded in Section 7, with a brief on the claims made and future research direction.

Problem Description
The travel-time prediction problem can be viewed from the perspective of the input data type, prediction methodology and prediction horizon, as shown in Figure 1. Irrespective of the class of travel-time prediction, the fundamental components of the process are similar, as shown in Figure 2. Below we explain each component with a review of the main published work done in each area.

Data Acquisition and Storage (ILD)
Formulation of an accurate predictive inference relies significantly on the quality of the traffic data. A typical speed plot constructed using a portion of the dataset we used is shown in Figure 3. The blue area represents congestion, while the red part shows the free flow speeds.
Inductive Loop Detector (ILD) data, owing to its abundance and known quality issues, has been used as input data in most travel-time prediction papers [6,19-25]. The scalability of the model also biased the choice of researchers towards ILD as a data source. Other forms of datasets include probe vehicle data, traffic camera feeds, satellite data, data obtained from microwave radar, license plate matching, and automated vehicle tag matching.
Before using ILD data as our data source, certain known issues required attention in the context of the site selection and data pre-processing phases. Spacing between consecutive loop detectors directly affects the quality of the data captured. The standard spacing requirement between consecutive loop detectors is not defined in the literature. However, [26] concluded that a detector spacing of 1 to 1.5 km is optimum for short-term forecasting of traffic parameters. In [27], it was shown that a detector spacing of 0.33 to 1 mile does not destabilize the travel-time estimation errors, while [28] concluded that a detector spacing of 0.5 miles is sufficient to represent traffic congestion with acceptable accuracy.
After data acquisition, preprocessing steps are performed on the data to ensure its validity. ILDs are prone to a number of errors [29]. These data errors are usually detected and corrected using imputation methods [29,30]. The authors of [29] gave a linear model based on historical data from neighboring detectors to detect faulty values, and imputed the missing or bad values through linear regression. The method proposed in [29] was adopted by Caltrans for processing of the loop detector data on California roadways.
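The regression-based imputation idea can be illustrated with a minimal sketch. It is not the procedure of [29]: it assumes a single neighboring detector, a plain least-squares line, and missing samples marked as `None`; the function names and the sample speeds are hypothetical, and fault detection is left out.

```python
def fit_line(x, y):
    """Ordinary least squares fit y ~ slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def impute_from_neighbor(target, neighbor):
    """Fill None entries of `target` using a line fitted on jointly observed samples."""
    pairs = [(n, t) for n, t in zip(neighbor, target)
             if n is not None and t is not None]
    slope, intercept = fit_line([p[0] for p in pairs], [p[1] for p in pairs])
    filled = []
    for t, n in zip(target, neighbor):
        if t is None and n is not None:
            filled.append(slope * n + intercept)  # regression estimate
        else:
            filled.append(t)  # observed value, or None if the neighbor is missing too
    return filled


# Hypothetical 5-minute speed samples (mph) from two adjacent detectors.
neighbor_speed = [60.0, 55.0, 40.0, 30.0, 50.0]
target_speed = [62.0, 57.0, None, 32.0, 52.0]
filled_speed = impute_from_neighbor(target_speed, neighbor_speed)
```

In practice several neighbors and time-of-day dependent coefficients would likely be needed; the single-neighbor line only shows the mechanics.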

Travel-Time Estimation
Like any prediction problem, the ground truth (estimated travel time) is essential to calculate and evaluate the results (predicted travel time). The travel-time estimation methods are divided into two broad categories: trajectory-based and flow-based.

Trajectory-Based Methods
The trajectory-based methods convert the time-mean speeds collected from detectors to space-mean speed. Different methods are proposed to calculate link travel-time from this speed. The two common methods are the mid-point method and the average speed method. Both of these methods assume a constant speed between links, which in reality is never the case, especially when traffic is in transition from free flow to congestion or vice versa. Hence, the algorithms assuming a constant speed lose their accuracy as congestion increases [31]. Van Lint and Van der Zijpp proposed an alternate approach, the "Piecewise Linear Speed" method [32], which solved the function of the travel-time based on the time-mean speed using an ordinary differential equation to calculate the trajectory of the vehicle in the section based on space-mean speed.

Flow-Based Methods
An alternate way of estimating travel-time is through flow-based models, which focus on capturing the dynamics of traffic using traffic-flow theory concepts and, through traffic data simulation, derive the travel-time of the segment. Accurate flow information is also required for a precise estimation; however, in most cases it is difficult to collect data from all on-ramps and off-ramps using the existing infrastructure, which becomes a bottleneck for flow-based estimation methods. These models are, however, more popular in research involving traffic flow simulation.

Travel-Time Prediction
The travel-time prediction approach is mainly classified w.r.t. the prediction horizon, modeling approach and type of input data, as shown in Figure 1. Further classification is also possible w.r.t. the road type (freeways, arterials), but since the scope of this work is confined to freeways, we do not discuss the arterial travel-time prediction problem.
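Returning briefly to the trajectory-based estimation methods, the constant-speed assumption of the mid-point method can be made concrete with a short sketch. It assumes that each detector speed applies from the mid-point towards the previous detector up to the mid-point towards the next one; the function name, detector positions and speeds are hypothetical, and the conversion from time-mean to space-mean speed is omitted.

```python
def midpoint_travel_time(positions_mi, speeds_mph, start_mi, end_mi):
    """Route travel time in minutes under the mid-point assumption.

    positions_mi: detector locations along the route, in increasing order (miles)
    speeds_mph:   one speed per detector (mph)
    """
    # Zone boundaries: route start, mid-points between detectors, route end.
    midpoints = [(a + b) / 2.0 for a, b in zip(positions_mi, positions_mi[1:])]
    bounds = [start_mi] + midpoints + [end_mi]
    # Each zone is traversed at the constant speed of its detector.
    hours = sum((bounds[i + 1] - bounds[i]) / v for i, v in enumerate(speeds_mph))
    return 60.0 * hours


# Three detectors on a hypothetical 6-mile route; the middle one is congested.
tt_minutes = midpoint_travel_time([1.0, 3.0, 5.0], [60.0, 30.0, 60.0], 0.0, 6.0)
```

Because the speed is frozen inside each zone, the estimate degrades when a vehicle crosses a zone while traffic is changing state, which is the weakness noted for constant-speed methods.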
The historical data of traffic parameters can represent a traffic profile, which could be implemented to predict future values in similar traffic conditions. This approach demands offline processing. The data is classified into different subtypes based on their characteristics. In [33] the data was sub-classified into the "type of day" for prediction of travel-time. This forecast method does not take into account the dynamics of traffic for travel-time prediction, which makes it less robust for short-term prediction. Consequently, it produces low accuracy results when the current traffic is not representative of its historical profile. The historical predictor is normally used for long-term prediction.
A hybrid approach of combining historical data with current data was used in [34], where real-time data was captured directly from the road side terminals, and using it with aggregated historical data showed improved results. [1] used principal component analysis and windowed nearest neighbor, while combining historical and instantaneous data.
Traffic data shares similarities when compared with historical data of the same day and time as the current data. Regression methods with coefficients varying with the time of the day were used by [1], [35] and [36] to predict travel-time. [6] also used linear regression with a stepwise variable selection method. Regression models involve the examination of historical data, thereby extracting parameters which represent traffic characteristics, and projecting them into the future to predict travel-time. Autoregressive integrated moving average (ARIMA) was introduced by [37] and [38] as an alternative to model the stochastic nature of traffic. [39] used an autoregression model to predict travel time. Non-linear time series with multifractal analysis was implemented in [40] and [41] for travel time prediction.
Kalman and Extended Kalman Filters used in [2,42] provide good performance in predicting travel-time for a one time-step ahead horizon, which is normally not more than 5 minutes, as the state model needs real observations to calculate each error term.
Artificial neural networks (ANN) were extensively used for marking non-linear boundaries. To address the problem of a time series forecast, a subtype of ANN called the recurrent neural network (RNN) was considered suitable [19,24,43]. RNN has an internal state, which keeps track of the temporal behavior between classes. Different architectures of the multilayer perceptron have been used to predict travel-time with an improved accuracy [7,8,10,19,20,23,24,43-45]. The support vector regression method was also investigated in [5,46].
On the other hand, traffic flow models work on the concept of correlating the theory of fluid dynamics with vehicular flow. From the perspective of traffic flow models, travel-time prediction is more of a boundary condition prediction problem, because the flow model is designed offline and predicts the time based on the values of demand and supply at on-ramps and off-ramps respectively. The model is run using a simulation scheme, which is based on the assumptions of the car-following, gap acceptance, and risk avoidance parameters. The simulation model predicts the aggregated parameters of simulated vehicles to display the predicted travel-time [47,48]. This makes traffic flow models very complex, and they require a high degree of expertise and long man-hours for design and maintenance.
Traffic flow models give us a better understanding of the traffic flow dynamics, but as far as their accuracy for travel time prediction is concerned, they demand a precise infrastructure of input detectors, whose location would be defined by the flow model. To manage the supply and demand parameters, the flow models require additional detectors on each off-ramp and on-ramp. Traffic flow based models are a good method to evaluate the cause and effect of traffic phenomena, but applying them for travel-time prediction would entail a huge design and maintenance cost for every freeway section. Due to their modular design, traffic flow models for travel-time prediction would only be as accurate as the predicted inputs and boundary conditions.

An Overview of Wavelets
Wavelets are functions which present a multi-resolution decomposition of a signal x using a mother function $\psi$ and a linear combination of its dilated and/or shifted versions (1).

$$\psi_{s,u}(t) = \frac{1}{\sqrt{s}}\,\psi\!\left(\frac{t-u}{s}\right) \qquad (1)$$
where s defines the dilation and u defines the shift. To ensure orthonormality of basis functions [49], the time-scale parameters are sampled on a dyadic grid on the time-scale plane. Thus Equation (1) becomes

$$\psi_{j,n}(t) = \frac{1}{\sqrt{2^{j}}}\,\psi\!\left(\frac{t - 2^{j}n}{2^{j}}\right)$$

The orthonormal wavelet transform is then given by

$$d_{j,n} = \int_{-\infty}^{\infty} x(t)\,\psi_{j,n}(t)\,dt$$

To make the transform computationally effective, the concept of sub-band coding [50] was used to filter the signal with a series of high pass and low pass filters to analyze its high frequency and low frequency components respectively. The input signal x(t) can now be represented in the discrete domain as

$$x(t) = \sum_{n} c_{J,n}\,\phi_{J,n}(t) + \sum_{j=1}^{J}\sum_{n} d_{j,n}\,\psi_{j,n}(t)$$

where $\phi$ is the scaling function and J is the coarsest decomposition level. The sampled scaling coefficients $c_{j,n}$ and wavelet coefficients $d_{j,n}$ can now be defined using the high pass filter $h_l$ and the low pass filter $g_l$:

$$c_{j,n} = \sum_{l} g_{l}\,c_{j-1,\,2n+1-l}, \qquad d_{j,n} = \sum_{l} h_{l}\,c_{j-1,\,2n+1-l}$$

To add translation-invariance to the discrete wavelet transform (DWT), the maximum overlap discrete wavelet transform (MODWT) was introduced, which, instead of down-sampling and up-sampling the signal, introduces high and low pass filters up-sampled by a factor of $2^{j-1}$. The up-sampled filters also introduce redundancy in the output, since the number of samples at the output of every level is equal to the number of samples in the input signal. This makes multi-resolution analysis much more effective, especially from the perspective of using this transform as an input to another system.
The filters can now be represented as a circular filter of the original time series.
To generate the wavelet packet tree, both the approximation and detail coefficients are decomposed, instead of just the approximation coefficients as in the case of the DWT. Hence the wavelet packet distributes the frequency of the original signal evenly between all coefficients, as opposed to the wavelet transform, where 50% of the signal frequency is in the first detail, as shown in Figure 4. In the WDSVR model, we chose the wavelet packet transform to evenly distribute the signal frequency in each support vector module.

Support Vector Regression
Support vector machines (SVM) work on the concept of Structural Risk Minimization [12] by transforming a low dimensional input x into a high dimensional feature space through a mapping function $\Phi$ and then approximating the function f(x) using linear regression

$$f(x) = w \cdot \Phi(x) + b$$

where b is the threshold and w is the normal vector to the hyperplane. The coefficients can be determined from the data by minimizing the regression risk function

$$R_{reg}(f) = C \sum_{i=1}^{N} \Gamma\big(f(x_i) - y_i\big) + \frac{1}{2}\lVert w \rVert^{2} \qquad (2)$$

where C is the cost parameter, which defines the tradeoff between training error and model complexity. The ε-SVR algorithm ignores the errors of training points that lie within the threshold ε defined by the user:

$$\Gamma\big(f(x) - y\big) = \begin{cases} |f(x) - y| - \varepsilon, & |f(x) - y| \ge \varepsilon \\ 0, & \text{otherwise} \end{cases} \qquad (3)$$

Mathematically, Equation (3) is also known as Vapnik's ε-insensitive loss function. Both Equation (3) and the regression risk function, Equation (2), can be minimized by introducing Lagrangian multipliers $\alpha_i$ and $\alpha_i^{*}$ to this quadratic problem, yielding the solution

$$f(x) = \sum_{i=1}^{N} (\alpha_i - \alpha_i^{*})\,K(x_i, x) + b$$

where $K(x_i, x) = \Phi(x_i) \cdot \Phi(x)$ is the kernel, which is computed by calculating the dot product in some feature space.
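As a small numerical illustration of Vapnik's ε-insensitive loss and the regression risk function, the sketch below evaluates both for a fixed weight vector. It assumes the common textbook form (cost times summed loss, plus half the squared norm of w); the function names and sample numbers are hypothetical.

```python
def eps_insensitive_loss(residual, eps):
    """Zero inside the eps-tube, linear outside it."""
    return max(0.0, abs(residual) - eps)


def regression_risk(weights, residuals, cost, eps):
    """cost * sum of eps-insensitive losses + 0.5 * ||w||^2."""
    data_term = cost * sum(eps_insensitive_loss(r, eps) for r in residuals)
    complexity_term = 0.5 * sum(w * w for w in weights)
    return data_term + complexity_term


# Residuals f(x_i) - y_i for three training points; only two leave the tube.
risk = regression_risk(weights=[3.0, 4.0], residuals=[0.3, -1.5, 2.0],
                       cost=10.0, eps=0.5)
```

A larger cost penalizes points outside the tube more heavily relative to model complexity, while a larger ε widens the tube and typically yields fewer support vectors.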

Wavelet Packet Support Vector Regression
The structure of wavelet packet support vector regression is schematically outlined in Figure 5. The model works by evenly distributing the original signal's frequency, using the wavelet packet transform, into the SVR modules. The time series signal, which represented the travel-time of the freeway, was sampled from the database based on the prediction horizon selected. The time signal was then transformed into wavelet packet decomposed signals, such as $W_{j,n}$, where j is the level of the decomposition. The wavelet decomposition was done using a sliding window, as shown in Figure 6. The window size determines the number of input features given to the support vector machine. In our case a window size of 8 was selected and the decomposition was done at level 2. These wavelet coefficients were stored for the support vector regression module. The four frequency components were processed through their respective support vector machines to compute a one time-step ahead output, where the step was equal to the time interval between consecutive input values. The support vector regression outputs were finally aggregated to calculate the travel-time forecast. Table 2 gives the step by step implementation of the wavelet packet support vector regression algorithm.
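To show how a level-2 wavelet packet decomposition turns one window of 8 samples into four frequency components, a minimal sketch follows. It assumes the Haar filter and a decimated transform purely for brevity; the model itself is evaluated with other wavelet families and uses reconstructed coefficients. All names and sample values are hypothetical.

```python
import math


def haar_step(signal):
    """One orthonormal Haar analysis step: (approximation, detail)."""
    s = 1.0 / math.sqrt(2.0)
    approx = [s * (signal[i] + signal[i + 1]) for i in range(0, len(signal) - 1, 2)]
    detail = [s * (signal[i] - signal[i + 1]) for i in range(0, len(signal) - 1, 2)]
    return approx, detail


def wavelet_packet(signal, level):
    """Full packet tree: every node, approximation and detail alike, is split again."""
    nodes = [list(signal)]
    for _ in range(level):
        next_nodes = []
        for node in nodes:
            approx, detail = haar_step(node)
            next_nodes.extend([approx, detail])
        nodes = next_nodes
    return nodes  # 2**level coefficient lists


# One hypothetical window of 8 travel-time samples (minutes).
window = [10.0, 12.0, 11.0, 15.0, 20.0, 22.0, 21.0, 18.0]
packets = wavelet_packet(window, level=2)
```

Each of the four lists would feed its own SVR module. Because the Haar steps are orthonormal, the total energy of the coefficients equals that of the window, which is a convenient sanity check.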

Experiments and Results
Selection of Mother Wavelet
The major computational load of the travel-time prediction model was divided into computation of the wavelet packet reconstructed time series data and training of the support vector regression machines using the optimal cost and epsilon values.
The grid search method was used to search for the epsilon and cost values.
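A grid search over the cost and epsilon values can be sketched as below. The sketch assumes a user-supplied `validation_error(cost, epsilon)` callable that trains a model and returns its validation error; here a toy stand-in replaces it, so the grid values and the resulting optimum are illustrative only.

```python
import math
from itertools import product


def grid_search(costs, epsilons, validation_error):
    """Return (error, cost, epsilon) with the lowest validation error."""
    best = None
    for cost, eps in product(costs, epsilons):
        err = validation_error(cost, eps)
        if best is None or err < best[0]:
            best = (err, cost, eps)
    return best


def toy_validation_error(cost, eps):
    """Hypothetical stand-in for training an SVR and scoring it on held-out data."""
    return (math.log10(cost) - 1.0) ** 2 + (eps - 0.1) ** 2


best_error, best_cost, best_eps = grid_search(
    costs=[0.1, 1.0, 10.0, 100.0],
    epsilons=[0.01, 0.1, 0.5],
    validation_error=toy_validation_error,
)
```

Since every wavelet configuration needs its own search, discarding inadmissible wavelets before this step saves a full grid of training runs per discarded wavelet.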
A definite procedure for the selection of mother wavelets is yet to be established for support vector regression models. However, analyzing the wavelet reconstructed signal in the context of the characteristics of the support vector machines helped us in filtering the relevant wavelet bases.
The accuracy of the proposed model is superior to the classical SVR model if the condition in Equation (4), stated in terms of the error of the classical support vector method, is met.
It is clear from Equation (4) that WDSVR would not produce more accurate results than SVR for shorter time horizons, knowing that prediction error is proportional to the prediction horizon. In our datasets, the WDSVR gave more accurate results than the SVR method for prediction horizons of 45 minutes or more.
We conducted two basic tests for the admissibility of all wavelets for the support vector machine module.

Cross-Correlation of Wavelet Decomposed Data
For accurate predictions of a non-linear and non-stationary dataset, the reconstructed wavelet coefficients of successive windows should not be correlated with one another. A positive linear correlation of +1.0 would present a similar pattern to the SVR module for every input and would adversely affect its prediction accuracy. To test our hypothesis, we computed the cross-correlation between successive windows.
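The first admissibility test can be sketched as a Pearson correlation between each window and its successor. This is one plausible reading of the test, not necessarily the exact computation used; the function names and sample windows are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("correlation is undefined for a constant window")
    return sxy / math.sqrt(sxx * syy)


def successive_window_correlations(windows):
    """Correlation of every window with the window that follows it."""
    return [pearson(a, b) for a, b in zip(windows, windows[1:])]


# Hypothetical reconstructed-coefficient windows.
windows = [[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0], [4.0, 3.0, 1.0, 2.0]]
correlations = successive_window_correlations(windows)
```

A wavelet whose correlations sit at +1.0 for every pair of successive windows would be flagged as inadmissible under this test.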

Recurrence Relationship
The second test was to detect if the reconstructed wavelet coefficient windows were following a certain pattern at a particular location. We know that the input data of the successive windows is non-linear. The existence of a unique pattern at a similar location in the input signal would present a similar pattern to the support vector machine in every iteration, which in reality is not the case. Consequently, it would adversely affect the performance of the SVR module. To detect such events, we calculated the first difference of each successive window.
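The second test can be sketched in the same spirit. The sketch assumes that the first difference is taken element-wise between a window and its successor, and that a position is recurrent when that difference stays at zero for every pair of successive windows; both the interpretation and the names are assumptions.

```python
def first_difference(prev_window, next_window):
    """Element-wise difference between two successive windows."""
    return [b - a for a, b in zip(prev_window, next_window)]


def recurrent_positions(windows, tol=1e-9):
    """Positions whose difference is (near) zero for every successive pair."""
    diffs = [first_difference(a, b) for a, b in zip(windows, windows[1:])]
    width = len(windows[0])
    return [k for k in range(width) if all(abs(d[k]) <= tol for d in diffs)]


# Hypothetical windows: positions 1 and 3 repeat the same value every time.
windows = [[1.0, 5.0, 2.0, 0.0], [3.0, 5.0, 7.0, 0.0], [4.0, 5.0, 1.0, 0.0]]
flagged = recurrent_positions(windows)
```

A non-empty result indicates that the wavelet presents the same pattern at the same location in every iteration, which is the behavior this test is meant to catch.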

Table 2. Algorithm for wavelet decomposed support vector regression.
1) Sample the travel-time array into subsets for their respective prediction horizons, where h is the prediction horizon in minutes.
2) Initialize p = 0 and decompose the sampled signal using wavelet packet decomposition at level j = 2.
3) Store $W_{j,n}$ computed in step 2 for the SVR module and increment p = p + 1.
4) Repeat steps 2 and 3 until the end of the input array y(t).
5) Increment n = n + 1 and repeat steps 2-4 until n = $2^j$.
6) Divide $W_{j,n}$ into training and testing sets and compute the one step ahead prediction value using their respective SVR modules.
7) Aggregate the predictions of all 4 SVR modules to calculate the predicted travel time.

An Alternate Configuration for Interchangeable Prediction Methods
The WDSVR and SVR have both proven suitable for travel-time prediction depending on the selected forecast horizon. In our dataset, we observed that SVR is more accurate for prediction horizons of less than 45 minutes. From 45 minutes onwards, WDSVR gives more accurate results. Considering the effectiveness of both models in different horizons, we have proposed an interchangeable configuration in Figure 8, where travel-times are computed using both models in parallel and the configuration then switches to the appropriate model for active use depending on the selected prediction horizon. The cloud component, which houses both prediction models, is flexible and can be scaled either horizontally or vertically to accommodate the computation overhead.
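The switching logic of the interchangeable configuration can be sketched in a few lines. The 45-minute threshold is the one observed on our dataset and would need to be re-estimated elsewhere; the function names and forecast values are hypothetical.

```python
SWITCH_HORIZON_MIN = 45  # dataset-specific threshold


def select_model(horizon_min):
    """Name of the model to activate for a given prediction horizon."""
    return "WDSVR" if horizon_min >= SWITCH_HORIZON_MIN else "SVR"


def active_forecast(horizon_min, forecasts):
    """Pick the active forecast from the two models computed in parallel."""
    return forecasts[select_model(horizon_min)]


# Hypothetical parallel outputs (minutes of travel time) for one request.
parallel_forecasts = {"SVR": 14.2, "WDSVR": 15.1}
short_term = active_forecast(30, parallel_forecasts)
long_term = active_forecast(60, parallel_forecasts)
```

Keeping the selection separate from the predictors means further prediction methods can be added to the mapping without touching the models themselves.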
The data for our model validation and testing was collected from the Caltrans Performance Measurement System (PeMS). The route of 9.13 miles on I-5N was selected, with a detector density of 2.73. The data was observed for 214 consecutive days, from March 01, 2011 to September 30, 2011, from 1 pm to 8 pm. The time slot was selected after observing the daily pattern of congestion during this period. The data revealed daily congestion in the evening hours, except on holidays and most weekends. This loop detector data was collected over 5-minute intervals. The speed data was converted to a travel-time series using the PLSB travel-time estimation method [32].
We decomposed the time series using the wavelet packet decomposition at level 2. The data was then reshaped into a u × v matrix with u = N − 7 and v = 8. The decomposed and reshaped wavelet transform of the travel-time matrix gave us $2^j$ matrices at level j, represented as $W_{j,n}$. The four matrices were given as input to their respective support vector machines, with (N − 7) × 0.7 rows for training and the remaining 30% for evaluation. The predicted labels of each support vector machine were aggregated to compute the forecast time value. Finally, the values generated by SVR were evaluated for errors.
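The reshaping into a u × v matrix and the 70/30 split can be sketched as follows. The sketch assumes overlapping windows that advance by one sample, which is what gives u = N − 7 rows for v = 8, and a chronological split without shuffling; the names and the synthetic series are hypothetical.

```python
def sliding_windows(series, width=8):
    """Rows of `width` consecutive samples; N samples give N - width + 1 rows."""
    return [series[i:i + width] for i in range(len(series) - width + 1)]


def split_rows(rows, train_percent=70):
    """Chronological split: the first train_percent of rows are used for training."""
    cut = len(rows) * train_percent // 100
    return rows[:cut], rows[cut:]


# Synthetic travel-time series with N = 27 samples, so u = N - 7 = 20 rows.
series = [float(t) for t in range(27)]
rows = sliding_windows(series, width=8)
train_rows, test_rows = split_rows(rows)
```

Integer arithmetic is used for the cut point to avoid floating-point rounding at the training boundary.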
We tested our model using Daubechies, Coiflets, Symlets, Reverse Biorthogonal and Biorthogonal wavelets in 42 different configurations, with different values of cost and epsilon. It was observed that not all wavelets gave better results than the benchmark SVR predicted values. However, some of the worse performing wavelets were filtered out using our wavelet selection process to save computational cost. The best outputs in each time horizon sub-category are shown in Tables 1-3.
Mean Absolute Percentage Error (MAPE), Root Mean Squared Error (RMSE) and Pearson Product-Moment Correlation were the three indicators chosen for evaluation of our model and for comparison with the classical Support Vector Regression model.Table 4 shows the comparison of MAPE between SVR and SVR with wavelet decomposed inputs.Table 5 shows comparison of Pearson product-moment correlation between SVR and SVR with wavelet decomposed inputs.
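For reference, the three evaluation indicators can be computed as in the sketch below, using their standard definitions. The function names and sample values are illustrative and are not taken from our results.

```python
import math


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100.0 / len(actual) * sum(abs((a - p) / a) for a, p in zip(actual, predicted))


def rmse(actual, predicted):
    """Root mean squared error, in the units of the data."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def pearson(x, y):
    """Pearson product-moment correlation (inputs must not be constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical travel times in minutes.
actual_tt = [10.0, 20.0, 40.0]
predicted_tt = [11.0, 18.0, 40.0]
scores = (mape(actual_tt, predicted_tt),
          rmse(actual_tt, predicted_tt),
          pearson(actual_tt, predicted_tt))
```

MAPE is scale-free and therefore comparable across routes, whereas RMSE weights large misses more heavily, so the two can rank models differently.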
Our results indicated that the wavelet decomposed support vector regression model consistently showed better performance for prediction horizons of 45 minutes and above, while below 45 minutes the classical SVR method was more accurate. Figure 9 shows the better tracking ability of the proposed model in comparison with the SVR model.

Summary of Results
The proposed wavelet packet decomposed SVR method showed improved results for travel-time data prediction over the conventional SVR method for prediction horizons of 45 minutes and above. For accurate state estimation through machine learning methods, large datasets are needed, which calls for a computing platform that can be scaled horizontally or vertically to cater for the computation overhead. We proposed a modular prediction method, where multiple prediction algorithms are stored in the cloud and the best performing algorithm is selected based on the prediction horizon. We also investigated wavelet properties in conjunction with their effectiveness for support vector machines. We observed that wavelet bases for which the cross-correlation between the wavelet reconstructed coefficients of successive windows resulted in a linear correlation of +1.0, or which showed recurrent relationships, are not useful for the WDSVR model and should be discarded to reduce the computation cost. In our dataset this reduced the computational cost by 21.43%. Further improvements to our model might be made possible by subdividing the dataset into congested and free flow parts. The scalability of the model also allows for its application to calculate arterial travel times.

Figure 1.A taxonomy of travel time prediction approaches.
travel time) is essential to calculate and evaluate the results (predicted travel time).The travel-time estimation methods are divided into two broad categories: trajectory-based and flow-based.

Figure 3. Speed plot of a portion of the dataset.
Figure 4. Frequency allocation of 2 level DWT. Frequency allocation of 2 level wavelet packet transform.
Figure 5. Schematic diagram of the wavelet decomposed support vector regression model.

To identify the above characteristics in the wavelet reconstructed signal, we used a subset of the data chosen at random spanning four days. In Figure 7(a) the wavelet reconstructed difference signal converged to zero at a similar point in every iteration. Figure 7(c) is the first difference of the Biorthogonal 1.1 filter output at level 2,3, which indicates a linear correlation among the successive windows. On the other hand, the best performing wavelet at the one hour prediction horizon, the Reverse Biorthogonal 6.8 wavelet, showed no cross-correlation or recurrence relationship, as shown in Figures 7(b) and (d). Through these admissibility tests, 9 wavelets were filtered out of 42, hence reducing the computational load of the project by 21.43%. While a detailed study on the selection of wavelets for the support vector machines for WDSVR is needed, our results are encouraging enough to motivate further work in this area.

Figure 7. A comparison of wavelet recurrence relationship and cross correlation of better and worse performing wavelets: (a) First difference signal of wavelet packet reconstructed time series at level 2,3 using Biorthogonal 3.3; (b) First difference signal of wavelet packet reconstructed time series at level 2,3 using Reverse Biorthogonal 6.8; (c) First difference signal of wavelet packet reconstructed time series at level 2,3 using Biorthogonal 1.1; (d) First difference signal of wavelet packet reconstructed time series at level 2,3 using Reverse Biorthogonal 6.8.

Table 2. Algorithm for wavelet decomposed support vector regression.
1) Sample the travel-time array into subsets for their respective prediction horizons.