Comparison between Neural Network and Adaptive Neuro-Fuzzy Inference System for Forecasting Chaotic Traffic Volumes

This paper applies both a neural network and an adaptive neuro-fuzzy inference system to forecasting short-term chaotic traffic volumes and compares the results. The architecture of the neural network consists of an input vector, one hidden layer, and an output layer. Bayesian regularization is employed to obtain the effective number of neurons in the hidden layer. The input variables and target of the adaptive neuro-fuzzy inference system are the same as those of the neural network. A data clustering technique is used to group data points so that the membership functions are more tailored to the input data, which in turn greatly reduces the number of fuzzy rules. Numerical results indicate that the two models have almost the same accuracy, while the adaptive neuro-fuzzy inference system takes more time to train. It is also shown that the neural network achieves satisfactory performance even though the effective number of neurons in the hidden layer is less than half the number of input elements.


Introduction
It has been known for decades that chaotic behaviors exist in traffic flow systems. Gazis et al. [1] developed a generalized car-following model, known as the GHR (Gazis-Herman-Rothery) model, whose discontinuous behavior and nonlinearity suggested chaotic solutions for a certain range of input parameters. Because the capacity dimension [2] of the attractor is fractal and the first Lyapunov exponent [3] is positive, Disbro and Frame [4] showed the presence of chaos in this General Motors model, both without signals, bottlenecks, intersections, etc. and with a coordinated signal network. Chaos was also observed in a platoon of vehicles described by the traditional GHR model modified by adding a nonlinear inter-car-separation-dependent term [5,6], whose Poincaré maps appear as a cloud of points without any repetition. Traffic volume collected at 2-min intervals on the Beijing Xizhimen highway, China, was likewise found to possess chaotic behavior [7].
Because of their nonperiodic behavior, chaotic time series seem unpredictable, yet a variety of short-term forecasting models have been attempted and proven successful: models employing Kalman filtering theory [8], the local linear model using information based on past values [9], the polynomial model [10], neural-network-based black-box models [11-15], a model combining fuzzy C-means clustering with a radial-basis-function neural network [16], etc. This paper likewise forecasts the short-term chaotic traffic volume at an intersection. Two models are presented for comparison. The first is a neural network, where the delay coordinates [2,17,18] of the reconstructed state space of the traffic flow system serve as the input vector and the first delay coordinate of the next state serves as the target. The second is the adaptive neuro-fuzzy inference system [19,20], whose inputs and targets are identical to those of the first model, but where membership functions and fuzzy rules [21,22] replace the neurons of the neural network. The number and shapes of the membership functions are decided and tuned by a data clustering technique and a backpropagation algorithm, respectively, which differs from Park's model [16] in the data clustering and learning process.

Diagnosis of Chaos
The Poincaré map, time series, autocorrelation function, etc., can often provide graphic evidence of chaotic behavior, while the fractal dimension and the largest Lyapunov exponent are two principal quantitative measures of chaos. This paper selects the fractal dimension, the largest Lyapunov exponent, and the autocorrelation function to show the existence of chaos in the traffic flow. A brief introduction to each follows.

Fractal Dimension
If only one measurement is available for a system, delay coordinates are usually used to reconstruct its state space [17]. Given a time series x(t) and time delay τ, an n-dimensional state space can be reconstructed with the delay coordinates

X(t) = [x(t), x(t − τ), x(t − 2τ), …, x(t − (n − 1)τ)].

To choose an appropriate dimension for reconstructing the state space of a chaotic dynamical system, the first step is to obtain the fractal dimension of the chaotic attractor in the state space. There are a number of ways to measure the dimension of a chaotic attractor [2]. Among them, this paper chose the correlation dimension, because it is easy to implement and not time-consuming. Consider an orbit discretized to a set of N points in the state space. A sphere of radius r is positioned at each point of the orbit and the number of points within each sphere, i.e., at Euclidean distance less than r, is counted. A correlation function is then defined as [2,23]

C(r) = lim_{N→∞} (1/N²) Σ_{i≠j} H(r − ‖X_i − X_j‖),

where ‖X_i − X_j‖ is the Euclidean distance between points X_i and X_j and H is the Heaviside function (or unit step function). For many attractors, this function C(r) exhibits a power-law dependence on r as r → 0, that is, C(r) ∝ r^d. Hence a correlation dimension is defined by the expression

d = lim_{r→0} log C(r) / log r.

The chaotic attractor dimension will often approach an asymptote d as the dimension of the reconstructed state space is gradually increased. To embed a d-dimensional chaotic attractor, the state space may be reconstructed with dimension greater than or equal to 2d + 1, which according to Takens [24] is sufficient to have generic delay plots.
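The procedure above can be sketched in a few lines of NumPy. This is an illustrative implementation, not the paper's own code: the logistic-map series stands in for the traffic-volume data, and the function names (`delay_embed`, `correlation_dimension`) are hypothetical. The correlation dimension is estimated as the slope of log C(r) versus log r over a chosen range of radii:

```python
import numpy as np

def delay_embed(x, dim, tau):
    # Delay-coordinate reconstruction: rows are [x(t), x(t+tau), ..., x(t+(dim-1)tau)]
    n = len(x) - (dim - 1) * tau
    return np.column_stack([x[i * tau : i * tau + n] for i in range(dim)])

def correlation_dimension(x, dim, tau, radii):
    # Grassberger-Procaccia estimate: slope of log C(r) versus log r
    X = delay_embed(np.asarray(x, float), dim, tau)
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(-1))[np.triu_indices(len(X), k=1)]
    C = np.array([(dist < r).mean() for r in radii])   # correlation integral C(r)
    mask = C > 0
    slope, _ = np.polyfit(np.log(radii[mask]), np.log(C[mask]), 1)
    return float(slope)

# logistic-map series at r = 4 as a chaotic stand-in for the traffic data
x = np.empty(1200); x[0] = 0.4
for i in range(1199):
    x[i + 1] = 4.0 * x[i] * (1.0 - x[i])
radii = np.logspace(-2.0, -0.5, 10)
print(round(correlation_dimension(x, dim=2, tau=1, radii=radii), 2))
```

For the logistic map the estimated slope comes out near 1, consistent with its one-dimensional attractor; in practice one repeats the estimate while increasing `dim` and looks for the asymptote, as described above.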

The Largest Lyapunov Exponent
The largest Lyapunov exponent of a chaotic orbit is defined by the expression [3]

λ₁ = (1 / (m Δt)) Σ_{i=1}^{m} log₂ (dᵢ / d₀ᵢ).

The calculation is initiated by locating the nearest neighbor to the first point of a reference trajectory in the reconstructed state space; the distance between them is denoted d₀ᵢ. This pair of points is then propagated through the attractor for a fixed short time Δt and its final separation dᵢ is computed. After that, a replacement for the propagated pair is attempted by the following procedure: 1) the distance of each delay-coordinate point in the attractor to the propagated point of the reference trajectory is determined; 2) points closer than a given length but farther than another, much smaller length (to avoid noise) are examined to see whether the angle between the original pair and each attempted pair is less than a given small angle (e.g., 0.3 radians); and 3) the attempted pair with the smallest angle is used as the replacement for the next propagation. Propagation and replacement are repeated for m cycles.
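A simplified version of this Wolf-style estimate can be sketched as follows. This is an illustrative sketch, not the paper's algorithm in full: the angle-based replacement test of steps 1)-3) is omitted, and instead each point's nearest neighbor (outside a Theiler window, to avoid trajectory neighbors) is propagated for a fixed number of steps. The logistic map, whose exponent is known to be ln 2 ≈ 1 bit per iteration, serves as the test series:

```python
import numpy as np

def delay_embed(x, dim, tau):
    n = len(x) - (dim - 1) * tau
    return np.column_stack([x[i * tau : i * tau + n] for i in range(dim)])

def largest_lyapunov(x, dim, tau, dt=1.0, steps=1, theiler=10):
    # Average of log2(d_i / d_0i) over all propagated nearest-neighbor pairs,
    # divided by the propagation time, as in the defining expression above.
    X = delay_embed(np.asarray(x, float), dim, tau)
    N = len(X)
    lam = []
    for i in range(N - steps):
        d = np.linalg.norm(X - X[i], axis=1)
        d[max(0, i - theiler): i + theiler + 1] = np.inf  # skip trajectory neighbors
        d[N - steps:] = np.inf                            # neighbor must be propagable
        j = int(np.argmin(d))
        d0 = d[j]
        d1 = np.linalg.norm(X[i + steps] - X[j + steps])
        if np.isfinite(d0) and d0 > 0 and d1 > 0:
            lam.append(np.log2(d1 / d0))
    return float(np.mean(lam)) / (steps * dt)

# logistic map at r = 4: the exact exponent is 1 bit per iteration
x = np.empty(1200); x[0] = 0.4
for i in range(1199):
    x[i + 1] = 4.0 * x[i] * (1.0 - x[i])
print(round(largest_lyapunov(x, dim=2, tau=1), 2))
```

A positive value, as obtained here, is the signature of chaos used in the paper; the full Wolf algorithm's angle test mainly improves the estimate on noisy experimental data.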

Autocorrelation Function
To quantify how much the signal x(t) resembles itself as time passes, the autocorrelation function is a commonly used tool:

R(η) = lim_{T→∞} (1/T) ∫₀^T x(t) x(t + η) dt.

If R(η) approaches the square of the mean of x(t) as η → ∞, the signal is correlated only with its recent past [25], i.e., it is sensitive to the initial conditions. Furthermore, the time lag η at which R(η) first crosses the square of the mean of x(t) is usually taken as the time delay τ for reconstructing the state space.
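The first-crossing rule for choosing τ can be sketched directly from the definition. This is a minimal illustration on a hypothetical test signal (an offset sinusoid), not the paper's traffic data; for that signal R(η) ≈ 4 + 0.5 cos(0.1η), which first crosses the squared mean of 4 near η ≈ (π/2)/0.1 ≈ 16 samples:

```python
import numpy as np

def first_crossing_delay(x):
    # Lag eta at which the sample autocorrelation R(eta) first crosses the
    # square of the mean of x; this lag is then used as the delay tau.
    x = np.asarray(x, float)
    n = len(x)
    target = x.mean() ** 2
    for eta in range(1, n // 2):
        R = np.mean(x[:n - eta] * x[eta:])   # R(eta) = <x(t) x(t + eta)>
        if R <= target:
            return eta
    return None

# hypothetical test signal sampled at 0.1-time-unit steps
x = 2.0 + np.sin(np.arange(0.0, 200.0, 0.1))
print(first_crossing_delay(x))
```

Applied to the traffic series, the same scan yields the ~300-min lag reported in the Numerical Results section.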

Forecasting Models
Two models are applied in this paper to forecast short-term chaotic traffic volumes: the feedforward backpropagation neural network and the adaptive neuro-fuzzy inference system, described as follows.

Feedforward Backpropagation Neural Network Model
The first forecasting model used in this paper is a feedforward neural network with the backpropagation training algorithm, as shown in Figure 1. The transfer function in the single hidden layer is the tan-sigmoid function, so the output of the i-th hidden neuron is

aᵢ = tansig(wᵢ,₁ p₁ + wᵢ,₂ p₂ + … + wᵢ,R p_R + bᵢ),  where  tansig(n) = 2 / (1 + e^(−2n)) − 1,

p₁, p₂, …, p_R are the elements of the input vector, wᵢ,₁, …, wᵢ,R are the weights connecting the input vector to the i-th neuron, and bᵢ is the bias of the i-th neuron. The output layer, with a single neuron, is given by the linear function

a = W₁ a₁ + W₂ a₂ + … + W_s a_s + B,

where W₁, …, W_s are the weights connecting the neurons of the hidden layer to the output neuron and B is the bias of the output neuron. There are many variations of the backpropagation algorithm, all aiming to minimize the network performance function, i.e., the mean square error between the network outputs and the targets,

mse = (1/Q) Σ_{j=1}^{Q} (t_j − a_j)²,

where t_j and a_j are the j-th target and network output, respectively, and Q is the number of training samples. This paper chooses the Levenberg-Marquardt algorithm [26-28] as the training function to minimize the network performance function. This algorithm interpolates between Newton's method and the gradient descent method: if a tentative step increases the performance function, the algorithm acts like gradient descent, while it shifts toward Newton's method whenever a step successfully reduces the performance function. In this way, the performance function is reduced at every iteration of the algorithm. To avoid overfitting, there are two methods to improve the network's generalization: Bayesian regularization [29] and early stopping. Bayesian regularization provides a measure of how many network parameters (weights and biases) are being effectively used by the network. From this effective number of parameters, the number of neurons required in the single hidden layer of the neural network can be derived from the equation

P = s(R + 2) + 1,

where R is the number of elements in the input vector, s is the number of neurons in the hidden layer, and P is the effective number of parameters found by the Bayesian regularization (a network with s hidden neurons has sR input weights, s hidden biases, s output weights, and one output bias). In the strategy of early stopping, the available data are divided into three sets: the training set, the validation set, and the testing set. The training set is used for computing the gradient and updating the network weights and biases, while the error on the validation set is monitored during training. When the network begins to overfit the training data, the error on the validation set typically begins to rise. Once the validation error has increased for a specified number of iterations, training is stopped and the weights and biases at the minimum of the validation error are returned. The testing set is not used during training, but serves to check the performance of the trained network. To evaluate that performance, this paper performs a linear regression analysis between the network outputs and the corresponding targets and computes the correlation coefficient [30].
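The network structure and the early-stopping strategy described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: plain gradient descent is used in place of Levenberg-Marquardt to keep the example short, the toy target sin(2x) stands in for the traffic data, and all function names are hypothetical. The structure matches the text: a tan-sigmoid hidden layer, a single linear output neuron, and training stopped once the validation error has risen for a fixed number of epochs:

```python
import numpy as np

def tansig(n):
    # MATLAB-style tan-sigmoid transfer function
    return 2.0 / (1.0 + np.exp(-2.0 * n)) - 1.0

def train_with_early_stopping(Xtr, ttr, Xval, tval, s, lr=0.1,
                              max_epochs=2000, patience=50, seed=0):
    rng = np.random.default_rng(seed)
    R = Xtr.shape[1]
    w = rng.normal(0.0, 0.5, (s, R)); b = np.zeros(s)   # hidden weights/biases
    W = rng.normal(0.0, 0.5, s); B = 0.0                # output weights/bias
    best_err, best_params, wait = np.inf, None, 0
    for _ in range(max_epochs):
        h = tansig(Xtr @ w.T + b)                 # hidden-layer outputs
        e = h @ W + B - ttr                       # output error
        n = len(e)
        gW = h.T @ e / n; gB = e.mean()           # output-layer gradients of the mse
        gh = np.outer(e, W) * (1.0 - h ** 2)      # backpropagated; tansig' = 1 - h^2
        gw = gh.T @ Xtr / n; gb = gh.mean(axis=0)
        W -= lr * gW; B -= lr * gB; w -= lr * gw; b -= lr * gb
        val_err = np.mean((tansig(Xval @ w.T + b) @ W + B - tval) ** 2)
        if val_err < best_err:                    # keep weights at the validation minimum
            best_err, best_params, wait = val_err, (w.copy(), b.copy(), W.copy(), B), 0
        else:
            wait += 1
            if wait >= patience:                  # validation error kept rising: stop
                break
    return best_params

rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, (300, 1))
t = np.sin(2.0 * X[:, 0])                         # toy target in place of traffic data
w, b, W, B = train_with_early_stopping(X[:200], t[:200], X[200:], t[200:], s=4)
val_mse = np.mean((tansig(X[200:] @ w.T + b) @ W + B - t[200:]) ** 2)
print(round(float(val_mse), 3))
```

Levenberg-Marquardt would replace the gradient step with a damped Gauss-Newton step, but the data flow (forward pass, error, backpropagated gradients, validation monitoring) is the same.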

Adaptive Neuro-Fuzzy Inference System Model
The second forecasting model used in this paper is an adaptive neuro-fuzzy inference system, as shown in Figure 2. This model consists of two components: a fuzzy inference system and a backpropagation algorithm. For an ordinary fuzzy inference system, the parameters of the membership functions are usually determined by experience or by trial and error. The adaptive neuro-fuzzy inference system overcomes this disadvantage through a learning process that tailors the membership functions to the input/output data, accounting for variations in the data values rather than relying on arbitrarily chosen parameters. This learning method works similarly to that of neural networks. The fuzzy inference incorporated into the adaptive neuro-fuzzy inference system is the first-order Sugeno-type inference [31]; a typical rule, if there are only two inputs x and y, has the form

If x is F₁,ᵢ and y is F₂,ᵢ, then zᵢ = pᵢ x + qᵢ y + rᵢ.

The output level zᵢ of each rule is weighted by the firing strength wᵢ of the rule, wᵢ = min(μ_{F₁,ᵢ}(x), μ_{F₂,ᵢ}(y)), and the final output of the system is the weighted average

z = (Σ_{i=1}^{N} wᵢ zᵢ) / (Σ_{i=1}^{N} wᵢ),

where N is the number of rules. Because the number of input variables and data sets is large in this paper, the "subtractive clustering" technique [32] is adopted to cluster the data and assign every data point a membership grade for each cluster. The number of rules is then decided from the number of membership functions and input variables. Because the membership functions are more tailored to the input data, the fuzzy inference system ends up with far fewer rules than it would have without clustering.
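The first-order Sugeno inference step can be sketched as follows. This is an illustrative evaluation of the rule form above, not the paper's trained system: the two rules, their Gaussian membership parameters, and the consequent coefficients are all hypothetical. Each rule fires with strength min of its two membership grades, and the output is the firing-strength-weighted average of the linear consequents:

```python
import numpy as np

def gauss(x, c, sigma):
    # Gaussian membership function with center c and width sigma
    return np.exp(-0.5 * ((x - c) / sigma) ** 2)

def sugeno_predict(x, y, rules):
    # Each rule is ((cx, sx), (cy, sy), (p, q, r)):
    #   If x is Gauss(cx, sx) and y is Gauss(cy, sy), then z = p*x + q*y + r.
    w = np.array([min(gauss(x, cx, sx), gauss(y, cy, sy))
                  for (cx, sx), (cy, sy), _ in rules])       # firing strengths
    z = np.array([p * x + q * y + r for _, _, (p, q, r) in rules])
    return float((w * z).sum() / w.sum())                    # weighted average

# two hypothetical rules centered on different regions of the input space
rules = [((0.0, 1.0), (0.0, 1.0), (1.0, 0.0, 0.0)),   # near the origin: z = x
         ((2.0, 1.0), (2.0, 1.0), (0.0, 1.0, 1.0))]   # near (2, 2):     z = y + 1
print(round(sugeno_predict(0.0, 0.0, rules), 3))       # → 0.119
```

In the full ANFIS, subtractive clustering supplies the rule centers and backpropagation then tunes the membership parameters (cx, sx, cy, sy) and consequent constants (p, q, r).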

Numerical Results
The software of [33] is applied to build the neural network and adaptive neuro-fuzzy inference system.
To get a reasonable time delay for the reconstruction of the traffic flow system, the autocorrelation function R(η) is plotted. Figures 3-5 show the autocorrelation function for the 5-min, 10-min, and 15-min traffic volumes, respectively, where the dotted horizontal line represents the square of the mean of the traffic-volume time series. All three curves tend to approach the dotted horizontal line, and the time lag η at which the autocorrelation first crosses it is found to be approximately 300 min for all three time intervals. Hence the time delay τ for reconstructing the flow system is 60 samples for the 5-min interval, 30 for the 10-min interval, and 20 for the 15-min interval. By using the corresponding time delay and gradually increasing the dimension n of the state space, the correlation dimension of the chaotic attractor reaches an asymptote as n increases. These processes are shown in Figures 6 to 8 for the 5-min, 10-min, and 15-min intervals, respectively. The figures indicate that the correlation dimension d is 6.687 for the 5-min interval, 6.766 for the 10-min interval, and 6.637 for the 15-min interval. Therefore, the embedding dimension (≥ 2d + 1) is 15 for all three time intervals. Aside from the fractal dimension, the largest Lyapunov exponent of the attractor is also calculated to show the presence of chaos. The largest Lyapunov exponents are all positive for the different time intervals and almost identical for each time interval with different evolution steps, as shown in Table 1. Only after obtaining the required embedding dimension and time delay can the forecasting commence. Each training input/output pair consists of a 15-dimensional input

[x(i), x(i − τ), x(i − 2τ), …, x(i − 14τ)],

where x(i) is the observed traffic-volume time series and τ is the time delay, and the output x(i + τ), the first delay coordinate of the next state. As mentioned previously, the time delay is chosen to be 60, 30, and 20 for the 5-min, 10-min, and 15-min traffic volumes, respectively. Numerical results for the neural networks
and adaptive neuro-fuzzy inference system are discussed as follows.
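The construction of the input/output pairs described above can be sketched as follows. This is an illustrative helper (the function name and the toy series x(i) = i are assumptions, not the paper's data); it builds the 15-dimensional delay vectors and the target x(i + τ) for the 5-min case, τ = 60:

```python
import numpy as np

def make_training_pairs(x, dim=15, tau=60):
    # Inputs: [x(i), x(i - tau), ..., x(i - 14*tau)]; target: x(i + tau),
    # the first delay coordinate of the next state (tau = 60 for the 5-min series).
    x = np.asarray(x, float)
    start = (dim - 1) * tau              # first index with a full delay history
    idx = np.arange(start, len(x) - tau)
    X = np.column_stack([x[idx - k * tau] for k in range(dim)])
    t = x[idx + tau]
    return X, t

x = np.arange(2000.0)                    # toy series x(i) = i, for shape checking
X, t = make_training_pairs(x)
print(X.shape, t[0] - X[0, 0])           # the target leads the first input element by tau
```

For the 10-min and 15-min series one would pass `tau=30` and `tau=20` respectively, with `dim=15` unchanged.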

Neural Networks
By using Bayesian regularization, the effective number of network parameters (weights and biases) can be found, and the number of effective neurons in the hidden layer is then calculated from Equation (9). The results for the three time intervals are listed in Table 2, which shows that the number of neurons actually required in the hidden layer is indeed less than half the number of input elements.
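The step from the effective parameter count P to the neuron count s is a one-line calculation from the parameter-count relation P = s(R + 2) + 1. The P values below are hypothetical illustrations, not the paper's Table 2 entries:

```python
import math

def hidden_neurons(P, R=15):
    # A network with s hidden neurons has s*R input weights, s hidden biases,
    # s output weights and 1 output bias, i.e. s*(R + 2) + 1 parameters in all;
    # take the smallest s whose parameter count covers the P effective
    # parameters reported by Bayesian regularization.
    return math.ceil((P - 1) / (R + 2))

# hypothetical effective-parameter counts (not the paper's Table 2 values)
for P in (60, 90, 120):
    print(P, hidden_neurons(P))
```

With R = 15, even P = 120 effective parameters needs only 7 hidden neurons, below half the 15 input elements, consistent with the observation above.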
The performance of a trained network can be measured to some extent by the errors on the training, validation and test sets.One option is to perform a regression analysis between the network response and the corresponding targets.Through linear regression analysis, the correlation coefficients between outputs and targets for different time intervals and data sets are obtained and shown in Table 2, ranging from 0.951 to 0.985.

Adaptive Neuro-Fuzzy Inference System
By using the "subtractive clustering" technique, the minimum number of fuzzy rules is obtained for each time interval, as listed in Table 3. Through the learning process, the parameters of the membership functions in the antecedent and the constants in the consequent equation of each rule are decided. After simulating the fuzzy inference system, the correlation coefficients between outputs and targets for the different time intervals and data sets are found, as shown in Table 3, ranging from 0.951 to 0.990.

Conclusion
The fractal dimension, the positive largest Lyapunov exponent, and the autocorrelation approaching the square of the mean of the time series confirm the existence of chaos in the traffic flow system. The two forecasting models of the chaotic traffic flow presented in this paper prove to be very successful, with satisfactory accuracy. Bayesian regularization, applied to the neural network to obtain the effective number of neurons in the hidden layer, and the subtractive clustering technique, applied to the adaptive neuro-fuzzy inference system to obtain the minimum number of fuzzy rules, are both quite useful and effective. The numerical results show that the prediction accuracies of the two models are almost the same as far as the correlation coefficient is concerned, but that the adaptive neuro-fuzzy inference system requires more time to train because more parameters need to be determined, and that the number of effective neurons in the hidden layer is usually less than half the number of elements in the input vector.

Table 3. The number of inference rules of the adaptive neuro-fuzzy inference system and the correlation coefficient.

Figure 1. The structure of the feedforward backpropagation neural network.

Figure 2. The structure of the adaptive neuro-fuzzy inference system.

Figure 6. (a) The curves of the correlation dimension for the 5-min traffic volume.

Figure 8. (a) The curves of the correlation dimension for the 15-min traffic volume.