Real-Time Urban Traffic State Estimation with A-GPS Mobile Phones as Probes

This paper presents a microscopic traffic simulation-based method for urban traffic state estimation using Assisted Global Positioning System (A-GPS) mobile phones. In this approach, real-time location data are collected by A-GPS mobile phones to track vehicles traveling on urban roads. Tracking data obtained from individual mobile probes are then aggregated to estimate average road link speeds over rolling time periods. The estimated average speeds are classified into different traffic condition levels, which are prepared for displaying a real-time traffic map on mobile phones. Simulation results demonstrate the effectiveness of the proposed method, which is fundamental to the subsequent development of a system demonstrator.


Introduction
Real-time traffic information is essential for supporting the development of many Intelligent Transportation Systems (ITS) applications: incident detection, vehicle navigation, traffic signal control, traffic monitoring, etc. For instance, since 2007, Google has integrated real-time traffic information with its mapping service. The traffic data are aggregated from several sources, e.g., road sensors, cars, taxi fleets and, more recently, mobile users [1].
The current state-of-the-practice traffic data collection in most parts of the world relies on a network of road-side sensors, e.g., inductive loop detectors (ILDs), to gather information about traffic flow at fixed points on the road network [2][3][4]. Although fixed sensors are a proven technology, they are not deployed at wide scale, mostly because of their high cost. Moreover, fixed sensors can only measure the spot speed, which is an inherent deficiency when the goal is to reflect the speed over the entire road link. Additionally, this type of model is specific to the link and detector location, which requires careful calibration [5]. An alternative to this costly road-side infrastructure is to employ dedicated vehicles as floating traffic probes [6][7][8]. The dedicated probe vehicles (PVs) are typically equipped with a GPS receiver and a dedicated communication link. A large number of vehicles must be so equipped to have enough probes. An insufficient number of probes limits both the ability to generate information for large areas and the accuracy of the results [9]. Even though the number of GPS-equipped vehicles is expected to increase in the future, the capacity and cost of dedicated communication links between in-vehicle equipment and the traffic management center will still limit the sample size of PVs [5]. Moreover, since PVs are chosen from a particular category of vehicles, e.g., taxis or buses, the traffic information could be biased and not representative of the whole population [10].
With the advance of mobile communication technology, mobile phones are increasingly utilized for collecting traffic data. This approach avoids installation and maintenance costs, either in vehicles or along roads. In addition, using mobile phones as traffic probes overcomes the coverage limitation of road-side sensors and the shortage of probes in dedicated PV systems. By the end of 2010, there had been 5.3 billion mobile subscriptions [11], which is equivalent to 77% of the world population. Ideally, any mobile phone that is switched on, even if not in use, can act as a probe. Thus, there is potentially a large sample size available to this type of probe system. Moreover, in 2010, sales of smartphones (most of which are equipped with A-GPS chips) showed strong growth worldwide [12]: total shipments in 2010 were 292.9 million units, an increase of 67.6% from 2009; this made smartphones 21.5% of all handsets shipped.
In the past decade, field trials have been conducted to show the feasibility of using mobile phones as traffic probes [10,13-16]. Nevertheless, most of these trials tried to estimate traffic states on freeways, and only a few deployments attempted to monitor urban arterial roads. It has been suggested in [10] that future research efforts should focus on obtaining traffic data for arterials, where no data is currently available, rather than on obtaining data from freeways, where fixed traffic sensors are already deployed. However, traffic estimation on arterials is more challenging than on freeways for the following reasons [7,14,17]: 1) arterials have lower traffic volume; 2) arterials have more variability in speeds; and 3) arterials are controlled by traffic signals at intersections.
Among the past field deployments, the majority employed network monitoring methods that make use of network signaling information, e.g., the handover measurements or the time/angle (difference) of arrivals. Only very few of them were handset-based (using GPS-enabled phones), for example, a pioneer field trial held by Globis Data in 2004 [18] and the Mobile Century field experiment conducted in 2008 [19]. Evaluation results from field trials indicated that the network-based probe systems cannot provide sufficiently accurate traffic data for arterials. Since arterials tend to introduce additional complexities, the more accurate handset-based A-GPS mobile probe is expected to be a better solution for urban arterial roads; however, this has not yet been verified (both [18] and [19] provided successful traffic estimations only on highways). Two issues were identified as primary hurdles to the success of A-GPS mobile phones as traffic probes [20]: 1) the additional communications costs; and 2) the slow uptake of GPS-enabled phones. These two issues are no longer problems under current circumstances: 1) modern cellular networks have wide communication bandwidths; and 2) A-GPS mobile phones are increasingly available in the global market.
When evaluating the results from field tests, there is hardly any ground truth traffic data available for comparison, especially for arterial roads. Additionally, the field test data is not suitable for statistical analysis due to variations between tests and the limited number of observations. In simulation-based studies, on the other hand, individual vehicle tracks and aggregated traffic states can be extracted as "ground truth" [15]. In addition, traffic simulations can generate traffic data under a variety of traffic conditions, characterized by different volumes and road networks [21]. Although these simulation studies do not replicate the actual conditions precisely, they may still provide a valuable indication of the potential performance of a probe-based traffic information system.
In our proposed smart traffic information system [22], A-GPS mobile phones are used to locate the vehicles. They are also utilized as on-board processing units. In addition, these switched-on mobile phones are employed as probes to collect traffic data, based on which real-time urban traffic states can be estimated and sent as feedback to service subscribers. In this paper we show how location data collected by A-GPS mobile phones can be used to estimate urban traffic states. The urban traffic is generated by microscopic simulation, while small-scale field tests are used to emulate the A-GPS measurements. This paper is organized as follows: Section 2 provides a brief overview of the simulation-based framework. In Section 3, three data processing steps are presented: emulation of A-GPS measurements, filtering of individual probe data, and estimation of average link speeds. The simulation set-up, the result of each step, and the performance evaluation are given in Section 4. Finally, Section 5 concludes this paper.

Simulation-Based Framework
A simulation-based framework is developed to emulate the A-GPS mobile phone-based urban traffic estimation, as shown in Figure 1. The framework consists of three parts: 1) microscopic traffic simulation; 2) location data processing and speed aggregation; and 3) performance evaluation and result presentation. The microscopic traffic simulation is used to simulate the urban road network and the corresponding traffic. It generates "actual" location tracks for each vehicle, and prepares "ground truth" traffic states. The generated locations are first pre-processed to emulate "realistic" location samples that A-GPS mobile phones may provide in real-world situations. The pre-processing defines the percentage of vehicles that are equipped with A-GPS mobiles (according to a penetration rate), and introduces statistical errors into the location updates (according to the field test). These "realistic" location data are then post-processed by a 2-step filtering process. The Kalman Filter (KF) is implemented to track each vehicle/mobile, and a simple data screening is employed to filter out undesired position/speed estimates. The next steps are allocating the individual speed estimates to road links, and aggregating them over a pre-defined time interval. As for performance assessment, the accuracy of the average speed estimation is evaluated by comparing it with the "ground truth" average speeds. The coverage is examined by finding the fraction of road links that have available estimations in that time interval. Finally, with a simple threshold technique, estimated average link speeds are classified into several traffic condition levels, which can be presented as colored road segments on the service subscribers' mobile phone displays.

Urban Traffic Modeling
In this work, the microscopic road traffic simulation package "Simulation of Urban MObility" (SUMO) [23] is employed to model the urban traffic on arterial roads. Two applications featured in the SUMO package are used to generate the road network and vehicle routes, respectively. NETCONVERT imports digital road networks from different sources and converts them into the SUMO format. DFROUTER generates random routes and emits vehicles into networks. In addition, eWorld [24], a SUMO extension, is employed to facilitate the processes of importing the OpenStreetMap (OSM) data [25], then editing and enriching it, and finally exporting the data files of networks and routes for SUMO simulation.
As a result of the SUMO simulation, two useful datasets can be generated for further analysis. One is the aggregated speed information for each road link/edge, called "aggregated edge states". It includes information such as road edge IDs, time intervals, mean speeds, etc. These aggregated speeds can be used to determine the "ground truth" of traffic flow. The other is the location information of every vehicle, called "net-state dumps". It records at every timestamp the location of every vehicle in the simulated road network. Each record consists of a vehicle ID, a timestamp, and the vehicle's coordinates. This data file is used as the basis for the simulation of the mobile probe-based traffic information system.

System Design Parameters
For such a probe-based monitoring system that makes use of collected location samples, the sample size and the sampling frequency are two key design parameters that influence the system performance. These sampling-related issues have been discussed extensively in many previous studies of probe-based traffic systems. Those studies presented both experimental figures and analytical models, which serve as references for this work.
As discussed in the introduction, the main concern of probe-based monitoring is the determination of the probe penetration rate (i.e., the percentage of vehicles/mobiles that serve as traffic probes) needed to ensure an acceptable quality. Similar conclusions were drawn from previous field tests and simulation studies [7,22,26-28]: probe-based systems can be expected to work well for freeways with penetration rates ranging from 3% to 5%. However, as indicated in [7,22,29], urban arterial roads may require a penetration rate greater than 7% to provide reliable speed estimates.
Another essential issue in traffic systems using GPS-equipped probes is the data sampling/reporting interval. Typical GPS receivers receive location updates every 1-3 seconds. Collecting data at this frequency from a large number of probes may cause network congestion. To avoid this issue, a temporal sampling method is usually applied in which probes report their data at prescribed times. In addition, [30] claimed that using longer sampling intervals allows gathering information over longer distances, hence reducing the chance of capturing a non-representative speed. However, the sampling interval cannot be too long, since it affects the timeliness of data and the system coverage on short road links. A sampling interval of 10-20 seconds can be used in practice, as indicated by the previous studies [7,8,22,31].
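The two design parameters above can be illustrated with a minimal Python sketch. It assumes a hypothetical in-memory representation (vehicle IDs as strings, tracks as lists of `(t, x, y)` tuples recorded at 1 Hz with integer timestamps); the function names are illustrative and are not part of SUMO or of the system described here.

```python
import random

def select_probes(vehicle_ids, penetration_rate, seed=0):
    """Randomly choose the fraction of vehicles that act as probes."""
    rng = random.Random(seed)              # fixed seed for repeatable runs
    ids = sorted(vehicle_ids)
    n_probes = round(len(ids) * penetration_rate)
    return set(rng.sample(ids, n_probes))

def subsample(track, interval_s):
    """Keep one (t, x, y) record every `interval_s` seconds of a 1 Hz track."""
    if not track:
        return []
    t0 = track[0][0]
    return [rec for rec in track if (rec[0] - t0) % interval_s == 0]

# Hypothetical example: 100 vehicles, 10% penetration, 10 s reporting interval.
vehicles = [f"veh{i}" for i in range(100)]
probes = select_probes(vehicles, penetration_rate=0.10)
track = [(t, 8.0 * t, 0.0) for t in range(60)]   # 60 s of 1 Hz positions
reports = subsample(track, interval_s=10)
```

A longer `interval_s` reduces the reporting load per probe, at the price of fewer samples on short links.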

Data Processing and Aggregation
In this section, we describe in detail the processes, methods, and algorithms involved in the right branch of the simulation-based framework (Figure 1). There are three main steps in this part: emulation of A-GPS probe location updates, filtering of individual probe data, and estimation of average road link speeds.

A-GPS Probe Data Emulation
Due to technological and practical limitations, location updates collected by A-GPS mobile phones are not perfectly precise and are limited in sample size. Therefore, both the quality and the quantity of the location data generated from traffic simulation should be reduced in order to emulate realistic field conditions. As described in Section 2.1, location data are degraded in two ways: 1) setting a specified percentage of simulated vehicles/mobiles to be traffic probes; and 2) introducing statistical positioning errors of A-GPS mobile phones.
The SUMO simulation output file "net-state dumps" may grow extremely large since it contains detailed information on each vehicle/mobile. Hence, this location data needs to be converted into a more compact form. In addition, positions in SUMO are expressed in Cartesian coordinates instead of WGS84 [32]. In this work, the SUMOPlayer [33], a Java API, is employed to "play" the SUMO network-dump files in real time and output WGS84 coordinates for each probe. The SUMOPlayer is also customized by defining parameters such as the fraction of tracked vehicles (i.e., the probe penetration rate).
In order to get practical error statistics of A-GPS location updates, field tests were conducted on Sony Ericsson and Nokia A-GPS mobile phones. The location updates include latitude and longitude coordinates, their accuracies, and the corresponding timestamps. Location accuracy (in meters) is the root mean square (RMS) of the north and the east accuracy (1-sigma standard deviation). Under a Gaussian assumption (which can be motivated by the central limit theorem according to [34]), this implies that the actual location is within the circle defined by the returned point and radius with a probability of about 68%. Figure 2 shows the A-GPS location samples collected at 10-second intervals continuously for 1 hour in the urban area of Stockholm. The median of the RMS errors was found to be 8.83 m, and 90% of the errors were found to be below 18.53 m according to their Cumulative Distribution Function (CDF). Although these A-GPS location measurements are less accurate than those from regular GPS units, they still appear sufficient for traffic state estimation (a location technology with an accuracy within 20 m can produce quantitative travel information [35]).
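The error-injection step can be sketched as follows. This is only an illustration under stated assumptions: errors are drawn as independent zero-mean Gaussians on each axis, and the per-axis standard deviation `sigma` is a hypothetical value, not one taken from the field test. The percentile helper uses the simple nearest-rank definition.

```python
import math
import random

def add_agps_error(x, y, sigma_m, rng):
    """Perturb a true position with independent zero-mean Gaussian errors."""
    return x + rng.gauss(0.0, sigma_m), y + rng.gauss(0.0, sigma_m)

def percentile(values, p):
    """Nearest-rank percentile (p in 0..100) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100.0 * len(ordered)))
    return ordered[rank - 1]

rng = random.Random(42)
sigma = 8.0                       # hypothetical per-axis std. deviation (m)
errors = []
for _ in range(5000):
    ex, ey = add_agps_error(0.0, 0.0, sigma, rng)
    errors.append(math.hypot(ex, ey))   # radial distance from the true point

median_err = percentile(errors, 50)
p90_err = percentile(errors, 90)
```

Comparing `median_err` and `p90_err` with the field-test statistics is one way to tune `sigma` before degrading the simulated tracks.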

Filtering of Location Data
After the pre-processing stage, the A-GPS mobile phone-based data collection is emulated for every traffic probe. The "realistic" location samples cannot be directly utilized in traffic estimation since they are inherently erroneous. A two-step post-processing is therefore applied: Kalman filtering (KF) to transform A-GPS measurements into dynamic state estimates of position and velocity; and data screening to eliminate undesired data.

KF-Based Tracking
In this subsection, the KF is exploited to track the movement of vehicle/mobile probes. For computational simplicity, we model the travelling vehicle/mobile as a dynamic linear system driven by a random acceleration. The transition equation is first derived for continuous-time movement, and then expressed in discrete time after discretization.
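Suppose that a vehicle, equipped with a mobile, moves in a 2D Cartesian coordinate system (SUMO works with Cartesian coordinates only; road networks specified in WGS84 are converted by NETCONVERT using UTM [36]). In order to track this moving mobile, its state can be expressed as a dynamic state vector:

$$X(t) = \begin{bmatrix} x(t) & y(t) & \dot{x}(t) & \dot{y}(t) \end{bmatrix}^{T}$$

where $x(t)$ and $y(t)$ are the positions and their first derivatives $\dot{x}(t)$ and $\dot{y}(t)$ are the velocities. The motion dynamics of the travelling probes is then described by a continuous white noise acceleration (CWNA) model [37]. In this model, the velocity undergoes perturbations which can be modeled by zero-mean white Gaussian noise $v(t)$ with the variance

$$E\left[ v(t)\, v(\tau)^{T} \right] = q_c\, I_2\, \delta(t - \tau)$$

where $q_c$ is the process noise intensity. The continuous-time transition equation is then written as:

$$\dot{X}(t) = A\, X(t) + B\, v(t)$$

where

$$A = \begin{bmatrix} 0_2 & I_2 \\ 0_2 & 0_2 \end{bmatrix}, \qquad B = \begin{bmatrix} 0_2 \\ I_2 \end{bmatrix}$$

Let the sampling interval in this system be $\Delta T$. After discretization, the discrete-time transition equation is:

$$X_k = F\, X_{k-1} + V_{k-1} \qquad (6)$$

with the transition matrix

$$F = \begin{bmatrix} I_2 & \Delta T\, I_2 \\ 0_2 & I_2 \end{bmatrix}$$

and the process noise vector $V_{k-1}$ that models the disturbance in driving velocity. The covariance matrix of the process noise vector is

$$Q = q_c \begin{bmatrix} \frac{\Delta T^{3}}{3} I_2 & \frac{\Delta T^{2}}{2} I_2 \\ \frac{\Delta T^{2}}{2} I_2 & \Delta T\, I_2 \end{bmatrix}$$

The state vector is related to the measurement $Y_k$ by:

$$Y_k = H\, X_k + W_k \qquad (9)$$

with the measurement matrix $H = \begin{bmatrix} I_2 & 0_2 \end{bmatrix}$ taking only position observations, and the measurement error $W_k$ with covariance $R = \sigma^{2} I_2$, where $\sigma^{2}$ is the measurement error variance. It is assumed that the error variances in both directions are the same and that the errors are independent.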
In order to track the vehicle/mobile in real time, a discrete-time Kalman filter (KF) is applied to the noisy location measurements. The KF gives a recursive solution for the state estimation of the system described by equations (6) and (9). The transition and measurement equations can be rewritten in a more compact way as:

$$X_k = F\, X_{k-1} + V_{k-1}, \qquad V_{k-1} \sim N(0, Q)$$
$$Y_k = H\, X_k + W_k, \qquad W_k \sim N(0, R)$$

where $Q$ and $R$ are the covariance matrices of the process error $V_{k-1}$ and the measurement error $W_k$, respectively. Then, the optimal estimations (in terms of minimizing variances) are obtained by the following iterated steps (the derivation can be found in state estimation texts, e.g., [37,38]):

• Before the measurements are available at $t_k$, the a priori estimate of the state mean $\hat{X}_k^{-}$ and the covariance $P_k^{-}$ are obtained by the time update equations:

$$\hat{X}_k^{-} = F\, \hat{X}_{k-1}, \qquad P_k^{-} = F\, P_{k-1}\, F^{T} + Q$$

• The Kalman gain is then computed to set an appropriate correction term for the next propagation step, so as to minimize the mean square error (MSE) of the estimation:

$$K_k = P_k^{-} H^{T} \left( H\, P_k^{-} H^{T} + R \right)^{-1}$$

• After processing the noisy measurement $Y_k$, the a posteriori estimate of the state mean $\hat{X}_k$ is obtained by updating $\hat{X}_k^{-}$ with a corrected version of the measurement residual:

$$\hat{X}_k = \hat{X}_k^{-} + K_k \left( Y_k - H\, \hat{X}_k^{-} \right)$$

Then the covariance matrix associated with $\hat{X}_k$ can be updated as:

$$P_k = \left( I - K_k H \right) P_k^{-}$$
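The recursion described above can be sketched in plain Python. Because the model assumes independent errors with equal variance in both directions, the four-state filter decouples into two identical position/velocity filters, one per axis, and the sketch below tracks a single axis. The initialization (first measurement as position, zero velocity, velocity variance `v0_var`) and the numeric values of `q` and `r` are assumptions made for illustration only.

```python
def kf_track_axis(measurements, dt, q, r, v0_var=100.0):
    """Track one coordinate with a CWNA position/velocity Kalman filter.

    measurements: noisy positions (m) sampled every `dt` seconds
    q: process noise intensity, r: measurement error variance (m^2)
    Returns a list of (position, velocity) estimates.
    """
    p, v = measurements[0], 0.0          # assumed initial state
    P00, P01, P11 = r, 0.0, v0_var       # symmetric 2x2 covariance
    estimates = [(p, v)]
    for z in measurements[1:]:
        # Time update: X^- = F X,  P^- = F P F^T + Q
        p = p + dt * v
        P00 = P00 + 2.0 * dt * P01 + dt * dt * P11 + q * dt ** 3 / 3.0
        P01 = P01 + dt * P11 + q * dt ** 2 / 2.0
        P11 = P11 + q * dt
        # Kalman gain with H = [1, 0]
        S = P00 + r
        K0, K1 = P00 / S, P01 / S
        # Measurement update: X = X^- + K (z - H X^-),  P = (I - K H) P^-
        resid = z - p
        p += K0 * resid
        v += K1 * resid
        P11 = P11 - K1 * P01
        P01 = (1.0 - K0) * P01
        P00 = (1.0 - K0) * P00
        estimates.append((p, v))
    return estimates

# Hypothetical check: a probe moving at a constant 8 m/s, sampled every 10 s.
xs = [8.0 * 10.0 * k for k in range(20)]
est = kf_track_axis(xs, dt=10.0, q=1.0, r=80.0)
```

Running the same function on the y measurements gives the second half of the state; the speed estimate is then the magnitude of the two velocity components.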

Data Screening
From the last subsection, we have the estimated position and velocity of each probe. Before these estimates can be aggregated to provide the estimation of average link speeds, a simple data screening process needs to be applied to filter out some undesired data.
One challenge in the network-based mobile probe systems is the need to distinguish non-valid probes (e.g., mobile users in buildings, on subways, or pedestrians) from mobile phones travelling on board vehicles. As stated in [39], since the influence of outliers is severe, especially in dense urban areas, they should be identified and filtered out. In our system, the validity of traffic probes is no longer a major issue. Unlike the network monitoring method that randomly monitors mobile users within a wireless network, probe data in this system come from our service subscribers, and we can assume that the service subscribers would only start the traffic application when they are in vehicles.
Provided that in this system our mobile users are unlikely to be non-valid probes, the following criteria are considered for the data screening process:
• Speed estimates that are greater than 120% of the speed limits should be eliminated, since such large speed estimates most probably result from positioning errors (speed limits in Stockholm downtown areas vary from 30 km/h to 50 km/h [40]).
• Location estimates with a distance to the nearest link larger than 20 m should be eliminated. This helps to solve the problem of mapping estimates between two nearly parallel links (a location accuracy of at least 20 m is expected to differentiate between closely spaced parallel urban roads [41]).
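The two screening criteria can be written as a short filter. The record layout (a dictionary with `speed` in m/s and `link_dist` in meters) and the function name are hypothetical choices for this sketch; only the 120% and 20 m thresholds come from the text.

```python
def screen(estimates, speed_limit_mps, max_link_dist_m=20.0, speed_factor=1.2):
    """Keep only estimates that satisfy both screening criteria."""
    max_speed = speed_factor * speed_limit_mps
    return [
        e for e in estimates
        if e["speed"] <= max_speed and e["link_dist"] <= max_link_dist_m
    ]

# Hypothetical estimates on a link with a 30 km/h (8.33 m/s) speed limit.
limit = 30.0 / 3.6
candidates = [
    {"id": "a", "speed": 7.9, "link_dist": 4.0},    # kept
    {"id": "b", "speed": 12.5, "link_dist": 3.0},   # too fast (> 10 m/s)
    {"id": "c", "speed": 6.0, "link_dist": 27.0},   # too far from any link
]
kept = screen(candidates, speed_limit_mps=limit)
```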

Estimation of Aggregated Link Speeds
Generally, the real-time traffic states in this work are characterized by the average link speeds of the simulated urban arterials over rolling time periods. Within each period (i.e., the aggregation interval), speed estimates from individual probes are first allocated to road links and then aggregated to provide estimates of the average conditions for the road links.

Allocation of State Estimates
After applying the filtering steps illustrated in Section 3.2, the estimated vehicle/mobile tracks still deviate from their actual trajectories due to the introduction of location errors. Therefore, we need to project these position/velocity estimates onto road links (known as map matching) so as to get traffic information for each link. In this work, the simple point-to-curve geometric map matching technique [42] is applied, because of its effectiveness and given the real-time requirement of this system. Projection distances are calculated from a position estimate to each of the road link candidates. The road link that gives the smallest distance is identified as the link associated with that position/velocity estimate. As a result, at every sampling step, speed estimations can be allocated to specific links in the simulated road network.
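A minimal sketch of point-to-curve matching follows. It assumes that each link is given as a polyline of Cartesian shape points in a dictionary keyed by link ID; this data layout and the function names are illustrative and are not taken from [42] or from SUMO.

```python
import math

def point_to_segment_distance(px, py, ax, ay, bx, by):
    """Distance from point P to the closest point of segment AB."""
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:                      # degenerate segment
        return math.hypot(px - ax, py - ay)
    # Projection parameter, clamped so the foot stays on the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len2
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def match_to_link(px, py, links):
    """Return (link_id, distance) of the link closest to the point."""
    best_id, best_d = None, float("inf")
    for link_id, pts in links.items():
        for (ax, ay), (bx, by) in zip(pts, pts[1:]):
            d = point_to_segment_distance(px, py, ax, ay, bx, by)
            if d < best_d:
                best_id, best_d = link_id, d
    return best_id, best_d

# Two hypothetical parallel links, 50 m apart.
links = {"L1": [(0.0, 0.0), (100.0, 0.0)], "L2": [(0.0, 50.0), (100.0, 50.0)]}
link_id, dist = match_to_link(40.0, 8.0, links)
```

The returned distance can be reused directly by a screening rule that rejects estimates lying too far from any link.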
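From the estimation result of Section 3.2, we can easily derive the position and speed of the $i$-th probe at $t_k$ as:

$$\left( \hat{x}_k^{i},\ \hat{y}_k^{i} \right), \qquad \hat{s}_k^{i} = \sqrt{ \left( \hat{\dot{x}}_k^{i} \right)^{2} + \left( \hat{\dot{y}}_k^{i} \right)^{2} }$$

After map matching, at a certain timestamp $t_k$, we can obtain $\hat{s}_{j,k}^{i}$, the speed estimate of link $j$ from probe $i$, where $i = 1, 2, \ldots, n$ ($n$ is the total number of probes) and $j = 1, 2, \ldots, l$ ($l$ is the number of monitored links).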

Aggregation of Link Speeds
In Section 3.3.1, the mapped speed estimates are recorded for each link along with sampling timestamps. Since traffic conditions are characterized by the average travel speed on road links, the rest of the problem is to aggregate the speed estimates over a specific time interval. Previous works on traffic estimation [7,10,43] indicate that a 10-minute aggregation time appears to be a reasonable choice, taking into consideration both the real-time requirement and data availability. As a result, the average speed on link $j$ during the interval of interest is:

$$\bar{v}_j = \frac{1}{N_j} \sum_{m=1}^{N_j} \hat{s}_j(m)$$

where $\hat{s}_j(m)$ is the $m$-th available speed estimate on link $j$ in the interval and $N_j$ is the total number of available estimates.
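The aggregation amounts to a grouped mean over fixed time windows. The sketch below assumes that the mapped estimates are available as `(timestamp_s, link_id, speed_mps)` tuples, which is a hypothetical layout chosen for illustration.

```python
from collections import defaultdict

def aggregate_link_speeds(samples, interval_s=600):
    """Average the mapped speed estimates per link and per time interval.

    samples: iterable of (timestamp_s, link_id, speed_mps)
    Returns {(interval_index, link_id): mean speed in m/s}.
    """
    sums = defaultdict(lambda: [0.0, 0])     # key -> [sum of speeds, count]
    for t, link_id, speed in samples:
        key = (int(t // interval_s), link_id)
        sums[key][0] += speed
        sums[key][1] += 1
    return {key: total / count for key, (total, count) in sums.items()}

# Hypothetical mapped estimates.
samples = [
    (10, "L6", 3.0), (200, "L6", 5.0),   # first 10-minute interval
    (650, "L6", 8.0),                    # second interval
    (30, "L22", 12.0),
]
averages = aggregate_link_speeds(samples)
```

Links without any sample in an interval simply have no key in the result, which is what a coverage metric would count as unavailable.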

As shown in Figure 3(a), the road network of the downtown area in Stockholm has been chosen as a simulation case study. The OpenStreetMap (OSM) XML file is first edited in the Java OpenStreetMap Editor (JOSM) [44] in order to remove all the road edges which cannot be used by vehicles, such as railways and roadways for motorcycles, bicycles, and pedestrians. In addition, all the edges are set as one-way for simplicity. The simplified version of the network, which consists of 14 nodes, 22 links and 4 traffic lights, is imported into eWorld, as shown in Figure 3(b). In eWorld, we can further edit road properties, e.g., street name, speed limit, and phase assignment of traffic lights, which can be accessed from the OSM data. Then, with the export feature of eWorld, the network file required by the SUMO simulation can be generated using NETCONVERT. Since OSM data works with WGS84 instead of the Cartesian coordinate system used by SUMO, projection and offset are applied during the conversion. Traffic light logics, speed limits and priorities are also encoded in the network files.
Once the road network is ready, the next step is to generate the traffic. In eWorld, the properties of vehicles are first defined, such as acceleration, maximum allowed speed, and speed variation. Then, random routes of vehicles traveling on the road network are generated for a specific time interval using the DUAROUTER. Since the limited simulation network makes the traveling vehicles leave the network quickly (typically in no more than 250 s), a given number of vehicles are emitted into the network in each time step in order to achieve traffic equilibrium. These randomly generated vehicles and their routes are also exported as a SUMO file. As a result, over a 3600-second interval, 575 vehicles have been generated with random trips. The SUMO simulation is conducted for 1 hour without incidents. The real-time traffic data, i.e., the aggregated link/edge state, is collected together with the network state dump. As introduced in Section 2.2, the aggregated state file is used to establish the "ground truth" link speeds, and the network state dump file is used as input to the SUMOPlayer to generate the mobile probe data.
Suppose that the travel speeds of interest are those of links 1, 4, 5, 6, 9, 12, 15, 18, 19, and 22. The lengths of these links range from 173 m to 236 m. Other links are not included mainly because their aggregated density and occupancy are generally small, which makes the estimation less necessary and potentially results in a small number of probes. Figure 4 shows the mean traveling speeds of the selected links recorded in the "aggregated edge states". Since only Link22 has a speed limit of 50 km/h and all the others have speed limits of 30 km/h, the average link speed of Link22 is considerably higher. In addition, Link6 and Link15 have lower traveling speeds, which are expected to be detected as congestion in the estimation.
The SUMO output "net-state dumps" is then used as input for the SUMOPlayer to generate a large amount of simulated mobile probe data in real time. The SUMOPlayer first reads the corresponding network file and then randomly chooses vehicles from the network state dump, according to the probe penetration rate (10% is specified in this case). As a result, it writes location updates (longitudes and latitudes) every second for the vehicles that are selected as traffic probes. In order to be compatible with the SUMO network format, those coordinates are projected back to the zero-origin Cartesian system with the same offset used by NETCONVERT. There are 63 probes in total, and the time they spend in the system ranges from 49 s to 241 s.


Figure 6 shows the A-GPS location updates of all the mobile probes from the 1-hour simulation, as well as all the position estimates after the KF. In the data screening (DS) step, the position/velocity estimates which fail to meet the criteria (specified in Section 3.2.2) are discarded.
After the post-processing, the accuracies of the position and speed estimations are evaluated statistically (the actual position and speed of each probe are available in the SUMOPlayer output). The figures of merit used to evaluate the $n$-th position and speed estimates (out of 701 estimates in total) are the root square error (RSE) and the absolute error (AE), respectively:

$$\mathrm{RSE}_n = \sqrt{ \left( \hat{x}_n - x_n \right)^{2} + \left( \hat{y}_n - y_n \right)^{2} }, \qquad \mathrm{AE}_n = \left| \hat{s}_n - s_n \right|$$

where $(x_n, y_n)$ and $s_n$ are the actual position and speed, and $(\hat{x}_n, \hat{y}_n)$ and $\hat{s}_n$ are their estimates. Table 1 lists the position and speed estimation errors. It is worth noticing that more accurate speed estimates are obtained from relatively inaccurate position estimates. This is mainly due to the fact that the speed estimate is derived from two position estimates, and the error in the position estimates is relatively small compared to the distance traveled by the probe between successive estimates [45].
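The two figures of merit described above reduce to a few lines of Python. The tuple layout and the sample values are hypothetical and serve only to show the computation.

```python
import math

def root_square_error(est_xy, true_xy):
    """Euclidean distance between estimated and actual position (m)."""
    return math.hypot(est_xy[0] - true_xy[0], est_xy[1] - true_xy[1])

def absolute_error(est_speed, true_speed):
    """Absolute difference between estimated and actual speed (m/s)."""
    return abs(est_speed - true_speed)

# Hypothetical estimate compared against the simulated truth.
rse = root_square_error((103.0, 54.0), (100.0, 50.0))
ae = absolute_error(7.5, 8.0)
```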
Each of the state estimates after KF and DS is then allocated to a specific road link through the geometric map matching. Table 2 lists the statistics of the correct link identification (CLI) rates from 63 probe routes. As shown in the table, on average 84.92% of the estimates are mapped correctly to the road links. Sources of error in the link identification are due largely to the fact that ...
For each road link, all the mapped speed estimates are accumulated every 10 minutes. They are then aggregated to estimate the average link speed during that 10-minute interval. The resultant average link speed estimations for the selected links are shown in Table 3, where they are compared with the "ground truth" average link speeds recorded from the traffic simulation. As shown in the table, two performance metrics are considered: 1) the estimation accuracy, evaluated by the mean absolute error; and 2) the system coverage, evaluated by the speed estimation availability. The mean absolute error is defined as the absolute difference between the true speed on the link and the estimated speed. The speed estimation availability is the fraction of links that have speed estimates available in the time interval.
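Both performance metrics can be sketched for a single aggregation interval as follows. The dictionaries of link speeds are hypothetical inputs, and averaging the absolute error over the links that have an estimate is an assumption of this sketch rather than a detail confirmed by the text.

```python
def mean_absolute_error(estimated, truth):
    """Mean |true - estimated| speed over links that have an estimate."""
    common = [link for link in truth if link in estimated]
    if not common:
        return None                      # nothing to score in this interval
    return sum(abs(estimated[l] - truth[l]) for l in common) / len(common)

def availability(estimated, monitored_links):
    """Fraction of monitored links with a speed estimate in the interval."""
    covered = sum(1 for link in monitored_links if link in estimated)
    return covered / len(monitored_links)

# Hypothetical speeds (m/s) for one 10-minute interval.
truth = {"1": 8.0, "6": 3.0, "15": 3.5, "22": 12.0}
estimated = {"1": 7.0, "6": 3.5, "22": 12.5}   # no probe crossed link 15
mae = mean_absolute_error(estimated, truth)
coverage = availability(estimated, list(truth))
```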
In addition, we classify the estimated link speeds into three traffic condition levels, i.e., green, yellow, and red: 1) green level (smooth traffic) if the link speed is above 7 m/s; 2) red level (congested traffic) if the link speed is below 4 m/s; 3) yellow level (medium traffic) if the link speed is between 4 m/s and 7 m/s. These two speed thresholds are determined considering the state-of-the-art traffic speed classification in urban areas [8,32]. As can be seen in the table, the congested traffic on links 6 and 15 has been detected. These estimated traffic conditions will be color-coded on the road network and presented on the service subscriber's mobile display.
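The threshold rule is simple enough to state as a function. The treatment of speeds exactly equal to a threshold (assigned to yellow here) is an assumption, since the text does not specify it.

```python
def classify(speed_mps, red_below=4.0, green_above=7.0):
    """Map an average link speed (m/s) to a traffic condition level."""
    if speed_mps > green_above:
        return "green"       # smooth traffic
    if speed_mps < red_below:
        return "red"         # congested traffic
    return "yellow"          # medium traffic, thresholds included

# Hypothetical average link speeds for one interval.
levels = {link: classify(v) for link, v in {"6": 3.2, "12": 5.5, "22": 11.0}.items()}
```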

Conclusion
In this paper, a method of real-time urban traffic state estimation is presented. The method, taking advantage of the recently booming A-GPS mobile phones, potentially solves the problems (e.g., cost and sparse urban coverage) of the current state-of-the-practice traffic systems. Based on the microscopic traffic simulation and field tests, "realistic" A-GPS mobile probe data is emulated and "ground truth" traffic data is generated. The A-GPS location samples are first processed by Kalman filtering and data screening. The resultant position/speed estimates are then allocated to the nearest road links through simple map matching. By aggregating the speed estimates on each road link, traffic states (i.e., average link speeds) are determined every 10 minutes for 1 hour. The simulation results suggest that reliable average link speed estimations can be generated, which are used for indicating the real-time urban road traffic condition. Future work targets a smart traffic information system demonstrator that employs the proposed urban traffic state estimation method.

Figure 2. Vehicle trajectory samples collected by the A-GPS mobile phone.

Figure 3. Simulated road network: (a) image from the OpenStreetMap; and (b) diagram in the eWorld.

Figure 5. Probe location data: (a) individual probe locations generated from SUMOPlayer; and (b) the emulated "realistic" A-GPS location samples.

Figure 6. KF-based tracking: A-GPS measurements vs position estimates after Kalman filtering.

Table 2. Evaluation of link allocation.
used only to calculate the average speed, this level of CLI is not as problematic as in the vehicle navigation and road pricing applications which rely on very accurate vehicle location.