Statistical Model for Estimating Carbon Dioxide Emissions from a Light-Duty Gasoline Vehicle

The objective of this research was to develop a statistical model for estimating vehicle tailpipe emissions of carbon dioxide (CO2). Forty hours of second-by-second emissions data (144,000 data points) were collected using an on-board emissions measurement system (Horiba OBS-1300) installed in a 2007 Dodge Charger. Data were collected on two roadway types, arterial and highway, around Arlington, Texas, and during two time periods, off-peak and peak (both a.m. and p.m.). Multiple linear regression in SAS software was used to build emission models from the data, with velocity, acceleration, and a velocity × acceleration interaction term as predictor variables. The arterial model explained 61% of the variability in emissions; the highway model explained 27%. The arterial model in particular represents a reasonably good compromise between accuracy and ease of use. It could be coupled with velocity and acceleration profiles obtained from a micro-scale traffic simulation model, such as CORSIM, or from field data collected with an instrumented vehicle, to estimate percent emission reductions associated with local changes in traffic system operation or management.


Introduction
According to the US Environmental Protection Agency (EPA), in 2010 mobile sources contributed 58% of US carbon monoxide (CO) emissions, 56% of nitrogen oxide (NOx) emissions, and 33% of volatile organic compound (VOC) emissions [1]. Despite stringent exhaust emissions standards, increases in the number of vehicles in use and a corresponding increase in vehicle miles traveled (VMT) mean that vehicles still account for large percentages of US air pollutant emissions. At the state and regional levels, transportation and air quality engineers are developing various transportation models to help estimate vehicle exhaust emissions. Emissions estimates are important for ascertaining the effects of sources and for developing emissions control strategies [2][3][4]. Emission models could, for example, be used to estimate the emission benefits of intelligent transportation systems (ITS) or of coordinating traffic signals [5]. Because of growing concerns about climate change, models for estimating carbon dioxide emissions from mobile sources are of increasing importance.
A number of vehicle emission models have been developed over the past decades, including the following:

MOVES: The US EPA's Motor Vehicle Emission Simulator (MOVES) replaced the previous MOBILE model in 2010. MOVES2010 estimates CO, NOx, VOC, PM, and greenhouse gas emissions from light-duty vehicles at the project or regional level.

CORSIM: CORSIM is a micro-scale model that estimates emissions using look-up tables based on dynamometer data. CORSIM determines the total emissions on each link by applying default emission rates, based on speed and acceleration, to each vehicle for each second the vehicle travels on the link [6]. The emission factors in CORSIM, however, have not been kept up to date.

MEASURE: The Mobile Emission Assessment System for Urban and Regional Evaluation was developed by the Georgia Institute of Technology to estimate CO, NOx, and VOCs [7,8].

… instantaneous models for estimating CO2 emissions at various speed and acceleration rates, based on portable emission measurement system (PEMS) data from three gasoline vehicles [14].
Several of the models discussed above either do not estimate CO2 emissions or are so sophisticated that they require excessive data inputs. There needs to be a balance between the accuracy and detail of a model and its ease of application. Therefore, the objective of the research described here was to develop a model to predict vehicle CO2 emissions that is simple and practical to use but still accounts for instantaneous speed and acceleration, and thus can be used to evaluate the emission impacts of local-scale changes in traffic system operations, such as traffic signal coordination.

On-Board Data Collection
The study vehicle was a 2007 gasoline-powered Dodge Charger, as an example of a typical full-size car in the US. Information about the car is given in Table 1. Although one vehicle is not enough to provide a statistically meaningful sample of the vehicle fleet, resources were available only for a small test. We hypothesize that percent reductions in emissions due to changes in traffic system operation would likely be similar for other vehicles, even though the absolute magnitude of emissions would be different; this hypothesis, however, would need to be verified.
Data from on-board instruments can facilitate the development of micro-scale emission models [20,26]. Compared with conventional dynamometer testing under carefully controlled conditions, on-road data reflect real driving situations. Accordingly, second-by-second emissions data were collected using a Horiba On-Board Measurement System (OBS-1300). The equipment consists of two on-board gas analyzers, a laptop computer with data logger software, a power supply unit, a tailpipe attachment, and other accessories. The OBS-1300 collects second-by-second measurements of nitrogen oxides (NOx), hydrocarbons (HC), carbon monoxide (CO), carbon dioxide (CO2), exhaust temperature, exhaust pressure, and vehicle position (via a global positioning system, or GPS). HC, CO, and CO2 are measured using heated non-dispersive infrared (HNDIR) analysis, and NOx is measured using a non-sampling zirconium sensor. Although the instrument measured other pollutants, this work focused on building a model for CO2 emissions, because of current interest in CO2 emissions due to climate change, and because fewer existing models treat CO2 than treat the other pollutants.
For the measurement scale used, the accuracy of the CO2 measurements, which are reported in percent, was 0.3%. A 2-second lag in the CO2 measurements was accounted for in the data spreadsheet. Differences in vehicle position over time were used to calculate vehicle velocity, and differences in velocity over time were used to calculate vehicle acceleration. Routine instrument calibration and warm-up were carried out each day before the start of data collection, and the sensor was also calibrated weekly as required by the protocol. Maintenance and diagnostic procedures were conducted as required.

Forty hours of second-by-second emissions data were collected, totaling 144,000 data points: 20 hours of arterial data and 20 hours of highway data. The arterial test route, in Arlington, Texas, was a rectangle bounded by Division St., Collins St., Pioneer Parkway, and Cooper St. on the north, east, south, and west, respectively. Vehicle velocity on the arterial ranged from 0 to 54 miles per hour. The highway test route, centered on Arlington, Texas, was a rectangle bounded by I-30, Spur-408 and Loop-12, I-20, and 820 on the north, east, south, and west, respectively. Vehicle velocity on the highway ranged from 0 to 85 miles per hour. Each route was driven in one direction only.

Twenty hours of data were collected during peak traffic conditions and 20 hours during off-peak conditions. Peak hours were defined as 6:30 a.m. to 9:00 a.m. (morning peak) and 4:00 p.m. to 6:30 p.m. (evening peak); off-peak hours were defined as 9:00 a.m. to 4:00 p.m. Previous data analysis had shown that emissions were not statistically different between the a.m. and p.m. peaks, because traffic conditions were similar and drivers therefore drove similarly. Data were not collected on Monday mornings or Friday afternoons because of variability in traffic density.
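The velocity and acceleration derivation and the 2-second lag correction described above can be sketched as follows. This is a minimal Python illustration, not the authors' spreadsheet procedure: it assumes one GPS fix per second given as (latitude, longitude) pairs, uses the haversine great-circle distance, and assumes the lag is handled by pairing the driving state at second t with the CO2 reading at second t + 2. The function names are hypothetical.

```python
import math

EARTH_RADIUS_MI = 3958.8  # mean Earth radius, in miles


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two GPS fixes, in miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MI * math.asin(math.sqrt(h))


def velocity_acceleration(fixes, dt_s=1.0):
    """Finite differences on (lat, lon) fixes sampled every dt_s seconds.

    Returns velocities in mph (one per interval, len(fixes) - 1 values) and
    accelerations in mph/s (one per pair of intervals, len(fixes) - 2 values).
    """
    vel = [
        haversine_miles(*fixes[i], *fixes[i + 1]) / dt_s * 3600.0
        for i in range(len(fixes) - 1)
    ]
    acc = [(vel[i + 1] - vel[i]) / dt_s for i in range(len(vel) - 1)]
    return vel, acc


def align_lag(co2, lag_s=2):
    """Drop the first lag_s readings so that co2[t + lag_s] lines up with
    driving state t (assumes one reading per second)."""
    return co2[lag_s:]
```

After alignment, `zip(vel, acc, align_lag(co2))` pairs each second's driving state with its lagged CO2 reading and truncates to the shortest series.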
Since the driver was the same for all runs, there was no variability due to differing driving habits of different drivers.

Statistical Modeling
In general, a multiple linear regression model can be represented as shown in Equation (1) [27]:

$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_{p-1} X_{i,p-1} + \varepsilon_i \qquad (1)$$

where $Y_i$ is the response for observation $i$, $X_{ik}$ is the value of the $k$th predictor variable for observation $i$, $\beta_k$ ($k = 0, 1, 2, \ldots, p-1$) are the parameters, with $\beta_0$ the intercept of the regression plane, $p$ is the number of parameters, and $\varepsilon_i$ is the random error in $Y$ for observation $i$.

The response variable was carbon dioxide (CO2) emissions, in grams per second. Previous models developed for estimating fuel consumption or emissions have shown correlations with velocity, acceleration, and/or power demand, which is why these variables were chosen [5,28–32]. Vehicle specific power (a function of speed, acceleration, and road grade) has been found to be important in previous studies [33,34] and was considered as a potential variable; however, it was not used because road grades were assumed to be negligible for the data collected in this project. According to a study by Boriboonsomsin and Barth (2009), CO2 emissions over flat terrain are 15%–20% lower than those over hilly terrain. Since our routes were not as hilly as those tested by Boriboonsomsin and Barth, the potential error associated with this assumption would likely be substantially less than 15% [34].
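Fitting Equation (1) by ordinary least squares, and computing the coefficient of determination used to compare the candidate models, can be sketched in plain Python. This is an illustrative stand-in for the SAS regression procedure, not a reproduction of it: it solves the normal equations (X'X)b = X'y by Gaussian elimination, which is adequate for a handful of predictors but less numerically robust than the QR-based solvers in statistical packages. The function names are hypothetical and the example data are synthetic.

```python
def solve_linear(a, b):
    """Solve a x = b (square system) by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(rows, y):
    """Least-squares fit of y on the predictor rows plus an intercept.

    Returns (beta, r2): beta[0] is the intercept beta_0 and beta[k] multiplies
    the kth predictor, as in Equation (1).
    """
    design = [[1.0] + list(r) for r in rows]
    p = len(design[0])  # number of parameters, including the intercept
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    beta = solve_linear(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, d)) for d in design]
    y_bar = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ssto = sum((yi - y_bar) ** 2 for yi in y)
    return beta, 1.0 - sse / ssto


# Usage: predictors Vel, Acc and their product (power demand), synthetic response.
rows = [(v, a, v * a) for v in (0, 10, 20, 30, 40) for a in (0.0, 1.0, 2.0, 3.0)]
y = [0.5 + 0.02 * v + 1.2 * a + 0.05 * pd for v, a, pd in rows]
beta, r2 = fit_ols(rows, y)
```

Because the synthetic response has no noise, the fit recovers the generating coefficients and R² is essentially 1; real second-by-second data would of course give a lower value.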
Our models were based on 5-second rolling averages, which smooth the data and reduce the impact of the lag in GPS response to changes in speed and direction. The a.m. and p.m. peak data were combined into one peak dataset, because previous analysis had shown no statistically significant difference between the two. Separate models were required for arterial and highway data because the two facilities were driven over different speed regimes: the arterial speed limit was 45 miles/hour, while the highway speed limit was 75 miles/hour.
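The 5-second smoothing can be sketched as a rolling mean. The paper does not say whether its window was trailing or centered; this hypothetical helper assumes a trailing window, so each output averages the current second and the four before it, and the first four seconds, which have no full window, are dropped. Applying the same window to velocity, acceleration, and CO2 keeps the three series aligned.

```python
def rolling_mean(values, window=5):
    """Trailing rolling mean; output k averages values[k : k + window]."""
    if window < 1:
        raise ValueError("window must be at least 1")
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v  # add the newest second
        if i >= window:
            running -= values[i - window]  # drop the second that left the window
        if i >= window - 1:
            out.append(running / window)
    return out
```

A running sum keeps the cost linear in the number of samples, which matters little for a sketch but is convenient for 144,000 points.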
Checks were performed for constant variance (modified Levene test), normality of residuals, outlier influence (Bonferroni test), and multicollinearity (variance inflation factors). Transformation of the data was not necessary, since the normality and constant variance checks were satisfactory.
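The modified Levene test for constant variance can be sketched as the two-group Brown–Forsythe statistic described by Kutner et al. [27]: residuals are split into two groups (here, as an assumption, at the median of a chosen variable such as velocity or the fitted values), each residual is replaced by its absolute deviation from its group's median, and a pooled two-sample t statistic compares the group means. The function name and split rule are hypothetical. With tens of thousands of observations per facility, |t| > 1.96 would suggest non-constant variance at α = 0.05 under a normal approximation.

```python
import math
import statistics


def brown_forsythe_t(residuals, split_values):
    """Two-group modified Levene (Brown-Forsythe) t statistic.

    Residuals are split at the median of split_values; each residual is
    replaced by its absolute deviation from its group's median residual, and
    the two group means are compared with a pooled two-sample t statistic.
    """
    cut = statistics.median(split_values)
    g1 = [e for e, x in zip(residuals, split_values) if x <= cut]
    g2 = [e for e, x in zip(residuals, split_values) if x > cut]
    if len(g1) < 2 or len(g2) < 2:
        raise ValueError("each group needs at least two residuals")
    m1, m2 = statistics.median(g1), statistics.median(g2)
    d1 = [abs(e - m1) for e in g1]
    d2 = [abs(e - m2) for e in g2]
    mean1, mean2 = statistics.fmean(d1), statistics.fmean(d2)
    pooled_ss = sum((d - mean1) ** 2 for d in d1) + sum((d - mean2) ** 2 for d in d2)
    s = math.sqrt(pooled_ss / (len(d1) + len(d2) - 2))
    if s == 0.0:
        raise ValueError("absolute deviations have zero spread")
    return (mean1 - mean2) / (s * math.sqrt(1 / len(d1) + 1 / len(d2)))
```

For example, residuals whose spread grows with the split variable produce a large negative statistic, flagging non-constant variance.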
To identify the best model in SAS, three methods were employed: best subset selection, stepwise regression, and backward deletion [27]. Best subset selection uses the branch-and-bound algorithm to find a specified number of best models containing one, two, three, and more variables, up to the single model containing all of the explanatory variables. Stepwise regression combines the backward elimination and forward selection methods: variables can be added or removed at any step, and the process ends by selecting the "best" model. Backward deletion starts with all predictor variables in the model and removes them one at a time, each time dropping the variable with the highest p-value, as long as that p-value exceeds the critical alpha (0.05).
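Of the three procedures, best subset selection is the easiest to illustrate without SAS. The sketch below is hypothetical and swaps SAS's branch-and-bound search for plain exhaustive enumeration, which is cheap with only five candidate predictors (31 subsets). For each subset size it keeps the subset with the highest R² and reports adjusted R² for comparisons across sizes. Stepwise regression and backward deletion also need p-values from the t distribution, which are omitted here to keep the example dependency-free.

```python
from itertools import combinations


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def r_squared(columns, y):
    """R^2 of a least-squares fit of y on an intercept plus the given columns."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    beta = solve_linear(xtx, xty)
    y_bar = sum(y) / n
    sse = sum((yi - sum(b * v for b, v in zip(beta, d))) ** 2 for d, yi in zip(design, y))
    ssto = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - sse / ssto


def adjusted_r_squared(r2, n, p):
    """Adjusted R^2; p counts all parameters, including the intercept."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p)


def best_subsets(candidates, y):
    """For each subset size k, keep the predictor subset with the highest R^2.

    Exhaustive enumeration (2**m - 1 fits for m candidates), not SAS's
    branch-and-bound search. Returns {k: (names, r2, adjusted_r2)}.
    """
    n = len(y)
    names = list(candidates)
    best = {}
    for k in range(1, len(names) + 1):
        for subset in combinations(names, k):
            r2 = r_squared([candidates[name] for name in subset], y)
            if k not in best or r2 > best[k][1]:
                best[k] = (subset, r2, adjusted_r_squared(r2, n, k + 1))
    return best
```

The candidate dictionary would hold the Vel, Acc, Dec, PD, and TD columns; the final choice still weighs simplicity and coefficient signs, as the model comparisons that follow do.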
Arterial Model Choice A

Table 2 compares the number of predictor variables and fitting parameters for the three potential arterial models. The recommended model is C, based on its high coefficient of determination (R²) and adjusted coefficient of determination (adjusted R²) values, its simplicity, and coefficient signs that correspond to reality (emissions increasing with velocity and acceleration). The R² value for Model C was almost as high as for Models A and B (0.605 vs. 0.619), but Model C includes fewer groups of predictor variables (3 vs. 6 or 5). The vehicle velocity coefficient in Model C is positive, which shows that CO2 emissions increase as vehicle velocity increases. This is consistent with reality: Tong et al. (2000) found fuel consumption (mass/time) to increase with instantaneous vehicle speed, because vehicles must consume more fuel to generate enough power and maintain engine operation at higher speeds [22]. Carbon dioxide emissions, which are proportional to fuel consumption, would then also increase with instantaneous vehicle speed [29]. Similarly, the acceleration coefficient in the model is positive, as is the coefficient of the interaction term, velocity × acceleration (or power demand), which would be expected. Tong et al. (2000) found that the higher the acceleration rate, the more fuel is needed per second [22]. Fuel consumption therefore increases with acceleration, and carbon dioxide emissions would also increase with acceleration. According to Clark et al. (2003), carbon dioxide emissions should increase as vehicle power increases [29].

Given the large set of second-by-second data collected and the number of variables examined, the overall R² value of 0.605 for CO2 represents a reasonably good model fit. It means that 61% of the total variation in the CO2 mass emission rate is explained by the chosen arterial model. The remaining 39%, which is not accounted for by the model, may be due to variations in factors such as road grade, weather conditions, air conditioning usage, tire pressure, road surface conditions, and total vehicle weight (which may have changed with the number of passengers). Adding these factors as predictor variables could increase the share of emissions variability that is explained. All final model variables are statistically significant: the p-values in the SAS output were all less than 0.01.
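For reference, the two fit statistics used to compare the candidate models are defined below in the notation of Equation (1), with n observations, p parameters, and fitted values Ŷ_i, following the usual textbook definitions [27]. As a rough check, assuming about 72,000 arterial observations (20 hours at one reading per second, before smoothing) and p = 4 parameters for Model C (intercept, velocity, acceleration, interaction), the factor (n − 1)/(n − p) is about 1.00004, so adjusted R² is practically identical to R² = 0.605 at this sample size.

```latex
\begin{aligned}
R^2 &= 1 - \frac{SSE}{SSTO}, &
R_a^2 &= 1 - \frac{n-1}{n-p}\,\frac{SSE}{SSTO}, \\
SSE &= \sum_{i=1}^{n} \bigl(Y_i - \hat{Y}_i\bigr)^2, &
SSTO &= \sum_{i=1}^{n} \bigl(Y_i - \bar{Y}\bigr)^2 .
\end{aligned}
```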

Highway Model
The highway model was developed from the highway data, following a procedure similar to the one used to select the best arterial model. The top three highway models considered are shown in Equations (5), (6), and (7) below.
Highway Model Choice D

Highway Model Choice E

Highway Model Choice F

$$CO_2 = 0.765 + 0.026\,Vel + 1.54\,Acc + 0.060\,(Vel \times Acc) \qquad (7)$$

Table 3 compares the number of predictor variables and fitting parameters for the three potential highway models. The recommended model is F, based on its R² and adjusted R² values, its simplicity, and coefficient signs that correspond to reality (emissions increasing with velocity and acceleration). The R² value for Model F was almost as high as for Models D and E (0.265 vs. 0.268), but Model F includes fewer groups of predictor variables (3 vs. 5 or 6). All final model variables are statistically significant. The coefficients of both the velocity and the acceleration terms are higher in the highway model than in the arterial model. This indicates that, on the highway, a given change in velocity or acceleration produces a larger change in CO2 emissions than on the arterial.

Applicability of the Models
The models should only be used within the ranges of the data used to develop them. For the arterial model, these are velocities of 0 to 54 miles per hour, accelerations of 0 to 4.9 miles per hour per second, and power demands of 0 to 119 mi²/hr²/s. For the chosen highway model, they are velocities of 0 to 85 miles per hour, accelerations of 0 to 4.5 miles per hour per second, and power demands of 0 to 315 mi²/hr²/s. Separate models may have been needed for the two roadway types because of the different velocities measured on each (primarily below 45 miles per hour on the arterial and above 45 miles per hour on the highway). Because the arterial model had the higher R² value (0.61), users can have more confidence that it accounts for a majority of the variability in emissions, even though it is a simple model that depends only on velocity, acceleration, and their interaction. The highway model, because of its lower R² value (0.27), should be applied only with caution, since most of the variability in emissions is not accounted for.
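A simple guard against extrapolation can encode the calibration ranges listed above. This hypothetical helper only flags out-of-range inputs; what to do with them (discard, clip, or switch models) is left to the user. Decelerating seconds (negative acceleration) are flagged because the stated acceleration ranges start at 0.

```python
# Calibration ranges reported for each model:
# velocity in mph, acceleration in mph/s, power demand (Vel x Acc) in mi^2/hr^2/s.
VALID_RANGES = {
    "arterial": {"vel": (0.0, 54.0), "acc": (0.0, 4.9), "pd": (0.0, 119.0)},
    "highway": {"vel": (0.0, 85.0), "acc": (0.0, 4.5), "pd": (0.0, 315.0)},
}


def out_of_range(model, vel, acc):
    """Return the names of inputs that fall outside the range used to fit the model."""
    limits = VALID_RANGES[model]
    values = {"vel": vel, "acc": acc, "pd": vel * acc}
    return [name for name, (lo, hi) in limits.items() if not lo <= values[name] <= hi]
```

Note that velocity and acceleration can each be within range while their product is not, which is why power demand is checked separately.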
The models developed in this paper are limited to predicting emissions from the Dodge Charger vehicle tested, and clearly cannot be generalized to the entire vehicle fleet.Using the methodology described in the paper, similar models could be built for other vehicles.In addition, we hypothesize that percent reductions in emissions due to changes in traffic system operation would likely be similar for other vehicles besides the Charger, even though the absolute magnitude of emissions would be different; this hypothesis, however, would need to be verified.
As an example application, the arterial emission model could be used to estimate the emission benefits of coordinating traffic signals. This would require velocity and acceleration profiles before and after signal coordination. These profiles could be obtained from a micro-scale traffic simulation model, such as CORSIM, or by collecting field data with a vehicle instrumented with a GPS receiver to capture second-by-second position data, from which instantaneous velocity and acceleration can be calculated. Emissions for a section of roadway could then be computed on a second-by-second basis from the velocity and acceleration profiles and summed over the section, both before and after signal coordination. A percent reduction in CO2 emissions could then be determined for the section. If the percent reductions for the Charger are assumed to be similar to those for other vehicles, the computed percent reduction would be a reasonable estimate for the roadway.
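The before-and-after procedure above can be sketched as follows. The coefficients are hypothetical placeholders, not the fitted Model C values, which are not reproduced here; the model form (intercept, velocity, acceleration, and velocity × acceleration) follows the text. Clipping negative accelerations to zero is an additional assumption, since the arterial model was fit only for accelerations of 0 or more.

```python
# Hypothetical placeholder coefficients (g/s); replace with the fitted arterial
# Model C values: intercept, velocity, acceleration, velocity x acceleration.
B0, B_VEL, B_ACC, B_VA = 0.5, 0.02, 1.0, 0.04


def co2_rate(vel_mph, acc_mph_s):
    """Predicted CO2 rate (g/s) for one second; deceleration is clipped to zero
    (an assumption, since the arterial model was fit for accelerations >= 0)."""
    acc = max(acc_mph_s, 0.0)
    return B0 + B_VEL * vel_mph + B_ACC * acc + B_VA * vel_mph * acc


def section_total(profile):
    """Sum second-by-second predictions over (velocity, acceleration) pairs."""
    return sum(co2_rate(v, a) for v, a in profile)


def percent_reduction(before, after):
    """Percent reduction in section CO2 from the 'before' to the 'after' profile."""
    total_before = section_total(before)
    return 100.0 * (total_before - section_total(after)) / total_before


# Toy profiles: 20 s of idling at a red signal plus 40 s at 30 mph, versus the
# same 40 s of cruising with the stop removed by coordination.
before = [(0.0, 0.0)] * 20 + [(30.0, 0.0)] * 40
after = [(30.0, 0.0)] * 40
print(f"{percent_reduction(before, after):.1f}% reduction")  # prints "18.5% reduction"
```

Because only the ratio of totals is reported, the percent reduction is less sensitive to the absolute level of the coefficients than the totals themselves, which is the property the vehicle-similarity hypothesis relies on.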
The small number of input variables means that the arterial model could be readily incorporated into a microscale traffic simulation model, in order to estimate emissions.

Conclusions and Recommendations
Micro-scale CO2 emission models were developed for a light-duty gasoline vehicle, using SAS software to analyze on-road measurements of vehicle speed and exhaust emissions. The final multiple linear regression functions for CO2 for arterials and highways included velocity, acceleration, and a velocity × acceleration interaction term. The arterial and highway models explained 61% and 27% of the variability in emissions, respectively. In future work, including additional factors such as road grade, weather conditions, air conditioning usage, tire pressure, road surface conditions, and total vehicle weight could account for additional variability.
The arterial model can be used to evaluate proposed emissions control strategies. In particular, it can be used to evaluate proposed changes in local traffic system operation or management, since it is presumably sensitive to a vehicle's modal changes (idling, cruising, accelerating, and decelerating). The arterial emissions model could be coupled with velocity and acceleration profiles obtained from a micro-scale traffic simulation model, such as CORSIM, or from field data from an instrumented vehicle, to estimate percent emission reductions associated with control strategies such as traffic signal coordination.
To achieve more definitive results, similar on-road emissions data could be collected from more test vehicles, in addition to the Dodge Charger used in this study.Future research could determine whether percent reductions in emissions due to changes in traffic system operation would likely be similar for other vehicles, even though the absolute magnitude of emissions would be different.In addition, emission estimates from this model should be validated using additional Charger data, and compared to estimates from other available models.
… of the regression plane; $\beta_k$ = parameters ($k = 0, 1, 2, \ldots, p-1$); $p$ = number of parameters; $\varepsilon_i$ = random error in $Y$ for observation $i$.

CO2 emissions were used as the response variable, with average velocity, acceleration, deceleration, power demand, and time of day (peak/off-peak) as the predictor (independent) variables. Statistical Analysis System (SAS) software was employed for the data analysis. Six variables, including the response variable, were considered during the model-building process, and the most significant variables were left in the model at the end of the process. The predictor variables were:
- Vehicle velocity (Vel), in miles per hour
- Vehicle acceleration (Acc), in miles per hour per second
- Vehicle deceleration (Dec), in miles per hour per second
- Power demand (PD = Vel × Acc), in mi²/hr²/s
- Time of day (TD, traffic period), unitless: 1 for peak, 0 for off-peak
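Constructing the five candidate predictors for one smoothed second of data can be sketched as below. Two details are assumptions rather than statements from the paper: signed acceleration is split into a non-negative Acc and a non-negative deceleration magnitude Dec, and the TD indicator uses half-open windows, so 9:00 a.m. counts as off-peak and 4:00 p.m. as peak. Power demand is computed as Vel × Acc, consistent with the non-negative power demand ranges reported for the models. The function names are hypothetical.

```python
from datetime import datetime, time

# Peak windows from the data collection protocol; half-open [start, end) is an assumption.
PEAK_WINDOWS = [(time(6, 30), time(9, 0)), (time(16, 0), time(18, 30))]


def time_of_day(ts: datetime) -> int:
    """TD indicator: 1 during the a.m. or p.m. peak, 0 otherwise (off-peak)."""
    t = ts.time()
    return int(any(start <= t < end for start, end in PEAK_WINDOWS))


def predictors(vel_mph, accel_mph_s, ts):
    """Candidate predictors for one smoothed second of data.

    Hypothetical split: Acc keeps positive acceleration, Dec keeps the magnitude
    of negative acceleration, and PD = Vel x Acc.
    """
    acc = max(accel_mph_s, 0.0)
    dec = max(-accel_mph_s, 0.0)
    return {"Vel": vel_mph, "Acc": acc, "Dec": dec, "PD": vel_mph * acc, "TD": time_of_day(ts)}
```

For instance, a vehicle slowing at 1.5 mph/s at 7:15 a.m. yields Acc = 0, Dec = 1.5, PD = 0, and TD = 1.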

The overall R² value of 0.27 for CO2 demonstrated a …

MEASURE estimates exhaust emissions as a function of vehicle operating modes, such as cruise, acceleration, deceleration, and idling, rather than average vehicle speed. However, the model does not estimate CO2 emissions, and it contains over 30 variables, making it data-intensive to use.
INTEGRATION: INTEGRATION is a trip-based microscopic traffic assignment, simulation, and optimization model. It can predict emissions from computed fuel consumption as a function of velocity and acceleration obtained from a dynamometer test [6].

EMIT: Based on dynamometer data for 344 light-duty vehicles, the EMIT (EMIssions from Traffic) model estimates CO2, CO, HC, and NOx using a regression equation with speed and acceleration as explanatory variables [10].

Ahn et al. (2002) developed regression models to estimate light-duty vehicle emissions of CO, HC, and NOx based on instantaneous speed and acceleration levels. A model for CO2 emissions was not developed, although the model for fuel consumption could be used as a surrogate [5].

Table 3. Comparison of the number of predictor variables and fitting parameters for the potential highway models.
This unexplained variability may be due to factors such as those mentioned above in the discussion of the arterial model. It may also be due to the fact that the data used in developing the highway model included speeds below 45 miles per hour, which are representative of speeds on an arterial facility and may reflect anomalous conditions on the highway. Future research using freeway data should exclude velocities below 45 miles per hour.