The Sensitivity of Model Results to Specification of Network-Based Level of Service Attributes: An Application of a Mixed Logit Model to Travel Mode Choice

The need for travel demand models is growing worldwide. Obtaining reasonably accurate level of service (LOS) attributes of different travel modes, such as travel time and cost, which represent the performance of the transportation system, is not a trivial task, especially in growing cities of developing countries. This study investigates the sensitivity of the results of a travel mode choice model to different specifications of network-based LOS attributes using a mixed logit model. The study also looks at the possibilities of correcting some of the inaccuracies in network-based LOS attributes. Further, the study explores the effects of different specifications of LOS data on implied values of time and aggregate forecasting. The findings indicate that the implied values of time are very sensitive to the specification of data and model, implying that utmost care must be taken if the purpose of the model is to estimate values of time. Models estimated on all specifications of LOS data perform well in prediction, likely suggesting that the extra expense of developing more detailed and accurate network models so as to derive more precise LOS attributes is unnecessary for impact analyses of some policies.


Introduction
The need for travel demand models is growing due to rising travel activities in response to increasing incomes and urban population in large cities in many developing countries in Asia and Africa. Detailed and accurate data relating to land use, transportation systems and their performance, and people's travel behavior including their socioeconomic characteristics are needed to estimate travel demand models. Such detailed data are often not collected routinely in most developing countries. Even if the data are available, they may not be accurate enough. Data on travel behavior and socioeconomic characteristics are obtained from travel surveys, while the data relating to transportation level of service (LOS) attributes are mostly obtained from zonal-based network models. Developing detailed and correct network models that can produce reasonably accurate estimates of LOS attributes is not trivial [1]. Not all cities have adequate resources, including human resources, money and technology, to develop such a network. Sometimes the LOS attributes have to be derived in a short time. Many rapidly growing cities in Asia and Africa lack appropriate network models and hence LOS attributes, which seriously constrains travel demand modeling. However, data limitations are not confined to developing countries. Even a highly developed country like Norway may sometimes lack accurate data to model travel behavior. The modeler therefore has to use the available data, and modeling activities should consequently account for such data limitations [2]. A LOS variable used in a model may contain a mixture of systematic and random errors, or the errors may have no particular pattern. Some of the errors can be known, for example a missing toll, and can easily be corrected later, while others are unknown and cannot be corrected.
In light of the situations discussed above, this paper explores the sensitivity of model results to different specifications of network-based LOS attributes using a mixed logit (ML) model, with mode choice for work trips in Oslo as an example. Specifically, we used two different data sets of network-based LOS attributes. The first data set of LOS attributes ("striplos") was obtained from network models developed in 2002 for the whole country. The LOS attributes were derived from scratch for the whole country with limited resources and within a short time. Travel times by car were taken from an uncongested road network although most car drivers in the Oslo-region experience congestion. Coding of road tolls on the toll cordon in Oslo was missing. Public transit fares were estimated as a function of distance despite the fact that Oslo has a flat fare system and the remaining region has a fare system based on the number of fare-zones traversed. Some public transit routes were also missing. Those LOS attributes were used to estimate national transport models for the whole country [3] based on a national travel survey conducted in 2001.
Striplos had some obvious deficiencies with respect to the cost of travel by public transport and car. These are relatively easy to detect and correct. Errors that stem from the coding of the road network and public transport routes are more difficult to detect and correct. In preparing the data, we made the same corrections that were made in estimating the national models. We used the same values for both directions if LOS attributes were missing for one direction. We also checked for unreasonable directional asymmetry of attributes for car driving.
The second set of LOS attributes ("nylos"), on the other hand, was obtained from a well-established network model. This network model has existed in the Oslo-region since 1990 and has been continuously updated and improved. Travel time by car for the morning peak was also available. It was assumed that a return trip in the afternoon peak would take approximately the same time, although this is not necessarily true. Public transport assignment was also based on another network model that, it is believed, should give better estimates of the different travel time components of public transport. The coding of the road network was also better and more detailed. Nylos overcomes most of the deficiencies that striplos had, so it is in general a typical LOS data set describing transportation system performance. Presumably, the LOS attributes of nylos should be more accurate than those of striplos. However, nylos is by no means a perfect data set either. Nylos was used to re-estimate the simultaneous mode/destination choice model for work trips in the Oslo-region estimated for the national travel demand model [4].
There could be many instances in many cities, especially in developing countries, where LOS attributes have to be obtained with limited resources with respect to time, money and technology. The purpose of this paper is therefore to investigate the implications of using LOS attributes measured at different levels of accuracy for model results, including forecasting. We also explore whether it is possible to correct for the aforesaid limitations relating to network-based LOS attributes.
As the type and severity of errors in different variables may vary from case to case, the results presented in this paper can only be an example of the consequences of estimating the same model on two different sets of LOS data, of which one presumably is of better quality than the other. As long as we are unable to quantify the quality of a data set and relate this in a meaningful way to the results of model estimation, it is impossible to draw general conclusions.
The remainder of the paper is organized as follows. Section 2 discusses the current state of knowledge relevant to the study. Section 3 describes the data. Section 4 explains the theoretical background and modeling approach. Section 5 presents the results and discussion, followed by conclusions in Section 6.

Review of Literature
This section briefly reviews the literature related to data accuracy and model results, and disaggregate travel mode choice model estimation with different specifications of network-based LOS attributes.

Data Accuracy and Model Results
Alonso [5] investigates the implications of imperfect data for modeling and prediction. His investigation is not related to transportation, but his conclusions are generally applicable to all fields that use statistical analysis and modeling, including transportation. Based on simple numerical exercises, he generalizes a few rules of thumb for model building as follows: 1) avoid inter-correlated variables, 2) add if possible, 3) multiply or divide if addition is not applicable, and 4) avoid taking differences or raising variables to powers as far as possible. He concludes in general that it is the correlation of input variables that causes large errors in outcome variables, so he suggests avoiding correlated variables. Most of the LOS attributes used in travel demand modeling are highly correlated. Given Alonso's prescription, we could somewhat reduce the output errors if we could exclude the correlated variables from the model. He also suggests using simpler models if the input data are not that accurate. Given Alonso's thesis, the formulation of a model may also help minimize the output errors. Unfortunately, we cannot exclude cost and time, which are highly correlated variables, when estimating travel demand models.
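The intuition behind the "add if possible, avoid differences" rules can be illustrated with a small numerical sketch. It assumes independent measurement errors and first-order error propagation, which is a simplification and not necessarily the exact setting of Alonso's exercises; the input values and the 5% error level are hypothetical.

```python
import math

def abs_error_of_sum_or_difference(err_x, err_y):
    """First-order absolute error of x + y or x - y, assuming independent errors."""
    return math.hypot(err_x, err_y)

# Hypothetical inputs: two measured quantities, each with a 5% measurement error.
x, y = 100.0, 90.0
err_x, err_y = 0.05 * x, 0.05 * y

# The absolute error is the same for the sum and the difference ...
abs_err = abs_error_of_sum_or_difference(err_x, err_y)

# ... but the relative error depends on the size of the result.
rel_err_sum = abs_err / abs(x + y)    # adding keeps the relative error small
rel_err_diff = abs_err / abs(x - y)   # differencing similar values inflates it

print(f"relative error of x + y: {rel_err_sum:.3f}")
print(f"relative error of x - y: {rel_err_diff:.3f}")
```

With these illustrative numbers the relative error of the sum is roughly 3.5%, while that of the difference is roughly 67%, which is why differences of similar-sized, imprecisely measured variables are best avoided.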
Later, Daly and Ortuzar [6] and Ortuzar and Willumsen [1] apply Alonso's original ideas to transportation. Daly and Ortuzar [6] theoretically and empirically explore data aggregation in travel demand modeling, different types of errors in modeling and forecasting, and the trade-off between model complexity and data accuracy, with a focus on the forecasting of mode and destination choice. They recommend that 1) model building should take into account the efficient allocation of modeling resources, 2) errors, especially those which violate basic assumptions of the model, should be minimized, and 3) since measurement error is an important component of the overall error in modeling, it should be minimized given the budget. They thus emphasize the most efficient allocation of modeling resources.

Mode Choice Model with Different Specifications of Network-Based LOS Attributes
Interest in mode choice model estimation with different specifications of network-based LOS attributes is not new. We briefly review the studies in this section. Reid and Small [7] investigate the effects of using temporal disaggregation of trip data on traveler behavior models. Their main findings are: 1) peak average variables tend to underestimate headways for public transport users and in-vehicle times for car trips; 2) model coefficients become biased, and the magnitude of the bias can be quite severe in relatively complex choice functions.
Train [8] explores the sensitivity of parameter estimates to data specification in a logit model for travel mode choice. He analyzes the effects of correcting some of the inaccuracies in the network LOS attributes on the estimated parameters. He compares the parameter estimates of models estimated on the standard network data and on data that were temporally and spatially adjusted to correct the problems in the standard network data. He tentatively concludes the following: 1) temporal adjustment of the standard network data is perhaps advisable for analyzing policies affecting transfer wait times; 2) spatial adjustment seems advisable for policies affecting distances to bus stops; and 3) no adjustment seems to be needed for analyzing policies that affect neither walk times nor transfer wait times. However, he evaluates the sensitivity of the parameter estimates just by "eye-balling", without taking into consideration the variances of the estimates, their relative magnitudes, and the impacts on aggregate forecasting. Further, he is not sure whether the adjustments to the standard network data yield better estimates of the values of walk and transfer wait times. His findings therefore do not appear to be conclusive.
Similarly, Ortuzar and Ivelic [9] investigate the effect of using more precise measures of the variable 'waiting time' in public transport modes. They conclude that more detailed values clearly result in better models. This entailed replacing crude measures based on the average frequency at different distances to the central business district with more accurate values obtained with the aid of state-of-the-art public transport assignment models (cited in [6]).
Further, Ortuzar and Ivelic [10] examine both in theory and in practice the problem of using less than fully disaggregate data in estimating a logit model of travel mode choice for the trip to work. They replaced peak average values of travel times by more precise values for each traveler depending on the exact time of the trips. They estimated the models with and without temporally disaggregate data on travel times. Contrary to their own earlier findings [9], they could not conclude that the models estimated on temporally disaggregate data were significantly better or had more stable parameter estimates. They suggest assigning priority to cost over accuracy of model results in such cases.
Recently, Steimetz and Brownstone [11] use a multiple imputation approach to overcome the problem of noisy data when estimating a mode choice model to derive commuters' value of time. Similarly, to solve the problem of sparse data, Monzon and Rodriguez-Dapena [12] use a double weighted estimator to estimate mode choice models for long-distance trips. They successfully validated the method in a case study of the Madrid-Barcelona interurban corridor in Spain. They claim that their results allow a cheaper survey procedure for interurban transportation planning activities.
Daly and Ortuzar [6] mention in their paper that the coefficients of the detailed models are significantly better than those of the models estimated on spatially aggregated data of LOS attributes. From the review of the empirical studies of disaggregate mode choice model estimation using LOS attributes measured at different levels of accuracy, they apparently conclude that the accuracy of LOS attributes required to estimate mode choice models depends on the relative importance of the various considerations and on the context. Finally, they recommend that the modeling process should take into consideration the most efficient allocation of modeling resources. It is generally accepted that errors in input data create errors in models, and often those errors can become far more serious in the model than they appear in the data. It is also widely accepted that developing an accurate and detailed network model to derive sufficiently accurate LOS attributes is not trivial [1]. Daly and Ortuzar [6] therefore emphasize that errors, especially those which violate basic assumptions of a model, should be minimized.
Studies on the problem of disaggregate travel mode choice model estimation using LOS attributes measured at different levels of accuracy have been conducted in different situations, for specific problems and with different assumptions. Additionally, those studies focus on different variables and do not have consistent findings. Consequently, it is difficult to draw general conclusions. The study in this paper examines the effects of using network LOS attributes measured at different levels of accuracy on the relative magnitudes of the coefficients, specifically the value of time implied by travel demand models, and on aggregate forecasting. Modelers will frequently encounter the situation dealt with in this paper.
We generally expect that the required accuracy of LOS attributes may depend to some extent on the purpose of developing a model [6,13]. A systematic error in one or more variables may not necessarily have severe consequences if model applications also use input data with the same systematic error. The parameter estimates will in most cases be robust to the error. But a systematic error may have severe consequences if the purpose is to estimate the implied value of travel time savings for different modes. Random measurement errors always bias parameters in an unpredictable way [14].

Data
In this study we use data from the Norwegian national travel survey undertaken in 2001, supplemented by a similar travel survey undertaken in the Oslo-region in the same period and by the LOS attributes of the transportation system obtained from network models. The survey randomly selected 20,751 people. The respondents were asked about the socioeconomic characteristics of the household, their travel activities including daily travels, long travels, employment, work travel, spouse/cohabitant, household, household access to transport resources, and detailed information about the interviewees. A detailed description of the design and conduct of the survey, the characteristics of the sample, and the questionnaire administered can be found in Denstadli, et al. [15]. The travel survey therefore provides information on the actual choices and socioeconomic characteristics of travelers, including their households. This study uses a sub-sample of the national travel survey consisting of commuting trips to work in Oslo, hereafter referred to as work trips. A work trip is defined as a two-way movement from home to work and back. Some trips had secondary destinations on the way to/from work, such as taking/collecting kids to/from kindergarten or grocery shopping. There are 2,946 such trips in the sub-sample. As a part of data validation, several screening and consistency checks were performed. Some observations were deleted during the data validation process, so the final data set had only 2,876 work trips.
The possible alternatives for the population for work trips in the study area consisted of five modes, viz., walking (WK), cycling (CK), car driving (CD), car passenger (CP) and public transport (PT), with actual modal shares of 8%, 6%, 52%, 5% and 29% respectively. These five modes serve as the universal choice set. Individual travelers may have different choice sets given their own circumstances and constraints. The criteria used for alternative availability when estimating the models on the different specifications of the LOS attributes were of course the same. As mentioned earlier, the two data sets of network-based LOS attributes, namely striplos and nylos, were used to estimate the models in this study.
Intrazonal trips are sometimes excluded from model estimation since they do not appear on a network in centroid-to-centroid travel. The exclusion of these trips results in a biased sample, thereby causing biased parameter estimates and biased aggregate forecasts [16]. In this analysis, instead of deleting the intrazonal trips outright, they were included in the estimation. It was assumed that the length of an intrazonal trip was equal to the length of the centroid connector, and a low speed was assumed for intrazonal trips by car since such trips are short and usually stay on local roads. However, public transport was set as unavailable for the intrazonal trips.

Theoretical Framework and Model Formulation
In this section, we present theoretical framework underlying choice modeling and model formulation.

Theoretical Framework
Choice models based on the random utility maximization (RUM) hypothesis are the most widely used tools to examine individual travel behavior [1,17-20]. In a RUM framework of choice modeling, a decision maker facing a mutually exclusive and collectively exhaustive set of a finite number of alternatives obtains utility from each alternative and chooses the one with the highest utility. The analyst, however, is not able to observe the utility of the alternatives and therefore decomposes the utility into two parts for analytical purposes: 1) an observable part and 2) an unobservable part. The utility of alternative $i \in J$ for decision maker $n$, $U_{in}$, can therefore be written as:

$$U_{in} = V_{in} + \varepsilon_{in}$$

where $V_{in}$ and $\varepsilon_{in}$ represent the observed and unobserved parts of the utility of alternative $i$ for decision maker $n$ respectively, from the point of view of the analyst. $V_{in}$ is the systematic or representative utility. The systematic utility is deterministic in the sense that it is broadly a function of a vector of attributes of the alternative, $Z_{in}$, and a vector of characteristics of the decision maker, $S_n$, so:

$$V_{in} = f(Z_{in}, S_n)$$

RUM models of travel demand are a highly researched field in which advanced models, such as generalized extreme value models allowing for advanced nesting structures (cross-nesting, multi-level and recursive) and models with mixed distributions (e.g. mixed logit) [20-22], have been developed. In recent years, advanced discrete choice models, such as models with advanced nesting structures and models based on mixed distributions, have increasingly been used to allow for flexible substitution patterns, correlation across alternatives and/or random taste heterogeneity [22]. The mixed logit (ML) model is the most advanced model among choice models. Initially, Boyd and Mellman [23], and Cardell and Dunbar [24] used the ML model in modeling automobile demand. Their models were not truly disaggregate because their dependent variable was market shares rather than individual customers' choices and the explanatory variables did not vary over the decision makers. The extremely high cost of estimation (and hence of implementing the results) prevented the use of the ML model for many years after its initial development. The disaggregate ML model has been in extensive use since the advent of high-speed computers, mass storage devices, and simulation. The flexibility of the model and the decreased cost of computation have led to the widespread use of ML models in diverse fields, including political science [25], resource economics [26], transportation [27], peace and conflict [28], and business [29], to name only a few.
The ML is an intuitive, powerful and practical model that overcomes the three limitations of the multinomial logit model by allowing for random taste variation, unrestricted substitution patterns, and correlation in unobserved factors over time and space. McFadden and Train [30] prove that the ML model is a highly flexible model that can approximate any RUM model. Because it can realistically represent individual choice behavior, the ML model has become one of the most widely used models in the field of demand modeling over the years. Since ML is the most advanced and flexible model among discrete choice models, we used the ML model of travel mode choice in this paper.

Formulating a Mixed Logit Model
An ML model includes a flexible random term $\eta_{in}$ representing additional unobservable factors, independent of $\varepsilon_{in}$, in its utility function [19,21,22,30]. The ML model is thus given by:

$$U_{in} = V_{in} + \eta_{in} + \varepsilon_{in}$$

As evident from the notation, the random terms $\eta_{in}$ vary across both alternatives and decision makers. The researcher can assume any convenient and/or appropriate distribution for $\eta_{in}$. Consequently, the ML model is very flexible and free from restrictive assumptions such as the independence of irrelevant alternatives (IIA). Unfortunately, the ML choice probabilities no longer have a closed form due to the presence of $\eta_{in}$ in the utility function of the model. As a result, we have to estimate the model with the help of numerical methods such as numerical integration, numerical approximation or simulation [19]. The choice probability of alternative $i$ for decision maker $n$ with the ML model is the integral of the logit choice probability over the assumed distribution of the random terms, as given by:

$$P_n(i) = \int L_n(i \mid \eta_n)\, f(\eta_n)\, d\eta_n$$

where $f(\eta_n)$ is the density of the assumed distribution and $L_n(i \mid \eta_n)$ is the logit choice probability of alternative $i$ for decision maker $n$ conditional on $\eta_n$:

$$L_n(i \mid \eta_n) = \frac{\exp(V_{in} + \eta_{in})}{\sum_{j \in J} \exp(V_{jn} + \eta_{jn})}$$

ML models are normally derived either to allow flexible substitution patterns across alternatives or to accommodate random taste heterogeneity across decision makers [19,30]. The former approach gives rise to the error components logit (ECL) model and the latter to the random coefficients logit (RCL) model. One can also develop a more advanced model that allows for random taste heterogeneity, inter-alternative correlation, and heteroscedasticity by combining the ECL and RCL approaches [22]. The ECL model allows some elements of $\eta$ to be shared across some alternatives, which in turn introduces correlation between the random terms of these alternatives. The ECL model thus closely resembles models using a nesting structure, such as nested logit, that accommodate correlation across some of the alternatives, while it simultaneously accommodates random taste heterogeneity and heteroscedasticity, which the nested logit model does not [21].
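Because the ML choice probability is an integral without a closed form, it is typically approximated by averaging conditional logit probabilities over random draws. The following is a minimal sketch of that idea, not the estimation code used in this study; the function names, the utilities and the assumption of independent normal random terms are all hypothetical.

```python
import math
import random

def logit_probs(utilities):
    """Closed-form logit probabilities for one decision maker."""
    top = max(utilities)  # subtract the maximum for numerical stability
    expu = [math.exp(u - top) for u in utilities]
    total = sum(expu)
    return [e / total for e in expu]

def simulated_ml_probs(v, draw_eta, n_draws=1000, seed=0):
    """Approximate P_n(i) by averaging L_n(i | eta) over draws of eta."""
    rng = random.Random(seed)
    acc = [0.0] * len(v)
    for _ in range(n_draws):
        eta = draw_eta(rng, len(v))
        cond = logit_probs([vi + ei for vi, ei in zip(v, eta)])
        acc = [a + c for a, c in zip(acc, cond)]
    return [a / n_draws for a in acc]

def iid_normal(sd):
    """Assumed distribution for eta: independent N(0, sd^2) per alternative."""
    def draw(rng, n_alternatives):
        return [rng.gauss(0.0, sd) for _ in range(n_alternatives)]
    return draw

# Hypothetical systematic utilities for three alternatives.
v = [0.5, 0.0, -0.5]
plain = logit_probs(v)
mixed = simulated_ml_probs(v, iid_normal(1.0))
degenerate = simulated_ml_probs(v, iid_normal(0.0))  # no mixing: plain logit
```

Setting the standard deviation of the random terms to zero recovers the ordinary logit probabilities, which is a convenient sanity check on the simulator.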
The utility function of the ECL model is specified in such a way that error components (EC) create correlations among the utilities of different alternatives [19]:

$$U_{in} = V_{in} + \gamma_n' z_{in} + \varepsilon_{in}$$

where $\gamma_n$ is a vector of random terms with zero mean and covariance matrix $\Sigma$, and $z_{in}$ is a vector of observed variables relating to alternative $i$. Put succinctly, $z_{in}$ is a vector of binary variables that indicate the EC entering the utility function of alternative $i$. The ECL model reduces to the logit model if $z_{in}$ is 0 for all the alternatives, meaning that the unobserved parts of the utility are uncorrelated across alternatives. Naturally, the choice probability with the ECL model does not have a closed form and thus requires numerical procedures to solve the integrals. The ECL model makes it possible to estimate the error components that measure the relative sensitivity of changes in the choice of different alternatives and to accommodate heteroscedasticity in the unobserved influences on the choice. The choice probability of alternative $i$ for decision maker $n$ with the ECL formulation of the ML is then obtained by integration over the distribution of $\gamma_n$ [22]:

$$P_n(i) = \int L_n(i \mid \gamma_n)\, \phi(\gamma_n \mid 0, \Sigma)\, d\gamma_n$$

where $\phi(\gamma_n \mid 0, \Sigma)$ is the joint normal density function of the elements in $\gamma_n$.
In recent years, there has been considerable interest in using ECL models in order to accommodate inter-alternative correlation and heteroscedasticity, despite the high cost of estimation (and hence application) and identification issues [31]. Two recent and notable applications of ECL model structures investigate the factors influencing time of day and mode choice [21,32]. The ECL model has also been applied to analyze corporate bankruptcy and insolvency risk in Australia [29].

Formulating the ECL Model of Mode Choice
The ECL model was used to estimate the error components that measure the relative sensitivity of changes in mode choice and to accommodate heteroscedasticity in the unobserved influences on the mode choice. We explored various possible specifications of the ECL models. Mainly, we focused on two specifications: 1) common unobserved factors between walking (WK) and cycling (CK) (non-motorized modes), and 2) common unobserved factors between car driving (CD) and car passenger (CP) (car modes), or common unobserved factors among CD, CP and public transport (PT) (motorized modes). As WK and CK are both non-motorized modes, the hypothesis is that they share unobserved factors that introduce correlation between the utilities of those alternatives. Similarly, since CD, CP and PT are motorized modes, they presumably share unobserved factors that introduce correlation among the utilities of those alternatives. Among those specifications, we chose the one sharing the error components between alternatives belonging to motorized and non-motorized modes.
Based on the discussions above, we formulated the ECL model by adding the error components to the utility function as follows (suppressing $n$):

$$U_i = V_i + \sigma_{nmt}\,\zeta_{nmt}\,NMT(i) + \sigma_{mt}\,\zeta_{mt}\,MT(i) + \varepsilon_i$$

where $\zeta_{nmt}$ and $\zeta_{mt}$ are random variables drawn independently from the standard normal distribution, and $\sigma_{nmt}$ and $\sigma_{mt}$ are the standard deviations of the error components. $\sigma_{nmt}$ and $\sigma_{mt}$ are actually the elements of the variance-covariance matrix capturing the correlation between WK and CK, and among CD, CP and PT respectively. In this specification, $NMT(i)$ is a dummy variable equal to 1 for the alternatives belonging to the non-motorized modes and 0 otherwise. This dummy variable thus determines whether the error component relating to the non-motorized modes is included in the utility function of alternative $i$. Similarly, $MT(i)$ is a dummy variable equal to 1 for the alternatives belonging to the motorized modes and 0 otherwise, and thus determines whether the error component relating to the motorized modes enters the utility function of alternative $i$. The utility of an alternative $i$ contains at most one of those two error components.
Estimation of this ECL model yields estimates of the standard deviations of the error components (setting their means to zero) in addition to the coefficients of the variables included in the systematic utility functions. The variance of the error components related to the non-motorized modes is estimated by normalizing the variance of the error components of the motorized modes to one, because we can only identify the sum of the variances in this particular formulation. The relative magnitude of the variances of the error components associated with the non-motorized and motorized modes provides a measure of the relative sensitivity of these two groups of modes to changes in the major attributes of travel modes, such as travel time and cost components.
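The correlation pattern induced by the two shared error components, and the corresponding simulated choice probabilities, can be sketched as follows. This is an illustrative sketch only and not the BIOGEME specification used in the study; the numerical values of the systematic utilities and of sigma_nmt are hypothetical, while sigma_mt is set to one to mirror the normalization described above.

```python
import math
import random

MODES = ["WK", "CK", "CD", "CP", "PT"]
NMT = {"WK": 1, "CK": 1, "CD": 0, "CP": 0, "PT": 0}  # non-motorized dummy
MT = {m: 1 - NMT[m] for m in MODES}                   # motorized dummy

def ec_covariance(sigma_nmt, sigma_mt):
    """Covariance of the error-component part of utility for each pair of modes."""
    return {
        (i, j): sigma_nmt**2 * NMT[i] * NMT[j] + sigma_mt**2 * MT[i] * MT[j]
        for i in MODES for j in MODES
    }

def ecl_probs(v, sigma_nmt, sigma_mt, n_draws=1000, seed=0):
    """Simulated ECL probabilities: average logit probabilities over draws of zeta."""
    rng = random.Random(seed)
    acc = {m: 0.0 for m in MODES}
    for _ in range(n_draws):
        z_nmt, z_mt = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        u = {m: v[m] + sigma_nmt * z_nmt * NMT[m] + sigma_mt * z_mt * MT[m]
             for m in MODES}
        top = max(u.values())  # subtract the maximum for numerical stability
        expu = {m: math.exp(u[m] - top) for m in MODES}
        total = sum(expu.values())
        for m in MODES:
            acc[m] += expu[m] / total
    return {m: acc[m] / n_draws for m in MODES}

# Hypothetical values for illustration only.
cov = ec_covariance(sigma_nmt=0.8, sigma_mt=1.0)
v = {"WK": -1.0, "CK": -1.2, "CD": 0.5, "CP": -1.5, "PT": 0.0}
probs = ecl_probs(v, sigma_nmt=0.8, sigma_mt=1.0)
```

In this sketch, modes within the same group share a non-zero covariance (for example WK and CK), whereas modes from different groups (for example WK and CD) have zero covariance in their error components.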
Three types of variables, namely characteristics of the journey, characteristics of the traveler and his/her household, and the performance of the transportation system as measured by the LOS attributes of the different modes, are included in the systematic utility functions of the travel modes [1,13]. Table 1 lists the names and definitions of the variables, including alternative specific constants (ASC), where the first two letters refer to the mode (i.e. utility function) in which the variable enters. The models were coded and estimated in BIOGEME [33] using 1,000 random draws. A set of "reasonable models" was formulated (and reformulated) and estimated (and re-estimated) based on a priori considerations. The systematic process of model building led to the final specification (Table 3) based on goodness-of-fit measures, statistical tests and informal tests.
In addition to the ECL model of mode choice estimated on nylos, four ECL models with an identical specification of the systematic utility function were estimated on different specifications of striplos as follows:
• Model 1 (base case): The first model was estimated on the LOS attributes at "face value", except for the correction for missing direction and unreasonable asymmetry in LOS attributes.
• Model 2: In the second model, public transport fares in Oslo were corrected, but the fares in the rest of the region and between Oslo and the rest of the region were not adjusted.
• Model 3: In the third model, the missing toll in Oslo was corrected for, in addition to the correction made in Model 2.
• Model 4: In the fourth model, an attempt was made to account for congestion in Oslo, in addition to the corrections made in Model 3, since striplos did not have separate LOS values for peak and off-peak hours. Respondents had reported when the trip was taken, and this enabled us to adjust for congestion by using a variable where driving time interacts with a dummy for peak travel, i.e., instead of higher driving times during peak hours, the model had an additional coefficient for driving time. This variable was added to the utility of car driving in the model.
First, the differences in LOS attributes between striplos and nylos are discussed. Then the estimation results of the models are compared based on the statistical significance of the coefficients, goodness-of-fit measures such as the final log-likelihood, the log-likelihood ratio index ($\rho^2$) and the adjusted log-likelihood ratio index ($\bar{\rho}^2$), the expected signs of the coefficients, and the relative magnitudes of the coefficients within each model. The implied value of time (VOT), the trade-off between travel time and travel cost, was chosen as a measure of the relative magnitude of the coefficients within a model. Additionally, market shares of the different travel modes and aggregate direct and cross elasticities under different policy scenarios are compared.
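The comparison measures named above can be computed from estimation output along the following lines. This is a sketch under stated assumptions: time is assumed to be measured in minutes and cost in NOK, the log-likelihood ratio indices use the common textbook definitions relative to a reference log-likelihood (the study does not state which reference model it uses), and every coefficient and log-likelihood value is hypothetical rather than taken from the estimated models.

```python
def value_of_time(beta_time, beta_cost):
    """Implied VOT in cost units per hour, assuming time is coded in minutes."""
    return 60.0 * beta_time / beta_cost

def peak_value_of_time(beta_time, beta_time_peak, beta_cost):
    """VOT for peak car trips when driving time interacts with a peak dummy."""
    return 60.0 * (beta_time + beta_time_peak) / beta_cost

def rho_squared(ll_final, ll_reference):
    """Log-likelihood ratio index relative to a reference model."""
    return 1.0 - ll_final / ll_reference

def adjusted_rho_squared(ll_final, ll_reference, n_params):
    """Index penalized for the number of estimated parameters."""
    return 1.0 - (ll_final - n_params) / ll_reference

# Hypothetical coefficients (per minute and per NOK), for illustration only.
vot = value_of_time(beta_time=-0.05, beta_cost=-0.03)
vot_peak = peak_value_of_time(beta_time=-0.05, beta_time_peak=-0.02,
                              beta_cost=-0.03)

# Hypothetical log-likelihoods and parameter count.
rho2 = rho_squared(ll_final=-2000.0, ll_reference=-4000.0)
rho2_adj = adjusted_rho_squared(ll_final=-2000.0, ll_reference=-4000.0,
                                n_params=20)
```

Because the VOT is a ratio of two estimated coefficients, an error in either the time or the cost attribute feeds directly into it, which is one reason the VOT is a demanding measure of data quality.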

Results and Discussion
This section presents and discusses the results regarding differences in LOS attributes, estimation results of the models, aggregate elasticities and aggregate forecasting.

Differences in LOS Attributes
First, the actual differences and the degree of correlation between the corresponding LOS attributes of the two data sets were examined. Table 2 summarizes statistics on the differences between striplos and nylos. As expected for the base case, nylos gave, on average, higher values for car time and car cost. After correction for road tolls, the mean and standard deviation of the difference became much smaller for car cost (CD_cost_c). The remaining difference can mainly be attributed to differences in driving distance caused by differences in the coding of the road network. But the correction for public transport fare (PT_fare_c) did not help much to reduce the difference. The mean differences were smaller both in absolute and relative terms for access/egress time (PT_wktm) and in-vehicle time of public transport (PT_invht), but the variance was much greater, especially for PT_wktm. Striplos had, on average, higher values, and the differences were relatively large, for waiting time (PT_wait) and number of transfers (PT_xfers) of public transport. This probably reflects a mixture of coding differences and different assignment algorithms. The distance-based function used to estimate public transport fares for striplos obviously overestimated the fares significantly, and a relatively large difference persisted even after the correction of the fares pertaining to internal trips in Oslo.
The difference between the LOS attributes in the two data sets is a mixture of systematic differences in the mean values and a "random" component.The extent of the random component varies between the attributes and is reflected in the ratio between standard deviation and mean value of the differences and in R 2 (given in the last column) if we run a linear regression between the respective attributes in the two data sets.The random component is the most important for PT_wktm, PT_wait and PT_xfers based on the R 2 .
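The two indicators mentioned above can be illustrated with a short Python sketch: the ratio of the standard deviation to the mean of the differences, and the $R^2$ of a simple linear regression of one attribute on the other. The data values and function names are hypothetical. With a purely systematic shift the ratio is zero and the $R^2$ equals one, while added noise raises the ratio and lowers the $R^2$.

```python
import math


def difference_summary(x, y):
    """Mean, sample standard deviation and |sd/mean| of the differences y - x."""
    d = [b - a for a, b in zip(x, y)]
    n = len(d)
    mean = sum(d) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in d) / (n - 1))
    ratio = sd / abs(mean) if mean != 0 else float("inf")
    return mean, sd, ratio


def r_squared(x, y):
    """R^2 of the simple linear regression of y on x (squared correlation)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


# Hypothetical travel times in minutes (not from Table 2).
striplos_time = [10.0, 20.0, 30.0, 40.0]
nylos_shifted = [12.0, 22.0, 32.0, 42.0]  # systematic difference only
nylos_noisy = [15.0, 18.0, 35.0, 38.0]    # systematic plus random difference

print(difference_summary(striplos_time, nylos_shifted), r_squared(striplos_time, nylos_shifted))
print(difference_summary(striplos_time, nylos_noisy), r_squared(striplos_time, nylos_noisy))
```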

Estimation Results
Since the models were formulated and reformulated in a number of ways during the model building process, a substantial body of empirical results was generated. This section, however, analyzes only the results of the final models with the best specification found through the iterative process of model building.
Table 3 summarizes the estimation results of the ECL models on both data sets. In addition to the coefficients of the variables included in the systematic utility functions, the estimation of the ECL model yielded the parameters for the standard deviations of the error components. The variance of the error component related to nonmotorized modes was estimated by normalizing the variance of the error component of motorized modes to one, because only the sum of the variances can be identified in this particular formulation. Contrary to the hypothesis, $\sigma_{nmt}$ was not statistically significant in any of the models, likely indicating that there are no significant common unobserved factors or heteroscedasticity across the alternatives.
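The role of the error components can be made concrete with a small simulation. The sketch below is only an illustration of the general error-component logit idea: a normally distributed term shared by the alternatives in a group is added to the systematic utilities, and the logit probabilities are averaged over random draws. The alternative labels, utilities, grouping and standard deviations are hypothetical, and the function `ecl_probabilities` is not the estimation code used in the study.

```python
import math
import random


def ecl_probabilities(v, groups, sigma, n_draws=2000, seed=0):
    """Simulated choice probabilities for an error-component logit.

    v      : systematic utility per alternative
    groups : group label per alternative (alternatives in a group share a draw)
    sigma  : standard deviation of the error component per group
    """
    rng = random.Random(seed)
    alts = list(v)
    totals = {a: 0.0 for a in alts}
    for _ in range(n_draws):
        # One standard normal draw per group, shared within the group.
        eta = {g: rng.gauss(0.0, 1.0) for g in sigma}
        u = {a: v[a] + sigma[groups[a]] * eta[groups[a]] for a in alts}
        m = max(u.values())  # subtract the maximum for numerical stability
        expu = {a: math.exp(u[a] - m) for a in alts}
        denom = sum(expu.values())
        for a in alts:
            totals[a] += expu[a] / denom
    return {a: totals[a] / n_draws for a in alts}


# Hypothetical utilities and grouping, for illustration only.
v = {"CD": 0.5, "PT": 0.0, "CK": -0.8, "WK": -0.4}
groups = {"CD": "motorized", "PT": "motorized", "CK": "nmt", "WK": "nmt"}
sigma = {"motorized": 1.0, "nmt": 0.6}  # motorized normalized to one

print(ecl_probabilities(v, groups, sigma))
```

When both standard deviations are zero, the simulated probabilities collapse to those of a standard multinomial logit, which is the situation an insignificant error component points towards.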
The estimation results of the "base case" on striplos looked reasonably good. All the coefficients were statistically significant and had the signs expected from theory and previous results, except the number of transfers of public transport (PT_xfers). PT_xfers was significant, but had the wrong sign. This was also the case in the estimation of the national model that used the whole sample (Madslien, et al., 2005). As a result, the number of transfers was not included in the final model. We nevertheless included it in this study in order to compare the effects of different corrections to the LOS attributes. The implied value of time (VOT) of car drivers seemed low (Table 4). Without any further improvement of the LOS data, this model might have been re-estimated without PT transfers, provided that the purpose of the study is not to estimate VOTs and the transfer variable is not needed in the analysis. We tried this, and the results were good enough based on the signs, significance and relative values of the coefficients, and on the goodness-of-fit measures.
In the second model on striplos, we used the corrected PT fare for trips within Oslo instead of the fare estimated from travel distance. However, the distance-based fare was still used for other origin-destination combinations, although the actual fare system was based on 'fare zones' and the number of fare zones traversed. All the coefficients, except PT_xfers, were significant with the expected signs and reasonable magnitudes. PT_xfers still had the wrong sign, but was significant only at a lower confidence level. The VOTs in this estimation came close to the 'official values' used in cost-benefit analyses in Norway. Surprisingly, the model gave a poorer fit as measured by the value of the log-likelihood function at its maximum.
In Model 3, introducing both corrected public transport fares and road tolls resulted in a further drop in model fit. The VOTs slightly decreased compared to the m [...] were almost the same with all the models estimated on striplos.
As we see (Table 4), the VOTs changed due to a small change in the specification of model and/or data.
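The sensitivity of the VOT follows directly from its definition as a ratio of two estimated coefficients. A minimal Python sketch, assuming travel time measured in minutes and cost in NOK (both units are assumptions made here for illustration), with hypothetical coefficient values:

```python
def value_of_time(beta_time_per_min, beta_cost_per_nok):
    """Implied VOT in NOK per hour: ratio of time to cost coefficient, times 60."""
    return 60.0 * beta_time_per_min / beta_cost_per_nok


# Hypothetical coefficients, not taken from Table 3 or Table 4.
vot_a = value_of_time(-0.030, -0.025)
vot_b = value_of_time(-0.030, -0.020)  # only the cost coefficient differs

print(round(vot_a, 1), round(vot_b, 1))
```

In this example a 20% smaller cost coefficient (in absolute value) raises the implied VOT by 25%, although the time coefficient is unchanged. A modest shift in one coefficient, for instance after a correction of fares or tolls, can therefore move the VOT noticeably.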
We cannot use a log-likelihood ratio test (LRT) to compare the models estimated on different versions of striplos because the data are not identical. We can, however, use an LRT to select between Model 3 and Model 4. Model 3 was chosen based on this test. If the transfer variable is not needed for the analysis, we can simply exclude it and estimate Model 3 without transfers. We can also estimate Model 4 without transfers but with CD_tmrush. In this case, the LRT is applicable.
The wrong sign of PT_xfers might be attributed to serious coding errors in this variable in striplos. This may imply that it is very difficult to correct for such types of coding errors. We can simply estimate the model excluding PT_xfers if this variable is not of particular interest in the analysis. In contrast, it is possible to correct for "known errors" such as the coding of road tolls on the toll cordon.
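The test between Model 3 and Model 4 can be sketched as follows. Model 4 adds one coefficient (the peak-hour driving time term) to Model 3, so the statistic is compared with a chi-square distribution with one degree of freedom. The log-likelihood values below are hypothetical and the function name is illustrative. For one degree of freedom the p-value can be written with the complementary error function, which keeps the sketch within the Python standard library.

```python
import math


def likelihood_ratio_test(ll_restricted, ll_unrestricted):
    """LR statistic and p-value for one additional parameter (chi-square, 1 df)."""
    stat = -2.0 * (ll_restricted - ll_unrestricted)
    # For 1 df: P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(max(stat, 0.0) / 2.0))
    return stat, p_value


# Hypothetical final log-likelihoods (not from Table 3).
ll_model3 = -1200.0  # restricted model
ll_model4 = -1199.2  # adds the peak-hour driving time coefficient

stat, p_value = likelihood_ratio_test(ll_model3, ll_model4)
print(round(stat, 2), round(p_value, 3))
```

With these illustrative numbers the statistic is below the 5% critical value of 3.84, so the restricted model would be retained.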
All the coefficients, including PT_xfers, were significant with correct signs and reasonable relative magnitudes when the same model was estimated on nylos. The VOTs of the different components of travel time were also of plausible magnitude. The implicit weights on walking and waiting time of public transport also had the expected magnitude. The transit assignment algorithm of the network model used to produce LOS data for public transport in nylos has a slight tendency to overestimate in-vehicle time and to underestimate waiting time. This is probably also reflected in the estimated parameters of PT_invht and PT_waitm, biasing the coefficient of PT_invht downward and that of PT_waitm upward (in absolute values). On the other hand, the assignment algorithm used to derive the LOS attributes for public transport in striplos tends to underestimate PT_invhtm and overestimate PT_waitm when multiple paths are used. This might also have been reflected in the parameter estimates. In addition, it is suspected that more routes were left uncoded in striplos. In nylos, some local routes in the periphery of the Oslo region were uncoded. Surprisingly, the model estimated on nylos resulted in lower goodness-of-fit measures such as $\rho^2$ and $\bar{\rho}^2$ than the models estimated on striplos, despite nylos presumably being more accurate than striplos. In terms of model fit and statistical significance, the models estimated on striplos looked better than the model estimated on nylos.
The results are generally plausible, but the pattern of significance of the variables varies considerably across the different levels of accuracy of the LOS attributes. The t-statistics of the estimated parameters do not show any general tendency. The t-statistics of some coefficients increase and others decrease across the different versions of striplos and when changing from striplos to nylos. Similarly, the relative magnitudes of the coefficients, as evidenced by the VOTs, do not remain the same. The most notable improvements with nylos were the correct sign of PT_xfers and the consistency of the relative magnitudes of the coefficients with prior expectations.
We had hypothesized that there were some common unobserved factors and heteroscedasticity across some alternatives in the choice set. Whether this holds, however, depends on the choice set, the data used in the estimation of the models, and the specification of the systematic utility functions. If the systematic utility function is adequately specified and includes the major factors influencing the choice, there might be no room for the error components. This might be the case with our models. The goodness-of-fit measures, such as the likelihood ratio index and the adjusted likelihood ratio index, are reasonably high for our models. This might be one of the reasons why the error components are insignificant. Ultimately, whether the error components are significant or not is an empirical question. Moreover, it is reasonable that the error component was insignificant in every model, because the choice set and the specification of the systematic utility function were identical and the data used in estimation differed only marginally across the models. Further, we explored various possible specifications of the ECL models and reported the results of the best specification identified.

Aggregate Forecasting
Table 5 summarizes the predicted market shares on the same data set that was used in model estimation by modes under different scenarios, viz., increasing car driving cost by 10% (scenario 1), increasing PT fare by 10% (scenario 2) and reducing PT wait time by 10% (scenario 3).
As we see in Table 5, the predicted market shares were almost identical in each scenario across the models, irrespective of the specification of the network LOS attributes. Each model predicted changes in the direction expected from theory. Each model also predicted that an increase in car driving cost or PT fare and a reduction of PT waiting time had no impact on the market shares of CK and WK. The predicted market shares are consistent with previous studies, theory, and expectation.
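Aggregate forecasts of this kind are commonly obtained by sample enumeration: the choice probabilities are computed for each individual with the changed attribute, and then averaged over the sample. The Python sketch below illustrates the idea for a 10% increase in car driving cost. It deliberately uses a plain multinomial logit with a single generic time coefficient rather than the ECL model of the paper, and all coefficients, mode labels and observations are hypothetical.

```python
import math


def mnl_probs(utilities):
    """Multinomial logit probabilities from a dict of utilities."""
    m = max(utilities.values())  # subtract the maximum for numerical stability
    expu = {k: math.exp(v - m) for k, v in utilities.items()}
    s = sum(expu.values())
    return {k: e / s for k, e in expu.items()}


def market_shares(sample, beta, cost_factor=1.0):
    """Average individual probabilities; car driver (CD) cost is scaled by cost_factor."""
    totals = {}
    for person in sample:
        u = {}
        for mode, x in person.items():
            cost = x["cost"] * (cost_factor if mode == "CD" else 1.0)
            u[mode] = beta["asc"][mode] + beta["time"] * x["time"] + beta["cost"] * cost
        for mode, p in mnl_probs(u).items():
            totals[mode] = totals.get(mode, 0.0) + p
    return {mode: t / len(sample) for mode, t in totals.items()}


# Hypothetical sample (time in minutes, cost in NOK) and coefficients.
sample = [
    {"CD": {"time": 20, "cost": 30}, "PT": {"time": 35, "cost": 25}, "WK": {"time": 60, "cost": 0}},
    {"CD": {"time": 10, "cost": 15}, "PT": {"time": 25, "cost": 25}, "WK": {"time": 25, "cost": 0}},
]
beta = {"asc": {"CD": 0.0, "PT": -0.5, "WK": -1.0}, "time": -0.05, "cost": -0.03}

base = market_shares(sample, beta)
scenario1 = market_shares(sample, beta, cost_factor=1.1)  # car cost +10%
print(base)
print(scenario1)
```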
Model 3 estimated on striplos gave a better fit than the model estimated on nylos. We re-estimated the model without transfers, as would be natural when a model gives a wrong sign for a coefficient. [...] presents implied direct and cross elasticities, respectively, for the models estimated on both nylos and striplos. Both models yielded direct and cross elasticities as expected according to theory, i.e., negative direct elasticities and positive cross elasticities for the attributes considered. Both the direct and cross elasticities are inelastic and just the opposite in scenario 2. The cross elasticities of CD cost were significantly higher than the own elasticities in scenario 1. The implied demand elasticities were very similar across the models in each scenario. The main difference was that both the direct and cross elasticities of PT waiting time were moderately lower in the model estimated on striplos. The models should thus lead to similar conclusions for policy purposes.
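One simple way to obtain aggregate elasticities from scenario forecasts is to divide the relative change in a predicted market share by the relative change in the attribute. The sketch below assumes this simple definition, which may differ from the exact procedure used in the study, and the market shares are hypothetical.

```python
def aggregate_elasticity(share_before, share_after, rel_change_attribute):
    """Relative change in market share divided by relative change in the attribute."""
    return ((share_after - share_before) / share_before) / rel_change_attribute


# Hypothetical shares before and after a 10% increase in car driving cost.
direct_cd = aggregate_elasticity(0.500, 0.490, 0.10)  # effect on car driver share
cross_pt = aggregate_elasticity(0.200, 0.206, 0.10)   # effect on public transport share

print(round(direct_cd, 2), round(cross_pt, 2))
```

Here the direct elasticity is negative and the cross elasticity positive, and both are below one in absolute value, i.e., inelastic.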

Summary and Conclusions
The need for travel demand models is growing worldwide. Obtaining reasonably accurate LOS attributes of the transportation system for different travel modes, the major factors shaping travel demand, is not a trivial task. The objective of the paper was therefore to investigate the effects of using LOS attributes measured at different levels of accuracy on the results of disaggregate travel mode choice models. The case study in this paper is an example of what might happen in practice when we correct for 'known errors' in a data set or switch to a data set of better quality. The sensitivity of model results, including goodness-of-fit measures, VOTs and aggregate forecasts, was examined by estimating ECL models for travel mode choice on the two data sets of LOS attributes. The difference between the LOS attributes in the two data sets was a mixture of systematic differences in the mean values and a random component. The extent of the random component varied between the attributes. Striplos generally yielded a better fit and statistically satisfactory models. However, the number of transfers had the wrong sign and the VOTs were generally low without any correction. The corrections helped to obtain VOTs of reasonable magnitude. The model estimated on nylos, on the other hand, had significant coefficients with correct signs throughout, including the number of transfers, reasonable relative magnitudes of the coefficients of the public transport travel components, and plausible VOT estimates, but a slightly poorer fit than the models estimated on striplos. Models estimated on striplos and nylos gave very similar aggregate forecasts and aggregate elasticities on the data set used in estimation. During the model building process, it was also observed that the VOTs changed significantly with small changes in the specification of the model and data, implying that utmost care must be taken in specifying the data and model if the purpose of the study is to estimate VOTs.
The lack of peak-hour driving time in striplos appeared less important for the parameter estimates than initially expected. This may not hold true in general, since a model that accounts for congestion correctly ought to give better results in an urban setting.
All the models predicted well, implying that the specification of LOS attributes matters less for prediction as long as the predictions are made with the same data. The required data accuracy depends on the purpose of the model, since a model with relatively inaccurate data also predicts reasonably well. Measuring data as accurately as possible is presumably more important if the purpose of the study is to estimate VOTs.
[...] anonymous reviewers for their help and compliments on earlier draft of this work. The author is fully responsible for any errors and omissions.