Reference-Dependent Preferences and the Labor Supply of Chinese Drivers

This paper presents a comprehensive study of taxi drivers’ labor supply behavior using a new dataset of taxi drivers from China. We find strong evidence that drivers' working hours are negatively related to hourly rates, and this effect is both statistically and economically significant. We then estimate a discrete-choice model, which shows that the probability of stopping keeps increasing as cumulative working hours increase, whereas it first increases and then decreases as the cumulative fare increases. Lastly, we estimate an asymmetric model with the income target and the working-hour target as dummy variables; the probability of stopping is significantly positively related to the income target but shows no significant relation with cumulative fare.


Introduction
There is a vast body of studies in the economic literature focusing on the wage elasticity of labor supply. The neoclassical models of labor supply predict that work hours should respond positively to transitory positive wage changes, as workers intertemporally substitute labor and leisure, working more when wages are high and consuming more leisure when wages are low. While this prediction is straightforward, it is difficult to find empirical support for it. The empirical evidence has been surveyed intensively (for example, Blundell and MaCurdy, 1999), and a summary of the findings is that wage elasticities of labor supply are generally very small, often not significantly different from zero, and sometimes even negative.
One criticism of this literature is that the standard neoclassical models assume that workers can choose their work hours in response to transitory wage changes, or alternatively, can select a job with the optimal wage-hours combination from a joint distribution of jobs. However, actual wage changes are rarely transitory, so the hypothesis of intertemporal substitution must be tested jointly along with the auxiliary assumption of persistent wage shocks. As a result, the insignificant or negative wage elasticity of labor supply can plausibly be attributed to specification errors.
The ideal test of labor supply responses to transitory wage changes would use a context in which wages are relatively constant within a short period but uncorrelated across periods. In such a case, dynamic optimization models predict a positive relationship between wages and hours worked, because short-period wage changes have a negligible impact on life-cycle wealth (see, for example, MaCurdy, 1981).
For this purpose, drivers are a particularly appropriate group of workers to study.
The most apparent advantage is that drivers face wages that fluctuate within a short period due to demand shocks caused by many factors, such as weather, traffic, day-of-the-week effects, holidays, and conventions. Although rates per hour/mile/job are set, during busy periods, drivers spend less time searching for customers and jobs and thus earn a higher hourly/daily wage. The wages tend to be correlated within the short periods and uncorrelated across periods.
Another advantage of focusing on drivers is that they can choose the number of hours they work each period, unlike most workers facing fixed work hours, e.g., eight hours per day and five days per week. In sum, such a study can be easily generalized to other types of workers who have the freedom to choose work hours/days or even the targeted customers, but a necessary condition is that there exist transitory wage changes.
In this paper, we use a comprehensive dataset of taxi drivers in Chengdu, China. Our dataset avoids the measurement problems of the NYC taxi driver dataset. People usually do not tip taxi drivers in China, and fare information is automatically recorded by the meters on taxis. Hence, the recorded fares are a very accurate measure of income earned in our dataset. The dataset contains over 14 thousand taxis. For each taxi, there is an observation every minute that includes its location and status (with or without passengers). There are more than one billion observations in total. We further combine these minute observations into trips, and we calculate the duration and fare earned for each trip.
Based on this comprehensive dataset, we perform empirical analyses. First, we conduct an OLS linear regression of working hours on the hourly rate earned for that day. The neoclassical theory predicts the coefficient of the hourly rate to be positive, and a negative coefficient does not support the neoclassical model. Our estimation results show that the hourly rate has a negative effect on working hours, and this relationship is statistically significant after controlling for a variety of fixed effects, including taxi fixed effects, weather fixed effects, day-of-week fixed effects, and week fixed effects. The effect is also economically significant.
In the regression of log working hours on the log hourly rate, the parameter η measures the wage elasticity of labor supply, and neoclassical models predict η to be positive. An important econometric problem with this approach is that the estimate relies on there being significant exogenous transitory day-to-day variation in the average wage. This variation drives the accurate estimate of η. However, it is hard to see a source of legitimate variation in the average hourly wage in the real data.
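The regression equation that defines η is not reproduced in this section. A minimal sketch of a log-log specification consistent with the description (log working hours regressed on the log hourly rate plus fixed effects) is given below. Only η comes from the text; the symbols $H_{it}$, $W_{it}$, $X_{it}$, $\beta$, and $\varepsilon_{it}$ are notation assumed here for illustration.

```latex
% Assumed notation: H_{it} = hours worked by taxi i on day t,
% W_{it} = average hourly rate earned that day,
% X_{it} = controls and fixed effects (taxi, weather, day of week, week).
\ln H_{it} = \eta \,\ln W_{it} + X_{it}\beta + \varepsilon_{it}
```

Under this form, η is read directly as the percentage change in daily hours associated with a one percent change in the hourly rate.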

Discrete-Choice Stopping Model
Alternatively, the model of driver daily labor supply can be estimated as a survival time model in which quitting can occur at discrete points in time. Without deriving a full dynamic solution to the optimal stopping problem, a simple discrete-choice problem can be implemented empirically as a reasonable approximation.
At any point s, a driver can calculate the forward-looking expected optimal stopping point, s*. The optimal stopping point can be a function of many factors, including hours worked and expectations about future earnings possibilities, etc. If daily income effects are important, the optimal stopping point can also be a function of income earned. A driver will stop at s if s ≥ s* so that s − s* ≥ 0.
A reduced-form representation of $R(s) = s - s^*$ is
$$R_{idc}(s) = \alpha_1 h_s + \alpha_2 y_s + X_{idc}\beta + \varepsilon_{idc},$$
where $i$ refers to the driver; $d$ refers to the date; $c$ refers to the hour of the day; $h_s$ measures cumulative hours worked on the shift at $s$; $y_s$ measures cumulative income earned on the shift at $s$; $X_{idc}$ measures other determinants of the optimal stopping time, with coefficient vector $\beta$; and $\varepsilon_{idc}$ is an error term.
The vector $X_{idc}$ includes weather and a set of fixed effects for hour of the day, day of the week, and location within a province/city. These variables are included to capture variation in earning opportunities from continuing to drive.
A driver stops driving at $s$ if $R_{idc}(s) \ge 0$. The coefficient $\alpha_1$ measures whether the probability of quitting is related to hours worked, and the coefficient $\alpha_2$ measures whether income earned is important in deciding when to quit.

Asymmetric Estimation of Reference-Dependent Preferences
After any trip $p$ during a shift, a driver can calculate the forward-looking expected optimal stopping point. This is a function of many variables, including hours worked so far on the shift and variables that affect expectations about future earning possibilities. In addition, it could also be affected by the accumulated income in a nontraditional way: when the accumulated income is more than the reference income, there is a higher probability that the driver stops working. In the empirical representation of this reference-dependent model, $C_{ijp}$ represents the forward-looking expected optimal stopping point for driver $i$ on shift $j$ after trip $p$; $X_{ijp}$ is a vector of variables determining the optimal stopping time; $Y_{ijp}$ represents the cumulative income level for driver $i$ on shift $j$ at trip $p$; $I[Y_{ijp} > Y^{T}_{ij}]$ is an indicator equal to one if accumulated income is larger than the reference income level, and equal to zero otherwise; $H_{ijp}$ represents the cumulative working hours for driver $i$ on shift $j$ at trip $p$; $I[H_{ijp} > H^{T}_{ij}]$ is an indicator equal to one if accumulated working hours are larger than the reference level, and equal to zero otherwise. A positive value of $\delta$ represents the incremental probability of stopping work when the accumulated income is above the reference income level, and a positive value of $\gamma$ represents the incremental probability of stopping work when the accumulated working hours are above the reference level. This model can easily be restricted to use only income or only working hours as the reference.
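The equation for this model is not reproduced in this section. One form consistent with the variable definitions above is sketched below. Only $\delta$ and $\gamma$ are named in the text; the coefficients $\beta$, $\theta$, $\lambda$ and the error term $\varepsilon_{ijp}$ are notation assumed here, and the cumulative income and hours terms may enter in logs rather than levels.

```latex
% Sketch under assumed notation; delta and gamma are the coefficients named in the text.
C_{ijp} = X_{ijp}\beta
        + \theta\, Y_{ijp} + \delta\, I\!\left[Y_{ijp} > Y^{T}_{ij}\right]
        + \lambda\, H_{ijp} + \gamma\, I\!\left[H_{ijp} > H^{T}_{ij}\right]
        + \varepsilon_{ijp}
```

Dropping the two hours terms gives the income-only variant, and dropping the two income terms gives the hours-only variant.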

Data Construction
We use a comprehensive dataset of taxi drivers in Chengdu, China, from August 3 to August 23, 2016. The dataset contains over 14 thousand taxis. For each taxi, there is an observation every minute from 6:00 am to 11:59 pm that includes its location and status (with or without passengers). There are more than one billion observations in total.
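One way the minute-level status records could be combined into trips is sketched below. This is an illustration under stated assumptions, not the authors' procedure: the function name `minutes_to_trips`, the `(minute_of_day, occupied)` record layout, and the rule that a trip is a maximal run of consecutive occupied minutes are all hypothetical.

```python
def minutes_to_trips(records):
    """Collapse minute-level records for ONE taxi on ONE day into trips.

    records: list of (minute_of_day, occupied) tuples sorted by minute.
    A trip is assumed to be a maximal run of consecutive occupied minutes.
    Returns a list of (start_minute, end_minute, duration_minutes).
    """
    trips = []
    start = prev = None
    for minute, occupied in records:
        if occupied and start is not None and minute == prev + 1:
            # Still inside the current trip.
            prev = minute
        else:
            # Either the taxi is empty or there is a gap in the records:
            # close the open trip, if any.
            if start is not None:
                trips.append((start, prev, prev - start + 1))
                start = None
            if occupied:
                start = prev = minute
    if start is not None:
        trips.append((start, prev, prev - start + 1))
    return trips
```

For example, a taxi that is occupied at minutes 361 and 362, empty at 363, and occupied again at 364 yields two trips, one of two minutes and one of one minute.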
Compared to the dataset used in studying NYC taxi drivers, our dataset has fares recorded directly by the meter. Under the metered fare schedule, a per-kilometer rate in CNY applies to distance travelled between 2 km and 10 km, and the price is 2.85 CNY per km beyond 10 km; if the speed is lower than 12 km per hour, the time counts toward waiting time, and every 5 minutes of waiting time is counted as 1 km travelled.
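The fare rule can be illustrated with a small function. This is a sketch only: the starting fare and the rate between 2 km and 10 km are not given in this section, so they are left as required arguments (`base_fare`, `rate_2_to_10`, both hypothetical names). It is also an assumption that the starting fare covers the first 2 km and that waiting-time kilometres are added to distance before the tiers are applied.

```python
def meter_fare(distance_km, waiting_min, base_fare, rate_2_to_10,
               rate_over_10=2.85, waiting_min_per_km=5.0):
    """Approximate metered fare in CNY.

    base_fare and rate_2_to_10 must be supplied by the caller because the
    section does not state them. Waiting minutes (speed below 12 km/h) are
    converted to kilometres at 5 minutes per km, as described in the text.
    """
    billable_km = distance_km + waiting_min / waiting_min_per_km
    fare = base_fare  # assumed to cover the first 2 km
    fare += rate_2_to_10 * max(0.0, min(billable_km, 10.0) - 2.0)
    fare += rate_over_10 * max(0.0, billable_km - 10.0)
    return fare
```

A call such as `meter_fare(12, 0, base_fare=10.0, rate_2_to_10=2.0)` uses placeholder values for the two unknown parameters; only the 2.85 CNY rate and the 5-minute waiting rule are taken from the text.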
As a robustness check, we refine the dataset and keep the information for one driver on each day. Following the standard literature, we identify driver shifts by the length of time a taxi spends without passengers. If a taxi remains without passengers for more than two hours, we treat that gap as a shift change, and we keep only the information for the first driver, from the beginning of the day to the time of the shift change. We acknowledge that identifying shifts accurately is difficult from an empirical perspective, and we rely on this method, which is commonly used in the taxi driver literature. We also try identifying shifts using longer time gaps, and the results are all consistent.
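The two-hour rule can be sketched as follows. The function name `first_shift` and the `(start_minute, end_minute)` trip layout are assumptions made for illustration; the 120-minute threshold is the one stated above and can be raised to try longer gaps.

```python
def first_shift(trips, max_gap_minutes=120):
    """Keep only the trips of the first shift of the day.

    trips: list of (start_minute, end_minute) for one taxi, sorted by start.
    The first idle gap longer than max_gap_minutes is treated as the shift
    change, and every trip after it is dropped.
    """
    kept = []
    for trip in trips:
        if kept and trip[0] - kept[-1][1] > max_gap_minutes:
            break
        kept.append(trip)
    return kept
```

With trips ending at minute 420 and the next one starting at minute 600, the 180-minute gap exceeds the threshold, so only the earlier trips are kept.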

Empirical Results
The literature on labor supply consists of two major competing theories, the neoclassical theory and the reference-dependent theory. The empirical findings regarding these two theories are mixed and inconclusive. This paper presents a comprehensive study of taxi drivers' labor supply behavior using a new dataset of taxi drivers from China. By conducting our study in a setting different from that of the existing literature, we hope to clarify its findings.

Evidence from the Wage Elasticity of Labor Supply
We perform a linear regression of working hours on the hourly rate earned for that day, as discussed in the empirical model (1). Specifically, we regress Ln(Work Hour) on Ln(Hourly Rate) and control for a set of fixed effects. As discussed earlier, neoclassical theory predicts the coefficient of Ln(Hourly Rate) to be positive, and a negative coefficient does not support the neoclassical model. We first list the summary statistics of the related variables for each taxi over each day. We have 197,573 taxi-day observations in total.
In column (2), we include Taxi Fixed Effects. We can use this fixed effect to control for differences in working hours due to the different working habits of taxi drivers, different effects from regular work locations, and so on. The coefficient of Ln(Hourly Rate) is −0.157, and it is again significant at the 1% level. Column (3) of Table 1 includes Taxi Fixed Effects, as well as Weather Fixed Effects, Day of Week Fixed Effects, and Week Fixed Effects. The Weather Fixed Effects control for variation in working hours caused by the weather of the day; for example, a rainy day and a sunny day might affect working hours differently. The Day of Week Fixed Effects account for differences between weekdays and weekends. The Week Fixed Effects control for differences from week to week. These fixed effects are comprehensive and leave the hourly rate as a main source of variation in working hours. The coefficient of Ln(Hourly Rate) is −0.164, with a large t-statistic of −9.351. All three columns show that Ln(Hourly Rate) has a negative effect on Ln(Work Hour), and the effect is statistically significant at the 1% level.
Now we consider the economic significance, using column (3) as an example. Taken together, because drivers work less when wages go up, the effect is clearly the opposite of what neoclassical theory predicts.
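To make the role of Taxi Fixed Effects concrete, the sketch below estimates the slope of Ln(Work Hour) on Ln(Hourly Rate) after demeaning both variables within each taxi. It is a simplified illustration, not the specification behind Table 1: it handles a single regressor, omits the weather, day-of-week, and week effects, and the function name and row layout are assumptions.

```python
import math
from collections import defaultdict

def within_elasticity(rows):
    """Slope of ln(hours) on ln(hourly rate) with taxi fixed effects.

    rows: iterable of (taxi_id, hours, hourly_rate), one row per taxi-day.
    Demeaning by taxi removes any taxi-specific level of working hours.
    """
    by_taxi = defaultdict(list)
    for taxi, hours, rate in rows:
        by_taxi[taxi].append((math.log(rate), math.log(hours)))
    sxy = sxx = 0.0
    for obs in by_taxi.values():
        mean_x = sum(x for x, _ in obs) / len(obs)
        mean_y = sum(y for _, y in obs) / len(obs)
        for x, y in obs:
            sxy += (x - mean_x) * (y - mean_y)
            sxx += (x - mean_x) ** 2
    return sxy / sxx
```

A negative return value corresponds to drivers working fewer hours on days with higher hourly rates, which is the sign pattern reported in all three columns.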
Our finding is in line with the literature studying taxi drivers' labor supply. Farber et al. (2015) argue that the negative elasticity is not large enough and point out that the negative sign could be due to measurement or specification error, which may bias the elasticity downward. Such bias can arise because daily working hours are the dependent variable while the average hourly income is the ratio of daily income to daily hours.
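The mechanism behind this downward bias can be illustrated with a small simulation. This is a sketch with made-up parameters rather than an analysis of the Chengdu data: true hours are drawn independently of the wage, so the true elasticity is zero, and noise is then added to recorded hours. Because the hourly wage is computed as income divided by recorded hours, the same noise enters the regressor with the opposite sign.

```python
import random

def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def simulate_division_bias(n=20000, noise_sd=0.2, seed=0):
    """Estimated elasticity when hours are mismeasured and the true value is 0."""
    rng = random.Random(seed)
    log_hours_obs, log_wage_obs = [], []
    for _ in range(n):
        true_log_hours = rng.gauss(2.0, 0.2)   # independent of the wage
        true_log_wage = rng.gauss(3.5, 0.1)
        log_income = true_log_wage + true_log_hours
        error = rng.gauss(0.0, noise_sd)       # measurement error in hours
        hours_obs = true_log_hours + error
        log_hours_obs.append(hours_obs)
        # Wage = income / recorded hours, so it inherits the error negatively.
        log_wage_obs.append(log_income - hours_obs)
    return ols_slope(log_wage_obs, log_hours_obs)
```

With these illustrative parameters the estimated slope is clearly negative even though hours do not respond to the wage at all, and setting `noise_sd=0.0` brings it back close to zero.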
Several papers in the literature then propose to address this problem by using various instruments, e.g., other drivers' hourly wages on the same day. Farber et al. (2015) show that although the OLS result produces a negative elasticity, the elasticity becomes strongly positive once the instrumental variable is added, and hence supports the neoclassical prediction. This type of measurement error may exist due to "tips" or "imperfectly recorded and transcribed paper trip sheets" in the NYC taxi dataset used in many papers, including Farber et al. (2015).
Our dataset is almost immune to this problem for the following reasons. First, taxi drivers in China rarely receive tips, and they do not count on tips as part of their income. Second, during the sample period, all trips are recorded through meters without any manual input. When the accuracy of the dataset is not a concern, the IV method may not be a good estimation approach, because such instruments lack variation and are essentially constant across drivers and days. The instruments are therefore rather weak in terms of explanatory power.
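An instrument of this kind can be sketched as a leave-one-out average: for each driver-day, the mean hourly wage of all other drivers on the same day. The function name and the `(driver_id, day, hourly_wage)` layout are assumptions for illustration, and the sketch assumes one row per driver per day.

```python
from collections import defaultdict

def leave_one_out_wage(rows):
    """Mean hourly wage of the OTHER drivers working on the same day.

    rows: list of (driver_id, day, hourly_wage), one row per driver-day.
    Returns a dict mapping (driver_id, day) to the leave-one-out mean.
    Days with a single driver are skipped because no other wage exists.
    """
    total = defaultdict(float)
    count = defaultdict(int)
    for _, day, wage in rows:
        total[day] += wage
        count[day] += 1
    instrument = {}
    for driver, day, wage in rows:
        if count[day] > 1:
            instrument[(driver, day)] = (total[day] - wage) / (count[day] - 1)
    return instrument
```

When thousands of drivers work on the same day, removing one driver's wage barely changes the average, so the instrument takes nearly the same value for every driver within a day.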
Table 2. Summary statistics on the daily basis.
An important econometric problem with this approach is that the estimate relies on there being significant exogenous transitory day-to-day variation in the average wage. However, it is hard to see a source of legitimate variation in the average hourly wage in the real data. Hence, in the following, we examine the discrete-choice stopping model and its asymmetric effects.

Evidence from Discrete-Choice Stopping Model
The OLS linear regression produces a negative elasticity of labor supply, which is both economically and statistically significant. This result cannot be explained by the neoclassical theory. On the other hand, the reference-dependent model has a quite contrasting prediction for the elasticity, as suggested in the previous section. To check whether our OLS result is consistent with the reference-dependent model, we follow Farber et al. (2015) and model the labor supply decision of taxi drivers as a dynamic discrete-choice problem, in which they decide whether to continue working after each trip. The reduced form should therefore take into consideration the potential earnings opportunities, hours worked, income earned, and other factors that could affect preferences for work.
As suggested in Farber et al. (2015), without deriving a fully dynamic solution to the optimal stopping problem, a simple discrete-choice problem can be implemented empirically as a reasonable approximation. The optimal stopping point can be a function of many factors, including hours worked and expectations about future earnings possibilities. If daily income effects are important, the optimal stopping point can also be a function of income earned. Following our previous discussion of reference-dependent models, individuals can make decisions based on income targets, hour targets, or both. In order to identify the most relevant explanation, we examine all three models.
We first summarize all the related variables on the trip basis. In the regressions that follow, standard errors are clustered by day. In columns (1), (2), and (3), we conduct the OLS estimation, and in column (4), we estimate using a probit model. Table 4 reports the marginal effects of accumulated income on the probability of a shift ending after each trip. In column (1), we show the OLS estimation result without controlling for any fixed effects. As earned income accumulates, the probability of stopping starts to increase significantly compared to the baseline level of a cumulative fare below 100 CNY. We can see a clear pattern: the probability of stopping slowly increases for the fare range between 300 and 400 CNY, and then it sharply increases for the fare range between 400 and 500. The probability of stopping peaks for the fare range between 600 and 700, and then declines as the fare range further increases. These results are all statistically significant at the 1% level, and they hold when controlling for various fixed effects.
In column (4) of Table 4, which reports the probit estimates, the estimated effect on stopping peaks at 1.799 for the fare range between 600 and 700, and then declines slightly to 1.700 and 1.372 for the fare range between 700 and 800 and the fare range above 800, respectively. These results are all statistically significant at the 1% level.
The reference-dependent model with an income target suggests that 1) if income is below the income target, drivers have a higher marginal utility of income; 2) if income is above the income target, drivers have a higher marginal utility of leisure (disutility of work). Moreover, such a change around the income target is not smooth. It implies that the probability of stopping will be lowest when income is below the income target and highest when income is above the target. Our results in Table 4 indeed support this prediction.
One possible alternative explanation of the finding in Table 4 is that earned income and hours worked are highly positively correlated. Therefore, one may argue that taxi drivers decide when to stop working based on an hour target instead of an income target, or on both jointly. To address this concern, we also check the reduced-form estimates of the reference-dependent models with an hour target and with both hour and income targets in the next two subsections.
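The fare-range dummy variables behind Table 4 can be illustrated with a simple binning routine. This is a sketch assuming 100 CNY ranges with everything at or above 800 CNY pooled, which matches the ranges mentioned in the text; the function names, labels, and the treatment of exactly 800 CNY are assumptions. It computes raw stopping frequencies by range, which is a descriptive counterpart to the regression and not the regression itself.

```python
from collections import defaultdict

def fare_range(cum_fare, width=100, top=800):
    """Label the cumulative-fare range (in CNY) that an observation falls in."""
    if cum_fare >= top:
        return f"{top}+"
    low = int(cum_fare // width) * width
    return f"{low}-{low + width}"

def stop_rate_by_range(trips):
    """Share of trips that end a shift, by cumulative-fare range.

    trips: iterable of (cum_fare, stopped) with stopped equal to 0 or 1.
    """
    n = defaultdict(int)
    stops = defaultdict(int)
    for cum_fare, stopped in trips:
        key = fare_range(cum_fare)
        n[key] += 1
        stops[key] += stopped
    return {key: stops[key] / n[key] for key in n}
```

An income-target pattern would show up here as a stopping rate that rises with the fare range, peaks, and then falls, whereas a pure fatigue pattern would keep rising.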

Work Hour as Target
The reduced form of the working-hour dependent model can be estimated by regressing Stop Trip on the hour range dummy variables. The estimation results are presented in Table 5. Again, in columns (1), (2), and (3), we conduct the OLS estimation, and column (4) presents the estimation results using a probit model. In column (1), we show the OLS estimation result without controlling for any fixed effects. As working hours accumulate, the probability of stopping starts to increase significantly compared to the baseline level. Specifically, compared to the baseline level of cumulative hours below 2 hours, Stop Trip first shows no significant difference for cumulative working hours between 2 hours and 4 hours; Stop Trip increases by 0.001 when the cumulative working hours are between 4 hours and 6 hours; Stop Trip increases by 0.003 when the cumulative working hours are between 6 hours and 8 hours; and Stop Trip increases by 0.006 for a longer range of cumulative working hours.
We can see a clear pattern: the probability of stopping slowly increases up to the working hours between 12 hours and 14 hours, and then it sharply increases for the hour range between 14 hours and 16 hours. The probability of stopping peaks for the working hour range above 16 hours, and we find no evidence of a decreasing pattern as working hours increase. These results are all statistically significant at the 1% level and hold when controlling for various fixed effects.
Column (4) of Table 5 presents the probit model regression results. Again, we cannot include Taxi Fixed Effects. The pattern from the probit model is generally consistent with the previous three columns. Compared to the baseline level of working hours below 2 hours, the estimated effect on stopping slowly increases from 0.118 with working hours between 4 hours and 6 hours to 1.157 with working hours between 14 hours and 16 hours, and then it sharply increases and reaches its peak of 1.576 for working hours above 16 hours. These results are all statistically significant at the 1% level.
Similar to the prediction of a reference-dependent model with income target, a reference-dependent model with hour target would suggest that the individual should have a higher marginal utility of work if working hours are below the target and higher marginal utility of leisure if working hours are above the target. In terms of probability of stopping, we should expect this probability to peak around the target working time.
On the other hand, the neoclassical model predicts that as working hours accumulate, taxi drivers' marginal utility of leisure becomes larger. Therefore, the probability of ending a shift should keep increasing.
The finding in Table 5 is inconsistent with the implication of a reference-dependent model that uses working hours as the target. Together, our findings in Table 4 and Table 5 suggest that a reference-dependent model with an income target at least plays a role in explaining taxi drivers' labor supply behavior, in that they rule out the model with an hour target alone as an explanation. However, we still need to consider the model with both income and hour targets, because it could explain these behaviors better: the income target may play a different role depending on whether the hour target is met.

Both Income and Work Hours as Targets
The income and working-hour dependent model can be estimated by regressing Stop Trip on both the fare range and hour range dummy variables. The estimation results are presented in Table 6. As in Table 4 and Table 5, we conduct the OLS estimation in columns (1), (2), and (3), and we estimate using a probit model in column (4).
Table 6. Regression of stopping trip on both fare and hour range dummy variables.
The results in column (1) can be compared with those in column (1) of Table 4 and column (1) of Table 5. In column (4), the estimated effect on stopping keeps increasing through the fare range between 500 and 600 and reaches its peak at 0.418 for the fare range between 600 and 700, and then declines to 0.334 and 0.147 for the fare range between 700 and 800 and the fare range above 800, respectively. The pattern of the effects of the hour ranges is generally consistent with the previous three columns. Compared to the baseline level of working hours below 2 hours, the estimated effect on stopping slowly increases from 0.107 with working hours between 4 hours and 6 hours to 0.818 with working hours between 12 hours and 14 hours, and then it sharply increases to 1.306 with working hours between 14 hours and 16 hours and reaches its peak of 1.951 for working hours above 16 hours. These results are all statistically significant at the 1% level.
Overall, we find that the probability of stopping keeps increasing at an increasing rate as working hours accumulate, but as the cumulative fare increases, it first increases and then decreases.
The reference-dependent model discussed in the previous section suggests that there are two "domains of losses": 1) if income is below the income target, drivers have a higher marginal utility of income; 2) if hours are above the hours target, drivers have a higher marginal utility of leisure (disutility of work). It implies that the probability of stopping will be lowest when income is below the income target and hours are below the hours target, and highest when income and hours are above the income and hour targets, respectively.

Evidence from Asymmetric Models
To further check the robustness of our finding, we follow Crawford and Meng (2011) and Farber (2008) to estimate a reduced form of the stopping probability with dummy variables that measure the incremental effects of hitting the income and hours targets. Following the empirical model, we construct the targets from the daily data summarized in Table 2. Here we calculate the sample average for each taxi; additionally, we also calculate the average across all taxi drivers, and the results do not vary much. The two dummy variables indicate whether income or working hours are above the targets. If either of their coefficients is positive, it is consistent with the prediction of a reference-dependent model. Table 7 lists the regression results for the asymmetric effects of income and hour targets. We report the t-statistics in parentheses, and standard errors are clustered by day. In column (1), we exclude all the fixed effects. Column (2) includes Taxi Fixed Effects, and column (3) includes Taxi Fixed Effects, Weather Fixed Effects, Day of Week Fixed Effects, and Week Fixed Effects. The estimation results in the three columns are virtually the same.
Taking column (3) as an example, the coefficient of Income Target is 0.045, and it is significant at the 1% level. The coefficient of Hour Target is also significantly positive. However, when we look at the coefficients of Ln(Cum Fare) and Ln(Cum Hour), we see a totally different pattern. The coefficient of Ln(Cum Fare) is not significantly different from zero, meaning that once we take into consideration the effect of Income Target, the log level of cumulative fare shows no effect on the probability of stopping. In contrast, the coefficient of Ln(Cum Hour) is 0.008 and significant at the 1% level. The significant positive coefficient of the log level of cumulative working hours shows that the probability of stopping always increases as working hours increase.
Overall, our evidence from the asymmetric model again shows that we can use the reference-dependent model with income as the target to explain the behavior of taxi drivers, but there is a lack of evidence supporting the working-hour dependent model.
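The construction of the two target dummies can be sketched as follows, assuming that each taxi's targets are its own sample averages of daily income and daily hours. That reading of the target definition, along with the function names and data layout, is an assumption made for illustration.

```python
from collections import defaultdict

def taxi_targets(daily_totals):
    """Per-taxi sample averages used as reference levels.

    daily_totals: list of (taxi_id, daily_income, daily_hours).
    Returns {taxi_id: (mean_daily_income, mean_daily_hours)}.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for taxi, income, hours in daily_totals:
        acc = sums[taxi]
        acc[0] += income
        acc[1] += hours
        acc[2] += 1
    return {taxi: (acc[0] / acc[2], acc[1] / acc[2]) for taxi, acc in sums.items()}

def target_dummies(cum_fare, cum_hours, targets):
    """Return (Income Target, Hour Target) indicators for one trip."""
    income_target, hour_target = targets
    return int(cum_fare > income_target), int(cum_hours > hour_target)
```

Replacing the per-taxi averages with one average across all taxis gives the alternative target definition mentioned in the text.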

Discussion
We find strong evidence that the working hours of drivers are negatively related to hourly rates, and this effect is both statistically and economically significant.
We then estimate a discrete-choice model of the probability of stopping on a set of cumulative fare ranges and cumulative working hours. We find consistent evidence that the probability of stopping keeps increasing as cumulative working hours increase, but first increases and then decreases as the cumulative fare increases. This indicates the existence of an income target in taxi drivers' labor supply decisions. Lastly, we estimate the asymmetric model with the income target and the working-hour target as dummy variables; the probability of stopping is significantly positively related to the income target but shows no significant relation with cumulative fare. In contrast, both the working-hour target and cumulative working hours seem to be important in explaining the probability of stopping.
Overall, our results clearly reject the prediction of the neoclassical theory, as the elasticity of labor supply is significantly negative. More interestingly, among the three reference-dependent models, our results are best explained by the income-based reference model. That is, taxi drivers seem to target certain income levels instead of total working time. This finding is quite different from the literature. For example, Crawford and Meng (2011) find that their results are more in line with the reference-dependent model with both income and hour targets. One possible explanation of the difference between these findings is that Chinese taxi drivers may view income and leisure differently from their counterparts in New York City. Such a difference may be due to culture, working conditions, and living environment.

Conclusion
Drivers are a well-suited research subject for studying the wage elasticity of labor supply, as the results of the models above illustrate.
The application to the dataset of taxi drivers in Chengdu, China, also exposes differences between the existing literature and our empirical results, which calls for further study.
Drivers, of course, are not representative of the whole working population.
Apart from some demographic differences, many other groups (e.g., farmers and small-business proprietors) have similar self-selected occupations with low, variable wages, long work hours, and relatively high rates of accidents. Therefore, it is important for these workers to plan over a long horizon and to allocate their labor and investment effectively across economic and educational opportunities for themselves and their children. This calls for attention and help from educators and policy makers to improve the social welfare of a nation.

Conflicts of Interest
The author declares no conflicts of interest regarding the publication of this paper.