Seeking for Passenger under Dynamic Prices: A Markov Decision Process Approach

In recent years, ride-on-demand (RoD) services such as Uber and Didi have become increasingly popular. Unlike traditional taxi services, RoD services adopt dynamic pricing mechanisms to manipulate supply and demand on the road, and such mechanisms improve service capacity and quality. Seeking route recommendation has been widely studied for taxi services. In RoD services, the dynamic price is a new and accurate indicator of the supply and demand condition, yet it has rarely been studied as a clue to help drivers seek passengers. In this paper, we propose to incorporate the impact of dynamic prices as a key factor in recommending seeking routes to drivers. We first show the importance of and need for doing so by analyzing real service data. We then design a Markov Decision Process (MDP) model based on passenger order and car GPS trajectory datasets, and take dynamic prices into account when designing rewards. Results show that our model not only guides drivers to locations with higher prices, but also significantly improves driver revenue. Compared with drivers' revenue before using the model, revenue after using it increases by up to 28%.


Introduction
Dynamic pricing is one of the key features that make RoD attractive to both passengers and drivers: it is an effort to manipulate supply (the number of cars on the road) and demand (the number of passenger requests). Specifically, higher prices attract more drivers and delay requests from passengers who are not in a hurry; lower prices have the opposite effect [1].
Dynamic pricing can help to improve service quality, but it also poses new problems for researchers. In RoD services, the price multiplier is a new indicator that drivers can use when choosing seeking strategies, and it describes local supply and demand conditions more accurately. However, effective strategies remain to be explored. For example, if all drivers flock to a particular area with a high price multiplier, there will be an oversupply in that area, causing the price multiplier to fall sharply. This not only makes prices unstable, but also frustrates drivers seeking high prices. In fact, many news reports and research papers have discussed this intuitive "chase the wave" strategy; however, they sometimes give contradictory advice [1]. Therefore, how to guide drivers to areas with high dynamic prices is a pressing issue that calls for a rigorous method rather than verbal advice and discussion.
Recommending seeking routes [2] to drivers has been frequently studied for traditional taxi services [3] [4], for example by mining patterns of passenger-seeking strategies from taxi GPS traces or by establishing an MDP model to evaluate strategies. In RoD services, however, dynamic pricing is a new and accurate indicator of the supply and demand condition and should be considered in seeking route recommendation. Nevertheless, dynamic pricing is rarely used in RoD research to recommend destinations to drivers. This is because most suggestions from news reports or blogs are not rigorous enough, and existing studies are mostly based on theoretical models that require many assumptions and approximations. In fact, the lack of real data on RoD services has hampered research based on data analysis methods [1].
In this paper, we design a Markov Decision Process [5] [6] [7] [8] (MDP) model to answer "how to use dynamic prices to help drivers in seeking for passengers". We first illustrate the need to consider the impact of dynamic prices by analyzing passenger order and car GPS trajectory data [9] [10]. We then establish an MDP model based on these datasets. In the model, the study area is first divided into a grid. Second, the passenger travel data and driver travel data are matched to the grid. Third, we calculate the pickup probability of each grid and the destination probability of passengers. Finally, we use the dynamic price in the reward of our MDP model. We then adopt a dynamic programming algorithm to solve the MDP and obtain the optimal action for each grid, which is recommended to drivers.
The main contributions of this paper are as follows: 1) We introduce dynamic pricing into the seeking route recommendation problem in RoD services. 2) We design an MDP model to answer "how to use dynamic prices to help drivers in seeking for passengers", together with a dynamic programming algorithm to solve it. Compared with real drivers, our model increases revenue by up to 28%. 3) We conduct extensive experiments on real service datasets (including passenger data and driver GPS trajectory data [11]) and report the evaluation results.
The remainder of this paper is organized as follows. Section 2 analyzes the patterns of dynamic prices. Section 3 designs a Markov Decision Process (MDP) model to answer "how to use dynamic prices to help drivers in seeking for passengers". Section 4 evaluates our model. Finally, Section 5 concludes the paper.

Dynamic Price Analysis
In this section, we introduce what dynamic pricing is, analyze the passenger data and driver GPS trajectory data of Shenzhou UCar [12] from November, and examine the potential impact of dynamic pricing on passenger and driver travel.

Dynamic Price Definition
Dynamic pricing can manipulate supply (the number of cars on the road) and demand (the number of passenger requests). Specifically, a higher price attracts more drivers and delays requests from passengers who are not in a hurry, and a lower price has the opposite effect. In most cases, dynamic prices are represented by price multipliers, so the fare of a trip is the product of a dynamic price multiplier (depending on the conditions of supply and demand) and a fixed normal price (based on travel time and distance). Hence, dynamic pricing can be written as:
$$\text{fare} = \text{price multiplier} \times \text{normal price} \qquad (1)$$
From Equation (1), it can be seen that dynamic pricing affects the driver's revenue: the higher the dynamic price multiplier, the more the driver earns.
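As a small illustration of Equation (1), the sketch below computes the fare of a single trip under several multipliers. The function name `trip_fare` and the 30 yuan normal price are hypothetical values chosen for the example, not figures from the dataset.

```python
def trip_fare(multiplier: float, normal_price: float) -> float:
    """Fare of a trip: dynamic price multiplier times the normal price."""
    if multiplier <= 0 or normal_price < 0:
        raise ValueError("multiplier must be positive and price non-negative")
    return multiplier * normal_price


# Hypothetical trip whose normal price (time and distance based) is 30 yuan.
for m in (1.0, 1.2, 1.5):
    print(f"multiplier {m}: fare {trip_fare(m, 30.0):.2f} yuan")
```

For the same trip, the fare grows linearly with the multiplier, which is why the multiplier matters for driver revenue.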

Data Analysis
In this paper, we randomly selected passenger data of one day for data analysis, as shown in Figure 1.
From Figure 1, it can be found that passenger demand decreases continuously from midnight to 3 am, reaches its lowest point at 4 am, and then begins to increase slowly. From 6 am to 7 am demand increases sharply, reaching its peak at 8 am and decreasing at 9 am. Therefore, we regard 6:00-9:00 as the morning peak, and for the same reason we regard 16:00-18:00 as the evening peak.
According to Section 2.1 we divide orders into seven types based on dynamic price multipliers as shown in Figure 2.
It can be seen from Figure 2 that orders with a dynamic price multiplier of 1.0 account for the majority, and the remaining types account for only a small part. This is in line with the laws of the market, because the dynamic price coefficient gradually stabilizes as supply and demand keep changing. Therefore, it is especially important for drivers to pursue orders with high dynamic price multipliers. To verify the importance of dynamic pricing, we conduct a temporal and spatial analysis of the dynamic price multiplier.
From Figure 3(a) and Figure 3(b), it can be found that the dynamic price multiplier in the morning peak is generally lower than that in the evening peak. In addition, during the evening peak the dynamic price coefficient trends upward at 16:00 and reaches a dynamic equilibrium at 17:00. Therefore, the dynamic price multiplier is affected by time. However, Figure 2 shows that orders with a high dynamic price multiplier account for only a small part of all orders. This suggests that there may be areas with high dynamic price multipliers and areas with low dynamic price multipliers. Hence, we continue with a spatial analysis of the order data.
Figures 4(a)-(c) are price multiplier distribution maps of Beijing, showing the low price, medium price and high price areas, respectively.
According to Figure 4(b) and Figure 4(c), in the city center the high price multiplier area is far smaller than the medium price multiplier area. However, there are more areas with high dynamic prices outside the urban area than within it. A possible reason is that drivers are reluctant to go to these places because of the higher cost of picking up passengers there. As a result, it is difficult for passengers to hail a car in remote areas.
As can be seen from Figures 5(a)-(c), the dynamic price coefficient fluctuates differently in different regions. Among these three regions, the coefficient in the region with low dynamic prices fluctuates only mildly, while that in the region with high dynamic prices fluctuates sharply. Therefore, not all orders in low dynamic price regions have low multipliers, but the probability of receiving a high price multiplier order there is particularly low. Likewise, in high dynamic price regions not all orders have high multipliers, but the probability of receiving a high price multiplier order is particularly high.
From the above analysis, we draw three conclusions: 1) Dynamic pricing affects the driver's revenue: the higher the dynamic price coefficient, the more the driver earns. 2) Based on the historical data, Beijing can be divided into three kinds of regions, namely low price, medium price and high price areas. 3) In a grid of the high price area, not all orders have a high dynamic price coefficient, but the probability that a driver receives a high price multiplier order is very high. In a grid of the low price area, not all orders have a low dynamic price coefficient, but the probability that a driver receives a low price multiplier order is very high. Hence, it is necessary to introduce dynamic pricing into our model as a factor.

MDP Model in RoD
An MDP is described by a tuple (S, A, P, R, δ), where S is the state space, A is the set of allowable actions, P is the state transition matrix, R is the reward function, and δ is the discount factor.
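To make the tuple concrete, the following is a minimal sketch of how (S, A, P, R, δ) might be held in code. The class name `MDP`, its field names and the two-state toy instance are illustrative assumptions and do not come from the model in this paper.

```python
from dataclasses import dataclass
from typing import Dict, Hashable, List, Tuple

State = Hashable
Action = Hashable


@dataclass
class MDP:
    states: List[State]                                          # S
    actions: List[Action]                                        # A
    transitions: Dict[Tuple[State, Action], Dict[State, float]]  # P
    rewards: Dict[Tuple[State, Action], float]                   # R
    discount: float                                              # delta

    def check(self) -> bool:
        """Return True if every transition distribution sums to 1."""
        return all(abs(sum(dist.values()) - 1.0) < 1e-9
                   for dist in self.transitions.values())


# Toy instance with two hypothetical states and two hypothetical actions.
toy = MDP(
    states=["A", "B"],
    actions=["stay", "move"],
    transitions={
        ("A", "stay"): {"A": 1.0},
        ("A", "move"): {"A": 0.2, "B": 0.8},
        ("B", "stay"): {"B": 1.0},
        ("B", "move"): {"A": 0.8, "B": 0.2},
    },
    rewards={("A", "stay"): 0.0, ("A", "move"): -1.0,
             ("B", "stay"): 2.0, ("B", "move"): -1.0},
    discount=0.9,
)
print(toy.check())
```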

MDP for Ride-Hailing Drivers
In this section, we develop an MDP model for the ride-hailing driver's stochastic passenger-seeking process. The notation used in the subsequent analysis is listed in Table 1. Each state records the driver's current grid $l \in L$, the current time $t \in T$, and the incoming direction $d \in D$ from which the driver arrived at the grid. We use ten numbers (0-9) to index the incoming directions, as illustrated in Figure 6. Index 0 indicates that the driver has just dropped off a passenger, and index 5 denotes that the driver does not have any arriving direction. Finally, a state in our MDP model can be written as $s = (l, t, d)$. The maximum number of states in our model can be calculated as $|L| \times |T| \times |D|$. Nevertheless, the actual number of useful states is much smaller than this.
In decision-making states, each driver has nine allowable actions: moving to one of the eight neighboring grids or staying in the current grid. Formally, this can be represented as $a \in A$, where $A$ is the set of these nine actions. We use nine numbers (1-9) to index these actions, as shown in Figure 6(b). The number 5 stands for staying in the current grid. From Figure 6(a) and Figure 6(b), we can see that the index of an action and the index of the corresponding incoming direction in $D$ add up to ten (not including index 0). Therefore, when the driver chooses action $a$, the incoming direction $d$ at the next grid can be calculated as
$$d = 10 - a.$$
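The indexing rule can be sketched in a few lines. Figure 6 is not reproduced here, so the layout below is an assumption: actions 1-9 are laid out row by row on a 3 × 3 block with 5 in the center, which is one layout in which opposite directions add up to ten. The helper names `action_offset`, `incoming_direction` and `step` are hypothetical.

```python
def action_offset(a: int) -> tuple:
    """(row, col) displacement for action a under the assumed 3x3 layout."""
    if not 1 <= a <= 9:
        raise ValueError("action index must be in 1..9")
    row, col = divmod(a - 1, 3)
    return row - 1, col - 1


def incoming_direction(a: int) -> int:
    """Incoming direction at the next grid: action and direction sum to ten."""
    return 10 - a


def step(cell: tuple, a: int) -> tuple:
    """Return (next_cell, incoming_direction) after taking action a."""
    d_row, d_col = action_offset(a)
    return (cell[0] + d_row, cell[1] + d_col), incoming_direction(a)


print(step((4, 4), 9))  # moves diagonally, arrives with direction 1
print(step((4, 4), 5))  # stays in place, direction 5 (no arriving direction)
```

Under this assumed layout, action 5 leaves the cell unchanged and yields direction 5, matching the convention that index 5 means no arriving direction.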

State Transition
When the driver selects an action, there are two possible outcomes. If the driver fails to pick up [13] a passenger, the driver continues to select actions and moves to neighboring grids to find passengers. If the driver successfully picks up a passenger, the driver transports the passenger to the destination. Because of fuel consumption, a driver who fails to find passengers has negative earnings. Figure 7 illustrates this state transition process. Suppose the driver is currently in grid $j$ at time $t$. In the first case, the driver picks up a passenger in grid $j$ after seeking for $t_{seek}(j)$ and delivers the passenger to grid $k$, where $t_{driver}(j, k)$ represents the time spent driving from grid $j$ to grid $k$. The driver earns a fare, in yuan, equal to the expected profit of a trip from grid $j$ to grid $k$, and starts seeking again from $k$. Thus, the driver's new state is in grid $k$, with the time advanced by the seeking time $t_{seek}(j)$ and the driving time $t_{driver}(j, k)$, and with incoming direction index 0. In the second case, the driver fails to pick up a passenger in grid $j$ after $t_{seek}(j)$. The driver then takes one of the nine allowable actions to move to another grid and look for passengers. For instance, if the driver takes action $a = 9$, the driver ends up in the neighboring grid in that direction, with incoming direction index $10 - 9 = 1$.
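The two outcomes can be sketched as a function that enumerates the possible next states. This is an illustrative simplification and not the paper's exact transition: the function `outcomes`, its parameter names, the row-by-row 3 × 3 action layout, the fixed `seek_cost`, and the choice to ignore travel time to the neighboring grid are all assumptions made for the example.

```python
def outcomes(state, action, p_pickup, p_dest, fare, t_seek, t_drive, seek_cost):
    """Return [(probability, next_state, reward), ...] for one seeking step.

    state is (cell, time, incoming_direction); cells are (row, col) tuples.
    """
    cell, t, _direction = state
    results = []
    # Outcome 1: a passenger is picked up in `cell` and delivered to grid k.
    for k, p_k in p_dest[cell].items():
        arrival = t + t_seek[cell] + t_drive[(cell, k)]
        # Direction 0 marks "just dropped off a passenger".
        results.append((p_pickup[cell] * p_k, (k, arrival, 0), fare[(cell, k)]))
    # Outcome 2: no pickup; move according to `action` (assumed 3x3 layout).
    row, col = divmod(action - 1, 3)
    neighbor = (cell[0] + row - 1, cell[1] + col - 1)
    # Travel time to the neighbor is ignored here (a simplifying assumption).
    next_state = (neighbor, t + t_seek[cell], 10 - action)
    results.append((1.0 - p_pickup[cell], next_state, -seek_cost))
    return results


# Hypothetical numbers for a driver in cell (1, 1) at time 0 taking action 9.
cell = (1, 1)
demo = outcomes(
    state=(cell, 0, 5),
    action=9,
    p_pickup={cell: 0.25},
    p_dest={cell: {(0, 0): 0.5, (2, 2): 0.5}},
    fare={(cell, (0, 0)): 20.0, (cell, (2, 2)): 30.0},
    t_seek={cell: 5},
    t_drive={(cell, (0, 0)): 10, (cell, (2, 2)): 15},
    seek_cost=1.0,
)
for prob, nxt, reward in demo:
    print(prob, nxt, reward)
```

The negative reward on the failure branch reflects the fuel cost of an unsuccessful search described above.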

MDP Parameters
The pickup probability of passengers
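As an illustration only, one simple way to estimate a per-grid pickup probability from gridded data is the ratio of pickups to vacant-car visits in each cell. This estimator, the function names `to_grid` and `pickup_probability`, the mesh origin and the cell size are assumptions for the sketch and may differ from the definition used in this paper.

```python
import math
from collections import Counter


def to_grid(lat, lon, lat0, lon0, cell_deg):
    """Map a GPS point to the (row, col) cell of a square mesh."""
    row = math.floor((lat - lat0) / cell_deg)
    col = math.floor((lon - lon0) / cell_deg)
    return row, col


def pickup_probability(vacant_points, pickup_points, lat0, lon0, cell_deg):
    """Per-cell ratio of pickups to vacant-car visits (assumed estimator)."""
    visits = Counter(to_grid(lat, lon, lat0, lon0, cell_deg)
                     for lat, lon in vacant_points)
    pickups = Counter(to_grid(lat, lon, lat0, lon0, cell_deg)
                      for lat, lon in pickup_points)
    # Cap at 1.0 in case the two datasets are not perfectly matched.
    return {cell: min(1.0, pickups[cell] / n) for cell, n in visits.items()}


# Hypothetical points: four vacant visits and one pickup in the same cell.
vacant = [(39.913, 116.403), (39.914, 116.404),
          (39.915, 116.405), (39.916, 116.406)]
picked = [(39.915, 116.405)]
probs = pickup_probability(vacant, picked, lat0=39.90, lon0=116.40, cell_deg=0.01)
print(probs)
```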

Solving MDP
In our MDP model, our goal is to maximize revenue in the current time slot.
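Hence, the optimal policy π is the policy that maximizes the driver's expected revenue over the current time slot. The pseudocode of the DP algorithm is given in Algorithm 1. Based on the data analysis, we first assign a dynamic price to each grid. Once the dynamic price coefficients are determined, the algorithm solves the MDP by dynamic programming.

Algorithm 1. DP algorithm for solving the MDP
Require: $L$, $T$, $A$, $D$, the probabilities $P$, and the seeking and driving quantities $d_{seek}$, $d_{driver}$, $t_{seek}$, $t_{driver}$
Ensure: The optimal policy π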
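Since the body of Algorithm 1 is not available here, the following is only a generic sketch of finite-horizon dynamic programming (backward induction) on a small grid, not the paper's algorithm. It assumes a deliberately simplified model: each action takes one time step, a pickup in the cell the driver moves to pays that cell's expected reward (where a dynamic price would enter), trips are treated as instantaneous, and actions follow a row-by-row 3 × 3 layout with 5 meaning stay. All names and numbers are hypothetical.

```python
def solve(p_pickup, reward, horizon, move_cost=0.0, discount=1.0):
    """Backward induction; returns (policy, value) for a finite horizon.

    policy[(t, (row, col))] is the best action index (1..9) at time t.
    """
    rows, cols = len(p_pickup), len(p_pickup[0])
    value = [[0.0] * cols for _ in range(rows)]  # value after the last step
    policy = {}
    for t in range(horizon - 1, -1, -1):
        new_value = [[0.0] * cols for _ in range(rows)]
        for r in range(rows):
            for c in range(cols):
                best_a, best_q = None, float("-inf")
                for a in range(1, 10):
                    d_row, d_col = divmod(a - 1, 3)
                    nr, nc = r + d_row - 1, c + d_col - 1
                    if not (0 <= nr < rows and 0 <= nc < cols):
                        continue  # action would leave the study area
                    cost = 0.0 if a == 5 else move_cost
                    q = (-cost + p_pickup[nr][nc] * reward[nr][nc]
                         + discount * value[nr][nc])
                    if q > best_q:
                        best_a, best_q = a, q
                policy[(t, (r, c))] = best_a
                new_value[r][c] = best_q
        value = new_value
    return policy, value


# Hypothetical 1 x 3 strip: the right-most cell has high demand and price.
policy, value = solve(
    p_pickup=[[0.1, 0.1, 0.9]],
    reward=[[10.0, 10.0, 20.0]],
    horizon=2,
    move_cost=0.5,
)
print(policy[(0, (0, 0))], policy[(1, (0, 1))], policy[(0, (0, 2))])
```

In this toy instance the driver in the left-most cell moves right at time 0 despite the moving cost, because the expected reward further right outweighs it, while a driver already in the high-price cell stays.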

Evaluation
In this section, we evaluate our MDP model from two aspects: 1) Does the way drivers seek passengers change after using our model? 2) Does the driver's income improve significantly after using our model?

Seeking Strategy Evaluation
We conduct simulations based on the MDP model and compare the simulation results with the passenger-seeking behavior of real drivers to verify the effectiveness of our recommendation. Figure 8 shows a map of Beijing. We select two areas, Fengtai Qu and Xicheng Qu, to evaluate drivers' seeking strategies.
Figure 9(a) presents the seeking strategies of real drivers in the suburbs during the evening peak (arrows of different sizes represent different drivers; red and blue indicate high price multiplier and low price multiplier areas, respectively). It can be seen from Figure 9(a) that the majority of drivers miss local pickup opportunities because they are in a hurry to move to urban areas. A few drivers can find passengers based on their experience, but there are too few experienced drivers to solve the problem that passengers have difficulty hailing a car.
Figure 9(b) presents the seeking strategies of MDP agents in the suburbs during the evening peak (arrows of different sizes represent different agents). From Figure 9(b), it can be seen that our algorithm preferentially recommends that agents search for passengers locally, and that during the search it gives priority to areas with high dynamic prices. Comparing Figure 9(a) and Figure 9(b), we find that with our algorithm agents give priority to searching for passengers locally, which also alleviates the difficulty passengers have in hailing a car in the suburbs.
Figure 10 presents the seeking strategies of MDP agents in the downtown area during the evening peak (green, pink and red backgrounds indicate low, medium and high price multipliers, respectively). It shows that our algorithm recommends that the driver move gradually from areas with low dynamic prices to areas with high dynamic prices.
In particular, our algorithm does not dispatch drivers from all regions to the regions with the highest dynamic prices; it dispatches only the drivers in some regions there. A possible reason is that using dynamic prices in dispatching prevents all drivers from flocking to a specific area with a high price multiplier and causing an oversupply in that area.

Revenue Evaluation
From Section 4.1, we know that when a driver uses our algorithm to search for passengers, the algorithm recommends going to areas with high dynamic prices to obtain high-quality orders. Hence, we use the quality of the orders obtained to evaluate the driver's revenue efficiency. To calculate revenue efficiency, we divide the driver's income per order by the driver's working time, where the working time is the sum of the time spent looking for a passenger and the time spent completing the order.
Figure 11(a) compares drivers' profit before and after using our algorithm (pink represents the profit obtained by drivers in medium price areas). From Figure 11(a), it can be seen that the driver's average profit per minute is higher after using our algorithm than before. Before using our algorithm, the minimum average revenue per minute is 0.71 yuan. After using our algorithm, the maximum average revenue per minute is 0.8336 yuan, an increase of up to 17%.
Figure 11(b) compares drivers' profit before and after using our algorithm (red represents the profit obtained by drivers in high price areas). From Figure 11(b), it can be seen that the driver's average profit per minute is higher after using our algorithm than before. Before using our algorithm, the minimum average revenue per minute is 0.71 yuan. After using our algorithm, the maximum average revenue per minute is 0.91 yuan, an increase of up to 28%.
Figure 12 shows the profit obtained by drivers seeking passengers in different dynamic price regions after using our algorithm. Red represents the profit obtained by drivers in high price areas and pink represents the profit obtained by drivers in medium price areas. Revenue in high price areas is up to 9% higher than in medium price areas.
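The reported percentages can be checked with a short calculation, assuming that the yield figures are relative gains between the quoted per-minute revenues. The function names `revenue_efficiency` and `relative_gain` and the example order (30 yuan, 10 minutes seeking, 20 minutes driving) are hypothetical.

```python
def revenue_efficiency(income, seek_minutes, trip_minutes):
    """Income per order divided by working time (seeking plus trip time)."""
    return income / (seek_minutes + trip_minutes)


def relative_gain(before, after):
    """Relative increase of `after` over `before`."""
    return (after - before) / before


# Hypothetical order: 30 yuan earned over 10 + 20 minutes of work.
print(revenue_efficiency(30.0, 10.0, 20.0))

# Per-minute revenues quoted in the text (yuan per minute).
print(round(relative_gain(0.71, 0.8336) * 100))  # medium price areas vs. before
print(round(relative_gain(0.71, 0.91) * 100))    # high price areas vs. before
print(round(relative_gain(0.8336, 0.91) * 100))  # high vs. medium price areas
```

Under this reading, the three calculations give 17%, 28% and 9%, which matches the figures quoted above.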

Conclusions and Future Work
In this paper, we design a Markov Decision Process (MDP) model to answer "how to use dynamic prices to help drivers in seeking for passengers". We first show the importance of and need for doing so by analyzing real service data. We then design an MDP model based on passenger order and car GPS trajectory datasets, and take dynamic prices into account when designing rewards. Results show that, on the one hand, when searching for passengers in the suburbs, our model guides drivers to nearby areas with high dynamic prices and improves drivers' utilization rate. On the other hand, when searching for passengers in the urban area, our model guides the driver to cruise gradually toward the high dynamic price area. Within the high dynamic price area, our model dispatches drivers reasonably, which prevents all drivers from flocking to a specific area with a high price multiplier and causing an oversupply there. Finally, compared with drivers' revenue before using the model, revenue after using it increases by up to 28%.
For future work, we will introduce multi-agent reinforcement learning to study the influence of dynamic prices on driver seeking; at the same time, we will introduce a probabilistic model to simulate the fluctuation of the dynamic multiplier.

Conflicts of Interest
The author declares no conflicts of interest regarding the publication of this paper.