
In recent years, ride-on-demand (RoD) services such as Uber and Didi have become increasingly popular. Unlike traditional taxi services, RoD services adopt dynamic pricing mechanisms to manipulate supply and demand on the road, and such mechanisms improve service capacity and quality. Seeking-route recommendation has been widely studied for taxi services. In RoD services, the dynamic price is a new and accurate indicator of supply and demand conditions, but it has rarely been studied as a source of clues for drivers seeking passengers. In this paper, we propose to incorporate the impact of dynamic prices as a key factor in recommending seeking routes to drivers. We first show the importance of and need for doing so by analyzing real service data. We then design a Markov Decision Process (MDP) model based on passenger order and car GPS trajectory datasets, and take dynamic prices into account in designing rewards. Results show that our model not only guides drivers to locations with higher prices, but also significantly improves driver revenue. Compared with what drivers earned before using the model, revenue after using it increases by up to 28%.

Dynamic pricing is one of the key features that makes RoD attractive to both passengers and drivers: it is an effort to manipulate supply (the number of cars on the road) and demand (the number of passenger requests). Specifically, higher prices attract more drivers and delay requests from passengers who are not in a hurry; lower prices do the opposite [

Dynamic pricing can help to improve service quality, but it also poses new problems for researchers. In RoD services, the price multiplier is a new indicator that drivers can use when choosing a seeking strategy, and it describes local supply and demand conditions more accurately. However, effective strategies remain to be explored. For example, if all drivers flock to a particular area with a high price multiplier, there will be an oversupply in that area, causing the price multiplier to fall sharply. This not only makes prices unstable, but also frustrates drivers seeking high prices. In fact, many news reports and research papers have discussed this intuitive “chase the wave” strategy; however, they sometimes give contradictory advice [

Recommending seeking routes [

In this paper, we design a Markov Decision Process [

The main contributions of this paper are as follows: 1) We introduce dynamic pricing into the seeking route recommendation problem in RoD services. 2) We design an MDP model to answer “how to use dynamic prices to help drivers in seeking passengers”, together with a dynamic programming algorithm to solve the MDP. Compared with real drivers, our model increases revenue by up to 28%. 3) We conduct extensive experiments on real service datasets (including passenger data and driver GPS trajectory data [

The remainder of this paper is organized as follows. Section 2 analyzes the patterns of dynamic prices. Section 3 designs a Markov Decision Process (MDP) model to answer “how to use dynamic prices to help drivers in seeking passengers”. Section 4 evaluates our model. Finally, Section 5 concludes the paper.

In this section, we will introduce what dynamic pricing is, and analyze the passenger data and driver GPS trajectory data of Shenzhou UCar [

Dynamic pricing can manipulate supply (the number of cars on the road) and demand (the number of passenger requests). Specifically, a higher price attracts more drivers and delays requests from passengers who are not in a hurry, while a lower price does the opposite. In most cases, dynamic prices are represented by price multipliers, so the fare of a trip is the product of a dynamic price multiplier (depending on supply and demand conditions) and a fixed normal price (based on travel time and distance). Hence, dynamic pricing can be written as:

$$Dy = p_k \times (15 + 2.8 \times \text{distance}) \tag{1}$$

In Equation (1), $p_k$ denotes the dynamic price coefficient, with $p_k \in \{1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6\}$. Equation (1) also shows that dynamic pricing directly affects driver revenue: the higher the dynamic price multiplier, the more the driver earns.
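As a quick illustration of Equation (1), the fare can be computed in a few lines of Python. This is only a minimal sketch: the function name `dynamic_fare` and the input checks are illustrative assumptions, and the distance is assumed to be in whatever unit the base rate of 2.8 per unit of distance refers to.

```python
# Multiplier levels listed for Equation (1).
PRICE_MULTIPLIERS = (1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6)


def dynamic_fare(multiplier: float, distance: float) -> float:
    """Fare from Equation (1): multiplier * (15 + 2.8 * distance)."""
    if multiplier not in PRICE_MULTIPLIERS:
        raise ValueError(f"unsupported multiplier: {multiplier}")
    if distance < 0:
        raise ValueError("distance must be non-negative")
    return multiplier * (15 + 2.8 * distance)


# Same trip length, two multiplier levels (hypothetical trip of 10 distance units).
normal_fare = dynamic_fare(1.0, 10)
surge_fare = dynamic_fare(1.5, 10)
print(f"normal: {normal_fare:.2f}, surge: {surge_fare:.2f}")
```

For a fixed trip the fare scales linearly with the multiplier, which is why the multiplier enters the MDP reward directly.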

In this paper, we randomly selected passenger data of one day for data analysis, as shown in

From

According to Section 2.1 we divide orders into seven types based on dynamic price multipliers as shown in

It can be seen from

pricing. To verify the importance of dynamic pricing suggested by the above results, we conduct a temporal and spatial analysis of the dynamic price multiplier.

From

will be affected by time. However, from

Figures 4(a)-(c) are price multiplier distribution maps of Beijing, showing the low-price, medium-price and high-price areas, respectively.

According to

As can be seen from Figures 5(a)-(c), the dynamic price coefficient fluctuates differently in different regions. Across these three regions, the coefficient fluctuates mildly in the low-price region and sharply in the high-price region. Therefore, not all orders in a low-price region carry a low multiplier, but the probability of receiving a high-multiplier order there is particularly low. Likewise, not all orders in a high-price region carry a high multiplier, but the probability of receiving a high-multiplier order there is particularly high.

From the above analysis, we draw three conclusions: 1) Dynamic pricing affects driver revenue: the higher the dynamic coefficient, the more the driver earns. 2) Based on historical data, Beijing can be divided into three kinds of regions, namely low-price, medium-price and high-price areas. 3) In a high-price grid, not all orders have a high dynamic price coefficient, but the probability of receiving a high-multiplier order is very high. In a low-price grid, not all orders have a low dynamic price coefficient, but the probability of receiving a low-multiplier order is very high. Hence, it is necessary to include dynamic pricing as a factor in our model.

An MDP is described by a tuple (S, A, P, R, δ), where S is the state space, A is the set of allowable actions, P is the state transition matrix, R is the reward function, and δ is the discount factor.

In this section, we develop an MDP model of a ride-hailing driver's stochastic passenger-seeking process. The notation used in the subsequent analysis is listed in the table below.

Variable | Explanation |
---|---|
$l$ | Index of the current grid |
$t$ | Current time |
$d$ | Incoming direction to the current grid |
$s$ | State, $s = (l, t, d)$ |
$S$ | State space, the collection of all states |
$a$ | Action |
$A$ | Action space |
$t_{seek}(l)$ | Time spent seeking a passenger in grid $l$ |
$t_{driver}(l, k)$ | Time spent driving from grid $l$ to grid $k$ |
$d_{seek}(l)$ | Distance traveled when seeking a passenger in grid $l$ |
$d_{driver}(l, k)$ | Distance traveled when moving from grid $l$ to grid $k$ |
$P_{pickup}(l)$ | Probability that the driver finds a passenger in grid $l$ |
$P_{dest}(l, k)$ | Probability that a passenger picked up in grid $l$ wants to go to grid $k$ |
$Dy(l, k)$ | Dynamic price of a trip from grid $l$ to grid $k$ |
$p_k$ | Dynamic price coefficient |
$\alpha$ | Coefficient of fuel consumption |

In our MDP model, the state $s = (l, t, d)$ has three components: $l$ is the ID of the current grid, $t$ is the current time, and $d$ is the direction from which the driver entered the current grid. We divide Beijing into 900 grids, hence $l \in L = \{1, \cdots, 900\}$. We simulate one hour of the MDP, so we set $T = 60$. The incoming direction takes values $d \in D = \{\varnothing, \nearrow, \uparrow, \nwarrow, \leftarrow, \circlearrowleft, \rightarrow, \searrow, \downarrow, \swarrow\}$. We use ten numbers (0 - 9) to index these signs, which is illustrated in

In each decision-making state, the driver chooses one of nine allowable actions: move to one of the eight neighboring grids or stay in the current grid. Formally, this can be represented as $a \in A$, $A = \{\searrow, \downarrow, \swarrow, \rightarrow, \circlearrowleft, \leftarrow, \nearrow, \uparrow, \nwarrow\}$. We use nine numbers (1 - 9) to index these actions, which are shown in

When the driver selects an action, there are two possible conditions. If the driver fails to pick up [

The first possible condition is that the driver successfully picks up a passenger in grid $j$ after $t_{seek}(j)$. In this case, the passenger chooses one of the grid cells as the destination (denoted $k$) with probability $P_{dest}(j, k)$, and the driver drives to grid $k$ to drop off the passenger. We use $t_{driver}(j, k)$ to denote the time spent driving from grid $j$ to grid $k$. The driver earns a fare of $Dy(j, k)$ Yuan, which represents the expected income of a trip from grid $j$ to grid $k$, and then starts seeking again from $k$. Thus, the state of the driver on arrival in grid $k$ is $s = (k, t + t_{seek}(j) + t_{driver}(j, k), 0)$.

The second possible condition is that the driver fails to pick up a passenger in grid $j$ after $t_{seek}(j)$. The driver then takes one of the nine allowable actions to move on and look for passengers elsewhere. For instance, if the driver takes action $a = 9$ ($\nearrow$), the driver ends up in state $s' = (j, t + t_{seek}(j), 1)$ (entered from the bottom-left grid, $\nearrow$).

To sum up, a driver in any state $s_0 = (i, t, d)$ takes an action to go from the current grid $i$ to a grid $j$. With probability $P_{pickup}(j) \times P_{dest}(j, k)$, $k \in L$, the driver transitions to state $s_1 = (k, t + t_{seek}(j) + t_{driver}(i, j) + t_{driver}(j, k), 0)$ and gets the reward $Dy(j, k) - \alpha (d_{driver}(j, k) + d_{seek}(j) + d_{driver}(i, j))$. With probability $1 - P_{pickup}(j)$, the driver transitions to state $s_2 = (j, t + t_{seek}(j) + t_{driver}(i, j), 10 - a)$ and gets the reward $-\alpha (d_{seek}(j) + d_{driver}(i, j))$, which is negative.
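The transition rule summarized above can be sketched as a function that lists every outcome of one action. This is an illustrative sketch rather than the implementation used in the experiments: the names `State` and `outcomes`, the dictionary-based inputs, the toy numbers and the choice of kilometres for distances are all assumptions. After a successful pickup the sketch places the driver in the destination grid k, where seeking restarts.

```python
from typing import Dict, List, NamedTuple, Tuple


class State(NamedTuple):
    grid: int       # l: index of the current grid
    time: int       # t: minutes elapsed in the time slot
    direction: int  # d: incoming direction index, 0 after a drop-off


def outcomes(
    s0: State,
    action: int,                            # a in 1..9
    j: int,                                 # grid reached from s0.grid by `action`
    p_pickup: Dict[int, float],             # P_pickup(j)
    p_dest: Dict[int, Dict[int, float]],    # P_dest(j, k)
    fare: Dict[Tuple[int, int], float],     # Dy(j, k)
    t_drive: Dict[Tuple[int, int], int],    # t_driver(., .)
    d_drive: Dict[Tuple[int, int], float],  # d_driver(., .)
    alpha: float,                           # fuel consumption coefficient
    t_seek: int = 1,
    d_seek: float = 0.3,                    # 300 meters, expressed in km (assumed unit)
) -> List[Tuple[float, State, float]]:
    """Return (probability, next state, reward) for every outcome of one action."""
    i = s0.grid
    t_arrive = s0.time + t_seek + t_drive[(i, j)]
    empty_dist = d_seek + d_drive[(i, j)]
    # No passenger found: only the fuel cost of driving empty is incurred.
    results = [(1.0 - p_pickup[j], State(j, t_arrive, 10 - action), -alpha * empty_dist)]
    # Passenger found: one outcome per destination grid k.
    for k, p_k in p_dest[j].items():
        next_state = State(k, t_arrive + t_drive[(j, k)], 0)
        reward = fare[(j, k)] - alpha * (empty_dist + d_drive[(j, k)])
        results.append((p_pickup[j] * p_k, next_state, reward))
    return results


# Toy numbers, invented purely for illustration.
toy = outcomes(
    State(grid=1, time=0, direction=0), action=6, j=2,
    p_pickup={2: 0.6},
    p_dest={2: {1: 0.25, 3: 0.75}},
    fare={(2, 1): 40.0, (2, 3): 25.0},
    t_drive={(1, 2): 3, (2, 1): 10, (2, 3): 6},
    d_drive={(1, 2): 1.2, (2, 1): 5.0, (2, 3): 3.0},
    alpha=0.5,
)
for prob, nxt, reward in toy:
    print(f"p={prob:.2f}  next={nxt}  reward={reward:.2f}")
```

The outcome probabilities sum to one whenever the destination probabilities of grid j do, which is a useful sanity check on estimated inputs.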

Calculation of the pickup probability $P_{pickup}(j)$. First, we divide the map of Beijing into 30 × 30 grids. Second, we project the passenger pick-up and drop-off points and the GPS trajectory points of empty cars onto the map. Finally, we use a spatial join to match the points to grids, which gives the number of pick-up and drop-off points in each grid and the number of empty cars passing through it. $P_{pickup}(j)$ is then obtained by dividing the number of pickup points in a grid by the number of empty cars passing through that grid. Let $n_{pickup}(j)$ be the number of pickup points in grid $j$ and $n_{pass}(j)$ the number of empty cars crossing grid $j$; the pickup probability can then be expressed as:

$$P_{pickup}(j) = \frac{n_{pickup}(j)}{n_{pass}(j)} \tag{2}$$
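Equation (2) is a per-grid ratio of counts, which can be sketched as follows. The function name `pickup_probability` and the toy counts are illustrative assumptions. The handling of grids with no passing empty cars (probability set to 0) and the clamping of the ratio to 1 are safeguards added for this sketch and are not described in the text.

```python
from typing import Dict


def pickup_probability(
    n_pickup: Dict[int, int],  # pickup points per grid
    n_pass: Dict[int, int],    # empty cars passing through each grid
) -> Dict[int, float]:
    """Estimate P_pickup(j) = n_pickup(j) / n_pass(j) for every grid in n_pass."""
    probs: Dict[int, float] = {}
    for grid, passes in n_pass.items():
        if passes <= 0:
            # Assumption: without any observed empty car there is no estimate.
            probs[grid] = 0.0
            continue
        # Assumption: clamp to 1.0 in case pickups outnumber observed passes.
        probs[grid] = min(1.0, n_pickup.get(grid, 0) / passes)
    return probs


# Toy counts, invented for illustration.
p_pickup = pickup_probability({1: 12, 2: 3}, {1: 40, 2: 60, 3: 0})
print(p_pickup)
```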

Calculation of the passenger destination probability $P_{dest}(j, k)$. When the driver successfully picks up a passenger in grid $j$ after $t_{seek}(j)$, the passenger goes to grid $k \in L$ with probability $P_{dest}(j, k)$. To calculate this probability, we first count the passenger orders from grid $j$ to grid $k$ (denoted $n_{j \to k}(j)$), and then divide $n_{j \to k}(j)$ by $n_{pickup}(j)$. Hence, $P_{dest}(j, k)$ can be written as:

$$P_{dest}(j, k) = \frac{n_{j \to k}(j)}{n_{pickup}(j)} \tag{3}$$
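Equation (3) amounts to normalizing origin-destination counts per pickup grid. A minimal sketch, assuming orders are available as (pickup grid, drop-off grid) pairs, is given below; the function name `destination_probabilities` and the toy orders are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def destination_probabilities(
    orders: Iterable[Tuple[int, int]],  # (pickup grid j, drop-off grid k)
) -> Dict[int, Dict[int, float]]:
    """Estimate P_dest(j, k) as (orders from j to k) / (orders picked up in j)."""
    counts: Dict[int, Counter] = defaultdict(Counter)
    for j, k in orders:
        counts[j][k] += 1
    probs: Dict[int, Dict[int, float]] = {}
    for j, dest_counts in counts.items():
        n_pickup_j = sum(dest_counts.values())
        probs[j] = {k: n / n_pickup_j for k, n in dest_counts.items()}
    return probs


# Toy orders, invented for illustration.
p_dest = destination_probabilities([(1, 2), (1, 2), (1, 3), (2, 1)])
print(p_dest)
```

By construction, the probabilities for each pickup grid sum to one.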

The driving time $t_{driver}(j, k)$ denotes the time spent driving from grid $j$ to grid $k$. We take the average of all observed driving times from grid $j$ to grid $k$ as an approximation of $t_{driver}(j, k)$. The driving distance $d_{driver}(j, k)$ is likewise calculated as the average distance from grid $j$ to grid $k$.
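The averaging step can be sketched as a simple group-by over trip records. The record layout (origin grid, destination grid, minutes, kilometres), the function name `average_drive` and the toy trips are assumptions made for this illustration.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple

Pair = Tuple[int, int]


def average_drive(
    trips: Iterable[Tuple[int, int, float, float]],  # (j, k, minutes, km)
) -> Tuple[Dict[Pair, float], Dict[Pair, float]]:
    """Return (t_driver, d_driver): mean time and mean distance per grid pair."""
    times: Dict[Pair, List[float]] = defaultdict(list)
    dists: Dict[Pair, List[float]] = defaultdict(list)
    for j, k, minutes, km in trips:
        times[(j, k)].append(minutes)
        dists[(j, k)].append(km)
    t_driver = {pair: mean(values) for pair, values in times.items()}
    d_driver = {pair: mean(values) for pair, values in dists.items()}
    return t_driver, d_driver


# Toy trips, invented for illustration.
t_driver, d_driver = average_drive([(1, 2, 10, 4.0), (1, 2, 14, 5.0), (2, 1, 9, 4.5)])
print(t_driver, d_driver)
```

Grid pairs with no observed trip get no entry, so a full model would need a fallback for them (for example a distance-based estimate), which this sketch leaves open.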

For the seeking time $t_{seek}(j)$, we computed the speed at which drivers cruise while searching for passengers, which is about 300 meters/min. Hence, we set the seeking time $t_{seek}(j) = 1$ minute and the seeking distance $d_{seek}$ to 300 meters.

In our MDP model, the goal is to maximize revenue in the current time slot. As described in Section 3.2, we simulate one hour of the driver's search process, so the model stops when $t > 60$. For each action $a$, $V'(s, a)$ is the maximum expected return in the current time slot if the driver takes action $a$ in state $s$, and $V(s)$ is the maximum expected return of state $s$. $V'(s, a)$ is calculated as:

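$$V'(s, a) = \left(1 - P_{pickup}(j)\right) \times \left[-\alpha \left(d_{seek}(j) + d_{driver}(i, j)\right) + V(s_2)\right] + P_{pickup}(j) \times \sum_{k=1}^{|L|} P_{dest}(j, k) \times \left[Dy(j, k) - \alpha \left(d_{seek}(j) + d_{driver}(i, j) + d_{driver}(j, k)\right) + V(s_1)\right] \tag{4}$$

Here $s_2$ is the successor state after a failed pickup and $s_1$ is the successor state after a successful pickup with destination $k$.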

Hence, the optimal policy π is defined as follows:

$$\pi(s) = \arg\max_{a \in A} V'(s, a) \tag{5}$$

$$V(s) = V'(s, \pi(s)) \tag{6}$$

The pseudocode of the DP algorithm is given in Algorithm 1. Based on the data analysis, we first assign a dynamic price to each grid. Once the dynamic price coefficients are determined, the algorithm generates a state $s = (l, t, d)$ and tries all actions in $s$ to find the best one (line 5 - line 8). For each action, the total expected reward of the state is calculated using Equation (4) (line 9). The algorithm then selects the best action based on these expectations (line 10 - line 11) and finally outputs the optimal strategy (line 16).

Because $|D|$ and $|A|$ are small constants, the time complexity of Algorithm 1 can be written as $O(|T| \times |L|)$. Likewise, the space complexity is $O(|T| \times |L|)$.
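To make the dynamic programming step concrete, the following is a small Python sketch of a finite-horizon solver. It is not a transcription of Algorithm 1: it uses memoized recursion instead of an explicit table, takes the action-to-neighbor mapping as an input (`neighbors`, a hypothetical structure), follows the rule that a failed pickup leads to the state with direction 10 - a, and runs on a two-grid toy city with invented numbers. Distances are assumed to be in kilometres.

```python
from functools import lru_cache


def solve_mdp(neighbors, p_pickup, p_dest, fare, t_drive, d_drive,
              alpha, horizon=60, t_seek=1, d_seek=0.3):
    """Return best(l, t, d) -> (V(s), pi(s)) for the state s = (l, t, d)."""

    @lru_cache(maxsize=None)
    def best(l, t, d):
        if t > horizon:  # the model stops once t exceeds the horizon
            return 0.0, None
        best_value, best_action = float("-inf"), None
        for a, j in neighbors[l].items():  # try every allowable action
            t_arrive = t + t_seek + t_drive[(l, j)]
            empty_cost = alpha * (d_seek + d_drive[(l, j)])
            # Failed pickup: pay the empty-driving cost, continue from grid j.
            q = (1.0 - p_pickup[j]) * (-empty_cost + best(j, t_arrive, 10 - a)[0])
            # Successful pickup: expectation over passenger destinations k.
            for k, p_k in p_dest[j].items():
                reward = fare[(j, k)] - empty_cost - alpha * d_drive[(j, k)]
                future = best(k, t_arrive + t_drive[(j, k)], 0)[0]
                q += p_pickup[j] * p_k * (reward + future)
            if q > best_value:
                best_value, best_action = q, a
        return best_value, best_action

    return best


# Two-grid toy city; action 5 means "stay", 4 and 6 move to the other grid.
toy = dict(
    neighbors={1: {5: 1, 6: 2}, 2: {5: 2, 4: 1}},
    p_pickup={1: 0.2, 2: 0.6},
    p_dest={1: {2: 1.0}, 2: {1: 1.0}},
    fare={(1, 2): 30.0, (2, 1): 45.0},
    t_drive={(1, 1): 0, (2, 2): 0, (1, 2): 10, (2, 1): 10},
    d_drive={(1, 1): 0.0, (2, 2): 0.0, (1, 2): 5.0, (2, 1): 5.0},
)
best = solve_mdp(**toy, alpha=0.5)
value, action = best(1, 0, 0)
print(f"expected return from grid 1 at t=0: {value:.2f}, recommended action: {action}")
```

Because every action advances time by at least the seeking time, the recursion always terminates. In this sketch the incoming direction is carried in the state but does not restrict the available actions.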

In this section, we evaluate our MDP model from two aspects: 1) Does the way drivers look for passengers change after using our model? 2) Does the driver's income improve substantially after using our model?

We conduct a simulation based on the MDP model and compare the results with the passenger-seeking behavior of real drivers to verify the effectiveness of our recommendations.

will give priority to search for passengers locally, at the same time, our algorithm can alleviate the problem that it is difficult for passengers to take a taxi in the suburbs.

Section 4.1 shows that when a driver uses our algorithm to search for passengers, the algorithm recommends going to areas with high dynamic prices to obtain high-quality orders. Hence, we use the quality of the orders obtained to evaluate the driver's revenue efficiency. We calculate revenue efficiency by dividing the driver's income per order by the driver's working time, where working time is the sum of the time spent looking for a passenger and the time spent completing the order.
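The revenue efficiency measure can be written as a one-line ratio. The sketch below assumes income in Yuan and times in minutes; the function name `revenue_efficiency` and the example numbers are illustrative only.

```python
def revenue_efficiency(income: float, seek_minutes: float, trip_minutes: float) -> float:
    """Income of one order divided by working time (seeking time + trip time)."""
    working_minutes = seek_minutes + trip_minutes
    if working_minutes <= 0:
        raise ValueError("working time must be positive")
    return income / working_minutes


# Hypothetical order: 60 Yuan earned after 10 minutes of seeking and a 20-minute trip.
efficiency = revenue_efficiency(60.0, 10.0, 20.0)
print(f"{efficiency:.2f} Yuan per minute")
```

Under this measure, a higher fare only helps if it is not offset by a longer search, which is why seeking time is part of the denominator.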

In this paper, we design a Markov Decision Process (MDP) model to answer “how to use dynamic prices to help drivers in seeking passengers”. We first show the importance of and need for doing so by analyzing real service data. We then design an MDP model based on passenger order and car GPS trajectory datasets, and take dynamic prices into account in designing rewards. Results show that, on the one hand, when drivers search for passengers in the suburbs, our model guides them to areas with high dynamic prices ahead of them and improves their utilization rate. On the other hand, when drivers search for passengers in the urban area, our model guides them to cruise slowly toward the high-price area. Within the high-price area, our model dispatches drivers reasonably, which prevents all drivers from flocking to one specific area with a high price multiplier and causing an oversupply there. Finally, compared with what drivers earned before using the model, revenue after using it increases by up to 28%.

For future work, we will introduce multi-agent reinforcement learning to study the influence of dynamic prices on driver seeking, and we will introduce a probabilistic model to simulate the fluctuation of the dynamic multiplier.

The author declares no conflicts of interest regarding the publication of this paper.

Shen, Q.R. (2021) Seeking for Passenger under Dynamic Prices: A Markov Decision Process Approach. Journal of Computer and Communications, 9, 80-97. https://doi.org/10.4236/jcc.2021.912006