Two-Agent Path Planning Collaboration Based on the State Feedback Stackelberg Dynamic Game
1. Introduction
When multiple autonomous robotic platforms are involved in a common task, team coordination is usually an important control aspect. Robots in the team make their own decisions, which may conflict with the decisions of other teammates [1,2]. In particular, the coordination problem must be handled efficiently when two Autonomous Navigation Platforms sharing the same configuration space need to achieve a team formation in order to avoid motion deadlocks during a task such as moving through a single waypoint, cleaning a floor, transporting a load or tracking a common target [3]. Two Autonomous Navigation Platforms that must pass through an emergency door in order to escape from a dangerous situation are such an example.
The motion deadlock avoidance problem has been studied for the robot soccer game. A fuzzy logic approach was used to derive the action of each robotic platform teammate [4]. The use of fuzzy logic was motivated by the fact that this problem could be solved easily, in a manner similar to the way human beings solve it [5]. Since this approach relies mainly on an accurate representation of human decision-making in similar situations, any misrepresentation may lead to a poor control decision for each teammate. Although this approach was tested in simulation, no experimentation was performed on real robotic platforms. Harunori [6] proposed a deadlock-free navigation scheme based on a centralized coordination approach for a team of robots. In order to assign a move task to one robot, the proposed scheme required a global path to be planned in the configuration space for each robot, which is a time-consuming operation. Since this approach is centered on a single main component, the individual teammate decision is inhibited.
Game theory is among the well-known frameworks for studying robotic platform team collaboration. When no natural decision-making hierarchy can be established among teammates, the Nash equilibrium approach is often used [7]. On the other hand, when such a hierarchy exists, Stackelberg theory is a good candidate for studying the decision problem. Hence, robot team obstacle avoidance methods based upon these theories were recently proposed in [8,9]. In [8], the team coordination used a semi-cooperative Stackelberg equilibrium point [10]. However, the implicit assumption of no state feedback is made, and the equilibrium point is found only for one stage. In real life, this assumption is weak, since each teammate has access to the state of the other teammates.
This paper studies the collaboration of two autonomous agents using a game theory framework. The agents are heterogeneous Autonomous Navigation Modules that share a common workspace [11]. From their initial configurations, they can move to a common place: this is a rendez-vous. The configuration of the team is therefore obtained by taking into account the position and orientation of every teammate. From this point, we consider that one agent acts as the leader (the other teammates act as followers) and the team must move according to the leader motion planning, even if each follower may have its own planned trajectory. The rendez-vous problem is not considered in this paper; instead, we focus on the team trajectory planning problem.
As stated before, game theory offers a clear formulation for finding equilibrium points in situations where many decision makers (agents) are involved [12]. In this study, only two Autonomous Navigation Platforms are involved, and it is assumed that a natural hierarchy exists between them [13]. By hierarchy, we mean that one platform (the leader) makes its decision and then advertises this decision to the other platform (the follower) [14-16]. One platform is clearly designated to act as the leader when a motion deadlock is detected [17,18]. In a realistic navigation environment, the states of both platforms need to be taken into account in a closed-loop manner [19,20]. Hence, state feedback controllers are required. The state feedback Stackelberg game theory is selected as the framework for solving the collaborative decision problem [21,22].
This paper provides two contributions. First, we propose a method for finding the solution of an important class of discrete-time two-agent non-zero-sum dynamic games with linear state dynamics and quadratic cost functionals. Our solution is an extension of the solution proposed in [10]. The main difference is that the approach of Ming [10] is related to the class of regulator controllers, whereas the method presented here is related to the class of reference trajectory tracking. The second contribution of this work is the design of a robotic architecture, based on the proposed method, for deadlock avoidance when two robotic platforms share the same workspace. This collaborative robotic architecture is based on the three-layer architecture concept [23]: the deliberative, the sequencer and the execution layer.
The rest of the paper is organized as follows. The path planning problem in linear space and the methodology based on Stackelberg game theory are presented in Section 2. In Section 3, the reactive robotic architecture is presented. The simulation results and the conclusion are presented in Sections 4 and 5, respectively.
2. Path Planning in Linear Space
2.1. Problem Formulation
Consider a system formed by the configurations of two Autonomous Navigation Modules (ANM), governed by the following state equation:
$$X(n+1) = A(n)\,X(n) + B_1(n)\,u_1(n) + B_2(n)\,u_2(n) \qquad (1)$$
where:
$n$ is the current stage; $X(n)$ is the state vector of the system at stage $n$; $u_1(n)$ and $u_2(n)$ are the control signals, or strategies, generated respectively by agent 1 (ANM 1) and agent 2 (ANM 2) at stage $n$; it is assumed that both control signals have the same dimensions; $u_i(n) \in U_i$, where $U_i$ is the admissible set of strategies for agent $i$; $A(n)$ is the transition matrix of the system at stage $n$; $B_1(n)$ and $B_2(n)$ are the control matrices for agent 1 and agent 2, respectively.
Given the following functionals:
$$J_i(u_1,u_2) = \Phi_i\big(X(N)\big) + \sum_{n=0}^{N-1} L_i\big(X(n),u_1(n),u_2(n),n\big) \qquad (2)$$
$$\Phi_i\big(X(N)\big) = \tfrac{1}{2}\big[X(N)-r_i(N)\big]^{T} Q_i(N)\big[X(N)-r_i(N)\big] \qquad (3)$$
$$L_i(\cdot) = \tfrac{1}{2}\big[X(n)-r_i(n)\big]^{T} Q_i(n)\big[X(n)-r_i(n)\big] + \tfrac{1}{2}\,u_i(n)^{T} R_{ii}(n)\,u_i(n) + \tfrac{1}{2}\,u_j(n)^{T} R_{ij}(n)\,u_j(n) \qquad (4)$$
where:
$N$ is the finite optimization horizon; subscripts $i$ and $j$, with $i \ne j$, represent different agents; $r_i(n)$ is the vector of the reference trajectory followed by agent $i$ at stage $n$; $r_i(N)$ is the vector of the reference trajectory followed by agent $i$ at the end of the optimization horizon; $Q_i(n)$ is a symmetric, positive semi-definite matrix that penalizes the deviation between the state vector and the reference vector at stage $n$; $R_{ii}(n)$ is a symmetric, positive definite matrix that penalizes the control signal of agent $i$ at stage $n$ within its own functional; $R_{ij}(n)$, with $i \ne j$, is a symmetric, positive definite matrix that penalizes the control signal of agent $j$ at stage $n$ within the functional of agent $i$.
We consider only the case where the state vector is fully accessible to both agents and the initial state vector $X(0)$ is completely known. Furthermore, we assume that agent 1 is the leader and agent 2 is the follower.
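To make the formulation concrete, the following Python/NumPy sketch builds a small, hypothetical instance of the game: constant matrices $A$, $B_1$, $B_2$ and the quadratic stage cost of Equation (4). All dimensions, values and names are illustrative assumptions, not the parameters used later in the paper.

```python
import numpy as np

# Illustrative instance of the two-agent linear game (all values are assumptions).
n_x, n_u = 3, 3            # state and per-agent control dimensions
A  = np.eye(n_x)           # transition matrix A(n)
B1 = 0.1 * np.eye(n_x)     # control matrix of agent 1 (leader)
B2 = 0.1 * np.eye(n_x)     # control matrix of agent 2 (follower)

# Weighting matrices of the functionals (symmetric PSD / PD, Equation (4)).
Q1, R11, R12 = np.eye(n_x), np.eye(n_u), np.eye(n_u)   # leader (agent 1)
Q2, R22, R21 = np.eye(n_x), np.eye(n_u), np.eye(n_u)   # follower (agent 2)

def step(x, u1, u2):
    """State equation (1): X(n+1) = A X(n) + B1 u1(n) + B2 u2(n)."""
    return A @ x + B1 @ u1 + B2 @ u2

def stage_cost(x, r, u_own, u_other, Q, R_own, R_other):
    """Stage cost of one agent (Equation (4)): tracking error plus both control penalties."""
    e = x - r
    return 0.5 * (e @ Q @ e + u_own @ R_own @ u_own + u_other @ R_other @ u_other)
```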
At each stage $n$, each agent selects its strategy so that its functional is minimized. In the state feedback Stackelberg game formulation, the leader wishes to influence the follower so that the selected strategies minimize the leader's functional. The strategies of the leader and the follower that minimize the leader functional are, by definition, the team optimal strategies, denoted $u_1^d(n)$ and $u_2^d(n)$. The state trajectory obtained by applying these strategies is denoted $X^d(n)$. To influence the follower strategy selection, we assume that the leader uses a linear incentive function represented by the following equation [10]:
$$u_1(n) = u_1^d(n) + P_1(n)\big[X(n) - X^d(n)\big] + P_2(n)\big[u_2(n) - u_2^d(n)\big] \qquad (5)$$
The problem is to find, at each stage $n$, the two matrix gains $P_1(n)$ and $P_2(n)$ such that the resulting state feedback control achieves a Stackelberg solution. In the remainder of the paper, the terms control signal and strategy are used interchangeably.
2.2. Methodology
General Stackelberg Solution
In general, for two agents involved in a dynamic Stackelberg game, a solution concept is a pair of strategies from both agents that minimizes both functionals at each stage. This notion of equilibrium is extended to allow the definition of the feedback Stackelberg solution [18]. Hence, according to Jose [18], for each leader strategy $u_1 \in U_1$, the follower selects a strategy $u_2 = T(u_1)$, where $T$ is a mapping from $U_1$ to $U_2$. The selection of $T(u_1)$ is done as follows:
$$J_2\big(u_1, T(u_1)\big) \le J_2\big(u_1, u_2\big), \quad \forall\, u_2 \in U_2 \qquad (6)$$
The strategy $u_1^{*}$ of the leader is chosen so that
$$J_1\big(u_1^{*}, T(u_1^{*})\big) \le J_1\big(u_1, T(u_1)\big), \quad \forall\, u_1 \in U_1 \qquad (7)$$
The pair of strategies defined by Equations (6) and (7) is the Stackelberg solution with agent 1 as the leader. We consider that this mapping is a linear function, represented by Equation (5). In order to completely define this function, the matrix gains $P_1(n)$ and $P_2(n)$ should be determined based on optimal control theory.
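For a single stage, the order of the two minimizations in Equations (6) and (7) can be illustrated numerically: the follower's reaction is computed first for every candidate leader control, and the leader then minimizes its own cost given that reaction. The brute-force sketch below, with made-up scalar costs, only illustrates this ordering; it is not the closed-form solution derived in the following sections.

```python
import numpy as np

# Made-up one-stage costs with scalar controls, for illustration only.
def J1(u1, u2): return (u1 - 1.0) ** 2 + 0.5 * (u2 - 2.0) ** 2 + 0.2 * u1 * u2
def J2(u1, u2): return (u2 - 0.5 * u1) ** 2 + 0.1 * u2 ** 2

grid = np.linspace(-3.0, 3.0, 121)          # coarse search grid over admissible controls

def follower_reaction(u1):
    """Mapping T(u1): follower control minimizing J2 for a given leader control (Eq. (6))."""
    return min(grid, key=lambda u2: J2(u1, u2))

# The leader anticipates T and minimizes its own functional (Eq. (7)).
u1_star = min(grid, key=lambda u1: J1(u1, follower_reaction(u1)))
u2_star = follower_reaction(u1_star)
print("one-stage Stackelberg pair:", u1_star, u2_star)
```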
2.3. Team Optimal Solution
Since the goal of the leader is to induce the follower to choose a strategy that minimizes the leader's functional, we need to determine the two strategies called the team optimal strategies. They are defined as:
$$\big(u_1^d, u_2^d\big) = \arg\min_{u_1 \in U_1,\; u_2 \in U_2} J_1\big(u_1, u_2\big) \qquad (8)$$
2.4. Incentive Matrix Gains
We assume that the leader knows the follower's functional. In order to incite the follower to adopt $u_2^d(n)$, the leader uses the strategy represented by Equation (5). The follower, in order to minimize its own functional and find its strategy, takes into account the previously announced leader strategy. From the follower's side, this is a usual optimization problem. Since the two matrices $P_1(n)$ and $P_2(n)$ are part of the optimal strategy of the follower, the leader needs to provide them. For the leader, this is not a simple optimization problem, because the leader must take into account the expected rational strategy of the follower.
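Once the team-optimal triple $(X^d, u_1^d, u_2^d)$ and the gains $P_1$ and $P_2$ are available, evaluating the leader's announced strategy is straightforward. The helper below simply applies the affine form of Equation (5); the function name, argument order and values are illustrative assumptions.

```python
import numpy as np

def leader_incentive(u1_d, P1, x, x_d, P2, u2, u2_d):
    """Affine incentive strategy of Equation (5):
    u1 = u1_d + P1 (X - X_d) + P2 (u2 - u2_d).
    When the follower behaves as the leader expects (X = X_d, u2 = u2_d),
    the announced control reduces to the team-optimal control u1_d."""
    return u1_d + P1 @ (np.asarray(x) - np.asarray(x_d)) + P2 @ (np.asarray(u2) - np.asarray(u2_d))

# Example: zero gains and no deviation reproduce the team-optimal control exactly.
u1 = leader_incentive(np.ones(3), np.zeros((3, 3)), np.zeros(3), np.zeros(3),
                      np.zeros((3, 3)), np.zeros(3), np.zeros(3))
```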
2.5. Solving Optimal Tracking Problem
Team Optimal Solution
It is assumed that both agents minimize the leader functional $J_1$ represented by Equation (2). To find the pair of strategies that minimizes $J_1$, optimal control theory is applied.
(9)
where $\arg\min$ stands for the arguments that allow the functional to attain its minimum value. The team Hamiltonian of the system is given by:
(10)
where:
(11)
Using the minimum principle, we obtain the following expressions:
(12)
and
(13)
and
(14)
Equation (12) becomes:
(15)
with the boundary condition:
(16)
From Equations (13) and (14) the following expressions are obtained:
(17)
(18)
The state Equation (1) becomes:
(19)
From the boundary condition (16), it seems reasonable to assume that, for all $n$:
$$\lambda(n) = S(n)\,X(n) + v(n) \qquad (20)$$
where $\lambda(n)$ is the costate vector of the team optimization problem, $S(n)$ is a matrix and $v(n)$ is a vector. By substituting the costate in Equation (19) with its expression from Equation (20), the following equation is obtained:
(21)
where:
(22)
and $I$ is an identity matrix with proper dimensions. From Equation (15), the following expression is obtained by substituting the corresponding terms with their expressions:
(23)
In Equation (23), the remaining term is replaced by its expression, and the following expression is obtained:
(24)
Since Equation (24) must hold for all X(n) given any X(0), we must have:
(25)
and
(26)
Rewriting Equations (25) and (26), the following expressions are obtained:
(27)
and
(28)
with the following boundary conditions:
(29)
(30)
Given the expressions of $S(n)$ and $v(n)$, the team optimal state trajectory is completely determined. Hence, the team optimal strategies are represented by:
(31)
(32)
where:
(33)
(34)
(35)
(36)
and $I$ is an identity matrix with proper dimensions.
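Because both agents minimize the leader functional, the team-optimal pair can be computed, under the assumptions above, as a single linear-quadratic tracking problem in the stacked control vector $[u_1;\,u_2]$. The sketch below implements the standard backward recursion and forward rollout for that problem; it follows the same two-pass structure as Equations (25)-(36), but it is a generic textbook recursion, not a term-by-term transcription of those formulas.

```python
import numpy as np

def team_optimal_tracking(A, B1, B2, Q1, Q1N, R11, R12, r, rN):
    """Backward pass of a discrete-time LQ tracker for the stacked control [u1; u2]
    minimizing the leader functional (standard recursion, used here as a sketch)."""
    B = np.hstack([B1, B2])
    R = np.block([[R11, np.zeros((R11.shape[0], R12.shape[1]))],
                  [np.zeros((R12.shape[0], R11.shape[1])), R12]])
    N = len(r)
    S = [None] * (N + 1); v = [None] * (N + 1)
    K = [None] * N;       Kv = [None] * N
    S[N], v[N] = Q1N, Q1N @ rN                       # terminal conditions
    for n in range(N - 1, -1, -1):                   # backward processing
        G = np.linalg.inv(R + B.T @ S[n + 1] @ B)
        K[n]  = G @ B.T @ S[n + 1] @ A               # state feedback gain
        Kv[n] = G @ B.T                              # tracking (feedforward) gain
        S[n] = Q1 + A.T @ S[n + 1] @ (A - B @ K[n])
        v[n] = Q1 @ r[n] + (A - B @ K[n]).T @ v[n + 1]
    return K, Kv, v

def team_optimal_rollout(A, B1, B2, K, Kv, v, x0):
    """Forward processing: u(n) = -K(n) X(n) + Kv(n) v(n+1), split back into (u1, u2)."""
    B, m1 = np.hstack([B1, B2]), B1.shape[1]
    x, traj, u1s, u2s = np.asarray(x0, float), [np.asarray(x0, float)], [], []
    for n in range(len(K)):
        u = -K[n] @ x + Kv[n] @ v[n + 1]
        u1s.append(u[:m1]); u2s.append(u[m1:])
        x = A @ x + B @ u
        traj.append(x)
    return traj, u1s, u2s
```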
Incentive Matrix Gains
To incite the follower to adopt $u_2^d(n)$, the leader advertises its strategy represented by Equation (5). We assume that the leader has full knowledge of the follower reference path. Hence, given:
(37)
The follower reaction is found by solving its Hamiltonian:
(38)
where:
(39)
Using the minimum principle, we obtain the following expressions:
(40)
and
(41)
Equation (40) becomes:
(42)
with the boundary condition:
(43)
where the boundary condition involves the state sequence obtained when the leader incentive strategy (5) and the corresponding follower reaction are applied to the system [10]; its expression is given below. From Equation (41), the following expression is obtained:
(44)
Assume that
(45)
where the gain terms are matrices with proper dimensions and the remaining term is the sequence of state vectors obtained when the leader incentive strategy (5) and the follower reaction are applied to the system. We know that:
(46)
Given this expression, Equation (44) can be rewritten as follows:
(47)
Substituting expression (46), the following equation is obtained:
(48)
If the follower acts exactly as the leader expects, $X(n)$ is equal to $X^d(n)$ and $u_2(n)$ is equal to $u_2^d(n)$.
Hence, expression (46) becomes:
(49)
If the previous equation is true for any initial state, we must have the following conditions:
(50)
and
(51)
If both conditions hold, then the follower strategy is equivalent to:
(52)
To be able to compute the follower strategy, the remaining quantities need to be evaluated. Consider the state equation when the leader incentive strategy (5) and the follower reaction are applied:
(53)
(54)
(55)
(56)
(57)
From Equation (45), the following expression is deduced:
(58)
Substituting Equation (57) in Equation (58) yields:
(59)
Substituting Equations (45), (46) and (57) in (42), we obtain the following equation:
(60)
Substituting expression (59) into Equation (60) yields:
(61)
This equation is true for any state and any control if the following conditions hold:
1. For all $n$:
(62)
2. For all $n$:
(63)
3. For all constant values:
(64)
Substituting the corresponding expression into Equation (62) yields:
(65)
Algorithm
The algorithm to solve the feedback Stackelberg game for trajectory following is summarized as follows (a code sketch of this two-pass structure is given after the list).
Backward processing:
1. Find all sequences given by Equation (26);
2. Find all sequences given by Equation (27).
Forward processing: at each stage,
1. evaluate Equation (33);
2. evaluate Equation (34);
3. evaluate Equation (35);
4. evaluate Equation (36);
5. evaluate Equation (50);
6. evaluate Equation (65);
7. evaluate Equation (63);
8. evaluate Equation (51);
9. evaluate Equation (64).
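The two-pass structure of the algorithm can be organized as sketched below. The bodies of the individual steps are deliberately left as placeholders (identity and zero arrays), because they correspond to the equations cited in the list above; the sketch only shows the backward/forward ordering and the data each pass produces, and every name is an assumption.

```python
import numpy as np

def plan_feedback_stackelberg(n_x, n_u, N):
    """Skeleton of the algorithm: one backward pass over the whole horizon, then a
    forward pass that evaluates the stage-wise quantities in the order listed above.
    The numerical bodies are placeholders; in the method they come from the equations
    cited in the list (Equations (26)-(27) backward, (33)-(36) and (50)-(65) forward)."""
    # Backward processing: build the full sequences needed by the forward pass.
    S = [np.eye(n_x) for _ in range(N + 1)]      # placeholder for the Equation (26) sequence
    g = [np.zeros(n_x) for _ in range(N + 1)]    # placeholder for the Equation (27) sequence

    # Forward processing: at each stage, derive the team-optimal and incentive quantities.
    stages = []
    for n in range(N):
        team = {"gain_u1": np.zeros((n_u, n_x)),   # placeholders for Equations (33)-(36)
                "gain_u2": np.zeros((n_u, n_x))}
        incentive = {"P1": np.zeros((n_u, n_x)),   # placeholders for Equations (50)-(65)
                     "P2": np.zeros((n_u, n_u))}
        stages.append({"stage": n, **team, **incentive})
    return S, g, stages

# Example call with illustrative dimensions and horizon.
S, g, stages = plan_feedback_stackelberg(n_x=3, n_u=3, N=10)
```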
3. Reactive Robotic Architecture
Generic Architecture
The generic reactive architecture, depicted in Figure 1, has three layers: the decision layer, the planning layer and the execution layer. Since the collaboration between agents is heavily goal oriented, it requires that a common goal be defined for the team. For a collaborative navigation application, the common goal could be to reach a particular configuration given the current configuration. The configuration refers to the platform position and orientation measured in a reference frame. Given the common goal, a reference trajectory is generated for each agent, since each of them may have its own way to drive the platform to the goal configuration. The reference trajectory for the follower could be generated according to the surrounding map with only known obstacles; this follower therefore does not require any additional sensor to detect obstacles. The reference trajectory for the leader could be generated by taking into account unknown obstacles (obstacles that were not modeled on the follower map). The common goal specification could be part of the leader functionality. In Figure 1, after the common goal specification, reference trajectories (for a specific number of stages) for the leader and the follower are generated respectively by the leader reference path generator and the follower reference path generator.

Figure 1. Stackelberg-based collaborative architecture.
Given the reference paths, the Stackelberg formulation can take place by considering that the leader reference path is the team reference trajectory. For the collaborative navigation application, the leader reference trajectory is the same as the follower trajectory unless an unknown obstacle is detected. As shown in Figure 1, the leader of the Stackelberg model produces a control signal (if the number of stages is 1) or a sequence of control signals (in the general case) by taking into account the rational reaction of the follower. The generated signal, based on the incentive strategy, is then used by the follower when making its own planning.
For each planned stage, the two signals generated by both agents are directly given to the position controller. This module is responsible for applying the required low-level control signals to the platform effectors so that its configuration tends to be as close as possible to the given configuration. The position controller causes the platform to change its configuration during a stage. The obtained platform configuration is used as feedback by the two Stackelberg solvers in order to generate the next stage control signals.
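The data flow between the modules of Figure 1 can be summarized in a few lines of code. In this sketch every class is a stand-in: the reference path generators return stored waypoints, the planners use a simple proportional rule instead of the Stackelberg gains of Section 2, and the position controller is a first-order model of the platform. All names and numbers are illustrative.

```python
import numpy as np

class ReferencePathGenerator:
    """Returns the reference configuration (x, y, theta) for a given stage."""
    def __init__(self, waypoints):
        self.waypoints = waypoints
    def reference(self, n):
        return self.waypoints[min(n, len(self.waypoints) - 1)]

class Planner:
    """Planning-layer stand-in: maps (configuration, reference) to a control signal.
    A real implementation would apply the feedback Stackelberg gains of Section 2."""
    def __init__(self, gain):
        self.gain = gain
    def control(self, config, reference):
        return self.gain * (reference - config)      # simple proportional placeholder

class PositionController:
    """Execution-layer stand-in: applies both planned signals to the platform model."""
    def __init__(self, dt=0.01):
        self.dt = dt
    def apply(self, config, u_leader, u_follower):
        return config + self.dt * (u_leader + u_follower)

# One pass through the architecture: goal -> reference paths -> planners -> controller.
leader_path   = ReferencePathGenerator([np.array([1.0, 1.0, 0.0])])
follower_path = ReferencePathGenerator([np.array([1.0, 0.5, 0.0])])
leader, follower, controller = Planner(0.8), Planner(0.5), PositionController()

config = np.zeros(3)
for n in range(3):
    u1 = leader.control(config, leader_path.reference(n))
    u2 = follower.control(config, follower_path.reference(n))
    config = controller.apply(config, u1, u2)   # fed back to both planners at the next stage
```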
The described architecture fulfills the minimum requirements stated by Hoc [3]. Indeed, the entire architecture is mainly goal oriented: the two agents work towards goals. Furthermore, the leader can interfere with the follower behaviour through the proposed incentive approach. The follower, by reacting rationally, allows an easy and predictable interaction with the leader.
4. Application to Collaborative Robot Navigation
4.1. Simulation Scenario
To validate all required steps for collaborative control based upon the feedback Stackelberg theory, a simulation has been performed. The focus is put on the planning layer of the generic architecture presented in Figure 1, since this is the heart of this work.
Assume that the two agents are at point A and that their platforms are considered as a single team platform. The goal of the simulated collaborative navigation is to drive the team platform from point A to point B, as shown in Figure 2. This can be interpreted as the common goal for the leader and the follower. Furthermore, the line shown in this figure represents the leader reference path; this reference path is considered the team reference path. The follower and leader reference paths are represented in Figure 3. By applying the procedure described in the proposed methodology, we obtain the collaborative control that allows the team to follow the leader reference path at any stage. The team platform state is its configuration (position and orientation), and the simulation parameters are described below.

Figure 3. Leader and follower reference paths.
(66)
where:
(67)
(68)
(69)
(70)
(71)
(72)
where:
$T$ represents the system integration time step;
$u_{1x}$ represents the leader control signal along the x-axis;
$u_{1y}$ represents the leader control signal along the y-axis;
$u_{1\theta}$ represents the leader control signal related to the orientation;
$u_{2x}$ represents the follower control signal along the x-axis;
$u_{2y}$ represents the follower control signal along the y-axis;
$u_{2\theta}$ represents the follower control signal related to the orientation.
The leader and follower functionals are represented by Equation (2). We assume that the control signals are not bounded. The weighting matrices involved in the functionals are defined as identity matrices of appropriate dimensions, except for a small number of them, which are set to different values. The optimization horizon is set to the total number of simulation stages.
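A minimal sketch of how such a simulation model could be set up is given below, assuming the team platform configuration is the vector $(x, y, \theta)$ and that each agent's control enters as a velocity command integrated over the time step $T$. The matrix values, the time step and the variable names are assumptions chosen for illustration; they are not claimed to reproduce Equations (66)-(72).

```python
import numpy as np

# Assumed team platform model: configuration X = [x, y, theta], each agent commanding
# velocities along x, y and the orientation, integrated over the time step T.
T  = 0.01                       # system integration time step (assumed value)
A  = np.eye(3)                  # transition matrix
B1 = T * np.eye(3)              # leader control matrix  (u1x, u1y, u1theta)
B2 = T * np.eye(3)              # follower control matrix (u2x, u2y, u2theta)

# Weighting matrices: identity by default, as in the simulation set-up described above.
Q1, R11, R12 = np.eye(3), np.eye(3), np.eye(3)     # leader functional
Q2, R22, R21 = np.eye(3), np.eye(3), np.eye(3)     # follower functional

def simulate(x0, u1_seq, u2_seq):
    """Roll the team platform forward under both agents' control sequences."""
    x, traj = np.asarray(x0, float), [np.asarray(x0, float)]
    for u1, u2 in zip(u1_seq, u2_seq):
        x = A @ x + B1 @ np.asarray(u1) + B2 @ np.asarray(u2)
        traj.append(x)
    return np.array(traj)
```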
4.2. Simulation Results
In Figure 4, we observe a match between the leader reference path and the path followed by the system composed of the two ANMs, although the leader and the follower have different reference paths. This result suggests that the proposed Stackelberg optimal solution is valid. In order to analyze the interactions between the agent control signals in more depth, these signals are shown in the subsequent figures.
Figure 3 shows two different phases. In the first phase, both reference paths are identical, whereas in the second phase, they are different. The break point occurs at the start of the second phase (stage 701).
Figures 5-7 show the agent control signals during the first phase. Since the reference paths are identical, the leader contribution along each axis is small, meaning that the incentive part of the leader control signal is also small. This result makes sense, since the follower is acting as the leader wishes.
The second phase starts from stage 701. During this phase, the leader needs to make use of the incentive strategy in order to induce the follower to track the leader reference path instead of its own reference path, as shown in Figures 8-10.
5. Conclusion
A new collaborative architecture is presented in this paper.
Figure 4. Comparison between leader reference path and the team trajectory.
Figure 5. Control signal along x-axis for identical reference paths.
Figure 6. Control signal along y-axis for identical reference paths.
Figure 7. Orientation control signal for identical reference paths.
Figure 8. Control signal along x-axis for non-identical reference path.
Figure 9. Control signal along y-axis for non-identical reference path.
Figure 10. Orientation control signal for non-identical reference path.
This architecture is based upon the incentive Stackelberg game formulation and the three-layer architecture concept. The proposed method is suitable for applications in which there is a hierarchy between decision makers. All required conditions and equations have been provided in order to find the incentive matrices, and an algorithm for solving the Stackelberg problem for a class of discrete-time two-agent non-zero-sum dynamic games with linear state dynamics and quadratic cost functionals is also provided. The feasibility and validity of this architecture are shown through the study of the collaborative path planning of two robotic platforms. In a completely deterministic framework, the results suggest that the optimal solution for this game can be obtained. The proposed method, as well as the collaborative architecture, could be used for the collaborative control of smart wheelchair teams and unmanned vehicle teams.
6. Acknowledgements
The author wishes to thank Prof. Paul Cohen of École Polytechnique Montréal, Québec, Canada.