Human-Robot Collaborative Planning for Navigation Based on Optimal Control Theory

Navigation modules are capable of driving a robotic platform without direct human participation. However, in some specific contexts, it is preferable to give control to a human driver. When the human driver participates in the control process while the navigation module is running, a shared control issue arises. This work presents a new approach to two-agent collaborative planning using optimal control theory and the three-layer architecture. In particular, collaborative planning between a human and a navigation module for trajectory following is analyzed. The collaborative plan executed by the platform is a weighted sum of the agents' control signals. As a result, the proposed architecture can be set to work in autonomous mode, in direct human control mode, or in any combination of these two operating modes. A collaborative obstacle avoidance maneuver is used to validate this approach. The proposed collaborative architecture could be used for smart wheelchairs, telerobotics and unmanned vehicle applications.


Introduction
Human-machine interaction is gaining interest in the robotics community [1][2][3]. In particular, for robotic platform control, this interaction leads to a shared control problem. Shared control of a robotic platform falls into two main categories. The first category covers situations where the agents (humans or intelligent modules embedded on robotic platforms) compete to find the best control action to use [4]. The second category covers collaborative approaches to achieving a given goal [5][6][7][8].
In the context of collaborative navigation with obstacle avoidance, the agents are often heterogeneous (i.e., a human and a navigation module). The navigation module is able to perform a local obstacle avoidance maneuver without direct human intervention. The human agent is assumed to be able to avoid a perceived obstacle using an appropriate continuous control modality (a joystick or any proportional control device). The agents therefore use different obstacle perception modalities and behave differently while avoiding perceived obstacles. In this paper, we consider the collaborative approach to shared control in order to leverage each agent's strengths.
There is no agreement on a formal definition of collaborative control. However, according to Hoc [3], two minimal conditions are required for two agents to collaborate: each agent works towards goals and can interfere with the other; and each agent tries to manage the interference to facilitate the common task when it exists. Hence, the definition of common goals is an important aspect of this approach. In order to meet these requirements, most collaborative control architectures try to address three issues: the definition of the collaboration goal; the elaboration of the most appropriate plan to meet the identified goal; and the execution of the selected plan.
The first issue is a decision-making problem. The second issue is related to planning, whereas the third one is part of the execution problem. When a human is part of a two-agent team, most applications focus on the decision problem. For wheelchair collaborative control applications, the intention of the driver is predicted based on the navigation context given by the on-board sensor-based systems [9][10][11][12]. By estimating the wheelchair user's intention, an appropriate navigation mode is selected. However, the planning (the sequence of actions that may be used) is left to the navigation module. Using a knowledge-based approach, other studies aimed at selecting the best maneuver the navigation module can execute [13]. Again, the user is not part of the planning and execution steps of the collaborative control. Ignoring the user's action in the planning and execution steps makes it difficult for the user to directly modify any navigation module action after the decision-making step. For example, Qiang [7] mentioned that an autonomous agent may prevent the robotic platform from moving to a table if it does not approach at a given angle.
In this paper, we propose a new architecture that includes both agents at the decision and planning levels. Optimal control theory is used to handle the interactions between the two agents [14][15][16]. The contribution of the paper is a formal methodology for collaborative planning.
The rest of the paper is organized in four sections. Section 2 presents the collaborative architecture and the methodology for collaborative planning. Sections 3 and 4 present the simulation and the discussion. Section 5 presents the conclusion.

Architecture for Collaborative Navigation
In order to efficiently allow collaboration between a human and an Autonomous Navigation Module (ANM), a suitable architecture is required [17]. Well-known robotic platform architectures include the subsumption architecture [18] and the three-layer architecture [19]. The three-layer architecture includes an Execution Layer (EL), a Sequencer Layer (SL) and a Deliberative Layer (DL), whose individual roles are explained in the next sections. The three-layer architecture, shown in Figure 1, is selected as the basis of the collaborative control architecture because it provides a high level of decoupling between layers and can easily be modified to integrate human control [5].

Deliberative Layer
The Deliberative Layer is the top layer of the proposed architecture. The role of the DL depends on the type of application. In this paper, we consider the collaborative navigation application with obstacle avoidance capability. The ANM, within the collaborative framework, is responsible for supporting the human during obstacle avoidance maneuvers (collision avoidance and obstacle avoidance).
To perform these maneuvers, the DL needs the human control signal (HCS), which is obtained via a continuous command modality. Mode confusion may occur [20]. In particular, the following two maneuvers may be confused: getting close to an obstacle; avoiding an obstacle.
One way to handle the mode confusion is to try to guess the maneuver the human would like to execute, given some a priori knowledge [21]. This specific task belongs to the Maneuver Recognition Module of the DL. Once a candidate maneuver is selected, an obstacle-free trajectory, which is a finite sequence of non-colliding way-points $X_w(i)$, is generated. A way-point is considered as a sub-goal during the platform motion. It is important to allow the DL to propagate the level of confidence in the selected maneuver to the SL. A high level of confidence will allow the SL to give the ANM control signal a substantial share of the collaborative control signal.
One way to propagate this confidence is to let the DL set the value of the matrix $R_a$ used in the cost function of the SL (see Equations (6) and (7)). Indeed, a large value of $R_a$ indicates that the confidence in the way-point sequence is low and that the SL should heavily penalize the ANM control signal when generating the collaborative control signal. In this case, the collaborative control signal computed at the SL level will be close to the HCS. Several approaches have been reported for collision-free way-point sequence generation [22].

Sequencer Layer
The challenge for the ANM is to carefully design the plan that the EL will execute. This task mainly belongs to the Sequencer Layer. The SL is the most important aspect of the paper for two reasons: first, the human control signal is usually not directly involved in this layer, since the architectures proposed in the literature use the HCS only in the Deliberative Layer; second, the design of this layer is the most challenging because of the integration of the HCS. Given two consecutive way-points $X_w(i)$ and $X_w(i+1)$, the role of the Sequencer Layer is to find the sequence of configuration changes to provide to the Execution Layer in order to move the platform from $X_w(i)$ to $X_w(i+1)$, as suggested by the DL.
$M$ is the number of intermediate points on the sub-trajectory joining the two way-points. This sub-trajectory is generated using a B-spline method in order to allow a smooth transition over the way-points. So, a geometric sub-trajectory is a set of reference configurations $X_r(k)$.
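The generation of a smooth sub-trajectory between two way-points can be illustrated with a minimal sketch. It assumes a uniform cubic B-spline that uses the way-points as control points and repeats the end way-points at the boundaries; the paper does not state which B-spline variant is used, and the function names below are hypothetical. Note that such a spline passes near the way-points rather than exactly through them.

```python
def cubic_bspline_point(p0, p1, p2, p3, t):
    """Evaluate one uniform cubic B-spline segment at t in [0, 1]."""
    b0 = (1 - t) ** 3 / 6.0
    b1 = (3 * t**3 - 6 * t**2 + 4) / 6.0
    b2 = (-3 * t**3 + 3 * t**2 + 3 * t + 1) / 6.0
    b3 = t**3 / 6.0
    return tuple(b0 * a + b1 * b + b2 * c + b3 * d
                 for a, b, c, d in zip(p0, p1, p2, p3))


def sub_trajectory(waypoints, i, m):
    """Return m + 1 reference points for the segment between way-points
    i and i + 1 (assumption: end way-points are repeated as needed)."""
    n = len(waypoints)

    def wp(j):
        # Clamp the index so that boundary segments reuse the end way-points.
        return waypoints[min(max(j, 0), n - 1)]

    ctrl = (wp(i - 1), wp(i), wp(i + 1), wp(i + 2))
    return [cubic_bspline_point(*ctrl, k / m) for k in range(m + 1)]


# Hypothetical way-points (x, y) on a straight line.
points = sub_trajectory([(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)], i=1, m=4)
```

For equally spaced collinear way-points, the segment starts and ends exactly on the two way-points; in general the curve only approximates them, which is what makes the transition smooth.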

Planning Problem Formulation
Since the Execution Layer (EL) is decoupled from the Sequencer Layer, the EL can be considered a black box.
If we assume that, given two consecutive stages $k$ and $k+1$, the EL has enough time to bring the robotic platform from the configuration $X(k)$ to the configuration $X(k+1)$, a simple linear model can be used to approximate the EL. Hence, from the SL perspective, the EL dynamic model is a linear discrete-time equation in which $U(k)$ represents the collaborative control signal and $A(k)$ is the state transition matrix. There are several ways to define this collaborative signal. In order to keep the system simple and easy to implement in robotics, we assume that $U(k)$ is a weighted sum of the human signal $U_h(k)$ and the ANM signal $U_a(k)$ (Equation (2)). In these equations, $k$ is the current stage (the index of the current point on the sub-trajectory), $x(k)$ and $y(k)$ are the platform coordinates in a reference frame, and the sampling period is the time between two consecutive stages. In order to generate smooth motion for the platform, the SL should avoid large variations in the magnitude of the control signal. Furthermore, the deviation between the platform configuration $X(k)$ and the reference configuration $X_r(k)$ should be kept small so that the platform follows the reference path.
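The weighted-sum rule and the linear approximation of the EL can be sketched as follows. This is only an illustration under stated assumptions: the weights `w_h` and `w_a`, the sampling period `dt`, and the choice of an identity state transition matrix are hypothetical, since the exact model matrices are not recoverable from the text.

```python
def collaborative_signal(u_h, u_a, w_h=1.0, w_a=1.0):
    """Weighted sum of the human signal u_h and the ANM signal u_a.

    Both signals are (x-rate, y-rate, orientation-rate) tuples.
    The weights are hypothetical parameters.
    """
    return tuple(w_h * h + w_a * a for h, a in zip(u_h, u_a))


def el_linear_step(config, u, dt):
    """Linear EL approximation, assuming an identity state transition
    matrix: X(k+1) = X(k) + dt * U(k)."""
    return tuple(x + dt * du for x, du in zip(config, u))


u = collaborative_signal((1.0, 0.0, 0.0), (0.0, 2.0, 0.0))
next_config = el_linear_step((0.0, 0.0, 0.0), u, dt=0.5)
```

With `w_a = 0` the platform follows the human signal only, and with `w_h = 0` it follows the ANM only, which mirrors the two limiting operating modes described in the paper.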

Solving the Planning Problem
In order to solve the problem, we assume that:
1) The configuration is fully observable to the ANM and the initial configuration is known.
2) The human control signal $U_h(k)$ is known. In practice, this signal varies slowly and can therefore be approximated by a constant function over a short time interval.
3) The geometric sub-trajectory is realizable; all involved configurations on this trajectory are individually reachable.
The Hamiltonian of the system is built from the functional and the EL model. Applying the minimum principle yields the necessary optimality conditions together with a boundary condition, and solving them with optimal control theory gives the ANM planning law. According to Equation (2), the collaborative control signal then follows from this law and the HCS. The planning law represented by Equation (13) is linear. Furthermore, it depends on $R_a(k)$, whose values are set by the DL according to its confidence in the selected maneuver. Hence, if the DL does not have good confidence in the selected maneuver, $R_a(k) \to \infty$. Thus, according to Equations (14), (16) and (17), the ANM control signal $U_a(k) \to 0$, and the collaborative control signal $U(k) \to U_h(k)$, according to Equation (2).
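The limiting behaviour described above can be checked numerically with a simplified sketch. It is not the paper's derivation: it assumes a single axis, the model x(k+1) = x(k) + dt (u_h + u_a(k)) with a constant human signal, scalar weights `q` and `r_a`, and it solves the resulting quadratic tracking problem with a backward dynamic-programming (Riccati-type) recursion instead of the Hamiltonian conditions. All names are hypothetical.

```python
def plan_anm_signal(x0, u_h, x_ref, dt, q=1.0, r_a=0.01):
    """Minimize sum_k q*(x(k+1) - x_ref[k])**2 + r_a*u_a(k)**2
    subject to x(k+1) = x(k) + dt*(u_h + u_a(k)).

    x_ref[k] is the reference for x(k+1). Returns (u_a list, x list).
    """
    m = len(x_ref)
    gains = [None] * m
    p, s = 0.0, 0.0  # cost-to-go V(x) = p*x**2 + 2*s*x + const, zero at the end
    for k in reversed(range(m)):
        pb = q + p                 # quadratic weight on x(k+1)
        sb = s - q * x_ref[k]      # linear weight on x(k+1)
        g = r_a + pb * dt * dt
        gains[k] = (pb, sb, g)
        p = pb * r_a / g
        s = sb * r_a / g + p * dt * u_h
    x, xs, us = x0, [x0], []
    for pb, sb, g in gains:
        z = x + dt * u_h                    # state if the ANM did nothing
        u_a = -dt * (pb * z + sb) / g       # linear planning law
        x = z + dt * u_a
        us.append(u_a)
        xs.append(x)
    return us, xs


# The human pushes forward while the reference asks the platform to stay at 0.
low_penalty = plan_anm_signal(0.0, 1.0, [0.0] * 10, dt=0.1, r_a=1e-6)
high_penalty = plan_anm_signal(0.0, 1.0, [0.0] * 10, dt=0.1, r_a=1e9)
```

With a small `r_a` the ANM signal cancels the human signal and the reference is tracked; with a very large `r_a` the ANM signal vanishes and the platform simply follows the human, in line with the limit discussed in the text.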

Execution Layer
We assume that the robotic platform state $X_p$ is represented by its configuration expressed in a reference frame (working space). The platform configuration consists of its position $(x, y)$ and its orientation in this frame, as shown in Figure 2. Note that the orientation is the angle between the x-axis and the v-axis (Figure 2).
The EL input signal $U$ is a weighted sum of two control signals. The first control signal, $U_h$, comes from the human. This signal represents the rate of configuration change the human would like to apply to the platform. The second control signal, $U_a$, provided by the ANM, represents the rate of configuration change to apply to the platform in order to achieve the plan given by the ANM Sequencer Layer (the second layer represented in Figure 1). Hence, $U$ can be thought of as the human-ANM team configuration change to apply to the platform. Knowing the current configuration $X(k)$ and $U(k)$, the desired configuration is computed using the platform dynamics. The EL, in the traditional three-layer architecture, is designed to be tightly coupled with sensors and actuators. It receives the sequence of configurations from the SL as set points, together with the measured platform configuration $X_p(k)$. Many methods exist to solve the controller problem when the platform is a wheeled robot [23][24][25]. Recently, Lyapunov-based approaches have been proposed [26,27]. In this paper, a differentially driven robotic platform model is used [24]. Since the EL is decoupled from the upper layers, any other model can be used with little modification of the architecture.
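As an illustration of the kind of platform model the EL works with, the following sketch integrates the standard differential-drive kinematics over one time step. It is a generic textbook model, not necessarily the exact model of [24]; the parameter names `wheel_base` and `dt` and the Euler integration are assumptions.

```python
import math


def diff_drive_step(pose, v_left, v_right, wheel_base, dt):
    """One Euler step of differential-drive kinematics.

    pose is (x, y, theta); v_left and v_right are wheel linear speeds.
    """
    x, y, theta = pose
    v = 0.5 * (v_right + v_left)             # forward speed
    omega = (v_right - v_left) / wheel_base  # turning rate
    return (x + v * math.cos(theta) * dt,
            y + v * math.sin(theta) * dt,
            theta + omega * dt)


straight = diff_drive_step((0.0, 0.0, 0.0), 1.0, 1.0, wheel_base=0.5, dt=0.1)
spin = diff_drive_step((0.0, 0.0, 0.0), -1.0, 1.0, wheel_base=0.5, dt=0.1)
```

Equal wheel speeds move the platform straight ahead, while opposite wheel speeds rotate it in place, which is why a desired configuration change rate from the SL cannot always be applied directly and a dedicated controller is needed in the EL.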

Collaboration Planning Simulation
The goal of the simulation is to test the proposed Sequencer Layer planning method within the collaborative three-layer architecture.

Simulation Scenario
A human wishes to drive a robotic platform from point A to point B, as illustrated in Figure 3, by following the solid-line trajectory. However, an obstacle is present on this trajectory. The goal of the ANM is to allow the team to avoid this obstacle by following the trajectory represented by the dashed line. This trajectory is generated by the DL. We ran two different simulations in order to explain the influence of $R_a$ on the collaborative planning. In the first simulation, all ANM functional matrices were set to identity matrices except the matrix $R_a$, which was set to $0.01 I$ ($I$ is the identity matrix of appropriate dimension). By reducing the value of $R_a$, the ANM control signal is less penalized. Hence, the ANM can adequately contribute to the team collaborative planning. In the second simulation, the ANM control signal is heavily penalized by setting $R_a$ to a much larger value.

Case of ANM Full Contribution
Figure 4 shows the result of the simulation when the ANM control signal is less penalized. This is the ANM Full Contribution mode. Despite the presence of the human trajectory (solid line in Figure 3), the trajectory followed by the team is the same as the DL trajectory (Figure 4).
Figure 5 shows the dynamics of the three control signals along the x-axis: $U_{ax}$, $U_{hx}$ and $U_x$, expressed in the reference frame. Figure 6 shows the three control signals along the y-axis ($U_{ay}$, $U_{hy}$ and $U_y$). In both figures, the contribution of the ANM is null until stage 300. Hence, the team trajectory is exactly the same as the human trajectory. From stage 300 to the end of the simulation, the team trajectory must follow the trajectory suggested by the DL in order to properly avoid the obstacle. To achieve this behaviour, the ANM uses the collaborative planning method proposed in Section 2.2.2. The ANM plans along the x-axis and the y-axis are represented by the dashed lines in Figures 5 and 6, respectively. As a result, the team control signal (solid lines), which is also the collaborative signal, is the control signal required to follow the DL trajectory.

Case of Manual Robotic Platform Control
When the ANM signal is heavily penalized, the architecture behaves as if the human were the sole controller of the platform. This result is shown in Figure 7. Notice that the team trajectory is the same as the human trajectory. Hence, the obstacle is not avoided. The analysis of the control signals of this simulation (see Figures 8 and 9) reveals that the ANM control signal is very small.

Discussion
The proposed collaborative planning assumes full knowledge of the human control signal during the planning period (between $X_w(i)$ and $X_w(i+1)$). In practice, this assumption may not be valid. However, Figure 10 shows that even if the planning horizon is set to 1 (so that there is no need to know the future human control signal), by selecting an appropriate value of $R_a$, the trajectory followed by the team is nearly the same as when the planning horizon is greater than 1. It is interesting to note that the proposed Sequencer Layer can be used with any decision-making process that provides a sequence of way-points representing the sub-goals of the navigation task. If the way-points are too distant from each other, the planning horizon $M$ in the Sequencer Layer can be set to 1. Furthermore, a non-holonomic constraint can be integrated into the formulation of the optimization problem in order to produce the ANM plans.
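The horizon-1 case admits a simple closed form, which the following sketch illustrates. It assumes the per-component model x(k+1) = x(k) + dt (u_h + u_a), scalar weights `q` and `r_a` shared by all components, and it treats the orientation as an ordinary scalar without wrap-around; these simplifications and the function name are hypothetical. Minimizing q (x(k+1) - x_r)^2 + r_a u_a^2 over u_a gives u_a = q dt (x_r - x - dt u_h) / (r_a + q dt^2), which needs only the current human signal.

```python
def one_step_anm_signal(config, ref_next, u_h, dt, q=1.0, r_a=0.01):
    """Closed-form ANM signal for a planning horizon of 1.

    config, ref_next and u_h are tuples with one entry per component.
    Only the current human signal u_h is required.
    """
    signal = []
    for x, x_r, h in zip(config, ref_next, u_h):
        z = x + dt * h  # where the human alone would bring the platform
        signal.append(q * dt * (x_r - z) / (r_a + q * dt * dt))
    return tuple(signal)


u_a = one_step_anm_signal((0.0, 0.0, 0.0), (0.0, 0.5, 0.0),
                          (1.0, 0.0, 0.0), dt=0.1, r_a=0.0)
```

With `r_a = 0` the ANM exactly compensates the human signal to reach the reference at the next stage; as `r_a` grows, the ANM signal shrinks towards zero and the platform follows the human.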

Conclusion
A new Human-Navigation Module collaborative architecture that involves human intervention at the deliberative and sequencer layers is presented in this paper. This architecture is based on the three-layer architecture. At the Deliberative Layer, the human control signal is analyzed in order to estimate the human maneuver during the navigation task. Based on this maneuver, the Deliberative Layer provides a sequence of way-points to the Sequencer Layer, which is also the planning layer. We proposed a method based on optimal control theory that takes into account both the human plan and the Autonomous Navigation Module plan. The resulting collaborative plan is then executed in the Execution Layer, which is responsible for non-linear platform control. The collaborative planning is simulated, and the results suggest that the penalty on the Autonomous Navigation Module control signal can be used to impose the platform operating mode: Autonomous Navigation Module alone, Human Driven Mode alone, or any combination of these two modes. The proposed architecture could be used in applications such as telerobotics, smart wheelchairs and unmanned vehicle collaborative navigation.


In the planning problem formulation, $u_{ax}(k)$ is the ANM control signal change rate on the x-axis; $u_{ay}(k)$ is the ANM control signal change rate on the y-axis; the HCS also includes an orientation change rate component. The deviation between the platform configuration and the reference configuration $X_r(k)$ at stage $k$ should be minimized in order to allow the platform to follow this reference path. One way to take all these requirements into account when generating the ANM sequence of configuration change rates $U_a(k)$ is to formulate an optimization problem. The functional used by the SL takes the previously mentioned requirements into account through three weighting matrices: a positive semidefinite matrix that penalizes the deviation between the state vector and the reference vector at stage $k$; $R_a(k)$, a positive definite matrix that penalizes a large ANM control signal at stage $k$; and $R_h(k)$, a $3 \times 3$ symmetric and positive definite matrix that penalizes a large HCS at stage $k$. The optimization horizon is $M$.
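A functional of this kind can be evaluated with a short sketch. The exact functional is not recoverable from the text, so the code below assumes a plain sum of three quadratic terms per stage with diagonal weighting matrices, given as tuples of diagonal entries; the argument names are hypothetical.

```python
def quadratic_cost(xs, x_refs, u_as, u_hs, q_diag, ra_diag, rh_diag):
    """Sum over stages of the weighted squared tracking error, the
    weighted squared ANM signal and the weighted squared HCS.

    Assumption: all weighting matrices are diagonal.
    """
    def weighted_sq(vec, weights):
        return sum(w * v * v for w, v in zip(weights, vec))

    total = 0.0
    for x, x_r, u_a, u_h in zip(xs, x_refs, u_as, u_hs):
        error = [a - b for a, b in zip(x, x_r)]
        total += (weighted_sq(error, q_diag)
                  + weighted_sq(u_a, ra_diag)
                  + weighted_sq(u_h, rh_diag))
    return total


cost = quadratic_cost(
    xs=[(1.0, 0.0, 0.0)], x_refs=[(0.0, 0.0, 0.0)],
    u_as=[(0.0, 2.0, 0.0)], u_hs=[(0.0, 0.0, 3.0)],
    q_diag=(1.0, 1.0, 1.0), ra_diag=(0.01, 0.01, 0.01), rh_diag=(1.0, 1.0, 1.0),
)
```

Lowering the entries of `ra_diag` makes ANM effort cheap relative to tracking error, which is the mechanism the simulations use to switch between full ANM contribution and manual control.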

Figure 4. Simulation result with full ANM contribution.

Figure 5. Control signals along x-axis with full ANM contribution.

Figure 6. Control signals along y-axis with full ANM contribution.

Figure 8. Control signals along x-axis with no ANM contribution.

Figure 9. Control signals along y-axis with no ANM contribution.