On the Connection between the Hamilton-Jacobi-Bellman and the Fokker-Planck Control Frameworks

In the framework of stochastic processes, the connection between the dynamic programming scheme given by the Hamilton-Jacobi-Bellman equation and a recently proposed control approach based on the Fokker-Planck equation is discussed. Under appropriate assumptions it is shown that the two strategies are equivalent in the case of expected cost functionals, while the Fokker-Planck formalism allows considering a larger class of objectives. To illustrate the connection between the two control strategies, the cases of an Itō stochastic process and of a piecewise-deterministic process are considered.


Introduction
In the modelling of uncertainty, the theory of stochastic processes [1] provides established mathematical tools for the modelling and the analysis of systems with random dynamics. Furthermore, in applications, the possibility to control sequences of events subject to random disturbances is highly desirable. In this paper, we elucidate the connection between the well-established Hamilton-Jacobi-Bellman (HJB) control framework [2] [3] and a control strategy based on the Fokker-Planck (FP) equation [4] [5]. Illustrative examples allow gaining additional insight on this connection.
We focus on a representative n-dimensional continuous-time stochastic process described by the following model

$$dX_t = a(X_t, t; \alpha)\,dt + b(X_t, t; \alpha)\,dW_t, \tag{1.1}$$

where $a$ is a Lipschitz-continuous n-dimensional drift function and $W_t \in \mathbb{R}^m$ is an m-dimensional Wiener process with stochastically independent components. The dispersion function $b$ with values in $\mathbb{R}^{n \times m}$ is assumed to be smooth and full rank; see [6]. This is the well-known Itō stochastic differential equation (SDE) [1], where we consider also the action of a d-component vector of controls $\alpha(x,t) \in \mathbb{R}^d$ that allows driving the random process towards a certain goal [3]. We denote with $\mathcal{A}$ the set of Markovian controls that contains all jointly measurable functions $\alpha$ with $\alpha(x,t) \in A$ for $(x,t) \in \mathbb{R}^n \times [t_0, T]$, where $A \subset \mathbb{R}^d$ is a given compact set [3]. In deterministic dynamics, the optimal control is achieved by finding a control law $\alpha$ that minimizes a given objective defined by a cost functional $J(X, \alpha)$; see, e.g., [2].
In the non-deterministic case, the state $X_t$ is random, so that inserting a stochastic process into a deterministic cost functional results in a random variable. Therefore, in stochastic optimal control problems the expected value of a given cost functional is considered [7]. In particular, we have

$$J(X,\alpha) = \mathbb{E}\left[\int_{t_0}^{T} h(X_s, s, \alpha)\,ds + g(X_T)\right],$$

where $h$ denotes a running cost and $g$ a terminal cost. This is a Bolza type cost functional in the finite-horizon case $(T < \infty)$, and it is assumed here that the controller knows the state of the system at each instant of time (complete observations). For this case, the method of dynamic programming can be applied [2] [7] [8] in order to derive the HJB equation for $\inf_{\alpha} J$, with $\alpha$ as the optimization function. Some other cases of the cost structure of $J(X,\alpha)$ are quoted in [8], with applications in finance, engineering, production planning, and forest harvesting. Each $J$ will lead to a different form of the HJB equation that can be analysed with appropriate methods of partial differential equations; see, e.g., [9].
A control approach close to the HJB formulation consists in approximating the continuous stochastic process by a discrete Markov decision chain. In this approach the information of the controlled stochastic process, carried by the transition probability density function of the approximating Markov process, is utilized to solve the Bellman equation; for details see [10].
However, the common methodology to find an optimal controller of random processes consists in reformulating the problem from stochastic to deterministic. This is a reasonable approach when we consider the problem from a statistical point of view, with the purpose to find out the collective "ensemble" behaviour of the process.
In fact, the average $\mathbb{E}[\cdot]$ of the functional of the process $X$ is omnipresent in almost all stochastic optimal control problems considered in the scientific literature.
The value of the cost functional before averaging measures the cost of a single trajectory of the process. However, the knowledge of a single realization is not useful for the statistical analysis, which would require determining the average, the variance, and other properties associated with the state of the stochastic process.
On the other hand, a stochastic process is completely characterized by its law which, in many cases, can be represented by the probability density function (PDF). Therefore, a control methodology that employs the PDF would provide an accurate and flexible control strategy that could accommodate a wide class of objectives. For this reason, PDF control schemes were proposed in which the cost functional depends, possibly nonlinearly, on the PDF of the stochastic state variable; see, e.g., [11]-[14] for specific applications.
The important step in the Fokker-Planck control framework proposed in [4] [5] is to recognize that the evolution of the PDF associated to the stochastic process (1.1) is characterized as the solution of the Fokker-Planck (also known as forward Kolmogorov) equation; see, e.g., [15] [16]. This is a partial differential equation of parabolic type with Cauchy data given by the initial PDF distribution. Therefore, the formulation of objectives in terms of the PDF and the use of the Fokker-Planck equation provide a consistent framework to formulate an optimal control strategy of stochastic processes.
In this paper, we discuss the relationship between the HJB and the FP frameworks. We show that the FP control strategy provides the same optimal control as the HJB method for an appropriate choice of the objectives.
Specifically, this is the case for objectives that are formulated as expected cost functionals, assuming that both the HJB equation and the FP equation admit a unique classical solution. The latter assumption is motivated by the purpose of this work, which is to show the connection between the HJB and FP frameworks without aiming at finding the most general setting where this connection holds, e.g., for viscosity solutions of the HJB equation [9] [17] [18] or FP equations with irregular coefficients [19]. Furthermore, we remark that the FP approach allows accommodating any desired functional of the stochastic state and its density, which is now represented by the PDF associated with the controlled stochastic process.
In the next section, we illustrate the HJB framework. In Section 3, we discuss the FP method. Section 4 is devoted to specific illustrative examples. A section of conclusions completes this paper.

The HJB Framework
We consider the optimal control of the state $X_t \in \mathbb{R}^n$ whose evolution is governed by drift and random diffusion as follows

$$dX_t = a(X_t, t; \alpha)\,dt + b(X_t, t; \alpha)\,dW_t. \tag{1.3}$$

The control function $\alpha$ uses the current value $X_t$ to affect the dynamics of the stochastic process by adjusting the drift and the dispersion function.
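As a numerical illustration of a controlled process of the type (1.3), the following minimal sketch simulates one sample path in the scalar case. It uses the Euler-Maruyama discretization, which is a standard scheme and not one prescribed in this paper. The function name `euler_maruyama`, the drift, the dispersion, and the feedback law in the usage lines are hypothetical choices made only for illustration.

```python
import numpy as np

def euler_maruyama(a, b, alpha, x0, t0, T, n_steps, rng):
    """Simulate one scalar path of dX = a(X,t,u) dt + b(X,t,u) dW with u = alpha(X,t)."""
    dt = (T - t0) / n_steps
    t = t0
    x = np.empty(n_steps + 1)
    x[0] = x0
    for k in range(n_steps):
        u = alpha(x[k], t)                  # Markovian feedback: depends on the current state only
        dw = rng.normal(0.0, np.sqrt(dt))   # Wiener increment with variance dt
        x[k + 1] = x[k] + a(x[k], t, u) * dt + b(x[k], t, u) * dw
        t += dt
    return x

# Hypothetical example data (assumptions, not taken from the paper):
# drift a = u, constant dispersion b = 0.5, feedback law alpha(x, t) = -x.
rng = np.random.default_rng(0)
path = euler_maruyama(
    a=lambda x, t, u: u,
    b=lambda x, t, u: 0.5,
    alpha=lambda x, t: -x,
    x0=1.0, t0=0.0, T=1.0, n_steps=200, rng=rng,
)
print(path[0], path[-1])
```

The control enters only through the value `u` that is passed to both `a` and `b`, which mirrors the statement that the control adjusts the drift and the dispersion function.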
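We define the expected cost for the admissible controls $\alpha \in \mathcal{A}$ as follows

$$C_{t_0,x_0}(\alpha) = \mathbb{E}\left[\int_{t_0}^{T} h(X_s, s, \alpha(X_s,s))\,ds + g(X_T) \,\middle|\, X_{t_0} = x_0\right],$$

which is an expectation conditional to the process $X_t$ taking the value $x_0$ at time $t_0$. Here, $X_t$ solves the stochastic differential Equation (1.3) with control $\alpha$, and the following functions

$$h : \mathbb{R}^n \times [t_0,T] \times A \to \mathbb{R}, \qquad g : \mathbb{R}^n \to \mathbb{R}$$

are smooth and bounded. We call $h$ the running cost and $g$ the terminal cost. Our goal is to find an optimal control $\alpha^*$ which minimizes the expected cost

$$C_{t_0,x_0}(\alpha^*) = \min_{\alpha \in \mathcal{A}} C_{t_0,x_0}(\alpha).$$

We assume that this control is unique. Further, we define the following value function, also known as the cost-to-go function,

$$u(x,t) := \min_{\alpha \in \mathcal{A}} C_{t,x}(\alpha) = C_{t,x}(\alpha^*).$$

It is well known [2] [3] that $u$ solves the Hamilton-Jacobi-Bellman equation

$$\partial_t u + \min_{v \in A}\left[h(x,t,v) + a(x,t;v)\cdot\nabla u + \tfrac{1}{2}\,\mathrm{tr}\!\left(b(x,t;v)\,b(x,t;v)^{\top}\nabla^2 u\right)\right] = 0, \qquad u(x,T) = g(x), \tag{1.7}$$

and that the optimal control can be reconstructed from $u$. Assume that the optimal control is unique and is attained, then we have

$$\alpha^*(x,t) = \arg\min_{v \in A}\left[h(x,t,v) + a(x,t;v)\cdot\nabla u + \tfrac{1}{2}\,\mathrm{tr}\!\left(b(x,t;v)\,b(x,t;v)^{\top}\nabla^2 u\right)\right].$$

In the following, to ease notation we use the Einstein summation convention: when an index variable appears twice in a single term, this means that a summation over all possible values of the index takes place. For example, we have that

$$a_i\,\partial_{x_i} u = \sum_{i=1}^{n} a_i\,\frac{\partial u}{\partial x_i}.$$

Notice that the optimal control satisfies at each time $t$ and state $x$ the following optimality condition

$$\partial_\alpha h + \partial_\alpha a_i\,\partial_{x_i} u + \tfrac{1}{2}\,\partial_\alpha\!\left(b_{ik}b_{jk}\right)\partial^2_{x_i x_j} u = 0 \quad \text{at } \alpha = \alpha^*(x,t). \tag{1.9}$$

Existence and uniqueness of solutions to the HJB equation often involve the concept of uniform parabolicity; see [3] [18] [20]. The HJB equation is called uniformly parabolic [3] if there exists a constant $c > 0$ such that, for all $(x,t,v) \in Q \times A$ and all $\xi \in \mathbb{R}^n$,

$$b_{ik}(x,t;v)\,b_{jk}(x,t;v)\,\xi_i\,\xi_j \ge c\,|\xi|^2,$$

where $Q = \Omega \times (t_0,T)$ and $\Omega \subseteq \mathbb{R}^n$ represents a bounded or unbounded set. If the non-degeneracy condition holds, results from the theory of PDEs of parabolic type imply existence and uniqueness of solutions to the HJB problem (1.7) with the properties required in the Verification Theorem [3]. In particular, we have the following theorem due to Krylov.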

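Theorem 2. If the non-degeneracy assumption holds, and in addition we have that $A$ is compact, the domain $\Omega \subset \mathbb{R}^n$ is bounded with sufficiently regular boundary $\partial\Omega$, the drift, the diffusion, and the Lagrange functions are sufficiently smooth on the space-time cylinder $\Omega \times [t_0,T]$, and the final condition is sufficiently smooth, then the HJB equation has a unique classical solution.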

The Fokker-Planck Formulation
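In this section, we discuss an alternative to the HJB approach that is based on the formulation of a Fokker-Planck optimal control problem. We suppose that the functions $a$ and $b$ of (1.3) yield a stochastic process whose state admits a PDF $\rho(x,t)$ with initial distribution $\rho_0$. The evolution of $\rho$ is governed by the forward FP Equation (1.14), and an expected cost with running cost $h$ and terminal cost $g$ takes the form of a functional that is linear in $\rho$:

$$J(\rho,\alpha) = \int_{t_0}^{T}\!\!\int_{\Omega} h(x,t,\alpha)\,\rho(x,t)\,dx\,dt + \int_{\Omega} g(x)\,\rho(x,T)\,dx, \qquad \Omega \subseteq \mathbb{R}^n. \tag{1.12}$$

The FP optimal control problem consists in minimizing $J$ over the admissible controls subject to the FP equation.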
Also in this case, the FP problem can be defined in a bounded or unbounded set in $\mathbb{R}^n$. Existence and uniqueness of solutions to this problem often rely on the concept of uniform parabolicity as in Theorem 2. For the case $\Omega = \mathbb{R}^n$, we refer to [19] [20]; see also [21] and the references therein. Furthermore, we remark that in the case of bounded domains, the boundary conditions for the FP model must be chosen so that they are meaningful for the underlying stochastic process. This is a delicate issue that is not the focus of this work and therefore, in the following, we consider the common case where $\Omega = \mathbb{R}^n$.

The FP constraint is taken into account by means of a Lagrange function, where $q : \mathbb{R}^n \times [t_0,T] \to \mathbb{R}$ represents the Lagrange multiplier. We have that the optimal control solution is characterized as the solution to the following optimality system

$$\partial_t \rho + \partial_{x_i}\!\left(a_i\,\rho\right) - \tfrac{1}{2}\,\partial^2_{x_i x_j}\!\left(b_{ik}b_{jk}\,\rho\right) = 0, \qquad \rho(x,t_0) = \rho_0(x), \tag{1.14}$$

$$\partial_t q + a_i\,\partial_{x_i} q + \tfrac{1}{2}\,b_{ik}b_{jk}\,\partial^2_{x_i x_j} q + h = 0, \qquad q(x,T) = g(x), \tag{1.15}$$

$$\rho\left[\partial_\alpha h + \partial_\alpha a_i\,\partial_{x_i} q + \tfrac{1}{2}\,\partial_\alpha\!\left(b_{ik}b_{jk}\right)\partial^2_{x_i x_j} q\right] = 0. \tag{1.16}$$

Now, we illustrate the equivalence between the HJB and the FP formulation. The key point is to notice that (1.16) corresponds to the first-order necessary optimality condition (1.9) for the minimization of the Hamiltonian in the HJB formulation, once we identify the Lagrange multiplier $q$ with the value function $u$ in (1.7). Therefore, provided that the minimization problem (1.11) admits a unique solution $\alpha^*$ in terms of the first- and second-order derivatives of $u$, we can insert this $\alpha^*$ into the backward Kolmogorov Equation (1.15), thus obtaining the HJB Equation (1.7), because of the formal equivalence between (1.16) and (1.9). With our setting, since the solution $u$ of (1.7) is unique, the uniqueness of $\alpha$ and $\rho$ follows; see [3].
Notice that with the above setting, the optimal control $\alpha^*$ does not depend explicitly on the density $\rho$, but only on the Lagrange multiplier $q$, that is, the value function $u$. (This explains why the feedback control is based on the value function.) Hence, the Equations (1.15)-(1.16) determine the optimal control. This will not be the case in the more general situation in which the cost functional in (1.12) is not linear in $\rho$. This happens, for instance, when $J$ does not represent an expected cost; see [4] [5] for the case of a tracking functional for the density.
We also note that the solutions to the adjoint FP Equation (1.15) and to the optimality condition equation for the control function (1.16) do not depend on the initial condition $\rho_0$ of the forward FP Equation (1.14). Hence, in accordance with the HJB formulation, the solution to the backward Kolmogorov equation is not affected by the initial state of the system.
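The way in which an optimality system of the type (1.14)-(1.16) arises can be sketched formally as follows. This is a sketch under stated assumptions and not a rigorous derivation: we take $\Omega = \mathbb{R}^n$, assume that $\rho$ decays fast enough for all boundary terms to vanish, assume a cost functional that is linear in $\rho$ with running cost $h$ and terminal cost $g$, and choose the sign of the multiplier so that $q$ can be identified with $u$. The symbol $\mathcal{L}$ for the Lagrange function is introduced here for illustration only.

```latex
\begin{align*}
\mathcal{L}(\rho,\alpha,q)
 &= \int_{t_0}^{T}\!\!\int_{\mathbb{R}^n} h\,\rho\,dx\,dt
  + \int_{\mathbb{R}^n} g\,\rho(\cdot,T)\,dx
  - \int_{t_0}^{T}\!\!\int_{\mathbb{R}^n} q
    \Big[\partial_t\rho + \partial_{x_i}(a_i\rho)
    - \tfrac12\,\partial^2_{x_i x_j}(b_{ik}b_{jk}\rho)\Big]\,dx\,dt \\
 % integrate by parts in t and x (boundary terms at infinity are assumed to vanish)
 &= \int_{t_0}^{T}\!\!\int_{\mathbb{R}^n} \rho
    \Big[h + \partial_t q + a_i\,\partial_{x_i} q
    + \tfrac12\,b_{ik}b_{jk}\,\partial^2_{x_i x_j} q\Big]\,dx\,dt \\
 &\quad + \int_{\mathbb{R}^n} \big(g - q(\cdot,T)\big)\,\rho(\cdot,T)\,dx
  + \int_{\mathbb{R}^n} q(\cdot,t_0)\,\rho_0\,dx .
\end{align*}
% Stationarity with respect to q returns the forward FP equation.
% Stationarity with respect to rho gives the adjoint equation and its terminal condition:
\begin{align*}
\partial_t q + a_i\,\partial_{x_i} q + \tfrac12\,b_{ik}b_{jk}\,\partial^2_{x_i x_j} q + h &= 0,
 & q(\cdot,T) &= g .
\end{align*}
% Stationarity with respect to alpha gives the optimality condition:
\begin{align*}
\rho\,\Big[\partial_\alpha h + \partial_\alpha a_i\,\partial_{x_i} q
 + \tfrac12\,\partial_\alpha(b_{ik}b_{jk})\,\partial^2_{x_i x_j} q\Big] = 0 .
\end{align*}
```

In this sketch the term containing $\rho_0$ does not involve $\rho$ or $\alpha$ for $t > t_0$, which is consistent with the observation that the adjoint equation and the optimality condition do not depend on the initial density.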

Illustrative Examples
In this section, we consider two examples that illustrate that the FP optimal control formulation may provide the same control strategy as the HJB method. The first example refers to an Itō stochastic process, while the second example considers a piecewise deterministic process.

Controlled Itō Stochastic Process
We consider an optimal transport problem that is related to a model for mean-field games; see, e.g., [23]. It reflects the congestion situation, where the behaviour of the crowd depends on the form of the attractive strongly convex potential $g$. In this model, the dynamics of an agent is governed by the following stochastic differential equation

$$dX_t = \alpha(X_t,t)\,dt + \sqrt{2\varepsilon}\,dW_t,$$

where the velocity $\alpha$ represents the controlling drift function and the dynamics is perturbed by random diffusion of intensity $b = \sqrt{2\varepsilon}$. With this setting, the evolution of the PDF for this process is given by the following FP equation

$$\partial_t \rho - \varepsilon\,\Delta\rho + \nabla\cdot(\alpha\,\rho) = 0,$$

where the PDF $\rho$ formally corresponds to the mass density of the transport problem.
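The purpose of the optimal control is to determine a drift of minimal kinetic energy that moves a mass distribution from an initial location to a final destination. The corresponding objective is as follows [24]

$$J(\rho,\alpha) = \int_{t_0}^{T}\!\!\int_{\mathbb{R}^n} \tfrac{1}{2}\,|\alpha(x,t)|^2\,\rho(x,t)\,dx\,dt + \int_{\mathbb{R}^n} g(x)\,\rho(x,T)\,dx.$$

The corresponding adjoint equation is

$$\partial_t q + \tfrac{1}{2}\,|\alpha|^2 + \alpha\cdot\nabla q + \varepsilon\,\Delta q = 0, \qquad q(\cdot,T) = g, \tag{1.19}$$

and the optimality condition is given by

$$\rho\,(\alpha + \nabla q) = 0.$$

It is immediate to see that, combining the adjoint equation and the optimality condition, we obtain an HJB problem. Indeed, with $\alpha = -\nabla q$ wherever $\rho > 0$, the terms $\tfrac{1}{2}|\alpha|^2 + \alpha\cdot\nabla q$ reduce to $-\tfrac{1}{2}|\nabla q|^2$, so that, identifying $q$ with $u$, we have

$$\partial_t u - \tfrac{1}{2}\,|\nabla u|^2 + \varepsilon\,\Delta u = 0, \qquad u(\cdot,T) = g.$$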
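The relation between the feedback control and the value function in this example can be checked numerically in a special case. The sketch below assumes one space dimension, $t_0 = 0$, and the quadratic terminal cost $g(x) = x^2/2$. None of these choices is taken from the paper. Under these assumptions the ansatz $u = p(t)\,x^2/2 + r(t)$ gives $u(x,t) = x^2/(2(1+T-t)) + \varepsilon \ln(1+T-t)$ and the feedback law $\alpha^*(x,t) = -x/(1+T-t)$. The function names are hypothetical, and the Monte Carlo estimate relies on the Euler-Maruyama scheme.

```python
import numpy as np

def value_function(x, t, T, eps):
    """Closed-form u for the assumed terminal cost g(x) = x**2 / 2 in one dimension."""
    s = 1.0 + T - t
    return x**2 / (2.0 * s) + eps * np.log(s)

def expected_cost_mc(x0, T, eps, n_steps, n_paths, rng):
    """Monte Carlo estimate of E[ int_0^T 0.5*alpha^2 dt + g(X_T) ] under alpha* = -d_x u."""
    dt = T / n_steps
    x = np.full(n_paths, x0, dtype=float)
    running = np.zeros(n_paths)
    for k in range(n_steps):
        t = k * dt
        alpha = -x / (1.0 + T - t)            # feedback control from the value function
        running += 0.5 * alpha**2 * dt        # kinetic-energy running cost (left-point rule)
        noise = rng.standard_normal(n_paths)
        x += alpha * dt + np.sqrt(2.0 * eps * dt) * noise   # dX = alpha dt + sqrt(2 eps) dW
    return np.mean(running + 0.5 * x**2)      # add terminal cost g(X_T)

# Hypothetical parameter values, chosen only for illustration.
rng = np.random.default_rng(0)
estimate = expected_cost_mc(x0=1.0, T=1.0, eps=0.1, n_steps=200, n_paths=20000, rng=rng)
exact = value_function(1.0, 0.0, T=1.0, eps=0.1)
print(estimate, exact)
```

Up to sampling and time-discretization error, the estimated expected cost should be close to $u(x_0, 0)$, in line with the interpretation of the value function as the minimal expected cost.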

Controlled Piecewise Deterministic Process
Our second example refers to a class of piecewise deterministic processes (PDP). A first general formulation of these systems, which switch randomly among a certain number of deterministic discrete states at random times, is given in [25]. Specifically, we deal with a PDP model described by a state function that is continuous in time and is driven by a discrete S-state renewal Markov process; see [26] for additional details. A switching control problem for ordinary differential equations has been investigated in [27]. In our case, the PDP model is a first-order differential equation, where the driving term is affected by the renewal process [28]. The model is specified as follows; see [29].
a) The state function satisfies a first-order differential equation whose dynamics function is selected by the current state of the renewal process.
b) The state function satisfies a given initial condition.
c) The renewal process switches among its discrete states at random times.
When a transition event occurs, the PDP system switches instantaneously from a state $j$, with dynamics function $a_j$, randomly to a new state $i$, driven by the dynamics function $a_i$. Virtual transitions from the state $j$ to itself are allowed for this model, that is $r_{jj} > 0$.

M. Annunziato et al. 2482

The time evolution of the PDFs of the states of our PDP model is governed by a Fokker-Planck system [26], which is complemented by initial conditions for the PDF of each state.