On Optimal Sparse-Control Problems Governed by Jump-Diffusion Processes

A framework for the optimal sparse-control of the probability density function of a jump-diffusion process is presented. This framework is based on the partial integro-differential Fokker-Planck (FP) equation that governs the time evolution of the probability density function of this process. In the stochastic process and, correspondingly, in the FP model the control function enters as a time-dependent coefficient. The objectives of the control are to minimize a discrete-in-time, resp. continuous-intime, tracking functionals and its L2and L1-costs, where the latter is considered to promote control sparsity. An efficient proximal scheme for solving these optimal control problems is considered. Results of numerical experiments are presented to validate the theoretical results and the computational effectiveness of the proposed control framework.


Introduction
Recently, largely motivated by computational finance applications, there has been a growing interest in stochastic jump-diffusion processes.In fact, empirical facts suggest that a discontinuous path could be most appropriate for describing the dynamics of stock prices; see [1] and references therein.Therefore, in many application models, the stock price is modeled by a jump-diffusion stochastic process, rather than by an Itô-diffusion process [2] [3].In this framework, when option pricing models and portfolio optimization problems are considered, partial integro-differential equations (PIDEs) naturally arise; see [1] [4] and references therein.In the present paper, we focus on a stochastic jump-diffusion (JD) process, whose jump component is given by a compound Poisson process subject to given barriers.Also concerning market models, systems driven by Poisson processes have been considered; see, e.g., [5].
When one considers decision making issues involving random quantities, stochastic optimization problems must be solved.Such problems have largely been examined in the scientific literature, because of the numerous applications in, e.g., physics, biology, finance, and economy [6]- [8].In these references, the usual procedure consists of minimizing a deterministic objective function that depends on the state and on the control variables.However, within this approach, statistical expectation objectives must be considered, since the state evolution is subject to randomness.
In this work, we tackle the issue of controlling a stochastic process by following an alternative approach already proposed in [9]- [11], where the problem is reformulated from stochastic to deterministic.The key idea of this strategy is to focus on the probability density function (PDF) of the considered process, whose time evolution is modeled by the Fokker-Planck (FP) equation, also known as the Kolmogorov forward equation.The FP control approach is advantageous since it allows to model the action of the control over the entire space-time range of the underlying process, which is characterized by the shape of its PDF.
In the case of our JD process, the FP equation takes the form of a PIDE endowed with initial and boundary conditions.While the Cauchy data must be the initial distribution of the given random variable, the boundary conditions of a FP problem depend on the considered model.For the derivation of the FP equation and a discussion about boundary conditions, see [12]- [14].Starting from the controlled stochastic differential model, the coefficients of the FP equation and thus the control mechanism are authomatically determined and thus an infinite dimensional optimal control problem governed by the FP PIDE related to a JD process is obtained.Since the control variable enters the state equation as a coefficient of the partial integro-differential operator, the resulting optimization problem is nonconvex.
Infinite-dimensional optimization is a very active research field, motivated by a broad range of applications ranging from, e.g., fluid flow, space technology, heat phenomena, and image reconstruction; see, e.g., [15]- [17].The main focus of this research work has been on problems with smooth cost functionals governed by partial differential equations (PDEs) with linear control mechanism [16] [17].However, bilinear control problems governed by parabolic and elliptic PDEs have been also recently investigated; see, e.g., [9] [18] [19] and references therin.In these references, the purpose is often to compute optimal controls such that an appropriate norm of the difference between a given target and the resulting state is minimized.In the present paper, we consider tracking objectives that include mean expectation values as in [20].Our framework aims at the minimization of the difference between a known sequence of values and the first moment of a JD process, such that our formulation can also be considered as a parameter estimation problem for stochastic processes.In the discrete-in-time case, the form of the cost functional gives rise to a finite number of discontinuities in time in the adjoint variable and hence of the control.A similar situation has already been considered in [21].
Very recently, PDE-based optimal control problems with sparsity promoting L 1 -cost functionals have been investigated starting with [22].See [19] for a short survey and further references.Such formulation gives rise to a sparse optimal control, and for their solution variants of the semismooth Newton (SSN) method [23] have been considered.An alternative to such techniques is represented by proximal iterative schemes, introduced in [24] and [25] and further developed in the framework of finite-dimensional optimization [26] [27].Recents works have adapted the structure of these algorithms for solving infinite-dimensional PDE optimization problems [19].Moreover, it has been shown in [19] that in infinite-dimensional problems, proximal algorithms have a computational performance comparable to SSN methods while they do not require the construction of the second-order derivatives.In the present paper, we consider a L 1 cost functional and apply the proximal algorithm proposed in [19] [28].One of the novelties of our work consists of combining pioneering techniques for nonsmooth problems with the control of the PDF of a FP PIDE of a JD process.
This paper is organized as follows.In the next section, we discuss the functional setting of the FP problem modeling the evolution of the PDF of a JD stochastic process.In Section 3, we formulate our optimal control problems.Section 4 is devoted to the formulation of the corresponding first-order optimality systems.In Section 5, we discuss the discretization of the state and adjoint equations of the optimality system.In Section 6, we illustrate a proximal method for solving our optimal control problems.Section 7 is devoted to presenting results of numerical tests, including a discussion on the robustness of the algorithm to the choice of the parameters of the optimization problem.A section of conclusions completes this work.

The Fokker-Planck Equation of a Jump-Diffusion Process
In this section, we introduce a JD process and the corresponding FP equation that models the time evolution of the PDF of this process.Further, we discuss well-posedness and regularity of solutions to our FP problem.
We consider a time interval  and a stochastic process { } t t I X ∈ with range in a bounded domain n Ω ⊂  .We assume that the set Ω is convex with Lipschitz boundary.The dynamic of X is governed by the following initial value problem ( ) ( ) Define : . Since σ is full rank, a is posi- tive definite, and hence there exists , for each , a.e. in .
In this work, we consider a stochastic process with reflecting barriers.This assumption determines the boundary conditions for the FP equation corresponding to (1), see below.Define : Q I = Ω × and : I Σ = ∂Ω × , and denote with f the PDF of the process given by (1).It is known [12] [13] that the time evolution of f is modeled by the following FP of PIDE type where the differential operator  and the integral operator  are defined as follows and respectively.The definition of g in ( 5) takes into account the presence of reflecting barriers and the dependence on the jump amplitude ĝ , as we discuss later.Notice that the differential operator  can be rewritten as follows , , for each 1, , i n =  .The function F in (6) represents the flux of the differential operator L, and −F is known in the literature as the probability current in case of stochastic processes without jumps [13].
The PDF f of X in (1) in the bounded domain Ω is obtained by solving (3), endowed by suitable initial and boundary conditions.In our setting, the initial data 0 f represents the PDF of the initial random variable 0 X .The choice of a bounded domain with reflecting barriers results in the following zero-flux boundary conditions for the FP model where n is the unit outward normal on ∂Ω .Notice that the flux F corresponds to the differential part of the FP equation, that is, to the drift and diffusion components of the stochastic process.In order to take into account the action of a reflecting barrier on the jumps, we consider a suitable definition of the kernel g, which can be conveniently illustrated in the one-dimensional case as follows. Consider . The kernel g in ( 5) takes the following form ˆ2 , g x y g x y g r x y H r s x y g s x y H x y r s where H is the Heaviside step function defined by ( ) We normalize g and ĝ such that The next remark motivates the choice of the boundary conditions ( 8) and of the condition (10).
Remark 2.1.Assume ( 8) and (10).Provided that 0 f is a PDF in Ω , then the solution to our FP problem satisfies the following conservation equation That is, the total probability over the space domain Ω at each time t I ∈ is preserved, in the sense that Our FP problem is stated as follows Next, we recall some definitions concerning the functional spaces needed to state the existence and uniqueness of solutions to (11).The space ( ) 0 C Ω refers to the func- tions that are continuous in Ω and it is endowed with the supremum norm.Let α be a constant, . The space

( )
C α Σ refers to the functions that are Hölder con- tinuous on Σ , with Hölder exponent α with respect to the space variable.The space

( )
L ∞ Ω denotes all the functions that are bounded on Ω , up to a set of zero measure.
These spaces are endowed with the following norms where , n i j ∈  denote multi-indeces.
We assume that the coefficients a and b in (4) satisfy the following conditions Notice that a and b must be defined on the closure Ω due to their role in the boundary conditions in (11).We assume that the following condition is satisfied ( ) We have the following theorem [29].4) and that g satisfy the assumptions ( 13) and ( 14), respectively.Then, for given ( ) f is also a PDF, it follows by standard arguments [29] that ( ) , 0 Consider the following spaces ( ) We denote with * V the dual space of V and with * , V V ⋅ ⋅ their canonical pairing.We consider the following Gelfand triple  between a Hilbert space with his dual.Each embedding is dense and continuous [17].
Given the interval  and an arbitrary Banach space Z, we define the following spaces ; : such that d which are also Banach spaces [17] equipped with the following norms ; : d and : max , respectively.We consider the following space which is a Hilbert space [17] with respect to the scalar product defined as follows With this preparation, we can recall the following theorem [17].
Theorem 2.2.The embedding ( ) coincides with an element of ( ) The following proposition provides a useful a priori estimate of the solution to (11).Proposition 2.3.Let ( ) f ≥ , and g satisfies (14).Then if f is a solution to (11), the following inequality holds Proof.Consider the H inner product of the equation in (11) with f.Exploiting the properties of the Gelf and triple, we have We make use of the following fact [17], ( ) ( ) . The terms on the right-hand side in (19) are recast as follows.
First, we exploit the zero-flux boundary conditions in (11) and the coercivity of a as given in (2).Moreover, we make use of the following Cauchy inequality Integrating by parts and recalling the defi- nition of F in (6), we have , where a c is defined in (2), and define ( ) ( ) We have B c < ∞ thanks to (13).
Therefore we have Recalling the definition of I in (5) and defining : .
Since Ω is bounded, we have .
The estimates in ( 20) and ( 21) allow us to write (19) as follows By applying the Gronwall inequality, we have Next, we outline how to obtain an upper bound of ( ) ( ) over Ω and then recall the definition of F in (6).We have where we used the PIDE and the boundary condition of the FP problem in (11).Proceeding as above, we obtain with 0 c > .This last estimate, together with (22), proves that , up to a redefinition of the constant 0 c > .The estimates of the other addends in (18) follow after some calculation with arguments as in [9] [17]. Proposition 2.4.Assume ( 13) and ( ) Then the unique solution to (11) belongs to ( ) ( ) ; . Moreover, ( ) = , where 0 f is the initial data in (11).The initial-boundary value problem (11) can be stated as , where the map  is defined as follows with F and I defined in ( 6) and (5), respectively.

Two Fokker-Planck Optimal Control Problems
In this section, we define our optimal control problems governed by (23) and prove the existence of at least an optimal solution.We consider a control mechanism that acts on the drift function by means of a time-dependent control ( ) Therefore we refer to (23) as ( ) We assume that b is a smooth function of its arguments and that assumption ( 13) is fulfilled.We remark that a time-dependent control function is a natural choice considering that it originates from the stochastic differential model where the time is the only independent variable.We assume the presence of control constraints given by , { } : : .
Remark 3.1.The subset ad  is nonempty, closed, and convex.Let ν and γ be positive constants.We consider the following objective ( ) ( ) The term ( ) 26) represents a tracking objective that involves the expecta-  , and a desired trajectory or a discrete set of values (e.g.measurements).We investigate the following two cases.

1) Given a set of values
, we have 2) Given a square-integrable function : The norms in (26) are defined as follows The choice of a bounded time interval I ensures that the L 1 -norm is finite whenever u U ∈ .
Remark 3.3.The functional  is convex and continuous with respect to ( ) , f u in the 2  L norm.
We investigate the following optimal control problem (s) In order to discuss the existence and uniqueness of solutions to (29), we consider the control-to-state operator : →    , that maps a given u ∈  into ( ) : where ( ) . Note that the definition of ad  in (25) ensures that b satisfies (13).Because of Theorem 2.1, the operator  is well defined.The next proposition can be proved by using standard arguments [9] where b is the drift in (1) and F is defined in (6). where is the so-called reduced cost functional.
The solvability of ( 31) is ensured by the next theorem, whose proof adapts techniques given in [30] [31] and [17].Theorem 3.2.There exists at least one optimal pair ( ) , f u that solves (29), such that u solves ( 31   follows from similar arguments [31].These observa- tions lead to the conclusion that the limit f solves (11), with ( ) . Therefore, the constraint ( ) and therefore the pair ( ) , f u is a minimizer for the problem (29).
Remark 3.4.The uniqueness of the control u can not be stated a priori since  is non convex.

Two First-Order Optimality Systems
We follow the standard approach [17] [23] [32] of characterizing the solution of our optimal control problem as the solution to first-order optimality conditions that constitute the optimality system.
Consider the reduced problem (31) and write the reduced functional  as : .
Remark 4.1.The functional 1  is smooth and possibly nonconvex, while 2  is convex and nonsmooth.
The following definitions are needed in order to determine the first-order optimality system.If  is finite at a point u, the Fréchet subdifferential of  at u is defined as follows [32].We have where *  is the dual space of  .Any element is called a subgradient.In our framework, we have  is Fréchet differentiable at u; this follows from standard arguments [17] [30].
Moreover, for each 0 α > , it holds that ( ) [26].The following proposition gives a necessary condition for a local minimum of  .Proposition 4.1.If  and 2  given by (4.1), attains a local minimum in for v sufficiently close to u .Exploiting the convexity of 2  , we have Dividing by θ and considering the limit Dividing by 2 v u − and considering the following limit we conclude that ( ) ( ) , according to the definiton in (33).By using results in [22] [32], we have that (34) implies that each λ λ ∈  that play the role of Lagrange multipliers [17].The previous considerations lead to the following proposition, that states the optimality system for the reduced problem (31).
Proposition 4.2.The optimal solution u of the minimization problem (31) with (32), is characterized by the existence of ( ) a.e. on : 0 We refer to the last three conditions in (37) for the pair ( ) , , where α and β depend on the choice of D in ( 27) and (28).When D is given by ( 27), ( ) . On the other hand, when D is given by ( The operator   is defined as follows The terminal boundary-value problem (40) admits a unique solution ( ) thanks to the assumptions ( 13) and ( 14), following the same arguments as in Theorem 2.1 [29].
The reduced gradient in (39), for given u, f, and p, takes the following form The complementarity conditions in (37) can be recast in a more compact form, as follows.We define : , : max 0, min 0, max 0, min 0, .
The previous considerations can be summarized in the following propositions.Proposition 4.3.(Optimality system for a discrete-in-time tracking functional) A local solution ( ) 29) with D given by ( 27) is characterized by the existence of ( ) ( )

Numerical Approximation of the Optimality Systems
In section, we discuss the discretization of the optimality systems given in ( 42) and (43).For simplicity, we focus on a one-dimensional case with ( )  Notice that a cell-centered space discretization is considered with cells midpoints at j x , 1, , j N =  , and cell faces at The approximation of the forward and backward FP PIDEs is based on a discretization method discussed in [33], where a convergent and conservative numerical scheme for solving the FP problem of a JD process is presented.This discretization scheme is obtained based on the so-called method of lines (MOL) [34].The differential operator in (11) is discretized by applying the Chang-Cooper (CC) scheme [35] [36].Setting ( ) ( ) 1 .
The zero-flux boundary conditions are implemented referring to the points 1 2 x and . The integral addend is approximated by the midpoint rule.After spacial discretization, the forward FP PIDE problem takes the following form ( ) ( ) ( ), where ( ) N f t ∈  .The matrices  and  correspond to the CC scheme and to the quadrature rule, respectively.The time integration of (44) is carried out with the combination of the Strang-Marchuk (SM) splitting scheme [37] [38] together with a predictor-corrector scheme [39].We refer to [40] for a detailed introduction to splitting methods.With this choice, the numerical scheme solving (11) is second-order convergent both in space and time with respect to the 2 , h t L δ norm.Notice that the chosen numerical method for the FP problem must ensure that the PDF solution is nonnegative and that the total probability remains constant along the time evolution.See [33] for all details and numerical analysis results.
If we follow the optimize-before-discretize (OBD) approach, the optimality system has already been computed on a continuous level as in ( 42) and (43) and subsequently discretized.As a consequence, the OBD approach allows one to discretize the forward abd adjoint FP problems according to different numerical schemes.However, the OBD procedure might introduce an inconsistency between the discretized objective and the reduced gradient; see [15] and references therein.For this reason, the DBO (discretize-before-optimize) approach could be preferred and we pursue it in this work.
The DBO approach results in the following approximations The time integration of (45) is carried out with the combination of the SM splitting with a predictor corrector scheme, as in (44).

A Proximal Optimization Scheme
In this section, we discuss a proximal optimization scheme for solving (31).This scheme and the related theoretical discussion follow the work in [19] [27].Proximal methods conveniently exploit the additive structure of the reduced objective, and in our framework, we have that the reduced functional  is given by the sum of a noncon- vex smooth function 1  and a convex nonsmooth function 2  as in (32).
For our discussion, we need the following definitions and properties.Definition 6.1.Let Z be a Hilbert space and l a convex lower semi continuous function, : l Z →  .The proximity operator with ( ) . This method has been proposed in [28].Our inertial proximal method is summarized in the following algorithm.
Algorithm 1 (Inertial proximal method).Input: initial guess 0 u , 0 i = , max i , ( ) where ( ) ( )  Remark 6.1.The backtracking scheme in Algorithm 1 provides an estimation of the upper bound of the Lipschitz constant in (48), since it is not known a priori.The initial guess for L is chosen as follows.Given a small variation ε of u, we have    Table 2 the values of the tracking error for different values of the weight ν .As ex- pected, the tracking improves as the value of this optimization parameter becomes smaller.In Figure 1 and Figure 3, we can see that the optimal control u drives the expected mean value of the PDF towards the mean values given by k ξ and ( ) t ξ , re- spectively.
Next, we investigate the behavior of the optimal solution considering the full optimization setting, that is, the case when the L 1 -cost actively enters in the optimization process, i.e.In Figures 5-7, we depict the optimal controls for three different choices of values of γ and considering the discrete-in-time tracking functional given by (27).In Figures 8-10, we show the optimal controls for three different choices of values of γ and con- sidering the continuous-in-time tracking functional given by (28).In both cases, we can clearly see that increasing the value of the parameter γ significantly increases the sparsity of the solution, as expected.
Finally, in the Table 1 and Table 2, we also report values of the tracking error when both the L 2 -and L 1 -costs are considered.For a direct comparison with the first series of experiments, we consider an unconstrained control.We find that already with a small value of γ , the tracking ability of the optimization scheme worsen for both choices of the tracking functional.

Conclusion
A framework for the optimal control of probability density functions of jump-diffusion processes was discussed.In this framework, two different, discrete-in-time and continuous-in-time, tracking functionals were considered together with a sparsity promoting L 1 -cost of the control.The resulting nonsmooth minimization problems governed by a Fokker-Planck partial integro-differential equation were investigated.The existence of at least an optimal control solution was proven.To characterize and compute the optimal controls, the corresponding first-order optimality systems were derived and their numerical approximation was discussed.These optimality systems in combination with a proximal scheme allowed to formulate an efficient solution procedure, which was also theoretically discussed.Results of numerical experiments were presented to validate the computational effectiveness of the proposed method.
is a random variable with known distribution.The functions : represent the drift and the diffusion coefficients, respectively.We assume that σ is full rank.Random increments to the process are given by a Wiener process d W ∈  and a compound Poisson process n P ∈  .The rate of jumps and the jump distribution are denoted with λ + ∈  and ĝ , respectively.

Theorem 2 . 1 . Assume 2 C
∂Ω ∈ .Let the coefficients a and b in L in ( holds for each , α β ∈  and 0 ε > . The constrained optimization problem (29) can be transformed into an unconstrained one as follows functional  in(31) is bounded from below and therefore we can dead  is a convex, closed, and bounded subset of the reflexive Banach space  .Hence, ad  is weakly sequentially compact and we can extract a subsef .The fact f ∈  follows from standard arguments.Note that each couple ( ) and therefore weakly convergent to t f ∂ .Define as in(7) and the smoothness of b with respect to u together with(18), en- , with u a local minimum, satisfies the following inequality of(35), which takes into account the definition (25) of the admissible controls, ensures the existence of two nonnegative functions * , a b . For each k + ∈  , we define the following quantity and (d) Compute E according to (42) or (43).

Figure 1 .
Figure 1.State variable in the case of the discrete-in-time tracking functional defined in (27), with 0 γ = .

Figure 2 .
Figure 2. Adjoint variable in case of the discrete-in-time tracking functional defined in (27), with 0 γ = .

Figure 3 .
Figure 3. State variable in the case of the continuous-in-time tracking functional defined in (28), with 0 γ = .

Figure 4 .
Figure 4. Adjoint variable in case of the continuous-in-time tracking functional defined in (28), with 0 γ = .

Table 2 .
Tracking error of the continuous-in-time functional ( ) D f given by(28).
The statement follows from the a priori estimates of Proposition 2.3 and Theorem 2.2.

Table 1 .
Tracking error of the discrete-in-time functional