Proximal Methods for Elliptic Optimal Control Problems with Sparsity Cost Functional

First-order proximal methods that solve linear and bilinear elliptic optimal control problems with a sparsity cost functional are discussed. In particular, fast convergence of these methods is proved. For benchmarking purposes, inexact proximal schemes are compared to an inexact semismooth Newton method. Results of numerical experiments are presented to demonstrate the computational effectiveness of proximal schemes applied to infinite-dimensional elliptic optimal control problems and to validate the theoretical estimates.


Introduction
In recent years, a great research effort has been made to solve optimization problems governed by Partial Differential Equations (PDEs); see, e.g., [1]- [3] and references therein.In many cases, this research has focused on objective functionals with differentiable 2  L terms and non-smoothness resulted from the presence of control and state constraints.However, more recently, the investigation of 1  L cost functionals has become a central topic in PDE-based optimization [4]- [6], because they give rise to sparse controls that are advantageous in many applications like optimal actuator placement [4] or impulse control [7].
A representative formulation of optimal control problems with 1 L control costs is the following where ( ) , 0 c y u = represents a PDE for the state y including the control u.This problem has been discussed in [4] [5] for the case where ( ) , c y u represents a linear elliptic operator.Nonlinear PDE constraints have been considered in [6].However, in these references a linear control mechanism is discussed.Concerning the optimization methodology for (1.1), the semi-smooth Newton method has been the solver of choice in [4]- [6].
On the other hand, in the field of signal acquisition and reconstruction, l 1 -based optimization and sparsity have been exploited to successfully recover "functions" from few samples; see, e.g., [8]- [10].
In this framework, it was shown [11] that l 1 -based inverse problems in signal recovery can be very efficiently solved by proximal methods.Nowadays, these iterative schemes are the method of choice in magnetic resonance imaging and a special proximal method called "Fast Iterative Soft Thresholding Algorithm" (FISTA) [12] is considered the state-of-the-art method for solving finite-dimensional optimization problems of the following form where the rectangular matrix A represents a blur operator.
We remark that the research and successful application of proximal schemes are attracting attention of many scientists and practitioners, which result in many new developments in this field.We refer to, e.g., [13] for recent results and additional references.
The purpose of our work is to contribute to the field of PDE-based optimization with 1 L control costs by investigating proximal methods in this infinite-dimensional setting.In particular, we aim at implementing and analysing proximal schemes for solving (1.1) that exploit first-order optimality conditions.Our investigation is motivated by the fact that proximal methods may have a computational performance that is comparable to that of semismooth Newton methods.However, in contrast to the latter, proximal schemes do not require the construction of second-order derivatives and the implementation of, e.g., a Krylov solver.
For our investigation, we consider (1.1) with elliptic operators and linear and bilinear control mechanisms.Notice that the latter case has been a much less investigated problem.One of our main contributions is to prove convergence for all variants of the proximal schemes that we discuss in this paper.In particular, we prove an ( ) convergence rate of the value of reduced cost functional, where k is the number of proximal iterations.This notion of convergence is used in l 1 -based optimization and in some application fields [14].
We remark that many arguments in our analysis are similar to those presented in the finite-dimensional case.However, some additional arguments are necessary in infinite dimensions, especially regarding the structure of our differential constraints and the discussion of our inexact proximal schemes.We refer to [13] for further results concerning the formulation of proximal schemes for infinite-dimensional optimization problems from a different perspective.
In the next section, we discuss linear and bilinear elliptic optimal control problems, where for completeness, some conditions for the existence of a unique control-to-state operators are considered.Section 3 is devoted to optimal control problems with sparsity costs and governed by elliptic equations with linear and bilinear control mechanisms.We discuss conditions for convexity of the bilinear problem and state the optimality conditions.In Section 4, we present a Fast Inexact Proximal method (FIP) that represents an infinite-dimensional extension of the FISTA method.In Section 5, the convergence rate of this method is proven to be ( ) In Section 6, an inexact semismooth Newton method in function spaces is presented as the state of the art method for comparison purposes.For completeness, the theory of this method is extended to the case of elliptic bilinear control problems.A numerical comparison of the FIP and Semismooth-Newton methods is presented in Section 7. A section of conclusion completes this work.

Elliptic Models with Linear and Bilinear Control Mechanisms
In this section, we discuss elliptic PDE models with linear and bilinear control structures.Notice that these models are already discussed in many references; see, e.g., [2] [15]- [17].However, in this section, we report the main results required for our analysis of convergence of the proposed proximal methods.

:
L Ω ⋅ = ⋅ induced by the inner product Then, there exists a unique weak solution ( ) ) and the following property holds ( ) Proof.The proof is immediate using the Lemma of Lax-Milgram and the following result , we have that where in the linear case ( ) ( ) represents the unique solution to (2.1) and in the bilinear case ( ) ( ) is the unique solution to (2.3).In the following, we use the expression ( ) S u when it is valid for both the linear and the bilinear systems.
In the following, we omit the index U and write Ω and its derivatives have the following properties for all directions ( ) iii) The following inequalities hold ( )( ) ( )( ) Proof.Part (i) and (ii) can be shown by direct calculation (see ([16], Lemma 2.9).So part (iii) is left to be proved.If ( ) f L ∈ Ω , by using (2.9), we obtain ( ) ( ) ( ) where the constants depend on the measure of Ω and not on y.Therefore we obtain (2.13) and ( ) Therefore, we obtain (2.14), which completes the proof.□

Elliptic Optimal Control Problems with Sparsity Cost Functional
In this section, we discuss optimal control problems governed by the linear-and bilinear-control elliptic systems discussed in the previous section.We consider the following cost functional ( ) where ( ) and , 0 α β > .This functional is made of a Fréchet-differentiable classical tracking type cost with 2  L -regularization and a nondifferentiable 1 L -control cost.Using the control- to-state operator (2.10), we have the following reduced optimal control problem ( ) ( ) The nondifferentiable part ( ) Therefore, in order to state convexity of the reduced functional ( ) Ĵ u , we investigate the second-derivative of the differentiable part ( ) ( ) We have In particular, in the linear case, we have We conclude that the reduced functional is strictly convex in the linear case.
In the bilinear case, we have a non-convex optimization problem.However, local convexity can be guaranteed under some conditions.To be specific, we chose the sufficient condition stated in the following theorem.

Proof. Since
( ) is convex, we have to prove that ( ) ( ) Therefore we show that the reduced Hessian is positive definite in ad U as follows □ We remark that the result of Lemma 3.1 is well known.It expresses local convexity of the reduced objective when the state function is sufficiently close to the target and the weight of the quadratic 2  L cost of the control is sufficiently large.Indeed, local convexity may result with much weaker assumptions.However, since our focus is the investigation of proximal schemes, we make the following strong assumption.
Assumption 1.We assume that (3.4) holds for all ad u U ∈ .Because of Lemma 2.3, this assumption holds if the regularization parameter ( ) In the next step, the first-order optimality conditions for (3.2) are derived.First, we need the definition of the subdifferential.
Definition 3.1.Let H be a Hilbert space and : F H →  be convex.We call the set-valued mapping From ( [20], Remark 3.2), we obtain that u is a solution of (3.2), if and only if there exists a ( ) where * denotes the adjoint operator.From (3.5), one can derive the optimality system by using the Lagrange multipliers We have the following theorem (see [4], Theorem 2.1).
Theorem 3.2.The optimal solution u of (3.2) is characterized by the existence of , , If one introduces the parameter : 2) in the linear control case is characterized by the existence of the dual pair ( ) ( ) ( ) Furthermore, the reduced gradient and the reduced Hessian of ( ) ˆ, and .

J u u p J u
Notice that with an abuse of notation, we denote the reduced Hessian with ( )

Ĵ u ′′
, which is also used to denote the second derivative operator.
For the bilinear-control system, we have By setting ( ) this can be written as follows, 0 yp u α µ + + = .We summarize the previous considerations into the following theorem.
Theorem 3.4.(Bilinear optimality system) The optimal solution ( ) ( ) ( ) 2) in the bilinear control case is characterized by the existence of the dual pair ( ) ( ) ( ) Furthermore, the explicit reduced gradient and the reduced Hessian of ( )

Proximal Methods for Elliptic Control Problems
In this section, we discuss first-order proximal methods to solve our linear and bilinear optimal control problems.
The starting point to discuss proximal methods consists of identifying a smooth and a nonsmooth part in the reduced objective ( ) Ĵ u .That is, we consider the following optimization problem where, we assume ( ) ˆis continuous, convex and nondifferentiable J u ( ( ) ˆis Q-differentiable with respect to , convex, and has Lipschitz-continuous gradient: where ( ) 2 ˆ0 L J > .Notice that our optimal control problem (3.2) has this additive structure where (4.2) holds for ( ) appropriate conditions discussed in the previous section, and it has Lipschitz-continuous gradient that we prove in the following lemma.
(linear-control case) and for ( ) ( ) For the linear-control case, we have such that we have the Lipschitz-constant ( ) ( ) For the bilinear-control case, we use the mean value theorem.There exists a for the last inequality, we use (2 represents the smallest value of L such that (4.5) is satisfied.We remark that the discussion that follows is valid for as in Lemma 4. 5.However, as we discuss below, the efficiency of our proximal schemes depends on how close is the chosen L to the minimal and optimal value ( ) 2 L J .Now, since this value is usually not available analytically, we discuss and implement below some numerical strategies for determining a sufficiently accurate approximation of ( ) 2 L J .In particular, we consider a power iteration [21], and the backtracking approach discussed in Remark 5. 1.Further, notice that also in the case of choosing ( ) , our proximal scheme still converges with rate 1 k (resp. 1 k ) times a convergence constant.However, this convergence constant grows considerably as L becomes larger and therefore the convergence of the proximal method appears recognizably slower.On the other hand, if L is chosen smaller than the Lipschitz constant, then convergence cannot be guaranteed.
The strategy of the proximal scheme is to minimize an upper bound of the objective functional at each iteration, instead of minimizing the functional directly.Lemma 4.2 gives us the following upper bound for all ad v U ∈ . We have where, we have equality if u v = .Furthermore, we have the following equation Now, consider (4.6) and recall that ( ) We have the following lemma.Lemma 4. 3 where the projected soft thresholding function is defined as follows There exists a ( ) ( ), 0, .
Now, we show that ( ) The following investigation of the different cases is meant to be pointwise.We have and therefore ( ) It follows that ˆ0 u = and ( ) ( ) and therefore ( ) and therefore ( ) □ Based on this lemma, we conclude that the solution to (4.6) is given by  thus obtaining an approximation to the optimal u sought.Therefore we can use this result to define an iterative scheme as follows ( ) starting from a given 0 u .The resulting algorithm implements a proximal scheme as follows Algorithm 1 (Proximal (P) method) ( ) ( ) This scheme is discussed in [9] [12] for the case of finite-dimensional optimization problems.Notice that the iterated thresholding scheme discussed in [9] coincides with Algorithm 1 for the special case 1 L = .The convergence results for Algorithm 1 presented in [12] can be extended to our elliptic control problems, using the theoretical results presented above.Therefore we can state the following theorem.
In [22], an acceleration strategy for proximal methods applied to convex optimization problems fulfilling (4.4) is formulated, which improves the rate of convergence of these schemes from ( ) and ( ) ( ) Correspondingly, the optimization variable k u is updated by the following ( ) This procedures is summarized in the following algorithm Algorithm 2 (Fast proximal (FP) method) ( ) ( ) ( ) The following convergence result represents an extension of ( [12], Theorem 4.4).We have Theorem 4.5.Let { } k u be a sequence generated by Algorithm 2 and * u be the solution of (3.2) with linear-or bilinear-control elliptic equality constraints.Then, for every 1 k ≥ the following holds Algorithm 1 and Algorithm 2 require the calculation of However, the exact inversion of a discretized elliptic differential operator A may become too expensive.Therefore one has to use iterative methods; e.g., the conjugate gradient method [23].For this reason, we discuss an inexact version of the proximal scheme, where the equality constraints and the corresponding adjoint equations are solved up to a given tolerance quantified by 0 ε > .In the following, we denote with ( ) the inexact gradient that corresponds to an inexact inversion of the equation Ay f u = − , resp.( ) , that results in an approximated state variable y ε , resp.p ε , in the following sense ( ) Hence, there exists an ( ) .
We denote the inexact inversion method for the problem y g =  , with an error y g . With this notation, the inexact gradient computation is illustrated in Algorithm 3.

, , p inv A z y p inv A u z y
( ) ( ) With this preparation, we formulate our inexact proximal (IP) scheme with Algorithm 4.
Algorithm 4 (Inexact proximal (IP) method) ( ) ( ) We also formulate the accelerated (fast) version of our IP scheme in Algorithm 5. We refer to it as the FIP method.

Convergence Analysis of Inexact Proximal Methods
In this section, we investigate the convergence of our IP and FIP schemes.Notice that our analysis differs from that presented in [12] where finite-dimensional problems and exact inversion are considered.We start investigating the error of the inexact gradient for some c > 0.
Proof.In the linear case, we have . Using (4.10) there exist the errors ( ) In the bilinear case, we have We also have ( ) ( )

ˆˆ, J u J u A A f u e z e A A f u z A A e A e c
Using (4.10) there exist the errors ( )

J u J u A u A u f e z e A u f e A u A u f z A u f A u A u e e A u f e A u A u f z A u e A u A u e e y A u e
) ( ) For the three last inequalities, we use (5.3), (2.7), and 1 ε < .□ We refer to the estimation error of the inexact gradient in step k as follows ˆ: , where .
such that one step of Algorithm 4, resp.Algorithm 5, can be written as follows ( ) ( ) In order to prove the convergence of the IP method, we need the following two lemmas.Lemma 5.2.For any Proof.This is immediate from the variational inequality of (5.5).For a proof see, e.g., [20].□ Lemma 5. 3 ˆˆ, and , .
so using (5.6), (5.8), and the definition of L Q in (5.7) gives the following  be the sequence generated by Algorithm 4 and * u be the solution of (3.2) with linear or bilinear elliptic equality constraints; let c be determined by (5.2) resp.(5.4).Then for any ( ) .
Summing this inequality over 0, , 1 Using again Lemma 5.3 with Multiplying this inequality by n and summing again over 0, , 1 which simplifies to the following ( ) ( ) Adding (5.10) and (5.11) together, we get □ Next, we present a convergence result for the FIP method.For this purpose, we need the following lemma.e be the error of the inexact gradient, and let * u be the solution to (3.2), then for any 1 k ≥ , we have Proof.We apply Lemma 4.2 at the points ( ) and likewise at the points ( ) . We obtain the following ( ) where we used the fact that ( ) . Now, we multiply the first inequality above by ( ) and add it to the second inequality to obtain the following , .
Multiplying this inequality by k t and using . , : , : Therefore, with k v (see (4.9)) and k r defined as

□
We also have the following lemmas.
Proof.The proof is immediate by mathematical induction.□ Now, we can prove a convergence rate of ( ) for Algorithm 5 (FIP scheme).
Theorem 5.8.Let ( ) k u be the sequence generated by Algorithm 5, let * u be the solution to (3.2) with linear or bilinear elliptic equality constraints; let c be determined by (5.2) resp.(5.4).Then for any 0 k ≥ , the following holds Proof.Let us define the quantities .
As in Lemma 5.5, we define ( ) ( ) . Then, by Lemma 5.5, the following holds for every and hence assuming that 1 1 1 a b c d + + ≤ holds true, invoking Lemma 5.7, we obtain ( ) which combined with (5.13) and ( ) What remains to be proved is the validity of the relation 1 1 1 a b c d The IP and FIP methods converge also replacing L with an upper bound of it.In particular, we can prove ( ) 2 1 k  convergence of the FIP method using a backtracking stepsize rule for the Lipschitz constant (Step 1 in Algorithm 5) as in [12].
We complete this section formulating a fast inexact proximal scheme where the Lipschitz constant L is obtained by forward tracking, (nevertheless we call it backtracking as in [12]), thus avoiding any need to compute the reduced Hessian.Our fast inexact proximal backtracking (FIPB) method is presented in Algorithm 6.

The Inexact Semismooth Newton Method
We consider the semismooth Newton method as a benchmark scheme for solving elliptic non-smooth optimal control problems; see, e.g., [4]- [6].This method is proven to be equivalent to the primal-dual active set method in [24].The inexact semismooth Newton (ISSN) method is presented in [25] for finite-dimensional problems.In this section, we discuss the ISSN method for infinite-dimensional optimization problems and use it for comparison with our inexact proximal schemes.To support our use of the ISSN scheme to solve bilinear control problems, we extend two theoretical results in [3] [4].For the analysis that follows, we need the following definition.
Definition 6.This definition is similar to the semismoothness stated in [3] and also known under the name "slant differentiability"; see, e.g., [24].Now, we discuss the solution of the following nonlinear equation is bounded, then the semismooth Newton (SSN) iteration ( ) ( ) An inexact version of the SSN scheme discussed in this theorem is formulated in ([3], Algorithm 3.19), where the update k d to k x is obtained as follows.Choose a boundedly invertible operator ( ) and compute ( ) For this scheme, superlinear convergence is proven in ([3], Theorem 3.20), provided that there exists a ( ) However, this procedure is difficult to realize in practice.For this reason, in our ISSN scheme, the "exact" update step [24], is replaced by Our ISSN scheme is given in Algorithm 7.
Algorithm 7 (Inexact semismooth Newton (ISSN) method) On the basis of the proof of Theorem 3.20 in [3], we prove the following theorem that states convergence of Algorithm 7. We have Theorem 6.
. We estimate the Y-norm of k r as ( ) This result, the generalized differentiability of  at * x , and (6.4) give the following ( ) ( ) Hence, for sufficiently small 0 δ > , we have ( ) , and thus ( ) ( ) We conclude from (6.6) the following ( ) ( ) , which completes the proof.□ Our purpose is to solve the nonlinear and nonsmooth equation system (3.13)-(3.14)by the semismooth Newton iteration.We introduce the operator where  is the Sobolev embedding (see [4] and [19], Theorem 5.4]) of ( ) This embedding is necessary to show that the function  defined in (6.7) is generalized differentiable.Now, by using The function  is generalized differentiable (see [4], Theorem 4.2 for the linear case, analogue for the bilinear case) and a generalized derivative is given by Using Theorem 6.2, the following theorem guarantees the superlinear convergence of the semismooth Newton method applied to our problems.To prove this we extend the proof of Theorem 4. ( ) : The corresponding adjoint operator is the extension-by-zero operator ( ) ( ) . We assume that ( )( ) From (6.8) we obtain that , := , , , , , , , and (3.4) we have coercivity of a for ad u U ∈ and therefore the Lax-Milgram-Lemma can be applied to show that (6.9) admits a unique solution ( ) , with a constant C independent of u.

Numerical Experiments
In this section, we present results of numerical experiments to validate the computational performance of our inexact proximal methods and to demonstrate the convergence rate of ( )   In order to validate the convergence rate of ( ) We conclude this section considering challenging linear-and a bilinear-control cases.However, the exact solutions are not known.In these cases, the target function is not attainable.We have 1 sin 2π sin 2π z x y H = + ∉ Ω and 1 f ≡ .We discretize Ω with gridsize 1 256 h = . A is discretized by second-order finite differences.In Figure 2, we present the optimal controls obtained for the Examples 3 and 4, respectively.Notice that the controls obtained with the FIP, FIPB, and ISSN schemes are indistinguishable.We observe that in the case of a small α there is an abrupt change between 0 u = and b u u = , whereas for bigger α the change is con- tinuous.We also see that by increasing β the support of u decreases, as expected.The different computational times of the FIP, FIPB, and ISSN schemes are also given in the figure.We see that the FIPB scheme may outperform the ISSN scheme and vice versa.We also have a case where the ISSN scheme has difficulty to converge; see Figure 2, test case (d).Notice that very similar results are also obtained using a globalized version [7] of the ISSN scheme.These results and further results of numerical experiments demonstrate that fast inexact proximal scheme represent an effective alternative to semi-smooth Newton methods.

Conclusion
Inexact proximal schemes for solving linear-and bilinear elliptic optimal control problems were discussed.A complete analysis of these methods was presented and a convergence rate of ( ) 2 1 k  was proven.For benchmarking purposes, the proposed inexact proximal schemes were compared to an inexact semismooth Newton method.Results of numerical experiments demonstrated the computational effectiveness of inexact proximal schemes and successfully validated the theoretical estimates.

Theorem 3 . 3 .
arbitrary.With this setting, the optimality system (3.6)-(3.11)reduces to the following In the linear-control case, the Equation (3.13) becomes the following We summarize the previous considerations into the following theorem.(Linear optimality conditions) The optimal solution ( ) ( ) ( )

Theorem 4 . 4 .
Let { } k u be a sequence generated by Algorithm 1 and * u be the solution to (3.2) with linear-or bilinear-control elliptic equality constraints.Then, for every 1 k ≥ the following holds

Lemma 7 . 1 .
5.8.In the following procedures, for validation purposes, we formulate control problems for which we know the exact solution.We have Procedure 1. (Linear case) Procedure 1 provides a solution ( ) ˆ, y u of the optimal control problem (3.2) with linear-control elliptic equality constraints.Proof.We show that the optimality conditions (3.16)-(3.19) in Theorem 3.3 are fulfilled.(3.16)-(3.18)are obviously fulfilled because of 3) in Procedure 1. Now, we consider different cases to show (3.19):

Figure 1 .
upper bound of Theorem5.20  and the actual error of the functional in correspondence to Example 1 and Example 2 We see that the observed convergence may be faster than the theoretical prediction.

Figure 2 .
Figure 2. Optimal controls u for the Case 3 (top) and Case 4 (bottom).
[4]of.The linear-control case is investigated in[4], so we focus on the bilinear case.Define : J

Table 1 .
Example 1: Comparison of the FIP, FIPB and ISSN methods.

Table 2 .
Example 2: Comparison of the FIP, FIPB, and ISSN methods.