First Order Convergence Analysis for Sparse Grid Method in Stochastic Two-Stage Linear Optimization Problem

Stochastic two-stage linear optimization is an important and widely used optimization model. Efficient numerical integration of the second-stage value function is critical. However, the second-stage value function is piecewise linear and convex, which poses challenges for applying the modern, efficient sparse grid method. In this paper, we prove the first-order convergence rate of the sparse grid method for this important stochastic optimization model, utilizing convexity analysis and measure theory. The result is twofold: it establishes a theoretical foundation for applying the sparse grid method in stochastic programming, and it extends the convergence theory of sparse grid integration to piecewise linear convex integrands.


Introduction
Stochastic two-stage linear optimization, also called stochastic two-stage linear programming, models a sequential decision structure, where the first-stage decisions are made now, before the random variable manifests itself, and the second-stage decisions are made adaptively to the realized random variable and the first-stage decisions. This adaptive decision model has been applied in many important application areas. For example, in the introductory farmer's problem [1], a farmer needs to divide the land among different vegetables in spring. The farmer's objective is to maximize profit in the harvest season. The profit is related to the market price at that time and the weather-dependent yield. Neither the price nor the weather is known at the present time, hence the farmer's decision in spring has to take into account multiple scenarios. It is not a simple forecasting problem, though, since the farmer's second-stage decision in fall, which adapts to different scenarios, also jointly determines the profit. The second-stage decision problem is also called the "recourse" problem. [2] collects more recent applications in engineering, manufacturing, finance, transportation, telecommunication, and other areas.
A stochastic two-stage linear problem with recourse has the following general representation:

    min_{x in X} c^T x + E[ v(x, ξ) ],    (1)

where v(x, ξ) = min_y { q^T y : Q y = h(ξ) − T x, y ≥ 0 } is the second-stage (recourse) value function, ξ is a random vector properly defined on a probability space, X is a polytope feasible region for the first stage, Q ∈ R^{m×n} is the recourse matrix, and q, h(ξ) and T are a vector and matrices of proper sizes. The high-dimensional integration in (1) is difficult and is usually approximated by using a set of scenarios ξ^k and weights w_k, k = 1, ..., K:

    min_{x in X} c^T x + Σ_{k=1}^K w_k v(x, ξ^k).    (2)

Under this scenario approximation, the optimal objective value z*_K of (2) provides an approximation of the optimal objective value z* of (1); an optimal solution x*_K of (2) provides an approximation of an optimal solution x* of (1). The Monte Carlo (MC) method has been widely used in this approximation, where ξ^k, k = 1, ..., K, are random sampling points and w_k = 1/K. The convergence theory of the Monte Carlo method has been extensively studied [3-6]. The core result is the epi-convergence theorem: under mild assumptions, z*_K converges to z* w.p.1 as K → ∞, and any cluster point of {x*_K}_{K=1}^∞, the sequence of optimal solutions to (2), is in the optimal solution set of the original problem. The quasi-Monte Carlo (QMC) method has also been studied recently [7], and a similar convergence result has been achieved.
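As a concrete (toy) illustration of the scenario approximation (2), the sketch below estimates the objective of a two-dimensional instance whose recourse has the closed form v(x, ξ) = max(ξ1 + ξ2 − x, 0), using Monte Carlo sampling with equal weights. The instance, the cost c, and all function names are illustrative assumptions, not the paper's model.

```python
import random

def recourse(x, xi1, xi2):
    """Toy recourse value: min{ y : y >= xi1 + xi2 - x, y >= 0 } = max(xi1 + xi2 - x, 0)."""
    return max(xi1 + xi2 - x, 0.0)

def mc_objective(x, c, K, seed=0):
    """Monte Carlo scenario approximation of c*x + E[v(x, xi)] with xi ~ U[0,1]^2."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(K):
        total += recourse(x, rng.random(), rng.random())
    return c * x + total / K  # equal weights w_k = 1/K

if __name__ == "__main__":
    x, c = 0.5, 0.6
    for K in (100, 10_000):
        # estimate stabilizes as K grows, at the slow MC rate O(K^{-1/2})
        print(K, round(mc_objective(x, c, K), 4))
```

The slow O(K^{-1/2}) decay of the sampling error is the motivation for replacing random scenarios by sparse grid nodes and weights.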
The sparse grid (SG) method is an established high-dimensional quadrature rule, originally proposed by Smolyak [8] and studied by many authors in the context of numerical integration [9] (and references therein). Its application to stochastic two-stage linear optimization was shown only in a recent numerical study [10]. Although [10] shows the superior numerical performance of the sparse grid method compared with both MC and QMC, its convergence analysis is based on the assumption that the recourse function lies in a Sobolev space, which holds only for a very narrow subset of two-stage linear problems, namely separable problems. The contributions of this paper are 1) establishing the epi-convergence of the sparse grid method for this important decision model, and 2) proving the first-order convergence rate of the method.
We first introduce the sparse grid approximation error for integrand functions in Sobolev spaces. Let D^j denote the partial derivative operator for a multi-index j = (j_1, ..., j_d), and define the Sobolev space

    F_d^r = { f : [0,1]^d → R : ||f||_{F_d^r} := max_{|j|_∞ ≤ r} sup_ξ |D^j f(ξ)| < ∞ }.

Sobolev spaces can also be defined using L^p norms; see Evans [11]. The derivatives in the definition of the Sobolev space are weak derivatives. Formally, g = D^j f if g satisfies

    ∫ f D^j φ dξ = (−1)^{|j|} ∫ g φ dξ

for every smooth test function φ with compact support. For example, the function f(ξ) = |ξ| has the first-order weak derivative g(ξ) = sign(ξ), although f is nondifferentiable at 0 in the usual strong sense. It has been shown that the weak derivative is essentially unique, and coincides with the classical strong derivative when the latter exists. Various properties of strong derivatives carry over to weak derivatives as well. For f ∈ F_d^r, the sparse grid method achieves the following convergence rate [12]:

    | ∫_{[0,1]^d} f(ξ) dξ − Σ_{k=1}^K w_k f(ξ^k) | ≤ c_{r,d} K^{−r} (log K)^{(d−1)(r+1)} ||f||_{F_d^r},    (3)

where K is the number of function evaluations and c_{r,d} is a constant independent of f, increasing with both d and r; see Brass [13]. Note that F_d^{r+1} ⊂ F_d^r, so a smoother integrand admits a faster rate; but since the norm ||f||_{F_d^r} can grow for large r, it is nontrivial to tell which space will yield the tightest bound. This is called the fat F_d^r problem in Wozniakowski [14]. In this paper, as we shall see, only r = 1 is relevant for our discussion.
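To give a feel for why the sparse grid rule needs far fewer nodes than a full tensor grid, the following sketch counts Smolyak-type nodes built from nested one-dimensional trapezoid grids. The grid definition and level convention are one common choice for illustration, not necessarily the construction used in [8,12].

```python
from fractions import Fraction
from itertools import product

def nodes_1d(i):
    """Nested 1D grids on [0,1]: level 1 = {1/2}; level i >= 2 = {j / 2^(i-1)}."""
    if i == 1:
        return {Fraction(1, 2)}
    return {Fraction(j, 2 ** (i - 1)) for j in range(2 ** (i - 1) + 1)}

def sparse_grid(level, d):
    """Union of tensor grids over multi-indices with i_1 + ... + i_d <= level + d - 1."""
    pts = set()
    for idx in product(range(1, level + 1), repeat=d):
        if sum(idx) <= level + d - 1:
            pts |= set(product(*(nodes_1d(i) for i in idx)))
    return pts

if __name__ == "__main__":
    sg = sparse_grid(3, 2)                       # d = 2, level 3
    full = set(product(nodes_1d(3), repeat=2))   # full 5 x 5 tensor grid
    print(len(sg), len(full))                    # sparse grid uses far fewer nodes
```

The gap widens rapidly with the dimension d, which is the point of the Smolyak construction.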

The convergence result in (3) only holds for the two-stage stochastic linear programming (1) in the trivial case, i.e., when the integrand function is separable. For example, if

    v(x, ξ) = Σ_{i=1}^d min_{y_i} { y_i : y_i ≥ ξ_i − x_i, y_i ≥ 0 } = Σ_{i=1}^d max(ξ_i − x_i, 0),

then every mixed weak derivative of order one vanishes, v(x, ·) ∈ F_d^1, and the convergence result in (3) can be applied directly.
However, in general, v(x, ξ) is a non-separable piecewise linear convex function, see Birge and Louveaux [1], and does not belong to F_d^r for any r ≥ 1. For example,

    v(x, ξ) = min_y { y : y ≥ ξ_1 + ξ_2 − x, y ≥ 0 } = max(ξ_1 + ξ_2 − x, 0)

does not have a bounded (weak) mixed derivative D^{(1,1)} v: its first derivative is discontinuous across the kink ξ_1 + ξ_2 = x, and the mixed derivative concentrates on that line, so it fails to be a function even in the weak sense. Hence (3) cannot be applied to the two-stage linear problem directly. The major contribution of this paper is to prove the convergence of (2) to (1) at the rate specified in (3) with r = 1, i.e., the first-order convergence rate, even though v(x, ·) does not belong to F_d^1. The paper is organized as follows. In Section 2, we introduce a logarithmic mollifier function and prove its various properties. The mollifier function is quite familiar to the optimization community, as it is the barrier function used in the Interior Point Method for linear programming. In Section 3, we use the limiting properties of the mollifier function to prove the uniform convergence and the first-order convergence rate for the objective function. We also show the converging behaviour of the optimal solutions x*_K along a subsequence. Finally, Section 4 presents our conclusions.
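The failure of the mixed weak derivative can also be observed numerically: for a kinked function of the form max(ξ1 + ξ2 − x, 0), the second-order mixed difference quotient across the kink line grows like 1/h as the cell width h shrinks, so no bounded mixed derivative can exist. A small sketch (the sample points are arbitrary choices for illustration):

```python
def v(x, a, b):
    """Kinked toy recourse value max(a + b - x, 0)."""
    return max(a + b - x, 0.0)

def mixed_diff(x, a, b, h):
    """Second-order mixed difference quotient on the cell [a, a+h] x [b, b+h]."""
    return (v(x, a + h, b + h) - v(x, a + h, b) - v(x, a, b + h) + v(x, a, b)) / h**2

if __name__ == "__main__":
    x = 0.5
    for h in (0.1, 0.01, 0.001):
        a = (x - h) / 2      # center the kink line a + b = x inside the cell
        print(h, mixed_diff(x, a, a, h))   # grows like 1/h as h -> 0
```

For a function in F_d^1 this quotient would stay bounded; the blow-up is exactly the obstruction that the mollifier of Section 2 removes.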
In the coming sections, we assume that ξ is uniformly distributed on the unit cube [0,1]^d; the general case is handled through the inverse c.d.f. transformation discussed below.

Mollifier Function
We make the following mild assumptions on the problem. The inverse c.d.f. transformation brings in more complexity in the analysis without changing our conclusion; hence we work on the unit cube in the following sections and extend the analysis through inverse transformation and truncation in the Appendix.
Among the assumptions, ξ is required to be a random vector with an invertible cumulative distribution function. Assumption A1 is necessary for our analysis using the Interior Point Method theory. Assumption A2 is for convenience, since otherwise we would need to discuss the case v(x, ξ) = ∞, which would drag our analysis to a different focus. Assumption A3 is implicitly assumed in many analyses of linear programming, since the rows of Q can be preprocessed such that the reduced Q has full row rank. Assumption A4 facilitates the conversion from the unit cube [0,1]^d to the support of ξ through the inverse c.d.f. transformation.
We define a mollifier function v_μ : X × [0,1]^d → R,

    v_μ(x, ξ) = min_y { q^T y − μ Σ_{i=1}^n log y_i : Q y = h(ξ) − T x, y > 0 },    μ > 0.

In the following, we call v_μ the mollifier function. Let y(μ) and (λ(μ), s(μ)) be the optimal primal and dual solutions of the mollified problem. The trajectory {y(μ) : μ > 0} is the central path, which has been an important object in the Interior Point Method and has been extensively studied by many authors. For interested readers, in addition to the references given in the proof, we refer to the extensive research in Megiddo [16], an early work of Fiacco [17], and a survey of degeneracy and interior point methods by Güler et al. [18]. For readers interested in the interior point method in general, we refer to Nesterov and Nemirovskii [19], Renegar [20] and Wright [21].
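For intuition, consider the smallest nontrivial instance: the one-constraint recourse min{ y : y ≥ t, y ≥ 0 } = max(t, 0) with scalar t. Writing it in standard form with a slack variable and adding the log barrier, the first-order conditions give a closed-form central-path value. The sketch below is an illustrative derivation for this toy instance only, not the paper's general construction; it shows the mollified value is smooth in t and converges to max(t, 0) as μ → 0.

```python
import math

def v(t):
    """Piecewise linear recourse value: min{ y : y >= t, y >= 0 } = max(t, 0)."""
    return max(t, 0.0)

def v_mu(t, mu):
    """Log-barrier mollification of v: solve min y1 - mu*(log y1 + log y2)
    s.t. y1 - y2 = t, y1, y2 > 0.  The stationarity condition yields the
    closed form y1(mu) = (t + 2*mu + sqrt(t^2 + 4*mu^2)) / 2, which tends
    to max(t, 0) as mu -> 0 and is smooth in t for every mu > 0."""
    return (t + 2 * mu + math.sqrt(t * t + 4 * mu * mu)) / 2

if __name__ == "__main__":
    for t in (0.3, -0.3):
        for mu in (1e-1, 1e-3, 1e-6):
            print(t, mu, v_mu(t, mu))   # converges to v(t) as mu shrinks
```

The kink of max(t, 0) at t = 0 is replaced by a smooth shoulder of width O(μ), which is the mechanism exploited throughout Sections 2 and 3.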
The KKT conditions of the optimization problem defining v_μ are

    Q y = h(ξ) − T x,  y > 0,
    Q^T λ + s = q,  s > 0,
    y_i s_i = μ,  i = 1, ..., n.

In the following, we directly derive the (strong) partial derivative of v_μ with respect to ξ_j. By the envelope theorem,

    ∂v_μ/∂ξ_j (x, ξ) = λ(μ)^T ∂h(ξ)/∂ξ_j,

where the last equality follows from the KKT conditions.
Hence, by the implicit function theorem, the central path solution (y(μ), λ(μ), s(μ)) is differentiable in ξ, and the conclusion follows by straightforward computation. Proposition 2.3 then states that, for all multi-indices j with |j|_∞ ≤ 1, the corresponding partial derivatives of v_μ are finite; furthermore, their limits as μ → 0 exist essentially, by Lemma 2.1 and (5).
If the optimal basis of v(x, ξ) is non-degenerate, i.e., the basis matrix is non-singular and the basic solution is strictly positive, then the above limit is zero. If the optimal set is degenerate, then the limit is not defined. However, in the following, we show that the degenerate case has zero Lebesgue measure, hence the limit is zero with probability one. Let us first consider a special case of degeneration. Let the first m columns of Q be an optimal basis B with one basic variable equal to zero. The corresponding right-hand sides h(ξ) − T x lie in a proper affine subspace; since the map induced by B is injective and surjective, the set of such ξ has zero Lebesgue measure. Clearly the same argument holds for any degenerate optimal basis B with an arbitrarily chosen degenerate basic column. Furthermore, there are only finitely many ways to choose the basis and the degenerate columns. Hence the total measure of the set of ξ which leads to a degenerate v(x, ξ) is zero, and we conclude that the limit is zero essentially. We continue to calculate higher-order partial derivatives.
In general, for any multi-index j with |j|_∞ ≤ 1 and any set of components, the mixed partial derivative D^j v_μ exists and is finite for μ > 0. Proof. We first prove the conclusion for |j| = 1, and extend by induction; the base case is shown in Lemma 2.1. Clearly (8) is finite for μ > 0. Now taking the limit μ → 0, the claim follows by Theorem 2.1 and Proposition 2.3. Furthermore, we prove by induction, using the chain rule together with Theorem 2.1 and Proposition 2.3, that the same holds for the higher-order mixed derivatives. We also derive the higher-order partial derivatives explicitly in Appendix A.
Theorem 2.2. For any x ∈ X and μ > 0, ||v_μ(x, ·)||_{F_d^1} = max_{|j|_∞ ≤ 1} sup_ξ |D^j v_μ(x, ξ)| is finite. Proof. The claim follows from the definition of the norm and the finiteness of the partial derivatives established above.

First Order Convergence Rate
We first discuss the convergence of the objective function specified in the approximation model (2) to the objective function of the true model (1), and the convergence rate. Theorem 3.1. For all x ∈ X, the sparse grid quadrature error of the recourse term vanishes at the rate K^{−1} (log K)^{2(d−1)}. Proof. For notational convenience, define the operators

    I f = ∫_{[0,1]^d} f(ξ) dξ    and    I_K f = Σ_{k=1}^K w_k f(ξ^k),

and decompose

    I v(x, ·) − I_K v(x, ·) = I (v − v_μ) + (I v_μ − I_K v_μ) + I_K (v_μ − v).

Taking limits on both sides, the first and third terms on the right-hand side go to zero by Theorem 2.1, and the second is bounded by the classical convergence rate of the sparse grid method, see (3), together with Proposition 2.3 and Theorem 2.2.
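The three-term decomposition used in this proof can be checked numerically on a toy kinked integrand, with the one-constraint mollifier v_μ(t) = (t + 2μ + sqrt(t² + 4μ²))/2 (a closed form valid only for this toy instance) and midpoint tensor rules standing in for the exact integral and the K-node rule; none of this is the paper's actual quadrature.

```python
import math

def v(t):
    return max(t, 0.0)                                 # kinked integrand

def v_mu(t, mu):
    return (t + 2 * mu + math.sqrt(t * t + 4 * mu * mu)) / 2   # smooth mollifier

def midpoint_2d(f, n):
    """Midpoint tensor rule for the integral of f over [0,1]^2 with n^2 nodes."""
    h = 1.0 / n
    return sum(f((i + 0.5) * h, (j + 0.5) * h)
               for i in range(n) for j in range(n)) * h * h

if __name__ == "__main__":
    x, mu = 0.5, 1e-3
    f = lambda a, b: v(a + b - x)
    f_mu = lambda a, b: v_mu(a + b - x, mu)
    # |I f - I_K f| <= |I(f - f_mu)| + |I f_mu - I_K f_mu| + |I_K(f_mu - f)|
    lhs = abs(midpoint_2d(f, 400) - midpoint_2d(f, 8))
    terms = (abs(midpoint_2d(lambda a, b: f(a, b) - f_mu(a, b), 400))
             + abs(midpoint_2d(f_mu, 400) - midpoint_2d(f_mu, 8))
             + abs(midpoint_2d(lambda a, b: f_mu(a, b) - f(a, b), 8)))
    print(lhs, terms)   # triangle inequality: lhs <= terms
```

The first and third terms shrink with μ, while the middle term is a quadrature error on a smooth function; balancing the two is exactly the trade-off behind the r = 1 rate.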

Let the objective functions of the true problem (1) and the approximated problem (2) be z(x) = c^T x + I v(x, ·) and z_K(x) = c^T x + I_K v(x, ·), and let the optimal objective value and optimal solution set of the true model and the approximated model be z*, X* and z*_K, X*_K, respectively. Theorem 3.2 and Theorem 3.3 give the convergence results for the optimal objective value and the optimal solution sets separately.
Theorem 3.2. The optimal objective value converges, i.e., z*_K → z*, and the rate of convergence is the same as in Theorem 3.1. Proof. For the minimization problem we note that

    z*_K − z* ≤ z_K(x*) − z(x*)    and    z* − z*_K ≤ z(x*_K) − z_K(x*_K),

hence |z*_K − z*| ≤ sup_{x ∈ X} |z_K(x) − z(x)|, where the inequalities follow from Theorem 3.1. Theorem 3.3. 1) For K large enough, any x*_K is feasible; 2) {x*_K} has a convergent subsequence; 3) any cluster point x̄ of {x*_K} belongs to X*. Proof (sketch). Any cluster point x̄ satisfies the first-stage constraints, and by the relative completeness assumption x̄ is feasible. To show that x̄ is also an optimal solution, we apply the technique used in Theorem 3.1: z(x̄) = lim_K z_K(x*_K) = lim_K z*_K = z* by Theorem 3.1 and Theorem 3.2.
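The sandwich argument above reduces to the elementary bound |min z_K − min z| ≤ sup_x |z_K(x) − z(x)| when both minima are taken over the same feasible set. A toy numerical check, with midpoint tensor rules as stand-ins for the exact and scenario integrals (the instance and constants are illustrative):

```python
def recourse(x, a, b):
    """Toy recourse value max(a + b - x, 0) for xi = (a, b) in [0,1]^2."""
    return max(a + b - x, 0.0)

def objective(x, c, n):
    """c*x plus a midpoint tensor-rule approximation of E[recourse] with n^2 nodes."""
    h = 1.0 / n
    s = sum(recourse(x, (i + 0.5) * h, (j + 0.5) * h)
            for i in range(n) for j in range(n))
    return c * x + s * h * h

if __name__ == "__main__":
    c = 0.6
    xs = [k / 100 for k in range(101)]                  # first-stage grid on [0,1]
    z_true = [objective(x, c, 100) for x in xs]         # fine rule: stand-in for z(x)
    z_K = [objective(x, c, 10) for x in xs]             # coarse rule: z_K(x)
    gap = max(abs(a - b) for a, b in zip(z_true, z_K))  # sup-norm gap
    diff = abs(min(z_true) - min(z_K))                  # optimal-value gap
    print(diff, gap)   # diff <= gap, as in the proof of Theorem 3.2
```

Refining the scenario rule shrinks the sup-norm gap, which by the displayed bound drags the optimal values together at the same rate.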

Conclusions
The modern sparse grid method is very efficient in numerical integration for integrand functions in the Sobolev space F_d^r. In this paper we applied it to generate scenarios and weights for the two-stage stochastic linear optimization problem on the cube [0,1]^d, and proved uniform convergence and the first-order convergence rate even though the piecewise linear convex integrand does not belong to F_d^r. The key tool is the logarithmic mollifier function, whose mixed partial derivatives with |j|_∞ ≤ 1 are finite and admit finite limits as μ → 0 almost everywhere.

Appendix A. For a general distribution we need to check the properties of the integrand after the inverse c.d.f. transformation; its differentiability depends only on the smoothness of the transformation. The higher-order partial derivative of a composite function can be calculated explicitly: with Π_n denoting the set of all partitions of the index set, we apply Tsoy-Wo Ma's higher chain formula [26] to express the mixed derivatives of the composition in terms of the partial derivatives of the two component maps. For the problem under discussion, taking r = 1 in the approximation model (2) yields the rate used in Section 3.


For fixed μ > 0 the barrier problem has a unique optimal solution. Since the optimal set of the recourse problem has a non-empty relative interior by assumption, the central path exists for each x and converges to the analytic center of the optimal set; see Roos et al. [15], Theorem I.30 and Definition I.20 for the analytic center.

Note that there are 2^d multi-indices j satisfying |j|_∞ ≤ 1. We also prove that these partial derivatives are finite for all ξ ∈ [0,1]^d, x ∈ X. Finally, we show that their limits are finite when μ → 0; the limiting point defined in Theorem 2.1 is the analytic center, not an arbitrary optimal solution of the recourse problem.


However, the integrand function in two-stage linear programming does not belong to F_d^r for any r ≥ 1.

Most commonly used invertible cumulative distribution functions, for example the normal distribution function, have an inverse F^{−1} that is in C^∞ on the open interval (0,1). The finiteness of the partial derivatives of the composite integrand then follows, since the partial derivatives of v_μ(x, ·) are finite for any multi-index j with |j|_∞ ≤ 1 component-wise.
|A| means the cardinality of the set A. For some distributions, however, the boundedness condition might not hold: the inverse of the cumulative distribution function of the normal distribution does not have bounded derivatives near 0 or 1. To remove the singularities, a truncation of the cube [0,1]^d to [ε, 1−ε]^d can be applied, where ε is a small positive number. To compute the right-hand side using the standard sparse grid method, one changes variables to map [ε, 1−ε]^d back to [0,1]^d. Hence, for a two-stage linear problem with an invertible but unbounded cumulative distribution function F, we shall first generate the standard sparse grid points and weights on the unit cube, and then transform them through the truncation and the inverse c.d.f.
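A minimal sketch of the truncation-plus-inverse-c.d.f. pipeline for the normal distribution, using Python's standard library; the node set and ε are illustrative choices, and in d dimensions the quadrature weights would also pick up the truncated-volume factor (1 − 2ε)^d.

```python
from statistics import NormalDist

def truncated_transform(u, eps=1e-4):
    """Map a unit-cube node u in [0,1] to a normal sample: first shrink [0,1]
    onto [eps, 1-eps] to avoid the singularities of the inverse c.d.f. at 0
    and 1, then apply the inverse normal c.d.f."""
    p = eps + (1 - 2 * eps) * u
    return NormalDist().inv_cdf(p)

if __name__ == "__main__":
    nodes = [0.0, 0.25, 0.5, 0.75, 1.0]   # e.g. 1D quadrature nodes incl. endpoints
    # Without truncation, u = 0 and u = 1 would map to -inf/+inf; with it,
    # every transformed node is finite and the map stays symmetric about 0.
    print([round(truncated_transform(u), 4) for u in nodes])
```

The same map applied coordinate-wise converts sparse grid nodes on [0,1]^d into scenarios for an unbounded distribution while keeping all function evaluations finite.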