On Approximating the Gradient of the Value Function

The optimality conditions for macroeconomic problems with limited commitment often contain partial derivatives of the optimal value function corresponding to the outside option. This paper contributes to the literature on recursive contracts by proposing an algorithm for approximating the gradient of the value function using simulation-based methods. Our method combines numerical solution and simulation of the model, Monte Carlo integration, and numerical differentiation. It does not suffer from the curse of dimensionality and is therefore convenient for models involving many state variables. The algorithm inherits the speed and accuracy limitations of the numerical solution method it relies on. Our accuracy analysis is limited to a few classical examples from the macroeconomic literature.


Introduction
The purpose of this paper is to propose a simple algorithm for computing partial derivatives of the optimal value function. Macroeconomic problems involving incentive compatibility constraints have received wide attention in the literature due to recent advances in dynamic optimization techniques (see [1] [2] and references therein). Often the optimality conditions for this class of problems involve partial derivatives, with respect to the endogenous state variables, of the optimal value function corresponding to the dynamic programming formulation of an outside option. Although many numerical methods can provide an approximation for the value function, there is no reason to believe that a derivative of this
A. Dmitriev, DOI: 10.4236/tel.2019.91011, Theoretical Economics Letters
that solving it boils down to designing an optimal social contract which takes into account not only technological but also incentive and legal constraints. Our example illustrates the need for computing the gradient of the value function and its practical implementation. Furthermore, it shows that our algorithm is applicable to some widely used models to which the method in [3] cannot be applied.
Consider a model of international risk sharing which distinguishes itself from the canonical model [5] in two respects. First, as in [6], we introduce a friction in the credit markets: we assume that international loans are feasible only to the extent to which they can be enforced by the threat of exclusion from participation in any other international risk sharing arrangement. Second, we incorporate habit formation preferences into the model. The motivation for doing this is threefold. First, habits help us illustrate the features of the algorithm by expanding the set of endogenous states in the model. Second, habit formation preferences tend to improve the performance of international business cycle models [7] [8]. Finally, empirical studies suggest that habit formation is consistent with observed consumption behavior [9]. To simplify the exposition, we assume inelastic labor supply.
The planner's problem is to choose the sequences of consumption {c_it} and investment {i_it} to maximize a weighted sum of utilities subject to an aggregate feasibility constraint, individual participation constraints for each agent, the equations of motion for the capital stocks, the laws of motion for habits, and the non-negativity constraints c_it, i_it ≥ 0. We assume that the productivity shocks θ_t follow a first-order stationary vector autoregressive process, and that the initial values for the state variables k_0, h_0, θ_0 and the initial non-negative weights λ_i are given. In addition, the usual restrictions apply to the discount factor, δ ∈ (0, 1), and to the persistence of habits. The outside option V^a_i(k, h, θ) represents the optimal value function corresponding to the autarkic environment, with the initial values being equal to the values of the state variables k, h, θ at the moment of deviation from the optimal plan.
In addition to Equations (2)-(5), the optimal allocations for all agents must satisfy the intertemporal condition, the complementary slackness condition, and the law of motion for the co-state variables. The gradient of the optimal value function V^a_i enters the intertemporal condition (8) and the risk-sharing condition (7). Approximation of this gradient is the purpose of the algorithm proposed in [3] and in this paper. Because both algorithms can approximate the partial derivative with respect to capital, we will use that fact to compare their accuracy in the section on numerical accuracy below. Our approach can also approximate the partial derivative with respect to the habit stock, whereas [3] cannot, because the analytical expression for that partial derivative as an expectation of known functions of the model's solution is not available. This will be further discussed in the section on numerical accuracy.
where r is an instantaneous utility function, θ_t an exogenous Markov stochastic process, x_t a vector of endogenous state variables, a_t a vector of control variables, A a feasibility correspondence and l the law of motion for the endogenous state variables. The functional equation for this problem can be derived using standard dynamic programming techniques. It yields a time-invariant policy function f such that the optimal allocations satisfy a_t = f(x_t, θ_t). The purpose of our algorithm is to find a pointwise approximation to the partial derivative of the value function with respect to its i-th argument. The algorithm takes the following three steps: Step I (Numerical Solution) Solve the model in (9) with a spectral method and formulate the solution in terms of approximated policy functions, which depend on the state variables and some coefficients, ω.
Step II (Monte Carlo Integration) Simulate N sequences of the realizations of the stochastic process {θ_t}, using the approximated policy functions, the equations of motion for the state variables (10), and the given initial values. Using the simulated series, calculate the discounted sums of the instantaneous returns and average over N. Step III (Numerical Differentiation) Repeat Step II to obtain approximations of the value function at two points, for instance displaced by ±ε ι_i from the point of interest, where ι_i denotes a conformable vector of zeros with one on its i-th coordinate, and ε is a small positive number. Calculate the value of the partial derivative using, for example, Stirling's finite difference formula. The optimal choice of the method for calculating the derivatives in Step III is problem specific, and its accuracy depends on the smoothness of the value function. The approaches available include a variety of difference formulas, Richardson extrapolation, or curve fitting with cubic splines. These are described at length in standard numerical methods texts such as [10] [11] [12].
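The differentiation step can be sketched in a few lines. The following is a minimal illustration with a central difference; the function names and the toy value function are ours, chosen only because the exact derivative is known:

```python
import numpy as np

def central_difference(V, x, i, eps=1e-4):
    """Approximate the partial derivative of V with respect to x_i.

    V   : callable returning a (possibly simulation-based) value approximation
    x   : 1-D array of state variables at which to differentiate
    i   : index of the state variable of interest
    eps : small positive step; the optimal choice is problem specific
    """
    iota = np.zeros_like(x, dtype=float)  # conformable vector of zeros ...
    iota[i] = 1.0                         # ... with one on its i-th coordinate
    return (V(x + eps * iota) - V(x - eps * iota)) / (2.0 * eps)

# Toy check on a function with a known derivative: d/dk [log k + 0.5 log h]
V = lambda x: np.log(x[0]) + 0.5 * np.log(x[1])
dV_dk = central_difference(V, np.array([2.0, 1.0]), i=0)   # exact value: 0.5
```

In practice V would be the Monte Carlo average from Step II rather than a closed-form function.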
A brief note should be made at this point on the accuracy of the algorithm. In principle, arbitrary accuracy of the approximation can be achieved by simultaneously increasing the dimension of the approximating family of functions in Step I, increasing the number of Monte Carlo iterations in Steps I and II, and decreasing the step size ε in Step III. In practical applications, however, there are several sources of approximation error. First, in order to obtain the values of the optimal value function at a point, one relies on the approximations of the policy functions implied by the numerical solution of the model. Second, because we consider stochastic models, there is an additional error stemming from the evaluation of the integral in the computation of expected discounted returns. Finally, numerical differentiation introduces two more sources of error: the truncation error and the roundoff error. The truncation error comes from omitting higher order terms in the Taylor series expansion. The roundoff error is associated with storing real numbers in a computer's floating-point format. The section on numerical accuracy below discusses some practical accuracy issues in the context of an example.
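The interplay between truncation and roundoff error can be seen on a one-line experiment (the test function exp(x) is ours, chosen only because its derivative is known):

```python
import numpy as np

def central_diff_error(eps, x=1.0):
    """Absolute error of the central difference for d/dx exp(x) at x."""
    approx = (np.exp(x + eps) - np.exp(x - eps)) / (2.0 * eps)
    return abs(approx - np.exp(x))

# A moderate step beats both a large one (truncation error dominates)
# and a tiny one (roundoff error dominates):
errors = {eps: central_diff_error(eps) for eps in (1e-1, 1e-5, 1e-13)}
```

For a central difference the truncation error shrinks like ε², while the roundoff error grows like 1/ε, so the total error is minimized at an interior, machine-dependent step size.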

Implementation of the Algorithm
This section describes a practical computational strategy for implementing the algorithm using the example from Section 2. The optimality conditions include partial derivatives of the value function corresponding to the dynamic programming formulation of the agents' outside option, i.e. autarky. The objective of the algorithm is to find the values of the partials of the autarkic value function with respect to the capital and habit stocks. To simplify the exposition we will consider the case of non-persistent habits, which corresponds to λ = 1. We assume the functional forms standard in the growth literature; the instantaneous utility function is of the constant relative risk aversion form with curvature parameter σ. In this example, we restrict attention to one particular set of parameters, which are summarized in Table 1.
Step I (Numerical Solution) The sequences of optimal allocations must satisfy a system of stochastic difference equations. For expositional purposes, we solve the model with a version of a stochastic simulation algorithm, which is easiest to implement (see e.g. [13]). It takes the following steps: 1) Fix the initial conditions and draw a series {θ_t}_{t=1}^T that obeys the law of motion for the exogenous shocks, with T sufficiently large. To ensure sufficient accuracy of the solution we chose T = 50000 for all the numerical examples considered. The computational burden of this is still rather low, since the model needs to be solved only once.
2) Substitute the conditional expectations in (11) with flexible functional forms that depend on the state variables k_t, h_t, θ_t and some coefficients, ω, where the approximating family is the exponent of a logarithmic polynomial expansion of degree n. By using the exponent of the polynomial we guarantee that the left hand side of (11) remains positive. Given c_t(ω), the next period values for the capital and habit stocks follow directly from the laws of motion (12) and (13).
3) Using the realizations of {θ_t}_{t=0}^T, repeat the previous step in order to obtain recursively a series of the endogenous variables.
4) Regress the dependent variable Y(ω) on the flexible functional form of the state variables, where the role of Y(ω) is performed by the expression inside the conditional expectation in Equation (11).

5) Letting S(ω) be the result of the regression in the previous step, use an iterative procedure to find the fixed point of S and the corresponding set of coefficients, ω*. This provides the solution for the endogenous variables for this particular realization of the stochastic process {θ_t}_{t=1}^T, along with the approximated policy functions for consumption and for the next-period capital and habit stocks, each as a function of (k, h, θ; ω*).
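The five steps above can be sketched for the Brock-Mirman special case (no habits), where the fixed point can be checked against the known closed-form policy. All parameter values and the degree-one polynomial below are illustrative choices of ours, not the paper's Table 1 calibration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative parameter values (NOT the paper's Table 1 calibration)
alpha, beta, rho, sigma_e, T = 0.36, 0.95, 0.90, 0.02, 2500

# 1) Draw a series of exogenous shocks from its AR(1) law of motion
log_th = np.zeros(T)
for t in range(1, T):
    log_th[t] = rho * log_th[t - 1] + sigma_e * rng.standard_normal()
theta = np.exp(log_th)

def simulate(omega):
    """2)-3) Parameterize E_t[.] = exp(P_1(log k_t, log theta_t)) and generate
    {k_t, c_t} recursively from the Euler equation and the full-depreciation
    law of motion k_{t+1} = theta_t * k_t**alpha - c_t."""
    k, c = np.empty(T + 1), np.empty(T)
    k[0] = (alpha * beta) ** (1.0 / (1.0 - alpha))   # deterministic steady state
    for t in range(T):
        expec = np.exp(omega[0] + omega[1] * np.log(k[t]) + omega[2] * log_th[t])
        c[t] = min(1.0 / (beta * expec), 0.99 * theta[t] * k[t] ** alpha)
        k[t + 1] = theta[t] * k[t] ** alpha - c[t]
    return k, c

omega = np.array([0.0, -0.5, -0.5])                  # arbitrary initial guess
for _ in range(200):                                 # 5) fixed point of S
    k, c = simulate(omega)
    # 4) Regress the realized term inside the conditional expectation on the
    # states (the expansion is log-linear, so OLS on logs suffices here)
    y = alpha * theta[1:] * k[1:T] ** (alpha - 1.0) / c[1:]
    X = np.column_stack([np.ones(T - 1), np.log(k[:T - 1]), log_th[:T - 1]])
    omega_new, *_ = np.linalg.lstsq(X, np.log(y), rcond=None)
    omega = 0.7 * omega + 0.3 * omega_new            # damped update

# Benchmark: with log utility and full depreciation the exact policy is
# c_t = (1 - alpha*beta) * theta_t * k_t**alpha, so omega -> (. , -alpha, -1)
```

The damped update is one simple way to implement the iterative fixed-point search; at the fixed point the fitted expansion reproduces the closed-form policy, which is what the accuracy comparison in the last section exploits.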
Step II (Monte Carlo Integration) Our objective is to find approximations of the partials at a range of points. Supposing that the point of interest is (k, h, θ), simulate N sequences of the realizations of the shocks, generate the corresponding endogenous series with the approximated policy functions, calculate the discounted sums of the instantaneous utilities, and average over N; the partial with respect to capital then follows from the finite difference formula of Step III. The partial with respect to the habit stock is obtained in a similar way. The length of the simulated series T can be very moderate due to discounting of the future utilities. The optimal value of ε is both computer and problem specific.
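Steps II and III can be sketched for the Brock-Mirman special case, where the exact policy and the exact derivative are known. The shock draws are generated once and reused at both evaluation points (common random numbers), which keeps the simulation noise from swamping the finite difference. All names and parameter values here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, beta, rho, sigma = 0.36, 0.95, 0.90, 0.02     # illustrative values
T, N = 400, 50
shocks = sigma * rng.standard_normal((N, T))          # drawn once, reused

def value_at(k0):
    """Step II: average discounted utilities over N simulated paths started
    at k0, using the known policy c = (1 - alpha*beta) * theta * k**alpha."""
    disc = beta ** np.arange(T)
    vals = np.empty(N)
    for n in range(N):
        k, log_th = k0, 0.0
        u = np.empty(T)
        for t in range(T):
            log_th = rho * log_th + shocks[n, t]
            th = np.exp(log_th)
            c = (1.0 - alpha * beta) * th * k ** alpha
            u[t] = np.log(c)
            k = th * k ** alpha - c                   # full depreciation
        vals[n] = disc @ u
    return vals.mean()

# Step III: central difference with the SAME draws at both points
k0, eps = 0.2, 1e-4
dV_dk = (value_at(k0 + eps) - value_at(k0 - eps)) / (2.0 * eps)
# Exact benchmark for this model: V'(k) = alpha / ((1 - alpha*beta) * k)
```

Reusing the shock draws makes the simulated value a smooth deterministic function of the initial state, so the difference of two noisy estimates does not destroy the derivative.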

Numerical Accuracy: A Comparison
This section considers the accuracy of the algorithm in the context of our example. First, we compare the performance of our algorithm with the approach in [3] when such a comparison is feasible. Next, we present several special cases which isolate the contributions of different sources to the overall approximation error.
Consider the optimality conditions for the autarkic problem, written in sequence form. Condition (17) can be used to compare our algorithm with the method in [3]. The latter requires solving the model numerically and expressing the derivatives of interest in terms of conditional expectations and functions of the equilibrium path of the model. Applying recursive substitution and the law of iterated expectations to (17) expresses the derivative with respect to capital as a conditional expectation of known functions of the model's solution. An approximation of this derivative can be obtained by parameterizing the right hand side with flexible functional forms in the state variables (k, h, θ) and running a non-linear regression using the simulated series from the numerical solution of the model.
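The regression step of this alternative method can be illustrated on synthetic data; the numbers below are invented purely for the illustration (in practice the dependent variable comes from the simulated equilibrium path):

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic stand-in for simulated data: realized terms y whose conditional
# expectation is log-linear in the states, plus simulation noise
n = 5000
log_k = rng.normal(-1.6, 0.10, n)          # simulated log capital
log_th = rng.normal(0.0, 0.05, n)          # simulated log productivity
y = np.exp(0.47 - 0.36 * log_k - 1.0 * log_th
           + 0.01 * rng.standard_normal(n))

# Parameterize E[y | k, theta] = exp(P_1(log k, log theta)); with a
# log-linear expansion the non-linear regression reduces to OLS on logs
X = np.column_stack([np.ones(n), log_k, log_th])
coef, *_ = np.linalg.lstsq(X, np.log(y), rcond=None)

def dV_hat(k, th):
    """Fitted approximation of the derivative at an arbitrary point."""
    return np.exp(coef @ np.array([1.0, np.log(k), np.log(th)]))
```

As the text below notes, such a fit is reliable only where the simulated states actually visit; extrapolating far into the tails of the ergodic distribution is hazardous.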
Figure 1 shows the approximations of the derivative obtained using our algorithm and the approach in [3]. The approximated values of the derivative of V(k, h, θ) are plotted for a range of one state variable while keeping the remaining states fixed at their deterministic steady state values. The histograms plot the sample distributions of capital and habit stock. A few observations can be made based on Figure 1. First, the two algorithms produce indistinguishable results when the state variables take values which often occur in equilibrium. Second, for points which are unlikely to occur in equilibrium, the approximations differ significantly; the points where capital is very high while consumption (and hence the habit stock) is at the steady state level are rather unusual. This is an expected result, since Monte Carlo integration delivers good approximations only in the region of the state space which is frequently visited by the model in equilibrium.
The framework we have chosen for a worked-out example embeds several well known special cases. For instance, if λ = 0, it reduces to the Brock-Mirman stochastic growth model. In this case, the analytical form of the one-period return function r, which maps the graph A of the feasibility correspondence Γ into the real numbers, is known. Hence, by virtue of the Benveniste-Scheinkman theorem, the derivative of interest can be expressed in terms of the optimal policy function g for the capital stock. This special case allows us to compare the simulation from our algorithm with an example where the only source of approximation error is the approximation of the policy function g, and thus to isolate the contribution of the errors in the evaluation of the integrals and in the numerical differentiation to the overall approximation error of the algorithm. As shown in Figure 1, the approximation delivered by our algorithm is very close to the approximation which relies on the Benveniste-Scheinkman theorem. Once again, in the region of the state space which is often visited by the model in equilibrium the two approximations are virtually identical. This allows us to tentatively suggest that the main contribution to the approximation error of the algorithm comes from the approximation of the policy functions. Our final special case compares the approximation of the derivative with its known exact solution. It is well known that for logarithmic utility, Cobb-Douglas production and δ = 1, the optimal policy function is defined by a simple closed-form law of motion. The resulting graphs are visually indistinguishable for the range of six standard deviations of k_t around its deterministic steady state value. The approximation errors stemming from Monte Carlo integration and numerical differentiation are of an order of 10^-9 of the value of the derivative. This suggests that obtaining an accurate approximation of the policy functions is crucial for the accuracy of the whole algorithm.

Concluding Remarks
This paper contributes to the literature on recursive contracts by proposing an algorithm for computing the gradient of the optimal value function using simulation-based techniques. The proposed procedure is conceptually straightforward, computationally inexpensive, and simple to implement. It allows researchers to extend existing risk-sharing models with limited commitment by including additional endogenous state variables. For example, one may extend the multi-country single-good model in [6] by including different types of capital or time-non-separable preferences. Alternatively, the multi-country multi-good model in [14] can be extended to include capital accumulation and cross-country trade in investment goods. In terms of accuracy, the algorithm demonstrates performance comparable with Marcet and Marimon's method [3] in our benchmark example. In contrast to [3], our method is flexible enough to handle dynamic models with large numbers of state variables even when the derivatives of interest cannot be expressed in terms of conditional expectations and functions of the equilibrium path of the model.
While our algorithm has wide applicability, it inherits its speed and accuracy trade-offs from the underlying numerical solution method. Our experiments suggest that obtaining an accurate approximation of the policy functions is crucial for the accuracy of the whole algorithm.
An additional limitation on the algorithm's computational speed is imposed by the model's simulation and the Monte Carlo integration. However, both steps can be parallelized along the lines proposed in [15] in order to reduce the computational burden. Exploring the costs and benefits of a parallel implementation of the algorithm is left for future research.
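As a minimal illustration of the parallelization point (this is our own toy sketch, not the scheme of [15]; the payoff stream and the worker split are invented for the example), the Monte Carlo average can be farmed out across workers with independent seed streams:

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def batch_mean(seed, n_paths=200, T=200, beta=0.95):
    """One worker's share of the Monte Carlo average of discounted returns
    (the payoff stream is a toy stand-in for simulated utilities)."""
    rng = np.random.default_rng(seed)
    disc = beta ** np.arange(T)
    u = np.log(1.0 + np.abs(rng.standard_normal((n_paths, T))))
    return (u @ disc).mean()

seeds = range(8)                          # one independent stream per worker
with ThreadPoolExecutor(max_workers=4) as pool:
    v_hat = float(np.mean(list(pool.map(batch_mean, seeds))))
```

Seeding each worker separately keeps the estimate reproducible regardless of how the batches are scheduled, since the final average does not depend on completion order.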


Figure 2 compares the approximated derivative obtained using the exact policy function for k_t with its analytical counterpart.

Figure 2. Comparison with a Closed Form Solution.
Since the analytical expression for these derivatives is in general unavailable, we have no choice but to rely on numerical differentiation. A further complication is that the closed form solution to the optimal value function is generally unavailable as well, so the finite differencing approach must be applied to a simulation-based approximation of the value function. This example relies on a version of the stochastic simulation algorithm, which formulates the solution in terms of approximated policy functions obtained in the first step by solving the model with a spectral method. The production function is Cobb-Douglas, and the innovations ε_t are independent normally distributed random variables with zero mean and variance σ_ε².

Table 1. Parameterization of the model.
To see this feature, consider the range of values of the capital stock in excess of 6.5. The plots of the approximate derivatives reported in the upper panel of Figure 1 do not coincide. Moreover, the upper tail of the histogram suggests that such values of k_t are not unlikely to happen in equilibrium.