
The stochastic dual dynamic programming (SDDP) algorithm is used increasingly widely. In this paper, we analyze different methods of lattice construction for SDDP, using as an example a realistic variant of the newsvendor problem that incorporates storage of production. We model several days of operation and compare both the profits obtained with each lattice construction method and the corresponding computation time spent on lattice construction. Our setting differs from those previously studied because it is not only multidimensional but also multistage, with stage dependence. We construct scenario lattices for several Markov processes, which play a crucial role in stochastic modeling. The novelty of our work lies in the comparison of different methods of scenario lattice construction. The results presented in this article show that the Voronoi method slightly outperforms the others, while the k-means method is much faster overall.

This article discusses one of the most powerful modern algorithms for solving stochastic optimization problems: the stochastic dual dynamic programming (SDDP) algorithm, first described in [

There are many articles on scenario generation methods. The approach of [

The rest of the paper is organized as follows. Section 2 introduces the underlying theory; Sections 3 and 4 describe the lattice construction methods we use; Section 5 discusses the numerical experiments; and Section 6 summarizes the results.

The general stochastic programming problem can be stated as follows [

\min_{x \in X} C(x) \qquad (1.1)

C(x) = \int c(x, z) \, dF(z) \qquad (1.2)

where x is the decision variable with feasible set X ⊆ ℝ^n; z is a vector of realizations of the random variable Z; and C(x) is the cost function.

To solve this problem, we need to pass from the integral, which is too hard to compute, to a sum, so that the problem takes the discrete form:

\min_{x \in X} \hat{C}(x) \qquad (1.3)

\hat{C}(x) = \sum_{\hat{z} \in \hat{F}} \hat{p}(\hat{z}) \, c(x, \hat{z}) \qquad (1.4)

In the literature, this transition is known as sample average approximation (SAA); it is formally described, for example, in [
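As a minimal illustration of SAA, the sketch below replaces the expectation in (1.2) by a sample average and minimizes it over a grid of candidate decisions. The quadratic cost c(x, z) = (x − z)² and the normal demand distribution are illustrative assumptions, not the model of this paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def c(x, z):
    # Illustrative convex cost: quadratic mismatch between decision x and demand z.
    return (x - z) ** 2

# SAA of C(x) = E[c(x, Z)] for Z ~ N(10, 2^2): draw a sample once, then average.
z_sample = rng.normal(10.0, 2.0, size=10_000)

def C_hat(x):
    return np.mean(c(x, z_sample))

# Minimize the SAA objective over a coarse grid of candidate decisions.
grid = np.linspace(0.0, 20.0, 201)
x_star = grid[np.argmin([C_hat(x) for x in grid])]
```

For this cost, the SAA minimizer is the sample mean, so `x_star` lands near 10.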

There are many ways to implement the transition from (1.1)-(1.2) to (1.3)-(1.4), for example Monte Carlo [

Our approach is based on established work [

e(C, \hat{C}) = \left| \max_{x} \left( C(x) - C\left( \arg\min_{x} \hat{C}(x) \right) \right) \right| \qquad (1.5)

Since it is nearly impossible to compute e(C, Ĉ) via (1.5), it is usual to work with an upper bound on the error. To estimate this bound, we first need some additional notation. Denote by L_r(c(x, z)) the Lipschitz constant of order r of the function c(x, z):

L_r(c(x, z)) = \inf \left\{ L : \left| c(x, z_1) - c(x, z_2) \right| \le L \left| z_1 - z_2 \right| \max\left( 1, |z_1|^{r-1}, |z_2|^{r-1} \right) \ \forall z_1, z_2 \in Z \right\} \le \bar{L}_r \qquad (1.6)

where \bar{L}_r is the upper bound. Further, d_r is the Wasserstein distance between the distributions F and \hat{F}:

d_r(F, \hat{F}) = \left( \min_{g \in M(F, \hat{F})} \int \| z - \hat{z} \|^r \, dg(z, \hat{z}) \right)^{1/r}, \qquad (1.7)

where the minimum is taken over the entire space M(F, F̂) of joint distribution functions with marginal distributions F and F̂, which are Lipschitz in the sense that they satisfy (1.6). From [

e(C, \hat{C}) \le 2 \bar{L}_r \, d_r(F, \hat{F}), \qquad (1.8)

so, since \bar{L}_r is a constant for given r and c(x, z), it is clear that reducing d_r(F, \hat{F}) also reduces the approximation error.

It is impractical to compute d_r(F, F̂) using (1.7) because of the integral, so we will work with the distance between two discrete distributions:

d_r(F, \hat{F}) = \left( \min_{y_{i,j} \in [0,1]} \left\{ \sum_{i=1}^{N} \sum_{j=1}^{M} y_{i,j} \| z_i - \hat{z}_j \|^r \;\middle|\; \sum_{j=1}^{M} y_{i,j} = p_i, \ \sum_{i=1}^{N} y_{i,j} = q_j \right\} \right)^{1/r}, \qquad (1.9)

where p_i, q_j are the probabilities of the corresponding discrete distribution values, and y_{i,j} is the probability mass transported from z_i to ẑ_j. Thus, according to (1.8), we need to find the discrete distribution F̂ that best approximates F, thereby reducing the SAA error.
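For one-dimensional discrete distributions with r = 1, the distance (1.9) can be computed directly, for example with SciPy's `wasserstein_distance`; the sample values and weights below are illustrative.

```python
import numpy as np
from scipy.stats import wasserstein_distance

# Original sample (equal weights 1/N) and a 3-point approximation with weights q_j.
z = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
z_hat = np.array([1.5, 3.5, 5.5])
q = np.array([1 / 3, 1 / 3, 1 / 3])

# d_1(F, F_hat): the optimal-transport cost between the two discrete distributions.
d1 = wasserstein_distance(z, z_hat, v_weights=q)  # → 0.5 for these values
```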

Since we consider continuous distributions, we need to generate a sample of size N, so (1.9) takes the following form:

\min_{\hat{z}_1, \cdots, \hat{z}_M} \left( \min_{y_{i,j} \in \{0,1\}, \, q_j \ge 0} \left\{ \sum_{i=1}^{N} \sum_{j=1}^{M} y_{i,j} \| z_i - \hat{z}_j \|^r \;\middle|\; \sum_{j=1}^{M} y_{i,j} = 1, \ \sum_{i=1}^{N} y_{i,j} = q_j N \right\} \right)^{1/r}, \qquad (1.10)

where q_j, j = 1, ⋯, M, are the probabilities of the discrete distribution values; y_{i,j} = 1 if element z_i of the original sample is assigned to element ẑ_j of the new distribution, and y_{i,j} = 0 otherwise; M is the number of values of the new distribution, M < N.

Now we briefly describe the algorithms we used to solve (1.10) and obtain a new approximate probability distribution. They are discussed more comprehensively in [

k-means algorithm:

1) Randomly choose M elements z ^ 1 * , ⋯ , z ^ M * from the source sample z 1 , ⋯ , z N .

2) Assign element i, i = 1, ⋯, N, of the sample z_1, ⋯, z_N to element j of ẑ*_1, ⋯, ẑ*_M if j is the solution of min_j ‖z_i − ẑ*_j‖.

3) Recalculate the cluster centers ẑ*_1, ⋯, ẑ*_M: ẑ*_j = (1/s_j) ∑_{k=1}^{s_j} z_k, where s_j is the number of elements of the sample z_1, ⋯, z_N assigned to cluster j at step 2, and the sum runs over those elements.

4) If the stopping condition is met, stop; otherwise return to step 2.

Note that we use the k-means modification from [
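The clustering step can be sketched with scikit-learn's standard `KMeans` (the cited modification is not reproduced here); the source sample and target size M are illustrative. The sketch returns the new support points ẑ_j and their probabilities q_j as cluster frequencies.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
z = rng.normal(10.0, 5.0, size=(1000, 1))   # source sample z_1, ..., z_N
M = 10                                       # target support size, M < N

# Steps 1-4: initialize M centers, assign, recompute, repeat until convergence.
km = KMeans(n_clusters=M, n_init=10, random_state=1).fit(z)
z_hat = km.cluster_centers_.ravel()          # new support points z_hat_j
counts = np.bincount(km.labels_, minlength=M)
q = counts / counts.sum()                    # probabilities q_j of each point
```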

Competitive learning:

The sample z_1, ⋯, z_N is drawn from the distribution F, and the initial approximations (ẑ_j^0)_{j=1}^M are chosen. Then the following update is applied for n = 1, ⋯, N and j = 1, ⋯, M:

\hat{z}_j^{n} = \begin{cases} \hat{z}_j^{n-1} + \alpha_n ( z_n - \hat{z}_j^{n-1} ), & \text{if } j = \arg\min_k \| z_n - \hat{z}_k^{n-1} \|^2 \\ \hat{z}_j^{n-1}, & \text{otherwise} \end{cases} \qquad (1.11)

where α_n is the step size, j = 1, ⋯, M; n = 1, ⋯, N. The values ẑ_j^N give the desired discrete distribution.

Voronoi cells sampling:

The sample z_1, ⋯, z_N is drawn from the distribution F, and the initial approximations (ẑ_j^0)_{j=1}^M are chosen. Then the following update is applied for n = 1, ⋯, N and j = 1, ⋯, M:

\hat{z}_j^{n} = \begin{cases} \hat{z}_j^{n-1} + \alpha_n ( z_n - \hat{z}_j^{n-1} ), & \text{if } j = \arg\min_k \| z_n - \hat{z}_k^{n-1} \|^2 \\ \hat{z}_j^{n-1}, & \text{otherwise} \end{cases} \qquad (1.12)

Further, put z ^ ′ j 0 = z ^ j 0 and:

\hat{z}_j^{\prime\, n} = \begin{cases} z_n, & \text{if } j = \arg\min_k \| z_n - \hat{z}_k^{n-1} \|^2 \\ \hat{z}_j^{\prime\, n-1}, & \text{otherwise} \end{cases} \qquad (1.13)
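Both sequential updates, the competitive-learning rule (1.11)/(1.12) and the Voronoi rule (1.13), can be sketched in a few lines. The sample, the number of centers M, and the step size α_n = 1/n are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
z = rng.normal(10.0, 5.0, size=1000)   # sample z_1, ..., z_N drawn from F
M = 10
z_hat = rng.choice(z, size=M, replace=False).astype(float)  # initial centers
z_vor = z_hat.copy()                   # Voronoi copies start from the same centers

for n, zn in enumerate(z, start=1):
    alpha = 1.0 / n                          # decreasing step size alpha_n
    j = int(np.argmin((zn - z_hat) ** 2))    # nearest center, as in (1.11)-(1.13)
    z_vor[j] = zn                            # Voronoi update (1.13): take the point
    z_hat[j] += alpha * (zn - z_hat[j])      # competitive-learning update (1.11)
```

Note that the winning index j is computed from the previous centers before either update, matching the ẑ_k^{n−1} in the argmin of all three formulas.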

Now consider the scenario lattice construction algorithm that was used.

1) First, the grid parameters are selected: the number of stages, the number of nodes at each stage, and the number of scenarios generated from each node.

2) Then, from each vertex of stage t, we generate M scenarios according to the relevant random processes.

3) From the M × N scenarios, where N is the number of vertices at stage t, N vertices are selected using one of the methods described in Section 3.

4) The probability of transition from vertex i of stage t to vertex j of stage t + 1 is calculated as the number of scenarios generated from vertex i and assigned to vertex j of stage t + 1, divided by the total number of scenarios generated from vertex i.

5) If t + 1 is not equal to T, where T is the number of stages, we return to step 2; otherwise, the grid is constructed.
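Steps 1-5 above can be sketched as follows, assuming for illustration a one-dimensional AR(1) demand process and k-means as the reduction method of step 3; all parameter values are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(3)

def next_demand(x):
    # Illustrative AR(1) step: X_t = 0.9 X_{t-1} + eps_t, eps_t ~ N(0, 1).
    return 0.9 * x + rng.normal(0.0, 1.0)

T, N, M = 4, 10, 100   # step 1: stages, nodes per stage, scenarios per node

stages = [np.array([10.0])]          # stage-1 root node
probs = []                           # transition matrices between stages
for t in range(1, T):
    nodes = stages[-1]
    # Step 2: generate M successor scenarios from every node of stage t.
    succ = np.array([[next_demand(x) for _ in range(M)] for x in nodes])
    # Step 3: reduce the M * len(nodes) scenarios to N nodes (k-means here).
    km = KMeans(n_clusters=N, n_init=10, random_state=3).fit(succ.reshape(-1, 1))
    new_nodes = km.cluster_centers_.ravel()
    # Step 4: transition probability from node i to node j = fraction of node i's
    # successors that fall in cluster j.
    labels = km.labels_.reshape(len(nodes), M)
    P = np.array([np.bincount(row, minlength=N) / M for row in labels])
    stages.append(new_nodes)
    probs.append(P)
    # Step 5: the loop continues until all T stages are built.
```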

The lattice obtained with this algorithm can be illustrated as follows (

In

To compare k-means, competitive learning, and Voronoi cell sampling algorithms, we will use them to solve “The problem of commodities production, storage and selling” that can be formulated as follows.

Let T be the time interval over which we consider the problem; k is the number of types of goods; p_i is the cost of producing a single item of type i; v_i is the maximum number of items of type i that can be produced in one day; x_t^i is the number of goods of type i produced on day t; s_i is the selling price for goods of type i; δ_t^i is the demand for goods of type i on day t; r_t^i is the number of unsold goods of type i in storage on day t; c_i is the cost of storage of goods of type i.

The goal is to find a strategy (amount of produced goods) that leads to maximum income.

Income on day t: \sum_{i=1}^{k} \left( -p_i x_t^i + s_i w_t^i - c_i r_t^i \right)

Amount of goods sold on day t: w_t^i = \min\left( \delta_t^i, x_t^i + r_{t-1}^i \right), \ i = 1, \cdots, k

Amount of goods in storage at the end of day t: r_t^i = r_{t-1}^i + x_t^i - w_t^i, \ i = 1, \cdots, k
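The three quantities above combine into a one-day update; the prices, costs, production, and demand values below are illustrative, not the values from our case problem.

```python
import numpy as np

# Illustrative data for two commodity types (all numbers are assumptions).
p = np.array([150.0, 80.0])    # production costs p_i
s = np.array([200.0, 100.0])   # selling prices s_i
c = np.array([30.0, 3.0])      # storage costs c_i

def day_step(x, delta, r_prev):
    """One day of the model: production x, demand delta, carried stock r_prev."""
    w = np.minimum(delta, x + r_prev)         # amount sold w_t
    r = r_prev + x - w                        # stock at the end of the day r_t
    income = np.sum(-p * x + s * w - c * r)   # income on the day
    return income, r

income, r = day_step(x=np.array([5.0, 10.0]),
                     delta=np.array([3.0, 12.0]),
                     r_prev=np.array([0.0, 1.0]))
# → income = 90.0, r = [2.0, 0.0]
```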

CVaR is used as the risk measure; the optimization criterion is a weighted sum of expected profit and CVaR: (1 − λ)E[P] + λ CVaR.
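A sketch of this criterion on a sample of profit outcomes, taking the empirical CVaR as the average of the worst α-fraction of outcomes; the values of α and λ below are illustrative.

```python
import numpy as np

def cvar(profits, alpha=0.05):
    # Empirical CVaR of profit: average of the worst alpha-fraction of outcomes.
    k = max(1, int(np.ceil(alpha * len(profits))))
    return np.sort(profits)[:k].mean()

def criterion(profits, lam=0.5, alpha=0.05):
    # Weighted sum of expected profit and CVaR: (1 - lam) E[P] + lam CVaR.
    return (1 - lam) * np.mean(profits) + lam * cvar(profits, alpha)
```

For example, for profits 1, ..., 10 with α = 0.2 and λ = 0.5, the worst 20% of outcomes are {1, 2}, so the criterion is 0.5 · 5.5 + 0.5 · 1.5 = 3.5.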

Formally, the dynamic programming equations look as follows:

for t = T , ⋯ , 2

Q_t(r_{t-1}, \delta_t) = \min_{\substack{x_t^i \le v_i, \ w_t^i \le \delta_t^i \\ w_t^i \le x_t^i + r_{t-1}^i \\ r_t^i = r_{t-1}^i + x_t^i - w_t^i \\ x_t^i \ge 0, \ u_t \ge 0}} \left\{ \sum_{i=1}^{k} \left( -p_i x_t^i + s_i w_t^i - c_i r_t^i \right) + \lambda_{t+1} u_t + Q_{t+1}(r_t, u_t, \delta_t) \right\}

where

Q_t(r_{t-1}, u_{t-1}, \delta_{t-1}) = E\left\{ (1 - \lambda_t) Q_t(r_{t-1}, \delta_t) + \frac{\lambda_t}{\alpha_{t-1}} \left[ Q_t(r_{t-1}, \delta_t) - u_{t-1} \right]^{+} \,\middle|\, \delta_{t-1} \right\} (Q_{T+1}(\cdot, \cdot, \cdot) \equiv 0 by definition), r_t = (r_t^1, \cdots, r_t^k) and \delta_t = (\delta_t^1, \cdots, \delta_t^k).

for t = 1

\min_{\substack{x_1^i \le v_i, \ w_1^i \le \delta_1^i \\ w_1^i \le x_1^i + r_0^i \\ r_1^i = r_0^i + x_1^i - w_1^i \\ x_1^i \ge 0, \ u_1 \ge 0}} \left\{ \sum_{i=1}^{k} \left( -p_i x_1^i + s_i w_1^i - c_i r_1^i \right) + \lambda_2 u_1 + Q_2(r_1, u_1, \delta_1) \right\}

For numerical experiments, we considered four different stochastic processes for the demand for goods.

1) Autoregressive model (AR):

X_t = c + \sum_{i=1}^{p} \alpha_i X_{t-i} + \varepsilon_t,

where c is a constant, α i are the model parameters, ε t ~ N ( 0 , σ 2 )

2) Autoregressive moving-average model (ARMA):

X_t = c + \sum_{i=1}^{p} \alpha_i X_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \beta_j \varepsilon_{t-j},

where c is a constant, α i , β j are the model parameters, ε t ~ N ( 0 , σ 2 )

3) Geometric Brownian motion (GBM):

dS_t = \mu S_t \, dt + \sigma S_t \, dW_t,

where μ , σ are process parameters, W t is the Wiener process.

4) Stage independent normal distribution (SIND):

X_t \sim N(\mu, \sigma^2)

We are using the processes with chosen parameters, so in our case, the processes look as follows:

1) GBM: dX_t = 5 X_t \, dt + 0.1 X_t \, dW_t;

2) AR(1): X_t = c + 0.9 X_{t-1} + \varepsilon_t, \ \varepsilon_t \sim N(0, \sigma^2), \ \sigma = 1;

3) ARMA(1,1): X_t = c + \varepsilon_t + 0.9 X_{t-1} + 0.15 \varepsilon_{t-1}, \ \varepsilon_t \sim N(0, \sigma^2), \ \sigma = 1;

4) SIND: X_t \sim N(\mu, \sigma^2), \ \mu = 10, \ \sigma = 5.
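The four demand processes can be simulated as below. The AR/ARMA constant c = 0, the Euler discretization of the GBM with step dt = 1/365, and the starting value x0 are our illustrative assumptions, not specifications from the paper.

```python
import numpy as np

rng = np.random.default_rng(4)

def simulate(process, T=7, x0=10.0):
    """Simulate T days of one of the four demand processes (assumed parameters)."""
    x = np.empty(T)
    x[0] = x0
    eps_prev = 0.0
    for t in range(1, T):
        eps = rng.normal(0.0, 1.0)
        if process == "AR":        # X_t = 0.9 X_{t-1} + eps_t (c = 0 assumed)
            x[t] = 0.9 * x[t - 1] + eps
        elif process == "ARMA":    # X_t = 0.9 X_{t-1} + eps_t + 0.15 eps_{t-1}
            x[t] = 0.9 * x[t - 1] + eps + 0.15 * eps_prev
        elif process == "GBM":     # Euler step of dX = 5 X dt + 0.1 X dW
            dt = 1.0 / 365.0
            x[t] = x[t - 1] * (1 + 5 * dt + 0.1 * np.sqrt(dt) * eps)
        elif process == "SIND":    # stage-independent draw from N(10, 5^2)
            x[t] = rng.normal(10.0, 5.0)
        eps_prev = eps
    return x
```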

The experiments were organized as follows. First, for every combination of process (AR, ARMA, GBM, SIND) and scenario grid construction algorithm, we simulated the stochastic process several times. We then checked whether the results came from a normal distribution using the Shapiro-Wilk test and quantile-quantile (Q-Q) plots, which show the relationship between the observed data and the theoretical quantiles, and compared the results using a one-sided t-test. The normality check is necessary because we use the t-test to compare the average profits obtained with the different lattice construction methods. All the parameters of our case problem are in
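This statistical comparison can be sketched with SciPy; the two profit samples below are synthetic stand-ins, whereas the real experiments use the simulated profits of each lattice construction method.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
# Stand-ins for the simulated profit samples of two lattice-construction methods.
profits_a = rng.normal(26_000, 10_000, size=100)
profits_b = rng.normal(35_000, 11_000, size=100)

# Shapiro-Wilk normality check for each sample (null hypothesis: data are normal).
_, p_a = stats.shapiro(profits_a)
_, p_b = stats.shapiro(profits_b)

# Two-sample t-test; halve the two-sided p-value for a one-sided comparison
# of H1: mean(profits_a) < mean(profits_b).
t_stat, p_two = stats.ttest_ind(profits_a, profits_b, equal_var=False)
p_one = p_two / 2 if t_stat < 0 else 1 - p_two / 2
```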

For the SDDP algorithm, we used the following optimization criterion:

0.5 E [ P ] + 0.5 CVaR

Our lattice construction parameters:

The number of nodes at each stage is 10; the number of scenarios generated from each node during lattice construction is 100.

To perform the experiments, we were using a computer with an Intel Core i5-6300HQ processor running at 2.30 GHz.

We used Python 3.7.3 to generate the scenario lattice and Julia 1.3.1 to run the SDDP algorithm; we implemented SDDP ourselves. For details of the SDDP implementation, please refer to [

The full list and versions of used packages are in

| Number of commodities | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Production costs p_i | 150 | 80 | 30 | 17 | 40 | 10 | 30 | 10 | 20 |
| Selling price s_i | 200 | 100 | 40 | 22 | 70 | 13 | 60 | 12 | 27 |
| Storage cost c_i | 30 | 3 | 4 | 2 | 10 | 2 | 7 | 1.5 | 4 |
| Maximum production volume v_i | 10 | 20 | 40 | 70 | 100 | 120 | 160 | 200 | 300 |

| Package | Version |
| --- | --- |
| Numpy | 1.16.2 |
| Pandas | 0.24.2 |
| Scipy | 1.2.1 |
| Sklearn | 0.22.1 |

| Package | Version |
| --- | --- |
| JuMP | 0.21.1 |
| Clp | 0.7.1 |
| Distributions | 0.22.5 |
| CSV | 0.5.26 |

First of all, it is necessary to compare the lattice construction time for every process and lattice construction algorithm, because running time plays a crucial role in computation.

As we can see (

As we can see from the tables and plots, our results came from a normal distribution (Figures 4-6 and

In this case, our results are also normal (see Figures 7-9 and

For GBM process, our results are slightly different from the normal distribution but not so much (Figures 10-12 and

For the Competitive learning method, our results slightly differ from the normal distribution but not so much (Figures 13-15 and

| | K-means | Competitive learning | Voronoi |
| --- | --- | --- | --- |
| Mean | 26,390.746 | 26,390.623 | 35,290.404 |
| SD | 10,196.322 | 10,196.386 | 11,703.831 |
| Shapiro-Wilk p-value | 0.688 | 0.688 | 0.182 |

| | K-means | Competitive learning | Voronoi |
| --- | --- | --- | --- |
| K-means | - | 0.499 | 0.001 |
| Competitive learning | 0.499 | - | 0.001 |
| Voronoi | 0.001 | 0.001 | - |

| | K-means | Competitive learning | Voronoi |
| --- | --- | --- | --- |
| Mean | 50,931.03 | −44,927.187 | 50,900.551 |
| SD | 33,633.708 | 59,910.994 | 33,635.331 |
| Shapiro-Wilk p-value | 0.548 | 0.091 | 0.542 |

| | K-means | Competitive learning | Voronoi |
| --- | --- | --- | --- |
| K-means | - | 8.17e−10 | 0.498 |
| Competitive learning | 8.17e−10 | - | 8.24e−10 |
| Voronoi | 0.498 | 8.24e−10 | - |

| | K-means | Competitive learning | Voronoi |
| --- | --- | --- | --- |
| Mean | 166,051.19 | 166,163.13 | 165,994.99 |
| SD | 52,557.44 | 52,572.10 | 52,972.71 |
| Shapiro-Wilk p-value | 0.029 | 0.036 | 0.037 |

| | K-means | Competitive learning | Voronoi |
| --- | --- | --- | --- |
| K-means | - | 0.496 | 0.498 |
| Competitive learning | 0.496 | - | 0.495 |
| Voronoi | 0.498 | 0.495 | - |

| | K-means | Competitive learning | Voronoi |
| --- | --- | --- | --- |
| Mean | 26,243.39 | 8323.40 | 27,525.61 |
| SD | 10,142.48 | 14,519.86 | 20,496.62 |
| Shapiro-Wilk p-value | 0.713 | 0.019 | 0.099 |

| | K-means | Competitive learning | Voronoi |
| --- | --- | --- | --- |
| K-means | - | 7.08e−07 | 0.382 |
| Competitive learning | 7.08e−07 | - | 6.85e−05 |
| Voronoi | 0.382 | 6.85e−05 | - |

As the numerical results show, in our experimental problem the Voronoi method slightly outperforms the others, but the k-means method is much faster.

1) With the AR process, the Voronoi method performs better than k-means and Competitive learning, which both show almost the same result.

2) With the ARMA process, the Voronoi method is comparable to k-means, while Competitive learning underperforms considerably.

3) With the GBM process, the results of the methods are close to each other.

4) Scenario lattice construction time was about the same for the Competitive learning and Voronoi methods but much lower for k-means.

The authors declare no conflicts of interest regarding the publication of this paper.

Golembiovsky, D., Pavlov, A. and Daniil, S. (2021) Experimental Study of Methods of Scenario Lattice Construction for Stochastic Dual Dynamic Programming. Open Journal of Optimization, 10, 47-60. https://doi.org/10.4236/ojop.2021.102004