
In this paper, an efficient computational approach is proposed for solving the discrete-time nonlinear stochastic optimal control problem. For this purpose, a linear quadratic regulator model, which is a linear dynamical system with a quadratic cost criterion, is employed. In our approach, the model-based optimal control problem is reformulated into a set of input-output equations, from which the Hankel matrix and the observability matrix are constructed. The sum of squares of the output error is then defined, and a least squares optimization problem is introduced so that the difference between the real output and the model output can be minimized. Applying the first-order derivative to the sum of squares of the output error, the necessary optimality condition is derived. After some algebraic manipulation, the optimal control law is obtained. By substituting this control policy into the input-output equations, the model output is updated iteratively. For illustration, a direct current to alternating current converter problem is studied. As a result, the model output trajectory of the least squares solution is close to the real output, with the smallest sum of squares of the output error. In conclusion, the results demonstrate the efficiency and accuracy of the proposed approach.

A stochastic dynamical system is a practical tool for modeling and simulating real-world problems. The fluctuating behavior caused by noise disturbances in the dynamical system, introduced to represent the real situation, has attracted the attention of many researchers. See, for example, [

Recently, the integrated optimal control and parameter estimation (IOCPE) algorithm, which solves the linear model-based optimal control problem iteratively, was proposed in the literature [

In this paper, an efficient matching scheme, which reduces the number of adjusted parameters, is deliberately established. In our approach, a model-based optimal control problem is obtained by simplifying the discrete-time nonlinear stochastic optimal control problem. This model-based optimal control problem is then reformulated into a set of input-output equations; during this formulation, the Hankel matrix and the observability matrix are constructed. These matrices embed the characteristics of the model into the output measurement. On this basis, a least squares optimization problem is introduced. From the first-order necessary condition, the normal equation results, and the optimal control law is updated accordingly through a recursion formula. As a result, the sum of squares of the output error is demonstrably minimized, which shows the efficiency of the proposed algorithm.

The rest of the paper is organized as follows. In Section 2, the discrete-time nonlinear stochastic optimal control problem and its simplified model-based optimal control problem are described. In Section 3, the system optimization with the least squares updating scheme is discussed, and the computation procedure is summarized as an iterative algorithm. In Section 4, the DC/AC converter problem is illustrated and the results are discussed. Finally, some concluding remarks are made.

Consider a general class of dynamical system given by

x ( k + 1 ) = f ( x ( k ) , u ( k ) , k ) + G ω ( k ) (1a)

y ( k ) = h ( x ( k ) , k ) + η ( k ) (1b)

where u ( k ) ∈ ℜ m , k = 0 , 1 , ⋯ , N − 1 , x ( k ) ∈ ℜ n , k = 0 , 1 , ⋯ , N , and

y ( k ) ∈ ℜ p , k = 0 , 1 , ⋯ , N , are, respectively, the control sequence, the state sequence and the output sequence. The terms ω ( k ) ∈ ℜ q , k = 0 , 1 , ⋯ , N − 1 , and η ( k ) ∈ ℜ p , k = 0 , 1 , ⋯ , N , are, respectively, the process noise sequence and the output noise sequence. Both are stationary Gaussian white noise sequences with zero mean, and their covariance matrices are given by Q ω and R η , respectively, where Q ω is a q × q positive definite matrix and R η is a p × p positive definite matrix. In addition, G is an n × q process noise coefficient matrix, f : ℜ n × ℜ m × ℜ → ℜ n represents the plant dynamics, and h : ℜ n × ℜ → ℜ p is the output measurement channel.
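As a rough illustration of how a sample path of the system (1a)-(1b) can be generated, consider the following sketch. The dynamics `f`, the output map `h`, and all numerical values are illustrative placeholders, not the plant studied later in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

n, m, p, q, N = 2, 1, 1, 2, 50
G = np.eye(n)                       # process noise coefficient matrix (n x q)
Q_w = 1e-2 * np.eye(q)              # process noise covariance, q x q positive definite
R_eta = 1e-1 * np.eye(p)            # output noise covariance, p x p positive definite

def f(x, u, k):                     # placeholder plant dynamics (not the paper's plant)
    return 0.9 * x + np.array([u[0], 0.1 * u[0]])

def h(x, k):                        # placeholder output measurement channel
    return x[:1]

x = np.array([0.1, 0.0])            # initial state x(0)
ys = []
for k in range(N):
    omega = rng.multivariate_normal(np.zeros(q), Q_w)   # omega(k)
    eta = rng.multivariate_normal(np.zeros(p), R_eta)   # eta(k)
    ys.append(h(x, k) + eta)        # y(k) = h(x(k), k) + eta(k)
    u = np.zeros(m)                 # zero control, for illustration only
    x = f(x, u, k) + G @ omega      # x(k+1) = f(x(k), u(k), k) + G omega(k)

print(len(ys))
```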

Here, the aim is to find the control sequence u ( k ) , k = 0 , 1 , ⋯ , N − 1 , such that the following cost function

g 0 ( u ) = E [ φ ( x ( N ) , N ) + ∑ k = 0 N − 1 L ( x ( k ) , u ( k ) , k ) ] (2)

is minimized over the dynamical system in Equation (1), where φ : ℜ n × ℜ → ℜ is the terminal cost and L : ℜ n × ℜ m × ℜ → ℜ is the cost under summation. The cost function g 0 is the scalar function and E [ ⋅ ] is the expectation operator. It is assumed that all functions in Equations (1) and (2) are continuously differentiable with respect to their respective arguments.

The initial state is

x ( 0 ) = x 0

where x 0 ∈ ℜ n is a random vector with mean and variance given, respectively, by

E [ x 0 ] = x ¯ 0 and E [ ( x 0 − x ¯ 0 ) ( x 0 − x ¯ 0 ) T ] = M 0 .

Here, M 0 is an n × n positive definite matrix. It is assumed that the initial state, the process noise and the measurement noise are statistically independent.

This problem is regarded as the discrete-time nonlinear stochastic optimal control problem and is referred to as Problem (P). Notice that the structure of Problem (P) is complex, and its exact solution cannot, in general, be obtained. In view of this, Problem (P) is solved by solving a simplified model-based optimal control problem iteratively. This simplified model-based optimal control problem, referred to as Problem (M), is given below:

min u ( k ) g 1 ( u ) = 1 2 x ¯ ( N ) T S ( N ) x ¯ ( N ) + 1 2 ∑ k = 0 N − 1 ( x ¯ ( k ) T Q x ¯ ( k ) + u ( k ) T R u ( k ) )

subject to (3)

x ¯ ( k + 1 ) = A x ¯ ( k ) + B u ( k ) , x ¯ ( 0 ) = x ¯ 0

y ¯ ( k ) = C x ¯ ( k )

where x ¯ ( k ) ∈ ℜ n , k = 0 , 1 , ⋯ , N , and y ¯ ( k ) ∈ ℜ p , k = 0 , 1 , ⋯ , N , are, respectively, the expected state sequence and the expected output sequence. Here, A is an n × n state transition matrix, B is an n × m control coefficient matrix, and C is a p × n output coefficient matrix, whereas S ( N ) and Q are n × n positive semi-definite matrices, R is an m × m positive definite matrix, and g 1 is the scalar cost function.

Notice that solving Problem (M) alone would not give the optimal solution of Problem (P). However, by establishing an efficient matching scheme, it is possible to approximate the true optimal solution of Problem (P) in spite of model-reality differences. This can be done iteratively.

Now, let us define the Hamiltonian function for Problem (M) as follows:

H ( k ) = 1 2 ( x ¯ ( k ) T Q x ¯ ( k ) + u ( k ) T R u ( k ) ) + p ( k + 1 ) T ( A x ¯ ( k ) + B u ( k ) ) . (4)

Then, from Equation (3), the augmented cost function becomes

g ′ 1 ( u ) = 1 2 x ¯ ( N ) T S ( N ) x ¯ ( N ) + p ( 0 ) T x ¯ ( 0 ) − p ( N ) T x ¯ ( N ) + ∑ k = 0 N − 1 ( H ( k ) − p ( k ) T x ¯ ( k ) ) (5)

where p ( k ) ∈ ℜ n is the appropriate multiplier to be determined later.

Applying the calculus of variations [

1) Stationary condition:

∂ H ( k ) ∂ u ( k ) = R u ( k ) + B T p ( k + 1 ) = 0 (6)

2) Costate equation:

∂ H ( k ) ∂ x ¯ ( k ) = Q x ¯ ( k ) + A T p ( k + 1 ) = p ( k ) (7)

3) State equation:

∂ H ( k ) ∂ p ( k + 1 ) = A x ¯ ( k ) + B u ( k ) = x ¯ ( k + 1 ) (8)

with the boundary conditions x ¯ ( 0 ) = x ¯ 0 and p ( N ) = S ( N ) x ¯ ( N ) .

The following theorem gives the state-feedback control law that can be used in solving Problem (M).

Theorem 1. For the given Problem (M), the optimal control law is the feedback control law defined by

u ( k ) = − K ( k ) x ¯ ( k ) (9)

where

K ( k ) = ( B T S ( k + 1 ) B + R ) − 1 B T S ( k + 1 ) A (10)

S ( k ) = A T S ( k + 1 ) ( A − B K ( k ) ) + Q (11)

with the boundary condition S ( N ) given. Here, the feedback control law is a linear combination of the states; that is, the optimal control is linear state-variable feedback.
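The backward recursion of Theorem 1 can be sketched as follows; the matrices A, B, Q, R and the horizon N below are small illustrative values, not data from the paper.

```python
import numpy as np

A = np.array([[1.0, 0.1],
              [0.0, 1.0]])       # illustrative state transition matrix
B = np.array([[0.0], [0.1]])     # illustrative control coefficient matrix
Q = np.eye(2)                    # state weighting, positive semi-definite
R = np.array([[1.0]])            # control weighting, positive definite
N = 20                           # horizon

S = [None] * (N + 1)
K = [None] * N
S[N] = np.eye(2)                 # boundary condition S(N)
for k in range(N - 1, -1, -1):
    # Equation (10): K(k) = (B^T S(k+1) B + R)^{-1} B^T S(k+1) A
    K[k] = np.linalg.solve(B.T @ S[k + 1] @ B + R, B.T @ S[k + 1] @ A)
    # Equation (11): S(k) = A^T S(k+1) (A - B K(k)) + Q
    S[k] = A.T @ S[k + 1] @ (A - B @ K[k]) + Q

# With the optimal K(k), each S(k) stays symmetric positive definite
print(bool(np.allclose(S[0], S[0].T)))
```

With the optimal gain substituted, Equation (11) coincides with the standard Riccati difference equation, which is why symmetry and positive definiteness are preserved along the backward sweep.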

Proof. From Equation (6), the stationary condition can be rewritten as

R u ( k ) = − B T p ( k + 1 ) . (12)

Applying the sweep method [

p ( k ) = S ( k ) x ¯ ( k ) , (13)

and substituting Equation (13), evaluated at k + 1 , into Equation (12) yields

R u ( k ) = − B T S ( k + 1 ) x ¯ ( k + 1 ) . (14)

Taking Equation (8) into Equation (14), we have

R u ( k ) = − B T S ( k + 1 ) ( A x ¯ ( k ) + B u ( k ) ) .

After some algebraic manipulations, the feedback control law (9) is obtained, where Equation (10) is satisfied.

Now, substituting Equation (13), evaluated at k + 1 , into Equation (7), the costate equation is written as

p ( k ) = Q x ¯ ( k ) + A T S ( k + 1 ) x ¯ ( k + 1 ) (15)

and considering the state Equation (8) in Equation (15), the costate equation becomes

p ( k ) = Q x ¯ ( k ) + A T S ( k + 1 ) ( A x ¯ ( k ) + B u ( k ) ) . (16)

Hence, by applying the feedback control law (9) in Equation (16), and doing some algebraic manipulations, it could be seen that Equation (11) is satisfied after comparing the manipulation result to Equation (13). This completes the proof.

By substituting Equation (9) into Equation (8), the state equation becomes

x ¯ ( k + 1 ) = ( A − B K ( k ) ) x ¯ ( k ) (17)

and the model output is measured from

y ¯ ( k ) = C x ¯ ( k ) . (18)

In view of this, the calculation procedure for obtaining the feedback control law for Problem (M) is summarized below:

Algorithm 1: Feedback control algorithm

Data Given A , B , C , Q , R , S ( N ) , x 0 , N .

Step 0 Calculate K ( k ) , k = 0 , 1 , ⋯ , N − 1 , and S ( k ) , k = 0 , 1 , ⋯ , N , from Equations (10) and (11), respectively.

Step 1 Solve Problem (M) that is defined by Equation (3) to obtain

u ( k ) , k = 0 , 1 , ⋯ , N − 1 , and x ¯ ( k ) , y ¯ ( k ) , k = 0 , 1 , ⋯ , N , respectively, from Equations (9), (17) and (18).

Step 2 Evaluate the cost function g 1 from Equation (3).

Remarks:

1) Data A, B, C can be obtained by the linearization of the real plant f and the output measurement h from Problem (P).

2) In Step 0, the offline calculation is done for K ( k ) , k = 0 , 1 , ⋯ , N − 1 , and S ( k ) , k = 0 , 1 , ⋯ , N .

3) The solution procedure, in which the dynamical system is solved in Step 1 and the cost function is evaluated in Step 2, is known as system optimization.

Now, we define the output error r given by

r ( u ) = y ( k ) − y ¯ ( k ) , k = 0 , 1 , ⋯ , N , (19)

which, stacked over all time steps, is treated as a vector-valued function of the control sequence u ,

where the model output (18) is reformulated as

y ¯ ( k ) = C x ¯ ( k ) + D u ( k ) , (20)

with D a p × m coefficient matrix. Equation (20) can be written as the following input-output equations [

$$\begin{bmatrix} \bar{y}(0) \\ \bar{y}(1) \\ \bar{y}(2) \\ \vdots \\ \bar{y}(N) \end{bmatrix} = \begin{bmatrix} C \\ CA \\ CA^{2} \\ \vdots \\ CA^{N} \end{bmatrix} \bar{x}_{0} + \begin{bmatrix} D & 0 & 0 & \cdots & 0 \\ CB & D & 0 & \cdots & 0 \\ CAB & CB & D & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ CA^{N-1}B & CA^{N-2}B & CA^{N-3}B & \cdots & D \end{bmatrix} \begin{bmatrix} u(0) \\ u(1) \\ u(2) \\ \vdots \\ u(N-1) \end{bmatrix}. \quad (21)$$

For simplicity, we have

y ¯ = E x ¯ 0 + F u (22)

where

$$E = \begin{bmatrix} C \\ CA \\ CA^{2} \\ \vdots \\ CA^{N} \end{bmatrix} \quad \text{and} \quad F = \begin{bmatrix} D & 0 & 0 & \cdots & 0 \\ CB & D & 0 & \cdots & 0 \\ CAB & CB & D & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ CA^{N-1}B & CA^{N-2}B & CA^{N-3}B & \cdots & D \end{bmatrix}.$$
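The matrices E and F can be assembled directly from A, B, C and D, and the stacked relation (22) checked against the step-by-step model (20). The numerical data below are illustrative, not taken from the paper.

```python
import numpy as np

A = np.array([[0.9, 0.1],
              [0.0, 0.8]])           # illustrative model matrices
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
D = np.zeros((1, 1))
n, m, p, N = 2, 1, 1, 5

# E stacks C A^k for k = 0, ..., N (extended observability matrix)
E = np.vstack([C @ np.linalg.matrix_power(A, k) for k in range(N + 1)])

# F: D on the diagonal, C A^{i-j-1} B below it, zeros above
F = np.zeros(((N + 1) * p, N * m))
for i in range(N + 1):
    for j in range(N):
        if i == j:
            F[i*p:(i+1)*p, j*m:(j+1)*m] = D
        elif i > j:
            F[i*p:(i+1)*p, j*m:(j+1)*m] = C @ np.linalg.matrix_power(A, i - j - 1) @ B

# Check Equation (22) against the step-by-step recursion
x0 = np.array([0.1, 0.0])
u = np.arange(N, dtype=float).reshape(N, m)
y_stacked = E @ x0 + F @ u.reshape(-1)

x = x0.copy(); y_ref = []
for k in range(N + 1):
    uk = u[k] if k < N else np.zeros(m)   # no control term acts on y(N)
    y_ref.append(C @ x + D @ uk)
    x = A @ x + B @ uk
y_ref = np.concatenate(y_ref)
print(bool(np.allclose(y_stacked, y_ref)))
```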

In addition, define the objective function g 2 : ℜ m → ℜ , which represents the sum of squares of the errors (SSE), given by

g 2 ( u ) = r ( u ) T r ( u ) . (23)

Then, an optimization problem, which is referred to as Problem (O), is stated as follows:

Problem (O):

Find a set of the control sequence u ( k ) , k = 0 , 1 , ⋯ , N − 1 , such that the objective function g 2 is minimized.

To solve Problem (O), Taylor’s theorem [

g 2 ( u ( i + 1 ) ) ≈ g 2 ( u ( i ) ) + ( u ( i + 1 ) − u ( i ) ) T ∇ g 2 ( u ( i ) ) + 1 2 ( u ( i + 1 ) − u ( i ) ) T ( ∇ 2 g 2 ( u ( i ) ) ) ( u ( i + 1 ) − u ( i ) ) (24)

where the higher-order terms are ignored and the notation ∇ represents the differential operator. The first-order condition in Equation (24) with respect to u ( i + 1 ) is expressed by

0 ≈ ∇ g 2 ( u ( i ) ) + ( ∇ 2 g 2 ( u ( i ) ) ) ( u ( i + 1 ) − u ( i ) ) . (25)

Rearrange Equation (25) to yield the normal equation,

( ∇ 2 g 2 ( u ( i ) ) ) ( u ( i + 1 ) − u ( i ) ) = − ∇ g 2 ( u ( i ) ) . (26)

Notice that the gradient ∇ g 2 is calculated from

∇ g 2 ( u ( i ) ) = 2 ∇ r ( u ( i ) ) T r ( u ( i ) ) (27)

and the Hessian of g 2 is computed from

∇ 2 g 2 ( u ( i ) ) = 2 ( ∇ 2 r ( u ( i ) ) T r ( u ( i ) ) + ∇ r ( u ( i ) ) T ∇ r ( u ( i ) ) ) (28)

where ∇ r ( u ( i ) ) is the Jacobian matrix of r ( u ( i ) ) , and its entries are denoted by

( ∇ r ( u ( i ) ) ) j l = ∂ r j / ∂ u l ( u ( i ) ) . Since r ( u ) = y − ( E x ¯ 0 + F u ) , the Jacobian is the constant matrix ∇ r ( u ( i ) ) = − F . (29)

From Equations (27) and (28), Equation (26) can be rewritten as

( ∇ 2 r ( u ( i ) ) T r ( u ( i ) ) + ∇ r ( u ( i ) ) T ∇ r ( u ( i ) ) ) ( u ( i + 1 ) − u ( i ) ) = − ∇ r ( u ( i ) ) T r ( u ( i ) ) . (30)

By ignoring the second-order derivative term, that is, the first term on the left-hand side of Equation (30), and defining

Δ u ( i ) = u ( i + 1 ) − u ( i ) , (31)

the normal equation given by Equation (26) is simplified to

( ∇ r ( u ( i ) ) T ∇ r ( u ( i ) ) ) Δ u ( i ) = − ∇ r ( u ( i ) ) T r ( u ( i ) ) . (32)

Then, we obtain the following updating recurrence relation,

u ( i + 1 ) = u ( i ) + Δ u ( i ) (33)

with the initial u ( 0 ) given, where

Δ u ( i ) = − ( ∇ r ( u ( i ) ) T ∇ r ( u ( i ) ) ) − 1 ( ∇ r ( u ( i ) ) T r ( u ( i ) ) ) . (34)

Hence, Equations (33) and (34) are known as the least squares recursive equations, which are based on the Gauss-Newton recursion formula.
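Since the residual r ( u ) is linear in u with a constant Jacobian, a single Gauss-Newton step (33)-(34) already reaches the least squares minimizer. A sketch with illustrative data (note that under definition (19) the Jacobian of r with respect to u is −F; the term E x ¯ 0 is absorbed into the stacked output y here):

```python
import numpy as np

rng = np.random.default_rng(1)
F = rng.standard_normal((8, 4))   # illustrative stacked input-output map, full column rank
y = rng.standard_normal(8)        # illustrative "real" stacked output (E x0 absorbed)

u = np.zeros(4)                   # initial control sequence u^(0)
r = y - F @ u                     # output error (19), stacked
J = -F                            # Jacobian of r with respect to u
# Simplified normal equation (32): (J^T J) du = -J^T r
du = np.linalg.solve(J.T @ J, -J.T @ r)
u = u + du                        # Gauss-Newton update (33)

# At the least squares minimizer the gradient (27) vanishes: J^T r = 0
r_new = y - F @ u
print(bool(np.allclose(J.T @ r_new, 0)))
```

For a residual that is genuinely nonlinear in u, the same step would be repeated until the change Δu falls below the tolerance, exactly as Algorithm 2 below prescribes.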

From the discussion above, the least-squares updating scheme for the control sequence is summarized below:

Algorithm 2: Least-squares updating scheme

Step 0 Given an initial u ( 0 ) and the tolerance ε . Set i = 0 .

Step 1 Evaluate the output error r ( u ( i ) ) and the Jacobian matrix ∇ r ( u ( i ) ) from Equations (19) and (29), respectively.

Step 2 Solve the normal equation from Equation (32) to obtain Δ u ( i ) .

Step 3 Update the control sequence by using Equation (33). If u ( i + 1 ) = u ( i ) within the given tolerance ε , stop; otherwise, set i = i + 1 and repeat from Step 1.

Remarks:

1) In Step 1, the calculations of the output error r ( u ( i ) ) and the Jacobian matrix ∇ r ( u ( i ) ) are done online; however, the Jacobian matrix can be computed offline if it is independent of u ( i ) .

2) In Step 2, the inverse of ∇ r ( u ( i ) ) T ∇ r ( u ( i ) ) must exist, and the value of Δ u ( i ) represents the step size for the control set-point.

3) In Step 3, the initial u ( 0 ) is calculated from Equation (9). The condition u ( i + 1 ) = u ( i ) is required, within the tolerance, for the converged optimal control sequence. The following 2-norm is computed and compared with the given tolerance to verify the convergence of u ( k ) :

$$\| u^{(i+1)} - u^{(i)} \| = \left( \sum_{k=0}^{N-1} \| u^{(i+1)}(k) - u^{(i)}(k) \|^{2} \right)^{1/2}. \quad (35)$$

4) In order to provide a convergence mechanism for the state sequence, a simple relaxation method is employed:

x ¯ ( i + 1 ) = x ¯ ( i ) + k x ( x − x ¯ ( i ) ) (36)

where k x ∈ ( 0 , 1 ] , and x is the state sequence of the real plant.
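The stopping test (35) and the relaxation (36) can be sketched as follows; note that the summands under the 1/2 power are squared norms, so the expression is a genuine 2-norm over the whole control sequence. All data are illustrative.

```python
import numpy as np

N = 10
u_old = np.zeros((N, 1))          # u^(i)
u_new = 1e-6 * np.ones((N, 1))    # u^(i+1)
eps = 1e-4                        # tolerance

# Equation (35): 2-norm of the change over the whole control sequence,
# with squared norms inside the sum
diff = np.sqrt(sum(np.linalg.norm(u_new[k] - u_old[k]) ** 2 for k in range(N)))
converged = bool(diff < eps)

# Equation (36): relaxation of the model state toward the real-plant state x
k_x = 0.5                          # relaxation factor in (0, 1]
x = np.ones((N + 1, 2))            # real-plant state sequence (illustrative)
x_bar = np.zeros((N + 1, 2))       # current model state sequence
x_bar = x_bar + k_x * (x - x_bar)
print(converged)
```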

In this section, an example of the direct current and alternating current (DC/AC) converter model is illustrated [

Problem (P):

min u g 0 ( u ) = E [ 1 2 x ( N ) T S ( N ) x ( N ) + 1 2 ∑ k = 0 N − 1 ( x ( k ) T Q x ( k ) + u ( k ) T R u ( k ) ) ]

subject to

x ˙ 1 ( t ) = ( x 2 ( t ) ) 2 x 1 ( t ) − 5 x 1 ( t ) + 5 u ( t ) + ω 1 ( t )

x ˙ 2 ( t ) = ( x 2 ( t ) ) 3 ( x 1 ( t ) ) 2 − 7 x 2 ( t ) + ( 5 x 2 ( t ) x 1 ( t ) + 2 x 1 ( t ) ) u ( t ) + ω 2 ( t )

y ( t ) = x 2 ( t ) + η ( t )

with the initial x ( 0 ) = ( 0.1 , 0 ) T . Here, ω ( t ) = ( ω 1 ( t ) ω 2 ( t ) ) T and η ( t ) are Gaussian white noise processes with their respective covariances given by

Q ω = 10 − 2 and R η = 10 − 1 . The weighting matrices in the cost function are

S ( N ) = I 2 × 2 , Q = I 2 × 2 and R = 1 .

Problem (M):

min u g 1 ( u ) = 1 2 x ¯ ( N ) T S ( N ) x ¯ ( N ) + 1 2 ∑ k = 0 N − 1 ( x ¯ ( k ) T Q x ¯ ( k ) + u ( k ) T R u ( k ) )

subject to

$$\begin{pmatrix} \bar{x}_{1}(k+1) \\ \bar{x}_{2}(k+1) \end{pmatrix} = \begin{pmatrix} 1-5T & 0 \\ 0 & 1-7T \end{pmatrix} \begin{pmatrix} \bar{x}_{1}(k) \\ \bar{x}_{2}(k) \end{pmatrix} + \begin{pmatrix} 5T \\ 0.2T \end{pmatrix} u(k)$$

y ¯ ( k ) = x ¯ 2 ( k )

for k = 0 , 1 , ⋯ , 80 , with the sampling time T = 0.01 second.
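Using the stated Problem (M) data, the feedback design of Theorem 1 (Algorithm 1) can be run directly. This is a sketch only; the printed cost is the sketch's own output and is not asserted to reproduce the paper's reported values.

```python
import numpy as np

T = 0.01                                   # sampling time
A = np.array([[1 - 5 * T, 0.0],
              [0.0, 1 - 7 * T]])           # state transition matrix
B = np.array([[5 * T], [0.2 * T]])         # control coefficient matrix
C = np.array([[0.0, 1.0]])                 # output coefficient matrix
Q = np.eye(2); R = np.array([[1.0]]); S_N = np.eye(2)
x0 = np.array([0.1, 0.0]); N = 80

# Step 0: backward recursion for the gains (Equations (10)-(11))
S = S_N.copy(); K = [None] * N
for k in range(N - 1, -1, -1):
    K[k] = np.linalg.solve(B.T @ S @ B + R, B.T @ S @ A)
    S = A.T @ S @ (A - B @ K[k]) + Q

# Steps 1-2: closed-loop simulation (17)-(18) and cost g1 from Equation (3)
x = x0.copy(); g1 = 0.0; ys = [C @ x]
for k in range(N):
    u = -K[k] @ x                          # feedback law (9)
    g1 += 0.5 * (x @ Q @ x + u @ R @ u)
    x = A @ x + B @ u
    ys.append(C @ x)
g1 += 0.5 * x @ S_N @ x
print(float(g1))
```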

The simulation result is shown in the table below, where the original cost of 1.0881 × 10^{3} is the value of the cost function of the original optimal control problem. It is noticed that the cost reduction is 89.28 percent. The value of the SSE, 8.046097 × 10^{−12}, shows the smallest difference between the real output and the model output.

The trajectories of final control and real control are shown in

| Number of Iterations | Initial Cost | Final Cost | Original Cost | SSE |
|---|---|---|---|---|
| 4 | 0.0429 | 116.9461 | 1.0881 × 10^{3} | 8.046097 × 10^{−12} |

A computational algorithm equipped with an efficient matching scheme was discussed in this paper. The linear model-based optimal control problem, which is simplified from the discrete-time nonlinear stochastic optimal control problem, was solved iteratively. During the calculation procedure, the model used was reformulated into the input-output equations, and the least squares optimization problem was introduced. By satisfying the first-order necessary condition, the normal equation was solved so that the least squares recursion formula could be established. In this way, the control policy could be updated iteratively. As a result, the sum of squares of the output errors was minimized, indicating the closeness of the model output to the real output. For illustration, an example of the direct current to alternating current converter model was studied. The results obtained showed the accuracy of the proposed algorithm and demonstrated its efficiency.

The authors would like to thank Universiti Tun Hussein Onn Malaysia (UTHM) for financially supporting this study under the Incentive Grant Scheme for Publication (IGSP) VOT. U417. The second author was supported by the NSF of China (11501053) and the fund (15C0026) of the Education Department of Hunan Province.

Kek, S.L., Li, J. and Teo, K.L. (2017) Least Squares Solution for Discrete Time Nonlinear Stochastic Optimal Control Problem with Model-Reality Differences. Applied Mathematics, 8, 1-14. http://dx.doi.org/10.4236/am.2017.81001