Discrete-Time Nonlinear Stochastic Optimal Control Problem Based on Stochastic Approximation Approach

In this paper, a computational approach is proposed for solving the discrete-time nonlinear optimal control problem disturbed by a sequence of random noises. Because the exact solution of such an optimal control problem cannot be obtained, the state dynamics must be estimated. Here, it is assumed that the output can be measured from the real plant process. In our approach, state mean propagation is applied to construct a linear model-based optimal control problem in which the model output is measurable. On this basis, an output error, which takes into account the differences between the real output and the model output, is defined. This output error is then minimized by applying the stochastic approximation approach. During the computation procedure, the stochastic gradient is established so that the optimal solution of the model used can be updated iteratively. Once convergence is achieved, the iterative solution approximates the true optimal solution of the original optimal control problem, in spite of model-reality differences. For illustration, a continuous stirred-tank reactor problem is studied, and the result obtained shows the applicability of the proposed approach. Hence, the proposed approach is highly recommended for its efficiency.


Introduction
The nonlinear optimal control problem disturbed by random noises is an interesting research topic. In the presence of random noises, the entire state trajectory cannot be measured exactly. Because of the nonlinear structure and the fluctuating behavior of the dynamical system, an efficient computational approach is therefore required to estimate the state dynamics. The state estimate is then used to optimize and control the dynamical system, from which the optimal control policy is derived [1] [2] [3] [4] [5]. Applications of nonlinear stochastic optimal control are widely studied in the literature, for example, vehicle trajectory planning [6], the portfolio selection problem [7], building structural systems [8], investment in insurance [9], switching systems [10], the machine maintenance problem [11], the nonlinear differential game problem [12], and viscoelastic systems [13].
In recent years, the use of a linear optimal control model with model-reality differences to solve nonlinear optimal control problems, especially discrete-time nonlinear stochastic optimal control problems, has been proposed [14] [15] [16] [17]. This method is known as the integrated optimal control and parameter estimation (IOCPE) algorithm. In this approach, adjusted parameters are introduced into the model used, so that the differences between the real plant and the model used can be calculated repeatedly. The algorithm is an iterative procedure in which system optimization and parameter estimation are integrated interactively. During the computation procedure, the optimal solution of the model used is updated iteratively. Once convergence is achieved, the iterative solution of the model used approximates the true optimal solution of the original optimal control problem, in spite of model-reality differences.
Moreover, the IOCPE algorithm has been shown to provide both the expectation solution and the filtering solution of the discrete-time nonlinear stochastic optimal control problem [14] [15]. In addition, the optimal output solution obtained from the IOCPE algorithm has been improved by using the weighted output residual [16], which is introduced into the model cost function, and the output matching scheme [17], in which an adjusted parameter is introduced into the model output. Furthermore, the least-squares and Gauss-Newton approaches based on the principle of model-reality differences, which avoid using the adjusted parameters, enhance the practical usage of the IOCPE algorithm for delivering the optimal solution of the original optimal control problem [18] [19].
These improvements demonstrate the efficiency of the IOCPE algorithm for solving the discrete-time nonlinear stochastic optimal control problem. However, we find that the output residual obtained from Kalman filtering theory could be reduced further, giving an output solution that represents the original output more accurately. Hence, in this paper, we aim to improve the accuracy of the output solution of the model used. In our approach, the stochastic approximation approach, which is an iterative stochastic optimization algorithm [20] [21] [22] [23], is applied. The advantage of the stochastic approximation algorithm is that it finds the optimum of a function that cannot be computed directly but can only be estimated from noisy observations [24] [25] [26] [27], and its applications to control systems are well established [28] [29] [30] [31] [32]. This advantage motivates us to apply the stochastic approximation algorithm within the IOCPE algorithm, with the aim of significantly reducing the output residual compared with that obtained from Kalman filtering theory. Here, the optimal control law, which is based on the state mean propagation, is constructed. At the end of the iterations, the state and control trajectories are obtained in the expectation sense, while the model output trajectory tracks the real output closely. Hence, the proposed approach is highly recommended for its efficiency.
The rest of the paper is organized as follows. In Section 2, a general discrete-time nonlinear stochastic optimal control problem is described. In Section 3, the stochastic approximation scheme, combined with the principle of model-reality differences, is discussed, and the calculation procedure is formulated as an iterative algorithm. In Section 4, an illustrative example on a continuous stirred-tank reactor problem is studied to show the applicability of the proposed approach. Finally, some concluding remarks are made.

Problem Statement
Consider a general discrete-time nonlinear stochastic optimal control problem given by (1), where $J_0$ is the scalar cost function and $E[\cdot]$ is the expectation operator. It is assumed that all functions in (1) are continuously differentiable with respect to their respective arguments.
The initial state $x(0) \in \Re^n$ is a random vector with mean $E[x(0)]$ and covariance $E[(x(0) - E[x(0)])(x(0) - E[x(0)])^{\mathsf T}]$, the latter being a positive definite matrix. It is assumed that the initial state, the process noise and the measurement noise are statistically independent.
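To make this setting concrete, the Python sketch below simulates one realization of a plant of this kind. The scalar dynamics `f`, the output map `h`, the noise variances `q` and `r`, the horizon and the function name `simulate_plant` are hypothetical choices for illustration, not the model in (1). The only features taken from the text are that the initial state is random and that the initial state, process noise and measurement noise are drawn independently.

```python
import numpy as np

def simulate_plant(u, x0_mean=1.0, x0_var=0.01, q=1e-3, r=1e-3, seed=None):
    """Simulate one noisy trajectory of a hypothetical scalar plant.

    x(k+1) = f(x(k), u(k)) + w(k),   y(k) = h(x(k)) + v(k)
    The initial state, process noise w and measurement noise v are drawn
    independently, mirroring the independence assumption of Problem (P).
    """
    rng = np.random.default_rng(seed)
    f = lambda x, uk: x + 0.1 * np.sin(x) + 0.05 * uk   # hypothetical dynamics
    h = lambda x: x ** 2                                 # hypothetical output map
    n = len(u)
    x = np.empty(n + 1)
    x[0] = rng.normal(x0_mean, np.sqrt(x0_var))          # random initial state
    for k in range(n):
        x[k + 1] = f(x[k], u[k]) + rng.normal(0.0, np.sqrt(q))
    y = h(x) + rng.normal(0.0, np.sqrt(r), size=n + 1)   # noisy measured output
    return x, y

x, y = simulate_plant(np.zeros(50), seed=0)
```

In the setting of Problem (P), only `y` would be available to the algorithm; `x` is returned here solely so that the sketch can be inspected.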
This problem, regarded as the discrete-time stochastic optimal control problem, is referred to as Problem (P). Notice that the exact solution of Problem (P) cannot, in general, be obtained. Moreover, applying nonlinear filtering theory to estimate the state of the real plant is computationally demanding. Nevertheless, the output can be measured from the real plant process.
In view of these difficulties, a linear model-based optimal control problem, referred to as Problem (M), is constructed in terms of the expected state sequence and the expected output sequence. It is emphasized that solving Problem (M) alone does not give the optimal solution of Problem (P). However, by attaching to Problem (M) an efficient matching scheme based on the output error, which is the difference between the real output and the model output, it is possible to obtain the optimal solution of Problem (P) by solving Problem (M) iteratively. From this point of view, we are motivated to construct an expanded optimal control model with the output error. This model formulation is intended to obtain the true optimal solution of Problem (P) despite model-reality differences.
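State mean propagation can be checked numerically. The sketch below assumes a linear model $x(k+1) = A x(k) + B u(k) + w(k)$, $y(k) = C x(k) + v(k)$ with zero-mean noises; the matrices and the name `propagate_mean` are hypothetical, not those of Problem (M). Taking expectations gives the deterministic recursion $\bar{x}(k+1) = A\bar{x}(k) + Bu(k)$, $\bar{y}(k) = C\bar{x}(k)$, which the code compares with a Monte Carlo average of noisy trajectories.

```python
import numpy as np

def propagate_mean(A, B, C, x0_mean, u):
    """Expected state/output of x(k+1) = A x(k) + B u(k) + w(k), y = C x + v,
    assuming zero-mean noises w and v (hypothetical linear model)."""
    xbar = [np.asarray(x0_mean, dtype=float)]
    for uk in u:
        xbar.append(A @ xbar[-1] + B @ np.atleast_1d(uk))
    xbar = np.array(xbar)
    ybar = xbar @ C.T
    return xbar, ybar

A = np.array([[0.9, 0.1], [0.0, 0.8]])   # hypothetical matrices
B = np.array([[0.0], [0.1]])
C = np.array([[1.0, 0.0]])
u = np.ones(20)
xbar, ybar = propagate_mean(A, B, C, [1.0, 0.0], u)

# Monte Carlo check: average many noisy trajectories with the same input
rng = np.random.default_rng(1)
M = 2000
x = np.tile([1.0, 0.0], (M, 1)) + rng.normal(0.0, 0.1, (M, 2))
for uk in u:
    x = x @ A.T + B @ [uk] + rng.normal(0.0, 0.05, (M, 2))
err = np.abs(x.mean(axis=0) - xbar[-1]).max()
print(err)   # shrinks roughly like 1/sqrt(M)
```

The mean recursion needs no sampling at all, which is why a model built on expected sequences is cheap to optimize compared with filtering the nonlinear plant.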

Optimal Control with Stochastic Approximation
Now, define the expanded optimal control problem, referred to as Problem (E), with the objective function (3), where the output $\hat{y}(k)$ is introduced to separate the output sequence from the respective signals in the output error problem. It is important to note that the algorithm is designed such that the constraint $\hat{y}(k) = y(k)$ is satisfied at the end of the iterations. In this situation, the output $\hat{y}(k)$ is used for the output error problem and the establishment of the matching scheme, whereas the corresponding output $y(k)$ is reserved for the model output after optimizing the model-based optimal control problem. Here, the output error is defined by (4).
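As one concrete reading of the output error, the sketch below assumes that (4) takes the common form $e(k) = y_{\text{real}}(k) - \hat{y}(k)$ and summarizes it by the sum of squared errors. Both the exact form of (4) and the residual measure reported in the results may differ from this assumption; the function names are hypothetical.

```python
import numpy as np

def output_error(y_real, y_model):
    """Assumed form of the output error: e(k) = y_real(k) - y_model(k)."""
    return np.asarray(y_real, dtype=float) - np.asarray(y_model, dtype=float)

def output_residual(y_real, y_model):
    """Sum of squared output errors (one possible residual measure)."""
    e = output_error(y_real, y_model)
    return float(np.sum(e ** 2))

y_real = np.array([1.00, 1.10, 1.25, 1.30])   # hypothetical measured output
y_hat = np.array([1.00, 1.05, 1.20, 1.35])    # hypothetical model output
print(output_error(y_real, y_hat))
print(output_residual(y_real, y_hat))
```

Driving this quantity toward zero is what makes the constraint $\hat{y}(k) = y(k)$ hold approximately at convergence.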

Necessary Optimality Conditions
Define the Hamiltonian function; the augmented cost function then becomes (6), where $p(k)$, $q(k)$, $r(k)$ and $s(k)$ are appropriate multipliers to be determined later.
Applying the calculus of variations [2] [14] [33] to the augmented cost function (6), the following necessary optimality conditions are obtained:
1) Stationary condition (7a);
2) Co-state equation (7b);
3) State equation (7c), together with its boundary conditions;
4) Adjustable output measurement (7d);
5) Separable variables (7e), together with the corresponding multipliers.
In view of these necessary optimality conditions, conditions (7a), (7b) and (7c) are the necessary conditions for Problem (M), while condition (7d) is an adjustable output measurement. Notice that, with this adjustable output, the real output can be tracked by the model output as closely as possible once the output residual is sufficiently minimized.

Feedback Optimal Control Law
From (7a), the feedback optimal control law can be calculated; for the derivation of this feedback optimal control law, see [14] [18] [19] [33].
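The paper's own control law is not reproduced here. As a sketch of the kind of computation involved, assume that Problem (M) reduces to a finite-horizon linear-quadratic problem with cost $\tfrac12 x(N)^{\mathsf T} S x(N) + \tfrac12\sum_k [x(k)^{\mathsf T} Q x(k) + u(k)^{\mathsf T} R u(k)]$. The standard backward Riccati recursion then gives gains $K(k)$ in $u(k) = -K(k)x(k)$. The adjusted-parameter and multiplier terms of the law derived from (7a) are omitted, and the matrices and the name `lqr_gains` are hypothetical.

```python
import numpy as np

def lqr_gains(A, B, Q, R, S, N):
    """Finite-horizon discrete-time LQR: returns K(0..N-1) with u(k) = -K(k) x(k).

    Backward Riccati recursion:
      K(k) = (R + B' P(k+1) B)^{-1} B' P(k+1) A
      P(k) = Q + A' P(k+1) (A - B K(k)),   P(N) = S
    """
    P = S.copy()
    gains = [None] * N
    for k in reversed(range(N)):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
        P = 0.5 * (P + P.T)          # keep P symmetric against round-off
        gains[k] = K
    return gains

A = np.array([[1.0, 0.1], [0.0, 1.0]])   # hypothetical double integrator, dt = 0.1
B = np.array([[0.0], [0.1]])
Q, R, S = np.eye(2), np.array([[1.0]]), np.eye(2)
K = lqr_gains(A, B, Q, R, S, N=50)

x = np.array([1.0, 0.0])
for k in range(50):
    x = A @ x + B @ (-K[k] @ x)          # closed-loop state update
print(np.linalg.norm(x))                 # compare with the initial norm of 1.0
```

The gains are computed off-line in one backward pass and then applied forward in time, which matches the off-line/on-line split described in the remarks on the algorithm.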

Stochastic Approximation Scheme
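In general, the recursive equation of the stochastic approximation (SA) algorithm [28] [30] [31] [32] is
$$\theta(k+1) = \theta(k) - a(k)\,\hat{g}(\theta(k)), \qquad (12)$$
where $\theta$ is the set of parameters to be estimated, $\hat{g}(\theta(k))$ is the stochastic gradient, and $a(k)$ is the gain sequence. On this basis, with reference to Problem (E), $\theta$ is taken from the solution variables of Problem (E), and the stochastic gradient, which is assumed to be measurable for the objective function (3), is introduced. Applying the SA algorithm (12) leads to the iterative equations (13). These equations are used to update the optimal solution of Problem (E) and, in turn, to approximate the optimal solution of Problem (P), in spite of model-reality differences.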
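A minimal sketch of an SA recursion of the form (12) follows, assuming the usual minus sign and a classical Robbins-Monro gain $1/(k+1)$ rather than the gain used later in the paper. The test problem is hypothetical: the true gradient $\theta - 3$ is never observed directly, only through additive noise, which is exactly the situation SA is designed for.

```python
import numpy as np

def stochastic_approximation(grad_est, theta0, gain, n_iter, rng):
    """Generic SA recursion: theta(k+1) = theta(k) - a(k) * g_hat(theta(k))."""
    theta = np.asarray(theta0, dtype=float)
    for k in range(n_iter):
        theta = theta - gain(k) * grad_est(theta, rng)
    return theta

def noisy_grad(theta, rng, target=3.0):
    """Hypothetical noisy gradient of 0.5*(theta - target)^2."""
    return (theta - target) + rng.normal(0.0, 0.5, size=np.shape(theta))

theta_hat = stochastic_approximation(noisy_grad, 0.0, lambda k: 1.0 / (k + 1),
                                     5000, np.random.default_rng(0))
print(float(theta_hat))   # close to the minimizer 3.0
```

With this particular gain the iterate is the running average of the noisy targets, so the error shrinks roughly like the noise level divided by the square root of the number of iterations.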
To evaluate the stochastic gradient, rewrite the output error defined in (4) at time $k+1$, where the separable variable in (7e) is satisfied. Then, taking the expected output measurement (7d) at time $k+1$ and substituting the state at time $k+1$ by the state equation (10), the output error is expressed in terms of the variables at time $k$. Hence, applying chain-rule differentiation to the objective function (3) of Problem (E), the stochastic gradient is calculated.

With the gain sequence $a(k)$ in (12), the SA iterates are asymptotically normal, and their convergence properties are well established [20] [24] [26] [30] [31]. In particular, the gain sequence takes the form
$$a(k) = \frac{a}{(k+1+A)^{b}}, \qquad (17)$$
where $a$ and $b$ are strictly positive and the stability constant $A \ge 0$. A practical value of $b$ is 0.602, which provides the generally more desirable slowly decaying gain in (17).
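The chain-rule step can be illustrated under simplifying assumptions: a linear model $x(k+1) = A x(k) + B u(k)$, model output $C x(k+1)$, and a one-step objective $\tfrac12\|e(k+1)\|^2$ with $e(k+1) = y_{\text{meas}}(k+1) - C(Ax(k) + Bu(k))$. Then $\partial/\partial u(k) = -(CB)^{\mathsf T} e(k+1)$. The paper's actual gradient may contain further terms; the matrices, the measurement and the function names are hypothetical. The sketch checks the analytic gradient against central finite differences.

```python
import numpy as np

def output_error_grad(A, B, C, x_k, u_k, y_meas_next):
    """Gradient w.r.t. u(k) of 0.5*||y_meas(k+1) - C(A x(k) + B u(k))||^2.

    Chain rule: dJ/du = (de/du)^T e = -(C B)^T e(k+1).
    """
    e_next = y_meas_next - C @ (A @ x_k + B @ u_k)
    return -(C @ B).T @ e_next

def objective(A, B, C, x_k, u_k, y_meas_next):
    e_next = y_meas_next - C @ (A @ x_k + B @ u_k)
    return 0.5 * float(e_next @ e_next)

rng = np.random.default_rng(2)
A = rng.normal(size=(3, 3)); B = rng.normal(size=(3, 2)); C = rng.normal(size=(2, 3))
x_k, u_k = rng.normal(size=3), rng.normal(size=2)
y_meas = rng.normal(size=2)          # stands in for one noisy output measurement

g = output_error_grad(A, B, C, x_k, u_k, y_meas)
h = 1e-6                             # central finite differences as a check
g_fd = np.array([(objective(A, B, C, x_k, u_k + h * d, y_meas)
                  - objective(A, B, C, x_k, u_k - h * d, y_meas)) / (2 * h)
                 for d in np.eye(2)])
print(np.max(np.abs(g - g_fd)))      # only round-off remains
```

Because each evaluation uses a single noisy measurement, the resulting gradient is itself random, which is why it is treated as a stochastic gradient in (12).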

Computational Algorithm
From the discussion above, the resulting algorithm provides the optimal solution of the linear model-based optimal control problem.This optimal solution is then updated based on the stochastic approximation algorithm to approximate the true optimal solution of the original optimal control problem.As a result, the computation procedure of the iterative algorithm is summarized as follows.
Step 3: Update the optimal solution by the iterative equations (13). If convergence is achieved within a given tolerance, stop; else set $i = i + 1$ and repeat from Step 1.
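A toy Python loop can illustrate the update-and-test structure of the iterations, though not Problems (M) and (E) themselves; Steps 0 to 2 are not modeled. It assumes a hypothetical noise-free real output `y_true`, a fixed hypothetical model output `y_model`, and an additive output correction `theta` updated by the SA rule with gain $a/(i+1+A)^b$, $b = 0.602$, $A = 0$. Each iteration draws a fresh noisy measurement, and the loop stops when the root-mean-square change between successive iterates, used here as one possible averaged 2-norm, falls below the tolerance.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 20
k = np.arange(N)
y_true = 1.0 + 0.5 * np.sin(0.3 * k)     # hypothetical noise-free real output
y_model = 1.0 + 0.3 * k / N              # hypothetical (fixed) model output
a, b, A0 = 0.5, 0.602, 0.0               # assumed gain a(i) = a / (i + 1 + A0)**b
tol, max_iter = 1e-3, 5000

theta = np.zeros(N)                      # output correction, one per time step
for i in range(max_iter):
    y_meas = y_true + rng.normal(0.0, 0.05, N)    # fresh noisy measurement
    y_hat = y_model + theta                       # corrected model output
    g_hat = -(y_meas - y_hat)                     # stochastic gradient of 0.5*e^2
    theta_new = theta - a / (i + 1 + A0) ** b * g_hat
    change = np.sqrt(np.mean((theta_new - theta) ** 2))   # RMS change
    theta = theta_new
    if change < tol:
        break

print(i, np.max(np.abs(y_model + theta - y_true)))
```

The decaying gain lets early iterations move quickly while later ones average out the measurement noise, so the stopping test is eventually met even though every measurement is noisy.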
Remarks
1) The off-line computation in Step 0 calculates the parameters needed for the control law design. These parameters are then used for solving Problem (M) in Step 0 and Problem (E) in Step 2.
2) The variable $\alpha_i(k)$ is zero in Step 0, and its calculated value changes from iteration to iteration.
3) Problem (P) need not be linear or have a quadratic cost function.
4) At convergence, successive iterates of the optimal control sequence and of the state estimate sequence must coincide. The averaged 2-norms of the differences between successive iterates are computed and compared with a given tolerance to verify convergence.
5) The gain sequence $a(k)$ used in the proposed algorithm is (17) with $A = 0$, that is, $a(k) = a/(k+1)^{b}$.
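The effect of the exponent $b$ in (17) with $A = 0$, as in Remark 5, can be seen by tabulating the gain. The value $a = 0.5$ and the comparison exponent $b = 1$ are hypothetical choices for illustration; $b = 0.602$ is the practical value quoted in the text.

```python
import numpy as np

def gain(k, a, b, A=0.0):
    """Gain sequence a(k) = a / (k + 1 + A)**b."""
    return a / (np.asarray(k, dtype=float) + 1.0 + A) ** b

k = np.arange(0, 1001)
slow = gain(k, a=0.5, b=0.602)   # A = 0 with the slowly decaying practical exponent
fast = gain(k, a=0.5, b=1.0)     # classical 1/k-type decay, for comparison
print(slow[[0, 10, 100, 1000]])
print(fast[[0, 10, 100, 1000]])
```

Both sequences start at $a$, but the $b = 0.602$ gain stays larger at every later iteration, so the iterates keep moving for longer instead of freezing prematurely.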
Illustrative Example
The linear model-based optimal control problem, which is simplified from Problem (P) for this example and is referred to as Problem (M), is defined by a linear model whose coefficients include 1.0895, 0.0184, 0.003, 0.1095, 0.9716 and 0.000.

Running the proposed approach gives the simulation result shown in Table 1, where it is compared with the filtering solution [15]. The proposed approach requires more iterations than the filtering model, and its final cost is greater than that of the filtering model. However, the output residual of the proposed approach is dramatically reduced, to 0.000216 units, a 99 percent reduction compared with the filtering solution. This shows that the model output obtained by the proposed approach is significantly closer to the real output trajectory. Hence, the proposed approach is practically useful for obtaining the real output solution.
The trajectories of control, state and output are shown in Figures 1-3, respectively. The control and state trajectories are smooth and free from the disturbance of the random noise sequences, because they are the ideal deterministic optimal solution of the model-based optimal control problem. In contrast, the real output, which is disturbed by the random noise sequences, fluctuates considerably. By applying the proposed approach, the model output trajectory follows the real output trajectory as closely as possible.
Additionally, the output error, which represents the differences between the real output and the model output, is shown in Figure 4. It is therefore concluded that the proposed approach is efficient and its applicability is demonstrated.

Concluding Remarks
The application of the stochastic approximation scheme within the IOCPE algorithm was discussed in this paper, with the aim of improving the output solution of the model used. In previous studies, the IOCPE algorithm was used to solve the discrete-time nonlinear stochastic optimal control problem, while stochastic approximation is a tool for stochastic optimization. In combining these two approaches, the state mean propagation is constructed, and the adjusted parameter is added to the model output used. During the calculation procedure, the differences between the real plant and the model used are taken into account to update the iterative solution repeatedly. In addition, the least-squares output error is established, from which the stochastic gradient is derived. Consequently, the iterative solution approximates the optimal solution of the original optimal control problem, in spite of model-reality differences. For illustration, a continuous stirred-tank reactor problem was studied to show the applicability of the proposed approach. In conclusion, the proposed approach is highly recommended for its efficiency.
As a future research direction, we suggest applying the SA algorithm to solve the linear model-based optimal control problem without calculating the adjusted parameter, in order to obtain the true optimal solution of the nonlinear optimal control problem, and comparing the result with that obtained by the Gauss-Newton method [18] [19]. This could simplify the calculation procedure of the IOCPE algorithm.

References