This paper presents a practical, simulation-based stability analysis of the effect of incorporating a Kalman estimator in the control loop of an inverted pendulum with a neurocontroller. The neurocontroller is computed by approximate optimal control without considering the Kalman estimator in the loop, following the separation theorem. The results are compared with a time-varying linear controller, which performs acceptably when neither the state nor the measurement is noisy, but which under noisy conditions operates over a more limited range of the state space than the controller proposed here.

The motivation for this case study, known as inverted pendulum control, arises from the need for robust controllers in situations where the equilibrium of an unstable system must be maintained. A directly related situation is the attitude control of a booster rocket at takeoff when sending a payload to space. It is a well-known problem in the control theory literature [

In this paper, a model of an inverted pendulum is studied with and without a state estimator. When a time-varying linear quadratic regulator (TVLQR) with direct state measurement is used, good performance can be achieved. However, it can be improved with a neurocontroller. The performances obtained with the linear controller and the neurocontroller are shown in

(3760.7) is about 40% higher than that of the neurocontroller (2691.3); equivalently, the neurocontroller value is about 28% lower. However, when the controller is used in more realistic situations, with a state estimator and noisy measurements, the performance of the linear controller deteriorates more than that of the neurocontroller, to the point that it fails to stabilize the system for the same initial conditions. This paper analyzes how the system performance deteriorates when a state estimator is required.

This paper is organized as follows. After this Introduction, the problem is detailed and expressed mathematically in Section 2. The proposed solution is detailed in Section 3. Section 4 develops the implementation of the obtained solution, together with another one based on classical methods for comparison purposes. The results obtained are discussed in Section 5, with their pros and cons.

The dynamic programming approach assumes that the process evolution can be split into stages [

This section introduces the nomenclature used throughout the article. The symbols are listed and explained.

i. $k \in \mathbb{N}$ is the discrete time variable.

ii. $x(k) \in \Re^{n}$ is the time-dependent n-dimensional state vector, whose components are the system's state variables. These variables describe the process dynamics over discrete time.

iii. $f$ is a nonlinear continuous function, $f: \Re^{n} \times \Re^{m} \to \Re^{n}$, that relates the state vector at two time instants.

iv. $u(k) \in \Re^{m}$ is the system input or manipulated vector.

v. $I$ is a convex function, $I: \Re^{n} \times \Re^{m} \times \mathbb{N} \to \Re^{+}$, called the performance index and designed by the control engineer.

vi. $J$ is a convex function, $J: \Re^{n} \times \Re^{m} \times \mathbb{N} \to \Re^{+}$, called the cost function and designed by the control engineer.

vii. $X$ is a bounded and closed set, $X \subset \Re^{n}$.

viii. $U$ is a bounded and closed set, $U \subset \Re^{m}$.

ix. $\mu$ is the control law or decision policy, $\mu: \Re^{n} \to \Re^{m}$; this function maps each state vector value to a control action $u$.

x. $r$ and $v$ are real-valued parameter arrays, $r, v \in \Re^{n+h+2}$, where $h \in \mathbb{N}$ is chosen by the control engineer.

xi. $\tilde{J}$ is the approximation of the cost function $J$; its domain includes the parameter vector $r$, $\tilde{J}: \Re^{n} \times \Re^{m} \times \Re^{n+h+2} \to \Re^{+}$.

xii. $\tilde{\mu}$ is a function that approximates the function $\mu$ and includes the parameter vector $v$, $\tilde{\mu}: \Re^{n} \times \Re^{n+h+2} \to \Re^{m}$.

xiii. $J^{*}(x(k))$ is the minimum cost to go from the state $x$ at time $k$ to the terminal state at time $N$.

xiv. $u^{o}(k) \in \Re^{m}$ is the optimal control action at time $k$.

xv. $\xi_{1}, \xi_{2}, \cdots, \xi_{h}$ are real scalar values.

xvi. $\tilde{S}$ is the data set of samples from the process under study.

xvii. $C(i)$ is the cost associated with a control law evolving from the state $i$ to the terminal process state.

xviii. $J^{\mu}(i)$ is the value of the cost-to-go function obtained by using the control law $\mu$ from state $i$ to the terminal state at time $N$.

xix. $\nabla$ is the gradient operator.

xx. $Q(i, u)$ is a real-valued function associated with state $i$ and action $u$.

xxi. $\eta_{n}$ is a function that varies with the iteration number $n$, bounded between 0 and 1.

xxii. $\tilde{Q}(i, u)$ is the approximate version of the factor $Q(i, u)$.

xxiii. $\gamma_{n}$ is the discount factor, which varies with the iteration $n$ and is bounded between 0 and 1.

xxiv. $\bar{\mu}$ is the control action expressed as a look-up table over the states.

xxv. $\tilde{J}^{\mu}(\cdot, r)$ is the approximate cost-to-go function associated with the control law $\mu$.

xxvi. $A \in \Re^{n \times n}$ is the state matrix of the linear dynamic model.

xxvii. $B \in \Re^{n \times 1}$ is the input matrix of the linear dynamic model.

xxviii. $F \in \Re^{n \times n}$ is the additive noise model for the state variables.

xxix. $v(k) \in \Re^{n}$ is a random sequence with Gaussian distribution in each variable, zero mean and unit variance.

xxx. $y(k) \in \Re$ is the linear model output.

xxxi. $C \in \Re^{1 \times n}$ is the linear model output matrix.

xxxii. $G \in \Re$ is the noise model for the measured variable.

xxxiii. $w(k) \in \Re$ is a white noise sequence with zero mean and unit variance.

xxxiv. $\delta \in \Re$ is the longitudinal displacement of the cart.

xxxv. $\dot{\delta} \in \Re$ is the longitudinal velocity of the cart.

xxxvi. $\phi \in \Re$ is the angle of the inverted pendulum bar.

xxxvii. $\dot{\phi} \in \Re$ is the angular velocity of the inverted pendulum bar.

xxxviii. $M_{P}$ is the concentrated mass of the cart, here 0.5 kg.

xxxix. $m_{P}$ is the concentrated mass of the bar, here 0.1 kg.

xl. $F_{P}$ is the displacement friction constant, set to 0.1 N∙m$^{-1}$∙s.

xli. $l_{P}$ is the length of the pendulum bar, 0.6 m.

xlii. $g_{P}$ is the standard acceleration due to gravity, 9.81 m∙s$^{-2}$.

xliii. $Q \in \Re^{4 \times 4}$ is the weighting matrix for the state vector from $k = 0$ to $k = N - 1$, with $N$ the terminal state time.

xliv. $S \in \Re^{4 \times 4}$ is the weighting matrix for the state vector at the terminal state time $N$.

xlv. $R \in \Re$ is the weighting factor for the control action variable.

Thus, to formulate the optimal control problem, the process model in discrete time, the constraints on the variables and the cost function to be minimized are presented. The problem of minimizing a separable cost function is considered for the system

$$x(k+1) = f(x(k), u(k), k), \quad k = 0, 1, \cdots, N-1 \tag{1}$$

where x(0) has a fixed value. The constraints must be satisfied together with the system equation, and the cost function is

$$J(x(k), u(k)) = \sum_{k=0}^{N} I(x(k), u(k)) \tag{2}$$

where the constraints on the state and manipulated variables are

$$x \in X \subset \Re^{n}, \quad u \in U \subset \Re^{m}. \tag{3}$$

The function $I(\cdot)$ is defined by the control engineer; it must be convex but not necessarily quadratic. The function $f(\cdot)$ is the nonlinear relationship between the state and manipulated variables at instants k and k + 1. Moreover, both are bounded and continuous functions of their arguments, and x and u belong to closed and bounded subsets of $\Re^{n}$ and $\Re^{m}$, respectively. Then, the Weierstrass theorem asserts that a minimizing policy, also called a control law, exists. Therefore, it is desired to find a mapping

$$\mu(x(k)): \Re^{n} \to \Re^{m} \tag{4}$$

that drives the process modelled by (1) from any initial condition to the terminal state x(N), satisfying constraints (3) and minimizing the cost function (2). The implementation is shown in

To solve the formulated problem, dynamic programming is used, and approximations are then introduced through the functions

$$\tilde{\mu}(x(k), v): \Re^{n} \to \Re^{m}, \tag{5}$$

$$\tilde{J}(x, u, r): \Re^{n \times m \times N} \to \Re^{+} \tag{6}$$

where the parameter vectors v and r must be determined.

The procedure to solve the optimal control problem for both continuous and discrete time dynamic systems is well known [

The principle of optimality [

which a dynamic process evolves over time through stages. Applying the principle of optimality to (2), we obtain

$$J^{*}(x(k)) = \min_{u(k)} J(x(k), u(k)) = \min_{u(k)} \left\{ I(x(k), u(k)) + J^{*}(f(x(k), u(k))) \right\}, \tag{7}$$

called Bellman's equation. Therefore, the optimal control action $u^{o}$ is

$$u^{o}(k) = \arg\min_{u(k)} \left\{ I(x(k), u(k)) + J^{*}(f(x(k), u(k))) \right\}, \tag{8}$$

which is the optimal decision policy or optimal control law. Note that $J^{*}$ does not depend explicitly on u(k), as Equation (8) shows.

To obtain the control law or decision policy, there exist numerical methods [

The approximation function incorporates a parameter vector r, defined as a partitioned vector whose structure defines the structure of the function,

$$r = \{ r_{1}^{1}, r_{1}^{2}, \cdots, r_{1}^{h}, r_{2} \} \tag{9}$$

where each vector $r_{1}^{i}$ has the same dimension, which is the number of inputs of the function plus one, to account for a constant unit input (bias). The h intermediate scalar values ξ are computed as the scalar product between the augmented input vector and the corresponding parameters,

$$\xi_{1} = [x^{T} \; 1] \cdot r_{1}^{1}, \tag{10}$$

$$\xi_{2} = [x^{T} \; 1] \cdot r_{1}^{2}, \tag{11}$$

$$\xi_{h} = [x^{T} \; 1] \cdot r_{1}^{h}, \tag{12}$$

and each value is processed through the hyperbolic tangent function, written so as to avoid large numbers,

$$f(\xi) = \frac{\exp(\xi) - \exp(-\xi)}{\exp(\xi) + \exp(-\xi)} = 1 - \frac{2}{1 + \exp(2\xi)} \tag{13}$$

where the right-hand side requires only one $\exp(\cdot)$ evaluation, which reduces computation time. These h values, together with the bias term 1, are then combined through an inner product with the remaining part of the parameter vector, $r_{2}$, whose dimension must be consistent with the product

$$\tilde{\mu}(x(k), r) = [f(\xi_{1}) \; f(\xi_{2}) \; \cdots \; f(\xi_{h}) \; 1]^{T} \cdot r_{2}. \tag{14}$$

The parameter h acts as a tuning parameter that sets the dimension of the vector r and hence the structure of the approximation function.
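As an illustration, Equations (10) to (14) can be sketched in a few lines of Python. This is a minimal sketch, not the authors' implementation: the function names `squash` and `approximator`, the list-based parameter layout, and the input clamp that protects `exp` from overflow are all assumptions introduced here.

```python
import math


def squash(xi):
    """Hyperbolic tangent in the single-exponential form of Equation (13)."""
    # Assumption: clamp the argument so that exp(2*xi) cannot overflow.
    xi = max(-350.0, min(350.0, xi))
    return 1.0 - 2.0 / (1.0 + math.exp(2.0 * xi))


def approximator(x, r1, r2):
    """Evaluate Equations (10)-(14).

    x  : list with the n inputs
    r1 : list of h parameter vectors, each of length n + 1
    r2 : parameter vector of length h + 1
    """
    x_aug = list(x) + [1.0]  # input vector augmented with the unit bias
    # Equations (10)-(12): one scalar product per hidden node.
    xi = [sum(a * b for a, b in zip(x_aug, r1_i)) for r1_i in r1]
    # Hidden outputs plus the bias term, then the inner product of (14).
    hidden = [squash(value) for value in xi] + [1.0]
    return sum(a * b for a, b in zip(hidden, r2))
```

With n = 4 inputs and h = 7 hidden nodes, `r1` holds 7 vectors of length 5 and `r2` has length 8.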

Once a suitable value of the vector r is found, one has an approximation of the minimum cost incurred to reach the terminal state from the current state x(k). With the model of the system, the control policy can then be found as the argument u(k) that minimizes

$$J^{*}(x(k)) = \min_{u(k)} \left\{ I(x(k), u(k)) + \tilde{J}^{*}(f(x(k), u(k)), r) \right\}. \tag{15}$$

For finding r, the search process that finds the policy function is divided into two tasks, as shown in

A set of representative data $\tilde{S}$ in a domain of the state space is available, and for each state $i \in \tilde{S}$ the cost value C(i) is calculated. To this end, an initial control law or control policy is proposed, and the system (1) is evolved from the given state i to the terminal stage, evaluating the performance index by expression (2) of the cost function $J^{\mu}(i)$. This procedure is performed for every state $i \in \tilde{S}$. Then the approximated cost function is tuned by minimizing over r

$$\min_{r} \sum_{i \in \tilde{S}} \left( \tilde{J}(i, r) - C(i) \right)^{2} \tag{16}$$

so that an approximation function for the cost associated with the evaluated policy is obtained. The parameter vector r is obtained by minimizing expression (16) with the incremental gradient iteration

$$r := r + \eta_{n} \nabla \tilde{J}(i, r) \left( C(i) - \tilde{J}(i, r) \right), \quad \forall i \in \tilde{S} \tag{17}$$

where $\eta_{n}$ fulfills the conditions

$$\sum_{n=0}^{\infty} \eta_{n}(i, u) = \infty, \quad \sum_{n=0}^{\infty} \eta_{n}^{2}(i, u) < \infty, \quad \forall i, \; u \in U(i), \tag{18}$$

so that the algorithm converges, where n is the tuning iteration index. To compute C(i), the performance evaluation is implemented through the system model (1) and the cost function (2).
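The incremental gradient iteration (17) can be illustrated with a toy example. The sketch below is written under stated assumptions: the step-size schedule `0.5 / (1 + 0.05 * n)` is one hypothetical choice that satisfies conditions (18), and the linear-in-parameters model `J_lin` stands in for the neural approximator only to keep the example short.

```python
def incremental_gradient_fit(samples, costs, J_tilde, grad_J_tilde, r, sweeps=200):
    """Tune r by Equation (17), visiting every sample once per sweep."""
    r = list(r)
    for n in range(sweeps):
        # Assumed schedule: the sum of eta diverges, the sum of eta^2 converges.
        eta = 0.5 / (1.0 + 0.05 * n)
        for i, c in zip(samples, costs):
            err = c - J_tilde(i, r)           # C(i) - J~(i, r)
            grad = grad_J_tilde(i, r)         # gradient of J~ with respect to r
            r = [rj + eta * gj * err for rj, gj in zip(r, grad)]
    return r


# Toy stand-in for the approximator: J~(i, r) = r[0] + r[1] * i.
def J_lin(i, r):
    return r[0] + r[1] * i


def grad_J_lin(i, r):
    return [1.0, i]


# Usage: recover the coefficients of C(i) = 2 + 3 i from three samples.
r_fit = incremental_gradient_fit([0.0, 0.5, 1.0], [2.0, 3.5, 5.0],
                                 J_lin, grad_J_lin, [0.0, 0.0])
```

For the neural structure of Equation (14), `grad_J_tilde` would return the derivatives with respect to every entry of r.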

Then the costs associated with each state-action pair are computed by using the auxiliary cost function Q(i, u), which in its approximate version is

$$\tilde{Q}(i, u) = I(i, u) + \gamma_{n} \tilde{J}(j, r) \tag{19}$$

where j = f(i, u) is the state reached from i under action u, and $\gamma_{n}$ is a discount factor that can vary from iteration to iteration until it reaches unity. Then, the improved policy is obtained as the table

$$\bar{\mu}(i) = \arg\min_{u \in U(i)} \left( I(i, u) + \gamma_{n} \tilde{J}(j, r) \right), \quad \forall i \in \tilde{S}. \tag{20}$$

Once $\tilde{J}^{\mu}(\cdot, r)$ is available, $\bar{\mu}(\cdot)$ can be obtained from Equation (20). Then the costs associated with each state, denoted C(i), are evaluated, and the parameters r are tuned by Equation (17), which gives a new version of the approximation function $\tilde{J}^{\mu}(i, r)$, $\forall i \in \tilde{S}$. Then the policy improvement task is carried out, in which a new tabulated control policy $\bar{\mu}(\cdot)$ expressed as (20) is obtained. After that, the calculation of the costs for each state i starts again, and in each iteration the function $\gamma_{n}$ is updated.
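The policy improvement step of Equations (19) and (20) can be sketched as a search over candidate actions. The source does not say how the minimization over U(i) is carried out, so the finite grid of candidate actions used below is an assumption, as are the function name `improve_policy` and the scalar toy system in the usage example.

```python
def improve_policy(states, actions, stage_cost, model, J_tilde, gamma):
    """Return the tabulated policy of Equation (20), one action per state."""
    table = []
    for i in states:
        best_u, best_q = None, float("inf")
        for u in actions:
            j = model(i, u)                            # successor state j = f(i, u)
            q = stage_cost(i, u) + gamma * J_tilde(j)  # Q~(i, u), Equation (19)
            if q < best_q:
                best_u, best_q = u, q
        table.append(best_u)
    return table


# Usage on a hypothetical scalar system x(k+1) = x(k) + u(k).
states = [1.0, 0.0, -1.0]
actions = [-1.0, -0.5, 0.0, 0.5, 1.0]
policy_table = improve_policy(
    states,
    actions,
    stage_cost=lambda x, u: x * x + u * u,
    model=lambda x, u: x + u,
    J_tilde=lambda x: x * x,   # stand-in for the fitted cost approximation
    gamma=1.0,
)
```

In the full algorithm, `J_tilde` would be the approximation tuned by Equation (17), with its parameter vector r fixed during this step.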

Simultaneously with the described tasks, an approximation for the improved control law μ ¯ ( ⋅ ) is introduced, by a function with parameters v as shown

Thus, since the function $\mu(\cdot)$ is the analytical solution of the optimal control problem, the aim is to obtain an approximation $\tilde{\mu}(\cdot, v)$ of the function $\mu(\cdot)$, which is expressed as a table, where v is the parameter vector.

To find the approximation function $\tilde{\mu}(\cdot, v)$ using the data of the improved policy $\bar{\mu}(\cdot)$ defined in (20), it is proposed to minimize the expression

$$\min_{v} \sum_{i \in \tilde{S}} \left( \tilde{\mu}(i, v) - \bar{\mu}(i) \right)^{2} \tag{21}$$

within the set $\tilde{S}$, where the control law is represented by $\tilde{\mu}(\cdot, v)$ with the tuning parameter vector v. A solution of Equation (21) is obtained by the incremental gradient method [

$$v := v + \eta_{n} \nabla \tilde{\mu}(i, v) \left( \bar{\mu}(i) - \tilde{\mu}(i, v) \right), \quad \forall i \in \tilde{S} \tag{22}$$

where $\eta_{n}$ fulfills the conditions (18). A summary of the algorithm is detailed in

Note that two approximation problems are solved at the same time: given $\bar{\mu}(\cdot)$, it is evaluated to find the approximation $\tilde{J}^{\mu}(\cdot, r)$ of $J^{\mu}(\cdot)$, defined by C(i) with $i \in \tilde{S}$.

Then, given $\tilde{J}^{\mu}(\cdot, r)$, the improved policy $\bar{\mu}(i)$ is computed for $i \in \tilde{S}$, and the new policy $\tilde{\mu}(\cdot, v)$ is found. Once the function $\tilde{\mu}(\cdot, v)$ is available, the control actions are obtained as shown in

The algorithm to solve the optimal control problem for nonlinear processes with a non-quadratic cost function and constraints has been detailed. Given the use of approximations, the topic of approximation function in dynamic systems [

As a general remark, as with many nonlinear systems, the algorithm is strongly dependent on the initial conditions: it depends on the initial policy and on the states used to compute $\tilde{\mu}(\cdot, v)$, represented by the set $\tilde{S}$.

The speed at which the parameters are tuned over the iterations is set by the function γ, and the method is sensitive to this parameter. Usually, one can make the first attempts with γ = 1 held constant and few iterations, and then modify it so that it converges to 1 over the iterations, always verifying that the performance of the controller improves in the long term. The number of adjustable parameters in each approximation function depends on the complexity of the data, which is generally conditioned by applying some normalization or feature extraction technique.

The inverted pendulum can be represented as shown

system, a controller will be designed using the algorithm of

$$\begin{cases} (M_{P} + m_{P}) \ddot{\delta} + m_{P} l_{P} \ddot{\phi} \cos\phi - m_{P} l_{P} \dot{\phi}^{2} \sin\phi + F_{P} \dot{\delta} = u \\ l_{P} \ddot{\phi} - g_{P} \sin\phi + \ddot{\delta} \cos\phi = 0 \end{cases} \tag{23}$$

While the controller is being designed, the system trajectories are generated by simulation for an initial angle $\phi$ of 0.2 rad. The force u must satisfy the constraint $-30 \le u \le 30$.
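To make the model concrete, the two equations in (23) can be solved for the accelerations and integrated numerically. The sketch below is one way to do this, under assumptions: the source does not state which integration scheme was used, so the forward Euler method with sub-steps and the function names `pendulum_derivatives` and `step` are choices made here for illustration. The parameter values are those given in the nomenclature.

```python
import math

M_P, m_P, F_P, l_P, g_P = 0.5, 0.1, 0.1, 0.6, 9.81


def pendulum_derivatives(state, u):
    """Time derivatives of [delta, delta_dot, phi, phi_dot] from Equation (23)."""
    _, delta_dot, phi, phi_dot = state
    s, c = math.sin(phi), math.cos(phi)
    # Eliminate phi_ddot between the two equations of (23):
    # (M_P + m_P sin^2 phi) delta_ddot
    #     = u - F_P delta_dot + m_P l_P phi_dot^2 sin phi - m_P g_P sin phi cos phi
    delta_dd = (u - F_P * delta_dot + m_P * l_P * phi_dot ** 2 * s
                - m_P * g_P * s * c) / (M_P + m_P * s * s)
    # Second equation of (23) solved for phi_ddot.
    phi_dd = (g_P * s - delta_dd * c) / l_P
    return [delta_dot, delta_dd, phi_dot, phi_dd]


def step(state, u, dt=0.1, substeps=10):
    """Advance one sampling period with forward Euler sub-steps (assumed scheme)."""
    h = dt / substeps
    for _ in range(substeps):
        d = pendulum_derivatives(state, u)
        state = [x + h * dx for x, dx in zip(state, d)]
    return state
```

Starting from `[0.0, 0.0, 0.2, 0.0]` with `u = 0.0`, the angular acceleration is positive, so the bar falls away from the upright position, as expected for the unstable equilibrium.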

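The proposed cost function is

$$J(x, u) = \sum_{k=0}^{N} x_{k}^{T} \begin{bmatrix} 5 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 50 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix} x_{k} + 0.001 \cdot \theta_{u} \tag{24}$$

where $\theta_{u}$ is defined to constrain the values of $u_{k}$ by

$$\theta_{u} = \begin{cases} | u_{k} - 30 |, & \text{if } u_{k} > 30, \\ | -30 - u_{k} |, & \text{if } u_{k} < -30. \end{cases} \tag{25}$$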
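A single term of the cost (24) with the penalty (25) could be evaluated as in the following sketch. Two points are assumptions rather than statements from the source: the penalty is taken as zero when the force lies inside the bounds, since Equation (25) only lists the two out-of-range cases, and the penalty is applied at every stage k. The names `theta_u` and `stage_cost` are introduced here.

```python
Q_DIAG = [5.0, 0.0, 50.0, 0.0]  # diagonal of the weighting matrix in (24)
U_MAX = 30.0                    # force bound


def theta_u(u):
    """Constraint penalty of Equation (25)."""
    if u > U_MAX:
        return abs(u - U_MAX)
    if u < -U_MAX:
        return abs(-U_MAX - u)
    return 0.0  # assumption: no penalty inside the bounds


def stage_cost(x, u):
    """One term of the sum in Equation (24) for x = [delta, delta_dot, phi, phi_dot]."""
    quadratic = sum(q * xi * xi for q, xi in zip(Q_DIAG, x))
    return quadratic + 0.001 * theta_u(u)
```

Only the cart position and the bar angle are weighted, with the angle weighted ten times more heavily than the position.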

The continuous-time model is discretized with a sampling period of 0.1 s.

In order to retrieve the system state vector, a Kalman estimator is used, where the discrete-time linearized version of (23) is given by

$$x_{k+1} = A x_{k} + B u_{k} + F v_{k} \tag{26}$$

$$y_{k} = C x_{k} + G w_{k} \tag{27}$$

where $x^{T} = [\delta, \dot{\delta}, \phi, \dot{\phi}]$, $u_{k}$ is the operating force as shows

$$F = \begin{bmatrix} \sigma_{11} & 0 & 0 & 0 \\ 0 & \sigma_{22} & 0 & 0 \\ 0 & 0 & \sigma_{33} & 0 \\ 0 & 0 & 0 & \sigma_{44} \end{bmatrix} \tag{28}$$

with $\sigma_{ii} = 1 \times 10^{-3}$ for the state vector x, and

$$G = [\varsigma] \tag{29}$$

with $\varsigma = 1 \times 10^{-3}$ for the measurement y(k). Furthermore, C = [1 0 0 0].

To find the estimate of x(k), an a priori estimate of the observed states is used, given by

$$\hat{x}(k+1)^{-} = A \hat{x}(k) + B u(k) \tag{30}$$

and the states $\hat{x}$ are then obtained from measurements of the system output,

$$\hat{x}(k) = \hat{x}(k)^{-} + K_{O} \left( y(k) - C \hat{x}(k)^{-} \right) \tag{31}$$

where $K_{O}$ is the Kalman gain [
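The prediction and correction steps of Equations (30) and (31) can be sketched as below for a single-input, single-output model. This is only an illustration: the gain is passed in as a given vector rather than computed, the function names are introduced here, and the two-state matrices in the usage example are hypothetical placeholders, not the linearized pendulum model.

```python
def mat_vec(A, x):
    """Product of a matrix (list of rows) and a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def predict(A, B, x_hat, u):
    """A priori estimate, Equation (30), for a scalar input u."""
    return [ax + b * u for ax, b in zip(mat_vec(A, x_hat), B)]


def correct(x_prior, y, C, K_O):
    """Measurement update, Equation (31), for a scalar output y."""
    innovation = y - sum(c * x for c, x in zip(C, x_prior))
    return [xp + k * innovation for xp, k in zip(x_prior, K_O)]


# Usage with hypothetical two-state values (not the pendulum model).
A_demo = [[1.0, 0.1], [0.0, 1.0]]
B_demo = [0.005, 0.1]
C_demo = [1.0, 0.0]
K_demo = [0.5, 0.2]             # hypothetical gain, not a computed Kalman gain
x_prior = predict(A_demo, B_demo, [0.0, 1.0], 2.0)
x_post = correct(x_prior, 0.31, C_demo, K_demo)
```

For the pendulum, `C` would be `[1.0, 0.0, 0.0, 0.0]`, so only the cart position enters the innovation and the other three states are reconstructed through the gain.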

The tune of the ^{T} to the final time, set to 10 s. Note that the behavior is not stable in the first half of the iterations performed, but it then stabilizes and the control objective is achieved.

The set $\tilde{S}$ consisted of 3000 samples in the $\Re^{4}$ space of the state variables, with the range shown in

$$u(k) = \tilde{\mu}(\hat{x}(k), v) \tag{32}$$

where $\hat{x}(k)$ is obtained from Equation (31), and v contains the parameters arranged as in (9). Seven hidden nodes were used, which gives 7 vectors $r_{1}^{i} \in \Re^{5}$ and a vector $r_{2} \in \Re^{8}$, implementing Equation (14). For the approximation of the function $J(\cdot)$ defined in Equation (24), the same structure of approximation function is used as for the control law. The parameter tuning was performed by the Levenberg-Marquardt algorithm [

In order to perform the comparison of the neurocontroller performance, a time variant-discrete linear quadratic regulator controller with the classical LQR theory in discrete time [

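$$Q = \begin{bmatrix} 10 & 0 & 0 & 0 \\ 0 & 10^{3} & 0 & 0 \\ 0 & 0 & 10 & 0 \\ 0 & 0 & 0 & 10^{3} \end{bmatrix}, \quad S = \begin{bmatrix} 10 & 0 & 0 & 0 \\ 0 & 10 & 0 & 0 \\ 0 & 0 & 10 & 0 \\ 0 & 0 & 0 & 10^{2} \end{bmatrix} \quad \text{and} \quad R = 1. \tag{33}$$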
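For reference, the gains of a finite-horizon discrete-time LQR are commonly computed with the backward Riccati recursion below. It is given in its standard textbook form, assuming the linear model (26) without noise and the weights (33); the source does not spell out the exact implementation, and the symbols $P_{k}$ and $K_{k}$ are introduced here only for illustration.

```latex
\begin{aligned}
P_{N} &= S, \\
K_{k} &= \left( R + B^{T} P_{k+1} B \right)^{-1} B^{T} P_{k+1} A, \\
P_{k} &= Q + A^{T} P_{k+1} \left( A - B K_{k} \right), \qquad k = N-1, \ldots, 0, \\
u_{k} &= -K_{k} x_{k}.
\end{aligned}
```

Because $K_{k}$ depends on the stage k, the resulting state feedback is time-varying even though A, B, Q and R are constant.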

Since the control objective of the system with estimator is that the pendulum does not fall, a qualitative analysis of the performance of the TVLQR and NC controllers can be inferred. As can be seen in the examples shown, in

implement the algorithm of

Item | Do |
---|---|
1 | Initialize: $\gamma_{n}$, Iterations, h, parameters r and v. |
2 | Evaluate the initial policy $\tilde{\mu}(\cdot, v)$ through (16). |
3 | Update the parameters r via (17). |
4 | Compute the functions $Q(\cdot)$ by (19). |
5 | Update the policy $\bar{\mu}(\cdot)$ using (20). |
6 | Update the parameters v of $\tilde{\mu}(\cdot, v)$ by (22). |
7 | Update the function $\gamma_{n}$. |
8 | Go to 2 and repeat items 2 to 7 until the Iterations are completed. |

The temporal evolution of these indices is shown in

In this paper, a stability analysis of a neurocontroller with Kalman estimator for the inverted pendulum case was presented. The NC performance was compared against the TVLQR controller with the same Kalman estimator.

The obtained results can be stated that the use of approximate optimal control implemented through

The admissible initial angle range was extended from 0.19 rad, for the linear controller with estimator, to 0.47 rad in the present scheme, as assessed by a Monte Carlo simulation with 150 trajectories.

It is important to highlight that the technique requires a good command of function approximation for dynamic processes and of the simulation of natural processes through numerical methods.

Pucheta, J., Rodríguez Rivero, C., Salas, C., Herrera, M. and Laboret, S. (2017) Stability Analysis of a Neurocontroller with Kalman Estimator for the Inverted Pendulum Case. Applied Mathematics, 8, 1602-1618. https://doi.org/10.4236/am.2017.811117