Stability Analysis of a Neurocontroller with Kalman Estimator for the Inverted Pendulum Case

Julián Antonio Pucheta^{1}, Cristian Rodríguez Rivero^{1}, Carlos Alberto Salas^{2}, Martín Herrera^{2}, Sergio Oscar Laboret^{1}

In this paper, a practical, simulation-based stability analysis of the effect of incorporating
a Kalman estimator in the control loop of the inverted pendulum
with a neurocontroller is presented. The neurocontroller is computed by approximate
optimal control, without considering the Kalman estimator in the
loop, following the separation theorem. The results are compared with
a time-varying linear controller, which performs acceptably under noiseless
conditions in the state and in the measurement, but under
noisy conditions its region of operation in the state space is more limited
than that of the controller proposed here.

Cite this paper

Pucheta, J., Rivero, C., Salas, C., Herrera, M. and Laboret, S. (2017) Stability Analysis of a Neurocontroller with Kalman Estimator for the Inverted Pendulum Case. *Applied Mathematics*, **8**, 1602-1618. doi: 10.4236/am.2017.811117.

1. Introduction

The motivation for this case study, known as inverted pendulum control, arises from the need to obtain robust controllers for situations where the equilibrium of an unstable system must be maintained. A directly related situation is the attitude control of a booster rocket at takeoff when sending a payload to space. It is a well-known problem in the control theory literature [1] [2] and in machine learning [3] [4] [5] [6] . However, such problems are challenging since real systems are difficult to control, partly because the controller must rely on reconstructed rather than directly measured feedback, an effect similar to that of incorporating an estimator of the state variables into the controller. This fact can cause instability in the closed loop, which must be foreseen and analyzed. The analysis can be done through simulations of the estimator-controller system, in order to establish some stability domain. In this work we opt for control based on optimization [7] [8] [9] , where the optimal control problem is formulated. To solve the optimal control problem, a very powerful tool is the Dynamic Programming technique [10] [11] implemented with approximations [3] [4] [5] in a machine learning scheme [12] , since it allows dealing with constrained, nonlinear processes and non-quadratic performance indexes. However, it is often difficult to establish a methodology for the implementation of controllers based on machine learning, since they require heuristics and good knowledge of the adaptation mechanisms involved [3] . In this work, a methodology to determine the conditions that achieve good results in simulation is shown. A controller consisting of a compact function, called a neurocontroller, is obtained.

In this paper, cases with and without an estimator of a model that represents an inverted pendulum are studied. When a time-varying linear quadratic regulator (TVLQR) with direct state measurement is used, good performance can be achieved. However, it can be improved with a neurocontroller. The performances obtained with the linear controller and the neurocontroller are shown in Figure 1 and Figure 2, respectively. Note that the cumulative cost of the linear controller

Figure 1. Controller TVLQR with direct measurements. The initial conditions are x_{0} = [0 0 f_{0} 0]^{T}, where f_{0} takes the values 5.7˚, 11.5˚ and 63˚. Accumulated costs from each initial condition are 12.5, 51.8 and 3760.7, respectively.

Figure 2. Neurocontroller with direct measurements. The initial conditions are x_{0} = [0 0 f_{0} 0]^{T}, where f_{0} takes the values 5.7˚, 11.5˚ and 63˚. Accumulated costs from each initial condition are 123.8, 140.7 and 2691.3, respectively.

(3760.7) is about 40% higher than that of the neurocontroller (2691.3). However, when the controller is used in more realistic situations, with a state estimator and noisy measurements, the performance of the linear controller deteriorates more than that of the neurocontroller, to the point of failing to stabilize the system for the same initial conditions. In this paper, an analysis of this performance deterioration when a state estimator is required is presented.

This paper is organized as follows. After this Introduction, the problem is detailed and expressed in mathematical terms in Section 2. Section 3 details the proposed solution. Section 4 develops the implementation of the obtained solution, together with a classical one for comparison purposes. The obtained results, with their pros and cons, are discussed in Section 5.

2. Problem Formulation

The dynamic programming approach assumes that the process evolution can be split into stages [10] , so adopting the discrete-time version of the dynamic system [9] is straightforward. The problem formulation puts the optimal control elements in formal terms. These elements are the cost function to minimize, the control law, and the dynamic system model with its constraints. If the system model cannot be expressed, or is not feasible to express, in closed analytical form through a differential equation, it is useful to generate a black-box model [13] .

This section introduces the nomenclature used along the article. The symbols are listed and explained.

i. $k\in \mathbb{N}$ describes the discrete time variable.

ii. x(k), with $x\left(k\right)\in {\Re}^{n}$ , is the time-dependent n-dimensional state vector whose components are the system’s state variables. These variables describe the process dynamics over discrete time.

iii. f is a nonlinear continuous function, $f:{\Re}^{n}\times {\Re}^{m}\to {\Re}^{n}$ , that describes the relation between the state vectors at two successive time instants.

iv. $u\left(k\right)\in {\Re}^{m}$ is the system input or manipulated vector.

v. I is a convex function, $I:{\Re}^{n}\times {\Re}^{m}\times \mathbb{N}\to {\Re}^{+}$ called the performance index designed by the control engineer.

vi. J is a convex function, $J:{\Re}^{n}\times {\Re}^{m}\times \mathbb{N}\to {\Re}^{+}$ named the cost function designed by the control engineer.

vii. X is a bounded and closed set, $X\subset {\Re}^{n}$ .

viii. U is a bounded and closed set, $U\subset {\Re}^{m}$ .

ix. μ is the control law or decision policy, $\mu :{\Re}^{n}\to {\Re}^{m}$ ; this function maps each state vector value to a control action u.

x. r and v are real-valued parameter vectors, $r,v\in {\Re}^{n+h+2}$ , where $h\in \mathbb{N}$ is determined by the control engineer.

xi. $\tilde{J}$ is the approximation of the cost function J, and its domain includes the parameter vector r, $\tilde{J}:{\Re}^{n}\times {\Re}^{m}\times {\Re}^{n+h+2}\to {\Re}^{+}$ .

xii. $\tilde{\mu}$ is a function whose behavior approximates the function μ; it includes the parameter vector v, $\tilde{\mu}:{\Re}^{n}\times {\Re}^{n+h+2}\to {\Re}^{m}$ .

xiii. ${J}^{*}\left(x\left(k\right)\right)$ is the minimum cost to go from the state x at time k up to the terminal state at time N.

xiv. ${u}^{o}\left(k\right)\in {\Re}^{m}$ is the optimal control action at time k.

xv. ${\xi}_{1},{\xi}_{2},\cdots ,{\xi}_{h}$ are real scalar values.

xvi. $\tilde{S}$ is a data set of samples from the process under study.

xvii. $C_{\left(i\right)}$ is the cost associated with a control law evolving from the state i up to the terminal process state.

xviii. ${J}_{\left(i\right)}^{\mu}$ is the value of the cost-to-go function obtained after using the control law μ starting at state i up to the terminal state at time N.

xix. $\nabla$ is the gradient operator.

xx. $Q\left(i,u\right)$ is a real-valued function associated with state i and action u.

xxi. ${\eta}_{n}$ is a function that varies with iteration number n, bounded between 0 and 1.

xxii. $\tilde{Q}\left(i,u\right)$ is the approximate version of the factor $Q\left(i,u\right)$ .

xxiii. ${\gamma}_{n}$ is the discount factor, variable with iteration n and bounded between 0 and 1.

xxiv. $\bar{\mu}$ is the control action expressed as a look-up table for every state.

xxv. ${\tilde{J}}^{\mu}\left(\cdot ,r\right)$ is the approximate cost-to-go function associated with the control law μ.

xxvi. $A\in {\Re}^{n\times n}$ is the state matrix for the linear dynamic model.

xxvii. $B\in {\Re}^{n\times 1}$ is the input matrix of the linear dynamic model.

xxviii. $F\in {\Re}^{n\times n}$ is the additive noise model at the state variables.

xxix. $v\left(k\right)\in {\Re}^{n}$ is a random sequence with Gaussian distribution in each variable, zero mean, and unit variance.

xxx. $y\left(k\right)\in \Re $ is the linear model output.

xxxi. $C\in {\Re}^{1\times n}$ is the linear model output matrix.

xxxii. $G\in \Re $ is the noise model at the measured variable.

xxxiii. $w\left(k\right)\in \Re $ is a white noise sequence with zero mean and unit variance.

xxxiv. $\delta \in \Re $ is the longitudinal displacement of the cart.

xxxv. $\dot{\delta}\in \Re $ is the longitudinal velocity of the cart.

xxxvi. $\varphi \in \Re $ is the angle of the inverted pendulum bar.

xxxvii. $\dot{\varphi}\in \Re $ is the angular velocity of the inverted pendulum bar.

xxxviii. M_{P} is the cart concentrated mass, whose value here is 0.5 kg.

xxxix. m_{P} is the bar concentrated mass, whose value here is 0.1 kg.

xl. F_{P} is the displacement friction constant, assigned 0.1 N∙s∙m^{−1}.

xli. l_{P} is the length of the pendulum bar, 0.6 m.

xlii. g_{P} is the standard acceleration due to gravity, 9.81 m∙s^{−2}.

xliii. $Q\in {\Re}^{4\times 4}$ is the weighting matrix for the state vector from k = 0 to k = N − 1, with N the terminal-state time.

xliv. $S\in {\Re}^{4\times 4}$ is the weighting matrix for the state vector at the terminal-state time N.

xlv. $R\in \Re $ is the weighting matrix for the control action variable.
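As an illustration of how the model quantities above fit together, the following sketch assembles the matrices A, B, and C (items xxvi, xxvii, xxxi) from the physical parameters (items xxxviii-xlii). It assumes the textbook small-angle cart-pendulum linearization about the upright equilibrium, with the bar mass concentrated at the tip and an assumed sampling time Ts; the paper's exact model may differ.

```python
import numpy as np

# Physical parameters from the nomenclature (items xxxviii-xlii)
M_P, m_P, F_P, l_P, g_P = 0.5, 0.1, 0.1, 0.6, 9.81

# Small-angle equations about the upright equilibrium, assuming the bar
# mass concentrated at the tip (an assumption; the paper's model may differ):
#   (M_P + m_P)*dd_delta + F_P*d_delta + m_P*l_P*dd_phi = u
#             dd_delta               + l_P*dd_phi - g_P*phi = 0
E = np.array([[M_P + m_P, m_P * l_P],
              [1.0,       l_P      ]])
Einv = np.linalg.inv(E)

# Continuous-time matrices for the state x = [delta, d_delta, phi, d_phi]^T
A = np.zeros((4, 4))
B = np.zeros((4, 1))
A[0, 1] = 1.0                        # d(delta)/dt = d_delta
A[2, 3] = 1.0                        # d(phi)/dt   = d_phi
# [dd_delta, dd_phi]^T = Einv @ [u - F_P*d_delta, g_P*phi]^T
A[1, 1], A[3, 1] = -F_P * Einv[0, 0], -F_P * Einv[1, 0]
A[1, 2], A[3, 2] = g_P * Einv[0, 1], g_P * Einv[1, 1]
B[1, 0], B[3, 0] = Einv[0, 0], Einv[1, 0]

C = np.array([[1.0, 0.0, 0.0, 0.0]])  # e.g. only the cart position measured

# Euler discretization with an assumed sampling time Ts
Ts = 0.01
Ad, Bd = np.eye(4) + Ts * A, Ts * B
```

The continuous-time A matrix has a positive real eigenvalue, confirming that the upright equilibrium is open-loop unstable, which is what makes the estimator-in-the-loop analysis nontrivial.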

Thus, to formulate the optimal control problem, the expressions of the process model in discrete time, the restrictions on the variables, and the cost function to be minimized are presented. The process evolves according to the system equation

$x\left(k+1\right)=f\left(x\left(k\right),u\left(k\right),k\right),\text{\hspace{0.17em}}k=0,1,\cdots ,N-1$ (1)

where x(0) has a fixed value and the constraints (3) must be satisfied together with the system equation. The separable cost function to be minimized is

$J\left(x\left(k\right),u\left(k\right)\right)={\displaystyle \underset{k=0}{\overset{N}{\sum}}I\left(x\left(k\right),u\left(k\right)\right)}$ (2)

where the constraints on state and manipulated variables are

$x\in X\subset {\Re}^{n},\text{}u\in U\subset {\Re}^{m}.$ (3)

The function I(·), defined by the control engineer, must be convex but not necessarily quadratic, and f(·) is the nonlinear relationship between instants k and k + 1 of the state and manipulated variables. Moreover, they are bounded and continuous functions of their arguments, and both x and u belong to closed and bounded subsets of ${\Re}^{n}$ and ${\Re}^{m}$, respectively. Then, the Weierstrass theorem asserts that a minimizing policy, also called a control law, exists. Therefore, it is desired to find a correspondence relation

$\mu \left(x\left(k\right)\right):{\Re}^{n}\to {\Re}^{m}$ (4)

that makes the process modelled by (1) evolve from any initial condition to the terminal state x(N), satisfying the constraints (3) and minimizing the cost function (2). The implementation is shown in Figure 3, where the flow of information between the controller and the closed-loop system is stated. Note that the behavior of the closed-loop system is shaped by designing the performance index, which is added at each stage in the cost function (2).
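The formulation (1)-(3) can be exercised numerically by simulating the closed loop under a candidate control law and accumulating the stage costs. A minimal sketch follows; the function name, the toy double-integrator dynamics, and the hand-picked linear law are our own illustrations, not the paper's pendulum model.

```python
import numpy as np

def rollout_cost(f, mu, I, x0, N):
    """Simulate x(k+1) = f(x(k), u(k), k) under the control law u = mu(x)
    and accumulate the separable cost (2), including the terminal stage."""
    x, J = np.asarray(x0, dtype=float), 0.0
    for k in range(N):
        u = mu(x)
        J += I(x, u)
        x = f(x, u, k)
    J += I(x, mu(x))      # the sum in (2) runs up to k = N
    return J

# Toy example (not the pendulum): discretized double integrator,
# quadratic stage cost, and a hand-picked stabilizing linear law.
f = lambda x, u, k: np.array([x[0] + 0.1 * x[1], x[1] + 0.1 * u])
I = lambda x, u: x @ x + 0.1 * u ** 2
mu = lambda x: -(x[0] + x[1])
J_fb = rollout_cost(f, mu, I, x0=[1.0, 0.0], N=200)
J_open = rollout_cost(f, lambda x: 0.0, I, x0=[1.0, 0.0], N=200)
```

Comparing such accumulated costs from a set of initial conditions is exactly the kind of figure-of-merit reported for the TVLQR and neurocontroller in Figure 1 and Figure 2.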

3. Proposed Solution

In order to solve the formulated problem, dynamic programming is used, and approximations are then introduced through the functions

$\tilde{\mu}\left(x\left(k\right),v\right):{\Re}^{n}\times {\Re}^{n+h+2}\to {\Re}^{m},$ (5)

$\tilde{J}\left(x,u,r\right):{\Re}^{n}\times {\Re}^{m}\times {\Re}^{n+h+2}\to {\Re}^{+}$ (6)

where the parameter vectors v and r must be determined.

The procedure to solve the optimal control problem for both continuous- and discrete-time dynamic systems is well known [7] [8] [14] ; it consists of analytically minimizing the proposed cost function (2) and, from this minimization, obtaining an expression for the function μ. When the system is linear and the cost function is quadratic, the optimal control problem has a unique solution through the Riccati Equation. However, when the system is nonlinear, the solution of the Hamilton-Jacobi-Bellman equation [14] must be found, which is available only for a certain class of nonlinear systems. Here, an optimization principle is used to solve the same control problem; it allows using any cost function while respecting the constraints on the state and control variables in a natural way.
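For the linear-quadratic case just mentioned, the finite-horizon solution via the Riccati Equation reduces to a backward recursion in time, which is also what underlies the TVLQR controller used for comparison in this paper. A minimal sketch of the standard recursion, with names and shapes assumed:

```python
import numpy as np

def tvlqr_gains(A, B, Q, R, S, N):
    """Backward Riccati recursion for the finite-horizon discrete-time LQR.
    Returns the time-varying gains K_0..K_{N-1}, with u(k) = -K_k x(k).
    (A sketch of the standard recursion, not code from the paper.)"""
    P = S.copy()                     # terminal weight S initializes P_N
    gains = []
    for _ in range(N):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    return gains[::-1]               # reorder so gains[k] applies at time k

# Illustrative scalar example: an unstable plant stabilized over the horizon.
A = np.array([[1.1]]); B = np.array([[1.0]])
Q = np.array([[1.0]]); R = np.array([[1.0]]); S = np.array([[1.0]])
K = tvlqr_gains(A, B, Q, R, S, N=60)
```

Away from the terminal time the gains settle to the stationary LQR solution; near k = N they vary, which is what makes the comparison controller time-varying.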

The principle of optimality [10] allows solving an optimization problem in

Figure 3. Implementation of the controller based on numerical dynamic programming.

which a dynamic process evolves over time through stages. Applying the principle of optimality to (2), we obtain

${J}^{*}\left(x\left(k\right)\right)=\underset{u\left(k\right)}{\mathrm{min}}J\left(x\left(k\right),u\left(k\right)\right)=\underset{u\left(k\right)}{\mathrm{min}}\left\{I\left(x\left(k\right),u\left(k\right)\right)+{J}^{*}\left(f\left(x\left(k\right),u\left(k\right)\right)\right)\right\},$ (7)

called Bellman’s Equation. Therefore, the optimal control action u^{o} will be

${u}^{o}\left(k\right)=\mathrm{arg}\underset{u\left(k\right)}{\mathrm{min}}\{I\left(x\left(k\right),u\left(k\right)\right)+{J}^{*}\left(f\left(x\left(k\right),u\left(k\right)\right)\right)\},$ (8)

which is the optimal decision policy, or optimal control law. Note that J^{*} does not depend explicitly on u(k), as Equation (8) shows.
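Equations (7) and (8) can be implemented directly on a discretized state space. The following toy sketch iterates Bellman's Equation to a fixed point and then extracts the greedy policy; the 1-D grid, the stage cost, and the discount factor 0.9 (included so the iteration converges) are illustrative choices of ours, not the paper's.

```python
import numpy as np

# Value iteration for Bellman's Equation (7) on a toy 1-D grid: states are
# integer points, actions move left/stay/right, and the convex stage cost
# penalizes distance from the origin.
states = np.arange(-5, 6)
actions = np.array([-1, 0, 1])
I = lambda i, u: i ** 2 + 0.1 * abs(u)
f = lambda i, u: int(np.clip(i + u, -5, 5))     # dynamics with state bounds

J = {i: 0.0 for i in states}                    # initial guess for J*
for _ in range(100):                            # sweep (7) to a fixed point
    J = {i: min(I(i, u) + 0.9 * J[f(i, u)] for u in actions)
         for i in states}

# Greedy policy (8): u_o(i) = argmin_u { I(i, u) + J*(f(i, u)) }
mu = {i: min(actions, key=lambda u, i=i: I(i, u) + 0.9 * J[f(i, u)])
      for i in states}
```

The resulting table-form policy corresponds to the $\bar{\mu}$ of the nomenclature (item xxiv); the approximations of the next subsection replace these tables with compact parametric functions.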

To obtain the control law or decision policy, there exist numerical methods [3] [4] [10] [11] and approximations [3] [5] [12] , which are detailed below. Now, an approximation function for the values of Equation (2) in a compact domain is introduced. Thus, a compact representation of the cost associated with each state of the process is obtained.

The approximation function incorporates a set of parameter vectors r, defined as a partitioned vector whose structure defines the function structure,

$r=\left\{{r}_{1}^{1},{r}_{1}^{2},\cdots ,{r}_{1}^{h},{r}_{2}\right\}$ (9)

where each vector r_{1} has the same dimension, namely the number of inputs of the function plus one to account for a constant scalar bias parameter. Then, h intermediate scalar values ξ are computed as the scalar product between the input vector x and the corresponding parameters as

${\xi}_{1}=\left[\begin{array}{cc}{x}^{\text{T}}& 1\end{array}\right]\cdot {r}_{1}^{1},$ (10)

${\xi}_{2}=\left[\begin{array}{cc}{x}^{\text{T}}& 1\end{array}\right]\cdot {r}_{1}^{2},$ (11)

$\vdots$

${\xi}_{h}=\left[\begin{array}{cc}{x}^{\text{T}}& 1\end{array}\right]\cdot {r}_{1}^{h},$ (12)

and every value is processed through the hyperbolic tangent function, avoiding large numbers, by

$f\left(\xi \right)=\frac{\mathrm{exp}\left(\xi \right)-\mathrm{exp}\left(-\xi \right)}{\mathrm{exp}\left(\xi \right)+\mathrm{exp}\left(-\xi \right)}=1-\frac{2}{1+\mathrm{exp}\left(2\xi \right)}$ (13)

where the right-hand side requires only one exp(·) computation, improving calculation time. Then, these h values, together with the bias 1, form an inner product with the remaining part of the parameter vector r, namely r_{2}, which must be dimensionally consistent to perform the product

$\tilde{J}\left(x,r\right)=\left[\begin{array}{cccc}f\left({\xi}_{1}\right)& \cdots & f\left({\xi}_{h}\right)& 1\end{array}\right]\cdot {r}_{2}.$ (14)
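The approximator defined by (9)-(13) amounts to a one-hidden-layer network. A minimal sketch, with illustrative sizes n = 4 and h = 6 (the random parameter values stand in for the r that would be fitted during training):

```python
import numpy as np

def J_approx(x, r1, r2):
    """One-hidden-layer approximator of the cost-to-go, Equations (9)-(13):
    h inner products (10)-(12), the squashing (13), and a final inner
    product of the h outputs plus the bias 1 with r2."""
    xa = np.append(x, 1.0)                        # augmented input [x^T 1]
    xi = r1 @ xa                                  # xi_j = [x^T 1] . r1^j
    f_xi = 1.0 - 2.0 / (1.0 + np.exp(2.0 * xi))   # tanh via one exp, Eq. (13)
    return np.append(f_xi, 1.0) @ r2              # output product with r2

# Example with n = 4 states and h = 6 hidden units (sizes illustrative).
rng = np.random.default_rng(0)
r1 = rng.normal(size=(6, 5))    # h parameter vectors of dimension n + 1
r2 = rng.normal(size=7)         # h + 1 output-layer parameters
value = J_approx(np.zeros(4), r1, r2)
```

Writing the hidden layer as one matrix-vector product r1 @ xa computes all of (10)-(12) at once, and the single-exp form of (13) is the same calculation-time trick the text describes.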