^{1}

^{2}

This paper proposes a novel Hamiltonian servo system, a combined modeling framework for control and estimation of a large team/fleet of autonomous robotic vehicles. The Hamiltonian servo framework represents high-dimensional, nonlinear and non-Gaussian generalization of the classical Kalman servo system. After defining the Kalman servo as a motivation, we define the affine Hamiltonian neural network for adaptive nonlinear control of a team of UGVs in continuous time. We then define a high-dimensional Bayesian particle filter for estimation of a team of UGVs in discrete time. Finally, we formulate a hybrid Hamiltonian servo system by combining the continuous-time control and the discrete-time estimation into a coherent framework that works like a predictor-corrector system.

Today’s military forces face an increasingly complex operating environment, whether dealing with humanitarian operations or in the theater of war. In many situations one has to deal with a wide variety of issues from logistics to lack of communications infrastructure [

Given the increasing success of artificial intelligence (AI) techniques in recent times and the advancements made in unmanned vehicle (UV) design, autonomous systems in particular teams or swarms of autonomous vehicles have become a potential attractive solution to solving the complexity issues facing the military today. Teams or swarms of intelligent autonomous vehicles would also introduce a level of survivability to the systems which would increase its utility in such complex and hostile environments [

A problem which arises in utilizing large teams or swarms of autonomous vehicles is one of control. Recently, the US Defence Advanced Research Projects Agency (DARPA) has sought to address some of the control issues associated with large swarm through its Offensive Swarm-Enabled Tactics (OFFSET) [

This paper proposes a Hamiltonian servo system which is a combined modeling framework for the control and estimation of such autonomous teams of UV’s. This paper will demonstrate the ability of this framework to handle the high-dimensional, nonlinear and non-Gaussian nature of the problem at hand. The authors will also show this to be a generalization of the classic Kalman Servo System. Finally results of a number of simulations will be presented to demonstrate the validity of the framework through the authors implementation using Mathematica & C++ code (which is provided in the Appendix).

Slightly more specifically, we recall here that traditional robotics from the 1970s-80s were mainly focused on building humanoid robots, manipulators and leg locomotion robots. Its modeling framework was linear dynamics and linear control, that is, linearized mechanics of multi-body systems (derived using Newton-Euler, Lagrangian, Gibs-Appel or kinetostatics equations of motion) and controlled by Kalman’s linear quadratic controllers (for a comprehensive review, see [

In contrast, contemporary robotics research has mainly been focused on field robotics, based on estimation rather than control, resulting in a variety of self-localization and mapping (SLAM) algorithms (see [

Instead of following one of these two sides of robotics, in the present paper we will try to combine them. From a robotics point-of-view, control and estimation are two complementary sides, or necessary components, of efficient manipulation of autonomous robotic vehicles. In this paper we attempt to develop a computational framework (designed in Wolfram Mathematica^{Ò} and implemented in C++) that unifies both of these components. From a mathematical point-of-view, in this paper we attempt to provide nonlinear, non-Gaussian and high-dimensional generalization of the classical Kalman servo system, outlined in the next section.

In 1960s, Rudolf Kalman developed concept of a controller-estimator servo

system (see

System state dynamics: linear, low-dimensional, multi-input multi-output process cascade (as implemented in Matlab^{Ò} Control and Signal Processing toolboxes):

x ˙ = A x + B u , y = C x + D u , [ x ( 0 ) = x 0 ] , (1)

where we have three vector spaces: state space X ⊂ ℝ n , input space U ⊂ ℝ m and output space Y ⊂ ℝ k , such that x ( t ) ∈ X is an n-vector of state variables, u ( t ) ∈ U is an m-vector of inputs and y ( t ) ∈ Y is a k-vector of outputs; x 0 is the initial state vector. The matrix variables are the following maps: i) n × n dimensional state dynamics A = A ( t ) : X → X ; ii) n × m dimensional input map B = B ( t ) : U → X ; iii) k × n dimensional output map C = C ( t ) : X → Y ; and (iv) k × m dimensional input-output transform D = D ( t ) : U → Y . The first equation in (1) is called the state or dynamics equation and the second equation in (1) is called the output or measurement equation.

Kalman feedback controller: optimal LQR-controller:^{1}

y = K [ x r e f ( t ) − x ( t ) ] , (2)

with the controller gain matrix: K = K ( t ) : X → Y = R c − 1 B T P c determined by the matrix P c governed by the controller Riccati ordinary differential equation (ODE):

P ˙ c = A T P c + P c A + Q c − P c B R c − 1 B T P c , [ P c ( 0 ) = P 0 c ] ,

where P 0 c is the initial controller matrix. Q c : X → X and R c : U → U are matrices from the quadratic cost function:

J = ∫ t 0 t 1 ( x T Q x + u T R u ) d t . (3)

Kalman filter: optimal LQG-estimator^{2} with the (additive, zero-mean, delta- correlated) Gaussian white noise η ( t ) :

x ˙ = A x + B u + L [ y − ( C x + D u ) ] + η , [ x ( 0 ) = x 0 ] , (4)

with the filter gain matrix: L = L ( t ) : X → Y = R f − 1 P f C T determined by the matrix P f governed by the filter Riccati ODE:

P ˙ f = A T P f + P f A + Q f − P f C T R f − 1 C P f , [ P f ( 0 ) = P 0 f ] ,

where P 0 f is the initial filter matrix. Q f : X → X and R f : U → U are covariance matrices from the noisy η -generalization of the quadratic cost function (3).

Defined in this way, Kalman’s controller-estimator servo system has a two-cycle action, similar to the predictor-corrector algorithm: in the first time-step the servo controls the plant (e.g., a UGV), in the second step it estimates the plants ( x , y ) coordinates, in the next step it corrects the control, then estimates again, and so on.

In the remainder of this paper we will develop a nonlinear, non-Gaussian and high-dimensional generalization of the Kalman servo.

Now we switch from matrix to ( α , β = 1 , ⋯ , n )-index notation,^{3} to label the position of n × n individual UGVs within the swarm’s global plane coordinates, longitude and latitude, respectively. The first step in the nonlinear generalization of the Kalman servo is replacing the linear, low-dimensional, state dynamics (1) and control (2)-(3) with nonlinear, adaptive and high-dimensional dynamics and control system, as follows.

In the recent paper [

H α β = 1 2 ( q α β q α − 1 β − 1 + p α β p α − 1 β − 1 ) .

The following pair of attractor matrix equations defines the activation dynamics of a bidirectional recurrent neural network:

q ˙ α β = φ ( A q − q α β − ω α β q q α β 2 p α β ) − ∑ j , k ∂ p α β H α β u α β q , (5)

p ˙ α β = φ ( A p − p α β − ω α β p p α β 2 q α β ) + ∑ j , k ∂ q α β H α β u α β p , (6)

where superscripts ( q , p ) denote the corresponding Hamiltonian equations and the partial derivative is written as: ∂ u H ≡ ∂ H / ∂ u . In Equations ((5), (6)) the matrices q α β = q α β ( t ) and p α β = p α β ( t ) define time evolution of the swarm’s coordinates and momenta, respectively, with initial conditions: q α β ( 0 ) and p α β ( 0 ) . The field attractor lines A q and A p (with the field strength φ ) are defined in the simulated urban environment using the A-star algorithm that finds the shortest Manhattan distance from point A to point B on the environmental digital map (i.e., street directory). The adaptive synaptic weights matrices: ω α β q = ω α β q ( t ) and ω α β p = ω α β p ( t ) are trained by Hebbian learning (11) defined below. Input matrices u α β q = u α β q ( t ) and u α β p = u α β p ( t ) are Lie-derivative based controllers (explained below).

Equations ((5), (6)) are briefly derived as follows (for technical details, see [

H a ( q , p , u ) = H 0 ( q , p ) − ∑ j < n H j ( q , p ) u j ( t , q , p ) , (7)

^{4 c s − 1 } are the coefficients of the linear ODE of order r for the error function e ( t ) = q j ( t ) − q j ref ( t ) :

e ( r ) + c r − 1 e ( r − 1 ) + ⋯ + c 1 e ( 1 ) + c 0 e = 0.

where H 0 ( q , p ) : T ∗ M → ℝ is the physical Hamiltonian function (kinetic plus potential energy of the swarm) and control inputs u j ( t , q , p ) are defined using recurrent Lie derivatives:^{4}

u j = o ˙ R ( r ) j − L f ( r ) H j + ∑ s = 1 r c s − 1 ( o R ( s − 1 ) j − L f ( s − 1 ) H j ) L g L f ( r − 1 ) H j . (8)

Using (7) and (8), the affine Hamiltonian control system can be formulated (for i = 1 , ⋯ , n ; j < i ) as:

q ˙ i = V i + ∂ p i H 0 − ∑ j < i ∂ p i H j u j + ∂ p i D , (9)

p ˙ i = F i − ∂ q i H 0 + ∑ j < i ∂ q i H j u j + ∂ q i D . (10)

where D is Rayleigh’s dissipative function and V i = V i ( t , q , p ) and F i = F i ( t , q , p ) are velocity and force controllers, respectively.

The affine Hamiltonian control system (9)-(10) can be reshaped into a matrix form suitable for a UGV-swarm or a planar formation of a UAV-swarm: q i → q α β , p i → p α β , u i → u α β , by evaluating dissipation terms into cubic terms in the brackets, and replacing velocity and force controllers with attractors A q and A p , respectively (with strength φ ). If we add synaptic weights ω α β we come to our recurrent neural network Equations ((5), (6)).

Next, to make Equations ((5), (6)) adaptive, we use the abstract Hebbian learning scheme:

NewValue = OldValue + Innovation

which in our settings formalizes as (see [

ω ˙ α β q = − ω α β q + Φ α β q ( q α β , p α β ) , (11)

ω ˙ α β p = − ω α β p + Φ α β p ( q α β , p α β ) ,

where the innovations are given in the matrix signal form (generalized from [

Φ α β q = S α β q ( q α β ) S α β q ( p α β ) + S ˙ α β q ( q α β ) S ˙ α β q ( p α β ) , (12)

Φ α β p = S α β p ( q α β ) S α β p ( p α β ) + S ˙ α β p ( q α β ) S ˙ α β p ( p α β ) ,

with sigmoid activation functions S ( ⋅ ) = t a n h ( ⋅ ) and signal velocities: S ˙ ( ⋅ ) = 1 − t a n h ( ⋅ ) .

In this way defined affine Hamiltonian control system (5)-(6), which is also a recurrent neural network with Hebbian leaning (11)-(12), represents our nonlinear, adaptive and high-dimensional generalization of Kalman’s linear state dynamics (1) and control (2)-(3).

Using Wolfram Mathematica^{Ò} code given below, the Hamiltonian neural net was simulated for 1 sec, with the matrix dimensions n = 10 (i.e., with 100 neurons in both matrices q i j and p i j , which are longitudes and latitudes of a large fleet of 100 UGVs, see Figures 2-4). The Figures illustrate both stability and convergence of the recurrent Hamiltonian neural net.

C++ code for the the Hamiltonian neural net is given in Appendix 1.1.

Mathematica^{Ò} code

Dimensions:

n = 10 ; Tfin = 1 ; φ = 30 ;

Coordinates and momenta:

Hyperbolic tangent activation functions:

Derivatives of activation functions:

Control inputs:

Affine Hamiltonians (nearest-neighbor coupling):

Attractor lines (figure-8 attractor):

Aq = Sin [ t ] ; Ap = Cos [ t ] ;

Equations of motion:

Eqns = Flatten [ Join [

Numerical solution:

sol = NDSolve [ [ Eqns , Flatten [ Join [

Plots of coordinates:

Plot [ Evaluate [

Plots of momenta:

Plot [ Evaluate [

Plots of weights:

Plot [ Evaluate [

The second step in the nonlinear generalization of the Kalman servo is to replace the linear/Gaussian Kalman filter (4) with a general nonlinear/non-Gaussian Monte-Carlo filter, as follows.

Recall that the celebrated Bayes rule gives the relation between the three basic conditional probabilities: i) a Prior P ( A ) (i.e., an initial degree-of-belief in event A, that is, Initial Odds), ii) Likelihood P ( B | A ) (i.e., a degree-of-belief in event B, given A, that is, a New Evidence); and Posterior: P ( A | B ) (i.e., a degree-of-belief in A, given B, that is, New Odds). Provided P ( B ) ≠ 0 , Bayes rule reads:

P ( A | B ) Posterior = P ( A ) Prior × P ( B | A ) Likelihood P ( B ) . (13)

In statistical settings, Bayes rule (13) can be rewritten as:

p ( H | D ) Posterior = p ( H ) Prior × p ( D | H ) Likelihood p ( D ) , (14)

where p ( H ) is the prior probability density function (PDF) that the hypothesis H is true, p ( D | H ) is the likelhood PDF for the data D given a hypothesis H, p ( H | D ) is the posterior PDF that the hypothesis H is true given the data D, and the normalization constant: p ( D ) = ∑ H i p ( D | H ) p ( H ) is the PDF for the data D averaged over all possible hypotheses H i .

When Bayes rule (13)-(14) is applied iteratively/recursively over signal distributions evolving in discrete time steps, in such a way that the Old Posterior becomes the New Prior and New Evidence is added, it becomes the recursive Bayesian filter, the formal origin of all Kalman and particle filters. The Bayesian filter can be applied to estimate the hidden state x t of a nonlinear dynamical system evolving in discrete time steps (e.g., a numerical solution of a system of ODEs, or discrete-time sampling measurements) in a recursive manner by processing a sequence of observations Y T = { y t } t = 1 T dependent on the state x t within a dynamic noise (either Gaussian, or non-Gaussian) η t measured as μ t . The state dynamics and measurement, or the so-called state-space model (SSM) are usually given either by Kalman’s state Equation (1), or by its nonlinear generalization. Instead of the Kalman filter Equation (4) combined with the cost function (3), the Bayesian filter includes (see, e.g. [

Time-update equation:

p ( x t | Y t − 1 ) Predictor = ∫ ℝ n p ( x t | x t − 1 ) Prior p ( x t − 1 | Y t − 1 ) Old Posterior d x t − 1 , (15)

and

Measurement-update equation:

p ( x t | Y t ) New Posterior = Z t ( − 1 ) p ( x t | Y t − 1 ) Predictor l ( y t | x t ) Likelihood , (16)

where Z t is the normalizing constant, or partition function, defined by:

Z t = ∫ ℝ n p ( x t | Y t − 1 ) l ( y t | x t ) d x t .

In the Bayesian filter (15)-(16), given the sequence of observations Y t , the posterior fully defines the available information about the state of the system and the noisy environment―the filter propagates the posterior (embodying time and measurement updates for each iteration) across the nonlinear state-space model; therefore, it is the maximum a posteriori estimator of the system state. In a special case of a linear system and Gaussian environment, the filter (15)-(16) has optimal closed-form solution, which is the Kalman filter (4). However, in the general case of a nonlinear dynamics and/or the non-Gaussian environment, it is no longer feasible to search for closed-form solutions for the integrals in (15)-(16), so we are left with suboptimal, approximate, Monte Carlo solutions, called particle filters.

Now we consider a general, discrete-time, nonlinear, probabilistic SSM, defined by a Markov process { x t } t ≥ 1 ⊆ ℝ n x , which is observed indirectly via a measurement process { y t } t ≥ 1 ⊆ ℝ n y . This SSM consists of two probability density functions: dynamics f ( ⋅ ) and measurement h ( ⋅ ) . Formally, we have a system of nonlinear difference equations, given both in Bayesian probabilistic formulation (left-hand side) and nonlinear state-space formulation (right-hand side):

Dynamics : x t + 1 | x t ~ f ( x t + 1 | x t ) ⇔ x t + 1 = f ( x t ) + ϵ t , (17)

Measurement : y t | x t ~ h ( y t | x t ) ⇔ y t = h ( x t ) + ν t , (18)

withinitialstate : x 1 ~ μ ( x 1 ) ,

where f and h are the state and output vector-functions, ϵ t and ν t are non-Gaussian process and measurement noises and μ is a given non-Gaussian distribution. The state filtering problem means to recover information about the current state x t in (17) based on the available measurements y 1 : t in (18) (see, e.g. [

The principle solution to the nonlinear/non-Gaussian state filtering problem is provided by the following recursive Bayesian measurement and time update equations:

p ( x t | y 1 : t ) New Filter = h ( y t | x t ) Measure p ( x t | y 1 : t − 1 ) Predictor p ( y t | y 1 : t − 1 ) , (19)

where

p ( x t | y 1 : t − 1 ) Predictor = ∫ f ( x t | x t − 1 ) Dynamics p ( x t − 1 | y 1 : t − 1 ) Old Filter d x t − 1 . (20)

In a particular case of linear/Gaussian SSMs (17)-(18), the state filtering problem (19)-(20) has an optimal closed-form solution for the filter PDF p ( x t | y 1 : t ) , given by the Kalman filter (4). However, in the general case of nonlinear/non-Gaussian SSMs―such an optimal closed-form solution does not exist.

The general state filtering problem (19)-(20) associated with any nonlinear/ non-Gaussian SSMs (17)-(18) can be only numerically approximated using the sequential Monte Carlo (SMC) methods. The SMC-approximation to the filter PDF, denoted by p ^ ( x t | y 1 : t ) , is an empirical distribution represented as a weighted sum of Dirac-delta functions:

p ^ ( x t | y 1 : t ) = ∑ i = 1 N w t i δ x t i ( x t ) , (21)

where the samples { x t i } i = 1 N are called particles (point-masses ‘spread out’ in the state space); each particle x t i represents one possible system state and the corresponding weight w t i assigns its probability. In this way defined particle filter (PF) plays the role of the Kalman filter for nonlinear/non-Gaussian SSMs. PF approximates the filter PDF p ( x t | y 1 : t ) using the SMC delta-approximation (21).

Briefly, a PF can be interpreted as a sequential application of the SMC technique called importance sampling (IS, see e.g. [

The IS proceeds in an inductive fashion:

p ^ ( x t − 1 | y 1 : t − 1 ) = ∑ i = 1 N w t − 1 i δ x t − 1 i ( x t − 1 ) ,

which, inserted as an old/previous filter into the time-update Equation (20), gives a mixture distribution, approximating p ( x t | y 1 : t − 1 ) as:

p ^ ( x t | y 1 : t − 1 ) = ∫ f ( x t | x t − 1 ) Dynamics ∑ i = 1 N w t − 1 i δ x t − 1 i ( x t − 1 ) d x t − 1 = ∑ i = 1 N w t − 1 i f ( x t | x t − 1 i ) . (22)

Subsequent insertion of (22) as a predictor into the measurement-update Equation (19), gives the following approximation of the filter PDF:

p ( x t | y 1 : t ) Filter ≈ h ( y t | x t ) Measure p ( y t | y 1 : t − 1 ) ∑ i = 1 N w t − 1 i f ( x t | x t − 1 i ) Weighted Dynamics ,

which needs to be further approximated using the IS. The proposal density is pragmatically chosen as: r ( x t | y 1 : t ) = ∑ j = 1 N w t − 1 j f ( x t | x t − 1 j ) . The N ancestor indices { a t i } i = 1 N are resampled into a new set of particles { a t − 1 i } i = 1 N which is subsequently used to propagate the particles to time t.

The final step is to assign the importance weights to the new particles as:

w ¯ t i = h ( y t | x t i ) ∑ j = 1 N w t − 1 j f ( x t i | x t − 1 j ) ∑ j = 1 N w t − 1 j f ( x t | x t − 1 j ) .

By evaluating w ¯ t i for i = 1 , ⋯ , N and normalizing the weights, we obtain a new set of weighted particles { x t i , w t i } i = 1 N , constituting an empirical approximation of the filter PDF p ( x t | y 1 : t ) . Since these weighted particles { x t i , w t i } i = 1 N can be used to iteratively approximate the filter PDF at all future times, this completes the PF-algorithm with an overall computational complexity of O ( N ) (as pioneered by [

We remark that the pinnacle of the PF-theory is the so-called Rao- Blackwellized (or, marginalized) PF, which is a special kind of factored PF, where the state-space dynamics (17) is split into a linear/Gaussian part and a nonlinear/non-Gaussian part, so that each particle has the optimal linear- Gaussian Kalman filter associated to it (see [

^{5}Half continuous-time, half discrete-time.

The Hamiltonian servo system is our hybrid^{5} global control-estimation framework for a large team/fleet of UGVs (e.g., 10 × 10 = 100 ), consisting of the following four components:

Adaptive Control, which is nonlinear and high-dimensional, defined by:

q ˙ α β = φ ( A q − q α β − ω α β q q α β 2 p α β ) − ∑ j , k ∂ p α β H α β u α β q , (23)

p ˙ α β = φ ( A p − p α β − ω α β p p α β 2 q α β ) + ∑ j , k ∂ q α β H α β u α β p ,

ω ˙ α β q = − ω α β q + Φ α β q ( q α β , p α β ) , ω ˙ α β p = − ω α β p + Φ α β p ( q α β , p α β ) , (24)

Φ α β q = S α β q ( q α β ) S α β q ( p α β ) + S ˙ α β q ( q α β ) S ˙ α β q ( p α β ) , ( α , β = 1 , ⋯ , n )

Φ α β p = S α β p ( q α β ) S α β p ( p α β ) + S ˙ α β p ( q α β ) S ˙ α β p ( p α β ) .

Measurement process, given in discrete-time, probabilistic and state-space Hamiltonian ( q , p )-form:

p t α β | q t α β ~ h ( p t α β | q t α β ) ⇔ p t α β = h ( q t α β ) + ν t , with: q 1 α β ~ μ ( q 1 α β ) , (25)

where ν t is a non-Gaussian measurement noise. Note that here we use the inverse dynamics (given coordinates, calculate velocities, accelerations and forces)―as a bridge between robot’s self-localization and control: measuring coordinates q t α β at discrete time steps t and from them calculating momenta p t α β and control forces p ˙ t α β ;

Bayesian recursions, given in discrete-time Hamiltonian ( q , p )-form:

p ( q t α β | p 1 : t α β ) New Filter = h ( p t α β | q t α β ) Measure p ( q t α β | p 1 : t − 1 α β ) Predictor p ( p t α β | p 1 : t − 1 α β ) ,

where

p ( q t α β | p 1 : t − 1 α β ) Predictor = ∫ f ( q t α β | q t − 1 α β ) Dynamics p ( q t − 1 α β | p 1 : t − 1 α β ) Old Filter d x t − 1 ; (26)

and

p ( q t α β | p 1 : t α β ) FilterPDF ≈ h ( p t α β | q t α β ) Measure p ( p t α β | p 1 : t − 1 α β ) ∑ ( i ) = 1 N w t − 1 ( i ) f ( q t α β | q t − 1 α β ( i ) ) We i ghted Dynamics , (27)

with IS weights:

w t ( i ) = h ( p t α β | q t α β ( i ) ) ∑ n = 1 N w t − 1 ( n ) f ( q t α β ( i ) | q t − 1 α β ( n ) ) ∑ n = 1 N w t − 1 ( n ) f ( q t α β | q t − 1 α β ( n ) ) ,

where bracketed superscripts ( i , n ) label IS weights.

This Hamiltonian servo framework has been implemented in C++ language (using Visual Studio 2015; see Appendix 1.2). A sample simulation output of the servo system is given in

In this paper we have outlined a novel control and estimation framework called the Hamiltonian Servo system. The framework was shown to be a generalization of the Kalman Servo System. Using an example of a large team/fleet of 81 ( 9 × 9 ) unmanned ground vehicles (UGVs), it was shown that the framework described provided the control and estimation as required. This approach encompassed a three tired system of adaptive control, measurement process and Bayesian recursive filtering which was demonstrated to enable control and estimation of multidimensional, non-linear and non-Gaussian systems.

The authors are grateful to Dr. Martin Oxenham, JOAD, DST Group, Australia for his constructive comments which have improved the quality of this paper.

Ivancevic, V. and Pourbeik, P. (2017) Hamiltonian Servo: Control and Estimation of a Large Team of Autonomous Robotic Vehicles. Intelligent Control and Automation, 8, 175-197. https://doi.org/10.4236/ica.2017.84014

The following C++ code uses the Armadillo matrix library [