Distributed Learning for Echo State Networks with Dynamic Event-Triggered Consensus

Xiyu Gong; Wu Ai

doi:10.4236/jcc.2025.134003

Journal of Computer and Communications > Vol.13 No.4, April 2025

Distributed Learning for Echo State Networks with Dynamic Event-Triggered Consensus

Xiyu Gong¹, Wu Ai^1,2*
¹School of Mathematics and Statistics, Guilin University of Technology, Guilin, China.
²Guangxi Colleges and Universities Key Laboratory of Applied Statistics, Guilin, China.
DOI: 10.4236/jcc.2025.134003 PDF HTML XML 34 Downloads 133 Views

Abstract

This paper is devoted to studying the dynamic event-triggered consensus problem in distributed Echo State Networks (ESNs). The traditional ESN problem can be formulated as a set of distributed sub-problems with consensus constraints, which are solved using the Zero-Gradient-Sum (ZGS) distributed optimization strategy. Additionally, each agent employs a distributed dynamic event-triggered strategy based on its internal dynamic variables to achieve asymptotic consensus convergence. The purposeful dynamic triggering strategy is more communication-efficient compared to static triggering strategies and eliminates the need for continuous communication to update controllers or monitor triggering thresholds. Numerical simulation results are presented to validate the efficacy of the proposed algorithm.

Keywords

Echo State Network (ESN), Dynamic Event-Triggered Control, Optimization

Share and Cite:

Gong, X. and Ai, W. (2025) Distributed Learning for Echo State Networks with Dynamic Event-Triggered Consensus. Journal of Computer and Communications, 13, 35-49. doi: 10.4236/jcc.2025.134003.

1. Introduction

Echo state network is a novel type of recurrent neural network. Compared to other recurrent neural networks, ESN can handle time series classification and regression tasks by simply training the weight matrix connecting the hidden layer to the output layer using linear regression [1]. In practical applications, due to the large volume of data and privacy concerns, distributed ESNs have been widely applied [2]. In recent years, extensive research has been conducted on ESNs. For example, Zhu proposed a distributed learning method for DeepESN (DESN) based on improved singular value decomposition (SVD) and alternating direction method of multipliers (ADMM), which effectively reduces computation and communication costs for training DESNs in edge environments [3]. In [4], a novel distributed ESN model integrated with an auto-encoder (AE-DESNm) eliminates the dimension disaster problem and uses an extreme learning machine (ELM) to extract features for estimating key variables. Integrating ESN with the decentralized average consensus (DAC) [5] approach facilitates achieving consensus in output values. Over time, considerable advancements have been made in methods for enhancing ESNs. For instance, a hybrid evolutionary algorithm that combines a competitive swarm optimizer with local search has been applied to mitigate the challenges associated with the random selection of input and reservoir weights [6]. Additionally, an optimized echo state network (O-ESN) proposed in [7] utilizes binary particle swarm optimization to minimize network complexity while improving generalization capabilities.

In a distributed framework, the assumption for achieving consensus convergence is often that each agent communicates continuously with its neighbors, which typically requires unlimited communication resources. However, a perfect communication network with infinite bandwidth capacity is usually not feasible in real-life scenarios [8]. To address the resource consumption issue in continuous communication among multi-agents, Lu proposed a general sampling framework where the sampled signals originate from a known union of subspaces, and the sampling operator is linear. Furthermore, the framework identified the minimum sampling requirements for several classes of signals [9]. If the continuous sampling signals fluctuate minimally, agents will engage in continuous unnecessary communication, resulting in a large amount of wasted samples [10]. The introduction of the event-based control method has garnered widespread interest, as it adjusts the sampling sequence based on events related to the system state, thus avoiding the communication resource wastage associated with traditional periodic sampling [11]. An event-triggered consensus protocol and triggering law proposed in [12] can achieve global consensus if and only if the underlying directed graph has a directed spanning tree. Moreover, it is found that the existence of a directed spanning tree is always a necessary and sufficient condition for achieving global consensus, regardless of whether input saturation is present. This conclusion remains valid even when more complex nonlinear dynamic behaviors are introduced. The fully distributed event-triggered algorithm proposed by Berneberg ensures asymptotic convergence to the average consensus state while allowing each agent to independently select a desired minimum inter-event time, thereby guaranteeing non-Zeno behavior [13]. In [14], static and dynamic event-triggered methods are introduced as secondary control techniques to restore voltage frequency and improve power sharing accuracy.

A new dynamic event-triggered control strategy, which includes internal dynamic variables, was proposed in [15]. Compared to static event-triggered strategies, it offers more advantages. As a result, dynamic event-triggered strategies have been used in recent years to solve the consensus problem in multi-agent systems (MAS). Later, dynamic event triggering was applied to islanded DC microgrids [16] and vehicle platoon systems [17], where the data from neighboring DGs are used only at event-triggered times, thus reducing the burden on the communication network. In our previous work [18], we applied an event-triggered control policy to the communication component of the problem to avoid unnecessary transmissions, where the agents send messages only when the negotiation is critically needed. In [19], a distributed cooperative learning algorithm designed for stochastic configuration networks with fixed-time convergence, known as FixD-SCN, which is presented to tackle the ‘Big Data’ problem. To our best knowledge, little effort has been made in the field of dynamic event-triggered distributed learning over peer-to-peer networks, especially for RNNs.

The purpose of this paper is to propose a distributed learning algorithm for training the ESN, in which all agents converge asymptotically while minimizing the use of communication resources during the process. To accomplish this objective, we extend the multi-agent system with dynamic event-triggered algorithms to address the distributed learning problem. Initially, we convert the centralized ESN into a distributed form by breaking down the training process into smaller components, each subject to equality constraints on the output parameters. Next, a dynamic event-triggered algorithm is used to reduce communication during the process of optimizing the distributed ESN output weights. The proposed distributed algorithm is inspired by the use of dynamic event-triggered techniques to achieve distributed consensus convergence [20]. This work extends that of Scardapane et al. [21] by computing the final optimal consensus of ESN output parameters. The primary contributions presented in this paper are summarized below.

A fully decentralized algorithm for training ESNs is introduced, where the training process is collaboratively executed without relying on a central coordinator.
The convergence with dynamic event-triggered mechanism of a distributed learning algorithm is addressed to provide an asymptotically convergent.
The convergent property of the proposed algorithm is mathematically substantiated when the Lyapunov theory is employed.

Notation: Denote the set of real (natural) numbers as $ℝ$ ( $ℕ$ ). Let $1_{m}$ ( $0_{m}$ ) represent the $m$ -dimensional column vector where each entry is 1 (0). The matrix $I_{L} \in ℝ^{L \times L}$ is the identity matrix. Letting $‖ \cdot ‖$ and ${‖ \cdot ‖}_{1}$ represent the Euclidean norm (2-norm) and 1-norm, respectively. $ρ (\cdot)$ denotes the spectral radius of a matrix. $⊤$ is used to denote the transpose of a vector or matrix and $\otimes$ to represent the Kronecker product. For a vector $x = {[x_{1}, x_{2}, \dots, x_{D}]}^{⊤} \in ℝ^{D}$ , $| x | = {[| x_{1} |, \dots, | x_{D} |]}^{⊤}$ , denote the $υ$ -norm of $x$ by ${‖ x ‖}_{υ} = {(\sum_{i = 1}^{D} {| x_{i} |}^{υ})}^{\frac{1}{υ}}$ , denote the sign of $x$ by $sign (x) = {[sign (x_{1}), \dots, sign (x_{D})]}^{⊤}$ , where $sign (\cdot)$ represents the sign function; denote $x^{[α]} = {[x_{1}^{[α]}, \dots, x_{D}^{[α]}]}^{⊤}$ with $x_{i}^{[α]} = {| x_{i} |}^{α} sign (x_{i})$ , where $| \cdot |$ denotes the absolute value. Then we have $x_{i} sign (x_{i}) = | x_{i} |$ , $x_{i} x_{i}^{[α]} = {| x_{i} |}^{α + 1}$ and $| x_{i} | x_{i}^{[α]} = x_{i}^{[α + 1]}$ . Let $\underline{≺}$ denote the matrix inequality sign, i.e., $A \underline{≺} B$ , indicating that $B - A$ is a positive semidefinite matrix.

2. Problem Formulation

2.1. Graph Theory

We design a communication network of $J$ agents distributed over a geographic area. Mathematically, we represent the communication network by a graph $G = (V, ℰ, A)$ of order $J$ with $V = {1, \dots, J}$ depicting the set of vertices, $ℰ \subseteq V \times V$ representing the edge set, and $A = [a_{i j}] \in ℝ^{J \times J}$ illustrates the weighted adjacency matrix, where $a_{i j} > 0$ if $(j, i) \in ℰ$ and $a_{i j} = 0$ , otherwise. The graph has no loops or multiple edges. If there is a connection between node $i$ and node $j$ , $(j, i) \in ℰ$ . The set of neighbors of node $i$ is represented by $N_{i} = {j : (j, i) \in ℰ}$ . Define the Laplacian matrix $ℒ = [l_{i j}] \in ℝ^{J \times J}$ , where $l_{i j} = - a_{i j}$ for $i \neq j$ and $l_{i i} = \sum_{j = 1}^{J} a_{i j}$ . The matrix $ℒ$ is symmetric and positive semi-definite, with all eigenvalues being non-negative.

2.2. Problem Formulation

Lemma 1 (graphs) For an undirected connected graph $G$ with $J$ vertices, the eigenvalues of $ℒ$ contains only one zero entry. $0 = λ_{1} (ℒ)$ possessing a right eigenvector, $1$ , i.e., $ℒ 1 = 0$ , and other have positive real parts. Suppose $0 = λ_{1} (ℒ) < λ_{2} (ℒ) \leq \dots \leq λ_{J} (ℒ) = λ_{\max} (ℒ)$ in an ascending order. The minimum nonzero eigenvalue $λ_{2} (ℒ) = \min_{x \neq 0, 1^{⊤} x = 0} \frac{x^{⊤} ℒ x}{x^{⊤} x}$ , where $x = {[x_{1}, \dots, x_{J}]}^{⊤}$ . We have

$x^{⊤} ℒ x = \frac{1}{2} \sum_{i = 1}^{J} \sum_{j = 1}^{J} a_{i j} {(x_{j} - x_{i})}^{2} .$ (1)

Furthermore, if $1^{⊤} x = 0$ , then $x^{⊤} ℒ x \geq λ_{2} (ℒ) x^{⊤} x$ .

Lemma 2 (nonsingular) Let $κ_{1}, κ_{2}, \dots, κ_{J} \geq 0$ . Then

${(\sum_{i = 1}^{J} κ_{i})}^{p} \leq \sum_{i = 1}^{J} κ_{i}^{p} \leq J^{1 - p} {(\sum_{i = 1}^{J} κ_{i})}^{p}$ (2)

if $0 < p \leq 1$ , and

$J^{1 - p} {(\sum_{i = 1}^{J} κ_{i})}^{p} \leq \sum_{i = 1}^{J} κ_{i}^{p} \leq {(\sum_{i = 1}^{J} κ_{i})}^{p}$ (3)

if $1 < p < \infty$ .

2.3. Distributed ESN over Distributed Datasets in a Multi-Agent Networks

ESN consists of three parts: an input layer, a hidden layer (reservoir), and an output layer. It is a variant of feedforward neural networks, with the training process involving feeding back output results to the hidden layer to adjust connection weights. The input vector $x_{n} \in ℝ^{D_{in}}$ , with a dimensionality of $D_{in}$ , is fed into a reservoir of dimensionality $D_{res}$ . The internal state of the reservoir, $g_{n} \in ℝ^{D_{res}}$ , is computed as follows:

$g_{n} = f_{res} (β_{in}^{res} x_{n} + β_{res}^{res} g_{n - 1} + β_{out}^{res} y_{n - 1}),$ (4)

where $β_{in}^{res} \in ℝ^{D_{res} \times D_{in}}, β_{res}^{res} \in ℝ^{D_{res} \times D_{res}}$ and $β_{out}^{res} \in ℝ^{D_{res} \times D_{out}}$ are matrices generated randomly. The nonlinear function $f_{res} (\cdot)$ is applied, where $y_{n - 1} \in ℝ^{D_{out}}$ denotes the output from the previous time step. The output at the current step is then calculated as

$y_{n} = f_{out} (β_{in}^{out} x_{n} + β_{res}^{out} g_{n}) .$ (5)

Depending on the specific dataset, the parameters $β_{in}^{out} \in ℝ^{D_{out} \times D_{in}}$ and $β_{res}^{out} \in ℝ^{D_{out} \times D_{res}}$ are adapted, so as an invertible nonlinear function $f_{out} (\cdot)$ . To train the readout, assume that there is a sequence of $N$ input-output pairs $(x_{1}, y_{1}), \dots, (x_{N}, y_{N})$ . The inputs are mapped to the reservoir, generating a series of internal states $g_{1}, \dots, g_{N}$ . At this stage, due to the unavailability of the ESN output for feedback, the designated output is substituted in (4). We obtain the hidden matrix $H$ and output matrix $S$ as

$H = [\begin{matrix} x_{1}^{⊤}, g_{1}^{⊤} \\ ⋮ \\ x_{N}^{⊤}, g_{N}^{⊤} \end{matrix}], S = [\begin{matrix} f_{out}^{- 1} (y_{1}^{⊤}) \\ ⋮ \\ f_{out}^{- 1} (y_{N}^{⊤}) \end{matrix}] .$ (6)

The optimal weight vector for the output is obtained by solving the subsequent regularized least-squares problem:

$β^{*} = \underset{β}{\arg \min} {\frac{1}{2} {‖ H β - S ‖}^{2} + \frac{C}{2} {‖ β ‖}^{2}},$ (7)

where $β = {[β_{in}^{out}, β_{res}^{out}]}^{⊤} \in ℝ^{(D_{in} + D_{res}) \times D_{out}}$ and $C > 0$ is a regularization parameter. The solution to the problem in (7) can be written as

$β^{*} = {(H^{⊤} H + C I_{L})}^{- 1} H^{⊤} S,$ (8)

where $L = D_{in} + D_{res}$ .

To transform the centralized ESN into a distributed ESN, at the outset, we break down the centralized ESN task into several subtasks, each corresponding to a specific agent. Obtaining the optimal solution for the regularized problem leads to the determination of the output weights $β^{*}$ .

$\begin{array}{l} min_{{β_{i}}, {ξ_{n, i}}} & \frac{1}{2} \sum_{i = 1}^{J} \sum_{n = 1}^{N_{i}} {‖ ξ_{n, i} ‖}^{2} + \frac{C}{2 J} \sum_{i = 1}^{J} {‖ β_{i} ‖}^{2} \\ s . t . & β_{i}^{⊤} h_{n, i} = s_{n, i}^{⊤} - ξ_{n, i}^{⊤}, \forall i \in V, n = 1, \dots, N_{i}, \\ β_{i} = β_{j}, \forall i \in V, j \in N_{i} . \end{array}$ (9)

The training error vector for the output neurons, corresponding to the training example $x_{n, i}$ at the $i$ -th learning agent, is denoted by $ξ_{n, i} \in ℝ^{D_{out}}$ . To convert a centralized ESN into a fully distributed ESN, the overall output weights $β$ are first duplicated and assigned to each agent. Equality constraints are then enforced along the edges of the communication network. By incorporating the constraints into the objective function, problem (9) can be reformulated as

$\begin{array}{l} min_{{β_{i}}} & \sum_{i = 1}^{J} (\frac{1}{2} {‖ H_{i} β_{i} - S_{i} ‖}^{2} + \frac{C}{2 J} {‖ β_{i} ‖}^{2}) \\ s . t . & β_{i} = β_{j}, \forall i \in V, j \in N_{i}, \end{array}$ (10)

where the hidden matrix $H_{i}$ and the training target matrix $S_{i}$ are autonomously acquired from the respective dataset $D_{i}$ observed by agent $i$ itself. For simplicity in presentation, we begin by simplifying problem (10) as

$β^{*} = \underset{β}{\arg \min} {\sum_{i = 1}^{J} u_{i} (β) ≜ u (β)},$ (11)

where

$u_{i} (β) = \frac{1}{2} {‖ H_{i} β - S_{i} ‖}^{2} + \frac{C}{2 J} {‖ β ‖}^{2}$ (12)

is the individual function performed by agent $i$ locally. Agent $i$ is capable of independently optimizing its local objective and establishing local consensus constraints through communication with neighboring agent $j$ .

2.4. Distributed ESN Solved by the ZGS Algorithm

The ZGS algorithm aims to achieve optimal consensus on a manifold where the sum of the gradients of local functions is consistently zero. Let $β_{i} (t)$ denote the state of agent $i$ estimating the unknown optimum $β^{*}$ at iterative step $k$ , with $β_{i} (0)$ indicating the initial value. The objective is to drive $β_{i} (t)$ of each agent asymptotically to $β^{*}$ , i.e., assuming convexity in the objective function, the global objective can be optimally achieved uniquely. According to the form of the objective function, it is known that it is a convex function. The global objective can be uniquely achieved optimally.

Assumption 1 For each $i \in V$ , the function $u_{i}$ demonstrates strong convexity and maintains twice continuous differentiability.

Under Assumption 1, there is a unique minimizer $β_{i}^{*}$ , i.e.,

$β_{i}^{*} = {(H_{i}^{⊤} H_{i} + \frac{C}{J} I_{L})}^{- 1} H_{i}^{⊤} S_{i}$ (13)

such that the gradient $\nabla u_{i} (β_{i}^{*}) = 0$ . The ZGS algorithm is formulated as an iterative procedure under the assumption of a zero gradient for $u_{i}$ . The following implicit distributed algorithm is considered to achieve the goal.

$\begin{array}{l} \nabla u_{i} (β_{i} (t + 1)) - \nabla u_{i} (β_{i} (t)) \\ = γ_{1} \sum_{j \in N_{i}} a_{i j} {(β_{j} (t) - β_{i} (t))}^{[α_{1}]} + γ_{2} \sum_{j \in N_{i}} a_{i j} {(β_{j} (t) - β_{i} (t))}^{[α_{2}]} \end{array}$ (14)

with $β_{i} (0) = β_{i}^{*}$ , where $γ_{1}, γ_{2} > 0, 0 < α_{1} < 1, α_{2} > 1$ and $a_{i j}$ represents entries in the weighted adjacency matrix $A$ of graph $G$ . Under the assumption that the graph $G$ is undirected and connected, we have

$\sum_{i = 1}^{J} (\nabla u_{i} (β_{i} (t + 1)) - \nabla u_{i} (β_{i} (t))) = 0.$ (15)

Assuming $β_{i} (0) = β_{i}^{*}$ , we have

$\sum_{i = 1}^{J} \nabla u_{i} (β_{i} (0)) = \sum_{i = 1}^{J} \nabla u_{i} (β_{i}^{*}) = 0.$ (16)

Consequently, the consensus value $β^{o}$ must be the optimizer $β^{*}$ . In such sense, a shared $β^{*}$ is determined for each agent, ensuring $\nabla u (β^{*}) = \sum_{i = 1}^{J} \nabla u_{i} (β^{*}) = 0$ . So that (10) holds. Furthermore, the gradient of the function $u_{i} (β)$ with respect to $β$ can be easily computed as $\nabla u_{i} (β) = (H_{i}^{⊤} H_{i} + \frac{C}{J} I_{L}) β - H_{i}^{⊤} S_{i}$ . The Hessian of $u_{i} (β)$ is

$\nabla^{2} u_{i} (β) = H_{i}^{⊤} H_{i} + \frac{C}{J} I_{L} ≜ {\tilde{Ω}}_{i} .$ (17)

The matrix ${\tilde{Ω}}_{i}$ is static, nonsingular, symmetric and positive definite. Letting $θ_{i} = λ_{min} ({\tilde{Ω}}_{i}) > 0$ and $Θ_{i} = λ_{max} ({\tilde{Ω}}_{i}) > 0$ denote the minimum and maximum eigenvalues of ${\tilde{Ω}}_{i}$ , respectively, then

$θ_{i} I_{L} \underline{≺} {\tilde{Ω}}_{i} \underline{≺} Θ_{i} I_{L} .$ (18)

From the gradient of $u_{i} (β)$ , we have $\nabla u_{i} (β_{i} (t + 1)) - \nabla u_{i} (β_{i} (t))$ $= {\tilde{Ω}}_{i} (β_{i} (t + 1) - β_{i} (t))$ . Then, $β_{i} (t + 1) - β_{i} (t) = {\tilde{Ω}}_{i}^{- 1} (\nabla u_{i} (β_{i} (t + 1)) - \nabla u_{i} (β_{i} (t)))$ . Substituting this expression into the left-hand side of (14), an algorithm that iteratively solves the problem in a distributed manner is designed. For each agent $i \in V$ , we design an algorithm for distributed ESN (ZGS-ESN). The ZGS-ESN iterates as follows:

$\begin{matrix} β_{i} (t + 1) = β_{i} (t) + γ_{1} {\tilde{Ω}}_{i}^{- 1} \sum_{j \in N_{i}} a_{i j} {(β_{j} (t) - β_{i} (t))}^{[α_{1}]} \\ + γ_{2} {\tilde{Ω}}_{i}^{- 1} \sum_{j \in N_{i}} a_{i j} {(β_{j} (t) - β_{i} (t))}^{[α_{2}]} \end{matrix}$ (19)

where $γ_{1}, γ_{2} > 0, 0 < α_{1} < 1, α_{2} > 1$ , with initial values

$β_{i} (0) = {\tilde{Ω}}_{i}^{- 1} H_{i}^{⊤} S_{i},$ (20)

where the matrix ${\tilde{Ω}}_{i}$ is defined in (17). However, the continuous updating strategy may result in elevated communication consumption and frequent updates. The dynamic event-triggered strategy is an effective approach to overcome these disadvantages, in which the information interactions only happen at the trigger instants.

3. Distributed ESN with Dynamic Event-Triggered Consensus

Under the dynamic event-triggered strategy, we consider the following distributed algorithm:

$\begin{array}{l} β_{i} (t + 1) = β_{i} (t) + {\tilde{Ω}}_{i}^{- 1} \sum_{j \in N_{i}} a_{i j} (γ_{1} {(β_{j} (t_{k}^{i}) - β_{i} (t_{k}^{i}))}^{[α_{1}]} \\ + γ_{2} {(β_{j} (t_{k}^{i}) - β_{i} (t_{k}^{i}))}^{[α_{2}]}), t \in [t_{k}^{i}, t_{k + 1}^{i}) \end{array}$ (21)

with initial values as in (20), where $t_{k}^{i}$ is the latest triggering time for agent $i$ . Drawing inspiration from [24], we begin by introducing the following combined measurement variable:

$p_{i} (t) = \sum_{j = 1}^{N} a_{i j} (β_{j} (t) - β_{i} (t)) .$ (22)

Based on the definition of the combined measurement, we express the measurement error as follows :

$e_{i} (t) = p_{i} (t_{k}^{i}) - p_{i} (t) .$ (23)

Agent $i$ requires the combined measurement $p_{i} (t)$ only at specific time intervals, known as triggering times (or event times), which are denoted by $t_{0}^{i}, t_{1}^{i}, t_{2}^{i}, \dots$ . At each triggering time $t_{k}^{i}$ , agent $i$ must calculate $p_{i} (t_{k}^{i})$ through communication with its neighbors. Given the current triggering time $t_{k}^{i}$ , the subsequent triggering time $t_{k + 1}^{i}$ is determined by the following triggering mechanism:

${\begin{array}{l} t_{k + 1}^{i} = inf {t > t_{k}^{i} | h_{i}^{d} (e_{i} (t), \cup_{j \in N_{i}} x_{j} (t), η_{i} (t)) \geq 0}, \\ {\dot{η}}_{i} (t) = g_{i} (e_{i} (t), p_{i} (t), η_{i} (t)), \end{array}$ (24)

where $η_{i} (t)$ represents the internal dynamic state and is the measurement error $e_{i} (t)$ that needs to be defined.

The triggering function $h_{i}^{d} (\cdot)$ is affected not only by the state of the controlled system but also by the internal dynamic state, denoted as $η_{i} (t)$ . This type of triggering mechanism, referred to as a dynamic event-triggering mechanism, was first introduced for single controlled system in [15]. When the internal dynamic state $η_{i} (t)$ is disregarded , the dynamic event-triggering approach simplifies to a static event-triggering mechanism, expressed as $h_{i}^{s} (e_{i} (t), \cup_{j \in N_{i}} β_{j} (t)) \geq 0$ . In this study, the dynamic triggering conditions are defined by the following equations.

${\begin{cases} t_{k + 1}^{i} = inf {t > t_{k}^{i} | {‖ e_{i} (t) ‖}^{2} - δ_{i} {‖ p_{i} (t) ‖}^{2} - π_{i} η_{i} (t) \geq 0}, \\ {\dot{η}}_{i} (t) = - τ_{i} η_{i} (t) + θ_{i} (δ_{i} {‖ p_{i} (t) ‖}^{2} - {‖ e_{i} (t) ‖}^{2}), η_{i} (0) > 0, \end{cases}$ (25)

where, $θ_{i} = 68$ , $τ_{i} = 0.03$ , $π_{i} = 0.002$ and $δ_{i} = 0.01$ . In a multi-agent system, both static and dynamic event-triggering mechanisms need to be decentralized, meaning that each triggering mechanism is restricted to accessing information from its neighboring agents.

4. Main Results

In this section, a numerical example is provided to verify the effectiveness of the proposed algorithm.

4.1. Description of the Datasets

The ZGS-FxTdt-ESN algorithm was validated on synthetic dataset designed for nonlinear system identification and the prediction of chaotic time series. For a large-scale simulation analysis, we utilize datasets that are approximately one to two orders of magnitude larger than those used in earlier studies. To be specific, for each dataset, we create 50 sequences, each comprising 2,000 elements. These sequences start from varying initial conditions, resulting in a total of 100,000 samples for each experiment.

The Mackey-Glass chaotic time-series (referred to as MKG) dataset is used for prediction tasks. Characterized in continuous time, this series follows the subsequent differential equation:

${\dot{x}}_{n} = - 0.1 x_{n} + \frac{0.2 x_{n - τ}}{1 + x_{n - 30}^{10}} .$ (26)

For $τ > 16.8$ , the time-series given by (26) undergoes integration using a fourth-order Runge-Kutta method with a time step of 0.1. Subsequently, it is sampled every 10 time-instants. The task then becomes a 10-step-ahead prediction problem

$y_{n} = x_{n + 10} .$ (27)

The functional form of Geometric Brownian Motion (GBM) is: $S_{t} = S_{t - 1} \cdot e^{(μ - \frac{1}{2} σ^{2}) d t + σ d W}$ . In this model: $μ = 0.0003$ represents the drift rate, governing the long-term trend of the asset price; $σ = 0.005$ denotes the volatility, determining the magnitude of price fluctuations; $S (t)$ is the asset price at time $t$ ; $d W (t)$ is the increment of Brownian motion over an infinitesimal time interval $d t$ . Based on this model, we will generate 50 time series, each containing 2000 consecutive time steps of simulated asset price data.

4.2. The Setting of the Algorithms

We create an agent network employing a model with a randomly generated topology with a 55% connectivity rate. Our experimentation involves varying the number of agents, starting from 4 and incrementing by 4 up to 20. To assess testing errors, we employ a 3-fold cross-validation on the initial set of 50 sequences. Each sequence is 2000 units in length. In every fold of the cross-validation process, the training sequences are distributed across the agents, after which we assess the performance of the four algorithms listed below:

C-ESN: This algorithm is a single-agent ESN. The training data is centralized, and the ESN is trained by directly addressing the problem.
L-ESN: In this algorithm, each agent independently trains a localized ESN using its own data, with no communication taking place. The average testing error is computed across all agents.
ZGS-ESN: This algorithm set $γ_{2} = 0$ and $α_{1} = 1$ . The maximum number of iterations is 400 with an error level $ϵ = 10^{- 6}$ . The parameter $γ_{1} = 0.01$ . ZGS-FxTet-ESN: This algorithm is an extension of the ZGS-FxT-ESN with $ω = 0.01$ . We set the maximum number of iterations at 400, and the error threshold at $ϵ = 10^{- 6}$ .
ZGS-FxTdt-ESN: We set the maximum number of iterations at 400, $ω = 0.01$ and the error threshold at $ϵ = 10^{- 6}$ .

Each of the algorithms employs an identical ESN architecture, which is detailed in the following subsection. The threefold cross-validation process is iterated 10 times with variations in ESN initialization and data partitioning . Errors from each iteration and fold are accumulated. The trained ESN is applied to the test sequences to compute the error , and the resulting predicted outputs ${\tilde{y}}_{1}, \dots, {\tilde{y}}_{N_{test}}$ are compiled. In this context, $N_{test}$ denotes the count of testing samples excluding the initial washout elements from the test sequences. The normalized root mean-square error (N-RMSE) is defined as follows:

(a) [MKG] (b) [GBM]

Figure 1. The output of the algorithms for the network of 20 agents on MKG and GBM dataset. The performance of the distributed algorithms (ZGS-ESN, ZGS-FxTet-ESN and ZGS-FxTdt-ESN) is averaged across the agents.

(a) [MKG] (b) [GBM]

Figure 2. Evolution of testing error (NRMSE) for networks in the range of 4 to 20 agents.

(a) [MKG] (b) [GBM]

Figure 3. Evolution of training time for networks in the range of 4 to 20 agents.

Figure 4. Event-triggered instants of the agents.

$NRMSE = \sqrt{\frac{\sum_{n = 1}^{N_{test}} {({\tilde{y}}_{n} - y_{n})}^{2}}{N_{test} {\hat{σ}}_{t}}} .$ (28)

In this context, ${\hat{σ}}_{t}$ signifies an empirical estimate of the variance in the actual output samples $y_{1}, \dots, y_{N_{test}}$ , where $N_{test}$ is the count of testing samples, and ${\tilde{y}}_{1}, \dots, {\tilde{y}}_{N_{test}}$ , denote the predicted outputs.

4.3. ESN Architecture

A regularization factor of $C = 10$ and a reservoir size of $D_{res} = 300$ were selected, and these choices proved to be effective in all situations. In the reservoir, we employ $tanh (\cdot)$ nonlinearities, and for the output function, a scaled identity $f (s) = α_{t} s$ is utilized. In all instances, the parameter $α_{t}$ is set by searching within the set ${0.1, 0.3, \dots, 0.9}$ . The matrix $β_{in}^{res}$ is populated with entries drawn from a uniform distribution within the range $[- α_{i}, α_{i}]$ . The parameter $α_{i}$ is explored within the same range as $α_{t}$ . The matrix $β_{out}^{res}$ , which links the output to the reservoir, is initialized as a complete matrix with entries drawn from a uniform distribution within the range $[- α_{f}, α_{f}]$ . The parameter $α_{f}$ is searched within the same range as $α_{t}$ . $α_{f} = 0$ is allowed for the case in which no output feedback is required.

The reservoir matrix $β_{res}^{res}$ is initialized using a uniform distribution in the range of $[- 1, 1]$ . Subsequently, around 55% of its connections are set to zero to foster sparsity. Lastly, the matrix is adjusted in scale to achieve a target spectral radius $ρ$ , with $ρ$ being explored within the same range as $α_{t}$ . Furthermore, uniform noise is introduced during the state update of the reservoir, where the noise is sampled from a uniform distribution within the interval $[0, 10^{- 3}]$ . Additionally, the initial $D = 100$ elements are excluded from each sequence. Table 1 displays the ultimate configurations obtained through the grid-search process.

Table 1. Description of the parameters.

Dataset	$ρ$	$α_{i}$	$α_{t}$	$α_{f}$
MKG	0.9	0.3	0.5	0

4.4. Numerical Results

The test outputs of the four algorithms are presented in Figure 1, which demonstrate the prediction capability of the ESN. ZGS-FxTdt-ESN consistently follows the performance of the centralized solution across all scenarios. In Figure 1, the output value of ZGS-FxTdt-ESN basically coincides with the actual output value, indicating that ZGS-FxTdt-ESN maintains a relatively accurate output value. With the increasing number of agents, five algorithms show their test errors in Figure 2. We can see the test error of ZGS-FxTdt-ESN closely matches that of ZGS-FxTet-ESN. This shows that ZGS-FxTdt-ESN has the same test error as ZGS-FxTet-ESN, but ZGS-FxTdt-ESN has more advantages. Not only can its accuracy be comparable to that of ZGS-FxTet-ESN, but it also has a faster convergence speed. Figure 1 and Figure 2 indicate that the output values of C-ESN, ZGS-ESN and ZGS-FxTdt-ESN coincide and their test errors are all identical to that of ZGS-FxTet-ESN , which all suggests that their accuracies are similar. It has a faster convergence speed and only updates the state at the moment when the trigger condition is reached, which can save communication costs more. As shown in Figure 3, ZGS-FxTet-ESN achieves a significantly shorter training time than ZGS-FxTdt-ESN, primarily due to the latter’s need for continuous internal-state monitoring. In Figure 4, the event-triggered instants of agents are depicted, and it can be inferred that Zeno behavior is absent.

5. Conclusions

This paper presents a novel distributed learning algorithm designed for the training of a specialized recurrent neural network for the first time. The algorithm trained an identical ESN model among multiple agents. These agents were linked through a communication topology that operated without a central coordinator, relying solely on information exchange with neighboring nodes. The ZGS method was applied to solve the corresponding distributed optimization problem. The algorithm also incorporates the dynamic event-triggered method in order to reduce communication costs.

The effectiveness of the introduced algorithm was investigated by simulations. In the domain of distributed multi-agent systems, the convergence feature is a primary index for evaluating the performance of a distributed learning algorithm. Directions for future research could include considering dynamic handling of event triggering conditions to make it more applicable to real-life scenarios.

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China (No. 62166013), the Natural Science Foundation of Guangxi (No. 2022GXNSFAA035499) and the Foundation of Guilin University of Technology (No. GLUTQD2007029).

Conflicts of Interest

The authors declare no conflicts of interest regarding the publication of this paper.

References

[1]	Zhang, H., Wang, Z. and Liu, D. (2014) A Comprehensive Review of Stability Analysis of Continuous-Time Recurrent Neural Networks. IEEE Transactions on Neural Networks and Learning Systems, 25, 1229-1262. https://doi.org/10.1109/tnnls.2014.2317880
[2]	Verykios, V.S., Bertino, E., Fovino, I.N., Provenza, L.P., Saygin, Y. and Theodoridis, Y. (2004) State-of-the-Art in Privacy Preserving Data Mining. ACM SIGMOD Record, 33, 50-57. https://doi.org/10.1145/974121.974131
[3]	Zhu, M., Zhou, J., Cai, H., He, X. and Xiao, F. (2023) A Distributed Learning Method for Deep Echo State Network Based on Improved SVD and ADMM. 2023 IEEE 20th International Conference on Mobile Ad Hoc and Smart Systems (MASS), Toronto, 25-27 September 2023, 270-278. https://doi.org/10.1109/mass58611.2023.00040
[4]	He, Y., Chen, L., Xu, Y., Zhu, Q. and Lu, S. (2023) A New Distributed Echo State Network Integrated with an Auto-Encoder for Dynamic Soft Sensing. IEEE Transactions on Instrumentation and Measurement, 72, 1-8. https://doi.org/10.1109/tim.2022.3228278
[5]	Alaviani, S.S. and Elia, N. (2019) Distributed Average Consensus over Random Networks. 2019 American Control Conference (ACC), Philadelphia, 10-12 July 2019, 1854-1859. https://doi.org/10.23919/acc.2019.8815134
[6]	Wang, H. and Yan, X. (2015) Optimizing the Echo State Network with a Binary Particle Swarm Optimization Algorithm. Knowledge-Based Systems, 86, 182-193. https://doi.org/10.1016/j.knosys.2015.06.003
[7]	Long, J., Zhang, S. and Li, C. (2020) Evolving Deep Echo State Networks for Intelligent Fault Diagnosis. IEEE Transactions on Industrial Informatics, 16, 4928-4937. https://doi.org/10.1109/tii.2019.2938884
[8]	Zhang, B., Han, Q. and Zhang, X. (2017) Recent Advances in Vibration Control of Offshore Platforms. Nonlinear Dynamics, 89, 755-771. https://doi.org/10.1007/s11071-017-3503-4
[9]	Lu, Y.M. and Do, M.N. (2008) A Theory for Sampling Signals from a Union of Subspaces. IEEE Transactions on Signal Processing, 56, 2334-2345. https://doi.org/10.1109/tsp.2007.914346
[10]	Zhang, X., Han, Q. and Yu, X. (2016) Survey on Recent Advances in Networked Control Systems. IEEE Transactions on Industrial Informatics, 12, 1740-1752. https://doi.org/10.1109/tii.2015.2506545
[11]	Heemels, W.P.M.H., Sandee, J.H. and Van Den Bosch, P.P.J. (2008) Analysis of Event-Driven Controllers for Linear Systems. International Journal of Control, 81, 571-590. https://doi.org/10.1080/00207170701506919
[12]	Yi, X., Yang, T., Wu, J. and Johansson, K.H. (2019) Distributed Event-Triggered Control for Global Consensus of Multi-Agent Systems with Input Saturation. Automatica, 100, 1-9. https://doi.org/10.1016/j.automatica.2018.10.032
[13]	Berneburg, J. and Nowzari, C. (2019) Distributed Dynamic Event-Triggered Coordination with a Designable Minimum Inter-Event Time. 2019 American Control Conference (ACC), Philadelphia, 10-12 July 2019, 1424-1429. https://doi.org/10.23919/acc.2019.8815227
[14]	Chen, J., Yue, D., Dou, C., Weng, S., Xie, X., Li, Y., et al. (2022) Static and Dynamic Event-Triggered Mechanisms for Distributed Secondary Control of Inverters in Low-Voltage Islanded Microgrids. IEEE Transactions on Cybernetics, 52, 6925-6938. https://doi.org/10.1109/tcyb.2020.3034727
[15]	Girard, A. (2015) Dynamic Triggering Mechanisms for Event-Triggered Control. IEEE Transactions on Automatic Control, 60, 1992-1997. https://doi.org/10.1109/tac.2014.2366855
[16]	Lu, J., Zhang, X., Zhang, B., Hou, X. and Wang, P. (2022) Distributed Dynamic Event-Triggered Control for Voltage Restoration and Current Sharing in DC Microgrids. IEEE Transactions on Sustainable Energy, 13, 619-628. https://doi.org/10.1109/tste.2021.3123372
[17]	Chen, J., Zhang, H. and Yin, G. (2023) Distributed Dynamic Event-Triggered Secure Model Predictive Control of Vehicle Platoon against Dos Attacks. IEEE Transactions on Vehicular Technology, 72, 2863-2877. https://doi.org/10.1109/tvt.2022.3215966
[18]	Ai, W. and Wang, D. (2020) Distributed Stochastic Configuration Networks with Cooperative Learning Paradigm. Information Sciences, 540, 1-16. https://doi.org/10.1016/j.ins.2020.05.112
[19]	Li, S., Ai, W. and Ge, X. (2020) Fixed-Time Distributed Cooperative Learning for Stochastic Configuration Networks. 2020 Chinese Automation Congress (CAC), Shanghai, 6-8 November 2020, 3380-3383. https://doi.org/10.1109/cac51589.2020.9326656
[20]	Hu, W., Yang, C., Huang, T. and Gui, W. (2020) A Distributed Dynamic Event-Triggered Control Approach to Consensus of Linear Multiagent Systems with Directed Networks. IEEE Transactions on Cybernetics, 50, 869-874. https://doi.org/10.1109/tcyb.2018.2868778
[21]	Scardapane, S., Wang, D. and Panella, M. (2016) A Decentralized Training Algorithm for Echo State Networks in Distributed Big Data Applications. Neural Networks, 78, 65-74. https://doi.org/10.1016/j.neunet.2015.07.006
[22]	Bapat, R.B. (2010) Graphs and Matrices, Volume 27. Springer.
[23]	Zuo, Z. (2015) Nonsingular Fixed-Time Consensus Tracking for Second-Order Multi-Agent Networks. Automatica, 54, 305-309. https://doi.org/10.1016/j.automatica.2015.01.021
[24]	Fan, Y., Feng, G., Wang, Y. and Song, C. (2013) Distributed Event-Triggered Control of Multi-Agent Systems with Combinational Measurements. Automatica, 49, 671-675. https://doi.org/10.1016/j.automatica.2012.11.010

Journals Menu

Follow SCIRP

	customer@scirp.org
	+86 18163351462(WhatsApp)
	1655362766

	Paper Publishing WeChat

Journals Menu

Home

About SCIRP

Service

Policies