Distributed Learning for Echo State Networks with Dynamic Event-Triggered Consensus
1. Introduction
The echo state network (ESN) is a novel type of recurrent neural network (RNN). Compared to other RNNs, the ESN can handle time series classification and regression tasks by simply training the weight matrix connecting the hidden layer to the output layer using linear regression [1]. In practical applications, due to the large volume of data and privacy concerns, distributed ESNs have been widely applied [2]. In recent years, extensive research has been conducted on ESNs. For example, Zhu proposed a distributed learning method for DeepESN (DESN) based on improved singular value decomposition (SVD) and the alternating direction method of multipliers (ADMM), which effectively reduces computation and communication costs for training DESNs in edge environments [3]. In [4], a novel distributed ESN model integrated with an auto-encoder (AE-DESNm) eliminates the dimension disaster problem and uses an extreme learning machine (ELM) to extract features for estimating key variables. Integrating the ESN with the decentralized average consensus (DAC) [5] approach facilitates achieving consensus in output values. Over time, considerable advancements have been made in methods for enhancing ESNs. For instance, a hybrid evolutionary algorithm that combines a competitive swarm optimizer with local search has been applied to mitigate the challenges associated with the random selection of input and reservoir weights [6]. Additionally, an optimized echo state network (O-ESN) proposed in [7] utilizes binary particle swarm optimization to minimize network complexity while improving generalization capabilities.
In a distributed framework, the assumption for achieving consensus convergence is often that each agent communicates continuously with its neighbors, which typically requires unlimited communication resources. However, a perfect communication network with infinite bandwidth capacity is usually not feasible in real-life scenarios [8]. To address the resource consumption issue in continuous communication among multiple agents, Lu proposed a general sampling framework where the sampled signals originate from a known union of subspaces and the sampling operator is linear. Furthermore, the framework identified the minimum sampling requirements for several classes of signals [9]. If the continuously sampled signals fluctuate minimally, agents will engage in continuous unnecessary communication, resulting in a large number of wasted samples [10]. The introduction of the event-based control method has garnered widespread interest, as it adjusts the sampling sequence based on events related to the system state, thus avoiding the communication resource wastage associated with traditional periodic sampling [11]. An event-triggered consensus protocol and triggering law proposed in [12] can achieve global consensus if and only if the underlying directed graph has a directed spanning tree. Moreover, it is found that the existence of a directed spanning tree is always a necessary and sufficient condition for achieving global consensus, regardless of whether input saturation is present. This conclusion remains valid even when more complex nonlinear dynamic behaviors are introduced. The fully distributed event-triggered algorithm proposed by Berneberg ensures asymptotic convergence to the average consensus state while allowing each agent to independently select a desired minimum inter-event time, thereby guaranteeing non-Zeno behavior [13].
In [14], static and dynamic event-triggered methods are introduced as secondary control techniques to restore voltage frequency and improve power sharing accuracy.
A new dynamic event-triggered control strategy, which includes internal dynamic variables, was proposed in [15]. Compared to static event-triggered strategies, it offers more advantages. As a result, dynamic event-triggered strategies have been used in recent years to solve the consensus problem in multi-agent systems (MAS). Later, dynamic event triggering was applied to islanded DC microgrids [16] and vehicle platoon systems [17], where the data from neighboring DGs are used only at event-triggered times, thus reducing the burden on the communication network. In our previous work [18], we applied an event-triggered control policy to the communication component of the problem to avoid unnecessary transmissions, where the agents send messages only when negotiation is critically needed. In [19], a distributed cooperative learning algorithm for stochastic configuration networks with fixed-time convergence, known as FixD-SCN, is presented to tackle the 'Big Data' problem. To the best of our knowledge, little effort has been made in the field of dynamic event-triggered distributed learning over peer-to-peer networks, especially for RNNs.
The purpose of this paper is to propose a distributed learning algorithm for training the ESN, in which all agents converge asymptotically while minimizing the use of communication resources during the process. To accomplish this objective, we extend the multi-agent system with dynamic event-triggered algorithms to address the distributed learning problem. Initially, we convert the centralized ESN into a distributed form by breaking down the training process into smaller components, each subject to equality constraints on the output parameters. Next, a dynamic event-triggered algorithm is used to reduce communication during the process of optimizing the distributed ESN output weights. The proposed distributed algorithm is inspired by the use of dynamic event-triggered techniques to achieve distributed consensus convergence [20]. This work extends that of Scardapane et al. [21] by computing the final optimal consensus of ESN output parameters. The primary contributions presented in this paper are summarized below.
A fully decentralized algorithm for training ESNs is introduced, where the training process is collaboratively executed without relying on a central coordinator.
The convergence of the distributed learning algorithm under the dynamic event-triggered mechanism is analyzed, and asymptotic convergence is guaranteed.
The convergence property of the proposed algorithm is mathematically substantiated by employing Lyapunov theory.
Notation: Denote the set of real (natural) numbers as $\mathbb{R}$ ($\mathbb{N}$). Let $\mathbf{1}_n$ ($\mathbf{0}_n$) represent the $n$-dimensional column vector where each entry is 1 (0). The matrix $I_n$ is the identity matrix. Let $\|\cdot\|_2$ and $\|\cdot\|_1$ represent the Euclidean norm (2-norm) and 1-norm, respectively. $\rho(\cdot)$ denotes the spectral radius of a matrix. $(\cdot)^{T}$ is used to denote the transpose of a vector or matrix and $\otimes$ to represent the Kronecker product. For a vector $x=[x_1,\dots,x_n]^{T}$ and $p>0$, denote the $p$-norm of $x$ by $\|x\|_p=(\sum_{i=1}^{n}|x_i|^{p})^{1/p}$; denote the sign of $x$ by $\mathrm{sign}(x)=[\mathrm{sign}(x_1),\dots,\mathrm{sign}(x_n)]^{T}$, where $\mathrm{sign}(\cdot)$ represents the sign function; denote $|x|^{\alpha}=[|x_1|^{\alpha},\dots,|x_n|^{\alpha}]^{T}$ with $\alpha>0$, where $|\cdot|$ denotes the absolute value. Then we have $x^{T}\mathrm{sign}(x)=\|x\|_1$, $\|x\|_2\le\|x\|_1$ and $x^{T}\big(|x|^{\alpha}\odot\mathrm{sign}(x)\big)=\sum_{i=1}^{n}|x_i|^{\alpha+1}$, where $\odot$ denotes the elementwise product. Let $\succeq$ denote the matrix inequality sign, i.e., $A\succeq 0$ indicates that $A$ is a positive semidefinite matrix.
2. Problem Formulation
2.1. Graph Theory
We consider a communication network of $N$ agents distributed over a geographic area. Mathematically, we represent the communication network by a graph $\mathcal{G}=(\mathcal{V},\mathcal{E},\mathcal{A})$ of order $N$, with $\mathcal{V}=\{1,\dots,N\}$ depicting the set of vertices, $\mathcal{E}\subseteq\mathcal{V}\times\mathcal{V}$ representing the edge set, and $\mathcal{A}=[a_{ij}]\in\mathbb{R}^{N\times N}$ the weighted adjacency matrix, where $a_{ij}>0$ if $(i,j)\in\mathcal{E}$ and $a_{ij}=0$ otherwise. The graph has no loops or multiple edges. If there is a connection between node $i$ and node $j$, $a_{ij}=a_{ji}>0$. The set of neighbors of node $i$ is represented by $\mathcal{N}_i=\{j\in\mathcal{V}:(i,j)\in\mathcal{E}\}$. Define the Laplacian matrix $L=[l_{ij}]$, where $l_{ij}=-a_{ij}$ for $i\neq j$ and $l_{ii}=\sum_{j\neq i}a_{ij}$. For an undirected graph, the matrix $L$ is symmetric and positive semi-definite, with all eigenvalues being non-negative.
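As a concrete illustration of this construction, the following numpy sketch builds a random undirected weighted graph and its Laplacian and checks the two properties stated above. The function name, edge-weight range, and connectivity value are illustrative choices, not taken from the paper.

```python
import numpy as np

def random_graph_laplacian(n_agents, connectivity=0.55, seed=0):
    """Random undirected weighted graph and its Laplacian L = D - A."""
    rng = np.random.default_rng(seed)
    A = np.zeros((n_agents, n_agents))
    for i in range(n_agents):
        for j in range(i + 1, n_agents):
            if rng.random() < connectivity:                # connect i and j
                A[i, j] = A[j, i] = rng.uniform(0.5, 1.5)  # a_ij = a_ji > 0
    L = np.diag(A.sum(axis=1)) - A   # l_ii = sum_j a_ij, l_ij = -a_ij
    return A, L

A, L = random_graph_laplacian(6)
print(np.allclose(L @ np.ones(6), 0))              # rows of L sum to zero
print(np.all(np.linalg.eigvalsh(L) >= -1e-10))     # L is positive semi-definite
```

Both checks print `True`: the row sums vanish by construction, so the all-ones vector is in the null space of $L$.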
2.2. Problem Formulation
Lemma 1 (graphs). For an undirected connected graph $\mathcal{G}$ with $N$ vertices, the spectrum of the Laplacian $L$ contains only one zero eigenvalue, possessing a right eigenvector $\mathbf{1}_N$, i.e., $L\mathbf{1}_N=\mathbf{0}_N$, and all other eigenvalues are positive. Suppose the eigenvalues $0=\lambda_1<\lambda_2\le\dots\le\lambda_N$ are listed in ascending order. The minimum nonzero eigenvalue satisfies $\lambda_2=\min_{x\neq 0,\ \mathbf{1}_N^{T}x=0}\frac{x^{T}Lx}{x^{T}x}$. We have

$$x^{T}Lx \ge \lambda_2\, x^{T}x \quad (1)$$

for any $x$ with $\mathbf{1}_N^{T}x=0$. Furthermore, if $\mathbf{1}_N^{T}x=0$, then $x^{T}L^{2}x \ge \lambda_2\, x^{T}Lx$.
Lemma 2 (norms). Let $x\in\mathbb{R}^{n}$ and $p>0$. Then

$$\sum_{i=1}^{n}|x_i|^{p} \ge \Big(\sum_{i=1}^{n}x_i^{2}\Big)^{p/2} \quad (2)$$

if $0<p\le 2$, and

$$\sum_{i=1}^{n}|x_i|^{p} \ge n^{1-p/2}\Big(\sum_{i=1}^{n}x_i^{2}\Big)^{p/2} \quad (3)$$

if $p>2$.
2.3. Distributed ESN over Distributed Datasets in Multi-Agent Networks
The ESN consists of three parts: an input layer, a hidden layer (reservoir), and an output layer. It is a variant of recurrent neural networks, with the training process involving feeding back output results to the hidden layer to adjust connection weights. The input vector $u[t]$, with a dimensionality of $N_u$, is fed into a reservoir of dimensionality $N_r$. The internal state of the reservoir, $x[t]\in\mathbb{R}^{N_r}$, is computed as follows:

$$x[t]=f\big(W_{in}u[t]+W_{r}x[t-1]+W_{fb}y[t-1]\big) \quad (4)$$

where $W_{in}\in\mathbb{R}^{N_r\times N_u}$ and $W_{r}\in\mathbb{R}^{N_r\times N_r}$ are matrices generated randomly. The nonlinear function $f(\cdot)$ is applied elementwise, and $y[t-1]$ denotes the output from the previous time step. The output at the current step is then calculated as

$$y[t]=g\big(w_{out}^{T}[x[t];u[t]]\big) \quad (5)$$

Depending on the specific dataset, the output weights $w_{out}$ are adapted, as is the choice of the invertible nonlinear function $g(\cdot)$. To train the readout, assume that there is a sequence of $Q$ input-output pairs $\{(u[1],d[1]),\dots,(u[Q],d[Q])\}$. The inputs are mapped to the reservoir, generating a series of internal states $x[1],\dots,x[Q]$. At this stage, due to the unavailability of the ESN output for feedback, the designated output is substituted in (4). We obtain the hidden matrix $H$ and output matrix $D$ as

$$H=\begin{bmatrix} x[1]^{T} & u[1]^{T}\\ \vdots & \vdots\\ x[Q]^{T} & u[Q]^{T} \end{bmatrix},\qquad D=\begin{bmatrix} g^{-1}(d[1])\\ \vdots\\ g^{-1}(d[Q]) \end{bmatrix} \quad (6)$$

The optimal weight vector for the output is obtained by solving the subsequent regularized least-squares problem:

$$w^{*}=\arg\min_{w}\ \frac{1}{2}\|Hw-D\|_2^{2}+\frac{\lambda}{2}\|w\|_2^{2} \quad (7)$$

where $w\in\mathbb{R}^{N_r+N_u}$ and $\lambda>0$ is a regularization parameter. The solution to the problem in (7) can be written as

$$w^{*}=\big(H^{T}H+\lambda I\big)^{-1}H^{T}D \quad (8)$$

where $I$ is the identity matrix of dimension $N_r+N_u$.
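To make the training pipeline of (4)-(8) concrete, the following numpy sketch runs a random reservoir over an input sequence and solves the ridge-regression readout in closed form. It assumes an identity output function and omits the output-feedback term of (4) for brevity; all names and hyperparameter values are illustrative.

```python
import numpy as np

def train_esn_readout(u, d, n_res=100, rho=0.9, lam=1e-3, seed=0):
    """Collect reservoir states (eq. 4, no feedback term) and solve the
    regularized readout w = (H^T H + lam I)^{-1} H^T D (eq. 8)."""
    rng = np.random.default_rng(seed)
    W_in = rng.uniform(-0.5, 0.5, (n_res, u.shape[1]))
    W_r = rng.uniform(-0.5, 0.5, (n_res, n_res))
    W_r *= rho / max(abs(np.linalg.eigvals(W_r)))   # scale to spectral radius rho
    x, H = np.zeros(n_res), []
    for t in range(len(u)):
        x = np.tanh(W_in @ u[t] + W_r @ x)          # state update (4)
        H.append(np.concatenate([x, u[t]]))         # one row of H, as in (6)
    H = np.asarray(H)
    w = np.linalg.solve(H.T @ H + lam * np.eye(H.shape[1]), H.T @ d)  # (8)
    return w, H

# toy usage: one-step-ahead prediction of a sine wave
t = np.linspace(0, 20, 500)
u, d = np.sin(t)[:-1, None], np.sin(t)[1:]
w, H = train_esn_readout(u, d)
print(np.mean((H @ w - d) ** 2))   # training MSE; small on this easy task
```

The closed-form solve in the last line of the function is exactly the centralized computation that the rest of the paper distributes across agents.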
To transform the centralized ESN into a distributed ESN, at the outset, we break down the centralized ESN task into several subtasks, each corresponding to a specific agent. Obtaining the optimal solution for the regularized problem leads to the determination of the output weights $w^{*}$:

$$\min_{w}\ \sum_{k=1}^{N}\frac{1}{2}\|H_{k}w-D_{k}\|_2^{2}+\frac{\lambda}{2}\|w\|_2^{2} \quad (9)$$

The training error vector for the output neurons, corresponding to the training examples at the $k$-th learning agent, is denoted by $H_{k}w-D_{k}$. To convert a centralized ESN into a fully distributed ESN, the overall output weights $w$ are first duplicated and assigned to each agent as local copies $w_{k}$. Equality constraints are then enforced along the edges of the communication network. By incorporating the constraints into the objective function, problem (9) can be reformulated as

$$\min_{w_{1},\dots,w_{N}}\ \sum_{k=1}^{N}\Big(\frac{1}{2}\|H_{k}w_{k}-D_{k}\|_2^{2}+\frac{\lambda}{2N}\|w_{k}\|_2^{2}\Big)\quad \text{s.t. } w_{k}=w_{j},\ \forall (k,j)\in\mathcal{E} \quad (10)$$

where the hidden matrix $H_{k}$ and the training target matrix $D_{k}$ are autonomously acquired from the respective dataset $S_{k}$ observed by agent $k$ itself. For simplicity in presentation, we begin by simplifying problem (10) as

$$\min_{w_{1},\dots,w_{N}}\ \sum_{k=1}^{N}f_{k}(w_{k})\quad \text{s.t. } w_{k}=w_{j},\ \forall (k,j)\in\mathcal{E} \quad (11)$$

where

$$f_{k}(w_{k})=\frac{1}{2}\|H_{k}w_{k}-D_{k}\|_2^{2}+\frac{\lambda}{2N}\|w_{k}\|_2^{2} \quad (12)$$

is the individual objective function handled by agent $k$ locally. Agent $k$ is capable of independently optimizing its local objective and establishing local consensus constraints through communication with its neighboring agents $j\in\mathcal{N}_{k}$.
2.4. Distributed ESN Solved by the ZGS Algorithm
The zero-gradient-sum (ZGS) algorithm aims to achieve optimal consensus on a manifold where the sum of the gradients of the local functions is consistently zero. Let $w_{k}(t)$ denote the state of agent $k$ estimating the unknown optimum $w^{*}$ at iterative step $t$, with $w_{k}(0)$ indicating the initial value. The objective is to drive the state $w_{k}(t)$ of each agent asymptotically to $w^{*}$. Since each local function in (12) is strongly convex by construction, the global objective admits a unique optimal solution.
Assumption 1. For each $k\in\mathcal{V}$, the function $f_{k}$ demonstrates strong convexity and maintains twice continuous differentiability.

Under Assumption 1, there is a unique minimizer $w^{*}$, i.e.,

$$\sum_{k=1}^{N}\nabla f_{k}(w^{*})=\mathbf{0} \quad (13)$$

such that the sum of the gradients vanishes at $w^{*}$. The ZGS algorithm is formulated as an iterative procedure that keeps this gradient sum at zero. The following implicit distributed algorithm is considered to achieve the goal:

$$\frac{d}{dt}\nabla f_{k}\big(w_{k}(t)\big)=\gamma\sum_{j\in\mathcal{N}_{k}}a_{kj}\big(w_{j}(t)-w_{k}(t)\big) \quad (14)$$

with $w_{k}(0)=\arg\min_{w}f_{k}(w)$, where $\gamma>0$ and $a_{kj}$ represents entries in the weighted adjacency matrix $\mathcal{A}$ of graph $\mathcal{G}$. Under the assumption that the graph $\mathcal{G}$ is undirected and connected, we have

$$\frac{d}{dt}\sum_{k=1}^{N}\nabla f_{k}\big(w_{k}(t)\big)=\gamma\sum_{k=1}^{N}\sum_{j\in\mathcal{N}_{k}}a_{kj}\big(w_{j}(t)-w_{k}(t)\big)=\mathbf{0} \quad (15)$$

Assuming the states reach a consensus value $\bar{w}$, we have

$$\sum_{k=1}^{N}\nabla f_{k}(\bar{w})=\sum_{k=1}^{N}\nabla f_{k}\big(w_{k}(0)\big)=\mathbf{0} \quad (16)$$

Consequently, the consensus value $\bar{w}$ must be the optimizer $w^{*}$. In this sense, a shared $w^{*}$ is determined for each agent, ensuring $w_{1}=\dots=w_{N}=w^{*}$, so that the constraints in (10) hold. Furthermore, the gradient of the function $f_{k}$ with respect to $w_{k}$ can be easily computed as $\nabla f_{k}(w_{k})=H_{k}^{T}(H_{k}w_{k}-D_{k})+\frac{\lambda}{N}w_{k}$. The Hessian of $f_{k}$ is

$$\nabla^{2}f_{k}=H_{k}^{T}H_{k}+\frac{\lambda}{N}I \quad (17)$$

The matrix $\nabla^{2}f_{k}$ is static, nonsingular, symmetric and positive definite. Letting $\theta_{k}$ and $\Theta_{k}$ denote the minimum and maximum eigenvalues of $\nabla^{2}f_{k}$, respectively, then

$$\theta_{k}I \preceq \nabla^{2}f_{k} \preceq \Theta_{k}I \quad (18)$$

From the gradient of $f_{k}$, the initialization $\nabla f_{k}(w_{k}(0))=\mathbf{0}$ gives $w_{k}(0)=\big(H_{k}^{T}H_{k}+\frac{\lambda}{N}I\big)^{-1}H_{k}^{T}D_{k}$. Substituting the Hessian into the left-hand side of (14), an algorithm that iteratively solves the problem in a distributed manner is designed. For each agent $k$, we design an algorithm for the distributed ESN (ZGS-ESN). The ZGS-ESN iterates as follows:

$$w_{k}(t+1)=w_{k}(t)+\varepsilon\big(\nabla^{2}f_{k}\big)^{-1}\gamma\sum_{j\in\mathcal{N}_{k}}a_{kj}\big(w_{j}(t)-w_{k}(t)\big) \quad (19)$$

where $\varepsilon>0$ is the step size, with initial values

$$w_{k}(0)=\Big(H_{k}^{T}H_{k}+\frac{\lambda}{N}I\Big)^{-1}H_{k}^{T}D_{k} \quad (20)$$

where the matrix $\nabla^{2}f_{k}$ is defined in (17). However, the continuous updating strategy may result in elevated communication consumption and frequent updates. The dynamic event-triggered strategy is an effective approach to overcome these disadvantages, in which information interactions only happen at the triggering instants.
3. Distributed ESN with Dynamic Event-Triggered Consensus
Under the dynamic event-triggered strategy, we consider the following distributed algorithm:

$$w_{k}(t+1)=w_{k}(t)+\varepsilon\big(\nabla^{2}f_{k}\big)^{-1}\gamma\sum_{j\in\mathcal{N}_{k}}a_{kj}\big(w_{j}(t_{\ell'}^{j})-w_{k}(t_{\ell}^{k})\big) \quad (21)$$

with initial values as in (20), where $t_{\ell}^{k}$ is the latest triggering time for agent $k$ (and $t_{\ell'}^{j}$ that of its neighbor $j$). Drawing inspiration from [24], we begin by introducing the following combined measurement variable:

$$q_{k}(t)=\gamma\sum_{j\in\mathcal{N}_{k}}a_{kj}\big(w_{j}(t_{\ell'}^{j})-w_{k}(t_{\ell}^{k})\big) \quad (22)$$

Based on the definition of the combined measurement, we express the measurement error as follows:

$$e_{k}(t)=w_{k}(t_{\ell}^{k})-w_{k}(t) \quad (23)$$

Agent $k$ requires the combined measurement $q_{k}$ only at specific time instants, known as triggering times (or event times), which are denoted by $t_{1}^{k},t_{2}^{k},\dots$. At each triggering time $t_{\ell}^{k}$, agent $k$ must calculate $q_{k}(t_{\ell}^{k})$ through communication with its neighbors. Given the current triggering time $t_{\ell}^{k}$, the subsequent triggering time $t_{\ell+1}^{k}$ is determined by the following triggering mechanism:

$$t_{\ell+1}^{k}=\inf\big\{t>t_{\ell}^{k}:\ \eta_{k}(t)+\theta_{k}\big(\sigma_{k}\|q_{k}(t)\|_{2}^{2}-\|e_{k}(t)\|_{2}^{2}\big)\le 0\big\} \quad (24)$$

where $\eta_{k}(t)$ represents the internal dynamic state and $e_{k}(t)$ is the measurement error defined in (23).

The triggering function is affected not only by the state of the controlled system but also by the internal dynamic state, denoted as $\eta_{k}(t)$. This type of triggering mechanism, referred to as a dynamic event-triggering mechanism, was first introduced for a single controlled system in [15]. When the internal dynamic state $\eta_{k}(t)$ is disregarded, the dynamic event-triggering approach simplifies to a static event-triggering mechanism, expressed as $t_{\ell+1}^{k}=\inf\{t>t_{\ell}^{k}:\ \sigma_{k}\|q_{k}(t)\|_{2}^{2}-\|e_{k}(t)\|_{2}^{2}\le 0\}$. In this study, the dynamic triggering conditions are defined by the following equation:

$$\dot{\eta}_{k}(t)=-\mu_{k}\eta_{k}(t)+\sigma_{k}\|q_{k}(t)\|_{2}^{2}-\|e_{k}(t)\|_{2}^{2} \quad (25)$$

where $\eta_{k}(0)>0$, $\mu_{k}>0$, $\sigma_{k}\in(0,1)$ and $\theta_{k}>0$. In a multi-agent system, both static and dynamic event-triggering mechanisms need to be decentralized, meaning that each triggering mechanism is restricted to accessing information from its neighboring agents.
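The interplay between the internal variable (25) and the triggering rule (24) can be illustrated for a single agent on a sampled scalar trajectory. The discretization, the stand-in signal used for the combined measurement, and the parameter values below are illustrative choices, not the paper's settings.

```python
import numpy as np

def dynamic_trigger_times(w, dt=0.01, sigma=0.5, theta=1.0, mu=1.0, eta0=1.0):
    """Dynamic event trigger (24)-(25) for one agent on a sampled trajectory w.
    The broadcast value w_hat is held between events, so the measurement
    error e = w_hat - w (23) grows until the triggering condition fires."""
    eta, w_hat, events = eta0, w[0], [0]
    for t in range(1, len(w)):
        e = w_hat - w[t]          # measurement error (23)
        q = -w[t]                 # illustrative stand-in for the measurement (22)
        eta += dt * (-mu * eta + sigma * q**2 - e**2)   # internal state (25)
        if eta + theta * (sigma * q**2 - e**2) <= 0:    # condition (24)
            w_hat = w[t]          # event: broadcast the state, reset the error
            events.append(t)
    return events

# a decaying oscillation triggers only sporadically
time = np.arange(0, 20, 0.01)
events = dynamic_trigger_times(np.sin(time) * np.exp(-0.05 * time))
print(len(events), len(time))   # far fewer events than samples
```

The point of the sketch is the qualitative behavior: the error is reset at each event, the internal variable filters the static condition, and communication happens at a small fraction of the sampling instants.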
4. Main Results
In this section, a numerical example is provided to verify the effectiveness of the proposed algorithm.
4.1. Description of the Datasets
The ZGS-FxTdt-ESN algorithm was validated on synthetic datasets designed for nonlinear system identification and the prediction of chaotic time series. For a large-scale simulation analysis, we utilize datasets that are approximately one to two orders of magnitude larger than those used in earlier studies. To be specific, for each dataset, we create 50 sequences, each comprising 2,000 elements. These sequences start from varying initial conditions, resulting in a total of 100,000 samples for each experiment.
The Mackey-Glass chaotic time-series (referred to as MKG) dataset is used for prediction tasks. Characterized in continuous time, this series follows the subsequent differential equation:

$$\frac{dy(t)}{dt}=\frac{0.2\,y(t-\tau)}{1+y(t-\tau)^{10}}-0.1\,y(t) \quad (26)$$

For $\tau=17$, the time-series given by (26) undergoes integration using a fourth-order Runge-Kutta method with a time step of 0.1. Subsequently, it is sampled every 10 time-instants. The task then becomes a 10-step-ahead prediction problem:

$$d[t]=y[t+10] \quad (27)$$
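A runnable sketch of this data generation: RK4 with step 0.1, holding the delayed term constant within each step (a common simplification for delay equations), then sampling every 10 instants. The constant initial history is an illustrative choice.

```python
import numpy as np

def mackey_glass(n_samples, tau=17.0, beta=0.2, gamma=0.1, n=10, dt=0.1, y0=1.2):
    """Integrate the Mackey-Glass equation (26) with RK4 (delayed term frozen
    over each step) and sample every 10 time-instants, as in the text."""
    lag = int(tau / dt)
    steps = n_samples * 10 + lag
    y = np.full(steps, y0)          # constant initial history
    f = lambda yt, ytau: beta * ytau / (1.0 + ytau**n) - gamma * yt
    for t in range(lag, steps - 1):
        ytau = y[t - lag]           # delayed value y(t - tau)
        k1 = f(y[t], ytau)
        k2 = f(y[t] + 0.5 * dt * k1, ytau)
        k3 = f(y[t] + 0.5 * dt * k2, ytau)
        k4 = f(y[t] + dt * k3, ytau)
        y[t + 1] = y[t] + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
    return y[lag::10][:n_samples]   # subsample every 10 instants

series = mackey_glass(2000)
```

For $\tau=17$ the sampled series is bounded, positive, and chaotic, which is what makes it a standard prediction benchmark.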
The functional form of Geometric Brownian Motion (GBM) is $dS_{t}=\mu S_{t}\,dt+\sigma S_{t}\,dW_{t}$. In this model: $\mu$ represents the drift rate, governing the long-term trend of the asset price; $\sigma$ denotes the volatility, determining the magnitude of price fluctuations; $S_{t}$ is the asset price at time $t$; $dW_{t}$ is the increment of Brownian motion over an infinitesimal time interval $dt$. Based on this model, we generate 50 time series, each containing 2000 consecutive time steps of simulated asset price data.
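Such sequences can be generated with the exact log-space solution of the GBM stochastic differential equation. The drift, volatility, time step, and initial price below are illustrative, since the paper does not list its values here.

```python
import numpy as np

def gbm_paths(n_paths=50, n_steps=2000, mu=0.05, sigma=0.2, s0=100.0,
              dt=1.0 / 252, seed=0):
    """Simulate GBM exactly in log-space:
    S_{t+dt} = S_t * exp((mu - sigma^2/2) dt + sigma sqrt(dt) Z), Z ~ N(0,1)."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n_paths, n_steps - 1))
    increments = (mu - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z
    log_s = np.concatenate([np.zeros((n_paths, 1)),
                            np.cumsum(increments, axis=1)], axis=1)
    return s0 * np.exp(log_s)   # shape (n_paths, n_steps), all prices positive

paths = gbm_paths()
```

Exponentiating the cumulative log-increments, rather than Euler-stepping the SDE directly, keeps every simulated price strictly positive regardless of the step size.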
4.2. The Setting of the Algorithms
We create an agent network employing a randomly generated topology with a 55% connectivity rate. Our experimentation involves varying the number of agents, starting from 4 and incrementing by 4 up to 20. To assess testing errors, we employ a 3-fold cross-validation on the initial set of 50 sequences, each 2000 units in length. In every fold of the cross-validation process, the training sequences are distributed across the agents, after which we assess the performance of the five algorithms listed below:
C-ESN: This algorithm is a single-agent ESN. The training data is centralized, and the ESN is trained by directly addressing the problem.
L-ESN: In this algorithm, each agent independently trains a localized ESN using its own data, with no communication taking place. The average testing error is computed across all agents.
ZGS-ESN: The maximum number of iterations is 400 with a prescribed error threshold.
ZGS-FxTet-ESN: This algorithm is an extension of the ZGS-FxT-ESN with a static event-triggering condition. We set the maximum number of iterations at 400 and use the same error threshold.
ZGS-FxTdt-ESN: This algorithm uses the dynamic event-triggering condition (24)-(25). We set the maximum number of iterations at 400 and use the same error threshold.
Each of the algorithms employs an identical ESN architecture, which is detailed in the following subsection. The threefold cross-validation process is iterated 10 times with variations in ESN initialization and data partitioning. Errors from each iteration and fold are accumulated. The trained ESN is applied to the test sequences to compute the error, and the resulting predicted outputs are compiled. In this context, $T$ denotes the count of testing samples, excluding the initial washout elements from the test sequences. The normalized root mean-square error (NRMSE) is defined as follows:
Figure 1. The output of the algorithms for the network of 20 agents on the MKG (a) and GBM (b) datasets. The performance of the distributed algorithms (ZGS-ESN, ZGS-FxTet-ESN and ZGS-FxTdt-ESN) is averaged across the agents.
Figure 2. Evolution of the testing error (NRMSE) for networks in the range of 4 to 20 agents on the MKG (a) and GBM (b) datasets.
Figure 3. Evolution of the training time for networks in the range of 4 to 20 agents on the MKG (a) and GBM (b) datasets.
Figure 4. Event-triggered instants of the agents.
$$\mathrm{NRMSE}=\sqrt{\frac{1}{T\,\hat{\sigma}_{d}^{2}}\sum_{t=1}^{T}\big(d[t]-y[t]\big)^{2}} \quad (28)$$

In this context, $\hat{\sigma}_{d}^{2}$ signifies an empirical estimate of the variance of the actual output samples $d[t]$, where $T$ is the count of testing samples and $y[t]$ denotes the predicted outputs.
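The metric (28) is straightforward to implement; the washout argument in this sketch mirrors the exclusion of initial elements described above.

```python
import numpy as np

def nrmse(d, y, washout=0):
    """Normalized root mean-square error (28): RMSE of the predictions y
    against the targets d, normalized by the empirical variance of d,
    after discarding the initial washout samples."""
    d = np.asarray(d, float)[washout:]
    y = np.asarray(y, float)[washout:]
    return np.sqrt(np.mean((d - y) ** 2) / np.var(d))
```

A perfect predictor scores 0, while predicting the constant mean of the targets scores exactly 1, which makes scores across datasets of different scales comparable.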
4.3. ESN Architecture
A regularization factor of
and a reservoir size of
were selected, and these choices proved to be effective in all situations. In the reservoir, we employ
nonlinearities, and for the output function, a scaled identity
is utilized. In all instances, the parameter
is set by searching within the set
. The matrix
is populated with entries drawn from a uniform distribution within the range
. The parameter
is explored within the same range as
. The matrix
, which links the output to the reservoir, is initialized as a complete matrix with entries drawn from a uniform distribution within the range
. The parameter
is searched within the same range as
.
A zero value is allowed for the case in which no output feedback is required.
The reservoir matrix
is initialized using a uniform distribution in the range of
. Subsequently, around 55% of its connections are set to zero to foster sparsity. Lastly, the matrix is adjusted in scale to achieve a target spectral radius
, with
being explored within the same range as
. Furthermore, uniform noise is introduced during the state update of the reservoir, where the noise is sampled from a uniform distribution within the interval
. Additionally, the initial
elements are excluded from each sequence. Table 1 displays the ultimate configurations obtained through the grid-search process.
Table 1. Description of the parameters.

Dataset |  |  |  |  |
MKG | 0.9 | 0.3 | 0.5 | 0 |
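The reservoir initialization described in this subsection (uniform random entries, sparsification, then spectral-radius rescaling) can be sketched as follows. The uniform range and the target radius come from the paper's grid search and are illustrative here.

```python
import numpy as np

def init_reservoir(n_res, rho=0.9, sparsity=0.55, seed=0):
    """Uniform random reservoir, ~55% of connections zeroed for sparsity,
    then rescaled so that the spectral radius equals rho."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1.0, 1.0, (n_res, n_res))
    W[rng.random((n_res, n_res)) < sparsity] = 0.0      # sparsify
    return W * (rho / max(abs(np.linalg.eigvals(W))))   # scale radius to rho

W = init_reservoir(100)
print(max(abs(np.linalg.eigvals(W))))   # equals the target radius 0.9
```

Rescaling after sparsification matters: zeroing entries changes the spectral radius, so the scaling must be the final step.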
4.4. Numerical Results
The test outputs of the algorithms are presented in Figure 1, which demonstrates the prediction capability of the ESN. ZGS-FxTdt-ESN consistently follows the performance of the centralized solution across all scenarios: its output essentially coincides with the actual output, indicating that ZGS-FxTdt-ESN maintains relatively accurate predictions. Figure 2 shows the test errors of the five algorithms as the number of agents increases. The test error of ZGS-FxTdt-ESN closely matches that of ZGS-FxTet-ESN, yet ZGS-FxTdt-ESN offers additional advantages: besides comparable accuracy, it converges faster and updates its broadcast state only at the triggering instants, which further reduces communication costs. Figures 1 and 2 indicate that the output values of C-ESN, ZGS-ESN and ZGS-FxTdt-ESN coincide and their test errors are all close to that of ZGS-FxTet-ESN, which suggests that their accuracies are similar. As shown in Figure 3, ZGS-FxTet-ESN achieves a significantly shorter training time than ZGS-FxTdt-ESN, primarily due to the latter's need to continuously monitor its internal dynamic state. In Figure 4, the event-triggered instants of the agents are depicted, from which it can be inferred that Zeno behavior is absent.
5. Conclusions
This paper presented, for the first time, a distributed learning algorithm designed for training a specialized recurrent neural network with dynamic event-triggered communication. The algorithm trains an identical ESN model among multiple agents. These agents are linked through a communication topology that operates without a central coordinator, relying solely on information exchange with neighboring nodes. The ZGS method was applied to solve the corresponding distributed optimization problem, and a dynamic event-triggered mechanism was incorporated to reduce communication costs.
The effectiveness of the introduced algorithm was investigated by simulations. In the domain of distributed multi-agent systems, convergence is a primary index for evaluating the performance of a distributed learning algorithm. Directions for future research could include adaptive handling of the event-triggering conditions to make the approach more applicable to real-life scenarios.
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China (No. 62166013), the Natural Science Foundation of Guangxi (No. 2022GXNSFAA035499) and the Foundation of Guilin University of Technology (No. GLUTQD2007029).