Communication-Censored Distributed Learning for Stochastic Configuration Networks

This paper aims to reduce the communication cost of distributed learning for stochastic configuration networks (SCNs), in which information is exchanged between the learning agents only at trigger times. To this end, we propose a communication-censored distributed learning algorithm for SCNs, named ADMM-SCN-ET, by introducing an event-triggered communication mechanism into the alternating direction method of multipliers (ADMM). To avoid unnecessary transmissions, each learning agent is equipped with a trigger function: only when the event-trigger error exceeds a specified threshold, thereby satisfying the trigger condition, does the agent transmit its variable information to its neighbors and update its state. Simulation results show that the proposed algorithm effectively reduces the communication cost of training decentralized SCNs and saves communication resources.


Introduction
In traditional machine learning, the whole dataset is generally assumed to reside on a single machine. In the big data environment, however, the sheer volume and incompleteness of the data pose challenges that single-threaded optimization algorithms cannot easily handle. Parallel optimization and learning algorithms built on shared-nothing architectures, such as MapReduce [1], have therefore been proposed: they partition the sample data and model parameters and train the sub-datasets in parallel with multiple threads. In such schemes the repeated exchange of model parameters among machines dominates the cost, which has motivated research on communication efficiency. Dimarogonas et al. were the first to apply event-triggered control to cooperative multi-agent systems. Under centralized and distributed control strategies, they designed state-dependent event-trigger functions, derived the corresponding event-trigger time series, and obtained two event-triggered methods. The first is the centralized event-triggered control method [12], which requires every node to know the global information of the network and to update its control signal at the same time; that is, events occur at all nodes simultaneously. The second is the distributed event-triggered control method [13], in which every node must continually access the states of its neighbors to decide when to update its own state parameters. These requirements can be difficult to meet in practice. This paper focuses on state-dependent distributed event-triggered control. We introduce an event-triggered communication mechanism into the distributed learning algorithm for stochastic configuration networks (SCNs), so that information is transmitted between a node and its neighbors on the network topology only when genuinely needed.
By designing a trigger function and continuously monitoring the node parameters, a node transmits its variable information to its neighbors and updates its state only when the parameter error exceeds the threshold and the trigger condition is met. In this way, the communication volume of the distributed learning algorithm is effectively reduced.

Notation
For matrices $A \in \mathbb{R}^{m \times p}$ and $B \in \mathbb{R}^{n \times p}$, $[A; B] \in \mathbb{R}^{(m+n) \times p}$ denotes the matrix obtained by stacking the two matrices by rows. $\langle v_1, v_2 \rangle$ denotes the inner product of vectors $v_1$ and $v_2$, from which the Euclidean norm $\|v\| = \sqrt{\langle v, v \rangle}$ of a vector $v$ is naturally derived. Throughout the paper, the network topology is a graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ with adjacency matrix $A$, whose element $a_{ij}$ represents the degree of association between node $i$ and node $j$.

Alternating Direction Method of Multipliers (ADMM)
ADMM is a powerful tool for solving structured optimization problems in two blocks of variables that are separable in the loss function and coupled by a linear equality constraint. By introducing Lagrange multipliers, a constrained optimization problem with $n$ variables and $k$ constraints is transformed into an unconstrained optimization problem with $n + k$ variables.
ADMM can be used to solve the unconstrained, separable convex optimization problem over a communication network with fixed, connected topology,
$$\min_{x} \; \sum_{i \in \mathcal{V}} f_i(x), \qquad (1)$$
where $f_i$ is the local loss at node $i$. Problem (1) can be rewritten in a standard bivariate form by introducing, at each node $i$, a local copy $x_i \in \mathbb{R}^p$ of the global variable $x^*$ together with an auxiliary variable $z_{ij} \in \mathbb{R}^p$ for each edge:
$$\min_{\{x_i\},\{z_{ij}\}} \; \sum_{i \in \mathcal{V}} f_i(x_i) \quad \text{s.t.} \quad x_i = z_{ij}, \; x_j = z_{ij}, \; (i,j) \in \mathcal{E}.$$
Since the network is connected, this bivariate form is equivalent to problem (1): at the optimum all local copies agree and equal the global optimal solution $x^*$ of (1). Applying ADMM and eliminating the auxiliary variables, the edge multipliers $\lambda$ can be replaced by a low-dimensional dual variable $\mu_i$ at each node, so that the variables are updated as
$$x_i^{k+1} = \arg\min_{x_i} \Big\{ f_i(x_i) + (\mu_i^{k})^\top x_i + c \sum_{j \in \mathcal{N}_i} \big\| x_i - \tfrac{1}{2}(x_i^k + x_j^k) \big\|^2 \Big\},$$
$$\mu_i^{k+1} = \mu_i^k + c \sum_{j \in \mathcal{N}_i} \big( x_i^{k+1} - x_j^{k+1} \big),$$
where $c > 0$ is the penalty parameter and $\mathcal{N}_i$ is the neighbor set of node $i$. The unconstrained, separable optimization problem in (1) is thus solved through continuous iteration of these updates.
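As a minimal numerical sketch of these node-wise updates, the snippet below runs the decentralized consensus ADMM iteration with scalar quadratic local losses $f_i(x) = \tfrac{1}{2}(x - a_i)^2$, for which the $x$-update has a closed form. The ring topology, local data $a_i$, and penalty $c$ are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Ring graph on 4 nodes: neighbor lists (illustrative assumption).
neighbors = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
a = np.array([1.0, 2.0, 3.0, 4.0])   # local data; f_i(x) = 0.5 * (x - a_i)^2
c = 1.0                              # ADMM penalty parameter (assumed)

x = np.zeros(4)      # local primal variables x_i
mu = np.zeros(4)     # low-dimensional dual variables mu_i

for k in range(300):
    x_old = x.copy()
    for i in range(4):
        d = len(neighbors[i])
        s = sum(x_old[i] + x_old[j] for j in neighbors[i])
        # Closed form of argmin 0.5*(x - a_i)^2 + mu_i*x + c*sum_j (x - (x_i+x_j)/2)^2
        x[i] = (a[i] - mu[i] + c * s) / (1 + 2 * c * d)
    for i in range(4):
        # Dual update: mu_i += c * sum_j (x_i - x_j)
        mu[i] += c * sum(x[i] - x[j] for j in neighbors[i])

print(x)  # all x_i approach the consensus minimizer mean(a) = 2.5
```

Because the graph is symmetric and the duals start at zero, the duals always sum to zero, which forces the consensus fixed point to be the minimizer of the summed losses, here the mean of the $a_i$.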

Stochastic Configuration Network (SCN)
The SCN randomly assigns the hidden-layer parameters within an adjustable interval and introduces a supervisory mechanism to guarantee its universal approximation property. As an incremental learning algorithm, SCN builds a pool of candidate nodes and selects the best node in each incremental learning step, which accelerates convergence. Given $N$ training samples with inputs $x_n$ and corresponding outputs $t_n$, the SCN model with $L-1$ hidden nodes can be expressed as
$$f_{L-1}(x) = \sum_{j=1}^{L-1} \beta_j \, g_j(w_j^\top x + b_j),$$
where $g(\cdot)$ is the activation function, $w_j$ and $b_j$ are the input weights and biases, and $\beta_j$ is the output weight of the $j$-th node. The residual error, that is, the difference between the actual observations and the fitted values, is
$$e_{L-1} = f - f_{L-1}.$$
If $\|e_{L-1}\|$ has not reached the target tolerance, a new node $g_L$ is generated, and the output weight $\beta_L$ is recalculated as
$$\beta_L = \frac{\langle e_{L-1}, g_L \rangle}{\| g_L \|^2}.$$
This incremental construction is an effective way to determine the network structure: starting from the first randomly generated node, nodes are gradually added to the network. According to the universal approximation property proved in [14], given that span($\Gamma$) is dense in $L_2$ and $0 < \|g\| < b_g$ for some $b_g > 0$, the input weights and biases of a newly added node are subject to the inequality
$$\langle e_{L-1}, g_L \rangle^2 \geq b_g^2 \, (1 - r - \mu_L) \, \| e_{L-1} \|^2,$$
where $0 < r < 1$ and $\{\mu_L\}$ is a nonnegative sequence with $\mu_L \to 0$ and $\mu_L \leq 1 - r$. The output weights of the resulting network are then obtained by solving a least-squares optimization problem over all current nodes.
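A compact sketch of this incremental construction follows, assuming a tanh activation, a toy 1-D regression target, and a simplified supervisory inequality in which the $\mu_L$ term is dropped; the candidate-pool size, parameter ranges, and data are illustrative, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D regression problem (illustrative assumption).
X = np.linspace(-1, 1, 100).reshape(-1, 1)
T = np.sin(3 * X).ravel()

L_max, n_cand, r = 25, 50, 0.999
H = np.empty((len(X), 0))      # hidden-layer output matrix, grown column by column
e = T.copy()                   # current residual e_{L-1}

for L in range(L_max):
    best_g, best_score = None, -np.inf
    for _ in range(n_cand):
        # Randomly assign hidden parameters within an adjustable interval.
        w = rng.uniform(-3, 3, size=1)
        b = rng.uniform(-3, 3)
        g = np.tanh(X @ w + b)             # candidate node output g_L
        score = (e @ g) ** 2 / (g @ g)
        # Simplified supervisory inequality (mu_L omitted):
        #   <e, g>^2 / ||g||^2 >= (1 - r) * ||e||^2
        if score > (1 - r) * (e @ e) and score > best_score:
            best_g, best_score = g, score
    if best_g is None:
        continue                           # no admissible candidate this round
    H = np.column_stack([H, best_g])
    beta, *_ = np.linalg.lstsq(H, T, rcond=None)   # recompute all output weights
    e = T - H @ beta

print(np.linalg.norm(e) / np.linalg.norm(T))   # relative residual shrinks as nodes are added
```

Refitting all output weights by least squares after each admitted node corresponds to the strongest of the SCN weight-update variants; the residual norm is therefore non-increasing as the network grows.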

Distributed Event-Triggered Learning for SCN
A distributed learning algorithm imposes a consensus constraint across the nodes, making the final solution close to that of the centralized processing approach. In the centralized SCN model, the output weights $\beta$ can be obtained from the following problem [15]:
$$\min_{\beta, \xi} \; \tfrac{1}{2}\|\beta\|^2 + \tfrac{C}{2} \sum_{n=1}^{N} \xi_n^2 \quad \text{s.t.} \quad t_n - h(x_n)^\top \beta = \xi_n, \; n = 1, \dots, N, \qquad (12)$$
where $\xi_n$ is the training error corresponding to the $n$-th input vector and $C > 0$ is the regularization parameter. Substituting the constraints into (12), we obtain
$$\min_{\beta} \; \tfrac{1}{2}\|\beta\|^2 + \tfrac{C}{2} \| H\beta - T \|^2,$$
where $H$ denotes the hidden-layer output matrix and $T$ denotes the target output matrix.
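Under the substituted form above, the output weights have a closed-form solution. A minimal sketch with a toy stand-in for $H$ and $T$ and an assumed value of $C$:

```python
import numpy as np

rng = np.random.default_rng(1)
H = rng.standard_normal((50, 10))   # hidden-layer output matrix (toy stand-in)
T = rng.standard_normal(50)         # target outputs (toy stand-in)
C = 1.0                             # regularization parameter, C > 0 (assumed)

# Minimizing 0.5*||beta||^2 + 0.5*C*||H beta - T||^2 gives the closed form
#   beta = (H^T H + I/C)^{-1} H^T T.
beta = np.linalg.solve(H.T @ H + np.eye(H.shape[1]) / C, H.T @ T)
```

As a sanity check, `beta` zeroes the gradient of the regularized objective, i.e., `beta + C * H.T @ (H @ beta - T)` vanishes up to numerical precision.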
SCN is now placed in a distributed computing scenario: each agent $i \in \mathcal{V}$ on the network topology holds its own local dataset, with input vectors $x_{i,n}$ and corresponding outputs $t_{i,n}$. To train on the union of all local training sets in a distributed fashion, problem (12) for the global output weight $\beta$ is turned into an optimization problem with a common consensus constraint. Letting $\beta_i$ be the local copy of $\beta$ at node $i$, and $H_i$, $T_i$ the local hidden-layer output matrix and target matrix computed from node $i$'s data, problem (12) becomes
$$\min_{\{\beta_i\}} \; \sum_{i \in \mathcal{V}} \Big( \tfrac{1}{2|\mathcal{V}|}\|\beta_i\|^2 + \tfrac{C}{2} \| H_i \beta_i - T_i \|^2 \Big) \quad \text{s.t.} \quad \beta_i = \beta_j, \; (i,j) \in \mathcal{E}.$$
To solve this distributed learning problem for SCN, we apply the ADMM of the previous section with the local loss $f_i(\beta_i) = \tfrac{1}{2|\mathcal{V}|}\|\beta_i\|^2 + \tfrac{C}{2}\|H_i\beta_i - T_i\|^2$. After derivation, the convex consensus optimization problem is solved by the node-wise updates (22). Observing (22), only the variable $\beta_i$ needs to be transmitted for communication.
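A toy sketch of the resulting decentralized iteration follows, assuming the quadratic local losses above, a ring topology, and illustrative values of $C$ and the ADMM penalty $c$. Each node only ever shares its current $\beta_i$ with its neighbors, which is the observation the event-triggered mechanism later exploits.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, C, c = 4, 5, 1.0, 1.0          # agents, weight dimension; C and c assumed
neighbors = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}   # ring topology (toy)

# Local hidden-layer output matrices H_i and local targets T_i (toy data).
H = [rng.standard_normal((20, p)) for _ in range(n)]
T = [rng.standard_normal(20) for _ in range(n)]

beta = [np.zeros(p) for _ in range(n)]   # local copies beta_i
mu = [np.zeros(p) for _ in range(n)]     # low-dimensional dual variables

for k in range(500):
    old = [b.copy() for b in beta]
    for i in range(n):
        d = len(neighbors[i])
        s = sum(old[i] + old[j] for j in neighbors[i])
        # Closed-form minimizer of the local augmented objective
        #   (1/(2n))||b||^2 + (C/2)||H_i b - T_i||^2 + mu_i^T b
        #   + c * sum_j ||b - (old_i + old_j)/2||^2
        A = C * H[i].T @ H[i] + (1.0 / n + 2 * c * d) * np.eye(p)
        beta[i] = np.linalg.solve(A, C * H[i].T @ T[i] - mu[i] + c * s)
    for i in range(n):
        mu[i] += c * sum(beta[i] - beta[j] for j in neighbors[i])

# Reference: the centralized solution on the stacked data.
Hs, Ts = np.vstack(H), np.concatenate(T)
beta_star = np.linalg.solve(Hs.T @ Hs + np.eye(p) / C, Hs.T @ Ts)
```

At convergence, every local copy agrees with the centralized ridge solution `beta_star`, since summing the local objectives over the nodes recovers the centralized problem.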
Combined with the trigger function, the event-triggered algorithm replaces the neighbor states in (22) by the most recently transmitted states: node $i$ broadcasts $\beta_i$ only when the event-trigger error, the gap between its current state and its last transmitted state, exceeds the threshold. Furthermore, the algorithm (27) can be written in matrix form using the matrices $D$ and $W$, the degree matrix and the adjacency matrix of the communication network graph, respectively. Each node computes its hidden-layer output matrix $H_i$ locally from its own data.

Numerical Verification
It is easy to see that if the trigger-function part is removed, i.e., only the update (22) is used, ADMM-SCN-ET reduces to the plain algorithm ADMM-SCN. In this section, we compare these two algorithms to show that the newly proposed event-triggered algorithm can effectively reduce the amount of communication between network nodes.

Regression on Conhull Dataset
The Conhull dataset is generated from a given real-valued function. In the simulation experiment on Conhull, the network topology with $|\mathcal{V}| = 4$ nodes is shown in Figure 1. In the communication plots, the red line dropping to zero indicates that the corresponding node has transmitted information. Figure 3 shows the communication times of each node: Figure 3(a) shows the time-triggered case, in which every node communicates at every iteration, while the event-triggered case in Figure 3 communicates far more sparsely.

Classification on 2Moons Dataset
The 2Moons dataset is an artificial dataset in which the data points form two clusters with distinct moon shapes [10]. The global training dataset consists of 300 points randomly sampled from a uniform distribution. The communication times of each node in this example are shown in Figure 5.

Classification on MNIST Dataset
The MNIST dataset contains 60,000 training samples and 10,000 test samples. Each data unit consists of a picture of a handwritten digit and the corresponding label. Each picture contains 28 × 28 pixels, and the label ranges from 0 to 9, representing the handwritten digits 0 to 9, as shown in Figure 6(a).
The network diagram used in this classification example is shown in the corresponding figure. When the event-triggered algorithm is used, the communication times of each node are: node 1, 10 times; node 2, 8 times; node 3, 6 times; node 4, 9 times; node 5, 10 times; node 6, 10 times; node 7, 10 times; node 8, 9 times; node 9, 10 times; and node 10, 10 times.

Conclusion
This paper designs a distributed algorithm based on an event-triggered communication mechanism, called ADMM-SCN-ET. Combining the ADMM for solving convex consensus optimization problems with event-triggered communication, we propose a communication-censored distributed learning algorithm and use it to solve for the optimal output weights of the neural network. The event-triggered mechanism suppresses the transmission of variable information when the node state changes only slightly, reducing unnecessary communication between nodes. Finally, three datasets are used to verify the effectiveness of the proposed algorithm. The simulation results show that it can effectively reduce the communication traffic of distributed learning while matching the regression and classification performance of existing distributed learning algorithms.