Push-Pull Finite-Time Convergence Distributed Optimization Algorithm

With the widespread application of distributed systems, the design of distributed optimization strategies has become a research hotspot. This article focuses on the convergence rate of distributed convex optimization algorithms. Each agent in the network has its own convex cost function. We consider a gradient-based distributed method and use a push-pull gradient algorithm to minimize the total cost function. Inspired by existing multi-agent consensus protocols for distributed convex optimization, a distributed convex optimization algorithm with finite-time convergence is proposed and studied. Then, for a fixed undirected network topology, a fast convergent distributed cooperative learning method based on a linearly parameterized neural network is proposed. Unlike existing distributed convex optimization algorithms, which achieve at best exponential convergence, the proposed algorithm achieves finite-time convergence. The convergence of the algorithm is guaranteed by the Lyapunov method. Simulation examples illustrate the effectiveness of the algorithm and show that it is competitive with other algorithms.


Introduction
Consider a network with $N$ nodes. Each node has its own cost function $f_i : \mathbb{R}^n \to \mathbb{R}$, $i = 1, 2, \ldots, N$, which is strictly convex. All nodes cooperate to find the optimal value of the global cost function, that is, to solve $\min_{x \in \mathbb{R}^n} \sum_{i=1}^{N} f_i(x)$ (1-1). This problem is studied in [2]; closely related problems include resource positioning [3], formation control [4], sensor scheduling [5], and distributed message routing [6]. A series of algorithms for problem (1-1) has been studied extensively. In general, these algorithms fall into two categories: discrete-time algorithms [1] [2] [7] [8] [9] and continuous-time algorithms [10]-[16]. Most of the former are iterative and rely on consensus of the underlying dynamic system to reach the goal. For example, in [1] the authors propose a gradient-free distributed random iterative algorithm that achieves asymptotic convergence with less information transmission than some existing gradient-based algorithms. In [2], the authors propose an event-triggered zero-gradient-sum algorithm that applies to most network models and achieves exponential convergence when the network topology is strongly connected and detail-balanced. The latter are designed in continuous time, and their convergence properties are studied mainly with tools from control theory. In [10], the authors propose a continuous-time distributed zero-gradient-sum algorithm whose initial value at each node is the minimizer of that node's cost function. Exponential convergence is achieved when the network is connected, undirected, and has a fixed topology. In [13], the authors show that exponential convergence is achieved when the local cost function of each node is strongly convex and its gradient is globally Lipschitz continuous. However, most existing algorithms for problem (1-1) achieve only asymptotic or exponential convergence.
In real engineering systems, it is desirable for the nodes to reach the optimal value $x^*$ within a finite time. Some effective methods have been studied to improve the speed of consensus convergence, for example by designing the optimal topology and optimal communication weights [17] [18] [19] [20] [21]. Although these consensus algorithms converge quickly, they cannot solve the problem in finite time.
Based on the above research, this paper proposes a finite-time convergent algorithm that uses the inverse of the Hessian matrix to solve problem (1-1). The algorithm is inspired by [22] and [23] and extends the existing continuous-time, exponentially convergent zero-gradient-sum (ZGS) algorithm to finite-time convergence. The convergence of the algorithm is guaranteed by the Lyapunov method, and numerical simulations verify its effectiveness. Because many practical tasks must be completed within a certain time, the problem of finite-time consensus control of multi-agent systems has attracted widespread attention [39] [40].
For distributed learning, the learning speed is as important as the learning quality. Many algorithms are dedicated to finding an optimal learning strategy [41] [42] [43]. In [41], the author gives a distributed cooperative learning algorithm that achieves exponential convergence. In [42], the authors propose a distributed optimization algorithm based on the ADMM method, which solves the global problem with an asymptotic convergence rate. In [43], the authors propose two distributed cooperative learning algorithms, one based on the decentralized average consensus (DAC) strategy and one based on the ADMM strategy. The ADMM-based algorithm achieves only asymptotic convergence, whereas the DAC-based algorithm achieves exponential convergence.

Major Outcomes
Based on existing results in related fields, this paper proposes a finite-time convergent distributed optimization algorithm and a fast convergent distributed cooperative learning algorithm. The effectiveness of both is verified theoretically and experimentally. First, a new distributed optimization method and its variants are introduced. Building on this, a neural-network-based finite-time convergent algorithm is used to solve distributed strongly convex optimization over a fixed undirected network topology in finite time. The proposed distributed convex optimization algorithm gives an explicit upper bound on the convergence time, which depends on the initial state of the algorithm, the algorithm parameters, and the network topology graph. Second, the proposed distributed cooperative learning algorithm preserves privacy: the global optimization goal is reached by exchanging only the learning weights of the neural network. Unlike previous distributed cooperative learning algorithms, which achieve only asymptotic or exponential convergence, this algorithm achieves finite-time convergence.

Organization of the Paper
We first give the notation and basic assumptions in Section 1.4. Section 2 introduces the push-pull gradient algorithm and proves its convergence. The finite-time convergence algorithm and its convergence proof are given in Section 3. In Section 4, we introduce a push-pull fast convergent distributed cooperative learning algorithm, establish its convergence, and give numerical simulations. Section 5 gives simulations and comparisons with other algorithms to demonstrate the competitiveness of the proposed algorithm, and concludes the paper.
$\mathbb{R}$ and $\mathbb{R}_+$ represent the set of real numbers and the set of non-negative real numbers, respectively; $\|\cdot\|$ represents the Euclidean norm on $\mathbb{R}^n$; $\otimes$ denotes the Kronecker product.
If the Lyapunov inequality of the finite-time stability lemma holds on the set $U$, then the system is stable in finite time, and its convergence time $T$ is bounded.
A linearly parameterized neural network with $m$-dimensional input, $n$-dimensional output, and $l$ hidden neurons can be modeled as $y(x) = \sum_{i=1}^{l} w_i s_i(x)$, where $x \in \mathbb{R}^m$ represents the $m$-dimensional input vector, $s_i$ represents the output of the $i$-th hidden node, and $w_i \in \mathbb{R}^n$ is the learning weight connecting the output nodes with the $i$-th hidden node.
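To make the finite-time stability lemma concrete, the following minimal Python sketch assumes the commonly used form of the condition, $\dot V \le -c V^{\alpha}$ with $c > 0$ and $0 < \alpha < 1$, for which the settling time satisfies $T \le V(0)^{1-\alpha} / (c(1-\alpha))$. This inequality, the names `c` and `alpha`, and both helper functions are assumptions made for illustration and are not taken from the text above.

```python
def settling_time_bound(v0, c, alpha):
    """Upper bound on the settling time, ASSUMING dV/dt <= -c * V**alpha."""
    if not (0.0 < alpha < 1.0) or c <= 0.0 or v0 < 0.0:
        raise ValueError("need 0 < alpha < 1, c > 0 and v0 >= 0")
    return v0 ** (1.0 - alpha) / (c * (1.0 - alpha))


def simulate_hitting_time(v0, c, alpha, dt=1e-3, t_max=100.0):
    """Forward-Euler integration of the boundary case dV/dt = -c * V**alpha.

    Returns the first time at which V reaches zero (or t_max if it does not).
    """
    v, t = v0, 0.0
    while v > 0.0 and t < t_max:
        v = max(v - dt * c * v ** alpha, 0.0)  # clamp so V stays non-negative
        t += dt
    return t


# Hypothetical values chosen only for illustration.
bound = settling_time_bound(v0=4.0, c=1.0, alpha=0.5)
hit = simulate_hitting_time(v0=4.0, c=1.0, alpha=0.5)
print(f"bound = {bound:.3f}, simulated hitting time = {hit:.3f}")
```

In the boundary case the simulated hitting time stays close to the bound, which is what distinguishes finite-time stability from exponential stability: with `alpha = 1` the same recursion would only approach zero asymptotically.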

Push-Pull Gradient Method
In this section, all vectors are column vectors by default. The local variables of the agents are stacked as $\mathbf{x} = [x_1, x_2, \ldots, x_N]^T$ and $\mathbf{y} = [y_1, y_2, \ldots, y_N]^T$. Under the assumption on the cost functions considered here, the problem has a unique optimal solution.
The interaction topology between the nodes is modeled as a directed graph. A directed graph $\mathcal{G} = (\mathcal{N}, \mathcal{E})$ consists of a set of nodes $\mathcal{N}$ and a set of ordered edges $\mathcal{E}$. If a message from node $i$ can reach node $j$ directly, that is, $(i, j) \in \mathcal{E}$, then $i$ is called the parent node and $j$ the child node. Information is passed from parent nodes to child nodes. In graph $\mathcal{G}$, a directed path is a sequence of edges such as $(i, j), (j, k), \ldots$. A directed tree is a directed graph in which every vertex except the root has exactly one parent. A spanning tree of a directed graph is a directed tree that connects all vertices of the graph.
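The notion of a spanning tree root can be made concrete with a short reachability check. The sketch below is only an illustration: it uses the parent-to-child edge convention described above, and the function name `spanning_tree_roots` and the two example edge lists are hypothetical.

```python
def spanning_tree_roots(num_nodes, edges):
    """Return every node from which all other nodes are reachable.

    `edges` holds (parent, child) pairs; a node is the root of some spanning
    tree exactly when a directed path leads from it to every other node.
    """
    children = {i: [] for i in range(num_nodes)}
    for parent, child in edges:
        children[parent].append(child)

    roots = set()
    for start in range(num_nodes):
        seen = {start}
        stack = [start]
        while stack:  # depth-first search from `start`
            node = stack.pop()
            for nxt in children[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        if len(seen) == num_nodes:
            roots.add(start)
    return roots


# Two hypothetical 4-node graphs, standing in for the two graphs of the method.
edges_pull = [(0, 1), (1, 2), (2, 0), (2, 3)]
edges_push = [(0, 1), (0, 2), (0, 3)]
roots_pull = spanning_tree_roots(4, edges_pull)
roots_push = spanning_tree_roots(4, edges_push)
common_roots = roots_pull & roots_push
print(roots_pull, roots_push, common_roots)
```

A non-empty `common_roots` set corresponds to the requirement that at least one node is a root of spanning trees in both graphs.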

Detailed Push-Pull Gradient Method
The algebraic form of the push-pull gradient method is referred to below as (2-7). We assume that the graphs $\mathcal{G}_R$ and $\mathcal{G}_{C^T}$ each contain at least one spanning tree. In addition, at least one node is a root of a spanning tree in both $\mathcal{G}_R$ and $\mathcal{G}_{C^T}$, that is, $\mathcal{R}_R \cap \mathcal{R}_{C^T} \neq \emptyset$, where $\mathcal{R}_R$ is the set of all possible spanning tree roots in graph $\mathcal{G}_R$.
For the choice of step size, we assume that at least one of these common root nodes has a positive step size.
These assumptions constrain the admissible graphs and step sizes. We next explain the algorithm from another angle.
To show the feasibility of the push-pull algorithm, note that under the above assumptions and conditions it converges linearly. The algorithm in (2-7) is similar in structure to the DIGing algorithm proposed in [44], but with modified mixing matrices. The $x$-update can be viewed as an inexact gradient step, and the $y$-update can be viewed as a gradient tracking step. This asymmetric R-C structure has been used in the average consensus literature [45]; however, the present algorithm contains a gradient term and has nonlinear dynamics, so it cannot be analyzed as a linear dynamical system.
Having explained the method mathematically, we now explain conceptually why it is called a push-pull algorithm and why it is reliable. So far the analysis has assumed a static network, but many real-world networks are dynamic or even unreliable, so the scope of the discussion needs to be extended. The original algorithm was derived from [44], which also provides some inspiration here. In a dynamic network, a node that disseminates or aggregates information needs to know the weights to apply, or how to derive them. In an unreliable network, the connection between sending and receiving nodes is not dependable, so specific strategies are needed to assign or adapt the weights. To keep the designated part of the network convergent, a relatively effective method is to let the receiver perform the scaling and combining. When the network environment changes, it is difficult for a sender to know how the whole network has changed and to adjust its weights accordingly. A node can also keep using the push protocol and let the surrounding nodes continue to send messages to it.
However, it is difficult to determine whether a node is still alive in the network, because a node that should not or cannot respond is indistinguishable from a dead one. Since the nodes cannot be fully synchronized, whether a node or agent is dead can only be judged "subjectively": if a node does not respond within a certain period of time, it is considered dead until it answers again. A pull communication protocol can also be used, allowing agents to pull information from neighboring nodes for effective coordination and synchronization.
To sum up, for the general implementation of Algorithm 1 the push protocol is indispensable. Using the pull protocol in addition can improve the operating efficiency of the network, but the algorithm cannot run on the pull protocol alone.
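The push and pull roles can be illustrated with a small numerical sketch. It assumes the update form commonly associated with push-pull gradient tracking, $x_{k+1} = R(x_k - \alpha y_k)$ and $y_{k+1} = C y_k + \nabla F(x_{k+1}) - \nabla F(x_k)$, with $R$ row-stochastic (decision variables are pulled) and $C$ column-stochastic (gradient information is pushed). This form, the 3-node matrices, the quadratic costs, and the step size are assumptions for illustration, not a transcription of (2-7).

```python
def mat_vec(matrix, vector):
    """Plain matrix-vector product for small dense lists."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def push_pull(R, C, grad, x0, alpha, iterations):
    """ASSUMED push-pull iteration with scalar decision variables.

    x_{k+1} = R (x_k - alpha * y_k)
    y_{k+1} = C y_k + grad(x_{k+1}) - grad(x_k),   y_0 = grad(x_0)
    """
    x = list(x0)
    g = grad(x)
    y = list(g)
    for _ in range(iterations):
        x_new = mat_vec(R, [xi - alpha * yi for xi, yi in zip(x, y)])
        g_new = grad(x_new)
        y = [cy + gn - go for cy, gn, go in zip(mat_vec(C, y), g_new, g)]
        x, g = x_new, g_new
    return x


# Hypothetical local costs f_i(x) = 0.5 * (x - b_i)**2; the minimizer of the
# sum is the mean of the b_i.
targets = [1.0, 2.0, 6.0]


def local_gradients(x):
    return [xi - b for xi, b in zip(x, targets)]


# Directed graph with edges 0->1, 1->2, 2->0, 0->2 plus self-loops.
R = [[1 / 2, 0.0, 1 / 2],      # rows sum to 1 (pull weights)
     [1 / 2, 1 / 2, 0.0],
     [1 / 3, 1 / 3, 1 / 3]]
C = [[1 / 3, 0.0, 1 / 2],      # columns sum to 1 (push weights)
     [1 / 3, 1 / 2, 0.0],
     [1 / 3, 1 / 2, 1 / 2]]

x_final = push_pull(R, C, local_gradients, [0.0, 0.0, 0.0], 0.05, 2000)
print(x_final)
```

Neither matrix is doubly stochastic here, which is the point of using two different weight matrices: each node only needs to normalize the weights it controls.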

Unify Different Distributed Computing Architecture Systems
We now show how the proposed algorithm unifies, to a certain extent, different types of distributed architectures. In the fully decentralized case, for example with an undirected connected graph $\mathcal{G}$, we can set $\mathcal{G}_R = \mathcal{G}_C = \mathcal{G}$ and let $R = C$, which is then a symmetric matrix. In this case, the algorithm can be regarded as the one in [44] [46]. If the graph is directed and strongly connected, we can also let $\mathcal{G}_R = \mathcal{G}_C = \mathcal{G}$ and set the weights of $R$ and $C$ accordingly.
Although it may not be straightforward, the algorithm can also be implemented in a centralized or semi-centralized architecture. As an illustration, Figure 1 shows the network topologies of $\mathcal{G}_R$ and $\mathcal{G}_{C^T}$. This example is closer to a semi-centralized case. Node 1 cannot be replaced by a strongly connected subnet in $R$ and $C$, but nodes 2, 3, and 4 can be replaced by different nodes, as long as the information of these subnodes can be passed to $\mathcal{G}_R$. The theory behind the subordinate agent layer is discussed in the next section. The layer in $\mathcal{G}_{C^T}$ can be understood, using the concept of a rooted tree, as a specific requirement on the connectivity of the subnet. Within the network, such a subnet plays a role similar to that of node 1; we call it the leader, and the other nodes are called followers. We emphasize that a subnet can replace a node, but after the replacement every subnet is internally decentralized, while the relationship between the leader and the subnet is one of subordination. This is what we call a semi-centralized architecture.

Proof of Convergence
In this section, we study the convergence of the algorithm. We first define the error quantities to be bounded. Our approach is to bound these quantities by linear constraints; on this basis, a linear system of inequalities can be established.
Algorithm analysis. Starting from (2-7) and the definitions above, we derive a supporting lemma, from which we build a linear system of inequalities. From this system we know that the errors converge linearly when the spectral radius of its coefficient matrix is less than 1. The remaining question is how to guarantee this condition, which is answered by a lemma on nonnegative irreducible matrices. We now give the convergence result for the proposed algorithm.
Suppose that the step size in algorithm (2-7) satisfies a bound involving constants $c_1, c_2, c_3$, which will be given later. Then the spectral radius of the coefficient matrix is less than 1. Proof: According to the above lemma, it suffices to guarantee that $a_{11}, a_{22}, a_{33} < 1$ and that the remaining inequality of the lemma holds. What remains is to state conditions under which $a_{11}, a_{22}, a_{33} < 1$ and this inequality are satisfied.
First, $a_{11} < 1$ must hold. Second, a sufficient condition is derived for the inequality involving $c_1, c_2, c_3$.
Here we specify $c_1, c_2, c_3$. Combining them with the discussion above, we obtain the final bound on $\hat{a}$.

Finite-Time Convergence Algorithm
Now we introduce the optimization algorithm for finite-time convergence.

Algorithm Introduction
Consider a network with $N$ nodes. Each node has its own cost function $f_i : \mathbb{R}^n \to \mathbb{R}$, which is strictly convex. All nodes cooperate to obtain the optimal value of the global cost function. To design the algorithm, we first state the required assumptions. Under these assumptions we obtain algorithm (3-1), where $x_i \in \mathbb{R}^n$ represents the state of node $i$; $\gamma \in \mathbb{R}_+$ is a gain constant that can be used to improve the convergence speed of the algorithm; $\mathcal{N}_i$ denotes the set of all neighbor nodes of node $i$; $a_{ij}$ is an element of the adjacency matrix $A$; and $0 < \alpha < 1$.

In addition, $x_i^*$ is the minimizer of the cost function $f_i$. Note 3.1: Algorithm (3-1) is inspired by the continuous-time zero-gradient-sum algorithm [10] and the finite-time consensus protocol [20]. From the two formulas of (3-1), it follows that the gradient sum satisfies $\sum_{i=1}^{N} \nabla f_i(x_i(t)) = 0$ for all $t \geq 0$. The consensus protocol ensures that the algorithm reaches consensus in finite time, that is, there are a convergence time $T$ and a convergence state $\tilde{x}$ such that $x_i(t) = \tilde{x}$ for all $t \geq T$. From Assumption 2, the strongly convex global cost function has only one minimizer $x^*$, which satisfies $\sum_{i=1}^{N} \nabla f_i(x^*) = 0$. The above analysis shows that $\tilde{x} = x^*$, so the algorithm solves the problem posed above. Note that when $\alpha = 1$, the algorithm achieves only asymptotic convergence.
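The mechanism described in Note 3.1 can be sketched numerically. The code below assumes the form $\dot x_i = \gamma\,(\nabla^2 f_i(x_i))^{-1} \sum_{j \in \mathcal{N}_i} a_{ij}\,\mathrm{sig}(x_j - x_i)^{\alpha}$ with $x_i(0) = x_i^*$ and $\mathrm{sig}(z)^{\alpha} = \mathrm{sign}(z)|z|^{\alpha}$, which is the usual way of combining a zero-gradient-sum flow with a finite-time consensus protocol. This form, the scalar quadratic costs, the ring graph, and the forward-Euler integration are all assumptions for illustration, not a transcription of (3-1).

```python
import math


def sig(z, alpha):
    """Signed power: sign(z) * |z|**alpha."""
    return math.copysign(abs(z) ** alpha, z)


def finite_time_zgs(hessians, minimizers, adjacency, gamma, alpha, dt, steps):
    """Forward-Euler simulation of the ASSUMED finite-time ZGS flow.

    Scalar costs f_i(x) = 0.5 * h_i * (x - b_i)**2, so the Hessian is the
    constant h_i and the local minimizer is b_i (used as the initial state).
    """
    x = list(minimizers)
    n = len(x)
    for _ in range(steps):
        dx = []
        for i in range(n):
            coupling = sum(adjacency[i][j] * sig(x[j] - x[i], alpha)
                           for j in range(n))
            dx.append(gamma * coupling / hessians[i])
        x = [xi + dt * di for xi, di in zip(x, dx)]
    return x


# Hypothetical 4-node undirected ring with unit weights.
A = [[0, 1, 0, 1],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [1, 0, 1, 0]]
h = [1.0, 2.0, 1.0, 2.0]
b = [1.0, 2.0, 3.0, 8.0]

x_end = finite_time_zgs(h, b, A, gamma=1.0, alpha=0.5, dt=2e-3, steps=25000)
gradient_sum = sum(hi * (xi - bi) for hi, xi, bi in zip(h, x_end, b))
x_star = sum(hi * bi for hi, bi in zip(h, b)) / sum(h)  # minimizer of the sum
print(x_end, gradient_sum, x_star)
```

Because the adjacency matrix is symmetric, the coupling terms cancel in pairs, so the gradient sum stays at zero along the trajectory; consensus therefore can only occur at the global minimizer. The small residual chattering near consensus is an artifact of the explicit Euler step, not of the continuous-time flow.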

Convergence Analysis
The Lyapunov function candidate $V$ is the one given in [10]. By Assumption 3.2, it is a twice continuously differentiable function. It is also known that $V(x(t))$ is a locally strongly convex function.
Next, for convenience of derivation, we give the following definitions. For $i \in \mathcal{V}$, $f_i$ is a locally strongly convex function. From these definitions, $U_i$ is a compact set. To exploit strong convexity, we need a convex compact set $U$ that contains these sets. $V$ vanishes only when $x(t) = x^*$ holds, so $V$ can be used to prove Theorem 3.1.
In addition, combining (1-4), for $x(t) \in U$, (3-7) can be rewritten as a differential inequality in $V$. Together with the finite-time stability theorem stated earlier, this shows that the algorithm converges in finite time, that is, there is a time $T$ such that $x_i(t) = x^*$ for all $t \geq T$.

Simulation
In this section, a simulation experiment is given to demonstrate the effectiveness of the algorithm. We set up a 6-node network topology, as shown in Figure 2, with adjacency matrix $A$. The minimizer of each node's cost function can be obtained directly, and the optimal value of problem (1-1) is calculated as $x^* = 3.5$. Combining the convex compact set $U$ from the proof, we can obtain the constants required by the convergence-time bound. In the simulation, we fix the values of the algorithm parameters $(\gamma, \alpha)$.

Push-Pull Fast Convergent Distributed Cooperative Learning Algorithm
This section aims to combine the previously proposed algorithms and generalize them to practical applications, such as common machine learning scenarios. Inspired by the previous algorithm, we design a push-pull fast convergent distributed cooperative learning (P-DCL) algorithm based on a linearly parameterized neural network. First, a continuous-time P-DCL algorithm in push-pull gradient mode is given. Second, we give a convergence analysis of the algorithm based on the Lyapunov method. Third, to make the algorithm practical, we discretize it with the fourth-order Runge-Kutta (RK4) method. Fourth, simulations of the distributed ADMM algorithm and of the push-pull gradient-based P-DCL algorithm are given. The experiments show that the proposed algorithm has higher learning ability and faster convergence. Finally, we relate the convergence speed of the algorithm to some of its parameters. Simulation results show that the convergence speed can be effectively improved by properly selecting the adjustable parameters.
Restatement: To construct the algorithm systematically, the problem formulation is given first, and then the local cost function is analyzed. Then the relationship between the solutions of the global cost function and the local cost functions is given.
X. B. Chen et al.
Consider a network with $N$ nodes. Each node $i \in \mathcal{V}$ in the network holds $M_i$ samples, where $M_i$ is a positive integer, and the $k$-th sample on the $i$-th node is indexed accordingly. For each node, the local cost function consists of a data-fitting term and a regularization term weighted by $\delta$, where $\delta$ is a non-negative constant. In this way, the optimal learning weight of node $i$ can be easily obtained.
As mentioned earlier, many distributed algorithms for this problem achieve asymptotic convergence. What remains is to design a fast distributed optimization algorithm under which all nodes converge to the optimal learning weight $W^*$ in a finite time $T$.
From the above analysis, the global cost function (1-7) can be rewritten in a form in which all nodes share a common weight; this requirement is often referred to as global consensus. Unlike in the traditional multi-agent consensus problem, the consensus value here has no specific meaning. Consensus has a long research history. The basic idea is that all nodes in the network eventually reach the same state through information exchange with their neighbors. From the perspective of learning, an efficient learning algorithm is essential. For distributed cooperative learning algorithms, the learning rate is an important performance index. In practice, however, it is more important to reach a valid result within a certain time, which motivates the design of a fast consensus-based cooperative learning algorithm.

Fast Convergent Distributed Algorithm
Here, based on the linearly parameterized neural network, a distributed strategy is proposed, where $\rho \in \mathbb{R}_+$ is a constant used to adjust the convergence rate, $0 < \beta < 1$, and $a_{i,j}$ is an element of the adjacency matrix $\mathcal{A}$. Figure 4 shows the operation of the algorithm more intuitively.
The algorithm can also be written in matrix form. Note 4.1: The above algorithm is inspired by [47]. Linear consensus algorithms achieve asymptotic convergence, while most consensus algorithms that achieve finite-time convergence use the sign function [20] [39].
It is easy to see that the gradient sum of the node cost functions is zero along the trajectory. Because $E(W(t))$ is a strongly convex function, it has only one minimizer. This also shows that the problem addressed by the proposed algorithm does have a solution.
In the bound of Theorem 4.1, the function involved is a twice continuously differentiable, positive definite function; $\beta$ is the constant in algorithm (4-7); $\lambda$ denotes the algebraic connectivity associated with the network topology graph; $\Theta$ is a constant related to the cost functions of all nodes; and $\rho$ is the gain constant in the algorithm. Proof: A rigorous proof of Theorem 4.1 based on the Lyapunov method is given next. Before the proof, some preparation is needed. First, a Lyapunov candidate function is selected. Next, its time derivative along the trajectories of the algorithm is computed, where the sum runs over the neighbors of node $i$. Combining (4-12) and (4-13), (4-11) can be rewritten as a differential inequality. This indicates that the proposed algorithm (4-7) is finite-time stable, so there is a finite settling time. Based on the above analysis, the algorithm proposed in this section can indeed find the optimal value of (1-7) in finite time.

Fast Convergent Discrete-Time Distributed Cooperative Learning Algorithm
Based on the algorithm (4-6), this section gives its discrete form (4-17). Figure 5 shows the iterative process of the discrete algorithm (4-17) more intuitively.
Note 4.2: In modern industrial control design, a continuous-time system usually has to be discretized in order to obtain good control performance or to simplify the design process. Effective discretization can reduce time and space costs and can also improve the learning accuracy of the algorithm. Methods such as impulse invariance, pole-zero mapping, and triangle-hold equivalence are commonly used to convert continuous-time systems into equivalent discrete-time systems. The Runge-Kutta (RK) family of methods is widely used because of its high accuracy and good stability. Therefore, we use the fourth-order Runge-Kutta method (RK4) to discretize algorithm (4-6). However, node $i$ then needs $4N_i$ additional communications in each step. In other words, using the RK4 method increases the computational complexity.
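For reference, a generic RK4 step is sketched below. It is the classical fourth-order Runge-Kutta scheme applied to an arbitrary vector field `f(t, w)`; the scalar test problem $\dot w = -w$ is a hypothetical stand-in and not the learning dynamics (4-6).

```python
import math


def rk4_step(f, t, w, h):
    """One classical RK4 step for dw/dt = f(t, w), with w a list of floats."""
    k1 = f(t, w)
    k2 = f(t + h / 2, [wi + h / 2 * ki for wi, ki in zip(w, k1)])
    k3 = f(t + h / 2, [wi + h / 2 * ki for wi, ki in zip(w, k2)])
    k4 = f(t + h, [wi + h * ki for wi, ki in zip(w, k3)])
    return [wi + h / 6 * (a + 2 * b + 2 * c + d)
            for wi, a, b, c, d in zip(w, k1, k2, k3, k4)]


def decay(t, w):
    """Hypothetical test dynamics dw/dt = -w with exact solution exp(-t)."""
    return [-wi for wi in w]


w, t, h = [1.0], 0.0, 0.1
for _ in range(10):          # integrate from t = 0 to t = 1
    w = rk4_step(decay, t, w, h)
    t += h
rk4_error = abs(w[0] - math.exp(-1.0))
print(w[0], rk4_error)
```

Each step evaluates the right-hand side four times. In a distributed setting every evaluation requires fresh neighbor states, which is consistent with the extra communication per step mentioned in Note 4.2.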

Simulation
In this section, we verify our conclusions numerically on real data. To show the convergence speed of the proposed algorithm more clearly, we randomly select one component of $W$ to display. Its convergence can be seen in Figure 6.
From the figure, the observed convergence time is about 130 s, which is below the bound of 31398 s.
Combined with Theorem 4.1, the relationship between the convergence speed and the parameters of the algorithm is illustrated in this part. Figure 7 shows the network topology, and Figure 8 shows the resulting convergence behavior.

Conclusions
In this paper, we study the distributed optimization problem over a network. We propose a new distributed method based on push-pull finite-time convergence, in which each node maintains an estimate of the optimal decision variable and an estimate of the average gradient of the agents' objective functions. Information about gradients is pushed to a node's neighbors, and information about decision variables is pulled from its neighbors. The method uses two different graphs for information exchange between agents and is applicable to different types of distributed architectures, including decentralized, centralized, and semi-centralized architectures. In addition, we introduce a fast convergent distributed cooperative learning algorithm based on a linearly parameterized neural network.
A rigorous theoretical proof shows that the algorithm achieves finite-time convergence in continuous time. In the simulations, we investigate the influence of different parameters on the convergence speed and demonstrate the effectiveness of the algorithm in comparison with some typical algorithms. In future work, the proposed distributed cooperative learning algorithm can be extended and applied to large-scale distributed machine learning problems.