A Study on the Convergence of Gradient Method with Momentum for Sigma-Pi-Sigma Neural Networks

In this paper, a gradient method with momentum for sigma-pi-sigma neural networks (SPSNN) is considered in order to accelerate the convergence of the learning procedure for the network weights. The momentum coefficient is chosen in an adaptive manner, and the corresponding weak convergence and strong convergence results are proved.


Introduction
The pi-sigma network (PSN) is a kind of high-order feedforward neural network characterized by the fast convergence rate of a single-layer network and the nonlinear mapping capability unique to high-order networks [1]. To further improve the application capacity of the network, Li introduced a more complex network structure based on the PSN, called the sigma-pi-sigma neural network (SPSNN) [2]. An SPSNN can learn to implement static mappings in a manner similar to that of multilayer neural networks and radial basis function networks.
The gradient method is often used for training neural networks; its main disadvantages are slow convergence and the local-minimum problem. To speed up and stabilize the training iteration of the gradient method, a momentum term is often added to the weight increment formula, so that the present weight update is a combination of the present gradient of the error function and the previous weight increment [3]. Many researchers have developed the theory of momentum and extended its applications. For the back-propagation algorithm, Phansalkar and Sastry give a stability analysis of the method with the momentum term added [4], and Torii and Bhaya discuss the convergence of the gradient method with momentum. The key to the convergence analysis of momentum algorithms is the monotonicity of the error function during the learning procedure, which is generally proved under the assumption that the activation function and its derivatives are uniformly bounded. In [8] [10] [12] [13], convergence results for the gradient method with momentum, including proofs of strong convergence, are given for both two-layer and multilayer feedforward neural networks. In this paper, we consider the gradient method with momentum for sigma-pi-sigma neural networks and discuss its convergence.
The rest of the paper is organized as follows. In Section 2 we introduce the neural network model of the SPSNN and the gradient method with momentum. In Section 3 we give the convergence analysis of the gradient method with momentum for training the SPSNN. Numerical experiments are given in Section 4. Finally, in Section 5, we end the paper with some conclusions.

The Neural Network Model of SPSNN and Gradient Method with Momentum
In this section we introduce the sigma-pi-sigma neural network, which is a multilayer feedforward network. The output of the SPSNN has the form

$$y = \sum_{n=1}^{K} \prod_{i=1}^{n} \left( \sum_{j=1}^{N_v} f_{nij}(x_j) \right),$$

where $x_j$ is an input, $N_v$ is the number of inputs, $f_{nij}$ is a function to be generated through the network training, and $K$ is the number of pi-sigma networks (PSNs), the PSN being the basic building block of the SPSNN.
Each function $f_{nij}$ is expanded over a set of basis functions whose values are either 0 or 1,

$$f_{nij}(x_j) = \sum_{k} w_{nijk} B_k(x_j), \qquad B_k(x_j) \in \{0, 1\},$$

where the $w_{nijk}$ are weight values stored in memory, and $N_q$ and $N_e$ are the numbers of pieces of information stored in $x_j$, so that the total number of weights of a $K$-th order SPSNN is determined by $K$, $N_v$, $N_q$ and $N_e$. For the $t$-th training sample we have the actual output

$$y^t = \sum_{n=1}^{K} \prod_{i=1}^{n} \left( \sum_{j=1}^{N_v} f_{nij}(x_j^t) \right),$$

where $x_j^t$ denotes the $j$-th element of the given input vector $S_t$.
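For illustration, a minimal sketch of the SPSNN forward pass under the structure above is given below in Python; the binary bin-indicator basis `basis`, the bin count `n_bins`, and the nested weight layout are expository assumptions rather than the exact construction of [2].

```python
import numpy as np

def basis(x, k, n_bins=8):
    """Illustrative 0/1 basis: indicator that x falls in the k-th of
    n_bins equal bins on [0, 1). The true basis functions of [2] may differ."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    return float(edges[k] <= x < edges[k + 1])

def spsnn_output(x, w, K, n_bins=8):
    """Forward pass of a K-th order SPSNN.
    x : input vector of length N_v
    w : weights indexed as w[n][i][j][k] (n-th PSN, i-th product factor,
        j-th input, k-th basis function)."""
    N_v = len(x)
    y = 0.0
    for n in range(K):                  # sum over the K pi-sigma networks
        prod = 1.0
        for i in range(n + 1):          # product units of the (n+1)-th order PSN
            s = sum(w[n][i][j][k] * basis(x[j], k, n_bins)
                    for j in range(N_v) for k in range(n_bins))
            prod *= s
        y += prod
    return y
```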
In order to train the SPSNN, we choose the quadratic error function

$$E(W) = \frac{1}{2} \sum_{t=1}^{T} \left( y^t - O^t \right)^2,$$

where $W$ collects all the weights $w_{nijk}$, $O^t$ is the desired output for the $t$-th sample, and $T$ is the number of training samples. The gradient method with momentum is used to train the weights; we write $E_W(W)$ for the gradient of $E(W)$ and $E_{WW}(W)$ for its Hessian matrix. Given arbitrary initial weight vectors $W^0$ and $W^1$, the gradient method with momentum updates the weight vector $W$ by

$$\Delta W^m = W^{m+1} - W^m = -\eta\, E_W(W^m) + \tau_m\, \Delta W^{m-1}, \quad m = 1, 2, \ldots, \tag{1}$$

where $\eta > 0$ is the learning rate and $\tau_m \geq 0$ is the momentum coefficient. Similar to [12] [14], in this paper we choose $\tau_m$ as follows:

$$\tau_m = \mu\, \| E_W(W^m) \|, \tag{2}$$

where $\mu$ is a positive number. Notice that (1) can also be written componentwise for each individual weight $w_{nijk}$.
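The iteration (1)-(2) can be sketched as follows, assuming only a generic routine `grad_E` that returns $E_W(W)$; the function and parameter names are illustrative.

```python
import numpy as np

def train_momentum(grad_E, W0, W1, eta=1e-5, mu=5e-5, n_iters=1000):
    """Gradient method with adaptive momentum, Eqs. (1)-(2):
    W^{m+1} = W^m - eta * E_W(W^m) + tau_m * (W^m - W^{m-1}),
    tau_m   = mu * ||E_W(W^m)||."""
    W_prev, W = np.asarray(W0, float), np.asarray(W1, float)
    for _ in range(n_iters):
        g = grad_E(W)
        tau = mu * np.linalg.norm(g)          # adaptive momentum coefficient (2)
        W_next = W - eta * g + tau * (W - W_prev)
        W_prev, W = W, W_next
    return W
```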

Convergence Results
Similar to [12] [14], we need the following assumptions.

(A1): The error function $E(W)$ is twice continuously differentiable, and its gradient $E_W$ and Hessian $E_{WW}$ are uniformly bounded.

(A2): The number of the elements of the set $\Omega_0 = \{ W : E_W(W) = 0 \}$ is finite.

From (A1), it is easy to see that there exists a constant $M > 0$ such that

$$\sup_{W} \| E_{WW}(W) \| \leq M.$$

Theorem 3.1 (cf. [12] [14]) Let $E(W)$ be continuously differentiable, let the number of the elements of the set $\Omega_0$ be finite, and let the sequence $\{W^m\}$ satisfy $\lim_{m\to\infty} \|W^{m+1} - W^m\| = 0$ and $\lim_{m\to\infty} \|E_W(W^m)\| = 0$. Then there exists a $W^* \in \Omega_0$ such that $\lim_{m\to\infty} W^m = W^*$.

Theorem 3.2 If Assumption (A1) is satisfied, then there exist $\eta_0 > 0$ and $\mu_0 > 0$ such that, for $0 < \eta \leq \eta_0$ and $0 < \mu \leq \mu_0$, the following weak convergence result holds for the iteration (1):

$$E(W^{m+1}) \leq E(W^m), \qquad \lim_{m \to \infty} \| E_W(W^m) \| = 0.$$

Furthermore, if Assumption (A2) is also valid, then the strong convergence result holds, that is, there exists a $W^*$ such that $\lim_{m \to \infty} W^m = W^*$.

Proof.
Using Taylor's formula, we expand $E(W^{m+1})$ at $W^m$:

$$E(W^{m+1}) = E(W^m) + E_W(W^m)^{\mathrm{T}} \Delta W^m + \frac{1}{2} (\Delta W^m)^{\mathrm{T}} E_{WW}(\xi_m)\, \Delta W^m, \tag{3}$$

where $\xi_m$ lies between $W^m$ and $W^{m+1}$. Substituting the update formula (1) into the first-order term and using the bound $\| E_{WW}(\xi_m) \| \leq M$ from (A1) together with $\tau_m = \mu \| E_W(W^m) \|$, it is easy to see that there is a constant $C = C(\eta, \mu, M)$ such that, together with (3),

$$E(W^{m+1}) \leq E(W^m) - (\eta - C)\, \| E_W(W^m) \|^2. \tag{4}$$

It is easy to see that one can choose $\eta_0 > 0$ and $\mu_0 > 0$ such that $\eta - C > 0$ whenever

$$0 < \eta \leq \eta_0, \qquad 0 < \mu \leq \mu_0. \tag{5}$$

If $\eta$ and $\mu$ satisfy (5), then by (4) the sequence $\{E(W^m)\}$ is monotonically decreasing; since it is bounded below by zero, it converges. By (4), for any positive integer $N$ it holds that

$$(\eta - C) \sum_{m=1}^{N} \| E_W(W^m) \|^2 \leq E(W^1) - E(W^{N+1}) \leq E(W^1).$$

Letting $N \to \infty$, we have $\sum_{m=1}^{\infty} \| E_W(W^m) \|^2 < \infty$, and hence $\lim_{m \to \infty} \| E_W(W^m) \| = 0$, which is the weak convergence result. Moreover, (1) and (2) give $\lim_{m \to \infty} \| \Delta W^m \| = 0$, so if Assumption (A2) also holds, Theorem 3.1 yields a $W^*$ with $\lim_{m \to \infty} W^m = W^*$, which is the strong convergence result. This completes the proof.
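To see the monotonicity estimate (4) at work, the following sketch runs iteration (1)-(2) on a toy quadratic error with small $\eta$ and $\mu$; the quadratic objective is an assumption for illustration only, not the SPSNN error function.

```python
import numpy as np

# Toy quadratic error E(W) = 0.5 * W^T A W with A positive definite,
# used only to observe the monotone decrease E(W^{m+1}) <= E(W^m).
A = np.diag([1.0, 4.0, 9.0])
E = lambda W: 0.5 * W @ A @ W
grad_E = lambda W: A @ W

eta, mu = 0.05, 0.01
W_prev = np.array([1.0, 1.0, 1.0])
W = np.array([0.9, 0.9, 0.9])
for m in range(20):
    g = grad_E(W)
    tau = mu * np.linalg.norm(g)                # adaptive coefficient (2)
    W_prev, W = W, W - eta * g + tau * (W - W_prev)
    assert E(W) <= E(W_prev) + 1e-12            # monotonicity as in (4)
```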

Numerical Results
In this section, we give an example to illustrate the convergence behavior of the proposed method. In this simulation experiment, the initial weight vector $W^0$ is a 24-dimensional zero vector and $W^1$ is a 24-dimensional vector whose elements are all 1. The learning rate is $\eta = 0.00001$ and the momentum factor is $\mu = 0.00005$. The number of training samples is $T = 16$; the data samples are listed in Table 1. In Table 2, we compare the convergence behavior of the gradient method with momentum and the gradient method without momentum. It can be seen that the network training is improved significantly after the momentum term is added.
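The setup of this experiment can be sketched as follows, reusing the illustrative `train_momentum` routine from Section 2; the SPSNN gradient routine `spsnn_grad_E` is assumed and not reproduced here.

```python
import numpy as np

# Settings from the experiment: 24-dimensional weights,
# eta = 1e-5, mu = 5e-5, T = 16 training samples.
W0 = np.zeros(24)      # initial weight vector W^0
W1 = np.ones(24)       # initial weight vector W^1

# With momentum (mu > 0) versus plain gradient descent (mu = 0):
# W_mom   = train_momentum(spsnn_grad_E, W0, W1, eta=1e-5, mu=5e-5)
# W_plain = train_momentum(spsnn_grad_E, W0, W1, eta=1e-5, mu=0.0)
```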

Conclusion
In this paper, we study the gradient method with momentum for training sigma-pi-sigma neural networks. We choose the momentum coefficient in an adaptive manner, and the corresponding weak convergence and strong convergence results are proved. Assumptions (A1) and (A2) in this paper seem somewhat severe, so weakening one or both of them will be our future work.


Table 1. The data samples.