A Symmetric Bregman Alternating Direction Method of Multipliers for Separable Nonconvex Minimization Problem

Abstract

The alternating direction method of multipliers (ADMM) and its symmetric version are efficient for minimizing two-block separable problems with linear constraints. However, both ADMM and symmetric ADMM have limited versatility across various fields because they require the gradients of the differentiable functions to be globally Lipschitz continuous, a condition that is typically hard to satisfy in nonconvex optimization. Recently, a novel Bregman ADMM was proposed for two-block nonconvex optimization problems with linear constraints; it not only removes the need for global Lipschitz continuity of the gradient but also reduces to the classical ADMM as a special case. Building on this, we propose a symmetric Bregman alternating direction method of multipliers, which reduces to the symmetric ADMM and the Bregman ADMM, and hence further to the classical ADMM. Moreover, when solving separable nonconvex optimization problems, it does not require global Lipschitz continuity of the gradients of the differentiable functions. Furthermore, we show that, under the Kurdyka-Lojasiewicz inequality and certain conditions, the iterative sequence generated by our algorithm converges to a critical point of the problem. In addition, we examine the convergence rate of the algorithm.

Share and Cite:

Zeng, Z. and Zhao, S. (2025) A Symmetric Bregman Alternating Direction Method of Multipliers for Separable Nonconvex Minimization Problem. Journal of Applied Mathematics and Physics, 13, 889-913. doi: 10.4236/jamp.2025.133047.

1. Introduction

In this paper, we consider the following two-block separable optimization problem with a linear constraint:

$$\min_{x,y}\ f(x)+g(y),\quad \text{s.t.}\ Ax+y=b, \tag{1.1}$$

where $f:\mathbb{R}^{n}\to\mathbb{R}\cup\{+\infty\}$ is a proper lower semicontinuous function, $g:\mathbb{R}^{m}\to\mathbb{R}$ is a continuously differentiable function, $A\in\mathbb{R}^{m\times n}$, and $b\in\mathbb{R}^{m}$. Numerous valuable optimization problems can be expressed in the form (1.1), rendering it widely applicable across diverse fields, such as machine learning [1]-[3], image processing [4]-[6], and signal processing [7]-[10].

When both $f$ and $g$ are convex functions, a prominent approach for addressing problem (1.1) is the alternating direction method of multipliers (ADMM), proposed by Gabay, Mercier, Glowinski and Marroco [11] [12] in the 1970s. This method fully exploits the separable structure of the problem and has therefore attracted considerable attention across diverse domains in recent years. The iterative scheme of the ADMM is as follows:

$$\begin{cases} x^{k+1}\in\arg\min_{x}\ \mathcal{L}_{\beta}(x,y^{k},\lambda^{k}),\\ y^{k+1}\in\arg\min_{y}\ \mathcal{L}_{\beta}(x^{k+1},y,\lambda^{k}),\\ \lambda^{k+1}:=\lambda^{k}-\beta(Ax^{k+1}+y^{k+1}-b). \end{cases} \tag{1.2}$$

Here, $\mathcal{L}_{\beta}(\cdot)$ denotes the augmented Lagrangian function for (1.1), given by

$$\mathcal{L}_{\beta}(x,y,\lambda)=f(x)+g(y)-\langle\lambda,Ax+y-b\rangle+\frac{\beta}{2}\|Ax+y-b\|^{2},$$

where $\lambda$ is the Lagrangian multiplier associated with the linear constraint and $\beta>0$ is the penalty parameter. The study of ADMM has a long academic lineage, and its convergence and convergence rate are well understood for convex objectives [13]-[16].
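As a concrete (and purely illustrative) instance of the scheme (1.2), the following sketch applies the classical ADMM to a toy problem of the form (1.1) with $A=I$, $f(x)=\mu\|x\|_{1}$, and $g(y)=\frac12\|y-c\|^{2}$, for which both subproblems admit closed-form solutions; the data $c$, $b$ and the parameters $\mu$, $\beta$ are hypothetical and are not taken from the paper.

```python
import numpy as np

def soft_threshold(v, t):
    # proximal operator of t*||.||_1
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def classical_admm(c, b, mu=0.1, beta=1.0, iters=200):
    """Classical ADMM (1.2) for: min mu*||x||_1 + 0.5*||y - c||^2  s.t.  x + y = b  (A = I)."""
    x = np.zeros_like(b)
    y = np.zeros_like(b)
    lam = np.zeros_like(b)
    for _ in range(iters):
        # x-step: argmin_x mu*||x||_1 + (beta/2)*||x + y - b - lam/beta||^2
        x = soft_threshold(b - y + lam / beta, mu / beta)
        # y-step: argmin_y 0.5*||y - c||^2 + (beta/2)*||x + y - b - lam/beta||^2
        y = (c + lam + beta * (b - x)) / (1.0 + beta)
        # multiplier step: lam <- lam - beta*(A x + y - b)
        lam = lam - beta * (x + y - b)
    return x, y, lam

rng = np.random.default_rng(0)
c, b = rng.standard_normal(5), rng.standard_normal(5)
x, y, lam = classical_admm(c, b)
print("constraint residual ||x + y - b||:", np.linalg.norm(x + y - b))
```

Here the $x$-step is the soft-thresholding (proximal) map of the $\ell_1$-norm and the $y$-step solves a strongly convex quadratic, which is why both updates are explicit in this toy setting.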

However, when at least one component of the objective function is nonconvex, many studies focus primarily on proving the convergence of the ADMM or its variants and analyzing problem scenarios under additional conditions, such as Li and Pong [17] and Hong et al. [18]. In particular, in 2017, Guo et al. [19] demonstrated that, under conditions less stringent than those outlined in [17] [18], the convergence and convergence rate of ADMM for nonconvex problems can be established, provided that the augmented Lagrangian function satisfies the Kurdyka-Lojasiewicz inequality. Motivated by these works, Wu et al. [20] applied a symmetric ADMM, which reverts to the classical ADMM as a special case, to two-block linearly constrained separable nonconvex optimization problems and analyzed its convergence and convergence rate for the case $B=I$ ($I$ is the identity matrix of appropriate dimension). Note that the symmetric variant can numerically accelerate ADMM for some values of $\alpha>0$. Its iterative format is as follows:

$$\begin{cases} x^{k+1}\in\arg\min_{x}\ \mathcal{L}_{\beta}(x,y^{k},\lambda^{k}),\\ \lambda^{k+\frac12}:=\lambda^{k}-\alpha\beta(Ax^{k+1}+By^{k}-b),\\ y^{k+1}\in\arg\min_{y}\ \mathcal{L}_{\beta}(x^{k+1},y,\lambda^{k+\frac12}),\\ \lambda^{k+1}:=\lambda^{k+\frac12}-\beta(Ax^{k+1}+By^{k+1}-b). \end{cases} \tag{1.3}$$

The only difference from the classical ADMM lies in the addition of a relaxation factor $\alpha\in(-1,1)$ in the update of the multiplier between the iterations of $x$ and $y$. In particular, the algorithm returns to the classical ADMM when $\alpha=0$, and therefore it shares the same limitations in certain settings [21]-[23]. Whether the ADMM or the symmetric ADMM is used to tackle two-block separable nonconvex problems with linear constraints, both methods are constrained by the assumption of Lipschitz continuity of the gradients of the differentiable functions.

To relax the Lipschitz continuity requirement on the gradient of the objective function, Tan and Guo [24] introduced a novel version of the Bregman ADMM, which is distinguished from that proposed by Wang et al. [25] by its ability to revert to the classical ADMM. Its iteration is as follows:

$$\begin{cases} x^{k+1}\in\arg\min_{x}\ \mathcal{L}_{\beta}^{h}(x,y^{k},\lambda^{k}),\\ y^{k+1}\in\arg\min_{y}\ \mathcal{L}_{\beta}^{h}(x^{k+1},y,\lambda^{k}),\\ \lambda^{k+1}:=\lambda^{k}-\beta\big(\nabla h(Ax^{k+1}-b)-\nabla h(y^{k+1})\big), \end{cases} \tag{1.4}$$

where $\mathcal{L}_{\beta}^{h}(\cdot)$ denotes the Bregman augmented Lagrangian function for (1.1),

$$\mathcal{L}_{\beta}^{h}(x,y,\lambda)=f(x)+g(y)-\langle\lambda,Ax+y-b\rangle+\beta D_{h}(y,Ax-b), \tag{1.5}$$

where $\lambda$ is the Lagrangian multiplier associated with the linear constraint, $\beta>0$ is the penalty parameter, and the Bregman distance $D_{h}(\cdot,\cdot)$ is defined as

$$D_{h}(x,y)=h(x)-h(y)-\nabla h(y)^{T}(x-y).$$

When $h(\cdot)=\frac12\|\cdot\|^{2}$, the Bregman ADMM (1.4) reduces to the classical ADMM (1.2).
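As a quick numerical illustration (not from the paper), the snippet below evaluates $D_{h}$ for a user-supplied kernel and checks that the choice $h(\cdot)=\frac12\|\cdot\|^{2}$ recovers the squared Euclidean distance $\frac12\|x-y\|^{2}$ underlying the reduction above; the quartic kernel is a hypothetical alternative choice.

```python
import numpy as np

def bregman_distance(h, grad_h, x, y):
    """D_h(x, y) = h(x) - h(y) - <grad h(y), x - y>."""
    return h(x) - h(y) - grad_h(y) @ (x - y)

# h(u) = 0.5*||u||^2 recovers the squared Euclidean distance
h_euc = lambda u: 0.5 * u @ u
grad_h_euc = lambda u: u

# a hypothetical non-quadratic kernel: h(u) = 0.25*||u||^4 + 0.5*||u||^2
h_quart = lambda u: 0.25 * (u @ u) ** 2 + 0.5 * u @ u
grad_h_quart = lambda u: (u @ u) * u + u

x, y = np.array([1.0, -2.0]), np.array([0.5, 0.0])
assert np.isclose(bregman_distance(h_euc, grad_h_euc, x, y), 0.5 * np.sum((x - y) ** 2))
print("quartic-kernel Bregman distance:", bregman_distance(h_quart, grad_h_quart, x, y))
```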

Drawing on the aforementioned ideas, our aim is to remove the requirement of Lipschitz continuity of the gradients of the differentiable functions in the symmetric ADMM when addressing two-block separable nonconvex problems with linear constraints. In this paper, we propose a symmetric version of the Bregman ADMM, whose iterative scheme is

$$\begin{cases} x^{k+1}\in\arg\min_{x}\ \mathcal{L}_{\beta}^{h}(x,y^{k},\lambda^{k}), & (1.6a)\\ \lambda^{k+\frac12}:=\lambda^{k}-\alpha\beta\big(\nabla h(Ax^{k+1}-b)-\nabla h(y^{k})\big), & (1.6b)\\ y^{k+1}\in\arg\min_{y}\ \mathcal{L}_{\beta}^{h}(x^{k+1},y,\lambda^{k+\frac12}), & (1.6c)\\ \lambda^{k+1}:=\lambda^{k+\frac12}-\beta\big(\nabla h(Ax^{k+1}-b)-\nabla h(y^{k+1})\big). & (1.6d) \end{cases}$$

The symmetric Bregman ADMM (1.6) reduces to the symmetric ADMM (1.3) by setting $h(\cdot)=\frac12\|\cdot\|^{2}$, to the Bregman ADMM (1.4) by setting $\alpha=0$, and to the classical ADMM (1.2) by setting both $h(\cdot)=\frac12\|\cdot\|^{2}$ and $\alpha=0$. In essence, the ADMM frameworks investigated by Wu et al. [20] and Tan and Guo [24] are particular instances of the approach we introduce. Under the same assumptions as those in Tan and Guo [24], we prove the convergence of the symmetric Bregman ADMM (1.6), provided that the associated function satisfies the Kurdyka-Lojasiewicz inequality. Moreover, we demonstrate that the iterative sequence produced by the symmetric Bregman ADMM (1.6) converges to a critical point of problem (1.1), and we also analyze the convergence rate of the algorithm.
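To make the scheme (1.6) concrete, here is a minimal sketch of a run on a small synthetic instance. Everything in it is hypothetical: $f$, $g$, the kernel $h$, and the data are stand-ins chosen for readability (a smooth nonconvex $f$ is used so that a generic solver applies), and the subproblems (1.6a) and (1.6c) are solved numerically with scipy rather than exactly. It illustrates the update order only, not the paper's assumptions or experiments.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical problem data and kernel (illustration only)
rng = np.random.default_rng(1)
m, n = 4, 3
A, b, c = rng.standard_normal((m, n)), rng.standard_normal(m), rng.standard_normal(m)
alpha, beta = 0.1, 5.0

f = lambda x: np.sum(x**2 / (1.0 + x**2))           # smooth nonconvex stand-in for f
g = lambda y: 0.5 * np.sum((y - c) ** 2)            # smooth g
h = lambda u: 0.5 * u @ u + 0.25 * (u @ u) ** 2     # a strongly convex kernel
grad_h = lambda u: u + (u @ u) * u

def D_h(u, v):
    # Bregman distance generated by h
    return h(u) - h(v) - grad_h(v) @ (u - v)

def L_beta(x, y, lam):
    # Bregman augmented Lagrangian (1.5)
    return f(x) + g(y) - lam @ (A @ x + y - b) + beta * D_h(y, A @ x - b)

x, y, lam = np.zeros(n), np.zeros(m), np.zeros(m)
for _ in range(50):
    x = minimize(lambda v: L_beta(v, y, lam), x).x                    # (1.6a)
    lam_half = lam - alpha * beta * (grad_h(A @ x - b) - grad_h(y))   # (1.6b)
    y = minimize(lambda v: L_beta(x, v, lam_half), y).x               # (1.6c)
    lam = lam_half - beta * (grad_h(A @ x - b) - grad_h(y))           # (1.6d)

# residual driven by the multiplier update; it vanishes at a fixed point of (1.6)
print(np.linalg.norm(grad_h(A @ x - b) - grad_h(y)))
```

Setting `alpha = 0` in this sketch recovers the Bregman ADMM (1.4), while taking `h = lambda u: 0.5 * u @ u` (so that `grad_h` is the identity map) recovers the symmetric ADMM (1.3).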

The remainder of this paper is organized as follows. In Section 2, we provide some necessary preliminaries for our subsequent analysis. In Section 3, we present the convergence analysis of the symmetric Bregman ADMM (1.6) and analyze its convergence rate. Finally, in Section 4, we summarize our findings and draw conclusions.

2. Preliminaries

In this section, we recall some definitions and basic results that will be used for further analysis.

Definition 2.1. [26] For an extended real-valued function $f:\mathbb{R}^{n}\to\mathbb{R}\cup\{+\infty\}$, the effective domain, or simply the domain, is the set

$$\operatorname{dom}(f)=\{x\in\mathbb{R}^{n}:f(x)<+\infty\}.$$

Definition 2.2. [26] A function $f:\mathbb{R}^{n}\to\mathbb{R}\cup\{+\infty\}$ is called proper if there exists at least one $x\in\mathbb{R}^{n}$ such that $f(x)<+\infty$.

Definition 2.3. [26] A function $f:\mathbb{R}^{n}\to\mathbb{R}\cup\{+\infty\}$ is called lower semicontinuous at $x\in\mathbb{R}^{n}$ if

$$f(x)\le\liminf_{k\to\infty}f(x^{k})$$

for any sequence $\{x^{k}\}\subseteq\mathbb{R}^{n}$ with $x^{k}\to x$ as $k\to\infty$. Moreover, $f(\cdot)$ is called lower semicontinuous if it is lower semicontinuous at each point of $\mathbb{R}^{n}$.

Definition 2.4. ([27], kernel generating distance) Let $C$ be a nonempty, convex, and open subset of $\mathbb{R}^{m}$. Associated with $C$, a function $h:\mathbb{R}^{m}\to\mathbb{R}\cup\{+\infty\}$ is called a kernel generating distance if it satisfies the following:

(i) $h(\cdot)$ is proper, lower semicontinuous, and convex, with $\operatorname{dom}(h)\subseteq\overline{C}$ and $\operatorname{dom}(\partial h)=C$.

(ii) $h(\cdot)$ is $C^{1}$ on $\operatorname{int}(\operatorname{dom}(h))=C$.

We denote the class of kernel generating distances by $\mathcal{G}(C)$.

Definition 2.5 ([27], L-smooth adaptable) Let $h\in\mathcal{G}(C)$ and let $g:\mathbb{R}^{m}\to\mathbb{R}$ be continuously differentiable on $C=\operatorname{int}(\operatorname{dom}(h))$. The pair $(g,h)$ is called L-smooth adaptable on $C$ if there exists $L>0$ such that $Lh-g$ and $Lh+g$ are convex on $C$.

Remark 2.1. Definition 2.5 serves as a natural extension and complement of the Lipschitz-like/convexity condition presented in reference [21]. This extension enables the derivation of the following two-sided descent lemma.

Lemma 2.1. ([27], extended descent lemma) The pair of functions $(g,h)$ is L-smooth adaptable on $C$ if and only if

$$|g(x)-g(y)-\langle\nabla g(y),x-y\rangle|\le L\,D_{h}(x,y)\quad\forall x,y\in\operatorname{int}(\operatorname{dom}(h)). \tag{2.1}$$

Remark 2.2. In particular, when $C=\mathbb{R}^{m}$ and $h(\cdot)=\frac12\|\cdot\|^{2}$, (2.1) reduces to the classical descent lemma for the function $g$, i.e.,

$$|g(x)-g(y)-\langle\nabla g(y),x-y\rangle|\le\frac{L}{2}\|x-y\|^{2}\quad\forall x,y\in\mathbb{R}^{m}.$$

Definition 2.6. [27] Let $g:\mathbb{R}^{m}\to\mathbb{R}\cup\{+\infty\}$ be a proper and lower semicontinuous function. The gradient of $g(\cdot)$ is D-Lipschitz if there exists $L>0$ satisfying

$$\|\nabla g(x)-\nabla g(y)\|\le L\,\frac{D_{h}(x,y)+D_{h}(y,x)}{\|x-y\|},\quad\forall x\ne y\in\operatorname{int}(\operatorname{dom}(h)).$$

Remark 2.3. According to the Cauchy-Schwarz inequality, we have

$$|\langle\nabla g(x)-\nabla g(y),x-y\rangle|\le\|\nabla g(x)-\nabla g(y)\|\,\|x-y\|,$$

which, combined with Definition 2.6, gives

$$|\langle\nabla g(x)-\nabla g(y),x-y\rangle|\le L\big(D_{h}(x,y)+D_{h}(y,x)\big).$$

Using the conclusion in Lemma 2.4, the above inequality is equivalent to

$$\langle\nabla g(x)-\nabla g(y),x-y\rangle\le L\big(D_{h}(x,y)+D_{h}(y,x)\big)=L\langle\nabla h(x)-\nabla h(y),x-y\rangle,$$

$$\langle\nabla g(y)-\nabla g(x),x-y\rangle\le L\big(D_{h}(x,y)+D_{h}(y,x)\big)=L\langle\nabla h(x)-\nabla h(y),x-y\rangle.$$

Then,

$$\langle\big(L\nabla h(x)-\nabla g(x)\big)-\big(L\nabla h(y)-\nabla g(y)\big),x-y\rangle\ge 0,$$

$$\langle\big(L\nabla h(x)+\nabla g(x)\big)-\big(L\nabla h(y)+\nabla g(y)\big),x-y\rangle\ge 0. \tag{2.2}$$

Based on inequality (2.2), the gradients of $Lh-g$ and $Lh+g$ are monotone on the set $C$, and hence these two functions are convex. Therefore, requiring the gradient of $g(\cdot)$ to be D-Lipschitz is sufficient for the pair $(g,h)$ to be L-smooth adaptable. Considering the intricacy of the ADMM iterative procedure in this study, we find it necessary to assume the D-Lipschitz continuity of the gradient of $g(\cdot)$.

Remark 2.4. Indeed, the D-Lipschitz continuity is essentially a Lipschitz-like gradient property with respect to the Bregman distance for the function $g(\cdot)$, and it becomes equivalent to gradient Lipschitz continuity of $g(\cdot)$ when $h(\cdot)=\frac12\|\cdot\|^{2}$.
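A small numerical check of this equivalence (an illustration, not from the paper): for $h(\cdot)=\frac12\|\cdot\|^{2}$ we have $D_{h}(x,y)+D_{h}(y,x)=\|x-y\|^{2}$, so the D-Lipschitz bound of Definition 2.6 collapses to the ordinary Lipschitz bound $L\|x-y\|$.

```python
import numpy as np

# With h(u) = 0.5*||u||^2, D_h(x, y) + D_h(y, x) = ||x - y||^2, so the D-Lipschitz
# bound L*(D_h(x, y) + D_h(y, x))/||x - y|| equals the ordinary bound L*||x - y||.
rng = np.random.default_rng(0)
x, y, L = rng.standard_normal(4), rng.standard_normal(4), 3.0
D = lambda u, v: 0.5 * u @ u - 0.5 * v @ v - v @ (u - v)  # Bregman distance for h = 0.5*||.||^2
lhs = L * (D(x, y) + D(y, x)) / np.linalg.norm(x - y)
print(np.isclose(lhs, L * np.linalg.norm(x - y)))  # True
```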

Definition 2.7. [27] Let $f:\mathbb{R}^{n}\to\mathbb{R}\cup\{+\infty\}$ be a proper lower semicontinuous function.

(i) The Fréchet subdifferential, or regular subdifferential, of $f(\cdot)$ at $x\in\operatorname{dom}(f)$, written $\widehat{\partial}f(x)$, is the set of vectors $x^{*}\in\mathbb{R}^{n}$ that satisfy

$$\liminf_{y\to x,\ y\ne x}\ \frac{f(y)-f(x)-\langle x^{*},y-x\rangle}{\|y-x\|}\ \ge 0.$$

When $x\notin\operatorname{dom}(f)$, we set $\widehat{\partial}f(x)=\emptyset$.

(ii) The limiting subdifferential, or simply the subdifferential, of $f(\cdot)$ at $x\in\operatorname{dom}(f)$, written $\partial f(x)$, is defined as follows:

$$\partial f(x)=\big\{x^{*}\in\mathbb{R}^{n}:\ \exists\,x^{n}\to x,\ f(x^{n})\to f(x),\ x^{n*}\in\widehat{\partial}f(x^{n})\ \text{with}\ x^{n*}\to x^{*}\big\}.$$

Remark 2.5. From the above definition, we note the following:

(i) $\widehat{\partial}f(x)\subseteq\partial f(x)$ for each $x\in\mathbb{R}^{n}$, where the first set is closed and convex while the second one is only closed.

(ii) Let $(x^{k},x^{k*})\in\operatorname{Graph}\,\partial f$ be a sequence that converges to $(x,x^{*})$. By the definition of $\partial f$, if $f(x^{k})$ converges to $f(x)$ as $k\to+\infty$, then $(x,x^{*})\in\operatorname{Graph}\,\partial f$, where $\operatorname{Graph}\,\partial f=\{(x,y):y\in\partial f(x)\}$.

(iii) A necessary condition for $x\in\mathbb{R}^{n}$ to be a minimizer of $f(\cdot)$ is

$$0\in\partial f(x). \tag{2.3}$$

(iv) If $f:\mathbb{R}^{n}\to\mathbb{R}\cup\{+\infty\}$ is a proper lower semicontinuous function and $g:\mathbb{R}^{n}\to\mathbb{R}$ is a continuously differentiable function, then $\partial(f+g)(x)=\partial f(x)+\nabla g(x)$ for any $x\in\operatorname{dom}(f)$.

A point that satisfies condition (2.3) is referred to as a critical point or a stationary point. The set of critical points of $f$ is denoted by $\operatorname{crit}f$.

Next, we recall an important property of subdifferential calculus.

Lemma 2.2. [28] Suppose that $F(x,y)=f(x)+g(y)$, where $f:\mathbb{R}^{n}\to\mathbb{R}\cup\{+\infty\}$ and $g:\mathbb{R}^{m}\to\mathbb{R}\cup\{+\infty\}$ are proper lower semicontinuous functions. Then for all $(x,y)\in\operatorname{dom}(F)=\operatorname{dom}(f)\times\operatorname{dom}(g)$, we have

$$\partial F(x,y)=\partial_{x}F(x,y)\times\partial_{y}F(x,y).$$

Definition 2.8. ([28], Kurdyka-Lojasiewicz inequality) Let $f:\mathbb{R}^{n}\to\mathbb{R}\cup\{+\infty\}$ be a proper lower semicontinuous function. For $-\infty<\eta_{1}<\eta_{2}\le+\infty$, set

$$[\eta_{1}<f<\eta_{2}]=\{x\in\mathbb{R}^{n}:\eta_{1}<f(x)<\eta_{2}\}.$$

We say that the function $f(\cdot)$ has the KL property at $x^{*}\in\operatorname{dom}(\partial f)$ if there exist $\eta\in(0,+\infty]$, a neighbourhood $U$ of $x^{*}$, and a continuous concave function $\varphi:[0,\eta)\to\mathbb{R}_{+}$, such that

(i) φ( 0 )=0 ;

(ii) φ is C 1 on ( 0,η ) and continuous at 0;

(iii) $\varphi'(x)>0$ for all $x\in(0,\eta)$;

(iv) for all $x\in U\cap[f(x^{*})<f<f(x^{*})+\eta]$, the Kurdyka-Lojasiewicz inequality holds:

$$\varphi'\big(f(x)-f(x^{*})\big)\,d\big(0,\partial f(x)\big)\ge 1,$$

where $d(x,\partial f(x))=\inf_{y\in\partial f(x)}\|y-x\|$ denotes the distance from $x$ to $\partial f(x)$.

Remark 2.6. We denote by $\Phi_{\eta}$ the set of all continuous functions $\varphi(\cdot)$ that satisfy (i)-(iii).

Definition 2.9. ([29], Kurdyka-Lojasiewicz function) If function f( ) satisfies the KL property at each point of dom( f ) , then f( ) is called a KL function.

Lemma 2.3. ([30], Uniformized KL property) Let $\Omega$ be a compact set and let $f:\mathbb{R}^{n}\to\mathbb{R}\cup\{+\infty\}$ be a proper and lower semicontinuous function. Assume that $f(\cdot)$ is constant on $\Omega$ and satisfies the KL property at each point of $\Omega$. Then there exist $\varepsilon>0$, $\eta>0$, and $\varphi\in\Phi_{\eta}$ such that, for all $\bar{x}\in\Omega$ and all $x$ in the intersection

$$\{x\in\mathbb{R}^{n}:d(x,\Omega)<\varepsilon\}\cap[f(\bar{x})<f<f(\bar{x})+\eta],$$

one has

$$\varphi'\big(f(x)-f(\bar{x})\big)\,d\big(0,\partial f(x)\big)\ge 1.$$

Lemma 2.4. [31] Let $h:\mathbb{R}^{m}\to\mathbb{R}\cup\{+\infty\}$. For any $x,y\in\operatorname{int}(\operatorname{dom}(h))$ and $z\in\operatorname{dom}(h)$, the following hold:

(i) $D_{h}(x,y)+D_{h}(y,x)=\langle\nabla h(x)-\nabla h(y),x-y\rangle$.

(ii) The three-points identity:

$$D_{h}(z,x)-D_{h}(z,y)-D_{h}(y,x)=\langle\nabla h(x)-\nabla h(y),y-z\rangle. \tag{2.4}$$

Definition 2.10. [32] Let $h\in\mathcal{G}(C)$. The Bregman distance $D_{h}:\operatorname{dom}(h)\times\operatorname{int}(\operatorname{dom}(h))\to\mathbb{R}_{+}$ is defined by

$$D_{h}(x,y):=h(x)-h(y)-\langle\nabla h(y),x-y\rangle.$$

Since $h(\cdot)$ is convex, $D_{h}(x,y)\ge 0$, and $D_{h}(x,y)=0$ if and only if $x=y$.

Definition 2.11. We say that $(x^{*},y^{*},\lambda^{*})$ is a critical point of the augmented Lagrangian function with Bregman distance $\mathcal{L}_{\beta}^{h}(\cdot)$ in (1.5) if it satisfies

$$\begin{cases} A^{T}\lambda^{*}\in\partial f(x^{*}),\\ \lambda^{*}=\nabla g(y^{*}),\\ Ax^{*}+y^{*}=b. \end{cases}$$
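As a small practical companion to Definition 2.11 (an illustrative sketch, not part of the paper), one can monitor how far an iterate is from satisfying the two smooth conditions; the inclusion involving the subdifferential of $f$ is problem-dependent and is not checked here. The function names and data below are hypothetical.

```python
import numpy as np

def criticality_residuals(x, y, lam, A, b, grad_g):
    """Residuals of the smooth conditions in Definition 2.11: lam = grad g(y) and A x + y = b.
    (The inclusion A^T lam in the subdifferential of f at x is not checked here.)"""
    dual_res = np.linalg.norm(lam - grad_g(y))
    primal_res = np.linalg.norm(A @ x + y - b)
    return dual_res, primal_res

# Example with g(y) = 0.5*||y - c||^2, so grad_g(y) = y - c (hypothetical data):
rng = np.random.default_rng(0)
A, b, c = rng.standard_normal((3, 2)), rng.standard_normal(3), rng.standard_normal(3)
x, y, lam = rng.standard_normal(2), rng.standard_normal(3), rng.standard_normal(3)
print(criticality_residuals(x, y, lam, A, b, grad_g=lambda y: y - c))
```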

Lemma 2.5. Let $\{w^{k}:=(x^{k},y^{k},\lambda^{k})\}$ be the sequence generated by the symmetric Bregman ADMM (1.6). Then, we have

$$\begin{cases} \nabla h(Ax^{k+1}-b)-\nabla h(y^{k})=\dfrac{1}{(\alpha+1)\beta}(\lambda^{k}-\lambda^{k+1})+\dfrac{1}{\alpha+1}\big(\nabla h(y^{k+1})-\nabla h(y^{k})\big),\\[6pt] \nabla g(y^{k+1})=\lambda^{k+1},\\[6pt] \nabla h(Ax^{k+1}-b)-\nabla h(y^{k+1})=\dfrac{1}{(\alpha+1)\beta}(\lambda^{k}-\lambda^{k+1})-\dfrac{\alpha}{\alpha+1}\big(\nabla h(y^{k+1})-\nabla h(y^{k})\big). \end{cases}$$

Proof. Combining (1.6b) and (1.6d), we get

$$\begin{aligned} \lambda^{k+1}-\lambda^{k} &=-\alpha\beta\big(\nabla h(Ax^{k+1}-b)-\nabla h(y^{k})\big)-\beta\big(\nabla h(Ax^{k+1}-b)-\nabla h(y^{k+1})\big)\\ &=-\alpha\beta\big(\nabla h(Ax^{k+1}-b)-\nabla h(y^{k})\big)-\beta\big(\nabla h(Ax^{k+1}-b)-\nabla h(y^{k})\big)\\ &\quad+\beta\big(\nabla h(Ax^{k+1}-b)-\nabla h(y^{k})\big)-\beta\big(\nabla h(Ax^{k+1}-b)-\nabla h(y^{k+1})\big)\\ &=-(\alpha+1)\beta\big(\nabla h(Ax^{k+1}-b)-\nabla h(y^{k})\big)+\beta\big(\nabla h(y^{k+1})-\nabla h(y^{k})\big), \end{aligned}$$

and thus

$$\nabla h(Ax^{k+1}-b)-\nabla h(y^{k})=\frac{1}{(\alpha+1)\beta}(\lambda^{k}-\lambda^{k+1})+\frac{1}{\alpha+1}\big(\nabla h(y^{k+1})-\nabla h(y^{k})\big).$$

Similarly, we combine (1.6b) and (1.6d) again

$$\begin{aligned} \lambda^{k+1}-\lambda^{k} &=-\alpha\beta\big(\nabla h(Ax^{k+1}-b)-\nabla h(y^{k})\big)-\beta\big(\nabla h(Ax^{k+1}-b)-\nabla h(y^{k+1})\big)\\ &=-\alpha\beta\big(\nabla h(Ax^{k+1}-b)-\nabla h(y^{k+1})\big)-\beta\big(\nabla h(Ax^{k+1}-b)-\nabla h(y^{k+1})\big)\\ &\quad-\alpha\beta\big(\nabla h(y^{k+1})-\nabla h(y^{k})\big)\\ &=-(\alpha+1)\beta\big(\nabla h(Ax^{k+1}-b)-\nabla h(y^{k+1})\big)-\alpha\beta\big(\nabla h(y^{k+1})-\nabla h(y^{k})\big), \end{aligned}$$

thus, we obtain

$$\nabla h(Ax^{k+1}-b)-\nabla h(y^{k+1})=\frac{1}{(\alpha+1)\beta}(\lambda^{k}-\lambda^{k+1})-\frac{\alpha}{\alpha+1}\big(\nabla h(y^{k+1})-\nabla h(y^{k})\big).$$

From the optimality condition of (1.6c), we have

$$0=\nabla g(y^{k+1})-\lambda^{k+\frac12}+\beta\big(\nabla h(Ax^{k+1}-b)-\nabla h(y^{k+1})\big).$$

Substituting (1.6d) into the above equation yields

$$\nabla g(y^{k+1})=\lambda^{k+1}.$$

This completes the proof.
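The algebra behind Lemma 2.5 is also easy to sanity-check numerically: starting from arbitrary vectors, applying the updates (1.6b) and (1.6d), and evaluating the first displayed identity should give agreement to machine precision. The kernel gradient and the data in this short check are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
m, alpha, beta = 5, 0.3, 2.0
grad_h = lambda u: u + (u @ u) * u  # gradient of a hypothetical kernel h

Ax_b = rng.standard_normal(m)       # stands in for A x^{k+1} - b
y_k, y_k1 = rng.standard_normal(m), rng.standard_normal(m)
lam_k = rng.standard_normal(m)

# updates (1.6b) and (1.6d)
lam_half = lam_k - alpha * beta * (grad_h(Ax_b) - grad_h(y_k))
lam_k1 = lam_half - beta * (grad_h(Ax_b) - grad_h(y_k1))

# first identity of Lemma 2.5
lhs = grad_h(Ax_b) - grad_h(y_k)
rhs = (lam_k - lam_k1) / ((alpha + 1) * beta) + (grad_h(y_k1) - grad_h(y_k)) / (alpha + 1)
print(np.allclose(lhs, rhs))  # True
```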

3. Convergence Analysis

In this section, we analyze the convergence of the symmetric Bregman ADMM (1.6) and show that the sequence $\{w^{k}:=(x^{k},y^{k},\lambda^{k})\}$ generated by the symmetric Bregman ADMM (1.6) converges to a critical point $w^{*}:=(x^{*},y^{*},\lambda^{*})$ of $\mathcal{L}_{\beta}^{h}(\cdot)$ under the following assumptions.

Assumption A. Let $f:\mathbb{R}^{n}\to\mathbb{R}\cup\{+\infty\}$ be a proper lower semicontinuous function, and let $g:\mathbb{R}^{m}\to\mathbb{R}$ be a continuously differentiable function with $\nabla g$ D-Lipschitz continuous. Additionally, let $h\in\mathcal{G}(C)$ be a twice differentiable function on $C=\operatorname{int}(\operatorname{dom}(h))$ that is 1-strongly convex and whose gradient $\nabla h$ is Lipschitz continuous with constant $L_{h}$ on any bounded subset of $\mathbb{R}^{m}$. We assume that

(i) α( 1 2 L h 2 +1 ,0 ] , β> 2L L h 2 2αL L h 3 1+α+2α L h 2 , which implies

δ=( β L h L+ 2αL L h +2αβ L h +2αL L h 2 α+1 2 L 2 L h 2 ( α+1 )β )>0.

(ii) α[ 0, 1 2 L h 2 1 ) , β> 2( 1+α )L L h 2 +2αL L h 3 1+α2α L h 2 , which implies

δ=( β L h L 2αL L h +2αβ L h +2αL L h 2 α+1 2 L 2 L h 2 ( α+1 )β )>0.

(iii) $A^{T}A\succeq MI$ for some $M>0$.

Next, we examine the optimality conditions of (1.6). Invoking the optimality conditions, we have that

{ 0f( x k+1 ) A T λ k β A T ( 2 h( A x k+1 b ) ) T ( A x k+1 y k +b ), λ k+ 1 2 = λ k αβ( h( A x k+1 b )h( y k ) ), 0=g( y k+1 ) λ k+ 1 2 +β( h( A x k+1 b )h( y k+1 ) ), λ k+1 = λ k+ 1 2 β( h( A x k+1 b )h( y k+1 ) ). (3.1)

Lemma 3.1. Let { w k =( x k , y k , λ k ) } be the sequence generated by the symmetric Bregman ADMM (1.6), which is assumed to be bounded. Then we have

$$\mathcal{L}_{\beta}^{h}(w^{k+1})\le\mathcal{L}_{\beta}^{h}(w^{k})-\delta D_{h}(y^{k},y^{k+1}). \tag{3.2}$$

Proof. From the definition of β h in (1.5), it follows that

β h ( x k+1 , y k , λ k+ 1 2 ) β h ( x k+1 , y k+1 , λ k+ 1 2 ) =g( y k ) λ k+ 1 2 ,A x k+1 + y k b +β D h ( y k ,A x k+1 b ) g( y k+1 )+ λ k+ 1 2 ,A x k+1 + y k+1 b β D h ( y k+1 ,A x k+1 b ) =g( y k )g( y k+1 )+ λ k+ 1 2 , y k+1 y k +β D h ( y k ,A x k+1 b ) β D h ( y k+1 ,A x k+1 b ). (3.3)

Since the gradient of the function g is D-Lipschitz continuous on int( dom( h ) ) , it follows that the function pair ( g,h ) is L-smooth adaptable. Then, according to Lemma 2.1, we can obtain

g( y k )g( y k+1 ) g( y k+1 ), y k y k+1 L D h ( y k , y k+1 ). (3.4)

By substituting inequality (3.4) into relation (3.3), we obtain

β h ( x k+1 , y k , λ k+ 1 2 ) β h ( x k+1 , y k+1 , λ k+ 1 2 ) g( y k+1 ), y k y k+1 L D h ( y k , y k+1 )+ λ k+ 1 2 , y k+1 y k +β D h ( y k ,A x k+1 b )β D h ( y k+1 ,A x k+1 b ) = λ k+1 λ k+ 1 2 , y k y k+1 L D h ( y k , y k+1 )

+β D h ( y k ,A x k+1 b )β D h ( y k+1 ,A x k+1 b ) = λ k+1 λ k+ 1 2 , y k y k+1 L D h ( y k , y k+1 )+β D h ( y k , y k+1 ) +β( D h ( y k ,A x k+1 b ) D h ( y k , y k+1 ) D h ( y k+1 ,A x k+1 b ) ) = λ k+1 λ k+ 1 2 , y k y k+1 L D h ( y k , y k+1 )+β D h ( y k , y k+1 ) +β h( A x k+1 b )h( y k+1 ), y k y k+1 = λ k+1 λ k+ 1 2 , y k y k+1 L D h ( y k , y k+1 )+β D h ( y k , y k+1 ) + λ k+ 1 2 λ k+1 , y k y k+1 =β D h ( y k , y k+1 )L D h ( y k , y k+1 ),

where the first and the third equalities follow from Lemma 2.5 and the three-points identity (2.4), respectively.

Based on the 1-strong convexity and the $L_{h}$-smoothness of the function $h(\cdot)$, we obtain the following two inequalities, respectively:

$$D_{h}(y^{k},y^{k+1})=h(y^{k})-h(y^{k+1})-\langle\nabla h(y^{k+1}),y^{k}-y^{k+1}\rangle\le\frac{L_{h}}{2}\|y^{k}-y^{k+1}\|^{2},$$

$$D_{h}(y^{k},y^{k+1})=h(y^{k})-h(y^{k+1})-\langle\nabla h(y^{k+1}),y^{k}-y^{k+1}\rangle\ge\frac{1}{2}\|y^{k}-y^{k+1}\|^{2}.$$

Combining the above two inequalities, we get

$$D_{h}(y^{k},y^{k+1})\ge\frac{1}{2}\|y^{k+1}-y^{k}\|^{2}\ge\frac{1}{L_{h}}D_{h}(y^{k},y^{k+1}),$$

thus, we can obtain

β h ( x k+1 , y k , λ k+ 1 2 ) β h ( x k+1 , y k+1 , λ k+ 1 2 )( β L h L ) D h ( y k , y k+1 ). (3.5)

Next, by using (1.6b) and (1.5), we have

β h ( x k+1 , y k , λ k ) β h ( x k+1 , y k , λ k+ 1 2 )+ β h ( x k+1 , y k+1 , λ k+ 1 2 ) β h ( x k+1 , y k+1 , λ k+1 ) = λ k+ 1 2 λ k ,A x k+1 + y k b + λ k+1 λ k+ 1 2 ,A x k+1 + y k+1 b = αβ( h( A x k+1 b )h( y k ) ),A x k+1 + y k b + λ k+1 λ k +αβ( h( A x k+1 b )h( y k ) ),A x k+1 + y k+1 b = αβ( h( A x k+1 b )h( y k ) ), y k+1 y k + λ k+1 λ k ,A x k+1 + y k+1 b ,

where the second equality follows from (1.6b). Then, according to Lemma 2.5

h( A x k+1 b )h( y k+1 )= 1 ( α+1 )β ( λ k λ k+1 ) α α+1 ( h( y k+1 )h( y k ) ),

h( A x k+1 b )h( y k )= 1 ( α+1 )β ( λ k λ k+1 )+ 1 α+1 ( h( y k+1 )h( y k ) ),

and, since h( ) is 1-strong-convex, we have

h( y k+1 )h( A x k+1 b ) A x k+1 + y k+1 b .

Now, we discuss the two cases based on the range of α .

(i) α( 1 2 L h 2 +1 ,0 ] , combining the above formulas

β h ( x k+1 , y k , λ k ) β h ( x k+1 , y k , λ k+ 1 2 )+ β h ( x k+1 , y k+1 , λ k+ 1 2 ) β h ( x k+1 , y k+1 , λ k+1 ) αβ ( α+1 )β ( λ k λ k+1 )+ αβ α+1 ( h( y k+1 )h( y k ) ), y k+1 y k λ k+1 λ k A x k+1 + y k+1 b α α+1 ( λ k λ k+1 )+ αβ α+1 ( h( y k+1 )h( y k ) ), y k+1 y k λ k+1 λ k h( y k+1 )h( A x k+1 b ) α α+1 ( λ k λ k+1 )+ αβ α+1 ( h( y k+1 )h( y k ) ), y k+1 y k λ k+1 λ k [ 1 ( α+1 )β λ k+1 λ k α α+1 h( y k+1 )h( y k ) ] α α+1 λ k λ k+1 y k+1 y k + αβ α+1 h( y k+1 )h( y k ) y k+1 y k 1 ( α+1 )β λ k+1 λ k 2 + α α+1 λ k+1 λ k h( y k+1 )h( y k ) . (3.6)

Subsequently, we claim that

$$\|\nabla g(y^{k+1})-\nabla g(y^{k})\|\le LL_{h}\|y^{k+1}-y^{k}\|. \tag{3.7}$$

To prove (3.7), we consider two cases. When y k = y k+1 , (3.7) holds obviously. Next, we assume y k+1 y k . Since g( ) is D -Lipschitz, we obtain

g( y k+1 )g( y k ) L D h ( y k+1 , y k )+ D h ( y k , y k+1 ) y k+1 y k =L h( y k )h( y k+1 ), y k y k+1 y k+1 y k L h( y k )h( y k+1 ) y k y k+1 y k+1 y k L L h y k y k+1 2 y k+1 y k =L L h y k+1 y k ,

where the second inequality is a consequence of $\nabla h$ being Lipschitz continuous with constant $L_{h}$ ($L_{h}\ge 1$) on any bounded subset of $\mathbb{R}^{m}$, that is,

$$\|\nabla h(y^{k+1})-\nabla h(y^{k})\|\le L_{h}\|y^{k+1}-y^{k}\|. \tag{3.8}$$

Since $\lambda^{k+1}=\nabla g(y^{k+1})$, (3.7) becomes

$$\|\lambda^{k+1}-\lambda^{k}\|\le LL_{h}\|y^{k+1}-y^{k}\|. \tag{3.9}$$

Moreover, since $h(\cdot)$ is 1-strongly convex, we get $D_{h}(y^{k},y^{k+1})\ge\frac12\|y^{k}-y^{k+1}\|^{2}$, i.e.,

$$\|y^{k}-y^{k+1}\|^{2}\le 2D_{h}(y^{k},y^{k+1}). \tag{3.10}$$

Substituting (3.8), (3.9) and (3.10) into (3.6), we conclude that

β h ( x k+1 , y k , λ k ) β h ( x k+1 , y k , λ k+ 1 2 )+ β h ( x k+1 , y k+1 , λ k+ 1 2 ) β h ( x k+1 , y k+1 , λ k+1 ) 2αL L h α+1 D h ( y k , y k+1 )+ 2αβ L h α+1 D h ( y k , y k+1 ) 2 L 2 L h 2 ( α+1 )β D h ( y k , y k+1 )+ 2αL L h 2 α+1 D h ( y k , y k+1 ) =( 2αL L h +2αβ L h +2αL L h 2 α+1 2 L 2 L h 2 ( α+1 )β ) D h ( y k , y k+1 ). (3.11)

Then, combining (3.5) and (3.11), we obtain that

β h ( x k+1 , y k , λ k ) β h ( x k+1 , y k+1 , λ k+1 ) ( β L h L+ 2αL L h +2αβ L h +2αL L h 2 α+1 2 L 2 L h 2 ( α+1 )β ) D h ( y k , y k+1 ). (3.12)

Consequently, according to (1.6a), we have

β h ( x k , y k , λ k ) β h ( x k+1 , y k , λ k )0. (3.13)

Finally, summing the inequality (3.12) and (3.13), we conclude that

β h ( w k ) β h ( w k+1 ) ( β L h L+ 2αL L h +2αβ L h +2αL L h 2 α+1 2 L 2 L h 2 ( α+1 )β ) D h ( y k , y k+1 ).

This completes the discussion of case (i); we now proceed to case (ii).

(ii) α[ 0, 1 2 L h 2 1 ) , combining the same formulas

β h ( x k+1 , y k , λ k ) β h ( x k+1 , y k , λ k+ 1 2 )+ β h ( x k+1 , y k+1 , λ k+ 1 2 ) β h ( x k+1 , y k+1 , λ k+1 ) αβ ( α+1 )β ( λ k λ k+1 )+ αβ α+1 ( h( y k+1 )h( y k ) ), y k+1 y k λ k+1 λ k A x k+1 + y k+1 b α α+1 ( λ k λ k+1 )+ αβ α+1 ( h( y k+1 )h( y k ) ), y k+1 y k

λ k+1 λ k h( y k+1 )h( A x k+1 b ) α α+1 ( λ k λ k+1 )+ αβ α+1 ( h( y k+1 )h( y k ) ), y k+1 y k λ k+1 λ k [ 1 ( α+1 )β λ k+1 λ k + α α+1 h( y k+1 )h( y k ) ] α α+1 λ k λ k+1 y k+1 y k αβ α+1 h( y k+1 )h( y k ) y k+1 y k 1 ( α+1 )β λ k+1 λ k 2 α α+1 λ k+1 λ k h( y k+1 )h( y k ) . (3.14)

Similarly, substituting (3.8), (3.9) and (3.10) into (3.14), we conclude that

β h ( x k+1 , y k , λ k ) β h ( x k+1 , y k , λ k+ 1 2 )+ β h ( x k+1 , y k+1 , λ k+ 1 2 ) β h ( x k+1 , y k+1 , λ k+1 ) 2αL L h α+1 D h ( y k , y k+1 ) 2αβ L h α+1 D h ( y k , y k+1 ) 2 L 2 L h 2 ( α+1 )β D h ( y k , y k+1 ) 2αL L h 2 α+1 D h ( y k , y k+1 ) =( 2αL L h +2αβ L h +2αL L h 2 α+1 2 L 2 L h 2 ( α+1 )β ) D h ( y k , y k+1 ). (3.15)

Then, combining (3.5) and (3.15), we obtain that

β h ( x k+1 , y k , λ k ) β h ( x k+1 , y k+1 , λ k+1 ) ( β L h L 2αL L h +2αβ L h +2αL L h 2 α+1 2 L 2 L h 2 ( α+1 )β ) D h ( y k , y k+1 ). (3.16)

Consequently, according to (1.6a), we have

β h ( x k , y k , λ k ) β h ( x k+1 , y k , λ k )0. (3.17)

Finally, summing the inequality (3.16) and (3.17), we conclude that

β h ( w k ) β h ( w k+1 ) ( β L h L 2αL L h +2αβ L h +2αL L h 2 α+1 2 L 2 L h 2 ( α+1 )β ) D h ( y k , y k+1 ).

This completes the discussion for case (ii), and with this, the proof is complete.□

Remark 3.1. Since $\delta>0$, Lemma 3.1 implies that $\{\mathcal{L}_{\beta}^{h}(w^{k})\}$ is monotonically nonincreasing. Note that when $\alpha=0$, the condition reduces to $\beta>2LL_{h}^{2}$, which corresponds to the requirement in Tan and Guo [24]. Furthermore, when $h(\cdot)=\frac12\|\cdot\|^{2}$ (so that $L_{h}=1$) and $\alpha=0$, the condition becomes $\beta>2L$, which corresponds to the requirement in Guo et al. [19].

Lemma 3.2. Let { w k =( x k , y k , λ k ) } be the sequence generated by the symmetric Bregman ADMM (1.6), which is assumed to be bounded. Then we have

$$\sum_{k=0}^{+\infty}\|w^{k+1}-w^{k}\|^{2}<+\infty.$$

Proof: Considering that { w k } is bounded, it has at least one limit point. Let w * be a limit point of { w k } and let { w k j } be the subsequence converging to it, i.e. w k j w * . Given that β h ( ) is a lower semicontinuous function, we can deduce that

β h ( w * ) liminf j+ β h ( w k j ).

Consequently, { β h ( w k j ) } is bounded from below. Besides, the fact that { β h ( w k ) } is nonincreasing implies that { β h ( w k j ) } is convergent. Moreover, { β h ( w k ) } is convergent, and β h ( w k ) β h ( w * ) . Based on the Equation (3.2), we can obtain

δ D h ( y k , y k+1 ) β h ( w k ) β h ( w k+1 ).

Summing up the above inequality from k=0 to n , we conclude that

k=0 n δ D h ( y k , y k+1 ) β h ( w 0 ) β h ( w n+1 ) β h ( w 0 ) β h ( w * )<+.

Owing to δ>0 , we have k=0 D h ( y k , y k+1 )<+ , which implies k=0 y k+1 y k 2 <+ . Consequently, according to (3.9), we get that

k=0 λ k+1 λ k 2 <+.

Recall Lemma 2.5, we have

λ k+1 = λ k β[ ( α+1 )( h( A x k+1 b )h( y k ) )( h( y k+1 )h( y k ) ) ],

λ k = λ k1 β[ ( α+1 )( h( A x k b )h( y k1 ) )( h( y k )h( y k1 ) ) ].

Combining the two equalities, we obtain

( α+1 )β( h( A x k+1 b )h( A x k b ) ) =( λ k λ k1 )( λ k+1 λ k )αβ( h( y k1 )h( y k ) ) β( h( y k )h( y k+1 ) ).

Then, using the 1-strong convexity of $h(\cdot)$ and Assumption A(iii), we obtain

$$M\|x^{k+1}-x^{k}\|^{2}\le\|Ax^{k+1}-Ax^{k}\|^{2}\le\|\nabla h(Ax^{k+1}-b)-\nabla h(Ax^{k}-b)\|^{2}.$$

Thus, combining the above two formulas, we obtain that

( α+1 ) 2 β 2 M x k+1 x k 2 ( α+1 ) 2 β 2 A x k+1 A x k 2 ( α+1 ) 2 β 2 h( A x k+1 b )h( A x k b ) 2 4( λ k λ k1 2 + λ k+1 λ k 2 + α 2 β 2 h( y k1 )h( y k ) 2 + β 2 h( y k )h( y k+1 ) 2 ) 4( λ k λ k1 2 + λ k+1 λ k 2 + α 2 β 2 L h 2 y k y k1 2 + β 2 L h 2 y k+1 y k 2 ), (3.18)

where M>0 . Then, the last inequality implies k=0 x k+1 x k 2 <+ .

Thus, k=0 w k+1 w k 2 <+ . This completes the proof.□

Lemma 3.3. Let { w k =( x k , y k , λ k ) } be the sequence generated by the symmetric Bregman ADMM (1.6), which is supposed to be bounded, and given that Assumption A holds. For any positive integer k , we define

{ x k+1 * := A T λ k A T λ k+1 +β A T h 2 ( A x k+1 b ), y k+1 y k , y k+1 * := λ k+ 1 2 λ k+1 , λ k+1 * :=A x k+1 y k+1 +b.

Then, ( x k+1 * , y k+1 * , λ k+1 * ) β h ( w k+1 ) and there exists ξ>0 such that

d( 0, β h ( w k+1 ) )ξ y k+1 y k .

Proof: By definition of function β h ( ) , we can obtain that

{ x β h ( w k+1 )=f( x k+1 ) A T λ k+1 β A T 2 h( A x k+1 b ),A x k+1 y k+1 +b , y β h ( w k+1 )=g( y k+1 ) λ k+1 +β( h( A x k+1 b )h( y k+1 ) ), λ β h ( w k+1 )=( A x k+1 + y k+1 b ). (3.19)

Combining the optimality condition (3.1) with equation (3.19), we get

{ A T λ k A T λ k+1 +β A T h 2 ( A x k+1 b ), y k+1 y k x β h ( w k+1 ), λ k+ 1 2 λ k+1 y β h ( w k+1 ), A x k+1 y k+1 +b λ β h ( w k+1 ).

Then, according to Lemma 2.2, we obtain that ( x k+1 * , y k+1 * , λ k+1 * ) β h ( w k+1 ) .

Furthermore,

h 2 ( A x k+1 b ), y k+1 y k h 2 ( A x k+1 b ) y k+1 y k L h y k+1 y k .

In addition,

A x k+1 + y k+1 b h( A x k+1 b )h( y k+1 ) 1 ( α+1 )β λ k λ k+1 + α α+1 h( y k+1 )h( y k ) 1 ( α+1 )β λ k λ k+1 + α L h α+1 y k y k+1 .

On the other hand,

λ k+ 1 2 λ k+1 = β( h( A x k+1 b )h( y k+1 ) ) 1 α+1 λ k λ k+1 + αβ α+1 h( y k+1 )h( y k ) 1 α+1 λ k λ k+1 + αβ L h α+1 y k+1 y k .

Based on the above relationship, we know that there exist ξ 1 , ξ 2 >0 such that

( x k+1 * , y k+1 * , λ k+1 * ) ξ 1 y k+1 y k + ξ 2 λ k+1 λ k .

Defining ξ:= ξ 1 + ξ 2 L L h and using (3.9), we obtain

d( 0, β h ( w k+1 ) ) ( x k+1 * , y k+1 * , λ k+1 * ) ξ y k+1 y k .

This completes the proof.

Lemma 3.4. Let { w k =( x k , y k , λ k ) } be the sequence generated by the symmetric Bregman ADMM (1.6), which is supposed to be bounded. Let S( w 0 ) denote the set of its limit points. Then

(i) S( w 0 ) is a nonempty compact set, and

d( w k ,S( w 0 ) )0 , as k+ ;

(ii) S( w 0 )crit β h , where  crit β h denotes the set of all stationary points of β h ;

(iii) β h ( ) is finite and constant on S( w 0 ) , which equals to

inf k β h ( w k )= lim k+ β h ( w k ).

Proof:

(i) The proposition is immediately derived from the definition of limit points.

(ii) We assume that ( x * , y * , λ * )S( w 0 ) , then there exists a subsequence { ( x k j , y k j , λ k j ) } that converges to ( x * , y * , λ * ) . Note that Lemma 3.2, we have

w k+1 w k 0. (3.20)

Consequently, we deduce that { ( x k j +1 , y k j +1 , λ k j +1 ) } also converges to ( x * , y * , λ * ) . Given that x k+1 is the minimizer of β h ( x, y k , λ k ) concerning the variable x , we have

β h ( x k+1 , y k , λ k ) β h ( x * , y k , λ k ). (3.21)

On one hand, according to (3.20), (3.21) and the continuity of β h ( ) with respect to y and λ , we get

limsup j+ β h ( x k j +1 , y k j , λ k j )= limsup j+ β h ( x k j +1 , y k j +1 , λ k j +1 ) β h ( x * , y * , λ * ). (3.22)

On the other hand, using the lower semicontinuity of β h ( ) , we have

limsup j+ β h ( x k j +1 , y k j +1 , λ k j +1 ) β h ( x * , y * , λ * ). (3.23)

The above two relations (3.22) and (3.23) imply that

lim j+ f( x k j +1 )=f( x * ).

Taking the limit in the optimality conditions (3.1) along the subsequence { ( x k j +1 , y k j +1 , λ k j +1 ) } , and utilizing (3.20) yields

{ A T λ * f( x * ), λ * =g( y * ), h( A x * b )h( y * )=0.

Thus, $(x^{*},y^{*},\lambda^{*})$ is a critical point of (1.5), which implies that $w^{*}\in\operatorname{crit}\mathcal{L}_{\beta}^{h}$.

(iii) For any point $(x^{*},y^{*},\lambda^{*})\in S(w^{0})$, there exists a subsequence $\{(x^{k_{j}},y^{k_{j}},\lambda^{k_{j}})\}$ that converges to $(x^{*},y^{*},\lambda^{*})$ as $j\to+\infty$. Combining (3.22) and (3.23) with the observation that the sequence $\{\mathcal{L}_{\beta}^{h}(w^{k})\}$ is nonincreasing, we have

lim k+ β h ( x k , y k , λ k )= β h ( x * , y * , λ * ).

Therefore, β h ( ) is constant on S( w 0 ) . Furthermore, we also have inf k β h ( w k )= lim k+ β h ( w k ) .

Hence, we have completed the proof.

Now we present the main convergence result for the symmetric Bregman ADMM (1.6).

Theorem 3.1. Let { w k =( x k , y k , λ k ) } be the sequence generated by the symmetric Bregman ADMM (1.6), which is assumed to be bounded. Suppose that β h ( ) is a KL function, then { w k } has finite length, that is

k=0 + w k+1 w k <+,

and as a consequence, { w k } converges to a critical point of β h ( ) .

Proof: Based on the proof of Lemma 3.4, it implies that β h ( w k ) β h ( w * ) for all w * S( w 0 ) . Next, we will consider two cases.

(i) Suppose there exists an integer $k_{0}$ for which $\mathcal{L}_{\beta}^{h}(w^{k_{0}})=\mathcal{L}_{\beta}^{h}(w^{*})$. Then, according to Remark 3.1 and (3.2), we have

δ D h ( y k , y k+1 ) β h ( w k ) β h ( w k+1 ) β h ( w k 0 ) β h ( w * )=0,

for any $k>k_{0}$. Consequently, we conclude that $y^{k+1}=y^{k}$ for any $k>k_{0}$. Combining (3.9) and (3.18), we further deduce that $\lambda^{k+1}=\lambda^{k}$ and $x^{k+1}=x^{k}$ for any $k>k_{0}+1$, which implies that $w^{k+1}=w^{k}$. As a result, the assertion holds.

(ii) If β h ( w k )> β h ( w * ) for all k . Considering that d( w k ,S( w 0 ) )0 , there exists k 1 >0 , such that d( w k ,S( w 0 ) )<ε ( ε>0 ) for any k> k 1 . Furthermore, with β h ( w k ) β h ( w * ) , it follows that there exists k 2 >0 , such that β h ( w k )< β h ( w * )+η ( η>0 ) for any k> k 2 .

Thus, when k> k ˜ =max{ k 1 , k 2 } for all ε,η>0 , we derive the following conclusions

d( w k ,S( w 0 ) )<ε, β h ( w * )< β h ( w k )< β h ( w * )+η.

Since β h ( ) is constant on S( w 0 ) and S( w 0 ) is a nonempty compact set, we use Lemma 2.3 with Ω=S( w 0 ) to derive that for any k> k ˜ ,

φ ( β h ( w k ) β h ( w * ) )d( 0, β h ( w k ) )1. (3.24)

Relying on the fact that β h ( w k ) β h ( w k+1 )= β h ( w k ) β h ( w * )( β h ( w k+1 ) β h ( w * ) ) , and the concavity of φ( ) , it follows that

φ( β h ( w k ) β h ( w * ) )φ( β h ( w k+1 ) β h ( w * ) ) φ ( β h ( w k ) β h ( w * ) )( β h ( w k ) β h ( w k+1 ) ).

Combining d( 0, β h ( w k ) )ξ y k y k1 with the above inequality, φ ( β h ( w k ) β h ( w * ) )>0 and inequality (3.24), we can obtain

β h ( w k ) β h ( w k+1 ) φ( β h ( w k ) β h ( w * ) )φ( β h ( w k+1 ) β h ( w * ) ) φ ( β h ( w k ) β h ( w * ) ) d( 0, β h ( w k ) )[ φ( β h ( w k ) β h ( w * ) )φ( β h ( w k+1 ) β h ( w * ) ) ] ξ y k y k1 [ φ( β h ( w k ) β h ( w * ) )φ( β h ( w k+1 ) β h ( w * ) ) ]. (3.25)

For convenience, for all p,q , we define Δ p,q :=φ( β h ( w p ) β h ( w * ) )φ( β h ( w q ) β h ( w * ) ) .

Hence, (3.25) can be simplified as

β h ( w k ) β h ( w k+1 )ξ y k y k1 Δ k,k+1 . (3.26)

Combining inequality (3.26) with Lemma 3.1, we get that for all $k>\tilde{k}$,

δ 2 y k y k+1 2 δ D h ( y k , y k+1 )ξ y k y k1 Δ k,k+1 .

Then,

y k y k+1 2ξ δ Δ k,k+1 y k y k1 1/2 .

Applying the fact that $2\sqrt{ab}\le a+b$ for any $a,b\ge 0$, we obtain

2 y k y k+1 y k y k1 + 2ξ δ Δ k,k+1 . (3.27)

Summing up (3.27) over for k= k ˜ +1,,m , we have

2 k= k ˜ +1 m y k+1 y k k= k ˜ +1 m y k y k1 + 2ξ δ Δ k ˜ +1,m+1 .

Note that φ( β h ( w m+1 ) β h ( w * ) )>0 from Definition 2.8. Taking m+ , we have

k= k ˜ +1 + y k+1 y k y k ˜ +1 y k ˜ + 2ξ δ φ( β h ( w k ˜ +1 ) β h ( w * ) ). (3.28)

Therefore,

k=0 + y k+1 y k <+. (3.29)

Combining (3.9) and (3.29), we get

k=0 + λ k+1 λ k <+. (3.30)

Using (3.18), we obtain

x k+1 x k 2 ( α+1 )β M ( λ k λ k1 2 + λ k+1 λ k 2 + α 2 β 2 L h 2 y k y k1 2 + β 2 L h 2 y k+1 y k 2 ) 1 2 2 ( α+1 )β M ( λ k λ k1 + λ k+1 λ k +αβ L h y k y k1 + β L h y k+1 y k ).

Combining above inequality with (3.29) and (3.30), we have

k=0 + x k+1 x k <+. (3.31)

Furthermore, we note that

w k+1 w k = ( x k+1 x k 2 + y k+1 y k 2 + λ k+1 λ k 2 ) 1/2 x k+1 x k + y k+1 y k + λ k+1 λ k .

Using (3.29), (3.30) and (3.31), it follows that

k=0 + w k+1 w k <+,

which implies that { w k } is a Cauchy sequence and thus convergent. The assertion follows from Lemma 3.4 immediately.□

Next, we provide the essential sufficient conditions to establish that the sequence { w k =( x k , y k , λ k ) } generated by the symmetric Bregman ADMM (1.6) is bounded.

Lemma 3.5. Let { w k =( x k , y k , λ k ) } be the sequence generated by the symmetric Bregman ADMM (1.6). Suppose that

inf y { g( y ) 1 2L g( y ) 2 }=: g ¯ >.

If at least one of the following statements is true:

(i) liminf x + f( x )=+ .

(ii) inf x f( x )> and liminf y + g( y )=+ .

Then, the sequence { w k =( x k , y k , λ k ) } is bounded.

Proof. Suppose that condition (i) holds. From Lemma 3.1, we have

β h ( x k , y k , λ k ) β h ( x 1 , y 1 , λ 1 ).

which implies

β h ( x 1 , y 1 , λ 1 )f( x k )+g( y k ) λ k ,A x k + y k b +β D h ( y k ,A x k b ). (3.32)

Using the 1-strong convexity of the function h( ) and the definition of D h ( , ) , we have

D h ( y k ,A x k b )=h( y k )h( A x k b ) h( A x k b ), y k A x k b 1 2 A x k + y k b 2 . (3.33)

Combining (3.32) with (3.33) and using the fact λ k =g( y k ) , we get

β h ( x 1 , y 1 , λ 1 )f( x k )+g( y k ) λ k ,A x k + y k b + β 2 A x k + y k b 2 =f( x k )+g( y k ) 1 2β λ k 2 + β 2 A x k + y k b 1 β λ k 2 =f( x k )+( g( y k ) 1 2L g( y k ) 2 )+( 1 2L 1 2β ) λ k 2 + β 2 A x k + y k b 1 β λ k 2 f( x k )+ g ¯ +( 1 2L 1 2β ) λ k 2 + β 2 A x k + y k b 1 β λ k 2 . (34)

Note that (i) implies that inf x f( x )> . When α( 1 2 L h 2 +1 ,0 ] , we have β> 2L L h 2 2αL L h 3 1+α+2α L h 2 >2L , besides, when α[ 0, 1 2 L h 2 1 ) , we have β> 2( 1+α )L L h 2 +2αL L h 3 1+α2α L h 2 >2L . Therefore, we can derive that { x k } , { λ k } and { β 2 A x k + y k b 1 β λ k 2 } are bounded. Consequently, { y k } is also bounded, and hence { w k } is bounded.

Next, suppose condition (ii) holds. Using λ k g( y k ) and (3.34), we get that

β h ( x 1 , y 1 , λ 1 )f( x k )+g( y k ) 1 2β λ k 2 + β 2 A x k + y k b 1 β λ k 2 =f( x k )+ 1 2 g( y k )+( 1 2 g( y k ) 1 2L g( y k ) 2 ) +( 1 4L 1 2β ) λ 2 + β 2 A x k + y k b 1 β λ k 2 f( x k )+ 1 2 g( y k )+ 1 2 g ¯ +( 1 4L 1 2β ) λ k 2 + β 2 A x k + y k b 1 β λ k 2 .

Note that liminf y g( y )=+ implies that inf y g( y )> . When α( 1 2 L h 2 +1 ,0 ] , we have β> 2L L h 2 2αL L h 3 1+α+2α L h 2 >2L , besides, when α[ 0, 1 2 L h 2 1 ) , we have β> 2( 1+α )L L h 2 +2αL L h 3 1+α2α L h 2 >2L . Therefore, we conclude that the sequences { y k } , { λ k } and { β 2 A x k + y k b 1 β λ k 2 } are bounded. Then, from (3.18), it follows that { x k } is also bounded, and as a consequence, { w k } is bounded. This completes the proof.

Theorem 3.2. (Convergence rate) Let { w k =( x k , y k , λ k ) } be the sequence generated by the symmetric Bregman ADMM (1.6) and converge to { w * =( x * , y * , λ * ) } . Assuming that β h ( ) has the KL property at ( x * , y * , λ * ) with φ( s )=c s 1θ , θ[ 0,1 ) , c>0 . Then, the following results hold

(i) If θ=0 , then the sequence { w k =( x k , y k , λ k ) } converges in a finite number of steps.

(ii) If θ( 0, 1 2 ] , then there exists c 1 >0 and τ[ 0,1 ) such that

( x k , y k , λ k )( x * , y * , λ * ) c 1 τ k .

(iii) If θ( 1 2 ,1 ) , then there exists c 2 >0 such that

( x k , y k , λ k )( x * , y * , λ * ) c 2 k ( θ1 )/ ( 2θ1 ) .

Proof: Firstly, consider the case $\theta=0$, so that $\varphi(s)=cs$ and $\varphi'(s)=c$. We argue by contradiction: suppose that $\{w^{k}=(x^{k},y^{k},\lambda^{k})\}$ does not converge in a finite number of steps. Then the KL property at $(x^{*},y^{*},\lambda^{*})$ yields $c\,d(0,\partial\mathcal{L}_{\beta}^{h}(w^{k}))\ge 1$ for any sufficiently large $k$, which contradicts Lemma 3.1.

Secondly, consider $\theta>0$ and set $\Delta_{k}=\sum_{i=k}^{+\infty}\|y^{i+1}-y^{i}\|$ for $k\ge 0$. By the triangle inequality, $\Delta_{k}\ge\|y^{k}-y^{*}\|$, and hence it suffices to estimate $\Delta_{k}$. With these notations, it follows from (3.28) that

Δ k ˜ +1 Δ k ˜ Δ k ˜ +1 + 2ξ δ φ( β h ( w k ˜ +1 ) β h ( w * ) ).

Invoking the KL property of β h ( ) at ( x * , y * , λ * ) , we conclude that

φ ( β h ( w k ˜ +1 ) β h ( w * ) )d( 0, β h ( w k ˜ +1 ) )1.

This can be taken to imply

( β h ( w k ˜ +1 ) β h ( w * ) ) θ c( 1θ )d( 0, β h ( w k ˜ +1 ) ). (3.35)

According to Lemma 3.1, we get

d( 0, β h ( w k ˜ +1 ) )ξ y k ˜ +1 y k ˜ =ξ( Δ k ˜ Δ k ˜ +1 ). (3.36)

Combining (3.35) and (3.36), it follows that there exists γ>0 such that

φ( β h ( w k ˜ +1 ) β h ( w * ) )=c ( β h ( w k ˜ +1 ) β h ( w * ) ) 1θ γ ( Δ k ˜ Δ k ˜ +1 ) ( 1θ )/θ .

Therefore,

Δ k ˜ +1 Δ k ˜ Δ k ˜ +1 + 2ξ δ γ ( Δ k ˜ Δ k ˜ +1 ) ( 1θ )/θ .

Sequences satisfying such inequalities have been studied in [33]. It is shown that

  • If θ( 0, 1 2 ] , then there exists c 1 >0 and τ[ 0,1 ) , such that

y k y * c 1 τ k . (3.37)

  • If θ( 1 2 ,1 ) , then there exists c 2 >0 , such that

y k y * c 2 k ( θ1 )/ ( 2θ1 ) . (3.38)

Recalling that

λ k+1 λ k L L h y k+1 y k ,

consequently,

λ k λ * L L h y k y * . (3.39)

Furthermore, from the relations

λ k = λ k1 αβ( h( A x k b )h( y k1 ) )β( h( A x k b )h( y k ) ),

and

h( A x * b )h( y * )=0,

it follows that

( α+1 )β( h( A x k b )h( A x * b ) ) =β( h( y k )h( y * ) )+( λ k1 λ * )+( λ * λ k ) +αβ( h( y k1 )h( y * ) ).

i.e.,

( h( A x k b )h( A x * b ) ) = 1 α+1 ( h( y k )h( y * ) )+ 1 ( α+1 )β ( λ k1 λ * ) + 1 ( α+1 )β ( λ * λ k )+ α α+1 ( h( y k1 )h( y * ) ).

Subsequently, combining the 1-strong-convexity of h( ) and above equality, it follows that

x k x * 1 M h( A x k b )h( A x * b ) 1 M ( 1 ( α+1 )β λ k1 λ * + 1 ( α+1 )β λ * λ k + 1 α+1 h( y k )h( y * ) + α α+1 h( y k1 )h( y * ) )

1 M ( 1 ( α+1 )β λ k1 λ * + 1 ( α+1 )β λ * λ k + L h ( α+1 )β y k y * + α L h ( α+1 )β y k1 y * ) 1 M ( L L h ( α+1 )β y k1 y * + L L h ( α+1 )β y k y * + L h ( α+1 )β y k y * + α L h ( α+1 )β y k1 y * ) 1 M ( ( 1+L ) L h ( α+1 )β y k y * + ( α+L ) L h ( α+1 )β y k1 y * ). (3.40)

The desired inequalities follow from (3.37)-(3.40) immediately.□

4. Conclusions

In this paper, we proposed a symmetric Bregman ADMM, which reduces to the symmetric ADMM, the Bregman ADMM, and the classical ADMM as special cases, while circumventing the requirement of global Lipschitz continuity of the gradient when minimizing a linearly constrained nonconvex problem whose objective function is the sum of two separable nonconvex functions. Moreover, we analyzed its convergence: under certain assumptions, and when the associated function satisfies the Kurdyka-Lojasiewicz inequality, we proved that the iterative sequence generated by the symmetric Bregman ADMM converges to a critical point of the problem. Finally, we established the convergence rate of the algorithm.

Conflicts of Interest

The authors declare no conflicts of interest regarding the publication of this paper.

References

[1] Goldstein, T., O’Donoghue, B., Setzer, S. and Baraniuk, R. (2014) Fast Alternating Direction Optimization Methods. SIAM Journal on Imaging Sciences, 7, 1588-1623.
https://doi.org/10.1137/120896219
[2] Boyd, S. (2010) Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers. Foundations and Trends in Machine Learning, 3, 1-122.
https://doi.org/10.1561/2200000016
[3] Yin, W., Osher, S., Goldfarb, D. and Darbon, J. (2008) Bregman Iterative Algorithms for ℓ1-Minimization with Applications to Compressed Sensing. SIAM Journal on Imaging Sciences, 1, 143-168.
https://doi.org/10.1137/070703983
[4] Figueiredo, M.A.T. and Bioucas-Dias, J.M. (2010) Restoration of Poissonian Images Using Alternating Direction Optimization. IEEE Transactions on Image Processing, 19, 3133-3145.
https://doi.org/10.1109/tip.2010.2053941
[5] Goldstein, T., Bresson, X. and Osher, S. (2009) Geometric Applications of the Split Bregman Method: Segmentation and Surface Reconstruction. Journal of Scientific Computing, 45, 272-293.
https://doi.org/10.1007/s10915-009-9331-z
[6] Ochs, P., Chen, Y., Brox, T. and Pock, T. (2014) iPiano: Inertial Proximal Algorithm for Nonconvex Optimization. SIAM Journal on Imaging Sciences, 7, 1388-1419.
https://doi.org/10.1137/130942954
[7] Candes, E.J. and Tao, T. (2006) Near-Optimal Signal Recovery from Random Projections: Universal Encoding Strategies? IEEE Transactions on Information Theory, 52, 5406-5425.
https://doi.org/10.1109/tit.2006.885507
[8] Beck, A. and Teboulle, M. (2009) Fast Gradient-Based Algorithms for Constrained Total Variation Image Denoising and Deblurring Problems. IEEE Transactions on Image Processing, 18, 2419-2434.
https://doi.org/10.1109/tip.2009.2028250
[9] Dai, W. and Milenkovic, O. (2009) Subspace Pursuit for Compressive Sensing Signal Reconstruction. IEEE Transactions on Information Theory, 55, 2230-2249.
https://doi.org/10.1109/tit.2009.2016006
[10] Aharon, M., Elad, M. and Bruckstein, A. (2006) K-SVD: An Algorithm for Designing Overcomplete Dictionaries for Sparse Representation. IEEE Transactions on Signal Processing, 54, 4311-4322.
https://doi.org/10.1109/tsp.2006.881199
[11] Gabay, D. and Mercier, B. (1976) A Dual Algorithm for the Solution of Nonlinear Variational Problems via Finite Element Approximation. Computers & Mathematics with Applications, 2, 17-40.
https://doi.org/10.1016/0898-1221(76)90003-1
[12] Glowinski, R. and Marroco, A. (1975) Sur l'approximation, par éléments finis d'ordre un, et la résolution, par pénalisation-dualité d'une classe de problèmes de Dirichlet non linéaires. Revue française d'automatique, informatique, recherche opérationnelle. Analyse numérique, 9, 41-76.
https://doi.org/10.1051/m2an/197509r200411
[13] Boley, D. (2013) Local Linear Convergence of the Alternating Direction Method of Multipliers on Quadratic or Linear Programs. SIAM Journal on Optimization, 23, 2183-2207.
https://doi.org/10.1137/120878951
[14] Han, D. and Yuan, X. (2013) Local Linear Convergence of the Alternating Direction Method of Multipliers for Quadratic Programs. SIAM Journal on Numerical Analysis, 51, 3446-3457.
https://doi.org/10.1137/120886753
[15] He, B. and Yuan, X. (2012) On the Convergence Rate of the Douglas-Rachford Alternating Direction Method. SIAM Journal on Numerical Analysis, 50, 700-709.
https://doi.org/10.1137/110836936
[16] Yang, W.H. and Han, D. (2016) Linear Convergence of the Alternating Direction Method of Multipliers for a Class of Convex Optimization Problems. SIAM Journal on Numerical Analysis, 54, 625-640.
https://doi.org/10.1137/140974237
[17] Li, G. and Pong, T.K. (2015) Global Convergence of Splitting Methods for Nonconvex Composite Optimization. SIAM Journal on Optimization, 25, 2434-2460.
https://doi.org/10.1137/140998135
[18] Hong, M., Luo, Z. and Razaviyayn, M. (2016) Convergence Analysis of Alternating Direction Method of Multipliers for a Family of Nonconvex Problems. SIAM Journal on Optimization, 26, 337-364.
https://doi.org/10.1137/140990309
[19] Guo, K., Han, D.R. and Wu, T.T. (2016) Convergence of Alternating Direction Method for Minimizing Sum of Two Nonconvex Functions with Linear Constraints. International Journal of Computer Mathematics, 94, 1653-1669.
https://doi.org/10.1080/00207160.2016.1227432
[20] Wu, Z., Li, M., Wang, D.Z.W. and Han, D. (2017) A Symmetric Alternating Direction Method of Multipliers for Separable Nonconvex Minimization Problems. Asia-Pacific Journal of Operational Research, 34, Article ID: 1750030.
https://doi.org/10.1142/s0217595917500300
[21] Bauschke, H.H., Bolte, J. and Teboulle, M. (2017) A Descent Lemma Beyond Lipschitz Gradient Continuity: First-Order Methods Revisited and Applications. Mathematics of Operations Research, 42, 330-348.
https://doi.org/10.1287/moor.2016.0817
[22] Dragomir, R., d’Aspremont, A. and Bolte, J. (2021) Quartic First-Order Methods for Low-Rank Minimization. Journal of Optimization Theory and Applications, 189, 341-363.
https://doi.org/10.1007/s10957-021-01820-3
[23] Nesterov, Y. (2019) Implementable Tensor Methods in Unconstrained Convex Optimization. Mathematical Programming, 186, 157-183.
https://doi.org/10.1007/s10107-019-01449-1
[24] Tan, L. and Guo, K. (2025) Bregman ADMM: A New Algorithm for Nonconvex Optimization with Linear Constraints. Journal of Nonlinear and Variational Analysis, 9, 176-196.
https://doi.org/10.23952/jnva.9.2025.2.02
[25] Wang, H. and Banerjee, A. (2014) Bregman Alternating Direction Method of Multipliers. arXiv: 1306.3203.
[26] Beck, A. (2017) First-Order Methods in Optimization. Society for Industrial and Applied Mathematics.
https://doi.org/10.1137/1.9781611974997
[27] Bolte, J., Sabach, S., Teboulle, M. and Vaisbourd, Y. (2018) First Order Methods Beyond Convexity and Lipschitz Gradient Continuity with Applications to Quadratic Inverse Problems. SIAM Journal on Optimization, 28, 2131-2151.
https://doi.org/10.1137/17m1138558
[28] Attouch, H., Bolte, J., Redont, P. and Soubeyran, A. (2010) Proximal Alternating Minimization and Projection Methods for Nonconvex Problems: An Approach Based on the Kurdyka-Łojasiewicz Inequality. Mathematics of Operations Research, 35, 438-457.
https://doi.org/10.1287/moor.1100.0449
[29] Attouch, H., Bolte, J. and Svaiter, B.F. (2011) Convergence of Descent Methods for Semi-Algebraic and Tame Problems: Proximal Algorithms, Forward-Backward Splitting, and Regularized Gauss-Seidel Methods. Mathematical Programming, 137, 91-129.
https://doi.org/10.1007/s10107-011-0484-9
[30] Bolte, J., Sabach, S. and Teboulle, M. (2013) Proximal Alternating Linearized Minimization for Nonconvex and Nonsmooth Problems. Mathematical Programming, 146, 459-494.
https://doi.org/10.1007/s10107-013-0701-9
[31] Bauschke, H.H., Borwein, J.M. and Combettes, P.L. (2003) Bregman Monotone Optimization Algorithms. SIAM Journal on Control and Optimization, 42, 596-636.
https://doi.org/10.1137/s0363012902407120
[32] Bregman, L.M. (1967) The Relaxation Method of Finding the Common Point of Convex Sets and Its Application to the Solution of Problems in Convex Programming. USSR Computational Mathematics and Mathematical Physics, 7, 200-217.
https://doi.org/10.1016/0041-5553(67)90040-7
[33] Attouch, H. and Bolte, J. (2007) On the Convergence of the Proximal Algorithm for Nonsmooth Functions Involving Analytic Features. Mathematical Programming, 116, 5-16.
https://doi.org/10.1007/s10107-007-0133-5

Copyright © 2025 by authors and Scientific Research Publishing Inc.


This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.