Convergence of Bregman Alternating Direction Method of Multipliers for Nonseparable Nonconvex Objective with Linear Constraints

Abstract

In this paper, we address a two-block linearly constrained nonseparable nonconvex optimization problem with a coupling term. The classical algorithm for such problems, the alternating direction method of multipliers (ADMM), still requires, to the best of current knowledge, the gradient Lipschitz continuity of the objective function to guarantee overall convergence. However, many practical applications do not satisfy this smoothness condition. In this study, we establish the convergence of a variant Bregman ADMM for the problem with a coupling term, which circumvents the requirement of global Lipschitz continuity of the gradient. We show that the iterative sequence generated by our approach converges to a critical point of the problem when the corresponding function satisfies the Kurdyka-Lojasiewicz inequality and certain assumptions hold. In addition, we establish the convergence rate of the algorithm.

Share and Cite:

Zeng, X., Yao, J. and Xia, H. (2024) Convergence of Bregman Alternating Direction Method of Multipliers for Nonseparable Nonconvex Objective with Linear Constraints. Journal of Applied Mathematics and Physics, 12, 639-660. doi: 10.4236/jamp.2024.122042.

1. Introduction

We consider the following two-block linearly constrained nonseparable nonconvex optimization problem:

$$\min_{x,y}\; f(x) + g(y) + H(x,y) \quad \text{s.t.}\quad Ax + y = b, \qquad (1)$$

where $f:\mathbb{R}^n \to \mathbb{R}\cup\{+\infty\}$ is a proper lower semicontinuous function, $g:\mathbb{R}^m \to \mathbb{R}$ is a continuously differentiable function, $H:\mathbb{R}^n\times\mathbb{R}^m \to \mathbb{R}$ is a smooth function, $A\in\mathbb{R}^{m\times n}$ is a matrix and $b\in\mathbb{R}^m$ is a vector. Problem (1) finds numerous applications in scenarios such as the control of a smart grid system [1] [2] [3] , the appliance load model [4] , cognitive radio networks and other related domains [5] [6] .

To solve this problem, the most commonly employed method is the Alternating Direction Method of Multipliers (ADMM) [7] , a popular and well-established approach.

In 2017, Guo et al. studied the convergence of ADMM for solving problem (1) [7] ; its iteration reads as follows:

$$\begin{cases} x^{k+1} = \arg\min_x \left\{ \mathcal{L}_\beta(x, y^k, \lambda^k) \right\},\\ y^{k+1} = \arg\min_y \left\{ \mathcal{L}_\beta(x^{k+1}, y, \lambda^k) \right\},\\ \lambda^{k+1} = \lambda^k + \beta\left(A x^{k+1} + y^{k+1} - b\right). \end{cases} \qquad (2)$$

Here, $\mathcal{L}_\beta(\cdot)$ denotes the augmented Lagrangian function defined as

$$\mathcal{L}_\beta(x,y,\lambda) = f(x) + g(y) + H(x,y) + \lambda^T(Ax + y - b) + \frac{\beta}{2}\|Ax + y - b\|_2^2,$$

where $\lambda$ is the Lagrangian multiplier associated with the linear constraints and $\beta > 0$ is the penalty parameter.

Note that a critical condition in their convergence analysis is that the function $g(\cdot)$ is $L$-smooth. Indeed, for most algorithms, in the absence of convexity the smoothness condition becomes a necessary requirement for the convergence analysis, and the same holds true for ADMM [8] [9] [10] [11] . However, a multitude of applications are nonsmooth. The prevalence of nonsmooth problems makes it necessary to consider how to relax the smoothness requirement in nonconvex situations.

To alleviate the $L$-smoothness condition, Tan and Guo established in 2023 the convergence of the following Bregman ADMM for the special case of problem (1) in which the coupling term $H(\cdot) = 0$ [12] :

$$\min\; f(x) + g(y) \quad \text{s.t.}\quad Ax + y = b, \qquad (3)$$

where $f:\mathbb{R}^n \to \mathbb{R}\cup\{+\infty\}$ is a proper lower semicontinuous function, $g:\mathbb{R}^m \to \mathbb{R}$ is a continuously differentiable function, $A\in\mathbb{R}^{m\times n}$ is a matrix and $b\in\mathbb{R}^m$ is a vector. This model also encompasses a range of significant applications in diverse areas, such as image processing [13] [14] [15] , statistical learning [16] [17] [18] , engineering [19] [20] [21] and so on. Problem (1) reduces to (3) when the coupled function $H$ is absent.

The iterative format of Bregman ADMM is as follows:

$$\begin{cases} x^{k+1} = \arg\min_x \left\{ \mathcal{L}_\beta^h(x, y^k, \lambda^k) \right\},\\ y^{k+1} = \arg\min_y \left\{ \mathcal{L}_\beta^h(x^{k+1}, y, \lambda^k) \right\},\\ \lambda^{k+1} = \lambda^k + \beta\left(\nabla h(A x^{k+1} - b) - \nabla h(-y^{k+1})\right). \end{cases} \qquad (4)$$

Here, the augmented Lagrangian function with Bregman distance $\mathcal{L}_\beta^h(\cdot)$ is defined as

$$\mathcal{L}_\beta^h(x,y,\lambda) = f(x) + g(y) + \lambda^T(Ax + y - b) + \beta D_h(-y, Ax - b),$$

where $\lambda$ is the Lagrangian multiplier associated with the linear constraints and $\beta > 0$ is the penalty parameter. The above Bregman ADMM reduces to the classic ADMM when $h = \frac{1}{2}\|\cdot\|_2^2$. However, their proof only established the convergence for the separable problem (3), where there is no coupling term. Therefore, the motivation of this paper is to extend the algorithm to the nonseparable problem (1) with a coupling term.

The iterative format of Bregman ADMM for the problem (1) is as follows:

$$\begin{cases} x^{k+1} = \arg\min_x \left\{ \mathcal{L}_\beta^h(x, y^k, \lambda^k) \right\},\\ y^{k+1} = \arg\min_y \left\{ \mathcal{L}_\beta^h(x^{k+1}, y, \lambda^k) \right\},\\ \lambda^{k+1} = \lambda^k + \beta\left(\nabla h(A x^{k+1} - b) - \nabla h(-y^{k+1})\right). \end{cases} \qquad (5)$$

Here, the augmented Lagrangian function $\mathcal{L}_\beta^h(\cdot)$ is defined as

$$\mathcal{L}_\beta^h(x,y,\lambda) = f(x) + g(y) + H(x,y) + \lambda^T(Ax + y - b) + \beta D_h(-y, Ax - b),$$

where $\lambda$ is the Lagrangian multiplier associated with the linear constraints and $\beta > 0$ is the penalty parameter.

Given our aim to relax the strict smoothness condition on $g(\cdot)$ to relative smoothness, we only modify the Euclidean distance in the iterative formulation of the $y$ variable to a Bregman distance, while the iterative format for $x$ remains the same as in the classical ADMM. The resulting iteration, which we call the new Bregman ADMM for problem (1), is as follows:

$$\begin{cases} x^{k+1} = \arg\min_x \left\{ f(x) + g(y^k) + H(x, y^k) + (\lambda^k)^T(Ax + y^k - b) + \frac{\beta_1}{2}\|Ax + y^k - b\|^2 \right\},\\ y^{k+1} = \arg\min_y \left\{ \mathcal{L}_{\beta_2}^h(x^{k+1}, y, \lambda^k) \right\},\\ \lambda^{k+1} = \lambda^k + \beta_2\left(\nabla h(A x^{k+1} - b) - \nabla h(-y^{k+1})\right). \end{cases} \qquad (6)$$

Here, $\mathcal{L}_{\beta_2}^h(\cdot)$ denotes the augmented Lagrangian function defined as

$$\mathcal{L}_{\beta_2}^h(x,y,\lambda) = f(x) + g(y) + H(x,y) + \lambda^T(Ax + y - b) + \beta_2 D_h(-y, Ax - b),$$

where $\lambda$ is the Lagrangian multiplier associated with the linear constraints and $\beta_1, \beta_2 > 0$ are the penalty parameters.
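To make the scheme concrete, the following is a minimal numerical sketch of iteration (6). It is only a sketch under assumptions, not the authors' implementation: `f`, `g`, `H`, `h` and `grad_h` are user-supplied callables, the two subproblems are handed to a generic smooth solver purely as a placeholder (in practice the $x$-step is typically a proximal step, since $f$ may be nonsmooth), and `bregman_admm` is a hypothetical helper name.

```python
import numpy as np
from scipy.optimize import minimize  # generic solver, used only as a placeholder


def bregman_admm(f, g, H, h, grad_h, A, b, beta1, beta2,
                 x0, y0, lam0, max_iter=100, tol=1e-6):
    """Sketch of iteration (6): x-step with the classical quadratic penalty,
    y-step with the Bregman augmented Lagrangian, and the dual update
    lam <- lam + beta2 * (grad_h(A x - b) - grad_h(-y))."""
    x, y, lam = x0.copy(), y0.copy(), lam0.copy()
    for _ in range(max_iter):
        # x-subproblem: Euclidean (classical) augmented Lagrangian in x.
        def Lx(xv):
            r = A @ xv + y - b
            return f(xv) + H(xv, y) + lam @ r + 0.5 * beta1 * (r @ r)
        x = minimize(Lx, x, method="BFGS").x

        # y-subproblem: augmented Lagrangian with Bregman distance D_h(-y, Ax - b).
        def Ly(yv):
            r = A @ x + yv - b
            u, v = -yv, A @ x - b
            breg = h(u) - h(v) - grad_h(v) @ (u - v)
            return g(yv) + H(x, yv) + lam @ r + beta2 * breg
        y = minimize(Ly, y, method="BFGS").x

        # dual update; with h = 0.5*||.||^2 this is lam + beta2*(A x + y - b).
        lam = lam + beta2 * (grad_h(A @ x - b) - grad_h(-y))

        if np.linalg.norm(A @ x + y - b) < tol:
            break
    return x, y, lam
```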

Building upon the aforementioned motivation, our attention is directed towards the Bregman ADMM applied to the nonseparable problem (1). We delve into the following aspects: For two-block nonseparable nonconvex optimization problems with linear constraints, we develop an appropriate Lyapunov function and leverage the KL inequality to examine the global convergence of the Bregman ADMM. The paper is organized as follows. We first summarize some necessary preliminaries for further analysis in Section 2. In Section 3, we analyze the convergence of variant Bregman ADMM and its convergence rate. Finally, some conclusions are made in Section 4.

2. Preliminaries

In this section, we summarize some notation and preliminaries to be used in the subsequent analysis.

Definition 2.1. [22] For an extended real-valued function $f:\mathbb{R}^n \to \mathbb{R}\cup\{+\infty\}$, the effective domain, or just the domain, is the set

$$\mathrm{dom}(f) = \{x\in\mathbb{R}^n : f(x) < +\infty\}.$$

Definition 2.2. [22] A function $f:\mathbb{R}^n \to \mathbb{R}\cup\{+\infty\}$ is called proper if there exists at least one $x\in\mathbb{R}^n$ such that $f(x) < +\infty$.

Definition 2.3. [22] A function $f:\mathbb{R}^n \to \mathbb{R}\cup\{+\infty\}$ is called lower semicontinuous at $x\in\mathbb{R}^n$ if

$$f(x) \le \liminf_{k\to\infty} f(x^k)$$

for any sequence $\{x^k\}\subseteq\mathbb{R}^n$ for which $x^k \to x$ as $k\to\infty$. Moreover, $f$ is called lower semicontinuous if it is lower semicontinuous at each point in $\mathbb{R}^n$.

Definition 2.4. ( [23] kernel generating distance) Let $C$ be a nonempty, convex, and open subset of $\mathbb{R}^m$. Associated with $C$, a function $h:\mathbb{R}^m \to \mathbb{R}\cup\{+\infty\}$ is called a kernel generating distance if it satisfies the following:

1) $h$ is proper, lower semicontinuous, and convex, with $\mathrm{dom}(h) \subseteq \bar{C}$ and $\mathrm{dom}(\partial h) = C$.

2) $h$ is $C^1$ on $\mathrm{int}(\mathrm{dom}(h)) = C$.

We denote the class of kernel generating distances by $\mathcal{G}(C)$.

Given $h\in\mathcal{G}(C)$, the Bregman distance $D_h : \mathrm{dom}(h)\times\mathrm{int}(\mathrm{dom}(h)) \to \mathbb{R}_+$ is defined by

$$D_h(x,y) := h(x) - h(y) - \langle \nabla h(y), x - y\rangle. \qquad (7)$$

Since $h$ is convex, $D_h(x,y) \ge 0$, and $D_h(x,y) = 0$ only when $x = y$.
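As a quick illustration of (7): for the Euclidean kernel $h(x) = \frac{1}{2}\|x\|^2$ one has $\nabla h(y) = y$ and $D_h(x,y) = \frac{1}{2}\|x\|^2 - \frac{1}{2}\|y\|^2 - \langle y, x - y\rangle = \frac{1}{2}\|x - y\|^2$, while for the negative entropy $h(x) = \sum_i x_i\log x_i$ on the positive orthant one obtains $D_h(x,y) = \sum_i \big(x_i\log\frac{x_i}{y_i} - x_i + y_i\big)$, the (unnormalized) Kullback-Leibler divergence, which is not symmetric in $x$ and $y$. Neither kernel is assumed in the analysis below; they are stated only for intuition.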

Lemma 2.1. [24] Let $h:\mathbb{R}^m \to \mathbb{R}\cup\{+\infty\}$ be a kernel generating distance. The following properties follow directly from (7). Let $x, y\in\mathrm{int}(\mathrm{dom}(h))$ and $z\in\mathrm{dom}(h)$; then

1) $D_h(x,y) + D_h(y,x) = \langle \nabla h(x) - \nabla h(y), x - y\rangle$.

2) The three points identity holds:

$$D_h(z,x) - D_h(z,y) - D_h(y,x) = \langle \nabla h(x) - \nabla h(y), y - z\rangle. \qquad (8)$$

In the subsequent analysis, we will use a pair of functions $(g,h)$ satisfying

1) $h\in\mathcal{G}(C)$;

2) $g:\mathbb{R}^m \to \mathbb{R}\cup\{+\infty\}$ is a proper and lower semicontinuous function with $\mathrm{dom}(h) \subseteq \mathrm{dom}(g)$, which is continuously differentiable on $C = \mathrm{int}(\mathrm{dom}(h))$.

Definition 2.5. ( [23] $L$-smooth adaptable) A pair $(g,h)$ is called $L$-smooth adaptable on $C$ if there exists $L > 0$ such that $Lh - g$ and $Lh + g$ are convex on $C$.

Remark 2.1. Since $Lh - g$ and $Lh + g$ are convex, their gradients are monotone, and we obtain

$$\langle \nabla g(x) - \nabla g(y), x - y\rangle \le L\langle \nabla h(x) - \nabla h(y), x - y\rangle = L\left(D_h(x,y) + D_h(y,x)\right),$$

and

$$\langle \nabla g(y) - \nabla g(x), x - y\rangle \le L\langle \nabla h(x) - \nabla h(y), x - y\rangle = L\left(D_h(x,y) + D_h(y,x)\right).$$

If we assume

$$\|\nabla g(x) - \nabla g(y)\| \le L\,\frac{D_h(x,y) + D_h(y,x)}{\|x - y\|}, \quad \text{for all } x \ne y \in \mathrm{int}(\mathrm{dom}(h)), \qquad (9)$$

then by the Cauchy-Schwarz inequality we immediately obtain the above two inequalities. Thus, (9) provides a Lipschitz-like gradient property of $g$ with respect to $D_h$.

Remark 2.2. Definition 2.5 naturally complements and extends the Lipschitz-like/convexity condition in [22] , which allows us to obtain the following two-sided descent lemma.

Lemma 2.2. ( [23] extended descent lemma) The pair of functions $(g,h)$ is $L$-smooth adaptable on $C$ if and only if

$$|g(x) - g(y) - \langle \nabla g(y), x - y\rangle| \le L D_h(x,y) \quad \forall x, y\in\mathrm{int}(\mathrm{dom}(h)). \qquad (10)$$

In particular, when the set $C = \mathbb{R}^m$ and $h(\cdot) = (1/2)\|\cdot\|^2$, (10) reduces to the classical descent lemma for a function $g$ with an $L$-Lipschitz gradient on $\mathbb{R}^m$, i.e.,

$$|g(x) - g(y) - \langle \nabla g(y), x - y\rangle| \le \frac{L}{2}\|x - y\|^2 \quad \forall x, y\in\mathbb{R}^m.$$
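A simple one-dimensional example, not used later but helpful for intuition: take $g(x) = \frac{1}{4}x^4$, whose derivative $g'(x) = x^3$ is not globally Lipschitz, together with the kernel $h(x) = \frac{1}{4}x^4 + \frac{1}{2}x^2$. Then $Lh - g = \frac{L-1}{4}x^4 + \frac{L}{2}x^2$ and $Lh + g = \frac{L+1}{4}x^4 + \frac{L}{2}x^2$ are both convex for any $L \ge 1$, so $(g,h)$ is $1$-smooth adaptable on $\mathbb{R}$ even though $g$ is not $L$-smooth in the classical sense.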

Definition 2.6. [23] Let $f:\mathbb{R}^n \to \mathbb{R}\cup\{+\infty\}$ be a proper lower semicontinuous function.

1) The Fréchet subdifferential, or regular subdifferential, of $f$ at $x\in\mathrm{dom}(f)$, written $\hat{\partial}f(x)$, is the set of vectors $x^*\in\mathbb{R}^n$ that satisfy

$$\liminf_{y\to x,\ y\ne x}\ \frac{f(y) - f(x) - \langle x^*, y - x\rangle}{\|y - x\|} \ge 0.$$

When $x\notin\mathrm{dom}(f)$, we set $\hat{\partial}f(x) = \emptyset$.

2) The limiting-subdifferential, or simply the subdifferential, of $f$ at $x\in\mathrm{dom}(f)$, written $\partial f(x)$, is defined as follows:

$$\partial f(x) = \{x^*\in\mathbb{R}^n : \exists\, x_n \to x,\ f(x_n) \to f(x),\ x_n^*\in\hat{\partial}f(x_n)\ \text{with}\ x_n^* \to x^*\}.$$

Remark 2.3. From the above definition, we note that

1) The above definition implies $\hat{\partial}f(x) \subseteq \partial f(x)$ for each $x\in\mathbb{R}^n$, where the first set is closed and convex while the second one is only closed.

2) Let $(x^k, x^{k*})\in\mathrm{Graph}(\partial f)$ be a sequence that converges to $(x, x^*)$. By the definition of $\partial f$, if $f(x^k)$ converges to $f(x)$ as $k\to+\infty$, then $(x, x^*)\in\mathrm{Graph}(\partial f)$.

3) A necessary condition for $x\in\mathbb{R}^n$ to be a minimizer of $f$ is

$$0\in\partial f(x). \qquad (11)$$

4) If $f:\mathbb{R}^n \to \mathbb{R}\cup\{+\infty\}$ is proper lower semicontinuous and $g:\mathbb{R}^n \to \mathbb{R}$ is continuously differentiable, then $\partial(f+g)(x) = \partial f(x) + \nabla g(x)$ for any $x\in\mathrm{dom}(f)$.

A point satisfying Equation (11) is called a critical point or a stationary point. The set of critical points of $f$ is denoted by $\mathrm{crit}\, f$.

Now, we recall an important property of subdifferential calculus.

Lemma 2.3. [25] Suppose that $F(x,y) = f(x) + g(y)$, where $f:\mathbb{R}^n \to \mathbb{R}\cup\{+\infty\}$ and $g:\mathbb{R}^m \to \mathbb{R}\cup\{+\infty\}$ are proper lower semicontinuous functions. Then for all $(x,y)\in\mathrm{dom}(F) = \mathrm{dom}(f)\times\mathrm{dom}(g)$, we have

$$\partial F(x,y) = \partial_x F(x,y)\times\partial_y F(x,y).$$

Definition 2.7. ( [25] Kurdyka-Lojasiewicz inequality) Let $f:\mathbb{R}^n \to \mathbb{R}\cup\{+\infty\}$ be a proper lower semicontinuous function. For $-\infty < \eta_1 < \eta_2 \le +\infty$, set

$$[\eta_1 < f < \eta_2] = \{x\in\mathbb{R}^n : \eta_1 < f(x) < \eta_2\}.$$

We say that the function $f$ has the KL property at $x^*\in\mathrm{dom}(\partial f)$ if there exist $\eta\in(0,+\infty]$, a neighbourhood $U$ of $x^*$, and a continuous concave function $\varphi:[0,\eta)\to\mathbb{R}_+$, such that

1) $\varphi(0) = 0$;

2) $\varphi$ is $C^1$ on $(0,\eta)$ and continuous at $0$;

3) $\varphi'(s) > 0$ for all $s\in(0,\eta)$;

4) for all $x$ in $U\cap[f(x^*) < f < f(x^*)+\eta]$, the Kurdyka-Lojasiewicz inequality holds:

$$\varphi'\big(f(x) - f(x^*)\big)\, d\big(0, \partial f(x)\big) \ge 1.$$

Definition 2.8. ( [26] Kurdyka-Lojasiewicz function) Denote by $\Phi_\eta$ the set of functions that satisfy Definition 2.7 1) - 3). If $f$ satisfies the KL property at each point of $\mathrm{dom}(\partial f)$, then $f$ is called a KL function.

Lemma 2.4. ( [27] Uniformized KL property) Let $\Omega$ be a compact set and let $f:\mathbb{R}^n \to \mathbb{R}\cup\{+\infty\}$ be a proper and lower semicontinuous function. Assume that $f$ is constant on $\Omega$ and satisfies the KL property at each point of $\Omega$. Then, there exist $\varepsilon > 0$, $\eta > 0$, and $\varphi\in\Phi_\eta$ such that for all $\bar{x}\in\Omega$ and for all $x$ in the intersection

$$\{x\in\mathbb{R}^n : d(x,\Omega) < \varepsilon\}\cap[f(\bar{x}) < f < f(\bar{x}) + \eta],$$

one has

$$\varphi'\big(f(x) - f(\bar{x})\big)\, d\big(0, \partial f(x)\big) \ge 1.$$
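As a simple illustration of the KL property (not needed for the analysis): for $f(x) = x^2$ at $x^* = 0$, the choice $\varphi(s) = 2\sqrt{s}$ works, since for $x \ne 0$ one has $\varphi'\big(f(x) - f(0)\big)\, d\big(0, \partial f(x)\big) = \frac{1}{\sqrt{x^2}}\cdot|2x| = 2 \ge 1$. More generally, semi-algebraic functions are KL functions with $\varphi(s) = cs^{1-\theta}$ for some $\theta\in[0,1)$ [26] , which is the form used in Theorem 3.2 below.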

Definition 2.9. We say that $(x^*, y^*, \lambda^*)$ is a critical point of the augmented Lagrangian function with Bregman distance $\mathcal{L}_\beta^h(\cdot)$ if it satisfies

$$\begin{cases} -A^T\lambda^* - \nabla_x H(x^*, y^*)\in\partial f(x^*),\\ -\lambda^* - \nabla_y H(x^*, y^*) = \nabla g(y^*),\\ A x^* + y^* - b = 0. \end{cases}$$

Lemma 2.5. [28] Let $h:\mathbb{R}^n \to \mathbb{R}$ be a continuously differentiable function whose gradient $\nabla h$ is Lipschitz continuous with constant $L > 0$. Then for any $x, y\in\mathbb{R}^n$, we have

$$|h(y) - h(x) - \langle \nabla h(x), y - x\rangle| \le \frac{L}{2}\|y - x\|^2.$$

Definition 2.10. A proper lower semicontinuous function $f:\mathbb{R}^n \to \mathbb{R}\cup\{+\infty\}$ is called semi-convex with constant $\omega \ge 0$ if the function

$$x \mapsto f(x) + \frac{\omega}{2}\|x\|^2$$

is convex. In particular, if $\omega = 0$, then $f$ is convex.

Remark 2.4. Definition 2.10 is equivalent to

$$f(y) \ge f(x) + \langle p, y - x\rangle - \frac{\omega}{2}\|y - x\|^2,$$

for all $x, y\in\mathbb{R}^n$ and all $p\in\partial f(x)$.

3. Convergence Analysis

We commence our analysis by examining the optimality conditions. Invoking the optimality conditions of the subproblems in (6), we obtain

$$\begin{cases} 0\in\partial f(x^{k+1}) + \nabla_x H(x^{k+1}, y^k) + A^T\lambda^k + \beta_1 A^T(A x^{k+1} + y^k - b),\\ 0 = \nabla g(y^{k+1}) + \nabla_y H(x^{k+1}, y^{k+1}) + \lambda^k + \beta_2\big(\nabla h(A x^{k+1} - b) - \nabla h(-y^{k+1})\big),\\ \lambda^{k+1} = \lambda^k + \beta_2\big(\nabla h(A x^{k+1} - b) - \nabla h(-y^{k+1})\big). \end{cases} \qquad (12)$$

By utilizing the previous equations and reorganizing terms, we derive

$$\begin{cases} -\nabla_x H(x^{k+1}, y^k) - A^T\lambda^k - \beta_1 A^T(A x^{k+1} + y^k - b)\in\partial f(x^{k+1}),\\ -\nabla_y H(x^{k+1}, y^{k+1}) - \lambda^k - \beta_2\big(\nabla h(A x^{k+1} - b) - \nabla h(-y^{k+1})\big) = \nabla g(y^{k+1}),\\ \lambda^{k+1} = \lambda^k + \beta_2\big(\nabla h(A x^{k+1} - b) - \nabla h(-y^{k+1})\big). \end{cases} \qquad (13)$$

In what follows, we assume:

Assumption 3.1. Let $f:\mathbb{R}^n \to \mathbb{R}\cup\{+\infty\}$ be a semiconvex function with constant $\omega > 0$, let $(g,h)$ be $L$-smooth adaptable on $\mathrm{dom}(h)$, where $h$ is $C^2$ on the interior of $\mathrm{dom}(h)$, $\mu_h$-strongly convex, and $\nabla h$ is Lipschitz continuous with constant $L_h$ on any bounded subset of $\mathbb{R}^m$, and let $H:\mathbb{R}^n\times\mathbb{R}^m \to \mathbb{R}$ be a smooth function. Assume the following holds:

1) $\inf_{(x,y)\in\mathbb{R}^n\times\mathbb{R}^m} H(x,y) > -\infty$, $\inf_{x\in\mathbb{R}^n} f(x) > -\infty$, $\inf_{y\in\mathbb{R}^m} g(y) > -\infty$;

2) For any fixed $x$, the partial gradient $\nabla_y H(x,\cdot)$ is globally Lipschitz with constant $L_2(x)$, that is,

$$\|\nabla_y H(x, y_1) - \nabla_y H(x, y_2)\| \le L_2(x)\|y_1 - y_2\|, \quad \forall y_1, y_2\in\mathbb{R}^m.$$

For any fixed $y$, the partial gradient $\nabla_x H(\cdot, y)$ is globally Lipschitz with constant $L_3(y)$, that is,

$$\|\nabla_x H(x_1, y) - \nabla_x H(x_2, y)\| \le L_3(y)\|x_1 - x_2\|, \quad \forall x_1, x_2\in\mathbb{R}^n.$$

3) There exist $L_2, L_3 > 0$ such that

$$\sup\{L_2(x^k) : k\in\mathbb{N}\} \le L_2, \quad \sup\{L_3(y^k) : k\in\mathbb{N}\} \le L_3.$$

4) $\nabla H$ is Lipschitz continuous on bounded subsets of $\mathbb{R}^n\times\mathbb{R}^m$. In other words, for each bounded subset $B_1\times B_2\subseteq\mathbb{R}^n\times\mathbb{R}^m$, there exists $M > 0$ such that for all $(x_i, y_i)\in B_1\times B_2$, $i = 1, 2$:

$$\|(\nabla_x H(x_1, y_1) - \nabla_x H(x_2, y_2),\ \nabla_y H(x_1, y_1) - \nabla_y H(x_2, y_2))\| \le M\|(x_1 - x_2,\ y_1 - y_2)\|.$$

5) $A^T A \succeq \mu I$ for some $\mu > 0$;

6) $\beta_1 = L_h\beta_2$.

For the sake of convenience, in the following analysis we frequently employ the notation $\{w^k := (x^k, y^k, \lambda^k)\}_{k\in\mathbb{N}}$ and $\{v^k := (x^k, y^k)\}_{k\in\mathbb{N}}$. We commence our analysis with the subsequent lemma.

Lemma 3.1. Let $\{w^k\}_{k\in\mathbb{N}}$ be the sequence generated by the Bregman ADMM (6), which is assumed to be bounded. Then we have

$$L(w^{k+1}) \le L(w^k) - \delta\|v^{k+1} - v^k\|^2, \qquad (14)$$

where

$$L(x,y,\lambda) = f(x) + g(y) + H(x,y) + \lambda^T(Ax + y - b) + \frac{\beta_1}{2}\|Ax + y - b\|^2 + \beta_2 D_h(-y, Ax - b).$$

Proof.

By the definition of $L(\cdot)$, we obtain

$$\begin{aligned} L(x^{k+1}, y^{k+1}, \lambda^{k+1}) &= L(x^{k+1}, y^{k+1}, \lambda^k) + \langle \lambda^{k+1} - \lambda^k,\ A x^{k+1} + y^{k+1} - b\rangle\\ &\le L(x^{k+1}, y^{k+1}, \lambda^k) + \|\lambda^{k+1} - \lambda^k\|\,\|A x^{k+1} + y^{k+1} - b\|. \end{aligned} \qquad (15)$$

By applying the fact that $h$ is $\mu_h$-strongly convex, we get

$$\langle \nabla h(A x^{k+1} - b) - \nabla h(-y^{k+1}),\ A x^{k+1} + y^{k+1} - b\rangle \ge \mu_h\|A x^{k+1} + y^{k+1} - b\|^2. \qquad (16)$$

Using the Cauchy-Schwarz inequality, we have

$$\|\nabla h(A x^{k+1} - b) - \nabla h(-y^{k+1})\|\,\|A x^{k+1} + y^{k+1} - b\| \ge \langle \nabla h(A x^{k+1} - b) - \nabla h(-y^{k+1}),\ A x^{k+1} + y^{k+1} - b\rangle. \qquad (17)$$

Combining (16) and (17), we can conclude that

$$\|\nabla h(-y^{k+1}) - \nabla h(A x^{k+1} - b)\| \ge \mu_h\|A x^{k+1} + y^{k+1} - b\|. \qquad (18)$$

Based on the iterative scheme of the algorithm, we have

$$\lambda^{k+1} = \lambda^k + \beta_2\big(\nabla h(A x^{k+1} - b) - \nabla h(-y^{k+1})\big).$$

Therefore,

$$\nabla h(A x^{k+1} - b) - \nabla h(-y^{k+1}) = \frac{1}{\beta_2}(\lambda^{k+1} - \lambda^k). \qquad (19)$$

By combining Equations (18) and (19), we can infer that

$$\|A x^{k+1} + y^{k+1} - b\| \le \frac{1}{\beta_2\mu_h}\|\lambda^{k+1} - \lambda^k\|. \qquad (20)$$

Substituting the previous inequality into Equation (15) yields

$$L(x^{k+1}, y^{k+1}, \lambda^{k+1}) \le L(x^{k+1}, y^{k+1}, \lambda^k) + \frac{1}{\beta_2\mu_h}\|\lambda^{k+1} - \lambda^k\|^2. \qquad (21)$$

From the definition of $L(\cdot)$, we have

$$\begin{aligned} L(x^{k+1}, y^{k+1}, \lambda^k) ={}& L(x^{k+1}, y^k, \lambda^k) + g(y^{k+1}) - g(y^k) + H(x^{k+1}, y^{k+1}) - H(x^{k+1}, y^k) + \langle \lambda^k, y^{k+1} - y^k\rangle\\ &+ \beta_2 D_h(-y^{k+1}, A x^{k+1} - b) - \beta_2 D_h(-y^k, A x^{k+1} - b)\\ &+ \frac{\beta_1}{2}\|A x^{k+1} + y^{k+1} - b\|^2 - \frac{\beta_1}{2}\|A x^{k+1} + y^k - b\|^2. \end{aligned} \qquad (22)$$

Assumption 3.1 states that $(g,h)$ is $L$-smooth adaptable, which implies

$$g(y^{k+1}) - g(y^k) \le -\langle \nabla g(y^{k+1}), y^k - y^{k+1}\rangle + L D_h(y^k, y^{k+1}). \qquad (23)$$

For any fixed $x$, the partial gradient $\nabla_y H(x,\cdot)$ is globally Lipschitz with constant $L_2(x)$, which can be inferred from Assumption 3.1 2). Further applying Lemma 2.5 yields

$$H(x^{k+1}, y^{k+1}) - H(x^{k+1}, y^k) \le -\langle \nabla_y H(x^{k+1}, y^{k+1}), y^k - y^{k+1}\rangle + \frac{L_2(x^{k+1})}{2}\|y^{k+1} - y^k\|^2. \qquad (24)$$

By the three points identity (8), we obtain

$$\begin{aligned} &\beta_2 D_h(-y^{k+1}, A x^{k+1} - b) - \beta_2 D_h(-y^k, A x^{k+1} - b)\\ ={}& \beta_2 D_h(-y^{k+1}, A x^{k+1} - b) - \beta_2 D_h(-y^k, A x^{k+1} - b) + \beta_2 D_h(-y^k, -y^{k+1}) - \beta_2 D_h(-y^k, -y^{k+1})\\ ={}& -\beta_2\big(D_h(-y^k, A x^{k+1} - b) - D_h(-y^k, -y^{k+1}) - D_h(-y^{k+1}, A x^{k+1} - b)\big) - \beta_2 D_h(-y^k, -y^{k+1})\\ ={}& -\beta_2\langle \nabla h(A x^{k+1} - b) - \nabla h(-y^{k+1}),\ y^k - y^{k+1}\rangle - \beta_2 D_h(-y^k, -y^{k+1})\\ ={}& \langle \lambda^{k+1} - \lambda^k,\ y^{k+1} - y^k\rangle - \beta_2 D_h(-y^k, -y^{k+1}). \qquad (25) \end{aligned}$$

The last equality uses the fact that $\lambda^{k+1} = \lambda^k + \beta_2\big(\nabla h(A x^{k+1} - b) - \nabla h(-y^{k+1})\big)$.

Inserting the above three Equations (23), (24) and (25) into (22) yields

$$\begin{aligned} L(x^{k+1}, y^{k+1}, \lambda^k) \le{}& L(x^{k+1}, y^k, \lambda^k) + L D_h(y^k, y^{k+1}) - \beta_2 D_h(-y^k, -y^{k+1}) + \frac{L_2}{2}\|y^{k+1} - y^k\|^2\\ &+ \frac{\beta_1}{2}\|A x^{k+1} + y^{k+1} - b\|^2 - \frac{\beta_1}{2}\|A x^{k+1} + y^k - b\|^2 + \langle \lambda^{k+1} - \lambda^k,\ y^{k+1} - y^k\rangle\\ &- \langle \nabla g(y^{k+1}) + \nabla_y H(x^{k+1}, y^{k+1}),\ y^k - y^{k+1}\rangle + \langle \lambda^k,\ y^{k+1} - y^k\rangle. \end{aligned} \qquad (26)$$

From the optimality condition (13), $\lambda^{k+1} = -\big(\nabla g(y^{k+1}) + \nabla_y H(x^{k+1}, y^{k+1})\big)$, we get

$$\begin{aligned} L(x^{k+1}, y^{k+1}, \lambda^k) \le{}& L(x^{k+1}, y^k, \lambda^k) + L D_h(y^k, y^{k+1}) - \beta_2 D_h(-y^k, -y^{k+1}) + \frac{L_2}{2}\|y^{k+1} - y^k\|^2\\ &+ \frac{\beta_1}{2}\|A x^{k+1} + y^{k+1} - b\|^2 - \frac{\beta_1}{2}\|A x^{k+1} + y^k - b\|^2. \end{aligned} \qquad (27)$$

From the condition that $h$ is $\mu_h$-strongly convex and $\nabla h$ is Lipschitz continuous with constant $L_h$ on bounded sets, we get

$$\frac{\mu_h}{2}\|y^{k+1} - y^k\|^2 \le D_h(y^k, y^{k+1}) \le \frac{L_h}{2}\|y^{k+1} - y^k\|^2,$$

$$\frac{\mu_h}{2}\|y^{k+1} - y^k\|^2 \le D_h(-y^k, -y^{k+1}) \le \frac{L_h}{2}\|y^{k+1} - y^k\|^2. \qquad (28)$$

Substituting the aforementioned inequalities into (27) reveals

$$L(x^{k+1}, y^{k+1}, \lambda^k) \le L(x^{k+1}, y^k, \lambda^k) + \frac{L L_h + L_2 - \beta_2\mu_h}{2}\|y^{k+1} - y^k\|^2 + \frac{\beta_1}{2}\|A x^{k+1} + y^{k+1} - b\|^2 - \frac{\beta_1}{2}\|A x^{k+1} + y^k - b\|^2. \qquad (29)$$

By the definition of $L(\cdot)$, we obtain

$$\begin{aligned} L(x^{k+1}, y^k, \lambda^k) - L(x^k, y^k, \lambda^k) ={}& f(x^{k+1}) - f(x^k) + H(x^{k+1}, y^k) - H(x^k, y^k) + \langle \lambda^k,\ A x^{k+1} - A x^k\rangle\\ &+ \beta_2 D_h(-y^k, A x^{k+1} - b) - \beta_2 D_h(-y^k, A x^k - b)\\ &+ \frac{\beta_1}{2}\|A x^{k+1} + y^k - b\|^2 - \frac{\beta_1}{2}\|A x^k + y^k - b\|^2. \end{aligned} \qquad (30)$$

From Assumption 3.1, the partial gradient $\nabla_x H(\cdot, y)$ is globally Lipschitz with constant $L_3(y)$, which implies

$$\|\nabla_x H(x_1, y) - \nabla_x H(x_2, y)\| \le L_3(y)\|x_1 - x_2\|, \quad \forall x_1, x_2\in\mathbb{R}^n.$$

By Lemma 2.5, we know

$$H(x^{k+1}, y^k) - H(x^k, y^k) \le -\langle \nabla_x H(x^{k+1}, y^k), x^k - x^{k+1}\rangle + \frac{L_3(y^k)}{2}\|x^{k+1} - x^k\|^2. \qquad (31)$$

Besides, $f$ is semiconvex with parameter $\omega$, which by Remark 2.4 implies

$$f(x^{k+1}) - f(x^k) \le -\langle p, x^k - x^{k+1}\rangle + \frac{\omega}{2}\|x^{k+1} - x^k\|^2,$$

for all $p\in\partial f(x^{k+1})$.

From the optimality condition (13), we know

$$-\nabla_x H(x^{k+1}, y^k) - A^T\lambda^k - \beta_1 A^T(A x^{k+1} + y^k - b)\in\partial f(x^{k+1}).$$

Therefore,

$$\begin{aligned} f(x^{k+1}) - f(x^k) \le{}& \langle -\nabla_x H(x^{k+1}, y^k) - A^T\lambda^k - \beta_1 A^T(A x^{k+1} + y^k - b),\ x^{k+1} - x^k\rangle + \frac{\omega}{2}\|x^{k+1} - x^k\|^2\\ ={}& -\langle \nabla_x H(x^{k+1}, y^k),\ x^{k+1} - x^k\rangle - \langle A^T\lambda^k,\ x^{k+1} - x^k\rangle\\ &- \beta_1\langle A^T(A x^{k+1} + y^k - b),\ x^{k+1} - x^k\rangle + \frac{\omega}{2}\|x^{k+1} - x^k\|^2. \end{aligned} \qquad (32)$$

Inserting Equations (32) and (31) into (30) yields

$$\begin{aligned} L(x^{k+1}, y^k, \lambda^k) - L(x^k, y^k, \lambda^k) \le{}& -\langle \nabla_x H(x^{k+1}, y^k),\ x^{k+1} - x^k\rangle - \langle A^T\lambda^k,\ x^{k+1} - x^k\rangle\\ &- \beta_1\langle A^T(A x^{k+1} + y^k - b),\ x^{k+1} - x^k\rangle + \frac{\omega}{2}\|x^{k+1} - x^k\|^2\\ &- \langle \nabla_x H(x^{k+1}, y^k),\ x^k - x^{k+1}\rangle + \frac{L_3(y^k)}{2}\|x^{k+1} - x^k\|^2 + \langle \lambda^k,\ A x^{k+1} - A x^k\rangle\\ &+ \beta_2 D_h(-y^k, A x^{k+1} - b) - \beta_2 D_h(-y^k, A x^k - b)\\ &+ \frac{\beta_1}{2}\|A x^{k+1} + y^k - b\|^2 - \frac{\beta_1}{2}\|A x^k + y^k - b\|^2\\ \le{}& \beta_1\langle A^T(A x^{k+1} + y^k - b),\ x^k - x^{k+1}\rangle + \frac{\omega + L_3}{2}\|x^{k+1} - x^k\|^2\\ &+ \beta_2 D_h(-y^k, A x^{k+1} - b) - \beta_2 D_h(-y^k, A x^k - b)\\ &+ \frac{\beta_1}{2}\|A x^{k+1} + y^k - b\|^2 - \frac{\beta_1}{2}\|A x^k + y^k - b\|^2. \end{aligned} \qquad (33)$$

From the three points identity (8), we get

$$\begin{aligned} &\beta_2 D_h(-y^k, A x^{k+1} - b) - \beta_2 D_h(-y^k, A x^k - b)\\ ={}& -\beta_2\big(D_h(-y^k, A x^k - b) - D_h(-y^k, A x^{k+1} - b) - D_h(A x^{k+1} - b, A x^k - b)\big) - \beta_2 D_h(A x^{k+1} - b, A x^k - b)\\ ={}& -\beta_2\langle \nabla h(A x^k - b) - \nabla h(A x^{k+1} - b),\ A x^{k+1} + y^k - b\rangle - \beta_2 D_h(A x^{k+1} - b, A x^k - b). \end{aligned} \qquad (34)$$

By the strong convexity of $h$ (as in (28)), we know

$$D_h(A x^{k+1} - b, A x^k - b) \ge \frac{\mu_h}{2}\|A x^{k+1} - A x^k\|^2. \qquad (35)$$

Substituting the above two Equations (34) and (35) into (33), we obtain

$$\begin{aligned} L(x^{k+1}, y^k, \lambda^k) - L(x^k, y^k, \lambda^k) \le{}& \frac{\omega + L_3}{2}\|x^{k+1} - x^k\|^2 - \frac{\beta_2\mu_h}{2}\|A x^{k+1} - A x^k\|^2\\ &+ \frac{\beta_1}{2}\|A x^{k+1} + y^k - b\|^2 - \frac{\beta_1}{2}\|A x^k + y^k - b\|^2\\ &+ \beta_1\langle A^T(A x^{k+1} + y^k - b),\ x^k - x^{k+1}\rangle\\ &- \beta_2\langle \nabla h(A x^k - b) - \nabla h(A x^{k+1} - b),\ A x^{k+1} + y^k - b\rangle. \end{aligned} \qquad (36)$$

By the fact that $\nabla h$ is Lipschitz continuous with constant $L_h$ on any bounded subset of $\mathbb{R}^m$, we get

$$-\beta_2\langle \nabla h(A x^k - b) - \nabla h(A x^{k+1} - b),\ A x^{k+1} + y^k - b\rangle \le \beta_2 L_h\|A x^{k+1} + y^k - b\|\,\|A x^k - A x^{k+1}\|. \qquad (37)$$

Using Equations (21), (29) and (36), we obtain

$$\begin{aligned} L(x^{k+1}, y^{k+1}, \lambda^{k+1}) - L(x^k, y^k, \lambda^k) \le{}& \frac{\omega + L_3}{2}\|x^{k+1} - x^k\|^2 - \frac{\beta_2\mu_h}{2}\|A x^{k+1} - A x^k\|^2 - \frac{\beta_1}{2}\|A x^k + y^k - b\|^2\\ &+ \frac{L L_h + L_2 - \beta_2\mu_h}{2}\|y^{k+1} - y^k\|^2 + \frac{\beta_1}{2}\|A x^{k+1} + y^{k+1} - b\|^2 + \frac{1}{\beta_2\mu_h}\|\lambda^{k+1} - \lambda^k\|^2\\ &+ \beta_1\langle A^T(A x^{k+1} + y^k - b),\ x^k - x^{k+1}\rangle + \beta_2 L_h\|A x^{k+1} + y^k - b\|\,\|A x^k - A x^{k+1}\|. \end{aligned} \qquad (38)$$

It is not difficult to see that

$$\begin{aligned} &-\frac{\beta_1}{2}\|A x^k + y^k - b\|^2 + \beta_1\langle A^T(A x^{k+1} + y^k - b),\ x^k - x^{k+1}\rangle\\ ={}& -\frac{\beta_1}{2}\|A x^{k+1} + y^k - b + A x^k - A x^{k+1}\|^2 + \beta_1\langle A x^{k+1} + y^k - b,\ A x^k - A x^{k+1}\rangle\\ ={}& -\frac{\beta_1}{2}\big(\|A x^{k+1} + y^k - b\|^2 + \|A x^k - A x^{k+1}\|^2 + 2\langle A x^{k+1} + y^k - b,\ A x^k - A x^{k+1}\rangle\big)\\ &+ \beta_1\langle A x^{k+1} + y^k - b,\ A x^k - A x^{k+1}\rangle\\ ={}& -\frac{\beta_1}{2}\big(\|A x^{k+1} + y^k - b\|^2 + \|A x^k - A x^{k+1}\|^2\big)\\ \le{}& -\beta_1\|A x^{k+1} + y^k - b\|\,\|A x^k - A x^{k+1}\|\\ ={}& -\beta_2 L_h\|A x^{k+1} + y^k - b\|\,\|A x^k - A x^{k+1}\| \quad (\beta_1 = L_h\beta_2). \qquad (39) \end{aligned}$$

Combining (39) with (37) and using the condition $\beta_1 = L_h\beta_2$ yields

$$-\frac{\beta_1}{2}\|A x^k + y^k - b\|^2 + \beta_1\langle A^T(A x^{k+1} + y^k - b),\ x^k - x^{k+1}\rangle - \beta_2\langle \nabla h(A x^k - b) - \nabla h(A x^{k+1} - b),\ A x^{k+1} + y^k - b\rangle \le 0. \qquad (40)$$

From the assumption that $A^T A \succeq \mu I$ for some $\mu > 0$, we obtain

$$\|A x^{k+1} - A x^k\|^2 \ge \mu\|x^{k+1} - x^k\|^2. \qquad (41)$$

Substituting (40), (41) and (20) into (38) yields

$$\begin{aligned} L(x^{k+1}, y^{k+1}, \lambda^{k+1}) - L(x^k, y^k, \lambda^k) \le{}& \frac{L L_h + L_2 - \beta_2\mu_h}{2}\|y^{k+1} - y^k\|^2 + \frac{\omega + L_3}{2}\|x^{k+1} - x^k\|^2 - \frac{\beta_2\mu_h}{2}\|A x^{k+1} - A x^k\|^2\\ &+ \frac{\beta_1 + 2\beta_2\mu_h}{2\beta_2^2\mu_h^2}\|\lambda^{k+1} - \lambda^k\|^2\\ \le{}& \frac{L L_h + L_2 - \beta_2\mu_h}{2}\|y^{k+1} - y^k\|^2 + \frac{\omega + L_3 - \beta_2\mu_h\mu}{2}\|x^{k+1} - x^k\|^2\\ &+ \frac{L_h + 2\mu_h}{2\beta_2\mu_h^2}\|\lambda^{k+1} - \lambda^k\|^2. \end{aligned} \qquad (42)$$

From the optimality condition (13), we have

$$\lambda^{k+1} = -\big(\nabla g(y^{k+1}) + \nabla_y H(x^{k+1}, y^{k+1})\big). \qquad (43)$$

Then,

$$\begin{aligned} \|\lambda^{k+1} - \lambda^k\|^2 &= \|\nabla g(y^k) - \nabla g(y^{k+1}) + \nabla_y H(x^k, y^k) - \nabla_y H(x^{k+1}, y^{k+1})\|^2\\ &\le 2\|\nabla g(y^k) - \nabla g(y^{k+1})\|^2 + 2\|\nabla_y H(x^k, y^k) - \nabla_y H(x^{k+1}, y^{k+1})\|^2. \end{aligned} \qquad (44)$$

Using Assumption 3.1 4),

$$\|(\nabla_x H(x_1, y_1) - \nabla_x H(x_2, y_2),\ \nabla_y H(x_1, y_1) - \nabla_y H(x_2, y_2))\| \le M\|(x_1 - x_2,\ y_1 - y_2)\|,$$

we can obtain

$$\|\nabla_x H(x^k, y^k) - \nabla_x H(x^{k+1}, y^{k+1})\|^2 + \|\nabla_y H(x^k, y^k) - \nabla_y H(x^{k+1}, y^{k+1})\|^2 \le M^2\|x^{k+1} - x^k\|^2 + M^2\|y^{k+1} - y^k\|^2,$$

which implies

$$2\|\nabla_y H(x^k, y^k) - \nabla_y H(x^{k+1}, y^{k+1})\|^2 \le 2M^2\|x^{k+1} - x^k\|^2 + 2M^2\|y^{k+1} - y^k\|^2. \qquad (45)$$

Next, we consider two different scenarios.

1) $y^{k+1} \ne y^k$. From the assumption that $(g,h)$ is $L$-smooth adaptable and Equation (9), which states

$$\|\nabla g(x) - \nabla g(y)\| \le L\,\frac{D_h(x,y) + D_h(y,x)}{\|x - y\|} \quad \text{for all } x \ne y \in \mathrm{int}(\mathrm{dom}(h)),$$

we have

$$\begin{aligned} \|\nabla g(y^{k+1}) - \nabla g(y^k)\| &\le L\,\frac{D_h(y^{k+1}, y^k) + D_h(y^k, y^{k+1})}{\|y^{k+1} - y^k\|}\\ &= L\,\frac{h(y^{k+1}) - h(y^k) - \langle \nabla h(y^k), y^{k+1} - y^k\rangle + h(y^k) - h(y^{k+1}) - \langle \nabla h(y^{k+1}), y^k - y^{k+1}\rangle}{\|y^{k+1} - y^k\|}\\ &= L\,\frac{\langle \nabla h(y^k) - \nabla h(y^{k+1}),\ y^k - y^{k+1}\rangle}{\|y^{k+1} - y^k\|}\\ &\le L\,\frac{L_h\|y^k - y^{k+1}\|^2}{\|y^{k+1} - y^k\|} = L L_h\|y^{k+1} - y^k\|. \qquad (46) \end{aligned}$$

2) $y^{k+1} = y^k$. It is easy to verify that (46) also holds in this case.

By substituting Equations (45) and (46) into (44), we obtain

$$\|\lambda^{k+1} - \lambda^k\|^2 \le 2L^2L_h^2\|y^{k+1} - y^k\|^2 + 2M^2\|x^{k+1} - x^k\|^2 + 2M^2\|y^{k+1} - y^k\|^2 = (2L^2L_h^2 + 2M^2)\|y^{k+1} - y^k\|^2 + 2M^2\|x^{k+1} - x^k\|^2. \qquad (47)$$

Combining (47) and (42), we have

$$\begin{aligned} L(x^{k+1}, y^{k+1}, \lambda^{k+1}) - L(x^k, y^k, \lambda^k) \le{}& \frac{\beta_2\mu_h^2(L L_h + L_2 - \beta_2\mu_h) + (L_h + 2\mu_h)(2L^2L_h^2 + 2M^2)}{2\beta_2\mu_h^2}\|y^{k+1} - y^k\|^2\\ &+ \frac{\beta_2\mu_h^2(\omega + L_3 - \beta_2\mu_h\mu) + 2M^2(L_h + 2\mu_h)}{2\beta_2\mu_h^2}\|x^{k+1} - x^k\|^2. \end{aligned} \qquad (48)$$

By a simple calculation, when

$$\beta_2 > \max\{\beta_x, \beta_y\},$$

where

$$\beta_y := \frac{L L_h + L_2 + \sqrt{(L L_h + L_2)^2 + 4\mu_h^3(L_h + 2\mu_h)(2L^2L_h^2 + 2M^2)}}{2\mu_h^3}, \qquad (49)$$

$$\beta_x := \frac{\omega + L_3 + \sqrt{(\omega + L_3)^2 + 8\mu_h^3\mu M^2(L_h + 2\mu_h)}}{2\mu_h^3\mu}, \qquad (50)$$

then

$$L(w^{k+1}) \le L(w^k) - \delta\|v^{k+1} - v^k\|^2,$$

where

$$\delta := \min\{\delta_1, \delta_2\},$$

$$\delta_1 := \frac{-L L_h - L_2 + \beta_2\mu_h}{2} - \frac{2L^2L_h^2 + 2M^2}{\beta_2\mu_h} - \frac{L^2L_h^3 + M^2L_h}{\beta_2\mu_h^2},$$

$$\delta_2 := \frac{-\omega - L_3 + \beta_2\mu_h\mu}{2} - \frac{M^2(L_h + 2\mu_h)}{\beta_2\mu_h^2}.$$

Thus, the lemma is established.
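For concreteness, the thresholds (49)-(50) and the resulting descent constant $\delta$ can be evaluated numerically. The following is a small sketch under the assumption that the problem constants ($L$, $L_h$, $\mu_h$, $\mu$, $\omega$, $L_2$, $L_3$, $M$) are known or estimated for a given instance; the helper name `descent_constants` is illustrative only.

```python
import math


def descent_constants(L, L_h, mu_h, mu, omega, L2, L3, M, beta2=None):
    """Evaluate the penalty thresholds beta_y (49), beta_x (50) and, for a chosen
    beta2 > max(beta_x, beta_y), the descent constant delta = min(delta_1, delta_2)
    from Lemma 3.1."""
    beta_y = (L * L_h + L2
              + math.sqrt((L * L_h + L2) ** 2
                          + 4 * mu_h ** 3 * (L_h + 2 * mu_h)
                          * (2 * L ** 2 * L_h ** 2 + 2 * M ** 2))) / (2 * mu_h ** 3)
    beta_x = (omega + L3
              + math.sqrt((omega + L3) ** 2
                          + 8 * mu_h ** 3 * mu * M ** 2 * (L_h + 2 * mu_h))) / (2 * mu_h ** 3 * mu)
    if beta2 is None:
        beta2 = 1.01 * max(beta_x, beta_y)   # any value strictly above the threshold
    delta1 = (beta2 * mu_h - L * L_h - L2) / 2 \
        - (2 * L ** 2 * L_h ** 2 + 2 * M ** 2) / (beta2 * mu_h) \
        - (L ** 2 * L_h ** 3 + M ** 2 * L_h) / (beta2 * mu_h ** 2)
    delta2 = (beta2 * mu_h * mu - omega - L3) / 2 \
        - M ** 2 * (L_h + 2 * mu_h) / (beta2 * mu_h ** 2)
    return beta_x, beta_y, beta2, min(delta1, delta2)


# Sanity check of Remark 3.1: H = 0 gives L2 = L3 = M = 0, and with L_h = mu_h = 1
# the thresholds reduce to beta_y = 3L and beta_x = omega / mu.
print(descent_constants(L=1.0, L_h=1.0, mu_h=1.0, mu=0.5, omega=2.0, L2=0, L3=0, M=0))
```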

Remark 3.1. If $H = 0$, then we have $L_2 = L_3 = M = 0$. Moreover, with $L_h = \mu_h = 1$, we have $\beta_2 > \max\{3L, \frac{\omega}{\mu}\}$. This threshold is larger than the bound $2L$ proved in Guo et al.'s classical result [8] . However, if we apply the analysis directly to the separable problem, it recovers their conclusion. The reason for our weaker threshold may be that the inequality bounds in our proof are not tight enough.

Lemma 3.2. Let $\{w^k\}_{k\in\mathbb{N}}$ be the sequence generated by the Bregman ADMM (6), which is assumed to be bounded. Then the following holds:

$$\sum_{k=0}^{+\infty}\|w^{k+1} - w^k\|^2 < +\infty. \qquad (51)$$

Proof.

Since $\{w^k\}$ is bounded, there exists a subsequence $\{w^{k_j}\}$ such that $w^{k_j} \to w^*$. Since $f$ is lower semicontinuous and $g$, $H$ are continuous, $L(\cdot)$ is lower semicontinuous, which implies

$$L(w^*) \le \liminf_{j\to+\infty} L(w^{k_j}).$$

Consequently, $\{L(w^{k_j})\}_{j\in\mathbb{N}}$ is bounded from below.

Note that Lemma 3.1 implies that $\{L(w^k)\}_{k\in\mathbb{N}}$ is nonincreasing, and thus $\{L(w^k)\}_{k\in\mathbb{N}}$ is convergent with $L(w^k) \ge L(w^*)$. Rearranging terms of (14) leads to

$$\delta\|v^{k+1} - v^k\|^2 \le L(w^k) - L(w^{k+1}).$$

Now, summing the above inequality over $k = 0, \ldots, n$ yields

$$\sum_{k=0}^{n}\delta\|v^{k+1} - v^k\|^2 \le L(w^0) - L(w^{n+1}) \le L(w^0) - L(w^*) < +\infty.$$

Since $\delta > 0$, we have $\sum_{k=0}^{+\infty}\|v^{k+1} - v^k\|^2 < +\infty$, and thus

$$\sum_{k=0}^{+\infty}\|x^{k+1} - x^k\|^2 < +\infty, \quad \sum_{k=0}^{+\infty}\|y^{k+1} - y^k\|^2 < +\infty. \qquad (52)$$

Moreover, it follows from (52) and (47) that $\sum_{k=0}^{+\infty}\|\lambda^{k+1} - \lambda^k\|^2 < +\infty$; therefore, we obtain $\sum_{k=0}^{+\infty}\|w^{k+1} - w^k\|^2 < +\infty$.

Lemma 3.3. Let $\{w^k\}_{k\in\mathbb{N}}$ be the sequence generated by the Bregman ADMM (6), which is assumed to be bounded. Then there exists $\zeta > 0$ such that

$$d\big(0, \partial L(w^{k+1})\big) \le \zeta\|v^{k+1} - v^k\|. \qquad (53)$$

Proof.

By the definition of the function $L(\cdot)$, it follows that

$$\begin{cases} \partial_x L(w^{k+1}) = \partial f(x^{k+1}) + \nabla_x H(x^{k+1}, y^{k+1}) + A^T\lambda^{k+1} + L_h\beta_2 A^T(A x^{k+1} + y^{k+1} - b) + \beta_2 A^T\nabla^2 h(A x^{k+1} - b)(A x^{k+1} + y^{k+1} - b),\\ \nabla_y L(w^{k+1}) = \nabla g(y^{k+1}) + \nabla_y H(x^{k+1}, y^{k+1}) + \lambda^{k+1} + L_h\beta_2(A x^{k+1} + y^{k+1} - b) + \beta_2\big(\nabla h(A x^{k+1} - b) - \nabla h(-y^{k+1})\big),\\ \nabla_\lambda L(w^{k+1}) = A x^{k+1} + y^{k+1} - b. \end{cases} \qquad (54)$$

This, together with Equation (13), yields

$$\begin{cases} \nabla_x H(x^{k+1}, y^{k+1}) - \nabla_x H(x^{k+1}, y^k) + A^T(\lambda^{k+1} - \lambda^k) + L_h\beta_2 A^T(y^{k+1} - y^k) + \beta_2 A^T\nabla^2 h(A x^{k+1} - b)(A x^{k+1} + y^{k+1} - b)\in\partial_x L(w^{k+1}),\\ \nabla_y L(w^{k+1}) = \lambda^{k+1} - \lambda^k + L_h\beta_2(A x^{k+1} + y^{k+1} - b),\\ \nabla_\lambda L(w^{k+1}) = A x^{k+1} + y^{k+1} - b. \end{cases} \qquad (55)$$

Let

$$\big(\xi_{k+1}^1, \xi_{k+1}^2, \xi_{k+1}^3\big) := \Big(\nabla_x H(x^{k+1}, y^{k+1}) - \nabla_x H(x^{k+1}, y^k) + A^T(\lambda^{k+1} - \lambda^k) + L_h\beta_2 A^T(y^{k+1} - y^k) + \beta_2 A^T\nabla^2 h(A x^{k+1} - b)(A x^{k+1} + y^{k+1} - b),\ \ \lambda^{k+1} - \lambda^k + L_h\beta_2(A x^{k+1} + y^{k+1} - b),\ \ A x^{k+1} + y^{k+1} - b\Big). \qquad (56)$$

Then it follows from Lemma 2.3 that $\big(\xi_{k+1}^1, \xi_{k+1}^2, \xi_{k+1}^3\big)\in\partial L(w^{k+1})$. Furthermore, there exist $\zeta_1, \zeta_2, \zeta_3 > 0$ such that

$$\big\|\big(\xi_{k+1}^1, \xi_{k+1}^2, \xi_{k+1}^3\big)\big\| \le \zeta_1\|y^{k+1} - y^k\| + \zeta_2\|\lambda^{k+1} - \lambda^k\| + \zeta_3\|\nabla_x H(x^{k+1}, y^{k+1}) - \nabla_x H(x^{k+1}, y^k)\|. \qquad (57)$$

Again, since $\nabla H$ is Lipschitz continuous on bounded subsets and $\{(x^k, y^k)\}$ is bounded, it follows that

$$\|\nabla_x H(x^{k+1}, y^{k+1}) - \nabla_x H(x^{k+1}, y^k)\| \le M\|y^{k+1} - y^k\|. \qquad (58)$$

From (47), we know

$$\|\lambda^{k+1} - \lambda^k\| \le \sqrt{2L^2L_h^2 + 2M^2}\,\|y^{k+1} - y^k\| + \sqrt{2}M\|x^{k+1} - x^k\|. \qquad (59)$$

By setting $\zeta = \sqrt{\big(\zeta_1 + \sqrt{2L^2L_h^2 + 2M^2}\,\zeta_2 + M\zeta_3\big)^2 + \big(\sqrt{2}M\zeta_2\big)^2}$, it follows from the above three Equations (57), (58), (59) that

$$d\big(0, \partial L(w^{k+1})\big) \le \big\|\big(\xi_{k+1}^1, \xi_{k+1}^2, \xi_{k+1}^3\big)\big\| \le \big(\zeta_1 + \sqrt{2L^2L_h^2 + 2M^2}\,\zeta_2 + M\zeta_3\big)\|y^{k+1} - y^k\| + \sqrt{2}M\zeta_2\|x^{k+1} - x^k\| \le \zeta\|v^{k+1} - v^k\|, \qquad (60)$$

where the last inequality follows from the Cauchy-Schwarz inequality. Thus, Lemma 3.3 is proven.

Lemma 3.4. Let $\{w^k\}_{k\in\mathbb{N}}$ be the sequence generated by the Bregman ADMM (6), which is assumed to be bounded. Let $S(w^0)$ denote the set of its limit points. Then,

1) $S(w^0)$ is a nonempty compact set, and $d\big(w^k, S(w^0)\big) \to 0$ as $k\to+\infty$;

2) $S(w^0) \subseteq \mathrm{crit}\,\mathcal{L}_\beta^h$;

3) $L(\cdot)$ is finite and constant on $S(w^0)$, equal to $\inf_{k\in\mathbb{N}} L(w^k) = \lim_{k\to+\infty} L(w^k)$.

Proof.

1) This item follows as an elementary consequence of the definition of limit points.

2) For any fixed $(x^*, y^*, \lambda^*)\in S(w^0)$, there exists a subsequence $\{(x^{k_j}, y^{k_j}, \lambda^{k_j})\}_{j\in\mathbb{N}}$ of $\{(x^k, y^k, \lambda^k)\}_{k\in\mathbb{N}}$ that converges to $(x^*, y^*, \lambda^*)$.

Since $x^{k+1}$ is the global minimizer of $L(x, y^k, \lambda^k)$ with respect to the variable $x$, it holds that

$$L(x^{k+1}, y^k, \lambda^k) \le L(x^*, y^k, \lambda^k).$$

Using the above inequality and the continuity of $L(\cdot)$ with respect to $y$ and $\lambda$ ensures

$$\limsup_{j\to+\infty} L(x^{k_j+1}, y^{k_j}, \lambda^{k_j}) = \limsup_{j\to+\infty} L(x^{k_j+1}, y^{k_j+1}, \lambda^{k_j+1}) \le L(x^*, y^*, \lambda^*).$$

On the other hand, Lemma 3.2 implies $\|w^{k+1} - w^k\| \to 0$, which means that the subsequence $\{(x^{k_j+1}, y^{k_j+1}, \lambda^{k_j+1})\}_{j\in\mathbb{N}}$ also converges to $(x^*, y^*, \lambda^*)$.

From the lower semicontinuity of $L(\cdot)$, we have

$$\liminf_{j\to+\infty} L(x^{k_j+1}, y^{k_j+1}, \lambda^{k_j+1}) \ge L(x^*, y^*, \lambda^*). \qquad (61)$$

Then, by combining the above two relations, we get

$$\lim_{j\to+\infty} L(x^{k_j+1}, y^{k_j+1}, \lambda^{k_j+1}) = L(x^*, y^*, \lambda^*), \qquad (62)$$

which implies

$$\lim_{j\to+\infty} f(x^{k_j+1}) = f(x^*). \qquad (63)$$

Passing to the limit in (13) and (54) along the subsequence $\{(x^{k_j+1}, y^{k_j+1}, \lambda^{k_j+1})\}_{j\in\mathbb{N}}$ and invoking (63) and the continuity of $\nabla g(\cdot)$, $\nabla_x H(\cdot,\cdot)$ and $\nabla_y H(\cdot,\cdot)$, it follows that

$$\begin{cases} -\nabla_x H(x^*, y^*) - A^T\lambda^*\in\partial f(x^*),\\ -\nabla_y H(x^*, y^*) - \lambda^* = \nabla g(y^*),\\ \nabla h(A x^* - b) - \nabla h(-y^*) = 0. \end{cases}$$

The last equation implies that $A x^* + y^* - b = 0$ due to the strong convexity of $h(\cdot)$.

By the definition of $L(\cdot)$, we have

$$\begin{cases} 0\in\partial_x L(w^*) = \partial f(x^*) + \nabla_x H(x^*, y^*) + A^T\lambda^*,\\ 0 = \nabla_y L(w^*) = \nabla g(y^*) + \nabla_y H(x^*, y^*) + \lambda^*,\\ 0 = \nabla_\lambda L(w^*) = A x^* + y^* - b. \end{cases} \qquad (64)$$

By the definition of a critical point of the augmented Lagrangian function with Bregman distance (Definition 2.9), $(x^*, y^*, \lambda^*)$ satisfies

$$\begin{cases} -A^T\lambda^* - \nabla_x H(x^*, y^*)\in\partial f(x^*),\\ -\lambda^* - \nabla_y H(x^*, y^*) = \nabla g(y^*),\\ A x^* + y^* - b = 0. \end{cases}$$

Then $(x^*, y^*, \lambda^*)$ is a critical point of both $\mathcal{L}_\beta^h(\cdot)$ and $L(\cdot)$, i.e., $(x^*, y^*, \lambda^*)\in\mathrm{crit}\,\mathcal{L}_\beta^h$ and $(x^*, y^*, \lambda^*)\in\mathrm{crit}\, L$.

3) For any point $(x^*, y^*, \lambda^*)\in S(w^0)$, there exists a subsequence $\{(x^{k_j}, y^{k_j}, \lambda^{k_j})\}_{j\in\mathbb{N}}$ that converges to $(x^*, y^*, \lambda^*)$.

Since $\{L(w^k)\}_{k\in\mathbb{N}}$ is nonincreasing, combining (61) and (62) leads to

$$\lim_{k\to+\infty} L(x^k, y^k, \lambda^k) = L(x^*, y^*, \lambda^*). \qquad (65)$$

Therefore, $L(\cdot)$ is constant on $S(w^0)$. Moreover, $\inf_{k\in\mathbb{N}} L(w^k) = \lim_{k\to+\infty} L(w^k)$.

Next, we prove the convergence of the iterative sequence.

Theorem 3.1. Let $\{w^k\}_{k\in\mathbb{N}}$ be the sequence generated by the Bregman ADMM (6), which is assumed to be bounded. Suppose that $L(\cdot)$ is a KL function. Then $\{w^k\}_{k\in\mathbb{N}}$ has finite length, that is,

$$\sum_{k=0}^{+\infty}\|w^{k+1} - w^k\| < +\infty,$$

and as a consequence, $\{w^k\}_{k\in\mathbb{N}}$ converges to a critical point of $\mathcal{L}_\beta^h(\cdot)$.

Proof.

From the proof of Lemma 3.4, it follows that $L(w^k) \to L(w^*)$ for all $w^*\in S(w^0)$. We consider two cases.

1) If there exists an integer $k_0$ for which $L(w^{k_0}) = L(w^*)$, then rearranging terms of (14) gives, for any $k > k_0$,

$$\delta\|v^{k+1} - v^k\|^2 \le L(w^k) - L(w^{k+1}) \le L(w^{k_0}) - L(w^*) = 0,$$

implying that $v^{k+1} = v^k$ for any $k > k_0$. Together with (47), it follows that $w^{k+1} = w^k$ for any $k > k_0 + 1$, and the assertion holds.

2) Now assume $L(w^k) > L(w^*)$ for all $k$. We claim that there exists $\tilde{k} > 0$ such that for all $k > \tilde{k}$,

$$\delta\|v^{k+1} - v^k\|^2 \le \zeta\|v^k - v^{k-1}\|\,\Delta_{k,k+1}, \qquad (66)$$

where $\Delta_{p,q} := \varphi\big(L(w^p) - L(w^*)\big) - \varphi\big(L(w^q) - L(w^*)\big)$. To see this, note that $d\big(w^k, S(w^0)\big) \to 0$ and $L(w^k) \to L(w^*)$; then for all $\varepsilon, \eta > 0$, there exists $\tilde{k} > 0$ such that for $k > \tilde{k}$ we have

$$d\big(w^k, S(w^0)\big) < \varepsilon, \quad L(w^*) < L(w^k) < L(w^*) + \eta.$$

Since $S(w^0)$ is a nonempty compact set and $L(\cdot)$ is constant on $S(w^0)$, applying Lemma 2.4 with $\Omega := S(w^0)$, we deduce that for any $k > \tilde{k}$,

$$\varphi'\big(L(w^k) - L(w^*)\big)\, d\big(0, \partial L(w^k)\big) \ge 1.$$

Since $L(w^k) - L(w^{k+1}) = \big(L(w^k) - L(w^*)\big) - \big(L(w^{k+1}) - L(w^*)\big)$, making use of the concavity of $\varphi$ we get

$$\varphi\big(L(w^k) - L(w^*)\big) - \varphi\big(L(w^{k+1}) - L(w^*)\big) \ge \varphi'\big(L(w^k) - L(w^*)\big)\big(L(w^k) - L(w^{k+1})\big).$$

Thus, using the inequalities $d\big(0, \partial L(w^k)\big) \le \zeta\|v^k - v^{k-1}\|$ and $\varphi'\big(L(w^k) - L(w^*)\big) > 0$, we obtain

$$L(w^k) - L(w^{k+1}) \le \frac{\varphi\big(L(w^k) - L(w^*)\big) - \varphi\big(L(w^{k+1}) - L(w^*)\big)}{\varphi'\big(L(w^k) - L(w^*)\big)} \le \zeta\|v^k - v^{k-1}\|\,\big[\varphi\big(L(w^k) - L(w^*)\big) - \varphi\big(L(w^{k+1}) - L(w^*)\big)\big].$$

Combining Lemma 3.1 with the above relation gives (66), as desired. Moreover, (66) implies

$$\|v^{k+1} - v^k\| \le \left(\frac{\zeta}{\delta}\,\Delta_{k,k+1}\,\|v^k - v^{k-1}\|\right)^{\frac{1}{2}}.$$

Notice that $2\sqrt{\alpha\beta} \le \alpha + \beta$ for all $\alpha, \beta \ge 0$. Then we obtain

$$2\|v^{k+1} - v^k\| \le \|v^k - v^{k-1}\| + \frac{\zeta}{\delta}\,\Delta_{k,k+1}.$$

Summing the inequality above over $k = \tilde{k}+1, \ldots, m$ yields

$$2\sum_{k=\tilde{k}+1}^{m}\|v^{k+1} - v^k\| \le \sum_{k=\tilde{k}+1}^{m}\|v^k - v^{k-1}\| + \frac{\zeta}{\delta}\,\Delta_{\tilde{k}+1, m+1}.$$

Since $\varphi\big(L(w^{m+1}) - L(w^*)\big) > 0$, rearranging terms and letting $m\to+\infty$ lead to

$$\sum_{k=\tilde{k}+1}^{+\infty}\|v^{k+1} - v^k\| \le \|v^{\tilde{k}+1} - v^{\tilde{k}}\| + \frac{\zeta}{\delta}\,\varphi\big(L(w^{\tilde{k}+1}) - L(w^*)\big),$$

which implies

$$\sum_{k=0}^{+\infty}\|v^{k+1} - v^k\| < +\infty.$$

Thus, it follows that

$$\sum_{k=0}^{+\infty}\|x^{k+1} - x^k\| < +\infty, \quad \sum_{k=0}^{+\infty}\|y^{k+1} - y^k\| < +\infty.$$

These, together with (47), yield

$$\sum_{k=0}^{+\infty}\|\lambda^{k+1} - \lambda^k\| < +\infty.$$

Moreover, note that

$$\|w^{k+1} - w^k\| = \big(\|x^{k+1} - x^k\|^2 + \|y^{k+1} - y^k\|^2 + \|\lambda^{k+1} - \lambda^k\|^2\big)^{\frac{1}{2}} \le \|x^{k+1} - x^k\| + \|y^{k+1} - y^k\| + \|\lambda^{k+1} - \lambda^k\|.$$

Therefore,

$$\sum_{k=0}^{+\infty}\|w^{k+1} - w^k\| < +\infty,$$

and $\{w^k\}_{k\in\mathbb{N}}$ is a Cauchy sequence (see [27] , P. 482 for a simple proof), which shows that the sequence generated by our algorithm converges. The assertion then follows immediately from Lemma 3.4.

We now establish the convergence rate for the Bregman ADMM (6). The proof of the following result is similar to that in [8] and hence omitted here.

Theorem 3.2. Let $\{w^k\}_{k\in\mathbb{N}}$ be the sequence generated by the Bregman ADMM (6) and suppose it converges to $w^* := (x^*, y^*, \lambda^*)$. Assume that $L(\cdot)$ has the KL property at $(x^*, y^*, \lambda^*)$ with $\varphi(s) := cs^{1-\theta}$, $\theta\in[0,1)$, $c > 0$. Then the following estimations hold:

1) If $\theta = 0$, then the sequence $\{w^k\}_{k\in\mathbb{N}}$ converges in a finite number of steps.

2) If $\theta\in(0, \frac{1}{2}]$, then there exist $c > 0$ and $\tau\in[0,1)$ such that

$$\|(x^k, y^k, \lambda^k) - (x^*, y^*, \lambda^*)\| \le c\tau^k.$$

3) If $\theta\in(\frac{1}{2}, 1)$, then there exists $c > 0$ such that

$$\|(x^k, y^k, \lambda^k) - (x^*, y^*, \lambda^*)\| \le ck^{\frac{\theta - 1}{2\theta - 1}}.$$

4. Conclusions

In this paper, we conducted an analysis of the convergence properties of the Bregman Alternating Direction Method of Multipliers (ADMM) for addressing linearly constrained nonconvex minimization problems with coupled objective functions. Our study operated under the assumption that the associated functions satisfy the Kurdyka-Lojasiewicz inequality. We successfully demonstrated that the iterative sequence generated by the Bregman ADMM converges to a critical point of the augmented Lagrangian function, given that the penalty parameter in the augmented Lagrangian surpasses a certain threshold. Additionally, with further conditions on the problem’s data, we established the convergence rate of the algorithm. Looking forward, extending Algorithm (6) to handle multi-block non-separable linearly constrained nonconvex optimization problems and incorporating generalized ADMM represent meaningful avenues for future exploration.

Funding

The authors were supported by the Natural Science Foundation of China (Grant No. 11801455), the Sichuan Science and Technology Program (Grant No. 2023NSFSC1922), the Innovation Team Funds of China West Normal University (Grant No. KCXTD2023-3), and the Fundamental Research Funds of China West Normal University (Grant No. 23kc010).

Conflicts of Interest

The authors declare no conflicts of interest regarding the publication of this paper.

References

[1] Alizadeh, M., Li, X., Wang, Z., Scaglione, A. and Melton, R. (2012) Demand-Side Management in the Smart Grid: Information Processing for the Power Switch. IEEE Signal Processing Magazine, 29, 55-67.
https://doi.org/10.1109/MSP.2012.2192951
[2] Chang, T.H., Alizadeh, M. and Scaglione, A. (2012) Coordinated Home Energy Management for Real-Time Power Balancing. 2012 IEEE Power and Energy Society General Meeting, San Diego, 22-26 July 2012, 22-26.
[3] Li, N., Chen, L. and Low, S.H. (2011) Optimal Demand Response Based on Utility Maximization in Power Networks. 2011 IEEE Power and Energy Society General Meeting, Detroit, 24-28 July 2011, 1-8.
https://doi.org/10.1109/PES.2011.6039082
[4] Paatero, J.V. and Lund, P.D. (2006) A Model for Generating Household Electricity Load Profiles. International Journal of Energy Research, 30, 273-290.
https://doi.org/10.1002/er.1136
[5] Scutari, G. and Palomar, D.P. (2010) MIMO Cognitive Radio: A Game Theoretical Approach. IEEE Transactions on Signal Processing, 58, 761-780.
https://doi.org/10.1109/TSP.2009.2032039
[6] Zhao, Q. and Sadler, B.M. (2007) A Survey of Dynamic Spectrum Access. IEEE Signal Processing Magazine, 24, 79-89.
https://doi.org/10.1109/MSP.2007.361604
[7] Guo, K., Han, D.R. and Wu, T.T. (2018) Convergence Analysis for Optimization Problems with Nonseparable Nonconvex Objective and Linear Constraints. Pacific Journal of Optimization, 14, 489-506.
[8] Guo, K., Han, D.R. and Wu, T.T. (2017) Convergence of Alternating Method for Minimizing Sum of Two Nonconvex Functions with Linear Constraints. International Journal of Computer Mathematics, 94, 1653-1669.
https://doi.org/10.1080/00207160.2016.1227432
[9] Kong, W.W. and Monteiro, R.D.C. (2024) Global Complexity Bound of a Proximal ADMM for Linearly-Constrained Nonseparable Nonconvex Composite Programming. SIAM Journal on Optimization, 34.
[10] Guo, K., Han, D., Wang, D.Z.W. and Wu, T.T. (2017) Convergence of ADMM for Multi-Block Nonconvex Separable Optimization Models. Frontiers of Mathematics in China, 12, 1139-1162.
[11] Guo, K. and Wang, X. (2018) Convergence of Generalized Alternating Direction Method of Multipliers for Nonseparable Nonconvex Objective with Linear Constraints.
https://doi.org/10.3770/j.issn:2095-2651.2018.05.010
[12] Tan, L. and Guo, K. (2024) Bregman ADMM: A New Algorithm for Nonconvex Optimization with Linear Constraints, Accepted. Journal of Nonlinear and Variational Analysis, 8.
[13] Boyd, S., Parikh, N., Chu, E., Peleato, B. and Eckstein, J. (2010) Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers. Foundations and Trends in Machine Learning, 3, 1-122.
https://doi.org/10.1561/2200000016
[14] Figueiredo, M.A.T. and Bioucas-Dias, J.M. (2010) Restoration of Poissonian Images Using Alternating Direction Optimization. IEEE Transactions on Image Processing, 19, 3133-3145.
https://doi.org/10.1109/TIP.2010.2053941
[15] Goldstein, T., Bresson, X. and Osher, S. (2010) Geometric Applications of the Split Bregman Method: Segmentation and Surface Reconstruction. Journal of Scientific Computing, 45, 272-293.
https://doi.org/10.1007/s10915-009-9331-z
[16] Fu, Q., Wang, H. and Banerjee, A. (2013) Bethe-ADMM for Tree Decomposition Based Parallel MAP Inference. arXiv: 1309.6829.
[17] Wang, H. and Banerjee, A. (2013) Online Alternating Direction Method. arXiv: 1206.6448.
[18] Yang, J. and Zhang, Y. (2011) Alternating Direction Algorithms for ℓ1-Problems in Compressive Sensing. SIAM Journal on Scientific Computing, 33, 250-278.
https://doi.org/10.1137/090777761
[19] Gabay, D. (1983) Applications of the Method of Multipliers to Variational Inequalities. Studies in Mathematics and Its Applications, 15, 299-331.
https://doi.org/10.1016/S0168-2024(08)70034-1
[20] Kellogg, R.B. (1969) Nonlinear Alternating Direction Algorithm. Mathematics of Computation, 23, 23-38.
[21] Marchuk, G.I. (1990) Splitting and Alternating Direction Methods. Handbook of Numerical Analysis, 1, 197-462.
https://doi.org/10.1016/S1570-8659(05)80035-3
[22] Beck, A. (2017) First-Order Methods in Optimization. SIAM, Philadelphia.
https://doi.org/10.1137/1.9781611974997
[23] Bolte, J., Sabach, S., Teboulle, M. and Vaisbourd, T. (2018) First Order Methods beyond Convexity and Lipschitz Gradient Continuity with Applications to Quadratic Inverse Problems. SIAM Journal on Optimization, 28, 2131-2151.
https://doi.org/10.1137/17M1138558
[24] Bauschke, H.H., Borwein, J. and Combettes, P.L. (2003) Bregman Monotone Optimization Algorithms. SIAM Journal on Control and Optimization, 42, 596-636.
https://doi.org/10.1137/S0363012902407120
[25] Attouch, H., Bolte, J., Redont, P. and Soubeyran, A. (2010) Proximal Alternating Minimization and Projection Methods for Nonconvex Problems: An Approach Based on the Kurdyka Lojasiewicz Inequality. Mathematics of Operations Research, 35, 438-457.
https://doi.org/10.1287/moor.1100.0449
[26] Attouch, H., Bolte, J. and Svaiter, B.F. (2013) Convergence of Descent Methods for Semi-Algebraic and Tame Problems: Proximal Algorithms, Forward-Backward Splitting, and Regularized Gauss-Seidel Methods. Mathematical Programming, 137, 91-129.
https://doi.org/10.1007/s10107-011-0484-9
[27] Bolte, J., Sabach, S. and Teboulle, M. (2014) Proximal Alternating Linearized Minimization for Nonconvex and Nonsmooth Problem. Mathematical Programming, 146, 459-494.
https://doi.org/10.1007/s10107-013-0701-9
[28] Nesterov, Y. (2004) Introductory Lectures on Convex Optimization: A Basic Course. Springer, New York.
https://doi.org/10.1007/978-1-4419-8853-9

Copyright © 2024 by authors and Scientific Research Publishing Inc.


This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.