A Symmetric Bregman Alternating Direction Method of Multipliers for Separable Nonconvex Minimization Problem
1. Introduction
In this paper, we consider the following two-block separable optimization problem with linear constraints:
$$\min_{x,y}\ f(x)+g(y)\quad \text{s.t.}\quad Ax+By=b, \tag{1.1}$$
where $f:\mathbb{R}^{n}\to\mathbb{R}\cup\{+\infty\}$ is a proper lower semicontinuous function, $g:\mathbb{R}^{m}\to\mathbb{R}$ is a continuously differentiable function, $A\in\mathbb{R}^{p\times n}$, $B\in\mathbb{R}^{p\times m}$, and $b\in\mathbb{R}^{p}$. Numerous valuable optimization problems can be expressed in the form (1.1), rendering it widely applicable across diverse fields, such as machine learning [1]-[3], image processing [4]-[6], and signal processing [7]-[10].
When both $f$ and $g$ are convex functions, a prominent approach for addressing problem (1.1) is the alternating direction method of multipliers (ADMM), proposed by Gabay, Mercier, Glowinski and Marroco [11] [12] in the 1970s. This method fully exploits the separable structure of the problem and has therefore attracted considerable attention across diverse domains in recent years. The iterative scheme of the ADMM is as follows:
$$\left\{\begin{aligned}
x^{k+1}&\in\arg\min_{x}\ L_{\beta}(x,y^{k},\lambda^{k}),\\
y^{k+1}&\in\arg\min_{y}\ L_{\beta}(x^{k+1},y,\lambda^{k}),\\
\lambda^{k+1}&=\lambda^{k}-\beta(Ax^{k+1}+By^{k+1}-b).
\end{aligned}\right. \tag{1.2}$$
Here, $L_{\beta}$ denotes the augmented Lagrangian function for (1.1), given by
$$L_{\beta}(x,y,\lambda)=f(x)+g(y)-\langle\lambda,Ax+By-b\rangle+\frac{\beta}{2}\|Ax+By-b\|^{2},$$
where $\lambda\in\mathbb{R}^{p}$ is the Lagrangian multiplier associated with the linear constraint, and $\beta>0$ is the penalty parameter. The study of ADMM has a lengthy academic lineage, and its convergence and convergence rate are well understood for convex objectives [13]-[16].
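For concreteness, here is a minimal numpy sketch of scheme (1.2) on a hypothetical toy instance, $f(x)=\|x\|_{1}$, $g(y)=\frac{1}{2}\|y-d\|^{2}$, $A=I$, $B=-I$, $b=0$ (i.e., the constraint $x=y$), chosen so that both subproblems have closed-form solutions; the instance and parameter values are illustrative assumptions of ours, not data from the paper.

```python
import numpy as np

def soft_threshold(v, t):
    # proximal map of t * ||.||_1: elementwise shrinkage
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def admm(d, beta=1.0, iters=200):
    # classical ADMM (1.2) for: min ||x||_1 + 0.5*||y - d||^2  s.t.  x - y = 0
    x = np.zeros_like(d)
    y = np.zeros_like(d)
    lam = np.zeros_like(d)  # multiplier for the constraint x - y = 0
    for _ in range(iters):
        # x-step: argmin_x ||x||_1 - <lam, x> + (beta/2)*||x - y||^2
        x = soft_threshold(y + lam / beta, 1.0 / beta)
        # y-step: argmin_y 0.5*||y - d||^2 + <lam, y> + (beta/2)*||x - y||^2
        y = (d - lam + beta * x) / (1.0 + beta)
        # multiplier step: lam <- lam - beta*(A x + B y - b), with A = I, B = -I, b = 0
        lam = lam - beta * (x - y)
    return x, y, lam

x, y, lam = admm(np.array([3.0, 0.4, -2.0]))
print(x, y)  # converges to soft_threshold(d, 1): components with |d_i| <= 1 vanish
```

On this convex instance the iterates converge to $x=y=\operatorname{soft}(d,1)$, which is the minimizer of $\|x\|_{1}+\frac{1}{2}\|x-d\|^{2}$.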
However, when at least one component of the objective function is nonconvex, most studies focus on proving the convergence of the ADMM or one of its variants under additional conditions on the problem, such as Li and Pong [17] and Hong et al. [18]. In particular, in 2017, Guo et al. [19] demonstrated that, under conditions less stringent than those outlined in [17] [18], the convergence and convergence rate of ADMM for nonconvex problems can be established, provided that the augmented Lagrangian function satisfies the Kurdyka-Lojasiewicz inequality. Motivated by the insights provided in the aforementioned article, Wu et al. [20] applied a symmetric ADMM, which reduces to the classical ADMM as a special case, to two-block linearly constrained separable nonconvex optimization, and analyzed its convergence and convergence rate for the case $B=I$ ($I$ is the identity matrix with proper dimension). Note that the symmetric variant can numerically accelerate ADMM for some values of the relaxation factor $r$. Its iterative scheme is as follows:
$$\left\{\begin{aligned}
x^{k+1}&\in\arg\min_{x}\ L_{\beta}(x,y^{k},\lambda^{k}),\\
\lambda^{k+\frac{1}{2}}&=\lambda^{k}-r\beta(Ax^{k+1}+By^{k}-b),\\
y^{k+1}&\in\arg\min_{y}\ L_{\beta}(x^{k+1},y,\lambda^{k+\frac{1}{2}}),\\
\lambda^{k+1}&=\lambda^{k+\frac{1}{2}}-\beta(Ax^{k+1}+By^{k+1}-b).
\end{aligned}\right. \tag{1.3}$$
The only difference from the classical ADMM lies in the additional update of the multiplier $\lambda$, with relaxation factor $r$, inserted between the iterative formulas for $x$ and $y$. In particular, the algorithm returns to the classical ADMM when $r=0$. Therefore, it inherits the same limitations in certain respects [21]-[23]. Whether the classical ADMM or the symmetric ADMM is used to tackle two-block separable nonconvex problems with linear constraints, both methods rely on the assumption that the gradient of the differentiable component is Lipschitz continuous.
To relax the Lipschitz continuity requirement on the gradient of the objective function, Tan and Guo [24] introduced a novel version of the Bregman ADMM, distinguished from that proposed by Wang et al. [25] by its ability to revert to the classical ADMM. Its iteration is as follows:
$$\left\{\begin{aligned}
x^{k+1}&\in\arg\min_{x}\ L_{\beta}(x,y^{k},\lambda^{k}),\\
y^{k+1}&\in\arg\min_{y}\ L_{\beta}(x^{k+1},y,\lambda^{k}),\\
\lambda^{k+1}&=\lambda^{k}-\beta(Ax^{k+1}+By^{k+1}-b),
\end{aligned}\right. \tag{1.4}$$
where $L_{\beta}$ now denotes the Bregman augmented Lagrangian function for (1.1),
$$L_{\beta}(x,y,\lambda)=f(x)+g(y)-\langle\lambda,Ax+By-b\rangle+\beta D_{\varphi}(b-Ax,By), \tag{1.5}$$
where $\lambda$ is the Lagrangian multiplier associated with the linear constraint, and $\beta>0$ is the penalty parameter. The Bregman distance $D_{\varphi}$ associated with a kernel $\varphi$ is defined as
$$D_{\varphi}(u,v)=\varphi(u)-\varphi(v)-\langle\nabla\varphi(v),u-v\rangle.$$
When $\varphi(\cdot)=\frac{1}{2}\|\cdot\|^{2}$, we have $D_{\varphi}(b-Ax,By)=\frac{1}{2}\|Ax+By-b\|^{2}$, and the Bregman ADMM (1.4) reduces to the classical ADMM (1.2).
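Two standard kernels illustrate the flexibility of this construction (textbook instances, not specific to [24]): the energy kernel $\varphi(u)=\frac{1}{2}\|u\|^{2}$ gives
$$D_{\varphi}(u,v)=\frac{1}{2}\|u-v\|^{2},$$
while the Boltzmann-Shannon entropy $\varphi(u)=\sum_{i}u_{i}\log u_{i}$ on the positive orthant gives the generalized Kullback-Leibler divergence
$$D_{\varphi}(u,v)=\sum_{i}\Big(u_{i}\log\frac{u_{i}}{v_{i}}-u_{i}+v_{i}\Big).$$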
Drawing on the aforementioned ideas, our aim is to remove the need for Lipschitz continuity of the gradient of the differentiable function in the symmetric ADMM when addressing two-block separable nonconvex problems with linear constraints. In this paper, we propose a symmetric version of the Bregman ADMM, whose iterative scheme is
$$\left\{\begin{aligned}
x^{k+1}&\in\arg\min_{x}\ L_{\beta}(x,y^{k},\lambda^{k}), &&(1.6\text{a})\\
\lambda^{k+\frac{1}{2}}&=\lambda^{k}-r\beta(Ax^{k+1}+By^{k}-b), &&(1.6\text{b})\\
y^{k+1}&\in\arg\min_{y}\ L_{\beta}(x^{k+1},y,\lambda^{k+\frac{1}{2}}), &&(1.6\text{c})\\
\lambda^{k+1}&=\lambda^{k+\frac{1}{2}}-\beta(Ax^{k+1}+By^{k+1}-b), &&(1.6\text{d})
\end{aligned}\right. \tag{1.6}$$
where $L_{\beta}$ is the Bregman augmented Lagrangian function defined in (1.5). The symmetric Bregman ADMM (1.6) reduces to the symmetric ADMM (1.3) by setting $\varphi(\cdot)=\frac{1}{2}\|\cdot\|^{2}$, to the Bregman ADMM (1.4) by setting $r=0$, and to the classical ADMM (1.2) by setting $\varphi(\cdot)=\frac{1}{2}\|\cdot\|^{2}$ and $r=0$. In essence, the ADMM frameworks investigated by Wu et al. [20] and by Tan and Guo [24] are particular instances of the approach we introduce. Under the same assumptions as those in Tan and Guo [24], we prove the convergence of the symmetric Bregman ADMM (1.6), provided that the associated function satisfies the Kurdyka-Lojasiewicz inequality. Moreover, we demonstrate that the iterative sequence produced by the symmetric Bregman ADMM (1.6) converges to a critical point of problem (1.1), and we also analyze the convergence rate of the algorithm.
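To make the scheme concrete, the following numpy sketch instantiates (1.6) with the quadratic kernel $\varphi(\cdot)=\frac{1}{2}\|\cdot\|^{2}$, for which it coincides with the symmetric ADMM (1.3); a non-quadratic kernel would replace the quadratic penalty by $\beta D_{\varphi}(b-Ax,By)$ and change the subproblem solutions. The toy instance is the same hypothetical one as above, and the value of $r$ is our arbitrary choice.

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def symmetric_bregman_admm(d, beta=1.0, r=0.5, iters=200):
    # scheme (1.6) with the quadratic kernel, on: min ||x||_1 + 0.5*||y - d||^2  s.t.  x = y;
    # r = 0 recovers the Bregman ADMM (1.4), here identical to the classical ADMM (1.2)
    x = np.zeros_like(d)
    y = np.zeros_like(d)
    lam = np.zeros_like(d)
    for _ in range(iters):
        x = soft_threshold(y + lam / beta, 1.0 / beta)   # (1.6a)
        lam_half = lam - r * beta * (x - y)              # (1.6b)
        y = (d - lam_half + beta * x) / (1.0 + beta)     # (1.6c)
        lam = lam_half - beta * (x - y)                  # (1.6d)
    return x, y, lam

x, y, _ = symmetric_bregman_admm(np.array([3.0, 0.4, -2.0]))
print(x, y)  # same limit as the classical ADMM sketch above
```

Setting `r=0.0` here reproduces the classical ADMM run above, which is exactly the reduction described in the text.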
The remainder of this paper is organized as follows. In Section 2, we provide some necessary preliminaries for our subsequent analysis. In Section 3, we present the convergence analysis of the symmetric Bregman ADMM (1.6) and analyze its convergence rate. Finally, in Section 4, we summarize our findings and draw conclusions.
2. Preliminaries
In this section, we recall some definitions and basic results that will be used for further analysis.
Definition 2.1. [26] For an extended real-valued function $f:\mathbb{R}^{n}\to[-\infty,+\infty]$, the effective domain, or just the domain, is the set
$$\operatorname{dom}f=\{x\in\mathbb{R}^{n}:f(x)<+\infty\}.$$
Definition 2.2. [26] A function $f:\mathbb{R}^{n}\to[-\infty,+\infty]$ is called proper if there exists at least one $x\in\mathbb{R}^{n}$ such that $f(x)<+\infty$.
Definition 2.3. [26] A function $f:\mathbb{R}^{n}\to[-\infty,+\infty]$ is called lower semicontinuous at $\bar{x}$ if
$$\liminf_{k\to\infty}f(x^{k})\ge f(\bar{x})$$
for any sequence $\{x^{k}\}$ for which $x^{k}\to\bar{x}$ as $k\to\infty$. Moreover, $f$ is called lower semicontinuous if it is lower semicontinuous at each point in $\mathbb{R}^{n}$.
Definition 2.4. ([27], kernel generating distance) Let $C$ be a nonempty, convex, and open subset of $\mathbb{R}^{n}$. Associated with $C$, a function $\varphi:\mathbb{R}^{n}\to(-\infty,+\infty]$ is called a kernel generating distance if it satisfies the following:
(i) $\varphi$ is proper, lower semicontinuous, and convex, with $\operatorname{dom}\varphi\subset\overline{C}$ and $\operatorname{dom}\partial\varphi=C$.
(ii) $\varphi$ is $C^{1}$ on $\operatorname{int}\operatorname{dom}\varphi=C$.
We denote the class of kernel generating distances by $\mathcal{G}(C)$.
Definition 2.5. ([27], L-smooth adaptable) Let $\varphi\in\mathcal{G}(C)$, and let $g$ be a proper lower semicontinuous function that is continuously differentiable on $C$. The pair $(g,\varphi)$ is called $L$-smooth adaptable on $C$ if there exists $L>0$ such that $L\varphi-g$ and $L\varphi+g$ are convex on $C$.
Remark 2.1. Definition 2.5 serves as a natural extension and complement to the definition of “A Lipschitz-like/Convexity Condition” as presented in reference [21]. This extension enables the derivation of the following two-sided descent lemma.
Lemma 2.1. ([27], extended descent lemma) The pair of functions $(g,\varphi)$ is $L$-smooth adaptable on $C$ if and only if
$$\big|g(x)-g(y)-\langle\nabla g(y),x-y\rangle\big|\le L\,D_{\varphi}(x,y),\quad\forall x,y\in C. \tag{2.1}$$
Remark 2.2. In particular, when the set $C=\mathbb{R}^{n}$ and $\varphi(\cdot)=\frac{1}{2}\|\cdot\|^{2}$, (2.1) reduces to the classical descent lemma for the function $g$, i.e.,
$$\big|g(x)-g(y)-\langle\nabla g(y),x-y\rangle\big|\le\frac{L}{2}\|x-y\|^{2}.$$
Definition 2.6. [27] Let $\varphi\in\mathcal{G}(C)$, and let $g$ be a proper and lower semicontinuous function that is continuously differentiable on $C$. The gradient of $g$ is D-Lipschitz if there exists $L>0$ satisfying
$$\|\nabla g(x)-\nabla g(y)\|\,\|x-y\|\le L\big(D_{\varphi}(x,y)+D_{\varphi}(y,x)\big),\quad\forall x,y\in C.$$
Remark 2.3. According to the Cauchy-Schwarz inequality, we have
$$\big|\langle\nabla g(x)-\nabla g(y),x-y\rangle\big|\le\|\nabla g(x)-\nabla g(y)\|\,\|x-y\|,$$
which, combined with Definition 2.6, gives
$$\big|\langle\nabla g(x)-\nabla g(y),x-y\rangle\big|\le L\big(D_{\varphi}(x,y)+D_{\varphi}(y,x)\big).$$
Using the conclusion in Lemma 2.4, the above inequality is equivalent to
$$\big|\langle\nabla g(x)-\nabla g(y),x-y\rangle\big|\le L\,\langle\nabla\varphi(x)-\nabla\varphi(y),x-y\rangle.$$
Then,
$$\big\langle\nabla(L\varphi\pm g)(x)-\nabla(L\varphi\pm g)(y),\,x-y\big\rangle\ge 0. \tag{2.2}$$
Based on inequality (2.2), the functions $L\varphi-g$ and $L\varphi+g$ are convex, owing to the monotonicity of their gradients on the set $C$. Hence, the D-Lipschitz continuity of $\nabla g$ is sufficient for the pair $(g,\varphi)$ to be $L$-smooth adaptable. Considering the intricacy of the ADMM iterative procedure in this study, we find it necessary to presuppose the D-Lipschitz continuity of $\nabla g$.
Remark 2.4. The D-Lipschitz continuity property is essentially a Lipschitz-like gradient property expressed through the Bregman distance generated by $\varphi$, and it becomes equivalent to ordinary Lipschitz continuity of $\nabla g$ when $\varphi(\cdot)=\frac{1}{2}\|\cdot\|^{2}$.
Definition 2.7. [27] Let $f:\mathbb{R}^{n}\to(-\infty,+\infty]$ be a proper lower semicontinuous function.
(i) The Fréchet subdifferential, or regular subdifferential, of $f$ at $x\in\operatorname{dom}f$, written $\hat{\partial}f(x)$, is the set of vectors $v\in\mathbb{R}^{n}$ that satisfy
$$\liminf_{y\neq x,\ y\to x}\ \frac{f(y)-f(x)-\langle v,y-x\rangle}{\|y-x\|}\ge 0.$$
When $x\notin\operatorname{dom}f$, we set $\hat{\partial}f(x)=\emptyset$.
(ii) The limiting subdifferential, or simply the subdifferential, of $f$ at $x\in\operatorname{dom}f$, written $\partial f(x)$, is defined as follows:
$$\partial f(x)=\big\{v\in\mathbb{R}^{n}:\exists\,x^{k}\to x,\ f(x^{k})\to f(x),\ v^{k}\in\hat{\partial}f(x^{k}),\ v^{k}\to v\big\}.$$
Remark 2.5. From the above definition, we note the following.
(i) $\hat{\partial}f(x)\subset\partial f(x)$ for each $x\in\mathbb{R}^{n}$, where the first set is closed and convex while the second one is only closed.
(ii) Let $\{(x^{k},v^{k})\}$ be a sequence with $v^{k}\in\partial f(x^{k})$ that converges to $(x,v)$. By the definition of $\partial f(x)$, if $f(x^{k})$ converges to $f(x)$ as $k\to\infty$, then $v\in\partial f(x)$.
(iii) A necessary condition for $x\in\mathbb{R}^{n}$ to be a minimizer of $f$ is
$$0\in\partial f(x). \tag{2.3}$$
(iv) If $f$ is a proper lower semicontinuous function and $h$ is a continuously differentiable function, then $\partial(f+h)(x)=\partial f(x)+\nabla h(x)$ for any $x\in\operatorname{dom}f$.
A point that satisfies condition (2.3) is referred to as a critical point or a stationary point. The set of critical points of $f$ is denoted by $\operatorname{crit}f$.
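As a concrete illustration (a standard example, not taken from [27]): for the absolute value function $f(x)=|x|$ on $\mathbb{R}$,
$$\partial f(x)=\hat{\partial}f(x)=\begin{cases}\{1\}, & x>0,\\ [-1,1], & x=0,\\ \{-1\}, & x<0,\end{cases}$$
so $0\in\partial f(0)$: the point $x=0$ satisfies the necessary condition (2.3), and it is indeed the global minimizer of $f$.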
Next, we recall an important property of subdifferential calculus.
Lemma 2.2. [28] Suppose that $H(x,y)=f(x)+g(y)$, where $f$ and $g$ are proper lower semicontinuous functions. Then for all $(x,y)\in\operatorname{dom}H=\operatorname{dom}f\times\operatorname{dom}g$, we have
$$\partial H(x,y)=\partial f(x)\times\partial g(y).$$
Definition 2.8. ([28], Kurdyka-Lojasiewicz inequality) Let $f:\mathbb{R}^{n}\to(-\infty,+\infty]$ be a proper lower semicontinuous function. For $-\infty<\eta_{1}<\eta_{2}\le+\infty$, set
$$[\eta_{1}<f<\eta_{2}]=\{x\in\mathbb{R}^{n}:\eta_{1}<f(x)<\eta_{2}\}.$$
We say that the function $f$ has the KL property at $\bar{x}\in\operatorname{dom}\partial f$ if there exist $\eta\in(0,+\infty]$, a neighbourhood $U$ of $\bar{x}$, and a continuous concave function $\psi:[0,\eta)\to\mathbb{R}_{+}$, such that
(i) $\psi(0)=0$;
(ii) $\psi$ is $C^{1}$ on $(0,\eta)$ and continuous at 0;
(iii) $\psi'(s)>0$, $\forall s\in(0,\eta)$;
(iv) for all $x$ in $U\cap[f(\bar{x})<f<f(\bar{x})+\eta]$, the Kurdyka-Lojasiewicz inequality holds:
$$\psi'\big(f(x)-f(\bar{x})\big)\,\operatorname{dist}\big(0,\partial f(x)\big)\ge 1,$$
where $\operatorname{dist}(0,\partial f(x))=\min\{\|v\|:v\in\partial f(x)\}$ is the distance from $0$ to $\partial f(x)$.
Remark 2.6. Denote by $\Psi_{\eta}$ the set of all continuous concave functions $\psi:[0,\eta)\to\mathbb{R}_{+}$ which satisfy (i)-(iii).
Definition 2.9. ([29], Kurdyka-Lojasiewicz function) If the function $f$ satisfies the KL property at each point of $\operatorname{dom}\partial f$, then $f$ is called a KL function.
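As a simple worked instance (a standard example in the KL literature, not from [29]): $f(x)=x^{2}$ on $\mathbb{R}$ has the KL property at $\bar{x}=0$ with $\psi(s)=s^{1/2}$, since for every $x\neq 0$,
$$\psi'\big(f(x)-f(0)\big)\operatorname{dist}\big(0,\partial f(x)\big)=\frac{1}{2}(x^{2})^{-1/2}\cdot|2x|=1\ge 1.$$
Desingularizing functions of the power form $\psi(s)=c\,s^{1-\theta}$ with $\theta\in[0,1)$ are the ones used in the rate analysis of Theorem 3.2 below.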
Lemma 2.3. ([30], Uniformized KL property) Let $\Omega$ be a compact set and let $f:\mathbb{R}^{n}\to(-\infty,+\infty]$ be a proper and lower semicontinuous function. Assume that $f$ is constant on $\Omega$ and satisfies the KL property at each point of $\Omega$. Then, there exist $\varepsilon>0$, $\eta>0$, and $\psi\in\Psi_{\eta}$ such that for all $\bar{x}\in\Omega$ and for all $x$ in the following intersection:
$$\{x\in\mathbb{R}^{n}:\operatorname{dist}(x,\Omega)<\varepsilon\}\cap[f(\bar{x})<f<f(\bar{x})+\eta],$$
one has
$$\psi'\big(f(x)-f(\bar{x})\big)\,\operatorname{dist}\big(0,\partial f(x)\big)\ge 1.$$
Lemma 2.4. [31] Let $\varphi\in\mathcal{G}(C)$. For any $x,y\in\operatorname{int}\operatorname{dom}\varphi$ and $z\in\operatorname{dom}\varphi$, the following hold.
(i) $D_{\varphi}(x,y)+D_{\varphi}(y,x)=\langle\nabla\varphi(x)-\nabla\varphi(y),x-y\rangle$.
(ii) The three-points identity holds:
$$D_{\varphi}(z,x)-D_{\varphi}(z,y)-D_{\varphi}(y,x)=\langle\nabla\varphi(y)-\nabla\varphi(x),z-y\rangle. \tag{2.4}$$
Definition 2.10. [32] Let $\varphi\in\mathcal{G}(C)$. The Bregman distance $D_{\varphi}:\operatorname{dom}\varphi\times\operatorname{int}\operatorname{dom}\varphi\to[0,+\infty)$ is defined by
$$D_{\varphi}(x,y)=\varphi(x)-\varphi(y)-\langle\nabla\varphi(y),x-y\rangle.$$
Since $\varphi$ is convex, $D_{\varphi}(x,y)\ge 0$; if $\varphi$ is in addition strictly convex, then $D_{\varphi}(x,y)=0$ if and only if $x=y$.
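As a quick numerical sanity check of Lemma 2.4 and Definition 2.10, the following hypothetical script verifies the three-points identity (2.4) for the Boltzmann-Shannon entropy kernel $\varphi(u)=\sum_{i}u_{i}\log u_{i}$ on the positive orthant (our choice of kernel and test points):

```python
import numpy as np

def phi(u):
    # Boltzmann-Shannon entropy kernel on the positive orthant
    return float(np.sum(u * np.log(u)))

def grad_phi(u):
    # gradient of phi: log(u) + 1, componentwise
    return np.log(u) + 1.0

def bregman(x, y):
    # D_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y>
    return phi(x) - phi(y) - float(grad_phi(y) @ (x - y))

rng = np.random.default_rng(0)
x, y, z = rng.uniform(0.1, 2.0, (3, 5))  # three points in the positive orthant

lhs = bregman(z, x) - bregman(z, y) - bregman(y, x)
rhs = float((grad_phi(y) - grad_phi(x)) @ (z - y))
assert abs(lhs - rhs) < 1e-12            # identity (2.4), up to rounding error
print(lhs, rhs)
```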
Definition 2.11. We say that $(x^{*},y^{*},\lambda^{*})$ is a critical point of the augmented Lagrangian function $L_{\beta}$ (1.5) with Bregman distance if it satisfies
$$A^{T}\lambda^{*}\in\partial f(x^{*}),\qquad B^{T}\lambda^{*}=\nabla g(y^{*}),\qquad Ax^{*}+By^{*}=b.$$
Lemma 2.5. Let
be the sequence generated by the symmetric Bregman ADMM (1.6). Then, we have
Proof. Combining (1.6b) and (1.6d), we get
and thus
Similarly, we combine (1.6b) and (1.6d) again
and thus we obtain
From the optimality condition of (1.6c), we have
Substituting (1.6d) into the above equation yields
This completes the proof.
3. Convergence Analysis
In this section, we analyze the convergence of the symmetric Bregman ADMM (1.6) and show that the sequence
generated by the symmetric Bregman ADMM (1.6) converges to a critical point
of
under the following assumptions.
Assumption A. Let $f$ be a proper lower semicontinuous function, and let $g$ be a continuously differentiable function with $\nabla g$ being D-Lipschitz continuous. Additionally, let $\varphi$ be a twice differentiable function that is 1-strongly convex and whose gradient $\nabla\varphi$ is Lipschitz continuous with modulus $L_{\varphi}$ on any bounded subset. We assume that
(i)
,
, which implies
(ii)
,
, which implies
(iii)
for some
.
Next, we examine the optimality conditions of (1.6). Invoking the optimality conditions of the two subproblems, we have
$$\left\{\begin{aligned}
0&\in\partial f(x^{k+1})-A^{T}\lambda^{k}+\beta\,\nabla_{x}D_{\varphi}(b-Ax,By^{k})\big|_{x=x^{k+1}},\\
0&=\nabla g(y^{k+1})-B^{T}\lambda^{k+\frac{1}{2}}+\beta\,\nabla_{y}D_{\varphi}(b-Ax^{k+1},By)\big|_{y=y^{k+1}}.
\end{aligned}\right. \tag{3.1}$$
Lemma 3.1. Let
be the sequence generated by the symmetric Bregman ADMM (1.6), which is assumed to be bounded. Then we have
(3.2)
Proof. From the definition of
in (1.5), it follows that
(3.3)
Since the gradient of the function $g$ is D-Lipschitz continuous on $C$, it follows that the function pair $(g,\varphi)$ is $L$-smooth adaptable. Then, according to Lemma 2.1, we can obtain
(3.4)
By substituting inequality (3.4) into relation (3.3), we obtain
where the first and the third equalities follow from Lemma 2.5 and the three-points identity (2.4), respectively.
Based on the 1-strong convexity and the $L_{\varphi}$-smoothness of the function $\varphi$, we obtain the following two inequalities, respectively:
Combining the above two inequalities, we get
thus, we can obtain
(3.5)
Next, by using (1.6b) and (1.5), we have
where the second equality follows from (1.6b). Then, according to Lemma 2.5
and, since $\varphi$ is 1-strongly convex, we have
Now, we discuss two cases based on the range of the relaxation factor $r$.
(i)
, combining the above formulas
(3.6)
Subsequently, we claim that
(3.7)
To prove (3.7), we consider two cases. When
, (3.7) holds obviously. Next, we assume
. Since
is
-Lipschitz, we obtain
where the second inequality is a consequence of the fact that $\nabla\varphi$ is Lipschitz continuous with modulus $L_{\varphi}$ on any bounded subset, that is,
(3.8)
Since
, (3.7) becomes
(3.9)
Moreover, since $\varphi$ is 1-strongly convex, we get
, i.e.,
(3.10)
Substituting (3.8), (3.9) and (3.10) into (3.6), we conclude that
(3.11)
Then, combining (3.5) and (3.11), we obtain that
(3.12)
Consequently, according to (1.6a), we have
(3.13)
Finally, summing the inequalities (3.12) and (3.13), we conclude that
This completes the discussion of case (i); we now proceed to case (ii).
(ii)
, combining the same formulas
(3.14)
Similarly, substituting (3.8), (3.9) and (3.10) into (3.14), we conclude that
(3.15)
Then, combining (3.5) and (3.15), we obtain that
(3.16)
Consequently, according to (1.6a), we have
(3.17)
Finally, summing the inequalities (3.16) and (3.17), we conclude that
This completes the discussion for case (ii), and with this, the proof is complete.□
Remark 3.1. Since
, Lemma 3.1 implies that
is monotonically nonincreasing. Note that when $r=0$, we recover the corresponding requirement in Tan and Guo [24]. Furthermore, when $r=0$ and $\varphi(\cdot)=\frac{1}{2}\|\cdot\|^{2}$, we have $L_{\varphi}=1$ and $D_{\varphi}(x,y)=\frac{1}{2}\|x-y\|^{2}$, and thus we recover the corresponding requirement in Guo et al. [19].
Lemma 3.2. Let $\{w^{k}\}$ be the sequence generated by the symmetric Bregman ADMM (1.6), which is assumed to be bounded. Then we have
$$\lim_{k\to\infty}\|w^{k+1}-w^{k}\|=0.$$
Proof: Considering that
is bounded, it has at least one limit point. Let
be a limit point of
and let
be the subsequence converging to it, i.e.
. Given that
is a lower semicontinuous function, we can deduce that
Consequently,
is bounded from below. Besides, the fact that
is nonincreasing implies that
is convergent. Moreover,
is convergent, and
. Based on Equation (3.2), we can obtain
Summing up the above inequality from
to
, we conclude that
Owing to
, we have
, which implies
. Consequently, according to (3.9), we get that
Recall Lemma 2.5, we have
Combining the two equalities, we obtain
Then, using the 1-strong convexity of $\varphi$ and Assumption A(iii), we obtain
Thus, combining the above two formulas, we obtain that
(3.18)
where
. Then, the last inequality implies
.
Thus,
. This completes the proof.□
Lemma 3.3. Let
be the sequence generated by the symmetric Bregman ADMM (1.6), which is assumed to be bounded, and suppose that Assumption A holds. For any positive integer
, we define
Then,
and there exists
such that
Proof: By definition of function
, we can obtain that
(3.19)
Combining the optimality condition (3.1) with equation (3.19), we get
Then, according to Lemma 2.2, we obtain that
.
Furthermore,
In addition,
On the other hand,
Based on the above relationship, we know that there exist
such that
Defining
and using (3.9), we obtain
This completes the proof.
Lemma 3.4. Let
be the sequence generated by the symmetric Bregman ADMM (1.6), which is assumed to be bounded. Let
denote the set of its limit points. Then
(i)
is a nonempty compact set, and
, as
;
(ii)
, where
denotes the set of all stationary points of
;
(iii)
is finite and constant on
, which equals
Proof:
(i) The assertion follows immediately from the definition of limit points.
(ii) We assume that
, then there exists a subsequence
that converges to
. By Lemma 3.2, we have
(3.20)
Consequently, we deduce that
also converges to
. Given that
is the minimizer of
with respect to the variable
, we have
(3.21)
On one hand, according to (3.20), (3.21) and the continuity of
with respect to
and
, we get
(3.22)
On the other hand, using the lower semicontinuity of
, we have
(3.23)
The above two relations (3.22) and (3.23) imply that
Taking the limit in the optimality conditions (3.1) along the subsequence
, and utilizing (3.20) yields
Thus,
is a critical point of (1.5), which implies that
.
(iii) For any point
, there exists a subsequence
that converges to
as
. By merging equations (3.22) and (3.23) with the observation that the sequence
is nonincreasing, we have
Therefore,
is constant on
. Furthermore, we also have
.
Hence, we have completed the proof.
Now we present the main convergence result for the symmetric Bregman ADMM (1.6).
Theorem 3.1. Let $\{w^{k}\}$ be the sequence generated by the symmetric Bregman ADMM (1.6), which is assumed to be bounded. Suppose that $L_{\beta}$ is a KL function. Then $\{w^{k}\}$ has finite length, that is,
$$\sum_{k=0}^{\infty}\|w^{k+1}-w^{k}\|<+\infty,$$
and, as a consequence, $\{w^{k}\}$ converges to a critical point of $L_{\beta}$.
Proof: From the proof of Lemma 3.4, it follows that
for all
. Next, we will consider two cases.
(i) If there exists an integer
for which
. Then, according to Remark 3.1 and (3.2), we have
for any
. Consequently, we conclude that
for any
. Combining (3.9) and (3.18), we further deduce that
and
for any
, which implies that
. As a result, the assertion is substantiated.
(ii) If
for all
. Considering that
, there exists
, such that
for any
. Furthermore, with
, it follows that there exists
, such that
for any
.
Thus, for all
, we derive the following conclusions
Since
is constant on
and
is a nonempty compact set, we use Lemma 2.3 with
to derive that for any
,
(3.24)
Relying on the fact that
, and the concavity of
, it follows that
Combining
with the above inequality,
and inequality (3.24), we can obtain
(3.25)
For convenience, for all
, we define
.
Hence, (3.25) can be simplified as
(3.26)
Combining inequality (3.26) with Lemma 3.3, we get that for all
,
Then,
Applying the fact that
, we obtain
(3.27)
Summing up (3.27) over
, we have
Note that
from Definition 2.8. Taking
, we have
(3.28)
Therefore,
(3.29)
Combining (3.9) and (3.29), we get
(3.30)
Using (3.18), we obtain
Combining above inequality with (3.29) and (3.30), we have
(3.31)
Furthermore, we note that
Using (3.29), (3.30) and (3.31), it follows that
which implies that
is a Cauchy sequence and thus convergent. The assertion follows from Lemma 3.4 immediately.□
Next, we provide sufficient conditions to ensure that the sequence
generated by the symmetric Bregman ADMM (1.6) is bounded.
Lemma 3.5. Let
be the sequence generated by the symmetric Bregman ADMM (1.6). Suppose that
If at least one of the following statements is true:
(i)
.
(ii)
and
.
Then, the sequence
is bounded.
Proof. Suppose that condition (i) holds. From Lemma 3.1, we have
which implies
(3.32)
Using the 1-strong convexity of the function
and the definition of
, we have
(3.33)
Combining (3.32) with (3.33) and using the fact
, we get
(3.34)
Note that (i) implies that
. When
, we have
; besides, when
, we have
. Therefore, we can derive that
,
and
are bounded. Consequently,
is also bounded, and hence
is bounded.
Next, suppose condition (ii) holds. Using
and (3.34), we get that
Note that
implies that
. When
, we have
; besides, when
, we have
. Therefore, we conclude that the sequences
,
and
are bounded. Then, from (3.18), it follows that
is also bounded, and as a consequence,
is bounded. This completes the proof.
Theorem 3.2. (Convergence rate) Let $\{w^{k}\}$ be the sequence generated by the symmetric Bregman ADMM (1.6), and suppose it converges to $w^{*}=(x^{*},y^{*},\lambda^{*})$. Assume that $L_{\beta}$ has the KL property at $w^{*}$ with $\psi(s)=c\,s^{1-\theta}$, $\theta\in[0,1)$, $c>0$. Then, the following results hold:
(i) If $\theta=0$, then the sequence $\{w^{k}\}$ converges in a finite number of steps.
(ii) If $\theta\in(0,\frac{1}{2}]$, then there exist $c_{1}>0$ and $\tau\in[0,1)$ such that
$$\|w^{k}-w^{*}\|\le c_{1}\,\tau^{k}.$$
(iii) If $\theta\in(\frac{1}{2},1)$, then there exists $c_{2}>0$ such that
$$\|w^{k}-w^{*}\|\le c_{2}\,k^{-\frac{1-\theta}{2\theta-1}}.$$
Proof: Firstly, consider the case $\theta=0$; then $\psi(s)=cs$ and $\psi'(s)=c$. We argue by contradiction: suppose that $\{w^{k}\}$ does not converge in a finite number of steps. Then the KL property at $w^{*}$ yields
$$\operatorname{dist}\big(0,\partial L_{\beta}(w^{k})\big)\ge\frac{1}{c}$$
for any sufficiently large $k$, which is contrary to Lemma 3.1.
Secondly, consider the case $\theta\in(0,1)$, and set $\Delta_{k}=\sum_{i=k}^{\infty}\|w^{i+1}-w^{i}\|$ for $k\ge 0$
. By the triangle inequality, we derive that
, and hence it suffices to estimate
. With these notations, it follows from (3.28) that
Invoking the KL property of
at
, we conclude that
This implies
(3.35)
According to Lemma 3.1, we get
(3.36)
Combining (3.35) and (3.36), it follows that there exists
such that
Therefore,
Sequences satisfying such inequalities have been studied in [33]. It is shown that
If
, then there exists
and
, such that
(3.37)
(3.38)
Recalling that
consequently,
(3.39)
Furthermore, from the relations
and
it follows that
i.e.,
Subsequently, combining the 1-strong convexity of $\varphi$ and the above equality, it follows that
(3.40)
The desired inequalities follow from (3.37)-(3.40) immediately.□
4. Conclusions
In this paper, we proposed a symmetric Bregman ADMM, which reduces to the symmetric ADMM, the Bregman ADMM, and the classical ADMM as special cases, while circumventing the requirement of global Lipschitz continuity of the gradient when minimizing a linearly constrained nonconvex problem whose objective function is the sum of two separable nonconvex functions. Under certain assumptions, and provided that the associated function satisfies the Kurdyka-Lojasiewicz inequality, we proved that the iterative sequence generated by the symmetric Bregman ADMM converges to a critical point of the problem. Finally, we established the convergence rate of the algorithm.