A Symmetric Bregman Alternating Direction Method of Multipliers for Separable Nonconvex Minimization Problem
1. Introduction
In this paper, we consider the following two-block separable optimization problem with linear constraints:
$$\min_{x,y}\ f(x)+g(y)\quad \text{s.t.}\quad Ax+By=b, \tag{1.1}$$
where $f:\mathbb{R}^{n}\to\mathbb{R}\cup\{+\infty\}$ is a proper lower semicontinuous function, $g:\mathbb{R}^{m}\to\mathbb{R}$ is a continuously differentiable function, $A\in\mathbb{R}^{p\times n}$, $B\in\mathbb{R}^{p\times m}$, and $b\in\mathbb{R}^{p}$. Numerous valuable optimization problems can be expressed in the form (1.1), rendering it widely applicable across diverse fields, such as machine learning [1]-[3], image processing [4]-[6], and signal processing [7]-[10].
When both $f$ and $g$ are convex functions, a prominent approach for addressing problem (1.1) is the alternating direction method of multipliers (ADMM), proposed by Gabay, Mercier, Glowinski and Marroco [11] [12] in the 1970s. This method fully exploits the separable structure of the problem and has therefore attracted considerable attention across diverse domains in recent years. The iterative scheme of the ADMM is as follows:
$$\left\{\begin{aligned}
x^{k+1}&\in\arg\min_{x}\ L_{\beta}(x,y^{k},\lambda^{k}),\\
y^{k+1}&\in\arg\min_{y}\ L_{\beta}(x^{k+1},y,\lambda^{k}),\\
\lambda^{k+1}&=\lambda^{k}-\beta(Ax^{k+1}+By^{k+1}-b).
\end{aligned}\right. \tag{1.2}$$
Here, $L_{\beta}$ denotes the augmented Lagrangian function for (1.1), given by
$$L_{\beta}(x,y,\lambda)=f(x)+g(y)-\langle\lambda,Ax+By-b\rangle+\frac{\beta}{2}\|Ax+By-b\|^{2},$$
where $\lambda\in\mathbb{R}^{p}$ is the Lagrangian multiplier associated with the linear constraint, and $\beta>0$ is the penalty parameter. The study of ADMM has a lengthy academic lineage, and its convergence and convergence rate are well understood for convex objectives [13]-[16].
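For concreteness, here is a minimal numpy sketch of scheme (1.2) on a hypothetical toy instance, $f(x)=\|x\|_{1}$, $g(y)=\frac{1}{2}\|y-d\|^{2}$, $A=I$, $B=-I$, $b=0$ (i.e., the constraint $x=y$), chosen so that both subproblems have closed-form solutions; the instance and parameter values are illustrative assumptions of ours, not data from the paper.

```python
import numpy as np

def soft_threshold(v, t):
    # proximal map of t * ||.||_1: elementwise shrinkage
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def admm(d, beta=1.0, iters=200):
    # classical ADMM (1.2) for: min ||x||_1 + 0.5*||y - d||^2  s.t.  x - y = 0
    x = np.zeros_like(d)
    y = np.zeros_like(d)
    lam = np.zeros_like(d)  # multiplier for the constraint x - y = 0
    for _ in range(iters):
        # x-step: argmin_x ||x||_1 - <lam, x> + (beta/2)*||x - y||^2
        x = soft_threshold(y + lam / beta, 1.0 / beta)
        # y-step: argmin_y 0.5*||y - d||^2 + <lam, y> + (beta/2)*||x - y||^2
        y = (d - lam + beta * x) / (1.0 + beta)
        # multiplier step: lam <- lam - beta*(A x + B y - b), with A = I, B = -I, b = 0
        lam = lam - beta * (x - y)
    return x, y, lam

x, y, lam = admm(np.array([3.0, 0.4, -2.0]))
print(x, y)  # converges to soft_threshold(d, 1): components with |d_i| <= 1 vanish
```

On this convex instance the iterates converge to $x=y=\operatorname{soft}(d,1)$, which is the minimizer of $\|x\|_{1}+\frac{1}{2}\|x-d\|^{2}$.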
However, when at least one component of the objective function is nonconvex, most studies focus on proving the convergence of the ADMM or one of its variants under additional conditions on the problem, such as Li and Pong [17] and Hong et al. [18]. In particular, in 2017, Guo et al. [19] demonstrated that, under conditions less stringent than those outlined in [17] [18], the convergence and convergence rate of ADMM for nonconvex problems can be established, provided that the augmented Lagrangian function satisfies the Kurdyka-Lojasiewicz inequality. Motivated by the insights provided in the aforementioned article, Wu et al. [20] applied a symmetric ADMM, which reduces to the classical ADMM as a special case, to two-block linearly constrained separable nonconvex optimization, and analyzed its convergence and convergence rate for the case $B=I$ ($I$ is the identity matrix with proper dimension). Note that the symmetric variant can numerically accelerate ADMM for some values of the relaxation factor $r$. Its iterative scheme is as follows:
$$\left\{\begin{aligned}
x^{k+1}&\in\arg\min_{x}\ L_{\beta}(x,y^{k},\lambda^{k}),\\
\lambda^{k+\frac{1}{2}}&=\lambda^{k}-r\beta(Ax^{k+1}+By^{k}-b),\\
y^{k+1}&\in\arg\min_{y}\ L_{\beta}(x^{k+1},y,\lambda^{k+\frac{1}{2}}),\\
\lambda^{k+1}&=\lambda^{k+\frac{1}{2}}-\beta(Ax^{k+1}+By^{k+1}-b).
\end{aligned}\right. \tag{1.3}$$
The only difference from the classical ADMM lies in the additional update of the multiplier $\lambda$, with relaxation factor $r$, inserted between the iterative formulas for $x$ and $y$. In particular, the algorithm returns to the classical ADMM when $r=0$. Therefore, it inherits the same limitations in certain respects [21]-[23]. Whether the classical ADMM or the symmetric ADMM is used to tackle two-block separable nonconvex problems with linear constraints, both methods rely on the assumption that the gradient of the differentiable component is Lipschitz continuous.
To relax the Lipschitz continuity requirement on the gradient of the objective function, Tan and Guo [24] introduced a novel version of the Bregman ADMM, distinguished from that proposed by Wang et al. [25] by its ability to revert to the classical ADMM. Its iteration is as follows:
$$\left\{\begin{aligned}
x^{k+1}&\in\arg\min_{x}\ L_{\beta}(x,y^{k},\lambda^{k}),\\
y^{k+1}&\in\arg\min_{y}\ L_{\beta}(x^{k+1},y,\lambda^{k}),\\
\lambda^{k+1}&=\lambda^{k}-\beta(Ax^{k+1}+By^{k+1}-b),
\end{aligned}\right. \tag{1.4}$$
where $L_{\beta}$ now denotes the Bregman augmented Lagrangian function for (1.1),
$$L_{\beta}(x,y,\lambda)=f(x)+g(y)-\langle\lambda,Ax+By-b\rangle+\beta D_{\varphi}(b-Ax,By), \tag{1.5}$$
where $\lambda$ is the Lagrangian multiplier associated with the linear constraint, and $\beta>0$ is the penalty parameter. The Bregman distance $D_{\varphi}$ associated with a kernel $\varphi$ is defined as
$$D_{\varphi}(u,v)=\varphi(u)-\varphi(v)-\langle\nabla\varphi(v),u-v\rangle.$$
When $\varphi(\cdot)=\frac{1}{2}\|\cdot\|^{2}$, we have $D_{\varphi}(b-Ax,By)=\frac{1}{2}\|Ax+By-b\|^{2}$, and the Bregman ADMM (1.4) reduces to the classical ADMM (1.2).
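Two standard kernels illustrate the flexibility of this construction (textbook instances, not specific to [24]): the energy kernel $\varphi(u)=\frac{1}{2}\|u\|^{2}$ gives
$$D_{\varphi}(u,v)=\frac{1}{2}\|u-v\|^{2},$$
while the Boltzmann-Shannon entropy $\varphi(u)=\sum_{i}u_{i}\log u_{i}$ on the positive orthant gives the generalized Kullback-Leibler divergence
$$D_{\varphi}(u,v)=\sum_{i}\Big(u_{i}\log\frac{u_{i}}{v_{i}}-u_{i}+v_{i}\Big).$$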
Drawing on the aforementioned ideas, our aim is to remove the need for Lipschitz continuity of the gradient of the differentiable function in the symmetric ADMM when addressing two-block separable nonconvex problems with linear constraints. In this paper, we propose a symmetric version of the Bregman ADMM, whose iterative scheme is
$$\left\{\begin{aligned}
x^{k+1}&\in\arg\min_{x}\ L_{\beta}(x,y^{k},\lambda^{k}), &&(1.6\text{a})\\
\lambda^{k+\frac{1}{2}}&=\lambda^{k}-r\beta(Ax^{k+1}+By^{k}-b), &&(1.6\text{b})\\
y^{k+1}&\in\arg\min_{y}\ L_{\beta}(x^{k+1},y,\lambda^{k+\frac{1}{2}}), &&(1.6\text{c})\\
\lambda^{k+1}&=\lambda^{k+\frac{1}{2}}-\beta(Ax^{k+1}+By^{k+1}-b), &&(1.6\text{d})
\end{aligned}\right. \tag{1.6}$$
where $L_{\beta}$ is the Bregman augmented Lagrangian function defined in (1.5). The symmetric Bregman ADMM (1.6) reduces to the symmetric ADMM (1.3) by setting $\varphi(\cdot)=\frac{1}{2}\|\cdot\|^{2}$, to the Bregman ADMM (1.4) by setting $r=0$, and to the classical ADMM (1.2) by setting $\varphi(\cdot)=\frac{1}{2}\|\cdot\|^{2}$ and $r=0$. In essence, the ADMM frameworks investigated by Wu et al. [20] and by Tan and Guo [24] are particular instances of the approach we introduce. Under the same assumptions as those in Tan and Guo [24], we prove the convergence of the symmetric Bregman ADMM (1.6), provided that the associated function satisfies the Kurdyka-Lojasiewicz inequality. Moreover, we demonstrate that the iterative sequence produced by the symmetric Bregman ADMM (1.6) converges to a critical point of problem (1.1), and we also analyze the convergence rate of the algorithm.
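To make the scheme concrete, the following numpy sketch instantiates (1.6) with the quadratic kernel $\varphi(\cdot)=\frac{1}{2}\|\cdot\|^{2}$, for which it coincides with the symmetric ADMM (1.3); a non-quadratic kernel would replace the quadratic penalty by $\beta D_{\varphi}(b-Ax,By)$ and change the subproblem solutions. The toy instance is the same hypothetical one as above, and the value of $r$ is our arbitrary choice.

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def symmetric_bregman_admm(d, beta=1.0, r=0.5, iters=200):
    # scheme (1.6) with the quadratic kernel, on: min ||x||_1 + 0.5*||y - d||^2  s.t.  x = y;
    # r = 0 recovers the Bregman ADMM (1.4), here identical to the classical ADMM (1.2)
    x = np.zeros_like(d)
    y = np.zeros_like(d)
    lam = np.zeros_like(d)
    for _ in range(iters):
        x = soft_threshold(y + lam / beta, 1.0 / beta)   # (1.6a)
        lam_half = lam - r * beta * (x - y)              # (1.6b)
        y = (d - lam_half + beta * x) / (1.0 + beta)     # (1.6c)
        lam = lam_half - beta * (x - y)                  # (1.6d)
    return x, y, lam

x, y, _ = symmetric_bregman_admm(np.array([3.0, 0.4, -2.0]))
print(x, y)  # same limit as the classical ADMM sketch above
```

Setting `r=0.0` here reproduces the classical ADMM run above, which is exactly the reduction described in the text.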
The remainder of this paper is organized as follows. In Section 2, we provide some necessary preliminaries for our subsequent analysis. In Section 3, we present the convergence analysis of the symmetric Bregman ADMM (1.6) and analyze its convergence rate. Finally, in Section 4, we summarize our findings and draw conclusions.
2. Preliminaries
In this section, we recall some definitions and basic results that will be used for further analysis.
Definition 2.1. [26] For an extended real-valued function $f:\mathbb{R}^{n}\to[-\infty,+\infty]$, the effective domain, or just the domain, is the set
$$\operatorname{dom}f=\{x\in\mathbb{R}^{n}:f(x)<+\infty\}.$$
Definition 2.2. [26] A function $f:\mathbb{R}^{n}\to[-\infty,+\infty]$ is called proper if there exists at least one $x\in\mathbb{R}^{n}$ such that $f(x)<+\infty$.
Definition 2.3. [26] A function $f:\mathbb{R}^{n}\to[-\infty,+\infty]$ is called lower semicontinuous at $\bar{x}$ if
$$\liminf_{k\to\infty}f(x^{k})\ge f(\bar{x})$$
for any sequence $\{x^{k}\}$ for which $x^{k}\to\bar{x}$ as $k\to\infty$. Moreover, $f$ is called lower semicontinuous if it is lower semicontinuous at each point in $\mathbb{R}^{n}$.
Definition 2.4. ([27], kernel generating distance) Let $C$ be a nonempty, convex, and open subset of $\mathbb{R}^{n}$. Associated with $C$, a function $\varphi:\mathbb{R}^{n}\to(-\infty,+\infty]$ is called a kernel generating distance if it satisfies the following:
(i) $\varphi$ is proper, lower semicontinuous, and convex, with $\operatorname{dom}\varphi\subset\overline{C}$ and $\operatorname{dom}\partial\varphi=C$.
(ii) $\varphi$ is $C^{1}$ on $\operatorname{int}\operatorname{dom}\varphi=C$.
We denote the class of kernel generating distances by $\mathcal{G}(C)$.
Definition 2.5. ([27], L-smooth adaptable) Let $\varphi\in\mathcal{G}(C)$, and let $g$ be a proper lower semicontinuous function that is continuously differentiable on $C$. The pair $(g,\varphi)$ is called $L$-smooth adaptable on $C$ if there exists $L>0$ such that $L\varphi-g$ and $L\varphi+g$ are convex on $C$.
Remark 2.1. Definition 2.5 serves as a natural extension and complement to the definition of “A Lipschitz-like/Convexity Condition” as presented in reference [21]. This extension enables the derivation of the following two-sided descent lemma.
Lemma 2.1. ([27], extended descent lemma) The pair of functions $(g,\varphi)$ is $L$-smooth adaptable on $C$ if and only if
$$\big|g(x)-g(y)-\langle\nabla g(y),x-y\rangle\big|\le L\,D_{\varphi}(x,y),\quad\forall x,y\in C. \tag{2.1}$$
Remark 2.2. In particular, when the set $C=\mathbb{R}^{n}$ and $\varphi(\cdot)=\frac{1}{2}\|\cdot\|^{2}$, (2.1) reduces to the classical descent lemma for the function $g$, i.e.,
$$\big|g(x)-g(y)-\langle\nabla g(y),x-y\rangle\big|\le\frac{L}{2}\|x-y\|^{2}.$$
Definition 2.6. [27] Let $\varphi\in\mathcal{G}(C)$, and let $g$ be a proper and lower semicontinuous function that is continuously differentiable on $C$. The gradient of $g$ is D-Lipschitz if there exists $L>0$ satisfying
$$\|\nabla g(x)-\nabla g(y)\|\,\|x-y\|\le L\big(D_{\varphi}(x,y)+D_{\varphi}(y,x)\big),\quad\forall x,y\in C.$$
Remark 2.3. According to the Cauchy-Schwarz inequality, we have
$$\big|\langle\nabla g(x)-\nabla g(y),x-y\rangle\big|\le\|\nabla g(x)-\nabla g(y)\|\,\|x-y\|,$$
which, combined with Definition 2.6, gives
$$\big|\langle\nabla g(x)-\nabla g(y),x-y\rangle\big|\le L\big(D_{\varphi}(x,y)+D_{\varphi}(y,x)\big).$$
Using the conclusion in Lemma 2.4, the above inequality is equivalent to
$$\big|\langle\nabla g(x)-\nabla g(y),x-y\rangle\big|\le L\,\langle\nabla\varphi(x)-\nabla\varphi(y),x-y\rangle.$$
Then,
$$\big\langle\nabla(L\varphi\pm g)(x)-\nabla(L\varphi\pm g)(y),\,x-y\big\rangle\ge 0. \tag{2.2}$$
Based on inequality (2.2), the functions $L\varphi-g$ and $L\varphi+g$ are convex, owing to the monotonicity of their gradients on the set $C$. Hence, the D-Lipschitz continuity of $\nabla g$ is sufficient for the pair $(g,\varphi)$ to be $L$-smooth adaptable. Considering the intricacy of the ADMM iterative procedure in this study, we find it necessary to presuppose the D-Lipschitz continuity of $\nabla g$.
Remark 2.4. The D-Lipschitz continuity property is essentially a Lipschitz-like gradient property expressed through the Bregman distance generated by $\varphi$, and it becomes equivalent to ordinary Lipschitz continuity of $\nabla g$ when $\varphi(\cdot)=\frac{1}{2}\|\cdot\|^{2}$.
Definition 2.7. [27] Let $f:\mathbb{R}^{n}\to(-\infty,+\infty]$ be a proper lower semicontinuous function.
(i) The Fréchet subdifferential, or regular subdifferential, of $f$ at $x\in\operatorname{dom}f$, written $\hat{\partial}f(x)$, is the set of vectors $v\in\mathbb{R}^{n}$ that satisfy
$$\liminf_{y\neq x,\ y\to x}\ \frac{f(y)-f(x)-\langle v,y-x\rangle}{\|y-x\|}\ge 0.$$
When $x\notin\operatorname{dom}f$, we set $\hat{\partial}f(x)=\emptyset$.
(ii) The limiting subdifferential, or simply the subdifferential, of $f$ at $x\in\operatorname{dom}f$, written $\partial f(x)$, is defined as follows:
$$\partial f(x)=\big\{v\in\mathbb{R}^{n}:\exists\,x^{k}\to x,\ f(x^{k})\to f(x),\ v^{k}\in\hat{\partial}f(x^{k}),\ v^{k}\to v\big\}.$$
Remark 2.5. From the above definition, we note the following.
(i) $\hat{\partial}f(x)\subset\partial f(x)$ for each $x\in\mathbb{R}^{n}$, where the first set is closed and convex while the second one is only closed.
(ii) Let $\{(x^{k},v^{k})\}$ be a sequence with $v^{k}\in\partial f(x^{k})$ that converges to $(x,v)$. By the definition of $\partial f(x)$, if $f(x^{k})$ converges to $f(x)$ as $k\to\infty$, then $v\in\partial f(x)$.
(iii) A necessary condition for $x\in\mathbb{R}^{n}$ to be a minimizer of $f$ is
$$0\in\partial f(x). \tag{2.3}$$
(iv) If $f$ is a proper lower semicontinuous function and $h$ is a continuously differentiable function, then $\partial(f+h)(x)=\partial f(x)+\nabla h(x)$ for any $x\in\operatorname{dom}f$.
A point that satisfies condition (2.3) is referred to as a critical point or a stationary point. The set of critical points of $f$ is denoted by $\operatorname{crit}f$.
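As a concrete illustration (a standard example, not taken from [27]): for the absolute value function $f(x)=|x|$ on $\mathbb{R}$,
$$\partial f(x)=\hat{\partial}f(x)=\begin{cases}\{1\}, & x>0,\\ [-1,1], & x=0,\\ \{-1\}, & x<0,\end{cases}$$
so $0\in\partial f(0)$: the point $x=0$ satisfies the necessary condition (2.3), and it is indeed the global minimizer of $f$.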
Next, we recall an important property of subdifferential calculus.
Lemma 2.2. [28] Suppose that $H(x,y)=f(x)+g(y)$, where $f$ and $g$ are proper lower semicontinuous functions. Then for all $(x,y)\in\operatorname{dom}H=\operatorname{dom}f\times\operatorname{dom}g$, we have
$$\partial H(x,y)=\partial f(x)\times\partial g(y).$$
Definition 2.8. ([28], Kurdyka-Lojasiewicz inequality) Let $f:\mathbb{R}^{n}\to(-\infty,+\infty]$ be a proper lower semicontinuous function. For $-\infty<\eta_{1}<\eta_{2}\le+\infty$, set
$$[\eta_{1}<f<\eta_{2}]=\{x\in\mathbb{R}^{n}:\eta_{1}<f(x)<\eta_{2}\}.$$
We say that the function $f$ has the KL property at $\bar{x}\in\operatorname{dom}\partial f$ if there exist $\eta\in(0,+\infty]$, a neighbourhood $U$ of $\bar{x}$, and a continuous concave function $\psi:[0,\eta)\to\mathbb{R}_{+}$, such that
(i) $\psi(0)=0$;
(ii) $\psi$ is $C^{1}$ on $(0,\eta)$ and continuous at 0;
(iii) $\psi'(s)>0$, $\forall s\in(0,\eta)$;
(iv) for all $x$ in $U\cap[f(\bar{x})<f<f(\bar{x})+\eta]$, the Kurdyka-Lojasiewicz inequality holds:
$$\psi'\big(f(x)-f(\bar{x})\big)\,\operatorname{dist}\big(0,\partial f(x)\big)\ge 1,$$
where $\operatorname{dist}(0,\partial f(x))=\min\{\|v\|:v\in\partial f(x)\}$ is the distance from $0$ to $\partial f(x)$.
Remark 2.6. Denote by $\Psi_{\eta}$ the set of all continuous concave functions $\psi:[0,\eta)\to\mathbb{R}_{+}$ which satisfy (i)-(iii).
Definition 2.9. ([29], Kurdyka-Lojasiewicz function) If the function $f$ satisfies the KL property at each point of $\operatorname{dom}\partial f$, then $f$ is called a KL function.
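As a simple worked instance (a standard example in the KL literature, not from [29]): $f(x)=x^{2}$ on $\mathbb{R}$ has the KL property at $\bar{x}=0$ with $\psi(s)=s^{1/2}$, since for every $x\neq 0$,
$$\psi'\big(f(x)-f(0)\big)\operatorname{dist}\big(0,\partial f(x)\big)=\frac{1}{2}(x^{2})^{-1/2}\cdot|2x|=1\ge 1.$$
Desingularizing functions of the power form $\psi(s)=c\,s^{1-\theta}$ with $\theta\in[0,1)$ are the ones used in the rate analysis of Theorem 3.2 below.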
Lemma 2.3. ([30], Uniformized KL property) Let $\Omega$ be a compact set and let $f:\mathbb{R}^{n}\to(-\infty,+\infty]$ be a proper and lower semicontinuous function. Assume that $f$ is constant on $\Omega$ and satisfies the KL property at each point of $\Omega$. Then, there exist $\varepsilon>0$, $\eta>0$, and $\psi\in\Psi_{\eta}$ such that for all $\bar{x}\in\Omega$ and for all $x$ in the following intersection:
$$\{x\in\mathbb{R}^{n}:\operatorname{dist}(x,\Omega)<\varepsilon\}\cap[f(\bar{x})<f<f(\bar{x})+\eta],$$
one has
$$\psi'\big(f(x)-f(\bar{x})\big)\,\operatorname{dist}\big(0,\partial f(x)\big)\ge 1.$$
Lemma 2.4. [31] Let $\varphi\in\mathcal{G}(C)$. For any $x,y\in\operatorname{int}\operatorname{dom}\varphi$ and $z\in\operatorname{dom}\varphi$, the following hold.
(i) $D_{\varphi}(x,y)+D_{\varphi}(y,x)=\langle\nabla\varphi(x)-\nabla\varphi(y),x-y\rangle$.
(ii) The three-points identity holds:
$$D_{\varphi}(z,x)-D_{\varphi}(z,y)-D_{\varphi}(y,x)=\langle\nabla\varphi(y)-\nabla\varphi(x),z-y\rangle. \tag{2.4}$$
Definition 2.10. [32] Let $\varphi\in\mathcal{G}(C)$. The Bregman distance $D_{\varphi}:\operatorname{dom}\varphi\times\operatorname{int}\operatorname{dom}\varphi\to[0,+\infty)$ is defined by
$$D_{\varphi}(x,y)=\varphi(x)-\varphi(y)-\langle\nabla\varphi(y),x-y\rangle.$$
Since $\varphi$ is convex, $D_{\varphi}(x,y)\ge 0$; if $\varphi$ is in addition strictly convex, then $D_{\varphi}(x,y)=0$ if and only if $x=y$.
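As a quick numerical sanity check of Lemma 2.4 and Definition 2.10, the following hypothetical script verifies the three-points identity (2.4) for the Boltzmann-Shannon entropy kernel $\varphi(u)=\sum_{i}u_{i}\log u_{i}$ on the positive orthant (our choice of kernel and test points):

```python
import numpy as np

def phi(u):
    # Boltzmann-Shannon entropy kernel on the positive orthant
    return float(np.sum(u * np.log(u)))

def grad_phi(u):
    # gradient of phi: log(u) + 1, componentwise
    return np.log(u) + 1.0

def bregman(x, y):
    # D_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y>
    return phi(x) - phi(y) - float(grad_phi(y) @ (x - y))

rng = np.random.default_rng(0)
x, y, z = rng.uniform(0.1, 2.0, (3, 5))  # three points in the positive orthant

lhs = bregman(z, x) - bregman(z, y) - bregman(y, x)
rhs = float((grad_phi(y) - grad_phi(x)) @ (z - y))
assert abs(lhs - rhs) < 1e-12            # identity (2.4), up to rounding error
print(lhs, rhs)
```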
Definition 2.11. We say that $(x^{*},y^{*},\lambda^{*})$ is a critical point of the augmented Lagrangian function $L_{\beta}$ (1.5) with Bregman distance if it satisfies
$$A^{T}\lambda^{*}\in\partial f(x^{*}),\qquad B^{T}\lambda^{*}=\nabla g(y^{*}),\qquad Ax^{*}+By^{*}=b.$$
Lemma 2.5. Let
be the sequence generated by the symmetric Bregman ADMM (1.6). Then, we have
Proof. Combining (1.6b) and (1.6d), we get
and thus
Similarly, we combine (1.6b) and (1.6d) again
and thus we obtain
From the optimality condition of (1.6c), we have
Substituting (1.6d) into the above equation yields
This completes the proof.
3. Convergence Analysis
In this section, we analyze the convergence of the symmetric Bregman ADMM (1.6) and show that the sequence
generated by the symmetric Bregman ADMM (1.6) converges to a critical point
of
under the following assumptions.
Assumption A. Let $f$ be a proper lower semicontinuous function, and let $g$ be a continuously differentiable function with $\nabla g$ being D-Lipschitz continuous. Additionally, let $\varphi$ be a twice differentiable function that is 1-strongly convex and whose gradient $\nabla\varphi$ is Lipschitz continuous with modulus $L_{\varphi}$ on any bounded subset. We assume that
(i)
,
, which implies
(ii)
,
, which implies
(iii)
for some
.
Next, we examine the optimality conditions of (1.6). Invoking the optimality conditions of the two subproblems, we have
$$\left\{\begin{aligned}
0&\in\partial f(x^{k+1})-A^{T}\lambda^{k}+\beta\,\nabla_{x}D_{\varphi}(b-Ax,By^{k})\big|_{x=x^{k+1}},\\
0&=\nabla g(y^{k+1})-B^{T}\lambda^{k+\frac{1}{2}}+\beta\,\nabla_{y}D_{\varphi}(b-Ax^{k+1},By)\big|_{y=y^{k+1}}.
\end{aligned}\right. \tag{3.1}$$
Lemma 3.1. Let
be the sequence generated by the symmetric Bregman ADMM (1.6), which is assumed to be bounded. Then we have
(3.2)
Proof. From the definition of
in (1.5), it follows that
(3.3)
Since the gradient of the function $g$ is D-Lipschitz continuous on $C$, it follows that the function pair $(g,\varphi)$ is $L$-smooth adaptable. Then, according to Lemma 2.1, we can obtain
(3.4)
By substituting inequality (3.4) into relation (3.3), we obtain
where the first and the third equalities follow from Lemma 2.5 and the three-points identity (2.4), respectively.
Based on the 1-strong convexity and the $L_{\varphi}$-smoothness of the function $\varphi$, we obtain the following two inequalities, respectively:
Combining the above two inequalities, we get
thus, we can obtain
(3.5)
Next, by using (1.6b) and (1.5), we have
where the second equality follows from (1.6b). Then, according to Lemma 2.5
and, since $\varphi$ is 1-strongly convex, we have
Now, we discuss two cases based on the range of the relaxation factor $r$.
(i)
, combining the above formulas
(3.6)
Subsequently, we claim that
(3.7)
To prove (3.7), we consider two cases. When
, (3.7) holds obviously. Next, we assume
. Since
is
-Lipschitz, we obtain
where the second inequality is a consequence of the fact that $\nabla\varphi$ is Lipschitz continuous with modulus $L_{\varphi}$ on any bounded subset, that is,
(3.8)
Since
, (3.7) becomes
(3.9)
Moreover, since $\varphi$ is 1-strongly convex, we get
, i.e.,
(3.10)
Substituting (3.8), (3.9) and (3.10) into (3.6), we conclude that
(3.11)
Then, combining (3.5) and (3.11), we obtain that
(3.12)
Consequently, according to (1.6a), we have
(3.13)
Finally, summing the inequalities (3.12) and (3.13), we conclude that
This completes the discussion of case (i); we now proceed to case (ii).
(ii)
, combining the same formulas
(3.14)
Similarly, substituting (3.8), (3.9) and (3.10) into (3.14), we conclude that
(3.15)
Then, combining (3.5) and (3.15), we obtain that
(3.16)
Consequently, according to (1.6a), we have
(3.17)
Finally, summing the inequalities (3.16) and (3.17), we conclude that
This completes the discussion for case (ii), and with this, the proof is complete.□
Remark 3.1. Since
, Lemma 3.1 implies that
is monotonically nonincreasing. Note that when $r=0$, we recover the corresponding requirement in Tan and Guo [24]. Furthermore, when $r=0$ and $\varphi(\cdot)=\frac{1}{2}\|\cdot\|^{2}$, we have $L_{\varphi}=1$ and $D_{\varphi}(x,y)=\frac{1}{2}\|x-y\|^{2}$, and thus we recover the corresponding requirement in Guo et al. [19].
Lemma 3.2. Let $\{w^{k}\}$ be the sequence generated by the symmetric Bregman ADMM (1.6), which is assumed to be bounded. Then we have
$$\lim_{k\to\infty}\|w^{k+1}-w^{k}\|=0.$$
Proof: Considering that
is bounded, it has at least one limit point. Let
be a limit point of
and let
be the subsequence converging to it, i.e.
. Given that
is a lower semicontinuous function, we can deduce that
Consequently,
is bounded from below. Besides, the fact that
is nonincreasing implies that
is convergent. Moreover,
is convergent, and
. Based on Equation (3.2), we can obtain
Summing up the above inequality from
to
, we conclude that
Owing to
, we have
, which implies
. Consequently, according to (3.9), we get that
Recall Lemma 2.5, we have
Combining the two equalities, we obtain
Then, using the 1-strong convexity of $\varphi$ and Assumption A(iii), we obtain
Thus, combining the above two formulas, we obtain that
(3.18)
where
. Then, the last inequality implies
.
Thus,
. This completes the proof.□
Lemma 3.3. Let
be the sequence generated by the symmetric Bregman ADMM (1.6), which is assumed to be bounded, and suppose that Assumption A holds. For any positive integer
, we define
Then,
and there exists
such that
Proof: By definition of function
, we can obtain that
(3.19)
Combining the optimality condition (3.1) with equation (3.19), we get
Then, according to Lemma 2.2, we obtain that
.
Furthermore,
In addition,
On the other hand,
Based on the above relationship, we know that there exist
such that
Defining
and using (3.9), we obtain
This completes the proof.
Lemma 3.4. Let
be the sequence generated by the symmetric Bregman ADMM (1.6), which is assumed to be bounded. Let
denote the set of its limit points. Then
(i)
is a nonempty compact set, and
, as
;
(ii)
, where
denotes the set of all stationary points of
;
(iii)
is finite and constant on
, which equals
Proof:
(i) The assertion follows immediately from the definition of limit points.
(ii) We assume that
, then there exists a subsequence
that converges to
. By Lemma 3.2, we have
(3.20)
Consequently, we deduce that
also converges to
. Given that
is the minimizer of
with respect to the variable
, we have
(3.21)
On one hand, according to (3.20), (3.21) and the continuity of
with respect to
and
, we get
(3.22)
On the other hand, using the lower semicontinuity of
, we have
(3.23)
The above two relations (3.22) and (3.23) imply that
Taking the limit in the optimality conditions (3.1) along the subsequence
, and utilizing (3.20) yields
Thus,
is a critical point of (1.5), which implies that
.
(iii) For any point
, there exists a subsequence
that converges to
as
. By merging equations (3.22) and (3.23) with the observation that the sequence
is nonincreasing, we have
Therefore,
is constant on
. Furthermore, we also have
.
Hence, we have completed the proof.
Now we present the main convergence result for the symmetric Bregman ADMM (1.6).
Theorem 3.1. Let $\{w^{k}\}$ be the sequence generated by the symmetric Bregman ADMM (1.6), which is assumed to be bounded. Suppose that $L_{\beta}$ is a KL function. Then $\{w^{k}\}$ has finite length, that is,
$$\sum_{k=0}^{\infty}\|w^{k+1}-w^{k}\|<+\infty,$$
and, as a consequence, $\{w^{k}\}$ converges to a critical point of $L_{\beta}$.
Proof: From the proof of Lemma 3.4, it follows that
for all
. Next, we will consider two cases.
(i) If there exists an integer
for which
. Then, according to Remark 3.1 and (3.2), we have
for any
. Consequently, we conclude that
for any
. Combining (3.9) and (3.18), we further deduce that
and
for any
, which implies that
. As a result, the assertion is substantiated.
(ii) If
for all
. Considering that
, there exists
, such that
for any
. Furthermore, with
, it follows that there exists
, such that
for any
.
Thus, for all
, we derive the following conclusions
Since
is constant on
and
is a nonempty compact set, we use Lemma 2.3 with
to derive that for any
,
(3.24)
Relying on the fact that
, and the concavity of
, it follows that
Combining
with the above inequality,
and inequality (3.24), we can obtain
(3.25)
For convenience, for all
, we define
.
Hence, (3.25) can be simplified as
(3.26)
Combining inequality (3.26) with Lemma 3.3, we get that for all
,
Then,
Applying the fact that
, we obtain
(3.27)
Summing up (3.27) over
, we have
Note that
from Definition 2.8. Taking
, we have
(3.28)
Therefore,
(3.29)
Combining (3.9) and (3.29), we get
(3.30)
Using (3.18), we obtain
Combining above inequality with (3.29) and (3.30), we have
(3.31)
Furthermore, we note that
Using (3.29), (3.30) and (3.31), it follows that
which implies that
is a Cauchy sequence and thus convergent. The assertion follows from Lemma 3.4 immediately.□
Next, we provide sufficient conditions to ensure that the sequence
generated by the symmetric Bregman ADMM (1.6) is bounded.
Lemma 3.5. Let
be the sequence generated by the symmetric Bregman ADMM (1.6). Suppose that
If at least one of the following statements is true:
(i)
.
(ii)
and
.
Then, the sequence
is bounded.
Proof. Suppose that condition (i) holds. From Lemma 3.1, we have
which implies
(3.32)
Using the 1-strong convexity of the function
and the definition of
, we have
(3.33)
Combining (3.32) with (3.33) and using the fact
, we get
(3.34)
Note that (i) implies that
. When
, we have
; besides, when
, we have
. Therefore, we can derive that
,
and
are bounded. Consequently,
is also bounded, and hence
is bounded.
Next, suppose condition (ii) holds. Using
and (3.34), we get that
Note that
implies that
. When
, we have
; besides, when
, we have
. Therefore, we conclude that the sequences
,
and
are bounded. Then, from (3.18), it follows that
is also bounded, and as a consequence,
is bounded. This completes the proof.
Theorem 3.2. (Convergence rate) Let $\{w^{k}\}$ be the sequence generated by the symmetric Bregman ADMM (1.6), and suppose it converges to $w^{*}=(x^{*},y^{*},\lambda^{*})$. Assume that $L_{\beta}$ has the KL property at $w^{*}$ with $\psi(s)=c\,s^{1-\theta}$, $\theta\in[0,1)$, $c>0$. Then, the following results hold:
(i) If $\theta=0$, then the sequence $\{w^{k}\}$ converges in a finite number of steps.
(ii) If $\theta\in(0,\frac{1}{2}]$, then there exist $c_{1}>0$ and $\tau\in[0,1)$ such that
$$\|w^{k}-w^{*}\|\le c_{1}\,\tau^{k}.$$
(iii) If $\theta\in(\frac{1}{2},1)$, then there exists $c_{2}>0$ such that
$$\|w^{k}-w^{*}\|\le c_{2}\,k^{-\frac{1-\theta}{2\theta-1}}.$$
Proof: Firstly, consider the case $\theta=0$; then $\psi(s)=cs$ and $\psi'(s)=c$. We argue by contradiction: suppose that $\{w^{k}\}$ does not converge in a finite number of steps. Then the KL property at $w^{*}$ yields
$$\operatorname{dist}\big(0,\partial L_{\beta}(w^{k})\big)\ge\frac{1}{c}$$
for any sufficiently large $k$, which is contrary to Lemma 3.1.
Secondly, consider the case $\theta\in(0,1)$, and set $\Delta_{k}=\sum_{i=k}^{\infty}\|w^{i+1}-w^{i}\|$ for $k\ge 0$
. By the triangle inequality, we derive that
, and hence it suffices to estimate
. With these notations, it follows from (3.28) that
Invoking the KL property of
at
, we conclude that
This implies
(3.35)
According to Lemma 3.1, we get
(3.36)
Combining (3.35) and (3.36), it follows that there exists
such that
Therefore,
Sequences satisfying such inequalities have been studied in [33]. It is shown that
If
, then there exists
and
, such that
(3.37)
(3.38)
Recalling that
consequently,
(3.39)
Furthermore, from the relations
and
it follows that
i.e.,
Subsequently, combining the 1-strong convexity of $\varphi$ and the above equality, it follows that
(3.40)
The desired inequalities follow from (3.37)-(3.40) immediately.□
4. Conclusions
In this paper, we proposed a symmetric Bregman ADMM, which reduces to the symmetric ADMM, the Bregman ADMM, and the classical ADMM as special cases, while circumventing the requirement of global Lipschitz continuity of the gradient when minimizing a linearly constrained nonconvex problem whose objective function is the sum of two separable nonconvex functions. Under certain assumptions, and provided that the associated function satisfies the Kurdyka-Lojasiewicz inequality, we proved that the iterative sequence generated by the symmetric Bregman ADMM converges to a critical point of the problem. Finally, we established the convergence rate of the algorithm.