A Symmetric Alternating Direction Method of Multipliers with Two Different Relaxation Factors for Solving Non-Separable Nonconvex Minimization Problems
1. Introduction
In many practical problems such as image processing, machine learning, and statistical modeling [1]-[3], the objective function often contains a coupling term
. That is, the nonconvex and nonseparable problem considered in this paper takes the following form:
(1)
where
is a proper lower semicontinuous function, possibly nonsmooth and nonconvex, both
and
are continuously differentiable and possibly nonconvex functions, the gradient
is
-Lipschitz continuous and the gradient
is
-Lipschitz continuous.
is a matrix and
is a vector. In particular, when
, problem (1) reduces to a separable optimization problem of the following form:
(2)
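For orientation, a generic model consistent with the description above can be written as follows; the symbols f, g, H, A, B, b here are our illustrative notation, and the exact constraint of (1) may differ from this sketch:

```latex
\min_{x \in \mathbb{R}^{n},\, y \in \mathbb{R}^{m}} \; f(x) + g(y) + H(x, y)
\quad \text{s.t.} \quad Ax + By = b .
```

When the coupling term H vanishes identically, the objective splits and the model reduces to the separable two-block form (2), i.e. minimizing f(x) + g(y) subject to the same linear constraint.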
The alternating direction method of multipliers (ADMM) is one of the most effective approaches for solving problem (2). The convergence of ADMM has been extensively studied [4]-[9], and the convergence rate analysis is relatively well-established for problem (2). However, when
, for problem (1), the convergence results for ADMM are relatively limited. Gao et al. [10] proved the convergence of the proximal ADMM under the setting where
is smooth,
and
are convex, combined with the assumption that
is Lipschitz continuous. Chen et al. [11] proved the convergence of the extended ADMM when the coupling term
is a quadratic function. In recent years, researchers have begun to focus on solving (1) by using ADMM in non-convex settings, and have employed tools such as the Kurdyka-Łojasiewicz (KL) inequality to prove the convergence of the algorithm. Guo et al. [12] [13] proved the convergence of the classic ADMM and extended it to the generalized ADMM (GADMM). Liu et al. [14] proposed a linearized ADMM and proved the convergence of the algorithm.
Regarding problem (2), it has been established in the literature that, when convergence occurs, the symmetric ADMM often converges faster than the classical ADMM. Wu et al. [15] proposed a symmetric ADMM with one relaxation factor and verified its convergence. On this basis, some scholars have studied the symmetric ADMM with a relaxation factor and its variants for solving the nonseparable problem (1). For example, Dang et al. [16] proposed a linear proximal symmetric ADMM and verified that the sequence generated by the algorithm converges to a stationary point of the problem. The iterative scheme is as follows:
The parameters are selected as follows:
Dang et al. [17] proposed an inertial Bregman symmetric ADMM algorithm and established its global convergence. The iterative scheme is as follows:
where
,
represents the Bregman distances, and
is a relaxation factor. However, the convergence of the sequence generated by the algorithm depends on the strong convexity of the kernel function associated with the Bregman distance.
In recent studies on solving problem (2), we observe that Lu et al. [18] proposed a symmetric ADMM with two different relaxation factors. Without introducing the Bregman distance, they proved its convergence over a broader range of parameters, and numerical experiments verified that their algorithm converges faster than the algorithm proposed in [15]. Its iterative scheme is as follows:
(3)
Among them,
is the augmented Lagrangian function associated with problem (2), defined as follows:
(4)
where
is the penalty parameter,
and s are two different relaxation factors.
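For reference, with a generic linear constraint Ax + By = b and penalty parameter β > 0, an augmented Lagrangian of the form (4) typically reads as follows; the multiplier sign convention and the symbols used here are our assumptions, not necessarily the paper's exact display:

```latex
\mathcal{L}_{\beta}(x, y, \lambda) \;=\; f(x) + g(y)
\;-\; \langle \lambda,\; Ax + By - b \rangle
\;+\; \frac{\beta}{2}\,\| Ax + By - b \|^{2} .
```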
The above research on symmetric ADMM algorithms for solving problem (1) is limited to algorithms containing only one relaxation factor. Inspired by the algorithm proposed in the work of Lu et al. [18], we consider introducing two relaxation factors to achieve wider applicability, faster convergence, and a more concise and unified algorithmic framework. Accordingly, this paper proposes a symmetric ADMM with two different relaxation factors to solve the non-separable problem (1); the iterative scheme is as follows:
(5)
Compared with the algorithm proposed by Dang et al. [16], our algorithm has a wider range of parameter values, allowing it to be applied to more practical scenarios through appropriate parameter tuning. Moreover, unlike the algorithm proposed by Dang et al. [17], our method does not require the introduction of the Bregman distance, thereby providing a unified iterative framework for solving non-separable problems via ADMM. The optimality conditions are as follows:
(6)
In the case where
, when
and
, the proposed algorithm reduces to the classical ADMM. When
and
, the algorithm reduces to the symmetric ADMM with a single relaxation factor. This demonstrates that our method provides an improved basic iterative framework for symmetric ADMM with relaxation factors.
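To make the scheme concrete, here is a minimal, self-contained sketch of a symmetric ADMM sweep with two different relaxation factors on a toy nonseparable instance. Every specific choice below (the functions, the constraint x + y = b, and the parameter values mu, beta, r, s) is an illustrative assumption of ours, not the paper's experiment:

```python
import numpy as np

# Toy nonseparable model (all specifics are illustrative assumptions):
# minimize f(x) + g(y) + H(x, y) subject to x + y = b, with
#   f(x) = ||x||_1            (proper, lower semicontinuous, nonsmooth),
#   g(y) = 0.5 * ||y||^2      (smooth),
#   H(x, y) = mu * <x, y>     (the nonseparable coupling term).

def soft(v, t):
    """Soft-thresholding operator, the proximal map of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def symmetric_admm(b, mu=0.3, beta=5.0, r=0.8, s=0.8, iters=500):
    n = b.size
    x, y, lam = np.zeros(n), np.zeros(n), np.zeros(n)
    for _ in range(iters):
        # x-step: argmin_x f(x) + H(x, y) - <lam, x> + (beta/2)||x + y - b||^2
        x = soft(b - y + (lam - mu * y) / beta, 1.0 / beta)
        # first (half) dual step, relaxation factor r
        lam_half = lam - r * beta * (x + y - b)
        # y-step: argmin_y g(y) + H(x, y) - <lam_half, y> + (beta/2)||x + y - b||^2
        y = (lam_half - mu * x + beta * (b - x)) / (1.0 + beta)
        # second dual step, relaxation factor s
        lam = lam_half - s * beta * (x + y - b)
    return x, y, lam

b = np.array([1.5, -0.2, 0.8, 2.0])
x, y, lam = symmetric_admm(b)
print(np.linalg.norm(x + y - b))  # primal residual; tiny after 500 sweeps
```

In this sketch, setting r = 0 collapses the two dual steps into the single classical update, while choosing r = s recovers a symmetric ADMM with one relaxation factor.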
2. Preliminaries
In this section, we provide the necessary definitions and properties required for the following study.
Notations:
,
,
: real numbers,
-dimensional real vectors,
-real matrices.
and
: inner product and associated norm.
Set-valued mapping
:
Distance: For
and
,
, with
.
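In our rendering of the usual definition, the distance function reads:

```latex
\operatorname{dist}(x, S) \;=\; \inf_{y \in S} \| x - y \|,
\qquad \operatorname{dist}(x, \varnothing) \;=\; +\infty .
```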
Definition 1. [19] The domain of function
is denoted by
then
is called a proper function.
Definition 2. [19] A function
is said to be lower semicontinuous at
if it satisfies
If this holds for every point in
, then
is said to be a lower semicontinuous function.
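The defining inequality of lower semicontinuity, in the usual form (our notation), is:

```latex
\liminf_{y \to \bar{x}} f(y) \;\ge\; f(\bar{x}) .
```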
Definition 3. [20] Let
be a proper lower semicontinuous function.
(i) The Fréchet subdifferential of
at
, written
, is the set of vectors
that satisfy
When
, we set
.
(ii) The limiting-subdifferential of
at
, written
, is defined as follows
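In standard variational-analysis notation (our rendering of the usual definitions, e.g. those of Rockafellar and Wets), the two sets in Definition 3 read:

```latex
\hat{\partial} f(x) \;=\; \Big\{ v :
\liminf_{y \to x,\; y \neq x}
\frac{f(y) - f(x) - \langle v,\, y - x \rangle}{\| y - x \|} \;\ge\; 0 \Big\},
\qquad
\partial f(x) \;=\; \Big\{ v : \exists\, x^{k} \to x,\; f(x^{k}) \to f(x),\;
v^{k} \in \hat{\partial} f(x^{k}),\; v^{k} \to v \Big\}.
```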
Lemma 1. [21] Let
be a continuously differentiable function and suppose that
is
-Lipschitz continuous. Then
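The conclusion of Lemma 1 is the standard descent lemma; in generic notation (h continuously differentiable with an L-Lipschitz gradient, our symbols) it states:

```latex
h(y) \;\le\; h(x) + \langle \nabla h(x),\, y - x \rangle
+ \frac{L}{2}\, \| y - x \|^{2}
\qquad \text{for all } x, y .
```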
Definition 4. ([22], Kurdyka-Łojasiewicz inequality) Let
be a proper lower semicontinuous function. For
, set
We say that function
has the KL property at
if there exist
, a neighbourhood
of
, and a continuous concave function
, such that
(i)
;
(ii)
is
on
and continuous at 0;
(iii)
;
(iv) for all
in
, the Kurdyka-Łojasiewicz inequality holds
where
is the distance from
to
.
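As a sanity check of the inequality in (iv), here is a small numerical illustration on a toy function of our own choosing (not from the paper): f(x) = x^2 satisfies the KL property at xbar = 0 with desingularizing function phi(t) = sqrt(t), and the KL quantity evaluates to exactly 1 off the critical point:

```python
import numpy as np

# Toy KL check (our own example): for f(x) = x^2 and phi(t) = sqrt(t),
# phi'(t) = 0.5 / sqrt(t) and |f'(x)| = 2|x|, so the KL quantity
# phi'(f(x) - f(0)) * dist(0, grad f(x)) = (0.5 / |x|) * 2|x| = 1 >= 1.
xs = np.concatenate([np.linspace(-1.0, -0.01, 50), np.linspace(0.01, 1.0, 50)])
kl = (0.5 / np.abs(xs)) * np.abs(2.0 * xs)
print(kl.min(), kl.max())  # both equal 1 up to floating-point rounding
```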
Lemma 2. ([23], Uniformized KL property) Let
be a compact set and
be a proper and lower semicontinuous function. Assume that
is constant on
and satisfies the KL property at each point of
. Then, there exist
, and
such that for all
and for all
in the following intersection:
one has
Lemma 3. [24] Suppose that
, where
and
are proper lower semicontinuous functions. Then for all
, we have
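A commonly used form of this calculus rule, in our rendering and under the assumption that the second summand is continuously differentiable, is:

```latex
\partial (g + h)(x) \;=\; \partial g(x) + \nabla h(x)
\qquad \text{whenever } h \text{ is continuously differentiable at } x .
```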
Definition 5. ([24], Kurdyka-Łojasiewicz function) If
satisfies the KL property at each point of
, then
is called a KL function.
Definition 6.
is a stationary point of the augmented Lagrangian function
for problem (1), if and only if
(7)
Lemma 4. Let the iterative sequence generated by the algorithm (5) be denoted as
. Then, the following holds
Proof. Combining the second and fourth equations in (6) yields
subtracting the above two equations, we obtain
(8)
and thus
(9)
(10)
According to the optimality condition (6) of the
-subproblem, we have
(11)
putting (9) into (11), we get
(12)
According to the third and fourth equations in (6), we obtain
(13)
This completes the proof. □
3. Convergence Analysis
Assumption A. Let
be a proper lower semicontinuous function, and let
be a continuously differentiable function with an
-Lipschitz continuous gradient
, and
be a continuously differentiable function with an
-Lipschitz continuous gradient
. Also assume that
,
Lemma 5. Let the iterative sequence generated by the algorithm (5) be denoted
as
. Suppose that this sequence is bounded and that Assumption A holds. Then the following conclusion holds
(14)
where
(see Remark 3.1).
Proof. The polarization identity implies
(15)
Since
is
-Lipschitz continuous and
is
-Lipschitz continuous, combining this with Lemma 1 we have
(16)
(17)
From the fourth equation in the optimality condition (6) of the problem, we obtain
(18)
We obtain
in (13); combining this with the Lipschitz continuity of
and
, we have
(19)
Recalling (9) and (10), we get
(20)
From the definition of the augmented Lagrangian function
in (4), and in combination with (15), we have
(21)
Then, combining it with (16) and (17) yields
(22)
Substituting (13) into the above expression and simplifying, we obtain
(23)
Note that, by using (4), (6), (20) and (19), we have
(24)
Summing inequalities (23) and (24) together with the first equation in (5), we obtain
(25)
Thus
(26)
where
. This completes the proof. □
Remark 3.1. Thus, as long as
, the sequence
has sufficient descent properties, which means that
is monotonically nonincreasing. Therefore, we can achieve
by appropriately choosing the parameters
. The constraints on these parameters are as follows:
,
Lemma 6. Let the iterative sequence generated by the algorithm (5) be denoted as
. Suppose that this sequence is bounded, that Assumption A holds and
. Then we have
Proof. The boundedness of
implies the existence of a subsequence
converging to
. Moreover, the lower semicontinuity of
together with the continuity of
ensures that
is lower semicontinuous. Hence,
Hence,
is bounded below. The fact that it is also nonincreasing implies its convergence. Since
is monotonic and contains a convergent subsequence, it follows that
itself converges, satisfying
. Finally, invoking Equation (14) yields
By summing over
, and observing that
, we arrive at
The condition
yields
and
. Using (19), we further obtain
. Hence,
. This completes the proof. □
Lemma 7. Let the iterative sequence generated by the algorithm (5) be denoted as
. Suppose that this sequence is bounded and that Assumption A holds. We define
(27)
Hence, it holds that
, and there exists a constant
such that
(28)
Proof. From the definition of the function
in (4), the following system of equations holds
(29)
From the optimality condition (6), after rearrangement, we have
Then, by substituting the above into (29), we obtain
Consequently, applying Lemma 3 yields
. Based on the preceding relation, we can find
such that
(30)
Applying (19), there exists
for which the following holds
then
(31)
This completes the proof. □
Lemma 8. Let the iterative sequence generated by the algorithm (5) be denoted as
. Suppose that this sequence is bounded and that Assumption A holds. Let
represent the set of all cluster points of the sequence
. Then the following statement is true
(i)
is a nonempty compact set, and
(ii)
, where
denotes the set of all stationary points of
;
(iii)
is finite and constant on
, which equals
Proof. We now verify the above results one by one.
(i) This is immediate from the definition of limit points.
(ii) Suppose
. Then there exists a subsequence
of
such that
. Applying Lemma 6 yields
(32)
thus,
. Noting that
minimizes
with respect to
, we get
(33)
with respect to the variables
,
, and
. Hence, it follows that
(34)
Furthermore, applying (33) yields
(35)
Since
is lower semicontinuous, we know that
(36)
From (34), (35) and (36), we obtain
Taking the limit in (6) along the subsequence
and using (32) again, we obtain
Therefore,
satisfies the critical point condition of (4), which implies that
. Thus,
.
(iii) Take any
. Then there exists a subsequence
, such that
. Since
is nonincreasing and has a convergent subsequence, the entire sequence
converges; hence we have
That is,
takes a constant value on
. Clearly,
This completes the proof. □
Theorem 9. Let the iterative sequence generated by the algorithm (5) be denoted as
. Suppose that this sequence is bounded, that Assumption A holds and
If
is a KL function, then
has finite length, that is
Moreover, it converges to a critical point of
.
Proof. Lemma 8 implies that
for any
. Next, we examine two cases.
(i) Suppose there exists
with
. Then, using (14) and Remark 3.1, for every
, we have
Hence,
and
for any
. Then, by (19), we further obtain
for any
, which means
.
(ii) If
holds for all
, then the following convergence properties hold:
Since
, for any
there exists
such that for all
, it holds that
.
Since
, for any
there exists
such that for all
, it holds that
.
Now set
and take any
, we have
By Lemma 8, we have established that
is a nonempty compact set and that
is constant
. Consequently, Lemma 2 implies that
(37)
Using the established fact that
and the concavity of
, it follows that
Now, taking the above inequality together with
and relation (37), it follows that
For convenience, we define
Then the above inequality is equivalent to
(38)
Lemma 5 and (38) together imply that
together with
and thereby
Using the fact that
, we have
(39)
Taking the sum of (39) over
gives
We have
from Definition 4. Rearranging terms and taking
gives
(40)
Thus,
(41)
and
(42)
Substituting into (19) yields
(43)
Additionally, we note that
From (41), (42) and (43), we arrive at
Finally,
converges to a critical point of
by Lemma 8. This completes the proof. □
Lemma 10. Let
be the sequence generated by the symmetric ADMM (5). If at least one of the following statements holds:
(i)
.
(ii)
and
.
Then we can conclude that the sequence
is bounded.
Proof. First, suppose that condition (i) holds. Based on Lemma 5, we arrive at
By combining (4) with
, we get
Note that (i) implies that
. Based on Assumption A, we can obtain
. Thus, we can deduce that
and
are bounded. Hence,
is also bounded, and thus
is bounded. This completes the proof. □
Theorem 11. (Convergence rate) Let the iterative sequence generated by the algorithm (5) be denoted as
. Suppose that this sequence is bounded, that Assumption A holds and
, and
. Assume that
possesses the KL property at
, and that the corresponding function is given by
, with
,
. Then the following three statements hold
(i) If
, the sequence
converges in finitely many steps. This means we can find an index
with
.
(ii) If
, then there is a constant
and
so that
(iii) If
, then there is a constant
for which
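These three cases follow the standard KL-exponent dichotomy; schematically, in our notation, with a desingularizing function of the form phi(t) = c t^(1-theta):

```latex
\theta = 0:\ \text{finite termination};\qquad
\theta \in \Big(0, \tfrac{1}{2}\Big]:\ \| z^{k} - z^{*} \| \le c\,\tau^{k}
\ \ (\tau \in [0,1));\qquad
\theta \in \Big(\tfrac{1}{2}, 1\Big):\ \| z^{k} - z^{*} \| \le
c\, k^{-\frac{1-\theta}{2\theta - 1}} .
```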
Proof. For
, we have
and
. Suppose, contrary to the claim, that
does not terminate in finitely many steps. Then, for large enough
, the KL property yields
, which contradicts Lemma 7. Now let
and set
for
. From (40) we deduce
(44)
Because
has the KL property at
, we have
This inequality can be rearranged as
(45)
Finally, applying Lemma 7 yields
(46)
From (45) and (46), there exists
satisfying
Inserting this into (44) gives
(47)
Now, by (47) and the results of Attouch and Bolte [25], we obtain
(48)
(49)
Next, applying (19) yields
(50)
Thus, (ii) and (iii) follow from (48)-(50). This completes the proof. □
4. Conclusion
In this paper, we propose a symmetric alternating direction method of multipliers with two different relaxation factors for minimizing the sum of two non-separable nonconvex functions. For problems where the objective function contains a coupling term, i.e., the case where
and
are non-separable, research remains limited in both convex and non-convex settings. We review the development of symmetric ADMM for solving non-separable problems and find that although many existing works have proposed symmetric ADMM variants incorporating techniques such as Bregman distances, inertial terms, regularization terms, or linearization, they often introduce only one relaxation factor. Inspired by this, we introduce two different relaxation factors and apply the algorithm to non-separable nonconvex problems, thereby refining the basic form of symmetric ADMM for solving non-separable problems. This makes the parameter range of the algorithm broader, allowing it to be adapted to more practical problems by adjusting the parameters. It also provides fundamental theoretical support for further integration with other techniques. Finally, based on the Kurdyka-Łojasiewicz (KL) property, we prove that the sequence generated by the algorithm converges to a stationary point of the problem and further analyze its finite-step convergence, linear convergence, and sublinear convergence.
Acknowledgements
Sincere thanks to the members of JAMP for their professional work, and special thanks to managing editor Hellen XU for her consistently high standards.