_{1}

^{*}

The Alternating Direction Multiplier Method (ADMM) is widely used in various fields, and different variables are customized in the literature for different application scenarios [1] [2] [3] [4] . Among them, the linearized alternating direction multiplier method (LADMM) has received extensive attention because of its effectiveness and ease of implementation. This paper mainly discusses the application of ADMM in dictionary learning (non-convex problem). Many numerical experiments show that to achieve higher convergence accuracy, the convergence speed of ADMM is slower, especially near the optimal solution. Therefore, we introduce the linearized alternating direction multiplier method (LADMM) to accelerate the convergence speed of ADMM. Specifically, the problem is solved by linearizing the quadratic term of the subproblem, and the convergence of the algorithm is proved. Finally, there is a brief summary of the full text.

With the development of technology, data collection and processing have become easier, and many areas involve high-dimensional data issues, such as information technology, economic finance, and data modeling. Faced with such huge data, many researchers have proposed different solutions, and compressed sensing and sparsity has become an effective algorithm, because sparsity reduces the dimensionality of data in a certain sense and alternates direction multiplier method (ADMM) [

The ADMM algorithm was first proposed by Gabay, Meicher and Glowinski in the mid-1970 [

First consider the convex optimization problem with equality constraints

min f ( x ) s .t . A x = b (1)

where x ∈ R n , A ∈ R m × n , f : R n → R is a convex function.

Firstly, an optimization algorithm with good properties is introduced, which augments the Lagrangian multiplier method. The augmented Lagrangian function is defined as:

L ρ ( x , λ ) = f ( x ) − λ T ( A x − b ) + ( ρ / 2 ) ‖ A x − b ‖ 2 2 , (2)

where ρ > 0 is called the penalty parameter. when ρ = 0 , L 0 is the Lagrangian function. The iterative steps of the augmented Lagrangian multiplier method are:

x ( k + 1 ) : = arg min L ρ ( x , λ k ) λ k + 1 : = λ k − ρ ( A x k + 1 − b ) (3)

where λ is the Lagrangian multiplier, i.e. the dual variable.

The advantage of this algorithm is that the convergence of the iterative sequence can be guaranteed without too strong conditions. For example, for the penalty parameter, it is not required to increase to infinity in the iterative process, and a fixed value can be taken. But the disadvantage of this algorithm is that when the objective function is separable, the model becomes:

min f ( x ) + g ( y ) s .t . A x + B y = b (4)

where g is also a convex function. In the x iteration step, the augmented Lagrangian function L ρ is inseparable, and the discrete variables cannot be solved in parallel for the x subproblem. This leads to the ADMM algorithm we will discuss in the next section. The Alternating Direction Method (ADMM) is mainly used to solve the optimization problem with separable variables like (4), where x ∈ R n , y ∈ R m , A ∈ R p × n , B ∈ R p × m , b ∈ R p . Let’s assume that both f and g are convex functions, and then make other assumptions. Similar to the definition in the previous section, the augmented Lagrangian penalty function of (4) is:

L ρ ( x , y , λ ) = f ( x ) + g ( y ) − λ T ( A x + B y − b ) + ( ρ / 2 ) ‖ A x + B y − b ‖ 2 2 (5)

The steps of the ADMM algorithm iteration are as follows:

x k + 1 : = arg min L ρ ( x , y k , λ k ) y k + 1 : = arg min L ρ ( x k + 1 , y , λ k ) λ k + 1 : = λ k − ρ ( A x ( k + 1 ) + B y ( k + 1 ) − b ) (6)

where ρ > 0 . The similarity between the algorithm and the augmented Lagrangian multiplier method is to iteratively solve the variables x and y and then iteratively solve the dual variables.

If the augmented Lagrangian multiplier method is used for iterative solution:

( x ( k + 1 ) , y ( k + 1 ) : = arg min L ρ ( x , y , λ ( k ) ) ) λ k + 1 : = λ k − ρ ( A x ( k + 1 ) + B y ( k + 1 ) − b ) (7)

As mentioned in the previous section, you can see that the augmented Lagrangian multiplier method deals with two separate variables at the same time, and ADMM alternates the variables, which is the origin of the algorithm name. It can be considered that this is the use of Gauss-Seidel iterations on two variables. For details, please refer to. It is obvious from the algorithm framework that the ADMM algorithm is more suitable for solving the problem of having separate variables because the objective functions f and g are also separated.

To get a simpler form of ADMM, normalize the dual variable so that μ = ( 1 / ρ ) λ . Then the ADMM iteration becomes:

x ( k + 1 ) : = arg min ( f ( x ) + ( ρ / 2 ) ‖ A x + B y ( k ) − b + μ ( k ) ‖ 2 2 ) y ( k + 1 ) : = arg min ( g ( y ) + ( ρ / 2 ) ‖ A x k + 1 + B y − b + μ ( k ) ‖ 2 2 ) μ ( k + 1 ) : = μ ( k ) − ( A x ( k + 1 ) + B y ( k + 1 ) − b ) (8)

ADMM convergence: Regarding the convergence of ADMM, please refer to the literature.

As we all know, the alternating direction method (ADMM) is one of the effective algorithms for solving large-scale sparse optimization problems. It is solved by splitting the problem into a number of low-dimensional sub-problems by augmented Lagrangian function construction. In recent years, a large number of working signals have pointed to the sparse expression of signals. Sparse expression refers to the use of a D ∈ R m × n ( m ≪ n ) dictionary, the dictionary contains nSignal atoms { d j } j = 1 n . A signal y ∈ R m can be expressed as a sparse linear representation of these signal atoms. In fact, the so-called sparse means that the number of non-zero coefficients is much smaller than that of n. Such a sparse representation may be a determined y = D x or an approximate representation with an error term ‖ y − D x ‖ p ≤ ε . The vector x ∈ R n is the signal y sparse expression coefficient. In practice, p often takes a value of 1, 2, or ∞.

If m < n and the dictionary D is full rank, then the underdetermined system of the problem has an infinite number of solutions, and the solution using the least non-zero coefficient is one of them, and is the solution we hope to find. Sparse expression is expressed as a mathematical expression

( P 0 ) min ‖ x ‖ 0 subject to y = D x (9)

Or

( P 0 ϵ ) min ‖ x ‖ 0 subject to ‖ y − D x ‖ 2 ≤ ϵ (10)

where ‖ ⋅ ‖ 0 is ι 0 ―module, which means that the corresponding vector takes a non-zero quantity.

Dictionary DesignLearn the dictionary based on the signal set. First given a data set Y = { y i } i = 1 L , assuming that there is a dictionary D so that for a given signal can be represented as a sparse representation of the dictionary, i.e. for a given signal y i , the model ( P 0 ) Or ( P 1 ) can find the sparse coefficient x i . The question then is how to find such a dictionary D. Detailed reference can be found in the literature [

The model of the problem can be written as:

min D , X { ‖ Y − D x ‖ F 2 } s .t . ‖ x i ‖ 0 ≤ τ 0 , i = 1 , ⋯ , L (11)

where τ 0 is the upper bound of the coefficient sparsity, x i is the ith column of the coefficient matrix X, and ‖ ⋅ ‖ F 2 is the Frobenius norm of the matrix, i.e. the sum of the squares of the elements of the matrix.

Another model for dictionary learning is corresponding to the above model.

min D , X ∑ i = 1 L ‖ x i ‖ 0 s .t . { ‖ Y − D x ‖ F 2 } ≤ ϵ (12)

ϵ is a fixed error value.

Before applying ADMM, first make some transformations to the model, let Z = D X , then the model becomes:

min D , X , Z { ‖ Y − D X ‖ F 2 } s .t . Z = D X , ‖ x i ‖ 0 ≤ τ 0 , i = 1 , ⋯ , L (13)

Then the augmented Lagrangian function of the problem is:

L β ( D , X , Z , Λ ) : = ‖ Y − D X ‖ F 2 + ∑ i = 1 L 〈 Λ i , ( Z − D X ) i 〉 + β 2 ‖ Z − D X ‖ F 2 (14)

where Λ is the Lagrange multiplier matrix and Λ i is the ith column of Λ .

Using the ADMM algorithm in the above model, there is a X subproblem

min X ∑ i = 1 L 〈 Λ i , ( Z − D X ) i 〉 + β 2 ‖ Z − D X ‖ F 2 s .t . ‖ x i ‖ 0 ≤ τ 0 , i = 1 , ⋯ , L (15)

Equivalent to

min X β 2 ‖ Z + Λ / β − D X ‖ F 2 s .t . ‖ x i ‖ 0 ≤ τ 0 , i = 1 , ⋯ , L (16)

The Z subproblem is

min Z ‖ Y − Z ‖ F 2 + ∑ i = 1 L 〈 Λ i , ( Z − D X ) i 〉 + β 2 ‖ Z − D X ‖ F 2 (17)

This sub-question has a solution.

Z = ( β D X + 2 Y − Λ ) / ( 2 + β ) (18)

D sub-problem is

min D β 2 ‖ Z + Λ / β − D X ‖ F 2 (19)

Λ updated to

Λ ( k + 1 ) = Λ ( k ) + γ β ( Z − D X ) (20)

But ADL (ADMM for Dictionary Learning) is prone to the local best of the problem. Using linearization techniques, we extended LADMM to solve the problem of ADL local straits and proved the convergence of the algorithm. Numerical experiments are used to illustrate the effectiveness of the proposed algorithm.

In order to apply ADMM, we can rewrite (11) into the following form

min D , X ∑ i = 1 L ‖ x i ‖ 0 s .t . Y = D X (21)

Then the augmented Lagrangian function of the model (20) is

L μ ( X , Y , λ ) = ∑ i = 1 L ‖ x i ‖ 0 − λ T ( Y − D X ) + μ 2 ‖ Y − D X ‖ 2 2 (22)

The iterative method of ADMM is:

{ X k + 1 = arg min L μ ( X , Y k , λ k ) Y k + 1 = arg min L μ ( X k + 1 , Y , λ k ) λ k + 1 = λ k − μ ( Y k + 1 − D X k + 1 ) (23)

Now, we solve the subproblem in (22). First we solve the X-sub problem.

X k + 1 = arg min { ∑ i = 1 L ‖ x i ‖ 0 − ( λ k ) T ( Y k − D X ) + μ 2 ‖ Y k − D X ‖ 2 2 } = arg min { ∑ i = 1 L ‖ x i ‖ 0 − ( λ k ) T ( Y k − D X ) + μ 2 ‖ Y k − D X ‖ 2 2 } = ∑ i = 1 L ‖ x i ‖ 0 + μ 2 ‖ Y k − D X − λ k / μ ‖ 2 2 (24)

Because of the non-identity of matrix D, this subproblem does not show a solution. Inspired by [

D T ( D X k − Y k − λ k / μ ) T ( X − X k ) + ρ 2 ‖ X − X k ‖ 2 2 (25)

The parameter ρ > 0 controls the degree of approximation of X and X k , then we solve the following problem and use the solution of this problem to approximate the solution of the subproblem generated by ADMM.

X k = arg min { ∑ i = 1 L ‖ x i ‖ 0 + μ ρ 2 ‖ X − X k ‖ 2 2 + μ [ D T ( D X k − Y k − λ k / μ ) T ( X − X k ) ] } = arg min { ∑ i = 1 L ‖ x i ‖ 0 + μ ρ 2 ‖ X − X k + D T ( D X k − Y k + λ k / μ ) / ρ ‖ 2 2 } (26)

For the above problem, it is known from [

( x i ) k + 1 = arg min { ∑ i = 1 L ‖ x i ‖ 0 + μ ρ 2 ‖ x i − ( X k − D T ( D X k − Y k − λ k / μ ) ρ ) i ‖ 2 2 } = shrink ( ( X k − D T ( D X k − Y k − λ k / μ ) / ρ ) i , λ μ ρ ) (27)

Furthermore, for the Y subproblem, the Equation (10) in [

Y k + 1 = shrink 1 , 2 ( D X k + 1 − λ κ μ , 1 μ ) (28)

As can be seen from the above discussion, the LADMM iterative algorithm can be described by the following table.

Convergence Proof

In this section, we will demonstrate that the LADMM algorithm is convergent. I p is the unit matrix of p × p . Let S = R p × R n + p × R n + p and ω = ( β T , γ T , α T ) T , f ( β ) ∈ ∂ ( ‖ β ‖ 2 , 1 ) , g ( γ ) ∈ ∂ ( ‖ γ ‖ 1 ) . Solving the model is equivalent to finding ( β * , γ * , α * ) that satisfies the KKT condition of the model, i.e.

{ λ 2 f ( β * ) − X ˜ T α * = 0 g ( γ * ) + α * = 0 X ˜ β * − y ˜ − γ * = 0 (29)

Let us remember that the set of elements that satisfy the above formula in S is S * . The KKT condition of the above formula can be written as the form of variational inequality (VI) as:

( ω − ω * ) T F ( ω * ) ≥ 0 , ∀ ω ∈ S , (30)

where

F ( ω ) = { λ 2 f ( β ) − X ˜ T α g ( γ ) + α X ˜ β − y ˜ − γ } (31)

In order to prove these conclusions, as well as the proof of convergence of LADMM, need to introduce some lemma. For details, please refer to the literature [

In this chapter, we will discuss the application of the algorithm in image deblur ring to prove the effectiveness of the algorithm. All experiments were carried out on a four-core notebook computer with Intel Intel(R) Core(TM) i5-7200UCPU @ 2.50 GHz and 4 GB memory. Procedures for this experiment, pictures are referenced [

The noise levels are delta = 0.256, 0.511 respectively. For comparison, we also include the results of FTVd v4.1 in [

PSNR = 10 ⋅ log 10 255 2 MSE [ dB ]

MSE represents the average square error of each pixel.

In this paper, we propose a linearized alternating direction multiplier method

(LADMM) to solve the problem of rapid convergence of the dictionary model that is easy to fall into the local optimal solution. This model combines the advantages of linearization features to make sub-problems easier to solve. When the algorithm is applied to the problem of image reconstruction, we use the quadratic term of the linearized sub-problem to make the sub-problem easier to solve.

Moreover, in the case of similar PSNR, the algorithm does not need regularization operator.

In addition, we analyze the convergence of the LADMM algorithm. Our next step is to generalize the model to more non-convex problems (non-negative matrix factorization problems) and use practical problems.

The author declares no conflicts of interest regarding the publication of this paper.

Yu, X.L. (2019) Application of Linearized Alternating Direction Multiplier Method in Dictionary Learning. Journal of Applied Mathematics and Physics, 7, 138-147. https://doi.org/10.4236/jamp.2019.71012