This paper presents an accurate and efficient algorithm for solving the generalized elastic net regularization problem with a smoothed $l_0$ penalty for recovering sparse vectors. Finding the optimal solution of the unconstrained $l_0$ minimization problem in the recovery of compressively sensed signals is NP-hard. We propose an iterative algorithm to solve this problem and prove, using algebraic methods, that it converges. Numerical results demonstrate the efficiency and accuracy of the algorithm.

Compressive sensing (CS) has emerged as a very active research field and has brought about great changes in signal processing in recent years, with broad applications such as compressed imaging, analog-to-information conversion, biosensors, and so on [

$$\min_{x \in \mathbb{R}^N} \|x\|_0, \quad \text{subject to } Ax = y, \tag{1}$$

where $y \in \mathbb{R}^m$, $A \in \mathbb{R}^{m \times N}$ is a measurement matrix, and $\|x\|_0$, formally called a quasi-norm, denotes the number of nonzero components of $x = (x_1, x_2, \cdots, x_N)^T \in \mathbb{R}^N$. Below, $\|\cdot\|_2$ denotes the Euclidean norm and $\lambda > 0$ is a regularization parameter.

We can then solve the unconstrained $l_0$ regularization problem

$$\min_{x \in \mathbb{R}^N} \left\{ \frac{1}{2}\|Ax - y\|_2^2 + \lambda \|x\|_0 \right\}. \tag{2}$$

A natural approach to this problem is to solve a convex relaxation, the $l_1$ regularization problem [

$$\min_{x \in \mathbb{R}^N} \left\{ \frac{1}{2}\|Ax - y\|_2^2 + \lambda \|x\|_1 \right\}, \tag{3}$$

where $\|x\|_1 = \sum_{i=1}^{N} |x_i|$ is the $l_1$ norm. Undoubtedly, the $l_1$ regularization has many applications [

$$\min_{x \in \mathbb{R}^N} \left\{ \frac{1}{2}\|Ax - y\|_2^2 + \lambda_1 \|x\|_1 + \lambda_2 \|x\|_2^2 \right\}, \tag{4}$$

where $\lambda_1, \lambda_2 > 0$ are two regularization parameters. Many papers have shown that elastic net regularization outperforms the Lasso in prediction accuracy. Candès proved that as long as $A$ satisfies the RIP condition with a constant parameter, $l_1$ minimization yields a solution equivalent to that of $l_0$ minimization [

$$\min_{x \in \mathbb{R}^N} \left\{ \frac{1}{2}\|Ax - y\|_2^2 + \lambda_1 \|x\|_0 + \lambda_2 \|x\|_2^2 \right\}. \tag{5}$$

Unfortunately, the $l_0$ norm minimization problem is NP-hard [

$$\min_{x \in \mathbb{R}^N} \left\{ \frac{1}{2}\|Ax - y\|_2^2 + \lambda_1 \|x\|_{0,\delta} + \lambda_2 \|x\|_2^2 \right\}, \tag{6}$$

where $\|x\|_{0,\delta} = \sum_{i=1}^{N} \frac{x_i^2}{x_i^2 + \delta}$ and $\delta > 0$ is a parameter that approaches zero in order to approximate $\|x\|_0$.
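The approximation can be checked numerically. The following is a minimal sketch, assuming nothing beyond the definition of $\|x\|_{0,\delta}$ above; the function name `smoothed_l0` and the sample vector are illustrative choices, not taken from the paper.

```python
def smoothed_l0(x, delta):
    """Smoothed l0 penalty: sum_i x_i^2 / (x_i^2 + delta)."""
    return sum(xi * xi / (xi * xi + delta) for xi in x)


# Illustrative vector with three nonzero entries, one of them small.
x = [3.0, 0.0, -0.5, 0.0, 1e-3]
nonzeros = sum(1 for xi in x if xi != 0.0)

# As delta shrinks, the smoothed penalty approaches the nonzero count
# from below, because each term x_i^2 / (x_i^2 + delta) is less than 1.
for delta in (1e-1, 1e-4, 1e-8, 1e-12):
    print(delta, smoothed_l0(x, delta), nonzeros)
```

Small entries such as `1e-3` are only counted as nonzero once $\delta$ is well below their square, which is why $\delta$ has to be taken close to zero.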

In this paper, we propose an iterative algorithm for recovering sparse vectors which substitutes the $l_0$ penalty with a function [

The rest of this paper is organized as follows. We develop the new algorithm in Section 2 and prove its convergence in Section 3. Experiments on accuracy and efficiency are reported in Section 4. Finally, we conclude this paper in Section 5.

The reconstruction method discussed in this paper directly approaches the $l_0$ norm and obtains its minimal solution with suitably designed objective functions. We denote by $C_\delta(x, \lambda_1, \lambda_2)$ the objective function of the minimization problem (6):

$$C_\delta(x, \lambda_1, \lambda_2) = \frac{1}{2}\|Ax - y\|_2^2 + \lambda_1 \sum_{i=1}^{N} \frac{x_i^2}{x_i^2 + \delta} + \lambda_2 \|x\|_2^2. \tag{7}$$

Our goal is to minimize this objective function. For any $\delta > 0$ and $\lambda_1, \lambda_2 > 0$, the objective is continuous and coercive, so the minimization problem has a solution. Any minimizer $\hat{x}$ of (7) therefore satisfies the first-order optimality condition

$$A^T(A\hat{x} - y) + \left[ \frac{2\lambda_1 \delta \hat{x}_i}{(\hat{x}_i^2 + \delta)^2} \right]_{1 \le i \le N} + 2\lambda_2 \hat{x} = 0. \tag{8}$$
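As a sanity check, the left-hand side of (8) should coincide with the gradient of the objective (7). The sketch below compares it against central finite differences on a tiny example; the helper names and the numerical values of `A`, `y`, `x` and the parameters are arbitrary assumptions made only for illustration.

```python
def matvec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def objective(A, y, x, lam1, lam2, delta):
    """C_delta(x, lam1, lam2) as defined in (7)."""
    residual = [ri - yi for ri, yi in zip(matvec(A, x), y)]
    fit = 0.5 * sum(r * r for r in residual)
    smooth_l0 = sum(xi * xi / (xi * xi + delta) for xi in x)
    ridge = sum(xi * xi for xi in x)
    return fit + lam1 * smooth_l0 + lam2 * ridge


def gradient(A, y, x, lam1, lam2, delta):
    """Left-hand side of (8) evaluated at x."""
    residual = [ri - yi for ri, yi in zip(matvec(A, x), y)]
    grad = []
    for j, xj in enumerate(x):
        at_r = sum(A[i][j] * residual[i] for i in range(len(A)))
        penalty = 2 * lam1 * delta * xj / (xj * xj + delta) ** 2
        grad.append(at_r + penalty + 2 * lam2 * xj)
    return grad


# Arbitrary illustrative data.
A = [[1.0, 2.0, 0.0], [0.0, 1.0, -1.0]]
y = [1.0, -1.0]
x = [0.3, -0.2, 0.5]
lam1, lam2, delta = 0.1, 0.05, 0.01

h = 1e-6
g = gradient(A, y, x, lam1, lam2, delta)
max_err = 0.0
for j in range(len(x)):
    x_plus, x_minus = x[:], x[:]
    x_plus[j] += h
    x_minus[j] -= h
    fd = (objective(A, y, x_plus, lam1, lam2, delta)
          - objective(A, y, x_minus, lam1, lam2, delta)) / (2 * h)
    max_err = max(max_err, abs(fd - g[j]))
print(max_err)  # largest gap between finite differences and (8)
```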

Then we can present the following iterative algorithm to solve the above minimization problem.
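A minimal sketch of such an iteration is given here under an explicit assumption: the update is taken to be the linear system consistent with the identity used in (17), that is, $x^{k+1}$ solves $(A^TA + 2\lambda_2 I_N + 2\lambda_1\delta D_k)\,x = A^Ty$ with $D_k = \mathrm{diag}\big(1/((x_i^k)^2+\delta)^2\big)$. The helper names (`solve`, `reweighted_iterations`), the zero starting point, the fixed iteration count, and the small random test problem are illustrative choices and are not taken from the paper.

```python
import random


def solve(M, b):
    """Solve M z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [bi] for row, bi in zip(M, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][k] * z[k] for k in range(r + 1, n))
        z[r] = (aug[r][n] - s) / aug[r][r]
    return z


def objective(A, y, x, lam1, lam2, delta):
    """C_delta(x, lam1, lam2) as defined in (7)."""
    res = [sum(a * xj for a, xj in zip(row, x)) - yi for row, yi in zip(A, y)]
    return (0.5 * sum(r * r for r in res)
            + lam1 * sum(xi * xi / (xi * xi + delta) for xi in x)
            + lam2 * sum(xi * xi for xi in x))


def reweighted_iterations(A, y, lam1, lam2, delta, n_iter=30):
    """Assumed update: (A^T A + 2 lam2 I + 2 lam1 delta D_k) x = A^T y."""
    m, n = len(A), len(A[0])
    gram = [[sum(A[i][p] * A[i][q] for i in range(m)) for q in range(n)]
            for p in range(n)]
    aty = [sum(A[i][p] * y[i] for i in range(m)) for p in range(n)]
    x = [0.0] * n  # assumed starting point
    history = [objective(A, y, x, lam1, lam2, delta)]
    for _ in range(n_iter):
        M = [row[:] for row in gram]
        for i in range(n):
            # Weights are frozen at the current iterate x^k.
            M[i][i] += 2 * lam2 + 2 * lam1 * delta / (x[i] ** 2 + delta) ** 2
        x = solve(M, aty)
        history.append(objective(A, y, x, lam1, lam2, delta))
    return x, history


# Small random test problem (illustrative sizes, not the paper's setup).
random.seed(0)
m, n = 8, 12
A = [[random.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
x_true = [0.0] * n
x_true[2], x_true[7] = 1.5, -2.0
y = [sum(a * xj for a, xj in zip(row, x_true)) for row in A]

x_hat, history = reweighted_iterations(A, y, lam1=1e-3, lam2=1e-5, delta=1e-6)
print(history[0], history[-1])  # objective before and after iterating
```

Each step only requires one symmetric positive definite linear solve, and the recorded objective values give a direct way to observe the descent property established in Lemma 2.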

In this section, we prove that the algorithm is convergent. We start with Lemma 1 [

Lemma 1. Given $\delta > 0$, the inequality

$$\frac{x^2}{x^2 + \delta} - \frac{y^2}{y^2 + \delta} - \frac{2\delta (x - y) y}{(x^2 + \delta)^2} \ge \frac{\delta (x - y)^2}{(x^2 + \delta)^2} \tag{11}$$

holds for any $x, y \in \mathbb{R}$.
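Before the proof, inequality (11) can be spot-checked on random inputs. This is only a numerical sanity check, not a substitute for the proof; the sampling ranges and the function name `lemma1_gap` are assumptions chosen for illustration.

```python
import random


def lemma1_gap(x, y, delta):
    """Left-hand side minus right-hand side of inequality (11)."""
    denom = (x * x + delta) ** 2
    lhs = (x * x / (x * x + delta)
           - y * y / (y * y + delta)
           - 2 * delta * (x - y) * y / denom)
    rhs = delta * (x - y) ** 2 / denom
    return lhs - rhs


random.seed(1)
worst = min(
    lemma1_gap(random.uniform(-5.0, 5.0),
               random.uniform(-5.0, 5.0),
               10 ** random.uniform(-3.0, 0.0))
    for _ in range(20000)
)
# Up to floating-point rounding, the smallest observed gap is nonnegative.
print(worst)
```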

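Proof. Let $f(t) = \frac{t}{t + \delta}$ for $t \ge 0$, so that $f'(t) = \frac{\delta}{(t + \delta)^2}$. By the mean value theorem, we have

$$f(x^2) - f(y^2) = f'(\xi)(x^2 - y^2), \quad \text{where } \xi \text{ lies between } x^2 \text{ and } y^2. \tag{12}$$

Since $f'$ is decreasing, $f'(\xi) \ge f'(x^2)$ when $x^2 > y^2$ and $f'(\xi) \le f'(x^2)$ when $x^2 < y^2$; in both cases $f'(\xi)(x^2 - y^2) \ge f'(x^2)(x^2 - y^2)$. So we have

$$\frac{x^2}{x^2 + \delta} - \frac{y^2}{y^2 + \delta} = \frac{\delta (x^2 - y^2)}{(\xi + \delta)^2} \ge \frac{\delta (x^2 - y^2)}{(x^2 + \delta)^2} = \frac{2\delta (x - y) y + \delta (x - y)^2}{(x^2 + \delta)^2}. \tag{13}$$

Moving the term $\frac{2\delta (x - y) y}{(x^2 + \delta)^2}$ to the left-hand side gives (11), which therefore holds whether $x^2 > y^2$, $x^2 < y^2$ or $x^2 = y^2$. □

The next lemma shows that the sequence $x^k$ drives the function $C_\delta(x, \lambda_1, \lambda_2)$ downhill.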

Lemma 2. For any $\delta > 0$ and $\lambda_1, \lambda_2 > 0$, let $x^{k+1}$ be the solution of (9) for $k = 1, 2, 3, \cdots$. Then

$$\|Ax^k - Ax^{k+1}\|_2^2 \le 2\left( C_\delta(x^k, \lambda_1, \lambda_2) - C_\delta(x^{k+1}, \lambda_1, \lambda_2) \right). \tag{14}$$

Furthermore,

$$\|x^k - x^{k+1}\|_2^2 \le c \left( C_\delta(x^k, \lambda_1, \lambda_2) - C_\delta(x^{k+1}, \lambda_1, \lambda_2) \right), \tag{15}$$

where $c$ is a positive constant that depends on $\lambda_2$.

Proof.

$$\begin{aligned}
C_\delta(x^k, \lambda_1, \lambda_2) - C_\delta(x^{k+1}, \lambda_1, \lambda_2)
&= \lambda_1 \sum_{i=1}^{N} \left( \frac{(x_i^k)^2}{(x_i^k)^2 + \delta} - \frac{(x_i^{k+1})^2}{(x_i^{k+1})^2 + \delta} \right) + \lambda_2 \left( \|x^k\|_2^2 - \|x^{k+1}\|_2^2 \right) \\
&\quad + \frac{1}{2}\left( \|Ax^k - y\|_2^2 - \|Ax^{k+1} - y\|_2^2 \right) \\
&= \lambda_1 \sum_{i=1}^{N} \left( \frac{(x_i^k)^2}{(x_i^k)^2 + \delta} - \frac{(x_i^{k+1})^2}{(x_i^{k+1})^2 + \delta} \right) + \frac{1}{2}\|Ax^k - Ax^{k+1}\|_2^2 + \lambda_2 \|x^k - x^{k+1}\|_2^2 \\
&\quad + 2\lambda_2 (x^k - x^{k+1})^T x^{k+1} + (Ax^k - Ax^{k+1})^T (Ax^{k+1} - y).
\end{aligned} \tag{16}$$

Using (9), the last term in (16) can be simplified to

$$\begin{aligned}
(Ax^k - Ax^{k+1})^T (Ax^{k+1} - y)
&= (x^k - x^{k+1})^T A^T (Ax^{k+1} - y) \\
&= -(x^k - x^{k+1})^T \left( 2\lambda_2 x^{k+1} + \left[ \frac{2\lambda_1 \delta x_i^{k+1}}{((x_i^k)^2 + \delta)^2} \right]_{1 \le i \le N} \right) \\
&= -\sum_{i=1}^{N} \frac{2\lambda_1 \delta x_i^{k+1} (x_i^k - x_i^{k+1})}{((x_i^k)^2 + \delta)^2} - 2\lambda_2 (x^k - x^{k+1})^T x^{k+1}.
\end{aligned} \tag{17}$$

Substituting (17) into (16) and using (11),

$$\begin{aligned}
C_\delta(x^k, \lambda_1, \lambda_2) - C_\delta(x^{k+1}, \lambda_1, \lambda_2)
&= \lambda_1 \sum_{i=1}^{N} \left( \frac{(x_i^k)^2}{(x_i^k)^2 + \delta} - \frac{(x_i^{k+1})^2}{(x_i^{k+1})^2 + \delta} - \frac{2\delta x_i^{k+1} (x_i^k - x_i^{k+1})}{((x_i^k)^2 + \delta)^2} \right) \\
&\quad + \frac{1}{2}\|Ax^k - Ax^{k+1}\|_2^2 + \lambda_2 \|x^k - x^{k+1}\|_2^2 \\
&\ge \sum_{i=1}^{N} \frac{\delta \lambda_1 (x_i^k - x_i^{k+1})^2}{((x_i^k)^2 + \delta)^2} + \frac{1}{2}\|Ax^k - Ax^{k+1}\|_2^2 + \lambda_2 \|x^k - x^{k+1}\|_2^2.
\end{aligned} \tag{18}$$

Since $\sum_{i=1}^{N} \frac{\delta \lambda_1 (x_i^k - x_i^{k+1})^2}{((x_i^k)^2 + \delta)^2} \ge 0$ for any $x^k$ and $x^{k+1}$, (18) yields (14) and (15) with $c = \frac{1}{\lambda_2}$. □

Lemma 3. ( [

Theorem 1. For any $\delta > 0$ and $\lambda_1, \lambda_2 > 0$, the iterates $x^k$ in (9) converge to some $x^*$, that is, $\lim_{k \to \infty} x^k = x^*$, and $x^*$ is a critical point of (6).

Proof. We first note that the sequence $x^k$ is bounded: by Lemma 2 the values $C_\delta(x^k, \lambda_1, \lambda_2)$ are nonincreasing, and $\lambda_2 \|x^k\|_2^2 \le C_\delta(x^k, \lambda_1, \lambda_2)$. Hence $x^k$ has convergent subsequences. Let $x^{k_i}$ be a convergent subsequence of $x^k$ with limit point $x^*$. Since the nonincreasing sequence $C_\delta(x^k, \lambda_1, \lambda_2)$ is bounded below by zero, it converges, so (15) gives $\|x^{k+1} - x^k\|_2 \to 0$ and the sequence $x^{k_i + 1}$ also converges to $x^*$. Replacing $x^k, x^{k+1}$ with $x^{k_i}, x^{k_i + 1}$ in (10) and letting $i \to \infty$ yields

$$\left[ \frac{2\lambda_1 \delta x_j^*}{((x_j^*)^2 + \delta)^2} \right]_{1 \le j \le N} + A^T(Ax^* - y) + 2\lambda_2 x^* = 0. \tag{19}$$

This implies that the limit of any convergent subsequence of $x^k$ is a critical point, that is, it satisfies (8). To prove the convergence of the sequence $x^k$, we need to show that the set $M$ of all limit points of convergent subsequences of $x^k$ is finite. So we have to prove that the following equation has finitely many solutions.

$$\left[ \frac{2\lambda_1 \delta u_i}{((u_i)^2 + \delta)^2} \right]_{1 \le i \le N} + A^T(Au - y) + 2\lambda_2 u = 0, \tag{20}$$

where $u = (u_1, u_2, \cdots, u_N)^T \in \mathbb{R}^N$. We can rewrite (20) as follows:

$$\left[ \frac{2\lambda_1 \delta u_i}{((u_i)^2 + \delta)^2} \right]_{1 \le i \le N} + (A^TA + 2\lambda_2 I_N) u - A^T y = 0. \tag{21}$$

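It is obvious that $A^TA + 2\lambda_2 I_N$ is a positive definite matrix, where $I_N$ is the $N \times N$ identity matrix and $A^Ty \in \mathbb{R}^N$. Multiplying the $i$-th equation of (21) by $((u_i)^2 + \delta)^2 > 0$, (21) can be rewritten as the following equation: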

$$2\lambda_1 \delta u + B\left( (A^TA + 2\lambda_2 I_N) u - A^T y \right) = 0, \tag{22}$$

where $B$ is an $N \times N$ diagonal matrix with diagonal entries $B_{ii} = ((u_i)^2 + \delta)^2$, $i = 1, 2, 3, \cdots, N$. We denote $A^TA + 2\lambda_2 I_N = (a_{ij})_{N \times N}$ and $A^Ty = (q_1, q_2, \cdots, q_N)^T$. Then

$$\begin{cases}
2\lambda_1 \delta u_1 + (a_{11} u_1 + a_{12} u_2 + \cdots + a_{1N} u_N - q_1)(u_1^2 + \delta)^2 = 0, \\
2\lambda_1 \delta u_2 + (a_{21} u_1 + a_{22} u_2 + \cdots + a_{2N} u_N - q_2)(u_2^2 + \delta)^2 = 0, \\
\qquad \cdots \\
2\lambda_1 \delta u_N + (a_{N1} u_1 + a_{N2} u_2 + \cdots + a_{NN} u_N - q_N)(u_N^2 + \delta)^2 = 0.
\end{cases} \tag{23}$$

Systems (22) and (23) are the same, so it suffices to prove that (23) has finitely many solutions. According to Lemma 3, if the highest-order part of (23), namely the homogeneous system (24), has only the trivial solution, then (23) has finitely many solutions.

$$\begin{cases}
(a_{11} u_1 + a_{12} u_2 + \cdots + a_{1N} u_N)\, u_1^4 = 0, \\
(a_{21} u_1 + a_{22} u_2 + \cdots + a_{2N} u_N)\, u_2^4 = 0, \\
\qquad \cdots \\
(a_{N1} u_1 + a_{N2} u_2 + \cdots + a_{NN} u_N)\, u_N^4 = 0.
\end{cases} \tag{24}$$
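As an illustration only (this special case is not part of the original argument), take $N = 1$ and write $a = a_{11} > 0$ and $q = q_1$. System (23) then reduces to a single polynomial equation of degree five, which has at most five real roots, and its highest-order part is exactly the $N = 1$ instance of (24):

```latex
\begin{aligned}
0 &= 2\lambda_1\delta\,u + (a u - q)(u^2 + \delta)^2 \\
  &= a u^5 - q u^4 + 2a\delta\,u^3 - 2q\delta\,u^2
     + (a\delta^2 + 2\lambda_1\delta)\,u - q\delta^2, \\
\text{highest-order part:}\quad a u^5 &= 0
  \;\Longrightarrow\; u = 0 \quad (\text{since } a > 0).
\end{aligned}
```

The general case follows the same pattern: each equation of (23) has degree five, and (24) collects the degree-five terms.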

We now prove that the system (24) has only the trivial solution. Assume, after reordering the coordinates if necessary, that $u = (u_1, u_2, \cdots, u_s, 0, \cdots, 0)^T \in \mathbb{R}^N$ is a nonzero solution of (24) with $u_i \ne 0$ for $i = 1, 2, \cdots, s$, $1 \le s \le N$. Dividing the first $s$ equations of (24) by $u_i^4 \ne 0$, we have

$$C u_s = 0, \tag{25}$$

where $u_s = (u_1, u_2, \cdots, u_s)^T$ and $C = (a_{ij})_{s \times s}$ is the $s \times s$ leading principal submatrix of $A^TA + 2\lambda_2 I_N$. Since $A^TA + 2\lambda_2 I_N$ is positive definite, $C$ is positive definite as well, and hence nonsingular. So we have $u_i = 0$ for $i = 1, 2, \cdots, s$, which contradicts the assumption $u_i \ne 0$, $i = 1, 2, \cdots, s$.

Therefore, the system (24) has only the trivial solution, and so Equation (20) has finitely many solutions. Since every limit point of a convergent subsequence of $x^k$ satisfies Equation (20), the limit point set $M$ is finite. Combining this with $\|x^{k+1} - x^k\|_2 \to 0$ as $k \to \infty$, we obtain that the sequence $x^k$ is convergent and that its limit $x^*$ is a critical point of problem (6). □

In this section, we present numerical experiments to show the efficiency and accuracy of Algorithm 1 for sparse vector recovery. We compare the performance of Algorithm 1 with $l_1$ IST [

The experimental results consist of two parts: the first compares the accuracy of the two algorithms; the second compares their efficiency. In the experiments, the mean squared error between the original vector $x_0$ and the recovered result $x^k$ is recorded as

$$\mathrm{MSE} = \|x^k - x_0\|_2^2 / N. \tag{26}$$
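The error measure (26) is straightforward to compute. The sketch below also generates an $s$-sparse test vector with standard Gaussian nonzero entries at random positions, using the vector length 250 from the experiments; the helper `random_sparse_vector`, the seed, and the artificial perturbation are hypothetical and serve only to exercise the formula.

```python
import random


def mse(x_est, x_true):
    """Mean squared error (26): ||x_est - x_true||_2^2 / N."""
    return sum((a - b) ** 2 for a, b in zip(x_est, x_true)) / len(x_true)


def random_sparse_vector(n, s, rng):
    """Hypothetical helper: length-n vector with s standard Gaussian
    entries at random positions and zeros elsewhere."""
    x = [0.0] * n
    for idx in rng.sample(range(n), s):
        x[idx] = rng.gauss(0.0, 1.0)
    return x


rng = random.Random(0)
x0 = random_sparse_vector(250, 8, rng)

# Artificial "recovery": shift every entry by 1e-3, so the MSE is about 1e-6.
x_shifted = [v + 1e-3 for v in x0]
print(mse(x0, x0), mse(x_shifted, x0))
```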

The matrix $A \in \mathbb{R}^{100 \times 250}$ and the original sparse vector $x_0 \in \mathbb{R}^{250}$ were generated randomly according to the standard Gaussian distribution, with $x_0$ of length $N$ and $s$-sparse, where $s$ varies as 2, 4, 6, 8, ..., 48. The locations of the nonzero elements were randomly generated. The regularization parameters were set as $\delta = 10^{-6}$, $\lambda_1 = 10^{-3}$ and $\lambda_2 = 10^{-5}$. All the other parameters of the two algorithms were set to be the same. The results are shown in


In this subsection, we focus on the speed of the two algorithms. We conduct various experiments to test the effectiveness of the proposed algorithm.

| Sparsity | Algorithm | Time (s) | MSE |
|---|---|---|---|
| 2 | IAGENR-L0 | 0.038234 | 7.35e−007 |
| 2 | IST | 0.721920 | 1.68e−006 |
| 4 | IAGENR-L0 | 0.088917 | 1.03e−006 |
| 4 | IST | 0.943177 | 2.21e−006 |
| 8 | IAGENR-L0 | 0.031317 | 1.83e−007 |
| 8 | IST | 2.146079 | 3.95e−003 |
| 16 | IAGENR-L0 | 0.046566 | 1.35e−009 |
| 16 | IST | 2.368845 | 9.78e−002 |
| 32 | IAGENR-L0 | 0.501139 | 8.24e−003 |
| 32 | IST | 1.608879 | 1.14e−001 |

In this paper, we consider an iterative algorithm for solving the generalized elastic net regularization problem with a smoothed $l_0$ penalty for recovering sparse vectors. A detailed proof of convergence of the iterative algorithm is given in Section 3 using an algebraic method. Additionally, the numerical experiments in Section 4 show that our iterative algorithm converges and performs better than IST in recovering sparse vectors.

Li, S.S. and Ye, W.Z. (2017) A Generalized Elastic Net Regularization with Smoothed Penalty. Advances in Pure Mathematics, 7, 66-74. http://dx.doi.org/10.4236/apm.2017.71006