
Advances in Pure Mathematics
Vol.06 No.07(2016), Article ID:67409,9 pages
10.4236/apm.2016.67036

An Efficient Adaptive Iteratively Reweighted l1 Algorithm for Elastic lq Regularization

Yong Zhang, Wanzhou Ye

Department of Mathematics, College of Science, Shanghai University, Shanghai, China

Copyright © 2016 by authors and Scientific Research Publishing Inc.

This work is licensed under the Creative Commons Attribution International License (CC BY).

http://creativecommons.org/licenses/by/4.0/

Received 12 May 2016; accepted 13 June 2016; published 16 June 2016

ABSTRACT

In this paper, we propose an efficient adaptive iteratively reweighted $\ell_1$ algorithm (A-IRL1 algorithm) for solving the elastic $\ell_q$ regularization problem. We prove that the sequence generated by the A-IRL1 algorithm is convergent for any rational $q \in (0,1)$ and that the limit is a critical point of the elastic $\ell_q$ regularization problem. Under certain conditions, we present an error bound for the limit point of the convergent sequence.

Keywords:

Compressed Sensing, Elastic $\ell_q$ Minimization, Nonconvex Optimization, Convergence, Critical Point

1. Introduction

Compressed sensing (CS) has emerged as a very active research field and has brought about great changes in signal processing in recent years [1] [2] . The main task of CS is the recovery of a sparse signal from a small number of linear measurements. It can be mathematically modeled as the following optimization problem,

\min_{x \in \mathbb{R}^n} \|x\|_0 \quad \text{subject to} \quad Ax = b, \qquad (1)

where $b \in \mathbb{R}^m$, $A \in \mathbb{R}^{m \times n}$ (commonly $m \ll n$) is a measurement matrix and $\|x\|_0$, formally called the $\ell_0$ quasi-norm, denotes the number of nonzero components of $x$. In general, it is difficult to tackle problem (1) due to its nonsmooth and nonconvex nature. In recent years, some researchers have proposed the $\ell_q$ norm regularization problem [3] - [5] with $0 < q < 1$, that is, to consider the following constrained regularization problem

\min_{x \in \mathbb{R}^n} \|x\|_q^q \quad \text{subject to} \quad Ax = b, \qquad (2)

or the unconstrained $\ell_q$ regularization problem

\min_{x \in \mathbb{R}^n} \frac{1}{2}\|Ax - b\|_2^2 + \lambda \|x\|_q^q, \qquad (3)

where $\|x\|_q^q = \sum_{i=1}^{n} |x_i|^q$ for $0 < q \le 1$ and $\lambda > 0$ is a regularization parameter.

When $q = 1$, it is well known that problems (2) and (3) are both convex optimization problems and can therefore be solved efficiently [6] [7] . On the other hand, when $0 < q < 1$, problems (2) and (3) become nonconvex, nonsmooth and even non-Lipschitz optimization problems, which are difficult to solve quickly and efficiently. Iteratively reweighted algorithms, which include the iteratively reweighted $\ell_1$ algorithm [8] and iteratively reweighted least squares [9] , are very effective for solving the nonconvex $\ell_q$ regularization problem.

In this paper, we consider the following elastic $\ell_q$ regularization problem,

\min_{x \in \mathbb{R}^n} \frac{1}{2}\|Ax - b\|_2^2 + \lambda_1 \|x\|_q^q + \lambda_2 \|x\|_2^2, \qquad (4)

where $\lambda_1, \lambda_2 > 0$ are two regularization parameters. When $q = 1$, the above problem (4) reduces to the well-known elastic-net regularization proposed by Zou and Hastie [10] , which is an effective method for variable selection. In [10] , Zou et al. showed that this method outperforms the Lasso [11] in terms of prediction accuracy in both simulation studies and real-data applications of variable selection. For further statistical properties of elastic-net regularization, we refer to [12] [13] . When $0 < q < 1$, problem (4) is an extension of elastic-net regularization from the $\ell_1$ penalty to the $\ell_q$ penalty. In statistics, elastic $\ell_q$ regularization is usually very effective for group variable selection.

Obviously, for $0 < q < 1$, the $\ell_q$ norm term in (4) is not differentiable at zero. Therefore, in this paper, we study the following relaxed elastic $\ell_q$ minimization problem with smoothing parameter $\epsilon > 0$,

\min_{x \in \mathbb{R}^n} \frac{1}{2}\|Ax - b\|_2^2 + \lambda_1 \sum_{i=1}^{n} (|x_i| + \epsilon)^q + \lambda_2 \|x\|_2^2. \qquad (5)

The model (5) can be considered as an approximation to the model (4) as $\epsilon \to 0$. In order to solve problem (5), we propose the following adaptive iteratively reweighted $\ell_1$ minimization algorithm (A-IRL1 algorithm),

x^{k+1} = \arg\min_{x \in \mathbb{R}^n} \frac{1}{2}\|Ax - b\|_2^2 + \lambda_1 \sum_{i=1}^{n} w_i^k |x_i| + \lambda_2 \|x\|_2^2, \qquad (6)

where the weight $w_i^k$ is defined by the previous iterate $x^k$ and the smoothing parameter $\epsilon_k$, and is updated in each iteration as $w_i^k = q\,(|x_i^k| + \epsilon_k)^{q-1}$, $i = 1, \ldots, n$.

The adaptive update of $\epsilon_k$ in the proposed algorithm is the same as the one in [9] , which is also adopted in [14] . Each iteration of the A-IRL1 algorithm (6) solves a convex minimization problem, which can be handled by many efficient algorithms [6] [7] [15] .
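To make the reweighted step concrete, the sketch below solves a convex weighted subproblem of the form written in (6) above (a weighted $\ell_1$ term plus a ridge term) by a proximal-gradient (ISTA-type) iteration in Python. The paper only requires that each subproblem be solved by some efficient convex method [6] [7] [15] ; the particular solver, function name, step size and stopping rule here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def solve_weighted_subproblem(A, b, w, lam1, lam2, x0, max_iter=500, tol=1e-8):
    """Proximal-gradient (ISTA-type) sketch for the convex subproblem
        min_x 0.5*||Ax - b||_2^2 + lam1 * sum_i w_i*|x_i| + lam2 * ||x||_2^2.
    Any other convex solver could be substituted; this is only an illustration."""
    L = np.linalg.norm(A, 2) ** 2 + 2.0 * lam2      # Lipschitz constant of the smooth part
    step = 1.0 / L
    x = x0.copy()
    for _ in range(max_iter):
        grad = A.T @ (A @ x - b) + 2.0 * lam2 * x   # gradient of the smooth part
        z = x - step * grad
        x_new = np.sign(z) * np.maximum(np.abs(z) - step * lam1 * w, 0.0)  # weighted soft-threshold
        if np.linalg.norm(x_new - x) <= tol * max(1.0, np.linalg.norm(x)):
            x = x_new
            break
        x = x_new
    return x
```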

The relaxed elastic $\ell_q$ regularization problem (5) can thus be solved by the A-IRL1 algorithm (6). In this paper, we prove that any sequence generated by the A-IRL1 algorithm (6) is itself convergent for any rational $q \in (0,1)$ in the case where the smoothing parameters $\epsilon_k$ converge to a positive limit. Moreover, we present an error bound between the limit point and the sparse solution of problem (1).

The rest of this paper is organized as follows. In Section 2, we summarize the A-IRL1 algorithm for solving the elastic $\ell_q$ regularization problem (5). In Section 3, we present a detailed convergence analysis of the A-IRL1 algorithm (6): based on an algebraic method, we prove that the A-IRL1 algorithm is convergent for any rational $q$ with $0 < q < 1$. Furthermore, under certain conditions, we present an error bound between the limit point and the sparse solution of problem (1). Finally, a conclusion is given in Section 4.

2. A-IRL1 Algorithm for Solving Elastic $\ell_q$ Regularization

We give a detailed implementation of the A-IRL1 algorithm (6) for solving the elastic $\ell_q$ regularization problem (5). The algorithm is summarized as Algorithm 1.

In Algorithm 1, the nonincreasing rearrangement of the absolute values of the entries of the current iterate is used to update the smoothing parameter $\epsilon_k$. If the updated smoothing parameter becomes zero, we choose the current iterate to be the approximate sparse solution and stop the iteration. Otherwise, we stop the algorithm within a reasonable time and return the last iterate.

It is clear from Algorithm 1 that $\{\epsilon_k\}$ is a nonincreasing sequence which converges to some nonnegative number $\epsilon$. In the next section, we prove that the sequence $\{x^k\}$ is convergent when $\epsilon > 0$, and that the limit is a critical point of problem (5) with this $\epsilon$. Furthermore, we also present an error bound for the limit point.
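For concreteness, the outer loop of an A-IRL1 scheme of this kind might be organized as in the following Python sketch, which reuses solve_weighted_subproblem from the previous sketch. The $\epsilon$-update shown (shrinking $\epsilon_k$ with the $(s+1)$-th largest magnitude of the current iterate, in the spirit of [9] and [14] ), the initialization and the stopping rule are assumptions made for illustration and may differ in detail from the pseudocode of Algorithm 1 in the original article.

```python
import numpy as np

def a_irl1(A, b, q, lam1, lam2, s, eps0=1.0, max_outer=100, tol=1e-6):
    """Sketch of the outer A-IRL1 loop: alternately solve the weighted
    l1 + ridge subproblem (6) and shrink the smoothing parameter eps_k."""
    n = A.shape[1]
    x = np.zeros(n)
    eps = eps0
    w = q * (np.abs(x) + eps) ** (q - 1.0)               # weights from the initial point
    for _ in range(max_outer):
        x_new = solve_weighted_subproblem(A, b, w, lam1, lam2, x)   # step (6)
        r = np.sort(np.abs(x_new))[::-1]                  # magnitudes in decreasing order
        eps = min(eps, r[s] / n)                          # adaptive update as in [9] (assumed form)
        if eps == 0.0 or np.linalg.norm(x_new - x) <= tol:
            return x_new                                  # approximately s-sparse solution
        w = q * (np.abs(x_new) + eps) ** (q - 1.0)        # reweighting for the next subproblem
        x = x_new
    return x
```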

3. Convergence of Algorithm 1

In this section, we first prove that the sequence $\{x^k\}$ generated by Algorithm 1 is bounded and asymptotically regular. Then, based on an algebraic method, we prove that Algorithm 1 is convergent for any rational $q$ with $0 < q < 1$. We begin with the following inequality.

Lemma 1. Given and, then the inequality

(7)

holds for any.

Proof. We first define. For any, by the mean value theorem, we have

(8)

The following inequality always holds for any, or,

Let and, we thus have

(9)

After rearranging the terms of (9), we get the desired inequality (7).

Our next result shows the monotonicity of the objective values along the sequence $\{x^k\}$, and that this sequence is also asymptotically regular.

Lemma 2. Let $\{x^k\}$ be the sequence generated by Algorithm 1. Then we have

(10)

Furthermore,

(11)

Proof. Since $x^{k+1}$ is a solution of problem (6), we have

Besides, we can compute the subgradient as follows,

(12)

Hence, we find

(13)

which means that there exists such that

(14)

where

We compute

(15)

Using (14), we have

(16)

Substituting (16) into (15) and using Lemma 1 yields

(17)

where the first inequality uses and, and the last inequality uses Lemma 1. Therefore, from (17) we get the desired results (10) and (11).

From (10) in Lemma 2, we know that the sequence of objective values is monotonically decreasing and bounded. Since the objective value would tend to infinity as $\|x^k\|_2 \to \infty$, the sequence $\{x^k\}$ is also bounded. On the other hand, from (11) we obtain that the sequence $\{x^k\}$ is asymptotically regular.

In order to prove that the whole sequence generated by Algorithm 1 is convergent, we need the following lemma, which plays an important role in the proof of convergence. The lemma mainly states that if the highest-order system corresponding to a system of n polynomial equations in n complex variables has only the trivial solution, then the system has only finitely many solutions. For a detailed proof, refer to Theorem 3.1 in [16] .

Lemma 3. ( [16] ) Let a system of n polynomial equations in n complex variables be given, and let its corresponding highest-order system of equations be formed by the highest-degree terms of each equation. If the highest-order system has only the trivial solution, then the original system has exactly $d = d_1 d_2 \cdots d_n$ solutions counted with multiplicity, where $d_j$ is the degree of the $j$-th equation.
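As a small illustration of Lemma 3 (an example added here for the reader, not taken from the paper), consider the system of two polynomial equations in two complex variables
$$x^2 + y - 1 = 0, \qquad y^2 + x - 1 = 0.$$
Its highest-order system is $x^2 = 0$, $y^2 = 0$, which has only the trivial solution. Lemma 3 then asserts that the system has exactly $d_1 d_2 = 2 \cdot 2 = 4$ complex solutions counted with multiplicity; indeed, eliminating $y$ gives $x(x - 1)(x^2 + x - 1) = 0$, which has four roots. The same counting argument is what bounds the number of solutions of system (25) below.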

With the above lemmas, we are now in a position to present the convergence of Algorithm 1 for any rational number $q$ with $0 < q < 1$.

Theorem 1. For any rational $q \in (0,1)$, if the limit $\epsilon$ of the nonincreasing sequence $\{\epsilon_k\}$ is positive, then the sequence $\{x^k\}$ generated by Algorithm 1 is convergent; we denote its limit by $x^*$. Moreover, the limit $x^*$ is a critical point of problem (5) with this $\epsilon$.

Proof. From (10), we know that the sequence of objective values is monotonically decreasing and bounded below. Thus, we can infer that the sequence $\{x^k\}$ is also bounded. The boundedness of $\{x^k\}$ implies that there exists at least one convergent subsequence. We assume that $\{x^{k_j}\}$ is any one of the convergent subsequences of $\{x^k\}$ with limit $x^*$. By (11), we know that the sequence $\{x^{k_j+1}\}$ also converges to $x^*$. Now, replacing the iterates and weights in (14) with those along this subsequence and letting $j \to \infty$ yields

(18)

where.

Equation (18) demonstrates that the limit of any convergent subsequence of $\{x^k\}$ is a stationary point of problem (5) with $\epsilon$. In order to prove the convergence of the whole sequence, one first needs to prove that the limit point set, which contains all the limit points of convergent subsequences of $\{x^k\}$, is a finite set.

We classify the limit point set according to the sparsity $s$ of its elements, $s = 0, 1, \ldots, n$. That is, for each sparsity level $s$ we consider the set of limit points having exactly $s$ nonzero components. If we prove that each such set is a finite set, then we obtain that the limit point set is also a finite set. Without loss of generality, we define a set

(19)

Furthermore, for any given index set of cardinality $s$, we define another set

(20)

From (19) and (20), the set in (19) is covered by finitely many sets of the form (20), one for each choice of the index set. If we prove that the set in (20) is a finite set, it follows that the set in (19) is also finite, and we further conclude that the limit point set is a finite set.

For any limit point with sparsity $s$, let $S$ denote its support set, so that $|S| = s$. By (18), we know that the limit point satisfies the following equation

(21)

where $x_S$ denotes the subvector with components restricted to $S$ and $A_S$ denotes the submatrix of $A$ with columns restricted to $S$. Next, if we prove that Equation (21) has finitely many solutions, then we can conclude that the set in (20) is a finite set.

It is clear that (21) can be rewritten as follows:

(22)

where the coefficient matrix in (22) is an $s \times s$ positive-definite matrix, and $I$ is the $s \times s$ identity matrix. We observe that (22) can further be rewritten as follows:

(23)

where the matrix in (23) is an $s \times s$ diagonal matrix whose diagonal entries are determined by the components of the limit point. Without loss of generality, we write the rational exponent $q$ as a ratio of two positive integers. By a simple calculation on Equation (23), we get the following system:

(24)

Since all the solutions of Equation (21) satisfy (24), Equation (21) has finitely many solutions as long as we can prove that (24) has finitely many solutions. To this end, we show that the following system has finitely many solutions:

(25)

where the unknowns are now regarded as complex variables. Now, we extract the highest-order terms from system (25) to obtain the following system:

(26)

To prove that system (26) has only the trivial solution, we argue by contradiction. Without loss of generality, we assume that (26) has a nonzero solution whose first several components are nonzero and whose remaining components are zero. Under this assumption, from (26) we can obtain the following equation:

(27)

where $B$ is a leading principal submatrix of the positive-definite coefficient matrix in (22). Since any leading principal submatrix of a positive-definite matrix is also positive definite, the matrix $B$ is positive definite, and thus Equation (27) forces the corresponding components of the assumed solution to vanish. This contradicts the assumption that these components are nonzero. Therefore, the system (26) has only the trivial solution. According to Lemma 3, we deduce that the system (25) has finitely many solutions, which further implies that Equation (21) also has finitely many solutions; that is, the set in (20) is a finite set. Therefore, the limit point set is a finite set.

Combining the finiteness of the limit point set with the asymptotic regularity $\|x^{k+1} - x^k\|_2 \to 0$ as $k \to \infty$, we obtain that the sequence $\{x^k\}$ is convergent. By the convergence of the sequence and (18), we obtain that the limit is a critical point of problem (5) with $\epsilon$.
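For completeness, the last step uses the following standard fact (sketched here for the reader's convenience rather than quoted from the original): if a bounded sequence $\{x^k\}$ satisfies $\|x^{k+1} - x^k\|_2 \to 0$ and has only finitely many limit points $z_1, \ldots, z_m$, then it converges. Indeed, suppose $m \ge 2$ and let $\delta = \min_{i \neq j} \|z_i - z_j\|_2 > 0$. For all large $k$, every $x^k$ lies within $\delta/3$ of some $z_i$ (otherwise a further subsequence would produce an additional limit point), while $\|x^{k+1} - x^k\|_2 < \delta/3$, so the sequence can never move from the $\delta/3$-neighborhood of one $z_i$ to that of another. Hence the sequence eventually stays near a single $z_i$, contradicting the fact that every $z_j$ is a limit point; so $m = 1$ and the whole sequence converges.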

Theorem 1 gives a detailed convergence proof of Algorithm 1 based on an algebraic approach. Next, we present an error bound between the convergent limit and the sparse solution of problem (1).

Under the Restricted Isometry Property (RIP) on the matrix A, we present an error bound between the convergent limit and the sparse solution of problem (1). First of all, we give a definition of RIP on the matrix A as follows.

Definition: For every integer $s$ with $1 \le s \le n$, we define the $s$-restricted isometry constant $\delta_s$ of $A$ as the smallest positive quantity such that

(1 - \delta_s)\|x\|_2^2 \le \|Ax\|_2^2 \le (1 + \delta_s)\|x\|_2^2, \qquad (28)

for all index sets $T$ of cardinality at most $s$ and all vectors $x$ supported on $T$.
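For small problems, the constant in (28) can be evaluated directly from the extreme eigenvalues of the Gram matrices of column submatrices. The brute-force Python sketch below only illustrates the definition; the function name, the random test matrix and its dimensions are hypothetical, and the enumeration is exponential in $n$, so it is not a practical way to verify RIP for realistic sizes.

```python
import numpy as np
from itertools import combinations

def restricted_isometry_constant(A, s):
    """Brute-force evaluation of the s-restricted isometry constant delta_s in (28):
    the smallest delta with (1-delta)||x||^2 <= ||Ax||^2 <= (1+delta)||x||^2
    for all x supported on an index set of size at most s."""
    n = A.shape[1]
    delta = 0.0
    for size in range(1, s + 1):
        for T in combinations(range(n), size):
            cols = A[:, list(T)]
            eig = np.linalg.eigvalsh(cols.T @ cols)   # extreme eigenvalues give the tightest bounds
            delta = max(delta, eig[-1] - 1.0, 1.0 - eig[0])
    return delta

# Tiny illustration with a column-normalized Gaussian matrix (hypothetical sizes).
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 8)) / np.sqrt(20)
print(restricted_isometry_constant(A, s=2))
```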

Under the RIP assumption, we can ensure that the limit is a reasonable approximation of the sparse solution provided the limit has a very small tail, measured by the error of its best $s$-term approximation.

With the concept of RIP, we are able to prove the following theorem.

Theorem 2. Suppose that $x$ is an $s$-sparse solution of (1) satisfying $Ax = b$. Assume that $A$ satisfies the RIP of order $2s$ with a suitably small restricted isometry constant $\delta_{2s}$. Then, for any fixed rational $q \in (0,1)$, the following statements hold.

(1) When $\epsilon > 0$, the limit of the convergent sequence $\{x^k\}$ is a critical point of problem (5) with this $\epsilon$, and it satisfies

(29)

(2) When $\epsilon = 0$, there must exist a subsequence of $\{x^k\}$ converging to an $s$-sparse point which satisfies

(30)

Here, $C$ and the remaining constants are positive and depend on the problem parameters and the initial point.

Proof. (1) In Theorem 1, we have proved that the limit of the convergent sequence $\{x^k\}$ is a critical point of problem (5) when $\epsilon > 0$.

We use Lemma 2 to get

(31)

where we use the assumptions on the initial values in Algorithm 1. By (31), we have

Let $S$ be the index set of the $s$-sparse solution $x$, and let $T$ be the index set of the $s$ largest entries in absolute value of the limit point. Since $|S \cup T| \le 2s$, we have

Letting the constants be chosen accordingly completes the proof of part (1).

(2) If $\epsilon = 0$, then either $\epsilon_k = 0$ for some $k$, or $\epsilon_k > 0$ holds for all sufficiently large $k$ while $\epsilon_k \to 0$. In the former case, the corresponding iterate has at most $s$ nonzero entries. In the latter case, by the boundedness of $\{x^k\}$, there exists a convergent subsequence whose limit has at most $s$ nonzero entries. Therefore, in both cases, we have an $s$-sparse limit point. Using the RIP of $A$, we get

where the third inequality uses (31) with the corresponding choice of iterate, and the last equality uses the stated assumptions. Denoting the resulting constant by $C$ completes the proof.

Under the RIP condition on the matrix $A$, Theorem 2 provides, when $\epsilon > 0$, an error bound between the convergent limit and the sparse solution of problem (1). When $\epsilon = 0$, we present an error bound for the limit point of a convergent subsequence; in this case, the limit point of the convergent subsequence is an $s$-sparse vector.

4. Conclusion

Iteratively reweighted algorithms have been widely used for solving nonconvex optimization problems. In this paper, we propose an efficient adaptive iteratively reweighted $\ell_1$ algorithm (6) for solving the elastic $\ell_q$ regularization problem (5) and prove its convergence. In particular, we first prove that the sequence generated by Algorithm 1 is bounded and asymptotically regular. When $\epsilon > 0$, based on an algebraic method, we prove that the sequence generated by Algorithm 1 is convergent for any rational $q \in (0,1)$ and that the limit is a critical point of problem (5). Furthermore, under the RIP condition on the matrix $A$, when $\epsilon > 0$ we present an error bound between the convergent limit and the sparse solution of problem (1), while when $\epsilon = 0$ we present an error bound for the limit point of a convergent subsequence. Our convergence results provide a theoretical guarantee for a wide range of applications of the adaptive iteratively reweighted $\ell_1$ algorithm.

Cite this paper

Yong Zhang, Wanzhou Ye (2016) An Efficient Adaptive Iteratively Reweighted l1 Algorithm for Elastic lq Regularization. Advances in Pure Mathematics, 06, 498-506. doi: 10.4236/apm.2016.67036

References

  1. Donoho, D.L. (2006) Compressed Sensing. IEEE Transactions on Information Theory, 52, 1289-1306.
     http://dx.doi.org/10.1109/TIT.2006.871582

  2. Candès, E., Romberg, J. and Tao, T. (2006) Robust Uncertainty Principles: Exact Signal Reconstruction from Highly Incomplete Frequency Information. IEEE Transactions on Information Theory, 52, 489-509.
     http://dx.doi.org/10.1109/TIT.2005.862083

  3. Chartrand, R. (2007) Exact Reconstruction of Sparse Signals via Nonconvex Minimization. IEEE Signal Processing Letters, 14, 707-710.
     http://dx.doi.org/10.1109/LSP.2007.898300

  4. Xu, Z.B., Chang, X.Y., Xu, F.M. and Zhang, H. (2012) L1/2 Regularization: A Thresholding Representation Theory and a Fast Solver. IEEE Transactions on Neural Networks and Learning Systems, 23, 1013-1027.
     http://dx.doi.org/10.1109/TNNLS.2012.2197412

  5. Foucart, S. and Lai, M. (2009) Sparsest Solutions of Underdetermined Linear Systems via lq Minimization for 0<q≤1. Applied and Computational Harmonic Analysis, 26, 395-407.
     http://dx.doi.org/10.1016/j.acha.2008.09.001

  6. Chen, S.S., Donoho, D.L. and Saunders, M.A. (1998) Atomic Decomposition by Basis Pursuit. SIAM Journal on Scientific Computing, 20, 33-61.
     http://dx.doi.org/10.1137/S1064827596304010

  7. Daubechies, I., Defrise, M. and De Mol, C. (2004) An Iterative Thresholding Algorithm for Linear Inverse Problems with a Sparsity Constraint. Communications on Pure and Applied Mathematics, 57, 1413-1457.
     http://dx.doi.org/10.1002/cpa.20042

  8. Candès, E.J., Wakin, M.B. and Boyd, S.P. (2008) Enhancing Sparsity by Reweighted l1 Minimization. Journal of Fourier Analysis and Applications, 14, 877-905.
     http://dx.doi.org/10.1007/s00041-008-9045-x

  9. Daubechies, I., DeVore, R., Fornasier, M. and Güntürk, C.S. (2010) Iteratively Reweighted Least Squares Minimization for Sparse Recovery. Communications on Pure and Applied Mathematics, 63, 1-38.
     http://dx.doi.org/10.1002/cpa.20303

  10. Zou, H. and Hastie, T. (2005) Regularization and Variable Selection via the Elastic Net. Journal of the Royal Statistical Society: Series B, 67, 301-320.
      http://dx.doi.org/10.1111/j.1467-9868.2005.00503.x

  11. Tibshirani, R. (1996) Regression Shrinkage and Selection via the Lasso. Journal of the Royal Statistical Society: Series B, 58, 267-288.

  12. Jin, B.T., Lorenz, D.A. and Schiffler, S. (2009) Elastic-Net Regularization: Error Estimates and Active Set Methods. Inverse Problems, 25, Article ID: 115022.
      http://dx.doi.org/10.1088/0266-5611/25/11/115022

  13. De Mol, C., De Vito, E. and Rosasco, L. (2009) Elastic-Net Regularization in Learning Theory. Journal of Complexity, 25, 201-230.
      http://dx.doi.org/10.1016/j.jco.2009.01.002

  14. Lai, M.J., Xu, Y.Y. and Yin, W.T. (2013) Improved Iteratively Reweighted Least Squares for Unconstrained Smoothed lq Minimization. SIAM Journal on Numerical Analysis, 51, 927-957.
      http://dx.doi.org/10.1137/110840364

  15. Donoho, D.L. and Tsaig, Y. (2008) Fast Solution of l1-Norm Minimization Problems When the Solution May Be Sparse. IEEE Transactions on Information Theory, 54, 4789-4812.
      http://dx.doi.org/10.1109/TIT.2008.929958

  16. Garcia, C.B. and Li, T.Y. (1980) On the Number of Solutions to Polynomial Systems of Equations. SIAM Journal on Numerical Analysis, 17, 540-546.
      http://dx.doi.org/10.1137/0717046