
In this paper, we propose an iterative reweighted <i>l</i><sub>1</sub> penalty regression approach to the line spectral estimation problem. In each iteration, we first use the idea of the Bayesian lasso to update the sparse vector; the derivative of the penalty function determines the regularization weights. We choose the arctangent (inverse trigonometric) function as the penalty function to approximate the <i>l</i><sub>0</sub> norm. We then use gradient descent to update the dictionary parameters. Theoretical analysis and simulation results demonstrate the effectiveness of the method and show that the proposed algorithm outperforms other state-of-the-art methods in many practical cases.

Spectral estimation is widely used in electronic countermeasures, radar, sonar and mobile communication. In this paper, we mainly consider line spectral estimation in compressed sensing. When line spectral estimation uses a pre-specified discrete Fourier transform matrix, the sparse solution obtained may not be close to the true sparse vector, because the true frequency components may not lie on the pre-specified frequency grid. This error, referred to as grid mismatch, results in performance degradation or even recovery failure. Therefore, we treat the dictionary parameters as unknown variables along with the sparse signal, and optimize the dictionary parameters iteratively while estimating the sparse vector.

Rather than applying the traditional compressed sensing theory, an increasing number of scholars have concentrated on the grid mismatch problem instead. For example, work [

In addition, we analyze the first-order optimal condition of the original problem and then prove that the problem can be transformed into a series of reweighted lasso [

The remainder of the paper is organized as follows. Section 2 describes the line spectral estimation problem, which we formulate as a penalized least squares problem with dictionary parameters. In Section 3, we provide a theoretical analysis and propose the iterative reweighted $l_1$ algorithm. In Section 4, we present several sets of numerical experiments to demonstrate that the iterative reweighted $l_1$ method performs better than other state-of-the-art algorithms in many cases. Section 5 concludes the paper and provides some ideas for future work.

Consider the line spectral estimation problem in which the observed signal is a sum of complex sinusoids:

$$y_n = \sum_{j=1}^{k} \beta_j e^{-i n \theta_j} + \varepsilon_n, \quad n = 1, \cdots, N \tag{1}$$

In matrix form, this reads:

$$Y = X(\theta)\beta + \varepsilon \tag{2}$$

where $Y^T = [y_1, \cdots, y_N]$ is the vector of observations, $\theta^T = [\theta_1, \cdots, \theta_k]$ is the vector of unknown frequency parameters, and $X(\theta)$ is the $N \times k$ dictionary matrix ($N \ll k$) determined by $\theta$. The coefficient vector $\beta^T = [\beta_1, \cdots, \beta_k]$ holds the amplitudes of the corresponding frequencies, and $\varepsilon^T = [\varepsilon_1, \cdots, \varepsilon_N]$ is a random error term whose components are assumed to be independent.
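The model (1)-(2) can be illustrated with a short NumPy sketch. The function names `dictionary` and `synthesize` are hypothetical, and the circular complex Gaussian noise is an assumption made for illustration; the text only requires the errors to be independent.

```python
import numpy as np

def dictionary(theta, N):
    """Return the N x k matrix X(theta) with entries exp(-1j * n * theta_j), n = 1..N."""
    n = np.arange(1, N + 1)[:, None]              # column of sample indices 1..N
    return np.exp(-1j * n * np.asarray(theta)[None, :])

def synthesize(theta, beta, N, sigma2=0.0, rng=None):
    """Generate Y = X(theta) beta + eps as in (2).

    Assumption: eps is circular complex Gaussian with total variance sigma2.
    """
    rng = np.random.default_rng() if rng is None else rng
    X = dictionary(theta, N)
    eps = np.sqrt(sigma2 / 2) * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
    return X @ np.asarray(beta) + eps
```

For example, `synthesize([0.5, 1.2], [1.0, -2.0], N=4)` returns the four noise-free samples of a two-component signal.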

In signal reconstruction, the number of measurements $N$ (the dimension of $Y$) is much smaller than the number of candidate frequency components $k$ ($N \ll k$). Since the signal is sparse, Equation (2) can be turned into the optimization problem (3):

$$\min \|\beta\|_0 \quad \text{s.t.} \quad Y = X(\theta)\beta + \varepsilon \tag{3}$$

where $\|\beta\|_0$ is the number of non-zero components of $\beta$. Problem (3), however, is NP-hard, so no polynomial-time algorithm for it is known. We therefore replace the $l_0$ norm with a penalty function $G$ and obtain a penalized least squares problem:

$$\min G(\beta) \quad \text{s.t.} \quad Y = X(\theta)\beta + \varepsilon \tag{4}$$

Problem (4) can be written as an unconstrained optimization problem by removing the constraint and adding a penalty term to the objective function:

$$\min_{\beta,\theta} H(\theta,\beta) = \|Y - X(\theta)\beta\|_2^2 + \lambda \sum_{i=1}^{k} G(\beta_i) \tag{5}$$

where $\lambda$ is the adjustable penalty parameter. Different penalty functions lead to different regularization weights in the iterative process. We find that an inverse trigonometric (arctangent) penalty has better properties than other common penalty functions such as the logarithmic penalty. In the next section we propose an iterative reweighted $l_1$ sparse algorithm with the arctangent penalty.

We now develop an iterative reweighted algorithm for joint dictionary parameter learning and sparse signal recovery. Consider line spectral estimation with the arctangent penalty function:

$$\min_{\beta,\theta} H(\theta,\beta) = \|Y - X(\theta)\beta\|_2^2 + \lambda \sum_{i=1}^{k} \arctan(\varphi|\beta_i|) \tag{6}$$

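We consider the first-order derivatives of the objective in (6). Because the absolute value is involved, the derivative with respect to $\beta_i$ is given case by case:

$$\frac{\partial H(\theta,\beta)}{\partial \beta_i} = \begin{cases} 2\left(X'(\theta)X(\theta)\beta - X'(\theta)Y\right)_i + \dfrac{\lambda\varphi}{1+\varphi^2|\beta_i|^2}, & \beta_i > 0 \\[2ex] 2\left(X'(\theta)X(\theta)\beta - X'(\theta)Y\right)_i - \dfrac{\lambda\varphi}{1+\varphi^2|\beta_i|^2}, & \beta_i < 0 \\[2ex] 2\left(X'(\theta)X(\theta)\beta - X'(\theta)Y\right)_i - \lambda C_{|\beta_i|}, & \beta_i = 0 \end{cases} \tag{7}$$

$$\frac{\partial H(\theta,\beta)}{\partial \theta} = \beta'\left(\frac{\partial X'(\theta)}{\partial \theta}X(\theta) + X'(\theta)\frac{\partial X(\theta)}{\partial \theta}\right)\beta - 2Y'\frac{\partial X(\theta)}{\partial \theta}\beta \tag{8}$$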

The penalty function $\arctan(\varphi|\beta_i|)$ is not differentiable at zero; $C_{|\beta_i|}$ denotes its subgradient at zero, which is a set of real numbers $a \le C_{|\beta_i|} \le b$ with

$$a = \lim_{\beta_i \to 0^-} \frac{\arctan(\varphi|\beta_i|) - \arctan(0)}{\beta_i - 0} = -\varphi, \qquad b = \lim_{\beta_i \to 0^+} \frac{\arctan(\varphi|\beta_i|) - \arctan(0)}{\beta_i - 0} = \varphi \tag{9}$$
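As a small numerical illustration of (7) and (9), the sketch below evaluates the arctangent penalty, the weight $\lambda\varphi/(1+\varphi^2|\beta_i|^2)$ that appears in the derivative, and the one-sided difference quotients at zero. The function names and the values of `phi` and `h` are arbitrary choices for illustration.

```python
import numpy as np

def arctan_penalty(beta, phi):
    """Penalty arctan(phi * |beta|) from (6)."""
    return np.arctan(phi * np.abs(beta))

def reweight(beta, lam, phi):
    """Weights lam * phi / (1 + phi^2 |beta_i|^2) that appear in (7)."""
    return lam * phi / (1.0 + phi**2 * np.abs(beta) ** 2)

# One-sided difference quotients at zero approach -phi and +phi, as in (9).
phi = 2.0
h = 1e-6
left = (arctan_penalty(-h, phi) - arctan_penalty(0.0, phi)) / (-h)
right = (arctan_penalty(h, phi) - arctan_penalty(0.0, phi)) / h
```

The weight is largest ($\lambda\varphi$) at $\beta_i = 0$ and decays as $|\beta_i|$ grows, so small coefficients are penalized more strongly than large ones.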

Given the iterate $(\theta^t, \beta^t)$ at step $t$, and using (7), we can estimate $\beta^{t+1}$ by solving the following weighted lasso problem:

$$G(\beta^{t+1} \mid \theta^t, \beta^t) = \min_{\beta} \|Y - X(\theta)\beta^{t+1}\|_2^2 + \sum_{i=1}^{k} \frac{\lambda\varphi}{1+\varphi^2|\beta_i^t|^2}\,|\beta_i^{t+1}| \tag{10}$$

Here we use the idea of Bayesian lasso [

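$$\frac{\partial G(\beta^{t+1} \mid \theta^t, \beta^t)}{\partial \beta_i^{t+1}} = \begin{cases} 2\left(X'(\theta)X(\theta)\beta - X'(\theta)Y\right)_i + \dfrac{\lambda\varphi}{1+\varphi^2|\beta_i^t|^2}, & \beta_i > 0 \\[2ex] 2\left(X'(\theta)X(\theta)\beta - X'(\theta)Y\right)_i - \dfrac{\lambda\varphi}{1+\varphi^2|\beta_i^t|^2}, & \beta_i < 0 \\[2ex] 2\left(X'(\theta)X(\theta)\beta - X'(\theta)Y\right)_i - \dfrac{\lambda\varphi}{1+\varphi^2|\beta_i^t|^2}\,\hat{C}_{|\beta_i|}, & \beta_i = 0 \end{cases} \tag{11}$$

$\hat{C}_{|\beta_i|}$ denotes the subgradient of $|\beta_i^{t+1}|$ at zero, which is also a set of real numbers $\hat{a} \le \hat{C}_{|\beta_i|} \le \hat{b}$ with

$$\hat{a} = \lim_{\beta_i^{t+1} \to 0^-} \frac{|\beta_i^{t+1}| - 0}{\beta_i^{t+1} - 0} = -1, \qquad \hat{b} = \lim_{\beta_i^{t+1} \to 0^+} \frac{|\beta_i^{t+1}| - 0}{\beta_i^{t+1} - 0} = 1 \tag{12}$$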

The next step is to find $\theta^{t+1}$. Here we do not need the exact minimizer; it suffices to find an estimate $\theta^{t+1}$ that satisfies:

$$\|Y - X(\theta^{t+1})\beta^{t+1}\|_2^2 \le \|Y - X(\theta^t)\beta^{t+1}\|_2^2 \tag{13}$$

The stopping condition of the algorithm is controlled by the tolerance values $\omega_1$ and $\omega_2$. In the numerical simulations of this paper, we set the tolerance values to 0.02. Based on the discussion above, we summarise our algorithm as follows:
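The alternating scheme described above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the weighted lasso (10) is solved approximately with proximal gradient steps (ISTA) instead of the Bayesian lasso sampler, the $\theta$ update is a single backtracking gradient step that enforces (13), and the stopping rule compares successive iterates with one tolerance `tol`. All function names, step sizes and default values are hypothetical.

```python
import numpy as np

def dictionary(theta, N):
    """X(theta) with entries exp(-1j * n * theta_j), n = 1..N, as in (1)-(2)."""
    n = np.arange(1, N + 1)[:, None]
    return np.exp(-1j * n * np.asarray(theta)[None, :])

def objective(theta, beta, Y, lam, phi):
    """H(theta, beta) from (6)."""
    r = Y - dictionary(theta, len(Y)) @ beta
    return np.linalg.norm(r) ** 2 + lam * np.sum(np.arctan(phi * np.abs(beta)))

def weighted_lasso_ista(X, Y, w, beta0, n_iter=200):
    """Approximate solution of (10) by proximal gradient steps, started at beta0."""
    L = 2.0 * np.linalg.norm(X, 2) ** 2               # Lipschitz constant of the gradient
    b = beta0.copy()
    for _ in range(n_iter):
        z = b - 2.0 * (X.conj().T @ (X @ b - Y)) / L  # gradient step on the residual term
        mag = np.abs(z)
        shrink = np.maximum(mag - w / L, 0.0)         # soft-threshold the magnitudes
        b = z * shrink / np.maximum(mag, 1e-300)
    return b

def theta_gradient(theta, beta, Y):
    """Gradient of ||Y - X(theta) beta||_2^2 with respect to theta."""
    n = np.arange(1, len(Y) + 1)[:, None]
    X = dictionary(theta, len(Y))
    r = Y - X @ beta
    dX = -1j * n * X                                  # derivative of column j w.r.t. theta_j
    return -2.0 * np.real(np.sum(np.conj(r)[:, None] * dX * beta[None, :], axis=0))

def update_theta(theta, beta, Y, step=1e-2):
    """Backtracking gradient step that only accepts a theta satisfying (13)."""
    g = theta_gradient(theta, beta, Y)
    f0 = np.linalg.norm(Y - dictionary(theta, len(Y)) @ beta) ** 2
    while step > 1e-12:
        cand = theta - step * g
        if np.linalg.norm(Y - dictionary(cand, len(Y)) @ beta) ** 2 <= f0:
            return cand
        step *= 0.5
    return theta

def l1_ir(Y, theta0, lam=0.1, phi=1.0, tol=0.02, max_iter=50):
    """Alternate the reweighted beta update and the theta update until iterates settle."""
    theta = np.asarray(theta0, dtype=float).copy()
    beta = np.zeros(len(theta), dtype=complex)
    for _ in range(max_iter):
        w = lam * phi / (1.0 + phi**2 * np.abs(beta) ** 2)   # weights from (10)
        beta_new = weighted_lasso_ista(dictionary(theta, len(Y)), Y, w, beta)
        theta_new = update_theta(theta, beta_new, Y)
        converged = (np.linalg.norm(beta_new - beta) < tol
                     and np.linalg.norm(theta_new - theta) < tol)
        theta, beta = theta_new, beta_new
        if converged:
            break
    return theta, beta
```

Starting each inner solve at the previous $\beta^t$ keeps the sketch monotone: every proximal step lowers the surrogate (10), and the accepted $\theta$ step never increases the residual.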

First we prove that the objective function (6) is non-increasing at each iteration:

$$H(\theta^t, \beta^t) \ge H(\theta^t, \beta^{t+1}) \ge H(\theta^{t+1}, \beta^{t+1}) \tag{14}$$

Since we obtain $\theta^{t+1}$ by gradient descent subject to (13), and the penalty term does not depend on $\theta$, it follows that $H(\theta^t, \beta^{t+1}) \ge H(\theta^{t+1}, \beta^{t+1})$.

On the other hand, we prove $H(\theta^t, \beta^t) \ge H(\theta^t, \beta^{t+1})$ using the following lemma, which was introduced by [

LEMMA: Given the tuning parameter $\varphi > 0$, the following inequality holds:

$$\arctan(\varphi|\beta_i^t|) - \arctan(\varphi|\beta_i^{t+1}|) \ge \frac{\varphi}{1+\varphi^2|\beta_i^t|^2}\left(|\beta_i^t| - |\beta_i^{t+1}|\right) \tag{15}$$

Proof:

Let $f(x) = \arctan(x)$ for $x \ge 0$. By the mean value theorem, $f(x_1) - f(x_2) = f'(\zeta)(x_1 - x_2)$, where $\zeta$ lies between $x_1$ and $x_2$.

Since $f(x)$ is increasing and $f'(x)$ is decreasing on $x \ge 0$, the inequality $f(x_1) - f(x_2) \ge f'(x_1)(x_1 - x_2)$ holds for any non-negative $x_1$ and $x_2$: if $x_1 \ge x_2$ then $f'(\zeta) \ge f'(x_1)$ and $x_1 - x_2 \ge 0$, while if $x_1 < x_2$ then $f'(\zeta) \le f'(x_1)$ and $x_1 - x_2 < 0$. Setting $x_1 = \varphi|\beta_i^t|$ and $x_2 = \varphi|\beta_i^{t+1}|$ and using $f'(x) = 1/(1+x^2)$ gives inequality (15).
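Inequality (15) can also be checked numerically. The sketch below evaluates the left side minus the right side for random real pairs; the sample size, seed and value of `phi` are arbitrary illustrative choices.

```python
import numpy as np

def lemma_gap(b_old, b_new, phi):
    """Left side minus right side of inequality (15); the lemma says this is >= 0."""
    lhs = np.arctan(phi * np.abs(b_old)) - np.arctan(phi * np.abs(b_new))
    rhs = phi / (1.0 + phi**2 * np.abs(b_old) ** 2) * (np.abs(b_old) - np.abs(b_new))
    return lhs - rhs

rng = np.random.default_rng(0)
b_old = rng.normal(size=10_000)    # plays the role of beta_i^t
b_new = rng.normal(size=10_000)    # plays the role of beta_i^{t+1}
gaps = lemma_gap(b_old, b_new, phi=1.5)
```

The gap vanishes only when $|\beta_i^t| = |\beta_i^{t+1}|$, which reflects the concavity of the arctangent on the non-negative half-line.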

Next we consider the following equality:

$$\begin{aligned} H(\theta^t,\beta^t) - H(\theta^t,\beta^{t+1}) &= \|Y - X(\theta^t)\beta^t\|_2^2 + \lambda\sum_{i=1}^{k}\arctan(\varphi|\beta_i^t|) - \|Y - X(\theta^t)\beta^{t+1}\|_2^2 - \lambda\sum_{i=1}^{k}\arctan(\varphi|\beta_i^{t+1}|) \\ &= -\|X(\theta^t)\beta^t\|_2^2 + \|X(\theta^t)\beta^{t+1}\|_2^2 - 2\left(Y - X(\theta^t)\beta^t\right)'X(\theta^t)\beta^t + 2\left(Y - X(\theta^t)\beta^{t+1}\right)'X(\theta^t)\beta^{t+1} \\ &\quad + \lambda\sum_{i=1}^{k}\left(\arctan(\varphi|\beta_i^t|) - \arctan(\varphi|\beta_i^{t+1}|)\right) \\ &= \|X(\theta^t)\beta^{t+1} - X(\theta^t)\beta^t\|_2^2 + 2\left(X(\theta^t)\beta^{t+1} - X(\theta^t)\beta^t\right)'X(\theta^t)\beta^t - 2\left(Y - X(\theta^t)\beta^t\right)'X(\theta^t)\beta^t \\ &\quad + 2\left(Y - X(\theta^t)\beta^{t+1}\right)'X(\theta^t)\beta^{t+1} + \lambda\sum_{i=1}^{k}\left(\arctan(\varphi|\beta_i^t|) - \arctan(\varphi|\beta_i^{t+1}|)\right) \\ &= \|X(\theta^t)\beta^{t+1} - X(\theta^t)\beta^t\|_2^2 + 2\left[\left(Y - X(\theta^t)\beta^{t+1}\right)'X(\theta^t)\left(\beta^{t+1} - \beta^t\right)\right] \\ &\quad + \lambda\sum_{i=1}^{k}\left(\arctan(\varphi|\beta_i^t|) - \arctan(\varphi|\beta_i^{t+1}|)\right) \end{aligned} \tag{16}$$
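The quadratic part of (16) is a purely algebraic identity, so it can be sanity-checked with random data. The sketch below uses a real-valued matrix as a stand-in for $X(\theta^t)$; the sizes and the seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
N, k = 8, 20
X = rng.normal(size=(N, k))       # stand-in for X(theta^t), real-valued for simplicity
Y = rng.normal(size=N)
b_old = rng.normal(size=k)        # beta^t
b_new = rng.normal(size=k)        # beta^{t+1}

# Difference of the two residual terms in H(theta^t, beta^t) - H(theta^t, beta^{t+1}).
lhs = np.sum((Y - X @ b_old) ** 2) - np.sum((Y - X @ b_new) ** 2)

# Last line of (16) without the penalty terms.
rhs = np.sum((X @ b_new - X @ b_old) ** 2) + 2.0 * (Y - X @ b_new) @ (X @ (b_new - b_old))
```

Both quantities agree up to floating-point rounding, which confirms that only the penalty difference remains to be bounded by the lemma.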

Applying the lemma above yields:

$$H(\theta^t,\beta^t) - H(\theta^t,\beta^{t+1}) \ge \|X(\theta^t)\beta^{t+1} - X(\theta^t)\beta^t\|_2^2 + 2\left[\left(Y - X(\theta^t)\beta^{t+1}\right)'X(\theta^t)\left(\beta^{t+1} - \beta^t\right)\right] + \lambda\sum_{i=1}^{k}\frac{\varphi}{1+\varphi^2|\beta_i^t|^2}\left(|\beta_i^t| - |\beta_i^{t+1}|\right) \tag{17}$$

where $\beta_i^{t+1}$ is the optimal solution of problem (10), which satisfies:

$$\frac{\partial G(\beta^{t+1} \mid \theta^t, \beta^t)}{\partial \beta_i^{t+1}} = 0 \tag{18}$$

Substituting (18) into (17), we show that the inequality

$$H(\theta^t, \beta^t) \ge H(\theta^t, \beta^{t+1}) \tag{19}$$

holds in each of the following cases.

When $\beta_i^{t+1} > 0$:

$$2\left[\left(Y - X(\theta^t)\beta^{t+1}\right)'X(\theta^t)\left(\beta^{t+1} - \beta^t\right)\right] = \lambda\sum_{i=1}^{k}\frac{\varphi}{1+\varphi^2|\beta_i^t|^2}\left(\beta_i^{t+1} - \beta_i^t\right) \tag{20}$$

then:

$$H(\theta^t,\beta^t) - H(\theta^t,\beta^{t+1}) \ge \|X(\theta^t)\beta^{t+1} - X(\theta^t)\beta^t\|_2^2 + \lambda\sum_{i=1}^{k}\frac{\varphi}{1+\varphi^2|\beta_i^t|^2}\left(\beta_i^{t+1} - \beta_i^t + |\beta_i^t| - |\beta_i^{t+1}|\right) \tag{21}$$

Since $\beta_i^{t+1} - \beta_i^t + |\beta_i^t| - |\beta_i^{t+1}| = |\beta_i^t| - \beta_i^t \ge 0$ when $\beta_i^{t+1} > 0$, inequality (19) holds.

When $\beta_i^{t+1} < 0$:

$$2\left[\left(Y - X(\theta^t)\beta^{t+1}\right)'X(\theta^t)\left(\beta^{t+1} - \beta^t\right)\right] = -\lambda\sum_{i=1}^{k}\frac{\varphi}{1+\varphi^2|\beta_i^t|^2}\left(\beta_i^{t+1} - \beta_i^t\right) \tag{22}$$

then:

$$H(\theta^t,\beta^t) - H(\theta^t,\beta^{t+1}) \ge \|X(\theta^t)\beta^{t+1} - X(\theta^t)\beta^t\|_2^2 + \lambda\sum_{i=1}^{k}\frac{\varphi}{1+\varphi^2|\beta_i^t|^2}\left(\beta_i^t - \beta_i^{t+1} + |\beta_i^t| - |\beta_i^{t+1}|\right) \tag{23}$$

Since $\beta_i^t - \beta_i^{t+1} + |\beta_i^t| - |\beta_i^{t+1}| = \beta_i^t + |\beta_i^t| \ge 0$ when $\beta_i^{t+1} < 0$, inequality (19) holds.

When $\beta_i^{t+1} = 0$:

The set of subgradients at $\beta_i^{t+1}$ (which is $[\hat{a}, \hat{b}]$) must contain 0 when $\beta_i^{t+1}$ is a critical point; thus we have the following inequalities:

$$-2\left(X'(\theta)Y\right)_i - \frac{\varphi\lambda}{1+\varphi^2|\beta_i^t|^2} \le 0, \qquad -2\left(X'(\theta)Y\right)_i + \frac{\varphi\lambda}{1+\varphi^2|\beta_i^t|^2} \ge 0 \tag{24}$$

Considering the inequality we want to prove, we can write:

$$H(\theta^t,\beta^t) - H(\theta^t,\beta^{t+1}) \ge \|X(\theta^t)\beta^t\|_2^2 + \sum_{i=1}^{k}\frac{\lambda\varphi}{1+\varphi^2|\beta_i^t|^2}\left(|\beta_i^t| - \beta_i^t\right) \ge 0 \tag{25}$$

The discussion above proves that the objective value is non-increasing at each iteration. In addition, we want to show that $\beta^{t+1}$ converges to $\beta^*$ and $\theta^{t+1}$ converges to $\theta^*$ in the limit $\beta^t \to \beta^*$ and $\theta^t \to \theta^*$.

As $\beta^{t+1}$ is the optimal solution of (10), we define $\hat{\beta}^{t+1} = \lim_{\beta \to \beta^*,\, \theta \to \theta^*} \beta^{t+1}$.

The first-order condition for $\beta^{t+1}$ gives:

$$\lim_{\beta \to \beta^*,\, \theta \to \theta^*} \frac{\partial G(\beta \mid \beta^t, \theta^t)}{\partial \beta_i^{t+1}} = 0 \tag{26}$$

On the other hand, when $\beta^* \ne 0$ we have:

$$\frac{\partial H(\beta, \theta)}{\partial \beta_i^*} = 0 \tag{27}$$

Comparing the two equalities above, we conclude that $\hat{\beta}^{t+1} = \beta^*$.

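For the case $\beta^{t+1} = 0$, consider the inequalities from (18):

$$-2\left(X'(\theta)Y\right)_i - \frac{\varphi\lambda}{1+\varphi^2|\beta_i^*|^2} \le 0, \qquad -2\left(X'(\theta)Y\right)_i + \frac{\varphi\lambda}{1+\varphi^2|\beta_i^*|^2} \ge 0 \tag{28}$$

Since $\frac{\varphi\lambda}{1+\varphi^2|\beta_i^*|^2} \le \varphi\lambda$, we can conclude:

$$-2\left(X'(\theta)Y\right)_i - \varphi\lambda \le 0, \qquad -2\left(X'(\theta)Y\right)_i + \varphi\lambda \ge 0 \tag{29}$$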

From the discussion above, $\hat{\beta}^{t+1}$ always satisfies the first-order condition (7) when $\beta^t \to \beta^*$ and $\theta^t \to \theta^*$. Thus $\beta^{t+1} \to \beta^*$. When $\theta^t \to \theta^*$, consider the gradient with respect to $\theta^t$:

$$\lim_{\theta \to \theta^*} \frac{\partial H(\beta^{t+1}, \theta)}{\partial \theta^t} = 0 \tag{30}$$

Thus the value of $\theta^{t+1}$ remains $\theta^*$.

In this article we use the idea of the Bayesian lasso to estimate the optimal solution of problem (10).

Assume that the prior distribution of the parameter $\beta$ is the Laplace distribution:

$$f(\beta_i) = \frac{\lambda}{4\sigma^2}\exp\left(-\frac{\lambda}{2\sigma^2}|\beta_i|\right) \tag{31}$$

Combined with the likelihood function, we obtain the posterior probability:

$$f(\beta \mid Y) \propto \exp\left(-\frac{1}{2\sigma^2}\|Y - X\beta\|_2^2 - \sum_{i=1}^{k}\frac{\lambda}{2\sigma^2}|\beta_i|\right) \tag{32}$$

Solving problem (10) is equivalent to maximizing the posterior probability, which we can obtain from Gibbs sampling [

Since the Laplace prior does not directly yield tractable full conditional posterior distributions, the following integral identity (33) provides an effective solution:

$$\frac{a}{2}\exp(-a|z|) = \int_0^\infty \frac{1}{(2\pi v)^{1/2}}\exp\left(-\frac{z^2}{2v}\right) \times \frac{a^2}{2}\exp\left(-\frac{a^2 v}{2}\right) dv \tag{33}$$

Using this identity we can rewrite the Laplace prior distribution $\beta_i \sim \frac{a}{2}\exp(-a|\beta_i|)$ by introducing the intermediate parameter $v$, where $\mathrm{Exp}(\cdot)$ denotes the exponential distribution with the given rate:

$$f(\beta_i) \sim N(0, v_i), \qquad f(v_i) \sim \mathrm{Exp}\left(\frac{a^2}{2}\right) \tag{34}$$

In problem (32) we let $a = \frac{\lambda}{2\sigma^2}$.
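The scale-mixture identity (33) can be verified numerically. The sketch below compares the Laplace density on the left with a trapezoidal approximation of the integral on the right; the log-spaced grid and its bounds are arbitrary numerical choices, and the function names are hypothetical.

```python
import numpy as np

def laplace_density(z, a):
    """Left-hand side of (33)."""
    return 0.5 * a * np.exp(-a * np.abs(z))

def mixture_density(z, a, n_grid=200_000):
    """Right-hand side of (33), integrated over v with the trapezoidal rule."""
    v = np.logspace(-8, 3, n_grid)                       # log-spaced grid for v > 0
    normal_part = np.exp(-z**2 / (2.0 * v)) / np.sqrt(2.0 * np.pi * v)
    exponential_part = 0.5 * a**2 * np.exp(-0.5 * a**2 * v)
    integrand = normal_part * exponential_part
    return np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(v))
```

For instance, `laplace_density(1.0, 1.0)` and `mixture_density(1.0, 1.0)` agree to several decimal places, which is what allows the Laplace prior to be replaced by a Gaussian prior with an exponentially distributed variance.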

This motivates the following hierarchical Bayesian lasso model:

$$L(Y \mid \beta, X) \sim N(X\beta, \sigma^2 I), \qquad f(\beta \mid V) \sim N(0, D_v), \qquad f(v_i) \sim \mathrm{Exp}\left(\frac{\lambda^2}{8\sigma^4}\right) \tag{35}$$

where $V = [v_1, \cdots, v_k]$ corresponds to $\beta = [\beta_1, \cdots, \beta_k]$ and $D_v = \mathrm{diag}[v_1, \cdots, v_k]$.

For Bayesian inference, the full conditional distributions of $\beta$ and $v$ are:

$$f(\beta \mid V, Y) \sim N\left(A^{-1}X^tY,\ \sigma^2 A^{-1}\right), \qquad f\left(\frac{1}{v_i} \,\Big|\, \beta_i, Y\right) \sim I\text{-}G(a^*, b^*) \tag{36}$$

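where $A = X^tX + \sigma^2 D_v^{-1}$, which is the form consistent with the prior $N(0, D_v)$ in (35), and $I\text{-}G(a^*, b^*)$ denotes the inverse Gaussian distribution with mean parameter $a^* = \lambda^*/|\beta_i|$, shape parameter $b^* = (\lambda^*)^2$ and $\lambda^* = \frac{\lambda}{2\sigma^2}$. The density of the inverse Gaussian distribution is:

$$f(x; a^*, b^*) = \left(\frac{b^*}{2\pi x^3}\right)^{1/2}\exp\left(-\frac{b^*(x - a^*)^2}{2(a^*)^2 x}\right) \tag{37}$$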

Repeated sampling produces a Markov chain consisting of a series of points:

$(\beta_1^t, v_1), (\beta_2^t, v_2), \cdots, (\beta_m^t, v_m)$.
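The sampling scheme can be sketched as a small Gibbs sampler. This is a minimal illustration for a real-valued design matrix under the hierarchy (34)-(35), with $\beta_i \mid v_i \sim N(0, v_i)$ and $v_i$ exponential with rate $a^2/2$. The conditional of $1/v_i$ is taken to be inverse Gaussian with mean $a/|\beta_i|$ and shape $a^2$, which is what this hierarchy implies; treating that as the intended parametrization of (36) is an assumption. The function name, the defaults and the numerical clamps are hypothetical.

```python
import numpy as np

def bayesian_lasso_gibbs(X, Y, a, sigma2, n_samples=2000, burn_in=500, seed=0):
    """Gibbs sampler for Y ~ N(X b, sigma2 I), b_i | v_i ~ N(0, v_i), v_i ~ Exp(rate a^2 / 2)."""
    rng = np.random.default_rng(seed)
    k = X.shape[1]
    XtX, XtY = X.T @ X, X.T @ Y
    v = np.ones(k)
    draws = np.empty((n_samples, k))
    for s in range(burn_in + n_samples):
        # b | v, Y is Gaussian with precision P = X'X / sigma2 + diag(1 / v).
        P = XtX / sigma2 + np.diag(1.0 / v)
        L = np.linalg.cholesky(P)
        mean = np.linalg.solve(P, XtY / sigma2)
        b = mean + np.linalg.solve(L.T, rng.standard_normal(k))
        # 1 / v_i | b_i is inverse Gaussian with mean a / |b_i| and shape a^2.
        # The clamps below are numerical safeguards, not part of the model.
        ig_mean = np.minimum(a / np.maximum(np.abs(b), 1e-12), 1e6)
        v = 1.0 / np.maximum(rng.wald(ig_mean, a**2), 1e-12)
        if s >= burn_in:
            draws[s - burn_in] = b
    return draws
```

Averaging the rows of the returned array gives a posterior-mean estimate of $\beta$; in the reweighted setting of (10), each coefficient would carry its own weight in place of the common value `a`.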

Since each iteration produces a Markov chain, we obtain a long sample of $\beta$ over the whole algorithm:

$$\begin{array}{l} \beta_1^1, \cdots, \beta_l^1, \cdots, \beta_k^1 \to \beta^2, \theta^2 \\ \beta_1^2, \cdots, \beta_l^2, \cdots, \beta_k^2 \to \beta^3, \theta^3 \\ \qquad \vdots \\ \beta_1^T, \cdots, \beta_l^T, \cdots, \beta_k^T \to \beta^*, \theta^* \end{array} \tag{38}$$

We have shown above that $H(\theta^t, \beta^t) \ge H(\theta^t, \beta^{t+1}) \ge \cdots \ge H(\theta^*, \beta^*)$. The simulation results are shown in the next section.

In this section, we carry out a series of experiments to illustrate the performance of our proposed $l_1$ iterative reweighted algorithm (denoted as l1-IR). In our simulations, we compare our proposed algorithm with other existing state-of-the-art methods, including the sparse Bayesian learning with dictionary refinement algorithm (denoted as DicRefCS) [

To control the noise level in some of the experiments, we first define the observation quality by the peak signal-to-noise ratio (PSNR):

$$\mathrm{PSNR} \equiv 10\log_{10}\left(1/\sigma^2\right) \tag{39}$$

where $\sigma^2$ is the noise variance. To compare recovery quality, we calculate the signal-to-noise ratio of the recovered signal (RSNR):

$$\mathrm{RSNR} = 20\log_{10}\left(\frac{\|\beta^*\|_2^2}{\|\beta^* - \hat{\beta}\|_2^2}\right) \tag{40}$$

where $\beta^*$ is the original sparse signal and $\hat{\beta}$ is the signal recovered by the algorithm. The parameters $\lambda$ and $\varphi$ both act as regularization parameters; we choose $\varphi = 1$ and select the optimal $\lambda$ by cross validation.
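The two quality measures translate directly into code. The sketch below implements (39) and (40) exactly as printed, together with the inverse of (39) for choosing a noise variance from a target PSNR; the function names are hypothetical.

```python
import numpy as np

def psnr(sigma2):
    """PSNR in dB from the noise variance, as in (39)."""
    return 10.0 * np.log10(1.0 / sigma2)

def noise_variance(psnr_db):
    """Inverse of (39): the noise variance that gives the requested PSNR."""
    return 10.0 ** (-psnr_db / 10.0)

def rsnr(beta_true, beta_hat):
    """RSNR as printed in (40): 20 * log10 of the ratio of squared l2 norms."""
    beta_true = np.asarray(beta_true)
    beta_hat = np.asarray(beta_hat)
    num = np.linalg.norm(beta_true) ** 2
    den = np.linalg.norm(beta_true - beta_hat) ** 2
    return 20.0 * np.log10(num / den)
```

With these definitions, a noise variance of 0.01 corresponds to a PSNR of 20 and a variance of 1 to a PSNR of 0, matching the settings used in the experiments.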

In the following, we examine the behaviour of the respective algorithms under different scenarios. First we fix the noise level at $\mathrm{PSNR} = 20$, which means $\sigma^2 = 0.01$.

Next, we illustrate the influence of the sample size $N$ on recovery using another set of experiments. We keep $K = 64$, the sparsity $S = 3$ and the PSNR at 20, and vary the number of measurements $N$ from 6 to 32. For each N,

It can be observed from

of measurements. This means that our method performs better for the recovery of sparse signals when the number of samples is limited.

Our last experiment tests the recovery performance of the respective algorithms under different noise levels (PSNR). According to the definition of PSNR, we vary the noise variance from 0.01 to 1, which changes the PSNR from 20 to 0. In this experiment we use signals of length $K = 64$ containing $S = 3$ complex sinusoids and set the number of measurements to $N = 24$. We take 20 evenly spaced PSNR values between 0 and 20 and record the performance of each algorithm at each value. Each data point is averaged over repeated trials.

Relatively speaking, the l1-IR and SR-IR algorithms were more stable than the other algorithms as the noise level increased.

The validity, superiority, and stability of the l1-IR algorithm are illustrated by these experiments, indicating that the algorithm is worth applying in some practical cases.

| Algorithm | PSNR / $\sigma^2$: 0 / 1 | 3 / 0.5 | 6.98 / 0.2 | 10 / 0.1 | 13.1 / 0.05 | 20 / 0.01 |
|---|---|---|---|---|---|---|
| SBL-DE | 45% | 75% | 80% | 95% | 100% | 100% |
| DicRefCS | 40% | 60% | 75% | 80% | 90% | 100% |
| SR-IR | 55% | 80% | 95% | 100% | 100% | 100% |
| l1-IR | 60% | 75% | 95% | 95% | 100% | 100% |

In this paper, we treated the real dictionary parameters as unknown variables, and studied the line spectral estimation problem with unknown dictionary parameters. Based on the idea of the Bayesian lasso and the analysis of the first-order condition of the optimal solution, we proposed an iterative reweighted $l_1$ penalty regression algorithm. We proved that in each step of the iterative process the objective value does not increase, until an approximate solution of the real sparse vector is obtained. The numerical results in Section 4 illustrate that our algorithm performs better than other state-of-the-art algorithms in some cases. The disadvantage is that our method is more time-consuming, partly because convergence must be ensured in each sampling step, so the sampling length cannot be effectively reduced. Future studies will focus on this problem.

Ye, F., Luo, X. and Ye, W.Z. (2018) Iterative Reweighted l_{1} Penalty Regression Approach for Line Spectral Estimation. Advances in Pure Mathematics, 8, 155-167. https://doi.org/10.4236/apm.2018.82008