
This paper studies the problem of recovering low-rank tensors that are corrupted by both impulse and Gaussian noise. The problem is addressed by integrating the tensor nuclear norm and the $l_1$-norm in a unified convex relaxation framework: the nuclear norm captures the low-rank component, and the $l_1$-norm captures the impulse noise. The resulting optimization problem is solved by augmented-Lagrangian-based algorithms. Preliminary numerical experiments verify that the proposed method recovers the corrupted low-rank tensors well.

The problem of exploiting low-dimensional structures in high-dimensional data is taking on increasing importance in image, text and video processing, and web search, where the observed data lie in very high dimensional spaces. The principal component analysis (PCA) proposed in [

$$\min_{A,E}\ \|A\|_* + \lambda \|E\|_1 \quad \text{s.t.} \quad X = A + E \tag{1}$$

where $\|A\|_*$ denotes the nuclear norm of $A$ and $\|E\|_1$ denotes the $l_1$-norm of $E$. The nuclear norm and the $l_1$-norm are used to induce low rank and sparsity, respectively, and $\lambda > 0$ is a parameter balancing the two. Candes et al. [

In many real-world applications, we need to consider the model defined in Equation (1) under more complicated circumstances [

$$\min_{A,E}\ \|A\|_* + \lambda \|E\|_1 \quad \text{s.t.} \quad \|P_\Omega(X - A - E)\|_F \le \delta \tag{2}$$

where $\Omega$ is a subset of the index set of entries $\{1, 2, \cdots, n_1\} \times \{1, 2, \cdots, n_2\}$. It is assumed that only the entries $\{X_{ij}, (i,j) \in \Omega\}$ can be observed. The operator $P_\Omega : \mathbb{R}^{n_1 \times n_2} \to \mathbb{R}^{n_1 \times n_2}$ is the orthogonal projection onto the span of matrices vanishing outside of $\Omega$, so that the $ij$-th entry of $P_\Omega(X)$ is $X_{ij}$ if $(i,j) \in \Omega$ and zero otherwise. The problem defined in Equation (2) can be solved by the classical Augmented Lagrangian Method (ALM). The separable structure of the objective function and the constraints suggests splitting the corresponding augmented Lagrangian function to derive more efficient numerical algorithms. Tao et al. [
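As a small illustration of the projection $P_\Omega$ described above, the following NumPy sketch represents $\Omega$ as a boolean mask. The function name `project_onto_observed` and the mask representation are assumptions made for this example; they are not taken from the paper.

```python
import numpy as np

def project_onto_observed(X, mask):
    """P_Omega(X): keep entries where mask is True, set the others to zero.

    `mask` is a hypothetical boolean array of the same shape as X that
    encodes the index set Omega.
    """
    return np.where(mask, X, 0.0)

X = np.arange(6, dtype=float).reshape(2, 3)
mask = np.array([[True, False, True],
                 [False, True, False]])
P = project_onto_observed(X, mask)
```

Applying the function twice gives the same result as applying it once, which is the defining property of a projection.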

One shortcoming of the model defined in Equation (2) is that it can only handle matrix (two-way) data. However, real-world data are ubiquitously multi-way, also referred to as tensors. For example, a color image is a 3-way object with column, row and color modes; a greyscale video is indexed by two spatial variables and one temporal variable. To process tensor data with the model defined in Equation (2), we have to unfold the multi-way data into a matrix. Such preprocessing usually leads to the loss of the inherent high-dimensional structural information in the original observations. To avoid this, a common approach is to manipulate the tensor data by taking advantage of its multi-dimensional structure. Tensor analysis has many applications in computer vision [

The goal of this paper is to study the Tensor Robust PCA which aims to accurately recover a low-rank tensor from impulse and Gaussian noise. The observations can also be incomplete. Tensors of low rank appear in a variety of applications such as video processing (d = 3) [

$$X = A_0 + E_0 + F_0 \tag{3}$$

where $A_0$ is low-rank, $E_0$ is sparse, and $F_0$ is Gaussian noise with noise level $\delta$. We then try to recover the low-rank component $A_0$ through the following convex relaxation problem:

$$\min_{A,E}\ \|A\|_* + \lambda \|E\|_1 \quad \text{s.t.} \quad \|P_\Omega(X - A - E)\|_F \le \delta \tag{4}$$

Although the recovery of low-rank matrices has been well studied, research on low-rank tensor recovery is still lacking. This is mainly because it is difficult to define a tensor rank that enjoys properties as good as those of the matrix rank. Several definitions of tensor rank have been proposed, but each has its limitations. For example, the CP rank [

$$\operatorname{rank}_{tc}(X) := \left( \operatorname{rank}(X_{(1)}), \operatorname{rank}(X_{(2)}), \cdots, \operatorname{rank}(X_{(d)}) \right),$$

where $X_{(i)}$ is the mode-$i$ matricization of $X$. Motivated by the fact that the nuclear norm is the convex envelope of the matrix rank within the unit ball of the spectral norm, the Sum of Nuclear Norms (SNN), defined as $\sum_i \|X_{(i)}\|_*$, is used as a convex surrogate of the Tucker rank. This approach is effective, but SNN is not a tight convex relaxation of the Tucker rank.
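The mode-$i$ matricization, the Tucker rank and the SNN surrogate can be sketched in a few lines of NumPy. The helper names below are illustrative assumptions. The column ordering of this unfolding may differ from other conventions, which does not affect ranks or nuclear norms.

```python
import numpy as np

def unfold(X, mode):
    """Mode-`mode` matricization: rows are indexed by the chosen mode."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def tucker_rank(X):
    """Tuple of the matrix ranks of all mode-i unfoldings."""
    return tuple(int(np.linalg.matrix_rank(unfold(X, i))) for i in range(X.ndim))

def sum_of_nuclear_norms(X):
    """SNN: sum over modes of the nuclear norm of the mode-i unfolding."""
    return sum(np.linalg.norm(unfold(X, i), ord="nuc") for i in range(X.ndim))

# A rank-one tensor a (outer) b (outer) c, used as a sanity check.
a = np.array([1.0, 2.0])
b = np.array([1.0, 0.0, 1.0])
c = np.array([3.0, 4.0])
T = np.einsum("i,j,k->ijk", a, b, c)
```

For this rank-one tensor every unfolding has rank one with a single singular value $\|a\|\,\|b\|\,\|c\|$, so the Tucker rank is $(1, 1, 1)$ and the SNN is three times that value.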

More recently, the work [

$$\min_{A}\ \|A\|_* \quad \text{s.t.} \quad P_\Omega(A) = P_\Omega(A_0) \tag{5}$$

where $\|A\|_*$ is the nuclear norm of $A$, $\Omega$ is the index set of known elements in the original tensor, and $P_\Omega$ is the projector onto the span of tensors vanishing outside of $\Omega$. Lu et al. [

$$\min_{A,E}\ \|A\|_* + \lambda \|E\|_1 \quad \text{s.t.} \quad X = A + E \tag{6}$$

In this work, we go one step further, and consider recovering low-rank and sparse components of tensors from incomplete and noisy observations as defined in Equation (4).

The contributions of this work are two-fold:

・ A unified convex relaxation framework is proposed for the problem of recovering low-rank and sparse components of tensors from incomplete and noisy observations. Three augmented-Lagrangian-based algorithms are developed for the optimization problem.

・ Numerical experiments on synthetic data validate the efficacy of our proposed denoising approach.

The rest of the paper is organized as follows. In Section 2, some preliminaries that are useful for the subsequent analysis are provided. In Section 3, three augmented-Lagrangian-based methods are developed for the problem defined in Equation (4). In Section 4, numerical experiments support the model defined in Equation (4) and demonstrate the efficiency of the proposed algorithms. Finally, in Section 5, we draw conclusions and discuss topics for future work.

In this section, we list some lemmas concerning the shrinkage operators, which will be used at each iteration of the proposed augmented Lagrangian type methods to solve the generated subproblems.

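Lemma 1. For $\tau > 0$ and $T \in \mathbb{R}^{n_1 \times n_2}$, the solution of the problem

$$\arg\min_{S} \left\{ \frac{1}{2} \|S - T\|_F^2 + \tau \|S\|_1 \right\} \tag{7}$$

is given by $\operatorname{shrink}(T, \tau)$, where $\operatorname{shrink}(\cdot, \cdot)$ is the soft shrinkage operator, applied entrywise and defined as:

$$\operatorname{shrink}(a, \kappa) = \begin{cases} a - \kappa, & a > \kappa \\ 0, & |a| \le \kappa \\ a + \kappa, & a < -\kappa \end{cases} \tag{8}$$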
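The soft shrinkage operator of Lemma 1 has a compact vectorized form. The sketch below is a minimal NumPy version; the sign-and-threshold expression is an equivalent rewriting of the three-case definition in Equation (8), not code from the paper.

```python
import numpy as np

def shrink(T, tau):
    """Entrywise soft shrinkage: sign(t) * max(|t| - tau, 0)."""
    return np.sign(T) * np.maximum(np.abs(T) - tau, 0.0)

T = np.array([[3.0, -0.5],
              [0.2, -2.0]])
S = shrink(T, 1.0)  # entries with |t| <= 1 are set to zero
```

Entries larger than $\tau$ in magnitude move towards zero by exactly $\tau$, and all others are zeroed, which is what makes the minimizer of Equation (7) sparse.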

Lemma 2. Consider the singular value decomposition (SVD) of a matrix $A \in \mathbb{R}^{n_1 \times n_2}$ of rank $r$,

$$A = Q S V^{\top}, \quad S = \operatorname{diag}(\{\sigma_i\}_{1 \le i \le r}) \tag{9}$$

where $Q \in \mathbb{R}^{n_1 \times r}$ and $V \in \mathbb{R}^{n_2 \times r}$ have orthonormal columns, and the singular values $\sigma_i$ are real and positive. For all $\tau > 0$, define the soft-thresholding operator $D_\tau$ by

$$D_\tau(A) := Q D_\tau(S) V^{\top}, \quad D_\tau(S) = \operatorname{diag}(\{(\sigma_i - \tau)_+\}_{1 \le i \le r}) \tag{10}$$

where $x_+ = \max(0, x)$. Then, for each $\tau > 0$ and $A \in \mathbb{R}^{n_1 \times n_2}$, the singular value shrinkage operator (10) obeys

$$D_\tau(A) = \arg\min_{B} \left\{ \frac{1}{2} \|B - A\|_F^2 + \tau \|B\|_* \right\} \tag{11}$$
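Lemma 2 translates directly into a singular value thresholding routine. The sketch below covers the matrix case only and uses `numpy.linalg.svd`; the function name `svt` is an assumption made for this example.

```python
import numpy as np

def svt(A, tau):
    """Singular value thresholding: shrink each singular value by tau."""
    Q, s, Vt = np.linalg.svd(A, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)
    # Scale the columns of Q by the shrunken singular values, then recombine.
    return (Q * s_shrunk) @ Vt

A = np.diag([3.0, 1.0, 0.5])
B = svt(A, 1.0)  # singular values 3, 1, 0.5 become 2, 0, 0
```

Singular values at or below $\tau$ vanish, so the output typically has lower rank than the input.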

An alternative model for the problem defined in Equation (4) is the following nuclear-norm- and $l_1$-norm-regularized least squares problem:

$$\min_{A,E}\ \|A\|_* + \lambda_1 \|E\|_1 + \lambda_2 \|P_\Omega(X - A - E)\|_F^2 \tag{12}$$

Equation (12) can be reformulated into the following favourable form:

$$\min_{A,E,F}\ \|A\|_* + \lambda_1 \|E\|_1 + \lambda_2 \|P_\Omega(F)\|_F^2 \quad \text{s.t.} \quad X = A + E + F \tag{13}$$

The Alternating Direction Method of Multipliers (ADMM), an extension of the ALM algorithm, can be used to solve the tensor recovery problem defined in Equation (13). Given $(A^k, E^k, F^k)$, ADMM generates the new iterates via the following scheme:

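$$\left\{ \begin{aligned} A^{k+1} &= \arg\min_{A} \left( \|A\|_* + \frac{\beta}{2} \left\| X - (A + E^k + F^k) + \frac{\Lambda_1^k}{\beta} \right\|_F^2 \right) \\ E^{k+1} &= \arg\min_{E} \left( \lambda_1 \|E\|_1 + \frac{\beta}{2} \left\| X - (A^{k+1} + E + F^k) + \frac{\Lambda_1^k}{\beta} \right\|_F^2 \right) \\ F^{k+1} &= \arg\min_{F \in B} \left( \lambda_2 \|F\|_F^2 + \frac{\beta}{2} \left\| X - (A^{k+1} + E^{k+1} + F) + \frac{\Lambda_1^k}{\beta} \right\|_F^2 \right) \\ \Lambda_1^{k+1} &= \Lambda_1^k + \beta \left( X - (A^{k+1} + E^{k+1} + F^{k+1}) \right) \end{aligned} \right. \tag{14}$$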

See Algorithm 1 for the optimization details.
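To make the update order in Equation (14) concrete, the following sketch implements the scheme for the matrix case only. It rests on several simplifying assumptions that are not from the paper: all entries are observed, the $F$-update is unconstrained, the augmented terms use the squared Frobenius norm, and the matrix nuclear norm replaces the tensor nuclear norm. Under these assumptions each subproblem has a closed form: singular value thresholding for $A$, soft shrinkage for $E$, and a scalar rescaling for $F$. The function names are hypothetical.

```python
import numpy as np

def shrink(T, tau):
    """Entrywise soft shrinkage (Lemma 1)."""
    return np.sign(T) * np.maximum(np.abs(T) - tau, 0.0)

def svt(M, tau):
    """Singular value thresholding (Lemma 2), matrix case."""
    Q, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (Q * np.maximum(s - tau, 0.0)) @ Vt

def admm_matrix_sketch(X, lam1, lam2, beta, n_iter=100):
    """Three-block ADMM for min ||A||_* + lam1*||E||_1 + lam2*||F||_F^2
    subject to X = A + E + F (matrix case, fully observed; a sketch)."""
    X = np.asarray(X, dtype=float)
    A = np.zeros_like(X)
    E = np.zeros_like(X)
    F = np.zeros_like(X)
    Lam = np.zeros_like(X)
    for _ in range(n_iter):
        # A-update: proximal step of the nuclear norm with threshold 1/beta.
        A = svt(X - E - F + Lam / beta, 1.0 / beta)
        # E-update: proximal step of lam1*||.||_1 with threshold lam1/beta.
        E = shrink(X - A - F + Lam / beta, lam1 / beta)
        # F-update: minimizer of lam2*||F||^2 + (beta/2)*||R - F||^2.
        F = beta / (2.0 * lam2 + beta) * (X - A - E + Lam / beta)
        # Multiplier update, as in the last line of Equation (14).
        Lam = Lam + beta * (X - A - E - F)
    return A, E, F, Lam

rng = np.random.default_rng(0)
X_demo = rng.standard_normal((6, 5))
A_demo, E_demo, F_demo, Lam_demo = admm_matrix_sketch(
    X_demo, lam1=0.4, lam2=1.0, beta=0.5, n_iter=50)
```

The demo values of `lam1`, `lam2` and `beta` are arbitrary illustrative choices. A tensor version would replace `svt` by the corresponding thresholding under the tensor nuclear norm.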

It can be easily verified that the iterates generated by the proposed ADMM algorithm can be characterized by

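$$\left\{ \begin{aligned} 0 &\in \partial \|A^{k+1}\|_* - \left[ \Lambda_1^k - \beta (A^{k+1} + E^k + F^k - X) \right] \\ 0 &\in \partial (\lambda_1 \|E^{k+1}\|_1) - \left[ \Lambda_1^k - \beta (A^{k+1} + E^{k+1} + F^k - X) \right] \\ 0 &\in \partial (\lambda_2 \|F^{k+1}\|_F^2) - \left[ \Lambda_1^k - \beta (A^{k+1} + E^{k+1} + F^{k+1} - X) \right] \\ \Lambda_1^{k+1} &= \Lambda_1^k - \beta \left[ (A^{k+1} + E^{k+1} + F^{k+1}) - X \right] \end{aligned} \right. \tag{15}$$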

which is equivalent to

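$$\left\{ \begin{aligned} 0 &\in \partial \|A^{k+1}\|_* - \Lambda_1^{k+1} + \beta (E^k - E^{k+1}) + \beta (F^k - F^{k+1}) \\ 0 &\in \partial (\lambda_1 \|E^{k+1}\|_1) - \Lambda_1^{k+1} + \beta (F^k - F^{k+1}) \\ 0 &\in \partial (\lambda_2 \|F^{k+1}\|_F^2) - \Lambda_1^{k+1} \\ \Lambda_1^{k+1} &= \Lambda_1^k - \beta \left[ (A^{k+1} + E^{k+1} + F^{k+1}) - X \right] \end{aligned} \right. \tag{16}$$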

Algorithm 1. Optimization framework for problem defined in Equation (13).

Equation (16) shows that the distance of the iterates $(A^{k+1}, E^{k+1}, F^{k+1})$ to the solution $(A^*, E^*, F^*, \Lambda^*)$ can be characterized by $\beta \left( \|E^k - E^{k+1}\| + \|F^k - F^{k+1}\| \right)$ and $\frac{1}{\beta} \|\Lambda_1^k - \Lambda_1^{k+1}\|$. Thus, a straightforward stopping criterion for Algorithm 1 is:

$$\min \left\{ \beta \left( \|E^k - E^{k+1}\| + \|F^k - F^{k+1}\| \right),\ \frac{1}{\beta} \|\Lambda_1^k - \Lambda_1^{k+1}\| \right\} \le \epsilon \tag{17}$$

Here $\epsilon$ is a small positive tolerance, e.g., $10^{-6}$.
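The stopping test in Equation (17) can be written as a small helper. This is a sketch with hypothetical argument names; it takes the criterion literally, including the `min` over the two quantities, and uses the Frobenius norm of the flattened arrays for $\|\cdot\|$.

```python
import numpy as np

def should_stop(E_old, E_new, F_old, F_new, Lam_old, Lam_new, beta, eps=1e-6):
    """Evaluate the stopping criterion of Equation (17)."""
    primal_change = beta * (np.linalg.norm(E_old - E_new)
                            + np.linalg.norm(F_old - F_new))
    dual_change = np.linalg.norm(Lam_old - Lam_new) / beta
    return bool(min(primal_change, dual_change) <= eps)

Z = np.zeros((2, 2))
O = np.ones((2, 2))
both_moved = should_stop(Z, O, Z, Z, Z, O, beta=1.0)   # both quantities equal 2
dual_static = should_stop(Z, O, Z, Z, Z, Z, beta=1.0)  # dual change is 0
```

Because the criterion takes the minimum, it is met as soon as either quantity falls below the tolerance, as the second call shows. A stricter variant would require both to be small.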

In this subsection, we analyze the convergence of ADMM for solving the problem defined in Equation (13). We denote $f_1(\cdot) = \|\cdot\|_*$, $f_2(\cdot) = \|\cdot\|_F^2$, and $f_3(\cdot) = \|\cdot\|_1$. Here $f_2(\cdot)$ is strongly convex, while $f_1(\cdot)$ and $f_3(\cdot)$ are convex but not necessarily strongly convex. The problem defined in Equation (13) can be reformulated as

$$\min_{A,E,F}\ f_1(A) + \lambda_2 f_2(F) + \lambda_1 f_3(E) \quad \text{s.t.} \quad \mathcal{X} = A + E + F \tag{18}$$

Definition 1. (Convex and Strongly Convex) Let $f : \mathbb{R}^n \to [-\infty, +\infty]$. If the domain of $f$, denoted by $\operatorname{dom} f := \{x \in \mathbb{R}^n : f(x) < +\infty\}$, is not empty, $f$ is said to be proper. If for any $x \in \mathbb{R}^n$ and $y \in \mathbb{R}^n$ we always have $f(tx + (1-t)y) \le t f(x) + (1-t) f(y)$, $\forall t \in [0,1]$, then $f$ is said to be convex. Furthermore, $f$ is said to be strongly convex with modulus $\mu > 0$ if

$$f(tx + (1-t)y) \le t f(x) + (1-t) f(y) - \frac{1}{2} \mu t (1-t) \|x - y\|^2, \quad \forall t \in [0,1] \tag{19}$$
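As a worked instance of inequality (19), consider the squared norm $f_2(x) = \|x\|^2$ used in this section. Expanding the square gives an identity, from which the modulus can be read off. This short derivation is added for illustration and is not part of the original text.

```latex
\begin{aligned}
\|tx + (1-t)y\|^2
  &= t^2\|x\|^2 + 2t(1-t)\langle x, y\rangle + (1-t)^2\|y\|^2 \\
  &= t\|x\|^2 + (1-t)\|y\|^2 - t(1-t)\|x - y\|^2 .
\end{aligned}
% Matching with (19): (1/2) * mu = 1, hence mu = 2 for f_2,
% and the scaled term lambda_2 * f_2 has modulus 2 * lambda_2.
```

So $f_2$ satisfies (19) with equality for $\mu = 2$, and $\lambda_2 f_2$ is strongly convex with modulus $2\lambda_2$, which is presumably where the factor $2\lambda_2$ in the step-size ranges quoted later comes from.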

Cai et al. [

Assumption 1. In Equation (18), $f_1$ and $f_3$ are convex, and $f_2$ is strongly convex with modulus $\mu_2 > 0$.

Assumption 2. The optimal solution set for the problem defined in Equation (18) is nonempty, i.e., there exists $(A^*, E^*, F^*, \Lambda_1^*) \in \Omega^*$ such that the following conditions are satisfied:

$$\nabla f_1(A^*) - \Lambda_1^* = 0, \tag{20}$$

$$\lambda_2 \nabla f_2(F^*) - \Lambda_1^* = 0, \tag{21}$$

$$\lambda_1 \nabla f_3(E^*) - \Lambda_1^* = 0, \tag{22}$$

$$A^* + E^* + F^* - \mathcal{X} = 0. \tag{23}$$

Theorem 1. Assume that Assumption 1 and Assumption 2 hold. Let $(A^k, E^k, F^k, \Lambda_1^k)$ be the sequence generated by Algorithm 1 for solving the problem defined in Equation (18). If $\beta \in \left( 0, \frac{6\mu_2}{13} \right)$, the limit point of $(A^k, F^k, E^k, \Lambda_1^k)$ is an optimal solution to Equation (18). Moreover, the objective function converges to the optimal value and the constraint violation converges to zero, i.e.,

$$\lim_{k \to \infty} \left\| f_1(A^k) + \lambda_2 f_2(F^k) + \lambda_1 f_3(E^k) - f^* \right\| = 0 \tag{24}$$

and

$$\lim_{k \to \infty} \left\| \mathcal{X} - (A^k + E^k + F^k) \right\| = 0 \tag{25}$$

where $f^*$ denotes the optimal objective value for the problem defined in Equation (18). In our specific application, $\beta \in \left( 0, \frac{6 \cdot 2\lambda_2}{13} \right)$ suffices to ensure the convergence [

In our optimization framework given in Equation (13), there are three parameters: $\beta$, $\lambda_1$ and $\lambda_2$. As mentioned in Lu [

is limited to the range $\beta \in \left( 0, \frac{6 \cdot 2\lambda_2}{13} \right)$ to ensure the convergence of our algorithm (based on the analysis in Theorem 1). Thus, the value of $\lambda_2$ is important for the performance of our algorithm. For simplicity, we consider the case when $A$ is degraded only by Gaussian noise $F$, without sparse noise $E$, that is:

$$\min\ \frac{1}{2} \|F\|_F^2 + \frac{1}{2\lambda_2} \|A\|_* \quad \text{s.t.} \quad \mathcal{X} = A + F \tag{26}$$

The solution of Equation (26) is $\mathcal{X}$ with its singular values shifted towards zero by soft thresholding. $\lambda_2$ should be chosen so that the noise is removed (i.e., the variance is kept low) while over-shrinking of the original tensor $A$ is avoided (i.e., the bias is kept low). For the matrix case (i.e., $n_3 = 1$), Candes et al. [

Theorem 2. Suppose that the Gaussian noise term is $F \in \mathbb{R}^{n \times n}$ and that its entries are i.i.d. $N(0, \sigma^2)$. Then $\|F\|_F^2 \le (n + \sqrt{8n}) \sigma^2$ with high probability. Accordingly, $\frac{1}{2\lambda_2} = \sqrt{n + \sqrt{8n}}\, \sigma$, that is, $\lambda_2 = \frac{1}{2 \sqrt{n + \sqrt{8n}}\, \sigma}$.

Based on this conclusion, we derive the required conditions for convex program defined in Equation (13) to accurately recover the low-rank component A from corrupted observations. Our derivations are given in the following main result.

Main Result 1. Assume that the low-rank tensor A 0 ∈ ℝ n 1 × n 2 × n 3 obeys the incoherence conditions [

as $\lambda_2 = \frac{1}{2 \sqrt{n_{(1)} n_3 + \sqrt{8 n_{(1)} n_3}}\, \sigma}$ and $\lambda_1 = \frac{1}{\sqrt{n_{(1)} n_3}}$. At the same time, the rank of $A_0$ and the number of non-zero entries $m$ of $E_0$ should satisfy

$$\operatorname{rank}_t(A_0) \le \frac{\rho_r n_{(2)}}{\mu_0 \left( \log(n_{(1)} n_3) \right)^2} \quad \text{and} \quad m \le \rho_s n_1 n_2 n_3^2$$

where $n_{(1)} = \max\{n_1, n_2\}$ and $n_{(2)} = \min\{n_1, n_2\}$, and $\rho_r$ and $\rho_s$ are positive constants. The value of the penalty parameter $\beta$ should be within the range $\left( 0, \frac{6 \cdot 2\lambda_2}{13} \right)$ to ensure the convergence.
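The parameter choices stated above can be collected in a small helper. This sketch assumes that the formulas carry square roots that appear to have been lost in typesetting, i.e. $\lambda_1 = 1/\sqrt{n_{(1)} n_3}$ and $\lambda_2 = 1/\big(2\sqrt{n_{(1)} n_3 + \sqrt{8 n_{(1)} n_3}}\,\sigma\big)$, and it reads the upper end of the $\beta$ range as $6 \cdot 2\lambda_2 / 13$. The function name is hypothetical.

```python
import math

def suggested_parameters(n1, n2, n3, sigma):
    """Return (lam1, lam2, beta_max) under the reading described above.

    Assumption: the radicals in the formulas are reconstructed, not
    taken verbatim from the source.
    """
    m = max(n1, n2) * n3                      # n_(1) * n_3
    lam1 = 1.0 / math.sqrt(m)
    lam2 = 1.0 / (2.0 * math.sqrt(m + math.sqrt(8.0 * m)) * sigma)
    beta_max = 6.0 * 2.0 * lam2 / 13.0        # open upper end of the beta range
    return lam1, lam2, beta_max

lam1, lam2, beta_max = suggested_parameters(100, 100, 50, sigma=0.2)
```

Any $\beta$ strictly between 0 and `beta_max` lies in the stated range. The demo dimensions and noise level are illustrative values only.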

In this section, we conduct synthetic data and real data experiments to corroborate our algorithm. We investigate the ability of our proposed Robust Low Rank Tensor Approximation (denoted as RLRTA) algorithm for recovering low-rank tensors of various tubal rank from noises of various sparsity and random Gaussian noise of different intensity.

We first verify the recovery performance of our algorithm for different sparsity levels of $E$. Similar to [

$$E_{ijk} = \begin{cases} 1, & \text{w.p. } \rho_s / 2 \\ 0, & \text{w.p. } 1 - \rho_s \\ -1, & \text{w.p. } \rho_s / 2 \end{cases} \tag{27}$$

where w.p. is the abbreviation of “with probability”. We test two settings: in the first, $r = \operatorname{rank}_t(A) = 0.1n$ and $\rho_s = 0.1$; in the second, $r = \operatorname{rank}_t(A) = 0.1n$ and $\rho_s = 0.2$.
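The sparse component of Equation (27) can be sampled as follows. This is a minimal sketch using NumPy's `Generator` API; the function name and the use of one uniform draw per entry are choices made for this example.

```python
import numpy as np

def sample_sparse_signs(shape, rho_s, rng):
    """Draw E with entries +1 or -1 (each w.p. rho_s/2) and 0 (w.p. 1 - rho_s)."""
    u = rng.random(shape)              # uniform draws on [0, 1)
    E = np.zeros(shape)
    E[u < rho_s / 2] = 1.0             # probability rho_s / 2
    E[u > 1.0 - rho_s / 2] = -1.0      # probability rho_s / 2
    return E

rng = np.random.default_rng(0)
E = sample_sparse_signs((20, 20, 20), 0.1, rng)
nonzero_fraction = np.count_nonzero(E) / E.size
```

The fraction of non-zero entries concentrates around $\rho_s$ as the tensor grows.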

The Gaussian noise $F$ in each frontal slice is generated independently of the other slices, i.e.,

$$F(:, :, i_3) \sim N(0, \sigma_{i_3}^2), \quad 1 \le i_3 \le n \tag{28}$$

The variance values $\sigma_{i_3}^2$ of the frontal slices are randomly selected from 0 to 0.1. In sub-subsection 1, our task is to recover $A$ from the noisy observation $\mathcal{X} = A + E + F$ with $E$ of varying sparsity.
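The slice-wise noise model of Equation (28) can be sketched as below. The sketch assumes that "randomly selected from 0 to 0.1" means a uniform draw of each slice variance, which the text does not state explicitly; the function name is hypothetical.

```python
import numpy as np

def sample_slice_noise(n1, n2, n3, max_var, rng):
    """Gaussian noise whose variance differs per frontal slice.

    Assumption: slice variances are drawn uniformly from [0, max_var).
    """
    variances = rng.uniform(0.0, max_var, size=n3)
    # Broadcasting scales slice F[:, :, k] by sqrt(variances[k]).
    F = rng.standard_normal((n1, n2, n3)) * np.sqrt(variances)
    return F, variances

rng = np.random.default_rng(1)
F, variances = sample_slice_noise(50, 50, 8, 0.1, rng)
```

The empirical variance of each frontal slice should be close to the corresponding entry of `variances`.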

n | r | m | | | |
100 | 10 | 1e5 | 132,399 | 1.1838e−04 | 0.3040 |
200 | 20 | 8e5 | 1,046,860 | 2.8331e−05 | 0.3026 |

n | r | m | | | |
100 | 10 | 2e5 | 222,128 | 1.5001e−04 | 0.3072 |
200 | 20 | 16e5 | 1,797,586 | 3.8035e−05 | 0.3118 |

n | r | m | | | |
100 | 10 | 1e5 | 575,485 | 0.0021 | 0.2805 |
200 | 20 | 8e5 | 4,594,860 | 5.4577e−04 | 0.2727 |

n | r | m | | | |
100 | 10 | 2e5 | 576,448 | 0.0030 | 0.1597 |
200 | 20 | 16e5 | 4,609,591 | 8.3233e−04 | 0.1707 |

recovery results of algorithm RLRTA and TRPCA. It is shown that RLRTA can better recover the low-rank component $A$ under different sparse components $E$.

Now we examine the recovery behavior under Gaussian noise of varying variance. The generation of $A \in \mathbb{R}^{n \times n \times n}$ is the same as in sub-subsection 1, with $r = \operatorname{rank}_t(A) = 0.1n$. The sparse component $E$ has sparsity $\rho_s = 0.1$. For simplicity, we assume that $F$ is white Gaussian noise, that is

$$F(i_1, i_2, i_3) \sim N(0, \sigma_w^2) \tag{29}$$

where $1 \le i_1 \le n$, $1 \le i_2 \le n$, $1 \le i_3 \le n$. The noise variance values $\sigma_w^2$ are 0.02, 0.04, 0.06, 0.08 and 0.1, respectively.

Now we examine the recovery behavior with varying rank of $A$ and varying sparsity of $E$. Similar to [

$\sigma_w^2$ | r | m | | | |
0.02 | 10 | 1e5 | 100,438 | 1.1900e−04 | 0.2707 |
0.04 | 10 | 1e5 | 111,249 | 1.1886e−04 | 0.2915 |
0.06 | 10 | 1e5 | 134,908 | 1.2041e−04 | 0.3138 |
0.08 | 10 | 1e5 | 160,814 | 1.5044e−04 | 0.3451 |
0.10 | 10 | 1e5 | 184,006 | 2.2189e−04 | 0.3824 |

$\sigma_w^2$ | r | m | | | |
0.02 | 20 | 8e5 | 803,564 | 2.8135e−05 | 0.2709 |
0.04 | 20 | 8e5 | 889,969 | 2.8269e−05 | 0.2913 |
0.06 | 20 | 8e5 | 1,080,602 | 2.9120e−05 | 0.3145 |
0.08 | 20 | 8e5 | 1,287,048 | 3.6441e−05 | 0.3460 |
0.10 | 20 | 8e5 | 1,472,423 | 5.4040e−05 | 0.3821 |

$\sigma_w^2$ | r | m | | | |
0.02 | 10 | 1e5 | 571,972 | 0.0013 | 0.1036 |
0.04 | 10 | 1e5 | 571,943 | 0.0024 | 0.2041 |
0.06 | 10 | 1e5 | 571,944 | 0.0036 | 0.3039 |
0.08 | 10 | 1e5 | 571,290 | 0.0048 | 0.4037 |
0.10 | 10 | 1e5 | 571,880 | 0.0059 | 0.5039 |

$\sigma_w^2$ | r | m | | | |
0.02 | 20 | 8e5 | 4,573,505 | 3.0970e−04 | 0.1036 |
0.04 | 20 | 8e5 | 4,574,842 | 6.0571e−04 | 0.2043 |
0.06 | 20 | 8e5 | 4,573,573 | 9.0370e−04 | 0.3040 |
0.08 | 20 | 8e5 | 4,573,394 | 0.0012 | 0.4039 |
0.10 | 20 | 8e5 | 4,572,467 | 0.0015 | 0.5021 |

$n = 100$, $n_3 = 50$, (2) $n = 200$, $n_3 = 50$. We generate $A = Q * V$, where the entries of $Q \in \mathbb{R}^{n \times r \times n_3}$ and $V \in \mathbb{R}^{r \times n \times n_3}$ are independently sampled from a uniform distribution on the interval $(0, 1/n)$. For $E$, we again consider a Bernoulli model for its support and random signs as in Equation (27). The variance values $\sigma_{w, i_3}^2$ of the frontal slices $i_3$ ($1 \le i_3 \le n_3$) are randomly selected from 0 to 0.1, and the mean variance is set to 0.05 for both sizes.
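The construction $A = Q * V$ can be sketched in NumPy under the assumption that $*$ denotes the tensor-tensor product (t-product) on which the tubal rank is based, computed slice-wise in the Fourier domain along the third mode. The function names are hypothetical, and the sizes are scaled down for illustration.

```python
import numpy as np

def t_product(Q, V):
    """t-product: FFT along mode 3, slice-wise matrix products, inverse FFT."""
    Qf = np.fft.fft(Q, axis=2)
    Vf = np.fft.fft(V, axis=2)
    Af = np.einsum("irk,rjk->ijk", Qf, Vf)   # one matrix product per slice
    return np.real(np.fft.ifft(Af, axis=2))

def tubal_rank(A, tol=1e-8):
    """Largest rank among the frontal slices in the Fourier domain."""
    Af = np.fft.fft(A, axis=2)
    s = np.linalg.svd(np.moveaxis(Af, 2, 0), compute_uv=False)
    return int(np.max(np.sum(s > tol * s.max(), axis=1)))

rng = np.random.default_rng(0)
n, r, n3 = 10, 2, 4
Q = rng.uniform(0.0, 1.0 / n, size=(n, r, n3))
V = rng.uniform(0.0, 1.0 / n, size=(r, n, n3))
A = t_product(Q, V)
```

With generic random factors, the resulting tensor has tubal rank $r$, which is how low-rank test tensors of a prescribed rank can be produced.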

We let $r/n$ range over $[0.01 : 0.01 : 0.5]$ and $\rho_s$ over $[0.01 : 0.01 : 0.5]$. For each $(r, \rho_s)$-pair, we simulate 10 test instances and declare a trial successful if the recovered $\breve{A}$ satisfies $\|\breve{A} - A\|_F / \|A\|_F \le 10^{-3}$.
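The success criterion and the per-pair success fraction described above can be computed as follows. The helper names are illustrative, and the error values in the example are made-up placeholders rather than results from the paper.

```python
import numpy as np

def relative_error(A_hat, A):
    """Relative Frobenius error ||A_hat - A||_F / ||A||_F."""
    return np.linalg.norm(A_hat - A) / np.linalg.norm(A)

def success_fraction(errors, tol=1e-3):
    """Fraction of trials whose relative error is at most tol."""
    errors = np.asarray(errors, dtype=float)
    return float(np.mean(errors <= tol))

# Placeholder errors for four hypothetical trials of one parameter pair.
example_errors = [1e-4, 5e-3, 1e-3, 2e-2]
fraction = success_fraction(example_errors)
```

Evaluating `success_fraction` for every parameter pair on the grid yields the kind of phase-transition map discussed in this section.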

the fraction of correct recovery for each pair (black = 0% and white = 100%). It can be seen that there is a large region in which the recovery is correct.

Now we examine the recovery behavior with varying rank of $A$ and varying intensity of the noise $F$. We still consider two sizes of $A \in \mathbb{R}^{n \times n \times n_3}$: (1) $n = 100$, $n_3 = 50$, (2) $n = 200$, $n_3 = 50$. We generate $A = Q * V$, where the entries of $Q \in \mathbb{R}^{n \times r \times n_3}$ and $V \in \mathbb{R}^{r \times n \times n_3}$ are independently sampled from a uniform distribution on the interval $(0, 1/n)$. For $E$, we again consider a Bernoulli model for its support and random signs as in Equation (27), with the sparsity parameter $\rho_s$ fixed at 0.1. The generation of $F$ is similar to Equation (29).

We let $r/n$ range over $[0.01 : 0.05 : 0.5]$ and the noise variance $\sigma_w^2$ over $[0.01 : 0.01 : 0.1]$. For each $(r, \sigma_w^2)$-pair, we simulate 10 test instances and declare a trial successful if the recovered $\breve{A}$ satisfies $\|\breve{A} - A\|_F / \|A\|_F \le 10^{-3}$.

This work verifies the ability of convex optimization to recover low-rank tensors corrupted by both impulse and Gaussian noise. The problem is tackled by integrating the tensor nuclear norm, the $l_1$-norm and a least squares term in a unified convex relaxation framework. Parameters are selected to balance the low-rank component, the sparse component and the Gaussian-noise term. In addition, the convergence of the proposed algorithm is discussed. Numerical experiments have been conducted to demonstrate the efficacy of our proposed denoising approach.

The authors would like to thank Canyi Lu for providing the code for TRPCA algorithm.

Fan, H.Y. and Kuang, G.Y. (2017) Recovery of Corrupted Low-Rank Tensors. Applied Mathematics, 8, 229-244. https://doi.org/10.4236/am.2017.82019