Recovery of Corrupted Low-Rank Tensors

This paper studies the problem of recovering low-rank tensors, and the tensors are corrupted by both impulse and Gaussian noise. The problem is well accomplished by integrating the tensor nuclear norm and the l1-norm in a unified convex relaxation framework. The nuclear norm is adopted to explore the low-rank components and the l1-norm is used to exploit the impulse noise. Then, this optimization problem is solved by some augmented-Lagrangianbased algorithms. Some preliminary numerical experiments verify that the proposed method can well recover the corrupted low-rank tensors.


Introduction
The problem of exploiting low-dimensional structures in high-dimensional data is taking on increasing importance in image, text and video processing, and web search, where the observed data lie in very high dimensional spaces.The principal component analysis (PCA) proposed in [1] is the most widely used tool for low-rank component extraction and dimensionality reduction.It is easily implementable and efficient for data mildly corrupted by small noise.However, this PCA method is sensitive to data corrupted by heavy impulse noise or outlying observations.Then, a number of robust PCA methods were proposed.But none of these approaches yield a polynomial-time algorithm with strong performance guarantees under broad conditions.The proposed Robust PCA [2] is one among the earliest polynomial-time algorithms.Assume that a data matrix consists of a low-rank matrix 0 A and a sparse matrix 0 E .Then, 0 A and 0 E can be recovered with a high probability by solving the following convex relaxation problem (if 0 A is low-rank and satisfies some incoherent conditions, 0 E is sufficiently sparse): X A E (1) where * A denotes the nuclear norm of A and 1 E denotes the 1 l -norm of E .Nuclear norm and 1 l -norm are used to induce low rank and sparsity, specifically.0 λ > is a parameter balancing the low rank and sparsity.Candes et al. [2] prove that ( ) works with a high probability for recovering any low-rank, incoherent matrix.Notably, Chandrasekaran et al. [3] also consider the problem of decomposing a given data matrix into sparse and low-rank components, and give sufficient conditions for convex programming to succeed.Their work was motivated by applications in system identification and learning of graphical models.In contrast, Candes et al. [2] were motivated by robust principal component computations in high dimensional settings when there were erroneous and missing entries; missing entries were not considered in [3].In [3], the parameter λ is data-dependent, and may have to be selected by solving a number of convex programs, while Candes et al. [2] provide a universal value of λ , namely, ( ) 1 max , n n λ = . The distinction between the two results is a consequence of different assumptions about the origin of the data matrix X .In many real world applications, we need to consider the model defined in Equation ( 1) under more complicated circumstance [4] [5].First, only a fraction of entries of X can be observed.This is the well-known matrix completion problem.If the unknown matrix is known to have low rank or approximately low rank, then accurate and even exact recovery is possible by nuclear norm minimization [6] [7].Second, the observed data are corrupted by both impulse noise (sparse but large) and Gaussian noise (small but dense).Let X be the superposition of low-rank matrix A , the impulse noise matrix E and the Gaussian noise F , i.e., = + + X A E F .The Gaussian noise of the observed entries is assumed to be small in the sense that , where δ is the Gaussian noise level and F ⋅ is the Frobenius norm.Then, to be broadly ap- plicable, we consider the following extension of model defined in Equation ( 1): where Ω is a subset of the index set of entries { } { } » is a orthogonal projection onto the span of matrices vanishing outside of Ω so that the ij-th entry of ( ) One shortcoming of model defined in Equation ( 2) is that it can only handle matrix (two-way) data.However, the real-world data are ubiquitously in multi-way, also referred to as tensor.For example, a color image is a 3-way object with column, row and color modes; a greyscale video is indexed by two spatial variables and one temporal variable.If we use the model defined in Equation (2) to process the tensor data, we have to unfold the multi-way data into a matrix.Such a preprocessing usually leads to the loss of the inherent structure high-dimensional information in the original observations.To avoid this negative factor, a common approach is to manipulate the tensor data by taking the advantage of its multi-dimensional structure.Tensor analysis have many applications in computer vision [8], diffusion Magnetic Resonance Imaging (MRI) [9] [10] [11], quantum entanglement problem [12], spectral hypergraph theory [13] and higher-order Markov chains [14].
The goal of this paper is to study the Tensor Robust PCA which aims to accurately recover a low-rank tensor from impulse and Gaussian noise.The observations can also be incomplete.Tensors of low rank appear in a variety of applications such as video processing (d = 3) [15], time-dependent 3D imaging (d = 4), ray tracing where the material dependent bidirectional reflection function is an order four tensor that has to be determined from measurements [15], numerical solution of the electronic Schrödinger equation ( , where N is the number of particles) [16] [17] [18], machine learning [19] and more.More specifically, suppose we are given a data tensor ( d is the number of ways), and it can be decomposed as where 0  is low-rank and 0  is sparse.0  is Gaussian noise with the noise level being δ .Then, we try to recover the low-rank 0  through the following convex relaxation problem:

Related Work
Although the recovery of low-rank matrix has been well studied, the research of low-rank tensor recovery is still lacking.This is mainly because it's difficult to define a satisfactory tensor rank which enjoys similar good properties as the matrix case.Several different definitions of tensor rank have been proposed but each has its limitation.For example, the CP rank [20] is defined as the smallest number of rank one tensors that sum up to  , where a rank one tensor is of the form ⊗ ⊗ ⊗ 1 2 d u u u .Expectedly, the CP rank is NP-hard to compute.Its convex relaxation is also intractable.Another more popular direction is to use the tractable Tucker rank [20] and its convex relaxation.For a d-way tensor  , the Tucker rank is a vector defined as ( ( ) where ( ) i X is the mode-i matricization of  .Motivated by the fact that the nuclear norm is the convex envelop of the matrix rank within the unit ball of the spectral norm.The Sum of Nuclear Norms (SNN), defined as used as a convex surrogate of the Tucker rank.This approach is effective, but SNN is not a tight convex relaxation of Tucker rank.
More recently, the work [21] proposed the tensor tubal rank based on a new tensor decomposition scheme denoted as t-SVD.The t-SVD is based on a new tensor-tensor product which enjoys many similar properties as the matrix case.
Based on the computable t-SVD, the tensor nuclear norm [22] is used to replace the tubal rank for low-rank tensor recovery (from incomplete/corrupted tensors).
The problem of recovering the unknown low-rank tensor  from incomplete samples 0  can be solved by the following convex program [21], where * ⋅ is the nuclear norm of  , Ω is the index set of known elements in the original tensor, and Ω  is the projector onto the span of tensors.Lu et al. [23] extended the work and studied the Tensor Robust Principal Component (TRPCA) problem, as defined in the following equation: In this work, we go one step further, and consider recovering low-rank and sparse components of tensors from incomplete and noisy observations as defined in Equation (4).

Paper Contribution
The contributions of this work are two-fold: • A unified convex relaxation framework is proposed for the problem of recovering low-rank and sparse components of tensors from incomplete and noisy observations.Three augmented-Lagrangian-based algorithms are developed for the optimization problem.
• Numerical experiments on synthetical data validate the efficacy of our proposed denoising approach.

Paper Organization
The rest of the paper is organized as follows.In Section 2, some preliminaries that are useful for the subsequent analysis are provided.In Section 3, three augmented-Lagrangian-based methods are developed for the problem defined in Equation ( 4).In Section 4, some numerical experiments verify the justification of the model defined in Equation ( 4) and the efficiency of the proposed algorithms.Finally, in Section 5, we make some conclusions and discuss some topics for future work.

Preliminaries
In this section, we list some lemmas concerning the shrinkage operators, which will be used at each iteration of the proposed augmented Lagrangian type methods to solve the generated subproblems.Lemma 1.For 0 τ > , and , the solution of the following problem (7) obeys is given by ( ) shrink ,τ T .

( )
shrink , ⋅ ⋅ is a soft shrinkage operator and defined as: Lemma 2. Consider the singular value decomposition (SVD) of a matrix where are orthogonal, and the singular values i σ are real and positive.Then, for all 0 τ > , define the soft-thresholding operator where x + is the operator that ( ) . Then, for each 0 τ > and , the singular value shrinkage operator (10) obeys

Algorithm
An alternative model to study the problem defined in Equation ( 4) is the following nuclear-norm-and 1 l -norm-normalized least squares problem: ( ) Equation ( 12) can be reformulated into the following favourable form: ( ) Alternating Direction Method of Multiplier (ADMM), which is an extension of ALM algorithm, can be used to solve the tensor recovery problem defined in (13).With given ( )    , the ADMM generate the new iterates via the following scheme: See Algorithm 1 for the optimization details.

Stopping Criterion
It can be easily verified that the iterates generated by the proposed ADMM algorithm can be characterized by ) which is equivalent to ( ) Algorithm 1. Optimization framework for problem defined in Equation (13).repeat Update Update penalty parameter ( ) Update Lagrangian multiplier ( ) ( ) if some stopping criterion is satisfied; then Break; else 1 k k ← + ; end if until exceed the maximum number of outer loop.

Convergence Analysis
In this subsection, we mainly analyze the convergence of ADMM for solving problem defined in Equation ( 13).We denote ( ) ( ) f ⋅ is strongly convex, while ( ) f ⋅ are convex terms, but may not be strongly convex.The problem defined in Equation ( 13) can be reformulated as is not empty, f is considered to be proper.If for any , then it is considered that f is convex.Furthermore, f is considered to be strongly convex with the modulus Cai et al. [24] and Li et al. [25] have proved the convergence of Extended Alternating Direction Method of Multipliers (e-ADMM) with only one strongly convex function for the case m = 3. Assumption 1.In Equation ( 18), 1 f and 3 f are convex, and 2 f is strongly convex with modulus 2 0 µ > .Assumption 2. The optimal solution set for the problem defined in Equation ( 18) is nonempty, i.e., there exist ( ) such that the following requirements can be satisfied: ( ) , the limit point of ( ) is an optimal solution to Equation (18).Moreover, the objective function converges to the optimal value and the constraint violation converges to zero, i.e., ( ) ( ) ( ) and where * f denotes the optimal objective value for the problem defined in Equa- tion (18).In our specific application, the convergence [24].

Parameter Choice
In our optimization framework given in Equation ( 13), there are three parameters β , 1 λ and 2 λ .As mentioned in Lu [23], 1 λ does not need any tuning and can be set to to ensure the convergence of our algorithm (based on the analysis in Theorem 1).Thus, the value of 2 λ is important for the performance of our algorithm.For simplicity, we consider the case when  is only degraded by Gaussian noise  without sparse noise  , that is: The solution for Equation ( 26) is equal to χ but with singular values being shifted towards zero by soft thresholding.2 λ should be set large enough to re- move noise (i.e., to keep the variance low), and small to avoid over-shrinking of the original tensor A (i.e., to keep the bias low).For the matrix case (i.e., 3 1 n = ), Candes et al. [26] have deduced the proper value of 2 λ , as shown in the follow- ing theorem.
Theorem 2. Supposing that the Gaussian noise term n n × ∈  F , and each entry , i j n is iid normally distributed, we can have that for with high probability.Then, ( ) Based on this conclusion, we derive the required conditions for convex program defined in Equation ( 13) to accurately recover the low-rank component  from corrupted observations.Our derivations are given in the following main result.
Main Result 1. Assume that the low-rank tensor obeys the incoherence conditions [27].The support set Ω of 0  is uniformly distributed among all sets of cardinality m.Then, there exist universal constants 1 2 , 0 c c > such that with probability at least , 0 0 ,   is the unique minimizer to problem defined in Equation (13).The values of 1 λ and 2 λ can be determined as ( ) ( ) In the same time, the rank of 0  and the number of non-zero entries of 0  should satisfy that (

Experiments on Synthetic Data
In this section, we conduct synthetic data and real data experiments to corroborate our algorithm.We investigate the ability of our proposed Robust Low Rank Tensor Approximation (denoted as RLRTA) algorithm for recovering low-rank tensors of various tubal rank from noises of various sparsity and random Gaussian noise of different intensity.

Exact Recovery from Varying Sparsity of 
We first verify the correct recovery performance of our algorithm for different sparsity of  .Be similar to [23], we consider the tensors of size n n n × × , with varying dimensions 100, 200 n = . We generate a rank t r where the entries of are independently sampled from a uniform distribution in interval ( ) 0,1 n .The support set Ω , with size of sparse component  is chosen uniformly at random.For all ( )  , where  is a tensor with independent Bernoulli 1 ± entries.For  , it can be mathematically expressed as where w.p. is the abbreviation of "with probability".We test on two settings: the first scenario with setting  :,:, 0, , 1 The variance values σ in each frontal slice are randomly selected from 0 to 0.1.In this sub-subsection 1, our task is to recovery  from noisy observation χ = + +    with  of varying sparsity.Table 2. Correct recovery for random problems of varying sparsity using TRPCA [23].

Exact Recovery with  of Varying Intensity
Now we exam the recovery phenomenon with Gaussian noise of varying variances.The generation of is the same as that in sub-subsection 1 and ( ) . The sparse component  has sparsity 0.1 For simplicity, we assume that  is white Gaussian noise, that is where

Phase Transition in Rank and Sparsity
Now we exam the recovery phenomenon with varying rank of  and varying sparsity of  .Similar to [23], we consider two sizes of Table 3. Correct recovery for random problems of varying intensity using RLRTA.
( )  ( )  uniform distribution in interval ( ) 0,1 n .For  , we still consider a Bernoulli model for its support and random signs as in Equation (27).The variance values For each ( ) , s r ρ -pair, we simulate 10 test instances and declare a trial to be successful if the recovered  satisfies the fraction of correct recovery for each pair (black = 0% and white = 100%).It can be seen that there is a large region in which the recovery is correct.

Phase Transition in Rank and Entry-Wise Noise Intensity
Now we exam the recovery phenomenon with varying rank of  and varying intensity of noise  .We still consider two sizes of   support and random signs as in Equation ( 27) and sparsity parameter ρ s is fixed at 0.1.The generation of  is similar to Equation (29).
We set r n as all the choices in [ ] 0.01: 0.05 : 0.5 .The noise variance values

Conclusion
This work verifies the ability of convex optimization for the recovery of lowrank tensors corrupted by both impulse and Gaussian noise.The problem is tackled by integrating the tensor nuclear norm, 1 l -norm and least square term in a unified convex relaxation framework.Parameters are selected to comprise the low-rank component, the sparse component and the Gaussian-noise term.
Besides, the convergence of the proposed algorithm is discussed.Numerical experiments have been conducted to demonstrate the efficacy of our proposed denoising approach.
and zero otherwise.The problem defined in Equation (2) can be solved by the classical Augmented Lagrangian Method (ALM).The separable structure emerging in the objective function and the constraints entails the idea of splitting the corresponding augmented Lagrangian function to derive more efficient numerical algorithms.Tao et al. [5] developed the Alternating Splitting Augmented Lagrangian Method (ASALM) and its variant (VASALM), and the Pa-rallel Splitting Augmented Lagrangian method (PSALM) for solving Equation (2).

Theorem 1 .
Assume that Assumption 1 and Assumption 2 hold.Let ( ) be the sequence generated by Algorithm 1 for solving the problem defined in Equation(18).If and ρ s are positive constants.The value of penalty parameter β should be within the range of noise  in each frontal slice is generated independently with each other, i.e.
algorithm RLRTA and TRPCA.It's shown that RLRTA can better recover the low-rank compnent  under different sparse component  .
randomly selected from 0 to 0.1, and the mean variance values are both set to be 0.05.We set r n as all the choices in [ ] 0.01: 0.01: 0.5 , and ρ s in [ ] 0.01: 0.01: 0.5 .

Figure 1 .
Figure 1.Correct recovery for varying rank and sparsity for RLRTA and TRPCA [23].Fraction of correct recoveries, as a function of n .For  , we still consider a Bernoulli model for its

Figure 2 .
Figure 2. Correct recovery for varying rank and noise variance for RLRTA and TRPCA [23].Fraction of correct recoveries, as a function of σ -pair, we simulate 10 test in- stances and declare a trial to be successful if the recovered  satisfies

2
plots the fraction of correct recovery for each pair (black = 0% and white = 100%).It can be seen that there is a large region in which the recovery is correct.

Table 2 show the Table 1 .
Correct recovery for random problems of varying sparsity using RLRTA.
The noise variance values 2 Table3and Table4show the recovery results of algorithm RLRTA and TRPCA.It's shown that RLRTA can better recover the low-rank compnent  under different Gaussian noise  .