1. Introduction
The problem of exploiting low-dimensional structures in high-dimensional data is taking on increasing importance in image, text and video processing, and web search, where the observed data lie in very high dimensional spaces. The principal component analysis (PCA) proposed in [1] is the most widely used tool for low-rank component extraction and dimensionality reduction. It is easily implementable and efficient for data mildly corrupted by small noise. However, this PCA method is sensitive to data corrupted by heavy impulse noise or outlying observations. Then, a number of robust PCA methods were proposed. But none of these approaches yield a polynomial-time algorithm with strong performance guarantees under broad conditions. The proposed Robust PCA [2] is one among the earliest polynomial-time algorithms. Assume that a data matrix
consists of a low-rank matrix
and a sparse matrix
. Then,
and
can be recovered with a high probability by solving the following convex relaxation problem (if
is low-rank and satisfies some incoherent conditions,
is sufficiently sparse):
(1)
where
denotes the nuclear norm of
and
denotes the
-norm of
. Nuclear norm and
-norm are used to induce low rank and sparsity, specifically.
is a parameter balancing the low rank and sparsity. Candes et al. [2] prove that
works with a high probability for recovering any low-rank, incoherent matrix. Notably, Chandrasekaran et al. [3] also consider the problem of decomposing a given data matrix into sparse and low-rank components, and give sufficient conditions for convex programming to succeed. Their work was motivated by applications in system identification and learning of graphical models. In contrast, Candes et al. [2] were motivated by robust principal component computations in high dimensional settings when there were erroneous and missing entries; missing entries were not considered in [3] . In [3] , the parameter
is data-dependent, and may have to be selected by solving a number of convex programs, while Candes et al. [2] provide a universal value of
, namely,
. The distinction between the two results is a consequence of different assumptions about the origin of the data matrix
.
In many real world applications, we need to consider the model defined in Equation (1) under more complicated circumstance [4] [5] . First, only a fraction of entries of
can be observed. This is the well-known matrix completion problem. If the unknown matrix is known to have low rank or approximately low rank, then accurate and even exact recovery is possible by nuclear norm minimization [6] [7] . Second, the observed data are corrupted by both impulse noise (sparse but large) and Gaussian noise (small but dense). Let
be the superposition of low-rank matrix
, the impulse noise matrix
and the Gaussian noise
, i.e.,
. The Gaussian noise of the observed entries is assumed to be small in the sense that
, where
is the Gaussian noise level and
is the Frobenius norm. Then, to be broadly applicable, we consider the following extension of model defined in Equation (1):
(2)
where
is a subset of the index set of entries
. It’s assumed that only these entries
can be observed. The operator
is a orthogonal projection onto the span of matrices vanishing outside of
so that the ij-th entry of
is
if
and zero otherwise. The problem defined in Equation (2) can be solved by the classical Augmented Lagrangian Method (ALM). The separable structure emerging in the objective function and the constraints entails the idea of splitting the corresponding augmented Lagrangian function to derive more efficient numerical algorithms. Tao et al. [5] developed the Alternating Splitting Augmented Lagrangian Method (ASALM) and its variant (VASALM), and the Parallel Splitting Augmented Lagrangian method (PSALM) for solving Equation (2).
One shortcoming of model defined in Equation (2) is that it can only handle matrix (two-way) data. However, the real-world data are ubiquitously in multi-way, also referred to as tensor. For example, a color image is a 3-way object with column, row and color modes; a greyscale video is indexed by two spatial variables and one temporal variable. If we use the model defined in Equation (2) to process the tensor data, we have to unfold the multi-way data into a matrix. Such a preprocessing usually leads to the loss of the inherent structure high-di- mensional information in the original observations. To avoid this negative factor, a common approach is to manipulate the tensor data by taking the advantage of its multi-dimensional structure. Tensor analysis have many applications in computer vision [8] , diffusion Magnetic Resonance Imaging (MRI) [9] [10] [11] , quantum entanglement problem [12] , spectral hypergraph theory [13] and higher-order Markov chains [14] .
The goal of this paper is to study the Tensor Robust PCA which aims to accurately recover a low-rank tensor from impulse and Gaussian noise. The observations can also be incomplete. Tensors of low rank appear in a variety of applications such as video processing (d = 3) [15] , time-dependent 3D imaging (d = 4), ray tracing where the material dependent bidirectional reflection function is an order four tensor that has to be determined from measurements [15] , numerical solution of the electronic Schrödinger equation (
, where N is the number of particles) [16] [17] [18] , machine learning [19] and more. More specifically, suppose we are given a data tensor
(
is the number of ways), and it can be decomposed as
(3)
where
is low-rank and
is sparse.
is Gaussian noise with the noise level being
. Then, we try to recover the low-rank
through the following convex relaxation problem:
(4)
1.1. Related Work
Although the recovery of low-rank matrix has been well studied, the research of low-rank tensor recovery is still lacking. This is mainly because it’s difficult to define a satisfactory tensor rank which enjoys similar good properties as the matrix case. Several different definitions of tensor rank have been proposed but each has its limitation. For example, the CP rank [20] is defined as the smallest number of rank one tensors that sum up to
, where a rank one tensor is of the form
. Expectedly, the CP rank is NP-hard to compute. Its convex relaxation is also intractable. Another more popular direction is to use the tractable Tucker rank [20] and its convex relaxation. For a d-way tensor
, the Tucker rank is a vector defined as
,
where
is the mode-i matricization of
. Motivated by the fact that the nuclear norm is the convex envelop of the matrix rank within the unit ball of the spectral norm. The Sum of Nuclear Norms (SNN), defined as
, is used as a convex surrogate of the Tucker rank. This approach is effective, but SNN is not a tight convex relaxation of Tucker rank.
More recently, the work [21] proposed the tensor tubal rank based on a new tensor decomposition scheme denoted as t-SVD. The t-SVD is based on a new tensor-tensor product which enjoys many similar properties as the matrix case. Based on the computable t-SVD, the tensor nuclear norm [22] is used to replace the tubal rank for low-rank tensor recovery (from incomplete/corrupted tensors). The problem of recovering the unknown low-rank tensor
from incomplete samples
can be solved by the following convex program [21] ,
(5)
where
is the nuclear norm of
,
is the index set of known elements in the original tensor, and
is the projector onto the span of tensors. Lu et al. [23] extended the work and studied the Tensor Robust Principal Component (TRPCA) problem, as defined in the following equation:
(6)
In this work, we go one step further, and consider recovering low-rank and sparse components of tensors from incomplete and noisy observations as defined in Equation (4).
1.2. Paper Contribution
The contributions of this work are two-fold:
・ A unified convex relaxation framework is proposed for the problem of recovering low-rank and sparse components of tensors from incomplete and noisy observations. Three augmented-Lagrangian-based algorithms are developed for the optimization problem.
・ Numerical experiments on synthetical data validate the efficacy of our proposed denoising approach.
1.3. Paper Organization
The rest of the paper is organized as follows. In Section 2, some preliminaries that are useful for the subsequent analysis are provided. In Section 3, three augmented-Lagrangian-based methods are developed for the problem defined in Equation (4). In Section 4, some numerical experiments verify the justification of the model defined in Equation (4) and the efficiency of the proposed algorithms. Finally, in Section 5, we make some conclusions and discuss some topics for future work.
2. Preliminaries
In this section, we list some lemmas concerning the shrinkage operators, which will be used at each iteration of the proposed augmented Lagrangian type methods to solve the generated subproblems.
Lemma 1. For
, and
, the solution of the following problem (7) obeys
(7)
is given by
.
is a soft shrinkage operator and defined as:
(8)
Lemma 2. Consider the singular value decomposition (SVD) of a matrix
of rank
.
(9)
where
and
are orthogonal, and the singular values
are real and positive. Then, for all
, define the soft-thresholding operator
,
(10)
where
is the operator that
. Then, for each
and
, the singular value shrinkage operator (10) obeys
(11)
3. Algorithm
An alternative model to study the problem defined in Equation (4) is the following nuclear-norm- and
-norm- normalized least squares problem:
(12)
Equation (12) can be reformulated into the following favourable form:
(13)
Alternating Direction Method of Multiplier (ADMM), which is an extension of ALM algorithm, can be used to solve the tensor recovery problem defined in (13). With given
, the ADMM generate the new iterates via the following scheme:
(14)
See Algorithm 1 for the optimization details.
3.1. Stopping Criterion
It can be easily verified that the iterates generated by the proposed ADMM algorithm can be characterized by
(15)
which is equivalent to
(16)
Algorithm 1. Optimization framework for problem defined in Equation (13).
Equation (16) shows that the distance of the iterates
to the
solution
can be characterized by
and
. Thus, a straightforward stopping criterion for Algorithm 1 is:
(17)
Here
is an infinitesimal number, e.g., 10−6.
3.2. Convergence Analysis
In this subsection, we mainly analyze the convergence of ADMM for solving problem defined in Equation (13). We denote
,
, and
.
is strongly convex, while
and
are convex terms, but may not be strongly convex. The problem defined in Equation (13) can be reformulated as
(18)
Definition 1. (Convex and Strongly Convex) Let
, if the domain of
denoted by
is not empty,
is considered to be proper. If for any
and
, we always have
, then it is considered that
is convex. Furthermore,
is considered to be strongly convex with the modulus
, if
(19)
Cai et al. [24] and Li et al. [25] have proved the convergence of Extended Alternating Direction Method of Multipliers (e-ADMM) with only one strongly convex function for the case m = 3.
Assumption 1. In Equation (18),
and
are convex, and
is strongly convex with modulus
.
Assumption 2. The optimal solution set for the problem defined in Equation (18) is nonempty, i.e., there exist
such that the following requirements can be satisfied:
(20)
(21)
(22)
(23)
Theorem 1. Assume that Assumption 1 and Assumption 2 hold. Let
be the sequence generated by Algorithm 1 for solving the problem
defined in Equation (18). If
, the limit point of
is an optimal solution to Equation (18). Moreover, the objective function con- verges to the optimal value and the constraint violation converges to zero, i.e.,
(24)
and
(25)
where
denotes the optimal objective value for the problem defined in Equa-
tion (18). In our specific application,
can sufficiently ensure
the convergence [24] .
3.3. Parameter Choice
In our optimization framework given in Equation (13), there are three parameters
,
and
. As mentioned in Lu [23] ,
does not need any tuning and can be set to
, where
. Besides, the value of
is limited to the range
to ensure the convergence of our algo-
rithm (based on the analysis in Theorem 1). Thus, the value of
is important for the performance of our algorithm. For simplicity, we consider the case when
is only degraded by Gaussian noise
without sparse noise
, that is:
(26)
The solution for Equation (26) is equal to
but with singular values being shifted towards zero by soft thresholding.
should be set large enough to remove noise (i.e., to keep the variance low), and small to avoid over-shrinking of the original tensor A (i.e., to keep the bias low). For the matrix case (i.e.,
), Candes et al. [26] have deduced the proper value of
, as shown in the following theorem.
Theorem 2. Supposing that the Gaussian noise term
, and each entry
is iid normally distributed, we can have that for
,
with high probability. Then,
. That is,
.
Based on this conclusion, we derive the required conditions for convex program defined in Equation (13) to accurately recover the low-rank component
from corrupted observations. Our derivations are given in the following main result.
Main Result 1. Assume that the low-rank tensor
obeys the incoherence conditions [27] . The support set Ω of
is uniformly distributed among all sets of cardinality m. Then, there exist universal constants
such that with probability at least
,
is the unique minimizer to problem defined in Equation (13). The values of
and
can be determined
as
and
. In the same time, the rank of
and the number of non-zero entries of
should satisfy that
where
and
. ρr and ρs are positive constants.
The value of penalty parameter β should be within the range of
to
ensure the convergence.
4. Experiments on Synthetic Data
In this section, we conduct synthetic data and real data experiments to corroborate our algorithm. We investigate the ability of our proposed Robust Low Rank Tensor Approximation (denoted as RLRTA) algorithm for recovering low-rank tensors of various tubal rank from noises of various sparsity and random Gaussian noise of different intensity.
4.1. Exact Recovery from Varying Sparsity of
We first verify the correct recovery performance of our algorithm for different sparsity of
. Be similar to [23] , we consider the tensors of size
, with varying dimensions
. We generate a
tensor
, where the entries of
and
are independently sampled from a uniform distribution in interval
. The support set
, with size
of sparse component
is chosen uniformly at random. For all
, let
, where
is a tensor with independent Bernoulli
entries. For
, it can be mathematically expressed as
(27)
where
is the abbreviation of “with probability”. We test on two settings: the first scenario with setting
and
. The second scenario with setting
and
.
The Gaussian noise
in each frontal slice is generated independently with each other, i.e.
(28)
The variance values
in each frontal slice are randomly selected from 0 to 0.1. In this sub-subsection 1, our task is to recovery
from noisy observation
with
of varying sparsity. Table 1 and Table 2 show the
![]()
Table 1. Correct recovery for random problems of varying sparsity using RLRTA.
![]()
Table 2. Correct recovery for random problems of varying sparsity using TRPCA [23] .
recovery results of algorithm RLRTA and TRPCA. It’s shown that RLRTA can better recover the low-rank compnent
under different sparse component
.
4.2. Exact Recovery with
of Varying Intensity
Now we exam the recovery phenomenon with Gaussian noise of varying variances. The generation of
is the same as that in sub-subsection 1 and
. The sparse component
has sparsity
. For simplicity, we assume that
is white Gaussian noise, that is
(29)
where
. The noise variance values
are 0.02, 0.04, 0.06, 0.08 and 0.1, respectively Table 3 and Table 4 show the recovery results of algorithm RLRTA and TRPCA. It’s shown that RLRTA can better recover the low-rank compnent
under different Gaussian noise
.
4.3. Phase Transition in Rank and Sparsity
Now we exam the recovery phenomenon with varying rank of
and varying sparsity of
. Similar to [23] , we consider two sizes of
: (1)
![]()
Table 3. Correct recovery for random problems of varying intensity using RLRTA.
![]()
Table 4. Correct recovery for random problems of varying intensity using TRPCA [23] .
, (2)
. We generate
, where the entries of
and
are independently sampled from a uniform distribution in interval
. For
, we still consider a Bernoulli model for its support and random signs as in Equation (27). The variance values
in each frontal slice
are randomly selected from 0 to 0.1, and the mean variance values are both set to be 0.05.
We set
as all the choices in
, and ρs in
. For each
-pair, we simulate 10 test instances and declare a trial to be successful if the recovered
satisfies
. Figure 1 plots
(a)
(b)
Figure 1. Correct recovery for varying rank and sparsity for RLRTA and TRPCA [23] . Fraction of correct recoveries, as a function of
(
-axis) and sparsity of
(
-axis).
the fraction of correct recovery for each pair (black = 0% and white = 100%). It can be seen that there is a large region in which the recovery is correct.
4.4. Phase Transition in Rank and Entry-Wise Noise Intensity
Now we exam the recovery phenomenon with varying rank of
and varying intensity of noise
. We still consider two sizes of
: (1)
, (2)
. We generate
, where the entries of
and
independently sampled from a uniform distribution in interval
. For
, we still consider a Bernoulli model for its
(a)
(b)
Figure 2. Correct recovery for varying rank and noise variance for RLRTA and TRPCA [23] . Fraction of correct recoveries, as a function of
(
-axis) and variance of
(
-axis).
support and random signs as in Equation (27) and sparsity parameter ρs is fixed at 0.1. The generation of
is similar to Equation (29).
We set
as all the choices in
. The noise variance values
are in
. For each
-pair, we simulate 10 test instances and declare a trial to be successful if the recovered
satisfies
. Figure 2 plots the fraction of correct recovery for each pair (black = 0% and white = 100%). It can be seen that there is a large region in which the recovery is correct.
5. Conclusion
This work verifies the ability of convex optimization for the recovery of low- rank tensors corrupted by both impulse and Gaussian noise. The problem is tackled by integrating the tensor nuclear norm,
-norm and least square term in a unified convex relaxation framework. Parameters are selected to comprise the low-rank component, the sparse component and the Gaussian-noise term. Besides, the convergence of the proposed algorithm is discussed. Numerical experiments have been conducted to demonstrate the efficacy of our proposed denoising approach.
Acknowledgements
The authors would like to thank Canyi Lu for providing the code for TRPCA algorithm.