Face Recognition from Incomplete Measurements via ℓ1-Optimization

In this work, we consider a homotopic principle for solving large-scale and dense ℓ1 underdetermined problems and its applications in image processing and classification. We solve the face recognition problem where the input image contains corrupted and/or lost pixels. The approach involves two steps: first, the incomplete or corrupted image is subject to an inpainting process; second, the restored image is used to carry out the classification or recognition task. Addressing these two steps involves solving large-scale ℓ1 minimization problems. To that end, we propose to solve a sequence of linear equality constrained multiquadric problems that depend on a regularization parameter that converges to zero. The procedure generates a central path that converges to a point on the solution set of the ℓ1 underdetermined problem. In order to solve each subproblem, a conjugate gradient algorithm is formulated. When noise is present in the model, inexact directions are taken so that an approximate solution is computed faster. This prevents the ill-conditioning produced when the conjugate gradient is required to iterate until a zero residual is attained.


Introduction
Over recent years, new developments in the area of computational harmonic analysis have shown that a wide class of signals can be well represented by linear combinations of only a few elements of an appropriate basis. The benefits and applications of this advance abound and are the subject of extensive current research.
The new sampling theory of compressed sensing has unified several insights about wavelets and sparse representation, benefiting several disciplines in science, including image processing. Practical compressed sensing problems involve solving an optimization problem of the form

min ||x||_0 subject to Ax = b,    (1)

for decoding a sparse signal that has been significantly sub-sampled by a sampling matrix A, where ||x||_0 counts the number of non-zero entries of the vector x.
Solving (1) is equivalent to finding the sparsest vector x such that Ax = b. Nevertheless, finding such a vector is by nature a combinatorial and generally NP-hard problem [1]. Efficient numerical algorithms to recover signals under this framework have been developed [2-4], and extensions of this theory have been explored for solving general problems in different areas, including statistics, signal processing, and geophysics. Significant progress toward understanding this problem has been made in recent years [5-7], and its study has become state-of-the-art interdisciplinary research.
One of the most important characteristics of problem (1) is that, under some mild conditions, the input vector x* can be recovered by solving the ℓ1-norm underdetermined problem

min ||x||_1 subject to Ax = b.    (2)

This decoding model in compressed sensing is known as the basis-pursuit problem, first investigated by Chen, Donoho and Saunders [8] and theoretically studied by Donoho and Huo [9]. Candès and Tao [5] proved this equivalence formally, provided that x* is sufficiently sparse and that A possesses certain properties.
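As a concrete illustration of the basis-pursuit model (2), the problem can be recast as a linear program and handed to a generic LP solver. The sketch below is not the algorithm proposed in this paper; the problem sizes, the seed, and the use of SciPy's linprog are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
m, n, k = 30, 60, 3                     # hypothetical sizes: measurements, length, sparsity
A = rng.standard_normal((m, n))
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.choice([-1.0, 1.0], k)
b = A @ x_true

# LP reformulation of min ||x||_1 s.t. Ax = b, with variables z = [x, t]:
#   min sum(t)  s.t.  x - t <= 0,  -x - t <= 0,  Ax = b
c = np.concatenate([np.zeros(n), np.ones(n)])
A_ub = np.block([[np.eye(n), -np.eye(n)], [-np.eye(n), -np.eye(n)]])
res = linprog(c, A_ub=A_ub, b_ub=np.zeros(2 * n),
              A_eq=np.hstack([A, np.zeros((m, n))]), b_eq=b,
              bounds=[(None, None)] * n + [(0, None)] * n)
x_hat = res.x[:n]
print(np.linalg.norm(x_hat - x_true))   # recovery error
```

With a signal this sparse relative to the number of measurements, the ℓ1 minimizer coincides with the sparsest solution, which is the equivalence discussed above.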
The compressed sensing theory extends to a more general problem that allows noise on the measurements and an almost-sparse input vector. That is,

min ||x||_1 subject to ||Ax − b||_2 ≤ ε,    (3)

where the energy of the noise vector is bounded above by ε.
Problem (3) can also be reformulated as an unconstrained regularized linear least-squares problem. One strategy consists of replacing the Tikhonov regularization with one that uses the ℓ1-norm [2,4]. Another equivalent formulation minimizes the ℓ1-norm regularized by a linear least-squares term [3], known as unconstrained basis pursuit denoising. All these approaches have proved successful for solving compressed sensing problems.


We reformulate problem (3) as

min ||x||_1 subject to Ax = b,    (4)

where the noise vector η ∈ R^m is bounded.


In this work we propose a new strategy to obtain an optimal solution to problem (4), and present an application in robust face recognition to demonstrate the effectiveness of our algorithm. The idea consists of relaxing the nondifferentiable objective function by a sequence of multiquadric, continuously differentiable, strictly convex functions that depend on a positive regularization parameter μ. More precisely, we solve

min Σ_i √(x_i² + μ²) subject to Ax = b.    (5)
The main accomplishment of this idea is that it leads to the generation of a path that converges to an optimal solution of problem (4). This leads us to a path-following algorithm similar to the ones used in primal-dual interior-point methods. The path-following strategy that we propose uses inexact fixed-point directions to obtain approximate solutions to problems of the form (5). Such inexact directions are computed via a conjugate gradient algorithm. In order to prevent the procedure from becoming costly, a proximity measure to the central path is introduced for each regularization parameter. The regularization parameter is updated dynamically and converges to zero, as in interior-point methods.

Problem Formulation
We study the underdetermined problem (4), where b ∈ R^m and A ∈ R^{m×n} is a full-rank matrix with m < n. The Lagrangian function associated with problem (4) is L(x, y) = ||x||_1 − y^T(Ax − b), where y ∈ R^m is the Lagrange multiplier for the equality constraint. Notice that the main role in the characterization of the optimality conditions for problem (4) is played not by the Lagrange multiplier y, but by A^T y. Using this fact, the complementarity conditions that a pair (x, y) must satisfy to be an optimal solution of (4) are

Ax − b = 0,   1 − |A^T y|_i ≥ 0,   and   x_i (1 − |A^T y|_i) = 0 for each i.

A Regularization Path
The nondifferentiability of (4) is overcome by regularizing the ℓ1-norm with a sum of continuously differentiable functions in the following way: for μ > 0,

||x||_1 ≈ Σ_i g(x_i, μ),

where g is the scalar multiquadric function defined by g(t, μ) = √(t² + μ²).
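A minimal numerical sketch of this smoothing, assuming the multiquadric form g(t, μ) = √(t² + μ²) (the exact formula is garbled in this copy of the paper): the smoothed value upper-bounds the ℓ1-norm, and the gap closes as μ → 0.

```python
import numpy as np

def f_mu(x, mu):
    # Smoothed objective: sum_i sqrt(x_i^2 + mu^2) >= ||x||_1
    return np.sum(np.sqrt(x**2 + mu**2))

x = np.array([1.5, -0.3, 0.0, 2.0])
l1 = np.sum(np.abs(x))
for mu in (1.0, 0.1, 0.01, 0.001):
    print(mu, f_mu(x, mu) - l1)         # gap shrinks monotonically as mu -> 0
```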

Optimality Conditions
We propose to obtain an optimal solution to problem (4) by solving a sequence of subproblems of the form (5) as μ → 0.

 
Since each subproblem (5) is strictly convex, its optimal solution is obtained by solving the associated KKT conditions. The Lagrangian function associated with (5) is L_μ(x, y) = Σ_i √(x_i² + μ²) − y^T(Ax − b), where y ∈ R^m is the Lagrange multiplier for the equality constraint. Therefore, the KKT conditions are given by the following square system of nonlinear equations:

x_i / √(x_i² + μ²) − (A^T y)_i = 0  for i = 1, 2, …, n,
Ax − b = 0,    (7)

with μ > 0.

A Fixed Point Problem for the KKT Conditions
We propose to solve (7) using a fixed-point method. To that effect, we rewrite these nonlinear equations as the augmented system

[ D(x̄)  −A^T ] [ x ]   [ 0 ]
[   A      0  ] [ y ] = [ b ],    (8)

where D(x̄) = diag(1/√(x̄_1² + μ²), …, 1/√(x̄_n² + μ²)). The matrix associated with (8) is nonsingular since A has full rank and D(x̄) is positive definite. In this manner, the nonlinear Equation (7) is posed as a fixed-point problem. In order to solve (8) for a fixed μ > 0, we proceed by taking an initial point x^0 and iteratively computing a new iterate for k = 0, 1, …, until the difference between two consecutive iterates is less than some stopping tolerance.
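The fixed-point iteration can be sketched as follows; the problem sizes, seed, and iteration count are hypothetical, and the frozen linear system is solved here with a dense factorization rather than the paper's conjugate gradient.

```python
import numpy as np

def fixed_point_step(A, b, x, mu):
    """One fixed-point step for the smoothed subproblem (5): freeze the
    diagonal D(x) at the current iterate and solve system (8) via its
    weighted normal equations (a sketch; sizes are hypothetical)."""
    d = np.sqrt(x**2 + mu**2)                        # entries of D(x)^{-1}
    y = np.linalg.solve(A @ (A.T * d[:, None]), b)   # A D^{-1} A^T y = b
    return d * (A.T @ y)                             # x = D^{-1} A^T y

rng = np.random.default_rng(1)
m, n, mu = 15, 40, 1e-3
A = rng.standard_normal((m, n))
x_true = np.zeros(n)
x_true[rng.choice(n, 3, replace=False)] = 1.0
b = A @ x_true

x = np.zeros(n)
for _ in range(100):                      # iterate to (near) convergence
    x = fixed_point_step(A, b, x, mu)
print(np.linalg.norm(A @ x - b))          # each iterate is feasible by construction
```

Note that every iterate satisfies Ax = b exactly (up to solver precision), since x is recovered from the first block after the second block is enforced.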

Inexact Directions for the Augmented System
For a current point x̄, the augmented system (9)

[ D(x̄)  −A^T ] [ x ]   [ 0 ]
[   A      0  ] [ y ] = [ b ]

is reduced to a weighted normal equation. The first block of equations gives x = D(x̄)^{-1} A^T y, and substituting into the second block yields

A D(x̄)^{-1} A^T y = b.    (10)

With this reduction, we move from an indefinite system of order n + m to a positive definite system of order m. Moreover, the conjugate gradient algorithm applied to (10) converges in at most m iterations in exact arithmetic.

Taking into account that the values of A^T y characterize the optimality set X*, we formulate a conjugate gradient algorithm that finds an approximation of A^T y rather than y; see Figure 1.
At each iteration, the CG algorithm satisfies the first block of equations of (9); therefore, controlling the stopping criterion for solving the augmented system (9) is equivalent to controlling the stopping criterion for the linear system Ax − b = 0. We take the residual Ax − b as the stopping measure for the augmented system, and the conjugate gradient algorithm stops when ||Ax − b|| falls below a tolerance determined by the noise level. This implies that the stopping criterion does not need to be driven to zero, overcoming the ill-conditioning of the weighted normal equation close to the solution.

Central Path
We define the central path by C = {(x_μ, y_μ) : μ > 0}. The set C consists of all the points that are solutions of the subproblem (5) for μ > 0.
This set defines a smooth curve, called the central path, that converges to an optimal solution of problem (4) as the regularization parameter μ tends to zero [10].

Proximity Measures
Our path-following method follows the direction of C generated by a decreasing sequence of regularization parameters μ_k. Since moving along C to obtain an optimal solution for (4) could be computationally expensive, we restrict the iterates to suitable neighborhoods N(μ, γ) of the central path.

Updating the Perturbation Parameter
Since μ → 0 is a necessary condition for obtaining an optimal solution of problem (4), following the same idea as primal-dual methods we define the regularization parameter dynamically and repeat the fixed-point procedure. An optimal solution for (4) is found as μ approaches zero. For the primal variables x we define corresponding complementarity variables z such that x_i z_i = 0 at the optimal solution of problem (4). This allows us to define the regularization parameter μ in the same manner as in interior-point methods.

Path Following Algorithm
In this section we present our "Path-Following Signal Recovery" (PFSR) algorithm. The pseudo-code of the algorithm is presented in Figure 2. The PFSR algorithm generates two sequences of iterates. The first sequence (inner loop) generates a series of iterates for obtaining an approximate solution of subproblem (5) for a fixed regularization parameter μ > 0. The second sequence (outer loop) generates a series of approximate solutions of the subproblem (5) that converges to an optimal solution of problem (4) for a sequence of decreasing regularization parameters μ_k → 0.
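The two-loop structure can be sketched as follows. This is not the PFSR pseudo-code of Figure 2: the inner loop here runs a fixed number of fixed-point steps instead of testing proximity to the central path, and the geometric update μ ← 0.9 μ, the sizes, and the seed are illustrative assumptions.

```python
import numpy as np

def pfsr_sketch(A, b, mu0=0.008, factor=0.9, outer=60, inner=5):
    # Outer loop: shrink mu toward zero (path following).
    # Inner loop: a few fixed-point steps on the smoothed subproblem (5).
    x = np.zeros(A.shape[1])
    mu = mu0
    for _ in range(outer):
        for _ in range(inner):
            d = np.sqrt(x**2 + mu**2)          # D(x)^{-1} diagonal
            y = np.linalg.solve(A @ (A.T * d[:, None]), b)
            x = d * (A.T @ y)
        mu *= factor
    return x

rng = np.random.default_rng(2)
m, n = 30, 100
A = rng.standard_normal((m, n))
x_true = np.zeros(n)
x_true[rng.choice(n, 4, replace=False)] = rng.choice([-1.0, 1.0], 4)
b = A @ x_true

x = pfsr_sketch(A, b)
print(np.linalg.norm(x - x_true, np.inf))      # recovery error
```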

Sparse Signal Recovery
In this section, we present a set of experimental results that illustrate the performance of the MATLAB implementation of the proposed algorithm.
In the implementation of Algorithm 1, the initial points and parameters are chosen as follows. In Step 1a, the initial points for A^T y and x are the n-dimensional zero vector. In Step 2, we fix the maximum number of CG iterations by cg_maxiter = 10.
In the implementation of Algorithm 2 the parameters are chosen as follows.The initial regularization parameter  is given by

Sparse Signal Recovery Example
This experimentation has the objective of investigating the capability of our PFSR algorithm for recovering sparse signals. The goal in this test is to reconstruct a length-n sparse signal from m observations, where m < n. We start with a classical example also considered in [2,4,11]. The problem consists of recovering a signal x ∈ R^4096 with 160 spikes of amplitude ±1 from m = 1024 noisy measurements. We use a partial DCT matrix A whose m rows are chosen randomly from the n × n discrete cosine transform, without having access to it in explicit form, but using A as a linear operator on R^n; the same holds for the matrix A^T. Partial DCT matrices are fast transforms for which matrix-vector multiplications cost just O(n log n) flops, and no storage is required. This case is common in compressed sensing [3,4].
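A sketch of such a matrix-free partial DCT operator, using SciPy's orthonormal DCT (the row subset and seed are illustrative): since the full DCT is orthonormal, the adjoint is the inverse transform applied to a zero-filled coefficient vector.

```python
import numpy as np
from scipy.fft import dct, idct

rng = np.random.default_rng(3)
n, m = 4096, 1024
rows = rng.choice(n, m, replace=False)    # random subset of DCT rows

def A_op(x):
    # A x: full orthonormal DCT (O(n log n)), then keep m coefficients
    return dct(x, norm='ortho')[rows]

def At_op(y):
    # A^T y: zero-fill the unsampled coefficients, apply the inverse DCT
    z = np.zeros(n)
    z[rows] = y
    return idct(z, norm='ortho')

x = rng.standard_normal(n)
y = rng.standard_normal(m)
print(np.allclose(A_op(x) @ y, x @ At_op(y)))   # adjoint check: True
```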

The Effect of Noise
For the same test problem described above, we run the algorithm considering the noiseless case, and zero-mean Gaussian noise with standard deviations varying from 0.01 to 0.04. The stopping criterion for the conjugate gradient residual is determined by the noise level, as explained in Section 3.3. In all cases we successfully recover the original signal after a process of thresholding. Table 1 reports the results of this experimentation, showing that the algorithm is also effective for solving noiseless sparse recovery problems. Moreover, successful recovery was obtained with noise levels up to 0.04.

Sparse Representation in Classification and Image Processing
There exist a number of areas where the sparse representation model (4) emerges naturally. In classification problems, for instance, an input sample can usually be explained from a few other, previously trained samples. That is, an incoming sample b can be expressed as a linear combination of only a few columns of a matrix A whose columns are previously trained samples.
On the other hand, natural images can be efficiently encoded in an appropriate basis that exploits the high correlation present between pixels. Among the many emerging classes of transformations, the Discrete Cosine Transform (DCT) and Discrete Wavelet Transform (DWT) are standard bases in which natural images can be sparsely represented. This property has been extensively studied in recent years, leading to the construction of new models for images where sparsity is incorporated as a regularizer.

Classification
In pattern recognition and machine learning, a classification problem is formulated as follows: for a testing sample b, a classifier is a mapping f from the sample space to the set of class labels.
We show that a valid test sample can indeed be represented using only the training samples from its own class, therefore inducing a naturally sparse representation. Let us rearrange the given training samples from the same i-th class as the columns of a submatrix A_i; in other words, we group all of the samples with the same label into a matrix A_i.

Discriminant Functions and Classifier
One of the advantages of this formulation is that the lack of robustness with respect to outliers can be overcome. Furthermore, we do not need to worry about model selection as in support vector machine approaches to classification [13]. Once the sparse representation vector x is computed, we identify the class to which the testing sample b belongs. The approach consists in associating the nonzero entries of x with the columns of A corresponding to the training samples having the same class as the testing sample b. The solution vector x is decomposed as a sum of vectors x̂_k, where x̂_k is obtained by keeping only those entries in x associated with class k and assigning zeros to all the other entries. Then, we define the N discriminant functions

g_k(b) = ||b − A x̂_k||_2,  k = 1, …, N.

Thus, g_k represents the approximation error when b is assigned to category k. Finally, we assign b to the class with the smallest approximation error. In this manner, we identify the class of the test sample b based on how effectively the coefficients associated with the training samples of each class recreate b.
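The discriminant rule can be sketched as follows, on a tiny hypothetical dictionary; classify implements g_k(b) = ||b − A x̂_k||_2 and picks the class with the smallest error.

```python
import numpy as np

def classify(A, labels, x, b):
    # g_k(b) = || b - A x_hat_k ||_2, keeping only class-k coefficients;
    # assign b to the class with the smallest approximation error.
    errors = {}
    for k in np.unique(labels):
        x_hat_k = np.where(labels == k, x, 0.0)
        errors[k] = np.linalg.norm(b - A @ x_hat_k)
    return min(errors, key=errors.get)

rng = np.random.default_rng(4)
A = rng.standard_normal((8, 6))               # 6 training samples of dimension 8
labels = np.array([0, 0, 0, 1, 1, 1])         # two classes, three samples each
x = np.array([0.0, 0.9, 0.1, 0.0, 0.0, 0.0])  # sparse coefficients on class 0
b = A @ x                                     # test sample built from class 0
print(classify(A, labels, x, b))              # class 0 wins
```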

Cross Validation
A classifier's performance is commonly measured by its error rate on the entire population. Cross validation is a statistical method for evaluating performance in which the data is divided into two sets: one used for the training stage, and the second used for testing (validation).
Both training and testing sets should cross over in consecutive rounds in such a way that each sample in the data set has a chance of being validated. In the case of K-fold cross validation, a K-fold partition of the dataset is created by splitting the data into K equally (or nearly equally) sized subsets (folds); then, for each of the K experiments, K − 1 folds are used for training and the remaining one for testing. A common choice for K-fold cross validation is K = 10. The work in [12] compares several approaches for estimating accuracy, and recommends stratified 10-fold cross validation as the best model selection method because it provides a less biased estimate of the actual accuracy. In our numerical experimentation we follow this validation approach to test the performance of the classification algorithm.
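A minimal sketch of building the K-fold splits with NumPy (the dataset size 400 matches the AT&T database used later; the seed is arbitrary):

```python
import numpy as np

def kfold_indices(n_samples, k=10, seed=0):
    # Shuffle, split into k nearly equal folds; each fold serves once as
    # the test set while the remaining k-1 folds form the training set.
    idx = np.random.default_rng(seed).permutation(n_samples)
    folds = np.array_split(idx, k)
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, test

covered = np.concatenate([test for _, test in kfold_indices(400, k=10)])
print(np.array_equal(np.sort(covered), np.arange(400)))   # each sample tested once
```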

Image Processing
In many practical image processing applications, we are interested in recovering a target image u from incomplete data, where Ψ is a sparsifying matrix for u and x is a sparse vector of coefficients; that is, u = Ψx. When H models a process of missing information, it is called a mask, and is constructed by removing from the n × n identity matrix the n − m rows associated with the missing data. In this setting, we consider that n − m pixels have been lost, leading to an incomplete image. In this section, we demonstrate the effectiveness of the proposed algorithm by showing a real application in robust face recognition. The challenge consists in automatically identifying an input human face image within an existing database of several individuals [14]. Furthermore, we assume that the input image has been subject to a data loss process, or that several of its pixels have been severely corrupted. We consider the database of human faces from the AT&T Laboratories in Cambridge (Cambridge University Computer Laboratory), which consists of 400 images of 40 individuals, with ten different images per individual. For some individuals, the images were taken at different times and with varying lighting and facial expressions. The size of each image is 112 × 92 pixels with 256 gray levels, in pgm format.
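A sketch of the mask as a matrix-free operator, assuming a hypothetical random loss of 20% of the pixels; rather than forming H explicitly, we keep the index set of the surviving rows of the identity.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 112 * 92                              # vectorized AT&T image size
missing = rng.random(n) < 0.20            # hypothetical 20% pixel loss
keep = np.flatnonzero(~missing)           # surviving rows of the identity

def H(u):
    # mask: keep only the observed pixels
    return u[keep]

def Ht(v):
    # adjoint of the mask: zero-fill the missing positions
    u = np.zeros(n)
    u[keep] = v
    return u

u = rng.standard_normal(n)                # stand-in for a vectorized image
b = H(u)                                  # incomplete image
print(b.shape[0] / n)                     # observed fraction, about 0.80
```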
The proposed approach involves two steps. First, the test image containing several corrupted or lost pixels is reconstructed via inpainting. Figure 4 shows two corrupted test images with a total of 34.12% (top) and 15.18% (bottom) of lost pixels, respectively; in our numerical experiments we consider a percentage of lost data ranging from 5% to 40%. The average recognition rate obtained after the cross validation experiments was 97.25%, and Table 2 reports each of these rates per fold. In Figure 5 one can notice that the solution vector x for problem (17) is sparse, and its nonzero components mark those training samples u_i in the dictionary A that are from the same class as the test sample b, that is, the other pictures of the individual in b within the training dataset.

Concluding Remarks
In this work we present a novel methodology for solving large-scale and dense ℓ1 underdetermined problems and for recovering large signals with noisy data. The sparse representation approach is applied to a face recognition problem where the data is incomplete and/or corrupted. An inpainting procedure is carried out for reconstructing the input image, and then a classification process is performed in order to identify the correct individual. Both processes are accomplished with the proposed algorithm for ℓ1 minimization problems, achieving a high recognition rate.
The methodology consists in following the central path C to obtain an optimal solution of the convex and nondifferentiable problem (4). To prevent the algorithm from becoming computationally expensive, a series of neighborhoods N(μ_k, γ) around the central path are defined and used as proximity measures of the path for a decreasing sequence of regularization parameters μ_k. To obtain an approximate solution on the path for a given μ > 0, an inexact fixed-point procedure is applied to (8) until a point (x, y) enters the neighborhood. If an optimal solution to (4) is not found, we decrease μ, specify a new neighborhood, and repeat the fixed-point procedure with the new value of the regularization parameter, smaller than the previous one.

Figure 2. PFSR algorithm.
In Step 1, the initial point x is the minimum energy reconstruction. Our numerical experimentation suggests μ = 0.008 as a good choice for the initial regularization parameter. In Step 3, we set the maximum number of subproblems of the form (5) to solve via the parameter maxiter. The new regularization parameter μ is updated in Step 10 by taking the minimum of a multiple of the current value and the complementarity gap, with factor 0.9.

Any test sample b from the i-th class can be represented as a linear combination of the training samples associated with class i. To make use of the whole training dataset, we define a d × n matrix A by concatenating all of the n training samples of the N different classes, that is, A = [A_1, A_2, …, A_N]. A test sample b is then expressed by a sparse linear combination of the training samples (16), more specifically, as a linear combination of only those belonging to its own class. This motivates us to formulate the following ℓ1 minimization problem.
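The concatenated dictionary can be sketched as follows; the dimensions and number of classes are hypothetical, and the labels array keeps track of which class each column came from.

```python
import numpy as np

rng = np.random.default_rng(6)
d, per_class, N = 50, 5, 4                     # hypothetical dimensions
A_i = [rng.standard_normal((d, per_class)) for _ in range(N)]
A = np.hstack(A_i)                             # A = [A_1, A_2, ..., A_N]
labels = np.repeat(np.arange(N), per_class)    # class label of each column
print(A.shape, labels)
```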

Figure 4 depicts the recognition process where the test image is corrupted by two different kinds of masks. In our numerical experiments, we perform a cross validation scheme in order to assess a recognition rate, for a total of ten 10-fold cross validation runs, each of which involves 40 test problems. None of these experiments included the test image in the training dataset. We consider three different types of corruptive masks, including 1) random missing pixels uniformly distributed over the test image, and 2) corruptive horizontal lines.

Acknowledgements
The authors want to thank the financial support provided by the US Army Research Laboratory.

Table 1. Performance considering different setups for noise.
To recover the target image, we solve problem (21), where b is the corrupted test image, Ψ is the Daubechies level-7 wavelet matrix, and H is the matrix (mask) associated with the missing pixels. Second, we expose the reconstructed test image to a training dataset in order to find the sparsest linear combination of training samples that best represents the test image via an ℓ1 minimization problem.