As a dimension reduction method, two-dimensional principal component analysis (2DPCA) works well in face recognition, but it is susceptible to outliers. This paper therefore proposes a new 2DPCA algorithm based on angle-2DPCA. To reduce the reconstruction error and maximize the variance simultaneously, we choose the F-norm as the distance measure and propose the Fp-2DPCA algorithm. Because an image has two dimensions, we also propose a bilateral version of Fp-2DPCA. Experiments show that, compared with other algorithms, Fp-2DPCA achieves better dimensionality reduction and is more robust to outliers.

Principal component analysis (PCA) [

2DPCA reduces dimensionality only along the row direction of the image, so its compression ratio is low. To solve this problem, several methods based on bilateral compression have been proposed. Kong et al. [

PCA and 2DPCA are both based on the sum of squared F-norms, which is equivalent to the least squares loss or the squared L2 norm. It is well known that the least squares loss is not robust, because outlying data points can easily pull the solution away from the expected one. Compared with the squared F-norm, the L1 norm is more robust to outliers. Kwak [

In recent years, to overcome this defect, Gao et al. [

The rest of this paper is organized as follows. The second section introduces the theory of 2DPCA, including 2DPCA and 2DPCA-L1. The third section proposes the Fp-2DPCA algorithm and the bilateral Fp-2DPCA algorithm and proves the algorithm’s convergence. We also discuss the rotation invariance of the algorithm. In Section 4, we perform numerical experiments to evaluate the performance of our algorithm. Finally, the conclusion of this paper is given in the fifth section.

2DPCA extends PCA to two-dimensional data. Its basic idea is the same as that of PCA: maximize the total variance of the original data projected onto the principal components, so as to retain as much of the information in the original data as possible [

$$\max_{W^T W = I_k} \operatorname{tr}\left(\sum_{i=1}^{M} W^T A_i^T A_i W\right) = \max_{W^T W = I_k} \sum_{i=1}^{M} \|A_i W\|_F^2 \tag{1}$$

Here $\operatorname{tr}(\cdot)$ is the trace of a matrix: for an $n \times n$ matrix $A$, the trace is the sum of the main diagonal elements of $A$, which equals the sum of its eigenvalues. Since the F-norm is used in Equation (1), the following equation is satisfied:

$$\sum_{i=1}^{M} \|E_i\|_F^2 + \sum_{i=1}^{M} \|A_i W\|_F^2 = \sum_{i=1}^{M} \|A_i\|_F^2 \tag{2}$$

So Equation (1) is equivalent to:

$$\min_{W^T W = I_k} \sum_{i=1}^{M} \|E_i\|_F^2 \tag{3}$$

where $I_k \in \mathbb{R}^{k \times k}$ is the $k$-dimensional identity matrix, $\|\cdot\|_F$ is the F-norm of a matrix, and $E_i = A_i - A_i W W^T$. Objective functions (1) and (3) show that 2DPCA mainly considers the reconstruction error or, equivalently, the variance contribution of the image data.
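The decomposition in Equation (2) can be checked numerically. The following is a minimal sketch under stated assumptions: the sample `A` and the projection `W` are random stand-ins (not data from the paper), and `W` is made orthonormal with a QR factorization.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 6, 5, 2

# Hypothetical sample: one m x n "image" and an orthonormal n x k projection W.
A = rng.standard_normal((m, n))
W, _ = np.linalg.qr(rng.standard_normal((n, k)))  # W.T @ W = I_k

E = A - A @ W @ W.T  # reconstruction error E_i

# Left and right sides of Equation (2) for a single sample.
lhs = np.linalg.norm(E, "fro") ** 2 + np.linalg.norm(A @ W, "fro") ** 2
rhs = np.linalg.norm(A, "fro") ** 2
print(np.isclose(lhs, rhs))
```

Because the identity holds for every sample, minimizing the total squared reconstruction error and maximizing the total squared projection norm select the same `W`.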

As shown in objective function (1), the squared F-norm is used as the distance measure. However, the squared F-norm is not robust, because outlying observations can easily pull the solution away from the expected one. 2DPCA based on the $L_1$ norm was introduced to address this problem and reduces this influence to a certain extent.

The objective function of 2DPCA-L1 is as follows:

$$\max_{W^T W = I} \sum_{i=1}^{M} \|A_i W\|_{L_1}^{2} \tag{4}$$

where $\|\cdot\|_{L_1}$ is the $L_1$ norm of a matrix. Compared with traditional 2DPCA, 2DPCA-L1 is more robust to data with outliers, but it also has some defects. First, it is not rotation invariant, and the decomposition in Equation (2) no longer holds:

$$\sum_{i=1}^{M} \|E_i\|_{L_1} + \sum_{i=1}^{M} \|A_i W\|_{L_1} \neq \sum_{i=1}^{M} \|A_i\|_{L_1} \tag{5}$$

Consequently, a solution of Equation (4) is not necessarily a solution of Equation (6) below; that is, 2DPCA-L1 does not consider the reconstruction error. As a result, its robustness is not greatly improved, and it still treats every image as contributing equally. Outliers or noise give the samples a sparse distribution, which also affects the robustness of the model. Second, the objective function in Equation (6) is very difficult to solve. To address these problems, a new robust 2DPCA model is proposed in Section 3.

$$\min_{W^T W = I} \sum_{i=1}^{M} \|E_i\|_{L_1}^{2} \tag{6}$$
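A small hand-checkable example illustrates Equation (5). This is only a sketch: the $1 \times 2$ sample and the projection are hypothetical, and the matrix $L_1$ norm is assumed to be the sum of absolute entries.

```python
import numpy as np

A = np.array([[1.0, 1.0]])                 # a single 1 x 2 sample
W = np.array([[1.0], [1.0]]) / np.sqrt(2)  # orthonormal 2 x 1 projection
E = A - A @ W @ W.T                        # reconstruction error (zero here)

def l1(X):
    """Entrywise L1 norm (assumed definition): sum of absolute values."""
    return np.abs(X).sum()

# Gap in the L1 version of the decomposition: ||E|| + ||AW|| - ||A||.
l1_gap = l1(E) + l1(A @ W) - l1(A)

# Gap in the squared F-norm version, Equation (2).
fro_gap = (np.linalg.norm(E) ** 2 + np.linalg.norm(A @ W) ** 2
           - np.linalg.norm(A) ** 2)

print(l1_gap, fro_gap)
```

Here the projection captures the sample exactly, yet the $L_1$ terms sum to $\sqrt{2}$ rather than 2, so maximizing the projected $L_1$ norm and minimizing the $L_1$ reconstruction error are different problems.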

The above analysis shows that the squared F-norm exaggerates the role of some data (mainly noise) in solving the 2DPCA model, which reduces the robustness of 2DPCA to noise. To overcome the limitations of the above methods, we should adopt a distance measure that reduces the influence of outliers in the objective function while still characterizing the geometric structure of the data. The F-norm and the squared F-norm play the same role in characterizing data dispersion and geometric structure. The main difference is that, compared with the squared F-norm, the F-norm makes the differences in influence among data points smaller. Therefore, if the F-norm is selected as the distance measure in 2DPCA, it has the following three advantages:

1) It can capture the geometry structure well and has rotation invariance.

2) It can reduce the role of outliers in solving the optimal projection direction.

3) It helps to enhance the part of some adjacent data points with different labels.

Since the relationship between variance and reconstruction error is nonlinear, maximizing the variance does not guarantee the minimum reconstruction error. Based on the above analysis, we propose a new dimension reduction method, namely Fp-2DPCA. Fp-2DPCA uses the F-norm to measure both the low-dimensional representation and the reconstruction error and integrates them into one criterion function. Specifically, our goal is to find the projection matrix that minimizes, over all samples, the ratio of the reconstruction error to the norm of the projected data. The objective function of Fp-2DPCA is as follows:

$$\min_{W^T W = I_k} \sum_{i=1}^{M} \frac{\|E_i\|_F^p}{\|A_i W\|_F^p} \quad (0 < p < 2) \tag{7}$$

Through simple algebraic operations, we obtain:

$$\begin{aligned}
\sum_{i=1}^{M} \frac{\|E_i\|_F^p}{\|A_i W\|_F^p}
&= \sum_{i=1}^{M} \|E_i\|_F^2 \, \frac{\|E_i\|_F^{p-2}}{\|A_i W\|_F^p}
= \sum_{i=1}^{M} \|E_i\|_F^2 \, d_i
= \sum_{i=1}^{M} \|A_i - A_i W W^T\|_F^2 \, d_i \\
&= \sum_{i=1}^{M} \left( \operatorname{tr}(A_i^T A_i) - \operatorname{tr}(W^T A_i^T A_i W) \right) d_i
= \operatorname{tr}(G) - \operatorname{tr}(W^T G W)
\end{aligned} \tag{8}$$

where

$$G = \sum_{i=1}^{M} d_i A_i^T A_i, \qquad d_i = \frac{\|E_i\|_F^{p-2}}{\|A_i W\|_F^p} = \frac{\|A_i - A_i W W^T\|_F^{p-2}}{\|A_i W\|_F^p} \tag{9}$$

According to Equation (8) and Equation (9), the objective function in Equation (7) becomes:

$$\min_{W^T W = I_k} \operatorname{tr}(G) - \operatorname{tr}(W^T G W) \tag{10}$$
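The rewriting of the ratio objective as a weighted trace can be verified numerically. The sketch below assumes the weighted matrix takes the form $G = \sum_i d_i A_i^T A_i$, which is the form that makes $\operatorname{tr}(W^T G W)$ well defined; the function names and the random data are illustrative only.

```python
import numpy as np

def fp_objective(As, W, p):
    """Sum over samples of ||E_i||_F^p / ||A_i W||_F^p."""
    return sum(
        np.linalg.norm(A - A @ W @ W.T) ** p / np.linalg.norm(A @ W) ** p
        for A in As
    )

def fp_weights_and_G(As, W, p):
    """Per-sample weights d_i and weighted matrix G = sum_i d_i A_i^T A_i."""
    n = W.shape[0]
    ds, G = [], np.zeros((n, n))
    for A in As:
        Z = A @ W              # projected sample
        E = A - Z @ W.T        # reconstruction error
        d = np.linalg.norm(E) ** (p - 2) / np.linalg.norm(Z) ** p
        ds.append(d)
        G += d * (A.T @ A)
    return np.array(ds), G

rng = np.random.default_rng(0)
As = [rng.standard_normal((6, 5)) for _ in range(4)]   # hypothetical samples
W, _ = np.linalg.qr(rng.standard_normal((5, 2)))       # orthonormal 5 x 2
p = 1.0

ds, G = fp_weights_and_G(As, W, p)
direct = fp_objective(As, W, p)
via_trace = np.trace(G) - np.trace(W.T @ G @ W)
print(np.isclose(direct, via_trace))
```

Note that the equality holds only when the weights are evaluated at the same `W` that appears in the trace, which is why the weights and `W` are updated alternately.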

Several theorems are introduced before solving the objective function Equation (10).

Lemma 1 (Cauchy-Schwarz inequality) For all sequences of real numbers $a_i$ and $b_i$, we have

$$\left( \sum_{i=1}^{n} a_i^2 \right) \left( \sum_{i=1}^{n} b_i^2 \right) \geq \left( \sum_{i=1}^{n} a_i b_i \right)^2 \tag{11}$$

Equality holds if and only if $a_i = k b_i$ for a non-zero constant $k \in \mathbb{R}$.

Theorem 1 For any two matrices $P$ and $Q$ of the same size, we have:

$$\operatorname{tr}(P^T Q) \leq \|P\|_F \|Q\|_F \tag{12}$$

Equality holds if and only if $P = lQ$ for some real number $l \geq 0$.

Proof of Theorem 1:

According to the definition of the matrix trace, we have

$$\operatorname{tr}(P^T Q) = (\operatorname{vec}(P))^T \operatorname{vec}(Q) \tag{13}$$

According to the Cauchy-Schwarz inequality (Equation (11)), we have

$$(\operatorname{vec}(P))^T \operatorname{vec}(Q) \leq \|\operatorname{vec}(P)\|_2 \|\operatorname{vec}(Q)\|_2 = \|P\|_F \|Q\|_F \tag{14}$$

Here $\operatorname{vec}(\cdot)$ is the vectorization of a matrix: for $A_{m \times n} = (a_1, a_2, \cdots, a_n)$, it is the $mn \times 1$ vector in Equation (15) obtained by stacking the columns of $A$.

$$\operatorname{vec}(A) = \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix} \tag{15}$$

Therefore, according to Equation (13) and Equation (14), we obtain:

$$\operatorname{tr}(P^T Q) \leq \|P\|_F \|Q\|_F \tag{16}$$

Equality holds if and only if $P = lQ$ for some real number $l \geq 0$.
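The two steps of this proof, the identity between the trace and the inner product of vectorized matrices and the resulting norm bound, can be checked on random matrices. This is a sketch with hypothetical inputs; column stacking is done with NumPy's Fortran-order flatten.

```python
import numpy as np

def vec(X):
    """Stack the columns of X into one vector (column-major order)."""
    return X.flatten(order="F")

rng = np.random.default_rng(1)
P = rng.standard_normal((4, 3))
Q = rng.standard_normal((4, 3))

trace_form = np.trace(P.T @ Q)      # tr(P^T Q)
vec_form = vec(P) @ vec(Q)          # vec(P)^T vec(Q)
bound = np.linalg.norm(P) * np.linalg.norm(Q)

# Equality case: P = l Q with a nonnegative scalar l (l = 2.5 is arbitrary).
l = 2.5
equality_gap = (np.linalg.norm(l * Q) * np.linalg.norm(Q)
                - np.trace((l * Q).T @ Q))

print(np.isclose(trace_form, vec_form), trace_form <= bound)
```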

Theorem 2 Let the SVD of $H \in \mathbb{R}^{m \times n}$ be $H = U \Sigma V^T$, where $U^T U = V^T V = I_k$, $\Sigma \in \mathbb{R}^{k \times k}$ is a nonsingular diagonal matrix whose diagonal elements $\lambda_j$ $(j = 1, \cdots, k)$ are the singular values of $H$, and $k = \operatorname{rank}(H)$. Then $W = U V^T$ is the solution of:

$$\max_{W^T W = I_k} \operatorname{tr}(W^T H) \tag{17}$$

Proof of Theorem 2:

According to the SVD of $H$, we have:

$$\operatorname{tr}(W^T H) = \operatorname{tr}(W^T U \Sigma V^T) = \operatorname{tr}(U \Sigma^{1/2} \Sigma^{1/2} V^T W^T) \tag{18}$$

According to Theorem 1, we have:

$$\operatorname{tr}(W^T H) \leq \|U \Sigma^{1/2}\|_F \, \|\Sigma^{1/2} V^T W^T\|_F = \|\Sigma^{1/2}\|_F \, \|\Sigma^{1/2}\|_F \tag{19}$$

Equality holds if and only if

$$\Sigma^{1/2} U^T = \Sigma^{1/2} V^T W^T \tag{20}$$

so the solution is:

$$W = U V^T \tag{21}$$
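As an illustration of Theorem 2, the following sketch builds $W = U V^T$ from a thin SVD and compares $\operatorname{tr}(W^T H)$ with the values reached by randomly drawn orthonormal matrices. The matrix `H` and the number of trials are arbitrary choices for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 6, 3
H = rng.standard_normal((n, k))   # hypothetical full-column-rank matrix

# Thin SVD: H = U @ diag(s) @ Vt with U (n x k) and Vt (k x k).
U, s, Vt = np.linalg.svd(H, full_matrices=False)
W_star = U @ Vt                   # candidate maximizer, n x k
best = np.trace(W_star.T @ H)     # equals the sum of singular values

# Evaluate the objective at random feasible points for comparison.
trials = []
for _ in range(200):
    W_rand, _ = np.linalg.qr(rng.standard_normal((n, k)))
    trials.append(np.trace(W_rand.T @ H))

print(best, s.sum(), max(trials))
```

The maximum value is the sum of the singular values of `H`, and no random orthonormal matrix exceeds it.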

We now consider how to solve the objective function in Equation (10), in which the unknown weights $d_i$ depend on $W$. It therefore has no closed-form solution. Instead, we develop an algorithm that alternately updates $W$ (with $d_i$ fixed) and $d_i$ (with $W$ fixed). Specifically, the objective function in Equation (10) is solved in two steps. First, update $W$ while keeping $d_i$ fixed. In this case, $\operatorname{tr}(G)$ in Equation (10) is constant. Therefore, the objective function in Equation (10) becomes:

$$\max_{W^T W = I_k} \operatorname{tr}(W^T H) \tag{22}$$

where $H = GW$ and $G$ is the weighted covariance matrix of the image data.

Let the SVD of $H$ be $H = U \Sigma V^T$, where $\Sigma \in \mathbb{R}^{k \times k}$ is a nonsingular diagonal matrix and $\lambda_j$ $(j = 1, \cdots, k)$ are the singular values of $H$. According to Theorem 2, the optimal solution of the objective function in Equation (22) is:

$$W = U V^T \tag{23}$$

Second, $d_i$ is recalculated with the updated $W$. Algorithm 1 lists the pseudo-code for solving the objective function in Equation (10), namely the Fp-2DPCA algorithm (

As discussed in Section 3, compared with traditional 2DPCA, angle-2DPCA is more robust to outliers. However, angle-2DPCA only reduces dimensions along the row direction of the image. In other words, angle-2DPCA learns the projection matrix from a set of training images in a way that reflects only the information between image rows and ignores the information embedded in image columns. It therefore needs more dimensions to represent images and more storage space for large-scale data sets. Inspired by reference [

Algorithm 1 Fp-2DPCA |
---|
Input: $A_i \in \mathbb{R}^{m \times n}$ $(i = 1, \cdots, N)$, $k$, $\sum_{i=1}^{M} A_i = 0$, $J(W^{(t)}) = \sum_{i=1}^{M} \lVert E_i \rVert_F^p / \lVert A_i W^{(t)} \rVert_F^p$ |
Initialize: $W^{(t)} \in \mathbb{R}^{n \times k}$ with $(W^{(t)})^T W^{(t)} = I_k$, $t = 1$, $\delta = \lvert J(W^{(t)}) - J(W^{(t-1)}) \rvert$ |
While $\delta \geq \varepsilon$: |
1. Calculate $d_i^{(t)}$ for each $A_i$ according to Equation (9), |
2. Calculate $H^{(t)} = G W^{(t)}$ according to Equation (9) and Equation (22), |
3. Compute the SVD of $H^{(t)}$: $H^{(t)} = U \Sigma V^T$, |
4. Calculate $W^{(t+1)} = U V^T$ according to Equation (23), |
5. Update $\delta$, |
6. Update $t$: $t = t + 1$. |
Output: $W^{(t+1)} \in \mathbb{R}^{n \times k}$ |
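The loop in Algorithm 1 can be sketched in NumPy as follows. This is an illustrative reading of the pseudo-code rather than the authors' implementation: the function names, the random initialization, the iteration cap `max_iter`, and the small constant `tiny` that guards against division by zero are all assumptions added here.

```python
import numpy as np

def fp_objective(As, W, p):
    """J(W) = sum_i ||A_i - A_i W W^T||_F^p / ||A_i W||_F^p."""
    return sum(
        np.linalg.norm(A - A @ W @ W.T) ** p / np.linalg.norm(A @ W) ** p
        for A in As
    )

def fp_2dpca(As, k, p=1.0, eps=1e-6, max_iter=100, seed=0, tiny=1e-12):
    """Alternate between the weights d_i and the projection W."""
    rng = np.random.default_rng(seed)
    n = As[0].shape[1]
    W, _ = np.linalg.qr(rng.standard_normal((n, k)))  # orthonormal start
    prev = fp_objective(As, W, p)
    history = [prev]
    for _step in range(max_iter):
        # Step 1: weights d_i and weighted matrix G = sum_i d_i A_i^T A_i.
        G = np.zeros((n, n))
        for A in As:
            Z = A @ W
            E = A - Z @ W.T
            e_norm = max(np.linalg.norm(E), tiny)  # assumed safeguard
            z_norm = max(np.linalg.norm(Z), tiny)  # assumed safeguard
            G += (e_norm ** (p - 2) / z_norm ** p) * (A.T @ A)
        # Steps 2 to 4: H = G W, thin SVD of H, then W = U V^T.
        U, _, Vt = np.linalg.svd(G @ W, full_matrices=False)
        W = U @ Vt
        # Steps 5 and 6: update the stopping quantity and continue.
        cur = fp_objective(As, W, p)
        history.append(cur)
        if abs(prev - cur) < eps:
            break
        prev = cur
    return W, history

# Hypothetical usage on centered random data (not the ORL images).
rng = np.random.default_rng(0)
As = [rng.standard_normal((8, 6)) for _ in range(10)]
mean = sum(As) / len(As)
As = [A - mean for A in As]          # enforce sum_i A_i = 0
W, history = fp_2dpca(As, k=2, p=1.0)
print(W.shape, len(history))
```

The returned `history` records the objective value after each update, which makes it easy to inspect how the criterion evolves on a given data set.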

Specifically, we first reduce the dimension of the training samples $A_i \in \mathbb{R}^{m \times n}$ $(i = 1, 2, \cdots, N)$ to obtain the right projection matrix $R \in \mathbb{R}^{n \times r}$. The image $A_i$ is projected onto $R$ to give $Y_i = A_i R \in \mathbb{R}^{m \times r}$, called the right feature matrix of $A_i$. Then, we apply Fp-2DPCA to the training samples $Y_i^T \in \mathbb{R}^{r \times m}$ $(i = 1, 2, \cdots, N)$ and map them to the feature matrices $B_i^T = Y_i^T L$ of size $r \times l$. Through these two steps, the training sample $A_i$ is projected to a smaller feature matrix $B_i$:

$$B_i = L^T A_i R, \quad i = 1, 2, \cdots, N \tag{24}$$

We call $R$ the right projection matrix and $L$ the left projection matrix, and the corresponding algorithm is called bilateral Fp-2DPCA. Because $l \times r$ is much smaller than $m \times n$, bilateral Fp-2DPCA can use fewer dimensions to represent the input image. Experimental results show that, compared with 2DPCA, (2D)²PCA, and angle-2DPCA, bilateral Fp-2DPCA can achieve higher performance with fewer dimensions. Algorithm 2 (
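The two-stage projection can be sketched as follows, mainly to make the matrix shapes explicit. To keep the example short and self-contained, the projection learner `learn_projection` is a stand-in that uses the top eigenvectors of $\sum_i A_i^T A_i$ (plain 2DPCA) instead of Algorithm 1; the data and sizes are hypothetical.

```python
import numpy as np

def learn_projection(As, k):
    """Stand-in for Algorithm 1: top-k eigenvectors of sum_i A_i^T A_i."""
    S = sum(A.T @ A for A in As)
    _, vecs = np.linalg.eigh(S)   # eigenvalues in ascending order
    return vecs[:, -k:]           # columns for the k largest eigenvalues

rng = np.random.default_rng(0)
m, n, r, l = 12, 10, 4, 3
As = [rng.standard_normal((m, n)) for _ in range(8)]

# Stage 1: right projection R (n x r) and right features Y_i = A_i R (m x r).
R = learn_projection(As, r)
Ys = [A @ R for A in As]

# Stage 2: the same learner on Y_i^T (r x m) gives the left projection L (m x l).
L = learn_projection([Y.T for Y in Ys], l)

# Final features B_i = L^T A_i R (l x r), as in Equation (24).
Bs = [L.T @ A @ R for A in As]
print(R.shape, L.shape, Bs[0].shape)
```

Swapping `learn_projection` for an implementation of Algorithm 1 leaves the shapes and the two-stage structure unchanged.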

Theorem 3 In each iteration of Algorithm 1, we have:

$$\sum_{i=1}^{M} \|A_i W^{(t+1)}\|_F \geq \sum_{i=1}^{M} \|A_i W^{(t)}\|_F \tag{25}$$

Proof: In iteration $t + 1$, according to the fourth step of Algorithm 1, the following inequality holds:

$$\sum_{i=1}^{M} \frac{\operatorname{tr}\left( (W^{(t+1)})^T A_i^T A_i W^{(t)} \right)}{\|A_i W^{(t)}\|_F} \geq \sum_{i=1}^{M} \frac{\operatorname{tr}\left( (W^{(t)})^T A_i^T A_i W^{(t)} \right)}{\|A_i W^{(t)}\|_F} \tag{26}$$

Algorithm 2 Bilateral Fp-2DPCA |
---|
Input: $A_i \in \mathbb{R}^{m \times n}$ $(i = 1, \cdots, N)$, $\sum_{i=1}^{M} A_i = 0$ |
1. Taking $A_i$ $(i = 1, \cdots, N)$ as the input of Algorithm 1, obtain the right projection matrix $R \in \mathbb{R}^{n \times r}$ and calculate the feature matrix $Y_i = A_i R$ of $A_i$; |
2. Taking $Y_i^T$ $(i = 1, \cdots, N)$ as the input of Algorithm 1, obtain the left projection matrix $L \in \mathbb{R}^{m \times l}$ and calculate the feature matrix $B_i^T = Y_i^T L$ of $Y_i^T$; |
Output: left projection matrix $L \in \mathbb{R}^{m \times l}$, right projection matrix $R \in \mathbb{R}^{n \times r}$, and feature matrices $B_i = L^T A_i R \in \mathbb{R}^{l \times r}$, $i = 1, 2, \cdots, N$. |

Since $\operatorname{tr}\left( (W^{(t)})^T A_i^T A_i W^{(t)} \right) = \|A_i W^{(t)}\|_F^2$, we get:

$$\sum_{i=1}^{M} \frac{\operatorname{tr}\left( (W^{(t+1)})^T A_i^T A_i W^{(t)} \right)}{\|A_i W^{(t)}\|_F} \geq \sum_{i=1}^{M} \|A_i W^{(t)}\|_F \tag{27}$$

For each $i$ $(i = 1, \cdots, M)$, we have:

$$\operatorname{tr}\left( (W^{(t+1)})^T A_i^T A_i W^{(t)} \right) = \operatorname{tr}\left( (A_i W^{(t+1)})^T A_i W^{(t)} \right) = \left( \operatorname{vec}(A_i W^{(t+1)}) \right)^T \operatorname{vec}(A_i W^{(t)}) \tag{28}$$

According to the Cauchy-Schwarz inequality, we have:

$$\left( \operatorname{vec}(A_i W^{(t+1)}) \right)^T \operatorname{vec}(A_i W^{(t)}) \leq \|\operatorname{vec}(A_i W^{(t+1)})\|_2 \, \|\operatorname{vec}(A_i W^{(t)})\|_2 = \|A_i W^{(t+1)}\|_F \, \|A_i W^{(t)}\|_F \tag{29}$$

According to Equation (28) and Equation (29), we have:

$$\sum_{i=1}^{M} \frac{\operatorname{tr}\left( (W^{(t+1)})^T A_i^T A_i W^{(t)} \right)}{\|A_i W^{(t)}\|_F} \leq \sum_{i=1}^{M} \frac{\|A_i W^{(t+1)}\|_F \, \|A_i W^{(t)}\|_F}{\|A_i W^{(t)}\|_F} = \sum_{i=1}^{M} \|A_i W^{(t+1)}\|_F \tag{30}$$

According to Equation (27) and Equation (30), we have:

$$\sum_{i=1}^{M} \|A_i W^{(t+1)}\|_F \geq \sum_{i=1}^{M} \|A_i W^{(t)}\|_F \tag{31}$$

Theorem 4 In each iteration of Algorithm 1, we have:

$$\sum_{i=1}^{M} \|A_i - A_i W^{(t+1)} (W^{(t+1)})^T\|_F \leq \sum_{i=1}^{M} \|A_i - A_i W^{(t)} (W^{(t)})^T\|_F \tag{32}$$

Proof: A direct calculation gives:

$$\sum_{i=1}^{M} \|A_i - A_i W^{(t+1)} (W^{(t+1)})^T\|_F^2 = \sum_{i=1}^{M} \left( \operatorname{tr}(A_i^T A_i) - \operatorname{tr}\left( (W^{(t+1)})^T A_i^T A_i W^{(t+1)} \right) \right) = \sum_{i=1}^{M} \left( \operatorname{tr}(A_i^T A_i) - \|A_i W^{(t+1)}\|_F^2 \right) \tag{33}$$

According to Theorem 3, we have:

$$\sum_{i=1}^{M} \|A_i W^{(t+1)}\|_F \geq \sum_{i=1}^{M} \|A_i W^{(t)}\|_F \tag{34}$$

So, from Equation (33) and Equation (34), we get:

$$\sum_{i=1}^{M} \|A_i - A_i W^{(t+1)} (W^{(t+1)})^T\|_F \leq \sum_{i=1}^{M} \|A_i - A_i W^{(t)} (W^{(t)})^T\|_F \tag{35}$$

Theorem 5 From Theorem 3 and Theorem 4, we have:

$$\left. \sum_{i=1}^{M} \frac{\|E_i\|_F^p}{\|A_i W\|_F^p} \right|_{W = W^{(t+1)}} \leq \left. \sum_{i=1}^{M} \frac{\|E_i\|_F^p}{\|A_i W\|_F^p} \right|_{W = W^{(t)}} \tag{36}$$

Proof: By Theorem 3 and Theorem 4, and since $0 < p < 2$, we have

$$\sum_{i=1}^{M} \|A_i W^{(t+1)}\|_F^p \geq \sum_{i=1}^{M} \|A_i W^{(t)}\|_F^p \tag{37}$$

and

$$\sum_{i=1}^{M} \|A_i - A_i W^{(t+1)} (W^{(t+1)})^T\|_F^p \leq \sum_{i=1}^{M} \|A_i - A_i W^{(t)} (W^{(t)})^T\|_F^p \tag{38}$$

So we obtain:

$$\left. \sum_{i=1}^{M} \frac{\|E_i\|_F^p}{\|A_i W\|_F^p} \right|_{W = W^{(t+1)}} \leq \left. \sum_{i=1}^{M} \frac{\|E_i\|_F^p}{\|A_i W\|_F^p} \right|_{W = W^{(t)}} \tag{39}$$

According to Theorem 5, Algorithm 1 reduces the value of the objective function in Equation (7) at every iteration, so $W$ keeps approaching an optimal solution. Algorithm 1 therefore converges to a locally optimal solution of the objective function in Equation (7). Algorithm 2 is built on Algorithm 1, so Algorithm 2 also converges to a locally optimal solution.

In this part, we mainly show that Fp-2DPCA has good rotation invariance. Rotation invariance means that the low dimensional representation remains unchanged under the rotation transformation of the sample space.

Theorem 6 The solution of Fp-2DPCA is rotationally invariant.

Proof: Given any orthogonal matrix $\Gamma$ $(\Gamma^T \Gamma = I)$, for the solution $W$ obtained at each step of Algorithm 1 and $Z_i = A_i W$, we have

$$\begin{aligned}
\frac{\|E_i\|_F^p}{\|A_i W\|_F^p}
&= \frac{\|A_i - A_i W W^T\|_F^p}{\|A_i W\|_F^p}
= \frac{\|A_i - Z_i W^T\|_F^p}{\|Z_i\|_F^p}
= \frac{\|(A_i - Z_i W^T) \Gamma^T \Gamma\|_F^p}{\|Z_i\|_F^p} \\
&= \frac{\left( \sum_{j=1}^{m} \left\| \left( A_i(j,:) \Gamma^T - Z_i(j,:) W^T \Gamma^T \right) \Gamma \right\|_2^2 \right)^{p/2}}{\|Z_i\|_F^p}
= \frac{\left( \sum_{j=1}^{m} \left\| A_i(j,:) \Gamma^T - Z_i(j,:) W^T \Gamma^T \right\|_2^2 \right)^{p/2}}{\|Z_i\|_F^p} \\
&= \frac{\left( \sum_{j=1}^{m} \left\| \tilde{A}_i(j,:) - Z_i(j,:) \tilde{W}^T \right\|_2^2 \right)^{p/2}}{\|Z_i\|_F^p}
= \frac{\|\tilde{A}_i - Z_i \tilde{W}^T\|_F^p}{\|Z_i\|_F^p}
\end{aligned} \tag{40}$$

where $\tilde{W} = \Gamma W$ and $\tilde{A}_i = A_i \Gamma^T$, so $\tilde{A}_i \tilde{W} = A_i \Gamma^T \Gamma W = A_i W$. Equation (40) shows that if $W$ is a solution of the objective function, then $\tilde{W}$ is a solution of the objective function for the data transformed by the orthogonal matrix $\Gamma$.
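The invariance argument can be confirmed numerically. In this sketch the rotated data are taken as $\tilde{A}_i = A_i \Gamma^T$ and the rotated projection as $\tilde{W} = \Gamma W$, which is the pairing for which $\tilde{A}_i \tilde{W} = A_i W$ holds; the sample, the projection, and the orthogonal matrix are random stand-ins.

```python
import numpy as np

def ratio(A, W, p):
    """Per-sample criterion ||A - A W W^T||_F^p / ||A W||_F^p."""
    return np.linalg.norm(A - A @ W @ W.T) ** p / np.linalg.norm(A @ W) ** p

rng = np.random.default_rng(3)
m, n, k, p = 5, 4, 2, 1.0
A = rng.standard_normal((m, n))
W, _ = np.linalg.qr(rng.standard_normal((n, k)))       # orthonormal n x k
Gamma, _ = np.linalg.qr(rng.standard_normal((n, n)))   # orthogonal n x n

A_rot = A @ Gamma.T   # rotated sample
W_rot = Gamma @ W     # correspondingly rotated projection

original = ratio(A, W, p)
rotated = ratio(A_rot, W_rot, p)
print(np.isclose(original, rotated))
```

The low-dimensional representation `A_rot @ W_rot` coincides with `A @ W`, so both the numerator and the denominator of the criterion are unchanged by the rotation.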

In addition, compared with the existing 2DPCA methods based on the $L_1$ norm, this method considers the reconstruction error directly and incorporates the variance of the low-dimensional data into the criterion function. It is also robust to outliers and is related to the covariance matrix of the images.

In this part, we use four most advanced algorithms, namely 2DPCA [^{2}PCA [

Because the feature matrices of different algorithms have different dimensions, we use comparable feature sizes for all methods for a fair comparison. For example, suppose the column dimension of the one-sided dimensionality reduction methods (2DPCA, 2DPCA-L1, angle-2DPCA) is $r$, so that their feature matrices have size $m \times r$. The row and column dimensions $l'$ and $r'$ of the two-sided dimensionality reduction methods ((2D)²PCA and bilateral Fp-2DPCA) are then chosen so that $l' \times r' \approx m \times r$. For simplicity, we set $l' = r'$. For all algorithms, 1-nearest-neighbor (1-NN) classification is used.
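A 1-NN classifier on feature matrices can be sketched as below. The distance between two feature matrices is assumed here to be the Frobenius norm of their difference, which the text does not specify; the function name and the toy features are illustrative.

```python
import numpy as np

def nearest_neighbor_label(train_feats, train_labels, query_feat):
    """Return the label of the training feature matrix closest to the query.

    Distance is the Frobenius norm of the difference (an assumption).
    """
    dists = [np.linalg.norm(query_feat - F) for F in train_feats]
    return train_labels[int(np.argmin(dists))]

# Toy 2 x 2 feature matrices for two hypothetical subjects.
train_feats = [np.zeros((2, 2)), np.full((2, 2), 10.0)]
train_labels = ["subject_a", "subject_b"]

pred_near_b = nearest_neighbor_label(train_feats, train_labels,
                                     np.full((2, 2), 9.0))
pred_near_a = nearest_neighbor_label(train_feats, train_labels,
                                     np.full((2, 2), 1.0))
print(pred_near_a, pred_near_b)
```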

The ORL face database consists of 400 frontal images collected under different lighting conditions, with ten images per person. Each image in this database is resized to 112 × 92 pixels. We randomly selected seven images per person and added noise with values in the range 0 - 255 to the chosen images. The noise locations are random, and the ratio of noise pixels to image pixels is between 0.05 and 0.15.
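One possible reading of this noise protocol is sketched below. It assumes that each noise pixel is replaced by an integer drawn uniformly from 0 to 255 and that the noise ratio is drawn uniformly from [0.05, 0.15] per image; neither detail is stated in the text, and `add_random_noise` is a hypothetical helper.

```python
import numpy as np

def add_random_noise(image, ratio, rng):
    """Replace a fraction `ratio` of pixels, at random locations,
    with values drawn uniformly from 0..255 (assumed noise model)."""
    noisy = image.copy()
    n_noise = int(round(ratio * image.size))
    idx = rng.choice(image.size, size=n_noise, replace=False)
    flat = noisy.reshape(-1)  # view on the contiguous copy
    flat[idx] = rng.integers(0, 256, size=n_noise).astype(image.dtype)
    return noisy

rng = np.random.default_rng(0)

# Placeholder 112 x 92 image filled with -1 so every noise pixel is visible.
image = np.full((112, 92), -1.0)
noisy = add_random_noise(image, 0.10, rng)
changed = int((noisy != image).sum())

# A per-image ratio in the stated range could be drawn like this.
ratio = rng.uniform(0.05, 0.15)
print(changed, round(ratio, 3))
```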

If the reduced dimension is too low, the reconstructed image is difficult to recognize, so we reduce the column dimension to 30. For (2D)²PCA, the row and column dimensions are both reduced to 50 (50 × 50 < 112 × 30). For our algorithm, the row and column dimensions are both reduced to 30 (30 × 30 < 50 × 50) to compare the effect. In

Algorithm | dim | Acc ± std% (clean) | Acc ± std% (noisy) | Diff% |
---|---|---|---|---|
2DPCA | 112 × 30 | 66.66 ± 1.18 | 62.50 ± 1.25 | 4.16 |
2DPCA-L1 | 112 × 30 | 67.50 ± 1.14 | 64.10 ± 1.52 | 3.40 |
(2D)²PCA | 50 × 50 | 73.33 ± 1.09 | 70.19 ± 1.50 | 3.14 |
Angle-2DPCA | 112 × 30 | 75.83 ± 1.01 | 73.22 ± 1.29 | 2.61 |
B-Fp-2DPCA (p = 1) | 30 × 30 | 79.16 ± 1.12 | 77.45 ± 1.49 | 1.71 |
B-Fp-2DPCA (p = 0.5) | 30 × 30 | 81.66 ± 1.04 | 79.98 ± 1.35 | 1.68 |

(clean) and noisy dataset (noisy). The last column shows the difference of precision means between the clean data set and noise data set.
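The last column and the feature sizes in the table can be recomputed directly from the reported means. The snippet below only reuses the numbers printed in the table; the dictionary layout is an illustrative choice.

```python
# (rows, cols, clean accuracy, noisy accuracy, reported Diff) per method.
results = {
    "2DPCA": (112, 30, 66.66, 62.50, 4.16),
    "2DPCA-L1": (112, 30, 67.50, 64.10, 3.40),
    "(2D)2PCA": (50, 50, 73.33, 70.19, 3.14),
    "Angle-2DPCA": (112, 30, 75.83, 73.22, 2.61),
    "B-Fp-2DPCA (p = 1)": (30, 30, 79.16, 77.45, 1.71),
    "B-Fp-2DPCA (p = 0.5)": (30, 30, 81.66, 79.98, 1.68),
}

# Number of feature entries per image and the recomputed accuracy drop.
sizes = {name: rows * cols for name, (rows, cols, *_rest) in results.items()}
diffs = {name: round(clean - noisy, 2)
         for name, (_r, _c, clean, noisy, _d) in results.items()}

for name in results:
    print(f"{name}: {sizes[name]} features, drop {diffs[name]:.2f}")
```

The bilateral method stores 900 values per image, compared with 3360 for the one-sided methods and 2500 for (2D)²PCA.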

In

In this paper, we propose a new 2DPCA model that considers both the reconstruction error and the maximum variance and adopts the F-norm, which is robust to outliers. We introduce the parameter p to give the model more flexibility. The experiments show that p = 0.5 gives better results than p = 1. We also extend the model to bilateral projection and propose bilateral Fp-2DPCA, which reduces the dimension of the original image matrix along both rows and columns. Experimental results on face data sets show that the proposed method achieves higher performance with fewer dimensions.

The authors declare no conflicts of interest regarding the publication of this paper.

Kuang, H.J., Ye, W.Z. and Zhu, Z. (2021) Research on Face Recognition Algorithm Based on Robust 2DPCA. Advances in Pure Mathematics, 11, 149-161. https://doi.org/10.4236/apm.2021.112010