Face Recognition Using Fuzzy Clustering and Kernel Least Square

Over the last fifteen years, face recognition has become a popular area of research in image analysis and one of the most successful applications of machine learning and understanding. To improve the classification rate in face recognition, several techniques are introduced, modified and combined. The suggested model extracts features using a Fourier-Gabor filter, selects the best features using the signal-to-noise ratio, deletes or modifies anomalous images using fuzzy c-means clustering, classifies with the kernel least squares method, and optimizes the classifier using wild dog pack optimization. Four datasets are used to compare the suggested method with previous methods. The results indicate that the suggested method, both without and with fuzzy clustering, outperforms state-of-the-art methods on all datasets.


Introduction
Facial recognition has been an active research topic since the early nineties. There have been several advances in the past few years in face detection and tracking, feature extraction mechanisms and the related machine learning techniques. Face recognition has drawn the attention of researchers in fields ranging from image analysis and processing and computer vision to psychology and security [1]. In spite of more than fifteen years of extensive research and a large number of papers published in journals and conferences dedicated to this area, we still cannot claim that automatic face recognition is comparable to human performance [2]. The difficulty of automatic face recognition is due to various effects such as aging, facial expressions, occlusions, lighting and viewpoint changes induced by body movement [3]. A successful face recognition methodology depends heavily on the particular choice of features, observation reduction and classification method.
In this paper four algorithms are introduced, modified and combined to enhance face recognition and detection [4]. The features are extracted using a Fourier-Gabor filter and selected by generalizing the signal-to-noise equation. A modified fuzzy clustering algorithm is used to reduce the number of misclassified faces by excluding or modifying the faces with low membership. Furthermore, a new non-linear kernel is used to kernelize the least squares method, and wild dog pack optimization is implemented to find the optimal parameters. The suggested method is evaluated on four datasets: the AT & T database of faces (ATT), the Indian Face Database (IFD), Faces95 from the Essex university database and the Yale face database (Yale).

Previous Studies
Principal component analysis (PCA) converts a set of face images of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components. PCA captures as much of the variability of the face images as possible, and the eigenfaces matrix can then be used to recognize new faces by minimizing the objective in Equation (1) [5], where U is the eigenfaces matrix or features matrix, µ is the mean image and δ_b is the difference between the input image x_b and the mean.
Linear discriminant analysis (LDA) selects the eigenvectors U in such a way that the ratio of the between-class scatter to the within-class scatter is maximized (Equation (2)), which characterizes or separates two or more classes of face images. Here S_B and S_W are the between-class scatter matrix and the within-class scatter matrix respectively, and U_opt can be found by solving the generalized eigenvalue problem [6].
Independent component analysis (ICA) separates a multivariate signal into additive subcomponents. ICA uses centering, whitening and dimensionality reduction as preprocessing steps in order to simplify and reduce the complexity of the problem for the actual iterative algorithm [7]. The whitening matrix is twice the inverse square root of the covariance matrix; this removes the first- and second-order statistics of the data [8].
The Fourier transform is another important method, which plays a critical role in a broad range of image processing applications. The output of the transformation represents the image in the Fourier or frequency domain, while the input image is the spatial domain equivalent. For a square image of size N × N, the two-dimensional DFT maps the point (a, b) in the spatial domain to the point (m, n) in the frequency domain; the transform is calculated as in Equation (3).
Gabor filters have been used to address the problems of configuration, orientation and emotion. A filter bank consisting of Gabor filters with various scales and rotations can be created and combined with different methods to enhance the recognition rate. The most commonly used filter in face recognition has the form given in Equation (4) [9], where the output locations for each Gabor sub-matrix are specified by x and y, and µ and ν define the orientation and scale of each sub-matrix.
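For reference, three of the expressions discussed in this section can be sketched in their standard textbook forms. These are assumptions rather than the exact expressions of [5]-[9]: the Fisher criterion is written with determinants, the DFT is written for an N × N image without a normalization constant, and the Gabor kernel uses the hypothetical constants k_max, f and σ, with z = (x, y).

```latex
% Fisher criterion for LDA (assumed determinant form)
U_{opt} = \arg\max_{U} \frac{\left| U^{T} S_B U \right|}{\left| U^{T} S_W U \right|}

% 2D DFT of an N x N image f (assumed unnormalized forward transform)
F(m, n) = \sum_{a=0}^{N-1} \sum_{b=0}^{N-1} f(a, b)\,
          e^{-i 2\pi \left( \frac{m a}{N} + \frac{n b}{N} \right)}

% Gabor kernel at location z = (x, y), orientation \mu, scale \nu
\psi_{\mu,\nu}(z) = \frac{\|k_{\mu,\nu}\|^{2}}{\sigma^{2}}
    \exp\!\left( -\frac{\|k_{\mu,\nu}\|^{2} \|z\|^{2}}{2\sigma^{2}} \right)
    \left[ e^{\, i\, k_{\mu,\nu} \cdot z} - e^{-\sigma^{2}/2} \right],
\qquad
k_{\mu,\nu} = k_{\nu} e^{i \phi_{\mu}}, \quad
k_{\nu} = \frac{k_{\max}}{f^{\nu}}, \quad
\phi_{\mu} = \frac{\pi \mu}{8}
```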

The Suggested Method
The suggested method modifies and combines several techniques to enhance face recognition. The main steps can be summarized as follows:
1) Extract features from each image using the Fourier-Gabor filter
2) Select the best features using the signal-to-noise ratio (SNR)
3) Delete or modify anomalous images using fuzzy c-means clustering
4) Model the training images using the least squares method
5) Use a new non-linear kernel
6) Optimize the weight vector using wild dog pack optimization (WDPO)

Feature Extraction and Selection
Several methods have been used to extract features from face images; one of the best is the Fourier-Gabor filter introduced in [8]. The same method is therefore used in this study to extract the features from the face images, although different size, orientation and scale values are adopted. The main steps are as follows:
1) Resize the images to 40 × 40
2) Transform the images to the frequency domain by applying the Fourier transform in Equation (3)
3) Use Equation (4) to prepare 8 Gabor filters for orientation and 10 filters for scaling, so that 80 different filters are constructed
4) Multiply the result matrix from step 2 by each matrix from step 3
5) Reshape each matrix from step 4 into one row
6) Construct the feature vector for each face by concatenating all the rows from step 5
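The six steps above can be sketched with NumPy as follows. This is a minimal illustration, not the author's implementation: the Gabor kernel uses a widely used form with the hypothetical constants `k_max`, `f` and `sigma`, each filter is moved to the frequency domain before the element-wise product, and the magnitude of the product is kept as the feature. All three choices are assumptions, and the image is assumed to be already resized to 40 × 40.

```python
import numpy as np

def gabor_kernel(size, mu, nu, n_orient=8, k_max=np.pi / 2,
                 f=np.sqrt(2), sigma=2 * np.pi):
    """Spatial Gabor kernel for orientation index mu and scale index nu.
    k_max, f and sigma are assumed constants, not values from the paper."""
    half = size // 2
    y, x = np.mgrid[-half:size - half, -half:size - half]
    k = (k_max / f ** nu) * np.exp(1j * np.pi * mu / n_orient)  # wave vector
    k2 = abs(k) ** 2
    envelope = (k2 / sigma ** 2) * np.exp(-k2 * (x ** 2 + y ** 2) / (2 * sigma ** 2))
    carrier = np.exp(1j * (k.real * x + k.imag * y)) - np.exp(-sigma ** 2 / 2)
    return envelope * carrier

def fourier_gabor_features(image, n_orient=8, n_scale=10):
    """Steps 2 to 6: FFT of the image, element-wise product with each
    filter, flatten every result to one row, concatenate the rows."""
    size = image.shape[0]
    spectrum = np.fft.fft2(image)                      # step 2
    rows = []
    for nu in range(n_scale):                          # step 3: 10 scales
        for mu in range(n_orient):                     #         8 orientations
            kernel = gabor_kernel(size, mu, nu, n_orient)
            filt = np.fft.fft2(np.fft.ifftshift(kernel))
            rows.append(np.abs(spectrum * filt).ravel())  # steps 4 and 5
    return np.concatenate(rows)                        # step 6
```

Calling `fourier_gabor_features(img)` on a 40 × 40 array returns one non-negative value per pixel per filter.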
The number of extracted features is 40 × 40 × 80 = 128,000. To delete insignificant features, the signal-to-noise ratio (SNR) method is applied [10]. The ratio is computed for each feature from µ, the mean, and σ, the standard deviation, where the signs {+, −} indicate the class. For the multiclass case, as in this study, the one-against-all procedure is repeated for all classes and the total is calculated. The features that have the lowest totals are deleted.
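The selection step can be illustrated with a short NumPy sketch. It assumes the common two-class form SNR = (µ+ − µ−) / (σ+ + σ−) and assumes that the one-against-all total is the sum of absolute SNR values over the classes; neither detail is confirmed by the text. The function names and the small `eps` guard against division by zero are hypothetical.

```python
import numpy as np

def snr_scores(X, y, eps=1e-12):
    """Total one-against-all SNR per feature.
    X: (n_samples, n_features) array, y: (n_samples,) class labels."""
    total = np.zeros(X.shape[1])
    for c in np.unique(y):
        pos, neg = X[y == c], X[y != c]        # class c against all others
        snr = (pos.mean(axis=0) - neg.mean(axis=0)) / (
            pos.std(axis=0) + neg.std(axis=0) + eps)
        total += np.abs(snr)                   # assumed: sum of magnitudes
    return total

def select_features(X, y, keep):
    """Indices of the `keep` features with the highest total SNR."""
    scores = snr_scores(X, y)
    return np.sort(np.argsort(scores)[::-1][:keep])
```

Features outside the returned index set are the ones with the lowest totals, so `X[:, select_features(X, y, keep)]` is the reduced feature matrix.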

Face Reduction and Modification
If a face x in one class lies closer to the faces of another class, new faces that are close to x are likely to be misclassified. Fuzzy clustering is therefore used to modify the faces that have a low membership degree in the correct class and to delete the faces that have a high membership degree in the wrong classes. The fuzzy c-means clustering is modified as follows:
1) Initialize the memberships u_ij randomly
2) Calculate the centers
3) Update the memberships
4) Go to step 2 until there is no significant improvement
5) For all faces that satisfy the deletion condition, delete the face, and for all faces that satisfy the modification condition, modify the face x using the center c
6) Go to step 2 until the above conditions become invalid
E. Al Daoud
Here u_ij is the degree of membership of x_i in cluster j, x_i is the i-th face in the dataset, c_j is the center of cluster j, σ and ρ are parameters to be tuned in the experimental section, n is the number of images and m is the number of classes or clusters (the fuzzy c-means clustering before modification can be found in [11]).
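Steps 1 to 4 correspond to standard fuzzy c-means, which can be sketched as below. The sketch covers only the unmodified algorithm: the σ- and ρ-based deletion and modification rules of step 5 are not reproduced. The fuzzifier `fuzz`, the tolerance `tol` and the function name are assumptions introduced for illustration.

```python
import numpy as np

def fuzzy_c_means(X, n_clusters, fuzz=2.0, iters=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means. Returns memberships U (n_samples, n_clusters)
    and centers (n_clusters, n_features). fuzz > 1 is the fuzzifier."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], n_clusters))    # step 1: random memberships
    U /= U.sum(axis=1, keepdims=True)           # each row sums to 1
    for _ in range(iters):
        W = U ** fuzz
        centers = (W.T @ X) / W.sum(axis=0)[:, None]           # step 2
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        inv = (dist + 1e-12) ** (-2.0 / (fuzz - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)           # step 3
        converged = np.abs(U_new - U).max() < tol              # step 4
        U = U_new
        if converged:
            break
    return U, centers
```

A face whose row of `U` has a small entry for its own class, or a large entry for another class, is the kind of observation the modified algorithm targets.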

Kernel Least Square Method
Once the features and the observations have been prepared, several classifiers can be used, such as neural networks, support vector machines (SVM) and regularized least squares classification (RLSC) [10]. In this section RLSC is used with a new non-linear kernel and is minimized by the wild dog pack optimizer (WDPO). The general formulation of non-linear regularized least squares classification is expressed in terms of w, the weight vector, y, the target class, λ, the regularization parameter, and K, a kernel. A new kernel is suggested in this study. To train RLSC, the parameters λ and µ and the vector w must be optimized, so the wild dog pack optimizer is used to find the best values. Algorithm 1 summarizes the main steps of WDPO [12]:
Algorithm 1: Wild dog pack optimization (WDPO)
  Generate n dogs randomly
  while (t < iter)
    Evaluate the locations using the updated parameters
    Choose the best location as alpha α(t)
    Every few iterations, update the parameters using self-competition
    Update the dogs' locations with regard to alpha using the location-update equation, in which c is used to control the diversity from alpha
    Evaluate the dogs
    Choose the best dog as alpha
    If there is no improvement for v iterations
      Use the Hoo procedure to escape from the local minimum
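To make the classifier concrete, the sketch below fits a two-class kernel RLSC in closed form. It deliberately differs from the method described above in two labeled ways: a Gaussian (RBF) kernel stands in for the suggested kernel, and the weights are obtained by solving a linear system instead of by WDPO. The assumed objective is ||y − Kw||² + λ wᵀKw, whose minimizer is w = (K + λI)⁻¹ y; `gamma`, `lam` and the function names are hypothetical.

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    """Gaussian kernel matrix between the rows of A and the rows of B
    (a stand-in for the kernel suggested in the paper)."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def fit_rlsc(X, y, lam=1e-2, gamma=0.5):
    """Closed-form weights w = (K + lam * I)^-1 y for labels y in {-1, +1}."""
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def predict_rlsc(X_train, w, X_new, gamma=0.5):
    """Sign of the kernel expansion evaluated at the new points."""
    return np.sign(rbf_kernel(X_new, X_train, gamma) @ w)
```

For more than two classes, one weight vector per class can be trained one-against-all and the class with the largest output chosen. A population-based optimizer such as WDPO becomes useful when the kernel parameter and λ are tuned jointly, since the closed form only covers w for fixed parameters.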

Experimental Results
In this study four datasets are used: the AT & T database of faces (ATT), the Indian Face Database (IFD), Faces95 from the Essex university database and the Yale face database (Yale) [13]-[16]. Figure 1 shows sample faces from each dataset. All the datasets contain many subjects and several images per individual, with different facial expressions, configurations, orientations and emotions. Table 1 summarizes the content of each dataset.

All the results in this section are obtained using 10-fold cross-validation. The code is written in Matlab 10. The experiments are divided into three stages. The first stage finds the best number of features using the signal-to-noise method. The implementation in this stage is repeated several times, and each time the number of features is reduced by 10,000. Table 2 shows sample results for different numbers of features. The best number of features according to this stage is 80,000. The second stage finds the best values of the clustering parameters σ and ρ. Table 3 indicates that the best values are σ = 50 and ρ = 30. The third stage compares the proposed method with three well-known methods: linear discriminant analysis (LDA), principal component analysis (PCA) and Gabor-SVM. Moreover, the proposed method is implemented with and without fuzzy clustering. Tables 4 and 5 show the classification rates using 30% and 50% of the data for training respectively. The results indicate that the suggested method, both without and with fuzzy clustering, outperforms state-of-the-art methods on all datasets.
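The 10-fold protocol can be sketched in plain Python as follows. This is an illustrative sketch rather than the Matlab code used in the experiments: the shuffling seed, the fold assignment by striding and the `fit`/`predict` callables are assumptions.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]      # k nearly equal folds
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test

def cross_val_accuracy(fit, predict, X, y, k=10):
    """Classification rate pooled over all k test folds.
    fit(X_train, y_train) returns a model; predict(model, x) returns a label."""
    correct = 0
    for train, test in k_fold_indices(len(X), k):
        model = fit([X[i] for i in train], [y[i] for i in train])
        correct += sum(predict(model, X[i]) == y[i] for i in test)
    return correct / len(X)
```

Each sample is used for testing exactly once, so the returned value is the fraction of all samples classified correctly by a model that never saw them during training.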

Conclusion
Face recognition presents a challenging problem in the field of image analysis and computer vision, and as such has received a great deal of attention over the last few years because of its many applications in various domains. The performance of the proposed method was demonstrated on four datasets that contain several images per individual with different facial expressions, configurations, orientations and emotions. The results are promising: the suggested method outperforms the previous methods on all datasets. Future work will be dedicated to applying and combining other features extracted by various techniques.

Table 2. Sample results using different numbers of features.

Table 3. Sample results using different values for the clustering parameters σ and ρ.

Table 4. The classification rate using 30% for training.

Table 5. The classification rate using 50% for training.