A New Algorithm for Reducing the Dimensionality of L1-CSVM Using the Augmented Lagrange Method

Abstract

Principal component analysis (PCA) and the generalized low rank approximation of matrices (GLRAM) are two different dimensionality reduction methods. These two dimensionality reduction algorithms are applied to the L1-CSVM model solved by the augmented Lagrange method, in order to study how the running time and accuracy of the model change in the reduced-dimension space. The results show that the improved algorithms greatly reduce the running time and improve the accuracy.


1. Introduction

Support Vector Machine (SVM) [1] is a machine learning method based on statistical learning and optimization theory, first proposed by Vapnik et al. in 1995. SVM is a widely used binary classification model; its basic idea is to find a hyperplane that divides the sample points into two categories. The basic idea of Principal Component Analysis (PCA) [2] [3] is to transform the original multi-variable data, through a linear transformation, into a smaller number of new variables. After dimensionality reduction, the data can still retain the maximum separability between samples and the minimum reconstruction error. However, PCA operates on vector data, which requires a large amount of computation in the singular value decomposition. To alleviate this computational cost, Ye proposed the generalized low rank approximation of matrices (GLRAM) method in 2005 [4] [5]. The data set used in this method is a collection of matrices, and the optimization iterates over a bilinear transformation to minimize the reconstruction error, which leads to a lower computational cost.

Sparse representation optimization is widely used. The L1 norm is usually used to regularize penalty terms to improve the classification accuracy of network models; by selecting a small subset of features, it performs classification with fewer features and is considered an effective way to reduce the influence of outliers [6] [7] [8]. In 2014, Nie et al. used the augmented Lagrange method (ALM) [9] [10] in SVM [11] to solve linear classification problems on big data. In 2019, Chauhan et al. surveyed linear SVM solvers [12] and pointed out that the algorithm of Nie et al. was the fastest method for solving the primal problem. In 2019, Yan et al. proposed an ALM algorithm for solving L1-CSVM [13].

Based on the above work, this paper takes feature dimensionality reduction as the starting point to enhance the performance of the image binary classification algorithm. PCA and GLRAM are introduced to improve the ALM-based L1-CSVM model. The experimental results show that, because the number of data features is reduced, solving the L1-CSVM model on the reduced-dimension data significantly reduces the computation time and generally improves the accuracy.

The remainder of this article is organized as follows. In Section 2, PCA and GLRAM are introduced. In Section 3, PCA-L1-CSVM and GLRAM-L1-CSVM are described. In Section 4, the comparative experiments on the MNIST and Fashion-MNIST datasets and their results are described in detail. Finally, the conclusion is given in Section 5.

2. Related Work

This section mainly introduces two dimensionality reduction algorithms: PCA and GLRAM.

2.1. PCA

Given data $T = [x_1, x_2, \ldots, x_m] \in \mathbb{R}^{d \times m}$, suppose we have $m$ samples, each with $d$ attributes. For each centered sample $\tilde{x}_i \in \mathbb{R}^{d}$, a new coordinate system composed of orthonormal basis vectors $\{v_1, v_2, \ldots, v_{d'}\}$ ($d' \le d$), $\|v_i\|_2 = 1$, $v_i^T v_j = 0$ ($i \ne j$), is obtained by a projection transformation. The projection of $\tilde{x}_i$ in the new coordinate system is $s_i = (s_{i1}; s_{i2}; \ldots; s_{id'})$, with $s_{ij} = v_j^T \tilde{x}_i$ and $\hat{x}_i = \sum_{j=1}^{d'} s_{ij} v_j$. Therefore, the following optimization model can be established

$$\min_{\{v_j\}} \; \sum_{i=1}^{m} \Big\| \sum_{j=1}^{d'} s_{ij} v_j - \tilde{x}_i \Big\|_2^2 \quad \text{s.t.} \;\; \|v_i\|_2 = 1, \; v_i^T v_j = 0 \; (i \ne j) \qquad (1)$$

The orthonormal coordinate system is obtained by solving (1), from which the low-dimensional feature data $s_i = (s_{i1}; s_{i2}; \ldots; s_{id'}) \in \mathbb{R}^{d'}$ are obtained. Thus, the original high-dimensional dataset $T$ is transformed into the reduced-dimension dataset $T_r = \{(r_i, y_i)\}_{i=1}^{m} \subset \mathbb{R}^{d'} \times \{\pm 1\}$.
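For illustration, the following Python/NumPy sketch shows one standard way to carry out the projection in (1) via an eigendecomposition of the covariance matrix of the centered data. The function name pca_reduce and its interface are our own illustrative choices, not part of the paper (whose experiments were implemented in MATLAB).

```python
import numpy as np

def pca_reduce(X, d_prime):
    """Reduce m samples of dimension d to d_prime principal components.

    X       : (m, d) data matrix, one sample per row.
    d_prime : target dimension d' <= d.
    Returns (S, V, mean): S is the (m, d_prime) matrix of projected samples,
    V holds the d_prime orthonormal basis vectors as columns, and mean is
    the vector used for centering.
    """
    mean = X.mean(axis=0)
    X_tilde = X - mean                       # centralize the data
    cov = X_tilde.T @ X_tilde / X.shape[0]   # covariance of centered samples
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigendecomposition (ascending)
    order = np.argsort(eigvals)[::-1][:d_prime]
    V = eigvecs[:, order]                    # orthonormal basis {v_1, ..., v_d'}
    S = X_tilde @ V                          # s_ij = v_j^T x_tilde_i
    return S, V, mean
```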

2.2. GLRAM

GLRAM minimizes the reconstruction error between the new matrices and the original matrices by reducing the dimension of matrix data, and uses a set of low-rank matrices to approximate the original matrix set, thereby compressing the data. Given data $T = \{(X_i, y_i)\}_{i=1}^{m} \subset \mathbb{R}^{r \times c} \times \{\pm 1\}$, suppose we have $m$ samples, each with $r \times c$ attributes. We want to find two column-orthogonal transformation matrices $L \in \mathbb{R}^{r \times q_1}$ and $P \in \mathbb{R}^{c \times q_2}$ and matrices $M_i \in \mathbb{R}^{q_1 \times q_2}$, $i = 1, \ldots, m$, such that $L M_i P^T$ is as close as possible to $X_i$.

The following optimization model is established

$$\min_{\substack{L \in \mathbb{R}^{r \times q_1}: \, L^T L = I_{q_1} \\ P \in \mathbb{R}^{c \times q_2}: \, P^T P = I_{q_2} \\ M_i \in \mathbb{R}^{q_1 \times q_2}, \, i = 1, \ldots, m}} \; \sum_{i=1}^{m} \left\| X_i - L M_i P^T \right\|_F^2 \qquad (2)$$

An iterative method is adopted. First fix $P$, form $M_L = \sum_{i=1}^{m} X_i P P^T X_i^T$, and take the eigenvectors corresponding to the $q_1$ largest eigenvalues of $M_L$ as the columns of $L \in \mathbb{R}^{r \times q_1}$. Then fix $L$, form $M_P = \sum_{i=1}^{m} X_i^T L L^T X_i$, and take the eigenvectors corresponding to the $q_2$ largest eigenvalues of $M_P$ as the columns of $P \in \mathbb{R}^{c \times q_2}$. Finally, solve $M_i = L^T X_i P$. Repeat until the stopping requirement is met, which yields the reduced-dimension dataset $\{(M_i, y_i)\}_{i=1}^{m} \subset \mathbb{R}^{q_1 \times q_2} \times \{\pm 1\}$. In this paper, each sample after dimensionality reduction is stretched into a column vector to form a sample set $T_r = \{(r_i, y_i)\}_{i=1}^{m} \subset \mathbb{R}^{d'} \times \{\pm 1\}$ as in Section 2.1.
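As a concrete illustration of this alternating scheme, here is a minimal NumPy sketch of the GLRAM iteration described above. The fixed iteration count, the initialization of P, and the function name glram_reduce are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def glram_reduce(Xs, q1, q2, n_iter=20):
    """Generalized low rank approximation of a list of (r, c) matrices.

    Xs : list of m arrays of shape (r, c).
    Returns (L, P, Ms): L is (r, q1), P is (c, q2), both column-orthonormal,
    and Ms[i] = L.T @ Xs[i] @ P is the (q1, q2) reduced representation.
    """
    r, c = Xs[0].shape
    P = np.eye(c)[:, :q2]                      # initialize P with orthonormal columns
    for _ in range(n_iter):
        # Fix P: L spans the top-q1 eigenvectors of M_L = sum_i X_i P P^T X_i^T
        M_L = sum(X @ P @ P.T @ X.T for X in Xs)
        w, v = np.linalg.eigh(M_L)
        L = v[:, np.argsort(w)[::-1][:q1]]
        # Fix L: P spans the top-q2 eigenvectors of M_P = sum_i X_i^T L L^T X_i
        M_P = sum(X.T @ L @ L.T @ X for X in Xs)
        w, v = np.linalg.eigh(M_P)
        P = v[:, np.argsort(w)[::-1][:q2]]
    Ms = [L.T @ X @ P for X in Xs]             # reduced (q1, q2) representations
    return L, P, Ms
```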

3. Proposed Model

The algorithms proposed in this section are based on reference [13]. Two improved models are introduced: PCA-L1-CSVM and GLRAM-L1-CSVM. The first model uses the dataset $T = \{(x_i, y_i)\}_{i=1}^{m} \subset \mathbb{R}^{d} \times \{\pm 1\}$; the second uses the dataset $T = \{(X_i, y_i)\}_{i=1}^{m} \subset \mathbb{R}^{r \times c} \times \{\pm 1\}$. Let $A = [x_1^+, \ldots, x_{m_1}^+]$, $B = [x_1^-, \ldots, x_{m_2}^-]$ and $X = [A, B]$ denote the matrix of positive samples, the matrix of negative samples, and the matrix of all samples, respectively, and let $I_1 = \{i \in \{1, \ldots, m_1\}: y_i = 1\}$ and $I_2 = \{j \in \{1, \ldots, m_2\}: y_j = -1\}$ denote the index sets of the positive and negative data.

3.1. PCA-L1-CSVM

PCA-L1-CSVM uses the reduced-dimension dataset $T_r = \{(r_i, y_i)\}_{i=1}^{m} \subset \mathbb{R}^{d'} \times \{\pm 1\}$. The following model is constructed

$$\begin{aligned}
\min_{w, b, r_i, v_j, \xi_i} \;\; & \frac{1}{2}\|w\|^2 + C \sum_{i \in I} \xi_i \\
\text{s.t.} \;\; & y_i (w^T r_i + b) \ge 1 - \xi_i, \;\; \xi_i \ge 0, \;\; i = 1, \ldots, m, \\
& \min_{r_i} \sum_{i=1}^{m} \Big\| \sum_{j=1}^{d'} r_{ij} v_j - \tilde{x}_i \Big\|_2^2 \;\; \text{s.t.} \;\; \|v_j\|_2 = 1, \; v_i^T v_j = 0 \; (i \ne j).
\end{aligned} \qquad (3)$$

where $C > 0$ is a penalty parameter and the second condition of the constraints applies PCA dimensionality reduction to the original dataset. First, PCA is used to compute the reduced-dimension data, and then the model is written as the unconstrained model

$$\min_{w, b} \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{m} \max\left(1 - y_i (w^T r_i + b), \, 0\right) \qquad (4)$$

Because the bias has little effect on model performance, it is often absorbed in recent studies: set $r_i^T \leftarrow [r_i^T, 1] \in \mathbb{R}^{d'+1}$ and $w^T \leftarrow [w^T, b] \in \mathbb{R}^{d'+1}$; for convenience, $d'+1$ is still denoted by $d'$ in this section. Model (4) is then rewritten as the unconstrained model

$$\min_{w} \; f(w) := \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{m} \max\left(1 - y_i w^T r_i, \, 0\right) \qquad (5)$$

Model (5) is expressed in matrix form

$$\min_{w} \; \frac{1}{2}\|w\|^2 + C \left\| \max\left(0, \, Bw + d\right) \right\|_1 \qquad (6)$$

where

$$B = \begin{bmatrix} -y_1 r_1^T \\ \vdots \\ -y_m r_m^T \end{bmatrix} \in \mathbb{R}^{m \times d'}, \qquad d = \begin{bmatrix} 1 \\ \vdots \\ 1 \end{bmatrix} \in \mathbb{R}^{m}.$$

Introducing the variable $s \in \mathbb{R}^{m}$, we obtain the following constrained optimization problem

$$\min_{w, s} \; \frac{1}{2}\|w\|^2 + P(s) \quad \text{s.t.} \;\; s = Bw + d, \qquad (7)$$

where $P(s) = C \left\| \max(0, s) \right\|_1$.

Consider the augmented Lagrangian function of model (7):

$$L_{\sigma}(w, s, \lambda) = \frac{1}{2}\|w\|^2 + P(s) - \lambda^T (s - Bw - d) + \frac{\sigma}{2} \left\| s - Bw - d \right\|^2$$

At iteration k, solve

$$\min_{w, s} \; L_{\sigma_k}(w, s, \lambda^k), \qquad (8)$$

to obtain $(w^{k+1}, s^{k+1})$, and then update the Lagrange multiplier $\lambda^{k+1}$.
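The paper defers the details of the subproblem solutions to [13]. Purely as an illustration, the sketch below shows one plausible way an ALM iteration for (7) could be organized with a fixed penalty $\sigma$: an exact $w$-update (a linear system), an elementwise proximal $s$-update for $P(s) = C\|\max(0, s)\|_1$ obtained by completing the square in $L_\sigma$ for fixed $w$, and a multiplier update. All names, the fixed $\sigma$, and the iteration count are our own assumptions, not the authors' implementation.

```python
import numpy as np

def alm_l1_csvm(R, y, C=1.0, sigma=1.0, n_iter=100):
    """Approximately solve model (7): min_w 0.5||w||^2 + C ||max(0, Bw + d)||_1.

    R : (m, d') matrix of reduced-dimension samples, one per row;
        a column of ones is appended so the bias is absorbed into w.
    y : (m,) labels in {+1, -1}.
    """
    m = R.shape[0]
    R_aug = np.hstack([R, np.ones((m, 1))])      # absorb the bias term b
    B = -y[:, None] * R_aug                      # row i of B is -y_i r_i^T
    d = np.ones(m)
    n = B.shape[1]
    w, s, lam = np.zeros(n), np.zeros(m), np.zeros(m)
    A = np.eye(n) + sigma * B.T @ B              # system matrix of the w-subproblem
    for _ in range(n_iter):
        # w-subproblem: (I + sigma B^T B) w = B^T (sigma (s - d) - lam)
        w = np.linalg.solve(A, B.T @ (sigma * (s - d) - lam))
        # s-subproblem: elementwise prox of C*max(0, .)/sigma at u = Bw + d + lam/sigma
        u = B @ w + d + lam / sigma
        s = np.where(u < 0, u, np.maximum(u - C / sigma, 0.0))
        # multiplier update on the residual of the constraint s = Bw + d
        lam = lam - sigma * (s - B @ w - d)
    return w
```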

3.2. GLRAM-L1-CSVM

GLRAM-L1-CSVM uses the reduced-dimension dataset $\{(M_i, y_i)\}_{i=1}^{m} \subset \mathbb{R}^{q_1 \times q_2} \times \{\pm 1\}$. The following model is constructed

$$\begin{aligned}
\min_{w, b, M_i, L, P, \xi_i} \;\; & \frac{1}{2}\|w\|^2 + C \sum_{i \in I} \xi_i \\
\text{s.t.} \;\; & y_i (w^T r_i + b) \ge 1 - \xi_i, \;\; \xi_i \ge 0, \;\; i = 1, \ldots, m, \\
& \min_{M_i} \sum_{i=1}^{m} \left\| X_i - L M_i P^T \right\|_F^2 \;\; \text{s.t.} \;\; L^T L = I_{q_1}, \; P^T P = I_{q_2}.
\end{aligned} \qquad (9)$$

where $C > 0$ is a penalty parameter. Different from the second constraint condition in (3), the second constraint of (9) uses GLRAM to reduce the dimension of the original data. First, GLRAM is applied, and the resulting reduced matrices are stretched into column vectors $r_i$. The model is then transformed into an unconstrained optimization model of the form (4) and solved by the same method as in Section 3.1.
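Assuming the illustrative helpers sketched earlier (glram_reduce and alm_l1_csvm, both our own constructions), the GLRAM-L1-CSVM pipeline just described could be wired together roughly as follows. The choice q1 = q2 = 10 (100 features) mirrors one of the reduced dimensions used in Section 4, and Xs, y, X_test are placeholder names for the training images, their labels, and a test image.

```python
import numpy as np

# Xs: list of m training images of shape (r, c); y: (m,) labels in {+1, -1}
L, P, Ms = glram_reduce(Xs, q1=10, q2=10)            # 10 x 10 = 100 reduced features
R_train = np.vstack([M.reshape(1, -1) for M in Ms])  # vectorize each reduced matrix
w = alm_l1_csvm(R_train, y, C=1.0)                   # train on the reduced data

def predict(X_test):
    """Classify a test image: project with the learned L, P, vectorize, append 1."""
    r = (L.T @ X_test @ P).reshape(-1)
    return np.sign(np.append(r, 1.0) @ w)
```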

4. Experiment and Result Analysis

In order to test the effectiveness of Algorithm 1 (PCA-L1-CSVM) and GLRAM-L1-CSVM, a series of comparative experiments is conducted on the MNIST and Fashion-MNIST data in this section. Each dataset contains 10 image classes, and each image has 28 × 28 pixels. Eight classes were randomly selected from each dataset and paired into four groups for binary classification tests. The experiments use only the training portion of each dataset and split it into a training set and a test set; the training set sizes are 3000, 4000 and 5000, accounting for 50%, 66% and 83% of the data used, with the rest serving as the test set. Reduced feature dimensions of 81, 100 and 121 were used. All the features mentioned in the tables refer to the features used by the improved algorithms. A sketch of this data preparation is given below.
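The following sketch shows one way to pair eight randomly chosen classes into four binary tasks and split each task into training and test sets of a given size. The sampling and naming details are illustrative assumptions, not the authors' exact protocol.

```python
import numpy as np

def make_binary_tasks(images, labels, train_size, rng=np.random.default_rng(0)):
    """Pair 8 randomly chosen classes into 4 binary tasks and split each task
    into a training set of the given size; the remainder is the test set."""
    classes = rng.choice(10, size=8, replace=False)
    tasks = []
    for a, b in zip(classes[0::2], classes[1::2]):
        mask = np.isin(labels, [a, b])
        X = images[mask].reshape(mask.sum(), -1).astype(float)  # flatten 28x28 images
        y = np.where(labels[mask] == a, 1, -1)                   # label the pair +1 / -1
        perm = rng.permutation(len(y))
        tr, te = perm[:train_size], perm[train_size:]
        tasks.append((X[tr], y[tr], X[te], y[te]))
    return tasks
```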

The experimental platform is MATLAB R2014b running on Windows 10. The device specifications are as follows: the processor is an Intel(R) Core(TM) i7-10700 (2.90 GHz) and the installed RAM is 8 GB.

Algorithm 1. PCA-L1-CSVM.

A bar chart is used to compare the accuracy of the different algorithms in the first two groups of binary classification experiments, and a line chart is used to show the running time of each algorithm in the same two groups of experiments. In each figure, the bars and lines from left to right correspond to reduced dimensions of 81, 100 and 121, respectively. The average results of each algorithm in the last two groups of experiments are reported in the tables. Bold entries are the best results; time is given in seconds and accuracy in %.

4.1. Experimental Results on MNIST Datasets

The experimental results on the MNIST dataset are presented.

According to Figure 1 and Figure 2, the two improved algorithms generally improve the accuracy. According to Figure 3 and Figure 4, the original algorithm takes a long time, while the two improved algorithms significantly shorten the running time. Table 1 also shows that the improved algorithms generally improve the accuracy. Table 2 shows that the running time of the two improved algorithms is about 10% of that of the original algorithm. Comparing the two improved algorithms, GLRAM gives better results than PCA in both accuracy and time.

4.2. Experimental Results on Fashion-MNIST Datasets

Next, the experimental results on the Fashion-MNIST dataset are presented.

It can be seen from Figure 5 and Figure 6 that the two improved algorithms generally improve the accuracy; the accuracy of GLRAM improves markedly and is stable. As can be seen from Figure 7 and Figure 8, the two improved algorithms shorten the running time. Table 3 shows that the improved algorithms improve the accuracy overall, the results of PCA fluctuate, and the results of GLRAM are the best among the three. Table 4 shows that the running time of the two improved algorithms is about 14% of that of the original algorithm. Comparing the two improved algorithms, GLRAM is better than PCA in both accuracy and time.

Figure 1. Precision comparison of the first set of MNIST data.

Figure 2. Precision comparison of the second set of MNIST data.

Figure 3. The running time of the first set of MNIST data.

Figure 4. The running time of the second set of MNIST data.

Figure 5. Precision comparison of the first set of Fashion-MNIST data.

Figure 6. Precision comparison of the second set of Fashion-MNIST data.

Figure 7. The running time of the first set of Fashion-MNIST data.

Figure 8. The running time of the second set of Fashion-MNIST data.

Table 1. Average test accuracy on MNIST data.

Table 2. Average test time on MNIST data.

Table 3. Average test accuracy on Fashion-MNIST data.

Table 4. Average test time on Fashion-MNIST data.

5. Conclusion

In this paper, the ALM is applied to solve two models: PCA-L1-CSVM and GLRAM-L1-CSVM. The main contributions are as follows. First, PCA or GLRAM is introduced to reduce the dimension of the dataset. Then, the reduced-dimension data are classified into two classes using L1-CSVM. Experimental results show that the improved models greatly reduce the running time while generally increasing the accuracy. In the future, we hope to use ALM to solve the primal problems of other models.

Acknowledgements

This work was supported by the Natural Science Foundation of Shandong Province (Grant Nos. ZR2020MA026 and ZR2018BF010).

Conflicts of Interest

The authors declare no conflicts of interest regarding the publication of this paper.

References

[1] Cortes, C. and Vapnik, V. (1995) Support Vector Networks. Machine Learning, 20, 273-297.
https://doi.org/10.1007/BF00994018
[2] Mohammed, A.A., Minhas, R., Wu, Q.M.J. and Sid-Ahmed, M.A. (2011) Human Face Recognition Based on Multidimensional PCA and Extreme Learning Machine. Pattern Recognition, 44, 2588-2597.
https://doi.org/10.1016/j.patcog.2011.03.013
[3] Majumdar, A. (2009) Image Compression by Sparse PCA Coding in Curvelet Domain. Signal, Image and Video Processing, 3, 27-34.
https://doi.org/10.1007/s11760-008-0056-5
[4] Ye, J.P. (2005) Generalized Low Rank Approximations of Matrices. Machine Learning, 61, 167-191.
https://doi.org/10.1007/s10994-005-3561-6
[5] Ahmadi, S. and Rezghi, M. (2020) Generalized Low-Rank Approximation of Matrices Based on Multiple Transformation Pairs. Pattern Recognition, 108, Article ID: 107545.
https://doi.org/10.1016/j.patcog.2020.107545
[6] Wang, H., Lu, X., Hu, Z. and Zheng, W. (2013) Fisher Discriminant Analysis with l1-Norm. IEEE Transactions on Cybernetics, 44, 828-842.
https://doi.org/10.1109/TCYB.2013.2273355
[7] Ye, Q., Zhao, H., Yang, X. and Ye, N. (2017) L1-Norm Distance Minimization Based Fast Robust Twin Support Vector K-Plane Clustering. IEEE Transactions on Neural Networks and Learning Systems, 29, 4494-4503.
https://doi.org/10.1109/TNNLS.2017.2749428
[8] Li, C.H., Shao, Y.H. and Deng, N.Y. (2015) Robust l1-Norm Non-Parallel Proximal Support Vector Machine. Optimization, 65, 1-15.
https://doi.org/10.1080/02331934.2014.994627
[9] Han, D.Y., Tang, Q.H., Zhang, Z.K., Yuan, L. and Li, J. (2021) An Efficient Augmented Lagrange Multiplier Method for Steelmaking and Continuous Casting Production Scheduling. Chemical Engineering Research and Design, 168, 169-192.
https://doi.org/10.1016/j.cherd.2021.01.035
[10] Shi, C.X. and Yang, G.H. (2018) Augmented Lagrange Algorithms for Distributed Optimization over Multi-Agent Networks via Edge-Based Method. Automatica, 94, 55-62.
https://doi.org/10.1016/j.automatica.2018.04.010
[11] Nie, F., Huang, Y., Wang, X. and Huang, H. (2014) New Primal SVM Solver with Linear Computational Cost for Big Data Classifications. Proceedings of the 31st International Conference on Machine Learning, 32, 505-513.
[12] Chauhan, V.K., Dahiya, K. and Sharma, A. (2019) Problem Formulations and Solvers in Linear SVM: A Review. Artificial Intelligence Review, 52, 803-855.
https://doi.org/10.1007/s10462-018-9614-6
[13] Yan, Y.Q. and Li, Q.N. (2019) An Efficient Augmented Lagrangian Method for Support Vector Machine. arXiv e-prints, arXiv:1912.06800.
