Inner Product Laplacian Embedding Based on Semidefinite Programming

This paper proposes an inner product Laplacian embedding algorithm based on semi-definite programming, named the IPLE algorithm. The new algorithm learns a geodesic-distance-based kernel matrix by semi-definite programming under local contraction constraints. The criterion function makes neighboring points on the manifold as close as possible while preserving the geodesic distances between distant points. The IPLE algorithm integrates the advantages of the LE, ISOMAP and MVU algorithms. Comparison experiments on two image datasets, COIL-20 images and USPS handwritten digit images, are performed with LE, ISOMAP, MVU and the proposed IPLE. Experimental results show that the intrinsic low-dimensional coordinates obtained by our algorithm preserve more information, as measured by the fraction of the dominant eigenvalues, and achieve better overall performance in clustering and manifold structure.


Introduction
In the current information age, large quantities of data can be obtained easily, and valuable information is submerged in large-scale datasets. It is urgently necessary to find the intrinsic laws of these datasets and predict future trends. One of the central problems in machine learning, computer vision and pattern recognition is to develop appropriate representations for complex data. Manifold learning assumes that the observed data lie on or close to an intrinsic low-dimensional manifold embedded in a high-dimensional Euclidean space. The main goal of manifold learning is to find the intrinsic low-dimensional manifold structure of a high-dimensional observed dataset. Several manifold learning algorithms have been proposed, such as ISOmetric feature MAPping (ISOMAP) [1], Laplacian Eigenmaps (LE) [2], and Maximum Variance Unfolding (MVU) [3].
ISOMAP isometrically preserves the geodesic distances between any two points, but the centered geodesic distance matrix constructed by ISOMAP from finite data may have negative eigenvalues that are simply neglected [4], and it does not consider the clustering requirement in the intrinsic low-dimensional space. LE makes neighboring points in Euclidean space stay as close as possible, so a natural clustering can emerge in the low-dimensional space [5]. However, on one hand, LE cannot guarantee that points distant in the high-dimensional Euclidean space stay distant in the intrinsic low-dimensional space; on the other hand, the smallest d eigenvalues obtained in the spectral decomposition step of LE are so small and close together that the obtained intrinsic space is ill-posed and unstable. MVU finds a low-dimensional embedding of the observed data that preserves local Euclidean distances while maximizing the global variance [6], but natural clustering is not considered. We propose a new manifold learning algorithm, named Inner Product Laplacian Embedding (IPLE), based on the following four considerations: 1) geodesic distances along the manifold are more meaningful than Euclidean distances; 2) the geodesic-distance-based kernel matrix should be guaranteed to be positive semi-definite; 3) the requirement of natural clustering is imposed as a constraint in the intrinsic low-dimensional space; 4) semi-definite programming is applied to optimize the objective function under the positive semi-definite constraint.
The rest of this paper is organized as follows: Section 2 introduces three classical manifold learning algorithms: MVU, ISOMAP and LE. In Section 3, we analyze the principle of our IPLE algorithm and describe its detailed procedures. Experimental results on the COIL-20 image library and the USPS handwritten digits dataset are shown in Section 4. Finally, we give some concluding remarks and future work in Section 5.

Maximum Variance Unfolding (MVU)
Maximum variance unfolding (MVU) is a promising manifold learning algorithm proposed by K. Q. Weinberger and L. K. Saul; it was referred to as semidefinite embedding in related earlier papers [7][8][9][10]. MVU is classified as a nonlinear dimensionality reduction algorithm that extends the principles of two linear methods, PCA and MDS [6]. The principle of MVU is to find a low-dimensional embedding of the observed data that preserves local Euclidean distances while maximizing the global variance. The objective function and the constraints are reformulated as a semi-definite programming problem, and a Gram (inner product) matrix K is solved for by a semi-definite programming tool. Finally, the low-dimensional embedding is obtained by decomposing the inner product matrix K. As in PCA, a large gap between the d-th and (d+1)-th eigenvalues of K may be used to estimate the intrinsic dimensionality d. The d-dimensional coordinates are formed by multiplying the d largest eigenvectors by the square roots of the corresponding d eigenvalues of K. The detailed procedures of MVU are as follows:
MVU Algorithm:
Input: the observed data; k, the number of nearest neighbors based on Euclidean distance.
Output: the low-dimensional coordinates.
Step 1. Select the k nearest neighbors of each point and construct the Euclidean neighborhood graph G that connects each point to its k nearest neighbors.
Step 2. Compute the inner product matrix K that is centered on the origin and preserves the Euclidean distances of all neighborhood edges in graph G. The matrix K is obtained by solving the following semi-definite programming problem (SDP):
maximize tr(K)
s.t. K ⪰ 0; Σ_ij K_ij = 0; K_ii − 2K_ij + K_jj = ||x_i − x_j||² for all neighbor pairs (i, j) in G.
Step 3. Compute the low-dimensional embedding from the top eigenvectors and eigenvalues of the inner product matrix K.
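The constraints of this SDP can be sanity-checked numerically: for any centered point set, the Gram matrix satisfies the local-isometry, centering, and positive semi-definiteness constraints exactly. A minimal numpy sketch (illustrative only, with toy data; it does not perform the SDP solve itself):

```python
import numpy as np

# For a centered Gram matrix K = Xc @ Xc.T, the MVU constraints hold exactly:
#   K_ii - 2*K_ij + K_jj = ||x_i - x_j||^2   (local isometry)
#   sum_ij K_ij = 0                           (centering)
#   K is positive semi-definite               (Gram matrix)
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))            # 6 toy points in R^3
Xc = X - X.mean(axis=0)                # center on the origin
K = Xc @ Xc.T                          # Gram (inner product) matrix

i, j = 1, 4
lhs = K[i, i] - 2 * K[i, j] + K[j, j]
print(np.isclose(lhs, np.sum((X[i] - X[j]) ** 2)))  # True
print(np.isclose(K.sum(), 0.0))                      # True
print(np.all(np.linalg.eigvalsh(K) > -1e-10))        # True: K >= 0
```

In the actual MVU algorithm, K is a free SDP variable subject to these constraints, rather than being built from known coordinates.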

ISOmetric Feature MAPping (ISOMAP)
ISOMAP is a well-known manifold learning method proposed by J. B. Tenenbaum, V. de Silva and J. C. Langford [1]. Intuitively, the geodesic distance between a pair of points on a manifold is the distance measured along the manifold. Since geodesic distance reflects the underlying geometry of the data, data embedding using geodesic distance is expected to unfold twisted data manifolds [11], so geodesic distances are more meaningful than traditional Euclidean distances. The main idea of the ISOMAP algorithm is: first, determine which points are neighbors in the input space X, where the usual trick is to connect each point to its k nearest neighbors to construct the Euclidean neighborhood graph; second, estimate the geodesic distances between all pairs of points on the manifold M by computing their shortest-path distances on the connected Euclidean neighborhood graph; finally, apply the classical MDS algorithm to the matrix of geodesic distances, computing an embedding of the observed data in a lower-dimensional space that best preserves the manifold's estimated intrinsic geometry in terms of geodesic distance. The key steps of ISOMAP are as follows:
ISOMAP Algorithm:
Step 1. Construct the Euclidean neighborhood graph G by connecting each point to its k nearest neighbors.
Step 2. Compute the geodesic distance matrix GD by using Dijkstra's or Floyd's algorithm on the neighborhood graph G.
Step 3. Compute the low-dimensional embedding by applying the MDS algorithm to the geodesic distance matrix GD.
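The three steps can be sketched compactly with numpy/scipy, where scipy's `shortest_path` supplies the Dijkstra step; the dataset, parameter values, and function name here are illustrative assumptions, not the paper's setup:

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path
from scipy.spatial.distance import squareform, pdist

def isomap(X, k=5, d=2):
    """Minimal ISOMAP sketch: kNN graph -> shortest paths -> classical MDS."""
    n = X.shape[0]
    D = squareform(pdist(X))                 # pairwise Euclidean distances
    G = np.full((n, n), np.inf)              # inf marks "no edge"
    for i in range(n):                       # keep only k-nearest-neighbor edges
        idx = np.argsort(D[i])[1:k + 1]
        G[i, idx] = D[i, idx]
        G[idx, i] = D[idx, i]
    GD = shortest_path(G, method="D")        # Dijkstra: geodesic estimates
    # Classical MDS on the geodesic distance matrix
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (GD ** 2) @ J             # doubly centered "inner product"
    w, V = np.linalg.eigh(B)
    w, V = w[::-1], V[:, ::-1]               # sort eigenvalues descending
    return V[:, :d] * np.sqrt(np.clip(w[:d], 0, None))

# Toy example: points on an arc; ISOMAP should roughly unroll it to a line.
t = np.linspace(0, np.pi, 30)
X = np.c_[np.cos(t), np.sin(t)]
Y = isomap(X, k=4, d=1)
print(Y.shape)   # (30, 1)
```

For the arc, the estimated geodesic distances approximate arc length, so the recovered one-dimensional coordinate is nearly an affine function of the angle t.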

Laplacian Eigenmaps (LE)
Laplacian Eigenmaps (LE) is a classical manifold learning algorithm with a solid theoretical foundation, proposed by Belkin and Niyogi [2,12]. Its intuitive idea is to make points that are neighbors in Euclidean space stay as close as possible in the low-dimensional space. So one of LE's main advantages is that a natural clustering can emerge in the low-dimensional space.
The LE algorithm includes building the neighborhood graph, choosing the weights of the edges in the neighborhood graph, eigen-decomposing the graph Laplacian, and forming the low-dimensional embedding. The key steps are as follows:
LE Algorithm:
Step 1. Construct the neighborhood graph G by finding the k nearest neighbors of each data point x_i ∈ X and connecting the corresponding edges.
Step 2. Compute the neighborhood similarities by choosing the heat kernel, W_ij = exp(−||x_i − x_j||²/t) if x_i and x_j are connected in G and 0 otherwise, or the simple 0-1 mode.
Step 3. Compute the low-dimensional embedding by minimizing the objective Σ_ij W_ij ||y_i − y_j||², which reduces to the generalized eigenvalue problem L y = λ D y, where the Laplacian matrix L = D − W and D is the diagonal degree matrix with D_ii = Σ_j W_ij. The embedding is given by the eigenvectors of the d smallest nonzero eigenvalues.
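The steps above can be sketched in a few lines of numpy/scipy; the two-blob dataset, parameter values, and function name are illustrative assumptions, not the paper's setup:

```python
import numpy as np
from scipy.linalg import eigh
from scipy.spatial.distance import squareform, pdist

def laplacian_eigenmaps(X, k=5, d=2, t=1.0):
    """Minimal LE sketch: heat-kernel weights on a kNN graph,
    then the generalized eigenproblem L y = lambda D y."""
    n = X.shape[0]
    E = squareform(pdist(X)) ** 2            # squared Euclidean distances
    W = np.zeros((n, n))
    for i in range(n):
        idx = np.argsort(E[i])[1:k + 1]      # k nearest neighbors of x_i
        W[i, idx] = np.exp(-E[i, idx] / t)   # heat-kernel similarity
    W = np.maximum(W, W.T)                   # symmetrize the graph
    D = np.diag(W.sum(axis=1))
    L = D - W                                # graph Laplacian
    w, V = eigh(L, D)                        # generalized eigenproblem
    return V[:, 1:d + 1]                     # skip the trivial constant vector

# Two well-separated blobs: a natural clustering emerges in 1-D.
rng = np.random.default_rng(1)
A = rng.normal(0.0, 0.2, size=(20, 2))
B = rng.normal(3.0, 0.2, size=(20, 2))
X = np.vstack([A, B])
Y = laplacian_eigenmaps(X, k=25, d=1, t=5.0)
print(Y[:20, 0].mean() * Y[20:, 0].mean() < 0)   # True: opposite sides
```

Here k = 25 is chosen larger than either blob so the graph stays connected; the first nontrivial eigenvector then places the two blobs on opposite sides of the origin, illustrating LE's natural-clustering property.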

IPLE Algorithm Analysis
Given the observed high-dimensional data, the goal of manifold learning is to obtain the intrinsic low-dimensional coordinates. Like other manifold learning algorithms, the proposed algorithm is based on a simple geometric intuition: while preserving the geodesic distances between far points on the data manifold, the low-dimensional coordinates of neighboring data points are contracted as close as possible on the intrinsic low-dimensional manifold. In fact, our goal is to compute a geodesic-distance-based kernel matrix under the requirement of natural clustering, which fully applies the advantages of the LE, ISOMAP and MVU algorithms.
Let NG be the indicator matrix of the k1-nearest-neighbor graph based on geodesic distance, and let FG be the indicator matrix of the k2-farthest-point graph based on geodesic distance. The parameters k1 and k2 respectively play the local and the global measurement roles on the manifold. The two indicator matrices are defined as follows: NG_ij = 1 if x_i and x_j are among each other's k1 nearest neighbors based on geodesic distance, and NG_ij = 0 otherwise; FG_ij is defined analogously for the k2 farthest points based on geodesic distance. Let W be the neighborhood similarity matrix, with elements W_ij = NG_ij · exp(−GD_ij²/t), where GD denotes the geodesic distance matrix and t is the heat-kernel parameter. Neighboring points are contracted as close as possible in the intrinsic low-dimensional space, while the geodesic distances of the farther points on the manifold are preserved. According to this intuition, the corresponding objective function is to minimize Σ_ij W_ij ||y_i − y_j||² subject to ||y_i − y_j||² = GD_ij² for all pairs (i, j) with FG_ij = 1, where GD denotes the geodesic distance matrix.
Theorem 1. Let Y denote the low-dimensional coordinates, and let K be the inner product matrix in the low-dimensional space with elements K_ij = y_i^T y_j. Then Σ_ij W_ij ||y_i − y_j||² = 2 tr(LK).
Both L and K are symmetric matrices. In Theorem 1, the weighted distances between neighboring points are converted into the trace of the product of the Laplacian matrix and the inner product matrix.
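The trace identity of Theorem 1 is easy to verify numerically on arbitrary toy data (an illustrative check, not part of the algorithm):

```python
import numpy as np

# Numerical check of the Theorem 1 identity (K = Y Y^T, L = D - W):
#   sum_ij W_ij ||y_i - y_j||^2 = 2 tr(L K)
rng = np.random.default_rng(2)
n, d = 8, 2
Y = rng.normal(size=(n, d))                  # toy low-dim coordinates
W = rng.uniform(size=(n, n))
W = (W + W.T) / 2                            # symmetric similarity matrix
np.fill_diagonal(W, 0.0)
D = np.diag(W.sum(axis=1))
L = D - W
K = Y @ Y.T                                  # inner product (Gram) matrix

lhs = sum(W[i, j] * np.sum((Y[i] - Y[j]) ** 2)
          for i in range(n) for j in range(n))
rhs = 2 * np.trace(L @ K)
print(np.isclose(lhs, rhs))                  # True
```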
For translation invariance, the low-dimensional coordinates are constrained to be centered on the origin, that is, Σ_i y_i = 0, which is equivalent to Σ_ij K_ij = 0. The inner product matrix K is a Gram matrix, so K must be constrained to be positive semi-definite (K ⪰ 0). Collecting the objective function and the constraints of the above optimization in terms of the inner product matrix K, a new semi-definite programming problem is described as follows:

min tr(LK)
s.t. Σ_ij K_ij = 0,  (2)
K_ii − 2K_ij + K_jj = GD_ij² for all (i, j) with FG_ij = 1, and K ⪰ 0,  (3)
where GD_ij denotes the geodesic distance between the observed points x_i and x_j, L denotes the Laplacian matrix, and FG is the indicator matrix of the k2-farthest-point graph based on geodesic distance.
The inner product matrix K learned by semi-definite programming (SDP) represents the covariance structure of the low-dimensional coordinates, and the output can be recovered by matrix diagonalization. Let λ_1 ≥ λ_2 ≥ ... ≥ λ_N be the sorted eigenvalues of K, and let v_i denote the eigenvector corresponding to the eigenvalue λ_i. In this eigenvalue spectrum, a large gap between the d-th and (d+1)-th eigenvalues suggests that the outputs lie on or near a low-dimensional intrinsic manifold of dimension d (that is, the intrinsic dimension of the obtained manifold is taken to be d). According to the above analysis, we obtain the following theorem:
Theorem 2. If the learned matrix K represents the inner product matrix of the low-dimensional coordinates obtained by optimizing the cost function of the semidefinite programming problem, then the low-dimensional embedding coordinates Y can be recovered from the top eigenvectors of the inner product matrix K, that is, the i-th column of Y is given by √λ_i v_i for i = 1, ..., d.
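The recovery step of Theorem 2 can be sketched as follows; the hidden coordinates, their scales, and the eigen-gap heuristic for picking d are illustrative assumptions:

```python
import numpy as np

# Sketch of Theorem 2: given an inner product matrix K, pick the intrinsic
# dimension d at the largest eigen-gap and recover Y = V_d * sqrt(lambda_d).
rng = np.random.default_rng(3)
Q, _ = np.linalg.qr(rng.normal(size=(10, 2)))   # orthonormal directions
Y_true = Q * np.array([3.0, 2.8])               # hidden 2-D coordinates
K = Y_true @ Y_true.T                           # what the SDP would return

w, V = np.linalg.eigh(K)
w, V = w[::-1], V[:, ::-1]                      # eigenvalues descending
d = int(np.argmax(w[:-1] - w[1:])) + 1          # largest gap -> intrinsic dim
Y = V[:, :d] * np.sqrt(np.clip(w[:d], 0, None))

print(d)                                        # 2
print(np.allclose(Y @ Y.T, K, atol=1e-8))       # True: K is reproduced
```

The recovered Y agrees with the hidden coordinates only up to rotation and reflection, since K = Y Yᵀ is invariant under orthogonal transformations of Y.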

The Basic Procedures of IPLE
In this paper, the new manifold learning algorithm finds the inner product matrix of the low-dimensional coordinates by semi-definite programming, and the coefficient matrix in the objective function is the Laplacian matrix. So we refer to the new algorithm as Inner Product Laplacian Embedding (IPLE). The basic procedures of IPLE are summarized as follows:
IPLE Algorithm:
Input:
The observed dataset.
k The number of nearest neighbors based on Euclidean distance.
k1 The number of nearest neighbors based on geodesic distance.
k2 The number of farthest points based on geodesic distance.
Output: the low-dimensional coordinates Y.
Step 1. Construct the graph G that connects each input to its k nearest neighbors, where the edge lengths between adjacent points are Euclidean distances.
Step 2. Compute the geodesic distance matrix GD by computing shortest paths on the adjacency graph G with Dijkstra's algorithm [13].
Step 3. Construct the k1-nearest-neighbor graph NG and the k2-farthest-point graph FG based on the geodesic distance matrix GD.
Step 4. Compute the similarity matrix W on the k1-nearest-neighbor graph NG: W_ij = NG_ij · exp(−GD_ij²/t), where GD is the geodesic distance matrix.
Step 5. Compute the inner product matrix K by solving the semi-definite programming problem shown in Equation (9).
Step 6. Compute the low-dimensional embedding from the top eigenvectors of the inner product matrix K: the d-dimensional embedding coordinates are computed from the d largest eigenvalues and the corresponding eigenvectors of the matrix K.
Note that NG is the indicator matrix of the k1-nearest-neighbor graph based on geodesic distances, and FG is the indicator matrix of the k2-farthest-point graph based on geodesic distances. The Laplacian matrix is L = D − W, where W is the similarity matrix on the graph NG and D is the diagonal matrix with D_ii = Σ_j W_ij.
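Step 3 can be sketched as a small helper that builds NG and FG from a geodesic distance matrix; the function name and the symmetrization convention for FG are our assumptions, while the mutual-neighbor convention for NG follows the definition in Section 3:

```python
import numpy as np

# Hypothetical helper (not the authors' code): build the NG and FG
# indicator matrices from a geodesic distance matrix GD.
def build_indicator_graphs(GD, k1, k2):
    n = GD.shape[0]
    NG = np.zeros((n, n), dtype=int)
    FG = np.zeros((n, n), dtype=int)
    for i in range(n):
        order = np.argsort(GD[i])
        NG[i, order[1:k1 + 1]] = 1       # k1 geodesic nearest neighbors
        FG[i, order[-k2:]] = 1           # k2 geodesic farthest points
    NG = NG & NG.T                       # keep only mutual nearest neighbors
    FG = FG | FG.T                       # symmetrize far-pair constraints
    return NG, FG

# 5 points on a line: geodesic distances are just coordinate gaps here.
pos = np.array([0.0, 1.0, 2.0, 3.0, 10.0])
GD = np.abs(pos[:, None] - pos[None, :])
NG, FG = build_indicator_graphs(GD, k1=1, k2=1)
print(NG.sum(), FG.sum())   # 2 8
```

With k1 = 1, only the pair (0, 1) is mutually nearest; with k2 = 1, the outlier at 10 is the farthest point of every other sample, so all far-pair constraints involve it.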

Experiments
To evaluate our IPLE algorithm, several comparison experiments on two datasets were performed with LE, ISOMAP, MVU and IPLE. The first dataset is from the USPS handwritten digits dataset [14], and the second is from the Columbia Object Image Library (COIL-20) [15]. Experimental results on 2-dimensional visualization, the eigen-spectra corresponding to the intrinsic low-dimensional coordinates, and the clustering properties are compared.
As a measure of the information capacity of the low-dimensional coordinates, the ratio of each eigenvalue to the trace is used. In particular, if the metric matrix is not positive semi-definite, the trace is replaced by the sum of the absolute values of the eigenvalues.
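This eigenvalue-fraction measure can be sketched as follows; the function name and the toy matrices are illustrative assumptions:

```python
import numpy as np

# Share of each eigenvalue in the trace; for an indefinite metric matrix,
# the denominator is the sum of absolute eigenvalues instead of the trace.
def eigenvalue_fractions(M):
    w = np.linalg.eigvalsh(M)[::-1]          # eigenvalues, descending
    denom = np.sum(np.abs(w)) if np.any(w < -1e-10) else np.sum(w)
    return w / denom

M = np.diag([4.0, 3.0, 2.0, 1.0])            # toy PSD "metric matrix"
print(eigenvalue_fractions(M))               # [0.4 0.3 0.2 0.1]
```

For an indefinite matrix such as diag(3, −1), the fractions become 0.75 and −0.25, so the negative eigenvalue's contribution remains visible instead of being silently dropped.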
To compare low-dimensional clustering performance, the following experiments use the ratio of the between-class scatter distance (S_b) to the within-class scatter distance. In general, the larger the ratio, the higher the quality of the clustering. The ratio θ quantifying clustering quality is defined as θ = Σ_j n_j ||ȳ_j − ȳ||² / Σ_j Σ_i ||y_ij − ȳ_j||², where ȳ is the mean of all low-dimensional coordinate points, ȳ_j is the centroid of the j-th class, n_j denotes the number of samples in the j-th class, and y_ij denotes the i-th low-dimensional coordinate point of the j-th class.
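A minimal sketch of this ratio, assuming the standard between/within scatter definitions (the function name and toy embedding are our assumptions):

```python
import numpy as np

# Clustering-quality ratio theta: between-class scatter over
# within-class scatter of the low-dimensional coordinates.
def clustering_ratio(Y, labels):
    Y = np.asarray(Y, dtype=float)
    mean_all = Y.mean(axis=0)
    between, within = 0.0, 0.0
    for c in np.unique(labels):
        Yc = Y[labels == c]
        centroid = Yc.mean(axis=0)
        between += len(Yc) * np.sum((centroid - mean_all) ** 2)
        within += np.sum((Yc - centroid) ** 2)
    return between / within

# Tight, well-separated classes yield a large theta.
Y = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 0.0], [5.1, 0.0]])
labels = np.array([0, 0, 1, 1])
print(clustering_ratio(Y, labels))
```

For this toy embedding the class centroids sit 5 apart while the within-class spread is tiny, so θ is large (on the order of thousands), matching the intuition that a larger θ indicates better clustering.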

Experiments on USPS Handwritten Digits Dataset
The original dataset is from the well-known US Postal Service (USPS) handwritten digits recognition corpus [14]. It contains 11000 normalized grey images of size 16 × 16 pixels, with 1100 images for each of the ten digit classes 0-9. For simplicity, our experimental dataset (named the USPS-01 dataset) consists of 600 images: the first 300 samples of each of the two digit classes "0" and "1". Each image is represented by a 256-dimensional vector, with pixel grey-values scaled to the interval from 0 to 1. These six hundred 256-dimensional vectors corresponding to the training images were used to find the intrinsic low-dimensional coordinates by applying LE, ISOMAP, MVU and our IPLE.
For constructing the connected graph based on Euclidean distance, the neighborhood parameter was set to k = 8 in the first step of the four algorithms. Both LE and our IPLE algorithm used the same heat parameter. In the third step of our IPLE algorithm, the parameter k1 of the k1-nearest-neighbor graph was also fixed. In addition, both the IPLE and MVU algorithms used the semi-definite programming tool CSDP v4.8 [16] to compute the inner product matrix K, where the number of iterations of the CSDP tool was set to 50 in these experiments.
Figure 1 shows the two-dimensional embeddings of the 600 images of the handwritten digits "0" and "1". Some original handwritten digit images are marked at their corresponding 2-dimensional coordinates, where the change of slant and stroke thickness can be observed. The two-dimensional manifold structure obtained by the IPLE algorithm shows a higher degree of separation between the two digit classes, while the changing laws of slant and stroke thickness are preserved, as shown in Figure 1(d).
The second row in Table 1 shows the ratio of between-class distance to within-class distance for the two-dimensional coordinates of the two handwritten digit classes ("0" and "1"). The ratio of IPLE is the largest among the four algorithms, which indicates that the low-dimensional structure obtained by IPLE has better clustering performance than LE, ISOMAP and MVU. The visualization experiments on the USPS-01 dataset show that IPLE obtains better overall performance in clustering and manifold structure, as seen in Figure 1 and Table 1. Figure 2 compares the eigenvalue spectra obtained by applying the LE, ISOMAP, MVU and IPLE algorithms. For IPLE, there are four dominant eigenvalues, that is, the intrinsic dimension is four. For LE, ISOMAP and MVU, more than four dominant eigenvalues indicate that their corresponding low-dimensional coordinates contain some noise. In the following COIL-TWO experiments, both IPLE and MVU again used the semi-definite programming tool CSDP v4.8 [16] to compute the inner product matrix K, where the number of iterations of the CSDP tool was set to 40.
Figures 5(a)-(d) show the two-dimensional embeddings obtained by applying LE, ISOMAP, MVU and our IPLE to the COIL-TWO image dataset. The third row in Table 1 shows the ratio of between-class distance to within-class distance for the two-dimensional coordinates of the two object classes (the "duck" toy and the "cat" toy). The ratio of IPLE is the largest among the four algorithms, which indicates that the low-dimensional structure obtained by IPLE has better clustering performance than LE, ISOMAP and MVU. To some degree, the intrinsic low-dimensional visualization is also better, as shown in Figure 5(d).
Figure 6(a) shows the fraction of each of the top 20 eigenvalues in the trace of the centered metric matrix for the LE, ISOMAP, MVU and IPLE algorithms. For the ISOMAP and MVU algorithms, there are three or four dominant eigenvalues, but IPLE obtains only two dominant eigenvalues; that is, the intrinsic dimension is two, which is consistent with the intrinsic nature of the practical image set, sampled from two rotated objects. In Figure 7, from left to right, the i-th colored region denotes the fraction of the i-th eigenvalue in the trace, and it shows that the top two eigenvalues of IPLE preserve more information than those of ISOMAP and MVU. That is, the two-dimensional structure obtained by IPLE is more meaningful.

Conclusions
In this paper, we propose an inner product Laplacian embedding algorithm based on semi-definite programming. The new algorithm avoids ISOMAP's problem of decomposing a non-positive semi-definite matrix, LE's problem of small and closely spaced dominant eigenvalues, and MVU's lack of a clustering property. Experiments on the USPS-01 and COIL-TWO datasets demonstrate the feasibility of the IPLE algorithm. Experimental results also show that the dominant eigenvalues of IPLE preserve more information, and that IPLE obtains better overall performance in clustering and manifold structure. One of our future research tasks is to develop incremental learning for IPLE on large-scale datasets, following the techniques introduced in [17][18][19][20][21][22][23]. Another possible extension is to consider sample labels in designing a semi-supervised or supervised inner product Laplacian embedding algorithm.

Step 3. With the weights nonzero only if x_i and x_j are connected on the graph G, compute the low-dimensional embedding y by optimizing the objective function.

Figure 2. Comparison of eigenvalues from the experiments on the USPS-01 dataset. Left: the fraction of each of the top 20 eigenvalues in the trace; Right: the fraction of each of the bottom 20 eigenvalues in the trace.

Figure 3. Comparison of the dominant eigenvalues of the metric matrices obtained by ISOMAP, MVU and IPLE on the USPS-01 dataset. Each region denotes the fraction of the corresponding eigenvalue in the trace.

Figure 4. Sample images of two objects from the COIL-20 image library (named COIL-TWO). (a) 144 sample images of the duck toy; (b) 144 sample images of the cat toy.

Figure 6(b) shows the fraction of each of the bottom 20 eigenvalues in the trace. It shows that ISOMAP cannot guarantee a positive semi-definite matrix, and that the negative eigenvalues should not simply be discarded in the ISOMAP algorithm. IPLE and MVU, by contrast, can guarantee a positive semi-definite metric matrix under the constraints of the semi-definite programming problem.

Figure 5. Two-dimensional embedding of 144 images in the COIL-TWO dataset.

Figure 6. Comparison of eigenvalues from the experiments on the COIL-TWO image dataset. Left: the fraction of each of the top 20 eigenvalues in the trace; Right: the fraction of each of the bottom 20 eigenvalues in the trace.

Figure 7. Comparison of the dominant eigenvalues of the metric matrices obtained by ISOMAP, MVU and IPLE on the COIL-TWO dataset. Each region denotes the fraction of the corresponding eigenvalue in the trace.