A Current Survey of Dimensionality Reduction

Dimension reduction is the process of projecting high-dimensional data onto a much lower-dimensional space. Dimension reduction methods are variously applied in regression, classification, feature analysis, and visualization. In this paper, we review in detail the latest versions of these methods, which have been extensively developed over the past decade.


Introduction
Any progress in efficiently using data processing and storage capacities requires control over the number of useful variables. Researchers working in domains as diverse as computer science, astronomy, bio-informatics, remote sensing, economics, and face recognition are constantly challenged to reduce the number of data variables. The original dimensionality of the data is the number of variables measured on each observation. High-dimensional representations are generated especially when signals, processes, images, or physical fields are sampled. High-dimensional data sets present many mathematical challenges as well as some opportunities, and are bound to give rise to new theoretical developments [1].
In many cases, these representations are redundant and the variables are correlated, which means that eventually only a small subspace of the original representation space is populated by the sample and by the underlying process. This is most probably the case when very narrow process classes are considered. Dimension reduction methods are therefore needed to enable low-dimensional representations with minimal information loss.
Hence, in this paper we review the most important dimension reduction methods, from traditional methods such as principal component analysis (PCA) and non-linear PCA up to current state-of-the-art methods published in various areas, such as the signal processing and statistical machine learning literature. This survey is organized as follows. Section 2 reviews the linear nature of principal component analysis and its relation to multidimensional scaling (classical scaling) in a comparative way. Section 3 introduces non-linear or kernel PCA (KPCA), which uses the kernel trick. Section 4 is about linear discriminant analysis (LDA), and we give an optimization model of LDA that captures the power of this method. In Section 5 we summarize another higher-order linear method, canonical correlation analysis (CCA), which finds a low-dimensional representation maximizing correlation, together with its optimization formulation. Section 6 reviews a relatively new variant of PCA, the so-called oriented PCA (OPCA), introduced by Kung and Diamantaras [2] as a generalization of PCA: it corresponds to the generalized eigenvalue decomposition of a pair of covariance matrices, whereas PCA corresponds to the eigenvalue decomposition of a single covariance matrix. Section 7 introduces principal curves and includes a characterization of these curves by an optimization problem which tells us when a given curve is a principal curve. Section 8 gives a very compact summary of non-linear dimension reduction methods using neural networks, covering the simplest neural network with only three layers (input layer, hidden layer (bottleneck), output layer) and an auto-associative neural network with five layers (input layer, hidden layer, bottleneck, hidden layer, output layer); a very nice optimization formulation is also given. In Section 9, we review the Nystroem method, a very useful and well-known method based on the numerical solution of an integral equation. In Section 10, we look at multidimensional scaling (MDS) from a modern and more precise point of view; in particular, a well-defined objective stress function arises in this method. Section 11 summarizes the locally linear embedding (LLE) method, which addresses the problem of nonlinear dimensionality reduction by computing low-dimensional, neighborhood-preserving embeddings of high-dimensional data. Section 12 is about one of the most important classes of dimension reduction methods, namely graph-based methods; there we will see how well the adjacency matrix works as a powerful tool for obtaining a small space, which is in fact the eigenspace of this matrix. Section 13 gives a summary of Isomap, with the most important references for Dijkstra's algorithm and Floyd's algorithm. Section 14 reviews the Hessian eigenmaps method, a most important method in so-called manifold embedding; this section requires more mathematical background. Section 15 reviews recently developed methods such as:
• vector quantization
• genetic and evolutionary algorithms
• regression
We emphasize here that all references given in the body of the survey are used, and they are the most important or original references for the related subject. To provide more mathematical context, we give an appendix with the most important background on fractal and topological dimension definitions, which are also important for understanding the notion of intrinsic dimension.

Principal Component Analysis (PCA)
Principal component analysis (PCA) [3] [4] [5]- [8] is a linear method that performs dimensionality reduction by embedding the data into a linear subspace of lower dimensionality. PCA is the most popular unsupervised linear method. The result of PCA is a lower-dimensional representation of the original data that describes as much of the variance in the data as possible. This is achieved by finding a linear basis (possibly orthogonal) of reduced dimensionality for the data, in which the amount of variance in the data is maximal.
In mathematical language, PCA attempts to find a linear mapping $P$ that maximizes the cost function $\mathrm{tr}(P^T A P)$, where $A$ is the sample covariance matrix of the zero-mean data. In other words, PCA maximizes $\mathrm{tr}(P^T A P)$ with respect to $P$ under the constraint that the norm of each column $v$ of $P$ is 1, i.e., $\|v\| = 1$. This constrained optimization problem is equivalent to the eigenvalue problem

$$A v = \lambda v. \qquad (1.1)$$

Why is the above optimization problem equivalent to the eigenvalue problem (1.1)? Introducing a Lagrange multiplier $\lambda$ for the constraint $v^T v = 1$ and considering the Lagrangian $v^T A v - \lambda (v^T v - 1)$, a straightforward calculation shows that the maximum occurs when $A v = \lambda v$.
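To make the eigendecomposition view above concrete, here is a minimal numpy sketch (our own illustration, not code from the cited references); it assumes the rows of X are observations:

import numpy as np

def pca(X, d):
    """Minimal PCA sketch: rows of X are observations.

    Returns the d-dimensional representation and the projection matrix P
    whose columns are the top-d unit-norm eigenvectors of the covariance A.
    """
    Xc = X - X.mean(axis=0)                 # zero-mean the data
    A = np.cov(Xc, rowvar=False)            # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(A)    # eigenvalues in ascending order
    P = eigvecs[:, ::-1][:, :d]             # top-d eigenvectors, unit norm
    return Xc @ P, P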
It is interesting to see that PCA is in fact identical to multidimensional scaling (classical scaling) [9].
For the given data $\{x_i\}_{i=1}^n$, let $d_{ij}$ denote the Euclidean distance between the high-dimensional data points $x_i$ and $x_j$. Multidimensional scaling finds the linear mapping $P$ that minimizes the cost function

$$\phi(Y) = \sum_{ij} \left( d_{ij}^2 - \| y_i - y_j \|^2 \right), \qquad (1.2)$$

where $\| y_i - y_j \|$ is the Euclidean distance between the low-dimensional data points $y_i$ and $y_j$; each $y_i$ is restricted to be $P^T x_i$, with $\| v_j \| = 1$ for every column vector $v_j$ of $P$. It can be shown [10] [11] that the minimum of the cost function is given by the eigen-decomposition of the Gram matrix $K = X X^T$. Actually we can obtain the Gram matrix by double-centering the pairwise squared Euclidean distance matrix, i.e., by computing

$$k_{ij} = -\frac{1}{2} \left( d_{ij}^2 - \frac{1}{n} \sum_{l} d_{il}^2 - \frac{1}{n} \sum_{l} d_{jl}^2 + \frac{1}{n^2} \sum_{l,m} d_{lm}^2 \right).$$

Now consider the multiplication of the principal eigenvectors of the double-centered squared Euclidean distance matrix (i.e., the principal eigenvectors of the Gram matrix) by the square roots of their corresponding eigenvalues: this gives exactly the minimum of the cost function in Equation (1.2).
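The double-centering recipe above translates directly into a short numpy sketch (our illustration; the function name is ours), starting from a pairwise Euclidean distance matrix D:

import numpy as np

def classical_scaling(D, d):
    """Classical scaling sketch from a pairwise Euclidean distance matrix D.

    Double-centers the squared distances to recover the Gram matrix, then
    embeds using the top-d eigenvectors scaled by sqrt(eigenvalue).
    """
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    K = -0.5 * J @ (D ** 2) @ J               # double-centered squared distances = Gram matrix
    eigvals, eigvecs = np.linalg.eigh(K)
    idx = np.argsort(eigvals)[::-1][:d]       # d largest eigenvalues
    scale = np.sqrt(np.maximum(eigvals[idx], 0))  # guard against tiny negative values
    return eigvecs[:, idx] * scale            # coordinates: eigenvectors times sqrt(eigenvalues)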
It is well known that the eigenvectors $u_i$ and $v_i$ of the matrices $X^T X$ and $X X^T$ are related through $v_i = X u_i$ (up to normalization) [12]; from this the similarity of classical scaling to PCA follows. The connection between PCA and classical scaling is described in more detail in, e.g., [11] [13]. PCA may also be viewed as a latent variable model called probabilistic PCA [14]. This model uses a Gaussian prior over the latent space and a linear-Gaussian noise model. The probabilistic formulation of PCA leads to an EM algorithm that may be computationally more efficient for very high-dimensional data. By using Gaussian processes, probabilistic PCA may also be extended to learn nonlinear mappings between the high-dimensional and the low-dimensional space [15]. Another extension of PCA also includes minor components (i.e., the eigenvectors corresponding to the smallest eigenvalues) in the linear mapping, as minor components may be of relevance in classification settings [16]. PCA and classical scaling have been successfully applied in a large number of domains such as face recognition [17], coin classification [18], and seismic series analysis [19].
PCA and classical scaling suffer from two main drawbacks. First, in PCA the size of the covariance matrix is proportional to the dimensionality of the data points. As a result, the computation of the eigenvectors might be infeasible for very high-dimensional data. In data sets with n < D (fewer points than dimensions), this drawback may be overcome by performing classical scaling instead of PCA, because classical scaling scales with the number of data points rather than with the number of dimensions. Alternatively, iterative techniques such as simple PCA [20] or probabilistic PCA [14] may be employed. Second, the cost function in Equation (1.2) reveals that PCA and classical scaling focus mainly on retaining large pairwise distances $d_{ij}^2$, instead of focusing on retaining the small pairwise distances, which is much more important.

Non-Linear PCA
Non-linear or kernel PCA (KPCA) is in fact the reformulation of linear PCA in a high-dimensional space that is constructed using a given kernel function [21]. In recent years, such reformulations of linear techniques using the kernel trick have led to the proposal of successful techniques such as kernel ridge regression and support vector machines [22]. Kernel PCA computes the principal eigenvectors of the kernel matrix, rather than those of the covariance matrix. The reformulation of PCA in kernel space is straightforward, since the kernel matrix contains the inner products of the data points in the high-dimensional space constructed by the kernel function. Applying PCA in the kernel space provides kernel PCA the property of constructing nonlinear mappings.
Kernel PCA first computes the kernel matrix $K = [k_{ij}]$ of the data points $x_i$. The entries of the kernel matrix are defined by $k_{ij} = \kappa(x_i, x_j)$, where $\kappa$ is a kernel function [22], which may be any function that gives rise to a positive semi-definite kernel $K$. Subsequently, the kernel matrix $K$ is double-centered using the following modification of the entries:

$$k_{ij} \leftarrow k_{ij} - \frac{1}{n} \sum_{l} k_{il} - \frac{1}{n} \sum_{l} k_{lj} + \frac{1}{n^2} \sum_{l,m} k_{lm}.$$

The centering operation corresponds to subtracting the mean of the features in traditional PCA: it subtracts the mean of the data in the feature space defined by the kernel function $\kappa$. Hence, the data in the feature space defined by the kernel function are zero-mean. Subsequently, the principal $d$ eigenvectors $v_i$ of the centered kernel matrix are computed. The eigenvectors of the covariance matrix $a_i$ (in the feature space constructed by $\kappa$) can now be computed, since they are related to the eigenvectors of the kernel matrix $v_i$ (see, e.g., [12]) through

$$a_i = \frac{1}{\sqrt{\lambda_i}} v_i.$$

In order to obtain the low-dimensional data representation, the data are projected onto the eigenvectors of the covariance matrix $a_i$. The result of the projection, i.e., the low-dimensional data representation $y_i$, is given by

$$y_i = \left( \sum_{j} a_{1j}\, \kappa(x_j, x_i), \ \ldots, \ \sum_{j} a_{dj}\, \kappa(x_j, x_i) \right),$$

where $a_{lj}$ indicates the $j$th value in the vector $a_l$ and $\kappa$ is the kernel function that was also used in the computation of the kernel matrix. Since kernel PCA is a kernel-based method, the mapping performed by kernel PCA relies on the choice of the kernel function $\kappa$. Possible choices for the kernel function include the linear kernel (making kernel PCA equal to traditional PCA), the polynomial kernel, and the Gaussian kernel given in [12]. Notice that when the linear kernel is employed, the kernel matrix $K$ is equal to the Gram matrix, and the procedure described above is identical to classical scaling (previous section).
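The following numpy sketch illustrates the procedure with a Gaussian kernel (our illustration; the function name and the width parameter gamma are assumptions, not from the cited papers):

import numpy as np

def kernel_pca(X, d, gamma=1.0):
    """Kernel PCA sketch with a Gaussian kernel of assumed width gamma.

    Builds the kernel matrix, double-centers it, and projects onto the
    top-d eigenvectors rescaled by 1/sqrt(lambda_i).
    """
    sq = np.sum(X ** 2, axis=1)
    K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * X @ X.T))  # k_ij = kappa(x_i, x_j)
    n = K.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    Kc = J @ K @ J                                   # double-centering (zero-mean in feature space)
    eigvals, eigvecs = np.linalg.eigh(Kc)
    idx = np.argsort(eigvals)[::-1][:d]              # d largest eigenvalues
    alphas = eigvecs[:, idx] / np.sqrt(eigvals[idx]) # a_i = v_i / sqrt(lambda_i)
    return Kc @ alphas                               # low-dimensional representation y_i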
An important weakness of kernel PCA is that the size of the kernel matrix grows with the square of the number of instances in the data set. An approach to resolve this weakness is proposed in [23] [24]. Also, kernel PCA mainly focuses on retaining large pairwise distances (even though these are now measured in feature space).
Kernel PCA has been successfully applied to, e.g., face recognition [25], speech recognition [26], and novelty detection [25].Like Kernel PCA, the Gaussian Process Latent Variable Model (GPLVM) also uses kernel functions to construct non-linear variants of (probabilistic) PCA [15].However, the GPLVM is not simply the probabilistic counterpart of Kernel PCA: in the GPLVM, the kernel function is defined over the low-dimensional latent space, whereas in Kernel PCA, the kernel function is defined over the high-dimensional data space.

Linear Discriminant Analysis (LDA)
The main reference here is [27]; see also [28]. LDA is a method for finding a linear transformation that maximizes class separability in the reduced-dimensional space. The criterion in LDA is to maximize the between-class scatter and minimize the within-class scatter, where the scatters are measured by scatter matrices. Suppose we are given data points $x_1, \ldots, x_N \in \mathbb{R}^l$ partitioned into $r$ classes $C_1, \ldots, C_r$, where class $C_i$ contains $N_i$ points with mean $\mu_i$, and let $\mu$ denote the overall mean.

Now we define three scatter matrices:
The between-class scatter matrix

$$S_b = \sum_{i=1}^{r} N_i\, (\mu_i - \mu)(\mu_i - \mu)^T,$$

the within-class scatter matrix

$$S_w = \sum_{i=1}^{r} \sum_{x \in C_i} (x - \mu_i)(x - \mu_i)^T,$$

and the total scatter matrix

$$S_t = \sum_{j=1}^{N} (x_j - \mu)(x_j - \mu)^T = S_b + S_w.$$

Actually, LDA is a method for the following optimization problem:

$$\max_{U} \ \mathrm{tr}\left( (U^T S_w U)^{-1}\, U^T S_b U \right).$$

In this way the dimension is reduced from $l$ to $m$ by a linear transformation $U$ that solves the above optimization problem. We know from Fukunaga (1990) (see [27] and [29]) that the eigenvectors corresponding to the $r - 1$ largest eigenvalues of $S_w^{-1} S_b$ form the columns of $U$.
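A minimal sketch of this construction in numpy/scipy follows (our illustration; it assumes $S_w$ is nonsingular, which in practice may require regularization, and that labels is a numpy array):

import numpy as np
from scipy.linalg import eigh

def lda(X, labels, m):
    """LDA sketch via the generalized eigenproblem S_b u = lambda S_w u.

    Returns the reduced data and the transformation U whose columns are the
    eigenvectors of the m largest eigenvalues (m <= r - 1 for r classes).
    """
    mu = X.mean(axis=0)
    l = X.shape[1]
    Sb = np.zeros((l, l))
    Sw = np.zeros((l, l))
    for c in np.unique(labels):
        Xc = X[labels == c]
        mu_c = Xc.mean(axis=0)
        Sb += len(Xc) * np.outer(mu_c - mu, mu_c - mu)  # between-class scatter
        Sw += (Xc - mu_c).T @ (Xc - mu_c)               # within-class scatter
    eigvals, eigvecs = eigh(Sb, Sw)                     # generalized symmetric eigenproblem
    U = eigvecs[:, np.argsort(eigvals)[::-1][:m]]
    return X @ U, U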

Canonical Correlation Analysis (CCA)
CCA is an old method going back to the work of Hotelling in 1936 [30]; recently, Sun et al. [31] used CCA as an unsupervised feature fusion method for two feature sets describing the same data objects. CCA finds projective directions that maximize the correlation between the feature vectors of the two feature sets. Let $X = \{x_i\}_{i=1}^n \subset \mathbb{R}^p$ and $Y = \{y_i\}_{i=1}^n \subset \mathbb{R}^q$ be two data sets of $n$ points. Associated with them we have the covariance matrices

$$C_{xx} = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})^T, \quad C_{yy} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \bar{y})(y_i - \bar{y})^T, \quad C_{xy} = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})^T,$$

where $\bar{x}$ and $\bar{y}$ are the means of the $x_i$ and $y_i$, respectively.
Actually, CCA is a method for the following optimization problem:

$$\max_{u, v} \ \frac{u^T C_{xy} v}{\sqrt{u^T C_{xx} u}\ \sqrt{v^T C_{yy} v}}.$$

Let $(u_1, v_1)$ be the solution of the above optimization problem; we can find another pair of projective directions by solving the same problem subject to orthogonality with the directions already found. Repeating the above process $m - 1$ times, we obtain an $m$-dimensional space spanned by linear combinations of these vector solutions.
In fact, we can obtain this $m$-dimensional space by solving the paired eigenvalue problems

$$C_{xy} C_{yy}^{-1} C_{yx}\, u = \lambda^2\, C_{xx}\, u, \qquad C_{yx} C_{xx}^{-1} C_{xy}\, v = \lambda^2\, C_{yy}\, v,$$

where the eigenvectors corresponding to the $m$ largest eigenvalues are the pairs of projective directions for CCA; see [31]. Hence one composes the feature sets extracted from $X$ and $Y$ by CCA. It turns out that the number $m$ is determined as the number of nonzero eigenvalues.
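A minimal numpy/scipy sketch of the paired eigenvalue problems above (our illustration; it assumes $C_{xx}$ and $C_{yy}$ are nonsingular, so in practice a small ridge term should be added to both):

import numpy as np
from scipy.linalg import eigh

def cca(X, Y, m):
    """CCA sketch: X is n x p, Y is n x q; returns m pairs of directions."""
    Xc, Yc = X - X.mean(0), Y - Y.mean(0)
    n = X.shape[0]
    Cxx, Cyy = Xc.T @ Xc / n, Yc.T @ Yc / n
    Cxy = Xc.T @ Yc / n
    # Solve C_xy C_yy^{-1} C_yx u = lambda^2 C_xx u
    M = Cxy @ np.linalg.solve(Cyy, Cxy.T)      # symmetric positive semi-definite
    eigvals, U = eigh(M, Cxx)
    order = np.argsort(eigvals)[::-1][:m]
    U = U[:, order]
    # The paired directions follow from v proportional to C_yy^{-1} C_yx u
    V = np.linalg.solve(Cyy, Cxy.T @ U)
    V /= np.sqrt(np.sum(V * (Cyy @ V), axis=0))  # normalize so v^T C_yy v = 1
    return U, V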

Oriented PCA (OPCA)
Oriented PCA was introduced by Kung and Diamantaras [2] as a generalization of PCA. It corresponds to the generalized eigenvalue decomposition of a pair of covariance matrices, in the same way that PCA corresponds to the eigenvalue decomposition of a single covariance matrix. For a given pair of random vectors $u$ and $v$ with covariance matrices $C_u$ and $C_v$, the objective function maximized by OPCA is the generalized Rayleigh quotient

$$\max_{w} \ \frac{w^T C_u w}{w^T C_v w}.$$

The solution $w_1^*$ of the above optimization problem is called the principal oriented component, and it is the principal generalized eigenvector of the matrix pair $(C_u, C_v)$.
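Since the quotient above is maximized by a generalized eigenvector, OPCA is a few lines of scipy (our sketch; it assumes the second covariance matrix is positive definite):

import numpy as np
from scipy.linalg import eigh

def opca(C_u, C_v, m=1):
    """Oriented PCA sketch: maximize (w^T C_u w) / (w^T C_v w).

    Solved as the generalized eigenvalue problem C_u w = lambda C_v w;
    the top eigenvector is the principal oriented component.
    """
    eigvals, W = eigh(C_u, C_v)                # generalized symmetric eigenproblem
    order = np.argsort(eigvals)[::-1][:m]
    return W[:, order], eigvals[order]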

Principal Curves and Surfaces
By definition, principal curves are smooth curves that pass through the middle of multidimensional data sets; see [32]- [34] as main references and also [35] and [36].
Given an $n$-dimensional random vector $x$ with density $p$, consider a smooth curve $f$ in $\mathbb{R}^n$ parametrized by a real value $\theta$ (for instance, we can choose the arc-length parametrization). We can associate to the curve $f$ the projection index $\theta_f(x)$, defined geometrically as the value of $\theta$ corresponding to the point on the curve $f$ that, under the Euclidean metric, is the closest point to $x$:

$$\theta_f(x) = \sup\left\{ \theta : \| x - f(\theta) \| = \inf_{\tau} \| x - f(\tau) \| \right\}.$$
We say $f$ is self-consistent if each point $f(\theta)$ is the mean of all points in the support of the density function $p$ that project onto $\theta$, i.e.,

$$f(\theta) = E\left[ x \mid \theta_f(x) = \theta \right].$$
It is shown in [32] that principal curves do not intersect themselves and that they are self-consistent. The most important fact about principal curves, proved in [32], is a characterization of these curves by an optimization problem: principal curves are critical points of the expected squared distance

$$\min_{f} \ E\left\| x - f(\theta_f(x)) \right\|^2. \qquad (0.7)$$

Of course, solving (or even estimating) the minimization (0.7) is a complex problem; to estimate $f$ and $\theta$, an iterative algorithm is given in [32]. It starts with $f^{(0)}$ equal to the first principal component line of the data. Then it iterates two steps: (1) for fixed $\theta$, replace $f(\theta)$ by the conditional expectation $E[x \mid \theta_f(x) = \theta]$; (2) recompute the projection index $\theta_f(x)$ for the new curve. The iteration stops when the relative change in the expected squared distance to $f$ is less than a threshold. One can find in [37] another formulation of principal curves, along with a generalized EM algorithm for its estimation under a Gaussian pdf $p(x)$. Unfortunately, except for a few special cases, it is an open problem for which types of distributions principal curves exist, how many principal curves there are, and which properties they have; see [36]. In recent years the concept of principal curves has been extended to higher-dimensional principal surfaces, but of course the estimation algorithms are not as smooth as in the curve case.

Non-Linear Methods Using Neural Networks
A neural network takes input variables $x_1, \ldots, x_n$ and produces output variables $y_1, \ldots, y_m$ with

$$y_j = \varphi\left( \sum_{i=1}^{n} w_{ij}\, x_i + \rho_j \right), \quad j = 1, \ldots, m,$$

where the weights $w$ are determined by training the neural network on a set of given instances with a cost function; see [38]. Over the last two decades there have been several developments in which dimension reduction techniques, based on varying architectures and learning algorithms, are implemented using neural networks; see [35] [36] [38]- [40]. Consider the simplest neural network, which has only three layers: 1) input layer, 2) hidden layer (bottleneck), 3) output layer. There are two steps here. First, in order to obtain the data at node $k$ of the hidden layer, we combine the inputs $x_i$ with their associated weights $w_{ik}$ along with a threshold term (called bias in some references) $\rho_k$, and pass the result through the corresponding activation function $\varphi_k$, building up the expression

$$h_k = \varphi_k\left( \sum_{i} w_{ik}\, x_i + \rho_k \right).$$

Second, the output layer repeats this step on the hidden values $h_k$, with its own weights, thresholds $\rho_j$, and possibly a different output activation $\varphi_{\mathrm{out}}$.
We observe that the first part of the network reduces the input data to the lower-dimensional space, just like linear PCA, while the second part decodes the reduced data back into the original domain [36] [35]. Note that only by adding two more hidden layers with nonlinear activation functions, one between the input and the bottleneck and the other between the bottleneck and the output layer, can the PCA network be generalized to obtain non-linear PCA. One can extend this idea from the feed-forward neural implementation of PCA by including non-linear activation functions in the hidden layers [41]. In this framework, the non-linear PCA network can be thought of as an auto-associative neural network with five layers: 1) input layer, 2) hidden layer, 3) bottleneck, 4) hidden layer, 5) output layer. If $f: \mathbb{R}^n \to \mathbb{R}^l$ is the function modeled by layers (1), (2), and (3), and $g: \mathbb{R}^l \to \mathbb{R}^n$ is the function modeled by layers (3), (4), and (5), it is shown in [35] that the weights of the non-linear PCA network are determined by solving the following optimization problem:

$$\min_{f,\, g} \ \sum_{i} \left\| x_i - g(f(x_i)) \right\|^2.$$

As we saw in the last section, the function $f$ must then be a principal curve (surface). In the thesis [42], one can find a comparison between PCA, vector quantization, and five-layer neural networks for reducing the dimension of images.
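A minimal PyTorch sketch of the five-layer auto-associative network above (our illustration; the layer sizes, activation choice, and training loop are assumed values, not from the cited references):

import torch
import torch.nn as nn

# input -> hidden -> bottleneck -> hidden -> output, trained to reproduce its input
n, h, l = 64, 32, 2          # input dim, hidden width, bottleneck dim (assumed values)

encoder = nn.Sequential(nn.Linear(n, h), nn.Tanh(), nn.Linear(h, l))  # f: R^n -> R^l
decoder = nn.Sequential(nn.Linear(l, h), nn.Tanh(), nn.Linear(h, n))  # g: R^l -> R^n
model = nn.Sequential(encoder, decoder)

X = torch.randn(500, n)      # placeholder data; substitute real observations
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()       # reconstruction error ||x - g(f(x))||^2, averaged

for epoch in range(200):
    optimizer.zero_grad()
    loss = loss_fn(model(X), X)
    loss.backward()
    optimizer.step()

with torch.no_grad():
    codes = encoder(X)       # the l-dimensional non-linear PCA representation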

Nystroem Method
The Nystroem method is a well-known technique for finding numerical approximations of generic integral equations, and especially of eigenfunction problems of the following form:

$$\int_a^b K(x, y)\, \phi(y)\, dy = \lambda\, \phi(x). \qquad (0.8)$$

Now consider the simple quadrature rule with points $\xi_1, \ldots, \xi_n$ in $[a, b]$:

$$\frac{1}{n} \sum_{j=1}^{n} K(x, \xi_j)\, \hat\phi(\xi_j) = \hat\lambda\, \hat\phi(x).$$

Evaluating at the quadrature points $x = \xi_i$, we obtain a system of $n$ equations. Without loss of generality we can shift the interval $[a, b]$ to the unit interval $[0, 1]$ and turn the above system of equations into the matrix eigenvalue problem

$$\frac{1}{n} \sum_{j=1}^{n} K(\xi_i, \xi_j)\, \hat\phi(\xi_j) = \hat\lambda\, \hat\phi(\xi_i), \quad i = 1, \ldots, n.$$

Substituting back into (0.8) yields the Nystroem extension for each eigenfunction $\hat\phi_i$:

$$\hat\phi_i(x) = \frac{1}{n \hat\lambda_i} \sum_{j=1}^{n} K(x, \xi_j)\, \hat\phi_i(\xi_j). \qquad (0.9)$$

The above arguments extend to $x \in \mathbb{R}^n$ with $n > 1$; see [42].
Motivated by (0.9), our main question is: if $A$ is a given $n \times n$ real symmetric matrix with small rank $r$, i.e., $r \ll n$, can we approximate the eigenvectors and eigenvalues of $A$ using those of a small sub-matrix of $A$? The Nystroem method gives a positive answer to this question. Without loss of generality we can assume that the $r$ randomly chosen samples come first and the remaining $n - r$ samples come next. Hence the matrix $A$ can be written in the block form

$$A = \begin{bmatrix} E & B \\ B^T & C \end{bmatrix},$$

where $E$ represents the $r \times r$ sub-block of weights among the random samples, $B$ contains the weights from the random samples to the rest of the samples, and $C$ contains the weights between all of the remaining samples. Since $r \ll n$, $C$ is a large matrix. Let $\bar{U}$ denote the approximate eigenvectors of $A$; the Nystroem extension gives

$$\bar{U} = \begin{bmatrix} U \\ B^T U D^{-1} \end{bmatrix},$$

where $U$ and $D$ are the eigenvectors and the diagonal eigenvalue matrix associated with $E$, i.e., $E = U D U^T$. For the associated approximation of $A$, which we denote by $\tilde{A}$, we then have:

$$\tilde{A} = \bar{U} D \bar{U}^T = \begin{bmatrix} E & B \\ B^T & B^T E^{-1} B \end{bmatrix} = \begin{bmatrix} E \\ B^T \end{bmatrix} E^{-1} \begin{bmatrix} E & B \end{bmatrix}.$$


The last equation is called the "bottleneck" form. There is a very interesting application of this form in spectral grouping, where it is possible to construct the exact eigen-decomposition of $A$ from the eigen-decomposition of a smaller matrix of rank $r$. Fowlkes et al. have also given an application of the Nystroem method to the NCut problem; see [43].
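The block construction above is easy to express in numpy (our sketch; it assumes the sampled sub-block E is nonsingular, and the random permutation stands in for "the r samples come first"):

import numpy as np

def nystroem(A, r, seed=0):
    """Nystroem sketch: approximate the eigenvectors of symmetric A (n x n)
    from an r x r randomly sampled principal sub-block E."""
    n = A.shape[0]
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    A = A[np.ix_(idx, idx)]                 # reorder so the r samples come first
    E = A[:r, :r]                           # weights among the sampled points
    B = A[:r, r:]                           # weights from samples to the rest
    d, U = np.linalg.eigh(E)                # E = U D U^T
    U_bar = np.vstack([U, B.T @ U / d])     # Nystroem extension of the eigenvectors
    A_tilde = U_bar @ np.diag(d) @ U_bar.T  # the bottleneck-form approximation of A
    return U_bar, d, A_tilde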

Multidimensional Scaling (MDS)
Given $N$ points $\{x_i\}_{i=1}^N$ with pairwise proximities $d_{ij}$ defined by some metric $d$, MDS (better to say, $m$-dimensional MDS) is a technique that produces output points $\{y_i\}_{i=1}^N \subset \mathbb{R}^m$ whose pairwise distances $\hat{d}_{ij}$ are as close as possible to transformed proximities $f(d_{ij})$. Depending on whether this transformation $f$ is linear or non-linear, MDS is called metric or non-metric [36]. The MDS procedure is as follows:
• Define an objective stress function, with stress factor $\alpha$, depending on $(X, f, Y)$:

$$S(X, f, Y) = \sqrt{\frac{\sum_{i<j} \left( f(d_{ij}) - \hat{d}_{ij} \right)^2}{\alpha}}. \qquad (1.10)$$

• For the given $X$, find the $f^*$ (and configuration $Y$) that minimizes (1.10).
• Determine the optimal output data set $\tilde{Y}$ attaining this minimum.
If we use the Euclidean distance and take $f = \mathrm{id}$ in Equation (1.10), the produced output data set coincides with the principal components of $\mathrm{cov}(X)$ (without re-scaling to correlation); hence in this special case MDS and PCA coincide (see [44]). There exists an alternative method to MDS, namely FastMap; see [45] [46].
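A minimal sketch of metric MDS with $f = \mathrm{id}$, minimizing the stress (1.10) numerically (our illustration; the normalization of the stress factor and the optimizer choice are assumptions):

import numpy as np
from scipy.optimize import minimize
from scipy.spatial.distance import pdist

def mds_stress(D, m=2, seed=0):
    """Metric MDS sketch: D is the N x N matrix of target proximities d_ij."""
    N = D.shape[0]
    target = D[np.triu_indices(N, k=1)]      # upper-triangular d_ij, matching pdist order
    alpha = (target ** 2).sum()              # assumed normalizing stress factor

    def stress(y_flat):
        Y = y_flat.reshape(N, m)
        dhat = pdist(Y)                      # Euclidean distances of the output points
        return ((target - dhat) ** 2).sum() / alpha

    rng = np.random.default_rng(seed)
    y0 = rng.normal(size=N * m)              # random initial configuration
    res = minimize(stress, y0, method="L-BFGS-B")
    return res.x.reshape(N, m)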

Locally Linear Embedding (LLE)
Locally linear embedding is an approach that addresses the problem of nonlinear dimensionality reduction by computing low-dimensional, neighborhood-preserving embeddings of high-dimensional data. A data set of dimensionality $n$, assumed to lie on or near a smooth nonlinear manifold of dimensionality $m \ll n$, is mapped into a single global coordinate system of lower dimensionality $m$. The global nonlinear structure is recovered by locally linear fits. As usual, we are given a data set of $N$ points $\{x_i\}_{i=1}^N \subset \mathbb{R}^n$ sampled from some underlying manifold. Without loss of generality we can assume that each data point and its neighbors lie on, or close to, a locally linear patch of the manifold. By a linear transform, consisting of a translation, rotation, and rescaling, the high-dimensional coordinates of each neighborhood can be mapped to global internal coordinates on the manifold. The goal is to map the high-dimensional data to a single global coordinate system of the manifold such that the relationships between neighboring points are preserved. This proceeds in three steps:
• Identify the neighbors of each data point $x_i$; this can be done by finding the $K$ nearest neighbors, or by choosing all points within some fixed radius $\varepsilon$.
• Compute the weights $w_{ij}$ that best linearly reconstruct $x_i$ from its neighbors.
• Find the low-dimensional embedding vectors $y_i$ that are best reconstructed by the weights determined in the previous step.
After finding the nearest neighbors in the first step, the second step computes a local geometry for each locally linear patch. This geometry is characterized by linear coefficients that reconstruct each data point from its neighbors, obtained by minimizing the reconstruction error

$$\varepsilon(W) = \sum_{i} \left\| x_i - \sum_{j} w_{ij}\, x_{N_i(j)} \right\|^2, \qquad \sum_{j} w_{ij} = 1,$$

where $N_i(j)$ is the index of the $j$th neighbor of the $i$th point. It then selects the embedding vectors so as to preserve the reconstruction weights by solving

$$\min_{Y} \ \Phi(Y) = \sum_{i} \left\| y_i - \sum_{j} w_{ij}\, y_{N_i(j)} \right\|^2.$$

This objective can be restated as $\Phi(Y) = \mathrm{tr}(Y^T M Y)$ with $M = (I - W)^T (I - W)$. The solution for $Y$ can have an arbitrary origin and orientation; in order to make the problem well posed, these two degrees of freedom must be removed. Requiring the coordinates to be centered on the origin ($\sum_i y_i = 0$) and the embedding vectors to have unit covariance ($Y^T Y = I$) removes the first and second degrees of freedom, respectively. Under the second constraint, the cost is minimized when the columns of $Y^T$ (rows of $Y$) are the eigenvectors with the lowest eigenvalues of $M$; discarding the eigenvector associated with eigenvalue 0 then satisfies the first constraint.
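The three steps translate into a compact numpy sketch (our illustration; the regularization term reg for the local Gram matrices is an assumed stabilizer, standard in practice when K exceeds the manifold dimension):

import numpy as np

def lle(X, K, m, reg=1e-3):
    """LLE sketch: K-nearest neighbors, reconstruction weights, then the
    bottom eigenvectors of M = (I - W)^T (I - W)."""
    N = X.shape[0]
    D = np.linalg.norm(X[:, None] - X[None, :], axis=2)
    W = np.zeros((N, N))
    for i in range(N):
        nbrs = np.argsort(D[i])[1:K + 1]           # K nearest neighbors (skip self)
        Z = X[nbrs] - X[i]                         # shift neighborhood to the origin
        G = Z @ Z.T                                # local Gram matrix
        G += reg * np.trace(G) * np.eye(K)         # regularize for numerical stability
        w = np.linalg.solve(G, np.ones(K))
        W[i, nbrs] = w / w.sum()                   # weights sum to one
    M = (np.eye(N) - W).T @ (np.eye(N) - W)
    eigvals, eigvecs = np.linalg.eigh(M)
    return eigvecs[:, 1:m + 1]                     # drop the constant eigenvector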

Graph-Based Dimensionality Reduction
As before, given a data set $X$ of $N$ points in $\mathbb{R}^n$, i.e., $X = \{x_1, x_2, \ldots, x_N\}$, we associate to $X$ a weighted undirected graph with $N$ vertices and use the Laplacian matrix defined below; see [47]. To define an undirected graph we need a pair $(V; E)$ of sets, $V$ the set of vertices and $E$ the set of edges. We follow here the method introduced in [48]: vertices $i$ and $j$ are connected by an edge if $x_i$ and $x_j$ are close. But what does it mean to be close? There are two variations defining it:
• $\varepsilon$-neighborhoods, where $\varepsilon$ is a small positive real number: $x_i$ and $x_j$ are close iff $\|x_i - x_j\| < \varepsilon$, where the norm is the usual Euclidean norm in $\mathbb{R}^n$.
• $K$ nearest neighbors, where $K$ is a natural number: $x_i$ and $x_j$ are close iff $x_i$ is among the $K$ nearest neighbors of $x_j$ or $x_j$ is among the $K$ nearest neighbors of $x_i$; note that this relation is symmetric.
To associate weights to the edges, there are likewise two variations:
• the heat kernel $w_{ij} = e^{-\|x_i - x_j\|^2 / \gamma}$, where $\gamma$ is a real parameter;
• the simple choice $w_{ij} = 1$ if vertices $i$ and $j$ are connected, and $w_{ij} = 0$ otherwise.
We assume our graph, defined as above, is connected; otherwise the following procedure is applied to each connected component. Set $L = D - W$, where $D$ is the diagonal degree matrix with $d_{ii} = \sum_j w_{ij}$; $L$ is the Laplacian matrix of the graph, a symmetric, positive semi-definite matrix that can be thought of as an operator on the space of real functions defined on the vertex set $V$ of the graph. Compute the eigenvalues and eigenvectors of the generalized eigenvector problem

$$L f = \lambda D f.$$

Let $f_0, f_1, \ldots, f_{N-1}$ be the solutions of the above eigenvalue problem, ordered according to their eigenvalues, where $f(i)$ denotes the $i$th component of the vector $f$. We leave out the eigenvector (trivial eigenfunction) $f_0$ corresponding to eigenvalue 0, which is a vector with all components equal to 1, and use the next $m$ eigenvectors for embedding in $m$-dimensional Euclidean space:

$$x_i \mapsto \left( f_1(i), f_2(i), \ldots, f_m(i) \right).$$

This is called the Laplacian eigenmap embedding by Belkin and Niyogi; see [48].
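Here is a minimal numpy/scipy sketch of the whole pipeline with the K-nearest-neighbor graph and heat-kernel weights (our illustration; it assumes the resulting graph is connected so the degree matrix is positive definite):

import numpy as np
from scipy.linalg import eigh

def laplacian_eigenmaps(X, K, m, gamma=1.0):
    """Laplacian eigenmaps sketch: symmetric K-NN graph with heat-kernel
    weights, then the generalized eigenproblem L f = lambda D f."""
    N = X.shape[0]
    dist = np.linalg.norm(X[:, None] - X[None, :], axis=2)
    W = np.zeros((N, N))
    for i in range(N):
        for j in np.argsort(dist[i])[1:K + 1]:
            w = np.exp(-dist[i, j] ** 2 / gamma)   # heat-kernel weight
            W[i, j] = W[j, i] = w                  # symmetric adjacency
    Dg = np.diag(W.sum(axis=1))                    # degree matrix
    L = Dg - W                                     # graph Laplacian
    eigvals, eigvecs = eigh(L, Dg)                 # solves L f = lambda D f
    return eigvecs[:, 1:m + 1]                     # drop trivial eigenvector (lambda = 0)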

Isomap
Like LLE, the Isomap algorithm proceeds in three steps:
• Find the neighbors of each data point in the high-dimensional data space.
• Compute the geodesic pairwise distances between all points.
• Embed the data with MDS so as to preserve those distances.
Again as in LLE, the first step can be performed by identifying the $K$ nearest neighbors, or by choosing all points within some fixed radius $\varepsilon$. These neighborhood relations are represented by a graph $G$ in which each data point is connected to its nearest neighbors, with edges weighted by the Euclidean distances between the points. The geodesic distances between all pairs of points on the manifold $M$ are then estimated in the second step: Isomap approximates them by the shortest-path distances in the graph $G$, which can be computed in different ways, including Dijkstra's algorithm [49] and Floyd's algorithm [50]. The third step applies classical MDS to the matrix of estimated geodesic distances.
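The three steps fit in a short numpy/scipy sketch (our illustration; it assumes the neighborhood graph is connected, and uses scipy's Dijkstra shortest-path routine for the geodesic estimates):

import numpy as np
from scipy.sparse.csgraph import shortest_path

def isomap(X, K, m):
    """Isomap sketch: K-NN graph, graph shortest paths as geodesic
    estimates, then classical scaling on the geodesic distances."""
    N = X.shape[0]
    dist = np.linalg.norm(X[:, None] - X[None, :], axis=2)
    G = np.full((N, N), np.inf)                    # inf marks non-edges for dense input
    for i in range(N):
        for j in np.argsort(dist[i])[1:K + 1]:
            G[i, j] = G[j, i] = dist[i, j]         # weighted neighborhood graph
    geo = shortest_path(G, method="D")             # Dijkstra from every node
    J = np.eye(N) - np.ones((N, N)) / N
    B = -0.5 * J @ (geo ** 2) @ J                  # classical scaling on geodesics
    eigvals, eigvecs = np.linalg.eigh(B)
    idx = np.argsort(eigvals)[::-1][:m]
    return eigvecs[:, idx] * np.sqrt(np.maximum(eigvals[idx], 0))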

Hessian Eigenmaps Method
High-dimensional data sets arise in many real-world applications. The data points may lie approximately on a low-dimensional manifold embedded in a high-dimensional space. Dimensionality reduction (or, as in this case, manifold learning) aims to recover a set of low-dimensional parametric representations for the high-dimensional data points, which may be used for further processing of the data. More precisely, consider a $d$-dimensional parametrized manifold $\mathcal{M}$ embedded in $\mathbb{R}^n$ ($d < n$), characterized by a nonlinear map $\psi: \Theta \to \mathbb{R}^n$, where $\Theta$ is a compact and connected subset of $\mathbb{R}^d$, so that each data point satisfies $x_i = \psi(y_i)$ for some $y_i \in \Theta$. The dimensionality reduction problem is then to recover the parameter points $y_i$ and the map $\psi$ from the $x_i$. Of course, this problem is not well defined for a general nonlinear map $\psi$. However, as shown by Donoho and Grimes in the derivation of the Hessian eigenmaps method [51], if $\psi$ is a locally isometric map, then $\Theta$ (and hence each $y_i$) is uniquely determined up to a rigid motion, and hence captures the geometric structure of the data set.
Given that the map $\psi$ defined as above is a locally isometric embedding, the map $\varphi = \psi^{-1}: \mathcal{M} \to \Theta$ provides a (locally) isometric coordinate system for $\mathcal{M}$; each component of $\varphi$ is a function defined on $\mathcal{M}$ that provides one coordinate. The main idea of the Hessian eigenmaps method is to introduce a Hessian operator and a functional, called the $\mathcal{H}$-functional, defined for functions on $\mathcal{M}$, whose null space consists of the $d$ coordinate functions and the constant function. Let $f: \mathcal{M} \to \mathbb{R}$ be a function defined on $\mathcal{M}$ and let $x_0$ be an interior point of the manifold $\mathcal{M}$. We can define the function $g = f \circ \psi$ in a neighborhood of $y_0 = \varphi(x_0)$. We call the Hessian matrix of $g$ at $y_0$ the Hessian matrix of the function $f$ at $x_0$ in isometric coordinates, and we denote it by $H_f^{\mathrm{iso}}(x_0)$. From the Hessian matrix, we define a functional of $f$ in isometric coordinates, denoted by

$$\mathcal{H}(f) = \int_{\mathcal{M}} \left\| H_f^{\mathrm{iso}}(x) \right\|_F^2 \, dx,$$

where $dx$ is a probability measure on $\mathcal{M}$ that has strictly positive density everywhere on the interior of $\mathcal{M}$. It is clear that the values $\mathcal{H}(f)$ of the $d$ component functions of $\varphi$ are zero, as their pullbacks to $\Theta$ are linear functions. Indeed, $\mathcal{H}$ has a $(d+1)$-dimensional null space, consisting of the span of the constant function and the $d$ component functions of $\varphi$; see [51] (Corollary 4). The Hessian matrix and the $\mathcal{H}$-functional in isometric coordinates introduced above are unfortunately not computable without knowing the isometric coordinate system $\varphi$ first. To obtain a functional with the same property but independent of the isometric coordinate system $\varphi$, a Hessian matrix and an $\mathcal{H}$-functional in local tangent coordinate systems are introduced in [51]. Qiang Ye and Weifeng Zhi [52] developed a discrete version of the Hessian eigenmaps method of Donoho and Grimes.

Vector Quantization
The main references for vector quantization are [40] and [53]. In [53], a hybrid non-linear dimension reduction method is introduced, based on first clustering the data by vector quantization and then, after constructing the Voronoi cell clusters, applying PCA to them. In [40], both non-linear methods, i.e., vector quantization and non-linear PCA (using a five-layer neural network), were applied to an image data set. It turns out that vector quantization achieved much better results than non-linear PCA.

Genetic and Evolutionary Algorithms
These algorithms, introduced in [54], are in fact optimization algorithms based on the Darwinian theory of evolution, which uses natural selection and genetics to find the optimized solution among the members of a competing population. There are several references for genetic and evolutionary algorithms [55]; see [56] for more detail. An evolutionary algorithm for optimization differs from classical optimization methods in several ways:
• random versus deterministic operation;
• a population versus a single best solution;
• creating new solutions through mutation;
• combining solutions through crossover;
• selecting solutions via "survival of the fittest".
Drawbacks of evolutionary algorithms are also discussed in these references. In [55], genetic and evolutionary algorithms are combined with a k-nearest-neighbor classifier to reduce the dimension of the feature set. Here the input is a population of random transformation matrices, and the algorithm finds an output $Y_{N \times m}$ such that the k-nearest-neighbor classifier performs well using the new $m$ features for the given data set variables $\{x_i\}$.

Regression
We can use regression methods for dimension reduction when we are looking for a functional relationship $y = f(x_1, \ldots, x_n)$, under the assumption that the $x_i$ are uncorrelated and relevant to explaining the variation in $y$. In modern data mining applications, however, such assumptions rarely hold; hence we need dimension reduction for this case as well. Well-known dimension reduction methods of this kind include:
• the wrapper method in the machine learning community [57]
• projection pursuit regression [36] [58]
• generalized linear models [59] [60]
• adaptive models [61]
• neural network models, sliced regression, and principal Hessian directions [62]
• dimension reduction for the conditional mean in regression [63]
• principal manifolds and non-linear dimension reduction [64]
• sliced regression for dimension reduction [65]
• canonical correlation [66]

Figure 2. The construction of the von Koch curve (see the Appendix on fractal dimension).