Nonlinear Principal and Canonical Directions from Continuous Extensions of Multidimensional Scaling

A continuous random variable is expanded as a sum of a sequence of uncorrelated random variables. These variables are principal dimensions in continuous scaling on a distance function, as an extension of classic scaling on a distance matrix. For a particular distance, these dimensions are principal components. Some properties are then studied and an inequality is obtained. Diagonal expansions are considered from the same continuous scaling point of view, by means of the chi-square distance. The geometric dimension of a bivariate distribution is defined and illustrated with copulas. It is shown that the dimension can have the power of the continuum.


Introduction
Let $X$ be a random variable on a probability space $(\Omega, \mathcal{A}, P)$, with absolutely continuous cdf $F$ and density $f$ with respect to the Lebesgue measure. Our main purpose is to expand (a function of) $X$ as
$$X = \mu + \sum_{n\ge 1} X_n, \qquad (1)$$
where $(X_n)$ is a sequence of uncorrelated random variables. This expansion can be seen as a decomposition of the so-called geometric variability $V_\delta(X)$, defined below, a dispersion measure of $X$ in relation with a suitable distance function $\delta$.
Here orthogonality is synonymous with a lack of correlation. Some goodness-of-fit statistics, which can be expressed as integrals of the empirical processes of a sample, have expansions of this kind [1-3]. Expansion (1) is obtained following a similar procedure, except that we have a sequence of uncorrelated rather than independent random variables. Finite orthogonal expansions appear in analysis of variance and in factor analysis. Orthogonal expansions and series also appear in the theory of stochastic processes, in martingales in the wide sense ([4], Chap. 4; [5], Chap. 10), in non-parametric statistics [6], in goodness-of-fit tests [7,8], in testing independence [9] and in characterizing distributions [10].
The existence of an orthogonal expansion and some classical expansions are presented in Section 2. A continuous extension of matrix formulations in multidimensional scaling (MDS), which provides a wide class of expansions, is presented in Section 3. Some interesting expansions are obtained in Section 4 for a particular distance, as well as additional results, such as an optimal property of the first dimension. Section 5 contains an inequality concerning the variance of a function. Section 6 is devoted to diagonal expansions from the continuous scaling point of view. Sections 7 and 8 are devoted to canonical correlation analysis, including a continuous generalization. This paper extends previous results on continuous scaling [11-14] and other related topics [15,16].

Existence and Classical Expansions
There are many ways of obtaining expansion (1). Our aim is to obtain some explicit expansions with good properties from a multivariate analysis point of view. However, before doing this, let us prove that such an expansion exists and present some classical expansions.
Theorem 1. Let $X$ be an absolutely continuous r.v. with density $f$ and support $I$, and let $h$ be a measurable function such that $\operatorname{var}[h(X)] < \infty$. Then there exists a sequence $(X_n)$ of centered, uncorrelated random variables such that
$$h(X) = E[h(X)] + \sum_{n\ge 1} X_n, \qquad (2)$$
where the series converges almost surely.

Proof. Let $(\psi_n)_{n\ge 0}$, with $\psi_0 \equiv 1$, be a complete orthonormal basis of $L^2(f)$, so that
$$\int_I \psi_m(x)\,\psi_n(x)\,f(x)\,dx = \delta_{mn},$$
where $\delta_{mn}$ is Kronecker's delta. Then $h = \sum_{n\ge 0} b_n \psi_n$, where the $b_n$ are the Fourier coefficients. Replacing $x$ by $X$ and defining $X_n = b_n \psi_n(X)$ for $n \ge 1$, we obtain (2), since $b_0 = E[h(X)]$ and, for $n \ge 1$, $E[\psi_n(X)] = \int_I \psi_n \psi_0\, f\,dx = 0$, so the variables $X_n$ are centered and uncorrelated. The series converges almost surely and, since $\sum_n b_n^2 = \int_I h^2 f\,dx < \infty$, the series in (2) converges also in the mean square. Some classical expansions for $X$ or a function of $X$ are next given.

Legendre Expansions
Let $F$ be the cdf of $X$. An all-purpose expansion can be obtained from the shifted Legendre polynomials $P_n^*(u) = P_n(2u - 1)$, which are orthogonal on $[0, 1]$. Since $F(X)$ is uniform on $(0, 1)$, the functions $\psi_n(x) = (2n + 1)^{1/2} P_n^*(F(x))$ form a complete orthonormal set in $L^2(f)$. Thus we may consider the orthogonal expansion $h(X) = \sum_n b_n \psi_n(X)$, where the $b_n$ are the Fourier coefficients. This expansion is quite useful due to the simplicity of these polynomials; it is optimal for the logistic distribution, but may be improved for other distributions, as is shown below.
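As a numerical illustration (the quadrature size and the test function are arbitrary choices, not taken from the paper), the Fourier coefficients with respect to the normalized shifted Legendre polynomials can be computed by Gauss-Legendre quadrature:

```python
import numpy as np
from numpy.polynomial import legendre as L

def shifted_legendre(n, u):
    # P_n*(u) = P_n(2u - 1), the shifted Legendre polynomial on [0, 1]
    c = np.zeros(n + 1)
    c[n] = 1.0
    return L.legval(2.0 * u - 1.0, c)

def fourier_coeffs(g, N, nodes=200):
    # b_n = sqrt(2n + 1) * \int_0^1 g(u) P_n*(u) du, via Gauss-Legendre quadrature
    x, w = L.leggauss(nodes)            # nodes and weights on [-1, 1]
    u = 0.5 * (x + 1.0)                 # map to [0, 1]
    w = 0.5 * w
    return np.array([np.sqrt(2 * n + 1) * np.sum(w * g(u) * shifted_legendre(n, u))
                     for n in range(N)])

# expand g(u) = u: b_0 = 1/2, b_1 = sqrt(3)/6, b_n = 0 for n >= 2
b = fourier_coeffs(lambda u: u, 4)
```

Parseval's identity then gives $\sum_n b_n^2 = \int_0^1 u^2\,du = 1/3$, a quick sanity check on the coefficients.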

Univariate Expansions
A class of orthogonal expansions arises from
$$h(x) = g(x)\Bigl(1 + \sum_{n\ge 1} c_n A_n(x)\Bigr), \qquad (3)$$
where both $h, g$ are probability density functions. Then $(A_n(X))$ is a complete orthonormal set w.r.t. $g$.

Diagonal Expansions
Lancaster [17] studied the orthogonal expansions
$$h(x, y) = f(x)\,g(y)\Bigl(1 + \sum_{n\ge 1} \rho_n\, a_n(x)\, b_n(y)\Bigr), \qquad (4)$$
where $h$ is a bivariate probability density with marginal densities $f$ and $g$, and $(a_n)$, $(b_n)$ are complete orthonormal sets w.r.t. $f$ and $g$, respectively. Moreover, $\rho_n$ is the $n$-th canonical correlation between $X$ and $Y$. Expansion (4) can be viewed as a particular extension of Theorem 3, proved below, when the distance is the so-called chi-square distance. This is proved in [18]. See Section 6.

Continuous Scaling Expansions
In this section we propose a distance-based approach for obtaining orthogonal expansions for a r.v. $X$, which contains the Karhunen-Loève expansion of a Bernoulli process related to $X$ as a particular case. We will prove that we can obtain suitable expansions using continuous multidimensional scaling on a Euclidean distance.
Let $\delta: I \times I \to \mathbb{R}_+$ be a dissimilarity function, i.e., $\delta(x, y) = \delta(y, x) \ge \delta(x, x) = 0$ for all $x, y \in I$. Definition 1. We say that $\delta$ is a Euclidean distance function if there exists an embedding $x \mapsto \psi(x) \in E$, where $E$ is a real separable Hilbert space with quadratic norm $\|\cdot\|$, such that $\delta^2(x, y) = \|\psi(x) - \psi(y)\|^2$. We may always view the Hilbert space $E$ as a closed linear subspace of $L^2$. In this case, we may identify $\psi(x)$ with an element of $L^2$.
Write $L^2(f)$ for the separable Hilbert space of measurable functions $u$ on $I$ such that $\int_I u^2 f\,dx < \infty$. The geometric variability of $X$ w.r.t. $\delta$ is defined by
$$V_\delta(X) = \frac{1}{2}\int_{I\times I} \delta^2(x, y)\,f(x)\,f(y)\,dx\,dy. \qquad (5)$$
The proximity function of $x$ to $X$ is defined by
$$\phi_\delta(x) = \int_I \delta^2(x, y)\,f(y)\,dy - V_\delta(X). \qquad (6)$$
The double-centered inner product related to $\delta$ is the symmetric function
$$G(x, y) = \tfrac{1}{2}\bigl(\phi_\delta(x) + \phi_\delta(y) - \delta^2(x, y)\bigr). \qquad (7)$$
These definitions can easily be extended to random vectors. For example, if $X$ is $N_p(\mu, \Sigma)$, $x$ is an observation of $X$ and $\delta^2(x, y) = (x - y)'\Sigma^{-1}(x - y)$, then $\phi_\delta(x) = (x - \mu)'\Sigma^{-1}(x - \mu)$ is the (squared) Mahalanobis distance from $x$ to $\mu$. The function $G$ is symmetric, positive semi-definite and satisfies $\delta^2(x, y) = G(x, x) + G(y, y) - 2G(x, y)$. It can be proved [19] that there is an embedding realizing $\delta$ whenever $G$ is positive semi-definite. $G$ is the continuous counterpart of the centered inner product matrix computed from an $n \times n$ distance matrix and used in metric multidimensional scaling [20,21]. The Euclidean embedding, or method of finding the Euclidean coordinates from the Euclidean distances, was first given by Schoenberg [22]. The concepts of geometric variability and proximity function originated in [23] and are used in discriminant analysis [19] and in constructing probability densities from distances [24]. In fact, $V_\delta(X)$ is a variant of Rao's quadratic entropy [25]. See also [14,26].
In order to obtain expansions, our interest is in $G(X, X')$, i.e., we substitute $x$ by the r.v. $X$. For convenience, and by analogy with finite classic scaling, let us use a generalized product matrix notation (i.e., a "multivariate analysis notation"), following [18]. We write $G = \mathbf{X} * \mathbf{X}'$ according to (5)-(7) and consider the kernel
$$Q = f^{1/2} * G * f^{1/2},$$
where $f^{1/2}$ stands for the continuous diagonal matrix $\operatorname{diag}\bigl(f(x)^{1/2}\bigr)$ and the row $\times$ column multiplication, denoted by $*$, is evaluated at $x \in I$ and then followed by an integration w.r.t. $x$. In the theorems below, $(u_n)$ denotes a complete orthonormal set of eigenfunctions of $Q$.
Theorem 2. Suppose that for a Euclidean distance $\delta$ the geometric variability $V_\delta(X)$ is finite. Then: 1) $\operatorname{tr}(G) = \int_I G(x, x)\,f(x)\,dx = V_\delta(X) < \infty$; 2) $G$ is positive semi-definite, i.e., for $x_1, \ldots, x_n \in I$ the $n \times n$ matrix with entries $G(x_i, x_j)$ is p.s.d.

Theorem 3. Let $(u_n)$ be a complete orthonormal set of eigenfunctions of $Q$ in $L^2$, with eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots > 0$; note that $\operatorname{tr}(Q) < \infty$ by Theorem 2. Defining $c_n(x) = \lambda_n^{1/2} f(x)^{-1/2} u_n(x)$, we have
$$G(x, y) = \sum_{n\ge 1} c_n(x)\,c_n(y), \qquad (9)$$
$$\int_I c_m(x)\,c_n(x)\,f(x)\,dx = \lambda_n\,\delta_{mn}, \qquad (10)$$
where $\delta_{mn}$ is Kronecker's delta, showing that the variables $c_n(X)$ are centered and uncorrelated; the centering follows from $\int_I G(x, y)\,f(y)\,dy = 0$, a consequence of the double centering.
Recall the product matrix notation. The principal components of $Q$ are obtained from the spectral decomposition $Q = \Phi\,\Lambda\,\Phi'$, where $\Lambda = \operatorname{diag}(\lambda_n)$ and the columns of $\Phi$ are the eigenfunctions $u_n$. Premultiplying (12) by $f^{-1/2}$ and postmultiplying by $\Phi$, we obtain the continuous matrix $C = f^{-1/2} * \Phi\,\Lambda^{1/2}$, which contains the principal components of $Q$. The columns of $C$, i.e., the functions $c_n$, may accordingly be called the principal coordinates of the distance $\delta$. This name can be justified as follows. Let $\widetilde{\mathbf{X}}$ be another Euclidean configuration giving the same distance $\delta$; the linear transformation relating the two configurations preserves the inner products. The coordinates $c_1(X), c_2(X), \ldots$ are uncorrelated, and $c_1(x)$ is the first principal coordinate in the sense that $c_1(X)$ has maximum variance, and so on with the other coordinates. The following expansions hold as an immediate consequence of this theorem, where $(c_n(X))_{n\ge 1}$ is a sequence of centered and uncorrelated random variables, which are principal components of $Q$. We next obtain some concrete expansions.
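As a finite-sample illustration of principal coordinates (the example configuration is an arbitrary choice, not from the paper), classical metric scaling double-centers the squared distance matrix and eigendecomposes it:

```python
import numpy as np

def principal_coordinates(D, k=2):
    # Classical (metric) MDS: double-center the squared distance matrix,
    # eigendecompose, and scale eigenvectors by the root eigenvalues.
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    G = -0.5 * J @ (D ** 2) @ J              # double-centered inner products
    w, V = np.linalg.eigh(G)
    idx = np.argsort(w)[::-1][:k]            # largest eigenvalues first
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0.0))

# Points on a line: a single principal coordinate reproduces all distances.
x = np.array([0.0, 1.0, 3.0, 6.0])
D = np.abs(x[:, None] - x[None, :])
Y = principal_coordinates(D, k=1)
Drec = np.abs(Y[:, 0][:, None] - Y[:, 0][None, :])
```

The kernel $G$ here is the discrete counterpart of the double-centered inner product (7).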

A Particular Expansion
If $X$ is a continuous r.v. with finite mean and variance, $\mu$ and $\sigma^2$ say, and $\delta(x, x') = |x - x'|$ is the ordinary Euclidean distance, then it is easy to prove that $G(x, x') = (x - \mu)(x' - \mu)$ and $V_\delta(X) = \sigma^2$. Then from (14), the geometric dimension is one and the only principal component is $X - \mu$. A much more interesting expansion can be obtained by taking the square root of $|x - x'|$.
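A quick Monte Carlo check of this fact (the distribution and sample size are arbitrary choices): for $\delta(x, x') = |x - x'|$, the geometric variability $\tfrac{1}{2}E(X - X')^2$ equals $\operatorname{var}(X)$.

```python
import numpy as np

# For delta(x, x') = |x - x'|, V_delta(X) = 0.5 * E[(X - X')^2] = var(X).
rng = np.random.default_rng(0)
x, xp = rng.normal(2.0, 1.5, size=(2, 500_000))   # two independent copies of X
V = 0.5 * np.mean((x - xp) ** 2)                  # Monte Carlo geometric variability
```

Here $X \sim N(2, 1.5^2)$, so the estimate should be close to $\sigma^2 = 2.25$.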

The Square Root Distance
Let us consider the distance function
$$\delta(x, y) = |x - y|^{1/2}, \qquad (16)$$
i.e., $\delta^2(x, y) = |x - y|$. The double-centered inner product is then $G(x, y) = \sum_n h_n(x)\,h_n(y)$, where $h_n = c_n$ is defined in (10). We immediately have the following result. Proposition 1. The function $h_n$ satisfies identity (18).
Proof. Using (17) and expanding, we get (18). Replacing $x$ by $X$, we have the expansion
$$X = \mu + \sum_{n\ge 1}\bigl(h_n(X) - E[h_n(X)]\bigr) \qquad (19)$$
and, as a consequence [12], each summand is a centered principal component of $X$. This expansion also follows from (15). From (19) we can also obtain further expansions, where the convergence is in the mean square sense [27].

Principal Components
Related to the r.v. $X$ with cdf $F$, let us define the stochastic process $\mathbf{X} = \{X_t,\ t \in I\}$, where $X_t(x) = 1$ if $x > t$ and $0$ otherwise, so that $X_t$ follows the Bernoulli$(1 - F(t))$ distribution. For the distance (16), the relation between $X$, the Bernoulli process $\mathbf{X}$ and $\delta$ is
$$\delta^2(x, y) = |x - y| = \int_I \bigl(X_t(x) - X_t(y)\bigr)^2\,dt.$$
The covariance kernel of the process is
$$K(s, t) = F(\min(s, t)) - F(s)\,F(t),$$
whose eigenvalues $\lambda_n$ are arranged in descending order. As a consequence of Mercer's theorem, the covariance kernel expands as $K(s, t) = \sum_n \lambda_n u_n(s)\,u_n(t)$, and $h_1(x), h_2(x), \ldots$ are principal coordinates for the distance $\delta$. To prove 1), let us use the multiplication "$*$": $\mathbf{X}$, once centered, is a continuous configuration for $\delta$ and clearly $G = \mathbf{X} * \mathbf{X}'$. Arguing as in Theorem 3, the centered principal coordinates are the functions $h_n$. Part 2) is a consequence of Theorem 3. An alternative proof follows from the formula for the covariance [28]:
$$\operatorname{cov}\bigl(\alpha(X), \beta(Y)\bigr) = \int\!\!\int \bigl(H(x, y) - F(x)\,G(y)\bigr)\,d\alpha(x)\,d\beta(y),$$
where, again, $h_1(x), h_2(x), \ldots$ are principal coordinates for the distance (16). The above results have been obtained via continuous scaling. For this particular distance, we get the same results by using the Karhunen-Loève (also called Kac-Siegert) expansion of $\mathbf{X}$, namely
$$X_t - E[X_t] = \sum_{n\ge 1} \xi_n\, u_n(t), \qquad \xi_n = \int_I \bigl(X_t - E[X_t]\bigr)\,u_n(t)\,dt.$$
Thus each $\xi_n$ is a principal component of $X$ and the sequence $(\xi_n)$ constitutes a countable set of uncorrelated random variables.
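For $X$ uniform on $(0, 1)$ the kernel $K(s, t) = \min(s, t) - st$ is the Brownian-bridge covariance, whose Mercer eigenvalues are $1/(n\pi)^2$; a Nyström-type discretization (the grid size is an arbitrary choice) recovers them numerically:

```python
import numpy as np

# Covariance kernel of the Bernoulli process for X ~ uniform(0, 1):
# K(s, t) = min(s, t) - s * t, the Brownian-bridge kernel, with
# Mercer eigenvalues 1 / (n * pi)^2, n = 1, 2, ...
m = 2000
t = (np.arange(m) + 0.5) / m                  # midpoint grid on (0, 1)
K = np.minimum.outer(t, t) - np.outer(t, t)
w = np.linalg.eigvalsh(K)[::-1] / m           # Nystrom approximation, descending
theory = 1.0 / (np.arange(1, 6) * np.pi) ** 2
```

The leading numerical eigenvalues should match the theoretical ones to within the discretization error.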

The Differential Equation
Let $\mu_n = E[X_n]$. It can be proved [12] that the means $\mu_n$, the eigenvalues $\lambda_n$ and the functions $g_n$ satisfy a second-order differential equation. The solution of this equation is well known in the following cases:
1. $X$ is uniform$(0, 1)$: the eigenvalues are $\lambda_n = 1/(n\pi)^2$ and the eigenfunctions are trigonometric.
2. $X$ is exponential: the eigenvalues are expressed through $\xi_n$, the $n$-th positive root of $J_1$, where $J_0, J_1$ are the Bessel functions of the first kind.
3. $X$ is standard logistic, with $F(x) = (1 + e^{-x})^{-1}$: the principal directions are related to the shifted Legendre polynomials evaluated at $F(X)$.
For instance, we can also obtain in this way the principal dimensions of the Pareto distribution.
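To illustrate the kind of boundary-value problem involved (a standard computation sketched here from the kernel of the previous section, not quoted verbatim from [12]): differentiating the eigenvalue equation for $K(s, t) = F(\min(s, t)) - F(s)F(t)$ twice in $t$ reduces it to a second-order ODE, and in the uniform case this is the classical Brownian-bridge eigensystem.

```latex
\[
\int_I K(s,t)\,u(s)\,ds=\lambda\,u(t),\qquad K(s,t)=F(s\wedge t)-F(s)F(t).
\]
Differentiating twice with respect to $t$ gives
\[
\lambda\Bigl(u''(t)-\frac{f'(t)}{f(t)}\,u'(t)\Bigr)+f(t)\,u(t)=0,
\qquad u(a)=u(b)=0 .
\]
For $X$ uniform on $(0,1)$, $f\equiv 1$ and the problem becomes
\[
\lambda u''+u=0,\qquad u(0)=u(1)=0,
\]
so that
\[
u_n(t)=\sqrt{2}\,\sin(n\pi t),\qquad \lambda_n=\frac{1}{n^{2}\pi^{2}},\qquad n\ge 1 .
\]
```

For the standard logistic cdf, $u = f$ solves the same ODE with $\lambda = 1/2$, consistent with the first eigenvalue quoted below for that distribution.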

A Comparison
The results obtained in the previous sections can be compared and summarized in Table 1, where $X$ is a random variable with absolutely continuous cdf $F$, density $f$ and support $I = [a, b]$.
The continuous scaling expansion is found w.r.t. the distance (16). Note that we reach the same orthogonal expansions (we only include two), but this continuous scaling approach is more general, since by changing the distance we may find other principal directions and expansions. This distance-based approach may be an alternative to the problem of finding nonlinear principal dimensions [30].

Some Properties of the Eigenfunctions
In this section we study some properties of the eigenfunctions $\psi_n$ and of their integrals $h_n$. The first eigenfunction $\psi_1$ is strictly positive (Perron-Frobenius theorem). On the other hand, $h_1$ is increasing and positive. Moreover, any $h_n$ satisfies the following bound, which holds because $\psi_n$ is an eigenfunction, using (24).

Proof. The orthogonality has been proved as a consequence of (24). Integrating the defining relation, it follows that $\phi$ must be constant.

The First Principal Component
In this section we prove two interesting properties of $h_1$ and of the first principal component $h_1(X)$. When the function $z$ is given, $\mu$ and $\lambda$ may be obtained by solving the corresponding equations, and the density of $X$ follows. Let $\rho^2(t, g)$ denote the squared correlation between $X_t$ and a function $g(X)$. Averaging $\rho^2(t, g)$ over $t$, we may suppose $g$ centered and standardized; the supremum is then attained at $g \equiv h_1$.

An Inequality
The following inequality holds for $X$ with the normal $N(0, 1)$ distribution [31,32]:
$$\operatorname{var}[g(X)] \le E\bigl[g'(X)^2\bigr],$$
where $g$ is an absolutely continuous function and $g(X)$ has finite variance. This inequality has been extended to other distributions by Klaassen [33]. Let us prove a related inequality concerning a function of a random variable and its derivative. If $g$ is an absolutely continuous function and $g(X)$ has finite variance, the following inequality holds:
$$\operatorname{var}[g(X)] \le \lambda_1 \int_I g'(x)^2\,dx, \qquad (30)$$
with equality if $g \equiv h_1$. Proof. From Proposition 6, we can write
$$\operatorname{var}[g(X)] = \int_{I\times I} K(s, t)\,g'(s)\,g'(t)\,ds\,dt,$$
and the right-hand side is bounded by $\lambda_1 \int_I g'(t)^2\,dt$, since $\lambda_1$ is the largest eigenvalue of the integral operator with kernel $K$.
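A Monte Carlo illustration of the Gaussian inequality (the test function $g(x) = \sin x$ is an arbitrary choice):

```python
import numpy as np

# Check of the inequality var[g(X)] <= E[g'(X)^2] for X ~ N(0, 1),
# with g(x) = sin(x), g'(x) = cos(x). Exact values:
# E[sin^2 X] = (1 - e^-2)/2, E[cos^2 X] = (1 + e^-2)/2.
rng = np.random.default_rng(1)
x = rng.standard_normal(500_000)
lhs = np.var(np.sin(x))           # var[g(X)]
rhs = np.mean(np.cos(x) ** 2)     # E[g'(X)^2]
```

Since $E[\sin X] = 0$ by symmetry, the variance is just $E[\sin^2 X]$, so both sides have closed forms to compare against.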
Some examples are next given.

Uniform Distribution
Suppose $X$ is uniform on $(0, 1)$. Then $\lambda_1 = 1/\pi^2$ and (30) becomes
$$\operatorname{var}[g(X)] \le \frac{1}{\pi^2}\int_0^1 g'(x)^2\,dx,$$
a Wirtinger-type inequality.

Logistic Distribution
Suppose that $X$ follows the standard logistic distribution. The cdf is $F(x) = (1 + e^{-x})^{-1}$ and the density is $f(x) = F(x)(1 - F(x))$. The first eigenfunction is just $f$ (up to a normalizing constant); therefore inequality (30) for the logistic distribution reduces to
$$\operatorname{var}[g(X)] \le \frac{1}{2}\int_{-\infty}^{\infty} g'(x)^2\,dx.$$
The remaining eigenfunctions are orthogonal to $f$ and, using (24), we obtain the result.

Diagonal Expansions
Correspondence analysis is a variant of multidimensional scaling, used for representing the rows and columns of a contingency table as points in a space of low dimension, separated by the chi-square distance. See [15,34]. This method employs a singular value decomposition (SVD) of a transformed matrix. A continuous scaling expansion, viewed as a generalization of correspondence analysis, can be obtained from (3) and (4).

Univariate Case
Let $h, f$ be two densities with the same support $I$. Define the squared distance
$$\delta^2(x, y) = \Bigl(\frac{h(x)}{f(x)} - \frac{h(y)}{f(y)}\Bigr)^2.$$
The double-centered inner product is given by
$$G(x, y) = \Bigl(\frac{h(x)}{f(x)} - 1\Bigr)\Bigl(\frac{h(y)}{f(y)} - 1\Bigr)$$
and the geometric variability is
$$V_\delta = \int_I \frac{\bigl(h(x) - f(x)\bigr)^2}{f(x)}\,dx,$$
which is a Pearson measure of divergence between $h$ and $f$. Although $(A_n)$ is an orthonormal basis for $L^2(f)$, the functions $A_1(x), A_2(x), \ldots$ are not the principal coordinates related to the above distance. In fact, the continuous scaling dimension is 1 for this distance, and it can be found in a straightforward way.
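A numerical sketch of the geometric variability as a Pearson divergence (the choice of densities is arbitrary): with $f$ uniform on $(0, 1)$ and $h(x) = 2x$ (a Beta(2, 1) density), $\int (h - f)^2/f\,dx = \int_0^1 (2x - 1)^2\,dx = 1/3$.

```python
import numpy as np

# V_delta = \int_I (h - f)^2 / f dx with f = 1 on (0, 1) and h(x) = 2x,
# which equals 1/3 exactly.
m = 200_000
x = (np.arange(m) + 0.5) / m        # midpoint grid on (0, 1)
f = np.ones_like(x)
h = 2.0 * x
V = np.mean((h - f) ** 2 / f)       # midpoint-rule integral
```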
Suppose now that $h$ is a bivariate density, absolutely continuous w.r.t. $f \times g$, and consider the Radon-Nikodym derivative $h/(f \times g)$. The geometric variability of the chi-square distance is the mean-square contingency $\varphi^2$. The proximity function and the double-centered inner product are obtained as above. We can express (32) as an expansion in the canonical functions, provided $\varphi^2 < \infty$. Multiplying (32) by itself and integrating w.r.t. $y$, we readily obtain $\sum_n \rho_n^2 = \varphi^2$. See [18] for further details. Finally, the following expansion in terms of cdf's holds [35]:
$$H(x, y) - F(x)\,G(y) = \sum_{n\ge 1} \rho_n\, A_n(x)\, B_n(y),$$
where $A_n(x) = \int_{-\infty}^{x} a_n\,dF$ and $B_n(y) = \int_{-\infty}^{y} b_n\,dG$. Using a matrix notation, this diagonal expansion can be written as
$$dH - dF\,dG = dA\,D_\rho\,dB',$$
where $D_\rho$ stands for the diagonal matrix with the canonical correlations, and $dA$, $dB$ collect the differentials $dA_1, dA_2, \ldots$ and $dB_1, dB_2, \ldots$

The Covariance between Two Functions
Here we generalize the well-known Hoeffding formula
$$\operatorname{cov}(X, Y) = \int\!\!\int \bigl(H(x, y) - F(x)\,G(y)\bigr)\,dx\,dy,$$
which provides the covariance in terms of the bivariate and univariate cdf's. The proof of the generalization below uses Fubini's theorem and integration by parts, and is different from the proof given in [28].
Let us suppose that the supports of $X$ and $Y$ are the intervals $[a, b], [c, d] \subset \mathbb{R}$, respectively, although the results may also hold for other subsets of $\mathbb{R}$. We then have, for functions $\alpha, \beta$ of bounded variation,
$$\operatorname{cov}\bigl(\alpha(X), \beta(Y)\bigr) = \int\!\!\int \bigl(H(x, y) - F(x)\,G(y)\bigr)\,d\alpha(x)\,d\beta(y).$$
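Hoeffding's formula can be checked numerically on a simple case (the comonotone pair is an arbitrary choice): for $(X, Y) = (U, U)$ with $U$ uniform on $(0, 1)$, $H(x, y) = \min(x, y)$, $F(x) = x$, $G(y) = y$, and $\operatorname{cov}(U, U) = \operatorname{var}(U) = 1/12$.

```python
import numpy as np

# cov(X, Y) = \int\int [H(x, y) - F(x) G(y)] dx dy for (X, Y) = (U, U):
# \int\int [min(x, y) - x y] dx dy over (0,1)^2 equals 1/12.
m = 1000
g = (np.arange(m) + 0.5) / m              # midpoint grid on (0, 1)
H = np.minimum.outer(g, g)                # H(x, y) = min(x, y)
FG = np.outer(g, g)                       # F(x) G(y) = x y
cov = np.sum(H - FG) / m ** 2             # double midpoint rule
```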

Canonical Analysis
Given two sets of variables, the purpose of canonical correlation analysis is to find sets of linear combinations with maximal correlation. In Section 6.2 we studied, from a multidimensional scaling perspective, the nonlinear canonical functions of $X$ and $Y$ with joint density $h$. Here we find the canonical correlations and functions for several copulas.

Let $(U, V)$ be a bivariate random vector with cdf $C(u, v)$, where $U$ and $V$ are uniform on $(0, 1)$. Then $C$ is a cdf called a copula [36]. Let us suppose $C$ symmetric, i.e., $C(u, v) = C(v, u)$. For the Cuadras-Augé copula
$$C(u, v) = \bigl(\min(u, v)\bigr)^{\theta}\,(uv)^{1-\theta}, \qquad 0 \le \theta \le 1,$$
the set of canonical functions and correlations is the uncountable set $\{(\phi_\gamma,\ \theta\gamma^{1-\theta}),\ 0 \le \gamma \le 1\}$, with dimension of the power of the continuum. In particular, the maximum correlation is the parameter $\theta$, with canonical function of Heaviside (indicator) type.
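The Cuadras-Augé copula can be checked numerically via a common-shock (Marshall-Olkin type) representation, a standard construction not taken from this paper; the parameter, evaluation point and sample size below are arbitrary choices:

```python
import numpy as np

# Sampling C(u, v) = min(u, v)^theta * (u v)^(1 - theta), 0 < theta < 1,
# via the common-shock construction: with U1, U2, U3 iid uniform(0, 1),
# X = max(U1^(1/(1-theta)), U3^(1/theta)), Y = max(U2^(1/(1-theta)), U3^(1/theta))
# has uniform marginals and joint cdf C.
def sample_cuadras_auge(theta, n, rng):
    u1, u2, u3 = rng.random((3, n))
    shock = u3 ** (1.0 / theta)
    x = np.maximum(u1 ** (1.0 / (1.0 - theta)), shock)
    y = np.maximum(u2 ** (1.0 / (1.0 - theta)), shock)
    return x, y

theta = 0.5
rng = np.random.default_rng(2)
x, y = sample_cuadras_auge(theta, 400_000, rng)
emp = np.mean((x <= 0.5) & (y <= 0.5))        # empirical C(0.5, 0.5)
C = 0.5 ** theta * 0.25 ** (1.0 - theta)      # exact value
```

The construction is easy to verify algebraically: $P(X \le u, Y \le v) = u^{1-\theta} v^{1-\theta} \min(u, v)^{\theta} = C(u, v)$.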


Table 1 . Principal components and principal directions of a random variable.