Distribution of the Sample Correlation Matrix and Applications

For the case where the multivariate normal population does not have null correlations, we give the exact expression of the distribution of the sample matrix of correlations R, with the sample variances acting as parameters. Also, the distribution of its determinant is established in terms of Meijer G-functions in the null-correlation case. Several numerical examples are given, and applications to the concept of system dependence in Reliability Theory are presented.


Introduction
The correlation matrix plays an important role in multivariate analysis since by itself it captures the pairwise degrees of relationship between different components of a random vector.Its presence is very visible in Principal Component Analysis and Factor Analysis, where in general, it gives results different from those obtained with the covariance matrix.Also, as a test criterion, it is used to test the independence of variables, or subsets of variables ( [1], p. 407).
In a normal distribution context, when the population correlation matrix = I Λ , the identity matrix, or equivalently, the population covariance matrix Σ is diagonal, i.e. ( ) , the distribution of the sample correlation matrix R is relatively easy to compute, and its determinant has a distribution that can be expressed as a Meijer G-function distribution.But when ≠ I Λ no expression for the density of R is presently available in the literature, and the distribution of its determinant is still unknown, in spite of efforts made by several researchers.We will provide here the closed form expression of the distribution of R, with the sample variances as parameters, hence complementing a result presented by Fisher in [2].
As explained in [3], for a random matrix, there are at least three distributions of interest, its "entries distribution" which gives the joint distribution of its matrix entries, its "determinant distribution" and its "latent roots distribution".We will consider the first two only and note that, quite often, the first distribution is also expressed in terms of the determinant, and can lead to some confusion.In Section 2 we recall some results related to the case where ≠ I Λ , and establish the new exact expression of the density of R, with the sample variances { } ii s as parameters, denoted { } ( ) . In Section 3, some simulation results are given.The distribution of R , the determinant of R, is given in Section 4 for = I Λ .Applications of the above results to the concept of dependence within a multi-component system are given in Section 5. Numerical examples are given throughout the latter part of the paper to illustrate the results.

Covariance and Correlation Matrices
Let us consider a random vector X with mean µ and covariance matrix Σ , of the form of a (p × p) symmetric positive definite random matrix of pairwise covariances between components in the matrix.We obtain the population correlation matrix Λ by diving each ij σ by ii jj σ σ .Then (or matrix of sums of squares and products, ) are independent, with the latter having a Wishart distribution ( ) Similarly, the correlation matrix  1 , and, similarly, It is noted that by considering the usual sample covariance matrix ( ) , between the two diagonal ele- ments, but the sample correlation matrix is the same.
The ( ) coefficients ij r are (marginally) distributed independently of both the sample and popula- tion means [2].It is to be noticed that while R can always be defined from S, the reverse is not true since S has ( ) independent parameters.This fact explains the differences between results when either R or S is used.
In the bivariate case, Hotelling's expression [4] clearly shows that the density of r depends only on the population correlation coefficient ρ and the sample size n (see Section 4.3) and we will see that, similarly, the density of the sample correlation matrix, with the sample variances as parameters, is dependent on the population variances and the sample size.However, ij r are biased estimators of ij σ , and Olkin and Pratt [5] have suggested using the modified estimator ( ) , with a table of corrective multipliers for ij r for convenience.

Some Related Work
Several efforts have been carried out in the past to obtain the exact form of the density of R for the general case, where 0 ij ρ ≠ , i j ≠ and 3 p ≥ .For example, Joarder and Ali [6] derived the distribution of R for a class of elliptical models.Ali, Fraser and Lee [7], starting from the identity correlation matrix case, derived the density for the general case when ≠ I Λ , again by modulating the likelihood ratio to obtain a density of R containing the function , already used by Fraser ([8], p. 196) in the bivariate case.But here,

( )
Vec R , gives for a first-order approximation for R and the expression of ( )

( )
Vec R , presented a general method for approximating the density of R through another multivariate density, possibly one of higher dimension.Finally, Farrell ([11], p. 177) has approached the problem using exterior differential forms.However, no explicit expression for the density of R is given in any of these works, and the question rightly raised is whether that such an expression really exists.

Some New Results
We begin with the case of ( ) . Then it can be easily established that the density of R is ( [12], p. 107): ( ) By showing that the joint density of the diagonal elements { } and R are hence independent.We can also show that (2) is a density, i.e. it integrates to 1 within its definition domain, by using the approach given in Mathai and Haubold ([13], p. 421), based on matrix decomposition.
In what follows, following Kshirsagar's approach [14], which itself is a variation of Fisher's [2] original method, we establish first the expression of a similar joint distribution, when Σ is not diagonal.Λ has ii λ as its diagonal elements.Then the correlation matrix R of a random sample of n observations has its distribution given by: ( ) ( ) where , with the sample covariance { } ij s , 1 i j p ≤ < ≤ , serving as parameters.
PROOF: Let us consider the "adjusted sample covariance matrix" S, given by (1).We know that ~p W S ( ) with C being an appropriate constant.
Our objective is to find the joint density of R with a set of variables { } 1 p i i u = so that the density of R can be obtained by integrating out the variables i u .Let 1 − Λ be the inverse of Λ , the (population) matrix of correlations, with diagonals ii λ , 1 i p ≤ .We first transform the ( ) Since the diagonal ii s remain unchanged, the Jacobian of the transformation from S to and the new joint density is: where , which depends on S.
The joint density of { } and R is now: are independent only if the above expression can be expressed as the product , which is not the case since Φ contains ij r .
To integrate out the vector u , we set ( ) by Fisher [2], and by ( ) Kshirsagar [14] who used the nota- tion ( ) F Γ and, in both instances, was left non-computed).Hence, if ( ) G Φ is constant, as in the case when with , being the p-gamma function.However, in the general case this is not true.
Consider the quadratic form: Changing to ij s , we have ( ) ( ) Since each integral is a gamma density in ii s , with value G Φ contains all non-diagonal entries of the sample covariance matrix, and depends on S. We now obtain from (6) expression (3) of Theorem 1.Here, the off-diagonal sample covariance, { } ij s , serve as parameters of this density.

QED.
Alternately, using the corresponding correlation coefficients, we have:

REMARKS:
1) For = I Λ our results given above should reduce to known results, and they do.Indeed, since we now have = I Φ ( ) as in (2) since we now have ( ) Here, only the value of p needs to be known and this explains why expression (2) depends only on n and p.Also, as pointed out by Muirhead ([15], p. 148), if we do not suppose normality, the same results can be obtained under some hypothesis.
2) Expression (3) can be interpreted as the density of R when { } 1 p ii i σ = are known, and { } , 1 ij s i j p ≤ < ≤ , is a set of constant sample covariances.But when this set is considered as a random vector, with a certain distribution, ( ) ℑ is given by (7).
3) We have, using ( ) Expression (8) gives the positive numerical value of ( ) upon knowledge of the value of S. It will serve to set up simulation computations in Section 3.However, it also shows that ( ) f R R can be defined with S, and when S has a certain distribution the values of ( ) f R R are completely determined with that distri- bution.This highlights again the fact that, in statistics, using R or S can lead to different results, as mentioned previously.

Other Known Results
1) The distribution of the sample coefficient of correlation in the bivariate normal case can be determined fairly directly when integrating out 1 s and 2 s , and this fact is mentioned explicitly by Fisher ([2], p. 4), who stated: "This, however, is not a feasible path for more than two variables."In the bivariate case, ( , and has been well-studied by several researchers, using different approaches and, as early as 1915, Fisher [16] gave its density as: Using geometric arguments, with ρ being the population coefficient of correlation.We refer to ( [17], p. 524-534) for more details on the derivation of the above expression.Other equivalent expressions, reportedly as numerous as 52, were obtained by other researchers, such as Hotelling [4], Sawkins [18], Ali et al. [7].
2) Using Equation (3) on the sample correlation matrix 1 1 , obtained from the sample covariance matrix Together with p = 2, we can also arrive at one of these forms (see also Section 4.3, where the determinant of R provides a more direct approach).
3) Although Fisher [2] did not give the explicit form of the integral ( ) Φ , as a function of the sample size n and coefficients . For example, in the case all 0 ij ρ = , i j ≠ , the generalized volume in the p(p − 1)/2-dimension space of the region of integration for r ij is found to be a function of p, having a maximum at p = 6.In the case all 0 ij γ = , the ex- pressions of the partial derivatives of Log ( ) G Φ w.r.t.ij γ can be obtained, and so are the mixed derivatives.
For the case n = 2, and p = 3,

( )
G Φ has an interesting geometrical interpretation as ( ) , where

Simulations Related to R
A matrix equation such as (3) can be difficult to visualize numerically, especially when the dimensions are high, i.e. 3 p ≥ .Ideally, to illustrate (7), a figure giving ( ) f R R in function of the matrix R itself is most informa- tive, but, naturally, impossible to obtain.One question we can investigate is how the values of ( ) µ Σ ?Simulation using (8) can provide some information on this distri- bution in some specific cases.For example, we can start from the (4 × 4) population covariance matrix 0.122 0.097 0.016 0.010 0.140 0.011 0.009 0.030 0.006 0.011 taken from our analysis of Fisher's iris data [19].It concerns the Setosa iris variety, with x 1 = sepal length, x 2 = sepal with, x 3 = petal length and x 4 = petal width.It gives the population correlation matrix 1 0.7425 0.2672 0.2781 1 0.1777 0.2328 1 0.3316 where all 0 ij ρ ≠ .We generate 10,000 samples of 100 observations each from h W , with a very small variation interval, i.e. most of its values are concentrated around the mode.
Note: A special approach to graphing distributions of covariance matrices, using the principle of decomposing a matrix into scale parameters and correlations, is presented in: Tokuda, T., Goodrich, B., Van Mechelen, I., Gelman, A. and Tuerlinckx, F., Visualizing Distributions of Covariance Matrices (Document on the Internet).
It is also mentioned there that for the Inverted Wishart case, with ν degrees of freedom, then ( ) where ii R is the i-th principal sub-matrix of R, obtained by removing row and column i (p.12).

Simulations Related to R
Similarly, the same application above gives the approximate simulated distribution of R presented in Figure 2. We can see that it is a unimodal density which depends on the correlation coefficients ij r .
Using expression (7), which exhibits explicitely r ij , and replacing r ij by the corrected value ( ) , the unbiased estimator of ij ρ , we obtain R .However, since an unbiased estimator of Λ is still to be found we cannot use neither R nor R as point-estimate of Λ .Figure 2 gives the simulated distributions of R and of R .We can see that the two approximate densities are different, and the density of R has higher mean and median, resulting in a shift to the right.But, again, the two variation intervals are very small.

Expression of ( ) Φ G
In the proof of Theorem 1, we have established that  ( ) Using the above matrix Σ , for a simulated sample, say 0.1106 0.019 0.015 0.017 0.0384 0.0004 0.0092 0.0098 0.0034 0.172 we compute directly the left side by numerical integration, and the right side by using the algebraic expression.
The results are extremely close to each other, with both around the numerical value 1.238523012 × 10 5 .

Distribution of R , the Determinant of R
First, let det(R) be denoted by R .In this case , this distribution is very complex and no related result is known when 3 p ≥ .Nagar and Castaneda [20], for example, established some results in the general case, for p = 2.
Theoretically, we can obtain the density of R from (3) by applying the transformation → R R , with dif- ferential ( ) ( ) , but the expression obtained quickly becomes intractable.Only in the case of = I Λ that we can derive some analytical results on R , as presented in the next section.Gupta and Nagar [21] established some results for the case of a mixture of normal models, but again under the hypothesis of = I Λ .

Density of the Determinant R
When considering Meijer G-functions and their extensions, Fox's H-functions [22], for = I Λ the density of R can be expressed in closed forms, as are those of other related multivariate statistics [23].Let us recall that the Meijer function , and the Fox function , are defined as follows: ( ) is the integral along the complex contour L of a rational expression of Gamma functions ( ) ( ) ( ) ( ) It is a special case, when 1, Under some fairly general conditions on the poles of the gamma functions in the numerator, the above integrals exist.( ) ( )

PROOF:
From (2) the moments of order t of R are: ( ) 1 t ≥ , which is a product of moments of order t of independent beta variables.Upon identification of ( 12) with these products, we can see that  .Using [23], the product of k independent betas, ( ) ( ) ( ) ) , Hence, we have here, the density of R as: β = and ( ) And, hence, we obtain:

QED.
The density of R can easily be computed and graphed, and percentiles of R can be determined numeri- cally.For example, for p = 4, n = 8.The 2.5 th and 97.5 th percentiles can be found to be 0.04697 and 0.7719 respectively.

Product and Ratio
Let 1 R and 2 R be two independent correlation matrices, obtained from 2 populations, each with zero popula-tion correlation coefficients.The determinant of their product is also a G-function distribution, and its density can be obtained.This result is among those which extend relations obtained in the univariate case by Pham-Gia and Turkkan [24], and also has potential applications in several domains.THEOREM 3: , p N µ Σ respectively, both i Σ being diagonal.Then the determinant R of the product R and 2 R has density: where PROOF: Immediate by using multiplication of G-densities presented in [23].

QED.
Figure 3 shows the density of n = , 1 4 p = and 1 10 n = , 2 5 p = .Using again results presented in [23] we can similarly derive the density of the ratio 1 2 R R in terms of G-functions.Its expression is not given here to save space but is available upon request.

Particular Cases
1) Bivariate normal case: a) for the bivariate case we have , and when ρ is zero, we have from (11) the density of R as . Hence, the distribution of 2 r is: ( ) ( ) for 1 1 r − ≤ ≤ .Testing 0 : 0 H ρ = is much simpler when using Student's t-distribution, , and is covered in most textbooks.
2) When 0 ρ ≠ Hotelling ( [4]) gave the density of r as: where , ; ; F a b c x is Gauss hypergeometric function with parameters , a b and c. 3) Mixture of Normal Distributions: With X coming now from the mixture: , 1 , F , as expected.For the bivariate case, Nagar and Castaneda [20] established the density of r and gave its expression for both cases, 0 ρ ≠ and 0 ρ = .In the first case the density of r, when only one population is considered, reduces to the expression obtained by Hotelling [4] above.

Dependence and R
Correlation is useful in multiple regression analysis, where it is strongly related to collinearity.As an example of how individual correlation coefficients are used in regression, the variance inflation factor (VIF), well adopted now in several statistical softwares, measures how much the variance of a coefficient is increased by collinearity, or in other words, how much of the variation in one independent variable is explained by the others.For the j-th variable, j VIF is the j-th diagonal element of 1 − R .We know that it equals ( ) , with .12 1, 1, , being the multiple correlation of the j-th variable regressed on the remaining 1 p − others.When all correlation measures are considered together, measuring intercorrelation by a single number has been approached in different ways by various authors.Either the value of R or those of its latent roots can be used.Rencher ([26], p. 21) mentions six of these measures, among them 1− R , where R is an observed value of R, and takes the value 1 if the variables are independent, and 0 if there is an exact linear dependence.But since the exact distribution of R is not available this sample measure is rather of descriptive type and no formal inferential process has really been developed.
Although the notion of independence between different components of a system is of widespread use in the study of the system structure, reliability and performance, its complement, the notion of dependence has been a difficult one to deal with.There are several dependence concepts, as explained by Jogdev [27], but using the covariance matrix between different components in a joint distribution remains probably the most direct approach.Other more theoretical approaches, are related to the relations between marginal and joint distributions, and Joe [28] can be consulted on these approaches.Still, other aspects of dependence are explored in Bertail et al. [29].But two random variables can have zero correlation while being dependent.Hence, no-correlation and independence are two different concepts, as pointed out in Drouet Mari and Kotz [30].Furthermore, for two independent events, the product of their probabilities gives the probability of the intersection event, which is not necessarily the case for two non-correlated events.Fortunately, these two concepts are equivalent, when the underlying population is supposed normal, a hypothesis that we will suppose in this section.

Inner Dependence of a System
When considering only two variables, several measures of dependence have also been suggested in the literature (Lancaster [31]), and especially in system reliability (Hoyland and Rausand [32]), but a joint measure of the degree of dependence between several components of a random vector, or within a system ϑ , or inner depen- dence of ϑ , denoted by ( ) δ ϑ , is still missing.We approach this dependence concept here by way of the cor- relation matrix, where a single measure attached to it would reflect the overall degree of dependence.This concept has been presented first in Bekker, Roux and Pham-Gia [33], to which we refer for more details.It is defined as ( ) ( ) ≤ .The measure of independence within the system is then , where Λ is a point estimation of Λ based on R, the correlation matrix associated with a sample of n observations of the p-component system.In the general case this estimation question is still unresolved, except for the binormal case, In the language of Reliability Theory, a p-component normal system is fully statistically independent when the ( ) ρ of its components are all zero.We have: THEOREM 4: 1) Let the fully statistically independent system ϑ have p components with a joint normal distribution with = I Σ , where 2 p ≥ .a) Then the distribution of the sample coefficient of inner dependence ( ) b) For the two-component case (p = 2), we have: 2) For a non-fully independent two-component binormal system ( )

Conclusion
In this article we have established an original expression for the density of the correlation matrix, with the sample variances as parameters, in the case of the multivariate normal population with non-identity population correlation matrix.We have, furthermore, established the expression of the distribution of the determinant of that random matrix in the case of identity population correlation matrix, and computed its value.Applications are made to the dependence among p components of a system.Also, expressions for the densities of a sample measure of a system inner dependence are established.
have the relation between determinants: where we suppose that the population correlation≠ I Λand its inverse 1 − defined in two steps: a) { } ij s has an (known or unknown) distribution 0 ℘ , i.e. { } 0 ĩj s ℘ .b) (R; { } 0 ĩj s ℘ ) has the density given by (3), denoted 0 ℑ .The distribution of R is then 0 0 ℑ * ℘ , where * denotes the mixture operation.However, a closed form for this mixture is often difficult to obtain.Alternately, ( ) f R R could be 1 1 ℑ * ℘ , with 1 ℘ being the density of the diagonal sample variances { } ii s and off-diagonal sample correlations, { } 1 1 , and the relation = ⋅ S S D R the following equivalent expression, where generalized volume defined by ij γ − and D is the volume defined by the p unit vectors in a transformation where ij γ − are the cosines of the angles between pairs of edges.

( ) 4 ,Figure 1
N µ Σ , which give 10,000 values of the covariance matrix S, which, in turn, give matrix values for R, scalar values for R and, finally, positive scalar values for R S S , as given by(8).Recall that for X normal S has a Wishart distri- bution gives the corresponding histogram, which shows that values of W are distributed along a unimodal density, denoted by ( )

Figure 2 .
Figure 2. Simulated densities of R and R .
, the density of R depends only on n and p:

Figure 3 .
Figure 3. Density of product of independent correlation determinants.
the density of W in terms of Meijer G-functions, but for the case = I Σ only.The complicated form of this density contains the hypergeometric function ( )

=
, where ρ is the coefficient of correlation with its estimation well known, depending on either ρ is supposed to be zero or not.The associated sample measure being ( ) d r ϑ = , it is of interest to study the dis- tribution of the sample inner dependence ( ) d ϑ , based on a sample of n observations of the system.
the density of ( ) d ϑ , as given by(17), is obtained from(11) by a change of varia- ble. Figure 4 gives the density of the sample coefficient of inner dependence ( ) d ϑ , for n = 10 and p = 4. Expression (18) is obtained from (15) by the change of variable d r = .Again, the density of d, as given by (19), can be derived from (16) by considering the same change of variable.QED.Numerical computations give ( ) 0.17665 E d = .Estimation of ( ) δ ϑ from ( ) d ϑ now follows the same principles as ρ from r.