On Rotational Robustness of Shapiro-Wilk Type Tests for Multivariate Normality ()
1. Introduction
Normal distributions are of central importance in statistical inference and in numerous applications. Thus, testing for normality including assessing multivariate normality has been studied extensively in statistics. For instance, in a research monograph, Thode [1] reviewed more than 30 formal statistical procedures proposed specifically for testing normality. For univariate data, the Shapiro-Wilk test (SWT) [2] for normality is the well- known benchmark of omnibus tests in terms of competitive power performance against a broad range of alternatives. Many extensions of the univariate SWT to multi-dimensions have been developed [1] , and at least two R packages for testing multivariate normality have recently been made based on the SWT-based tests by Royston [3] - [7] and Villaseñor-Alva and González-Estrada [8] [9] . The recently developed R-packages make the two SWT-based tests quite user-friendly thus have potential to be widely used by many practitioners in various fields of applications. Therefore, assessing the relative strength and robustness of these normality tests and providing insights and guidelines on when to use them would be of practical interest. This paper discusses the relative strength and rotational robustness of these user-friendly SWT-based tests and provides some guidelines for selecting tests in practical applications. In the next section we briefly review the original SWT [2] and two SWT-based tests for multivariate normality with R packages [3] - [9] . Section 3 discusses rotational robustness and the FA test [10] . Section 4 discusses the Iris data example and reports Monte Carlo simulation results. Section 5 provides some guidelines and remarks. The R code for the FA test is provided in the appendix.
2. The SWT-Based Normality Tests with R-Packages
Originally created to test univariate distributions for normality, given univariate data
the Shapiro-Wilk test (SWT) [2] statistic is

where
are the order statistics of the univariate data
,
is the sample mean, and the constants
are
, with
and
being the mean
vector and covariance matrix of the order statistics of a random standard normal sample of size
. The univariate SWT can be performed easily in R using the function shapiro.test based on Royston’s algorithms [3] [4] . The Shapiro-Wilk test (SWT) for normality is well known for its competitive power against numerous one-di- mensional alternatives. Many extensions of the SWT to the multivariate case have been proposed in the literature including Royston’s H-test [3] - [7] and a new multivariate extension of the SWT by Villaseñor-Alva and González-Estrada [8] . The first step in the extension of SWT proposed in [8] is to standardize the multiva-
riate data
by substracting the sample mean
and multiplying by the square root of the
inverse of its covariance matrix
. It is known that the covariance matrix
of a
-variate normal is non-
degenerate with probability one. Without loss of generality, let
be the symmetric square root of
and denote the standardized data as
and
is the
data matrix.
Under the null hypothesis,
has a multivariate normal distribution and
has approximately the multivariate standard normal distribution [8] . Villaseñor-Alva and González-Estrada [8] proposed to use the following test statistic
![]()
where
is the univariate SWT statistic evaluated on the ith coordinate of the standardized data
where
,
. This test is easy to understand and has been programmed into an R pack-
age called mvShapiroTest that makes it very user friendly [9] . Moreover, power simulations against many alternatives reported in [8] indicate that the new test has competitive power compared to many well-known normality tests including the HZ test of Henze and Zirkler [11] . Similarly, the well-known Royston’s extension of SWT has also been made into an R-package recently called royston [7] . Both Villaseñor-Alva and González- Estrada [8] and Royston developed effective numerical approximations to the null distribution of the normality tests so that p-values can easily be estimated for any given sample size. Thus the R-packages make these tests much more user-friendly than other SWT-based tests for multivariate normality in the literature [1] [12] - [14] . In addition, the royston package can also draw q-q plots thus facilitating visualization of the non-normality in the data [7] . Given the importance of assessing multivariate normality in practical applications, it is of practical interest to systematically investigate the relative strength of these user-friendly normality tests and provide potential guidelines about test selection among these candidate tests in practice.
3. Rotational Robustness and the FA Test of Fattorini
The multivariate normal distributions have rotational invariance. In particular,
is bivariate normal if and only if
is bivariate normal. In general,
is bivariate normal if and only if
is bivariate normal for any angle
. Also, when
is not bivariate normal,
is not bivariate normal for all angle ![]()
[15] . When
Thus if we know when
the data is more non-normal than at other angles, we would prefer testing normality for
. Similarly, if we know
is more non-normal than at other angles, we would prefer testing normality for
instead of testing normality for
. Ideally, when
is not bivariate normal, we might desire a normality test with good power for detecting non-normality of
for all angle
or a test that can automatically search for the most non-normal direction to test, i.e., one might hope to have a robust normality test when there is no clue about which angle corresponds to the most non-normal data. Among the SWT-based test for multivariate normality, the FA test proposed by Fattorini [10] and a similar test proposed by Malkovich and Afifi [12] actually search for the most non-normal direction and thus would be more rotationally robust than either the mvShapi-
roTest or the Royston test. The FA test statistic is given by
, where
is the SWT statistic evaluated at
, where
i.e.
is the jth standardized observation
, and
is the
data matrix. Possibly due to the ro-
bust power properties of the FA test, Thode [1] recommended the FA test as a generally powerful multivariate normality test besides the HZ test by Henze and Zirkler [11] after reviewing more than 30 tests for normality including both SWT-based and non-SWT based tests. The HZ test uses empirical characteristic functions instead of using extensions of the SWT.
4. The Iris Data Example and Numerical Studies
4.1. The Iris Data Example
The Iris data set is a well-known multivariate data set collected to measure the morphologic variation of Iris flowers of three related species. The data set consists of 50 samples from each of three species of Iris including setosa, virginica and versicolor. For each sample, four variables were measured including the length and the width of the sepals and petals, in centimeters. Fisher [16] used the Iris data to demonstrate the use of unclassified observations in estimating discriminant function. The non-normality of the multivariate Iris data has been investigated by many statisticians [17] [18] . For illustration, we consider the bivariate normality of the Iris data set by using the length and the width of the sepals of two of the species Iris setosa, and Iris versicolor that are the first two columns and first 100 rows of the Iris data. The p-values of the mvShapiro.Test and royston.test in R were 0.26 and 0.37, respectively. Thus both tests fail to detect non-normality when testing X. However, if we test U(X) instead of X for normality, both the mvShapiro.Test and royston.test have very small p-values, 0.0003 and 0.0018, respectively, providing significant evidence of non-normality. Thus in this case, the two tests based on testing U(X) are more powerful than testing X. For the above data the FA test has p-values < 0.001 in testing both X and U(X) indicating rotational robustness.
4.2. Simulation Studies
From the above Iris data example, it is clear that testing for X and testing for U(X) can have dramatically different powers for the mvShapiro.Test and royston.test, we conducted further simulations for a wide variety of alternatives. Indeed, neither of these two tests has robust power against rotational alternatives when the marginal distributions of X are independent. They are seriously lack of rotational robustness compared to the FA test. More specifically, the R package mvShapiroTest was used to evaluate the test statistics, the critical values, and powers of the test discussed in [8] [9] . The mvShapiro.Test function was applied to 500,000 independent samples from the standard bivariate normal distribution to evaluate the test statistics and the percentiles corresponding to alpha = 5% and 1%. The estimated power was the percentage of the simulated mvShapiro.Test$statistic
that exceeded the previously calculated critical values under 50,000 samples from each alternative. Similarly the royston package in R is used to calculate critical values of the royston.test based on 500,000 samples from the standard bivariate normal distribution and the empirical power based on 50,000 samples from each alternative distribution. Using the same set up, the critical values and power of the FA test [10] were obtained using the R code provided in the Appendix. Note that all three tests are applied to standardized data in our simulations to make sure Type I errors are correct under multivariate normality with unknown mean vector and covariance matrix and to ensure fair power comparison.
4.3. Simulation Findings
The simulated power is illustrated in Table 1 for the sample size n = 50 and
. The power patterns are the same for other sample sizes e.g. n = 25 or 100, thus not presented. More specifically, in Table 1, the al-
![]()
Table 1. Power (in percent): (α = 5%, n = 50).
ternative χ2(5)*χ2(5) stands for a bivariate distribution with the two independent marginal distributions each being the χ2(5) distribution.
is non-normal for any given angle when
is from χ2(5)*χ2(5) [15] . However, it is intuitively clear that
is closer to normal than either
or
by the central limit theorem. Thus the mvShapiroTest and the Royston test both have high power to detect non-normality of χ2(5)*χ2(5) when testing
, but they only have less than 50% power to detect non-normality of
On the other hand, the power for the FA test stays essentially at the same 92% level because it searches for the most non-normal directions to test. The power patterns for testing other alternatives with independent marginal distributions are similar. For illustration of alternatives with non-independent marginals, we use the same Pearson II and VII alternatives and the same normal mixture alternatives as in [8] . When marginal distributions are not independent, the
can be more non-normal than
, then the mvShapiroTest and the Royston test both can have much higher power to detect non-normality when testing
than when testing
. This can be seen clearly from the above Iris data example and from the last normal mixture alternative in Table 1, where the mvShapiroTest and the Royston test have only 6% and 15% power when testing
, respectively. However, both tests have much higher power (more than 60% power) when testing non-normality for
From Table 1, if we know the observed marginal distributions are independent and correspond to the most non-normal directions, the mvShapiro.Test or the royston.test in R is a good choice for testing
because of good power. However, the most non-normal angles are typically unknown in practice. In the absence of such information about alternatives, a rotationally robust test such as the FA test might be more desirable than both Royston’s H-test and the mvShapiro.Test proposed in [8] and [9] .
5. Discussion
If we have prior information that
has independent margins and
itself is the most non-normal configuration than any of the
, then we can use either the mvShapiro.Test or the royston.test in R to detect non-normality. When there is no such information available, one might prefer to use a rotationally robust test such as the FA test. The FA test has not been incorporated into R packages yet, but it can be coded easily using the shapiro.test function in R (see Appendix). Given its rotationally robust property, it is likely to be included in some R packages in the future. It is also likely that some new robust test possibly more powerful than the FA test might be developed. For example, Table 1 also reports a test based on searching for the most non-normal
angle in the Royston test among six fixed angles
called the R6 test. Note that the R6 test has equal power whether testing
or
and seems to have robust power
comparable to the FA test in the bivariate case. Of course, combining SWT-based tests with other non-SWT type tests (e.g. the kurtosis test [18] - [20] ) might also worth consideration in order to obtain generally robust and powerful tests such as the combination test developed in [21] .
Acknowledgements
The authors would like to thank Dr. Ming Zhou and Dr. Jiqiang Guo for help with R programming and for constructive conversations. Y.S.’s research is partially supported by the NYU NIEHS Center Grant P30 ES00260 and the NYU Cancer Center Support Grant 2P30 CA16087.
Appendix: The R Code for Calculating the FA Test Statistic
“SW” <- function(X) shapiro.test(X)$statistic
“FA” <- function(X) {
X < - as.matrix(X)
n < - NROW(X)
p < - NCOL(X)
mu< - apply(X,2,mean)
nSinver< - solve((n-1)*cov(X))
Y < - X%*%t ((X-matrix(rep(mu,n),ncol=p,byrow=TRUE))%*%nSinver)
return(min(apply(Y,2,SW)))}
## END