A Simulation Based Comparison of Correlation Coefficients with Regard to Type I Error Rate and Power
1. Introduction
In practice, researchers are often interested in investigating the linear association between two variables, for which the Pearson-moment correlation coefficient (r) is probably the most commonly used method [1]-[4]. This correlation coefficient, however, is only valid under the assumptions of bivariate normality, a linear relationship between the two variables, and homoscedasticity [5]-[8]. It is also highly sensitive to outliers: as reported by Wilcox [5], a single outlier can mask a strong relationship between two variables. In practice, these assumptions may not be satisfied, which results in a decrease in power and a deviation of the type I error rate from the nominal level [9]-[12]. To address non-normality, a commonly recommended approach is to normalize the non-normal variables using an appropriate transformation and then compute the Pearson correlation coefficient on the transformed data. However, in practice, the transformation approach does not give satisfactory results in many experimental conditions [13] [14]. Various alternatives have been proposed for cases where the assumptions of the Pearson-moment correlation are not satisfied, such as the Spearman-Rank, Kendall-Tau, non-normal transformation based Pearson, Winsorized and Permutation-based correlation coefficients [5] [15] [16]. Among these, the Spearman-Rank correlation coefficient is the next most commonly utilized approach in practice, while the Winsorized and Permutation-based correlation coefficients may be even more appropriate, as neither is highly affected by distribution shape and outliers [2] [12] [16]-[19]. However, the relative performance of these different methods needs to be explored further.
In this study, we compare the performance of the Pearson-moment correlation coefficient with that of the alternatives, namely the Spearman-Rank, Kendall-Tau, Winsorized and Permutation-based correlation coefficients, under different experimental conditions through extensive simulations. In Section 2, we describe the experimental conditions and provide a brief description of the correlation coefficients being compared. The results of the simulations follow in Section 3. We end with a practical real-life example and discussion.
2. Materials and Methods
2.1. Simulation Setting
Three different bivariate distributions, namely, the bivariate normal distribution, the bivariate t-distribution with 10 degrees of freedom, and the bivariate log-normal distribution (skewness = 2.1), were considered, where the choices of the bivariate t and log-normal distributions represent different degrees of violation of bivariate normality. All three distributions were generated as suitable transformations from a standard bivariate normal distribution with a given covariance structure.
The bivariate normal distributions
were generated according to the following transformation:

where
and 
The bivariate t-distribution with 10 d.f.
was generated as follows:

where, u and v are independent variables generated from the gamma distribution.
The log-normal distribution was generated as an exponential transformation of the bivariate normal distribution as follows:
[20] -[23] .
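The generating transformations themselves are not reproduced here, so the following is only a minimal sketch of one standard textbook way to draw correlated pairs from the three families. It is an assumption, not the construction used in the study: the bivariate t draw divides a normal pair by a single shared chi-square term (the text above mentions two gamma variables, u and v), and the function names and the `sigma` parameter are hypothetical. No attempt is made to match the stated skewness of 2.1, and for the log-normal case `rho` is the correlation of the underlying normal pair.

```python
import math
import random


def bivariate_normal_pair(rho, rng):
    """One (x, y) draw from a standard bivariate normal with correlation rho."""
    z1 = rng.gauss(0.0, 1.0)
    z2 = rng.gauss(0.0, 1.0)
    # Mix two independent standard normals so that corr(x, y) = rho.
    return z1, rho * z1 + math.sqrt(1.0 - rho * rho) * z2


def bivariate_t_pair(rho, df, rng):
    """One bivariate t draw: a normal pair divided by sqrt(chi2 / df)."""
    x, y = bivariate_normal_pair(rho, rng)
    # A chi-square variable with df d.f. is Gamma(shape=df/2, scale=2).
    chi2 = rng.gammavariate(df / 2.0, 2.0)
    scale = math.sqrt(df / chi2)
    return x * scale, y * scale


def bivariate_lognormal_pair(rho, sigma, rng):
    """Exponential transformation of a bivariate normal pair (sigma is hypothetical)."""
    x, y = bivariate_normal_pair(rho, rng)
    return math.exp(sigma * x), math.exp(sigma * y)


def draw_sample(kind, n, rho, seed=0, df=10, sigma=1.0):
    """Return two lists (xs, ys) of length n from the requested family."""
    rng = random.Random(seed)
    if kind == "normal":
        pairs = [bivariate_normal_pair(rho, rng) for _ in range(n)]
    elif kind == "t":
        pairs = [bivariate_t_pair(rho, df, rng) for _ in range(n)]
    elif kind == "lognormal":
        pairs = [bivariate_lognormal_pair(rho, sigma, rng) for _ in range(n)]
    else:
        raise ValueError("kind must be 'normal', 't' or 'lognormal'")
    return [p[0] for p in pairs], [p[1] for p in pairs]
```

For example, `draw_sample("t", 20, 0.4)` would give one simulated sample of size 20 for the ρ = 0.4 condition.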
For each experimental condition, defined by sample sizes of 5, 10, 20, 30, 50, and 100 and true correlation coefficients of 0, 0.2, 0.4, 0.6 and 0.8, 50,000 random samples were generated. Naturally, the experimental conditions with a true Pearson’s correlation coefficient of 0 (zero) correspond to the absence of any linear relationship between the two variables; thus, the rate of rejection of the null hypothesis that ρ = 0 is the type I error rate in these conditions, while the rejection rate is the test power when ρ ≠ 0.
The type I error rate and test power are estimated as the rejection proportion based on the nominal alpha level of 0.05; that is, the rejection rate (R) is calculated as
$$R = \frac{1}{N}\sum_{i=1}^{N} I(p_i \le 0.05)$$
where I is the indicator function, $p_i$ is the permutation p-value for the i-th simulated sample, and N is the number of simulations. The IMSL library of Fortran Power Station Developer was used for all simulations.
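The rejection proportion described above can be sketched in a few lines. This is an illustrative helper rather than the Fortran code used in the study; the name `rejection_rate` is hypothetical, and counting p = α as a rejection is an assumption.

```python
def rejection_rate(p_values, alpha=0.05):
    """Proportion of simulated samples whose p-value leads to rejection."""
    # Assumption: a p-value exactly equal to alpha counts as a rejection.
    rejections = sum(1 for p in p_values if p <= alpha)
    return rejections / len(p_values)
```

Applied to the p-values from a condition with ρ = 0, this estimates the type I error rate; applied to a condition with ρ ≠ 0, it estimates power.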
2.2. Correlation Coefficients
2.2.1. Pearson-Moment Correlation Coefficient
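Given a bivariate data set of size n, $(X_i, Y_i),\ i = 1, \ldots, n$, the sample Pearson-moment correlation coefficient, $r$, between variables X and Y is defined by the formula
$$r = \frac{S_{XY}}{\sqrt{S_{XX}\,S_{YY}}}$$
where $S_{XY} = \sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})$ is the sum of products of the departures from the means, and $S_{XX} = \sum_{i=1}^{n}(X_i - \bar{X})^2$ and $S_{YY} = \sum_{i=1}^{n}(Y_i - \bar{Y})^2$ are the sums of squares of the X and Y variables, respectively [24].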
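As a small illustration of this definition (the sum of products of the departures from the means divided by the square root of the product of the two sums of squares), a plain-Python sketch might look as follows. The function name is hypothetical and no significance test is included.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation from sums of products and sums of squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of the departures from the means.
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squares of X and Y.
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)
```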
2.2.2. Spearman-Rank Correlation Coefficient
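Spearman’s rank correlation provides a distribution-free test of independence between two variables. Let R(x) and R(y) denote the ranks of a pair of variables (X and Y). In this case, the Spearman-Rank correlation coefficient is calculated as the Pearson correlation coefficient of the ranks,
$$r_s = \frac{\sum_{i=1}^{n}\left(R(x_i) - \bar{R}(x)\right)\left(R(y_i) - \bar{R}(y)\right)}{\sqrt{\sum_{i=1}^{n}\left(R(x_i) - \bar{R}(x)\right)^2 \sum_{i=1}^{n}\left(R(y_i) - \bar{R}(y)\right)^2}}$$
where $\bar{R}(x)$ and $\bar{R}(y)$ are the mean ranks [24].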
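A rank-based coefficient of this kind can be sketched by ranking each variable and then correlating the ranks. The sketch below assigns average ranks to tied values, which is a common convention and an assumption here, since the treatment of ties is not described above; all function names are hypothetical.

```python
import math


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_r(x, y):
    """Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    s_xy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    s_xx = sum((a - mx) ** 2 for a in rx)
    s_yy = sum((b - my) ** 2 for b in ry)
    return s_xy / math.sqrt(s_xx * s_yy)
```

Because only ranks enter the calculation, any strictly increasing transformation of X or Y leaves the result unchanged.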
2.2.3. Winsorized Correlation
It is well known that the Pearson correlation coefficient is sensitive to outliers. Wilcox (2001) proposed that one way of handling outliers is to compute the Winsorized correlation coefficient, which is computed after the k smallest observations are replaced by the (k + 1)st smallest observation and the k largest observations are replaced by the (k + 1)st largest observation. Thus, the observations are winsorized at each end of both X and Y [5] [18].
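Let $(X_i, Y_i),\ i = 1, \ldots, n$, be a random sample from any bivariate distribution. We winsorize $X_i$ to $W_{X_i}$ and $Y_i$ to $W_{Y_i}$ as described above. Then, the Winsorized correlation coefficient, $r_w$, is computed in the same way as the Pearson-moment correlation, using the Winsorized data, as follows:
$$r_w = \frac{\sum_{i=1}^{n}(W_{X_i} - \bar{W}_X)(W_{Y_i} - \bar{W}_Y)}{\sqrt{\sum_{i=1}^{n}(W_{X_i} - \bar{W}_X)^2 \sum_{i=1}^{n}(W_{Y_i} - \bar{W}_Y)^2}}$$
where $\bar{W}_X$ and $\bar{W}_Y$ are the Winsorized means of the X and Y variables, respectively. Let γ denote the Winsorized percent and define $g = \lfloor \gamma n \rfloor$; then,
$$T_w = r_w \sqrt{\frac{n - 2}{1 - r_w^2}}$$
is distributed as a t-distribution with $(n - 2g - 2)$ d.f. [18].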
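The Winsorizing step and the resulting test statistic can be sketched as below. This is a hedged illustration under stated assumptions: g is taken as the integer part of γn, the statistic has the form r_w·sqrt((n − 2)/(1 − r_w²)) with n − 2g − 2 degrees of freedom, and the function names are hypothetical. The critical value from the t-distribution is left out because it is not available in the Python standard library.

```python
import math


def winsorize(values, g):
    """Replace the g smallest values by the (g+1)st smallest and the
    g largest values by the (g+1)st largest."""
    if 2 * g >= len(values):
        raise ValueError("g is too large for this sample size")
    ordered = sorted(values)
    low, high = ordered[g], ordered[len(ordered) - g - 1]
    return [min(max(v, low), high) for v in values]


def winsorized_correlation(x, y, gamma=0.1):
    """Return (r_w, T_w, degrees of freedom) for Winsorizing fraction gamma."""
    n = len(x)
    g = int(math.floor(gamma * n))  # assumption: integer part of gamma * n
    wx, wy = winsorize(x, g), winsorize(y, g)
    mx, my = sum(wx) / n, sum(wy) / n
    s_xy = sum((a - mx) * (b - my) for a, b in zip(wx, wy))
    s_xx = sum((a - mx) ** 2 for a in wx)
    s_yy = sum((b - my) ** 2 for b in wy)
    r_w = s_xy / math.sqrt(s_xx * s_yy)
    if abs(r_w) >= 1.0:
        # Perfect Winsorized correlation: the statistic is unbounded.
        t_w = math.copysign(math.inf, r_w)
    else:
        t_w = r_w * math.sqrt((n - 2) / (1.0 - r_w * r_w))
    return r_w, t_w, n - 2 * g - 2
```

The returned statistic would then be compared with a t critical value for the returned degrees of freedom.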
2.2.4. Permutation-Based Pearson Correlation
The principles of permutation tests consist of the following four basic steps:
1) Calculate the Pearson correlation coefficient, $r$, using the original data set.
2) Find all possible permutations, of which there are n!; if n is too large, take a large sample, say, 10,000, from all possible permutations.
3) Re-calculate the Pearson correlation coefficient for each permutation sample ($r^{*}$), and
4) Compare the absolute values of the correlation coefficients from the permuted samples with the absolute value of the correlation coefficient from the original sample. The permutation-based p-value is computed by counting the number of times that the absolute correlation coefficients ($|r^{*}|$) calculated from the permuted samples are greater than or equal to the absolute value of the calculated Pearson correlation coefficient ($|r|$) and dividing it by the total number of permutations used, which is shown as follows:
$$p = \frac{\#\{\,|r^{*}| \ge |r|\,\}}{\text{number of permutations used}}$$
[25] [26].
In our simulations, we used 10,000 permutations to obtain the permutation p-values to evaluate the type I error rate and power.
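The four steps above can be sketched as follows. This is a minimal illustration, not the code used in the study: the function names, the `max_exact` cut-off for switching from full enumeration to random sampling, and the small floating-point tolerance are all assumptions made for the example.

```python
import itertools
import math
import random


def pearson_r(x, y):
    """Sample Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    s_xy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    s_xx = sum((a - mx) ** 2 for a in x)
    s_yy = sum((b - my) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


def permutation_p_value(x, y, n_perm=10000, max_exact=8, seed=0):
    """Two-sided permutation p-value for the Pearson correlation."""
    tol = 1e-12  # guards against floating-point ties (assumption)
    r_obs = abs(pearson_r(x, y))  # step 1: statistic on the original data
    n = len(x)
    if n <= max_exact:
        # Steps 2-3: enumerate all n! orderings of y for small samples.
        count = sum(
            1 for perm in itertools.permutations(y)
            if abs(pearson_r(x, perm)) >= r_obs - tol
        )
        return count / math.factorial(n)
    # Steps 2-3 for larger n: draw random permutations instead.
    rng = random.Random(seed)
    y_perm = list(y)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if abs(pearson_r(x, y_perm)) >= r_obs - tol:
            count += 1
    # Step 4: proportion of permuted |r| at least as large as observed |r|.
    return count / n_perm
```

Only y is permuted; holding x fixed while reordering y is enough to break the pairing between the two variables.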
2.2.5. Kendall-Tau Correlation
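Let $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ be a set of joint observations from two random variables, X and Y. Any pair of observations $(x_i, y_i)$ and $(x_j, y_j)$ is said to be concordant if the ranks for both elements agree (i.e. if both $x_i > x_j$ and $y_i > y_j$ or if both $x_i < x_j$ and $y_i < y_j$), and discordant if $x_i > x_j$ and $y_i < y_j$ or if $x_i < x_j$ and $y_i > y_j$.
Let C and D denote the numbers of concordant and discordant pairs, respectively. The Kendall-Tau coefficient is defined as:
$$\tau = \frac{C - D}{n(n-1)/2}$$
[27].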
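Counting concordant and discordant pairs is straightforward to sketch directly. The version below divides C − D by the total number of pairs, n(n − 1)/2, which is an assumption about the exact form intended; tied pairs are counted as neither concordant nor discordant, and the function name is hypothetical.

```python
def kendall_tau(x, y):
    """Return (tau, C, D) from pairwise concordance counts."""
    n = len(x)
    concordant = 0
    discordant = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            # Same sign of both differences: concordant; opposite: discordant.
            sign = (x[i] - x[j]) * (y[i] - y[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    tau = (concordant - discordant) / (n * (n - 1) / 2)
    return tau, concordant, discordant
```

This double loop takes on the order of n² comparisons, which is unproblematic for the sample sizes considered here (n ≤ 100).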
3. Results
The empirical type I error rates and power for the Pearson-moment, Spearman-Rank, Kendall-Tau, Permutation-based and Winsorized correlation coefficients are presented in Tables 1-3. When samples are taken from the bivariate normal distribution, which is the ideal case for the Pearson-moment correlation coefficient, all correlation coefficients retain the type I error rate at the nominal level, except for very small sample sizes such as n = 5. When the type I error rate and power are evaluated together, the Pearson, Winsorized and Permutation-based correlation coefficients are more robust than the Spearman-Rank and Kendall-Tau correlation coefficients.
The empirical type I error rates for the bivariate t-distribution with 10 d.f. are given in Table 2. As seen in Table 2, the type I error rates are very similar to those obtained under bivariate normality presented in Table 1, which is not surprising because the bivariate t-distribution with 10 degrees of freedom is quite close to the bivariate normal distribution. The Permutation-based and Winsorized correlation coefficients perform best under these experimental conditions, especially when sample sizes are small (n ≤ 10), as they retain the type I error rate while having higher power than the other approaches. On the other hand, as the sample size increases, all correlation coefficients tend to give similar results, as expected.
The empirical type I error rates and power of the correlation coefficients under the bivariate log-normal distribution are presented in Table 3, where it is obvious that the Pearson correlation coefficient is strongly and negatively affected by the distribution shape. The empirical type I error rate for the Pearson correlation coefficient is estimated to be greater than 6% even when the sample size is 100. The Spearman-Rank and Kendall-Tau correlation coefficients are not much affected by the distribution shape: their type I error rates are quite similar to those observed under the bivariate normal distribution. These two tests are, however, especially affected by the sample size, which is not surprising. Under these experimental conditions, the Winsorized and Permutation-based correlation coefficients, especially the Winsorized correlation coefficient, seem to be the alternatives most robust to the distribution shape, while all methods except the Pearson correlation coefficient seem to have similar type I error rates as the sample size increases. When both the type I error rate and power are evaluated together, the Winsorized and Permutation-based correlation coefficients seem to be more advantageous than the Pearson, Spearman-Rank and Kendall-Tau correlation coefficients, as they are more robust to the shape of the underlying distribution and to sample size.
We also studied the sampling distributions of the estimates for the different correlation coefficients to evaluate the relative frequency of positive estimates, which provides an indirect indication of the relative test power of these correlation coefficients. A higher frequency of positive estimates is expected for a correlation coefficient with higher power than the others. As expected, the number of positive correlation coefficient estimates increases as the sample size increases (Figures 1-3). In general, the test power of the Permutation-based, Winsorized and Pearson correlation coefficients is higher than that of the Spearman-Rank and Kendall-Tau correlation coefficients. There are significant differences between these tests, especially when the sample size is small.
Table 1. Type I error rates and test powers when samples were taken from the bivariate normal distribution.
Table 2. Type I error rates and test powers when samples were taken from the bivariate t-distribution with 10 d.f.
Table 3. Type I error rates and test powers when samples were taken from the bivariate log-normal distribution.
Figures 1-3 are examples of the sampling distributions of the five correlation coefficient methods for the true correlation coefficient of 0.6 and sample sizes of 5, 20 and 100, respectively. Each sampling distribution is obtained from 50,000 simulations. Despite its smaller range, Kendall’s correlation coefficient has the lowest frequency of positive estimates and is thus the least powerful of the five approaches.
Figure 1. Sampling distributions of Pearson-Moment, Spearman-Rank, Kendall-Tau and Winsorized correlation coefficients when samples were taken from the bivariate normal distribution with ρ = 0.60.
Figure 2. Sampling distributions of Pearson-Moment, Spearman-Rank, Kendall-Tau and Winsorized correlation coefficients when samples were taken from the bivariate t-distribution with 10 d.f.
4. A Real-Life Application
To illustrate the application of the five correlation coefficients discussed above, we used a data set obtained from an experiment investigating the effects of three different feeding programs (AD: ad libitum, R20: Feed Restriction Program 1, NF6: Feed Restriction Program 2) on the growth curves of male chickens (Table 4), where the association between the back weight and the chest weight was of interest to the investigator.
Figure 3. Sampling distributions of Pearson-Moment, Spearman-Rank, Kendall-Tau and Winsorized correlation coefficients when samples were taken from the bivariate log-normal distribution.
Table 4. Back weight (g) and chest weight (g) of chickens.
4.1. Pearson Correlation Coefficient
![]()
and
![]()
4.2. Spearman Rank Correlation
![]()
![]()
4.3. Kendall Tau Correlation
For this data set, the numbers of concordant and discordant pairs are found to be 13 and 8, respectively. Thus, the Kendall-Tau correlation coefficient and its significance test are calculated as follows:
![]()
![]()
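As an arithmetic check on the counts quoted above, and assuming there are no tied pairs (so that every pair is either concordant or discordant and C + D equals the total number of pairs), the tie-free form of Kendall's tau works out as follows. Only the coefficient is shown; the significance test is not reproduced.

```latex
\tau = \frac{C - D}{C + D} = \frac{13 - 8}{13 + 8} = \frac{5}{21} \approx 0.238
```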
4.4. Winsorized Correlation
First, the data are sorted from the smallest to the largest in each cell. Then, the smallest and the largest values are replaced with the observations closest to them. Let the Winsorized percentage, γ, be 10.0%. In this case, the new data set would be as follows:
![]()
![]()
![]()
![]()
![]()
![]()
4.5. Permutation-Based Correlation Coefficient
For this data set, it is possible to form
permutation samples. From these possible permutations, we computed the permutation-based p-value as 0.152. The correlation coefficient values and the hypothesis tests for these correlation coefficients are given in Table 5. As seen in Table 5, while the test based on the Winsorized correlation coefficient rejected the null hypothesis, those based on the Pearson, Permutation-based, Spearman and Kendall-Tau correlations failed to reject it. Given the differences in their theory, computation steps, and assumptions, it is not surprising to get different results even when the same data set is used.
5. Discussion
In any research area, the association between two variables is often of interest to the researcher; for example, the researcher may be interested in estimating the association between weight and systolic blood pressure, or between plant length and yield. In doing so, the researcher wishes to test for the existence of an association as well as its strength and direction if one exists. To do that, the researcher needs to choose, among the various available methods, the one that is most suitable to the data at hand, which may not be an easy task. Although the Pearson-moment correlation coefficient is the most commonly employed in practice, many studies have shown that it is not robust to violations of its assumptions, which prompts the researcher to consider alternatives such as the Spearman-Rank, Kendall-Tau, nonlinear transformation based Pearson, Permutation-based and Winsorized correlation coefficients, of which the Spearman-Rank and Kendall-Tau correlation coefficients are the most cited. However, although the advantages of these two correlation coefficients for ordinal and ranked data are emphasized, their performance with continuous data is not clear. In addition, the performance of the Spearman-Rank correlation coefficient has not been investigated thoroughly under many of the experimental conditions that the researcher faces in practice. Another practical approach is to normalize the data via a suitable transformation so as to use the Pearson correlation coefficient [15] [16] [28] [29]. However, nonlinear transformations do not give satisfactory results, especially when distributions are highly skewed and the sample size is small. Furthermore, nonlinear transformations can cause interpretational ambiguity [13] [14]. Thus, it becomes necessary to use other robust alternatives, such as the Winsorized and Permutation-based correlation coefficients, when the assumptions of the Pearson-moment correlation coefficient are violated. However, there has been a need to extensively investigate the performance of these two correlation coefficients under different experimental conditions commonly met in practice.
Table 5. Correlation coefficients, the corresponding critical values, and decision about null hypothesis.
Simulation results showed that unless severe deviations from the normality assumption exist, the Pearson correlation coefficient is the method of choice, which is the case under both the bivariate normal distribution and the bivariate t-distribution with 10 d.f., where there is no skewness. However, it is sensitive to excessive deviations from normality, especially to skewness with small sample sizes (n < 50).
Although the Spearman-Rank correlation coefficient is the most recommended when the assumptions of the Pearson-moment correlation coefficient are not met, in our simulations its performance, as well as that of Kendall-Tau, is quite poor. Our findings seem to contradict those of Fowler [10], who reported that Spearman’s correlation was more powerful than Pearson’s r across a range of non-normal bivariate distributions. Likewise, Zimmerman and Zumbo [11] reported that Spearman’s correlation was more powerful under mixed-normal and non-normal distributions. They also reported that, with exponential distributions, Spearman’s correlation retained the type I error rates at or below the nominal level whereas Pearson’s correlation produced inflated type I error rates. Indeed, as stated in many studies, using the Spearman-Rank and Kendall-Tau correlation coefficients is more appropriate when data are ordinal or ranked [24] [30]-[32]. The apparent difference between our results and those discussed above may come from the experimental conditions, such as the distribution shapes, sample sizes, and number of simulations taken into consideration (note that we worked with 50,000 simulation samples per condition).
In general, the Permutation-based Pearson correlation coefficient and especially the Winsorized correlation coefficient are the most robust to distribution shape, sample size, and outliers. Thus, it is possible to recommend both the Permutation-based correlation coefficient and the Winsorized correlation coefficient over the Spearman-Rank and Kendall-Tau correlation coefficients when the assumptions of the Pearson correlation coefficient are not satisfied. Hayes [2] reported that, especially when the normality assumption is violated, permutation tests tend to do well at controlling the type I error rate, which was similarly argued by Good [19]. Good [33] and Mielke and Berry [34] recommended using Permutation-based procedures, especially when sample sizes are small and the variables of interest are non-normally distributed, conclusions that are supported by our study as well. However, it must be kept in mind, as suggested by Keller-McNulty and Higgins [35], Rasmussen [15], and Hayes [2], that permutation tests do not always solve all assumption violation problems, and simulation results on this issue have been mixed.
6. Conclusion
To conclude, we recommend the use of either the Winsorized correlation coefficient or the Permutation-based Pearson correlation for investigating the relationship between two continuous variables of interest. Between the two, the Permutation-based Pearson correlation is more computationally challenging, as it is critical to generate a sufficiently large number of permutation samples to avoid misleading results [36]. For example, for n = 5, there exist 5! = 120 permutation samples, which can be generated very easily. For n = 8, we have 40,320, and for n = 10, we have 3,628,800 possible permutations. For sample sizes beyond n = 10, it may be too difficult to generate all possible permutations, which makes it necessary to sample from the possible permutations, and such a sample must be sufficiently large. Although Hayes [2] [37] [38] proposed limiting the number of permutation samples to 5000, with today’s computing power, a sample of 100,000 permutations or even more can easily be achieved.
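The growth in the number of possible permutations quoted above can be checked with a short sketch; the helper name is hypothetical.

```python
import math


def n_permutations(n):
    """Number of distinct orderings of n observations (n!)."""
    return math.factorial(n)


# Print the counts for a few sample sizes, including those cited in the text.
for n in (5, 8, 10, 20):
    print(n, n_permutations(n))
```

The count grows factorially, which is why full enumeration quickly becomes impractical and random sampling of permutations is used instead.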