In many areas of applied statistics, confidence intervals for the population mean are of interest. Confidence intervals are typically constructed assuming normality, although non-normally distributed data are common in practice. Given a large enough sample size, confidence intervals for the mean can be constructed by applying the Central Limit Theorem or by the bootstrap method. Another method commonly used in practice is the back-transformation method, which takes three steps. First, apply a transformation to the data such that the transformed data are normally distributed. Second, obtain confidence intervals for the transformed mean in the usual manner, which assumes normality. Third, apply the back-transformation to obtain confidence intervals for the mean of the original, non-transformed distribution. The parametric Wald method and a small-sample likelihood-based third order method, which can address non-normality, are also reviewed in this paper. Our simulation results suggest that common approaches such as back-transformation give erroneous and misleading results even when the sample size is large. However, the likelihood-based third order method gives extremely accurate results even when the sample size is small.

In the last two decades, there has been a push in psychological science to improve research reporting with an emphasis on effect size and confidence interval reporting (see American Education Research Association [

Most, if not all, modern introductory statistics textbooks review and describe the construction of confidence intervals (e.g., see Moore et al. [

$$\left( \bar{x} - t^{*} \frac{s_x}{\sqrt{n}},\; \bar{x} + t^{*} \frac{s_x}{\sqrt{n}} \right) \quad (1)$$

where

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}, \qquad s_x^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1},$$

and $t^{*}$ is the $(1 - \alpha/2)100$th percentile of the Student t distribution with $(n - 1)$ degrees of freedom. Moreover, when the sample size n is large (usually stated as n larger than 30), the $(1 - \alpha)100\%$ confidence interval for μ can still be obtained from (1), except that $t^{*}$ is replaced by $z^{*}$, the $(1 - \alpha/2)100$th percentile of the standard normal distribution.

The fundamental assumption underlying the construction of this confidence interval is that the data are normally distributed. However, collected data are usually non-normally distributed in practice (for examples in psychology, see Cain et al. [

In this paper, we compare various methods for constructing confidence intervals when data are non-normally distributed. Three of the most popular methods are the method based on the Central Limit Theorem, the bootstrap method, and the back-transformation method, which are reviewed in Section 2. The parametric Wald method and the likelihood-based third order method are also discussed in Section 2. Note that the popular back-transformation method requires the existence of a transformation such that the transformed data are normally distributed. The selection of such a transformation using the Box-Cox transformation and Tukey’s ladder of power transformation is briefly discussed in Section 2. Two empirical examples are presented in Section 3 to illustrate that confidence intervals based on the different methods discussed in Section 2 can be vastly different. Simulation results are presented in Section 4 to compare the accuracy of the methods discussed in this paper. They illustrate three points. First, the likelihood-based third order method gives extremely accurate coverage probability even when the sample size is small. Second, the Wald method, the Central Limit Theorem method, and the bootstrap method all perform poorly when the sample size is small, but their performance improves as the sample size increases. Third, the popular back-transformation method should not be used because it does not construct the confidence interval for the correct parameter. Finally, some concluding remarks are given in Section 5.

This section reviews four commonly used methods, namely the Central Limit Theorem, bootstrap, back-transformation, and Wald for obtaining a confidence interval for the mean of a non-normal distribution. A very accurate likelihood-based method is also introduced in this section.

Let $(x_1, \cdots, x_n)$ be a sample from a non-normal distribution with mean $\psi$. When the sample size n is large, the Central Limit Theorem gives

$$\frac{\bar{X} - \psi}{\sqrt{var(\bar{X})}} \to N(0, 1)$$

where $\bar{X} = \sum_{i=1}^{n} X_i / n$ and $var(\bar{X}) = var(X)/n$. Since $\bar{x}$ and $s_x^2$ are unbiased estimates of $\psi$ and $var(X)$, respectively, the Central Limit Theorem gives an approximate $(1 - \alpha)100\%$ confidence interval for $\psi$:

$$\left( \bar{x} - z^{*} \frac{s_x}{\sqrt{n}},\; \bar{x} + z^{*} \frac{s_x}{\sqrt{n}} \right) \quad (2)$$

where $z^{*}$ is the $(1 - \alpha/2)100$th percentile of the standard normal distribution.
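As a minimal sketch of interval (2), the following Python snippet uses only the standard library. The function name `clt_interval` and the data values are illustrative assumptions and do not come from the paper.

```python
import math
from statistics import NormalDist, mean, stdev

def clt_interval(x, alpha=0.05):
    """Approximate (1 - alpha)100% interval for the mean, following (2)."""
    n = len(x)
    # z* is the (1 - alpha/2)100th percentile of the standard normal.
    z_star = NormalDist().inv_cdf(1 - alpha / 2)
    # stdev() uses the (n - 1) divisor, matching s_x in the text.
    half_width = z_star * stdev(x) / math.sqrt(n)
    return mean(x) - half_width, mean(x) + half_width

# Hypothetical right-skewed sample, for illustration only.
sample = [2.1, 3.4, 1.9, 8.7, 2.8, 4.4, 15.2, 3.1, 2.5, 6.0]
lower, upper = clt_interval(sample)
```

The interval is symmetric about the sample mean by construction, which is one reason it can behave poorly for strongly skewed data at small n.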

The bootstrap method is a popular non-parametric method, which does not require any distributional assumptions. Efron and Tibshirani [

Sample: $(x_1, \cdots, x_n)$

Step 1: Resample the observed sample with replacement and calculate the sample mean for this bootstrap sample.

Step 2: Repeat Step 1 B times, where, typically, $B \geq 200$.

Step 3: Sort the B bootstrapped sample means; the $(\alpha/2)100$th and $(1 - \alpha/2)100$th percentiles give the $(1 - \alpha)100\%$ percentile bootstrap confidence interval for the population mean.

Note that as with the Central Limit Theorem method, the bootstrap method requires the observed sample size to be large so as to be representative of the population.
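The three steps above can be sketched in Python as follows. This is an illustrative sketch, not the authors' code: the function name, the default `B`, the fixed seed, and the simple order-statistic rule used to pick the percentiles are all assumptions.

```python
import random
from statistics import mean

def percentile_bootstrap_interval(x, alpha=0.05, B=2000, seed=0):
    """Percentile bootstrap interval for the population mean."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(x)
    # Steps 1 and 2: resample with replacement B times, keeping each mean.
    boot_means = sorted(mean(rng.choices(x, k=n)) for _ in range(B))
    # Step 3: read off the (alpha/2) and (1 - alpha/2) percentiles.
    # Other percentile conventions exist; this one is an assumption.
    lower_index = max(round((alpha / 2) * B) - 1, 0)
    upper_index = min(round((1 - alpha / 2) * B) - 1, B - 1)
    return boot_means[lower_index], boot_means[upper_index]

# Hypothetical data, for illustration only.
sample = [2.1, 3.4, 1.9, 8.7, 2.8, 4.4, 15.2, 3.1, 2.5, 6.0]
lower, upper = percentile_bootstrap_interval(sample)
```

Because every bootstrap mean is an average of observed values, the interval can never extend beyond the range of the sample, which illustrates why a small sample that under-represents the tail limits the method.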

Recall that X is a non-normally distributed random variable with mean $\psi$. Assume there exists a transformation $g(\cdot)$ such that $Y = g(X)$ is normally distributed with mean $\mu$ and variance $\sigma^2$. By the delta method,

$$\psi = E(X) = E[g^{-1}(Y)] \approx g^{-1}(\mu),$$

and an approximate $(1 - \alpha)100\%$ confidence interval for $\psi$ from (1) is

$$\left( g^{-1}\!\left( \bar{y} - t^{*} \frac{s_y}{\sqrt{n}} \right),\; g^{-1}\!\left( \bar{y} + t^{*} \frac{s_y}{\sqrt{n}} \right) \right). \quad (3)$$

It is important to note that (3) could be misleading because $g^{-1}(\mu)$ can be very different from $\psi$. For example, if X follows a lognormal$(\mu, \sigma^2)$ distribution, then $Y = \log(X)$ is distributed as $N(\mu, \sigma^2)$. It follows that the delta method gives $\psi = E(X) \approx \exp(\mu)$. However, as shown in

The rest of this subsection is to provide a systematic way of choosing the transformation g ( ⋅ ) . In practice, the most common simple transformations are the logarithmic transformation and square root transformation. Box and Cox [

λ | Transformation | ψ = E(X) |
---|---|---|
−1 | reciprocal | does not exist |
0 | logarithmic | exp(μ + σ²/2) |
1/2 | square root | μ² + σ² |
1 | none | μ |

When observed data are non-normally distributed, a common approach is to first apply a transformation such that the transformed data become somewhat normally distributed. In the statistical literature, two very similar families of transformations are frequently discussed: the Box-Cox transformation and Tukey’s ladder of power transformation. In particular, Osborne [

Tukey’s ladder of power transformation takes the form

$$Y = X^{\lambda} = \begin{cases} \log(X) & \text{if } \lambda = 0 \\ X^{\lambda} & \text{if } \lambda \neq 0 \end{cases}$$

where λ, the power parameter of the transformation, is chosen such that Y is approximately normally distributed. Moreover, λ should be chosen so that the transformation is easy to interpret. Note that $\lambda = 1$ is equivalent to no transformation. In practice, the popular reciprocal transformation, logarithmic transformation, and square root transformation are equivalent to $\lambda = -1$, $0$, and $\frac{1}{2}$, respectively.
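A minimal Python sketch of the ladder of power transformation is given below. The function name `tukey_ladder` is an illustrative assumption, and the sketch assumes strictly positive data so that the logarithm and fractional or negative powers are defined.

```python
import math

def tukey_ladder(x, lam):
    """Apply Tukey's ladder of power transformation with power parameter lam.

    Assumes every value in x is strictly positive.
    """
    if lam == 0:
        # The lam = 0 rung of the ladder is the logarithm.
        return [math.log(v) for v in x]
    return [v ** lam for v in x]

# Hypothetical positive data, for illustration only.
data = [1.0, 4.0, 9.0, 16.0]
reciprocal = tukey_ladder(data, -1)    # lam = -1
logged = tukey_ladder(data, 0)         # lam = 0
square_root = tukey_ladder(data, 0.5)  # lam = 1/2
unchanged = tukey_ladder(data, 1)      # lam = 1, no transformation
```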

With an observed sample, we suggest the choice of λ be based on three criteria:

1. de-trended normal quantile-quantile (Q-Q) plot,

2. p-value of the Shapiro-Wilk test of normality, and

3. skewness.

First, when the de-trended normal Q-Q plot deviates from the horizontal reference line which indicates identical quantiles between the data and a theoretical normal distribution, the plot suggests that the data are likely non-normally distributed. Second, simulation studies by Razali and Wah [

As in the previous subsection, we assume that X is a non-normally distributed random variable with mean $\psi$ and that there exists a transformation $g(\cdot)$ such that $Y = g(X)$ is normally distributed with mean $\mu$ and variance $\sigma^2$. Moreover, $\psi = \psi(\mu, \sigma^2)$.

The log-likelihood function concerning Y can be written as

$$l(\mu, \sigma^2) = a - \frac{n}{2} \log \sigma^2 - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - \mu)^2 \quad (4)$$

where a is an additive constant. Without loss of generality, a is set to zero hereafter. The overall maximum likelihood estimate (MLE), denoted by $(\hat{\mu}, \hat{\sigma}^2)'$, can be obtained by solving

$$\left. \frac{\partial l(\mu, \sigma^2)}{\partial \mu} \right|_{(\hat{\mu}, \hat{\sigma}^2)} = \frac{1}{\hat{\sigma}^2} \sum_{i=1}^{n} (y_i - \hat{\mu}) = 0$$

$$\left. \frac{\partial l(\mu, \sigma^2)}{\partial \sigma^2} \right|_{(\hat{\mu}, \hat{\sigma}^2)} = -\frac{n}{2\hat{\sigma}^2} + \frac{1}{2\hat{\sigma}^4} \sum_{i=1}^{n} (y_i - \hat{\mu})^2 = 0.$$

Hence, we have

$$\hat{\mu} = \bar{y}, \qquad \hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{\mu})^2 = \frac{(n - 1) s_y^2}{n}.$$

The observed information matrix is the negative of the second derivatives of the log-likelihood function with respect to the parameters:

$$j(\mu, \sigma^2) = \begin{pmatrix} \dfrac{n}{\sigma^2} & \dfrac{1}{\sigma^4} \sum_{i=1}^{n} (y_i - \mu) \\ \dfrac{1}{\sigma^4} \sum_{i=1}^{n} (y_i - \mu) & -\dfrac{n}{2\sigma^4} + \dfrac{1}{\sigma^6} \sum_{i=1}^{n} (y_i - \mu)^2 \end{pmatrix}.$$

The variance-covariance matrix for $(\hat{\mu}, \hat{\sigma}^2)'$ can be approximated by the inverse of Fisher’s expected information matrix, $\{E[j(\mu, \sigma^2)]\}^{-1}$, which, in general, can be difficult to obtain in practice. Nevertheless, the variance-covariance matrix for $(\hat{\mu}, \hat{\sigma}^2)'$ can be approximated by the inverse of the observed information evaluated at the MLE, $j^{-1}(\hat{\mu}, \hat{\sigma}^2)$, where

$$j(\hat{\mu}, \hat{\sigma}^2) = \begin{pmatrix} \dfrac{n}{\hat{\sigma}^2} & 0 \\ 0 & \dfrac{n}{2\hat{\sigma}^4} \end{pmatrix}.$$

It is well-known that $(\hat{\mu}, \hat{\sigma}^2)'$ is asymptotically distributed as normal with mean $(\mu, \sigma^2)'$ and variance $j^{-1}(\hat{\mu}, \hat{\sigma}^2)$.

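Recall that the parameter of interest is $\psi = \psi(\mu, \sigma^2)$, and we denote $\hat{\psi} = \psi(\hat{\mu}, \hat{\sigma}^2)$. By the delta method,

$$\widehat{var}(\hat{\psi}) \approx \left( \frac{\partial \psi(\hat{\mu}, \hat{\sigma}^2)}{\partial (\mu, \sigma^2)} \right)' j^{-1}(\hat{\mu}, \hat{\sigma}^2) \left( \frac{\partial \psi(\hat{\mu}, \hat{\sigma}^2)}{\partial (\mu, \sigma^2)} \right)$$

where

$$\frac{\partial \psi(\hat{\mu}, \hat{\sigma}^2)}{\partial (\mu, \sigma^2)} = \left. \begin{pmatrix} \dfrac{\partial \psi(\mu, \sigma^2)}{\partial \mu} \\ \dfrac{\partial \psi(\mu, \sigma^2)}{\partial \sigma^2} \end{pmatrix} \right|_{(\hat{\mu}, \hat{\sigma}^2)}.$$

Thus, an approximate $(1 - \alpha)100\%$ confidence interval for $\psi$ is

$$\left( \hat{\psi} - z^{*} \sqrt{\widehat{var}(\hat{\psi})},\; \hat{\psi} + z^{*} \sqrt{\widehat{var}(\hat{\psi})} \right). \quad (5)$$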

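For the case of the logarithmic transformation (i.e., Tukey’s ladder of power transformation where $\lambda = 0$), the parameter of interest is $\psi = \exp(\gamma)$, where $\gamma = \mu + \sigma^2/2$. Therefore, by the Wald method, a $(1 - \alpha)100\%$ confidence interval for $\gamma$ is

$$\left( \hat{\gamma} - z^{*} \sqrt{\widehat{var}(\hat{\gamma})},\; \hat{\gamma} + z^{*} \sqrt{\widehat{var}(\hat{\gamma})} \right)$$

where $\hat{\gamma} = \hat{\mu} + \hat{\sigma}^2/2$ and $\widehat{var}(\hat{\gamma}) \approx \dfrac{\hat{\sigma}^2}{n} + \dfrac{\hat{\sigma}^4}{2n}$. Thus, an approximate $(1 - \alpha)100\%$ confidence interval for $\psi$ is

$$\left( \exp\left\{ \hat{\gamma} - z^{*} \sqrt{\widehat{var}(\hat{\gamma})} \right\},\; \exp\left\{ \hat{\gamma} + z^{*} \sqrt{\widehat{var}(\hat{\gamma})} \right\} \right).$$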
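As a worked check of the variance used for the logarithmic case, the delta method can be applied to $\gamma = \mu + \sigma^2/2$ with the inverse observed information at the MLE. The derivation below only combines quantities already defined in this subsection, taking the derivative in the delta method to be the gradient of $\gamma$ with respect to $(\mu, \sigma^2)$.

$$
\begin{aligned}
\frac{\partial \gamma}{\partial (\mu, \sigma^2)} &= \begin{pmatrix} 1 \\ \tfrac{1}{2} \end{pmatrix}, \qquad
j^{-1}(\hat{\mu}, \hat{\sigma}^2) = \begin{pmatrix} \dfrac{\hat{\sigma}^2}{n} & 0 \\ 0 & \dfrac{2\hat{\sigma}^4}{n} \end{pmatrix}, \\
\widehat{var}(\hat{\gamma}) &\approx \begin{pmatrix} 1 & \tfrac{1}{2} \end{pmatrix}
\begin{pmatrix} \dfrac{\hat{\sigma}^2}{n} & 0 \\ 0 & \dfrac{2\hat{\sigma}^4}{n} \end{pmatrix}
\begin{pmatrix} 1 \\ \tfrac{1}{2} \end{pmatrix}
= \frac{\hat{\sigma}^2}{n} + \frac{1}{4} \cdot \frac{2\hat{\sigma}^4}{n}
= \frac{\hat{\sigma}^2}{n} + \frac{\hat{\sigma}^4}{2n}.
\end{aligned}
$$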

For the case of the square root transformation (i.e., Tukey’s (1977) ladder of power transformation where $\lambda = \frac{1}{2}$), the parameter of interest is $\psi = \mu^2 + \sigma^2$. Therefore, an approximate $(1 - \alpha)100\%$ confidence interval for $\psi$ is given by (5), where

$$\hat{\psi} = \hat{\mu}^2 + \hat{\sigma}^2 \quad \text{and} \quad \widehat{var}(\hat{\psi}) = \frac{2\hat{\sigma}^2 (2\hat{\mu}^2 + \hat{\sigma}^2)}{n}.$$
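The two Wald intervals above can be sketched in Python as follows. The function names and the data are illustrative assumptions. The input `y` is the already transformed sample (logged or square-rooted), assumed to be approximately normal.

```python
import math
from statistics import NormalDist, mean

def normal_mle(y):
    """MLE of (mu, sigma^2) for a normal sample; note the divisor n."""
    n = len(y)
    mu_hat = mean(y)
    sigma2_hat = sum((v - mu_hat) ** 2 for v in y) / n
    return mu_hat, sigma2_hat

def wald_interval_log(y, alpha=0.05):
    """Wald interval for psi = exp(mu + sigma^2 / 2), y = log(x)."""
    n = len(y)
    mu_hat, s2 = normal_mle(y)
    z_star = NormalDist().inv_cdf(1 - alpha / 2)
    gamma_hat = mu_hat + s2 / 2
    se = math.sqrt(s2 / n + s2 ** 2 / (2 * n))
    # Build the interval for gamma, then exponentiate both limits.
    return math.exp(gamma_hat - z_star * se), math.exp(gamma_hat + z_star * se)

def wald_interval_sqrt(y, alpha=0.05):
    """Wald interval for psi = mu^2 + sigma^2, y = sqrt(x), using (5)."""
    n = len(y)
    mu_hat, s2 = normal_mle(y)
    z_star = NormalDist().inv_cdf(1 - alpha / 2)
    psi_hat = mu_hat ** 2 + s2
    se = math.sqrt(2 * s2 * (2 * mu_hat ** 2 + s2) / n)
    return psi_hat - z_star * se, psi_hat + z_star * se

# Hypothetical transformed sample, for illustration only.
y = [1.2, 0.7, 2.1, 1.5, 0.3, 1.9, 1.1, 2.6, 0.9, 1.4]
log_interval = wald_interval_log(y)
sqrt_interval = wald_interval_sqrt(y)
```

The logarithmic interval is symmetric on the $\gamma$ scale and therefore asymmetric on the original scale, whereas the square root interval is symmetric about $\hat{\psi}$.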

Both the Central Limit Theorem method and Wald method have a theoretical rate of convergence of $O(n^{-1/2})$, and both the back-transformation method and the bootstrap method have no known rate of convergence. In recent years, many methods have been developed to improve the rate of convergence of existing asymptotic methods. In this subsection, we review the modified signed log-likelihood ratio method by Barndorff-Nielsen [

$$r^{*} = r^{*}(\psi) = r(\psi) - \frac{1}{r(\psi)} \log \frac{r(\psi)}{q(\psi)} \quad (6)$$

where

$$r(\psi) = sign(\hat{\psi} - \psi) \left\{ 2 \left[ l(\hat{\mu}, \hat{\sigma}^2) - l(\hat{\mu}_\psi, \hat{\sigma}_\psi^2) \right] \right\}^{1/2} \quad (7)$$

is the signed log-likelihood ratio statistic, $(\hat{\mu}_\psi, \hat{\sigma}_\psi^2)$ is the constrained MLE obtained by maximizing the log-likelihood function for a given $\psi$ value, and $q(\psi)$ is a statistic based on the log-likelihood function given in (4). Barndorff-Nielsen [

If the model is an exponential family model and the parameter of interest ψ is a component parameter of the canonical parameter, Fraser [

Notation: $l(\theta)$ is the log-likelihood function;

$\theta$ is a k-dimensional vector of parameters;

$\varphi = \varphi(\theta)$ is a k-dimensional vector of canonical parameters for the exponential family model;

$\psi = \psi(\theta)$ is a scalar parameter of interest;

$(x_1, \cdots, x_n)$ is the observed data.

Aim: Inference about $\psi$.

Step 1: From the log-likelihood function, obtain the overall MLE $\hat{\theta}$, together with $\hat{\psi} = \psi(\hat{\theta})$, $l(\hat{\theta})$ and $\hat{j} = j_{\theta\theta}(\hat{\theta})$.

Step 2: Apply the Lagrange multiplier technique to obtain the constrained MLE at $\psi = \psi_0$. More specifically, maximize

$$H(\theta, \lambda) = l(\theta) + \lambda (\psi(\theta) - \psi_0)$$

with respect to $(\theta, \lambda)$, where $\lambda$ is the Lagrange multiplier. Denote the result of the maximization by $(\tilde{\theta}_{\psi_0}, \tilde{\lambda})$.

Step 3: Define the tilted log-likelihood function as

$$\tilde{l}(\theta) = l(\theta) + \tilde{\lambda} (\psi(\theta) - \psi)$$

where $\psi$ is a fixed value. Obtain the constrained MLE $\tilde{\theta}_\psi$ either from the tilted log-likelihood function or from Step 2, together with $l(\tilde{\theta}_\psi) = \tilde{l}(\tilde{\theta}_\psi)$ and $\tilde{j}_{\theta\theta}(\tilde{\theta}_\psi)$, which is the matrix of the negative of the second derivatives of the tilted log-likelihood function.

Step 4: The signed log-likelihood ratio statistic is

$$r = sgn(\hat{\psi} - \psi) \left\{ 2 \left[ l(\hat{\theta}) - l(\tilde{\theta}_\psi) \right] \right\}^{1/2}.$$

Step 5: Define

$$\chi(\theta) = \psi_\theta(\tilde{\theta}_\psi) \, \varphi_\theta^{-1}(\tilde{\theta}_\psi) \, \varphi(\theta)$$

where $\psi_\theta(\theta)$ is the first derivative of $\psi(\theta)$ with respect to $\theta$, and $\varphi_\theta(\theta)$ is the first derivative of $\varphi(\theta)$ with respect to $\theta$. This quantity is a recalibration of the parameter of interest $\psi$ in the canonical parameter $\varphi$ space.

Step 6: The quantity $|\chi(\hat{\theta}) - \chi(\tilde{\theta}_\psi)|$ measures the departure of $\hat{\psi}$ from $\psi$ in $\varphi$ space.

Step 7: The estimated variance for the departure in $\varphi$ space is given by

$$\widehat{var}\left( \chi(\hat{\theta}) - \chi(\tilde{\theta}_\psi) \right) = \frac{ \psi_\theta(\tilde{\theta}_\psi) \, \tilde{j}_{\theta\theta}^{-1}(\tilde{\theta}_\psi) \, \psi_\theta'(\tilde{\theta}_\psi) \, |\tilde{j}_{\theta\theta}(\tilde{\theta}_\psi)| \, |\varphi_\theta(\tilde{\theta}_\psi)|^{-2} }{ |j_{\theta\theta}(\hat{\theta})| \, |\varphi_\theta(\hat{\theta})|^{-2} }.$$

Step 8: The standardized MLE departure under the $\varphi$ scale is given by

$$q = sign(\hat{\psi} - \psi) \frac{ |\chi(\hat{\theta}) - \chi(\tilde{\theta}_\psi)| }{ \sqrt{ \widehat{var}\left( \chi(\hat{\theta}) - \chi(\tilde{\theta}_\psi) \right) } }.$$

Step 9: The modified signed log-likelihood ratio statistic is given by

$$r^{*} = r - \frac{1}{r} \log \frac{r}{q}.$$

Although the algorithm involves many steps, it can easily be implemented in algebraic or statistical software such as MATLAB, Maple and R.
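To make the nine steps concrete, the following Python sketch specializes them to the logarithmic transformation, with $\theta = (\mu, \tau)$, $\tau = \sigma^2$, and $\psi(\theta) = \mu + \tau/2$ (the log of the lognormal mean). Several choices here are assumptions rather than the authors' implementation: the canonical parameter is taken as $\varphi = (\mu/\tau, 1/\tau)$, the constrained MLE is written in closed form instead of through the Lagrange multiplier, the confidence limits are found by bisection, and all function names and data are hypothetical. Because $\psi(\theta)$ is linear in $\theta$, the tilted log-likelihood has the same second derivatives as $l(\theta)$.

```python
import math
from statistics import NormalDist, mean

PSI_THETA = (1.0, 0.5)  # gradient of psi = mu + tau/2 with respect to (mu, tau)

def loglik(y, mu, tau):
    """Log-likelihood (4) with tau = sigma^2 and the constant a set to zero."""
    return -0.5 * len(y) * math.log(tau) - sum((v - mu) ** 2 for v in y) / (2 * tau)

def obs_info(y, mu, tau):
    """Observed information j(mu, tau) as a 2x2 nested list."""
    n, d1 = len(y), sum(v - mu for v in y)
    d2 = sum((v - mu) ** 2 for v in y)
    off = d1 / tau ** 2
    return [[n / tau, off], [off, -n / (2 * tau ** 2) + d2 / tau ** 3]]

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def adj_quad(v, m):
    """v' adj(m) v, which equals (v' m^{-1} v) |m| for a symmetric 2x2 m."""
    return v[0] ** 2 * m[1][1] - 2 * v[0] * v[1] * m[0][1] + v[1] ** 2 * m[0][0]

def mle(y):
    mu_hat = mean(y)
    return mu_hat, sum((v - mu_hat) ** 2 for v in y) / len(y)

def r_star(y, psi):
    """Modified signed log-likelihood ratio at psi (psi must differ from psi_hat)."""
    mu_hat, tau_hat = mle(y)                                   # Step 1
    psi_hat = mu_hat + tau_hat / 2
    a = sum((v - psi) ** 2 for v in y) / len(y)                # Steps 2-3:
    tau_til = 2 * (math.sqrt(1 + a) - 1)                       # closed-form
    mu_til = psi - tau_til / 2                                 # constrained MLE
    sign = 1.0 if psi_hat > psi else -1.0
    r = sign * math.sqrt(2 * (loglik(y, mu_hat, tau_hat)       # Step 4
                              - loglik(y, mu_til, tau_til)))
    # Step 5: row = psi_theta * inverse of phi_theta at the constrained MLE,
    # where phi_theta^{-1} = [[tau, -mu*tau], [0, -tau^2]] for phi = (mu/tau, 1/tau).
    row = (PSI_THETA[0] * tau_til,
           -PSI_THETA[0] * mu_til * tau_til - PSI_THETA[1] * tau_til ** 2)
    chi = lambda mu, tau: row[0] * mu / tau + row[1] / tau
    departure = abs(chi(mu_hat, tau_hat) - chi(mu_til, tau_til))  # Step 6
    # Step 7: |phi_theta(theta)|^{-2} = tau^6 for this choice of phi.
    var = (adj_quad(PSI_THETA, obs_info(y, mu_til, tau_til)) * tau_til ** 6
           / (det2(obs_info(y, mu_hat, tau_hat)) * tau_hat ** 6))
    q = sign * departure / math.sqrt(var)                      # Step 8
    return r - math.log(r / q) / r                             # Step 9

def bisect(f, lo, hi, tol=1e-10):
    """Root of f on [lo, hi], assuming f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if (f(mid) > 0) == (f_lo > 0):
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def third_order_interval_log(y, alpha=0.05):
    """Interval for exp(mu + tau/2) obtained by solving r*(psi) = +/- z*."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    mu_hat, tau_hat = mle(y)
    psi_hat = mu_hat + tau_hat / 2
    se = math.sqrt(tau_hat / len(y) + tau_hat ** 2 / (2 * len(y)))
    down = up = 4 * se
    while r_star(y, psi_hat - down) < z:    # widen until the root is bracketed
        down *= 2
    while r_star(y, psi_hat + up) > -z:
        up *= 2
    lower = bisect(lambda p: r_star(y, p) - z, psi_hat - down, psi_hat - 0.01 * se)
    upper = bisect(lambda p: r_star(y, p) + z, psi_hat + 0.01 * se, psi_hat + up)
    return math.exp(lower), math.exp(upper)

# Hypothetical log-transformed sample, for illustration only.
log_sample = [1.2, 0.7, 2.1, 1.5, 0.3, 1.9, 1.1, 2.6, 0.9, 1.4]
interval = third_order_interval_log(log_sample)
```

The statistic is numerically unstable very close to $\hat{\psi}$, where both $r$ and $q$ tend to zero, which is why the bisection brackets start a small distance away from $\hat{\psi}$.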

In this section, the different methods of constructing a confidence interval about the mean of non-normally distributed data are illustrated with two empirical examples. We demonstrate that the results obtained by the methods discussed in this paper can be very different.

Bland and Altman [

Data | Method | 95% confidence interval |
---|---|---|
Original | Central Limit Theorem | (0.48, 0.54) |
Log-transformed | Back-transformation | (0.45, 0.49) |
Log-transformed | Wald | (0.46, 0.51) |
Log-transformed | Third order | (0.46, 0.51) |

measurements for the alternative methods reviewed above. Note that for this example, the bootstrap method cannot be applied because the original data set is not available.

Bland and Altman [

McDonald [

These data are non-normally distributed and McDonald [

Data | Method | 95% confidence interval |
---|---|---|
Original | Central Limit Theorem | (9.3, 28.5) |
Original | Bootstrap (B = 5000) | (10.1, 28.3) |
Log-transformed | Back-transformation | (5.0, 24.4) |
Log-transformed | Wald | (9.3, 54.4) |
Log-transformed | Third order | (11.1, 123.9) |
Square root transformed | Back-transformation | (6.7, 26.9) |
Square root transformed | Wald | (9.8, 28.0) |
Square root transformed | Third order | (11.1, 31.7) |

and the bootstrap method with B = 5000 to the original data; and the back-transformation method, Wald method, and likelihood-based third order method to both the logarithmic transformed data and square root transformed data.

The results obtained by the methods discussed in this paper are very different for different transformations. In particular, the logarithmic transformation results in a much larger upper bound of the interval compared to the square root transformation. Thus, it is essential to identify which transformation is more appropriate for a given set of data.

The de-trended normal Q-Q plots for the original data, logarithmic transformed data and square root transformed data are shown in

these plots, it is obvious that the original data are not normally distributed because the points deviate from the horizontal reference line, which indicates identical quantiles between the data and a theoretical normal distribution. The two sets of transformed data are more closely normally distributed because the points in the de-trended normal Q-Q plots lie more closely to the reference line relative to the original data.

The Shapiro-Wilk test on normality of the original data gives a p-value of 0.1091. The same test on the logarithmic transformed data gives a p-value of 0.5261, and it gives a p-value of 0.6479 on the square root transformed data. Consistent with the de-trended Q-Q plot, the p-values of the Shapiro-Wilk test similarly suggest that the two transformed data sets are more likely to be normally distributed. Additionally, the empirical skewness of the original data, logarithmic transformed data, and square root transformed data are 0.5864, −0.4886, and 0.1632, respectively. These quantifications of skewness imply that the square root transformed data are more symmetrical than the original data and logarithmic transformed data. Thus, based on the criteria discussed in Section 2.3, the square root transformation is recommended for these data.
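A skewness comparison of this kind can be sketched in Python as below. The paper does not state which skewness estimator was used, so the moment-based coefficient here is an assumption, as are the function name and the data (the original measurements are not reproduced in this section).

```python
import math

def sample_skewness(x):
    """Moment-based skewness m3 / m2^(3/2), using divisor n (an assumed choice)."""
    n = len(x)
    m = sum(x) / n
    m2 = sum((v - m) ** 2 for v in x) / n
    m3 = sum((v - m) ** 3 for v in x) / n
    return m3 / m2 ** 1.5

# Hypothetical positive, right-skewed data, for illustration only.
data = [2.1, 3.4, 1.9, 8.7, 2.8, 4.4, 15.2, 3.1, 2.5, 6.0]
skewness_by_transformation = {
    "original": sample_skewness(data),
    "logarithmic": sample_skewness([math.log(v) for v in data]),
    "square root": sample_skewness([math.sqrt(v) for v in data]),
}
# The transformation whose skewness is closest to zero is the most symmetric.
most_symmetric = min(skewness_by_transformation,
                     key=lambda name: abs(skewness_by_transformation[name]))
```

Skewness is only one of the three criteria, so in practice it would be read alongside the de-trended Q-Q plot and the Shapiro-Wilk p-value rather than on its own.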

A simulation study was carried out to compare the accuracies of the methods discussed in this paper. R code for the simulation is available to the interested reader upon request. For each $(n, \mu, \sigma)$ combination, we generated 10,000 samples from $N(\mu, \sigma^2)$. These are our simulated transformed samples, and the non-transformed (i.e., original) samples are obtained by applying the inverse transformation to the simulated data. The transformations examined are the natural logarithm and the square root. For each simulated sample, we computed a 95% confidence interval for the mean of the untransformed population using the five reviewed methods: Central Limit Theorem, bootstrap (B = 5000), back-transformation, Wald, and likelihood-based third order. The following quantities were recorded: the proportion of true means falling within the 95% confidence interval (coverage proportion), the proportion of true means less than the lower 95% confidence limit (lower error), and the proportion of true means greater than the upper 95% confidence limit (upper error). The nominal values of coverage, lower error, and upper error are 0.95, 0.025, and 0.025, respectively. We present only a small subset of the simulations we conducted to highlight several key points below, and other simulation results are available upon request.
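The design of such a study can be sketched in Python for the logarithmic case. This is not the authors' R code: it covers only the Central Limit Theorem and Wald intervals, uses fewer replications than the 10,000 reported, and the function name, seed, and defaults are assumptions.

```python
import math
import random
from statistics import NormalDist, mean, stdev

def coverage_log_case(n, mu, sigma, reps=2000, seed=1):
    """Estimate coverage of 95% CLT and Wald intervals for exp(mu + sigma^2/2)."""
    rng = random.Random(seed)
    z_star = NormalDist().inv_cdf(0.975)
    true_mean = math.exp(mu + sigma ** 2 / 2)
    hits = {"clt": 0, "wald": 0}
    for _ in range(reps):
        # Simulate the transformed sample, then invert the transformation.
        y = [rng.gauss(mu, sigma) for _ in range(n)]
        x = [math.exp(v) for v in y]
        # Central Limit Theorem interval on the original scale.
        half = z_star * stdev(x) / math.sqrt(n)
        if mean(x) - half <= true_mean <= mean(x) + half:
            hits["clt"] += 1
        # Wald interval built on the log scale and exponentiated.
        mu_hat = mean(y)
        s2 = sum((v - mu_hat) ** 2 for v in y) / n
        gamma_hat = mu_hat + s2 / 2
        se = math.sqrt(s2 / n + s2 ** 2 / (2 * n))
        if math.exp(gamma_hat - z_star * se) <= true_mean <= math.exp(gamma_hat + z_star * se):
            hits["wald"] += 1
    return {method: count / reps for method, count in hits.items()}

estimated_coverage = coverage_log_case(n=50, mu=2.0, sigma=0.5)
```

Lower and upper error rates could be tallied in the same loop by recording on which side of the interval the true mean falls.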

generated from $N(\mu, \sigma^2)$ and the parameter of interest is $\exp\left(\mu + \frac{\sigma^2}{2}\right)$.

μ | σ | n | Method | Coverage proportion | Lower error | Upper error |
---|---|---|---|---|---|---|
1 | 2 | 10 | Central Limit Theorem | 0.5389 | 0.0003 | 0.4608 |
1 | 2 | 10 | Bootstrap (B = 5000) | 0.5582 | 0.0007 | 0.4411 |
1 | 2 | 10 | Back-transformation | 0.1965 | 0.0000 | 0.8035 |
1 | 2 | 10 | Wald | 0.8549 | 0.0005 | 0.1446 |
1 | 2 | 10 | Third order | 0.9460 | 0.0262 | 0.0278 |
1 | 2 | 50 | Central Limit Theorem | 0.6907 | 0.0001 | 0.3092 |
1 | 2 | 50 | Bootstrap (B = 5000) | 0.7149 | 0.0013 | 0.2838 |
1 | 2 | 50 | Back-transformation | 0.0000 | 0.0000 | 1.0000 |
1 | 2 | 50 | Wald | 0.9303 | 0.0034 | 0.0063 |
1 | 2 | 50 | Third order | 0.9501 | 0.0238 | 0.0261 |
1 | 2 | 200 | Central Limit Theorem | 0.7757 | 0.0007 | 0.2236 |
1 | 2 | 200 | Bootstrap (B = 5000) | 0.7968 | 0.0021 | 0.2011 |
1 | 2 | 200 | Back-transformation | 0.0000 | 0.0000 | 1.0000 |
1 | 2 | 200 | Wald | 0.9446 | 0.0134 | 0.0420 |
1 | 2 | 200 | Third order | 0.9500 | 0.0261 | 0.0239 |
2 | 0.5 | 10 | Central Limit Theorem | 0.8902 | 0.0129 | 0.0969 |
2 | 0.5 | 10 | Bootstrap (B = 5000) | 0.8793 | 0.0239 | 0.078 |
2 | 0.5 | 10 | Back-transformation | 0.8884 | 0.0047 | 0.1069 |
2 | 0.5 | 10 | Wald | 0.8997 | 0.0246 | 0.0757 |
2 | 0.5 | 10 | Third order | 0.9492 | 0.0252 | 0.0256 |
2 | 0.5 | 50 | Central Limit Theorem | 0.9340 | 0.0106 | 0.0554 |
2 | 0.5 | 50 | Bootstrap (B = 5000) | 0.9345 | 0.0197 | 0.0458 |
2 | 0.5 | 50 | Back-transformation | 0.5905 | 0.0003 | 0.4092 |
2 | 0.5 | 50 | Wald | 0.9415 | 0.0167 | 0.0418 |
2 | 0.5 | 50 | Third order | 0.9505 | 0.0231 | 0.0264 |
2 | 0.5 | 200 | Central Limit Theorem | 0.9430 | 0.0172 | 0.0398 |
2 | 0.5 | 200 | Bootstrap (B = 5000) | 0.9466 | 0.0197 | 0.0337 |
2 | 0.5 | 200 | Back-transformation | 0.0591 | 0.0000 | 0.9409 |
2 | 0.5 | 200 | Wald | 0.9452 | 0.0220 | 0.0328 |
2 | 0.5 | 200 | Third order | 0.9481 | 0.0262 | 0.0257 |
3 | 3 | 10 | Central Limit Theorem | 0.2739 | 0.0000 | 0.7261 |
3 | 3 | 10 | Bootstrap (B = 5000) | 0.2842 | 0.0000 | 0.7158 |
3 | 3 | 10 | Back-transformation | 0.0132 | 0.0000 | 0.9868 |
3 | 3 | 10 | Wald | 0.8335 | 0.0000 | 0.1665 |
3 | 3 | 10 | Third order | 0.9465 | 0.0265 | 0.0270 |
3 | 3 | 50 | Central Limit Theorem | 0.3983 | 0.0000 | 0.6017 |
3 | 3 | 50 | Bootstrap (B = 5000) | 0.4166 | 0.0000 | 0.5834 |
3 | 3 | 50 | Back-transformation | 0.0000 | 0.0000 | 1.0000 |
3 | 3 | 50 | Wald | 0.9256 | 0.0022 | 0.0722 |
3 | 3 | 50 | Third order | 0.9507 | 0.0235 | 0.0258 |
3 | 3 | 200 | Central Limit Theorem | 0.4936 | 0.0000 | 0.5065 |
3 | 3 | 200 | Bootstrap (B = 5000) | 0.5166 | 0.0002 | 0.4834 |
3 | 3 | 200 | Back-transformation | 0.0000 | 0.0000 | 1.0000 |
3 | 3 | 200 | Wald | 0.9418 | 0.0120 | 0.0462 |
3 | 3 | 200 | Third order | 0.9504 | 0.0261 | 0.0235 |

It can be observed that the likelihood-based third order method outperforms the other methods especially when the sample size is small; coverage, lower and upper errors associated with the likelihood-based third order method are relatively closer to nominal rates compared to the alternative methods. Among the remaining methods, the Central Limit Theorem method and the bootstrap method give similar results. The Wald method seems to converge faster than the Central Limit Theorem and bootstrap methods. As discussed in Section 2, the back-transformation method gives unacceptable coverage probability because it is constructing confidence intervals about a parameter that is not of interest.


Similar to results in

Based on these simulation results, the Central Limit Theorem method, bootstrap method and Wald method converge slowly relative to the likelihood-based third order method. Hence, we recommend using the likelihood-based third order method to obtain confidence intervals for the mean of the non-transformed distribution after applying a normalizing transformation to non-normal data, especially for small sample sizes or large departures from normality. It is important to note that researchers should not use the popular back-transformation method despite its simplicity, except for the special case where $\psi = E(X) = g^{-1}(\mu)$.

More simulations have been performed with the same pattern of results. They are not reported here, but are available upon request.

μ | σ | n | Method | Coverage proportion | Lower error | Upper error |
---|---|---|---|---|---|---|
50 | 10 | 10 | Central Limit Theorem | 0.9135 | 0.0284 | 0.0581 |
50 | 10 | 10 | Bootstrap (B = 5000) | 0.8998 | 0.0383 | 0.0614 |
50 | 10 | 10 | Back-transformation | 0.9403 | 0.0131 | 0.0466 |
50 | 10 | 10 | Wald | 0.9030 | 0.0315 | 0.0655 |
50 | 10 | 10 | Third order | 0.9485 | 0.0262 | 0.0253 |
50 | 10 | 50 | Central Limit Theorem | 0.9405 | 0.0220 | 0.0375 |
50 | 10 | 50 | Bootstrap (B = 5000) | 0.9433 | 0.0222 | 0.0351 |
50 | 10 | 50 | Back-transformation | 0.8910 | 0.0048 | 0.1042 |
50 | 10 | 50 | Wald | 0.9386 | 0.0228 | 0.0386 |
50 | 10 | 50 | Third order | 0.9483 | 0.0257 | 0.0260 |
50 | 10 | 200 | Central Limit Theorem | 0.9490 | 0.0227 | 0.0283 |
50 | 10 | 200 | Bootstrap (B = 5000) | 0.9473 | 0.0222 | 0.0305 |
50 | 10 | 200 | Back-transformation | 0.7097 | 0.0002 | 0.2901 |
50 | 10 | 200 | Wald | 0.9489 | 0.0227 | 0.0284 |
50 | 10 | 200 | Third order | 0.9463 | 0.0292 | 0.0245 |
75 | 20 | 10 | Central Limit Theorem | 0.9093 | 0.0245 | 0.0662 |
75 | 20 | 10 | Bootstrap (B = 5000) | 0.8974 | 0.0348 | 0.0678 |
75 | 20 | 10 | Back-transformation | 0.9335 | 0.0108 | 0.0537 |
75 | 20 | 10 | Wald | 0.8982 | 0.0279 | 0.0739 |
75 | 20 | 10 | Third order | 0.9482 | 0.0265 | 0.0253 |
75 | 20 | 50 | Central Limit Theorem | 0.9382 | 0.0211 | 0.0407 |
75 | 20 | 50 | Bootstrap (B = 5000) | 0.9429 | 0.0201 | 0.0370 |
75 | 20 | 50 | Back-transformation | 0.8495 | 0.0025 | 0.1480 |
75 | 20 | 50 | Wald | 0.9376 | 0.0206 | 0.0418 |
75 | 20 | 50 | Third order | 0.9493 | 0.0249 | 0.0258 |
75 | 20 | 200 | Central Limit Theorem | 0.9490 | 0.0215 | 0.0295 |
75 | 20 | 200 | Bootstrap (B = 5000) | 0.9471 | 0.0219 | 0.0310 |
75 | 20 | 200 | Back-transformation | 0.5434 | 0.0001 | 0.4565 |
75 | 20 | 200 | Wald | 0.9487 | 0.0215 | 0.0298 |
75 | 20 | 200 | Third order | 0.9494 | 0.0264 | 0.0242 |
100 | 30 | 10 | Central Limit Theorem | 0.9075 | 0.0227 | 0.0698 |
100 | 30 | 10 | Bootstrap (B = 5000) | 0.8976 | 0.0328 | 0.0696 |
100 | 30 | 10 | Back-transformation | 0.9290 | 0.0098 | 0.0612 |
100 | 30 | 10 | Wald | 0.8962 | 0.0262 | 0.0776 |
100 | 30 | 10 | Third order | 0.9469 | 0.0279 | 0.0252 |
100 | 30 | 50 | Central Limit Theorem | 0.9374 | 0.0201 | 0.0425 |
100 | 30 | 50 | Bootstrap (B = 5000) | 0.9416 | 0.0202 | 0.0382 |
100 | 30 | 50 | Back-transformation | 0.8236 | 0.0017 | 0.1747 |
100 | 30 | 50 | Wald | 0.9362 | 0.0201 | 0.0437 |
100 | 30 | 50 | Third order | 0.9490 | 0.0249 | 0.0261 |
100 | 30 | 200 | Central Limit Theorem | 0.9492 | 0.0213 | 0.0295 |
100 | 30 | 200 | Bootstrap (B = 5000) | 0.9485 | 0.0212 | 0.0303 |
100 | 30 | 200 | Back-transformation | 0.4543 | 0.0000 | 0.5457 |
100 | 30 | 200 | Wald | 0.9479 | 0.0213 | 0.0308 |
100 | 30 | 200 | Third order | 0.9505 | 0.0254 | 0.0241 |

When interest is in constructing a confidence interval for the mean of a non-normal distribution, normalizing transformations are typically recommended as a first step. This paper recommends the use of de-trended normal Q-Q plots, the largest p-value of the Shapiro-Wilk test, and quantifications of skewness on the transformed data to determine the power parameter (λ) for Tukey’s ladder of power transformation when the exact transformation is unavailable. Our results strongly advise against using the popular back-transformation approach in applied work because it does not construct confidence intervals about the parameter of interest (i.e., the mean of the original distribution). Instead, we recommend the likelihood-based third order method because of its superior performance in terms of its rate of convergence, coverage, and accuracy relative to the Central Limit Theorem, bootstrap and Wald methods, even when the sample size is small or the distribution is far from being normal.

We thank the editor and the referee for their comments. This work was based on O.C.Y. Wong’s undergraduate honors thesis. J. Pek was supported by the Natural Sciences and Engineering Research Council of Canada Discovery Grant (RGPIN-04301-2014) and the Early Researcher Award by the Ontario Ministry of Research and Innovation (ER15-11-004). A.C.M. Wong was supported by the Natural Sciences and Engineering Research Council of Canada Discovery Grant (RGPIN-163597-2012).

Pek, J., Wong, A.C.M. and Wong, O.C.Y. (2017) Confidence Intervals for the Mean of Non-Normal Distribution: Transform or Not to Transform. Open Journal of Statistics, 7, 405-421. https://doi.org/10.4236/ojs.2017.73029