Confidence Intervals for the Mean of Non-Normal Distribution: Transform or Not to Transform

In many areas of applied statistics, confidence intervals for the population mean are of interest. Confidence intervals are typically constructed assuming normality, although non-normally distributed data are common in practice. Given a large enough sample size, confidence intervals for the mean can be constructed by applying the Central Limit Theorem or by the bootstrap method. Another method commonly used in practice is the back-transformation method, which consists of three steps. First, apply a transformation to the data such that the transformed data are normally distributed. Second, obtain a confidence interval for the transformed mean in the usual manner, which assumes normality. Third, apply the back-transformation to obtain a confidence interval for the mean of the original, non-transformed distribution. The parametric Wald method and a small-sample likelihood-based third order method, which can address non-normality, are also reviewed in this paper. Our simulation results suggest that common approaches such as back-transformation give erroneous and misleading results even when the sample size is large. However, the likelihood-based third order method gives extremely accurate results even when the sample size is small.


Introduction
In the last two decades, there has been a push in psychological science to improve research reporting with an emphasis on effect size and confidence interval reporting (see American Education Research Association [1]; Cumming [2]; Wilkinson and the Task Force for Statistical Inference [3]). Effect sizes communicate the magnitude and direction of a practically important effect (e.g., treatment decreased depression scores by 13%), and confidence intervals communicate the precision of this effect's estimate. The importance of confidence intervals, their basic construction, and their interpretation have thus been the focus of several influential pedagogical articles (e.g., see Cumming and Fidler [4]; Cumming and Finch [5]; Greenland et al. [6]).
Most, if not all, modern introductory statistics textbooks review and describe the construction of confidence intervals (e.g., see Moore et al. [7]). Let (x₁, …, xₙ) be a sample obtained from a normally distributed population with mean µ and variance σ². Then a (1 − α)100% confidence interval for µ is

x̄ ± t* s/√n,    (1)

where x̄ and s are the sample mean and sample standard deviation, and t* is the (1 − α/2)100th percentile of the Student t distribution with (n − 1) degrees of freedom. Moreover, when the sample size n is large (usually stated as n larger than 30), the (1 − α)100% confidence interval for µ can still be obtained from (1) except that t* is replaced by z*, the (1 − α/2)100th percentile of the standard normal distribution. The fundamental assumption underlying the construction of this confidence interval is that the data are normally distributed. However, collected data are usually non-normally distributed in practice (for examples in psychology, see Cain et al. [8]; Micceri [9]). In public health research, Bland and Altman [10] reported that serum triglyceride measurements are distributed with positive skewness. In biology, McDonald [11] reported that the numbers of Eastern mudminnows in Maryland streams are non-normally distributed.
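The textbook interval (1) takes only a few lines of code. The sketch below is ours rather than the paper's (the function name is an assumption), and it uses SciPy for the Student t percentile:

```python
import math
from statistics import mean, stdev
from scipy import stats

def t_interval(x, conf=0.95):
    """(1 - alpha)100% t-based confidence interval for a normal mean."""
    n = len(x)
    xbar, s = mean(x), stdev(x)             # sample mean and SD (n - 1 denominator)
    tstar = stats.t.ppf(1 - (1 - conf) / 2, df=n - 1)
    half = tstar * s / math.sqrt(n)         # t* s / sqrt(n)
    return xbar - half, xbar + half
```

For large n, replacing `stats.t.ppf` with `stats.norm.ppf` gives the z-based version of the interval described above.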
In this paper, we compare various methods for constructing confidence intervals when data are non-normally distributed. Three of the most popular and commonly used methods are the method based on the Central Limit Theorem, the bootstrap method, and the back-transformation method, which are reviewed in Section 2. The parametric Wald method and the likelihood-based third order method are also discussed in Section 2. Note that the popular back-transformation method requires the existence of a transformation such that the transformed data are normally distributed; the selection of such a transformation via the Box-Cox transformation and Tukey's ladder of power transformation is briefly discussed in Section 2. Two empirical examples are presented in Section 3 to illustrate that confidence intervals based on the different methods discussed in Section 2 can be vastly different. Simulation results are presented in Section 4 to compare the accuracy of the methods discussed in this paper. They illustrate that the likelihood-based third order method gives extremely accurate coverage probability even when the sample size is small; that the Wald method, the Central Limit Theorem method, and the bootstrap method all perform poorly when the sample size is small but improve as the sample size increases; and that the popular back-transformation method should not be used because it does not construct the confidence interval for the correct parameter. Finally, some concluding remarks are given in Section 5.

Methodology
This section reviews four commonly used methods, namely the Central Limit Theorem, bootstrap, back-transformation, and Wald methods, for obtaining a confidence interval for the mean of a non-normal distribution. A very accurate likelihood-based method is also introduced in this section.

Central Limit Theorem Method
Let (x₁, …, xₙ) be a sample from a non-normal distribution with mean ψ. When the sample size n is large, the Central Limit Theorem gives

√n (x̄ − ψ) → N(0, var(X)).

Since x̄ and s²ₓ are the unbiased estimates of ψ and var(X), respectively, by the Central Limit Theorem an approximate (1 − α)100% confidence interval for ψ is

x̄ ± z* sₓ/√n,    (2)

where z* is the (1 − α/2)100th percentile of the standard normal distribution.

Bootstrap Method
The bootstrap method is a popular non-parametric method, which does not require any distributional assumptions. Efron and Tibshirani [12] provide a detailed review of the bootstrap method. The following is an algorithmic approach to obtaining a (1 − α)100% percentile bootstrap confidence interval for the population mean. First, draw B bootstrap samples of size n with replacement from the observed sample. Second, compute the mean of each bootstrap sample. Third, take the (α/2)100th and (1 − α/2)100th empirical percentiles of the B bootstrap means as the confidence limits. Note that, as with the Central Limit Theorem method, the bootstrap method requires the observed sample size to be large so as to be representative of the population.
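A minimal percentile-bootstrap sketch (the function name and the use of NumPy are ours): draw B resamples with replacement, compute each resample's mean, and take the empirical α/2 and 1 − α/2 percentiles of those means.

```python
import numpy as np

def percentile_bootstrap_ci(x, conf=0.95, B=5000, seed=0):
    """Percentile bootstrap confidence interval for the population mean."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    # B resamples of size n with replacement; mean of each resample
    boot_means = rng.choice(x, size=(B, x.size), replace=True).mean(axis=1)
    alpha = 1 - conf
    lo, hi = np.percentile(boot_means, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lo, hi
```

B = 5000 matches the number of bootstrap replications used later in the paper's examples.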

Back-Transformation Method
Recall that X is a non-normally distributed random variable with mean ψ. Assume there exists a transformation g(·) such that Y = g(X) is normally distributed with mean µ and variance σ². By the delta method, g⁻¹(ȳ) is approximately normally distributed with mean g⁻¹(µ), and an approximate (1 − α)100% confidence interval for g⁻¹(µ) is obtained by back-transforming the limits of the usual interval for µ computed from the transformed data (y₁, …, yₙ):

( g⁻¹(ȳ − t* s_y/√n), g⁻¹(ȳ + t* s_y/√n) ).    (3)
It is important to note that (3) could be misleading because the back-transformed interval targets g⁻¹(µ) rather than ψ = E(X). Under the square root transformation, for instance, ψ = µ² + σ², which can be quite different from µ², especially when σ² is large.
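A quick numerical check of this point for the logarithmic transformation, where ψ = exp(µ + σ²/2) but back-transforming µ yields the geometric mean exp(µ). The simulation settings below (lognormal(0, 1) population, sample size, seed) are our own choices:

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma = 0.0, 1.0
x = rng.lognormal(mean=mu, sigma=sigma, size=200_000)  # log(x) ~ N(0, 1)

true_mean = np.exp(mu + sigma**2 / 2)   # psi = E[X] = exp(0.5), about 1.649
geo_mean = np.exp(mu)                   # exp(mu) = 1.0: what back-transformation targets

sample_mean = x.mean()                       # estimates psi
back_transformed = np.exp(np.log(x).mean())  # estimates exp(mu), not psi
```

With a large sample, `sample_mean` sits near 1.65 while `back_transformed` sits near 1.0, so an interval built around the latter systematically misses ψ.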
The rest of this subsection is to provide a systematic way of choosing the transformation ( ) g ⋅ .In practice, the most common simple transformations are the logarithmic transformation and square root transformation.Box and Cox [13] proposed a more complicated transformation, which requires the determination of the power parameter.Similarly, Tukey [14] suggested a ladder of power transformation, which also requires the determination of the power parameter.
We review Tukey's method in a later subsection. With an observed sample (x₁, …, xₙ), our aim is to obtain confidence intervals for ψ. In this paper, focus is placed on the two most commonly used transformations in practice: the logarithmic transformation and the square root transformation. Note that the transformation methods discussed can, in theory, be generalized to any known transformation (cf. the Box-Cox or Tukey transformations).
When observed data are non-normally distributed, a common approach is to first apply a transformation such that the transformed data become approximately normally distributed. In the statistical literature, two very similar families of transformations are frequently discussed: the Box-Cox transformation and Tukey's ladder of power transformation. In particular, Osborne [15] gives a detailed discussion of the application of the Box-Cox transformation. Because Tukey's ladder of power transformation is easier to interpret than the Box-Cox transformation, we review the ladder of power transformation and suggest criteria for choosing an appropriate transformation below. Tukey's ladder of power transformation takes the form

y = x^λ for λ ≠ 0, and y = log x for λ = 0,

where λ is called the power parameter of this transformation; λ is chosen such that Y is approximately normally distributed and, moreover, such that the power parameter is easy to interpret. Note that λ = 1 is equivalent to no transformation. In practice, the popular reciprocal transformation, logarithmic transformation, and square root transformation are equivalent to λ = −1, 0, and 1/2, respectively. Table 1 presents the mean of the distribution prior to transformation, ψ, in terms of µ and σ², based on the type of transformation used. Since ψ does not exist for the reciprocal transformation, this transformation is not considered in this paper.

Table 1. Transformation and the parameter of interest.
Logarithmic (λ = 0): ψ = exp(µ + σ²/2)
Square root (λ = 1/2): ψ = µ² + σ²
With an observed sample, we suggest the choice of λ be based on three criteria: 1. de-trended normal quantile-quantile (Q-Q) plot, 2. p-value of the Shapiro-Wilk test of normality, and 3. skewness.
First, when the de-trended normal Q-Q plot deviates from the horizontal reference line, which indicates identical quantiles between the data and a theoretical normal distribution, the plot suggests that the data are likely non-normally distributed. Second, simulation studies by Razali and Wah [16] illustrate that the Shapiro-Wilk test is the most powerful test among all formulated statistical tests for normality. Under the assumption of a normal distribution, the smaller the p-value associated with the Shapiro-Wilk test, the more evidence against the normality assumption. Thus, the transformation which gives the largest p-value of the Shapiro-Wilk test is associated with the least evidence against the transformed data being normally distributed. Finally, with regard to skewness, the normal distribution has skewness 0. In this vein, the transformation which results in a skewness value closest to 0 is most symmetric and would be the preferred transformation.
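Criteria 2 and 3 can be screened programmatically across candidate values of λ; the de-trended Q-Q plot (criterion 1) would still be inspected visually. The function names and the candidate grid below are our own:

```python
import numpy as np
from scipy.stats import shapiro, skew

def tukey_ladder(x, lam):
    """Tukey's ladder of powers: x**lam for lam != 0, log(x) for lam == 0."""
    x = np.asarray(x, dtype=float)
    return np.log(x) if lam == 0 else x**lam

def choose_lambda(x, candidates=(-1, 0, 0.5, 1)):
    """Pick the lambda whose transformed data have the largest Shapiro-Wilk
    p-value; skewness is reported as a secondary criterion."""
    results = {}
    for lam in candidates:
        y = tukey_ladder(x, lam)
        results[lam] = {"shapiro_p": shapiro(y).pvalue, "skewness": skew(y)}
    best = max(results, key=lambda lam: results[lam]["shapiro_p"])
    return best, results
```

For lognormal data, this procedure should select λ = 0, matching the recommendation to log-transform positively skewed measurements.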

Wald Method
As in the previous subsection, we assume that X is a non-normally distributed random variable with mean ψ, and that there exists a transformation g(·) such that Y = g(X) is normally distributed with mean µ and variance σ².
Moreover, ψ can be expressed as a function of µ and σ². The log-likelihood function concerning Y can be written as

ℓ(µ, σ²) = a − (n/2) log σ² − Σᵢ (yᵢ − µ)²/(2σ²),    (4)

where a is an additive constant. Without loss of generality, a is set to zero hereafter. The overall maximum likelihood estimate (MLE) is denoted by (µ̂, σ̂²)′ = (ȳ, Σᵢ (yᵢ − ȳ)²/n)′.
The observed information matrix is the negative of the matrix of second derivatives of the log-likelihood function with respect to the parameters. The variance-covariance matrix for (µ̂, σ̂²)′ can be approximated by the inverse of Fisher's expected information matrix, which, in general, can be difficult to obtain in practice. Nevertheless, the variance-covariance matrix for (µ̂, σ̂²)′ can also be approximated by the inverse of the observed information matrix evaluated at the MLE, which gives var(µ̂) ≈ σ̂²/n, var(σ̂²) ≈ 2σ̂⁴/n, and zero covariance. Recall that the parameter of interest is ψ = ψ(µ, σ²). By the delta method,

var(ψ̂) ≈ (∂ψ/∂µ)² var(µ̂) + (∂ψ/∂σ²)² var(σ̂²),

evaluated at the MLE. Thus, an approximate (1 − α)100% confidence interval for ψ is given by the Wald interval

ψ̂ ± z* √var̂(ψ̂).    (5)

For the case of the logarithmic transformation (i.e., Tukey's ladder of power transformation with λ = 0), the parameter of interest is ψ = exp(µ + σ²/2). Therefore, by the Wald method, ψ̂ = exp(µ̂ + σ̂²/2) and var̂(ψ̂) ≈ ψ̂²(σ̂²/n + σ̂⁴/(2n)), and an approximate (1 − α)100% confidence interval for ψ is given by (5). For the case of the square root transformation (i.e., Tukey's (1977) ladder of power transformation with λ = 1/2), the parameter of interest is ψ = µ² + σ². Therefore, an approximate (1 − α)100% confidence interval for ψ is given by (5), where ψ̂ = µ̂² + σ̂² and var̂(ψ̂) ≈ 4µ̂²σ̂²/n + 2σ̂⁴/n.
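For the logarithmic case, the Wald interval (5) can be sketched as follows. This is a minimal illustration under the assumption that log(x) is exactly normal; the function name and delta-method variance formula used here are our reading of the method, not code from the paper:

```python
import numpy as np
from scipy.stats import norm

def wald_ci_lognormal_mean(x, conf=0.95):
    """Wald interval for psi = exp(mu + sigma^2/2), assuming log(x) ~ N(mu, sigma^2)."""
    y = np.log(np.asarray(x, dtype=float))
    n = y.size
    mu_hat = y.mean()
    s2_hat = ((y - mu_hat)**2).mean()        # MLE of sigma^2 (n denominator)
    psi_hat = np.exp(mu_hat + s2_hat / 2)
    # delta method: gradient (psi, psi/2); var(mu_hat)=s2/n, var(s2_hat)=2*s2^2/n
    var_psi = psi_hat**2 * (s2_hat / n + s2_hat**2 / (2 * n))
    z = norm.ppf(1 - (1 - conf) / 2)
    half = z * np.sqrt(var_psi)
    return psi_hat - half, psi_hat + half
```

The square root case follows the same pattern with ψ̂ = µ̂² + σ̂² and the corresponding gradient (2µ̂, 1).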

Likelihood-Based Third Order Method
Both the Central Limit Theorem method and the Wald method have a theoretical rate of convergence of O(n^(−1/2)), and both the back-transformation method and the bootstrap method have no known rate of convergence. In recent years, many methods have been developed to improve the rate of convergence of existing asymptotic methods. In this subsection, we review the modified signed log-likelihood ratio method by Barndorff-Nielsen [17]. The modified signed log-likelihood ratio statistic is defined as

r*(ψ) = r(ψ) − (1/r(ψ)) log( r(ψ)/q(ψ) ),

where

r(ψ) = sgn(ψ̂ − ψ) √( 2[ℓ(θ̂) − ℓ(θ̂_ψ)] )

is the signed log-likelihood ratio statistic, and q(ψ) is a statistic based on the log-likelihood function given in (4). Barndorff-Nielsen [17] showed that r* is asymptotically distributed as a standard normal random variable with a rate of convergence of O(n^(−3/2)). Thus, a (1 − α)100% confidence interval for ψ is {ψ : |r*(ψ)| ≤ z*}. If the model is an exponential family model and the parameter of interest ψ is a component of the canonical parameter, Fraser [18] showed that q(ψ) is the standardized MLE statistic. For a general model, with this idea in mind, Fraser and Reid [19] first approximate the model by a tangent exponential model to obtain a locally defined canonical parameter.
Then, they express the parameter of interest in terms of the locally defined canonical parameter and derive the variance of the estimated parameter of interest on this locally defined canonical parameter scale. Thus, q(ψ) is the standardized MLE statistic expressed on the locally defined canonical parameter scale, and the modified signed log-likelihood ratio statistic can be used to obtain a confidence interval for ψ. Details of this algorithmic approach to obtaining r* are outlined below.
Notation: ℓ(θ) is the log-likelihood function; θ is a k-dimensional vector of parameters; φ = φ(θ) is a k-dimensional vector of canonical parameters for the exponential family model; and (y₁, …, yₙ) is the observed data.
Step 1: From the log-likelihood function, obtain the overall MLE θ̂ = (µ̂, σ̂²)′.
Step 2: For a fixed value of ψ, obtain the constrained MLE θ̂_ψ = (µ̂_ψ, σ̂²_ψ)′ by maximizing the log-likelihood function for that given ψ value, either directly or by the method of Lagrange multipliers.
Step 3: Define the tilted log-likelihood function ℓ̃(θ) = ℓ(θ) + λ̃ψ(θ), where λ̃ is the Lagrange multiplier from Step 2; the matrix of the negative of the second derivatives of the tilted log-likelihood function is used in constructing q(ψ).
Step 4: Compute the signed log-likelihood ratio statistic r(ψ) = sgn(ψ̂ − ψ) √(2[ℓ(θ̂) − ℓ(θ̂_ψ)]).
Steps 5-8: Construct the locally defined canonical parameter φ(θ) of the approximating tangent exponential model, express ψ in terms of φ, and compute q(ψ), the standardized MLE statistic on the φ scale.
Step 9: The modified signed log-likelihood ratio statistic is r* = r − (1/r) log(r/q).
Although the algorithm involves many steps, it can easily be implemented in algebraic or statistical software such as MATLAB, Maple, and R.
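Steps 1-4 are straightforward to code. The sketch below computes the signed log-likelihood ratio r(ψ) for ψ = exp(µ + σ²/2) under the logarithmic transformation; the correction factor q(ψ) of Steps 5-8 is deliberately omitted, and the function name and numeric profiling are our own:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def loglik(y, mu, s2):
    """Normal log-likelihood (additive constant dropped), as in (4)."""
    return -0.5 * y.size * np.log(s2) - np.sum((y - mu)**2) / (2 * s2)

def signed_llr_lognormal_mean(y, psi):
    """Signed log-likelihood ratio r(psi) for psi = exp(mu + sigma^2/2),
    where y = log(x) is assumed N(mu, sigma^2). Steps 1-4 only."""
    # Step 1: overall MLE
    mu_hat = y.mean()
    s2_hat = ((y - mu_hat)**2).mean()
    psi_hat = np.exp(mu_hat + s2_hat / 2)
    # Step 2: constrained MLE; the constraint fixes mu = log(psi) - s2/2,
    # so we profile the log-likelihood over s2 numerically
    prof = lambda s2: -loglik(y, np.log(psi) - s2 / 2, s2)
    res = minimize_scalar(prof, bounds=(1e-8, 50 * s2_hat), method="bounded")
    s2_c = res.x
    # Step 4: signed log-likelihood ratio
    llr = 2 * (loglik(y, mu_hat, s2_hat) - loglik(y, np.log(psi) - s2_c / 2, s2_c))
    return np.sign(psi_hat - psi) * np.sqrt(max(llr, 0.0))
```

A (1 − α)100% interval based on r alone is {ψ : |r(ψ)| ≤ z*}; the third order method replaces r by r*, which requires q(ψ) from Steps 5-8.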

Empirical Examples
In this section, the different methods of constructing a confidence interval about the mean of non-normally distributed data are illustrated with two empirical examples.We demonstrate that the results obtained by the methods discussed in this paper can be very different.

Example 1: Serum Triglyceride Measurements
Bland and Altman [10] considered n = 278 serum triglyceride measurements, which had a positively skewed distribution with an average of 0.51 mmol/l and a standard deviation of 0.22 mmol/l. By applying a base 10 logarithmic transformation to the data, they obtained a less skewed, bell-shaped distribution with an average of −0.33 and a standard deviation of 0.17. By applying the Central Limit Theorem, they reported a 95% confidence interval for the mean serum triglyceride measurement of (0.48, 0.54). Using the back-transformation method, the corresponding interval is (0.45, 0.49). Table 2 presents the 95% confidence intervals for the mean serum triglyceride measurements for the alternative methods reviewed above. Note that for this example, the bootstrap method cannot be applied because the original data set is not available.
Bland and Altman [10] noted that the interval obtained by the back-transformation method is actually the 95% confidence interval for the geometric mean of serum triglyceride measurements instead of the mean serum triglyceride measurement, where the latter is the parameter of interest. Stated differently, the back-transformation method does not provide information about the focal parameter of interest (i.e., the mean of the non-normal distribution). From Table 2, it can be observed that the results from the Central Limit Theorem method are different from those obtained by the Wald method and the third order method. Additionally, the Wald method and the third order method give results which agree up to the second decimal place. This observation is not surprising because these two methods theoretically converge to the same answer as the sample size goes to infinity. The only difference is that the third order method has a faster rate of convergence than the Wald method (i.e., O(n^(−3/2)) versus O(n^(−1/2)), respectively). The different rates of convergence are more formally illustrated in Section 4.

Example 2: Abundance of Eastern Mudminnows
McDonald [11] reported data on the abundance of Eastern mudminnows in Maryland streams. These data are non-normally distributed, and McDonald [11] suggested that both the logarithmic and the square root transformed data are suitable for analysis because they are more nearly normally distributed than the original data and other competing transformations. His final analysis makes use of the logarithmically transformed data.
Table 3 presents the 95% confidence intervals for the mean of the non-transformed distribution obtained by applying the Central Limit Theorem method and the bootstrap method (with B = 5000) to the original data, and the back-transformation method, Wald method, and likelihood-based third order method to both the logarithmically transformed and square root transformed data.
The results obtained by the methods discussed in this paper are very different for different transformations. In particular, the logarithmic transformation results in a much larger upper bound of the interval compared to the square root transformation. Thus, it is essential to identify which transformation is more appropriate for a given set of data.
The de-trended normal Q-Q plots for the original data, the logarithmically transformed data, and the square root transformed data are shown in Figure 1.

Simulation Study
A simulation study was carried out to compare the accuracies of the methods discussed in this paper. R code for the simulation is available to the interested reader upon request. We present only a small subset of the simulations we conducted to highlight several key points below; other simulation results are available upon request. From Table 4, it can be observed that the likelihood-based third order method outperforms the other methods, especially when the sample size is small; the coverage, lower, and upper errors associated with the likelihood-based third order method are relatively closer to the nominal rates than those of the alternative methods. Among the remaining methods, the Central Limit Theorem method and the bootstrap method give similar results. The Wald method seems to converge faster than the Central Limit Theorem and bootstrap methods. As discussed in Section 2, the back-transformation method gives unacceptable coverage probability because it constructs confidence intervals about a parameter that is not of interest. Table 5 presents results for the square root transformation. Similar to the results in Table 4, we observe that the likelihood-based third order method outperforms the other methods, especially when the sample size is small. In this context, the Central Limit Theorem method and the bootstrap method give similar results, and they seem to converge faster than the Wald method. The back-transformation method continues to give unacceptable coverage probability because it constructs confidence intervals about a parameter that is not of interest.
Based on these simulation results, the Central Limit Theorem method, bootstrap method, and Wald method converge slowly relative to the likelihood-based third order method. Hence, we recommend using the likelihood-based third order method to obtain confidence intervals for the mean of the non-transformed distribution after applying a normalizing transformation to non-normal data, especially for small sample sizes or large departures from normality. It is important to note that researchers should not use the popular back-transformation method, despite its simplicity, except in the special case where the mean of the original distribution coincides with the back-transformed mean g⁻¹(µ). More simulations have been performed with the same pattern of results; they are not reported here, but are available upon request.

Conclusion
When interest is in constructing a confidence interval about the mean of a non-normal distribution, normalizing transformations are typically recommended as a first step.
This paper recommends the use of de-trended normal Q-Q plots, the largest p-value of the Shapiro-Wilk test, and quantifications of skewness on the transformed data to determine the power parameter (λ) for Tukey's ladder of power transformation when the exact transformation is unavailable. Our results strongly advise against using the popular back-transformation approach in applied work because it does not construct confidence intervals about the parameter of interest (i.e., the mean of the original distribution). Instead, we recommend the likelihood-based third order method because of its superior performance in terms of rate of convergence, coverage, and accuracy relative to the Central Limit Theorem, bootstrap, and Wald methods, even when the sample size is small or the distribution is far from normal.
How to cite this paper: Pek, J., Wong, A.C.M. and Wong, O.C.Y. (2017) Confidence Intervals for the Mean of Non-Normal Distribution: Transform or Not to Transform.





Figure 1. De-trended normal Q-Q plots for original and transformed data of the abundance of Eastern mudminnows.

These are our simulated transformed samples, and the non-transformed (i.e., original) samples are obtained by applying the inverse transformation to the simulated data. The transformations examined are the natural logarithm and the square root. For each simulated sample, we computed a 95% confidence interval for the mean of the untransformed population using the five reviewed methods: Central Limit Theorem, bootstrap (B = 5000), back-transformation, Wald, and likelihood-based third order. The following quantities are recorded: the proportion of intervals containing the true mean (coverage proportion), the proportion of true means below the lower 95% confidence limit (lower error), and the proportion of true means above the upper 95% confidence limit (upper error). The nominal values of coverage, lower error, and upper error are 0.95, 0.025, and 0.025, respectively.
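A stripped-down version of such a simulation, for the logarithmic case only and for just two of the five methods, illustrates the coverage gap. All settings here (n, number of replications, lognormal(0, 1) population) are our own choices rather than the paper's:

```python
import numpy as np
from scipy import stats

def coverage_demo(n=25, reps=2000, seed=0):
    """Coverage of the CLT interval vs the naive back-transformation interval
    for the mean of a lognormal(0, 1) population (true mean exp(0.5))."""
    rng = np.random.default_rng(seed)
    psi = np.exp(0.5)                       # true mean of the population
    z = stats.norm.ppf(0.975)
    hit_clt = hit_bt = 0
    for _ in range(reps):
        x = rng.lognormal(0.0, 1.0, n)
        # CLT interval on the raw scale
        half = z * x.std(ddof=1) / np.sqrt(n)
        hit_clt += x.mean() - half <= psi <= x.mean() + half
        # back-transformation: t interval for mu on the log scale, exponentiated
        y = np.log(x)
        t = stats.t.ppf(0.975, n - 1)
        half_y = t * y.std(ddof=1) / np.sqrt(n)
        hit_bt += np.exp(y.mean() - half_y) <= psi <= np.exp(y.mean() + half_y)
    return hit_clt / reps, hit_bt / reps
```

The CLT interval undercovers somewhat at this sample size, while the back-transformed interval covers the geometric mean rather than ψ and so misses ψ most of the time, consistent with the coverage pattern reported above.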

Table 2. 95% confidence intervals for the mean serum triglyceride measurements.

Table 3. 95% confidence intervals for the mean of the abundance of Eastern mudminnows in Maryland streams.

Table 4. 95% coverage probability for the logarithmic transformation case.

Table 5. 95% coverage probability for the square root transformation case.