Confidence Interval Estimation of the Correlation in the Presence of Non-Detects

This article deals with correlating two variables that have values that fall below the known limit of detection (LOD) of the measuring device; these values are known as non-detects (NDs). We use simulation to compare several methods for estimating the association between two such variables. The most commonly used method, simple substitution, consists of replacing each ND with some representative value such as LOD/2. Spearman’s correlation, in which all NDs are assumed to be tied at some value just smaller than the LOD, is also used. We evaluate each method under several scenarios, includ-ing small to moderate sample size, moderate to large censoring proportions, extreme imbalance in censoring proportions, and non-bivariate normal (BVN) data. In this article, we focus on the coverage probability of 95% confidence intervals obtained using each method. Confidence intervals using a maximum likelihood approach based on the assumption of BVN data have acceptable performance under most scenarios, even with non-BVN data. Intervals based on Spearman’s coefficient also perform well under many conditions. The methods are illustrated using real data taken from the biomarker literature.


Introduction
In research studies involving clinical measurements such as biomarker concentrations, it is quite common to have specimens for which the concentration of the analyte is non-zero, but below the analytic limit of detection (LOD); that is, the measuring device used to determine the level of the analyte in the biological specimen is unable to measure the concentration. For such specimens, all that we know is that the analyte is present and that the concentration is less than the LOD. Non-zero observations that are less than the LOD are commonly referred to as non-detects (NDs). For purposes of statistical analysis, non-detects as we have defined them are considered to be left-censored.
In the study by Amorim and Alvarez-Leite [1], NDs were of particular concern. The authors evaluated urinary o-cresol as a biomarker of toluene exposure by calculating the Pearson correlation coefficient (PCC) relating the level of o-cresol in the urine of workers exposed to toluene to the level of urinary hippuric acid in the same workers. Out of the 54 urine samples that Amorim and Alvarez-Leite analyzed, the o-cresol concentration was below its LOD (0.2 μg/ml) in 39 (72%); out of these 39 samples, the concentration of hippuric acid was below its LOD (0.1 mg/ml) in 4 (10%). Thus, there were only 15 samples for which the data were "complete" for both biomarkers. In the study by Atawodi et al. [2], NDs were also of great concern. These authors examined various hemoglobin adducts as biomarkers of tobacco smoke exposure by comparing the adduct levels of 18 current smokers with those of 52 "never smokers". The hemoglobin adduct levels were below the LOD (9 fmol HPB/g Hb) in 7 (13%) of the 52 samples from the "never smokers".
Perhaps the method that is most commonly used to deal with samples in which there are NDs is to remove the NDs and perform the statistical analysis using only the "complete data". In the study by Lagorio et al. [3], the authors used this approach in their examination of trans, trans muconic acid (t, t-MA) as a biomarker for low-level benzene exposure. They calculated the Pearson correlations among t, t-MA concentrations in urine samples obtained from 10 Estonian shale oil workers; these concentrations were estimated using high-performance liquid chromatography (HPLC) following three different pre-analytical procedures (methanol dilution, filtration, ether extraction). Another method that is commonly used to analyze data sets in which NDs are present is to use "simple substitution"; in other words, a value is substituted in place of the NDs and then the "usual" statistical analysis is performed on the resulting "new" sample of data. The most commonly used values in this crude type of imputation include the LOD [1] [2], and LOD/2 [4]. We contend that the approaches that are commonly used to handle NDs have several shortcomings; the purpose of our study was to evaluate some of the commonly used methods, along with some that are not so common.
Nonparametric methods have also been used to deal with samples in which NDs are present. In this approach, one treats all NDs as if they were tied at some value just below the LOD of the respective measuring device. For example, if one wished to correlate two analytes X and Y, at least one of which was undetectable in some specimens, one could use Spearman's Rank Correlation Coefficient (denoted here by r s ). In this method, the original X and Y values are replaced by their respective midranks and the NDs are assigned the smallest midrank for  If one wished to compare the analyte levels between two groups and NDs were present in at least one of the two samples, one could compute the midranks after combining the data into a single sample; each of the NDs would then be assigned the smallest mid-rank. One could then use the Mann-Whitney-Wilcoxon (M-W-W) test or other nonparametric two-sample method based on these mid-ranks to compare the two groups in terms of the level of the analyte.
In their evaluation of potential biomarkers of exposure to tobacco smoke, Atawodi et al. [2] used the M-W-W test to compare smokers and never smokers in terms of the level of a hemoglobin adduct; NDs were present in the sample of never smokers.
In a simulation study, Wang [5] found that none of the "standard" methods described above perform satisfactorily when correlating two measurements X and Y that are both subject to left-censoring, especially if X and Y are strongly positively correlated (ρ ≥ 0.5). If the distribution of X and Y is bivariate normal (BVN), a preferable approach is to estimate the Pearson correlation between X and Y using maximum likelihood (ML) [6]. Multiple imputation could also be used if the appropriate missing data mechanism is present and other conditions are satisfied [7].

Methods
We performed a Monte Carlo simulation to compare 5 methods that can be used to obtain point and confidence interval (CI) estimates of the correlation between X and Y when both X and Y are left censored. These methods included several of the "standard methods" that have been used to analyze data in which NDs are present, as well as the ML method [6]. The methods compared were as follows: (1) Simple Substitution: replace each ND by the conditional mean of that variable, given that it is known that the value is less than the LOD for that variable.
(3) Random Imputation from a Uniform Distribution: Substitute a randomly selected value from the interval [0, LOD x ] in place of each ND among the x-values and substitute a randomly selected value from the interval [0, LOD y ] in place of each ND among the y-values. The rationale for this method is that there may be nothing special about using the LOD or some fraction of it in place of the NDs; why not use any randomly generated number between 0 and the LOD.
(5) Spearman Correlation: All NDs among the x-values are treated as if they were tied at some value smaller than the smallest observed x-value; similarly, all Open Journal of Statistics NDs among the y-values are treated as if they were tied at some value smaller than the smallest observed y-value. Assign midranks in the usual way and calculate r s using these midranks.
For estimation methods (1)-(4) above, we used a 2 nd -order Fisher z-transformation, which provides a more accurate estimate of the variance of ( ) z ρ , in the calculation of the 95% C.I. for ρ. The coverage probability of CIs based on this method has been shown to be closer to the nominal level than those based on the usual Fisher z-transformation, and the 2 nd -order z-transformation poses no computational difficulties [10]. For estimation method (5), we evaluated both the jackknife and approximate bootstrap confidence interval (ABC) as methods for finding a 95% C.I. for the population value of the Spearman correlation. Defining the population value of the Spearman coefficient is controversial [11]; we followed Newton and Rudel [12] and defined the true value, ρ s , to be the mean of the r s values calculated from the Monte Carlo samples prior to applying the censoring schemes.
In the simulation study to compare the point estimation and confidence interval methods described above, we included various settings of several simula- Each of the point estimates and corresponding 95% C.I. procedures described above were evaluated in terms of the following criteria: 1) bias (and absolute bias), 2) median absolute deviation, 3) confidence interval width, and 4) confidence interval coverage probability (CP). In this article, we present results from our comparisons of the CP of 95% CIs based on the different estimation methods.

Results
The maximum likelihood estimate (MLE) performed best overall in terms of all of the criteria that we considered, and it can be recommended for estimating ρ even when the assumption of BVN is violated. However, the ML method may not be able to produce a point estimate (due to failure of the optimization routine to converge) for extreme negative values of ρ, small sample sizes and/or extremely heavy or imbalanced censoring. This is more likely to occur when the (denoted by β 1,p and β 2,p , respectively). For the bivariate normal, β 1,2 = 0 and β 2,2 = 8. For the bivariate gamma we used in the simulation study, β 1,2 = 3.5 and β 2,2 = 12. For the bivariate beta we used, β 1,2 = 3 and β 2,2 = 10. The ML method performed quite well for most settings of the simulation parameters even when the simulated data were generated from these non-BVN distributions. This is somewhat surprising since the MLEs were derived under the assumption that X and Y followed a BVN distribution.  Table 1 were obtained by calculating the median CP over all settings of the other simulation parameters (namely, the true value of ρ or ρ s , and sample size). The results labelled "non-normal" in the bottom half of the level for very few of the settings for the true value of ρ. The random imputation CIs achieved the 92.5% level only when ρ = 0.
The effects of sample size (n) on the median CP of the CIs based on the various methods are illustrated in Table 3. As in Table 1 and Table 2, boldface values in Table 3 identify median CPs that did not achieve the lower acceptability criterion of 92.5%. It is interesting to note that the ML-based CIs maintained an acceptable value of CP for all sample sizes except n = 500 for the non-BVN simulated data. Confidence intervals based on the Spearman coefficient performed  almost as well as the ML method with the non-BVN data, but failed to maintain the 92.5% level for several sample sizes with the BVN data. The CIs based on the complex substitution, simple substitution and random imputation methods failed to achieve the 92.5% level for any of the sample sizes in Table 3.

Example
We used the data from the study by Amorim and Alvarez-Leite  , with a 95% CI of (0.66, 0.87). Analyzing only the 15 cases with complete data yielded r = 0.76 with a 95% CI of (0.40, 0.92).
Simple substitution with LOD/2, which was the method used by Amorim and Alvarez-Leite, yielded 2 0.79 LOD ρ = , with a 95% CI of (0.65, 0.87). As can be seen in Table 4, the ML-based results differed very little from those based on the various substitution methods, with the exception of random imputation. However,  ). Similarly, Spearman-based CIs achieve acceptable CP (93.4%) when the true value of ρ s is 0.5; this also appears to be a reasonable assumption based on the results in Table 4 (r s = 0.58). Finally, from the results for non-BVN data in Table 3, we see that CIs based on either the ML method (93.8%) or the Spearman coefficient (93.7%) achieve acceptable CP when n is 50, this is approximately so for the Amorim and Alvarez-Leite study (n = 54). Thus, based on our simulation results (summarized in Tables 1-3  R code for computing each of the point estimates and corresponding confidence intervals described in this article is available from the second author.

Discussion
In this article, we compared 5 methods that can be used to obtain a confidence interval for the correlation between two variates X and Y, both of which are left censored. Other authors have recently proposed that alternative methods be considered; for example, Weaver et al. [14] considered a Bayesian approach. In this article, we restrict our attention to estimators based on the frequentist approach. Some authors have considered estimation of the elements of a covariance matrix of dimension p x p under left censoring [15] [16]. The estimation problem considered in the present article corresponds to the case p = 2. Jones et al. [15] did consider p = 2; however, their focus was on the bias of the point estimate of ρ and they did not consider confidence interval estimation as in the present article. Pesonen et al. [16]  Lapidus et al. [18] examined the use of multiple imputation to estimate the CCC.
The PCC considered in the present article is a special case of the CCC, so our future work will focus on adapting these methods to the estimation of the PCC.
Our simulation results showed that when the simulated data were from a BVN distribution, the ML-based CIs had median CP above 92.5% under all conditions that we considered except when p 1 = p 2 = 0.9 or ρ = −0.9. Furthermore, the ML-based CIs were superior to all other CI methods in terms of median CP under all simulation scenarios using BVN data. Interestingly, for non-BVN simulated data, the ML-based CIs were still superior to those based on other estimation methods for almost all scenarios that we considered. Spearman-based CIs performed acceptably as long as |ρ s | was small or moderate, the sample size was not too large (i.e., less than 500), and the censoring proportions for X and Y were not too large and there was little or no imbalance. The Spearman-based CIs generally performed better for non-BVN data than for BVN data. The complex substitution method was proposed by Lynn [8] for use when only one of the variables is subject to NDs and McCracken [9] extended the method to the situation in which both variables are subject to NDs. The CPs of CIs based on this method were acceptable for some settings of the simulation parameters. However, complex substitution-based CIs were typically inferior to those based on either the MLE or the Spearman coefficient or both. The distribution of the simulated data (either BVN or non-BVN) had only a minimal impact on the performance of complex substitution-based CIs. The CP of these CIs was generally superior to that of CIs based on simple substitution, but they were comparable for many of the settings of the simulation parameters that we considered. Confidence intervals based on the random imputation method were greatly inferior to the CIs based on all of the other methods for almost all settings of the simulation parameters, and rarely achieved the 92.5% acceptability criterion for CP.
The simple substitution-based CIs generally did better in terms of CP when the data were not BVN. However, these intervals did not yield acceptable CP except in the presence of no censoring (Table 1) or when the population correlation was zero ( Table 2). Handelsman and Ly [19] recommended simple substitution with 2 LOD when estimating bivariate correlations for serum steroid measurements; however, their study only considered situations in which either X or Y but not both are left censored.
For several of the simulation scenarios we included, the ML method failed to produce a point estimate of ρ (and hence a confidence interval) due to the failure of the optimization routine to converge. In case this happens with a real data set, we recommend using a Spearman-based CI if one wishes only to measure the strength of association between X and Y, and not to measure the strength of linear association between X and Y. If estimation of ρ (as a measure of linear association) is the primary aim of the analysis, and the MLE cannot be obtained due to lack of convergence, we recommend that complex substitution be used to estimate ρ. However, for some combinations of the simulation settings that we considered, complex substitution-based CIs performed extremely poorly in terms of CP. Under these conditions, the CI based on the CS method should be considered as only a rough approximation.
Li et al. [20] also examined the use of the ML method to estimate a bivariate correlation in the presence of left censoring of both X and Y. They provided R code for implementing this method and thoroughly evaluated its performance in a simulation study in which they used several of the same parameter settings as in our simulation study [9]. Their results for the CP of 95% CIs based on the ML method under assumptions of both BVN and non-BVN data are comparable to ours. However, Li et al. did not provide a comparison of ML-based CIs with those based on other methods, as we have done here. The results we have presented in Tables 1-3 enable the analyst to select an estimation method that is likely to give acceptable results for the CP depending on the degree of censoring, the true value of ρ, and the sample size, as we illustrated in the Example.