
In diagnostic trials, clustered data are obtained when several subunits of the same patient are observed. Within-cluster correlations need to be taken into account when analyzing such clustered data. Obuchowski (1997) proposed a nonparametric method to estimate the area under the Receiver Operating Characteristic curve (AUC) for such clustered data. However, Obuchowski’s estimator gives equal weight to all pairwise rankings within and between clusters. In this paper, we modify Obuchowski’s estimator by allowing the weights for the pairwise rankings to vary across clusters. We consider the optimal weights for estimating a single AUC as well as the difference between two AUCs. Our results show that the optimal weights depend not only on the within-patient correlation but also on the proportion of patients that have both unaffected and affected units. More importantly, we show that the loss of efficiency from using equal weights instead of our optimal weights can be severe when the within-cluster correlation is large and the proportion of patients that have both unaffected and affected units is small.

In diagnostic trials, clustered data are obtained when several subunits of the same patient are observed. For example, in a study by Masaryk et al. (1991) [

In the above example, each patient (cluster) contributes a number of unaffected and affected units. Outcomes from the same cluster are correlated: between two unaffected units, between two affected units, between an unaffected and an affected unit, and between the outcomes of the two diagnostic tests. All these correlations need to be taken into account when analyzing such clustered data.

An ROC curve is a plot of a diagnostic test’s sensitivity versus 1-specificity. The curve is constructed by changing the cutpoint that defines a positive diagnostic test result. The area under the ROC curve (AUC) summarizes the test’s overall diagnostic ability and is typically used as a global measure of the accuracy of the diagnostic test.
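The AUC described above has a well-known nonparametric interpretation: it is the proportion of (affected, unaffected) pairs that the test ranks correctly, with ties counted as one half. The following minimal sketch illustrates this for unclustered data. It assumes that larger test values indicate disease; the function names `pair_score` and `empirical_auc` and the sample values are purely illustrative.

```python
from itertools import product


def pair_score(affected, unaffected):
    """Score one (affected, unaffected) pair.

    Returns 1 if the affected unit has the larger test value,
    0.5 for a tie, and 0 otherwise (assumes larger = more diseased).
    """
    if affected > unaffected:
        return 1.0
    if affected == unaffected:
        return 0.5
    return 0.0


def empirical_auc(affected_scores, unaffected_scores):
    """Average the pair score over all affected/unaffected pairs."""
    pairs = list(product(affected_scores, unaffected_scores))
    if not pairs:
        raise ValueError("need at least one affected and one unaffected unit")
    return sum(pair_score(a, u) for a, u in pairs) / len(pairs)


# Illustrative test values only (not data from the paper).
auc = empirical_auc([3, 4, 5], [1, 2, 3, 4])
print(auc)
```

Every pairwise ranking receives the same weight here, which is the starting point that the clustered-data estimators discussed in this paper build on.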

In the clustered data case, Obuchowski (1997) [

In this paper, we modify Obuchowski’s estimator by allowing the weight assigned to each pairwise ranking to vary across clusters, and derive the optimal weights that minimize the variance of the AUC estimator. Our results show that the optimal weights depend not only on the within-cluster correlation but also on the proportion of clusters that have both unaffected and affected units. More importantly, we show that the gain in efficiency over two simple weighting schemes can be doubled when the within-cluster correlation is large and the proportion of clusters that have both unaffected and affected units is small.
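To make the idea of cluster-dependent weights concrete, the sketch below computes a weighted average of pairwise rankings in which the weight depends on which clusters the affected and unaffected units come from. This is only an illustration under stated assumptions: the weight function `weight(i, k)`, the normalization by the total weight, and the sample data are hypothetical, and the weights shown are not the optimal weights derived in this paper. With all weights equal, the estimate reduces to the pooled proportion of correctly ranked pairs.

```python
def pair_score(affected, unaffected):
    """1 if the affected unit ranks higher, 0.5 for a tie, 0 otherwise."""
    if affected > unaffected:
        return 1.0
    if affected == unaffected:
        return 0.5
    return 0.0


def weighted_cluster_auc(clusters, weight):
    """Weighted average of pairwise rankings over clustered data.

    clusters: list of (affected_values, unaffected_values), one per cluster.
    weight:   hypothetical function weight(i, k) >= 0 applied to every pair
              formed by an affected unit of cluster i and an unaffected
              unit of cluster k (i == k gives within-cluster pairs).
    """
    numerator = 0.0
    total_weight = 0.0
    for i, (affected_values, _) in enumerate(clusters):
        for k, (_, unaffected_values) in enumerate(clusters):
            w = weight(i, k)
            for a in affected_values:
                for u in unaffected_values:
                    numerator += w * pair_score(a, u)
                    total_weight += w
    if total_weight == 0:
        raise ValueError("no weighted affected/unaffected pairs available")
    return numerator / total_weight


# Illustrative data only: one cluster with both kinds of units,
# one with only unaffected units, one with only affected units.
clusters = [
    ([5.0, 4.0], [1.0]),
    ([], [2.0, 3.0]),
    ([3.0], []),
]

# Equal weight for every pairwise ranking (pooled estimate).
pooled = weighted_cluster_auc(clusters, lambda i, k: 1.0)

# Hypothetical alternative: down-weight within-cluster pairs.
downweighted = weighted_cluster_auc(
    clusters, lambda i, k: 0.5 if i == k else 1.0
)
print(pooled, downweighted)
```

Passing the weights as a function keeps the estimator separate from the choice of weighting scheme, so different schemes can be compared on the same data.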

The rest of this paper is organized as follows. In Section 2, the optimal weights for one AUC are derived and the estimators of the optimal weights are discussed. The relative asymptotic efficiencies in comparing our optimal estimator with two simple weighting schemes are studied. A data example is presented in Section 3 and conclusions are provided in Section 4.

Assume that there are

only unaffected units, clusters

Let

where

Note that

where

where

We propose to estimate

Notice that when

To derive the optimal weights, we use the following result, which can be found in the Appendix of Emir et al. (2000) [

where

and

Note that

and

Defining the transformation

we can express the variance of

where

and

The optimal weights can be obtained by minimizing (8) with respect to

and

where

and

Let

Along the same lines as the proofs of (??), (??) and (??), we can show that

and

where

and

Let

We have proposed an optimal nonparametric estimator for one AUC, which modifies Obuchowski’s estimator by allowing different weights for the pairwise rankings within and between clusters. The optimal weights have been derived by minimizing the variance of the estimate of one AUC (or of the difference between two AUCs). The asymptotic performance of the AUC estimate using our optimal weights has been compared with that of the two simple weighting schemes.

We have shown that when there is a moderate within-cluster correlation between unaffected and affected units and the proportion of clusters that contain both unaffected and affected units is small, using either of the two simple weighting schemes (Obuchowski’s estimator or the estimator with equal cluster weights) can lead to a dramatic loss of efficiency. In this situation, the optimal weights are recommended.

Yougui Wu (2015). Optimal Weights in Nonparametric Analysis of Clustered ROC Curve Data. Journal of Applied Mathematics and Physics, 03, 828-834. doi: 10.4236/jamp.2015.37102