Optimal Weights in Nonparametric Analysis of Clustered ROC Curve Data

In diagnostic trials, clustered data are obtained when several subunits of the same patient are observed. Within-cluster correlations need to be taken into account when analyzing such clustered data. A nonparametric method has been proposed by Obuchowski (1997) to estimate the area under the Receiver Operating Characteristic curve (AUC) for such clustered data. However, Obuchowski’s estimator gives equal weight to all pairwise rankings within and between clusters. In this paper, we modify Obuchowski’s estimate by allowing the weights for the pairwise rankings to vary across clusters. We consider the optimal weights for estimating one AUC as well as the difference between two AUCs. Our results show that the optimal weights depend not only on the within-patient correlation but also on the proportion of patients that have both unaffected and affected units. More importantly, we show that the loss of efficiency from using equal weights instead of our optimal weights can be severe when there is a large within-cluster correlation and the proportion of patients that have both unaffected and affected units is small.


Introduction
In diagnostic trials, clustered data are obtained when several subunits of the same patient are observed. For example, in a study by Masaryk et al. (1991) [2], two radiologists evaluated 65 carotid arteries (left and right) in 36 patients using three-dimensional Magnetic Resonance Angiography (MRA), a potential screening tool for atherosclerosis of the carotid arteries. These patients also underwent intra-arterial digital subtraction angiography (DSA), which is considered the gold standard for characterizing the degree of stenosis. The goals of the study were to evaluate the performance of MRA according to each reader, and to compare the performance of the two radiologists.
In the above example, each patient (cluster) contributes a number of unaffected and affected units. Correlation exists between the outcomes of two unaffected units, of two affected units, and of an unaffected and an affected unit from the same cluster, as well as between the outcomes of the two diagnostic tests from the same cluster.
All these correlations need to be taken into account when analyzing such clustered data.
An ROC curve is a plot of a diagnostic test's sensitivity versus 1-specificity. The curve is constructed by changing the cutpoint that defines a positive diagnostic test result. The area under the ROC curve (AUC) summarizes the test's overall diagnostic ability and is typically used as a global measure of the accuracy of the diagnostic test.
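As a minimal illustration of this construction (a sketch, not part of the original analysis), the Python snippet below sweeps the cutpoint over the observed results, records sensitivity and 1-specificity at each cutpoint, and integrates the resulting curve with the trapezoidal rule. The function names `roc_points` and `trapezoid_auc` and the toy data are hypothetical, and the sketch ignores clustering entirely.

```python
def roc_points(unaffected, affected):
    """Return (1 - specificity, sensitivity) pairs in ascending order.

    A result is called positive when it exceeds the cutpoint c.
    """
    cutpoints = sorted(set(unaffected) | set(affected))
    # A cutpoint below every observed result calls everything positive.
    points = [(1.0, 1.0)]
    for c in cutpoints:
        fpr = sum(x > c for x in unaffected) / len(unaffected)  # 1 - specificity
        tpr = sum(y > c for y in affected) / len(affected)      # sensitivity
        points.append((fpr, tpr))
    # Raising the cutpoint lowers both rates, so reverse to get ascending order.
    return points[::-1]


def trapezoid_auc(points):
    """Area under the piecewise-linear curve through the given points."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


if __name__ == "__main__":
    toy_unaffected = [1, 2, 3, 4]   # hypothetical test results
    toy_affected = [3, 4, 5, 6]
    print(trapezoid_auc(roc_points(toy_unaffected, toy_affected)))
```

For these toy data the trapezoidal area equals the proportion of (unaffected, affected) pairs in which the affected result is larger, with ties counted as one half, which is the pairwise-ranking view of the AUC used in the rest of the paper.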
In the clustered data case, Obuchowski (1997) [1] proposed a nonparametric AUC estimator and derived an asymptotic variance estimate for it, taking into account within-cluster correlations. However, Obuchowski's AUC estimator gives equal weight to all pairwise rankings within and between clusters. Clusters can differ in cluster size, in the number of unaffected units, and in the number of affected units. In the presence of various within-cluster correlations, these differences affect the contribution of a cluster to the overall variance of the AUC estimator, and hence the weights should vary across clusters.
In this paper, we modify Obuchowski's estimator by allowing the weight assigned to each pairwise ranking to vary across clusters, and derive the optimal weights that minimize the variance of the AUC estimator. Our results show that the optimal weights depend not only on the within-cluster correlation but also on the proportion of clusters that have both unaffected and affected units. More importantly, we show that the gain in efficiency in comparison with two simple weighting schemes can be doubled when there is a large within-cluster correlation and the proportion of clusters that have both unaffected and affected units is small.
The rest of this paper is organized as follows. In Section 2, the optimal weights for one AUC are derived and the estimators of the optimal weights are discussed. The relative asymptotic efficiencies in comparing our optimal estimator with two simple weighting schemes are studied. A data example is presented in Section 3 and conclusions are provided in Section 4.

Optimal Weights for Estimating One AUC
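Obuchowski (1997) [1] proposed a nonparametric estimate for $\theta$, given by the average of a ranking indicator over all pairs of one unaffected and one affected unit, within and between clusters. This estimate gives equal weight to all pairwise rankings.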
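The equal-weight estimate can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the authors' code: each cluster is represented as a hypothetical pair `(unaffected_results, affected_results)`, the kernel `psi` scores a pair as 1 when the affected result is larger, 1/2 on a tie and 0 otherwise, and the names `psi` and `pooled_auc` are invented for this example.

```python
def psi(x, y):
    """Pairwise ranking kernel for an unaffected result x and an affected result y."""
    if y > x:
        return 1.0
    if y == x:
        return 0.5
    return 0.0


def pooled_auc(clusters):
    """Equal-weight AUC estimate over all unaffected/affected pairs.

    clusters: list of (unaffected_results, affected_results), one per patient.
    Pairs are formed both within and between clusters.
    """
    xs = [x for unaffected, _ in clusters for x in unaffected]
    ys = [y for _, affected in clusters for y in affected]
    total = sum(psi(x, y) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


if __name__ == "__main__":
    # Hypothetical data: one mixed cluster, one with only unaffected units,
    # and one with only affected units.
    toy = [([1, 2], [3]), ([4], []), ([], [2, 5])]
    print(pooled_auc(toy))
```

Because every pair enters the sum with the same weight, a cluster with many units influences the estimate more than a small one, regardless of how strongly its units are correlated.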
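Note that $F(t)$ can be estimated by a weighted estimator based on a set of weights assigned to the clusters with at least one unaffected unit. Similarly, $G(t)$ can be estimated using a set of weights assigned to the clusters with at least one affected unit. We propose to estimate $\theta$ by combining these weighted estimates. When equal weight is given to every observation, our estimator is the same as that in Obuchowski (1997) [1].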
To derive our optimal weights, we utilize a result that can be found in the Appendix of Emir et al. (2000) [3]. Note that we can express the variance of $\hat{\theta}$ in (6) in terms of $U_{jk}$ and $V_{jk}$.
The optimal weights can be obtained by minimizing (8) with respect to the cluster weights.

Asymptotic Variance Comparison
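Let $\hat{\delta}_{op}$ be the estimator of $\delta$ using the estimated optimal weights, $\hat{\delta}_1$ be the estimator of $\delta$ using simple weighting Scheme 1, and $\hat{\delta}_2$ be the estimator of $\delta$ using simple weighting Scheme 2. Along the same lines as the proofs for (??), (??) and (??), corresponding asymptotic results can be shown for these three estimators.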

Conclusions
We have proposed an optimal nonparametric estimator for one AUC, which modifies Obuchowski's estimate by allowing different weights for the pairwise rankings within and between clusters. Optimal weights for one AUC (and for the difference between two AUCs) have been derived by minimizing the variance of the corresponding estimate. The asymptotic performance of the AUC estimate using our optimal weights has been compared with that of the two simple weighting schemes.
We have shown that when there is a moderate within-cluster correlation between unaffected and affected units and the proportion of clusters that contain both unaffected and affected units is small, using either of the two simple weighting schemes, corresponding to Obuchowski's estimator and to the estimator with equal cluster weights, can lead to a dramatic loss of efficiency. In this situation, the optimal weights are recommended.

A second set of weights is assigned to the clusters with at least one affected unit. Similar to Emir et al. (2000) [3], two simple weighting schemes can be considered: (1) assigning equal weights to observations, which is appropriate when the correlation is low, and (2) assigning equal weights to clusters, which is appropriate when the correlation is high.
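The two simple schemes can be compared on toy data with the following Python sketch. The explicit formulas are assumptions, since the corresponding equations are not legible in this text: the weighted estimate is taken to be a doubly weighted average of mean pairwise rankings between clusters, Scheme 1 is read as weights proportional to the number of units a cluster contributes, and Scheme 2 as equal weights over the clusters that contribute at least one unit of the relevant type. All function names are hypothetical.

```python
def psi(x, y):
    """Ranking kernel: 1 if affected result y is larger, 1/2 on ties, else 0."""
    return 1.0 if y > x else (0.5 if y == x else 0.0)


def weighted_auc(clusters, w, v):
    """Weighted AUC estimate (assumed form).

    clusters: list of (unaffected_results, affected_results) per patient.
    w[j]: weight of cluster j on the unaffected side (0 if it has no such unit).
    v[j]: weight of cluster j on the affected side (0 if it has no such unit).
    """
    total = 0.0
    for j, (unaffected, _) in enumerate(clusters):
        for jp, (_, affected) in enumerate(clusters):
            if not unaffected or not affected:
                continue
            mean_rank = (sum(psi(x, y) for x in unaffected for y in affected)
                         / (len(unaffected) * len(affected)))
            total += w[j] * v[jp] * mean_rank
    return total


def observation_weights(clusters):
    """Scheme 1 (assumed): every observation counts equally."""
    n0 = sum(len(u) for u, _ in clusters)
    n1 = sum(len(a) for _, a in clusters)
    return [len(u) / n0 for u, _ in clusters], [len(a) / n1 for _, a in clusters]


def cluster_weights(clusters):
    """Scheme 2 (assumed): every contributing cluster counts equally."""
    j0 = sum(1 for u, _ in clusters if u)
    j1 = sum(1 for _, a in clusters if a)
    return ([1.0 / j0 if u else 0.0 for u, _ in clusters],
            [1.0 / j1 if a else 0.0 for _, a in clusters])


if __name__ == "__main__":
    toy = [([1, 2], [3]), ([4], []), ([], [2, 5])]  # hypothetical data
    print(weighted_auc(toy, *observation_weights(toy)))
    print(weighted_auc(toy, *cluster_weights(toy)))
```

Under these assumptions, Scheme 1 reproduces the equal-weight pooled estimate, while Scheme 2 gives a different value whenever cluster sizes are unequal. Neither function attempts the optimal weights, which depend on the variance expression derived in the paper.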

and $\Delta_j^0 = 1$ if the $j$th cluster contains at least one unaffected unit and $\Delta_j^0 = 0$ otherwise, and $\Delta_j^1 = 1$ if the $j$th cluster contains at least one affected unit and $\Delta_j^1 = 0$ otherwise. Hence, the variance of $\hat{\theta}$ is approximately

2.1. Optimal Weights Derivation
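Let $X_{jk}$ denote the diagnostic test result of the $k$th unaffected unit in the $j$th cluster, and let $Y_{jk}$ denote the diagnostic test result of the $k$th affected unit in the $j$th cluster. Let $F$ and $G$ describe the distributions of $X_{jk}$ and $Y_{jk}$, respectively. Assume that if the value of $X_{jk}$ or $Y_{jk}$ exceeds a predetermined cut-off point $c$, the diagnostic test will be considered positive. Then the area under the ROC curve of the diagnostic test, $\theta$, is the probability that the result of an affected unit exceeds that of an unaffected unit, with ties counted as one half.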