Estimation of Area under Receiver Operating Characteristic Curve for Bi-Pareto and Bi-Two Parameter Exponential Models

In this paper, we find the ROC curves for Bi-Pareto and Bi-two parameter exponential distributions. Theoretical, parametric and non-parametric values of area under receiver operating characteristic (AUROC) curve for different parametric combinations have been calculated using simulations. These values are compared in terms of root mean square and mean absolute errors. The results are demonstrated for two real data sets.


Introduction
Receiver operating characteristic (ROC) curves have become the standard tool for evaluating the discriminatory power of medical diagnostic tests and are commonly used in assessing the predictive ability of binary regression models. The ROC curve is a diagnostic tool that helps in determining the accuracy of a test conducted on a person to know whether a particular disease is present or not. In a typical setting, one has a binary indicator and a set of predictors or marker values. The goal is to see how well the marker values predict the binary indicator. The principal idea is to dichotomize the marker at various thresholds and compute the resulting sensitivity and specificity. The sensitivity of a test is defined as the probability of a positive test result when the disease is present, and the specificity is the probability of a negative test result when the disease is absent. Sensitivity is also known as the True Positive Rate (TPR) and specificity as the True Negative Rate (TNR). The False Positive Rate (FPR) is (1 - specificity). The ROC curve is obtained by plotting the sensitivity versus (1 - specificity).
In credit rating models in finance, sensitivity is termed the "Hit Rate" (HR), whereas (1 - specificity) is known as the "False Alarm Rate" (FAR). If the rating score of a debtor is lower than a cut-off value C, he is treated as a defaulter. Otherwise, he is a non-defaulter. Hence

\[ \mathrm{HR}(C) = \frac{\text{Number of defaulters classified correctly}}{\text{Total number of defaulters}} \]

and

\[ \mathrm{FAR}(C) = \frac{\text{Number of non-defaulters classified incorrectly}}{\text{Total number of non-defaulters}}. \]

The ROC curve plots HR versus FAR [1]. For a detailed discussion on ROC curves, one can refer to [2]. If F and G are the cumulative distribution functions (cdfs) for two populations N and P, then the ROC curve has the form [...]. The area under the ROC curve (AUROC) is a widely used summary index [3][4][5][6]. It is the average TPR taken uniformly over all FPRs on (0, 1) and written as

\[ \mathrm{AUROC} = \int_0^1 \mathrm{ROC}(t)\,dt, \]

where $t$ denotes the FPR. For credit rating models, [...]. The area under the ROC curve is
• 0.5 if the model does not have discriminative quality;
• between 0.5 and 1.0 for a reasonable model;
• 1.0 if the model is perfect.
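As a minimal sketch of the hit rate and false alarm rate defined above, the following Python code computes HR(C) and FAR(C) at a given cut-off and then approximates the AUROC by the trapezoidal area under the empirical curve of HR against FAR. The function names and the rating scores are hypothetical and are used for illustration only; they are not taken from the paper.

```python
import numpy as np

def hit_and_false_alarm_rate(defaulters, non_defaulters, cutoff):
    """Return (HR(C), FAR(C)) when a score below the cut-off C flags a defaulter."""
    defaulters = np.asarray(defaulters, dtype=float)
    non_defaulters = np.asarray(non_defaulters, dtype=float)
    hr = float(np.mean(defaulters < cutoff))       # defaulters classified correctly
    far = float(np.mean(non_defaulters < cutoff))  # non-defaulters classified incorrectly
    return hr, far

def empirical_auroc(defaulters, non_defaulters):
    """Trapezoidal area under the empirical curve of HR plotted against FAR."""
    scores = np.concatenate([np.asarray(defaulters, dtype=float),
                             np.asarray(non_defaulters, dtype=float)])
    # Sweep the cut-off over every observed score, then +inf so the curve ends at (1, 1).
    cutoffs = np.append(np.unique(scores), np.inf)
    points = [hit_and_false_alarm_rate(defaulters, non_defaulters, c) for c in cutoffs]
    hr = np.array([p[0] for p in points])
    far = np.array([p[1] for p in points])
    return float(np.sum(np.diff(far) * (hr[1:] + hr[:-1]) / 2.0))

# Hypothetical rating scores, for illustration only.
defaulter_scores = [1.0, 2.0, 3.0, 5.0]
non_defaulter_scores = [4.0, 6.0, 7.0, 8.0]
print(hit_and_false_alarm_rate(defaulter_scores, non_defaulter_scores, cutoff=4.5))
print(empirical_auroc(defaulter_scores, non_defaulter_scores))
```

Two identical samples give an area of 0.5 under this sketch, and two completely separated samples give 1.0, in line with the interpretation listed above.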
There are many methods, parametric as well as non-parametric, to find the AUROC. Parametric methods are used when the statistical distribution of test values is known in the diseased and non-diseased groups. The most common ROC curve model is the Binormal model, which assumes that both diseased and healthy test values follow a normal distribution. In some situations, the assumption of normality is violated. In case sample sizes are small, this model cannot be adopted. So, we consider Bi-Pareto and Bi-two parameter exponential models and study the areas under the ROC curve.
In Section 2, we derive the expressions for AUROC of Bi-Pareto and Bi-two parameter exponential distributions. In Section 3, we carry out simulations for various combinations of parameters and calculate the values of AUROC using both parametric and non-parametric approaches. Section 4 includes two real life applications for finding ROC curves and the areas under them. Conclusions are given in Section 5.

Area under ROC for Bi-Pareto and Bi-Two Parameter Exponential Models
In this section, we derive the parametric forms of the ROC curve by assuming that the two populations labelled N and P follow particular distributions. The distributions under consideration are the Pareto and two-parameter exponential distributions.

ROC for Bi-Pareto Model
It is assumed that population N follows a Pareto distribution with parameters $\alpha_1$ and $\lambda_1$ and population P follows a Pareto distribution with parameters $\alpha_2$ and $\lambda_2$. Hence, the cdf of population N is [...] and the cdf of population P is [...]. Hence, using (3), the ROC curve has the form [...]. Therefore, using (1), the area under the ROC curve is [...]. We estimate the AUROC using the maximum likelihood estimators of the parameters of the Pareto distribution, given by [...], where the $X_i$'s are the sample observations and GM is the geometric mean of the observations [7].
In particular, for the parameter values 2, 1 and 0.5, 0.25, solving the above integral using Mathematica gives the theoretical value of AUROC as 0.921433.
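The maximum likelihood step can be illustrated with a short Python sketch. It assumes a Pareto type I parametrization with cdf F(x) = 1 - (scale / x)^shape for x >= scale; this parametrization, the names `scale` and `shape`, and the numerical values below are assumptions made for illustration and may differ from the form used in the paper. Under that assumption the scale estimate is the sample minimum and the shape estimate is the reciprocal of the log of the ratio of the geometric mean to the sample minimum.

```python
import numpy as np

def pareto_type1_mle(sample):
    """MLEs (scale, shape) for the assumed cdf F(x) = 1 - (scale / x) ** shape, x >= scale."""
    x = np.asarray(sample, dtype=float)
    scale_hat = x.min()                            # smallest observation
    geometric_mean = np.exp(np.mean(np.log(x)))    # geometric mean of the observations
    shape_hat = 1.0 / np.log(geometric_mean / scale_hat)
    return float(scale_hat), float(shape_hat)

# Hypothetical parameter values, for illustration only.
rng = np.random.default_rng(0)
true_scale, true_shape = 0.5, 2.0
# Inverse-cdf sampling: if U ~ Uniform(0, 1), scale * (1 - U) ** (-1 / shape) is Pareto type I.
u = rng.random(5000)
sample = true_scale * (1.0 - u) ** (-1.0 / true_shape)
print(pareto_type1_mle(sample))
```

The estimates from the two samples would then be substituted into the AUROC expression of the model to obtain the parametric estimate.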

ROC for Bi-Two Parameter Exponential Model
It is assumed that populations N and P follow two-parameter exponential distributions with parameters $\mu_1$, $\theta_1$ and $\mu_2$, $\theta_2$ respectively. Then the cdf for population N is [...] and the cdf for population P is [...]. Let $z = F(x)$; using (5), we get [...]. Using (6), we get the ROC curve as [...]. This gives the area under the ROC curve as [...]. We estimate the AUROC using the maximum likelihood estimators of $\mu$ and $\theta$, the parameters of the two-parameter exponential distribution, given by [...], where the $X_i$'s are the sample observations [7]. In particular, if [...], the theoretical value of AUROC is obtained as 0.815805 by solving the above integral using Mathematica.
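A minimal Python sketch of this model follows. It assumes the cdf F(x) = 1 - exp(-(x - mu) / theta) for x >= mu, for which the maximum likelihood estimators are the sample minimum (for mu) and the sample mean minus the sample minimum (for theta). It also assumes the convention ROC(t) = 1 - G(F^{-1}(1 - t)), in which larger values point to population P, and integrates that curve numerically. The parametrization, the convention and all function names are assumptions for illustration and may not match the expressions in the paper.

```python
import numpy as np

def exp2_mle(sample):
    """MLEs (mu, theta) for the assumed cdf F(x) = 1 - exp(-(x - mu) / theta), x >= mu."""
    x = np.asarray(sample, dtype=float)
    mu_hat = x.min()
    theta_hat = x.mean() - mu_hat
    return float(mu_hat), float(theta_hat)

def exp2_cdf(x, mu, theta):
    """Cdf of the two-parameter exponential distribution (zero below mu)."""
    shifted = np.maximum(np.asarray(x, dtype=float) - mu, 0.0)
    return 1.0 - np.exp(-shifted / theta)

def exp2_auroc_numeric(mu1, theta1, mu2, theta2, grid_size=200_000):
    """Midpoint-rule integral of ROC(t) = 1 - G(F^{-1}(1 - t)) over (0, 1)."""
    t = (np.arange(grid_size) + 0.5) / grid_size   # midpoints avoid t = 0
    threshold = mu1 - theta1 * np.log(t)           # F^{-1}(1 - t) for population N
    roc = 1.0 - exp2_cdf(threshold, mu2, theta2)
    return float(roc.mean())

# Hypothetical parameter values, for illustration only.
print(exp2_mle([2.0, 3.0, 7.0]))
print(exp2_auroc_numeric(0.0, 1.0, 0.0, 2.0))
```

Under these assumptions and with equal location parameters, the curve reduces to ROC(t) = t^(theta1 / theta2), so the area is theta2 / (theta1 + theta2), which gives a quick check on the numerical integral.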
In the following section, parametric and non-parametric estimates of AUROC are calculated by carrying out simulations.

Simulations
The theoretical AUROC values are calculated by assuming Pareto and two-parameter exponential distributions for both populations N and P. Samples are generated from the assumed distributions by choosing different values of the parameters. We obtain the parametric estimates of AUROC by substituting the values of the MLEs of the parameters in the AUROC formulas given in (4) and (7) for Bi-Pareto and Bi-two parameter exponential models respectively. The non-parametric estimates of AUROC are obtained using the Mann-Whitney U statistic [8].
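The non-parametric estimate can be sketched in a few lines of Python. The sketch below assumes that larger values point to population P, counts the pairs in which the observation from P exceeds the observation from N, gives tied pairs half weight, and divides by the number of pairs. The function name and the data are hypothetical.

```python
import numpy as np

def mann_whitney_auroc(sample_n, sample_p):
    """Mann-Whitney estimate of AUROC: share of (N, P) pairs with the P value larger."""
    x = np.asarray(sample_n, dtype=float)
    y = np.asarray(sample_p, dtype=float)
    greater = np.sum(y[:, None] > x[None, :])   # pairs where the P value exceeds the N value
    ties = np.sum(y[:, None] == x[None, :])     # tied pairs count one half
    return float((greater + 0.5 * ties) / (x.size * y.size))

# Hypothetical observations, for illustration only.
print(mann_whitney_auroc([1.0, 2.0, 3.0], [2.0, 3.0, 4.0]))
```

The pairwise comparison uses memory proportional to n times m, which is unproblematic for the sample sizes of 25, 50 and 100 considered here.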
We use 1000 replications for sample sizes 25, 50 and 100 for each distribution. The parameters are estimated using MLEs for each replication and substituted back into the AUROC formula. The error is defined as the difference between the estimate based on the sample and the theoretical AUROC value. Here n and m denote the sample sizes from the two populations. Various parametric combinations are taken, and the theoretical as well as simulated areas under the ROC curve are computed using the Mathematica and R software. The theoretical AUROC (TAUROC), root mean square errors (RMSEs), mean absolute errors (MAEs) and AUROC have been computed using the parametric and non-parametric approaches. The results are presented in Table 1 for the Bi-Pareto model and Table 2 for the Bi-two parameter exponential model.
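The simulation design described above can be sketched in Python for the two-parameter exponential case. The sketch assumes the cdf F(x) = 1 - exp(-(x - mu) / theta), takes the theoretical AUROC to be P(Y > X) with X from N and Y from P, and uses hypothetical parameter values; it is an illustration under those assumptions and is not meant to reproduce the values in Tables 1 and 2.

```python
import numpy as np

def exp2_auroc(mu1, theta1, mu2, theta2):
    """P(Y > X) for X ~ Exp(mu1, theta1) from N and Y ~ Exp(mu2, theta2) from P (assumed convention)."""
    if mu2 >= mu1:
        return 1.0 - theta1 / (theta1 + theta2) * np.exp(-(mu2 - mu1) / theta1)
    return theta2 / (theta1 + theta2) * np.exp(-(mu1 - mu2) / theta2)

def exp2_mle(x):
    """MLEs (mu, theta): sample minimum and sample mean minus sample minimum."""
    return x.min(), x.mean() - x.min()

def mann_whitney_auroc(x, y):
    """Share of (N, P) pairs with the P value larger, ties counted one half."""
    greater = np.sum(y[:, None] > x[None, :])
    ties = np.sum(y[:, None] == x[None, :])
    return (greater + 0.5 * ties) / (x.size * y.size)

def simulate(mu1, theta1, mu2, theta2, n, m, replications=1000, seed=0):
    """Return the theoretical AUROC and (RMSE, MAE) of both estimators."""
    rng = np.random.default_rng(seed)
    truth = exp2_auroc(mu1, theta1, mu2, theta2)
    parametric_err = np.empty(replications)
    non_parametric_err = np.empty(replications)
    for r in range(replications):
        x = mu1 + rng.exponential(theta1, size=n)   # sample from population N
        y = mu2 + rng.exponential(theta2, size=m)   # sample from population P
        # Error = estimate based on the sample minus the theoretical value.
        parametric_err[r] = exp2_auroc(*exp2_mle(x), *exp2_mle(y)) - truth
        non_parametric_err[r] = mann_whitney_auroc(x, y) - truth

    def rmse_mae(err):
        return float(np.sqrt(np.mean(err ** 2))), float(np.mean(np.abs(err)))

    return {"truth": float(truth),
            "parametric": rmse_mae(parametric_err),
            "non_parametric": rmse_mae(non_parametric_err)}

# Hypothetical parameter combination, for illustration only.
for size in (25, 50, 100):
    print(size, simulate(0.0, 1.0, 0.5, 2.0, n=size, m=size))
```

Fixing the seed makes the replications reproducible, and the RMSE is never smaller than the MAE for the same set of errors, which is a simple sanity check on the output.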
It is evident from Tables 1 and 2 that the root mean square error and mean absolute error for the parametric approach are less than those for the non-parametric approach. Therefore, one can estimate the area under the ROC curve more accurately by the parametric approach than by the non-parametric approach in the case of Bi-Pareto and Bi-two parameter exponential models.
In the following discussion, we present two real life examples where the two groups in the data sets fit well to Pareto and two-parameter exponential distributions. The theoretical value of AUROC, RMSE and MAE have been obtained for both models.

Bi-Pareto Model
The data shown in Table 3 consist of 50 patients [9] with advanced acute myelogenous leukemia reported to the International Bone Marrow Transplant Registry. Of these patients, 28 had received an autologous (auto) bone marrow transplant in which, after high doses of chemotherapy, their own marrow was reinfused to replace their destroyed immune system. The other 22 patients had an allogeneic (allo) bone marrow transplant in which marrow from an HLA (Histocompatibility Leukocyte Antigen) matched sibling was used to replenish their immune systems. Using the EasyFit software, it is seen that the data in both groups fit well to the Pareto distribution. The p-values for the Kolmogorov-Smirnov and Chi-square tests are shown in Table 4.
The histograms of Allo and Auto patients are shown in Figures 1 and 2.
The area under the ROC curve is calculated to be 0.711 by taking the allo patients as one group and the auto patients as the second group when both groups follow the Pareto distribution. The ROC curve is plotted in Figure 3.
For the above example, the parametric and non-parametric values of AUROC are 0.8164450 and 0.8049404 respectively. The root mean square errors for the parametric and non-parametric approaches are 0.2107548 and 0.2511394 respectively.

Bi-Two Parameter Exponential Model
Freireich [10] gave the results of a clinical trial of the drug 6-mercaptopurine (6-MP) versus a placebo in 42 children with acute leukemia; the data are given in Table 5. The trial was conducted at 11 American hospitals. Patients were selected who had a complete or partial remission of their leukemia induced by treatment with the drug prednisone. The trial was conducted by matching pairs of patients at a given hospital by remission status (partial or complete) and randomising within the pair to either a 6-MP or placebo maintenance therapy. The patients were followed until their leukemia returned or until the end of the study (in months). Using the EasyFit software, we see that the data for placebo and 6-MP patients fit well to the two-parameter exponential distribution, and this can also be concluded from the values in Table 6.
The histograms of Placebo and 6-MP patients are shown in Figures 4 and 5.
The area under the ROC curve is calculated to be 0.759 by taking the placebo patients as one group and the 6-MP patients as the second group when both groups follow the two-parameter exponential distribution. The ROC curve is plotted in Figure 6.
For the above example, the parametric value of AUROC is 0.78713997 and the non-parametric value is 0.45884774. The root mean square errors for the parametric and non-parametric approaches are 0.02813997 and 0.30015226 respectively.

Conclusion
In this paper, we derive the AUROC for Bi-Pareto and Bi-two parameter exponential models. The theoretical, parametric and non-parametric values of AUROC for different parameter combinations have been calculated. The root mean square and mean absolute errors are calculated using simulations. For both models, the area under the ROC curve can be estimated more accurately by the parametric approach than by the non-parametric approach. The applications have been discussed using real life data sets.

OPEN ACCESS OJS

Figure 3. ROC curve for the data given in Table 3.

Figure 6. ROC curve for the data given in Table 5.