Comparison of Outlier Techniques Based on Simulated Data


This study used simulation to evaluate six outlier detection techniques: the t-Statistic, the Modified Z-Statistic, Cancer Outlier Profile Analysis (COPA), the Outlier Sum-Statistic (OS), the Outlier Robust t-Statistic (ORT), and the Truncated Outlier Robust t-Statistic (TORT). The aim was to determine which technique has the highest power for detecting and handling outliers, judged by P-values, true positives, false positives, False Discovery Rate (FDR), and Receiver Operating Characteristic (ROC) curves. In terms of P-values, OS was the best technique, followed in order by COPA, t, ORT, TORT, and Z. The FDR results gave the same ranking: OS first, then COPA, t, ORT, TORT, and Z. In terms of ROC curves, the t-Statistic and OS had the largest Area Under the ROC Curve (AUC), indicating better sensitivity and specificity, and their AUCs were the most significant. COPA and ORT followed with equal, significant AUCs, while Z and TORT had the smallest AUCs, which were not significant.
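To make the kind of comparison described above concrete, the following is a minimal Python sketch, not the authors' simulation design. It scores simulated genes with the two-sample t-statistic and with the outlier sum statistic in the form given by Tibshirani and Hastie (median/MAD standardization, then summing case values above q75 + IQR), and compares the two by AUC. The sample sizes, number of affected genes, size of the shift, and all function names are assumptions chosen for illustration.

```python
import numpy as np

def t_statistic(x, is_case):
    """Pooled-variance two-sample t-statistic per gene (rows = genes)."""
    a, b = x[:, is_case], x[:, ~is_case]
    na, nb = a.shape[1], b.shape[1]
    pooled = ((na - 1) * a.var(axis=1, ddof=1)
              + (nb - 1) * b.var(axis=1, ddof=1)) / (na + nb - 2)
    return (a.mean(axis=1) - b.mean(axis=1)) / np.sqrt(pooled * (1 / na + 1 / nb))

def outlier_sum(x, is_case):
    """Outlier sum per gene: standardize by median and MAD over all samples,
    then add up the case values that exceed q75 + IQR."""
    med = np.median(x, axis=1, keepdims=True)
    # 1.4826 is the usual consistency constant; it does not change the ranking.
    mad = 1.4826 * np.median(np.abs(x - med), axis=1, keepdims=True)
    z = (x - med) / mad
    q25, q75 = np.percentile(z, [25, 75], axis=1, keepdims=True)
    cutoff = q75 + (q75 - q25)
    cases = z[:, is_case]
    return np.where(cases > cutoff, cases, 0.0).sum(axis=1)

def auc(scores, truth):
    """AUC as the probability that a truly affected gene outscores a null
    gene, counting ties as one half (Mann-Whitney form)."""
    pos, neg = scores[truth], scores[~truth]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)

# Assumed simulation settings (illustrative only).
rng = np.random.default_rng(0)
n_genes, n_ctrl, n_case = 1000, 20, 20
n_true, n_outlier_samples, shift = 100, 5, 4.0

x = rng.standard_normal((n_genes, n_ctrl + n_case))
is_case = np.arange(n_ctrl + n_case) >= n_ctrl
truth = np.arange(n_genes) < n_true
# Only a subset of case samples is shifted for the affected genes.
x[:n_true, n_ctrl:n_ctrl + n_outlier_samples] += shift

auc_t = auc(t_statistic(x, is_case), truth)
auc_os = auc(outlier_sum(x, is_case), truth)
print(f"AUC t-statistic: {auc_t:.3f}")
print(f"AUC outlier sum: {auc_os:.3f}")
```

Changing `n_outlier_samples` or `shift` shows how the relative standing of the two statistics depends on how many case samples carry the outlying signal, which is the kind of sensitivity a simulation comparison is meant to expose.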

Share and Cite:

Obikee, A., Ebuh, G. and Obiora-Ilouno, H. (2014) Comparison of Outlier Techniques Based on Simulated Data. Open Journal of Statistics, 4, 536-561. doi: 10.4236/ojs.2014.47051.

Conflicts of Interest

The authors declare no conflicts of interest.



Copyright © 2021 by authors and Scientific Research Publishing Inc.


This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.