Comparing Data Mining Techniques in HIV Testing Prediction


Introduction: This work compares the predictive power of several data mining techniques used to build an HIV testing prediction model. Four popular data mining algorithms (decision tree, Naive Bayes, neural network, and logistic regression) were used to build models that predict whether an adult in Ethiopia has been tested for HIV, using data from the 2011 Ethiopia Demographic and Health Survey (EDHS 2011). The final experiments showed that the decision tree built with the random tree algorithm performed best, with an accuracy of 96%; the J48 decision tree induction method was second, with a classification accuracy of 79%, followed by the neural network (78%). Logistic regression achieved the lowest classification accuracy, 74%.
Objectives: The objective of this study is to compare the predictive power of the different data mining techniques used to develop the HIV testing prediction model.
Methods: The Cross-Industry Standard Process for Data Mining (CRISP-DM) was followed to build the HIV testing prediction model and to explore association rules between HIV testing and the selected attributes. During data preprocessing, missing values of categorical variables were replaced by the modal value of the variable. Different data mining techniques were then used to build the predictive models.
Results: The target dataset contained 30,625 study participants, of whom 16,515 (54%) were women and 14,110 (46%) were men. Participants' ages ranged from 15 to 59 years, with a modal age group of 15-19 years. Among the participants, 17,719 (58%) had never been tested for HIV, while the remaining 12,906 (42%) had been tested. Residence, educational level, wealth index, HIV-related stigma, knowledge related to HIV, region, age group, risky sexual behaviour attributes, knowledge about where to test for HIV, and knowledge of family planning through mass media were found to be predictors of HIV testing.
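
The preprocessing step described in the Methods, replacing missing values of a categorical variable with its modal value, can be sketched as follows. This is a minimal illustration only: the function name `impute_mode`, the use of `None` as the missing-value marker, and the toy `residence` column are assumptions made for the example, not the study's actual code or data.

```python
from collections import Counter


def impute_mode(values, missing=None):
    """Return a copy of a categorical column with missing entries
    replaced by the most frequent observed value (the mode).

    `missing` is the marker used for absent values (assumed to be None here).
    """
    observed = [v for v in values if v != missing]
    if not observed:
        # Nothing observed, so there is no mode to impute with.
        return list(values)
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v == missing else v for v in values]


# Hypothetical toy column, not taken from EDHS 2011.
residence = ["rural", "urban", None, "rural", None]
imputed = impute_mode(residence)
print(imputed)
```

Mode imputation keeps every record in the dataset, at the cost of slightly inflating the frequency of the most common category.
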
Conclusion and Recommendation: The results of this research show that data mining is valuable for extracting information that supports the effective utilization of HIV testing services, which is of clinical, community, and public health importance at all levels. It is important to apply different data mining techniques to the same setting and to compare the resulting models with each other on accuracy, sensitivity, and specificity. This study also invites interested researchers to further explore the application of data mining techniques in the healthcare industry and in related settings.
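
The three comparison measures named above can be computed from the confusion matrix of a binary classifier. The sketch below is illustrative only: the function name `classification_metrics`, the class labels `"tested"` and `"never"`, and the toy label lists are assumptions, and the numbers it prints are not results from the study.

```python
def classification_metrics(y_true, y_pred, positive="tested"):
    """Compute accuracy, sensitivity and specificity for binary labels.

    sensitivity = TP / (TP + FN)   (true positive rate)
    specificity = TN / (TN + FP)   (true negative rate)
    accuracy    = (TP + TN) / all predictions
    """
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    total = tp + tn + fp + fn
    nan = float("nan")
    return {
        "accuracy": (tp + tn) / total if total else nan,
        "sensitivity": tp / (tp + fn) if (tp + fn) else nan,
        "specificity": tn / (tn + fp) if (tn + fp) else nan,
    }


# Hypothetical toy labels, not taken from EDHS 2011.
y_true = ["tested", "tested", "tested", "never", "never", "never", "never", "tested"]
y_pred = ["tested", "tested", "never", "never", "never", "tested", "never", "tested"]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Reporting sensitivity and specificity alongside accuracy matters when the classes are unbalanced, as in a dataset where 58% of participants have never been tested: a model can reach a reasonable accuracy while still missing many members of the smaller class.
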

Share and Cite:

Hailu, T. (2015) Comparing Data Mining Techniques in HIV Testing Prediction. Intelligent Information Management, 7, 153-180. doi: 10.4236/iim.2015.73014.

Conflicts of Interest

The authors declare no conflicts of interest.



Copyright © 2023 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.