TITLE:
Using Statistical Learning to Treat Missing Data: A Case of HIV/TB Co-Infection in Kenya
AUTHORS:
Joshua O. Mwaro, Linda Chaba, Collins Odhiambo
KEYWORDS:
Missing Data, HIV/TB Co-Infection, Imputation, Missing at Random, Count Data
JOURNAL NAME:
Journal of Data Analysis and Information Processing, Vol.8 No.3, July 31, 2020
ABSTRACT: In this study, we investigate the effects of missing data when estimating HIV/TB co-infection. We revisit the concept of missing data and examine three available approaches for dealing with missingness. The main objective is to identify the best method for correcting for missing data in the HIV/TB co-infection setting. We employ both empirical data analysis and an extensive simulation study to examine the effects of missing data and the accuracy, sensitivity, specificity, and training and test error of the different approaches. The novelty of this work lies in the use of a modern statistical learning algorithm to treat missingness. In the empirical analysis, imputation was performed on both HIV data and HIV/TB co-infection data, with the missing values imputed using the different approaches. In the simulation study, 0% (complete case), 10%, 30%, 50% and 80% of the data were drawn at random and replaced with missing values. The results show that complete-case analysis gave a co-infection rate (95% confidence interval) of 29% (25%, 33%), the weighted method 27% (23%, 31%), the likelihood-based approach 26% (24%, 28%) and the multiple imputation (MI) approach 21% (20%, 22%). In conclusion, MI remains the best approach for dealing with missing data, and failure to apply it results in overestimation of the HIV/TB co-infection rate by 8 percentage points (29% versus 21%).
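The masking step of the simulation design can be illustrated with a minimal sketch. It is not the authors' code. It assumes a hypothetical binary co-infection indicator, an illustrative prevalence of 0.21, a cohort of 2000 records, and deletion completely at random; the function names `mask_at_random` and `complete_case_rate` are invented for this example. It shows only the complete-case estimate with a Wald 95% confidence interval at each missingness level, not the weighted, likelihood-based or multiple imputation methods compared in the paper.

```python
import numpy as np

def mask_at_random(values, rate, rng):
    """Return a float copy of `values` with a fraction `rate` of entries set to NaN."""
    out = np.asarray(values, dtype=float).copy()
    n_missing = int(round(rate * out.size))
    # Pick distinct positions uniformly at random (assumption: completely-at-random deletion).
    idx = rng.choice(out.size, size=n_missing, replace=False)
    out[idx] = np.nan
    return out

def complete_case_rate(values):
    """Proportion among observed entries, with a Wald 95% confidence interval."""
    observed = values[~np.isnan(values)]
    p = observed.mean()
    half_width = 1.96 * np.sqrt(p * (1 - p) / observed.size)
    return p, p - half_width, p + half_width

rng = np.random.default_rng(0)
# Hypothetical cohort: 1 = co-infected, 0 = not. The 0.21 prevalence is illustrative only.
cohort = rng.binomial(1, 0.21, size=2000)

results = {}
for rate in (0.0, 0.1, 0.3, 0.5, 0.8):
    masked = mask_at_random(cohort, rate, rng)
    results[rate] = complete_case_rate(masked)
    p, lo, hi = results[rate]
    print(f"missing={rate:.0%}  rate={p:.3f}  95% CI=({lo:.3f}, {hi:.3f})")
```

Under completely-at-random deletion the complete-case estimate stays roughly unbiased and mainly loses precision as the missing fraction grows. Bias of the kind reported in the abstract would be expected only when missingness depends on other variables, which this sketch does not model.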