Missing Data Handling: A Comprehensive Review, Taxonomy, and Comparative Evaluation
ABSTRACT
Missing data remains a pervasive challenge across many domains, degrading data analysis pipelines, predictive modeling outcomes, and the reliability of decision-making processes. This paper presents a comprehensive and updated review of missing data handling techniques that spans both traditional statistical methods and state-of-the-art graph-based and machine-learning approaches. A novel taxonomy is introduced that classifies strategies into three principal categories: preprocessing techniques, graph-based imputation, and algorithms inherently tolerant of missing values. Particular emphasis is placed on recent advances in deep learning architectures, including Generative Adversarial Imputation Networks (GAIN), Self-Attention Imputation for Time Series (SAITS), and MissFormer, as well as graph-based methods such as Graph Recovery Imputation Network (GRIN) and Temporal Spatial Imputation Graph Neural Network (TSI-GNN). These models demonstrate notable improvements in handling complex missingness patterns and in scaling to large heterogeneous datasets. To complement the theoretical review, an empirical evaluation was conducted on two benchmark datasets (Heart Disease and Kidney Disease), examining the effectiveness and limitations of various imputation strategies under different missingness scenarios. The results underscore the importance of adapting missing data handling techniques to the nature of the dataset, the underlying missingness mechanism, and the proportion of missing entries. Finally, the paper outlines promising research directions, advocating the development of lightweight, explainable, and scalable models; online adaptive imputation strategies for streaming data; multimodal data integration techniques; and privacy-preserving imputation frameworks for federated and decentralized learning environments. Addressing these challenges is essential for building the next generation of reliable, transparent, and intelligent data-driven systems.
Share and Cite:
Chourib, I. (2025) Missing Data Handling: A Comprehensive Review, Taxonomy, and Comparative Evaluation. Journal of Computer and Communications, 13, 81-102. doi: 10.4236/jcc.2025.136006.