Journal of Computer and Communications

Volume 13, Issue 6 (June 2025)

ISSN Print: 2327-5219   ISSN Online: 2327-5227

Google-based Impact Factor: 1.98

Missing Data Handling: A Comprehensive Review, Taxonomy, and Comparative Evaluation

PP. 81-102
DOI: 10.4236/jcc.2025.136006

ABSTRACT

Missing data remains a persistent and pervasive challenge across a wide range of domains, significantly impacting data analysis pipelines, predictive modeling outcomes, and the reliability of decision-making processes. This paper presents a comprehensive and updated review of missing data handling techniques that span both traditional statistical methods and state-of-the-art graph-based and machine-learning approaches. A novel taxonomy is introduced, classifying strategies into three principal categories: preprocessing techniques, graph-based imputations, and algorithms inherently tolerant to missing values. Particular emphasis is placed on recent advancements in deep learning architectures, including Generative Adversarial Imputation Networks (GAIN), Self-Attention Imputation for Time Series (SAITS), and MissFormer, as well as graph-based methods such as Graph Recovery Imputation Network (GRIN) and Temporal Spatial Imputation Graph Neural Network (TSI-GNN). These models demonstrate notable improvements in handling complex missingness patterns and scaling to large heterogeneous datasets. To complement the theoretical review, an empirical evaluation was conducted on two benchmark datasets (Heart Disease and Kidney Disease), examining the effectiveness and limitations of various imputation strategies under different missingness scenarios. The results underscore the critical importance of adapting missing data handling techniques to the nature of the dataset, the underlying missingness mechanism, and the proportion of missing entries. Finally, the paper outlines promising research directions, advocating for the development of lightweight, explainable, and scalable models; online adaptive imputation strategies for streaming data; multimodal data integration techniques; and privacy-preserving imputation frameworks within federated and decentralized learning environments. Addressing these challenges is essential for building the next generation of reliable, transparent, and intelligent data-driven systems.

Share and Cite:

Chourib, I. (2025) Missing Data Handling: A Comprehensive Review, Taxonomy, and Comparative Evaluation. Journal of Computer and Communications, 13, 81-102. doi: 10.4236/jcc.2025.136006.

Copyright © 2025 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.