Comparative Study of Four Methods in Missing Value Imputations under Missing Completely at Random Mechanism

DOI: 10.4236/ojs.2014.41004


In analyzing data from clinical trials and longitudinal studies, missing values are a fundamental challenge because missing data can introduce bias and lead to erroneous statistical inferences. Several methods have been developed in the literature to handle missing values; the most commonly used are the complete case method, mean imputation, last observation carried forward (LOCF), and multiple imputation (MI). In this paper, we conduct a simulation study to investigate the efficiency of these four methods in a longitudinal data setting under the missing completely at random (MCAR) mechanism. We consider three missingness rates: a low rate of 5% and higher rates of 30% and 50%. The simulation shows that the LOCF method has more bias than the other three methods in most situations, while the MI method has the least bias and the best coverage probability. We therefore conclude that MI is the most effective imputation method in our MCAR simulation study.

Cite as:

M. Nakai, D. Chen, K. Nishimura and Y. Miyamoto, "Comparative Study of Four Methods in Missing Value Imputations under Missing Completely at Random Mechanism," Open Journal of Statistics, Vol. 4 No. 1, 2014, pp. 27-37. doi: 10.4236/ojs.2014.41004.

Conflicts of Interest

The authors declare no conflicts of interest.

Copyright © 2020 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.