
Multiple Imputation of Missing Data: A Simulation Study on a Binary Response

pp. 370-378. DOI: 10.4236/ojs.2013.35043

ABSTRACT

Currently, a growing number of programs for the multiple imputation of missing values are becoming available in statistical software. Two algorithms are implemented most often: Expectation Maximization (EM) and Multiple Imputation by Chained Equations (MICE). Both have been shown to work well in large samples or when only small proportions of missing data have to be imputed. However, some researchers have begun to impute large proportions of missing data or to apply the method to small samples. To examine this, a simulation was performed using MICE on datasets with 50, 100 or 200 cases and four or eleven variables. A varying proportion of the data (3% to 63%) was set missing completely at random and subsequently replaced by multiple imputation by chained equations. In a logistic regression model, four coefficients were examined: a non-zero and a zero main effect, and a non-zero and a zero interaction effect. Estimates of all main and interaction effects were unbiased. The estimates varied considerably, however: their variance increased with the proportion of missing data and decreased with sample size. Imputation of missing data by chained equations is a useful tool for imputing small to moderate proportions of missing data. The method has its limits, however: in small samples, all effects are subject to considerable random error.
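The two ends of the simulation design described above, deleting values completely at random and combining the results from several imputed datasets, can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code. `impose_mcar` deletes each cell independently with a fixed probability, which is what makes the missingness completely at random. `pool_rubin` combines the m per-imputation estimates of one regression coefficient using Rubin's (1987) combining rules; that these standard rules are how the estimates are pooled is an assumption here. The function names and all numbers are hypothetical. The chained-equations imputation step that sits between the two is left to existing software such as Stata's `ice` or R's `mice` and is not reproduced.

```python
import math
import random
from statistics import mean, variance


def impose_mcar(rows, proportion, seed=0):
    """Return a copy of `rows` with each cell set to None with probability `proportion`.

    Deletion depends only on an independent random draw, never on observed or
    unobserved values, so the resulting missingness is MCAR.
    (Hypothetical helper for illustration.)
    """
    rng = random.Random(seed)
    return [[None if rng.random() < proportion else value for value in row]
            for row in rows]


def pool_rubin(estimates, variances):
    """Pool m estimates of one coefficient with Rubin's rules.

    `estimates` are the m point estimates and `variances` their squared standard
    errors. Returns (pooled estimate, pooled standard error).
    (Hypothetical helper for illustration.)
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations, each with one variance")
    q_bar = mean(estimates)          # pooled point estimate
    within = mean(variances)         # mean within-imputation variance
    between = variance(estimates)    # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return q_bar, math.sqrt(total)


if __name__ == "__main__":
    # Hypothetical dataset: 100 cases, four binary variables, 30% of cells deleted.
    data = [[1, 0, 1, 0] for _ in range(100)]
    masked = impose_mcar(data, 0.30, seed=42)
    cells = [value for row in masked for value in row]
    print(f"share missing: {sum(v is None for v in cells) / len(cells):.2f}")

    # Hypothetical coefficient estimates and squared SEs from m = 3 imputed datasets.
    est, se = pool_rubin([0.5, 0.7, 0.6], [0.04, 0.05, 0.06])
    print(f"pooled estimate {est:.3f}, pooled SE {se:.3f}")
```

With the hypothetical numbers above, the between-imputation variance is 0.01 and the mean within-imputation variance is 0.05, so the pooled variance is 0.05 + (4/3)(0.01), about 0.063. The (1 + 1/m) factor accounts for using only a finite number of imputations.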

Conflicts of Interest

The authors declare no conflicts of interest.

Cite this paper

J. Hardt, M. Herke, T. Brian and W. Laubach, "Multiple Imputation of Missing Data: A Simulation Study on a Binary Response," Open Journal of Statistics, Vol. 3 No. 5, 2013, pp. 370-378. doi: 10.4236/ojs.2013.35043.

References

[1] J. Hardt and K. Gorgen, “Multiple Imputation Using ICE: A Simulation Study on a Binary Response,” 6th German Stata User Group Meeting, Berlin, 2008. http://www.stata.com/meeting/germany08/abstracts.html
[2] J. W. Graham, S. M. Hofer and A. M. Piccinin, “Analysis with Missing Data in Drug Prevention Research,” NIDA Research Monograph, Vol. 142, 1994, pp. 13-63.
[3] S. van Buuren, “Multiple Imputation of Discrete and Continuous Data by Fully Conditional Specification,” Statistical Methods in Medical Research, Vol. 16, No. 3, 2007, pp. 219-242. http://dx.doi.org/10.1177/0962280206074463
[4] G. Papastefanou and M. Wiedenbeck, “Singuläre und Multiple Imputation Fehlender Einkommenswerte. Ein Empirischer Vergleich (Single and Multiple Imputation of Missing Income Values: An Empirical Comparison),” ZUMA Nachrichten, Vol. 43, 1998, pp. 73-89.
[5] G. J. van der Heijden, A. R. Donders, T. Stijnen and K. G. Moons, “Imputation of Missing Values Is Superior to Complete Case Analysis and the Missing-Indicator Method in Multivariable Diagnostic Research: A Clinical Example,” Journal of Clinical Epidemiology, Vol. 59, No. 10, 2006, pp. 1102-1109. http://dx.doi.org/10.1016/j.jclinepi.2006.01.015
[6] D. B. Rubin, “Multiple Imputation for Nonresponse in Surveys,” Wiley & Sons, New York, 1987. http://dx.doi.org/10.1002/9780470316696
[7] X. L. Meng, “Multiple-Imputation Inferences with Uncongenial Sources of Input,” Statistical Science, Vol. 9, No. 4, 1994, pp. 538-573.
[8] P. S. Albert, “Imputation Approaches for Estimating Diagnostic Accuracy for Multiple Tests from Partially Verified Designs,” Biometrics, Vol. 63, No. 3, 2007, pp. 947-957. http://dx.doi.org/10.1111/j.1541-0420.2006.00734.x
[9] S. A. Barnes, S. R. Lindborg and J. W. Seaman Jr., “Multiple Imputation Techniques in Small Sample Clinical Trials,” Statistics in Medicine, Vol. 25, No. 2, 2006, pp. 233-245. http://dx.doi.org/10.1002/sim.2231
[10] O. Harel and X. H. Zhou, “Multiple Imputation: Review of Theory, Implementation and Software,” Statistics in Medicine, Vol. 26, No. 16, 2007, pp. 3057-3077. http://dx.doi.org/10.1002/sim.2787
[11] D. B. Rubin, “Multiple Imputation after 18+ Years,” Journal of the American Statistical Association, Vol. 91, No. 434, 1996, pp. 473-489. http://dx.doi.org/10.1080/01621459.1996.10476908
[12] A. R. Donders, G. J. van der Heijden, T. Stijnen and K. G. Moons, “Review: A Gentle Introduction to Imputation of Missing Values,” Journal of Clinical Epidemiology, Vol. 59, No. 10, 2006, pp. 1087-1091. http://dx.doi.org/10.1016/j.jclinepi.2006.01.014
[13] J. L. Schafer and J. W. Graham, “Missing Data: Our View of the State of the Art,” Psychological Methods, Vol. 7, No. 2, 2002, pp. 147-177. http://dx.doi.org/10.1037/1082-989X.7.2.147
[14] C. Bono, L. D. Ried, C. Kimberlin and B. Vogel, “Missing Data on the Center for Epidemiologic Studies Depression Scale: A Comparison of 4 Imputation Techniques,” Research in Social and Administrative Pharmacy, Vol. 3, No. 1, 2007, pp. 1-27. http://dx.doi.org/10.1016/j.sapharm.2006.04.001
[15] F. M. Shrive, H. Stuart, H. Quan and W. A. Ghali, “Dealing with Missing Data in a Multi-Question Depression Scale: A Comparison of Imputation Methods,” BMC Medical Research Methodology, Vol. 6, No. 57, 2006. http://dx.doi.org/10.1186/1471-2288-6-57
[16] N. J. Horton and K. P. Kleinman, “Much Ado about Nothing: A Comparison of Missing Data Methods and Software to Fit Incomplete Data Regression Models,” American Statistician, Vol. 61, No. 1, 2007, pp. 79-90. http://dx.doi.org/10.1198/000313007X172556
[17] R. J. Little, M. Yosef, K. C. Cain, B. Nan and S. D. Harlow, “A Hot-Deck Multiple Imputation Procedure for Gaps in Longitudinal Data on Recurrent Events,” Statistics in Medicine, Vol. 27, No. 1, 2008, pp. 103-120. http://dx.doi.org/10.1002/sim.2939
[18] J. Siddique and T. R. Belin, “Multiple Imputation Using an Iterative Hot-Deck with Distance-Based Donor Selection,” Statistics in Medicine, Vol. 27, No. 1, 2008, pp. 83-102. http://dx.doi.org/10.1002/sim.3001
[19] P. Royston, “Multiple Imputation of Missing Values: Update,” Stata Journal, Vol. 5, No. 2, 2005, pp. 188-201.
[20] P. Royston, “Multiple Imputation of Missing Values,” Stata Journal, Vol. 4, No. 3, 2004, pp. 227-241.
[21] J. W. Graham, A. E. Olchowski and T. D. Gilreath, “How Many Imputations Are Really Needed? Some Practical Clarifications of Multiple Imputation Theory,” Prevention Science, Vol. 8, No. 3, 2007, pp. 206-213. http://dx.doi.org/10.1007/s11121-007-0070-9
[22] A. Marshall, D. G. Altman, R. L. Holder and P. Royston, “Combining Estimates of Interest in Prognostic Modelling Studies after Multiple Imputation: Current Practice and Guidelines,” BMC Medical Research Methodology, Vol. 9, No. 57, 2009. http://dx.doi.org/10.1186/1471-2288-9-57
[23] J. Hardt, M. Herke and R. Leonhart, “Auxiliary Variables in Multiple Imputation in Regression with Missing X: A Warning against Including Too Many in Small Sample Research,” BMC Medical Research Methodology, Vol. 12, No. 184, 2012. http://dx.doi.org/10.1186/1471-2288-12-184
[24] G. Ambler, R. Z. Omar and P. Royston, “A Comparison of Imputation Techniques for Handling Missing Predictor Values in a Risk Model with a Binary Outcome,” Statistical Methods in Medical Research, Vol. 16, No. 3, 2007, pp. 277-298. http://dx.doi.org/10.1177/0962280206074466
[25] P. Royston, “Multiple Imputation of Missing Values: Update of ice,” Stata Journal, Vol. 5, No. 4, 2005, pp. 527-536.
[26] S. van Buuren and K. Groothuis-Oudshoorn, “mice: Multivariate Imputation by Chained Equations in R,” Journal of Statistical Software, Vol. 45, No. 3, 2011. http://www.jstatsoft.org/v45/i03
[27] S. van Buuren, H. C. Boshuizen and D. L. Knook, “Multiple Imputation of Missing Blood Pressure Covariates in Survival Analysis,” Statistics in Medicine, Vol. 18, No. 6, 1999, pp. 681-694. http://dx.doi.org/10.1002/(SICI)1097-0258(19990330)18:6<681::AID-SIM71>3.0.CO;2-R
[28] A. Gelman, G. King and C. Liu, “Not Asked and Not Answered: Multiple Imputation for Multiple Surveys,” Journal of the American Statistical Association, Vol. 93, No. 443, 1998, pp. 846-855. http://dx.doi.org/10.1080/01621459.1998.10473737
[29] S. R. Seaman, J. W. Bartlett and I. R. White, “Multiple Imputation of Missing Covariates with Non-Linear Effects and Interactions: An Evaluation of Statistical Methods,” BMC Medical Research Methodology, Vol. 12, No. 46, 2012. http://dx.doi.org/10.1186/1471-2288-12-46
[30] S. van Buuren, “Flexible Imputation of Missing Data,” CRC Press (Chapman & Hall), Boca Raton, 2012. http://dx.doi.org/10.1201/b11826
[31] P. Royston, “Multiple Imputation of Missing Values: Update,” Stata Journal, Vol. 5, No. 2, 2005, pp. 188-201.
[32] J. Hardt, U. T. Egle and J. G. Johnson, “Suicide Attempts and Retrospective Reports about Parent-Child Relationships: Evidence for the Affectionless Control Hypothesis,” GMS Psycho-Social-Medicine, 2007. http://www.egms.de/en/journals/psm/2007-4/psm000044.shtml
[33] U. T. Egle and J. Hardt, “MSBI: Mainz Structured Biographical Interview (MSBI: Mainzer Strukturiertes Biografisches Interview),” In: B. Strauss and J. Schumacher, Eds., Klinische Interviews und Ratingskalen (Clinical Interviews and Rating Scales), Diagnostik für Klinik und Praxis, Vol. 4, Hogrefe, Göttingen, 2004, pp. 261-265.
[34] J. Hardt, U. T. Egle and A. Engfer, “Der Kindheitsfragebogen, ein Instrument zur Beschreibung der Erlebten Kindheitsbeziehungen zu den Eltern (The Childhood Questionnaire, an Instrument for Describing Experienced Childhood Relationships with the Parents),” Zeitschrift für Differentielle und Diagnostische Psychologie, Vol. 24, No. 1, 2003, pp. 33-43. http://dx.doi.org/10.1024//0170-1789.24.1.33
[35] J. Hunzinger, U. T. Egle, G. Vossel and J. Hardt, “Stabilität und Stimmungsabhängigkeit Retrospektiver Berichte zu Eltern-Kind-Beziehungen (Stability and Mood Dependence of Retrospective Reports on Parent-Child Relationships),” Zeitschrift für Klinische Psychologie und Psychotherapie, Vol. 36, No. 4, 2007, pp. 235-242. http://dx.doi.org/10.1026/1616-3443.36.4.235
[36] C. E. Enders, “Applied Missing Data Analysis,” Guilford, New York, 2010.
[37] A. Marshall, D. G. Altman and R. L. Holder, “Comparison of Imputation Methods for Handling Missing Covariate Data When Fitting a Cox Proportional Hazards Model: A Resampling Study,” BMC Medical Research Methodology, Vol. 10, No. 112, 2010. http://dx.doi.org/10.1186/1471-2288-10-112
[38] Y. An, “Smoothed Empirical Likelihood Inference for ROC Curves with Missing Data,” Open Journal of Statistics, Vol. 2, No. 1, 2012, pp. 21-27. http://dx.doi.org/10.4236/ojs.2012.21003


Copyright © 2018 by authors and Scientific Research Publishing Inc.


This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.