Open Journal of Statistics
Volume 3, Issue 5 (October 2013)
ISSN Print: 2161-718X ISSN Online: 2161-7198
Google-based Impact Factor: 0.53
Multiple Imputation of Missing Data: A Simulation Study on a Binary Response
ABSTRACT
A growing number of programs for multiple imputation of missing values are becoming available in statistical software. Two algorithms are most commonly implemented: Expectation Maximization (EM) and Multiple Imputation by Chained Equations (MICE). Both have been shown to work well in large samples or when only small proportions of missing data are imputed. However, some researchers have begun to impute large proportions of missing data or to apply the method to small samples. A simulation was performed using MICE on datasets with 50, 100, or 200 cases and four or eleven variables. A varying proportion of the data (3% to 63%) was set to missing completely at random and subsequently imputed using multiple imputation by chained equations. In a logistic regression model, four coefficients were examined: non-zero and zero main effects as well as non-zero and zero interaction effects. Estimates of all main and interaction effects were unbiased. The variance of the estimates was considerable, increasing with the proportion of missing data and decreasing with sample size. Multiple imputation by chained equations is a useful tool for imputing small to moderate proportions of missing data. The method has its limits, however: in small samples, there are considerable random errors for all effects.
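The two data-handling steps named in the abstract, deleting values completely at random and filling them in by chained equations, can be illustrated with a minimal sketch. This is not the authors' code: the function names `make_mcar` and `chained_imputation`, the normally distributed covariates, the 20% missingness rate, and the use of linear regression with added residual noise for every variable are all assumptions made for illustration. A full MICE implementation would also draw the regression parameters from their posterior and choose a model suited to each variable type.

```python
import numpy as np

def make_mcar(X, prop, rng):
    """Return a copy of X with roughly `prop` of its entries set to NaN,
    chosen independently of all data values (missing completely at random)."""
    X = X.astype(float).copy()
    X[rng.random(X.shape) < prop] = np.nan
    return X

def chained_imputation(X, n_iter=10, rng=None):
    """Produce one completed copy of X with a simplified chained-equations loop.

    Each incomplete column is regressed in turn on all other columns
    (ordinary least squares), and its missing entries are replaced by the
    prediction plus normal noise scaled to the residual spread.
    """
    rng = np.random.default_rng() if rng is None else rng
    miss = np.isnan(X)
    # Start from column means so every regression has complete predictors.
    filled = np.where(miss, np.nanmean(X, axis=0), X)
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            mis_j = miss[:, j]
            if not mis_j.any():
                continue
            obs_j = ~mis_j
            # Design matrix: intercept plus the current values of the other columns.
            A = np.column_stack([np.ones(len(X)), np.delete(filled, j, axis=1)])
            beta, *_ = np.linalg.lstsq(A[obs_j], filled[obs_j, j], rcond=None)
            sigma = (filled[obs_j, j] - A[obs_j] @ beta).std()
            filled[mis_j, j] = A[mis_j] @ beta + rng.normal(0.0, sigma, mis_j.sum())
    return filled

# Hypothetical usage: 100 cases, four variables, about 20% of values deleted.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
X_miss = make_mcar(X, 0.2, rng)
# Multiple imputation: several completed datasets, each with different random draws.
imputed = [chained_imputation(X_miss, rng=rng) for _ in range(5)]
```

In a simulation of the kind described, the analysis model (here, a logistic regression) would then be fitted to each completed dataset and the resulting estimates combined; that step is omitted from this sketch.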
Copyright © 2024 by authors and Scientific Research Publishing Inc.
This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.