TITLE:
Multiple Imputation of Missing Data: A Simulation Study on a Binary Response
AUTHORS:
Jochen Hardt, Max Herke, Tamara Brian, Wilfried Laubach
KEYWORDS:
Multiple Imputation; Chained Equation; Large Proportion Missing; Main Effect; Interaction Effect
JOURNAL NAME:
Open Journal of Statistics, Vol.3 No.5, October 9, 2013
ABSTRACT:
A growing number of programs for multiple imputation of missing values
are becoming available in statistical software. Two algorithms are most
commonly implemented:
Expectation Maximization (EM) and Multiple Imputation by Chained Equations
(MICE). They have been shown to work well in large samples or when only small
proportions of missing data are to be imputed. However, some researchers have
begun to impute large proportions of missing data or to apply the method to
small samples. A simulation was performed using MICE on datasets with 50, 100
or 200 cases and four or eleven variables. A varying proportion of data (3% - 63%)
was set as missing completely at random and subsequently substituted using
multiple imputation by chained equations. In a logistic regression model, four
coefficients were examined: non-zero and zero main effects as well as
non-zero and zero interaction effects.
Estimates of all main and interaction effects were unbiased. The estimates
showed considerable variance, which increased with the proportion of missing
data and decreased with sample size. The imputation of missing data by chained equations
is a useful tool for imputing small to moderate proportions of missing data.
The method has its limits, however. In small samples, there are considerable
random errors for all effects.
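
The missing-data step of the simulation described above can be illustrated with a minimal sketch. It is not the authors' code: the function names (simulate_complete, set_mcar, count_missing) are hypothetical, the predictors are assumed to be independent standard normal draws, every cell is assumed eligible to be set missing, and only the two endpoints of the 3% - 63% range are used because the abstract does not list the intermediate levels. The imputation and logistic regression steps are not shown.

```python
import random

# Design factors stated in the abstract. Only the endpoints of the
# missing-data range are given there, so intermediate levels are omitted.
SAMPLE_SIZES = (50, 100, 200)
VARIABLE_COUNTS = (4, 11)
MISSING_RANGE = (0.03, 0.63)


def simulate_complete(n_cases, n_vars, seed=0):
    """Draw a complete dataset (assumption: independent standard normals)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(n_vars)] for _ in range(n_cases)]


def set_mcar(rows, proportion, seed=0):
    """Return a copy of rows with a fixed share of cells replaced by None.

    Cells are chosen uniformly at random, independent of any data value,
    which is what "missing completely at random" requires.
    """
    rng = random.Random(seed)
    cells = [(i, j) for i in range(len(rows)) for j in range(len(rows[0]))]
    n_missing = round(proportion * len(cells))
    masked = [list(row) for row in rows]  # copy so the input stays complete
    for i, j in rng.sample(cells, n_missing):
        masked[i][j] = None
    return masked


def count_missing(rows):
    """Count the cells that are marked as missing (None)."""
    return sum(value is None for row in rows for value in row)


if __name__ == "__main__":
    # Walk the design grid and report how many cells are missing per condition.
    for n_cases in SAMPLE_SIZES:
        for n_vars in VARIABLE_COUNTS:
            complete = simulate_complete(n_cases, n_vars, seed=1)
            for proportion in MISSING_RANGE:
                masked = set_mcar(complete, proportion, seed=2)
                print(n_cases, n_vars, proportion, count_missing(masked))
```

Because the masked cells are sampled without reference to the data, the complete and masked versions of each dataset can be compared directly after imputation, which is how bias and variance of the coefficients would be assessed in a study of this kind.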