
Missing data occur frequently in longitudinal data analysis, and many methods have been proposed in the literature to handle them. Complete case (CC), mean substitution (MS), last observation carried forward (LOCF), and multiple imputation (MI) are the four methods most frequently used in practice. In a real-world data analysis, the missing data can be missing completely at random (MCAR), missing at random (MAR), or missing not at random (MNAR), depending on the reasons that lead to the missingness. In this paper, simulations under various situations (missing mechanisms, missing rates, and slope sizes) were conducted to evaluate the performance of the four methods, using bias, RMSE, and 95% coverage probability as evaluation criteria. The results showed that LOCF has the largest bias and the poorest 95% coverage probability in most cases under both the MAR and MCAR missing mechanisms. Hence, LOCF should not be used in a longitudinal data analysis. Under the MCAR missing mechanism, the CC and MI methods perform equally well. Under the MAR missing mechanism, MI has the smallest bias, smallest RMSE, and best 95% coverage probability. Therefore, either the CC or the MI method is appropriate under MCAR, while the MI method is the more reliable and better grounded statistical method under MAR.

Missing observations occur frequently in all types of clinical trials, especially in longitudinal studies, where each subject is measured repeatedly at scheduled visits. In longitudinal studies, missing data have many possible causes, including the duration of the study, the nature of the disease, the efficacy and adverse effects of the drug under study, accidents, patients’ refusal to continue, moving, and other administrative reasons. Missingness can lead to two serious problems in statistical practice: reduced overall statistical power and biased estimates. Missing data is a key problem that can never be avoided completely. Since most traditional statistical methods are designed to handle complete data sets by default, data analysts should pay special attention to incomplete data sets.

Little and Rubin [

In the literature, several alternative statistical approaches have been applied to the analysis of longitudinal data with missing values. The analysis method should be selected based on the missing data mechanism, since different statistical methods are valid only under certain situations (missing mechanisms) with specified missing rates. In other words, no single method is best for all situations. However, it is difficult to test the missing mechanism in a longitudinal clinical study, and there are also no clear rules regarding how much missing data qualifies as too much [

Despite these difficulties, several researchers have conducted simulation studies to check the consistency and efficiency of imputation methods. For example, Myers [

Hening and Koonce [

Ali, et al. [

Recently, Nakai, et al. [

This paper is organized as follows. Section 2 reviews methods of missing data analysis. The simulation procedures (with available covariates) under MCAR and MAR settings are described in Section 3. In Section 4, the simulation results are used to evaluate the performance of those four imputation methods considered. Finally, discussion and concluding remarks are provided in Section 5.

Many techniques for handling missing data are discussed in the literature, and many of them have been proposed and developed specifically for longitudinal clinical trials. However, only a few methods are actually used in real trials with missing data. The purpose of this paper is to study the four methods most frequently used for dealing with missing data, which are described as follows.

The complete case (CC) method deletes all cases with missing data and then performs statistical analyses on the remaining complete data set (which has a smaller sample size). Since all cases containing missing data have been removed, there is no missing data problem left to handle, and any statistical method can be used to analyze the smaller data set. One major advantage of this method is its ease of use. In fact, virtually all statistical programs incorporate this method as a default method because it accommodates any type of statistical analysis [

In general, the major disadvantage of the method is that it could possibly lead to losing statistical power due to the reduction of the sample size. Also, complete case techniques erode efficiency such that the variation (i.e., the standard error) around the true estimate is too large [

The method of mean substitution imputes the missing values using the mean of the available observed values. This method has the potential of introducing biases as well as underestimating variability [

The simplest imputation approach is the LOCF method, which replaces every missing value with the corresponding last observed value. The LOCF method is often used in longitudinal studies of continuous outcomes under MCAR. Conceptually, this method assumes that the outcome does not change after the last observed value, so there is no time effect after that point. LOCF has been a popular method for handling missing data because it is easy to understand and easy to implement. Also, unlike the CC method, it does not change the sample size. For example, in a clinical trial (see the data below), patient 3 dropped out of the study after baseline and patient 6 dropped out after the first month of follow-up.

?: missing value; *: value imputed by LOCF.

If more patients drop out of the placebo group because of a lack of efficacy, then this method might give a biased conclusion about the effect of the treatment. In general, the measurements are unlikely to remain unchanged in either the placebo or the treatment group. In our example, the measurement of patient 3 from the control group will increase while that of patient 6 from the active group will decrease. This implies that there is no improvement in the active group and hence no difference between the two groups.
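The three simple approaches described above (complete case, mean substitution, and LOCF) can be sketched in a few lines. This is a minimal illustration rather than the procedure used in the paper: the toy data, the function names, and the choice to substitute the mean of each visit (column) are assumptions made here for the example.

```python
from statistics import mean

# Hypothetical toy data: one row per patient, one column per visit.
# None marks a missing measurement (monotone dropout).
data = [
    [10.0, 11.0, 12.0],
    [9.0, 10.0, None],
    [11.0, None, None],
]


def complete_case(rows):
    """Keep only the patients with no missing value."""
    return [r for r in rows if None not in r]


def mean_substitution(rows):
    """Replace each missing value with the observed mean of its visit (column).

    Using the per-visit mean is an assumption of this sketch.
    """
    n_visits = len(rows[0])
    col_means = [
        mean(r[t] for r in rows if r[t] is not None) for t in range(n_visits)
    ]
    return [
        [col_means[t] if v is None else v for t, v in enumerate(r)] for r in rows
    ]


def locf(rows):
    """Carry the last observed value forward within each patient."""
    out = []
    for r in rows:
        filled, last = [], None
        for v in r:
            if v is not None:
                last = v
            filled.append(last)
        out.append(filled)
    return out
```

With these toy values, `complete_case` keeps a single patient, whereas `locf` keeps all three but freezes each dropout at the last observed level, which is the mechanism behind the flattened slope discussed above.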

Rigorously speaking, LOCF is not an analytic approach but simply an easy way to impute missing values. Analytic proofs [

Multiple imputation was first proposed by Rubin [

We conducted a simulation study under different scenarios. Each data set was generated according to the setting described in Section 3.1 under the following assumptions: 1) the measurement at the first time point (t = 1) of the original data is completely observed; 2) the data are missing under the MCAR or MAR mechanism; 3) the missing pattern is monotone. In the first step of the simulation, the measurements at the five time points of each subject are generated from a multivariate normal distribution with an AR(1) correlation structure, and this step is repeated for each of the 100 subjects, given the observed values,

Case | n | β_{1} | β_{0} | s^{2} | ρ | Missing rate, t = 1 | Missing rate, t = 2 | Missing rate, t = 3 | Missing rate, t = 4 | Missing rate, t = 5 |
---|---|---|---|---|---|---|---|---|---|---|
Case 1 | 100 | 0.1, 2, 10 | 10 | 1 | 0.7 | 0% | 5% | 5% | 5% | 5% |
Case 2 | 100 | 0.1, 2, 10 | 10 | 1 | 0.7 | 0% | 5% | 20% | 15% | 20% |
Case 3 | 100 | 0.1, 2, 10 | 10 | 1 | 0.7 | 0% | 10% | 20% | 30% | 50% |

Each of the 9 predefined situations was repeated 1000 times using SAS procedures, giving 1000 data sets, each containing 100 subjects with 5 time points per subject. Next, the standard mixed model procedures were performed on these simulated data sets, and the regression coefficients and their standard errors were obtained. Then, the performance of the four selected methods (that is, complete case analysis, mean substitution, last observation carried forward, and multiple imputation) was compared based on biases, root mean square errors, and 95% coverage probabilities. In the simulation, we considered missing data under the MCAR and MAR missing mechanisms. In addition, without loss of generality, the missing pattern was assumed to be monotone. The missing rate varied from 5% to 50%.

In the simulation, we generated the longitudinal data y_{it} for the i^{th} subject at the t^{th} visit according to a multivariate normal distribution model, y_{it} = β_{0} + β_{1}t + ε_{it}, where β_{0} is the intercept and β_{1} is the slope. The variance at each occasion is assumed to be constant over time, while the correlation coefficient between

More precisely, a data set X with n rows and p columns is drawn from a multivariate normal distribution with a zero mean vector and a variance-covariance matrix

The correlation structure of an AR(1) model can be described as

where

In this study, the correlation coefficient ρ is taken to be 0.7 to simulate the strong relationships among variables. The total number of cases (or subjects) is 100 and there are 5 measurements at 5 time points for each subject.
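The data-generating step can be sketched as follows. This is a sketch under stated assumptions, not the SAS code used in the study: it assumes a linear mean β_{0} + β_{1}t at visits t = 1, ..., 5 and the standard AR(1) form, in which the correlation between the errors at visits t and s is ρ^{|t − s|}. The function names and the seed are hypothetical. The errors are built recursively, which yields a multivariate normal vector with exactly this AR(1) correlation matrix and constant variance s^{2}.

```python
import math
import random


def simulate_subject(beta0, beta1, s2, rho, n_times, rng):
    """One subject's trajectory y_t = beta0 + beta1 * t + e_t, t = 1..n_times.

    The errors follow a stationary AR(1) process:
    Var(e_t) = s2 and Corr(e_t, e_s) = rho ** abs(t - s).
    """
    sd = math.sqrt(s2)
    innov_sd = sd * math.sqrt(1.0 - rho ** 2)  # keeps Var(e_t) equal to s2
    e = rng.gauss(0.0, sd)  # error at the first visit
    ys = []
    for t in range(1, n_times + 1):
        if t > 1:
            e = rho * e + rng.gauss(0.0, innov_sd)
        ys.append(beta0 + beta1 * t + e)
    return ys


def simulate_dataset(n=100, beta0=10.0, beta1=0.1, s2=1.0, rho=0.7,
                     n_times=5, seed=1):
    """Complete data set: n subjects, n_times visits each (seed is arbitrary)."""
    rng = random.Random(seed)
    return [simulate_subject(beta0, beta1, s2, rho, n_times, rng)
            for _ in range(n)]
```

Calling `simulate_dataset(beta1=2)` or `simulate_dataset(beta1=10)` gives the complete data for the other two slope settings before any values are set to missing.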

Therefore, 1000 samples were generated in the simulation for each of 18 different situations: 3 (missing rates) × 3 (values of the slope) × 2 (missing mechanisms).
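The 3 × 3 × 2 design can be enumerated directly. In the sketch below, the missing rates and slope values are copied from the parameter table above, while the dictionary layout and variable names are illustrative assumptions.

```python
from itertools import product

# Missing rates at visits t = 1..5 for each case, as listed in the parameter table.
missing_rates = {
    "Case 1": [0.00, 0.05, 0.05, 0.05, 0.05],
    "Case 2": [0.00, 0.05, 0.20, 0.15, 0.20],
    "Case 3": [0.00, 0.10, 0.20, 0.30, 0.50],
}
slopes = [0.1, 2, 10]
mechanisms = ["MCAR", "MAR"]

# One (case, slope, mechanism) tuple per simulated situation.
scenarios = list(product(missing_rates, slopes, mechanisms))
```

Each tuple in `scenarios` corresponds to one situation for which 1000 data sets are generated.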

The data were generated with situations described in Sections 3.1 and 3.2 and the measurements were drawn from a multivariate normal distribution with AR(1) correlation structure [

After the original data sets were created, the measurements at different time points for different subjects were set to missing according to the MCAR or MAR missing mechanism. However, the measurement at the first time point (that is, the baseline value) of each subject was assumed to be always observed. In the MCAR setting, missing data were generated randomly at visits 2 through 5 based on the missing probabilities listed in

In the MAR setting, the probability of missing at visit 2 was set in proportion to the baseline values based on a logistic probability distribution model. In the same way, the missing data at visits 3 through 5 were set based on the probabilities given in
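A monotone dropout step under the two mechanisms might look like the following sketch. Several details are assumptions made for illustration and are not taken from the paper: `drop_probs[t]` is treated as the probability that a subject still in the study drops out at that visit, the MAR dropout probability at every visit is a logistic function of the last observed value (the baseline value for visit 2), and the logistic coefficients `intercept` and `slope` are hypothetical.

```python
import math
import random


def mcar_dropout(row, drop_probs, rng):
    """Monotone MCAR dropout.

    At each visit after baseline, a subject still in the study drops out
    with probability drop_probs[t], independently of the data.
    """
    out = list(row)
    for t in range(1, len(row)):  # index 0 is the baseline, always observed
        if rng.random() < drop_probs[t]:
            for k in range(t, len(row)):
                out[k] = None  # once dropped, all later visits are missing
            break
    return out


def mar_dropout(row, intercept, slope, rng):
    """Monotone MAR dropout.

    The dropout probability at each visit follows a logistic model in the
    last observed value, so missingness depends only on observed data.
    """
    out = list(row)
    for t in range(1, len(row)):
        # row[t - 1] is observed here because the loop stops at the first dropout.
        p = 1.0 / (1.0 + math.exp(-(intercept + slope * row[t - 1])))
        if rng.random() < p:
            for k in range(t, len(row)):
                out[k] = None
            break
    return out
```

Both functions leave the baseline value untouched and produce a monotone pattern, matching the two assumptions stated for the simulation.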

Bias, root mean squared error (RMSE), and coverage probability are used as criteria to assess the performance of the four imputation methods. These criteria are described in detail in the following subsections. The SAS System (version 9.3) is used to perform all the statistical analyses and to produce the required results. Also, we set the covariance structure to “Unstructured” in the PROC MIXED procedure simply to explore the accuracy of imputation.

Bias is defined as the difference between the average value of the estimated parameters (β̂_{0} and β̂_{1}) and the parameter values (β_{0} and β_{1}) obtained from the corresponding original data set.

The mean squared error (MSE) is defined as the average squared difference between the estimated parameters (β̂_{0} and β̂_{1}) and the parameter values (β_{0} and β_{1}) obtained from the original data set. MSE is equal to the sum of the variance and the squared bias of the estimated parameters. RMSE is defined as the square root of the MSE. The RMSE is a useful measure of overall precision or accuracy and can be used to evaluate the performance of each imputation method. In general, the more effective method has the lower RMSE [

The coverage probability (CP) is defined as the proportion of the simulated data sets, among the 1000 simulated data sets, that yield the 95% confidence intervals containing the true parameter values based on the original data sets. Therefore, an appropriate method should have a coverage probability around 95%.
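The three criteria can be computed from the replicated estimates as in the sketch below. The function name and arguments are hypothetical, and the interval uses the normal-approximation multiplier 1.96, which is an assumption; the paper does not state how its confidence limits were formed.

```python
import math


def evaluate(estimates, std_errors, true_value, z=1.96):
    """Bias, RMSE, and 95% coverage probability (in percent).

    estimates and std_errors hold one entry per simulated data set;
    true_value is the reference parameter value.
    z = 1.96 assumes a normal-approximation (Wald) interval.
    """
    n = len(estimates)
    bias = sum(estimates) / n - true_value
    mse = sum((est - true_value) ** 2 for est in estimates) / n
    covered = sum(
        1
        for est, se in zip(estimates, std_errors)
        if est - z * se <= true_value <= est + z * se
    )
    return bias, math.sqrt(mse), 100.0 * covered / n
```

Because the MSE here averages over the replications, it equals the squared bias plus the variance of the estimates computed with divisor n, in line with the decomposition stated above.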

The simulation results are summarized in Tables 2-7. In these tables, CC, MS, LOCF, MI, MCAR, and MAR stand for complete case, mean substitution, last observation carried forward, multiple imputation, missing completely at random, and missing at random, respectively. Case 1, Case 2, and Case 3 represent the low, moderate, and high missing rate settings as given in

Tables 2-4 show the simulation results for MCAR (missing completely at random) data. The cases of low, moderate, and high missing rates are illustrated in

Based on the six performance criteria for MCAR missing data, the LOCF is the poorest method. The CC method is the best method. However, the MI method had the same performance as the CC method.

Slope (β_{1}) | Method | Bias (RMSE) of β̂_{0} | Bias (RMSE) of β̂_{1} | 95% CP for β_{0} | 95% CP for β_{1} |
---|---|---|---|---|---|
0.1 | CC | 0.000 (0.1251) | 0.000 (0.0329) | 89.5 | 92.9 |
0.1 | MS | 0.000 (0.1254) | 0.000 (0.0329) | 88.4 | 92.3 |
0.1 | LOCF | 0.005 (0.1237) | −0.005 (0.0322) | 91.8 | 94.7 |
0.1 | MI | 0.000 (0.1253) | 0.000 (0.0330) | 90.1 | 93.2 |
2 | CC | 0.000 (0.1248) | 0.000 (0.0316) | 89.0 | 94.8 |
2 | MS | 0.000 (0.1251) | 0.000 (0.0316) | 87.9 | 93.7 |
2 | LOCF | 0.100 (0.1586) | −0.100 (0.1049) | 95.0 | 40.5 |
2 | MI | 0.000 (0.1244) | 0.000 (0.0317) | 90.3 | 95.2 |
10 | CC | 0.001 (0.1270) | 0.000 (0.0315) | 89.1 | 95.3 |
10 | MS | 0.001 (0.1273) | 0.000 (0.0314) | 88.1 | 94.3 |
10 | LOCF | 0.501 (0.5164) | −0.501 (0.5015) | 100.0 | 0.0 |
10 | MI | 0.001 (0.1266) | −0.001 (0.0315) | 89.6 | 95.7 |

Slope (β_{1}) | Method | Bias (RMSE) of β̂_{0} | Bias (RMSE) of β̂_{1} | 95% CP for β_{0} | 95% CP for β_{1} |
---|---|---|---|---|---|
0.1 | CC | 0.001 (0.1251) | 0.000 (0.0337) | 88.4 | 93.3 |
0.1 | MS | 0.001 (0.1252) | 0.000 (0.0328) | 84.9 | 90.9 |
0.1 | LOCF | 0.018 (0.1214) | −0.013 (0.0327) | 94.6 | 97.2 |
0.1 | MI | 0.002 (0.1253) | −0.002 (0.0332) | 90.8 | 96.4 |
2 | CC | −0.001 (0.1258) | 0.001 (0.0349) | 88.0 | 92.2 |
2 | MS | −0.001 (0.1257) | 0.001 (0.0340) | 85.5 | 89.6 |
2 | LOCF | 0.351 (0.3708) | −0.251 (0.2525) | 51.3 | 0.0 |
2 | MI | 0.002 (0.1254) | −0.002 (0.0341) | 90.2 | 95.8 |
10 | CC | 0.000 (0.1287) | 0.000 (0.0348) | 87.7 | 92.6 |
10 | MS | 0.000 (0.1286) | 0.000 (0.0338) | 84.0 | 90.4 |
10 | LOCF | 1.751 (1.7555) | −1.251 (1.2511) | 0.1 | 0.0 |
10 | MI | 0.003 (0.1277) | −0.002 (0.0337) | 90.0 | 95.0 |

Slope (β_{1}) | Method | Bias (RMSE) of β̂_{0} | Bias (RMSE) of β̂_{1} | 95% CP for β_{0} | 95% CP for β_{1} |
---|---|---|---|---|---|
0.1 | CC | 0.002 (0.1014) | −0.001 (0.0336) | 88.7 | 92.6 |
0.1 | MS | 0.001 (0.0838) | −0.001 (0.0253) | 81.6 | 86.5 |
0.1 | LOCF | 0.038 (0.1217) | −0.027 (0.0367) | 94.9 | 93.5 |
0.1 | MI | 0.007 (0.1148) | −0.004 (0.0393) | 91.9 | 95.5 |
2 | CC | 0.000 (0.1016) | 0.000 (0.0337) | 87.6 | 92.8 |
2 | MS | 0.000 (0.0839) | 0.000 (0.0253) | 79.2 | 86.8 |
2 | LOCF | 0.781 (0.2135) | −0.541 (0.0644) | 0.1 | 0.0 |
2 | MI | 0.006 (0.1149) | −0.003 (0.0394) | 91.5 | 96.0 |
10 | CC | 0.000 (0.1016) | 0.000 (0.0337) | 89.0 | 93.5 |
10 | MS | 0.000 (0.0839) | 0.000 (0.0253) | 81.8 | 87.0 |
10 | LOCF | 3.901 (0.8862) | −2.700 (0.2672) | 0.0 | 0.0 |
10 | MI | 0.005 (0.1149) | −0.002 (0.0393) | 92.4 | 96.7 |

Tables 5-7 show the simulation results for MAR (missing at random) data. The cases of low, moderate, and high missing rates are illustrated in

Slope (β_{1}) | Method | Bias (RMSE) of β̂_{0} | Bias (RMSE) of β̂_{1} | 95% CP for β_{0} | 95% CP for β_{1} |
---|---|---|---|---|---|
0.1 | CC | −0.029 (0.1253) | −0.001 (0.0317) | 89.7 | 95.3 |
0.1 | MS | −0.029 (0.1255) | −0.001 (0.0317) | 88.4 | 94.3 |
0.1 | LOCF | −0.006 (0.1207) | 0.012 (0.0324) | 92.0 | 95.0 |
0.1 | MI | 0.001 (0.1222) | −0.000 (0.0319) | 91.3 | 95.7 |
2 | CC | −0.029 (0.1290) | −0.002 (0.0328) | 88.7 | 93.9 |
2 | MS | −0.028 (0.1293) | −0.001 (0.0330) | 87.8 | 93.3 |
2 | LOCF | 0.090 (0.1533) | −0.084 (0.0891) | 90.0 | 41.4 |
2 | MI | 0.001 (0.1262) | −0.001 (0.0329) | 88.9 | 94.5 |
10 | CC | −0.028 (0.1296) | −0.001 (0.0326) | 89.0 | 93.8 |
10 | MS | −0.028 (0.1298) | −0.001 (0.0326) | 88.0 | 93.2 |
10 | LOCF | 0.490 (0.5054) | −0.483 (0.4842) | 100.0 | 0.0 |
10 | MI | 0.002 (0.1267) | −0.000 (0.0327) | 89.8 | 94.2 |

Slope (β_{1}) | Method | Bias (RMSE) of β̂_{0} | Bias (RMSE) of β̂_{1} | 95% CP for β_{0} | 95% CP for β_{1} |
---|---|---|---|---|---|
0.1 | CC | 0.032 (0.1247) | −0.050 (0.0605) | 88.2 | 62.7 |
0.1 | MS | 0.016 (0.1216) | −0.039 (0.0512) | 86.6 | 68.3 |
0.1 | LOCF | −0.051 (0.1264) | 0.043 (0.0518) | 91.0 | 76.8 |
0.1 | MI | 0.001 (0.1219) | −0.003 (0.0347) | 91.4 | 96.0 |
2 | CC | 0.033 (0.1276) | −0.050 (0.0599) | 88.3 | 63.3 |
2 | MS | 0.017 (0.1246) | −0.039 (0.0507) | 85.7 | 68.3 |
2 | LOCF | 0.281 (0.3054) | −0.195 (0.1966) | 38.5 | 0.0 |
2 | MI | 0.001 (0.1267) | −0.003 (0.0345) | 90.2 | 94.5 |
10 | CC | 0.032 (0.1235) | −0.050 (0.0602) | 88.7 | 63.1 |
10 | MS | 0.017 (0.1204) | −0.039 (0.0510) | 86.8 | 68.5 |
10 | LOCF | 1.682 (1.6854) | −1.195 (1.1949) | 0.0 | 0.0 |
10 | MI | −0.001 (0.1213) | −0.002 (0.0351) | 91.5 | 94.5 |

slope cases. In summary, the LOCF method yields larger biases, RMSEs, and poor 95% CPs in most cases. In contrast, the MI method performs much better than the other three methods under MAR mechanism.

Although the simulation results suggested that the CC method was superior to the MS, LOCF, and MI methods under the MCAR missing mechanism, while the MI method was superior to the CC, MS, and LOCF methods under MAR, the performance of these methods actually depended on several factors, especially the missing rate and the time effect (that is, the size of the slope). No single method is the best under all situations.

Under the assumption of the MCAR missing mechanism, when the missing rate increased from low to moderate (slope = 0.1 or 2), the values of estimated bias and RMSE for the CC, MS, and MI methods were very close. Except for high missing rate and large slope (that is, slope = 10), the values of bias and RMSE obtained by the MI method had large differences compared with those obtained by the CC and MS methods. This is not surprising at all because the CC method will yield unbiased estimated parameters under MCAR only with a small missing rate.

Slope (β_{1}) | Method | Bias (RMSE) of β̂_{0} | Bias (RMSE) of β̂_{1} | 95% CP for β_{0} | 95% CP for β_{1} |
---|---|---|---|---|---|
0.1 | CC | 0.095 (0.1575) | −0.104 (0.1097) | 79.8 | 15.5 |
0.1 | MS | 0.015 (0.1256) | −0.053 (0.0619) | 82.6 | 48.4 |
0.1 | LOCF | −0.056 (0.1288) | 0.052 (0.0581) | 90.5 | 66.6 |
0.1 | MI | 0.008 (0.1348) | −0.004 (0.0433) | 92.5 | 96.5 |
2 | CC | 0.093 (0.1572) | −0.103 (0.1094) | 80.0 | 18.1 |
2 | MS | 0.014 (0.1269) | −0.052 (0.0623) | 81.7 | 48.9 |
2 | LOCF | 0.684 (0.6934) | −0.461 (0.4619) | 0.0 | 0.0 |
2 | MI | 0.006 (0.1392) | −0.004 (0.0462) | 91.4 | 95.2 |
10 | CC | 0.096 (0.1590) | −0.104 (0.1101) | 79.4 | 16.6 |
10 | MS | 0.015 (0.1262) | −0.053 (0.0622) | 84.0 | 47.2 |
10 | LOCF | 3.805 (3.8064) | −2.621 (2.6212) | 0.0 | 0.0 |
10 | MI | 0.009 (0.1367) | −0.005 (0.0432) | 93.0 | 96.9 |

For the MAR missing data, the simulation results revealed that MI is the best method regardless of the missing rate and slope size based on bias, RMSE, and 95% CP. In fact, such a result is well documented in the literature [

In this paper, we consider a longitudinal study with five visiting time points and a total of 100 subjects. Three possible missing rates and three different slopes are used to mimic real-world situations. In addition, two missing mechanisms are considered (that is, MCAR and MAR). Based on the simulation results, we have reached the following important conclusions: 1) the CC method is the most appropriate method for handling MCAR missing data; 2) the MI method is the most effective one in all simulated situations, particularly under the MAR setting, because it yields the smallest biases and has good 95% CP compared with the other methods; 3) the use of the LOCF method can lead to imprecise parameter estimates and hence to invalid inferences.

In practice, inferior methods such as LOCF are still used for longitudinal data analysis. The simulation results provide a good reference and rationale for choosing a missing data handling method in order to obtain precise parameter estimates and valid inferences. Kenward and Molenberghs [

I would like to thank the reviewer for his/her valuable comments and suggestions that make this paper much better in its contents.