Copy Mean: A New Method to Impute Intermittent Missing Values in Longitudinal Studies


Longitudinal studies are those in which the same variable is measured repeatedly at different times. These studies are more likely than others to suffer from missing values. Since missing values can have a substantial impact on statistical analyses, they must be handled properly. In this paper, we present “Copy Mean”, a new method to impute intermittent missing values. We compared its efficiency with that of eleven imputation methods dedicated to the treatment of missing values in longitudinal data. All methods were tested on three markedly different real datasets with complete data (stationary, increasing, and sinusoidal patterns). From each dataset, we generated nine types of incomplete datasets containing 10%, 30%, or 50% missing data under a Missing Completely at Random (MCAR), Missing at Random (MAR), or Missing Not at Random (MNAR) mechanism. Our results show that Copy Mean is highly effective, equaling or exceeding the performance of the other methods in almost all configurations. The effectiveness of linear interpolation is highly data-dependent. The Last Occurrence Carried Forward (LOCF) method is strongly discouraged.
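The abstract names Copy Mean, linear interpolation, and LOCF without defining them. As a rough illustration only, the Python sketch below contrasts the three on a toy dataset. It assumes that Copy Mean fills an intermittent gap by first linearly interpolating the subject's own trajectory and then adding the deviation of the population mean trajectory from its own linear interpolation over the same gap, so that the imputed segment copies the shape of the mean trajectory. The function names (`locf`, `linear_interpolation`, `mean_trajectory`, `copy_mean`), the use of `None` for missing values, and the toy data are hypothetical and are not the authors' software interface.

```python
def locf(traj):
    """Last Occurrence Carried Forward: replace each missing value (None)
    with the most recent observed value; leading gaps stay None."""
    out, last = [], None
    for v in traj:
        if v is None:
            out.append(last)
        else:
            last = v
            out.append(v)
    return out


def linear_interpolation(traj):
    """Fill intermittent gaps (observed values on both sides) by linear
    interpolation over the time index; leading/trailing gaps stay None."""
    out = list(traj)
    observed = [i for i, v in enumerate(traj) if v is not None]
    for a, b in zip(observed, observed[1:]):
        for t in range(a + 1, b):
            w = (t - a) / (b - a)
            out[t] = traj[a] + w * (traj[b] - traj[a])
    return out


def mean_trajectory(trajs):
    """Per-time mean over observed values only (assumption: every time
    point is observed for at least one subject)."""
    means = []
    for t in range(len(trajs[0])):
        vals = [tr[t] for tr in trajs if tr[t] is not None]
        means.append(sum(vals) / len(vals))
    return means


def copy_mean(traj, mean_traj):
    """Assumed Copy Mean formulation for intermittent gaps:
    imputed = LI(subject) + (mean - LI(mean over the subject's observed times))."""
    li = linear_interpolation(traj)
    # Interpolate the mean trajectory across exactly the subject's gaps.
    masked_mean = [m if v is not None else None for m, v in zip(mean_traj, traj)]
    mean_li = linear_interpolation(masked_mean)
    out = list(traj)
    for t, v in enumerate(traj):
        if v is None and li[t] is not None:
            out[t] = li[t] + (mean_traj[t] - mean_li[t])
    return out


if __name__ == "__main__":
    trajs = [
        [0.0, 1.0, 0.0, -1.0, 0.0],
        [1.0, 2.0, 1.0, 0.0, 1.0],
        [2.0, None, None, None, 2.0],  # intermittent gap at times 1-3
    ]
    m = mean_trajectory(trajs)
    print(locf(trajs[2]))                  # [2.0, 2.0, 2.0, 2.0, 2.0]
    print(linear_interpolation(trajs[2]))  # [2.0, 2.0, 2.0, 2.0, 2.0]
    print(copy_mean(trajs[2], m))          # [2.0, 2.5, 1.5, 0.5, 2.0]
```

On this toy example, LOCF and linear interpolation both produce a flat segment across the gap, whereas the Copy Mean sketch reproduces the rise and fall of the mean trajectory. Leading or trailing (monotone) gaps are deliberately left unimputed, since the method is presented here for intermittent missing values.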

Citation:

C. Genolini, R. Écochard and H. Jacqmin-Gadda, “Copy Mean: A New Method to Impute Intermittent Missing Values in Longitudinal Studies,” Open Journal of Statistics, Vol. 3, No. 4A, 2013, pp. 26-40. doi:10.4236/ojs.2013.34A004

Conflicts of Interest

The authors declare no conflicts of interest.


Copyright © 2024 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.