An Empirical Bayes Approach to Robust Variance Estimation: A Statistical Proposal for Quantitative Medical Image Testing

Abstract

The current standard for measuring tumor response with X-ray, CT, and MRI is the Response Evaluation Criteria in Solid Tumors (RECIST). While RECIST simplifies the earlier two-dimensional WHO method, it stipulates four response categories, CR (complete response), PR (partial response), PD (progressive disease), and SD (stable disease), based purely on percentage changes, without any consideration of measurement uncertainty. In this paper, we propose a statistical procedure for tumor response assessment based on uncertainty measures of radiologists' measurement data. We present several variance estimation methods, using time series and empirical Bayes techniques, for the setting in which only a small number of serial observations are available on each member of a group of subjects. We use a publicly available database that contains a set of over 100 CT scan images from 23 patients, with RECIST measurements annotated by two radiologist readers. We show that, despite bias in each individual reader's measurements, statistical decisions on tumor change can be made for each individual subject. The consistency of the two readers can be established from the intra-reader change assessments. Our proposal compares favorably with the standard RECIST protocol, raising the hope that statistically sound decisions in change analysis can be made in the future on the basis of careful analysis of variability and measurement uncertainty.
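To make the contrast in the abstract concrete, the following Python sketch sets a fixed percentage rule beside a change test that accounts for measurement variance. It is an illustration under stated assumptions, not the procedure developed in the paper: the percentage rule is a simplified reading of RECIST 1.1 that compares against baseline only (30% decrease for PR, 20% increase for PD), the per-subject variance uses a generic successive-difference estimator, and the shrinkage weight is a fixed hypothetical value where an empirical Bayes method would estimate it from the group. All function names are invented for this example.

```python
import math

def percent_rule(baseline_mm, followup_mm):
    """Simplified RECIST-style category from percentage change alone.

    Assumption: baseline is the only reference point; the nadir and
    absolute-increase conditions of RECIST 1.1 are left out.
    """
    if followup_mm == 0:
        return "CR"
    change = (followup_mm - baseline_mm) / baseline_mm
    if change <= -0.30:
        return "PR"
    if change >= 0.20:
        return "PD"
    return "SD"

def difference_variance(series):
    """Noise variance from successive differences of a short series.

    If the true size changes slowly, each difference has variance of
    about 2 * sigma^2, hence the division by 2 * (n - 1).
    """
    diffs = [b - a for a, b in zip(series, series[1:])]
    return sum(d * d for d in diffs) / (2 * len(diffs))

def shrink_variances(series_by_subject, weight=0.5):
    """Pull each subject's variance toward the group mean variance.

    Assumption: `weight` is a fixed illustrative constant, not an
    estimated empirical Bayes shrinkage factor.
    """
    raw = {s: difference_variance(x) for s, x in series_by_subject.items()}
    pooled = sum(raw.values()) / len(raw)
    return {s: weight * v + (1 - weight) * pooled for s, v in raw.items()}

def change_z(baseline_mm, followup_mm, noise_variance):
    """z statistic for a change between two measurements whose errors
    are assumed independent with common variance `noise_variance`."""
    return (followup_mm - baseline_mm) / math.sqrt(2 * noise_variance)

# Hypothetical serial diameters (mm) for two subjects.
data = {"a": [10, 12, 11], "b": [20, 20, 20]}
variances = shrink_variances(data)
print(percent_rule(50, 45), change_z(50, 45, 2.0))
```

In this sketch a 10% decrease is labeled SD by the percentage rule, while the z statistic lets the same change be judged against the estimated measurement noise, which is the kind of decision the abstract argues for.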

Cite as:

Z. Lu, C. Fenimore, R. Gottlieb and C. Jaffe, "An Empirical Bayes Approach to Robust Variance Estimation: A Statistical Proposal for Quantitative Medical Image Testing," Open Journal of Statistics, Vol. 2 No. 3, 2012, pp. 260-268. doi: 10.4236/ojs.2012.23031.

Conflicts of Interest

The authors declare no conflicts of interest.

Copyright © 2024 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.