Multiple Choice Tests: Inferences Based on Estimators of Maximum Likelihood

Abstract

This paper revises and expands the Delta model for estimating the knowledge level in multiple choice tests (MCT). The model was originally proposed by Martín Andrés and Luna del Castillo in 1989 (British Journal of Mathematical and Statistical Psychology, 42: 251) using conditional inference. The aim of this paper is therefore to obtain its unconditional estimators by the maximum likelihood method. In addition to examining some properties that arise from unconditional inference, the paper addresses further issues regarding the model, such as test-inversion confidence intervals and the treatment of omitted answers. A free program that performs the calculations described in the paper is available at http://www.ugr.es/local/bioest/Delta
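For orientation only, the following sketch implements the two ingredients named above (a maximum likelihood estimate of the knowledge level and a test-inversion confidence interval) for the simplest classical model, which the Delta model modifies; it is not the Delta model itself. It assumes that the classical model is the usual knowledge-or-random-guessing model: each of n items is known with probability Δ and otherwise guessed uniformly among K options, so that P(correct) = Δ + (1 - Δ)/K. The interval inverts two one-sided exact binomial tests (the Clopper-Pearson construction) and maps the result onto the Δ scale. All function names are hypothetical and are not taken from the Delta program.

```python
from math import comb
from typing import Callable, Tuple

# Sketch of the CLASSICAL knowledge-or-guessing model (an assumption, not the
# paper's Delta model): P(correct) = delta + (1 - delta) / k.


def binom_cdf(x: int, n: int, p: float) -> float:
    """P(X <= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(x + 1))


def to_delta(p: float, k: int) -> float:
    """Map P(correct) to the knowledge level, clipped to [0, 1]."""
    return min(1.0, max(0.0, (k * p - 1) / (k - 1)))


def classical_mle(x: int, n: int, k: int) -> float:
    """ML estimate of delta from x correct answers out of n items.

    The binomial likelihood peaks at p = x/n; restricting p to [1/k, 1]
    (i.e. delta >= 0) moves the maximum to max(x/n, 1/k), which the
    clipping in to_delta reproduces.
    """
    return to_delta(x / n, k)


def _bisect(f: Callable[[float], float], lo: float = 0.0, hi: float = 1.0,
            iters: int = 200) -> float:
    """Root of an increasing function f on [lo, hi] with f(lo) < 0 < f(hi)."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def p_interval(x: int, n: int, alpha: float = 0.05) -> Tuple[float, float]:
    """Exact interval for p obtained by inverting two one-sided binomial
    tests (the Clopper-Pearson construction)."""
    # Lower bound solves P(X >= x | p) = alpha/2; this tail increases with p.
    lower = 0.0 if x == 0 else _bisect(
        lambda p: (1 - binom_cdf(x - 1, n, p)) - alpha / 2)
    # Upper bound solves P(X <= x | p) = alpha/2; this tail decreases with p,
    # so the sign is flipped to give an increasing function for bisection.
    upper = 1.0 if x == n else _bisect(
        lambda p: alpha / 2 - binom_cdf(x, n, p))
    return lower, upper


def delta_interval(x: int, n: int, k: int,
                   alpha: float = 0.05) -> Tuple[float, float]:
    """Map the interval for p onto the knowledge scale (monotone in p)."""
    lower, upper = p_interval(x, n, alpha)
    return to_delta(lower, k), to_delta(upper, k)


if __name__ == "__main__":
    # Hypothetical data: 14 correct answers on 20 four-option items.
    x, n, k = 14, 20, 4
    print("delta_hat =", round(classical_mle(x, n, k), 4))
    print("95% interval for delta:",
          [round(v, 4) for v in delta_interval(x, n, k)])
```

For the hypothetical 14-out-of-20 example the script prints delta_hat = 0.6. Mapping the interval for p directly onto Δ is legitimate here only because Δ is an increasing function of p under this classical model; the unconditional estimators and intervals the paper derives for the Delta model may take a different form.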

How to cite:

Femia-Marzo, P. and Martín-Andrés, A. (2014) Multiple Choice Tests: Inferences Based on Estimators of Maximum Likelihood. Open Journal of Statistics, 4, 466-483. doi: 10.4236/ojs.2014.46045.

Conflicts of Interest

The authors declare no conflicts of interest.

References

[1] Tarrant, M., Ware, J. and Mohammed, A.M. (2009) An Assessment of Functioning and Non-Functioning Distractors in Multiple-Choice Questions: A Descriptive Analysis. BMC Medical Education, 9, 40-48.
http://dx.doi.org/10.1186/1472-6920-9-40
[2] Simkin, M.G. and Kuechler, W.L. (2005) Multiple-Choice Tests and Student Understanding: What Is the Connection? Decision Sciences Journal of Innovative Education, 3, 73-97.
http://dx.doi.org/10.1111/j.1540-4609.2005.00053.x
[3] Gronlund, N.E. and Waugh, C.K. (2008) Assessment of Student Achievement. Pearson, Upper Saddle River.
[4] Scharf, E.M. and Baldwin, L.P. (2007) Assessing Multiple Choice Question (MCQ) Tests—A Mathematical Perspective. Active Learning in Higher Education, 8, 31-47.
http://dx.doi.org/10.1177/1469787407074009
[5] Lesage, E., Valcke, M. and Sabbe, E. (2013) Scoring Methods for Multiple Choice Assessment in Higher Education—Is It Still a Matter of Number Right Scoring or Negative Marking? Studies in Educational Evaluation, 39, 188-193.
http://dx.doi.org/10.1016/j.stueduc.2013.07.001
[6] Martín Andrés, A. and Luna del Castillo, J.D. (1989) Tests and Intervals in Multiple Choice Tests: A Modification of the Simplest Classical Model. British Journal of Mathematical and Statistical Psychology, 42, 251-263.
http://dx.doi.org/10.1111/j.2044-8317.1989.tb00914.x
[7] Budescu, D. and Bar-Hillel, M. (1993) To Guess or Not to Guess: A Decision-Theoretic View of Formula Scoring. Journal of Educational Measurement, 30, 277-291.
http://dx.doi.org/10.1111/j.1745-3984.1993.tb00427.x
[8] Bar-Hillel, M., Budescu, D. and Attali, Y. (2005) Scoring and Keying Multiple Choice Tests: A Case Study in Irrationality. Mind & Society, 4, 3-12.
http://dx.doi.org/10.1007/s11299-005-0001-z
[9] Espinosa, M.P. and Gardeazabal, J. (2010) Optimal Correction for Guessing in Multiple-Choice Tests. Journal of Mathematical Psychology, 54, 415-425.
http://dx.doi.org/10.1016/j.jmp.2010.06.001
[10] Hutchinson, T.P. (1982) Some Theories of Performance in Multiple-Choice Tests, and Their Implications for Variants of the Task. British Journal of Mathematical and Statistical Psychology, 35, 71-89.
[11] Lord, F.M. and Novick, M.R. (1968) Statistical Theories of Mental Test Scores. Addison-Wesley, Menlo Park.
[12] Martín Andrés, A. and Luna del Castillo, J.D. (1990) Multiple Choice Tests: Power, Length and Optimal Number of Choices Per Item. British Journal of Mathematical and Statistical Psychology, 43, 57-71.
http://dx.doi.org/10.1111/j.2044-8317.1990.tb00926.x
[13] Martín Andrés, A. and Femia Marzo, P. (2004) Delta: A New Measure of Agreement between Two Raters. British Journal of Mathematical and Statistical Psychology, 57, 1-19.
http://dx.doi.org/10.1348/000711004849268
[14] Martín Andrés, A. and Femia Marzo, P. (2005) Chance-Corrected Measures of Reliability and Validity in K × K Tables. Statistical Methods in Medical Research, 14, 473-492.
http://dx.doi.org/10.1191/0962280205sm412oa
[15] Martín Andrés, A. and Femia Marzo, P. (2008) Chance-Corrected Measures of Reliability and Validity in 2 × 2 Tables. Communications in Statistics-Theory and Methods, 37, 760-772.
http://dx.doi.org/10.1080/03610920701669884
[16] Peirce, C.S. (1884) The Numerical Measure of Success in Predictions. Science, 4, 453-454.
http://dx.doi.org/10.1126/science.ns-4.93.453-a
[17] Agresti, A. and Min, Y. (2001) On Small Sample Confidence Intervals for Parameters in Discrete Distributions. Biometrics, 57, 963-971.
http://dx.doi.org/10.1111/j.0006-341X.2001.00963.x
[18] Dunnett, C.W. and Gent, M. (1977) Significance Testing to Establish Equivalence between Treatments, with Special Reference to Data in the Form of 2 × 2 Tables. Biometrics, 33, 593-602.
http://dx.doi.org/10.2307/2529457
[19] Cytel (2014) StatXact Statistical Software.
http://www.cytel.com/software-solutions/statxact
[20] Group of Biostatistics of the University of Granada (Spain) (2014) Statistical Software.
http://www.ugr.es/~bioest/software.htm
[21] Agresti, A. and Caffo, B. (2000) Simple and Effective Confidence Intervals for Proportions and Difference of Proportions Result from Adding Two Successes and Two Failures. The American Statistician, 54, 280-288.
[22] Altman, D.G., Machin, D., Bryant, T.N. and Gardner, M.J. (2000) Statistics with Confidence. 2nd Edition, BMJ.
[23] Femia Marzo, P. and Martín Andrés, A. (2014) Software Delta Website.
http://www.ugr.es/local/bioest/Delta
[24] Meyer, C.D. (2000) Matrix Analysis and Applied Linear Algebra. Society for Industrial and Applied Mathematics (SIAM), Philadelphia.

Copyright © 2023 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.