TITLE:
Assessing the Quality of Examination Questions in Medical Education: A Classical Test and Item Response Theory Approach in a Morphology Course
AUTHORS:
Ângela Tavares Paes, Danielle Tamashiro Duarte, Natália Oliveira Feitosa, Marcella M. Ceratti, Felipe Prieto Siqueira, Pedro Afonso Liberato, Carlos Augusto Cardim de Oliveira
KEYWORDS:
Educational Evaluation, Academic Success, Medical Education, Quality of Tests, Proficiency Measure, Classical Test Theory (CTT), Item Response Theory (IRT)
JOURNAL NAME:
Creative Education, Vol. 16 No. 6, June 30, 2025
ABSTRACT: Introduction: One of the main challenges in medical schools is the development of tests that accurately measure the proficiency of enrolled students. Despite advances in the educational system, conventional methods of testing, such as summing points from multiple-choice questions, remain commonplace, and professors seldom examine the quality of the questions within an examination. Objectives: The purpose of this study is to foster a discussion on the quality of exams in medical education, using a psychometric analysis of tests from a Morphology course as an example. Methods: Four examinations (tests) from the first-year Morphology course of a private medical school were analyzed. All questions were multiple-choice, assessing basic concepts of anatomy, radiology, and pathology. Techniques from Classical Test Theory (CTT) and Item Response Theory (IRT), the latter implemented with the mirt package, were applied to evaluate how effectively the questions measured students’ learning outcomes. Results: The analyzed examinations showed an unbalanced distribution of question difficulty, with a predominance of easy questions and few or no difficult ones. In addition, a considerable proportion of questions had a high probability of being answered correctly by chance. Overall, the discrimination rates of the questions were low (below 20%), with none approaching 100%, and many questions had discrimination rates near zero, a value that hinders the differentiation between students of greater and lesser proficiency in the subjects examined. Conclusions: Although CTT and IRT are well-established and effective methods for assessing the quality of test questions, they are seldom used to guide educators in developing more equitable examinations. These methods should be employed to assist professors in improving item development and creating more effective assessments, ensuring a better balance between easy, moderate, and difficult questions.
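As a minimal sketch of the CTT quantities the abstract refers to, the snippet below computes item difficulty (the proportion of correct answers) and item discrimination (the corrected item-total point-biserial correlation) for a small, entirely hypothetical 0/1 response matrix. It is an illustration only; the study itself used the R mirt package, and the matrix, function names, and values here are not from the article.

```python
import numpy as np

# Hypothetical response matrix: rows = students, columns = items,
# 1 = correct answer, 0 = incorrect answer (illustrative data only).
responses = np.array([
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
])

# CTT difficulty: proportion of students answering each item correctly
# (a higher p-value means an easier item).
difficulty = responses.mean(axis=0)

def discrimination(resp):
    """Corrected item-total discrimination for each item.

    Point-biserial correlation between the item score and the total
    score on the remaining items; values near zero indicate the item
    does not separate stronger from weaker students.
    """
    n_items = resp.shape[1]
    totals = resp.sum(axis=1)
    disc = np.empty(n_items)
    for j in range(n_items):
        rest = totals - resp[:, j]  # exclude item j from the total
        disc[j] = np.corrcoef(resp[:, j], rest)[0, 1]
    return disc

print("difficulty:", difficulty)
print("discrimination:", discrimination(responses))
```

With this toy matrix, items answered correctly by most students get difficulty values near 1 (easy items), mirroring the imbalance the study reports, while the discrimination column flags items that fail to differentiate proficiency levels.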