Short Answer Questions or Modified Essay Questions – More Than a Technical Issue
1. Introduction
Examinations have several functions, e.g. to make sure that students have learnt the essential part of a course, and to give feedback to students and teachers on the effectiveness of learning and teaching. It may also be assumed that the form of examination influences the pedagogical process at large, thereby contributing to what is sometimes referred to as systemic validity [1]. Thus, summative as well as formative aspects of assessment should be recognized. Moreover, examinations need to be analyzed within the framework of an expanded view of validity, in which coverage of the defined knowledge domain—the construct—as well as inferences, actions and consequences are taken into account [2].
The questions in a written examination can be constructed in different ways, e.g. short answer questions (SAQ) or essay questions [3]. Within the medical field, the latter type has been developed into modified essay questions (MEQ) in order to assess clinical problem-solving skills [4]. A clinical case is presented in a chronological sequence of items in a case booklet. After each item a decision is required, and the student is not allowed to preview the subsequent item until the decision has been made. The reliability of MEQ is reported to be reasonable [5]. The aim of the present study was to evaluate whether, and how, the results of a student examination differ when SAQ and MEQ are used.
2. Method
The study was carried out in the 2003 internal medicine course in Gothenburg, Sweden. At the end of the course, the students take a written examination in order to pass the theoretical part of the course. Forty-nine students took part in the present examination, which consisted of 15 SAQ and 5 MEQ. All questions had previously been constructed, used, and validated by other universities in Sweden, which at that time followed the same curriculum. The question constructors had not been informed that their questions were to be used to compare results on SAQ and MEQ. None of the questions had been available to the Gothenburg students. Rating was based on the empirically developed guidelines and was done by one of the authors (SW), with the possibility of co-rating if needed.
Statistics
Statistical analyses and randomization were conducted using SPSS 12.0. Spearman correlation coefficients were calculated between the student results from 1) SAQ and MEQ; 2) SAQ and two randomly chosen MEQ; 3) SAQ and the remaining three MEQ; and 4) the MEQ in 2) and 3). The percentages of correctly answered questions in different groups were compared with the Mann-Whitney test. Values are presented as mean (standard deviation).
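The correlation analysis described above can be illustrated with a minimal Python sketch. It is not the SPSS 12.0 procedure used in the study: the function names (`average_ranks`, `spearman`, `split_cases`), the random seed, the equal weighting of the MEQ cases, and all scores are hypothetical and serve only to show how a rank correlation and a random two-versus-three split of the five MEQ could be computed.

```python
import random
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def split_cases(case_ids, k=2, seed=0):
    """Randomly choose k cases; return (chosen, remaining)."""
    rng = random.Random(seed)  # hypothetical seed, for reproducibility only
    chosen = sorted(rng.sample(case_ids, k))
    rest = [c for c in case_ids if c not in chosen]
    return chosen, rest


def mean_percent(rows, case_idx):
    """Mean percentage per student over the selected cases (equal weights assumed)."""
    return [sum(row[c] for c in case_idx) / len(case_idx) for row in rows]


# Invented scores for six hypothetical students (not the study data).
saq_pct = [72, 43, 88, 60, 65, 58]
meq_pct = [  # one row per student, one column per MEQ case
    [70, 60, 80, 65, 75],
    [40, 55, 50, 45, 60],
    [85, 90, 80, 95, 88],
    [22, 47, 30, 35, 40],
    [69, 64, 70, 60, 66],
    [55, 50, 65, 58, 62],
]

two, three = split_cases([0, 1, 2, 3, 4], k=2, seed=0)
meq_all = mean_percent(meq_pct, [0, 1, 2, 3, 4])
meq_two = mean_percent(meq_pct, two)
meq_three = mean_percent(meq_pct, three)

print("1) SAQ vs all MEQ:   ", round(spearman(saq_pct, meq_all), 2))
print("2) SAQ vs two MEQ:   ", round(spearman(saq_pct, meq_two), 2))
print("3) SAQ vs three MEQ: ", round(spearman(saq_pct, meq_three), 2))
print("4) two vs three MEQ: ", round(spearman(meq_two, meq_three), 2))
```

Comparing 4) with 1) to 3) mirrors the reasoning of the study: if the correlation between two subsets of the same format is no higher than the correlation across formats, the format itself is unlikely to be the main source of variation.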
3. Results
The examination results for the 49 medical students are presented in Table 1. The percentage of correctly answered questions did not differ significantly between SAQ and MEQ (P = 0.075).
Two students’ results were markedly poorer than the rest. Their percentages of correctly answered questions were 33% and 51%, respectively (SAQ 60% and 60%, respectively; MEQ 22% and 47%, respectively). For SAQ, two students had less than 50% correctly answered questions (43% each), their results in MEQ being 69% and 64%, respectively.
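As an illustration of the kind of comparison reported here, the following Python sketch computes a two-sided Mann-Whitney U test using the normal approximation with a tie correction and no continuity correction. It is an assumption-laden sketch, not the SPSS 12.0 routine used in the study, whose exact settings are not stated; the function name `mann_whitney_u` and the example percentages are hypothetical.

```python
from math import erfc, sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U, two-sided p) using the normal approximation.

    Assumptions of this sketch: tie-corrected variance, no continuity
    correction, no exact small-sample distribution.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    combined = list(x) + list(y)
    ranks = average_ranks(combined)

    # U for the first sample, from its rank sum in the pooled data.
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1

    # Tie correction: sum of (t^3 - t) over groups of tied values.
    counts = {}
    for v in combined:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))

    if variance == 0:  # all observations identical: no evidence of a difference
        return min(u1, u2), 1.0

    z = (u1 - n1 * n2 / 2) / sqrt(variance)
    p_two_sided = erfc(abs(z) / sqrt(2))
    return min(u1, u2), p_two_sided


# Invented percentages for illustration only (not the study data).
saq_pct = [72, 43, 88, 60, 65, 58, 77, 81]
meq_pct = [69, 64, 84, 33, 66, 58, 70, 62]

u, p = mann_whitney_u(saq_pct, meq_pct)
print("U =", u, " two-sided p =", round(p, 3))
```

With only a handful of observations an exact test would normally be preferred over the normal approximation; the approximation is used here only to keep the sketch short.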
There was a correlation between the results of 1) SAQ and MEQ; 2) SAQ and two randomly chosen MEQ; 3) SAQ and the remaining three MEQ; and 4) MEQ in 2) and 3) (Table 2).
4. Discussion
Although a few individual students had poor results in either SAQ or MEQ, with markedly better results in the other part, the present study generally showed a significant correlation between SAQ and MEQ. This may not be surprising, since it has been reported that a large proportion of MEQ are often factual recall questions or deal with interpretation of data [6], i.e. they imitate traditional SAQ. Although the MEQ format was initially developed to assess clinical problem-solving skills, Feletti and Smith [6] found that only 27% of MEQ on nine occasions during three consecutive years were actually problem-solving questions. The correlation coefficients between SAQ and MEQ found in our study were similar to the ones between subgroups of MEQ. This indicates that variation in content rather than in format affects the outcome.

Table 1. Various test models with maximum score for each type, and results (% correct answers) for all 49 medical students participating in the examination (SAQ, short answer questions; MEQ, modified essay questions).

Table 2. Spearman correlation coefficient between the student results of short answer questions (SAQ) and modified essay questions (MEQ).
The results of the present study do not allow conclusions concerning what kind of knowledge was assessed through the two types of questions. However, the differing results for the low achieving students may indicate that different types of knowing are focused upon. Indeed, MEQ has been claimed to be especially valuable for diagnosing student weaknesses [5].
To further explore the issue of different question types, a larger number of results would be required. However, since 2003, the faculties of medicine in Sweden have developed diverging curricula. Thus, it was not possible to further validate our results by expanding our study to include more students. Moreover, it would be valuable to actively collaborate with the students, for example by using established reporting techniques during test taking, interviewing students about their perceptions after the exam, or asking them to write comments on specific aspects of the different formats used, e.g. in relation to the degree of coverage of the domain and the perceived authenticity of the examination.
A disadvantage of MEQ is that the examination time usually allows only a small number of cases, thus limiting the number of items that can be tested. This may negatively affect the coverage of the area (content validity), as well as the precision of the measurement (reliability). Another problem with MEQ is that the construction of such items is time-consuming. However, it may also be claimed that there are validity-related advantages to the technique, for example concerning the aspect of authenticity, commonly regarded as an important feature of construct validity.
5. Conclusion
Our experience is that students appreciate the MEQ type of questions. However, one may ask whether it is worth the effort, since the results correlate well with those of the much easier-to-construct SAQ. A written examination consisting of both SAQ and MEQ, as in the present study, could perhaps be an adequate compromise, with beneficial effects on validity and reliability, as well as on feasibility.