A Gendered Study of Student Ratings of Instruction

This research tests for differences in mean class averages between male and female faculty for questions on a student rating of instruction form at one university in the Midwest that is considered to be in the category of “very high research activity” by the Carnegie Commission on Higher Education. Differences in variances of class averages are also examined for male and female faculty. Tests are conducted by first considering all classes across the entire university and then classes just within the College of Science and Mathematics. The proportion of classes taught by female instructors in which the average male student rating was higher than the average female student rating was compared to the proportion of classes taught by male instructors in which the average male student rating was higher than the average female student rating. Results are discussed.


Introduction
Student ratings of instruction are often used by universities as the main way of evaluating the teaching effectiveness of a faculty member. This is particularly true at research universities [1]. Some studies have shown that there is gender bias in student ratings of instruction [2] [3]. Marcotte [4] discusses a small study conducted based on student ratings in online courses. When students thought the instructor was female, the instructor was rated lower in all 12 categories considered. This included the category of "caring and respectful". Everything about the online course was the same in all cases except that students were told different genders for the instructor. One category in the study discussed by Marcotte [4] is "promptness" of the instructor in returning graded items. Students who thought the instructor was male gave the instructor an average rating of 4.35. Students who thought the instructor was female gave the instructor an average rating of 3.55.
In other research, Feldman [5] [6] and Lueck, Endres, & Caplan [7] found that female students rated female instructors higher and male students rated male instructors higher. Feldman [5] [6] found that this was evidenced further based on one's perception of gender "roles", which is more pronounced among business and engineering students. Worthington [8], in particular, found this gender bias among finance majors. Basow [9] and Centra & Gaubatz [10] found that male instructors were rated similarly by both their male and female students. These studies found, however, that female students tended to rate female instructors higher overall. Male instructors tend to be rated higher in knowledge/scholarship questions as well as enthusiasm. Female instructors tend to be rated higher in "comfortable environment" [11]-[13].
The subject area that an instructor teaches in also has an effect on student ratings of instruction. Teachers in science and engineering get lower ratings than teachers in the humanities and social sciences. Female faculty fare even worse in the sciences: studies have found that both male and female students rated female instructors lower than male instructors [9] [11] [12].
The age of an instructor was also found to affect ratings. Older instructors tend to receive lower student ratings of instruction. Increasing age has a more negative effect on female instructors [14].
Sprague and Massoni [15] and Laube, Massoni, Sprague, and Ferber [16] found that students have gendered expectations of instructors. Students expect female instructors to be caring and nurturing. They also expect female instructors to be available more often than male instructors in addition to not being as demanding or grading as harshly [12] [13] [15] [17] [18]. Students expect male instructors to be entertaining and energetic. Schmidt created a database on the words students used on the website "Rate My Professor" [19]. He found that patterns in the words used were associated with the gender of the instructor. The words "intelligent", "genius", and "smart" came up more often when evaluating male instructors across all disciplines. The words "bossy", "nurturing", and "strict" came up more when evaluating female instructors across all disciplines [19]. Researchers found that female instructors who did not meet these "gendered" expectations were rated more harshly than the male instructors who did not meet their gendered expectations [12] [13] [15] [17].
Studies have also shown that there is a significant positive correlation between the grade a student expects in a class and the same student's teaching evaluation of the professor [20]-[23]. In classes that have a common final exam, it has been shown that there is a modest significant positive correlation between student performance on the exam and the corresponding evaluation by the student of the teaching effectiveness of the instructor. However, based on the results of another study considering classes with no common final, there is some evidence that a student may give a higher rating to an instructor because they are "grateful" for the grade they are expecting to receive, or may give a lower rating to an instructor because they are upset about the grade they are receiving [24]. This could result in grade inflation [24]. Recall that researchers have found that female faculty are expected to grade less harshly than male faculty and are rated lower by students if they do not meet this gendered expectation [12] [13] [15] [17].
Basow and Martin [25] summarize much of the research that has been done concerning the gendered expectations of students when completing student ratings of instruction for male and female faculty. They say that female faculty must be more caring and nurturing, be available more often, and not be as demanding on tests and assignments as male faculty in order to get comparable evaluations. Female faculty not having these "gendered" characteristics get lower ratings. Findings from a North Carolina University study with lead researcher Lillian MacNell suggest that female faculty still have to work harder to get similar ratings to male faculty members [4].

Study Design
We would like to explore some of these gendered findings at an intensive research university in the Midwest to see if we find similar results. In particular, we would like to examine student ratings of instruction for the 2013-14 academic year at North Dakota State University. This university is located in Fargo, North Dakota. It is ranked by the Carnegie Commission on Higher Education as one of the top public and private universities in the country and by NSF in the category of "Research University/Very High Research Activity". As of 2014, the university had a total enrollment of 14,747 students broken down into 12,124 undergraduate students, 340 professional students, and 2,283 graduate students. Approximately 91% of the students were US citizens, 7% international, and 2% permanent residents, with 54% of the students being male and 46% being female. The University consists of 8 colleges (www.ndsu.edu). Institutional Review Board (IRB) approval was obtained.
At North Dakota State University (NDSU), all instructors are required to give students in their classes an opportunity to evaluate the instruction in the class. This evaluation must take place during the last three weeks of the semester. The student rating of instruction form used at NDSU currently consists of 16 questions. These questions are given in Figure 1. The first six questions on the form have been used during the past 10 years. The last ten questions on the form were added to the form in 2013. The form also asks students to respond to the following demographic questions: 1) Their gender; 2) Their level (freshman, sophomore, junior, senior, graduate student); 3) Whether the course is elective or not; and 4) Their expected grade (A, B, C, D, F). Student response data for the academic year 2013-14 was collected from all classes taught in an in-class environment, but not including workshops, seminars, or independent study classes. Data was collected from a total of 2092 classes.
Three areas of research will be examined in regard to this student rating of instruction questionnaire. The first area that will be examined is how the average student response to each of the questions is related to the class demographics. In particular, the demographics that will be considered include the following: 1) The percentage of students in the class required to take the class for their major; 2) The percentage of males in the class; 3) The percentage of students expecting to receive either an A or B in the class; 4) The percentage of freshmen and sophomores in the class; and then 5) The gender of the instructor. In this phase of the research, a total of 16 least squares regressions will be conducted, with the dependent variable for each regression being the average class response to one of the questions. The independent variables for each of the regression models will be the five demographic variables that have been mentioned. We would like to determine how much of the variation in class average responses to each question is explained by the class demographics. If this percentage is high, this indicates that the question is not measuring effective instruction, but rather something else. In this phase of the research, we would also like to investigate whether or not the gender variable is significant.
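One of the 16 regressions described above can be sketched as follows. This is a minimal illustration rather than the analysis actually run: the helper names `solve_linear_system` and `fit_ols`, the eight classes, and their ratings are all invented for the example, and the paper does not say which software was used. The sketch regresses a class-average rating on the five demographic variables and reports the share of variation explained (R squared).

```python
# Minimal ordinary least squares sketch using only the standard library.
# All data values below are hypothetical and serve only as an illustration.

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so that row operations act on both sides at once.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(rows, y):
    """Return (coefficients, r_squared); an intercept column is prepended."""
    x = [[1.0] + list(r) for r in rows]
    k = len(x[0])
    # Normal equations: (X'X) beta = X'y.
    xtx = [[sum(xi[p] * xi[q] for xi in x) for q in range(k)] for p in range(k)]
    xty = [sum(xi[p] * yi for xi, yi in zip(x, y)) for p in range(k)]
    beta = solve_linear_system(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, xi)) for xi in x]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return beta, 1.0 - ss_res / ss_tot


# Hypothetical classes: (% required for major, % male, % expecting A or B,
# % freshmen and sophomores, instructor is female: 1 = yes, 0 = no).
classes = [
    (80, 55, 70, 60, 0), (20, 40, 90, 10, 1), (50, 60, 65, 80, 0),
    (90, 30, 85, 30, 1), (10, 70, 50, 90, 0), (60, 45, 95, 20, 1),
    (30, 50, 75, 50, 1), (70, 65, 60, 40, 0),
]
# Hypothetical class-average responses to a single question.
ratings = [3.9, 4.5, 3.8, 4.3, 3.4, 4.6, 4.1, 3.7]

coefficients, r_squared = fit_ols(classes, ratings)
print("share of variation explained by demographics:", round(r_squared, 3))
```

With only eight invented classes and six fitted coefficients, the printed R squared carries no meaning; the point is only the mechanics of computing, for each question, the share of variation in class averages that the demographics explain.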
The second area of research that we will examine is to compare the average class responses to each of the questions for female instructors with the average class responses to each of the questions for male instructors across all 8 colleges of the university. We will test to see if there is a significant difference in the average class responses between male and female instructors for each question using two versions of a t-test: one assuming equal variances (pooled); and one not assuming equal variances (Satterthwaite). We are particularly interested in how the means compare between male and female instructors for the following questions: Question 5, "The fairness of procedures for grading this course", and Question 10, "I understood how my grades were assigned in this course", since research indicates that students expect female instructors to grade them less harshly [12] [13] [15] [17]; Question 12, "The instructor was available to assist students outside of class", because research has shown that students expect female instructors to be available more often [12] [13] [15] [17]; and Question 13, "The instructor provided feedback in a timely manner", since research has shown students rate male instructors higher in this category even if response time is the same [4]. The average class responses to each question for male and female instructors within the College of Science and Mathematics will also be compared. Research has shown that female instructors in the sciences tend to get lower ratings than male instructors from the students [9] [11] [12]. In addition to comparing the mean class average responses between males and females for both the entire University and then for the College of Science and Mathematics, the variances in the average class responses between male and female instructors will also be compared for each question considering the entire University and then considering only the College of Science and Mathematics.
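The two t-test versions named above differ only in how the standard error and degrees of freedom are formed. The sketch below computes both test statistics, along with a simple variance ratio. Several things here are assumptions: the function names are invented for illustration, the class averages are made up, and the paper does not state which test of equal variances was used, so a plain F ratio is shown as one common choice. P-values are omitted because the Python standard library has no t or F distribution.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(a, b):
    """t statistic and degrees of freedom assuming equal variances."""
    na, nb = len(a), len(b)
    # Pooled variance weights each sample variance by its degrees of freedom.
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(sp2 * (1 / na + 1 / nb))
    return t, na + nb - 2


def satterthwaite_t(a, b):
    """t statistic and approximate degrees of freedom, unequal variances."""
    na, nb = len(a), len(b)
    va, vb = variance(a) / na, variance(b) / nb
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    # Satterthwaite approximation to the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


def variance_ratio(a, b):
    """F ratio of sample variances with its two degrees of freedom."""
    return variance(a) / variance(b), len(a) - 1, len(b) - 1


# Hypothetical class averages for one question (not the study's data).
female_classes = [4.2, 4.4, 4.1, 4.5, 4.3, 4.0]
male_classes = [4.0, 4.6, 3.7, 4.4, 3.9, 4.7, 3.8]

print("pooled:", pooled_t(female_classes, male_classes))
print("Satterthwaite:", satterthwaite_t(female_classes, male_classes))
print("variance ratio:", variance_ratio(female_classes, male_classes))
```

When the two groups have the same number of classes the two t statistics coincide and only the degrees of freedom differ, which is why the choice of version matters most when group sizes and variances are both unequal.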
In the last phase of the research, we will use only classes in which at least five male students and five female students responded. In this phase we will compare the proportion of classes taught by female instructors in which the average male student response is higher than the average female student response to the proportion of classes taught by male instructors in which the average male student response is higher. These proportions will be compared for each question. Our hypothesis is that the proportions will be higher for male instructors. Our sample consisted of 112 classes taught by female instructors and 162 classes taught by male instructors. The lower sample size was due to students not responding to the demographic question about their gender and to the fact that we deleted any class from this portion of the study that did not have at least 5 male and 5 female students responding.
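The comparison of the two proportions can be sketched as a one-sided two-proportion z-test. The paper does not name the test it used, so the pooled z-test below is an assumption, as are the function name and the counts of classes (45 and 85); only the sample sizes of 112 and 162 come from the text.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """One-sided pooled z-test of H1: p1 < p2.

    x1 of n1 classes (female instructors) and x2 of n2 classes (male
    instructors) had a higher average male student response.
    Returns (z statistic, lower-tail p-value).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # A small lower-tail probability supports p1 < p2.
    return z, NormalDist().cdf(z)


# Sample sizes from the study; the counts 45 and 85 are hypothetical.
z_stat, p_value = two_proportion_z(45, 112, 85, 162)
print("z =", round(z_stat, 3), "one-sided p =", round(p_value, 4))
```

The test is one-sided because the stated hypothesis is directional: the proportion is expected to be higher for male instructors.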

Results
Recall that in the first phase of the research, 16 ordinary least squares regressions were to be conducted with the 16 dependent variables being the average class responses to each of the 16 questions. The independent variables in the model were the five class demographic variables collected: 1) The percentage of students in the class required to take the class for their major; 2) The percentage of males in the class; 3) The percentage of students expecting to receive either an A or B in the class; 4) The percentage of freshmen and sophomores in the class; and then 5) The gender of the instructor. For the majority of the questions, the demographics explained between 15% and 28% of the variation in responses. There were four questions that were the exception to this. Fifty-nine percent of the variation in class average responses to question 11, "I met or exceeded the course objectives given for this course", was explained by the demographics. Forty-nine percent of the variation in class average responses to question 6, "Your understanding of the course content"; 39% of the variation in class average responses to question 5, "The fairness of procedures for grading this course"; and 32% of the variation in class average responses to question 4, "The quality of this course", were explained by the demographics. For all questions, the percentage of students expecting to receive an A or a B was highly significant in explaining the variation in class average responses. For questions 7, 12, 15, and 16, the percentage of males in the class was significant at alpha equal to 0.05, and for question 8, the percentage of males in the class was significant at 0.10. In all of these cases, an increase of 10% males in the class corresponded with an average increase of approximately 0.04 in the instructor rating for that question. The only other demographic variable found to be significant for any question was the percentage of students taking the class because it was required for their major. This demographic variable was only significant (alpha = 0.05) for question 6. In this case, the class average response decreased with an increase in the proportion of students for whom this course was required for their major. The first variable to enter any of the models, and the most significant variable, was the variable for the proportion of students in the class expecting to receive either an A or a B grade. The indicator variable for gender of the instructor was not significant for any of these models with the proportion of students expecting an A or a B already in the model.
If this set of questions were used to evaluate effective teaching, one might consider dropping questions 6 and 11, since about 50% or more of the variation in class average responses is explained by the class demographics. These questions are not really evaluating effective teaching.
We next tested whether there was a difference in the mean class average responses for each of the 16 questions between classes taught by female instructors and classes taught by male instructors. The sample means for male and female instructors for each of the questions are given in Table 1.
When the classes across all 8 colleges of the university were considered, a significant difference was found in the mean class average responses between male and female instructors for questions 6 and 11 with alpha equal to 0.05 and for question 15 with an alpha value of 0.10 (p-value = 0.0548). In all these cases, the mean response for female instructors was higher. Female instructors were rated higher on the student's understanding of the course content, the student's meeting or exceeding course objectives, and the instructor setting and maintaining higher standards. It is interesting to note that when a regression analysis was conducted with class average response to question 11 being the dependent variable, gender of the instructor was not significant when the percentage of students who expected to get an A or a B was in the model. This was also true for questions 6 and 15. Students in female instructor classes were expecting more A's and B's than students in male instructor classes.
Female instructors were not rated significantly lower on average on Question 5, "The fairness of grading procedures", but the sample mean of the class averages for female instructors was lower (p-value = 0.1242). Female instructors were not rated significantly lower on average on Question 10, "I understood how my grades were assigned", but the sample mean of the class averages for female instructors was slightly lower for this question (p-value = 0.6359). There was also no significant difference in means for Questions 12 and 13, about the availability of the instructor and providing feedback in a timely manner. The sample means for male and female instructors were very close to each other, with the sample means for females being only very slightly higher (p-values = 0.6848 and 0.8016, respectively).
Since research has shown that female faculty are rated lower in the science field [9] [11] [12], mean class average responses between male and female instructors were compared in the College of Science and Mathematics. These are given in Table 2. A significant difference at alpha equal to 0.05 was found between the mean responses for questions 3, 5, and 7. A marginally significant difference was found between the mean responses for question 1 (p-value = 0.1004) and question 2 (p-value = 0.0824). Among the three significant cases, the mean response associated with female faculty was higher except for question 5. Female instructors were rated higher on communication ability and on creating an atmosphere conducive to learning. Research has shown that female faculty tend to be rated higher in "comfortable environment" [11]-[13]; it is interesting to note that, for this question, we found a significant difference in the College of Science and Mathematics but not for the overall University. Females were rated marginally higher on the student's satisfaction with the instruction in the course, and on the instructor as a teacher. Question 5 had students rating the fairness of procedures for grading this course. Research has shown that students expect female faculty to not be as harsh on grading, with students giving female faculty lower evaluations on grading if they are harsh [12] [13] [15] [17] [18]. For question 10, "I understood how my grades were assigned in this course", the sample mean of class average responses was slightly lower for female faculty, although not significantly so (p-value = 0.2587). The class average responses for questions 12 and 13 on availability and providing feedback in a timely manner were actually slightly higher for female faculty, but not significantly so (p-values = 0.6848 and 0.8016, respectively). Recall that research suggests that students expect female faculty to be available more often than male faculty [12] [13] [15] [17] [18]. Research has also shown that males are rated higher in promptness even when taking the same amount of time to return assignments as female faculty [4].
We also considered the differences in variation of class average responses for each question between male and female instructors over the entire University (All Colleges). A significant difference in variability was found between class average responses for classes taught by male and female instructors for questions 5 and 15 at alpha equal to 0.05. A marginally significant difference was found between the class average responses for question 11 (p-value = 0.0906). The class average responses for question 5 had a larger variability for female faculty. This question was on rating the fairness of grading procedures. It is noted again that research has shown students expect female faculty to grade less harshly [12] [13] [15] [17] [18]. This could account for the larger variability among female faculty. The variances of the responses for male faculty were higher in the other two cases. The p-values for tests of variances are given in Table 1. The difference in variability of the class average responses for male and female faculty was also tested within the College of Science and Mathematics. A significant difference in variability between class average responses for classes taught by male and female instructors was found for questions 1, 2, 5, 7, 8, 9, 11, 12, 14, 15, and 16 at alpha equal to 0.05. A marginally significant difference in variability was found between class average responses for question 13 (p-value = 0.0974). In all cases, except for question 5, the class average responses were found to be less variable for classes taught by female instructors than for classes taught by male instructors (Table 2).
In the last phase of our research, we considered only classes in which at least 5 male students and 5 female students responded. Other research has found that male students tend to rate male instructors higher and female students tend to rate female instructors higher [5]-[7]. We wanted to test whether the proportion of classes taught by female instructors in which the male student response was higher was significantly lower than the proportion of classes taught by male instructors in which the male student response was higher. The sample proportion of classes taught by female instructors in which the male response was higher (Proportion 1) and the sample proportion of classes taught by male instructors in which the male response was higher (Proportion 2) were calculated. These are given in Table 3. In all cases, the sample proportion for female instructors was lower.

Table 3. Proportion of classes in which male student response is higher.

Figure 1. Student rating of instruction form.

Table 1. Mean gender results for Questions 1-16, all colleges. Panels (a) through (p) give the mean gender results for Questions 1 through 16, respectively.
P-value = 0.3904 when testing equality of variances.

Table 2. Mean gender results for Questions 1-16, College of Science and Mathematics. Panels (a) through (p) give the mean gender results for Questions 1 through 16, respectively.
P-value = 0.0019 when testing equality of variances (males higher). P-value = 0.0268 when testing equality of variances (males higher).