
This research tests for differences in mean class averages between male and female faculty for questions on a student rating of instruction form at one university in the Midwest that is considered to be in the category of “very high research activity” by the Carnegie Commission on Higher Education. Differences in variances of class averages are also examined for male and female faculty. Tests are conducted by first considering all classes across the entire university and then classes just within the College of Science and Mathematics. The proportion of classes taught by female instructors in which the average male student rating was higher than the average female student rating was compared to the proportion of classes taught by male instructors in which the average male student rating was higher than the average female student rating. Results are discussed.

Student ratings of instruction are often used by universities as the main way of evaluating the teaching effectiveness of a faculty member. This is particularly true at research universities [

In other research, Feldman [

The subject area in which an instructor teaches also has an effect on student ratings of instruction. Teachers in science and engineering get lower ratings than teachers in the humanities and social sciences. Female faculty fare even worse in the sciences, with both male and female students found to rate female instructors lower than male instructors [

The age of an instructor was also found to affect ratings. Older instructors do not receive student ratings of instruction that are as high. Increasing age has a more negative effect on female instructors [

Sprague and Massoni [

Studies have also shown that there is a significant positive correlation between the grade a student expects in a class and the same student’s teaching evaluation of the professor [

Basow and Martin [

We would like to explore some of these gendered findings at an intensive research university in the Midwest to see if we find similar results. In particular, we would like to examine student ratings of instruction for the 2013-14 academic year at North Dakota State University. This university is located in Fargo, North Dakota. It is ranked by the Carnegie Commission on Higher Education as one of the top public and private universities in the country and by NSF in the category of “Research University/Very High Research Activity”. As of 2014, the university had a total enrollment of 14,747 students: 12,124 undergraduate students, 340 professional students, and 2,283 graduate students. Approximately 91% of the students were US citizens, 7% were international, and 2% were permanent residents; 54% of the students were male and 46% were female. The University consists of 8 colleges (www.ndsu.edu). Institutional Review Board (IRB) approval was obtained.

At North Dakota State University (NDSU), all instructors are required to give students in their classes an opportunity to evaluate the instruction in the class. This evaluation must take place during the last three weeks of the semester. The student rating of instruction form used at NDSU currently consists of 16 questions. These questions are given in

The last ten questions on the form were added in 2013. The form also asks students to respond to the following demographic questions: 1) Their gender; 2) Their level (freshman, sophomore, junior, senior, graduate student); 3) Whether the course is elective or not; and 4) Their expected grade (A, B, C, D, F).

Student response data for the academic year 2013-14 was collected from all classes taught in an in-class environment, but not including workshops, seminars, or independent study classes. Data was collected from a total of 2092 classes.

Three areas of research will be examined in regard to this student rating of instruction questionnaire. The first area is how the average student response to each of the questions is related to the class demographics. In particular, the demographics that will be considered include the following: 1) The percentage of students in the class required to take the class for their major; 2) The percentage of males in the class; 3) The percentage of students expecting to receive either an A or B in the class; 4) The percentage of freshmen and sophomores in the class; and 5) The gender of the instructor. In this phase of the research, a total of 16 least squares regressions will be conducted, with the dependent variable for each regression being the average class response to one of the questions. The independent variables for each of the regression models will be the five demographic variables that have been mentioned. We would like to determine how much of the variation in class average responses to each question is explained by the class demographics. If this percentage is high, this indicates that the question is not measuring effective instruction, but rather something else. In this phase of the research, we would also like to investigate whether or not the gender variable is significant.
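As an illustration of this first phase, the following minimal Python sketch fits one such least squares regression and reports the share of variation explained (R²). The data are synthetic, and the helper `ols_fit` and the variable names (`pct_required`, `pct_male`, `pct_expect_ab`, `pct_lower_div`, `female_instructor`) are hypothetical; this is not the authors' code or data.

```python
import random

def ols_fit(X, y):
    """Return (coefficients, r_squared) for y ~ intercept + X via the normal equations."""
    n = len(y)
    rows = [[1.0] + list(r) for r in X]          # prepend an intercept column
    k = len(rows[0])
    # Augmented normal-equation system [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda i: abs(A[i][c]))
        A[c], A[p] = A[p], A[c]
        piv = A[c][c]
        A[c] = [v / piv for v in A[c]]
        for i in range(k):
            if i != c:
                f = A[i][c]
                A[i] = [vi - f * vc for vi, vc in zip(A[i], A[c])]
    beta = [A[i][k] for i in range(k)]
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    y_bar = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return beta, 1.0 - ss_res / ss_tot

# Synthetic class-level data (assumption: invented purely for illustration).
random.seed(0)
classes, ratings = [], []
for _ in range(200):
    pct_required = random.uniform(0, 100)
    pct_male = random.uniform(0, 100)
    pct_expect_ab = random.uniform(40, 100)
    pct_lower_div = random.uniform(0, 100)
    female_instructor = random.choice([0, 1])    # indicator variable
    classes.append([pct_required, pct_male, pct_expect_ab,
                    pct_lower_div, female_instructor])
    # In this toy data only the expected-grade percentage drives the rating.
    ratings.append(3.0 + 0.015 * pct_expect_ab + random.gauss(0, 0.3))

beta, r2 = ols_fit(classes, ratings)
print("coefficients:", [round(b, 4) for b in beta])
print("share of variation explained (R^2):", round(r2, 3))
```

In the study, one such fit would be run per question, with the class average response to that question as the dependent variable.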

The second area of research that we will examine is to compare the average class responses of each of the questions for female instructors with the average class responses of each of the questions for male instructors across all 8 colleges of the university. We will test to see if there is a significant difference in the average class responses between male and female instructors for each question using two versions of a t-test: one assuming equal variances (pooled); and one not assuming equal variances (Satterthwaite). We are particularly interested in how the means compare between male and female instructors for the following questions: Question 5, “The fairness of procedures for grading this course”, and Question 10, “I understood how my grades were assigned in this course”, since research indicates that students expect female instructors to grade them less harshly [

In the last phase of the research, we will use only classes in which at least five male students and five female students responded. In this phase we will compare the proportion of classes taught by female instructors in which the average male student response is higher than the average female student response to the proportion of classes taught by male instructors in which the average male student response is higher. These proportions will be compared for each question. Our hypothesis is that the proportions will be higher for male instructors. Our sample consisted of 112 classes taught by female instructors and 162 classes taught by male instructors. The lower sample size was due to students not responding to the demographic question about their gender and to the fact that we deleted any class from this portion of the study that did not have at least 5 male and 5 female students responding.
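A comparison of two such proportions can be sketched as a one-sided z-test. The example below uses the Question 4 proportions reported in the results (0.3750 for female instructors, 0.5432 for male instructors). The counts 42 of 112 and 88 of 162 are back-calculated from those proportions and are an assumption, as are the use of an unpooled standard error and the function name `two_proportion_z`; the source does not state which form of the test was used.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """One-sided z-test of H1: p1 < p2, using an unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = (p1 - p2) / se
    # Lower-tail standard normal probability P(Z <= z).
    p_lower = 0.5 * math.erfc(-z / math.sqrt(2))
    return p1, p2, z, p_lower

# Assumed counts: 42/112 = 0.3750 (female instructors), 88/162 = 0.5432 (male instructors).
p_female, p_male, z, p_value = two_proportion_z(42, 112, 88, 162)
print(f"p_female={p_female:.4f}  p_male={p_male:.4f}  z={z:.2f}  one-sided p={p_value:.5f}")
```

Under these assumptions the statistic comes out near the reported value of −2.79 for Question 4; a pooled standard error would give a slightly different statistic.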

Recall that in the first phase of the research, 16 ordinary least squares regressions were to be conducted, with the 16 dependent variables being the average class responses to each of the 16 questions. The independent variables in the model were the five class demographic variables collected: 1) The percentage of students in the class required to take the class for their major; 2) The percentage of males in the class; 3) The percentage of students expecting to receive either an A or B in the class; 4) The percentage of freshmen and sophomores in the class; and 5) The gender of the instructor. For the majority of the questions, the demographics explained between 15% and 28% of the variation in responses. Four questions were exceptions. Fifty-nine percent of the variation in class average responses to question 11, “I met or exceeded the course objectives given for this course”, was explained by the demographics. The demographics also explained 49% of the variation in class average responses to question 6, “Your understanding of the course content”; 39% of the variation in class average responses to question 5, “The fairness of procedures for grading this course”; and 32% of the variation in class average responses to question 4, “The quality of this course”. For all questions, the percentage of students expecting to receive an A or a B was highly significant in explaining the variation in class average responses. For questions 7, 12, 15, and 16, the percentage of males in the class was significant at alpha equal to 0.05, and for question 8, the percentage of males in the class was significant at 0.10. In all of these cases, an increase of 10 percentage points in the share of males in the class corresponded with an average increase of approximately 0.04 in the instructor rating for that question.
The only other demographic variable found to be significant for any question was the percentage of students taking the class because it was required for their major. This demographic variable was only significant (alpha = 0.05) for question 6. In this case, the class average response decreased with an increase in the proportion of students for whom this course was required for their major. The first variable to enter any of the models, and the most significant variable, was the proportion of students in the class expecting to receive either an A or a B grade. The indicator variable for gender of the instructor was not significant for any of these models with the proportion of students expecting an A or a B already in the model.

If this set of questions were used to evaluate effective teaching, one may consider dropping questions 6 and 11 since about 50% or more of the variation in class average responses is explained by the class demographics. These questions are not really evaluating effective teaching.

We next tested whether there was a difference in the mean class average responses for each of the 16 questions between classes taught by female instructors and classes taught by male instructors. The sample means for male and female instructors for each of the questions are given in

When the classes across all 8 colleges of the university were considered, a significant difference was found in the mean class average responses between male and female instructors for questions 6 and 11 with alpha equal to 0.05 and for question 15 with an alpha value of 0.10 (p-value = 0.0548). In all these cases, the mean response for female instructors was higher. Female instructors were rated higher on the student’s understanding of the course content, the student’s meeting or exceeding course objectives, and the instructor setting and maintaining higher standards. It is interesting to note that when a regression analysis was conducted with class average response to question 11 as the dependent variable, gender of the instructor was not significant when the percentage of students who expected to get an A or a B was in the model. This was also true for questions 6 and 15. Students in female instructor classes were expecting more A’s and B’s than students in male instructor classes.

Female instructors were not rated significantly lower on average on Question 5, “The fairness of grading procedures”, but the sample mean of the class averages for female instructors was lower (p-value = 0.1242). Female instructors were not rated significantly lower on average on Question 10, “I understood how my grades were assigned”, but the sample mean of the class averages for female instructors was slightly lower for this question (p-value = 0.6359). There was also no significant difference in means for Questions 12 and 13, about the availability of the instructor and providing feedback in a timely manner. The sample means for male and female instructors were very close to each other, with the sample means for females being only very slightly higher (p-values = 0.6848 and 0.8016, respectively).
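The pooled and Satterthwaite statistics can be recomputed from summary statistics alone. The sketch below uses the all-colleges figures reported for Question 6 (female: N = 887, mean 4.2371, standard deviation 0.5477; male: N = 1199, mean 4.1667, standard deviation 0.5614). The function name `two_sample_t` is hypothetical, and because the inputs are rounded the results agree with the reported values only to about two decimals.

```python
import math

def two_sample_t(mean1, sd1, n1, mean2, sd2, n2):
    """Pooled and Satterthwaite (Welch) t statistics from summary statistics."""
    v1, v2 = sd1 ** 2, sd2 ** 2
    diff = mean1 - mean2

    # Pooled version: assumes equal variances in the two groups.
    df_pooled = n1 + n2 - 2
    sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / df_pooled
    t_pooled = diff / math.sqrt(sp2 * (1 / n1 + 1 / n2))

    # Satterthwaite version: no equal-variance assumption.
    a, b = v1 / n1, v2 / n2
    t_satt = diff / math.sqrt(a + b)
    df_satt = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))

    return {"pooled": (t_pooled, df_pooled), "satterthwaite": (t_satt, df_satt)}

# Question 6, all colleges: group 1 = female instructors, group 2 = male instructors.
q6 = two_sample_t(4.2371, 0.5477, 887, 4.1667, 0.5614, 1199)
for method, (t, df) in q6.items():
    print(f"{method:14s} t = {t:.2f}  df = {df:.1f}")
```

Turning each t statistic into a p-value requires the t distribution with the corresponding degrees of freedom, which a statistics library would normally supply.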

Since research has shown that female faculty are rated lower in the science field [

Question 1 (all colleges)

Sex | N | Mean | Std Dev | Std Err | Minimum | Maximum |
---|---|---|---|---|---|---|
F | 887 | 4.2703 | 0.6180 | 0.0208 | 1.0000 | 5.0000 |
M | 1199 | 4.2758 | 0.6191 | 0.0179 | 1.0000 | 5.0000 |
Diff (1-2) | | −0.00552 | 0.6186 | 0.0274 | | |

Method | Variances | DF | t Value | Pr > \|t\| |
---|---|---|---|---|
Pooled | Equal | 2084 | −0.20 | 0.8405 |
Satterthwaite | Unequal | 1910.9 | −0.20 | 0.8404 |

Question 2 (all colleges)

Sex | N | Mean | Std Dev | Std Err | Minimum | Maximum |
---|---|---|---|---|---|---|
F | 887 | 4.3301 | 0.6187 | 0.0208 | 1.0000 | 5.0000 |
M | 1199 | 4.3420 | 0.6188 | 0.0179 | 1.0000 | 5.0000 |
Diff (1-2) | | −0.0119 | 0.6188 | 0.0274 | | |

Method | Variances | DF | t Value | Pr > \|t\| |
---|---|---|---|---|
Pooled | Equal | 2084 | −0.44 | 0.6628 |
Satterthwaite | Unequal | 1909.3 | −0.44 | 0.6628 |

Question 3 (all colleges)

Sex | N | Mean | Std Dev | Std Err | Minimum | Maximum |
---|---|---|---|---|---|---|
F | 887 | 4.2936 | 0.6469 | 0.0217 | 1.2500 | 5.0000 |
M | 1199 | 4.2472 | 0.6710 | 0.0194 | 1.0000 | 5.0000 |
Diff (1-2) | | 0.0464 | 0.6609 | 0.0293 | | |

Method | Variances | DF | t Value | Pr > \|t\| |
---|---|---|---|---|
Pooled | Equal | 2084 | 1.59 | 0.1127 |
Satterthwaite | Unequal | 1946 | 1.60 | 0.1108 |

Question 4 (all colleges)

Sex | N | Mean | Std Dev | Std Err | Minimum | Maximum |
---|---|---|---|---|---|---|
F | 887 | 4.1927 | 0.5889 | 0.0198 | 1.6000 | 5.0000 |
M | 1199 | 4.2207 | 0.5878 | 0.0170 | 1.0000 | 5.0000 |
Diff (1-2) | | −0.0280 | 0.5882 | 0.0261 | | |

Method | Variances | DF | t Value | Pr > \|t\| |
---|---|---|---|---|
Pooled | Equal | 2084 | −1.07 | 0.2831 |
Satterthwaite | Unequal | 1907 | −1.07 | 0.2832 |

Question 5 (all colleges)

Sex | N | Mean | Std Dev | Std Err | Minimum | Maximum |
---|---|---|---|---|---|---|
F | 887 | 4.3222 | 0.5981 | 0.0201 | 1.0000 | 5.0000 |
M | 1199 | 4.3617 | 0.5553 | 0.0160 | 1.0000 | 5.0000 |
Diff (1-2) | | −0.0395 | 0.5739 | 0.0254 | | |

Method | Variances | DF | t Value | Pr > \|t\| |
---|---|---|---|---|
Pooled | Equal | 2084 | −1.56 | 0.1200 |
Satterthwaite | Unequal | 1826.9 | −1.54 | 0.1242 |

Question 6 (all colleges)

Sex | N | Mean | Std Dev | Std Err | Minimum | Maximum |
---|---|---|---|---|---|---|
F | 887 | 4.2371 | 0.5477 | 0.0184 | 1.0000 | 5.0000 |
M | 1199 | 4.1667 | 0.5614 | 0.0162 | 1.0000 | 5.0000 |
Diff (1-2) | | 0.0704 | 0.5556 | 0.0246 | | |

Method | Variances | DF | t Value | Pr > \|t\| |
---|---|---|---|---|
Pooled | Equal | 2084 | 2.86 | 0.0043 |
Satterthwaite | Unequal | 1934.4 | 2.87 | 0.0041 |

Question 7 (all colleges)

Sex | N | Mean | Std Dev | Std Err | Minimum | Maximum |
---|---|---|---|---|---|---|
F | 887 | 4.3359 | 0.5629 | 0.0189 | 2.0000 | 5.0000 |
M | 1199 | 4.3250 | 0.5547 | 0.0160 | 1.0000 | 5.0000 |
Diff (1-2) | | 0.0109 | 0.5582 | 0.0247 | | |

Method | Variances | DF | t Value | Pr > \|t\| |
---|---|---|---|---|
Pooled | Equal | 2084 | 0.44 | 0.6586 |
Satterthwaite | Unequal | 1893.5 | 0.44 | 0.6593 |

Question 8 (all colleges)

Sex | N | Mean | Std Dev | Std Err | Minimum | Maximum |
---|---|---|---|---|---|---|
F | 887 | 4.2859 | 0.5621 | 0.0189 | 1.0000 | 5.0000 |
M | 1199 | 4.2671 | 0.5698 | 0.0165 | 1.0000 | 5.0000 |
Diff (1-2) | | 0.0188 | 0.5665 | 0.0251 | | |

Method | Variances | DF | t Value | Pr > \|t\| |
---|---|---|---|---|
Pooled | Equal | 2084 | 0.75 | 0.4529 |
Satterthwaite | Unequal | 1923.1 | 0.75 | 0.4520 |

Question 9 (all colleges)

Sex | N | Mean | Std Dev | Std Err | Minimum | Maximum |
---|---|---|---|---|---|---|
F | 887 | 4.2319 | 0.6026 | 0.0202 | 1.2500 | 5.0000 |
M | 1199 | 4.2166 | 0.6196 | 0.0179 | 1.0000 | 5.0000 |
Diff (1-2) | | 0.0153 | 0.6124 | 0.0271 | | |

Method | Variances | DF | t Value | Pr > \|t\| |
---|---|---|---|---|
Pooled | Equal | 2084 | 0.56 | 0.5739 |
Satterthwaite | Unequal | 1937.3 | 0.56 | 0.5723 |

Question 10 (all colleges)

Sex | N | Mean | Std Dev | Std Err | Minimum | Maximum |
---|---|---|---|---|---|---|
F | 887 | 4.3324 | 0.5664 | 0.0190 | 1.0000 | 5.0000 |
M | 1199 | 4.3441 | 0.5415 | 0.0156 | 1.0000 | 5.0000 |
Diff (1-2) | | −0.0117 | 0.5522 | 0.0245 | | |

Method | Variances | DF | t Value | Pr > \|t\| |
---|---|---|---|---|
Pooled | Equal | 2084 | −0.48 | 0.6336 |
Satterthwaite | Unequal | 1860.1 | −0.47 | 0.6359 |

Question 11 (all colleges)

Sex | N | Mean | Std Dev | Std Err | Minimum | Maximum |
---|---|---|---|---|---|---|
F | 887 | 4.2201 | 0.5119 | 0.0172 | 1.0000 | 5.0000 |
M | 1199 | 4.1705 | 0.5399 | 0.0156 | 1.0000 | 5.0000 |
Diff (1-2) | | 0.0496 | 0.5281 | 0.0234 | | |

Method | Variances | DF | t Value | Pr > \|t\| |
---|---|---|---|---|
Pooled | Equal | 2084 | 2.12 | 0.0340 |
Satterthwaite | Unequal | 1961.7 | 2.14 | 0.0327 |

Question 12 (all colleges)

Sex | N | Mean | Std Dev | Std Err | Minimum | Maximum |
---|---|---|---|---|---|---|
F | 887 | 4.3247 | 0.5343 | 0.0179 | 1.7500 | 5.0000 |
M | 1199 | 4.3149 | 0.5582 | 0.0161 | 1.0000 | 5.0000 |
Diff (1-2) | | 0.00979 | 0.5482 | 0.0243 | | |

Method | Variances | DF | t Value | Pr > \|t\| |
---|---|---|---|---|
Pooled | Equal | 2084 | 0.40 | 0.6867 |
Satterthwaite | Unequal | 1952.8 | 0.41 | 0.6848 |

Question 13 (all colleges)

Sex | N | Mean | Std Dev | Std Err | Minimum | Maximum |
---|---|---|---|---|---|---|
F | 887 | 4.2960 | 0.5839 | 0.0196 | 1.2500 | 5.0000 |
M | 1199 | 4.2896 | 0.5715 | 0.0165 | 1.0000 | 5.0000 |
Diff (1-2) | | 0.00644 | 0.5768 | 0.0255 | | |

Method | Variances | DF | t Value | Pr > \|t\| |
---|---|---|---|---|
Pooled | Equal | 2084 | 0.25 | 0.8009 |
Satterthwaite | Unequal | 1886.2 | 0.25 | 0.8016 |

Question 14 (all colleges)

Sex | N | Mean | Std Dev | Std Err | Minimum | Maximum |
---|---|---|---|---|---|---|
F | 887 | 4.2817 | 0.5706 | 0.0192 | 1.6000 | 5.0000 |
M | 1199 | 4.2610 | 0.5718 | 0.0165 | 1.0000 | 5.0000 |
Diff (1-2) | | 0.0207 | 0.5713 | 0.0253 | | |

Method | Variances | DF | t Value | Pr > \|t\| |
---|---|---|---|---|
Pooled | Equal | 2084 | 0.82 | 0.4132 |
Satterthwaite | Unequal | 1911.4 | 0.82 | 0.4131 |

Question 15 (all colleges)

Sex | N | Mean | Std Dev | Std Err | Minimum | Maximum |
---|---|---|---|---|---|---|
F | 887 | 4.3334 | 0.4834 | 0.0162 | 2.0000 | 5.0000 |
M | 1199 | 4.2909 | 0.5212 | 0.0151 | 1.0000 | 5.0000 |
Diff (1-2) | | 0.0425 | 0.5054 | 0.0224 | | |

Method | Variances | DF | t Value | Pr > \|t\| |
---|---|---|---|---|
Pooled | Equal | 2084 | 1.90 | 0.0576 |
Satterthwaite | Unequal | 1981.4 | 1.92 | 0.0548 |

Question 16 (all colleges)

Sex | N | Mean | Std Dev | Std Err | Minimum | Maximum |
---|---|---|---|---|---|---|
F | 887 | 4.2866 | 0.5496 | 0.0185 | 1.0000 | 5.0000 |
M | 1199 | 4.2998 | 0.5350 | 0.0155 | 1.0000 | 5.0000 |
Diff (1-2) | | −0.0133 | 0.5413 | 0.0240 | | |

Method | Variances | DF | t Value | Pr > \|t\| |
---|---|---|---|---|
Pooled | Equal | 2084 | −0.55 | 0.5804 |
Satterthwaite | Unequal | 1880.2 | −0.55 | 0.5819 |

Question 1 (College of Science and Mathematics)

Sex | N | Mean | Std Dev | Std Err | Minimum | Maximum |
---|---|---|---|---|---|---|
F | 177 | 4.2393 | 0.5589 | 0.0420 | 2.3717 | 5.0000 |
M | 359 | 4.1506 | 0.6387 | 0.0337 | 1.0000 | 5.0000 |
Diff (1-2) | | 0.0887 | 0.6135 | 0.0563 | | |

Method | Variances | DF | t Value | Pr > \|t\| |
---|---|---|---|---|
Pooled | Equal | 534 | 1.57 | 0.1161 |
Satterthwaite | Unequal | 395.07 | 1.65 | 0.1004 |

Question 2 (College of Science and Mathematics)

Sex | N | Mean | Std Dev | Std Err | Minimum | Maximum |
---|---|---|---|---|---|---|
F | 177 | 4.2855 | 0.5664 | 0.0426 | 2.4336 | 5.0000 |
M | 359 | 4.1888 | 0.6756 | 0.0357 | 1.0000 | 5.0000 |
Diff (1-2) | | 0.0967 | 0.6417 | 0.0589 | | |

Method | Variances | DF | t Value | Pr > \|t\| |
---|---|---|---|---|
Pooled | Equal | 534 | 1.64 | 0.1014 |
Satterthwaite | Unequal | 410.29 | 1.74 | 0.0824 |

Question 3 (College of Science and Mathematics)

Sex | N | Mean | Std Dev | Std Err | Minimum | Maximum |
---|---|---|---|---|---|---|
F | 177 | 4.2515 | 0.6058 | 0.0455 | 2.1947 | 5.0000 |
M | 359 | 4.1098 | 0.7088 | 0.0374 | 1.0000 | 5.0000 |
Diff (1-2) | | 0.1417 | 0.6765 | 0.0621 | | |

Method | Variances | DF | t Value | Pr > \|t\| |
---|---|---|---|---|
Pooled | Equal | 534 | 2.28 | 0.0230 |
Satterthwaite | Unequal | 403.41 | 2.40 | 0.0166 |

Question 4 (College of Science and Mathematics)

Sex | N | Mean | Std Dev | Std Err | Minimum | Maximum |
---|---|---|---|---|---|---|
F | 177 | 4.0381 | 0.5628 | 0.0423 | 2.3333 | 5.0000 |
M | 359 | 4.0758 | 0.5859 | 0.0309 | 1.0000 | 5.0000 |
Diff (1-2) | | −0.0377 | 0.5784 | 0.0531 | | |

Method | Variances | DF | t Value | Pr > \|t\| |
---|---|---|---|---|
Pooled | Equal | 534 | −0.71 | 0.4783 |
Satterthwaite | Unequal | 363.33 | −0.72 | 0.4725 |

Question 5 (College of Science and Mathematics)

Sex | N | Mean | Std Dev | Std Err | Minimum | Maximum |
---|---|---|---|---|---|---|
F | 177 | 4.1894 | 0.6386 | 0.0480 | 2.0000 | 5.0000 |
M | 359 | 4.3100 | 0.5427 | 0.0286 | 2.0000 | 5.0000 |
Diff (1-2) | | −0.1206 | 0.5761 | 0.0529 | | |

Method | Variances | DF | t Value | Pr > \|t\| |
---|---|---|---|---|
Pooled | Equal | 534 | −2.28 | 0.0230 |
Satterthwaite | Unequal | 304.65 | −2.16 | 0.0317 |

Question 6 (College of Science and Mathematics)

Sex | N | Mean | Std Dev | Std Err | Minimum | Maximum |
---|---|---|---|---|---|---|
F | 177 | 4.0063 | 0.5612 | 0.0422 | 2.0000 | 5.0000 |
M | 359 | 3.9361 | 0.5868 | 0.0310 | 1.0000 | 5.0000 |
Diff (1-2) | | 0.0702 | 0.5785 | 0.0531 | | |

Method | Variances | DF | t Value | Pr > \|t\| |
---|---|---|---|---|
Pooled | Equal | 534 | 1.32 | 0.1869 |
Satterthwaite | Unequal | 364.76 | 1.34 | 0.1805 |

Question 7 (College of Science and Mathematics)

Sex | N | Mean | Std Dev | Std Err | Minimum | Maximum |
---|---|---|---|---|---|---|
F | 177 | 4.3116 | 0.4841 | 0.0364 | 2.3333 | 5.0000 |
M | 359 | 4.2158 | 0.6059 | 0.0320 | 1.0000 | 5.0000 |
Diff (1-2) | | 0.0958 | 0.5687 | 0.0522 | | |

Method | Variances | DF | t Value | Pr > \|t\| |
---|---|---|---|---|
Pooled | Equal | 534 | 1.83 | 0.0672 |
Satterthwaite | Unequal | 427.52 | 1.98 | 0.0486 |

Question 8 (College of Science and Mathematics)

Sex | N | Mean | Std Dev | Std Err | Minimum | Maximum |
---|---|---|---|---|---|---|
F | 177 | 4.2013 | 0.4823 | 0.0362 | 3.0000 | 5.0000 |
M | 359 | 4.1366 | 0.6250 | 0.0330 | 1.0000 | 5.0000 |
Diff (1-2) | | 0.0647 | 0.5819 | 0.0534 | | |

Method | Variances | DF | t Value | Pr > \|t\| |
---|---|---|---|---|
Pooled | Equal | 534 | 1.21 | 0.2265 |
Satterthwaite | Unequal | 439.9 | 1.32 | 0.1874 |

Question 9 (College of Science and Mathematics)

Sex | N | Mean | Std Dev | Std Err | Minimum | Maximum |
---|---|---|---|---|---|---|
F | 177 | 4.1465 | 0.5244 | 0.0394 | 2.6991 | 5.0000 |
M | 359 | 4.0847 | 0.6461 | 0.0341 | 1.0000 | 5.0000 |
Diff (1-2) | | 0.0618 | 0.6086 | 0.0559 | | |

Method | Variances | DF | t Value | Pr > \|t\| |
---|---|---|---|---|
Pooled | Equal | 534 | 1.11 | 0.2694 |
Satterthwaite | Unequal | 421.84 | 1.19 | 0.2363 |

Question 10 (College of Science and Mathematics)

Sex | N | Mean | Std Dev | Std Err | Minimum | Maximum |
---|---|---|---|---|---|---|
F | 177 | 4.2926 | 0.4657 | 0.0350 | 2.6667 | 5.0000 |
M | 359 | 4.3426 | 0.5114 | 0.0270 | 2.0000 | 5.0000 |
Diff (1-2) | | −0.0500 | 0.4968 | 0.0456 | | |

Method | Variances | DF | t Value | Pr > \|t\| |
---|---|---|---|---|
Pooled | Equal | 534 | −1.10 | 0.2736 |
Satterthwaite | Unequal | 381.25 | −1.13 | 0.2587 |

Question 11 (College of Science and Mathematics)

Sex | N | Mean | Std Dev | Std Err | Minimum | Maximum |
---|---|---|---|---|---|---|
F | 177 | 4.0409 | 0.5240 | 0.0394 | 2.6250 | 5.0000 |
M | 359 | 4.0269 | 0.6053 | 0.0319 | 1.0000 | 5.0000 |
Diff (1-2) | | 0.0141 | 0.5798 | 0.0533 | | |

Method | Variances | DF | t Value | Pr > \|t\| |
---|---|---|---|---|
Pooled | Equal | 534 | 0.26 | 0.7918 |
Satterthwaite | Unequal | 398.9 | 0.28 | 0.7816 |

Question 12 (College of Science and Mathematics)

Sex | N | Mean | Std Dev | Std Err | Minimum | Maximum |
---|---|---|---|---|---|---|
F | 177 | 4.2335 | 0.4775 | 0.0359 | 3.0000 | 5.0000 |
M | 359 | 4.2077 | 0.5880 | 0.0310 | 1.0000 | 5.0000 |
Diff (1-2) | | 0.0259 | 0.5540 | 0.0509 | | |

Method | Variances | DF | t Value | Pr > \|t\| |
---|---|---|---|---|
Pooled | Equal | 534 | 0.51 | 0.6115 |
Satterthwaite | Unequal | 421.68 | 0.55 | 0.5860 |

Question 13 (College of Science and Mathematics)

Sex | N | Mean | Std Dev | Std Err | Minimum | Maximum |
---|---|---|---|---|---|---|
F | 177 | 4.2718 | 0.4791 | 0.0360 | 2.0000 | 5.0000 |
M | 359 | 4.2398 | 0.5349 | 0.0282 | 1.0000 | 5.0000 |
Diff (1-2) | | 0.0320 | 0.5171 | 0.0475 | | |

Method | Variances | DF | t Value | Pr > \|t\| |
---|---|---|---|---|
Pooled | Equal | 534 | 0.67 | 0.5008 |
Satterthwaite | Unequal | 386.95 | 0.70 | 0.4847 |

Question 14 (College of Science and Mathematics)

Sex | N | Mean | Std Dev | Std Err | Minimum | Maximum |
---|---|---|---|---|---|---|
F | 177 | 4.2060 | 0.4996 | 0.0376 | 2.9554 | 5.0000 |
M | 359 | 4.1444 | 0.5837 | 0.0308 | 1.0000 | 5.0000 |
Diff (1-2) | | 0.0615 | 0.5574 | 0.0512 | | |

Method | Variances | DF | t Value | Pr > \|t\| |
---|---|---|---|---|
Pooled | Equal | 534 | 1.20 | 0.2298 |
Satterthwaite | Unequal | 402.88 | 1.27 | 0.2059 |

Question 15 (College of Science and Mathematics)

Sex | N | Mean | Std Dev | Std Err | Minimum | Maximum |
---|---|---|---|---|---|---|
F | 177 | 4.1940 | 0.4350 | 0.0327 | 3.0000 | 5.0000 |
M | 359 | 4.1558 | 0.5303 | 0.0280 | 2.0000 | 5.0000 |
Diff (1-2) | | 0.0382 | 0.5009 | 0.0460 | | |

Method | Variances | DF | t Value | Pr > \|t\| |
---|---|---|---|---|
Pooled | Equal | 534 | 0.83 | 0.4071 |
Satterthwaite | Unequal | 418.01 | 0.89 | 0.3757 |

Question 16 (College of Science and Mathematics)

Sex | N | Mean | Std Dev | Std Err | Minimum | Maximum |
---|---|---|---|---|---|---|
F | 177 | 4.2277 | 0.4545 | 0.0342 | 2.6667 | 5.0000 |
M | 359 | 4.1930 | 0.5268 | 0.0278 | 2.0000 | 5.0000 |
Diff (1-2) | | 0.0347 | 0.5041 | 0.0463 | | |

Method | Variances | DF | t Value | Pr > \|t\| |
---|---|---|---|---|
Pooled | Equal | 534 | 0.75 | 0.4545 |
Satterthwaite | Unequal | 400.06 | 0.79 | 0.4319 |

has shown that students expect female faculty to be less harsh in grading, with students giving female faculty lower evaluations on grading if they are harsh [

We also considered the differences in variation of class average responses for each question between male and female instructors over the entire University (All Colleges). A significant difference in variability was found between class average responses for classes taught by male and female instructors for questions 5 and 15 at alpha equal to 0.05. A marginally significant difference was found between the class average responses for question 11 (p-value = 0.0906). The class average responses for question 5 had a larger variability for female faculty. This question was on rating the fairness of grading procedures. It is noted again that research has shown students expect female faculty to grade less harshly [
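The source does not name the test used to compare variances, so the following is only a rough sketch under stated assumptions. It forms a folded variance ratio (larger sample variance over smaller) from the reported all-colleges standard deviations and attaches an approximate two-sided p-value from a large-sample normal approximation to the log variance ratio. The function name `variance_ratio_check` is hypothetical, and an exact F-distribution p-value would differ slightly.

```python
import math

def variance_ratio_check(sd1, n1, sd2, n2):
    """Folded variance ratio with an approximate two-sided p-value.

    Assumption: for roughly normal data and large samples, ln(s1^2 / s2^2)
    is approximately normal with variance 2/(n1 - 1) + 2/(n2 - 1).
    """
    v1, v2 = sd1 ** 2, sd2 ** 2
    if v1 >= v2:
        f_ratio, df_num, df_den = v1 / v2, n1 - 1, n2 - 1
    else:
        f_ratio, df_num, df_den = v2 / v1, n2 - 1, n1 - 1
    z = math.log(f_ratio) / math.sqrt(2 / df_num + 2 / df_den)
    p_approx = math.erfc(z / math.sqrt(2))   # two-sided normal tail area
    return f_ratio, df_num, df_den, p_approx

# All-colleges standard deviations: (female sd, n), (male sd, n).
q5 = variance_ratio_check(0.5981, 887, 0.5553, 1199)    # fairness of grading
q11 = variance_ratio_check(0.5119, 887, 0.5399, 1199)   # met course objectives
for name, (f, d1, d2, p) in (("Q5", q5), ("Q11", q11)):
    print(f"{name}: ratio = {f:.3f} on ({d1}, {d2}) df, approx p = {p:.4f}")
```

Under this approximation, question 5 falls below 0.05 and question 11 falls between 0.05 and 0.10, in line with the significance levels reported above.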

In the last phase of our research, we considered only classes in which at least 5 male students and 5 female students responded. Other research has found that male students tend to rate male instructors higher and female students tend to rate female instructors higher [

Question | Proportion 1 (female instructors) | Proportion 2 (male instructors) | Test Statistic | P-value |
---|---|---|---|---|
1 | 0.4375 | 0.5432 | −1.73 | 0.04182^{*} |
2 | 0.4821 | 0.5432 | −1.25 | 0.10565 |
3 | 0.5089 | 0.5926 | −1.37 | 0.08534 |
4 | 0.3750 | 0.5432 | −2.79 | 0.00264^{**} |
5 | 0.4286 | 0.5617 | −2.19 | 0.01426^{**} |
6 | 0.4821 | 0.6049 | −2.02 | 0.02169^{**} |
7 | 0.4123 | 0.5185 | −1.75 | 0.04006^{*} |
8 | 0.3947 | 0.5000 | −1.74 | 0.04093^{*} |
9 | 0.4474 | 0.5802 | −2.19 | 0.01426^{**} |
10 | 0.4561 | 0.5370 | −1.27 | 0.10204 |
11 | 0.4298 | 0.5864 | −2.59 | 0.00480^{**} |
12 | 0.4123 | 0.5000 | −1.45 | 0.07353 |
13 | 0.4649 | 0.4877 | −0.37 | 0.05715 |
14 | 0.4825 | 0.5864 | −1.71 | 0.04363^{*} |
15 | 0.4561 | 0.5123 | −0.87 | 0.19215 |
16 | 0.4298 | 0.5370 | −1.77 | 0.03836^{*} |

than the sample proportion for male instructors. There were 10 questions in which the male student response was significantly lower for female instructors at alpha equal to 0.05: questions 1, 4, 5, 6, 7, 8, 9, 11, 14, and 16. There were 2 additional questions, questions 3 and 12, in which the male response was significantly lower for female instructors at alpha equal to 0.10. The proportions were not significantly different at alpha equal to 0.10 for question 10, although the p-value was 0.102. This is similar to the research of Feldman [

Our research did not find the gender indicator variable to be significant when the proportion of students expecting A’s and B’s was in the model (All Colleges). We did find significant differences for questions 6, 11, and 15, with student responses associated with female faculty being higher (understanding of course content, met or exceeded course objectives, instructor set and maintained high standards); this is when the proportion of students expecting A’s and B’s is not controlled for in the model. It does appear that a higher proportion of students taking courses from female faculty are expecting A’s or B’s. We also compared the variances of class average responses for male and female faculty across all colleges. The variances were significantly different or marginally significantly different for 3 of the 16 questions. The variance of class average responses was significantly higher for female faculty on question 5, the “fairness of procedures for grading”. In the other two questions, the variance of class averages for male faculty was higher.

When considering only the College of Science and Mathematics, we did not find that female faculty were rated lower on average. We found female faculty to be associated with significantly higher ratings in creating an atmosphere conducive to learning (other research has found this), and in communication ability of the instructor as a teacher (marginally). We did find that female faculty were rated significantly lower on the fairness of grading procedures. We also found that the variances of the class average responses between male and female faculty were significantly or marginally significantly different on 13 of the 16 questions. In all of these cases, except for question 5 asking about the fairness of grading procedures, the variances were higher for male faculty class averages.

Lucas Huebner, Rhonda C. Magel (2015) A Gendered Study of Student Ratings of Instruction. Open Journal of Statistics, 05, 552-567. doi: 10.4236/ojs.2015.56058