The Art of Estimating a Moving Parameter and Reducing Bias Introduced by Inflated Measurements in Student Assessments

In this paper, we discuss the art of estimating the greatest level of understanding attained by a student based on five assessment types, ranked by their correlation with the preset maximum levels of understanding. The results show that a weighting system yields a point estimate that correlates more strongly with the preset levels of understanding than a simple point system does.


Introduction
Assigning letter grades based on a measured level of understanding or points successfully attempted is a task that virtually every college professor and teacher in the educational system must address each semester. It is important to test, assess and, if necessary, change the instruments used as both teaching and grading tools. Grades are subjective to the instructor and need a well-defined rubric with internal structures that can easily be weighted to remove bias introduced by students' interaction with each other and the exchange of information; by familiarity with chapter review assignments rather than understanding of the material covered; by limitations on time; and by stress. Moreover, student understanding increases over time; therefore, information gathered at the beginning of a course becomes less relevant by the end of that course.
There are many types of grading schemes [1]; pass/fail, completion, and percentages are a few. Percentage grading is the method most commonly used in high schools. However, how are these percentage scores determined? Percentage scores can be evaluated using pooled proportions, mean proportions or weighted mean proportions. Grades let students know how they are doing and what teachers expect of them [2]. Some instructors use point structures while others use weighted grades based on various categorizations of assignments. Weighting allows the contribution of each type of assessment to the overall grade to be adjusted based on the relevance and validity of the measure [3]. Having varied types of assessments makes grading more accurate and effort based, which can make for a more supportive learning environment [4]. Thus, the question becomes "how will these assessments be graded?" On a point scale or a percentage scale? A point scale is comparable to a pooled proportion, where points earned are considered collectively out of a total, whereas a weighted grade allows for a type of pulley system that is easier to modify during the process of evaluation. Dr. Mary Clement [5] cites helping students who do not understand how to determine their grades based on percentages as the reason for using a point system; however, point structures require greater detail in the initial structure of a course and do not allow statistical adjustments to be made to more accurately evaluate a student's level of understanding throughout the process. To address this question, the point structure and weighting scheme for the introductory statistics course developed by Dr. Wooten are used to simulate 100 sets of grades to compare various grading systems.

Structure of Course
The instruments used by Dr. Wooten, who has taught introductory statistics for over eight years, to measure students' understanding of Introduction to Statistics (STA 2023 at the University of South Florida, Tampa), a course coordinating 800+ students a semester, include homework, chapter reviews, projects, midterm exams and a final exam.
In this breakdown of statistical topics, there are eleven homework assignments breaking eight chapters into related topics, submitted online once before the set deadline; eight chapter reviews, submitted online multiple times before the set deadline (and reopened the week before finals as a form of review); three projects; three midterm exams covering the three primary topics in Introductory Statistics, namely, descriptive statistics, probability and inferential statistics; and a cumulative final exam. A total of 26 surveys of students' understanding of the statistical concepts covered are taken.
The process of gathering this information covers a 16-week time span in a course that holds lectures twice a week for 75 minutes in a mass lecture hall with 180 students and help sessions one day a week for thirty minutes in groups of 30.
The issues addressed included but are not limited to:
• Homework: Students collaborate on homework, which can inflate the measured grades on this assessment and reduce its reliability in correctly assessing the percent understanding of the outlined materials, introducing significant bias in the measured grades. However, homework is a form of gathering and organizing information; it teaches students these skills in addition to exposing them to the topics covered in each chapter.
• Chapter Reviews: With repeated tries, students are sometimes able to deduce the correct response without full comprehension. As this is practice and the goal is for students to become more familiar with the concepts presented in mass lecture, this is a minor issue and can be self-correcting, in that students who manipulate this assignment to get a better score do not do as well on the exams.
• Projects: Students can be burnt out by the end of the semester and be overwhelmed by the last project in conjunction with other courses. To address this, at the appropriate time in the semester, students are informed that projects will be graded as the best two out of three.
• Exams: Exams are limited in two ways: (1) they are time restricted and (2) they require that concepts best calculated using statistical software on a computer be assessed using a handheld calculator. An additional issue arises when the grading is performed by multiple graders, as consistency must be enforced. This was done by giving graders a detailed point structure of the solution key and viewing random samples from each grader to ensure they adhered to it.
The benefits of each type of assessment include:
• Homework: Helps keep students on pace with the course, and students have the opportunity for question and answer sessions during help sessions and tutoring sessions offered by the library.
• Chapter Reviews: These assignments are used as a teaching tool as well as a grading tool. Viewing the Summary Statistics for each review shows the areas that need to be addressed again or in more detail. In a class this large with limited direct interaction with students, it is important to spend time on the material that students need clarification on and not waste time on topics the majority of the students already understand.
• Projects: Allow students time to apply the statistical concepts covered on their own time and encourage students to use statistics in their area of interest [6].
• Exams: Exams are less susceptible to outside influence and are therefore the most accurate measurement of a student's level of understanding.

Point Assignments and Associated Inflation Factors
The ultimate goal is to have a single point estimate that best estimates the students' overall understanding of the course, all the while monitoring the students' progress. To this end, point structures are assigned to each of these 26 assessments. In general, homework assignments and chapter reviews range from 15 to 50 points each; projects are 100 points each; and exams are 200 points each.
American Journal of Computational Mathematics
In this initial part of the study, 100 students' scores will be simulated assuming that there are associated inflation factors for each assessment. For homework, the associated inflation factor will range between 0.05 and 0.55, Equation (1); that is, at the extreme, which is more likely for later assignments, students working together can artificially show scores on homework that close 50%-55% of the gap between their true level of understanding and a perfect score. For example, a student who understands 50% of the material may score a 75%, and a student who understands 90% of the material may score up to a 95%.
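The two worked figures above (50% measured as 75%, and 90% measured as 95%) are both consistent with an inflation model in which a fraction α of the gap to a perfect score is closed. The following is a minimal sketch under that assumption; the function name `inflate` and its argument names are illustrative and do not come from the paper.

```python
def inflate(true_p: float, alpha: float) -> float:
    """Measured proportion when a fraction `alpha` of the gap to 100% is closed.

    Assumed model (inferred from the worked examples in the text):
        measured = true_p + alpha * (1 - true_p)
    """
    if not (0.0 <= true_p <= 1.0):
        raise ValueError("true_p must be a proportion in [0, 1]")
    if not (0.0 <= alpha <= 1.0):
        raise ValueError("alpha must be in [0, 1]")
    return true_p + alpha * (1.0 - true_p)


if __name__ == "__main__":
    # Reproduce the two examples given in the text with alpha = 0.5.
    for p in (0.50, 0.90):
        print(f"true {p:.2f} -> measured {inflate(p, 0.5):.2f}")
```

Under this form an inflated score can never exceed 100%, which an additive or purely multiplicative inflation would not guarantee.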
Equation (1) The estimated proportion as related to the true proportion and the inflation factor α.
\[ \hat{p} = p + \alpha\,(1 - p) \]
Equation (2) The true proportion as related to the previous proportion and marginal change in the level of understanding over time, where

Simulation
In this study, the inflation factors for homework are set to be 0.05, 0.10, 0.15, 0.2, …, 0.55 for the eleven homework assignments; 0.01, 0.02, …, 0.08 for the eight chapter reviews; 0.2, 0.05, 0.05 for the three projects; and 0.01 for the exams. These inflation factors were based on over 20 years of observations. That is, the inflation factor for the first homework is 5%; as the semester progresses, the material becomes more challenging and students get to know each other and form study groups, so this factor increases steadily to a maximum of 55% inflation.
However, this effect is not as strong in the chapter reviews, the last two projects or the exams. The chapter reviews can be attempted repeatedly for a higher score, and students are more willing to attempt these types of survey questions. As for the projects, the first is a dictionary assignment in which similarities are expected, but the second project uses data provided by the individual student on a topic in their area of interest, and the third project is the M & M Experiment, which requires students to count their own bag of M & M's and use the unique data set in their analysis.
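As an illustration, the inflation factors listed above can be written out as a small lookup table. This is only a sketch: the dictionary name and keys are hypothetical, and treating "the exams" as the three midterms plus the final (four entries) is an assumption chosen so that the total matches the 26 assessments.

```python
# Inflation factor per assessment, in the order the assessments are described.
INFLATION = {
    # 0.05, 0.10, ..., 0.55 for the eleven homework assignments
    "homework": [round(0.05 * k, 2) for k in range(1, 12)],
    # 0.01, 0.02, ..., 0.08 for the eight chapter reviews
    "chapter_review": [round(0.01 * k, 2) for k in range(1, 9)],
    # the first project is a dictionary assignment, hence the larger factor
    "project": [0.20, 0.05, 0.05],
    # assumption: three midterms plus the final, each with factor 0.01
    "exam": [0.01] * 4,
}

if __name__ == "__main__":
    total = sum(len(factors) for factors in INFLATION.values())
    print("number of assessments:", total)
    print("homework factors:", INFLATION["homework"])
```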
Moreover, the marginal change in the level of understanding is assumed to be a linear progression starting at the initial level of understanding, $\omega_0$, at time zero and ending with the maximum level of understanding, $\omega_{\max}$, after forty-five days of lectures and help sessions, Equation (3).
Equation (3) Level of understanding as a function of time.
\[ \omega(t) = \omega_0 + (\omega_{\max} - \omega_0)\,\frac{t}{45}, \qquad 0 \le t \le 45 \]
The information gathered to estimate the student's level of understanding is outlined in Table 1.
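The linear progression described above, from the initial level at day zero to the maximum level after forty-five days, can be sketched as follows. The function and parameter names are illustrative; only the 45-day horizon and the linearity come from the text.

```python
def understanding(t: float, omega_0: float, omega_max: float,
                  horizon: float = 45.0) -> float:
    """Level of understanding on day `t`, rising linearly from
    `omega_0` at t = 0 to `omega_max` at t = `horizon`."""
    if not 0.0 <= t <= horizon:
        raise ValueError("t must lie between 0 and the horizon")
    return omega_0 + (omega_max - omega_0) * t / horizon


if __name__ == "__main__":
    # A hypothetical student who starts at 60% and peaks at 90%.
    for day in (0, 15, 30, 45):
        print(day, round(understanding(day, 0.60, 0.90), 3))
```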
Let $x_i$ be the number of points earned on assignment $i$ and $n_i$ be the total number of points available on that assignment. Then the point estimate for each individual assignment is the relative frequency, Equation (4).
Equation (4) The estimated level of understanding on each of the individual assignments.
\[ \hat{p}_i = \frac{x_i}{n_i} \]
This information will initially be considered in one of two ways: an overall pooled proportion, Equation (5); and a weighted mean proportion, Equation (6).
Equation (5) Pooled proportion.
\[ \hat{p} = \frac{\sum_i x_i}{\sum_i n_i} \]
Let $q_i$ be the mean proportion by assessment type; that is, given the set of proportions, partition them into like assessments and then take the average within each group.
Equation (6) Weighted mean proportion, where $w_i$ is the weight assigned to assessment type $i$ and the weights sum to one.
\[ \hat{\omega} = \sum_i w_i\, q_i \]
A simulation of 100 students with initial understanding between 50% and 70% and maximum level of understanding between the initial reading and 95% showed that the weighted system gives a more accurate estimate of the students' level of understanding, both with and without the inflation factor, Figure 1 and Figure 2.
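To make the contrast between the pooled proportion and the weighted mean proportion concrete, the sketch below computes both for one hypothetical student. The weights 0.05, 0.10, 0.15, 0.45 and 0.25 are read off equation residue later in the text; their assignment to homework, chapter reviews, projects, midterms and final in that order is an assumption, as are all names and the sample scores.

```python
from statistics import mean

# Assumed weights per assessment type (order of assignment is a guess).
WEIGHTS = {
    "homework": 0.05,
    "chapter_review": 0.10,
    "project": 0.15,
    "midterm": 0.45,
    "final": 0.25,
}


def pooled_proportion(scores):
    """Total points earned over total points available."""
    earned = sum(x for items in scores.values() for x, _ in items)
    available = sum(n for items in scores.values() for _, n in items)
    return earned / available


def category_means(scores):
    """Mean of the per-assignment proportions within each assessment type."""
    return {kind: mean(x / n for x, n in items) for kind, items in scores.items()}


def weighted_mean_proportion(scores, weights=WEIGHTS):
    """Weighted average of the category means, renormalised over the
    assessment types actually present."""
    q = category_means(scores)
    total_weight = sum(weights[kind] for kind in q)
    return sum(weights[kind] * q[kind] for kind in q) / total_weight


if __name__ == "__main__":
    # Hypothetical (earned, available) points for one student.
    student = {
        "homework": [(45, 50)],
        "chapter_review": [(40, 50)],
        "project": [(80, 100)],
        "midterm": [(120, 200)],
        "final": [(140, 200)],
    }
    print("pooled  :", round(pooled_proportion(student), 4))
    print("weighted:", round(weighted_mean_proportion(student), 4))
```

In this example the high homework score moves the pooled proportion more than the weighted one, because the weighted scheme gives homework only 5% of the grade.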

Comparison of Weighting Schemes
With and without the inflation factor, there is either an over-estimate or under-estimate of the student's level of understanding. Consider the following additional weighting schemes (for a total of nine point estimates) for 1000 students.
Relabel the point estimates outlined in Equation (3) by assessment type: $h$ for the homework, $c$ for the chapter review, $p$ for the projects, $e$ for the midterm exam and $f$ for the final; the mean proportions for each assessment type are given in Equation (7), and the outlined weighted mean proportion is given in Equation (8).
Equation (7) Mean proportions for each assessment type.
Hence, consider the following seventeen point estimates:
1) Pooled Proportion, as given in Equation (4).
2) Mean Proportion, averaging across the assessment types.
Equation (9) Mean proportion, average of mean proportions by category.
Equation (20) Modified mid-term exam proportion using linear weights.
16) Weighted Mean Proportion with four adjustments using linear weights; dropping the lowest homework (percentage wise), dropping the three lowest chapter review assignments (percentage wise), dropping the lowest project grade and using linear weights to measure improvement over time in the midterm exam assessments.
Equation (29) Weighted mean proportion with adjustments to the homework, chapter review, projects and mid-term exam modified using linear weights.
17) Weighted Mean Proportion with four adjustments using trapezoidal weights; dropping the lowest homework (percentage wise), dropping the three lowest chapter review assignments (percentage wise), dropping the lowest project grade and using trapezoidal weights to measure improvement over time in the midterm exam assessments.
Equation (30) Weighted mean proportion with adjustments to the homework, chapter review, projects and mid-term exam modified using trapezoidal weights.

All outlined point estimates show high correlation to the set level of understanding, Figure 3, and show high correlation to the maximum level of understanding, illustrated in Figure 4. For inflated data, the point estimate most highly correlated with the set level of understanding uses the weighting scheme where the midterm exams (percentage wise) are the best two out of three, projects are the best two out of three, chapter reviews are the best five out of eight, and homework is taken best ten out of eleven, Table 2. For non-inflated data, the point estimate most highly correlated with the set level of understanding uses the weighting scheme where the final replaces the lowest midterm exam (percentage wise), projects are the best two out of three, chapter reviews are the best five out of eight, and homework is taken best ten out of eleven, Table 2.
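The adjustments named in these schemes (best ten of eleven homework, best five of eight chapter reviews, best two of three projects or midterms, and the final replacing the lowest midterm when that helps) are all variations on "average the top k proportions". A minimal sketch follows; the helper names are hypothetical and the sample scores are made up for illustration.

```python
from statistics import mean


def mean_of_top(proportions, keep):
    """Average of the `keep` highest proportions (drops the lowest ones)."""
    if not 1 <= keep <= len(proportions):
        raise ValueError("keep must be between 1 and the number of scores")
    return mean(sorted(proportions, reverse=True)[:keep])


def midterms_with_final_replacement(midterms, final):
    """Midterm mean where the final replaces the lowest midterm,
    but only if the final is higher (that is, only if it helps)."""
    lowest = min(midterms)
    if final <= lowest:
        return mean(midterms)
    kept = sorted(midterms)[1:]  # every midterm except the lowest
    return mean(kept + [final])


if __name__ == "__main__":
    midterms = [0.60, 0.80, 0.70]
    print("best two of three midterms:", round(mean_of_top(midterms, 2), 4))
    print("final (0.90) replaces lowest:",
          round(midterms_with_final_replacement(midterms, 0.90), 4))
```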

Addressing Time Bias and Challenge Assignments
Other issues that often need to be addressed include the bias introduced when an exam requires more time than allotted and students are unable to finish, which makes the estimated level of understanding lower than the students' actual level of understanding. This requires an adjustment, or what might be referred to as a curve, to be implemented. Time bias is usually indicated by a class average less than 65% and a maximum less than 100%. To determine if time bias exists, during each exam, enumerate or place a time stamp at the top of exams as they are submitted. When the lowest grade is one of the first exams submitted, time most likely was not the issue. Otherwise, if all the low scores occurred when time was called, then time may be an issue.
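The two checks described above can be sketched as simple predicates. The 65% and 100% cut-offs come from the text; the function names, the definition of a "low" score and the fraction of submissions counted as "late" are assumptions made only for illustration.

```python
from statistics import mean


def scores_suggest_time_bias(scores, mean_cutoff=65.0, max_cutoff=100.0):
    """True when the class average is below `mean_cutoff` and no one
    reached `max_cutoff` (scores on a 0-100 scale)."""
    return mean(scores) < mean_cutoff and max(scores) < max_cutoff


def low_scores_all_late(scores_in_submission_order, low_cutoff,
                        late_fraction=0.25):
    """True when every score below `low_cutoff` belongs to the last
    `late_fraction` of exams handed in (assumed proxy for 'time called')."""
    n = len(scores_in_submission_order)
    first_late = n - max(1, int(n * late_fraction))
    low_positions = [i for i, s in enumerate(scores_in_submission_order)
                     if s < low_cutoff]
    return bool(low_positions) and all(i >= first_late for i in low_positions)


if __name__ == "__main__":
    # Hypothetical exam scores listed in the order they were submitted.
    ordered = [88, 92, 75, 60, 40, 35, 45, 30]
    print(scores_suggest_time_bias(ordered))
    print(low_scores_all_late(ordered, low_cutoff=50, late_fraction=0.5))
```

Both checks are heuristics: they flag a pattern worth inspecting rather than establishing that time was the cause.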
Two main data manipulations are additive and multiplicative. An additive adjustment preserves the order and range of the data by adding a common compensatory value, Equation (8). For example, in a class with an average of 67 and maximum earned grade of 95 out of 100, adding 5 points to each student's grade brings the overall average up to a 72 (where in general 75 is the

When a challenge assignment is given to gauge the scope of students' learning in order to differentiate students on a rigid scale, a mapping of both the minimum and maximum grades to an appropriate grading scale is applied, Equation (10). For example, on a challenging assignment with a minimum of 45, mean of 60 and maximum of 85, the linear transformation $f(x) = 0.625x + 0.46875$ (with grades expressed as proportions) is applied.

The effects of the three outlined transformations: additive, multiplicative and linear, are illustrated in Figure 5.
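The three adjustments discussed in this section can be checked numerically with the figures quoted in the examples (an average of 67 with a maximum of 95; and a minimum of 45, mean of 60 and maximum of 85). The sketch below is illustrative only; the function names are not from the paper and scores are kept on a 0-100 scale.

```python
def additive(scores, bonus):
    """Add the same compensatory value to every score."""
    return [s + bonus for s in scores]


def multiplicative(scores, top=100.0):
    """Scale every score so that the highest earned score becomes `top`."""
    factor = top / max(scores)
    return [s * factor for s in scores]


def linear_rescale(scores, new_min, new_max):
    """Map the lowest score to `new_min` and the highest to `new_max`."""
    lo, hi = min(scores), max(scores)
    slope = (new_max - new_min) / (hi - lo)
    return [new_min + slope * (s - lo) for s in scores]


if __name__ == "__main__":
    print(additive([67, 95], 5))                       # average 67 -> 72
    print([round(s, 1) for s in multiplicative([67, 95])])
    print(linear_rescale([45, 60, 85], 75, 100))       # slope 0.625
    print("variance multiple:", 0.625 ** 2)
```

The additive shift leaves the spread unchanged, whereas the linear rescaling with slope 0.625 shrinks the variance by the square of the slope.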

Usefulness
The usefulness of such weighting systems is to accurately assess students and

Conclusion
In conclusion, there is an art to assigning a percent grade to a student. It takes a skilled hand to include various assessments that evaluate students' level of understanding on multiple platforms, to create appropriate point structures for each assessment that are not intimidating to students, and to maintain a sound point structure or weighting system that is able to estimate the moving parameter that is the students' true understanding of the material. While all methods are unbiased, a pure point system ranks 17 out of 17 in accuracy and ability to address inflation; weighting systems which assign zero weights to the least relevant information rank number one. Therefore, while percentage grades are more complex and therefore more difficult to understand and compute, weighting systems are more precise and accurate when gauging students' level of understanding.
There are several factors that affect a student's performance level. First, there is the student's natural level of understanding, ω, which changes over time based on the amount of information covered and what the student comprehends, Equation (2).

Figure 1. Scatter plot of students' measured level of understanding with inflation (a) using an overall pooled proportion (black) and (b) using a weighted mean proportion (red) versus the students' maximum level of understanding.

Figure 2. Scatter plot of students' measured level of understanding without inflation (a) using an overall pooled proportion (black) and (b) using a weighted mean proportion (red) versus the students' maximum level of understanding.

3) Mean Proportion with one adjustment; dropping the first ordered measure or minimum homework grade percentage-wise, $h_{(1)}$.
Equation (10) Adjusted mean proportion for homework dropping lowest percentage-wise.
Equation (11) Mean proportion, average of mean proportions by category using adjusted homework proportion.
4) Mean Proportion with two adjustments; dropping both the minimum homework grade and minimum chapter review assignments percentage-wise.
Equation (12) Adjusted mean proportion for chapter review top five percentage-wise.
Equation (13) Mean proportion, average of mean proportions by category using adjusted homework proportion and adjusted chapter review.
5) Mean Proportion with three adjustments; dropping the minimum homework grade, minimum chapter review assignment and the minimum project grade, percentage-wise.
Equation (14) Adjusted mean proportion for projects dropping lowest percentage-wise.
Equation (15) Mean proportion, average of mean proportions by category using adjusted homework proportion, adjusted chapter review and adjusted projects proportion.
6) Mean Proportion with four adjustments; dropping the minimum homework grade, minimum chapter review assignment, the minimum project grade and minimum exam score, percentage-wise.
Equation (16) Adjusted mean proportion for mid-term exam score dropping lowest percentage-wise.
Equation (17) Mean proportion, average of mean proportions by category using adjusted homework proportion, adjusted chapter review, adjusted projects proportion and adjusted mid-term exam proportion.
7) Mean Proportion with four adjustments modified; dropping the minimum homework grade, minimum chapter review assignment and the minimum project grade, percentage-wise, in addition to the final replacing the minimum exam score percentage-wise (if it helps).
Equation (18) Modified mid-term exam proportion allowing final to replace minimum if the final is higher.
Equation (19) Mean proportion, average of mean proportions by category using adjusted homework proportion, adjusted chapter review, adjusted projects proportion and modified mid-term exam proportion.
\[ \hat{\omega} = \frac{h + c + p + e + f}{5} \]
8) Mean Proportion with four adjustments modified using linear weights; dropping the minimum homework grade, minimum chapter review assignment and the minimum project grade, percentage-wise, in addition to linearly weighting the exam scores. The linearly weighted mean indicates the rate of improvement over time when compared to the standard mean.
Equation (21) Mean proportion, average of mean proportions by category using adjusted homework proportion, adjusted chapter review, adjusted projects proportion and modified mid-term exam proportion using linear weights.
9) Mean Proportion with four adjustments modified using trapezoidal weights; dropping the minimum homework grade, minimum chapter review assignment and the minimum project grade, percentage-wise, in addition to trapezoidal weighting of the exam scores. The trapezoidal weighted mean adjusts for bias introduced when students are unfamiliar with an instructor's testing style on the first test and for the pressures associated with the last exam, which can arise from students being "burnt-out" from multiple classes.
Equation (22) Modified mid-term exam proportion using trapezoidal weights.
Equation (23) Mean proportion, average of mean proportions by category using adjusted homework proportion, adjusted chapter review, adjusted projects proportion and modified mid-term exam proportion using trapezoidal weights.
10) Weighted Mean Proportion, as given in Equation (7).
11) Weighted Mean Proportion with a single adjustment; that is, a point estimate with the lowest homework (percentage wise) dropped and averaging the top ten percentages in the homework assessments.
Equation (24) Weighted mean proportion with single adjustment to the homework.
12) Weighted Mean Proportion with two adjustments; dropping the lowest homework (percentage wise) and averaging the top five percentages in the chapter review assessments.
Equation (25) Weighted mean proportion with adjustments to the homework and chapter review.
13) Weighted Mean Proportion with three adjustments; dropping the lowest homework (percentage wise), dropping the three lowest chapter review assignments (percentage wise) and averaging the top two percentages in the project assessments.
Equation (26) Weighted mean proportion with adjustments to the homework, chapter review and projects.
\[ \hat{\omega}_{13} = 0.05\,h + 0.1\,c + 0.15\,p + 0.45\,e + 0.25\,f \]
14) Weighted Mean Proportion with four adjustments; dropping the lowest homework (percentage wise), dropping the three lowest chapter review assignments (percentage wise), dropping the lowest project grade and averaging the top two percentages in the midterm exam assessments.
Equation (27) Weighted mean proportion with adjustments to the homework, chapter review, projects and mid-term exams.
15) Weighted Mean Proportion with four adjustments modified; dropping the lowest homework (percentage wise), dropping the three lowest chapter review assignments (percentage wise), dropping the lowest project grade and allowing the final to replace the lowest percentage in the midterm exam assessments.
Equation (28) Weighted mean proportion with adjustments to the homework, chapter review, projects and mid-term exam modified.

Figure 3. Scatter plot of varying point estimates with inflation versus the students' maximum level of understanding; that is, the set level of understanding versus (1) the pooled proportion; (2) the mean proportion; (3) the mean proportion with one adjustment; (4) the mean proportion with two adjustments; (5) the mean proportion with three adjustments; (6) the mean proportion with four adjustments; (7) the mean proportion with four adjustments modified; (8) the mean proportion with four adjustments modified using linear weights; (9) the mean proportion with four adjustments modified using trapezoidal weights; (10) weighted mean proportion; (11) weighted mean proportion with a single adjustment; (12) weighted mean proportion with two adjustments; (13) weighted mean proportion with three adjustments; (14) weighted mean proportion with four adjustments; (15) weighted mean proportion with four adjustments modified; (16) weighted mean proportion with four adjustments using linear weights; (17) weighted mean proportion with four adjustments using trapezoidal weights.

A multiplicative adjustment preserves the order; however, it does not preserve the range, Equation (9). For example, in a class with an average of 67 and maximum earned grade of 95 out of 100, multiplying by 100/95 brings the average up to 70.5 and the maximum up to 100.
Equation (32) Estimated level of understanding with a multiplicative adjustment.
Challenge assignments often require a linear transformation.

Figure 4. Scatter plot of varying point estimates without inflation versus the students' maximum level of understanding; that is, the maximum level of understanding versus (1) the pooled proportion; (2) the mean proportion; (3) the mean proportion with one adjustment; (4) the mean proportion with two adjustments; (5) the mean proportion with three adjustments; (6) the mean proportion with four adjustments; (7) the mean proportion with four adjustments modified; (8) the mean proportion with four adjustments modified using linear weights; (9) the mean proportion with four adjustments modified using trapezoidal weights; (10) weighted mean proportion; (11) weighted mean proportion with a single adjustment; (12) weighted mean proportion with two adjustments; (13) weighted mean proportion with three adjustments; (14) weighted mean proportion with four adjustments; (15) weighted mean proportion with four adjustments modified; (16) weighted mean proportion with four adjustments using linear weights; (17) weighted mean proportion with four adjustments using trapezoidal weights.
The linear transformation for the challenge assignment example maps the minimum of 45 to a 75, the mean of 60 to 84.375 and the maximum of 85 to 100. This transformation reduces the variance by a multiple of 0.390625.
Equation (33) Estimated level of understanding with a linear adjustment.

Table 1. Breakdown of assessments by week, day, assignment and assigned points.