Robert B. Frary
Traditional grading practices used by many teachers at nearly all levels of education seem geared to a belief that test scores of around 70% or better represent a "passable" level of achievement. Sometimes circumstances seem to indicate the advisability of a lower level, but typically such a level is adopted only with misgivings. Reflecting these misgivings, the instructor would usually prefer that the situation not arise again and that as few people as possible know the grades were "curved."
The thinking that leads to the idea of a passing score in the neighborhood of 70% may have its roots in elementary education. For example, consider a spelling test. The teacher selects more or less randomly from a list of words the students have been assigned. Then a score of 70% is probably a fairly accurate indicator that the student knows 70% of the words on the list. Similar arguments can be made for arithmetic computation tests when problems of a specific type have been formulated in some random manner.
However, this approach to testing frequently breaks down, even at the elementary school level. Consider a test in history. How does one define the universe of all questions that might appear on a test? Moreover, even if such a list were available, random selection from it would probably not yield a test with satisfactory content from the standpoint of what the teacher wanted to emphasize. There are test construction methods available for dealing with these problems, but the average teacher simply sits down and, consulting a list of course objectives, course or unit outlines, etc., writes enough test questions to fill the allotted time. Under these circumstances, there is certainly no assurance that students who should get passing grades will be able to answer 70% or more correctly. The test may be somewhat harder or easier than the instructor intended.
Eventually, with experience, most instructors learn to write tests that are about as difficult as they intend, and for many this outcome means that marginal students score about 70% or perhaps as low as 60%. Since this level is essentially arbitrary, it is reasonable to question whether it is desirable. In fact, we will argue that a 70% passing level is much too high for a large majority of college-level classroom tests. There are exceptions, of course. In what follows, we will not be discussing essay examinations or tests designed to measure degree of achievement in a restricted and highly defined subarea of a course (criterion-referenced tests). What we are concerned about are tests covering fairly diverse topics on which scores are determined by adding up points for each right answer. Almost any multiple-choice midterm or final examination would be in this category, as would most short-answer and problem-solving tests.
Since such a test may be as hard or as easy as the instructor makes it, it is clear that the percent right does not estimate some level of achievement directly, as do spelling or arithmetic tests. What the scores do provide is a ranking of class members in terms of their achievement over the content of the test. Under these circumstances maximum testing effectiveness is gained when the average score is in the range of 50% to 60%. To see why this is the case, consider a question answered correctly by 99 of 100 examinees. This outcome provides 99 "bits" of information that could be used to rank the examinees, namely, that each of the 99 who answered correctly knows more than the one who answered incorrectly. If only 90 answer correctly, 900 "bits" are generated. Specifically, the first examinee who answered correctly knows more than each of the 10 who answered incorrectly, and so on for each of the 90 who answered correctly, which generates 90 × 10 "bits." Of course, the maximum number of "bits" is generated when 50 answer correctly and 50 answer incorrectly; 50 × 50 = 2500 "bits."
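The counting argument above can be checked with a few lines of Python. This is only a minimal sketch: the function name `ranking_bits` and the constant `CLASS_SIZE` are illustrative names chosen here, and a "bit" is taken to mean one pairing of an examinee who answered correctly with one who did not, as in the text.

```python
def ranking_bits(num_correct: int, num_examinees: int) -> int:
    """Count the pairs (correct examinee, incorrect examinee) for one question.

    Each such pair is one "bit" of ranking information in the memo's sense.
    """
    if not 0 <= num_correct <= num_examinees:
        raise ValueError("num_correct must lie between 0 and num_examinees")
    return num_correct * (num_examinees - num_correct)


CLASS_SIZE = 100  # class size used in the memo's example

# Reproduce the memo's figures and one intermediate case.
for num_correct in (99, 90, 70, 50):
    bits = ranking_bits(num_correct, CLASS_SIZE)
    print(f"{num_correct} of {CLASS_SIZE} correct -> {bits} bits")

# Search every possible outcome for the one that yields the most bits.
best_count = max(range(CLASS_SIZE + 1),
                 key=lambda k: ranking_bits(k, CLASS_SIZE))
print("Most informative outcome:", best_count, "correct")
```

The product k × (n − k) is symmetric about n/2, so a question that only 10 of 100 examinees answer correctly is exactly as informative, in this sense, as one that 90 answer correctly.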
Hence, questions that only about half can answer correctly are the best kind for generating ranking information. Obviously, using a lot of them on a test will yield an average score somewhat below 70%. Of course, it is desirable to ask a few easy questions to encourage students, especially at the beginning of a test, and to ask some really hard questions to let better students learn their own capabilities and limitations.
One result of making a test more difficult will be a wider spread of scores. On a hundred-question test with an average score of 80, nearly all scores will be in the range of 60 to 100. If the average score is 55, scores will probably range from 25 to 90. Then fewer examinees will earn any given score, which makes letter grade designation much fairer. In the former case a large number of students may receive the wrong letter grade due to very small errors in grading or slight variations in the instructor's liberality on a given test. While these problems cannot be completely avoided, their effect is minimized when fewer students earn scores adjacent to letter grade boundaries.
The recommendation to avoid many questions answered correctly by more than 80% to 90% of examinees can also be justified from the standpoint of preventing waste: waste of clerical time, testing time, and supplies. There is simply no need to write, print, and obtain responses to that half of a typical test which nearly all examinees get right. The instructor can gain about as much ranking information with a test half as long containing mainly the more difficult questions, and even better ranking information from such a test of the original length. It is only necessary to assume that all functioning, qualified students can answer a large percentage of the easy questions. A completely negligent student will score badly enough on the harder test to justify a failing grade.
The instructor who introduces harder tests will have a little adjusting to do with respect to grade assignment. Obviously, if the average score is around 55%, the lowest passing score will have to be somewhat below 50%. This outcome will bother some people, though it shouldn't. After all, the easy questions just weren't asked, and it is reasonable to assume that most students would answer nearly all of them correctly.