Test length contributes to reliability. A test with only a few items can gather only a small amount of information about what the student does and does not know. A test with more items gathers more information and will therefore most likely be more reliable. In fact, there is a mathematical relation between the number of items on a test and the reliability of that test (Brown, 1910; Cronbach, 1951; Spearman, 1910). A similar concept applies to the number of elements used to determine a final grade (such as grades from papers, scores on tests, and completed assignments). Centers of teaching and learning at universities encourage instructors to include enough elements to ensure that a final grade has a high degree of precision (http://cte.illinois.edu/testing/exam/course_grades.html).
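The relation between test length and reliability credited to Spearman and Brown is usually written as the Spearman-Brown prediction formula. The following is a minimal Python sketch of that formula, not a full psychometric tool. The function name `spearman_brown` and the example numbers are made up for illustration, and the formula assumes that any added items are comparable in quality to the existing ones.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predict reliability after changing test length.

    reliability   -- reliability of the current test (0 to 1)
    length_factor -- new length divided by current length
                     (2.0 = twice as many items, 0.5 = half as many)

    Assumes the added (or removed) items behave like the existing ones.
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


# Hypothetical example: a short quiz with reliability 0.60.
current = 0.60
doubled = spearman_brown(current, 2.0)   # twice as many items
halved = spearman_brown(current, 0.5)    # half as many items
print(f"doubled: {doubled:.2f}, halved: {halved:.2f}")
```

With these illustrative numbers, doubling the quiz raises the predicted reliability from 0.60 to 0.75, while halving it lowers the prediction to about 0.43. Each additional block of items helps less than the one before it.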
Another factor influencing reliability is consistency in scoring. This issue arises most often when scoring essays, where the amount of credit given depends on the scorer’s subjective judgment. A scorer could be inconsistent and assign a paper an A one day and the same paper a B the next day. Also, if multiple people are scoring a test, two graders could give different scores for the exact same response. These differences, both within individual graders and between graders, degrade the reliability of a test.
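One common way to put a number on agreement between two graders is Cohen's kappa, which compares the observed agreement with the agreement expected by chance. The post does not name a particular statistic, so this is one reasonable choice rather than the only one. The sketch below uses made-up letter grades, and the function name `cohens_kappa` is an assumption for illustration.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters scoring the same responses.

    1.0 means perfect agreement; 0.0 means agreement no better than chance.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of responses")
    n = len(rater_a)

    # Proportion of responses on which the two raters gave the same score.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Agreement expected by chance, from each rater's own score frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical category for every response.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical grades from two graders on the same six essays.
grader_1 = ["A", "B", "B", "C", "A", "B"]
grader_2 = ["A", "B", "C", "C", "B", "B"]
print(f"kappa = {cohens_kappa(grader_1, grader_2):.2f}")
```

In this made-up example the graders agree on four of six essays, but after correcting for chance the kappa is only about 0.48. The same function can be applied to one grader's scores on two different days to check within-grader consistency.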
Reliability and validity are closely linked. No matter how much work you invest in ensuring that the content of the test matches your content domain and that the tasks measure the construct of interest, if your test cannot provide consistent scores, it will lack evidence of validity. You cannot have a valid test with low reliability. Conversely, a test can have extremely high reliability without being valid; high reliability alone does not guarantee validity. You need evidence of both validity and high reliability for a test to be considered good quality.
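In classical test theory, this one-way dependence has a numeric form: the correlation between a test and a criterion cannot exceed the square root of the product of their reliabilities. The sketch below only computes that ceiling. The function name `max_validity` and the example values are assumptions for illustration, and the default treats the criterion as perfectly reliable.

```python
import math


def max_validity(test_reliability: float, criterion_reliability: float = 1.0) -> float:
    """Upper bound on the test-criterion correlation under classical test theory.

    This is a ceiling, not an estimate: a test can fall far below it.
    """
    for value in (test_reliability, criterion_reliability):
        if not 0.0 <= value <= 1.0:
            raise ValueError("reliabilities must be between 0 and 1")
    return math.sqrt(test_reliability * criterion_reliability)


# Hypothetical reliabilities for a weak test and a strong test.
for reliability in (0.49, 0.95):
    print(f"reliability {reliability:.2f} -> validity ceiling {max_validity(reliability):.2f}")
```

A test with reliability 0.49 can correlate at most 0.70 with any criterion, so low reliability caps validity. A reliability of 0.95 raises the ceiling but says nothing about whether the test actually measures the intended construct.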
To create a reliable exam, then, the challenge is to include enough items to provide sufficient information about the student’s mastery of the content and to ensure consistency in scoring.