When deciding whether a test item is good or bad, the main factor is the item’s ability to discriminate between students who have mastered the material and those who have not. Simply put, bad items cannot discriminate. The key is to look at who is getting the item correct and who is getting it wrong. If only the high-performing students are getting the answer right, then the item is simply difficult. On the other hand, if mid- and low-performing students are getting the item correct just as often as the high-performing students, or if they are the only ones getting it correct while the high-performing students get it wrong, then there is something wrong with the item. With these patterns, the item is not doing what it is intended to do, which is to measure mastery of the content.
So how do you figure out who is getting the item correct? One of the most widely accepted ways of evaluating an item is to calculate a special kind of correlation called a point biserial. Instead of correlating two variables that are both on a continuous scale, a point biserial correlates one variable that is on a continuous scale with one that has only two possible values: correct or incorrect. At a high level, you are correlating each student’s response on a single question with that student’s overall test score. The overall test score gives you an indication of whether the student is high-performing or low-performing. If an item is functioning well, students who answer it correctly will tend to have higher overall test scores than students who answer it incorrectly, and the correlation will be positive.
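As a rough illustration of the calculation described above, the sketch below computes a point biserial as an ordinary Pearson correlation between a column of 0/1 item responses and the students' total scores. The function name `point_biserial` and the six-student data set are hypothetical and exist only for this example; a scoring service may compute the statistic differently (for example, by excluding the item from the total score first).

```python
from math import sqrt


def point_biserial(item_correct, total_scores):
    """Pearson correlation between 0/1 item responses and total test scores.

    item_correct: one value per student, 1 = correct, 0 = incorrect.
    total_scores: one overall test score per student, in the same order.
    """
    if len(item_correct) != len(total_scores):
        raise ValueError("need one item response and one total score per student")

    n = len(item_correct)
    mean_item = sum(item_correct) / n
    mean_total = sum(total_scores) / n

    # Sums of cross-products and squared deviations; the 1/n factors cancel.
    cross = sum((x - mean_item) * (y - mean_total)
                for x, y in zip(item_correct, total_scores))
    ss_item = sum((x - mean_item) ** 2 for x in item_correct)
    ss_total = sum((y - mean_total) ** 2 for y in total_scores)

    # The correlation is undefined if every student got the same result on
    # the item, or if every student has the same total score.
    if ss_item == 0 or ss_total == 0:
        raise ValueError("point biserial is undefined without variation")

    return cross / sqrt(ss_item * ss_total)


# Hypothetical class of six students, ordered from highest to lowest score.
totals = [95, 88, 82, 70, 65, 51]
good_item = [1, 1, 1, 0, 0, 0]      # only the high scorers answer correctly
reversed_item = [0, 0, 0, 1, 1, 1]  # only the low scorers answer correctly

print(round(point_biserial(good_item, totals), 2))
print(round(point_biserial(reversed_item, totals), 2))
```

In this made-up data, the first item yields a strongly positive value and the second yields the same value with the opposite sign, which matches the two patterns described above.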
Often scoring services such as GradeHub will provide point biserials for you.
A high point biserial reflects the fact that the item is doing a good job of discriminating your high-performing students from your low-performing students. Values for point biserials can range from -1.00 to 1.00. Values of 0.15 or higher generally mean that the item is performing well: the students with high scores tend to answer it correctly and the students with low scores tend to get it wrong. Point biserials around zero indicate that there is no clear pattern. For example, some high-performing students may be getting the item right, but so are some low-performing students.
Point biserials that are negative signify a big problem. With this pattern, the high-performing students are getting the answer wrong and the low- and/or mid-performing students are getting it right. This pattern is the complete opposite of what makes an item good. Researchers have recommended removing items that have a negative point biserial (Kaplan & Saccuzzo, 2013) or even a point biserial less than 0.15 (Varma, 2006). Clearly, deleting half the items on your test because of low or negative point biserials is not a desirable outcome, but awareness of item discrimination issues can help you make some tough decisions about which cutoff to use and can help you improve future tests.
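The cutoffs discussed above can be turned into a simple screening pass over a test. The sketch below is only one possible way to do it: the function name `classify_item`, the label wording, and the three example values are hypothetical, and the 0.15 default is the cutoff mentioned in the text, which you may choose to adjust.

```python
def classify_item(r_pb, cutoff=0.15):
    """Label an item based on its point biserial (r_pb).

    cutoff is an adjustable assumption; 0.15 follows the value cited above.
    """
    if r_pb < 0:
        # High scorers tend to miss the item while lower scorers get it right.
        return "negative: review or remove"
    if r_pb < cutoff:
        # No clear relationship between the item and overall performance.
        return "weak: below cutoff"
    return "performing well"


# Hypothetical point biserials for three items on one test.
point_biserials = {"Q1": 0.42, "Q2": 0.03, "Q3": -0.21}

report = {item: classify_item(r) for item, r in point_biserials.items()}
for item, label in report.items():
    print(item, label)
```

Keeping the cutoff as a parameter makes it easy to see how many items would be flagged under a stricter or more lenient rule before deciding what to remove.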
References
Kaplan, R. M., & Saccuzzo, D. P. (2013). Psychological testing: Principles, applications and issues (8th ed.). Belmont, CA: Cengage.
Varma, S. (2006). Preliminary item statistics using point-biserial correlation and p-values. Morgan Hill, CA: Educational Data Systems, Inc.