# Point biserial correlation

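The **point biserial correlation** measures item reliability: how well a single test question discriminates between students who did well on the test and students who did not.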

How? It correlates student scores on one particular question with their scores on the test as a whole.

The driving assumption is simple: Students who score well on the test as a whole should, on average, score well on the question under review. Students who struggle on the test as a whole should, on average, struggle on the question under review. If a question deviates from this assumption (making it a "suspect" question), the point biserial correlation lets us know.
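
As a rough illustration of the idea above, the point biserial correlation can be computed as an ordinary Pearson correlation between a 0/1 score on one question and the total test score. The sketch below is not EAC's implementation; the function name `point_biserial`, the sample data, and the choice to return 0.0 when either set of scores has no variance are assumptions made for this example.

```python
from math import sqrt


def point_biserial(item_scores, total_scores):
    """Pearson correlation between 0/1 item scores and total test scores.

    Hypothetical helper for illustration only.
    """
    n = len(item_scores)
    if n != len(total_scores) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 students")

    mean_item = sum(item_scores) / n
    mean_total = sum(total_scores) / n

    # Sums of squared deviations and cross-products (the 1/n factors cancel).
    cross = sum((i - mean_item) * (t - mean_total)
                for i, t in zip(item_scores, total_scores))
    ss_item = sum((i - mean_item) ** 2 for i in item_scores)
    ss_total = sum((t - mean_total) ** 2 for t in total_scores)

    # Assumption: with no variance the correlation is undefined,
    # so this sketch reports 0.0 instead of dividing by zero.
    if ss_item == 0 or ss_total == 0:
        return 0.0
    return cross / sqrt(ss_item * ss_total)


# Made-up data: 1 = answered the question correctly, 0 = missed it.
item_scores = [1, 1, 1, 0, 0, 1, 0, 0]
total_scores = [18, 17, 15, 9, 8, 14, 10, 7]

r = point_biserial(item_scores, total_scores)
print(round(r, 2))
```

In this made-up data the students with the highest totals are exactly the ones who answered the question correctly, so the result is strongly positive. Swapping every 0 and 1 in `item_scores` would produce the same magnitude with a negative sign.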

The point biserial correlation ranges from a low of **-1.0** to a high of **+1.0**.

The closer the point biserial correlation is to +1.0, the more reliable the question is considered, because it discriminates well between students who mastered the test material and those who did not.

A point biserial correlation of 0.0 means the question didn't discriminate at all. Imagine a test where all 20 students answered Question 1 correctly. Strictly speaking, a correlation can't be calculated when every student has the same score on the question, so the value is treated as 0.0. Since Question 1 doesn't discriminate among any of the students relative to how they performed on the rest of the test, a point biserial correlation of 0.0 makes perfect sense.
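
To see why a question that everyone answers correctly ends up at 0.0, it can help to write the statistic in its group-mean form: the difference between the mean total score of students who got the question right and those who missed it, divided by the standard deviation of the totals, times the square root of p × q (the proportions answering correctly and incorrectly). The sketch below assumes that form with the population standard deviation; the function name and the total scores are invented for illustration.

```python
from statistics import pstdev


def point_biserial_from_groups(item_scores, total_scores):
    """Group-mean form: (M_correct - M_incorrect) / s * sqrt(p * q).

    Hypothetical helper for illustration only.
    """
    n = len(item_scores)
    correct = [t for i, t in zip(item_scores, total_scores) if i == 1]
    incorrect = [t for i, t in zip(item_scores, total_scores) if i == 0]

    p = len(correct) / n   # proportion who answered correctly
    q = 1 - p              # proportion who missed the question
    spread = pstdev(total_scores)

    # If everyone got it right (q == 0) or wrong (p == 0), one group is
    # empty and p * q is zero; this sketch reports 0.0 in that case.
    if p == 0 or q == 0 or spread == 0:
        return 0.0

    mean_correct = sum(correct) / len(correct)
    mean_incorrect = sum(incorrect) / len(incorrect)
    return (mean_correct - mean_incorrect) / spread * (p * q) ** 0.5


# Made-up totals for 20 students who all answered Question 1 correctly.
total_scores = [12, 15, 9, 18, 11, 14, 16, 10, 13, 17,
                8, 19, 12, 15, 11, 14, 16, 13, 10, 17]
question_1 = [1] * 20

print(point_biserial_from_groups(question_1, total_scores))  # 0.0
```

With no students in the "missed it" group there is nothing to compare against, which is the numerical counterpart of saying the question did not discriminate.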

A negative point biserial correlation means that students who performed well on the test as a whole tended to miss the question under review, while students who didn't perform as well on the test as a whole tended to get it right. It's a red flag, and there are several things to check. Is the answer key correct? Is the question clearly worded? If it's multiple choice, are the choices too similar?

EAC suggestion: For high-stakes exams intended to distinguish students who mastered the material from those who did not, shoot for questions with point biserial correlations greater than +0.30. They're considered very good items. Questions with point biserial correlations less than +0.09 are considered poor. Questions with point biserial correlations between +0.09 and +0.30 are considered acceptable to reasonably good.
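
The bands above could be applied to a list of questions with a small helper like the one below. This is only a sketch: the function name `classify_item` and the question values are made up, and placing the exact boundary values +0.09 and +0.30 in the middle band is an assumption, since the guideline doesn't say which side they fall on.

```python
def classify_item(r_pb):
    """Label a question using the suggested point biserial bands.

    Hypothetical helper; boundary handling is an assumption.
    """
    if r_pb > 0.30:
        return "very good"
    if r_pb >= 0.09:
        return "acceptable to reasonably good"
    return "poor"  # includes negative values, which deserve a closer look


# Made-up point biserial correlations for four questions.
questions = {"Q1": 0.45, "Q2": 0.22, "Q3": 0.05, "Q4": -0.18}

for name, r_pb in questions.items():
    print(name, r_pb, classify_item(r_pb))
```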

