As I set out here, Dr Chris Wheadon has come up with a beautifully simple solution to the problem of assessing students’ essays: it requires no rubrics, takes very little marking time and produces extremely reliable results with no attendant loss of validity. It relies on the cumulative power of comparative judgement and represents the future of assessment for subjects which depend on essay-length answers to open-ended questions. If you doubt me, the reason might be that your experience of, and sense of success with, mark schemes has blinded you to better alternatives.
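The post doesn't spell out how comparative judgement turns many "which of these two essays is better?" decisions into a score for each essay. A common statistical approach for pairwise data is the Bradley-Terry model; whether Dr Wheadon's system uses exactly this model is an assumption here. The minimal sketch below, with a hypothetical `bradley_terry` function and made-up judgements between scripts A, B and C, shows the general idea: each script gets a strength, and the chance that script i beats script j is modelled as strength_i / (strength_i + strength_j).

```python
from collections import defaultdict


def bradley_terry(judgements, iterations=200):
    """Estimate a strength score for each script from pairwise judgements.

    `judgements` is a list of (winner, loser) pairs, one per decision a
    judge made. Fits a Bradley-Terry model with the standard
    minorise-maximise (MM) update. Assumes every script has won and lost
    at least once; otherwise the update can break down.
    """
    wins = defaultdict(int)          # total wins per script
    pair_counts = defaultdict(int)   # how often each pair was compared
    scripts = set()
    for winner, loser in judgements:
        wins[winner] += 1
        pair_counts[frozenset((winner, loser))] += 1
        scripts.update((winner, loser))

    strength = {s: 1.0 for s in scripts}
    for _ in range(iterations):
        updated = {}
        for s in scripts:
            denom = 0.0
            for pair, n in pair_counts.items():
                if s in pair:
                    (other,) = pair - {s}
                    denom += n / (strength[s] + strength[other])
            updated[s] = wins[s] / denom
        # Rescale so the average strength is 1; only the ratios matter.
        total = sum(updated.values())
        strength = {s: v * len(scripts) / total for s, v in updated.items()}
    return strength


if __name__ == "__main__":
    # Hypothetical judgements: each pair of scripts was compared three times.
    judgements = [
        ("A", "B"), ("A", "B"), ("B", "A"),
        ("B", "C"), ("B", "C"), ("C", "B"),
        ("A", "C"), ("A", "C"), ("C", "A"),
    ]
    scores = bradley_terry(judgements)
    print(sorted(scores, key=scores.get, reverse=True))  # prints ['A', 'B', 'C']
```

Because every judgement feeds into every script's estimate, adding more comparisons tends to tighten the scores, which is one way to read the "cumulative power" mentioned above.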

Imagine you have three water jars, each able to hold a different, fixed amount of water. Jar A holds 21 units, Jar B holds 127 units and Jar C holds 3 units. How would you go about measuring exactly 100 units of water using these jars?*

This question formed the basis of Abraham Luchins’s classic experiment, in which subjects were divided into two groups. The experimental group was given five practice problems followed by four critical test problems; the control group did not see the practice problems. All of the practice problems, and some of the critical problems, had only one possible solution (if you can’t be bothered working it out, see below). While most of the test problems could be solved either with the method learned in the practice rounds or with a simpler, more efficient one, one of them, the ‘extinction problem’, could only be solved by generating a novel solution. Most of the experimental subjects were anchored by the solution they had used in the practice problems: they struggled to see the simpler, more efficient solutions and were unable to tackle the extinction problem.

The conclusion was that if people are given a series of problems, each with a similar solution, they become blind to the possibility of alternative, more effective solutions. We appear to have a hardwired preference for solutions we already know over working out a potentially superior one. This has become known as ‘the Einstellung effect’, after the German word for ‘setting’ or ‘predisposition’. Over the years, psychologists have observed the Einstellung effect in all sorts of fields and have noted time and again that the experience of successfully completing a task in one way blinds us to other possibilities; the existence of ‘good’ ideas prevents better ideas from being developed.

This is exactly the problem with the way we attempt to assess learning. In the US, the tradition has been to value reliability by scoring students’ responses to easily standardised multiple-choice questions. In the UK, we prefer the validity of longer answers, which allow for a much greater range of complex responses. The trouble is that the only way to standardise these responses and compare outcomes in national exams is to design detailed but vague rubrics which are open to a very broad range of interpretations.

The grade or level descriptor produces an Einstellung effect which has prevented most of us from developing better ways to assess performance. For me, it wasn’t until Daisy Christodoulou had pointed out the failings of descriptive statements for about the fifth time that I began to understand that a better alternative was not just needed but possible. It has taken some time to explore my biases sufficiently to see a new possibility, but that possibility is out there and, I contend, it’s better than our current way of thinking about assessment.

* The solution is to fill Jar B and pour out enough water to fill Jar A once and Jar C twice, leaving 100 units in Jar B (127 − 21 − 3 − 3 = 100).
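The footnote’s arithmetic can be checked with a small brute-force search. This is a hypothetical sketch, not part of Luchins’s study: it assumes each move adds or removes one full jarful, so a solution is a set of integer counts (x, y, z) with x·A + y·B + z·C equal to the target, and it ignores the order in which the pours happen. The names `jar_solutions` and `max_uses` are illustrative.

```python
from itertools import product


def jar_solutions(a, b, c, target, max_uses=3):
    """Find ways to reach `target` by adding or removing whole jarfuls.

    Simplifying assumption: a positive count means adding that many full
    jars, a negative count means pouring that many out, so a solution is
    any (x, y, z) with x*a + y*b + z*c == target.
    """
    counts = range(-max_uses, max_uses + 1)
    found = [
        (x, y, z)
        for x, y, z in product(counts, repeat=3)
        if x * a + y * b + z * c == target
    ]
    # List the solutions needing the fewest jarfuls in total first.
    found.sort(key=lambda s: sum(abs(n) for n in s))
    return found


if __name__ == "__main__":
    # Fill B once, pour out A once and C twice: 127 - 21 - 2*3 = 100.
    print(jar_solutions(21, 127, 3, 100))  # prints [(-1, 1, -2)]
```

With another set of jar sizes chosen for illustration (A = 23, B = 49, C = 3, target 20), the same search lists the two-step route A − C = 20 ahead of the practised B − A − 2C, which is exactly the kind of simpler solution the experimental subjects tended to miss.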