“Optimism and stupidity are nearly synonymous.” Hyman G. Rickover — Speech to US Naval Postgraduate School, March 16, 1954

In this post I picked up on a rather odd comment made by Professor Hattie at a recent conference:

“…tests don’t tell kids about how much they’ve learnt. Kids are very, very good at predicting how well they’ll do in a test.”

Are they? In my response I argued that he’s wrong:

Most students are novices – they don’t yet know much about the subject they’re studying. Not only do they not know much, they’re unlikely to know the value of what they do know or have much of an idea about the extent of their ignorance. As such they’re likely to suffer from the Dunning-Kruger effect and over-estimate the extent of their expertise. All of this creates a sense of familiarity with subject content which leads to the illusion of knowledge. The reason tests are so good at building students’ knowledge is that they reveal surprising information about what is actually known as opposed to what we think we know. Added to that, our ability to accurately self-report on anything is weak at best.

With thanks to George Lilley, a bit of investigation has revealed the potential source of Hattie’s mistake. One of the interventions rated most highly in Visible Learning is ‘self-reported grades’ with a whopping effect size of d=1.44.* According to Hattie’s calculations this would represent an incredible advance of over three years’ additional progress. If Hattie’s right, it would be criminally negligent not to harness such an unimaginable force. So, what is this voodoo?

It turns out that ‘self-reported grades’ is… students predicting what grades they hope they are going to get. If you predict you’re going to get an A, then you will! It’s as simple – and as improbable – as that.
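
As an aside, the ‘over three years’ figure above is just arithmetic on Hattie’s own yardstick – a rough sketch, assuming his oft-quoted benchmark that d ≈ 0.40 equates to roughly a year of typical progress (the benchmark is the assumption here; the numbers are purely illustrative):

```python
# Rough arithmetic behind the 'three years of progress' claim, assuming
# Hattie's commonly quoted benchmark that d = 0.40 corresponds to roughly
# one year of typical schooling.

effect_size = 1.44          # reported d for 'self-reported grades'
one_year_benchmark = 0.40   # Hattie's 'hinge point' for a year of progress

years_of_progress = effect_size / one_year_benchmark
print(f"Implied progress: about {years_of_progress:.1f} years")  # ~3.6 years
```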

Hattie used these five meta-analyses to arrive at the average of d = 1.44:

Mabe & West (1982): Validity of self-evaluation of ability. (pdf)
Falchikov & Boud (1989): Student Self-Assessment in Higher Education.
Ross (1998): Self-assessment in second language testing. (pdf)
Falchikov & Goldfinch (2000): Student Peer Assessment in Higher Education.
Kuncel, Crede & Thomas (2005): The Validity of Self-Reported Grade Point Averages, Class Ranks, and Test Scores.

But, as Lilley points out here, two of the studies didn’t even attempt to measure the effect of self-reported grades. Falchikov (2000) was studying the effects of peer-assessment whilst Kuncel (2005) was testing whether students were able to remember their test scores from the previous year. At least part of the effect cited by Hattie as evidence for self-reported grades is actually evidence of something entirely different.

The authors of several of the studies themselves go to the trouble of warning against Hattie’s interpretation:

Since it is often difficult to get results transcripts of student previous GPA’s from High School or College, the aim of this study is to see whether self-reported grades can be used as a substitute. This obviously has time saving administration advantages. Kuncel et al (2005) p.64

We conceive of the present study as an investigation of the validity of peer marking. Falchikov and Goldfinch (2000) p.288

The intent of this review is to develop general conclusions about the validity of self-evaluation of ability. Mabe and West (1982) p.281

Not only were the studies cited in Visible Learning not in fact measuring what Hattie claims they were; worse, Falchikov and Boud (1989) actually state that “the greater the effect size, the less the self-marker ratings resemble those of staff markers.” (p. 417) In other words, high effect sizes are more likely to be down to students’ inability to accurately predict their grades than to over-prediction causing increased performance, as Hattie appears to have concluded.

The hammer blow comes from Dr Kristen DiCerbo:

The studies that produced the 1.44 effect size did not study self-report grades as a teaching technique. They looked at the correlation of self-report to actual grades, often in the context of whether self-report could be substituted for other kinds of assessment. None of them studied the effect of changing those self-reports. As we all know, correlation does not imply causation. This research does not imply that self-expectations cause grades. [my emphasis]
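
DiCerbo’s distinction is easy to see with a little arithmetic. A correlation can be translated into a d value using the standard conversion d = 2r/√(1−r²) – which means strong agreement between predicted and actual grades shows up as a huge ‘effect size’ even though nothing has been taught or changed. A minimal sketch (the r value below is hypothetical, not drawn from the five meta-analyses):

```python
import math

# Convert a correlation coefficient r into Cohen's d using the standard
# formula d = 2r / sqrt(1 - r^2). This illustrates how strong *agreement*
# between self-reported and actual grades can masquerade as a very large
# 'effect size' without any intervention having taken place.
# The r value below is hypothetical, for illustration only.

def r_to_d(r: float) -> float:
    return 2 * r / math.sqrt(1 - r ** 2)

r = 0.58  # hypothetical correlation between predicted and actual grades
print(f"r = {r} converts to d = {r_to_d(r):.2f}")  # ~1.42, i.e. d=1.44 territory
```

Nothing in that calculation involves changing anyone’s expectations, which is exactly DiCerbo’s point.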

All this strongly suggests not only that getting students to predict their grades is unlikely to have much of an effect on increasing said grades (Who knew!) but that Hattie is very likely fooling himself when he says students are “very, very good at predicting how well they’ll do in a test.”

*I’ve critiqued the idea of effect sizes here.