Testing & assessment – have we been doing the right things for the wrong reasons?

THIS. Now take the test AGAIN

A curious peculiarity of our memory is that things are impressed better by active than by passive repetition. I mean that in learning (by heart, for example), when we almost know the piece, it pays better to wait and recollect by an effort from within, than to look at the book again. If we recover the words in the former way, we shall probably know them the next time; if in the latter way, we shall very likely need the book once more.

William James, The principles of psychology (1890)


Never stop testing, and your advertising will never stop improving.

David Ogilvy

Tests are rubbish, right? Like me, you may find yourself baring your teeth at the thought of being drilled to death, or inflicting endless rounds of mind-numbing tests on your students. That’s no way to learn, is it? All that’s going to do is produce ‘inert knowledge’ that will just sit there and be of no use whatsoever, right? Wrong. Apparently, the ‘retrieval practice’ of testing actually helps us induce “readily accessible information that can be flexibly used to solve new problems.”[1]

Most tests are conducted in order to produce summative information on how much students have learned and as such have (possibly rightly) attracted lots of ire. But maybe this is a very narrow way to view the humble test.

In my post on desirable difficulties I reported the following nugget:

We think we know more than in fact we do. For instance you may well have some pretty fixed ideas about testing. Which of these study patterns is more likely to result in long term learning?

1. study study study study – test

2. study study study test – test

3. study study test test – test

4. study test test test – test

Most of us will pick 1. It just feels right, doesn’t it? Spaced repetitions of study are bound to result in better results, right? Wrong. The most successful pattern is in fact No. 4. Having just one study session, followed by three short testing sessions – and then a final assessment – will outperform any other pattern.

This is something I’ve only just begun to research and experiment with, but the implications are fascinating. One of the first things I needed to reconsider was what might constitute a test. That is to say, I had to move away from the limited definition of testing being merely a pen and paper based exercise conducted under exam conditions. Testing can (and should) include some of the tricks and techniques we’ve been misusing and misunderstanding as AfL for the past 10 years or so. In fact, it doesn’t really matter how you test students as long as your emphasis changes; testing should not be primarily used to assess the efficacy of your teaching and students’ learning, it should be used as a powerful tool in your pedagogical armoury to help them learn.

Maybe this is really obvious and everyone else has always understood the fundamental point of classroom assessment, but I don’t think so. Everything I’ve read (and I’ve read a fair bit) indicates that the point of AfL is to find out what students have learned and to adjust your teaching to fill in any gaps. This deficit model means that teachers (and students) might be labouring under some quite fundamental misunderstandings.

They are:

1) The Input/Output Myth – what teachers teach, students learn. Learning appears to be waaaay more complicated than this myth suggests.

2) Classroom performance equates with student learning. It doesn’t. Learning takes place over time and can only be inferred from performance.

3) Students will retain what they’ve learned. They won’t. Students will forget the vast majority of what you teach and what they do remember will be largely unique to individuals.

If we just carry on waving our lolly sticks about, festooning students with Post-it notes and smugly getting them to fill in exit passes, what will we accomplish? Well, if cognitive science is correct about the human mind and how it learns, the answer might be: precious little.

So, should we chuck out the baby with this particularly gritty bathwater? How about if instead we rethought the purpose of assessment and considered how our AfL toolkits might actually benefit learning instead of just monitoring performance?

This paper on Ten Benefits of Testing and Their Applications to Educational Practice is a good starting point. The benefits are organised into direct effects on retention and indirect benefits on meta-cognition, teaching and learning. Whilst all are interesting and worth perusing, for the purposes of this post I’m just going to discuss how I’ve been trying to use the direct benefits of testing.


Mean number of idea units recalled on the final test taken 5 min or 1 week after the initial learning session

The Testing Effect: retrieval aids later retention – this is the claim made above that studying material once and testing three times leads to about 80% better retention than studying three times and testing once. The research evidence suggests that it doesn’t matter whether people are asked to recall individual items or passages of text, testing beats restudying every time. Now, we all know that cramming for a test works, but what these studies show is that testing leads to a much increased likelihood of information being retained over the long term. The implication is that if we want our students to learn whatever it is we’re trying to teach them we should test them on it regularly. And by regularly I mean every lesson. What if every lesson began with a test of what students had studied the previous lesson? Far from finding it dull, most students actually seem to enjoy this kind of exercise. And if you explain to them what you’re up to and why, they get pretty excited at seeing whether the theory holds water. And what of accusations that this might lead to instances of the Hawthorne Effect? Frankly my dear, I couldn’t give a damn! I’m not a researcher and I’m not trying to prove anything; I just want to take advantage of something that’s already been proven.

Testing causes students to learn more from the next study episode – this is also pleasingly referred to as ‘test-potentiated learning’. Basically it means that having followed a Study Test Test Test (STTT) pattern of lessons, the next STTT pattern will result in even better retention: the more tests you do, the better you are at learning!

This particular field of study belongs to Chizuko Izawa, who began by investigating whether learning was actually taking place during testing. She examined three hypotheses:

  1. During a test students will neither learn nor forget
  2. Learning and forgetting could occur during a test
  3. Taking a test might influence the amount of learning during a future study session.

Guess what? Hypotheses 1 and 3 turn out to be correct. But doesn’t this contradict The Testing Effect? Well, apparently not; the testing effect can be interpreted as a slowing of forgetting after the test. And the real kicker is that this potential improvement occurs whether or not students get any feedback on their tests!

Testing improves transfer of knowledge to new contexts – this one is the Grail! One of the myths in Daisy Christodoulou’s new book Seven Myths About Education is that we should teach transferable skills. She argues the following:

Skills are tied to domain knowledge. If you can analyse a poem, it doesn’t mean you can analyse a quadratic equation, even though we apply the word ‘analysis’ to each activity. Likewise with evaluation, synthesis, explanation and all the other words to be found at the top of Bloom’s Taxonomy. When we see people employing what we think of as transferable skills, what we’re probably seeing is someone with a wide-ranging body of knowledge in a number of different domains.

But what if testing could improve the transferability of skills and knowledge? What then? Can retrieval really help the transferability of knowledge?

Let’s start by defining ‘transfer’. How about, “applying knowledge learned in one situation to a new situation”? And let’s be a little more cautious than the example of ‘far transfer’ given by Daisy above. Can we teach students how to analyse non-fiction texts and then expect them to be able to analyse poetry? This is a real bugbear of mine because, frustratingly, it’s hard. Within a ‘skills-based’ subject like English we ought to be able to do this. But, year after year, I’ve found myself stymied by students’ damnable inability to see that analysing in one context is exactly the same as analysing in another. Rebranding the skill as ‘zooming in’ has helped but it’s still an uphill struggle; they need constant prodding and reminding.

Ebbinghaus was experimenting with the transferability of skills way back in 1885, and more recently Barnett and Ceci (2002) went as far as proposing a taxonomy for transfer studies which attempts to describe the dimensions against which transfer of a learned skill might be assessed.

So could testing make the difference? There have been a number of different studies on the effects of testing on the ability to transfer skills. There’s lots of evidence for ‘near transfer’, and Butler (2010) has shown that ‘far transfer’ (transfer to new questions in different knowledge domains) may be possible:

In this experiment, subjects studied prose passages on various topics (e.g., bats; the respiratory system). Subjects then restudied some of the passages three times and took three tests on other passages. After each question during the repeated tests, subjects were presented with the question and the correct answer for feedback. One week later subjects completed the final transfer test. On the final test, subjects were required to transfer what they learned during the initial learning session to new inferential questions in different knowledge domains (e.g., from echolocation in bats to similar processes used in sonar on submarines).

The results showed that subjects were more likely to correctly answer a transfer question when they had answered the corresponding question during initial testing. Is this conclusive? Maybe not, but it’s compelling. I don’t think my teaching of analysis in English is going to result in my students being better able to analyse quadratic equations, but if it helps them transfer between non-fiction and poetry I’ll be chuffed.

Testing can facilitate retrieval of material that was not tested – yes, you heard it: taking a test will help you remember even the stuff that wasn’t actually tested. This concept of ‘retrieval-induced facilitation’ sounds almost magical and seems at odds with Bjork’s theory of ‘retrieval-induced forgetting’. But the contradiction only exists in the short term; more instances of re-testing and a longer delay before the final test (at least 24 hours) result in clear improvements in the recall of material that has not been tested in the STTT pattern of learning.

I’m right at the beginning of all this, but it looks like testing is the way forward if I want to make sure my students remember (that is to say, learn) the stuff I’m teaching them. I’ve already started getting students to summarise what they’ve learned in a paragraph at the end of each lesson and setting homework designed to test students’ recall of lesson content. Also I’ve begun tinkering around with concept maps to see how they can be used as testing tools.

At the beginning of the year I was preparing to junk a lot of what I’d come to believe was best practice. Turns out, all I need to get rid of are my misconceptions about what assessment for learning might actually be for. Maybe it really could be for learning and not just performance!

24 Responses to Testing & assessment – have we been doing the right things for the wrong reasons?

  1. Mary Whitehouse says:

    Interesting – maybe the same effect as is seen with ‘spaced versus massed practice’ – Hattie, Visible Learning pp 185-186.

  2. Bystander says:

    AiL anybody? Or practice makes perfect, as we used to say.

  3. Bystander says:

    Interesting to see you moving towards a view of learning that distinguishes between potentiation (encoding?) and retrieval. Biggest issue I think for ‘standard’ Ofsted view of learning is that encoding (at least in the LTM) seems to take days, if not weeks so the idea that anything meaningful can be said about what has been ‘learned’ in one lesson is a bit of a chimera.

  4. learningspy says:

    This is precisely the problem! Learning doesn’t happen in neat 1 hour blocks. If you look for it, or expect it, what you will see is cued response performance. Anyone can train a monkey!

  5. Bystander says:

    …. and so, at least if you think there’s anything in Hebbian learning, one can imagine that the repeated retrieval and encoding cycles implied by retesting would strengthen/multiply synaptic connections in the ‘memorise’ phase. I’m sure it’s all way more complex than this and nobody seems to say much about how retrieval might work at a neural level but at least it all sounds a bit more credible than Brain Gym.

  6. learningspy says:

    Haha! “at least it all sounds a bit more credible than Brain Gym.” What more could we ask for?

  7. mariusfrank says:

    Have a look at newqualithinking.net

    There is a group of educationalists trying to reconcile the tensions in the system!

    We would love clear thinking like this to inform progress!

    • learningspy says:

      Hi Marius – I’m sure you won’t remember but I briefly worked as an English/drama teacher at Bedminster Down back in 2002. How would you like me to be involved with newqualithinking?

  8. […] A curious peculiarity of our memory is that things are impressed better by active than by passive repetition. I mean that in learning (by heart, for example), when we almost know the piece, it pays…  […]

  9. dodiscimus says:

    This is really useful, David, thank you. I’m sufficiently inspired to think that we may add “Effect of Testing” to the topics our Secondary Science PGCE trainees pick from for their Masters-level assignments. I’ll have to find some time from somewhere to do the necessary additional reading but this post and that paper are a fantastic start.

    I can’t speak for all science teachers, but certainly in my own teaching I’ve tended to do quite a lot of regular testing. Following on (sort of) from our last discussion, I wonder whether this is more common in science than in English. However I’ve often been too casual about organising this – making up quick quizzes on the hoof – rather than carefully planning to achieve comprehensive coverage. I feel partly pleased that the evidence you are pointing to suggests that plenty of testing improves learning, even of the bits I failed to cover in my questions, and partly guilty that I didn’t put a bit more effort into doing this better.

    Here are a few questions that occur to me (I’ve skim read the paper you have cited and don’t think it answers these but maybe it does if read more carefully or if the references are followed).

    A lot of the research involves simple factual recall – foreign vocabulary, lists of words etc. Even the research using passages of text is probably matched to the subjects so that understanding the content wasn’t an issue. My feeling is that actually, improving retrieval, even of difficult, half-understood, or mis-understood ideas, helps enormously in hard-to-understand areas because it reduces cognitive load when grappling with problems later, but I don’t think any of the research cited backs up this hunch.

    Section 12 (fourth point) is important – the need to correct wrong answers, particularly if questions are T/F or MCQ. I can’t imagine a teacher not doing so but it is easy to be inspired to try something new and then implement incorrectly. Maybe that needs to be kept prominent.

    As before, I’m struggling to offer examples in English but grabbing ideas from http://pragmaticreform.wordpress.com/2014/04/06/3-apps-cognitive-science/ what happens if you test “which poetic techniques does Shelley use in Ozymandias and why?” and the pupil correctly answers about alliteration but actually thinks that “far away” is alliteration because of the repeated ‘a’? This makes me think that the choice of questions is more important in real teaching than in psychological research. Do you agree?

    And where I’m kind of going here is back to some gentle defence of AfL – in this case use of testing to diagnose and fix misconceptions and gaps in knowledge by adapting teaching. Section 10 quite strongly endorses this, doesn’t it?

  10. […] The Cult of Outstanding Everything we’ve been told about teaching is wrong and what to do about it! Testing & assessment: have we been doing the right things for the wrong reasons? […]

  11. […] We may all learn at different rates and in subtly different ways, but the spacing effect and the testing effect seem to apply to […]

  12. […] Testing & assessment – have we been doing the right things for the wrong reasons? – David Didau […]

  13. […] Testing and Assessment by @learningspy comprehensive review of whether assessment is for performance or actually for learning […]

  14. […] idea of testing and the nature of testing is explored in more detail by David Didau in one of his learning spy blogs, and this offers a link to research work by Roediger et al. Like […]

  15. […] we were told that the much vaunted Testing Effect (which I’ve written about here) has been effectively shown to be useless in improving the learning of ‘complex’ […]

  16. […] here is an understanding of the testing effect and how low stakes quizzes can be put together. Some important points to consider include the need […]

  17. […] Testing & assessment – have we been doing the right things for the wrong reasons? A curious peculiarity of our memory is that things are impressed better by active than by passive repetition. […]

  18. […] answer, plain and simple, is rote learning and regular low-stakes testing. From a young age, Chinese children spend literally hours writing and rewriting characters, again […]

  19. […] I’ve written a lot about the benefits of using stories over rote brushstroke repetition. But don’t mistake me: efficient memorisation techniques must still be combined with continual low-stakes testing and revisiting previously learnt characters. My stories were only half of the strategy. Despite the evocative scenes in my head, I wouldn’t have gotten anywhere without daily practice, recall and testing. Even when using the most powerful & fun memorisation techniques, learning must be consistently revisited – i.e. tested – if it is to be memorised. […]

  20. […] Assessing to boost retention. Beyond the value of formative assessment (to help a teacher decide what to teach) and summative assessment (to determine what students have learned), assessments that require students to recall material help information ‘stick’. This is usually referred to as the ‘testing effect‘. […]

  21. […] it’s sometimes called, retrieval practice. I’ve written about the testing effect before here and have discussed some of the recent research evidence in more depth here. But for those who are […]
