Testing & assessment – have we been doing the right things for the wrong reasons?
A curious peculiarity of our memory is that things are impressed better by active than by passive repetition. I mean that in learning (by heart, for example), when we almost know the piece, it pays better to wait and recollect by an effort from within, than to look at the book again. If we recover the words in the former way, we shall probably know them the next time; if in the latter way, we shall very likely need the book once more.
William James, The principles of psychology (1890)
Never stop testing, and your advertising will never stop improving.
Tests are rubbish, right? Like me, you may find yourself baring your teeth at the thought of being drilled to death, or inflicting endless rounds of mind-numbing tests on your students. That’s no way to learn, is it? All that’s going to do is produce ‘inert knowledge’ that will just sit there and be of no use whatsoever, right? Wrong. Apparently, the ‘retrieval practice’ of testing actually helps us induce “readily accessible information that can be flexibly used to solve new problems.”
Most tests are conducted in order to produce summative information on how much students have learned and as such have (possibly rightly) attracted lots of ire. But maybe this is a very narrow way to view the humble test.
In my post on desirable difficulties I reported the following nugget:
We think we know more than in fact we do. For instance you may well have some pretty fixed ideas about testing. Which of these study patterns is more likely to result in long term learning?
1. study study study study – test
2. study study study test – test
3. study study test test – test
4. study test test test – test
Most of us will pick 1. It just feels right, doesn’t it? Spaced repetitions of study are bound to result in better results, right? Wrong. The most successful pattern is in fact No. 4. Having just one study session, followed by three short testing sessions – and then a final assessment – will outperform any other pattern.
This is something I’ve only just begun to research and experiment with, but the implications are fascinating. One of the first things I needed to reconsider was what might constitute a test. That is to say, I had to move away from the limited definition of testing being merely a pen and paper based exercise conducted under exam conditions. Testing can (and should) include some of the tricks and techniques we’ve been misusing and misunderstanding as AfL for the past 10 years or so. In fact, it doesn’t really matter how you test students as long as your emphasis changes; testing should not be primarily used to assess the efficacy of your teaching and students’ learning, it should be used as a powerful tool in your pedagogical armoury to help them learn.
Maybe this is really obvious and everyone else has always understood the fundamental point of classroom assessment, but I don’t think so. Everything I’ve read (and I’ve read a fair bit) indicates that the point of AfL is to find out what students have learned and to adjust your teaching to fill in any gaps. This deficit model means that teachers (and students) might be labouring under some quite fundamental misunderstandings.
1) The Input/Output Myth – what teachers teach, students learn. Learning appears to be waaaay more complicated than this myth suggests.
2) Classroom performance equates with student learning. It doesn’t. Learning takes place over time and can only be inferred from performance.
3) Students will retain what they’ve learned. They won’t. Students will forget the vast majority of what you teach and what they do remember will be largely unique to individuals.
If we just carry on waving our lolly sticks about, festooning students with Post-it notes and smugly getting them to fill in exit passes, what will we accomplish? Well, if cognitive science is correct about the human mind and how it learns, the answer might be: precious little.
So, should we chuck out the baby with this particularly gritty bathwater? How about if instead we rethought the purpose of assessment and considered how our AfL toolkits might actually benefit learning instead of just monitoring performance.
This paper on Ten Benefits of Testing and Their Applications to Educational Practice is a good starting point. The benefits are organised into direct effects on retention and indirect benefits on meta-cognition, teaching and learning. Whilst all are interesting and worth perusing, for the purposes of this post I’m just going to discuss how I’ve been trying to use the direct benefits of testing.
The Testing Effect: retrieval aids later retention – this is the claim made above that studying material once and testing three times leads to about 80% better retention than studying three times and testing once. The research evidence suggests that it doesn’t matter whether people are asked to recall individual items or passages of text, testing beats restudying every time. Now, we all know that cramming for a test works, but what these studies show is that testing leads to a much increased likelihood of information being retained over the long term. The implication is that if we want our students to learn whatever it is we’re trying to teach them we should test them on it regularly. And by regularly I mean every lesson. What if every lesson began with a test of what students had studied the previous lesson? Far from finding it dull, most students actually seem to enjoy this kind of exercise. And if you explain to them what you’re up to and why, they get pretty excited at seeing whether the theory holds water. And what of accusations that this might lead to instances of the Hawthorne Effect? Frankly my dear, I couldn’t give a damn! I’m not a researcher and I’m not trying to prove anything; I just want to take advantage of something that’s already been proven.
Testing causes students to learn more from the next study episode – this is also pleasingly referred to as ‘test-potentiated learning’. Basically it means that having followed a Study Test Test Test (STTT) pattern of lessons, the next STTT pattern will result in even better retention: the more tests you do, the better you are at learning!
This particular field of study belongs to Chizuko Izawa, who began by investigating whether learning was actually taking place during testing. She examined three hypotheses:
- During a test students will neither learn nor forget
- Learning and forgetting could occur during a test
- Taking a test might influence the amount of learning during a future study session.
Guess what? Propositions 1 and 3 turn out to be correct. But doesn’t this contradict The Testing Effect? Well, apparently not; the testing effect can be interpreted as a slowing of forgetting after the test. And the real kicker is that this potential improvement occurs whether or not students get any feedback on their tests!
Testing improves transfer of knowledge to new contexts – this one is the Grail! One of the myths in Daisy Christodoulou’s new book Seven Myths About Education is that we should teach transferable skills. She argues the following:
Skills are tied to domain knowledge. If you can analyse a poem, it doesn’t mean you can analyse a quadratic equation, even though we apply the word ‘analysis’ to each activity. Likewise with evaluation, synthesis, explanation and all the other words to be found at the top of Bloom’s Taxonomy. When we see people employing what we think of as transferable skills, what we’re probably seeing is someone with a wide-ranging body of knowledge in a number of different domains.
But what if testing could improve the transferability of skills and knowledge? What then? Can retrieval really help the transferability of knowledge?
Let’s start by defining ‘transfer’. How about, “applying knowledge learned in one situation to a new situation”? And let’s be a little more cautious than the example of ‘far transfer’ given by Daisy above. Can we teach students how to analyse non-fiction texts and then expect them to be able to analyse poetry? This is a real bugbear of mine because, frustratingly, it’s hard. Within a ‘skills-based’ subject like English we ought to be able to do this. But, year after year, I’ve found myself stymied by students’ damnable inability to see that analysing in one context is exactly the same as analysing in another. Rebranding the skill as ‘zooming in’ has helped but it’s still an uphill struggle; they need constant prodding and reminding.
Ebbinghaus was experimenting with the transferability of skills way back in 1885, and more recently Barnett and Ceci (2002) went as far as proposing a taxonomy for transfer studies which attempts to describe the dimensions against which transfer of a learned skill might be assessed.
So could testing make the difference? There have been a number of different studies on the effects of testing on the ability to transfer skills, and there’s lots of evidence for ‘near transfer’. Butler (2010) has shown that ‘far transfer’ (transfer to new questions in different knowledge domains) may also be possible:
In this experiment, subjects studied prose passages on various topics (e.g., bats; the respiratory system). Subjects then restudied some of the passages three times and took three tests on other passages. After each question during the repeated tests, subjects were presented with the question and the correct answer for feedback. One week later subjects completed the final transfer test. On the final test, subjects were required to transfer what they learned during the initial learning session to new inferential questions in different knowledge domains (e.g., from echolocation in bats to similar processes used in sonar on submarines).
The results showed that subjects were more likely to correctly answer a transfer question when they had answered the corresponding question during initial testing. Is this conclusive? Maybe not, but it’s compelling. I don’t think my teaching of analysis in English is going to result in my students being better able to analyse quadratic equations, but if it helps them transfer between non-fiction and poetry I’ll be chuffed.
Testing can facilitate retrieval of material that was not tested – yes, you heard it: taking a test will help you remember even the stuff that wasn’t actually tested. This concept of ‘retrieval-induced facilitation’ sounds almost magical and seems at odds with Bjork’s theory of ‘retrieval-induced forgetting’. But the contradiction only exists in the short term; more rounds of re-testing and a longer delay before the final test (at least 24 hours) result in clear improvements in recall of material that was not tested in the STTT pattern of learning.
I’m right at the beginning of all this, but it looks like testing is the way forward if I want to make sure my students remember (that is to say, learn) the stuff I’m teaching them. I’ve already started getting students to summarise what they’ve learned in a paragraph at the end of each lesson and setting homework designed to test students’ recall of lesson content. Also I’ve begun tinkering around with concept maps to see how they can be used as testing tools.
At the beginning of the year I was preparing to junk a lot of what I’d come to believe was best practice. Turns out, all I need to get rid of are my misconceptions about what assessment for learning might actually be for. Maybe it really could be for learning and not just performance!
 Roediger et al, Psychology of Learning and Motivation, Volume 55, 2011