While it’s apparent there were research design challenges with HEQCO’s recent attempt to examine the usefulness of a basic skills test as a post-secondary outcomes measure (see a commentary in University Affairs and my own post), it’s more challenging to recognize the problems with the test tool itself.
A recent blog post from Higher Education Strategy Associates argues that the test tool, Education and Skills Online (ESO), could be a useful performance measure for PSE as long as sampling is more rigorous and the results can be viewed alongside other tests, such as the SAT.
In fact, compared to some of the other performance measures the government of Ontario is considering, I would say it is among the better ones.
The argument disregards two fundamental problems with the use of this tool and any similar tools derived from the PIAAC test.
First, there is no empirical basis to use test scores as a predictor of job performance or academic performance. No basis whatsoever. This is why the current manager, the test designer and a key test developer have all stated that the score, on its own, can’t be used to predict one’s performance in everyday life. No cut-offs for employment or program entry have been empirically established.
Second, the construct itself is a problem, as it was designed to provide information about textual proficiency that was detached from the type of feedback provided by other standardized language and literacy tests often used in education. In other words, this is a model of literacy (really, just reading) intentionally designed unlike any other model of reading.
Misalignment between ESO proficiency and actual performance and practices
To use an ESO test score as a predictor of actual academic and job performance, the score would need to be compared with grades and students’ performance on the job. A rigorous cohort study that tracked 1,000 adults without a high school diploma over 10 years found very little change in their test scores overall, even as participants worked in various occupations or eventually completed high school. Despite the cohort limitations, it is a comprehensive study with valuable insights about the use of a PIAAC-type test tool (see Scaling Up and Moving In: Connecting social practices views to policies and programs in adult education).
While population test scores showed little change over the ten-year period, individual scores from multiple waves of testing did show some increases and decreases, depending on life events and wider economic forces. Importantly, the students’ involvement in education had little impact on their scores. This happened even though participants acquired new literacy and numeracy practices—that is, they acquired digital skills, completed their GED, read more complex texts, etc. In other words, the test did not detect changes in what adults actually did with their developing literacy repertoires on the job, at school and in their day-to-day lives.
Here is what Stephen Reder, the study’s principal investigator and a prominent adult literacy academic, concludes:
The proficiency and practices measures are subject to quite different dynamics of change…There is thus a major misalignment between the effects programs are having on their students’ literacy and numeracy development, on one hand, and the short-term proficiency gains for which programs are accountable under the dominant policy and funding regimes. As the stakes rise in these accountability schemes, such misalignments are likely to produce substantial distortions in educational practice.
Why are proficiency and practices not aligned? The construct or model of test development behind the test used in the longitudinal study (a precursor to ESO) and the broader PIAAC initiative is fundamentally different from models used to develop tests typically used in academic settings.
A distinct construct thwarts comparison with other measures, perverts pedagogy and unfairly impacts particular student groups
While the ESO construct derived from PIAAC works for what it was designed to do—compare test results with results from questionnaires to explore socioeconomic relationships—once the model is carried into educational environments using ESO, it introduces a series of distortions and pedagogical perversions that are counter-productive and systemically unfair.
Validity in one context does not imply validity in another context. To make this assumption, one has to make additional assumptions—that the texts and processes used to respond to textual demands in the ESO are the same as real life, and when different, the knowledge is readily transferable. However, both assumptions are incorrect.
With regard to the transferability of literacy skills and strategies, only the most fundamental skills, such as phonetics and sentence structures, readily transfer from one textual context to another. More complex skills and knowledge have to be re-contextualized and even re-learned. Reading a newspaper article, interpreting the hidden social meaning in an email message and deriving key information and insights from a textbook are very distinct skill sets, dependent on background knowledge, experience, understandings of social conventions, specialized language, text structures, etc.
With regard to the similarity of texts and processes, the ESO construct is unique and disassociated from actual textual problem-solving skills and other academic tests.
ESO assesses superficial textual dexterity and short-term memory
Although ESO is referred to as a test of literacy, numeracy and problem-solving, these comprehensive terms obscure the fact that it is primarily a reading test. The other part of literacy, producing text, arguably the more important part in today’s workplaces, is not tested. In the numeracy portion, some basic calculations must be performed to complete the reading items. In the problem-solving portion (admittedly already outdated), test-takers are presented with texts in online environments and must apply their knowledge of screen navigation to respond to reading questions.
The type of reading tested is a superficial scanning technique. Test-takers are prompted to find one or two bits of information, avoid distracting information (words and phrases intentionally designed to confuse) and supply a one-word or single-phrase response. They must have excellent short-term memory skills as they scan through the various texts, with an average Grade 8 readability, locating the correct response to produce a match.
Think of the model as an eye-hand coordination test for the information age. Rather than manipulate pegs on a peg board, test-takers manipulate bits of information in highly controlled textual environments.
Also needed are solid test-taking skills. Not only will test-takers have to deal with distractors but they will also need to adjust their typical problem-solving and sense-making abilities. Initially, the test items appear to be familiar, inviting people to apply acquired strategies when viewing similar items in everyday life. However, a sensible approach will prove frustrating, and test-takers will need to have enough experience and test-taking resilience to make necessary adjustments.
A related problem with the construct is that it does not draw on language development indicators, such as vocabulary development or sentence construction or reading comprehension skills developed in K-12. Nor does it draw on strategies one might learn to read for academic purposes developed in PSE. Perversely, its theoretical basis is an error analysis of test-taking skills and not reading development indicators!
Although the reading technique is superficial, the test-item design conventions, reliance on short-term memory, scanning processes, distractors, unfamiliar texts, thwarting of typical problem-solving and disassociation from other educational tests combine to make completion challenging. Arguably, one needs a solid formal education at the secondary level in the language of the test simply to complete Level 1 items. (There are three additional levels of difficulty.)
So what happens when this construct is used in high-stakes decision-making in PSE? A series of counter-productive, confounding and compromising responses will likely follow.
- ESO results won’t align neatly with the results from other tests or even grades, since the underlying construct and the overall pedagogical approach are fundamentally different.
- ESO skills (quick scanning to find bits of information to complete the question) are counter-productive to the development of skills valued and needed in PSE and afterwards—that is deep, careful reading, connecting text to experience and other texts, and the application of text-informed knowledge.
- Superficial scanning skills may become more valued, depending on the stakes, and more time may be devoted to their development, pushing aside what really matters.
- If stakes are low for students, many will blow off the test. If stakes are high for institutions and low for students, students will likely be pressured to do well. The professional judgement of educators will be compromised.
- Those who will likely struggle with the unique testing construct are non-traditional students, out of school for some time with rusty test-taking skills, and multilingual students studying in English or French for the first time. They may then be the targets of ESO instructional remediation.
- The very students who need more academic literacy supports could potentially receive less.
Like a faulty or biased algorithm, the ESO reading model, lifted out of its original context and into a learning context, will unleash a series of systemic inequities and counter-productive decisions and actions.
There is a need for careful consideration and critique from test design, reading development and postsecondary curriculum experts who grasp the implications of high-stakes standardized testing and its unintended consequences.