To anyone outside the world of assessment, it can seem surprising that developing items and tests is so hard. Why is it that items often need to go through several stages of review and editing, and then may need to be pre-tested on top? Why is it that tests need reviewing by so many different experts? Why does so much editing and re-editing take place? Some organizations have up to 60 possible steps before a test is published, with an elapsed time of 18 months or more [1]. Despite everything, many tests still need “one final change” several times after they should have been signed off.
Of course, as well as catching basic mistakes, much of this process is designed to ensure items are valid and reliable. Do they consistently assess the skill covered by the curriculum, whoever the student may be? Are the command words in items clear and consistent? Is the language level appropriate for the age and language capabilities of the students?
The pitfalls are numerous. If a translated version of a text is much longer than the source, does it become more difficult? If a maths item is based on a situation described in complex language, does it confuse an assessment of language skills with one of mathematics? Do items embodying assumptions about cultural background, class, or gender discriminate against groups of students? Are the images used in items clear, so questions don’t accidentally become assessments of image decoding instead of the intended skill? A good example is the physics question assessing understanding of forces, based on a complex image of a fire engine beside a burning building, ladder extended. Was the question about understanding the force of gravity, or was it also about decoding complex visual cues?
There is a large body of literature devoted to these issues, and a great deal of training and guidance to help assessment authors understand how to develop high-quality tests.
Thinking of this from a publishing perspective, however, another set of challenges arises, one with a deep impact on assessment providers and their ability to innovate. These issues are less often written about.
As outlined above, delivering valid and reliable tests does require cycles of review—whether that’s only one or two stages, or multiple cycles, as is often the case in truly high-stakes settings.
The challenge for assessment bodies is that they may be required to produce multiple assessments each year and each one must go through its own series of review steps. The multiplier effect is dramatic. Some organizations publish hundreds (or even thousands) of tests each year, with the result that there are multiple products going through multiple stages all at the same time. Sometimes this review process applies separately to each component of a test, and there can be a doubling up for a second language. Operationally, the number of “product/steps” becomes huge and hard to manage.
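To see how dramatic the multiplier is, consider a back-of-envelope calculation. Every figure below is an illustrative assumption, not a number drawn from any particular organization:

```python
# Back-of-envelope illustration of the "product/steps" multiplier.
# Every figure here is an assumption chosen for illustration only.
tests_per_year = 300   # assumed annual output of test papers
review_steps = 10      # assumed QA steps per component
components = 2         # e.g. question paper plus mark scheme
languages = 2          # doubling up for a second language

product_steps = tests_per_year * review_steps * components * languages
print(f"{product_steps:,} product/steps in flight each year")  # 12,000
```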
A second level of operational challenge arises from the unique characteristics of each test.
In a well-run assessment body, the quality assurance steps required to take raw items to a final approved test will be documented. The purpose of each step will be written down so each user can be trained, and their performance audited. Because their content is often diverse, however, many organizations find that they need to run several different workflows. There may be one pathway for general item development and a second for maths items (which may need additional steps), another for item-bank based test construction, another again for the situation in which an author creates a whole paper, and yet another for translations. The effect is that different tests may be running on differing tracks, each one working to different dates. This makes setting up an authoring cycle and tracking progress hard. Where dates and processes vary across a suite of products, how do you see if you are on track? How do you drill down to where the issues are so you can fix them?
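As a way of making the differing-tracks problem concrete, here is a minimal sketch of how multiple workflows might be represented in software. The track and step names are hypothetical, invented for this illustration:

```python
# A minimal sketch of differing review tracks; names are hypothetical.
WORKFLOWS = {
    "general_item": ["author", "review", "edit", "approve"],
    "maths_item":   ["author", "review", "answer_check", "edit", "approve"],
    "translation":  ["translate", "back_check", "review", "approve"],
}

def next_step(track: str, current: str) -> str | None:
    """Return the step after `current` on the given track, or None when done."""
    steps = WORKFLOWS[track]
    position = steps.index(current)
    return steps[position + 1] if position + 1 < len(steps) else None

print(next_step("maths_item", "review"))  # -> answer_check
```

Even in this toy form, the answer to "what happens next?" depends on which track an item sits on, which is exactly what makes progress tracking across a suite of products so hard.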
The unique characteristics of each test also mean very specific skills are needed to work on them.
People outside the sector often think of education as a single, unified market. Anyone who has worked with educational content knows that in fact education is highly fragmented, both by level and by subject. The consequence in assessment is that the people able to work on one subject are often unlikely to be able to create items for a second subject. Sometimes this effect even applies at test level—some people who write well for grade 6 would struggle at grade 11 or above. This all makes the operational challenge harder, as there are likely to be large numbers of users contributing to the authoring process. These contributors are often based in many separate locations, some dropping in and out as availability changes, and often working on only one step in a test’s development cycle.
A third set of challenges arises from the publishing requirements of tests themselves. Are the tests clearly and consistently laid out, without distractions which make the questions more difficult? Are modified versions available to support users with access needs? Getting the final product right for print and online is hard and takes many assessment bodies a great deal of time. Print in particular can be a massive challenge, and very costly.
Finally, and most obviously, challenges arise from the security requirements associated with assessment development. Not only should the minimum number of people see each item before it is approved, but it is also ideal if item authors do not know whether their items will go into the final test. This creates a challenge in test development. Access to tests themselves must also be strictly limited, with tight permissions control right through to the publishing stage.
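As an illustration of what tight permissions control might look like, the sketch below maps hypothetical roles to the development stages they are allowed to see. The roles, stages, and rules are illustrative only, not a description of any specific product's security model:

```python
# Hypothetical role-to-stage access rules, for illustration only.
PERMISSIONS = {
    "author":    {"draft"},                   # sees only their own drafts
    "reviewer":  {"draft", "review"},
    "setter":    {"review", "construction"},  # assembles the final paper
    "publisher": {"construction", "publish"},
}

def can_access(role: str, stage: str) -> bool:
    """Check whether a role may see content at a given stage."""
    return stage in PERMISSIONS.get(role, set())

assert can_access("reviewer", "review")
assert not can_access("author", "publish")  # authors never see the final test
```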
All of this makes assessment development a very particular challenge—and arguably the most difficult job in publishing.
It’s hard to think of another field in which content must be developed at such a granular level. Test development needs tight security, consistently enforced across large and dispersed teams, throughout many cycles of review and approval [2]. There are high penalties if something goes wrong: a single leak can mean an exam is canceled, and a single error can mean a student receives the wrong grade, affecting their life chances.
It is no surprise then that the process is time consuming and hard to manage. For many organizations, “tracking progress” involves users manually entering updates for each test at each stage into a reporting system. Pushing work from one user to the next can involve email alerts, and even sending physical packages. Getting the right next step for an item might require lookups to see which track it should be on. Final test publishing may involve re-keying, or even manual work in layout systems—sometimes to be followed by several further editorial iterations.
These challenges have wider consequences, however, and have severely limited the capacity of assessment bodies to innovate. This article argues that the advent of better technology to support the authoring process now holds the promise of reducing the operational barriers to change.
The limitations arising from traditional processes became apparent during the COVID crisis. Traditional systems, being inflexible, made it hard to adapt the structure of tests late in the development process (for example, introducing more optionality or changing the coverage of a test). In some places even running the traditional process became hard, as previously essential face-to-face contact had to end. Some bodies operate in countries where, for extended periods, it was not even legal to travel from one town to the next, bringing traditional test development models to a halt.
The impact of the unwieldy nature of assessment development, particularly at scale, is felt far beyond COVID with deep effects on the pace of change [3].
First, the current test development cycle often hinders innovation simply because it is so cyclic. Teams are extremely busy for whole periods of the year, and the window to start new projects is often very tight. When this is added to the treacly nature of traditional processes, the effect is that desirable change can take a long time to introduce.
[Figure: What priorities are assessment organizations most concerned about right now?]
This operational view of the pressures associated with test publishing, therefore, partly explains the slow pace of innovation among many assessment bodies.
Critically, however, what this does not suggest is that the careful, deeply considered traditional quality assurance model itself is broken. It is still essential to carry out thorough authoring processes: thoughtful, targeted qualitative review within approved workflows, real expertise embedded in assessment design, and quality checklists to confirm processes have been understood and followed. Furthermore, it is vital for test publishers to carry out thorough reviews of the submitted questions: to preview how content will appear to candidates, to complete scrutiny steps where users "take" tests before they are published, and to check test coverage and balance. The purpose of all this is to ensure assessments are valid and reliable, the absolute gold-dust of assessment. These are the non-negotiables, which set the bar for any change the technology community may bring to the authoring process.
Looking at the wider publishing world, outside assessment, many sectors have long since seen a technology revolution aimed at delivering more efficient processes and better marked-up, more reusable content. These systems rely on XML and HTML and have been developed with both print and online output in mind. The changes have been underway for at least 20 years and in many sectors are very well embedded.
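To illustrate the single-source principle, here is a minimal sketch in which one marked-up item is rendered to both plain text (for print proofing) and HTML (for on-screen delivery). The element names are invented for this example; real systems would use a standard such as IMS QTI [4]:

```python
# One illustrative marked-up item rendered to two outputs.
# Element names are invented here; they are not a real schema.
import xml.etree.ElementTree as ET

item = ET.fromstring(
    "<item id='phys-042'>"
    "<stem>A ladder rests against a wall. Which force holds it in place?</stem>"
    "<choice key='A' correct='true'>Friction</choice>"
    "<choice key='B'>Magnetism</choice>"
    "</item>"
)

# Rendering 1: plain text, e.g. for a print proof.
print(item.findtext("stem"))
for choice in item.findall("choice"):
    print(f"  ({choice.get('key')}) {choice.text}")

# Rendering 2: HTML, e.g. for online delivery.
html = "<p>{}</p><ol>{}</ol>".format(
    item.findtext("stem"),
    "".join(f"<li>{c.text}</li>" for c in item.findall("choice")),
)
print(html)
```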
Assessment has lagged, for reasons we have now discussed, but this is changing rapidly. Key to this has been the growth in cloud-based computing, improvements in databases and workflow capability, the overall improvement in software usability (driven by consumer technology), and better integration tools. The impetus of COVID and the desire to move tests and processes increasingly online has also provided new drivers.
The effect is that it is now possible to realize huge improvements in process control and efficiency, to enhance test security and to improve the quality model, thereby making innovation easier. As test publishers adopt new technology, the effect will be to stimulate wider innovation in all aspects of assessment, as barriers to change start to reduce.
Getting the right solution in place is of course essential. There are several key criteria that should be considered as item and test development processes are moved into new technical systems.
Technology change, driven by improved user experience, cloud computing, better security options, and advances in workflow, means the wider digital publishing revolution is finally available to assessment providers—fueled by new investment and system development. But technology alone does not deliver improvement.
Operational change may be technically enabled, but it needs to be directed at the right problems and led by the right strategic priorities. So, to get started in reforming your test production process, a good first step is to audit your current system to check for pain points and bottlenecks. What is holding you back? Where do current systems just not work? Change must target these blockers.
The second key step is to think hard about your strategic priorities. What do you want to do more of? What do you want to start, and what do you want to stop? How do you see your market and your role in it changing? How do you want to work in the future? Making sure your technology systems enable you to respond to your learners and market, and don’t constrain you, is key.
Next, think about people. People make change happen, but can also easily block it. Make sure when you implement technology that you are paying real attention to your users, checking the software works for them and that they feel part of the change. People won’t be able to work smarter without a system that accounts for their needs.
Then when you start to use new technology, you can set criteria to judge its success.
[1] AQA. Making an exam - a guide to creating a question paper. YouTube video. May 15, 2014; https://www.youtube.com/watch?v=nQcSXv6PcXs
[2] Office of Qualifications and Examinations Regulation (Ofqual). Ofqual Handbook: General Conditions of Recognition, Section G - Setting and delivering the assessment. October 12, 2017; updated May 12, 2022.
[3] Hazell, W. GCSEs and A-level exams: Ofqual to look at replacing pen and paper with computers in GCSE and A-level exams. i. May 4, 2022.
[4] IMS Global. Igniting Digital Assessment Innovation. (Accessed May 26, 2022); https://www.imsglobal.org/activity/qtiapip
Alice Leigh has worked in the educational publishing technology field for eight years, creating more effective communications within the formative and summative assessment sectors. As a marketing specialist, she works to aid the uptake of innovations in technology so that institutions and learners continually reap the benefits of advancements in assessment.
Shaun Crowley, Head of Sales and Marketing, GradeMaker. Shaun’s career spans 20 years working in education, from starting as an English teacher to managing marketing teams for an international educational publisher. Shaun has extensive experience working in international markets including the Middle East, Asia, Latin America and Europe. Before joining GradeMaker’s senior management team in July 2021 he helped to operationalize AQA’s international qualification business and Oxford University Press’s new curriculum service for international schools.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Copyright © ACM 2022 1535-394X/2022/11-3558396 $15.00 https://doi.org/10.1145/3558396