
Getting Authoring Right: How to innovate for meaningful improvement
Advances in eAssessment (Special Series)

By Alice Leigh, Shaun Crowley / November 2022

TYPE: OPINION

To anyone outside the world of assessment, it can seem surprising that developing items and tests is so hard. Why is it that items often need to go through several stages of review and editing, and then may need to be pre-tested on top? Why is it that tests need reviewing by so many different experts? Why does so much editing and re-editing take place? Some organizations have up to 60 possible steps before a test is published, with an elapsed time of 18 months or more [1]. Despite everything, many tests still need “one final change” several times after they should have been signed off.

Of course, as well as catching basic mistakes, much of this process is designed to ensure items are valid and reliable. Do they consistently assess the skill covered by the curriculum, whomever the student? Are the command words in items clear and consistent? Is the language level appropriate for the age and language capabilities of the students?

The pitfalls are numerous. If a translated version of a text is much longer than the source, does it become more difficult? If a maths item is based on a situation described in complex language, does it confuse an assessment of language skills with one of mathematics? Do items embodying assumptions about cultural background, class, or gender discriminate against groups of students? Are the images used in items clear, so questions don’t accidentally become assessments of image decoding instead of the intended skill? A good example of this sort of thing is the physics question assessing understanding of forces, based on a complex image of a fire-engine beside a burning building, ladder extended. Was the question about understanding the force of gravity, or was it also about decoding complex visual cues?

There is a large amount of literature devoted to these issues, and a great deal of training and guidance to help assessment authors understand how to develop high quality tests.

Thinking of this from a publishing perspective, however, another set of challenges arises, with a deep impact on assessment providers and their ability to innovate. These issues are less often written about.

Operational Challenges

As outlined above, delivering valid and reliable tests does require cycles of review—whether that’s only one or two stages, or multiple cycles, as is often the case in truly high stakes settings.

The challenge for assessment bodies is that they may be required to produce multiple assessments each year and each one must go through its own series of review steps. The multiplier effect is dramatic. Some organizations publish hundreds (or even thousands) of tests each year, with the result that there are multiple products going through multiple stages all at the same time. Sometimes this review process applies separately to each component of a test, and there can be a doubling up for a second language. Operationally, the number of “product/steps” becomes huge and hard to manage.
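As a purely illustrative back-of-the-envelope calculation (none of these figures describe a real organization), even modest assumptions produce a daunting tracking workload:

```python
# Purely illustrative figures: the number of "product/steps" an assessment
# body might have to track in a single year. None of these numbers come
# from a real organization.
tests_per_year = 300       # hypothetical annual output of test papers
components_per_test = 2    # e.g. a question paper plus a mark scheme
languages = 2              # e.g. an English and a second-language version
review_steps = 15          # review and editing stages per component

product_steps = tests_per_year * components_per_test * languages * review_steps
print(f"Product/steps to track per year: {product_steps:,}")  # 18,000
```

Even with these conservative assumptions, tens of thousands of individual steps need to be scheduled, performed, and checked off every year.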

A second level of operational challenge arises from the unique characteristics of each test.

In a well-run assessment body, the quality assurance steps required to take raw items to a final approved test will be documented. The purpose of each step will be written down so each user can be trained, and their performance audited. Because their content is often diverse, however, many organizations find that they need to run several different workflows. There may be one pathway for general item development and a second for maths items (which may need additional steps). There may be another for item-bank based test construction, another for the situation in which an author creates a whole paper, and another again for translations. The effect is that different tests may be running on differing tracks, each one working to different dates. This makes setting up an authoring cycle and tracking progress hard. Where dates and processes vary across a suite of products, how do you see if you are on track? How do you drill down to where the issues are so you can fix them?
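As a minimal sketch (with hypothetical pathway names and steps, not drawn from any real system), differing workflows can be described as data, so that the next outstanding step for any test can be queried rather than tracked by hand:

```python
# A minimal sketch: hypothetical authoring pathways described as data,
# so the next outstanding step for any test can be looked up directly.
from dataclasses import dataclass, field
from typing import List, Optional

WORKFLOWS = {
    "general_items": ["author", "subject review", "language review", "sign-off"],
    "maths_items": ["author", "subject review", "answer check", "language review", "sign-off"],
    "bank_built_test": ["select items", "assemble paper", "balance review", "sign-off"],
    "whole_paper": ["author paper", "subject review", "language review", "sign-off"],
    "translation": ["translate", "back-check", "layout check", "sign-off"],
}

@dataclass
class TestInProduction:
    name: str
    workflow: str
    completed: List[str] = field(default_factory=list)

    def next_step(self) -> Optional[str]:
        """Return the next outstanding step on this test's track, or None if finished."""
        for step in WORKFLOWS[self.workflow]:
            if step not in self.completed:
                return step
        return None

paper = TestInProduction("Grade 11 Physics Paper 2", "whole_paper", ["author paper"])
print(paper.next_step())  # "subject review"
```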

The unique characteristics of each test also mean that very specific skills are needed to work on them.

People outside the sector often think of education as a single, unified market. Anyone who has worked with educational content knows that in fact education is highly fragmented, both by level and subject. The consequence in assessment is that the people able to work on one subject are often unlikely to be able to create items for a second subject. Sometimes this effect even applies to test level—some people who write well for grade 6 would struggle at grade 11 or above. This all makes the operational challenge harder, as there are likely to be large numbers of users contributing to the authoring process. These contributors are often based in many separate locations, some dropping in and out as availability changes, and often only working on one step in a test’s development cycle.

A third set of challenges arises from the publishing requirements of tests themselves. Are the tests very clearly and consistently laid out, without distractions which make the questions more difficult? Are modified versions available to support users with access needs? Getting the final product right for print and online is hard and takes many assessment bodies a great deal of time. Print in particular can be a massive challenge and very costly.

Finally, and most obviously, challenges arise from the security requirements associated with assessment development. Not only should the minimum number of people see each item before it is approved, but it is also ideal if the item authors don’t know whether their items will go into the final test. This creates a challenge in test development. Access to tests themselves must also be strictly limited, with tight permissions control right through to the publishing stage.

All of this makes assessment development a very particular challenge—and arguably the most difficult job in publishing.

It’s hard to think of another field in which content must be developed at such a granular level. Test development needs tight security, consistently enforced across large and dispersed teams, throughout many cycles of review and approval [2]. The penalties if something goes wrong are high: a single leak can mean an exam is canceled, and a single error can mean a student receives the wrong grade, with their life chances affected as a result.

It is no surprise then that the process is time consuming and hard to manage. For many organizations, “tracking progress” involves users manually entering updates for each test at each stage into a reporting system. Pushing work from one user to the next can involve email alerts, and even sending physical packages. Getting the right next step for an item might require lookups to see which track it should be on. Final test publishing may involve re-keying, or even manual work in layout systems—sometimes to be followed by several further editorial iterations.

These challenges have wider consequences, however, and have severely limited the capacity of assessment bodies to innovate. This article argues that the advent of better technology to support the authoring process now holds the promise of reducing the operational barriers to change.

The Impact of Challenge: Hindering Innovation

The limitations arising from traditional processes became apparent during the COVID crisis. Traditional systems, being inflexible, made it hard to adapt the structure of tests late in the development process (for example, introducing more optionality or changing the coverage of a test). In some places even running the traditional process became difficult, as previously essential face-to-face contact had to end. Some assessment bodies operate in countries where, for extended periods, it was not even legal to travel from one town to the next, bringing traditional test development models to a halt.

The impact of the unwieldy nature of assessment development, particularly at scale, is felt far beyond COVID, with deep effects on the pace of change [3].

First, the current test development cycle often hinders innovation simply because it is so cyclical. Teams are extremely busy for large parts of the year, and the window to start new projects is often very tight. When this is added to the treacly nature of traditional processes, the effect is that desirable change can take a long time to introduce.

What priorities are assessment organizations most concerned about right now?

  • Separating item writers from tests. Often driven by regulatory pressures, there is widespread interest in reducing the number of people who know what’s in the final test. Delivering this highly desirable goal, however, requires deep change in the operational process, often through introducing item- or section-level authoring.
  • More frequent testing. Some organizations would like to offer more test sessions each year but find this hard when managing the production of each test is difficult.
  • Faster development cycles. Others would like to speed up test development, taking months off current cycle times to get products to market more quickly, either for commercial benefit or to respond educationally to changing needs. Several test development businesses we know talk of a 12- or 18-month cycle time to produce tests when the “on task” time might be a matter of days or a few weeks at most. Changing this rhythm is challenging.
  • Widening services. Across the industry, there is also a widespread interest in extending the scope of the services provided by assessment bodies, beyond a traditional focus on high stakes tests. Given the power of data to interpret assessment results, how can assessment bodies deepen their feedback to exam centers on candidates’ performance? Implementing this sort of service requires a rich data stream from the test development team which many traditional systems do not provide.
  • Question reuse. The interest in broadening the scope of the assessment providers’ services extends strongly to item reuse too, as well as to the provision of new assessment types. In many settings the exam publisher is one of the most prolific generators of content to support a curriculum, with banks of material building up year on year. Linking these banks to curriculum mapping and difficulty data opens new options for “checkpoint” testing and new formative products (see the illustrative sketch after this list). However, traditional systems, focused as they are on the “final test,” often don’t easily generate content in formats which make such solutions easy to implement. The content just isn’t available as flexible, fully mapped items with thorough usage tracking, and so it is hard to repurpose. Equally, managing item reuse across high stakes tests can only be achieved with high quality technology and control which traditional models did not provide. The operational friction means innovation is hard to deliver.
  • Futureproofing publishing. And then of course there is the move of assessment online. Once again operational complexity has an impact. The transition process from print itself can create obstacles that make the introduction of online testing harder. If authors have been used to writing tests, not items, is it easy to continue this working method in the new medium? What if the first iteration of e-testing requires dual medium working, needing some tests to be published on paper as well as online? Does the solution for building online tests support the tried and tested quality assurance process associated with paper, or does the test publisher find itself having to support two different QA regimes? 
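As an illustrative sketch only (the field names and reuse rule are hypothetical), this is the kind of item-level metadata (curriculum mapping, difficulty data, and usage history) that turns a bank into something new checkpoint or formative products can draw on:

```python
# An illustrative sketch: hypothetical item metadata with curriculum mapping,
# difficulty data, and usage history. With this in place, selecting items for
# a "checkpoint" test becomes a query rather than a manual trawl.
item_bank = [
    {"id": "ITM-0042", "curriculum": ["ALG-3.1"], "difficulty": 0.62, "used_in": ["2021-June-P1"]},
    {"id": "ITM-0107", "curriculum": ["ALG-3.1", "ALG-3.2"], "difficulty": 0.48, "used_in": []},
    {"id": "ITM-0215", "curriculum": ["GEO-1.4"], "difficulty": 0.71, "used_in": ["2022-Nov-P2"]},
]

def checkpoint_items(bank, curriculum_code, exclude_tests):
    """Items mapped to a syllabus point that have not appeared in the listed tests."""
    return [
        item for item in bank
        if curriculum_code in item["curriculum"]
        and not any(test in item["used_in"] for test in exclude_tests)
    ]

print([item["id"] for item in checkpoint_items(item_bank, "ALG-3.1", ["2021-June-P1"])])
# ['ITM-0107']
```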

This operational view of the pressures associated with test publishing, therefore, partly explains the slow pace of innovation among many assessment bodies.

Critically, however, what this does not suggest is that the careful, deeply considered traditional quality assurance model itself is broken. It is still essential to carry out thorough authoring processes. They need to utilize thoughtful, targeted qualitative review within approved workflows, embed real expertise in assessment design, and use quality checklists to confirm processes have been understood and followed. Furthermore, it is vital for test publishers to carry out thorough reviews of the submitted questions. They need to preview how content will appear to candidates, to complete scrutiny steps where users “take” tests before they are published, and to check test coverage and balance. The purpose of all this is to ensure assessments are valid and reliable, the absolute gold-dust of assessment. These are the non-negotiables, which set the bar for any change that the technology community may bring to the authoring process.

Technology for Test Publishing

Looking at the wider publishing world, outside assessment, many sectors have long since seen a technology revolution aimed at delivering more efficient processes and better marked-up, more reusable content. These systems rely on XML and HTML and have been developed with both print and online output in mind. The changes have been underway for at least 20 years and in many sectors are very well embedded.

Assessment has lagged, for reasons we have now discussed, but this is changing rapidly.  Key to this has been the growth in cloud-based computing, improvements in databases and workflow capability, the overall improvement in software usability (driven by consumer technology), and better integration tools. The impetus of COVID and the desire to move tests and processes increasingly online has also provided new drivers.  

The effect is that it is now possible to realize huge improvements in process control and efficiency, to enhance test security and to improve the quality model, thereby making innovation easier. As test publishers adopt new technology, the effect will be to stimulate wider innovation in all aspects of assessment, as barriers to change start to reduce.

What to Check for When Introducing Technology for Test Development

Getting the right solution in place is, of course, crucial. There are several criteria that should be considered as item and test development processes are moved into new technical systems.

  • Usability. The solution must be easy to use for the operators we described earlier—the physics reviewer who will only log in a couple of times a year and the maths author who rarely uses technology in their day job. It must also deliver for the core team, who have been used to managing complex processes, often with locally developed tracking systems. Being easy to use is essential.
  • Richness. One of the reasons why assessment has changed slowly is that the processes being followed are inherently rich. Item reuse rules, version management, the reviewer dialogue, asset approval, syllabus mapping, test blueprinting, maths formulae, the pre-test cycle, using "anchor" items, mark scheme creation, additional documents, booklets, rubrics, print layout rules, onscreen test presentation conventions and so on—they all matter. A solution that is 90% complete is unlikely to work. Good usability cannot come at the price of weak features.
  • Modularity. Exchanging one set of complex, manually generated processes for another hard-wired set of technology components which can’t be changed easily is to switch one set of constraints for another. To retain power over their suppliers, assessment bodies should insist on modular solutions so an authoring system can be changed independently of the onscreen delivery system, the results system, and the marking solution. This also allows the assessment body to take advantage of new systems that come on stream without having to change their entire business architecture.
  • Standards based. Standards here refer to the way content is marked up and held in the database—and they are particularly important. If content is stored using proprietary methods, it effectively gets “locked in” to the technical supplier’s systems. Switching it out into other solutions becomes difficult. Integrating it with other components in the assessment architecture, such as test players and learning management systems, is also challenging. Fortunately, there are very well-established technical standards for marking up assessment content (QTI) which support the interchange of assessment content between systems [4]; a minimal sketch follows this list. Not to use these is to ensure your content cannot be ported to other solutions. Open content standards should be at the heart of technical assessment systems.
  • Security. This is non-negotiable. Security should be considered not just in terms of technical security (how data is stored and transferred, multi-factor authentication, system resilience, anti-virus protection, etc.) but also in terms of the granularity of the roles and permissions supported by the software. More granular roles translate into greater control over who sees what. It’s also important to look at supplier processes and to ask whether the supplier has robust security processes across the organization, which might surface in a commitment to security-related standards.
  • Publishing agnostic. To achieve improved flexibility over what happens to content, assessment bodies should aim to find systems which make it possible to develop content for multiple output formats, whether that’s a high stakes test delivery system, a portal, a low stakes test player, or a print-ready PDF. This facility makes it possible to realize one of the goals we discussed earlier—finding ways to reuse and exploit a growing bank of well-constructed assessment material in new products and services.
  • Flexibility. The technology should not lead but facilitate. While much of the friction in current development processes can be swept away, many aspects of current practice are there for a good reason. Operating many workflows is useful. Whole test authoring is just as valid as item banking. Sometimes it is necessary to have more than one author working on a paper. Sometimes tests do need to share draft items with other tests. Additional booklets are sometimes essential. Some tests will use the classical model, while others follow an IRT route. Processes will change in the future, so flexibility is key.
  • Respect for validity and reliability. It is essential that any authoring system meets the needs of the quality assurance team so they can genuinely be confident it will support them in the critical task of producing valid and reliable tests. This is all about the quality of the review process, tracking review comments, investigating old versions of items and tests, clean and authentic preview screens, quality checklists, good usability, data downloads to validate test coverage and so on. Great technology that is weak on quality is not fit for purpose.
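To make the point about open standards concrete, here is a minimal sketch of a simplified QTI 2.x-style choice item (namespaces and many required attributes are omitted for brevity, and the item content is invented). Because the markup is an open XML format, it can be read back out with generic tooling rather than one supplier’s database:

```python
# A minimal sketch of a simplified QTI 2.x-style choice item (namespaces and
# many required attributes omitted for brevity; the item itself is invented).
# Open markup means any XML tooling can read the content back out.
import xml.etree.ElementTree as ET

ITEM_XML = """
<assessmentItem identifier="ITM-0042" title="Units of force">
  <responseDeclaration identifier="RESPONSE" cardinality="single" baseType="identifier">
    <correctResponse><value>B</value></correctResponse>
  </responseDeclaration>
  <itemBody>
    <choiceInteraction responseIdentifier="RESPONSE" maxChoices="1">
      <prompt>Which of these is a unit of force?</prompt>
      <simpleChoice identifier="A">Joule</simpleChoice>
      <simpleChoice identifier="B">Newton</simpleChoice>
      <simpleChoice identifier="C">Watt</simpleChoice>
    </choiceInteraction>
  </itemBody>
</assessmentItem>
"""

root = ET.fromstring(ITEM_XML)
prompt = root.find(".//prompt").text
choices = {c.get("identifier"): c.text for c in root.iter("simpleChoice")}
key = root.find(".//correctResponse/value").text

print(prompt)       # Which of these is a unit of force?
print(choices)      # {'A': 'Joule', 'B': 'Newton', 'C': 'Watt'}
print("Key:", key)  # Key: B
```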

What Next?

Technology change, driven by improved user experience, cloud computing, better security options, and advances in workflow, means the wider digital publishing revolution is finally available to assessment providers—fueled by new investment and system development. But technology alone does not deliver improvement.

Operational change may be technically enabled, but it needs to be directed at the right problems and led by the right strategic priorities. So, to get started in reforming your test production process, a good first step is to audit your current system to check for pain points and bottlenecks. What is holding you back? Where do current systems just not work? Change must target these blockers.

The second key step is to think hard about your strategic priorities. What do you want to do more of? What do you want to start, and what do you want to stop? How do you see your market and your role in it changing? How do you want to work in the future? Making sure your technology systems enable you to respond to your learners and market, and don’t constrain you, is key.   

Next, think about people. People make change happen, but they can also easily block it. Make sure when you implement technology you’re paying real attention to your users, checking the software works for them and that they feel a part of the change. People won’t be able to work smarter without a system that accounts for their needs.

Then when you start to use new technology, you can set criteria to judge its success.

References

[1] AQA. “Making an exam - a guide to creating a question paper.” YouTube video. May 15, 2014, https://www.youtube.com/watch?v=nQcSXv6PcXs

[2] Office of Qualifications and Examinations Regulation (Ofqual). Ofqual Handbook: General Conditions of Recognition. Section G - Setting and delivering the assessment. October 12, 2017. Updated May 12, 2022.

[3] Hazell, W. GCSEs and A-level exams: Ofqual to look at replacing pen and paper with computers in GCSE and A-level exams. i. May 4, 2022.

[4] IMS Global. Igniting Digital Assessment Innovation. (Accessed May 26, 2022); https://www.imsglobal.org/activity/qtiapip

About the Authors

Alice Leigh has worked in the educational publishing technology field for eight years, creating more effective communications within the formative and summative assessment sectors. As a marketing specialist, she works to aid the uptake of innovations in technology so that institutions and learners continually reap the benefits of advancements in assessment.

Shaun Crowley, Head of Sales and Marketing, GradeMaker. Shaun’s career spans 20 years working in education, from starting as an English teacher to managing marketing teams for an international educational publisher. Shaun has extensive experience working in international markets including the Middle East, Asia, Latin America and Europe. Before joining GradeMaker’s senior management team in July 2021 he helped to operationalize AQA’s international qualification business and Oxford University Press’s new curriculum service for international schools.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Copyright © ACM 2022 1535-394X/2022/11-3558396 $15.00 https://doi.org/10.1145/3558396



ADDITIONAL READING

Advances in eAssessment (Special Series) This series of articles covers advancements in eAssessment. The series features educators, developers, and researchers from around the world who are innovating how learning is assessed while meeting the challenges of efficiency, scalability, usability, and accessibility.
  1. Going Beyond Multiple Choice
  2. Centering All Students in Their Assessment
  3. Harnessing the Power of Natural Language Processing to Mass Produce Test Items
  4. Getting Authoring Right—How to Innovate for Meaningful Improvement
  5. Closing the Assessment Excellence Gap—Why Digital Assessments Should go Beyond Recall and be More Inclusive