Duration versus accuracy: What matters for computerized adaptive testing in schools?

Journal article | Research | Peer reviewed

Publication data


By: Nikola Ebenbeck, Morten Bastian, Andreas Mühling, Markus Gebhardt
Original language: English
Published in: Journal of Computer Assisted Learning, 40(6)
Pages: 3443-3453
Publisher: Wiley
ISSN: 1365-2729, 0266-4909
DOI/Link: https://doi.org/10.1111/jcal.13074 (Open Access)
Publication status: Published (December 2024)
Keywords: Computerised adaptive testing, assessment, special education, school, simulation

Background

Computerised adaptive tests (CATs) tailor the items they present to each test taker. This gives personalised, efficient and accurate measurement while reducing testing time, and the amount of time saved depends on the desired level of precision. Schools use different types of assessment. Depending on the area of application, these benefit to varying degrees from a substantial reduction in testing time, and each is affected differently by a loss of measurement accuracy. CAT can be implemented in several ways, and each approach may affect the resulting test length and accuracy.
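To illustrate how test length can depend on the desired precision, here is a common formulation under an assumed Rasch (one-parameter logistic) model. The abstract does not state which measurement model or stopping rule the study uses. For a pupil of ability θ and an item of difficulty b_i, the probability of a correct response, the item's information, and the approximate standard error of the ability estimate after k items are:

```latex
P_i(\theta) = \frac{1}{1 + e^{-(\theta - b_i)}}, \qquad
I_i(\theta) = P_i(\theta)\bigl(1 - P_i(\theta)\bigr), \qquad
\mathrm{SE}(\hat{\theta}) \approx \frac{1}{\sqrt{\sum_{i=1}^{k} I_i(\hat{\theta})}}
```

Testing stops once the standard error falls below a chosen threshold. A single Rasch item contributes at most 0.25 units of information, which happens when b_i = θ. A hypothetical threshold of SE ≤ 0.3 therefore needs a total information of at least 1/0.3² ≈ 11.1, that is, at least 45 perfectly targeted items. A looser threshold stops sooner, which is the trade-off between duration and accuracy.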

Objectives

We compare estimation-based CAT and binary-search-based CAT to determine how suitable each is for school assessment in terms of test length and measurement accuracy.
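What follows is a minimal sketch of a probabilistic, estimation-based CAT. It rests on assumptions the abstract does not confirm: a Rasch model, expected a posteriori (EAP) ability estimation with a standard normal prior, maximum-information item selection, and a stopping rule on the posterior standard deviation. The item bank, `se_target = 0.4` and all function names are hypothetical.

```python
import math
import random


def p_correct(theta, b):
    """Rasch (1PL) probability that a pupil of ability theta solves an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def eap_estimate(responses, grid_step=0.1, bound=4.0):
    """EAP ability estimate and posterior SD on a grid, assuming a N(0, 1) prior.

    responses is a list of (difficulty, solved) pairs.
    """
    n_points = int(round(2 * bound / grid_step)) + 1
    grid = [-bound + k * grid_step for k in range(n_points)]
    weights = []
    for t in grid:
        w = math.exp(-0.5 * t * t)  # unnormalised standard normal prior
        for b, solved in responses:
            p = p_correct(t, b)
            w *= p if solved else 1.0 - p  # multiply in the likelihood of each response
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(grid, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(grid, weights)) / total
    return mean, math.sqrt(var)


def estimation_cat(item_bank, answer, se_target=0.4, max_items=None):
    """Probabilistic CAT: maximum-information item selection with an SE stopping rule."""
    remaining = list(item_bank)
    if max_items is None:
        max_items = len(remaining)
    responses = []
    theta, se = 0.0, float("inf")
    while remaining and len(responses) < max_items and se > se_target:
        # Rasch item information p(1 - p) peaks where difficulty equals the current estimate.
        b = max(remaining, key=lambda d: p_correct(theta, d) * (1.0 - p_correct(theta, d)))
        remaining.remove(b)
        responses.append((b, answer(b)))
        theta, se = eap_estimate(responses)  # re-estimate from all responses so far
    return theta, se, len(responses)


# Usage: one simulated pupil answering a hypothetical 60-item bank.
rng = random.Random(1)
bank = [-3.0 + 6.0 * k / 59 for k in range(60)]
true_theta = 0.5
theta_hat, se, n_used = estimation_cat(
    bank, lambda b: rng.random() < p_correct(true_theta, b))
print(f"estimate={theta_hat:.2f}  SE={se:.2f}  items={n_used}/{len(bank)}")
```

Each step re-estimates ability from every response so far, so later items can correct for an early slip. The price is that more items are needed before the estimate is precise enough to stop.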

Methods

This study uses simulations based on empirical data from a cohort of pupils with and without special needs (n = 400) to examine how probabilistic estimation-based CAT and deterministic binary-search-based CAT affect the length and accuracy of an adaptive reading test for pupils of different ability levels.
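The deterministic alternative can be sketched as a plain binary search over an item bank sorted by difficulty. This is one plausible reading of "binary-search-based CAT", not the study's documented procedure. The function names, the 60-item bank and the convention of returning a boundary rank are assumptions.

```python
import math
import random
from typing import Callable, Sequence, Tuple


def binary_search_cat(difficulties: Sequence[float],
                      answer: Callable[[float], bool]) -> Tuple[int, int]:
    """Deterministic CAT as a binary search over items sorted by difficulty.

    Returns (boundary, items_asked). boundary is the number of items treated as
    solvable: items ranked below it are assumed solved, the rest assumed failed.
    """
    items = sorted(difficulties)
    lo, hi = 0, len(items) - 1  # current search window of item indices
    asked = 0
    while lo <= hi:
        mid = (lo + hi) // 2  # present the middle item of the window
        asked += 1
        if answer(items[mid]):
            lo = mid + 1  # solved: drop this item and every easier one
        else:
            hi = mid - 1  # failed: drop this item and every harder one
    return lo, asked


def p_correct(theta: float, b: float) -> float:
    """Rasch probability of success, used only to simulate a noisy pupil."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


# Usage: a hypothetical 60-item bank answered by a simulated pupil with ability 0.5.
rng = random.Random(1)
bank = [-3.0 + 6.0 * k / 59 for k in range(60)]
boundary, asked = binary_search_cat(bank, lambda b: rng.random() < p_correct(0.5, b))
print(f"boundary rank={boundary}  items asked={asked}/{len(bank)}")
```

With n items the search asks at most ⌊log₂ n⌋ + 1 questions (6 for the 60-item bank), which illustrates how this approach can shorten tests so drastically. No earlier answer is ever revisited, however. A single careless error or lucky guess therefore permanently moves the search into the wrong half of the bank, which is one plausible source of lower accuracy.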

Results and Conclusions

Estimation-based CAT reduces test length by 40% with an average accuracy of r = 0.96, while binary-search-based CAT reduces test length by up to 88% with an average accuracy of r = 0.83. Both methods show that CAT is applicable in educational settings. The practical advantages and disadvantages of each method for learning environments are discussed, along with which method is best suited to specific assessment needs.
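Figures like the two reported above can be computed as follows. This sketch makes two assumptions that the abstract does not confirm. It treats test reduction as the mean share of items not administered relative to the full-length test. It treats r as a Pearson correlation between CAT estimates and some reference score, for example the full-test estimate or the simulated true ability. Function names and all numbers in the usage lines are made up.

```python
import math
from typing import Sequence


def length_reduction_pct(items_used: Sequence[int], full_length: int) -> float:
    """Average percentage of the full-length test that the CAT did not administer."""
    mean_used = sum(items_used) / len(items_used)
    return 100.0 * (1.0 - mean_used / full_length)


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between CAT estimates x and reference scores y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Usage with made-up numbers: three pupils and a hypothetical 60-item full test.
print(length_reduction_pct([30, 36, 42], 60))  # mean of 36 items used out of 60
print(pearson_r([-1.2, 0.1, 0.9], [-1.0, 0.4, 1.0]))
```

Reporting both numbers together matters because either one alone is easy to optimise. A very short test can still correlate poorly with the reference, and a highly accurate test may save little time.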