Increasing the generalizability of similarity-based essay scoring through cross-prompt training

Conference paper · Research · Peer-reviewed

Publication data


By: Marie Bexte, Yuning Ding, Andrea Horbach
Original language: English
Published in: Proceedings of the 20th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2025)
Pages: 225–236
Publisher: Association for Computational Linguistics
ISBN: 979-8-89176-270-1
DOI/Link: https://aclanthology.org/2025.bea-1.17 (Open Access)
Publication status: Published, July 2025

In this paper, we address generic essay scoring, i.e., using training data from one writing task to score essays written for a different task. We approach this by generalizing a similarity-based essay scoring method (Xie et al., 2022) so that it can learn from texts written in response to a mixture of different prompts. In our experiments, we compare within-prompt and cross-prompt performance on two large datasets (ASAP and PERSUADE). We vary the number of prompts included in the training data and show that our generalized method substantially improves cross-prompt performance, especially as more prompts are used to form the training data. In the most extreme case, performance more than doubles, with quadratic weighted kappa (QWK) rising from .26 to .55.
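The results above are reported as QWK, which the abstract does not define. The sketch below implements the standard quadratic weighted kappa as commonly used for ASAP-style scoring. It is not the authors' evaluation code; it assumes integer scores within a known range `[min_rating, max_rating]`, and the function name `quadratic_weighted_kappa` is hypothetical. In practice, scikit-learn's `cohen_kappa_score(y1, y2, weights="quadratic")` computes the same quantity.

```python
def quadratic_weighted_kappa(y_true, y_pred, min_rating, max_rating):
    """Quadratic weighted kappa between two integer score lists (illustrative sketch)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("score lists must be non-empty and of equal length")
    n = max_rating - min_rating + 1
    if n < 2:
        raise ValueError("need at least two distinct score levels")

    # Observed agreement matrix: rows = gold scores, columns = predicted scores.
    observed = [[0] * n for _ in range(n)]
    for t, p in zip(y_true, y_pred):
        observed[t - min_rating][p - min_rating] += 1

    total = len(y_true)
    # Marginal score histograms for the gold and predicted scores.
    hist_true = [sum(observed[i][j] for j in range(n)) for i in range(n)]
    hist_pred = [sum(observed[i][j] for i in range(n)) for j in range(n)]

    numerator = 0.0    # weighted observed disagreement
    denominator = 0.0  # weighted disagreement expected by chance
    for i in range(n):
        for j in range(n):
            weight = (i - j) ** 2 / (n - 1) ** 2  # quadratic penalty for distance
            expected = hist_true[i] * hist_pred[j] / total
            numerator += weight * observed[i][j]
            denominator += weight * expected

    if denominator == 0.0:
        # All scores fall into a single level on both sides: treat as perfect agreement.
        return 1.0
    return 1.0 - numerator / denominator


if __name__ == "__main__":
    print(quadratic_weighted_kappa([1, 2, 3, 4], [1, 2, 3, 4], 1, 4))  # perfect agreement
    print(quadratic_weighted_kappa([1, 2, 3], [3, 2, 1], 1, 3))        # fully reversed scores
```

A QWK of 1 indicates perfect agreement with the gold scores, 0 indicates agreement at chance level, and negative values indicate worse-than-chance agreement. The quadratic weights penalize predictions more heavily the further they fall from the gold score.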