Increasing the generalizability of similarity-based essay scoring through cross-prompt training
Conference contribution (Article) › Research › Peer reviewed
Publication data
| Field | Value |
|---|---|
| Authors | Marie Bexte, Yuning Ding, Andrea Horbach |
| Original language | English |
| Published in | Proceedings of the 20th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2025) |
| Pages | 225–236 |
| Publisher | Association for Computational Linguistics |
| ISBN | 979-8-89176-270-1 |
| DOI/Link | https://aclanthology.org/2025.bea-1.17 |
| Publication status | Published (July 2025) |
In this paper, we address generic essay scoring, i.e., using training data from one writing task to score essays written for a different task. We approach this by generalizing a similarity-based essay scoring method (Xie et al., 2022) so that it can learn from texts written in response to a mixture of different prompts. In our experiments, we compare within-prompt and cross-prompt performance on two large datasets (ASAP and PERSUADE). We vary the number of prompts included in the training data and show that our generalized method substantially improves cross-prompt performance, especially as more prompts are used to form the training data. In the most extreme case, performance more than doubles, with quadratic weighted kappa (QWK) rising from .26 to .55.
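The QWK figures above measure agreement between gold and predicted essay scores, penalizing disagreements by the squared distance between score levels; it is the standard evaluation metric for ASAP. The following is an illustrative sketch, not code from the paper: it computes QWK for integer scores on a known range, and the function name and the choice to raise an error when kappa is undefined are assumptions. In practice, scikit-learn's `cohen_kappa_score(y1, y2, labels=..., weights="quadratic")` computes the same quantity when given the full score range via `labels`.

```python
from typing import Sequence


def quadratic_weighted_kappa(
    y_true: Sequence[int], y_pred: Sequence[int], min_score: int, max_score: int
) -> float:
    """Illustrative QWK for integer scores in [min_score, max_score] (hypothetical helper)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty score lists of equal length")
    if max_score <= min_score:
        raise ValueError("score range must contain at least two levels")
    n = max_score - min_score + 1

    # Observed confusion matrix: rows = gold score, columns = predicted score.
    observed = [[0.0] * n for _ in range(n)]
    for t, p in zip(y_true, y_pred):
        if not (min_score <= t <= max_score and min_score <= p <= max_score):
            raise ValueError("score outside the declared range")
        observed[t - min_score][p - min_score] += 1

    total = len(y_true)
    hist_true = [sum(row) for row in observed]                          # gold marginals
    hist_pred = [sum(observed[i][j] for i in range(n)) for j in range(n)]  # predicted marginals

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n):
        for j in range(n):
            # Quadratic disagreement weight, 0 on the diagonal, 1 at the extremes.
            w = (i - j) ** 2 / (n - 1) ** 2
            # Expected count under chance agreement (outer product of marginals).
            expected = hist_true[i] * hist_pred[j] / total
            weighted_observed += w * observed[i][j]
            weighted_expected += w * expected

    if weighted_expected == 0:
        raise ValueError("kappa is undefined: no expected disagreement")
    return 1.0 - weighted_observed / weighted_expected


if __name__ == "__main__":
    # One prediction off by a single level on a 1-4 scale: QWK = 11/12.
    print(quadratic_weighted_kappa([1, 2, 3, 4], [1, 2, 4, 4], 1, 4))
```

Because the weights grow quadratically, a prediction two levels off costs four times as much as one that is a single level off, which is why QWK rewards models that stay close to the gold score even when they miss it exactly.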