Don’t score too early! Evaluating argument mining models on incomplete essays

Paper in conference proceedings · Research · Peer-reviewed

Publication details


By: Nils-Jonathan Schaller, Yuning Ding, Thorben Jansen, Andrea Horbach
Original language: English
Published in: Proceedings of the 20th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2025)
Pages: 345–355
Publisher: Association for Computational Linguistics
ISBN: 979-8-89176-270-1
DOI/Link: https://aclanthology.org/2025.bea-1.27/ (Open Access)
Publication status: Published, July 2025

Students' argumentative writing benefits from automated feedback, particularly when it is delivered throughout the writing process. Argument Mining (AM) technology shows promise for delivering automated feedback on argumentative structures; however, existing systems are frequently trained on completed essays. Although completed essays provide rich context information, concerns have been raised about how useful systems trained on them are for supporting writers on incomplete texts during the writing process. This study evaluates the robustness of AM algorithms on artificially fragmented learner texts from two large-scale corpora of secondary school essays: the German DARIUS corpus and the English PERSUADE corpus. Our analysis reveals that token-level sequence-tagging methods, while highly effective on complete essays, degrade significantly when the context is limited or misleading. Conversely, sentence-level classifiers remain relatively stable under such conditions. We show that deliberately training AM models on fragmented input substantially mitigates these context-related weaknesses, enabling AM systems to better support dynamic educational writing scenarios.