Through the Sentence Lens: Explainable Essay Scoring through Fine-Grained Predictions

Conference contribution (Article), Research, Peer reviewed

Publication data


By: Daniel Ignacio Mora Melanchthon, Stefan Keller, Andrea Horbach
Original language: English
Published in: Proceedings of the 21st Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2026)
Editor (Publisher): Association for Computational Linguistics
Publication status: Published – 07.2026

Beyond performance, model transparency is a crucial factor in Automated Essay Scoring, yet current systems often lack explainability, limiting their pedagogical value and users' trust. Existing explainability methods, such as gradient-based attribution or feature-importance approaches, either produce counterintuitive explanations or are too complex for classroom use. To address this limitation, we use fine-grained prediction at the sentence level to enhance explainability. We propose ablation strategies to derive sentence-level pseudo scores from essay-level gold scores and use them to train sentence-level models. We evaluate their performance against essay-level baselines on two datasets (ASAP and MEWS) and compare their sentence-level output to a human baseline. Results indicate a trade-off between essay-level performance and sentence-level granularity. For the language quality trait, most sentence-level models achieve performance comparable to the essay-level baseline, whereas for content, the approach yields more positive results on prompts with shorter student texts.
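
The abstract does not specify which ablation strategies are used. As a rough illustration of the general idea only, the sketch below assumes a leave-one-sentence-out ablation: each sentence is removed in turn, the essay is re-scored, and the drop in score is treated as that sentence's contribution. The names `toy_scorer`, `leave_one_out_deltas`, and `pseudo_scores` are hypothetical, the scorer is a toy stand-in for a trained essay-level model, and the rule for shifting the gold score is an assumption, not the authors' method.

```python
from typing import Callable, List, Sequence

Scorer = Callable[[Sequence[str]], float]


def toy_scorer(sentences: Sequence[str]) -> float:
    """Hypothetical stand-in for an essay-level model: counts distinct words."""
    words = {w.lower().strip(".,;:!?") for s in sentences for w in s.split()}
    words.discard("")
    return float(len(words))


def leave_one_out_deltas(sentences: Sequence[str], scorer: Scorer) -> List[float]:
    """Score drop caused by removing each sentence in turn."""
    full = scorer(sentences)
    deltas = []
    for i in range(len(sentences)):
        ablated = list(sentences[:i]) + list(sentences[i + 1:])
        deltas.append(full - scorer(ablated))
    return deltas


def pseudo_scores(
    sentences: Sequence[str],
    gold: float,
    scorer: Scorer,
    lo: float,
    hi: float,
) -> List[float]:
    """Assumed rule: shift the essay-level gold score by how far each
    sentence's contribution lies from the mean, then clip to [lo, hi]."""
    deltas = leave_one_out_deltas(sentences, scorer)
    if not deltas:
        return []
    mean = sum(deltas) / len(deltas)
    return [min(hi, max(lo, gold + d - mean)) for d in deltas]


essay = ["The cat sat.", "The cat sat.", "Dogs bark loudly."]
print(pseudo_scores(essay, gold=3.0, scorer=toy_scorer, lo=1.0, hi=6.0))
```

In this toy setup, the duplicated sentences add no new words, so they fall below the essay's gold score, while the one sentence that adds new material rises above it. Pseudo scores of this kind could then serve as training targets for a sentence-level model.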