ASLAN at BEA 2026 Shared Task 2: Voting Across Scoring Paradigms

Paper in conference proceedings · Research · Peer-reviewed

Publication data


Authors: Marie Bexte, Yuning Ding, Josef Ruppenhofer, Nils-Jonathan Schaller, Daniel Ignacio Mora Melanchthon, Torsten Zesch, Andrea Horbach
Original language: English
Published in: Proceedings of the 21st Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2026). Association for Computational Linguistics.
Publisher: Association for Computational Linguistics
ISBN: 979-8-89176-409-5
Publication status: Published, July 2026

This paper describes the ASLAN system contribution to the BEA 2026 Shared Task on rubric-based short answer scoring for German (Gombert et al., 2026). We investigate three complementary modeling paradigms: similarity-based scoring, instance-based classification, and rubric-prompted large language models (LLMs). For the unseen answers track, where test answers belong to prompts observed during training, we compare question-specific and generic scoring models as well as ensemble variants. For the unseen questions track, where models must generalize to previously unseen prompts, we primarily rely on zero-shot LLM-based scoring using the scoring rubrics. Our experiments show that similarity-based models outperform both instance-based and LLM-based models in the unseen answers setting. In addition, we find that ensemble methods improve robustness over individual models.
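The title indicates that predictions from the different scoring paradigms are combined by voting. However, the abstract does not specify the similarity function, the voting rule, or how ties are handled. The following is a minimal, hypothetical sketch built on two assumptions.

- Similarity-based scoring is approximated by assigning the label of the most similar previously scored answer (1-nearest neighbour). Token-set Jaccard overlap stands in for whatever similarity model the system actually uses.
- The ensemble is a hard majority vote over one label per paradigm. On ties it falls back to a designated model's label.

The names `jaccard`, `similarity_score`, and `majority_vote`, as well as the toy answers, are illustrative only.

```python
from collections import Counter


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard overlap; a hypothetical stand-in for a learned similarity model."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def similarity_score(answer: str, scored_refs: list[tuple[str, int]]) -> int:
    """Return the label of the most similar already-scored answer (1-nearest neighbour)."""
    _, label = max(scored_refs, key=lambda ref: jaccard(answer, ref[0]))
    return label


def majority_vote(labels: list[int], tie_break: int) -> int:
    """Hard majority vote over one label per paradigm; any tie returns `tie_break`."""
    counts = Counter(labels).most_common()
    top = counts[0][1]
    winners = [label for label, count in counts if count == top]
    return winners[0] if len(winners) == 1 else tie_break


# Toy data (hypothetical): previously scored answers to one prompt.
refs = [("die membran kontrolliert den transport", 2), ("die zelle hat eine wand", 0)]
sim_label = similarity_score("die membran kontrolliert transport", refs)
print(sim_label)  # 2
# Hypothetical labels from the instance-based and LLM-based models for the same answer.
print(majority_vote([sim_label, 2, 1], tie_break=sim_label))  # 2
print(majority_vote([0, 1, 2], tie_break=0))  # 0 (three-way tie)
```

Breaking ties in favour of the similarity-based label is one plausible choice, since that paradigm performed best in the unseen answers setting. Other schemes, such as weighting each vote by the model's validation performance, would fit the same interface.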