Self-assessment accuracy in the age of artificial intelligence: Differential effects of LLM-generated feedback

Journal article · Research · Peer-reviewed

Publication details


Authors: Lucas Wilhelm Liebenow, Fabian T.C. Schmidt, Jennifer Meyer, Johanna Fleckenstein
Original language: English
Published in: Computers & Education, 237, Article 105385
Publisher: Elsevier
ISSN: 0360-1315, 1873-782X
DOI/Link: https://doi.org/10.1016/j.compedu.2025.105385 (Open Access)
Publication status: Published (November 2025)

Feedback is a promising intervention to foster students’ self-assessment accuracy (SAA), but its effect can vary depending on students’ initial skill levels or prior performance. In particular, lower-performing students who are less accurate might benefit more from feedback in terms of SAA. To deepen this understanding, the present study investigated the mechanisms and dependencies of feedback effects on SAA in the context of large language models (LLMs). In a randomized controlled experiment, we examined the effect of LLM-generated feedback on SAA, considering students’ initial performance and initial SAA as potential moderators. A sample of N = 459 upper secondary students wrote an argumentative essay in English as a foreign language and then revised their text. After finishing their first draft (pretest) and again after revising it (posttest), students self-assessed their writing performance. Students in the experimental group received GPT-3.5-turbo-generated feedback on their first draft during their revision. In the control group, students could revise their text without feedback. Our results indicated no significant main effect of LLM-generated feedback on students’ SAA. However, we found a significant interaction effect between feedback and students’ pretest SAA on SAA changes, indicating that lower-calibrated students improved their SAA more with feedback than students with similar pretest SAA who received no feedback. Exploratory analyses revealed that students with higher pretest SAA did not improve their SAA with feedback; instead, their SAA decreased. We discuss this nuanced evidence and draw implications for research and practice using LLM-generated feedback in education.