Testing teacher judgments comprehensively: Accuracy, halo, frame of reference, strategy, and personality effects in holistic and analytic assessments of student essays

Journal article, Research, Peer reviewed

Publication data


By: Julian Franz Lohmann, Flavio Lötscher, Fynn Junge, Stefan Keller, Johanna Fleckenstein, Thorben Jansen, Jens Möller
Original language: English
Published in: Journal of Educational Psychology
Pages: 20
Editor (Publisher): American Psychological Association
ISSN: 0022-0663, 1939-2176
DOI/Link: https://doi.org/10.1037/edu0000969 (Open Access)
Publication status: Published, advance online – 08.2025

The present study examined teacher judgment accuracy and bias in text assessment. We (a) investigated differences in judgment accuracy depending on multiple text quality criteria, (b) explored teacher characteristics and judgment behavior as potential moderators of judgment accuracy, and (c) juxtaposed accuracy and typical judgment biases, namely reference group and halo effects. A sample of N = 6,300 judgments (Level 1) from N = 315 German and Swiss preservice English teachers (Level 2) was analyzed using hierarchical linear models. The results predominantly imply a medium to strong relative judgment accuracy (rank component) across analytic and holistic text ratings. Moderation analyses showed that relative judgment accuracy was higher when teachers spent more time and switched more often between essays to achieve consistent judgments but tended to be lower for teachers reporting more text assessment experience. Language quality and content triggered halo effects in judgments of other text quality dimensions. Mean essay quality had a negative effect on teacher judgments, implying a reference-group effect in line with the contrast hypothesis. Although statistically significant, biases were much smaller than coefficients representing accuracy. We discuss the implications of our findings with regard to teacher education.