Instruction-tuned large-language models for quality control in automatic item generation: A feasibility study
Journal article › Research › peer-reviewed
Publication details
| By | Guher Gorgun, Okan Bulut |
| Original language | English |
| Published in | Educational Measurement: Issues and Practice, 44(1) |
| Pages | 96-107 |
| Publisher | Wiley-Blackwell |
| ISSN | 1745-3992 |
| DOI/Link | https://doi.org/10.1111/emip.12663 |
| Publication status | Published – 03.2025 |
Automatic item generation can supply many items to assessment and learning environments instantly and efficiently. Yet evaluating item quality remains a bottleneck for deploying generated items in learning and assessment settings. In this study, we investigated the utility of large language models, specifically Llama 3-8B, for evaluating automatically generated cloze items. The trained large language model accurately identified the majority of both good and bad items. Evaluating items automatically with instruction-tuned LLMs may help educators and test developers gauge the quality of generated items efficiently and at scale. LLM-based item evaluation may also serve as an intermediate step between item creation and field testing, reducing the cost and time associated with multiple rounds of revision.
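To make the evaluation workflow concrete, the sketch below shows one plausible way to prompt an instruction-tuned LLM to rate a generated cloze item and to turn its reply into a keep/reject decision. This is an illustrative assumption, not the authors' actual pipeline: the prompt wording, the GOOD/BAD verdict format, and the `model_fn` stub are all hypothetical, and a real run would route the prompt to an instruction-tuned model such as Llama 3-8B.

```python
# Illustrative sketch (not the authors' pipeline): prompt an instruction-tuned
# LLM to rate a cloze item, then parse its reply into a keep/reject decision.

def build_eval_prompt(stem: str, answer: str) -> str:
    """Format a cloze item as an evaluation instruction for the LLM."""
    return (
        "You are an assessment expert. Rate the following cloze item.\n"
        f"Stem: {stem}\n"
        f"Answer: {answer}\n"
        "Reply with exactly one word: GOOD or BAD."
    )

def parse_verdict(response: str) -> bool:
    """Return True (keep the item) only if the model's reply starts with GOOD."""
    return response.strip().upper().startswith("GOOD")

def evaluate_item(stem: str, answer: str, model_fn) -> bool:
    """model_fn stands in for a call to an instruction-tuned LLM
    (e.g. Llama 3-8B); stubbed here so the sketch is self-contained."""
    return parse_verdict(model_fn(build_eval_prompt(stem, answer)))

if __name__ == "__main__":
    stub = lambda prompt: "GOOD"  # placeholder for the real LLM call
    keep = evaluate_item("Water boils at ____ degrees Celsius.", "100", stub)
    print(keep)
```

In a deployment, items rejected at this step would be revised or discarded before field testing, which is where the cost savings described above would come from.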