Lesson planning with ChatGPT for inquiry-based biology instruction – A(I) roll of the dice?
Journal article › Research › Peer reviewed
Publication data
| By | Leroy Großmann, Maren Koberstein-Schwarz, Dirk Krüger, Moritz Krell |
| Original language | English |
| Published in | International Journal of Science Education |
| Editor (Publisher) | Taylor and Francis Ltd. |
| ISSN | 0950-0693, 1464-5289 |
| DOI/Link | https://doi.org/10.1080/09500693.2025.2567509 |
| Publication status | Published (advance online) – 10.2025 |
This study investigates the capability of ChatGPT-4o to generate
high-quality inquiry-based science lesson plans, that is, aligning all
elements of a written lesson plan to students’ learning about
procedural and epistemic aspects of science instead of gaining
subject matter knowledge. Using an exploratory sequential mixed
methods design, we analysed N = 60 biology lesson plans
generated by the research team across four key topics (cell
biology, genetics/evolution, human biology, ecology) and five
scientific inquiry practices (microscopy, observing, experimenting,
modelling, reflecting the nature of science) from the German
curriculum. First, lesson plans were quantitatively evaluated using
a modified version of Großmann and Krüger’s [(2024). Assessing
the quality of science teachers’ lesson plans: Evaluation and
application of a novel instrument. Science Education, 108(1), 153–
189. https://doi.org/10.1002/sce.v108.1] scoring rubric, which
assessed ten quality criteria with substantial interrater agreement
(Cohen’s κ = .67). Second, based on the score distribution, we
conducted qualitative analyses to identify strengths and
weaknesses in ChatGPT-generated inquiry-based lesson plans.
Results revealed considerable variation in lesson plan quality, with
n = 22 lesson plans achieving less than 50% of the maximum
possible score and only n = 5 lesson plans reaching 75% or higher.
While the lesson plans demonstrated particular strengths in
content accuracy and learning outcome alignment, they exhibited
significant weaknesses in addressing students’ inquiry-related
conceptions and maintaining consistent focus on scientific inquiry.
While ChatGPT-4o successfully generated some high-quality
lesson plans for inquiry-based instruction, many lesson plans
shifted inappropriately towards subject matter knowledge
acquisition rather than scientific inquiry processes. This discrepancy
between our initial prompts and the resulting lesson plans
indicates that while large language models like ChatGPT-4o
demonstrate potential as preliminary planning tools, they
necessitate thorough evaluation and substantive modification by
teachers possessing robust pedagogical content knowledge. The
findings emphasise that the advent of generative artificial
intelligence does not diminish the importance of professional
knowledge in teacher education but rather transforms how this
knowledge is applied in lesson planning processes.