Lesson planning with ChatGPT for inquiry-based biology instruction – A(I) roll of the dice?

Journal article · Research · Peer reviewed

Publication data


By Leroy Großmann, Maren Koberstein-Schwarz, Dirk Krüger, Moritz Krell
Original language: English
Published in: International Journal of Science Education
Publisher: Taylor and Francis Ltd.
ISSN: 0950-0693, 1464-5289
DOI/Link: https://doi.org/10.1080/09500693.2025.2567509 (Open Access)
Publication status: Published online ahead of print, October 2025

This study investigates the capability of ChatGPT-4o to generate high-quality inquiry-based science lesson plans, that is, lesson plans in which all elements are aligned with students’ learning about procedural and epistemic aspects of science rather than with the acquisition of subject matter knowledge. Using an exploratory sequential mixed methods design, we analysed N = 60 biology lesson plans generated by the research team across four key topics (cell biology, genetics/evolution, human biology, ecology) and five scientific inquiry practices (microscopy, observing, experimenting, modelling, reflecting on the nature of science) from the German curriculum. First, lesson plans were quantitatively evaluated using a modified version of Großmann and Krüger’s [(2024). Assessing the quality of science teachers’ lesson plans: Evaluation and application of a novel instrument. Science Education, 108(1), 153–189. https://doi.org/10.1002/sce.v108.1] scoring rubric, which assessed ten quality criteria with substantial interrater agreement (Cohen’s κ = .67). Second, based on the score distribution, we conducted qualitative analyses to identify strengths and weaknesses in ChatGPT-generated inquiry-based lesson plans. Results revealed considerable variation in lesson plan quality, with n = 22 lesson plans achieving less than 50% of the maximum possible score and only n = 5 lesson plans reaching 75% or higher. The lesson plans demonstrated particular strengths in content accuracy and learning outcome alignment but exhibited significant weaknesses in addressing students’ inquiry-related conceptions and in maintaining a consistent focus on scientific inquiry. Although ChatGPT-4o successfully generated some high-quality lesson plans for inquiry-based instruction, many lesson plans shifted inappropriately towards subject matter knowledge acquisition rather than scientific inquiry processes. This discrepancy between our initial prompts and the resulting lesson plans indicates that, while large language models like ChatGPT-4o show potential as preliminary planning tools, their output requires thorough evaluation and substantive modification by teachers with robust pedagogical content knowledge. The findings emphasise that the advent of generative artificial intelligence does not diminish the importance of professional knowledge in teacher education but rather transforms how this knowledge is applied in lesson planning processes.
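The interrater agreement reported in the abstract is Cohen’s κ, which corrects the observed agreement p_o between two raters for the agreement p_e expected by chance: κ = (p_o − p_e) / (1 − p_e). The Python sketch below illustrates that computation for two raters scoring the same items on a nominal scale. It is an illustration only: the function name `cohens_kappa` and the example ratings are hypothetical, they are not the study’s data or analysis code, and the abstract does not say whether a weighted variant of κ was used, so only the unweighted form is shown.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Unweighted Cohen's kappa for two raters who scored the same items.

    Hypothetical helper for illustration; not the authors' analysis code.
    """
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must score the same, non-empty set of items")
    n = len(rater_a)

    # Observed agreement p_o: share of items on which both raters gave the same score.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement p_e: for each category, multiply the two raters'
    # marginal proportions, then sum over categories.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical scores (0, 1, 2) from two raters on ten rubric items.
    rater_1 = [2, 1, 0, 2, 2, 1, 0, 1, 2, 0]
    rater_2 = [2, 1, 0, 2, 1, 1, 0, 2, 2, 0]
    print(round(cohens_kappa(rater_1, rater_2), 2))  # prints 0.7
```

In this made-up example the raters agree on 8 of 10 items (p_o = .80), but because chance agreement is already p_e = .34, κ drops to about .70. This is why κ is reported instead of raw percentage agreement.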