AI tackling the Physics Olympiad: what are the implications for physics education?

On August 13, Dr. Paul Tschisgale, a researcher in the IPN's Department of Physics Education, published two intriguing pieces. One is a paper in the highly regarded journal Physical Review Physics Education Research (PRPER), in which Paul and his co-authors show that currently available AI models such as GPT-4o and the new o1 reasoning model are highly competent at solving Physics Olympiad problems – and indeed perform, on average, better than human school students. These results raise urgent questions about AI's future role in competitions for young learners such as the Physics Olympiad, alongside wider questions about school education more generally.

Paul has also published an opinion piece in the American Physical Society's Physics magazine, setting out why it's not enough simply to discourage school students from using AI – and how we can support them in using AI productively in the physics classroom.

We caught up with Paul Tschisgale to talk about the two publications and the background to and potential implications of his research.

Interview

IPN: Paul, in your research you’ve been looking at how well currently available AI models can handle tasks from the Physics Olympiad. What about your findings surprised you most?

Paul Tschisgale: Even before I began the study, I did expect large language models such as GPT‑4o to do relatively well with questions from the Physics Olympiad. But I was surprised at how well they actually did – particularly with problems purposely written to challenge even the best-performing learners. This came out especially clearly with one of the newer, optimized reasoning models, which performed much more strongly than we had expected it to. Another unexpected finding was that using different prompting strategies barely made a difference, although other studies had often found this to be decisive. And our detailed analysis showed that the models consistently generated incorrect solutions to particular parts of problems – sometimes these were precisely the parts that human learners found comparatively easy.

IPN: What are the concrete implications of your findings for schools and competitions? Do we need to rethink formats like the Physics Olympiad?

Paul Tschisgale: Our findings highlight the new challenges that the advent of AI poses for schools and for competition formats such as the Physics Olympiad. Task formats in which students work without supervision, such as take-home tests or homework, are vulnerable to learners having AI systems like ChatGPT generate solutions without engaging with them – or even submitting those solutions as their own work. That said, simply banning AI is not an adequate response to this difficulty. Instead, we should focus on teaching students to use it in a considered and responsible manner. The main thing is to make sure learners don't just get AI to solve problems and then use those solutions – if you simply put a physics problem into an LLM and copy down the answer it gives you, for one, you miss out on key learning processes, and for another, the solution isn't your own work. Specific, targeted use of AI, on the other hand, opens up new possibilities for learning. For example, I could imagine greater use of AI-based feedback or intelligent tutoring systems in the future, giving learners immediate, individualized feedback on their work. During practice and consolidation in the classroom, students would get direct responses to their proposed solutions, while teachers would be freed up to provide targeted support where it's needed.

Traditionally, competitors complete the first round of the Physics Olympiad at home. That opens the round up to the use of AI and undermines fair comparison of the participants' performance: if some competitors solve the problems on their own while others simply let AI do the work and submit its solutions without thinking about them, the competition's integrity is at risk. This is especially concerning because one of the competition's aims is to recognize and value learners' individual performance and achievements. The short-term response is to use types of problems that current LLMs still struggle with, such as analyzing diagrams, extracting relevant information from figures and charts, and conducting experiments. In the longer term, however, relying on LLMs' current weaknesses won't be enough, given the speed of development in this field – only recently, OpenAI released GPT-5, which it says brings together and surpasses all of its previous models.

We'll need to rethink the competition and move to new formats that don't just call for the kind of solutions AI can generate, but instead require continuous engagement with physics. One option might be an online course run over a longer period, incorporating exercises, seminars, and experiments, with a final exam that you can only take if you've participated actively in the course. This would put the emphasis on exploring physics – and, as well as replacing the format of simply submitting solutions to problems, it would create a setting in which AI could be used productively as a tool for formative feedback. So we're really just at the beginning of a development that will have a significant long-term impact not only on competition formats, but on how learning happens in schools more generally.

IPN: Methodologically speaking, how can you validly compare the performance of LLMs on Physics Olympiad tasks with that of human learners?

Paul Tschisgale: We ensured comparability by using exactly the same problems that competitors had tackled in previous editions of the Physics Olympiad. Because LLMs don't always produce identical answers, we ran each problem through the AI twenty times for each model and each prompting strategy. Crucially, we assessed the AI's answers using precisely the same criteria that had been used to assess the competitors' responses. This gave us distributions of point scores that we could compare directly with the students' results.
That said, we need to remember that the competitors in the competition's later rounds were working under exam conditions – limited time, stress, perhaps fatigue. The LLMs got to generate their answers under "lab conditions," so to speak. So it's not a perfect comparison, but it does give us a robust sense of how the AI models perform relative to high-achieving participants in the Physics Olympiad.
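
To make this evaluation design concrete, here is a minimal sketch in Python of the repeated-sampling protocol Paul describes: each problem is run twenty times per model and prompting strategy, each response is scored, and the resulting score distributions can then be compared with the students'. The functions query_model and score_response are hypothetical placeholders, not the study's actual code – in the study, answers were scored against the official competition criteria.

    import statistics

    N_RUNS = 20  # each problem is run twenty times per model and strategy

    def evaluate(problems, models, strategies, query_model, score_response):
        """Collect a score distribution for every (model, strategy) pair.

        query_model(problem, model, strategy) -> str   (hypothetical LLM call)
        score_response(problem, response) -> float     (scoring against the
                                                        official rubric)
        """
        results = {}
        for model in models:
            for strategy in strategies:
                scores = []
                for problem in problems:
                    for _ in range(N_RUNS):  # repeat to capture output variability
                        response = query_model(problem, model, strategy)
                        scores.append(score_response(problem, response))
                results[(model, strategy)] = scores
        return results

    def summarize(scores):
        # Summary statistics that can be set against the human score distribution
        return {"mean": statistics.mean(scores), "stdev": statistics.stdev(scores)}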

IPN: What might sensible and productive use of LLMs in the classroom look like?

Paul Tschisgale: This might, for example, look like students critically analyzing AI-generated solutions to various problems and comparing them with their own answers. That could help them recognize where AI can help – and where its limits lie. It also shows that the human ability to reflect on and check things remains essential. We could also use LLMs as the basis for systems that give learners individualized, adaptive feedback. Especially when students are practicing and consolidating what they have learned, this could make things easier for teachers: learners would get feedback on the spot, and the teacher would have more time to support those who need it. Scenarios like this show that AI isn't just a risk – used in the right way, it can genuinely supplement learning.

IPN: What do you think are the most important skills for school students to learn for using AI – alongside curricular content in physics, of course?

Paul Tschisgale: As well as curricular content, it’s crucial that students see AI as a tool and learn how to use it in a thoughtful way. This entails knowing about the strengths and weaknesses of LLMs and being able to judge when using them might be helpful and where their limits are. In the classroom, students should practice critically checking AI-generated answers, using them as a basis for their own further work, and going through iterative processes with AI tools. It’s these sorts of tasks that demonstrate how vital it still is for students to have a proper grasp of the concepts; if they don’t, they can’t judge whether an answer generated by AI makes sense or whether it’s actually complete nonsense. So AI can provide valuable feedback and food for thought, but learners remain responsible for assessing what it produces and understanding the content of their physics lessons.

IPN: What would you say is the key takeaway from your publication for education stakeholders?

Paul Tschisgale: The key message, in my view, is that AI is here to stay – and that tools like ChatGPT will have an important role to play in our students’ futures, which means that they need to be learning now how to use them thoughtfully and responsibly. It’s not about competing with AI, but about seeing it as a partner while being clear on the fact that humans retain responsibility for critically checking what AI generates and improving it in line with their own expertise. We still need physics teaching to provide learners with solid foundations and with problem-solving skills, but we also need it to focus more on critical thinking and on how to use AI in a way that’s aware of its strengths and limitations.