Synthetic data as a method for increasing reproducibility and transparency in educational research

Journal article, Research, Peer reviewed

Publication data


By: Simon Grund, Oliver Lüdtke, Alexander Robitzsch
Original language: English
Published in: Zeitschrift für Erziehungswissenschaft
Pages: 25
Editor (Publisher): VS Verlag für Sozialwissenschaften
ISSN: 1434-663X, 1862-5215
DOI/Link: https://doi.org/10.1007/s11618-026-01396-6 (Open Access)
Publication status: Published advanced online – 02.2026

Open data are often regarded as an important step towards improving the reproducibility and transparency of educational science. Yet data sharing remains rare, and without open data, statistical analyses often remain irreproducible. In this article, we provide an introduction to synthetic data, a statistical technique based on multiple imputation (MI) that can be used to create simulated copies of a data set that can be shared even when the original data cannot. To this end, we discuss reproducibility-related challenges of synthetic data and outline different approaches for generating them, including conventional and data-augmented MI (DA-MI) approaches. Furthermore, we conducted a case study using data from the PISA 2018 study, in which we aimed to address several challenges with synthetic data in educational research, such as missing data, multilevel data, and complex sampling designs. Our results indicate that these challenges can be addressed with relatively simple tools and that synthetic data can reproduce the results of a variety of statistical analyses. Finally, we discuss remaining challenges and directions for future research.