NEPS Technical Report for Mathematics: Scaling Results of Starting Cohort 8 for Grade 5

Project report, Research

Publication data


By: Lara Aylin Petersen, Tessa Tabea Beyer
Original language: English
Editor (Publisher): Leibniz Institut für Bildungsverläufe, Nationales Bildungspanel
DOI/Link: https://doi.org/10.5157/NEPS:SP122:1.0 (Open Access)
Publication status: Published – 08.2025

The National Educational Panel Study (NEPS) aims to investigate the development of competencies across the whole life span and designs tests for assessing these different competence domains. To evaluate the quality of the competence tests, a wide range of analyses based on item response theory (IRT) were performed. This report describes the data and scaling procedure for the mathematical competence test for Grade 5 students of Starting Cohort 8. The mathematics test consisted of 24 items representing different content areas as well as different cognitive components and used different response formats. The test was administered to 5,424 students (50% girls) from regular schools. A partial-credit model was used for scaling the data. Item fit statistics, differential item functioning, Rasch-homogeneity, and the test's dimensionality were evaluated to ensure the quality of the test. The results show that the test exhibited good reliability (EAP/PV reliability = .82), good item fit statistics, and negligible differential item functioning across different subgroups. One limitation is that the test lacked difficult items, which reduces the accuracy of proficiency estimates in the upper ability range. Overall, the results revealed predominantly good psychometric properties of the mathematics test, thus supporting the estimation of an acceptable mathematics competence score. Furthermore, analyses of differential item functioning showed comparable measurement models for the test administered in Starting Cohort 8 and an identical test previously used in Starting Cohort 3 in the same grade. Competence scores from both tests were therefore linked to allow comparisons between the two cohorts. Besides the scaling results, this report also describes the data available in the Scientific Use File and provides the R syntax for scaling the data.
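
For readers unfamiliar with the partial-credit model named above, the following minimal Python sketch shows how the model assigns probabilities to the score categories of a single item. It is an illustration only and is not the R syntax supplied with the report. The function name `pcm_category_probs` and all ability and step-difficulty values are hypothetical and were chosen for demonstration.

```python
import math


def pcm_category_probs(theta, step_difficulties):
    """Category probabilities for one item under the partial-credit model.

    theta: person ability on the logit scale.
    step_difficulties: step parameters delta_1..delta_m (hypothetical values).
    Returns a list of m + 1 probabilities for the scores 0..m.
    """
    # The numerator exponent for score k is the sum over j <= k of
    # (theta - delta_j); for score 0 the sum is empty, i.e. 0.
    cumulative = [0.0]
    for delta in step_difficulties:
        cumulative.append(cumulative[-1] + (theta - delta))

    # Subtract the maximum before exponentiating for numerical stability.
    largest = max(cumulative)
    weights = [math.exp(c - largest) for c in cumulative]
    total = sum(weights)
    return [w / total for w in weights]


# A dichotomous item (one step) reduces to the Rasch model.
probs_dichotomous = pcm_category_probs(0.0, [0.0])

# A polytomous item with two steps, i.e. scores 0, 1, and 2.
probs_polytomous = pcm_category_probs(0.5, [-0.5, 1.0])
```

With a single step parameter the function returns the Rasch probabilities of an incorrect and a correct response, so dichotomous and polytomous items can be handled within the same model.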