Demonstrating the 
consequences of not taking into 
account sampling designs with 
TIMSS 2011 data 
Dr. Christian Bokhove 
Lecturer in Mathematics Education 
University of Southampton 
EARLI SIG 
August 28th 2014
OUTLINE 
• International studies 
• IEA & OECD 
• PISA, TIMSS, … 
• Some aspects of their sampling design 
• Two stage sampling 
• Weights 
• Rotated test design 
• What if you don’t take this into account? 
• Simulation with TIMSS 2011 data 
• Single level model 
• Multilevel models
IEA & OECD 
The International Association for 
the Evaluation of Educational 
Achievement (IEA) is an 
independent, international 
cooperative of national research 
institutions and governmental 
research agencies. It conducts 
large-scale comparative studies of 
educational achievement and 
other aspects of education. 
The mission of the Organisation 
for Economic Co-operation and 
Development (OECD) is to 
promote policies that will 
improve the economic and social 
well-being of people around the 
world.
PISA 
http://www.oecd.org/pisa/ 
“The Programme for International Student Assessment (PISA) is a 
triennial international survey which aims to evaluate education 
systems worldwide by testing the skills and knowledge of 15-year-old 
students. To date, students representing more than 70 economies have 
participated in the assessment.” 
• The most recent cycle was reported in 2013, using 2012 data
TIMSS 
http://timssandpirls.bc.edu/timss2011/ 
“TIMSS 2011 is the fifth in IEA’s series of international assessments of 
student achievement dedicated to improving teaching and learning in 
mathematics and science. First conducted in 1995, TIMSS reports every 
four years on the achievement of fourth and eighth grade students.”
Two-stage sampling in educational studies 
● Simple random sampling of students is rarely used in educational surveys:
– Too expensive (e.g., training test administrators and travel costs) 
● Selected students attend many different schools 
– It is not practical to contact many schools 
– A link with class, teacher, school variables is sought 
● Sampling is usually conducted in two stages 
● First stage 
– Schools are selected 
● Second stage 
– Students (PISA) or classes (TIMSS/PIRLS) are selected 
● 35 students selected randomly (PISA) 
● One or two intact classes (TIMSS/PIRLS)
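The two stages above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the population, the names `two_stage_sample` and `population`, and the equal-probability draw of schools are all hypothetical, and the actual studies use more elaborate school selection (for example, probability proportional to size).

```python
import random

def two_stage_sample(schools, n_schools, seed=0):
    """Stage 1: draw schools; stage 2: draw one intact class per school."""
    rng = random.Random(seed)
    # Stage 1: select schools (equal probability here, a simplification).
    selected_schools = rng.sample(sorted(schools), n_schools)
    sample = {}
    for school in selected_schools:
        # Stage 2: select one intact class (TIMSS/PIRLS style).
        class_name = rng.choice(sorted(schools[school]))
        sample[school] = {
            "class": class_name,
            "students": list(schools[school][class_name]),
        }
    return sample

# Hypothetical population: 4 schools, 2 classes each, 3 students per class.
population = {
    f"school_{s}": {
        f"class_{c}": [f"s{s}c{c}_student_{i}" for i in range(3)]
        for c in range(2)
    }
    for s in range(4)
}

sample = two_stage_sample(population, n_schools=2)
for school, picked in sample.items():
    print(school, picked["class"], picked["students"])
```

Because whole classes are drawn, students in the sample are clustered within schools, which is why their responses cannot be treated as independent observations.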
Replicate weights 
● Replicate weights or resampling techniques are used to calculate 
correct standard errors in two-stage sampling designs 
● The idea behind this:
– There are many possible samples of schools and not all of them yield the
same estimates
– Use different samples of schools to calculate estimates
– Take into account the error of selecting one school and not another
(sampling error)
● Each replicate weight represents one sample 
● Variability between estimates reflects the sampling error
Two replication methods 
● Jackknife 
– TIMSS and PIRLS 
– Schools are paired with other similar schools within zones 
– A replicate is created for each zone or pair of schools 
– One school is randomly removed within each zone and the weight of the 
other school is doubled 
● Balanced repeated replication (BRR) 
– Select one school at random within each stratum 
– Set its weight to 0 
– Double the weight of the other school 
– PISA uses a variant of BRR (Fay's method) in which the weight of the selected school is
reduced rather than set to 0, so that no replicate loses sample size
Source: OECD (2009). PISA Data Analysis Manual: SPSS (2nd ed.). Paris: OECD Publishing.
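As a rough sketch of the jackknife procedure described above (one school dropped per zone, the other doubled), the following Python computes a weighted mean and its jackknife standard error. The record layout, the function names and the data are hypothetical; real TIMSS files carry their own zone and replicate indicators, which this sketch does not use.

```python
import math
import random

def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def jk2_standard_error(records, seed=0):
    """records: dicts with 'zone', 'school', 'score', 'weight';
    every zone is assumed to contain exactly two schools."""
    rng = random.Random(seed)
    scores = [r["score"] for r in records]
    full = weighted_mean(scores, [r["weight"] for r in records])
    variance = 0.0
    for zone in sorted({r["zone"] for r in records}):
        schools = sorted({r["school"] for r in records if r["zone"] == zone})
        dropped = rng.choice(schools)  # school removed in this replicate
        replicate_weights = []
        for r in records:
            if r["zone"] != zone:
                replicate_weights.append(r["weight"])        # unchanged
            elif r["school"] == dropped:
                replicate_weights.append(0.0)                # removed
            else:
                replicate_weights.append(2.0 * r["weight"])  # doubled
        replicate = weighted_mean(scores, replicate_weights)
        variance += (replicate - full) ** 2
    return full, math.sqrt(variance)

# Hypothetical data: two zones, two schools per zone, two students per school.
records = [
    {"zone": 1, "school": "A", "score": 500, "weight": 1.0},
    {"zone": 1, "school": "A", "score": 520, "weight": 1.0},
    {"zone": 1, "school": "B", "score": 540, "weight": 1.0},
    {"zone": 1, "school": "B", "score": 560, "weight": 1.0},
    {"zone": 2, "school": "C", "score": 480, "weight": 1.0},
    {"zone": 2, "school": "C", "score": 500, "weight": 1.0},
    {"zone": 2, "school": "D", "score": 500, "weight": 1.0},
    {"zone": 2, "school": "D", "score": 520, "weight": 1.0},
]

mean_score, jk_se = jk2_standard_error(records)
print(mean_score, jk_se)
```

Each replicate estimate differs from the full-sample estimate only through the one zone that was perturbed, so the summed squared differences capture how much the estimate depends on which schools happened to be selected.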
Weights 
• In theory, the sampling design provides student samples with equal
selection probabilities.
• In practice, variation in the number of classes selected and differential patterns
of nonresponse can result in varying selection probabilities, requiring
a unique sampling weight for the students in each participating class
in the study.
• Total weight (TOTWGT) 
• Sums to the student population size in each country 
• The overall student sampling weight is the product of the final weight 
components for schools, classes, and students 
• Important in multilevel analyses 
• School level: final school weight 
• Student level: final student weight multiplied by final class weight
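The weight components listed above can be illustrated with a small Python sketch. The field names (`school_wgt`, `class_wgt`, `student_wgt`) and the three student records are hypothetical and stand in for the final weight components in the data files; the point is only to show how the total weight and the two multilevel weights are composed.

```python
def total_weight(s):
    # Overall student weight: product of school, class and student components.
    return s["school_wgt"] * s["class_wgt"] * s["student_wgt"]

def level2_weight(s):
    # School-level weight for a multilevel model.
    return s["school_wgt"]

def level1_weight(s):
    # Student-level weight for a multilevel model.
    return s["student_wgt"] * s["class_wgt"]

# Hypothetical students with unequal selection probabilities.
students = [
    {"score": 600, "school_wgt": 10.0, "class_wgt": 1.0, "student_wgt": 1.0},
    {"score": 500, "school_wgt": 20.0, "class_wgt": 2.0, "student_wgt": 1.0},
    {"score": 400, "school_wgt": 50.0, "class_wgt": 1.0, "student_wgt": 1.0},
]

weights = [total_weight(s) for s in students]
unweighted_mean = sum(s["score"] for s in students) / len(students)
weighted_mean = sum(s["score"] * w for s, w in zip(students, weights)) / sum(weights)
population_size = sum(weights)  # total weights sum to the population size

print(unweighted_mean, weighted_mean, population_size)
```

In this toy example the unweighted mean overstates achievement because the highest-scoring student represents the fewest students in the population.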
Rotated test design 
● The item pool should include a large number of items for domain 
validity (e.g., mathematical literacy) 
● At the same time: 
– Fatigue biases results of long tests 
– Schools refuse to participate in lengthy studies 
● Rotated test forms 
– Students are assigned a subset of the item pool
– This minimizes testing time
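A rotated design can be sketched as follows. This is a deliberately simplified cyclic layout, not the actual TIMSS booklet scheme: the number of blocks, the function names `build_booklets` and `assign_booklet`, and the rule that booklet i holds blocks i and i+1 are all assumptions made for illustration.

```python
from collections import Counter

def build_booklets(n_blocks, blocks_per_booklet=2):
    """Booklet i contains consecutive item blocks, wrapping around at the end."""
    return [
        [(start + offset) % n_blocks for offset in range(blocks_per_booklet)]
        for start in range(n_blocks)
    ]

def assign_booklet(student_index, booklets):
    """Rotate booklets over students so every booklet is used equally often."""
    return booklets[student_index % len(booklets)]

booklets = build_booklets(n_blocks=6)
block_counts = Counter(block for booklet in booklets for block in booklet)

for student in range(8):
    print(student, assign_booklet(student, booklets))
```

Every block appears in the same number of booklets and each block is paired with a neighbouring block, so the whole item pool is covered while each student only sees a third of it.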
Plausible values 
● Rotated booklets introduce challenges for estimating academic 
achievement 
– Each student has missing data on a number of items
● Plausible values methods are employed to obtain population 
estimates with rotated booklet designs 
● Students do not answer all items but plausible scores are produced 
as if they had responded to all items based on 
– Responses to test items 
– Background characteristics
Plausible values 
● Plausible values are random draws from the distribution of a
student's ability
– Instead of a single point estimate, a set of values is drawn for
each student
● A single score cannot be calculated because data are missing for a
number of items
● Plausible values account for imputation error
– This error arises from making inferences on ability from a small number of items
● Estimation should be conducted separately for each plausible value
– Typically five plausible values are considered
– The variability between estimates reflects the imputation error
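Combining the separate estimates is usually done with Rubin's rules: the point estimate is the mean of the M estimates, and the total variance is the average sampling variance plus (1 + 1/M) times the variance between the estimates. A minimal Python sketch follows; the five estimates and their sampling variances are made-up numbers, and `combine_rubin` is a hypothetical helper name.

```python
import math
from statistics import mean, variance

def combine_rubin(estimates, sampling_variances):
    """Combine per-plausible-value estimates with Rubin's rules."""
    m = len(estimates)
    point = mean(estimates)            # final point estimate
    within = mean(sampling_variances)  # average sampling variance
    between = variance(estimates)      # imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return point, math.sqrt(total)

# Hypothetical results of running the same analysis on five plausible values.
pv_estimates = [505.0, 507.0, 506.0, 508.0, 504.0]
pv_sampling_variances = [25.0, 25.0, 25.0, 25.0, 25.0]

score, se = combine_rubin(pv_estimates, pv_sampling_variances)
print(score, se)
```

Analysing the mean of the plausible values, or only the first one, drops the between-estimate term, which is why those shortcuts tend to report a standard error that is too small.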
Challenge 
● Ignoring the complex design leads to wrong conclusions, such as different
point estimates and/or underestimated standard errors; see Rutkowski et
al. (2010)
– Variance estimation: jackknife, BRR
– Weights: not taking weights into account (e.g. Rutkowski et al., 2010: in Bulgarian TIMSS
2007, students from vocational and profiled schools had a higher probability of selection),
or choosing the wrong composite weights in a multilevel analysis
– Treatment of plausible values: averaging the (five) plausible values or using only
one plausible value, instead of applying Rubin’s rules
● Drent et al. (2013) formulated quality criteria (low, satisfactory, high)
● Standard software cannot handle replicate weights and plausible values
Available software 
● IDB Analyzer (SPSS) 
● NAEP Data Explorer (web tool) 
● PISA SPSS macros 
● R package 'intsvy' (Daniel Caro, Oxford)
– Free 
– Does not rely on commercial software like SPSS or SAS 
– Open source 
– Can be extended to perform other analyses
Available software 
Multilevel software 
● R
– Has a multilevel package, but no weights
– Can link to MLwiN
● MLwiN
– Plausible values have to be combined manually
– No resampling
– Does handle weights
● HLM 
– Combines plausible values 
– Weights 
– No resampling
Simulation with TIMSS 2011 data 
• TIMSS 2011 
• Three aspects: jackknife, weights, plausible values 
• Five countries: 
England is chosen as the baseline, using the grade 8 TIMSS 2011 ranking.
One arbitrary country significantly above England in the rankings (Singapore)
and one significantly below England (Norway) are chosen. In addition, the
countries one place higher and one place lower than England are chosen
(United States and Hungary, respectively).
Simulation with TIMSS 2011 data 
• Data preparation: 
• Publicly available TIMSS 2011 year 8 data files are used. 
• Additional columns calculated: average of the five plausible values and 
different weighting columns. 
• Two experiments: 
A. single level analyses, and 
B. multilevel analyses with students nested in schools. 
• For experiment A, the open source R package intsvy (Caro, 2014)
is used.
• Experiment B looks at multilevel models by constructing null models
in HLM 6.08 for the five countries, with student and school levels.
Single level 
Different scenarios: 
• Two conditions concern variance estimation with jackknife (JK): 
either jackknife is applied or isn’t applied. 
• Two conditions concern weights (Wgt): either weights are applied or 
are not applied. 
• Three final conditions concern how the plausible values are used for the
maths achievement scores:
• PVR denotes the correct approach using ‘plausible values with Rubin’s rules’. 
• PVA denotes the ‘mean of the plausible values’. 
• PV1 only uses ‘the first plausible value’. 
A total of 2×2×3=12 cases are calculated, as shown in the table on the 
next slide. Case 1 replicates the values from the international report 
(Mullis, Martin, Foy, & Arora, 2012).
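The 2×2×3 design can be enumerated mechanically. The Python sketch below reproduces the case numbering used in the results table (Case 1 = PVR with jackknife and weights, Case 12 = PV1 without either); the dictionary keys are illustrative names, not identifiers from intsvy or the TIMSS files.

```python
from itertools import product

pv_methods = ["PVR", "PVA", "PV1"]   # treatment of plausible values
weight_options = [True, False]       # weights applied or not
jackknife_options = [True, False]    # jackknife applied or not

# product() varies the last factor fastest, which matches the table:
# within each PV method, the cases run JK+Wgt, Wgt only, JK only, neither.
cases = {
    number: {"pv": pv, "weights": wgt, "jackknife": jk}
    for number, (pv, wgt, jk) in enumerate(
        product(pv_methods, weight_options, jackknife_options), start=1
    )
}

for number, settings in cases.items():
    print(number, settings)
```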
Each cell gives Score, SE and rank (#).

PVR (plausible values with Rubin's rules)
Country | Case 1: With JK, With Wgt | Case 2: No JK, With Wgt | Case 3: With JK, No Wgt | Case 4: No JK, No Wgt
Singapore | 610.99 3.77 1 | 610.99 0.83 1 | 607.54 3.74 1 | 607.54 0.87 1
USA | 509.48 2.63 2 | 509.48 0.55 2 | 509.68 2.58 4 | 509.68 0.57 4
England | 506.76 5.53 3 | 506.76 0.89 3 | 509.99 5.63 3 | 509.99 0.70 3
Hungary | 504.81 3.48 4 | 504.81 0.47 4 | 513.47 2.98 2 | 513.47 0.40 2
Norway | 474.64 2.44 5 | 474.64 0.55 5 | 476.55 2.66 5 | 476.55 0.50 5

PVA (mean of the plausible values)
Country | Case 5: With JK, With Wgt | Case 6: No JK, With Wgt | Case 7: With JK, No Wgt | Case 8: No JK, No Wgt
Singapore | 610.99 3.73 1 | 610.99 1.06 1 | 607.54 3.68 1 | 607.54 1.06 1
USA | 509.48 2.59 2 | 509.48 0.73 2 | 509.68 2.53 4 | 509.68 0.72 4
England | 506.76 5.48 3 | 506.76 1.34 3 | 509.99 5.64 3 | 509.99 1.35 3
Hungary | 504.81 3.48 4 | 504.81 1.21 4 | 513.47 2.98 2 | 513.47 1.15 2
Norway | 474.64 2.37 5 | 474.64 0.99 5 | 476.55 2.64 5 | 476.55 1.00 5

PV1 (first plausible value only)
Country | Case 9: With JK, With Wgt | Case 10: No JK, With Wgt | Case 11: With JK, No Wgt | Case 12: No JK, No Wgt
Singapore | 609.71 3.68 1 | 609.71 1.08 1 | 606.22 3.63 1 | 606.22 1.08 1
USA | 508.75 2.58 2 | 508.75 0.75 2 | 508.92 2.52 4 | 508.92 0.74 4
England | 506.03 5.45 3 | 506.03 1.36 3 | 509.44 5.59 3 | 509.44 1.37 3
Hungary | 504.75 3.44 4 | 504.75 1.22 4 | 513.38 2.96 2 | 513.38 1.16 2
Norway | 475.24 2.38 5 | 475.24 1.03 5 | 477.04 2.62 5 | 477.04 1.03 5

Maths achievement scores and standard errors for five countries for twelve different cases with weights, jackknife
and plausible values.
Observations 
Differences in achievement results and standard errors: 
• Not taking into account jackknife (example in yellow)
• The average score stays the same.
• The standard error is underestimated.
• So: the relative ranking stays the same, but significance testing is affected.
• Not taking into account weights (example in orange)
• This influences achievement scores: USA, England, Hungary and Norway score
higher, and Singapore scores lower.
• This has an impact on relative rankings.
• Standard errors differ, some higher and some lower.
• Plausible values (example in green)
• PVA and PVR give the same achievement score; PV1 gives a different one.
• PVA and PV1 underestimate the standard error.
• But there is no clear pattern between PVA and PV1 (which contradicts previous literature).
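The effect on significance testing can be made concrete with the USA and England figures from Case 1 (with jackknife) and Case 2 (without). The sketch below is a simple illustration that assumes the two country samples are independent and uses a normal approximation with a two-sided 5% threshold; `z_for_difference` is a hypothetical helper, not part of any package mentioned in the slides.

```python
import math

def z_for_difference(score_a, se_a, score_b, se_b):
    """z statistic for the difference between two independent estimates."""
    return (score_a - score_b) / math.sqrt(se_a ** 2 + se_b ** 2)

CRITICAL_Z = 1.96  # two-sided 5% level, normal approximation

# Case 1 (PVR, weights, jackknife): USA 509.48 (2.63), England 506.76 (5.53)
z_with_jk = z_for_difference(509.48, 2.63, 506.76, 5.53)

# Case 2 (PVR, weights, no jackknife): USA 509.48 (0.55), England 506.76 (0.89)
z_without_jk = z_for_difference(509.48, 0.55, 506.76, 0.89)

print("with JK:   ", round(z_with_jk, 2), abs(z_with_jk) > CRITICAL_Z)
print("without JK:", round(z_without_jk, 2), abs(z_without_jk) > CRITICAL_Z)
```

The score difference is identical in both cases, but only the analysis that ignores the jackknife would declare it statistically significant.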
Multilevel 
HLM was used, which does not offer jackknife variance estimation.
• Note that with MLwiN you need to combine plausible values manually.
• Three conditions concern weights: no weights, weights only at student
level (see Willms & Smith, 2005) and final weights (Rutkowski et al., 2010).
• Three conditions concern how the plausible values are used for the maths
achievement scores. PVR denotes the correct approach using ‘plausible
values with Rubin’s rules’. PVA denotes the ‘mean of the plausible
values’. PV1 only uses ‘the first plausible value’.
• The 3×3 scenarios are reported in table 3.
Maths achievement scores and standard errors of five countries for multilevel null models in three different 
weighting scenarios S1, S4 and S6 and plausible values.
Observations 
Differences in achievement results and standard errors: 
• The different weighting methods greatly influence achievement scores and
standard errors. This also has an impact on the relative rankings. There does
not seem to be a pattern in over- or underestimation of scores and standard
errors.
• For plausible values, PV1 yields a different average than PVA and
PVR: lower in three countries, with Hungary and Norway as the exceptions.
For PVA and PV1, the standard error is underestimated relative to PVR. However,
the underestimation of the SEs differs only slightly between PVA and PV1, with PVA
in most cases being closer to, or just as close to, PVR as PV1 is.
• Singapore PV1 PVA PVR
United States PVA PVR PV1
England PV1 PVA PVR
Hungary PV1 PVA PVR
Norway PVA PV1 PVR
Final thoughts 
• Not taking into account the three features of complex sample designs for
large-scale assessments (LSAs) can have a big influence on achievement scores,
standard errors and rankings.
• This confirms findings by Rutkowski et al. (2010).
• Not all ‘rules of thumb’ from previous literature (Drent et al., 2013;
Rutkowski et al., 2010) seem to hold.
• Therefore, caution should always be taken when analysing LSA data;
this will hopefully improve future LSA analyses by educational researchers.
• A transparent methodology is needed.
THANK YOU 
C.Bokhove@soton.ac.uk 
QUESTIONS/DISCUSSION
Relevant references 
Beaton, A.E., & Gonzalez, E.J. (1995). NAEP Primer. Chestnut Hill, MA: Center for the Study of Testing, Evaluation and Educational Policy, Boston College.
Caro, D. (2014). intsvy: International Assessment Data Manager. R package version 1.3. http://CRAN.R-project.org/package=intsvy
Drent, M., Meelissen, M.R.M., & van der Kleij, F.M. (2013). The contribution of TIMSS to the link between school and classroom factors and student achievement. Journal of Curriculum Studies, 45(2), 198-224.
Goldstein, H. (2004). International comparisons of student attainment: some issues arising from the PISA study. Assessment in Education, 11(3), 319-330.
Kreiner, S., & Christensen, K.B. (2014). Analyses of model fit and robustness. A new look at the PISA scaling model underlying ranking of countries according to reading literacy. Psychometrika, 79(2), 210-231.
Martin, M.O., & Mullis, I.V.S. (Eds.). (2012). Methods and procedures in TIMSS and PIRLS 2011. Chestnut Hill, MA: TIMSS & PIRLS International Study Center, Boston College.
Mullis, I.V.S., Martin, M.O., Foy, P., & Arora, A. (2012). TIMSS 2011 International results in mathematics. Lynch School of Education, Boston College.
Rubin, D. (1987). Multiple imputation for nonresponse in sample surveys. New York: John Wiley.
Rutkowski, L., Gonzalez, E., Joncas, M., & von Davier, M. (2010). International large-scale assessment data: Issues in secondary analysis and reporting. Educational Researcher, 39(2), 142-151.
Von Davier, M., Gonzalez, E., & Mislevy, R.J. (2009). Plausible values: What are they and why do we need them? IERI Monograph Series: Issues and Methodologies in Large-Scale Assessments, 2, 9-36.
Willms, J.D., & Smith, T. (2005). A manual for conducting analyses with data from TIMSS and PISA. Report prepared for UNESCO Institute for Statistics.

 
Transparency in Data Analysis
Transparency in Data AnalysisTransparency in Data Analysis
Transparency in Data Analysis
 
Proof by induction in Calculus: Investigating first-year students’ examinatio...
Proof by induction in Calculus: Investigating first-year students’ examinatio...Proof by induction in Calculus: Investigating first-year students’ examinatio...
Proof by induction in Calculus: Investigating first-year students’ examinatio...
 
Evidence informed: Waar is de Bijsluiter?
Evidence informed: Waar is de Bijsluiter?Evidence informed: Waar is de Bijsluiter?
Evidence informed: Waar is de Bijsluiter?
 
Methodological innovation for mathematics education research
Methodological innovation for mathematics education researchMethodological innovation for mathematics education research
Methodological innovation for mathematics education research
 
Roundtable slides RiTE Paderborn 24/9/2021
Roundtable slides RiTE Paderborn 24/9/2021Roundtable slides RiTE Paderborn 24/9/2021
Roundtable slides RiTE Paderborn 24/9/2021
 
Structural Topic Modelling of Ofsted Documents
Structural Topic Modelling of Ofsted DocumentsStructural Topic Modelling of Ofsted Documents
Structural Topic Modelling of Ofsted Documents
 
Learning loss and learning inequalities during the Covid-19 pandemic: an anal...
Learning loss and learning inequalities during the Covid-19 pandemic: an anal...Learning loss and learning inequalities during the Covid-19 pandemic: an anal...
Learning loss and learning inequalities during the Covid-19 pandemic: an anal...
 

Kürzlich hochgeladen

Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfGrade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfJemuel Francisco
 
Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management SystemChristalin Nelson
 
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...Nguyen Thanh Tu Collection
 
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...DhatriParmar
 
Expanded definition: technical and operational
Expanded definition: technical and operationalExpanded definition: technical and operational
Expanded definition: technical and operationalssuser3e220a
 
ROLES IN A STAGE PRODUCTION in arts.pptx
ROLES IN A STAGE PRODUCTION in arts.pptxROLES IN A STAGE PRODUCTION in arts.pptx
ROLES IN A STAGE PRODUCTION in arts.pptxVanesaIglesias10
 
ICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdfICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdfVanessa Camilleri
 
Narcotic and Non Narcotic Analgesic..pdf
Narcotic and Non Narcotic Analgesic..pdfNarcotic and Non Narcotic Analgesic..pdf
Narcotic and Non Narcotic Analgesic..pdfPrerana Jadhav
 
Active Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdfActive Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdfPatidar M
 
Mythology Quiz-4th April 2024, Quiz Club NITW
Mythology Quiz-4th April 2024, Quiz Club NITWMythology Quiz-4th April 2024, Quiz Club NITW
Mythology Quiz-4th April 2024, Quiz Club NITWQuiz Club NITW
 
Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4JOYLYNSAMANIEGO
 
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnv
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnvESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnv
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnvRicaMaeCastro1
 
Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...Seán Kennedy
 
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptxQ4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptxlancelewisportillo
 
Oppenheimer Film Discussion for Philosophy and Film
Oppenheimer Film Discussion for Philosophy and FilmOppenheimer Film Discussion for Philosophy and Film
Oppenheimer Film Discussion for Philosophy and FilmStan Meyer
 
4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptxmary850239
 
Mental Health Awareness - a toolkit for supporting young minds
Mental Health Awareness - a toolkit for supporting young mindsMental Health Awareness - a toolkit for supporting young minds
Mental Health Awareness - a toolkit for supporting young mindsPooky Knightsmith
 

Kürzlich hochgeladen (20)

Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfGrade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
 
Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management System
 
INCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptx
INCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptxINCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptx
INCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptx
 
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
31 ĐỀ THI THỬ VÀO LỚP 10 - TIẾNG ANH - FORM MỚI 2025 - 40 CÂU HỎI - BÙI VĂN V...
 
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
 
Expanded definition: technical and operational
Expanded definition: technical and operationalExpanded definition: technical and operational
Expanded definition: technical and operational
 
ROLES IN A STAGE PRODUCTION in arts.pptx
ROLES IN A STAGE PRODUCTION in arts.pptxROLES IN A STAGE PRODUCTION in arts.pptx
ROLES IN A STAGE PRODUCTION in arts.pptx
 
ICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdfICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdf
 
prashanth updated resume 2024 for Teaching Profession
prashanth updated resume 2024 for Teaching Professionprashanth updated resume 2024 for Teaching Profession
prashanth updated resume 2024 for Teaching Profession
 
Narcotic and Non Narcotic Analgesic..pdf
Narcotic and Non Narcotic Analgesic..pdfNarcotic and Non Narcotic Analgesic..pdf
Narcotic and Non Narcotic Analgesic..pdf
 
Paradigm shift in nursing research by RS MEHTA
Paradigm shift in nursing research by RS MEHTAParadigm shift in nursing research by RS MEHTA
Paradigm shift in nursing research by RS MEHTA
 
Active Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdfActive Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdf
 
Mythology Quiz-4th April 2024, Quiz Club NITW
Mythology Quiz-4th April 2024, Quiz Club NITWMythology Quiz-4th April 2024, Quiz Club NITW
Mythology Quiz-4th April 2024, Quiz Club NITW
 
Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4
 
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnv
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnvESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnv
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnv
 
Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...
 
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptxQ4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
 
Oppenheimer Film Discussion for Philosophy and Film
Oppenheimer Film Discussion for Philosophy and FilmOppenheimer Film Discussion for Philosophy and Film
Oppenheimer Film Discussion for Philosophy and Film
 
4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx
 
Mental Health Awareness - a toolkit for supporting young minds
Mental Health Awareness - a toolkit for supporting young mindsMental Health Awareness - a toolkit for supporting young minds
Mental Health Awareness - a toolkit for supporting young minds
 

Demonstrating the consequences of not taking into account sampling designs with TIMSS 2011 data

  • 1. Demonstrating the consequences of not taking into account sampling designs with TIMSS 2011 data. Dr. Christian Bokhove, Lecturer in Mathematics Education, University of Southampton. EARLI SIG, August 28th 2014
  • 2. OUTLINE
      • International studies
          • IEA & OECD
          • PISA, TIMSS, …
      • Some aspects of their sampling design
          • Two stage sampling
          • Weights
          • Rotated test design
      • What if you don’t take this into account?
          • Simulation with TIMSS 2011 data
              • Single level model
              • Multilevel models
  • 4. IEA & OECD
      • The International Association for the Evaluation of Educational Achievement (IEA) is an independent, international cooperative of national research institutions and governmental research agencies. It conducts large-scale comparative studies of educational achievement and other aspects of education.
      • The mission of the Organisation for Economic Co-operation and Development (OECD) is to promote policies that will improve the economic and social well-being of people around the world.
  • 5. PISA http://www.oecd.org/pisa/
      “The Programme for International Student Assessment (PISA) is a triennial international survey which aims to evaluate education systems worldwide by testing the skills and knowledge of 15-year-old students. To date, students representing more than 70 economies have participated in the assessment.”
      • The most recent results appeared in 2013, based on 2012 data
  • 6. TIMSS http://timssandpirls.bc.edu/timss2011/
      “TIMSS 2011 is the fifth in IEA’s series of international assessments of student achievement dedicated to improving teaching and learning in mathematics and science. First conducted in 1995, TIMSS reports every four years on the achievement of fourth and eighth grade students.”
  • 8. Two-stage sampling in educational studies
      ● Simple random sampling of students is rarely used in educational surveys:
          – Too expensive (e.g., training test administrators and travel costs)
      ● Selected students would attend many different schools
          – It is not practical to contact many schools
          – A link with class, teacher, school variables is sought
      ● Sampling is usually conducted in two stages
      ● First stage
          – Schools are selected
      ● Second stage
          – Students (PISA) or classes (TIMSS/PIRLS) are selected
          – 35 students selected randomly (PISA)
          – One or two intact classes (TIMSS/PIRLS)
  • 9. Replicate weights
      ● Replicate weights or resampling techniques are used to calculate correct standard errors in two-stage sampling designs
      ● The idea behind them:
          – There are many possible samples of schools, and not all of them yield the same estimates
          – Use different samples of schools to calculate estimates
          – Take into account the error of selecting one school and not another (sampling error)
      ● Each replicate weight represents one sample
      ● Variability between estimates reflects the sampling error
  • 10. Two replication methods
      ● Jackknife
          – Used in TIMSS and PIRLS
          – Schools are paired with other similar schools within zones
          – A replicate is created for each zone or pair of schools
          – One school is randomly removed within each zone and the weight of the other school is doubled
      ● Balanced repeated replication (BRR)
          – Select one school at random within each stratum
          – Set its weight to 0
          – Double the weight of the other school
          – PISA uses a variant of BRR (Fay) to prevent a smaller sample size
      Source: OECD (2009). PISA Data Analysis Manual: SPSS (2nd ed.). Paris: OECD Publishing.
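The paired-school jackknife described above can be sketched in a few lines of Python. This is a minimal illustration on made-up data, not the procedure as implemented by the IEA or by any analysis package. The function names and the `zones` and `reps` arguments are hypothetical stand-ins for a zone identifier and a 0/1 flag that marks which school of the pair is doubled (1) or dropped (0).

```python
import math


def weighted_mean(values, weights):
    """Weighted mean of values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def jackknife_se(scores, weights, zones, reps):
    """Paired jackknife: one replicate per zone.

    Within the zone being replicated, students with reps == 1 get a
    doubled weight and students with reps == 0 get weight 0. All
    other zones keep their original weights.
    """
    full_estimate = weighted_mean(scores, weights)
    variance = 0.0
    for zone in sorted(set(zones)):
        replicate_weights = [
            w if z != zone else (2 * w if r == 1 else 0.0)
            for w, z, r in zip(weights, zones, reps)
        ]
        replicate_estimate = weighted_mean(scores, replicate_weights)
        variance += (replicate_estimate - full_estimate) ** 2
    return full_estimate, math.sqrt(variance)


# Toy data (invented): 8 students, 2 zones, 2 schools per zone.
scores = [500, 520, 480, 510, 530, 490, 505, 515]
weights = [1.0] * 8
zones = [1, 1, 1, 1, 2, 2, 2, 2]
reps = [0, 0, 1, 1, 0, 0, 1, 1]

estimate, se = jackknife_se(scores, weights, zones, reps)
print(estimate, se)
```

The point estimate is the ordinary weighted mean. Only the standard error changes, because it is built from the spread of the replicate estimates around the full-sample estimate.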
  • 12. Weights
      • In theory, the sampling design provides student samples with equal selection probabilities.
      • But variation in the number of classes selected and differential patterns of nonresponse can result in varying selection probabilities, requiring a unique sampling weight for the students in each participating class in the study.
      • Total weight (TOTWGT)
          • Sums to the student population size in each country
          • The overall student sampling weight is the product of the final weight components for schools, classes, and students
      • Important in multilevel analyses
          • School level: final school weight
          • Student level: final student weight multiplied by the final class weight
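As a rough illustration of the weight components above, the following Python sketch builds a total weight as the product of school, class and student components and compares a weighted with an unweighted mean. All numbers are invented and the names (`school_w`, `class_w`, `student_w`) are hypothetical, not the variable names in the TIMSS files.

```python
# Each record: (score, school_w, class_w, student_w). Invented values.
students = [
    (480.0, 10.0, 1.0, 1.0),
    (500.0, 10.0, 1.0, 1.0),
    (560.0, 2.0, 2.0, 1.25),
    (580.0, 2.0, 2.0, 1.25),
]

scores = [s[0] for s in students]

# Total weight: product of the school, class and student components.
total_weights = [sw * cw * stw for _, sw, cw, stw in students]

# For a two-level model the components are split by level:
# school level uses the school weight, student level uses
# the student weight multiplied by the class weight.
level2_weights = [sw for _, sw, _, _ in students]
level1_weights = [cw * stw for _, _, cw, stw in students]

unweighted_mean = sum(scores) / len(scores)
weighted_mean = sum(s * w for s, w in zip(scores, total_weights)) / sum(total_weights)

print(sum(total_weights))  # estimated population size represented by the sample
print(unweighted_mean, weighted_mean)
```

In this toy sample the high-scoring students come from a school with a small weight, so ignoring the weights overstates the mean.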
  • 14. Rotated test design
      ● The item pool should include a large number of items for domain validity (e.g., mathematical literacy)
      ● At the same time:
          – Fatigue biases the results of long tests
          – Schools refuse to participate in lengthy studies
      ● Rotated test forms
          – Students are assigned a subset of the item pool
          – Testing time is minimized
  • 15. Plausible values
      ● Rotated booklets introduce challenges for estimating academic achievement
          – Students have missing data on a number of items
      ● Plausible value methods are employed to obtain population estimates with rotated booklet designs
      ● Students do not answer all items, but plausible scores are produced as if they had responded to all items, based on
          – Responses to test items
          – Background characteristics
  • 16. Plausible values
      ● Plausible values are random draws from the distribution of a student's ability
          – Instead of obtaining a point estimate, a range of values is estimated for each student
      ● A single score cannot be calculated because data are missing for a number of items
      ● Plausible values account for imputation error
          – Making inferences on ability from a small number of items
      ● Estimation should be conducted separately for each plausible value
          – Typically five plausible values are considered
          – The variability between estimates reflects the imputation error
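The "estimate separately, then combine" step is usually done with Rubin's (1987) rules. The sketch below is a minimal Python illustration with invented numbers. The function name is hypothetical, and the operational procedures of a given study may estimate the sampling part of the variance differently.

```python
import math
from statistics import mean, variance


def rubin_combine(estimates, sampling_variances):
    """Combine M per-plausible-value estimates with Rubin's rules.

    total variance = mean sampling variance
                     + (1 + 1/M) * variance between the M estimates
    """
    m = len(estimates)
    point_estimate = mean(estimates)
    within = mean(sampling_variances)
    between = variance(estimates)  # sample variance, divisor M - 1
    total = within + (1 + 1 / m) * between
    return point_estimate, math.sqrt(total)


# Invented example: one mean per plausible value, each with the
# same sampling variance (e.g. from a jackknife).
pv_means = [505.0, 507.0, 506.0, 508.0, 504.0]
pv_sampling_variances = [9.0, 9.0, 9.0, 9.0, 9.0]

point, se = rubin_combine(pv_means, pv_sampling_variances)
print(point, se)
```

Averaging the plausible values per student before the analysis, or using only the first plausible value, leaves out the between-estimate term, so the imputation error is not reflected in the standard error.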
  • 17. Challenge
      ● Ignoring the complex design leads to wrong conclusions, such as different point estimates and/or underestimated standard errors; see Rutkowski et al. (2010)
          – Variance estimation: jackknife, BRR
          – Not taking weights into account (e.g. Rutkowski et al. (2010): in Bulgarian TIMSS 2007, students from vocational and profiled schools had a higher probability of selection). In a multilevel situation: choosing the wrong composite weights.
          – Treatment of plausible values: averaging the (five) plausible values or using only one plausible value instead of applying Rubin’s rules.
      ● Drent et al. (2013) formulated quality criteria (low, satisfactory, high)
      ● Standard software cannot handle replicate weights and plausible values
  • 18. Available software
      ● IDB Analyzer (SPSS)
      ● NAEP Data Explorer (web tool)
      ● PISA SPSS macros
      ● R package 'intsvy' (Daniel Caro, Oxford)
          – Free
          – Does not rely on commercial software like SPSS or SAS
          – Open source
          – Can be extended to perform other analyses
  • 19. Available software: multilevel software
      ● R
          – Has a multilevel package but no weights
          – Can link to MLwiN
      ● MLwiN
          – Plausible values have to be combined manually
          – No resampling
          – Does handle weights
      ● HLM
          – Combines plausible values
          – Weights
          – No resampling
  • 21. Simulation with TIMSS 2011 data
      • TIMSS 2011
      • Three aspects: jackknife, weights, plausible values
      • Five countries, selected from the TIMSS 2011 grade 8 ranking:
          • England is chosen as the baseline.
          • Singapore is chosen as one arbitrary country significantly above England in the rankings.
          • Norway is chosen as one country significantly below England in the rankings.
          • The United States and Hungary are the countries one place higher and one place lower than England, respectively.
  • 22. Simulation with TIMSS 2011 data
      • Data preparation:
          • Publicly available TIMSS 2011 grade 8 data files are used.
          • Additional columns are calculated: the average of the five plausible values and different weighting columns.
      • Two experiments: A. single level analyses, and B. multilevel analyses with students nested in schools.
      • For experiment A, the open-source R package intsvy (Caro, 2014) is used.
      • Experiment B looks at multilevel models by constructing null models in HLM 6.08 for five countries with student and school levels.
  • 23. Single level
      Different scenarios:
      • Two conditions concern variance estimation with jackknife (JK): jackknife is either applied or not applied.
      • Two conditions concern weights (Wgt): weights are either applied or not applied.
      • Three conditions concern how the plausible values for the maths achievement scores are used:
          • PVR denotes the correct approach using ‘plausible values with Rubin’s rules’.
          • PVA denotes the ‘mean of the plausible values’.
          • PV1 only uses ‘the first plausible value’.
      A total of 2×2×3=12 cases are calculated, as shown in the table on the next slide. Case 1 replicates the values from the international report (Mullis, Martin, Foy, & Arora, 2012).
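The 2×2×3 design can be written out programmatically. The Python sketch below only enumerates the conditions; the dictionary keys are hypothetical labels, and the ordering is chosen so that the case numbers match those used in the table on the next slide (cases 1 to 4 for PVR, 5 to 8 for PVA, 9 to 12 for PV1).

```python
from itertools import product

pv_methods = ["PVR", "PVA", "PV1"]   # treatment of plausible values
weight_options = [True, False]       # weights applied or not
jackknife_options = [True, False]    # jackknife applied or not

# The jackknife flag varies fastest, then weights, then the PV method.
cases = {
    number: {"pv": pv, "weights": wgt, "jackknife": jk}
    for number, (pv, wgt, jk) in enumerate(
        product(pv_methods, weight_options, jackknife_options), start=1
    )
}

for number, settings in cases.items():
    print(number, settings)
```

Case 1 (PVR, with weights, with jackknife) is the fully design-based analysis; every other case drops at least one of the three features.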
  • 24. Maths achievement scores and standard errors for five countries for twelve different cases with weights, jackknife and plausible values. (JK = jackknife, Wgt = weights, # = rank)

    PV1         Case 9              Case 10             Case 11             Case 12
                With JK, with Wgt   No JK, with Wgt     With JK, no Wgt     No JK, no Wgt
    Country     Score   SE    #     Score   SE    #     Score   SE    #     Score   SE    #
    Singapore   609.71  3.68  1     609.71  1.08  1     606.22  3.63  1     606.22  1.08  1
    USA         508.75  2.58  2     508.75  0.75  2     508.92  2.52  4     508.92  0.74  4
    England     506.03  5.45  3     506.03  1.36  3     509.44  5.59  3     509.44  1.37  3
    Hungary     504.75  3.44  4     504.75  1.22  4     513.38  2.96  2     513.38  1.16  2
    Norway      475.24  2.38  5     475.24  1.03  5     477.04  2.62  5     477.04  1.03  5

    PVA         Case 5              Case 6              Case 7              Case 8
                With JK, with Wgt   No JK, with Wgt     With JK, no Wgt     No JK, no Wgt
    Country     Score   SE    #     Score   SE    #     Score   SE    #     Score   SE    #
    Singapore   610.99  3.73  1     610.99  1.06  1     607.54  3.68  1     607.54  1.06  1
    USA         509.48  2.59  2     509.48  0.73  2     509.68  2.53  4     509.68  0.72  4
    England     506.76  5.48  3     506.76  1.34  3     509.99  5.64  3     509.99  1.35  3
    Hungary     504.81  3.48  4     504.81  1.21  4     513.47  2.98  2     513.47  1.15  2
    Norway      474.64  2.37  5     474.64  0.99  5     476.55  2.64  5     476.55  1.00  5

    PVR         Case 1              Case 2              Case 3              Case 4
                With JK, with Wgt   No JK, with Wgt     With JK, no Wgt     No JK, no Wgt
    Country     Score   SE    #     Score   SE    #     Score   SE    #     Score   SE    #
    Singapore   610.99  3.77  1     610.99  0.83  1     607.54  3.74  1     607.54  0.87  1
    USA         509.48  2.63  2     509.48  0.55  2     509.68  2.58  4     509.68  0.57  4
    England     506.76  5.53  3     506.76  0.89  3     509.99  5.63  3     509.99  0.70  3
    Hungary     504.81  3.48  4     504.81  0.47  4     513.47  2.98  2     513.47  0.40  2
    Norway      474.64  2.44  5     474.64  0.55  5     476.55  2.66  5     476.55  0.50  5
  • 25. Observations
      Differences in achievement results and standard errors:
      • Not taking into account jackknife (example in yellow)
          • Average score is the same.
          • Standard error is underestimated.
          • So: relative ranking stays the same, but significance testing is influenced.
      • Not taking into account weights (example in orange)
          • Influences achievement scores: USA, England, Hungary and Norway score higher, and Singapore scores lower.
          • Impact on relative rankings.
          • Standard errors differ, some higher and some lower.
      • Plausible values (example in green)
          • PVA and PVR give the same achievement score, PV1 a different one.
          • PVA and PV1 underestimate the standard error.
          • But there is no clear pattern between PVA and PV1 (which contradicts previous literature).
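To make the point about significance testing concrete, the sketch below compares the USA and England using the PVR figures from the table on slide 24 (case 1 with jackknife, case 2 without). It is a simplified illustration that assumes the two country samples are independent and uses a plain z statistic with the usual 1.96 threshold; the function name is hypothetical.

```python
import math


def z_for_difference(score_a, se_a, score_b, se_b):
    """z statistic for the difference of two independent estimates."""
    return (score_a - score_b) / math.sqrt(se_a ** 2 + se_b ** 2)


# USA vs England, PVR with weights (scores identical in cases 1 and 2).
z_with_jk = z_for_difference(509.48, 2.63, 506.76, 5.53)     # case 1 SEs
z_without_jk = z_for_difference(509.48, 0.55, 506.76, 0.89)  # case 2 SEs

print(round(z_with_jk, 2), round(z_without_jk, 2))
print("significant with JK:   ", abs(z_with_jk) > 1.96)
print("significant without JK:", abs(z_without_jk) > 1.96)
```

Under these assumptions the same 2.72-point gap gives a z of about 0.44 with jackknife standard errors and about 2.6 without, so skipping the jackknife would turn a non-significant difference into an apparently significant one.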
  • 26. Multilevel
      HLM was used, which does not offer the jackknife.
      • Note that with MLwiN you need to combine plausible values manually.
      • Three conditions concern weights: no weights, weights only at student level (see Willms & Smith, 2005) and final weights (Rutkowski et al., 2010).
      • Three conditions concern how the plausible values for the maths achievement scores are used. PVR denotes the correct approach using ‘plausible values with Rubin’s rules’. PVA denotes the ‘mean of the plausible values’. PV1 only uses ‘the first plausible value’.
      • The 3×3 scenarios are reported in table 3.
  • 27. Maths achievement scores and standard errors of five countries for multilevel null models in three different weighting scenarios S1, S4 and S6 and plausible values.
  • 28. Observations
      Differences in achievement results and standard errors:
      • The different weighting methods greatly influence achievement scores and standard errors. This also has an impact on the relative rankings. There does not seem to be a pattern in over- or underestimation of scores and standard errors.
      • For plausible values, the cases for PV1 yield a different average than PVA and PVR: lower in three cases, the exceptions being Hungary and Norway. For PVA and PV1, the standard error is underestimated with respect to PVR. However, between PVA and PV1 the underestimation of SEs differs only slightly, with PVA in most cases being closer to or just as close to PVR as PV1.
      • Singapore: PV1, PVA, PVR
        United States: PVA, PVR, PV1
        England: PV1, PVA, PVR
        Hungary: PV1, PVA, PVR
        Norway: PVA, PV1, PVR
  • 29. Final thoughts
      • Not taking into account three features of complex sample designs for large-scale assessments (LSAs) can have a big influence on achievement scores, standard errors and rankings.
      • Confirms findings by Rutkowski et al. (2010).
      • Not all ‘rules of thumb’ from previous literature (Drent et al., 2013; Rutkowski et al., 2010) seem to hold.
      • Therefore, caution should always be taken when analysing LSA data, hopefully improving future LSA analyses by educational researchers.
      • Need for transparent methodology
      THANK YOU
      C.Bokhove@soton.ac.uk
      QUESTIONS/DISCUSSION
  • 30. Relevant references
      Beaton, A.E., & Gonzalez, E.J. (1995). NAEP Primer. Chestnut Hill, MA: Center for the Study of Testing, Evaluation and Educational Policy, Boston College.
      Caro, D. (2014). intsvy: International Assessment Data Manager. R package version 1.3. http://CRAN.R-project.org/package=intsvy
      Drent, M., Meelissen, M.R.M., & van der Kleij, F.M. (2013). The contribution of TIMSS to the link between school and classroom factors and student achievement. Journal of Curriculum Studies, 45(2), 198-224.
      Goldstein, H. (2004). International comparisons of student attainment: some issues arising from the PISA study. Assessment in Education, 11(3), 319-330.
      Kreiner, S., & Christensen, K.B. (2014). Analyses of model fit and robustness. A new look at the PISA scaling model underlying ranking of countries according to reading literacy. Psychometrika, 79(2), 210-231.
      Martin, M.O., & Mullis, I.V.S. (Eds.). (2012). Methods and procedures in TIMSS and PIRLS 2011. Chestnut Hill, MA: TIMSS & PIRLS International Study Center, Boston College.
      Mullis, I.V.S., Martin, M.O., Foy, P., & Arora, A. (2012). TIMSS 2011 International results in mathematics. Lynch School of Education, Boston College.
      Rubin, D. (1987). Multiple imputation for nonresponse in sample surveys. New York: John Wiley.
      Rutkowski, L., Gonzalez, E., Joncas, M., & von Davier, M. (2010). International large-scale assessment data: Issues in secondary analysis and reporting. Educational Researcher, 39(2), 142-151.
      Von Davier, M., Gonzalez, E., & Mislevy, R.J. (2009). Plausible values: What are they and why do we need them? IERI Monograph Series: Issues and Methodologies in Large-Scale Assessments, 2, 9-36.
      Willms, J.D., & Smith, T. (2005). A manual for conducting analyses with data from TIMSS and PISA. Report prepared for UNESCO Institute for Statistics.