2010 Ohio TIF Meeting: Creating a Comprehensive Teacher Effectiveness System
1. Creating a Comprehensive
Teacher Effectiveness System
Christopher A. Thorn
University of Wisconsin-Madison
Center for Data Quality and Systems Innovation
Value-Added Research Center
3. Propositions and Claims to the
Validity Evaluation
Proposition 1: The standards clearly define learning
expectations for the subject area and grade level
Design Claims:
• Clarity
• Feasibility
• Explicit progressions
Evidence:
• Expert reviews
• Research studies validating progressions
4. Proposition 2a: The assessment instruments have
been designed to yield scores that can accurately and
fairly reflect student achievement of standards.
Design Claims:
• Alignment with standards (specs and items)
• Fair and accessible
• Reliable/replicable procedures
Evidence:
• Expert reviews of alignment
• Small-scale studies
• Sensitivity reviews
• Measurement reviews of design, administration, and scoring procedures
5. Proposition 2b: The assessment instruments have
been designed to yield scores that accurately and
fairly reflect student growth over the course of the
year
Design Claims:
• Sample the range of where students may start and end the school year
• Designed to be sensitive to instruction
Evidence:
• Expert review
• Teacher review
6. Proposition 3: Assessment scores accurately and fairly
reflect the status of students’ knowledge and skills
relative to learning expectations
Design Claims:
• Psychometric analyses confirm the assessment’s blueprint
• Scores are sufficiently precise and reliable
• Scores are fair/unbiased
Evidence:
• Psychometric analysis
• Bias analysis
7. Proposition 4: Student growth scores accurately and
fairly measure student progress over the course of
the year
Design Claims:
• Score scale reflects the full distribution of where students may start and end the year
• Growth scores are sufficiently precise and reliable for all students
• Growth scores are fair/relatively free of bias
Evidence:
• Psychometric modeling and fit statistics
• Sensitivity/bias analysis
8. Proposition 5: Value-added scores represent teachers’
contribution to student growth
Design Claims:
• Scores are instructionally sensitive
• Scores representing teachers’ contributions are sufficiently precise and reliable
• Scores representing teachers’ contributions are relatively free of bias
Evidence:
• Assumption checking
• Advanced statistical modeling
• Research on instructional sensitivity
9. What do we see across the US?
• Student Learning Objectives
– Austin, Texas; Charlotte-Mecklenburg, North Carolina;
and Washington, DC
• Subject- and grade-alike measures
– Delaware and Tennessee
• Universal pre-/post-tests
– Hillsborough County, Florida, and Washington, DC
• Value-added composite (school/cohort)
– Charlotte-Mecklenburg, North Carolina; Delaware;
and Tennessee
10. SLOs
Teacher (overseen by administrator) selects
appropriate measure(s); assesses students at
beginning of year; sets specific objectives for
performance; and assesses students at end of
year. Administrator ultimately determines the
teacher’s success in achieving SLOs
(Goe 2011).
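The bookkeeping behind the SLO cycle described above can be sketched as follows. This is a minimal illustration only: the `StudentObjective` record, the `share_meeting_target` helper, and the sample scores are hypothetical, and the summary is just one input an administrator might look at, not a rule for determining a teacher's success.

```python
from dataclasses import dataclass


@dataclass
class StudentObjective:
    """One student's SLO record (hypothetical structure)."""
    student_id: str
    baseline: float   # beginning-of-year assessment score
    target: float     # objective set by the teacher after the baseline
    end_score: float  # end-of-year assessment score


def share_meeting_target(objectives):
    """Return the fraction of students whose end-of-year score reached the target."""
    if not objectives:
        raise ValueError("no objectives recorded")
    met = sum(1 for o in objectives if o.end_score >= o.target)
    return met / len(objectives)


# Made-up roster for illustration.
roster = [
    StudentObjective("s1", baseline=40, target=55, end_score=60),
    StudentObjective("s2", baseline=52, target=65, end_score=63),
    StudentObjective("s3", baseline=70, target=78, end_score=78),
    StudentObjective("s4", baseline=35, target=50, end_score=58),
]

summary = share_meeting_target(roster)
print(f"{summary:.2f}")  # 0.75 (three of four students reached their target)
```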
11. Pro/Con for SLOs
• Advantages: Adaptable; permits
specialization; and tends to be credible among
educators.
• Disadvantages: Requires significant attention
from administrators; difficult to create
comparability and rigor across classrooms;
and does not account for differences across
teachers in the students served
(RttT Assistance Network 2011).
12. Subject- and grade-alike measures
Teachers meet in grade- and/or subject-specific
teams to consider growth measures (existing,
adapted, or new assessments; portfolios of
work; performances, and the like), then agree
on measure(s) all will use to determine
individual contributions to student growth.
District/state then reviews and approves the list
for each grade and subject. (May include SLOs,
pre-/post-tests, or other measures.)
13. Subject- and grade-alike measures
• Advantages: Designed to yield comparable
measures across classrooms/districts.
• Disadvantages: Requires significant attention
to ensuring comparability and much dedicated
time for teachers to work together to develop
consistent scoring patterns
(RttT Assistance Network 2011).
14. Universal pre-/post-tests
• Written pre-/post-tests developed for every
grade and every subject.
• Advantages: Enables annual growth
calculations for all students.
• Disadvantages: Requires extensive test
development and analysis capacity; and may
be difficult to link previous years of test data,
given course variety at secondary level.
15. Value-added composite
(school/cohort)
Nontested teachers are assigned a composite
value-added score based on average test
performance of either the school as a whole or
the particular cohort of students “claimed” by
the teacher
(Goe 2011).
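The two composite options described above (school-wide versus a "claimed" cohort) can be sketched as a simple average. The function name `composite_score`, the student IDs, and the score values are assumptions made for illustration; real systems would define the underlying test-based score and the claiming rules themselves.

```python
from statistics import mean


def composite_score(student_scores, claimed_ids=None):
    """Average test-based score for a school or for a claimed cohort.

    student_scores: dict mapping student id -> test-based score (hypothetical units).
    claimed_ids: None for the school as a whole, or an iterable of the student
                 ids "claimed" by the nontested teacher.
    """
    if claimed_ids is None:
        pool = list(student_scores.values())
    else:
        pool = [student_scores[s] for s in claimed_ids if s in student_scores]
    if not pool:
        raise ValueError("no student scores available for this composite")
    return mean(pool)


# Made-up scores for four students in one school.
school_scores = {"s1": 1.0, "s2": -0.5, "s3": 0.25, "s4": 0.75}

school_wide = composite_score(school_scores)              # 0.375
cohort_only = composite_score(school_scores, ["s1", "s3"])  # 0.625
print(school_wide, cohort_only)
```

The same teacher receives a different composite depending on which pool is used, which is one way to see how much the choice of pool matters for an individual teacher.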
16. Value-added composite
(school/cohort)
• Advantages: Can contribute to collective
efforts around student achievement.
• Disadvantages: May mask high and low
performers and hold individual teachers
accountable for measures over which they
have limited influence.
17. A Tale of Two Cities – K-2
• Value-Added Measures of Math and Reading
• Assessments administered 2-4 times a year
• Administered 1-1 by retired teachers
• Minneapolis
– Off the shelf public domain assessments
– Used to study Beat the Odds Kindergarten Teachers
• Edina
– Purchased NWEA early grade assessments
– Used whole grade and school to study overall
effectiveness
18. Build Decision Points
• Test date timing and frequency
• Test properties
• Demographic controls
• Grade retention
• Student mobility
• School-year mobility
• Classroom/teacher indicators
• Aggregation units (grades, teams, schools)
• Aggregation over time
• Multiple components
• Special education detail
• Multi-year data
19. References
Webinar on Evaluating Teacher Quality
http://www.aacompcenter.org/cs/aacc/view/rs/26579
A Practical Guide to Designing Comprehensive
Teacher Evaluation Systems
http://tqsource.org/publications/practicalGuideEvalSystems.pdf
Measuring Teachers’ Contributions to Student
Learning Growth for Nontested Grades and
Subjects
http://www.lauragoe.com/LauraGoe/MeasuringTeachersContributions.pdf
Editor's Notes
(1) Test date. Do you test students only near the end (early May) or beginning (late September) of the school year? If so, do you want to take account of the fact that the annual growth period (say, March to March) cuts across two school years and, typically, two different teachers or sets of teachers?
(2) Test properties. Are test scores vertically scaled (across grades)? How can we account for test measurement error? Can we refine our predictions using statistical shrinkage?
(3) Demographic controls. Control for differences across schools in student demographic characteristics?
• Income status (free lunch)
• Race/ethnicity
• Gender
• Special education
• English language learner, bilingual
(4) Retention. Include retained-in-grade and promoted students in the estimation of school effects? (Almost certainly yes.)
(5) Student mobility. Include students who changed schools over the summer (that is, within the annual testing interval if tests are not administered near the end or beginning of the school year)? (Probably yes.)
(6) School-year mobility. Include students who changed schools during the school year and take account of within-school-year mobility by defining school enrollments in the model as the fraction of the school year enrolled in a given school (dose model)?
(7) Classroom/teacher indicators. What does a classroom teacher indicator represent? Answer: The productivity of teacher, classroom, principal, and school inputs.
(8) Aggregation over units: schools, schools by grade, teacher teams, individual classroom/teacher/school? Statistical precision is highest at the highest level of aggregation since precision increases with the number of students. Where should incentives be directed: at individuals or teams?
(9) Aggregation over time. “Smooth” data over time to improve precision?
(10) Multiple components. Separately estimate the productivity of regular school (and teachers), summer school, after school, NCLB Supplemental Education Services (SES)?
(11) Special education detail. Control for many different types of special education status (type and severity of handicap)?
(12) Multi-year data. Exploit multiple years of longitudinal student data to implicitly control for heterogeneity in student achievement growth profiles?
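Two of the ideas in these notes lend themselves to a small numerical sketch: the enrollment fractions behind a dose model (note 6) and statistical shrinkage (note 2), which also shows why precision rises with the number of students (note 8). The notes do not specify any formulas, so everything below is an assumption: the helper names `dose_fractions` and `shrink`, the variance inputs, and the choice of a common reliability-weighted (empirical Bayes style) shrinkage form.

```python
def dose_fractions(days_by_school, days_in_year):
    """Fraction of the school year a student was enrolled in each school.

    In a dose model these fractions would stand in for the usual 0/1
    school indicators; here they are only computed, not used in a regression.
    """
    if days_in_year <= 0:
        raise ValueError("days_in_year must be positive")
    return {school: days / days_in_year for school, days in days_by_school.items()}


def shrink(raw_estimate, n_students, grand_mean, var_between, var_within):
    """Pull a noisy estimate toward the grand mean (assumed shrinkage form).

    The sampling variance of the raw estimate is taken to be
    var_within / n_students, so larger groups are shrunk less.
    """
    if n_students <= 0:
        raise ValueError("n_students must be positive")
    sampling_var = var_within / n_students
    reliability = var_between / (var_between + sampling_var)
    return grand_mean + reliability * (raw_estimate - grand_mean)


# A student who moved from school A to school B a quarter of the way through the year.
doses = dose_fractions({"A": 45, "B": 135}, days_in_year=180)
print(doses)  # {'A': 0.25, 'B': 0.75}

# The same raw estimate of 2.0, backed by 4 students versus 36 students.
small_group = shrink(2.0, n_students=4, grand_mean=0.0, var_between=1.0, var_within=4.0)
large_group = shrink(2.0, n_students=36, grand_mean=0.0, var_between=1.0, var_within=4.0)
print(small_group)            # 1.0
print(round(large_group, 3))  # 1.8
```

With only four students the estimate is pulled halfway to the mean, while with 36 students it stays close to its raw value, which illustrates the precision argument for aggregating over larger units.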