Andy Hegedus, Ed.D.
Kingsbury Center at NWEA
June 2014
Using Assessment Data
for Educator and
Student Growth
• Increase your understanding of
various urgent assessment-related topics
–Ask better questions
–Useful for making all types of decisions with
data
My Purpose
1. Alignment between the content assessed and
the content to be taught
2. Selection of an appropriate assessment
• Used for the purpose for which it was designed
(proficiency vs. growth)
• Can accurately measure the knowledge of all
students
• Adequate sensitivity to growth
3. Adjust for context/control for factors outside a
teacher’s direct control (value-added)
Three primary conditions
1. Assessment results used
wisely as part of a
dialogue to help teachers
set and meet challenging
goals
2. Use of tests as a “yellow
light” to identify teachers
who may be in need of
additional support or are
ready for more
Two approaches we like
• What we’ve known to be true is now being
shown to be true
– Using data thoughtfully improves student
achievement and growth rates
– 12% mathematics, 13% reading
• There are dangers present, however
– Unintended Consequences
Go forth thoughtfully
with care
Slotnik, W. J., & Smith, M. D. (February 2013). It’s more than money. Retrieved from
http://www.ctacusa.com/PDFs/MoreThanMoney-report.pdf
“What gets measured (and attended to),
gets done”
Remember the old adage?
• NCLB
–Cast light on inequities
–Improved performance of “Bubble Kids”
–Narrowed taught curriculum
The same dynamic happens
inside your schools
An infamous example
It’s what we do that counts
A patient’s health
doesn’t change
because we know
their blood pressure
It’s our response that
makes all the
difference
Be considerate of the continuum of
stakes involved
Support
Compensate
Terminate
Increasing levels of required rigor
Increasing risk
Marcus Normal Growth Needed Growth
Marcus’ growth
College readiness standard
The Test
The Growth Metric
The Evaluation
The Rating
There are four key steps required to
answer this question
Top-Down Model
Assessment 1
Goal Setting
Assessment(s)
Results and Analysis
Evaluation (Rating)
How does the other
popular process work?
Bottom-Up Model
(Student Learning Objectives)
Understanding
all four of the
top-down
elements is
needed here
The Test
The Growth Metric
The Evaluation
The Rating
Let’s begin at the beginning
3rd Grade
ELA
Standards
3rd Grade
ELA
Teacher?
3rd Grade
Social
Studies
Teacher?
Elem. Art
Teacher?
What is measured should be
aligned to what is to be taught
1. Answer questions to demonstrate
understanding of text….
2. Determine the main idea of a
text….
3. Determine the meaning of general
academic and domain specific
words…
Would you use a general
reading assessment in the
evaluation of a….
~30% of teachers teach in tested subjects and grades
The Other 69 Percent: Fairly Rewarding the Performance of Teachers of Nontested Subjects and Grades,
http://www.cecr.ed.gov/guides/other69Percent.pdf
• Assessments should align with the
teacher’s instructional responsibility
– Specific advanced content
• HS teachers teaching discipline specific content
– Especially 11th and 12th grade
• MS teachers teaching HS content to advanced students
– Non-tested subjects
• School-wide results more likely reflect “professional
responsibility” than competence
– HS teachers providing remedial services
What is measured should be
aligned to what is to be taught
• Many assessments are
not designed to
measure growth
• Others do not measure
growth equally well for
all students
The purpose and design of the
instrument is significant
Let’s ensure we have similar
meaning
[Figure: a reading scale from Beginning Literacy to Adult Reading with 5th Grade marked; the student’s status (x) is plotted at Time 1 and Time 2]
Two assumptions:
1. Measurement accuracy,
and
2. Vertical interval scale
Accurately measuring
growth
depends on
accurately measuring
achievement
Questions
surrounding the
student’s
achievement level
The more
questions the
merrier
What does it take to accurately
measure achievement?
Teachers encounter a distribution
of student performance
[Figure: students (x) spread along a reading scale from Beginning Literacy to Adult Reading, around the 5th Grade level of grade-level performance]
Adaptive testing works differently
Item bank can
span full
range of
achievement
How about accurately
measuring height?
What if the yardstick
stopped in the middle of
his back?
Items available need to match student
ability
California STAR NWEA MAP
How about accurately
measuring height?
What if we could only
mark within a pre-
defined six inch range?
5th Grade
Level Items
These differences impact
measurement error
[Figure: test information (.00 to .12) by scale score (160 to 240) for a fully adaptive test versus a constrained adaptive or paper/pencil test, showing significantly different error]
To determine growth,
achievement
measurements
must be related through
a scale
If I was measured as:
5’ 9”
And a year later I was:
1.82m
Did I grow?
Yes. ~ 2.5”
How do you know?
Let’s measure height again
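The height question can only be answered once both measurements sit on the same scale. The sketch below is an illustration rather than part of the original slides: it converts both readings to inches using the standard definition 1 in = 2.54 cm, and the function names are hypothetical. Under that conversion the growth comes to roughly 2.65 inches, in line with the approximate figure on the slide.

```python
# Illustrative sketch: put two height measurements on one scale before
# computing growth. Function names are hypothetical.
CM_PER_INCH = 2.54  # exact by definition


def feet_inches_to_inches(feet: int, inches: float) -> float:
    """Convert a feet-and-inches reading to inches."""
    return feet * 12 + inches


def meters_to_inches(meters: float) -> float:
    """Convert a metric reading to inches."""
    return meters * 100 / CM_PER_INCH


first_height = feet_inches_to_inches(5, 9)   # first measurement: 5' 9"
second_height = meters_to_inches(1.82)       # second measurement: 1.82 m
growth = second_height - first_height

print(f"First: {first_height:.2f} in, second: {second_height:.2f} in")
print(f"Growth: {growth:.2f} in")
```

The same logic applies to test scores: two achievement measurements support a growth claim only if they are expressed on a common scale.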
Traditional assessment uses items
reflecting the grade level standards
Beginning Literacy
Adult
Reading
4th Grade
5th Grade
6th Grade
Grade Level Standards
Traditional
Assessment Item Bank
[Figure: the 4th, 5th, and 6th grade level standards overlap along the scale from Beginning Literacy to Adult Reading]
Overlap allows linking and scale
construction
Black, P. and Wiliam, D. (2007) 'Large-scale assessment systems: Design principles
drawn from international comparisons', Measurement: Interdisciplinary Research &
Perspectives, 5(1), 1-53
• …when science is defined in terms of
knowledge of facts that are taught in
school…(then) those students who have been
taught the facts will know them, and those
who have not will…not. A test that assesses
these skills is likely to be highly sensitive to
instruction.
The instrument must be able to
detect instruction
Black, P. and Wiliam, D. (2007) 'Large-scale assessment systems: Design principles
drawn from international comparisons', Measurement: Interdisciplinary Research &
Perspectives, 5(1), 1-53
• When ability in science is defined in terms of
scientific reasoning…achievement will be less
closely tied to age and exposure, and more
closely related to general intelligence. In
other words, science reasoning tasks are
relatively insensitive to instruction.
The more complex, the harder to
detect and attribute to one teacher
• Tests specifically designed to inform classroom
instruction and school improvement in
formative ways
No incentive in the system for
inaccurate data
Using tests in high-stakes ways
creates a new dynamic
[Figure: bar chart with values from -6.00 to 8.00 across 71 entries on the horizontal axis. Series: students taking 10+ minutes longer in spring than in fall; all other students]
New phenomenon when used as part of
a compensation program
Mean value-added growth by school
Cheating
Atlanta Public Schools
Crescendo Charter Schools
Philadelphia Public Schools
Washington DC Public Schools
Houston Independent School
District
Michigan Public Schools
When teachers are evaluated
on growth using a once per
year assessment, one teacher
who cheats disadvantages the
next teacher
Other consequence
• Both a proctor and the teacher should be
present during testing
– Teacher can best guide students and ensure effort
– Proctor protects integrity of results and can
support defense of teacher if results are
challenged
• Have all students test each term
– Need two terms to determine growth
– The more students aggregated, the more you know
Proctoring
• Important for reliable test data particularly when
determining growth
• Use Testing Condition Indicators as KPIs
– Accuracy, duration, changes in duration
– Formative conversations to improve over time
• Short test durations are worth considering for
follow-up
– Apply criteria each test event
• Be concerned more with consistency in test
duration than duration itself
Consistent Testing Conditions
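One way to turn "changes in duration" into a testing condition indicator is sketched below. It is a minimal illustration, not the presenter's procedure: the `DurationRecord` structure, the sample data, and the 10-minute threshold are assumptions (the threshold echoes the "10+ minutes longer" grouping in the earlier chart, applied here in both directions).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DurationRecord:
    """Fall and spring test durations for one student (hypothetical structure)."""
    student_id: str
    fall_minutes: float
    spring_minutes: float

    @property
    def change(self) -> float:
        """Spring duration minus fall duration, in minutes."""
        return self.spring_minutes - self.fall_minutes


def flag_for_follow_up(records, threshold_minutes=10.0):
    """Return IDs of students whose duration changed by at least the threshold.

    The threshold is an assumed value; a district would set its own criteria
    and apply them at each test event.
    """
    return [r.student_id for r in records if abs(r.change) >= threshold_minutes]


# Hypothetical sample data for illustration only.
records = [
    DurationRecord("s1", 42.0, 45.0),   # +3 minutes: consistent
    DurationRecord("s2", 38.0, 55.0),   # +17 minutes: flagged
    DurationRecord("s3", 50.0, 31.0),   # -19 minutes: flagged
]
flagged = flag_for_follow_up(records)
print(flagged)
```

Flagging on the change rather than the raw duration reflects the point that consistency matters more than duration itself.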
• Pause or terminate before completion
– Preferred option – Address when problems are
identified
– Not subject to challenge that student retested
simply because the score wasn’t good enough
• Monitor students as testing is going on
– Ensure effort
– Support students as they struggle – G&T
• Show that accurate data is important
Early Intervention
• Define “Significant” decline between test
events
– Apply significant decline criteria each test term
• Simply missing cut score is not an acceptable
reason to retest
Retesting
Testing is complete . . .
What is useful to answer our question?
The Test
The Growth Metric
The Evaluation
The Rating
[Figure: chart with a vertical axis from 0 to 100, Grades 2 through 8 on the horizontal axis, and separate series for Reading and Math]
The metric matters -
Let’s go underneath “Proficiency”
Difficulty of New York Cut Score Between Level 2 and 3
National Percentile
College
Readiness
A study of the alignment of the NWEA RIT scale with the New York State (NYS) Testing Program, November 2013
Difficulty of ACT college readiness
standards
The metric matters -
Let’s go underneath “Proficiency”
Dahlin, M. and Durant, S., The State of Proficiency, Kingsbury Center at NWEA, July 2011
Number of Students
Fall RIT
Mathematics
No Change
Down
Up
What gets measured and attended to
really does matter
Proficiency College Readiness
One district’s change in 5th grade mathematics performance
relative to the KY proficiency cut scores
Number of Students
Student’s score in fall
Mathematics
Below projected
growth
Met or above
projected growth
Number of 5th grade students meeting projected
mathematics growth in the same district
Changing from Proficiency to Growth
means all kids matter
• What did you just learn?
• How will you change what you typically
do?
Guiding Questions
How can we make it fair?
The Test
The Growth Metric
The Evaluation
The Rating
Without context what is
“Good”?
[Figure: a scale from Beginning Literacy to Adult Reading, given context by:]
• Norms study: national percentile
• Benchmarks: ACT college readiness
• Performance levels: state test “Meets” proficiency
• Performance levels: Common Core “Proficient”
Normative data for growth is a
bit different
Fall
Score
Subject:
Reading
Grade: 4th
7
points
FRL vs. non-FRL?
IEP vs. non-IEP?
ESL vs. non-ESL?
Outside of a teacher’s direct control
Starting
Achievement
Instructional
Weeks
Basic
Factors
Typical growth
[Pie chart: APPR weights of 60%, 20%, and 20%. Legend: Observations, State Test Growth, EA Value-Added]
How did we address requirements
in New York?
State Tested Grades / Subjects
(4-8 Math and Reading)
Other Grades / Subjects for which
there is an available non-state test
[Pie chart: APPR weights of 60%, 20%, and 20%. Legend: Observations, Local Measure 2, EA Value-Added. Segment labels: Value-Added, Value-Added, Local Measure 2 (SLO), State Test Growth]
Partnered with Education Analytics on VAM
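A 60/20/20 split can be read as a weighted composite. The sketch below is a hedged illustration only: it assumes the 60% share belongs to observations and the two 20% shares to state test growth and value-added, and it assumes every component is already expressed on a common 0 to 100 scale. Neither assumption is confirmed by the slides, and the function and key names are hypothetical.

```python
# Assumed mapping of weights to components (not confirmed by the slides).
APPR_WEIGHTS = {
    "observations": 0.60,
    "state_test_growth": 0.20,
    "value_added": 0.20,
}


def composite_score(component_scores, weights=APPR_WEIGHTS):
    """Weighted sum of component scores; all components assumed on a 0-100 scale."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    missing = set(weights) - set(component_scores)
    if missing:
        raise KeyError(f"missing component scores: {sorted(missing)}")
    return sum(weights[name] * component_scores[name] for name in weights)


# Hypothetical teacher, for illustration only.
teacher = {"observations": 80.0, "state_test_growth": 50.0, "value_added": 70.0}
score = composite_score(teacher)
print(f"Composite: {score:.1f}")  # 0.6*80 + 0.2*50 + 0.2*70
```

For grades and subjects without a state test, the same function could be called with a weights dictionary that swaps in a local measure.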
The Oak Tree Analogy* – a conceptual introduction
to the metric
*Developed at the Value-Added Research Center
An Introduction to Value-Added
The Oak Tree Analogy
Gardener A Gardener B
Explaining Value-Added by Evaluating
Gardener Performance
• For the past year, these gardeners have been tending to their oak trees
trying to maximize the height of the trees.
This method is analogous to using an Achievement Model.
Gardener A: 61 in.
Gardener B: 72 in.
Method 1: Measure the Height of the Trees
Today (One Year After the Gardeners Began)
• Using this method, Gardener B is the more effective gardener.
Oak A (Gardener A): 47 in. at age 3 (1 year ago), 61 in. at age 4 (today)
Oak B (Gardener B): 52 in. at age 3 (1 year ago), 72 in. at age 4 (today)
This Achievement Result is not the
Whole Story
• We need to find the starting height for each tree in order to more fairly
evaluate each gardener’s performance during the past year.
This is analogous to a Simple Growth Model, also called Gain.
Oak A (Gardener A): 47 in. to 61 in., a gain of 14 in.
Oak B (Gardener B): 52 in. to 72 in., a gain of 20 in.
Method 2: Compare Starting Height to
Ending Height
• Oak B had more growth this year, so Gardener B is the more effective gardener.
Gardener A Gardener B
What About Factors Outside the
Gardener’s Influence?
• This is an “apples to oranges” comparison.
• For our oak tree example, three environmental factors we will examine are:
Rainfall, Soil Richness, and Temperature.
External condition   Oak Tree A (Gardener A)   Oak Tree B (Gardener B)
Rainfall amount      High                      Low
Soil richness        Low                       High
Temperature          High                      Low
How Much Did These External Factors
Affect Growth?
• We need to analyze real data from the region to predict growth for these trees.
• We compare the actual height of the trees to their predicted heights to determine
if the gardener’s effect was above or below average.
In order to find the impact of rainfall, soil richness, and temperature, we will plot the
growth of each individual oak in the region compared to its environmental conditions.
Growth in inches relative to the average:
                 Low    Medium   High
Rainfall         -5     -2       +3
Soil Richness    -3     -1       +2
Temperature      +5     -3       -8
Calculating Our Prediction
Adjustments Based on Real Data
Oak A (Gardener A): 47 in. at age 3 (1 year ago) + 20 average = 67 in. predicted
Oak B (Gardener B): 52 in. at age 3 (1 year ago) + 20 average = 72 in. predicted
Make Initial Prediction for the Trees
Based on Starting Height
• Next, we will refine our prediction based on the growing conditions for each tree. When we
are done, we will have an “apples to apples” comparison of the gardeners’ effect.
Oak A (Gardener A): 47 in. + 20 average + 3 for rainfall = 70 in.
Oak B (Gardener B): 52 in. + 20 average - 5 for rainfall = 67 in.
Based on Real Data, Customize
Predictions based on Rainfall
• For having high rainfall, Oak A’s prediction is adjusted by +3 to compensate.
• Similarly, for having low rainfall, Oak B’s prediction is adjusted by -5 to compensate.
Oak A (Gardener A): 47 in. + 20 average + 3 for rainfall - 3 for soil = 67 in.
Oak B (Gardener B): 52 in. + 20 average - 5 for rainfall + 2 for soil = 69 in.
Adjusting for Soil Richness
• For having poor soil, Oak A’s prediction is adjusted by -3.
• For having rich soil, Oak B’s prediction is adjusted by +2.
Oak A (Gardener A): 47 in. + 20 average + 3 for rainfall - 3 for soil - 8 for temperature = 59 in.
Oak B (Gardener B): 52 in. + 20 average - 5 for rainfall + 2 for soil + 5 for temperature = 74 in.
Adjusting for Temperature
• For having high temperature, Oak A’s prediction is adjusted by -8.
• For having low temperature, Oak B’s prediction is adjusted by +5.
Oak A (Gardener A): 47 in. + 20 average + 3 for rainfall - 3 for soil - 8 for temperature = 59 in., or +12 inches during the year
Oak B (Gardener B): 52 in. + 20 average - 5 for rainfall + 2 for soil + 5 for temperature = 74 in., or +22 inches during the year
Our Gardeners are Now on a Level
Playing Field
• The predicted height for trees in Oak A’s conditions is 59
inches.
• The predicted height for trees in Oak B’s conditions is 74
inches.
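The prediction steps for the two oaks can be written out as a short calculation. This is a minimal sketch using the numbers from the slides (starting heights, the +20 average growth, and the adjustment table); the dictionary layout and function name are hypothetical choices made for illustration.

```python
# Average one-year growth for trees in the region, from the slides.
AVERAGE_GROWTH = 20

# Growth in inches relative to the average, by condition level (from the slides).
ADJUSTMENTS = {
    "rainfall": {"low": -5, "medium": -2, "high": 3},
    "soil": {"low": -3, "medium": -1, "high": 2},
    "temperature": {"low": 5, "medium": -3, "high": -8},
}


def predicted_height(start_height, conditions):
    """Starting height + average growth + one adjustment per environmental factor."""
    adjustment = sum(ADJUSTMENTS[factor][level] for factor, level in conditions.items())
    return start_height + AVERAGE_GROWTH + adjustment


oak_a = predicted_height(47, {"rainfall": "high", "soil": "low", "temperature": "high"})
oak_b = predicted_height(52, {"rainfall": "low", "soil": "high", "temperature": "low"})

print(f"Predicted Oak A: {oak_a} in.")  # 47 + 20 + 3 - 3 - 8
print(f"Predicted Oak B: {oak_b} in.")  # 52 + 20 - 5 + 2 + 5
```

A real value-added model estimates such adjustments statistically from many trees (or students) rather than reading them from a lookup table.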
Oak A (Gardener A): predicted 59 in., actual 61 in. (+2)
Oak B (Gardener B): predicted 74 in., actual 72 in. (-2)
Compare the Predicted Height to the
Actual Height
• Oak A’s actual height is 2 inches more than predicted. We attribute this to the effect of Gardener A.
• Oak B’s actual height is 2 inches less than predicted. We attribute this to the effect of Gardener B.
This is analogous to a Value-Added measure.
Oak A (Gardener A): predicted 59 in., actual 61 in., +2 (above average value-added)
Oak B (Gardener B): predicted 74 in., actual 72 in., -2 (below average value-added)
Method 3: Compare the Predicted
Height to the Actual Height
• By accounting for last year’s height and environmental conditions of the trees during this year, we found
the “value” each gardener “added” to the growth of the trees.
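The three methods can be compared side by side to see how the choice of metric changes the conclusion. The sketch below uses the heights and predictions from the oak tree slides; the `Oak` class and `top_gardener` helper are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Oak:
    gardener: str
    start_height: int      # inches, one year ago
    end_height: int        # inches, today
    predicted_height: int  # inches, after adjusting for conditions

    @property
    def gain(self) -> int:
        """Method 2: simple growth (ending height minus starting height)."""
        return self.end_height - self.start_height

    @property
    def value_added(self) -> int:
        """Method 3: actual height minus predicted height."""
        return self.end_height - self.predicted_height


oaks = [
    Oak("Gardener A", start_height=47, end_height=61, predicted_height=59),
    Oak("Gardener B", start_height=52, end_height=72, predicted_height=74),
]


def top_gardener(trees, metric):
    """Return the gardener whose tree scores highest on the given metric."""
    return max(trees, key=metric).gardener


by_achievement = top_gardener(oaks, lambda oak: oak.end_height)
by_gain = top_gardener(oaks, lambda oak: oak.gain)
by_value_added = top_gardener(oaks, lambda oak: oak.value_added)

print("Method 1 (achievement):", by_achievement)
print("Method 2 (gain):", by_gain)
print("Method 3 (value-added):", by_value_added)
```

Achievement and gain both favor Gardener B, while value-added favors Gardener A, which is the point of the analogy.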
Gardener A
Value-Added is a Group Measure
• To statistically isolate a gardener’s effect, we need data from many trees under
that gardener’s care.
Gardener B
How does this analogy relate to value added in the education context?
What are we evaluating?
• Oak Tree Analogy: gardeners
• Value-Added in Education: districts, schools, grades, classrooms, programs and interventions
What are we using to measure success?
• Oak Tree Analogy: relative height improvement in inches
• Value-Added in Education: relative improvement on standardized test scores
Sample
• Oak Tree Analogy: single oak tree
• Value-Added in Education: groups of students
Control factors
• Oak Tree Analogy: tree’s prior height; other factors beyond the gardener’s control (rainfall, soil richness, temperature)
• Value-Added in Education: students’ prior test performance (usually the most significant predictor); other demographic characteristics such as grade level, gender, race / ethnicity, low-income status, ELL status, disability status, Section 504 status
• What if I skip this step?
–Comparison is likely against normative data
so the comparison is to “typical kids in
typical settings”
• How fair is it to disregard context?
–Good teacher – bad school
–Good teacher – challenging kids
Consider . . .
• Control for measurement
error
– All models attempt to address
this issue
• Population size
• Multiple data points
– Error is compounded with
combining two test events
– Many teachers’ value-added
scores will fall within the range
of statistical error
A variety of errors means more
stability only at the extremes
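The compounding of error across two test events can be made concrete with a small calculation. This sketch assumes the errors of the two test events are independent, so their variances add; the standard error values and the function names are hypothetical and are not taken from any particular assessment.

```python
import math


def growth_standard_error(se_first: float, se_second: float) -> float:
    """Standard error of a growth score (second minus first).

    Assumes the two measurement errors are independent, so variances add.
    """
    return math.sqrt(se_first ** 2 + se_second ** 2)


def within_error(growth: float, standard_error: float, multiplier: float = 1.0) -> bool:
    """True if the growth estimate is no larger than the chosen error band."""
    return abs(growth) <= multiplier * standard_error


# Hypothetical standard errors of 3 scale points on each test event.
se_growth = growth_standard_error(3.0, 3.0)
print(f"Growth standard error: {se_growth:.2f}")

# A hypothetical observed growth of 4 points falls inside one standard error.
print(within_error(4.0, se_growth))
```

Averaging over more students shrinks this error, which is one reason value-added is treated as a group measure.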
[Figure: Mathematics Growth Index Distribution by Teacher - Validity Filtered. Vertical axis: average growth index score and range, -12.00 to 12.00. Teachers grouped into quintiles Q1 through Q5]
Each line in this display represents a single teacher. The graphic
shows the average growth index score for each teacher (green
line), plus or minus the standard error of the growth index estimate
(black line). We removed students who had tests of questionable
validity and teachers with fewer than 20 students.
Range of teacher value-added
estimates
With one teacher,
error means a lot
Because we want students
to learn more!
• Research view
–Setting goals improves performance
Why should we care about goal
setting in education?
What does research say on goal
setting?
Locke, E. A. & Latham, G. P. (2002). Building a practically useful theory of goal setting and task motivation: A 35-year odyssey. American Psychologist. American Psychological Association.
[Diagram: Essential Elements of Goal-Setting Theory and the High-Performance Cycle. Elements: Goals, Moderators, Mechanisms, Performance, Satisfaction with Performance and Rewards, Willingness to commit]
• Specificity
• Difficulty
– Performance and
learning goals
– Proximal goals
Goals
Goals Explanation
• Specific goals are typically
stronger than “Do your best”
goals
• Moderately challenging is better
than too easy or too hard
– If complex and new knowledge or
skills needed, set learning goals
• Master five new ways to assess each
student’s learning in the moment
– If complex, set short term goals to
gauge progress and feel rewarded
• Lack of a historical context
– What has this teacher and these students done in
the past?
• Lack of comparison groups
– What have other teachers done in the past?
• What is the objective?
– Is the objective to meet a standard of
performance or demonstrate improvement?
• Do you set safe goals or challenging goals?
Challenges with goal setting
• Goals and targets themselves
–Appropriately balance moderately
challenging goals with consequences
• Only use “Stretch” goals for the organization to
stimulate creativity and create unconventional
solutions
Suggestions
Locke, E. A., & Latham, G. P. (2013). New developments in goal setting and task performance.
• Goals and targets themselves (cont.)
–Set additional learning goals if complex and
new
–Set interim benchmarks for progress
monitoring
–Carefully consider what will not happen to
attain the goal
• Can you live with the consequences?
• How will you look for other unintended ones?
Suggestions
Locke, E. A., & Latham, G. P. (2013). New developments in goal setting and task performance.
How tests are used to evaluate
teachers
The Test
The Growth Metric
The Evaluation
The Rating
• How would you
translate a rank order
to a rating?
• Data can be provided
• Value judgment
ultimately the basis
for setting cut scores
for points or rating
Translation into ratings can be
difficult to inform with data
• What is far below a
district’s expectation is
subjective
• What about
• Obligation to help
teachers improve?
• Quality of replacement
teachers?
Decisions are value based,
not empirical
• System for combining elements and
producing a rating is also a value based
decision
–Multiple measures and principal judgment
must be included
–Evaluate the extremes to make sure it
makes sense
Even multiple measures need to
be used well
Leadership Courage Is A Key
[Figure: Observation and Assessment ratings (scale 0 to 5) for Teacher 1, Teacher 2, and Teacher 3]
Ratings can be driven by the assessment
Real or noise?
If evaluators do not differentiate
their ratings,
then all differentiation comes from
the test
Big Message
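The arithmetic behind this message is easy to check. In the sketch below, all ratings are hypothetical and the combined rating is assumed to be a simple sum of the two components; with identical observation ratings, the spread and the rank order of the combined rating come entirely from the assessment.

```python
from statistics import pvariance

# Hypothetical ratings on a 0-5 scale, for illustration only.
observation = {"Teacher 1": 3.0, "Teacher 2": 3.0, "Teacher 3": 3.0}
assessment = {"Teacher 1": 2.0, "Teacher 2": 3.0, "Teacher 3": 5.0}

# Assumed combination rule: a simple sum of the two components.
combined = {name: observation[name] + assessment[name] for name in observation}

observation_variance = pvariance(list(observation.values()))
assessment_variance = pvariance(list(assessment.values()))
combined_variance = pvariance(list(combined.values()))


def ranking(scores):
    """Teacher names ordered from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


print("Observation variance:", observation_variance)
print("Assessment variance:", round(assessment_variance, 3))
print("Combined variance:", round(combined_variance, 3))
print("Same rank order as assessment:", ranking(combined) == ranking(assessment))
```

If the assessment differences are partly noise, an undifferentiated observation component passes that noise straight into the final rating.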
1. Alignment between the content assessed and
the content to be taught
2. Selection of an appropriate assessment
• Used for the purpose for which it was designed
(proficiency vs. growth)
• Can accurately measure the knowledge of all
students
• Adequate sensitivity to growth
3. Adjust for context/control for factors outside a
teacher’s direct control (value-added)
Please be thoughtful about . . .
• Presentations and other recommended
resources are available at:
– www.nwea.org
– www.kingsburycenter.org
– www.slideshare.net
• Contacting us:
NWEA Main Number
503-624-1951
E-mail: andy.hegedus@nwea.org
More information

Predicting Proficiency… How MAP Predicts State Test Performance
 
Connecting the Dots: CCSS, DI, NWEA, Help!
Connecting the Dots: CCSS, DI, NWEA, Help!Connecting the Dots: CCSS, DI, NWEA, Help!
Connecting the Dots: CCSS, DI, NWEA, Help!
 
What's New at NWEA: Keeping Learning on Track
What's New at NWEA: Keeping Learning on TrackWhat's New at NWEA: Keeping Learning on Track
What's New at NWEA: Keeping Learning on Track
 
What’s New at NWEA: Power of Teaching
What’s New at NWEA: Power of TeachingWhat’s New at NWEA: Power of Teaching
What’s New at NWEA: Power of Teaching
 
What’s New at NWEA: Skills Pointer
What’s New at NWEA: Skills PointerWhat’s New at NWEA: Skills Pointer
What’s New at NWEA: Skills Pointer
 
Finding Meaning in NWEA Data
Finding Meaning in NWEA DataFinding Meaning in NWEA Data
Finding Meaning in NWEA Data
 
An Alternative Method to Rate Teacher Performance
An Alternative Method to Rate Teacher PerformanceAn Alternative Method to Rate Teacher Performance
An Alternative Method to Rate Teacher Performance
 
Data Driven Learning and the iPad
Data Driven Learning and the iPadData Driven Learning and the iPad
Data Driven Learning and the iPad
 
21st Century Teaching and Learning
21st Century Teaching and Learning21st Century Teaching and Learning
21st Century Teaching and Learning
 
Grading and Reporting Student Learning
Grading and Reporting Student LearningGrading and Reporting Student Learning
Grading and Reporting Student Learning
 

Kürzlich hochgeladen

Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
PECB
 
An Overview of Mutual Funds Bcom Project.pdf
An Overview of Mutual Funds Bcom Project.pdfAn Overview of Mutual Funds Bcom Project.pdf
An Overview of Mutual Funds Bcom Project.pdf
SanaAli374401
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
QucHHunhnh
 

Kürzlich hochgeladen (20)

Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17  How to Extend Models Using Mixin ClassesMixin Classes in Odoo 17  How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
 
Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdf
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
An Overview of Mutual Funds Bcom Project.pdf
An Overview of Mutual Funds Bcom Project.pdfAn Overview of Mutual Funds Bcom Project.pdf
An Overview of Mutual Funds Bcom Project.pdf
 
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
 
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across Sectors
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
fourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingfourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writing
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
Unit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptxUnit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptx
 
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
 
SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...
SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...
SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptx
 
Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptx
 
psychiatric nursing HISTORY COLLECTION .docx
psychiatric  nursing HISTORY  COLLECTION  .docxpsychiatric  nursing HISTORY  COLLECTION  .docx
psychiatric nursing HISTORY COLLECTION .docx
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 

Using Assessment Data for Educator and Student Growth

  • 1. Andy Hegedus, Ed.D. Kingsbury Center at NWEA June 2014 Using Assessment Data for Educator and Student Growth
  • 2. • Increase your understanding about various urgent assessment related topics –Ask better questions –Useful for making all types of decisions with data My Purpose
  • 3. 1. Alignment between the content assessed and the content to be taught 2. Selection of an appropriate assessment • Used for the purpose for which it was designed (proficiency vs. growth) • Can accurately measure the knowledge of all students • Adequate sensitivity to growth 3. Adjust for context/control for factors outside a teacher’s direct control (value-added) Three primary conditions
  • 4. 1. Assessment results used wisely as part of a dialogue to help teachers set and meet challenging goals 2. Use of tests as a “yellow light” to identify teachers who may be in need of additional support or are ready for more Two approaches we like
  • 5. • What we’ve known to be true is now being shown to be true – Using data thoughtfully improves student achievement and growth rates – 12% mathematics, 13% reading • There are dangers present however – Unintended Consequences Go forth thoughtfully with care Slotnik, W. J. , Smith, M. D., It’s more than money, February 2013, retrieved from http://www.ctacusa.com/PDFs/MoreThanMoney-report.pdf
  • 6. “What gets measured (and attended to), gets done” Remember the old adage?
  • 7. • NCLB –Cast light on inequities –Improved performance of “Bubble Kids” –Narrowed taught curriculum The same dynamic happens inside your schools An infamous example
  • 8. It’s what we do that counts A patient’s health doesn’t change because we know their blood pressure It’s our response that makes all the difference
  • 9. Be considerate of the continuum of stakes involved Support Compensate Terminate Increasing levels of required rigor Increasing risk
  • 10. Marcus Normal Growth Needed Growth Marcus’ growth College readiness standard
  • 11. The Test The Growth Metric The Evaluation The Rating There are four key steps required to answer this question Top-Down Model
  • 12. Assessment 1 Goal Setting Assessment(s) Results and Analysis Evaluation (Rating) How does the other popular process work? Bottom-Up Model (Student Learning Objectives) Understanding all four of the top-down elements is needed here
  • 13. The Test The Growth Metric The Evaluation The Rating Let’s begin at the beginning
  • 14. 3rd Grade ELA Standards 3rd Grade ELA Teacher? 3rd Grade Social Studies Teacher? Elem. Art Teacher? What is measured should be aligned to what is to be taught 1. Answer questions to demonstrate understanding of text…. 2. Determine the main idea of a text…. 3. Determine the meaning of general academic and domain specific words… Would you use a general reading assessment in the evaluation of a…. ~30% of teachers teach in tested subjects and grades The Other 69 Percent: Fairly Rewarding the Performance of Teachers of Nontested Subjects and Grades, http://www.cecr.ed.gov/guides/other69Percent.pdf
  • 15. • Assessments should align with the teacher’s instructional responsibility – Specific advanced content • HS teachers teaching discipline specific content – Especially 11th and 12th grade • MS teachers teaching HS content to advanced students – Non-tested subjects • School-wide results are more likely “professional responsibility” rather than reflecting competence – HS teachers providing remedial services What is measured should be aligned to what is to be taught
  • 16. • Many assessments are not designed to measure growth • Others do not measure growth equally well for all students The purpose and design of the instrument is significant
  • 17. Let’s ensure we have similar meaning Beginning Literacy Adult Reading 5th Grade x x Time 1 Time 2 Status Two assumptions: 1. Measurement accuracy, and 2. Vertical interval scale
  • 19. Questions surrounding the student’s achievement level The more questions the merrier What does it take to accurately measure achievement?
  • 20. Teachers encounter a distribution of student performance [Figure: individual student scores spread along a scale from Beginning Literacy to Adult Reading, on both sides of 5th Grade, Grade Level Performance]
  • 21. Adaptive testing works differently Item bank can span full range of achievement
  • 22. How about accurately measuring height? What if the yardstick stopped in the middle of his back?
  • 23. Items available need to match student ability California STAR NWEA MAP
  • 24. How about accurately measuring height? What if we could only mark within a pre-defined six-inch range?
  • 25. These differences impact measurement error [Figure: test information (.00 to .12) versus scale score (160 to 240) for a Fully Adaptive Test and a Constrained Adaptive or Paper/Pencil Test built from 5th Grade Level Items; annotation: Significantly Different Error]
  • 27. If I was measured as: 5’ 9” And a year later I was: 1.82m Did I grow? Yes. ~ 2.5” How do you know? Let’s measure height again
  • 28. Traditional assessment uses items reflecting the grade level standards Beginning Literacy Adult Reading 4th Grade 5th Grade 6th Grade Grade Level Standards Traditional Assessment Item Bank
  • 29. Traditional assessment uses items reflecting the grade level standards Beginning Literacy Adult Reading 4th Grade 5th Grade 6th Grade Grade Level Standards Grade Level Standards Overlap allows linking and scale construction Grade Level Standards
  • 30. Black, P. and Wiliam, D.(2007) 'Large-scale assessment systems: Design principles drawn from international comparisons', Measurement: Interdisciplinary Research & Perspective, 5: 1, 1 — 53 • …when science is defined in terms of knowledge of facts that are taught in school…(then) those students who have been taught the facts will know them, and those who have not will…not. A test that assesses these skills is likely to be highly sensitive to instruction. The instrument must be able to detect instruction
  • 31. Black, P. and Wiliam, D.(2007) 'Large-scale assessment systems: Design principles drawn from international comparisons', Measurement: Interdisciplinary Research & Perspective, 5: 1, 1 — 53 • When ability in science is defined in terms of scientific reasoning…achievement will be less closely tied to age and exposure, and more closely related to general intelligence. In other words, science reasoning tasks are relatively insensitive to instruction. The more complex, the harder to detect and attribute to one teacher
  • 32. • Tests specifically designed to inform classroom instruction and school improvement in formative ways No incentive in the system for inaccurate data Using tests in high stakes ways creates new dynamic
  • 33. New phenomenon when used as part of a compensation program [Figure: Mean value-added growth by school, vertical axis from -6.00 to 8.00, schools 1 to 71; series: Students taking 10+ minutes longer spring than fall, All other students]
  • 34. Cheating Atlanta Public Schools Crescendo Charter Schools Philadelphia Public Schools Washington DC Public Schools Houston Independent School District Michigan Public Schools
  • 35. When teachers are evaluated on growth using a once per year assessment, one teacher who cheats disadvantages the next teacher Other consequence
  • 36. • Both a proctor and the teacher should be present during testing – Teacher can best guide students and ensure effort – Proctor protects integrity of results and can support defense of teacher if results are challenged • Have all students test each term – Need two terms to determine growth – The more students aggregated, the more you know Proctoring
  • 37. • Important for reliable test data particularly when determining growth • Use Testing Condition Indicators as KPIs – Accuracy, duration, changes in duration – Formative conversations to improve over time • Short test durations are worth considering for follow-up – Apply criteria each test event • Be concerned more with consistency in test duration than duration itself Consistent Testing Conditions
  • 38. • Pause or terminate before completion – Preferred option – Address when problems are identified – Not subject to challenge that student retested simply because the score wasn’t good enough • Monitor students as testing is going on – Ensure effort – Support students as they struggle – G&T • Show that accurate data is important Early Intervention
  • 39. • Define “Significant” decline between test events – Apply significant decline criteria each test term • Simply missing cut score is not an acceptable reason to retest Retesting
  • 40. Testing is complete . . . What is useful to answer our question? The Test The Growth Metric The Evaluation The Rating
  • 41. The metric matters - Let’s go underneath “Proficiency” [Figure: Difficulty of New York Cut Score Between Level 2 and 3; National Percentile (0 to 100) by grade (Grade 2 to Grade 8) for Reading and Math, with a College Readiness reference] A study of the alignment of the NWEA RIT scale with the New York State (NYS) Testing Program, November 2013
  • 42. Difficulty of ACT college readiness standards
  • 43. The metric matters - Let’s go underneath “Proficiency” Dahlin, M. and Durant, S., The State of Proficiency, Kingsbury Center at NWEA, July 2011
  • 44. Number of Students Fall RIT Mathematics No Change Down Up What gets measured and attended to really does matter Proficiency College Readiness One district’s change in 5th grade mathematics performance relative to the KY proficiency cut scores
  • 45. Number of Students Student’s score in fall Mathematics Below projected growth Met or above projected growth Number of 5th grade students meeting projected mathematics growth in the same district Changing from Proficiency to Growth means all kids matter
  • 46. • What did you just learn? • How will you change what you typically do? Guiding Questions
  • 47. How can we make it fair? The Test The Growth Metric The Evaluation The Rating
  • 48. Without context what is “Good”? Beginning Reading Adult Literacy National Percentile Norms Study Scale College Readiness Benchmarks ACT Performance Levels State Test “Meets” Proficiency Performance Levels Common Core Proficient
  • 49. Normative data for growth is a bit different Fall Score Subject: Reading Grade: 4th 7 points FRL vs. non-FRL? IEP vs. non-IEP? ESL vs. non-ESL? Outside of a teacher’s direct control Starting Achievement Instructional Weeks Basic Factors Typical growth
  • 50. 60% 20% 20% APPR Observations State Test Growth EA Value-Added How did we address requirements in New York? State Tested Grades / Subjects (4-8 Math and Reading) Other Grades / Subjects for which there is an available non-state test 60% 20% 20% APPR Observations Local Measure 2 EA Value-Added Value-Added Value-Added Local Measure 2 (SLO) State Test Growth Partnered with Education Analytics on VAM
  • 51. The Oak Tree Analogy* – a conceptual introduction to the metric *Developed at the Value-Added Research Center An Introduction to Value-Added
  • 52. The Oak Tree Analogy
  • 53. Gardener A Gardener B Explaining Value-Added by Evaluating Gardener Performance • For the past year, these gardeners have been tending to their oak trees trying to maximize the height of the trees.
  • 54. This method is analogous to using an Achievement Model. Gardener A Gardener B 61 in. 72 in. Method 1: Measure the Height of the Trees Today (One Year After the Gardeners Began) • Using this method, Gardener B is the more effective gardener.
  • 55. 61 in. 72 in. Gardener A Gardener B Oak A Age 4 (Today) Oak B Age 4 (Today) Oak A Age 3 (1 year ago) Oak B Age 3 (1 year ago) 47 in. 52 in. This Achievement Result is not the Whole Story • We need to find the starting height for each tree in order to more fairly evaluate each gardener’s performance during the past year.
  • 56. This is analogous to a Simple Growth Model, also called Gain. 61 in. 72 in. Gardener A Gardener B Oak A Age 4 (Today) Oak B Age 4 (Today) Oak A Age 3 (1 year ago) Oak B Age 3 (1 year ago) 47 in. 52 in. Method 2: Compare Starting Height to Ending Height • Oak B had more growth this year, so Gardener B is the more effective gardener.
  • 57. Gardener A Gardener B What About Factors Outside the Gardener’s Influence? • This is an “apples to oranges” comparison. • For our oak tree example, three environmental factors we will examine are: Rainfall, Soil Richness, and Temperature.
  • 58. External condition (Oak Tree A, Gardener A / Oak Tree B, Gardener B): Rainfall amount High / Low; Soil richness Low / High; Temperature High / Low
  • 59. Gardener A Gardener B How Much Did These External Factors Affect Growth? • We need to analyze real data from the region to predict growth for these trees. • We compare the actual height of the trees to their predicted heights to determine if the gardener’s effect was above or below average.
  • 60. In order to find the impact of rainfall, soil richness, and temperature, we will plot the growth of each individual oak in the region compared to its environmental conditions.
  • 61. Calculating Our Prediction Adjustments Based on Real Data. Growth in inches relative to the average, by condition level (Low / Medium / High): Rainfall -5 / -2 / +3; Soil Richness -3 / -1 / +2; Temperature +5 / -3 / -8
  • 62. Oak A Age 3 (1 year ago) Oak B Age 3 (1 year ago) 67 in. 72 in. Gardener A Gardener B Oak A Prediction Oak B Prediction 47 in. 52 in. +20 Average +20 Average Make Initial Prediction for the Trees Based on Starting Height • Next, we will refine our prediction based on the growing conditions for each tree. When we are done, we will have an “apples to apples” comparison of the gardeners’ effect.
  • 63. 70 in. 67 in. Gardener A Gardener B 47 in. 52 in. +20 Average +20 Average + 3 for Rainfall - 5 for Rainfall Based on Real Data, Customize Predictions based on Rainfall • For having high rainfall, Oak A’s prediction is adjusted by +3 to compensate. • Similarly, for having low rainfall, Oak B’s prediction is adjusted by -5 to compensate.
  • 64. 67 in. 69 in. Gardener A Gardener B 47 in. 52 in. +20 Average +20 Average + 3 for Rainfall - 3 for Soil + 2 for Soil - 5 for Rainfall Adjusting for Soil Richness • For having poor soil, Oak A’s prediction is adjusted by -3. • For having rich soil, Oak B’s prediction is adjusted by +2.
  • 65. 59 in. 74 in. Gardener A Gardener B 47 in. 52 in. +20 Average +20 Average + 3 for Rainfall - 3 for Soil + 2 for Soil - 8 for Temp + 5 for Temp - 5 for Rainfall Adjusting for Temperature • For having high temperature, Oak A’s prediction is adjusted by -8. • For having low temperature, Oak B’s prediction is adjusted by +5.
  • 66. Our Gardeners are Now on a Level Playing Field Gardener A: 47 in. +20 Average + 3 for Rainfall - 3 for Soil - 8 for Temp = +12 inches during the year, 59 in. Gardener B: 52 in. +20 Average - 5 for Rainfall + 2 for Soil + 5 for Temp = +22 inches during the year, 74 in. • The predicted height for trees in Oak A’s conditions is 59 inches. • The predicted height for trees in Oak B’s conditions is 74 inches.
  • 67. Predicted Oak A Predicted Oak B Actual Oak A Actual Oak B 59 in. 74 in. Gardener A Gardener B 61 in. 72 in. +2 -2 Compare the Predicted Height to the Actual Height • Oak A’s actual height is 2 inches more than predicted. We attribute this to the effect of Gardener A. • Oak B’s actual height is 2 inches less than predicted. We attribute this to the effect of Gardener B.
  • 68. This is analogous to a Value-Added measure. Above Average Value-Added Below Average Value-Added Predicted Oak A Predicted Oak B Actual Oak A Actual Oak B 59 in. 74 in. Gardener A Gardener B 61 in. 72 in. +2 -2 Method 3: Compare the Predicted Height to the Actual Height • By accounting for last year’s height and environmental conditions of the trees during this year, we found the “value” each gardener “added” to the growth of the trees.
  • 69. Gardener A Value-Added is a Group Measure • To statistically isolate a gardener’s effect, we need data from many trees under that gardener’s care. Gardener B
  • 70. How does this analogy relate to value added in the education context? What are we evaluating? Oak Tree Analogy: gardeners. Value-Added in Education: districts, schools, grades, classrooms, programs and interventions. What are we using to measure success? Oak Tree Analogy: relative height improvement in inches. Value-Added in Education: relative improvement on standardized test scores. Sample. Oak Tree Analogy: single oak tree. Value-Added in Education: groups of students. Control factors. Oak Tree Analogy: tree’s prior height, plus other factors beyond the gardener’s control (rainfall, soil richness, temperature). Value-Added in Education: students’ prior test performance (usually most significant predictor), plus other demographic characteristics such as grade level, gender, race / ethnicity, low-income status, ELL status, disability status, Section 504 status.
  • 71. • What if I skip this step? –Comparison is likely against normative data so the comparison is to “typical kids in typical settings” • How fair is it to disregard context? –Good teacher – bad school –Good teacher – challenging kids Consider . . .
  • 72. • Control for measurement error – All models attempt to address this issue • Population size • Multiple data points – Error is compounded with combining two test events – Many teachers’ value-added scores will fall within the range of statistical error A variety of errors means more stability only at the extremes
  • 73. Range of teacher value-added estimates [Figure: Mathematics Growth Index Distribution by Teacher - Validity Filtered; vertical axis: Average Growth Index Score and Range, from -12.00 to 12.00; teachers grouped into quintiles Q5 to Q1] Each line in this display represents a single teacher. The graphic shows the average growth index score for each teacher (green line), plus or minus the standard error of the growth index estimate (black line). We removed students who had tests of questionable validity and teachers with fewer than 20 students.
  • 74. With one teacher, error means a lot
  • 75. Because we want students to learn more! • Research view –Setting goals improves performance Why should we care about goal setting in education?
  • 76. What does research say on goal setting? Locke, E. A. & Latham, G. P. (2002). Building a practically useful theory of goal setting and task motivation: A 35-year odyssey. American psychologist. American Psychological Association. Goals Moderators Mechanisms Performance Satisfaction with Performance and Rewards Willingness to commit Essential Elements of Goal-Setting Theory and the High-Performance Cycle
  • 77. What does research say on goal setting? Locke, E. A. & Latham, G. P. (2002). Building a practically useful theory of goal setting and task motivation: A 35-year odyssey. American psychologist. American Psychological Association. Goals Moderators Mechanisms Performance Satisfaction with Performance and Rewards Willingness to commit Essential Elements of Goal-Setting Theory and the High-Performance Cycle
  • 78. • Specificity • Difficulty – Performance and learning goals – Proximal goals Goals Goals Explanation • Specific goals are typically stronger than “Do your best” goals • Moderately challenging is better than too easy or too hard – If complex and new knowledge or skills needed, set learning goals • Master five new ways to assess each student’s learning in the moment – If complex, set short term goals to gauge progress and feel rewarded
  • 79. • Lack of a historical context – What has this teacher and these students done in the past? • Lack of comparison groups – What have other teachers done in the past? • What is the objective? – Is the objective to meet a standard of performance or demonstrate improvement? • Do you set safe goals or challenging goals? Challenges with goal setting
  • 80. • Goals and targets themselves –Appropriately balance moderately challenging goals with consequences • Only use “Stretch” goals for the organization to stimulate creativity and create unconventional solutions Suggestions Locke, E. A., & Latham, G. P. (2013). New developments in goal setting and task performance.
  • 81. • Goals and targets themselves (cont.) –Set additional learning goals if complex and new –Set interim benchmarks for progress monitoring –Carefully consider what will not happen to attain the goal • Can you live with the consequences? • How will you look for other unintended ones? Suggestions Locke, E. A., & Latham, G. P. (2013). New developments in goal setting and task performance.
  • 82. How tests are used to evaluate teachers The Test The Growth Metric The Evaluation The Rating
  • 83. • How would you translate a rank order to a rating? • Data can be provided • Value judgment ultimately the basis for setting cut scores for points or rating Translation into ratings can be difficult to inform with data
  • 84. • What is far below a district’s expectation is subjective • What about • Obligation to help teachers improve? • Quality of replacement teachers? Decisions are value based, not empirical
  • 85. • System for combining elements and producing a rating is also a value based decision –Multiple measures and principal judgment must be included –Evaluate the extremes to make sure it makes sense Even multiple measures need to be used well
  • 86. Leadership Courage Is A Key 0 1 2 3 4 5 Teacher 1 Teacher 2 Teacher 3 Ratings can be driven by the assessment Observation Assessment Real or Noise?
  • 87. If evaluators do not differentiate their ratings, then all differentiation comes from the test Big Message
  • 88. 1. Alignment between the content assessed and the content to be taught 2. Selection of an appropriate assessment • Used for the purpose for which it was designed (proficiency vs. growth) • Can accurately measure the knowledge of all students • Adequate sensitivity to growth 3. Adjust for context/control for factors outside a teacher’s direct control (value-added) Please be thoughtful about . . .
  • 89. • Presentations and other recommended resources are available at: – www.nwea.org – www.kingsburycenter.org – www.slideshare.net • Contacting us: NWEA Main Number 503-624-1951 E-mail: andy.hegedus@nwea.org More information
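
The oak tree arithmetic on slides 54 to 68 can be traced in a few lines of Python. This is only a sketch of the analogy, not a real value-added model: the adjustment table, the +20 inch average and the tree heights are copied from the slides, while the function and variable names are made up here for illustration.

```python
# Growth adjustments in inches relative to the average (slide 61).
ADJUSTMENTS = {
    "rainfall": {"low": -5, "medium": -2, "high": 3},
    "soil": {"low": -3, "medium": -1, "high": 2},
    "temperature": {"low": 5, "medium": -3, "high": -8},
}
AVERAGE_GROWTH = 20  # inches per year for trees in the region (slide 62)

# Starting height, height today, and growing conditions (slides 55 and 58).
OAKS = {
    "A": {"start": 47, "actual": 61,
          "conditions": {"rainfall": "high", "soil": "low", "temperature": "high"}},
    "B": {"start": 52, "actual": 72,
          "conditions": {"rainfall": "low", "soil": "high", "temperature": "low"}},
}


def simple_gain(oak):
    """Method 2: ending height minus starting height."""
    return oak["actual"] - oak["start"]


def predicted_height(oak):
    """Starting height plus average growth plus the condition adjustments."""
    adjustment = sum(ADJUSTMENTS[factor][level]
                     for factor, level in oak["conditions"].items())
    return oak["start"] + AVERAGE_GROWTH + adjustment


def value_added(oak):
    """Method 3: actual height minus predicted height."""
    return oak["actual"] - predicted_height(oak)


for name, oak in OAKS.items():
    print(name, simple_gain(oak), predicted_height(oak), value_added(oak))
# A: gain 14, predicted 59, value-added +2
# B: gain 20, predicted 74, value-added -2
```

The three methods rank the gardeners differently: achievement and simple gain both favor Gardener B, while the context-adjusted comparison favors Gardener A.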

Editor's notes

  1. Teacher evaluations and the use of data in them can take many forms. You can use them for supporting teachers and their improvement. You can use the evaluations to compensate teachers or groups of teachers differently or you can use them in their highest stakes way to terminate teachers. The higher the stakes put on the evaluation, the more risk there is to you and your organization from a political, legal, and equity perspective. Most people naturally respond with increasing the levels of rigor put into designing the process as a way to ameliorate the risk. One fact is that the risk can’t be eliminated. Our goal – Make sure you are prepared. Understand the risk. Proper ways to implement including legal issues. Clarify some of the implications – Very complex – Prepare you and a prudent course
  2. Contrast with what value added communicates Plot normal growth for Marcus vs anticipated growth – value added. If you ask whether the teachers provided value added, the answer is Yes. Other line is what is needed for college readiness Blue line is what is used to evaluate the teacher. Is he on the line the parents want him to be on? Probably not. Don’t focus on one at the expense of the other NCLB – AYP vs what the parent really wants for goal setting Can be come so focused on measuring teachers that we lose sight of what parents value We are better off moving towards the kids aspirations As a parent I didn’t care if the school made AYP. I cared if my kids got the courses that helped them go where they want to go.
  3. Steps are quite important. People tend to skip some of these. The test: kids take a test, and it is important that the test is aligned to the instruction being given. The metric: look at growth vs. the growth norm and calculate a growth index. Two benefits: very transparent and simple. People tend to use our growth norms; if you hit 60% for a grade level within a school, you are doing well. Norms compare the growth of a kid or group of kids to a nationally representative sample of students. Why isn’t this value added? Not all teachers can be compared to a nationally representative sample, because they don’t teach kids that are just like the national sample. The third step controls for variables unique to the teacher’s classroom or environment. The fourth step is the rating: how much below average before the district takes action, or how much above before someone gets performance pay. This is a particular challenge in NY state right now. Law requires it.
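The metric step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the growth index is taken to be observed growth minus the norm growth for similar students, the 60% figure is read as the share of students meeting or exceeding their norm, and the class name, field names, and scores are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StudentGrowth:
    fall_score: float
    spring_score: float
    norm_growth: float  # hypothetical typical growth for similar students


def growth_index(student):
    """Observed fall-to-spring growth minus the growth norm (assumed definition)."""
    return (student.spring_score - student.fall_score) - student.norm_growth


def percent_meeting_norm(students):
    """Share of students, as a percentage, whose growth index is zero or higher."""
    if not students:
        return 0.0
    met = sum(1 for s in students if growth_index(s) >= 0)
    return 100.0 * met / len(students)


# Made-up scores for one grade level within a school.
students = [
    StudentGrowth(200, 208, 6),  # grew 8 against a norm of 6
    StudentGrowth(195, 199, 7),  # grew 4 against a norm of 7
    StudentGrowth(210, 215, 5),  # grew 5 against a norm of 5
    StudentGrowth(188, 198, 8),  # grew 10 against a norm of 8
    StudentGrowth(220, 222, 4),  # grew 2 against a norm of 4
]

print(percent_meeting_norm(students))
```

Three of the five made-up students meet or exceed their norm, so the sketch reports 60 percent for this group.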
  5. Common Core: very ambitious things they want to measure, tackling the kinds of things found on an AP test. Students write and show their work. Using a Common Core assessment to evaluate teachers can be a problem. Raise your hand if you know what the capital of Chile is. Santiago. Repeat after me. We will review in a couple of minutes. Facts can be acquired relatively easily and are instructionally sensitive. If you expose kids to facts in meaningful and engaging ways, the result is sensitive to instruction.
  6. State assessments are designed to measure proficiency: many items in the middle, not at the ends. Must use multiple points of data over time to measure this. We also believe that a principal should be more in control of the evaluation than the test. Principals and teacher leaders are what change schools.
  7. 5th grade NY reading cut scores shown
  8. Problem: insensitive to instruction. Prerequisite skills: writing skills. Given events in North Africa today, the question requires a lot of prerequisite knowledge. You need to know the story and put it into writing. You need reasoning skills to put it together with events today, and you need to know what is going on today as well. One doesn’t develop this entire set of skills in the 9 months of instruction. Common Core is what we want, just not for teacher evaluation. These questions are not that sensitive to instruction. That is problematic when we hold teachers accountable for instruction or growth.
  10. How you talk with students in advance. How students see their data being used: does it make a difference in their life? Test scheduling and pre- or post-test activities. When during the day testing is scheduled.
  11. Steps are quite important. People tend to skip some of these. The test: kids take a test, and it is important that the test is aligned to the instruction being given. The metric: look at growth vs. the growth norm and calculate a growth index. Two benefits: very transparent and simple. People tend to use our growth norms; if you hit 60% for a grade level within a school, you are doing well. Norms compare the growth of a kid or group of kids to a nationally representative sample of students. Why isn’t this value added? Not all teachers can be compared to a nationally representative sample, because they don’t teach kids that are just like the national sample. The third step controls for variables unique to the teacher’s classroom or environment. The fourth step is the rating: how much below average before the district takes action, or how much above before someone gets performance pay. This is a particular challenge in NY state right now. Law requires it.
  12. NCLB required everyone to get above proficient. The message: focus on kids at or near proficient. School systems responded. Middle school (MS) standards are harder than the elementary standards: the MS problem. There was no effort to calibrate them and no effort to project elementary standards to MS standards. Start easy and ramp up. A student can be proficient in elementary and not in MS with normal growth. When you control for the difficulty of the standards, elementary and MS performance are the same.
  13. Not only are standards different across grades, they are different across states. It’s data like this that helped inspire the Common Core and consistent standards, so we can compare apples to apples.
  14. Dramatic differences between standards-based and growth measures. KY 5th grade mathematics. Sample of students from a large school system. X-axis is fall score; Y-axis is number of kids. Blue are the kids who did not change status between the fall and the spring on the state test. Red are the kids who declined in performance over the spring (descenders). Green are the kids who moved above the proficiency bar over the spring (ascenders, the bubble kids). About 10% based on the total number of kids. Accountability plans are typically made based on these red and green kids.
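The blue, red, and green grouping described in this note can be sketched as a simple classification against a proficiency cut score. This is an illustration only: it assumes a single cut score applies to both the fall and spring scores, and the cut value, function name, and score pairs are hypothetical rather than taken from the KY data.

```python
from collections import Counter


def status_change(fall_score, spring_score, cut_score):
    """Label a student by whether proficiency status changed from fall to spring."""
    was_proficient = fall_score >= cut_score
    is_proficient = spring_score >= cut_score
    if was_proficient == is_proficient:
        return "no change"  # blue in the chart
    return "ascender" if is_proficient else "descender"  # green or red


CUT = 200  # hypothetical proficiency cut score

# Made-up (fall, spring) score pairs.
scores = [
    (185, 192), (198, 203), (201, 197), (215, 221), (170, 181),
    (205, 209), (190, 196), (230, 228), (160, 175), (210, 214),
]

counts = Counter(status_change(fall, spring, CUT) for fall, spring in scores)
changed_share = (counts["ascender"] + counts["descender"]) / len(scores)
print(counts, changed_share)
```

In this made-up sample only the two students who start close to the cut change status, even though most of the others also grew.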
  15. Same district as before. Yellow: did not meet target growth, spread over the entire range of kids. Green: did meet growth targets. 60% vs. 40% is doing well. This is a high-performing district with high growth. Must attend to all kids, the ones in the middle and at both extremes; this is a good thing. The old one was discriminatory: focus on some in lieu of others. Teachers who teach really hard at the standard for years: teachers need to be able to reach them all. This does a lot to move the accountability system toward parents and our desires.
  16. Steps are quite important. People tend to skip some of these. The test: kids take a test, and it is important that the test is aligned to the instruction being given. The metric: look at growth vs. the growth norm and calculate a growth index. Two benefits: very transparent and simple. People tend to use our growth norms; if you hit 60% for a grade level within a school, you are doing well. Norms compare the growth of a kid or group of kids to a nationally representative sample of students. Why isn’t this value added? Not all teachers can be compared to a nationally representative sample, because they don’t teach kids that are just like the national sample. The third step controls for variables unique to the teacher’s classroom or environment. The fourth step is the rating: how much below average before the district takes action, or how much above before someone gets performance pay. This is a particular challenge in NY state right now. Law requires it.
  17. There are wonderful teachers who teach in very challenging, dysfunctional settings. The setting can impact growth. HLM (hierarchical linear modeling) embeds the student in a classroom and the classroom in the school, and controls for the school parameters. Is it perfect? No. Is it better? Yes. The opposite is true as well: learning can be magnified. What if kids are a challenge, ESL or attendance for instance? It can deflate scores, especially with a low number of kids in the sample being analyzed. You also need to make sure you have a large enough ‘n’ to make this possible, which is especially true in small districts. Our position is that a test can inform the decision, but the principal/administrator should collect the bulk of the data that is used in the performance evaluation process.
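  18. Measurement error is compounded across test 1 and test 2: growth is the difference between the two scores, so the error in each test carries into the growth estimate.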
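A small worked example of how the error compounds, assuming the measurement errors on the two tests are independent so that their variances add. The function name and the SEM values are hypothetical.

```python
import math


def growth_sem(sem_test1, sem_test2):
    """Standard error of a growth score (test 2 minus test 1).

    Assumes the two measurement errors are independent, so the variances
    add and the combined error is the square root of the sum of squares.
    """
    return math.hypot(sem_test1, sem_test2)


# Hypothetical standard errors of measurement on the two tests.
print(growth_sem(3.0, 4.0))
```

Under that independence assumption the error on the growth score is larger than the error on either test alone, which is why small observed gains are hard to distinguish from noise.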
  19. Green line is their value-added (VA) estimate and the bar is the error of measure. Both at the top and at the bottom, people can be in other quartiles. People in the middle can cross quintiles based on the standard error of measurement (SEM) alone. Cross country analogy: the winners spread out, the end of the race spreads out, and in the middle you get a pack. Moving up in the middle makes a big difference in the overall race. Instability and narrowness of ranges mean that, for teachers in the middle, slight changes in performance can be a large change in performance ranking.
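The point about error bars crossing group boundaries can be sketched as follows. This is a hedged illustration, not the method behind the chart: the cut points between the five groups, the estimates, and the error margins are all made-up values, and the function names are hypothetical.

```python
from bisect import bisect_right


def group_of(value, cuts):
    """Return the 1-based group a value falls in, given sorted cut points."""
    return bisect_right(cuts, value) + 1


def groups_spanned(estimate, margin, cuts):
    """List every group touched by the interval estimate +/- margin."""
    low = group_of(estimate - margin, cuts)
    high = group_of(estimate + margin, cuts)
    return list(range(low, high + 1))


# Hypothetical boundaries between five groups of value-added estimates.
# They are narrow in the middle, where the pack of teachers sits.
cuts = [-1.0, -0.3, 0.3, 1.0]

print(groups_spanned(0.1, 0.5, cuts))  # a teacher in the middle of the pack
print(groups_spanned(1.2, 0.5, cuts))  # a teacher near the top
```

With these made-up numbers the middle teacher's interval touches groups 2 through 4, while the top teacher's interval touches groups 4 and 5, so the same margin of error moves a mid-pack teacher across more rankings.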
  20. No solid research on learning and performance goals at the same time. For complex situations where learning is required, learning goals work best, then “Do your best” goals, then performance goals. Focus should be on mastering skills rather than reaching a desired level of performance. That will come later. Performance goals distract from the learning that is needed. Learning goals help moderate cheating, as opposed to performance goals.
  22. Steps are quite important. People tend to skip some of these. The test: kids take a test, and it is important that the test is aligned to the instruction being given. The metric: look at growth vs. the growth norm and calculate a growth index. Two benefits: very transparent and simple. People tend to use our growth norms; if you hit 60% for a grade level within a school, you are doing well. Norms compare the growth of a kid or group of kids to a nationally representative sample of students. Why isn’t this value added? Not all teachers can be compared to a nationally representative sample, because they don’t teach kids that are just like the national sample. The third step controls for variables unique to the teacher’s classroom or environment. The fourth step is the rating: how much below average before the district takes action, or how much above before someone gets performance pay. This is a particular challenge in NY state right now. Law requires it.
  23. Use NY point system as the example