This presentation reviews major topics to be considered when using assessment data in implementing a school's program of educator and student growth and evaluation. By attending this workshop, participants will improve their assessment literacy, learn how to improve student achievement and instructional effectiveness through thoughtful data use, and discuss common issues shared by educators when using data for evaluative purposes.
2. My Purpose
• Increase your understanding of various urgent assessment-related topics
– Ask better questions
– Useful for making all types of decisions with data
3. Three primary conditions
1. Alignment between the content assessed and the content to be taught
2. Selection of an appropriate assessment
• Used for the purpose for which it was designed (proficiency vs. growth)
• Can accurately measure the knowledge of all students
• Adequate sensitivity to growth
3. Adjust for context/control for factors outside a teacher’s direct control (value-added)
4. Two approaches we like
1. Assessment results used wisely as part of a dialogue to help teachers set and meet challenging goals
2. Use of tests as a “yellow light” to identify teachers who may be in need of additional support or are ready for more
5. Go forth thoughtfully with care
• What we’ve known to be true is now being shown to be true
– Using data thoughtfully improves student achievement and growth rates
– 12% mathematics, 13% reading
• There are dangers present, however
– Unintended consequences
Slotnik, W. J., & Smith, M. D. (February 2013). It’s more than money. Retrieved from http://www.ctacusa.com/PDFs/MoreThanMoney-report.pdf
7. An infamous example
• NCLB
– Cast light on inequities
– Improved performance of “bubble kids”
– Narrowed the taught curriculum
The same dynamic happens inside your schools
8. It’s what we do that counts
A patient’s health doesn’t change because we know their blood pressure. It’s our response that makes all the difference.
9. Be considerate of the continuum of stakes involved
Support → Compensate → Terminate
(Increasing levels of required rigor; increasing risk)
11. Top-Down Model
There are four key steps required to answer this question:
The Test → The Growth Metric → The Evaluation → The Rating
12. Bottom-Up Model (Student Learning Objectives)
How does the other popular process work?
Assessment 1 → Goal Setting → Assessment(s) → Results and Analysis → Evaluation (Rating)
Understanding all four of the top-down elements is needed here
13. Let’s begin at the beginning
The Test → The Growth Metric → The Evaluation → The Rating
14. What is measured should be aligned to what is to be taught
3rd Grade ELA Standards
1. Answer questions to demonstrate understanding of text….
2. Determine the main idea of a text….
3. Determine the meaning of general academic and domain specific words…
Would you use a general reading assessment in the evaluation of a….
• 3rd Grade ELA Teacher?
• 3rd Grade Social Studies Teacher?
• Elem. Art Teacher?
~30% of teachers teach in tested subjects and grades
The Other 69 Percent: Fairly Rewarding the Performance of Teachers of Nontested Subjects and Grades, http://www.cecr.ed.gov/guides/other69Percent.pdf
15. What is measured should be aligned to what is to be taught
• Assessments should align with the teacher’s instructional responsibility
– Specific advanced content
• HS teachers teaching discipline-specific content
– Especially 11th and 12th grade
• MS teachers teaching HS content to advanced students
– Non-tested subjects
• School-wide results are more likely to reflect “professional responsibility” than competence
– HS teachers providing remedial services
16. The purpose and design of the instrument is significant
• Many assessments are not designed to measure growth
• Others do not measure growth equally well for all students
17. Let’s ensure we have similar meaning
[Figure: a reading scale from Beginning Literacy to Adult Reading, with 5th grade marked; one student’s status is plotted at Time 1 and again at Time 2]
Two assumptions:
1. Measurement accuracy, and
2. A vertical interval scale
20. Teachers encounter a distribution of student performance
[Figure: a reading scale from Beginning Literacy to Adult Reading, with 5th grade level marked; students’ performance is scattered widely above and below grade level]
27. Let’s measure height again
If I was measured as 5’ 9”
and a year later I was 1.82 m,
did I grow?
Yes, by about 2.65” (1.82 m ≈ 71.65”).
How do you know?
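The height example makes the same point as the two assumptions on slide 17: growth can only be read off once both measurements sit on one interval scale. A minimal Python sketch of that step; the function names are hypothetical, chosen for illustration.

```python
# Hypothetical illustration: growth is only readable once both heights share one scale.
METERS_PER_INCH = 0.0254  # exact by definition


def feet_inches_to_inches(feet: int, inches: float) -> float:
    """Convert a feet-and-inches reading to inches."""
    return feet * 12 + inches


def meters_to_inches(meters: float) -> float:
    """Convert a metric reading to inches."""
    return meters / METERS_PER_INCH


time1 = feet_inches_to_inches(5, 9)  # 69 inches
time2 = meters_to_inches(1.82)       # about 71.65 inches

# Subtracting is meaningful because inches form an interval scale:
# one inch of change means the same thing anywhere on the scale.
growth = time2 - time1
print(f"Growth: {growth:.2f} in")  # Growth: 2.65 in
```

Comparing 5’ 9” directly with 1.82 m, without the conversion, is the analogue of comparing scores from two tests that share no common scale.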
28. Traditional assessment uses items reflecting the grade level standards
[Figure: a reading scale from Beginning Literacy to Adult Reading; the traditional assessment item bank is drawn from the grade level standards, shown for 4th, 5th, and 6th grade]
29. Traditional assessment uses items reflecting the grade level standards
[Figure: the same scale with separate grade level standards for 4th, 5th, and 6th grade; overlap between adjacent grades’ items allows linking and scale construction]
30. The instrument must be able to detect instruction
• “…when science is defined in terms of knowledge of facts that are taught in school…(then) those students who have been taught the facts will know them, and those who have not will…not. A test that assesses these skills is likely to be highly sensitive to instruction.”
Black, P., & Wiliam, D. (2007). Large-scale assessment systems: Design principles drawn from international comparisons. Measurement: Interdisciplinary Research & Perspectives, 5(1), 1–53.
31. The more complex, the harder to detect and attribute to one teacher
• “When ability in science is defined in terms of scientific reasoning…achievement will be less closely tied to age and exposure, and more closely related to general intelligence. In other words, science reasoning tasks are relatively insensitive to instruction.”
Black & Wiliam (2007)
32. Using tests in high-stakes ways creates a new dynamic
• Tests specifically designed to inform classroom instruction and school improvement in formative ways
• No incentive in the system for inaccurate data
33. New phenomenon when used as part of a compensation program
[Figure: mean value-added growth by school (y-axis from -6 to 8), comparing students taking 10+ minutes longer in spring than in fall with all other students]
34. Cheating
Atlanta Public Schools
Crescendo Charter Schools
Philadelphia Public Schools
Washington DC Public Schools
Houston Independent School District
Michigan Public Schools
35. Other consequence
When teachers are evaluated on growth using a once-per-year assessment, one teacher who cheats disadvantages the next teacher: the inflated score becomes that teacher’s starting point, so their students appear to grow less.
36. Proctoring
• Both a proctor and the teacher should be present during testing
– The teacher can best guide students and ensure effort
– The proctor protects the integrity of results and can support the defense of the teacher if results are challenged
• Have all students test each term
– Two terms are needed to determine growth
– The more students aggregated, the more you know
37. Consistent Testing Conditions
• Important for reliable test data, particularly when determining growth
• Use Testing Condition Indicators as KPIs
– Accuracy, duration, changes in duration
– Formative conversations to improve over time
• Short test durations warrant follow-up
– Apply criteria at each test event
• Be concerned more with consistency in test duration than with duration itself
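Applying duration criteria at each test event can be sketched in a few lines of Python, assuming per-student fall and spring durations in minutes. The `TestEvent` type and both thresholds are hypothetical; the 10-minute change threshold only echoes the legend on slide 33, and a district would set its own values.

```python
from dataclasses import dataclass


@dataclass
class TestEvent:
    student_id: str
    fall_minutes: float
    spring_minutes: float


# Hypothetical thresholds; a district would choose its own.
CHANGE_THRESHOLD_MIN = 10.0  # flag large fall-to-spring shifts in duration
SHORT_TEST_MIN = 15.0        # flag suspiciously short spring sessions


def duration_flags(event: TestEvent) -> list[str]:
    """Return the testing-condition flags raised for one student."""
    flags = []
    change = event.spring_minutes - event.fall_minutes
    # Consistency matters more than duration itself, so check the change first.
    if abs(change) >= CHANGE_THRESHOLD_MIN:
        flags.append(f"duration changed by {change:+.0f} min")
    if event.spring_minutes < SHORT_TEST_MIN:
        flags.append("short spring test")
    return flags


events = [
    TestEvent("s1", fall_minutes=40, spring_minutes=42),
    TestEvent("s2", fall_minutes=35, spring_minutes=55),
    TestEvent("s3", fall_minutes=38, spring_minutes=12),
]
for e in events:
    print(e.student_id, duration_flags(e) or "ok")
```

Flags are prompts for a formative follow-up conversation, not automatic verdicts about the score.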
38. Early Intervention
• Pause or terminate before completion
– Preferred option: address problems when they are identified
– Not subject to the challenge that a student was retested simply because the score wasn’t good enough
• Monitor students as testing is going on
– Ensure effort
– Support students as they struggle (G&T)
• Show that accurate data is important
39. Retesting
• Define a “significant” decline between test events
– Apply the significant-decline criteria each test term
• Simply missing a cut score is not an acceptable reason to retest
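One way to define a “significant” decline is to compare the drop with the measurement error of both test events. The sketch below assumes each score comes with a standard error of measurement (SEM) and uses a hypothetical z = 1.96 criterion; the scores, SEMs, and cut score are all made up for illustration.

```python
import math


def significant_decline(fall: float, spring: float,
                        fall_sem: float, spring_sem: float, z: float = 1.96) -> bool:
    """Flag a drop larger than z standard errors of the fall-to-spring difference.

    The SEMs of the two events combine, because each score carries its own error.
    """
    se_diff = math.sqrt(fall_sem ** 2 + spring_sem ** 2)
    return (fall - spring) > z * se_diff


CUT_SCORE = 209  # hypothetical cut; used only for reporting, not in the retest rule

students = {"A": (210, 207), "B": (210, 198)}  # hypothetical (fall, spring) scores
for name, (fall, spring) in students.items():
    retest = significant_decline(fall, spring, fall_sem=3.0, spring_sem=3.0)
    status = "below cut" if spring < CUT_SCORE else "at/above cut"
    print(name, status, "retest" if retest else "no retest")
```

Student A misses the cut but declined by less than the combined error, so the rule does not call for a retest. Only B's 12-point drop does.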
40. Testing is complete . . . What is useful to answer our question?
The Test → The Growth Metric → The Evaluation → The Rating
41. The metric matters: Let’s go underneath “Proficiency”
[Figure: difficulty of the New York cut score between Level 2 and Level 3, expressed as a national percentile (0–100), for reading and math in grades 2–8, shown against college readiness]
A study of the alignment of the NWEA RIT scale with the New York State (NYS) Testing Program, November 2013
43. The metric matters: Let’s go underneath “Proficiency”
Dahlin, M., & Durant, S. (July 2011). The State of Proficiency. Kingsbury Center at NWEA.
44. What gets measured and attended to really does matter
[Figure: number of students by fall RIT score in mathematics, grouped by change in status (no change, down, up) relative to the proficiency and college readiness cut scores]
One district’s change in 5th grade mathematics performance relative to the KY proficiency cut scores
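The status-change view counts only students who cross a cut score. This Python sketch classifies each student against a single cut score; the cut score and the student scores are hypothetical, not the KY values behind the chart.

```python
from collections import Counter


def status_change(fall: float, spring: float, cut: float) -> str:
    """Classify a student's proficiency status change relative to one cut score."""
    was_proficient = fall >= cut
    is_proficient = spring >= cut
    if was_proficient == is_proficient:
        return "no change"
    return "up" if is_proficient else "down"


CUT = 220  # hypothetical cut score, not the actual KY value
scores = [(205, 212), (218, 223), (221, 217), (240, 246), (190, 199)]  # hypothetical (fall, spring)
counts = Counter(status_change(f, s, CUT) for f, s in scores)
print(counts)
```

Only the two students who start near the cut register any change. The student who went from 190 to 199 grew 9 points but shows up as “no change”, which is the bubble-kid dynamic the slide describes.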
45. Changing from proficiency to growth means all kids matter
[Figure: number of 5th grade students by fall mathematics score, split into students below projected growth and students who met or exceeded projected growth]
Number of 5th grade students meeting projected mathematics growth in the same district
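By contrast, a growth view asks whether each student met a projected growth target, wherever that student started. A minimal Python sketch, assuming each student has a fall score, a spring score, and a projected growth value; all numbers are hypothetical, since real projections would come from growth norms.

```python
def percent_meeting_growth(records) -> float:
    """Share of students (in %) whose observed growth meets their projected growth.

    records: iterable of (fall, spring, projected_growth) tuples.
    """
    records = list(records)
    if not records:
        raise ValueError("no students")
    met = sum(1 for fall, spring, projected in records if spring - fall >= projected)
    return 100 * met / len(records)


# Hypothetical scores and projections.
records = [(205, 212, 8), (218, 223, 6), (221, 217, 6), (240, 246, 5), (190, 199, 9)]
print(f"{percent_meeting_growth(records):.0f}% met projected growth")
```

Here the student who went from 190 to 199 counts as meeting the target, even though the status view ignored them. The speaker notes treat roughly 60% of students meeting growth targets as doing well.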
46. Guiding Questions
• What did you just learn?
• How will you change what you typically do?
47. How can we make it fair?
The Test → The Growth Metric → The Evaluation → The Rating
48. Without context, what is “good”?
[Figure: a reading scale from Beginning Reading to Adult Literacy, annotated with different reference points: national percentiles (norms study scale), ACT college readiness benchmarks, state test “meets” proficiency performance levels, and Common Core proficient performance levels]
49. Normative data for growth is a bit different
[Figure: typical growth for a 4th grade reading student with a given fall score: 7 points]
Basic factors: starting achievement and instructional weeks
Other factors outside of a teacher’s direct control: FRL vs. non-FRL? IEP vs. non-IEP? ESL vs. non-ESL?
50. How did we address requirements in New York?
APPR for state-tested grades/subjects (4–8 math and reading): 60% observations, 20% state test growth, 20% EA value-added
APPR for other grades/subjects for which there is an available non-state test: 60% observations, 20% local measure 2 (SLO), 20% EA value-added
Partnered with Education Analytics (EA) on VAM
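The 60/20/20 split can be sketched as a weighted composite. Only the weights come from the slide; the component names and the common 0–100 scale for every component are hypothetical simplifications, and the actual APPR scoring rules are not described here.

```python
# Weights from the slide: 60% observations, 20% growth measure, 20% value-added.
# "growth" stands for state test growth or the local measure (SLO).
WEIGHTS = {"observations": 0.60, "growth": 0.20, "value_added": 0.20}


def composite(scores: dict[str, float]) -> float:
    """Weighted APPR-style composite.

    Assumes (hypothetically) that every component is already on a common 0-100 scale.
    """
    if set(scores) != set(WEIGHTS):
        raise ValueError(f"expected components {sorted(WEIGHTS)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


teacher = {"observations": 80, "growth": 50, "value_added": 70}  # hypothetical scores
print(round(composite(teacher), 1))
```

The nominal weights say nothing about how much each component actually moves the final rating. That depends on how much each component varies across teachers.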
51. An Introduction to Value-Added
The Oak Tree Analogy* – a conceptual introduction to the metric
*Developed at the Value-Added Research Center
53. Explaining Value-Added by Evaluating Gardener Performance
• For the past year, these gardeners (Gardener A and Gardener B) have been tending to their oak trees, trying to maximize the height of the trees.
54. Method 1: Measure the Height of the Trees Today (One Year After the Gardeners Began)
[Figure: Gardener A’s oak is 61 in. tall; Gardener B’s oak is 72 in. tall]
• Using this method, Gardener B is the more effective gardener.
This method is analogous to using an Achievement Model.
55. This Achievement Result is not the Whole Story
[Figure: Oak A was 47 in. at age 3 (1 year ago) and is 61 in. at age 4 (today); Oak B was 52 in. at age 3 and is 72 in. at age 4]
• We need to find the starting height for each tree in order to more fairly evaluate each gardener’s performance during the past year.
56. Method 2: Compare Starting Height to Ending Height
[Figure: Oak A grew from 47 in. to 61 in. (+14 in.); Oak B grew from 52 in. to 72 in. (+20 in.)]
• Oak B had more growth this year, so Gardener B is the more effective gardener.
This is analogous to a Simple Growth Model, also called Gain.
57. What About Factors Outside the Gardener’s Influence?
• This is an “apples to oranges” comparison.
• For our oak tree example, three environmental factors we will examine are rainfall, soil richness, and temperature.
58. External conditions for each tree
External condition | Oak Tree A (Gardener A) | Oak Tree B (Gardener B)
Rainfall amount | High | Low
Soil richness | Low | High
Temperature | High | Low
59. How Much Did These External Factors Affect Growth?
• We need to analyze real data from the region to predict growth for these trees.
• We compare the actual height of the trees to their predicted heights to determine if the gardener’s effect was above or below average.
60. In order to find the impact of rainfall, soil richness, and temperature, we will plot the
growth of each individual oak in the region compared to its environmental conditions.
61. Calculating Our Prediction: Adjustments Based on Real Data
Growth in inches relative to the average:
Condition | Low | Medium | High
Rainfall | -5 | -2 | +3
Soil richness | -3 | -1 | +2
Temperature | +5 | -3 | -8
62. Make Initial Prediction for the Trees Based on Starting Height
[Figure: Oak A, 47 in. at age 3 (1 year ago), + 20 in. average growth = 67 in. predicted; Oak B, 52 in. at age 3, + 20 in. average growth = 72 in. predicted]
• Next, we will refine our prediction based on the growing conditions for each tree. When we are done, we will have an “apples to apples” comparison of the gardeners’ effect.
63. Based on Real Data, Customize Predictions Based on Rainfall
[Figure: Oak A: 47 in. + 20 (average) + 3 (rainfall) = 70 in.; Oak B: 52 in. + 20 (average) - 5 (rainfall) = 67 in.]
• For having high rainfall, Oak A’s prediction is adjusted by +3 to compensate.
• Similarly, for having low rainfall, Oak B’s prediction is adjusted by -5 to compensate.
64. Adjusting for Soil Richness
[Figure: Oak A: 47 + 20 + 3 (rainfall) - 3 (soil) = 67 in.; Oak B: 52 + 20 - 5 (rainfall) + 2 (soil) = 69 in.]
• For having poor soil, Oak A’s prediction is adjusted by -3.
• For having rich soil, Oak B’s prediction is adjusted by +2.
65. Adjusting for Temperature
[Figure: Oak A: 47 + 20 + 3 - 3 - 8 (temperature) = 59 in.; Oak B: 52 + 20 - 5 + 2 + 5 (temperature) = 74 in.]
• For having high temperature, Oak A’s prediction is adjusted by -8.
• For having low temperature, Oak B’s prediction is adjusted by +5.
66. Our Gardeners are Now on a Level Playing Field
[Figure: predicted growth during the year is +20 + 3 - 3 - 8 = +12 in. for Oak A and +20 - 5 + 2 + 5 = +22 in. for Oak B, giving predicted heights of 47 + 12 = 59 in. and 52 + 22 = 74 in.]
• The predicted height for trees in Oak A’s conditions is 59 inches.
• The predicted height for trees in Oak B’s conditions is 74 inches.
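The prediction steps in slides 62 to 66 amount to adding the average growth and one adjustment per condition to each tree’s starting height. This Python sketch reproduces that arithmetic with the slide 61 adjustment values. The function and dictionary names are hypothetical, and the purely additive form is the analogy’s simplification; a real value-added model estimates such adjustments statistically from many students.

```python
# Adjustment values from slide 61 (inches of growth relative to the average).
ADJUSTMENTS = {
    "rainfall":    {"low": -5, "medium": -2, "high": +3},
    "soil":        {"low": -3, "medium": -1, "high": +2},
    "temperature": {"low": +5, "medium": -3, "high": -8},
}
AVERAGE_GROWTH = 20  # inches per year, before adjustments


def predicted_height(start_height: int, conditions: dict[str, str]) -> int:
    """Starting height + average growth + one additive adjustment per condition."""
    adjustment = sum(ADJUSTMENTS[factor][level] for factor, level in conditions.items())
    return start_height + AVERAGE_GROWTH + adjustment


oak_a = predicted_height(47, {"rainfall": "high", "soil": "low", "temperature": "high"})
oak_b = predicted_height(52, {"rainfall": "low", "soil": "high", "temperature": "low"})
print(oak_a, oak_b)  # 59 74
```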
67. Compare the Predicted Height to the Actual Height
[Figure: Oak A predicted 59 in., actual 61 in. (+2); Oak B predicted 74 in., actual 72 in. (-2)]
• Oak A’s actual height is 2 inches more than predicted. We attribute this to the effect of Gardener A.
• Oak B’s actual height is 2 inches less than predicted. We attribute this to the effect of Gardener B.
68. Method 3: Compare the Predicted Height to the Actual Height
[Figure: Gardener A, above average value-added: predicted 59 in., actual 61 in. (+2); Gardener B, below average value-added: predicted 74 in., actual 72 in. (-2)]
• By accounting for last year’s height and the environmental conditions of the trees during this year, we found the “value” each gardener “added” to the growth of the trees.
This is analogous to a Value-Added measure.
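The three methods can be compared side by side using the heights from the slides: height a year ago, height today, and predicted height today. The function names below are hypothetical labels for Methods 1 to 3.

```python
# gardener: (height 1 year ago, height today, predicted height today), in inches
trees = {
    "Gardener A": (47, 61, 59),
    "Gardener B": (52, 72, 74),
}


def achievement(start: int, actual: int, predicted: int) -> int:
    """Method 1: height today only."""
    return actual


def gain(start: int, actual: int, predicted: int) -> int:
    """Method 2: simple growth (gain)."""
    return actual - start


def value_added(start: int, actual: int, predicted: int) -> int:
    """Method 3: actual height minus the context-adjusted prediction."""
    return actual - predicted


for label, method in [("achievement", achievement), ("gain", gain), ("value-added", value_added)]:
    scores = {g: method(*vals) for g, vals in trees.items()}
    best = max(scores, key=scores.get)
    print(f"{label:12} {scores} -> {best}")
```

Methods 1 and 2 favor Gardener B (72 vs. 61 in., and +20 vs. +14 in.). Only Method 3, which accounts for conditions, reverses the conclusion (+2 vs. -2 in.).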
69. Value-Added is a Group Measure
• To statistically isolate a gardener’s effect, we need data from many trees under that gardener’s care.
70. How does this analogy relate to value added in the education context?
 | Oak Tree Analogy | Value-Added in Education
What are we evaluating? | Gardeners | Districts, schools, grades, classrooms, programs and interventions
What are we using to measure success? | Relative height improvement in inches | Relative improvement on standardized test scores
Sample | Single oak tree | Groups of students
Control factors | Tree’s prior height; other factors beyond the gardener’s control: rainfall, soil richness, temperature | Students’ prior test performance (usually the most significant predictor); other demographic characteristics such as grade level, gender, race/ethnicity, low-income status, ELL status, disability status, Section 504 status
71. Consider . . .
• What if I skip this step?
– The comparison is likely against normative data, so the comparison is to “typical kids in typical settings”
• How fair is it to disregard context?
– Good teacher, bad school
– Good teacher, challenging kids
72. A variety of errors means more stability only at the extremes
• Control for measurement error
– All models attempt to address this issue
• Population size
• Multiple data points
– Error is compounded when combining two test events
– Many teachers’ value-added scores will fall within the range of statistical error
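Why error compounds across two test events, and why class size matters, can be shown with a short Python sketch. It counts measurement error only and assumes a hypothetical SEM of 3 points per test and a hypothetical mean residual of +1 point; real value-added models (such as the HLM approach mentioned in the notes) also carry model error and typically apply shrinkage.

```python
import math


def growth_se(sem_test1: float, sem_test2: float) -> float:
    """Measurement error in one student's growth: errors from both tests combine."""
    return math.sqrt(sem_test1 ** 2 + sem_test2 ** 2)


def class_interval(mean_residual: float, n_students: int,
                   sem1: float, sem2: float, z: float = 1.96) -> tuple[float, float]:
    """Approximate interval for a teacher's mean residual (measurement error only)."""
    se = growth_se(sem1, sem2) / math.sqrt(n_students)
    return mean_residual - z * se, mean_residual + z * se


for n in (10, 25, 100):
    lo, hi = class_interval(mean_residual=1.0, n_students=n, sem1=3.0, sem2=3.0)
    verdict = "includes 0" if lo <= 0 <= hi else "excludes 0"
    print(f"n={n:3d}: {lo:+.2f} to {hi:+.2f} ({verdict})")
```

With these assumptions, the same +1 point estimate cannot be distinguished from average for a class of 10 or 25 students. Only the aggregate of 100 students excludes zero, which is why most teachers in the middle of the distribution fall within the range of statistical error.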
75. Why should we care about goal setting in education?
Because we want students to learn more!
• Research view
– Setting goals improves performance
76. What does research say on goal setting?
[Figure: Essential Elements of Goal-Setting Theory and the High-Performance Cycle, a diagram linking goals, moderators, mechanisms, performance, satisfaction with performance and rewards, and willingness to commit]
Locke, E. A., & Latham, G. P. (2002). Building a practically useful theory of goal setting and task motivation: A 35-year odyssey. American Psychologist.
78. Goals
Goals | Explanation
• Specificity | Specific goals are typically stronger than “Do your best” goals
• Difficulty | Moderately challenging is better than too easy or too hard
– Performance and learning goals | If complex and new knowledge or skills are needed, set learning goals (e.g., master five new ways to assess each student’s learning in the moment)
– Proximal goals | If complex, set short-term goals to gauge progress and feel rewarded
79. Challenges with goal setting
• Lack of a historical context
– What have this teacher and these students done in the past?
• Lack of comparison groups
– What have other teachers done in the past?
• What is the objective?
– Is the objective to meet a standard of performance or to demonstrate improvement?
• Do you set safe goals or challenging goals?
80. Suggestions
• Goals and targets themselves
– Appropriately balance moderately challenging goals with consequences
• Only use “stretch” goals for the organization, to stimulate creativity and create unconventional solutions
Locke, E. A., & Latham, G. P. (2013). New developments in goal setting and task performance.
81. Suggestions
• Goals and targets themselves (cont.)
– Set additional learning goals if the work is complex and new
– Set interim benchmarks for progress monitoring
– Carefully consider what will not happen in order to attain the goal
• Can you live with the consequences?
• How will you look for other unintended ones?
Locke & Latham (2013)
82. How tests are used to evaluate teachers
The Test → The Growth Metric → The Evaluation → The Rating
83. Translation into ratings can be difficult to inform with data
• How would you translate a rank order to a rating?
• Data can be provided
• Value judgment is ultimately the basis for setting cut scores for points or ratings
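Translating a rank order into a rating can be sketched in Python with hypothetical cut points on a 0–100 percentile rank. The four labels are illustrative. The cut points are the value judgment the slide describes: the data supply the ranks, but people choose where the boundaries fall.

```python
from bisect import bisect_right

# Hypothetical, value-based cut points on a 0-100 percentile rank.
CUTS = [10, 30, 90]
LABELS = ["Ineffective", "Developing", "Effective", "Highly Effective"]


def rating(percentile_rank: float, cuts=CUTS, labels=LABELS) -> str:
    """Map a percentile rank to a rating; a rank equal to a cut moves up a category."""
    return labels[bisect_right(cuts, percentile_rank)]


for p in (5, 10, 29.9, 50, 95):
    print(p, rating(p))
```

Moving the cuts to, say, `[60, 70, 80]` changes the rating for a teacher at the 50th percentile from “Effective” to “Ineffective” without any change in the underlying data.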
84. Decisions are value-based, not empirical
• What is far below a district’s expectation is subjective
• What about the
– Obligation to help teachers improve?
– Quality of replacement teachers?
85. Even multiple measures need to be used well
• The system for combining elements and producing a rating is also a value-based decision
– Multiple measures and principal judgment must be included
– Evaluate the extremes to make sure the result makes sense
86. Leadership Courage Is A Key
[Figure: observation and assessment ratings (0–5) for Teachers 1, 2, and 3]
Ratings can be driven by the assessment
Real or noise?
87. Big Message
If evaluators do not differentiate their ratings, then all differentiation comes from the test
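This Python sketch illustrates the big message. All three teachers receive the same observation rating, and the test-based ratings and the 60/40 weighting are hypothetical. However heavily observations are weighted, the composite ranking is exactly the test ranking.

```python
# Hypothetical weights: 60% observation component, 40% test component.
W_OBS, W_TEST = 0.6, 0.4

# name: (observation rating, test-based rating), both on a hypothetical 1-4 scale.
# The evaluator gave every teacher the same observation rating.
teachers = {
    "Teacher 1": (3.0, 1.5),
    "Teacher 2": (3.0, 3.8),
    "Teacher 3": (3.0, 2.6),
}


def composite(obs: float, test: float) -> float:
    return W_OBS * obs + W_TEST * test


by_composite = sorted(teachers, key=lambda t: composite(*teachers[t]))
by_test = sorted(teachers, key=lambda t: teachers[t][1])
print(by_composite)
print(by_composite == by_test)
```

The observation component shifts every composite by the same amount. It therefore contributes to the level of the ratings but nothing to the differences between teachers.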
88. Please be thoughtful about . . .
1. Alignment between the content assessed and the content to be taught
2. Selection of an appropriate assessment
• Used for the purpose for which it was designed (proficiency vs. growth)
• Can accurately measure the knowledge of all students
• Adequate sensitivity to growth
3. Adjust for context/control for factors outside a teacher’s direct control (value-added)
89. More information
• Presentations and other recommended resources are available at:
– www.nwea.org
– www.kingsburycenter.org
– www.slideshare.net
• Contacting us:
NWEA Main Number: 503-624-1951
E-mail: andy.hegedus@nwea.org
Editor's Notes
Teacher evaluations and the use of data in them can take many forms. You can use them for supporting teachers and their improvement. You can use the evaluations to compensate teachers or groups of teachers differently or you can use them in their highest stakes way to terminate teachers.
The higher the stakes put on the evaluation, the more risk there is to you and your organization from a political, legal, and equity perspective. Most people naturally respond by increasing the rigor put into designing the process as a way to ameliorate the risk. However, the risk can’t be eliminated.
Our goal: make sure you are prepared. Understand the risk and the proper ways to implement, including legal issues. Clarify some of the implications, which are very complex, and prepare you to chart a prudent course.
Contrast with what value added communicates
Plot normal growth for Marcus vs anticipated growth – value added. If you ask whether the teachers provided value added, the answer is Yes.
Other line is what is needed for college readiness
Blue line is what is used to evaluate the teacher.
Is he on the line the parents want him to be on? Probably not.
Don’t focus on one at the expense of the other
NCLB – AYP vs what the parent really wants for goal setting
We can become so focused on measuring teachers that we lose sight of what parents value
We are better off moving towards the kids’ aspirations
As a parent I didn’t care if the school made AYP. I cared if my kids got the courses that helped them go where they want to go.
Steps are quite important. People tend to skip some of these.
Kids take a test: it is important that the test is aligned to the instruction being given.
Metric: look at growth vs. the growth norm and calculate a growth index. Two benefits: it is very transparent and simple.
People tend to use our growth norms: if 60% of students in a grade level within a school meet their growth norms, you are doing well.
Norms: growth of a kid or group of kids compared to a nationally representative sample of students.
Why isn’t this value added?
Not all teachers can be compared to a nationally representative sample, because they don’t teach kids who are just like the national sample.
The third step controls for variables unique to the teacher’s classroom or environment.
Fourth step: rating. How far below average before the district takes action, or how far above before someone gets performance pay? This is a particular challenge in NY state right now, because the law requires it.
Common Core: very ambitious things they want to measure, tackling things like those on an AP test. Students write and show their work.
A CC assessment to evaluate teachers can be a problem.
Raise your hand if you know what the capital of Chile is. Santiago. Repeat after me. We will review in a couple of minutes. Facts can be acquired relatively easily and are instructionally sensitive: if you expose kids to facts in meaningful and engaging ways, their knowledge of those facts is sensitive to instruction.
State assessments are designed to measure proficiency: many items in the middle, not at the ends.
Must use multiple points of data over time to measure this.
We also believe that a principal should be more in control of the evaluation than the test: principals and teacher leaders are what change schools.
5th grade NY reading cut scores shown
Problem – insensitive to instruction
Prerequisite skills: writing skills.
Given events on N. Africa today,
The question requires a lot of prerequisite knowledge. You need to know the story, put it into writing, and use reasoning skills to connect it with events today, and you also need to know what is going on today. One doesn’t develop this entire set of skills in 9 months of instruction.
Common core is what we want. Just not for teacher evaluation.
These questions are not that sensitive to instruction. Problematic when we hold teachers accountable for instruction or growth.
How you talk with students in advance
How students see their data being used
Does it make a difference in their life?
Test scheduling and pre- or post- activities
When during the day is testing scheduled
NCLB required everyone to reach proficiency, so the message was to focus on kids at or near proficient
School systems responded
MS standards are harder than the elem standards – MS problem
No effort to calibrate them – no effort to project elem to ms standards
Start easy and ramp up.
Proficient in elem and not in MS with normal growth.
When you control for the difficulty in the standards Elem and MS performance are the same
Not only are standards different across grades, they are different across states.
It’s data like this that helps to inspire the Common Core and consistent standards so we compare apples to apples
Dramatic differences between standards based vs growth
KY 5th grade mathematics
Sample of students from a large school system
X-axis Fall score, Y number of kids
Blue are the kids who did not change status between the fall and the spring on the state test
Red are the kids who declined in performance by spring: Descenders
Green are kids who moved above the cut in performance by spring: Ascenders (bubble kids)
About 10% based on the total number of kids
Accountability plans are made typically based on these red and green kids
Same district as before
Yellow – did not meet target growth – spread over the entire range of kids
Green – did meet growth targets
60% vs 40% is doing well – This is a high performing district with high growth
Must attend to all kids – this is a good thing – ones in the middle and at both extremes
Old one was discriminatory – focus on some in lieu of others
Teachers who teach really hard at the standard for years – Teachers need to be able to reach them all
This does a lot to move the accountability system to parents and our desires.
There are wonderful teachers who teach in very challenging, dysfunctional settings. The setting can impact the growth. HLM embeds the student in a classroom, the classroom in the school, and controls for the school parameters. Is it perfect? No. Is it better? Yes.
The opposite is also true: learning can be magnified as well.
What if kids are a challenge, for instance because of ESL status or attendance? That can deflate scores, especially with a low number of kids in the sample being analyzed. You also need a large enough ‘n’ to make this possible, which is especially true in small districts.
Our position is that a test can inform the decision, but the principal/administrator should collect the bulk of the data that is used in the performance evaluation process.
Measurement error from test 1 and test 2 is compounded
Green line is their VA estimate and bar is the error of measure
At both the top and the bottom, people could actually belong in other quartiles
People in the middle can cross quintiles – just based on SEM
Cross country – winners spread out. End of the race spread. Middle you get a pack. Middle moving up makes a big difference in the overall race.
Because of this instability and the narrowness of the ranges, when evaluating teachers in the middle of the distribution, slight changes in performance can produce large changes in performance ranking
No solid research on learning and performance goals at the same time. For complex situations where learning is required, learning goals work best, then “Do your best” goals, then performance goals. Focus should be on mastering skills rather than reaching a desired level of performance. That will come later. Performance goals distract from the learning that is needed.
Learning goals help moderate cheating, as opposed to performance goals