The document discusses various issues around student assessment and accountability. It provides data on teacher and administrator perspectives on standardized testing and uses of assessment data. A majority of teachers believe students are over-tested and too much time is spent on test preparation. The document also examines different approaches to teacher evaluation, including value-added models and student growth percentiles, noting issues with reliability and fairness. It emphasizes the importance of principals in evaluation and using multiple measures, not just test scores, to differentiate teacher performance.
3. Most teachers (59%) and the vast
majority of district administrators
(89%) say the ideal focus of
assessments should be frequently
tracking student performance and
providing daily or weekly feedback in
the classroom.
Make Assessment Matter: Students and Educators Want Tests that Support Learning (2014). Portland, OR: NWEA and Grunwald Associates LLC.
4. 37% of students say they do not receive their state accountability test results.
5. A majority of teachers believe that students are
over-tested…
6. Teachers – How much time do you feel is spent preparing for and taking assessments?
Too much: 59%
Just the right amount: 28%
Too little: 13%
7. Number of annual instructional hours required in California schools for students in grades 6-8: 900
8. Ways in which educators use assessment data
[Bar chart comparing the percentage of administrators and teachers reporting each use, on a 0-100% scale:]
• Inform instruction
• Evaluate program effectiveness
• Evaluate teacher/principal performance
• Measure growth in student learning…
• Evaluate school performance
• Measure student achievement
9. Number of hours required to administer the Smarter Balanced (SBAC) summative assessment in grades 6-8: 7.5
10. Estimated Time Devoted to Testing in Third Grade – 12 Urban School Systems
[Stacked bar chart: classroom hours (0-30 scale) devoted to state-mandated and district testing in Cleveland, Houston, Atlanta, Indianapolis, Denver, Los Angeles, Boston, Washington DC, Anchorage, Baltimore, Shelby Cty (TN), and Chicago.]
Source: Teoh, M., Coggins, C., Guan, C., & Hiller, H. (2014, Winter). The Student and the Stopwatch: How much time do American students spend on testing? Teach Plus.
11. What drives the perception of over-testing is the
amount of time invested in test preparation or
drilling
12. Estimated time devoted to test preparation in one mid-western school system
[Stacked bar chart: classroom hours of test preparation (0-100 scale) by grade, K through 12, split into state-mandated, benchmark, and local testing.]
Nelson, H. (2013). Testing More, Teaching Less: What America's Obsession with Student Testing Costs in Terms of Money and Time. Washington, D.C.: American Federation of Teachers.
13. The perception is also driven by the constraints imposed by testing students on a limited number of computers or tablets.
14. Days required to complete SBAC summative testing in a school with 200 students in grades 3-5.

Available computers in labs:     20    25    30    35    40
Instructional days to complete:  20    16    13    11.3  10

Estimates based on results from the SBAC technology needs calculator.
15. Days required to complete SBAC summative testing in a school with 800 students in grades 6-8.

Available computers in labs:     50    60     70     80    90     100   110    120
Instructional days to complete:  32    26.67  22.86  20    17.78  16    14.55  13.33

Estimates based on results from the SBAC technology needs calculator.
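The two charts above track a simple capacity model: roughly 2 computer-days of lab time per tested student, so the days needed fall in inverse proportion to the number of available machines. A minimal sketch (a hypothetical function inferred from the charted values, not the official SBAC calculator):

```python
def days_to_complete(students: int, computers: int,
                     computer_days_per_student: float = 2.0) -> float:
    """Estimate instructional days needed to test every student when only
    `computers` machines are available each day. The 2 computer-days-per-
    student figure is inferred from the charts, not taken from SBAC."""
    return computer_days_per_student * students / computers

# 200 students (grades 3-5) sharing 20 lab computers
print(round(days_to_complete(200, 20), 2))   # 20.0
# 800 students (grades 6-8) sharing 110 lab computers
print(round(days_to_complete(800, 110), 2))  # 14.55
```

Doubling the number of machines halves the testing window, which is why the curves flatten out only at large lab sizes.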
17. A simple framework for teacher evaluation
Effective teaching and professional job performance rests on three sources of evidence:
• Evidence of professional practice: the evaluation of teaching by classroom observation and use of artifacts.
• Evidence of professional responsibilities: the evaluation of the teacher's effectiveness in making progress toward their goals and fulfilling the responsibilities of a professional educator.
• Evidence of student learning: the evaluation of a teacher's contribution to student learning and growth.
18. What a judgment of teacher effectiveness implies
• Evidence of learning – A claim that the improvement in learning (or lack of it) reflected on one or more tests is caused by the teacher.
• Evidence of good practice – A claim that the observer's ratings or conclusions are reliable and associated with behaviors that cause improved learning in the classroom.
19. Three ways tests are used in evaluation and their claims
Value-Added
• Produces rankings of teachers relative to each other based on assessment results.
• Introduces controls to account for factors outside the teacher's influence that may affect growth.
• Advances a claim of causation: the teacher's ranking reflects learning the teacher caused.
• Can be applied to as few as 20% of the teachers in a school system (Whitehurst, 2013).
Whitehurst, G. J. (2013). Teacher value-added: Do we want a ten percent solution? The Brown Center Chalkboard, April 24. Washington, DC: Brookings Institution. Retrieved October 2, 2014, from www.brookings.edu/blogs/brown-center-chalkboard/posts/2013/04/24-merit-pay-whitehurst
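A minimal illustrative sketch of what a value-added ranking does, using made-up scores and a single prior-score control (operational models add demographic controls, multiple years of data, and statistical shrinkage): predict each student's current score from a prior score, then rank teachers by the average residual of their students.

```python
from collections import defaultdict

# Hypothetical (teacher, prior_score, current_score) records
records = [
    ("A", 200, 212), ("A", 210, 224), ("A", 190, 205),
    ("B", 205, 207), ("B", 215, 219), ("B", 195, 198),
]

# Fit current ~ a + b * prior by ordinary least squares
priors = [p for _, p, _ in records]
currents = [c for _, _, c in records]
n = len(records)
mp, mc = sum(priors) / n, sum(currents) / n
b = (sum((p - mp) * (c - mc) for p, c in zip(priors, currents))
     / sum((p - mp) ** 2 for p in priors))
a = mc - b * mp

# A teacher's "value-added" is the mean residual across their students
residuals = defaultdict(list)
for teacher, prior, current in records:
    residuals[teacher].append(current - (a + b * prior))
value_added = {t: sum(r) / len(r) for t, r in residuals.items()}

ranking = sorted(value_added, key=value_added.get, reverse=True)
print(ranking)  # ['A', 'B']: teacher A's students beat their predictions
```

The causal claim rests entirely on the controls: the ranking treats whatever the model cannot predict as the teacher's contribution.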
20. Student Growth Percentiles
• Produce rankings of teachers relative to each other based on assessment results.
• Use prior results to predict student growth.
• Do not introduce controls to account for factors outside the teacher's influence that may affect growth.
• Do not advance a claim of causation: the results describe the growth of the classroom but are not intended to identify its cause.
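A minimal sketch of the descriptive claim behind student growth percentiles (hypothetical data; operational SGPs use quantile regression over several prior years): a student's growth percentile is the percentile rank of their current score among students who started from a similar prior score.

```python
def growth_percentile(student, peers):
    """student: (prior, current); peers: (prior, current) pairs for
    students with similar prior scores (the 'academic peer group')."""
    _, current = student
    below = sum(1 for _, c in peers if c < current)
    return round(100 * below / len(peers))

# Hypothetical peers who all scored near 200 last year
peers = [(200, 190), (201, 195), (199, 200), (200, 205),
         (202, 210), (198, 215), (200, 220), (201, 225)]
print(growth_percentile((200, 212), peers))  # 62: outgrew 5 of 8 peers
```

Aggregating a classroom's percentiles (typically the median) describes the classroom's growth; the method makes no claim about what caused it.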
21. Student Learning Objectives
• Are a contract negotiated between the principal and teacher around student results.
• Do not produce rankings that compare teacher results across settings.
• Do not introduce controls to account for factors outside the teacher's influence that may affect growth.
• Do not advance a claim of causation: teacher competence is demonstrated by fulfillment of the contract.
• Can be applied to as few as 20% of the teachers in a school system.
22. Three ways tests are used in
evaluation and their issues
Value-Added
• High measurement error
• Reliability of the results
• Insufficient controls
• Measurement quality of assessments used
• Fairness of application
23. Three ways tests are used in
evaluation and their issues
Student Growth Percentiles
• Reliability of the results
• Use of past performance as the only control
• Measurement quality of assessments used
• Fairness of application
• The developer’s claim that the percentiles are
descriptive and not evidence of causation
24. Three ways tests are used in
evaluation and their issues
Student Learning Objectives
• Do not provide evidence of teacher effectiveness.
• Teachers using SLOs may be evaluated against less
rigorous criteria than teachers evaluated by value-added
or student growth percentiles.
• Goals are not consistent in difficulty.
• Goals are not consistent across teachers.
26. What NWEA supports
• The evaluation process should focus on
helping teachers improve.
• The principal or designated evaluator should
control the evaluation.
• Tests should not be the deciding factor in an
evaluation.
• Multiple measures should be used.
27. Distinguishing teacher effectiveness
from teacher evaluation
• Teacher effectiveness – The judgment of a teacher’s
ability to positively impact learning in the classroom.
• Teacher evaluation – The judgment of a teacher’s
overall performance including:
– Teacher effectiveness
– Common standards of job performance
– Participation in the school community
– Adherence to professional standards
28. Purposes of summative evaluation
• Make an accurate and defensible judgment of an educator's job performance.
• Provide ratings that meaningfully differentiate performance across educators.
• Goals of evaluation
– Help educators focus on their students and their practice
– Retain top educators
– Dismiss ineffective educators
30. The greatest tragedy of this century in education so far was the number of young, talented teachers who lost their positions in the last recession.
31. Employment of Elementary Teachers, 2007-2012

Year:                 2007       2008       2009       2010       2011       2012
Number of teachers:   1,538,000  1,544,270  1,544,300  1,485,600  1,415,000  1,360,380

The elementary school teacher workforce shrank by 178,000 teachers (11%) between May 2007 and May 2012.
Source: Bureau of Labor Statistics, Occupational Employment Statistics (2012, May). Numbers exclude special education and kindergarten teachers.
32. The impact of seniority-based layoffs on school quality
In a simulation study of a layoff of 5% of teachers using New York City data, reliance on seniority-based layoffs:
• Resulted in 25% more teachers laid off.
• Laid off teachers who were 0.31 standard deviations more effective (using a value-added criterion) than those lost under an effectiveness criterion.
• Retained 84% of teachers with unsatisfactory ratings.
Source: Boyd, L., Lankford, H., Loeb, S., & Wycoff, J. (2011). Center for Education Policy, Stanford University.
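The study's contrast can be made concrete with a toy simulation on synthetic data (random numbers, not the New York City dataset): because junior teachers earn less, a seniority-based layoff must cut more people to reach the same savings target, and it ignores effectiveness entirely.

```python
import random
from statistics import mean

random.seed(1)

# 1,000 hypothetical teachers: salary rises with seniority,
# effectiveness is unrelated to either
teachers = [
    {"seniority": s, "salary": 45_000 + 2_000 * s,
     "effectiveness": random.gauss(0, 1)}
    for s in (random.randint(1, 30) for _ in range(1000))
]

target = 0.05 * sum(t["salary"] for t in teachers)  # cut 5% of payroll

def layoffs(key):
    """Lay off teachers in `key` order until the savings target is met."""
    cut, saved = [], 0.0
    for t in sorted(teachers, key=key):
        if saved >= target:
            break
        cut.append(t)
        saved += t["salary"]
    return cut

by_seniority = layoffs(lambda t: t["seniority"])   # last in, first out
by_effect = layoffs(lambda t: t["effectiveness"])  # least effective first

print(len(by_seniority) > len(by_effect))  # True: more positions lost
gap = (mean(t["effectiveness"] for t in by_seniority)
       - mean(t["effectiveness"] for t in by_effect))
print(gap > 0)  # True: seniority layoffs discard more effective teachers
```

The magnitudes here are arbitrary; the direction of both effects follows from salary rising with seniority while effectiveness does not.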
33. Ultimately, the principal should decide
• Evaluation inherently involves judgment – not a bad thing.
• Evidence should inform, not direct, the principal's judgment.
• The implemented system should differentiate performance.
• Courts respect the judgment of school administrators in personnel decisions.
34. If evaluators do not
differentiate their ratings,
then all differentiation
comes from the test.
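The point above can be shown with a little variance arithmetic (hypothetical weights and scores): when the final evaluation score is a weighted sum of an observation rating and a test-based measure, a rating that barely varies contributes almost nothing to how teachers end up differentiated.

```python
from statistics import pvariance

ratings = [95, 96, 95, 97, 96, 95, 96, 97]  # nearly everyone "effective"
tests = [40, 85, 60, 72, 55, 90, 48, 66]    # test-based measure spreads out

w_rating, w_test = 0.6, 0.4
combined = [w_rating * r + w_test * t for r, t in zip(ratings, tests)]

# Variance each component contributes (treating them as independent)
var_rating = w_rating ** 2 * pvariance(ratings)
var_test = w_test ** 2 * pvariance(tests)
share = var_test / (var_rating + var_test)
print(f"{share:.0%} of score variance comes from the test")  # 99%
```

Even with the rating weighted at 60%, compressed ratings leave the test to do nearly all of the differentiating.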
35. Why differentiating ratings is important
[Scatter plot of principal ratings against value-added ratings on a 0-100 scale.]
36. “The (Race to the Top teacher evaluation) changes, already
under way in some cities and states, are intended to
provide meaningful feedback and, critically, to weed out
weak performers. And here are some of the early results:
In Florida, 97 percent of teachers were deemed effective
or highly effective in the most recent evaluations. In
Tennessee, 98 percent of teachers were judged to be “at
expectations.” In Michigan, 98 percent of teachers were
rated effective or better.”
Source: New York Times (2013, March 30). Curious Grade for Teachers: Nearly All Pass. Retrieved from: http://www.nytimes.com/2013/03/31/education/curious-grade-for-teachers-nearly-all-pass.html?pagewanted=all&_r=0
37. Results of Georgia Teacher Evaluation Pilot
Evaluator ratings:
• Ineffective: 1%
• Minimally Effective: 2%
• Effective: 75%
• Highly Effective: 23%
40. Teacher observation as a part of
teacher evaluation
Systematic observation of teacher performance
is a central part of every state’s teacher
evaluation plan.
41. New York Teacher Ratings

State Test Value-added:  Ineffective 1,291;  Developing 5,741;  Effective 55,115;  Highly Effective 63,809
Principal Rating:        Ineffective 616;    Developing 2,853;  Effective 51,400;  Highly Effective 71,087
44. Non-cognitive factors
In education, value-added
measurement has focused
policy-makers on the teacher’s
contribution to academic
success, as reflected in test
scores.
Jackson (2012) argues that
teachers may have more impact
on non-cognitive factors that are
essential to student success like
attendance, grades, and
suspensions.
Test scores, however, are not the only measures that matter.
45. Non-cognitive factors
Employing value-added methodologies, Jackson found that teachers had a substantive effect on non-cognitive outcomes that was independent of their effect on test scores:
• Lowered average student absenteeism by 7.4 days.
• Improved the probability that students would enroll in the next grade by 5 percentage points.
• Reduced the likelihood of suspension by 2.8%.
• Improved the average GPA by 0.09 (Algebra) or 0.05 (English).
Source: Jackson, K. (2013). Non-Cognitive Ability, Test Scores and Teacher Quality: Evidence from 9th Grade Teachers in North Carolina. Northwestern University and NBER.
47. Suggested reading
Baker, B., Oluwole, J., & Green, P. (2013). The legal consequences of mandating high stakes decisions based on low quality information: Teacher evaluation in the Race to the Top era. Education Policy Analysis Archives, Vol. 21, No. 5.