How satisfied are you with the last assessment you gave? Would you describe your exam as a highly effective evaluation tool? How much information does it reveal about individual students’ abilities, and about the overall performance of your current class compared with previous classes? Do you trust your assessment to accurately identify which students “get it” and which ones clearly do not grasp the content or meet the standards required to pass your course?
A 3-step item analysis method based on an item’s difficulty level, discrimination values, and response frequencies provides a revealing look at the quality of your assessment by focusing your attention on the effectiveness of each test item and its contribution to the exam blueprint. It saves time and effort by identifying exactly which exam questions need editing, and how much editing is required, before you take any action; you will likely find that replacing an item with a brand-new question is unnecessary. Small, targeted improvements within just a few exam items, guided by a systematic review of the statistical results before you start editing, can drastically enhance item quality and eliminate the need to spend hours rewriting the entire exam. With this item analysis method, your future assessments can provide an accurate measurement of your students’ ability to apply nursing content and solve clinical problems.
4. Five Guidelines to Developing Effective Critical Thinking Exams
❑ Assemble the “basics.”
❑ Write critical thinking test items.
❑ Pay attention to housekeeping duties.
❑ Develop a test blueprint.
❑ Scientifically analyze all exams.
6. Bloom’s Taxonomy: Benjamin Bloom, 1956 (revised)
Terminology changes: “The graphic is a representation of the NEW verbiage associated with the long-familiar Bloom’s Taxonomy. Note the change from nouns to verbs [e.g., Application to Applying] to describe the different levels of the taxonomy. Note that the top two levels are essentially exchanged from the Old to the New version.” (Schultz, 2005) (Evaluation moved from the top to Evaluating, second from the top; Synthesis moved from second from the top to the top as Creating.) Source: http://www.odu.edu/educ/llschult/blooms_taxonomy.htm
11. Standards of Acceptance
❑ Item difficulty: 30% – 90%
❑ Item Discrimination Ratio (IDR): 25% and above
❑ PBCC: 0.20 and above
❑ KR-20: 0.70 and above
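As a quick way to operationalize these thresholds, here is a minimal sketch in Python that flags any item falling outside the per-item standards; the item statistics shown are hypothetical, and KR-20 (a whole-test statistic) is noted separately.

```python
# Minimal sketch: flag items against the per-item standards of acceptance.
# The item statistics below are hypothetical, for illustration only.
ITEMS = [
    {"id": 1, "difficulty": 0.85, "idr": 0.30, "pbcc": 0.35},
    {"id": 2, "difficulty": 0.25, "idr": 0.10, "pbcc": 0.05},
]

def flag_item(item):
    """Return the standards this item fails (empty list = meets all)."""
    problems = []
    if not 0.30 <= item["difficulty"] <= 0.90:
        problems.append("difficulty outside 30%-90%")
    if item["idr"] < 0.25:
        problems.append("IDR below 25%")
    if item["pbcc"] < 0.20:
        problems.append("PBCC below 0.20")
    return problems

for item in ITEMS:
    print(f"Item {item['id']}: {'; '.join(flag_item(item)) or 'meets all standards'}")

# KR-20 (0.70 and above) is evaluated once for the whole test, not per item.
```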
12. Thinking more about mean item difficulty on teacher-made tests…
Mean difficulty level for a teacher-made nursing exam should be 80 – 85%. So, why might low NCLEX-RN® pass rates persist when mean difficulty levels on teacher-made exams remain consistently within this desired range?
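For reference, an item’s difficulty level is the percentage of examinees who answered it correctly, and an exam’s mean difficulty is the average of those percentages. A minimal sketch, assuming scored responses are stored as one answer string per student (the data below are hypothetical):

```python
# Difficulty level = proportion of examinees answering the item correctly.
responses = ["ABCDA", "ABCCA", "ABDDA", "CBCDA"]  # one answer string per student
key = "ABCDA"                                      # answer key (hypothetical)

n_students = len(responses)
difficulties = [
    sum(r[i] == key[i] for r in responses) / n_students
    for i in range(len(key))
]
mean_difficulty = sum(difficulties) / len(difficulties)

print(["%.2f" % p for p in difficulties])          # per-item difficulty levels
print(f"mean difficulty: {mean_difficulty:.0%}")   # target: 80-85% for this exam
```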
13. …and one “absolute” rule about item difficulty
Since the mean difficulty level for a teacher-made nursing exam is 80 – 85%, what should the lowest acceptable value be for each test item on the exam? TEST ITEMS ANSWERED CORRECTLY BY 30% OR LESS of the examinees should always be considered too difficult, and the instructor must take action. Why?
14. …but what about high difficulty levels?
❑ Test items with high difficulty levels (>90%) often yield poor discrimination values.
❑ Is there a situation where faculty can legitimately expect that 100% of the class will answer a test item correctly, and be pleased when this happens?
❑ RULE OF THUMB ABOUT MASTERY ITEMS: Due to their negative impact on test discrimination and reliability, they should comprise no more than 10% of the test.
15. Standards of Acceptance
❑ Item difficulty: 30% – 90%
❑ Item Discrimination Ratio (IDR): 25% and above
❑ PBCC: 0.20 and above
❑ KR-20: 0.70 and above
16. Thinking more about item discrimination on teacher-made tests…
❑ IDR can be calculated quickly, but doesn’t consider the variance of the entire group. Use it to quickly identify items that have zero/negative discrimination values, since these need to be edited before being used again.
❑ PBCC is a more powerful measure of discrimination (see the sketch after this list).
➢ Correlates the correct answer to a single test item with the total test score of the student.
➢ Considers the variance of the entire student group, not just the lower and upper 27% groups.
➢ For a small ‘n,’ consider the cumulative value.
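A minimal sketch of both discrimination statistics, assuming responses have already been scored into a 0/1 matrix. Note that some testing programs correlate each item with the total score excluding that item; this simple version uses the full total.

```python
import statistics  # statistics.correlation requires Python 3.10+

def item_stats(correct_matrix):
    """correct_matrix[s][i] is 1 if student s answered item i correctly, else 0.
    Returns (IDR, PBCC) for each item."""
    totals = [sum(row) for row in correct_matrix]
    order = sorted(range(len(totals)), key=lambda s: totals[s])
    k = max(1, round(0.27 * len(totals)))        # size of the 27% groups
    lower, upper = order[:k], order[-k:]
    results = []
    for i in range(len(correct_matrix[0])):
        item = [row[i] for row in correct_matrix]
        # IDR: proportion correct in the upper group minus the lower group.
        idr = (sum(correct_matrix[s][i] for s in upper) / k
               - sum(correct_matrix[s][i] for s in lower) / k)
        # PBCC: Pearson correlation of the 0/1 item score with the total score.
        pbcc = (statistics.correlation(item, totals)
                if len(set(item)) > 1 and len(set(totals)) > 1 else 0.0)
        results.append((idr, pbcc))
    return results

# Demo with a hypothetical 4-student, 3-item scored matrix.
demo = [[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 1, 1]]
for i, (idr, pbcc) in enumerate(item_stats(demo)):
    print(f"item {i}: IDR={idr:+.2f}, PBCC={pbcc:+.2f}")
```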
17. …what decisions need to be made about items?
❑ When a test item has poor difficulty and/or discrimination values, action is needed.
❑ All of these actions require that the exam be rescored (see the rescoring sketch after this list):
➢ Credit can be given for more than one choice.
➢ Test item can be nullified.
➢ Test item can be deleted.
❑ Each of these actions has a consequence, so faculty need to consider the consequences carefully when choosing an action. Faculty judgment is crucial when determining actions affecting test scores.
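To make the consequences concrete, here is a minimal rescoring sketch using hypothetical answer data. Deleting an item shortens the test, while nullifying one credits every examinee, so the same responses can yield different scores depending on the action chosen.

```python
responses = ["ABCD", "ADCC", "ACCD"]            # one answer string per student
key = {0: {"A"}, 1: {"B"}, 2: {"C"}, 3: {"D"}}  # item index -> accepted answers

key[1] = {"B", "C"}   # action: give credit for more than one choice on item 1
nullified = {2}       # action: nullify item 2 (everyone receives credit)
deleted = {3}         # action: delete item 3 (no longer counts at all)

def rescore(answers):
    score, length = 0, 0
    for i, choice in enumerate(answers):
        if i in deleted:
            continue              # deleted items shrink the test length
        length += 1
        if i in nullified or choice in key[i]:
            score += 1            # nullified items credit every examinee
    return score, length

for r in responses:
    score, length = rescore(r)
    print(f"{r}: {score}/{length}")
```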
19. Thinking more about adjusting the standard of acceptance for nursing tests…
❑ Remember that the key statistical concept inherent in calculating coefficients is VARIANCE.
❑ When there is less variance in test scores, reliability of the test will decrease, i.e., the KR-20 value will drop.
❑ What contributes to a lack of variance in nursing students’ test scores?
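As context for the variance point: with k items, item difficulty p_i, and total-score variance s^2, KR-20 = (k / (k - 1)) * (1 - sum(p_i * (1 - p_i)) / s^2), so shrinking score variance directly drags the coefficient down. A minimal sketch with hypothetical scored matrices:

```python
def kr20(correct_matrix):
    """KR-20 reliability; correct_matrix[s][i] is 1 if student s got item i right.
    Assumes at least two items and some variance in the total scores."""
    n, k = len(correct_matrix), len(correct_matrix[0])
    totals = [sum(row) for row in correct_matrix]
    mean = sum(totals) / n
    var_total = sum((t - mean) ** 2 for t in totals) / n   # population variance
    p = [sum(row[i] for row in correct_matrix) / n for i in range(k)]
    sum_pq = sum(pi * (1 - pi) for pi in p)
    return (k / (k - 1)) * (1 - sum_pq / var_total)

spread = [[1, 1, 1, 0], [1, 1, 0, 0], [1, 0, 0, 0], [1, 1, 1, 1]]     # varied totals
clustered = [[1, 1, 1, 0], [1, 1, 0, 1], [1, 0, 1, 1], [1, 1, 1, 1]]  # near-equal totals
print(f"KR-20, spread-out scores: {kr20(spread):.2f}")     # higher
print(f"KR-20, clustered scores: {kr20(clustered):.2f}")   # drops sharply
```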
21. …and a word about using Response Frequencies
Sometimes LESS is MORE when it comes to editing a test item. A review of the response frequency data can focus your editing. For items where 100% of students answer correctly, and no other options were chosen, make sure that this is indeed intentional (MASTERY ITEM) and not just reflective of an item that is too easy (>90% DIFFICULTY). Target rewriting the “zero” distracters – those options that are ignored by students. Replacing “zeros” with plausible options will immediately improve item DISCRIMINATION.
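A minimal sketch of reviewing the response frequencies for a single item (hypothetical data), flagging the “zero” distracters that no student chose:

```python
from collections import Counter

# Hypothetical responses from six students to one four-option item.
answers = ["A", "A", "A", "C", "A", "A"]
options = ["A", "B", "C", "D"]
key = "A"

freq = Counter(answers)
for opt in options:
    n = freq[opt]
    note = " (key)" if opt == key else (" <- zero distracter: rewrite" if n == 0 else "")
    print(f"{opt}: {n}{note}")
```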
22. 3-Step Method for Item Analysis
1. Review Difficulty Level
2. Review Discrimination Data
❑ Item Discrimination Ratio (IDR)
❑ Point Biserial Correlation Coefficient (PBCC)
3. Review Effectiveness of Alternatives
❑ Response Frequencies
❑ Non-distracters
Source: Morrison, S., Nibert, A., & Flick, J. (2006). Critical thinking and test item writing (2nd ed.). Houston, TX: Health Education Systems, Inc.
28. Content Validity
Does the test measure what it claims to measure?
29. Use a Blueprint to Assess a Test’s Validity
❑ Test Blueprint
➢ Reflects course objectives
➢ Rational/logical tool
➢ Testing software program
➢ Storage of item analysis data (last & cumulative)
➢ Storage of test item categories
36. Item Writing Tools for Success…
Knowledge
Test Blueprint
Testing Software
37. References
Morrison, S., Nibert, A., & Flick, J. (2006). Critical thinking and test item writing (2nd ed.). Houston, TX: Health Education Systems, Inc.
Morrison, S. (2004). Improving NCLEX-RN pass rates through internal and external curriculum evaluation. In M. Oermann & K. Heinrich (Eds.), Annual review of nursing education (Vol. 3). New York: Springer.
National Council of State Boards of Nursing. (2013). 2013 NCLEX-RN test plan. Chicago, IL: National Council of State Boards of Nursing. https://www.ncsbn.org/3795.htm
Nibert, A. (2010). Benchmarking for student progression throughout a nursing program: Implications for students, faculty, and administrators. In L. Caputi (Ed.), Teaching nursing: The art and science (2nd ed., Vol. 3, pp. 45-64). Chicago: College of DuPage Press.
38. Have Questions? Need More Info?
Thanks for your time & attention today!
866-429-8889