Presented by Laurel Sampognaro, Clinical Associate Professor; David Caldwell, Director of Professional Affairs; and Adam Pate, Assistant Professor; all from the University of Louisiana Monroe School of Pharmacy
This presentation will describe a process to improve examination item quality by educating and involving course instructors in an item review process using evidence-based guidelines, and it will describe the application of this process to multiple courses. In this interactive session, the presenters will discuss personal experiences and barriers to implementation of a collaborative exam item review process involving 21 faculty members from 2 departments in 3 different courses. Attendees will be exposed to a review of item-writing guidelines, a discussion of common errors in item writing, and the effects of item writing on test performance. A post-exam process to objectively categorize test items based on item statistics will also be outlined.
2. Objectives
1. Describe how to implement a collaborative item review process
2. Identify potential barriers to implementation and success of a collaborative item-writing process
3. Generate ideas to establish a collaborative process at your respective institution
7. Real quick item statistics
• Point biserial correlation (rpb)
• Difficulty (p)
8. Item classification guide
Item Class | Item Difficulty | Item Discrimination (point biserial) | Description
Level I | 0.45 to 0.75 | +0.20 or higher | Best item statistics; use most items in this range if possible
Level II | 0.76 to 0.91 | +0.15 or higher | Easy; use sparingly
Level III | 0.25 to 0.44 | +0.10 or higher | Difficult; use very sparingly and only if content is essential; rewrite if possible
Level IV | <0.24 or >0.91 | Any discrimination | Extremely difficult or easy; do not use unless content is essential
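As an illustration only, the guide above translates directly into a small lookup function. The thresholds come from the table, but the code itself (including the fallback category) is a hypothetical sketch, not part of the presenters' process; the fallback matches the "Uncategorizable" row that appears later on slide 21.

```python
# Hypothetical sketch: thresholds are taken from the classification guide above.
def classify_item(difficulty: float, rpb: float) -> str:
    """Classify an exam item by difficulty (p) and discrimination (rpb)."""
    # Level IV: extremely difficult or easy, regardless of discrimination.
    if difficulty < 0.24 or difficulty > 0.91:
        return "Level IV"
    if 0.45 <= difficulty <= 0.75 and rpb >= 0.20:
        return "Level I"    # best statistics; use most items in this range
    if 0.76 <= difficulty <= 0.91 and rpb >= 0.15:
        return "Level II"   # easy; use sparingly
    if 0.25 <= difficulty <= 0.44 and rpb >= 0.10:
        return "Level III"  # difficult; use very sparingly
    # Anything left (in-range difficulty with low discrimination, or the
    # table's small 0.24-0.25 gap) has no level in the guide.
    return "Uncategorizable"

print(classify_item(0.60, 0.25))  # -> Level I
print(classify_item(0.85, 0.05))  # -> Uncategorizable
```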
9. Faculty discussed these guidelines and came to group consensus on usage
Guideline | For (%) | Uncited (%) | Against (%)
Use positives, no negatives | 63 | 19 | 18
Write as many plausible distractors as you can | 70 | 26 | 4
Use None of the Above carefully | 44 | 7 | 48
Avoid All of the Above | 70 | 7 | 22
Use humor sparingly | 0 | 85 | 15
10. Item Review Process
• Who is involved?
• Self Care I – 11 faculty
• Self Care II – 9 faculty
• Participation
15. Study design
Control sequence (Spring and Fall 2012): Self Care 1 and Self Care 2, both without training or review; 6 exams, 272 items
Intervention sequence (Spring and Fall 2013): Self Care 1 and Self Care 2, both with training and review; 6 exams, 264 items
Interventions
1. Pre-semester survey
2. Presentation of item-writing guidelines at semester start
3. Guideline review and discussion at each exam review meeting
4. Review and editing of exam items per guidelines
5. Post-semester survey
All interventions were completed in both Self Care 1 and 2; instructors teaching in both completed surveys only in Self Care 1.
Comparisons
• Item difficulty, discrimination, classification by these factors, and student performance
• Pre- versus post-survey responses
16. About the participating faculty
NOTABLE BASELINE REPORTS
Which of the following factors affect your sense of success in item writing?
• Item statistics (n=9)
• Previous training in item writing (n=4)
• Student challenges to exam items (n=3)
How often have you participated in peer-review of exam items?
• Half of the time (n=4)
• A minority of the time (n=1)
• Never (n=5)
17. Results
GOALS
1. To improve examination quality through a faculty development program, followed by a longitudinal item review occurring before examination administration
2. To improve faculty members’ self-rated confidence and success
3. To measure changes in their opinions regarding item-writing guidelines and review
19. Results
• No significant difference between control and intervention items
• Mean student scores (% ± SD) did change (p<0.001):
• Control sequence: 88.3 ± 4.5
• Intervention sequence: 85.6 ± 6.0
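The slide does not say which statistical test produced the p<0.001 or what sample size was used. Purely as an illustration, here is how a comparison like this could be run from the reported summary statistics; Welch's t-test and the placeholder sample sizes below are my assumptions, not the presenters'.

```python
# Hypothetical sketch: compares the two sequences' reported means and SDs.
from scipy import stats

N_CONTROL = 100       # placeholder sample size (not reported on the slide)
N_INTERVENTION = 100  # placeholder sample size (not reported on the slide)

result = stats.ttest_ind_from_stats(
    mean1=88.3, std1=4.5, nobs1=N_CONTROL,       # control sequence
    mean2=85.6, std2=6.0, nobs2=N_INTERVENTION,  # intervention sequence
    equal_var=False,                             # Welch's t-test
)
print(f"t = {result.statistic:.2f}, p = {result.pvalue:.4g}")
```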
21. Distribution by level
Item Class | With review, n (%) | Without review, n (%)
Level 1 | 31 (11.4) | 52 (19.7)
Level 2 | 70 (27.5) | 76 (28.8)
Level 3 | 7 (2.6) | 3 (1.1)
Level 4 | 142 (52.2) | 122 (46.2)
Uncategorizable | 22 (8.1) | 11 (4.2)
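The slide reports counts only. One hypothetical way to ask whether the two distributions differ overall is a chi-square test on the contingency table; this is my illustration, not an analysis the presenters report running.

```python
# Hypothetical sketch: chi-square test on the level distribution above.
from scipy.stats import chi2_contingency

# Rows: Level 1-4 and Uncategorizable; columns: with review, without review.
table = [
    [31, 52],
    [70, 76],
    [7, 3],
    [142, 122],
    [22, 11],
]
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, p = {p:.4g}")
```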
22. Goal 2: How did participants' self-rated confidence and success change?
23. Survey opinions: self-focused
ITEM | Pre (mean) | Post (mean) | p-value
How would you rate your confidence at writing effective multiple-choice test items? (0 – very unconfident, 10 – very confident) | 6.0 | 8.1 | 0.002
How would you rate your success at writing multiple-choice test items? (0 – very unsuccessful, 10 – very successful) | 6.4 | 7.9 | <0.001
To what degree do you feel confident that you can properly evaluate your and your colleagues' test questions? (0 – very unconfident, 10 – very confident) | 6.7 | 8.4 | 0.005
To what degree do you feel confident that you could implement a formal exam item evaluation process as a coordinator of another course? (0 – very unconfident, 10 – very confident) | 5.5 | 7.1 | 0.008
24. Goal 3: How did participants' opinions of item guidelines and review change?
25. Survey opinions: item-focused
ITEM | Pre (mean) | Post (mean) | p-value
In your opinion, to what degree will peer-review of exam items affect item quality? (0 – very negatively, 10 – very positively) | 7.9 | 8.5 | 0.14
Do you plan to modify future multiple-choice items based on item-writing guidelines? (1 – Yes, 2 – No) | 9 Y, 1 N | 10 Y, 0 N | 1.00
In your opinion, to what degree will voluntary application of item-writing guidelines affect item quality? (0 – very negatively, 10 – very positively) | 7.9 | 8.4 | 0.24
27. Top 5 item flaws
GUIDELINE | n (% of total changes)
Include the central idea in the stem instead of the choices. | 37 (33.6)
Use correct grammar, punctuation, capitalization, and spelling. | 17 (15.5)
Minimize the amount of reading in each stem. | 13 (11.8)
Use the question, completion, and best-answer versions of the conventional multiple choice (MC), the alternate choice, true-false, multiple true-false, matching, and the context-dependent item and item set formats, but avoid the complex MC (Type K) format. | 10 (9.1)
Keep choices independent; choices should not be overlapping. | 7 (6.4)
We decided to start an exam item review process because we had wide variability in what we collectively thought was a "good" or a "bad" question. We had relatively young faculty members who, like myself, had no clue how to write questions, if we're being honest. Lastly, we wanted to limit any grade variability that may have been due to poorly written questions.
Am I making clear here that this involved only clinical faculty, too?
So how do you make the first meeting, and all the meetings after it, less like this and more like this? First, we got ALL faculty members involved in the course into the room for meeting 1. We had faculty members with experience in item writing present a mini "faculty development" presentation of Haladyna's item-writing guidelines.
Developed from an analysis of 27 textbooks and 27 research studies
Purpose was to validate each guideline based on agreement in studied sources
Haladyna and Downing examined 46 measurement textbook passages dealing with how to write multiple-choice items. They produced a set of 43 item-writing guidelines. They found that some guidelines had strong consensus among these testing specialists, some were given less attention, and several were controversial. Coverage of these guidelines in the books varied from very comprehensive to very limited. Commonly, authors did not logically or empirically justify the guidelines they presented.
Point biserial correlates student scores on one particular question with their scores on the test as a whole. The driving assumption is that students who score well on the test as a whole should, on average, also score well on the question under review, and vice versa. If a question deviates from this assumption, the rpb lets us know.
rpb ranges from -1.0 to +1.0.
The closer the rpb is to +1.0, the more reliable the question is considered, because it "discriminates" well between students who mastered the material and those who did not.
The p-value is a simple measure of question difficulty. It ranges from 0 to 1, with lower numbers meaning a more difficult question. For example, if the p-value is 0, no students answered the question correctly; if it is 1, everyone answered it correctly.
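If you want to compute these yourself, here is a minimal sketch (mine, not from the presentation) that derives both statistics from a students-by-items matrix of 0/1 scores. Excluding each item from its own total, as done below, is one common convention, not necessarily what the presenters' exam software does.

```python
# Minimal sketch: difficulty (p) and point biserial (rpb) per item.
import numpy as np

def item_statistics(responses):
    """Return (difficulty, point_biserial), one value per item."""
    responses = np.asarray(responses, dtype=float)
    n_students, n_items = responses.shape

    # Difficulty p: the proportion of students answering each item correctly.
    difficulty = responses.mean(axis=0)

    # Point biserial: Pearson correlation between each item's 0/1 scores and
    # the total on the *rest* of the test, so the item is not correlated
    # with itself.
    totals = responses.sum(axis=1)
    rpb = np.empty(n_items)
    for j in range(n_items):
        rest = totals - responses[:, j]
        rpb[j] = np.corrcoef(responses[:, j], rest)[0, 1]  # NaN if no variance
    return difficulty, rpb

# Toy example: 5 students, 3 items.
scores = [[1, 1, 0],
          [1, 0, 0],
          [1, 1, 1],
          [0, 0, 0],
          [1, 1, 1]]
p, r = item_statistics(scores)
print(p)  # item 1 answered correctly by 4 of 5 students -> p = 0.8
print(r)  # positive values: high scorers tended to get the item right
```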
Let's look at the handout. The majority of the guidelines are just common sense and are universally endorsed (when mentioned). I don't think we need to spend any time discussing these since the consensus is already there, but do read over them all at some point just to bring them into active consideration. What I think we should spend time discussing are the five that received mixed recommendations, both from textbooks and empirical research.
So now that you know why we felt a peer review process was important, I am going to discuss the implementation and logistics of the process.
We have two sequential Self Care modules that occur over one year. 11 faculty teach in SC I, and 9 teach in SC II.
Of the 11 faculty in SC I, 6 (Fall 2012) and 5 (Fall 2013) also teach in SC II.
It is important that all faculty members are willing to participate in the process.
Deadlines for exam items are posted in the original schedule at the beginning of each semester. We usually require them only one week in advance, but for this process we asked that they be submitted at least two weeks in advance. Once all test questions were submitted, the course coordinator created the assessment in ES and downloaded it as a PDF. This draft was then sent to all course instructors as quickly as possible; we tried to give them several days to prepare for the meeting. In preparation, we asked that all faculty review each item and be ready to present suggested revisions at the meeting, keeping in mind the item-writing guidelines covered at the beginning of the course and using them as the basis for recommendations.
The faculty involved in the course were distributed over all three of our campuses, so the meetings were face-to-face with a distance connection to the off-site campuses. We fostered an open, friendly environment so people felt comfortable making suggestions. Each meeting included great discussion about several items, ending in agreement on how to make the items better. When the exam was sent out to the faculty, the author of each item was not noted; however, we all have access to the syllabus and know what each other teach, and during the meetings the authors of items were often asked for clarification and/or the rationale behind certain questions.
Once everyone agreed on the suggested revisions, the course coordinators revised the exam and resent it to the group to double-check that all changes accurately reflected the group's decisions. Once this process was complete, the exam was posted for students to download.
What worked:
• All faculty volunteered to participate and bought into the process
• Meetings were well attended
• Faculty were prepared for the meetings
• All faculty were members of the same department
• Faculty adhered to the agreed-upon "do not use" formats from the orientation
• No one took revisions personally
Potential barriers:
• Resistant faculty
• Courses that involve more than one department/discipline
• Communication
• Buy-in
The pre- and post-survey assessed:
• Faculty confidence and success at writing exam items
• Past experience with test question writing guidelines and peer-review processes
• How they think this process will affect item quality
• Confidence in incorporating the process into other courses
Ten of 12 instructors completed both the pre- and post-surveys. Survey questions and responses are summarized in Table 1. Six participating faculty reported that they had been teaching in a professional pharmacy curriculum for ≤5 years and four for 6-10 years. Seventy percent reported previous training in item writing, with faculty development programs (n=8) and credentialing board training (n=3) being the most commonly reported experiences. When asked, "which of the following factors affect your sense of success in item writing," faculty responded as follows: item statistics (n=9), previous training in item writing (n=4), and student challenges to exam items (n=3). At baseline, only 5 faculty members (50%) had ever participated in item peer-review, with four reporting peer-review "half of the time" and one "a minority of the time". Similarly, only five faculty members had ever modified exam items based on item-writing guidelines at baseline.
Remaining item flaws (continuing the list from slide 27):
GUIDELINE | n (% of total changes)
Word the stem positively; avoid negatives such as NOT or EXCEPT. If a negative word is used, use it cautiously and always ensure that it appears capitalized and boldface. | 7 (6.4)
Avoid window dressing. | 6 (5.5)
Avoid all-of-the-above. | 3 (2.7)
Place choices in logical or numerical order. | 2 (1.8)
Develop as many effective choices as you can, but research suggests three is adequate. | 2 (1.8)
Ensure that the directions in the stem are very clear. | 2 (1.8)
Avoid giving clues to the right answer, such as grammatical inconsistencies that cue the test-taker to the correct choice. | 1 (0.9)