Grading Your Assessments: How to Evaluate the Quality of Your Exams

How satisfied are you with the last assessment you gave? Would you describe your exam as a highly effective evaluation tool? How much information does it reveal about individual students’ abilities, and about the overall performance of your current class as compared to previous classes? Do you trust your assessment to accurately identify which students “get it” and which ones clearly neither grasp the content nor meet the standards required to pass your course?

A 3-step item analysis method based on each item’s difficulty level, discrimination values, and response frequencies provides a revealing look at the quality of your assessment by focusing your attention on the effectiveness of each test item and its contribution to the exam blueprint. Save time and effort by identifying exactly which exam questions need editing, and how much editing is required, before you take any action. You’ll likely find that replacing an item with a brand-new question is not necessary. Learn how small improvements to just a few exam items, guided by a systematic review of statistical results before you start editing, can drastically enhance the items’ quality and eliminate the need to spend hours rewriting the entire exam. By using this item analysis method, your future assessments will provide an accurate measurement of your students’ abilities to apply nursing content and solve clinical problems.
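
The three steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's or ExamSoft's implementation: the function names, the answer key, and the ten-student data set are all hypothetical. It assumes dichotomous (right/wrong) scoring, upper and lower groups of 27% of examinees for the discrimination ratio, and an uncorrected point-biserial in which the total score still includes the item itself.

```python
from statistics import mean, pstdev


def item_difficulty(item_scores):
    """Step 1: proportion of examinees who answered the item correctly."""
    return mean(item_scores)


def discrimination_ratio(item_scores, total_scores, fraction=0.27):
    """Step 2a: proportion correct in the upper group minus the lower group.

    Groups are the top and bottom `fraction` of examinees ranked by total score.
    """
    n = len(total_scores)
    size = max(1, round(n * fraction))
    order = sorted(range(n), key=lambda i: total_scores[i])
    lower, upper = order[:size], order[-size:]
    p_upper = mean(item_scores[i] for i in upper)
    p_lower = mean(item_scores[i] for i in lower)
    return p_upper - p_lower


def point_biserial(item_scores, total_scores):
    """Step 2b: Pearson correlation between the 0/1 item score and total score."""
    sx, sy = pstdev(item_scores), pstdev(total_scores)
    if sx == 0 or sy == 0:
        return 0.0  # no variance: correlation is undefined, reported as 0 here
    mx, my = mean(item_scores), mean(total_scores)
    cov = mean((x - mx) * (y - my) for x, y in zip(item_scores, total_scores))
    return cov / (sx * sy)


def response_frequencies(choices, options="ABCD"):
    """Step 3: how many examinees selected each option."""
    return {opt: choices.count(opt) for opt in options}


# Hypothetical data: one item (key = "B") answered by ten students.
key = "B"
choices = ["B", "B", "B", "B", "A", "B", "C", "B", "A", "C"]
total_scores = [48, 45, 44, 40, 30, 41, 28, 39, 33, 25]  # whole-exam scores

item_scores = [1 if c == key else 0 for c in choices]
difficulty = item_difficulty(item_scores)
idr = discrimination_ratio(item_scores, total_scores)
pbcc = point_biserial(item_scores, total_scores)
freqs = response_frequencies(choices)
zero_distracters = [o for o, count in freqs.items() if o != key and count == 0]

print(f"difficulty={difficulty:.2f} IDR={idr:.2f} PBCC={pbcc:.2f}")
print("response frequencies:", freqs, "zero distracters:", zero_distracters)
```

In this made-up example the item discriminates well, but option D was chosen by no one, so the only edit the data suggests is rewriting that single distracter rather than replacing the item.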

  1. Grading Your Assessments: How to Evaluate the Quality of Your Exams. An ExamSoft Client Webinar
  2. Grading Your Assessments: How to Evaluate the Quality of Your Exams. Ainslie T. Nibert, PhD, RN, FAAN. March 12, 2015
  3. Sound Instruction: Educator’s Golden Triangle (diagram labels: Instruction, Evaluation, Objectives, Outcomes)
  4. Five Guidelines to Developing Effective Critical Thinking Exams
      - Assemble the “basics.”
      - Write critical thinking test items.
      - Pay attention to housekeeping duties.
      - Develop a test blueprint.
      - Scientifically analyze all exams.
  5. Definition: Critical Thinking. The process of analyzing and understanding how and why we reached a certain conclusion.
  6. Bloom’s Taxonomy: Benjamin Bloom, 1956 (revised). Terminology changes: "The graphic is a representation of the NEW verbage associated with the long familiar Bloom's Taxonomy. Note the change from Nouns to Verbs [e.g., Application to Applying] to describe the different levels of the taxonomy. Note that the top two levels are essentially exchanged from the Old to the New version." (Schultz, 2005) (Evaluation moved from the top level to second from the top, as Evaluating; Synthesis moved from second from the top to the top, as Creating.) Source: http://www.odu.edu/educ/llschult/blooms_taxonomy.htm
  8. Post-Exam Item Analysis: An important aspect of item writing. Helps to determine the quality of a test.
  9. Consistency of Scores
  10. Reliability Tools
      - Kuder-Richardson Formula 20 (KR20), for the EXAM: range from -1 to +1
      - Point Biserial Correlation Coefficient (PBCC), for TEST ITEMS: range from -1 to +1
  11. 11. 11 q Item difficulty 30% - 90% q Item Discrimination Ratio 25% and Above q PBCC 0.20 and Above q KR20 0.70 and Above Standards of Acceptance
  12. Thinking more about mean item difficulty on teacher-made tests…
      - Mean difficulty level for a teacher-made nursing exam should be 80 - 85%.
      - So, why might low NCLEX-RN® pass rates persist when mean difficulty levels on teacher-made exams remain consistently within this desired range?
  13. … and one “absolute” rule about item difficulty
      - Since the mean difficulty level for a teacher-made nursing exam is 80 - 85%, what should the lowest acceptable value be for each test item on the exam?
      - TEST ITEMS ANSWERED CORRECTLY BY 30% or LESS of the examinees should always be considered too difficult, and the instructor must take action.
      - Why?
  14. … but what about high difficulty levels?
      - Test items with high difficulty levels (>90%) often yield poor discrimination values.
      - Is there a situation where faculty can legitimately expect that 100% of the class will answer a test item correctly, and be pleased when this happens?
      - RULE OF THUMB ABOUT MASTERY ITEMS: Due to their negative impact on test discrimination and reliability, they should comprise no more than 10% of the test.
  15. 15. 15 q Item difficulty 30% - 90% q Item Discrimination Ratio 25% and Above q PBCC 0.20 and Above q KR20 0.70 and Above Standards of Acceptance
  16. Thinking more about item discrimination on teacher-made tests…
      - IDR can be calculated quickly, but doesn’t consider variance of the entire group. Use it to quickly identify items that have zero/negative discrimination values, since these need to be edited before using again.
      - PBCC is a more powerful measure of discrimination.
      - Correlates the correct answer to a single test item with the total test score of the student.
      - Considers the variance of the entire student group, not just the lower and upper 27% groups.
      - For a small ‘n,’ consider cumulative value.
  17. … what decisions need to be made about items?
      - When a test item has poor difficulty and/or discrimination values, action is needed.
      - All of these actions require that the exam be rescored.
      - Credit can be given for more than one choice.
      - Test item can be nullified.
      - Test item can be deleted.
      - Each of these actions has a consequence, so faculty need to carefully consider these when choosing an action. Faculty judgment is crucial when determining actions affecting test scores.
  18. Standards of Acceptance: Nursing
      - Nursing PBCC: 0.15 and above
      - Nursing KR20: 0.60 - 0.65 and above
  19. Thinking more about adjusting standard of acceptance for nursing tests…
      - Remember that the key statistical concept inherent in calculating coefficients is VARIANCE.
      - When there is less variance in test scores, reliability of the test will decrease, i.e., the KR-20 value will drop.
      - What contributes to lack of variance in nursing students’ test scores?
  21. … and a word about using Response Frequencies
      - Sometimes LESS is MORE when it comes to editing a test item.
      - A review of the response frequency data can focus your editing.
      - For items where 100% of students answer correctly, and no other options were chosen, make sure that this is indeed intentional (MASTERY ITEM), and not just reflective of an item that is too easy (>90% DIFFICULTY).
      - Target re-writing the “zero” distracters: those options that are ignored by students. Replacing “zeros” with plausible options will immediately improve item DISCRIMINATION.
  22. 3-Step Method for Item Analysis
      - 1. Review Difficulty Level
      - 2. Review Discrimination Data: Item Discrimination Ratio (IDR); Point Biserial Correlation Coefficient (PBCC)
      - 3. Review Effectiveness of Alternatives: Response Frequencies; Non-distracters
      - Source: Morrison, S., Nibert, A., & Flick, J. (2006). Critical thinking and test item writing (2nd ed.). Houston, TX: Health Education Systems, Inc.
  28. Content Validity: Does the test measure what it claims to measure?
  29. Use a Blueprint to Assess a Test’s Validity
      - Test Blueprint: Reflects Course Objectives; Rational/Logical Tool; Testing Software Program; Storage of item analysis data (Last & Cum); Storage of test item categories
  30. Test Blueprints
      - Faculty Generated
      - Electronically Generated
  31. An electronic blueprint for each exam in each course
  34. NCLEX-RN® Client Needs: Percentages of Items, 2011 vs. 2014. Source: https://www.ncsbn.org/4701.htm
  35. NCLEX-RN® Client Needs: Percentages of Items, 2011 vs. 2014. Increases vs. Decreases
  36. Item Writing Tools for Success: Knowledge, Test Blueprint, Testing Software
  37. References
      - Morrison, S., Nibert, A., & Flick, J. (2006). Critical thinking and test item writing (2nd ed.). Houston, TX: Health Education Systems, Inc.
      - Morrison, S. (2004). Improving NCLEX-RN pass rates through internal and external curriculum evaluation. In M. Oermann & K. Heinrich (Eds.), Annual review of nursing education (Vol. 3). New York: Springer.
      - National Council of State Boards of Nursing. (2013). 2013 NCLEX-RN test plan. Chicago, IL: National Council of State Boards of Nursing. https://www.ncsbn.org/3795.htm
      - Nibert, A. (2010). Benchmarking for student progression throughout a nursing program: Implications for students, faculty, and administrators. In Caputi, L. (Ed.), Teaching nursing: The art and science, 2nd ed. (Vol. 3) (pp. 45-64). Chicago: College of DuPage Press.
  38. Have Questions? Need More Info? Thanks for your time & attention today! 866-429-8889
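
The exam-level reliability coefficient and the difficulty cutoffs listed in the slides can be sketched as follows. This is an illustrative sketch only: `kr20` and `flag_difficulty` are hypothetical names, the score matrix is made up, and the code assumes the common textbook form of KR-20 computed with the population variance of total scores. Testing software may use the sample variance instead, which changes the value slightly.

```python
from statistics import pvariance


def kr20(matrix):
    """KR-20 for a matrix of 0/1 scores (rows = students, columns = items).

    KR20 = k / (k - 1) * (1 - sum(p_i * q_i) / variance of total scores)
    """
    n = len(matrix)
    k = len(matrix[0])
    if k < 2:
        raise ValueError("KR-20 needs at least two items")
    totals = [sum(row) for row in matrix]
    var_total = pvariance(totals)  # assumption: population variance
    if var_total == 0:
        raise ValueError("no variance in total scores; KR-20 is undefined")
    pq_sum = 0.0
    for j in range(k):
        p = sum(row[j] for row in matrix) / n  # item difficulty (proportion correct)
        pq_sum += p * (1 - p)
    return (k / (k - 1)) * (1 - pq_sum / var_total)


def flag_difficulty(p):
    """Apply the slides' difficulty cutoffs to one item's proportion correct."""
    if p <= 0.30:
        return "too difficult"  # answered correctly by 30% or less
    if p > 0.90:
        return "too easy unless intended as a mastery item"
    return "acceptable"


# Hypothetical exam: six students, five items.
scores = [
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
reliability = kr20(scores)
print(f"KR-20 = {reliability:.2f}")
print("meets the 0.70 standard:", reliability >= 0.70)
print("item answered correctly by 25%:", flag_difficulty(0.25))
```

Because the formula divides by the variance of total scores, a class whose scores cluster tightly produces a lower KR-20 even when the items themselves are sound, which is the point the slides make about adjusting the standard for nursing tests.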