Upcoming Caveon Events
• Caveon Webinar Series: Next session, October 16
The Good and Bad of Online Proctoring, Part 2
• EATP – September 25-27 in St. Julian’s, Malta.
– Caveon’s John Fremer and Steve Addicott presenting:
What are we Accountable For? Security Standards and Resources for High
Stakes Testing Programs
– Steve Addicott hosting an ignite session: Leveraging Social Media to Connect with
International Test Candidates
• The 2nd Annual Statistical Detection of Potential Test Fraud Conference
– October 17-19, 2013, Madison, Wisconsin
– Caveon’s Dennis Maynes and Cindy Butler will be presenting three sessions
• Handbook of Test Security – Now Available. We will share a discount code at the
end of this session.
Caveon Online
• Caveon Security Insights Blog
– http://www.caveon.com/blog/
• twitter
– Follow @Caveon
• LinkedIn
– Caveon Company Page
– “Caveon Test Security” Group
• Please contribute!
• Facebook
– Will you be our “friend?”
– “Like” us!
www.caveon.com
Caveon Webinar Series: Improving Testing with Key Strength Analysis
September 18, 2013
Presenters
• Dennis Maynes, Chief Scientist, Caveon Test Security
• Dan Allen, Psychometrician, Western Governors University
• Marcus Scott, Data Forensics Scientist, Caveon Test Security
• Barbara Foster, Psychometrician, American Board of Obstetrics and Gynecology
Agenda for Today
• Review classical item analysis
• Introduce Key Strength Analysis
• Derive Key Strength Analysis
• Observations by Dan Allen and Barbara Foster
• Conclusions and Q&A
Review Classical Item Analysis
• Statistics
– P-value
– Point-biserial correlation
• Typical rules
– Low p-values (hard items)
– High p-values (easy items)
– Low point-biserial correlations (low discriminations)
• Easy to understand and implement
• Good at flagging poor items
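The two classical statistics can be sketched in a few lines of Python. This is a minimal illustration, not the presenters' code: the function names, the toy data, and the flagging thresholds (0.20, 0.95, 0.10) are assumptions chosen for the example, since the slides name the rules without giving cut-offs.

```python
from math import sqrt

def p_value(item_scores):
    """Proportion of examinees who answered the item correctly (0/1 scores)."""
    return sum(item_scores) / len(item_scores)

def point_biserial(item_scores, total_scores):
    """Pearson correlation between a 0/1 item score and the total test score."""
    n = len(item_scores)
    mean_x = sum(item_scores) / n
    mean_y = sum(total_scores) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(item_scores, total_scores))
    sxx = sum((x - mean_x) ** 2 for x in item_scores)
    syy = sum((y - mean_y) ** 2 for y in total_scores)
    if sxx == 0 or syy == 0:
        return 0.0  # correlation is undefined for a constant variable; report 0
    return sxy / sqrt(sxx * syy)

def flag_item(p, r, p_low=0.20, p_high=0.95, r_low=0.10):
    """Apply the typical rules. The thresholds are hypothetical defaults."""
    flags = []
    if p < p_low:
        flags.append("hard item")
    if p > p_high:
        flags.append("easy item")
    if r < r_low:
        flags.append("low discrimination")
    return flags

# Toy data: five examinees, one item, and their total test scores.
item = [1, 1, 1, 0, 0]
totals = [5, 4, 4, 2, 1]
p = p_value(item)
r = point_biserial(item, totals)
print(p, round(r, 3), flag_item(p, r))
```

Here the total score still includes the item itself, which is the usual classical form.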
Introduce Key Strength Analysis
• Why Key Strength Analysis?
– Model uses information from all items
– Answer choices for same item are compared
– Provides possible reasons for poor performance
• High performing test takers (knowledgeable students)
– Typically report problems with the answer key
– Usually choose the correct answer
• Most frequently selected choice
– Is usually correct for easy items
– Is not necessarily correct for hard items
Capabilities of Key Strength Analysis
• Built upon classical item analysis
– Point-biserial correlations discriminate between high and low
performers
– P-values detect hard/easy items
• Typical problems with items
– Mis-keyed items
– Weakly keyed items
– Ambiguously keyed items
• Use probabilities to make inferences about item
performance
Modify Point-Biserial Correlation
1. Exclude the item score from the test score
• Places all answer choices on “the same playing field”
• Allows correct and incorrect answers to be compared using “what if”
2. Compute point-biserial correlations
• For correct answer and
• For distractors
3. Scale point-biserial appropriately
• We call this statistic, z*
• Use z* to compute the probability that the choice (A, B, etc.) is a key; this is the “key strength”
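Steps 1 and 2 can be sketched as follows. This is an illustration under stated assumptions, not the presenters' implementation: the helper names and toy data are hypothetical, and the scaling in step 3 uses the large-sample approximation sqrt(n - 1) * r as a stand-in, because the slides do not reproduce the formula for z*.

```python
from math import sqrt

def pearson(x, y):
    """Plain Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)

def rest_scores(scored, item):
    """Step 1: total score with the studied item's own score removed."""
    return [sum(row) - row[item] for row in scored]

def choice_statistics(item_choices, rest):
    """Step 2: point-biserial of every answer choice against the rest score.
    The second value is an ASSUMED scaling, not the z* from the slides."""
    n = len(item_choices)
    stats = {}
    for choice in sorted(set(item_choices)):
        indicator = [int(c == choice) for c in item_choices]
        r = pearson(indicator, rest)
        stats[choice] = (r, r * sqrt(n - 1))
    return stats

# Toy data: 0/1 scores for 8 examinees on 5 items, plus the raw
# choices those examinees made on the first item (index 0).
scored = [
    [1, 1, 1, 1, 0], [1, 1, 1, 1, 0], [1, 1, 1, 1, 0],
    [1, 0, 0, 0, 1], [0, 1, 0, 0, 1], [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1], [0, 0, 0, 0, 0],
]
choices_item0 = ["A", "A", "A", "A", "B", "C", "D", "B"]

for choice, (r, scaled) in choice_statistics(choices_item0, rest_scores(scored, 0)).items():
    print(choice, round(r, 3), round(scaled, 3))
```

Because every choice is correlated with the same rest score, the keyed answer and the distractors are directly comparable, which is the point of excluding the item score.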
Derive Key Strength Analysis
After Some Algebra
Why z* Depends on all the Right Quantities
Z* for all Items and Responses
[Figure: distributions of z* for right and wrong responses, z* from -10 to 10; 154 examinees, 100 items]
Calculating p(choice is a key | data)
Approximation Theory
• Central Limit Theorem → z* is approximately normal.
• Probability function should be monotonically increasing, which requires equal variances
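The equal-variance requirement can be made explicit with Bayes' rule. The sketch below is not taken from the slides. It assumes z* is normal with mean \mu_R for keyed choices and \mu_W for distractors, a common variance \sigma^2, and a prior probability \pi that a choice is a key; all four symbols are introduced here only for illustration.

```latex
\begin{align*}
P(\text{key} \mid z^*)
  &= \frac{\pi\,\phi\!\left(\frac{z^*-\mu_R}{\sigma}\right)}
          {\pi\,\phi\!\left(\frac{z^*-\mu_R}{\sigma}\right)
           + (1-\pi)\,\phi\!\left(\frac{z^*-\mu_W}{\sigma}\right)} \\[4pt]
  &= \frac{1}{1 + \dfrac{1-\pi}{\pi}
       \exp\!\left(-\dfrac{(\mu_R-\mu_W)\left(z^* - \frac{\mu_R+\mu_W}{2}\right)}{\sigma^2}\right)}
\end{align*}
```

With a common variance the quadratic terms in z* cancel, leaving a logistic curve that increases in z* whenever \mu_R > \mu_W. With unequal variances a term in (z*)^2 survives in the exponent, so the curve is no longer monotonic.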
[Figure: observed z* distributions for right and wrong responses with fitted normal curves, z* from -10 to 10; series: Right, Right Normal, Wrong, Wrong Normal]
P(choice is a key | z*)
[Figure: p(choice is a key | z*) plotted against z*, for z* from -4 to 5 and probability from 0 to 1]
Analysis of Distractors
• Compute key strength (KS) for all responses
• Low KS – probability less than 50%
• High KS – probability 50% or more
                  Distractors: Low KS    Distractors: High KS
Answer: Low KS    Weakly keyed           Potential mis-key
Answer: High KS   Normal                 Ambiguously keyed
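The decision table translates directly into a small function. This is a sketch, not the presenters' code: the function name is hypothetical, and it assumes that "Distractors: High KS" means at least one distractor reaches the 50% threshold. The probabilities in the usage lines come from the example slides; treating A as the keyed answer there (and B in the last line) is an assumption made for illustration.

```python
def classify_item(key_strengths, key, threshold=0.5):
    """Classify an item from the key strength (probability) of each choice.

    key_strengths maps choice label -> probability that the choice is a key.
    key is the label of the currently keyed answer.
    """
    answer_high = key_strengths[key] >= threshold
    distractor_high = any(
        prob >= threshold
        for choice, prob in key_strengths.items()
        if choice != key
    )
    if answer_high and distractor_high:
        return "Ambiguously keyed"
    if answer_high:
        return "Normal"
    if distractor_high:
        return "Potential mis-key"
    return "Weakly keyed"

print(classify_item({"A": 0.99, "B": 0.06, "C": 0.0, "D": 0.0}, key="A"))
print(classify_item({"A": 0.32, "B": 0.06, "C": 0.0, "D": 0.0}, key="A"))
print(classify_item({"A": 0.99, "B": 0.90, "C": 0.0, "D": 0.0}, key="A"))
print(classify_item({"A": 0.99, "B": 0.06, "C": 0.0, "D": 0.0}, key="B"))
```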
Example I – Good Key
[Figure: p(choice is a key | z*) curve with responses A, B, C and D marked at their z* values; the answer key arrow is colored gold]
Response   z*      Probability
A          3.25    0.99
B          0.25    0.06
C          -2.75   0
D          -2.4    0
Example II – Potential Mis-key
[Figure: p(choice is a key | z*) curve with responses A, B, C and D marked at their z* values; the answer key arrow is colored gold]
Response   z*      Probability
A          3.25    0.99
B          0.25    0.06
C          -2.75   0
D          -2.4    0
Example III – Weak Key
[Figure: p(choice is a key | z*) curve with responses A, B, C and D marked at their z* values; the answer key arrow is colored gold]
Response   z*      Probability
A          1.0     0.32
B          0.25    0.06
C          -3      0
D          -2.5    0
Example IV – Ambiguous Key
[Figure: p(choice is a key | z*) curve with responses A, B, C and D marked at their z* values; the answer key arrow is colored gold]
Response   z*      Probability
A          3.75    0.99
B          2.25    0.9
C          -3      0
D          -2.5    0
Validation – Answer Key Estimation
• Assume the key is not known
• Check accuracy of estimated answer key
• Algorithm:
– Start with most frequent response as initial guess
– Revise key using probabilities until no more changes
• For 12 different exams
– Key estimation accuracy varied from 81% to 99%
– Cannot infer multiple keys
– Cannot guess key when there are no correct responses
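The estimation loop can be sketched as below. This is an illustration, not the presenters' algorithm: the slides revise the key using key-strength probabilities, whereas this sketch uses each choice's correlation with the rest score as a stand-in, and the function names, the pass limit, and the toy data are all hypothetical.

```python
from collections import Counter
from math import sqrt

def pearson(x, y):
    """Plain Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)

def estimate_key(responses, max_passes=20):
    """responses: one list of choice labels per examinee, one entry per item."""
    n_items = len(responses[0])
    # Initial guess: the most frequent response to each item.
    key = [Counter(row[j] for row in responses).most_common(1)[0][0]
           for j in range(n_items)]
    for _ in range(max_passes):
        scored = [[int(row[j] == key[j]) for j in range(n_items)]
                  for row in responses]
        totals = [sum(s) for s in scored]
        new_key = []
        for j in range(n_items):
            # Rest score: total score without the item being revised.
            rest = [t - s[j] for t, s in zip(totals, scored)]
            observed = sorted(set(row[j] for row in responses))
            best = max(observed, key=lambda c: pearson(
                [int(row[j] == c) for row in responses], rest))
            new_key.append(best)
        if new_key == key:  # no more changes: stop
            break
        key = new_key
    return key

# Toy data: three strong examinees answer A B C D A; five weak ones mostly
# miss, and four of them pick B on the last (hard) item.
responses = [list(r) for r in (
    "ABCDA", "ABCDA", "ABCDA",
    "AAAAB", "BBBBB", "CCCCB", "DDDDB", "BADCC",
)]
print(estimate_key(responses))
```

In this toy data the most frequent response to the last item is the distractor B, so the initial guess is wrong there and is corrected on the first revision pass. Only choices that someone actually selected are considered, which mirrors the limitation that a key cannot be guessed when there are no correct responses.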
Summary of Validation Study
• Accuracy improves with item quality
• Accuracy affected by sample size & test length
Exam  N       Forms  Form Length  Items  Non-scored Items  Accuracy  Observations
A     2,966   2      180          307    0                 99.2%
B     337     2      107          214    0                 85.5%
C     337     1      230          230    0                 90.9%
D     1,815   1      204          204    7                 92.1%     Some association with "deleted" items
E     1,408   1      199          199    1                 96.0%
F     46,356  2      240          480    0                 96.0%
G     44,104  2      120          240    0                 95.8%
H     25,448  2      60           120    0                 93.3%
I     121     3      165          417    43                81.0%     Strong association with "field test" items
J     1,071   8      52 & 61      391    0                 80.5%     85.2% (English-only)
K     2,033   8      68, 76 & 77  510    0                 85.9%
L     6,473   21     250          1,050  850               85.7%     All errors except one were on non-scored items.
Reason for Answer Key Estimation
• If a group of test takers has stolen the test and worked
out their own answer key, it is likely some answers will
be wrong.
• Answer key estimation can find the errors committed by
test thieves.
Dan Allen
Psychometrician
Western Governors University
Example Item: Ambiguous Key
Which is a property of all X?
A. They contain Y.
B. They have property Z.
C. * They do not contain Y.
D. They have property W.
Looking at the item text, we see that this is likely being
caused by rival options A and C. SME feedback
suggests the item is too text specific.
Example Item: Ambiguous Key
Which is a component of X?
A. * Real anticipated expense
B. Time spent
C. Liquid assets
D. Quality
In this case, students of high ability were often
selecting C instead of A. SME feedback suggests the
deleted word may have been turning students off to
that option.
Example Item: Weak Key
Select 3 possible causes of X
A. *Obesity
B. Contaminated drinking water
C. *Unhealthy diet
D. *Genetic factors
E. Lack of exercise
High performing students were picking C and D correctly, but
were as likely to pick E as they were to pick A. SME feedback
suggested that E may be a reasonable answer to the question.
The revision involved making A, C, and E all incorrect answers
so that D would remain the sole answer.
Example Item: Potential Mis-key
Which is a sound accounting principle?
A. X
B. Not X
C. *Y
D. Z
Nearly all students selected distractor B (Not X). This
item was not mis-keyed. It seems most likely that this
concept was not covered sufficiently in the text and/or
other learning resources—leaving students to use
guessing strategies rather than content knowledge.
Barbara Foster
Psychometrician
The American Board of Obstetrics
and Gynecology
The American Board of
Obstetrics and Gynecology
2013 Certifying Exam
• 180 scored items
• Five sets of 40 field test items
• Potential mis-keys from Caveon
– 8 identified among the scored items (4%)
– 22 identified among the field test items (11%)
The lower proportion in the scored items is not
surprising since those items have been field
tested and some may have been previously
used.
• Result of the SME review of the flagged scored
items:
– 4 of the 8 (50%) were found to have problems.
These problems were a combination of ambiguous
wording, new information published just prior to
the exam, recent changes in guidelines, or just a
very difficult item. These items were deleted from
the exam prior to scoring.
• Result of the SME review of the flagged field
test items:
– 15 of the 22 (68%) were found to have problems.
These problems were mostly a combination of
ambiguous wording, responses too closely related,
and changes in the field.
Field test items flagged
• Our Standard Methods: 27 (13.5%)
• The z* Method: 22 (11.0%)
• Flagged by both: 8 (4%)
Field test items flagged and found to have problems
• Our Standard Methods: 27 flagged (13.5%); 13 had problems
• The z* Method: 22 flagged (11.0%); 15 had problems
• Flagged by both: 8 (4%); 5 had problems
• Conclusion
This new method indicates that it is detecting
differences that are not being detected by our
current methods. These differences do not
appear to be strictly keying errors but involve
other important problem areas as well.
Conclusions
• Item analysis helps ensure
– Unidimensionality
– Desired item performance
• Key Strength Analysis enhances classical item analysis
– Uses information from all items
– Compares answer choices for same item
• Can detect structural flaws in items
• Can suggest the actual key when the item is mis-keyed
– Suggests possible reasons for poor performance
• Future research
– Investigate thresholds for Key Strength Analysis
– Simulate item problems to measure ability to detect
– Evaluate performance when assumptions fail
Questions?
Please type questions for our presenters in the
GoToWebinar control panel on your screen.
HANDBOOK OF TEST SECURITY
• Editors - James Wollack & John Fremer
• Published March 2013
• Preventing, Detecting, and Investigating Cheating
• Testing in Many Domains
– Certification/Licensure
– Clinical
– Educational
– Industrial/Organizational
• Don’t forget to order your copy at www.routledge.com
– http://bit.ly/HandbookTS (Case Sensitive)
– Save 20% - Enter discount code: HYJ82
THANK YOU!
- Follow Caveon on twitter @caveon
- Check out our blog…www.caveon.com/blog
- LinkedIn Group – “Caveon Test Security”

Clinical Pharmacy  Introduction to Clinical Pharmacy, Concept of clinical pptxClinical Pharmacy  Introduction to Clinical Pharmacy, Concept of clinical pptx
Clinical Pharmacy Introduction to Clinical Pharmacy, Concept of clinical pptxraviapr7
 
Education and training program in the hospital APR.pptx
Education and training program in the hospital APR.pptxEducation and training program in the hospital APR.pptx
Education and training program in the hospital APR.pptxraviapr7
 

Kürzlich hochgeladen (20)

Department of Health Compounder Question ‍Solution 2022.pdf
Department of Health Compounder Question ‍Solution 2022.pdfDepartment of Health Compounder Question ‍Solution 2022.pdf
Department of Health Compounder Question ‍Solution 2022.pdf
 
How to Show Error_Warning Messages in Odoo 17
How to Show Error_Warning Messages in Odoo 17How to Show Error_Warning Messages in Odoo 17
How to Show Error_Warning Messages in Odoo 17
 
2024.03.23 What do successful readers do - Sandy Millin for PARK.pptx
2024.03.23 What do successful readers do - Sandy Millin for PARK.pptx2024.03.23 What do successful readers do - Sandy Millin for PARK.pptx
2024.03.23 What do successful readers do - Sandy Millin for PARK.pptx
 
Protein Structure - threading Protein modelling pptx
Protein Structure - threading Protein modelling pptxProtein Structure - threading Protein modelling pptx
Protein Structure - threading Protein modelling pptx
 
Prescribed medication order and communication skills.pptx
Prescribed medication order and communication skills.pptxPrescribed medication order and communication skills.pptx
Prescribed medication order and communication skills.pptx
 
3.21.24 The Origins of Black Power.pptx
3.21.24  The Origins of Black Power.pptx3.21.24  The Origins of Black Power.pptx
3.21.24 The Origins of Black Power.pptx
 
CHUYÊN ĐỀ DẠY THÊM TIẾNG ANH LỚP 11 - GLOBAL SUCCESS - NĂM HỌC 2023-2024 - HK...
CHUYÊN ĐỀ DẠY THÊM TIẾNG ANH LỚP 11 - GLOBAL SUCCESS - NĂM HỌC 2023-2024 - HK...CHUYÊN ĐỀ DẠY THÊM TIẾNG ANH LỚP 11 - GLOBAL SUCCESS - NĂM HỌC 2023-2024 - HK...
CHUYÊN ĐỀ DẠY THÊM TIẾNG ANH LỚP 11 - GLOBAL SUCCESS - NĂM HỌC 2023-2024 - HK...
 
The basics of sentences session 10pptx.pptx
The basics of sentences session 10pptx.pptxThe basics of sentences session 10pptx.pptx
The basics of sentences session 10pptx.pptx
 
P4C x ELT = P4ELT: Its Theoretical Background (Kanazawa, 2024 March).pdf
P4C x ELT = P4ELT: Its Theoretical Background (Kanazawa, 2024 March).pdfP4C x ELT = P4ELT: Its Theoretical Background (Kanazawa, 2024 March).pdf
P4C x ELT = P4ELT: Its Theoretical Background (Kanazawa, 2024 March).pdf
 
SOLIDE WASTE in Cameroon,,,,,,,,,,,,,,,,,,,,,,,,,,,.pptx
SOLIDE WASTE in Cameroon,,,,,,,,,,,,,,,,,,,,,,,,,,,.pptxSOLIDE WASTE in Cameroon,,,,,,,,,,,,,,,,,,,,,,,,,,,.pptx
SOLIDE WASTE in Cameroon,,,,,,,,,,,,,,,,,,,,,,,,,,,.pptx
 
Vani Magazine - Quarterly Magazine of Seshadripuram Educational Trust
Vani Magazine - Quarterly Magazine of Seshadripuram Educational TrustVani Magazine - Quarterly Magazine of Seshadripuram Educational Trust
Vani Magazine - Quarterly Magazine of Seshadripuram Educational Trust
 
Finals of Kant get Marx 2.0 : a general politics quiz
Finals of Kant get Marx 2.0 : a general politics quizFinals of Kant get Marx 2.0 : a general politics quiz
Finals of Kant get Marx 2.0 : a general politics quiz
 
A gentle introduction to Artificial Intelligence
A gentle introduction to Artificial IntelligenceA gentle introduction to Artificial Intelligence
A gentle introduction to Artificial Intelligence
 
How to Solve Singleton Error in the Odoo 17
How to Solve Singleton Error in the  Odoo 17How to Solve Singleton Error in the  Odoo 17
How to Solve Singleton Error in the Odoo 17
 
Personal Resilience in Project Management 2 - TV Edit 1a.pdf
Personal Resilience in Project Management 2 - TV Edit 1a.pdfPersonal Resilience in Project Management 2 - TV Edit 1a.pdf
Personal Resilience in Project Management 2 - TV Edit 1a.pdf
 
HED Office Sohayok Exam Question Solution 2023.pdf
HED Office Sohayok Exam Question Solution 2023.pdfHED Office Sohayok Exam Question Solution 2023.pdf
HED Office Sohayok Exam Question Solution 2023.pdf
 
How to Send Emails From Odoo 17 Using Code
How to Send Emails From Odoo 17 Using CodeHow to Send Emails From Odoo 17 Using Code
How to Send Emails From Odoo 17 Using Code
 
How to Add Existing Field in One2Many Tree View in Odoo 17
How to Add Existing Field in One2Many Tree View in Odoo 17How to Add Existing Field in One2Many Tree View in Odoo 17
How to Add Existing Field in One2Many Tree View in Odoo 17
 
Clinical Pharmacy Introduction to Clinical Pharmacy, Concept of clinical pptx
Clinical Pharmacy  Introduction to Clinical Pharmacy, Concept of clinical pptxClinical Pharmacy  Introduction to Clinical Pharmacy, Concept of clinical pptx
Clinical Pharmacy Introduction to Clinical Pharmacy, Concept of clinical pptx
 
Education and training program in the hospital APR.pptx
Education and training program in the hospital APR.pptxEducation and training program in the hospital APR.pptx
Education and training program in the hospital APR.pptx
 

Caveon Webinar Series: Improving Testing with Key Strength Analysis

  • 1. Upcoming Caveon Events • Caveon Webinar Series: Next session, October 16 The Good and Bad of Online Proctoring, Part 2 • EATP – September 25-27 in St. Julian’s, Malta. – Caveon’s John Fremer and Steve Addicott presenting: What are we Accountable For? Security Standards and Resources for High Stakes Testing Programs – Steve Addicott hosting an ignite session: Leveraging Social Media to Connect with International Test Candidates • The 2nd Annual Statistical Detection of Potential Test Fraud Conference – October 17-19, 2013, Madison, Wisconsin – Caveon’s Dennis Maynes and Cindy Butler will be presenting three sessions • Handbook of Test Security – Now Available. We will share a discount code at the end of this session.
  • 2. Caveon Online • Caveon Security Insights Blog – http://www.caveon.com/blog/ • Twitter – Follow @Caveon • LinkedIn – Caveon Company Page – "Caveon Test Security" Group • Please contribute! • Facebook – Will you be our "friend?" – "Like" us! www.caveon.com
  • 3. Caveon Webinar Series: Improving Testing with Key Strength Analysis • September 18, 2013 • Dennis Maynes, Chief Scientist, Caveon Test Security • Dan Allen, Psychometrician, Western Governors University • Marcus Scott, Data Forensics Scientist, Caveon Test Security • Barbara Foster, Psychometrician, American Board of Obstetrics and Gynecology
  • 4. Agenda for Today • Review classical item analysis • Introduce Key Strength Analysis • Derive Key Strength Analysis • Observations by Dan Allen and Barbara Foster • Conclusions and Q&A
  • 5. Review Classical Item Analysis • Statistics – P-value – Point-biserial correlation • Typical rules – Low p-values (hard items) – High p-values (easy items) – Low point-biserial correlations (low discriminations) • Easy to understand and implement • Good at flagging poor items
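The two classical statistics on this slide can be sketched in a few lines. This is a minimal illustration, assuming a 0/1 scored response matrix; the function names and the flagging thresholds (`p_low`, `p_high`, `pbis_low`) are hypothetical placeholders, not values given in the webinar.

```python
import numpy as np

def classical_item_stats(scored):
    """scored: (examinees x items) array of 0/1 item scores."""
    scored = np.asarray(scored, dtype=float)
    total = scored.sum(axis=1)              # total test score per examinee
    p_values = scored.mean(axis=0)          # proportion correct per item
    pbis = np.full(scored.shape[1], np.nan)
    for j in range(scored.shape[1]):
        item = scored[:, j]
        # The correlation is undefined when either variable is constant.
        if item.std() > 0 and total.std() > 0:
            pbis[j] = np.corrcoef(item, total)[0, 1]
    return p_values, pbis

def flag_items(p_values, pbis, p_low=0.20, p_high=0.95, pbis_low=0.10):
    """Apply the 'typical rules'; the thresholds are illustrative assumptions."""
    p_values = np.asarray(p_values)
    pbis = np.asarray(pbis)
    return {
        "hard": np.flatnonzero(p_values < p_low).tolist(),
        "easy": np.flatnonzero(p_values > p_high).tolist(),
        # NaN comparisons are False, so undefined correlations are not flagged.
        "low_discrimination": np.flatnonzero(pbis < pbis_low).tolist(),
    }
```

Note that this version correlates each item with the full total score, which includes the item itself; the modified statistic introduced later in the deck removes the item from the test score first.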
  • 6. Introduce Key Strength Analysis • Why Key Strength Analysis? – Model uses information from all items – Answer choices for same item are compared – Provides possible reasons for poor performance • High performing test takers (knowledgeable students) – Typically report problems with the answer key – Usually choose the correct answer • Most frequently selected choice – Is usually correct for easy items – Is not necessarily correct for hard items
  • 7. Capabilities of Key Strength Analysis • Built upon classical item analysis – Point-biserial correlations discriminate between high and low performers – P-values detect hard/easy items • Typical problems with items – Mis-keyed items – Weakly keyed items – Ambiguously keyed items • Use probabilities to make inferences about item performance
  • 8. Modify Point-Biserial Correlation
      1. Exclude the item score from the test score
         • Places all answer choices on "the same playing field"
         • Allows correct and incorrect answers to be compared using "what if"
      2. Compute point-biserial correlations
         • For the correct answer and
         • For the distractors
      3. Scale the point-biserial appropriately
         • We call this statistic z*
         • Use z* to compute the probability of the choice (A, B, etc.) being a key; this is the "key strength"
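The three steps can be sketched as follows. Steps 1 and 2 follow the slide directly. The scaling in step 3 is not defined in this transcript, so the sketch substitutes a Fisher z-transform scaled by sqrt(n - 3), a common way to put a correlation on an approximately standard normal scale. Treat that scaling, and the function name `choice_z_scores`, as assumptions rather than the presenters' definition of z*.

```python
import numpy as np

def choice_z_scores(responses, key, item, choices="ABCD"):
    """responses: (examinees x items) array of choice letters.
    key: keyed letter for each item. Returns {choice: (r, z)} for one item."""
    responses = np.asarray(responses)
    key = np.asarray(key)
    scored = (responses == key).astype(float)
    # Step 1: exclude the item's own score from the test score.
    rest = scored.sum(axis=1) - scored[:, item]
    n = len(rest)
    out = {}
    for c in choices:
        picked = (responses[:, item] == c).astype(float)
        if picked.std() == 0 or rest.std() == 0:
            out[c] = (float("nan"), float("nan"))   # correlation undefined
            continue
        # Step 2: point-biserial between "picked this choice" and rest score.
        r = float(np.corrcoef(picked, rest)[0, 1])
        # Step 3 (ASSUMED scaling, not from the slides): Fisher z-transform.
        r_clipped = min(max(r, -0.999999), 0.999999)
        z = float(np.arctanh(r_clipped) * np.sqrt(n - 3))
        out[c] = (r, z)
    return out
```

Because every choice is correlated against the same rest score, the keyed answer and the distractors are directly comparable, which is the "same playing field" idea in step 1.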
  • 11. Why z* Depends on all the Right Quantities
  • 12. Z* for all Items and Responses [Figure: distributions of z* for right and wrong answer choices, z* axis from -10 to 10; 154 examinees, 100 items]
  • 13. Calculating p(choice is a key | data)
  • 14. Approximation Theory • The Central Limit Theorem implies z* is approximately normal. • The probability function should be monotonically increasing, which requires equal variances [Figure: z* distributions for right and wrong choices with fitted normal curves, z* axis from -10 to 10]
  • 15. P(choice is a key | z*) [Figure: p(choice is a key | z*), from 0 to 1, plotted against z* from -4 to 5]
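To see why equal variances give a monotonic probability function, model z* as normal for keys and for non-keys and apply Bayes' rule. With a shared standard deviation the quadratic terms in z* cancel, leaving a logistic curve in z*. The sketch below is only an illustration: the means, standard deviation, and prior are hypothetical placeholders, not values estimated in the webinar.

```python
import math

def key_strength(z, mu_right=3.0, mu_wrong=-1.0, sigma=1.5, prior=0.25):
    """p(choice is a key | z*) for two equal-variance normal populations.
    All default parameter values are illustrative assumptions."""
    # Log posterior odds are linear in z when both populations share sigma.
    slope = (mu_right - mu_wrong) / sigma**2
    intercept = (math.log(prior / (1.0 - prior))
                 - (mu_right**2 - mu_wrong**2) / (2.0 * sigma**2))
    x = slope * z + intercept
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)
```

If the two variances differed, a z*-squared term would remain in the log odds and the curve could turn back down at one extreme, so a very large z* would no longer always mean a stronger key.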
  • 16. Analysis of Distractors • Compute key strength (KS) for all responses • Low KS – probability less than 50% • High KS – probability 50% or more
      Answer KS   Distractor KS   Interpretation
      Low         Low             Weakly keyed
      Low         High            Potential mis-key
      High        Low             Normal
      High        High            Ambiguously keyed
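The four-way classification on this slide can be written as a small function. This is a sketch under one stated assumption: the distractors count as "High KS" when at least one distractor reaches the 50% threshold. The function name `classify_item` is hypothetical.

```python
def classify_item(key_strengths, keyed_choice, threshold=0.5):
    """key_strengths: {choice: p(choice is a key)}; keyed_choice: current key."""
    answer_high = key_strengths[keyed_choice] >= threshold
    # Assumption: one high-KS distractor is enough to count as "High KS".
    distractor_high = any(
        p >= threshold for c, p in key_strengths.items() if c != keyed_choice
    )
    if answer_high:
        return "Ambiguously keyed" if distractor_high else "Normal"
    return "Potential mis-key" if distractor_high else "Weakly keyed"
```

For example, with the probabilities A = 0.99 and B = 0.9 and all others 0, the item is classified as ambiguously keyed whether A or B is the recorded key, because both the answer and a distractor have high key strength.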
  • 17. Example I – Good Key [Figure: choices A to D marked by arrows on the p(choice is a key | z*) curve; the answer key arrow is colored gold]
      Response   z*      Probability
      A          3.25    0.99
      B          0.25    0.06
      C          -2.75   0
      D          -2.4    0
  • 18. Example II – Potential Mis-key [Figure: choices A to D marked by arrows on the p(choice is a key | z*) curve; the answer key arrow is colored gold]
      Response   z*      Probability
      A          3.25    0.99
      B          0.25    0.06
      C          -2.75   0
      D          -2.4    0
  • 19. Example III – Weak Key [Figure: choices A to D marked by arrows on the p(choice is a key | z*) curve; the answer key arrow is colored gold]
      Response   z*      Probability
      A          1.0     0.32
      B          0.25    0.06
      C          -3      0
      D          -2.5    0
  • 20. Example IV – Ambiguous Key [Figure: choices A to D marked by arrows on the p(choice is a key | z*) curve; the answer key arrow is colored gold]
      Response   z*      Probability
      A          3.75    0.99
      B          2.25    0.9
      C          -3      0
      D          -2.5    0
  • 21. Validation – Answer Key Estimation • Assume the key is not known • Check accuracy of estimated answer key • Algorithm: – Start with most frequent response as initial guess – Revise key using probabilities until no more changes • For 12 different exams – Key estimation accuracy varied from 81% to 99% – Cannot infer multiple keys – Cannot guess key when there are no correct responses
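The iterative algorithm on this slide can be sketched as below. The slide revises the key "using probabilities"; this simplified version instead picks, for each item, the choice whose selection correlates most strongly with the rest score under the current key. That substitution is an assumption made to keep the example self-contained, and `estimate_key` and `max_iter` are hypothetical names.

```python
import numpy as np

def estimate_key(responses, choices="ABCD", max_iter=50):
    """responses: (examinees x items) array of choice letters.
    Returns an estimated key as an array of letters."""
    responses = np.asarray(responses)
    n_items = responses.shape[1]
    # Initial guess: the most frequently selected choice for each item.
    key = np.array([
        max(choices, key=lambda c: np.sum(responses[:, j] == c))
        for j in range(n_items)
    ])
    for _ in range(max_iter):
        scored = (responses == key).astype(float)
        total = scored.sum(axis=1)
        new_key = key.copy()
        for j in range(n_items):
            rest = total - scored[:, j]        # score excluding item j
            if rest.std() == 0:
                continue
            best_choice, best_r = key[j], -np.inf
            for c in choices:
                picked = (responses[:, j] == c).astype(float)
                if picked.std() == 0:
                    continue                    # nobody or everybody picked c
                r = np.corrcoef(picked, rest)[0, 1]
                if r > best_r:
                    best_choice, best_r = c, r
            new_key[j] = best_choice
        if np.array_equal(new_key, key):        # no more changes
            break
        key = new_key
    return key
```

Like the method described on the slide, this sketch returns a single choice per item, so it cannot infer multiple keys, and it has nothing to work with for a choice that no examinee selected.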
  • 22. Summary of Validation Study • Accuracy improves with item quality • Accuracy affected by sample size & test length
      Exam   N        Forms   Form Length    Items   Non-scored Items   Accuracy   Observations
      A      2,966    2       180            307     0                  99.2%
      B      337      2       107            214     0                  85.5%
      C      337      1       230            230     0                  90.9%
      D      1,815    1       204            204     7                  92.1%      Some association with "deleted" items
      E      1,408    1       199            199     1                  96.0%
      F      46,356   2       240            480     0                  96.0%
      G      44,104   2       120            240     0                  95.8%
      H      25,448   2       60             120     0                  93.3%
      I      121      3       165            417     43                 81.0%      Strong association with "field test" items
      J      1,071    8       52 & 61        391     0                  80.5%      85.2% (English-only)
      K      2,033    8       68, 76 & 77    510     0                  85.9%
      L      6,473    21      250            1,050   850                85.7%      All errors except one were on non-scored items.
  • 23. Reason for Answer Key Estimation • If a group of test takers has stolen the test and worked out their own answer key, it is likely some answers will be wrong. • Answer key estimation can find the errors committed by test thieves.
  • 25. Example Item: Ambiguous Key Which is a property of all X? A. They contain Y. B. They have property Z. C. * They do not contain Y. D. They have property W. Looking at the item text, we see that this is likely being caused by rival options A and C. SME feedback suggests the item is too text specific.
  • 26. Example Item: Ambiguous Key Which is a component of X? A. * Real anticipated expense B. Time spent C. Liquid assets D. Quality In this case, students of high ability were often selecting C instead of A. SME feedback suggests the deleted word may have been turning students off to that option.
  • 27. Example Item: Weak Key Select 3 possible causes of X A. *Obesity B. Contaminated drinking water C. *Unhealthy diet D. *Genetic factors E. Lack of exercise High performing students were picking C and D correctly, but were as likely to pick E as they were to pick A. SME feedback suggested that E may be a reasonable answer to the question. The revision involved making A, C, and E all incorrect answers so that D would remain the sole answer.
  • 28. Example Item: Potential Mis-key Which is a sound accounting principle? A. X B. Not X C. *Y D. Z Nearly all students selected distractor B (Not X). This item was not mis-keyed. It seems most likely that this concept was not covered sufficiently in the text and/or other learning resources—leaving students to use guessing strategies rather than content knowledge.
  • 29. Barbara Foster Psychometrician The American Board of Obstetrics and Gynecology
  • 30. The American Board of Obstetrics and Gynecology 2013 Certifying Exam • 180 scored items • Five sets of 40 field test items
  • 31. The American Board of Obstetrics and Gynecology • Potential mis-keys from Caveon – 8 identified among the scored items (4%) – 22 identified among the field test items (11%) The lower proportion in the scored items is not surprising, since those items have been field tested and some may have been previously used.
  • 32. The American Board of Obstetrics and Gynecology • Result of the SME review of the flagged scored items: – 4 of the 8 (50%) were found to have problems. These problems included ambiguous wording, new information published just prior to the exam, recent changes in guidelines, or simply a very difficult item. These items were deleted from the exam prior to scoring.
  • 33. The American Board of Obstetrics and Gynecology • Result of the SME review of the flagged field test items: – 15 of the 22 (68%) were found to have problems. These problems were mostly a combination of ambiguous wording, responses too closely related, and changes in the field.
  • 34. The American Board of Obstetrics and Gynecology • Our standard methods: 27 field test items flagged (13.5%) • The z* method: 22 field test items flagged (11.0%) • Flagged by both: 8 items (4%)
  • 35. The American Board of Obstetrics and Gynecology • Our standard methods: 27 field test items flagged (13.5%), 13 had problems • The z* method: 22 field test items flagged (11.0%), 15 had problems • Flagged by both: 8 items (4%), 5 had problems
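The counts on the two comparison slides can be combined with simple set arithmetic. The sketch below assumes that the 5 problem items in the overlap are included in both the 13 and the 15, and that the percentages are taken over the 200 field test items (five sets of 40); both points are inferences from the slides, not stated on them.

```python
def compare_flagging(n_items, flagged_std, problems_std,
                     flagged_z, problems_z, flagged_both, problems_both):
    """Inclusion-exclusion summary of two overlapping flagging methods."""
    flagged_either = flagged_std + flagged_z - flagged_both
    return {
        "flagged_by_either": flagged_either,
        "problems_found_by_either": problems_std + problems_z - problems_both,
        "flagged_only_by_z": flagged_z - flagged_both,
        "problems_only_by_z": problems_z - problems_both,
        "flagged_only_by_standard": flagged_std - flagged_both,
        "problems_only_by_standard": problems_std - problems_both,
        "hit_rate_standard": problems_std / flagged_std,
        "hit_rate_z": problems_z / flagged_z,
        "share_flagged_by_either": flagged_either / n_items,
    }

# Counts taken from the slides; n_items = 5 sets x 40 field test items.
summary = compare_flagging(200, 27, 13, 22, 15, 8, 5)
```

Under these assumptions, 14 items were flagged only by the z* method and 10 of them had problems, which is the kind of difference the conclusion slide refers to.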
  • 36. The American Board of Obstetrics and Gynecology • Conclusion: The new method appears to detect differences that our current methods do not. These differences do not appear to be strictly keying errors; they involve other important problem areas as well.
  • 37. Conclusions • Item analysis helps ensure – Unidimensionality – Desired item performance • Key Strength Analysis enhances classical item analysis – Uses information from all items – Compares answer choices for same item • Can detect structural flaws in items • Can suggest the actual key when the item is mis-keyed – Suggests possible reasons for poor performance • Future research – Investigate thresholds for Key Strength Analysis – Simulate item problems to measure ability to detect – Evaluate performance when assumptions fail
  • 38. Questions? Please type questions for our presenters in the GoToWebinar control panel on your screen.
  • 39. HANDBOOK OF TEST SECURITY • Editors - James Wollack & John Fremer • Published March 2013 • Preventing, Detecting, and Investigating Cheating • Testing in Many Domains – Certification/Licensure – Clinical – Educational – Industrial/Organizational • Don’t forget to order your copy at www.routledge.com – http://bit.ly/HandbookTS (Case Sensitive) – Save 20% - Enter discount code: HYJ82
  • 40. THANK YOU! - Follow Caveon on Twitter @caveon - Check out our blog: www.caveon.com/blog - LinkedIn Group: "Caveon Test Security" • Dennis Maynes, Chief Scientist, Caveon Test Security • Dan Allen, Psychometrician, Western Governors University • Marcus Scott, Data Forensics Scientist, Caveon Test Security • Barbara Foster, Psychometrician, American Board of Obstetrics and Gynecology