MCQ Test Item Analysis

Presented by:
Dr. Soha Rashed
Prof. of Community Medicine
Executive Director of Medical Education Department
Alexandria Faculty of Medicine, Egypt

10 March 2013
Content outlines
- Why are we here (Purpose of this session)?
- What's next (Needed future tasks)?
- Key Features of Student Assessment Methods:
  - Content and construct validity
  - Reliability
  - Objectivity
- MCQ Test Item Analysis:
  - Difficulty index (p-value)
  - Discrimination index (DI) = Point-biserial correlation (PBS)
  - Distractor efficiency (DE)
  - Internal consistency reliability
  - Writing a technical report (including remedial actions & recommendations)
- MCQs evaluation checklist

Why are we here (Purpose of this session)?
What's next (Needed future tasks)?
What do we assess?
Achievement of course ILOs:
- Knowledge
- Skills
- Attitudes

ILOs: 5 DOMAINS
1. Knowledge (Recall) and Understanding
2. Intellectual Skills
3. Professional Skills (Practical, Procedural and Clinical)
4. General and Transferable Skills
5. Professional Attitudes and Ethics
[Slide diagram: problem solving]
Written exams
- Objective written exams: MCQ, Matching, Extended matching, TF, and Short answer Qs.
- Essay Qs (Long, short, and modified essay Qs).
Key Features of Student Assessment Methods
Quality standards
- Validity: The ability of the test to measure what it is supposed to measure.
- Reliability: The consistency of the test scores over time, under different testing conditions, and with different raters.
- Objectivity: The degree to which examiners agree on the correct answer (the question is scored accurately and fairly, free of examiners' bias).
- Practicability/Feasibility: Overall ease of construction, administration, scoring, and reporting of an assessment instrument.
- Acceptability: The responsiveness of faculty and students to the assessment.
- Value/Educational impact: The utility of the test results in producing meaningful conclusions (usable information) about the educational process.
Validity
Validity refers to the extent to which an assessment instrument or a test measures what it intends to measure.
- Content validity
- Construct validity

I. Content validity
Content validity ensures that the knowledge and skills covered by the test items are representative of the larger domain of knowledge and skills covered in the course.
Test blueprint
Learning objectives to be tested

Content/subject area | Recall of facts | Understanding | Application | Problem solving (analysis, synthesis, evaluation) | Total | % weight
….                   | 3 items         | 3 items       | --          | --                                                | 6     | 6%
….                   | 2 items         | 4 items       | 2 items     | 2 items                                           | 10    | 10%
….                   | 4 items         | 3 items       | 4 items     | 4 items                                           | 15    | 15%
….                   | 5 items         | 4 items       | 4 items     | 4 items                                           | 17    | 17%
….                   | 4 items         | 10 items      | 8 items     | 8 items                                           | 30    | 30%
….                   | 3 items         | 7 items       | 5 items     | 7 items                                           | 22    | 22%
Total                | 21              | 31            | 23          | 25                                                | 100   |
% weight             | 21%             | 31%           | 23%         | 25%                                               |       | 100%
II. Construct validity
This refers to the COMPATIBILITY/CONGRUENCE between the learning objective (LO) to be assessed and the type of assessment.

In other words, construct validity emphasizes that assessment techniques should be based on the nature of the LOs that they are supposed to measure.
Construct validity

Learning objective to be assessed | Assessment instrument
Knowledge & understanding         | MCQ, TF, Matching, SAQ, Complete; Short essay Q; Long essay Q; Oral exam
Application & problem solving     | Clinical scenario-based MCQ; Extended matching Q; Modified essay Q; Case study (patient management problem); Oral exam
Practical skills                  | OSPE
Clinical skills                   | OSCE (real or simulated patients); Short case; Long case
Procedural skills                 | OSCE (anatomical models)
To increase the test validity:
- Use the test blueprint.
- Focus on the important content areas.
- Sample widely across the domains and across the content areas (% weight).
- To increase construct validity, use items that have high discriminative value (those testing higher cognitive/thinking abilities such as comprehension, application, and problem solving, e.g., applied Qs and clinical scenario-based Qs).
- Use multiple methods to achieve a valid, comprehensive assessment.
Reliability
- Refers to the consistency or repeatability of test scores.
- In practice, a reliable assessment should yield the same result:
  - when given to the same student at two different times (test-retest reliability), or
  - by different examiners (inter-rater reliability),
  - while keeping all the other variables (timing, length, content, or other contextual features) as consistent as possible.
- Internal consistency (intra-exam, inter-item reliability): the coherence of the test items, or the extent to which the test questions are interrelated. It is measured by Cronbach's alpha.
MCQs are highly reliable
The results of the test are unlikely to be influenced by:
- when the test is administered,
- when the test is scored, or by
- who does the scoring.

Hence the term "objective" is often used when referring to these kinds of assessments.
On the other hand, reliability is an important
concern when grading essay questions, rating
clinical skills or scoring other assessments
requiring judgment or interpretation.

In these situations, clear scoring criteria are
needed to attain a high level of reliability, regardless
of whether one or multiple people will be involved in
grading the responses.
How to improve the reliability of the test items?
- Writing clear, unambiguous questions and test instructions improves reliability by generating consistent patterns of response from the students.
- Use a structured, predefined marking scheme: an answer key for MCQs and essay Qs, and standardized checklists (in OSCEs/OSPEs) with clear scoring criteria.
- A longer test with multiple items is more likely to have better reliability than a shorter test with a limited number of items, as the former 'evens out' possible inconsistencies of individual items.
Desirable Features of Valid and Reliable Assessments
- There is a clearly specified set of learning outcomes.
- Assessment tasks are matched to the stated learning outcomes.
- Assessment tasks are a representative sample of the stated learning outcomes.
- Assessment tasks are at the appropriate level of difficulty.
- Assessment tasks effectively distinguish (discriminate) between achievers and non-achievers.
- Clear instructions are given for the administration, scoring, and interpretation of the assessment results.
MCQ test item analysis
Remark Classic OMR
(Optical Mark Recognition) software
Parameters commonly assessed in MCQ test item analysis
- Item analysis:
  - Difficulty index (p-value)
  - Discrimination index (DI) = Point-biserial correlation (PBS)
  - Distractor efficiency (DE)
- Internal consistency reliability
Do final grades attained by students actually reflect their competencies?
Do they produce meaningful conclusions about their performance?
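
Before walking through the individual indices, it helps to fix a concrete data layout. The sketch below (Python, purely illustrative) shows the kind of students × items response data an item analysis starts from; the `answer_key` and `responses` arrays are invented examples, not the actual Remark Classic OMR export format, and the later sketches repeat a small 0/1 score matrix like this inline so each stays self-contained.

```python
# Minimal sketch of the data layout assumed in the later examples.
# Rows are students, columns are items; values are the option letters chosen.
# The answer key and responses below are invented for illustration only.
import numpy as np

answer_key = np.array(["B", "D", "A", "C"])   # one correct option per item

responses = np.array([                         # 8 students x 4 items
    ["B", "D", "A", "C"],
    ["B", "D", "A", "C"],
    ["B", "D", "A", "B"],
    ["B", "D", "C", "C"],
    ["B", "D", "B", "A"],
    ["B", "A", "A", "D"],
    ["A", "D", "B", "B"],
    ["C", "B", "B", "A"],
])

scores = (responses == answer_key).astype(int)  # 1 = correct, 0 = incorrect
total_scores = scores.sum(axis=1)               # each student's total score

print(scores)        # per-item 0/1 scores: used for difficulty, discrimination, alpha
print(total_scores)  # totals: used to rank students
```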
Difficulty and Discrimination Indices
Difficulty Index (p-value)
- Calculated as the percentage of students who correctly answered the item.
- The range is from 0% to 100%, or more typically written as a proportion from 0.0 to 1.00 (the p-value).
- The higher the value, the easier the item.

Difficulty level:
- d ≥ 75% = very easy
- d 70-75% = easy
- d 30-70% = moderately difficult to moderately easy (recommended)
- d 25-30% = difficult
- d < 25% = very difficult

- P-values above 0.90 indicate very easy items that should not be reused in subsequent tests. If almost all of the students get the item correct, it is a concept probably not worth testing.
- P-values below 0.20 indicate very difficult items that should be reviewed for possibly confusing language, removed from subsequent tests, and/or highlighted as an area for re-instruction. If almost all of the students get the item wrong, there is either a problem with the item or the students did not grasp the concept.
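
As a minimal calculation sketch (assuming a 0/1 students × items score matrix like the one built earlier; the function names and the handling of the band boundaries are illustrative choices, not a prescribed implementation):

```python
import numpy as np

def difficulty_index(scores: np.ndarray) -> np.ndarray:
    """p-value per item: the proportion of students who answered it correctly."""
    return scores.mean(axis=0)

def difficulty_label(p: float) -> str:
    """Map a p-value to the difficulty bands used on this slide."""
    if p >= 0.75:
        return "very easy"
    if p >= 0.70:
        return "easy"
    if p >= 0.30:
        return "moderately difficult to moderately easy (recommended)"
    if p >= 0.25:
        return "difficult"
    return "very difficult"

# Example with a small hypothetical 0/1 score matrix (students x items):
scores = np.array([[1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 0], [1, 1, 0, 1],
                   [1, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 0]])
for i, p in enumerate(difficulty_index(scores), start=1):
    print(f"Item {i}: p = {p:.2f} ({difficulty_label(p)})")
```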
Discrimination index (DI) = Point-Biserial correlation (PBS)
- Describes the ability of an item to distinguish between high and low scorers (based on the scores of the upper and lower 27% of students after ranking them by total score in descending order).
- The index can range from -1.00 to +1.00; acceptable items fall between 0.0 and 1.00.
- The higher the value, the more discriminating the item. A highly discriminating item indicates that students who had high test scores got the item correct, whereas students who had low test scores got the item incorrect.
- Items with discrimination values near or less than zero should be removed from the test. Such values indicate that students who overall did poorly on the test did better on that item than students who overall did well. The item may be confusing for your better-scoring students in some way.
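
A minimal sketch of both statistics, assuming the same hypothetical 0/1 score matrix as before; the upper/lower 27% split follows the slide, while using the raw total score (rather than a total corrected by removing the item) is a simplifying assumption:

```python
import numpy as np

def discrimination_index(item: np.ndarray, totals: np.ndarray, fraction: float = 0.27) -> float:
    """Upper-lower index: proportion correct in the top 27% of students
    minus proportion correct in the bottom 27%, ranked by total score."""
    k = max(1, int(round(fraction * len(totals))))
    order = np.argsort(totals)            # ascending by total score
    low, high = order[:k], order[-k:]
    return float(item[high].mean() - item[low].mean())

def point_biserial(item: np.ndarray, totals: np.ndarray) -> float:
    """Correlation between the 0/1 item score and students' total scores."""
    return float(np.corrcoef(item, totals)[0, 1])

# Example: report both statistics for each item of a hypothetical scores matrix.
scores = np.array([[1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 0], [1, 1, 0, 1],
                   [1, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 0]])
totals = scores.sum(axis=1)
for i in range(scores.shape[1]):
    di = discrimination_index(scores[:, i], totals)
    pbs = point_biserial(scores[:, i], totals)
    print(f"Item {i + 1}: DI = {di:+.2f}, point-biserial = {pbs:+.2f}")
```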
Interpreting the discrimination index
- 0.40 or higher = very good discrimination
- 0.30 to 0.39 = reasonably good discrimination, but possibly subject to improvement
- 0.20 to 0.29 = marginal/acceptable discrimination (subject to improvement)
- 0.00 to 0.19 = poor discrimination (to be rejected or improved by revision)
- Negative DI = low-performing students selected the correct answer more often than high scorers (to be rejected)

To improve discrimination:
- Use items that have high discrimination values in the test (those testing higher cognitive/thinking abilities such as comprehension, application, and problem solving).
- Link questions to case scenarios: ask the question in the context of a clinical situation, diagram, graph, image, radiologic image, histopathological section, laboratory findings, etc.
Distractor efficiency
- The distractors are important components of an item, as they show a relationship between the total test score and the distractor chosen by the student.
- Distractor efficiency indicates whether the item was well constructed or failed to perform its purpose.
- The quality of the distractors influences student performance on a test item. Ideally, low-scoring students, who have not mastered the subject, should choose the distractors more often, whereas high scorers should discard them more frequently while choosing the correct option.
- Any distractor that has been selected by less than 5% of the students is considered a non-functioning distractor (NF-D).
- Reviewing the options can reveal potential errors of judgment and inadequate performance of distractors. These poor distractors can be revised, replaced, or removed.
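
A sketch of a distractor check along these lines, reusing the hypothetical raw responses and answer key from the earlier data-layout example; the 5% non-functioning threshold follows the slide, while the A-E option set and function names are assumptions:

```python
import numpy as np

def distractor_report(responses: np.ndarray, answer_key: np.ndarray,
                      options=("A", "B", "C", "D", "E"), nf_threshold: float = 0.05) -> None:
    """Print how often each distractor was chosen per item and flag
    non-functioning distractors (chosen by fewer than 5% of students)."""
    n_items = responses.shape[1]
    for i in range(n_items):
        print(f"Item {i + 1} (key = {answer_key[i]}):")
        for opt in options:
            if opt == answer_key[i]:
                continue                  # only distractors are of interest
            frac = float(np.mean(responses[:, i] == opt))
            flag = "  <- non-functioning (NF-D)" if frac < nf_threshold else ""
            print(f"  {opt}: chosen by {frac:.0%} of students{flag}")

# Example with the hypothetical responses from the first sketch:
answer_key = np.array(["B", "D", "A", "C"])
responses = np.array([["B", "D", "A", "C"], ["B", "D", "A", "C"], ["B", "D", "A", "B"],
                      ["B", "D", "C", "C"], ["B", "D", "B", "A"], ["B", "A", "A", "D"],
                      ["A", "D", "B", "B"], ["C", "B", "B", "A"]])
distractor_report(responses, answer_key)
```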
Internal Consistency Reliability
- Internal consistency reliability indicates how well the items correlate with one another; it measures whether multiple items within an instrument yield similar results.
- Cronbach's alpha is used as the coefficient of internal consistency.

Interpreting Cronbach's alpha:
- The range is from 0.0 to 1.0, with 0.7 generally accepted as a sign of acceptable reliability.
- High reliability indicates that the items are all measuring the same thing, or the same general construct.
- The higher the value, the more reliable the overall test score.
Interpreting Cronbach's Alpha

Cronbach's alpha | Internal consistency
α ≥ 0.9          | Excellent
0.8 ≤ α < 0.9    | Very good
0.7 ≤ α < 0.8    | Good (There are probably a few items which could be improved.)
0.6 ≤ α < 0.7    | Somewhat low (There are probably some items which could be improved.)
0.5 ≤ α < 0.6    | Poor (Suggests need for revision of the test.)
α < 0.5          | Questionable/Unacceptable (This test should not contribute heavily to the course grade, and it needs revision.)
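
Cronbach's alpha can be computed directly from the 0/1 score matrix using the standard formula α = k/(k−1) · (1 − Σ item variances / variance of total scores). The sketch below is a minimal illustration, with the bands from the table above applied through an assumed helper function:

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha for a students x items matrix of 0/1 item scores:
    alpha = k/(k-1) * (1 - sum(item variances) / variance(total score))."""
    k = scores.shape[1]
    item_variances = scores.var(axis=0, ddof=1)
    total_variance = scores.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

def alpha_band(alpha: float) -> str:
    """Label alpha using the interpretation bands in the table above."""
    if alpha >= 0.9:
        return "Excellent"
    if alpha >= 0.8:
        return "Very good"
    if alpha >= 0.7:
        return "Good"
    if alpha >= 0.6:
        return "Somewhat low"
    if alpha >= 0.5:
        return "Poor"
    return "Questionable/Unacceptable"

# Example with a hypothetical 0/1 score matrix (students x items):
scores = np.array([[1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 0], [1, 1, 0, 1],
                   [1, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 0]])
a = cronbach_alpha(scores)
print(f"Cronbach's alpha = {a:.2f} ({alpha_band(a)})")
```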
Practice exercises
- Interpreting Remark Classic OMR (Optical Mark Recognition) software outputs
- Writing a technical report on MCQ test item analysis (including remedial actions & recommendations)
- Use of the MCQs evaluation checklist