Evaluation of the
reliability for L2 speech
rating in discourse
completion test
Yusuke Kondo and
Yutaka Ishii
Prediction method used in
automated scoring system for L2
[Figure: Item x, with axes marked 0 and 1]
Predictors
• Speech rate
• Pitch range
• Mean length of utterance
Predictor examination
[Figure: two panels of scores (0 and 1), "Good predictors" (Index A, Index B) and "Bad predictors" (Index C, Index D)]
When we try to predict scores using two indices …,
Unreliable rating
[Figure: two panels, "The first rating" and "The second rating", each plotting Index A against Index B with scores marked as 0 or 1]
Ishii and Kondo (2015)
Our own ratings: .27
Ratings in Narita (2013): .57
Agreement of automated scoring with raters
Group                      Correlation  % Exact Agreement  % Adjacent Agreement  Kappa  Weighted Kappa
Naïve                      .77          41                 89                    .27    .75
Untrained                  .61          31                 73                    .16    .59
Certificated (Average)     .92          70                 99                    .62    .91
Certificated (Exemplary)   .95          80                 100                   .76    .94
Powers, Escoffery, and Duchnowski (2015), Applied Measurement in Education
Untrained < Naïve < Certificated (Average) < Certificated (Exemplary)
Comes as no surprise
• Reliable ratings are absolutely essential for constructing an automated scoring system.
Then,
• how do we evaluate reliability in L2 performance assessment?
• which index should be used?
Outline
• Reliability indices in L2
performance assessment
• Reliability indices in
psychometrics
• Observation of reliability indices
• Some comments and suggestions
Language Testing, Vols. 30-32
• Reliability indices used
1. Cronbach’s alpha
2. Percentage of agreements
3. Cohen’s kappa
4. Spearman rank correlation coefficient
5. Pearson correlation coefficient
6. Infit and Outfit measures (IRT)
7. Root-mean-square deviation
Alpha in rating data
• Bachman (2004): “coefficient alpha should be used”
• Bachman’s recommendation is also presented in Carr (2011) and Sawaki (2013).
Journals on psychometrics
• Reliability indices discussed
1. Polychoric correlation coefficient
2. McDonald’s omega
3. Intraclass correlation coefficient
4. Standard deviation of correlation coefficients
5. Means of correlation coefficients
Next,
• we will be looking at how the
reliability indices behave in our
rating data.
Data
• 30 different discourse completion tasks completed by 44-60 university students.
• Each utterance was rated by three different raters.
Example
When you (A) want to ask your friend
about their weekend, what would you
say in the conversation below?
A: ( )
B: We went shopping.
Rating criteria
Score  Description
3      The speaker’s intention can be understood. Natural pronunciation and intonation. Almost no foreign accent.
2      The speaker’s intention can be understood, but some foreign accent is noticeable.
1      The speaker’s intention cannot be understood because of a strong foreign accent.
0      The utterance cannot be heard because of a low voice or noise.
Target indices
• Cronbach’s alpha
– Kendall
– Spearman
– Pearson
– Polychoric
• McDonald’s omega
• Mean of correlation
coefficients
• Fleiss’ kappa
• Percentage of exact and
adjacent agreement
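As a rough illustration of the two agreement indices in this list, the sketch below computes Fleiss’ kappa and the percentage of exact and adjacent agreement for a small invented rating matrix. The function names, the toy ratings, and the choice to count agreement over all rater pairs are assumptions made for this example; the slides do not say how the percentage was computed for three raters.

```python
from collections import Counter
from itertools import combinations


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of rows, one row of rater scores per utterance.

    Every row must hold the same number of ratings, and the ratings must
    not all fall in a single category (the denominator would be zero).
    """
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    category_totals = Counter()
    agreement_sum = 0.0
    for row in ratings:
        counts = Counter(row)
        category_totals.update(counts)
        # Proportion of rater pairs that agree on this utterance.
        agreeing = sum(c * c for c in counts.values()) - n_raters
        agreement_sum += agreeing / (n_raters * (n_raters - 1))
    observed = agreement_sum / n_subjects
    total = n_subjects * n_raters
    expected = sum((c / total) ** 2 for c in category_totals.values())
    return (observed - expected) / (1 - expected)


def pairwise_agreement(ratings, tolerance=0):
    """Share of rater pairs whose scores differ by at most `tolerance`."""
    hits = 0
    pairs = 0
    for row in ratings:
        for a, b in combinations(row, 2):
            pairs += 1
            if abs(a - b) <= tolerance:
                hits += 1
    return hits / pairs


# Hypothetical scores on the 0-3 scale: four utterances, three raters each.
toy_ratings = [[1, 1, 2], [2, 2, 2], [3, 3, 3], [1, 2, 3]]
kappa = fleiss_kappa(toy_ratings)
exact = pairwise_agreement(toy_ratings, tolerance=0)
exact_or_adjacent = pairwise_agreement(toy_ratings, tolerance=1)
print(round(kappa, 3), round(exact, 3), round(exact_or_adjacent, 3))
```

Setting `tolerance=1` counts adjacent scores as agreement, which is why that percentage is never lower than the exact one.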
Data frame
         α_k   α_spe  α_pea  α_pol  . . .  κ     %
Item 1   .47   .53    .48    .74    . . .  .22   .75
Item 2   .56   .55    .55    .67    . . .  .25   .80
Item 3   .62   .67    .64    .59    . . .  .30   .90
. . .
Item 30  .66   .86    .67    .92    . . .  .47   .66
Much the same.
[Figure: three panels: mean of correlation coefficients, Cronbach’s alpha, McDonald’s omega]
Correlations among coefficients
[Scatterplot matrices; the pairwise correlations shown in the panels are listed below]
Cronbach’s alpha
            alpha_ken  alpha_spe  alpha_pea  alpha_pol
alpha_ken   1          0.99       0.91       0.79
alpha_spe              1          0.93       0.81
alpha_pea                         1          0.81
alpha_pol                                    1
Mean of correlation coefficients
            m_ken      m_spe      m_pea      m_pol
m_ken       1          1.00       0.92       0.74
m_spe                  1          0.94       0.76
m_pea                             1          0.78
m_pol                                        1
Correlations among coefficients
[Scatterplot matrix; the pairwise correlations shown in the panels are listed below]
McDonald’s omega
            omegah_ken  omegah_spe  omegah_pea  omegah_pol
omegah_ken  1           0.97        0.86        0.69
omegah_spe              1           0.91        0.73
omegah_pea                          1           0.67
omegah_pol                                      1
Comment
• Spearman’s and Pearson’s correlation coefficients give much the same results on a 4-point scale.
Suggestion
• Polychoric correlation coefficients should be used if you would prefer not to violate statistical assumptions or underestimate the reliability of your data.
Reason
• Pearson’s should not be used for rating data.
• Use Spearman’s instead.
• But the correlation between the two is extremely high.
• They might measure the same construct.
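To make the comparison concrete, here is a minimal sketch of both coefficients in plain Python, applied to two invented raters on a 4-point scale. The helper names and the toy scores are hypothetical; Spearman’s coefficient is computed as Pearson’s coefficient on average ranks, which is its usual definition when ties are present.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value is tied with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson's r on tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical scores from two raters on the 0-3 scale.
rater_a = [3, 2, 2, 1, 3, 0, 2, 1]
rater_b = [3, 2, 1, 1, 2, 0, 2, 2]
print(round(pearson(rater_a, rater_b), 3), round(spearman(rater_a, rater_b), 3))
```

With only four score levels, most ranks are ties, so the rank transform changes the data very little; this is one plausible reason the two coefficients behave so similarly on such a scale.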
Correlation among indices
[Scatterplot matrices; the pairwise correlations shown in the panels are listed below]
Kendall’s-based indices
            m_ken  alpha_ken  omegah_ken
m_ken       1      0.99       0.97
alpha_ken          1          0.97
omegah_ken                    1
Spearman’s-based indices
            m_spe  alpha_spe  omegah_spe
m_spe       1      0.99       0.96
alpha_spe          1          0.97
omegah_spe                    1
Correlation among indices
[Scatterplot matrices; the pairwise correlations shown in the panels are listed below]
Pearson’s-based indices
            m_pea  alpha_pea  omegah_pea
m_pea       1      0.99       0.95
alpha_pea          1          0.95
omegah_pea                    1
Polychoric-based indices
            alpha_pol  omegah_pol  m_pol
alpha_pol   1          0.94        0.98
omegah_pol             1           0.88
m_pol                              1
Suggestion
• You can use any of the mean of correlation coefficients, Cronbach’s alpha, or McDonald’s omega.
ICC, Kappa, and %
         α     M of r  ω     ICC   κ     %
α        1     .98     .94   .75   .54   .53
M of r   .98   1       .88   .72   .54   .44
ω        .94   .88     1     .74   .48   .58
ICC      .75   .72     .74   1     .81   .72
κ        .54   .54     .48   .81   1     .61
%        .53   .44     .58   .72   .61   1
α : α using polychoric correlation coefficients
M of r : Mean of polychoric correlation coefficients
ω : ω using polychoric correlation coefficients
ICC : Intraclass correlation coefficients
κ : Fleiss’ kappa
% : Percentage of exact and adjacent agreements
Comment
• “Agreement” may be a construct
different from “reliability.”
[Figure: diagram relating Rater A, Rater B, the true score, and agreement]
• One more thing we have found:
A feature of alpha
Table 1: Item A (α = .92)
    A    B    C    D    E
A   1
B   .7   1
C   .7   .7   1
D   .7   .7   .7   1
E   .7   .7   .7   .7   1

Table 2: Item B (α = .92)
    F    G    H    I    J
F   1
G   .9   1
H   .9   .9   1
I   .5   .5   .5   1
J   .6   .6   .6   .9   1

The tables were created based on Schmitt (1996), Psychological Assessment.
To show the difference, reporting the SD of the correlation coefficients is recommended.
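The two tables can be checked numerically. The sketch below assumes that the alphas on the slide are standardized alphas, k·r̄ / (1 + (k − 1)·r̄), computed from the mean inter-rater correlation r̄ of k raters; that assumption reproduces the reported .92 for both tables. The function name is illustrative, and the SD used here is the population standard deviation of the off-diagonal correlations.

```python
from statistics import mean, pstdev

# Off-diagonal (lower-triangle) correlations copied from Table 1 and Table 2.
item_a = [.7, .7, .7, .7, .7, .7, .7, .7, .7, .7]
item_b = [.9, .9, .9, .5, .5, .5, .6, .6, .6, .9]


def standardized_alpha(correlations, n_raters):
    """Standardized alpha from the off-diagonal correlations of n_raters raters."""
    r_bar = mean(correlations)
    return n_raters * r_bar / (1 + (n_raters - 1) * r_bar)


for name, corrs in (("Item A", item_a), ("Item B", item_b)):
    alpha = standardized_alpha(corrs, n_raters=5)
    # Same alpha to two decimals, but very different spread of correlations.
    print(name, round(alpha, 2), round(pstdev(corrs), 2))
```

Alpha depends on the correlations only through their mean, so it cannot distinguish a uniform set of raters from one that splits into two camps; the SD picks up exactly that difference.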
In our data
    K     L     M
K   1
L   .80   1
M   .45   .90   1

    N     O     P
N   1
O   .95   1
P   .92   .76   1

[Figure: scatterplot of alpha (0.4 to 0.8) and SD of correlation coefficients (0.05 to 0.20)]
Comments
• Even if we obtain much the same alphas, the correlations among raters differ between the two items.
Another feature of alpha
Three raters, all correlations .7 (α = .87)
    Q    R    S
Q   1
R   .7   1
S   .7   .7   1

Six raters, all correlations .7 (α = .93)
    T    U    V    X    Y    Z
T   1
U   .7   1
V   .7   .7   1
X   .7   .7   .7   1
Y   .7   .7   .7   .7   1
Z   .7   .7   .7   .7   .7   1

Six raters, all correlations .5 (α = .86)
    a    b    c    d    e    f
a   1
b   .5   1
c   .5   .5   1
d   .5   .5   .5   1
e   .5   .5   .5   .5   1
f   .5   .5   .5   .5   .5   1
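The three alphas follow from a single relation if they are taken to be standardized alphas, which is an assumption here rather than something the slide states. With k raters and a mean inter-rater correlation r̄:

```latex
\alpha = \frac{k\,\bar{r}}{1 + (k - 1)\,\bar{r}}
\qquad
\begin{aligned}
k = 3,\ \bar{r} = .7 &: \quad \alpha = \frac{2.1}{2.4} = .875 \\
k = 6,\ \bar{r} = .7 &: \quad \alpha = \frac{4.2}{4.5} \approx .933 \\
k = 6,\ \bar{r} = .5 &: \quad \alpha = \frac{3.0}{3.5} \approx .857
\end{aligned}
```

These match the slide’s .87, .93, and .86 up to rounding. Doubling the number of raters raises alpha even though no pair of raters agrees more closely, and six raters correlating at .5 reach almost the same alpha as three raters correlating at .7.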
Final suggestions
• When you report the reliability of rating data from more than two raters,
– Polychoric correlation coefficients should be used.
– Reporting the SD of the correlation coefficients among raters is recommended.
– The mean of the correlation coefficients might be used instead of alpha, as it might be more comprehensible than alpha.
Outline
• Reliability indices in L2
performance assessment
• Reliability indices in
psychometrics
• Observation of reliability indices
• Some comments and suggestions
Evaluation of the reliability for L2 speech rating in discourse completion test (Methoken in Seoul)
