Vietnam TESOL 8
July 2013
Timothy Farnsworth, CUNY Hunter College
Introduction
 Overview of automated scoring approaches
 Writing examples: E-rater, Criterion (ETS)
 Oral examples: Versant, Versant Junior
(Pearson)
 Impact on teaching: Benefits
 Impact on teaching: Dangers
 Thoughts for the future
What is automated scoring?
 Computer software that automatically
assigns scores to writing or speaking
samples
 Essays can be assigned scores instantly by
computer
 Test takers can call a testing center and take
an oral test without speaking to a human
 Scores reported instantly
 Some level of feedback given to test takers
 Variety of software approaches
How does a computer grade a test?
 Approach #1: Natural Language Processing
(NLP)
 Software identifies and counts linguistic features
 Software does not attempt to gauge content in any
way
 Used for testing writing
 Approach #2: Speech Recognition
 Software compares speech sample to a large
database of samples of the same test question(s)
 Faster responses are “more fluent”, etc.
 Used for testing speaking
Example 1: E-rater (ETS)
 Automated scoring of timed essays
 Uses NLP
 Currently used in limited way to rate:
 TOEFL
 GRE
 Used for formative assessment (Criterion,
Scoreitnow!, TOEFL Practice Online)
 Individual assessment
 Students turn in essays, receive scores, revise,
repeat
What does E-rater do with an
essay?
 Global measures:
 Count total words, total sentences,
 sentence length, # of paragraphs
 Vocabulary measures:
 # of unique words used ÷ total words (lexical
diversity)
 # of low-frequency words (lexical depth)
 # of prompt-specific words (topic appropriateness)
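The global and vocabulary measures listed above are simple counts, so they can be illustrated in a few lines of Python. This is only a minimal sketch of the idea, not E-rater's actual implementation: the function name `surface_features`, the tiny `COMMON_WORDS` list (a stand-in for a real word-frequency list), and the `prompt_words` argument are all assumptions made for illustration.

```python
import re

# Hypothetical stand-in for a corpus-based frequency list (assumption).
COMMON_WORDS = {
    "the", "a", "an", "is", "are", "of", "to", "and", "in", "it",
    "should", "be", "because", "students", "learn", "more", "school", "longer",
}


def surface_features(essay, prompt_words):
    """Count surface features of an essay; content is never interpreted."""
    # Paragraphs are assumed to be separated by blank lines.
    paragraphs = [p for p in essay.split("\n\n") if p.strip()]
    # Sentences are approximated by splitting on ., ! and ?
    sentences = [s for s in re.split(r"[.!?]+", essay) if s.strip()]
    words = re.findall(r"[a-z']+", essay.lower())
    total = len(words)
    return {
        "total_words": total,
        "total_sentences": len(sentences),
        "mean_sentence_length": total / len(sentences) if sentences else 0.0,
        "paragraphs": len(paragraphs),
        # Unique words divided by total words (lexical diversity).
        "lexical_diversity": len(set(words)) / total if total else 0.0,
        # Words missing from the common-word list count as low frequency.
        "low_frequency_words": sum(1 for w in words if w not in COMMON_WORDS),
        # Words that also appear in the prompt (topic appropriateness).
        "prompt_specific_words": sum(1 for w in words if w in prompt_words),
    }


essay = "School should be longer. Students learn more.\n\nIndubitably, school is school."
print(surface_features(essay, prompt_words={"school", "students"}))
```

Nothing in the sketch checks whether the essay makes sense, which is the limitation the following slides return to.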
What does E-rater do with an
essay? #2
 Grammatical measures:
 Dependent, independent clauses
 Passive voice
 Subject-verb agreement, etc.
 Other measures:
 Sequencing words (then, next, etc.)
 Logical relations (as a result, however, etc.)
 Mechanics (punctuation, etc.)
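Counting sequencing words and logical connectors can be sketched the same way. The marker lists below start from the examples on the slide (then, next, as a result, however); the remaining entries and the function name `count_markers` are assumptions added for illustration, and real systems may work quite differently.

```python
import re

# "then", "next", "as a result", "however" come from the slide;
# the other entries are assumed additions.
SEQUENCING = ["first", "then", "next", "finally"]
LOGICAL = ["as a result", "however", "therefore", "moreover", "in addition"]


def count_markers(text, markers):
    """Return how often each marker occurs as a whole word or phrase."""
    lowered = text.lower()
    return {
        m: len(re.findall(r"\b" + re.escape(m) + r"\b", lowered))
        for m in markers
    }


sample = (
    "In addition, they often receive benefits. "
    "Moreover, Pip makes his fortune. Then he leaves."
)
print(count_markers(sample, SEQUENCING))
print(count_markers(sample, LOGICAL))
```

A counter like this rewards the presence of "moreover" or "in addition" whether or not the sentences they connect are true, as the Perelman (2012) passage quoted later in these slides illustrates.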
What is a “good” essay
according to E-rater?
 Long (longer is always better for E-rater)
 Standard structure
 Longer sentences, many dependent clauses
 Many explicit organizational words
 Obscure vocabulary
 Indubitably > Surely
 Obfuscate profusely > Lie a lot
 Wide range of vocabulary
What does E-rater not
notice?
“Teaching assistants are paid an excessive
amount of money. The average teaching
assistant makes six times as much money as
college presidents. In addition, they often
receive a plethora of extra benefits such as
private jets, vacations in the south seas, a
staring roles in motion pictures. Moreover, in
the Dickens novel Great Expectation, Pip
makes his fortune by being a teaching
assistant.” (Perelman 2012)
Criterion: E-rater application
 Designed for in-class use
 Students’ essays are instantly scored using
E-rater software
 Essays get individualized feedback on errors
and style
 Students directed to materials for self-study
and revision of essay
 Process repeated
 Used in many schools worldwide
Example 2: Versant
(Pearson)
 First fully automated oral language test used
commercially
 Developed by Ordinate Corporation, later bought by Pearson
 Test is taken in computer lab using microphone
and headset, or over the telephone
 Computer automatically rates the speech and
produces scores
 Used widely in business, increasingly in schools
 Many versions, multiple uses and languages
What is a Versant test like?
 About 15 minutes long
 Several simple task types:
 Repeating sentences
 Scrambled Sentences
 “Oral multiple choice”
 All responses totally scripted
 Optional “Free response” final question
 Not scored, but saved for reference
Sample Versant Jr.
Questions
What does Versant do with speech?
 Test takers’ speech is captured by a
microphone and processed on a computer
server
 This speech is “compared to” a large
database of human-scored responses
 Native speakers from different countries
 English learners from different countries, of all
different proficiency levels
 Scores are assigned in the range of the
responses “most similar” to the test taker’s
 Scores available immediately
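The "compared to" step above can be pictured as a nearest-neighbour lookup. The sketch below is an assumed stand-in, not Versant's actual algorithm: the three features, the reference values, the scores, and the function name `knn_score` are all invented for illustration, and the features are left unscaled to keep the example short.

```python
import math

# Invented reference data (assumption): each entry pairs
# (response latency in s, words per second, word accuracy) with a human score.
REFERENCE = [
    ((0.4, 3.2, 1.00), 80),
    ((0.6, 2.9, 0.95), 72),
    ((1.1, 2.1, 0.80), 55),
    ((1.8, 1.5, 0.60), 38),
    ((2.5, 1.0, 0.40), 25),
]


def knn_score(features, reference=REFERENCE, k=3):
    """Return (lowest, mean, highest) human score among the k most similar responses."""
    # Rank reference responses by Euclidean distance to the new response.
    ranked = sorted(reference, key=lambda item: math.dist(features, item[0]))
    nearest = [score for _, score in ranked[:k]]
    return min(nearest), sum(nearest) / len(nearest), max(nearest)


low, mean, high = knn_score((0.5, 3.0, 0.97))
print(low, mean, high)
```

Under this reading, a new response never receives a score that no human rater gave to a similar response, which is one way to interpret "scores given in the range of most similar responses".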
What is a “good” Versant
response?
 Fast response (fluency score)
 Clear
 Accurate (the sentence is repeated exactly,
etc.) (sentence mastery + vocabulary)
 Native-like pronunciation (pronunciation score)
 We talk about Global English nowadays!
 “Comprehensibility” is more important than
native-like speech (Celce-Murcia, Brinton, & Goodwin 2010)
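The "accurate" criterion for sentence repetition can be approximated by comparing the target sentence with a transcript of what the test taker said. The sketch below uses a standard word-level edit distance; it assumes a speech recognizer has already produced a plain-text transcript, and the function names `word_errors` and `word_accuracy` are hypothetical. It is not a description of how Versant computes its sentence mastery score.

```python
def word_errors(target, response):
    """Word-level edit distance (insertions, deletions, substitutions)."""
    ref = target.lower().split()
    hyp = response.lower().split()
    # prev[j] holds the distance between the words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # word dropped by the speaker
                            curr[j - 1] + 1,      # extra word added
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def word_accuracy(target, response):
    """Share of the target sentence reproduced correctly, floored at 0."""
    n = len(target.split())
    if n == 0:
        return 0.0
    return max(0.0, 1 - word_errors(target, response) / n)


print(word_accuracy("the boy went to the store", "the boy went the store"))
```

A measure like this rewards exact repetition only; it says nothing about whether a listener would find the speaker comprehensible.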
What does Versant NOT
measure?
 Range of vocabulary
 Extended speaking
 Pragmatics, cultural awareness
 Ability to interact with others
What are some advantages of
these systems?
 Reliability
 Computers do not get tired
 Computers are not biased for or against
individuals
 Scores are more consistent than with human
raters (Bernstein, Van Moere, & Cheng 2010)
 Practicality
 Automated scoring is much less expensive than
human rating
 Scores and feedback obtained instantly
What does research show?
 When test takers are acting “in good faith”,
scores are roughly equivalent to those of human raters
 Bridgeman et al. (2005): E-rater scores are similar to
human scores for most nationalities
 Bernstein, Van Moere, & Cheng (2010), Van Moere
(2012): Versant scores correspond closely to scores
from interview assessments
 *Even though the final scores are very similar,
the tests do not actually measure the same
things (Chun 2006)
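Claims that machine and human scores are "roughly equivalent" are typically supported by agreement statistics such as a correlation coefficient. The sketch below computes a Pearson correlation by hand; the two score lists are invented purely for illustration and are not taken from any of the studies cited above.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equally long lists of scores."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 scores")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / (sd_x * sd_y)


# Invented scores for six essays (assumption, for illustration only).
human = [3, 4, 2, 5, 4, 3]
machine = [3.2, 4.1, 2.5, 4.8, 3.9, 3.4]
print(round(pearson(human, machine), 3))
```

A high correlation shows only that the two sets of scores rank test takers similarly. It does not show that the two methods measure the same abilities, which is the caveat attributed to Chun (2006).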
Problems with automated
scores
 Automated tests can be “gamed” or
tricked
 Farnsworth (2013): Versant scores can
be quickly raised by coaching, but
similar results were found with an interview
assessment
 Monaghan & Bridgeman (2005): E-rater
scores cannot be used without human
raters for “real” testing (TOEFL, etc.)
How does this positively affect
teaching?
 Writing feedback: Students may get
more (and faster) feedback on:
 Grammar errors in writing
 Lexical errors in writing
 Oral feedback: Teachers may be able to
assess students’ speaking skills more
often
How might this negatively affect
teaching?
 Washback: Effect of testing on instructional
practice (Wall 1999, Bachman & Palmer
1996)
 Teachers tend to focus on what is tested
(Bailey 1999)
 What is tested is different in automated
scoring
 Mismatch between current ideas in
Communicative Language Teaching and
automated scoring
Effects on writing instruction
 Increased focus on grammatical
accuracy and low-frequency vocabulary
 Heavy focus on traditional essay
structure and devices
 Decreased focus on quality of content,
selection of examples, style, etc.
 “Use a lot of high level vocabulary, make
sentences longer, mimic conventional
thinking on the topic”
Effects on oral instruction
 Increased focus on oral repetition,
word-level pronunciation
 Increased focus on speed of response
 Decreased focus on pragmatic / cultural
components of language
 Decreased focus on critical thinking
Maybe this is a good thing?
 Some argue that we should return to a
greater focus on structure, vocabulary, speed,
and pronunciation (Van Moere 2012a, 2012b)
 Focus on grammatical forms and linguistic
structures is certainly beneficial
 Students consistently express a desire for
direct instruction on fundamentals
Conclusion
 Computer-scored testing is in all our
futures
 Provides compelling practical benefits
 Students benefit from frequent feedback on
grammar and vocabulary
 Does not (cannot) measure the same
things as humans measure
 Great danger of limiting instruction and
curriculum to grammar, vocabulary, speed,
and pronunciation
Thank you!
Tim Farnsworth
tfarnswo@hunter.cuny.edu
PowerPoint is on
SlideShare:

Automated Language Assessment Scoring and impact on instruction
