2. Problem Statement & Motivation
Importance of spoken English
English has a very high socio-economic impact: people who speak the language fluently
are reported to earn 30-50% more than peers who don't.
Grading spoken English in a scalable way is needed by companies, training organizations, and
individuals.
Problem Statement
Grade spontaneous English speech at scale, as accurately as human experts.
3. Why are automated methods not accurate?
Speaker-independent speech recognition for spontaneous speech is a hard problem!
6. Crowdsourcing task
Worker quality control
• Each worker is assigned a risk level that reflects the
quality of their past work.
• Based on this risk state, the system determines how many
gold-standard tasks to give the worker, and when.
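The slide gives the idea but not the exact state machine, so the sketch below is an illustrative assumption: risk is derived from a worker's gold-standard accuracy, and higher-risk workers are shown gold-standard tasks more often. The thresholds and rates are made up for illustration.

```python
import random

# Hypothetical gold-standard rates per risk state (not from the deck).
GOLD_RATE = {"low_risk": 0.1, "medium_risk": 0.3, "high_risk": 0.6}

class Worker:
    def __init__(self):
        self.passed = 0   # gold-standard tasks answered correctly
        self.failed = 0   # gold-standard tasks answered incorrectly

    @property
    def risk(self):
        total = self.passed + self.failed
        if total < 5:
            return "high_risk"          # too little history to trust
        accuracy = self.passed / total
        if accuracy >= 0.9:
            return "low_risk"
        return "medium_risk" if accuracy >= 0.7 else "high_risk"

    def next_task_is_gold(self, rng=random.random):
        # Higher-risk workers see gold-standard tasks more often,
        # which controls the money spent on verification.
        return rng() < GOLD_RATE[self.risk]
```

A high-reliability worker thus sees few paid gold-standard tasks, keeping verification cost proportional to the actual risk each worker poses.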
7. Supervised learning setup
Experiment Details
• Sample size: 566
• 319 from India
• 247 from the Philippines
Expert Grading
• Two expert raters
• Overall score based on Pronunciation, Fluency,
Content-Organization, and Grammar.
• Inter-rater correlation ~0.8.
The learning task
• Modelling done separately for the Indian and Philippine
sets.
• Linear ridge regression, neural networks, and SVM
regression with different kernels were used to build
the models.
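A minimal sketch of the ridge-regression setup described above, with k-fold cross-validation scored by correlation against expert grades (the metric quoted on the results slide). The feature matrix and grades here are synthetic stand-ins, not the paper's data, and the closed-form ridge solver is one of several techniques the slides mention.

```python
import numpy as np

# Synthetic stand-ins for the real features (FA + NLP + crowd grades)
# and the expert grades used as the supervised-learning target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
true_w = rng.normal(size=10)
y = X @ true_w + rng.normal(scale=0.1, size=200)

def ridge_fit(X, y, alpha=1.0):
    # Closed-form ridge solution: w = (X^T X + alpha I)^-1 X^T y
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ y)

def cv_correlation(X, y, alpha=1.0, k=5):
    # k-fold cross-validation; report the mean Pearson correlation
    # between predicted and expert grades on the held-out folds.
    folds = np.array_split(np.arange(len(y)), k)
    corrs = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        w = ridge_fit(X[train], y[train], alpha)
        corrs.append(np.corrcoef(X[test] @ w, y[test])[0, 1])
    return float(np.mean(corrs))

print(f"mean CV correlation: {cv_correlation(X, y):.2f}")
```

The same cross-validation loop applies unchanged if the ridge model is swapped for an SVM regressor or a neural network.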
8. Case study
• Studied a deployment of the proposed algorithm in the
Philippines.
• The event had 500 applicants for the role of
customer support executive. The scoring
algorithm was tested on a subset of 150 candidates.
• An internal expert graded each candidate’s speech as
hireable or not-hireable.
9. Features used
We use three classes of features:
• Forced Alignment features (FA)
• The speech sample is force-aligned against the crowdsourced transcription.
• Features such as rate of speech, position and length of pauses, log-likelihood of recognition, posterior probability,
hesitations, and repetitions are derived.
• Natural Language Processing features (NLP)
• Surface-level features: number of words, complexity or difficulty of words, and the number of common words
used.
• Semantic features: coherency of the text, context of the words spoken, sentiment of the text, and grammatical
correctness.
• Crowd Grades (CG)
• The crowd provides scores on pronunciation, fluency, content organization, and grammar.
• These grades are combined into a composite score.
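Two of these feature families can be sketched concretely. The alignment tuple format, the pause threshold, and the equal-weight composite below are illustrative assumptions, not the paper's exact definitions.

```python
# Forced-alignment output for one response: (word, start_sec, end_sec).
alignment = [("i", 0.0, 0.2), ("enjoy", 0.5, 0.9), ("speaking", 1.4, 2.0)]

def fa_features(alignment, min_pause=0.3):
    """Rate of speech and pause statistics from word timings."""
    duration = alignment[-1][2] - alignment[0][1]
    rate = len(alignment) / duration                  # words per second
    pauses = [b[1] - a[2] for a, b in zip(alignment, alignment[1:])]
    long_pauses = [p for p in pauses if p >= min_pause]
    return {"rate_of_speech": rate,
            "num_long_pauses": len(long_pauses),
            "mean_pause": sum(pauses) / len(pauses)}

def composite_crowd_grade(grades, weights=None):
    """Weighted combination of per-dimension crowd grades."""
    weights = weights or {k: 1.0 for k in grades}     # equal by default
    total = sum(weights.values())
    return sum(grades[k] * weights[k] for k in grades) / total

print(fa_features(alignment))
print(composite_crowd_grade(
    {"pronunciation": 4, "fluency": 3, "content_org": 4, "grammar": 5}))
```

In practice the weights would be chosen, or learned, to best predict the expert grades.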
10. Experiment and Results
Crowdsourced transcriptions + crowd grades outperform all other methods.
Accuracy nears inter-expert agreement (~0.8).
11. Summing it up
• Svar provides an automated assessment of a candidate’s pronunciation and fluency.
• Crowdsourcing, in addition to NLP features, yields reliable composite scores.
• Speech assessments can be made scalable with accuracy nearly matching experts’ opinion.
Editor's notes
- A high number of jobs in knowledge economies across the globe require English.
- Companies want to be able to test at scale.
- Training institutions need to test at scale and provide feedback.
A good transcription is needed to know what was spoken;
once we know what was spoken, we can compare the candidate's pronunciation with a good pronunciation of each word. But because automatic transcription is poor, we don't get to know what was spoken; this makes feature derivation inaccurate.
Purely automatic ML methods: ~0.5 correlation.
So we use two sets of features: one derived from aligning the speech sample with the crowd transcription, and the other taken directly from crowd grades.
Easy usability
This is where people transcribe
This is where people grade
We had a novel idea: give every Turker a state that reflects the worker's current reliability, based on their past gold-standard performance. A high-reliability worker sees fewer gold standards and a low-reliability worker sees more. This helps manage risk against the money spent on gold standards.
SUPERVISED LEARNING; the output is the expert grades, which our system tries to predict.
We use several techniques like NN, SVM, etc., with cross-validation.