ASA 173, Boston
Crowdsourcing Speech Intelligibility Judgements
Maria K Wolters, University of Edinburgh
Karl B Isaac, freelance researcher
Contact: maria.wolters@ed.ac.uk, @mariawolters
with many thanks to Steve Renals & the EPSRC MultiMemoHome team
Key Questions
❖ What can we know about the context in which people make their judgements?
❖ How might that context affect performance?
  ❖ could explain some of the increased variation in results
  ❖ could yield new hypotheses about real-world intelligibility
❖ How can we improve the experience?
Data
❖ Series of 14 lab and Amazon Mechanical Turk experiments on speech synthesis intelligibility (Isaac, 2015, PhD thesis)
  ❖ Lab vs Mechanical Turk
  ❖ effect of type of test sentences
  ❖ effect of noise and reverberation
Experiment Overview
Study      Complete   Not complete   Aim
amt        167        62             Semantically unpredictable sentences, AMT vs Lab, 4 systems
matrix     61         40             testing matrix sentences
newvoice   61         49             three new voices
lowrev     68         NA             effects of low reverberation
highrev    36         NA             effects of high reverberation
noiserev   78         183            noise x reverberation
Total      471        334
no exclusions and filtering
Important aspects of context
❖ People’s hearing
❖ How they are listening
❖ Where they are listening
❖ Experience with speech tested
❖ Did they do what they were supposed to do?
Hearing Issues
❖ Self-report does not correlate very well with actual hearing loss (Wolters, Isaac, Johnson 2011)
❖ Yet there are many instances of self-reported hearing difficulties that affect the ability to understand speech in noise, with no hearing loss (Bharadwaj et al., 2015)
How people are listening
❖ Headphones versus no headphones
❖ Type of headphones (earbuds, on ear, full ear …)
❖ Features of headphones
❖ Configuration of listening device (phone / computer; browser; volume)
Where they are listening
❖ Room acoustics
❖ Public / private
❖ Interruptions
❖ Background noise
  ❖ source
  ❖ loudness
  ❖ fluctuating / constant / bursty
Experience with Speech Type
❖ Dialect
❖ Life history
❖ exposure to target speech
Did They Do What They Were Supposed To Do?
❖ Manipulation checks, such as a very easy sentence
❖ A different task / item that stirs people out of "tickybox" mode
❖ Instructions at the start, then questions about aspects of the instructions at the end (people are surprisingly honest!)
Effect on Performance
❖ Context Variables:
  ❖ self-reported hearing problems
  ❖ self-reported loudness of background noise
❖ Performance Variables:
  ❖ Word error rate (WER) mean for each within-participant condition
  ❖ self-reported performance
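The slides do not say how WER was scored, so the following is only a minimal sketch under common assumptions: WER is the word-level edit distance between the reference sentence and the typed response, divided by the number of reference words, with lower-casing and whitespace tokenisation as the only normalisation. The function names `word_error_rate` and `mean_wer` are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the distance between the reference words seen so far
    # and the first j words of the response.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # reference word missing (deletion)
                            curr[j - 1] + 1,      # extra response word (insertion)
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def mean_wer(pairs):
    """Mean WER over (reference, response) pairs for one participant and condition."""
    scores = [word_error_rate(ref, hyp) for ref, hyp in pairs]
    return sum(scores) / len(scores)
```

Under this definition, a response with many extra words can score above 1, for example `word_error_rate("a b", "a b c d e")` gives 1.5.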
Self-Reported Hearing
(Hearing Handicap Inventory for Adults)
Study      Mean   Median   IQR   Max   Score >= 10
amt        3      0        0     38    21 (13%)
matrix     3      0        0     34    4 (7%)
newvoice   3.5    0        4     36    10 (16%)
lowrev     1      0        0     18    5 (7%)
highrev    1.5    0        0     28    2 (6%)
noiserev   1.5    0        0     20    6 (8%)
Self-Reported Noise Loudness
Study      1 (none)   2    3    4   5 (LOUD)   Median   IQR
matrix     25         29   4    3   0          2        1
newvoice   29         20   7    4   1          2        1
lowrev     36         16   11   4   0          1        1
highrev    18         15   1    1   1          1.5      1
noiserev   44         22   5    1   6          1        1
not captured in AMT study
Mean WER
Study      Min    Mean   Median   IQR    Max
amt        0.06   0.20   0.18     0.8    1.00
matrix     0      0.09   0.08     0.40   0.32
newvoice   0      0.14   0.14     0.15   0.42
lowrev     0      0.05   0.04     0.06   0.5
highrev    0      0.15   0.08     0.22   0.92
noiserev   0      0.50   0.48     0.88   1.16
Self-Reported Intelligibility
Study      Usually all   Usually most   Worse      Link with mean WER
amt        7 (4%)        125 (75%)      35 (21%)   p<0.0001
matrix     27 (44%)      33 (54%)       1 (2%)     p<0.005
newvoice   10 (16%)      47 (77%)       4 (6.5%)   p<0.01
lowrev     45 (66%)      21 (31%)       1 (1%)     p<0.001
highrev    11 (31%)      22 (61%)       3 (8%)     p<0.05
noiserev   7 (9%)        31 (40%)       40 (51%)   p<0.0001
Link with mean WER assessed using Kruskal-Wallis test
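For readers who want to see what such a test computes, here is a minimal sketch of a Kruskal-Wallis test across three self-report groups. It is not the authors' analysis code; the function names are hypothetical, and the p-value uses the usual chi-square approximation, which for three groups (2 degrees of freedom) reduces to exp(-H/2). That approximation may be poor when a group is very small.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_three_groups(a, b, c):
    """Return (H, p) for three groups of mean-WER values.

    Assumes the pooled values are not all identical (the tie correction
    would otherwise divide by zero).
    """
    groups = [a, b, c]
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    # Correct H for ties in the pooled sample.
    counts = {}
    for x in pooled:
        counts[x] = counts.get(x, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    h /= 1 - tie_term / (n ** 3 - n)
    # Chi-square survival function with 2 degrees of freedom.
    p = math.exp(-h / 2)
    return h, p
```

A call would pass the mean WER values of participants who answered "usually all", "usually most", and "worse" as the three groups.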
Checking for Correlations
❖ Spearman test as implemented in R package coin
❖ stratified by relevant experimental variables
❖ H0 is that mean WER and HHIA score / loudness are
independent, given the experimental variable
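The idea of testing independence within levels of an experimental variable can be sketched with a stratified permutation test. This is an illustrative Python analogue, not the coin implementation: it assumes ranks are computed within each stratum, uses the summed product of centred ranks as the statistic, and approximates the null distribution by shuffling within strata. All function names and defaults are hypothetical.

```python
import random


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def centred(ranks):
    """Subtract the mean so that each stratum contributes a covariance-like term."""
    mean = sum(ranks) / len(ranks)
    return [r - mean for r in ranks]


def stratified_spearman_permutation_test(x, y, strata, n_perm=2000, seed=0):
    """Return (statistic, two-sided Monte Carlo p-value).

    x: context score (e.g. HHIA or loudness), y: mean WER,
    strata: level of the experimental variable for each observation.
    """
    rng = random.Random(seed)
    groups = {}
    for i, s in enumerate(strata):
        groups.setdefault(s, []).append(i)
    # Rank x and y separately inside every stratum.
    blocks = []
    for idx in groups.values():
        rx = centred(average_ranks([x[i] for i in idx]))
        ry = centred(average_ranks([y[i] for i in idx]))
        blocks.append((rx, ry))

    def statistic(bs):
        return sum(a * b for rx, ry in bs for a, b in zip(rx, ry))

    observed = statistic(blocks)
    extreme = 0
    for _ in range(n_perm):
        permuted = []
        for rx, ry in blocks:
            shuffled = ry[:]
            rng.shuffle(shuffled)  # shuffle only within the stratum
            permuted.append((rx, shuffled))
        if abs(statistic(permuted)) >= abs(observed) - 1e-9:
            extreme += 1
    # Add-one correction keeps the estimated p-value above zero.
    return observed, (extreme + 1) / (n_perm + 1)
```

Shuffling only within strata keeps any effect of the experimental variable itself out of the null distribution, which is the point of conditioning on it.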
HHIA vs Mean WER
Study      by System   by Reverb   by SNR
amt        p=0.55
matrix     p=0.08
newvoice   p<0.01
lowrev     p=0.37      p=0.44
highrev    p=0.88      p=0.85
noiserev   p=0.11      p<0.01      p<0.005
self-reported hearing becomes relevant
* in the most difficult study (noiserev)
* in the study with the highest number of people over threshold
Example: NoiseReverb
Loudness vs WER
Study      by System   by Reverb   by SNR
matrix     p=0.08
newvoice   p=0.30
lowrev     p=0.11      p=0.17
highrev    p=0.14      p<0.07
noiserev   p<0.05      p=0.14      p=0.18
no evidence for a strong influence
Loudness vs Self-Reported Understanding
Study      by System   by Reverb   by SNR
matrix     p<0.01
newvoice   p<0.005
lowrev     p<0.005     p<0.005
highrev    p<0.005     p<0.005
noiserev   p<0.001     p<0.001     p<0.001
Self-reported loudness of environment noise relates to self-reported difficulty, not WER
Example: Noise x Reverb
Effects of Context on Performance
• can be subtle
• may depend on whether self-reported or measured
performance
• may depend on who shows up for your study: better
understanding of possible confounders!
Suggestion: build up a library of context data across studies
How Can We Make it Easier?
❖ Design between-subject rather than within-subject studies. 90 sentences in the final study was a killer
❖ Pay a living wage
❖ Encourage free comments that can be mined for useful information (think canary in a coal mine)
❖ Offer more info on the goal of the study, and an opt-in to receive a results summary
Canaries in the Comment Coalmine
❖ issues with the software
❖ issues with their memory
❖ typing while listening
❖ issues with UK accent for US listeners
❖ how they adjusted the volume at their end
Conclusion
❖ Use consistent brief questions regarding context to better characterise your samples across all your studies
❖ Use free comments to look for aspects you hadn’t considered before
❖ Be kind to your participants
Questions?
Contact: maria.wolters@ed.ac.uk, @mariawolters, http://mariawolters.net
Dr Karl B Isaac

Crowdsourcing Speech Intelligibility Judgements

  • 1. ASA 173, Boston Crowdsourcing Speech Intelligibility Judgements Maria K Wolters, University of Edinburgh Karl B Isaac, freelance researcher Contact: maria.wolters@ed.ac.uk, @mariawolters with many thanks to Steve Renals & the EPSRC MultiMemoHome team
  • 2. Key Questions ❖ What can we know about the context of the judgements people make? ❖ How might they affect performance? ❖ could explain some of increased variation in results ❖ could yield new hypotheses about real-world intelligibility ❖ How can we improve the experience?
  • 3. Data ❖ Series of 14 lab and Amazon Mechanical Turk experiments on speech synthesis intelligibility (Isaac, 2015, PhD thesis) ❖ Lab vs Mechanical Turk ❖ effect of type of test sentences ❖ effect of noise and reverberation
  • 4. Experiment Overview
      Study     complete  not complete  Aim
      amt       167       62            Semantically unpredictable sentences, AMT vs Lab, 4 systems
      matrix    61        40            testing matrix sentences
      newvoice  61        49            three new voices
      lowrev    68        NA            effects of low reverberation
      highrev   36        NA            effects of high reverberation
      noiserev  78        183           noise x reverberation
      Total     471       334
      no exclusions and filtering
  • 5. Important aspects of context ❖ People’s hearing ❖ How they are listening ❖ Where they are listening ❖ Experience with speech tested ❖ Did they do what they were supposed to do?
  • 6. Hearing Issues ❖ Self-report does not correlate very well with actual hearing loss (Wolters, Isaac, Johnson 2011) ❖ Yet, many instances of self-reported hearing difficulties that affect ability to understand speech in noise, with no hearing loss (Bharadwaj et al., 2015)
  • 7. How people are listening ❖ Headphones versus no headphones ❖ Type of headphones (earbuds, on ear, full ear …) ❖ Features of headphones ❖ configuration of listening device (phone / computer; browser; volume)
  • 8. Where they are listening ❖ Room acoustics ❖ Public / private ❖ Interruptions ❖ background noise ❖ source ❖ loudness ❖ fluctuating / constant / bursty
  • 9. Experience with Speech Type ❖ Dialect ❖ Life history ❖ exposure to target speech
  • 10. Did They Do What They Were Supposed To Do? ❖ Manipulation checks, such as very easy sentence ❖ Different task / item, that stirs people out of "tickybox" mode ❖ Instructions at the start, then questions about aspects of instructions at the end (people are surprisingly honest!)
  • 11. Effect on Performance ❖ Context Variables: ❖ self-reported hearing problems ❖ self-reported loudness of background noise ❖ Performance Variables: ❖ Word error rate (WER) mean for each within-participant condition ❖ self-reported performance
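The slides use word error rate (WER) as the measured performance variable but do not say how it was scored. The following is a minimal sketch of the standard definition, (substitutions + deletions + insertions) / number of reference words, computed with a word-level edit distance. The function name and the normalisation (lower-casing, splitting on whitespace) are assumptions for illustration, not the scoring procedure used in the studies.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    # Assumed normalisation: lower-case and split on whitespace.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from response
            insertion = curr[j - 1] + 1  # extra word typed by the listener
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("the cat sat", "the dog sat"))
    print(word_error_rate("the cat", "a the big cat now"))
```

Because insertions count as errors, this definition can exceed 1 when a listener types more words than the sentence contained; a per-condition mean is then the average of these per-sentence values.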
  • 12. Self-Reported Hearing (Hearing Handicap Inventory for Adults)
      Study     mean  median  IQR  Max  >=10
      amt       3     0       0    38   21 (13%)
      matrix    3     0       0    34   4 (7%)
      newvoice  3.5   0       4    36   10 (16%)
      lowrev    1     0       0    18   5 (7%)
      highrev   1.5   0       0    28   2 (6%)
      noiserev  1.5   0       0    20   6 (8%)
  • 13. Self-Reported Noise Loudness
      Study     1 (none)  2   3   4  5 (LOUD)  median  IQR
      matrix    25        29  4   3  0         2       1
      newvoice  29        20  7   4  1         2       1
      lowrev    36        16  11  4  0         1       1
      highrev   18        15  1   1  1         1.5     1
      noiserev  44        22  5   1  6         1       1
      not captured in AMT study
  • 14. Mean WER
      Study     min   mean  median  IQR   Max
      amt       0.06  0.20  0.18    0.8   1.00
      matrix    0     0.09  0.08    0.40  0.32
      newvoice  0     0.14  0.14    0.15  0.42
      lowrev    0     0.05  0.04    0.06  0.5
      highrev   0     0.15  0.08    0.22  0.92
      noiserev  0     0.50  0.48    0.88  1.16
  • 15. Self-Reported Intelligibility
      Study     usually all  usually most  worse     link with mean WER
      amt       7 (4%)       125 (75%)     35 (21%)  p<0.0001
      matrix    27 (44%)     33 (54%)      1 (2%)    p<0.005
      newvoice  10 (16%)     47 (77%)      4 (6.5%)  p<0.01
      lowrev    45 (66%)     21 (31%)      1 (1%)    p<0.001
      highrev   11 (31%)     22 (61%)      3 (8%)    p<0.05
      noiserev  7 (9%)       31 (40%)      40 (51%)  p<0.0001
      Link with mean WER assessed using Kruskal-Wallis test
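Slide 15 links the three self-report categories to mean WER with a Kruskal-Wallis test. As a rough sketch of what that test computes, the function below ranks all WER values together, compares the rank sums of the groups, and applies the usual tie correction. The function name and the example numbers are invented for illustration. The p-value uses the chi-square approximation, which has the closed form exp(-H/2) for exactly three groups (two degrees of freedom); other group counts would need a chi-square routine from a statistics library.

```python
import math
from collections import Counter


def kruskal_wallis(groups):
    """Return (H, p) for a Kruskal-Wallis test; p is only given for 3 groups."""
    pooled = [value for group in groups for value in group]
    n_total = len(pooled)
    counts = Counter(pooled)

    # Tied values share the mean of the rank positions they occupy.
    rank_of = {}
    position = 0
    for value in sorted(counts):
        ties = counts[value]
        rank_of[value] = position + (ties + 1) / 2
        position += ties

    rank_term = sum(
        sum(rank_of[v] for v in group) ** 2 / len(group) for group in groups
    )
    h = 12 / (n_total * (n_total + 1)) * rank_term - 3 * (n_total + 1)

    tie_correction = 1 - sum(t**3 - t for t in counts.values()) / (
        n_total**3 - n_total
    )
    if tie_correction == 0:
        raise ValueError("all observations are identical")
    h /= tie_correction

    # Chi-square survival function with 2 degrees of freedom is exp(-h / 2).
    p = math.exp(-h / 2) if len(groups) == 3 else None
    return h, p


if __name__ == "__main__":
    # Invented mean-WER values for three self-report groups.
    usually_all = [0.02, 0.05, 0.04, 0.08]
    usually_most = [0.10, 0.12, 0.09, 0.15]
    worse = [0.30, 0.25, 0.41]
    print(kruskal_wallis([usually_all, usually_most, worse]))
```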
  • 16. Checking for Correlations ❖ Spearman test as implemented in R package coin ❖ stratified by relevant experimental variables ❖ H0 is that mean WER and HHIA score / loudness are independent, given the experimental variable
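The stratified test on slide 16 was run with the R package coin. The sketch below is not that implementation; it is a simplified Monte Carlo version of the same idea, written under stated assumptions: values are ranked within each stratum, the statistic is the sum over strata of the products of centred ranks, and the null distribution comes from shuffling one variable within each stratum only. The function names, the number of permutations, and the example data are all hypothetical.

```python
import random
from collections import defaultdict


def average_ranks(values):
    """Ranks starting at 1; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        for k in range(pos, end + 1):
            ranks[order[k]] = (pos + end) / 2 + 1
        pos = end + 1
    return ranks


def centred(ranks):
    mean = sum(ranks) / len(ranks)
    return [r - mean for r in ranks]


def statistic(blocks):
    return sum(a * b for rx, ry in blocks for a, b in zip(rx, ry))


def stratified_spearman_test(x, y, strata, n_perm=2000, seed=0):
    """Return (observed statistic, two-sided Monte Carlo p-value)."""
    groups = defaultdict(list)
    for index, stratum in enumerate(strata):
        groups[stratum].append(index)

    # Rank and centre within each stratum so strata cannot drive the result.
    blocks = [
        (
            centred(average_ranks([x[i] for i in idx])),
            centred(average_ranks([y[i] for i in idx])),
        )
        for idx in groups.values()
    ]
    observed = statistic(blocks)

    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        permuted = []
        for rx, ry in blocks:
            shuffled = ry[:]
            rng.shuffle(shuffled)  # permute only within the stratum
            permuted.append((rx, shuffled))
        if abs(statistic(permuted)) >= abs(observed) - 1e-12:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical data: a hearing score, mean WER, and experimental condition.
    score = [0, 2, 4, 10, 12, 0, 6, 8, 14, 20]
    wer = [0.05, 0.04, 0.08, 0.12, 0.15, 0.30, 0.35, 0.33, 0.50, 0.55]
    condition = ["quiet"] * 5 + ["noisy"] * 5
    print(stratified_spearman_test(score, wer, condition))
```

Permuting within strata keeps each participant's condition fixed, so a difference in WER between conditions on its own cannot produce a small p-value.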
  • 17. HHIA vs Mean WER
      Study     by System  by Reverb  by SNR
      amt       p=0.55
      matrix    p=0.08
      newvoice  p<0.01
      lowrev    p=0.37     p=0.44
      highrev   p=0.88     p=0.85
      noiserev  p=0.11     p<0.01     p<0.005
      self-reported hearing becomes relevant
      ❖ in the most difficult study (noiserev)
      ❖ in the study with the highest number of people over threshold
  • 19. Loudness vs WER
      Study     by System  by Reverb  by SNR
      matrix    p=0.08
      newvoice  p=0.30
      lowrev    p=0.11     p=0.17
      highrev   p=0.14     p<0.07
      noiserev  p<0.05     p=0.14     p=0.18
      no evidence for a strong influence
  • 20. Loudness vs Self-Reported Understanding
      Study     by System  by Reverb  by SNR
      matrix    p<0.01
      newvoice  p<0.005
      lowrev    p<0.005    p<0.005
      highrev   p<0.005    p<0.005
      noiserev  p<0.001    p<0.001    p<0.001
      Self-reported loudness of environment noise relates to self-reported difficulty, not WER
  • 22. Effects of Context on Performance • can be subtle • may depend on whether self-reported or measured performance • may depend on who shows up for your study: better understanding of possible confounders! Suggestion: build up library of context data across studies
  • 23. How Can We Make it Easier? ❖ Design between subject rather than within. 90 sentences on final study was a killer ❖ Pay a living wage ❖ encourage free comments that can be mined for useful information (think canary in a coal mine) ❖ offer more info on goal of study, opt-in to receive results summary
  • 24. Canaries in the Comment Coalmine ❖ issues with the software ❖ issues with their memory ❖ typing while listening ❖ issues with UK accent for US listeners ❖ how they adjusted the volume at their end
  • 25. Conclusion ❖ Use consistent brief questions regarding context to better characterise your samples across all your studies ❖ Use free comments to look for aspects you hadn’t considered before ❖ Be kind to your participants Questions? Contact: maria.wolters@ed.ac.uk, @mariawolters, http://mariawolters.net Dr Karl B Isaac