SlideShare a Scribd company logo
1 of 28
Watch the video with slide 
synchronization on InfoQ.com! 
http://www.infoq.com/presentations 
/speech-ui-mobile 
InfoQ.com: News & Community Site 
• 750,000 unique visitors/month 
• Published in 4 languages (English, Chinese, Japanese and Brazilian 
Portuguese) 
• Post content from our QCon conferences 
• News 15-20 / week 
• Articles 3-4 / week 
• Presentations (videos) 12-15 / week 
• Interviews 2-3 / week 
• Books 1 / month
Presented at QCon London 
www.qconlondon.com 
Purpose of QCon 
- to empower software development by facilitating the spread of 
knowledge and innovation 
Strategy 
- practitioner-driven conference designed for YOU: influencers of 
change and innovation in your teams 
- speakers and topics driving the evolution and innovation 
- connecting and catalyzing the influencers and innovators 
Highlights 
- attended by more than 12,000 delegates since 2007 
- held in 9 cities worldwide
An Unseen Interface 
Creating Speech-driven UI For Your App That Makes Users Happy 
by Halle Winkler, @politepix http://www.politepix.com 
:D
What is a speech-driven UI?
A speech-driven UI uses either 
speech recognition as an input 
method, speech synthesis as 
an information source for the 
user, or both together. 
...but it can also be multi-modal.
How does speech 
recognition work? 
The elements of speech recognition are: 
1. An acoustic model 
2. A lexicon 
3. A language model (probability) or grammar (ruleset for states) 
4. A decoder
What kind of apps benefit 
from speech UIs? 
Large Vocabulary Tasks: server, built-in vocabulary 
(UITextView, Android.speech, Nuance, AT&T, iSpeech) 
Tasks in which free-form dictation is useful 
Tasks which relate specifically to language 
Command and control tasks: offline, you generally define 
vocabulary (OpenEars or other CMU Sphinx or Julius 
implementations, some Android.speech devices and OSes) 
Interfaces where the user is looking somewhere else 
Interfaces where speech provides a new input or output 
Interfaces that are more fun with speech 
Interfaces where it’s easier to speak than type 
Interfaces where it’s easier to listen than read 
Interfaces where a heavy obstacle is removed
Why offline? 
The interface is always available to your user 
Speed is as fast or faster as a network API – and it's quantifiable! 
Interface design and implementation is simpler and more 
predictable without an asynchronous network dependency 
The user is not giving away any of their data
How is a speech UI 
different from a 
visual UI? 
What are the dimensions on which a 
visual UI is rendered? 
What are the dimensions on which a speech UI is rendered? 
A speech UI is rendered on the dimension of time. 
People value their time exquisitely.
Do people understand each other 
perfectly all the time? 
Why not? 
Accents 
Lack of shared vocabulary/Dialect 
Noise 
Distractions 
Interruptions 
Hearing difficulties 
Distance 
Language errors 
! 
Human speech interactions have frequent comprehension faults 
Emotional intelligence makes us incredibly fault-tolerant
Automated speech 
recognition is subject to all 
the same issues as human 
speech recognition, but 
without the emotional 
intelligence
We have to stack 
the deck in our 
(users’) favor.
Short is good. 
Don't bite off more than you can chew – small (read "fast") steps 
forward means small (read "fast") steps backwards 
! 
Use keyword detection to launch events 
! 
Switch between small vocabularies that each relate to one domain 
This results in accuracy, speed, and a large vocabulary!
Short is bad. 
Phonemes are the smallest unit of speech 
Words with few of them have a lot of rhymes 
Contextless rhyming is our enemy 
Medium-sized, crunchy granola words are our friends
My app, my rules 
Some apps need to recognize words 
or phrases in ways that can be 
expressed by rules. 
Or be flexible 
Some apps need to do probability-based detection 
There are probability-based language models for 
expressing this such as ARPA models
Out of vocabulary 
Your app also has to behave well when 
people aren't speaking to it!
Mic distance and 
vocabulary 
The more distance, the less vocabulary
Test, test, test. 
And obtain appropriate test material.
Case study 1: Recipe App 
A natural implementation of offline speech recognition
What are our interface 
considerations? 
• What are we buying with our time? Hands-free operation, moving locus 
• Hands-free doesn't mean eyes-free! We can provide visual info 
• Operational distance is pretty far 
• Instead of NLP, offline grammar 
• Secret weapon: we know all the words in a recipe in advance 
• Fault tolerance: one level of complexity, don't confirm; return! 
• Challenges: noise, moving locus, reflection, competing speech 
!
Case study 2: Marco Polo 
A dialog management tag game: one user checks in 
a single location and the other user receives 
volume-based speech feedback about their 
proximity to the target when they say “Marco”
UX Considerations 
• What are we buying with our time: play! 
• For a single word, language model is fast and sufficient 
• Acoustic environment and OOV semi-important 
• This is a single-mode interface – an actual dialog 
manager 
• Extra development time should be put into increasing 
voice dynamic range
Case study 3: 
TalkCheater 
An app to whisper sweet presentation notes in your ear
UX Considerations 
• What are we buying? Eye contact, moving locus, 
enhanced human capabilities 
• Is this a speech recognition app? 
• Does this have a visual or a touch interface? 
• The body is the interface 
• Fault tolerance, always important but most 
important in a high-value scenario 
• Volume 
• Speaking speed of synthesized speech
Talk to me @politepix 
and the OpenEars 
forums. I will tell you 
all the things.
Watch the video with slide synchronization on 
InfoQ.com! 
http://www.infoq.com/presentations/speech-ui- 
mobile

More Related Content

More from C4Media

Software Teams and Teamwork Trends Report Q1 2020
Software Teams and Teamwork Trends Report Q1 2020Software Teams and Teamwork Trends Report Q1 2020
Software Teams and Teamwork Trends Report Q1 2020C4Media
 
Understand the Trade-offs Using Compilers for Java Applications
Understand the Trade-offs Using Compilers for Java ApplicationsUnderstand the Trade-offs Using Compilers for Java Applications
Understand the Trade-offs Using Compilers for Java ApplicationsC4Media
 
Kafka Needs No Keeper
Kafka Needs No KeeperKafka Needs No Keeper
Kafka Needs No KeeperC4Media
 
High Performing Teams Act Like Owners
High Performing Teams Act Like OwnersHigh Performing Teams Act Like Owners
High Performing Teams Act Like OwnersC4Media
 
Does Java Need Inline Types? What Project Valhalla Can Bring to Java
Does Java Need Inline Types? What Project Valhalla Can Bring to JavaDoes Java Need Inline Types? What Project Valhalla Can Bring to Java
Does Java Need Inline Types? What Project Valhalla Can Bring to JavaC4Media
 
Service Meshes- The Ultimate Guide
Service Meshes- The Ultimate GuideService Meshes- The Ultimate Guide
Service Meshes- The Ultimate GuideC4Media
 
Shifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CDShifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CDC4Media
 
CI/CD for Machine Learning
CI/CD for Machine LearningCI/CD for Machine Learning
CI/CD for Machine LearningC4Media
 
Fault Tolerance at Speed
Fault Tolerance at SpeedFault Tolerance at Speed
Fault Tolerance at SpeedC4Media
 
Architectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep SystemsArchitectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep SystemsC4Media
 
ML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.jsML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.jsC4Media
 
Build Your Own WebAssembly Compiler
Build Your Own WebAssembly CompilerBuild Your Own WebAssembly Compiler
Build Your Own WebAssembly CompilerC4Media
 
User & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix ScaleUser & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix ScaleC4Media
 
Scaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's EdgeScaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's EdgeC4Media
 
Make Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home EverywhereMake Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home EverywhereC4Media
 
The Talk You've Been Await-ing For
The Talk You've Been Await-ing ForThe Talk You've Been Await-ing For
The Talk You've Been Await-ing ForC4Media
 
Future of Data Engineering
Future of Data EngineeringFuture of Data Engineering
Future of Data EngineeringC4Media
 
Automated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and MoreAutomated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and MoreC4Media
 
Navigating Complexity: High-performance Delivery and Discovery Teams
Navigating Complexity: High-performance Delivery and Discovery TeamsNavigating Complexity: High-performance Delivery and Discovery Teams
Navigating Complexity: High-performance Delivery and Discovery TeamsC4Media
 
High Performance Cooperative Distributed Systems in Adtech
High Performance Cooperative Distributed Systems in AdtechHigh Performance Cooperative Distributed Systems in Adtech
High Performance Cooperative Distributed Systems in AdtechC4Media
 

More from C4Media (20)

Software Teams and Teamwork Trends Report Q1 2020
Software Teams and Teamwork Trends Report Q1 2020Software Teams and Teamwork Trends Report Q1 2020
Software Teams and Teamwork Trends Report Q1 2020
 
Understand the Trade-offs Using Compilers for Java Applications
Understand the Trade-offs Using Compilers for Java ApplicationsUnderstand the Trade-offs Using Compilers for Java Applications
Understand the Trade-offs Using Compilers for Java Applications
 
Kafka Needs No Keeper
Kafka Needs No KeeperKafka Needs No Keeper
Kafka Needs No Keeper
 
High Performing Teams Act Like Owners
High Performing Teams Act Like OwnersHigh Performing Teams Act Like Owners
High Performing Teams Act Like Owners
 
Does Java Need Inline Types? What Project Valhalla Can Bring to Java
Does Java Need Inline Types? What Project Valhalla Can Bring to JavaDoes Java Need Inline Types? What Project Valhalla Can Bring to Java
Does Java Need Inline Types? What Project Valhalla Can Bring to Java
 
Service Meshes- The Ultimate Guide
Service Meshes- The Ultimate GuideService Meshes- The Ultimate Guide
Service Meshes- The Ultimate Guide
 
Shifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CDShifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CD
 
CI/CD for Machine Learning
CI/CD for Machine LearningCI/CD for Machine Learning
CI/CD for Machine Learning
 
Fault Tolerance at Speed
Fault Tolerance at SpeedFault Tolerance at Speed
Fault Tolerance at Speed
 
Architectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep SystemsArchitectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep Systems
 
ML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.jsML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.js
 
Build Your Own WebAssembly Compiler
Build Your Own WebAssembly CompilerBuild Your Own WebAssembly Compiler
Build Your Own WebAssembly Compiler
 
User & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix ScaleUser & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix Scale
 
Scaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's EdgeScaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's Edge
 
Make Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home EverywhereMake Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home Everywhere
 
The Talk You've Been Await-ing For
The Talk You've Been Await-ing ForThe Talk You've Been Await-ing For
The Talk You've Been Await-ing For
 
Future of Data Engineering
Future of Data EngineeringFuture of Data Engineering
Future of Data Engineering
 
Automated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and MoreAutomated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and More
 
Navigating Complexity: High-performance Delivery and Discovery Teams
Navigating Complexity: High-performance Delivery and Discovery TeamsNavigating Complexity: High-performance Delivery and Discovery Teams
Navigating Complexity: High-performance Delivery and Discovery Teams
 
High Performance Cooperative Distributed Systems in Adtech
High Performance Cooperative Distributed Systems in AdtechHigh Performance Cooperative Distributed Systems in Adtech
High Performance Cooperative Distributed Systems in Adtech
 

Recently uploaded

DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 

Recently uploaded (20)

DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 

An Unseen Interface

  • 1.
  • 2. Watch the video with slide synchronization on InfoQ.com! http://www.infoq.com/presentations /speech-ui-mobile InfoQ.com: News & Community Site • 750,000 unique visitors/month • Published in 4 languages (English, Chinese, Japanese and Brazilian Portuguese) • Post content from our QCon conferences • News 15-20 / week • Articles 3-4 / week • Presentations (videos) 12-15 / week • Interviews 2-3 / week • Books 1 / month
  • 3. Presented at QCon London www.qconlondon.com Purpose of QCon - to empower software development by facilitating the spread of knowledge and innovation Strategy - practitioner-driven conference designed for YOU: influencers of change and innovation in your teams - speakers and topics driving the evolution and innovation - connecting and catalyzing the influencers and innovators Highlights - attended by more than 12,000 delegates since 2007 - held in 9 cities worldwide
  • 4. An Unseen Interface Creating Speech-driven UI For Your App That Makes Users Happy by Halle Winkler, @politepix http://www.politepix.com :D
  • 5. What is a speech-driven UI?
  • 6. A speech-driven UI uses either speech recognition as an input method, speech synthesis as an information source for the user, or both together. ...but it can also be multi-modal.
  • 7. How does speech recognition work? The elements of speech recognition are: 1. An acoustic model 2. A lexicon 3. A language model (probability) or grammar (ruleset for states) 4. A decoder
  • 8. What kind of apps benefit from speech UIs? Large Vocabulary Tasks: server, built-in vocabulary (UITextView, Android.speech, Nuance, AT&T, iSpeech) Tasks in which free-form dictation is useful Tasks which relate specifically to language Command and control tasks: offline, you generally define vocabulary (OpenEars or other CMU Sphinx or Julius implementations, some Android.speech devices and OSes) Interfaces where the user is looking somewhere else Interfaces where speech provides a new input or output Interfaces that are more fun with speech Interfaces where it’s easier to speak than type Interfaces where it’s easier to listen than read Interfaces where a heavy obstacle is removed
  • 9. Why offline? The interface is always available to your user Speed is as fast or faster as a network API – and it's quantifiable! Interface design and implementation is simpler and more predictable without an asynchronous network dependency The user is not giving away any of their data
  • 10. How is a speech UI different from a visual UI? What are the dimensions on which a visual UI is rendered? What are the dimensions on which a speech UI is rendered? A speech UI is rendered on the dimension of time. People value their time exquisitely.
  • 11. Do people understand each other perfectly all the time? Why not? Accents Lack of shared vocabulary/Dialect Noise Distractions Interruptions Hearing difficulties Distance Language errors ! Human speech interactions have frequent comprehension faults Emotional intelligence makes us incredibly fault-tolerant
  • 12. Automated speech recognition is subject to all the same issues as human speech recognition, but without the emotional intelligence
  • 13. We have to stack the deck in our (users’) favor.
  • 14. Short is good. Don't bite off more than you can chew – small (read "fast") steps forward means small (read "fast") steps backwards ! Use keyword detection to launch events ! Switch between small vocabularies that each relate to one domain This results in accuracy, speed, and a large vocabulary!
  • 15. Short is bad. Phonemes are the smallest unit of speech Words with few of them have a lot of rhymes Contextless rhyming is our enemy Medium-sized, crunchy granola words are our friends
  • 16. My app, my rules Some apps need to recognize words or phrases in ways that can be expressed by rules. Or be flexible Some apps need to do probability-based detection There are probability-based language models for expressing this such as ARPA models
  • 17. Out of vocabulary Your app also has to behave well when people aren't speaking to it!
  • 18. Mic distance and vocabulary The more distance, the less vocabulary
  • 19. Test, test, test. And obtain appropriate test material.
  • 20. Case study 1: Recipe App A natural implementation of offline speech recognition
  • 21. What are our interface considerations? • What are we buying with our time? Hands-free operation, moving locus • Hands-free doesn't mean eyes-free! We can provide visual info • Operational distance is pretty far • Instead of NLP, offline grammar • Secret weapon: we know all the words in a recipe in advance • Fault tolerance: one level of complexity, don't confirm; return! • Challenges: noise, moving locus, reflection, competing speech !
  • 22. Case study 2: Marco Polo A dialog management tag game: one user checks in a single location and the other user receives volume-based speech feedback about their proximity to the target when they say “Marco”
  • 23. UX Considerations • What are we buying with our time: play! • For a single word, language model is fast and sufficient • Acoustic environment and OOV semi-important • This is a single-mode interface – an actual dialog manager • Extra development time should be put into increasing voice dynamic range
  • 24. Case study 3: TalkCheater An app to whisper sweet presentation notes in your ear
  • 25. UX Considerations • What are we buying? Eye contact, moving locus, enhanced human capabilities • Is this a speech recognition app? • Does this have a visual or a touch interface? • The body is the interface • Fault tolerance, always important but most important in a high-value scenario • Volume • Speaking speed of synthesized speech
  • 26. Talk to me @politepix and the OpenEars forums. I will tell you all the things.
  • 27.
  • 28. Watch the video with slide synchronization on InfoQ.com! http://www.infoq.com/presentations/speech-ui- mobile