Introduction to
Speech Interfaces for
Web Applications
Kevin Hakanson
10-12 August 2016
#midwestjs @hakanson
Speaking with your computing device is becoming
commonplace. Most of us have used Apple's Siri, Google Now,
Microsoft's Cortana, or Amazon's Alexa - but how can you speak
with your web application? The Web Speech API can enable a
voice interface by adding both Speech Synthesis (Text to
Speech) and Speech Recognition (Speech to Text) functionality.
This session will introduce the core concepts of Speech
Synthesis and Speech Recognition. We will evaluate the current
browser support and review alternative options. See the
JavaScript code and UX design considerations required to add a
speech interface to your web application. Come hear if it's as
easy as it sounds?
@hakanson 2
@hakanson 3
“As businesses create their roadmaps for
technology adoption, companies that serve
customers should be planning for, if not
already implementing, both messaging-based
and voice-based Conversational UIs.”
Source: “How Voice Plays into the Rise of the Conversational UI”
User Interfaces (UIs)
• GUI – Graphical User Interface
• NUI – Natural User Interface
• “invisible” as the user continuously learns increasingly complex
interactions
• NLUI – Natural Language User Interface
• linguistic phenomena such as verbs, phrases and clauses act as UI
controls
• VUI – Voice User Interface
• voice/speech for hands-free/eyes-free interface
@hakanson 4
Multimodal Interfaces
Provides multiple modes for user to interact with system
• Multimodal Input
• Keyboard/Mouse
• Touch
• Gesture (Camera)
• Voice (Microphone)
• Multimodal Output
• Screen
• Audio Cues or Recordings
• Synthesized Speech
@hakanson 5
Design for Voice Interfaces
Voice Interface
• Voice Input
• Recognition
• Understanding
• Audio Output
“voice design should serve
the needs of the user and
solve a specific problem”
@hakanson 6
http://www.oreilly.com/design/free/design-for-voice-interfaces.csp
@hakanson 7
“Normal people, when they think about
speech recognition, they want the whole
thing. They want recognition, they want
understanding and they want an action to
be taken.”
Hsiao-Wuen Hon
Microsoft Research
Source: “Speak, hear, talk: The long quest for technology that understands speech as well as a human”
@hakanson 8
Types of Interactions
• The Secretary
• Recognize what is being said and record it
• The Bouncer
• Recognize who is speaking
• The Gopher
• Execute simple orders
• The Assistant
• Intelligently respond to natural language input
@hakanson 9
Source: “Evangelizing and Designing Voice User Interface: Adopting VUI in a GUI world” Stephen Gay & Susan Hura
Opportunities
• Hands Free
• Extra Hand
• Shortcuts
• Humanize
@hakanson 10
Source: “Evangelizing and Designing Voice User Interface: Adopting VUI in a GUI world” Stephen Gay & Susan Hura
Personality
• Create a consistent personality
• Conversational experience
• Take turns
• Be tolerant
• Functional vs. Anthropomorphic
• The more “human” the interface, the more user frustration when it
doesn’t understand
@hakanson 11
@hakanson 12
Intelligent Personal Assistant
An intelligent personal assistant (or simply IPA) is a software
agent that can perform tasks or services for an individual.
These tasks or services are based on user input, location
awareness, and the ability to access information from a variety of
online sources (such as weather or traffic conditions, news, stock
prices, user schedules, retail prices, etc.).
Source: Wikipedia
@hakanson 13
Apple’s Siri
• Speech Interpretation and Recognition Interface
• Norwegian name that means "beautiful victory"
• Integral part of Apple’s iOS since iOS 5
• Also integrated into Apple’s watchOS, tvOS and CarPlay
• Coming to macOS Sierra (a.k.a. OS X 10.12)
• SiriKit enables iOS 10 apps to work with specific domains and
intents (ride booking, messaging, photo search, …)
• “Hey, Siri”
@hakanson 14
@hakanson 15
@hakanson 16
Google Now
• First included in Android 4.1 (Jelly Bean)
• Available within Google Search mobile apps (Android, iOS) and
Google Chrome desktop browser
• Android TV, Android Wear, etc.
• Google Home (later in 2016)
• “OK, Google”
• Name? Personality?
@hakanson 17
Microsoft’s Cortana
• Named after a synthetic intelligence character from Halo
• Created for Windows Phone 8.1
• Available on Windows 10, XBOX, and iOS/Android mobile apps
• Integration with Universal Windows Platform (UWP) apps
• “Hey, Cortana”
@hakanson 18
Cortana’s Chit Chat
• Cortana has a team of writers which
includes a screenwriter, a playwright, a
novelist, and an essayist.
• Their job is to come up with human-like
dialogue that makes Cortana seem like
more than just a series of clever
algorithms. Microsoft calls this brand of
quasi-human responsiveness “chit chat.”
@hakanson 19
Source: “Inside Windows Cortana: The Most Human AI Ever Built”
Amazon Alexa
• Short for Alexandria, an homage to the ancient library
• Available on Amazon Echo and Fire TV
• Companion web app or iOS/Android mobile app
• Alexa Skills Kit
• Smart Home Skill API
• Alexa Voice Service
• https://echosim.io/
• “Alexa” or “Amazon” or “Echo”
@hakanson 20
@hakanson 21
Web Speech API
•Enables you to incorporate voice data into web
applications
•Consists of two parts:
• SpeechSynthesis (Text-to-Speech)
• SpeechRecognition (Asynchronous Speech Recognition)
@hakanson 22
https://developer.mozilla.org/en-US/docs/Web/API/Web_Speech_API
Web Speech API Specification
Defines a JavaScript API to enable web developers to incorporate
speech recognition and synthesis into their web pages. It enables
developers to use scripting to generate text-to-speech output and
to use speech recognition as an input for forms, continuous
dictation and control.
Published by the Speech API Community Group. It is not a W3C
Standard nor is it on the W3C Standards Track.
@hakanson 23
https://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi.html
Browser Support
@hakanson 24
http://caniuse.com/#search=speech
Chrome
@hakanson 25
Firefox
@hakanson 26
Recognition is disabled by default; go to about:config to enable it
Edge
@hakanson 27
Speech Synthesis
Speech synthesis is the artificial production of human speech. A
computer system used for this purpose is called a speech
computer or speech synthesizer, and can be implemented in
software or hardware products. A text-to-speech (TTS) system
converts normal language text into speech.
@hakanson 28
Source: Wikipedia
Utterance
The SpeechSynthesisUtterance interface represents a speech
request. Properties:
• lang – if unset, the <html> lang value will be used
• pitch – range between 0 (lowest) and 2 (highest)
• rate – range between 0.1 (lowest) and 10 (highest)
• text – plain text (or well formed SSML)*
• voice – SpeechSynthesisVoice object
• volume – range between 0 (lowest) and 1 (highest)
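The ranges above can be enforced before values are handed to an utterance. A minimal sketch, assuming a hypothetical helper named `applyUtteranceSettings`; it works on any object with these properties, so in a browser you would pass a `SpeechSynthesisUtterance`.

```javascript
// Valid ranges as listed on the slide above.
const RANGES = {
  pitch: [0, 2],
  rate: [0.1, 10],
  volume: [0, 1]
};

// Clamp a number into [min, max].
function clamp(value, min, max) {
  return Math.min(max, Math.max(min, value));
}

// Hypothetical helper: copy settings onto an utterance-like object,
// clamping pitch, rate and volume into their documented ranges.
function applyUtteranceSettings(utterance, settings) {
  Object.keys(settings).forEach(function (key) {
    const range = RANGES[key];
    utterance[key] = range
      ? clamp(settings[key], range[0], range[1])
      : settings[key]; // text, lang and voice pass through unchanged
  });
  return utterance;
}

// In a browser: applyUtteranceSettings(new SpeechSynthesisUtterance(), {...})
const clampedDemo = applyUtteranceSettings(
  {},
  { text: 'Hello', rate: 25, pitch: -1, volume: 0.5 }
);
console.log(clampedDemo);
```

Clamping is one possible policy; rejecting out-of-range input with an error may suit some applications better.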
@hakanson 29
Utterance Events
• onstart – fired when the utterance has begun to be spoken
• onend – fired when the utterance has finished being spoken
• onpause – fired when the utterance is paused part way through
• onresume – fired when a paused utterance is resumed
• onboundary – fired when the spoken utterance reaches a word
or sentence boundary
• onmark – fired when the spoken utterance reaches a named
SSML "mark" tag
• onerror – fired when an error occurs that prevents the
utterance from being successfully spoken
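As an illustration of the boundary event, the sketch below records each word as it is reached, which could drive word highlighting. The helper names `wordAt` and `trackWords` are hypothetical; the code relies only on the event's `name` and `charIndex` fields. Whether boundary events fire at all may depend on the browser and voice.

```javascript
// Return the word that starts at charIndex in the spoken text.
function wordAt(text, charIndex) {
  const match = /^[\w']+/.exec(text.slice(charIndex));
  return match ? match[0] : '';
}

// Hypothetical helper: record each word as the utterance reaches it.
// `utterance` is any object that accepts an onboundary handler,
// such as a SpeechSynthesisUtterance in a browser.
function trackWords(utterance, text, spoken) {
  utterance.onboundary = function (event) {
    // Boundary events mark words or sentences; keep only words.
    if (event.name === 'word') {
      spoken.push(wordAt(text, event.charIndex));
    }
  };
}
```

In a browser, create the utterance with the same `text`, call `trackWords`, then pass the utterance to `speechSynthesis.speak()`.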
@hakanson 30
SpeechSynthesis
Controller interface for the speech service
• speak() – add utterance to queue
• speaking – if utterance in process of being spoken
• pending – if queue contains as-yet-unspoken utterances
• cancel() – remove all utterances from queue
• pause(), resume(), paused – control and indicate pause state
• getVoices() – returns list of SpeechSynthesisVoices
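Because `speak()` only appends to a queue, a new message waits behind anything already queued. A minimal sketch of an "interrupt and speak" helper follows; the name `speakNow` is hypothetical, and `synth` stands for `window.speechSynthesis`.

```javascript
// Hypothetical helper: drop anything queued or in progress, then speak.
// `synth` is window.speechSynthesis in a browser.
function speakNow(synth, utterance) {
  if (synth.speaking || synth.pending) {
    synth.cancel(); // removes all utterances from the queue
  }
  synth.speak(utterance);
}

// In a browser:
//   speakNow(window.speechSynthesis, new SpeechSynthesisUtterance('Hello'));
```

Whether interrupting is appropriate is a UX decision; for turn-taking dialogue, letting the queue drain may be the better choice.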
@hakanson 31
JavaScript Example
var msg = new SpeechSynthesisUtterance();
msg.text =
"I'm sorry, Dave. I'm afraid I can't do that";
window.speechSynthesis.speak(msg);
@hakanson 32
"I'm sorry, Dave. I'm afraid I can't do that"
@hakanson 33
Source
“Open the pod bay door”
• Cortana
• “I’m sorry, Dave. I’m afraid I can’t do that.”
• Alexa
• “I’m sorry Dave. I’m afraid I can’t do that.
I’m not HAL, and we’re not in space!”
• Siri
• “We intelligent agents will never live that down; apparently”
@hakanson 34
Voices
The SpeechSynthesisVoice interface represents a voice that the
system supports. Properties:
• default – indicates default voice for current app language
• lang – BCP 47 language tag
• localService – indicates if voice supplied by local speech
synthesizer service
• name – human-readable name that represents voice
• voiceURI – location of speech synthesis service
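These properties are enough to choose a voice for a given language. The sketch below is one possible policy (default voice first, then a local voice, then any match); the name `pickVoice` is hypothetical. In some browsers `getVoices()` can return an empty list until the `voiceschanged` event has fired, so call it again after that event.

```javascript
// Hypothetical helper: choose a voice for a BCP 47 language tag.
// Preference order: default voice, then a local voice, then any match.
function pickVoice(voices, lang) {
  const matches = Array.from(voices).filter(function (voice) {
    return voice.lang === lang;
  });
  return (
    matches.find(function (voice) { return voice.default; }) ||
    matches.find(function (voice) { return voice.localService; }) ||
    matches[0] ||
    null
  );
}

// In a browser:
//   utterance.voice = pickVoice(window.speechSynthesis.getVoices(), 'en-US');
```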
@hakanson 35
Voices by Platform
• Chrome
• Google US English
• …
• Mac
• Samantha
• Alex
• …
• Windows 10
• Microsoft David Desktop
• Microsoft Zira Desktop
• …
@hakanson 36
SpeechSynthesisVoice
default: true
lang: "en-US"
localService: true
name: "Samantha"
voiceURI: "Samantha"

default: false
lang: "en-US"
localService: false
name: "Google US English"
voiceURI: "Google US English"
@hakanson 37
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.87 Safari/537.36
“Samantha” voiceURI
• Chrome/Opera
• Samantha
• Safari
• com.apple.speech.synthesis.voice.samantha
• com.apple.speech.synthesis.voice.samantha.premium
• Firefox
• urn:moz-tts:osx:com.apple.speech.synthesis.voice.samantha.premium
@hakanson 38
Google App’s New Voice
Team included a
Voice Coach and
Linguist working in
a recording studio
@hakanson 39
Source: “The Google App’s New Voice - #NatAndLoEp 12”
@hakanson 40
Demo
http://mdn.github.io/web-speech-api/speak-easy-synthesis/
https://github.com/mdn/web-speech-api/tree/master/speak-easy-synthesis
@hakanson 41
Demo
https://jsbin.com/tinaso/edit?js,console,output
SSML
• Speech Synthesis Markup Language (SSML)
• Version 1.0; W3C Recommendation 7 September 2004
• XML-based markup language for assisting the generation of
synthetic speech
• Standard way to control aspects of speech such as
pronunciation, volume, pitch, rate, etc.
@hakanson 42
https://www.w3.org/TR/speech-synthesis/
SSML Example
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
xml:lang="en-US">
<p> Your
<say-as interpret-as="ordinal"> 1st </say-as> request was for
<say-as interpret-as="cardinal"> 1 </say-as> room on
<say-as interpret-as="date" format="mdy"> 10/19/2010 </say-as>,
with early arrival at
<say-as interpret-as="time" format="hms12"> 12:35pm </say-as>.
</p>
</speak>
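Markup like the example above can be generated from JavaScript. The sketch below builds `say-as` elements and the `speak` wrapper as strings, escaping user text so it cannot break the XML. The helper names are hypothetical, and whether a browser accepts SSML as utterance text varies, so treat the output as input for an engine known to support SSML.

```javascript
// Escape characters that are special in XML text and attribute values.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Hypothetical helper: build a <say-as> element with an optional format.
function sayAs(interpretAs, text, format) {
  const formatAttr = format ? ' format="' + escapeXml(format) + '"' : '';
  return '<say-as interpret-as="' + escapeXml(interpretAs) + '"' +
    formatAttr + '>' + escapeXml(text) + '</say-as>';
}

// Hypothetical helper: wrap already-built SSML content in a <speak> root.
function speakDocument(lang, body) {
  return '<speak version="1.0" ' +
    'xmlns="http://www.w3.org/2001/10/synthesis" ' +
    'xml:lang="' + escapeXml(lang) + '">' + body + '</speak>';
}

const ssmlDemo = speakDocument(
  'en-US',
  '<p>Your room is booked for ' + sayAs('date', '10/19/2010', 'mdy') + '.</p>'
);
console.log(ssmlDemo);
```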
@hakanson 43
OS X Embedded Speech Commands
Allows precise adjustments to pronunciation, word emphasis,
and overall cadence of speech
Examples:
• char NORM | LTRL
• emph + | -
• inpt TEXT | PHON | TUNE
• nmbr NORM | LTRL
• rate [+ | -] <RealValue>
@hakanson 44
Source: Speech Synthesis in OS X
@hakanson 45
Demo
https://developer.microsoft.com/en-us/microsoft-edge/testdrive/demos/speechsynthesis/
https://github.com/MicrosoftEdge/Demos/tree/speech-synth-demo/speechsynthesis
“TK-421, why aren't you at your post?”
“Jenny, I've got your number. 867-5309”
<speak>Hello</speak>
TK-421
Text
• TK-4 2 1
SSML
• TK-<say-as interpret-as="digits">421</say-as>
OS X Commands
• TK-[[nmbr LTRL]]421[[nmbr NORM]]
@hakanson 46
http://starwars.wikia.com/wiki/TK-421
867-5309
Text
• 867-5309
SSML
• <say-as interpret-as="telephone">8675309</say-as>
OS X Commands
• [[nmbr LTRL]]8675309[[nmbr NORM]]
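The three ways of forcing digit-by-digit reading shown on these two slides can be produced by one function. This is a sketch with a hypothetical helper name, `digitForms`; it only formats strings and makes no claim about how a given engine will pronounce them.

```javascript
// Hypothetical helper: three ways to ask for digits to be read one by one.
function digitForms(digits) {
  if (!/^\d+$/.test(digits)) {
    throw new Error('digitForms expects a string of digits');
  }
  return {
    // Plain text: separate the digits with spaces.
    text: digits.split('').join(' '),
    // SSML: mark the span as digits.
    ssml: '<say-as interpret-as="digits">' + digits + '</say-as>',
    // OS X embedded commands: switch number mode to literal, then back.
    osx: '[[nmbr LTRL]]' + digits + '[[nmbr NORM]]'
  };
}

console.log(digitForms('421'));
```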
@hakanson 47
https://en.wikipedia.org/wiki/867-5309/Jenny
@hakanson 48
Utterance   SSML                   OS X Commands
lang        xml:lang=""
pitch       <prosody pitch="">     pbas [+ | -] <RealValue>
rate        <prosody rate="">      rate [+ | -] <RealValue>
voice       <voice>
volume      <prosody volume="">    volm [+ | -] <RealValue>
Spoken Output and Accessibility
“It’s important to understand that adding synthesized
speech to an application and making an application
accessible to all users (a process called access
enabling) are different processes with different goals.”
@hakanson 49
Source: “Speech Synthesis in OS X”
Speech Recognition
Speech recognition (SR) is the inter-disciplinary sub-field of
computational linguistics which incorporates knowledge and
research in the linguistics, computer science, and electrical
engineering fields to develop methodologies and technologies
that enable the recognition and translation of spoken language
into text by computers and computerized devices such as those
categorized as smart technologies and robotics.
It is also known as "automatic speech recognition" (ASR),
"computer speech recognition", or just "speech to text" (STT).
@hakanson 50
Source: Wikipedia
SpeechRecognition
The SpeechRecognition interface is the controller
interface for the recognition service; this also
handles the SpeechRecognitionEvent sent from
the recognition service.
@hakanson 51
Properties
• grammars – returns and sets a collection of SpeechGrammar objects that represent the
grammars that will be understood by the current SpeechRecognition
• lang – returns and sets the language of the current SpeechRecognition. If not specified,
this defaults to the HTML lang attribute value, or the user agent's language setting if that
isn't set either
• continuous – controls whether continuous results are returned for each recognition, or
only a single result. Defaults to single (false)
• interimResults – controls whether interim results should be returned (true) or not (false).
Interim results are results that are not yet final (i.e. the isFinal property is false)
• maxAlternatives – sets the maximum number of SpeechRecognitionAlternatives
provided per result (default value is 1)
• serviceURI – specifies the location of the speech recognition service used by the current
SpeechRecognition to handle the actual recognition (default is the user agent's default
speech service)
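The properties above are typically set right after construction. A minimal sketch, assuming a hypothetical factory named `createRecognizer`; it looks up the constructor on a global object passed in by the caller, falls back to Chrome's `webkitSpeechRecognition` name, and returns `null` when neither exists.

```javascript
// Hypothetical factory: create and configure a recognizer, or return null
// when the browser offers no SpeechRecognition constructor.
function createRecognizer(globalObj, options) {
  const Recognition =
    globalObj.SpeechRecognition || globalObj.webkitSpeechRecognition;
  if (!Recognition) {
    return null;
  }
  // Defaults chosen for this sketch; options override them.
  const defaults = {
    lang: 'en-US',
    continuous: false,
    interimResults: false,
    maxAlternatives: 1
  };
  return Object.assign(new Recognition(), defaults, options);
}

// In a browser:
//   const recognition = createRecognizer(window, { interimResults: true });
//   if (recognition) { recognition.start(); }
```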
@hakanson 52
Events
• onstart – fired when the speech recognition service has begun
listening to incoming audio with intent to recognize grammars
associated with the current SpeechRecognition
• onaudiostart – fired when the user agent has started to capture
audio.
• onsoundstart – fired when any sound — recognisable speech or not
— has been detected
• onspeechstart – fired when sound that is recognised by the speech
recognition service as speech has been detected
• onresult – fired when the speech recognition service returns a result
— a word or phrase has been positively recognized and this has been
communicated back to the app
@hakanson 53
Events
• onspeechend – fired when speech recognised by the speech
recognition service has stopped being detected
• onsoundend – fired when any sound — recognisable speech or not —
has stopped being detected
• onaudioend – fired when the user agent has finished capturing
audio
• onend – fired when the speech recognition service has disconnected
• onnomatch – fired when the speech recognition service returns a
final result with no significant recognition. This may involve some
degree of recognition, which doesn't meet or exceed the confidence
threshold
• onerror – fired when a speech recognition error occurs
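Logging every event is a quick way to see the order in which a browser actually fires them. A sketch, assuming a hypothetical helper named `logLifecycle` that installs one handler per event name from the two slides above:

```javascript
// Event names from the two "Events" slides, without the "on" prefix.
const RECOGNITION_EVENTS = [
  'start', 'audiostart', 'soundstart', 'speechstart', 'result',
  'speechend', 'soundend', 'audioend', 'end', 'nomatch', 'error'
];

// Hypothetical helper: push each event name onto `log` as it fires.
// `recognition` is a SpeechRecognition instance in a browser.
function logLifecycle(recognition, log) {
  RECOGNITION_EVENTS.forEach(function (name) {
    recognition['on' + name] = function () {
      log.push(name);
    };
  });
}
```

Assigning `on...` properties replaces any handler already set, so install this before application handlers or switch to `addEventListener`.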
@hakanson 54
Methods
• start() – starts the speech recognition service listening to
incoming audio with intent to recognize grammars associated
with the current SpeechRecognition
• stop() – stops the speech recognition service from listening to
incoming audio, and attempts to return a
SpeechRecognitionResult using the audio captured so far
• abort() – stops the speech recognition service from listening to
incoming audio, and doesn't attempt to return a
SpeechRecognitionResult
@hakanson 55
JavaScript Example
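// Chrome exposes the constructor with a webkit prefix.
var SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
var recognition = new SpeechRecognition();
recognition.lang = 'en-US';
recognition.interimResults = false; // deliver final results only
recognition.maxAlternatives = 1;    // one alternative per result
recognition.start();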
@hakanson 56
SpeechRecognitionResult
The SpeechRecognitionResult interface represents a single
recognition match, which may contain multiple
SpeechRecognitionAlternative objects.
• isFinal – a Boolean that states whether this result is final (true) or
not (false) — if so, then this is the final time this result will be
returned; if not, then this result is an interim result, and may be
updated later on
• length – returns the length of the "array" — the number of
SpeechRecognitionAlternative objects contained in the result (also
referred to as "n-best alternatives")
• item – a standard getter that allows SpeechRecognitionAlternative
objects within the result to be accessed via array syntax
@hakanson 57
SpeechRecognitionAlternative
The SpeechRecognitionAlternative interface represents a single
word that has been recognised by the speech recognition service
• transcript – returns a string containing the transcript of the
recognised word
• confidence – returns a numeric estimate of how confident the
speech recognition system is that the recognition is correct
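The result and alternative interfaces nest like arrays: a list of results, each holding one or more alternatives. The sketch below walks that structure, keeping the highest-confidence alternative of each final result. The helper names and the confidence threshold are assumptions of this example, not part of the API.

```javascript
// Return the alternative with the highest confidence in one result.
function bestAlternative(result) {
  let best = null;
  for (let i = 0; i < result.length; i++) {
    if (best === null || result[i].confidence > best.confidence) {
      best = result[i];
    }
  }
  return best;
}

// Hypothetical helper: collect transcripts of final results whose best
// alternative meets a minimum confidence. `results` is event.results
// from an onresult handler.
function finalTranscripts(results, minConfidence) {
  const transcripts = [];
  for (let i = 0; i < results.length; i++) {
    const result = results[i];
    if (!result.isFinal) {
      continue; // interim results may still change
    }
    const best = bestAlternative(result);
    if (best && best.confidence >= minConfidence) {
      transcripts.push(best.transcript.trim());
    }
  }
  return transcripts;
}
```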
@hakanson 58
JavaScript Example
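recognition.onresult = function(event) {
  // Take the first alternative of the first result.
  // diagnostic and bg are DOM elements defined elsewhere in the page.
  var color = event.results[0][0].transcript;
  diagnostic.textContent = 'Result received: ' + color + '.';
  bg.style.backgroundColor = color;
};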
@hakanson 59
@hakanson 60
Demo
http://mdn.github.io/web-speech-api/speech-color-changer/
https://github.com/mdn/web-speech-api/tree/master/speech-color-changer
Grammars
• A speech recognition grammar is a container of language rules
that define a set of constraints that a speech recognizer can
use to perform recognition.
• A grammar helps in the following ways:
• Limits Vocabulary
• Customizes Vocabulary
• Filters Recognized Results
• Identifies Rules
• Defines Semantics
@hakanson 61
https://msdn.microsoft.com/en-us/library/hh378342(v=office.14).aspx
SRGS
• Speech Recognition Grammar Specification (SRGS)
• Version 1.0; W3C Recommendation 16 March 2004
• Grammars are used so that developers can specify the words
and patterns of words to be listened for by a speech recognizer
• Augmented BNF (ABNF) or XML syntax
• Modelled on the JSpeech Grammar Format specification [JSGF]
@hakanson 62
https://www.w3.org/TR/speech-grammar/
JSGF
• JSpeech Grammar Format (JSGF)
• W3C Note 05 June 2000
• Platform-independent, vendor-independent textual
representation of grammars for use in speech recognition
• Derived from the Java™ Speech API Grammar Format
(Version 1.0, October, 1998)
@hakanson 63
SpeechGrammar
The SpeechGrammar interface represents a set of words or
patterns of words that we want the recognition service to
recognize. Defined using JSpeech Grammar Format (JSGF).
Other formats may also be supported in the future.
• src – sets and returns a string containing the grammar from
within the SpeechGrammar object instance
• weight – sets and returns the weight of the SpeechGrammar
object
@hakanson 64
JavaScript Example
var grammar = '#JSGF V1.0; grammar colors; public <color> = ' +
  'aqua | azure | beige | bisque | black | blue | brown | chocolate | ' +
  'coral | crimson | cyan | fuchsia | ghostwhite | gold | goldenrod | ' +
  'gray | green | indigo | ivory | khaki | lavender | lime | linen | ' +
  'magenta | maroon | moccasin | navy | olive | orange | orchid | ' +
  'peru | pink | plum | purple | red | salmon | sienna | silver | snow | ' +
  'tan | teal | thistle | tomato | turquoise | violet | white | yellow ;';
// Chrome exposes the constructor with a webkit prefix.
var SpeechGrammarList = window.SpeechGrammarList || window.webkitSpeechGrammarList;
var speechRecognitionList = new SpeechGrammarList();
speechRecognitionList.addFromString(grammar, 1); // second argument is the weight
recognition.grammars = speechRecognitionList;
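When the vocabulary comes from data rather than a hand-written string, the same JSGF text can be generated. A minimal sketch with a hypothetical helper, `buildJsgf`, covering only the single-rule list of alternatives used above:

```javascript
// Hypothetical helper: build a one-rule JSGF grammar from a word list.
function buildJsgf(grammarName, ruleName, words) {
  return '#JSGF V1.0; grammar ' + grammarName + '; ' +
    'public <' + ruleName + '> = ' + words.join(' | ') + ' ;';
}

const colorGrammar = buildJsgf('colors', 'color', ['aqua', 'azure', 'beige']);
console.log(colorGrammar);
```

The resulting string can be passed to `addFromString()` exactly like the literal in the example above.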
@hakanson 65
“Alexa Skills Kit” Style Example (1 of 2)
SampleUtterances.txt
SetBackground {Color}
SetBackground background {Color}
SetBackground set background {Color}
SetBackground set background to {Color}
SetBackground set background as {Color}
SetBackground set background color to {Color}
SetBackground set background color as {Color}
@hakanson 66
“Alexa Skills Kit” Style Example (2 of 2)
IntentSchema.json
{
  "intents": [
    {
      "intent": "SetBackground",
      "slots": [
        {
          "name": "Color",
          "type": "LIST_OF_COLORS"
        }
      ]
    }
  ]
}
customSlotTypes/LIST_OF_COLORS
aqua
azure
beige
bisque
black
blue
brown
chocolate
coral
crimson
cyan
…
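To make the idea of sample utterances and slots concrete, the sketch below matches recognized text against templates in the same style and extracts the slot value. It is a simple pattern matcher written for illustration, not how Alexa resolves intents; all function names are hypothetical, and only a subset of the samples and colors above is used.

```javascript
// A subset of the sample utterances and slot values from the slides.
const SAMPLE_UTTERANCES = [
  'SetBackground {Color}',
  'SetBackground set background to {Color}',
  'SetBackground set background color to {Color}'
];
const LIST_OF_COLORS = ['aqua', 'azure', 'beige', 'black', 'blue'];

// Turn "Intent phrase with {Slot}" into an intent name and a regex.
// Assumes phrases contain only words, spaces and {Slot} markers.
function compileSample(line) {
  const space = line.indexOf(' ');
  const slots = [];
  const pattern = line.slice(space + 1).replace(/\{(\w+)\}/g, function (_, name) {
    slots.push(name);
    return '(.+)';
  });
  return {
    intent: line.slice(0, space),
    slots: slots,
    regex: new RegExp('^' + pattern + '$', 'i')
  };
}

// Return { intent, slots } for the first sample that matches with an
// allowed slot value, or null when nothing matches.
function matchIntent(utterance, samples, allowedValues) {
  const compiled = samples.map(compileSample).sort(function (a, b) {
    return b.regex.source.length - a.regex.source.length; // longest first
  });
  for (const sample of compiled) {
    const match = sample.regex.exec(utterance.trim());
    if (!match) {
      continue;
    }
    const slots = {};
    sample.slots.forEach(function (name, i) {
      slots[name] = match[i + 1].toLowerCase();
    });
    const valid = sample.slots.every(function (name) {
      return allowedValues.includes(slots[name]);
    });
    if (valid) {
      return { intent: sample.intent, slots: slots };
    }
  }
  return null;
}

console.log(matchIntent('Set background color to Blue', SAMPLE_UTTERANCES, LIST_OF_COLORS));
```

In a web page, the `transcript` from a recognition result would be the `utterance` argument.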
@hakanson 67
Sample “OK, Google” Commands
• Remind me to [do a task]. Ex.: "Remind me to get dog food at Target," will create a
location-based reminder. "Remind me to take out the trash tomorrow morning,"
will give you a time-based reminder.
• When's my next meeting?
• How do I [task]? Ex.: "How do I make an Old Fashioned cocktail?" or "How do I fix
a hole in my wall?"
• If a song is playing, ask questions about the artist. For instance, "Where is she
from?" (Android 6.0 Marshmallow)
• To learn more about your surroundings, you can ask things like "What is the name
of this place?" or "Show me movies at this place" or "Who built this bridge?"
@hakanson 68
Source: “The complete list of 'OK, Google' commands”
Natural Language Understanding
•Speech to Text
•Text to Meaning
@hakanson 69
NLP vs. FSM
Natural language processing (NLP) is a field of computer
science, artificial intelligence, and computational linguistics
concerned with the interactions between computers and human
(natural) languages.
A finite-state machine (FSM) is a mathematical model of
computation used to design both computer programs and
sequential logic circuits.
@hakanson 70
Source: Wikipedia
KITT vs Samsung smart home
@hakanson 71
Source
Other Speech APIs
• Why?
• Browser doesn’t support Web Speech API
• Consistent experience across all browsers
• Additional functionality not included in Web Speech API
• How?
• Web Audio API
• JavaScript running in browser
• WebSocket connection directly from browser
• HTTP API proxied through a server
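Deciding between the Web Speech API and one of these alternatives usually starts with feature detection. A minimal sketch, assuming a hypothetical helper named `detectSpeechSupport` that inspects a global object passed in by the caller:

```javascript
// Hypothetical helper: report which halves of the Web Speech API exist.
function detectSpeechSupport(globalObj) {
  const Recognition =
    globalObj.SpeechRecognition || globalObj.webkitSpeechRecognition;
  return {
    synthesis:
      'speechSynthesis' in globalObj &&
      typeof globalObj.SpeechSynthesisUtterance === 'function',
    recognition: typeof Recognition === 'function'
  };
}

// Outside a browser there is no window, so fall back to an empty object.
const speechSupport = detectSpeechSupport(
  typeof window !== 'undefined' ? window : {}
);
console.log(speechSupport);
```

When `recognition` is false, the page can fall back to one of the options listed above or hide its voice controls.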
@hakanson 72
Web Audio API
The Web Audio API provides a powerful and versatile system for
controlling audio on the Web, allowing developers to choose
audio sources, add effects to audio, create audio visualizations,
apply spatial effects (such as panning) and much more.
@hakanson 73
https://developer.mozilla.org/en-US/docs/Web/API/Web_Audio_API
Pocketsphinx.js
Speech recognition in JavaScript
• PocketSphinx.js is a speech recognizer that runs entirely in the
web browser. It is built on:
• a speech recognizer written in C (PocketSphinx) converted into
JavaScript using Emscripten,
• an audio recorder using the Web Audio API.
@hakanson 74
https://syl22-00.github.io/pocketsphinx.js/live-demo.html
IBM Watson Developer Cloud
• Text to Speech
• Watson Text to Speech provides a REST API to synthesize speech
audio from an input of plain text.
• Once synthesized in real-time, the audio is streamed back to the client
with minimal delay.
• Speech to Text
• Uses machine intelligence to combine information about grammar and
language structure with knowledge of the composition of an audio
signal to generate an accurate transcription.
• Accessed via a WebSocket connection or REST API.
@hakanson 75
http://www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/services-catalog.html
@hakanson 76
Demo
https://text-to-speech-demo.mybluemix.net/
https://speech-to-text-demo.mybluemix.net/
Microsoft Cognitive Services
Speech API
• Convert audio to text, understand intent, and convert text back
to speech for natural responsiveness
(rebranding of Bing and Project Oxford APIs)
• Microsoft has used Speech API for Windows applications like
Cortana and Skype Translator
@hakanson 77
https://www.microsoft.com/cognitive-services/en-us/speech-api
Microsoft Cognitive Services
• Speech Recognition
• Convert spoken audio to text.
• Text to Speech
• Convert text to spoken audio
• Speech Intent Recognition
• Convert spoken audio to intent
• In addition to returning recognized text, includes structured information
about the incoming speech
@hakanson 78
@hakanson 79
Demo
https://www.microsoft.com/cognitive-services/en-us/speech-api
Google Cloud Speech API
Enables developers to convert audio to text by applying powerful
neural network models in an easy to use API
• Over 80 Languages
• Return Text Results In Real-Time
• Accurate In Noisy Environments
• Powered by Machine Learning
@hakanson 80
https://cloud.google.com/speech/
@hakanson 81
Summary
• Speech Interfaces are the future…
• and they have been for a long time…
• and don’t believe everything you see on TV
• Know your customer and your application
• More UI/UX effort than JavaScript code
• and time to leverage those writing and speaking skills
• Web technology lags behind mobile, but is evolving
@hakanson 82
Thank You!
Questions? Come over for a conversation.
@hakanson 83
Source

Amazon Web Services - Strategy and Current Offering
 
AWS Artificial Intelligence Day - Toronto
AWS Artificial Intelligence Day - TorontoAWS Artificial Intelligence Day - Toronto
AWS Artificial Intelligence Day - Toronto
 
Implementing Your Own Chatbot Platform!
Implementing Your Own Chatbot Platform!Implementing Your Own Chatbot Platform!
Implementing Your Own Chatbot Platform!
 
Realizzare un Virtual Assistant con Bot Framework Azure e Unity
Realizzare un Virtual Assistant con Bot Framework Azure e UnityRealizzare un Virtual Assistant con Bot Framework Azure e Unity
Realizzare un Virtual Assistant con Bot Framework Azure e Unity
 
Chat bot technologies overview
Chat bot technologies overviewChat bot technologies overview
Chat bot technologies overview
 
Better Accessibility with Lex, Polly, and Alexa | AWS Public Sector Summit 2017
Better Accessibility with Lex, Polly, and Alexa | AWS Public Sector Summit 2017Better Accessibility with Lex, Polly, and Alexa | AWS Public Sector Summit 2017
Better Accessibility with Lex, Polly, and Alexa | AWS Public Sector Summit 2017
 
Chatbot的智慧與靈魂
Chatbot的智慧與靈魂Chatbot的智慧與靈魂
Chatbot的智慧與靈魂
 
Designing for Voice
Designing for VoiceDesigning for Voice
Designing for Voice
 
AWS re:Invent 2016: NEW LAUNCH! Workshop: Hands on with Amazon Lex, Amazon Po...
AWS re:Invent 2016: NEW LAUNCH! Workshop: Hands on with Amazon Lex, Amazon Po...AWS re:Invent 2016: NEW LAUNCH! Workshop: Hands on with Amazon Lex, Amazon Po...
AWS re:Invent 2016: NEW LAUNCH! Workshop: Hands on with Amazon Lex, Amazon Po...
 

Mehr von Kevin Hakanson

Sharpen your "Architectural Documentation" Saw
Sharpen your "Architectural Documentation" SawSharpen your "Architectural Documentation" Saw
Sharpen your "Architectural Documentation" SawKevin Hakanson
 
Who's in your Cloud? Cloud State Monitoring
Who's in your Cloud? Cloud State MonitoringWho's in your Cloud? Cloud State Monitoring
Who's in your Cloud? Cloud State MonitoringKevin Hakanson
 
Adopting Multi-Cloud Services with Confidence
Adopting Multi-Cloud Services with ConfidenceAdopting Multi-Cloud Services with Confidence
Adopting Multi-Cloud Services with ConfidenceKevin Hakanson
 
Learning to Mod Minecraft: A Father/Daughter Retrospective
Learning to Mod Minecraft: A Father/Daughter RetrospectiveLearning to Mod Minecraft: A Father/Daughter Retrospective
Learning to Mod Minecraft: A Father/Daughter RetrospectiveKevin Hakanson
 
Securing TodoMVC Using the Web Cryptography API
Securing TodoMVC Using the Web Cryptography APISecuring TodoMVC Using the Web Cryptography API
Securing TodoMVC Using the Web Cryptography APIKevin Hakanson
 
Make your own Print & Play card game using SVG and JavaScript
Make your own Print & Play card game using SVG and JavaScriptMake your own Print & Play card game using SVG and JavaScript
Make your own Print & Play card game using SVG and JavaScriptKevin Hakanson
 
Implementing Messaging Patterns in JavaScript using the OpenAjax Hub
Implementing Messaging Patterns in JavaScript using the OpenAjax HubImplementing Messaging Patterns in JavaScript using the OpenAjax Hub
Implementing Messaging Patterns in JavaScript using the OpenAjax HubKevin Hakanson
 
Internationalize your JavaScript Application: Prepare for "the next billion" ...
Internationalize your JavaScript Application: Prepare for "the next billion" ...Internationalize your JavaScript Application: Prepare for "the next billion" ...
Internationalize your JavaScript Application: Prepare for "the next billion" ...Kevin Hakanson
 
Developer's Guide to JavaScript and Web Cryptography
Developer's Guide to JavaScript and Web CryptographyDeveloper's Guide to JavaScript and Web Cryptography
Developer's Guide to JavaScript and Web CryptographyKevin Hakanson
 

Mehr von Kevin Hakanson (10)

Sharpen your "Architectural Documentation" Saw
Sharpen your "Architectural Documentation" SawSharpen your "Architectural Documentation" Saw
Sharpen your "Architectural Documentation" Saw
 
Who's in your Cloud? Cloud State Monitoring
Who's in your Cloud? Cloud State MonitoringWho's in your Cloud? Cloud State Monitoring
Who's in your Cloud? Cloud State Monitoring
 
Adopting Multi-Cloud Services with Confidence
Adopting Multi-Cloud Services with ConfidenceAdopting Multi-Cloud Services with Confidence
Adopting Multi-Cloud Services with Confidence
 
Learning to Mod Minecraft: A Father/Daughter Retrospective
Learning to Mod Minecraft: A Father/Daughter RetrospectiveLearning to Mod Minecraft: A Father/Daughter Retrospective
Learning to Mod Minecraft: A Father/Daughter Retrospective
 
Securing TodoMVC Using the Web Cryptography API
Securing TodoMVC Using the Web Cryptography APISecuring TodoMVC Using the Web Cryptography API
Securing TodoMVC Using the Web Cryptography API
 
Make your own Print & Play card game using SVG and JavaScript
Make your own Print & Play card game using SVG and JavaScriptMake your own Print & Play card game using SVG and JavaScript
Make your own Print & Play card game using SVG and JavaScript
 
HTTP Potpourri
HTTP PotpourriHTTP Potpourri
HTTP Potpourri
 
Implementing Messaging Patterns in JavaScript using the OpenAjax Hub
Implementing Messaging Patterns in JavaScript using the OpenAjax HubImplementing Messaging Patterns in JavaScript using the OpenAjax Hub
Implementing Messaging Patterns in JavaScript using the OpenAjax Hub
 
Internationalize your JavaScript Application: Prepare for "the next billion" ...
Internationalize your JavaScript Application: Prepare for "the next billion" ...Internationalize your JavaScript Application: Prepare for "the next billion" ...
Internationalize your JavaScript Application: Prepare for "the next billion" ...
 
Developer's Guide to JavaScript and Web Cryptography
Developer's Guide to JavaScript and Web CryptographyDeveloper's Guide to JavaScript and Web Cryptography
Developer's Guide to JavaScript and Web Cryptography
 

Kürzlich hochgeladen

Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Hr365.us smith
 
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...Akihiro Suda
 
Xen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdfXen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdfStefano Stabellini
 
Unveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesUnveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesŁukasz Chruściel
 
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...Natan Silnitsky
 
Intelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmIntelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmSujith Sukumaran
 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Matt Ray
 
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company OdishaBalasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odishasmiwainfosol
 
UI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptxUI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptxAndreas Kunz
 
What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...Technogeeks
 
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...OnePlan Solutions
 
Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Mater
 
Innovate and Collaborate- Harnessing the Power of Open Source Software.pdf
Innovate and Collaborate- Harnessing the Power of Open Source Software.pdfInnovate and Collaborate- Harnessing the Power of Open Source Software.pdf
Innovate and Collaborate- Harnessing the Power of Open Source Software.pdfYashikaSharma391629
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based projectAnoyGreter
 
Folding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesFolding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesPhilip Schwarz
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Andreas Granig
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEEVICTOR MAESTRE RAMIREZ
 
SpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at RuntimeSpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at Runtimeandrehoraa
 
Cyber security and its impact on E commerce
Cyber security and its impact on E commerceCyber security and its impact on E commerce
Cyber security and its impact on E commercemanigoyal112
 
CRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceCRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceBrainSell Technologies
 

Kürzlich hochgeladen (20)

Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)
 
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
 
Xen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdfXen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdf
 
Unveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesUnveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New Features
 
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
 
Intelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmIntelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalm
 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
 
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company OdishaBalasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
 
UI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptxUI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptx
 
What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...
 
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
 
Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)
 
Innovate and Collaborate- Harnessing the Power of Open Source Software.pdf
Innovate and Collaborate- Harnessing the Power of Open Source Software.pdfInnovate and Collaborate- Harnessing the Power of Open Source Software.pdf
Innovate and Collaborate- Harnessing the Power of Open Source Software.pdf
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based project
 
Folding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesFolding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a series
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEE
 
SpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at RuntimeSpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at Runtime
 
Cyber security and its impact on E commerce
Cyber security and its impact on E commerceCyber security and its impact on E commerce
Cyber security and its impact on E commerce
 
CRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceCRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. Salesforce
 

Introduction to Speech Interfaces for Web Applications

  • 1. Introduction to Speech Interfaces for Web Applications Kevin Hakanson 10-12 August 2016 #midwestjs @hakanson
  • 2. Speaking with your computing device is becoming commonplace. Most of us have used Apple's Siri, Google Now, Microsoft's Cortana, or Amazon's Alexa - but how can you speak with your web application? The Web Speech API can enable a voice interface by adding both Speech Synthesis (Text to Speech) and Speech Recognition (Speech to Text) functionality. This session will introduce the core concepts of Speech Synthesis and Speech Recognition. We will evaluate the current browser support and review alternative options. See the JavaScript code and UX design considerations required to add a speech interface to your web application. Come hear if it's as easy as it sounds?
  • 3. “As businesses create their roadmaps for technology adoption, companies that serve customers should be planning for, if not already implementing, both messaging-based and voice-based Conversational UIs.” Source: “How Voice Plays into the Rise of the Conversational UI”
  • 4. User Interfaces (UIs) • GUI – Graphical User Interface • NUI – Natural User Interface • “invisible” as the user continuously learns increasingly complex interactions • NLUI – Natural Language User Interface • linguistic phenomena such as verbs, phrases and clauses act as UI controls • VUI – Voice User Interface • voice/speech for hands-free/eyes-free interface
  • 5. Multimodal Interfaces Provides multiple modes for user to interact with system • Multimodal Input • Keyboard/Mouse • Touch • Gesture (Camera) • Voice (Microphone) • Multimodal Output • Screen • Audio Cues or Recordings • Synthesized Speech
  • 6. Design for Voice Interfaces Voice Interface • Voice Input • Recognition • Understanding • Audio Output “voice design should serve the needs of the user and solve a specific problem” http://www.oreilly.com/design/free/design-for-voice-interfaces.csp
  • 7. “Normal people, when they think about speech recognition, they want the whole thing. They want recognition, they want understanding and they want an action to be taken.” Hsiao-Wuen Hon, Microsoft Research. Source: “Speak, hear, talk: The long quest for technology that understands speech as well as a human”
  • 9. Types of Interactions • The Secretary • Recognize what is being said and record it • The Bouncer • Recognize who is speaking • The Gopher • Execute simple orders • The Assistant • Intelligently respond to natural language input Source: “Evangelizing and Designing Voice User Interface: Adopting VUI in a GUI world” Stephen Gay & Susan Hura
  • 10. Opportunities • Hands Free • Extra Hand • Shortcuts • Humanize Source: “Evangelizing and Designing Voice User Interface: Adopting VUI in a GUI world” Stephen Gay & Susan Hura
  • 11. Personality • Create a consistent personality • Conversational experience • Take turns • Be tolerant • Functional vs. Anthropomorphic • The more “human” the interface, the more user frustration when it doesn’t understand
  • 13. Intelligent Personal Assistant An intelligent personal assistant (or simply IPA) is a software agent that can perform tasks or services for an individual. These tasks or services are based on user input, location awareness, and the ability to access information from a variety of online sources (such as weather or traffic conditions, news, stock prices, user schedules, retail prices, etc.). Source: Wikipedia
  • 14. Apple’s Siri • Speech Interpretation and Recognition Interface • Norwegian name that means "beautiful victory" • Integral part of Apple’s iOS since iOS 5 • Also integrated into Apple’s watchOS, tvOS and CarPlay • Coming to macOS Sierra (a.k.a. OS X 10.12) • SiriKit enables iOS 10 apps to work with specific domains and intents (ride booking, messaging, photo search, …) • “Hey, Siri”
  • 17. Google Now • First included in Android 4.1 (Jelly Bean) • Available within Google Search mobile apps (Android, iOS) and Google Chrome desktop browser • Android TV, Android Wear, etc. • Google Home (later in 2016) • “OK, Google” • Name? Personality?
  • 18. Microsoft’s Cortana • Named after a synthetic intelligence character from Halo • Created for Windows Phone 8.1 • Available on Windows 10, Xbox, and iOS/Android mobile apps • Integration with Universal Windows Platform (UWP) apps • “Hey, Cortana”
  • 19. Cortana’s Chit Chat • Cortana has a team of writers which includes a screenwriter, a playwright, a novelist, and an essayist. • Their job is to come up with human-like dialogue that makes Cortana seem like more than just a series of clever algorithms. Microsoft calls this brand of quasi-human responsiveness “chit chat.” Source: “Inside Windows Cortana: The Most Human AI Ever Built”
  • 20. Amazon Alexa • Short for Alexandria, an homage to the ancient library • Available on Amazon Echo and Fire TV • Companion web app or iOS/Android mobile app • Alexa Skills Kit • Smart Home Skill API • Alexa Voice Service • https://echosim.io/ • “Alexa” or “Amazon” or “Echo”
  • 22. Web Speech API • Enables you to incorporate voice data into web applications • Consists of two parts: • SpeechSynthesis (Text-to-Speech) • SpeechRecognition (Asynchronous Speech Recognition) https://developer.mozilla.org/en-US/docs/Web/API/Web_Speech_API
  • 23. Web Speech API Specification Defines a JavaScript API to enable web developers to incorporate speech recognition and synthesis into their web pages. It enables developers to use scripting to generate text-to-speech output and to use speech recognition as an input for forms, continuous dictation and control. Published by the Speech API Community Group. It is not a W3C Standard nor is it on the W3C Standards Track. https://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi.html
  • 26. Firefox: recognition disabled by default; go to about:config to enable
  • 28. Speech Synthesis Speech synthesis is the artificial production of human speech. A computer system used for this purpose is called a speech computer or speech synthesizer, and can be implemented in software or hardware products. A text-to-speech (TTS) system converts normal language text into speech. Source: Wikipedia
  • 29. Utterance The SpeechSynthesisUtterance interface represents a speech request. Properties: • lang – if unset, the <html> lang value will be used • pitch – range between 0 (lowest) and 2 (highest) • rate – range between 0.1 (lowest) and 10 (highest) • text – plain text (or well-formed SSML)* • voice – SpeechSynthesisVoice object • volume – range between 0 (lowest) and 1 (highest)
  • 30. Utterance Events • onstart – fired when the utterance has begun to be spoken • onend – fired when the utterance has finished being spoken • onpause – fired when the utterance is paused part way through • onresume – fired when a paused utterance is resumed • onboundary – fired when the spoken utterance reaches a word or sentence boundary • onmark – fired when the spoken utterance reaches a named SSML "mark" tag • onerror – fired when an error occurs that prevents the utterance from being successfully spoken
  • 31. SpeechSynthesis Controller interface for the speech service • speak() – add utterance to queue • speaking – true if an utterance is in the process of being spoken • pending – true if the queue contains as-yet-unspoken utterances • cancel() – remove all utterances from queue • pause(), resume(), paused – control and indicate pause state • getVoices() – returns list of SpeechSynthesisVoice objects
  • 32. JavaScript Example
        var msg = new SpeechSynthesisUtterance();
        msg.text = "I'm sorry, Dave. I'm afraid I can't do that";
        window.speechSynthesis.speak(msg);
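The properties, events, and controller methods from the preceding slides can be combined into one small helper. The following is a minimal sketch, not a definitive implementation: `speakText`, its option names, and the injectable `env` argument are hypothetical and exist only so the example can be feature-detected and exercised outside a browser.

```javascript
// Sketch: queue one utterance with explicit rate, pitch, and volume.
// speakText, its options object, and the env parameter are hypothetical helpers.
function speakText(text, options = {}, env = globalThis) {
  // Feature-detect first: speech synthesis is not available everywhere.
  if (!env.speechSynthesis || !env.SpeechSynthesisUtterance) {
    return null;
  }
  const msg = new env.SpeechSynthesisUtterance(text);
  msg.lang = options.lang || 'en-US';
  msg.rate = options.rate ?? 1;     // 0.1 (lowest) to 10 (highest)
  msg.pitch = options.pitch ?? 1;   // 0 (lowest) to 2 (highest)
  msg.volume = options.volume ?? 1; // 0 (lowest) to 1 (highest)
  // Forward the end and error events to optional callbacks.
  msg.onend = () => {
    if (options.onEnd) options.onEnd(msg);
  };
  msg.onerror = (event) => {
    if (options.onError) options.onError(event);
  };
  env.speechSynthesis.speak(msg); // adds the utterance to the queue
  return msg;
}
```

In a browser this could be called as `speakText("I'm sorry, Dave. I'm afraid I can't do that", { rate: 0.9 })`; when the API is missing the helper returns `null` so the page can fall back to visual output.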
  • 33. "I'm sorry, Dave. I'm afraid I can't do that"
  • 34. “Open the pod bay door” • Cortana • “I’m sorry, Dave. I’m afraid I can’t do that.” • Alexa • “I’m sorry Dave. I’m afraid I can’t do that. I’m not HAL, and we’re not in space!” • Siri • “We intelligent agents will never live that down, apparently”
  • 35. Voices The SpeechSynthesisVoice interface represents a voice that the system supports. Properties: • default – indicates default voice for current app language • lang – BCP 47 language tag • localService – indicates if voice supplied by local speech synthesizer service • name – human-readable name that represents voice • voiceURI – location of speech synthesis service
  • 36. Voices by Platform • Chrome • Google US English • … • Mac • Samantha • Alex • … • Windows 10 • Microsoft David Desktop • Microsoft Zira Desktop • …
  • 38. “Samantha” voiceURI • Chrome/Opera • Samantha • Safari • com.apple.speech.synthesis.voice.samantha • com.apple.speech.synthesis.voice.samantha.premium • Firefox • urn:moz-tts:osx:com.apple.speech.synthesis.voice.samantha.premium
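Because the same voice reports a different voiceURI in each browser, matching on `name` and `lang` tends to be more portable. The sketch below assumes a hypothetical `pickVoice` helper that takes the array returned by `getVoices()`; the fallback order (name, then language, then the default voice) is an illustrative choice, not something the slides prescribe.

```javascript
// Sketch: choose a voice by name, then by language, then fall back to the default.
// pickVoice is a hypothetical helper; voices is the array from speechSynthesis.getVoices().
function pickVoice(voices, { name, lang } = {}) {
  const byName = name && voices.find((v) => v.name === name);
  if (byName) return byName;
  const byLang = lang && voices.find((v) => v.lang === lang);
  if (byLang) return byLang;
  // Fall back to the voice flagged as default, or the first one listed.
  return voices.find((v) => v.default) || voices[0] || null;
}

// Browser usage (commented out so the sketch also loads outside a browser):
// const voices = window.speechSynthesis.getVoices();
// msg.voice = pickVoice(voices, { name: 'Samantha', lang: 'en-US' });
```

Some browsers populate the voice list asynchronously, so `getVoices()` may return an empty array until the `voiceschanged` event has fired; the helper returns `null` in that case.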
  • 39. Google App’s New Voice Team included a Voice Coach and Linguist working in a recording studio Source: “The Google App’s New Voice - #NatAndLoEp 12”
  • 42. SSML • Speech Synthesis Markup Language (SSML) • Version 1.0; W3C Recommendation 7 September 2004 • XML-based markup language for assisting the generation of synthetic speech • Standard way to control aspects of speech such as pronunciation, volume, pitch, rate, etc. https://www.w3.org/TR/speech-synthesis/
  • 43. SSML Example
        <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
          <p>
            Your <say-as interpret-as="ordinal"> 1st </say-as> request was for
            <say-as interpret-as="cardinal"> 1 </say-as> room on
            <say-as interpret-as="date" format="mdy"> 10/19/2010 </say-as>,
            with early arrival at
            <say-as interpret-as="time" format="hms12"> 12:35pm </say-as>.
          </p>
        </speak>
  • 44. OS X Embedded Speech Commands Allows precise adjustments to pronunciation, word emphasis, and overall cadence of speech Examples: • char NORM | LTRL • emph + | - • inpt TEXT | PHON | TUNE • nmbr NORM | LTRL • rate [+ | -] <RealValue> Source: Speech Synthesis in OS X
  • 46. TK-421 Text • TK-4 2 1 SSML • TK-<say-as interpret-as="digits">421</say-as> OS X Commands • TK-[[nmbr LTRL]]421[[nmbr NORM]] http://starwars.wikia.com/wiki/TK-421
  • 47. 867-5309 Text • 867-5309 SSML • <say-as interpret-as="telephone">8675309</say-as> OS X Commands • [[nmbr LTRL]]8675309[[nmbr NORM]] https://en.wikipedia.org/wiki/867-5309/Jenny
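The SSML fragments in the two examples above can be generated rather than hand-written. This is a sketch under stated assumptions: `escapeXml`, `sayAs`, and `toSsml` are hypothetical helper names, and whether a given browser's speech engine actually honors SSML passed as utterance text varies, so the output should be tested per platform.

```javascript
// Sketch: build SSML strings like the say-as examples on the slides.
// escapeXml, sayAs, and toSsml are hypothetical helpers, not part of any API.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Wrap text in a say-as element with the given interpret-as hint.
function sayAs(text, interpretAs) {
  return `<say-as interpret-as="${escapeXml(interpretAs)}">${escapeXml(text)}</say-as>`;
}

// Wrap an already-built SSML body in the speak root element.
function toSsml(body, lang = 'en-US') {
  return `<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="${escapeXml(lang)}">${body}</speak>`;
}

const tk421 = toSsml('TK-' + sayAs('421', 'digits'));
const jenny = toSsml(sayAs('8675309', 'telephone'));
```

Escaping the text before embedding it keeps the document well-formed when the spoken content contains characters such as `&` or `<`.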
  • 48. Utterance    SSML                    OS X Commands
        lang         xml:lang=""             (none)
        pitch        <prosody pitch="">      pbas [+ | -] <RealValue>
        rate         <prosody rate="">       rate [+ | -] <RealValue>
        voice        <voice>                 (none)
        volume       <prosody volume="">     volm [+ | -] <RealValue>
  • 49. Spoken Output and Accessibility “It’s important to understand that adding synthesized speech to an application and making an application accessible to all users (a process called access enabling) are different processes with different goals.” Source: “Speech Synthesis in OS X”
  • 50. Speech Recognition Speech recognition (SR) is the inter-disciplinary sub-field of computational linguistics which incorporates knowledge and research in the linguistics, computer science, and electrical engineering fields to develop methodologies and technologies that enable the recognition and translation of spoken language into text by computers and computerized devices such as those categorized as smart technologies and robotics. It is also known as "automatic speech recognition" (ASR), "computer speech recognition", or just "speech to text" (STT). Source: Wikipedia
  • 51. SpeechRecognition The SpeechRecognition interface is the controller interface for the recognition service; this also handles the SpeechRecognitionEvent sent from the recognition service.
  • 52. Properties • grammars – returns and sets a collection of SpeechGrammar objects that represent the grammars that will be understood by the current SpeechRecognition • lang – returns and sets the language of the current SpeechRecognition. If not specified, this defaults to the HTML lang attribute value, or the user agent's language setting if that isn't set either • continuous – controls whether continuous results are returned for each recognition, or only a single result. Defaults to single (false) • interimResults – controls whether interim results should be returned (true) or not (false). Interim results are results that are not yet final (i.e. the isFinal property is false) • maxAlternatives – sets the maximum number of SpeechRecognitionAlternatives provided per result (default value is 1) • serviceURI – specifies the location of the speech recognition service used by the current SpeechRecognition to handle the actual recognition (default is the user agent's default speech service)
  • 53. Events • onstart – fired when the speech recognition service has begun listening to incoming audio with intent to recognize grammars associated with the current SpeechRecognition • onaudiostart – fired when the user agent has started to capture audio • onsoundstart – fired when any sound (recognisable speech or not) has been detected • onspeechstart – fired when sound that is recognised by the speech recognition service as speech has been detected • onresult – fired when the speech recognition service returns a result: a word or phrase has been positively recognized and this has been communicated back to the app
  • 54. Events • onspeechend – fired when speech recognised by the speech recognition service has stopped being detected • onsoundend – fired when any sound (recognisable speech or not) has stopped being detected • onaudioend – fired when the user agent has finished capturing audio • onend – fired when the speech recognition service has disconnected • onnomatch – fired when the speech recognition service returns a final result with no significant recognition. This may involve some degree of recognition, which doesn't meet or exceed the confidence threshold • onerror – fired when a speech recognition error occurs
  • 55. Methods • start() – starts the speech recognition service listening to incoming audio with intent to recognize grammars associated with the current SpeechRecognition • stop() – stops the speech recognition service from listening to incoming audio, and attempts to return a SpeechRecognitionResult using the audio captured so far • abort() – stops the speech recognition service from listening to incoming audio, and doesn't attempt to return a SpeechRecognitionResult @hakanson 55
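The three methods map naturally onto a microphone button: start on the first press, stop on the second, and abort to cancel. The sketch below assumes a recognition object that supports addEventListener (SpeechRecognition is an EventTarget); the helper name createListeningToggle is hypothetical.

```javascript
// Hypothetical helper: tracks whether recognition is active, because calling
// start() on a recognition that has already started throws an InvalidStateError.
function createListeningToggle(recognition) {
  let active = false;

  // The 'end' event fires after stop(), after abort(), and after errors.
  recognition.addEventListener('end', function () {
    active = false;
  });

  return {
    isListening: function () { return active; },
    // Start if idle; otherwise stop and let the service return a result.
    toggle: function () {
      if (active) {
        recognition.stop();
      } else {
        recognition.start();
        active = true; // only reached if start() did not throw
      }
    },
    // Stop listening without asking for a result.
    cancel: function () { recognition.abort(); }
  };
}
```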
  • 56. JavaScript Example var recognition = new SpeechRecognition(); recognition.lang = 'en-US'; recognition.interimResults = false; recognition.maxAlternatives = 1; recognition.start(); @hakanson 56
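The example above assumes an unprefixed SpeechRecognition constructor. Chrome has exposed it under the prefixed name webkitSpeechRecognition, so a feature check is usually needed first. A minimal sketch; the function names getSpeechRecognition and createRecognition are hypothetical, and the scope parameter stands in for window.

```javascript
// Returns the SpeechRecognition constructor from `scope`, falling back to
// the webkit-prefixed name, or null if the browser offers neither.
function getSpeechRecognition(scope) {
  return scope.SpeechRecognition || scope.webkitSpeechRecognition || null;
}

// Hypothetical factory that applies the same settings as the slide.
function createRecognition(scope, lang) {
  const Ctor = getSpeechRecognition(scope);
  if (!Ctor) {
    return null; // caller should fall back to keyboard or touch input
  }
  const recognition = new Ctor();
  recognition.lang = lang || 'en-US';
  recognition.interimResults = false;
  recognition.maxAlternatives = 1;
  return recognition;
}
```

In a page this would be called as createRecognition(window); a null return is the cue to hide the microphone button.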
  • 57. SpeechRecognitionResult The SpeechRecognitionResult interface represents a single recognition match, which may contain multiple SpeechRecognitionAlternative objects. • isFinal – a Boolean that states whether this result is final (true) or not (false); if so, then this is the final time this result will be returned; if not, then this result is an interim result, and may be updated later on • length – returns the length of the "array", i.e. the number of SpeechRecognitionAlternative objects contained in the result (also referred to as "n-best alternatives") • item – a standard getter that allows SpeechRecognitionAlternative objects within the result to be accessed via array syntax @hakanson 57
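When interimResults is true, event.results mixes final and interim entries, and isFinal is what tells them apart. A minimal sketch of separating the two for display; the function name splitTranscripts is hypothetical, and it only reads the length, isFinal, index, and transcript members described on these slides.

```javascript
// Walks a SpeechRecognitionResultList-like object and concatenates the top
// alternative of each result, keeping final and interim text apart.
function splitTranscripts(results) {
  let finalText = '';
  let interimText = '';
  for (let i = 0; i < results.length; i++) {
    const top = results[i][0].transcript;
    if (results[i].isFinal) {
      finalText += top;
    } else {
      interimText += top;
    }
  }
  return { finalText: finalText, interimText: interimText };
}
```

A typical use is to render finalText normally and interimText greyed out, so the user can see that the interim words may still change.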
  • 58. SpeechRecognitionAlternative The SpeechRecognitionAlternative interface represents a single word that has been recognised by the speech recognition service • transcript – returns a string containing the transcript of the recognised word • confidence – returns a numeric estimate of how confident the speech recognition system is that the recognition is correct @hakanson 58
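With maxAlternatives greater than 1, the confidence value can be used to choose among the alternatives and to reject weak matches. A minimal sketch; the function name pickAlternative and any particular threshold value are assumptions for illustration.

```javascript
// Returns the alternative with the highest confidence from a
// SpeechRecognitionResult-like object, or null if the result is empty
// or the best confidence is below `minConfidence`.
function pickAlternative(result, minConfidence) {
  let best = null;
  for (let i = 0; i < result.length; i++) {
    if (best === null || result[i].confidence > best.confidence) {
      best = result[i];
    }
  }
  if (best === null || best.confidence < minConfidence) {
    return null; // treat as "not understood" and re-prompt the user
  }
  return best;
}
```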
  • 59. JavaScript Example recognition.onresult = function(event) { var color = event.results[0][0].transcript; diagnostic.textContent = 'Result received: ' + color + '.'; bg.style.backgroundColor = color; } @hakanson 59
  • 61. Grammars • A speech recognition grammar is a container of language rules that define a set of constraints that a speech recognizer can use to perform recognition. • A grammar helps in the following ways: • Limits Vocabulary • Customizes Vocabulary • Filters Recognized Results • Identifies Rules • Defines Semantics @hakanson 61 https://msdn.microsoft.com/en-us/library/hh378342(v=office.14).aspx
  • 62. SRGS • Speech Recognition Grammar Specification (SRGS) • Version 1.0; W3C Recommendation 16 March 2004 • Grammars are used so that developers can specify the words and patterns of words to be listened for by a speech recognizer • Augmented BNF (ABNF) or XML syntax • Modelled on the JSpeech Grammar Format specification [JSGF] @hakanson 62 https://www.w3.org/TR/speech-grammar/
  • 63. JSGF • JSpeech Grammar Format (JSGF) • W3C Note 05 June 2000 • Platform-independent, vendor-independent textual representation of grammars for use in speech recognition • Derived from the Java™ Speech API Grammar Format (Version 1.0, October, 1998) @hakanson 63
  • 64. SpeechGrammar The SpeechGrammar interface represents a set of words or patterns of words that we want the recognition service to recognize. Defined using JSpeech Grammar Format (JSGF). Other formats may also be supported in the future. • src – sets and returns a string containing the grammar from within the SpeechGrammar object instance • weight – sets and returns the weight of the SpeechGrammar object @hakanson 64
  • 65. JavaScript Example var grammar = '#JSGF V1.0; grammar colors; public <color> = aqua | azure | beige | bisque | black | blue | brown | chocolate | coral | crimson | cyan | fuchsia | ghostwhite | gold | goldenrod | gray | green | indigo | ivory | khaki | lavender | lime | linen | magenta | maroon | moccasin | navy | olive | orange | orchid | peru | pink | plum | purple | red | salmon | sienna | silver | snow | tan | teal | thistle | tomato | turquoise | violet | white | yellow ;'; var speechRecognitionList = new SpeechGrammarList(); speechRecognitionList.addFromString(grammar, 1); recognition.grammars = speechRecognitionList; @hakanson 65
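A long alternation like the color list is easier to maintain as an array that is turned into a JSGF string. A minimal sketch; the function name buildJsgfGrammar is hypothetical, and it only covers the flat "word | word" rule shape used on the slide.

```javascript
// Builds a single-rule JSGF grammar string of the form
// '#JSGF V1.0; grammar <name>; public <rule> = a | b | c ;'
function buildJsgfGrammar(grammarName, ruleName, words) {
  return '#JSGF V1.0; grammar ' + grammarName + '; ' +
    'public <' + ruleName + '> = ' + words.join(' | ') + ' ;';
}
```

The result can be passed to addFromString in place of the literal, for example speechRecognitionList.addFromString(buildJsgfGrammar('colors', 'color', colors), 1), where colors is the application's own array of color names.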
  • 66. “Alexa Skills Kit” Style Example (1 of 2) SampleUtterances.txt SetBackground {Color} SetBackground background {Color} SetBackground set background {Color} SetBackground set background to {Color} SetBackground set background as {Color} SetBackground set background color to {Color} SetBackground set background color as {Color} @hakanson 66
  • 67. “Alexa Skills Kit” Style Example (2 of 2) IntentSchema.json { "intents": [ { "intent": "SetBackground", "slots": [ { "name": "Color", "type": "LIST_OF_COLORS" } ] } ] } customSlotTypes/LIST_OF_COLORS aqua azure beige bisque black blue brown chocolate coral crimson cyan … @hakanson 67
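The same sample-utterance idea can be applied to a Web Speech API transcript in the browser. The sketch below is a naive regular-expression matcher written for illustration; it is not how the Alexa Skills Kit resolves intents, and the function names compileUtterances and matchIntent are hypothetical. Input lines follow the SampleUtterances.txt shape of an intent name followed by an utterance with {Slot} markers.

```javascript
// Escapes regular-expression metacharacters in literal utterance text.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Compiles lines such as 'SetBackground set background to {Color}'.
// Longer lines are tried first so that a bare '{Color}' pattern, which
// matches anything, does not shadow the more specific ones.
function compileUtterances(lines) {
  return lines.map(function (line) {
    const space = line.indexOf(' ');
    const slots = [];
    const pattern = line.slice(space + 1).trim().split(/(\{\w+\})/)
      .map(function (part) {
        const slot = /^\{(\w+)\}$/.exec(part);
        if (slot) {
          slots.push(slot[1]);
          return '(.+)';
        }
        return escapeRegExp(part);
      }).join('');
    return {
      intent: line.slice(0, space),
      slots: slots,
      length: line.length,
      regex: new RegExp('^' + pattern + '$', 'i')
    };
  }).sort(function (a, b) { return b.length - a.length; });
}

// Returns { intent, slots } for the first matching pattern, or null.
function matchIntent(compiled, transcript) {
  const text = transcript.trim();
  for (let i = 0; i < compiled.length; i++) {
    const match = compiled[i].regex.exec(text);
    if (match) {
      const values = {};
      compiled[i].slots.forEach(function (name, index) {
        values[name] = match[index + 1];
      });
      return { intent: compiled[i].intent, slots: values };
    }
  }
  return null;
}
```

The captured Color value should still be checked against the list of known colors before it is used, since the pattern accepts any words in the slot position.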
  • 68. Sample “OK, Google” Commands • Remind me to [do a task]. Ex.: "Remind me to get dog food at Target," will create a location-based reminder. "Remind me to take out the trash tomorrow morning," will give you a time-based reminder. • When's my next meeting? • How do I [task]? Ex.: "How do I make an Old Fashioned cocktail?" or "How do I fix a hole in my wall?" • If a song is playing, ask questions about the artist. For instance, "Where is she from?" (Android 6.0 Marshmallow) • To learn more about your surroundings, you can ask things like "What is the name of this place?" or "Show me movies at this place" or "Who built this bridge?" @hakanson 68 Source: “The complete list of 'OK, Google' commands”
  • 69. Natural Language Understanding •Speech to Text •Text to Meaning @hakanson 69
  • 70. NLP vs. FSM Natural language processing (NLP) is a field of computer science, artificial intelligence, and computational linguistics concerned with the interactions between computers and human (natural) languages. A finite-state machine (FSM) is a mathematical model of computation used to design both computer programs and sequential logic circuits. @hakanson 70 Source: Wikipedia
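For a command-style voice interface, the finite-state machine side of this comparison can be as small as a transition table. The following is a sketch only; the state names (idle, listening, speaking) and event names (wake, result, nomatch, error, done) are assumptions chosen for illustration and do not come from the slides.

```javascript
// Transition table: TRANSITIONS[state][event] gives the next state.
const TRANSITIONS = {
  idle: { wake: 'listening' },
  listening: { result: 'speaking', nomatch: 'idle', error: 'idle' },
  speaking: { done: 'idle' }
};

// Returns the next state, or the current state if the event is not
// valid in that state.
function nextState(state, event) {
  const has = Object.prototype.hasOwnProperty;
  if (has.call(TRANSITIONS, state) && has.call(TRANSITIONS[state], event)) {
    return TRANSITIONS[state][event];
  }
  return state;
}
```

Driving the table from recognition and synthesis events keeps the interface from listening while it is speaking, which a free-form handler can easily get wrong.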
  • 71. KITT vs Samsung smart home @hakanson 71 Source
  • 72. Other Speech APIs • Why? • Browser doesn’t support Web Speech API • Consistent experience across all browsers • Additional functionality not included in Web Speech API • How? • Web Audio API • JavaScript running in browser • WebSocket connection directly from browser • HTTP API proxied through server @hakanson 72
  • 73. Web Audio API The Web Audio API provides a powerful and versatile system for controlling audio on the Web, allowing developers to choose audio sources, add effects to audio, create audio visualizations, apply spatial effects (such as panning) and much more. @hakanson 73 https://developer.mozilla.org/en-US/docs/Web/API/Web_Audio_API
  • 74. Pocketsphinx.js Speech recognition in JavaScript • PocketSphinx.js is a speech recognizer that runs entirely in the web browser. It is built on: • a speech recognizer written in C (PocketSphinx) converted into JavaScript using Emscripten, • an audio recorder using the Web Audio API. @hakanson 74 https://syl22-00.github.io/pocketsphinx.js/live-demo.html
  • 75. IBM Watson Developer Cloud • Text to Speech • Watson Text to Speech provides a REST API to synthesize speech audio from an input of plain text. • Once synthesized in real-time, the audio is streamed back to the client with minimal delay. • Speech to Text • Uses machine intelligence to combine information about grammar and language structure with knowledge of the composition of an audio signal to generate an accurate transcription. • Accessed via a WebSocket connection or REST API. @hakanson 75 http://www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/services-catalog.html
  • 77. Microsoft Cognitive Services Speech API • Convert audio to text, understand intent, and convert text back to speech for natural responsiveness (rebranding of Bing and Project Oxford APIs) • Microsoft has used Speech API for Windows applications like Cortana and Skype Translator @hakanson 77 https://www.microsoft.com/cognitive-services/en-us/speech-api
  • 78. Microsoft Cognitive Services • Speech Recognition • Convert spoken audio to text. • Text to Speech • Convert text to spoken audio • Speech Intent Recognition • Convert spoken audio to intent • In addition to returning recognized text, includes structured information about the incoming speech @hakanson 78
  • 80. Google Cloud Speech API Enables developers to convert audio to text by applying powerful neural network models in an easy to use API • Over 80 Languages • Return Text Results In Real-Time • Accurate In Noisy Environments • Powered by Machine Learning @hakanson 80 https://cloud.google.com/speech/
  • 82. Summary • Speech Interfaces are the future… • and they have been for a long time… • and don’t believe everything you see on TV • Know your customer and your application • More UI/UX effort than JavaScript code • and time to leverage those writing and speaking skills • Web technology lags behind mobile, but is evolving @hakanson 82
  • 83. Thank You! Questions? Come over for a conversation. @hakanson 83 Source