Speaker: Aurelio De Rosa
Language: English
As web developers, our job is to build nice, fast, and reliable websites, web apps, or web services. But our role isn't limited to this: we have to build these products not only for our ideal users but for as wide a range of people as possible. Today's browsers help us achieve this goal by providing APIs created with this purpose in mind. One of these APIs is the Web Speech API, which provides speech input and text-to-speech output features in a web browser.
In this talk you'll learn what the Web Speech API is and how it can drastically improve the way users, especially those with disabilities, perform tasks in your web pages.
4Developers: http://4developers.org.pl/pl/
5. WHAT WE'LL COVER
Natural language processing (NLP)
Why it matters
The Web Speech API
Speech recognition
Speech synthesis
Issues and inconsistencies
Demo
6. NATURAL LANGUAGE PROCESSING (NLP)
A field of computer science, artificial intelligence, and linguistics
concerned with the interactions between computers and human
(natural) languages.
7. NATURAL LANGUAGE PROCESSING (NLP)
It all started in 1950 when Alan Turing published an article titled
“Computing Machinery and Intelligence” where he proposed
what is now called the Turing test.
11. VOICEXML
It's an XML language for writing Web pages you interact with by
listening to spoken prompts and other forms of audio that you
can control by providing spoken inputs.
Specifications: http://www.w3.org/TR/voicexml30/
13. JAVA APPLET
It's an application written in Java and delivered to users in the
form of bytecode through a web page. The applet is then
executed within a Java Virtual Machine (JVM) in a process
separated from the browser itself.
15. WHY YOU SHOULD CARE
A step toward closing the gap with native apps
Improves the user experience
A feature needed by some applications, such as navigation apps
Helps people with disabilities
16. “DEMO IT OR IT DIDN'T HAPPEN”™
Register to our website
Name:
Surname:
Nationality:
Start
This demo can be found at https://jsbin.com/faguji/watch?output
17. WEB SPEECH API
The Web Speech API allows you to deal with two aspects of the
computer-human interaction: Automatic Speech Recognition
(ASR) and Text-to-Speech (TTS).
Specifications: https://dvcs.w3.org/hg/speech-api/raw-file/tip/webspeechapi.html
18. WEB SPEECH API
Introduced at the end of 2012
Defines two interfaces: one for recognition and one for
synthesis
Requires the user's permission before acquiring audio
Agnostic of the underlying technology
19. SPEECH RECOGNITION
There are two types of recognition available: one-shot and
continuous. The first stops as soon as the user stops talking, while the
second must be stopped programmatically.
To instantiate a new speech recognizer you have to call the
SpeechRecognition() constructor (exposed as
webkitSpeechRecognition in Chrome):
var recognizer = new SpeechRecognition();
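As a concrete sketch of this step (the helper name createRecognizer is mine, not from the talk), handling the vendor prefix that Chrome requires:

```javascript
// Sketch: create a recognizer, falling back to the webkit-prefixed
// constructor that Chrome 25+ exposes. Returns null when unsupported.
function createRecognizer(global) {
  var SpeechRecognition =
    global.SpeechRecognition || global.webkitSpeechRecognition;
  if (!SpeechRecognition) {
    return null; // API not supported in this browser
  }
  var recognizer = new SpeechRecognition();
  recognizer.continuous = false;     // one-shot: stops when the user stops talking
  recognizer.interimResults = false; // only deliver final results
  recognizer.lang = 'en-US';
  return recognizer;
}

// In a browser: var recognizer = createRecognizer(window);
```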
20. SPEECH RECOGNITION: BROWSERS SUPPORT
Explorer: None | Chrome: 25+ (-webkit) | Safari: None | Firefox: None | Opera: None
Data updated to 18th April 2015
25. SPEECH RECOGNITION: RESULTS
Results are obtained as an object (that implements the
SpeechRecognitionEvent interface) passed as the first
argument of the handler attached to the result event.
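A sketch of such a handler (the helper name bestTranscript is mine); each entry in the event's results list holds one or more alternatives, each carrying a transcript and a confidence score:

```javascript
// Sketch: extract the top alternative from a speech result event.
function bestTranscript(event) {
  var result = event.results[event.resultIndex];
  return {
    transcript: result[0].transcript,
    confidence: result[0].confidence
  };
}

// In a browser:
// recognizer.onresult = function(event) {
//   console.log(bestTranscript(event).transcript);
// };
```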
26. PROBLEM: SOMETIMES RECOGNITION SUCKS!
Imagine a user of your website or web app says a command but
the recognizer returns the wrong string. Your system is good and
it asks the user to repeat it, but the recognition fails again.
How can you get out of this loop?
29. LEVENSHTEIN DISTANCE: EXAMPLE
Commands available: "Send email", "Call"
Names in the phonebook: "Aurelio De Rosa", "Annarita
Tranfici", "John Doe"
Recognized text:
Updated text:
Start
This demo can be found at https://jsbin.com/tevogu/watch?output
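The matching strategy behind this example can be sketched with a classic dynamic-programming Levenshtein distance plus a helper that picks the closest known command (my own sketch, not the talk's actual code):

```javascript
// Sketch: Levenshtein edit distance between two strings, using two
// rolling rows of the DP table to keep memory at O(min length).
function levenshtein(a, b) {
  var m = a.length, n = b.length;
  var prev = [], curr = [], tmp;
  for (var j = 0; j <= n; j++) prev[j] = j;
  for (var i = 1; i <= m; i++) {
    curr[0] = i;
    for (var k = 1; k <= n; k++) {
      var cost = a[i - 1] === b[k - 1] ? 0 : 1;
      curr[k] = Math.min(prev[k] + 1,        // deletion
                         curr[k - 1] + 1,    // insertion
                         prev[k - 1] + cost); // substitution
    }
    tmp = prev; prev = curr; curr = tmp;
  }
  return prev[n];
}

// Sketch: map a (possibly misrecognized) phrase to the closest command.
function closestCommand(recognized, commands) {
  var best = null, bestDistance = Infinity;
  commands.forEach(function(command) {
    var d = levenshtein(recognized.toLowerCase(), command.toLowerCase());
    if (d < bestDistance) {
      bestDistance = d;
      best = command;
    }
  });
  return best;
}
```

So a recognizer output like "send e-mail" can still be mapped to the "Send email" command instead of forcing the user back into the repeat loop.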
31. SPEECH SYNTHESIS
Provides text-to-speech functionality in the browser. This is
especially useful for blind people and those with visual
impairments in general.
The feature is exposed via a speechSynthesis object that
possesses static methods.
32. SPEECH SYNTHESIS: BROWSERS SUPPORT
Explorer: None | Chrome: 33+ | Safari: 7+ | Firefox: None | Opera: 27+
Data updated to 18th April 2015
37. SPEECH SYNTHESIS: UTTERANCE INTERFACE
The SpeechSynthesisUtterance interface represents the
utterance (i.e. the text) that will be spoken by the synthesizer.
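A sketch of how an utterance might be configured and spoken (the factory makeSpeaker and its parameter are mine, written this way so the browser globals can be injected):

```javascript
// Sketch: build a speak() function bound to a given environment
// (window in the browser). Returns null when synthesis is unsupported.
function makeSpeaker(env) {
  if (!env.speechSynthesis || !env.SpeechSynthesisUtterance) {
    return null;
  }
  return function speak(text) {
    var utterance = new env.SpeechSynthesisUtterance(text);
    utterance.lang = 'en-US';
    utterance.rate = 1;  // speaking rate, 0.1 to 10
    utterance.pitch = 1; // pitch, 0 to 2
    env.speechSynthesis.speak(utterance);
    return utterance;
  };
}

// In a browser: var speak = makeSpeaker(window); speak('You all rock!');
```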
43. SPEECH SYNTHESIS: DEMO IT MAN!
I know my voice isn't very sexy, but I still want to say that this
conference is wonderful and the audience of my talk is even
better. You all rock!
This demo can be found at https://jsbin.com/cipepa/watch?output
46. INTERACTIVE FORM: STEP 1 - HTML
<form id="form">
<label for="name"
data-question="What's your name?">Name:</label>
<input id="name" />
<label for="surname"
data-question="What's your surname?">Surname:</label>
<input id="surname" />
<!-- Other label/element pairs here -->
<input id="btn-voice" type="submit" value="Start" />
</form>
47. INTERACTIVE FORM: STEP 2 - SUPPORT LIBRARY
Create a Speech object containing two methods, speak and
recognize, that each return a Promise.
var Speech = {
speak: function(text) {
return new Promise(function(resolve, reject) {...});
},
recognize: function() {
return new Promise(function(resolve, reject) {...});
}
};
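With such an object in place, the form dialogue can be driven by chaining the two promises. A sketch (the function name askQuestion is mine; the Speech object is passed in explicitly rather than used as a global):

```javascript
// Sketch: speak the question attached to a label, then listen for
// the answer and copy it into the associated input element.
function askQuestion(speech, label, input) {
  return speech
    .speak(label.getAttribute('data-question'))
    .then(function() {
      return speech.recognize();
    })
    .then(function(answer) {
      input.value = answer;
      return answer;
    });
}

// In a browser:
// askQuestion(Speech, document.querySelector('label[for="name"]'),
//             document.getElementById('name'));
```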