Speaker: Aurelio De Rosa
Language: English
As web developers, our job is to build nice, fast, and reliable websites, web apps, or web services. But our role isn't limited to this: we have to build these products not only for our ideal users but for as wide a range of people as possible. Today's browsers help us achieve this goal by providing APIs created with this purpose in mind. One of these APIs is the Web Speech API, which provides speech input and text-to-speech output features in a web browser.
In this talk you'll learn what the Web Speech API is and how it can drastically improve the way users, especially those with disabilities, perform tasks in your web pages.
4Developers: http://4developers.org.pl/pl/
5. WHAT WE'LL COVER
Natural language processing (NLP)
Why it matters
The Web Speech API
Speech recognition
Speech synthesis
Issues and inconsistencies
Demo
6. NATURAL LANGUAGE PROCESSING (NLP)
A field of computer science, artificial intelligence, and linguistics
concerned with the interactions between computers and human
(natural) languages.
7. NATURAL LANGUAGE PROCESSING (NLP)
It all started in 1950 when Alan Turing published an article titled
“Computing Machinery and Intelligence” where he proposed
what is now called the Turing test.
11. VOICEXML
It's an XML language for writing Web pages you interact with by
listening to spoken prompts and other forms of audio that you
can control by providing spoken inputs.
Specifications: http://www.w3.org/TR/voicexml30/
13. JAVA APPLET
It's an application written in Java and delivered to users in the
form of bytecode through a web page. The applet is then
executed within a Java Virtual Machine (JVM) in a process
separated from the browser itself.
15. WHY YOU SHOULD CARE
A step toward closing the gap with native apps
Improves the user experience
A feature needed by some applications, such as navigation apps
Helps people with disabilities
16. “DEMO IT OR IT DIDN'T HAPPEN”™
Register to our website
Name:
Surname:
Nationality:
Start
This demo can be found at https://jsbin.com/faguji/watch?output
17. WEB SPEECH API
The Web Speech API allows you to deal with two aspects of the
computer-human interaction: Automatic Speech Recognition
(ASR) and Text-to-Speech (TTS).
Specifications: https://dvcs.w3.org/hg/speech-api/raw-file/tip/webspeechapi.html
18. WEB SPEECH API
Introduced at the end of 2012
Defines two interfaces: one for recognition and one for
synthesis
Requires the user's permission before acquiring audio
Agnostic of the underlying technology
19. SPEECH RECOGNITION
There are two types of recognition available: one-shot and
continuous. The first stops as soon as the user stops talking, while the
second must be stopped programmatically.
To instantiate a new speech recognizer you have to call the
SpeechRecognition() constructor (exposed as
webkitSpeechRecognition in Chrome):
var recognizer = new SpeechRecognition();
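As a concrete sketch of this step (the helper name createRecognizer is mine, not from the talk), handling the vendor prefix that Chrome requires:

```javascript
// Sketch: create a recognizer, falling back to the webkit-prefixed
// constructor that Chrome 25+ exposes. Returns null when unsupported.
function createRecognizer(global) {
  var SpeechRecognition =
    global.SpeechRecognition || global.webkitSpeechRecognition;
  if (!SpeechRecognition) {
    return null; // API not supported in this browser
  }
  var recognizer = new SpeechRecognition();
  recognizer.continuous = false;     // one-shot: stops when the user stops talking
  recognizer.interimResults = false; // only deliver final results
  recognizer.lang = 'en-US';
  return recognizer;
}

// In a browser: var recognizer = createRecognizer(window);
```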
20. SPEECH RECOGNITION: BROWSERS SUPPORT
Explorer: None | Chrome: 25+ (-webkit) | Safari: None | Firefox: None | Opera: None
Data updated to 18th April 2015
25. SPEECH RECOGNITION: RESULTS
Results are obtained as an object (that implements the
SpeechRecognitionEvent interface) passed as the first
argument of the handler attached to the result event.
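A sketch of such a handler (the helper name bestTranscript is mine); each entry in the event's results list holds one or more alternatives, each carrying a transcript and a confidence score:

```javascript
// Sketch: extract the top alternative from a speech result event.
function bestTranscript(event) {
  var result = event.results[event.resultIndex];
  return {
    transcript: result[0].transcript,
    confidence: result[0].confidence
  };
}

// In a browser:
// recognizer.onresult = function(event) {
//   console.log(bestTranscript(event).transcript);
// };
```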
26. PROBLEM: SOMETIMES RECOGNITION SUCKS!
Imagine a user of your website or web app says a command but
the recognizer returns the wrong string. Your system is good and
it asks the user to repeat it, but the recognition fails again.
How can you get out of this loop?
29. LEVENSHTEIN DISTANCE: EXAMPLE
Commands available: "Send email", "Call"
Names in the phonebook: "Aurelio De Rosa", "Annarita
Tranfici", "John Doe"
Recognized text:
Updated text:
Start
This demo can be found at https://jsbin.com/tevogu/watch?output
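The matching strategy behind this example can be sketched with a classic dynamic-programming Levenshtein distance plus a helper that picks the closest known command (my own sketch, not the talk's actual code):

```javascript
// Sketch: Levenshtein edit distance between two strings, using two
// rolling rows of the DP table to keep memory at O(min length).
function levenshtein(a, b) {
  var m = a.length, n = b.length;
  var prev = [], curr = [], tmp;
  for (var j = 0; j <= n; j++) prev[j] = j;
  for (var i = 1; i <= m; i++) {
    curr[0] = i;
    for (var k = 1; k <= n; k++) {
      var cost = a[i - 1] === b[k - 1] ? 0 : 1;
      curr[k] = Math.min(prev[k] + 1,        // deletion
                         curr[k - 1] + 1,    // insertion
                         prev[k - 1] + cost); // substitution
    }
    tmp = prev; prev = curr; curr = tmp;
  }
  return prev[n];
}

// Sketch: map a (possibly misrecognized) phrase to the closest command.
function closestCommand(recognized, commands) {
  var best = null, bestDistance = Infinity;
  commands.forEach(function(command) {
    var d = levenshtein(recognized.toLowerCase(), command.toLowerCase());
    if (d < bestDistance) {
      bestDistance = d;
      best = command;
    }
  });
  return best;
}
```

So a recognizer output like "send e-mail" can still be mapped to the "Send email" command instead of forcing the user back into the repeat loop.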
31. SPEECH SYNTHESIS
Provides text-to-speech functionality in the browser. This is
especially useful for blind people and those with visual
impairments in general.
The feature is exposed via a speechSynthesis object that
possesses static methods.
32. SPEECH SYNTHESIS: BROWSERS SUPPORT
Explorer: None | Chrome: 33+ | Safari: 7+ | Firefox: None | Opera: 27+
Data updated to 18th April 2015
37. SPEECH SYNTHESIS: UTTERANCE INTERFACE
The SpeechSynthesisUtterance interface represents the
utterance (i.e. the text) that will be spoken by the synthesizer.
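A sketch of how an utterance might be configured and spoken (the factory makeSpeaker and its parameter are mine, written this way so the browser globals can be injected):

```javascript
// Sketch: build a speak() function bound to a given environment
// (window in the browser). Returns null when synthesis is unsupported.
function makeSpeaker(env) {
  if (!env.speechSynthesis || !env.SpeechSynthesisUtterance) {
    return null;
  }
  return function speak(text) {
    var utterance = new env.SpeechSynthesisUtterance(text);
    utterance.lang = 'en-US';
    utterance.rate = 1;  // speaking rate, 0.1 to 10
    utterance.pitch = 1; // pitch, 0 to 2
    env.speechSynthesis.speak(utterance);
    return utterance;
  };
}

// In a browser: var speak = makeSpeaker(window); speak('You all rock!');
```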
43. SPEECH SYNTHESIS: DEMO IT MAN!
I know my voice isn't very sexy, but I still want to say that this
conference is wonderful and the audience of my talk is even
better. You all rock!
This demo can be found at https://jsbin.com/cipepa/watch?output
46. INTERACTIVE FORM: STEP 1 - HTML
<form id="form">
<label for="name"
data-question="What's your name?">Name:</label>
<input id="name" />
<label for="surname"
data-question="What's your surname?">Surname:</label>
<input id="surname" />
<!-- Other label/element pairs here -->
<input id="btn-voice" type="submit" value="Start" />
</form>
47. INTERACTIVE FORM: STEP 2 - SUPPORT LIBRARY
Create a Speech object containing two methods, speak and
recognize, that each return a Promise.
var Speech = {
speak: function(text) {
return new Promise(function(resolve, reject) {...});
},
recognize: function() {
return new Promise(function(resolve, reject) {...});
}
};
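With such an object in place, the form dialogue can be driven by chaining the two promises. A sketch (the function name askQuestion is mine; the Speech object is passed in explicitly rather than used as a global):

```javascript
// Sketch: speak the question attached to a label, then listen for
// the answer and copy it into the associated input element.
function askQuestion(speech, label, input) {
  return speech
    .speak(label.getAttribute('data-question'))
    .then(function() {
      return speech.recognize();
    })
    .then(function(answer) {
      input.value = answer;
      return answer;
    });
}

// In a browser:
// askQuestion(Speech, document.querySelector('label[for="name"]'),
//             document.getElementById('name'));
```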