Moshe Yudkowsky's Presentation at Emerging Communication Conference & Awards 2009 Europe


This document summarizes a presentation on practical applications of speech technology. It discusses speech recognition, text-to-speech, biometrics, and data analytics. For speech recognition, call-center systems with simple, standard answers work very well and are inexpensive, while dictation and highly personal answers cost more. Text-to-speech requires language understanding and the expansion of abbreviations and jargon. Speaker verification is practical but still rare, while speaker identification and characterization are practical and kept secret. Data analytics through mining is surprisingly useful but expensive and not real-time. The document also lists sponsors of the conference.


2009 | Westergasfabriek | Amsterdam | http://eComm.ec

Practical Edge of Speech Technology

Moshe Yudkowsky
www.Disaggregate.com

“Practical” is Relative

Affordable

Schedule

Achievable


Core Technology

Engines: Speech Recognition (ASR), Text-to-Speech (TTS), Biometrics, Thynometrics (emotions)
Analytics: Data mining, problem discovery

Two 20-second Exercises

Exercise 1: Travel Agency Automated Reservations
Exercise 2: Twitter Update of eComm Conference

Lessons

Exercise 1: Everyone has the same & simple answers
  Call centers; standard device commands
  Speaker Independent

Exercise 2: Highly Personal Answers
  Dictation; voice search
  Speaker Dependent or Speaker Independent

Network Hardware for Speaker Independent

Network-based systems: Your equipment (“Premises”)

Network-based systems: “Hosted”

Local Hardware

Device-based systems
ASR
Results
Local Recognition
Known text
Complex, personal text

Device-based systems: Hybrid
Voice
Results
Voice to server, data back to device
Speaker independent (?) ASR
Engines: Speech Recognition (ASR)

Summary: You can do almost anything, but the more you do, the more you pay.

Telephony ASR is excellent:

Inexpensive: “What city?” answered by “Amsterdam”
Very expensive: “What is wrong with your phone?” answered by “I dropped it on the floor, and the screen is cracked, and now I can’t see anything.”
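
One way to picture why the “What city?” prompt is the inexpensive case: the answer only has to be matched against a small, closed list. The sketch below is an illustration under stated assumptions, not how any telephony ASR engine works internally. It starts from already-recognized text rather than audio, and the city list, the `match_city` name, and the 0.75 cutoff are all hypothetical.

```python
from difflib import get_close_matches

# Hypothetical closed grammar for the "What city?" prompt.
CITIES = ["Amsterdam", "Rotterdam", "Berlin", "London", "Paris"]


def match_city(hypothesis, cities=CITIES, cutoff=0.75):
    """Return the grammar entry closest to the recognizer's text, or None."""
    lookup = {city.lower(): city for city in cities}
    hits = get_close_matches(
        hypothesis.strip().lower(), list(lookup), n=1, cutoff=cutoff
    )
    return lookup[hits[0]] if hits else None


print(match_city("amsterdam"))                  # a short answer fits the grammar
print(match_city("I dropped it on the floor"))  # free-form speech does not
```

The open-ended answer falls outside any small grammar, which is roughly why the second question on the slide needs a far more capable (and more expensive) recognizer.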

Cautions

No such thing as “speech to text”
Speaker dependent comes closest
Voicemail to text: human assisted
Some telephone ASR is also human assisted

Speaker Dependent

Desktop computers can do excellent transcription, but it still needs corrections
Hand-held devices now have more memory & power → better ASR

Engines: Text-to-speech (TTS)

Summary: Available in many languages, reasonable quality, sometimes difficult to understand.

TTS requires language understanding and specific jargon translation:
“Mr.” → “Mister”
“bbl” → “Be Back Later”
“287 m” → “about 300 meters”
Custom voices available
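
The three rewrites on this slide can be sketched as a small text-normalization pass that runs before synthesis. This is a minimal illustration, not the front end of any particular TTS product. The lookup tables, the function names, and the choice to round distances to the nearest hundred are assumptions made for the example.

```python
import re

# Hypothetical lookup tables; a real TTS front end would need far larger ones.
ABBREVIATIONS = {"Mr.": "Mister"}
JARGON = {"bbl": "be back later"}


def round_distance(match):
    """Speak a metric distance as a rounded, listener-friendly figure."""
    meters = int(match.group(1))
    # Nearest hundred (287 -> 300); keep small values that would round to 0.
    rounded = int(round(meters, -2)) or meters
    return f"about {rounded} meters"


def normalize(text):
    """Expand abbreviations, jargon, and distances before synthesis."""
    for short, spoken in ABBREVIATIONS.items():
        text = text.replace(short, spoken)
    for short, spoken in JARGON.items():
        text = re.sub(rf"\b{re.escape(short)}\b", spoken, text)
    return re.sub(r"\b(\d+) m\b", round_distance, text)


print(normalize("Mr. Smith: bbl, the station is 287 m away."))
# -> Mister Smith: be back later, the station is about 300 meters away.
```

Even this toy version shows why the slide calls for language understanding: whether “m” means meters, or whether a listener wants “287” or “about 300”, depends on context that a lookup table alone cannot supply.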

Engines: Biometrics (Speaker Identification, Speaker Verification, Speaker Characterization)

Summary: Speaker verification practical but still rare; speaker identification & characterization practical and secret

Speaker Verification (is that really you?)
Available, practical
Rare in the US, more prevalent in Australia, Israel, and Canada
Roadblocks: valid fear; fear of biometrics; love of fingerprints; only part of complete solution

•Speaker Identification (who are you?)
•Speaker Characterization (what are you?)
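
The difference between verification (“is that really you?”) and identification (“who are you?”) can be sketched as a one-to-one check versus a one-to-many search. The code below is a toy illustration only. Real systems compare acoustic models, not three-number vectors, and the cosine similarity, the `ENROLLED` table, and the 0.95 threshold are assumptions chosen for the example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical enrolled voiceprints (toy vectors, not real voice features).
ENROLLED = {
    "alice": [0.9, 0.1, 0.3],
    "bob": [0.2, 0.8, 0.5],
}


def verify(claimed_name, sample, threshold=0.95):
    """Verification: 1-to-1 check of a claimed identity against one voiceprint."""
    voiceprint = ENROLLED.get(claimed_name)
    return voiceprint is not None and cosine(voiceprint, sample) >= threshold


def identify(sample):
    """Identification: 1-to-N search for the closest enrolled speaker."""
    return max(ENROLLED, key=lambda name: cosine(ENROLLED[name], sample))


sample = [0.85, 0.15, 0.35]
print(verify("alice", sample), verify("bob", sample), identify(sample))
```

Verification needs a claimed identity and an enrolled user, which is where the enrollment and lock-out handling mentioned in the speaker notes comes in. Identification needs neither, so it can run without the speaker taking part.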

Analytics: Data mining, problem discovery

Summary: Surprisingly useful, expensive

Not a real-time process
Word searches, “speech to text”
Emotion detection by ASR (swearing) and by thynometrics (pitch, volume)
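
Two of the cues named on this slide can be sketched as an offline pass over recorded calls: a word search over the ASR transcript, and a loudness measure over the audio samples. This is a rough illustration, not a description of any analytics product. The flagged word list, the function names, and the 0.5 volume limit are hypothetical, and pitch is left out because estimating it takes more than a few lines.

```python
import math

# Hypothetical word list; "darn" and "heck" stand in for real swear words.
FLAGGED_WORDS = {"darn", "heck"}


def flagged_word_count(transcript):
    """Word search over an ASR transcript of a recorded call."""
    words = (w.strip(".,!?").lower() for w in transcript.split())
    return sum(1 for w in words if w in FLAGGED_WORDS)


def rms_volume(samples):
    """Root-mean-square amplitude, a rough stand-in for loudness."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def looks_upset(transcript, samples, volume_limit=0.5):
    """Flag a call when either cue fires; the limit is an arbitrary choice."""
    return flagged_word_count(transcript) > 0 or rms_volume(samples) > volume_limit


print(looks_upset("Where the heck is my refund?", [0.1, -0.2, 0.1]))
print(looks_upset("Thanks, that works.", [0.1, -0.1, 0.05]))
```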

About Disaggregate

Moshe Yudkowsky
Disaggregate
2952 W. Fargo
Chicago, IL 60645
+1 773 764 8727
www.Disaggregate.com

Headline Sponsor

Platinum Sponsors

Gold Sponsors




Editor's notes

Other topics: APIs, IDE, Grammar building tools, VUI tools
1. Ask the person next to you a question as if you were an airline reservations system. Find out what city he wants to fly to.
2. Ask the person next to you for a Twitter update of the conference.
Google, for example, does voicemail transcriptions, poorly.
Practical deployment configurations
The telco server is also hosted. The voice of the user (the “utterance”) must have a good, clean path to the recognition system.
Known text: address book, firmware. Complex: dictation, add-on.
Not practical in the network: who is using the phone?
We have reviewed the hardware and the types of recognition. I will now review some more specific details about recognition.
Not magic. You still have to manage the data; enroll users; deal with users who are locked out; etc.
