1 / 28
http://www.flickr.com/photos/joshmichtom/4311110421/
Guardian:
A Crowd-Powered Spoken Dialog
System for Web APIs
Ting-Hao (Kenneth) Huang
Carnegie Mellon University
Walter S. Lasecki
University of Michigan
Jeffrey P. Bigham
Carnegie Mellon University
2 / 28
What time is it?
It’s 9:30.
Kenneth’s apartment.
3 / 28
How was the Pirates
game last night?
!
Kenneth’s apartment.
4 / 28
How was the Steelers
game yesterday?
!
Kenneth’s apartment.
5 / 28
Is the movie Martian
still playing in theaters?
!
Kenneth’s apartment.
6 / 28
Use Web APIs to Empower
Dialog Systems
7 / 28
Gap between User & Machine
?
8 / 28
A Crowdsourcing Solution
9 / 28
Two Challenges
term
location
Hi, I’m in San Diego.
Any Chinese restaurants here?
Define Parameters
Extract Parameters
10 / 28
How Do Dialog Systems Usually Do It?
term
location
Hi, I’m in San Diego.
Any Chinese restaurants here?
Define Parameters
Extract Parameters
11 / 28
Bridging this Gap is Expensive
• Define Parameters requires Experts
– Experts are expensive.
– Most services are not designed for dialog systems.
– Unsupervised Slot Induction
• Extract Parameters requires Data
– (Which we don’t have.)
– Supervised Slot Filling
• Slot Filling / Entity Recognition
– No labeled data
• State Tracking
– No dialogue data
– Unsupervised Slot Filling
12 / 28
Can the Crowd Do It?
term
location
Hi, I’m in San Diego.
Any Chinese restaurants here?
Define Parameters
Extract Parameters
13 / 28
Define Parameters
term
location
Hi, I’m in San Diego.
Any Chinese restaurants here?
Define Parameters
Extract Parameters
14 / 28
How do machines understand a Web API?
1. Which parameters to use?
2. What questions to ask the user
to elicit these parameters?
Yelp Search API 2.0 has
22 parameters.
15 / 28
Parameter Rating Problem
offset
term
location
sw_latitude
sw_longitude
category_filter
accuracy
deals_filter
radius_filter
...
offset
term
location
sw_latitude
sw_longitude
category_filter
accuracy
deals_filter
radius_filter
...
Pick good parameters for the dialog system.
16 / 28
How about just doing a survey?
Task
Parameter Name / Desc
17 / 28
Baselines
[Bar chart: MAP and MRR, on a 0 to 1 axis, for three baselines: Not Unnatural, Ask Siri, Ask a Friend]
Average results of 8 Web APIs’ parameters
Results are not so good...
18 / 28
Match Questions with Parameters
[Figure: pipeline with the stages Yelp API, Question Collection, Parameter Filtering, and Question-Parameter Matching, leading to Better Parameter]
Collected question / answer pairs: “What do you want to eat?” / “I like Chinese food.”; “Which city are you in?” / “I’m in Pittsburgh.”; “Is it dinner or lunch?” / “Dinner.”; ...
Parameters shown with matched questions: offset, location, term, category_filter
Better Parameter: term, location, sw_latitude, sw_longitude, category_filter
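The matching stage on this slide can be illustrated with a small sketch. It assumes, without confirmation from the slides, that each collected question is matched with at most one parameter and that parameters are then ranked by how many questions they attracted. The function name `rank_parameters`, the fourth and fifth sample questions, and the question-to-parameter pairings are all hypothetical.

```python
from collections import Counter
from typing import List, Optional, Tuple


def rank_parameters(matches: List[Tuple[str, Optional[str]]]) -> List[Tuple[str, int]]:
    """Rank parameters by the number of questions matched to them.

    Assumption: counting matches is only one plausible scoring rule;
    the slides do not state the rule that was actually used.
    """
    counts = Counter(param for _, param in matches if param is not None)
    # Sort by descending count, then by name so ties are deterministic.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Illustrative (question, matched parameter) pairs; None means "no match".
matches = [
    ("What do you want to eat?", "term"),
    ("Which city are you in?", "location"),
    ("Is it dinner or lunch?", "category_filter"),
    ("Where are you now?", "location"),  # hypothetical extra question
    ("How are you today?", None),        # hypothetical unmatched question
]

ranking = rank_parameters(matches)
for name, votes in ranking:
    print(name, votes)
```

Parameters that no natural question maps to, such as offset, never enter the ranking under this rule, which is the filtering effect the slide points at.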
19 / 28
Evaluation on Parameter Ranking
[Bar chart: MAP and MRR, on a 0 to 1 axis, for Question Matching and three baselines: Not Unnatural, Ask Siri, Ask a Friend]
Question Matching
outperforms all baselines.
Average results of 8 Web APIs’ parameters
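For readers unfamiliar with the two metrics on the chart, here is a minimal sketch of mean average precision (MAP) and mean reciprocal rank (MRR) using their standard definitions. The sample ranking and the gold set of good parameters are made up for illustration and are not the study's data.

```python
from typing import Iterable, List, Set


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Mean of precision@k over the ranks k at which a relevant item appears."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for k, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / k
    return precision_sum / len(relevant)


def reciprocal_rank(ranked: List[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant item, or 0 if none is ranked."""
    for k, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / k
    return 0.0


def mean(values: Iterable[float]) -> float:
    values = list(values)
    return sum(values) / len(values) if values else 0.0


# Hypothetical ranking of one API's parameters and its gold "good" set.
ranked = ["term", "offset", "location", "sw_latitude"]
gold = {"term", "location"}

ap = average_precision(ranked, gold)
rr = reciprocal_rank(ranked, gold)

# MAP and MRR average these per-API scores over all evaluated APIs.
map_score = mean([ap])
mrr_score = mean([rr])
```

In this example the relevant parameters sit at ranks 1 and 3, so the average precision is (1/1 + 2/3) / 2 and the reciprocal rank is 1.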
20 / 28
Questions Collected Already!
1. Which parameters to use?
2. What questions to ask the user
to elicit these parameters?
Yelp Search API 2.0 has
22 parameters.
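One way to read this slide: the questions collected during parameter selection can double as prompts for eliciting those parameters. The sketch below assumes a hypothetical mapping from the two selected parameters to collected questions. The pairing of each question with a parameter and the name `next_question` are illustrative and not taken from the system.

```python
from typing import Dict, Optional

# Hypothetical mapping from a selected parameter to a collected question.
ELICITATION_QUESTIONS: Dict[str, str] = {
    "term": "What do you want to eat?",
    "location": "Which city are you in?",
}


def next_question(filled: Dict[str, str]) -> Optional[str]:
    """Return the question for the first parameter that is still unfilled."""
    for param, question in ELICITATION_QUESTIONS.items():
        if param not in filled:
            return question
    return None  # every parameter has a value, nothing left to ask


print(next_question({"term": "Chinese"}))
```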
21 / 28
Extract Parameters
term
location
Define Parameters
Extract Parameters
Hi, I’m in San Diego.
Any Chinese restaurants here?
22 / 28
Dialog ESP Game
Hi, I’m in San Diego.
Answer
Aggregate
Location =
San Diego
Recruited Players
Time Constraint
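The Answer and Aggregate steps might be sketched as follows. This assumes, in the spirit of the original ESP Game, that a value is accepted once enough players independently give the same answer within the time limit. The agreement threshold, the text normalization, and the function name `aggregate_answers` are assumptions; the slide does not specify them.

```python
from collections import Counter
from typing import List, Optional, Tuple


def aggregate_answers(
    answers: List[Tuple[float, str]],
    time_limit: float,
    min_agreement: int = 2,
) -> Optional[str]:
    """Pick the most common in-time answer if enough players agree on it.

    `answers` holds (seconds_elapsed, answer_text) pairs, one per player.
    """
    in_time = [text.strip().lower() for t, text in answers if t <= time_limit]
    if not in_time:
        return None
    value, votes = Counter(in_time).most_common(1)[0]
    return value if votes >= min_agreement else None


# Hypothetical player answers for the slot "location".
answers = [
    (3.0, "San Diego"),
    (5.5, "san diego "),
    (4.0, "California"),
    (30.0, "San Diego"),  # arrives after the time limit, so it is ignored
]

location = aggregate_answers(answers, time_limit=10.0)
```

Requiring agreement trades a little latency for robustness against a single careless player, while the time limit keeps the dialog responsive.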
23 / 28
Guardian: A Crowd-Powered Spoken
Dialog System for Web APIs
Hi, I’m in San Diego.
Any Chinese restaurants here?
1 Talk and Extract Parameter
term = Chinese
location = San Diego
2 Call Web API
Yelp Search API 2.0
JSON: { ... "name": "Mandarin Wok Restaurant",... "address":["4227 Balboa Ave",...], ...}
3 Interpret Result to User
Mandarin Wok Restaurant is good! It’s on 4227 Balboa Ave.
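Steps 2 and 3 can be illustrated with a short sketch. It only builds the request URL and parses a sample response, and makes no network call. `BASE_URL` is a placeholder rather than the real Yelp endpoint, a real API would typically also require authentication, and the sample JSON follows only the fragment shown on the slide, not the full response format.

```python
import json
from typing import Dict
from urllib.parse import urlencode

# Placeholder endpoint (assumption); not the real Yelp Search API URL.
BASE_URL = "https://api.example.com/search"


def build_request_url(params: Dict[str, str]) -> str:
    """Encode the extracted parameters as a query string (step 2)."""
    return f"{BASE_URL}?{urlencode(params)}"


def interpret(result_json: str) -> str:
    """Turn the JSON result into a sentence for the user (step 3)."""
    data = json.loads(result_json)
    name = data["name"]
    street = data["address"][0]
    return f"{name} is at {street}."


# Parameters extracted in step 1.
url = build_request_url({"term": "Chinese", "location": "San Diego"})

# Simplified sample response, shaped after the fragment on the slide.
sample = '{"name": "Mandarin Wok Restaurant", "address": ["4227 Balboa Ave"]}'
reply = interpret(sample)
```

In Guardian, crowd workers read the JSON and compose the reply themselves; the fixed template here merely stands in for that human step.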
24 / 28
Engineering Challenges
• Real-time Response: Retainer Model
• Converse with User: Chorus
• Speech Recognition: Web Speech API
• Parameter Extraction: Dialog ESP-Game
• JSON Visualization: JSON Visualizer
• Response Generation Assistant: jQuery
• Workflow Control: Game-like Interface
• Dialog Management: Finite-state Machine
• Crowdsourcing Platform: Mechanical Turk
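The finite-state machine used for dialog management is not detailed on the slides. A minimal sketch, assuming the states simply mirror the three steps of the system (extract parameters, call the Web API, interpret the result), could look like this. The state names, the required parameters, and the transition rules are all assumptions.

```python
from enum import Enum, auto
from typing import Dict


class State(Enum):
    EXTRACT = auto()    # talk with the user and extract parameters
    CALL_API = auto()   # call the Web API with the filled parameters
    INTERPRET = auto()  # interpret the JSON result for the user
    DONE = auto()


# Parameters that must be filled before calling the API (assumption).
REQUIRED = ("term", "location")


def next_state(state: State, slots: Dict[str, str], has_result: bool) -> State:
    """Advance the dialog by one step based on what is known so far."""
    if state is State.EXTRACT:
        ready = all(p in slots for p in REQUIRED)
        return State.CALL_API if ready else State.EXTRACT
    if state is State.CALL_API:
        # Stay here until the API has returned a usable result.
        return State.INTERPRET if has_result else State.CALL_API
    return State.DONE


state = State.EXTRACT
state = next_state(state, {"term": "Chinese", "location": "San Diego"}, has_result=False)
print(state)
```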
25 / 28
System Evaluation
Web API task                                | Valid JSON | Task Completion | Domain Referenced TCR
Find Chinese restaurants in Pittsburgh.     | 9 / 10     | 10 / 10         | 0.96
Check current weather by using a zip code.  | 9 / 10     | 9 / 10          | 0.94
Find information of “Titanic”.              | 6 / 10     | 10 / 10         | 0.88
26 / 28
Guardian: A Hybrid Framework
Annotate Data on the Fly !
27 / 28
What’s next?
• More Automations
– Slot Filling / Entity Recognition
– Dialog Management
– Response Generation
• 1,000+ APIs?
• Future of Dialog Systems
– What if you can really talk to a machine…
– On wearable device?
28 / 28
Thank you!
http://www.flickr.com/photos/joshmichtom/4311110421/


Editor's notes

  1. Hi everyone, I am Kenneth from Carnegie Mellon University in Pittsburgh. Today I am going to talk about Guardian, a crowd-powered dialog system for Web APIs. This is joint work with Walter from Michigan and Jeff from CMU. For this talk, we found this interesting photo on Flickr. It says DO. NOT. TALK. TO. MACHINE. I mean, why? Today we have many devices that we can talk with. We have Siri, we have Cortana, we have Google Now, and we have Amazon Echo. I have an Echo in my apartment.
  2. I can ask simple questions like “What time is it?”, and it will say, “It is 9:30.” Or I can ask about the weather, and it will tell me the weather today. However, as a researcher from Pittsburgh, I really want to ask Echo this question:
  3. Echo is not able to answer this question. This is NOT because of bad speech recognition. When I ask “What is Pittsburgh Pirates?”, it can tell you basic information about this baseball team. It cannot answer this question because the back end does not have the knowledge or the service to support it. OK, let’s try again.
  4. How about the Steelers?
  5. How about the movie? It turns out that dialog systems almost always have limited capabilities. They can answer questions in certain domains, and the answers are reasonably good. But when you ask something outside the system’s scope, the system has almost no ability to handle it. How can we empower an intelligent agent to handle many different domains?
  6. We think of Web APIs. This page shows ProgrammableWeb, a web site that collects Web APIs. Nowadays, it lists 14 thousand Web APIs. That is a lot of resources we can explore. Web APIs are a representation of the knowledge available on the Internet. Most Web APIs follow the same REST conventions, which makes life much easier when you try to implement a wrapper for a new Web API. However, adding an arbitrary new service to a dialog system is not easy.
  7. All dialog systems in the world encounter this challenge: humans and machines do not talk to each other easily. Machines do not have problems talking with other machines. However, there is a significant communication gap between humans and machines. A machine needs to understand your words to do the task for you.
  8. In this work, we propose to use crowdsourcing to bridge this gap. How do we do it? Let’s take a closer look:
  9. In any dialog system, if you want to add a new service, you need to solve two main problems: define the slots, and fill the slots. In other words, in the context of APIs, you need to define the parameters and fill the parameters. For example, say you want to add the Yelp Search API to your system. First, you need to know what information this API requires. That is the location and the query term. Then your system needs a component that extracts the location and query term for you, so that you can call and use the API. How do modern dialog systems usually do it?
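
To make the two steps concrete, here is a minimal sketch of what happens once the slots are defined and filled: the values are turned into an API query. The endpoint URL and the `build_search_url` helper are hypothetical and only for illustration; `term` and `location` are the two parameters named in the talk, and the real Yelp API also requires authentication that is left out here.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, for illustration only (not the real Yelp URL).
SEARCH_ENDPOINT = "https://api.example.com/search"

# "Define parameters": the slots this dialog task needs.
REQUIRED_SLOTS = ("term", "location")


def build_search_url(slots):
    """Turn filled slots into a query URL; fail if a slot is still empty."""
    missing = [name for name in REQUIRED_SLOTS if not slots.get(name)]
    if missing:
        raise ValueError(f"unfilled slots: {', '.join(missing)}")
    query = urlencode({name: slots[name] for name in REQUIRED_SLOTS})
    return f"{SEARCH_ENDPOINT}?{query}"


# "Extract parameters": values pulled from
# "Hi, I'm in San Diego. Any Chinese restaurants here?"
filled = {"term": "Chinese restaurants", "location": "San Diego"}
url = build_search_url(filled)
print(url)
```

If a slot is missing, the sketch raises an error; in a dialog system that is the point where the system would ask the user a follow-up question instead.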
  10. Not surprisingly, they have experts or the API provider define a set of parameters that fits the capability of the service and the context of the dialog. For parameter extraction, modern dialog systems usually use automated approaches. There are a bunch of supervised learning methods, like CRFs or RNNs, with which you can train an entity recognizer or slot filler from labeled training data. What is the problem here?
  11. Those bridging steps are expensive and painful. Most services are not designed for dialog systems, so you need to design a proper set of slots that can be used in a dialog about the service. However, experts who understand both the API and the dialog system are not always available. Even when they are available, they can be expensive. More importantly, automated parameter extraction usually relies on labeled training data. But we do not have data. Training data is not always available for arbitrary APIs.
  12. In response to these pains, we propose to use crowdsourcing to solve both problems: parameter definition and parameter extraction. How do we do it?
  13. Let’s start with defining parameters. How can we use the crowd to do it?
  14. Let’s take a step back and think about this problem. How does a machine understand a Web API? There are two main things: first, know which parameters to use; second, know which questions to ask. First, the Yelp Search API has 22 parameters in total. Not all of them fit in the context of dialog systems. For instance, in a human conversation, you are not likely to specify a location by longitude and latitude. Only the location parameter, which takes location names as input, can be used in a dialog system. How do we choose the reasonable parameters? Second, now you know the Yelp API requires the parameter “location”, but how do you get the user to provide it? Simply by asking. The system needs to know what question to ask. In this case, “Where are you?” Let’s start with the first problem.
  15. How do we choose parameters? We think of this as a Parameter Rating Problem. Imagine you have a list of all the parameters of the Yelp API. The task is to rate how good each parameter is for dialog systems. The output is a rating score attached to each parameter, which gives you a ranked list of all parameters.
  16. People who work on crowdsourcing would ask: why don’t you just tell the crowd what you want and run a survey on each parameter? So we did. This is our interface. The survey was conducted on CrowdFlower. For each parameter, we show the parameter name, the parameter’s description, and the task of the API. Then we ask the worker to imagine a scenario and rate how likely they are, as a user, to provide the information for this parameter. To be more careful, we run the experiment on three different scenarios. First, ask Siri: imagine you’re talking to Siri; how likely are you to provide this information? Second, ask a friend: imagine you cannot use the Internet right now and call a friend for help; how likely are you to provide this information? Third, we also ask the workers to rate how weird the parameter is, and use “Not Weird” as the rating. How does this work?
  17. We run these three experiments on 8 Web APIs. For evaluation, we compare the output ranked list against an expert-annotated ranked list. To evaluate the output, we use MAP (Mean Average Precision) and MRR (Mean Reciprocal Rank), two common evaluation metrics for ranked lists. It turns out that none of these three survey questions is good enough. The MAP and MRR are all low. And when you inspect the output by eye, it is not close to practical use. So we might need a workflow that is more complicated…
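
For readers unfamiliar with the two metrics, the following is a minimal sketch of how MAP and MRR can be computed. It assumes the expert annotation is reduced to a set of parameters judged suitable for dialog, which is a simplification made here for illustration; the rankings below are toy data, not results from the study.

```python
def average_precision(ranked, relevant):
    """Mean of precision@k taken at each rank k where a relevant item appears."""
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0


def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant item, or 0 if none is ranked."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def mean(values):
    values = list(values)
    return sum(values) / len(values) if values else 0.0


# Toy data: (crowd ranking, expert-approved set) for two imaginary APIs.
cases = [
    (["offset", "term", "radius_filter", "location"], {"term", "location"}),
    (["location", "term"], {"location"}),
]
map_score = mean(average_precision(r, rel) for r, rel in cases)
mrr_score = mean(reciprocal_rank(r, rel) for r, rel in cases)
print(map_score, mrr_score)
```

Both scores lie between 0 and 1, and both reward rankings that place expert-approved parameters near the top.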
  18. Like this! The idea we propose here is to collect questions related to the task, and then ask the workers to use the questions to vote for parameters. Take the Yelp API for example. We first collect all possible questions from the crowd, like “What do you want to eat?”, “Where are you?”, “What’s your budget?” and so on. Then we ask workers to associate questions with parameters. So essentially, the workers are using questions to vote for parameters. We assume that parameters associated with more questions are better for dialog systems. How does this work?
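
The voting idea can be sketched in a few lines. This is an illustrative reading of the workflow, not the actual Guardian implementation: it assumes each association is a (question, parameter) pair and that every distinct question counts once per parameter. The function name and the toy associations are made up for the example.

```python
from collections import Counter


def rank_parameters_by_votes(associations, all_parameters):
    """Rank parameters by how many distinct questions were linked to them."""
    # set() drops duplicate (question, parameter) pairs from different workers.
    votes = Counter(param for _question, param in set(associations))
    # Most-voted first; ties are broken alphabetically for a stable order.
    return sorted(all_parameters, key=lambda p: (-votes[p], p))


# Toy associations made by workers.
associations = [
    ("Where are you?", "location"),
    ("What do you want to eat?", "term"),
    ("Which city are you in?", "location"),
    ("Where are you?", "location"),  # duplicate pair, counted once
]
parameters = ["offset", "term", "location", "radius_filter"]
ranking = rank_parameters_by_votes(associations, parameters)
print(ranking)
```

Parameters that no question points to, such as `offset` here, sink to the bottom of the ranking.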
  19. It turned out our workflow outperforms all three baselines. When you take a look at the result, you will know the quality is much better and close to practical use.
  20. Even better, this workflow naturally solves the second question of this stage! The questions have already been collected in the workflow, and we can just use them. So, let’s move to the second challenge.
  21. How do you extract parameters from a running conversation if you don’t have any training data? We think real-time crowdsourcing can help.
  22. We propose a multi-player Dialog ESP Game to extract parameter values from a running conversation. The ESP Game was originally proposed for image labeling; we adapt the idea to dialog. In the interface, we show the dialog and the description of the parameter, and ask the workers to type what the other workers might type. If two answers match each other, we take that answer as the extracted parameter value. This method works well. Now we can extract parameters without any training data. Therefore, based on all the work we’ve done, we propose a system called “Guardian”:
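
The matching rule of the Dialog ESP Game can be sketched as follows. The normalization step (lowercasing and collapsing whitespace) and the `min_agreement` knob are assumptions added for illustration; the talk only states that two matching answers are accepted.

```python
def normalize(answer):
    """Lowercase and collapse whitespace so trivial variants still match."""
    return " ".join(answer.lower().split())


def first_agreed_answer(submissions, min_agreement=2):
    """Return the first answer typed by `min_agreement` different workers.

    `submissions` is an iterable of (worker_id, answer) pairs in arrival order.
    Returns None if no answer reaches the agreement threshold.
    """
    supporters = {}
    for worker_id, answer in submissions:
        key = normalize(answer)
        if not key:
            continue  # ignore empty answers
        supporters.setdefault(key, set()).add(worker_id)
        if len(supporters[key]) >= min_agreement:
            return key
    return None


# Toy answers for the "location" parameter.
submissions = [("w1", "San Diego"), ("w2", "california"), ("w3", " san diego ")]
value = first_agreed_answer(submissions)
print(value)
```

Tracking worker IDs in a set means one worker repeating the same answer does not count as agreement.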
  23. Guardian’s framework contains three main steps. First, the workers have a conversation with the user and extract the parameter values with the Dialog ESP Game. Second, behind the scenes, the system uses these values to call the Yelp API and run the query. Finally, the Yelp API returns the result as JSON, and we also use the crowd to interpret the response. We visualize the JSON in a user-friendly interface, where the workers can click through the data and explore the information inside. By using Guardian, we can have a running dialog system without any training data or even prior knowledge of the task.
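
As a rough idea of what “exploring the JSON” involves, the sketch below flattens a nested response into path and value pairs that an interface could list for workers to browse. It is not Guardian’s visualizer, and the sample response is invented for the example rather than taken from a real API.

```python
import json


def flatten(value, path=""):
    """Yield (path, leaf_value) pairs for every leaf in a nested JSON value."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from flatten(child, f"{path}.{key}" if path else key)
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from flatten(child, f"{path}[{index}]")
    else:
        yield path, value


# Invented sample response, for illustration only.
raw = '{"businesses": [{"name": "Example Garden", "rating": 4.5}], "total": 1}'
rows = list(flatten(json.loads(raw)))
for row_path, row_value in rows:
    print(f"{row_path}: {row_value}")
```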
  24. To build an end-to-end Guardian system to test our idea, we had to solve a lot of engineering challenges. For real-time response, we implement a retainer model. To converse with the user, we use the propose-and-vote mechanism of Chorus. For speech recognition, we use the Web Speech API in Google Chrome. For parameter extraction, we implement the Dialog ESP Game. We also use a JSON Visualizer to visualize the JSON object. Finally, we use a game-like interface to put all the small features together.
  25. We implement the system on 3 different Web APIs: the Yelp API for restaurant search, the Weather Underground API for weather queries, and the RottenTomatoes API for movie queries. We design three small tasks, one per API, and run 10 trials on each system. Here we only talk about the task completion rate. By task completion we mean that the system provides a valid response that contains the information the user requires. You can see that the task completion rate is almost perfect. This is because, first, the tasks here are relatively simple, and second, even when the results returned from the API are incorrect, most of the time the crowd workers are able to figure it out and recover the correct answer. We also compare our results with the task completion rates reported in the literature. The numbers are not directly comparable, but you can still see that our system reaches the same level of task completion rate as automated systems.
  26. At the end of the day, we have a hybrid dialog system framework that is run by both the crowd and the machine. It doesn’t require any training data or domain knowledge. Furthermore, it keeps annotating data while it runs. So when you run this system on an API for a while, you will have a small amount of labeled data and can start to think about possible automation. That brings us to future work.
  27. What’s next? The first thing that comes to mind is automation. Each step in the Guardian system can be automated to some degree: entity extraction, dialog management, and response generation. Once we start running Guardian, we start creating annotated data, and once we collect enough data, all of this automation becomes possible. Second, what happens when we want to add 1,000 APIs? Will there be any new challenges? We would like to explore that as well. And the ultimate question we want to ask is: if we have a system that contains thousands of Web APIs and can actually talk to us, what are we going to do with it?
  28. Maybe one day, we can proudly say: Oh, sure, you can talk to a machine. But inside that smart machine, some crowd workers are working hard online to help the machine. So, sort of. Thank you very much.