The document summarizes work on crowd-powered conversational agents (Chorus, Guardian) and experiments comparing methods for aggregating responses from multiple crowd workers. For simple (Class A) queries, taking the first response is fastest, while ESP-style agreement combined with the first response (ESP + 1st) gives the best quality. With 10 players (5 also works) and a 15- or 20-second time constraint, answers arrive in roughly 5 to 8 seconds, reaching about F1 = 0.9 on simple (Class A) and F1 = 0.8 on complex (Class D) queries. More players generally yield both faster and better results; the main trade-off is between aggregation methods, with 1st Only faster but less accurate than ESP + 1st.
2. 2/20
Chorus: A Crowd-powered
Conversational Assistant
"Is there anything else I can help you with?":
Challenges in Deploying an On-Demand Crowd-Powered Conversational Agent
Ting-Hao K. Huang, Walter S. Lasecki, Amos Azaria, Jeffrey P. Bigham. HCOMP’16
3. 3/20
Guardian: A Crowd-Powered Dialog System
for Web APIs
Guardian: A Crowd-Powered Spoken Dialog System for Web APIs
Ting-Hao K. Huang, Walter S. Lasecki, Jeffrey P. Bigham. HCOMP’15
7. 7/20
We Want to Know More!
• How fast?
• How many players?
• How good?
• Trade-offs?
8. 8/20
3 Variables
Example query: "Sunday flights from New York City to Las Vegas"
→ Aggregated answer: Destination: Las Vegas
1. Number of Players (recruited)
2. Time Constraint
3. Answer Aggregation Method
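The answer-aggregation strategies compared in the experiments can be sketched roughly as follows. This is a minimal illustration with my own function names, assuming answers arrive as a time-ordered list; the deck does not give the exact matching rule, so simple case-insensitive string matching stands in for it here.

```python
# Rough sketch (my own naming) of three answer-aggregation strategies
# for crowd responses arriving as a time-ordered list of strings.

def first_only(answers):
    """Return the first answer received (fastest; no agreement check)."""
    return answers[0] if answers else None

def esp(answers):
    """ESP-style agreement: return the first answer that matches an
    earlier answer from another worker (better quality, may never fire)."""
    seen = set()
    for a in answers:
        key = a.strip().lower()   # illustrative matching rule
        if key in seen:
            return a
        seen.add(key)
    return None

def esp_plus_first(answers):
    """Prefer an ESP agreement; fall back to the first answer if no
    two workers agree before the time constraint expires."""
    return esp(answers) or first_only(answers)

print(esp_plus_first(["Las Vegas", "las vegas", "New York"]))  # "las vegas"
print(esp_plus_first(["A", "B", "C"]))                         # no match: "A"
```

Under this sketch, ESP + 1st always returns something (explaining its speed being close to 1st Only), while plain ESP can time out when no two workers agree.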
12. 12/20
Experiment
• Data
– Airline Travel Information System (ATIS)
• Class A: Context Independent (simple queries)
• Class D: Context Dependent (complex queries)
• Class X: Unevaluable
• Settings
– Focus on the toloc.city_name slot
– Number of workers = 10
– Time constraint = 15 and 20 seconds
– 3 answer-aggregation methods
– Using Amazon Mechanical Turk
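The quality numbers on the following slides are F1-scores over the extracted slot values. As a minimal sketch of how such a slot-level F1 could be computed (my own illustrative implementation; the deck does not specify the exact scoring script):

```python
# Minimal sketch of slot-level F1 for an extraction task like
# toloc.city_name, scored over (query_id, slot_value) pairs.

def slot_f1(predicted, gold):
    """Return F1 over sets of (query_id, value) slot pairs."""
    pred, ref = set(predicted), set(gold)
    tp = len(pred & ref)                        # correctly extracted slots
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(ref) if ref else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

gold = {(1, "las vegas"), (2, "boston"), (3, "denver"), (4, "seattle")}
pred = {(1, "las vegas"), (2, "boston"), (3, "dallas"), (4, "seattle")}
print(slot_f1(pred, gold))  # 3 of 4 slots correct -> 0.75
```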
13. 13/20
Simple Queries (Class A)
• ESP + 1st has the best quality
• 1st Only has the best speed
• A 20-second time constraint gives better quality at similar speed
14. 14/20
Trade-Offs on Simple Queries (Class A)
[Figure: three plots comparing ESP + 1st and 1st Only at 15- and 20-second time constraints — avg. response time (sec) vs. number of players, F1-score vs. number of players, and F1-score vs. avg. response time for 5 to 10 players. Annotations: more players → faster; more players → better results; 1st Only (20 sec) is faster but gives worse results than ESP + 1st (20 sec).]
16. 16/20
Now we know…
• How fast? 5 to 8 seconds.
• How many players? 10 (5 is also fine).
• How good? F1 = 0.9 in Class A; F1 = 0.8 in Class D.
• Trade-offs? Yes.
17. 17/20
Eatity System
• Extracting food entities from user messages
• Accuracy (Food) = 78.89%; Accuracy (Drink) = 83.33% (in-lab study, 150 messages)
18. 18/20
When to Use it?
• As a backup / support for automated annotators
– One player can be an automated annotator
– Low-confidence or failed cases / Validation
• Crowd-powered Systems
– Deployed Chorus: TalkingToTheCrowd.org
20. 20/20
Thank you!
@windx0303
Ting-Hao (Kenneth) Huang
Carnegie Mellon University
KennethHuang.cc
Jeffrey P. Bigham
Carnegie Mellon University
www.JeffreyBigham.com
Yun-Nung Chen
National Taiwan University
VivianChen.idv.tw
22. 22/20
How about having humans do it?
Ling Tung University, 35th 2016 Young Designers Exhibition, Taiwan
https://www.facebook.com/nownews/videos/10153864340447663/
23. 23/20
Why always pick the 1st?
[Figure: F1-score vs. input order (i = 0 to 7) for 2, 4, and 10 players; earlier answers consistently score higher.]
Because they are better.