Interactive Learning of
Task-Oriented Dialog Systems
Bing Liu
Research Scientist, Facebook Conversational AI
Rasa Developer Summit - 2019
PhD, Carnegie Mellon University
Task-Oriented Dialog System
❖ Dialog systems
➢ Chit-chat bot, QA bot, task-oriented dialog system, ...
❖ Get stuff done - assist users in completing specific tasks
➢ Personal assistants (e.g. Siri, Alexa, Google Assistant, Hey Portal)
➢ Voice command in vehicle and smart home
➢ Customer service; sales and marketing
Modular Dialog System Architecture
Task-Oriented Dialog System
❖ Highly handcrafted
❖ Process interdependent
❖ Data driven end-to-end (E2E) systems
➢ [Wen et al. 2016]: E2E supervised training neural dialog model
➢ [Bordes and Weston, 2017]: E2E model with memory network
➢ [Madotto et al., 2018]: Mem2Seq for incorporating knowledge into E2E systems
❖ Interactive learning for E2E system with less human supervision
Why Learn through Interactions?
❖ Task-oriented dialog as a sequential decision-making process over multiple steps
❖ State space grows exponentially with number of dialog turns
❖ Extremely hard to
➢ Design all possible dialog paths
➢ Collect a dialog corpus that is large enough to cover all dialog scenarios
→ Continuously learn through the interaction with users and improve over time
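To make the exponential-growth point concrete, here is a small worked example. It assumes, purely for illustration, that the agent and user together have a fixed number of distinct options per turn; the value 10 is a made-up figure, not one from the talk.

```python
def count_dialog_paths(options_per_turn, num_turns):
    """Number of distinct dialog paths when every turn offers the same number of options."""
    return options_per_turn ** num_turns


# Illustrative assumption: 10 possible moves per turn.
for turns in (2, 4, 8):
    print(turns, "turns ->", count_dialog_paths(10, turns), "paths")
```

Even under this simplified assumption, enumerating or collecting every path quickly becomes impractical as dialogs get longer.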
How can we learn an end-to-end task-oriented dialog system effectively through interaction with users?
End-to-End Task-Oriented Dialog Modeling
❖ Dialog context modeling with hierarchical RNN
B Liu, et al, "Dialogue Learning with Human Teaching and Feedback in End-To-End Trainable Task-Oriented Dialogue Systems", NAACL 2018.
End-to-End Task-Oriented Dialog Modeling
End-to-end modeling of spoken language understanding (SLU), dialog state tracking (DST), and dialog policy
Supervised Pre-training
❖ Supervised model pre-training on dialog corpus with MLE
➢ Objective function: linear interpolation of cross-entropy losses for
■ Dialog state tracking, i.e. user goal estimation, and
■ Dialog policy, i.e. system action prediction
➢ Optimization: Stochastic gradient descent, Adam
[Equation not recovered from the slide. Its two terms were labeled: loss for user goal estimation; loss for system action prediction.]
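A minimal sketch of such an interpolated objective for a single dialog turn follows. It assumes one probability vector per tracked slot plus one for the system action. The function names, the per-slot weights, and the example numbers are illustrative and not taken from the talk.

```python
import math


def cross_entropy(probs, target):
    """Negative log-likelihood of the target class under a probability vector."""
    return -math.log(probs[target])


def turn_loss(slot_probs, slot_targets, action_probs, action_target,
              slot_weights, action_weight):
    """Weighted sum of the state-tracking and action-prediction cross-entropies."""
    loss = 0.0
    for slot, probs in slot_probs.items():
        # Dialog state tracking term: one cross-entropy per slot of the user goal.
        loss += slot_weights[slot] * cross_entropy(probs, slot_targets[slot])
    # Dialog policy term: cross-entropy of the predicted system action.
    loss += action_weight * cross_entropy(action_probs, action_target)
    return loss


# Hypothetical model outputs for one turn of a movie-booking dialog.
loss = turn_loss(
    slot_probs={"movie": [0.5, 0.5], "date": [0.25, 0.75]},
    slot_targets={"movie": 0, "date": 1},
    action_probs=[0.5, 0.25, 0.25],
    action_target=0,
    slot_weights={"movie": 1.0, "date": 1.0},
    action_weight=1.0,
)
print(loss)
```

In training, this per-turn loss would be summed over all turns of all dialogs in the corpus and minimized with an optimizer such as Adam.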
Learn Interactively from User Feedback
❖ Interactive dialog learning with user feedback
[Figure: the agent is first trained with supervised pre-training on human-human dialog corpora; users then provide feedback for policy optimization]
Learn Interactively from User Feedback
❖ Use user feedback as dialog reward
❖ Introduce step penalty to encourage shorter dialog for task completion
❖ Optimize dialog model end-to-end with policy gradient RL
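The sketch below illustrates the idea with plain REINFORCE on a softmax policy. This is a generic policy gradient update, not necessarily the exact estimator used in the talk. The step penalty, discount factor, and learning rate values are assumptions chosen for the example.

```python
import math


def returns_to_go(num_turns, final_reward, step_penalty=-0.1, gamma=1.0):
    """Per-turn returns: a small penalty every turn plus the user feedback at the end."""
    rewards = [step_penalty] * num_turns
    rewards[-1] += final_reward
    returns, running = [], 0.0
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    return returns[::-1]


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_step(logits, action, ret, lr=0.1):
    """Move the logits along ret * grad log pi(action) for a softmax policy."""
    probs = softmax(logits)
    return [x + lr * ret * ((1.0 if i == action else 0.0) - p)
            for i, (x, p) in enumerate(zip(logits, probs))]


# A 3-turn dialog that the user marked as successful (+1 at the final turn).
returns = returns_to_go(num_turns=3, final_reward=1.0)
new_logits = reinforce_step([0.0, 0.0], action=0, ret=returns[0])
print(returns, new_logits)
```

Because every extra turn subtracts from the return, shorter successful dialogs receive larger updates than longer ones.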
Learn Interactively from User Feedback
❖ Policy optimization with RL can be slow due to sparse reward
❖ Dialog state distribution mismatch between offline training and interactive learning leads to compounding errors
❖ Agent may learn to recover from a bad state with RL, but the search process can be very inefficient
→ Ask the user for a correction or demonstration when the agent fails at a task, and learn to act from it
Learn Interactively from User Teaching
❖ Interactive dialog learning with user teaching
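[Figure: starting from a model trained with supervised pre-training on human-human dialog corpora, the agent produces new dialogs driven by its own policy; users correct mistakes and demonstrate the desired agent behavior, and the resulting dialogs are added to the existing corpora]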
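The teaching loop can be sketched as data aggregation: run the agent's own policy, collect the teacher's action wherever the agent was wrong, add those pairs to the corpus, and retrain. The toy below uses a lookup table in place of a neural model. All names, states, and actions are hypothetical.

```python
def collect_corrections(policy, teacher, visited_states):
    """Record the teacher's action for every visited state where the agent disagrees."""
    return [(s, teacher(s)) for s in visited_states if policy(s) != teacher(s)]


def fit_lookup_policy(corpus, default_action):
    """Toy stand-in for supervised training: memorise the labeled action per state."""
    table = {}
    for state, action in corpus:
        table[state] = action
    return lambda state: table.get(state, default_action)


def teacher(state):
    """Hypothetical user who demonstrates the desired agent action."""
    desired = {"greet": "ask_movie", "movie_given": "ask_date", "date_given": "confirm"}
    return desired[state]


# Start from a small corpus, then aggregate corrections from one interaction.
corpus = [("greet", "ask_movie")]
policy = fit_lookup_policy(corpus, default_action="ask_movie")
visited = ["greet", "movie_given", "date_given"]  # states reached under the agent's policy
corrections = collect_corrections(policy, teacher, visited)
corpus += corrections
policy = fit_lookup_policy(corpus, default_action="ask_movie")
```

Since the corrected states are ones the agent actually reached, the added data targets the state distribution mismatch described earlier.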
Evaluation
❖ Movie booking domain simulation (M2M)
Slots: theatre name, movie, date, time, number of people
SL: Supervised pre-training model
IL: Imitation learning with user teaching
RL: Reinforcement learning with user feedback
Table: Human evaluation results. Mean and standard deviation of crowd worker scores (1-5)
B Liu, et al, "Dialogue Learning with Human Teaching and Feedback in End-To-End Trainable Task-Oriented Dialogue Systems", NAACL 2018.
What if a user does not provide any feedback? Can we still learn anything from the interaction?
Can we learn a dialog reward function?
❖ User feedback serves as the reward for RL optimization
❖ Task completion based reward requires prior knowledge of user’s goal
→ NOT usually accessible in real world user interactions
❖ In practice, user feedback can be inconsistent and is NOT always available
Adversarial Dialog Learning
❖ Reward a machine agent for conducting task-oriented dialog in a way that is indistinguishable from the way human agents do it.
Bing Liu and Ian Lane, "Adversarial Learning of Task-Oriented Neural Dialog Models", in SIGDIAL 2018.
Discriminative Reward Model
[Figure labels: User’s Turn, Agent’s Turn, External Entity Info]
❖ Input:
➢ Sequence of dialog turns
❖ Representation:
➢ BiLSTM with max-pooling
❖ Output:
➢ Prob. of a dialog being successfully completed by a human agent
Bing Liu and Ian Lane, "Adversarial Learning of Task-Oriented Neural Dialog Models", in SIGDIAL 2018.
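The pooling and output stages can be sketched as follows. To stay dependency-free, the sketch assumes the BiLSTM has already produced one state vector per dialog turn and starts from those vectors. The weights, bias, and example states are made-up values.

```python
import math


def max_pool(turn_states):
    """Element-wise maximum over the per-turn state vectors."""
    return [max(dimension) for dimension in zip(*turn_states)]


def success_probability(turn_states, weights, bias):
    """Sigmoid of a linear score computed on the max-pooled dialog representation."""
    pooled = max_pool(turn_states)
    score = sum(w * x for w, x in zip(weights, pooled)) + bias
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical 2-dimensional states for a 3-turn dialog.
states = [[0.1, -0.3], [0.4, 0.2], [-0.5, 0.0]]
pooled = max_pool(states)
prob = success_probability(states, weights=[1.0, -2.0], bias=0.0)
print(pooled, prob)
```

Max-pooling yields a fixed-size dialog representation regardless of the number of turns, so a single output layer can score dialogs of any length.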
Model Training
❖ Supervised pre-training with an initial set of pos & neg samples
➢ Pre-train dialog agent G on positive dialog samples with MLE
➢ Pre-train discriminative reward function D on pos & neg samples
❖ Interactive learning cycle
➢ Collect new dialog sample(s) between agent G and users
➢ Update dialog agent G with RL using the reward produced by D
➢ Update reward function D using the newly collected sample(s)
➢ Continue for next learning cycle
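The interactive learning cycle can be sketched as a loop that alternates between updating G and D. The interface below (`score`, `update`, `collect_dialog`) is hypothetical, and the stub class only records the order of calls. Treating human dialogs as positives and the new agent dialog as a negative when updating D is an assumption of this sketch.

```python
def interactive_learning(agent, reward_model, collect_dialog, human_dialogs, num_cycles):
    """Alternate updates of the dialog agent G and the reward function D."""
    for _ in range(num_cycles):
        dialog = collect_dialog(agent)                 # new sample between G and a user
        reward = reward_model.score(dialog)            # D's output stands in for user feedback
        agent.update(dialog, reward)                   # RL update of G
        reward_model.update(human_dialogs, [dialog])   # refresh D with the new sample


class CallRecorder:
    """Hypothetical stand-in for G or D that logs which method was called."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def score(self, dialog):
        self.log.append(self.name + ".score")
        return 0.5

    def update(self, *args):
        self.log.append(self.name + ".update")


log = []
G = CallRecorder("G", log)
D = CallRecorder("D", log)
interactive_learning(G, D, lambda agent: ["hi", "which movie?"],
                     human_dialogs=[["hello", "sure"]], num_cycles=2)
print(log)
```

In a real system the two stubs would be the pre-trained dialog agent and the pre-trained discriminator from the first stage.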
Evaluation
❖ Comparing different reward functions
Bing Liu and Ian Lane, "Adversarial Learning of Task-Oriented Neural Dialog Models", in SIGDIAL 2018.
Summary
❖ The multi-turn nature of task-oriented dialogs makes it especially important for a system to learn through interaction with users
❖ A task-oriented dialog model can be learned end-to-end with user teaching and feedback
❖ Adversarial dialog learning addresses missing or inconsistent user feedback and needs less human supervision
Thanks!
Q & A
