Using Deep Reinforcement Learning for Dialogue Systems
Harm van Seijen, Research Scientist
Montréal, Canada
spoken dialogue system
[Diagram: user input → natural language understanding → user act → state tracker → dialogue state → policy manager → system act → natural language generation → system response; a "data" component is also shown]
example:
user input: “Hi, do you know a good Indian restaurant”
user act: inform(food=“Indian”)
system act: request(price_range)
system response: “Sure. What price range are you thinking of?”
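The pipeline on this slide can be sketched in a few lines of code. This is only an illustration under stated assumptions: the names `DialogueAct`, `understand`, `track`, `policy` and `generate`, the toy vocabulary, and the keyword-matching stand-in for language understanding are all hypothetical and do not come from the talk. A real system would use learned components at each stage.

```python
from dataclasses import dataclass
from typing import Dict, Optional

KNOWN_FOODS = ("indian", "italian", "chinese")   # hypothetical toy vocabulary
SLOTS = ("food", "price_range")                   # hypothetical slot order


@dataclass(frozen=True)
class DialogueAct:
    """A dialogue act such as inform(food="Indian") or request(price_range)."""
    kind: str
    slot: str
    value: Optional[str] = None

    def __str__(self) -> str:
        if self.value is None:
            return f"{self.kind}({self.slot})"
        return f'{self.kind}({self.slot}="{self.value}")'


def understand(utterance: str) -> Optional[DialogueAct]:
    """Natural language understanding: user input -> user act (keyword stub)."""
    text = utterance.lower()
    for food in KNOWN_FOODS:
        if food in text:
            return DialogueAct("inform", "food", food.capitalize())
    return None


def track(state: Dict[str, str], user_act: Optional[DialogueAct]) -> Dict[str, str]:
    """State tracker: fold the user act into a new dialogue state."""
    new_state = dict(state)
    if user_act is not None and user_act.kind == "inform":
        new_state[user_act.slot] = user_act.value
    return new_state


def policy(state: Dict[str, str]) -> DialogueAct:
    """Policy manager: dialogue state -> system act (hand-written rule here)."""
    for slot in SLOTS:
        if slot not in state:
            return DialogueAct("request", slot)
    return DialogueAct("offer", "restaurant")


def generate(system_act: DialogueAct) -> str:
    """Natural language generation: system act -> system response (templates)."""
    templates = {
        ("request", "price_range"): "Sure. What price range are you thinking of?",
        ("request", "food"): "What kind of food would you like?",
    }
    return templates.get((system_act.kind, system_act.slot),
                         "I found a restaurant for you.")


user_act = understand("Hi, do you know a good Indian restaurant")
state = track({}, user_act)
system_act = policy(state)
print(user_act, "->", system_act, "->", generate(system_act))
```

The hand-written `policy` function is the piece that reinforcement learning is meant to replace with a learned mapping.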
The central question: how to train the policy manager?
outline
1. what is reinforcement learning
2. solution strategies for RL
3. applying RL to dialogue systems
what is reinforcement learning
Reinforcement Learning is a data-driven approach towards learning behaviour.
machine learning: unsupervised learning, supervised learning, reinforcement learning
each of the three can be combined with deep learning
reinforcement learning + deep learning = deep reinforcement learning
RL vs supervised learning
behaviour: function that maps environment states to actions
supervised learning:
hard to specify function
easy to identify correct output
example: recognizing cats in images (image → f → cat / no cat)
reinforcement learning:
hard to specify function
hard to identify correct output
easy to specify behaviour goal
example: double inverted pendulum
state: θ1, θ2, ω1, ω2
action: clockwise/counter-clockwise torque on top joint
goal: balance pendulum upright
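The point that the behaviour goal is easy to specify can be made concrete with a reward function for the pendulum example. This is a minimal sketch with assumptions that are not in the talk: angles are in radians and measured from the upright position, and the tolerance of 0.2 rad is a hypothetical value.

```python
import math

UPRIGHT_TOLERANCE = 0.2  # hypothetical threshold in radians, not from the talk


def wrap_angle(theta: float) -> float:
    """Map an angle to the interval [-pi, pi)."""
    return (theta + math.pi) % (2 * math.pi) - math.pi


def reward(theta1: float, theta2: float, omega1: float, omega2: float) -> float:
    """Return 1.0 while both links are close to upright, otherwise 0.0.

    The angular velocities are part of the state but are not needed to
    express this simple goal.
    """
    near_upright = all(abs(wrap_angle(t)) < UPRIGHT_TOLERANCE
                       for t in (theta1, theta2))
    return 1.0 if near_upright else 0.0
```

The reward states what is wanted (stay upright) without saying which torque to apply in which state. Finding that mapping is left to the learning algorithm.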
advantages RL
does not require knowledge of good policy
does not require labelled data
online learning: adaptation to environment changes
challenges RL
requires lots of data
sample distribution changes during learning
samples are not i.i.d.
2. solution strategies for RL
definitions
estimating the value function
finding the optimal policy
policy evaluation
policy improvement
Q-learning:
classical RL algorithm
combines (partial) policy evaluation with (partial) policy improvement
update target:
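In its standard textbook form, Q-learning moves Q(s, a) a step towards the update target r + γ max_a′ Q(s′, a′). The following is a minimal tabular sketch of that rule. The five-state chain environment, the step size, the discount factor and the uniformly random behaviour policy are illustrative assumptions, not taken from the talk. A random behaviour policy is acceptable here because Q-learning is off-policy.

```python
import random

N_STATES, GOAL = 5, 4       # hypothetical chain: states 0..4, reward on reaching 4
ACTIONS = (0, 1)            # 0 = move left, 1 = move right
ALPHA, GAMMA = 0.5, 0.9     # illustrative step size and discount factor


def step(state: int, action: int):
    """Deterministic chain dynamics: returns (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    done = next_state == GOAL
    return next_state, (1.0 if done else 0.0), done


def greedy(q, state: int) -> int:
    """Policy improvement: pick the action with the highest estimated value."""
    return max(ACTIONS, key=lambda a: q[state][a])


def q_learning(episodes: int = 500, seed: int = 0):
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Behaviour policy: uniformly random exploration.
            action = rng.choice(ACTIONS)
            next_state, reward, done = step(state, action)
            # Update target: r + gamma * max_a' Q(s', a'); no bootstrap at the end.
            target = reward + (0.0 if done else GAMMA * max(q[next_state]))
            # Partial policy evaluation: move Q(s, a) a step towards the target.
            q[state][action] += ALPHA * (target - q[state][action])
            state = next_state
    return q


q_table = q_learning()
print([greedy(q_table, s) for s in range(GOAL)])
```

The `max` inside the target is the policy-improvement part, and the incremental update of `q[state][action]` is the policy-evaluation part. Both happen in every step, which is why the slide calls them partial.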
deep reinforcement learning
2015 Nature paper from DeepMind introduced an RL method based on deep learning, called DQN
main result: with same network architecture, learned to play large number of Atari 2600 games effectively
DQN characteristics
variation on Q-learning that uses deep neural networks to approximate the Q function
uses experience replay to deal with non-i.i.d. samples
uses two networks (Q and Q’) to mitigate non-stationarity of update targets
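Two of these ingredients, experience replay and the separate target network, can be sketched without any deep learning library. The class and function names below (`ReplayBuffer`, `dqn_targets`, `frozen_q`) are hypothetical, and the Q' network is represented by a plain Python callable. This is an illustration of the idea, not the implementation from the DQN paper.

```python
import random
from collections import deque
from typing import Callable, List, Sequence, Tuple

# (state, action, reward, next_state, done)
Transition = Tuple[Sequence[float], int, float, Sequence[float], bool]
QFunction = Callable[[Sequence[float]], List[float]]  # state -> one value per action


class ReplayBuffer:
    """Fixed-size store of past transitions, sampled uniformly at random."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        self._items = deque(maxlen=capacity)   # oldest transitions fall out first
        self._rng = random.Random(seed)

    def add(self, transition: Transition) -> None:
        self._items.append(transition)

    def sample(self, batch_size: int) -> List[Transition]:
        # Uniform sampling breaks up the correlation between consecutive steps.
        return self._rng.sample(list(self._items), batch_size)

    def __len__(self) -> int:
        return len(self._items)


def dqn_targets(batch: List[Transition], q_target: QFunction,
                gamma: float) -> List[float]:
    """Update targets computed from the frozen network Q', not the trained Q."""
    targets = []
    for _state, _action, reward, next_state, done in batch:
        bootstrap = 0.0 if done else gamma * max(q_target(next_state))
        targets.append(reward + bootstrap)
    return targets


def frozen_q(state: Sequence[float]) -> List[float]:
    """Hypothetical stand-in for Q': two action values derived from the state."""
    return [state[0], 2.0 * state[0]]


buffer = ReplayBuffer(capacity=3)
for t in range(5):
    buffer.add(([float(t)], 0, 1.0, [float(t + 1)], t == 4))
print(len(buffer), dqn_targets(buffer.sample(2), frozen_q, gamma=0.5))
```

In DQN the weights of Q are copied into Q' only periodically, so the targets stay fixed between copies while Q is being trained.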
3. applying RL to dialogue systems
applying RL to dialogue systems
training a dialogue manager requires a huge number of online samples
hence, a user simulator, trained on offline data, is used to train the dialogue manager
[Diagram: training loop between policy manager, user simulator and state tracker, with the labels "system act" and "dialogue act"; the user simulator is trained on offline data]
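The interaction between a policy and a user simulator can be sketched as a rollout loop. Everything in this sketch is a hypothetical stand-in: the `UserSimulator` below samples goals at random instead of being trained on offline data, and the reward values (a cost of 1 per turn, ±20 for success or failure) are illustrative choices, not numbers from the talk.

```python
import random

SLOTS = ("food", "price_range")   # hypothetical slots


class UserSimulator:
    """Toy stand-in for a simulator that would be fitted to offline dialogues."""

    def __init__(self, seed: int = 0) -> None:
        self._rng = random.Random(seed)
        self.goal = {}

    def reset(self) -> None:
        # Sample a hidden user goal for this dialogue.
        self.goal = {"food": self._rng.choice(["Indian", "Italian"]),
                     "price_range": self._rng.choice(["cheap", "expensive"])}

    def respond(self, system_act: str) -> str:
        """Return the simulated user's dialogue act for a system act."""
        for slot in SLOTS:
            if system_act == f"request({slot})":
                return f'inform({slot}="{self.goal[slot]}")'
        return "bye()"


def run_dialogue(policy, simulator: UserSimulator, max_turns: int = 10):
    """Roll out one simulated dialogue; returns (total reward, number of turns)."""
    simulator.reset()
    known, total = set(), 0.0
    for turn in range(1, max_turns + 1):
        system_act = policy(frozenset(known))
        if system_act == "offer":
            # Success only if every slot was collected before offering.
            total += 20.0 if known == set(SLOTS) else -20.0
            return total, turn
        user_act = simulator.respond(system_act)
        if user_act.startswith("inform("):
            # Crude state tracker: remember which slot the user filled.
            known.add(user_act[len("inform("):user_act.index("=")])
        total -= 1.0   # per-turn cost encourages short dialogues
    return total, max_turns


def handcrafted_policy(known: frozenset) -> str:
    """Baseline policy: request each missing slot, then make an offer."""
    for slot in SLOTS:
        if slot not in known:
            return f"request({slot})"
    return "offer"


print(run_dialogue(handcrafted_policy, UserSimulator()))
```

An RL agent would take the place of `handcrafted_policy` and learn from the rewards collected over many such simulated dialogues.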
deep RL for dialogue systems
exact state is not observed, hence a belief state is used
belief-state spaces are typically discretized into summary state spaces to make the task tractable
deep RL can be applied directly to the belief-state space due to its strong generalization properties
with pre-training, a deep RL method can become even more efficient
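A belief state can be illustrated as a probability distribution over the values of one slot, updated after each noisy language-understanding result. The sketch below assumes a simple observation model that is not from the talk: the reported value is correct with probability `confidence`, and the remaining probability is spread evenly over the other values. The function name `update_belief` is hypothetical, and the belief must contain at least two values.

```python
from typing import Dict, List


def update_belief(belief: Dict[str, float], observed_value: str,
                  confidence: float) -> Dict[str, float]:
    """Bayes update of a belief over slot values given a noisy NLU hypothesis."""
    n_other = len(belief) - 1
    posterior = {}
    for value, prior in belief.items():
        if value == observed_value:
            likelihood = confidence
        else:
            likelihood = (1.0 - confidence) / n_other
        posterior[value] = prior * likelihood
    total = sum(posterior.values())
    return {value: p / total for value, p in posterior.items()}


def as_features(belief: Dict[str, float]) -> List[float]:
    """Flatten the belief into a fixed-order vector, usable as network input."""
    return [belief[value] for value in sorted(belief)]


belief = {"Indian": 1 / 3, "Italian": 1 / 3, "Chinese": 1 / 3}
belief = update_belief(belief, "Indian", confidence=0.8)
print(as_features(belief))
```

Feeding the full probability vector to the network, rather than a discretized summary of it, is what applying deep RL directly to the belief-state space means in this sketch.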
effect of pre-training
[Figure: two panels, "without pre-training" and "with pre-training"; based on DSTC2 dataset]
summary
RL is a data-driven approach towards learning behaviour
RL does not require knowledge of good policy
RL can be used for online learning
combining RL with deep learning means that RL can be applied to much bigger problems
constructing a good policy for a modern dialogue manager is a challenging task
deep RL is the perfect candidate to address this challenge
Further reading:
“Reinforcement Learning: An Introduction” by Richard S. Sutton & Andrew G. Barto
https://webdocs.cs.ualberta.ca/~sutton/book/the-book.html
“Algorithms for Reinforcement Learning” by Csaba Szepesvári
https://sites.ualberta.ca/~szepesva/RLBook.html
“Policy Networks with Two-Stage Training for Dialogue Systems” by Mehdi Fatemi, Layla El Asri, Hannes Schulz, Jing He, Kaheer Suleman
https://arxiv.org/abs/1606.03152
Code examples:
simple DQN example in Python: https://edersantana.github.io/articles/keras_rl/
tool for testing/developing RL algorithms: https://gym.openai.com/

Harm van Seijen, Research Scientist, Maluuba at MLconf SF 2016

  • 1. Using Deep Reinforcement Learning for Dialogue Systems Harm van Seijen, Research Scientist Montréal, Canada
  • 2. spoken dialogue system natural language understanding state tracker policy manager natural language generation data “Hi, do you know a good
 Indian restaurant” system response user act system
 act dialogue state user inform(food=“Indian”) user input “Sure. What price range 
 are you thinking of?” request(price_range)
  • 3. spoken dialogue system natural language understanding state tracker policy manager natural language generation data “Hi, do you know a good
 Indian restaurant” system response user act system
 act dialogue state user The central question: how to train the policy manager? inform(food=“Indian”) user input “Sure. What price range 
 are you thinking of?” request(price_range)
  • 4. outline 1. what is reinforcement learning 2. solution strategies for RL 3. applying RL to dialogue systems
  • 5. what is reinforcement learning Reinforcement Learning is a data-driven 
 approach towards learning behaviour.
  • 6. what is reinforcement learning Reinforcement Learning is a data-driven 
 approach towards learning behaviour. machine learning unsupervised learning supervised learning reinforcement learning
  • 7. what is reinforcement learning Reinforcement Learning is a data-driven 
 approach towards learning behaviour. machine learning unsupervised learning supervised learning reinforcement learning + deep learning deep learning + + deep learning
  • 8. what is reinforcement learning Reinforcement Learning is a data-driven 
 approach towards learning behaviour. machine learning unsupervised learning supervised learning reinforcement learning + deep learning deep learning + + deep learning = deep reinforcement learning
  • 9. RL vs supervised learning. behaviour: function that maps environment states to actions
  • 10. RL vs supervised learning. supervised learning: hard to specify function; easy to identify correct output
  • 11. RL vs supervised learning. supervised learning example: recognizing cats in images (f: image → cat / no cat)
  • 12. RL vs supervised learning. reinforcement learning: hard to specify function; hard to identify correct output; easy to specify behaviour goal
  • 13. RL vs supervised learning. reinforcement learning example: double inverted pendulum. state: θ1, θ2, ω1, ω2; action: clockwise/counter-clockwise torque on top joint; goal: balance pendulum upright
  • 14. advantages of RL: does not require knowledge of a good policy; does not require labelled data; online learning: adaptation to environment changes
  • 15. challenges of RL: requires lots of data; sample distribution changes during learning; samples are not i.i.d.
  • 16. outline 1. what is reinforcement learning 2. solution strategies for RL 3. applying RL to dialogue systems
  • 27. finding the optimal policy: policy estimation; policy improvement [formulas not captured]
  • 28. finding the optimal policy. Q-learning: classical RL algorithm; combines (partial) policy evaluation with (partial) policy improvement; update target [formula not captured]
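The slide's own formula for the update target did not survive in this transcript. In the standard textbook form of Q-learning, the target for a transition (s, a, r, s') is r + γ max over a' of Q(s', a'), and Q(s, a) is moved a step of size α towards it. The sketch below applies that rule to a small chain environment. The environment, the names `step` and `q_learning`, and the values of α, γ and ε are all assumptions chosen for illustration, not taken from the talk.

```python
import random
from collections import defaultdict

N_STATES = 5                  # hypothetical chain: states 0..4, state 4 is terminal
ACTIONS = (0, 1)              # 0 = move left, 1 = move right
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1   # assumed step size, discount, exploration rate

def step(state, action):
    """Toy environment: reward 1 on reaching the right end, 0 otherwise."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

def q_learning(episodes=500, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)    # Q(s, a), initialised to 0
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < EPSILON:
                action = rng.choice(ACTIONS)            # explore
            else:
                best = max(q[(state, a)] for a in ACTIONS)
                # greedy action, ties broken at random
                action = rng.choice([a for a in ACTIONS if q[(state, a)] == best])
            next_state, reward, done = step(state, action)
            # update target: partial policy improvement (the max) combined with
            # partial policy evaluation (a single sampled backup)
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            target = reward + GAMMA * best_next
            q[(state, action)] += ALPHA * (target - q[(state, action)])
            state = next_state
    return q

q = q_learning()
greedy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]
print(greedy)   # greedy action per non-terminal state
```

The `max` inside the target is the policy-improvement half of the update; averaging sampled targets with step size α is the policy-evaluation half.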
  • 29. deep reinforcement learning: a 2015 Nature paper from DeepMind introduced an RL method based on deep learning, called DQN. main result: with the same network architecture, it learned to play a large number of Atari 2600 games effectively
  • 30. deep reinforcement learning. DQN characteristics: variation on Q-learning that uses deep neural networks to approximate the Q function; uses experience replay to deal with non-i.i.d. samples; uses two networks (Q and Q’) to mitigate non-stationarity of update targets
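The two DQN mechanisms named on slide 30 can be illustrated without a neural network. In the sketch below, plain dictionaries stand in for the networks Q and Q’ (a deliberate simplification, so this is not DQN itself), and the transitions, the names `ReplayBuffer` and `dqn_style_update`, and the constants are assumptions. It shows experience replay (updates use randomly sampled past transitions rather than consecutive ones) and a separate target copy that is only synchronised periodically, so update targets stay fixed between syncs.

```python
import random
from collections import deque

GAMMA, ALPHA, SYNC_EVERY = 0.9, 0.5, 10   # assumed discount, step size, sync period
ACTIONS = (0, 1)

class ReplayBuffer:
    """Fixed-size store of past transitions, sampled uniformly at random."""
    def __init__(self, capacity, seed=0):
        self.buffer = deque(maxlen=capacity)   # oldest transitions are dropped first
        self.rng = random.Random(seed)

    def add(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)

def dqn_style_update(q_online, q_target, batch):
    """Q-learning step on a sampled batch; targets come from the frozen copy."""
    for state, action, reward, next_state, done in batch:
        best_next = 0.0 if done else max(
            q_target.get((next_state, a), 0.0) for a in ACTIONS)
        target = reward + GAMMA * best_next
        old = q_online.get((state, action), 0.0)
        q_online[(state, action)] = old + ALPHA * (target - old)

# Hypothetical transitions: (state, action, reward, next_state, done)
buffer = ReplayBuffer(capacity=100)
buffer.add((0, 1, 0.0, 1, False))
buffer.add((1, 1, 1.0, 2, True))
buffer.add((0, 0, 0.0, 0, False))
buffer.add((1, 0, 0.0, 0, False))

q_online, q_target = {}, {}               # stand-ins for networks Q and Q'
for step_count in range(1, 201):
    dqn_style_update(q_online, q_target, buffer.sample(2))
    if step_count % SYNC_EVERY == 0:
        q_target = dict(q_online)         # periodic copy Q -> Q'
print(q_online)
```

In a real DQN the dictionary update would be a gradient step on the squared difference between the network output and the target, but the sampling and synchronisation logic has the same shape.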
  • 31. outline 1. what is reinforcement learning 2. solution strategies for RL 3. applying RL to dialogue systems
  • 32. applying RL to dialogue system: training a dialogue manager requires a huge number of online samples; hence, a user simulator, trained on offline data, is used to train the dialogue manager. diagram labels: offline data, training, user simulator, dialogue act, state tracker, policy manager, system act
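The training loop on slide 32 can be sketched as follows. Everything concrete here is an assumption made for illustration: the simulator is hand-written and merely samples user goals from a hypothetical `OFFLINE_GOALS` list (in the talk the simulator is trained on offline data), the state tracker is a trivial dictionary update, and the reward values (−1 per turn, ±10 at the end) are invented. The sketch only shows the interface: the policy emits a system act, the simulator answers with a dialogue act and a reward, and many such episodes can be generated without a human user.

```python
import random

# Hypothetical stand-in for offline data: user goals taken from logged dialogues.
OFFLINE_GOALS = [
    {"food": "Indian", "price_range": "cheap"},
    {"food": "Italian", "price_range": "moderate"},
    {"food": "Chinese", "price_range": "expensive"},
]
SLOTS = ("food", "price_range")

class UserSimulator:
    """Toy simulated user that answers requests according to a sampled goal."""
    def __init__(self, goals, seed=0):
        self.goals = goals
        self.rng = random.Random(seed)
        self.goal = None

    def reset(self):
        self.goal = self.rng.choice(self.goals)
        return ("hello", None, None)

    def step(self, system_act):
        act_type, payload = system_act
        if act_type == "request":
            # answer the requested slot; each extra turn costs -1 (assumed reward)
            return ("inform", payload, self.goal[payload]), -1.0, False
        # any other act is treated as an offer, which ends the dialogue
        reward = 10.0 if payload == self.goal else -10.0
        return ("bye", None, None), reward, True

def run_episode(simulator, policy, max_turns=10):
    """Roll out one simulated dialogue and return the total reward."""
    state, total_reward = {}, 0.0
    user_act = simulator.reset()
    for _ in range(max_turns):
        if user_act[0] == "inform":
            state[user_act[1]] = user_act[2]      # trivial state tracker
        system_act = policy(state)
        user_act, reward, done = simulator.step(system_act)
        total_reward += reward
        if done:
            break
    return total_reward

def ask_then_offer(state):
    """Hand-written baseline policy: fill every slot, then make an offer."""
    for slot in SLOTS:
        if slot not in state:
            return ("request", slot)
    return ("offer", dict(state))

print(run_episode(UserSimulator(OFFLINE_GOALS), ask_then_offer))   # 8.0
```

An RL method would replace `ask_then_offer` with a learned policy and use the returned rewards as its training signal.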
  • 33. deep RL for dialogue system: the exact state is not observed, hence a belief state is used; belief-state spaces are typically discretized into summary state spaces to make the task tractable; deep RL can be applied directly to the belief-state space due to its strong generalization properties; with pre-training, a deep RL method can become even more efficient
  • 34. effect of pre-training: plots without pre-training and with pre-training [based on DSTC2 dataset]
  • 35. summary: RL is a data-driven approach towards learning behaviour; RL does not require knowledge of a good policy; RL can be used for online learning; combining RL with deep learning means that RL can be applied to much bigger problems; constructing a good policy for a modern dialogue manager is a challenging task; deep RL is the perfect candidate to address this challenge
  • 36. Further reading:
 “Reinforcement Learning: An Introduction” by Richard S. Sutton & Andrew G. Barto: https://webdocs.cs.ualberta.ca/~sutton/book/the-book.html
 “Algorithms for Reinforcement Learning” by Csaba Szepesvari: https://sites.ualberta.ca/~szepesva/RLBook.html
 “Policy Networks with Two-Stage Training for Dialogue Systems” by Mehdi Fatemi, Layla El Asri, Hannes Schulz, Jing He, Kaheer Suleman: https://arxiv.org/abs/1606.03152
 Code examples:
 simple DQN example in Python: https://edersantana.github.io/articles/keras_rl/
 tool for testing/developing RL algorithms: https://gym.openai.com/