Harm van Seijen, Research Scientist, Maluuba at MLconf SF 2016

Using Deep Reinforcement Learning
for Dialogue Systems
Harm van Seijen, Research Scientist
Montréal, Canada

spoken dialogue system
[diagram: user input → natural language understanding → user act → state tracker → dialogue state → policy manager → system act → natural language generation → system response → user; the components are built from data]
example user input: “Hi, do you know a good Indian restaurant”
user act: inform(food=“Indian”)
system act: request(price_range)
system response: “Sure. What price range are you thinking of?”
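The pipeline in the diagram can be sketched in a few lines of Python. This is only an illustration: the function names, the keyword-spotting NLU, the templates, and the hand-written policy are all assumptions made for this sketch and are not taken from the talk.

```python
from dataclasses import dataclass, field


@dataclass
class DialogueAct:
    act_type: str                          # e.g. "inform" or "request"
    slots: dict = field(default_factory=dict)


def understand(utterance: str) -> DialogueAct:
    """Toy NLU: keyword spotting stands in for a trained model."""
    slots = {}
    if "indian" in utterance.lower():
        slots["food"] = "Indian"
    return DialogueAct("inform", slots)


def track_state(state: dict, user_act: DialogueAct) -> dict:
    """Toy state tracker: remember every slot the user has informed."""
    new_state = dict(state)
    new_state.update(user_act.slots)
    return new_state


def choose_system_act(state: dict) -> DialogueAct:
    """Hand-written policy manager: request the first missing slot."""
    for slot in ("food", "price_range"):
        if slot not in state:
            return DialogueAct("request", {slot: None})
    return DialogueAct("offer", dict(state))


def generate(system_act: DialogueAct) -> str:
    """Toy NLG: one template per system act type."""
    if system_act.act_type == "request":
        slot = next(iter(system_act.slots))
        return f"Sure. What {slot.replace('_', ' ')} are you thinking of?"
    return "I found a restaurant that matches."


state = {}
user_act = understand("Hi, do you know a good Indian restaurant")
state = track_state(state, user_act)
system_act = choose_system_act(state)
response = generate(system_act)
print(response)
```

The hand-written `choose_system_act` is the component that a learned policy would replace.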

spoken dialogue system
The central question: how to train the policy manager?

outline
1. what is reinforcement learning
2. solution strategies for RL
3. applying RL to dialogue systems

what is reinforcement learning
Reinforcement Learning is a data-driven  
approach towards learning behaviour.

machine learning
[diagram: three branches of machine learning: unsupervised learning, supervised learning, reinforcement learning; each can be combined with deep learning]
reinforcement learning + deep learning = deep reinforcement learning

RL vs supervised learning
behaviour: function that maps environment states to actions

supervised learning
hard to specify function
easy to identify correct output
example: recognizing cats in images
f: image → cat / no cat

reinforcement learning:
hard to identify correct output
easy to specify behaviour goal
example: double inverted pendulum
state: θ1, θ2, ω1, ω2  
action: clockwise/counter-clockwise 
torque on top joint
goal: balance pendulum upright
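To make "easy to specify behaviour goal" concrete, here is a minimal sketch of how the pendulum goal might be written as a reward function. The angle convention (measured from upright), the 12-degree threshold, and the placeholder policy are assumptions chosen for illustration; none of them come from the slides.

```python
import math

# Two actions, as on the slide: counter-clockwise or clockwise torque.
# The sign convention is an assumption.
ACTIONS = (-1.0, +1.0)


def reward(theta1, theta2, max_angle=math.radians(12)):
    """Return 1.0 while both links are within max_angle of upright, else 0.0.

    Angles are assumed to be measured from the upright position.
    """
    upright = abs(theta1) <= max_angle and abs(theta2) <= max_angle
    return 1.0 if upright else 0.0


def placeholder_policy(state):
    """A behaviour is any function from state to action.

    This rule is arbitrary and is not claimed to balance the pendulum;
    finding a good mapping is exactly what RL is for.
    """
    theta1, theta2, omega1, omega2 = state
    return ACTIONS[1] if theta2 + omega2 > 0 else ACTIONS[0]
```

Writing the goal takes a few lines, whereas labelling the correct torque for every state would require already knowing a good controller.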

advantages RL
does not require knowledge of good policy
does not require labelled data
online learning: adaptation to environment changes

challenges RL
requires lots of data
sample distribution changes during learning
samples are not i.i.d.

finding the optimal policy
policy evaluation: estimate the value of the current policy
policy improvement: make the policy greedy with respect to the estimated values
Q-learning:
classical RL algorithm
combines (partial) policy evaluation with (partial) policy improvement
update target: r + γ max_a' Q(s', a')
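A minimal sketch of tabular Q-learning shows how evaluation and improvement are interleaved in a single update. The five-state chain environment, the hyperparameters, and the function names below are assumptions made for this example only.

```python
import random

N_STATES, ACTIONS = 5, (0, 1)            # action 0 = left, 1 = right
GAMMA, ALPHA, EPSILON = 0.9, 0.5, 0.1


def step(state, action):
    """Toy chain: reaching the right-most state gives reward 1 and ends the episode."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def greedy(q, state, rng):
    """Policy improvement: pick a highest-valued action, breaking ties at random."""
    best = max(q[state])
    return rng.choice([a for a in ACTIONS if q[state][a] == best])


def q_learning(episodes=500, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy exploration around the current greedy policy.
            if rng.random() < EPSILON:
                action = rng.choice(ACTIONS)
            else:
                action = greedy(q, state, rng)
            next_state, r, done = step(state, action)
            # Update target: reward plus discounted value of the best next action.
            target = r if done else r + GAMMA * max(q[next_state])
            # Partial policy evaluation: move the estimate part of the way to the target.
            q[state][action] += ALPHA * (target - q[state][action])
            state = next_state
    return q


q = q_learning()
```

The `max` inside the target is the improvement step, and the incremental update with step size `ALPHA` is the partial evaluation step.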

deep reinforcement learning
2015 Nature paper from DeepMind introduced an RL
method based on deep learning, called DQN
main result: with same network architecture, learned to
play large number of Atari 2600 games effectively
DQN characteristics
variation on Q-learning that uses deep neural networks to
approximate the Q function
uses experience replay to deal with non-i.i.d. samples
uses two networks (Q and Q’) to mitigate non-stationarity of
update targets
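The two stabilising ideas listed above, experience replay and a separate target network, can be sketched without any deep learning library. In this illustration a small linear function stands in for the deep network, and all names (`ReplayBuffer`, `q_values`, `dqn_targets`, `sync`) are hypothetical helpers written for this sketch, not the DQN reference code.

```python
import random
from collections import deque


class ReplayBuffer:
    """Stores transitions and returns uniformly sampled minibatches."""

    def __init__(self, capacity, seed=0):
        self.buffer = deque(maxlen=capacity)    # oldest transitions drop out first
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling breaks the temporal correlation between samples.
        return self.rng.sample(list(self.buffer), batch_size)


def q_values(weights, state):
    """Stand-in for a deep network: one linear output per action."""
    return [sum(w * x for w, x in zip(row, state)) for row in weights]


def dqn_targets(batch, target_weights, gamma=0.99):
    """Compute update targets from the frozen copy Q', not the network being trained."""
    targets = []
    for _, _, reward, next_state, done in batch:
        bootstrap = 0.0 if done else gamma * max(q_values(target_weights, next_state))
        targets.append(reward + bootstrap)
    return targets


def sync(online_weights):
    """Copy Q into Q'; done only periodically so the targets change slowly."""
    return [list(row) for row in online_weights]


online = [[0.0, 1.0], [1.0, 0.0]]
target = sync(online)
buffer = ReplayBuffer(capacity=100)
buffer.add([1.0, 0.0], 0, 1.0, [0.0, 1.0], False)
buffer.add([0.0, 1.0], 1, 0.0, [1.0, 0.0], True)
batch = buffer.sample(2)
targets = dqn_targets(batch, target, gamma=0.5)
```

Because `target` is a copy, gradient steps on `online` would not move the targets until the next call to `sync`.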

applying RL to dialogue systems
training a dialogue manager requires a huge number
of online samples
hence, a user simulator, trained on offline data, is
used to train the dialogue manager
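[diagram: training loop in which a user simulator, built from offline data, stands in for the real user: user simulator → dialogue act → state tracker → policy manager → system act → user simulator]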
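The training loop can be sketched as a rollout between a policy and a simulated user. The simulator below is hand-coded purely for illustration, whereas the talk describes one trained on offline data; the reward values, the slot names, and every identifier here are assumptions.

```python
import random

SYSTEM_ACTS = ("request_food", "request_price_range", "offer")


class ToySimulatedUser:
    """Hand-coded stand-in for a simulator that would be fit to offline dialogues."""

    def __init__(self, rng):
        self.goal = {"food": rng.choice(["Indian", "Italian"]),
                     "price_range": rng.choice(["cheap", "expensive"])}

    def respond(self, system_act):
        """Answer a request with (slot, value); return None for other acts."""
        if system_act.startswith("request_"):
            slot = system_act[len("request_"):]
            return slot, self.goal[slot]
        return None


def run_episode(policy, rng, max_turns=10):
    """Roll out one simulated dialogue and return its total reward."""
    user, state, total = ToySimulatedUser(rng), {}, 0.0
    for _ in range(max_turns):
        system_act = policy(state)
        total -= 1.0                          # per-turn cost favours short dialogues
        if system_act == "offer":
            success = state == user.goal      # offer succeeds only with all slots known
            return total + (20.0 if success else 0.0)
        slot, value = user.respond(system_act)
        state = {**state, slot: value}        # trivial state tracker
    return total


def handcrafted_policy(state):
    """Baseline policy: request missing slots, then offer."""
    for slot in ("food", "price_range"):
        if slot not in state:
            return "request_" + slot
    return "offer"


rng = random.Random(0)
returns = [run_episode(handcrafted_policy, rng) for _ in range(100)]
print(sum(returns) / len(returns))
```

An RL learner would take the place of `handcrafted_policy` and use the episode returns as its learning signal, generating as many simulated dialogues as it needs.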

deep RL for dialogue systems
exact state is not observed, hence belief state is
used
belief-state spaces are typically discretized into
summary state spaces to make the task tractable
deep RL can be applied directly to the belief-state
space due to its strong generalization properties
with pre-training, a deep RL method can become
even more efficient
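The difference between a belief state and a summary state can be illustrated with a toy example. The update rule, the bin count, and the slot values below are assumptions made for this sketch; they are not the tracker or the discretization used in the work presented.

```python
def update_belief(belief, observation_scores, floor=1e-3):
    """Multiply the prior by NLU confidence scores, then renormalize.

    Values the NLU did not mention get a small floor score instead of zero.
    """
    posterior = {v: belief[v] * observation_scores.get(v, floor) for v in belief}
    total = sum(posterior.values())
    return {v: p / total for v, p in posterior.items()}


def summary_state(belief, n_bins=4):
    """Discretize: keep only the bin that the top hypothesis's probability falls in."""
    top = max(belief.values())
    return min(int(top * n_bins), n_bins - 1)


def belief_vector(belief):
    """Full belief as a fixed-order feature vector, the kind of input a network could take."""
    return [belief[v] for v in sorted(belief)]


belief = {"Indian": 1 / 3, "Italian": 1 / 3, "Chinese": 1 / 3}
belief = update_belief(belief, {"Indian": 0.8, "Italian": 0.2})
print(summary_state(belief), belief_vector(belief))
```

The summary state collapses the distribution to a single integer, while the belief vector keeps every probability and leaves it to the function approximator to generalize across similar beliefs.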

effect of pre-training
[figure: two panels, “without pre-training” and “with pre-training”; based on DSTC2 dataset]

summary
RL is a data-driven approach towards learning
behaviour
RL does not require knowledge of good policy
RL can be used for online learning
combining RL with deep learning means that RL
can be applied to much bigger problems
constructing a good policy for a modern dialogue
manager is a challenging task
deep RL is the perfect candidate to address this
challenge

Further reading:
“Reinforcement Learning: An Introduction”
by Richard S. Sutton & Andrew G. Barto
https://webdocs.cs.ualberta.ca/~sutton/book/the-book.html
“Algorithms for Reinforcement Learning”
by Csaba Szepesvári
https://sites.ualberta.ca/~szepesva/RLBook.html
“Policy Networks with Two-Stage Training for Dialogue Systems”
by Mehdi Fatemi, Layla El Asri, Hannes Schulz, Jing He, Kaheer Suleman
https://arxiv.org/abs/1606.03152
Code examples:
simple DQN example in Python:  
https://edersantana.github.io/articles/keras_rl/
tool for testing/developing RL algorithms:  
https://gym.openai.com/
