Harm van Seijen, Research Scientist, Maluuba at MLconf SF 2016
1. Using Deep Reinforcement Learning for Dialogue Systems
Harm van Seijen, Research Scientist
Montréal, Canada
2. spoken dialogue system
pipeline: natural language understanding → state tracker → policy manager → natural language generation (the diagram also shows a data component)
user input: “Hi, do you know a good Indian restaurant”
user act: inform(food=“Indian”)
dialogue state: passed from state tracker to policy manager
system act: request(price_range)
system response: “Sure. What price range are you thinking of?”
3. spoken dialogue system
(same pipeline diagram and example as slide 2)
The central question: how to train the policy manager?
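To make the pipeline concrete, here is a minimal Python sketch of the four components, run on the example from the slide. All function names, the keyword-spotting NLU and the template NLG are assumptions made for illustration; they are not taken from the talk. The hand-written `policy` function is only a placeholder for the part that the talk proposes to learn.

```python
from dataclasses import dataclass, field


@dataclass
class DialogueAct:
    act_type: str                              # e.g. "inform" or "request"
    slots: dict = field(default_factory=dict)  # e.g. {"food": "Indian"}


def understand(utterance: str) -> DialogueAct:
    """Toy natural language understanding: keyword spotting (illustrative only)."""
    slots = {}
    if "indian" in utterance.lower():
        slots["food"] = "Indian"
    return DialogueAct("inform", slots)


def track_state(state: dict, user_act: DialogueAct) -> dict:
    """Toy state tracker: accumulate the slots the user has provided so far."""
    new_state = dict(state)
    new_state.update(user_act.slots)
    return new_state


def policy(state: dict) -> DialogueAct:
    """Hand-written placeholder policy: request the first missing slot."""
    for slot in ("food", "price_range"):
        if slot not in state:
            return DialogueAct("request", {slot: None})
    return DialogueAct("offer", dict(state))


def generate(system_act: DialogueAct) -> str:
    """Toy natural language generation: one template per act type."""
    if system_act.act_type == "request":
        slot = next(iter(system_act.slots))
        return f"Sure. What {slot.replace('_', ' ')} are you thinking of?"
    return "Here is a suggestion."


def respond(state: dict, utterance: str):
    """One turn through the pipeline: NLU -> state tracker -> policy -> NLG."""
    user_act = understand(utterance)
    state = track_state(state, user_act)
    system_act = policy(state)
    return state, system_act, generate(system_act)


state, system_act, reply = respond({}, "Hi, do you know a good Indian restaurant")
print(system_act, reply)
```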
4. outline
1. what is reinforcement learning
2. solution strategies for RL
3. applying RL to dialogue systems
8. what is reinforcement learning
Reinforcement Learning is a data-driven approach towards learning behaviour.
machine learning: unsupervised learning, supervised learning, reinforcement learning
each of the three can be combined with deep learning
reinforcement learning + deep learning = deep reinforcement learning
11. RL vs supervised learning
supervised learning
hard to specify function
easy to identify correct output
behaviour: function that maps environment states to actions
example: recognizing cats in images
f: image → cat / no cat
13. RL vs supervised learning
behaviour: function that maps environment states to actions
reinforcement learning:
hard to specify function
hard to identify correct output
easy to specify behaviour goal
example: double inverted pendulum
state: θ1, θ2, ω1, ω2
action: clockwise/counter-clockwise torque on top joint
goal: balance pendulum upright
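The point that the goal is easy to specify even when the behaviour is not can be sketched in Python. The reward function below is one possible way to express "balance pendulum upright"; the angle convention (radians measured from upright), the tolerance value and all names are assumptions for illustration, not part of the talk.

```python
import random
from typing import NamedTuple


class State(NamedTuple):
    theta1: float  # angle of first link (assumed: radians from upright)
    theta2: float  # angle of second link (assumed: radians from upright)
    omega1: float  # angular velocity of first link
    omega2: float  # angular velocity of second link


ACTIONS = ("clockwise", "counter-clockwise")  # torque direction on top joint


def reward(state: State, tolerance: float = 0.2) -> float:
    """Goal specification: 1 while both links are near upright, else 0."""
    upright = abs(state.theta1) < tolerance and abs(state.theta2) < tolerance
    return 1.0 if upright else 0.0


def random_policy(state: State) -> str:
    """A behaviour maps states to actions; this one ignores the state."""
    return random.choice(ACTIONS)


print(reward(State(0.05, -0.1, 0.0, 0.0)))  # near upright
print(reward(State(3.0, 0.0, 0.0, 0.0)))    # first link hanging down
```

Writing `reward` takes a few lines, whereas a good replacement for `random_policy` is what the learning algorithm has to find.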
14. advantages RL
does not require knowledge of good policy
does not require labelled data
online learning: adaptation to environment changes
30. deep reinforcement learning
2015 Nature paper from DeepMind introduced an RL method based on deep learning, called DQN
main result: with same network architecture, learned to play large number of Atari 2600 games effectively
DQN characteristics
variation on Q-learning that uses deep neural networks to approximate the Q function
uses experience replay to deal with non-i.i.d. samples
uses two networks (Q and Q’) to mitigate non-stationarity of update targets
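The three characteristics above can be sketched in plain Python. This is not the DQN implementation from the paper: to stay self-contained it swaps the deep network for a linear function on one-hot features, and the 5-state chain environment, the hyperparameters and all names are assumptions for illustration. What it keeps is the structure: Q-learning targets, a replay buffer sampled at random, and a separate target copy Q' that is only synchronized periodically.

```python
import random
from collections import deque

N_STATES, N_ACTIONS = 5, 2      # toy chain; action 1 = right, action 0 = left
GAMMA, ALPHA = 0.9, 0.1


def step(state, action):
    """Hypothetical toy environment: reward 1 for reaching the right end."""
    nxt = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def features(state):
    """One-hot features (a deep network would learn its own features)."""
    return [1.0 if i == state else 0.0 for i in range(N_STATES)]


def q_values(weights, state):
    """Linear stand-in for the Q network: one weight vector per action."""
    x = features(state)
    return [sum(w * xi for w, xi in zip(weights[a], x)) for a in range(N_ACTIONS)]


def greedy_action(weights, state, rng):
    """Pick a highest-valued action, breaking ties at random."""
    qs = q_values(weights, state)
    best = max(qs)
    return rng.choice([a for a, v in enumerate(qs) if v == best])


def train(episodes=500, batch_size=16, sync_every=20, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0] * N_STATES for _ in range(N_ACTIONS)]   # online weights (Q)
    q_target = [row[:] for row in q]                   # frozen copy (Q')
    replay = deque(maxlen=1000)                        # experience replay buffer
    updates = 0
    for _ in range(episodes):
        state, done, steps = 0, False, 0
        while not done and steps < 50:
            if rng.random() < epsilon:                 # epsilon-greedy exploration
                action = rng.randrange(N_ACTIONS)
            else:
                action = greedy_action(q, state, rng)
            nxt, reward, done = step(state, action)
            replay.append((state, action, reward, nxt, done))
            state, steps = nxt, steps + 1
            if len(replay) < batch_size:
                continue
            # Random minibatch from the buffer breaks up correlated samples.
            for s, a, r, s2, d in rng.sample(list(replay), batch_size):
                # Targets use Q', so they stay fixed between synchronizations.
                target = r if d else r + GAMMA * max(q_values(q_target, s2))
                td_error = target - q_values(q, s)[a]
                for i, xi in enumerate(features(s)):
                    q[a][i] += ALPHA * td_error * xi
            updates += 1
            if updates % sync_every == 0:
                q_target = [row[:] for row in q]       # synchronize Q' with Q
    return q


weights = train()
print([q_values(weights, s) for s in range(N_STATES - 1)])
```

In a real DQN the inner update would be a gradient step on a neural network; the replay buffer and the periodic copy into Q' play the same roles as here.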
31. outline
1. what is reinforcement learning
2. solution strategies for RL
3. applying RL to dialogue systems
32. applying RL to dialogue system
training dialogue manager requires huge number of online samples
hence, a user simulator, trained on offline data, is used to train dialogue manager
diagram: policy manager sends system act to user simulator; user simulator's dialogue act goes through state tracker back to policy manager; offline data is used for training the user simulator
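A rough Python sketch of this setup follows. It assumes a very simple user simulator that estimates P(user act | system act) by counting over an offline corpus; the corpus, the act strings and all function names are hypothetical, and the talk does not say how its simulator is built. The RL update itself is left out; the sketch only shows how simulated dialogues replace online interaction with real users.

```python
import random
from collections import Counter, defaultdict

# Hypothetical offline corpus: (system act, observed user act) pairs.
OFFLINE_DATA = [
    ("request(food)", "inform(food)"),
    ("request(food)", "inform(food)"),
    ("request(food)", "inform(price_range)"),
    ("request(price_range)", "inform(price_range)"),
    ("request(price_range)", "inform(price_range)"),
    ("offer", "bye"),
]


def fit_simulator(pairs):
    """Train the simulator: count user acts observed after each system act."""
    counts = defaultdict(Counter)
    for system_act, user_act in pairs:
        counts[system_act][user_act] += 1
    return counts


def simulate_user(simulator, system_act, rng):
    """Sample a user act in proportion to its observed frequency."""
    acts, weights = zip(*simulator[system_act].items())
    return rng.choices(acts, weights=weights)[0]


def handcrafted_policy(state):
    """Placeholder for the policy manager that RL would train."""
    for slot in ("food", "price_range"):
        if slot not in state:
            return f"request({slot})"
    return "offer"


def run_dialogue(policy, simulator, rng, max_turns=10):
    """Roll out one simulated dialogue and return its (system, user) turns."""
    state = set()                      # toy state tracker: slots filled so far
    turns = []
    for _ in range(max_turns):
        system_act = policy(state)
        user_act = simulate_user(simulator, system_act, rng)
        turns.append((system_act, user_act))
        if user_act == "bye":
            break
        if user_act.startswith("inform("):
            state.add(user_act[len("inform("):-1])
    return turns


simulator = fit_simulator(OFFLINE_DATA)
for turn in run_dialogue(handcrafted_policy, simulator, random.Random(0)):
    print(turn)
```

A training loop would call `run_dialogue` many times, score each rollout with a reward (for example task success minus dialogue length), and use those samples to update the policy.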
33. deep RL for dialogue system
exact state is not observed, hence belief state is used
belief-state spaces are typically discretized into summary state spaces to make the task tractable
deep RL can be applied directly to the belief-state space due to its strong generalization properties
with pre-training, a deep RL method can become even more efficient
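The difference between the two representations can be illustrated with a small Python sketch. The discretization scheme (top value plus a confidence bin), the slot values and the function names are assumptions chosen for illustration; actual summary-state constructions vary between systems.

```python
def summary_state(belief, n_bins=3):
    """Discretize a belief over slot values into (top value, confidence bin)."""
    top_value, top_prob = max(belief.items(), key=lambda kv: kv[1])
    bin_index = min(int(top_prob * n_bins), n_bins - 1)
    return top_value, bin_index


def belief_vector(belief, values):
    """Full belief as a fixed-order vector, usable directly as network input."""
    return [belief.get(v, 0.0) for v in values]


FOOD_VALUES = ["Indian", "Italian", "Chinese"]          # hypothetical slot values
belief = {"Indian": 0.7, "Italian": 0.2, "Chinese": 0.1}

print(summary_state(belief))               # small discrete state for a tabular method
print(belief_vector(belief, FOOD_VALUES))  # continuous input for a deep RL method
```

The summary state discards the probabilities of the competing values, while the belief vector keeps them and relies on the network to generalize across nearby beliefs.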
35. summary
RL is a data-driven approach towards learning behaviour
RL does not require knowledge of good policy
RL can be used for online learning
combining RL with deep learning means that RL can be applied to much bigger problems
constructing a good policy for a modern dialogue manager is a challenging task
deep RL is the perfect candidate to address this challenge
36. Further reading:
“Reinforcement Learning: An Introduction”
by Richard S. Sutton & Andrew G. Barto
https://webdocs.cs.ualberta.ca/~sutton/book/the-book.html
“Algorithms for Reinforcement Learning”
by Csaba Szepesvári
https://sites.ualberta.ca/~szepesva/RLBook.html
“Policy Networks with Two-Stage Training for Dialogue Systems”
by Mehdi Fatemi, Layla El Asri, Hannes Schulz, Jing He, Kaheer Suleman
https://arxiv.org/abs/1606.03152
Code examples:
simple DQN example in Python:
https://edersantana.github.io/articles/keras_rl/
tool for testing/developing RL algorithms:
https://gym.openai.com/