Deep Learning and Reinforcement Learning summer schools summary
26th June-5th July 2017, Montreal, Quebec
1. Deep Learning & Reinforcement Learning MILA Summer School Highlights
Natalia Díaz Rodríguez, PhD
26th June-5th July 2017, Montreal, Quebec
2. Learning to Learn - Nando de Freitas
• What is our intrinsic motivation for being here? Learning, and the satisfaction of gaining knowledge
• From the Bengio brothers' '92 work to GitHub.com/deepmind/learning-to-learn
• A single network acts as both optimizer and optimizee
• Generalize: learning to learn X by doing Y (e.g. unsupervised learning via supervised learning)
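The optimizer/optimizee split can be sketched in a few lines: a parametric update rule with its own parameters phi replaces the hand-designed step -lr * grad, and "meta-training" picks the phi whose unrolled trajectory ends lowest. This is a toy illustration under made-up names, not the LSTM optimizer from the DeepMind repo:

```python
# Sketch of the "learning to learn" setup: an optimizer with its own
# parameters phi produces updates for an optimizee from its gradients,
# replacing the hand-designed rule  delta = -lr * grad.
# All names and numbers here are illustrative.

def optimizee_loss(theta):
    """Toy quadratic optimizee: minimum at theta = 3."""
    return (theta - 3.0) ** 2

def optimizee_grad(theta):
    return 2.0 * (theta - 3.0)

def learned_update(grad, phi):
    """A tiny 'learned' update rule: a learnable gain on the gradient.
    In the paper this is an LSTM; here phi is a single scalar."""
    return -phi * grad

def run_optimizee(phi, theta=0.0, steps=20):
    """Unroll the optimizee under the learned rule; return final loss."""
    for _ in range(steps):
        theta += learned_update(optimizee_grad(theta), phi)
    return optimizee_loss(theta)

# "Meta-training": pick the phi whose unrolled trajectory ends lowest.
candidates = [0.01, 0.1, 0.3, 0.9]
best_phi = min(candidates, key=run_optimizee)
```

The meta-loss here is the optimizee's final loss after unrolling, which is exactly what the learned optimizer is trained against in the real setup.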
16. Automatic differentiation: the new trend across all DL frameworks
• Matt Johnson's great tutorial on automatic differentiation
• IDEA: checkpointing and less configuration boilerplate code
• Becoming standard:
• TensorFlow Eager
• PyTorch's tape-based autograd
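What the frameworks automate can be shown with the simplest possible AD scheme: forward-mode differentiation with dual numbers, where each value carries (value, derivative) and arithmetic propagates both. A minimal sketch (TensorFlow Eager and PyTorch actually use reverse-mode, tape-based autograd; this is only the idea):

```python
# Minimal forward-mode automatic differentiation with dual numbers:
# each value carries (value, derivative) and arithmetic propagates both.

class Dual:
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.dot + other.dot)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)

    __rmul__ = __mul__

def derivative(f, x):
    """Evaluate df/dx at x by seeding the dual part with 1."""
    return f(Dual(x, 1.0)).dot

# d/dx (3x^2 + 2x) at x = 2  ->  6x + 2 = 14
g = derivative(lambda x: 3 * x * x + 2 * x, 2.0)
```

The user writes plain arithmetic and never a derivative, which is the "less boilerplate" point above.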
19. GANs state of the art
• Applications: image generation, attribute morphing, image inpainting…
• State of the art:
• BEGAN*, CycleGAN (draw a bag and find a real one)
• Unsupervised Pixel-Level Domain Adaptation with Generative Adversarial Networks, Bousmalis '16 (an unsupervised GAN-based architecture able to learn a transformation without using corresponding pairs from the two domains; code to appear, CVPR17).
• The best state-of-the-art approach, improving over prior work on:
• Decoupling from the task-specific architecture
• Generalization across label spaces
• Training stability
• Data augmentation
* Fast and stable: a new boundary-equilibrium enforcing method paired with a loss derived from the Wasserstein distance for training auto-encoder based GANs
CycleGAN
21. KNN is still one of the most widely used quantitative measures for unsupervised evaluation
Bousmalis '16
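The KNN evaluation protocol amounts to: embed the data with the (frozen, unsupervised) encoder, then report nearest-neighbour classification accuracy on held-out points. A pure-Python toy, where two well-separated 2-D clusters stand in for a good embedding:

```python
# Sketch of KNN-based evaluation of an unsupervised representation:
# embed points with the learned encoder, then score the embedding by
# nearest-neighbour classification accuracy. The "embeddings" below
# are hand-made for illustration.

def knn_predict(train_x, train_y, query, k=3):
    """Label of `query` by majority vote among its k nearest neighbours."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, query)), y)
        for x, y in zip(train_x, train_y)
    )
    votes = [y for _, y in dists[:k]]
    return max(set(votes), key=votes.count)

def knn_accuracy(train_x, train_y, test_x, test_y, k=3):
    hits = sum(knn_predict(train_x, train_y, q, k) == y
               for q, y in zip(test_x, test_y))
    return hits / len(test_y)

# Two well-separated clusters stand in for a good embedding.
emb = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 5.0), (5.1, 5.2)]
lab = [0, 0, 0, 1, 1, 1]
acc = knn_accuracy(emb, lab, [(0.1, 0.1), (5.1, 5.1)], [0, 1])
```

High KNN accuracy means the representation clusters classes without ever having seen labels, which is why it works as an unsupervised metric.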
22. GANs help semi- and unsupervised learning, as well as domain randomisation
23. • CVAE-GAN: fine-grained category image generation.
CVAE-GAN: Fine-Grained Image Generation through Asymmetric Training, Bao '17
GAN mode collapse: the inability to generate a varied distribution of data
25. One/Few-shot learning
• Extending siamese networks with one-shot learning: Siamese Neural Networks for One-shot Image Recognition.
One Shot Learning with Siamese Networks in PyTorch - Harshvardhan Gupta - Medium
• Black-Box Data-efficient Policy Search for Robotics, Mouret '17 (Gaussian process regression for policy optimisation using model-based policy search). 5 episodes are enough to learn the whole dynamics of the arm from scratch.
27. • If you can't predict reward, predict a relative ordering rank (same vs different)
• Siamese network: optimize all rankings simultaneously
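The ranking idea can be sketched with a margin ranking loss: a shared (siamese) scorer is applied to both items of a pair, and the loss is zero once the preferred item's score beats the other by a margin. Summing over all pairs optimizes all rankings simultaneously. The scorer and data below are made up:

```python
# Sketch of ranking with a siamese scorer: instead of predicting reward
# directly, score pairs and train so preferred items score higher.
# `score` stands in for a shared network applied to both inputs.

def score(x, w):
    """Shared scorer applied to both branches of the siamese pair."""
    return sum(wi * xi for wi, xi in zip(w, x))

def margin_ranking_loss(pos, neg, w, margin=1.0):
    """Hinge loss: zero once score(pos) beats score(neg) by `margin`."""
    return max(0.0, margin - (score(pos, w) - score(neg, w)))

def total_loss(pairs, w):
    """Optimize all rankings simultaneously by summing pair losses."""
    return sum(margin_ranking_loss(p, n, w) for p, n in pairs)

pairs = [((2.0, 1.0), (1.0, 1.0)),   # first item preferred in each pair
         ((3.0, 0.0), (0.0, 0.0))]
bad_w = (0.0, 0.0)    # scores everything equally -> full margin loss
good_w = (2.0, 0.0)   # separates every pair by more than the margin
```

Only relative order matters here, which is exactly why this works when absolute reward is unpredictable.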
28. • Natural language embedding into a multidimensional space really helps learning (humans ALWAYS learn language)
• Physics and bodies provide essential consistency for understanding intelligence, and facilitate transfer and continuous learning
• Solving many tasks helps: sometimes many tasks are essential to learn at all [learning more things at once often helps performance in RL; intentional-unintentional agents]
• Reporting failure cases is also important!
Take Home Messages [NdF]
31. • TD-learning is back & hot (ever since the first TD-Gammon AI won games)*
• Only 1 reward at the end
• No feedback along the way
• New venue: Int'l Conference on Reinforcement Learning and Decision Making (RLDM) https://groups.google.com/forum/#!forum/rldm-list
* See the unsupervised representation learning talk by R. Sutton and the latest from DeepMind (Mnih '17 evolution of UNREAL)
Take Home Messages
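The TD idea above fits in a few lines: with a single reward at the very end and no feedback along the way, the tabular update V(s) += alpha * (r + gamma * V(s') - V(s)) still propagates value backwards through the chain (the TD-Gammon setting in miniature). The states and episode below are made up:

```python
# Tabular TD(0) sketch: one terminal reward, no intermediate feedback,
# value propagated backwards by repeated bootstrapped updates.

def td0_episode(V, episode, alpha=0.5, gamma=0.9):
    """One pass of TD(0) over (state, reward, next_state) transitions."""
    for s, r, s_next in episode:
        target = r + gamma * V.get(s_next, 0.0)
        V[s] += alpha * (target - V[s])
    return V

# 3-state chain a -> b -> c with a single reward of 1.0 at the end.
episode = [("a", 0.0, "b"), ("b", 0.0, "c"), ("c", 1.0, "end")]
V = {"a": 0.0, "b": 0.0, "c": 0.0, "end": 0.0}
for _ in range(50):
    td0_episode(V, episode)
```

After enough passes the values converge to the discounted returns 0.81, 0.9, 1.0: the terminal reward has leaked all the way back to the start state despite never being observed there.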
32. • Domain randomization: used to transfer from simulation to real-life learning without domain adaptation (OpenAI; NVIDIA cube pose estimation: distractors and different backgrounds, lights, and virtual elements added to real images).
• Learning by demonstration and few-shot learning: among the most data-efficient learning algorithms for semi-supervised learning
Take Home Messages
33. • Regularizing NNs by penalising confident output distributions [Pereyra '17].
• Additional objectives (similar to UNREAL): RL with Unsupervised Auxiliary Tasks [Jaderberg '17]
• Generating grounded rewards automatically [Littman, Topcu et al. '17].
Take Home Papers
* Reinforcement Learning with Unsupervised Auxiliary Tasks - implementation: https://github.com/miyosuda/unreal
** Option: a generalisation of a single-step action that may span more than one timestep and can be used as a standard action. We move to the policy mu over options o with probability mu(s,o), and can derive a policy over options Pi_omega that maximises the expected discounted sum of rewards.
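The confidence penalty from [Pereyra '17] is simple to state: add -beta * H(p) to the loss, so that low-entropy (over-confident) output distributions are penalised. A toy sketch; beta and the example distributions are made up:

```python
import math

# Sketch of the confidence penalty: negative log-likelihood minus a
# scaled entropy bonus, so over-confident predictions pay a price.

def entropy(p):
    """Shannon entropy H(p) = -sum p_i log p_i (natural log)."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0.0)

def penalised_nll(p, true_idx, beta=0.1):
    """Negative log-likelihood minus beta * entropy of the prediction."""
    return -math.log(p[true_idx]) - beta * entropy(p)

confident = [0.98, 0.01, 0.01]   # low entropy -> little bonus
hedged    = [0.70, 0.15, 0.15]   # higher entropy -> larger bonus
```

Since the entropy term is subtracted, a hedged prediction gets a larger discount on its loss than a spiked one, pushing the network away from saturated outputs.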
34. • DeepMind's 2 parallel works: Relational Networks and Visual Interaction Networks (philosophically similar works using abstract logic to reason about the world).
• Dealing with sparse rewards:
• Reward shaping: Off-Policy Reward Shaping with Ensembles: https://arxiv.org/abs/1502.03248 and Expressing Arbitrary Reward Functions as Potential-Based Advice: https://www.aaai.org/ocs/index.php/AAAI/AAAI15/paper/viewFile/9893/9923
• http://papers.nips.cc/paper/6538-safe-and-efficient-off-policy-reinf
• https://ai.vub.ac.be/sites/default/files/PID3130853.pdf
• Reinforcement Learning from Demonstration through Shaping
• Non-Markovian Rewards Expressed in LTL: Guiding Search Via Reward Shaping. A. Camacho et al. (RLDM), June 2017
• https://arxiv.org/pdf/1706.10295.pdf
Take Home Papers
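Potential-based shaping, the construction behind several of the reward-shaping references above, adds F(s, s') = gamma * phi(s') - phi(s) to each reward; the terms telescope over a trajectory, so optimal policies are unchanged. A sketch with made-up potential values:

```python
# Sketch of potential-based reward shaping: the shaping term
# F(s, s') = gamma * phi(s') - phi(s) telescopes over a trajectory,
# leaving optimal policies unchanged while densifying feedback.

def shaped_reward(r, s, s_next, phi, gamma=0.99):
    return r + gamma * phi.get(s_next, 0.0) - phi.get(s, 0.0)

def shaped_return(trajectory, phi, gamma=0.99):
    """Discounted return of a trajectory of (s, r, s_next) under shaping."""
    return sum((gamma ** t) * shaped_reward(r, s, sn, phi, gamma)
               for t, (s, r, sn) in enumerate(trajectory))

phi = {"start": 0.0, "mid": 0.5, "goal": 1.0}   # "closer to goal is better"
traj = [("start", 0.0, "mid"), ("mid", 1.0, "goal")]

# Telescoping: shaped return = plain return + gamma^T * phi(s_T) - phi(s_0).
plain = sum((0.99 ** t) * r for t, (_, r, _) in enumerate(traj))
```

Because the extra term depends only on the start and end potentials, shaping changes how often the agent gets feedback, not which behaviour is optimal.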
35. • GANs:
• Allan Ma (Guelph): state-of-the-art GAN implementation + evaluation.
• GANs used to perform domain adaptation (useful ideas to go from robot simulation to the real-world robot)
• Language grounding and visual/dialog hybrid systems (ideas for the PARL.AI grant call): End-to-end optimization of goal-driven and visually grounded dialogue systems
Take Home Papers
36. • Dex-Net grasping dataset (10K 3D models to acquire force-closure grasps, for the ABB YuMi)
• ROS service for grasp planning. Dex-Net as a Service: Fall 2017. HTTP web API to create new databases with custom 3D models and compute grasp robustness metrics.
• Google robot farm dataset: many robot arms for grasping, pushing, etc. 800,000 grasp attempts (6-14 robotic manipulators)
• Using Baxter:
• Pinto and Gupta's Baxter dataset (40k grasping experiences). CNNs predict lifting success or learn to resist grasp perturbations caused by an adversary*.
• Oberlin '15: autonomously collecting object scans
Take Home Datasets
* Lerrel Pinto and Abhinav Gupta. Supersizing self-supervision: Learning to grasp from 50k tries and 700 robot hours. In Proc. IEEE Int. Conf. Robotics and Automation (ICRA), 2016.
Lerrel Pinto, James Davidson, and Abhinav Gupta. Supervision via competition: Robot adversaries for learning tasks. arXiv preprint arXiv:1610.01685, 2016.
38. Food for thought
• Is AI = DL + RL? (Hado van Hasselt)
• Does the brain do backpropagation?
• Even if the brain does not do back-propagation the way ANNs do, there is no mathematical proof that it cannot
• CNNs and LSTMs: successful, ubiquitous AI models inspired by the human brain
• :( Neuroscience is still far apart from the AI community
39. Keyword Summary
• GANs as data augmentation (CycleGAN, BEGAN, …)
• Autoregressive models (PixelCNN)
• Embedding language and vision representations
40. • End-to-end
• Self-supervision
• Learning by:
• Imitation*, cloning, demonstration, and by predicting the future (natural learning)
• One-shot learning
• Reward shaping and a myriad of other signals
• TD-learning
• Options framework
* E.g. Imitating Driver Behavior with Generative Adversarial Networks https://arxiv.org/pdf/1701.06699.pdf
Keyword Summary
42. Papers right out of the oven
[PDF] End-to-End Learning of Semantic Grasping
E Jang, S Vijayanarasimhan, P Pastor, J Ibarz, S Levine - arXiv preprint, 2017
Abstract: We consider the task of semantic robotic grasping, in which a robot picks up an object of a user-specified class using only monocular images. Inspired by the two-stream hypothesis of visual reasoning, we present a semantic grasping framework that learns object…
[PDF] Imitation from Observation: Learning to Imitate Behaviors from Raw Video via Context Translation
YX Liu, A Gupta, P Abbeel, S Levine - arXiv preprint arXiv:1707.03374, 2017
Abstract: Imitation learning is an effective approach for autonomous systems to acquire control policies when an explicit reward function is unavailable, using supervision provided as demonstrations from an expert, typically a human operator. However, standard imitation…
[PDF] Vision-Based Multi-Task Manipulation for Inexpensive Robots Using End-To-End Learning from Demonstration
R Rahmatizadeh, P Abolghasemi, L Bölöni, S Levine - arXiv preprint, 2017
45. Limitations:
• Requires a substantial number of demonstrations to learn the translation model.
• Requires observations of demonstrations from multiple contexts in order to learn to translate between them.
Insights:
• Training an end-to-end model from scratch for each task may be inefficient in practice
• Combining their method with the higher-level representations proposed in prior work would likely lead to more efficient training (Sermanet et al., 2017).
• Challenge: domain shift: combining multiple tasks from different contexts into a single model
Papers right out of the oven
47. Papers right out of the oven
• Reinforcement Learning with Unsupervised Auxiliary Tasks (UNREAL, and extension Mnih '17)
• Auxiliary control and reward prediction tasks in deep RL double data efficiency & robustness to hyperparameter settings.
• A3C's successor in learning speed and robustness (over 87% of human scores)
62. Using relational properties in our priors?
• Neural-symbolic (Knowledge Graph) learning and reasoning
Relational Networks (Santoro '17) and Visual Interaction Networks (Watters '17): philosophically similar models using abstract logic to reason about the world
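The Relational Network form from Santoro '17 is RN(O) = f_phi(sum over pairs (i, j) of g_theta(o_i, o_j)): a pairwise relation function, summed over all object pairs, then mapped to an answer. The toy g/f below are hand-made stand-ins for the MLPs used in the paper:

```python
from itertools import permutations

# Sketch of the Relational Network form:
#   RN(O) = f_phi( sum over pairs (i, j) of g_theta(o_i, o_j) )
# g scores a relation between two objects; the sum aggregates over
# all pairs; f maps the aggregate to an answer.

def g_theta(o_i, o_j):
    """Toy pairwise relation: 1.0 if the two objects share a colour."""
    return 1.0 if o_i["colour"] == o_j["colour"] else 0.0

def f_phi(aggregate):
    """Toy readout: 'yes' if any pair was related."""
    return "yes" if aggregate > 0 else "no"

def relational_network(objects):
    return f_phi(sum(g_theta(a, b) for a, b in permutations(objects, 2)))

scene = [{"colour": "red"}, {"colour": "blue"}, {"colour": "red"}]
answer = relational_network(scene)   # is there a same-coloured pair?
```

The sum over all pairs is what makes the architecture order-invariant and lets a single g generalize across scenes with different numbers of objects.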
63. Interpreting unsupervised representations
• Understanding intermediate layers using linear classifier probes. Alain and Bengio '16 https://arxiv.org/pdf/1610.01644.pdf
• Explaining the Unexplained: A CLass-Enhanced Attentive Response (CLEAR) Approach to Understanding Deep Neural Networks, Kumar et al. '17. https://arxiv.org/pdf/1704.04133.pdf
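A linear classifier probe in the spirit of Alain and Bengio '16 freezes the network, takes activations from an intermediate layer, and trains only a linear classifier on them; its accuracy measures how linearly decodable the layer is. A sketch with a perceptron-style probe on a hand-made, linearly separable toy "layer":

```python
# Sketch of a linear classifier probe: freeze the network, train only a
# linear classifier on one layer's activations, and read its accuracy as
# a measure of how linearly decodable that layer is. The "activations"
# below are a made-up toy set.

def train_probe(acts, labels, epochs=20, lr=0.1):
    """Perceptron-style linear probe (w, b) for a binary label in {-1, +1}."""
    w, b = [0.0] * len(acts[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(acts, labels):
            if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

def probe_accuracy(w, b, acts, labels):
    preds = [1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1
             for x in acts]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

# A linearly separable "layer": the probe should reach full accuracy.
acts = [(0.0, 1.0), (0.1, 0.9), (1.0, 0.0), (0.9, 0.1)]
labels = [-1, -1, 1, 1]
w, b = train_probe(acts, labels)
acc = probe_accuracy(w, b, acts, labels)
```

Fitting one such probe per layer and comparing accuracies is the paper's diagnostic: accuracy typically rises with depth as the representation becomes more linearly separable.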