Frontier in reinforcement learning

Frontiers in Reinforcement Learning
Jie-Han Chen
NetDB, National Cheng Kung University
5/29, 2018 @ National Cheng Kung University, Taiwan

Outline
● Transfer Learning
● Curriculum learning
● Snubs in our lectures
● Questions

Transfer Learning
Transfer learning means learning knowledge in a source domain and
then transferring that knowledge to a target domain.
Recently, transfer learning has become an active research area because it can
improve both learning speed and final performance.

Traditional Machine Learning
Task A, domain A -> Learning -> Model for task A -> Evaluate
Task B, domain B -> Learning -> Model for task B -> Evaluate
We train the model for each task from scratch.
Each model is responsible for only one task.

Transfer Learning
source task, source domain -> Learning -> Model for task A
Model for task A -> Knowledge Transferring -> Model for task B
Model for task B -> Evaluate on target task, target domain
We train the model on the source domain and
apply it to a different but related problem.

The advantages of transfer learning
● In some critical domains, there is not enough data to train from scratch.
In these cases, transfer learning can help.
Images are from: https://becominghuman.ai/nvidia-and-the-gpu-contribution-to-the-ai-world-of-self-driving-cars-1f00e3212508
and Paper: A Survey on Deep Learning in Medical Image Analysis

Zero-shot learning / One-shot learning
● Zero-shot learning: learn the model in the source domain and apply it directly
to the target domain, without any tuning in the target domain.
● One-shot learning: learn the model in the source domain and fine-tune it with
only one or a handful of samples from the target domain.

Transfer features from pretrained model
In earlier work, Jason Yosinski et al. [1]
studied how well the features learned by
a neural network transfer to another task.
[1] How transferable are features in deep neural networks? [NIPS 2014]

Transfer features from pretrained model
Transferred Layers
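The idea of transferring layers can be sketched in a few lines. This is a minimal illustration in plain Python, not the setup used in [1]. The function `transfer_layers`, the dictionary-based layer representation, and the `trainable` flag are all hypothetical names chosen for this example. It copies the first n layers from a source network into a target network and optionally freezes them, leaving the remaining layers to be trained on the target task.

```python
import copy


def transfer_layers(source_layers, target_layers, n_transfer, freeze=True):
    """Return target layers whose first n_transfer entries are copied from source.

    Each layer is a plain dict (a hypothetical stand-in for a real layer object).
    Copied layers are marked non-trainable when freeze=True; all others stay trainable.
    """
    if not 0 <= n_transfer <= min(len(source_layers), len(target_layers)):
        raise ValueError("n_transfer exceeds the number of available layers")

    new_layers = []
    for i, layer in enumerate(target_layers):
        if i < n_transfer:
            # Transferred layer: reuse the weights learned on the source task.
            copied = copy.deepcopy(source_layers[i])
            copied["trainable"] = not freeze
        else:
            # Remaining layer: keep the target initialization and train it.
            copied = copy.deepcopy(layer)
            copied["trainable"] = True
        new_layers.append(copied)
    return new_layers


# Toy networks with 5 layers each; the weights are placeholders.
source = [{"name": f"layer{i}", "weights": [float(i)] * 3} for i in range(1, 6)]
target = [{"name": f"layer{i}", "weights": [0.0] * 3} for i in range(1, 6)]

transferred = transfer_layers(source, target, n_transfer=3)
```

Calling the function with `freeze=False` keeps the copied weights as an initialization but lets them be fine-tuned on the target task.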

Transfer Module Knowledge
● Proposed by Coline Devin et al. (UCB)[2]
● Learn separate modules for each specific task and for each robot (robotic control)
The image is from CS294, UCB
[2] Learning Modular Neural Network Policies for Multi-Task and Multi-Robot Transfer

[Figures: the network input is split into a task-related observation and a robot-related observation]

Distill Multitask knowledge into single network
How to learn a multitask policy that can simultaneously perform many tasks?
● Actor-Mimic [3]
● Distral [4]
[3] Actor-Mimic: Deep Multitask and Transfer Reinforcement Learning
[4] Distral: Robust Multitask Reinforcement Learning

Actor-Mimic
● Proposed by Emilio Parisotto, Jimmy Ba, and
Ruslan Salakhutdinov [3]
● Train one NN under the guidance of multiple expert networks
● Use supervised learning so that the single
multitask policy mimics the experts
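The "mimic" step can be illustrated with a small sketch. It assumes, following the policy regression objective described in [3], that each expert's Q-values are turned into a policy with a softmax and that the student is trained with a cross-entropy loss against that policy. The function names and the single-state setting are simplifications made for this example.

```python
import math


def softmax(values, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    scaled = [v / temperature for v in values]
    highest = max(scaled)
    exps = [math.exp(s - highest) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def policy_regression_loss(expert_q_values, student_logits, temperature=1.0):
    """Cross-entropy between the expert's softmax policy and the student's policy.

    Computed for a single state; a training loop would average this over
    states sampled from every task.
    """
    expert_policy = softmax(expert_q_values, temperature)
    student_policy = softmax(student_logits)
    return -sum(p * math.log(q) for p, q in zip(expert_policy, student_policy))


# The loss is lowest when the student reproduces the expert's action preferences.
expert_q = [1.0, 2.0, 3.0]
loss_matching = policy_regression_loss(expert_q, [1.0, 2.0, 3.0])
loss_reversed = policy_regression_loss(expert_q, [3.0, 2.0, 1.0])
```

A lower temperature makes the expert policy sharper, so the student is pushed harder toward the expert's best action.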

Distral
Distral (Distill and Transfer Learning) was proposed by DeepMind in 2017 [4]
● Distillation: combine multiple policies into one, for concurrent multitask
learning (accelerate all tasks through sharing) (from CS294)
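As a rough sketch of how sharing enters the objective, the snippet below follows the regularized return described in [4]: each task policy is rewarded by the environment, penalized for deviating from the shared distilled policy (a KL-style term), and given an entropy bonus. The function names and the trajectory format are assumptions made for this example; `c_kl` and `c_ent` mirror the coefficients used in the paper.

```python
import math


def regularized_reward(reward, pi_task, pi_distilled, c_kl, c_ent):
    """Per-step reward with a KL-style penalty and an entropy bonus.

    pi_task and pi_distilled are the probabilities that the task policy and
    the shared distilled policy assign to the action that was taken.
    """
    kl_term = c_kl * math.log(pi_task / pi_distilled)
    entropy_term = c_ent * math.log(pi_task)
    return reward - kl_term - entropy_term


def regularized_return(trajectory, gamma, c_kl, c_ent):
    """Discounted sum of regularized rewards over one trajectory.

    trajectory is a list of (reward, pi_task, pi_distilled) tuples.
    """
    return sum(
        (gamma ** t) * regularized_reward(r, p_task, p_dist, c_kl, c_ent)
        for t, (r, p_task, p_dist) in enumerate(trajectory)
    )


# A task policy that agrees with the distilled policy pays no KL penalty.
agree = regularized_reward(1.0, 0.5, 0.5, c_kl=0.5, c_ent=0.0)
disagree = regularized_reward(1.0, 0.8, 0.2, c_kl=0.5, c_ent=0.0)
```

In the full method the distilled policy is itself trained to stay close to all task policies, which is how knowledge is shared across tasks.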

Curriculum learning
● Proposed by Yoshua Bengio et al. in 2009 [5]
● They emphasize the importance of the order in which training samples are presented
○ Learn from the simple samples first, and then learn from much harder ones.
○ Dynamically expand the sample space from smaller and simpler to the more complicated target domain
● Helps training converge to a better local optimum, and can make otherwise unlearnable tasks learnable
[5] Curriculum Learning, Yoshua Bengio et al.

Predict next word
● Corpus: Wikipedia
● Expand the training corpus periodically.
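Expanding the training set from easy to hard can be sketched as follows. This is a minimal illustration, not the exact procedure of [5]. The function `curriculum_stages` and the use of sentence length as a difficulty measure are assumptions made for this example.

```python
import math


def curriculum_stages(samples, difficulty, n_stages):
    """Yield growing training pools, easiest samples first.

    difficulty is any function that maps a sample to a sortable score.
    The final stage always contains every sample.
    """
    if n_stages < 1:
        raise ValueError("n_stages must be at least 1")
    ordered = sorted(samples, key=difficulty)
    for stage in range(1, n_stages + 1):
        cutoff = math.ceil(len(ordered) * stage / n_stages)
        yield ordered[:cutoff]


sentences = [
    "a cat",
    "the quick brown fox jumps",
    "dogs bark",
    "reinforcement learning agents explore environments carefully",
]

# Hypothetical difficulty measure: number of words in the sentence.
stages = list(curriculum_stages(sentences, lambda s: len(s.split()), n_stages=2))
```

A training loop would then train for some steps on each pool in turn, so the model sees the simple samples before the full corpus.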

How to decide a good curriculum?
● Whether the samples are noisy
● Diversity of the samples
● Similarity to our target problem

Self-Play
Self-play in AlphaGo Zero [6]
[6] Mastering the game of Go without human knowledge

Self-Play and Curriculum Learning
In reinforcement learning, self-play has succeeded on many thorny problems.
DeepMind used self-play to train AlphaGo Zero, which needed fewer samples to reach
much higher performance than the earlier version trained with supervised learning.
In self-play, the agent plays against itself. When it starts learning from scratch, the
opponent is weak, which is similar to training the model on simpler samples. As the agent
grows stronger, its opponent grows stronger too, just as the samples and the problem
become more complicated and more difficult in curriculum learning.
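The link between self-play and a curriculum can be seen in a toy training loop. This is only a sketch under simplifying assumptions and is not the AlphaGo Zero procedure. The function `self_play_training`, the `play_game` and `update` callbacks, and the numeric "skill" value are all hypothetical. The opponent is a periodically refreshed copy of the learner, so its strength follows the learner's.

```python
import copy


def self_play_training(agent, play_game, update, n_iterations, refresh_every=10):
    """Train an agent against periodically refreshed copies of itself.

    play_game(agent, opponent) returns a game result, and
    update(agent, result) improves the agent in place.
    """
    opponent = copy.deepcopy(agent)
    for iteration in range(1, n_iterations + 1):
        result = play_game(agent, opponent)
        update(agent, result)
        if iteration % refresh_every == 0:
            # The opponent catches up, so the "samples" get harder over time.
            opponent = copy.deepcopy(agent)
    return agent, opponent


def toy_play_game(agent, opponent):
    """Toy game result: the skill gap between the two players."""
    return agent["skill"] - opponent["skill"]


def toy_update(agent, result):
    """Toy learning rule: every game adds one unit of skill."""
    agent["skill"] += 1


learner, rival = self_play_training(
    {"skill": 0}, toy_play_game, toy_update, n_iterations=25, refresh_every=10
)
```

At the start both players have zero skill, and the rival's skill rises in steps as the learner improves, which mirrors the easy-to-hard ordering of a curriculum.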

Snubs in our lectures
1. Active Learning
2. Meta-Learning
3. Inverse RL
4. GAN and RL
5. Model-based RL
6. RL in NN Architecture Searching

Questions
Can we transfer a multitask policy into a single NN to play a game that involves multiple tasks?
(Contextual Policy)

How to learn AI?
● Find your own path to learn AI foundations; here is mine:
https://github.com/JIElite/Learning-AI
● Read diverse AI papers
● Polish your math skills
● Run many experiments, and learn from practical experience
● Follow some AI researchers on Twitter and Reddit
