Frontiers in Reinforcement Learning
Jie-Han Chen
NetDB, National Cheng Kung University
5/29, 2018 @ National Cheng Kung University, Taiwan
1
Outline
● Transfer Learning
● Curriculum learning
● Snubs in our lectures
● Questions
2
Transfer Learning
3
Transfer Learning
Transfer learning means learning knowledge in a source domain and then
transferring that knowledge to a target domain.
Recently, transfer learning has become an active research area because it can
improve both learning speed and final performance.
4
Traditional Machine Learning
Task A, domain A
Model for task A
Learning Evaluate
Task B, domain B
Model for task B
Learning Evaluate
We train a separate model for each task from scratch.
Each model is responsible for only one task.
5
Transfer Learning
source task,
source domain
Model for task A
Learning
Model for task B
Knowledge
Transferring
Evaluate
target task,
target domain
We train the model on the source domain and
apply it to a different but related problem.
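A minimal sketch of the idea on this slide, using a toy linear regression. The data, the function names (`mse`, `gradient_descent`) and the step counts are assumptions made for illustration. The model is pretrained on a source task, then fine-tuned for a few steps on a related target task, and compared with training from scratch for the same number of steps.

```python
def mse(w, b, data):
    """Mean squared error of the line y = w * x + b on (x, y) pairs."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def gradient_descent(w, b, data, steps, lr=0.1):
    """Full-batch gradient descent on the MSE, starting from (w, b)."""
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical source task (plenty of data) and related target task (little data).
source_data = [(x, 2 * x + 1.0) for x in (-2, -1, 0, 1, 2)]
target_data = [(x, 2 * x + 1.5) for x in (-1, 0, 1)]

# 1) Learn on the source task.
w_src, b_src = gradient_descent(0.0, 0.0, source_data, steps=200)

# 2) Transfer: start from the source weights and fine-tune briefly on the target.
w_tr, b_tr = gradient_descent(w_src, b_src, target_data, steps=5)

# 3) Baseline: train on the target from scratch for the same number of steps.
w_sc, b_sc = gradient_descent(0.0, 0.0, target_data, steps=5)

transfer_loss = mse(w_tr, b_tr, target_data)
scratch_loss = mse(w_sc, b_sc, target_data)
print(f"transfer: {transfer_loss:.4f}  scratch: {scratch_loss:.4f}")
```

With the same small training budget on the target task, the transferred model starts much closer to the solution than the model trained from scratch.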
6
The advantages of transfer learning
● In some critical domains, there is not enough data to train a model from scratch.
Transfer learning can help in these cases.
Images are from: https://becominghuman.ai/nvidia-and-the-gpu-contribution-to-the-ai-world-of-self-driving-cars-1f00e3212508
and Paper: A Survey on Deep Learning in Medical Image Analysis
7
Zero-shot learning / One-shot learning
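● Zero-shot learning: learn the model in the source domain and apply it to the
target domain directly, without any tuning in the target domain.
● One-shot learning: learn the model in the source domain, then fine-tune it with
very few samples in the target domain (strictly, a single example per class;
with a handful of examples it is usually called few-shot learning).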
8
Transfer features from pretrained model
In earlier work, J. Yosinski et al. [1]
studied how transferable the features
learned by each layer of a neural network are.
9
[1] How transferable are features in deep neural networks? [NIPS 2014]
Transfer features from pretrained model
10
Transferred Layers
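A minimal sketch of layer transfer in the spirit of [1]: copy the first n layers of a pretrained network into a new network, then either freeze them or leave them trainable for fine-tuning. The list-of-dicts network representation and the function name `transfer_layers` are assumptions for illustration, not the authors' code.

```python
import copy


def transfer_layers(source_net, target_net, n, freeze=True):
    """Return a copy of target_net whose first n layers come from source_net.

    Each layer is a dict with "weights" and a "trainable" flag (hypothetical format).
    freeze=True keeps the copied layers fixed; freeze=False allows fine-tuning.
    """
    new_net = copy.deepcopy(target_net)
    for i in range(n):
        new_net[i]["weights"] = copy.deepcopy(source_net[i]["weights"])
        new_net[i]["trainable"] = not freeze
    return new_net


# Toy 4-layer networks: the source is "pretrained", the target is freshly initialized.
source_net = [{"weights": [0.1 * (i + 1), 0.1 * (i + 1)], "trainable": True} for i in range(4)]
target_net = [{"weights": [0.0, 0.0], "trainable": True} for _ in range(4)]

transferred = transfer_layers(source_net, target_net, n=2, freeze=True)
for index, layer in enumerate(transferred):
    print(index, layer)
```

The upper layers keep their fresh initialization and are trained on the target task; only the lower, more general layers are reused.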
Transfer Module Knowledge
● Proposed by Coline Devin et al. (UCB) [2]
● Learn a separate module for each task and for each robot (robotic control)
11
The image is from CS294, UCB
[2] Learning Modular Neural Network Policies for Multi-Task and Multi-Robot Transfer
Transfer Module Knowledge (figures, slides 12 to 14)
Figure labels: task-related observation, robot-related observation
12-14
The images are from CS294, UCB
Distill multitask knowledge into a single network
How to learn a multitask policy that can simultaneously perform many tasks?
● Actor-Mimic [3]
● Distral [4]
15
[3] Actor-Mimic: Deep Multitask and Transfer Reinforcement Learning
[4] Distral: Robust Multitask Reinforcement Learning
Actor-Mimic
● Proposed by Emilio Parisotto, Jimmy Ba, and
Ruslan Salakhutdinov
● Teach one neural network using multiple expert networks
● Use supervised learning so the single network mimics
the experts' policies across tasks
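A minimal sketch of the supervised "mimic" step, assuming the policy-regression form described in [3]: the expert's Q-values are turned into a softmax policy with a temperature, and the student is trained to minimize the cross-entropy to that policy. The function names and the example numbers are hypothetical.

```python
import math


def softmax(values, temperature=1.0):
    """Numerically stable softmax of values / temperature."""
    top = max(values)
    exps = [math.exp((v - top) / temperature) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def policy_mimic_loss(expert_q_values, student_logits, temperature=1.0):
    """Cross-entropy between the expert's softmax policy and the student's policy."""
    expert_policy = softmax(expert_q_values, temperature)
    student_policy = softmax(student_logits)
    return -sum(p * math.log(q) for p, q in zip(expert_policy, student_policy))


# Hypothetical Q-values of one expert in one state (three actions).
expert_q = [1.0, 2.0, 3.0]
loss_matching = policy_mimic_loss(expert_q, [1.0, 2.0, 3.0])
loss_reversed = policy_mimic_loss(expert_q, [3.0, 2.0, 1.0])
print(loss_matching, loss_reversed)
```

In multitask training, this loss would be averaged over states sampled from every task, each paired with that task's expert.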
16
Actor-Mimic
17
Distral
Distral: Distill and Transfer Learning, proposed by DeepMind in 2017
● Distillation: combine multiple policies into one for concurrent multitask
learning, accelerating all tasks through sharing (from CS294)
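A minimal sketch of how sharing can enter the objective, assuming the KL- and entropy-regularized form described in [4]: each task policy receives its task reward, minus a penalty for deviating from the shared distilled policy, plus an entropy bonus. The function name and the coefficient names `c_kl` and `c_ent` are assumptions for illustration, and discounting is omitted.

```python
import math


def distral_regularized_reward(reward, pi_task, pi_distilled, c_kl, c_ent):
    """Per-step regularized reward for one sampled action.

    pi_task:      probability the task policy assigns to the action taken
    pi_distilled: probability the shared distilled policy assigns to it
    c_kl:         weight of the penalty for deviating from the distilled policy
    c_ent:        weight of the entropy bonus that encourages exploration
    """
    kl_term = c_kl * math.log(pi_task / pi_distilled)
    entropy_term = c_ent * math.log(pi_task)
    return reward - kl_term - entropy_term


# The task policy agrees with the distilled policy: no KL penalty.
print(distral_regularized_reward(1.0, 0.5, 0.5, c_kl=0.1, c_ent=0.0))
# The task policy is far more confident than the distilled policy: penalized.
print(distral_regularized_reward(1.0, 0.8, 0.2, c_kl=0.1, c_ent=0.0))
```

The distilled policy is in turn trained toward the task policies, so knowledge flows in both directions.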
18
Distral
19
Curriculum learning
● Proposed by Yoshua Bengio et al. in 2009 [5]
● They emphasize the importance of the order in which training samples are presented
○ Learn from simple samples first, then from progressively harder ones.
○ Gradually expand the sample space from a small, simple subset to the full, more complicated target domain
● Helps training converge to a better local optimum and can make otherwise hard-to-learn tasks learnable
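A minimal sketch of a curriculum as an expanding training pool: samples are ordered from easy to hard, and each stage trains on a larger prefix of that ordering. Using sentence length as the difficulty measure, and the function name `curriculum_stages`, are assumptions for illustration.

```python
def curriculum_stages(samples, difficulty, num_stages):
    """Return a list of training pools that grow from easy to hard.

    difficulty is any function that maps a sample to a sortable score.
    The last stage always contains every sample (the full target domain).
    """
    ordered = sorted(samples, key=difficulty)
    stages = []
    for stage in range(1, num_stages + 1):
        cutoff = max(1, len(ordered) * stage // num_stages)
        stages.append(ordered[:cutoff])
    return stages


sentences = ["a cat", "the quick brown fox jumps", "hello", "to be or not to be", "go"]
stages = curriculum_stages(sentences, difficulty=len, num_stages=3)
for number, pool in enumerate(stages, start=1):
    print(f"stage {number}: {pool}")
```

A training loop would run for some epochs on each pool in turn before moving to the next, larger one.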
20
[5] Curriculum Learning, Yoshua Bengio et al.
Predict next word
● Corpus: Wikipedia
● Expand the training corpus periodically.
21
expand corpus
How do we design a good curriculum?
● Noisy or not
● Diversity
● Similarity to our target problem
22
Self-Play
23
Self-play in AlphaGo Zero [6]
[6] Mastering the game of Go without human knowledge
Self-Play and Curriculum Learning
In reinforcement learning, self-play has succeeded on many thorny problems.
DeepMind used self-play to train AlphaGo Zero, which needed fewer samples to reach
much higher performance than the earlier version trained with supervised learning.
In self-play, the agent plays against itself. When it learns from scratch, its opponent is
weak, which is similar to training the model on simpler samples. As the agent
grows stronger, its opponent grows stronger too, just as the samples and the problem
become more complicated and more difficult in curriculum learning.
24
Snubs in our lectures (topics not covered)
1. Active Learning
2. Meta-Learning
3. Inverse RL
4. GAN and RL
5. Model-based RL
6. RL for neural architecture search
25
Questions
Can we distill multi-task policies into a single NN that plays a game with multiple tasks?
(Contextual Policy)
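One common way to realize a contextual policy is to condition a single network on a task context, for example by appending a one-hot task ID to the state. A minimal sketch of that input construction follows; the function name and the one-hot encoding are assumptions for illustration, not a method stated on the slide.

```python
def contextual_observation(state, task_id, num_tasks):
    """Append a one-hot task context to the state vector.

    A single policy network fed with this input can behave differently per task.
    """
    one_hot = [1.0 if i == task_id else 0.0 for i in range(num_tasks)]
    return list(state) + one_hot


observation = contextual_observation([0.5, -1.0], task_id=1, num_tasks=3)
print(observation)
```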
26
How to learn AI?
● Find your own path to learn the foundations of AI. Here is my path:
https://github.com/JIElite/Learning-AI
● Read diverse AI papers
● Polish your math skills
● Run many experiments and learn from practical experience
● Follow AI researchers on Twitter and Reddit
27
