SlideShare ist ein Scribd-Unternehmen logo
1 von 18
Downloaden Sie, um offline zu lesen
Modular Multitask Reinforcement Learning with
Policy Sketches
Yoonho Lee
Department of Computer Science and Engineering
Pohang University of Science and Technology
August 09, 2017
Modular Multitask Reinforcement Learning with Policy
Sketches
ICML 2017 Best Paper Honorable Mention
Hierarchical Reinforcement Learning
Proposes a looser form of task supervision for RL agents
(compared to e.g. reward shaping)
Proposed form of task supervision is decoupled from the
environment
Proposes a new way to use NNs for hierarchical RL
Natural extensions to zero-shot and unlabelled RL
Hierarchical Reinforcement Learning
Why is starcraft harder than go for RL?
Why is starcraft not harder than go for humans?
Hierarchical Reinforcement Learning
Decision = Action. RL algorithms are designed with decisions
in mind, but operate on actions.
Real world decisions have hierarchical structure(e.g. cook
dinner → cut potato → activate arm muscle)
Effective knowledge reuse
Effecient credit assignment
Open question: How do we discover salient/reusable
decisions?
Previous Work
Learn multiple timestep policy along with confidence1
Learn controller network that sets desired direction of state
change2
Encode multiple sub-policies with an SNN3
Actor-Critic algorithm for options4
1
Alexander Vezhnevets et al. “Strategic Attentive Writer for Learning
Macro-Actions”. In: NIPS (2016).
2
Alexander Sasha Vezhnevets et al. “FeUdal Networks for Hierarchical
Reinforcement Learning”. In: (2017).
3
Carlos Florensa, Yan Duan, and Pieter Abbeel. “Stochastic Neural
Networks for Hierarchical Reinforcement Learning”. In: ICLR 2017 (2017),
pp. 1056–1064.
4
Pierre-Luc Bacon, Jean Harb, and Doina Precup. “The Option-Critic
Architecture”. In: AAAI (2017).
Previous Work
Previous hierarchical RL algorithms require reward engineering
for high-dimensional environments
Proposed Approach
Proposed Approach
Sketches
Method
Method
Method
Experiments
Comparison to Baseline
Experiments
Ablation
Experiments
Accuracy
Discussion
Weaknesses
Only works for tasks that are strictly composed
To label, one needs to understand which tasks are ”RL-able”
Labels needed: Fundamentally limited to tasks understood by
humans
Discussion
Implications
Zero-shot and Adaptation experiments had similar
performance, what does this imply?
Weight sharing between policies?
Curriculum instead of policy sketches?
Common theme of cooperating networks for hierarchical RL:
Manager-Worker5, Option-Policy6, Policy modules7
5
Alexander Sasha Vezhnevets et al. “FeUdal Networks for Hierarchical
Reinforcement Learning”. In: (2017).
6
Pierre-Luc Bacon, Jean Harb, and Doina Precup. “The Option-Critic
Architecture”. In: AAAI (2017).
7
Jacob Andreas, Dan Klein, and Sergey Levine. “Modular Multitask
Reinforcement Learning with Policy Sketches”. In: ICML (2017). arXiv:
1611.01796.
References I
[1] Jacob Andreas, Dan Klein, and Sergey Levine. “Modular
Multitask Reinforcement Learning with Policy Sketches”. In:
ICML (2017). arXiv: 1611.01796.
[2] Pierre-Luc Bacon, Jean Harb, and Doina Precup. “The
Option-Critic Architecture”. In: AAAI (2017).
[3] Carlos Florensa, Yan Duan, and Pieter Abbeel. “Stochastic
Neural Networks for Hierarchical Reinforcement Learning”.
In: ICLR 2017 (2017), pp. 1056–1064.
[4] Alexander Vezhnevets et al. “Strategic Attentive Writer for
Learning Macro-Actions”. In: NIPS (2016).
[5] Alexander Sasha Vezhnevets et al. “FeUdal Networks for
Hierarchical Reinforcement Learning”. In: (2017).
Thank You

Weitere ähnliche Inhalte

Was ist angesagt?

Introduction to Model-Based Machine Learning
Introduction to Model-Based Machine LearningIntroduction to Model-Based Machine Learning
Introduction to Model-Based Machine LearningDaniel Emaasit
 
MultiObjective(11) - Copy
MultiObjective(11) - CopyMultiObjective(11) - Copy
MultiObjective(11) - CopyAMIT KUMAR
 
Using Dempster-Shafer Theory and Real Options Theory
Using Dempster-Shafer Theory and Real Options TheoryUsing Dempster-Shafer Theory and Real Options Theory
Using Dempster-Shafer Theory and Real Options TheoryEric van Heck
 
Always adopt self supervised learning
Always adopt self supervised learningAlways adopt self supervised learning
Always adopt self supervised learningLibgirlTeam
 
Predictive Metabonomics
Predictive MetabonomicsPredictive Metabonomics
Predictive MetabonomicsMarilyn Arceo
 
CNN Structure: From LeNet to ShuffleNet
CNN Structure: From LeNet to ShuffleNetCNN Structure: From LeNet to ShuffleNet
CNN Structure: From LeNet to ShuffleNetDalin Zhang
 
A h k clustering algorithm for high dimensional data using ensemble learning
A h k clustering algorithm for high dimensional data using ensemble learningA h k clustering algorithm for high dimensional data using ensemble learning
A h k clustering algorithm for high dimensional data using ensemble learningijitcs
 
Review : Adaptive Consistency Regularization for Semi-Supervised Transfer Lea...
Review : Adaptive Consistency Regularization for Semi-Supervised Transfer Lea...Review : Adaptive Consistency Regularization for Semi-Supervised Transfer Lea...
Review : Adaptive Consistency Regularization for Semi-Supervised Transfer Lea...Dongmin Choi
 
PREDICTING STOCK PRICE MOVEMENTS BASED ON NEWSPAPER ARTICLES USING A NOVEL DE...
PREDICTING STOCK PRICE MOVEMENTS BASED ON NEWSPAPER ARTICLES USING A NOVEL DE...PREDICTING STOCK PRICE MOVEMENTS BASED ON NEWSPAPER ARTICLES USING A NOVEL DE...
PREDICTING STOCK PRICE MOVEMENTS BASED ON NEWSPAPER ARTICLES USING A NOVEL DE...webwinkelvakdag
 
Bellman Equation in Dynamic Programming
Bellman Equation in Dynamic ProgrammingBellman Equation in Dynamic Programming
Bellman Equation in Dynamic Programmingijcoa
 
A Survey of Generative Adversarial Neural Networks (GAN) for Text-to-Image Sy...
A Survey of Generative Adversarial Neural Networks (GAN) for Text-to-Image Sy...A Survey of Generative Adversarial Neural Networks (GAN) for Text-to-Image Sy...
A Survey of Generative Adversarial Neural Networks (GAN) for Text-to-Image Sy...Mirsaeid Abolghasemi
 
Ivy Zhu, Research Scientist, Intel at MLconf SEA - 5/01/15
Ivy Zhu, Research Scientist, Intel at MLconf SEA - 5/01/15Ivy Zhu, Research Scientist, Intel at MLconf SEA - 5/01/15
Ivy Zhu, Research Scientist, Intel at MLconf SEA - 5/01/15MLconf
 

Was ist angesagt? (20)

Introduction to Model-Based Machine Learning
Introduction to Model-Based Machine LearningIntroduction to Model-Based Machine Learning
Introduction to Model-Based Machine Learning
 
MultiObjective(11) - Copy
MultiObjective(11) - CopyMultiObjective(11) - Copy
MultiObjective(11) - Copy
 
2011611009
20116110092011611009
2011611009
 
Using Dempster-Shafer Theory and Real Options Theory
Using Dempster-Shafer Theory and Real Options TheoryUsing Dempster-Shafer Theory and Real Options Theory
Using Dempster-Shafer Theory and Real Options Theory
 
Always adopt self supervised learning
Always adopt self supervised learningAlways adopt self supervised learning
Always adopt self supervised learning
 
final seminar
final seminarfinal seminar
final seminar
 
Jl
JlJl
Jl
 
Icml2018 naver review
Icml2018 naver reviewIcml2018 naver review
Icml2018 naver review
 
Predictive Metabonomics
Predictive MetabonomicsPredictive Metabonomics
Predictive Metabonomics
 
CNN Structure: From LeNet to ShuffleNet
CNN Structure: From LeNet to ShuffleNetCNN Structure: From LeNet to ShuffleNet
CNN Structure: From LeNet to ShuffleNet
 
A h k clustering algorithm for high dimensional data using ensemble learning
A h k clustering algorithm for high dimensional data using ensemble learningA h k clustering algorithm for high dimensional data using ensemble learning
A h k clustering algorithm for high dimensional data using ensemble learning
 
One shot learning
One shot learningOne shot learning
One shot learning
 
Fahroo - Optimization and Discrete Mathematics - Spring Review 2013
Fahroo - Optimization and Discrete Mathematics - Spring Review 2013Fahroo - Optimization and Discrete Mathematics - Spring Review 2013
Fahroo - Optimization and Discrete Mathematics - Spring Review 2013
 
Review : Adaptive Consistency Regularization for Semi-Supervised Transfer Lea...
Review : Adaptive Consistency Regularization for Semi-Supervised Transfer Lea...Review : Adaptive Consistency Regularization for Semi-Supervised Transfer Lea...
Review : Adaptive Consistency Regularization for Semi-Supervised Transfer Lea...
 
PREDICTING STOCK PRICE MOVEMENTS BASED ON NEWSPAPER ARTICLES USING A NOVEL DE...
PREDICTING STOCK PRICE MOVEMENTS BASED ON NEWSPAPER ARTICLES USING A NOVEL DE...PREDICTING STOCK PRICE MOVEMENTS BASED ON NEWSPAPER ARTICLES USING A NOVEL DE...
PREDICTING STOCK PRICE MOVEMENTS BASED ON NEWSPAPER ARTICLES USING A NOVEL DE...
 
Efficient projections
Efficient projectionsEfficient projections
Efficient projections
 
CLIM: Transition Workshop - Introduction to Remote Sensing Working Group - A...
CLIM: Transition Workshop - Introduction to Remote Sensing Working Group  - A...CLIM: Transition Workshop - Introduction to Remote Sensing Working Group  - A...
CLIM: Transition Workshop - Introduction to Remote Sensing Working Group - A...
 
Bellman Equation in Dynamic Programming
Bellman Equation in Dynamic ProgrammingBellman Equation in Dynamic Programming
Bellman Equation in Dynamic Programming
 
A Survey of Generative Adversarial Neural Networks (GAN) for Text-to-Image Sy...
A Survey of Generative Adversarial Neural Networks (GAN) for Text-to-Image Sy...A Survey of Generative Adversarial Neural Networks (GAN) for Text-to-Image Sy...
A Survey of Generative Adversarial Neural Networks (GAN) for Text-to-Image Sy...
 
Ivy Zhu, Research Scientist, Intel at MLconf SEA - 5/01/15
Ivy Zhu, Research Scientist, Intel at MLconf SEA - 5/01/15Ivy Zhu, Research Scientist, Intel at MLconf SEA - 5/01/15
Ivy Zhu, Research Scientist, Intel at MLconf SEA - 5/01/15
 

Ähnlich wie Modular Multitask Reinforcement Learning with Policy Sketches

The role of systems analysis in co-learning. Walter Rossing
The role of systems analysis in co-learning. Walter RossingThe role of systems analysis in co-learning. Walter Rossing
The role of systems analysis in co-learning. Walter RossingJoanna Hicks
 
Building AI Applications using Knowledge Graphs
Building AI Applications using Knowledge GraphsBuilding AI Applications using Knowledge Graphs
Building AI Applications using Knowledge GraphsAndre Freitas
 
AI Beyond Deep Learning
AI Beyond Deep LearningAI Beyond Deep Learning
AI Beyond Deep LearningAndre Freitas
 
Effective Semantics for Engineering NLP Systems
Effective Semantics for Engineering NLP SystemsEffective Semantics for Engineering NLP Systems
Effective Semantics for Engineering NLP SystemsAndre Freitas
 
Biological Foundations for Deep Learning: Towards Decision Networks
 Biological Foundations for Deep Learning: Towards Decision Networks Biological Foundations for Deep Learning: Towards Decision Networks
Biological Foundations for Deep Learning: Towards Decision Networksdiannepatricia
 
Bridging the gap between AI and UI - DSI Vienna - full version
Bridging the gap between AI and UI - DSI Vienna - full versionBridging the gap between AI and UI - DSI Vienna - full version
Bridging the gap between AI and UI - DSI Vienna - full versionLiad Magen
 
My degree is an EDD in Performance Improvement LeadershipSyste.docx
My degree is an EDD in Performance Improvement LeadershipSyste.docxMy degree is an EDD in Performance Improvement LeadershipSyste.docx
My degree is an EDD in Performance Improvement LeadershipSyste.docxgriffinruthie22
 
Goal Dynamics_From System Dynamics to Implementation
Goal Dynamics_From System Dynamics to ImplementationGoal Dynamics_From System Dynamics to Implementation
Goal Dynamics_From System Dynamics to ImplementationAmjad Adib
 
Information Systems Action research methods
Information Systems  Action research methodsInformation Systems  Action research methods
Information Systems Action research methodsRaimo Halinen
 
Architecture Design for Deep Neural Networks III
Architecture Design for Deep Neural Networks IIIArchitecture Design for Deep Neural Networks III
Architecture Design for Deep Neural Networks IIIWanjin Yu
 
Xin Yao: "What can evolutionary computation do for you?"
Xin Yao: "What can evolutionary computation do for you?"Xin Yao: "What can evolutionary computation do for you?"
Xin Yao: "What can evolutionary computation do for you?"ieee_cis_cyprus
 
OO Development 1 - Introduction to Object-Oriented Development
OO Development 1 - Introduction to Object-Oriented DevelopmentOO Development 1 - Introduction to Object-Oriented Development
OO Development 1 - Introduction to Object-Oriented DevelopmentRandy Connolly
 
Object-Oriented Design Heuristics
Object-Oriented Design HeuristicsObject-Oriented Design Heuristics
Object-Oriented Design Heuristicskim.mens
 
[PR12] understanding deep learning requires rethinking generalization
[PR12] understanding deep learning requires rethinking generalization[PR12] understanding deep learning requires rethinking generalization
[PR12] understanding deep learning requires rethinking generalizationJaeJun Yoo
 
Professor Steve Roberts; The Bayesian Crowd: scalable information combinati...
Professor Steve Roberts; The Bayesian Crowd: scalable information combinati...Professor Steve Roberts; The Bayesian Crowd: scalable information combinati...
Professor Steve Roberts; The Bayesian Crowd: scalable information combinati...Ian Morgan
 
Professor Steve Roberts; The Bayesian Crowd: scalable information combinati...
Professor Steve Roberts; The Bayesian Crowd: scalable information combinati...Professor Steve Roberts; The Bayesian Crowd: scalable information combinati...
Professor Steve Roberts; The Bayesian Crowd: scalable information combinati...Bayes Nets meetup London
 
The overlaps between Action Research and Design Research
The overlaps between Action Research and Design ResearchThe overlaps between Action Research and Design Research
The overlaps between Action Research and Design ResearchSandeep Purao
 
JISC RSC London Workshop - Learner analytics
JISC RSC London Workshop - Learner analyticsJISC RSC London Workshop - Learner analytics
JISC RSC London Workshop - Learner analyticsJames Ballard
 
Responsibility Driven Design
Responsibility Driven DesignResponsibility Driven Design
Responsibility Driven DesignHarsh Jegadeesan
 

Ähnlich wie Modular Multitask Reinforcement Learning with Policy Sketches (20)

MILA DL & RL summer school highlights
MILA DL & RL summer school highlights MILA DL & RL summer school highlights
MILA DL & RL summer school highlights
 
The role of systems analysis in co-learning. Walter Rossing
The role of systems analysis in co-learning. Walter RossingThe role of systems analysis in co-learning. Walter Rossing
The role of systems analysis in co-learning. Walter Rossing
 
Building AI Applications using Knowledge Graphs
Building AI Applications using Knowledge GraphsBuilding AI Applications using Knowledge Graphs
Building AI Applications using Knowledge Graphs
 
AI Beyond Deep Learning
AI Beyond Deep LearningAI Beyond Deep Learning
AI Beyond Deep Learning
 
Effective Semantics for Engineering NLP Systems
Effective Semantics for Engineering NLP SystemsEffective Semantics for Engineering NLP Systems
Effective Semantics for Engineering NLP Systems
 
Biological Foundations for Deep Learning: Towards Decision Networks
 Biological Foundations for Deep Learning: Towards Decision Networks Biological Foundations for Deep Learning: Towards Decision Networks
Biological Foundations for Deep Learning: Towards Decision Networks
 
Bridging the gap between AI and UI - DSI Vienna - full version
Bridging the gap between AI and UI - DSI Vienna - full versionBridging the gap between AI and UI - DSI Vienna - full version
Bridging the gap between AI and UI - DSI Vienna - full version
 
My degree is an EDD in Performance Improvement LeadershipSyste.docx
My degree is an EDD in Performance Improvement LeadershipSyste.docxMy degree is an EDD in Performance Improvement LeadershipSyste.docx
My degree is an EDD in Performance Improvement LeadershipSyste.docx
 
Goal Dynamics_From System Dynamics to Implementation
Goal Dynamics_From System Dynamics to ImplementationGoal Dynamics_From System Dynamics to Implementation
Goal Dynamics_From System Dynamics to Implementation
 
Information Systems Action research methods
Information Systems  Action research methodsInformation Systems  Action research methods
Information Systems Action research methods
 
Architecture Design for Deep Neural Networks III
Architecture Design for Deep Neural Networks IIIArchitecture Design for Deep Neural Networks III
Architecture Design for Deep Neural Networks III
 
Xin Yao: "What can evolutionary computation do for you?"
Xin Yao: "What can evolutionary computation do for you?"Xin Yao: "What can evolutionary computation do for you?"
Xin Yao: "What can evolutionary computation do for you?"
 
OO Development 1 - Introduction to Object-Oriented Development
OO Development 1 - Introduction to Object-Oriented DevelopmentOO Development 1 - Introduction to Object-Oriented Development
OO Development 1 - Introduction to Object-Oriented Development
 
Object-Oriented Design Heuristics
Object-Oriented Design HeuristicsObject-Oriented Design Heuristics
Object-Oriented Design Heuristics
 
[PR12] understanding deep learning requires rethinking generalization
[PR12] understanding deep learning requires rethinking generalization[PR12] understanding deep learning requires rethinking generalization
[PR12] understanding deep learning requires rethinking generalization
 
Professor Steve Roberts; The Bayesian Crowd: scalable information combinati...
Professor Steve Roberts; The Bayesian Crowd: scalable information combinati...Professor Steve Roberts; The Bayesian Crowd: scalable information combinati...
Professor Steve Roberts; The Bayesian Crowd: scalable information combinati...
 
Professor Steve Roberts; The Bayesian Crowd: scalable information combinati...
Professor Steve Roberts; The Bayesian Crowd: scalable information combinati...Professor Steve Roberts; The Bayesian Crowd: scalable information combinati...
Professor Steve Roberts; The Bayesian Crowd: scalable information combinati...
 
The overlaps between Action Research and Design Research
The overlaps between Action Research and Design ResearchThe overlaps between Action Research and Design Research
The overlaps between Action Research and Design Research
 
JISC RSC London Workshop - Learner analytics
JISC RSC London Workshop - Learner analyticsJISC RSC London Workshop - Learner analytics
JISC RSC London Workshop - Learner analytics
 
Responsibility Driven Design
Responsibility Driven DesignResponsibility Driven Design
Responsibility Driven Design
 

Mehr von Yoonho Lee

Meta-learning and the ELBO
Meta-learning and the ELBOMeta-learning and the ELBO
Meta-learning and the ELBOYoonho Lee
 
On First-Order Meta-Learning Algorithms
On First-Order Meta-Learning AlgorithmsOn First-Order Meta-Learning Algorithms
On First-Order Meta-Learning AlgorithmsYoonho Lee
 
New Insights and Perspectives on the Natural Gradient Method
New Insights and Perspectives on the Natural Gradient MethodNew Insights and Perspectives on the Natural Gradient Method
New Insights and Perspectives on the Natural Gradient MethodYoonho Lee
 
Parameter Space Noise for Exploration
Parameter Space Noise for ExplorationParameter Space Noise for Exploration
Parameter Space Noise for ExplorationYoonho Lee
 
Continuous Adaptation via Meta Learning in Nonstationary and Competitive Envi...
Continuous Adaptation via Meta Learning in Nonstationary and Competitive Envi...Continuous Adaptation via Meta Learning in Nonstationary and Competitive Envi...
Continuous Adaptation via Meta Learning in Nonstationary and Competitive Envi...Yoonho Lee
 
Dueling Network Architectures for Deep Reinforcement Learning
Dueling Network Architectures for Deep Reinforcement LearningDueling Network Architectures for Deep Reinforcement Learning
Dueling Network Architectures for Deep Reinforcement LearningYoonho Lee
 
Gradient Estimation Using Stochastic Computation Graphs
Gradient Estimation Using Stochastic Computation GraphsGradient Estimation Using Stochastic Computation Graphs
Gradient Estimation Using Stochastic Computation GraphsYoonho Lee
 

Mehr von Yoonho Lee (7)

Meta-learning and the ELBO
Meta-learning and the ELBOMeta-learning and the ELBO
Meta-learning and the ELBO
 
On First-Order Meta-Learning Algorithms
On First-Order Meta-Learning AlgorithmsOn First-Order Meta-Learning Algorithms
On First-Order Meta-Learning Algorithms
 
New Insights and Perspectives on the Natural Gradient Method
New Insights and Perspectives on the Natural Gradient MethodNew Insights and Perspectives on the Natural Gradient Method
New Insights and Perspectives on the Natural Gradient Method
 
Parameter Space Noise for Exploration
Parameter Space Noise for ExplorationParameter Space Noise for Exploration
Parameter Space Noise for Exploration
 
Continuous Adaptation via Meta Learning in Nonstationary and Competitive Envi...
Continuous Adaptation via Meta Learning in Nonstationary and Competitive Envi...Continuous Adaptation via Meta Learning in Nonstationary and Competitive Envi...
Continuous Adaptation via Meta Learning in Nonstationary and Competitive Envi...
 
Dueling Network Architectures for Deep Reinforcement Learning
Dueling Network Architectures for Deep Reinforcement LearningDueling Network Architectures for Deep Reinforcement Learning
Dueling Network Architectures for Deep Reinforcement Learning
 
Gradient Estimation Using Stochastic Computation Graphs
Gradient Estimation Using Stochastic Computation GraphsGradient Estimation Using Stochastic Computation Graphs
Gradient Estimation Using Stochastic Computation Graphs
 

Kürzlich hochgeladen

No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...
No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...
No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...Sheetaleventcompany
 
Mathematics of Finance Presentation.pptx
Mathematics of Finance Presentation.pptxMathematics of Finance Presentation.pptx
Mathematics of Finance Presentation.pptxMoumonDas2
 
BDSM⚡Call Girls in Sector 93 Noida Escorts >༒8448380779 Escort Service
BDSM⚡Call Girls in Sector 93 Noida Escorts >༒8448380779 Escort ServiceBDSM⚡Call Girls in Sector 93 Noida Escorts >༒8448380779 Escort Service
BDSM⚡Call Girls in Sector 93 Noida Escorts >༒8448380779 Escort ServiceDelhi Call girls
 
George Lever - eCommerce Day Chile 2024
George Lever -  eCommerce Day Chile 2024George Lever -  eCommerce Day Chile 2024
George Lever - eCommerce Day Chile 2024eCommerce Institute
 
ANCHORING SCRIPT FOR A CULTURAL EVENT.docx
ANCHORING SCRIPT FOR A CULTURAL EVENT.docxANCHORING SCRIPT FOR A CULTURAL EVENT.docx
ANCHORING SCRIPT FOR A CULTURAL EVENT.docxNikitaBankoti2
 
SaaStr Workshop Wednesday w/ Lucas Price, Yardstick
SaaStr Workshop Wednesday w/ Lucas Price, YardstickSaaStr Workshop Wednesday w/ Lucas Price, Yardstick
SaaStr Workshop Wednesday w/ Lucas Price, Yardsticksaastr
 
Microsoft Copilot AI for Everyone - created by AI
Microsoft Copilot AI for Everyone - created by AIMicrosoft Copilot AI for Everyone - created by AI
Microsoft Copilot AI for Everyone - created by AITatiana Gurgel
 
Introduction to Prompt Engineering (Focusing on ChatGPT)
Introduction to Prompt Engineering (Focusing on ChatGPT)Introduction to Prompt Engineering (Focusing on ChatGPT)
Introduction to Prompt Engineering (Focusing on ChatGPT)Chameera Dedduwage
 
If this Giant Must Walk: A Manifesto for a New Nigeria
If this Giant Must Walk: A Manifesto for a New NigeriaIf this Giant Must Walk: A Manifesto for a New Nigeria
If this Giant Must Walk: A Manifesto for a New NigeriaKayode Fayemi
 
Night 7k Call Girls Noida Sector 128 Call Me: 8448380779
Night 7k Call Girls Noida Sector 128 Call Me: 8448380779Night 7k Call Girls Noida Sector 128 Call Me: 8448380779
Night 7k Call Girls Noida Sector 128 Call Me: 8448380779Delhi Call girls
 
CTAC 2024 Valencia - Sven Zoelle - Most Crucial Invest to Digitalisation_slid...
CTAC 2024 Valencia - Sven Zoelle - Most Crucial Invest to Digitalisation_slid...CTAC 2024 Valencia - Sven Zoelle - Most Crucial Invest to Digitalisation_slid...
CTAC 2024 Valencia - Sven Zoelle - Most Crucial Invest to Digitalisation_slid...henrik385807
 
Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...
Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...
Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...Hasting Chen
 
Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...
Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...
Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...Kayode Fayemi
 
Mohammad_Alnahdi_Oral_Presentation_Assignment.pptx
Mohammad_Alnahdi_Oral_Presentation_Assignment.pptxMohammad_Alnahdi_Oral_Presentation_Assignment.pptx
Mohammad_Alnahdi_Oral_Presentation_Assignment.pptxmohammadalnahdi22
 
Call Girl Number in Khar Mumbai📲 9892124323 💞 Full Night Enjoy
Call Girl Number in Khar Mumbai📲 9892124323 💞 Full Night EnjoyCall Girl Number in Khar Mumbai📲 9892124323 💞 Full Night Enjoy
Call Girl Number in Khar Mumbai📲 9892124323 💞 Full Night EnjoyPooja Nehwal
 
Presentation on Engagement in Book Clubs
Presentation on Engagement in Book ClubsPresentation on Engagement in Book Clubs
Presentation on Engagement in Book Clubssamaasim06
 
WhatsApp 📞 9892124323 ✅Call Girls In Juhu ( Mumbai )
WhatsApp 📞 9892124323 ✅Call Girls In Juhu ( Mumbai )WhatsApp 📞 9892124323 ✅Call Girls In Juhu ( Mumbai )
WhatsApp 📞 9892124323 ✅Call Girls In Juhu ( Mumbai )Pooja Nehwal
 
Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024
Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024
Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024eCommerce Institute
 
CTAC 2024 Valencia - Henrik Hanke - Reduce to the max - slideshare.pdf
CTAC 2024 Valencia - Henrik Hanke - Reduce to the max - slideshare.pdfCTAC 2024 Valencia - Henrik Hanke - Reduce to the max - slideshare.pdf
CTAC 2024 Valencia - Henrik Hanke - Reduce to the max - slideshare.pdfhenrik385807
 
Thirunelveli call girls Tamil escorts 7877702510
Thirunelveli call girls Tamil escorts 7877702510Thirunelveli call girls Tamil escorts 7877702510
Thirunelveli call girls Tamil escorts 7877702510Vipesco
 

Kürzlich hochgeladen (20)

No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...
No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...
No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...
 
Mathematics of Finance Presentation.pptx
Mathematics of Finance Presentation.pptxMathematics of Finance Presentation.pptx
Mathematics of Finance Presentation.pptx
 
BDSM⚡Call Girls in Sector 93 Noida Escorts >༒8448380779 Escort Service
BDSM⚡Call Girls in Sector 93 Noida Escorts >༒8448380779 Escort ServiceBDSM⚡Call Girls in Sector 93 Noida Escorts >༒8448380779 Escort Service
BDSM⚡Call Girls in Sector 93 Noida Escorts >༒8448380779 Escort Service
 
George Lever - eCommerce Day Chile 2024
George Lever -  eCommerce Day Chile 2024George Lever -  eCommerce Day Chile 2024
George Lever - eCommerce Day Chile 2024
 
ANCHORING SCRIPT FOR A CULTURAL EVENT.docx
ANCHORING SCRIPT FOR A CULTURAL EVENT.docxANCHORING SCRIPT FOR A CULTURAL EVENT.docx
ANCHORING SCRIPT FOR A CULTURAL EVENT.docx
 
SaaStr Workshop Wednesday w/ Lucas Price, Yardstick
SaaStr Workshop Wednesday w/ Lucas Price, YardstickSaaStr Workshop Wednesday w/ Lucas Price, Yardstick
SaaStr Workshop Wednesday w/ Lucas Price, Yardstick
 
Microsoft Copilot AI for Everyone - created by AI
Microsoft Copilot AI for Everyone - created by AIMicrosoft Copilot AI for Everyone - created by AI
Microsoft Copilot AI for Everyone - created by AI
 
Introduction to Prompt Engineering (Focusing on ChatGPT)
Introduction to Prompt Engineering (Focusing on ChatGPT)Introduction to Prompt Engineering (Focusing on ChatGPT)
Introduction to Prompt Engineering (Focusing on ChatGPT)
 
If this Giant Must Walk: A Manifesto for a New Nigeria
If this Giant Must Walk: A Manifesto for a New NigeriaIf this Giant Must Walk: A Manifesto for a New Nigeria
If this Giant Must Walk: A Manifesto for a New Nigeria
 
Night 7k Call Girls Noida Sector 128 Call Me: 8448380779
Night 7k Call Girls Noida Sector 128 Call Me: 8448380779Night 7k Call Girls Noida Sector 128 Call Me: 8448380779
Night 7k Call Girls Noida Sector 128 Call Me: 8448380779
 
CTAC 2024 Valencia - Sven Zoelle - Most Crucial Invest to Digitalisation_slid...
CTAC 2024 Valencia - Sven Zoelle - Most Crucial Invest to Digitalisation_slid...CTAC 2024 Valencia - Sven Zoelle - Most Crucial Invest to Digitalisation_slid...
CTAC 2024 Valencia - Sven Zoelle - Most Crucial Invest to Digitalisation_slid...
 
Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...
Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...
Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...
 
Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...
Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...
Governance and Nation-Building in Nigeria: Some Reflections on Options for Po...
 
Mohammad_Alnahdi_Oral_Presentation_Assignment.pptx
Mohammad_Alnahdi_Oral_Presentation_Assignment.pptxMohammad_Alnahdi_Oral_Presentation_Assignment.pptx
Mohammad_Alnahdi_Oral_Presentation_Assignment.pptx
 
Call Girl Number in Khar Mumbai📲 9892124323 💞 Full Night Enjoy
Call Girl Number in Khar Mumbai📲 9892124323 💞 Full Night EnjoyCall Girl Number in Khar Mumbai📲 9892124323 💞 Full Night Enjoy
Call Girl Number in Khar Mumbai📲 9892124323 💞 Full Night Enjoy
 
Presentation on Engagement in Book Clubs
Presentation on Engagement in Book ClubsPresentation on Engagement in Book Clubs
Presentation on Engagement in Book Clubs
 
WhatsApp 📞 9892124323 ✅Call Girls In Juhu ( Mumbai )
WhatsApp 📞 9892124323 ✅Call Girls In Juhu ( Mumbai )WhatsApp 📞 9892124323 ✅Call Girls In Juhu ( Mumbai )
WhatsApp 📞 9892124323 ✅Call Girls In Juhu ( Mumbai )
 
Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024
Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024
Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024
 
CTAC 2024 Valencia - Henrik Hanke - Reduce to the max - slideshare.pdf
CTAC 2024 Valencia - Henrik Hanke - Reduce to the max - slideshare.pdfCTAC 2024 Valencia - Henrik Hanke - Reduce to the max - slideshare.pdf
CTAC 2024 Valencia - Henrik Hanke - Reduce to the max - slideshare.pdf
 
Thirunelveli call girls Tamil escorts 7877702510
Thirunelveli call girls Tamil escorts 7877702510Thirunelveli call girls Tamil escorts 7877702510
Thirunelveli call girls Tamil escorts 7877702510
 

Modular Multitask Reinforcement Learning with Policy Sketches

  • 1. Modular Multitask Reinforcement Learning with Policy Sketches Yoonho Lee Department of Computer Science and Engineering Pohang University of Science and Technology August 09, 2017
  • 2. Modular Multitask Reinforcement Learning with Policy Sketches ICML 2017 Best Paper Honorable Mention Hierarchical Reinforcement Learning Proposes a looser form of task supervision for RL agents (compared to e.g. reward shaping) Proposed form of task supervision is decoupled from the environment Proposes a new way to use NNs for hierarchical RL Natural extensions to zero-shot and unlabelled RL
  • 3. Hierarchical Reinforcement Learning Why is starcraft harder than go for RL? Why is starcraft not harder than go for humans?
  • 4. Hierarchical Reinforcement Learning Decision = Action. RL algorithms are designed with decisions in mind, but operate on actions. Real world decisions have hierarchical structure(e.g. cook dinner → cut potato → activate arm muscle) Effective knowledge reuse Effecient credit assignment Open question: How do we discover salient/reusable decisions?
  • 5. Previous Work Learn multiple timestep policy along with confidence1 Learn controller network that sets desired direction of state change2 Encode multiple sub-policies with an SNN3 Actor-Critic algorithm for options4 1 Alexander Vezhnevets et al. “Strategic Attentive Writer for Learning Macro-Actions”. In: NIPS (2016). 2 Alexander Sasha Vezhnevets et al. “FeUdal Networks for Hierarchical Reinforcement Learning”. In: (2017). 3 Carlos Florensa, Yan Duan, and Pieter Abbeel. “Stochastic Neural Networks for Hierarchical Reinforcement Learning”. In: ICLR 2017 (2017), pp. 1056–1064. 4 Pierre-Luc Bacon, Jean Harb, and Doina Precup. “The Option-Critic Architecture”. In: AAAI (2017).
  • 6. Previous Work Previous hierarchical RL algorithms require reward engineering for high-dimensional environments
  • 15. Discussion Weaknesses Only works for tasks that are strictly composed To label, one needs to understand which tasks are ”RL-able” Labels needed: Fundamentally limited to tasks understood by humans
  • 16. Discussion Implications Zero-shot and Adaptation experiments had similar performance, what does this imply? Weight sharing between policies? Curriculum instead of policy sketches? Common theme of cooperating networks for hierarchical RL: Manager-Worker5, Option-Policy6, Policy modules7 5 Alexander Sasha Vezhnevets et al. “FeUdal Networks for Hierarchical Reinforcement Learning”. In: (2017). 6 Pierre-Luc Bacon, Jean Harb, and Doina Precup. “The Option-Critic Architecture”. In: AAAI (2017). 7 Jacob Andreas, Dan Klein, and Sergey Levine. “Modular Multitask Reinforcement Learning with Policy Sketches”. In: ICML (2017). arXiv: 1611.01796.
  • 17. References I [1] Jacob Andreas, Dan Klein, and Sergey Levine. “Modular Multitask Reinforcement Learning with Policy Sketches”. In: ICML (2017). arXiv: 1611.01796. [2] Pierre-Luc Bacon, Jean Harb, and Doina Precup. “The Option-Critic Architecture”. In: AAAI (2017). [3] Carlos Florensa, Yan Duan, and Pieter Abbeel. “Stochastic Neural Networks for Hierarchical Reinforcement Learning”. In: ICLR 2017 (2017), pp. 1056–1064. [4] Alexander Vezhnevets et al. “Strategic Attentive Writer for Learning Macro-Actions”. In: NIPS (2016). [5] Alexander Sasha Vezhnevets et al. “FeUdal Networks for Hierarchical Reinforcement Learning”. In: (2017).