Modular Multitask Reinforcement Learning with Policy Sketches
1. Modular Multitask Reinforcement Learning with
Policy Sketches
Yoonho Lee
Department of Computer Science and Engineering
Pohang University of Science and Technology
August 09, 2017
2. Modular Multitask Reinforcement Learning with Policy
Sketches
ICML 2017 Best Paper Honorable Mention
Hierarchical Reinforcement Learning
Proposes a looser form of task supervision for RL agents
(compared to e.g. reward shaping)
Proposed form of task supervision is decoupled from the
environment
Proposes a new way to use NNs for hierarchical RL
Natural extensions to zero-shot and unlabelled RL
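The proposed mechanism (one reusable sub-policy module per sketch symbol, composed in the order the sketch specifies) can be illustrated roughly as follows. This is a toy sketch under assumed interfaces, not the authors' implementation: `ToyEnv`, `subtask_done`, and the symbol names are hypothetical stand-ins for the paper's crafting environment and learned neural sub-policies.

```python
# Rough sketch (assumed interfaces, NOT the authors' implementation) of the
# paper's core idea: one sub-policy module per sketch symbol, shared across
# all tasks, executed in the order the sketch specifies.
import random

class ToyEnv:
    """Hypothetical stand-in environment: every subtask finishes after
    exactly 3 primitive steps, and the state is just a step counter."""
    def __init__(self):
        self.t = 0
        self.steps_in_subtask = 0

    def reset(self):
        self.t = 0
        self.steps_in_subtask = 0
        return self.t

    def step(self, action):
        self.t += 1
        self.steps_in_subtask += 1
        return self.t

    def subtask_done(self, symbol):
        if self.steps_in_subtask >= 3:
            self.steps_in_subtask = 0  # start the next subtask fresh
            return True
        return False

class SubPolicy:
    """One module per sketch symbol (e.g. 'get_wood'); acts randomly here,
    standing in for a learned neural network policy."""
    def __init__(self, actions):
        self.actions = actions

    def act(self, state):
        return random.choice(self.actions)

def run_sketch(env, sketch, subpolicies):
    """Execute a task by running its sketch's sub-policies in sequence.
    Because modules are shared, a symbol like 'get_wood' reuses the same
    sub-policy in every task whose sketch mentions it."""
    state = env.reset()
    for symbol in sketch:
        pi = subpolicies[symbol]
        while not env.subtask_done(symbol):
            state = env.step(pi.act(state))
    return state

subs = {"get_wood": SubPolicy([0, 1]), "make_plank": SubPolicy([0, 1])}
final_state = run_sketch(ToyEnv(), ["get_wood", "make_plank"], subs)
# final_state == 6: two subtasks, three primitive steps each
```

The key design point is that the sketch itself carries no reward or state information; it only names which shared modules to run, which is what makes this supervision looser than reward shaping.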
4. Hierarchical Reinforcement Learning
Decision ≠ Action: RL algorithms are designed with decisions
in mind, but operate on low-level actions.
Real-world decisions have hierarchical structure (e.g. cook
dinner → cut potato → activate arm muscle)
Effective knowledge reuse
Efficient credit assignment
Open question: How do we discover salient/reusable
decisions?
5. Previous Work
Learn a multi-timestep policy along with a confidence output¹
Learn a controller network that sets the desired direction of state
change²
Encode multiple sub-policies with a stochastic neural network (SNN)³
Actor-critic algorithm for options⁴
¹ Alexander Vezhnevets et al. “Strategic Attentive Writer for Learning Macro-Actions”. In: NIPS (2016).
² Alexander Sasha Vezhnevets et al. “FeUdal Networks for Hierarchical Reinforcement Learning”. In: ICML (2017).
³ Carlos Florensa, Yan Duan, and Pieter Abbeel. “Stochastic Neural Networks for Hierarchical Reinforcement Learning”. In: ICLR (2017), pp. 1056–1064.
⁴ Pierre-Luc Bacon, Jean Harb, and Doina Precup. “The Option-Critic Architecture”. In: AAAI (2017).
15. Discussion
Weaknesses
Only works for tasks that decompose strictly into sequential sub-tasks
To provide labels, one needs to understand which sub-tasks are “RL-able”
Labels needed: Fundamentally limited to tasks understood by
humans
16. Discussion
Implications
The zero-shot and adaptation experiments had similar
performance; what does this imply?
Weight sharing between policies?
Curriculum instead of policy sketches?
Common theme of cooperating networks for hierarchical RL:
Manager–Worker⁵, Option–Policy⁶, Policy modules⁷
⁵ Alexander Sasha Vezhnevets et al. “FeUdal Networks for Hierarchical Reinforcement Learning”. In: ICML (2017).
⁶ Pierre-Luc Bacon, Jean Harb, and Doina Precup. “The Option-Critic Architecture”. In: AAAI (2017).
⁷ Jacob Andreas, Dan Klein, and Sergey Levine. “Modular Multitask Reinforcement Learning with Policy Sketches”. In: ICML (2017). arXiv: 1611.01796.
17. References I
[1] Jacob Andreas, Dan Klein, and Sergey Levine. “Modular Multitask Reinforcement Learning with Policy Sketches”. In: ICML (2017). arXiv: 1611.01796.
[2] Pierre-Luc Bacon, Jean Harb, and Doina Precup. “The Option-Critic Architecture”. In: AAAI (2017).
[3] Carlos Florensa, Yan Duan, and Pieter Abbeel. “Stochastic Neural Networks for Hierarchical Reinforcement Learning”. In: ICLR (2017), pp. 1056–1064.
[4] Alexander Vezhnevets et al. “Strategic Attentive Writer for Learning Macro-Actions”. In: NIPS (2016).
[5] Alexander Sasha Vezhnevets et al. “FeUdal Networks for Hierarchical Reinforcement Learning”. In: ICML (2017).