SlideShare ist ein Scribd-Unternehmen logo
1 von 59
Hierarchical Reinforcement Learning Mausam [A Survey and Comparison of HRL techniques]
The Outline of the Talk ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Decision Making Slide courtesy Dan Weld Environment Percept Action What action next?
Personal Printerbot ,[object Object],[object Object],[object Object],[object Object],[object Object]
Episodic Markov Decision Process ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],* Markovian assumption. ** bounds  R  for  infinite horizon. Episodic MDP  ´  MDP with absorbing goals
Goal of an Episodic MDP ,[object Object],[object Object],[object Object],[object Object],* Non-noisy complete information perceptors
Solution of an Episodic MDP ,[object Object],[object Object]
Complexity of Value Iteration ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],* Bellman’s curse of dimensionality
The Outline of the Talk ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Learning Environment Data ,[object Object],[object Object],[object Object],[object Object]
Decision Making while Learning* Environment Percepts Datum Action * Known as  Reinforcement Learning What action next?   ,[object Object],[object Object],[object Object],[object Object]
Reinforcement Learning ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Planning vs. MDP vs. RL ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Exploration vs. Exploitation ,[object Object],[object Object],[object Object],[object Object]
Model Based Learning ,[object Object],[object Object],[object Object],[object Object],[object Object]
Model Free Learning ,[object Object],[object Object],[object Object],[object Object]
Learning ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Q-Learning ,[object Object],[object Object],[object Object],Optimal policy is the action with maximum  Q * value.
Q-Learning ,[object Object],[object Object],New estimate of  Q  value Old estimate of  Q  value
Semi-MDP: When actions take time. ,[object Object],[object Object],[object Object],[object Object]
Printerbot ,[object Object],[object Object],[object Object],[object Object],[object Object]
The Outline of the Talk ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
1. The Mathematical Perspective A Structure Paradigm ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
2. Modular Decision Making
2. Modular Decision Making ,[object Object],[object Object],[object Object]
2. Modular Decision Making ,[object Object],[object Object],[object Object]
3. Background Knowledge ,[object Object],[object Object],[object Object],[object Object],[object Object]
A mechanism that exploits all three avenues : Hierarchies ,[object Object],[object Object],[object Object]
The Outline of the Talk ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Hierarchy ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Hierarchical Algos  ´  Gating Mechanism ,[object Object],[object Object],[object Object],[object Object],* *Can be a multi-  level hierarchy. g is a gate b i  is a behaviour
Option : Move e  until end of hallway ,[object Object],[object Object],[object Object]
Options  [Sutton, Precup, Singh’99] ,[object Object],[object Object],[object Object],[object Object],[object Object],*Can be a policy over lower level options.
Learning ,[object Object],[object Object],[object Object],[object Object]
Machine: Move e  + Collision Avoidance Move e Choose Return End of hallway :  End of hallway Obstacle Call M1 Call M2 M1 M2 Move w Move n Move n Return Move w Move s Move s Return
Hierarchies of Abstract Machines [Parr, Russell’97] ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Hierarchies of Abstract Machines ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Learning ,[object Object],[object Object],[object Object],[object Object]
Task Hierarchy: MAXQ Decomposition [Dietterich’00] Root Take Give Navigate(loc) Deliver Fetch Extend-arm Extend-arm Grab Release Move e Move w Move s Move n Children of a task are unordered
MAXQ Decomposition ,[object Object],[object Object],[object Object],[object Object],[object Object],*Observe the context-free nature of Q -value Reward received while navigating Reward received after navigation
The Outline of the Talk ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
1. State Abstraction ,[object Object],[object Object],[object Object]
State Abstraction in MAXQ ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
State Abstraction in Options, HAM ,[object Object],[object Object],[object Object],[object Object],[object Object],*[Andre,Russell’02]
2. Optimality Hierarchical Optimality vs. Recursive Optimality
Optimality ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],* Can define eqns for both optimalities **Adv. of using macro-actions maybe lost.
3. Language Expressiveness ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
4. Knowledge Requirements ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
5. Models advanced ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
6. Structure Paradigm ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
The Outline of the Talk ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Directions for Future Research ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Directions for Future Research ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Applications ,[object Object],[object Object],[object Object],[object Object],Images courtesy various sources Parts Assemblies Ware-house P2 P1 P3 P4 D2 D3 D4 D1                    
Thinking Big… ,[object Object],[object Object]
The Outline of the Talk ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
How to choose appropriate hierarchy ,[object Object],[object Object],[object Object],[object Object],[object Object]
The Structure Paradigm ,[object Object],[object Object]
Main ideas in HRL community ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]

Weitere ähnliche Inhalte

Was ist angesagt?

Continuous control with deep reinforcement learning (DDPG)
Continuous control with deep reinforcement learning (DDPG)Continuous control with deep reinforcement learning (DDPG)
Continuous control with deep reinforcement learning (DDPG)Taehoon Kim
 
Model-Based Reinforcement Learning @NIPS2017
Model-Based Reinforcement Learning @NIPS2017Model-Based Reinforcement Learning @NIPS2017
Model-Based Reinforcement Learning @NIPS2017mooopan
 
안.전.제.일. 강화학습!
안.전.제.일. 강화학습!안.전.제.일. 강화학습!
안.전.제.일. 강화학습!Dongmin Lee
 
Introduction to A3C model
Introduction to A3C modelIntroduction to A3C model
Introduction to A3C modelWEBFARMER. ltd.
 
강화 학습 기초 Reinforcement Learning an introduction
강화 학습 기초 Reinforcement Learning an introduction강화 학습 기초 Reinforcement Learning an introduction
강화 학습 기초 Reinforcement Learning an introductionTaehoon Kim
 
Reinforcement learning
Reinforcement learningReinforcement learning
Reinforcement learningDing Li
 
Imitation learning tutorial
Imitation learning tutorialImitation learning tutorial
Imitation learning tutorialYisong Yue
 
Deep reinforcement learning from scratch
Deep reinforcement learning from scratchDeep reinforcement learning from scratch
Deep reinforcement learning from scratchJie-Han Chen
 
DQN (Deep Q-Network)
DQN (Deep Q-Network)DQN (Deep Q-Network)
DQN (Deep Q-Network)Dong Guo
 
Introducing Zap Q-Learning
Introducing Zap Q-Learning   Introducing Zap Q-Learning
Introducing Zap Q-Learning Sean Meyn
 
An introduction to reinforcement learning
An introduction to  reinforcement learningAn introduction to  reinforcement learning
An introduction to reinforcement learningJie-Han Chen
 
Markov decision process
Markov decision processMarkov decision process
Markov decision processHamed Abdi
 
[DL輪読会]Active Domain Randomization
[DL輪読会]Active Domain Randomization[DL輪読会]Active Domain Randomization
[DL輪読会]Active Domain RandomizationDeep Learning JP
 
Artificial Intelligence: What Is Reinforcement Learning?
Artificial Intelligence: What Is Reinforcement Learning?Artificial Intelligence: What Is Reinforcement Learning?
Artificial Intelligence: What Is Reinforcement Learning?Bernard Marr
 
RLCode와 A3C 쉽고 깊게 이해하기
RLCode와 A3C 쉽고 깊게 이해하기RLCode와 A3C 쉽고 깊게 이해하기
RLCode와 A3C 쉽고 깊게 이해하기Woong won Lee
 
Reinforcement Learning
Reinforcement LearningReinforcement Learning
Reinforcement LearningSalem-Kabbani
 
Multi-armed Bandits
Multi-armed BanditsMulti-armed Bandits
Multi-armed BanditsDongmin Lee
 
(Paper Review)Image to image translation with conditional adversarial network...
(Paper Review)Image to image translation with conditional adversarial network...(Paper Review)Image to image translation with conditional adversarial network...
(Paper Review)Image to image translation with conditional adversarial network...MYEONGGYU LEE
 

Was ist angesagt? (20)

Continuous control with deep reinforcement learning (DDPG)
Continuous control with deep reinforcement learning (DDPG)Continuous control with deep reinforcement learning (DDPG)
Continuous control with deep reinforcement learning (DDPG)
 
Model-Based Reinforcement Learning @NIPS2017
Model-Based Reinforcement Learning @NIPS2017Model-Based Reinforcement Learning @NIPS2017
Model-Based Reinforcement Learning @NIPS2017
 
안.전.제.일. 강화학습!
안.전.제.일. 강화학습!안.전.제.일. 강화학습!
안.전.제.일. 강화학습!
 
Introduction to A3C model
Introduction to A3C modelIntroduction to A3C model
Introduction to A3C model
 
Reinforcement Learning
Reinforcement LearningReinforcement Learning
Reinforcement Learning
 
강화 학습 기초 Reinforcement Learning an introduction
강화 학습 기초 Reinforcement Learning an introduction강화 학습 기초 Reinforcement Learning an introduction
강화 학습 기초 Reinforcement Learning an introduction
 
Reinforcement learning
Reinforcement learningReinforcement learning
Reinforcement learning
 
Imitation learning tutorial
Imitation learning tutorialImitation learning tutorial
Imitation learning tutorial
 
Deep reinforcement learning from scratch
Deep reinforcement learning from scratchDeep reinforcement learning from scratch
Deep reinforcement learning from scratch
 
DQN (Deep Q-Network)
DQN (Deep Q-Network)DQN (Deep Q-Network)
DQN (Deep Q-Network)
 
Deep Reinforcement Learning
Deep Reinforcement LearningDeep Reinforcement Learning
Deep Reinforcement Learning
 
Introducing Zap Q-Learning
Introducing Zap Q-Learning   Introducing Zap Q-Learning
Introducing Zap Q-Learning
 
An introduction to reinforcement learning
An introduction to  reinforcement learningAn introduction to  reinforcement learning
An introduction to reinforcement learning
 
Markov decision process
Markov decision processMarkov decision process
Markov decision process
 
[DL輪読会]Active Domain Randomization
[DL輪読会]Active Domain Randomization[DL輪読会]Active Domain Randomization
[DL輪読会]Active Domain Randomization
 
Artificial Intelligence: What Is Reinforcement Learning?
Artificial Intelligence: What Is Reinforcement Learning?Artificial Intelligence: What Is Reinforcement Learning?
Artificial Intelligence: What Is Reinforcement Learning?
 
RLCode와 A3C 쉽고 깊게 이해하기
RLCode와 A3C 쉽고 깊게 이해하기RLCode와 A3C 쉽고 깊게 이해하기
RLCode와 A3C 쉽고 깊게 이해하기
 
Reinforcement Learning
Reinforcement LearningReinforcement Learning
Reinforcement Learning
 
Multi-armed Bandits
Multi-armed BanditsMulti-armed Bandits
Multi-armed Bandits
 
(Paper Review)Image to image translation with conditional adversarial network...
(Paper Review)Image to image translation with conditional adversarial network...(Paper Review)Image to image translation with conditional adversarial network...
(Paper Review)Image to image translation with conditional adversarial network...
 

Andere mochten auch

Aggregate planning
Aggregate planningAggregate planning
Aggregate planningAtif Ghayas
 
Machine Learning presentation.
Machine Learning presentation.Machine Learning presentation.
Machine Learning presentation.butest
 

Andere mochten auch (6)

Htn in videogames
Htn in videogamesHtn in videogames
Htn in videogames
 
強化学習勉強会・論文紹介(Kulkarni et al., 2016)
強化学習勉強会・論文紹介(Kulkarni et al., 2016)強化学習勉強会・論文紹介(Kulkarni et al., 2016)
強化学習勉強会・論文紹介(Kulkarni et al., 2016)
 
Hierarchical Object Detection with Deep Reinforcement Learning
Hierarchical Object Detection with Deep Reinforcement LearningHierarchical Object Detection with Deep Reinforcement Learning
Hierarchical Object Detection with Deep Reinforcement Learning
 
Aggregate Planning
Aggregate  PlanningAggregate  Planning
Aggregate Planning
 
Aggregate planning
Aggregate planningAggregate planning
Aggregate planning
 
Machine Learning presentation.
Machine Learning presentation.Machine Learning presentation.
Machine Learning presentation.
 

Ähnlich wie Hierarchical Reinforcement Learning

reiniforcement learning.ppt
reiniforcement learning.pptreiniforcement learning.ppt
reiniforcement learning.pptcharusharma165
 
How to formulate reinforcement learning in illustrative ways
How to formulate reinforcement learning in illustrative waysHow to formulate reinforcement learning in illustrative ways
How to formulate reinforcement learning in illustrative waysYasutoTamura1
 
RL_online _presentation_1.ppt
RL_online _presentation_1.pptRL_online _presentation_1.ppt
RL_online _presentation_1.pptssuser43a599
 
Reinforcement Learning.ppt
Reinforcement Learning.pptReinforcement Learning.ppt
Reinforcement Learning.pptPOOJASHREEC1
 
Matineh Shaker, Artificial Intelligence Scientist, Bonsai at MLconf SF 2017
Matineh Shaker, Artificial Intelligence Scientist, Bonsai at MLconf SF 2017Matineh Shaker, Artificial Intelligence Scientist, Bonsai at MLconf SF 2017
Matineh Shaker, Artificial Intelligence Scientist, Bonsai at MLconf SF 2017MLconf
 
Reinfrocement Learning
Reinfrocement LearningReinfrocement Learning
Reinfrocement LearningNatan Katz
 
lecture_21.pptx - PowerPoint Presentation
lecture_21.pptx - PowerPoint Presentationlecture_21.pptx - PowerPoint Presentation
lecture_21.pptx - PowerPoint Presentationbutest
 
Lecture notes
Lecture notesLecture notes
Lecture notesbutest
 
Reinforcement learning
Reinforcement learningReinforcement learning
Reinforcement learningDongHyun Kwak
 
Introduction of Deep Reinforcement Learning
Introduction of Deep Reinforcement LearningIntroduction of Deep Reinforcement Learning
Introduction of Deep Reinforcement LearningNAVER Engineering
 
anintroductiontoreinforcementlearning-180912151720.pdf
anintroductiontoreinforcementlearning-180912151720.pdfanintroductiontoreinforcementlearning-180912151720.pdf
anintroductiontoreinforcementlearning-180912151720.pdfssuseradaf5f
 
REINFORCEMENT LEARNING
REINFORCEMENT LEARNINGREINFORCEMENT LEARNING
REINFORCEMENT LEARNINGpradiprahul
 
AI_03_Solving Problems by Searching.pptx
AI_03_Solving Problems by Searching.pptxAI_03_Solving Problems by Searching.pptx
AI_03_Solving Problems by Searching.pptxYousef Aburawi
 

Ähnlich wie Hierarchical Reinforcement Learning (20)

reiniforcement learning.ppt
reiniforcement learning.pptreiniforcement learning.ppt
reiniforcement learning.ppt
 
How to formulate reinforcement learning in illustrative ways
How to formulate reinforcement learning in illustrative waysHow to formulate reinforcement learning in illustrative ways
How to formulate reinforcement learning in illustrative ways
 
RL intro
RL introRL intro
RL intro
 
YijueRL.ppt
YijueRL.pptYijueRL.ppt
YijueRL.ppt
 
RL_online _presentation_1.ppt
RL_online _presentation_1.pptRL_online _presentation_1.ppt
RL_online _presentation_1.ppt
 
Reinforcement Learning.ppt
Reinforcement Learning.pptReinforcement Learning.ppt
Reinforcement Learning.ppt
 
Matineh Shaker, Artificial Intelligence Scientist, Bonsai at MLconf SF 2017
Matineh Shaker, Artificial Intelligence Scientist, Bonsai at MLconf SF 2017Matineh Shaker, Artificial Intelligence Scientist, Bonsai at MLconf SF 2017
Matineh Shaker, Artificial Intelligence Scientist, Bonsai at MLconf SF 2017
 
Introduction to Deep Reinforcement Learning
Introduction to Deep Reinforcement LearningIntroduction to Deep Reinforcement Learning
Introduction to Deep Reinforcement Learning
 
Reinfrocement Learning
Reinfrocement LearningReinfrocement Learning
Reinfrocement Learning
 
lecture_21.pptx - PowerPoint Presentation
lecture_21.pptx - PowerPoint Presentationlecture_21.pptx - PowerPoint Presentation
lecture_21.pptx - PowerPoint Presentation
 
Lecture notes
Lecture notesLecture notes
Lecture notes
 
Reinforcement learning
Reinforcement learningReinforcement learning
Reinforcement learning
 
AI_Planning.pdf
AI_Planning.pdfAI_Planning.pdf
AI_Planning.pdf
 
Introduction of Deep Reinforcement Learning
Introduction of Deep Reinforcement LearningIntroduction of Deep Reinforcement Learning
Introduction of Deep Reinforcement Learning
 
Cs221 rl
Cs221 rlCs221 rl
Cs221 rl
 
Reinforcement Learning - DQN
Reinforcement Learning - DQNReinforcement Learning - DQN
Reinforcement Learning - DQN
 
Goprez sg
Goprez  sgGoprez  sg
Goprez sg
 
anintroductiontoreinforcementlearning-180912151720.pdf
anintroductiontoreinforcementlearning-180912151720.pdfanintroductiontoreinforcementlearning-180912151720.pdf
anintroductiontoreinforcementlearning-180912151720.pdf
 
REINFORCEMENT LEARNING
REINFORCEMENT LEARNINGREINFORCEMENT LEARNING
REINFORCEMENT LEARNING
 
AI_03_Solving Problems by Searching.pptx
AI_03_Solving Problems by Searching.pptxAI_03_Solving Problems by Searching.pptx
AI_03_Solving Problems by Searching.pptx
 

Mehr von ahmad bassiouny (20)

Work Study & Productivity
Work Study & ProductivityWork Study & Productivity
Work Study & Productivity
 
Work Study
Work StudyWork Study
Work Study
 
Motion And Time Study
Motion And Time StudyMotion And Time Study
Motion And Time Study
 
Motion Study
Motion StudyMotion Study
Motion Study
 
The Christmas Story
The Christmas StoryThe Christmas Story
The Christmas Story
 
Turkey Photos
Turkey PhotosTurkey Photos
Turkey Photos
 
Mission Bo Kv3
Mission Bo Kv3Mission Bo Kv3
Mission Bo Kv3
 
Miramar
MiramarMiramar
Miramar
 
Mom
MomMom
Mom
 
Linearization
LinearizationLinearization
Linearization
 
Kblmt B000 Intro Kaizen Based Lean Manufacturing
Kblmt B000 Intro Kaizen Based Lean ManufacturingKblmt B000 Intro Kaizen Based Lean Manufacturing
Kblmt B000 Intro Kaizen Based Lean Manufacturing
 
How To Survive
How To SurviveHow To Survive
How To Survive
 
Dad
DadDad
Dad
 
Ancient Hieroglyphics
Ancient HieroglyphicsAncient Hieroglyphics
Ancient Hieroglyphics
 
Dubai In 2009
Dubai In 2009Dubai In 2009
Dubai In 2009
 
DesignPeopleSystem
DesignPeopleSystemDesignPeopleSystem
DesignPeopleSystem
 
Organizational Behavior
Organizational BehaviorOrganizational Behavior
Organizational Behavior
 
Work Study Workshop
Work Study WorkshopWork Study Workshop
Work Study Workshop
 
Workstudy
WorkstudyWorkstudy
Workstudy
 
Time And Motion Study
Time And  Motion  StudyTime And  Motion  Study
Time And Motion Study
 

Kürzlich hochgeladen

Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphThiyagu K
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfciinovamais
 
fourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingfourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingTeacherCyreneCayanan
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdfQucHHunhnh
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactdawncurless
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationnomboosow
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...fonyou31
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13Steve Thomason
 
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...Sapna Thakur
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Sapana Sha
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformChameera Dedduwage
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...christianmathematics
 
General AI for Medical Educators April 2024
General AI for Medical Educators April 2024General AI for Medical Educators April 2024
General AI for Medical Educators April 2024Janet Corral
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactPECB
 
Disha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfDisha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfchloefrazer622
 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajanpragatimahajan3
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDThiyagu K
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfchloefrazer622
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsTechSoup
 

Kürzlich hochgeladen (20)

Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
fourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingfourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writing
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 
General AI for Medical Educators April 2024
General AI for Medical Educators April 2024General AI for Medical Educators April 2024
General AI for Medical Educators April 2024
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 
Disha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfDisha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdf
 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajan
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SD
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdf
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 

Hierarchical Reinforcement Learning

  • 1. Hierarchical Reinforcement Learning Mausam [A Survey and Comparison of HRL techniques]
  • 2.
  • 3. Decision Making Slide courtesy Dan Weld Environment Percept Action What action next?
  • 4.
  • 5.
  • 6.
  • 7.
  • 8.
  • 9.
  • 10.
  • 11.
  • 12.
  • 13.
  • 14.
  • 15.
  • 16.
  • 17.
  • 18.
  • 19.
  • 20.
  • 21.
  • 22.
  • 23.
  • 25.
  • 26.
  • 27.
  • 28.
  • 29.
  • 30.
  • 31.
  • 32.
  • 33.
  • 34.
  • 35. Machine: Move e + Collision Avoidance Move e Choose Return End of hallway : End of hallway Obstacle Call M1 Call M2 M1 M2 Move w Move n Move n Return Move w Move s Move s Return
  • 36.
  • 37.
  • 38.
  • 39. Task Hierarchy: MAXQ Decomposition [Dietterich’00] Root Take Give Navigate(loc) Deliver Fetch Extend-arm Extend-arm Grab Release Move e Move w Move s Move n Children of a task are unordered
  • 40.
  • 41.
  • 42.
  • 43.
  • 44.
  • 45. 2. Optimality Hierarchical Optimality vs. Recursive Optimality
  • 46.
  • 47.
  • 48.
  • 49.
  • 50.
  • 51.
  • 52.
  • 53.
  • 54.
  • 55.
  • 56.
  • 57.
  • 58.
  • 59.