Hierarchical
Reinforcement Learning




    David Jardim & Luís Nunes
      ISCTE-IUL 2009/2010
Outline 1/2
Planning Process
The Problem and Motivation
Reinforcement Learning
Markov Decision Process
Q-Learning
Hierarchical Reinforcement Learning
  Why HRL?
  Approaches
Outline 2/2
  Semi-Markov Decision Process
  Options
Until Now
Next Step - Simbad
Limitations of HRL
Future Work on HRL
Questions
References

Planning Process




The Problem and Motivation

[Image: LEGO Mindstorms NXT robot @ http://lambcutlet.org/images/LEGO_Mindstorms_NXT_mini.jpg]

LEGO MindStorms robot with sensors, actuators and noise

Goal of collecting "bricks" and assembling them according to a plan

Decompose the global problem into sub-problems

Try to solve the problem by implementing well-known RL and HRL techniques
Reinforcement Learning

Computational approach to learning
(@ R. S. Sutton, Reinforcement Learning: An Introduction, MIT Press, 1998)

An agent tries to maximize the reward it receives when an action is taken

Interacts with a complex, uncertain environment

Learns how to map situations to actions
Markov Decision Process

A finite MDP is defined by

  a finite set of states S

  a finite set of actions A

(@ http://en.wikipedia.org/wiki/Markov_decision_process)
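The equations on this slide did not survive extraction; the speaker notes refer to two quantities, the probability of the next state and the immediate expected reward. A reconstruction in the standard notation of Sutton's book, offered as an assumption of what the slide showed:

```latex
P^{a}_{ss'} = \Pr\{\, s_{t+1} = s' \mid s_t = s,\ a_t = a \,\}
\qquad
R^{a}_{ss'} = \mathbb{E}\{\, r_{t+1} \mid s_t = s,\ a_t = a,\ s_{t+1} = s' \,\}
```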
Q-Learning [Watkins, C.J.C.H. '89]

Agent with a state set S and action set A.

Performs an action a in order to change its state.

A reward is provided by the environment.

The goal of the agent is to maximize its total reward.

(@ http://en.wikipedia.org/wiki/Q-learning)
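A minimal tabular sketch of the loop just described, not the authors' implementation; the environment interface (reset, step, actions) is assumed for illustration:

```python
import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.1, gamma=0.95, epsilon=0.1):
    """Tabular Q-learning: learn Q(s, a) from interaction alone.

    `env` is a hypothetical environment exposing reset() -> state,
    step(action) -> (next_state, reward, done), and env.actions.
    """
    Q = defaultdict(float)  # Q[(state, action)], defaults to 0
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # epsilon-greedy: mostly exploit, sometimes explore
            if random.random() < epsilon:
                a = random.choice(env.actions)
            else:
                a = max(env.actions, key=lambda act: Q[(s, act)])
            s2, r, done = env.step(a)
            # Watkins' update: move Q(s,a) toward r + gamma * max_a' Q(s',a')
            best_next = max(Q[(s2, a2)] for a2 in env.actions)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s2
    return Q
```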
Why HRL?

Improve performance

Impossibility of applying RL to problems with large state/action spaces (the curse of dimensionality)

Sub-goals and abstract actions can be reused across different tasks (state abstraction)

Multiple levels of temporal abstraction

Obtain state abstraction
Approaches
HAMs - Hierarchies of Abstract Machines (Parr
& Russell, 98)

Options - Between MDPs and Semi-MDPs:
Learning, Planning, and Representing Knowledge
at Multiple Temporal Scales (Sutton, Precup &
Singh, 99)

MAXQ Value Function Decomposition (Dietterich,
2000)

Discovering Hierarchy in RL with HEXQ (Hengst,
2002)
Semi-Markov Decision Process

An SMDP consists of

  A set of states S

  A set of actions A

  An expected cumulative discounted reward

  A well-defined joint distribution of the next state and transit time
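The slide lists the ingredients without the value equation; for reference, the Bellman optimality equation for SMDPs used in the Options framework discounts by the random transit time τ. This is a reconstruction from Sutton, Precup & Singh (1999), not taken from the slide:

```latex
V^{*}(s) = \max_{a \in A_s} \Big[\, R(s,a)
         + \sum_{s',\,\tau} \gamma^{\tau}\, P(s',\tau \mid s,a)\, V^{*}(s') \,\Big]
```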
Options [Sutton, Precup & Singh '99]

An Option is defined by

  A policy π: S × A → [0,1]

  A termination condition β: S⁺ → [0,1]

  And an initiation set I ⊆ S

It is hierarchical and used to reach sub-goals
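A minimal sketch of the triple ⟨I, π, β⟩ as a data structure, assuming the same state/action types as the Q-learning sketch above; all names are illustrative, not from the talk:

```python
import random
from dataclasses import dataclass
from typing import Any, Callable, Dict, Set, Tuple

State = Any
Action = Any

@dataclass
class Option:
    """An option <I, pi, beta> in the sense of Sutton, Precup & Singh (1999)."""
    initiation_set: Set[State]                 # I ⊆ S: states where the option may start
    policy: Dict[Tuple[State, Action], float]  # π(s, a): action probabilities
    termination: Callable[[State], float]      # β(s): probability of stopping in s

    def can_start(self, s: State) -> bool:
        return s in self.initiation_set

    def act(self, s: State, actions) -> Action:
        # Sample an action according to π(s, ·).
        weights = [self.policy.get((s, a), 0.0) for a in actions]
        return random.choices(actions, weights=weights)[0]

    def should_stop(self, s: State) -> bool:
        return random.random() < self.termination(s)
```

An agent executing such an option keeps calling act() until should_stop() fires, then resumes choosing among primitive actions and other options.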
Until Now

[Figures: a learned Option (O1, O2) shown as a composite action, and the states where the agent preferred an Option over a primitive action]
Until Now

[Plots: steps per episode — Sutton, Precup & Singh '99 (left) vs. my simulation (right)]
Next Step - Simbad

Java 3D Robot Simulator (@ http://simbad.sourceforge.net/)

3D visualization and sensing

Range Sensors: sonars and IR

Contact Sensors: bumpers

Will allow us to simulate and learn first, and then transfer the learning to our LEGO MindStorms robot
Limitations of HRL

The effectiveness of these ideas on large, complex continuous control tasks remains unproven

Sub-goals are assigned manually

Some of the existing algorithms only work well for the problems they were designed to solve
Future Work on HRL

Automated discovery of state abstractions

Find the best automated way to discover sub-goals to associate with Options

Obtain a long-lived learning agent that faces a continuing series of tasks and keeps evolving
Questions?
References

R. S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.

R. Parr and S. Russell. Reinforcement learning with hierarchies of machines. In Advances in Neural Information Processing Systems: Proceedings of the 1997 Conference. MIT Press, Cambridge, MA, 1998.

R. S. Sutton, D. Precup, and S. Singh. Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence, 112:181–211, 1999.

T. G. Dietterich. Hierarchical reinforcement learning with the MAXQ value function decomposition. Journal of Artificial Intelligence Research, 13:227–303, 2000.

B. Hengst. Discovering hierarchy in reinforcement learning with HEXQ. In Machine Learning: Proceedings of the Nineteenth International Conference on Machine Learning, 2002.


Editor's Notes

  1. Good afternoon. I am here to talk to you about my dissertation project, which falls within the area of hierarchical reinforcement learning.
  2. During this presentation I will address the problem and the motivation for it, define reinforcement learning and its mathematical framework, talk a little about Q-Learning, and then go deeper into the topic of Hierarchical Reinforcement Learning.
  3. Show some of the work developed so far, and define the next steps.
  4. The aim is to simulate a robot whose goal is to fetch bricks and arrange them according to a plan. I will try to divide the problem into several tasks, e.g. how to find the brick, how to push the brick... In a following phase, do the same with a real robot in a real scenario. The question is how far the known reinforcement learning and hierarchical reinforcement learning techniques can take us towards solving the problem, starting from a simplification and progressively adding complexity.
  5. Reinforcement learning, in the computational sense, is an approach to learning in which an agent, upon executing an action, receives a reward and changes its state. Over time that reward makes it possible to map states to actions and build a policy. That policy will allow the agent to solve the problem it was set.
  6. If a reinforcement learning task has finite sets of states and actions, then we can say that this task is a Markov Decision Process. For any state and action, the probability that the next state occurs is defined by the first equation. The second is the immediate reward received after transitioning to the next state with the probability defined previously.
  7. It was a very important discovery for the field. It can be seen as an agent that chooses an action from a policy, then executes that action, receives a reward, and transitions to a new state, updating the quality of the action executed in the corresponding state.
  8. Reinforcement learning has some limitations: depending on the complexity of the problem, learning may prove impracticable. The more complex the problem, the larger the sets of states and actions. Hierarchical reinforcement learning can be used to speed up the learning process, reduce the amount of resources required (memory), and reuse acquired learning across different tasks (state abstraction), thereby making it possible to solve some otherwise impossible problems.
  9. HAMs - through a hierarchy of finite-state machines, organized in increasing order of complexity, where the top machines are composed of the underlying machines, down to the machines that execute the primitive actions. MAXQ - treats the problem as a set of simultaneous Q-Learning problems. It can decompose the policy so as to reuse parts that repeat, and it implements several types of state abstraction. HEXQ - tries to decompose an MDP by dividing the state space into nested sub-MDP regions and then tries to solve the problem for each of the regions.
  10. Semi-Markov Decision Processes are considered a special kind of MDP, suited to modelling discrete-event systems in continuous time. The big difference is that here actions can take variable amounts of time, so as to model temporally extended actions.
  11. This was the approach chosen as the basis for the work to be developed because, compared with the other approaches, it proved the most flexible in terms of the problems to which it can be applied. Policy - the set of primitive actions that define the composite action. Termination condition - the probability equals 1 when the agent changes rooms, for example. Initiation set - the set of states where the Option can be initiated (a room). The Option is created through Q-Learning.
  12. In order to build a solid base of knowledge about Options, I tried to replicate the study carried out by the authors, and this was my implementation. In the first image we have a representation of an Option, which is nothing more than a composite action. And in the second image we can see the states where the agent decided it was more advantageous to execute an Option rather than a primitive action.
  13. The results obtained were very close to the simulation carried out by the authors, as you can see from the comparison between the two graphs. At this point the base is in place that will be used to run tests and try to solve the problem, and perhaps even improve on these results.
  14. The next step is to use a robot simulation platform with a range of features, and to try to solve the proposed builder-robot problem using the knowledge acquired from implementing the Options. Once that goal is reached, I will start working on transferring the learning acquired in simulation to the real robot and getting it to solve the learned problem.
  15. Of course, hierarchical reinforcement learning is not perfect: for now it can only be applied to problems of small or medium size, and the sub-goals are assigned manually by the programmer. Also, most of the algorithms developed only work properly for the problems they were originally designed for.
  16. There is a great deal of future work in this area and the possibilities are immense. From the research carried out, the greatest challenge is the automatic discovery of hierarchical structures, such as discovering sub-goals so that the agent divides the problem into several simpler problems, and making the agent, through the learning process, capable of responding to new challenges and evolving accordingly.
  17. Thank you. Now, if anyone has questions, I will be happy to answer.