Hierarchical POMDP Planning and Execution Joelle Pineau Machine Learning Lunch November 20, 2000
Partially Observable MDP [diagram: states S1, S2, S3]
The problem
Proposed Approach [diagram: action hierarchy with nodes Act, InvestigateHealth, Move, Navigate, CheckPulse, AskWhere, Left, Right, Up, Down, CheckMeds]
Hierarchical POMDP Planning
Example
  POMDP: S_o = {Meds, Kitchen, Bedroom}; A_o = {ClarifyTask, CheckMeds, GoToKitchen, GoToBedroom}; O_o = {Noise, Meds, Kitchen, Bedroom}
  [diagram: nodes M, B, K, E with edge probabilities 0.1 and 0.8]
  [figure: value function over MedsState, KitchenState, BedroomState, with regions labeled GoToKitchen, ClarifyTask, GoToBedroom, CheckMeds]
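To make the example's state, action, and observation sets concrete, here is a minimal Python sketch of the standard POMDP belief update applied to them. The sets come from the slide; the transition table `T` and observation table `O` are made-up illustrative numbers (the slide's diagram did not survive extraction), and the function name `belief_update` is hypothetical.

```python
# Sets taken from the slide.
STATES = ["Meds", "Kitchen", "Bedroom"]
ACTIONS = ["ClarifyTask", "CheckMeds", "GoToKitchen", "GoToBedroom"]
OBSERVATIONS = ["Noise", "Meds", "Kitchen", "Bedroom"]


def belief_update(belief, action, observation, T, O):
    """Bayes filter: b'(s') is proportional to O[a][s'][o] * sum_s T[a][s][s'] * b(s)."""
    unnormalized = {}
    for s_next in STATES:
        predicted = sum(T[action][s][s_next] * belief[s] for s in STATES)
        unnormalized[s_next] = O[action][s_next][observation] * predicted
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return {s: p / total for s, p in unnormalized.items()}


# ASSUMPTION (not from the slides): the state never changes, and every action
# reports the true state with probability 0.7 and each other observation
# (including Noise) with probability 0.1.
T = {a: {s: {s2: (1.0 if s == s2 else 0.0) for s2 in STATES} for s in STATES}
     for a in ACTIONS}
O = {a: {s: {o: (0.7 if o == s else 0.1) for o in OBSERVATIONS} for s in STATES}
     for a in ACTIONS}

uniform = {s: 1.0 / len(STATES) for s in STATES}
posterior = belief_update(uniform, "ClarifyTask", "Kitchen", T, O)
```

Under these assumed numbers, hearing `Kitchen` after `ClarifyTask` moves most of the belief mass onto `Kitchen` while leaving some on the other two states.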
Hierarchical POMDP Action Partitioning: [tree diagram: Act -> {Move, CheckMeds, ClarifyTask}; Move -> {ClarifyTask, GoToKitchen, GoToBedroom}]
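One way to hold this action partitioning in code is a plain dictionary from each abstract action to its children. The sketch below assumes the flattened diagram reads level by level (Act on top, Move beneath it); that reading, the name `HIERARCHY`, and the helper `primitive_actions` are illustrative assumptions rather than anything stated on the slide.

```python
# ASSUMPTION: tree read level by level from the flattened slide text.
HIERARCHY = {
    "Act": ["Move", "CheckMeds", "ClarifyTask"],
    "Move": ["ClarifyTask", "GoToKitchen", "GoToBedroom"],
}


def primitive_actions(node, hierarchy=HIERARCHY):
    """Return the primitive actions reachable from `node`, without duplicates."""
    if node not in hierarchy:
        # A node with no children is a primitive action.
        return [node]
    found = []
    for child in hierarchy[node]:
        for action in primitive_actions(child, hierarchy):
            if action not in found:
                found.append(action)
    return found


move_primitives = primitive_actions("Move")
act_primitives = primitive_actions("Act")
```

Note that `ClarifyTask` appears under both controllers, so expanding `Act` has to deduplicate it.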
Local Value Function and Policy - Move Controller [figure: value function over MedsState, KitchenState, BedroomState, with regions labeled ClarifyTask, GoToKitchen, GoToBedroom]
Modeling Abstract Actions
  Problem: Need parameters for abstract action Move
  Solution: Use the local policy of corresponding low-level controller
  General form: $\Pr(s_j \mid s_i, a_k^{abstract}) = \Pr(s_j \mid s_i, \mathrm{Policy}(a_k^{abstract}, s_i))$
  Example: $\Pr(s_j \mid \mathit{MedsState}, \mathit{Move}) = \Pr(s_j \mid \mathit{MedsState}, \mathit{ClarifyTask})$
  Policy(Move, s_i): [figure: policy regions over MedsState, KitchenState, BedroomState, labeled ClarifyTask, GoToKitchen, GoToBedroom]
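The general form above can be sketched in a few lines of Python: the abstract action's transition row for state s_i is copied from the primitive action that the local policy picks in s_i. Only `Policy(Move, MedsState) = ClarifyTask` is given on the slide; the other two policy entries, the transition numbers in `T`, and the function name `abstract_transition` are hypothetical and chosen purely for illustration.

```python
STATES = ["Meds", "Kitchen", "Bedroom"]

# ASSUMPTION: made-up primitive transition tables T[action][s_i][s_j].
T = {
    "ClarifyTask": {s: {s2: (1.0 if s == s2 else 0.0) for s2 in STATES}
                    for s in STATES},
    "GoToKitchen": {s: {"Meds": 0.1, "Kitchen": 0.8, "Bedroom": 0.1}
                    for s in STATES},
    "GoToBedroom": {s: {"Meds": 0.1, "Kitchen": 0.1, "Bedroom": 0.8}
                    for s in STATES},
}

# Local policy of the Move controller. The Meds entry is from the slide;
# the Kitchen and Bedroom entries are ASSUMED for this sketch.
MOVE_POLICY = {
    "Meds": "ClarifyTask",
    "Kitchen": "GoToKitchen",
    "Bedroom": "GoToBedroom",
}


def abstract_transition(T, policy):
    """Pr(s_j | s_i, abstract) = Pr(s_j | s_i, Policy(abstract, s_i))."""
    return {s_i: dict(T[policy[s_i]][s_i]) for s_i in policy}


T_move = abstract_transition(T, MOVE_POLICY)
```

Here `T_move["Meds"]` equals `T["ClarifyTask"]["Meds"]`, matching the slide's example. The policy is indexed by state, as in the slide's formula.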
Local Value Function and Policy - Act Controller [figure: value function over MedsState, KitchenState, BedroomState, with regions labeled Move, CheckMeds]
Comparing Policies [figure: Hierarchical Policy vs. Optimal Policy; legend: ClarifyTask, CheckMeds, GoToKitchen, GoToBedroom]
Bounding the value of the approximation
A real dialogue management example
  Subtasks (hierarchy diagram): Act, CheckHealth, Phone, DoMeds, CheckWeather, Move, Greet
  Actions: AskGoWhere, GoToRoom, GoToKitchen, GoToFollow, VerifyRoom, VerifyKitchen, VerifyFollow, GreetGeneral, GreetMorning, GreetNight, RespondThanks, AskWeatherTime, SayCurrent, SayToday, SayTomorrow, StartMeds, NextMeds, ForceMeds, QuitMeds, AskCallWho, CallHelp, CallNurse, CallRelative, VerifyHelp, VerifyNurse, VerifyRelative, AskHealth, OfferHelp, SayTime
Results:
Final words



Editor's notes

  1. Talk to you about my recent work on ...