PRM-RL: Long-range Robotics Navigation
Tasks by Combining Reinforcement
Learning and Sampling-based Planning
IEEE International Conference on Robotics and Automation (ICRA), 2018
Best Paper Award in Service Robotics
Aleksandra Faust et al.
Google Brain Robotics
Presented by Dongmin Lee
December 1, 2019
Outline
• Abstract
• Introduction
• Reinforcement Learning
• Methods
• Results
Abstract
PRM-RL (Probabilistic Roadmap-Reinforcement Learning):
• A hierarchical method for long-range navigation tasks
• Combines sampling-based path planning with RL
• Uses feature-based and deep neural net policies (DDPG) in continuous
state and action spaces
Experiments: in simulation and on a robot, on two end-to-end navigation tasks
• Indoor (driving) navigation in office environments (selected for this talk)
• Aerial cargo delivery in urban environments
Introduction
PRM-RL YouTube video
• https://bit.ly/34zCTmd
Traditional Motion Planning (or Path Planning)
• CS287 Advanced Robotics (Fall 2019), Lecture 9: Motion Planning
• https://people.eecs.berkeley.edu/~pabbeel/cs287-fa19/slides/Lec10-motion-planning.pdf
Probabilistic Roadmap (PRM) YouTube video
• https://bit.ly/34rRKz0
• https://bit.ly/35Nb61Q
Rapidly-exploring Random Tree* (RRT*) YouTube video
• https://bit.ly/2OXiocb
• https://bit.ly/2OQbUvM
Reinforcement Learning
RL provides a formalism for behaviors
• Problem of a goal-directed agent interacting with an uncertain environment
• Interaction → adaptation (feedback & decision)
Reinforcement Learning
What are the challenges of RL?
• Huge # of samples: millions
• Fast, stable learning
• Hyperparameter tuning
• Exploration
• Sparse reward signals due to long-range navigation
→ solved here with hierarchical waypoints
• Safety / reliability
• Simulator
Introduction
So, what is the advantage of PRM-RL over traditional methods?
• In PRM-RL, an RL agent is trained to execute a local point-to-point task
without knowledge of the topology, learning the task constraints.
• PRM-RL builds the roadmap using the RL agent instead of the traditional
collision-free straight-line planner.
• Thus, the resulting long-range navigation planner combines the planning
efficiency of a PRM with the robustness of an RL agent.
Introduction
Experiment: environments used for the indoor navigation tasks
Methods
Three stages:
1. RL agent training
2. PRM construction (roadmap creation)
3. PRM-RL querying (roadmap querying)
Methods
1. RL agent training
Definition
• 𝑆: robot’s state space
• 𝑠: start state in state space 𝑆
• 𝑔: goal state in state space 𝑆
• C-space: a space of all possible robot configurations
(e.g., state space 𝑆 is a superset of the C-space)
• C-free: the subset of C-space consisting of only collision-free configurations
• 𝐿(𝑠): a task predicate (attribute) that holds when the task constraints are satisfied
• 𝑝(𝑠): the projection of a state-space point onto C-space, which must belong to C-free
The task is completed when the system is sufficiently close to the goal state:
∥𝑝(𝑠) − 𝑝(𝑔)∥ ≤ 𝜖
The system evolves according to a (possibly unknown) transfer function:
𝑠′ = 𝑓(𝑠, 𝑎)
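A minimal sketch of these definitions in code, assuming the projection 𝑝 simply extracts the robot's planar position and that 𝑓 is only reachable through a black-box simulator; the names p, task_completed, and EPS are illustrative placeholders, not the authors' code.

```python
import numpy as np

EPS = 0.5  # example goal radius (meters); the paper's value may differ

def p(state):
    # Projection of a full state onto C-space; here we assume the first two
    # entries of the state vector are the robot's (x, y) position.
    return np.asarray(state[:2])

def task_completed(state, goal_state, eps=EPS):
    # Task is done when the projected state is within eps of the projected goal.
    return np.linalg.norm(p(state) - p(goal_state)) <= eps

# The (unknown) dynamics are accessed only through a black-box simulator:
# next_state = simulator.step(state, action) plays the role of s' = f(s, a).
```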
1. RL agent training
Markov Decision Process (MDP):
• 𝑆: 𝑆 ⊂ ℝ^dₛ is the state or observation space of the robot
• 𝑠: 𝑠 = (𝑔, 𝑜)
(the goal 𝑔 in polar coordinates and LIDAR observations 𝑜)
• 𝐴: 𝐴 ⊂ ℝ^dₐ is the space of all possible actions that the robot can perform
• 𝑎: 𝑎 = (𝑣_𝑙, 𝑣_𝑟) ∈ ℝ²
(two-dimensional vector of wheel speeds)
• 𝑃: 𝑆 × 𝐴 → ℝ is a probability distribution over states and actions. We assume the
presence of a simplified black-box simulator rather than knowledge of the full
non-linear system dynamics
• 𝑅: 𝑆 → ℝ is a scalar reward function. We reward the agent for staying away from obstacles.
Our goal is to find a policy 𝜋 ∶ 𝑆 → 𝐴:
𝜋(𝑠) = 𝑎
Given an observed state 𝑠, the policy returns an action 𝑎 that the agent should perform
to maximize the long-term return:
𝜋*(𝑠) = arg max_{𝑎 ∈ 𝐴} 𝔼[ Σ_{𝑡=0}^{∞} 𝛾^𝑡 𝑅(𝑠_𝑡) ]
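As a concrete toy illustration of this MDP, the sketch below defines a self-contained differential-drive environment with the same interface: observation = (goal in polar coordinates, LIDAR ranges), action = two wheel speeds, and a reward for keeping clearance from obstacles. The class name, dimensions, dynamics, and reward shaping are simplifying assumptions, not the paper's simulator.

```python
import numpy as np

class ToyNavEnv:
    """Toy stand-in for the point-to-goal navigation MDP (illustrative only)."""

    def __init__(self, n_lidar=16, dt=0.1, wheel_base=0.3, goal_radius=0.5):
        self.n_lidar = n_lidar
        self.dt = dt
        self.wheel_base = wheel_base
        self.goal_radius = goal_radius
        self.reset()

    def reset(self):
        self.pose = np.zeros(3)                   # x, y, heading
        self.goal = np.array([4.0, 3.0])          # fixed goal for the toy example
        self.obstacles = np.array([[2.0, 1.5]])   # one circular obstacle
        return self._observe()

    def _observe(self):
        dx, dy = self.goal - self.pose[:2]
        dist = np.hypot(dx, dy)
        angle = np.arctan2(dy, dx) - self.pose[2]
        lidar = np.full(self.n_lidar, 5.0)        # placeholder "max range" LIDAR returns
        return np.concatenate(([dist, angle], lidar))   # s = (g, o)

    def step(self, action):
        v_l, v_r = np.clip(action, -1.0, 1.0)     # a = (v_l, v_r), wheel speeds
        v = 0.5 * (v_l + v_r)                     # differential-drive kinematics
        w = (v_r - v_l) / self.wheel_base
        self.pose[0] += v * np.cos(self.pose[2]) * self.dt
        self.pose[1] += v * np.sin(self.pose[2]) * self.dt
        self.pose[2] += w * self.dt
        clearance = np.min(np.linalg.norm(self.obstacles - self.pose[:2], axis=1))
        reward = min(clearance, 1.0) - 0.01       # reward staying away from obstacles
        done = np.linalg.norm(self.goal - self.pose[:2]) <= self.goal_radius
        return self._observe(), reward, done
```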
Methods
1. RL agent training
Training with the DDPG algorithm for the indoor navigation tasks
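The local point-to-point agent is trained with DDPG (Lillicrap et al., 2015). Below is a minimal PyTorch sketch of the core DDPG update only; network sizes, hyperparameters, the replay buffer, and exploration noise are illustrative assumptions, not the authors' training code.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                                 nn.Linear(64, act_dim), nn.Tanh())  # wheel speeds in [-1, 1]

    def forward(self, s):
        return self.net(s)

class Critic(nn.Module):
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.ReLU(),
                                 nn.Linear(64, 1))

    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))

def ddpg_update(actor, critic, actor_t, critic_t, opt_a, opt_c, batch,
                gamma=0.99, tau=0.005):
    # batch holds tensors s, a, r, s2, done with r and done of shape (B, 1).
    s, a, r, s2, done = batch
    with torch.no_grad():
        # Bootstrapped target: r + gamma * Q'(s', mu'(s'))
        q_target = r + gamma * (1 - done) * critic_t(s2, actor_t(s2))
    critic_loss = nn.functional.mse_loss(critic(s, a), q_target)
    opt_c.zero_grad(); critic_loss.backward(); opt_c.step()

    # Deterministic policy gradient: maximize Q(s, mu(s))
    actor_loss = -critic(s, actor(s)).mean()
    opt_a.zero_grad(); actor_loss.backward(); opt_a.step()

    # Polyak averaging of the target networks
    for p, p_t in zip(list(actor.parameters()) + list(critic.parameters()),
                      list(actor_t.parameters()) + list(critic_t.parameters())):
        p_t.data.mul_(1 - tau).add_(tau * p.data)
```

In a full training loop, transitions (s, a, r, s′, done) collected by running the actor with exploration noise in the simulator would be stored in a replay buffer and sampled into batch; opt_a and opt_c could be, for example, Adam optimizers over the actor and critic parameters.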
Methods
2. PRM construction (roadmap creation)
Algorithm 1: connect two nodes using PRM-RL
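The paper's Algorithm 1 adds an edge between two sampled points only if the RL agent itself can reach one from the other in simulation. The sketch below captures that idea under stated assumptions: the trial count, success threshold, step limit, reset_to helper, and all-pairs loop are illustrative (the paper's algorithm may differ in which node pairs are attempted and how success is counted).

```python
def try_connect(env, policy, start, goal, num_trials=20,
                success_threshold=0.9, max_steps=500):
    # Monte Carlo check: does the RL agent reliably reach `goal` from `start`?
    successes = 0
    for _ in range(num_trials):
        obs = env.reset_to(start, goal)        # assumed helper: place robot at start, set goal
        for _ in range(max_steps):
            obs, _, done = env.step(policy(obs))
            if done:                           # agent entered the goal region
                successes += 1
                break
    return successes / num_trials >= success_threshold

def build_roadmap(env, policy, samples):
    # Connect sampled nodes using the RL agent as the local planner.
    # (A real implementation would only attempt nearby node pairs.)
    edges = []
    for i, a in enumerate(samples):
        for j, b in enumerate(samples):
            if i < j and try_connect(env, policy, a, b):
                edges.append((i, j))
    return edges
```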
Methods
3. PRM querying (roadmap querying)
Generate long-range trajectories
• We query the roadmap, which returns a list of waypoints to a higher-level planner.
• The higher-level planner then invokes the RL agent to produce a trajectory to the next
waypoint.
• When the robot is within the waypoint's goal range, the higher-level planner switches
the goal to the next waypoint in the list.
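A hedged sketch of this execution loop: a graph query returns waypoints, and the RL policy is re-targeted at each waypoint in turn. The helper names (shortest_path, observe, apply, position) are illustrative assumptions, not an actual API.

```python
import math

def within_range(position, waypoint, goal_radius=0.5):
    # True when the robot is inside the waypoint's goal range.
    return math.dist(position, waypoint) <= goal_radius

def execute_query(robot, policy, roadmap, start, goal, max_steps=10_000):
    waypoints = roadmap.shortest_path(start, goal)   # higher-level PRM query
    k = 0
    for _ in range(max_steps):
        obs = robot.observe(goal=waypoints[k])       # current waypoint fed to the RL agent
        robot.apply(policy(obs))                     # low-level RL control step
        if within_range(robot.position(), waypoints[k]):
            k += 1                                   # switch the goal to the next waypoint
            if k == len(waypoints):
                return True                          # final goal reached
    return False                                     # step budget exhausted
```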
Results
Indoor navigation
1. Roadmap construction evaluation
2. Expected trajectory characteristics
3. Actual trajectory characteristics
4. Physical robot experiments
→ Each roadmap is evaluated on 100 randomly generated queries from C-free.
1. Roadmap construction evaluation
• The higher sampling density produces larger maps and more successful
queries.
• The number of nodes in the map does not depend on the local planner, but
the number of edges and collision checks do.
• Roadmaps built with the RL local planner are more densely connected, with between 15%
and 50% more edges.
• The RL agent can go around corners and small obstacles.
Results
Results
2. Expected trajectory characteristics
• The RL agent does not require the robot to come to rest in the goal region, so the
robot carries some inertia when the waypoint is switched. This causes some of the
failures.
• The PRM-RL paths contain more waypoints, except in Building 3.
• Expected trajectory length and duration are longer for the RL agent.
Results
3. Actual trajectory characteristics
• We look at the query characteristics for successful versus unsuccessful
queries.
• The RL agent produces a higher success rate than PRM-SL (PRM with the straight-line
local planner).
• The successful trajectories have fewer waypoints than expected, which means that
shorter queries are more likely to succeed.
4. Physical robot experiments
• To transfer our approach to a real robot, we created a simple slalom-like
environment with four obstacles.
Results
PRM-RL YouTube video
• https://bit.ly/34zCTmd
Thank You!
Any Questions?
