SlideShare ist ein Scribd-Unternehmen logo
1 von 20
Downloaden Sie, um offline zu lesen
A Deeper Look at Experience Replay
(17.12)
Seungjae Ryan Lee
Online Learning
• Learn directly from experience
• Highly correlated data
New transition t
Experience Replay
• Save transitions (𝑆𝑡, 𝐴 𝑡, 𝑅𝑡+1, 𝑆𝑡+1) into buffer and sample batch 𝐵
• Use batch 𝐵 to train the agent
(S, A, R, S)
(S, A, R, S)
(S, A, R, S)
Replay Buffer D
Transition t Batch B
Effectiveness of Experience Replay
• Only method that can generate uncorrelated data for online RL
• Except using multiple workers (A3C)
• Significantly improves data efficiency
• Norm in many deep RL algorithms
• Deep Q-Networks (DQN)
• Deep Deterministic Policy Gradient (DDPG)
• Hindsight Experience Replay (HER)
Problem with Experience Replay
• There has been default capacity of 106 used for:
• Different algorithms (DQN, PG, etc.)
• Different environments (retro games, continuous control, etc.)
• Different neural network architectures
Result 1
Replay buffer capacity can have significant negative impact on
performance if too low or too high.
Combined Experience Replay (CER)
• Save transitions (𝑆𝑡, 𝐴 𝑡, 𝑅𝑡+1, 𝑆𝑡+1) into buffer and sample batch 𝐵
• Use batch 𝐵 to and online transition 𝑡 to train the agent
(S, A, R, S)
(S, A, R, S)
(S, A, R, S)
Replay Buffer D
Batch B
Transition t
Combined Experience Replay (CER)
Result 2
CER can remedy the negative influence of a large replay buffer with
𝑂 1 computation.
CER vs. Prioritized Experience Replay (PER)
• Prioritized Experience Replay (PER)
• Stochastic replay method
• Designed to replay the buffer more efficiently
• Always expected to improve performance
• 𝑂(𝑁 log 𝑁)
• Combined Experience Replay (CER)
• Guaranteed to use newest transition
• Designed to remedy negative influence of a large replay buffer
• Does not improve performance for good replay buffer sizes
• 𝑂(1)
Test agents
1. Online-Q
• Q-learning with online transitions 𝑡
2. Buffer-Q
• Q-learning with the replay buffer 𝐵
3. Combined-Q
• Q-learning with both the replay buffer 𝐵 and online transitions 𝑡
Testbed Environments
• 3 environments for 3 methods
• Tabular, Linear and Nonlinear approximations
• Introduce “timeout” to all tasks
• Episode ends automatically after 𝑇 timesteps (large enough for each task)
• Prevent episode being arbitrarily long
• Used partial-episode-bootstrap (PEB) to minimize negative side-effects
Testbed: Gridworld
• Represent tabular methods
• Agent starts in 𝑆 and has a goal state 𝐺
• Agent can move left, right, up, down
• Reward is -1 until goal is reached
• If the agent bumps into the wall (black),
it remains in the same position
Gridworld Results (Tabular)
• Online-Q solves task very slowly
• Buffer-Q shows worse performance / speed for larger buffers
• Combined-Q shows slightly faster speed for larger buffers
Gridworld Results (Nonlinear)
• Online-Q fails to learn
• Combined-Q significantly speeds up learning
Testbed: Lunar Lander
• Represent linear approximation methods (with tile coding)
• Agent tries to land a shuttle on
the moon
• State space: 𝑅8
• 4 discrete actions
Lunar Lander Results (Linear)
• Buffer-Q shows worse learning speed for larger buffers
• Combined-Q is robust for varying buffer size
Lunar Lander Results (Nonlinear)
• Online-Q achieves best performance
• Combined-Q shows marginal improvement to Buffer-Q
• Buffer-Q and Combined-Q overfits after some time
Testbed: Pong
• Represent nonlinear approximation methods
• RAM states used instead of raw
pixels
• More accurate state representation
• State space: 0, … , 255 128
• 6 discrete actions
Pong Results (Nonlinear)
• All 3 agents fail to learn with a simple 1-hidden-layer network
• CER does not improve performance or speed
Limitations of Experience Replay
• Important transitions have delayed effects
• Partially mitigated with PER, but has a cost of 𝑂(𝑁 log 𝑁)
• Partially mitigated with correct buffer size or CER
• Both are workarounds, not solutions
• Experience Replay itself is flawed
• Focus should be on replacing experience replay
Thank you!
Original Paper: https://arxiv.org/abs/1712.01275
Paper Recommendations:
• Prioritized Experience Replay
• Hindsight Experience Replay
• Asynchronous Methods for Deep Reinforcement Learning
You can find more content in www.endtoend.ai/slides

Weitere ähnliche Inhalte

KĂźrzlich hochgeladen

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

KĂźrzlich hochgeladen (20)

Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 

Empfohlen

How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental Health
ThinkNow
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
Kurio // The Social Media Age(ncy)
 

Empfohlen (20)

2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot
 
Everything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTEverything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPT
 
Product Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsProduct Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage Engineerings
 
How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental Health
 
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
 
Skeleton Culture Code
Skeleton Culture CodeSkeleton Culture Code
Skeleton Culture Code
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search Intent
 
How to have difficult conversations
How to have difficult conversations How to have difficult conversations
How to have difficult conversations
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best Practices
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project management
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
 

[1712.01275] A Deeper Look at Experience Replay

  • 1. A Deeper Look at Experience Replay (17.12) Seungjae Ryan Lee
  • 2. Online Learning • Learn directly from experience • Highly correlated data New transition t
  • 3. Experience Replay • Save transitions (𝑆𝑡, 𝐴 𝑡, 𝑅𝑡+1, 𝑆𝑡+1) into buffer and sample batch 𝐵 • Use batch 𝐵 to train the agent (S, A, R, S) (S, A, R, S) (S, A, R, S) Replay Buffer D Transition t Batch B
  • 4. Effectiveness of Experience Replay • Only method that can generate uncorrelated data for online RL • Except using multiple workers (A3C) • Significantly improves data efficiency • Norm in many deep RL algorithms • Deep Q-Networks (DQN) • Deep Deterministic Policy Gradient (DDPG) • Hindsight Experience Replay (HER)
  • 5. Problem with Experience Replay • There has been default capacity of 106 used for: • Different algorithms (DQN, PG, etc.) • Different environments (retro games, continuous control, etc.) • Different neural network architectures Result 1 Replay buffer capacity can have significant negative impact on performance if too low or too high.
  • 6. Combined Experience Replay (CER) • Save transitions (𝑆𝑡, 𝐴 𝑡, 𝑅𝑡+1, 𝑆𝑡+1) into buffer and sample batch 𝐵 • Use batch 𝐵 to and online transition 𝑡 to train the agent (S, A, R, S) (S, A, R, S) (S, A, R, S) Replay Buffer D Batch B Transition t
  • 7. Combined Experience Replay (CER) Result 2 CER can remedy the negative influence of a large replay buffer with 𝑂 1 computation.
  • 8. CER vs. Prioritized Experience Replay (PER) • Prioritized Experience Replay (PER) • Stochastic replay method • Designed to replay the buffer more efficiently • Always expected to improve performance • 𝑂(𝑁 log 𝑁) • Combined Experience Replay (CER) • Guaranteed to use newest transition • Designed to remedy negative influence of a large replay buffer • Does not improve performance for good replay buffer sizes • 𝑂(1)
  • 9. Test agents 1. Online-Q • Q-learning with online transitions 𝑡 2. Buffer-Q • Q-learning with the replay buffer 𝐵 3. Combined-Q • Q-learning with both the replay buffer 𝐵 and online transitions 𝑡
  • 10. Testbed Environments • 3 environments for 3 methods • Tabular, Linear and Nonlinear approximations • Introduce “timeout” to all tasks • Episode ends automatically after 𝑇 timesteps (large enough for each task) • Prevent episode being arbitrarily long • Used partial-episode-bootstrap (PEB) to minimize negative side-effects
  • 11. Testbed: Gridworld • Represent tabular methods • Agent starts in 𝑆 and has a goal state 𝐺 • Agent can move left, right, up, down • Reward is -1 until goal is reached • If the agent bumps into the wall (black), it remains in the same position
  • 12. Gridworld Results (Tabular) • Online-Q solves task very slowly • Buffer-Q shows worse performance / speed for larger buffers • Combined-Q shows slightly faster speed for larger buffers
  • 13. Gridworld Results (Nonlinear) • Online-Q fails to learn • Combined-Q significantly speeds up learning
  • 14. Testbed: Lunar Lander • Represent linear approximation methods (with tile coding) • Agent tries to land a shuttle on the moon • State space: 𝑅8 • 4 discrete actions
  • 15. Lunar Lander Results (Linear) • Buffer-Q shows worse learning speed for larger buffers • Combined-Q is robust for varying buffer size
  • 16. Lunar Lander Results (Nonlinear) • Online-Q achieves best performance • Combined-Q shows marginal improvement to Buffer-Q • Buffer-Q and Combined-Q overfits after some time
  • 17. Testbed: Pong • Represent nonlinear approximation methods • RAM states used instead of raw pixels • More accurate state representation • State space: 0, … , 255 128 • 6 discrete actions
  • 18. Pong Results (Nonlinear) • All 3 agents fail to learn with a simple 1-hidden-layer network • CER does not improve performance or speed
  • 19. Limitations of Experience Replay • Important transitions have delayed effects • Partially mitigated with PER, but has a cost of 𝑂(𝑁 log 𝑁) • Partially mitigated with correct buffer size or CER • Both are workarounds, not solutions • Experience Replay itself is flawed • Focus should be on replacing experience replay
  • 20. Thank you! Original Paper: https://arxiv.org/abs/1712.01275 Paper Recommendations: • Prioritized Experience Replay • Hindsight Experience Replay • Asynchronous Methods for Deep Reinforcement Learning You can find more content in www.endtoend.ai/slides