[1312.5602] Playing Atari with Deep Reinforcement Learning

•

1 gefällt mir•908 views

Presentation slides for 'Playing Atari with Deep Reinforcement Learning' by Mnih et al. You can find more presentation slides in my website: https://www.endtoend.ai We present the first deep learning model to successfully learn control policies directly from high-dimensional sensory input using reinforcement learning. The model is a convolutional neural network, trained with a variant of Q-learning, whose input is raw pixels and whose output is a value function estimating future rewards. We apply our method to seven Atari 2600 games from the Arcade Learning Environment, with no adjustment of the architecture or learning algorithm. We find that it outperforms all previous approaches on six of the games and surpasses a human expert on three of them.

Technologie

Playing Atari with Deep Reinforcement
Learning (13.12)
Seungjae Ryan Lee

Previous Applications of RL
• Linear value functions or policy representations
• Rely on hand-crafted features
• Feature representation determines performance
• Can diverge with model-free RL, nonlinear approximation, off-policy

TD-gammon
• Superhuman-level Backgammon playing RL agent
• Model-free algorithm with one-hidden-layer MLP
• Considered a “special case”
https://users.auth.gr/kehagiat/Research/GameTheory/12CombBiblio/BackGammon.html

Meanwhile, in Deep Learning…
• CNN, MLP, RNN
• Extract high-level features from raw data
• Works in both supervised and unsupervised learning
• Will it work on Reinforcement Learning?
http://fromdata.org/2015/10/01/imagenet-cnn-architecture-image/

Problems of combining DL and RL
• No labelled data: only sparse, noisy, delayed rewards
• Breaks most underlying assumptions of deep learning algorithms
• Highly correlated data
• Moving distribution of data

Proposal: Experience Replay
• Idea originally by Long-Ji Lin
• Save transitions in a replay memory 𝐷
• Randomly sample previous transitions to update parameters
https://pdfs.semanticscholar.org/54c4/cf3a8168c1b70f91cf78a3dc98b671935492.pdf

Advantages of Experience Replay
1. Achieve greater data efficiency
• Each experience is used multiple times
2. Break correlation between samples
• Randomly sampled experience is nonconsecutive
3. Average the behavior distribution
• Current parameters does not affect incoming data distribution

Preprocessing
• Originally 210 × 160 image with 128 color palette
• Gray-scale and downsampled to 110 × 84
• Cropped to 84 × 84
• Only to fit particular GPU implementation
• Frame stacking
• Stack 4 preprocessed frames as input
• Need multiple frames for velocity, etc.

Model Architecture: Deep Q-Networks (DQN)
• Return 𝑄 values for all 10 actions
https://nthu-datalab.github.io/ml/labs/17_DQN_Policy_Network/17-DQN_&_Policy_Network.html

Environment: Atari 2600
• 7 games selected
• Create a single agent that can play as many games possible
• No game-specific information
• No hand-designed features
• Same network architecture and hyperparameters
https://github.com/mgbellemare/Arcade-Learning-Environment

Reward Clipping
• Clip all rewards to {−1, 0, 1}
• Fix all positive reward as +1
• Fix all negative rewards as −1
• Leave zero rewards unchanged
• Limits scale of error derivatives for different games
• Cannot differentiate between rewards of different magnitude

Frame Skipping
• Agent sees and selects actions on every 𝑘 𝑡ℎ
frame
• Action is repeated on skipped frames
• Agent can play roughly 𝑘 times more games
https://danieltakeshi.github.io/2016/11/25/frame-skipping-and-preprocessing-for-deep-q-networks-on-atari-2600-games/

Training and Stability
• How to evaluate agent during training?
• Total episodic reward: very noisy
• Policy’s estimated 𝑄 function: smooth
• Did not encounter any divergence issue in practice

Result on Atari 2600
• Outperform all previous RL algorithms in 6 games
• Surpass expert human player in 3 games

Thank you!
Original Paper: https://arxiv.org/abs/1312.5602
Paper Recommendations:
• [1502] Human-level control through Deep Reinforcement Learning
• [1511.05952] Prioritized Experience Replay
• [1712.01275] A Deeper Look at Experience Replay
You can find more content in www.endtoend.ai/slides

Empfohlen

Intro to Deep Reinforcement LearningKhaled Saleh

What is Deep Learning | Deep Learning Simplified | Deep Learning Tutorial | E...Edureka!

An introduction to reinforcement learningSubrat Panda, PhD

Deep neural networksSi Haem

An introduction to deep reinforcement learningBig Data Colombia

Recurrent Neural Networks (RNN) | RNN LSTM | Deep Learning Tutorial | Tensorf...Edureka!

What Is Deep Learning? | Introduction to Deep Learning | Deep Learning Tutori...Simplilearn

Reinforcement Learning Q-Learning Melaku Eneayehu

Empfohlen

Intro to Deep Reinforcement LearningKhaled Saleh

What is Deep Learning | Deep Learning Simplified | Deep Learning Tutorial | E...Edureka!

An introduction to reinforcement learningSubrat Panda, PhD

Deep neural networksSi Haem

An introduction to deep reinforcement learningBig Data Colombia

Recurrent Neural Networks (RNN) | RNN LSTM | Deep Learning Tutorial | Tensorf...Edureka!

What Is Deep Learning? | Introduction to Deep Learning | Deep Learning Tutori...Simplilearn

Reinforcement Learning Q-Learning Melaku Eneayehu

Image-to-Image Translation with Conditional Adversarial Nets (UPC Reading Group)Universitat Politècnica de Catalunya

Reinforcement LearningSalem-Kabbani

Deep Learning With Python | Deep Learning And Neural Networks | Deep Learning...Simplilearn

Machine Learning - Convolutional Neural NetworkRichard Kuo

Deep Reinforcement Learning: Q-LearningKai-Wen Zhao

Deep Reinforcement Learning and Its ApplicationsBill Liu

Introduction to Deep LearningOswald Campesato

Deep Reinforcement LearningUsman Qayyum

Generative adversarial networks남주 김

Generative Adversarial Networks (GANs)Luis Serrano

Multi-armed BanditsDongmin Lee

An introduction to reinforcement learningJie-Han Chen

Deep Learning - CNN and RNNAshray Bhandare

Deep Q-LearningNikolay Pavlov

Deep Belief NetworksHasan H Topcu

Vanishing & Exploding GradientsSiddharth Vij

A brief overview of Reinforcement Learning applied to gamesThomas da Silva Paula

Deep learning lecture - part 1 (basics, CNN)SungminYou

Playing Atari with Deep Reinforcement LearningYoonho Shin

Deep Learning in Computer VisionSungjoon Choi

Using Apache Spark to Predict Installer Retention from Messy Clickstream Data...Databricks

Startup.Ml: Using neon for NLP and Localization Applications Intel Nervana

Weitere ähnliche Inhalte

Was ist angesagt?

Image-to-Image Translation with Conditional Adversarial Nets (UPC Reading Group)Universitat Politècnica de Catalunya

Reinforcement LearningSalem-Kabbani

Deep Learning With Python | Deep Learning And Neural Networks | Deep Learning...Simplilearn

Machine Learning - Convolutional Neural NetworkRichard Kuo

Deep Reinforcement Learning: Q-LearningKai-Wen Zhao

Deep Reinforcement Learning and Its ApplicationsBill Liu

Introduction to Deep LearningOswald Campesato

Deep Reinforcement LearningUsman Qayyum

Generative adversarial networks남주 김

Generative Adversarial Networks (GANs)Luis Serrano

Multi-armed BanditsDongmin Lee

An introduction to reinforcement learningJie-Han Chen

Deep Learning - CNN and RNNAshray Bhandare

Deep Q-LearningNikolay Pavlov

Deep Belief NetworksHasan H Topcu

Vanishing & Exploding GradientsSiddharth Vij

A brief overview of Reinforcement Learning applied to gamesThomas da Silva Paula

Deep learning lecture - part 1 (basics, CNN)SungminYou

Playing Atari with Deep Reinforcement LearningYoonho Shin

Deep Learning in Computer VisionSungjoon Choi

Was ist angesagt? (20)

Image-to-Image Translation with Conditional Adversarial Nets (UPC Reading Group)

Reinforcement Learning

Deep Learning With Python | Deep Learning And Neural Networks | Deep Learning...

Machine Learning - Convolutional Neural Network

Deep Reinforcement Learning: Q-Learning

Deep Reinforcement Learning and Its Applications

Introduction to Deep Learning

Deep Reinforcement Learning

Generative adversarial networks

Generative Adversarial Networks (GANs)

Multi-armed Bandits

An introduction to reinforcement learning

Deep Learning - CNN and RNN

Deep Q-Learning

Deep Belief Networks

Vanishing & Exploding Gradients

A brief overview of Reinforcement Learning applied to games

Deep learning lecture - part 1 (basics, CNN)

Playing Atari with Deep Reinforcement Learning

Deep Learning in Computer Vision

Ähnlich wie [1312.5602] Playing Atari with Deep Reinforcement Learning

Using Apache Spark to Predict Installer Retention from Messy Clickstream Data...Databricks

Startup.Ml: Using neon for NLP and Localization Applications Intel Nervana

Grokking Techtalk #37: Data intensive problemGrokking VN

Шлигін Олександр “Розробка ігор в Unity загальні помилки” GameDev Conference ...Lviv Startup Club

ARISEkaptoxic

Deep Learning at ScaleMateusz Dymczyk

20 x Tips to better Optimize your Flash contentElad Elrom

GDC 2010 - A Dynamic Component Architecture for High Performance GameplayTerrance Cohen

GDC 2010 - A Dynamic Component Architecture for High Performance Gameplay - M...Terrance Cohen

3.2 Streaming and Messaging振东刘

CDN algosHikmat Abdoollayev

Single Page Applications with AngularJS 2.0 Sumanth Chinthagunta

Sista: Improving Cog’s JIT performanceESUG

AI powered emotion recognition: From Inception to Production - Global AI Conf...Vandana Kannan

AI powered emotion recognition: From Inception to Production - Global AI Conf...Apache MXNet

Hpc lunch and learnJohn D Almon

Realtime traffic analyserAlex Moskvin

John adams talk cloudyJohn Adams

Productionizing Deep Reinforcement Learning with Spark and MLflowDatabricks

Our Puppet Story (GUUG FFG 2015)DECK36

Ähnlich wie [1312.5602] Playing Atari with Deep Reinforcement Learning (20)

Using Apache Spark to Predict Installer Retention from Messy Clickstream Data...

Startup.Ml: Using neon for NLP and Localization Applications

Grokking Techtalk #37: Data intensive problem

Шлигін Олександр “Розробка ігор в Unity загальні помилки” GameDev Conference ...

ARISE

Deep Learning at Scale

20 x Tips to better Optimize your Flash content

GDC 2010 - A Dynamic Component Architecture for High Performance Gameplay

GDC 2010 - A Dynamic Component Architecture for High Performance Gameplay - M...

3.2 Streaming and Messaging

CDN algos

Single Page Applications with AngularJS 2.0

Sista: Improving Cog’s JIT performance

AI powered emotion recognition: From Inception to Production - Global AI Conf...

Hpc lunch and learn

Realtime traffic analyser

John adams talk cloudy

Productionizing Deep Reinforcement Learning with Spark and MLflow

Our Puppet Story (GUUG FFG 2015)

Mehr von Seung Jae Lee

[1807] Learning Montezuma's Revenge from a Single DemonstrationSeung Jae Lee

[1808.00177] Learning Dexterous In-Hand ManipulationSeung Jae Lee

Reinforcement Learning 8: Planning and Learning with Tabular MethodsSeung Jae Lee

Reinforcement Learning 7. n-step BootstrappingSeung Jae Lee

Reinforcement Learning 6. Temporal Difference LearningSeung Jae Lee

Reinforcement Learning 5. Monte Carlo MethodsSeung Jae Lee

Reinforcement Learning 10. On-policy Control with ApproximationSeung Jae Lee

Reinforcement Learning 4. Dynamic ProgrammingSeung Jae Lee

Reinforcement Learning 3. Finite Markov Decision ProcessesSeung Jae Lee

Reinforcement Learning 2. Multi-armed BanditsSeung Jae Lee

Reinforcement Learning 1. IntroductionSeung Jae Lee

GitHub으로 웹페이지 만들기Seung Jae Lee

Mehr von Seung Jae Lee (12)

[1807] Learning Montezuma's Revenge from a Single Demonstration

[1808.00177] Learning Dexterous In-Hand Manipulation

Reinforcement Learning 8: Planning and Learning with Tabular Methods

Reinforcement Learning 7. n-step Bootstrapping

Reinforcement Learning 6. Temporal Difference Learning

Reinforcement Learning 5. Monte Carlo Methods

Reinforcement Learning 10. On-policy Control with Approximation

Reinforcement Learning 4. Dynamic Programming

Reinforcement Learning 3. Finite Markov Decision Processes

Reinforcement Learning 2. Multi-armed Bandits

Reinforcement Learning 1. Introduction

GitHub으로 웹페이지 만들기

Kürzlich hochgeladen

Scaling API-first – The story of a global engineering organizationRadu Cotescu

Presentation on how to chat with PDF using ChatGPT code interpreternaman860154

08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls

How to convert PDF to text with Nanonetsnaman860154

Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC

CNv6 Instructor Chapter 6 Quality of Servicegiselly40

Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j

Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung

A Domino Admins Adventures (Engage 2024)Gabriella Davis

08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls

WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal

Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays

Boost PC performance: How more available memory can improve productivityPrincipled Technologies

Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer

08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls

Histor y of HAM Radio presentation slidevu2urc

Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun

The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad

[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745

Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko

Kürzlich hochgeladen (20)

Scaling API-first – The story of a global engineering organization

Presentation on how to chat with PDF using ChatGPT code interpreter

08448380779 Call Girls In Greater Kailash - I Women Seeking Men

How to convert PDF to text with Nanonets

Breaking the Kubernetes Kill Chain: Host Path Mount

CNv6 Instructor Chapter 6 Quality of Service

Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...

Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...

A Domino Admins Adventures (Engage 2024)

08448380779 Call Girls In Friends Colony Women Seeking Men

WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service

Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...

Boost PC performance: How more available memory can improve productivity

Tata AIG General Insurance Company - Insurer Innovation Award 2024

08448380779 Call Girls In Diplomatic Enclave Women Seeking Men

Histor y of HAM Radio presentation slide

Data Cloud, More than a CDP by Matt Robison

The Codex of Business Writing Software for Real-World Solutions 2.pptx

[2024]Digital Global Overview Report 2024 Meltwater.pdf

Handwritten Text Recognition for manuscripts and early printed texts

[1312.5602] Playing Atari with Deep Reinforcement Learning

1. Playing Atari with Deep Reinforcement Learning (13.12) Seungjae Ryan Lee

2. Previous Applications of RL • Linear value functions or policy representations • Rely on hand-crafted features • Feature representation determines performance • Can diverge with model-free RL, nonlinear approximation, off-policy

3. TD-gammon • Superhuman-level Backgammon playing RL agent • Model-free algorithm with one-hidden-layer MLP • Considered a “special case” https://users.auth.gr/kehagiat/Research/GameTheory/12CombBiblio/BackGammon.html

4. Meanwhile, in Deep Learning… • CNN, MLP, RNN • Extract high-level features from raw data • Works in both supervised and unsupervised learning • Will it work on Reinforcement Learning? http://fromdata.org/2015/10/01/imagenet-cnn-architecture-image/

5. Problems of combining DL and RL • No labelled data: only sparse, noisy, delayed rewards • Breaks most underlying assumptions of deep learning algorithms • Highly correlated data • Moving distribution of data

6. Proposal: Experience Replay • Idea originally by Long-Ji Lin • Save transitions in a replay memory 𝐷 • Randomly sample previous transitions to update parameters https://pdfs.semanticscholar.org/54c4/cf3a8168c1b70f91cf78a3dc98b671935492.pdf

7. Advantages of Experience Replay 1. Achieve greater data efficiency • Each experience is used multiple times 2. Break correlation between samples • Randomly sampled experience is nonconsecutive 3. Average the behavior distribution • Current parameters does not affect incoming data distribution

8. Preprocessing • Originally 210 × 160 image with 128 color palette • Gray-scale and downsampled to 110 × 84 • Cropped to 84 × 84 • Only to fit particular GPU implementation • Frame stacking • Stack 4 preprocessed frames as input • Need multiple frames for velocity, etc.

9. Model Architecture: Deep Q-Networks (DQN) • Return 𝑄 values for all 10 actions https://nthu-datalab.github.io/ml/labs/17_DQN_Policy_Network/17-DQN_&_Policy_Network.html

10. Environment: Atari 2600 • 7 games selected • Create a single agent that can play as many games possible • No game-specific information • No hand-designed features • Same network architecture and hyperparameters https://github.com/mgbellemare/Arcade-Learning-Environment

11. Reward Clipping • Clip all rewards to {−1, 0, 1} • Fix all positive reward as +1 • Fix all negative rewards as −1 • Leave zero rewards unchanged • Limits scale of error derivatives for different games • Cannot differentiate between rewards of different magnitude

12. Frame Skipping • Agent sees and selects actions on every 𝑘 𝑡ℎ frame • Action is repeated on skipped frames • Agent can play roughly 𝑘 times more games https://danieltakeshi.github.io/2016/11/25/frame-skipping-and-preprocessing-for-deep-q-networks-on-atari-2600-games/

13. Training and Stability • How to evaluate agent during training? • Total episodic reward: very noisy • Policy’s estimated 𝑄 function: smooth • Did not encounter any divergence issue in practice

14. Result on Atari 2600 • Outperform all previous RL algorithms in 6 games • Surpass expert human player in 3 games

15. Thank you! Original Paper: https://arxiv.org/abs/1312.5602 Paper Recommendations: • [1502] Human-level control through Deep Reinforcement Learning • [1511.05952] Prioritized Experience Replay • [1712.01275] A Deeper Look at Experience Replay You can find more content in www.endtoend.ai/slides