Deep Q-Learning


Reinforcement learning is an area of machine learning inspired by behaviorist psychology, concerned with how software agents ought to take actions in an environment so as to maximize some notion of cumulative reward.


  1. Deep Q-Learning: A Reinforcement Learning approach
  2. What is Reinforcement Learning? - Much like how biological agents behave - No supervisor, only a reward signal - Data is time-dependent (non-i.i.d.) - Feedback is delayed - The agent's actions affect the data it receives
  3. Examples - Play checkers (1959) - Defeat the world champion at Backgammon (1992) - Control a helicopter (2008) - Make a robot walk - RoboCup Soccer - Play ATARI games better than humans (2014) - Defeat the world champion at Go (2016) - Videos
  4. Reward Hypothesis: all goals can be described by the maximisation of expected cumulative reward - Defeat the world champion at Go: +R / -R for winning / losing a game - Make a robot walk: +R for moving forward, -R for falling over - Play ATARI games: +R / -R for increasing / decreasing the score - Control a helicopter: +R / -R for following the trajectory / crashing
  5. Agent and Environment
  6. Fully vs. Partially Observable Environments - Fully observable (agent state = environment state): the agent directly observes the environment, e.g. a chess board - Partially observable (agent state ≠ environment state): the agent indirectly observes the environment, e.g. a robot with a motion sensor or a camera; the agent must construct its own state representation
  7. RL components: Policy and Value Function - Policy is the agent's behaviour function - Maps from state to action - Can be deterministic or stochastic - Value function is a prediction of future reward - Used to evaluate states and select between actions (formulas reconstructed below)
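The policy and value-function formulas on this slide were images and did not survive the export; the definitions they most likely showed, in the standard Sutton & Barto / David Silver notation, are:

    a = \pi(s)                                                        (deterministic policy)
    \pi(a \mid s) = \mathbb{P}[A_t = a \mid S_t = s]                  (stochastic policy)
    v_\pi(s) = \mathbb{E}_\pi[R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \dots \mid S_t = s]   (value function)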
  8. Model - Predicts what the environment will do next (see the reconstruction below)
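The model formulas were likewise images; in the same notation, a model has a state-transition part and a reward part:

    \mathcal{P}^a_{ss'} = \mathbb{P}[S_{t+1} = s' \mid S_t = s, A_t = a]
    \mathcal{R}^a_s = \mathbb{E}[R_{t+1} \mid S_t = s, A_t = a]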
  9. Maze example: r = -1 per time-step, and policy [David Silver. Advanced Topics: RL]
  10. Maze example: Value function and Model [David Silver. Advanced Topics: RL]
  11. Exploration-Exploitation dilemma
  12. Math: Markov Decision Process (MDP) - Almost all RL problems can be formalised as MDPs - An MDP is a tuple (S, A, P, R, γ): - S is a finite set of states - A is a finite set of actions - P is the state transition probability matrix - R is a reward function - γ is a discount factor (written out below)
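Spelled out (a reconstruction in standard notation, since the slide's formulas are missing): the tuple is \langle \mathcal{S}, \mathcal{A}, \mathcal{P}, \mathcal{R}, \gamma \rangle with \gamma \in [0, 1], and the quantity being maximised is the discounted return

    G_t = R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \dots = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1}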
  13. State-Value and Action-Value functions, Bellman equations - State-value function: expected return starting from state s and then following policy π - Action-value function: expected return starting from state s, taking action a, and then following policy π (see the definitions below)
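The missing definitions and the Bellman expectation equations behind this slide are, in standard notation:

    v_\pi(s) = \mathbb{E}_\pi[G_t \mid S_t = s]
    q_\pi(s, a) = \mathbb{E}_\pi[G_t \mid S_t = s, A_t = a]
    v_\pi(s) = \mathbb{E}_\pi[R_{t+1} + \gamma\, v_\pi(S_{t+1}) \mid S_t = s]
    q_\pi(s, a) = \mathbb{E}_\pi[R_{t+1} + \gamma\, q_\pi(S_{t+1}, A_{t+1}) \mid S_t = s, A_t = a]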
  14. Finding an Optimal Policy - There is always an optimal policy for any MDP - All optimal policies achieve the optimal value function - All optimal policies achieve the optimal action-value function - All you need is to find the optimal action-value function q*(s, a)
  15. Bellman Optimality Equation for the state-value function [David Silver. Advanced Topics: RL]
  16. Bellman Optimality Equation for the action-value function [David Silver. Advanced Topics: RL]
  17. Bellman Optimality Equation for the state-value function [David Silver. Advanced Topics: RL]
  18. Bellman Optimality Equation for the action-value function [David Silver. Advanced Topics: RL]
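Slides 15-18 show these equations as figures from David Silver's lectures; written out, the Bellman optimality equations are:

    v_*(s) = \max_a \left( \mathcal{R}^a_s + \gamma \sum_{s'} \mathcal{P}^a_{ss'}\, v_*(s') \right)
    q_*(s, a) = \mathcal{R}^a_s + \gamma \sum_{s'} \mathcal{P}^a_{ss'} \max_{a'} q_*(s', a')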
  19. Policy Iteration Demo
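The demo itself is not part of the text export. Below is a minimal policy-iteration sketch on a tiny randomly generated MDP, meant only to illustrate the evaluate-then-improve loop; the MDP, its size and the discount factor are invented for illustration and are not the demo from the slide.

    # A minimal policy-iteration sketch on a tiny randomly generated MDP.
    import numpy as np

    n_states, n_actions, gamma = 4, 2, 0.9
    rng = np.random.default_rng(0)

    # P[a, s, s'] = transition probability, R[s, a] = expected immediate reward
    P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))
    R = rng.normal(size=(n_states, n_actions))

    policy = np.zeros(n_states, dtype=int)        # arbitrary initial deterministic policy
    while True:
        # Policy evaluation: solve v = R_pi + gamma * P_pi v as a linear system
        P_pi = P[policy, np.arange(n_states)]
        R_pi = R[np.arange(n_states), policy]
        v = np.linalg.solve(np.eye(n_states) - gamma * P_pi, R_pi)

        # Policy improvement: act greedily w.r.t. one-step lookahead q-values
        q = R.T + gamma * P @ v                    # q[a, s]
        new_policy = q.argmax(axis=0)
        if np.array_equal(new_policy, policy):     # policy stable => optimal
            break
        policy = new_policy

    print("optimal policy:", policy)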
  20. Q-Learning - a model-free, off-policy control algorithm - Model-free (vs model-based): the MDP model is unknown but experience can be sampled, or the MDP model is known but too big to use except by samples - Off-policy (vs on-policy): can learn about a policy from experience sampled from some other policy - Control (vs prediction): find the best policy
  21. Q-Learning [David Silver. Advanced Topics: RL]
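For reference, a minimal tabular implementation consistent with the Q-learning update Q(s,a) ← Q(s,a) + α (r + γ max_a' Q(s',a') - Q(s,a)) might look as follows. It assumes a Gymnasium-style environment with discrete observations and actions (e.g. FrozenLake-v1); that environment choice is an assumption for illustration, not something used in the slides.

    # A minimal tabular Q-learning sketch (assumes a Gymnasium-style env
    # with discrete observation and action spaces).
    import numpy as np

    def q_learning(env, episodes=5000, alpha=0.1, gamma=0.99, epsilon=0.1):
        Q = np.zeros((env.observation_space.n, env.action_space.n))
        for _ in range(episodes):
            s, _ = env.reset()
            done = False
            while not done:
                # epsilon-greedy behaviour policy: explore with probability epsilon
                a = env.action_space.sample() if np.random.rand() < epsilon else int(Q[s].argmax())
                s2, r, terminated, truncated, _ = env.step(a)
                done = terminated or truncated
                # off-policy target: bootstrap from the greedy action in the next state
                target = r if terminated else r + gamma * Q[s2].max()
                Q[s, a] += alpha * (target - Q[s, a])
                s = s2
        return Q

    # Usage (hypothetical): Q = q_learning(gymnasium.make("FrozenLake-v1"))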
  22. DQN - Q-Learning with function approximation [Human-level control through deep reinforcement learning]
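Function approximation replaces the Q-table with a network Q(s, a; θ). The per-iteration loss minimised in the Nature paper is, up to notation:

    L_i(\theta_i) = \mathbb{E}_{(s,a,r,s') \sim U(D)} \left[ \left( r + \gamma \max_{a'} Q(s', a'; \theta_i^-) - Q(s, a; \theta_i) \right)^2 \right]

where D is the replay memory and \theta_i^- are the parameters of the frozen target network.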
  23. [Human-level control through deep reinforcement learning]
  24. Issues with Q-learning with a neural network - Data is sequential (non-i.i.d.) - Policy changes rapidly with slight changes to Q-values - Policy may oscillate - Experience flows from one extreme to another - Scale of rewards and Q-values is unknown - Unstable backpropagation due to large gradients
  25. DQN solutions - Use experience replay: breaks correlations in the data, and lets the agent learn from all past policies (possible because Q-learning is off-policy) - Freeze the target Q-network: avoids policy oscillations and breaks correlations between the Q-network and the target - Clip rewards and gradients (a sketch of these tricks follows)
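A compact sketch of how these tricks fit together, using PyTorch. The network size, optimiser settings and the flat-vector observation interface are assumptions made for illustration; this is not the simple_dqn / Neon code linked at the end.

    # Experience replay + frozen target network + reward clipping, in PyTorch.
    import random
    from collections import deque
    import numpy as np
    import torch
    import torch.nn as nn

    class ReplayBuffer:
        """Experience replay: store transitions, sample them uniformly later."""
        def __init__(self, capacity=100_000):
            self.buf = deque(maxlen=capacity)
        def push(self, s, a, r, s2, done):
            self.buf.append((s, a, r, s2, done))
        def sample(self, batch_size):
            s, a, r, s2, d = zip(*random.sample(self.buf, batch_size))
            return (torch.as_tensor(np.array(s), dtype=torch.float32),
                    torch.as_tensor(a, dtype=torch.int64),
                    torch.as_tensor(r, dtype=torch.float32),
                    torch.as_tensor(np.array(s2), dtype=torch.float32),
                    torch.as_tensor(d, dtype=torch.float32))

    def make_q_net(obs_dim, n_actions):
        return nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))

    obs_dim, n_actions, gamma = 4, 2, 0.99
    q_net = make_q_net(obs_dim, n_actions)
    target_net = make_q_net(obs_dim, n_actions)
    target_net.load_state_dict(q_net.state_dict())      # frozen copy of the Q-network
    optimizer = torch.optim.RMSprop(q_net.parameters(), lr=2.5e-4)
    buffer = ReplayBuffer()

    def train_step(batch_size=32):
        s, a, r, s2, done = buffer.sample(batch_size)    # replay breaks correlations in the data
        r = r.clamp(-1.0, 1.0)                           # reward clipping
        with torch.no_grad():                            # bootstrap from the frozen target network
            target = r + gamma * (1.0 - done) * target_net(s2).max(dim=1).values
        q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
        loss = nn.functional.smooth_l1_loss(q, target)   # Huber loss keeps gradients bounded
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # Every N training steps, refresh the frozen copy:
    #   target_net.load_state_dict(q_net.state_dict())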
  26. Neon Demo
  27. Links - Human-level control through deep reinforcement learning - Course: David Silver. Advanced Topics: RL - Tutorial: David Silver. Deep Reinforcement Learning - Book: Sutton, Barto. Reinforcement Learning - Source code: simple_dqn - Reinforcejs - The Arcade Learning Environment
