SlideShare ist ein Scribd-Unternehmen logo
1 von 27
Downloaden Sie, um offline zu lesen
Deep Q-Learning
A Reinforcement Learning approach
What is Reinforcement Learning?
- Much like biological agents behave
- No supervisor, only a reward
- Data is time dependent (non iid)
- Feedback is delayed
- Agent actions affect the data it receives
Examples
- Play checkers (1959)
- Defeat the world champion at Backgammon (1992)
- Control a helicopter (2008)
- Make a robot to walk
- Robocup Soccer
- Play ATARI games better than humans (2014)
- Defeat the world champion at Go (2016)
Videos
Reward Hypothesis
All goals can be described by the maximisation of expected cumulative reward
- Defeat the world champion at Go: +R / -R for winning/losing a game
- Make a robot to walk: +R for forward, -R for falling over
- Play ATARI games: +R / -R for increasing/decreasing score
- Control a helicopter: + R / -R following trajectory / crashing
Agent and Environment
Fully Observable Environments
Fully Observable Environments (agent state = environment state):
- Agent directly observes environment
- Example: chess board
Partially Observable Environments (agent state not equal environment state):
- Agent indirectly observes environment
- Example: A robot with motion sensor or camera
- Agent must construct its own state representation
RL components: Policy and Value Function
Policy is agent’s behaviour function
- Maps from state to action
- Deterministic policy:
- Stochastic:
Value function is a is a prediction of future reward
- Used to evaluate state and select between actions
-
Model
Predicts what environment will do next:
Maze example: r = -1 per time-step and policy
[David Silver. Advanced Topics: RL]
Maze example: Value function and Model
[David Silver. Advanced Topics: RL]
Exploration - Exploitation dilemma
Math: Markov Decision Process (MDP)
Almost all RL problems can be formalised as MDPs
It’s a tuple:
- S is finite set of states
- A is finite set of actions
- P is state transition probability matrix:
- R is a reward function:
- Discount factor:
State-Value and Action-Value functions, Bellman eq.
Expected return starting from state s, and then following policy :
Expected return starting from state s, taking action a, and then following policy :
Finding an Optimal Policy
- There is always optimal policy for any MPD
- All optimal policies achieve the optimal value function
- All optimal policies achieve the optimal action-value function
All you need is to find
Bellman Opt Equation for state-value function
[David Silver. Advanced Topics: RL]
Bellman Opt Equation for action-value function
[David Silver. Advanced Topics: RL]
Bellman Opt Equation for state-value function
[David Silver. Advanced Topics: RL]
Bellman Opt Equation for action-value function
[David Silver. Advanced Topics: RL]
Policy Iteration Demo
Q-Learning - model-free off-policy control algorithm
Model-free (vs Model-based):
- MDP model is unknown, but experience can be sampled MDP
- Model is known, but is too big to use, except by samples
Off-policy (vs On-policy):
- Can learn about policy from experience sampled from some other policy
Control (vs Prediction):
- Find best policy
Q-Learning
[David Silver. Advanced Topics: RL]
DQN - Q-Learning with function approximation
[Human-level control through deep reinforcement learning]
[Human-level control through deep reinforcement learning]
Issues with Q-learning with neural network
- Data is sequential (non-iid)
- Policy changes rapidly with slight changes to Q-values
- Policy may oscillate
- Experience flows from one extreme to another
- Scale of rewards and Q-values is unknown
- Unstable backpropagation due to large gradients
DQN solutions
- Use experience replay
- Breaks correlations in data
- Learn from all past policies
- Using off-policy Q-learning
- Freeze target Q-network
- Avoid policy oscillations
- Break correlations between Q-network and target
- Clip rewards and gradients
Neon Demo
Links
- Human-level control through deep reinforcement learning
- Course: David Silver. Advanced Topics: RL
- Tutorial: David Silver. Deep Reinforcement Learning
- Book: Sutton, Barto. Reinforcement learning
- Source Code: simple_dqn
- Reinforcejs
- The Arcade Learning Environment

Weitere ähnliche Inhalte

Was ist angesagt?

Reinforcement Learning 3. Finite Markov Decision Processes
Reinforcement Learning 3. Finite Markov Decision ProcessesReinforcement Learning 3. Finite Markov Decision Processes
Reinforcement Learning 3. Finite Markov Decision ProcessesSeung Jae Lee
 
Reinforcement learning
Reinforcement learningReinforcement learning
Reinforcement learningDing Li
 
Q-learning and Deep Q Network (Reinforcement Learning)
Q-learning and Deep Q Network (Reinforcement Learning)Q-learning and Deep Q Network (Reinforcement Learning)
Q-learning and Deep Q Network (Reinforcement Learning)Thom Lane
 
Reinforcement Learning 4. Dynamic Programming
Reinforcement Learning 4. Dynamic ProgrammingReinforcement Learning 4. Dynamic Programming
Reinforcement Learning 4. Dynamic ProgrammingSeung Jae Lee
 
Lecture 9 Markov decision process
Lecture 9 Markov decision processLecture 9 Markov decision process
Lecture 9 Markov decision processVARUN KUMAR
 
Reinforcement learning
Reinforcement  learningReinforcement  learning
Reinforcement learningSKS
 
Reinforcement Learning 5. Monte Carlo Methods
Reinforcement Learning 5. Monte Carlo MethodsReinforcement Learning 5. Monte Carlo Methods
Reinforcement Learning 5. Monte Carlo MethodsSeung Jae Lee
 
Deep Reinforcement Learning and Its Applications
Deep Reinforcement Learning and Its ApplicationsDeep Reinforcement Learning and Its Applications
Deep Reinforcement Learning and Its ApplicationsBill Liu
 
Reinforcement Learning 2. Multi-armed Bandits
Reinforcement Learning 2. Multi-armed BanditsReinforcement Learning 2. Multi-armed Bandits
Reinforcement Learning 2. Multi-armed BanditsSeung Jae Lee
 
Reinforcement Learning : A Beginners Tutorial
Reinforcement Learning : A Beginners TutorialReinforcement Learning : A Beginners Tutorial
Reinforcement Learning : A Beginners TutorialOmar Enayet
 
Reinforcement Learning Tutorial | Edureka
Reinforcement Learning Tutorial | EdurekaReinforcement Learning Tutorial | Edureka
Reinforcement Learning Tutorial | EdurekaEdureka!
 
Reinforcement learning 7313
Reinforcement learning 7313Reinforcement learning 7313
Reinforcement learning 7313Slideshare
 
Multi-armed Bandits
Multi-armed BanditsMulti-armed Bandits
Multi-armed BanditsDongmin Lee
 
Reinforcement Learning
Reinforcement LearningReinforcement Learning
Reinforcement Learningbutest
 
DQN (Deep Q-Network)
DQN (Deep Q-Network)DQN (Deep Q-Network)
DQN (Deep Q-Network)Dong Guo
 
Markov decision process
Markov decision processMarkov decision process
Markov decision processHamed Abdi
 
Intro to Reinforcement learning - part III
Intro to Reinforcement learning - part IIIIntro to Reinforcement learning - part III
Intro to Reinforcement learning - part IIIMikko Mäkipää
 
Deep Reinforcement Learning: Q-Learning
Deep Reinforcement Learning: Q-LearningDeep Reinforcement Learning: Q-Learning
Deep Reinforcement Learning: Q-LearningKai-Wen Zhao
 

Was ist angesagt? (20)

Reinforcement Learning 3. Finite Markov Decision Processes
Reinforcement Learning 3. Finite Markov Decision ProcessesReinforcement Learning 3. Finite Markov Decision Processes
Reinforcement Learning 3. Finite Markov Decision Processes
 
Reinforcement learning
Reinforcement learningReinforcement learning
Reinforcement learning
 
Q-learning and Deep Q Network (Reinforcement Learning)
Q-learning and Deep Q Network (Reinforcement Learning)Q-learning and Deep Q Network (Reinforcement Learning)
Q-learning and Deep Q Network (Reinforcement Learning)
 
Reinforcement Learning 4. Dynamic Programming
Reinforcement Learning 4. Dynamic ProgrammingReinforcement Learning 4. Dynamic Programming
Reinforcement Learning 4. Dynamic Programming
 
Lecture 9 Markov decision process
Lecture 9 Markov decision processLecture 9 Markov decision process
Lecture 9 Markov decision process
 
Reinforcement learning
Reinforcement  learningReinforcement  learning
Reinforcement learning
 
Reinforcement Learning 5. Monte Carlo Methods
Reinforcement Learning 5. Monte Carlo MethodsReinforcement Learning 5. Monte Carlo Methods
Reinforcement Learning 5. Monte Carlo Methods
 
Deep Reinforcement Learning and Its Applications
Deep Reinforcement Learning and Its ApplicationsDeep Reinforcement Learning and Its Applications
Deep Reinforcement Learning and Its Applications
 
Reinforcement Learning 2. Multi-armed Bandits
Reinforcement Learning 2. Multi-armed BanditsReinforcement Learning 2. Multi-armed Bandits
Reinforcement Learning 2. Multi-armed Bandits
 
Reinforcement Learning : A Beginners Tutorial
Reinforcement Learning : A Beginners TutorialReinforcement Learning : A Beginners Tutorial
Reinforcement Learning : A Beginners Tutorial
 
Generalized Reinforcement Learning
Generalized Reinforcement LearningGeneralized Reinforcement Learning
Generalized Reinforcement Learning
 
Reinforcement Learning Tutorial | Edureka
Reinforcement Learning Tutorial | EdurekaReinforcement Learning Tutorial | Edureka
Reinforcement Learning Tutorial | Edureka
 
Reinforcement learning 7313
Reinforcement learning 7313Reinforcement learning 7313
Reinforcement learning 7313
 
Multi-armed Bandits
Multi-armed BanditsMulti-armed Bandits
Multi-armed Bandits
 
Reinforcement Learning
Reinforcement LearningReinforcement Learning
Reinforcement Learning
 
DQN (Deep Q-Network)
DQN (Deep Q-Network)DQN (Deep Q-Network)
DQN (Deep Q-Network)
 
Markov decision process
Markov decision processMarkov decision process
Markov decision process
 
Intro to Reinforcement learning - part III
Intro to Reinforcement learning - part IIIIntro to Reinforcement learning - part III
Intro to Reinforcement learning - part III
 
Deep Reinforcement Learning: Q-Learning
Deep Reinforcement Learning: Q-LearningDeep Reinforcement Learning: Q-Learning
Deep Reinforcement Learning: Q-Learning
 
Policy gradient
Policy gradientPolicy gradient
Policy gradient
 

Andere mochten auch

Distributed Deep Q-Learning
Distributed Deep Q-LearningDistributed Deep Q-Learning
Distributed Deep Q-LearningLyft
 
Deep Q-Network 論文輪読会
Deep Q-Network 論文輪読会Deep Q-Network 論文輪読会
Deep Q-Network 論文輪読会Kotaro Tanahashi
 
Reinforcement learning
Reinforcement learning Reinforcement learning
Reinforcement learning Chandra Meena
 
Human brain how it work
Human brain how it workHuman brain how it work
Human brain how it workhudvin
 
1118_Seminar_Continuous_Deep Q-Learning with Model based acceleration
1118_Seminar_Continuous_Deep Q-Learning with Model based acceleration1118_Seminar_Continuous_Deep Q-Learning with Model based acceleration
1118_Seminar_Continuous_Deep Q-Learning with Model based accelerationHye-min Ahn
 
Encoding Robotic Sensor States for Q-Learning using the
Encoding Robotic Sensor States for Q-Learning using the Encoding Robotic Sensor States for Q-Learning using the
Encoding Robotic Sensor States for Q-Learning using the butest
 
Face detection and recognition using OpenCV
Face detection and recognition using OpenCVFace detection and recognition using OpenCV
Face detection and recognition using OpenCVAndrew Babiy
 
Deep Q-Network for beginners
Deep Q-Network for beginnersDeep Q-Network for beginners
Deep Q-Network for beginnersEtsuji Nakai
 
Your first TensorFlow programming with Jupyter
Your first TensorFlow programming with JupyterYour first TensorFlow programming with Jupyter
Your first TensorFlow programming with JupyterEtsuji Nakai
 
"Playing Atari with Deep Reinforcement Learning"
"Playing Atari with Deep Reinforcement Learning""Playing Atari with Deep Reinforcement Learning"
"Playing Atari with Deep Reinforcement Learning"mooopan
 
強化学習入門
強化学習入門強化学習入門
強化学習入門Shunta Saito
 
最近のDQN
最近のDQN最近のDQN
最近のDQNmooopan
 
MachineLearning_QLearningCircuit
MachineLearning_QLearningCircuitMachineLearning_QLearningCircuit
MachineLearning_QLearningCircuitSean Williams
 
нейронные сети
нейронные сетинейронные сети
нейронные сетиhudvin
 
Основы коспьютерного стерео зрения
Основы коспьютерного стерео зренияОсновы коспьютерного стерео зрения
Основы коспьютерного стерео зренияArtyom Shklovets
 
Deep Reinforcement Learning Through Policy Optimization, John Schulman, OpenAI
Deep Reinforcement Learning Through Policy Optimization, John Schulman, OpenAIDeep Reinforcement Learning Through Policy Optimization, John Schulman, OpenAI
Deep Reinforcement Learning Through Policy Optimization, John Schulman, OpenAIJack Clark
 
Recognition of handwritten digits
Recognition of handwritten digitsRecognition of handwritten digits
Recognition of handwritten digitsAndrew Babiy
 
Details of Lazy Deep Learning for Images Recognition in ZZ Photo app
Details of Lazy Deep Learning for Images Recognition in ZZ Photo appDetails of Lazy Deep Learning for Images Recognition in ZZ Photo app
Details of Lazy Deep Learning for Images Recognition in ZZ Photo appPAY2 YOU
 

Andere mochten auch (20)

Distributed Deep Q-Learning
Distributed Deep Q-LearningDistributed Deep Q-Learning
Distributed Deep Q-Learning
 
Deep Q-Network 論文輪読会
Deep Q-Network 論文輪読会Deep Q-Network 論文輪読会
Deep Q-Network 論文輪読会
 
Reinforcement learning
Reinforcement learning Reinforcement learning
Reinforcement learning
 
Human brain how it work
Human brain how it workHuman brain how it work
Human brain how it work
 
1118_Seminar_Continuous_Deep Q-Learning with Model based acceleration
1118_Seminar_Continuous_Deep Q-Learning with Model based acceleration1118_Seminar_Continuous_Deep Q-Learning with Model based acceleration
1118_Seminar_Continuous_Deep Q-Learning with Model based acceleration
 
Encoding Robotic Sensor States for Q-Learning using the
Encoding Robotic Sensor States for Q-Learning using the Encoding Robotic Sensor States for Q-Learning using the
Encoding Robotic Sensor States for Q-Learning using the
 
Face detection and recognition using OpenCV
Face detection and recognition using OpenCVFace detection and recognition using OpenCV
Face detection and recognition using OpenCV
 
Deep Q-Network for beginners
Deep Q-Network for beginnersDeep Q-Network for beginners
Deep Q-Network for beginners
 
Your first TensorFlow programming with Jupyter
Your first TensorFlow programming with JupyterYour first TensorFlow programming with Jupyter
Your first TensorFlow programming with Jupyter
 
"Playing Atari with Deep Reinforcement Learning"
"Playing Atari with Deep Reinforcement Learning""Playing Atari with Deep Reinforcement Learning"
"Playing Atari with Deep Reinforcement Learning"
 
強化学習入門
強化学習入門強化学習入門
強化学習入門
 
最近のDQN
最近のDQN最近のDQN
最近のDQN
 
MachineLearning_QLearningCircuit
MachineLearning_QLearningCircuitMachineLearning_QLearningCircuit
MachineLearning_QLearningCircuit
 
нейронные сети
нейронные сетинейронные сети
нейронные сети
 
Основы коспьютерного стерео зрения
Основы коспьютерного стерео зренияОсновы коспьютерного стерео зрения
Основы коспьютерного стерео зрения
 
Deep Reinforcement Learning Through Policy Optimization, John Schulman, OpenAI
Deep Reinforcement Learning Through Policy Optimization, John Schulman, OpenAIDeep Reinforcement Learning Through Policy Optimization, John Schulman, OpenAI
Deep Reinforcement Learning Through Policy Optimization, John Schulman, OpenAI
 
SURF
SURFSURF
SURF
 
Recognition of handwritten digits
Recognition of handwritten digitsRecognition of handwritten digits
Recognition of handwritten digits
 
Details of Lazy Deep Learning for Images Recognition in ZZ Photo app
Details of Lazy Deep Learning for Images Recognition in ZZ Photo appDetails of Lazy Deep Learning for Images Recognition in ZZ Photo app
Details of Lazy Deep Learning for Images Recognition in ZZ Photo app
 
NLP Project Full Cycle
NLP Project Full CycleNLP Project Full Cycle
NLP Project Full Cycle
 

Ähnlich wie Deep Q-Learning

Reinfrocement Learning
Reinfrocement LearningReinfrocement Learning
Reinfrocement LearningNatan Katz
 
reinforcement-learning its based on the slide of university
reinforcement-learning its based on the slide of universityreinforcement-learning its based on the slide of university
reinforcement-learning its based on the slide of universityMOHDNADEEM971008
 
reinforcement-learning.ppt
reinforcement-learning.pptreinforcement-learning.ppt
reinforcement-learning.ppthemalathache
 
Intro to Reinforcement learning - part II
Intro to Reinforcement learning - part IIIntro to Reinforcement learning - part II
Intro to Reinforcement learning - part IIMikko Mäkipää
 
Head First Reinforcement Learning
Head First Reinforcement LearningHead First Reinforcement Learning
Head First Reinforcement Learningazzeddine chenine
 
Is Production RL at a tipping point?
Is Production RL at a tipping point?Is Production RL at a tipping point?
Is Production RL at a tipping point?M Waleed Kadous
 
Andrii Prysiazhnyk: Why the amazon sellers are buiyng the RTX 3080: Dynamic p...
Andrii Prysiazhnyk: Why the amazon sellers are buiyng the RTX 3080: Dynamic p...Andrii Prysiazhnyk: Why the amazon sellers are buiyng the RTX 3080: Dynamic p...
Andrii Prysiazhnyk: Why the amazon sellers are buiyng the RTX 3080: Dynamic p...Lviv Startup Club
 
14_ReinforcementLearning.pptx
14_ReinforcementLearning.pptx14_ReinforcementLearning.pptx
14_ReinforcementLearning.pptxRithikRaj25
 
Making smart decisions in real-time with Reinforcement Learning
Making smart decisions in real-time with Reinforcement LearningMaking smart decisions in real-time with Reinforcement Learning
Making smart decisions in real-time with Reinforcement LearningRuth Yakubu
 
Structured prediction with reinforcement learning
Structured prediction with reinforcement learningStructured prediction with reinforcement learning
Structured prediction with reinforcement learningguruprasad110
 
Reinforcement learning, Q-Learning
Reinforcement learning, Q-LearningReinforcement learning, Q-Learning
Reinforcement learning, Q-LearningKuppusamy P
 
Reinforcement Learning with Amazon SageMaker RL
Reinforcement Learning with Amazon SageMaker RLReinforcement Learning with Amazon SageMaker RL
Reinforcement Learning with Amazon SageMaker RLThom Lane
 
Reinforcement Learning on Mine Sweeper
Reinforcement Learning on Mine SweeperReinforcement Learning on Mine Sweeper
Reinforcement Learning on Mine SweeperDataScienceLab
 
Demystifying deep reinforement learning
Demystifying deep reinforement learningDemystifying deep reinforement learning
Demystifying deep reinforement learning재연 윤
 
How to formulate reinforcement learning in illustrative ways
How to formulate reinforcement learning in illustrative waysHow to formulate reinforcement learning in illustrative ways
How to formulate reinforcement learning in illustrative waysYasutoTamura1
 
Deep Reinforcement learning
Deep Reinforcement learningDeep Reinforcement learning
Deep Reinforcement learningCairo University
 
Introduction of Deep Reinforcement Learning
Introduction of Deep Reinforcement LearningIntroduction of Deep Reinforcement Learning
Introduction of Deep Reinforcement LearningNAVER Engineering
 

Ähnlich wie Deep Q-Learning (20)

Reinfrocement Learning
Reinfrocement LearningReinfrocement Learning
Reinfrocement Learning
 
reinforcement-learning its based on the slide of university
reinforcement-learning its based on the slide of universityreinforcement-learning its based on the slide of university
reinforcement-learning its based on the slide of university
 
reinforcement-learning.ppt
reinforcement-learning.pptreinforcement-learning.ppt
reinforcement-learning.ppt
 
Making Complex Decisions(Artificial Intelligence)
Making Complex Decisions(Artificial Intelligence)Making Complex Decisions(Artificial Intelligence)
Making Complex Decisions(Artificial Intelligence)
 
Intro to Reinforcement learning - part II
Intro to Reinforcement learning - part IIIntro to Reinforcement learning - part II
Intro to Reinforcement learning - part II
 
Head First Reinforcement Learning
Head First Reinforcement LearningHead First Reinforcement Learning
Head First Reinforcement Learning
 
Introduction to Deep Reinforcement Learning
Introduction to Deep Reinforcement LearningIntroduction to Deep Reinforcement Learning
Introduction to Deep Reinforcement Learning
 
RL intro
RL introRL intro
RL intro
 
Is Production RL at a tipping point?
Is Production RL at a tipping point?Is Production RL at a tipping point?
Is Production RL at a tipping point?
 
Andrii Prysiazhnyk: Why the amazon sellers are buiyng the RTX 3080: Dynamic p...
Andrii Prysiazhnyk: Why the amazon sellers are buiyng the RTX 3080: Dynamic p...Andrii Prysiazhnyk: Why the amazon sellers are buiyng the RTX 3080: Dynamic p...
Andrii Prysiazhnyk: Why the amazon sellers are buiyng the RTX 3080: Dynamic p...
 
14_ReinforcementLearning.pptx
14_ReinforcementLearning.pptx14_ReinforcementLearning.pptx
14_ReinforcementLearning.pptx
 
Making smart decisions in real-time with Reinforcement Learning
Making smart decisions in real-time with Reinforcement LearningMaking smart decisions in real-time with Reinforcement Learning
Making smart decisions in real-time with Reinforcement Learning
 
Structured prediction with reinforcement learning
Structured prediction with reinforcement learningStructured prediction with reinforcement learning
Structured prediction with reinforcement learning
 
Reinforcement learning, Q-Learning
Reinforcement learning, Q-LearningReinforcement learning, Q-Learning
Reinforcement learning, Q-Learning
 
Reinforcement Learning with Amazon SageMaker RL
Reinforcement Learning with Amazon SageMaker RLReinforcement Learning with Amazon SageMaker RL
Reinforcement Learning with Amazon SageMaker RL
 
Reinforcement Learning on Mine Sweeper
Reinforcement Learning on Mine SweeperReinforcement Learning on Mine Sweeper
Reinforcement Learning on Mine Sweeper
 
Demystifying deep reinforement learning
Demystifying deep reinforement learningDemystifying deep reinforement learning
Demystifying deep reinforement learning
 
How to formulate reinforcement learning in illustrative ways
How to formulate reinforcement learning in illustrative waysHow to formulate reinforcement learning in illustrative ways
How to formulate reinforcement learning in illustrative ways
 
Deep Reinforcement learning
Deep Reinforcement learningDeep Reinforcement learning
Deep Reinforcement learning
 
Introduction of Deep Reinforcement Learning
Introduction of Deep Reinforcement LearningIntroduction of Deep Reinforcement Learning
Introduction of Deep Reinforcement Learning
 

Kürzlich hochgeladen

Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024The Digital Insurer
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Principled Technologies
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 

Kürzlich hochgeladen (20)

Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 

Deep Q-Learning

  • 2. What is Reinforcement Learning? - Much like biological agents behave - No supervisor, only a reward - Data is time dependent (non iid) - Feedback is delayed - Agent actions affect the data it receives
  • 3. Examples - Play checkers (1959) - Defeat the world champion at Backgammon (1992) - Control a helicopter (2008) - Make a robot to walk - Robocup Soccer - Play ATARI games better than humans (2014) - Defeat the world champion at Go (2016) Videos
  • 4. Reward Hypothesis All goals can be described by the maximisation of expected cumulative reward - Defeat the world champion at Go: +R / -R for winning/losing a game - Make a robot to walk: +R for forward, -R for falling over - Play ATARI games: +R / -R for increasing/decreasing score - Control a helicopter: + R / -R following trajectory / crashing
  • 6. Fully Observable Environments Fully Observable Environments (agent state = environment state): - Agent directly observes environment - Example: chess board Partially Observable Environments (agent state not equal environment state): - Agent indirectly observes environment - Example: A robot with motion sensor or camera - Agent must construct its own state representation
  • 7. RL components: Policy and Value Function Policy is agent’s behaviour function - Maps from state to action - Deterministic policy: - Stochastic: Value function is a is a prediction of future reward - Used to evaluate state and select between actions -
  • 9. Maze example: r = -1 per time-step and policy [David Silver. Advanced Topics: RL]
  • 10. Maze example: Value function and Model [David Silver. Advanced Topics: RL]
  • 12. Math: Markov Decision Process (MDP) Almost all RL problems can be formalised as MDPs It’s a tuple: - S is finite set of states - A is finite set of actions - P is state transition probability matrix: - R is a reward function: - Discount factor:
  • 13. State-Value and Action-Value functions, Bellman eq. Expected return starting from state s, and then following policy : Expected return starting from state s, taking action a, and then following policy :
  • 14. Finding an Optimal Policy - There is always optimal policy for any MPD - All optimal policies achieve the optimal value function - All optimal policies achieve the optimal action-value function All you need is to find
  • 15. Bellman Opt Equation for state-value function [David Silver. Advanced Topics: RL]
  • 16. Bellman Opt Equation for action-value function [David Silver. Advanced Topics: RL]
  • 17. Bellman Opt Equation for state-value function [David Silver. Advanced Topics: RL]
  • 18. Bellman Opt Equation for action-value function [David Silver. Advanced Topics: RL]
  • 20. Q-Learning - model-free off-policy control algorithm Model-free (vs Model-based): - MDP model is unknown, but experience can be sampled MDP - Model is known, but is too big to use, except by samples Off-policy (vs On-policy): - Can learn about policy from experience sampled from some other policy Control (vs Prediction): - Find best policy
  • 22. DQN - Q-Learning with function approximation [Human-level control through deep reinforcement learning]
  • 23. [Human-level control through deep reinforcement learning]
  • 24. Issues with Q-learning with neural network - Data is sequential (non-iid) - Policy changes rapidly with slight changes to Q-values - Policy may oscillate - Experience flows from one extreme to another - Scale of rewards and Q-values is unknown - Unstable backpropagation due to large gradients
  • 25. DQN solutions - Use experience replay - Breaks correlations in data - Learn from all past policies - Using off-policy Q-learning - Freeze target Q-network - Avoid policy oscillations - Break correlations between Q-network and target - Clip rewards and gradients
  • 27. Links - Human-level control through deep reinforcement learning - Course: David Silver. Advanced Topics: RL - Tutorial: David Silver. Deep Reinforcement Learning - Book: Sutton, Barto. Reinforcement learning - Source Code: simple_dqn - Reinforcejs - The Arcade Learning Environment