Reinforcement Learning
By Usman Qayyum
13, Nov, 2018
Machine Learning Expert?
Supervised learning suffers from the underlying human bias present in the data
Machine Learning
• Supervised Learning (Example → Class): Classification, Regression
• Reinforcement Learning (Situation → Reward → Situation → Reward …): Q-learning, DQN, Policy Gradient, Actor-Critic
• Unsupervised Learning (Example): Clustering, Auto-Encoder
Human Learning (Trial & Error)
● Achieves goal ● Fails to achieve goal
Baby starts walking and successfully reaches the couch
Reinforcement Learning
● Trial & error learning
● Learning from interaction
● Learning what to do (how to map situations to actions) so as to maximize a numerical reward signal
How to Formulate an RL Problem
Environment: the physical world in which the agent operates
State: the current situation of the agent
Action: how the agent interacts with the environment
Reward: feedback from the environment
Policy: the method that maps the agent's state to actions
Value: the future reward that an agent would receive by taking an action in a particular state
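The terms above can be tied together in a short interaction loop. The sketch below is only an illustration, not code from the slides: `CorridorEnv`, its five-cell layout, and its reward values are hypothetical, and the policy is a fixed hand-written rule rather than a learned one.

```python
class CorridorEnv:
    """Hypothetical environment: a corridor of 5 cells, goal at the right end."""

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        """Start a new episode and return the initial state."""
        self.position = 0
        return self.position

    def step(self, action):
        """Apply an action (0 = left, 1 = right); return (state, reward, done)."""
        move = 1 if action == 1 else -1
        self.position = min(max(self.position + move, 0), self.length - 1)
        done = self.position == self.length - 1
        reward = 1.0 if done else -0.1  # feedback from the environment
        return self.position, reward, done


def always_right_policy(state):
    """Policy: maps the agent's state to an action (here, always move right)."""
    return 1


def run_episode(env, policy, max_steps=20):
    """Run one episode and return the total reward collected."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(state)
        state, reward, done = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward


episode_return = run_episode(CorridorEnv(), always_right_policy)
```

The value of a state under this policy is the total reward the loop goes on to collect from that state, which is the quantity an RL agent tries to maximize.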
RL Applications (Games/Networking)

Objective: Complete the game with the highest score
State: Raw pixel inputs of the game state
Action: Game controls, e.g. Left, Right, Up, Down
Reward: Score increase/decrease at each time step

Objective: Win the game!
State: Position of all pieces
Action: Where to put the next piece down
Reward: 1 if win at the end of the game, 0 otherwise

Objective: Intelligent channel selection
State: Occupation of each channel in the current time slot
Action: Set the channel to be used for the next time slot
Reward: +1 if there is no collision with an interferer, otherwise -1
Markov Decision Process 
• An MDP is used to describe an environment for reinforcement learning
• Almost all RL problems can be formalized as MDPs
The Markov property states that "the future is independent of the past given the present":
$P[S_{t+1} \mid S_t] = P[S_{t+1} \mid S_1, \ldots, S_t]$
[Figure: Markov chain, transition matrix, Markov reward]
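To make the Markov property concrete, here is a small sketch of sampling from a Markov chain defined by a transition matrix. The two-state weather chain and its probabilities are made up for illustration; the point is that `next_state` looks only at the current state, never at the earlier history.

```python
import random

# Hypothetical two-state weather chain; each row of probabilities sums to 1.
TRANSITIONS = {
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.4, "rainy": 0.6},
}


def next_state(current, rng):
    """Sample S_{t+1} using only the current state S_t (the Markov property)."""
    states = list(TRANSITIONS[current])
    weights = [TRANSITIONS[current][s] for s in states]
    return rng.choices(states, weights=weights, k=1)[0]


def sample_chain(start, steps, seed=0):
    """Generate a trajectory of the chain from a start state."""
    rng = random.Random(seed)
    trajectory = [start]
    for _ in range(steps):
        trajectory.append(next_state(trajectory[-1], rng))
    return trajectory


trajectory = sample_chain("sunny", steps=10)
```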
Model-Based / Model-Free Learning
Environment (Taxi Game)
Representations
Wall --> Can't pass through; the taxi remains in the same position
Yellow --> Taxi's current location
Blue --> Pick-up location
Purple --> Drop-off location
Green --> The taxi turns green once the passenger boards
Q Learning …
● A Q-table is just a fancy name for a simple lookup table in which we store the maximum expected future reward for each action at each state.
But the questions are:
How do we calculate the values of the Q-table?
Are the values available or predefined?
States = 500
Actions
0: move south
1: move north
2: move east
3: move west
4: pick up passenger
5: drop off passenger
Reward:
+20: successfully pick up a passenger and drop them off at the desired location
-1: for each step
-10: every time you incorrectly pick up or drop off a passenger
Q Learning …
Step 1: Initialize the Q-table. Before training starts, every Q-value is 0.
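A minimal sketch of this initialization for the Taxi problem follows. The breakdown of the 500 states (25 taxi positions, 5 passenger locations, 4 destinations) is an assumption based on the standard 5x5 Taxi grid and is not stated on the slide; the six actions are the ones listed earlier.

```python
# Assumed layout of the standard Taxi grid (not stated on the slide):
TAXI_POSITIONS = 5 * 5        # 5x5 grid
PASSENGER_LOCATIONS = 5       # 4 pick-up spots, or already in the taxi
DESTINATIONS = 4

N_STATES = TAXI_POSITIONS * PASSENGER_LOCATIONS * DESTINATIONS
N_ACTIONS = 6  # south, north, east, west, pickup, dropoff

# Q-table: one row per state, one column per action, all zeros at the start.
q_table = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
```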
Q Learning …
Step 2 & 3: Choose and perform an action
In the beginning, the agent explores the environment by choosing actions at random.
As it learns more about the environment, the agent increasingly exploits what it has learned.
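One common way to implement this shift from exploration to exploitation is an epsilon-greedy rule with a decaying epsilon. The slide does not name the rule, so treat the sketch below as an assumption; the decay schedule and its constants in particular are hypothetical.

```python
import random


def choose_action(q_row, epsilon, rng=random):
    """Epsilon-greedy: explore with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))  # explore: random action
    return max(range(len(q_row)), key=lambda a: q_row[a])  # exploit: best known


def decayed_epsilon(episode, start=1.0, end=0.01, decay=0.995):
    """Hypothetical schedule: epsilon shrinks geometrically, floored at `end`."""
    return max(end, start * decay ** episode)
```

With `epsilon=1.0` every action is random (pure exploration); with `epsilon=0.0` the agent always picks the action with the highest Q-value in the current row.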
Q Learning …
Step 4 & 5: Measure the reward and update the Q-table
The Q-function uses the Bellman equation and takes two inputs: state (s) and action (a).
The update is controlled by the learning rate and the discount factor (the weight given to future reward).
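In its standard form, the Q-learning update is Q(s, a) ← Q(s, a) + α [r + γ max over a' of Q(s', a') − Q(s, a)], where α is the learning rate and γ is the discount factor. The sketch below applies that update to a table stored as a list of lists; the default values of `alpha` and `gamma` are illustrative, not taken from the slides.

```python
def q_update(q_table, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one Q-learning update in place and return the new Q(s, a).

    alpha is the learning rate; gamma is the discount factor on future reward.
    """
    best_next = max(q_table[next_state])
    td_target = reward + gamma * best_next
    q_table[state][action] += alpha * (td_target - q_table[state][action])
    return q_table[state][action]


# Tiny worked example with 2 states and 2 actions.
q = [[0.0, 0.0], [1.0, 2.0]]
new_value = q_update(q, state=0, action=1, reward=-1.0, next_state=1)
```

In the worked example the target is -1 + 0.9 × 2 = 0.8, so Q(0, 1) moves one tenth of the way from 0 toward 0.8.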
Q-Learning to DQN
Google DeepMind (Deep Q-Network)
"Human-level control through deep reinforcement learning", Nature, 2015
Gym
A library that can simulate a large number of reinforcement learning environments, including Atari games. It addresses:
• The lack of standardization of environments used in publications
• The need for better benchmarks
Example: Taxi Game Problem (OpenAI Gym)
Example-1
Example-2
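The five Q-learning steps can be combined into one training loop. The sketch below deliberately does not use Gym or the Taxi environment: it runs on a hypothetical five-cell corridor with Taxi-like rewards (-1 per step, +20 at the goal) so that it needs only the standard library. The hyperparameters are illustrative choices.

```python
import random

N_STATES, GOAL = 5, 4          # hypothetical corridor: cells 0..4, goal at cell 4
ACTIONS = (0, 1)               # 0 = move left, 1 = move right


def step(state, action):
    """Toy dynamics: -1 per step, +20 on reaching the goal (Taxi-like rewards)."""
    next_state = min(max(state + (1 if action == 1 else -1), 0), GOAL)
    done = next_state == GOAL
    return next_state, (20.0 if done else -1.0), done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]   # Step 1: all zeros
    for _ in range(episodes):
        state, done, steps = 0, False, 0
        while not done and steps < 100:
            # Steps 2 & 3: choose (epsilon-greedy) and perform an action
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[state][a])
            next_state, reward, done = step(state, action)
            # Steps 4 & 5: measure the reward and update the Q-table
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state, steps = next_state, steps + 1
    return q


q_table = train()
greedy_policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(GOAL)]
```

After training, the greedy policy read off the table moves right in every non-terminal cell. The same loop structure applies to the Taxi problem, with the environment's own reset and step calls in place of the toy `step` function.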
Deep Q-Network
Human-level control through deep reinforcement learning – Nature Vol 518, Feb 26, 2015
By Usman Qayyum
15, Nov, 2018
Model-Free RL (Recap)
● Policy-based RL
○ Search directly for the optimal policy π*
○ This is the policy achieving maximum future reward
● Value-based RL
○ Estimate the optimal value function Q*(s,a)
○ This is the maximum value achievable under any policy
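The two views are connected: once a value-based method has an estimate of Q*(s,a), a policy follows by acting greedily, π(s) = argmax over a of Q(s,a). A minimal sketch, with made-up action values for two hypothetical states:

```python
def greedy_policy(q_values):
    """Derive a policy from action values: pi(s) = argmax_a Q(s, a)."""
    return {
        state: max(actions, key=actions.get)
        for state, actions in q_values.items()
    }


# Hypothetical action values for two states.
q_values = {
    "s0": {"left": 0.2, "right": 1.5},
    "s1": {"left": 0.7, "right": 0.1},
}
policy = greedy_policy(q_values)
```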
Q-Learning to DQN (Value-based RL)
A Q-table is like a "cheat sheet" that helps us find the maximum expected future reward of an action, given the current state.
• Good strategy; however, it is not scalable.
Playing Atari with Deep RL (DeepMind, 2013)
● Played seven Atari 2600 games
● Beat previous ML approaches on six
● Beat a human expert on three
● Aim: create a single neural network agent that is able to successfully learn to play as many of the games as possible.
● Learns strictly from experience, with no pre-training.
● Inputs: game screen + score.
● No game-specific tuning.
What’s Next
Atari
● Rules of the game are unknown
● Learn directly from interactive game play
● Pick an action on the joystick, see pixels and score
Preprocessing & Temporal limitation
Convolution Layer/Fully Connected
• Frames are processed by three convolution layers.
• These layers allow you to exploit spatial relationships in images.
• Because frames are stacked together, you can also exploit temporal relationships across those frames.
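A single frame cannot show which way an object is moving, which is why consecutive frames are stacked before being fed to the network. The sketch below shows one way to maintain such a stack. The `FrameStack` class is hypothetical, a stack size of 4 is assumed (the value used in the DQN papers), and strings stand in for preprocessed image arrays.

```python
from collections import deque


class FrameStack:
    """Keep the most recent `size` frames so the network can see motion."""

    def __init__(self, size=4):
        self.frames = deque(maxlen=size)

    def reset(self, first_frame):
        """At episode start, fill the stack with copies of the first frame."""
        self.frames.clear()
        for _ in range(self.frames.maxlen):
            self.frames.append(first_frame)
        return list(self.frames)

    def push(self, frame):
        """Add the newest frame; the oldest one is dropped automatically."""
        self.frames.append(frame)
        return list(self.frames)


stack = FrameStack(size=4)
state = stack.reset("frame0")          # strings stand in for image arrays
state = stack.push("frame1")
```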
Experience Replay
Experience replay helps us handle two things:
Avoid forgetting previous experiences: if the network is trained only on the most recent samples, its weights keep being overwritten by new experiences, because consecutive actions and states are highly correlated.
Solution: create a "replay buffer." It stores experience tuples while the agent interacts with the environment, and we then sample a small batch of tuples to feed the neural network.
Reduce correlations between experiences: every action affects the next state, so the agent produces a sequence of experience tuples that can be highly correlated.
Solution: by sampling from the replay buffer at random, we break this correlation. This prevents action values from oscillating or diverging catastrophically.
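A replay buffer of this kind can be sketched in a few lines. The `ReplayBuffer` class, its capacity, and the stream of dummy transitions below are illustrative assumptions rather than the implementation used in the paper.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity, seed=None):
        self.memory = deque(maxlen=capacity)  # oldest experiences fall out first
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        """Draw a uniformly random mini-batch, breaking temporal correlation."""
        return self.rng.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)


buffer = ReplayBuffer(capacity=100, seed=0)
for t in range(150):                       # hypothetical stream of transitions
    buffer.add(t, 0, 0.0, t + 1, False)
batch = buffer.sample(batch_size=8)
```

Because the buffer holds only the latest 100 transitions, the sampled batch mixes experiences from different points in time instead of eight consecutive steps.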
Clipping Rewards
Each game has a different score scale. For example, in Pong a player gets +1 point for winning a rally and -1 point for losing it, whereas in Space Invaders a player gets 10 to 30 points for defeating an invader. This difference would make training unstable.
The reward-clipping technique therefore sets all positive rewards to +1 and all negative rewards to -1.
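As a small sketch of this rule, the function below keeps only the sign of the reward. Leaving a reward of 0 unchanged is an assumption consistent with the description above; the sample scores are made up.

```python
def clip_reward(reward):
    """Map any positive reward to +1, any negative reward to -1, and keep 0."""
    if reward > 0:
        return 1.0
    if reward < 0:
        return -1.0
    return 0.0


clipped = [clip_reward(r) for r in (30, 10, 1, 0, -1)]
```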
DQN Algorithm
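A central step of DQN training is computing the regression target for each sampled transition: the reward alone if the episode ended, otherwise the reward plus the discounted maximum Q-value of the next state. The sketch below assumes that standard formulation. `q_next_values` is a hypothetical stand-in for the network's predictions, and the numbers are made up.

```python
def td_targets(batch, q_next_values, gamma=0.99):
    """Compute the target y for each transition in a mini-batch.

    batch: list of (state, action, reward, next_state, done) tuples.
    q_next_values: for each transition, the list of Q-values predicted
        for next_state (a stand-in for a real network forward pass).
    """
    targets = []
    for (_, _, reward, _, done), q_next in zip(batch, q_next_values):
        if done:
            targets.append(reward)                      # no future reward
        else:
            targets.append(reward + gamma * max(q_next))
    return targets


batch = [("s0", 1, 1.0, "s1", False), ("s1", 0, -1.0, "s2", True)]
targets = td_targets(batch, q_next_values=[[0.5, 2.0], [3.0, 4.0]], gamma=0.5)
```

The network is then trained to move its prediction Q(s, a) toward these targets, which is the neural-network counterpart of the tabular Q-table update.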
Performance
Recent graph from Google DeepMind, 2018 (current trend in RL gaming)
Naïve DQN vs. replay-buffer-based DQN
STRENGTHS AND WEAKNESSES
● Good at
‣ Quick-moving, complex, short-horizon games
‣ Semi-independent trials within the game
‣ Negative feedback on failure
● Bad at
‣ Long-horizon games that don't converge
‣ Any "walking around" game
‣ Montezuma's Revenge
Worldly knowledge helps humans play these games relatively easily.
Example Code
● DQN with Atari Game
○ Colab Jupyter notebooks
References
● Rich Sutton, Reinforcement Learning: An Introduction, 2017
● Deep Reinforcement Learning: An Overview, 2017, https://arxiv.org/pdf/1701.07274.pdf
● UCL course on Reinforcement Learning, http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching.html
● CS231n, Reinforcement Learning, Lecture 14, 2017, http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture14.pdf
● Thomas Simonini, Medium post "An introduction to Reinforcement Learning", https://medium.freecodecamp.org/an-introduction-to-reinforcement-learning-4339519de419
● Arthur Juliani, Medium post "Simple Reinforcement Learning in Tensorflow", https://medium.com/@awjuliani/super-simple-reinforcement-learning-tutorial-part-1-fd544fab149
