Hi
1
Deep Q-Network
Deep Reinforcement Learning
Azzeddine CHENINE
@azzeddineCH_
2
15 October 2019
3
Q-Learning
Quick Reminder
4
Quick Reminder
Learning by experience
5
Quick Reminder
Learning by experience
Episodic tasks and continuous tasks
6
Quick Reminder
Learning by experience
Episodic tasks and continuous tasks
An action yields a reward and results in a new state
7
Quick Reminder
Value-based methods try to reach an optimal policy by using a value function
8
Quick Reminder
Value-based methods try to reach an optimal policy by using a value function
The Q function of an optimal policy maximises the expected return G
9
Quick Reminder
Value-based methods try to reach an optimal policy by using a value function
The Q function of an optimal policy maximises the expected return G
Epsilon-greedy policies address the exploration-exploitation trade-off
10
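As a small illustration of the epsilon-greedy reminder above, here is a minimal action selector. It is a hypothetical helper (name and signature are mine, not from the talk): with probability epsilon the agent explores a random action, otherwise it exploits the current Q-value estimates.

```python
import random

import numpy as np


def epsilon_greedy_action(q_values: np.ndarray, epsilon: float) -> int:
    """Explore a random action with probability epsilon, otherwise exploit argmax Q."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))   # explore
    return int(np.argmax(q_values))              # exploit
```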
Limitation of Q-Learning
11
Limitation
Not practical for environments with a large or infinite state or action space
Games
Stock market
Robots
12
Modern problems require modern solutions
13
Deep Q-Networks
DeepMind
- 2015 -
14
No more given definitions…
15
No more given definitions…
Figuring it out by ourselves
16
1 - How to replace the Q-table?
Neural networks can be trained to fit a linear or a non-linear function.
17
1 - How to replace the Q-table?
Neural networks can be trained to fit a linear or a non-linear function.
Replace the Q-value function with a deep neural network
18
1 - How to replace the Q-table?
Neural networks can be trained to fit a linear or a non-linear function.
Replace the Q-value function with a deep neural network
The output layer holds the Q-value of each possible action ⟹ requires a discrete action space
19
1 - How to replace the Q-table?
Neural networks can be trained to fit a linear or a non-linear function.
Replace the Q-value function with a deep neural network
The output layer holds the Q-value of each possible action ⟹ requires a discrete action space
The input layer takes the desired representation of the state ⟹ a pipeline of convolutions, a dense layer, or a sequence of LSTM/RNN cells
20
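A minimal sketch of such a Q-network, assuming PyTorch and a flat state vector (the class name and layer sizes are illustrative, not the ones used in the talk). For pixel states, the dense layers would be preceded by the convolutional pipeline mentioned above.

```python
import torch
import torch.nn as nn


class QNetwork(nn.Module):
    """Maps a state vector to one Q-value per discrete action."""

    def __init__(self, state_size: int, action_size: int, hidden_size: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_size, hidden_size),    # input layer: state representation
            nn.ReLU(),
            nn.Linear(hidden_size, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, action_size),   # output layer: one Q-value per action
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)
```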
1 - How to replace the Q-table ✅
21
1 - How to replace the Q-table ✅
2 - How to optimize the model
How did we update the Q-table?
The agent took an action (a) in a state (S)
22
1 - How to replace the Q-table ✅
2 - How to optimize the model
How did we update the Q-table?
The agent took an action (a) in a state (S)
The agent moved to the next state (S’) and
received a reward (r)
23
1 - How to replace the Q-table ✅
2 - How to optimize the model
How did we update the Q-table?
The agent took an action (a) in a state (S)
The agent moved to the next state (S') and received a reward (r)
The discount rate 𝞭 reflects how much we care about the future reward:
𝞭 ≈ 1 ⟹ we care mostly about the future reward
𝞭 ≈ 0 ⟹ we care mostly about the immediate reward
24
1 - How to replace the Q-table ✅
2 - How to optimize the model
How did we update the Q-table?
Q'(S, a) ⟸ Q(S, a) + 𝛼 [ r(S, a, S') + 𝞭 max Q(S', a') - Q(S, a) ]
A good Q-value for an action (a) given a state (S) is a good approximation of the expected discounted cumulative reward.
⟹ Q(S, a) ≈ r(S, a, S') + 𝞭 max Q(S', a')
25
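The tabular update above fits in a few lines. This is a sketch assuming a NumPy Q-table of shape (n_states, n_actions), with gamma playing the role of the discount rate 𝞭 and alpha of the learning rate 𝛼; the function name is mine.

```python
import numpy as np


def q_learning_update(Q: np.ndarray, s: int, a: int, r: float, s_next: int,
                      alpha: float = 0.1, gamma: float = 0.99) -> None:
    """Q(S, a) <- Q(S, a) + alpha * [ r + gamma * max_a' Q(S', a') - Q(S, a) ]."""
    td_target = r + gamma * np.max(Q[s_next])   # r(S, a, S') + 𝞭 max Q(S', a')
    td_error = td_target - Q[s, a]
    Q[s, a] += alpha * td_error
```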
1 - How to replace the Q-table ✅
2 - How to optimize the model
How are we going to update the Q-network's weights?
[Diagram: State → Q-network → Q-values]
⟹ Q(S, a) ≈ r(S, a, S') + 𝞭 max Q(S', a')
26
1 - How to replace the Q-table ✅
2 - How to optimize the model
How are we going to update the Q-network's weights?
[Diagram: State → Q-network → Q-values → action selection → Action → Environment → New state S', Reward R]
⟹ Q(S, a) ≈ r(S, a, S') + 𝞭 max Q(S', a')
27
1 - How to replace the Q-table ✅
2 - How to optimize the model
How are we going to update the Q-network's weights?
[Diagram: as above; the new state S' and reward R are combined (⨁) into the target r(S, a, S') + 𝞭 max Q(S', a')]
⟹ Q(S, a) ≈ r(S, a, S') + 𝞭 max Q(S', a')
28
1 - How to replace the Q-table ✅
2 - How to optimize the model
How are we going to update the Q-network's weights?
[Diagram: as above; the target r(S, a, S') + 𝞭 max Q(S', a') is set against the network's prediction Q(S, a)]
⟹ Q(S, a) ≈ r(S, a, S') + 𝞭 max Q(S', a')
29
1 - How to replace the Q-table ✅
2 - How to optimize the model
How are we going to update the Q-network's weights?
[Diagram: as above; a loss between the prediction and the target is passed to the optimizer]
⟹ Q(S, a) ≈ r(S, a, S') + 𝞭 max Q(S', a')
30
1 - How to replace the Q-table ✅
2 - How to optimize the model
How are we going to update the Q-network's weights?
[Diagram: as above; the optimizer produces the new network weights]
⟹ Q(S, a) ≈ r(S, a, S') + 𝞭 max Q(S', a')
31
1 - How to replace the Q-table ✅
2 - How to optimize the model
How are we going to update the Q-network's weights?
⟹ Q(S, a) ≈ r(S, a, S') + 𝞭 max Q(S', a')
[Diagram: for n episodes - observe the state, update the weights, move to the new state, repeat]
32
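Written as code, one optimization step regresses the predicted Q(S, a) toward the target r(S, a, S') + 𝞭 max Q(S', a'). This is a minimal sketch assuming PyTorch and a batch of tensors (states, actions, rewards, next_states, dones) with actions as a LongTensor; note that it still builds the target with the same network it is training, which is exactly the "moving target" problem the next slides point out.

```python
import torch
import torch.nn.functional as F


def dqn_update(q_network, optimizer, batch, gamma: float = 0.99) -> float:
    """One gradient step of the loop sketched above (single network, no target network yet)."""
    states, actions, rewards, next_states, dones = batch

    # Q-values predicted for the actions that were actually taken
    q_pred = q_network(states).gather(1, actions.unsqueeze(1)).squeeze(1)

    # TD target built from the observed reward and the best next-state Q-value
    with torch.no_grad():
        q_next = q_network(next_states).max(dim=1).values
        td_target = rewards + gamma * q_next * (1 - dones)

    loss = F.mse_loss(q_pred, td_target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```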
We are chasing a moving target
33
[Slides 34-37: the prediction Q(S, a) keeps chasing the moving target r(S, a, S') + 𝞭 max Q(S', a')]
1 - How to replace the Q-table ✅
2 - How to optimize the model ✅
How are we going to update the Q-network's weights?
3 - Fixed Q-targets
We use a local network and a target network
38
1 - How to replace the Q-table ✅
2 - How to optimize the model ✅
How are we going to update the Q-network's weights?
3 - Fixed Q-targets
We use a local network and a target network
The target network gets slightly updated based on the local network's weights
39
1 - How to replace the Q-table ✅
2 - How to optimize the model ✅
How are we going to update the Q-network's weights?
3 - Fixed Q-targets
We use a local network and a target network
The target network gets slightly updated based on the local network's weights
The local network predicts the Q-values for the current state
40
1 - How to replace the Q-table ✅
2 - How to optimize the model ✅
How are we going to update the Q-network's weights?
3 - Fixed Q-targets
We use a local network and a target network
The target network gets slightly updated based on the local network's weights
The local network predicts the Q-values for the current state
The target network predicts the Q-values for the next state
41
1 - How to replace the Q-table ✅
2 - How to optimize the model ✅
How are we going to update the Q-network's weights?
3 - Fixed Q-targets
[Diagram: State → Local network → Q-values → action selection → Action → Environment → New state S', Reward R]
42
1 - How to replace the Q-table ✅
2 - How to optimize the model ✅
How are we going to update the Q-network's weights?
3 - Fixed Q-targets
[Diagram: the local network predicts Q(S, a) for the current state; the target network predicts the Q-values for the new state S', which together with the reward R form the target r(S, a, S') + 𝞭 max Q(S', a'); the loss and optimizer update the local network, producing its new weights]
43
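A sketch of the two-network idea, assuming PyTorch (the function names are mine, not from the slides): the target network starts as a copy of the local network, supplies the next-state Q-values for the TD target, and is only nudged slowly toward the local weights.

```python
import copy

import torch


def soft_update(local_net: torch.nn.Module, target_net: torch.nn.Module, tau: float = 1e-3) -> None:
    """Move each target-network parameter a small step (tau) toward the local network."""
    for target_param, local_param in zip(target_net.parameters(), local_net.parameters()):
        target_param.data.copy_(tau * local_param.data + (1.0 - tau) * target_param.data)


def td_targets(target_net, rewards, next_states, dones, gamma: float = 0.99) -> torch.Tensor:
    """r(S, a, S') + 𝞭 max Q_target(S', a'), computed with the slowly-updated target network."""
    with torch.no_grad():
        q_next = target_net(next_states).max(dim=1).values
        return rewards + gamma * q_next * (1 - dones)


# Hypothetical wiring: the target network begins as a frozen copy of the local one.
# target_net = copy.deepcopy(local_net)
# ... and after each learning step on the local network:
# soft_update(local_net, target_net, tau=1e-3)
```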
Can someone answer this question, please?
In supervised learning, why does shuffling the dataset give better accuracy?
44
Can someone answer this question, please?
In supervised learning, why does shuffling the dataset give better accuracy?
If the data is not shuffled, it may be sorted or similar data points will lie next to each other, which leads to slow convergence.
45
Can someone answer this question, please?
In supervised learning, why does shuffling the dataset give better accuracy?
If the data is not shuffled, it may be sorted or similar data points will lie next to each other, which leads to slow convergence.
In our case there is a strong correlation between consecutive states, which may lead to inefficient learning.
46
1 - How to replace the Q-table ✅
2 - How to optimize the model ✅
3 - Fixed Q-targets ✅
4 - Experience replay
How to break the correlation between states?
Store the experiences in a buffer
After a specified number of steps, pick N random experiences
Proceed with optimizing the loss on the sampled learning batch
47
1 - How to replace the Q-table ✅
2 - How to optimize the model ✅
3 - Fixed Q-targets ✅
4 - Experience replay ✅
How to break the correlation between states?
Store the experiences in a buffer
After a specified number of steps, pick N random experiences
Proceed with optimizing the loss on the sampled learning batch
48
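A minimal replay-buffer sketch (a hypothetical class using only Python's standard library): experiences are stored as they arrive, and learning batches are sampled uniformly at random, which breaks the correlation between consecutive states.

```python
import random
from collections import deque, namedtuple

Experience = namedtuple("Experience", ["state", "action", "reward", "next_state", "done"])


class ReplayBuffer:
    """Fixed-size buffer of experiences sampled uniformly at random."""

    def __init__(self, capacity: int = 100_000):
        self.memory = deque(maxlen=capacity)   # oldest experiences are dropped first

    def add(self, state, action, reward, next_state, done) -> None:
        self.memory.append(Experience(state, action, reward, next_state, done))

    def sample(self, batch_size: int = 64):
        return random.sample(self.memory, batch_size)

    def __len__(self) -> int:
        return len(self.memory)
```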
Build a DQL agent
Putting it all together
49
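Putting the four pieces together, a condensed training loop might look like the sketch below. It reuses the hypothetical helpers from the earlier snippets (QNetwork, epsilon_greedy_action, ReplayBuffer, soft_update) and assumes the classic Gym-style API where env.reset() returns a state and env.step(action) returns (next_state, reward, done, info); it is an illustration of the idea, not the author's implementation.

```python
import numpy as np
import torch
import torch.nn.functional as F


def train_dqn(env, local_net, target_net, optimizer, buffer,
              num_episodes=500, batch_size=64, gamma=0.99, tau=1e-3,
              eps_start=1.0, eps_end=0.01, eps_decay=0.995):
    epsilon = eps_start
    for _ in range(num_episodes):
        state, done = env.reset(), False
        while not done:
            # 1. Act epsilon-greedily with the local network
            with torch.no_grad():
                q_values = local_net(torch.as_tensor(state, dtype=torch.float32))
            action = epsilon_greedy_action(q_values.numpy(), epsilon)

            # 2. Step the environment and store the experience in the replay buffer
            next_state, reward, done, _ = env.step(action)
            buffer.add(state, action, reward, next_state, float(done))
            state = next_state

            # 3. Learn from a random batch once the buffer is warm
            if len(buffer) >= batch_size:
                experiences = buffer.sample(batch_size)
                states, actions, rewards, next_states, dones = (
                    torch.as_tensor(np.array(column), dtype=torch.float32)
                    for column in zip(*experiences)
                )
                q_pred = local_net(states).gather(1, actions.long().unsqueeze(1)).squeeze(1)
                with torch.no_grad():   # TD target from the slowly-updated target network
                    q_next = target_net(next_states).max(dim=1).values
                    td_target = rewards + gamma * q_next * (1 - dones)
                loss = F.mse_loss(q_pred, td_target)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
                soft_update(local_net, target_net, tau)   # keep the target network in sync

        epsilon = max(eps_end, epsilon * eps_decay)   # decay exploration between episodes
```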
Saha ftourkoum (enjoy your iftar!)