Deep Reinforcement Learning
using deep learning to play self-driving car games
Ben Lau
Ben Lau - Deep Learning and Reinforcement
MLConf 2017, New York City
What is Reinforcement Learning?
Three classes of learning
Supervised Learning
• Labeled data
• Direct feedback
Unsupervised Learning
• No labeled data
• No feedback
• “Find hidden structure”
Reinforcement Learning
• Uses reward as feedback
• Learns a series of actions
• Trial and error
RL: Agent and Environment
[Figure: agent-environment loop. The agent sends action $A_t$ to the environment; the environment returns observation $O_t$ and reward $R_t$.]
At each step $t$, the agent:
• receives observation $O_t$
• executes action $A_t$
• receives reward $R_t$
The environment:
• receives action $A_t$
• emits observation $O_{t+1}$
• emits reward $R_{t+1}$
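The interaction loop above can be sketched in a few lines. This is a minimal illustration, not code from the talk: `ToyEnvironment`, its reward values, and `random_agent` are hypothetical stand-ins chosen only to show who sends what at each step.

```python
import random


class ToyEnvironment:
    """Hypothetical 1-D environment (not from the talk): the agent moves
    left or right along a line and the episode ends at position +3."""

    def __init__(self):
        self.position = 0

    def reset(self):
        self.position = 0
        return self.position  # first observation

    def step(self, action):
        # Receive action A_t (-1 = left, +1 = right) and update the state.
        self.position += action
        done = self.position == 3
        reward = 1.0 if done else -0.1  # small cost per step, bonus at the goal
        # Emit observation O_{t+1} and reward R_{t+1}.
        return self.position, reward, done


def random_agent(observation):
    # Placeholder agent: ignores the observation and acts at random.
    return random.choice([-1, 1])


def run_episode(env, agent, max_steps=100):
    observation = env.reset()
    total_reward, steps = 0.0, 0
    for _ in range(max_steps):
        action = agent(observation)                   # agent executes A_t
        observation, reward, done = env.step(action)  # env sends O_{t+1}, R_{t+1}
        total_reward += reward
        steps += 1
        if done:
            break
    return total_reward, steps
```

An agent that always moves right reaches the goal in 3 steps with a total reward of about 0.8 (two step costs of -0.1 plus the bonus of 1.0).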
RL: State
Experience is a sequence of observations, actions, and rewards:
$$o_1, r_1, a_1, \ldots, o_{t-1}, r_{t-1}, a_{t-1}, o_t, r_t, a_t$$
The state is a summary of experience:
$$s_t = f(o_1, r_1, a_1, \ldots, o_{t-1}, r_{t-1}, a_{t-1}, o_t, r_t, a_t)$$
Note: the state is not always fully observable.
[Figure panels: Fully observable | Not fully observable]
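The slide leaves the summary function $f$ unspecified. One common choice, used for example by DQN on Atari games, is to keep only the last few observations ("frame stacking"). The sketch below assumes that choice; `FrameStackState` is a hypothetical helper and `k = 4` is an illustrative default.

```python
from collections import deque


class FrameStackState:
    """One possible choice of s_t = f(history): keep only the k most
    recent observations and use them together as the state."""

    def __init__(self, k=4):
        self.k = k
        self.frames = deque(maxlen=k)

    def reset(self, first_observation):
        # Pad the history with copies of the first observation.
        self.frames.clear()
        for _ in range(self.k):
            self.frames.append(first_observation)
        return tuple(self.frames)

    def update(self, observation):
        # The oldest observation drops out automatically (maxlen=k).
        self.frames.append(observation)
        return tuple(self.frames)
```

Stacking a few frames lets a single state carry information such as velocity that one observation alone cannot, which is one way to cope with partial observability.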
Approaches to Reinforcement Learning
Value-Based RL
• Estimate the optimal value function $Q^*(s, a)$
• This is the maximum value achievable under any policy
Policy-Based RL
• Search directly for the optimal policy $\pi^*$
• This is the policy achieving maximum future reward
Model-Based RL
• Build a model of the environment
• Plan (e.g. by lookahead) using the model
Deep Learning + RL → AI
[Figure: the game input and the reward feed a deep convolutional network whose outputs are steer, gas pedal, and brake]
Policies
A deterministic policy is the agent’s behavior:
• It is a map from state to action: $a_t = \pi(s_t)$
In reinforcement learning, the agent’s goal is to choose each action so that it maximizes the sum of discounted future rewards.
Choose $a_t$ to maximize
$$R_t = r_{t+1} + \gamma r_{t+2} + \gamma^2 r_{t+3} + \cdots$$
$\gamma \in [0, 1]$ is a discount factor: rewards further in the future are less certain, so they are weighted less.
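The discounted sum above can be computed directly from a list of future rewards. This sketch assumes the rewards are given as `[r_{t+1}, r_{t+2}, ...]`; `discounted_return` is a hypothetical helper name. It walks the list backwards using $R = r + \gamma R_{\text{next}}$, which gives the same result as summing $\gamma^k r_{t+1+k}$ term by term.

```python
def discounted_return(rewards, gamma):
    """rewards = [r_{t+1}, r_{t+2}, ...]; returns sum_k gamma**k * r_{t+1+k}."""
    total = 0.0
    # Accumulate from the last reward backwards: R = r + gamma * R_next.
    for r in reversed(rewards):
        total = r + gamma * total
    return total
```

With $\gamma = 0.5$ and three rewards of 1, the return is $1 + 0.5 + 0.25 = 1.75$; with $\gamma = 0$ only the immediate reward counts.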
State (s)       Action (a)
Obstacle        Brake
Corner          Left/Right
Straight line   Acceleration
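The example table is itself a deterministic policy: a lookup from state to action. In this sketch the state and action labels are hypothetical strings, and splitting "Corner" into left and right corners is an assumption made so that each state maps to exactly one action; a real driving agent would see sensor readings, not labels.

```python
# Hypothetical labels based on the example table above.
POLICY = {
    "obstacle": "brake",
    "left_corner": "steer_left",
    "right_corner": "steer_right",
    "straight_line": "accelerate",
}


def policy(state: str) -> str:
    """Deterministic policy a_t = pi(s_t): the same state always yields the same action."""
    return POLICY[state]
```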
Approaches to Reinforcement Learning
Value-Based RL
• Estimate the optimal value function $Q^*(s, a)$
• This is the maximum value achievable under any policy
Value Function
• A value function is a prediction of future reward
• How much reward will I get from action a in state s?
• A Q-value function gives the expected total reward
• from state-action pair $(s, a)$
• under policy $\pi$
• with discount factor $\gamma$
$$Q^{\pi}(s, a) = \mathbb{E}\left[r_{t+1} + \gamma r_{t+2} + \gamma^2 r_{t+3} + \cdots \mid s, a\right]$$
• An optimal value function is the maximum achievable value:
$$Q^*(s, a) = \max_{\pi} Q^{\pi}(s, a)$$
• Once we have $Q^*$, we can act optimally:
$$\pi^*(s) = \arg\max_{a} Q^*(s, a)$$
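Acting optimally from a known Q-function is just an argmax over actions. This sketch reuses the example values from the Q-table on the DQN slide of this deck and treats them as if they were $Q^*$, which the deck does not claim; `greedy_policy` is a hypothetical name.

```python
# Example values from the Q-table on the DQN slide, keyed by (state, action).
Q = {
    ("A", 1): 140.11, ("A", 2): 139.22,
    ("B", 1): 145.89, ("B", 2): 140.23,
    ("C", 1): 123.67, ("C", 2): 135.27,
}
ACTIONS = (1, 2)


def greedy_policy(state):
    """pi*(s) = argmax_a Q*(s, a), assuming Q already approximates Q*."""
    return max(ACTIONS, key=lambda a: Q[(state, a)])
```

With these numbers the greedy policy picks action 1 in states A and B and action 2 in state C.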
Understanding Q Function
• A helpful way to understand the Q-function is to think of it as a “strategy guide”
• Suppose you are playing a difficult game (DOOM)
• If you have a strategy guide, it’s pretty easy: just follow the guide
• Suppose you are in state s and need to make a decision. If you have this Q-function (the strategy guide), it is easy: just pick the action with the highest Q-value
[Figure: Doom strategy guide]
How to find Q-function
• Discounted future reward:
$$R_t = r_t + \gamma r_{t+1} + \gamma^2 r_{t+2} + \cdots + \gamma^{n-t} r_n$$
which can be written recursively as:
$$R_t = r_t + \gamma R_{t+1}$$
Recall the definition of the Q-function (the maximum discounted future reward if we choose action a in state s):
$$Q(s_t, a_t) = \max R_{t+1}$$
Therefore, we can rewrite the Q-function as:
$$Q(s, a) = r + \gamma \max_{a'} Q(s', a')$$
In plain English: the maximum future reward for $(s, a)$ is the immediate reward $r$ plus the discounted maximum future reward from the next state $s'$, taken over all next actions $a'$.
This equation can be solved by dynamic programming or by an iterative method.
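As a small dynamic-programming sketch, the update $Q(s, a) \leftarrow r + \gamma \max_{a'} Q(s', a')$ can be swept repeatedly over every state-action pair until it stops changing. The chain environment in `step` (four states, the last one terminal, reward 1 for reaching it) is a hypothetical example, not from the talk, and it assumes the dynamics are known, which the sampling-based methods in the rest of the deck do not need.

```python
def step(state, action, n_states=4):
    """Hypothetical deterministic chain: states 0..n-1, the last is terminal.
    Action 0 moves left (bounded at 0), action 1 moves right.
    Reward 1.0 for entering the terminal state, otherwise 0.0."""
    next_state = max(state - 1, 0) if action == 0 else state + 1
    reward = 1.0 if next_state == n_states - 1 else 0.0
    return next_state, reward


def q_value_iteration(n_states=4, n_actions=2, gamma=0.9, sweeps=100):
    terminal = n_states - 1
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(sweeps):
        new_Q = [[0.0] * n_actions for _ in range(n_states)]
        for s in range(n_states):
            if s == terminal:
                continue  # no future reward after the episode ends
            for a in range(n_actions):
                s_next, r = step(s, a, n_states)
                # Bellman backup: Q(s, a) = r + gamma * max_a' Q(s', a')
                new_Q[s][a] = r + gamma * max(Q[s_next])
        Q = new_Q
    return Q
```

The converged values are $Q(2, \text{right}) = 1$, $Q(1, \text{right}) = 0.9$, $Q(0, \text{right}) = 0.81$, so the greedy policy moves right in every non-terminal state.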
Deep Q-Network (DQN)
• The action-value function (Q-function) table is often very large
• DQN idea: use a neural network to compress this Q-table into the network’s weights $w$:
$$Q(s, a) \approx Q(s, a, w)$$
• Training then becomes finding a set of optimal weights $w$ instead
• In the literature this is often called “non-linear function approximation”
State   Action   Value
A       1        140.11
A       2        139.22
B       1        145.89
B       2        140.23
C       1        123.67
C       2        135.27
[Figure: the Q-table above ≈ a neural network $Q(s, a, w)$]
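The table needs one entry per (state, action) pair. A network instead maps a state feature vector to one Q-value per action using a fixed set of weights $w$. The sketch below shows only that forward pass $Q(s, \cdot, w)$; `QNetwork`, the single ReLU hidden layer, the layer sizes, and the Gaussian initialisation are illustrative assumptions, not the architecture used in the demo.

```python
import random


class QNetwork:
    """Tiny one-hidden-layer network returning Q(s, a, w) for every action a."""

    def __init__(self, n_features, n_hidden, n_actions, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.gauss(0, 0.1) for _ in range(n_features)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.gauss(0, 0.1) for _ in range(n_hidden)] for _ in range(n_actions)]
        self.b2 = [0.0] * n_actions

    def q_values(self, state):
        # Hidden layer with ReLU activation.
        hidden = [max(0.0, sum(w * x for w, x in zip(row, state)) + b)
                  for row, b in zip(self.w1, self.b1)]
        # Linear output layer: one Q-value per action.
        return [sum(w * h for w, h in zip(row, hidden)) + b
                for row, b in zip(self.w2, self.b2)]

    def n_weights(self):
        return (sum(len(row) for row in self.w1) + len(self.b1)
                + sum(len(row) for row in self.w2) + len(self.b2))
```

With 4 input features, 8 hidden units and 3 actions the network has 67 weights regardless of how many distinct states exist, which is the sense in which it compresses the table.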
DQN demo: using a deep Q-network to play Doom
Approaches to Reinforcement Learning
Policy-Based RL
• Search directly for the optimal policy $\pi^*$
• This is the policy achieving maximum future reward
Deep Policy Network
Review: a policy is the agent’s behavior
• It is a map from state to action: $a_t = \pi(s_t)$
• We can search for the policy directly
• Let’s parameterize the policy by some model parameters $\theta$:
$$a = \pi(s, \theta)$$
• We call it policy-based reinforcement learning because we adjust the model parameters $\theta$ directly
• The goal is to maximize the total discounted reward from the beginning:
$$\text{maximize } R = r_1 + \gamma r_2 + \gamma^2 r_3 + \cdots$$
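To make "adjust $\theta$ directly" concrete, here is a parameterized deterministic policy and the total discounted reward it earns. Everything here is a hypothetical toy, not from the talk: the state is the car's lateral offset $x$ from the centre line, the action is a steering command $a = \tanh(\theta x)$, the dynamics are $x' = x + a$, and the reward is $-|x'|$.

```python
import math


def policy(state, theta):
    # a = pi(s, theta): deterministic, controlled by a single scalar gain
    # theta, squashed to a steering command in [-1, 1].
    return math.tanh(theta * state)


def total_discounted_reward(theta, x0=1.0, gamma=0.9, steps=20):
    """R = r_1 + gamma r_2 + gamma^2 r_3 + ... for one episode of the toy
    lane-keeping task (dynamics x' = x + a, reward r = -|x'|)."""
    x, total, discount = x0, 0.0, 1.0
    for _ in range(steps):
        a = policy(x, theta)
        x = x + a
        total += discount * (-abs(x))
        discount *= gamma
    return total
```

A negative gain steers back toward the centre and earns a higher (less negative) $R$ than $\theta = 0$, while a positive gain steers away and does worse; policy-based RL searches for the $\theta$ that maximizes this quantity.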
Policy Gradient
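How do we make good actions more likely?
• Define the objective function as the total discounted reward:
$$L(\theta) = \mathbb{E}\left[r_1 + \gamma r_2 + \gamma^2 r_3 + \cdots \mid \pi_{\theta}(s, a)\right]$$
or
$$L(\theta) = \mathbb{E}\left[R \mid \pi_{\theta}(s, a)\right]$$
where the expectation of the total reward R is taken under a probability distribution $p(a \mid \theta)$ parameterized by $\theta$.
• The goal becomes maximizing the total reward by computing the gradient
$$\frac{\partial L(\theta)}{\partial \theta}$$
and following it uphill.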
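The slide does not say how to estimate $\partial L / \partial \theta$ when the expectation is over $p(a \mid \theta)$. One standard option, swapped in here and different from the chain-rule form on the next slide, is the likelihood-ratio (REINFORCE) estimator $\mathbb{E}[R \, \partial \log p(a \mid \theta) / \partial \theta]$. The one-step problem below is hypothetical: actions are drawn from $\mathcal{N}(\theta, \sigma^2)$ and the reward is $-(a - 2)^2$.

```python
import random

TARGET = 2.0  # hypothetical best action for this toy problem


def reward(a):
    return -(a - TARGET) ** 2


def reinforce_gradient(theta, sigma, n_samples, rng):
    """Score-function estimate of dL/dtheta for
    L(theta) = E_{a ~ N(theta, sigma^2)}[reward(a)]."""
    total = 0.0
    for _ in range(n_samples):
        a = rng.gauss(theta, sigma)
        # d/dtheta log N(a | theta, sigma^2) = (a - theta) / sigma^2
        total += reward(a) * (a - theta) / sigma ** 2
    return total / n_samples


def train(theta=0.0, sigma=0.5, lr=0.05, iterations=200, n_samples=100, seed=0):
    rng = random.Random(seed)
    for _ in range(iterations):
        theta += lr * reinforce_gradient(theta, sigma, n_samples, rng)  # gradient ascent
    return theta
```

For this problem $L(\theta) = -((\theta - 2)^2 + \sigma^2)$, so the exact gradient at $\theta = 0$ is 4; the sampled estimate approaches it as `n_samples` grows, and gradient ascent drives $\theta$ toward 2.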
Policy Gradient (II)
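Recall: the Q-function is the maximum discounted future reward for a state-action pair:
$$Q(s_t, a_t) = \max R_{t+1}$$
• In the continuous case, we can write this as
$$Q(s_t, a_t) = R_{t+1}$$
Therefore, we can compute the gradient as
$$\frac{\partial L(\theta)}{\partial \theta} = \mathbb{E}_{p(a \mid \theta)}\left[\frac{\partial Q}{\partial \theta}\right]$$
• Using the chain rule, we can rewrite this as
$$\frac{\partial L(\theta)}{\partial \theta} = \mathbb{E}_{p(a \mid \theta)}\left[\frac{\partial Q^{\theta}(s, a)}{\partial a} \frac{\partial a}{\partial \theta}\right]$$
No dynamics model required! We only need:
1. Q to be differentiable with respect to a
2. a to be parameterized as a function of $\theta$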
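The chain-rule form can be sketched with a deterministic policy $a = \theta s$ and a critic whose derivative $\partial Q / \partial a$ is known. Here the critic $Q(s, a) = -(a - 2s)^2$ is a hypothetical closed form chosen so the best policy is $\theta = 2$; in DDPG, the method the deck's source code is named after, the critic is itself a learned network. The expectation is replaced by an average over a fixed, illustrative set of states.

```python
def q_value(s, a):
    # Hypothetical critic, assumed known and differentiable in a:
    # the best action in state s is a = 2 * s.
    return -(a - 2.0 * s) ** 2


def dq_da(s, a):
    return -2.0 * (a - 2.0 * s)


def policy(s, theta):
    # Deterministic policy a = pi(s, theta) = theta * s
    return theta * s


def dpg_gradient(theta, states):
    """Chain rule: dL/dtheta = mean over states of dQ/da * da/dtheta."""
    grads = []
    for s in states:
        a = policy(s, theta)
        da_dtheta = s  # derivative of theta * s with respect to theta
        grads.append(dq_da(s, a) * da_dtheta)
    return sum(grads) / len(grads)


def train(theta=0.0, lr=0.1, iterations=200, states=(0.5, 1.0, -1.0, 2.0)):
    for _ in range(iterations):
        theta += lr * dpg_gradient(theta, states)  # gradient ascent on L
    return theta
```

Note that the update never simulates a state transition: it only needs $\partial Q / \partial a$ and $\partial a / \partial \theta$, matching the two requirements listed above. Training converges to $\theta = 2$.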
The power of Policy Gradient
Because policy gradient does not require a model of the environment’s dynamics, less prior domain knowledge has to be built in.
For example, AlphaGo improves its policy by playing many games against itself (self-play), adjusting the policy parameters $\theta$ to maximize the reward (the probability of winning).
Intuition: Value vs Policy RL
• Value-based RL is like a driving instructor: it gives a score to any action the student takes
• Policy-based RL is like the driver: it is the actual policy for how to drive the car
The car racing game TORCS
• TORCS is a state-of-the-art open-source car racing simulator written in C++
• Main features
  • Sophisticated dynamics
  • Provided with several tracks and controllers
  • Sensors
    • Rangefinder
    • Speed
    • Position on track
    • Rotation speed of wheels
    • RPM
    • Angle with track
Quite realistic for self-driving cars…
[Figure: track sensors]
Deep Learning Recipe
[Figure: the game input state s (rangefinder, speed, position on track, rotation speed of wheels, RPM, angle with track) and the reward feed a deep neural network that outputs steer, gas pedal, and brake]
Compute the optimal policy $\pi$ via policy gradient
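The recipe above can be sketched as an actor network $\pi(s, \theta)$ that turns a sensor vector into the three controls. The slide lists the sensors and the outputs but not the architecture, so the hidden-layer size, the ReLU, and the output squashing (tanh for steering, sigmoid for gas and brake) are assumptions chosen so each control lands in a plausible range. `Actor` is a hypothetical name and no training is shown.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class Actor:
    """Hypothetical actor network: sensor vector -> (steer, gas, brake)."""

    def __init__(self, n_sensors, n_hidden=16, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.gauss(0, 0.1) for _ in range(n_sensors)] for _ in range(n_hidden)]
        self.w2 = [[rng.gauss(0, 0.1) for _ in range(n_hidden)] for _ in range(3)]

    def act(self, sensors):
        # Hidden layer with ReLU activation.
        hidden = [max(0.0, sum(w * x for w, x in zip(row, sensors))) for row in self.w1]
        # One pre-activation per control.
        steer_z, gas_z, brake_z = (sum(w * h for w, h in zip(row, hidden)) for row in self.w2)
        steer = math.tanh(steer_z)  # steering in [-1, 1]
        gas = sigmoid(gas_z)        # gas pedal in [0, 1]
        brake = sigmoid(brake_z)    # brake in [0, 1]
        return steer, gas, brake
```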
Design of the reward function
• Obvious choice: maximize the car’s velocity along the track, $R = V_{car} \cos\theta$
• However, in practice we found that learning was not very stable
• Use a modified reward function:
$$R = V_x \cos\theta - V_x \sin\theta - V_x \lvert \text{track pos} \rvert$$
This encourages the car to stay in the center of the track.
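The modified reward is a one-line function. This sketch implements the formula exactly as written on the slide; `torcs_reward` is a hypothetical name, `theta` is the angle between the car and the track in radians, and `track_pos` is assumed to be 0 at the track center (which is what makes the last term a penalty for drifting away from it).

```python
import math


def torcs_reward(v_x, theta, track_pos):
    """R = V_x cos(theta) - V_x sin(theta) - V_x |track_pos|, as on the slide."""
    along_track = v_x * math.cos(theta)       # speed projected along the track
    sine_term = v_x * math.sin(theta)         # sine term as written on the slide
    off_center = v_x * abs(track_pos)         # grows as the car leaves the center
    return along_track - sine_term - off_center
```

Driving straight down the center at speed 10 earns 10; the same speed halfway toward the edge (`track_pos = 0.5`) earns only 5.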
Source code available here:
Google: DDPG Keras
Training set: Aalborg track
Validation set: Alpine tracks
Recall basic machine learning: make sure you test the model on the validation set, not the training set.
Learning how to brake
Since we try to maximize the velocity of the car, the AI agent does not want to hit the brake at all (braking works against the reward function)!
Solution: use a stochastic brake
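The slide names the stochastic brake idea without details. One possible reading, sketched here as an assumption rather than the author's exact scheme, is that during training exploration the brake command is occasionally overridden at random, so the agent experiences what braking does instead of never trying it. `explore_brake` and `brake_prob` are hypothetical names, and the probability value is illustrative.

```python
import random


def explore_brake(actor_brake, brake_prob, rng):
    """Hypothetical stochastic brake for exploration: with probability
    brake_prob, replace the actor's brake command with a random value in
    [0, 1]; otherwise keep the actor's command unchanged."""
    if rng.random() < brake_prob:
        return rng.uniform(0.0, 1.0)
    return actor_brake
```

At evaluation time the override would be switched off (for example `brake_prob = 0.0`), so the car uses only the brake command its learned policy produces.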
Final demo: the car does not stay in the center of the track
Future Application
Self-driving cars
Future Application
Thank you!
Twitter: @yanpanlau
Appendix
How to find Q-function (II)
$$Q(s, a) = r + \gamma \max_{a'} Q(s', a')$$
We can solve for the Q-function iteratively, one observed transition at a time:
• We want $r + \gamma \max_{a'} Q(s', a')$ to be the same as $Q(s, a)$
• Treating the search for the Q-function as a regression task, we can define a loss function:
$$\text{Loss} = \frac{1}{2}\left[\underbrace{r + \gamma \max_{a'} Q(s', a')}_{\text{target}} - \underbrace{Q(s, a)}_{\text{prediction}}\right]^2$$
• Q is optimal when the loss function is minimized
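The loss above can be evaluated and reduced with one gradient step on a single transition. To keep the sketch short, a tabular Q stored in a dict stands in for the network weights. As DQN does, the target is treated as a constant, so the gradient with respect to $Q(s, a)$ is $-(\text{target} - Q(s, a))$. The `terminal` flag (no bootstrapping past the end of an episode) is an addition not shown on the slide, and all function names are hypothetical.

```python
def td_target(Q, r, s_next, actions, gamma, terminal=False):
    # target = r + gamma * max_a' Q(s', a'); no future reward after a terminal state
    if terminal:
        return r
    return r + gamma * max(Q[(s_next, a2)] for a2 in actions)


def td_loss(Q, s, a, r, s_next, actions, gamma, terminal=False):
    # Loss = 1/2 * (target - prediction)^2
    target = td_target(Q, r, s_next, actions, gamma, terminal)
    return 0.5 * (target - Q[(s, a)]) ** 2


def sgd_step(Q, s, a, r, s_next, actions, gamma, lr, terminal=False):
    """One gradient-descent step on the loss w.r.t. Q(s, a), target held fixed."""
    target = td_target(Q, r, s_next, actions, gamma, terminal)
    Q[(s, a)] += lr * (target - Q[(s, a)])
```

For example, with $Q(s_1, \cdot) = \{1, 2\}$, $r = 1$ and $\gamma = 0.9$ the target is $2.8$; starting from $Q(s_0, 0) = 0$ the loss is $3.92$, and one step with learning rate 0.5 moves $Q(s_0, 0)$ to $1.4$, lowering the loss.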

Ted Willke - The Brain’s Guide to Dealing with Context in Language UnderstandingTed Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language UnderstandingMLconf
 
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...MLconf
 
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold RushIgor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold RushMLconf
 
Josh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious ExperienceJosh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious ExperienceMLconf
 
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...MLconf
 
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...MLconf
 
Meghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the CheapMeghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the CheapMLconf
 
Noam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data CollectionNoam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data CollectionMLconf
 
June Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of MLJune Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of MLMLconf
 
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection TasksSneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection TasksMLconf
 
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...MLconf
 
Vito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI WorldVito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI WorldMLconf
 
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...MLconf
 
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...MLconf
 
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...MLconf
 
Neel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to codeNeel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to codeMLconf
 
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...MLconf
 
Soumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better SoftwareSoumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better SoftwareMLconf
 
Roy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime ChangesRoy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime ChangesMLconf
 

Mehr von MLconf (20)

Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
 
Ted Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language UnderstandingTed Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language Understanding
 
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
 
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold RushIgor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
 
Josh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious ExperienceJosh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious Experience
 
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
 
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
 
Meghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the CheapMeghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the Cheap
 
Noam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data CollectionNoam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data Collection
 
June Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of MLJune Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of ML
 
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection TasksSneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
 
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
 
Vito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI WorldVito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI World
 
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
 
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
 
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
 
Neel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to codeNeel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to code
 
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
 
Soumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better SoftwareSoumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better Software
 
Roy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime ChangesRoy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime Changes
 

Kürzlich hochgeladen

Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 

Kürzlich hochgeladen (20)

Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 

Ben Lau, Quantitative Researcher, Hobbyist, at MLconf NYC 2017

  • 1. Deep Reinforcement Learning: using deep learning to play self-driving car games. Ben Lau. Ben Lau - Deep Learning and Reinforcement. MLConf 2017, New York City
  • 2. What is Reinforcement Learning? There are three classes of learning. Supervised learning: labeled data; direct feedback. Unsupervised learning: no labeled data; no feedback; "find hidden structure". Reinforcement learning: uses reward as feedback; learns a series of actions; trial and error.
  • 3. RL: Agent and Environment. [Figure: the agent sends action $A_t$ to the environment; the environment returns observation $O_t$ and reward $R_t$.] At each step $t$ the agent receives observation $O_t$, executes action $A_t$, and receives reward $R_t$; the environment receives action $A_t$, sends observation $O_{t+1}$, and sends reward $R_{t+1}$.
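The agent-environment loop above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: `ToyTrack` and `run_episode` are hypothetical names, and the one-dimensional track with a reward of +1 per forward step is invented for the example; it is not the TORCS setup used later in the talk.

```python
class ToyTrack:
    """Hypothetical 1-D environment: the car starts at position 0 and the
    episode ends when it reaches `length`. Reward is +1 per forward step."""

    def __init__(self, length=5):
        self.length = length
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos                      # first observation

    def step(self, action):
        # Environment receives action A_t, then sends O_{t+1} and R_{t+1}
        move = 1 if action == "gas" else 0
        self.pos = min(self.pos + move, self.length)
        reward = float(move)
        done = self.pos == self.length
        return self.pos, reward, done


def run_episode(env, policy, max_steps=100):
    obs = env.reset()                        # agent receives the first observation
    total_reward, steps = 0.0, 0
    while steps < max_steps:
        action = policy(obs)                 # agent executes action A_t
        obs, reward, done = env.step(action) # agent receives next observation and reward
        total_reward += reward
        steps += 1
        if done:
            break
    return total_reward, steps


print(run_episode(ToyTrack(5), lambda obs: "gas"))  # (5.0, 5)
```

The policy is passed in as a plain function from observation to action, which keeps the loop identical whether the policy is a hand-written rule or a trained network.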
  • 4. RL: State. Experience is a sequence of observations, actions, and rewards: $o_1, r_1, a_1, \ldots, o_{t-1}, r_{t-1}, a_{t-1}, o_t, r_t, a_t$. The state is a summary of experience: $s_t = f(o_1, r_1, a_1, \ldots, o_{t-1}, r_{t-1}, a_{t-1}, o_t, r_t, a_t)$. Note: not all states are fully observable. [Figure: fully observable vs. not fully observable environments]
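One common, concrete choice of the summary function $f$ (an assumption here, not necessarily what the talk uses) is to keep only the last $k$ observations, as is often done by stacking frames in partially observable games. `StackedState` is a hypothetical helper written for illustration.

```python
from collections import deque


class StackedState:
    """Hypothetical state summary s_t = f(history): the last k observations."""

    def __init__(self, k=4):
        self.k = k
        self.frames = deque(maxlen=k)     # oldest entries drop out automatically

    def reset(self, first_obs):
        self.frames.clear()
        for _ in range(self.k):
            self.frames.append(first_obs) # pad the history with the first observation
        return tuple(self.frames)

    def update(self, obs):
        self.frames.append(obs)
        return tuple(self.frames)


s = StackedState(k=3)
print(s.reset(0), s.update(1), s.update(2))  # (0, 0, 0) (0, 0, 1) (0, 1, 2)
```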
  • 5. Approach to Reinforcement Learning. Value-based RL: estimate the optimal value function $Q^*(s, a)$, the maximum value achievable under any policy. Policy-based RL: search directly for the optimal policy $\pi^*$, the policy achieving the maximum future reward. Model-based RL: build a model of the environment and plan (e.g. by lookahead) using the model.
  • 6. Deep Learning + RL → AI. [Figure: the game input and reward feed a deep convolutional network whose outputs are steer, gas pedal, and brake.]
  • 7. Policies. A deterministic policy is the agent's behavior: a map from state to action, $a_t = \pi(s_t)$. In reinforcement learning, the agent's goal is to choose each action so that it maximizes the sum of future rewards: choose $a_t$ to maximize $R_t = r_{t+1} + \gamma r_{t+2} + \gamma^2 r_{t+3} + \cdots$. Here $\gamma \in [0, 1]$ is a discount factor, because a reward is less certain the further away it is. Example policy (state → action): obstacle → brake; corner → left/right; straight line → accelerate.
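The discounted return can be computed for every step of an episode in a single backward pass, using the recursion $R_t = r_{t+1} + \gamma R_{t+1}$, which is equivalent to the sum above. In this sketch `rewards[i]` is assumed to hold the reward received after the action at step $i$; the function name is hypothetical.

```python
def discounted_return(rewards, gamma):
    """Return [R_0, R_1, ...] where R_t = r_{t+1} + gamma * R_{t+1}."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for i in reversed(range(len(rewards))):
        running = rewards[i] + gamma * running  # reward now + discounted future
        returns[i] = running
    return returns


print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))  # [1.75, 1.5, 1.0]
```

Working backwards avoids recomputing the same tail sum for each step, so the whole episode costs one pass.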
  • 8. Approach to Reinforcement Learning. Value-based RL: estimate the optimal value function $Q^*(s, a)$, the maximum value achievable under any policy.
  • 9. Value Function. A value function is a prediction of future reward: how much reward will I get from action $a$ in state $s$? A Q-value function gives the expected total reward from the state-action pair $(s, a)$, under policy $\pi$, with discount factor $\gamma$: $Q^{\pi}(s, a) = \mathbb{E}\left[r_{t+1} + \gamma r_{t+2} + \gamma^2 r_{t+3} + \cdots \mid s, a\right]$. An optimal value function is the maximum achievable value: $Q^*(s, a) = \max_{\pi} Q^{\pi}(s, a)$. Once we have $Q^*$ we can act optimally: $\pi^*(s) = \arg\max_{a} Q^*(s, a)$.
  • 10. Understanding the Q-Function. The best way to understand the Q-function is to think of it as a "strategy guide". Suppose you are playing a difficult game (DOOM): with a strategy guide it is pretty easy, because you just follow the guide. Likewise, if you are in state $s$ and need to make a decision, having the Q-function (the strategy guide) makes it easy: just pick the action with the highest Q-value. [Figure: Doom strategy guide]
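Following the strategy-guide analogy, acting greedily with a tabular Q-function is a lookup followed by $\arg\max_a$. The sketch below reuses the illustrative Q-values from the table on slide 12; `greedy_policy` is a hypothetical name.

```python
def greedy_policy(q_table):
    """Return pi(s) = argmax_a Q(s, a) for every state of a tabular Q-function.

    q_table maps each state to a dict {action: Q-value}.
    """
    return {state: max(q_values, key=q_values.get)
            for state, q_values in q_table.items()}


# Illustrative Q-values (state -> {action: value}) from the table on slide 12
q_table = {
    "A": {1: 140.11, 2: 139.22},
    "B": {1: 145.89, 2: 140.23},
    "C": {1: 123.67, 2: 135.27},
}

print(greedy_policy(q_table))  # {'A': 1, 'B': 1, 'C': 2}
```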
  • 11. How to Find the Q-Function. The discounted future reward is $R_t = r_t + \gamma r_{t+1} + \gamma^2 r_{t+2} + \cdots + \gamma^{n-t} r_n$, which can be written recursively as $R_t = r_t + \gamma R_{t+1}$. Recall the definition of the Q-function (the maximum reward if we choose action $a$ in state $s$): $Q(s_t, a_t) = \max R_{t+1}$. Therefore we can rewrite the Q-function as $Q(s, a) = r + \gamma \max_{a'} Q(s', a')$. In plain English: the maximum future reward for $(s, a)$ is the immediate reward $r$ plus the maximum future reward obtainable from the next state $s'$ over all actions $a'$. This equation can be solved by dynamic programming or by an iterative method.
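To illustrate the iterative solution, the sketch below repeatedly applies $Q(s, a) \leftarrow r + \gamma \max_{a'} Q(s', a')$ (Q-value iteration) on a hypothetical deterministic chain: states 0 to 3, actions left/right, and a reward of 1 for reaching the terminal state 3. The environment and the function name are assumptions made for illustration.

```python
import numpy as np


def q_value_iteration(n_states=4, gamma=0.9, n_iters=100):
    """Solve Q(s, a) = r + gamma * max_a' Q(s', a') on a hypothetical chain.

    Actions: 0 = left, 1 = right. Reaching the last state (the goal) gives
    reward 1 and ends the episode, so the goal state's Q-values stay 0.
    """
    goal = n_states - 1
    Q = np.zeros((n_states, 2))
    for _ in range(n_iters):
        new_Q = np.zeros_like(Q)
        for s in range(goal):                 # skip the terminal goal state
            for a in (0, 1):
                s_next = max(s - 1, 0) if a == 0 else s + 1
                r = 1.0 if s_next == goal else 0.0
                future = 0.0 if s_next == goal else Q[s_next].max()
                new_Q[s, a] = r + gamma * future
        Q = new_Q
    return Q


Q = q_value_iteration()
print(Q[:3].argmax(axis=1))  # [1 1 1]: "right" is best in every non-goal state
```

Because the chain is deterministic and small, the values settle after a handful of sweeps; moving right from state 0 is worth $\gamma^2 = 0.81$ because the reward arrives two steps later.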
  • 12. Deep Q-Network (DQN). The action-value function (Q-function) is often very large when stored as a table. The DQN idea is to use a neural network to compress this Q-table into the network weights $w$: $Q(s, a) \approx Q(s, a, w)$. Training then becomes a search for a set of optimal weights $w$ instead. In the literature this is often called "non-linear function approximation". Example Q-table (state, action: value): (A, 1): 140.11; (A, 2): 139.22; (B, 1): 145.89; (B, 2): 140.23; (C, 1): 123.67; (C, 2): 135.27; the network approximates this table.
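As a minimal sketch of $Q(s, a) \approx Q(s, a, w)$, the NumPy code below fits a tiny one-hidden-layer network to the example Q-table by gradient descent on squared error. Assumptions: states A, B, C are one-hot encoded, the network outputs one Q-value per action (the output layout DQN uses), values are divided by 100 to keep training stable, and all names and hyperparameters are hypothetical. A real DQN learns from bootstrapped targets rather than from a known table.

```python
import numpy as np


def fit_q_network(features, targets, hidden=32, lr=0.05, steps=2000, seed=0):
    """Fit Q(s, .; w) with a tanh hidden layer by gradient descent on 0.5 * MSE."""
    rng = np.random.default_rng(seed)
    n_in, n_actions = features.shape[1], targets.shape[1]
    W1 = rng.normal(0.0, 0.5, (n_in, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, (hidden, n_actions))
    b2 = np.zeros(n_actions)
    losses = []
    for _ in range(steps):
        h = np.tanh(features @ W1 + b1)       # hidden layer
        q = h @ W2 + b2                       # one Q-value per action
        err = q - targets
        losses.append(0.5 * np.mean(err ** 2))
        # Backpropagation of 0.5 * mean squared error
        dq = err / err.size
        dW2, db2 = h.T @ dq, dq.sum(axis=0)
        dh = (dq @ W2.T) * (1.0 - h ** 2)     # tanh derivative
        dW1, db1 = features.T @ dh, dh.sum(axis=0)
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2

    def predict(x):
        return np.tanh(x @ W1 + b1) @ W2 + b2

    return predict, losses


q_table = np.array([[140.11, 139.22],    # state A: actions 1, 2
                    [145.89, 140.23],    # state B
                    [123.67, 135.27]])   # state C
states = np.eye(3)                       # one-hot features for A, B, C
predict, losses = fit_q_network(states, q_table / 100.0)
print(losses[0], losses[-1])             # the loss should drop during training
print(predict(states) * 100.0)           # approximate Q-values, rescaled
```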
  • 13. DQN Demo: using a Deep Q-Network to play Doom
  • 14. Approach to Reinforcement Learning. Policy-based RL: search directly for the optimal policy $\pi^*$, the policy achieving the maximum future reward.
  • 15. Deep Policy Network. Review: a policy is the agent's behavior, a map from state to action: $a_t = \pi(s_t)$. We can search for the policy directly. Let us parameterize the policy by some model parameters $\theta$: $a = \pi(s, \theta)$. We call this policy-based reinforcement learning because we adjust the model parameters $\theta$ directly. The goal is to maximize the total discounted reward from the beginning: maximize the total $R = r_1 + \gamma r_2 + \gamma^2 r_3 + \cdots$.
  • 16. Policy Gradient. How do we make good actions more likely? Define the objective function as the total discounted reward, $L(\theta) = \mathbb{E}\left[r_1 + \gamma r_2 + \gamma^2 r_3 + \cdots \mid \pi_\theta(s, a)\right]$, or $L(\theta) = \mathbb{E}\left[R \mid \pi_\theta(s, a)\right]$, where the expectation of the total reward $R$ is taken under some probability distribution $p(a \mid \theta)$ parameterized by $\theta$. The goal then becomes maximizing the total reward by computing the gradient $\frac{\partial L(\theta)}{\partial \theta}$.
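To make $\partial L(\theta) / \partial \theta$ concrete, the sketch below uses a hypothetical one-step problem (a two-armed bandit with known rewards) and a softmax policy $p(a \mid \theta)$. Because the rewards are known, the gradient of $L(\theta) = \sum_a p(a \mid \theta) R(a)$ can be computed exactly; in practice this expectation is estimated from sampled episodes (for example with the REINFORCE estimator). This is a generic illustration, not the method used in the talk.

```python
import numpy as np


def softmax(theta):
    z = np.exp(theta - theta.max())     # subtract the max for numerical stability
    return z / z.sum()


def policy_gradient_ascent(rewards, lr=0.5, steps=500):
    """Gradient ascent on L(theta) = sum_a p(a|theta) * R(a) for a softmax policy."""
    rewards = np.asarray(rewards, dtype=float)
    theta = np.zeros_like(rewards)
    for _ in range(steps):
        p = softmax(theta)
        expected_reward = p @ rewards   # L(theta)
        # For a softmax policy, dL/dtheta_k = p_k * (R_k - L(theta))
        grad = p * (rewards - expected_reward)
        theta += lr * grad              # make the better action more likely
    return softmax(theta)


print(policy_gradient_ascent([1.0, 0.2]))  # most probability moves to action 0
```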
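  • 17. Policy Gradient (II). Recall: the Q-function is the maximum discounted future reward in state $s$ with action $a$: $Q(s_t, a_t) = \max R_{t+1}$. In the continuous case we can write $Q(s_t, a_t) = R_{t+1}$. Therefore we can compute the gradient as $\frac{\partial L(\theta)}{\partial \theta} = \mathbb{E}_{p(a \mid \theta)}\left[\frac{\partial Q}{\partial \theta}\right]$. Using the chain rule, we can rewrite this as $\frac{\partial L(\theta)}{\partial \theta} = \mathbb{E}_{p(a \mid \theta)}\left[\frac{\partial Q^{\theta}(s, a)}{\partial a} \frac{\partial a}{\partial \theta}\right]$. No dynamics model is required! It only requires that (1) $Q$ is differentiable with respect to $a$, and (2) $a$ can be parameterized as a function of $\theta$.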
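The chain-rule form of the gradient is the idea behind the actor update in deterministic policy gradient methods such as DDPG. In the sketch below the critic is a hypothetical, known quadratic $Q(s, a) = -(a - 2s)^2$ standing in for a learned critic network, and the actor is the linear policy $a = \theta s$, so the best policy is $\theta = 2$. All names and numbers are assumptions for illustration.

```python
import numpy as np


def dq_da(s, a):
    """Gradient of the assumed critic Q(s, a) = -(a - 2s)^2 with respect to a."""
    return -2.0 * (a - 2.0 * s)


def train_actor(states, lr=0.5, steps=200):
    theta = 0.0
    for _ in range(steps):
        actions = theta * states              # a = pi(s, theta)
        da_dtheta = states                    # d(theta * s) / d(theta)
        # dL/dtheta = E[dQ/da * da/dtheta], estimated by the mean over states
        grad = np.mean(dq_da(states, actions) * da_dtheta)
        theta += lr * grad                    # gradient ascent on Q
    return theta


states = np.linspace(-1.0, 1.0, 21)
print(train_actor(states))  # converges toward 2.0
```

Note that the update never needs a model of how the environment responds; it only needs $\partial Q / \partial a$ from the critic and $\partial a / \partial \theta$ from the actor, matching conditions (1) and (2) on the slide.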
  • 18. The Power of Policy Gradient. Because the policy gradient does not require a dynamics model, no prior domain knowledge is required. AlphaGo doesn't pre-program any domain knowledge: it keeps playing many games (via self-play) and adjusts the policy parameters $\theta$ to maximize the reward (the winning probability).
  • 19. Intuition: Value-Based vs. Policy-Based RL. Value-based RL is similar to a driving instructor: a score is given for any action taken by the student. Policy-based RL is similar to a driver: it is the actual policy for how to drive a car.
  • 20. The Car Racing Game TORCS. TORCS is a state-of-the-art open-source simulator written in C++. Main features: sophisticated dynamics; several tracks and controllers are provided. Sensors: rangefinder, speed, position on track, rotation speed of the wheels, RPM, angle with the track. It is quite realistic for self-driving cars. [Figure: track sensors]
  • 21. Deep Learning Recipe. [Figure: the game input (state $s$) and reward feed a deep neural network that outputs steer, gas pedal, and brake.] State inputs: rangefinder, speed, position on track, rotation speed of the wheels, RPM, angle with the track. Compute the optimal policy $\pi$ via policy gradient.
  • 22. Design of the Reward Function. Obvious choice: the highest velocity of the car, $R = V_{car} \cos\theta$. However, in practice learning was not very stable with this choice. Instead, use the modified reward function $R = V_x \cos\theta - V_x \sin\theta - V_x \lvert \text{track pos} \rvert$; the last term encourages the car to stay in the center of the track.
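The modified reward translates directly into a function. The formula follows the slide; the parameter names and units are assumptions (the angle $\theta$ between the car and the track axis is taken in radians, and the track position is taken to be 0 at the center of the track).

```python
import math


def torcs_reward(speed_x, angle, track_pos):
    """R = Vx*cos(theta) - Vx*sin(theta) - Vx*|track pos|, as on the slide.

    speed_x:   longitudinal speed V_x
    angle:     theta, angle between the car and the track axis (radians, assumed)
    track_pos: distance from the track center (0 = center, assumed)
    """
    return (speed_x * math.cos(angle)
            - speed_x * math.sin(angle)
            - speed_x * abs(track_pos))   # penalty for drifting off center


print(torcs_reward(20.0, 0.0, 0.0), torcs_reward(20.0, 0.0, 0.5))  # 20.0 10.0
```

Scaling every term by $V_x$ means that driving fast while off center is penalized more heavily than driving slowly off center.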
  • 23. Source code is available: search Google for "DDPG Keras".
  • 25. Validation Set: Alpine Tracks. Recall basic machine learning: make sure you test the model on the validation set, not on the training set.
  • 26. Learning How to Brake. Since we try to maximize the velocity of the car, the AI agent doesn't want to hit the brake at all (braking goes against the reward function). Solution: use the stochastic brake idea.
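The slide does not spell out the stochastic brake, so the following is only a hedged sketch of one way to read it: during exploration, force the brake on with some small probability so the agent experiences the effect of braking even though the reward discourages it. `explore_brake` and `brake_prob` are hypothetical names, and the 10% default is an assumed value.

```python
import random


def explore_brake(policy_brake, brake_prob=0.1, rng=random):
    """Brake command used during exploration (hypothetical helper).

    With probability brake_prob, override the policy and brake fully (1.0);
    otherwise keep the policy's own brake output, clipped to [0, 1].
    """
    if rng.random() < brake_prob:
        return 1.0
    return min(max(policy_brake, 0.0), 1.0)


rng = random.Random(0)
commands = [explore_brake(0.0, brake_prob=0.1, rng=rng) for _ in range(1000)]
print(sum(c == 1.0 for c in commands))  # roughly 100 of the 1000 steps brake
```

The override is applied only while exploring; at evaluation time the policy's own brake output would be used unchanged.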
  • 27. Final Demo: the car does not stay in the center of the track
  • 28. Future Application: self-driving cars
  • 32. How to Find the Q-Function (II). $Q(s, a) = r + \gamma \max_{a'} Q(s', a')$. We could use an iterative method to solve for the Q-function: given a transition $(s, a, r, s')$, we want $r + \gamma \max_{a'} Q(s', a')$ to be the same as $Q(s, a)$. Treating the search for the Q-function as a regression task, we can define a loss function: $\text{Loss} = \frac{1}{2}\big[\underbrace{r + \gamma \max_{a'} Q(s', a')}_{\text{target}} - \underbrace{Q(s, a)}_{\text{prediction}}\big]^2$. $Q$ is optimal when the loss function is at its minimum.
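For a single transition, the loss above can be computed as follows. This is a minimal NumPy sketch; the function name and the default $\gamma$ are assumptions, and it omits the special case of terminal states, where implementations typically use the target $r$ alone.

```python
import numpy as np


def q_learning_loss(q_sa, reward, q_next, gamma=0.99):
    """Squared TD error 0.5 * (target - prediction)^2 for one transition (s, a, r, s').

    q_sa:   current estimate Q(s, a)            (prediction)
    q_next: array of Q(s', a') over all actions a'
    """
    target = reward + gamma * np.max(q_next)    # r + gamma * max_a' Q(s', a')
    return 0.5 * (target - q_sa) ** 2


# target = 1.0 + 0.9 * 3.0 = 3.7, so the loss is 0.5 * (3.7 - 3.0)^2, about 0.245
print(q_learning_loss(q_sa=3.0, reward=1.0, q_next=np.array([2.0, 3.0]), gamma=0.9))
```

In a DQN, this loss is averaged over a batch of transitions and minimized with respect to the network weights $w$ that produce the prediction $Q(s, a)$.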