Introduction to Reinforcement Learning
Lili Wu, Laber Labs
October 23, 2018
Outline
Basics and examples
Setup and notation
Problems in RL
How do we get an optimal policy from data?
How do we balance exploration and exploitation?
RL in Laber Labs
Basic idea
Reinforcement learning (RL): An agent interacting with an
environment, which provides rewards
Goal: Learn how to take actions in order to maximize the
cumulative rewards
History
Figure 1: Puzzle box (trial-and-error learning).
Figure 2: Thorndike, 1911.
Humans and animals learn from reward and punishment
In reinforcement learning, we try to get computers to learn
complicated skills in a similar way
Framework
Figure 3: Reinforcement learning
RL in the news
Advances in computing power and algorithms in recent years
have led to great interest in using RL for artificial intelligence.
RL has now been used to achieve superhuman performance
in a number of difficult games.
Example: Atari
Figure 4: Deep Q-Network playing Breakout (Mnih et al. 2015).
States: pixels on the screen
Actions: move the paddle
Rewards: points
Example: AlphaZero (Silver et al. 2017)
Figure 5: The game of Go.
States: positions of the stones
Actions: stone placement
Rewards: win/lose
Setup: MDPs
We formalize the reinforcement learning problem using a Markov
decision process (MDP) (S, A, T, r, γ):
S is the set of states the environment can be in;
A is the set of actions available to the decision-maker;
T : S × A × S → R+ is a transition function which gives the
probability distribution of the next state given the current
state and action;
r : S → R is the reward function;
γ is a discount factor, 0 ≤ γ < 1.
Data: at each time t we observe the current state, action, reward, and next state
(S_t, A_t, R_t, S_{t+1}).
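As a concrete illustration, the MDP tuple (S, A, T, r, γ) can be written down as plain data structures. The two-state example below is a sketch in Python; the state names, actions, probabilities, and rewards are invented for illustration, not from the slides.

```python
# A minimal two-state MDP (S, A, T, r, gamma) as plain Python data.
S = ["healthy", "sick"]          # state space
A = ["rest", "treat"]            # action space
gamma = 0.9                      # discount factor, 0 <= gamma < 1

# T[s][a] maps each next state s' to its probability T(s' | s, a)
T = {
    "healthy": {
        "rest":  {"healthy": 0.9,  "sick": 0.1},
        "treat": {"healthy": 0.95, "sick": 0.05},
    },
    "sick": {
        "rest":  {"healthy": 0.2, "sick": 0.8},
        "treat": {"healthy": 0.6, "sick": 0.4},
    },
}

# r(s): as in the slides, the reward depends only on the state
r = {"healthy": 1.0, "sick": -1.0}

# Sanity check: each conditional next-state distribution sums to 1
for s in S:
    for a in A:
        assert abs(sum(T[s][a].values()) - 1.0) < 1e-9
```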
Setup: Policies
Policies tell us which action to take in each state
π : S → A
Goal: choose policy to maximize expected cumulative
discounted reward
E_π [ Σ_{t=0}^{∞} γ^t R_t ]
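For a finite episode, this cumulative discounted reward can be computed directly from the observed reward sequence; a minimal Python sketch:

```python
def discounted_return(rewards, gamma=0.9):
    """Cumulative discounted reward: sum over t of gamma^t * R_t."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))

# A reward of 1 at every step, discounted by gamma = 0.5:
# 1 + 0.5 + 0.25 = 1.75
total = discounted_return([1, 1, 1], gamma=0.5)
```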
Setup: Value functions
Value functions tell us the long-term rewards we can expect under
a given policy, starting from a given state and/or action.
“V-function” measures expected cumulative reward from
given state:
V^π(s) = E_π [ Σ_{t=0}^{∞} γ^t R_t | S_0 = s ]
“Q-function” measures expected cumulative reward from
given state and action:
Q^π(s, a) = E_π [ Σ_{t=0}^{∞} γ^t R_t | S_0 = s, A_0 = a ]
          = Σ_{s′∈S} [ r(s′) + γ V^π(s′) ] T(s′ | s, a)
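The last identity, expressing Q^π in terms of V^π, translates directly into code. In this fresh toy example the transition function is stored as nested dictionaries and all numbers are invented for illustration (implementation choices, not from the slides):

```python
def q_from_v(V, T, r, gamma, s, a):
    """Q^pi(s, a) = sum over s' of [ r(s') + gamma * V^pi(s') ] * T(s' | s, a)."""
    return sum((r[s2] + gamma * V[s2]) * p for s2, p in T[s][a].items())

# Hypothetical two-state example
T = {"s0": {"a": {"s0": 0.5, "s1": 0.5}}}   # T(s' | s0, a)
r = {"s0": 0.0, "s1": 1.0}                  # state rewards
V = {"s0": 0.0, "s1": 2.0}                  # V^pi under some policy

# 0.5 * (0 + 0.5*0) + 0.5 * (1 + 0.5*2) = 1.0
q = q_from_v(V, T, r, gamma=0.5, s="s0", a="a")
```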
Problem 1: Estimating optimal policy
Two ways of getting at the optimal policy π^∗:
Try to improve π directly
Try to estimate Q^{π^∗}
Example: Q-learning
Q_new(S_t, A_t) ← (1 − α) Q(S_t, A_t) + α [ R_t + γ max_a Q(S_{t+1}, a) ],
where α is the learning rate, 0 ≤ α ≤ 1.
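One Q-learning step can be sketched as a few lines of Python; storing the table in a defaultdict (so unseen state-action pairs start at 0) is an implementation choice, not from the slides:

```python
from collections import defaultdict

def q_update(Q, s, a, reward, s_next, actions, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step:
    Q(s,a) <- (1 - alpha) * Q(s,a) + alpha * [ R + gamma * max_a' Q(s', a') ]."""
    target = reward + gamma * max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * target
    return Q[(s, a)]

Q = defaultdict(float)   # unseen (state, action) pairs default to 0.0
# From Q = 0 everywhere, alpha = 0.5: new value is 0.5 * (1.0 + 0.9 * 0) = 0.5
q_new = q_update(Q, "s", "L", 1.0, "s2", ["L", "R"], alpha=0.5, gamma=0.9)
```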
Problem 2: Exploration-exploitation tradeoff
Tradeoff between gaining information (exploration) and
following current estimate of optimal policy (exploitation)
Restaurant example
Exploitation: Go to your favorite restaurant
Exploration: Try a new place
Need to balance both to maximize cumulative deliciousness
Different strategies
Occasionally do something completely random
Act based on optimistic estimates of each action’s value
Sample action according to its posterior probability of being
optimal
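The first strategy, "occasionally do something completely random," is commonly called ε-greedy. A minimal sketch, assuming Q is stored as a dictionary keyed by (state, action):

```python
import random

def epsilon_greedy(Q, s, actions, epsilon=0.1, rng=random):
    """With probability epsilon, explore by picking a uniformly random action;
    otherwise exploit the current Q estimate."""
    if rng.random() < epsilon:
        return rng.choice(actions)          # explore
    return max(actions, key=lambda a: Q.get((s, a), 0.0))   # exploit

Q = {("s", "L"): 1.0, ("s", "R"): 0.0}
best = epsilon_greedy(Q, "s", ["L", "R"], epsilon=0.0)   # pure exploitation
```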
A small task to solve with RL: CartPole
Action space A = {0, 1}, representing {left, right}
State space (S_1, S_2, S_3, S_4) ∈ R^4, representing (position,
velocity, angle, angular velocity)
Goal: keep the pole balanced for 200 timesteps in each episode
(too large an angle or too far from the center → the episode ends!)
Define R_t ∈ {−1, 1}
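Since the CartPole state space is continuous, a tabular Q-learner needs discrete states. One common trick is to bin each dimension; this is an assumption here (the slides don't specify the method), and the bin counts and value ranges below are illustrative choices:

```python
def discretize(obs, bins_per_dim=6,
               low=(-2.4, -3.0, -0.21, -3.0),
               high=(2.4, 3.0, 0.21, 3.0)):
    """Map a continuous observation (position, velocity, angle, angular
    velocity) to a tuple of bin indices usable as a Q-table key."""
    idx = []
    for x, lo, hi in zip(obs, low, high):
        x = min(max(x, lo), hi)                       # clip to the range
        frac = (x - lo) / (hi - lo)                   # scale to [0, 1]
        idx.append(min(int(frac * bins_per_dim), bins_per_dim - 1))
    return tuple(idx)

# The centered state (0, 0, 0, 0) lands in the middle bin of each dimension.
center = discretize((0.0, 0.0, 0.0, 0.0))
```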
Application to CartPole Problem
RL in Laber Labs
At Laber Labs we apply reinforcement learning to interesting
and important real-world problems
Controlling the spread of disease
Dynamic medical treatment
Education
Sports decision-making
Stopping the spread of disease
Figure 6: The spread of white-nose syndrome in bats, 2006–2014.
States: which locations are infected
Actions: which locations to treat
Rewards: number of uninfected locations
Space Mice
Figure 7: Space Mice (By Laber
Labs’ Marshall Wang).
Dynamic medical treatment
Figure 8: RL can help us customize medical treatment to individual
patients’ characteristics.
States: current health status (exercise levels, food intake, blood
pressure, blood sugar, and many more)
Actions: recommended treatment
Rewards: health outcomes
PMED Undergraduate Workshop - Introduction to Reinforcement Learning - Lili Wu (Laber Labs), October 23, 2018