Reinforcement Learning
Michael L. Littman
Slides from http://www.cs.vu.nl/~elena/ml_13light.ppt, which appear to have been adapted from http://www.cs.cmu.edu/afs/cs.cmu.edu/project/theo-3/www/l20.ps
Reinforcement Learning
[Read Ch. 13] [Exercise 13.2]
Control Learning
One Example: TD-Gammon
Reinforcement Learning Problem
Markov Decision Processes
Agent's Learning Task
Value Function
$V^{\pi}(s) \equiv r_t + \gamma r_{t+1} + \gamma^2 r_{t+2} + \dots = \sum_{i=0}^{\infty} \gamma^i r_{t+i}$
where $r_t, r_{t+1}, \dots$ are generated by following policy $\pi$ starting at state s.
Restated, the task is to learn the optimal policy $\pi^*$.
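As a small illustration of the value of a reward sequence, here is a minimal sketch that assumes the usual discounted-sum definition with a discount factor `gamma` between 0 and 1. The function name and the finite reward list are assumptions made for the example, not part of the slides.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**i * rewards[i] over a finite reward sequence.

    Working backwards avoids computing explicit powers of gamma.
    """
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total


if __name__ == "__main__":
    # Two zero-reward steps followed by a reward of 100 on the third step.
    print(discounted_return([0.0, 0.0, 100.0], gamma=0.9))
```

A reward that arrives later is worth less: each extra step of delay multiplies its contribution by `gamma`.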
 
What to Learn
Q Function
If the agent learns Q, it can choose the optimal action even without knowing the transition function $\delta$! Q is the evaluation function the agent will learn [Watkins 1989].
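Choosing an action from a learned Q needs only a lookup and a maximization, with no model of the environment. The sketch below assumes Q is stored as a dictionary keyed by `(state, action)`; that storage layout and the function name are illustrative assumptions.

```python
def greedy_action(q, state, actions):
    """Return the action with the largest Q value in the given state."""
    return max(actions, key=lambda a: q[(state, a)])


if __name__ == "__main__":
    q = {("s0", "left"): 72.9, ("s0", "right"): 81.0}
    print(greedy_action(q, "s0", ["left", "right"]))
```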
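Training Rule to Learn Q
This allows us to write Q recursively as
$Q(s_t, a_t) = r(s_t, a_t) + \gamma \max_{a'} Q(s_{t+1}, a')$
Nice! Let $\hat{Q}$ denote the learner's current approximation to Q. Consider the training rule
$\hat{Q}(s, a) \leftarrow r + \gamma \max_{a'} \hat{Q}(s', a')$
where s' is the state resulting from applying action a in state s.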
Q Learning for Deterministic Worlds
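A minimal sketch of tabular Q learning in a deterministic world, using the update Q̂(s, a) ← r + γ max over a' of Q̂(s', a'). The four-state corridor, its reward of 100 for entering the goal, the random exploration policy, and all names are hypothetical choices made only to keep the example small.

```python
import random

GAMMA = 0.9
N_STATES = 4          # states 0..3; state 3 is an absorbing goal (assumed toy world)
ACTIONS = (-1, +1)    # move left or move right


def step(s, a):
    """Deterministic transition and reward for the toy corridor."""
    s_next = min(max(s + a, 0), N_STATES - 1)
    reward = 100.0 if s_next == N_STATES - 1 else 0.0
    return s_next, reward


def q_learning(episodes=200, seed=0):
    rng = random.Random(seed)
    # Initialize every table entry to zero.
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        s = rng.randrange(N_STATES - 1)      # start in a non-goal state
        while s != N_STATES - 1:
            a = rng.choice(ACTIONS)          # explore uniformly at random
            s_next, r = step(s, a)
            # Deterministic training rule: overwrite with the one-step backup.
            q[(s, a)] = r + GAMMA * max(q[(s_next, b)] for b in ACTIONS)
            s = s_next
    return q


if __name__ == "__main__":
    table = q_learning()
    for s in range(N_STATES - 1):
        print(s, [round(table[(s, a)], 1) for a in ACTIONS])
```

Because the world is deterministic, the new estimate simply replaces the old one; no learning rate is needed.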
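Updating Q
Notice that if rewards are non-negative, then
$(\forall s, a, n)\quad \hat{Q}_{n+1}(s, a) \ge \hat{Q}_n(s, a)$
and
$(\forall s, a, n)\quad 0 \le \hat{Q}_n(s, a) \le Q(s, a)$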
Convergence Theorem
Note we used the general fact that:
$\left| \max_a f_1(a) - \max_a f_2(a) \right| \le \max_a \left| f_1(a) - f_2(a) \right|$
This works with operators other than max that satisfy this non-expansion property [Szepesvári & Littman, 1999].
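The non-expansion property of max says that the gap between two maxima never exceeds the largest pointwise gap between the two functions. A quick numerical check, as a sketch: the helper names and the random integer test data are assumptions for illustration, and a passing check is evidence rather than a proof.

```python
import random


def max_gap(f1, f2):
    """|max f1 - max f2| for two functions given as equal-length lists."""
    return abs(max(f1) - max(f2))


def sup_gap(f1, f2):
    """max over a of |f1(a) - f2(a)|."""
    return max(abs(x - y) for x, y in zip(f1, f2))


def check_non_expansion(trials=1000, n_actions=5, seed=0):
    """Return True if the inequality holds on every random trial."""
    rng = random.Random(seed)
    for _ in range(trials):
        f1 = [rng.randint(-50, 50) for _ in range(n_actions)]
        f2 = [rng.randint(-50, 50) for _ in range(n_actions)]
        if max_gap(f1, f2) > sup_gap(f1, f2):
            return False
    return True


if __name__ == "__main__":
    print(check_non_expansion())
```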
Nondeterministic Case (1)
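Nondeterministic Case (2)
Q learning generalizes to nondeterministic worlds. Alter the training rule to
$\hat{Q}_n(s, a) \leftarrow (1 - \alpha_n)\,\hat{Q}_{n-1}(s, a) + \alpha_n \left[ r + \gamma \max_{a'} \hat{Q}_{n-1}(s', a') \right]$
where
$\alpha_n = \frac{1}{1 + \mathit{visits}_n(s, a)}$
Can still prove convergence of $\hat{Q}$ to Q [Watkins and Dayan, 1992]. Standard properties: $\sum_n \alpha_n = \infty$, $\sum_n \alpha_n^2 < \infty$.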
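In a nondeterministic world, each sampled backup is averaged into the running estimate instead of overwriting it. The sketch below assumes the decaying learning rate α_n = 1 / (1 + visits_n(s, a)), where the visit count includes the current visit; the closure-based layout and the names `make_learner` and `update` are illustrative assumptions.

```python
from collections import defaultdict


def make_learner(actions, gamma=0.9):
    """Build a Q table plus an update function with a decaying learning rate."""
    q = defaultdict(float)      # Q-hat estimates, zero by default
    visits = defaultdict(int)   # visit counts per (state, action) pair

    def update(s, a, r, s_next):
        visits[(s, a)] += 1
        alpha = 1.0 / (1 + visits[(s, a)])
        # Sampled one-step backup from the observed reward and next state.
        target = r + gamma * max(q[(s_next, b)] for b in actions)
        # Blend the old estimate with the new sample.
        q[(s, a)] = (1 - alpha) * q[(s, a)] + alpha * target
        return q[(s, a)]

    return q, update


if __name__ == "__main__":
    q, update = make_learner(actions=("go",))
    print(update("s", "go", 10.0, "end"))   # first visit, alpha = 1/2
    print(update("s", "go", 10.0, "end"))   # second visit, alpha = 1/3
```

Early samples move the estimate a lot, while later samples move it less, which is what lets the estimate settle despite noisy rewards and transitions.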
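Temporal Difference Learning (1)
Q learning: reduce the discrepancy between successive Q estimates.
One-step time difference:
$Q^{(1)}(s_t, a_t) \equiv r_t + \gamma \max_{a} \hat{Q}(s_{t+1}, a)$
Why not two steps?
$Q^{(2)}(s_t, a_t) \equiv r_t + \gamma r_{t+1} + \gamma^2 \max_{a} \hat{Q}(s_{t+2}, a)$
Or n?
$Q^{(n)}(s_t, a_t) \equiv r_t + \gamma r_{t+1} + \dots + \gamma^{n-1} r_{t+n-1} + \gamma^n \max_{a} \hat{Q}(s_{t+n}, a)$
Blend all of these:
$Q^{\lambda}(s_t, a_t) \equiv (1 - \lambda)\left[ Q^{(1)}(s_t, a_t) + \lambda Q^{(2)}(s_t, a_t) + \lambda^2 Q^{(3)}(s_t, a_t) + \dots \right]$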
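The n-step estimates and their blend can be sketched for a finite trajectory as follows. Two assumptions are made here: `boot[k - 1]` holds the bootstrap value, the max over a of Q̂(s_{t+k}, a), and the infinite blend is truncated by placing all remaining weight on the longest available return so the weights still sum to 1. The function names are hypothetical.

```python
def n_step_return(rewards, boot, n, gamma):
    """First n discounted rewards plus the discounted bootstrap value after n steps."""
    g = sum(gamma ** i * rewards[i] for i in range(n))
    return g + gamma ** n * boot[n - 1]


def lambda_return(rewards, boot, lam, gamma):
    """Blend of 1..N step returns with weights (1 - lam) * lam**(n - 1).

    The final return receives the leftover weight lam**(N - 1)
    (an assumed truncation for finite trajectories).
    """
    n_max = len(rewards)
    total = 0.0
    for n in range(1, n_max):
        total += (1 - lam) * lam ** (n - 1) * n_step_return(rewards, boot, n, gamma)
    total += lam ** (n_max - 1) * n_step_return(rewards, boot, n_max, gamma)
    return total


if __name__ == "__main__":
    rewards = [1.0, 2.0]
    boot = [10.0, 20.0]
    for lam in (0.0, 0.5, 1.0):
        print(lam, lambda_return(rewards, boot, lam, gamma=0.5))
```

With `lam = 0` the blend reduces to the one-step estimate, and with `lam = 1` it uses only the longest return.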
Temporal Difference Learning (2)
Equivalent expression:
$Q^{\lambda}(s_t, a_t) = r_t + \gamma \left[ (1 - \lambda) \max_{a} \hat{Q}(s_{t+1}, a) + \lambda\, Q^{\lambda}(s_{t+1}, a_{t+1}) \right]$
Subtleties and Ongoing Research

