Diese Präsentation wurde erfolgreich gemeldet.
Wir verwenden Ihre LinkedIn Profilangaben und Informationen zu Ihren Aktivitäten, um Anzeigen zu personalisieren und Ihnen relevantere Inhalte anzuzeigen. Sie können Ihre Anzeigeneinstellungen jederzeit ändern.
Reinforcement Learning<br />A Beginner’s TutorialBy: Omar Enayet<br />(Presentation Version)<br />
The Problem<br />
Agent-Environment Interface<br />
Environment Model<br />
Goals & Rewards<br />
Returns<br />
Credit-Assignment Problem<br />
Markov Decision Process<br />An MDP is defined by &lt; S, A, p, r, &gt;<br />S  -  set of states of the environment<br /...
Value Functions<br />
Value Functions<br />
Value Functions<br />
Optimal Value Functions<br />
Exploration-Exploitation Problem<br />
Policies<br />
Elementary Solution Methods<br />
Dynamic Programming<br />
Perfect Model<br />
Bootstrapping<br />
Generalized Policy Iteration<br />
Efficiency of DP<br />
Monte-Carlo Methods<br />
Episodic Return<br />
Advantages over DP<br /><ul><li>No Model
Simulation OR part of Model
Focus on small subset of states
Less Harmed by violations of Markov Property</li></li></ul><li>First Visit VS Every-Visit<br />
On-Policy VS Off-Policy<br />
Action-value instead of State-value<br />
Temporal-Difference Learning<br />
Advantages of TD Learning<br />
SARSA (On-Policy)<br />
Q-Learning (Off-Policy)<br />
Actor-Critic Methods(On-Policy)<br />
R-Learning (Off-Policy)<br />&gt;&gt;Average Expected reward per time-step<br />
Eligibility Traces<br />
Nächste SlideShare
Wird geladen in …5
×

Reinforcement Learning : A Beginners Tutorial

15.941 Aufrufe

Veröffentlicht am

This a presentation of a Reinforcement Learning tutorial for beginners which I worked on.

  • Als Erste(r) kommentieren

Reinforcement Learning : A Beginners Tutorial

  1. 1. Reinforcement Learning<br />A Beginner’s TutorialBy: Omar Enayet<br />(Presentation Version)<br />
  2. 2. The Problem<br />
  3. 3. Agent-Environment Interface<br />
  4. 4. Environment Model<br />
  5. 5. Goals & Rewards<br />
  6. 6. Returns<br />
  7. 7. Credit-Assignment Problem<br />
  8. 8. Markov Decision Process<br />An MDP is defined by &lt; S, A, p, r, &gt;<br />S - set of states of the environment<br />A(s)– set of actions possible in state s<br /> - probability of transition from s<br />- expected reward when executing ain s<br /> - discount rate for expected reward<br />Assumption: discrete timet = 0, 1, 2, . . .<br />r<br />r<br />r<br />t +2<br />t +3<br />. . .<br />s<br />. . .<br />t +1<br />s<br />s<br />s<br />t+3<br />t+1<br />t+2<br />t<br />a<br />a<br />a<br />a<br />t<br />t+1<br />t +2<br />t +3<br />
  9. 9. Value Functions<br />
  10. 10. Value Functions<br />
  11. 11. Value Functions<br />
  12. 12. Optimal Value Functions<br />
  13. 13. Exploration-Exploitation Problem<br />
  14. 14. Policies<br />
  15. 15. Elementary Solution Methods<br />
  16. 16. Dynamic Programming<br />
  17. 17. Perfect Model<br />
  18. 18. Bootstrapping<br />
  19. 19. Generalized Policy Iteration<br />
  20. 20. Efficiency of DP<br />
  21. 21. Monte-Carlo Methods<br />
  22. 22. Episodic Return<br />
  23. 23. Advantages over DP<br /><ul><li>No Model
  24. 24. Simulation OR part of Model
  25. 25. Focus on small subset of states
  26. 26. Less Harmed by violations of Markov Property</li></li></ul><li>First Visit VS Every-Visit<br />
  27. 27. On-Policy VS Off-Policy<br />
  28. 28. Action-value instead of State-value<br />
  29. 29. Temporal-Difference Learning<br />
  30. 30. Advantages of TD Learning<br />
  31. 31. SARSA (On-Policy)<br />
  32. 32. Q-Learning (Off-Policy)<br />
  33. 33.
  34. 34. Actor-Critic Methods(On-Policy)<br />
  35. 35. R-Learning (Off-Policy)<br />&gt;&gt;Average Expected reward per time-step<br />
  36. 36. Eligibility Traces<br />
  37. 37.
  38. 38.
  39. 39. References<br />Richard S. Sutton and Andrew G. Barto. Reinforcement Learning, Bradford Books, 1998.<br />Richard Crouch, Peter Bennett, Stephen Bridges, Nick Piper and Robert Oates - Monte Carlo - 2003<br />Slides for reading with :<br /><ul><li>Omar Enayet – Reinforcement Learning : A Beginner’s Tutorial- 2009</li>

×