Introduction to Machine Learning
Lecture 22
Reinforcement Learning

Albert Orriols i Puig
http://www.albertorriols.net
aorriols@salle.url.edu

Artificial Intelligence – Machine Learning
Enginyeria i Arquitectura La Salle
Universitat Ramon Llull

Recap of Lecture 21
        Value functions
                Vπ(s): Long-term reward estimation from state s following policy π
                Qπ(s,a): Long-term reward estimation from state s executing action a and then following policy π
        The long-term reward is a recency-weighted average of the received rewards

        … r_t, s_t --a_t--> r_{t+1}, s_{t+1} --a_{t+1}--> r_{t+2}, s_{t+2} --a_{t+2}--> r_{t+3}, s_{t+3} --a_{t+3}--> …




Recap of Lecture 21


        Policy
                A policy, π, is a mapping from states, s∈S, and actions,
                a∈A(s), to the probability π(s, a) of taking action a when in
                state s.




Today’s Agenda

        Bellman equations for value functions
        Optimal policy
        Learning the optimal policy
        Q-learning




Let’s Estimate the Future Reward
        I want to estimate what my reward will be given a certain state and a policy π
                For the state-value function Vπ(s)
                For the action-value function Qπ(s,a)
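The equations on this slide did not survive extraction. A hedged reconstruction, assuming the standard discounted-return definitions: the return symbol R_t and the expectation notation are assumptions, chosen to agree with the reward sequence from the recap and the discount factor γ used later in the deck.

```latex
R_t = r_{t+1} + \gamma r_{t+2} + \gamma^2 r_{t+3} + \dots
    = \sum_{k=0}^{\infty} \gamma^k r_{t+k+1}

V^{\pi}(s) = E_{\pi}\{ R_t \mid s_t = s \}

Q^{\pi}(s,a) = E_{\pi}\{ R_t \mid s_t = s,\ a_t = a \}
```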




Bellman Equation for a Policy π
        Playing a little with the equations




        Therefore




        Finally
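The three equations of this slide are missing from the extracted text. The usual derivation has this shape, given here as a sketch rather than the slide's exact content. R_t denotes the discounted return from time t; the transition probabilities P^a_{ss'} and expected rewards R^a_{ss'} are assumed notation that does not appear in the surviving text.

```latex
% Playing with the return: it can be written recursively
R_t = r_{t+1} + \gamma r_{t+2} + \gamma^2 r_{t+3} + \dots
    = r_{t+1} + \gamma R_{t+1}

% Therefore
V^{\pi}(s) = E_{\pi}\{ R_t \mid s_t = s \}
           = E_{\pi}\{ r_{t+1} + \gamma V^{\pi}(s_{t+1}) \mid s_t = s \}

% Finally
V^{\pi}(s) = \sum_{a} \pi(s,a) \sum_{s'} P^{a}_{ss'}
             \left[ R^{a}_{ss'} + \gamma V^{\pi}(s') \right]
```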


Q-value Bellman Equation
        If we estimate the q-value
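The equation itself is missing from the extracted slide. Assuming the same (hypothetical) notation P^a_{ss'} for transition probabilities and R^a_{ss'} for expected rewards, the Bellman equation for the action-value function is commonly written as:

```latex
Q^{\pi}(s,a) = \sum_{s'} P^{a}_{ss'}
               \left[ R^{a}_{ss'} + \gamma \sum_{a'} \pi(s',a')\, Q^{\pi}(s',a') \right]
```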




Calculation of Value Functions
        How to calculate the value functions for a given policy
        1. Solve a set of linear equations
                Bellman equation for Vπ
                This is a system of |S| linear equations
        2. Iterative method (convergence proved)
                Calculate the value by sweeping through the states
        3. Greedy methods
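The first two methods can be compared on a toy problem. The sketch below is a minimal illustration, not taken from the lecture: the two-state transition matrix `P`, the reward vector `R`, and the helper names are all made up. It solves the |S| = 2 linear system directly and then checks that repeated sweeps reach the same values.

```python
# Hypothetical two-state problem under a fixed policy (all numbers are made up).
GAMMA = 0.9
P = [[0.5, 0.5],
     [0.2, 0.8]]   # P[s][s2]: transition probability under the policy
R = [1.0, 0.0]     # R[s]: expected immediate reward in state s


def solve_linear(P, R, gamma):
    """Method 1: solve (I - gamma * P) V = R exactly (Cramer's rule, 2x2 case)."""
    a = 1.0 - gamma * P[0][0]
    b = -gamma * P[0][1]
    c = -gamma * P[1][0]
    d = 1.0 - gamma * P[1][1]
    det = a * d - b * c
    return [(R[0] * d - b * R[1]) / det, (a * R[1] - c * R[0]) / det]


def solve_iterative(P, R, gamma, tol=1e-10):
    """Method 2: sweep through the states until the values stop changing."""
    V = [0.0] * len(R)
    while True:
        delta = 0.0
        for s in range(len(V)):
            new_v = R[s] + gamma * sum(P[s][s2] * V[s2] for s2 in range(len(V)))
            delta = max(delta, abs(new_v - V[s]))
            V[s] = new_v
        if delta < tol:
            return V


V_exact = solve_linear(P, R, GAMMA)
V_sweep = solve_iterative(P, R, GAMMA)
print(V_exact, V_sweep)
```

The direct solve costs more as |S| grows, which is one reason the sweeping method is usually preferred for larger state spaces.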

Example: The Gridworld
        Rewards
                -1 if the agent goes out of the grid
                0 for all other transitions, except those from states A and B
                From A, all four actions yield a reward of 10 and take the agent to A’
                From B, all four actions yield a reward of 5 and take the agent to B’

                (b) obtained by solving the Bellman equations for Vπ with
                          Policy = equal probability for each movement
                          γ=0.9
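The values in panel (b) can be approximated with iterative policy evaluation. The sketch below makes several assumptions the extracted slide does not state: a 5x5 grid, the cell positions of A, A’, B and B’, and that an off-grid move leaves the agent where it was. Only the rewards, the equal-probability policy, and γ=0.9 come from the slide.

```python
GAMMA = 0.9
N = 5                                        # assumed grid size
A, A_PRIME = (0, 1), (4, 1)                  # assumed (row, col) positions
B, B_PRIME = (0, 3), (2, 3)                  # assumed (row, col) positions
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right


def step(state, move):
    """Return (next_state, reward) for one deterministic move."""
    if state == A:
        return A_PRIME, 10.0
    if state == B:
        return B_PRIME, 5.0
    row, col = state[0] + move[0], state[1] + move[1]
    if 0 <= row < N and 0 <= col < N:
        return (row, col), 0.0
    return state, -1.0                       # off-grid: stay put (assumed), reward -1


def evaluate_random_policy(tol=1e-8):
    """Sweep the Bellman equation for the equal-probability policy until stable."""
    V = {(r, c): 0.0 for r in range(N) for c in range(N)}
    while True:
        delta = 0.0
        for s in V:
            new_v = 0.0
            for move in MOVES:
                s2, reward = step(s, move)
                new_v += 0.25 * (reward + GAMMA * V[s2])
            delta = max(delta, abs(new_v - V[s]))
            V[s] = new_v
        if delta < tol:
            return V


V = evaluate_random_policy()
for r in range(N):
    print(" ".join(f"{V[(r, c)]:6.2f}" for c in range(N)))
```

Note that the value of A ends up below its immediate reward of 10, because A’ sits next to the border where the random policy keeps collecting -1.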
Looking for the Optimal Policy




Optimal Policy
        We search for a policy that achieves a lot of reward over the long run
        Value functions enable us to define a partial order over policies
                A policy π is better than or equal to π’ if its expected return is greater than or equal to that of π’ for all states
                Optimal policies π* share the optimal state-value function V*
                Which can be written as
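Both equations of this slide are missing from the extracted text. A plausible reconstruction, assuming the standard definitions and the hypothetical notation P^a_{ss'} (transition probabilities) and R^a_{ss'} (expected rewards):

```latex
% Optimal state-value function shared by all optimal policies
V^{*}(s) = \max_{\pi} V^{\pi}(s) \quad \text{for all } s \in S

% Which can be written as (Bellman optimality equation)
V^{*}(s) = \max_{a \in A(s)} \sum_{s'} P^{a}_{ss'}
           \left[ R^{a}_{ss'} + \gamma V^{*}(s') \right]
```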



Learning Optimal Policies




Focusing on the Objective
        We want to find the optimal policy
        There are many methods for this purpose
                Dynamic programming
                          Policy iteration
                          Value iteration
                          [Asynchronous versions]
                RL algorithms
                          Q-learning
                          Sarsa
                          TD-learning

        We are going to see Q-learning

Q-learning
        RL algorithms
                Learning by doing

        Temporal difference method
                Learn directly from raw experience without a model of the environment’s dynamics

        Advantages
                No model of the world needed
                Good policies before learning the optimal policy
                Reacts to changes in the environment

Dynamic Programming in Brief




        Needs a model of the environment to compute true expected values
        A very informative backup
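The backup diagram of this slide is not recoverable. As a hedged reminder of what a dynamic-programming backup usually looks like, the update below takes an expectation over every action and successor state, which is why it needs the model. The notation is the standard one and is assumed, not taken from the slide.

```latex
V(s_t) \leftarrow E_{\pi}\{ r_{t+1} + \gamma V(s_{t+1}) \}
       = \sum_{a} \pi(s_t,a) \sum_{s'} P^{a}_{s_t s'}
         \left[ R^{a}_{s_t s'} + \gamma V(s') \right]
```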
Temporal Difference Learning




        No model of the world needed
        Most incremental
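The update rule shown on the slide is missing from the extracted text. The simplest temporal-difference update, TD(0), is commonly written as below. It uses only the single observed transition, so no model is needed and the estimate can be updated after every step. The step size α is the learning rate discussed at the end of the lecture.

```latex
V(s_t) \leftarrow V(s_t) + \alpha \left[ r_{t+1} + \gamma V(s_{t+1}) - V(s_t) \right]
```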
Q-learning
        Based on Q-backups




                The learned action-value function Q directly approximates Q*,
                independent of the policy being followed
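The Q-backup equation did not survive extraction. The standard one-step Q-learning update, given here as an assumed reconstruction, bootstraps from the best action in the next state regardless of which action the agent actually takes next. That is what makes the learned Q independent of the policy being followed.

```latex
Q(s_t,a_t) \leftarrow Q(s_t,a_t)
  + \alpha \left[ r_{t+1} + \gamma \max_{a} Q(s_{t+1},a) - Q(s_t,a_t) \right]
```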




Q-learning: Pseudo code
        Pseudo code for Q-learning
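The pseudo code on this slide was an image and is not recoverable. The following is a minimal tabular Q-learning sketch under stated assumptions, not the lecture's own listing. It uses a small open 5x5 grid rather than a maze, an ε-greedy behaviour policy with ε=0.1, and random tie-breaking; all function names are hypothetical. The values α=0.65, γ=0.9 and the reward of 1 at the goal are borrowed from the maze example that follows.

```python
import random

ALPHA, GAMMA, EPSILON = 0.65, 0.9, 0.1   # EPSILON is an assumption
SIZE = 5                                  # assumed small open grid, no walls
START, GOAL = (0, 0), (SIZE - 1, SIZE - 1)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def step(state, action):
    """Move one cell; bumping into the border leaves the agent in place."""
    row = min(max(state[0] + action[0], 0), SIZE - 1)
    col = min(max(state[1] + action[1], 0), SIZE - 1)
    next_state = (row, col)
    return next_state, (1.0 if next_state == GOAL else 0.0)


def greedy(Q, state, rng):
    """Pick an action with maximal Q-value, breaking ties at random."""
    best = max(Q[(state, a)] for a in ACTIONS)
    return rng.choice([a for a in ACTIONS if Q[(state, a)] == best])


def q_learning(episodes=500, max_steps=1000, seed=0):
    rng = random.Random(seed)
    Q = {((r, c), a): 0.0
         for r in range(SIZE) for c in range(SIZE) for a in ACTIONS}
    for _ in range(episodes):
        state = START
        for _ in range(max_steps):
            # epsilon-greedy behaviour policy
            if rng.random() < EPSILON:
                action = rng.choice(ACTIONS)
            else:
                action = greedy(Q, state, rng)
            next_state, reward = step(state, action)
            # Q-backup: bootstrap from the best action in the next state
            best_next = 0.0 if next_state == GOAL else max(
                Q[(next_state, a)] for a in ACTIONS)
            Q[(state, action)] += ALPHA * (
                reward + GAMMA * best_next - Q[(state, action)])
            state = next_state
            if state == GOAL:
                break
    return Q


def greedy_path(Q, max_steps=100, seed=0):
    """Follow the learned greedy policy from START."""
    rng = random.Random(seed)
    state, path = START, [START]
    while state != GOAL and len(path) <= max_steps:
        state, _ = step(state, greedy(Q, state, rng))
        path.append(state)
    return path


Q = q_learning()
path = greedy_path(Q)
print(len(path) - 1, "steps:", path)
```

Breaking ties at random matters here: with all Q-values initialised to zero, a fixed tie-break would make the agent repeat the same action until an exploratory step happens.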




Q-learning in Action
15x15 maze world; R(goal)=1; R(other)=0

γ=0.9
α=0.65




[Figures: the policy learned in the maze, shown initially and after 20, 30, 100, 150, 200, 250, 300, 350, and 400 episodes]

Some Last Remarks
        Exploration regime
                Explore vs. exploit
                          ε-greedy action selection
                          Soft-max action selection
                Initialization of Q-values: be optimistic
                Learning rate α
                          In stationary environments
                              α(s) = 1 / (number of visits to state s)
                          In non-stationary environments
                              α takes a constant value
                              The higher the value, the higher the influence of recent experiences
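The two action-selection rules and the two learning-rate regimes can be sketched as follows. This is an illustration under assumptions: the function names, the temperature parameter of the soft-max rule, and the sample reward sequence are hypothetical and do not come from the slides.

```python
import math
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon explore uniformly; otherwise exploit the best action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def softmax_probabilities(q_values, temperature=1.0):
    """Boltzmann distribution over actions; a higher temperature explores more."""
    top = max(q_values)                      # subtract the max for numerical stability
    weights = [math.exp((q - top) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def softmax_action(q_values, temperature=1.0, rng=random):
    """Sample an action index according to the soft-max probabilities."""
    probs = softmax_probabilities(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


def update(estimate, target, alpha):
    """Incremental update: move the estimate a fraction alpha toward the target."""
    return estimate + alpha * (target - estimate)


rewards = [1.0, 0.0, 0.0, 1.0]               # made-up reward sequence

# Stationary case: alpha = 1 / visits reproduces the plain sample average.
average = 0.0
for visits, reward in enumerate(rewards, start=1):
    average = update(average, reward, 1.0 / visits)

# Non-stationary case: a constant alpha weights recent rewards more heavily.
recent = 0.0
for reward in rewards:
    recent = update(recent, reward, 0.5)

probs = softmax_probabilities([1.0, 2.0, 3.0])
print(average, recent, probs)
```

With the constant step size, the last reward alone contributes half of the final estimate, which is the "influence of recent experiences" the slide refers to.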




Next Class

        Reinforcement learning with LCSs




                                                Slide 31
Artificial Intelligence      Machine Learning
Introduction to Machine
       Learning
                  Lecture 22
      Reinforcement Learning

                 Albert Orriols i Puig
             http://www.albertorriols.net
             htt //       lb t i l      t
                aorriols@salle.url.edu

      Artificial Intelligence – Machine Learning
                        g                      g
          Enginyeria i Arquitectura La Salle
                 Universitat Ramon Llull

Weitere ähnliche Inhalte

Was ist angesagt?

New Challenges in Learning Classifier Systems: Mining Rarities and Evolving F...
New Challenges in Learning Classifier Systems: Mining Rarities and Evolving F...New Challenges in Learning Classifier Systems: Mining Rarities and Evolving F...
New Challenges in Learning Classifier Systems: Mining Rarities and Evolving F...Albert Orriols-Puig
 
CCIA'2008: Can Evolution Strategies Improve Learning Guidance in XCS? Design ...
CCIA'2008: Can Evolution Strategies Improve Learning Guidance in XCS? Design ...CCIA'2008: Can Evolution Strategies Improve Learning Guidance in XCS? Design ...
CCIA'2008: Can Evolution Strategies Improve Learning Guidance in XCS? Design ...Albert Orriols-Puig
 
4th grade math curriculum map 2011 2012-1
4th grade math curriculum map 2011 2012-14th grade math curriculum map 2011 2012-1
4th grade math curriculum map 2011 2012-1Isaac_Schools_5
 
25 introduction reinforcement_learning
25 introduction reinforcement_learning25 introduction reinforcement_learning
25 introduction reinforcement_learningAndres Mendez-Vazquez
 
HIS'2008: New Crossover Operator for Evolutionary Rule Discovery in XCS
HIS'2008: New Crossover Operator for Evolutionary Rule Discovery in XCSHIS'2008: New Crossover Operator for Evolutionary Rule Discovery in XCS
HIS'2008: New Crossover Operator for Evolutionary Rule Discovery in XCSAlbert Orriols-Puig
 
A QUICK INTRODUCTION TO DEEP LEARNING
A QUICK INTRODUCTION TO DEEP LEARNINGA QUICK INTRODUCTION TO DEEP LEARNING
A QUICK INTRODUCTION TO DEEP LEARNINGVishalChitkara4
 
Acoustic Features to Predict Topic Change
Acoustic Features to Predict Topic ChangeAcoustic Features to Predict Topic Change
Acoustic Features to Predict Topic ChangeVineet Kumar
 
Reinforcement Learning (DLAI D7L2 2017 UPC Deep Learning for Artificial Intel...
Reinforcement Learning (DLAI D7L2 2017 UPC Deep Learning for Artificial Intel...Reinforcement Learning (DLAI D7L2 2017 UPC Deep Learning for Artificial Intel...
Reinforcement Learning (DLAI D7L2 2017 UPC Deep Learning for Artificial Intel...Universitat Politècnica de Catalunya
 
P04 restricted boltzmann machines cvpr2012 deep learning methods for vision
P04 restricted boltzmann machines cvpr2012 deep learning methods for visionP04 restricted boltzmann machines cvpr2012 deep learning methods for vision
P04 restricted boltzmann machines cvpr2012 deep learning methods for visionzukun
 
ACM ICMI Workshop 2012
ACM ICMI Workshop 2012ACM ICMI Workshop 2012
ACM ICMI Workshop 2012Lê Anh
 
chapter18.doc.doc
chapter18.doc.docchapter18.doc.doc
chapter18.doc.docbutest
 
NNFL 4 - Guru Nanak Dev Engineering College
NNFL  4 - Guru Nanak Dev Engineering CollegeNNFL  4 - Guru Nanak Dev Engineering College
NNFL 4 - Guru Nanak Dev Engineering CollegeMR. VIKRAM SNEHI
 

Was ist angesagt? (20)

Lecture2 - Machine Learning
Lecture2 - Machine LearningLecture2 - Machine Learning
Lecture2 - Machine Learning
 
Lecture6 - C4.5
Lecture6 - C4.5Lecture6 - C4.5
Lecture6 - C4.5
 
Lecture11 - neural networks
Lecture11 - neural networksLecture11 - neural networks
Lecture11 - neural networks
 
Lecture17
Lecture17Lecture17
Lecture17
 
Lecture24
Lecture24Lecture24
Lecture24
 
Lecture4 - Machine Learning
Lecture4 - Machine LearningLecture4 - Machine Learning
Lecture4 - Machine Learning
 
New Challenges in Learning Classifier Systems: Mining Rarities and Evolving F...
New Challenges in Learning Classifier Systems: Mining Rarities and Evolving F...New Challenges in Learning Classifier Systems: Mining Rarities and Evolving F...
New Challenges in Learning Classifier Systems: Mining Rarities and Evolving F...
 
CCIA'2008: Can Evolution Strategies Improve Learning Guidance in XCS? Design ...
CCIA'2008: Can Evolution Strategies Improve Learning Guidance in XCS? Design ...CCIA'2008: Can Evolution Strategies Improve Learning Guidance in XCS? Design ...
CCIA'2008: Can Evolution Strategies Improve Learning Guidance in XCS? Design ...
 
Lecture5 - C4.5
Lecture5 - C4.5Lecture5 - C4.5
Lecture5 - C4.5
 
4th grade math curriculum map 2011 2012-1
4th grade math curriculum map 2011 2012-14th grade math curriculum map 2011 2012-1
4th grade math curriculum map 2011 2012-1
 
25 introduction reinforcement_learning
25 introduction reinforcement_learning25 introduction reinforcement_learning
25 introduction reinforcement_learning
 
HIS'2008: New Crossover Operator for Evolutionary Rule Discovery in XCS
HIS'2008: New Crossover Operator for Evolutionary Rule Discovery in XCSHIS'2008: New Crossover Operator for Evolutionary Rule Discovery in XCS
HIS'2008: New Crossover Operator for Evolutionary Rule Discovery in XCS
 
A QUICK INTRODUCTION TO DEEP LEARNING
A QUICK INTRODUCTION TO DEEP LEARNINGA QUICK INTRODUCTION TO DEEP LEARNING
A QUICK INTRODUCTION TO DEEP LEARNING
 
Acoustic Features to Predict Topic Change
Acoustic Features to Predict Topic ChangeAcoustic Features to Predict Topic Change
Acoustic Features to Predict Topic Change
 
Reinforcement Learning (DLAI D7L2 2017 UPC Deep Learning for Artificial Intel...
Reinforcement Learning (DLAI D7L2 2017 UPC Deep Learning for Artificial Intel...Reinforcement Learning (DLAI D7L2 2017 UPC Deep Learning for Artificial Intel...
Reinforcement Learning (DLAI D7L2 2017 UPC Deep Learning for Artificial Intel...
 
P04 restricted boltzmann machines cvpr2012 deep learning methods for vision
P04 restricted boltzmann machines cvpr2012 deep learning methods for visionP04 restricted boltzmann machines cvpr2012 deep learning methods for vision
P04 restricted boltzmann machines cvpr2012 deep learning methods for vision
 
ACM ICMI Workshop 2012
ACM ICMI Workshop 2012ACM ICMI Workshop 2012
ACM ICMI Workshop 2012
 
JavaYDL20
JavaYDL20JavaYDL20
JavaYDL20
 
chapter18.doc.doc
chapter18.doc.docchapter18.doc.doc
chapter18.doc.doc
 
NNFL 4 - Guru Nanak Dev Engineering College
NNFL  4 - Guru Nanak Dev Engineering CollegeNNFL  4 - Guru Nanak Dev Engineering College
NNFL 4 - Guru Nanak Dev Engineering College
 

Andere mochten auch (19)

Lecture19
Lecture19Lecture19
Lecture19
 
Lecture1 AI1 Introduction to artificial intelligence
Lecture1 AI1 Introduction to artificial intelligenceLecture1 AI1 Introduction to artificial intelligence
Lecture1 AI1 Introduction to artificial intelligence
 
Lecture14 - Advanced topics in association rules
Lecture14 - Advanced topics in association rulesLecture14 - Advanced topics in association rules
Lecture14 - Advanced topics in association rules
 
HAIS09-BeyondHomemadeArtificialDatasets
HAIS09-BeyondHomemadeArtificialDatasetsHAIS09-BeyondHomemadeArtificialDatasets
HAIS09-BeyondHomemadeArtificialDatasets
 
Lecture8 - From CBR to IBk
Lecture8 - From CBR to IBkLecture8 - From CBR to IBk
Lecture8 - From CBR to IBk
 
Lecture9 - Bayesian-Decision-Theory
Lecture9 - Bayesian-Decision-TheoryLecture9 - Bayesian-Decision-Theory
Lecture9 - Bayesian-Decision-Theory
 
Lecture12 - SVM
Lecture12 - SVMLecture12 - SVM
Lecture12 - SVM
 
Lecture10 - Naïve Bayes
Lecture10 - Naïve BayesLecture10 - Naïve Bayes
Lecture10 - Naïve Bayes
 
Lecture13 - Association Rules
Lecture13 - Association RulesLecture13 - Association Rules
Lecture13 - Association Rules
 
Day 9 routing
Day 9 routingDay 9 routing
Day 9 routing
 
Routing
RoutingRouting
Routing
 
Routing Presentation
Routing PresentationRouting Presentation
Routing Presentation
 
Algorithmic Puzzles
Algorithmic PuzzlesAlgorithmic Puzzles
Algorithmic Puzzles
 
Ad hoc Networks
Ad hoc NetworksAd hoc Networks
Ad hoc Networks
 
Unit 7
Unit 7Unit 7
Unit 7
 
Decision tree Using c4.5 Algorithm
Decision tree Using c4.5 AlgorithmDecision tree Using c4.5 Algorithm
Decision tree Using c4.5 Algorithm
 
Lecture 11 14. Adhoc routing protocols cont..
Lecture 11 14. Adhoc  routing protocols cont..Lecture 11 14. Adhoc  routing protocols cont..
Lecture 11 14. Adhoc routing protocols cont..
 
Ad-HOc presentation
Ad-HOc presentationAd-HOc presentation
Ad-HOc presentation
 
Ch2 properties of the task environment
Ch2 properties of the task environmentCh2 properties of the task environment
Ch2 properties of the task environment
 

Ähnlich wie Lecture22

Reinforcement Learning
Reinforcement LearningReinforcement Learning
Reinforcement LearningSalem-Kabbani
 
Reinforcement learning
Reinforcement learningReinforcement learning
Reinforcement learningDongHyun Kwak
 
A Brief Survey of Reinforcement Learning
A Brief Survey of Reinforcement LearningA Brief Survey of Reinforcement Learning
A Brief Survey of Reinforcement LearningGiancarlo Frison
 
“Vision-language Representations for Robotics,” a Presentation from the Unive...
“Vision-language Representations for Robotics,” a Presentation from the Unive...“Vision-language Representations for Robotics,” a Presentation from the Unive...
“Vision-language Representations for Robotics,” a Presentation from the Unive...Edge AI and Vision Alliance
 
“Reinforcement Learning: a Practical Introduction,” a Presentation from Micro...
“Reinforcement Learning: a Practical Introduction,” a Presentation from Micro...“Reinforcement Learning: a Practical Introduction,” a Presentation from Micro...
“Reinforcement Learning: a Practical Introduction,” a Presentation from Micro...Edge AI and Vision Alliance
 
Reinforcement learning
Reinforcement learningReinforcement learning
Reinforcement learningDing Li
 
AI driven classification framework for advanced Test Automation
AI driven classification framework for advanced Test AutomationAI driven classification framework for advanced Test Automation
AI driven classification framework for advanced Test AutomationSTePINForum
 
Fundamental of Machine Learning
Fundamental of Machine LearningFundamental of Machine Learning
Fundamental of Machine LearningSARCCOM
 
Reinforcement Learning (Reloaded) - Xavier Giró-i-Nieto - UPC Barcelona 2018
Reinforcement Learning (Reloaded) - Xavier Giró-i-Nieto - UPC Barcelona 2018Reinforcement Learning (Reloaded) - Xavier Giró-i-Nieto - UPC Barcelona 2018
Reinforcement Learning (Reloaded) - Xavier Giró-i-Nieto - UPC Barcelona 2018Universitat Politècnica de Catalunya
 
Jay Yagnik at AI Frontiers : A History Lesson on AI
Jay Yagnik at AI Frontiers : A History Lesson on AIJay Yagnik at AI Frontiers : A History Lesson on AI
Jay Yagnik at AI Frontiers : A History Lesson on AIAI Frontiers
 
Machine learning Vs Deep learning Vs Reinforcement learning | Pydata Mumbai
Machine learning Vs Deep learning Vs Reinforcement learning | Pydata Mumbai Machine learning Vs Deep learning Vs Reinforcement learning | Pydata Mumbai
Machine learning Vs Deep learning Vs Reinforcement learning | Pydata Mumbai Pratik Bhavsar
 
Demystifying deep reinforement learning
Demystifying deep reinforement learningDemystifying deep reinforement learning
Demystifying deep reinforement learning재연 윤
 
TensorFlow London 17: Practical Reinforcement Learning with OpenAI
TensorFlow London 17: Practical Reinforcement Learning with OpenAITensorFlow London 17: Practical Reinforcement Learning with OpenAI
TensorFlow London 17: Practical Reinforcement Learning with OpenAISeldon
 
Reinforcement Learning
Reinforcement LearningReinforcement Learning
Reinforcement LearningDongHyun Kwak
 

Ähnlich wie Lecture22 (20)

World models v0.14
World models v0.14World models v0.14
World models v0.14
 
Reinforcement Learning
Reinforcement LearningReinforcement Learning
Reinforcement Learning
 
Reinforcement learning
Reinforcement learningReinforcement learning
Reinforcement learning
 
A Brief Survey of Reinforcement Learning
A Brief Survey of Reinforcement LearningA Brief Survey of Reinforcement Learning
A Brief Survey of Reinforcement Learning
 
Introduction to Deep Reinforcement Learning
Introduction to Deep Reinforcement LearningIntroduction to Deep Reinforcement Learning
Introduction to Deep Reinforcement Learning
 
“Vision-language Representations for Robotics,” a Presentation from the Unive...
“Vision-language Representations for Robotics,” a Presentation from the Unive...“Vision-language Representations for Robotics,” a Presentation from the Unive...
“Vision-language Representations for Robotics,” a Presentation from the Unive...
 
Playing Atari with Deep Reinforcement Learning
Playing Atari with Deep Reinforcement LearningPlaying Atari with Deep Reinforcement Learning
Playing Atari with Deep Reinforcement Learning
 
Reinforcement Learning - DQN
Reinforcement Learning - DQNReinforcement Learning - DQN
Reinforcement Learning - DQN
 
Deep Reinforcement Learning
Deep Reinforcement LearningDeep Reinforcement Learning
Deep Reinforcement Learning
 
“Reinforcement Learning: a Practical Introduction,” a Presentation from Micro...
“Reinforcement Learning: a Practical Introduction,” a Presentation from Micro...“Reinforcement Learning: a Practical Introduction,” a Presentation from Micro...
“Reinforcement Learning: a Practical Introduction,” a Presentation from Micro...
 
Reinforcement learning
Reinforcement learningReinforcement learning
Reinforcement learning
 
AI driven classification framework for advanced Test Automation
AI driven classification framework for advanced Test AutomationAI driven classification framework for advanced Test Automation
AI driven classification framework for advanced Test Automation
 
Fundamental of Machine Learning
Fundamental of Machine LearningFundamental of Machine Learning
Fundamental of Machine Learning
 
Reinforcement Learning (Reloaded) - Xavier Giró-i-Nieto - UPC Barcelona 2018
Reinforcement Learning (Reloaded) - Xavier Giró-i-Nieto - UPC Barcelona 2018Reinforcement Learning (Reloaded) - Xavier Giró-i-Nieto - UPC Barcelona 2018
Reinforcement Learning (Reloaded) - Xavier Giró-i-Nieto - UPC Barcelona 2018
 
Direct policy search
Direct policy searchDirect policy search
Direct policy search
 
Jay Yagnik at AI Frontiers : A History Lesson on AI
Jay Yagnik at AI Frontiers : A History Lesson on AIJay Yagnik at AI Frontiers : A History Lesson on AI
Jay Yagnik at AI Frontiers : A History Lesson on AI
 
Machine learning Vs Deep learning Vs Reinforcement learning | Pydata Mumbai
Machine learning Vs Deep learning Vs Reinforcement learning | Pydata Mumbai Machine learning Vs Deep learning Vs Reinforcement learning | Pydata Mumbai
Machine learning Vs Deep learning Vs Reinforcement learning | Pydata Mumbai
 
Demystifying deep reinforement learning
Demystifying deep reinforement learningDemystifying deep reinforement learning
Demystifying deep reinforement learning
 
TensorFlow London 17: Practical Reinforcement Learning with OpenAI
TensorFlow London 17: Practical Reinforcement Learning with OpenAITensorFlow London 17: Practical Reinforcement Learning with OpenAI
TensorFlow London 17: Practical Reinforcement Learning with OpenAI
 
Reinforcement Learning
Reinforcement LearningReinforcement Learning
Reinforcement Learning
 

Kürzlich hochgeladen

Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Celine George
 
Concurrency Control in Database Management system
Concurrency Control in Database Management systemConcurrency Control in Database Management system
Concurrency Control in Database Management systemChristalin Nelson
 
ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4MiaBumagat1
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Celine George
 
Judging the Relevance and worth of ideas part 2.pptx
Judging the Relevance  and worth of ideas part 2.pptxJudging the Relevance  and worth of ideas part 2.pptx
Judging the Relevance and worth of ideas part 2.pptxSherlyMaeNeri
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxAnupkumar Sharma
 
Karra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptxKarra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptxAshokKarra1
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatYousafMalik24
 
4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptxmary850239
 
How to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPHow to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPCeline George
 
Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...Seán Kennedy
 
ACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfSpandanaRallapalli
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Mark Reed
 
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...JhezDiaz1
 
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfGrade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfJemuel Francisco
 
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdfVirtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdfErwinPantujan2
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxHumphrey A Beña
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxiammrhaywood
 

Kürzlich hochgeladen (20)

Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17
 
Concurrency Control in Database Management system
Concurrency Control in Database Management systemConcurrency Control in Database Management system
Concurrency Control in Database Management system
 
ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17

Lecture22

  • 1. Introduction to Machine Learning, Lecture 22: Reinforcement Learning. Albert Orriols i Puig, http://www.albertorriols.net, aorriols@salle.url.edu. Artificial Intelligence – Machine Learning, Enginyeria i Arquitectura La Salle, Universitat Ramon Llull
  • 2. Recap of Lecture 21: Value functions. Vπ(s) is the long-term reward estimation from state s following policy π. Qπ(s,a) is the long-term reward estimation from state s executing action a and then following policy π. The long-term reward is a recency-weighted average of the received rewards. [Figure: trajectory … s_t, a_t, r_{t+1}, s_{t+1}, a_{t+1}, r_{t+2}, s_{t+2}, a_{t+2}, r_{t+3}, s_{t+3}, a_{t+3}, …]
  • 3. Recap of Lecture 21: Policy. A policy, π, is a mapping from states, s∈S, and actions, a∈A(s), to the probability π(s, a) of taking action a when in state s.
  • 4. Today’s Agenda: Bellman equations for value functions; optimal policy; learning the optimal policy; Q-learning.
  • 5. Let’s Estimate the Future Reward: I want to estimate what my reward will be, given a certain state and a policy π, for the state-value function Vπ(s) and for the action-value function Qπ(s,a).
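The equations on this slide are not part of the transcript. Assuming the lecture uses the usual discounted-return definitions (the discount factor γ appears later in the gridworld example; the expectation notation E_π is an assumption), the two estimates would read roughly as follows:

```latex
% Discounted return from time t (assumed definition)
R_t = \sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1}

% State-value function: expected return starting in s and following \pi
V^{\pi}(s) = E_{\pi}\left[ R_t \mid s_t = s \right]

% Action-value function: expected return starting in s, taking a, then following \pi
Q^{\pi}(s,a) = E_{\pi}\left[ R_t \mid s_t = s,\; a_t = a \right]
```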
  • 6. Bellman Equation for a Policy π: Playing a little with the equations [equation]. Therefore [equation]. Finally [equation].
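The three equations of this slide are missing from the transcript. A plausible reconstruction, assuming the standard derivation with discounted return R_t, transition probabilities P^a_{ss'} and expected rewards R^a_{ss'} (these two symbols are assumptions, not taken from the slides), is:

```latex
% Step 1: split the return into the next reward plus the discounted rest
R_t = r_{t+1} + \gamma \sum_{k=0}^{\infty} \gamma^{k} r_{t+k+2} = r_{t+1} + \gamma R_{t+1}

% Step 2: take the expectation under \pi
V^{\pi}(s) = E_{\pi}\left[ r_{t+1} + \gamma V^{\pi}(s_{t+1}) \mid s_t = s \right]

% Step 3: expand the expectation over actions and successor states
V^{\pi}(s) = \sum_{a} \pi(s,a) \sum_{s'} P^{a}_{ss'} \left[ R^{a}_{ss'} + \gamma V^{\pi}(s') \right]
```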
  • 7. Q-value Bellman Equation: If we estimate the q-value [equation].
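The slide's equation is not in the transcript. Under the same assumptions as the usual state-value form (transition probabilities P^a_{ss'} and expected rewards R^a_{ss'} are assumed symbols), the action-value version would be:

```latex
Q^{\pi}(s,a) = \sum_{s'} P^{a}_{ss'} \left[ R^{a}_{ss'} + \gamma \sum_{a'} \pi(s',a')\, Q^{\pi}(s',a') \right]
```

The only difference from the state-value form is where the policy average sits: here the first action a is fixed, and π is applied from the successor state s' onward.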
  • 8. Calculation of Value Functions: How to calculate the value functions for a given policy. 1. Solve a set of linear equations: the Bellman equation for Vπ is a system of |S| linear equations. 2. Iterative method (convergence proved): calculate the value by sweeping through the states. 3. Greedy methods.
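As a sketch of the first two options, the Bellman equation for Vπ can be written in matrix form and either solved directly or iterated. The vector and matrix symbols below (R^π, P^π) and the sweep index k are assumptions introduced for illustration:

```latex
% Direct solution: |S| linear equations in |S| unknowns
V^{\pi} = R^{\pi} + \gamma P^{\pi} V^{\pi}
\quad\Longrightarrow\quad
V^{\pi} = \left( I - \gamma P^{\pi} \right)^{-1} R^{\pi}

% Iterative method: repeat one sweep over all states until V_k stops changing
V_{k+1}(s) = \sum_{a} \pi(s,a) \sum_{s'} P^{a}_{ss'} \left[ R^{a}_{ss'} + \gamma V_{k}(s') \right]
```

The direct solution costs roughly a matrix inversion in |S|, which is why the iterative sweep is usually preferred for large state spaces.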
  • 9. Example: The Gridworld. Rewards: -1 if the agent goes out of the grid; 0 for all other moves, except those from states A and B. From A, all four actions yield a reward of 10 and take the agent to A’. From B, all four actions yield a reward of 5 and take the agent to B’. Figure panel (b): values obtained by solving the Bellman equation for Vπ, with policy = equal probability for each movement and γ=0.9.
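The gridworld can be evaluated with a few lines of iterative policy evaluation. This is a minimal sketch, not the lecture's code: the 5x5 size, the positions of A, A’, B and B’, and the rule that an off-grid move leaves the agent where it was are all assumptions borrowed from the textbook version of this example, since the figure is not in the transcript.

```python
# Iterative policy evaluation for the equiprobable random policy (pure Python).
# ASSUMED layout (figure missing from the transcript): 5x5 grid,
# A=(0,1) -> A'=(4,1), B=(0,3) -> B'=(2,3), (row, col) indexing from the top-left.
GAMMA = 0.9
N = 5
A, A_PRIME = (0, 1), (4, 1)
B, B_PRIME = (0, 3), (2, 3)
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(state, move):
    """Return (next_state, reward) for a deterministic move."""
    if state == A:
        return A_PRIME, 10.0
    if state == B:
        return B_PRIME, 5.0
    row, col = state[0] + move[0], state[1] + move[1]
    if 0 <= row < N and 0 <= col < N:
        return (row, col), 0.0
    return state, -1.0  # assumed: off-grid move keeps the agent in place, reward -1


def evaluate_random_policy(tol=1e-8):
    """Sweep through the states until the largest change falls below tol."""
    values = {(r, c): 0.0 for r in range(N) for c in range(N)}
    while True:
        delta = 0.0
        for s in values:
            new_v = 0.0
            for move in MOVES:
                s2, reward = step(s, move)
                new_v += 0.25 * (reward + GAMMA * values[s2])  # pi(s,a) = 1/4
            delta = max(delta, abs(new_v - values[s]))
            values[s] = new_v  # in-place update within the sweep
        if delta < tol:
            return values


values = evaluate_random_policy()
for r in range(N):
    print(" ".join(f"{values[(r, c)]:5.1f}" for c in range(N)))
```

Under these assumptions the value of A comes out below its immediate reward of 10, because A’ sits on the bottom edge where the random policy often bumps into the wall.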
  • 10. Looking for the Optimal Policy
  • 11. Optimal Policy: We search for a policy that achieves a lot of reward over the long run. Value functions enable us to define a partial order over policies: a policy π is better than or equal to π’ if its expected return is greater than or equal to that of π’ for all states. Optimal policies π* share the optimal state-value function V*, which can be written as [equation].
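The expression for V* did not survive in the transcript. Assuming the usual Bellman optimality form (with P^a_{ss'} and R^a_{ss'} as assumed symbols for transition probabilities and expected rewards), it would be:

```latex
% Partial order over policies
\pi \geq \pi' \iff V^{\pi}(s) \geq V^{\pi'}(s) \quad \text{for all } s \in S

% Optimal state-value function and its Bellman optimality equation
V^{*}(s) = \max_{\pi} V^{\pi}(s)
         = \max_{a} \sum_{s'} P^{a}_{ss'} \left[ R^{a}_{ss'} + \gamma V^{*}(s') \right]

% Corresponding form for action values
Q^{*}(s,a) = \sum_{s'} P^{a}_{ss'} \left[ R^{a}_{ss'} + \gamma \max_{a'} Q^{*}(s',a') \right]
```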
  • 12. Learning Optimal Policies
  • 13. Focusing on the Objective: We want to find the optimal policy. There are many methods for this purpose. Dynamic programming: policy iteration, value iteration [asynchronous versions]. RL algorithms: Q-learning, Sarsa, TD-learning. We are going to see Q-learning.
  • 14. Q-learning: RL algorithms learn by doing. Temporal difference method: learn directly from raw experience without a model of the environment’s dynamics. Advantages: no model of the world needed; good policies before learning the optimal policy; reacts to changes in the environment.
  • 15. Dynamic Programming in Brief: Needs a model of the environment to compute true expected values. A very informative backup.
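The backup diagram and equation of this slide are missing. A sketch of what a dynamic programming backup typically looks like, assuming the model is given as P^a_{ss'} and R^a_{ss'} (assumed symbols):

```latex
% Full backup: averages over every action and every successor state, using the model
V(s_t) \leftarrow E_{\pi}\left[ r_{t+1} + \gamma V(s_{t+1}) \right]
       = \sum_{a} \pi(s_t,a) \sum_{s'} P^{a}_{s_t s'} \left[ R^{a}_{s_t s'} + \gamma V(s') \right]
```

It is "very informative" in the sense that one backup uses all possible successors, which is only possible when the model is known.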
  • 16. Temporal Difference Learning: No model of the world needed. Most incremental.
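The update rule shown on this slide is not in the transcript. Assuming it is the one-step temporal difference update, TD(0), with learning rate α, it would read:

```latex
% Sample backup: uses only the single observed transition (s_t, r_{t+1}, s_{t+1})
V(s_t) \leftarrow V(s_t) + \alpha \left[ r_{t+1} + \gamma V(s_{t+1}) - V(s_t) \right]
```

The bracketed term is the TD error: the difference between the one-step bootstrapped target and the current estimate.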
  • 17. Q-learning: Based on Q-backups. The learned action-value function Q directly approximates Q*, independent of the policy being followed.
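The Q-backup itself is missing from the transcript. In its standard one-step form, with learning rate α and discount factor γ, it is:

```latex
Q(s_t,a_t) \leftarrow Q(s_t,a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a} Q(s_{t+1},a) - Q(s_t,a_t) \right]
```

The max over next actions is what makes the target independent of the action the behaviour policy actually takes next, so Q can approximate Q* while the agent is still exploring.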
  • 18. Q-learning: Pseudo code for Q-learning
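The pseudo code on this slide is not preserved, so the following is a minimal tabular Q-learning sketch rather than the lecture's own listing. It uses γ=0.9, α=0.65 and R(goal)=1, R(other)=0 from the maze demo; the 4x4 open grid, the start and goal cells, ε=0.1 and the episode count are assumptions chosen to keep the example small.

```python
import random

GAMMA, ALPHA = 0.9, 0.65   # values quoted for the maze demo
EPSILON = 0.1              # assumed exploration rate
SIZE = 4                   # assumed small open grid, not the 15x15 maze
START, GOAL = (0, 0), (SIZE - 1, SIZE - 1)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(state, action):
    """Deterministic move; walls keep the agent in place. R(goal)=1, else 0."""
    row = min(max(state[0] + action[0], 0), SIZE - 1)
    col = min(max(state[1] + action[1], 0), SIZE - 1)
    next_state = (row, col)
    return next_state, (1.0 if next_state == GOAL else 0.0)


def greedy(q, state, rng):
    """Index of a highest-valued action, ties broken at random."""
    best = max(q[state])
    return rng.choice([i for i, v in enumerate(q[state]) if v == best])


def q_learning(episodes=300, seed=0):
    rng = random.Random(seed)
    # Initialise Q(s, a) to zero for every state-action pair
    q = {(r, c): [0.0] * len(ACTIONS) for r in range(SIZE) for c in range(SIZE)}
    for _ in range(episodes):
        state = START
        while state != GOAL:  # the goal is terminal
            # epsilon-greedy behaviour policy
            if rng.random() < EPSILON:
                a = rng.randrange(len(ACTIONS))
            else:
                a = greedy(q, state, rng)
            next_state, reward = step(state, ACTIONS[a])
            # Q-backup: bootstrap from the best action in the next state
            target = reward + GAMMA * max(q[next_state])
            q[state][a] += ALPHA * (target - q[state][a])
            state = next_state
    return q


q_table = q_learning()
print(max(q_table[START]))  # estimate of the best discounted return from START
```

Because the environment here is deterministic and all rewards are non-negative, estimates that start at zero can only grow toward Q* and never overshoot it; the shortest path from START to GOAL takes six moves, so no entry for START can exceed 0.9^5.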
  • 19. Q-learning in Action: 15x15 maze world; R(goal)=1; R(other)=0; γ=0.9; α=0.65
  • 20. Q-learning in Action: Initial policy
  • 21. Q-learning in Action: After 20 episodes
  • 22. Q-learning in Action: After 30 episodes
  • 23. Q-learning in Action: After 100 episodes
  • 24. Q-learning in Action: After 150 episodes
  • 25. Q-learning in Action: After 200 episodes
  • 26. Q-learning in Action: After 250 episodes
  • 27. Q-learning in Action: After 300 episodes
  • 28. Q-learning in Action: After 350 episodes
  • 29. Q-learning in Action: After 400 episodes
  • 30. Some Last Remarks: Exploration regime: explore vs. exploit; ε-greedy action selection; soft-max action selection. Initialization of Q-values: be optimistic. Learning rate α: in stationary environments, α(s) = 1 / (number of visits to state s); in non-stationary environments, α takes a constant value. The higher the value, the higher the influence of recent experiences.
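The two action-selection schemes and the visit-count learning rate can be sketched as small helper functions. The function names, the `temperature` parameter of the soft-max and the example Q-values are all hypothetical, introduced only for illustration.

```python
import math
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon explore uniformly, otherwise exploit."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda i: q_values[i])


def softmax_probabilities(q_values, temperature=1.0):
    """Boltzmann distribution over actions; temperature is an assumed knob."""
    top = max(q_values)  # subtract the max for numerical stability
    weights = [math.exp((q - top) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def softmax_action(q_values, temperature=1.0, rng=random):
    """Sample an action index in proportion to its soft-max probability."""
    probs = softmax_probabilities(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


def visit_count_alpha(visits):
    """Stationary case: alpha = 1 / (number of visits to the state)."""
    return 1.0 / visits


example_q = [0.1, 0.9, 0.3]            # hypothetical Q-values for one state
print(epsilon_greedy(example_q, 0.1))  # usually the greedy action, sometimes random
print(softmax_probabilities(example_q))
print(visit_count_alpha(4))
```

A high temperature flattens the soft-max toward uniform exploration, while a low one approaches greedy selection; unlike ε-greedy, it explores clearly bad actions less often than nearly-best ones.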
  • 31. Next Class: Reinforcement learning with LCSs
  • 32. Introduction to Machine Learning, Lecture 22: Reinforcement Learning. Albert Orriols i Puig, http://www.albertorriols.net, aorriols@salle.url.edu. Artificial Intelligence – Machine Learning, Enginyeria i Arquitectura La Salle, Universitat Ramon Llull