Introduction to Machine Learning
        Lecture 21
        Reinforcement Learning

                 Albert Orriols i Puig
             http://www.albertorriols.net
                aorriols@salle.url.edu

      Artificial Intelligence – Machine Learning
          Enginyeria i Arquitectura La Salle
                 Universitat Ramon Llull
Recap of Lectures 5-18
Supervised learning
        Data classification
                Labeled data
                Build a model that covers all the space

Unsupervised learning
        Clustering
                Unlabeled data
                Group similar objects

        Association rule analysis
                Unlabeled data
                Get the most frequent/important associations

Genetic Fuzzy Systems
                                                               Slide 2
Artificial Intelligence                    Machine Learning
Today’s Agenda


        Introduction
        Reinforcement Learning
        Some examples before going farther




Introduction
        What does reinforcement learning aim at?
                Learning from interaction (with the environment)

                Goal-directed learning

        [Diagram: agent–environment loop — the agent observes the State,
         performs an Action on the Environment, and pursues a GOAL]

                Learning what to do and its effect
                        Trial-and-error search and delayed reward
Introduction

        Learn reactive behaviors
        Behaviors as a mapping between perceptions and actions
        The agent has to exploit what it already knows in order to
        obtain reward, but it also has to explore in order to make
        better action selections in the future.
        Dilemma: neither exploitation nor exploration can be
        pursued exclusively without failing at the task.
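The exploration–exploitation dilemma above is commonly handled with ε-greedy action selection; a minimal sketch (an illustration, not part of the slides):

```python
import random

def epsilon_greedy(q_values, epsilon=0.1):
    """With probability epsilon pick a random action (explore);
    otherwise pick the action with the highest estimated value (exploit)."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

Setting ε = 0 gives pure exploitation; ε = 1 gives pure exploration — the dilemma is about choosing something in between.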
How Can We Learn It?
1.      Look-up tables

         Perception        Action
            State 1       Action 1
            State 2       Action 2
            State 3       Action 3
                …         …

2.      Neural Networks

3.      Rules

4.      Finite automata
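The look-up table above (representation 1) is the simplest of the four: one stored action per perceived state. A sketch, with hypothetical state and action names:

```python
# Look-up-table policy: a direct perception -> action mapping,
# exactly the table on the slide. Names are illustrative placeholders.
policy = {
    "state_1": "action_1",
    "state_2": "action_2",
    "state_3": "action_3",
}

def act(state):
    """Return the stored action for a perceived state."""
    return policy[state]
```

Tables scale poorly with large state spaces, which is why the slide also lists rules, neural networks, and finite automata as alternatives.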
Reinforcement Learning




Reinforcement Learning
        [Diagram: agent–environment loop — at each step the Agent receives
         state st and reward rt from the Environment and emits action at]

                Reward function: r: S → R, or r: S × A → R

                Agent and environment interact at discrete time steps t = 0, 1, 2, …

                The agent
                        observes the state at step t: st ∈ S
                        produces an action at step t: at ∈ A(st)
                        gets the resulting reward: rt+1 ∈ R
                        goes to the next state st+1
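The interaction loop described above can be sketched in a few lines. `env` and `agent` are hypothetical objects (not from the slides): `env.step(a)` is assumed to return the next state, the reward, and a termination flag, and `agent.act(s)` picks an action for state s.

```python
def run_episode(env, agent, max_steps=100):
    """Run one trial of the agent-environment loop: observe s_t,
    emit a_t, receive r_{t+1} and s_{t+1}, repeat."""
    state = env.reset()
    total_reward = 0.0
    for t in range(max_steps):
        action = agent.act(state)               # a_t
        state, reward, done = env.step(action)  # s_{t+1}, r_{t+1}
        total_reward += reward
        if done:
            break
    return total_reward
```

This is the same loop shape used by modern RL environment APIs, though their exact return signatures differ.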
Reinforcement Learning
        [Diagram: agent–environment loop, as on the previous slide]

                Trace of a trial:
                        … st, at, rt+1, st+1, at+1, rt+2, st+2, at+2, rt+3, st+3, at+3 …

                Agent goal:
                        Maximize the total amount of reward it receives

                Therefore, that means maximizing not only the immediate reward,
                but the cumulative reward in the long run
Example of RL
        Example: Recycling robot
                State
                        charge level of the battery

                Actions
                        look for cans, wait for a can, go recharge

                Reward
                        positive for finding cans, negative for running out of battery
More precisely…
        Restricting to Markov Decision Processes (MDP)
                Finite set of states
                Finite set of actions
                Transition probabilities

                Reward probabilities

                This means that
                        The agent needs to have complete information about the world
                        State st+1 only depends on state st and action at
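The transition and reward probabilities on this slide were figures lost in extraction. In standard MDP notation (a reconstruction following the usual Sutton–Barto definitions, not the slide's own formulas) they are:

```latex
P^{a}_{ss'} = \Pr\{\, s_{t+1} = s' \mid s_t = s,\; a_t = a \,\}
\qquad
R^{a}_{ss'} = \mathbb{E}\{\, r_{t+1} \mid s_t = s,\; a_t = a,\; s_{t+1} = s' \,\}
```

The first is the probability of landing in s' after taking action a in state s; the second is the expected reward for that transition.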
Recycling Robot Example
        [State-transition diagram — states High and Low:
         High, wait:    stays High with prob. 1, reward Rwait
         High, search:  stays High with prob. α, goes Low with prob. 1 − α; reward Rsearch
         Low, wait:     stays Low with prob. 1, reward Rwait
         Low, search:   stays Low with prob. β, reward Rsearch; with prob. 1 − β the
                        battery runs out and the robot must be rescued to High, reward −3
         Low, recharge: goes High with prob. 1, reward 0]
Recycling Robot Example
        S = {high, low}
        A(high) = {wait, search}
        A(low) = {wait, search, recharge}

        Rsearch: expected number of cans while searching
        Rwait: expected number of cans while waiting
        Rsearch > Rwait
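The dynamics of the two slides above can be written out as a transition table. A sketch, assuming the standard version of this example; the numeric values of α, β, Rsearch, and Rwait are arbitrary placeholders:

```python
# Recycling-robot MDP as a transition table:
# T[(state, action)] -> list of (probability, next_state, reward).
ALPHA, BETA = 0.9, 0.6          # placeholder values for the example's parameters
R_SEARCH, R_WAIT = 3.0, 1.0     # must satisfy R_SEARCH > R_WAIT

T = {
    ("high", "search"):   [(ALPHA, "high", R_SEARCH), (1 - ALPHA, "low", R_SEARCH)],
    ("high", "wait"):     [(1.0, "high", R_WAIT)],
    ("low",  "search"):   [(BETA, "low", R_SEARCH), (1 - BETA, "high", -3.0)],
    ("low",  "wait"):     [(1.0, "low", R_WAIT)],
    ("low",  "recharge"): [(1.0, "high", 0.0)],
}

# Sanity check: outgoing probabilities from each (state, action) sum to 1.
for outcomes in T.values():
    assert abs(sum(p for p, _, _ in outcomes) - 1.0) < 1e-9
```

Note that this table is what the *environment* knows; as the following slides point out, the agent itself does not have access to it.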
Breaking the Markov Property
        Possible problems that do not satisfy the MDP assumptions
                When actions and states are not finite
                        Solution: discretize the set of actions and states
                When transition probabilities do not depend only on the current
                state
                        Possible solution: represent states as structures built up
                        over time from sequences of sensations
                        This is a POMDP (partially observable MDP)
                        Use POMDP algorithms to solve these problems
Elements of Reinforcement Learning




Elements of RL

                Policy: what to do
                Reward: what's good
                Value: what's good because it predicts reward
                Model: what follows what
Components of an RL Agent
        Policy (behavior)
                Mapping from states to actions
                        π*: S → A
        Reward
                Local reward in the state at time t:
                        rt
        Model
                Probability of transitioning from state s to s' by executing action a
                        T(s, a, s')
        And
                The transition probabilities depend only on these parameters
                This is not known by the agent
Components of an RL Agent
        Value functions
                Vπ(s): long-term reward estimate starting from state s and
                following policy π
                Qπ(s,a): long-term reward estimate starting from state s, executing
                action a, and then following policy π
        A simple example
                A maze

                Note that the agent does not know its own position. It can only
                perceive what it has in the surrounding states
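The two value functions named above have standard definitions (a reconstruction in the usual notation, using the discount factor γ introduced later in the deck):

```latex
V^{\pi}(s) = \mathbb{E}_{\pi}\!\left[\sum_{k=0}^{\infty} \gamma^{k}\, r_{t+k+1} \,\middle|\, s_t = s\right]
\qquad
Q^{\pi}(s,a) = \mathbb{E}_{\pi}\!\left[\sum_{k=0}^{\infty} \gamma^{k}\, r_{t+k+1} \,\middle|\, s_t = s,\; a_t = a\right]
```

Both are expectations of the discounted sum of future rewards; Q additionally fixes the first action before following π.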
Pursuing the goal: Maximize long term reward




Goals and Rewards
        Ok, but I need to maximize my long-term reward. How do I
        get the long-term reward?
                The long-term reward is defined in terms of the goal of the agent
                The agent receives the local reward at each time step

        How?
                Intuitive idea: sum all the rewards obtained so far

                Problem: the sum can grow without bound in non-ending tasks
Goals and Rewards
        How can we deal with non-ending tasks?
                Weighted addition of local rewards

                The γ parameter (0 < γ < 1) is the discount factor

        [Trace diagram: … st, at, rt+1, st+1, at+1, rt+2, st+2, at+2, rt+3, st+3, at+3 …]

                Note the bias for immediate rewards
                        If you want to avoid it, set γ close to 1
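The weighted addition above can be computed directly; a small sketch (not from the slides) that also shows the bias toward immediate rewards:

```python
def discounted_return(rewards, gamma=0.9):
    """Weighted sum of rewards: r_1 + gamma*r_2 + gamma^2*r_3 + ...
    With 0 < gamma < 1 the sum stays bounded even for non-ending tasks."""
    return sum((gamma ** k) * r for k, r in enumerate(rewards))

# Bias toward immediate rewards: the same reward is worth less the later it arrives.
early = discounted_return([1.0, 0.0, 0.0], gamma=0.5)  # 1.0
late = discounted_return([0.0, 0.0, 1.0], gamma=0.5)   # 0.25
```

With γ close to 1 the gap between `early` and `late` shrinks, which is exactly the slide's remark about avoiding the bias.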
Some examples




Pole balancing
        Balance the pole
                The cart can move forward and backward
                Avoid failure:
                        the pole falling beyond a certain critical angle
                        the cart hitting the end of the track

                Reward
                        −1 upon failure
                        as a return: −γk, for failure occurring k steps later
Mountain Car Problem
        Objective
                Get to the top of the hill as quickly as possible

                State definition:
                        car position and speed

                Actions
                        forward, reverse, none

                Reward
                        −1 for each step not at the top of the hill
                        (so the return is minus the number of steps before reaching the top)
Next Class

        How to learn the policies
Weitere ähnliche Inhalte

Was ist angesagt?

New Challenges in Learning Classifier Systems: Mining Rarities and Evolving F...
New Challenges in Learning Classifier Systems: Mining Rarities and Evolving F...New Challenges in Learning Classifier Systems: Mining Rarities and Evolving F...
New Challenges in Learning Classifier Systems: Mining Rarities and Evolving F...Albert Orriols-Puig
 
CCIA'2008: Can Evolution Strategies Improve Learning Guidance in XCS? Design ...
CCIA'2008: Can Evolution Strategies Improve Learning Guidance in XCS? Design ...CCIA'2008: Can Evolution Strategies Improve Learning Guidance in XCS? Design ...
CCIA'2008: Can Evolution Strategies Improve Learning Guidance in XCS? Design ...Albert Orriols-Puig
 
2 tri partite model algebra
2 tri partite model algebra2 tri partite model algebra
2 tri partite model algebraAle Cignetti
 
HIS'2008: New Crossover Operator for Evolutionary Rule Discovery in XCS
HIS'2008: New Crossover Operator for Evolutionary Rule Discovery in XCSHIS'2008: New Crossover Operator for Evolutionary Rule Discovery in XCS
HIS'2008: New Crossover Operator for Evolutionary Rule Discovery in XCSAlbert Orriols-Puig
 
4th grade math curriculum map 2011 2012-1
4th grade math curriculum map 2011 2012-14th grade math curriculum map 2011 2012-1
4th grade math curriculum map 2011 2012-1Isaac_Schools_5
 
Knowledge Components & Objects
Knowledge Components & ObjectsKnowledge Components & Objects
Knowledge Components & Objectsmohdazrulazlan
 
cvpr2011: human activity recognition - part 5: description based
cvpr2011: human activity recognition - part 5: description basedcvpr2011: human activity recognition - part 5: description based
cvpr2011: human activity recognition - part 5: description basedzukun
 
Ontology 101 - New York Semantic Technology Conference
Ontology 101 - New York Semantic Technology ConferenceOntology 101 - New York Semantic Technology Conference
Ontology 101 - New York Semantic Technology ConferenceRobert Kost
 
Ontology 101 - Kendall & McGuiness
Ontology 101 - Kendall & McGuinessOntology 101 - Kendall & McGuiness
Ontology 101 - Kendall & McGuinessthematixpartners
 
ACM ICMI Workshop 2012
ACM ICMI Workshop 2012ACM ICMI Workshop 2012
ACM ICMI Workshop 2012Lê Anh
 

Was ist angesagt? (20)

Lecture3 - Machine Learning
Lecture3 - Machine LearningLecture3 - Machine Learning
Lecture3 - Machine Learning
 
Lecture2 - Machine Learning
Lecture2 - Machine LearningLecture2 - Machine Learning
Lecture2 - Machine Learning
 
Lecture17
Lecture17Lecture17
Lecture17
 
Lecture24
Lecture24Lecture24
Lecture24
 
Lecture4 - Machine Learning
Lecture4 - Machine LearningLecture4 - Machine Learning
Lecture4 - Machine Learning
 
Lecture11 - neural networks
Lecture11 - neural networksLecture11 - neural networks
Lecture11 - neural networks
 
New Challenges in Learning Classifier Systems: Mining Rarities and Evolving F...
New Challenges in Learning Classifier Systems: Mining Rarities and Evolving F...New Challenges in Learning Classifier Systems: Mining Rarities and Evolving F...
New Challenges in Learning Classifier Systems: Mining Rarities and Evolving F...
 
CCIA'2008: Can Evolution Strategies Improve Learning Guidance in XCS? Design ...
CCIA'2008: Can Evolution Strategies Improve Learning Guidance in XCS? Design ...CCIA'2008: Can Evolution Strategies Improve Learning Guidance in XCS? Design ...
CCIA'2008: Can Evolution Strategies Improve Learning Guidance in XCS? Design ...
 
Lecture5 - C4.5
Lecture5 - C4.5Lecture5 - C4.5
Lecture5 - C4.5
 
2 tri partite model algebra
2 tri partite model algebra2 tri partite model algebra
2 tri partite model algebra
 
HIS'2008: New Crossover Operator for Evolutionary Rule Discovery in XCS
HIS'2008: New Crossover Operator for Evolutionary Rule Discovery in XCSHIS'2008: New Crossover Operator for Evolutionary Rule Discovery in XCS
HIS'2008: New Crossover Operator for Evolutionary Rule Discovery in XCS
 
4th grade math curriculum map 2011 2012-1
4th grade math curriculum map 2011 2012-14th grade math curriculum map 2011 2012-1
4th grade math curriculum map 2011 2012-1
 
Lecture13 - Association Rules
Lecture13 - Association RulesLecture13 - Association Rules
Lecture13 - Association Rules
 
Knowledge Components & Objects
Knowledge Components & ObjectsKnowledge Components & Objects
Knowledge Components & Objects
 
cvpr2011: human activity recognition - part 5: description based
cvpr2011: human activity recognition - part 5: description basedcvpr2011: human activity recognition - part 5: description based
cvpr2011: human activity recognition - part 5: description based
 
Ontology 101 - New York Semantic Technology Conference
Ontology 101 - New York Semantic Technology ConferenceOntology 101 - New York Semantic Technology Conference
Ontology 101 - New York Semantic Technology Conference
 
Ontology 101 - Kendall & McGuiness
Ontology 101 - Kendall & McGuinessOntology 101 - Kendall & McGuiness
Ontology 101 - Kendall & McGuiness
 
Making Intelligence
Making IntelligenceMaking Intelligence
Making Intelligence
 
Lecture04 / scenarios
Lecture04 / scenariosLecture04 / scenarios
Lecture04 / scenarios
 
ACM ICMI Workshop 2012
ACM ICMI Workshop 2012ACM ICMI Workshop 2012
ACM ICMI Workshop 2012
 

Mehr von Albert Orriols-Puig

Lecture1 AI1 Introduction to artificial intelligence
Lecture1 AI1 Introduction to artificial intelligenceLecture1 AI1 Introduction to artificial intelligence
Lecture1 AI1 Introduction to artificial intelligenceAlbert Orriols-Puig
 
HAIS09-BeyondHomemadeArtificialDatasets
HAIS09-BeyondHomemadeArtificialDatasetsHAIS09-BeyondHomemadeArtificialDatasets
HAIS09-BeyondHomemadeArtificialDatasetsAlbert Orriols-Puig
 
Lecture14 - Advanced topics in association rules
Lecture14 - Advanced topics in association rulesLecture14 - Advanced topics in association rules
Lecture14 - Advanced topics in association rulesAlbert Orriols-Puig
 
Lecture9 - Bayesian-Decision-Theory
Lecture9 - Bayesian-Decision-TheoryLecture9 - Bayesian-Decision-Theory
Lecture9 - Bayesian-Decision-TheoryAlbert Orriols-Puig
 
IWLCS'2008: First Approach toward Online Evolution of Association Rules wit...
IWLCS'2008: First Approach toward Online Evolution of Association Rules wit...IWLCS'2008: First Approach toward Online Evolution of Association Rules wit...
IWLCS'2008: First Approach toward Online Evolution of Association Rules wit...Albert Orriols-Puig
 
HIS'2008: Genetic-based Synthetic Data Sets for the Analysis of Classifiers B...
HIS'2008: Genetic-based Synthetic Data Sets for the Analysis of Classifiers B...HIS'2008: Genetic-based Synthetic Data Sets for the Analysis of Classifiers B...
HIS'2008: Genetic-based Synthetic Data Sets for the Analysis of Classifiers B...Albert Orriols-Puig
 
HIS'2008: Artificial Data Sets based on Knowledge Generators: Analysis of Lea...
HIS'2008: Artificial Data Sets based on Knowledge Generators: Analysis of Lea...HIS'2008: Artificial Data Sets based on Knowledge Generators: Analysis of Lea...
HIS'2008: Artificial Data Sets based on Knowledge Generators: Analysis of Lea...Albert Orriols-Puig
 

Mehr von Albert Orriols-Puig (12)

Lecture1 AI1 Introduction to artificial intelligence
Lecture1 AI1 Introduction to artificial intelligenceLecture1 AI1 Introduction to artificial intelligence
Lecture1 AI1 Introduction to artificial intelligence
 
HAIS09-BeyondHomemadeArtificialDatasets
HAIS09-BeyondHomemadeArtificialDatasetsHAIS09-BeyondHomemadeArtificialDatasets
HAIS09-BeyondHomemadeArtificialDatasets
 
Lecture19
Lecture19Lecture19
Lecture19
 
Lecture14 - Advanced topics in association rules
Lecture14 - Advanced topics in association rulesLecture14 - Advanced topics in association rules
Lecture14 - Advanced topics in association rules
 
Lecture12 - SVM
Lecture12 - SVMLecture12 - SVM
Lecture12 - SVM
 
Lecture10 - Naïve Bayes
Lecture10 - Naïve BayesLecture10 - Naïve Bayes
Lecture10 - Naïve Bayes
 
Lecture9 - Bayesian-Decision-Theory
Lecture9 - Bayesian-Decision-TheoryLecture9 - Bayesian-Decision-Theory
Lecture9 - Bayesian-Decision-Theory
 
Lecture8 - From CBR to IBk
Lecture8 - From CBR to IBkLecture8 - From CBR to IBk
Lecture8 - From CBR to IBk
 
Lecture1 - Machine Learning
Lecture1 - Machine LearningLecture1 - Machine Learning
Lecture1 - Machine Learning
 
IWLCS'2008: First Approach toward Online Evolution of Association Rules wit...
IWLCS'2008: First Approach toward Online Evolution of Association Rules wit...IWLCS'2008: First Approach toward Online Evolution of Association Rules wit...
IWLCS'2008: First Approach toward Online Evolution of Association Rules wit...
 
HIS'2008: Genetic-based Synthetic Data Sets for the Analysis of Classifiers B...
HIS'2008: Genetic-based Synthetic Data Sets for the Analysis of Classifiers B...HIS'2008: Genetic-based Synthetic Data Sets for the Analysis of Classifiers B...
HIS'2008: Genetic-based Synthetic Data Sets for the Analysis of Classifiers B...
 
HIS'2008: Artificial Data Sets based on Knowledge Generators: Analysis of Lea...
HIS'2008: Artificial Data Sets based on Knowledge Generators: Analysis of Lea...HIS'2008: Artificial Data Sets based on Knowledge Generators: Analysis of Lea...
HIS'2008: Artificial Data Sets based on Knowledge Generators: Analysis of Lea...
 

Kürzlich hochgeladen

Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designMIPLM
 
Choosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for ParentsChoosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for Parentsnavabharathschool99
 
Global Lehigh Strategic Initiatives (without descriptions)
Global Lehigh Strategic Initiatives (without descriptions)Global Lehigh Strategic Initiatives (without descriptions)
Global Lehigh Strategic Initiatives (without descriptions)cama23
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxthorishapillay1
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Celine George
 
ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4MiaBumagat1
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Mark Reed
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxAnupkumar Sharma
 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSJoshuaGantuangco2
 
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)lakshayb543
 
How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17Celine George
 
Karra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptxKarra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptxAshokKarra1
 
Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Celine George
 
FILIPINO PSYCHology sikolohiyang pilipino
FILIPINO PSYCHology sikolohiyang pilipinoFILIPINO PSYCHology sikolohiyang pilipino
FILIPINO PSYCHology sikolohiyang pilipinojohnmickonozaleda
 
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdfVirtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdfErwinPantujan2
 
Judging the Relevance and worth of ideas part 2.pptx
Judging the Relevance  and worth of ideas part 2.pptxJudging the Relevance  and worth of ideas part 2.pptx
Judging the Relevance and worth of ideas part 2.pptxSherlyMaeNeri
 
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfAMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfphamnguyenenglishnb
 

Kürzlich hochgeladen (20)

Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-design
 
Choosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for ParentsChoosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for Parents
 

Introduction to Reinforcement Learning

  • 1. Introduction to Machine Learning, Lecture 21: Reinforcement Learning. Albert Orriols i Puig, http://www.albertorriols.net, aorriols@salle.url.edu. Artificial Intelligence – Machine Learning, Enginyeria i Arquitectura La Salle, Universitat Ramon Llull
  • 2. Recap of Lectures 5-18. Supervised learning: data classification (labeled data; build a model that covers all the space). Unsupervised learning: clustering (unlabeled data; group similar objects) and association rule analysis (unlabeled data; get the most frequent/important associations). Genetic Fuzzy Systems.
  • 3. Today’s Agenda: Introduction; Reinforcement Learning; Some examples before going farther.
  • 4. Introduction. What does reinforcement learning aim at? Learning from interaction (with the environment); goal-directed learning. [Diagram: the Agent sends an Action to the Environment and receives a State back, in pursuit of a GOAL.] Learning what to do and its effect: trial-and-error search and delayed reward.
  • 5. Introduction. Learn reactive behaviors: behaviors as a mapping between perceptions and actions. The agent has to exploit what it already knows in order to obtain reward, but it also has to explore in order to make better action selections in the future. Dilemma: neither exploitation nor exploration can be pursued exclusively without failing at the task.
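The exploration/exploitation dilemma described above is commonly handled with an ε-greedy rule: explore with probability ε, exploit otherwise. A minimal sketch (the function name and value format are illustrative, not from the lecture):

```python
import random

def epsilon_greedy(q_values, epsilon=0.1):
    """With probability epsilon pick a random action (explore);
    otherwise pick the action with the highest estimated value (exploit)."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

Setting ε = 0 gives pure exploitation and ε = 1 pure exploration; neither extreme solves the task, which is exactly the dilemma stated on the slide.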
  • 6. How Can We Learn It? Possible representations of the behavior: look-up tables (Perception → Action: State 1 → Action 1, State 2 → Action 2, State 3 → Action 3, …), rules, neural networks, and finite automata.
  • 7. Reinforcement Learning
  • 8. Reinforcement Learning. Reward function: r: S → R, or r: S × A → R. Agent and environment interact at discrete time steps t = 0, 1, 2, … At each step t the agent observes the state st ∈ S, produces an action at ∈ A(st), gets the resulting reward rt+1 ∈ R, and goes to the next state st+1.
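The discrete-time interaction above can be sketched as a generic loop (the toy environment in the test and all names here are illustrative assumptions, not part of the lecture):

```python
def run_episode(env_step, initial_state, policy, max_steps=100):
    """Generic agent-environment loop: at each step t the agent observes
    state s_t, produces a_t = policy(s_t), and the environment returns
    (r_{t+1}, s_{t+1}, done). Returns the list of rewards collected."""
    state, rewards = initial_state, []
    for _ in range(max_steps):
        action = policy(state)
        reward, state, done = env_step(state, action)
        rewards.append(reward)
        if done:
            break
    return rewards
```

Any concrete problem (the recycling robot, pole balancing, mountain car) is an instance of this loop with its own states, actions, and reward function.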
  • 9. Reinforcement Learning. Trace of a trial: st, at, rt+1, st+1, at+1, rt+2, st+2, at+2, rt+3, st+3, at+3, … Agent goal: maximize the total amount of reward it receives. Therefore, that means maximizing not only the immediate reward, but the cumulative reward in the long run.
  • 10. Example of RL: the recycling robot. State: charge level of the battery. Actions: look for cans, wait for a can, go recharge. Reward: positive for finding cans, negative for running out of battery.
  • 11. More precisely… Restricting to Markov Decision Processes (MDPs): a finite set of situations, a finite set of actions, transition probabilities, and reward probabilities. This means that the agent needs to have complete information of the world, and that state st+1 only depends on state st and action at.
  • 12. Recycling Robot Example. [State-transition diagram: states High and Low. In High, search stays in High with probability α (reward R^search) and drops to Low with probability 1 − α; wait stays in High with probability 1 (reward R^wait). In Low, search stays in Low with probability β (reward R^search) and runs the battery out with probability 1 − β (reward −3, robot rescued to High); wait stays in Low (reward R^wait); recharge moves to High with probability 1 and reward 0.]
  • 13. Recycling Robot Example. S = {high, low}; A(high) = {wait, search}; A(low) = {wait, search, recharge}. R^search: expected number of cans while searching. R^wait: expected number of cans while waiting. R^search > R^wait.
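The recycling-robot MDP can be written down as explicit transition tables. A minimal sketch, where the values of α, β, R^search, and R^wait are illustrative assumptions (the lecture leaves them symbolic):

```python
# Illustrative parameter values, not from the lecture.
ALPHA, BETA = 0.9, 0.6
R_SEARCH, R_WAIT = 2.0, 1.0

# T[(state, action)] = list of (probability, next_state, reward)
T = {
    ('high', 'search'):   [(ALPHA, 'high', R_SEARCH), (1 - ALPHA, 'low', R_SEARCH)],
    ('high', 'wait'):     [(1.0, 'high', R_WAIT)],
    ('low',  'search'):   [(BETA, 'low', R_SEARCH), (1 - BETA, 'high', -3.0)],
    ('low',  'wait'):     [(1.0, 'low', R_WAIT)],
    ('low',  'recharge'): [(1.0, 'high', 0.0)],
}

def expected_reward(state, action):
    """Expected immediate reward of taking `action` in `state` under T."""
    return sum(p * r for p, _, r in T[(state, action)])
```

With these numbers, searching while Low has expected reward 0.6·2 + 0.4·(−3) = 0, showing how the −3 penalty can cancel out the gain from cans.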
  • 14. Breaking the Markovian Property. Possible problems that do not satisfy the MDP assumptions: When actions and states are not finite (solution: discretize the sets of actions and states). When transition probabilities do not depend only on the current state (possible solution: represent states as structures built up over time from sequences of sensations). This is a POMDP (partially observable MDP); use POMDP algorithms to solve these problems.
  • 15. Elements of Reinforcement Learning
  • 16. Elements of RL. Policy: what to do. Reward: what’s good. Value: what’s good because it predicts reward. Model: what follows what.
  • 17. Components of an RL Agent. Policy (behavior): a mapping from states to actions, π: S → A. Reward: the local reward at step t, rt. Model: the probability T(s, a, s′) of a transition from state s to s′ by executing action a. The transition probabilities depend only on these parameters, and they are not known by the agent.
  • 18. Components of an RL Agent. Value functions: Vπ(s), the long-term reward estimation from state s following policy π; Qπ(s, a), the long-term reward estimation from state s executing action a and then following policy π. A simple example: a maze. Note that the agent does not know its own position; it can only perceive what it has in the surrounding states.
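In the finite case, Qπ(s, a) is just a table, and the greedy policy and its state value fall out of it by maximizing over actions. A minimal sketch with illustrative numbers (the states, actions, and values below are assumptions, not from the lecture):

```python
# Tabular Q(s, a): one entry per (state, action) pair.
Q = {
    ('s1', 'left'): 0.2, ('s1', 'right'): 0.8,
    ('s2', 'left'): 0.5, ('s2', 'right'): 0.1,
}

def greedy_value(Q, state):
    """V(s) under the greedy policy: max over actions of Q(s, a)."""
    return max(v for (s, _), v in Q.items() if s == state)

def greedy_action(Q, state):
    """The action the greedy policy picks in `state`: argmax_a Q(s, a)."""
    return max(((a, v) for (s, a), v in Q.items() if s == state),
               key=lambda av: av[1])[0]
```

This makes the relation between the two value functions concrete: V(s) = max_a Q(s, a) when the agent acts greedily.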
  • 20. Pursuing the goal: Maximize long-term reward
  • 21. Goals and Rewards. Ok, but I need to maximize my long-term reward. How do I get the long-term reward? The long-term reward is defined in terms of the goal of the agent, while the agent receives a local reward at each time step. How? Intuitive idea: sum all the rewards obtained so far. Problem: the sum can grow without bound in non-ending tasks.
  • 22. Goals and Rewards. How can we deal with non-ending tasks? Weighted addition of local rewards: Rt = rt+1 + γ rt+2 + γ^2 rt+3 + … where the parameter γ (0 < γ < 1) is the discounting factor. Note the bias toward immediate rewards; if you want to avoid it, set γ close to 1.
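The discounted return above is straightforward to compute for any finite reward trace; a minimal sketch (function name is illustrative):

```python
def discounted_return(rewards, gamma=0.9):
    """R_t = r_{t+1} + gamma * r_{t+2} + gamma^2 * r_{t+3} + ...
    for a finite list of rewards [r_{t+1}, r_{t+2}, ...]."""
    return sum((gamma ** k) * r for k, r in enumerate(rewards))
```

With 0 < γ < 1 the sum converges even for infinite traces (it is bounded by r_max / (1 − γ)), which is exactly how discounting fixes the unbounded-sum problem of non-ending tasks.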
  • 23. Some examples
  • 24. Pole balancing. Balance the pole; the cart can move forward and backward. Avoid failure: the pole falling beyond a certain critical angle, or the cart hitting the end of the track. Reward: −1 upon failure; discounted, −γ^k for a failure k steps ahead.
  • 25. Mountain Car Problem. Objective: get to the top of the hill as quickly as possible. State definition: car position and speed. Actions: forward, reverse, none. Reward: −1 for each step that is not on the top of the hill, i.e., minus the number of steps before reaching the top of the hill.
  • 26. Next Class: How to learn the policies
  • 27. Introduction to Machine Learning, Lecture 21: Reinforcement Learning. Albert Orriols i Puig, http://www.albertorriols.net, aorriols@salle.url.edu. Artificial Intelligence – Machine Learning, Enginyeria i Arquitectura La Salle, Universitat Ramon Llull