ADAPTIVE LEARNING IN GAMES
3/11/2010   Suvarup Saha
EECS 463 Course Project
Outline
2


     Motivation
     Games
     Learning in Games
     Adaptive Learning
       Example
     Gradient Techniques
     Conclusion


Motivation
3


     Adaptive filtering techniques generalize to many applications
     beyond filtering
       Gradient-based iterative search
       Stochastic gradient
       Least squares
     Applying game theory in less-than-rational multi-agent
     scenarios demands self-learning mechanisms
     Adaptive techniques can be applied in such settings to
     help the agents learn the game and play intelligently

Games
4


     A game is an interaction between two or more self-interested
     agents
     Each agent chooses a strategy si from a set of strategies, Si
     A (joint) strategy profile, s, is the set of chosen strategies, also
     called an outcome of the game in a single play
     Each agent has a utility function, ui(s), specifying their
     preference for each outcome in terms of a payoff
     An agent’s best response is the strategy with the highest
     payoff, given its opponents’ choice of strategy
     A Nash equilibrium is a strategy profile such that every
     agent’s strategy is a best response to others’ choice of strategy

A Normal Form Game
5

                                  B

                           b1           b2

                A     a1   4,4          5,2
                      a2   0,1          4,3


     This is a 2 player game with SA={a1,a2}, SB={b1,b2}
     The ui(s) are explicitly given in a matrix form, for
     example uA(a1, b2) = 5, uB(a1, b2) = 2
     The best response of A to B playing b2 is a1
     In this game, (a1, b1) is the unique Nash Equilibrium
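
The definitions above can be made concrete in a few lines of Python. The sketch below is an illustration only (the dictionary encoding and helper names are not from the slides); it encodes the game on this slide and finds its pure-strategy Nash equilibrium by brute-force best-response checks.

```python
from itertools import product

# The example game: payoffs keyed by the joint strategy profile (a, b)
U_A = {('a1', 'b1'): 4, ('a1', 'b2'): 5, ('a2', 'b1'): 0, ('a2', 'b2'): 4}
U_B = {('a1', 'b1'): 4, ('a1', 'b2'): 2, ('a2', 'b1'): 1, ('a2', 'b2'): 3}
S_A, S_B = ['a1', 'a2'], ['b1', 'b2']

def best_response_A(b):
    """A's best response to B playing b."""
    return max(S_A, key=lambda a: U_A[(a, b)])

def best_response_B(a):
    """B's best response to A playing a."""
    return max(S_B, key=lambda b: U_B[(a, b)])

# A pure-strategy Nash equilibrium: each strategy is a best response to the other
nash = [(a, b) for a, b in product(S_A, S_B)
        if a == best_response_A(b) and b == best_response_B(a)]
print(best_response_A('b2'), nash)   # a1, [('a1', 'b1')]
```
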
Learning in Games
6


     Classical Approach: Compute an optimal/equilibrium
     strategy
      Some criticisms of this approach are
       Other agents’ utilities might be unknown to an agent for
       computing an equilibrium strategy
       Other agents might not be playing an equilibrium strategy
       Computing an equilibrium strategy might be hard
      Another Approach: Learn how to ‘optimally’ play a game
     by
       playing it many times
       updating strategy based on experience
Learning Dynamics
7




     Three broad families, ordered along increasing
     rationality/sophistication of the agents:
       Evolutionary Dynamics
       Adaptive Learning  (focus of our discussion)
       Bayesian Learning




Evolutionary Dynamics
8

     Inspired by Evolutionary Biology with no appeal to
     rationality of the agents
     Entire population of agents all programmed to use some
     strategy
        Players are randomly matched to play with each other
     Strategies with high payoff spread within the population by
       Learning
       copying or inheriting strategies – Replicator Dynamics
       Infection
     Stability analysis – Evolutionarily Stable Strategies (ESS)
       Players playing an ESS must have strictly higher payoffs than a
       small group of invaders playing a different strategy

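
The slides do not write out the replicator equation, so the sketch below uses the standard discrete-time form as an assumed illustration; the payoff matrix is borrowed from the running example and the step size dt is arbitrary.

```python
import numpy as np

# Assumed standard replicator update: a strategy's population share grows in
# proportion to its payoff relative to the population average.
def replicator_step(x, A, dt=0.01):
    fitness = A @ x              # payoff of each pure strategy against the mix x
    average = x @ fitness        # average payoff in the population
    return x + dt * x * (fitness - average)

A = np.array([[4.0, 5.0],        # payoff matrix of the example game (row player)
              [0.0, 4.0]])
x = np.array([0.5, 0.5])         # initial population shares of a1 and a2
for _ in range(2000):
    x = replicator_step(x, A)
print(x)                         # the share of a1 grows toward 1
```
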
Bayesian Learning
9


     Assumes ‘informed agents’ playing repeated games
     with a finite action space
     Payoffs depend on some characteristics of agents
     represented by types – each agent’s type is private
     information
     The agents’ initial beliefs are given by a common prior
     distribution over agent types
     This belief is updated according to Bayes’ Rule to a
     posterior distribution with each stage of the game.
     In every finite Bayesian game, there is at least one
     Bayesian Nash equilibrium, possibly in mixed strategies

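
As a small illustration of the Bayes' Rule update described above; the opponent types, actions, and likelihoods below are invented for the sketch and are not part of the slides.

```python
# Illustration only: a two-type belief about the opponent, updated by Bayes'
# rule after observing one of the opponent's actions.
def bayes_update(prior, likelihood, observed_action):
    """prior: {type: prob}; likelihood: {type: {action: prob}}."""
    unnormalized = {t: prior[t] * likelihood[t][observed_action] for t in prior}
    total = sum(unnormalized.values())
    return {t: p / total for t, p in unnormalized.items()}

prior = {'aggressive': 0.5, 'cautious': 0.5}
likelihood = {'aggressive': {'raise': 0.8, 'fold': 0.2},
              'cautious':   {'raise': 0.3, 'fold': 0.7}}
posterior = bayes_update(prior, likelihood, 'raise')
print(posterior)   # belief shifts toward the 'aggressive' type (~0.73)
```
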
Adaptive Learning
10

      Agents are not fully rational, but can learn through
      experience and adapt their strategies
      Agents do not know the reward structure of the game
      Agents are only able to take actions and observe their own
      rewards (or opponents’ rewards as well)
      Popular Examples
        Best Response Update
        Fictitious Play
        Regret Matching
        Infinitesimal Gradient Ascent (IGA)
        Dynamic Gradient Play
        Adaptive Play Q-learning

Fictitious Play
11


      The learning process is used to develop a ‘historical
      distribution’ of the other agents’ play
      In fictitious play, agent i has an exogenous initial weight
      function ki0: S-i → R+
      The weight is updated by adding 1 to the weight of each
      opponent strategy, each time it is played
      The probability that player i assigns to player -i
      playing s-i at date t is given by
                  qit(s-i) = kit(s-i) / Σs'-i kit(s'-i)
      The ‘best response’ of agent i in this fictitious play is
      given by
                 sit+1 = arg maxsi Σs-i qit(s-i) ui(si, s-i)

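
A minimal sketch of the two formulas on this slide for player A in the example 2x2 game; the weight snapshot k_A is an arbitrary choice for illustration.

```python
# Player A's belief and best response from its opponent-strategy weights.
k_A = {'b1': 1.0, 'b2': 2.0}                       # A's weights on B's strategies
u_A = {('a1', 'b1'): 4, ('a1', 'b2'): 5,
       ('a2', 'b1'): 0, ('a2', 'b2'): 4}           # A's payoffs from the example game

q_A = {s: k / sum(k_A.values()) for s, k in k_A.items()}        # q_A^t(s_-i)
expected = {a: sum(q_A[b] * u_A[(a, b)] for b in k_A)           # Σ q_A(s_-i) u_A(a, s_-i)
            for a in ('a1', 'a2')}
best_response = max(expected, key=expected.get)                 # arg max over A's strategies
print(q_A, expected, best_response)    # a1 is the best response to this belief
```
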
An Example
12

      Consider the same 2x2 game example as before

                                 B
                            b1        b2
                  A    a1   4,4       5,2
                       a2   0,1       4,3

      Suppose we assign
          kA0(b1) = kA0(b2) = kB0(a1) = kB0(a2) = 1
      Then, qA0(b1) = qA0(b2) = qB0(a1) = qB0(a2) = 0.5
      For A, if A chooses a1
            qA0(b1)uA(a1, b1) + qA0(b2)uA(a1, b2) = .5*4 + .5*5 = 4.5
      while if A chooses a2
            qA0(b1)uA(a2, b1) + qA0(b2)uA(a2, b2) = .5*0 + .5*4 = 2
      For B, if B chooses b1
            qB0(a1)uB(a1, b1) + qB0(a2)uB(a2, b1) = .5*4 + .5*1 = 2.5
      while if B chooses b2
            qB0(a1)uB(a1, b2) + qB0(a2)uB(a2, b2) = .5*2 + .5*3 = 2.5
      Clearly, A plays a1; B can choose either b1 or b2; assume B plays b2

Game proceeds.
13



      stage                0
      A’s selection        a1
      B’s selection        b2
      A’s payoff           5
      B’s payoff           2
      kAt(b1), qAt(b1)     1, 0.5     1, 0.33
      kAt(b2), qAt(b2)     1, 0.5     2, 0.67
      kBt(a1), qBt(a1)     1, 0.5     2, 0.67
      kBt(a2), qBt(a2)     1, 0.5     1, 0.33


Game proceeds..
14



      stage                0          1
      A’s selection        a1         a1
      B’s selection        b2         b1
      A’s payoff           5          4
      B’s payoff           2          4
      kAt(b1), qAt(b1)     1, 0.5     1, 0.33    2, 0.5
      kAt(b2), qAt(b2)     1, 0.5     2, 0.67    2, 0.5
      kBt(a1), qBt(a1)     1, 0.5     2, 0.67    3, 0.75
      kBt(a2), qBt(a2)     1, 0.5     1, 0.33    1, 0.25


Game proceeds…
15



      stage                0          1          2
      A’s selection        a1         a1         a1
      B’s selection        b2         b1         b1
      A’s payoff           5          4          4
      B’s payoff           2          4          4
      kAt(b1), qAt(b1)     1, 0.5     1, 0.33    2, 0.5     3, 0.6
      kAt(b2), qAt(b2)     1, 0.5     2, 0.67    2, 0.5     2, 0.4
      kBt(a1), qBt(a1)     1, 0.5     2, 0.67    3, 0.75    4, 0.8
      kBt(a2), qBt(a2)     1, 0.5     1, 0.33    1, 0.25    1, 0.2


Game proceeds….
16



      stage                0          1          2          3
      A’s selection        a1         a1         a1         a1
      B’s selection        b2         b1         b1         b1
      A’s payoff           5          4          4          4
      B’s payoff           2          4          4          4
      kAt(b1), qAt(b1)     1, 0.5     1, 0.33    2, 0.5     3, 0.6     4, 0.67
      kAt(b2), qAt(b2)     1, 0.5     2, 0.67    2, 0.5     2, 0.4     2, 0.33
      kBt(a1), qBt(a1)     1, 0.5     2, 0.67    3, 0.75    4, 0.8     5, 0.83
      kBt(a2), qBt(a2)     1, 0.5     1, 0.33    1, 0.25    1, 0.2     1, 0.17


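
The table above can be reproduced with a short simulation. The sketch below follows the slides' tie-breaking assumption that B plays b2 in the tied first stage (no further ties occur afterwards).

```python
import numpy as np

R = np.array([[4.0, 5.0], [0.0, 4.0]])   # A's payoffs: rows a1,a2; cols b1,b2
C = np.array([[4.0, 2.0], [1.0, 3.0]])   # B's payoffs
k_A, k_B = np.ones(2), np.ones(2)        # initial weights, all set to 1

for stage in range(4):
    q_A, q_B = k_A / k_A.sum(), k_B / k_B.sum()   # beliefs at the start of the stage
    a = int(np.argmax(R @ q_A))                   # A's best response to its belief
    v_B = q_B @ C                                 # B's expected payoff for b1, b2
    b = 1 if stage == 0 and v_B[0] == v_B[1] else int(np.argmax(v_B))
    k_A[b] += 1.0                                 # each player records the opponent's play
    k_B[a] += 1.0
    print(stage, ('a1', 'a2')[a], ('b1', 'b2')[b],
          'kA =', k_A, 'qA =', np.round(k_A / k_A.sum(), 2),
          'kB =', k_B, 'qB =', np.round(k_B / k_B.sum(), 2))
# Output matches the table: after stage 0 play settles on (a1, b1).
```
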
Gradient Based Learning
17


      Fictitious Play assumes unbounded computation is
      allowed in every step – arg max calculation
      An alternative is to proceed in gradient ascent on some
      objective function – expected payoff
      Two players – row and column – have payoff matrices
             R = [ r11  r12 ]       C = [ c11  c12 ]
                 [ r21  r22 ]           [ c21  c22 ]
      The row player chooses action 1 with probability α while the
      column player chooses action 1 with probability β
      Expected payoffs are
            Vr(α, β) = r11 αβ + r12 α(1 − β) + r21 (1 − α)β + r22 (1 − α)(1 − β)
            Vc(α, β) = c11 αβ + c12 α(1 − β) + c21 (1 − α)β + c22 (1 − α)(1 − β)
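
A direct translation of the two expected-payoff expressions, evaluated on the running example game; the function name V and the test point (0.5, 0.5) are arbitrary choices.

```python
def V(M, alpha, beta):
    # Expected payoff of 2x2 game M when the row player puts probability alpha
    # on action 1 and the column player puts probability beta on action 1.
    return (M[0][0] * alpha * beta + M[0][1] * alpha * (1 - beta)
            + M[1][0] * (1 - alpha) * beta + M[1][1] * (1 - alpha) * (1 - beta))

R = [[4.0, 5.0], [0.0, 4.0]]   # row player's payoffs from the example game
C = [[4.0, 2.0], [1.0, 3.0]]   # column player's payoffs
print(V(R, 0.5, 0.5), V(C, 0.5, 0.5))   # 3.25 2.5 under uniform mixing
```
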
Gradient Ascent
18


      Each player repeatedly adjusts her half of the current strategy
      pair in the direction of the current gradient with some step size η
                  αk+1 = αk + η ∂Vr(αk, βk)/∂α
                  βk+1 = βk + η ∂Vc(αk, βk)/∂β
      In case an update takes a strategy outside the probability
      simplex, it is projected back to the boundary
      The gradient ascent algorithm assumes a full-information game –
      both players know the game matrices and can see the mixed
      strategy of their opponent in the previous step
      With
            u  = (r11 + r22) − (r21 + r12)
            u' = (c11 + c22) − (c21 + c12)
      the partial derivatives are
            ∂Vr(α, β)/∂α = βu − (r22 − r12)
            ∂Vc(α, β)/∂β = αu' − (c22 − c21)
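
A sketch of the projected gradient-ascent loop using the closed-form partial derivatives above, again on the example game; the step size η = 0.05, the iteration count, and the clipping-based projection to [0, 1] are illustrative choices.

```python
R = [[4.0, 5.0], [0.0, 4.0]]   # example game, row player
C = [[4.0, 2.0], [1.0, 3.0]]   # example game, column player
u  = (R[0][0] + R[1][1]) - (R[1][0] + R[0][1])   # (r11 + r22) - (r21 + r12)
up = (C[0][0] + C[1][1]) - (C[1][0] + C[0][1])   # (c11 + c22) - (c21 + c12)

def clip(p):                   # project back onto [0, 1]
    return min(1.0, max(0.0, p))

alpha, beta, eta = 0.5, 0.5, 0.05
for _ in range(200):
    d_alpha = beta * u - (R[1][1] - R[0][1])     # ∂Vr/∂α
    d_beta  = alpha * up - (C[1][1] - C[1][0])   # ∂Vc/∂β
    alpha, beta = clip(alpha + eta * d_alpha), clip(beta + eta * d_beta)
print(alpha, beta)   # ends at (1, 1): all mass on action 1, the (a1, b1) equilibrium
```
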
Infinitesimal Gradient Ascent
19

      Interesting to see what happens to the strategy pair and to the
      expected payoffs over time
      Strategy pair sequence produced by following a gradient ascent
      algorithm may never converge
      Average payoff of both the players always converges to that of some
      Nash pair
      Consider a small step size assumption – limη→0 – so that the update
      equations become
            [ ∂α/∂t ]   [ 0   u ] [ α ]   [ −(r22 − r12) ]
            [ ∂β/∂t ] = [ u'  0 ] [ β ] + [ −(c22 − c21) ]
      Point where the gradient is zero – Nash Equilibrium
            (α*, β*) = ( (c22 − c21)/u' , (r22 − r12)/u )
      This point might even lie outside the probability simplex.

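
Evaluating the zero-gradient point for the example game illustrates the final remark: the point can fall outside the probability simplex.

```python
# The zero-gradient point (α*, β*) from the formula above, for the example game.
R = [[4.0, 5.0], [0.0, 4.0]]
C = [[4.0, 2.0], [1.0, 3.0]]
u  = (R[0][0] + R[1][1]) - (R[1][0] + R[0][1])   # u  = 3
up = (C[0][0] + C[1][1]) - (C[1][0] + C[0][1])   # u' = 4
alpha_star = (C[1][1] - C[1][0]) / up            # 0.5
beta_star  = (R[1][1] - R[0][1]) / u             # -1/3, i.e. outside [0, 1]
print(alpha_star, beta_star)
```
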
IGA dynamics
20


      Denote the off-diagonal matrix containing u and u’ by U
      Depending on the nature of U (noninvertible, real or imaginary
      eigenvalues), the convergence dynamics will vary




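
A compact way to state the case analysis: the eigenvalues of U = [[0, u], [u', 0]] are ±√(u·u'), so the sign of u·u' (or u·u' = 0) determines the regime. The labels in the sketch below paraphrase the standard IGA analysis and are not quoted from the slides.

```python
# Assumed summary of the IGA case analysis via the eigenvalues of U.
def classify(u, u_prime):
    if u == 0 or u_prime == 0:
        return 'U not invertible (a zero eigenvalue)'
    if u * u_prime > 0:
        return 'real eigenvalues of opposite sign (saddle-like trajectories)'
    return 'purely imaginary eigenvalues (trajectories orbit the center point)'

print(classify(3.0, 4.0))    # the example game: u = 3, u' = 4
print(classify(1.0, -1.0))   # a game with u*u' < 0
```
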
WoLF - W(in)-o(r)-L(earn)-Fast
21


      Introduces a variable learning rate in place of a fixed η
                  αk+1 = αk + η lkr ∂Vr(αk, βk)/∂α
                  βk+1 = βk + η lkc ∂Vc(αk, βk)/∂β
      Let αe be the equilibrium strategy selected by the row player
      and βe be the equilibrium strategy selected by the column player
            lkr = lmin  if Vr(αk, βk) > Vr(αe, βk)   → Winning
                  lmax  otherwise                    → Losing
            lkc = lmin  if Vc(αk, βk) > Vc(αk, βe)   → Winning
                  lmax  otherwise                    → Losing
      If, in a two-person, two-action, iterated general-sum game, both
      players follow the WoLF-IGA algorithm (with lmax > lmin), then their
      strategies will converge to a Nash equilibrium
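
One WoLF-IGA step can be written directly from the rule above. The sketch assumes, as the slide does, that the equilibrium strategies αe and βe are known (here both equal to 1, the (a1, b1) equilibrium of the example game); η, lmin, and lmax are illustrative values.

```python
# Sketch of the WoLF-IGA update: learn slowly (l_min) when winning,
# fast (l_max) when losing.
def wolf_iga_step(alpha, beta, alpha_e, beta_e, R, C, eta, l_min, l_max):
    V = lambda M, a, b: (M[0][0]*a*b + M[0][1]*a*(1-b)
                         + M[1][0]*(1-a)*b + M[1][1]*(1-a)*(1-b))
    u  = (R[0][0] + R[1][1]) - (R[1][0] + R[0][1])
    up = (C[0][0] + C[1][1]) - (C[1][0] + C[0][1])
    l_r = l_min if V(R, alpha, beta) > V(R, alpha_e, beta) else l_max
    l_c = l_min if V(C, alpha, beta) > V(C, alpha, beta_e) else l_max
    d_alpha = beta * u - (R[1][1] - R[0][1])      # ∂Vr/∂α
    d_beta  = alpha * up - (C[1][1] - C[1][0])    # ∂Vc/∂β
    clip = lambda p: min(1.0, max(0.0, p))
    return clip(alpha + eta * l_r * d_alpha), clip(beta + eta * l_c * d_beta)

R = [[4.0, 5.0], [0.0, 4.0]]
C = [[4.0, 2.0], [1.0, 3.0]]
alpha = beta = 0.5
for _ in range(500):
    alpha, beta = wolf_iga_step(alpha, beta, 1.0, 1.0, R, C,
                                eta=0.05, l_min=0.1, l_max=1.0)
print(alpha, beta)   # approaches (1, 1), i.e. the (a1, b1) equilibrium
```
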
WoLF-IGA convergence
22




To Conclude
23


      Learning in games is popular in anticipation of a future in
      which less than rational agents play a game repeatedly to
      arrive at a stable and efficient equilibrium.
      The algorithmic structure and adaptive techniques involved in
      such learning are largely motivated by Machine Learning and
      Adaptive Filtering
      A gradient-based approach relieves the computational burden of the
      arg max step in fictitious play but might suffer from convergence issues
      A stochastic gradient method (not discussed in the presentation)
      makes use of minimal information available and still performs
      near-optimally

