Disappointing and Unexpected
     Results in Monte-Carlo Tree Search

                                O. Teytaud & colleagues
                           Silver Workshop, ECML 2012




In a nutshell:
- the game of Go, a great AI-complete challenge
- MCTS, a great recent tool for MDP-solving
- negative results on MCTS are the most important stuff
- considerations on academic publications (pros and cons)
                                     If you solve these weaknesses,
                                     even if it takes all your time
                                     in all your research during 30 years,
                                     it is worth doing.
Part I. A success story
        on Computer Games
Part II. Two unsolved problems in Computer Games

Part III. Some algorithms which do not solve them

Part IV. Conclusion (technical)

Part V. Meta-conclusion (non-technical)
Part I : The Success Story
         (less showing off in part II :-) )

            The game of Go is a beautiful challenge.

            We achieved the first wins against
            professional players in the game of Go.
Game of Go (9x9 here)
Game of Go: counting territories
(white has 7.5 “bonus” as black starts)
Game of Go: the rules
       Black plays at the blue circle:
       the white group dies (it is
       removed)

It's impossible to kill white (two “eyes”).




     “Superko” rule: we don't come back to the same
     situation.

              (without superko: “PSPACE-hard”;
                with superko: “EXPTIME-hard”)

 At the end, we count territories
 ==> black starts, so +7.5 for white.
UCT (Upper Confidence Trees)
               (a variant of MCTS)




Coulom (06)
Chaslot, Saito & Bouzy (06)
Kocsis & Szepesvari (06)
Exploitation ...
            SCORE =
                5/7
             + k.sqrt( log(10)/7 )
... or exploration ?
              SCORE =
                  0/2
               + k.sqrt( log(10)/2 )
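
The two scores above can be checked numerically. This is a minimal sketch, assuming that `log` is the natural logarithm and that the constant `k` is a free tuning parameter (the slides fix neither); the function name `ucb_score` is only illustrative.

```python
import math

def ucb_score(wins, visits, parent_visits, k=1.0):
    """Empirical mean plus exploration bonus k * sqrt(log(parent_visits) / visits)."""
    return wins / visits + k * math.sqrt(math.log(parent_visits) / visits)

# The two moves from the slides: 5 wins out of 7, and 0 wins out of 2,
# with 10 simulations in the parent node.
for k in (1.0, 2.0):
    exploit = ucb_score(5, 7, 10, k)
    explore = ucb_score(0, 2, 10, k)
    choice = "exploitation" if exploit > explore else "exploration"
    print(f"k={k}: 5/7 move -> {exploit:.3f}, 0/2 move -> {explore:.3f}, choose {choice}")
```

Under these assumptions the well-explored 5/7 move is chosen for k = 1, while the rarely tried 0/2 move is chosen for k = 2; the crossover is at k of roughly 1.43.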
“UCB” ?
•   I have shown the “UCB” formula (Lai, Robbins), which is
    what distinguishes UCT from plain MCTS.

•   The UCB formula rests on deep mathematical principles.

•   But these are very far from the MCTS context.

•   Contrary to what has often been claimed, UCB is
    not central in MCTS.

•   But for publishing papers, relating MCTS to UCB is
    so beautiful, with plenty of maths papers in the
    bibliography :-)
The great news:
● Not related to classical algorithms
              (no alpha-beta)
● Recent tools
              (Rémi Coulom's paper in 2006)
● Not at all specific to Go
              (now widely used in games,
              and beyond)

    But great performance in Go
         needs adaptations
         (of the MC part)...
We all have to write reports:
● Showing that we are very strong
● Showing that our research has “breakthroughs”,

    which destroy “bottlenecks”

So OK, the previous slide is perfect for that.
Part II: challenges

Two main challenges:
● Situations which require abstract thinking

                                (cf. Cazenave)
● Situations which involve divide & conquer

                                     (cf. Müller)
A trivial semeai
           (= “liberty” race)

Plenty of equivalent situations!
They are randomly sampled, with no generalization.

Estimated win probability: 50%!
This is very easy.
Children can solve that.

But it is too abstract
for computers.

Computers play
“semeais” very badly.
It does not work. Why?

                                              Estimated win probability: 50%!

In the first node:
● The first simulations give ~ 50%
● The next simulations go to 100% or 0%
  (depending on the chosen move)
● But, then, we switch to another node
  (~ 8! x 8! such nodes)
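
To give a sense of scale, the count quoted above can be evaluated directly. This is only a back-of-the-envelope sketch that takes the "8! x 8!" figure at face value; the variable names are illustrative.

```python
import math

# Number of orderings of 8 interchangeable moves for one player.
orderings_per_side = math.factorial(8)

# The slide's estimate: one node per pair of orderings (8! x 8!).
equivalent_nodes = orderings_per_side ** 2

print(orderings_per_side)  # 40320
print(equivalent_nodes)    # 1625702400
```

Each of these nodes keeps its own statistics, so what is learned in one of them is not reused in the others.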
And humans?

                                 Estimated win probability: 50%!

In the first node:
● The first simulations give ~ 50%
● The next simulations go to 100% or 0%
  (depending on the chosen move)
● But, then, we DON'T switch to another node
Requires more than local fighting.
Requires combining several local fights.
Children usually
 not so good
 at this.
But strong adults
 really good.
And computers
  very childish.
                          Looks like a
                           bad move,
                            “locally”.

    Lee Sedol (black)
          Vs
   Hang Jansik (white)
Part III. Some algorithms which do not solve them
    (negative results show that the important stuff is
      really in Part II...)
Part III: techniques for addressing these challenges




             1. Parallelization

           2. Machine Learning

        3. Genetic Programming

              4. Nested MCTS
Parallelizing MCTS
•       On a parallel machine with shared memory: just run many
        simulations in parallel, with the same memory for all.

•       On a parallel machine with no shared memory: one
        MCTS per compute node, and 3 times per second:

    ●   Select nodes with at least 5% of total sims (depth at most
        3)

    ●   Average all statistics on these nodes

        ==> comp. cost = log(nb of compute nodes)
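
The averaging step for the no-shared-memory case can be sketched as follows. This is a single-process illustration under stated assumptions, not the implementation behind the slides: the `Node` class, the function names, and the choice to average over the workers that actually hold a given node are all hypothetical, and the real communication pattern (which gives the logarithmic cost) is not modelled.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    wins: float = 0.0
    visits: float = 0.0
    children: dict = field(default_factory=dict)  # move -> Node

def heavy_paths(root, min_share=0.05, max_depth=3):
    """Paths (tuples of moves) to nodes with at least min_share of the root's simulations."""
    paths, stack = [], [((), root)]
    while stack:
        path, node = stack.pop()
        if root.visits > 0 and node.visits >= min_share * root.visits:
            paths.append(path)
            if len(path) < max_depth:
                stack.extend((path + (m,), c) for m, c in node.children.items())
    return paths

def find(root, path):
    """Follow a path of moves from the root; None if this tree lacks the node."""
    node = root
    for move in path:
        node = node.children.get(move)
        if node is None:
            return None
    return node

def synchronize(roots, min_share=0.05, max_depth=3):
    """Average wins and visits of the selected nodes across all worker trees."""
    paths = set()
    for root in roots:
        paths.update(heavy_paths(root, min_share, max_depth))
    for path in paths:
        nodes = [n for n in (find(r, path) for r in roots) if n is not None]
        wins = sum(n.wins for n in nodes) / len(nodes)
        visits = sum(n.visits for n in nodes) / len(nodes)
        for n in nodes:
            n.wins, n.visits = wins, visits

# Two workers that explored the same position differently.
a = Node(60, 100, {"x": Node(50, 90), "y": Node(1, 2)})
b = Node(40, 100, {"x": Node(10, 50), "y": Node(30, 50)})
synchronize([a, b])
print(a.children["x"], b.children["x"])
```

Only a handful of nodes near the root pass the 5% threshold, which is what keeps the amount of data exchanged small.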
Good news: it works
 So misleading numbers...
Much better than voting schemes




  But little difference with T. Cazenave
  (depth 0).
Every month, someone tells us:


                Try with a bigger
                   machine !
                And win against
                    top pros !


                (I believed that,
                   at some point...)
In fact, “32” and “1”
have almost the same level...
               (against humans...)
Being faster is not the solution
The same in Havannah
      (F. Teytaud)
More deeply, 1
                            (R. Coulom)

Improvement in terms of performance against
humans

               <<

Improvement in terms of performance against
computers

               <<

Improvements in terms of self-play
More deeply, 2



No improvement in divide and conquer.



    No improvement on situations

      which require abstraction.
Part III: techniques for addressing these challenges

         2. Machine Learning
Machine learning

A lot of tuning of the MC is central.

  It is a bit disappointing for the
      genericity of the method.

        Can we make this
        tuning automatic ?
A classical machine learning trick in MCTS: RAVE
              (= rapid action value estimates)
   score(move) =
             alpha UCB(move)
              + (1-alpha) StatisticsInSubtree(move)
   alpha^2 = nbSimulations / ( K + nbSimulations)
Usually works well, but performs weakly on some situations.
                                                   [Müller]

Weaknesses:
 - brings information only from bottom to top of the tree
 - does not solve the main problems
 - sometimes very harmful

Here B2 is the only good move for white.
But B2 makes sense only as a first move,
  and nowhere else in subtrees ==> RAVE rejects B2.

==> extensions ?
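
The RAVE score above can be sketched in a few lines. This is an illustration under several assumptions: "Alpha2" is read as alpha squared, `StatisticsInSubtree(move)` is taken to be the win rate of the move over all simulations in the subtree, `log` is the natural logarithm, and the default of 0.5 for a move with no subtree statistics is arbitrary. All names and default values are hypothetical.

```python
import math

def ucb(wins, visits, parent_visits, k=1.0):
    """Empirical mean plus exploration bonus."""
    return wins / visits + k * math.sqrt(math.log(parent_visits) / visits)

def rave_score(wins, visits, parent_visits, subtree_wins, subtree_visits, K=100.0, k=1.0):
    """alpha * UCB(move) + (1 - alpha) * subtree win rate, with alpha^2 = n / (K + n)."""
    subtree_mean = subtree_wins / subtree_visits if subtree_visits > 0 else 0.5
    if visits == 0:
        # alpha = 0: the score relies entirely on the subtree statistics.
        return subtree_mean
    alpha = math.sqrt(visits / (K + visits))
    return alpha * ucb(wins, visits, parent_visits, k) + (1.0 - alpha) * subtree_mean

# A move never tried in this node but often successful deeper in the tree
# gets a high score; a well-explored move is scored almost purely by UCB.
print(rave_score(0, 0, 10, 3, 4))
print(rave_score(500_000, 1_000_000, 2_000_000, 0, 10))
```

The B2 example is the opposite case: a move that is good only as a first move has poor subtree statistics, so the second term drags its score down while the node has few simulations.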
Generic rules proposed recently:
- Drake [ICGA 2009]: Last Good Reply
- Silver and others: simulation balancing
- poolRave [Rimmel et al, ACG 2011]
- Contextual Monte-Carlo [Rimmel et al, EvoGames 2010]
- Decisive moves and anti-decisive moves
                                  [Teytaud et al, CIG 2010]

       ==> statistically significant improvements, but far less
             efficient than human expertise
Part III: techniques for addressing these challenges

         3. Genetic Programming
We don't want to use expert knowledge.
We want automated solutions.
Developing biases by Genetic Programming ?
Developing a MC by Genetic Programming ?
          (Hoock et al, Cazenave et al)

          Looks like a good idea.
              But importantly:
            a stronger MC part
(in terms of playing strength of the MC part alone)
           does not imply (by far!)
              a stronger MCTS.

                          (except in 1P cases...)
Part III: techniques for addressing these challenges

         4. Nested MCTS
Nested MCTS in one slide
                             (Cazenave, F. Teytaud, etc)

1) to a strategy pi, you can associate a value function
    pi-Value(s)
    = expected reward when simulating with strategy pi
          from state s
2) Then define:
  NestedMC0(state)=MC(state)
  NestedMC1(state)=decision maximizing
           NestedMC0-value(state+decision)
                ...
  NestedMC42(state)=decision maximizing
           NestedMC41-value(state+decision)

==> looks like a great idea
==> not good in Go
==> good on some less widely known testbeds
         (“morpion solitaire”, some hard scheduling pbs)
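
The recursive definition above can be sketched on a toy one-player problem. This is a minimal illustration, not the algorithm as published: the value of the lower-level strategy is estimated here by a single simulation, no best sequence is memorized, and `SumGame`, `playout` and their parameters are hypothetical names introduced for the example.

```python
import random

class SumGame:
    """Toy 1P game: pick `horizon` numbers from `choices`; reward is their sum."""
    def __init__(self, horizon=3, choices=(0, 1, 2)):
        self.horizon, self.choices = horizon, choices

    def moves(self, state):
        return list(self.choices) if len(state) < self.horizon else []

    def play(self, state, move):
        return state + (move,)

    def reward(self, state):
        return sum(state)

def playout(state, level, game, rng):
    """Play to the end with the level-`level` strategy and return the final reward."""
    while game.moves(state):
        if level == 0:
            # Level 0 is the plain Monte-Carlo strategy: uniformly random moves.
            move = rng.choice(game.moves(state))
        else:
            # Level n picks the decision maximizing the (sampled) value of
            # the level n-1 strategy from state + decision.
            move = max(game.moves(state),
                       key=lambda m: playout(game.play(state, m), level - 1, game, rng))
        state = game.play(state, move)
    return game.reward(state)

game = SumGame(horizon=3)
rng = random.Random(0)
for level in range(4):
    print(level, playout((), level, game, rng))
```

On this toy game a nesting level equal to the horizon always reaches the optimal reward, at a cost that grows quickly with the level.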
Part IV: Conclusions

Game of Go:

1- disappointingly,
    most recent progress = human expertise
    ==> we understood a lot through methods which
        do not work or work only a little
    ==> we understood a lot through
        counter-examples, not through impressive
        performance

2- UCB is not that much involved in MCTS
          (simple rules perform similarly)
                      ==> “publication bias”
Part IV: Conclusions

Recent “generic” progress in MCTS:

1- application to GGP (general game playing):
    the program learns the rules of the game
    just before the competition, no last-minute
    development (fully automated)
   ==> not so well known, but really interesting

2- one-player games: great ideas which do not
   work in 2P games sometimes work in 1P
   games (e.g. optimizing the MC in a
   DPS sense)
Part IV: Conclusions

Techniques which outperformed the state of the art in
Minesweeper were tested (negatively) on Go,
and (positively) on industrial problems.
Part V: Meta-Conclusion

Huge publication bias.
People report only experiments which are
sooooo great breakthroughs.
But when you discuss with them, they tell
you that there is publication and
there is reality.

In the end, we trust our friends, or published
theorems, but we don't trust experiments.

The most interesting MCTS results are
negative results:
- Understanding this “combination of local stuff”
  is impossible for computers.
- Abstract thinking (looks like theorem proving).
==> Current main ML techniques for MCTS
    do not work on this.
There are several examples of MCTS papers
in which problems were swept under the carpet,
            for the sake of publication,
   whereas the dust was the interesting stuff.
     Results are often difficult to reproduce,
    or unstable w.r.t. experimental conditions.
Examples:
    “- I have truncated results to ..... because
       it was unstable otherwise.”
                  (cheat by using new version only for openings)

    “- I could make it work after a lot of tuning in 9x9, but I could
       not get positive results in 19x19”
                  (cheat by heavy tuning)

==> for any method, with enough
          tuning, you get positive results
==> you are more likely to publish “I used sophisticated method XXX
      and got positive results” than “I used plenty of dirty tuning
          and got positive results”
          ==> if method XXX has plenty of free parameters,
             it's OK: at some point you will validate it
For mathematical works,
 sometimes people lie about motivations, trying
to justify that there is a real-world application.

 Sometimes it's true, but it's also often a lie.

                      A memory from a long time ago:
                  I was working on pure theory stuff and
                  I asked “I have read in the abstract that
                this can be applied to biological problems.
                             Can you explain ?”
                   Answer: “Wahaha he has believed it!”

     In experiments, it's different: people
        often use experimental setups
  for hiding the problems under the carpet.
       Mathematicians cannot do that.
Part V: Meta-Conclusion

Huge publication bias.
People report only experiments which are
sooooo great breakthroughs.
But when you discuss with them, they tell
you that there is publication and
there is reality.

My conclusions:
- don't trust publications too much,
- I want to publish less,
- I want to publish (try to publish...)
      failures and disappointing results.

We could apply in MineSweeper (1P) ideas which do not work in Go (2P).
We could apply in Energy Manag. (1P) ideas which do not work in Go (2P).
Part V: Meta-Conclusion

People in computer games look much more
clever since they have been working on Go.

Much easier to write reports :-)
Lucky, right place, right moment.

The progress in the game of Go does not cure cancer.
The important challenges are still in front of us (don't trust
         published solutions too much...).

Failed experiments on Go provide
 more insights than the success story (in which
 the tuning part, which is not so generic, is not visible...).
Yet games are great challenges.




When you play Go, you look clever & wise.

When you play StarCraft, you look like
  a geeky teenager.

Yet, StarCraft, Doom, Table Tennis, MineSweeper
   are great challenges.
Difficult games: Havannah

                      Very difficult
                     for computers.
What else ? First-Person Shooters
(UCT for partially observable MDP)
What else ? Real Time Strategy Game
   (multiple actors, partially obs.)




 Frédéric Lemoine   MIG 11/07/2008
What else ? Sports (continuous control)




“Real” games
Assumption: if a computer understands and guesses spins, then
this robot will be efficient for something else than just games.

(holds true for Go)
What else ? Collaborative sports
Funding based on publication records.
    For me, this is the source of all evil.

Experimental works:
- difficult to reproduce (except games...)
- statistical cheating, conscious or unconscious
- dust swept under the carpet / aesthetic bias
- negative results unpublished
==> moderately reliable publications

Academics are, and should remain,
the most independent and reliable people.
They should be referees for all important industrial contracts.

Yet, academic papers are, I think, more reliable than
reports for billion-$ contracts ==> pressure by money
does not work :-(

Weitere ähnliche Inhalte

Andere mochten auch

Combining games artificial intelligences & improving random seeds
Combining games artificial intelligences & improving random seedsCombining games artificial intelligences & improving random seeds
Combining games artificial intelligences & improving random seedsOlivier Teytaud
 
Examples of operational research
Examples of operational researchExamples of operational research
Examples of operational researchOlivier Teytaud
 
Keywords and examples of machine learning
Keywords and examples of machine learningKeywords and examples of machine learning
Keywords and examples of machine learningOlivier Teytaud
 
Fuzzy control - superfast survey
Fuzzy control - superfast surveyFuzzy control - superfast survey
Fuzzy control - superfast surveyOlivier Teytaud
 
Simulation-based optimization: Upper Confidence Tree and Direct Policy Search
Simulation-based optimization: Upper Confidence Tree and Direct Policy SearchSimulation-based optimization: Upper Confidence Tree and Direct Policy Search
Simulation-based optimization: Upper Confidence Tree and Direct Policy SearchOlivier Teytaud
 
Bias correction, and other uncertainty management techniques
Bias correction, and other uncertainty management techniquesBias correction, and other uncertainty management techniques
Bias correction, and other uncertainty management techniquesOlivier Teytaud
 

Andere mochten auch (8)

Combining games artificial intelligences & improving random seeds
Combining games artificial intelligences & improving random seedsCombining games artificial intelligences & improving random seeds
Combining games artificial intelligences & improving random seeds
 
Examples of operational research
Examples of operational researchExamples of operational research
Examples of operational research
 
Keywords and examples of machine learning
Keywords and examples of machine learningKeywords and examples of machine learning
Keywords and examples of machine learning
 
Debugging
DebuggingDebugging
Debugging
 
Fuzzy control - superfast survey
Fuzzy control - superfast surveyFuzzy control - superfast survey
Fuzzy control - superfast survey
 
Simulation-based optimization: Upper Confidence Tree and Direct Policy Search
Simulation-based optimization: Upper Confidence Tree and Direct Policy SearchSimulation-based optimization: Upper Confidence Tree and Direct Policy Search
Simulation-based optimization: Upper Confidence Tree and Direct Policy Search
 
Power systemsilablri
Power systemsilablriPower systemsilablri
Power systemsilablri
 
Bias correction, and other uncertainty management techniques
Bias correction, and other uncertainty management techniquesBias correction, and other uncertainty management techniques
Bias correction, and other uncertainty management techniques
 

Ähnlich wie Disappointing results & open problems in Monte-Carlo Tree Search

Combining UCT and Constraint Satisfaction Problems for Minesweeper
Combining UCT and Constraint Satisfaction Problems for MinesweeperCombining UCT and Constraint Satisfaction Problems for Minesweeper
Combining UCT and Constraint Satisfaction Problems for MinesweeperOlivier Teytaud
 
Stochastic Dynamic Programming and You.pptx
Stochastic Dynamic Programming and You.pptxStochastic Dynamic Programming and You.pptx
Stochastic Dynamic Programming and You.pptxjkmrshll88
 
Meta Monte-Carlo Tree Search
Meta Monte-Carlo Tree SearchMeta Monte-Carlo Tree Search
Meta Monte-Carlo Tree SearchOlivier Teytaud
 
Showcase of My Research on Games & AI "till the end of Oct. 2014"
Showcase of My Research on Games & AI "till the end of Oct. 2014"Showcase of My Research on Games & AI "till the end of Oct. 2014"
Showcase of My Research on Games & AI "till the end of Oct. 2014"Mohammad Shaker
 
Dark Matter, Public Health, and Scientific Computing
Dark Matter, Public Health, and Scientific ComputingDark Matter, Public Health, and Scientific Computing
Dark Matter, Public Health, and Scientific ComputingGreg Wilson
 
Disappointing results & open problems in Monte-Carlo Tree Search

  • 1. Disappointing and Unexpected Results in Monte-Carlo Tree Search O. Teytaud & colleagues Silver Workshop, ECML 2012 In a nutshell: - the game of Go, a great AI-complete challenge - MCTS, a great recent tool for MDP-solving - negative results on MCTS are the most important stuff - considerations on academic publications (pros and cons)
  • 2. If you solve these weaknesses, even if it takes all your time in all your research for 30 years, it is worth doing.
  • 3. Part I. A success story on Computer Games Part II. Two unsolved problems in Computer Games Part III. Some algorithms which do not solve them Part IV. Conclusion (technical) Part V. Meta-conclusion (non-technical)
  • 5. Part I: The Success Story (less showing off in part II :-) ) The game of Go is a beautiful challenge. We achieved the first wins against professional players in the game of Go.
  • 6. Game of Go (9x9 here)
  • 13. Game of Go: counting territories (white has 7.5 “bonus” as black starts)
  • 14. Game of Go: the rules Black plays at the blue circle: the white group dies (it is removed) It's impossible to kill white (two “eyes”). “Superko” rule: we don't come back to the same situation. (without superko: “PSPACE hard” with superko: “EXPTIME-hard”) At the end, we count territories ==> black starts, so +7.5 for white.
  • 15. UCT (Upper Confidence Trees) (a variant of MCTS) Coulom (06) Chaslot, Saito & Bouzy (06) Kocsis Szepesvari (06)
  • 16. UCT
  • 20. UCT Kocsis & Szepesvari (06)
  • 22. Exploitation ... SCORE = 5/7 + k.sqrt( log(10)/7 )
  • 25. ... or exploration ? SCORE = 0/2 + k.sqrt( log(10)/2 )
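The two scores above can be compared numerically. The following is a minimal sketch, assuming the slide's `log` is the natural logarithm and that 10 is the number of simulations of the parent node; the function name `ucb_score` and the two values of `k` are illustrative choices, not taken from the slides.

```python
import math


def ucb_score(wins, visits, parent_visits, k):
    """Empirical win rate plus an exploration bonus weighted by k.

    Assumption: natural logarithm, as the slide only writes "log".
    """
    return wins / visits + k * math.sqrt(math.log(parent_visits) / visits)


for k in (1.0, 2.0):
    exploit = ucb_score(5, 7, 10, k)  # move with 5 wins out of 7 simulations
    explore = ucb_score(0, 2, 10, k)  # move with 0 wins out of 2 simulations
    choice = "exploitation" if exploit > explore else "exploration"
    print(f"k={k}: {exploit:.3f} vs {explore:.3f} -> {choice}")
```

Under these assumptions the well-scored move is preferred for a small `k` and the rarely tried move for a larger `k`, which is the whole role of the constant.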
  • 30. “UCB”? • I have shown the “UCB” formula (Lai, Robbins), which is the difference between MCTS and UCT. • The UCB formula has deep mathematical principles. • But it is very far from the MCTS context. • Contrary to what has often been claimed, UCB is not central in MCTS. • But for publishing papers, relating MCTS to UCB is so beautiful, with plenty of maths papers in the bibliography :-)
  • 32. The great news: ● Not related to classical algorithms (no alpha-beta) ● Recent tools (Rémi Coulom's paper in 2006) ● Not at all specific to Go (now widely used in games, and beyond). But great performance in Go needs adaptations (of the MC part)...
  • 33. We all have to write reports: ● Showing that we are very strong ● Showing that our research has “breakthroughs”, which destroy “bottlenecks” So ok the previous slide is perfect for that
  • 34. Part II: challenges Two main challenges: ● Situations which require abstract thinking (cf. Cazenave) ● Situations which involve divide & conquer (cf Müller)
  • 36. A trivial semeai (= “liberty” race) Plenty of equivalent situations! They are randomly sampled, with no generalization. 50% of estimated win probability!
  • 47. This is very easy. Children can solve that. But it is too abstract for computers. Computers play “semeais” very badly.
  • 48. It does not work. Why? 50% of estimated win probability! In the first node: ● The first simulations give ~50% ● The next simulations go to 100% or 0% (depending on the chosen move) ● But, then, we switch to another node (~ 8! x 8! such nodes)
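To give a sense of scale for the node count quoted above, here is the arithmetic as a short sketch. The variable name is illustrative; the only input is the 8! x 8! figure from the slide.

```python
import math

# Number of equivalent semeai nodes quoted on the slide: 8! x 8!
equivalent_nodes = math.factorial(8) * math.factorial(8)
print(equivalent_nodes)  # 1625702400
```

With more than a billion equivalent nodes and no generalization between them, each one has to be learned again from scratch by sampling.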
  • 49. And the humans? 50% of estimated win probability! In the first node: ● The first simulations give ~50% ● The next simulations go to 100% or 0% (depending on the chosen move) ● But, then, we DON'T switch to another node
  • 50. Requires more than local fighting. Requires combining several local fights. Children are usually not so good at this. But strong adults are really good. And computers are very childish. Looks like a bad move, “locally”. Lee Sedol (black) vs Hang Jansik (white)
  • 52. Part I. A success story on Computer Games Part II. Two unsolved problems in Computer Games Part III. Some algorithms which do not solve them (negative results show that the important stuff is really in Part II...) Part IV. Conclusion (technical) Part V. Meta-conclusion (non-technical)
  • 53. Part III: techniques for addressing these challenges 1. Parallelization 2. Machine Learning 3. Genetic Programming 4. Nested MCTS
  • 54. Parallelizing MCTS • On a parallel machine with shared memory: just many simulations in parallel, the same memory for all. • On a parallel machine with no shared memory: one MCTS per compute node, and 3 times per second: ● select nodes with at least 5% of total simulations (depth at most 3) ● average all statistics on these nodes ==> computational cost = log(number of compute nodes)
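The message-passing scheme above can be sketched as a single synchronization step. This is only an illustration under several assumptions that are not in the slides: each compute node stores its tree as a dict mapping a move path (a tuple, `()` for the root) to `(wins, visits)`; the 5% threshold is applied to visits summed over all compute nodes; a compute node that has not yet created a tree node counts as zero. The function name `share_statistics` is hypothetical, and the logarithmic communication cost (a tree-shaped reduction) is not modeled here.

```python
def share_statistics(trees, min_share=0.05, max_depth=3):
    """Average (wins, visits) across compute nodes for shallow, heavily visited tree nodes."""
    n = len(trees)
    # Total number of simulations = sum of root visit counts over compute nodes.
    total_sims = sum(tree[()][1] for tree in trees)
    # Only tree nodes close to the root are candidates for sharing.
    candidates = {path for tree in trees for path in tree if len(path) <= max_depth}
    for path in candidates:
        wins = sum(tree.get(path, (0, 0))[0] for tree in trees)
        visits = sum(tree.get(path, (0, 0))[1] for tree in trees)
        if visits >= min_share * total_sims:
            for tree in trees:
                tree[path] = (wins / n, visits / n)
    return trees


# Two compute nodes with slightly different local statistics.
trees = [
    {(): (50, 100), ("a",): (40, 60), ("b",): (1, 2)},
    {(): (60, 100), ("a",): (20, 40), ("b",): (0, 1)},
]
share_statistics(trees)
print(trees[0][("a",)], trees[0][("b",)])
```

In this run the move `a` holds half of all simulations, so its statistics are averaged on both machines, while `b` stays below the threshold and keeps its local counts.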
  • 59. Good news: it works. Such misleading numbers...
  • 60. Much better than voting schemes. But little difference with T. Cazenave (depth 0).
  • 61. Every month, someone tells us: Try with a bigger machine ! And win against top pros ! (I have believed that, at some point...)
  • 62. In fact, “32” and “1” have almost the same level... (against humans...)
  • 63. Being faster is not the solution
  • 64. The same in Havannah (F. Teytaud)
  • 65. More deeply, 1 (R. Coulom) Improvement in terms of performance against humans << Improvement in terms of performance against computers << Improvements in terms of self-play
  • 66. More deeply, 2 No improvement in divide and conquer. No improvement on situations which require abstraction.
  • 68. Machine learning A lot of tuning of the MC is central. It is a bit disappointing for the genericity of the method. Can we make this tuning automatic ?
  • 69. A classical machine learning trick in MCTS: RAVE (= rapid action value estimates). score(move) = alpha UCB(move) + (1-alpha) StatisticsInSubtree(move), with alpha^2 = nbSimulations / (K + nbSimulations). Usually works well, but performs weakly on some situations. Weaknesses: - brings information only from bottom to top of the tree - does not solve main problems - sometimes very harmful ==> extensions?
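The RAVE blend above can be written out as follows. This is a sketch under one stated assumption: the slide's "Alpha2" is read as alpha squared (a superscript lost in extraction), so alpha is the square root of nbSimulations / (K + nbSimulations). The function name `rave_score` and the numeric inputs are illustrative only.

```python
import math


def rave_score(ucb_value, subtree_value, n_simulations, k):
    """Blend the UCB score of a move with its statistics gathered in the subtree.

    Assumption: alpha^2 = n_simulations / (k + n_simulations), with k > 0.
    """
    alpha = math.sqrt(n_simulations / (k + n_simulations))
    return alpha * ucb_value + (1 - alpha) * subtree_value


# With few simulations the subtree statistics dominate; with many, UCB takes over.
for n in (0, 10, 1000, 10**9):
    print(n, round(rave_score(0.9, 0.2, n, 100), 4))
```

The weighting makes the failure mode on the next slide visible: when a move is good only as a first move, the subtree statistics are misleading, and they are exactly what dominates while the move has few simulations.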
  • 70. Here B2 is the only good move for white. But B2 makes sense only as a first move, and nowhere else in subtrees ==> RAVE rejects B2. ==> extensions ?
  • 71. RAVE usually works well, but performs weakly on some situations [Müller]. Generic rules proposed recently: - Drake [ICGA 2009]: Last Good Reply - Silver and others: simulation balancing - poolRave [Rimmel et al, ACG 2011] - Contextual Monte-Carlo [Rimmel et al, EvoGames 2010] - Decisive moves and anti-decisive moves [Teytaud et al, CIG 2010] ==> statistically significant improvements, but far less efficient than human expertise
  • 73. We don't want to use expert knowledge. We want automated solutions. Developing biases by Genetic Programming ?
  • 74. Developing a MC by Genetic Programming? Looks like a good idea. But importantly: a strong MC part (in terms of playing strength of the MC part) does not imply (by far!) a stronger MCTS (except in 1P cases...).
  • 75. Developing a MC by Genetic Programming: Hoock et al; Cazenave et al.
  • 79. Nested MCTS in one slide (Cazenave, F. Teytaud, etc) 1) To a strategy π, you can associate a value function: π-Value(s) = expected reward when simulating with strategy π from state s. 2) Then define: NestedMC0(state) = MC(state); NestedMC1(state) = decision maximizing NestedMC0-value(state+decision); ...; NestedMC42(state) = decision maximizing NestedMC41-value(state+decision) ==> looks like a great idea ==> not good in Go ==> good on some less widely known testbeds (“morpion solitaire”, some hard scheduling problems)
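The recursive definition above can be sketched on a toy one-player problem. Everything about the toy problem is an assumption made for illustration (choose 6 bits, reward = number of ones), as are the function names; each value is estimated from a single rollout rather than an expectation, and published nested Monte-Carlo variants also memorize the best sequence found, which this sketch omits.

```python
import random

HORIZON = 6  # hypothetical toy problem: a state is a tuple of at most 6 bits


def legal_moves(state):
    return [] if len(state) == HORIZON else [0, 1]


def play(state, move):
    return state + (move,)


def reward(state):
    return sum(state)  # toy reward: number of ones chosen


def nested_decision(state, level, rng):
    """Level 0 plays at random; level k picks the move whose level k-1 value is best."""
    moves = legal_moves(state)
    if level == 0:
        return rng.choice(moves)
    return max(moves, key=lambda m: nested_value(play(state, m), level - 1, rng))


def nested_value(state, level, rng):
    """Reward obtained by following the level-`level` strategy from `state` to the end."""
    while legal_moves(state):
        state = play(state, nested_decision(state, level, rng))
    return reward(state)


rng = random.Random(0)
for level in (0, 1, 2):
    print(level, nested_value((), level, rng))
```

Level 0 is the plain Monte-Carlo playout; each extra level wraps a search around the level below, which is why the cost grows quickly with the nesting depth.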
  • 81. Part IV: Conclusions Game of Go: 1- disappointingly, most recent progress = human expertise ==> we understood a lot by methods which do not work or work little ==> we understood a lot by counter-examples, not by impressive performance
  • 82. Part IV: Conclusions Game of Go: 2- UCB is not that much involved in MCTS (simple rules perform similarly) ==> “publication bias”
  • 83. Part IV: Conclusions Recent “generic” progress in MCTS: 1- application to GGP (general game playing): the program learns the rules of the game just before the competition, no last-minute development (fully automated) ==> not so well known, but really interesting
  • 84. Part IV: Conclusions Recent “generic” progress in MCTS: 2- one-player games: great ideas which do not work in 2P games sometimes work in 1P games (e.g. optimizing the MC in a DPS sense)
  • 85. Part IV: Conclusions Techniques which outperformed the state of the art in Minesweeper were (negatively) tested on Go, and (positively) on industrial problems.
  • 88. Part V: Meta-Conclusion Huge publication bias. People report only experiments which are sooooo great breakthroughs. But when you discuss with them they tell you that there is publication and there is reality. At the end, we trust our friends, or published theorems, but we don't trust experiments. The most interesting MCTS results are negative results:
  • 89. Understanding this “combination of local stuff” is impossible for computers. Abstract thinking (looks like theorem proving). Current main ML techniques for MCTS do not work on this.
  • 90. There are several examples of MCTS papers in which problems were swept under the carpet, for the sake of publication, whereas the dust was the interesting stuff. Results are often difficult to reproduce, or unstable w.r.t. experimental conditions.
  • 92. Examples: “- I have truncated results to ..... because it was unstable otherwise.” (cheat by using new version only for openings) “- I could make it work after a lot of tuning in 9x9, but I could not get positive results in 19x19” (cheat by heavy tuning) ==> for any method, with enough tuning, you get positive results ==> you are more likely to publish “I used sophisticated method XXX and got positive results” than “I used plenty of dirty tuning and got positive results” ==> if method XXX has plenty of free parameters, it's ok: at some point you will validate it
  • 94. For mathematical works, sometimes people lie about motivations, trying to justify that there is a real-world application. Sometimes it's true, but it's also often a lie. A memory from a long time ago: I was working on pure theory stuff and I asked “I have read in the abstract that this can be applied to biological problems. Can you explain?”
  • 95. Answer: “Wahaha he has believed it!”
  • 96. In experiments, it's different: people often use experimental setups for hiding the problems under the carpet. Mathematicians cannot do that.
  • 97. Part V: Meta-Conclusion My conclusions: - don't trust publications too much, - I want to publish less, - I want to publish (try to publish...) failures and disappointing results.
  • 98. We could apply in MineSweeper (1P) ideas which do not work in Go (2P).
  • 99. We could apply in Energy Management (1P) ideas which do not work in Go (2P).
  • 100. Part V: Meta-Conclusion People in computer games look much more clever since they have been working on Go. Much easier to write reports :-) Lucky, right place, right moment. The progress in the game of Go does not cure cancer. The important challenges are still in front of us (don't trust published solutions too much...). Failed experiments on Go provide more insights than the success story (in which the tuning part, which is not so generic, is not visible...).
  • 101. Yet games are great challenges. When you play Go, you look clever & wise. When you play StarCraft, you look like a geeky teenager. Yet, StarCraft, Doom, Table Tennis, MineSweeper are great challenges.
  • 102. Difficult games: Havannah Very difficult for computers.
  • 104. What else? Real Time Strategy Game (multiple actors, partially observable) Frédéric Lemoine MIG 11/07/2008
  • 106. “Real” games Assumption: if a computer understands and guesses spins, then this robot will be efficient for something else than just games. (holds true for Go)
  • 109. Funding based on publication records. For me, this is the source of all evil. Academics are, and should remain, the most independent and reliable people. Should be referees for all important industrial contracts. Experimental works: difficult to reproduce (except games...); statistical conscious or unconscious cheating; dust swept under carpet / aesthetic bias; negative results unpublished ==> moderately reliable publication. Yet, academic papers are, I think, more reliable than reports for billion-$ contracts ==> pressure by money does not work :-(