Disappointing results & open problems in Monte-Carlo Tree Search
2. Disappointing and Unexpected
Results in Monte-Carlo Tree Search
O. Teytaud & colleagues
Silver Workshop, ECML 2012
If you solve these weaknesses,
it is worth doing, even if it takes
all your research time for 30 years.
In a nutshell:
- the game of Go, a great AI-complete challenge
- MCTS, a great recent tool for MDP-solving
- negative results on MCTS are the most important stuff
- considerations on academic publications (pros and cons)
3. Part I. A success story
on Computer Games
Part II. Two unsolved problems in Computer Games
Part III. Some algorithms which do not solve them
Part IV. Conclusion (technical)
Part V. Meta-conclusion (non-technical)
5. Part I : The Success Story
(less showing off in part II :-) )
The game of Go is a beautiful
challenge.
We achieved the first wins against
professional players
in the game of Go
13. Game of Go: counting territories
(white has 7.5 “bonus” as black starts)
14. Game of Go: the rules
Black plays at the blue circle:
the white group dies (it is
removed)
It's impossible to kill white (two “eyes”).
“Superko” rule: we don't come back to the same
situation.
(without superko: “PSPACE-hard”;
with superko: “EXPTIME-hard”)
At the end, we count territories
==> black starts, so +7.5 for white.
15. UCT (Upper Confidence Trees)
(a variant of MCTS)
Coulom (06)
Chaslot, Saito & Bouzy (06)
Kocsis & Szepesvári (06)
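To make the selection / expansion / simulation / backpropagation loop concrete, here is a minimal UCT sketch in Python. The state interface (legal_moves, play, is_terminal, result) and the exploration constant are illustrative assumptions, not the implementation of any of the programs behind the results above; reward bookkeeping for alternating players is also omitted for brevity.

```python
# Minimal UCT sketch.  The state API (legal_moves, play, is_terminal, result)
# and the exploration constant c are illustrative assumptions; reward handling
# for alternating players is omitted for brevity.
import math
import random

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children = {}            # move -> Node
        self.visits, self.wins = 0, 0.0

def uct_select(node, c=1.4):
    # UCB1: empirical mean + exploration bonus
    return max(node.children.items(),
               key=lambda mc: mc[1].wins / mc[1].visits
               + c * math.sqrt(math.log(node.visits) / mc[1].visits))

def rollout(state):
    # Default Monte-Carlo part: uniformly random playout
    while not state.is_terminal():
        state = state.play(random.choice(state.legal_moves()))
    return state.result()             # e.g. 1.0 for a win, 0.0 for a loss

def uct_search(root_state, n_simulations):
    root = Node(root_state)
    for _ in range(n_simulations):
        node = root
        # 1. Selection: descend while the node is fully expanded
        while node.children and len(node.children) == len(node.state.legal_moves()):
            _, node = uct_select(node)
        # 2. Expansion: add one untried move
        untried = [m for m in node.state.legal_moves() if m not in node.children]
        if untried and not node.state.is_terminal():
            move = random.choice(untried)
            node.children[move] = Node(node.state.play(move), parent=node)
            node = node.children[move]
        # 3. Simulation (the "MC part")
        reward = rollout(node.state)
        # 4. Backpropagation
        while node is not None:
            node.visits += 1
            node.wins += reward
            node = node.parent
    # Final decision: most simulated move at the root
    return max(root.children.items(), key=lambda mc: mc[1].visits)[0]
```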
30. “UCB” ?
• I have shown the “UCB” formula (Lai & Robbins), which is
the difference between MCTS and UCT.
• The UCB formula has deep mathematical principles.
• But they are very far from the MCTS context.
• Contrary to what has often been claimed, UCB is
not central in MCTS.
• But for publishing papers, relating MCTS to UCB is
so beautiful, with plenty of maths papers in the
bibliography :-)
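For reference, the UCB1 rule being discussed, in its standard textbook form (the exploration constant c is tuned in practice, and actual Go programs use variants):

```latex
% Standard UCB1 selection rule (textbook form; the constant c is tuned in practice)
\[
  a^{\ast} \;=\; \arg\max_{a}\left( \hat{\mu}_{a} + c\,\sqrt{\frac{\ln N}{n_{a}}} \right)
\]
% where $\hat{\mu}_a$ is the empirical mean reward of move $a$,
% $n_a$ its number of simulations, and $N = \sum_a n_a$.
```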
32. The great news:
● Not related to classical algorithms
(no alpha-beta)
● Recent tools
(Rémi Coulom's paper in 2006)
● Not at all specific to Go
(now widely used in games,
and beyond)
But great performance in Go
needs adaptations
(of the MC part)...
33. We all have to write reports:
● Showing that we are very strong
● Showing that our research has “breakthroughs”,
which destroy “bottlenecks”
So, OK, the previous slide is perfect for that.
34. Part II: challenges
Two main challenges:
● Situations which require abstract thinking
(cf. Cazenave)
● Situations which involve divide & conquer
(cf. Müller)
35. Part I. A success story on Computer Games
Part II. Two unsolved problems in
Computer Games
Part III. Some algorithms which do not solve them
Part IV. Conclusion (technical)
Part V. Meta-conclusion (non-technical)
36. A trivial semeai
(= “liberty” race)
Plenty of equivalent
situations!
They are randomly
sampled, with
no generalization.
50% estimated
win probability!
47. This is very easy.
Children can solve that.
But it is too abstract
for computers.
Computers play
“semeais” very badly.
48. It does not work. Why ?
50% estimated
win probability!
In the first node:
● The first simulations give ~ 50%
● The next simulations go to 100% or 0%
(depending on the chosen move)
● But, then, we switch to another node
(~ 8! x 8! such nodes)
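To see the scale of the problem stated above, under the slide's own ~8! × 8! count of equivalent nodes:

```latex
% Order of magnitude of the "~ 8! x 8!" equivalent nodes mentioned above
\[
  8! \times 8! \;=\; 40320 \times 40320 \;\approx\; 1.6 \times 10^{9}
\]
% far too many nodes for the ~50% statistics of any single one to converge.
```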
49. And the humans ?
50% estimated
win probability!
In the first node:
● The first simulations give ~ 50%
● The next simulations go to 100% or 0%
(depending on the chosen move)
● But, then, we DON'T switch to another node
50. Requires more than local fighting:
requires combining several local fights.
Children are usually not so good at this,
but strong adults are really good,
and computers are very childish.
(Position from Lee Sedol (black)
vs Hang Jansik (white);
the marked move looks like a
bad move, “locally”.)
52. Part I. A success story on Computer Games
Part II. Two unsolved problems in Computer Games
Part III. Some algorithms which
do not solve them
(negative results show that the important stuff is
really in Part II...)
Part IV. Conclusion (technical)
Part V. Meta-conclusion (non-technical)
53. Part III: techniques for addressing these challenges
1. Parallelization
2. Machine Learning
3. Genetic Programming
4. Nested MCTS
54. Parallelizing MCTS
• On a parallel machine with shared memory: just run many
simulations in parallel, with the same memory (tree) for all.
• On a parallel machine with no shared memory: one
MCTS per computation node, and 3 times per second:
● select nodes with at least 5% of total simulations (depth at most 3);
● average all statistics on these nodes.
==> comp. cost = log(nb of computation nodes)
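As a concrete reading of the scheme above, here is a hedged Python sketch of the periodic averaging step for the no-shared-memory case. The node-key layout and the allreduce_sum helper (an element-wise sum of per-worker dictionaries, achievable in about log(#workers) communication steps with a tree-structured reduction) are assumptions for illustration, not the code of any actual engine.

```python
# Sketch of the "no shared memory" scheme described above: each worker runs its
# own MCTS tree, and a few times per second the statistics of the
# heavily-simulated shallow nodes are pooled across workers.
from dataclasses import dataclass

@dataclass
class Stats:
    visits: int = 0
    wins: float = 0.0

def nodes_to_share(tree, total_sims, min_fraction=0.05, max_depth=3):
    """Keys of nodes holding at least 5% of all simulations, at depth <= 3."""
    return [key for key, (stats, depth) in tree.items()
            if depth <= max_depth and stats.visits >= min_fraction * total_sims]

def synchronize(local_tree, total_sims, allreduce_sum):
    """Pool the statistics of the selected nodes across all workers.

    Pooling (summing visits and wins) is one reading of "average all
    statistics"; dividing by the number of workers would give a true average.
    `allreduce_sum(dict)` is assumed to return the element-wise sum of the
    dictionaries contributed by every worker.
    """
    keys = nodes_to_share(local_tree, total_sims)
    contribution = {k: (local_tree[k][0].visits, local_tree[k][0].wins) for k in keys}
    pooled = allreduce_sum(contribution)        # {key: (sum_visits, sum_wins)}
    for key, (visits, wins) in pooled.items():
        stats, depth = local_tree.get(key, (Stats(), 0))
        stats.visits, stats.wins = visits, wins   # replace local stats by pooled stats
        local_tree[key] = (stats, depth)
```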
65. More deeply, 1
(R. Coulom)
Improvement in terms of performance against humans
<< improvement in terms of performance against computers
<< improvement in terms of self-play
66. More deeply, 2
No improvement in divide and conquer.
No improvement on situations
which require abstraction.
67. Part III: techniques for addressing these challenges
1. Parallelization
2. Machine Learning
3. Genetic Programming
4. Nested MCTS
68. Machine learning
A lot of tuning of the MC part is central.
This is a bit disappointing for the
genericity of the method.
Can we make this
tuning automatic?
69. A classical machine learning trick in MCTS: RAVE
(= rapid action value estimates)
score(move) = alpha * UCB(move) + (1 - alpha) * StatisticsInSubtree(move),
with alpha = nbSimulations / (K + nbSimulations).
Usually works well, but performs weakly in some situations.
Weaknesses:
- brings information only from the bottom to the top of the tree
- does not solve the main problems
- sometimes very harmful
==> extensions?
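A hedged sketch of the mixing rule above; the statistics containers and the constant K are illustrative assumptions, and real programs use more refined schedules for the weight alpha.

```python
# RAVE-style mixing of the UCB score with "all moves as first" subtree
# statistics, as on the slide.  K and the data layout are assumptions.
import math

def ucb(wins, visits, parent_visits, c=1.4):
    return wins / visits + c * math.sqrt(math.log(max(parent_visits, 1)) / visits)

def rave_score(move_stats, amaf_stats, parent_visits, K=1000.0):
    """score = alpha * UCB(move) + (1 - alpha) * StatisticsInSubtree(move).

    move_stats: (wins, visits) from simulations where `move` was played here.
    amaf_stats: (wins, visits) from all simulations in the subtree where
                `move` was played at any point ("all moves as first").
    """
    wins, visits = move_stats
    amaf_wins, amaf_visits = amaf_stats
    alpha = visits / (K + visits)          # weight shifts toward UCB as visits grow
    subtree_value = amaf_wins / max(amaf_visits, 1)
    return alpha * ucb(wins, max(visits, 1), parent_visits) \
        + (1 - alpha) * subtree_value
```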
70. Here B2 is the only good move for white.
But B2 makes sense only as a first move,
and nowhere else in subtrees ==> RAVE rejects B2.
==> extensions ?
71. A classical machine learning trick in MCTS: RAVE
(= rapid action value estimates)
score(move) = alpha * UCB(move) + (1 - alpha) * StatisticsInSubtree(move),
with alpha = nbSimulations / (K + nbSimulations).
Usually works well, but performs weakly in some situations [Müller].
Generic rules proposed recently:
- Drake [ICGA 2009]: Last Good Reply
- Silver and others: simulation balancing
- poolRave [Rimmel et al, ACG 2011]
- Contextual Monte-Carlo [Rimmel et al, EvoGames 2010]
- Decisive moves and anti-decisive moves [Teytaud et al, CIG 2010]
==> statistically significant improvements, but far less
efficient than human expertise
72. Part III: techniques for addressing these challenges
1. Parallelization
2. Machine Learning
3. Genetic Programming
4. Nested MCTS
73. We don't want to use expert knowledge.
We want automated solutions.
Developing biases by Genetic Programming ?
74. We don't want to use expert knowledge.
We want automated solutions.
Developing a MC by Genetic Programming ?
Looks like a good idea.
But importantly:
A stronger MC part
(in terms of the playing strength of the MC part itself)
does not imply (by far!)
a stronger MCTS
(except in 1P cases...).
75. Developing a MC by Genetic Programming:
Hoock et al.,
Cazenave et al.
76. Part III: techniques for addressing these challenges
1. Parallelization
2. Machine Learning
3. Genetic Programming
4. Nested MCTS
79. Nested MCTS in one slide
(Cazenave, F. Teytaud, etc.)
1) To a strategy π, you can associate a value function:
π-Value(s) = expected reward when simulating with strategy π from state s.
2) Then define:
NestedMC0(state) = MC(state)
NestedMC1(state) = decision maximizing NestedMC0-value(state + decision)
...
NestedMC42(state) = decision maximizing NestedMC41-value(state + decision)
==> looks like a great idea
==> not good in Go
==> good on some less widely known testbeds
(“morpion solitaire”, some hard scheduling problems)
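A hedged sketch, for a 1-player problem, of the recursion above (NestedMC0 = MC; NestedMCk picks the decision maximizing the NestedMC(k-1) value). The state interface (legal_moves, play, is_terminal, score) is an illustrative assumption, not the code of any program mentioned here.

```python
# Nested Monte-Carlo search sketch for a 1-player problem.
import random

def playout_score(state):
    """Level 0: score of a uniformly random playout from `state`."""
    while not state.is_terminal():
        state = state.play(random.choice(state.legal_moves()))
    return state.score()

def nested_mc_score(state, level):
    """Score reached by playing greedily with level-`level` nested decisions."""
    while not state.is_terminal():
        state = state.play(nested_mc_decision(state, level))
    return state.score()

def nested_mc_decision(state, level):
    """NestedMC_level(state): decision maximizing the NestedMC_(level-1) value."""
    best_move, best_value = None, float("-inf")
    for move in state.legal_moves():
        child = state.play(move)
        if level <= 1:
            value = playout_score(child)             # NestedMC0-value
        else:
            value = nested_mc_score(child, level - 1)  # NestedMC(level-1)-value
        if value > best_value:
            best_move, best_value = move, value
    return best_move
```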
80. Part I. A success story on Computer Games
Part II. Two unsolved problems in Computer Games
Part III. Some algorithms which do not solve them
Part IV. Conclusion (technical)
Part V. Meta-conclusion (non-technical)
81. Part IV: Conclusions
Game of Go:
1- disappointingly,
most recent progress = human expertise
==> we understood a lot from methods which
do not work, or work only a little
==> we understood a lot from
counter-examples, not from impressive
performance
82. Part IV: Conclusions
Game of Go:
1- disappointingly,
most recent progress = human expertise
2- UCB is not that much involved in MCTS
(simple rules perform similarly)
==> publication bias
84. Part IV: Conclusions
Recent “generic” progress in MCTS:
1- application to GGP (general game playing):
the program learns the rules of the game
just before the competition, no last-minute
development (fully automatized)
==> not so well known, but really interesting
2- one-player games: great ideas which do not
work in 2P games sometimes work in 1P
games (e.g. optimizing the MC in a
DPS sense)
85. Part IV: Conclusions
Techniques which outperformed the
state of the art in MineSweeper
were tested on Go (negatively)
and on industrial problems (positively).
88. Part V: Meta-Conclusion
Huge publication bias.
People report only experiments which are
sooooo great breakthroughs.
But when you discuss with them they tell
you that there is publication and
there is reality.
In the end, we trust our friends, or published
theorems, but we don't trust experiments.
The most interesting MCTS results are
negative results:
89. Understanding this
“combination of local stuff”
is impossible for computers: it requires abstract
thinking (looks like theorem proving).
The current main ML techniques for MCTS
do not work on this.
90. There are several examples of MCTS papers
in which problems were swept under the carpet,
for the sake of publication,
whereas the dust was the interesting stuff.
Results are often difficult to reproduce,
or unstable w.r.t. experimental conditions.
92. Examples:
“- I have truncated results to ..... because
it was unstable otherwise.”
(cheat by using new version only for openings)
“- I could make it work after a lot of tuning in 9x9, but I could
not get positive results in 19x19”
(cheat by heavy tuning)
==> for any method, with enough
tuning, you get positive results
==> you are more likely to publish “I used sophisticated method XXX
and got positive results” than “I used plenty of dirty tuning
and got positive results”
==> if method XXX has plenty of free parameters,
it's OK: at some point you will validate it
95. For mathematical works,
sometimes people lie about motivations, trying
to justify that there is a real-world application.
Sometimes it's true, but it's also often a lie.
A memory from a long time ago:
I was working on pure theory stuff and
I asked, “I have read in the abstract that
this can be applied to biological problems.
Can you explain?”
Answer: “Wahaha, he believed it!”
96. In experiments, it's different: people
often use experimental setups
to hide the problems under the carpet.
Mathematicians cannot do that.
97. Part V: Meta-Conclusion
Huge publication bias.
People report only experiments which are
sooooo great breakthroughs.
But when you discuss with them they tell
you that there is publication and
there is reality.
My conclusions:
- don't trust publications too much,
- I want to publish less
- I want to publish (try to publish...)
failures and disappointing results.
98. (Diagram accompanying the conclusions above: ideas which do not
work in Go (2P), we could apply in MineSweeper (1P) and in
energy management (1P).)
100. Part V: Meta-Conclusion
People in computer games look much more
clever since they have been working on Go.
Much easier to write reports :-)
Lucky, right place, right moment.
The progress in the game of Go does not cure cancer.
The important challenges are still in front of us (don't trust
too much published solutions...).
Failed experiments on Go provide
more insights than the success story (in which
the tuning part, which is not so generic, is not visible...).
101. Yet games are great challenges.
When you play Go, you look clever & wise.
When you play StarCraft, you look like
a geeky teenager.
Yet, StarCraft, Doom, Table Tennis, MineSweeper
are great challenges.
106. “Real” games
Assumption: if a computer understands and guesses spins, then
this robot will be efficient for something other than just games.
(This held true for Go.)
109. Funding based on publication records:
for me, this is the source of all evil.
Experimental works: difficult to reproduce (except games...),
statistical bias, conscious or unconscious cheating,
dust swept under the carpet / aesthetics, negative results unpublished
==> moderately reliable publication.
Academics are, and should remain, the most independent
and reliable people; they should be referees for
all important industrial contracts.
Yet, academic papers are, I think, more reliable than
reports for billion-$ contracts ==> pressure by money
does not work :-(