Zero learning, old and new.
Tristan Cazenave, Univ. Dauphine
Yen-Chi Chen, National Taiwan Normal University
Guan-Wei Chen, National Dong Hwa University
Shi-Yu Chen, National Dong Hwa University
Xian-Dong Chiu, National Dong Hwa University
Julien Dehos, Univ. Littoral Cote d’Opale
Maria Elsa, National Dong Hwa University
Qucheng Gong, Facebook AI Research
Hengyuan Hu, Facebook AI Research
Vasil Khalidov, Facebook AI Research
Chen-Ling Li, National Dong Hwa University
Hsin-I Lin, National Dong Hwa University
Yu-Jin Lin, National Dong Hwa University
Xavier Martinet, Facebook AI Research
Vegard Mella, Facebook AI Research
Jeremy Rapin, Facebook AI Research
Baptiste Roziere, Facebook AI Research
Gabriel Synnaeve, Facebook AI Research
Fabien Teytaud, Univ. Littoral Cote d’Opale
Olivier Teytaud, Facebook AI Research
Shi-Cheng Ye, National Dong Hwa University
Yi-Jun Ye, National Dong Hwa University
Shi-Jim Yen, National Dong Hwa University
Sergey Zagoruyko, Facebook AI Research
Olivier Teytaud
Started to work in AI last century.
Currently working on games, AlphaZero-style learning, derivative-free
optimization.
Has worked at ARTELYS, INRIA, GOOGLE, FB.
1. MCTS = Monte Carlo Tree Search
2. AlphaZero: adding conv nets
3. AlphaZero – great performances
4. AlphaZero – limitations
5. Open Sourcing
6. Research directions
ALPHAZERO INGREDIENT #1: MCTS
MCTS (MONTE CARLO TREE SEARCH) WAS ORIGINALLY PUBLISHED IN [COULOM06].
IT WAS ENOUGH TO WIN GAMES AGAINST PROS IN 9X9 GO, AND IN 19X19 GO WITH A HANDICAP OF ABOUT 4 STONES.
QUITE STRONG FOR FULLY OBSERVABLE “GENERAL GAME PLAYING” (I.E. THE PROGRAM MUST
FIRST UNDERSTAND THE RULES).
UCT (UPPER CONFIDENCE TREES) IS A VARIANT OF MCTS
(USING UCB).
usgo.org
Coulom (06)
Chaslot, Saito & Bouzy (06)
Kocsis & Szepesvari (06)
UCT (UPPER CONFIDENCE TREES) STARTS WITH SIMPLE MONTE CARLO
[Figure sequence: plain Monte Carlo simulations first; then Monte Carlo while keeping track of statistics; then UCT, once we have statistics.]
Kocsis & Szepesvari (06)
EXPLOITATION ...
Monte Carlo, and build statistics… and modify MC with those statistics!
SCORE = 5/7 + k.sqrt( log(10)/7 )
(i.e. 5 wins out of 7 simulations of this move, 10 simulations in total)
... OR EXPLORATION ?
SCORE = 0/2 + k.sqrt( log(10)/2 )
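The two scores above can be compared numerically. The sketch below is only an illustration: the function name `ucb_score` and the choice to return infinity for a move with no statistics are assumptions, not something taken from the slides.

```python
import math

def ucb_score(wins, visits, total_visits, k=1.0):
    """Average reward plus an exploration bonus that shrinks as `visits` grows."""
    if visits == 0:
        return math.inf  # assumption: a move without statistics is tried first
    return wins / visits + k * math.sqrt(math.log(total_visits) / visits)

for k in (1.0, 2.0):
    exploit = ucb_score(5, 7, 10, k)  # well-explored move with a good win rate
    explore = ucb_score(0, 2, 10, k)  # rarely tried move with a poor win rate
    print(f"k={k}: exploitation={exploit:.3f}, exploration={explore:.3f}")
```

With these numbers the well-explored move scores higher for small k, and the rarely tried move takes over once k exceeds roughly 1.43, which is the exploitation/exploration trade-off the slides illustrate.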
UCT IN ONE SLIDE
UCT for choosing a move in a board B
While ( I have time left )
{
    Do a simulation
    {
        Start at board B
        At each time step, choose action by UCB (or random if no statistics!)
    }
    Update statistics with this simulation
}
Return the most simulated action.
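The loop above can be made concrete on a toy game. The following is a minimal sketch, assuming a one-heap Nim (take 1 to 3 stones, whoever takes the last stone wins), a fixed simulation budget instead of a clock, and statistics shared between positions with the same number of stones. The game, `legal_actions`, `uct_choose` and all parameter values are illustrative choices, not part of the original slides.

```python
import math
import random

def legal_actions(stones):
    return [a for a in (1, 2, 3) if a <= stones]

def uct_choose(stones, n_simulations=3000, k=1.4, rng=None):
    """Pick a move for the player to move with `stones` left."""
    rng = rng or random.Random(0)
    visits = {}  # (stones, action) -> number of simulations using this move
    wins = {}    # (stones, action) -> wins for the player who chose the move

    for _ in range(n_simulations):
        s, player, path = stones, 0, []
        # One simulation: start at the root position and play to the end.
        while s > 0:
            actions = legal_actions(s)
            untried = [a for a in actions if (s, a) not in visits]
            if untried:
                a = rng.choice(untried)  # random if no statistics
            else:
                total = sum(visits[(s, b)] for b in actions)
                a = max(actions, key=lambda b: wins[(s, b)] / visits[(s, b)]
                        + k * math.sqrt(math.log(total) / visits[(s, b)]))
            path.append((s, a, player))
            s -= a
            player = 1 - player
        winner = 1 - player  # the player who just moved took the last stone
        # Update statistics with this simulation.
        for ps, pa, pp in path:
            visits[(ps, pa)] = visits.get((ps, pa), 0) + 1
            wins[(ps, pa)] = wins.get((ps, pa), 0) + (1 if pp == winner else 0)

    # Return the most simulated action at the root.
    return max(legal_actions(stones), key=lambda a: visits.get((stones, a), 0))

print(uct_choose(5))
```

In this Nim variant the winning strategy is to leave the opponent a multiple of 4 stones, so the search is expected to concentrate its simulations on that move.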
ALPHAZERO INGREDIENT #2: DEEP NETWORK
OVERVIEW IN “DEEP LEARNING”, LECUN, BENGIO, HINTON 2015
BOTH A CRITIC NETWORK (EVALUATING THE PROBABILITY OF WINNING IN A GIVEN POSITION) AND
A POLICY NETWORK (PROVIDING A PROBABILITY DISTRIBUTION ON ACTIONS).
Image “clarifai.com/technology” and “the data science blog”
← ← ← Invariance by translation → → → High-level features
PUCT: UCT WITH PRIOR
SCORE(state, action) = 5/7 + NN(state, action) . sqrt( 10 / 7 )
(No log; the exploration term is NN-based.)
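As a small illustration of the slide's formula, the sketch below computes the same score in code. The name `puct_score` and the handling of unvisited moves are assumptions, and published AlphaZero-style variants use a slightly different exploration term with an extra constant, so this follows the slide rather than any particular implementation.

```python
import math

def puct_score(wins, visits, total_visits, prior):
    """Slide-style PUCT: average reward + prior * sqrt(total / visits), no log."""
    if visits == 0:
        return math.inf  # assumption: unvisited moves are tried first
    return wins / visits + prior * math.sqrt(total_visits / visits)

# Same statistics as on the slide (5 wins in 7 visits, 10 simulations in total),
# with a hypothetical prior NN(state, action) = 0.3.
print(round(puct_score(5, 7, 10, prior=0.3), 4))
```

A larger prior from the network raises the score of a move, so the search visits it earlier and more often than plain UCT would.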
ALPHAZERO IN A NUTSHELL: A FIXED POINT METHOD!
MCTS(NN): AN MCTS WHICH USES A NEURAL NET NN FOR
• EVALUATING LEAVES (NO RANDOM ROLLOUT)
• SUGGESTING POLICIES (BIASING THE MCTS)
NN ← MCTS:
• EACH CLIENT: PLAYS GAMES WITH AN MCTS(NN)
• SERVER:
  • RECEIVES BATCHES “(STATES, ACTIONS, REWARD AT END OF GAMES)”
  • TWO LOSS FUNCTIONS (+ WEIGHT DECAY):
    • LEARN “STATE → REWARD” (CRITIC)
    • LEARN “STATE → PROBABILITY DISTRIBUTION ON ACTIONS” (ACTOR), I.E. MIMIC THE MCTS
ALPHAZERO:
• RANDOMLY INITIALIZE NN
• ITERATIVELY IMITATE: NN-ACTOR → MCTS(NN), NN-CRITIC → GAME RESULTS
[Figure: annotated loss formula. Labels: prediction of value; p imitates 𝜋 from MCTS; weight decay.]
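The formula itself is an image and is not in this transcript. The three labels match the three terms of the loss published for AlphaGo Zero and AlphaZero, (z − v)² − 𝜋ᵀ log p + c‖θ‖², so a per-sample sketch under that assumption could look as follows. The function name, argument names and the value of `c` are illustrative.

```python
import math

def zero_loss(z, v, pi_mcts, p_net, weights, c=1e-4):
    """Per-sample loss: value error + policy cross-entropy + weight decay."""
    value_term = (z - v) ** 2  # prediction of value: v should match the game result z
    # p imitates pi from MCTS: cross-entropy between MCTS policy and network policy
    policy_term = -sum(t * math.log(q) for t, q in zip(pi_mcts, p_net) if t > 0)
    decay_term = c * sum(w * w for w in weights)  # weight decay
    return value_term + policy_term + decay_term

# Hypothetical sample: the game was won (z = 1) but the network predicted 0.
print(zero_loss(1.0, 0.0, [0.5, 0.5], [0.5, 0.5], weights=[1.0, 2.0], c=0.1))
```

In practice the loss is averaged over a minibatch and minimized by stochastic gradient descent; the sketch only shows how the three labelled terms combine.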
ALPHAZERO TRAINING IN ONE SLIDE
THE CLIENTS PERFORM SIMULATIONS USING THE NEURAL NETWORK'S OUTPUTS:
- NN.PI
- NN.V
EACH CLIENT, MORE DETAILS:
- RUNNING MCTS MODIFIED AS FOLLOWS:
  - UCB SCORE COMBINED WITH NN.PI (“PUCT” FORMULA)
  - RANDOM ROLLOUTS (CORRESPONDING TO STATES WITH ZERO SIMULATION) REPLACED BY:
    - 1 SINGLE STEP WITH NN.PI FOR CHOOSING ONE ACTION
    - REWARD OF THE SIMULATION REPLACED BY NN.V WHICH PREDICTS THE REWARD
- SENDING TO THE MASTER PLENTY OF (STATE, PI CHOSEN BY MCTS, REWARD OF THE GAME)
THE MASTER:
- PROVIDES NN.PI (PROBABILITY DISTRIBUTION ON ACTIONS) AND NN.V (ESTIMATED
REWARD) IN STATES, AS REQUESTED BY CLIENTS
- LEARNS A BETTER NN BASED ON CLIENTS’ SIMULATIONS
[Diagram: MASTER (neural net) and CLIENT 1 (simulating games using MCTS).
Inference: the client sends a state s; the master returns NN.Pi(s), NN.V(s).
Training of 𝜋 and V: the client sends (state s, MCTS.Pi(s), real reward R).
Pi(s) = probabilities of actions in state s; V(s) = estimated winning rate in s.
Loss labels: prediction of value; p imitates 𝜋 from MCTS; weight decay.]
Actually there is a replay buffer.
Simulated results are sent to a data structure.
The training picks up data and performs
stochastic gradient descent.
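A replay buffer of this kind can be sketched in a few lines. This is a generic illustration, assuming a fixed-capacity first-in-first-out buffer with uniform sampling; the class name, method names and sampling scheme are assumptions, not the Polygames implementation.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size store of (state, MCTS policy, game reward) samples."""

    def __init__(self, capacity, seed=0):
        self.samples = deque(maxlen=capacity)  # oldest samples are dropped first
        self.rng = random.Random(seed)

    def add(self, state, mcts_pi, reward):
        # Called with the results the clients send after each game.
        self.samples.append((state, mcts_pi, reward))

    def sample_batch(self, batch_size):
        # Called by the trainer before each stochastic gradient step.
        k = min(batch_size, len(self.samples))
        return self.rng.sample(list(self.samples), k)

buffer = ReplayBuffer(capacity=3)
for step in range(5):
    buffer.add(state=step, mcts_pi=[1.0], reward=0.0)
print(len(buffer.samples), len(buffer.sample_batch(2)))
```

Decoupling the producers (clients) from the consumer (trainer) this way lets both run at their own pace, at the cost of training on slightly stale data.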
ALPHAZERO: GREAT RESULTS
[SILVER ET AL, NATURE PAPERS + ARXIV]
• NO GAME SPECIFIC KNOWLEDGE
• USING MASSIVE COMPUTATIONAL POWER
• EXTENSIVE REPRESENTATION FOR ACTIONS ← ONE CHANNEL FOR EACH POSSIBLE RELATIVE
MOVE OF EACH PIECE!!! HOW MANY CHANNELS FOR GO ? FOR CHESS ?
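One way to answer the question is to count the policy outputs using the move encoding described in the AlphaZero paper. The arithmetic below is a worked example under that encoding; the variable names are illustrative.

```python
# Chess: one plane per "relative move" that a piece on a given square can make.
queen_like = 8 * 7       # 8 directions x up to 7 squares
knight = 8               # 8 knight jumps
underpromotions = 3 * 3  # knight/bishop/rook x 3 pawn move directions
chess_planes = queen_like + knight + underpromotions
chess_outputs = 8 * 8 * chess_planes  # one value per square and plane

# Go: a move is just a point on the board, plus "pass".
go_outputs = 19 * 19 + 1

print(chess_planes, chess_outputs, go_outputs)  # prints "73 4672 362"
```

So Go needs a single output plane (plus pass), while chess needs dozens of planes because the same target square can be reached by many different relative moves.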
ALPHAZERO: OPEN PROBLEMS
1. BASED ON MCTS → SO THAT APPLICATION TO PARTIAL OBSERVATION IS NOT TRIVIAL
2. BASED ON MCTS → SO THAT SIMULATORS/BACKTRACKS ARE NECESSARY (WHITE BOX)
3. BASED ON NN → HOW TO DEAL WITH HUGE / COMPLEX ACTION SPACES
4. BASED ON NN → HUGE NUMBER OF SIMULATED GAMES → REDUCED DATA ?
You need a very special MCTS for partially observable games
- Should we learn just V or just 𝜋 or both ?
- Complex action spaces ?
- Partially observable games ? (defogization)
WHY DOES ZERO FAIL IN PARTIALLY OBSERVABLE GAMES ?
→ BECAUSE MCTS NEEDS SIMULATIONS “STATE, ACTION → NEW STATE”
POLYGAMES, OPEN SOURCED RECENTLY!
1. MODIFY ONE AND ONLY ONE FILE, SO THAT THE GAME IS YOURS:
• class State {
•   bool PlayAction(Action action) ← what happens if we play this move ?
•   vector<Action> getLegalActions() ← what are the legal moves ?
•   float getReward(int player) ← which reward did that player win ?
•   bool terminated() ← is the game over ?
•   int getCurrentplayer() ← who should play now ?
•   vector<float> getFeature() ← input of the neural net (vector, reshaped as a 3D tensor as below)
•   vector<int> getFeatureSize() ← shape of the input of the neural net (Polygames will reshape accordingly)
•   … a few technical things…
• } and a class of actions (each action is mapped to an output neuron)
2. GET LEARNING CURVES ON YOUR GAME
• with different architectures (cool: structured output)
• In a couple of days, single machine
• In progress: a class of partially observable games
+ interface with LUDII ?
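To give a feel for how little a game has to provide, here is a tic-tac-toe state exposing the same kind of methods as the `State` class listed above. This is a Python analogue written for illustration only: it is not the Polygames C++ API, and the snake_case method names, the feature layout and the reward convention (+1 / -1 / 0) are assumptions.

```python
class TicTacToeState:
    """Python analogue of the State interface above (not the Polygames API)."""

    LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
             (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]

    def __init__(self):
        self.board = [None] * 9  # None = empty, 0 or 1 = that player's mark
        self.player = 0

    def winner(self):
        for a, b, c in self.LINES:
            if self.board[a] is not None and self.board[a] == self.board[b] == self.board[c]:
                return self.board[a]
        return None

    def terminated(self):  # is the game over?
        return self.winner() is not None or all(c is not None for c in self.board)

    def get_legal_actions(self):  # what are the legal moves?
        if self.terminated():
            return []
        return [i for i, c in enumerate(self.board) if c is None]

    def play_action(self, action):  # what happens if we play this move?
        if action not in self.get_legal_actions():
            return False
        self.board[action] = self.player
        self.player = 1 - self.player
        return True

    def get_reward(self, player):  # which reward did that player win?
        w = self.winner()
        return 0.0 if w is None else (1.0 if w == player else -1.0)

    def get_current_player(self):  # who should play now?
        return self.player

    def get_feature_size(self):  # shape of the neural net input
        return [2, 3, 3]  # channels x height x width

    def get_feature(self):  # flat input vector, one 3x3 plane per player
        return [1.0 if c == p else 0.0 for p in (0, 1) for c in self.board]

state = TicTacToeState()
for move in (0, 3, 1, 4, 2):  # player 0 completes the top row
    state.play_action(move)
print(state.terminated(), state.get_reward(0), state.get_reward(1))
```

Each of the nine actions would map to one output neuron of the policy head, which is the role of the "class of actions" mentioned above.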
STRONG COMMITMENT TO OPEN SOURCE &
ACADEMIC PUBLICATION (FACEBOOK)
- Open source & exports to ONNX format, usable everywhere
- Open source, beats pros in Go
- Starcraft open data
- “On […] English-French and […] German-English benchmarks, our models respectively obtain […], outperforming the state of the art by more than 11 BLEU points. On low-resource languages like English-Urdu and English-Romanian, our methods achieve even better results […]. Our code for NMT and PBSMT is publicly available.”
- PyTorch for Starcraft: https://github.com/TorchCraft/
“At FAIR, we openly share our advances as much as we can, as fast as we
can in the form of technical papers, open source code and teaching
material.” (Y. LeCun, Facebook, BusinessInsider)
POLYGAMES: PARTIALLY OBSERVABLE GAMES
Main challenge in partially observable games: building the probability distribution of hidden states, assuming Nash policies.
Papers: Mundhenk, Rintanen… show 2EXP complexity for many partially observable games, and undecidability in some cases.
Consider Chinese Dark Chess. A part of the information is hidden.
But it’s simple (in that case): just randomly draw the hidden information when it’s revealed → you can simulate CDC in MCTS.
The same principle applies when the hidden information is the same for all players.
Example: Minesweeper (single player!).
- Naïve version: randomly draw the positions of mines until you find something consistent with observations.
- This is slow, but equivalent to the classical minesweeper.
- Faster: use constraint satisfaction problems (cf. Studholme’s paper)
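The naïve version above amounts to rejection sampling. The sketch below illustrates it, assuming a rectangular board, a known number of mines, and observations given as a mapping from revealed cells to their neighbouring-mine counts; the function names and this representation are illustrative choices.

```python
import random

def neighbors(cell, width, height):
    x, y = cell
    return [(x + dx, y + dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)
            if (dx, dy) != (0, 0) and 0 <= x + dx < width and 0 <= y + dy < height]

def sample_mines(width, height, n_mines, revealed, rng, max_tries=100000):
    """Draw random mine layouts until one agrees with every revealed count."""
    hidden = [(x, y) for x in range(width) for y in range(height)
              if (x, y) not in revealed]  # revealed cells cannot hold a mine
    for _ in range(max_tries):
        mines = set(rng.sample(hidden, n_mines))
        consistent = all(
            sum(n in mines for n in neighbors(cell, width, height)) == count
            for cell, count in revealed.items())
        if consistent:
            return mines
    raise RuntimeError("no consistent layout found")

# 3x3 board, 1 mine, two opposite corners both showing "1".
rng = random.Random(0)
print(sample_mines(3, 3, 1, {(0, 0): 1, (2, 2): 1}, rng))
```

The number of rejected draws grows quickly with board size and with the number of observations, which is why the slide calls this slow and points to constraint satisfaction instead.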
POLYGAMES: THE RED QUEEN EFFECT AND TOURNAMENTS
Zero-learning = fixed point algorithm.
• Stops when MCTS(NN) = NN ?
• What is a fixed point when there is no order on players ? (A > B > C > A)
→ Keep an archive
→ Related: Grigoriadis & Khachiyan, 1994
POLYGAMES: PROVING STUFF ?
Proof of convergence ?
Even better, prescribing something:
- Which level of noise ?
- Which formula ?
- How many MCTS simulations ?
POLYGAMES: STRUCTURED OUTPUT
Consider Go or Breakthrough or Draughts or many others.
The output space is topologically related to the input space.
This link is destroyed by the FCMLP (fully connected multilayered perceptron).
Let us make the training faster by using convolutions everywhere in the network + global pooling.
+ applicable for any board size!
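The board-size independence can be seen on a toy example: a convolution produces an output with the same shape as its input board (a natural policy head), and global pooling reduces any board to a fixed-size summary (usable by a value head). The pure-Python sketch below uses arbitrary illustrative weights and function names; it is not a trained network or Polygames code.

```python
def conv3x3(board, kernel):
    """Zero-padded 3x3 convolution; the output has the same shape as the board."""
    h, w = len(board), len(board[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    if 0 <= i + di < h and 0 <= j + dj < w:
                        out[i][j] += kernel[di + 1][dj + 1] * board[i + di][j + dj]
    return out

def global_average_pool(plane):
    """Collapse a plane of any size to a single number."""
    return sum(map(sum, plane)) / (len(plane) * len(plane[0]))

# Arbitrary illustrative weights, shared by both board sizes.
kernel = [[0.0, 0.1, 0.0], [0.1, 0.6, 0.1], [0.0, 0.1, 0.0]]
for size in (13, 19):
    board = [[0.0] * size for _ in range(size)]
    board[size // 2][size // 2] = 1.0  # a single stone in the centre
    policy_plane = conv3x3(board, kernel)
    print(size, len(policy_plane), len(policy_plane[0]), global_average_pool(policy_plane))
```

Because no layer depends on the board dimensions, the same weights can be applied to a 13x13 and a 19x19 board, which a fully connected output layer would not allow.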
Cool stuff with Polygames
- learning in 13x13 and play in 19x19 at strong level (fully
convolutional nets)
- strong checkpoints at many games
- stochastic games
- possibility to add layers, channels, kernel width dynamically
- distributed
- a few partially observable games (Minesweeper)
- maintained, open sourced, readable
- understand complex crucial things: mask illegal actions rather
than learning a logit of -infinity
- tournament mode for robust learning
- in progress: learning with side information
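The masking point in the list above can be illustrated with a masked softmax: instead of hoping the network learns a logit of -infinity for illegal moves, the illegal entries are simply excluded when probabilities are computed. A minimal sketch, with an assumed function name and input format:

```python
import math

def masked_softmax(logits, legal):
    """Softmax over legal actions only; illegal actions get probability 0."""
    legal_logits = [x for x, ok in zip(logits, legal) if ok]
    if not legal_logits:
        raise ValueError("no legal action")
    m = max(legal_logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) if ok else 0.0 for x, ok in zip(logits, legal)]
    total = sum(exps)
    return [e / total for e in exps]

# The network strongly prefers action 1, but action 1 is illegal here.
print(masked_softmax([1.0, 5.0, 1.0], [True, False, True]))  # [0.5, 0.0, 0.5]
```

With masking, the network's capacity is not spent on memorizing the rules, and no probability mass leaks to illegal moves.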
HEX
According to Bonnet et al
(https://www.lamsade.dauphine.fr/~bonnet/publi/connection-games.pdf),
“Since its independent inventions in 1942 and 1948 by
the poet and mathematician Piet Hein and the economist and
mathematician John Nash, the game of hex has acquired a special
spot in the heart of abstract game aficionados. Its purity and depth
has lead Jack van Rijswijck to conclude his PhD thesis with the
following hyperbole [1]: ‘Hex has a Platonic existence,
independent of human thought. If ever we find an
extraterrestrial civilization at all, they will know hex, without
any doubt.’”
HEX
Simplest rules ever!
I play black.
You play white.
We put a stone in turn.
If I connect my sides, I win.
If you connect your sides, you win.
Theorem: no draw.
Until 2019/10/31: no computer managed to beat the best humans!
HEX
Polygames vs Arek Kulczycki
Polygames: bunch of GPUs, several days. Operated & trained by Vegard, a.k.a.
“one hell of an insanely good hacker”.
Arek Kulczycki: winner of the last LG tournament, best Elo rank on the
LittleGolem server. Thanks a lot ! ! !
Fantastic game with a super
long final path!
Breakthrough: seemingly a win for white. Tied with the best
bots. Maybe needs a pie rule or a 12* rule.
(12*: turns 122112211221122…
Pie: second player can swap roles.)
Othello: won all* games against 2 strong bots (incl. the winner of the
2019 Olympiads).
Einstein: not many results yet, but it looks like we play well.
*except one, due to a human operator mistake!
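The 12* turn order mentioned for Breakthrough (122112211221122…) follows a simple pattern: after the first move, each player makes two moves in a row. A small sketch, with an assumed function name, that reproduces the sequence:

```python
def player_to_move(turn):
    """Player (1 or 2) at 0-indexed `turn` under the 1,2,2,1,1,2,2,... order."""
    return 1 if ((turn + 1) // 2) % 2 == 0 else 2

print("".join(str(player_to_move(t)) for t in range(15)))  # 122112211221122
```

Compared with strict alternation, this order reduces the first-player advantage because the second player is the first to get two consecutive moves.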
THE END !!!
… we’re coming in many other games, stay tuned :-)
(and help us - join the group :-) )
Havannah: big board, diversity of winning conditions, long games, hexagons…
LUDII: enormous library of games, interfacing in progress with the Maastricht games gang.

Speech, hearing, noise, intelligibility.pptx
 
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In DubaiDubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
 
The dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxThe dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptx
 
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
 

AlphaZero and beyond: Polygames

  • 1. Zero learning, old and new. Authors: Tristan Cazenave (Univ. Dauphine); Yen-Chi Chen (National Taiwan Normal University); Guan-Wei Chen, Shi-Yu Chen, Xian-Dong Chiu (National Dong Hwa University); Julien Dehos (Univ. Littoral Cote d’Opale); Maria Elsa (National Dong Hwa University); Qucheng Gong, Hengyuan Hu, Vasil Khalidov (Facebook AI Research); Chen-Ling Li, Hsin-I Lin, Yu-Jin Lin (National Dong Hwa University); Xavier Martinet, Vegard Mella, Jeremy Rapin, Baptiste Roziere, Gabriel Synnaeve (Facebook AI Research); Fabien Teytaud (Univ. Littoral Cote d’Opale); Olivier Teytaud (Facebook AI Research); Shi-Cheng Ye, Yi-Jun Ye, Shi-Jim Yen (National Dong Hwa University); Sergey Zagoruyko (Facebook AI Research). Olivier Teytaud: started to work in AI last century; currently working on games, AlphaZero-style learning and derivative-free optimization; has worked at ARTELYS, INRIA, GOOGLE, FB.
  • 2. 1. MCTS = Monte Carlo Tree Search 2. AlphaZero: adding conv nets 3. AlphaZero – great performances 4. AlphaZero – limitations 5. Open Sourcing 6. Research directions
  • 3. ALPHAZERO INGREDIENT #1: MCTS. MCTS (MONTE CARLO TREE SEARCH) WAS ORIGINALLY PUBLISHED IN [COULOM06]. IT WAS ENOUGH TO WIN GAMES AGAINST PROS IN 9X9, AND IN 19X19 WITH HANDICAP ~4. QUITE STRONG FOR FULLY OBSERVABLE “GENERAL GAME PLAYING” (I.E. THE PROGRAM MUST FIRST UNDERSTAND THE RULES). UCT (UPPER CONFIDENCE TREES) IS A VARIANT OF MCTS (USING UCB). (Image: usgo.org)
  • 4. UCT (UPPER CONFIDENCE TREES) STARTS WITH SIMPLE MONTE CARLO. References: Coulom (06); Chaslot, Saito & Bouzy (06); Kocsis & Szepesvari (06). [Figure: Monte Carlo simulations]
  • 5. UCT (Monte Carlo) Monte Carlo … and keep track of statistics!
  • 10. EXPLOITATION ... Monte Carlo, and build statistics… and modify MC with those statistics!
  • 11. EXPLOITATION ... SCORE = 5/7 + k.sqrt( log(10)/7 )
  • 14. ... OR EXPLORATION ? SCORE = 0/2 + k.sqrt( log(10)/2 )
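The two scores on the slides above can be compared numerically. The sketch below is only an illustration: the function name `ucb_score` and the values tried for the constant `k` are assumptions, not taken from the deck.

```cpp
#include <cassert>
#include <cmath>

// UCB score of one move: empirical win rate plus an exploration bonus.
// wins / visits are the statistics of the move, parent_visits is the number of
// simulations through the parent, k is the exploration constant (to be tuned).
double ucb_score(double wins, int visits, int parent_visits, double k) {
    assert(visits > 0 && parent_visits > 0);
    const double exploitation = wins / visits;
    const double exploration =
        k * std::sqrt(std::log(static_cast<double>(parent_visits)) / visits);
    return exploitation + exploration;
}
```

With the slide's numbers, `ucb_score(5, 7, 10, k)` is the "exploitation" move and `ucb_score(0, 2, 10, k)` the "exploration" move. A small `k` (for example 1) favours the first and a larger `k` (for example 2) favours the second, which is the trade-off the slides illustrate.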
  • 15. UCT IN ONE SLIDE UCT for choosing a move in a board B While ( I have time left ) { Do a simulation { Start at board B At each time step, choose action by UCB (or random if no statistics!) } Update statistics with this simulation } Return the most simulated action.
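The loop on this slide can be sketched on a toy game. Everything game-specific here is an assumption made for illustration: a one-pile Nim (remove 1 to 3 stones, taking the last stone wins), a fixed simulation budget instead of "time left", and the names `Node`, `simulate` and `uct_best_move`.

```cpp
#include <cassert>
#include <cmath>
#include <memory>
#include <random>
#include <vector>

struct Node {
    int pile;                                    // stones left
    int visits = 0;
    double wins = 0.0;                           // wins of the player who moved INTO this node
    std::vector<std::unique_ptr<Node>> children; // children[i]: take i + 1 stones
    explicit Node(int p) : pile(p) {}
};

int num_moves(int pile) { return pile < 3 ? pile : 3; }

// Random playout from a non-empty pile; true if the player to move wins.
bool rollout_mover_wins(int pile, std::mt19937& rng) {
    bool mover_turn = true;
    while (true) {
        pile -= 1 + static_cast<int>(rng() % num_moves(pile));
        if (pile == 0) return mover_turn;
        mover_turn = !mover_turn;
    }
}

// One simulation from `node`; returns true if the player to move at `node` wins it.
bool simulate(Node& node, double k, std::mt19937& rng) {
    bool mover_wins;
    if (node.pile == 0) {
        mover_wins = false;                      // the previous player took the last stone
    } else if (node.visits == 0) {
        mover_wins = rollout_mover_wins(node.pile, rng);  // no statistics: play randomly
    } else {
        if (node.children.empty())
            for (int m = 1; m <= num_moves(node.pile); ++m)
                node.children.push_back(std::make_unique<Node>(node.pile - m));
        Node* best = nullptr;
        double best_score = -1.0;
        for (auto& c : node.children) {
            // Unvisited moves first, then the UCB score.
            const double s = c->visits == 0 ? 1e9
                : c->wins / c->visits + k * std::sqrt(std::log(node.visits) / c->visits);
            if (s > best_score) { best_score = s; best = c.get(); }
        }
        mover_wins = !simulate(*best, k, rng);   // the child is seen from the opponent's side
    }
    node.visits += 1;                            // update statistics with this simulation
    if (!mover_wins) node.wins += 1.0;
    return mover_wins;
}

// Runs the simulations and returns the most simulated action (stones to take).
int uct_best_move(int pile, int simulations, double k, unsigned seed) {
    assert(pile > 0 && simulations > 1);
    std::mt19937 rng(seed);
    Node root(pile);
    for (int i = 0; i < simulations; ++i) simulate(root, k, rng);
    int best_move = 1;
    for (int m = 1; m <= static_cast<int>(root.children.size()); ++m)
        if (root.children[m - 1]->visits > root.children[best_move - 1]->visits) best_move = m;
    return best_move;
}
```

For instance, `uct_best_move(5, 20000, 1.0, 42)` should settle on taking one stone, since leaving four stones is a losing position for the opponent in this toy game.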
  • 16. 1. MCTS 2. AlphaZero: adding conv nets 3. AlphaZero – great performances 4. AlphaZero – limitations 5. Open Sourcing 6. Research directions
  • 17. ALPHAZERO INGREDIENT #2: DEEP NETWORK. OVERVIEW IN “DEEP LEARNING”, LECUN, BENGIO, HINTON 2015. BOTH A CRITIC NETWORK (EVALUATING THE PROBABILITY OF WINNING IN A GIVEN POSITION) AND A POLICY NETWORK (PROVIDING A PROBABILITY DISTRIBUTION ON ACTIONS). [Figure labels: ← ← ← invariance by translation; → → → high-level features. Images: “clarifai.com/technology” and “the data science blog”.]
  • 18. PUCT: UCT WITH PRIOR. SCORE(state, action) = 5/7 + NN(state, action) . sqrt( 10 / 7 ) (Annotations on the formula: “No log”, “NN based”.)
  • 19. ALPHAZERO IN A NUTSHELL: A FIXED POINT METHOD!
      MCTS(NN): A MCTS WHICH USES A NEURAL NET NN FOR
        • EVALUATING LEAVES (NO RANDOM ROLLOUT)
        • SUGGESTING POLICIES (BIASING THE MCTS)
      NN ← MCTS:
        • EACH CLIENT: PLAYS GAMES WITH A MCTS(NN)
        • SERVER:
          • RECEIVES BATCHES “(STATES, ACTIONS, REWARD AT END OF GAMES)”
          • TWO LOSS FUNCTIONS (+ WEIGHT DECAY):
            • LEARN “STATE → REWARD” (CRITIC)
            • LEARN “STATE → PROBABILITY DISTRIBUTION ON ACTIONS” (ACTOR), I.E. MIMIC THE MCTS
      ALPHAZERO:
        • RANDOMLY INITIALIZE NN
        • ITERATIVELY IMITATE: NN-ACTOR ⇒ MCTS(NN), NN-CRITIC ⇒ GAME RESULTS
      Loss annotations: prediction of value; p imitates 𝜋 from MCTS; weight decay.
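The PUCT score can be written as a small function, following the formula exactly as it appears on this slide (no logarithm, and the network prior as coefficient). The name `puct_score` is an assumption. Published AlphaZero variants use a slightly different normalization with an extra constant, so this is a sketch of the slide's version only.

```cpp
#include <cassert>
#include <cmath>

// PUCT-style score as written on the slide: win rate plus a bonus weighted by
// the network prior NN(state, action), assumed here to lie in [0, 1].
double puct_score(double wins, int visits, int parent_visits, double prior) {
    assert(visits > 0 && parent_visits > 0);
    return wins / visits
         + prior * std::sqrt(static_cast<double>(parent_visits) / visits);
}
```

With the slide's statistics (5 wins out of 7 visits, 10 simulations at the parent), a prior of 0 leaves only the win rate 5/7, and the bonus grows linearly with the prior, so moves the network likes are explored more.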
  • 20–21. ALPHAZERO TRAINING IN ONE SLIDE
      THE CLIENTS PERFORM SIMULATIONS USING THE NEURAL NETWORK'S OUTPUTS:
        - NN.PI
        - NN.V
      EACH CLIENT, MORE DETAILS:
        - RUNNING MCTS MODIFIED AS FOLLOWS:
          - UCB SCORE COMBINED WITH NN.PI (“PUCT” FORMULA)
          - RANDOM ROLLOUTS (CORRESPONDING TO STATES WITH ZERO SIMULATION) REPLACED BY:
            - 1 SINGLE STEP WITH NN.PI FOR CHOOSING ONE ACTION
            - REWARD OF THE SIMULATION REPLACED BY NN.V, WHICH PREDICTS THE REWARD
        - SENDING TO THE MASTER PLENTY OF (STATE, PI CHOSEN BY MCTS, REWARD OF THE GAME)
      THE MASTER:
        - PROVIDES NN.PI (PROBABILITY DISTRIBUTION ON ACTIONS) AND NN.V (ESTIMATED REWARD) IN STATES, AS REQUESTED BY CLIENTS
        - LEARNS A BETTER NN BASED ON CLIENTS’ SIMULATIONS
      Diagram: MASTER (neural net) and CLIENT 1 (simulating games using MCTS).
        - Inference: the client sends state s; the master returns NN.Pi(s) and NN.V(s), where Pi(s) = probabilities of actions in state s and V(s) = estimated winning rate in s.
        - Training of 𝜋 and V: the client sends (state s, MCTS.Pi(s), real reward R).
      Loss annotations: prediction of value; p imitates 𝜋 from MCTS; weight decay.
      Actually there is a replay buffer. Simulated results are sent to a data structure. The training picks up data and performs stochastic gradient descent.
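The three loss annotations (prediction of value, p imitates 𝜋 from MCTS, weight decay) correspond to the usual AlphaZero-style loss (z - v)² - Σ 𝜋(a) log p(a) + c‖θ‖². A minimal sketch for a single training sample follows; the function name `zero_loss` and the flat weight vector are assumptions made for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Loss for one sample (state s, MCTS.Pi(s), real reward R):
//   v       = NN.V(s), z = real reward R at the end of the game
//   p       = NN.Pi(s), pi = MCTS.Pi(s), both probability vectors over actions
//   weights = network parameters, c = weight decay coefficient
double zero_loss(double v, double z,
                 const std::vector<double>& p, const std::vector<double>& pi,
                 const std::vector<double>& weights, double c) {
    assert(p.size() == pi.size());
    const double value_loss = (z - v) * (z - v);        // critic: predict the reward
    double policy_loss = 0.0;                           // actor: cross-entropy with the MCTS policy
    for (std::size_t a = 0; a < p.size(); ++a)
        if (pi[a] > 0.0) policy_loss -= pi[a] * std::log(p[a]);
    double decay = 0.0;                                 // weight decay: squared norm of the parameters
    for (double w : weights) decay += w * w;
    return value_loss + policy_loss + c * decay;
}
```

In a real training loop this quantity would be averaged over a batch drawn from the replay buffer and minimized by stochastic gradient descent, as the slide describes.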
  • 22. 1. MCTS 2. AlphaZero: adding conv nets 3. AlphaZero – great performances 4. AlphaZero – limitations 5. Open Sourcing 6. Research directions
  • 23–24. ALPHAZERO: GREAT RESULTS [SILVER ET AL, NATURE PAPERS + ARXIV] • NO GAME SPECIFIC KNOWLEDGE • USING MASSIVE COMPUTATIONAL POWER • EXTENSIVE REPRESENTATION FOR ACTIONS ⇐ ONE CHANNEL FOR EACH POSSIBLE RELATIVE MOVE OF EACH PIECE!!! HOW MANY CHANNELS FOR GO? FOR CHESS?
  • 25. 1. MCTS 2. AlphaZero: adding conv nets 3. AlphaZero – great performances 4. AlphaZero – limitations 5. Open Sourcing 6. Research directions
  • 26–28. ALPHAZERO: OPEN PROBLEMS
      1. BASED ON MCTS ⇒ APPLICATION TO PARTIAL OBSERVATION IS NOT TRIVIAL
      2. BASED ON MCTS ⇒ SIMULATORS/BACKTRACKS ARE NECESSARY (WHITE BOX)
      3. BASED ON NN ⇒ HOW TO DEAL WITH HUGE / COMPLEX ACTION SPACES?
      4. BASED ON NN ⇒ HUGE NUMBER OF SIMULATED GAMES ⇒ REDUCED DATA?
      Notes: You need a very special MCTS for partially observable games.
        - Should we learn just V, or just 𝜋, or both?
        - Complex action spaces?
        - Partially observable games? (defogization)
      WHY DOES ZERO FAIL IN PARTIALLY OBSERVABLE GAMES? ⇒ BECAUSE MCTS NEEDS SIMULATIONS “STATE, ACTION → NEW STATE”
  • 29. 1. MCTS 2. AlphaZero: adding conv nets 3. AlphaZero – great performances 4. AlphaZero – limitations 5. Open Sourcing 6. Research directions
  • 30. POLYGAMES, OPEN SOURCED RECENTLY!
      1. MODIFY ONE AND ONLY ONE FILE, SO THAT THE GAME IS YOURS:
           class State {
             bool PlayAction(Action action);        // what happens if we play this move?
             vector<Action> getLegalActions();      // what are the legal moves?
             float getReward(int player);           // which reward did that player win?
             bool terminated();                     // is the game over?
             int getCurrentplayer();                // who should play now?
             vector<float> getFeature();            // input of the neural net (vector, reshaped as a 3D tensor as below)
             vector<int> getFeatureSize();          // shape of the input of the neural net (Polygames will reshape accordingly)
             … a few technical things…
           }
         and a class of actions (each action is mapped to an output neuron)
      2. GET LEARNING CURVES ON YOUR GAME
         • with different architectures (cool: structured output)
         • In a couple of days, single machine
         • In progress: a class of partially observable games + interface with LUDII ?
  • 31. STRONG COMMITMENT TO OPEN SOURCE & ACADEMIC PUBLICATION (FACEBOOK). Captions: “Open source & exports to Onnx format, usable everywhere”; “Open source, beats pros in Go”; “Starcraft OpenData”; “Pytorch for Starcraft” (https://github.com/TorchCraft/). Quoted abstract: “On […] English-French and […] German-English benchmarks, our models respectively obtain […], outperforming the state of the art by more than 11 BLEU points. On low-resource languages like English-Urdu and English-Romanian, our methods achieve even better results […]. Our code for NMT and PBSMT is publicly available.”
  • 32. STRONG COMMITMENT TO OPEN SOURCE & ACADEMIC PUBLICATION (FACEBOOK) “At FAIR, we openly share our advances as much as we can, as fast as we can in the form of technical papers, open source code and teaching material.” (Y. LeCun, Facebook, BusinessInsider)
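To make the interface on this slide concrete, here is a toy game written against the same method names. It is a standalone sketch under stated assumptions: `NimState` does not derive from the real Polygames `State` class, `Action` is simplified to an `int`, and the game (one-pile Nim, take 1 to 3 stones, last stone wins) and the one-hot feature encoding are chosen only for illustration.

```cpp
#include <cassert>
#include <vector>

using Action = int;  // assumption: an action is the number of stones removed (1..3)

class NimState {
public:
    explicit NimState(int stones) : pile_(stones), start_(stones) { assert(stones > 0); }

    // What happens if we play this move? Returns false if the move is illegal.
    bool PlayAction(Action action) {
        if (action < 1 || action > 3 || action > pile_) return false;
        pile_ -= action;
        if (pile_ == 0) winner_ = player_;   // taking the last stone wins
        else player_ = 1 - player_;
        return true;
    }
    // What are the legal moves?
    std::vector<Action> getLegalActions() const {
        std::vector<Action> actions;
        for (Action a = 1; a <= 3 && a <= pile_; ++a) actions.push_back(a);
        return actions;
    }
    // Which reward did that player win? (+1 win, -1 loss, 0 while the game is running)
    float getReward(int player) const {
        if (winner_ < 0) return 0.0f;
        return winner_ == player ? 1.0f : -1.0f;
    }
    bool terminated() const { return pile_ == 0; }      // is the game over?
    int getCurrentplayer() const { return player_; }    // who should play now?
    // Input of the neural net: one-hot encoding of the pile size.
    std::vector<float> getFeature() const {
        std::vector<float> feature(start_ + 1, 0.0f);
        feature[pile_] = 1.0f;
        return feature;
    }
    // Shape of that input, read as a 3D tensor.
    std::vector<int> getFeatureSize() const { return {1, 1, start_ + 1}; }

private:
    int pile_;
    int start_;
    int player_ = 0;
    int winner_ = -1;
};
```

A game loop would repeatedly call `getLegalActions()`, pick one action, and call `PlayAction()` until `terminated()` returns true, then read `getReward()` for each player.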
  • 33. 1. MCTS 2. AlphaZero: adding conv nets 3. AlphaZero – great performances 4. AlphaZero – limitations 5. Open Sourcing 6. Research directions
  • 34. POLYGAMES: PARTIALLY OBSERVABLE GAMES
      Main challenge in partially observable games: building the probability distribution of hidden states, assuming Nash policies.
      Papers: Mundhenk, Rintanen… show 2EXP complexity for many partially observable games, and undecidability in some cases.
      Consider Chinese Dark Chess (CDC). A part of the information is hidden. But it’s simple (in that case): just randomly draw the hidden information when it’s revealed ⇒ you can simulate CDC in MCTS.
      The same principle applies when the hidden information is the same for all players. Example: Minesweeper (single player!).
        - Naïve version: randomly draw the positions of mines until you find something consistent with observations.
        - This is slow, but equivalent to the classical Minesweeper.
        - Faster: use constraint satisfaction problems (cf. Studholme’s paper)
  • 35. POLYGAMES: THE RED QUEEN EFFECT AND TOURNAMENTS Zero-learning = fixed point algorithm. • Stops when MCTS(NN) = NN ? • Fixed point even though there is no order on players ? (A > B > C > A) ⇒ Keep an archive ⇒ Related: Grigoriadis & Khachiyan, 1994
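The naïve Minesweeper version described on this slide is a rejection sampler. A minimal sketch, assuming a row-major board, a known number of mines, and the hypothetical names `consistent` and `sample_mines`:

```cpp
#include <algorithm>
#include <cassert>
#include <random>
#include <vector>

// obs[i] is the number of neighbouring mines shown on a revealed cell,
// or -1 for a cell that is still hidden. mine[i] is 1 for a mine, 0 otherwise.
bool consistent(const std::vector<int>& mine, const std::vector<int>& obs,
                int width, int height) {
    for (int r = 0; r < height; ++r)
        for (int c = 0; c < width; ++c) {
            const int seen = obs[r * width + c];
            if (seen < 0) continue;
            if (mine[r * width + c]) return false;   // a revealed cell is not a mine
            int count = 0;
            for (int dr = -1; dr <= 1; ++dr)
                for (int dc = -1; dc <= 1; ++dc) {
                    const int rr = r + dr, cc = c + dc;
                    if ((dr != 0 || dc != 0) && rr >= 0 && rr < height && cc >= 0 && cc < width)
                        count += mine[rr * width + cc];
                }
            if (count != seen) return false;
        }
    return true;
}

// Redraws the mine positions until they match the observations.
// Assumes the observations are satisfiable; otherwise this never returns.
std::vector<int> sample_mines(const std::vector<int>& obs, int width, int height,
                              int num_mines, std::mt19937& rng) {
    assert(static_cast<int>(obs.size()) == width * height);
    assert(num_mines >= 0 && num_mines <= width * height);
    std::vector<int> mine(obs.size(), 0);
    do {
        std::fill(mine.begin(), mine.end(), 0);
        std::fill(mine.begin(), mine.begin() + num_mines, 1);
        std::shuffle(mine.begin(), mine.end(), rng);
    } while (!consistent(mine, obs, width, height));
    return mine;
}
```

Each accepted draw is a sample of the hidden state given the observations, which is what a simulation inside MCTS needs. The number of rejected draws grows quickly with board size, which is why the slide points to constraint satisfaction as the faster route.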
  • 36. POLYGAMES: PROVING STUFF ? Proof of convergence ? Even better, prescribing something: Which level of noise ? Which formula ? How many MCTS simulations ?
  • 37. POLYGAMES: STRUCTURED OUTPUT Consider Go or Breakthrough or Draughts or many others. The output space is topologically related to the input space. This link is destroyed by the FCMLP (fully connected multilayered perceptron). Let us make the training faster by using convolutions everywhere in the network + global pooling. + applicable for any board size!
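The board-size independence claimed on this slide can be seen on a single layer. The sketch below is an illustration only, with assumed names (`Board`, `conv3x3`, `global_mean`) and one channel: a zero-padded 3x3 convolution keeps the board shape, so a policy output has one value per cell, and global pooling reduces any board to one number, as a value output needs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Board = std::vector<std::vector<double>>;

// 3x3 convolution with zero padding: the output has the same shape as the input,
// and the same 9 weights are reused at every cell, whatever the board size.
Board conv3x3(const Board& in, const double (&kernel)[3][3]) {
    assert(!in.empty());
    const int h = static_cast<int>(in.size());
    const int w = static_cast<int>(in[0].size());
    Board out(h, std::vector<double>(w, 0.0));
    for (int r = 0; r < h; ++r)
        for (int c = 0; c < w; ++c)
            for (int dr = -1; dr <= 1; ++dr)
                for (int dc = -1; dc <= 1; ++dc) {
                    const int rr = r + dr, cc = c + dc;
                    if (rr >= 0 && rr < h && cc >= 0 && cc < w)
                        out[r][c] += kernel[dr + 1][dc + 1] * in[rr][cc];
                }
    return out;
}

// Global average pooling: a single number for any board size.
double global_mean(const Board& x) {
    double total = 0.0;
    std::size_t count = 0;
    for (const auto& row : x)
        for (double v : row) { total += v; ++count; }
    return total / static_cast<double>(count);
}
```

The same kernel can be applied to a 13x13 and to a 19x19 board without any change in the number of parameters, which a fully connected layer cannot do.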
  • 48. Cool stuff with Polygames
      - learning on 13x13 and playing on 19x19 at a strong level (fully convolutional nets)
      - strong checkpoints for many games
      - stochastic games
      - possibility to add layers, channels, kernel width dynamically
      - distributed
      - a few partially observable games (Minesweeper)
      - maintained, open sourced, readable
      - understanding complex, crucial things: mask illegal actions rather than learning a logit of -infinity
      - tournament mode for robust learning
      - in progress: learning with side information
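The point about masking illegal actions can be illustrated with a masked softmax. This is a generic sketch, not Polygames code; the name `masked_softmax` is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

// Softmax restricted to legal actions: illegal actions get probability exactly 0,
// so the network never has to learn a logit of -infinity for them.
std::vector<double> masked_softmax(const std::vector<double>& logits,
                                   const std::vector<bool>& legal) {
    assert(logits.size() == legal.size());
    double max_logit = -std::numeric_limits<double>::infinity();
    bool any_legal = false;
    for (std::size_t i = 0; i < logits.size(); ++i)
        if (legal[i]) { max_logit = std::max(max_logit, logits[i]); any_legal = true; }
    assert(any_legal);

    std::vector<double> probs(logits.size(), 0.0);
    double total = 0.0;
    for (std::size_t i = 0; i < logits.size(); ++i)
        if (legal[i]) {
            probs[i] = std::exp(logits[i] - max_logit);  // shift by the max for numerical stability
            total += probs[i];
        }
    for (double& p : probs) p /= total;
    return probs;
}
```

With logits `{0, 0, 5}` and the third action illegal, the two legal actions each get probability 0.5 even though the illegal one has the largest logit.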
  • 49. HEX According to Bonnet et al (https://www.lamsade.dauphine.fr/~bonnet/publi/connection- games.pdf), “Since its independent inventions in 1942 and 1948 by the poet and mathematician Piet Hein and the economist and mathematician John Nash, the game of hex has acquired a special spot in the heart of abstract game aficionados. Its purity and depth has lead Jack van Rijswijck to conclude his PhD thesis with the following hyperbole [1]: << Hex has a Platonic existence, independent of human thought. If ever we find an extraterrestrial civilization at all, they will know hex, without any doubt.>> ”
  • 50. HEX Simplest rules ever! I play black. You play white. We take turns placing a stone. If I connect my sides, I win. If you connect your sides, you win. Theorem: no draw. Until 2019/10/31: no computer managed to beat the best humans!
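The winning condition ("connect my sides") amounts to a reachability test on the hexagonal grid. The sketch below assumes a rhombus-shaped n x n board stored row by row, with black joining the top and bottom rows and white joining the left and right columns; the side assignment and the name `hex_connected` are assumptions for illustration.

```cpp
#include <cassert>
#include <queue>
#include <vector>

// board[r * n + c] is 0 (empty), 1 (black) or 2 (white).
// Returns true if `color` has a chain of stones joining its two sides.
bool hex_connected(const std::vector<int>& board, int n, int color) {
    assert(static_cast<int>(board.size()) == n * n && (color == 1 || color == 2));
    // The six neighbours of a cell on a hexagonal grid in (row, column) coordinates.
    static const int dr[6] = {-1, -1, 0, 0, 1, 1};
    static const int dc[6] = {0, 1, -1, 1, -1, 0};
    std::vector<bool> seen(board.size(), false);
    std::queue<int> frontier;
    for (int i = 0; i < n; ++i) {
        const int start = color == 1 ? i : i * n;   // top row for black, left column for white
        if (board[start] == color) { seen[start] = true; frontier.push(start); }
    }
    while (!frontier.empty()) {
        const int cell = frontier.front();
        frontier.pop();
        const int r = cell / n, c = cell % n;
        if ((color == 1 ? r : c) == n - 1) return true;   // reached the opposite side
        for (int d = 0; d < 6; ++d) {
            const int rr = r + dr[d], cc = c + dc[d];
            if (rr < 0 || rr >= n || cc < 0 || cc >= n) continue;
            const int next = rr * n + cc;
            if (board[next] == color && !seen[next]) { seen[next] = true; frontier.push(next); }
        }
    }
    return false;
}
```

Because a filled Hex board always contains exactly one such connection, a game engine only needs this test for the player who just moved.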
  • 52. HEX Polygames vs Arek Kulczycki (winner of the last LG tournament, best Elo rank on the LittleGolem server). Bunch of GPUs, several days. Operated & trained by Vegard, a.k.a. “un putain de hacker de ouf” (French slang, roughly “an insanely good hacker”). Thanks a lot!!!
  • 53–55. HEX Simplest rules ever! I play black. You play white. We take turns placing a stone. If I connect my sides, I win. If you connect your sides, you win. Theorem: no draw. Until 2019/10/31: no computer managed to beat the best humans! Fantastic game with a super long final path! (Images: Max Pixel, pngimg.com)
  • 56. Breakthrough: seemingly a win for white. Draw with the best bots. Maybe it needs a pie rule or a 12* rule (12*: turn order 122112211221122…; pie: the second player can swap roles). Othello: won all* games against 2 strong bots (incl. the winner of the 2019 Olympiads). Einstein: not many results yet, but it looks like we play well. *Except one, due to a human operator mistake!
  • 57. THE END!!! … we’re coming in many other games, stay tuned :) (and help us: join the group :) ) Havannah: big board, diversity of winning conditions, long games, hexagons… LUDII: enormous library of games, interfacing in progress with the Maastricht games gang.
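The 12* turn order quoted on this slide (122112211221122…) has a simple closed form. A small sketch, with the hypothetical name `player_at_turn` and turns counted from 0:

```cpp
#include <cassert>

// Player (1 or 2) to move at 0-based turn t under the 12* rule:
// player 1 moves once, then the players alternate in pairs: 1 2 2 1 1 2 2 1 1 ...
int player_at_turn(int t) {
    assert(t >= 0);
    return ((t + 1) / 2) % 2 == 0 ? 1 : 2;
}
```

Compared with the pie rule, which lets the second player swap roles after the first move, this schedule compensates the first-move advantage by construction rather than by a player's decision.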