Mastering the game of Go with deep neural
networks and tree search
A Presentation
Aditya R Suryavamshi
Monday 12th March, 2018
Dayananda Sagar College of Engineering
Table of contents
1. Introduction
2. Games
3. Playing Games as a Computer
4. AlphaGo
5. Conclusion
Introduction
About the Paper
• This is a seminal paper in the field of Artificial Intelligence
(AI), more specifically in General Game Playing (GGP).
• Introduces a novel search algorithm that builds on top of
existing mechanisms to search through very dense trees of
games such as Go.
• The system developed in this paper went on to defeat one of
the strongest Go players of our time.
Authors of the Paper
• David Silver
• Aja Huang (placed stones on the Go board for AlphaGo in the
2016 match versus Lee Sedol!)
Chris J. Maddison, Arthur Guez, Laurent Sifre, George van den
Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda
Panneershelvam, Marc Lanctot, Sander Dieleman, Dominik Grewe,
John Nham, Nal Kalchbrenner, Ilya Sutskever, Timothy Lillicrap,
Madeleine Leach, Koray Kavukcuoglu, Thore Graepel & Demis
Hassabis
Games
Games
• Games of Perfect Information
• Games in which nothing is hidden: every player can see the
complete state of the game, so all players have the information
needed to make the best decision for themselves.
• These games have an associated optimal value function v∗(s)
that determines the outcome from every state s of the game,
under perfect play by all players
• Complexity of Perfect Information Games
• A game with breadth b and depth d has roughly b^d possible
sequences of moves; see the quick check below.
• Typical values are on the order of b ≈ 35, d ≈ 80 for chess, and
b ≈ 250, d ≈ 150 for Go.
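As a quick back-of-the-envelope check (a hypothetical snippet, not from the paper), computing b^d in log space reproduces the commonly quoted orders of magnitude, roughly 10^123 move sequences for chess and 10^360 for Go:

```python
import math

# Rough game-tree sizes b^d for the values quoted above.
# Computed in log space: the numbers themselves are far too large for floats.
for game, b, d in [("chess", 35, 80), ("Go", 250, 150)]:
    digits = d * math.log10(b)
    print(f"{game}: b^d is about 10^{digits:.1f} possible move sequences")
```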
Go
• Why Go?
• The search space is extremely large (≈ 250^150)
• Go is generally considered the kind of game that requires intuition
Objective
Surround a larger area of the board with your own stones than your
opponent does.
Rules
• Players take alternate turns placing stones on the board
• Black moves first
• Rule of liberty
• Ko Rule
Playing Games as a Computer
How do you play games when you are a Computer?
Take 1
Exhaustive Search
• Minimax
• Depth-first minimax with alpha-beta pruning
Issues
Exhaustive search is infeasible for all but the simplest of games
(Tic-Tac-Toe). A minimal sketch of alpha-beta follows.
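A minimal sketch of depth-first minimax with alpha-beta pruning. The game-state interface (is_terminal(), evaluate(), legal_moves(), play(move)) is a hypothetical stand-in, not anything from the paper:

```python
def alphabeta(state, depth, alpha=float("-inf"), beta=float("inf"), maximizing=True):
    """Depth-limited minimax value of `state`, pruning branches that
    cannot influence the final decision (hypothetical state interface)."""
    if depth == 0 or state.is_terminal():
        return state.evaluate()  # heuristic or exact score of the position
    if maximizing:
        value = float("-inf")
        for move in state.legal_moves():
            value = max(value, alphabeta(state.play(move), depth - 1, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:
                break  # beta cutoff: the minimizing player will avoid this line
        return value
    else:
        value = float("inf")
        for move in state.legal_moves():
            value = min(value, alphabeta(state.play(move), depth - 1, alpha, beta, True))
            beta = min(beta, value)
            if beta <= alpha:
                break  # alpha cutoff: the maximizing player will avoid this line
        return value
```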
How do you play games when you are a Computer?
Take 2
Cut Down on the Breadth and Depth of the Game
• Focus on only the promising Moves
• Depth: via position evaluation, truncating the search tree at a
state s and replacing the subtree below it with an approximate
value function v(s) ≈ v∗(s) that predicts the outcome from state s
• Breadth: via sampling actions from a policy p(a|s)
Monte Carlo Rollouts
Use a policy to play moves for both players until the end of the
game, without branching; averaging the outcomes of many such
rollouts estimates a position's value. A sketch follows.
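A minimal rollout sketch under the same hypothetical state interface as above:

```python
import random

def rollout(state, policy=None):
    """Play one unbranched game to the very end and return its outcome
    (e.g. +1 for a Black win, -1 for a White win). `policy` picks a move
    for whichever player is to move; uniform random if not given."""
    while not state.is_terminal():
        move = policy(state) if policy else random.choice(state.legal_moves())
        state = state.play(move)
    return state.winner()
```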
Monte Carlo Tree Search
Used to find the most promising action, i.e. the one most likely to
lead to a win.
General Steps
Selection: select child nodes according to a criterion (usually
something like the Upper Confidence Bound (UCB)) until
a leaf of the search tree is reached.
Expansion: if the selected leaf is not a terminal state, expand it
with one or more child nodes, and select one of them.
Simulation: also called evaluation, playout, or rollout.
Play a random playout from the node to the very end
of the game to determine the value of the rollout.
Backpropagation: update the results of the rollout all the way up to
the root.
A compact sketch combining the four steps follows.
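A compact UCT-style sketch of the four steps, reusing the hypothetical state interface and rollout() from the previous slides. This illustrates generic MCTS, not the paper's exact algorithm (in particular, sign handling for alternating players is glossed over):

```python
import math
import random

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children = {}   # move -> Node
        self.visits = 0
        self.value = 0.0     # sum of rollout outcomes seen through this node

def ucb(node, c=1.4):
    # Upper Confidence Bound: mean value plus a bonus for rarely tried nodes.
    if node.visits == 0:
        return float("inf")
    return (node.value / node.visits
            + c * math.sqrt(math.log(node.parent.visits) / node.visits))

def mcts(root, n_simulations):
    for _ in range(n_simulations):
        node = root
        # 1. Selection: descend via UCB until an unexpanded node is reached.
        while node.children:
            node = max(node.children.values(), key=ucb)
        # 2. Expansion: on first visit to a non-terminal node, add its children.
        if not node.state.is_terminal():
            for move in node.state.legal_moves():
                node.children[move] = Node(node.state.play(move), parent=node)
            node = random.choice(list(node.children.values()))
        # 3. Simulation: random playout from the node to the end of the game.
        outcome = rollout(node.state)
        # 4. Backpropagation: update statistics all the way up to the root.
        while node is not None:
            node.visits += 1
            node.value += outcome
            node = node.parent
    # Finally, play the most visited move at the root.
    return max(root.children, key=lambda m: root.children[m].visits)
```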
AlphaGo
AlphaGo
• AlphaGo effectively combines the ideas of policy networks and
value networks with MCTS.
• A policy network pσ is trained via supervised learning of expert
human moves.
• A fast rollout policy pπ is also trained for rapidly sampling
actions during rollouts.
• A reinforcement learning policy network pρ is trained, which
adjusts the policy towards winning the game rather than
maximizing predictive accuracy of human moves.
• A value network vθ is also trained to predict the winner of the
games played from a state.
Supervised Learning of Policy Network
• The Supervised Learning(SL) Policy Network pσ(a|s) gives a
probability distribution across all the legal moves a, given a
state s.
• Trained on 30 million randomly sampled state-action pairs
(s, a), maximizing the likelihood of the human move a being
played in state s.
• The moves were downloaded from the KGS Go Server, which
contains 160,000 games played by 6-9 dan human players.
• A 13-layer convolutional network.
• The SL policy net has an accuracy of 57.0% using all input
features, and 55.7% using only the raw board position and move
history as inputs, compared to the state of the art at the time of 44.4%.
• A faster but less accurate rollout policy pπ(a|s) is also trained,
using a linear softmax of small pattern features (accuracy of
24.2%, but roughly 2 µs per move versus 3 ms for the full network).
Reinforcement Learning of Policy Network
• The reinforcement learning (RL) policy network pρ improves upon
the SL network by optimizing for winning the game; the update is
shown below.
• The RL policy net weights ρ are initialized to the same values as
those of the SL policy net, ρ = σ
• Games are played between the current policy pρ and a randomly
selected previous iteration of the policy network.
• A reward function r(s) is used, which is +1 for winning, −1 for
losing, and 0 for all non-terminal states.
• The RL net, when played against the SL net, won more than 80% of
the games.
• The RL policy net won more than 85% of games, using no search
at all, against Pachi, the strongest open-source Go program.
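The update used here is the REINFORCE policy gradient from the paper: at each time step t of a self-play game, the weights move in the direction that makes the chosen move more likely when the final outcome zt was a win, and less likely when it was a loss:

Δρ ∝ (∂ log pρ(at | st) / ∂ρ) · zt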
Reinforcement Learning of Value Network
• Used to estimate the value of a state s when the game is played
using policy p
v^p(s) = E[zt | st = s, at...T ∼ p]
• The estimation is done for the strongest policy, using the RL
Policy Network pρ
• The value network has a similar architecture to the policy
network, but outputs a single scalar value instead of a
probability distribution.
• Trained by regression on state-outcome pairs (s, z), minimizing
the mean squared error between the predicted value vθ(s) and the
game outcome z; the gradient step is shown below.
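In the paper this regression is stochastic gradient descent on the mean squared error between vθ(s) and z, giving the gradient step

Δθ ∝ (∂vθ(s) / ∂θ) (z − vθ(s))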
Searching with Policy and Value Networks
• Each edge (s, a) of the search tree stores an action value Q(s, a),
a visit count N(s, a), and a prior probability P(s, a)
• At each time step of simulation traversal, an action is selected
from state st using
at = argmax_a (Q(st, a) + u(st, a))
which maximizes the action value, with the u(st, a) term used to
incentivize exploration (see the form of u below)
• When the traversal reaches a leaf node sL it may be expanded.
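In the paper the exploration bonus is proportional to the prior probability but decays with repeated visits, so the search initially prefers moves the SL policy likes and gradually shifts towards moves with high action value:

u(s, a) ∝ P(s, a) / (1 + N(s, a))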
Searching with Policy and Value Networks
• The leaf position sL is processed just once by the SL policy
network, and the output probabilities are stored as prior
probabilities P for each legal action a: P(s, a) = pσ(a|s).
• The leaf node valuation mixes the value from the value network
vθ(sL) with the outcome zL of a random rollout played from sL
until the terminal step using the fast rollout policy pπ,
weighted by a mixing parameter λ (formula below).
• At the end of a simulation, the action values and visit counts of
all traversed edges are updated.
• Once the search is complete, the algorithm chooses the most
visited move from the root position.
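The paper's leaf evaluation is the λ-weighted average of the two estimates:

V(sL) = (1 − λ) vθ(sL) + λ zL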
How good is it?
Really, really, really good.
The system built from the proposed approach, AlphaGo, currently
beats everything under the sun at Go.
Conclusion
Conclusion
• The paper builds on top of existing literature and introduces a
new mechanism for searching through an intractable search space
by focusing on the branches most likely to be valuable.
• It also provides a pathway for future research on other tasks
that seem to require human intelligence in an implicit and
intuitive way.
Things to Do!
• Read the paper (surprisingly easy to understand)
• Learn to play Go
• Watch the documentary AlphaGo (2017) (available on Netflix)
• When it's published, skim or read the book Deep Learning and
the Game of Go (Manning Publications)
Questions?