System Engineering Laboratory
koain@naver.com
16110226 Kim Young Woo
Mastering the game of Go with Deep
Neural Networks and Tree Search
Contents
Ⅰ. What is the AlphaGo?
- Go Machine
Ⅱ. Background
- Overview
- MCTS (Monte Carlo Tree Search)
- CNN (Convolutional Neural Network)
Ⅲ. Components
- Policy Networks
- Value Networks
- Searching with policy and value networks
Ⅳ. Conclusion
Ⅰ. What is the AlphaGo?
What is the AlphaGo? – Go Machine
- AlphaGo is a computer program developed by Google DeepMind to play the game of Go.
- It is the first computer Go program to beat a professional human Go player without a handicap.
What is the AlphaGo? – Go Machine
- In chess, IBM's Deep Blue beat the world chess champion using brute-force search.
- A Go board is 19 x 19, and each point has 3 possible states (empty, black, white), so the number of board configurations is at most 3^361 ≈ 10^172, of which roughly 10^170 are legal positions.
- There are about 250 legal moves per position, and a Go game lasts about 150 moves on average, so the search tree has a breadth of about 250 and a depth of about 150.
- It is impossible to search all of these cases with current technology.
- The key is how to decrease the depth and breadth of the search tree.
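The size estimates above can be checked with a few lines of arithmetic. This is only a sketch: the constant names are illustrative, and the breadth and depth are the approximate figures quoted on the slide.

```python
import math

BOARD_POINTS = 19 * 19      # 361 intersections
STATES_PER_POINT = 3        # empty, black, white

# Upper bound on board configurations: 3^361 (this count includes illegal positions).
log10_configs = BOARD_POINTS * math.log10(STATES_PER_POINT)

# Rough game-tree size: breadth^depth, using the slide's approximate figures.
BREADTH, DEPTH = 250, 150
log10_tree = DEPTH * math.log10(BREADTH)

print(f"3^361   is about 10^{log10_configs:.0f}")
print(f"250^150 is about 10^{log10_tree:.0f}")
```

Both exponents are far beyond what exhaustive search can cover, which is why the rest of the deck focuses on pruning breadth and depth.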
Ⅱ. Background
Background – Overview
1. MCTS (Monte Carlo Tree Search)
- It is used by many AI Go programs.
2. CNN (Convolutional Neural Networks)
- Policy Networks
- Value Networks
Background – MCTS
- It is an efficient search method when it is impossible to explore all paths.
- Selection : Select the most promising path from the root to a leaf.
- Expansion : If the game has not ended at the leaf, create one or more child nodes and choose one of them.
- Simulation : Play the game out from the chosen node until it ends.
- Backpropagation : Update the statistics on the path from the root to the chosen node using the simulation result.
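The four phases can be sketched on a toy game. This is a minimal illustration, not AlphaGo's search: it assumes a Nim-like game (take 1 to 3 stones, whoever takes the last stone wins), uniform random rollouts, and the standard UCB1 selection rule. The names Node, ucb1, rollout and mcts are hypothetical.

```python
import math
import random

class Node:
    def __init__(self, stones, player, parent=None, move=None):
        self.stones = stones      # stones left in the pile
        self.player = player      # player to move at this node (0 or 1)
        self.parent = parent
        self.move = move          # move that led to this node
        self.children = []
        self.visits = 0
        self.wins = 0.0           # wins for the player who moved INTO this node

    def untried_moves(self):
        tried = {c.move for c in self.children}
        return [m for m in (1, 2, 3) if m <= self.stones and m not in tried]

def ucb1(child, parent_visits, c=1.4):
    # Win rate plus an exploration bonus that shrinks as the child is visited.
    return child.wins / child.visits + c * math.sqrt(math.log(parent_visits) / child.visits)

def rollout(stones, player, rng):
    # Simulation: play uniformly random moves to the end and return the winner.
    while True:
        stones -= rng.randint(1, min(3, stones))
        if stones == 0:
            return player         # this player took the last stone
        player = 1 - player

def mcts(stones, player, iterations=2000, seed=0):
    rng = random.Random(seed)
    root = Node(stones, player)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend while the node is fully expanded and not terminal.
        while node.stones > 0 and not node.untried_moves():
            node = max(node.children, key=lambda ch: ucb1(ch, node.visits))
        # 2. Expansion: add one untried child if the game is not over.
        if node.stones > 0:
            move = rng.choice(node.untried_moves())
            child = Node(node.stones - move, 1 - node.player, node, move)
            node.children.append(child)
            node = child
        # 3. Simulation: finish the game from the chosen node.
        if node.stones == 0:
            winner = 1 - node.player   # the player who just moved took the last stone
        else:
            winner = rollout(node.stones, node.player, rng)
        # 4. Backpropagation: update statistics from the chosen node up to the root.
        while node is not None:
            node.visits += 1
            if node.parent is not None and winner == node.parent.player:
                node.wins += 1
            node = node.parent
    # Play the most-visited move at the root.
    return max(root.children, key=lambda ch: ch.visits).move

best = mcts(5, 0)
print("move chosen from 5 stones:", best)
```

With 5 stones, leaving the opponent a multiple of four is the known winning strategy, so the search should settle on taking 1 stone.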
Background – CNN
- Convolution Layer : It extracts meaningful features (feature maps) from the input image.
- Sub-sampling Layer : It reduces the size of the feature maps, for example by max-pooling.
- Fully-Connected Layer : It performs classification from the feature maps.
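The three layer types can be illustrated in plain Python on a tiny input. This is a sketch for intuition only: the helper names, the 5 x 5 image, the edge-detecting kernel and the dense weights are all made up for the example, and a real CNN would learn its kernels and weights.

```python
def conv2d(image, kernel):
    """Valid 2-D cross-correlation (what CNN libraries call convolution)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

def relu(fmap):
    """Element-wise non-linearity applied to a feature map."""
    return [[max(0, v) for v in row] for row in fmap]

def max_pool(fmap, size=2):
    """Sub-sampling: keep the maximum of each non-overlapping size x size block."""
    return [[max(fmap[i + di][j + dj] for di in range(size) for dj in range(size))
             for j in range(0, len(fmap[0]) - size + 1, size)]
            for i in range(0, len(fmap) - size + 1, size)]

def dense(features, weights, bias):
    """Fully-connected layer: one weighted sum per output unit."""
    return [sum(f * w for f, w in zip(features, row)) + b
            for row, b in zip(weights, bias)]

# A 5 x 5 image with a vertical edge between columns 1 and 2.
image = [[1, 1, 0, 0, 0] for _ in range(5)]
kernel = [[1, -1], [1, -1]]                    # responds to left-bright/right-dark edges

fmap = relu(conv2d(image, kernel))             # 4 x 4 feature map
pooled = max_pool(fmap)                        # 2 x 2 after 2 x 2 max-pooling
flat = [v for row in pooled for v in row]      # flatten for the dense layer
scores = dense(flat, [[1, 0, 1, 0], [0, 1, 0, 1]], [0, 0])
print(fmap[0], pooled, scores)
```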
Ⅲ. Components
Components
- 𝑠 : State of the board
- 𝑎 : Next action
- 𝑣(𝑠) : Value function
- 𝑃(𝑎|𝑠) : Probability distribution over possible moves 𝑎 in position 𝑠
- 𝑃𝜎 : Supervised-learning-based policy network
- 𝑃𝜋 : Fast rollout policy that rapidly samples actions during rollouts
- 𝑃𝜌 : Reinforcement-learning-based policy network
- 𝑣𝜃 : Value network that predicts the winner of games
Components - Policy Networks
- Decrease the breadth of the search tree.
- Convolutional neural networks for choosing the next action.
- Estimate the probability distribution 𝑃(𝑎|𝑠) over moves.
- Trained by supervised learning and then reinforcement learning.
1. SL (Supervised Learning) Policy Network
- Learns from human experts using 30 million positions from the KGS Go Server
2. RL (Reinforcement Learning) Policy Network
- Initialized to the SL policy network
- Learns by self-play between RL policy networks
- The RL policy network won more than 80% of games against the SL policy network
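How a policy output narrows the breadth of the search can be sketched as follows. This is an illustration under assumptions: the move names and raw scores are invented, and softmax_policy and top_k are hypothetical helpers, not part of AlphaGo. The idea shown is only that a distribution over legal moves lets the search concentrate on a few high-probability candidates.

```python
import math

def softmax_policy(logits, legal):
    """Turn raw network scores into a distribution P(a|s) over legal moves only."""
    m = max(logits[a] for a in legal)               # subtract the max for numerical stability
    exps = {a: math.exp(logits[a] - m) for a in legal}
    total = sum(exps.values())
    return {a: e / total for a, e in exps.items()}

def top_k(policy, k):
    """Keep the k most probable moves, which narrows the breadth of the search."""
    return sorted(policy, key=policy.get, reverse=True)[:k]

# Hypothetical raw scores for four moves; "A1" is treated as illegal here.
logits = {"D4": 2.0, "Q16": 1.0, "K10": 0.0, "A1": -3.0}
policy = softmax_policy(logits, legal=["D4", "Q16", "K10"])
candidates = top_k(policy, 2)
print(policy, candidates)
```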
Components - Value Networks
- Decrease the depth of the search tree.
- Convolutional neural networks for predicting the outcome from position 𝑠.
- Estimate the value function 𝑣^𝑝(𝑠) of policy 𝑝.
- Trained by reinforcement learning.
1. Reinforcement Learning
- Trained on self-play games generated by the RL policy network
- This avoids overfitting to the KGS data set.
Components – Searching with policy and value network
- 𝑄 : MCTS action value
- 𝑢(𝑃) : Bonus that is proportional to a stored prior probability 𝑃 and decays with the visit count.
- Selection : At each step of the descent, select the action with the maximum 𝑄 + 𝑢(𝑃), until a leaf is reached at step L.
- Expansion : Expand the leaf node at step L, using 𝑃𝜎 to set the prior probabilities of its moves.
- Evaluation : Evaluate the leaf by combining the value network 𝑣𝜃 with the outcome of a rollout played with the fast rollout policy 𝑃𝜋.
- Backup : 𝑄 and the visit counts of all traversed edges are updated.
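The selection, evaluation and backup steps can be sketched as below. This is a simplified illustration, not the actual implementation: the Edge class and function names are hypothetical, the constants c_puct and lam are assumed defaults, and the sign flip between the two players' perspectives is omitted. The bonus uses the form u = c_puct * P * sqrt(total visits) / (1 + N), and the leaf value mixes the value network output with the rollout outcome.

```python
import math

class Edge:
    def __init__(self, prior):
        self.P = prior    # prior probability stored at expansion
        self.N = 0        # visit count
        self.W = 0.0      # total value backed up through this edge

    @property
    def Q(self):
        # Mean action value; 0 for an unvisited edge.
        return self.W / self.N if self.N else 0.0

def select(edges, c_puct=5.0):
    """Pick the move maximizing Q + u(P); u grows with the prior, shrinks with visits."""
    total_n = sum(e.N for e in edges.values())
    def score(move):
        e = edges[move]
        u = c_puct * e.P * math.sqrt(total_n) / (1 + e.N)
        return e.Q + u
    return max(edges, key=score)

def leaf_value(v_theta, z_rollout, lam=0.5):
    """Mix the value-network estimate with the rollout outcome."""
    return (1 - lam) * v_theta + lam * z_rollout

def backup(path, value):
    """Update visit counts and value totals on all traversed edges."""
    for edge in path:
        edge.N += 1
        edge.W += value

# Three candidate moves with hypothetical priors.
edges = {"a": Edge(0.6), "b": Edge(0.3), "c": Edge(0.1)}
backup([edges["a"]], 0.2)
first = select(edges)              # high prior still dominates
backup([edges["a"]], -1.0)
backup([edges["a"]], -1.0)
second = select(edges)             # poor results on "a" shift the search to "b"
print(first, second, leaf_value(0.4, 1.0))
```

The example shows the intended trade-off: a move with a high prior is tried first, but repeated poor evaluations lower its Q and its bonus, so the search moves on to the next candidate.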
Ⅳ. Conclusion
Conclusion
- Single-machine AlphaGo is many dan ranks stronger than any previous Go program, winning 494 out of 495 games (99.8%) against other Go programs.
- The distributed version of AlphaGo won the match 5 games to 0 against Fan Hui, the European Go champion.
The End.
