AlphaGo: An AI Go Player based on
Deep Neural Networks and Monte Carlo Tree Search
Michael J. Moon
M.Sc. Candidate in Biostatistics
Dalla Lana School of Public Health
University of Toronto
April 7, 2016
Agenda
Introduction
Methodologies
Design
Discussion
References
Introduction | Background
The Game of Go
> Played on a square grid called a board, usually 19 x 19
> Stones – black and white – are placed alternately
> Points awarded for surrounding empty space
Complexity
> Possible number of move sequences ≈ 250^150
> A googol¹ times more complex than chess
> Viewed as an unsolved “grand challenge” for AI
“pinnacle of perfect information games”
Demis Hassabis, Co-founder of DeepMind
Example of a Go Board
Shades represent territories
1. 1 googol = 1.0 × 10^100
Introduction | Background
> Google DeepMind’s AI Go Player¹
5-0 against Fan Hui
> Victory against the three-time European champion
> First program to win against a professional player in an even game
4-1 against Sedol Lee
> Victory against the world’s top player over the past decade
> Awarded the highest Go ranking after the match²
1. Image source: https://deepmind.com/alpha-go.html;
2. Source: http://www.straitstimes.com/asia/east-asia/googles-alphago-gets-divine-go-ranking
Introduction | Overview of the Design
Training pipeline: 30M Human Moves are used to train the Rollout Policy and the SL Policy Network; the SL Policy Network initializes the RL Policy Network, which in turn generates the training data for the RL Value Network. All of these components feed into Monte Carlo Tree Search for Move Selection.
Asynchronous Multi-threaded Search
> 40 Search Threads
> 48 CPUs
> 8 GPUs
Distributed Version¹
> 40 Search Threads
> 1,202 CPUs
> 176 GPUs
1. Used against Fan Hui; 1,920 CPUs and 280 GPUs against Lee
http://www.economist.com/news/science-and-technology/21694540-win-or-lose-best-five-battle-contest-another-milestone
Methodologies | Deep Neural Network
Deep Learning Architecture
> Multilayer (5~20) stack of simple modules subject to learning
> Units arranged as Input units (i) → Hidden units H1 (j) → Hidden units H2 (k) → Output units (l), connected by weights w_ij, w_jk, w_kl
Forward pass
z_j = Σ_{i∈In} w_ij x_i,   y_j = f(z_j)
z_k = Σ_{j∈H1} w_jk y_j,   y_k = f(z_k)
z_l = Σ_{k∈H2} w_kl y_k,   y_l = f(z_l)
Backpropagation Training
> Trained by simple stochastic gradient descent to minimize error
> Application of the chain rule for derivatives yields the gradients
E = loss function, e.g., E = ½ (y_l − t_l)², giving ∂E/∂y_l = y_l − t_l
∂E/∂z_l = (∂E/∂y_l)(∂y_l/∂z_l)
∂E/∂y_k = Σ_{l∈Out} w_kl (∂E/∂z_l),   ∂E/∂z_k = (∂E/∂y_k)(∂y_k/∂z_k)
∂E/∂y_j = Σ_{k∈H2} w_jk (∂E/∂z_k),   ∂E/∂z_j = (∂E/∂y_j)(∂y_j/∂z_j)
Weight update: w′_ij = w_ij − η (∂E/∂w_ij) with ∂E/∂w_ij = x_i (∂E/∂z_j), where η = step size
> Rectified linear unit (ReLU), f(x) = max(0, x), learns faster than other non-linearities
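To make the chain-rule bookkeeping concrete, here is a minimal NumPy sketch of one stochastic-gradient step through a two-hidden-layer ReLU network of this shape; the layer sizes and the quadratic loss are illustrative assumptions, not AlphaGo's training code:

```python
import numpy as np

rng = np.random.default_rng(0)
relu = lambda z: np.maximum(0.0, z)       # f(x) = max(0, x)
drelu = lambda z: (z > 0).astype(float)   # its derivative

def sgd_step(x, t, W_ij, W_jk, W_kl, eta=0.01):
    """One stochastic-gradient step on E = 1/2 * (y_l - t_l)^2."""
    # Forward pass: z = weighted sum of inputs, y = f(z)
    z_j = x @ W_ij;   y_j = relu(z_j)
    z_k = y_j @ W_jk; y_k = relu(z_k)
    z_l = y_k @ W_kl; y_l = relu(z_l)
    # Backward pass: chain rule, exactly as on the slide
    dE_dzl = (y_l - t) * drelu(z_l)            # dE/dy_l = y_l - t_l
    dE_dzk = (dE_dzl @ W_kl.T) * drelu(z_k)    # dE/dy_k = sum_l w_kl dE/dz_l
    dE_dzj = (dE_dzk @ W_jk.T) * drelu(z_j)    # dE/dy_j = sum_k w_jk dE/dz_k
    # Weight updates: w' = w - eta * dE/dw, with dE/dw_ij = x_i * dE/dz_j
    return (W_ij - eta * np.outer(x, dE_dzj),
            W_jk - eta * np.outer(y_j, dE_dzk),
            W_kl - eta * np.outer(y_k, dE_dzl))

# Toy usage with arbitrary layer sizes
W_ij, W_jk, W_kl = (rng.normal(0, 0.1, s) for s in [(4, 8), (8, 8), (8, 2)])
W_ij, W_jk, W_kl = sgd_step(rng.normal(size=4), np.array([1.0, 0.0]),
                            W_ij, W_jk, W_kl)
```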
Methodologies | Deep Convolutional Neural Network
Input
Arrays such as signals, images and videos
Local Connections
Each unit connected only to a local patch of the previous layer
Shared Weights
Each filter with common weights and bias to create a feature map
Non-linearity
Local weighted sums passed to a non-linearity such as ReLU
Pooling
Coarse-graining the position of each feature, typically by taking the max over neighbouring features
Size and Stride
e.g., filter size 3 with stride 2
Deep Architecture
Uses stacks of many such layers
Exploits the properties of natural signals
Methodologies | Deep Convolutional Neural Network
Architecture
> Exploits highly correlated local groups
> Local statistics invariant to location
Properties
> Compositional hierarchy
> Invariant to small shifts and distortions due to pooling
> Weights trained through backpropagation
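A toy 1-D sketch of these building blocks – shared filter weights, stride, ReLU, and max-pooling; illustrative only, since the networks in AlphaGo are 2-D and far larger:

```python
import numpy as np

def conv1d(x, w, b, stride=1):
    """Shared-weight 1-D convolution: the same filter (w, b) slides
    across the input, producing one feature value per position."""
    n = (len(x) - len(w)) // stride + 1
    return np.array([x[i*stride : i*stride + len(w)] @ w + b for i in range(n)])

def max_pool(y, size=2):
    """Coarse-grain positions by taking the max over neighbouring features."""
    return np.array([y[i:i+size].max() for i in range(0, len(y) - size + 1, size)])

x = np.array([0., 1., 1., 0., 1., 1., 0., 0., 1.])
w, b = np.array([1., -1., 1.]), 0.0                   # filter size 3
feature = np.maximum(0.0, conv1d(x, w, b, stride=2))  # stride 2, then ReLU
print(max_pool(feature))
```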
Methodologies | Monte Carlo Tree Search
Overview
Finds optimal decisions by:
> Taking random samples in the decision space
> Building a search tree according to the results
Notation
> s ∈ S, nodes; a ∈ A, edges; r(s), reward; N(a), visit count
> Tree Policy: g(p(a|s), N(a)) for s ∈ Tree Nodes – tries to balance exploration and exploitation
> Default Policy: p(a|s) for s ∉ Tree Nodes
Four Phases
Selection – Traverse to the most urgent expandable node
Expansion – Add a child node from the selected node
Simulation – Simulate from the newly added node to an outcome r(s′)
Backpropagation – Back up the simulation result through the selected nodes
Strengths
> Anytime algorithm – gives a valid solution at any time of interruption
> Values of intermediate states are not evaluated – domain knowledge not required
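A self-contained sketch of the four phases on a toy game (Nim: remove 1 or 2 counters, whoever takes the last counter wins); UCB1 stands in for the tree policy and a uniform-random default policy for the simulations – AlphaGo replaces both with its neural networks:

```python
import math, random

class Node:
    def __init__(self, counters, to_move, parent=None):
        self.counters, self.to_move, self.parent = counters, to_move, parent
        self.children, self.visits, self.wins = [], 0, 0.0
        self.untried = [m for m in (1, 2) if m <= counters]

def best_move(counters, to_move=0, n_iter=3000):
    root = Node(counters, to_move)
    for _ in range(n_iter):
        node = root
        # 1. Selection: descend via the tree policy (UCB1) to an expandable node
        while not node.untried and node.children:
            node = max(node.children, key=lambda c: c.wins / c.visits +
                       math.sqrt(2 * math.log(node.visits) / c.visits))
        # 2. Expansion: add a child node for one untried move
        if node.untried:
            m = node.untried.pop()
            child = Node(node.counters - m, 1 - node.to_move, parent=node)
            child.move = m
            node.children.append(child)
            node = child
        # 3. Simulation: default (random) policy from the new node to an outcome
        counters_s, player = node.counters, node.to_move
        while counters_s > 0:
            counters_s -= random.choice([m for m in (1, 2) if m <= counters_s])
            player = 1 - player
        winner = 1 - player  # the player who took the last counter
        # 4. Backpropagation: back the result up through the selected nodes
        while node is not None:
            node.visits += 1
            node.wins += (winner != node.to_move)  # win for the mover into node
            node = node.parent
    return max(root.children, key=lambda c: c.visits).move

print(best_move(5))  # with 5 counters, taking 2 leaves the opponent losing
```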
Design | Problem Setting
> s ∈ S            State of the game
> a ∈ A(s)         Legal actions at s
> f(s, a)          Deterministic state transition function
> r_i(s)           Reward for player i at s, i ∈ {1, 2}
Zero-sum game: r(s) = r_1(s) = −r_2(s), and r(s) = 0 if s ≠ s_T
> z_t = ±r(s_T)    Terminal reward at s_T, z_t ∈ {−1, 1}
Policy
> p(a|s)   Probability distribution over legal actions
Value Function
> v^p(s) = E[z_t | s_t = s, a_{t,…,T} ~ p]
Unique Optimal Value Function
v*(s) = z_T if s = s_T,  otherwise v*(s) = max_a −v*(f(s, a))
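The recursion defining v* is plain negamax; a minimal sketch, instantiated on the same toy Nim game used above rather than on Go (in Go the game tree is far too large to enumerate, which is the whole motivation for AlphaGo):

```python
def v_star(s, is_terminal, z, actions, f):
    """Optimal (negamax) value of state s for the player to move:
    v*(s) = z(s) if s is terminal, else max_a -v*(f(s, a))."""
    if is_terminal(s):
        return z(s)
    return max(-v_star(f(s, a), is_terminal, z, actions, f) for a in actions(s))

# Nim: take 1 or 2 counters, taking the last one wins, so the terminal
# reward is -1 for the player who is now to move (the opponent just won).
print(v_star(5, lambda s: s == 0, lambda s: -1,
             lambda s: [m for m in (1, 2) if m <= s],
             lambda s, a: s - a))   # prints 1: the player to move can force a win
```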
Design | Rollout Policy
p_π(a|s)
> A fast, linear softmax policy for simulation
> Pattern-based feature inputs
> Trained using 8 million positions
> Less domain knowledge implemented compared to existing MCTS Go programs
> 24.2% prediction accuracy
> A similar policy p_τ(a|s) is used for tree expansion
Maximize: ∆π ∝ ∂ log p_π(a|s) / ∂π
Design | Neural Network Architectures
Input
19 x 19 intersections x 48 feature planes (+1 extra plane for the value network)
Input Feature Space (with respect to current player)
> Stone Colour
> Ones & Zeros
> Turns Since
> Liberties
> Capture Size
> Self-atari Size
> Liberties after Move
> Ladder Capture
> Ladder Escape
> Sensibleness
Extra Feature for Value Network
> Player Colour
Design | Neural Network Architectures
First Convolution Layer
Zero-Padding
(19+4) x (19+4) input
k Filters
Kernel size 5 x 5 with stride 1 convolution; k = 192 at match time
ReLU
f(x) = max(0, x)
Design | Neural Network Architectures
19 x 19 Output
(PreviousDim + Padding Size − Kernel Size) / Stride + 1 = (19 + 4 − 5) / 1 + 1 = 19
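A one-line helper to check this arithmetic; padding is given as the total added to the dimension, i.e., 2 cells per side for the first layer:

```python
def conv_output_dim(prev_dim: int, padding: int, kernel: int, stride: int) -> int:
    """Spatial output size of a convolution, per the slide's formula."""
    return (prev_dim + padding - kernel) // stride + 1

# First layer: 19x19 board, 4 cells of zero-padding in total, 5x5 kernel,
# stride 1 -> output stays 19x19. Later layers: padding 2, 3x3 kernel.
assert conv_output_dim(19, 4, 5, 1) == 19
assert conv_output_dim(19, 2, 3, 1) == 19
```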
Design | Neural Network Architectures
Convolution Layers 2–12 (x11)
Zero-Padding
(19+2) x (19+2) input
k Filters
Kernel size 3 x 3 with stride 1 convolution
ReLU
f(x) = max(0, x)
Design | Neural Network Architectures
Output Layers (after the convolution layers)
Policy
> 1-Stride Convolution: 1 kernel of size 1 x 1 with a different bias for each intersection
> Softmax Function: outputs p(a|s) for each of the 19 x 19 intersections
Value
> 1-Stride Convolution: 1 kernel of size 1 x 1
> 256 Rectifiers: fully-connected layer
> Tanh Function: fully-connected layer; outputs a single v_θ(s) ∈ [−1, 1]
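A shape-level NumPy sketch of the two heads, with random placeholder weights standing in for the trained parameters (the real heads sit on top of the 192-filter feature stack; a single 19 x 19 map is used here for brevity):

```python
import numpy as np

rng = np.random.default_rng(0)
features = rng.normal(size=(19, 19))   # stand-in for the last feature map

# Policy head: per-intersection score (1x1 conv + per-point bias),
# then softmax over all 361 candidate moves
logits = (0.5 * features + rng.normal(size=(19, 19))).ravel()
p = np.exp(logits - logits.max()); p /= p.sum()        # p(a|s), sums to 1

# Value head: 1x1 conv -> 256-unit fully-connected ReLU -> tanh scalar
h = np.maximum(0.0, rng.normal(size=(256, 361)) @ features.ravel())
v = np.tanh(rng.normal(size=256) @ h)                  # v_theta(s) in (-1, 1)
print(p.shape, float(v))
```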
Design | Supervised Learning Policy Network
p_σ(a|s)
> Trained using mini-batches of 16 positions randomly selected from 28.4 million positions
> Trained on 50 GPUs over 3 weeks
> Tested with 1 million positions
> 57.0% prediction accuracy
Maximize: ∆σ ∝ ∂ log p_σ(a|s) / ∂σ
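The update form shared by the rollout and SL policies is gradient ascent on the log-likelihood of the human move; a sketch for a linear softmax stand-in (the real p_σ is the convolutional network above, and the step size here is arbitrary):

```python
import numpy as np

def sl_policy_step(W, phi, a, eta=0.003):
    """One ascent step on log p_sigma(a|s) for a linear softmax policy
    p_sigma(a|s) = softmax(W @ phi(s))."""
    logits = W @ phi
    p = np.exp(logits - logits.max()); p /= p.sum()
    grad = -np.outer(p, phi)   # d log p(a|s) / dW has -p_j * phi in every row...
    grad[a] += phi             # ...plus phi in the row of the played move a
    return W + eta * grad      # gradient ascent: maximize the log-likelihood

# Toy usage: 361 possible moves, a hypothetical 100-dim feature vector
rng = np.random.default_rng(0)
W = rng.normal(0, 0.01, (361, 100))
W = sl_policy_step(W, rng.normal(size=100), a=42)
```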
Design | Reinforcement Learning Policy Network
p_ρ(a|s)
> Trained using self-play between the current network and a randomly selected previous iteration, starting from p_σ(a|s)
> Trained over 10,000 mini-batches of 128 games
> Evaluated through game play a ~ p_ρ(·|s) without search
> Won 80% against p_σ(a|s)
> Won 85% against the strongest open-source Go program
Maximize: ∆ρ ∝ (∂ log p_ρ(a_t|s_t) / ∂ρ) z_t
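The RL update is the same log-likelihood gradient scaled by the game outcome z_t (a REINFORCE-style update), so moves from won games are reinforced and moves from lost games suppressed; sketched here for the same linear stand-in:

```python
import numpy as np

def rl_policy_step(W, trajectory, z_t, eta=0.003):
    """delta_rho ∝ z_t * d log p_rho(a_t|s_t) / d rho, applied over one
    self-play game; trajectory is a list of (features, move) pairs."""
    for phi, a in trajectory:
        logits = W @ phi
        p = np.exp(logits - logits.max()); p /= p.sum()
        grad = -np.outer(p, phi)
        grad[a] += phi
        W = W + eta * z_t * grad   # scale the log-likelihood gradient by z_t
    return W

rng = np.random.default_rng(0)
W = rng.normal(0, 0.01, (361, 100))
game = [(rng.normal(size=100), int(rng.integers(361))) for _ in range(3)]
W = rl_policy_step(W, game, z_t=+1.0)   # the self-play game was won
```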
Design | Value Network
v_θ(s)
> Trained using 30 million distinct positions, each drawn from a separate game generated by a random mix of p_σ(a|s) and p_ρ(a|s), to prevent overfitting
> Consistently more accurate than p_π(a|s)
> Approaches the accuracy of Monte Carlo rollouts using p_ρ(a|s) with less computation
Minimize squared error: ∆θ ∝ (∂v_θ(s)/∂θ) (z − v_θ(s))
v_θ(s) ≈ v^{p_ρ}(s) ≈ v*(s)
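The value update is a regression step moving v_θ(s) toward the observed outcome z, i.e., SGD on the squared error (z − v_θ(s))²; a sketch with a linear-tanh stand-in for the convolutional v_θ:

```python
import numpy as np

def value_step(theta, phi, z, eta=0.003):
    """delta_theta ∝ (dv_theta/dtheta) * (z - v_theta), for the stand-in
    model v_theta(s) = tanh(theta . phi(s))."""
    v = np.tanh(theta @ phi)
    dv_dtheta = (1 - v**2) * phi               # tanh derivative times features
    return theta + eta * dv_dtheta * (z - v)   # step toward the outcome z

rng = np.random.default_rng(0)
theta = rng.normal(0, 0.01, 100)
theta = value_step(theta, rng.normal(size=100), z=+1.0)
```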
Design | Search Algorithm
*Images captured from Silver D. et al. (2016)
Each edge (s, a) of the search tree stores:
Q(s, a) = action value
N(s, a) = visit count
P(s, a) = prior probability
Selection
a_t = argmax_a [Q(s_t, a) + u(s_t, a)]
u(s_t, a) ∝ P(s, a) / (1 + N(s, a)) encourages exploration
Stop at t = L, a predefined time step
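A per-node sketch of this selection rule; the proportionality constant c_puct is an assumed scaling factor, not specified on the slide:

```python
import numpy as np

def select_action(Q, N, P, c_puct=5.0):
    """a_t = argmax_a [Q(s,a) + u(s,a)], with the exploration bonus
    u(s,a) ∝ P(s,a) / (1 + N(s,a)). Q, N, P are per-action arrays."""
    u = c_puct * P / (1.0 + N)
    return int(np.argmax(Q + u))

# Toy node with three legal moves: the unvisited move with a strong prior
# is explored despite its zero action value so far.
print(select_action(Q=np.array([0.5, 0.0, 0.1]),
                    N=np.array([10, 0, 2]),
                    P=np.array([0.2, 0.7, 0.1])))   # -> 1
```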
Expansion
Expand using a ~ p_τ if N(s, a) > n_thr, a dynamic threshold
Set P(s_L, a) = p_σ(a|s_L)
Evaluation
V(s_L) = (1 − λ) v_θ(s_L) + λ z_L
where z_L is the outcome of a rollout from s_L to s_T using p_π(a|s), and λ = mixing parameter
Backup
N(s, a) = Σ_{i=1}^{n} 1(s, a, i)
Q(s, a) = (1 / N(s, a)) Σ_{i=1}^{n} 1(s, a, i) V(s_L^i)
where 1(s, a, i) indicates whether simulation i traversed edge (s, a), over n asynchronous simulations
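A sketch of the preceding Evaluation and Backup steps together; λ = 0.5, the mixing value reported in Silver et al. (2016), is used as the default:

```python
def leaf_value(v_theta, z_rollout, lam=0.5):
    """Evaluation: V(s_L) = (1 - lambda) * v_theta(s_L) + lambda * z_L."""
    return (1.0 - lam) * v_theta + lam * z_rollout

def backup(edge_values):
    """Backup for one edge: N(s,a) counts the simulations that traversed
    it, Q(s,a) is the mean of their leaf evaluations V(s_L^i)."""
    N = len(edge_values)
    Q = sum(edge_values) / N
    return N, Q

# Three simulations through the same edge: (value-net estimate, rollout outcome)
evals = [leaf_value(v, z) for v, z in [(0.3, 1.0), (0.1, -1.0), (0.4, 1.0)]]
print(backup(evals))   # -> (3, 0.3): visit count and mean action value
```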
Select Move
a′ = argmax_a N(s, a)
Discussion | Performance
Against AI Players
> Played against the strongest commercial and open-source Go programs, all based on MCTS
> Single-machine AlphaGo won 494 out of 495 even games
> The distributed version won 77% of games against the single-machine version and 100% against the other programs
Discussion | Performance
Against Fan Hui
> Won 5-0 in formal games with 1 hour of main time + three 30s byoyomi¹ periods
> Won 3-2 in informal games with three 30s byoyomi¹ periods only
1. Time slots consumed after exhausting the main time; a period resets in full if not exceeded in a single turn
*Image captured from Silver D. et al. (2016)
Discussion | Performance
Against Sedol Lee
> Won 4-1 in formal games with 2 hours of main time + three 60s byoyomi periods
> Game 4 – the only loss – is being analyzed
> MCTS may have overlooked Lee’s game-changing move – the only move that could save the game at that state
Game 4
Sedol Lee (White), AlphaGo (Black)
Sedol Lee wins by resignation
*Image captured from https://gogameguru.com/lee-sedol-defeats-alphago-masterful-comeback-game-4/
Discussion | Future Work
Next Potential Matches
> Imperfect information games (e.g., Poker, StarCraft)
> AlphaGo based on pure learning
> Testbed for future algorithmic research
Application Areas
> Gaming
> Healthcare
> Smartphone Assistant
Healthcare Applications
> Medical diagnosis from images
> Longitudinal tracking of vital signs to help people have healthier lifestyles
“it’d be cool if one day an AI was involved in finding a new particle”
Demis Hassabis, Co-founder of DeepMind
References
Browne, C. B., Powley, E., Whitehouse, D., Lucas, S. M., Cowling, P. I., Rohlfshagen, P., . . . Colton, S. (2012). A survey of Monte Carlo tree search methods. IEEE Transactions on Computational Intelligence and AI in Games, 4(1), 1-43.
Byford, S. (2016, March 10). DeepMind founder Demis Hassabis on how AI will shape the future. The Verge. Retrieved April 02, 2016, from http://www.theverge.com/2016/3/10/11192774/demis-hassabis-interview-alphago-google-deepmind-ai
Google Inc. (2016). AlphaGo | Google DeepMind. Retrieved April 02, 2016, from https://deepmind.com/alpha-go.html
LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436-444.
Ormerod, D. (2016, March 13). Lee Sedol defeats AlphaGo in masterful comeback - Game 4. Retrieved April 06, 2016, from https://gogameguru.com/lee-sedol-defeats-alphago-masterful-comeback-game-4/
Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., Driessche, G. V., . . . Hassabis, D. (2016). Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587), 484-489.
Speaker Notes
1. Pachi runs 100,000 simulations per move.
2. AlphaGo seems to be able to manage risk more precisely than humans can, and is completely happy to accept losses as long as its probability of winning remains favorable.