1. AlphaGo Analysis from Deep Learning Perspective
Chayan Chakrabarti
July 11, 2016
Pleasanton, CA
2. Mastering the game of Go
• DeepMind problem domain
• Deep learning and reinforcement learning concepts
• Design of AlphaGo
• Execution
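3. Go: perfect information game
Possible sequences of moves ≈ 250^150 (breadth ≈ 250 legal moves per position, depth ≈ 150 moves per game)
This is far more than the number of atoms in the observable universe (about 10^80).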
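The size comparison is easiest to check in log space. This small sketch takes the breadth and depth figures from Silver et al. (2016). It also uses the commonly quoted estimate of 10^80 atoms, which is an assumption rather than something taken from the slide.

```python
import math

# Assumed inputs, taken from Silver et al. (2016): Go has breadth b ~ 250
# legal moves per position and depth d ~ 150 moves per game.
BREADTH = 250
DEPTH = 150
# Commonly quoted order of magnitude for atoms in the observable universe.
ATOMS_LOG10 = 80


def log10_tree_size(breadth: int, depth: int) -> float:
    """Return log10(breadth ** depth) without building the huge integer."""
    return depth * math.log10(breadth)


if __name__ == "__main__":
    exponent = log10_tree_size(BREADTH, DEPTH)
    print(f"{BREADTH}^{DEPTH} is about 10^{exponent:.1f}; atoms: about 10^{ATOMS_LOG10}")
    # Python integers are exact, so the digit count can be checked directly.
    print(len(str(BREADTH ** DEPTH)), "digits")
```

250^150 is roughly 10^360, so exhaustive search is hopeless. That is the motivation for the next slide.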
4. Reduce search space
• Reduce breadth
– Not all moves are equally likely
– Some moves are better
– Leverage moves made by expert players
• Reduce depth
– Evaluate strength of board (likelihood of winning)
– Collapse symmetrical or similar boards
– Simulate the games
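The two reductions can be illustrated on a toy game. In this minimal sketch, a policy prior limits which moves are expanded (breadth), and an evaluation function stops the search early (depth). The game is a simple take-1-2-or-3-stones game, and `policy_prior` and `value_estimate` are hypothetical hand-written stand-ins for the policy and value networks. None of this is AlphaGo's code.

```python
from typing import List

# Toy stand-in for Go (an assumption for illustration only): a pile of stones,
# each turn a player removes 1, 2 or 3; whoever faces an empty pile has lost.


def legal_moves(stones: int) -> List[int]:
    return [m for m in (1, 2, 3) if m <= stones]


def policy_prior(stones: int, move: int) -> float:
    """Hypothetical stand-in for a policy network: favour moves leaving a multiple of 4."""
    return 1.0 if (stones - move) % 4 == 0 else 0.0


def value_estimate(stones: int) -> float:
    """Hypothetical stand-in for a value network, from the side to move: +1 win, -1 loss."""
    return -1.0 if stones % 4 == 0 else 1.0


def truncated_negamax(stones: int, depth: int, top_k: int) -> float:
    """Negamax search with breadth cut to top_k moves and depth cut by value_estimate."""
    moves = legal_moves(stones)
    if depth == 0 or not moves:
        # Reduce depth: stop searching and trust the evaluation of this position.
        return value_estimate(stones)
    # Reduce breadth: expand only the top_k moves ranked by the policy prior.
    ranked = sorted(moves, key=lambda m: policy_prior(stones, m), reverse=True)[:top_k]
    return max(-truncated_negamax(stones - m, depth - 1, top_k) for m in ranked)


if __name__ == "__main__":
    # One candidate move per position, two plies deep, instead of a full game tree.
    print(truncated_negamax(10, depth=2, top_k=1))  # 1.0: the side to move wins
```

With `top_k=1` and `depth=2` the search visits only a handful of nodes. Because the stand-in prior and evaluation happen to be perfect for this toy game, it still finds the correct result. Real networks are only approximately right, so AlphaGo combines them with tree search rather than trusting them outright.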
17. Two kinds of policies
● used a large database of online expert games
● learned two versions of the neural network
○ a fast network P_π for use in evaluation (rollouts)
○ an accurate network P_σ for use in selection
Step 1: learn to predict human moves
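"Learn to predict human moves" is supervised classification. The model outputs p(a|s) over moves and is trained with cross-entropy on the move the expert actually played. The sketch below assumes a linear softmax model and synthetic "board features" in place of the deep convolutional network and the real expert game database, so it shows only the training objective.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def train_move_predictor(features, expert_moves, n_moves, lr=1.0, epochs=500):
    """Fit a linear softmax policy p(a|s) to expert moves by gradient descent on cross-entropy."""
    n_samples, n_features = features.shape
    weights = np.zeros((n_features, n_moves))
    targets = np.eye(n_moves)[expert_moves]  # one-hot encoding of the expert's move
    for _ in range(epochs):
        probs = softmax(features @ weights)
        grad = features.T @ (probs - targets) / n_samples  # gradient of mean cross-entropy
        weights -= lr * grad
    return weights


def cross_entropy(features, expert_moves, weights) -> float:
    probs = softmax(features @ weights)
    return float(-np.log(probs[np.arange(len(expert_moves)), expert_moves]).mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    boards = rng.normal(size=(200, 8))          # synthetic board features (assumption)
    hidden = rng.normal(size=(8, 5))            # synthetic "expert" preferences
    moves = (boards @ hidden).argmax(axis=1)    # the move each "expert" chose
    w = train_move_predictor(boards, moves, n_moves=5)
    acc = (softmax(boards @ w).argmax(axis=1) == moves).mean()
    print(f"loss {cross_entropy(boards, moves, w):.3f}, move-prediction accuracy {acc:.2f}")
```

The fast P_π and accurate P_σ differ mainly in model size and input features, not in this objective.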
19. Reduce depth by board evaluation
[Figure: Board Position → Updated Model (ver 1,000,000) → Value Prediction Model (Regression); training target: Win / Loss (0~1)]
• Adds a regression layer to the model
• Predicts values between 0 and 1
– Close to 1: a good board position
– Close to 0: a bad board position
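A regression layer that predicts a value between 0 and 1 can be sketched as a sigmoid output trained with squared error against win (1) / loss (0) labels. In this sketch, `features` is a synthetic stand-in for what the reused "updated model" computes from a board. The sigmoid range follows the slide's 0~1 convention (the paper's value network uses a tanh output in [-1, 1] instead). This is a simplified illustration, not the actual network.

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def train_value_head(features: np.ndarray, outcomes: np.ndarray,
                     lr: float = 0.5, epochs: int = 500):
    """Fit v(s) = sigmoid(features @ w + b) to win (1) / loss (0) labels.

    `features` stands in for the activations the reused model produces for
    each board position; only this regression layer is trained here.
    Loss: squared error between v(s) and the game outcome.
    """
    n_samples, n_features = features.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(epochs):
        v = sigmoid(features @ w + b)
        # Gradient of 0.5 * (v - y)^2 with respect to the pre-sigmoid input.
        delta = (v - outcomes) * v * (1.0 - v)
        w -= lr * features.T @ delta / n_samples
        b -= lr * delta.mean()
    return w, b


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    positions = rng.normal(size=(300, 6))                      # synthetic features
    wins = (positions @ rng.normal(size=6) > 0).astype(float)  # synthetic outcomes
    w, b = train_value_head(positions, wins)
    values = sigmoid(positions @ w + b)
    print("mean value on won positions:", values[wins == 1].mean())
    print("mean value on lost positions:", values[wins == 0].mean())
```

After training, positions that led to wins score closer to 1 and positions that led to losses score closer to 0, matching the slide's reading of the output.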
20. Value follows from policy
Step 3: learn a board evaluation network, V
● use random samples from the self-play database
● prediction target: probability that black wins from a given board
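Building the training set for V can be sketched as drawing random positions from self-play games and labelling each with the final result. This sketch draws one position per game. The paper does the same so that near-identical consecutive positions from one game do not dominate the data. The `SelfPlayGame` record and its string board encoding are hypothetical. With 0/1 labels and a squared-error or cross-entropy loss, the fitted output estimates the probability that black wins.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class SelfPlayGame:
    positions: List[str]  # hypothetical board encoding, one entry per move played
    black_won: bool       # final result of the self-play game


def sample_value_training_set(games: Sequence[SelfPlayGame],
                              seed: int = 0) -> List[Tuple[str, float]]:
    """Draw one random position per game; target is 1.0 if black won, else 0.0."""
    rng = random.Random(seed)
    dataset = []
    for game in games:
        # One sample per game keeps strongly correlated positions out of the set.
        position = rng.choice(game.positions)
        dataset.append((position, 1.0 if game.black_won else 0.0))
    return dataset


if __name__ == "__main__":
    games = [SelfPlayGame(["g1-m1", "g1-m2", "g1-m3"], black_won=True),
             SelfPlayGame(["g2-m1", "g2-m2"], black_won=False)]
    for position, target in sample_value_training_set(games):
        print(position, target)
```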
21. Putting it all together
Looking ahead (w/ Monte Carlo Tree Search)
• Action candidates reduction (Policy Network)
• Board evaluation (Value Network)
• Rollout: faster version of estimating p(a|s)
– uses a shallow network: 3 ms → 2 µs per move
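The pieces meet inside the tree search. In this sketch, each edge stores a policy-network prior P(s,a), a visit count N and a value total W. Selection maximizes Q + u, where the bonus u grows with the prior and shrinks with visits, in the PUCT form described in Silver et al. A leaf is scored by blending the value network's estimate v with a fast rollout's outcome z. The constant `c_puct=5.0`, the mixing weight 0.5 and the single-perspective value convention are assumptions. A full two-player search flips the value's sign at alternating levels.

```python
import math
from dataclasses import dataclass
from typing import Dict


@dataclass
class EdgeStats:
    prior: float              # P(s, a) from the policy network
    visits: int = 0           # N(s, a)
    total_value: float = 0.0  # W(s, a): sum of leaf values backed up through this edge

    @property
    def q(self) -> float:
        return self.total_value / self.visits if self.visits else 0.0


def select_action(edges: Dict[str, EdgeStats], c_puct: float = 5.0) -> str:
    """Pick argmax of Q(s,a) + u(s,a), u = c_puct * P * sqrt(sum N) / (1 + N)."""
    total_visits = sum(e.visits for e in edges.values())

    def score(edge: EdgeStats) -> float:
        # With no visits at all, every u is 0 and max() falls back to dict order.
        u = c_puct * edge.prior * math.sqrt(total_visits) / (1 + edge.visits)
        return edge.q + u

    return max(edges, key=lambda action: score(edges[action]))


def leaf_value(value_net_estimate: float, rollout_outcome: float,
               mixing: float = 0.5) -> float:
    """Blend the value network and the fast rollout: (1 - mixing) * v + mixing * z."""
    return (1.0 - mixing) * value_net_estimate + mixing * rollout_outcome


def backup(edge: EdgeStats, value: float) -> None:
    """Record one simulation's leaf value on an edge along the visited path."""
    edge.visits += 1
    edge.total_value += value


if __name__ == "__main__":
    edges = {"a": EdgeStats(prior=0.7), "b": EdgeStats(prior=0.3)}
    for _ in range(3):
        backup(edges["a"], leaf_value(value_net_estimate=0.0, rollout_outcome=0.0))
    print(select_action(edges))  # "b": the prior bonus of "a" has shrunk with its visits
```

The fast rollout policy matters here because `rollout_outcome` needs a full simulated game per leaf. At 2 µs per move instead of 3 ms, many more simulations fit in the time budget.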
27. Apply trained networks to tasks with different loss functions
Takeaways
Use the networks trained for a certain task for several other tasks, even when those tasks have different loss objectives.
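One way to picture this takeaway is a model whose trained layers (the "trunk") are copied into a new model. The new model gets a fresh output head and is trained under a different loss. In this minimal sketch, the two-layer model, the dictionary layout and the choice to freeze the trunk are all hypothetical illustrations, not AlphaGo's exact procedure. The move head would be trained with cross-entropy; the new value head uses squared error.

```python
import numpy as np


def relu(x: np.ndarray) -> np.ndarray:
    return np.maximum(x, 0.0)


def make_policy_model(rng, n_features: int, n_hidden: int, n_moves: int) -> dict:
    """Hypothetical two-layer policy model: shared trunk plus a softmax move head."""
    return {"trunk": rng.normal(scale=0.1, size=(n_features, n_hidden)),
            "move_head": rng.normal(scale=0.1, size=(n_hidden, n_moves))}


def value_model_from_policy(policy_model: dict, rng) -> dict:
    """Reuse the (already trained) trunk; replace the move head with a scalar regression head."""
    n_hidden = policy_model["trunk"].shape[1]
    return {"trunk": policy_model["trunk"].copy(),
            "value_head": rng.normal(scale=0.1, size=n_hidden)}


def predict_value(model: dict, features: np.ndarray) -> np.ndarray:
    hidden = relu(features @ model["trunk"])
    return 1.0 / (1.0 + np.exp(-(hidden @ model["value_head"])))  # values in (0, 1)


def fine_tune_value_head(model: dict, features, outcomes, lr=0.5, epochs=200) -> dict:
    """Train only the new head with squared error; the copied trunk stays frozen."""
    hidden = relu(features @ model["trunk"])
    for _ in range(epochs):
        v = 1.0 / (1.0 + np.exp(-(hidden @ model["value_head"])))
        delta = (v - outcomes) * v * (1.0 - v)  # gradient of 0.5 * (v - y)^2 w.r.t. logit
        model["value_head"] -= lr * hidden.T @ delta / len(outcomes)
    return model


if __name__ == "__main__":
    rng = np.random.default_rng(2)
    policy = make_policy_model(rng, n_features=6, n_hidden=16, n_moves=4)
    value = value_model_from_policy(policy, rng)
    boards = rng.normal(size=(50, 6))            # synthetic board features
    wins = (boards[:, 0] > 0).astype(float)      # synthetic outcomes
    fine_tune_value_head(value, boards, wins)
    print(predict_value(value, boards[:3]))
```

Freezing the trunk is the simplest variant. Fine-tuning all layers under the new loss is the other common choice.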
28. Single most important takeaway
• Feature abstraction is the key component of any machine learning algorithm
• Convolutional neural networks are great at automated feature abstraction
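To make "feature abstraction" concrete on a Go board, the sketch below applies one 3x3 convolution to an "empty points" plane. The hand-set kernel counts each point's empty orthogonal neighbours, which equals the liberty count for an isolated stone. A CNN learns kernels like this from data instead of having them written by hand. The board encoding and the kernel are illustrative assumptions.

```python
import numpy as np


def conv2d_same(plane: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """'Same'-size 2D cross-correlation (what CNN layers compute), zero-padded at the edge."""
    kh, kw = kernel.shape
    padded = np.pad(plane, ((kh // 2, kh // 2), (kw // 2, kw // 2)))  # pads with 0
    out = np.zeros(plane.shape, dtype=float)
    for i in range(plane.shape[0]):
        for j in range(plane.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out


# Hand-set kernel: counts the four orthogonal neighbours of each point.
NEIGHBOURS = np.array([[0, 1, 0],
                       [1, 0, 1],
                       [0, 1, 0]], dtype=float)

if __name__ == "__main__":
    board = np.zeros((19, 19), dtype=int)  # hypothetical encoding: 0 empty, 1 black, 2 white
    board[3, 3] = 1
    board[3, 4] = 2
    empty_plane = (board == 0).astype(float)
    empty_neighbours = conv2d_same(empty_plane, NEIGHBOURS)
    print(empty_neighbours[3, 3])  # 3.0: the black stone has three empty neighbours
```

Stacking many learned layers lets a network build up more abstract features from such local ones, such as group safety or territory, without hand engineering.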
29. Reference
Silver et al. Mastering the Game of Go with Deep Neural Networks and Tree Search. Nature 529, 484–489. January 2016.
30. About the speaker
Chayan Chakrabarti
https://www.linkedin.com/in/chayanchakrabarti