Presenter: Shane (Seungwhan) Moon
PhD student
Language Technologies Institute, School of Computer Science
Carnegie Mellon University
3/2/2016
How it works
AlphaGo vs. European Champion (Fan Hui, 2-dan)
October 5 – 9, 2015
<Official match>
- Time limit: 1 hour
- AlphaGo wins (5:0)
AlphaGo vs. World Champion (Lee Sedol, 9-dan)
March 9 – 15, 2016
<Official match>
- Time limit: 2 hours
- Venue: Four Seasons Hotel, Seoul
Image source: Josun Times, Jan 28th, 2015
Lee Sedol
Photo source: Maeil Economics, 2013/04; wiki
Computer Go AI?
Computer Go AI – Definition
s (state)
d = 1
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 1 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
(e.g. we can represent the board in a matrix-like form)
* The actual model uses other features besides board positions as well
Computer Go AI – Definition
s (state), a (action)
d = 1, d = 2
Given s, pick the best a
Computer Go Artificial Intelligence: s → a → s'
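The definition above, given a state s pick the best action a, can be sketched in code. This is a toy representation only; the `score` function here is a hypothetical placeholder for the real evaluation:

```python
import random

SIZE = 9  # a small 9x9 board for illustration; the real game is 19x19

def empty_board():
    # state s: 0 = empty, 1 = black stone, -1 = white stone
    return [[0] * SIZE for _ in range(SIZE)]

def legal_actions(s):
    # an action a is a (row, col) where a stone may be placed
    return [(r, c) for r in range(SIZE) for c in range(SIZE) if s[r][c] == 0]

def play(s, a, color):
    # s, a -> s': the next state after placing a stone
    r, c = a
    s2 = [row[:] for row in s]
    s2[r][c] = color
    return s2

# "Given s, pick the best a" -- here with a placeholder random scorer
def pick_action(s, score=lambda s, a: random.random()):
    return max(legal_actions(s), key=lambda a: score(s, a))

s = empty_board()
a = pick_action(s)
s2 = play(s, a, 1)
```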
Computer Go AI – An Implementation Idea?
How about simulating all possible board positions?
d = 1, d = 2, d = 3, …, d = maxD
Process the simulation until the game ends, then report win / lose results
e.g. it wins 13 times if the next stone gets placed here (other positions: 37,839 times; 431,320 times)
Choose the "next action / stone" that has the most win-counts in the full-scale simulation
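The full-scale simulation idea can be made concrete on a game small enough to exhaust. Tic-tac-toe stands in for Go here; enumerating every continuation like this is exactly what is intractable on a 19×19 board:

```python
# Exhaustively simulate every continuation of 3x3 tic-tac-toe (a toy
# stand-in for Go) and count, for each opening move, how many complete
# games the first player eventually wins.
LINES = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]

def winner(b):
    for i, j, k in LINES:
        if b[i] != 0 and b[i] == b[j] == b[k]:
            return b[i]
    return 0

def count_wins(b, player, root):
    # number of finished games below this node won by `root`
    w = winner(b)
    if w != 0:
        return 1 if w == root else 0
    moves = [i for i in range(9) if b[i] == 0]
    if not moves:
        return 0  # draw
    total = 0
    for m in moves:
        b[m] = player
        total += count_wins(b, -player, root)
        b[m] = 0
    return total

board = [0] * 9
# win-counts for X if the first stone is placed at each of the 9 cells
wins = []
for m in range(9):
    board[m] = 1
    wins.append(count_wins(board, -1, 1))
    board[m] = 0
best = wins.index(max(wins))  # the opening with the most win-counts
```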
This is NOT possible; it is said that the possible configurations of the board exceed the number of atoms in the universe
Key: To Reduce Search Space
Reducing Search Space
1. Reducing "action candidates" (Breadth Reduction)
d = 1, d = 2, d = 3, …, d = maxD → Win? Loss?
IF there is a model that can tell you that these moves are not common / probable (e.g. by experts, etc.) …
Reducing Search Space
1. Reducing "action candidates" (Breadth Reduction)
Remove these from the search candidates in advance (breadth reduction)
Reducing Search Space
2. Position evaluation ahead of time (Depth Reduction)
Instead of simulating until the maximum depth …
Reducing	
  Search	
  Space
2.	
  Position	
  evaluation	
  ahead	
  of	
  time	
  (Depth	
  Reduction)
d	
  =	
  1 d	
  =	
  2
…
d	
  =	
  3
…
V	
  =	
  1
V	
  =	
  2
V	
  =	
  10
IF	
  there	
  is	
  a	
  function	
  that	
  can	
  measure:
V(s):	
  “board	
  evaluation	
  of	
  state	
  s”
Reducing Search Space
1. Reducing "action candidates" (Breadth Reduction)
2. Position evaluation ahead of time (Depth Reduction)
1. Reducing "action candidates"
Learning: P( next action | current state ) = P( a | s )
1. Reducing "action candidates"
(1) Imitating expert moves (supervised learning)
Current State → Prediction Model → Next State
s1 → s2, s2 → s3, s3 → s4
Data: online Go experts (5~9 dan), 160K games, 30M board positions
1. Reducing "action candidates"
(1) Imitating expert moves (supervised learning)
Prediction Model: Current Board → Next Board
Prediction Model: Current Board → Next Action
There are 19 × 19 = 361 possible actions (with different probabilities)
1. Reducing "action candidates"
(1) Imitating expert moves (supervised learning)
f: s → a
Current Board (s):
0  0  0  0  0  0  0  0  0
0  0  0  0  0  1  0  0  0
0 -1  0  0  1 -1  1  0  0
0  1  0  0  1 -1  0  0  0
0  0  0  0 -1  0  0  0  0
0  0  0  0  0  0  0  0  0
0 -1  0  0  0  0  0  0  0
0  0  0  0  0  0  0  0  0
Next Action (a):
0  0  0  0  0  0  0  0  0
0  0  0  0  0  0  0  0  0
0  0  0  0  0  0  0  0  0
0  0  0  0  0  0  0  0  0
0  0  0  0  0  1  0  0  0
0  0  0  0  0  0  0  0  0
0  0  0  0  0  0  0  0  0
0  0  0  0  0  0  0  0  0
1. Reducing "action candidates"
(1) Imitating expert moves (supervised learning)
g: s → p(a|s);  a = argmax p(a|s)
Current Board (s):
0  0  0  0  0  0  0  0  0
0  0  0  0  0  1  0  0  0
0 -1  0  0  1 -1  1  0  0
0  1  0  0  1 -1  0  0  0
0  0  0  0 -1  0  0  0  0
0  0  0  0  0  0  0  0  0
0 -1  0  0  0  0  0  0  0
0  0  0  0  0  0  0  0  0
p(a|s):
0  0  0  0  0  0    0    0  0
0  0  0  0  0  0    0    0  0
0  0  0  0  0  0    0    0  0
0  0  0  0  0  0.2  0.1  0  0
0  0  0  0  0  0.4  0.2  0  0
0  0  0  0  0  0.1  0    0  0
0  0  0  0  0  0    0    0  0
0  0  0  0  0  0    0    0  0
→ Next Action (a)
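The argmax step can be sketched with the slide's toy numbers (an 8×9 excerpt; the real output is a distribution over all 361 points):

```python
# toy p(a|s) grid from the slide; only a few entries are non-zero
p = [[0.0] * 9 for _ in range(8)]
p[3][5], p[3][6] = 0.2, 0.1
p[4][5], p[4][6] = 0.4, 0.2
p[5][5] = 0.1

def argmax_action(p):
    # a = argmax p(a|s): the point with the highest predicted probability
    return max(((r, c) for r in range(len(p)) for c in range(len(p[0]))),
               key=lambda a: p[a[0]][a[1]])

best = argmax_action(p)  # the 0.4 entry at row 4, col 5
```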
1. Reducing "action candidates"
(1) Imitating expert moves (supervised learning)
Deep Learning (13-layer CNN): g: s → p(a|s), then argmax
Current Board → Next Action
Convolutional Neural Network (CNN)
CNN is a powerful model for image recognition tasks; it abstracts the input image through convolution layers
Convolutional Neural Network (CNN)
AlphaGo uses a CNN of similar architecture to evaluate board positions; it learns "some" spatial invariance
Go: abstraction is the key to winning
CNN: abstraction is its forte
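A minimal sketch of what a single convolution layer does to a board encoding. This uses one hand-written 3×3 filter; the real model learns many filters across its 13 layers:

```python
def conv2d(board, kernel):
    # valid 2D convolution: slide the kernel over the board and take a
    # weighted sum at each position, producing one "feature map"
    n, k = len(board), len(kernel)
    out = []
    for r in range(n - k + 1):
        row = []
        for c in range(n - k + 1):
            row.append(sum(board[r + i][c + j] * kernel[i][j]
                           for i in range(k) for j in range(k)))
        out.append(row)
    return out

# 9x9 board: 1 = black, -1 = white, 0 = empty
board = [[0] * 9 for _ in range(9)]
board[2][6] = 1
board[2][5] = board[3][5] = -1

# a hypothetical "black minus white stones in a 3x3 neighborhood" filter
kernel = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
feature_map = conv2d(board, kernel)  # 7x7 output
```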
1. Reducing "action candidates"
(1) Imitating expert moves (supervised learning)
Training: Current Board → Expert Moves Imitator Model (w/ CNN) → Next Action
1. Reducing "action candidates"
(2) Improving through self-plays (reinforcement learning)
Expert Moves Imitator Model (w/ CNN) VS Expert Moves Imitator Model (w/ CNN)
Improving by playing against itself
Return: board positions, win/lose info
1. Reducing "action candidates"
(2) Improving through self-plays (reinforcement learning)
Training: Expert Moves Imitator Model (w/ CNN), board position → win/loss
Loss: z = -1
Win: z = +1
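A heavily simplified sketch of the self-play training signal: every position visited in a finished game is labeled with the outcome z = ±1, and a policy score is nudged up for winning games and down for losing ones. The real update is a policy gradient on the CNN parameters; the linear score and 4-dim features here are made up for illustration:

```python
import random

random.seed(0)

# each self-play game yields the visited positions plus the final result;
# here a position is a hypothetical 4-dim feature encoding
def finished_game():
    positions = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(5)]
    z = random.choice([1, -1])  # win / loss from the game's end
    return positions, z

# crude linear "policy score" updated in the REINFORCE spirit:
# positions from winning games (z = +1) are pushed up,
# positions from losing games (z = -1) are pushed down
w = [0.0] * 4
lr = 0.1
for _ in range(100):
    positions, z = finished_game()
    for x in positions:
        w = [wi + lr * z * xi for wi, xi in zip(w, x)]
```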
1. Reducing "action candidates"
(2) Improving through self-plays (reinforcement learning)
Updated Model ver 1.1 VS Updated Model ver 1.3
Return: board positions, win/lose info
Each updated model uses the same topology as the expert moves imitator model, just with updated parameters
Older models vs. newer models
1. Reducing "action candidates"
(2) Improving through self-plays (reinforcement learning)
Updated Model ver 1.3 VS ver 1.7; ver 1.5 VS ver 2.0; …; ver 3204.1 VS ver 46235.2
Return: board positions, win/lose info
1. Reducing "action candidates"
(2) Improving through self-plays (reinforcement learning)
Expert Moves Imitator Model VS Updated Model ver 1,000,000
The final model wins 80% of the time when playing against the first model
2. Board Evaluation
Training: Board Position → Updated Model ver 1,000,000 + Value Prediction Model (Regression) → Win / Loss
Adds a regression layer to the model
Predicts values between 0~1
Close to 1: a good board position
Close to 0: a bad board position
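A toy stand-in for the value network's regression output: a sigmoid squashes a score into (0, 1), trained toward the game outcome (1 = win, 0 = loss). The 3-dim features and learning rate are made up for illustration:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# hypothetical 3-dim board features -> V(s) in (0, 1)
w = [0.0, 0.0, 0.0]

def value(s):
    return sigmoid(sum(wi * si for wi, si in zip(w, s)))

def train_step(s, z, lr=0.5):
    # squared-error regression toward the game outcome z (1 win, 0 loss)
    global w
    v = value(s)
    grad = (v - z) * v * (1 - v)  # chain rule through the sigmoid
    w = [wi - lr * grad * si for wi, si in zip(w, s)]

good, bad = [1.0, 0.5, 0.0], [-1.0, -0.5, 0.0]
for _ in range(200):
    train_step(good, 1)  # positions from won games pushed toward 1
    train_step(bad, 0)   # positions from lost games pushed toward 0
```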
Reducing Search Space
1. Reducing "action candidates" (Breadth Reduction) → Policy Network
2. Board Evaluation (Depth Reduction) → Value Network
Looking ahead (w/ Monte Carlo Tree Search)
Action Candidates Reduction (Policy Network)
Board Evaluation (Value Network)
(Rollout): a faster version of estimating p(a|s) → uses shallow networks (3 ms → 2 µs)
Results
Elo rating system
Performance with different combinations of AlphaGo's components
Takeaways
Networks trained for one task (with different loss objectives) can be reused for several other tasks
Lee Sedol 9-dan vs. AlphaGo
Energy Consumption
Lee Sedol:
- Recommended calories for a man per day: ~2,500 kCal
- Assumption: Lee consumes the entire amount of per-day calories in this one game
- 2,500 kCal × 4,184 J/kCal ≈ 10M J
AlphaGo:
- Assumption: CPU ~100 W, GPU ~300 W
- 1,202 CPUs, 176 GPUs
- 170,000 J/sec × 5 hr × 3,600 sec/hr ≈ 3,000M J
A very, very rough calculation ;)
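The slide's arithmetic, spelled out under the same rough assumptions:

```python
# back-of-the-envelope energy comparison from the slide
lee_joules = 2500 * 4184                   # kCal -> J, about 10M J
alphago_watts = 1202 * 100 + 176 * 300     # CPUs at ~100 W, GPUs at ~300 W
alphago_joules = alphago_watts * 5 * 3600  # a 5-hour game, about 3,000M J
ratio = alphago_joules / lee_joules        # roughly 300x
```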
(Elo chart: single-machine AlphaGo is estimated to be around ~5-dan; the multiple-machine version rates above the European champion)
Taking CPU / GPU resources to virtually infinity?
But Google has promised not to use more CPUs/GPUs for the game with Lee than they used for Fan Hui
No one knows how it will converge
AlphaGo learns millions of Go games every day
AlphaGo will presumably converge to some point eventually.
However, the Nature paper does not report how AlphaGo's performance improves as a function of the number of self-plays.
What if AlphaGo learns Lee's game strategy?
Google said they won't use Lee's game plays as AlphaGo's training data.
Even if it did, it wouldn't be easy to modify a model trained over millions of data points with just a few game plays with Lee (prone to over-fitting, etc.)
AlphaGo’s Weakness?
AlphaGo – How It Works
Presenter: Shane (Seungwhan) Moon
PhD student
Language Technologies Institute, School of Computer Science
Carnegie Mellon University
me@shanemoon.com
3/2/2016
Reference
• Silver, David, et al. "Mastering the game of Go with deep neural networks and tree search." Nature 529.7587 (2016): 484-489.

Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 

How AlphaGo Works

  • 1. Presenter: Shane (Seungwhan) Moon, PhD student, Language Technologies Institute, School of Computer Science, Carnegie Mellon University. 3/2/2016. How it works
  • 2. AlphaGo vs. European Champion (Fan Hui, 2-dan*), October 5–9, 2015. <Official match> - Time limit: 1 hour - AlphaGo wins (5:0). *professional rank
  • 3. AlphaGo vs. World Champion (Lee Sedol, 9-dan), March 9–15, 2016. <Official match> - Time limit: 2 hours - Venue: Seoul, Four Seasons Hotel. Image source: Josun Times, Jan 28th, 2015
  • 4. Lee Sedol. Photo source: Maeil Economics, 2013/04; wiki
  • 6. Computer Go AI – Definition. s (state), d = 1. [The slide shows the board rendered as a grid of 0s with a single 1 where a stone sits] = (e.g. we can represent the board as a matrix) * The actual model uses other features besides the raw board positions as well
  • 7. Computer Go AI – Definition. s (state) at d = 1, a (action) leading to d = 2. Given s, pick the best a. (Computer Go AI: s → a → s')
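The board-as-matrix idea on this slide can be sketched in a few lines. This is a minimal illustration only: the grid size and stone position mirror the slide's drawing, while a real Go board is 19×19 and, as the slide notes, AlphaGo's actual input stacks many feature planes beyond raw stone positions.

```python
# The state s as a matrix: 0 = empty, 1 = a stone.
# Grid size and the single stone mirror the slide's drawing.
ROWS, COLS = 8, 9

state = [[0] * COLS for _ in range(ROWS)]
state[2][6] = 1                      # the one stone shown on the slide

stones = sum(map(sum, state))
print(stones)  # 1
```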
  • 8. Computer Go AI – An Implementation Idea? d = 1, d = 2, … How about simulating all possible board positions?
  • 9. Computer Go AI – An Implementation Idea? d = 1, d = 2, d = 3, …
  • 10. Computer Go AI – An Implementation Idea? d = 1, d = 2, d = 3, …, d = maxD. Run the simulation until the game ends, then report win/loss results.
  • 11. Computer Go AI – An Implementation Idea? d = 1, d = 2, d = 3, …, d = maxD. Run the simulation until the game ends, then report win/loss results, e.g. "it wins 13 times if the next stone gets placed here" (vs. 37,839 times or 431,320 times elsewhere). Choose the "next action / stone" that has the most win-counts in the full-scale simulation.
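The brute-force idea on these slides can be sketched on a toy game, since Go's own tree is far too large to enumerate. The game below (take 1 or 2 stones; whoever takes the last stone wins) is purely illustrative: for each candidate first move, simulate every playout to the end, count the wins, and pick the move with the most win-counts.

```python
# Brute-force "full-scale simulation": count, for each candidate move,
# how many complete playouts end in a win, then pick the max.
def count_wins(stones, player, me):
    """Number of complete playouts from this position that `me` wins."""
    if stones == 0:
        return 1 if (1 - player) == me else 0   # previous player took the last stone
    return sum(count_wins(stones - take, 1 - player, me)
               for take in (1, 2) if take <= stones)

# Player 0 to move with 4 stones: compare the win-counts of each first move.
wins = {take: count_wins(4 - take, 1, me=0) for take in (1, 2)}
best = max(wins, key=wins.get)
print(wins, best)  # {1: 2, 2: 1} 1 -- taking 1 stone has the most winning playouts
```

For Go this exact enumeration is what the next slide rules out: the tree is astronomically larger than anything that can be counted this way.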
  • 12. This is NOT possible; it is said that the number of possible configurations of the board exceeds the number of atoms in the universe.
  • 13. Key: To Reduce the Search Space
  • 14. Reducing Search Space. 1. Reducing "action candidates" (Breadth Reduction). d = 1, d = 2, d = 3, …, d = maxD. Win? Loss? IF there is a model that can tell you that certain moves are not common / probable (e.g. as judged by experts, etc.) …
  • 15. Reducing Search Space. 1. Reducing "action candidates" (Breadth Reduction). d = 1, d = 2, d = 3, …, d = maxD. Win? Loss? … then remove those moves from the search candidates in advance (breadth reduction).
  • 16. Reducing Search Space. 2. Position evaluation ahead of time (Depth Reduction). d = 1, d = 2, d = 3, …, d = maxD. Win? Loss? Instead of simulating until the maximum depth …
  • 17. Reducing Search Space. 2. Position evaluation ahead of time (Depth Reduction). d = 1, d = 2, d = 3. V = 1, V = 2, V = 10. … IF there is a function that can measure V(s): "the board evaluation of state s".
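The two reductions can be sketched together in one toy search. Everything here is a made-up stand-in (an integer "state", a fixed 3-move policy, a hypothetical V(s)), not AlphaGo's networks, and the sketch deliberately skips adversarial minimax to keep the two cuts visible: expand only the top-k moves the policy suggests (breadth), and stop early by calling V(s) instead of simulating to the end (depth).

```python
# Toy search showing where breadth and depth reduction cut the tree.
def play(state, action):
    return state + action                       # toy transition: state is an int

def policy(state):
    return [(1, 0.5), (2, 0.3), (3, 0.2)]      # hypothetical p(a|s) per move

def value(state):
    return (state % 5) / 4.0                    # hypothetical V(s) in [0, 1]

def search(state, depth=0, top_k=2, max_depth=2):
    if depth == max_depth:
        return value(state)                     # depth reduction: evaluate, don't simulate on
    moves = sorted(policy(state), key=lambda ap: ap[1], reverse=True)[:top_k]
    # breadth reduction: the lowest-probability moves are never expanded
    return max(search(play(state, a), depth + 1, top_k, max_depth)
               for a, _ in moves)

print(search(0))  # 1.0 -- best evaluated position reachable within the pruned tree
```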
  • 18. Reducing Search Space. 1. Reducing "action candidates" (Breadth Reduction). 2. Position evaluation ahead of time (Depth Reduction).
  • 19. 1. Reducing "action candidates". Learning: P(next action | current state) = P(a | s)
  • 20. 1. Reducing "action candidates". (1) Imitating expert moves (supervised learning). Current state → Prediction Model → next state: s1 → s2, s2 → s3, s3 → s4. Data: online Go experts (5–9 dan), 160K games, 30M board positions.
  • 21. 1. Reducing "action candidates". (1) Imitating expert moves (supervised learning). Prediction Model: current board → next board.
  • 22. 1. Reducing "action candidates". (1) Imitating expert moves (supervised learning). Prediction Model: current board → next action. There are 19 × 19 = 361 possible actions (with different probabilities).
  • 23. 1. Reducing "action candidates". (1) Imitating expert moves (supervised learning). Prediction model f: s → a. [The slide shows the current board as a 19 × 19 matrix of {-1, 0, 1} and the next action as a one-hot 19 × 19 matrix]
  • 24. 1. Reducing "action candidates". (1) Imitating expert moves (supervised learning). Prediction model g: s → p(a|s), then pick a = argmax p(a|s). [The slide shows p(a|s) as a 19 × 19 grid of move probabilities, e.g. 0.4 at the most likely point]
  • 25. 1. Reducing "action candidates". (1) Imitating expert moves (supervised learning). g: s → p(a|s), a = argmax p(a|s): current board → next action.
  • 26. 1. Reducing "action candidates". (1) Imitating expert moves (supervised learning). g is implemented with Deep Learning (a 13-layer CNN): s → p(a|s), a = argmax p(a|s).
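The argmax step on these slides is simple to sketch: the network emits one probability per board point and the move is the point with the highest probability. The probability values below mirror the illustrative grid drawn on the slide; they are not real network output.

```python
# Turning the policy output p(a|s) into a move: pick the argmax point.
p = [[0.0] * 19 for _ in range(19)]
p[4][2], p[4][3], p[5][3], p[5][4], p[6][3] = 0.2, 0.1, 0.4, 0.2, 0.1

best = max(((i, j) for i in range(19) for j in range(19)),
           key=lambda ij: p[ij[0]][ij[1]])
print(best)  # (5, 3) -- the point with probability 0.4
```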
  • 27. Convolutional Neural Network (CNN). CNN is a powerful model for image recognition tasks; it abstracts the input image through convolution layers.
  • 28. Convolutional Neural Network (CNN). A CNN model (with a similar architecture) is used to evaluate the board position; it learns "some" spatial invariance.
  • 29. Go: abstraction is the key to winning. CNN: abstraction is its forte.
  • 30. 1. Reducing "action candidates". (1) Imitating expert moves (supervised learning). Training: Expert Moves Imitator Model (w/ CNN): current board → next action.
  • 31. 1. Reducing "action candidates". (2) Improving through self-plays (reinforcement learning). Expert Moves Imitator Model (w/ CNN) VS Expert Moves Imitator Model (w/ CNN): improving by playing against itself.
  • 32. 1. Reducing "action candidates". (2) Improving through self-plays (reinforcement learning). Expert Moves Imitator Model (w/ CNN) VS Expert Moves Imitator Model (w/ CNN). Returns: board positions, win/loss info.
  • 33. 1. Reducing "action candidates". (2) Improving through self-plays (reinforcement learning). Training: Expert Moves Imitator Model (w/ CNN) on (board position, win/loss) pairs. Loss: z = -1.
  • 34. 1. Reducing "action candidates". (2) Improving through self-plays (reinforcement learning). Training: Expert Moves Imitator Model (w/ CNN) on (board position, win/loss) pairs. Win: z = +1.
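The training signal on these slides can be sketched with a tiny REINFORCE-style update: each move from a won game is reinforced with z = +1, each move from a lost game is penalized with z = -1. The 3-action softmax policy and the learning rate are made up for illustration; they are not AlphaGo's actual network or hyperparameters.

```python
# Sign of z decides whether the played move's probability goes up or down.
import math

def softmax(logits):
    e = [math.exp(v) for v in logits]
    return [v / sum(e) for v in e]

def update(theta, action, z, lr=0.1):
    """Shift logits by lr * z * grad(log p(action)) for a softmax policy."""
    p = softmax(theta)
    return [t + lr * z * ((1.0 if i == action else 0.0) - p[i])
            for i, t in enumerate(theta)]

won = update([0.0, 0.0, 0.0], action=1, z=+1)    # move came from a won game
lost = update([0.0, 0.0, 0.0], action=1, z=-1)   # same move, from a lost game
print(softmax(won)[1] > 1/3, softmax(lost)[1] < 1/3)  # True True
```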
  • 35. 1. Reducing "action candidates". (2) Improving through self-plays (reinforcement learning). Updated Model ver 1.1 VS Updated Model ver 1.3. Returns: board positions, win/loss info. It uses the same topology as the expert-moves imitator model, just with updated parameters. Older models vs. newer models.
  • 36. 1. Reducing "action candidates". (2) Improving through self-plays (reinforcement learning). Updated Model ver 1.3 VS Updated Model ver 1.7. Returns: board positions, win/loss info.
  • 37. 1. Reducing "action candidates". (2) Improving through self-plays (reinforcement learning). Updated Model ver 1.5 VS Updated Model ver 2.0. Returns: board positions, win/loss info.
  • 38. 1. Reducing "action candidates". (2) Improving through self-plays (reinforcement learning). Updated Model ver 3204.1 VS Updated Model ver 46235.2. Returns: board positions, win/loss info.
  • 39. 1. Reducing "action candidates". (2) Improving through self-plays (reinforcement learning). Updated Model ver 1,000,000 VS Expert Moves Imitator Model. The final model wins 80% of the time when playing against the first model.
  • 41. 2. Board Evaluation. Training: Updated Model ver 1,000,000 on (board position, win/loss) pairs → Value Prediction Model (regression), Win (0–1). Adds a regression layer to the model; predicts values between 0 and 1. Close to 1: a good board position. Close to 0: a bad board position.
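The value head described here can be sketched as a regression output squashed into (0, 1), trained toward 1 for winning positions and 0 for losing ones. The raw score 2.0 below is an invented number, not a real network output.

```python
# Value prediction as regression: squash a raw score into (0, 1).
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def value_loss(raw_score, won):
    v = sigmoid(raw_score)             # predicted "win probability" for the board
    return (v - (1.0 if won else 0.0)) ** 2

print(round(sigmoid(2.0), 3))  # 0.881 -> close to 1, i.e. a good board position
```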
  • 42. Reducing Search Space. 1. Reducing "action candidates" (Breadth Reduction) → Policy Network. 2. Board Evaluation (Depth Reduction) → Value Network.
  • 43. Looking ahead (w/ Monte Carlo Tree Search). Action-candidates reduction (Policy Network) + board evaluation (Value Network). (Rollout): a faster version of estimating p(a|s) → uses shallow networks (3 ms → 2 µs).
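How the tree search combines the two evaluators at a leaf can be sketched as a simple mix: the value network's estimate is blended with the outcome of a fast rollout (the Nature paper mixes them with a constant weighting, λ = 0.5). The input numbers below are invented for illustration.

```python
# Mixing the value network's estimate with a fast-rollout outcome at a leaf.
def leaf_value(v_network, z_rollout, lam=0.5):
    return (1 - lam) * v_network + lam * z_rollout

# Value net scores the position 0.7; the shallow rollout ended in a win (1.0).
print(round(leaf_value(0.7, 1.0), 2))  # 0.85
```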
  • 44. Results. Elo rating system: performance with different combinations of AlphaGo's components.
  • 45. Takeaways. Use the networks trained for a certain task (with different loss objectives) for several other tasks.
  • 46. Lee Sedol 9-dan vs. AlphaGo
  • 47. Lee Sedol 9-dan vs. AlphaGo: Energy Consumption (a very, very rough calculation ;)). Lee Sedol: recommended calories for a man per day: ~2,500 kCal; assumption: Lee consumes the entire amount of per-day calories in this one game: 2,500 kCal × 4,184 J/kCal ≈ 10M [J]. AlphaGo: assumption: CPU ~100 W, GPU ~300 W; 1,202 CPUs, 176 GPUs: 170,000 J/sec × 5 hr × 3,600 sec/hr ≈ 3,000M [J].
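The slide's back-of-the-envelope arithmetic checks out; it can be reproduced directly (the per-device wattages are the slide's own assumptions).

```python
# Reproducing the slide's very rough energy estimate for a 5-hour game.
lee_joules = 2500 * 4184                        # 2,500 kCal * 4,184 J/kCal
cpu_w, gpu_w = 100, 300                         # assumed per-device power draw
alphago_watts = 1202 * cpu_w + 176 * gpu_w      # 1,202 CPUs + 176 GPUs = 173 kW
alphago_joules = alphago_watts * 5 * 3600       # ~170,000 J/s for 5 hours

print(lee_joules)      # 10460000 J, the slide's "~10M [J]"
print(alphago_joules)  # 3114000000 J, the slide's "~3,000M [J]"
```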
  • 48. AlphaGo (the distributed version, running on multiple machines) is estimated to be around ~5-dan, roughly the level of the European champion.
  • 49. Taking CPU/GPU resources to virtual infinity? But Google has promised not to use more CPUs/GPUs for the game with Lee than they used against Fan Hui. No one knows how it will converge.
  • 50. AlphaGo learns millions of Go games every day. AlphaGo will presumably converge to some point eventually. However, in the Nature paper they do not report how AlphaGo's performance improves as a function of the number of games AlphaGo plays against itself (self-plays).
  • 51. What if AlphaGo learns Lee's game strategy? Google said they won't use Lee's game plays as AlphaGo's training data. Even if they did, it wouldn't be easy to modify a model trained over millions of data points with just a few games played against Lee (prone to over-fitting, etc.).
  • 53. AlphaGo – How It Works. Presenter: Shane (Seungwhan) Moon, PhD student, Language Technologies Institute, School of Computer Science, Carnegie Mellon University. me@shanemoon.com. 3/2/2016
  • 54. Reference • Silver, David, et al. "Mastering the game of Go with deep neural networks and tree search." Nature 529.7587 (2016): 484–489.