AlphaGo Analysis from Deep Learning Perspective
Chayan Chakrabarti
July 11, 2016
Pleasanton, CA
  
Mastering the game of Go
•  DeepMind problem domain
•  Deep learning and reinforcement learning concepts
•  Design of AlphaGo
•  Execution
  
Go: perfect information game
All possible Go move sequences ≈ 250^150 (about 250 legal moves per position, about 150 moves per game) > number of atoms in the universe
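
As a rough check of the scale, the sketch below reads the slide's figure as 250 raised to the power 150 and compares it with 10^80. The 10^80 value is a commonly quoted order-of-magnitude estimate for atoms in the observable universe; it is an assumption brought in for illustration, since the slide gives no number.

```python
import math

BRANCHING = 250       # approximate legal moves per position
DEPTH = 150           # approximate moves per game
ATOMS_EXPONENT = 80   # assumed estimate: about 10^80 atoms in the observable universe

def log10_sequences(branching: int = BRANCHING, depth: int = DEPTH) -> float:
    """Return log10(branching ** depth) without building the huge integer."""
    return depth * math.log10(branching)

exponent = log10_sequences()
print(f"250^150 is roughly 10^{exponent:.0f}")
print("Exceeds the atom estimate:", exponent > ATOMS_EXPONENT)
```

Working in logarithms keeps the comparison readable; the gap is hundreds of orders of magnitude, which is why exhaustive search is ruled out.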
  	
  	
  
Reduce search space
•  Reduce breadth
– Not all moves are equally likely
– Some moves are better
– Leverage moves made by expert players
•  Reduce depth
– Evaluate strength of board (likelihood of winning)
– Collapse symmetrical or similar boards
– Simulate the games
  
Monte Carlo tree search

Supervised learning using neural networks

Convolutional neural networks

Encode local or spatial features
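
To make "local or spatial features" concrete, here is a minimal pure-Python sketch of one convolution filter sliding over a small board. The board encoding (1 for black, -1 for white, 0 for empty) and the hand-written `PAIR_FILTER` are assumptions for illustration; a trained network would learn its filters from data.

```python
from typing import List

Board = List[List[int]]  # assumed encoding: 1 = black, -1 = white, 0 = empty

def conv2d_valid(board: Board, kernel: List[List[float]]) -> List[List[float]]:
    """Slide the kernel over the board (no padding) and sum elementwise products.

    This is cross-correlation, the operation CNN layers usually compute.
    """
    k = len(kernel)
    rows, cols = len(board), len(board[0])
    out = []
    for r in range(rows - k + 1):
        row = []
        for c in range(cols - k + 1):
            total = 0.0
            for i in range(k):
                for j in range(k):
                    total += board[r + i][c + j] * kernel[i][j]
            row.append(total)
        out.append(row)
    return out

# Hypothetical filter that responds to two black stones side by side.
PAIR_FILTER = [[0.0, 0.0, 0.0],
               [0.0, 1.0, 1.0],
               [0.0, 0.0, 0.0]]

board = [[0, 0, 0, 0, 0],
         [0, 1, 1, 0, 0],
         [0, 0, 0, 0, 0],
         [0, 0, 0, -1, 0],
         [0, 0, 0, 0, 0]]

feature_map = conv2d_valid(board, PAIR_FILTER)
```

The same small filter is reused at every board location, so a pattern is detected wherever it occurs.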
  
Reinforcement learning
Agent and environment: state S_t, action A_t, reward (feedback) R_t
•  Feedback is delayed.
•  No supervisor, only a reward signal.
•  Rules of the game are unknown.

Deterministic policy

Stochastic policy

Value: expected long-term reward
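
The three concepts above can be sketched in a few lines. The move names, probabilities, and the discount factor `gamma` are illustrative assumptions; nothing here is AlphaGo-specific.

```python
import random
from typing import Dict, List

# Hypothetical move probabilities for a single state.
policy: Dict[str, float] = {"A1": 0.1, "B2": 0.7, "C3": 0.2}

def deterministic_action(probs: Dict[str, float]) -> str:
    """Deterministic policy: always return the highest-probability action."""
    return max(probs, key=probs.get)

def stochastic_action(probs: Dict[str, float], rng: random.Random) -> str:
    """Stochastic policy: sample an action in proportion to its probability."""
    actions = list(probs)
    return rng.choices(actions, weights=[probs[a] for a in actions], k=1)[0]

def discounted_return(rewards: List[float], gamma: float = 1.0) -> float:
    """Sum of rewards, each discounted by gamma per time step."""
    return sum(r * gamma ** t for t, r in enumerate(rewards))

def estimate_value(episodes: List[List[float]], gamma: float = 1.0) -> float:
    """Monte Carlo estimate of value: the average return over sampled episodes."""
    return sum(discounted_return(ep, gamma) for ep in episodes) / len(episodes)
```

In a game with a single win or loss signal at the end, every episode's reward list is zero except for its last entry, which is the "delayed feedback" case.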
  
Monte Carlo tree search combined with deep neural networks
[Figure: AlphaGo; labels: neural networks, normal MCTS]

AlphaGo schematic architecture
[Figure: AlphaGo neural networks; labels: selection, evaluation, evaluation]

Reducing breadth of moves
  
Predicting the move
1. Reducing "action candidates"
(1) Imitating expert moves (supervised learning)
Training: current board → expert moves imitator model (with CNN) → next action
Prediction: model s → p(a|s), then argmax_a p(a|s) → next action
[Figure: example output grid, all zeros except a single 1 marking the predicted move]

Two kinds of policies
Step 1: learn to predict human moves
● used a large database of online expert games
● learned two versions of the neural network
○ a fast network P (the rollout policy) for use in evaluation
○ an accurate network P (the policy network) for use in selection
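
A minimal sketch of the prediction step: raw scores are turned into p(a|s), the argmax gives the predicted move, and keeping only the top few moves reduces the search breadth. The scores below are made up and stand in for the output of a trained network; `top_candidates` and its cutoff `k` are hypothetical helpers, not part of the original slides.

```python
import math
from typing import Dict, List, Tuple

Move = Tuple[int, int]  # (row, column)

def softmax(scores: Dict[Move, float]) -> Dict[Move, float]:
    """Turn raw network scores into a probability distribution p(a|s)."""
    peak = max(scores.values())  # subtract the max for numerical stability
    exps = {m: math.exp(s - peak) for m, s in scores.items()}
    total = sum(exps.values())
    return {m: e / total for m, e in exps.items()}

def top_candidates(probs: Dict[Move, float], k: int) -> List[Move]:
    """Keep only the k most probable moves as action candidates."""
    return sorted(probs, key=probs.get, reverse=True)[:k]

# Made-up scores standing in for a trained network's output.
raw_scores = {(4, 5): 3.0, (2, 2): 1.0, (6, 1): 0.5, (0, 0): -2.0}
p = softmax(raw_scores)
best_move = max(p, key=p.get)        # argmax_a p(a|s)
candidates = top_candidates(p, k=2)  # moves the search would consider
```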

Further reduce search space
[Figure: Symmetries. Input, rotation 90 degrees, rotation 180 degrees, rotation 270 degrees, and the vertical reflection of each]
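
The eight symmetries (four rotations, each with and without a reflection) can be generated as sketched below. Reading "vertical reflection" as a left-to-right mirror is an assumption, though either mirror combined with the rotations yields the same eight boards. The `canonical` helper is a hypothetical way to collapse equivalent boards to one key.

```python
from typing import List

Board = List[List[int]]

def rotate90(board: Board) -> Board:
    """Rotate the board 90 degrees clockwise."""
    return [list(row) for row in zip(*board[::-1])]

def reflect_vertical(board: Board) -> Board:
    """Mirror the board left to right (reflection about the vertical axis)."""
    return [row[::-1] for row in board]

def symmetries(board: Board) -> List[Board]:
    """Return the 8 variants: 4 rotations, each with and without a reflection."""
    variants = []
    current = board
    for _ in range(4):
        variants.append(current)
        variants.append(reflect_vertical(current))
        current = rotate90(current)
    return variants

def canonical(board: Board) -> Board:
    """Pick one representative so that equivalent boards share the same key."""
    return min(symmetries(board))
```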

Reduce depth by board evaluation
Training: board position → value prediction model (regression) → win / loss (0~1)
Updated model, ver 1,000,000
•  Adds a regression layer to the model
•  Predicts values between 0~1
•  Close to 1: a good board position
•  Close to 0: a bad board position

Value follows from policy
Step 3: learn a board evaluation network, V
● use random samples from the self-play database
● prediction target: probability that black wins from a given board
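
A small sketch of how training pairs for such a regression could be assembled. Positions are opaque strings here, and drawing exactly one position per game is a choice made in this sketch to keep samples from the same game from dominating; the slide only says that random samples are used.

```python
import random
from typing import List, Tuple

# A self-play game: its positions plus the outcome (1.0 = black won, 0.0 = black lost).
Game = Tuple[List[str], float]

def sample_training_pairs(games: List[Game], rng: random.Random) -> List[Tuple[str, float]]:
    """Draw one random position per game and label it with that game's outcome."""
    return [(rng.choice(positions), outcome) for positions, outcome in games]

def mean_squared_error(predictions: List[float], targets: List[float]) -> float:
    """Regression loss between predicted win probabilities and actual outcomes."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)
```

A model that always answers 0.5 scores a loss of 0.25 on any win/loss data, which is a useful baseline when judging a trained value network.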

Putting it all together
Looking ahead (with Monte Carlo tree search)
•  Action candidates reduction (policy network)
•  Board evaluation (value network)
•  Rollout: faster version of estimating p(a|s), uses shallow networks (3 ms → 2 µs)

Selection
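
The slide gives only the title, so the sketch below follows the selection rule described in the referenced Silver et al. paper: pick the action maximizing Q(s, a) plus a bonus that grows with the prior P(s, a) and shrinks with the visit count. The single visit counter and the value of `c_puct` are simplifying assumptions.

```python
import math
from dataclasses import dataclass
from typing import Dict

@dataclass
class EdgeStats:
    prior: float              # P(s, a) from the policy network
    visits: int = 0           # N(s, a)
    total_value: float = 0.0  # W(s, a)

    @property
    def q(self) -> float:
        """Mean action value Q(s, a) = W / N, or 0 before the first visit."""
        return self.total_value / self.visits if self.visits else 0.0

def select_action(edges: Dict[str, EdgeStats], c_puct: float = 5.0) -> str:
    """Pick argmax_a Q(s, a) + u(s, a); u favors high prior and few visits."""
    parent_visits = sum(e.visits for e in edges.values())

    def score(action: str) -> float:
        e = edges[action]
        bonus = c_puct * e.prior * math.sqrt(parent_visits) / (1 + e.visits)
        return e.q + bonus

    return max(edges, key=score)
```

With a large `c_puct` the search explores rarely visited moves; with `c_puct = 0` it simply exploits the best mean value seen so far.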
  
Expansion
If the visit count exceeds a threshold, $N_r(s, a) > n_{thr}$:
1. Insert the node for the successor state $s'$.
2. For every possible $a'$, initialize the statistics:
   $N_v(s', a') = N_r(s', a') = 0$
   $W_r(s', a') = W_v(s', a') = 0$
   $P(s', a') = p_\sigma(a' \mid s')$
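
The expansion step can be sketched as follows. The statistics mirror the ones on the slide (N_v, N_r, W_v, W_r, P); the `Node` and `Edge` classes and the `next_state`, `legal_moves`, and `policy` callables are hypothetical stand-ins for a real game engine and policy network.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Edge:
    prior: float            # P(s', a')
    n_value: int = 0        # N_v: visits evaluated by the value network
    n_rollout: int = 0      # N_r: visits evaluated by rollouts
    w_value: float = 0.0    # W_v: accumulated value-network outcomes
    w_rollout: float = 0.0  # W_r: accumulated rollout outcomes

@dataclass
class Node:
    state: str
    edges: Dict[str, Edge] = field(default_factory=dict)
    children: Dict[str, "Node"] = field(default_factory=dict)

def maybe_expand(parent: Node, action: str, n_thr: int,
                 next_state: Callable[[str, str], str],
                 legal_moves: Callable[[str], List[str]],
                 policy: Callable[[str], Dict[str, float]]) -> bool:
    """Insert the successor node once the rollout visit count exceeds n_thr."""
    if action in parent.children or parent.edges[action].n_rollout <= n_thr:
        return False
    child_state = next_state(parent.state, action)
    priors = policy(child_state)
    child = Node(state=child_state)
    for move in legal_moves(child_state):
        # Counts and totals start at 0; only the prior is filled in.
        child.edges[move] = Edge(prior=priors[move])
    parent.children[action] = child
    return True
```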

Evaluation
1. Evaluate $s'$ by the value network $v_\theta$: $v_\theta(s')$.
2. Simulate the action by the rollout policy network $p_\pi$. When reaching the terminal state $s_T$, calculate the reward $r(s_T)$.
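
A sketch of the two evaluations and one way to combine them. The referenced paper blends the value-network estimate and the rollout outcome with a mixing weight; the slide does not state the weight, so `lam = 0.5` below is an assumption. The uniform-random rollout stands in for the fast rollout policy, and the game callables are hypothetical.

```python
import random
from typing import Callable, List

def rollout(state: str,
            is_terminal: Callable[[str], bool],
            legal_moves: Callable[[str], List[str]],
            apply_move: Callable[[str, str], str],
            reward: Callable[[str], float],
            rng: random.Random) -> float:
    """Play to the end with a fast policy (uniform random here) and return r(s_T)."""
    while not is_terminal(state):
        state = apply_move(state, rng.choice(legal_moves(state)))
    return reward(state)

def evaluate_leaf(value_estimate: float, rollout_reward: float, lam: float = 0.5) -> float:
    """Blend the value-network estimate with the rollout outcome."""
    return (1.0 - lam) * value_estimate + lam * rollout_reward
```

Setting `lam` to 0 uses only the value network and setting it to 1 uses only rollouts, which makes it easy to compare the two evaluators.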

Backup
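
The slide gives only the title. As a minimal sketch, assuming a single visit counter per edge and a leaf value expressed from one fixed player's point of view, the backup step adds the leaf evaluation to every edge visited in the simulation and updates the mean value.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Edge:
    visits: int = 0           # N(s, a)
    total_value: float = 0.0  # W(s, a)

    @property
    def q(self) -> float:
        """Mean value Q(s, a) = W(s, a) / N(s, a)."""
        return self.total_value / self.visits if self.visits else 0.0

def backup(path: List[Edge], leaf_value: float) -> None:
    """Propagate one leaf evaluation to every edge traversed in this simulation."""
    for edge in reversed(path):
        edge.visits += 1
        edge.total_value += leaf_value
```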
  
Distribute search through GPUs
Distributed search
•  Main search tree: master CPU
•  Policy & value networks, $p_\sigma(a' \mid s')$ and $v_\theta(s')$: 176 GPUs
•  Rollout policy networks, $p_\pi$ and $r(s_T)$: 1,202 CPUs

Apply trained networks to tasks with different loss function
Takeaways
Use the networks trained for a certain task (with different loss objectives) for several other tasks

Single most important takeaway
•  Feature abstraction is the key component of any machine learning algorithm
•  Convolutional neural networks are great at automated feature abstraction
  
Reference
Silver et al. Mastering the Game of Go with Deep Neural Networks and Tree Search. Nature 529, 484–489. January 2016.
  
	
  
About the speaker
Chayan Chakrabarti
https://www.linkedin.com/in/chayanchakrabarti
  
	
  
