Chakrabarti AlphaGo Analysis


  1. AlphaGo Analysis from a Deep Learning Perspective. Chayan Chakrabarti. July 11, 2016. Pleasanton, CA
  2. Mastering the game of Go. • DeepMind problem domain • Deep learning and reinforcement learning concepts • Design of AlphaGo • Execution
  3. Go: a perfect-information game. Possible game sequences ≈ 250^150, more than the number of atoms in the universe.
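As a sanity check on that number (assuming a branching factor of roughly 250 legal moves per position and a game length of roughly 150 moves):

250^{150} = 10^{150 \log_{10} 250} \approx 10^{360}, \qquad \text{atoms in the observable universe} \approx 10^{80}.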
  4. Reduce the search space. • Reduce breadth – not all moves are equally likely – some moves are better – leverage moves made by expert players • Reduce depth – evaluate the strength of a board (likelihood of winning) – collapse symmetrical or similar boards – simulate the games
  5. Monte Carlo tree search
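Plain MCTS, before any neural networks are added, repeats four steps: select a leaf with a bandit rule (UCT here), expand it, run a random playout, and back the result up the tree. A minimal sketch, assuming a hypothetical Game object with legal_moves(state), next_state(state, move), is_terminal(state), and winner(state) methods:

import math, random

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children = {}          # move -> Node
        self.visits, self.wins = 0, 0.0

def uct_score(child, parent_visits, c=1.4):
    # Upper Confidence bound applied to Trees: exploitation + exploration
    if child.visits == 0:
        return float("inf")
    return child.wins / child.visits + c * math.sqrt(math.log(parent_visits) / child.visits)

def mcts(game, root_state, n_iterations=1000):
    root = Node(root_state)
    for _ in range(n_iterations):
        node = root
        # 1. Selection: descend while the node already has children
        while node.children and not game.is_terminal(node.state):
            node = max(node.children.values(),
                       key=lambda ch: uct_score(ch, node.visits))
        # 2. Expansion: add one child per legal move, pick one to simulate from
        if not game.is_terminal(node.state):
            for move in game.legal_moves(node.state):
                node.children[move] = Node(game.next_state(node.state, move), parent=node)
            node = random.choice(list(node.children.values()))
        # 3. Simulation: random playout to the end of the game
        state = node.state
        while not game.is_terminal(state):
            state = game.next_state(state, random.choice(game.legal_moves(state)))
        reward = game.winner(state)   # e.g. 1.0 if the root player wins, else 0.0
        # 4. Backup: propagate the result to the root
        # (for brevity this sketch does not flip the reward between the two players)
        while node is not None:
            node.visits += 1
            node.wins += reward
            node = node.parent
    # Play the most visited move at the root
    return max(root.children, key=lambda m: root.children[m].visits)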
  6. Supervised learning using neural networks
  7. Convolutional neural networks
  8. Encode local or spatial features
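A rough sketch of the kind of convolutional policy network slides 7 and 8 describe, written in PyTorch (the layer sizes here are illustrative, not the ones from the paper): the board comes in as a stack of 19x19 feature planes, the convolutions pick up local spatial patterns such as stone groups, and the output is a probability distribution over the 361 points.

import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    """Maps a stack of 19x19 board feature planes to a distribution over moves."""
    def __init__(self, in_planes=48, channels=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_planes, channels, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1), nn.ReLU(),
            # Final 1x1 convolution: one score per board point
            nn.Conv2d(channels, 1, kernel_size=1),
        )

    def forward(self, boards):                 # boards: (batch, in_planes, 19, 19)
        logits = self.conv(boards)             # (batch, 1, 19, 19)
        logits = logits.flatten(start_dim=1)   # (batch, 361)
        return torch.softmax(logits, dim=1)    # p(a|s) over the 361 points

# Example: score one random board encoding
net = PolicyNet()
board = torch.randn(1, 48, 19, 19)
probs = net(board)
print(probs.shape, probs.sum().item())         # torch.Size([1, 361]), ~1.0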
  9. Reinforcement learning. Agent and environment: at each step the agent, in state S_t, takes action A_t and receives a reward (feedback) R_t. • Feedback is delayed • No supervisor, only a reward signal • Rules of the game are unknown
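The agent/environment loop from this slide, written out as a small Python sketch. The Env class and its reset/step methods are stand-ins (in the spirit of a Gym-style interface, not any real library); the point is only that the agent sees states and delayed rewards, never the rules.

import random

class Env:
    """Stand-in environment: the agent never sees these rules, only states and rewards."""
    def reset(self):
        self.t = 0
        return 0                                        # initial state S_0
    def step(self, action):
        self.t += 1
        state = self.t
        done = self.t >= 10
        reward = 1.0 if (done and action == 1) else 0.0  # reward is delayed until the end
        return state, reward, done

def run_episode(env, policy):
    state, total, done = env.reset(), 0.0, False
    while not done:
        action = policy(state)                     # A_t chosen from S_t
        state, reward, done = env.step(action)     # environment returns S_{t+1}, R_{t+1}
        total += reward
    return total

print(run_episode(Env(), policy=lambda s: random.choice([0, 1])))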
  10. Deterministic policy
  11. Stochastic policy
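The difference between these two policy slides in code: a deterministic policy always plays the single highest-probability move, while a stochastic policy samples a move in proportion to its probability. A small illustration with a made-up move distribution:

import random

# Hypothetical move probabilities p(a|s) for four candidate moves
p = {"D4": 0.55, "Q16": 0.30, "C3": 0.10, "K10": 0.05}

def deterministic_policy(p):
    # Always the argmax move: same state -> same action
    return max(p, key=p.get)

def stochastic_policy(p):
    # Sample a move with probability proportional to p(a|s)
    moves, probs = zip(*p.items())
    return random.choices(moves, weights=probs, k=1)[0]

print(deterministic_policy(p))                     # always 'D4'
print([stochastic_policy(p) for _ in range(5)])    # mostly 'D4', sometimes others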
  12. Value: expected long-term reward
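In standard reinforcement-learning notation (not spelled out on the slide), the value of a state s under a policy \pi is the expected discounted sum of future rewards:

V^{\pi}(s) = \mathbb{E}_{\pi}\!\left[ \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1} \;\middle|\; S_t = s \right], \qquad 0 \le \gamma \le 1.

In Go the only reward arrives at the end of the game (win or loss), so the value of a board is effectively the probability of winning from it.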
  13. Monte Carlo tree search combined with deep neural networks (figure: AlphaGo's network-guided search vs. normal MCTS)
  14. AlphaGo schematic architecture (figure: where the neural networks enter the search, for selection and for evaluation)
  15. Reducing breadth of moves
  16. Predicting the move. Reducing the "action candidates" (1): imitating expert moves (supervised learning). An expert-moves imitator model (a CNN) is trained on current-board / next-action pairs, learning s → p(a|s); at play time the predicted move is argmax_a p(a|s).
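A minimal sketch of that supervised step: train a policy network to imitate expert moves by minimizing cross-entropy between its output p(a|s) and the move the expert actually played. The network here is a tiny stand-in (see the PolicyNet sketch above for a fuller version), and the expert positions and moves are random placeholders.

import torch
import torch.nn as nn
import torch.nn.functional as F

# Tiny stand-in policy network: (batch, planes, 19, 19) board encoding -> logits over 361 points
policy = nn.Sequential(
    nn.Conv2d(48, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 1, kernel_size=1), nn.Flatten(),
)
optimizer = torch.optim.SGD(policy.parameters(), lr=0.01)

# Placeholder "expert" data: random boards s and the moves a the expert played there
boards = torch.randn(16, 48, 19, 19)
expert_moves = torch.randint(0, 361, (16,))

# One supervised step: maximize log p(a|s) for the expert move (cross-entropy loss)
logits = policy(boards)                        # (16, 361)
loss = F.cross_entropy(logits, expert_moves)
loss.backward()
optimizer.step()

# At play time, the greedy prediction is argmax_a p(a|s)
print(logits.argmax(dim=1)[:5])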
  17. Two kinds of policies. Step 1: learn to predict human moves. • Used a large database of online expert games • Learned two versions of the neural network: a fast network for use in evaluation (rollouts) and an accurate network for use in selection
  18. Further reduce search space: symmetries. (Figure: the input board under rotations of 90, 180, and 270 degrees, each with and without a vertical reflection, giving 8 equivalent versions of every position.)
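The eight symmetries on the slide (four rotations, each with and without a reflection) can be generated directly with NumPy; a position and its move probabilities can be mapped to all eight variants, which both shrinks the effective search space and multiplies the training data. A small sketch:

import numpy as np

def symmetries(board):
    """Return the 8 dihedral symmetries of a board: rotations by 0/90/180/270
    degrees, each with and without a reflection."""
    variants = []
    for k in range(4):
        rotated = np.rot90(board, k)
        variants.append(rotated)
        variants.append(np.flipud(rotated))    # reflected copy
    return variants

board = np.zeros((19, 19), dtype=int)
board[3, 15] = 1                               # a single stone
print(len(symmetries(board)))                  # 8
print([np.argwhere(b)[0].tolist() for b in symmetries(board)])  # the stone's 8 images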
  19. Reduce depth by board evaluation. A value prediction model adds a regression layer to the trained model and predicts values between 0 and 1: close to 1 is a good board position, close to 0 is a bad one. Training: roughly 1,000,000 board positions, each labeled with the eventual win/loss.
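A sketch of the value network training the slide summarizes: the same kind of convolutional trunk, but the head is a single sigmoid output squeezed between 0 and 1, regressed against the eventual win/loss label of each sampled position. All sizes and data below are illustrative placeholders.

import torch
import torch.nn as nn
import torch.nn.functional as F

# Value network: board in, single number in (0, 1) out ("how good is this position?")
value_net = nn.Sequential(
    nn.Conv2d(48, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.Flatten(),
    nn.Linear(32 * 19 * 19, 1),    # regression head added on top of the conv features
    nn.Sigmoid(),                  # close to 1: good board position, close to 0: bad one
)
optimizer = torch.optim.SGD(value_net.parameters(), lr=0.01)

# Placeholder batch: positions and whether the game was eventually won (1) or lost (0)
boards = torch.randn(16, 48, 19, 19)
outcomes = torch.randint(0, 2, (16, 1)).float()

predictions = value_net(boards)                # (16, 1) values in (0, 1)
loss = F.mse_loss(predictions, outcomes)       # regression against the win/loss label
loss.backward()
optimizer.step()
print(predictions[:3].detach().squeeze())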
  20. Value follows from policy. Step 3: learn a board evaluation network V. • Use random samples from the self-play database • Prediction target: probability that black wins from a given board
  21. Putting it all together. Looking ahead with a Monte Carlo search tree: action-candidate reduction (policy network), board evaluation (value network), and rollouts, a faster way of estimating p(a|s) that uses shallow networks (3 ms → 2 µs).
  22. Selection
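During selection, AlphaGo descends the tree by picking at each node the action that maximizes the stored action value plus an exploration bonus that is proportional to the policy network's prior and decays with the visit count (the rule given in Silver et al. 2016):

a_t = \arg\max_a \left( Q(s_t, a) + u(s_t, a) \right), \qquad u(s, a) \propto \frac{P(s, a)}{1 + N(s, a)}.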
  23. Expansion. If the visit count of an edge exceeds a threshold, N_r(s, a) > n_thr, insert the node for the successor state s'. For every possible action a', initialize the statistics: N_v(s', a') = N_r(s', a') = 0, W_r(s', a') = W_v(s', a') = 0, P(s', a') = p_σ(a' | s').
  24. Evaluation. 1) Evaluate the new leaf s' with the value network v_θ(s'). 2) Simulate from it with the fast rollout policy network p_π; when the simulation reaches a terminal state s_T, calculate the reward r(s_T).
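The two signals are then mixed into a single leaf evaluation; in the paper this uses a mixing parameter \lambda (set to 0.5):

V(s') = (1 - \lambda)\, v_\theta(s') + \lambda\, r(s_T).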
  25. Backup
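During backup, every edge (s, a) on the path from the root to the evaluated leaf has its statistics updated, and the action value used by later selection steps is the mix of the two running means (Silver et al. 2016):

N_v(s,a) \leftarrow N_v(s,a) + 1, \quad W_v(s,a) \leftarrow W_v(s,a) + v_\theta(s'), \qquad N_r(s,a) \leftarrow N_r(s,a) + 1, \quad W_r(s,a) \leftarrow W_r(s,a) + r(s_T),

Q(s,a) = (1-\lambda)\, \frac{W_v(s,a)}{N_v(s,a)} + \lambda\, \frac{W_r(s,a)}{N_r(s,a)}.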
  26. Distribute search through GPUs. (Figure: distributed search. The main search tree runs on the master CPU; the policy and value networks p_σ(a'|s') and v_θ(s') are evaluated on 176 GPUs; the rollout policy networks p_π run on 1,202 CPUs.)
  27. Apply trained networks to tasks with a different loss function. Takeaway: networks trained for one task (with different loss objectives) can be reused for several other tasks.
  28. Single most important takeaway. • Feature abstraction is the key component of any machine learning algorithm • Convolutional neural networks are great at automated feature abstraction
  29. Reference: Silver et al. Mastering the Game of Go with Deep Neural Networks and Tree Search. Nature 529, 484–489, January 2016.
  30. About the speaker. Chayan Chakrabarti. https://www.linkedin.com/in/chayanchakrabarti
