2. Outline
1. The math behind Go
2. From Crazy Stone -> AlphaGo
3. AlphaGo vs AlphaZero
4. Policy Iteration
5. Policy Improvement (Math alert!)
6. Policy Evaluation
7. The deep side of AlphaZero
8. Code and demo
3.
4. "For a true AI isn't measured by the size of its tree, but by
the precision of its moves." Filottete
5.
6. Go is constructive
Humans describe it as more of an intuitive game
10^170 possible states
10^360 possible games for each starting state
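A quick sanity check on the size of the state space: each of the 361 points on a 19x19 board is empty, black, or white, so 3^361 is a loose upper bound on board configurations (it also counts illegal positions, which is why it lands a little above the figure for possible states). The sketch below is only illustrative arithmetic; the helper name `order_of_magnitude` is made up for this example.

```python
BOARD_POINTS = 19 * 19  # 361 intersections on a standard Go board


def order_of_magnitude(n: int) -> int:
    """Return the base-10 exponent of a positive integer n."""
    return len(str(n)) - 1


# Every point is empty, black, or white. This overcounts, because many of
# these configurations are not legal Go positions.
upper_bound = 3 ** BOARD_POINTS

print(order_of_magnitude(upper_bound))  # exponent of the upper bound
```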
40. 1. Clone yourself and fight!
2. As the Yous battle, observe the fight
3. Use those experiences to improve further
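The three steps can be sketched on a toy problem. Everything here is an assumption made for illustration: `ToyAgent`, its weight table, and the rule "higher move wins" stand in for a real network and a real game. It is not the AlphaZero update, only the shape of the loop.

```python
import copy
import random


class ToyAgent:
    """Hypothetical agent: one preference weight per move."""

    def __init__(self, n_moves=3):
        self.weights = [1.0] * n_moves

    def pick_move(self, rng):
        # Sample a move in proportion to its current weight.
        return rng.choices(range(len(self.weights)), weights=self.weights)[0]

    def improve(self, experiences):
        # Reinforce every move that was observed to win.
        for move, won in experiences:
            if won:
                self.weights[move] += 1.0


def self_play_iteration(agent, n_games, rng):
    clone = copy.deepcopy(agent)            # 1. clone yourself
    experiences = []
    for _ in range(n_games):                # 2. let the copies fight, observe
        a, b = agent.pick_move(rng), clone.pick_move(rng)
        if a != b:                          # toy rule: higher move wins, ties are dropped
            experiences.append((a, a > b))
            experiences.append((b, b > a))
    agent.improve(experiences)              # 3. use the experience to improve
    return agent


rng = random.Random(0)
agent = ToyAgent()
for _ in range(20):
    self_play_iteration(agent, n_games=50, rng=rng)
print(agent.weights)
```

Move 0 can never win under the toy rule, so its weight stays at its initial value while the weights of the stronger moves grow.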
41.
42. How is it implemented in Python?
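def play_against_yourself(game, player_mcts):
    ...  # setup elided on the slide; assumed to define player_id and opp_id
    training_samples = []
    terminal = False
    board = game.reset()
    while not terminal:
        # Ask the MCTS player for a move on the current board.
        act = player_mcts.pick_move(board)
        # Apply our move; the game replies with the opponent's move.
        board, r, terminal, opp_act = game.step(act)
        training_samples.append((board, player_id, act))
        training_samples.append((board, opp_id, opp_act))
    return training_samples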
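To try this loop end to end without a Go engine, here is a minimal sketch with stand-ins. `TinyNim` and `RandomPlayer` are hypothetical; they only mimic the `reset()` / `step()` / `pick_move()` interface used on the slide. `player_id` and `opp_id` are passed as parameters here, which is an assumption about the elided setup.

```python
import random


class TinyNim:
    """Hypothetical stand-in for `game`: take 1 or 2 stones per turn,
    whoever takes the last stone wins. The opponent moves inside step()."""

    def __init__(self, stones=7, seed=0):
        self.start = stones
        self.rng = random.Random(seed)

    def reset(self):
        self.stones = self.start
        return self.stones

    def step(self, act):
        self.stones -= act
        if self.stones <= 0:
            return self.stones, 1, True, None       # we took the last stone
        opp_act = self.rng.choice([1, 2]) if self.stones >= 2 else 1
        self.stones -= opp_act
        terminal = self.stones <= 0
        return self.stones, (-1 if terminal else 0), terminal, opp_act


class RandomPlayer:
    """Hypothetical stand-in for `player_mcts`: picks a random legal move."""

    def __init__(self, seed=1):
        self.rng = random.Random(seed)

    def pick_move(self, board):
        return self.rng.choice([1, 2]) if board >= 2 else 1


def play_against_yourself(game, player, player_id=0, opp_id=1):
    samples = []
    board, terminal = game.reset(), False
    while not terminal:
        act = player.pick_move(board)
        board, r, terminal, opp_act = game.step(act)
        samples.append((board, player_id, act))
        if opp_act is not None:                     # no reply if we ended the game
            samples.append((board, opp_id, opp_act))
    return samples


samples = play_against_yourself(TinyNim(), RandomPlayer())
print(samples)
```

Each sample is a `(board, player, move)` tuple, the same shape the slide collects for training.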
43. To the code!
main: https://gist.github.com/manuel-delverme/36f9fd220989903274c4badf83c0f880