(Alpha) Zero to Elo (with demo)

MeetupDataScienceRoma

Presented here: https://www.meetup.com/it-IT/Machine-Learning-Data-Science-Meetup/events/249861353/

Outline
1. The math behind Go
2. From Crazy Stone -> AlphaGo
3. AlphaGo vs AlphaZero
4. Policy Iteration
5. Policy Improvement (Math alert!)
6. Policy Evaluation
7. The deep side of AlphaZero
8. Code and demo

"For a true AI isn't measured by the size of its tree, but by
the precision of its moves." Filottete

Go is constructive
Humans describe it as a more intuitive game
$10^{170}$ possible states
$10^{360}$ possible games for each starting state
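As a rough sanity check on the first figure: each of the 361 points of a 19x19 board is empty, black, or white, so 3^361 is an upper bound on the number of board configurations. The sketch below only computes that bound (the function name is illustrative); it does not count legal positions, which are a subset and so consistent with a figure of roughly 10^170.

```python
import math

BOARD_POINTS = 19 * 19  # 361 intersections on a standard Go board


def log10_configurations(points: int = BOARD_POINTS) -> float:
    """Return log10(3 ** points): every point is empty, black, or white."""
    return points * math.log10(3)


upper_bound_exponent = log10_configurations()
print(f"3^{BOARD_POINTS} is about 10^{upper_bound_exponent:.1f}")
```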

Adversarial
Fully observable
Deterministic

"The mystery of Go, the ancient game that computers still
can't win" - Wired 2014

AlphaGo Zero vs AlphaZero

Reinforcement Learning

The agent defines the part of the world that it wants to
explore,
and it evaluates the goodness of its behaviors based
on how much reward it is getting

and:
$\pi(a \mid s) = P(a \mid s) \quad \forall s \in S$
$v_\pi(s) = \mathbb{E}_\pi\left[\sum_t \gamma^t R_t \mid S_t\right]$
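To make the value definition concrete, here is a minimal sketch that computes the discounted sum for one episode and averages it over sampled episodes, which is a plain Monte Carlo estimate of the expectation. The function names and the coin-flip environment are hypothetical and only serve the illustration.

```python
import random


def discounted_return(rewards, gamma=0.9):
    """Sum of gamma**t * R_t over the rewards of one episode."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


def estimate_value(sample_episode, state, gamma=0.9, n_episodes=1000, seed=0):
    """Monte Carlo estimate of v(state): the average discounted return."""
    rng = random.Random(seed)
    returns = [
        discounted_return(sample_episode(state, rng), gamma)
        for _ in range(n_episodes)
    ]
    return sum(returns) / n_episodes


def coin_flip_episode(state, rng):
    """Hypothetical environment: 3 steps, each pays 1 with probability 0.5."""
    return [1.0 if rng.random() < 0.5 else 0.0 for _ in range(3)]


single = discounted_return([1.0, 0.0, 2.0], gamma=0.5)  # 1 + 0 + 0.25 * 2
estimate = estimate_value(coin_flip_episode, state=None)
print(single, estimate)
```

For the coin-flip environment the exact value is 0.5 * (1 + 0.9 + 0.81) = 1.355, so the estimate should land close to that.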

def value(state):
    """
    Black magic
    """
    return v

def policy(state):
    """
    White magic
    """
    return reasonable_actions

1. Plan in the future
2. Try new actions

Monte-Carlo Tree Search

MCTS is an algorithm that performs sampling-based
lookahead search.

With the backup operation we keep track of:
N(s,a) visit count
Q(s,a) mean action value
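The two statistics can be maintained with an incremental mean, as in the sketch below. The class and method names are hypothetical, and the value is assumed to be expressed from one fixed player's perspective; two-player implementations usually flip its sign at every ply, which is left out here for brevity.

```python
from collections import defaultdict


class SearchStats:
    """Per-edge statistics kept by the search (illustrative names)."""

    def __init__(self):
        self.N = defaultdict(int)    # visit count N(s, a)
        self.Q = defaultdict(float)  # mean action value Q(s, a)

    def backup(self, path, value):
        """Update every (state, action) edge on the path from root to leaf."""
        for state, action in reversed(path):
            key = (state, action)
            self.N[key] += 1
            # Incremental mean: Q <- Q + (value - Q) / N
            self.Q[key] += (value - self.Q[key]) / self.N[key]


stats = SearchStats()
stats.backup([("root", "a"), ("child", "b")], 1.0)
stats.backup([("root", "a")], 0.0)
print(stats.N[("root", "a")], stats.Q[("root", "a")])
```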

$Q(s, a) + c\,P(s, a)\,\frac{\sqrt{\sum_b N(s, b)}}{1 + N(s, a)}$
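A minimal sketch of how this score could drive action selection at a single node: the action maximizing the score is picked. Representing Q, P and N as plain dictionaries keyed by action is an assumption made for the example, as are the function names.

```python
import math


def selection_score(q, prior, total_visits, visits, c=1.0):
    """Q(s,a) + c * P(s,a) * sqrt(sum_b N(s,b)) / (1 + N(s,a))."""
    return q + c * prior * math.sqrt(total_visits) / (1 + visits)


def select_action(Q, P, N, c=1.0):
    """Pick the action with the highest score; Q, P, N map action -> number."""
    total = sum(N.values())
    return max(
        P,
        key=lambda a: selection_score(Q.get(a, 0.0), P[a], total, N.get(a, 0), c),
    )


Q = {"a": 0.5, "b": 0.0}
P = {"a": 0.5, "b": 0.5}
N = {"a": 10, "b": 0}
explore_choice = select_action(Q, P, N, c=1.0)  # unvisited "b" gets a large bonus
exploit_choice = select_action(Q, P, N, c=0.0)  # with c = 0 only Q matters
print(explore_choice, exploit_choice)
```

With c = 1 the unvisited action wins because its bonus term is not divided down by visits; with c = 0 the choice is purely greedy in Q.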

Policy Evaluation
Self Play

1. Clone yourself and fight!
2. As the Yous battle, observe the fight
3. Use those experiences to improve further

How is it implemented in Python?
def play_against_yourself(game, player_mcts):
    ...  # setup elided on the slide
    board = game.reset()
    while not terminal:
        # Pick our move with MCTS; the game applies it and the opponent's reply
        act = player_mcts.pick_move(board)
        board, r, terminal, opp_act = game.step(act)
        training_samples.append((board, player_id, act))
        training_samples.append((board, opp_id, opp_act))
    return training_samples
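For a version that runs end to end, here is a self-contained sketch of the same loop. It is not the speakers' code: the toy game (PileGame), the random player standing in for MCTS, and the choice to record the board before each move and label every sample with the final outcome are all assumptions made for illustration.

```python
import random


class PileGame:
    """Hypothetical toy game: take 1 or 2 stones; taking the last one wins."""

    def __init__(self, stones=7):
        self.start = stones
        self.stones = stones

    def reset(self):
        self.stones = self.start
        return self.stones

    def legal_moves(self):
        return [m for m in (1, 2) if m <= self.stones]

    def step(self, move):
        self.stones -= move
        return self.stones, self.stones == 0


def random_player(game, rng):
    """Stand-in for the MCTS player: picks a legal move uniformly at random."""
    return rng.choice(game.legal_moves())


def self_play(game, pick_move, seed=0):
    rng = random.Random(seed)
    samples = []
    board, terminal, player = game.reset(), False, 0
    while not terminal:
        move = pick_move(game, rng)
        samples.append((board, player, move))  # state seen when choosing
        board, terminal = game.step(move)
        if not terminal:
            player = 1 - player  # hand the turn to the clone
    winner = player  # whoever took the last stone
    # Label each sample with the outcome from the mover's point of view
    return [(b, p, m, 1 if p == winner else -1) for b, p, m in samples]


episode = self_play(PileGame(), random_player)
print(episode)
```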

To the code!
main: https://gist.github.com/manuel-delverme/36f9fd220989903274c4badf83c0f880

The deeper side of RL

In AlphaZero we want to find the best moves, not classify cats

The superstar of the network

Deep Learning - where are the layers? 1/523

Deep Learning - where are the layers? 2/523

it's-going-to-take-a-while 3/523

it's-going-to-take-a-while 4/523

lol joking/523
fast forwarding...

Loss function - what makes the model happy?
$(z - v(s))^2 - \pi \log p + c\lVert\theta\rVert$
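The three terms can be computed as in the sketch below: a squared error on the value, a cross-entropy between the search policy and the network policy, and a weight penalty. The function name and default c are hypothetical, and the penalty is taken as the sum of squared weights, which is an assumption following the loss published for AlphaGo Zero rather than something the slide spells out.

```python
import math


def alphazero_style_loss(z, v, pi, p, theta, c=1e-4):
    """z: game outcome, v: predicted value, pi: search policy,
    p: network policy, theta: flat list of weights."""
    value_loss = (z - v) ** 2
    # Cross-entropy; terms with pi == 0 contribute nothing and are skipped
    policy_loss = -sum(t * math.log(q) for t, q in zip(pi, p) if t > 0)
    # Assumption: squared L2 norm of the weights
    l2_penalty = c * sum(w * w for w in theta)
    return value_loss + policy_loss + l2_penalty


loss = alphazero_style_loss(
    z=1.0, v=0.5, pi=[1.0, 0.0], p=[0.5, 0.5], theta=[0.0, 0.0]
)
print(loss)  # 0.25 + ln 2
```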

To the code!
train: https://gist.github.com/manuel-delverme/a1b6b93bd5b4d607920b045b039fcb98

Contacts
manuel.delverme@gmail.com
simone.totaro@gmail.com

(Alpha) Zero to Elo (with demo)

8. AI in Go

10. CrazyStone

17. Example

20. Notice

25. Policy Iteration

26. #TODO @Manuel: Add code here
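The slide leaves its code as a TODO. As a stand-in, here is a minimal sketch of tabular policy iteration on a tiny deterministic chain: evaluate the current policy, act greedily on the result, and repeat until the policy stops changing. It is not the speakers' code; the chain environment, the function names, and the deterministic step(s, a) interface are assumptions for illustration.

```python
def policy_iteration(states, actions, step, gamma=0.9, sweeps=50):
    """step(s, a) -> (next_state, reward, done) for a deterministic MDP."""
    policy = {s: actions[0] for s in states}
    while True:
        # Policy evaluation: iterate v(s) = r + gamma * v(s') under the policy
        v = {s: 0.0 for s in states}
        for _ in range(sweeps):
            for s in states:
                s2, r, done = step(s, policy[s])
                v[s] = r + (0.0 if done else gamma * v[s2])

        # Policy improvement: act greedily with respect to v
        def q(s, a):
            s2, r, done = step(s, a)
            return r + (0.0 if done else gamma * v[s2])

        new_policy = {s: max(actions, key=lambda a: q(s, a)) for s in states}
        if new_policy == policy:
            return policy, v
        policy = new_policy


def chain_step(s, a):
    """Hypothetical chain 0-1-2-3: reaching state 3 pays 1 and ends the episode."""
    if a == "right":
        s2 = s + 1
        return (s2, 1.0, True) if s2 == 3 else (s2, 0.0, False)
    return (max(s - 1, 0), 0.0, False)


best_policy, values = policy_iteration([0, 1, 2], ["left", "right"], chain_step)
print(best_policy, values)
```

On this chain the loop settles on moving right everywhere, with values 0.81, 0.9 and 1.0 for states 0, 1 and 2.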

28. Policy Improvement

35. Exploration

36. Bandits: $\epsilon$-greedy
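As a reminder of what ε-greedy means on a bandit, here is a minimal sketch: with probability ε a random arm is pulled, otherwise the arm with the best running estimate. The Gaussian rewards, the arm means, and the function names are hypothetical choices for the example.

```python
import random


def epsilon_greedy(q_values, epsilon, rng):
    """Explore with probability epsilon, otherwise pick the best estimate."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def run_bandit(true_means, epsilon=0.1, steps=2000, seed=0):
    """Play a Gaussian bandit and return value estimates and pull counts."""
    rng = random.Random(seed)
    counts = [0] * len(true_means)
    estimates = [0.0] * len(true_means)
    for _ in range(steps):
        arm = epsilon_greedy(estimates, epsilon, rng)
        reward = rng.gauss(true_means[arm], 1.0)
        counts[arm] += 1
        # Incremental mean of the rewards observed for this arm
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts


greedy_pick = epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.0, rng=random.Random(0))
estimates, counts = run_bandit([0.0, 1.0])
print(greedy_pick, estimates, counts)
```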

39. How well am I doing?

47. π(s) and v(s)
