Alpha zero - London 2018


1) AlphaZero is an AI system developed by DeepMind that reached superhuman play in chess, shogi, and Go without relying on human data or prior knowledge beyond the rules. 2) It achieved this with a form of deep reinforcement learning that learns solely from games of self-play, starting from random play. 3) AlphaZero demonstrated its strength by defeating the previous champion programs in these games, despite being given no domain knowledge except the game rules.


From Alpha Go to Alpha Zero
Google London

March 2018

Who I am
Juantomás García
• Data Solutions Manager @ OpenSistemas
• GDE (Google Developer Expert) for Cloud
Others
• Co-author of the first Spanish free software book, “La Pastilla Roja”
• President of Hispalinux (Spanish Linux User Group)
• Organizer of Machine Learning Spain and GDG Cloud Madrid

Who is the audience
• People interested in machine learning
• Who want to know more about what AlphaGo is
• With a good technical background

Why I did this presentation
• I love machine learning.
• There are a lot of takeaways from this project.
• I want to share it with a wider audience.

Outline
• AlphaGo: the epic project
• AlphaGo Zero: re-evolution version
• AlphaZero: looking for general solutions
• DIY: AlphaZero Connect 4
• Takeaways

A brief introduction
• Deep Blue was about brute force
• Its designers were emulating how humans play chess

A brief introduction
• A huge search space
Chess -> 20 possible opening moves
Go -> 361 possible opening moves

Alpha Go Main Concepts
• Policy neural network
“Decides which are the most sensible moves in a particular board position.”

Alpha Go Main Concepts
• Value neural network
“How good is a particular board arrangement?”
“How likely are you to win the game from this position?”

Alpha Go First Approach: SL
• Just train both networks on human games.
• Plain, ordinary supervised learning (SL).
• With this alone, AlphaGo plays like a weak human.
• It is like the Deep Blue approach: just emulating human players.

Alpha Go Second Approach: RL
• Improve the SL version by having it play against itself.
• With reinforcement learning (RL), it is able to play well against state-of-the-art Go programs.
• These programs use MCTS (Monte Carlo Tree Search).

Alpha Go Second Approach: RL
• It is not two neural networks versus Monte Carlo Tree Search.
• It is a better MCTS thanks to the neural networks.

Alpha Go Second Approach: RL
• Optimal value function V*(s)
“Determines the outcome of the game from every board position (s is the state).”
A brute-force solution is impossible:
Chess: 35 ** 80
Go: 250 ** 150
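
To get a feel for these numbers: the estimate is roughly b ** d, where b is the number of legal moves per position (breadth) and d is the length of the game (depth). The short Python sketch below only does that arithmetic. The helper names `tree_size_digits` and `order_of_magnitude` are made up for illustration and are not part of the talk.

```python
import math


def tree_size_digits(breadth: int, depth: int) -> int:
    """Number of decimal digits in breadth ** depth (hypothetical helper)."""
    return len(str(breadth ** depth))


def order_of_magnitude(breadth: int, depth: int) -> float:
    """log10 of breadth ** depth, computed without building the big integer."""
    return depth * math.log10(breadth)


# Breadth and depth values taken from the slide above.
chess_digits = tree_size_digits(35, 80)
go_digits = tree_size_digits(250, 150)

print(f"Chess: 35 ** 80 has {chess_digits} digits "
      f"(about 10^{order_of_magnitude(35, 80):.0f})")
print(f"Go: 250 ** 150 has {go_digits} digits "
      f"(about 10^{order_of_magnitude(250, 150):.0f})")
```

Both numbers have hundreds of digits, which is why enumerating every line of play is out of reach for either game.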

Alpha Go Second Approach: RL
• Two ways to reduce the effective search space:
Truncate the depth of the search: replace the subtree below a position with an approximate value, V(s) ≈ V*(s)
Reduce the breadth of the search with the policy: P(a|s)
MCTS rolls out the moves chosen by the policy function and evaluates positions with the approximate value function.
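
The two reductions can be sketched on a toy game. This is a minimal illustration under stated assumptions, not AlphaGo's search: it uses a plain depth-limited negamax instead of MCTS, the game is a small Nim variant (take 1 to 3 stones, whoever takes the last stone wins), and `value_estimate` and `policy_prior` are hand-written stand-ins for the value and policy networks.

```python
MOVES = (1, 2, 3)  # toy Nim: take 1-3 stones; taking the last stone wins


def legal_moves(stones):
    return [m for m in MOVES if m <= stones]


def value_estimate(stones):
    """Stand-in for V(s), seen from the player to move.

    In this toy game the exact value is known: multiples of 4 are lost.
    """
    return -1.0 if stones % 4 == 0 else 1.0


def policy_prior(stones):
    """Stand-in for P(a|s): favours moves that leave a multiple of 4."""
    scores = {m: (3.0 if (stones - m) % 4 == 0 else 1.0)
              for m in legal_moves(stones)}
    total = sum(scores.values())
    return {m: s / total for m, s in scores.items()}


def search(stones, depth, top_k):
    """Negamax with truncated depth and policy-limited breadth.

    Returns (value for the player to move, best move or None).
    """
    if stones == 0:
        return -1.0, None  # the previous player took the last stone
    if depth == 0:
        return value_estimate(stones), None  # depth cut: V(s) replaces the subtree
    prior = policy_prior(stones)
    # Breadth cut: only expand the top_k moves ranked by the policy.
    candidates = sorted(prior, key=prior.get, reverse=True)[:top_k]
    best_value, best_move = -float("inf"), None
    for move in candidates:
        child_value, _ = search(stones - move, depth - 1, top_k)
        if -child_value > best_value:
            best_value, best_move = -child_value, move
    return best_value, best_move


print(search(10, depth=2, top_k=2))
```

With 10 stones, a search only two plies deep and two moves wide returns `(1.0, 2)`: take two stones and leave the opponent on a multiple of 4. The quality of the result depends entirely on how good the two stand-in functions are, which is the role the trained networks play in the real system.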

AlphaGo Zero: Re-Evolution version
• Trained with reinforcement learning only
• Favors less-explored moves through an exploration bonus: u(s,a)
• Just one neural network for both policy and value.
• The neural network is retrained using the results of each search.
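
The exploration term u(s,a) can be sketched as follows. This follows the selection rule described for AlphaGo Zero, where the search picks the move maximising Q(s,a) + u(s,a) with u(s,a) = c_puct · P(s,a) · sqrt(Σ_b N(s,b)) / (1 + N(s,a)). The class name `EdgeStats`, the function names, and the value `c_puct=1.0` are assumptions made for this example.

```python
import math
from dataclasses import dataclass


@dataclass
class EdgeStats:
    """Statistics stored for one move (edge) in the search tree."""
    prior: float              # P(s, a) from the policy head
    visits: int = 0           # N(s, a)
    total_value: float = 0.0  # W(s, a), sum of values backed up through this edge

    @property
    def mean_value(self) -> float:
        """Q(s, a); defined as 0 for an unvisited edge."""
        return self.total_value / self.visits if self.visits else 0.0


def exploration_bonus(edge, parent_visits, c_puct=1.0):
    """u(s, a): large for high-prior moves that have been visited rarely."""
    return c_puct * edge.prior * math.sqrt(parent_visits) / (1 + edge.visits)


def select_move(edges, c_puct=1.0):
    """Pick the move with the highest Q(s, a) + u(s, a)."""
    parent_visits = sum(e.visits for e in edges.values())
    return max(
        edges,
        key=lambda a: edges[a].mean_value
        + exploration_bonus(edges[a], parent_visits, c_puct),
    )


edges = {
    "a": EdgeStats(prior=0.5, visits=10, total_value=5.0),
    "b": EdgeStats(prior=0.5, visits=0),
}
print(select_move(edges))
```

Here move "b" is selected even though "a" has the better average so far, because "b" has never been visited and its bonus dominates. As visit counts grow, the bonus shrinks and the choice is driven by Q(s,a).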

AlphaGo Zero: Re-Evolution version
• Human games were noisy and not reliable.
• It does not use rollouts to predict who will win.

Alpha Zero: New Challenges
AlphaGo Zero vs. AlphaZero (AlphaGo Zero to the left of each ×, AlphaZero to the right):
• Binary outcome (win / loss) × expected outcome (including draws or potentially other outcomes)
• Board positions transformed before being passed to the neural network (by a randomly selected rotation or reflection) × no data augmentation
• Games generated by the best player from previous iterations (replaced only when a new player won by a margin of 55 %) × continual updates using the latest parameters (without the evaluation and selection steps)
• Hyper-parameters tuned by Bayesian optimisation × the same hyper-parameters reused without game-specific tuning
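
The rotation and reflection step mentioned in the second bullet can be illustrated with a small sketch. A square board has eight symmetries (four rotations, each with or without a mirror image), which holds for Go but not for chess or shogi. The code below uses plain Python lists, and all function names are invented for this example.

```python
import random


def rotate_clockwise(board):
    """Rotate a square board (list of rows) by 90 degrees clockwise."""
    return [list(row) for row in zip(*board[::-1])]


def reflect_horizontal(board):
    """Mirror the board left to right."""
    return [row[::-1] for row in board]


def symmetries(board):
    """Return the 8 rotations and reflections of a square board."""
    result = []
    current = [list(row) for row in board]
    for _ in range(4):
        result.append(current)
        result.append(reflect_horizontal(current))
        current = rotate_clockwise(current)
    return result


def random_symmetry(board, rng=random):
    """Pick one of the 8 equivalent views at random, as a Go-style augmentation."""
    return rng.choice(symmetries(board))


board = [[1, 2],
         [3, 4]]
for view in symmetries(board):
    print(view)
```

In a real pipeline the same transformation would also have to be applied to the move probabilities so that they still line up with the transformed board.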

Alpha Zero: DIY
https://medium.com/applied-data-science/how-to-build-your-own-alphazero-ai-using-python-and-keras-7f664945c188

Takeaways
RL is more than Atari games and Go

Takeaways
AI discovers new ways to play.
Think about new projects like protein folding.

Takeaways
We’re living in awesome times.
AI papers, tools, models, etc. are being shared more than at any time before.

Takeaways
As Fei-Fei Li said: “It’s about democratizing AI”

Takeaways
Watch this Documentary Film about Alpha Go:
