Multi-agent RL
in
Sequential Social Dilemmas
Paper Review
MARL in SSD
• Multi Agent Reinforcement Learning
• Sequential Social Dilemmas
=> Understanding Agent Cooperation
=> In sequential situations ( with the mixed incentive structure of matrix game social dilemmas ),
agents learn policies.
Sequential situation
Fruit Gathering
Wolfpack Hunting
Social Dilemma
• A social dilemma is a situation in which an
individual profits from selfishness unless everyone
chooses the selfish alternative, in which case the
whole group loses => Represent with Matrix game
Matrix Game – Prisoner’s Dilemma
Matrix Game Social Dilemma == MGSD
Actions : Betray ( defect ) / Cooperate
Both betray : Nash Equilibrium, the choice rational agents make
Both cooperate : the best choice from a global perspective
( Think of the rewards as negative )
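A minimal sketch of the slide's point, assuming hypothetical negative payoffs (the numbers are illustrative and not taken from the slides): it checks every action pair for the Nash equilibrium condition and, separately, finds the pair with the best total reward.

```python
from itertools import product

# Hypothetical payoffs (negative rewards); not taken from the slides.
# PAYOFF[(a1, a2)] = (reward of player 1, reward of player 2)
C, D = "cooperate", "betray"
ACTIONS = (C, D)
PAYOFF = {
    (C, C): (-1, -1),
    (C, D): (-3, 0),
    (D, C): (0, -3),
    (D, D): (-2, -2),
}

def is_nash(a1, a2):
    """True if neither player can gain by unilaterally changing its action."""
    r1, r2 = PAYOFF[(a1, a2)]
    best1 = all(PAYOFF[(alt, a2)][0] <= r1 for alt in ACTIONS)
    best2 = all(PAYOFF[(a1, alt)][1] <= r2 for alt in ACTIONS)
    return best1 and best2

# Action pairs from which no single player wants to deviate.
nash = [pair for pair in product(ACTIONS, ACTIONS) if is_nash(*pair)]
# Action pair with the highest summed reward (the global perspective).
best_total = max(PAYOFF, key=lambda pair: sum(PAYOFF[pair]))
```

With these assumed numbers the only equilibrium is mutual betrayal, while mutual cooperation has the highest total reward, which is the gap that makes the game a dilemma.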
MGSD ignores…
1. Real-world social dilemmas are temporally extended
2. Cooperation and defection are labels that apply to policies implementing
strategic decisions
3. Cooperativeness may be a graded quantity
4. Decisions to cooperate or defect occur only quasi-simultaneously, since some
information about what player 2 is starting to do can inform player 1’s decision
and vice versa
5. Decisions must be made despite only having partial information about the
state of the world and the activities of the other players
Sequential Social Dilemma
SSD
= Markov Games +
Matrix Game Social
Dilemma
SSD – Markov Games
Two-player partially observable Markov game M, defined on a set of states S
Observation function O : S x {1,2} -> R^d
# O_i = { o_i | s in S, o_i = O(s, i) } ( observation space of player i )
Transition function T : S x A_1 x A_2 -> delta(S) ( discrete probability distributions over S )
Reward function r_i : S x A_1 x A_2 -> R
Policy π_i : O_i -> delta(A_i)
== Find MGSD with Reinforcement Learning
State-value function
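For reference, the state-value function named here is commonly written as the expected discounted return of player i under the joint policy. The discount factor γ and the joint-policy notation \vec{\pi} = (\pi_1, \pi_2) are assumptions, since the slide text does not spell them out.

```latex
V_i^{\vec{\pi}}(s_0) = \mathbb{E}\left[ \sum_{t=0}^{\infty} \gamma^{t}\, r_i(s_t, a_{1,t}, a_{2,t})
  \;\middle|\; a_{j,t} \sim \pi_j\big(O(s_t, j)\big),\; s_{t+1} \sim T(s_t, a_{1,t}, a_{2,t}) \right]
```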
SSD – Definition of SSD
Sequential Social Dilemma
Empirical payoff matrix
In the Markov game, the policy changes as the observation changes
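A sketch of how an empirical payoff matrix could be checked against the usual social dilemma inequalities. The four payoff names R (mutual cooperation), P (mutual defection), S (cooperating against a defector) and T (defecting against a cooperator) and the function name are assumptions; they do not appear in the slide text.

```python
def classify_payoffs(R, P, S, T):
    """Classify a 2x2 payoff matrix given as (R, P, S, T)."""
    # Mutual cooperation must beat mutual defection, being exploited,
    # and an even split of exploiting and being exploited.
    is_dilemma = R > P and R > S and 2 * R > T + S
    greed = T > R   # tempted to exploit a cooperator
    fear = P > S    # afraid of being exploited by a defector
    if not is_dilemma or not (greed or fear):
        return "not a social dilemma"
    if greed and fear:
        return "Prisoner's Dilemma"
    return "Chicken" if greed else "Stag Hunt"
```

For example, `classify_payoffs(3, 1, 0, 4)` satisfies both the greed and the fear condition, so it is labelled a Prisoner's Dilemma.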
Learning Algorithm
== Deep Multiagent Reinforcement Learning
Use Deep Q-Network
Uniform Dist.
Simulation Method
Game : 2D grid-world
Observation : 3 ( RGB )
x 15 ( forward ) x 10 ( side )
Action :
8 ( 4 moves ( arrow keys ) + rotate left + rotate right
+ use beam + stand still )
Episode : 1000 steps
NN : two hidden layers of 32 units
+ ReLU activations, 8 outputs ( one per action )
Policy : ε-greedy ( ε decreases from 1.0 to 0.1 )
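The exploration policy above can be sketched as follows. The slide gives only the end points 1.0 and 0.1, so the linear shape of the decay and the value of `DECAY_STEPS` are assumptions, and `q_values` stands in for the 8 network outputs.

```python
import random

N_ACTIONS = 8
EPS_START, EPS_END = 1.0, 0.1
DECAY_STEPS = 1_000_000  # hypothetical; the slide gives only the end points

def epsilon_at(step):
    """Interpolate epsilon from EPS_START to EPS_END (linear shape is an assumption)."""
    frac = min(step / DECAY_STEPS, 1.0)
    return EPS_START + frac * (EPS_END - EPS_START)

def select_action(q_values, step, rng=random):
    """Pick a uniformly random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon_at(step):
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

Early in training almost every action is random; once `step` passes `DECAY_STEPS`, nine out of ten actions follow the largest Q-value.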
Result – Gathering
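Tagging gives no reward, but... the laser temporarily removes the other agent
When food ( green ) is plentiful, agents coexist and collect reward;
when it is scarce, they start attacking each other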
Result – Gathering
Touch green ( apple ) : reward +1 ( the apple is temporarily removed )
Beam at the other player : ( tagging )
a player hit twice is removed from the game for N_tagged frames
Apple respawns after N_apple frames
=>
Defecting policy == aggressive ( uses the beam )
Cooperative policy == does not seek to tag the other player
https://www.youtube.com/watch?v=F97lqqpcqsM
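The tagging rule can be illustrated with a small state tracker. This is a sketch only: the value of `N_TAGGED`, the class layout, and the choice to reset the hit counter after a removal are assumptions rather than details from the slides.

```python
from dataclasses import dataclass

N_TAGGED = 5      # hypothetical value; frames a tagged player stays removed
HITS_TO_TAG = 2   # a player is removed after being hit twice

@dataclass
class Player:
    hits: int = 0
    frames_out: int = 0   # > 0 while the player is removed from the game

    @property
    def active(self):
        return self.frames_out == 0

def hit_by_beam(player):
    """Register one beam hit; the second hit removes the player for N_TAGGED frames."""
    if not player.active:
        return
    player.hits += 1
    if player.hits >= HITS_TO_TAG:
        player.hits = 0   # assumption: the counter restarts after removal
        player.frames_out = N_TAGGED

def tick(player):
    """Advance one frame, counting down the removal timer."""
    if player.frames_out > 0:
        player.frames_out -= 1
```

An apple's respawn after N_apple frames could be tracked with the same countdown pattern.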
Result – Gathering
*After training for 40 million steps for each option
[ Figure : aggressiveness ( from low to highly aggressive ) as a function of abundance and conflict cost ]
RL to SSD
1. Train policies in different games ( environment settings )
2. Extract the trained policies from step 1
3. Calculate the MGSD ( empirical payoff matrix )
4. Repeat 2-3 until convergence
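Step 3 can be sketched as averaging returns over pairings of trained policies. Everything named here is hypothetical: `play_episode` stands in for running one episode with two policies and returning both returns, and the policies are assumed to be already labelled cooperative or defecting.

```python
from statistics import mean

def empirical_payoffs(coop_policies, defect_policies, play_episode):
    """Estimate (R, P, S, T) by pairing policies and averaging player-1 returns.

    play_episode(p1, p2) is a hypothetical callable returning (return_1, return_2).
    """
    def avg_return(group_1, group_2):
        return mean(play_episode(p1, p2)[0] for p1 in group_1 for p2 in group_2)

    R = avg_return(coop_policies, coop_policies)      # both cooperate
    P = avg_return(defect_policies, defect_policies)  # both defect
    S = avg_return(coop_policies, defect_policies)    # cooperator meets defector
    T = avg_return(defect_policies, coop_policies)    # defector meets cooperator
    return R, P, S, T
```

The resulting four numbers form the 2x2 matrix game that is then inspected for social dilemma structure.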
Gathering : DRL to SSD
Prisoner's Dilemma
or
Non-SSD ( the NE is globally optimal )
Wolfpack
Capturing the prey together gives a higher reward
Wolfpack
r_team : reward when both wolves touch the prey at the same time
radius : capture radius ( collision size )
== difficulty of capture
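One way to read the two parameters is sketched below. The lone-capture reward `r_lone`, the Euclidean distance check, and the function signature are assumptions added for illustration; the slide names only r_team and the capture radius.

```python
import math

def wolfpack_rewards(prey, other_wolf, radius, r_team, r_lone):
    """Return (reward_capturer, reward_other) at the moment one wolf touches the prey.

    Positions are (x, y) grid coordinates. Using Euclidean distance and
    paying r_lone for a solo capture are assumptions of this sketch.
    """
    other_in_radius = math.dist(prey, other_wolf) <= radius
    if other_in_radius:
        # Both wolves are close enough: the capture counts as a team capture.
        return r_team, r_team
    return r_lone, 0.0
```

A larger radius makes team captures easier to achieve, which matches the slide's reading of the radius as the difficulty of capture.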
Wolfpack SSD
Material Link
• https://arxiv.org/pdf/1702.03037.pdf
• https://deepmind.com/blog/understanding-agent-cooperation/
