SlideShare ist ein Scribd-Unternehmen logo
1 von 43
Downloaden Sie, um offline zu lesen
ADAPTIVE HIGH-LEVEL
STRATEGY LEARNING IN
STARCRAFT
SBGames 2013
Jiéverson Maissiat and Felipe Meneguzzi
Background
Reinforcement Learning
A learning technique for agents acting in the
environment, observing the reached states and
the received rewards at each step
Reinforcement Learning
Q-Learning
Updates the value of a state-action pair after
the action has been taken in the state and an
immediate reward has been received
Q-Learning

Q(s,a)

●
●
●

s represents the current state of the world;
a represents the action chosen by the agent;
Q(s,a) represents the value obtained the last time action a was executed at state s. This value is
often called Q-value.
Q-Learning

Q(s,a)

●

r

r represents the reward obtained after performing action a in state s;
Q-Learning

Q(s,a)

●

fit is responsible to adjust the reward;

fit( r )
Q-Learning

Q(s,a) = Q(s,a)

●

+

The Q-value is increased with the adjusted reward;

fit( r )
Q-Learning

Q(s,a) = Q(s,a)

●

+

α

*

r

α is the learning-rate, which determines the weight of new information over what the agent
already knows: a factor of 0 prevents the agent from learning anything (by keeping the Q-value
identical to its previous value) whereas a factor of 1 makes the agent consider all newly obtained
information;
Q-Learning

Q(s,a) =

●

0

If the Q-value starts in 0...

+

α

*

r
Q-Learning

Q(s,a) =

●

0

and the received reward is 2...

+

α

*

2
Q-Learning

Q(s,a) =

●

0

with a learning-rate of 0.3...

+

0.3 *

2
Q-Learning

0.6

●

=

0

than the Q-value is updated to 0.6;

+

0.3 *

2
Q-Learning

-2.4

●

=

0.6

+

Others updates will be done over the first;

0.3 *

-10
Q-Learning

Q(s,a) =

●

And so it follows.

-2.4

+

0.3 *

r
Parameter Control
α / learning-rate
Influence how the state values are computed,
and ultimately how a policy is generated
Learning Rate: α
●

High Learning-rate?
It can prevent the algorithm from converging, or lead to inaccuracies in the
computed value of each state (e.g. a local maximum);

●

Low Learning-rate?
It can hampers the convergence of the Q-value;

●

Varying the Learning-rate?
A typical way to vary the α-value, is to start interactions with a value close
to 1, and then decrease it over time toward 0.However, this approach is not
effective for dynamic environments, since a drastic change in the
environment with a learning-rate close to 0 prevents the agent from
learning the optimal policy in the changed environment.
Exploration Policy
ε-greedy

It has the probability ε to select a random action, and a
probability 1 - ε to select the optimal action (which has the
highest Q-value)
Meta-Level Reasoning
Meta-Level Reasoning
StarCraft:
Brood War
StarCraft
●

Real-Time Strategy
(RTS)

●

Resource Management

●

Battle!
StarCraft
StarCraft
META-LEVEL
REINFORCEMENT
LEARNING
Meta-Level Reasoning
Meta-Level Reasoning

Reinforcement
Learning
Meta-Level Learning
Meta-Level Reasoning
ε-greedy
Meta-Level Action Selection
ε-greedy modified:
Instead of keeping the exploitation-rate (ε-value) constant, we apply the
same meta-level reasoning to the ε-value, increasing the exploration rate,
whenever we find that the agent must increase its learning-rate.

The more the agent wants to learn, the more it wants to
explore; if there is nothing to learn, there is nothing to
explore.

ε=α
Experiments
in StarCraft
Code Injection at StarCraft
● BWAPI
○ Framework in C++ to inject code at
StarCraft;
● BTHAI
○ Bot implemented in BWAPI, with different
high-level strategies;
A Reinforcement Learning
Approach for StarCraft
● High-Level Strategy
○ Learning and selection of strategies rather
than micro-actions;
● Learn the end of each match
○ Feedback of victory: +1
○ Feedback of defeat: -1
● Terran vs CPU
○ It was assumed that the agent plays only
as Terran;
List of Terran Strategies
of BTHAI
● Marine Rush
● Wraith Harass
● Terran Defensive
● Terran Defensive FB
● Terran Push
Results
Source Code
BWAPI & StarCraft
https://github.com/jieverson/BTHAIMOD
Thank You

contact@jieveron.com
Porto Alegre, 18 de outubro de 2013

Weitere ähnliche Inhalte

Ähnlich wie Adaptive High-Level Strategy Learning in StarCraft

Reinforcement learning
Reinforcement learningReinforcement learning
Reinforcement learningElias Hasnat
 
An efficient use of temporal difference technique in Computer Game Learning
An efficient use of temporal difference technique in Computer Game LearningAn efficient use of temporal difference technique in Computer Game Learning
An efficient use of temporal difference technique in Computer Game LearningPrabhu Kumar
 
Reinforcement learning, Q-Learning
Reinforcement learning, Q-LearningReinforcement learning, Q-Learning
Reinforcement learning, Q-LearningKuppusamy P
 
Survey of Modern Reinforcement Learning
Survey of Modern Reinforcement Learning Survey of Modern Reinforcement Learning
Survey of Modern Reinforcement Learning Julia Maddalena
 
Reinforcement learning
Reinforcement learningReinforcement learning
Reinforcement learningDing Li
 
Reinforcement Learning Guide For Beginners
Reinforcement Learning Guide For BeginnersReinforcement Learning Guide For Beginners
Reinforcement Learning Guide For Beginnersgokulprasath06
 
24.09.2021 Reinforcement Learning Algorithms.pptx
24.09.2021 Reinforcement Learning Algorithms.pptx24.09.2021 Reinforcement Learning Algorithms.pptx
24.09.2021 Reinforcement Learning Algorithms.pptxManiMaran230751
 
Reinforcement Learning on Mine Sweeper
Reinforcement Learning on Mine SweeperReinforcement Learning on Mine Sweeper
Reinforcement Learning on Mine SweeperDataScienceLab
 
Reinfrocement Learning
Reinfrocement LearningReinfrocement Learning
Reinfrocement LearningNatan Katz
 
reiniforcement learning.ppt
reiniforcement learning.pptreiniforcement learning.ppt
reiniforcement learning.pptcharusharma165
 
Reinforcement Learning
Reinforcement LearningReinforcement Learning
Reinforcement LearningSVijaylakshmi
 
Head First Reinforcement Learning
Head First Reinforcement LearningHead First Reinforcement Learning
Head First Reinforcement Learningazzeddine chenine
 
Reinforcement learning
Reinforcement learning Reinforcement learning
Reinforcement learning Chandra Meena
 
RL_online _presentation_1.ppt
RL_online _presentation_1.pptRL_online _presentation_1.ppt
RL_online _presentation_1.pptssuser43a599
 
Lecture notes
Lecture notesLecture notes
Lecture notesbutest
 
これからの強化学習第1章 1.1 1.2
これからの強化学習第1章 1.1 1.2これからの強化学習第1章 1.1 1.2
これからの強化学習第1章 1.1 1.2Takashi Tamura
 
RL_Dr.SNR Final ppt for Presentation 28.05.2021.pptx
RL_Dr.SNR Final ppt for Presentation 28.05.2021.pptxRL_Dr.SNR Final ppt for Presentation 28.05.2021.pptx
RL_Dr.SNR Final ppt for Presentation 28.05.2021.pptxdeeplearning6
 

Ähnlich wie Adaptive High-Level Strategy Learning in StarCraft (20)

Reinforcement learning
Reinforcement learningReinforcement learning
Reinforcement learning
 
An efficient use of temporal difference technique in Computer Game Learning
An efficient use of temporal difference technique in Computer Game LearningAn efficient use of temporal difference technique in Computer Game Learning
An efficient use of temporal difference technique in Computer Game Learning
 
Reinforcement learning, Q-Learning
Reinforcement learning, Q-LearningReinforcement learning, Q-Learning
Reinforcement learning, Q-Learning
 
CS799_FinalReport
CS799_FinalReportCS799_FinalReport
CS799_FinalReport
 
Survey of Modern Reinforcement Learning
Survey of Modern Reinforcement Learning Survey of Modern Reinforcement Learning
Survey of Modern Reinforcement Learning
 
RL.ppt
RL.pptRL.ppt
RL.ppt
 
Reinforcement learning
Reinforcement learningReinforcement learning
Reinforcement learning
 
Reinforcement Learning Guide For Beginners
Reinforcement Learning Guide For BeginnersReinforcement Learning Guide For Beginners
Reinforcement Learning Guide For Beginners
 
24.09.2021 Reinforcement Learning Algorithms.pptx
24.09.2021 Reinforcement Learning Algorithms.pptx24.09.2021 Reinforcement Learning Algorithms.pptx
24.09.2021 Reinforcement Learning Algorithms.pptx
 
Reinforcement Learning on Mine Sweeper
Reinforcement Learning on Mine SweeperReinforcement Learning on Mine Sweeper
Reinforcement Learning on Mine Sweeper
 
Reinfrocement Learning
Reinfrocement LearningReinfrocement Learning
Reinfrocement Learning
 
reiniforcement learning.ppt
reiniforcement learning.pptreiniforcement learning.ppt
reiniforcement learning.ppt
 
Reinforcement Learning
Reinforcement LearningReinforcement Learning
Reinforcement Learning
 
Head First Reinforcement Learning
Head First Reinforcement LearningHead First Reinforcement Learning
Head First Reinforcement Learning
 
Reinforcement learning
Reinforcement learning Reinforcement learning
Reinforcement learning
 
YijueRL.ppt
YijueRL.pptYijueRL.ppt
YijueRL.ppt
 
RL_online _presentation_1.ppt
RL_online _presentation_1.pptRL_online _presentation_1.ppt
RL_online _presentation_1.ppt
 
Lecture notes
Lecture notesLecture notes
Lecture notes
 
これからの強化学習第1章 1.1 1.2
これからの強化学習第1章 1.1 1.2これからの強化学習第1章 1.1 1.2
これからの強化学習第1章 1.1 1.2
 
RL_Dr.SNR Final ppt for Presentation 28.05.2021.pptx
RL_Dr.SNR Final ppt for Presentation 28.05.2021.pptxRL_Dr.SNR Final ppt for Presentation 28.05.2021.pptx
RL_Dr.SNR Final ppt for Presentation 28.05.2021.pptx
 

Kürzlich hochgeladen

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 

Kürzlich hochgeladen (20)

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 

Adaptive High-Level Strategy Learning in StarCraft