7. General Video Game AI: a Multi-Track Framework for Evaluating Agents, Games and Content Generation Algorithms
Diego Perez-Liebana, Jialin Liu, Ahmed Khalifa, Raluca D. Gaina, Julian Togelius, Simon M. Lucas
https://arxiv.org/pdf/1802.10363
http://www.gvgai.net
10. Malmo design principles
• Beyond “narrow AI” with multi-task learning
• Wired for multi-agent tasks (including human agents)
11. Use Cases and Design Principles
• …into the game through an intuitive yet powerful API – building on existing Minecraft capabilities
• Built for extensions and novel uses – open source; “plug-and-play” design of observation, command, reward handlers
• Low entry barrier: provide cross-language (currently: Java, .NET, C/C++, Python, Lua) & cross-platform (Windows, Linux, MacOS) API
29. MARLO: Motivation
• General-reward settings are the most realistic for many real-world
applications but are also notoriously challenging
• More research is needed on insights and approaches that generalize beyond individual tasks and opponent types
• The cost of creating tasks and opponents amortizes as both can be shared by
a large community
30. Overview
• Participants develop agents which play tasks on Malmo platform
• The agents play multiple games with different scenarios
• Each game has a different set of multi-agent tasks for training, validation and
final test
• Participants use those tasks to train and validate their agents
• The agents play the final test task to determine the winner of MARLO in a
tournament
33. Evaluation
• Each league (P players in a group) is played across the same N games, with T
repetitions on the private task of each game.
• Each game has its own leaderboard, ranking entries and awarding points: 25
points for the 1st, 18 for the 2nd, 15, 12, 10, 8, 6, 4, 2, 1 and 0 for the 11th
onwards.
• The final ranking for each league is determined by summing points across all
games.
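The scoring scheme above can be sketched as a short helper (a minimal sketch: the function names are mine, the point values are taken from the rules):

```python
# Points awarded per rank on a single game's leaderboard:
# 25 for 1st, 18 for 2nd, 15, 12, 10, 8, 6, 4, 2, 1, and 0 from 11th onwards.
POINTS = [25, 18, 15, 12, 10, 8, 6, 4, 2, 1]


def game_points(rank):
    """Points for a 1-indexed rank on one game's leaderboard."""
    return POINTS[rank - 1] if rank <= len(POINTS) else 0


def league_score(ranks):
    """Final league score: sum of points across all N games."""
    return sum(game_points(r) for r in ranks)
```

For example, a team ranked 1st, 3rd and 11th across three games would score 25 + 15 + 0 = 40 points.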
34. Schedule (draft)
• Same version as multi-agent tasks but using bots, which run locally
• Top 32 evaluated teams are invited to the final round
• Multi-agent games in remote server for final tournament
• Live competition!!
35. Participation: Eligibility
• A team consists of up to five participants
• Participants must be 18 years of age or older. If any team member is 18 or older but is considered a minor in their place of residence, they should obtain their parent's or legal guardian's permission prior to submitting an entry into the Competition
• Award: available only to participants affiliated with a university or a non-profit research organization
36. What you get from the competition
• Award
• 1st place: 10,000 USD-equivalent Azure plus a travel grant to join a relevant academic
conference or workshop.
• 2nd place: 5,000 USD-equivalent Azure.
• 3rd place: 3,000 USD-equivalent Azure.
• Publication
• The top three entries will be invited as co-authors of a paper summarizing the competition structure, rules, approaches, results and main take-aways.
54. Qualifying Task
• MarLo-FindTheGoal-v0
• 7x7 room
• Goal: find the goal ☺
(yellow block)
• Rewards:
• -0.01 per command (maximum of 100 commands)
• +1.0 for finding the goal
• -0.1 for running out of time
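The reward arithmetic above can be sketched as a small helper (a minimal sketch; the function name is mine, the reward values are from the task description):

```python
# Reward arithmetic for MarLo-FindTheGoal-v0 as listed above:
# -0.01 per command issued, +1.0 for reaching the goal,
# -0.1 if the 100-command budget runs out without finding it.
def episode_return(commands_used, found_goal, max_commands=100):
    """Total reward for one episode of the qualifying task."""
    total = -0.01 * commands_used
    if found_goal:
        total += 1.0
    elif commands_used >= max_commands:
        total -= 0.1  # ran out of time
    return round(total, 2)
```

For example, finding the goal after 40 commands yields -0.4 + 1.0 = 0.6, while using all 100 commands without success yields -1.0 - 0.1 = -1.1.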
56. Project MARLÖ
• Multi-Agent Reinforcement Learning
in Malmo
• Reinforcement learning wrapper built on top of Project Malmo
• Aims to inspire the creation of highly capable general agents through a multi-agent, multi-game environment
• Uses the OpenAI Gym format
• Also on GitHub!
• https://github.com/crowdAI/marLo
71. Agents: a semi-technical view
• Agents in Marlo are simple and work in a very Gym-like format:
• Start up a Minecraft client on port 10000
• Use the “marlo.make()” function to create an environment; this returns a join token
• Use the join token to instantiate the environment for agent use with “marlo.init()”
• Run an agent to play the game
• We have seen a sample random agent that plays any game it connects to
• We also provide examples of more complex agents:
• ChainerRL agents (DQN, PPO)
• TensorBoard-Chainer plotting compatible
• Other frameworks (TensorFlow, KerasRL, PyBrain) are also possible – the only requirement is that they comply with the Gym API
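The steps above can be sketched roughly as follows (a minimal sketch: the `marlo.make()`/`marlo.init()` wiring follows the Marlo documentation, `run_random_agent` is a hypothetical helper, and a Minecraft client must already be listening on port 10000):

```python
def run_random_agent(env, max_steps=100):
    """Play one episode with uniformly random actions; return total reward.

    Works with any environment that follows the Gym API:
    reset(), step(action) -> (obs, reward, done, info), action_space.sample().
    """
    env.reset()
    total_reward, done, steps = 0.0, False, 0
    while not done and steps < max_steps:
        action = env.action_space.sample()         # pick a random action
        _obs, reward, done, _info = env.step(action)
        total_reward += reward
        steps += 1
    return total_reward


def make_marlo_env():
    """Create a MarLo environment (requires a running Minecraft client
    on port 10000; see the Marlo docs for details)."""
    import marlo
    join_tokens = marlo.make(
        "MarLo-FindTheGoal-v0",
        params={"client_pool": [("127.0.0.1", 10000)]},
    )
    return marlo.init(join_tokens[0])
```

Usage: `run_random_agent(make_marlo_env())` plays one random episode against a local Minecraft client.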
78. Experiments
• A simple script that trains an agent over a set number of steps and episodes is provided within the Marlo package
• The underlying functionality is simple: at the beginning of training, reset
the environment:
79. Experiments
• Main loop with stopping condition:
• Episode ends or maximum number of steps reached
80. Experiments
• Log results of the episode
• We incorporate an example to plot using Tensorboard-Chainer
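The three steps described in slides 78-80 (reset the environment, main loop with stopping condition, log the episode) could be sketched like this (a minimal sketch: the function name and the random-action placeholder are mine; in real experiments you would plug in a ChainerRL agent and Tensorboard-Chainer logging instead):

```python
def train(env, n_episodes=10, max_steps=100, log=print):
    """Skeleton of the training script described in slides 78-80."""
    episode_rewards = []
    for episode in range(n_episodes):
        env.reset()                                # slide 78: reset at episode start
        total_reward, done, steps = 0.0, False, 0
        # Slide 79: stop when the episode ends or max_steps is reached
        while not done and steps < max_steps:
            action = env.action_space.sample()     # placeholder agent
            _obs, reward, done, _info = env.step(action)
            total_reward += reward
            steps += 1
        # Slide 80: log the episode's result
        log(f"episode {episode}: reward={total_reward:.2f} steps={steps}")
        episode_rewards.append(total_reward)
    return episode_rewards
```

The `log` callback is where Tensorboard-Chainer plotting would hook in.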
81. Plotting results (Tensorboard-Chainer)
• Works much like your typical
Tensorboard, only it’s
abstracted to work with
Chainer
• Can be used to gather
images, text, audio and
histograms
82. Submission
1. Create a private repository on gitlab.crowdai.org. It must contain:
• Dockerfile that installs dependencies and sets up everything
• crowdai.json file with these mandatory fields:
• challenge_id - "marLo"
• grader_id - "marLo"
• author - name of the author (string); for teams, please also create a field 'authors' containing a list with all authors
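A minimal crowdai.json matching the mandatory fields above might look like this (the author names are placeholders):

```json
{
  "challenge_id": "marLo",
  "grader_id": "marLo",
  "author": "jane_doe",
  "authors": ["jane_doe", "john_doe"]
}
```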
83. Submission
2. Submitting to crowdAI:
• Create and push a new tag
• Each tag counts as a new submission:
• You will be able to see your AI agent actually play the game and find more details about the evaluation of your submission at:
https://gitlab.crowdai.org/<your-crowdAI-user-name>/marLo/issues
• A video of the game will also be generated and available from the leaderboard
84. Follow the project
• Malmo: @Project_Malmo and website (aka.ms/malmo)
• People on Twitter: @diego_pliebana, @katjahofmann, @MeMohanty
• MARLO GitHub: https://github.com/crowdAI/marLo
• MARLO documentation: https://marlo.readthedocs.io/en/latest/
• Competition website: https://www.crowdai.org/challenges/marlo-2018
• AIIDE 2018 Workshop: https://marlo-ai.github.io/
86. Hands-On Time
1. Install Malmo and Marlo
2. Play the games
3. Execute agents
Doc: https://marlo.readthedocs.io/en/latest/
Code: https://github.com/crowdAI/marLo/
Competition: https://www.crowdai.org/challenges/marlo-2018
We’re here to help!