7. General Video Game AI: a Multi-Track Framework for Evaluating Agents, Games and Content Generation Algorithms
Diego Perez-Liebana, Jialin Liu, Ahmed Khalifa, Raluca D. Gaina, Julian Togelius, Simon M. Lucas
https://arxiv.org/pdf/1802.10363
http://www.gvgai.net
10. Malmo design principles
• Beyond “narrow AI” with multi-task learning
• Wired for multi-agent tasks (including human agents)
11. Use Cases and Design Principles
• …into the game through an intuitive yet powerful API – building on existing Minecraft capabilities
• Built for extensions and novel uses – open source; “plug-and-play” design of observation, command, reward handlers
• Low entry barrier: provide cross-language (currently: Java, .NET, C/C++, Python, Lua) & cross-platform (Windows, Linux, MacOS) API
29. MARLO: Motivation
• General-reward settings are the most realistic for many real-world
applications but are also notoriously challenging
• More research is needed on insights and approaches that generalize beyond individual tasks and opponent types
• The cost of creating tasks and opponents amortizes as both can be shared by
a large community
30. Overview
• Participants develop agents which play tasks on Malmo platform
• The agents play multiple games with different scenarios
• Each game has a different set of multi-agent tasks for training, validation and
final test
• Participants use those tasks to train and validate their agents
• The agents play the final test task to determine the winner of MARLO in a
tournament
33. Evaluation
• Each league (P players in a group) is played across the same N games, with T
repetitions on the private task of each game.
• Each game has its own leaderboard, ranking entries and awarding points: 25
points for the 1st, 18 for the 2nd, 15, 12, 10, 8, 6, 4, 2, 1 and 0 for the 11th
onwards.
• The final ranking for each league is determined by summing points across all
games.
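The scoring scheme above can be sketched as a short helper (a minimal sketch: the function names are mine, the point values are taken from the rules):

```python
# Points awarded per rank on a single game's leaderboard:
# 25 for 1st, 18 for 2nd, 15, 12, 10, 8, 6, 4, 2, 1, and 0 from 11th onwards.
POINTS = [25, 18, 15, 12, 10, 8, 6, 4, 2, 1]


def game_points(rank):
    """Points for a 1-indexed rank on one game's leaderboard."""
    return POINTS[rank - 1] if rank <= len(POINTS) else 0


def league_score(ranks):
    """Final league score: sum of points across all N games."""
    return sum(game_points(r) for r in ranks)
```

For example, a team ranked 1st, 3rd and 11th across three games would score 25 + 15 + 0 = 40 points.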
34. Schedule (draft)
• Same version as multi-agent tasks but using bots, which run locally
• Top 32 evaluated teams are invited to the final round
• Multi-agent games in remote server for final tournament
• Live competition!!
35. Participation: Eligibility
• A team consists of up to five participants
• Participants must be 18 years of age or older. If any team member is 18 or older but is considered a minor in their place of residence, they should obtain their parent's or legal guardian's permission prior to submitting an entry into the Competition
• Award: available only to participants affiliated with a university or a non-profit research organization
36. What you get from the competition
• Award
• 1st place: 10,000 USD-equivalent Azure plus a travel grant to join a relevant academic
conference or workshop.
• 2nd place: 5,000 USD-equivalent Azure.
• 3rd place: 3,000 USD-equivalent Azure.
• Publication
• The top three entries will be invited as co-authors of a paper summarizing the competition structure, rules, approaches, results and main take-aways.
54. Qualifying Task
• MarLo-FindTheGoal-v0
• 7x7 room
• Goal: find the goal ☺
(yellow block)
• Rewards:
• -0.01 per command (maximum of 100 commands)
• +1.0 for finding the goal
• -0.1 for running out of time
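The reward arithmetic above can be sketched as a small helper (a minimal sketch; the function name is mine, the reward values are from the task description):

```python
# Reward arithmetic for MarLo-FindTheGoal-v0 as listed above:
# -0.01 per command issued, +1.0 for reaching the goal,
# -0.1 if the 100-command budget runs out without finding it.
def episode_return(commands_used, found_goal, max_commands=100):
    """Total reward for one episode of the qualifying task."""
    total = -0.01 * commands_used
    if found_goal:
        total += 1.0
    elif commands_used >= max_commands:
        total -= 0.1  # ran out of time
    return round(total, 2)
```

For example, finding the goal after 40 commands yields -0.4 + 1.0 = 0.6, while using all 100 commands without success yields -1.0 - 0.1 = -1.1.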
56. Project MARLÖ
• Multi-Agent Reinforcement Learning
in Malmo
• Reinforcement learning wrapper built on top of Project Malmo
• Aims to inspire the creation of highly capable general agents through a multi-agent, multi-game environment
• Uses the OpenAI Gym format
• Also on GitHub!
• https://github.com/crowdAI/marLo
71. Agents: a semi-technical view
• Agents in Marlo are simple and work in a very Gym-like format:
• Start up a Minecraft client on port 10000
• Use the “marlo.make()” function to create an environment; this returns a join token
• Use the join token to instantiate the environment for agent use with “marlo.init()”
• Run an agent to play the game
• We have seen a sample random agent that plays any game it connects to
• We also provide examples of more complex agents:
• ChainerRL agents (DQN, PPO)
• TensorBoard-Chainer plotting compatible
• Other frameworks (TensorFlow, KerasRL, PyBrain) are also possible – the only requirement is that they comply with the Gym API
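The steps above can be sketched roughly as follows (a minimal sketch: the `marlo.make()`/`marlo.init()` wiring follows the Marlo documentation, `run_random_agent` is a hypothetical helper, and a Minecraft client must already be listening on port 10000):

```python
def run_random_agent(env, max_steps=100):
    """Play one episode with uniformly random actions; return total reward.

    Works with any environment that follows the Gym API:
    reset(), step(action) -> (obs, reward, done, info), action_space.sample().
    """
    env.reset()
    total_reward, done, steps = 0.0, False, 0
    while not done and steps < max_steps:
        action = env.action_space.sample()         # pick a random action
        _obs, reward, done, _info = env.step(action)
        total_reward += reward
        steps += 1
    return total_reward


def make_marlo_env():
    """Create a MarLo environment (requires a running Minecraft client
    on port 10000; see the Marlo docs for details)."""
    import marlo
    join_tokens = marlo.make(
        "MarLo-FindTheGoal-v0",
        params={"client_pool": [("127.0.0.1", 10000)]},
    )
    return marlo.init(join_tokens[0])
```

Usage: `run_random_agent(make_marlo_env())` plays one random episode against a local Minecraft client.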
78. Experiments
• A simple script that trains an agent over a set number of steps and episodes is provided within the Marlo package
• The underlying functionality is simple: at the beginning of training, reset
the environment:
79. Experiments
• Main loop with stopping condition:
• Episode ends or maximum number of steps reached
80. Experiments
• Log results of the episode
• We incorporate an example to plot using Tensorboard-Chainer
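The three steps described in slides 78-80 (reset the environment, main loop with stopping condition, log the episode) could be sketched like this (a minimal sketch: the function name and the random-action placeholder are mine; in real experiments you would plug in a ChainerRL agent and Tensorboard-Chainer logging instead):

```python
def train(env, n_episodes=10, max_steps=100, log=print):
    """Skeleton of the training script described in slides 78-80."""
    episode_rewards = []
    for episode in range(n_episodes):
        env.reset()                                # slide 78: reset at episode start
        total_reward, done, steps = 0.0, False, 0
        # Slide 79: stop when the episode ends or max_steps is reached
        while not done and steps < max_steps:
            action = env.action_space.sample()     # placeholder agent
            _obs, reward, done, _info = env.step(action)
            total_reward += reward
            steps += 1
        # Slide 80: log the episode's result
        log(f"episode {episode}: reward={total_reward:.2f} steps={steps}")
        episode_rewards.append(total_reward)
    return episode_rewards
```

The `log` callback is where Tensorboard-Chainer plotting would hook in.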
81. Plotting results (Tensorboard-Chainer)
• Works much like your typical
Tensorboard, only it’s
abstracted to work with
Chainer
• Can be used to gather
images, text, audio and
histograms
82. Submission
1. Create a private repository on gitlab.crowdai.org. It must contain:
• Dockerfile that installs dependencies and sets up everything
• crowdai.json file with these mandatory fields:
• challenge_id - "marLo"
• grader_id - "marLo"
• author - name of the author (string); for teams, please also create a field 'authors' containing a list with all authors
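A minimal crowdai.json matching the mandatory fields above might look like this (the author names are placeholders):

```json
{
  "challenge_id": "marLo",
  "grader_id": "marLo",
  "author": "jane_doe",
  "authors": ["jane_doe", "john_doe"]
}
```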
83. Submission
2. Submitting to crowdAI:
• Create and push a new tag
• Each tag counts as a new submission:
• You will be able to see your AI agent actually play the game and find more details about the evaluation of your submission at:
https://gitlab.crowdai.org/<your-crowdAI-user-name>/marLo/issues
• A video of the game will also be generated and available from the leaderboard
84. Follow the project
• Malmo: @Project_Malmo and website (aka.ms/malmo)
• People on Twitter: @diego_pliebana, @katjahofmann, @MeMohanty
• MARLO GitHub: https://github.com/crowdAI/marLo
• MARLO documentation: https://marlo.readthedocs.io/en/latest/
• Competition website: https://www.crowdai.org/challenges/marlo-2018
• AIIDE 2018 Workshop: https://marlo-ai.github.io/
86. Hands-On Time
1. Install Malmo and Marlo
2. Play the games
3. Execute agents
Doc: https://marlo.readthedocs.io/en/latest/
Code: https://github.com/crowdAI/marLo/
Competition: https://www.crowdai.org/challenges/marlo-2018
We’re here to help!