Slides from the 2017 Copenhagen Azure Saturday session about building the Hitman game backend and services on Microsoft Azure.
Subjects covered are Game development, Cloud Service Architecture, Actor model, Orleans and Analytics.
3. Hitman
• Six games in a nearly 20-year-old
franchise
• Third person, stealth-action puzzle game
• Huge sandbox levels to explore
• Figure out the clockwork, take out the
target
• AAA title, shipping on PC, PS4 and Xbox One
4. Game development
• Creative field
• AAA games
• High risk multi-year project
• Similar to a blockbuster Hollywood movie
• Hitman Absolution took 6 years
• Need to innovate
• Adapt faster to the market
• Keep players engaged with the franchise
5. Hitman as a Service
• Ever expanding, ever evolving
• Digital first
• Episodic
• Listen and react to the community
• Live content
• Iterative development
• Put a minimal version live and iterate
6. Backend
• Extensibility
• Server delivered missions
• Server delivered menus
• Server driven game modes
• Authoritative server
• Scoring
• Progression
• Contracts creation
• Supporting services
• Analytics, authorization, configuration,
administration
• Keep it game driven
8. Azure services
Agent 47’s weapons of choice
WebApps
˃ Shared services & web tools
CloudServices
˃ Game logic and stateful user sessions
Search
˃ UGC discoverability
Storage
˃ Blob, Key value storage & Queues
SQL
˃ Relational storage
StreamAnalytics
˃ Analytic processing
DocumentDB
˃ Object storage
AppInsights
˃ Service monitoring & diagnostics
DataFactory
˃ Analytic pipeline orchestration
HDInsight
˃ On-demand big data processing
EventHub
˃ Metric events pipeline
Redis Cache
˃ Managed Redis
9. Going deeper: Game services
[Architecture diagram]
• Hitman Game Client
• HTTPS
• Game Cloud Service: Service Web Roles, Service Worker Roles (Orleans)
• Azure Search
• Game Service Storage Account: Storage tables, Resource blobs (private), Config blobs (public), Service blobs (private)
• Azure Redis Cache
• Game Service SQL DB
10. Event-based evaluation
• Progression, Scoring and more
• Examples: NPC Kill, Disguise, Trespassing,
Shots fired
• C# ”Scripts” evaluation
• Bigger evaluation at the end of a session
• Custom state machine definition
• Executed both client & server side
12. Orleans
• Open source framework
• Actor model for .NET
• Isolated, persistent, lightweight, single-
threaded objects
• Dynamically instantiated
• Blocking concurrency at the actor level
• Each player is an actor
• Great way to scale: run more actors!
• Isolate user sessions
• Automated lifecycle
13. Service Fabric
• Same approach, but fully managed
• Actors running on a scalable cluster
• Great development environment
25. Looking back
• More than a year being live
• More than one client update per month
• Even more server updates
• Live content coming out every week
• Learnings
• Azure runs great!
• Automation & process are key
• Operating on a live product
26. Looking back
more learnings
• Beware when using small VM sizes
• Port exhaustion comes quickly (~1k limit)
• Be careful with HttpClient instances
• CloudService load-balancing
• Renegotiate persistent connections regularly
• ARM templates are great
• Once you find examples and documentation
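One possible reading of the HttpClient and persistent-connection bullets above is: share one client instead of opening a connection per request (to avoid exhausting ports), but replace it periodically so the load balancer gets a chance to route to another instance. The sketch below shows that idea in Python, not C#; RecyclingConnection, the factory and the 60 second age limit are all hypothetical, and closing the old connection is left out for brevity.

```python
import time


class RecyclingConnection:
    """Hands out one shared connection and replaces it once it gets too old."""

    def __init__(self, factory, max_age_seconds, clock=time.monotonic):
        self._factory = factory          # hypothetical callable creating a connection
        self._max_age = max_age_seconds
        self._clock = clock
        self._conn = None
        self._created_at = 0.0

    def get(self):
        now = self._clock()
        if self._conn is None or now - self._created_at >= self._max_age:
            # Reconnect, so a load balancer can pick another instance.
            self._conn = self._factory()
            self._created_at = now
        return self._conn


fake_now = [0.0]   # injected clock so the example does not need to sleep
created = []


def make_connection():
    created.append(object())   # stand-in for a real connection object
    return created[-1]


pool = RecyclingConnection(make_connection, max_age_seconds=60,
                           clock=lambda: fake_now[0])
a = pool.get()
b = pool.get()       # reused: no new connection per request
fake_now[0] = 61.0
c = pool.get()       # older than the limit: replaced
print(a is b, c is a, len(created))
```

The clock is injected only to make the behavior easy to check; a real service would use the default monotonic clock.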
Good afternoon, and thanks for coming to this session.
I am Christian Corsano, I serve as Lead Online Programmer at Io-Interactive, and I am here to talk about how we took Hitman from a traditional game to a running online experience, and how Microsoft Azure helped us to do that.
But first, let’s introduce the game with a small video.
Hitman is nearly 20 years old now, and our new release is the sixth game in the franchise.
It can best be described as a third-person stealth-action puzzle game, where you explore massive sandbox levels and, through extensive replay, figure out their intricate clockwork and use it to your advantage to take out your targets.
It is a major title, targeting PC, PS4 and Xbox One.
To give a bit of context, let’s take one step back.
We called our approach ”Hitman as a Service”
Hitman is the first digital-first, episodic AAA game.
Our huge sandbox levels are made to be explored over and over again, discovering the many layers of content hidden deep inside them, and we deliver them in episodes to give players the time to do so.
It is also an opportunity to constantly listen to the community and deliver custom tailored content based on their feedback.
We have also released content every week for the past year, including Elusive Targets, where you get a limited time to take out a hidden target in a modified location.
You have one shot: if you miss it, you will not be able to restart the mission, and if you finish it, you will not be able to improve your score.
This means that Hitman had to move from a very traditional boxed release to something extremely flexible and reactive.
Having this constant stream of releases and competitive modes called for a brand new online backend.
Of course Hitman is not an MMO, and the data is installed on the player’s hard drive, but within these levels the objective definitions are delivered from the server, allowing us to build and deliver more content very easily.
The game menus which present the content are also assembled from server data, allowing us to tweak them live if needed.
Game modes such as Elusive Targets need very tight time constraints and fair evaluation, which is provided by the backend as well.
These are all requirements very specific to our game and the model we chose, but more generally any modern game requires analytics, and might use some more common online services.
We are still game developers though, and we need to focus on building the game features.
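To make the Elusive Target constraints mentioned above concrete, here is a minimal sketch of a server-side check, written in Python rather than the C# of the real backend. The ElusiveTarget class, its fields and the dates are hypothetical; the point is only that the time window and the single attempt are decided by the server, not the client.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass
class ElusiveTarget:
    """Hypothetical server-side record for one time-limited mission."""
    opens_at: datetime
    closes_at: datetime
    attempted_by: set = field(default_factory=set)  # players who used their one shot

    def try_start(self, player_id: str, now: datetime) -> bool:
        """Return True and consume the attempt only if the server allows it."""
        if not (self.opens_at <= now < self.closes_at):
            return False  # outside the live window, judged by the server clock
        if player_id in self.attempted_by:
            return False  # one shot only: no restart, no score improvement
        self.attempted_by.add(player_id)
        return True


opens = datetime(2017, 1, 1, tzinfo=timezone.utc)  # arbitrary example date
target = ElusiveTarget(opens_at=opens, closes_at=opens + timedelta(days=7))
first = target.try_start("player-1", opens + timedelta(hours=1))
second = target.try_start("player-1", opens + timedelta(hours=2))
late = target.try_start("player-2", opens + timedelta(days=8))
print(first, second, late)
```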
Here is a summary of the services we use within our current solution.
Some of them are core to the solution, such as CloudServices, Azure Storage, SQL and Redis, while others actually drove the development of complete features, such as Azure Search, which powers the discoverability of a huge number of user-created contracts in a fun and meaningful way.
As we grew smarter in using them, we were able to tweak our usage and refine our implementation to gain performance and reduce cost.
We also sometimes discover a new service or pricing tier in preview and jump on it, as we did recently with a new Azure SQL offering better suited to running part of our Analytics pipeline.
This is a bird’s-eye view of our game services. We have one of these for each of the 3 platforms we support: PC, PS4 and Xbox One.
Our game communicates with the server through standard HTTPS, and the connection is established with what is called, in Azure Cloud Services terms, a Web Role, which is basically a managed node running a specific web application on an IIS server. This Web Role scales as needed depending on average CPU load, based on rules we defined, and is in charge of deserializing payloads and forwarding calls to our service framework.
That service framework is powered by Microsoft Orleans, of which I will speak more in a bit, and actual service logic sits within a Worker Role, which can be described as a managed, distributed Windows Service.
One nice thing about this is that each role deals with a very different job, and is configured to run and scale independently.
The web role is stateless, doing a lot of serialization and deserialization work on top of dealing with HTTPS termination through IIS, while the service worker role is stateful, and much more memory and CPU intensive.
As a consequence, the web role runs on fewer cores and less memory and scales very frequently, while the worker role runs on a bigger setup and scales less often.
For storage we are following the general principle of using the right tool for the right job, and depending on access pattern we are saving our state to different services:
Azure Blob Storage for static configuration and menu assets
Azure Table Storage for high volume player data
Azure SQL for player inventory and contracts
Azure Redis for real-time leaderboards
Azure Search for user generated contract discovery
So, in more practical terms, what exactly makes Hitman an online game?
I mentioned some of our core meta-game systems were online, such as Progression (which is mostly based on Challenges, an achievement-like system), or Scoring.
These are powered by a stream of rich gameplay events triggered by the players and sent to the backend.
These events range from the obvious - and very detailed - ”Kill” event when an NPC is eliminated in the game, to taking one of our many disguises or trespassing in a forbidden area.
Each event is evaluated against a set of rules expressed in C#, and stored in memory for the duration of the session. A chain of evaluation is triggered against the full session when the player exits or fails, to compute and award XP, challenges (achievements) and of course score and rank on global leaderboards.
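The two evaluation passes described above (per event, then over the whole session) can be sketched as follows. This is an illustration in Python, not the C# rules the backend actually runs; the Rule and Session classes, the point values and the "no shots fired" bonus are all made up.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Event = Dict[str, object]  # e.g. {"type": "Kill", "target": True}


@dataclass
class Rule:
    """Hypothetical scoring rule: a predicate over one event plus a point value."""
    name: str
    matches: Callable[[Event], bool]
    points: int


class Session:
    def __init__(self, rules: List[Rule]):
        self.rules = rules
        self.events: List[Event] = []  # kept in memory for the session's lifetime
        self.running_score = 0

    def on_event(self, event: Event) -> None:
        """Evaluate each incoming event against every rule as it arrives."""
        self.events.append(event)
        for rule in self.rules:
            if rule.matches(event):
                self.running_score += rule.points

    def finish(self) -> Dict[str, object]:
        """End-of-session pass over the full event list (made-up bonus rule)."""
        shots = any(e["type"] == "ShotsFired" for e in self.events)
        bonus = 0 if shots else 500
        return {"score": self.running_score + bonus, "silent": not shots}


rules = [
    Rule("target_kill", lambda e: e["type"] == "Kill" and e.get("target") is True, 1000),
    Rule("npc_kill", lambda e: e["type"] == "Kill" and not e.get("target"), -200),
    Rule("trespassing", lambda e: e["type"] == "Trespassing", -50),
]
session = Session(rules)
for ev in [{"type": "Disguise"}, {"type": "Trespassing"}, {"type": "Kill", "target": True}]:
    session.on_event(ev)
result = session.finish()
print(result)  # {'score': 1450, 'silent': True}
```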
We also needed a way to drive local evaluation of objectives and challenges that would behave the same on client and server.
This is challenging as we are not actually running the game or any part of it on the server, so we came up with a small, specialized state-machine definition implemented on both sides (and unit-tested).
It means we can give real-time feedback to the player on these online systems, while keeping it server authoritative.
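One way to picture such a shared state-machine definition is as plain data plus a tiny interpreter, so that two independent implementations given the same events must reach the same state. The format below is an assumption made for illustration (the state names, event names and the OBJECTIVE structure are hypothetical), written in Python rather than the languages used by the game client and backend.

```python
# Hypothetical definition format: state -> {event type -> next state}.
# Being plain data, the same definition could be interpreted by a client and a server.
OBJECTIVE = {
    "initial": "Pending",
    "terminal": {"Success", "Failure"},
    "transitions": {
        "Pending": {"TargetKilled": "Success", "TargetEscaped": "Failure"},
    },
}


def step(definition, state, event_type):
    """Return the next state; unknown events and terminal states leave it unchanged."""
    if state in definition["terminal"]:
        return state
    return definition["transitions"].get(state, {}).get(event_type, state)


def run(definition, event_types):
    """Replay a list of event types from the initial state."""
    state = definition["initial"]
    for event_type in event_types:
        state = step(definition, state, event_type)
    return state


completed = run(OBJECTIVE, ["Disguise", "Trespassing", "TargetKilled"])
failed = run(OBJECTIVE, ["TargetEscaped", "TargetKilled"])
print(completed, failed)  # Success Failure
```

Because `run` is a pure function of the definition and the event list, unit tests can replay the same recorded events through both implementations and compare the results.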
Our approach was service oriented from day one, and early on we tried to have per-player service semantics to cope with scale and concurrency.
To do this, we developed our own little .NET framework to declare and execute service operations, and after testing different implementations, landed on Microsoft Orleans.
Orleans is an Actor model framework for .NET. Those of you who know Erlang (or Elixir) probably know what this is about.
An actor is an isolated, lightweight, single-threaded ”program” running on a distributed system.
In Orleans, actors are dynamically instantiated and the framework ensures that no concurrent calls can reach a given actor at any time.
For Hitman, it means that each of our players gets his own actor as soon as he connects to the system, and keeps that running context for as long as he stays active.
This allows us to keep an in-memory state for each player, to reduce storage access to our various solutions.
In case the cluster needs to scale up or down, actors will be migrated by the framework and we only have to ensure the state is written to persistent storage and restored on the new node.
One of the nicest guarantees Orleans gives us is no concurrent calls on a stateful actor: this means we have (almost) no shared state to worry about in our system.
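The "no concurrent calls on an actor" guarantee can be illustrated with a toy actor. This is a sketch using Python's asyncio, not Orleans and not its C# API; PlayerActor, add_xp and the XP numbers are hypothetical. Each actor owns its state and drains a mailbox one message at a time, so even many simultaneous callers cannot interleave inside it.

```python
import asyncio


class PlayerActor:
    """Toy actor: private state plus a mailbox drained by a single task."""

    def __init__(self, player_id: str):
        self.player_id = player_id
        self.xp = 0  # in-memory state, only mutated by the actor's own loop
        self._mailbox: asyncio.Queue = asyncio.Queue()
        self._task = asyncio.create_task(self._run())

    async def _run(self) -> None:
        while True:
            amount, reply = await self._mailbox.get()
            # Only this loop mutates state, so messages are handled one at a time.
            current = self.xp
            await asyncio.sleep(0)  # yield control; another message still cannot interleave
            self.xp = current + amount
            reply.set_result(self.xp)

    async def add_xp(self, amount: int) -> int:
        """Post a message to the mailbox and wait for the actor's reply."""
        reply = asyncio.get_running_loop().create_future()
        await self._mailbox.put((amount, reply))
        return await reply

    def stop(self) -> None:
        self._task.cancel()


async def main() -> int:
    actor = PlayerActor("player-1")
    # 100 concurrent callers; the mailbox serializes them.
    await asyncio.gather(*(actor.add_xp(10) for _ in range(100)))
    total = actor.xp
    actor.stop()
    return total


total_xp = asyncio.run(main())
print(total_xp)  # 1000
```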
One thing to note is that around the time we implemented our system with Orleans on top of Cloud Services (which are quite old-school now), Microsoft came up with a managed version of it, which is now the preferred way of running massively distributed code on Azure.
It is called Service Fabric, and is doing the same thing more natively, with better tooling, and more efficiently.
As we have our service running we have not started migrating yet, but if you are going to do anything with Azure you should definitely start there.
Another aspect of our backend I would like to talk about is Analytics.
Metrics are obviously a key part of game development, and serve multiple purposes.
Of course there are some base KPIs that you would like to have for any game, in order to follow things like acquisition and retention, but there is a world of other opportunities when gathering metrics more specific to your own product.
We send telemetry to a separate data pipeline solely dedicated to analytics, and we use it to collect data that helps us evaluate how our players play our game, and also to keep an eye on error indicators, such as disconnections from the servers.
This pipeline is built using some of the extensive Azure analytics offering.
You can see the list here, but let’s take a closer look.
The game sends telemetry to a collector web app, which allows us to route the events to different endpoints when needed.
This web app then writes the metrics to an Azure EventHub, which is a managed service similar to Apache Kafka in the non-Azure world.
This event hub feeds several Azure Stream Analytics jobs doing near-real-time aggregation.
Some of these aggregations are directly written to Table storage and are used for monitoring and also reinjected into the game service, for things such as contracts trending.
Another stream outputs the raw data into hourly raw files, to be consumed by Pig scripts running on an on-demand Hadoop cluster every night.
Then the aggregated files are pumped into a SQL stored procedure to be exposed and used in reports.
All of it is orchestrated by Data Factory, which keeps track of which transform has been run for which time slice.
Raw event data coming from blob storage is aggregated by an Azure HDInsight (Hadoop) cluster, run automatically, on-demand, by Data Factory every night.
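The near-real-time aggregation used for things like contracts trending can be pictured as counting events inside fixed, non-overlapping time windows. The sketch below does this in plain Python rather than in a Stream Analytics query; the five-minute window, the event tuples and the contract ids are assumptions made for illustration.

```python
from collections import Counter, defaultdict

WINDOW_SECONDS = 300  # assumed five-minute window


def tumbling_counts(events, window_seconds=WINDOW_SECONDS):
    """Count plays per contract inside fixed, non-overlapping time windows."""
    windows = defaultdict(Counter)
    for timestamp, contract_id in events:
        window_start = timestamp - (timestamp % window_seconds)
        windows[window_start][contract_id] += 1
    return windows


def trending(windows, window_start, top=2):
    """Most played contracts within one window."""
    return [cid for cid, _ in windows[window_start].most_common(top)]


events = [  # (seconds since an arbitrary start, hypothetical contract id)
    (10, "contract-a"), (20, "contract-b"), (200, "contract-a"),
    (310, "contract-c"), (320, "contract-c"), (590, "contract-a"),
]
windows = tumbling_counts(events)
first_window = trending(windows, 0)
second_window = trending(windows, 300)
print(first_window, second_window)
```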
This means we are paying the cluster for 1 or 2 hours a day.
The aggregated data is output into multiple flat CSV files, and the data pipeline fans out to each individual daily extraction.
Most of these are plugged into a SQL Stored Procedure activity, where the tabular data is fed by Data Factory in batches to a SQL stored procedure, where we can run insert or merge logic.
Some of these output tables then trigger another activity to do an additional aggregation step.
As you can see it can get fairly big, but it keeps the whole flow simple and easy to monitor and manage.
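The bookkeeping of "which transform has run for which time slice" is handled by Data Factory itself, but the idea can be shown with a toy tracker. Everything here (SliceTracker, the transform name, daily slices, the dates) is hypothetical and written in Python for illustration; it only shows why re-running a pipeline over an overlapping period does not repeat finished slices.

```python
from datetime import datetime, timedelta


class SliceTracker:
    """Toy bookkeeping of which (transform, time slice) pairs have completed."""

    def __init__(self):
        self.done = set()

    def pending(self, transform, start, end, step=timedelta(days=1)):
        """Slices in [start, end) that have not been run for this transform."""
        slices = []
        current = start
        while current < end:
            if (transform, current) not in self.done:
                slices.append(current)
            current += step
        return slices

    def run(self, transform, func, start, end):
        """Run func once per missing slice; finished slices are skipped."""
        executed = 0
        for slice_start in self.pending(transform, start, end):
            func(slice_start)
            self.done.add((transform, slice_start))
            executed += 1
        return executed


tracker = SliceTracker()
processed = []
start = datetime(2017, 1, 1)  # arbitrary example date
first_run = tracker.run("aggregate_kills", processed.append, start, start + timedelta(days=3))
second_run = tracker.run("aggregate_kills", processed.append, start, start + timedelta(days=4))
print(first_run, second_run, len(processed))  # 3 1 4
```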
You can also temporarily fork your data stream to run a data experiment, for instance, without risking an impact on your main data flow.
Now having data is nice, but what did we do with it?
Besides all the traditional reporting and the occasional anomaly investigation, we are also looking at ways to improve our game and level design.
For instance, all Hitman players start in the tutorial level you can see above, and we took a look at how they played on it.
This is a map of where players get spotted when infiltrating the boat on the first tutorial mission.
I only have static images in this presentation, but this is taken from an interactive report where we can filter points, for instance to the first minute of playtime.
Usually what happens when players get spotted is fairly straightforward: this is the map of where players get killed in the first 5 minutes of the playthrough.
You can see they get killed a lot, and some in places that we considered very safe.
And here we can see where they equip different disguises.
All this data helped us to understand what could be improved when guiding first timers in our game, which is notoriously difficult, and very different from traditional stealth action games.
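Maps like the ones described above boil down to binning positions into grid cells and counting them, optionally filtered by playtime. Here is a minimal sketch of that binning in Python; the sample coordinates, the cell size and the tuple layout are made up, and the real reports were built with interactive tooling rather than code like this.

```python
from collections import Counter


def heatmap(points, cell_size=10.0, max_playtime=None):
    """Bin (x, y, playtime_seconds) samples into square grid cells and count them."""
    grid = Counter()
    for x, y, playtime in points:
        if max_playtime is not None and playtime > max_playtime:
            continue  # e.g. keep only the first minute of play
        cell = (int(x // cell_size), int(y // cell_size))
        grid[cell] += 1
    return grid


samples = [  # made-up positions and playtimes
    (3.0, 4.0, 12.0), (7.5, 1.0, 48.0), (12.0, 4.0, 75.0), (8.0, 9.9, 300.0),
]
all_time = heatmap(samples)
first_minute = heatmap(samples, max_playtime=60.0)
print(dict(all_time), dict(first_minute))
```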
Taking a look back, the journey has been amazing so far.
Not counting the beta in early 2016, we released in March and quickly moved to support the product and iron out the issues.
We shipped more than one client update per month over the year that followed, on 3 different platforms each time.
We patched and improved the server even more, and that without considering the new content we have been pushing out every week.
We did learn a lot, and improved our workflow accordingly.
Among that, we automated more of our deployment tools, formalized our release process, and established a decision process to decide live drops and report on consumer issues.
Most importantly, we learned to operate on a live game, which is like going from building a ship in a dry dock to improving and transforming it while it is sailing.
Looking ahead, you probably heard in the news last week that there are a lot of unknowns for IO and Hitman, but this is how we see it:
With this game we released a platform on which we want to build.
The content out there is what we call Season 1, there are more seasons to come.
We will keep releasing live content, and keep improving the game.