The huge growth in online games over the last few years presents unique challenges in terms of scalability and backend infrastructure. Launch day can mean an immediate increase in load from a few hundred to millions of users, each of whom may be conducting dozens of transactions per minute as they purchase extra weapons or gear. Games often process many more writes than reads, some of which are low-latency and compute-intensive while others are high-latency but must be shared among hundreds of global servers. Multiplayer games require enough global player state to allow for matchmaking, while not allowing for a single point of failure. Games require both centralized servers to allow for frequent updates and decentralized ones to allow for low-latency play worldwide. And to prevent cheating, authoritative logic needs to stay server-side. With $24 billion in revenue in 2014 (and projected to grow 4x that by 2020), the online game industry is in need of all the creative solutions that researchers can offer. This talk will outline specific high-performance computing challenges faced by games and the pros and cons of the current solutions being used.
2. Promise & peril of live game operations
• Launched 3/2/12
• Shot to top 10
• Servers crashed
• Offline for 168 days
• $27M lost revenue
• Re-launched 8/17/12
• Top-grossing game for nearly 3 years
LTD revenue: $150M+
Source: AppAnnie, NPD, ThinkGaming
2
Updated more than 45 times
Rank history for top grossing games, iOS in United States
4. Build game
Build
backend
Launch
Build tools for
ops team
Segment &
target customers
Deploy servers
Business
intelligence
Offers &
Promotions
Update
content
Host in-game
events
Customer
service
Nowadays…
Business
intelligence
User
Acquisition
5. Characteristics of live games
• Many more writes than reads (primarily driven by analytics)
• Authoritative server-side game logic (to prevent cheating)
• Millions of users (often with transient relationships)
• Peak load often hits on day one of release
• Must be scalable, global, and reliable all at once
• Game studios often inexperienced at large-scale system design
• Testing w/ real user “stories” at scale is hard but critical
5
6. +
Live ops tools and dashboards
(mission control for the whole team)
Back-end services
(cross-platform, one-stop shop)
+
Integration with other partners
(building out a full ecosystem)
and many more coming...
6
7. Typical backend services
• Player Accounts
– Authentication
– Profile Management
– Account Linking
• Data Storage
– Per player
– Per title
– Per character (under one player)
– Files / CDN delivery
• Commerce
– Catalog Management
– Virtual Currencies
– Player Inventory
– In-game Purchasing
– Receipt validation (Apple/Google/Amazon)
– Marketing and Promotion
• In-game Marketing
– Push Notifications
– In-game Messaging
• Product Management
– Analytics and Reporting
– Customer Segmentation
– Customer Support Tools
– Abuse Reporting and Banning
• Social
– Friends Lists
– Player Chat
– Leaderboards
– Game forum integration
– Trading / gifting
• Multiplayer
– Photon integration (real-time / turn-based)
– Matchmaking
– Custom game server hosting
– Server monitoring
• Game logic
– Server-hosted JavaScript
7
8. Availability Zone B
Oregon
AWS cloud: PlayFab Web Services
Amazon Route 53
(3.playfabapi.com)
Amazon EC2
(API handling)
Matchmaker
Instance
Instance
Game Server
Monitor
DynamoDB Amazon RDS Amazon S3 Amazon RedshiftReports
service
Instance
Logs
Architecture Overview
Matchmaker
(Secondary)
Instance
Virginia
Availability Zone A
Amazon EC2
(API handling)
Elastic Load Balancing
Cross-zone storage
Game Client
Game Manager
(Dashboards)
Amazon EC2
(Virtual game
servers)
Tokyo
Amazon EC2
(Virtual game
servers)
Sydney
Physical game
servers
Physical game
servers
Amazon EC2
(Virtual game
servers)
Local disk Local disk Local disk
Physical game
servers
9. Dealing with elastic load
• Peak load is often launch day
– Hard to predict actual load
• We solve using elastic
compute & storage solutions
• No transaction support in DB,
so must solve at logic level
9
10. Applying actions to user segments
• Users are grouped into
segments based on behavior
• Games apply actions to
segments
– Send a push notification
– Grant an item
• Need both batch and real-time
processing
• We use a hybrid model, with
duplicate storage
– Redshift for bulk actions
– DynamoDB for individual triggers
10
11. Global state for matchmaking
• Matching users for a game is
harder than it sounds
– Millions of players
– Must be atomic
– Lots of state and complex rules
• We use a single server, w/
fallover
– State is written though to DB
– If primary fails, secondary reads in
state from DB and takes over
– This happens very quickly…
11
12. Thank you
James Gwertzman (james@playfab.com)
@gwertz or @playfabnetwork
Create a free account today at www.playfab.com
12
Hinweis der Redaktion
Load of King.com – 140M DAU, probably 300K RPS
Consider the Simpsons Tapped out.. It was a top 10 game for more than a year, and even now is still in top 100 nearly 3 years after launch. That’s enviable. But it also had a rocky start. Launched, hit top 10, then had to be taken down for 4 months as the back-end was rebuilt. This is tricky stuff to get right.
Consider how building a game has evolved. It used to be you’d build a game, ship it, and move on.
Now, however, games have becomes services, and suddenly launching the game is only the beginning. There’s all this other stuff that has to happen AFTER the game goes live – the service has to be managed.
(talk about the flow)
Costs include:
Technology cost
People cost (burn out)
Continuously reacting to competitors
Our goal is to cover all the core services you might need… and to integrate third party services to fill in the areas we don’t cover
Three primary areas:
Game Client
Game services
Multiplayer Game Servers (physical and AWS)