4. AMAZON ECS & VIDEO GAMES
WHAT TO EXPECT
(1) ECS CONSOLIDATION WINS & LESSONS
Adapting existing deployments & infrastructure
(2) INFRASTRUCTURE ENCAPSULATION
Object-Oriented Development Operations
(3) MULTI-GAME MODULAR SERVICE DESIGN
A different kind of scaling
11. DATA PRODUCTS & SERVICES
OUR MISSION
Empower teams at Riot to make timely, data-informed products by maintaining a scalable and reliable data platform
AWS re:Invent 2015 | (GAM303) Riot Games: Migrating Mountains of Big Data to AWS
Sean Maloney
13. PROBLEM
SCALING TOTAL OWNERSHIP ON AWS
Total ownership
We want to empower developers to:
• Provision their own infrastructure
• Execute their own deployments
• Monitor their own metrics
14. PROBLEM
SCALING TOTAL OWNERSHIP ON AWS
Resource attribution
• Who owns these EBS volumes?
• What applications depend on these security groups?
• Can these AMIs be deleted?
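One way to make these attribution questions answerable is to require ownership tags on every resource and flag anything that lacks them. A minimal sketch, assuming the inventory has already been fetched and normalized into plain dictionaries with an `Id` field and an AWS-style `Tags` list; the `owner` and `application` tag keys are hypothetical conventions, not ones named in the talk.

```python
from typing import Dict, Iterable, List, Sequence

# Hypothetical tag keys that every resource is expected to carry.
REQUIRED_TAGS = ("owner", "application")


def tags_as_dict(resource: Dict) -> Dict[str, str]:
    """Convert an AWS-style [{'Key': ..., 'Value': ...}] tag list into a dict."""
    return {t["Key"]: t["Value"] for t in resource.get("Tags", [])}


def unattributed(resources: Iterable[Dict],
                 required: Sequence[str] = REQUIRED_TAGS) -> List[Dict]:
    """Return one report entry per resource missing (or blanking) a required tag."""
    report = []
    for res in resources:
        tags = tags_as_dict(res)
        missing = [key for key in required if not tags.get(key)]
        if missing:
            report.append({"id": res["Id"], "missing": missing})
    return report


if __name__ == "__main__":
    # Hypothetical inventory: an attributed volume, a half-tagged volume, an untagged SG.
    inventory = [
        {"Id": "vol-0a1", "Tags": [{"Key": "owner", "Value": "team-data"},
                                   {"Key": "application", "Value": "ingest"}]},
        {"Id": "vol-0b2", "Tags": [{"Key": "owner", "Value": "team-web"}]},
        {"Id": "sg-0c3"},
    ]
    for entry in unattributed(inventory):
        print(entry)
```

Running a report like this on a schedule still only answers "who owns it?" after the fact; enforcing the tags at provisioning time is what removes the question.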
16. PROBLEM
SCALING TOTAL OWNERSHIP ON AWS
Security
• Auditing is important, but it’s reactive
• Operational time sink
Tools: Security Monkey, AWS Trusted Advisor
21. CONTAINERS
STANDARDIZED APPLICATION UNITS… ON AWS
• Dockerfiles capture application dependencies
• Common use cases have great community support
• Profit from our own engineering community
23. CONTAINERS
STANDARDIZED APPLICATION UNITS… ON AWS
Embrace the abstraction
• Plan for failure at all levels
• Avoid manual intervention whenever possible
• It’s ephemeral all the way down
24. CONTAINERS
STANDARDIZED APPLICATION UNITS… ON AWS
Scheduling is hard
We need to:
• Quickly and fairly run tasks
• Prevent resource conflicts
• Provide reasonable fault tolerance
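To make the three requirements concrete, here is a toy placement loop: tasks are placed in submission order (a simple notion of fairness), a task only lands on an instance with enough free CPU and memory (no resource conflicts), and a task that fits nowhere is reported as unplaced instead of oversubscribing a host. This is an illustrative sketch, not how the ECS scheduler is implemented; the "most free memory first" heuristic and all names are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Instance:
    name: str
    cpu_free: int   # CPU units (ECS counts 1024 units per vCPU)
    mem_free: int   # MiB
    tasks: List[str] = field(default_factory=list)


@dataclass
class Task:
    name: str
    cpu: int
    mem: int


def place(task: Task, instances: List[Instance]) -> Optional[Instance]:
    """Place one task on the fitting instance with the most free memory."""
    candidates = [i for i in instances
                  if i.cpu_free >= task.cpu and i.mem_free >= task.mem]
    if not candidates:
        return None  # no capacity: surface a scheduling failure, never oversubscribe
    best = max(candidates, key=lambda i: (i.mem_free, i.cpu_free))
    best.cpu_free -= task.cpu
    best.mem_free -= task.mem
    best.tasks.append(task.name)
    return best


def schedule(tasks: List[Task], instances: List[Instance]) -> Dict[str, Optional[str]]:
    """Place tasks first-in, first-out; map each task to its instance name or None."""
    result: Dict[str, Optional[str]] = {}
    for task in tasks:
        chosen = place(task, instances)
        result[task.name] = chosen.name if chosen is not None else None
    return result


if __name__ == "__main__":
    hosts = [Instance("a", 2048, 4096), Instance("b", 2048, 2048)]
    work = [Task("web", 512, 1024), Task("worker", 1024, 2048), Task("batch", 1024, 3000)]
    print(schedule(work, hosts))
```

Here `batch` stays unplaced because no single host has both 1024 free CPU units and 3000 MiB free; fault tolerance would come from retrying unplaced tasks once capacity is added or freed.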
25. CONTAINERS
STANDARDIZED APPLICATION UNITS… ON AWS
Provisioning AWS hardware
• Enable total ownership of the AWS resources backing our containers
• Avoid the security, resource attribution, and convention degradation pitfalls
27. AMAZON EC2 CONTAINER SERVICE
STANDARDIZED APPLICATION UNITS… ON ECS!
• ECS AMI provides all necessary software
• Designed with AWS integration in mind
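• Free! (no additional charge for ECS itself; you pay for the EC2 instances and other AWS resources it uses)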
32. AMAZON EC2 CONTAINER SERVICE
KEY FEATURE TIMELINE
• Nov 2014: ECS ANNOUNCED (re:Invent 2014)
• April 2015: ECS GENERALLY AVAILABLE
• Dec 2015: ECR & NEW REGIONS. ECS becomes available in the final missing region we need (Frankfurt) for our globally deployed applications
• May 2016: SERVICE SCALING. Automatic task count scaling based on CloudWatch metrics introduced
• July 2016: TASK-SPECIFIC IAM ROLES. Unlocked a lot of cluster sharing potential
• August 2016: APPLICATION LOAD BALANCERS. Several key improvements for ECS
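Service scaling adjusts a service's desired task count when a CloudWatch alarm on a metric fires. The sketch below is a simplified model of a step scaling policy: the size of the breach (metric minus alarm threshold) selects a step, the step's adjustment is applied, and the result is clamped to the service's minimum and maximum. It is an illustration only; the `Step` fields and function name are hypothetical and are not the Application Auto Scaling API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Step:
    lower: float            # inclusive lower bound on (metric - threshold)
    upper: Optional[float]  # exclusive upper bound; None means unbounded
    change: int             # tasks to add (positive) or remove (negative)


def new_task_count(current: int, metric: float, threshold: float,
                   steps: List[Step], min_tasks: int, max_tasks: int) -> int:
    """Apply the first step whose interval contains the breach, clamped to [min, max]."""
    breach = metric - threshold
    for step in steps:
        if breach >= step.lower and (step.upper is None or breach < step.upper):
            return max(min_tasks, min(max_tasks, current + step.change))
    return current  # no matching step: leave the service alone


if __name__ == "__main__":
    # Hypothetical scale-out policy on average CPU utilization with a 70% threshold.
    scale_out = [Step(0, 15, 1), Step(15, None, 3)]
    print(new_task_count(4, 80, 70, scale_out, min_tasks=2, max_tasks=10))
    print(new_task_count(4, 95, 70, scale_out, min_tasks=2, max_tasks=10))
```

Scale-in would typically be a separate alarm and policy with negative steps, for example `Step(float("-inf"), -20, -1)` against the same threshold.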
33. BEYOND ECS
INFRASTRUCTURE AS CODE
• At scale, orchestrating infrastructure in a consistent, reproducible way is key
Total ownership
36. PROVISIONING
INFRASTRUCTURE AS OBJECT-ORIENTED CODE
[Diagram: a VPC spanning Availability Zones A, B, and C, with Application and Tools subnets of instances, route tables, a NAT gateway, an internet gateway, a VPN gateway, and a Route 53 hosted zone]
38. ECS CLUSTER
TERRAFORM BUILDING BLOCKS
[Diagram: the same VPC layout (subnets across Availability Zones A, B, and C), plus an ECS cluster: an Auto Scaling group of instances with security groups, a launch configuration with user data, an IAM role, and CloudWatch alarms]
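The building-block idea can be illustrated outside Terraform: each block owns its resources and composes other blocks instead of copying their details. A minimal Python sketch, assuming a hypothetical representation where each block is a class that emits plain resource descriptions; the talk's actual building blocks were Terraform templates, and every class, field, and naming scheme here is illustrative. The `ECS_CLUSTER` line in `/etc/ecs/ecs.config` is the ECS agent's setting for which cluster an instance joins.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Vpc:
    name: str
    azs: List[str]

    def app_subnets(self) -> List[str]:
        """One application subnet per Availability Zone (hypothetical naming)."""
        return [f"{self.name}-app-{az}" for az in self.azs]

    def resources(self) -> List[Dict]:
        return [{"type": "vpc", "name": self.name}] + [
            {"type": "subnet", "name": s} for s in self.app_subnets()
        ]


@dataclass
class EcsCluster:
    name: str
    vpc: Vpc  # composed: the cluster asks the VPC for subnets instead of hardcoding them
    instance_type: str
    min_size: int
    max_size: int

    def resources(self) -> List[Dict]:
        return [
            {"type": "ecs_cluster", "name": self.name},
            {"type": "iam_role", "name": f"{self.name}-instance-role"},
            {"type": "security_group", "name": f"{self.name}-sg"},
            {"type": "launch_configuration", "name": f"{self.name}-lc",
             "instance_type": self.instance_type,
             # User data tells the ECS agent which cluster to register with.
             "user_data": f"echo ECS_CLUSTER={self.name} >> /etc/ecs/ecs.config"},
            {"type": "autoscaling_group", "name": f"{self.name}-asg",
             "subnets": self.vpc.app_subnets(),
             "min": self.min_size, "max": self.max_size},
            {"type": "cloudwatch_alarm", "name": f"{self.name}-memory-reservation-high"},
        ]


if __name__ == "__main__":
    prod = Vpc("prod", ["a", "b", "c"])
    for resource in EcsCluster("games", prod, "m4.large", 3, 9).resources():
        print(resource)
```

Because every cluster is just another instance of the same block, adding a cluster is a few lines of configuration rather than a new hand-built stack.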
41. MICROSERVICES
WITHOUT SERVICE ENDPOINTS
[Diagram: the ECS cluster (Auto Scaling group of instances, security groups, launch configuration with user data, IAM role, CloudWatch alarm), plus an ECS service with its task definition, CloudWatch alarms, and IAM role]
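A service without an endpoint (for example a queue worker) needs only a task definition, alarms, and a role; nothing routes traffic to it. The sketch below builds a payload shaped like an ECS `RegisterTaskDefinition` request for such a service, using real task definition fields (`family`, `containerDefinitions`, `memoryReservation`, `essential`, `taskRoleArn`). The helper name, image, and role ARN are hypothetical, and the sketch only builds the dictionary; it does not call AWS.

```python
from typing import Dict, Optional


def worker_task_definition(family: str, image: str, cpu: int, memory_mib: int,
                           task_role_arn: Optional[str] = None) -> Dict:
    """Build a task definition payload for a service that exposes no endpoint.

    There are no portMappings: the task only pulls work and never receives
    inbound traffic, so it needs no load balancer, listener, or DNS record.
    """
    container = {
        "name": family,
        "image": image,
        "cpu": cpu,                       # CPU units (1024 per vCPU)
        "memoryReservation": memory_mib,  # soft limit the scheduler reserves
        "essential": True,                # task stops if this container stops
    }
    task_def: Dict = {"family": family, "containerDefinitions": [container]}
    if task_role_arn:
        task_def["taskRoleArn"] = task_role_arn  # per-task IAM role
    return task_def


if __name__ == "__main__":
    print(worker_task_definition("match-worker", "example/match-worker:1.0", 256, 512))
```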
42. MICROSERVICES
WITH SERVICE ENDPOINTS
[Diagram: an ECS cluster running an ECS service (task definition, CloudWatch alarms, IAM role), fronted by an Application Load Balancer with security groups, listeners, and target groups; a Route 53 record; and monitoring through CloudWatch alarms and SNS topics]
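The listener is where an endpoint's requests get routed: rules are checked in priority order, and the first matching rule forwards to its target group. The toy model below evaluates path-based rules from the lowest to the highest priority number and falls back to a default target group, treating `*` as any run of characters and `?` as exactly one character. It is a sketch of that behavior under those assumptions, not the ALB implementation; the rule and target group names are hypothetical.

```python
import re
from typing import List, Tuple

Rule = Tuple[int, str, str]  # (priority, path pattern, target group)


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a path pattern: '*' matches any run, '?' one character, rest literal."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def route(path: str, rules: List[Rule], default: str) -> str:
    """Return the target group of the first matching rule, lowest priority number first."""
    for _priority, pattern, target in sorted(rules, key=lambda r: r[0]):
        if pattern_to_regex(pattern).fullmatch(path):
            return target
    return default


if __name__ == "__main__":
    listener_rules = [(20, "/api/*", "api-tg"), (10, "/api/admin/*", "admin-tg")]
    for p in ("/api/admin/users", "/api/games", "/"):
        print(p, "->", route(p, listener_rules, default="web-tg"))
```

The more specific `/api/admin/*` rule wins only because it has the lower priority number; list order does not matter.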
43. PERSISTENT DATA
LOSE ECS HOSTS WITHOUT LOSING DATA
[Diagram: the ECS cluster, ECS service, Application Load Balancer (security groups, listeners, target groups), Route 53, and monitoring (CloudWatch alarms, SNS topics), plus an attachment group of EBS volumes, an elastic network interface, and an Elastic IP]
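The attachment group bundles the stateful pieces (EBS volumes, a network interface, an Elastic IP) so they outlive any single host. The sketch below models one hypothetical way such groups could be handed out: a booting instance claims a free group in its own Availability Zone (EBS volumes and network interfaces are bound to one AZ), and when a host is lost its group is released for a replacement. The registry, its methods, and all IDs are assumptions for illustration; a real version would need an atomic claim (for example a conditional write in a shared store) and the actual attach calls.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class AttachmentGroup:
    name: str
    az: str                       # volumes and ENIs live in one AZ, so the group does too
    volumes: List[str]
    eni: str
    eip: str
    holder: Optional[str] = None  # instance currently holding this group, if any


class AttachmentRegistry:
    """In-memory stand-in for a shared registry of attachment groups (hypothetical)."""

    def __init__(self, groups: List[AttachmentGroup]) -> None:
        self.groups: Dict[str, AttachmentGroup] = {g.name: g for g in groups}

    def claim(self, instance_id: str, az: str) -> Optional[AttachmentGroup]:
        """Give a booting instance a free group in its AZ, or None if none is free."""
        for group in self.groups.values():
            if group.holder is None and group.az == az:
                group.holder = instance_id
                return group
        return None

    def release(self, instance_id: str) -> None:
        """Free every group held by an instance that has been terminated or lost."""
        for group in self.groups.values():
            if group.holder == instance_id:
                group.holder = None


if __name__ == "__main__":
    registry = AttachmentRegistry([
        AttachmentGroup("db-0", "us-west-2a", ["vol-1", "vol-2"], "eni-1", "203.0.113.10"),
    ])
    first = registry.claim("i-old", "us-west-2a")
    registry.release("i-old")                      # host lost
    second = registry.claim("i-new", "us-west-2a")  # replacement reuses the same data
    print(first is second, second.volumes)
```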
46. LESSONS
WHAT WORKED FOR US
• Break apart your stacks, but don’t overdo it
[Diagram: a service stack split into ECS Service, ALB, SNS Topics, Route 53, IAM Role, Task Definition, and CloudWatch Alarms]
47. LESSONS
WHAT WORKED FOR US
• Be liberal with your cluster provisioning
• Don’t risk resource contention in production
• With good orchestration, additional clusters != additional operational overhead
48. LESSONS
WHAT WORKED FOR US
• Tag everything all of the time
• Keep your tags organized in your Terraform templates
• Have top level variables that get applied to every resource
• Create a tag for every dimension that is useful to your business
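The "top level variables applied to every resource" pattern amounts to merging a shared tag map into each resource's own tags (in Terraform this is commonly written with the built-in `merge()` function). A minimal Python sketch of the same idea, assuming resource-specific tags should win on conflict; the tag names and dimensions are hypothetical examples.

```python
from typing import Dict, List


def with_tags(resources: List[Dict], common: Dict[str, str]) -> List[Dict]:
    """Return copies of resources with top-level tags applied; local tags win."""
    return [{**r, "tags": {**common, **r.get("tags", {})}} for r in resources]


def missing_dimensions(resource: Dict, dimensions: List[str]) -> List[str]:
    """List the business dimensions a resource is still not tagged with."""
    tags = resource.get("tags", {})
    return [d for d in dimensions if d not in tags]


if __name__ == "__main__":
    common_tags = {"team": "data-platform", "environment": "prod"}
    tagged = with_tags([
        {"type": "ebs_volume", "tags": {"environment": "staging"}},
        {"type": "security_group"},
    ], common_tags)
    for res in tagged:
        print(res["type"], res["tags"], missing_dimensions(res, ["team", "environment", "game"]))
```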
52. LESSONS
WHAT WORKED FOR US
• Profile your memory requirements, monitor for scheduling issues
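Profiling turns observed memory usage into a reservation the scheduler can trust, and the reservation in turn determines how many tasks fit on a host. A minimal sketch, assuming the reservation is a high percentile of sampled usage plus a headroom factor, rounded up to a fixed granularity; the 99th percentile, 20% headroom, and 64 MiB granularity are illustrative defaults, not values from the talk. Capacity should be computed from the memory the container instance registers with ECS, which is less than the host's physical memory.

```python
import math
from typing import List


def percentile(samples: List[float], p: float) -> float:
    """Nearest-rank percentile for p in (0, 100]."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def memory_reservation(samples_mib: List[float], p: float = 99,
                       headroom: float = 0.2, granularity: int = 64) -> int:
    """Percentile of observed usage plus headroom, rounded up to the granularity."""
    raw = percentile(samples_mib, p) * (1 + headroom)
    return granularity * math.ceil(raw / granularity)


def tasks_per_instance(registered_mem_mib: int, reservation_mib: int) -> int:
    """How many copies of the task fit on one instance by memory alone."""
    return registered_mem_mib // reservation_mib


if __name__ == "__main__":
    usage = [float(x) for x in range(1, 101)]  # hypothetical samples, in MiB
    reservation = memory_reservation(usage)
    print(reservation, tasks_per_instance(7680, reservation))
```

If tasks regularly exceed their reservation, or services sit with pending tasks because no instance has enough reservable memory, that is the scheduling issue worth alerting on.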
53. RIOT ENGINEERING BLOG
THESE PROBLEMS AND MORE
Riot engineering: https://engineering.riotgames.com/
How we use data: http://na.leagueoflegends.com/en/tag/insights