Whether you are promoting the launch of a new phone, offering tickets to a popular singer’s concert, running an online promotion for an e-commerce store, or reporting on a breaking news story, all of these scenarios need an architecture that is agile and cost-effective. Discover how the Amazon Web Services platform helps you scale your applications up and down to handle unforeseen and planned user spikes. Find out how customers built their application architectures to solve their scaling problems and the benefits they gained by using Amazon Web Services. In this session, we will share a live demo of step-by-step scaling on our platform.
Blair Layton, Business Development Manager – Database Services, Amazon Web Services, APAC
3. What is a Large Scale Event?
An event where you need more capacity than is normally allocated, for a period of time
Typically minutes to days, but could be a couple of weeks
Often associated with a sudden surge of users
Hard to architect and provision for at a reasonable cost
Consumers get angry when it all goes wrong!
8. What is a Large Scale Event?
For you, it could be as simple as needing twice as much capacity for a short promotion
Everyone’s Large Scale Event is different, but the underlying concepts are the same
9. What Problems do you Face?
Unknown infrastructure requirements
• Cost?
Short duration of the event
• Massive investment in infrastructure that is otherwise idle or underutilized
• Often tight deadlines to get the system live
Legacy system integration
Understanding system behaviour and the required metrics
Getting the right architecture
Finding the right talent
15. Day One, User One
A single EC2 instance
• With the full stack on this host
• Web app
• Database
• Management
• Etc.
A single Elastic IP
Route 53 for DNS
[Diagram: User → Amazon Route 53 → Elastic IP → EC2 Instance]
16. “We’re gonna need a bigger box”
Simplest approach
Can now leverage Provisioned IOPS (PIOPS)
High I/O instances
High memory instances
High CPU instances
High storage instances
Easy to change instance sizes
Will hit a limit eventually
t2.micro → m3.large → r3.8xlarge
17. Day One, User One:
We could potentially get to a few hundred to a few thousand users, depending on application complexity and traffic
No failover
No redundancy
Too many eggs in one basket
[Diagram: User → Amazon Route 53 → Elastic IP → EC2 Instance]
18. Day Two, User >1
First let’s separate out our single host into more than one
Web
Database
• Make use of a database service?
[Diagram: User → Amazon Route 53 → Elastic IP → Web Instance → Database Instance]
21. User >100
First let’s separate out our single host into more than one
Web
Database
• Use RDS to make your life easier
[Diagram: User → Amazon Route 53 → Elastic IP → Web Instance → RDS DB Instance]
22. User >1000
Next let’s address our lack of failover and redundancy issues
Elastic Load Balancing
Another web instance
• In another Availability Zone
Enable Amazon RDS Multi-AZ
[Diagram: User → Amazon Route 53 → Elastic Load Balancing → Web Instances in two Availability Zones → RDS DB Instance Active (Multi-AZ) with a Standby in the second AZ]
23. Users >10k–100k
[Diagram: User → Amazon Route 53 → Elastic Load Balancing → many Web Instances spread across two Availability Zones → RDS DB Instance Active (Multi-AZ) with a Standby, plus multiple RDS Read Replicas in each AZ]
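Once read replicas are in the picture, the application has to decide which endpoint each query goes to. A minimal sketch of read/write splitting, assuming hypothetical endpoint names; a real deployment would use the actual RDS endpoints and usually driver-level or proxy-level routing rather than string inspection:

```python
import itertools

# Hypothetical endpoints standing in for an RDS primary and its read
# replicas; the names are illustrative, not real AWS resources.
class ReadWriteRouter:
    """Send writes to the primary; fan reads out over replicas round-robin."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self._replicas = itertools.cycle(replicas)

    def endpoint_for(self, query):
        # Crude heuristic: anything that is not a SELECT goes to the primary.
        if query.lstrip().upper().startswith("SELECT"):
            return next(self._replicas)
        return self.primary

router = ReadWriteRouter(
    primary="primary.example.internal",
    replicas=["replica-1.example.internal", "replica-2.example.internal"],
)
print(router.endpoint_for("SELECT * FROM orders"))    # → replica-1.example.internal
print(router.endpoint_for("INSERT INTO orders ..."))  # → primary.example.internal
print(router.endpoint_for("SELECT 1"))                # → replica-2.example.internal
```

Note that replicas are eventually consistent, so reads that must see their own writes still belong on the primary.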
24. This will take us pretty far, honestly, but we care about performance and efficiency, so let’s clean this up a bit
25. Shift Some Load Around
Let’s lighten the load on our web and database instances
Move static content from the web instance to Amazon S3 and CloudFront
Move dynamic content from Elastic Load Balancing to CloudFront
Move session/state and DB caching to ElastiCache or DynamoDB
[Diagram: User → Amazon Route 53 → Amazon CloudFront (backed by Amazon S3 and Elastic Load Balancing) → Web Instance → RDS DB Instance Active (Multi-AZ), with ElastiCache and Amazon DynamoDB for sessions and caching]
26. Users >500k
[Diagram: User → Amazon Route 53 → Amazon CloudFront + Amazon S3 → Elastic Load Balancing → Web Instances across two Availability Zones, each zone with ElastiCache → RDS DB Instance Active (Multi-AZ) with a Standby and a Read Replica in each AZ → DynamoDB for session/state]
27. Time to make some radical improvements at the web & app layers
28. SOAing (Service-Oriented Architecture)
Move services into their own tiers or modules. Treat each of these as 100% separate pieces of your infrastructure and scale them independently.
Amazon.com and AWS do this extensively! It offers flexibility and a greater understanding of each component.
29. Loose Coupling Sets You Free!
The looser they’re coupled, the bigger they scale
• Use independent components
• Design everything as a black box
• Decouple interactions
• Favor services with built-in redundancy and scalability over building your own
[Diagram: Tight coupling – Controller A calls Controller B directly; Loose coupling – Controller A and Controller B communicate through Amazon SQS queues used as buffers]
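The queue-as-buffer pattern in the diagram can be sketched locally. Here `queue.Queue` stands in for an Amazon SQS queue so the example runs self-contained; with real SQS, Controller A would call `send_message` and Controller B would poll with `receive_message`, but the decoupling idea is the same:

```python
import queue
import threading

jobs = queue.Queue()   # stand-in for an SQS queue buffering the two controllers
results = []

def controller_a():
    """Producer: enqueue work and return immediately, never blocking on B."""
    for i in range(5):
        jobs.put({"order_id": i})

def controller_b():
    """Consumer: drain the queue at its own pace."""
    while True:
        msg = jobs.get()
        if msg is None:          # sentinel: no more work
            break
        results.append(msg["order_id"])
        jobs.task_done()

worker = threading.Thread(target=controller_b)
worker.start()
controller_a()
jobs.put(None)                   # tell the consumer to stop
worker.join()
print(results)  # → [0, 1, 2, 3, 4]
```

Because A never waits on B, each side can be scaled (or fail and retry) independently, which is exactly what the slide means by loose coupling.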
30. Users >1 Million
[Diagram: User → Amazon Route 53 → Amazon CloudFront + Amazon S3 → Elastic Load Balancing → Web Instances → RDS DB Instance Active (Multi-AZ) with two Read Replicas, Amazon DynamoDB, and ElastiCache; Amazon SQS feeding Worker Instances; Internal App Instances; Amazon CloudWatch for monitoring; Amazon SES for email]
32. From 5 to 10 Million Users
You may start to run into issues with your database around contention on the write master.
How can you solve it?
Federation (splitting into multiple DBs based on function)
Sharding (splitting one data set up across multiple hosts)
Moving some functionality to other types of DBs (NoSQL)
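The sharding option above can be sketched in a few lines. This is the simplest hash-modulo scheme, with hypothetical shard endpoint names, not necessarily what any given customer used; real systems often prefer consistent hashing or a lookup table so that adding shards does not remap every key:

```python
import hashlib

# Four hypothetical shard endpoints; names are illustrative only.
SHARDS = [
    "shard-0.example.internal",
    "shard-1.example.internal",
    "shard-2.example.internal",
    "shard-3.example.internal",
]

def shard_for(user_id):
    """Deterministically map a user to one shard via a stable hash.

    Using a cryptographic hash (rather than Python's built-in hash())
    keeps the mapping identical across processes and restarts.
    """
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

# The same user always lands on the same shard, so reads find their writes.
print(shard_for("alice") == shard_for("alice"))  # → True
```

Federation, by contrast, needs no hash at all: the routing key is simply the function (users DB, orders DB, products DB, and so on).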
33. From 5 to 10 Million Users
You may start to run into issues with the speed and performance of your applications
Make sure you have monitoring, metrics, & logging in place
• If you can’t build it internally, outsource it! (third-party SaaS)
Pay attention to what customers say works well vs. what doesn’t, and use this as direction
Try to squeeze as much performance as possible out of each service or component
35. Sizing for Peak Loads
Promotions cause huge spikes in user activity
Auto Scaling works for the web and middle tiers
RDS instances have to be sized for peak loads
The customer adopted our recommendations in a staged approach
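Sizing for a peak is, at its core, simple arithmetic. A back-of-the-envelope sketch; the throughput figures and the 25% headroom are illustrative assumptions, not AWS guidance, and real per-instance throughput should come from load testing:

```python
import math

def desired_instances(expected_peak_rps, per_instance_rps, headroom=0.25):
    """Fleet size for a promotion: peak load divided by per-instance
    throughput, padded with headroom so a single hot second doesn't
    saturate the tier. All numbers here are illustrative."""
    needed = expected_peak_rps * (1 + headroom) / per_instance_rps
    return math.ceil(needed)

# e.g. a promotion expected to peak at 2,500 requests/s, with each web
# instance handling ~150 requests/s in load tests:
print(desired_instances(2500, 150))  # → 21
```

For the auto-scaled web and middle tiers this number becomes the scaling target; for RDS, which the slide notes cannot scale out on demand the same way, it has to be provisioned up front.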
36. [Diagram: User → Geo Routing with Amazon Route 53 → CloudFront + Amazon S3 → US East: Amazon EC2 with Auto Scaling across Availability Zones #1 and #2 → RDS DB Instance Active (Multi-AZ) with a Standby; Amazon CloudWatch for monitoring]
37. [Diagram: User → Geo Routing with Amazon Route 53 → CloudFront + Amazon S3 → US East: Amazon EC2 with Auto Scaling across two Availability Zones → RDS DB Instance Active (Multi-AZ) with a Standby, plus an RDS DB instance read replica; Amazon CloudWatch]
38. [Diagram: User → Geo Routing with Amazon Route 53 → CloudFront + Amazon S3 + DynamoDB → US East: Amazon EC2 with Auto Scaling across two Availability Zones → RDS DB Instance Active (Multi-AZ) with a Standby, plus a read replica; Amazon CloudWatch]
39. [Diagram: User → Geo Routing with Amazon Route 53 → CloudFront + Amazon S3 + DynamoDB → US East: Amazon EC2 with Auto Scaling across two Availability Zones, now with ElastiCache Memcached → RDS DB Instance Active (Multi-AZ) with a Standby, plus a read replica; Amazon CloudWatch]
40. [Diagram: User → Geo Routing with Amazon Route 53 → CloudFront + Amazon S3 + DynamoDB → US East: Amazon EC2 with Auto Scaling across two Availability Zones, with ElastiCache Memcached and ElastiCache (Redis Master) plus a Redis Slave → RDS DB Instance Active (Multi-AZ) with a Standby, plus a read replica; Amazon CloudWatch; Amazon Redshift]
41. Lessons Learned
Listen to AWS Business Development and Solution Architects ;)
Gaming promotions much easier to handle
Unpredicted loads also easier to handle
Senior operations person moving to a new game
Customers get a much better gaming experience!
43. Customer Success Stories
Telecommunications Company
iPhone 5s/5c, 6/6+ and Samsung Note III launch
Needed a system to handle a huge number of concurrent requests
Failed previously at the iPhone 5 launch
Management directive to succeed at all costs!
45. Great Success!
Tested with 150,000 concurrent users
All phones gone within 2 minutes
No phones misallocated or unallocated
Management said the system was too fast!
Actual launch went smoothly
46. Lessons
AWS can provide infrastructure for applications to scale to very high numbers of concurrent users
Managed services allow for quick deployment and changes to infrastructure
Impossible for the customer to execute internally
Massive cost savings, even with huge over-provisioning
47. “With our systems on AWS, we can scale our resources more than 130-fold in 30 minutes, enabling us to support more than 2,500 orders per second”
KT Chiu, Founder and Chief Executive Officer, TixCraft