Understand how to architect infrastructure that can grow from zero to millions of users. From using highly scalable AWS services to making smart decisions as you build out your application, you'll learn a number of best practices for scaling your infrastructure in the cloud.
1. Scaling the Platform for Your Startup
Andreas Chatzakis
AWS Solutions Architecture
2. Why are you here?
• Building the technology platform for your startup
• You want to prepare for success
• Learn about design patterns & scalability
• A pragmatic approach for startups
3. Priorities for startups
• Racing within a window of opportunity
• Small team with no legacy
• Focus on solving a problem
• Avoid over-engineering & re-engineering
• Reduce risk of failure when you go viral
4. A scalable architecture
• Can support growth in users, traffic, data size
• Without practical limits
• Without a drop in performance
• Seamlessly - just by adding more resources
• Efficiently - in terms of cost per user
6. AWS Regions
US-WEST (Oregon)
EU-WEST (Ireland)
ASIA PAC (Tokyo)
US-WEST (N. California)
SOUTH AMERICA (Sao Paulo)
US-EAST (Virginia)
AWS GovCloud (US)
ASIA PAC (Sydney)
ASIA PAC (Singapore)
CHINA (Beijing)
7. Availability Zones (AZs)
[Map: the regions listed above, plus EU-CENTRAL (Frankfurt), each shown with its Availability Zones]
13. We need a bigger server
• Add larger & faster storage (EBS)
• Use the right instance type
• Easy to change instance sizes
• Not our long term strategy
• Will eventually hit a ceiling
• No fault tolerance
14. Separating web and DB
• More capacity
• Scale each tier individually
• Tailor instance for each tier
– Instance type
– Storage
• Security
– Security groups
– DB in a private VPC subnet
15. But how do I choose which DB technology I need?
SQL? NoSQL?
16. Why start with a Relational DB?
• SQL is versatile & feature-rich
• Lots of existing code, tools, knowledge
• Clear patterns to scalability (for read-heavy apps)
• Reality: eventually you will have a polyglot data layer
– There will be workloads where NoSQL is a better fit
– Combination of both Relational and NoSQL
– Use the right tool for each workload
17. Key Insight: Relational Databases are Complex
• Our experience running Amazon.com taught us that relational databases can be a pain to manage and operate with high availability
• Poorly managed relational databases are a leading cause of lost sleep and downtime in the IT world!
• Especially for startups with small teams
20. Offload static content
• Amazon S3: highly available hosting that scales
– Static files (JavaScript, CSS, images)
– User uploads
• S3 URLs – serve directly from S3
• Let the web server focus on dynamic content
21. Masterclass Live: Amazon S3
Upcoming Session at the AWS Pop Up Loft
Wednesday 16th September, 10.00
22. Amazon CloudFront
• Worldwide network of edge locations
• Cache on the edge
– Reduce latency
– Reduce load on origin servers
– Static and dynamic content
– Even caching popular content for just a few seconds can have a huge impact
• Connection optimizations
– Optimize transfer route
– Reuse connections
– Benefits even non-cacheable content
23. CloudFront for static & dynamic content
[Diagram: Amazon Route 53 points to a CloudFront distribution. The css/*, js/* and Images/* behaviors send static content to an S3 bucket; the Default (*) behavior sends dynamic content to EC2 instance(s).]
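To make the routing rule concrete, here is a minimal Python sketch of how path patterns pick an origin. The origin names `s3-static-bucket` and `ec2-web-servers` and the function `pick_origin` are hypothetical. The sketch assumes behaviors are checked in order with Default (*) as the fallback, and it uses `fnmatchcase` only as an approximation of CloudFront's case-sensitive wildcard matching, not CloudFront's actual matcher.

```python
from fnmatch import fnmatchcase

# Hypothetical cache behaviors mirroring the diagram: (path pattern, origin),
# checked in order; anything unmatched falls through to the default behavior.
BEHAVIORS = [
    ("css/*", "s3-static-bucket"),
    ("js/*", "s3-static-bucket"),
    ("Images/*", "s3-static-bucket"),
]
DEFAULT_ORIGIN = "ec2-web-servers"  # Default (*): dynamic content


def pick_origin(path: str) -> str:
    """Return the origin that would serve `path` (a leading '/' is ignored)."""
    key = path.lstrip("/")
    for pattern, origin in BEHAVIORS:
        # Case-sensitive match; '*' here also matches '/', so nested paths match.
        if fnmatchcase(key, pattern):
            return origin
    return DEFAULT_ORIGIN
```

Because matching is case-sensitive, `/Images/logo.png` is served from S3 while `/images/logo.png` falls through to the EC2 origin, so keep path casing consistent between your HTML and your behaviors.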
24. Database caching
• Faster response from RAM
• Reduce load on database
[Diagram: an application server between Amazon ElastiCache and an RDS database]
• Step 1: If the data is in the cache, return the result
• Step 2: If not, read it from the DB
• Step 3: Store the result in the cache
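The three steps above are the cache-aside (lazy loading) pattern. The sketch below assumes a plain dict with per-entry expiry standing in for ElastiCache and a caller-supplied `load_from_db` function standing in for an RDS query; the class name `CacheAside` and the 60-second default TTL are illustrative choices, not an ElastiCache API.

```python
import time
from typing import Any, Callable, Dict, Tuple


class CacheAside:
    """Cache-aside sketch: a dict plays the cache, load_from_db plays the DB."""

    def __init__(self, load_from_db: Callable[[str], Any], ttl_seconds: float = 60.0):
        self._load_from_db = load_from_db
        self._ttl = ttl_seconds
        # key -> (expires_at, value)
        self._cache: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Any:
        now = time.monotonic()
        entry = self._cache.get(key)
        # Step 1: if the data is in the cache and not expired, return it.
        if entry is not None and entry[0] > now:
            return entry[1]
        # Step 2: otherwise read it from the database...
        value = self._load_from_db(key)
        # Step 3: ...and store it in the cache with an expiry time.
        self._cache[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key: str) -> None:
        """Drop a cached entry, e.g. right after writing that row to the DB."""
        self._cache.pop(key, None)
```

The TTL bounds how stale a cached value can get; invalidating on write keeps hot keys fresher at the cost of one extra DB read on the next request.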
27. High Availability
[Diagram: www.example.com is resolved by the Amazon Route 53 DNS service; Amazon CloudFront serves the S3 bucket for static assets; a single web server, the RDS DB instance and ElastiCache node 1 all sit in Availability Zone a]
28. High Availability
[Diagram: as on slide 27, plus a second web server in Availability Zone b]
29. High Availability
[Diagram: Elastic Load Balancing now sits behind Amazon Route 53 and spreads traffic across web servers in Availability Zones a and b; the RDS DB instance and ElastiCache node 1 remain in Availability Zone a]
30. Elastic Load Balancing
• Managed Load Balancing Service
• Fault tolerant
• Health Checks
• Distributes traffic across AZs
• Elastic – automatically scales its capacity
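The health-check behavior above can be illustrated with a toy round-robin balancer that only routes to targets currently marked healthy. This is a simplified model under stated assumptions, not how Elastic Load Balancing is implemented; the class names, instance IDs and zone names are hypothetical.

```python
import itertools
from dataclasses import dataclass
from typing import List


@dataclass
class Target:
    instance_id: str
    availability_zone: str
    healthy: bool = True


class ToyLoadBalancer:
    """Round-robin over healthy targets only (illustrative, not ELB's algorithm)."""

    def __init__(self, targets: List[Target]):
        self.targets = targets
        self._counter = itertools.count()

    def mark(self, instance_id: str, healthy: bool) -> None:
        # With ELB, this status would come from its periodic health checks.
        for target in self.targets:
            if target.instance_id == instance_id:
                target.healthy = healthy

    def route(self) -> Target:
        healthy = [t for t in self.targets if t.healthy]
        if not healthy:
            raise RuntimeError("no healthy targets")
        # Alternate between healthy targets, which spans both AZs when both are up.
        return healthy[next(self._counter) % len(healthy)]
```

If every instance in one Availability Zone fails its checks, traffic simply shifts to the remaining zone, which is why the web tier is spread across AZs.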
31. Data Layer HA
[Diagram: the load-balanced architecture from slide 29; the data layer (RDS DB instance and ElastiCache node 1) is still only in Availability Zone a]
32. Data Layer HA
[Diagram: an RDS DB standby is added in Availability Zone b for the RDS DB instance in Availability Zone a]
33. Data Layer HA
[Diagram: same as slide 32]
34. Data Layer HA
[Diagram: ElastiCache node 2 is added in Availability Zone b, alongside the RDS DB standby]
35. User sessions
• Problem: often stored on local disk (not shared)
• Quick fix: ELB session stickiness
• Solution: DynamoDB
[Diagram: behind Elastic Load Balancing, the same user appears logged in on one web server and logged out on the other]
36. Amazon DynamoDB
• Managed document and key-value NoSQL DB
• Simple to launch and scale
– To millions of IOPS
– Both reads and writes
• Consistent, fast performance
• Durable: perfect for storage of session data
https://github.com/aws/aws-dynamodb-session-tomcat
http://docs.aws.amazon.com/aws-sdk-php/guide/latest/feature-dynamodb-session-handler.html
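The sketch below shows why moving sessions off local disk fixes the "logged in / logged out" problem: two stateless web servers share one session store. It assumes an in-memory dict stands in for a DynamoDB session table (no AWS calls are made); `SharedSessionStore`, `WebServer` and the 30-minute TTL are hypothetical names and values, not the API of the session handlers linked above.

```python
import secrets
import time
from typing import Any, Dict, Optional, Tuple


class SharedSessionStore:
    """In-memory stand-in for a shared session table (e.g. one in DynamoDB)."""

    def __init__(self, ttl_seconds: float = 1800.0):
        self._ttl = ttl_seconds
        # session_id -> (expires_at, session data)
        self._items: Dict[str, Tuple[float, Dict[str, Any]]] = {}

    def create(self, data: Dict[str, Any]) -> str:
        session_id = secrets.token_urlsafe(16)  # unguessable session key
        self._items[session_id] = (time.monotonic() + self._ttl, dict(data))
        return session_id

    def load(self, session_id: str) -> Optional[Dict[str, Any]]:
        item = self._items.get(session_id)
        if item is None or item[0] <= time.monotonic():
            self._items.pop(session_id, None)  # unknown or expired session
            return None
        return dict(item[1])


class WebServer:
    """A stateless web server: no session data is kept on local disk."""

    def __init__(self, name: str, store: SharedSessionStore):
        self.name = name
        self.store = store

    def whoami(self, session_id: str) -> str:
        session = self.store.load(session_id)
        return session["user"] if session else "anonymous (logged out)"
```

Because neither server holds session state, the load balancer can send each request anywhere, and session stickiness is no longer required.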
37. AWS bootcamp: Architecting Highly Available Applications on AWS
Free Training at the AWS Pop Up Loft
9th October 10.00
39. Replace guesswork with elastic IT
[Chart: for startups pre-AWS, fixed traditional capacity either falls short of demand (unhappy customers) or exceeds it (waste $$$); on the AWS Cloud, capacity follows demand]
40. Scaling the web tier
[Diagram: one web server in each of Availability Zones a and b behind Elastic Load Balancing, with the RDS DB instance and standby and ElastiCache nodes 1 and 2]
41. Scaling the web tier
[Diagram: two more web servers are added, giving four web servers behind Elastic Load Balancing]
42. Scaling the web tier
[Diagram: same as slide 41]
43. Automatic resizing of compute clusters based on demand
• Control: define minimum and maximum instance pool sizes, and when scaling and cooldown occur.
• Integrated with Amazon CloudWatch: use metrics gathered by CloudWatch to drive scaling.
• Instance types: run Auto Scaling for On-Demand and Spot Instances. Compatible with VPC.
aws autoscaling create-auto-scaling-group \
  --auto-scaling-group-name MyGroup \
  --launch-configuration-name MyConfig \
  --min-size 4 \
  --max-size 200 \
  --availability-zones us-west-2c us-west-2b
[Diagram: an Amazon CloudWatch alarm triggers an Auto Scaling policy]
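To show how minimum/maximum sizes, a CloudWatch metric and a cooldown interact, here is a toy simple-scaling policy in Python. The `min_size` and `max_size` defaults match the CLI example above; the CPU thresholds (70% / 30%), the step of 2 instances and the 300-second cooldown are hypothetical values, and this is not the exact algorithm Auto Scaling uses.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScalingGroup:
    """Toy Auto Scaling group; thresholds and step size are hypothetical."""
    min_size: int = 4          # --min-size in the CLI example
    max_size: int = 200        # --max-size in the CLI example
    desired: int = 4
    cooldown_s: float = 300.0
    last_scaled_at: Optional[float] = None

    def evaluate(self, avg_cpu_percent: float, now: float,
                 scale_out_above: float = 70.0, scale_in_below: float = 30.0,
                 step: int = 2) -> int:
        """Apply one metric reading and return the new desired capacity."""
        # Ignore alarms while the group is still cooling down from the last change.
        if self.last_scaled_at is not None and now - self.last_scaled_at < self.cooldown_s:
            return self.desired
        if avg_cpu_percent > scale_out_above:
            target = self.desired + step
        elif avg_cpu_percent < scale_in_below:
            target = self.desired - step
        else:
            return self.desired
        # Never leave the [min_size, max_size] range.
        target = max(self.min_size, min(self.max_size, target))
        if target != self.desired:
            self.desired = target
            self.last_scaled_at = now
        return self.desired
```

The cooldown stops the group from reacting again before newly launched instances have had time to absorb load, and the minimum size keeps spare capacity available even at quiet times.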
47. What does this mean in practice?
• Only store transient data on local disk
• Needs to persist beyond a single HTTP request?
– Then store it elsewhere:
– User uploads: Amazon S3
– User sessions: Amazon DynamoDB
– Application data: Amazon RDS
48. Having decomposed into small, loosely coupled, stateless building blocks, you can now scale out with ease
49. Having decomposed into small, loosely coupled, stateless building blocks, you can also scale back with ease
50. Take the shortcut
• While this architecture is simple, you still need to deal with:
– Configuration details
– Deploying code to multiple instances
– Maintaining multiple environments (Dev, Test, Prod)
– Maintaining different versions of the application
• Solution: Use AWS Elastic Beanstalk
51. AWS Elastic Beanstalk (EB)
• Easily deploy, monitor, and scale three-tier web applications and services.
• Infrastructure provisioned and managed by EB
• You maintain control.
• Preconfigured application containers
• Easily customizable.
• Support for these platforms:
52. Deploy your Apps with AWS Elastic Beanstalk
Upcoming Session at the AWS Pop Up Loft
Tuesday 15th September, 16.00
54. [Diagram: AWS services, layered between Your Applications and the AWS Global Infrastructure]
• Mobile: Push Notifications, Mobile Analytics, Cognito, Cognito Sync
• Analytics: Kinesis, Data Pipeline, Redshift, EMR
• Network: VPC, Direct Connect, Route 53
• Storage: EBS, S3, Glacier, CloudFront
• Database: DynamoDB, RDS, ElastiCache
• Deployment & Management: Elastic Beanstalk, OpsWorks, CloudFormation, CodeDeploy, CodePipeline, CodeCommit
• Security & Administration: CloudWatch, Config, CloudTrail, IAM, Directory, KMS
• Application: SQS, SWF, AppStream, Elastic Transcoder, SES, CloudSearch, SNS
• Enterprise Applications: WorkSpaces, WorkMail, WorkDocs
• Compute: EC2, ELB, Auto Scaling, Lambda, ECS
55. AWS building blocks
Inherently scalable & highly available (automated):
• Elastic Load Balancing
• Amazon CloudFront
• Amazon Route 53
• Amazon S3
• Amazon SNS / SQS
• Amazon SES
• Amazon CloudSearch
• AWS Lambda
• …
Scalable & highly available (configurable):
• Amazon DynamoDB
• Amazon Redshift
• Amazon RDS
• Amazon ElastiCache
• …
With the right architecture:
• Amazon EC2
• Amazon VPC
56. Stay focused as you scale your team
[Chart: with on-premise infrastructure, 70% of effort goes to managing all of the "undifferentiated heavy lifting" and 30% to your business. With AWS cloud-based infrastructure, 30% goes to configuring your cloud assets and 70% is time to focus on your business.]
57. Don't reinvent the wheel
If you find yourself writing your own…
• Notification system: Amazon SNS
• E-mail component: Amazon SES
• Search engine: Amazon CloudSearch
• Workflow engine: Amazon SWF
• Queue: Amazon SQS
• Transcoding system: Amazon Elastic Transcoder
• Monitoring system
58. Search features
• Free-text and structured
• Synonyms
• Stemming
• Relevance
• Complex scoring
• Faceting
• Geospatial
59. Amazon CloudSearch is a fully managed search service in the cloud for your website or application
60. Rich search feature set
• Faceting
• Highlighting
• Autocomplete suggestions
• Geospatial search
(Source: IDC, Nielsen, Twitter blog)
64. Prepare for the challenges
• Increase in concurrent users
• Data size growth
• More features => more DB tables
• Technical debt (e.g. inefficient queries)
– Reduce it
– Manage it
66. Amazon Aurora
• MySQL-compatible
• Available, durable, and fault tolerant
• 5X the performance of a high-end MySQL database
• Highly scalable and secure
• Up to 64TB of storage
• 1/10th the cost of the leading commercial database solutions
67. Amazon Aurora - Write performance
• MySQL Sysbench
• R3.8XL with 32 cores and 244 GB RAM
• 4 client machines with 1,000 threads each
68. Amazon Aurora - Read performance
• MySQL Sysbench
• R3.8XL with 32 cores and 244 GB RAM
• Single client with 1,000 threads
69. Scaling Relational DBs – option 2
Read Replicas (Master – Slave)
– Scale out beyond capacity of single DB instance
– Available in Amazon RDS for MySQL, PostgreSQL and Amazon Aurora
– Replication lag: replicas can trail the master, so they may return slightly stale data
– Writes => master
– Reads with tolerance to stale data => read replica (slave)
– Reads with need for most recent data => master
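The three routing rules above can be captured in a few lines. In this Python sketch the caller states whether a query writes and whether it must see the latest data; `ReplicaRouter` and the endpoint strings are hypothetical placeholders, not real RDS hostnames or a driver API.

```python
import itertools
from typing import List


class ReplicaRouter:
    """Picks a DB endpoint per query using the master/read-replica rules."""

    def __init__(self, master: str, replicas: List[str]):
        self.master = master
        # Spread stale-tolerant reads over the replicas in round-robin order.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def endpoint_for(self, is_write: bool, needs_latest: bool = False) -> str:
        # Writes, and reads that must see the most recent data, go to the master.
        if is_write or needs_latest or self._replicas is None:
            return self.master
        # Reads that tolerate replication lag go to a read replica.
        return next(self._replicas)
```

A typical use of `needs_latest=True` is reading back a record the same user just updated (read-your-writes); listings, search pages and reports can usually tolerate the lag and go to replicas.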
72. Scaling the DB
[Diagram: four web servers behind Elastic Load Balancing across Availability Zones a and b; the RDS DB instance and standby plus two RDS read replicas; ElastiCache nodes 1 and 2; S3 bucket for static assets; Amazon Route 53 for www.example.com]
73. Amazon Aurora Replicas have less replication lag
[Chart: read replica lag vs. updates per second, Amazon Aurora vs. RDS MySQL with 30,000 IOPS (single AZ)]
Updates per second | Amazon Aurora | RDS MySQL, 30K IOPS (single AZ)
1,000 | 2.62 ms | 0 s
2,000 | 3.42 ms | 1 s
5,000 | 3.94 ms | 60 s
10,000 | 5.38 ms | 300 s
Test setup: write workload, 250 tables; query cache on for Amazon Aurora, off for MySQL (best settings).
74. What if your app is write-heavy?
Challenge: You will eventually hit the write throughput or storage limit of the master node
Solutions:
• Federation (splitting into multiple DBs based on function)
• Sharding (splitting one data set up across multiple hosts)
75. Database federation
• Split up databases by function/purpose
• Harder to do cross-function queries
• Essentially delays the need for something like sharding/NoSQL until much further down the line
• Won't help with single huge functions/tables
[Diagram: separate Forums DB, Users DB and Products DB]
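Federation in practice means the application chooses a database by function, and any query that spans functions has to be stitched together in code. The Python sketch below assumes a hypothetical mapping (`FEDERATED_DBS`) with placeholder hostnames and uses plain lists/dicts in place of query results from the Forums and Users databases.

```python
from typing import Any, Dict, List

# Hypothetical function -> database mapping, following the Forums / Users /
# Products split on the slide. Hostnames are placeholders.
FEDERATED_DBS: Dict[str, str] = {
    "forums": "forums-db.example.internal",
    "users": "users-db.example.internal",
    "products": "products-db.example.internal",
}


def db_for(function: str) -> str:
    """Return the database that owns a functional area."""
    if function not in FEDERATED_DBS:
        raise ValueError(f"no database owns function {function!r}")
    return FEDERATED_DBS[function]


def posts_with_author_names(
    posts: List[Dict[str, Any]], users_by_id: Dict[int, Dict[str, Any]]
) -> List[Dict[str, Any]]:
    """Join forum posts (Forums DB) with user names (Users DB) in application code.

    The two tables now live on different database servers, so a single SQL
    JOIN is no longer possible; this is the cross-function cost of federation.
    """
    return [
        {**post, "author": users_by_id.get(post["user_id"], {}).get("name", "unknown")}
        for post in posts
    ]
```

Each database can now be sized and scaled on its own, but every cross-function report like this one moves work from the database into the application.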
76. Sharded horizontal scaling
• More complex at the application layer
• ORM support can help
• No practical limit on scalability
• Operational complexity/sophistication
• Shard by function or key space
• RDBMS or NoSQL
User | ShardID
002345 | A
002346 | B
002347 | C
002348 | B
002349 | A
[Diagram: Shard A, Shard B, Shard C]
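Two common ways to map a key to a shard are sketched below in Python: an explicit lookup table (as in the User/ShardID table above) and hashing the key. The function names are hypothetical, and SHA-256 is used only as a stable, evenly spreading hash; Python's built-in `hash()` is randomized per process for strings, so it would send the same user to different shards on different servers.

```python
import hashlib
from typing import Dict, List

# Directory-based sharding: an explicit lookup table, as on the slide.
SHARD_DIRECTORY: Dict[str, str] = {
    "002345": "A",
    "002346": "B",
    "002347": "C",
    "002348": "B",
    "002349": "A",
}


def shard_by_directory(user_id: str) -> str:
    """Look the user up in the directory (raises KeyError for unknown users)."""
    return SHARD_DIRECTORY[user_id]


def shard_by_hash(user_id: str, shards: List[str]) -> str:
    """Key-space sharding: a stable hash of the key picks the shard."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]
```

Hashing needs no lookup table, but with plain modulo most keys move when the number of shards changes; a directory (or consistent hashing) lets you rebalance by moving only selected users.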
77. NoSQL data stores
• Trade the query & integrity features of relational DBs for:
– More flexible data model
– Horizontal scalability & predictable performance
• DynamoDB: provisioned read/write performance per table
78. Massive and Seamless Scale
• Distributed system that can scale both reads and writes
– Sharding + Replicas
• Automatic partitioning, driven by:
– Data set size growth
– Increases in a table's provisioned capacity
80. Increased provisioned throughput
[Illustrative diagram only: within a Region, the table's data is spread across multiple partitions, each on SSD storage]
81. High provisioned throughput
[Illustrative diagram only: within a Region, the same table is spread across many more partitions]
84. AWS Marketplace & Partners Can Help
• Find, research, buy software
• Aligns with EC2 usage model
• Launch in minutes
• Marketplace billing integrated into your AWS account
• 20+ categories
Learn more at: aws.amazon.com/marketplace
86. Amazon Simple Queue Service (SQS)
• Place asynchronous tasks into Amazon SQS
• Respond quickly to end users
• Protect backend systems from spikes
• Process at your own pace
• Protect end users from backend problems
• Items will be processed eventually
[Diagram: instead of tight coupling, a front-end EC2 instance puts messages onto the SQS queue and a back-end EC2 instance gets messages from it]
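The put/get decoupling above can be sketched locally with Python's `queue.Queue` standing in for an SQS queue; no AWS calls are made. The function names and the thumbnail task format are hypothetical. The front end returns as soon as the message is queued, and the worker drains the queue at its own pace.

```python
import json
import queue
import threading
from typing import Callable

# Local stand-in for an Amazon SQS queue.
task_queue: "queue.Queue[str]" = queue.Queue()


def front_end_handle_upload(photo_id: str) -> str:
    """Front end: enqueue the slow work, then respond to the user at once."""
    message = {"task": "make_thumbnail", "photo_id": photo_id}  # hypothetical format
    task_queue.put(json.dumps(message))
    return f"accepted {photo_id}"


def back_end_worker(process: Callable[[dict], None], stop: threading.Event) -> None:
    """Back end: take messages at its own pace; drain the queue before stopping."""
    while not (stop.is_set() and task_queue.empty()):
        try:
            body = task_queue.get(timeout=0.1)
        except queue.Empty:
            continue
        process(json.loads(body))
        # With real SQS, this is where the worker would delete the message.
        task_queue.task_done()


if __name__ == "__main__":
    processed = []
    stop = threading.Event()
    worker = threading.Thread(target=back_end_worker, args=(processed.append, stop))
    worker.start()
    for i in range(3):
        print(front_end_handle_upload(f"photo-{i}"))
    task_queue.join()  # wait until every queued message has been processed
    stop.set()
    worker.join()
    print(processed)
```

Unlike this in-process FIFO, a standard SQS queue delivers messages at least once and does not guarantee order, so back-end processing should be idempotent.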
91. Event-Driven Compute in the Cloud
Lambda functions: Stateless, request-driven code execution
• Triggered by events in other services:
– PUT to an Amazon S3 bucket
– Write to an Amazon DynamoDB table
– Record in an Amazon Kinesis stream
– Amazon SNS message received
– Changes in Amazon Cognito data
• Makes it easy to…
– Transform data as it reaches the cloud
– Perform data-driven auditing, analysis, and notification
– Kick off workflows
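As an example of the first trigger, here is a minimal Python Lambda handler for S3 PUT notifications. It uses the standard `handler(event, context)` entry point and reads the bucket name and object key from each record (`Records[].s3.bucket.name`, `Records[].s3.object.key`); returning a list of `s3://` locations is an illustrative choice, and a real function would do its processing (e.g. thumbnailing) where noted.

```python
from typing import Any, Dict, List
from urllib.parse import unquote_plus


def handler(event: Dict[str, Any], context: Any) -> List[str]:
    """Collect the locations of objects reported in an S3 event notification."""
    uploaded = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded (e.g. spaces as '+') in S3 events.
        key = unquote_plus(record["s3"]["object"]["key"])
        # A real function would transform the object or start a workflow here.
        uploaded.append(f"s3://{bucket}/{key}")
    return uploaded
```

Because the function keeps no state between invocations, Lambda can run as many copies in parallel as there are incoming events.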
92. Data Triggers – Amazon Simple Notification Service
[Diagram: a CloudWatch metric alarm publishes to SNS, which invokes a Lambda function]
96.
• Dynamic content generation based on incoming news text and images
• Real-time log processing for predictive analytics
• Thumbnailing installation site photos for mobile use
• Real-time processing and recording of inbound traffic from a range of social media platforms
• Large-scale distributed search across blog content
• Operational analytics and real-time troubleshooting
97. Introducing Amazon API Gateway
[Diagram: mobile apps, websites and services call Amazon API Gateway over the Internet. API Gateway, with its cache, forwards requests to AWS Lambda functions, endpoints on Amazon EC2 / AWS Elastic Beanstalk, or any other publicly accessible endpoint; Amazon CloudWatch provides monitoring.]
98. "No server is easier to manage than 'no server'."
Werner Vogels
101. Experiment and collect data
• Build an MVP
• Run A/B testing
• Analyze user behavior
• Iterate
• Get insights about your customers
102. Data produced vs. data available for analysis
[Chart; sources: Gartner, User Survey Analysis: Key Trends Shaping the Future of Data Center Infrastructure Through 2011; IDC, Worldwide Business Analytics Software 2012–2016 Forecast and 2011 Vendor Shares]
103. Three Types of Data Analytics
• Retrospective analysis and reporting: Amazon Redshift, Amazon RDS, Amazon S3, Amazon EMR, AWS Marketplace
• Here-and-now real-time processing and dashboards: Amazon Kinesis, Amazon EC2, Amazon DynamoDB, AWS Lambda
• Predictions to enable smart apps: Amazon Machine Learning, Amazon EMR, AWS Lambda
108. Amazon Redshift Node Types
• Dense Compute (DW2.L): 15 GB RAM, 160 GB SSD, 2 vCPUs
– Single node (160 GB)
– Cluster of 2-32 nodes (320 GB – 5 TB)
114. We've announced price reductions 42* times since our inception in 2006. Recent price drops included…
• January 21, 2014 (50%): Amazon EBS standard volume prices are lowered up to 50% for both storage and I/O requests.
• March 26, 2014 (34%): Amazon is reducing prices for Amazon ElastiCache cache nodes by an average of 34%.
• March 26, 2014 (51%): Amazon S3: we are reducing prices for Standard and Reduced Redundancy Storage by an average of 51%.
*as of April 1, 2014
115. Main cost-saving principles
1. Turn off the lights
2. Use Auto Scaling
3. Use newer instance types
4. Use the right instances for your workload
5. Use Reserved Instances for predictable workloads
6. Use Spot Instances for async workloads
7. Leverage Amazon S3 storage classes
8. Use Glacier for archival
9. Serve content through CloudFront
10. Offload your architecture
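Principle 1 ("turn off the lights") is simple arithmetic: instances you stop outside working hours stop accruing hourly compute charges. The Python sketch below assumes a hypothetical dev/test environment used 12 hours a day on weekdays; the function names are illustrative, and no prices are assumed, only instance-hours.

```python
HOURS_PER_WEEK = 24 * 7  # 168 hours


def weekly_on_hours(days_per_week: int, hours_per_day: float) -> float:
    """Hours per week an instance runs under a fixed office-hours schedule."""
    return days_per_week * hours_per_day


def instance_hours_saved(on_hours: float, total_hours: float = HOURS_PER_WEEK) -> float:
    """Fraction of instance-hours avoided compared with running 24x7."""
    if not 0 <= on_hours <= total_hours:
        raise ValueError("on_hours must be between 0 and total_hours")
    return 1 - on_hours / total_hours


# Hypothetical schedule: 12 hours a day, 5 days a week.
dev_hours = weekly_on_hours(5, 12)       # 60 hours
saved = instance_hours_saved(dev_hours)  # 108 / 168
print(f"{dev_hours:.0f} h/week on, {saved:.0%} of instance-hours saved")
# prints: 60 h/week on, 64% of instance-hours saved
```

The saving applies to hourly compute charges only; EBS volumes attached to stopped instances continue to be billed.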
118. No limit
[Diagram: the full architecture: Amazon Route 53 DNS service (www.example.com), Elastic Load Balancing across Availability Zones a and b, the RDS DB instance and standby with four RDS read replicas, ElastiCache nodes 1 to 4, DynamoDB, an S3 bucket for static assets, and Amazon CloudSearch, AWS Lambda, Amazon SES and Amazon SQS]
119. A quick review
• Keep it simple and stateless
• Make use of managed self-scaling services
• Go Multi-AZ and use Auto Scaling for your EC2 infrastructure
• Use the right DB for each workload
• Cache data at multiple levels
• Simplify operations with deployment tools
120. Next steps?
READ!
• aws.amazon.com/documentation
• aws.amazon.com/architecture
• aws.amazon.com/start-ups
• aws.amazon.com/training
ASK FOR HELP!
• forums.aws.amazon.com
• aws.amazon.com/support
122. AWS for Startups
Upcoming Sessions at the AWS Pop Up Loft
11th September, 13.00
16th September, 17.00
29th September, 17.00
14th October, 18.00