6. Utility computing
On demand, Pay as you go, Uniform, Available
Compute, Storage, Security, Scaling, Database, Networking, Monitoring, Messaging, Workflow, DNS, Load Balancing, Backup, CDN
7. Cloud computing benefits
No Up-Front Capital Expense
Pay Only for What You Use
Self-Service Infrastructure
Easily Scale Up and Down
Improve Agility & Time-to-Market
Low Cost
Deploy
12. EC2 scaled to peak of 5000 instances
40 servers to 5000 in 3 days
[Chart: Number of EC2 Instances, 4/12/2008 to 4/20/2008. Annotations: Launch of Facebook modification; “Techcrunched”; steady state of ~40 instances; peak of 5000 instances]
13. Global Infrastructure
[Diagram: AWS Global Infrastructure, Compute, Storage, Database, App Services, Deployment & Administration, Networking]
14. Global Infrastructure
Region
US-WEST (N. California)
EU-WEST (Ireland)
ASIA PAC (Tokyo)
ASIA PAC (Singapore)
US-WEST (Oregon)
SOUTH AMERICA (Sao Paulo)
US-EAST (Virginia)
GOV CLOUD
ASIA PAC (Sydney)
16. Customer Needs
• Store Any Amount of Data
– Without Capacity Planning
• Perform Complex Analysis on Any Data
– Scale on Demand
• Store Data Securely
• Decrease Time to Market
– Build Environments Quickly
• Reduce Costs
– Reduce Capital Expenditure
• Enable Global Reach
18. Simple Storage Service
Highly scalable object storage for the internet
1 byte to 5TB in size
99.999999999% durability
Availability: 99.99%
Durability: 99.999999999%
Is a Web Store
Not a file system
No Single Points of Failure
Eventually consistent
Paradigm: Object store
Performance: Very Fast
Redundancy: Across Availability Zones
Security: Public Key / Private Key
Pricing: $0.095/GB/month
Typical use case: Write once, read many
Limits: 100 Buckets, Unlimited Storage, 5TB Objects
Elastic Block Store
High performance block storage device
1GB to 1TB in size
Mount as drives to instances with snapshot/cloning functionalities
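The availability and durability percentages quoted on this slide are easier to judge as concrete numbers. The following is a minimal sketch, assuming both figures are measured over one 365-day year (the slide does not state the period); the function names are illustrative and not part of any AWS API.

```python
from fractions import Fraction

MINUTES_PER_YEAR = 365 * 24 * 60  # assumption: non-leap year


def downtime_minutes_per_year(availability_percent: str) -> Fraction:
    """Minutes per year a service may be unavailable at this availability."""
    unavailable_share = 1 - Fraction(availability_percent) / 100
    return unavailable_share * MINUTES_PER_YEAR


def expected_objects_lost_per_year(durability_percent: str, objects: int) -> Fraction:
    """Expected number of objects lost per year at this durability."""
    loss_share = 1 - Fraction(durability_percent) / 100
    return loss_share * objects


if __name__ == "__main__":
    # Percentages are passed as strings so Fraction keeps them exact.
    print(float(downtime_minutes_per_year("99.99")))
    print(float(expected_objects_lost_per_year("99.999999999", 10**11)))
```

Exact fractions are used instead of floats because 1 - 0.99999999999 loses precision in binary floating point.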
19. Objects in S3
Peak Requests: 830,000+ per second
[Chart: Total Number of Objects Stored in Amazon S3, Q4 2006 to Q4 2012. Data labels: 14 Billion, 40 Billion, 102 Billion, 262 Billion, 762 Billion, 1.3 Trillion]
20. Glacier
Long term object archive
Extremely low cost per gigabyte
99.999999999% durability
Durability: 99.999999999%
Designed for Archival
Not a file system
Vaults & Archives
3-5 Hour Retrieval Time
Paradigm: Archive Store
Performance: Configurable - Low
Redundancy: Across Availability Zones
Security: Public Key / Private Key
Pricing: $0.011/GB/month
Typical use case: Write once, read infrequently (< 10% / Month)
21. Storage Lifecycle Integration
Simple Storage Service
Highly scalable object storage
1 byte to 5TB in size
99.999999999% durability
Glacier
Long term object archive
Extremely low cost per gigabyte
99.999999999% durability
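To see what moving data from S3 to Glacier can mean for a monthly bill, the per-GB prices quoted on the S3 and Glacier slides ($0.095 and $0.011 per GB per month) can be plugged into a small calculation. This is a rough sketch only: it assumes flat pricing and ignores request, retrieval, and transfer charges, and the function name is illustrative.

```python
from decimal import Decimal

# Prices as quoted on the slides (USD per GB per month).
S3_PRICE_PER_GB = Decimal("0.095")
GLACIER_PRICE_PER_GB = Decimal("0.011")


def monthly_storage_cost(s3_gb: int, glacier_gb: int) -> Decimal:
    """Storage-only monthly cost for data split between S3 and Glacier."""
    return s3_gb * S3_PRICE_PER_GB + glacier_gb * GLACIER_PRICE_PER_GB


if __name__ == "__main__":
    everything_in_s3 = monthly_storage_cost(1000, 0)
    mostly_archived = monthly_storage_cost(200, 800)
    print("1000 GB in S3:", everything_in_s3)
    print("200 GB in S3 + 800 GB in Glacier:", mostly_archived)
```

With these figures, archiving 800 GB of a 1000 GB data set lowers the storage line from $95.00 to $27.80 per month.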
23. Database
Relational Database Service: Managed Oracle, MySQL & SQL Server
DynamoDB: Managed NoSQL Database
Amazon Redshift: Massively Parallel Petabyte Scale Data Warehouse
24. Database
Relational Database Service
Database-as-a-Service
No need to install or manage database instances
Scalable and fault tolerant configurations
Integration with Data Pipeline
25. Database
DynamoDB
Provisioned throughput NoSQL database
Fast, predictable, configurable performance
Fully distributed, fault tolerant HA architecture
Integration with EMR & Hive
26. Database
Redshift
Managed Massively Parallel Petabyte Scale Data Warehouse
Streaming Backup/Restore to S3
Extensive Security
Scales from 2 TB to 1.6 PB
32. Amazon Data Pipeline
Input Datanode: This could be an S3 bucket, RDS table, EMR Hive table, etc.
Activity: This is a data aggregation, manipulation, or copy that runs on a user-configured schedule.
Output Datanode: This supports all the same data sources as the input datanode, but the input and output don’t have to be the same type.
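The three building blocks on this slide (input datanode, activity, output datanode) can be pictured with a small in-memory model. This is a conceptual sketch only and not the AWS Data Pipeline API: the class names, the `schedule` label, and the sample rows are all hypothetical, and nothing here talks to S3 or RDS.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass
class DataNode:
    kind: str                      # e.g. "s3", "rds", "hive" (labels only)
    name: str
    rows: List[Row] = field(default_factory=list)


@dataclass
class Activity:
    name: str
    schedule: str                  # free-form label; no scheduler is implemented
    transform: Callable[[List[Row]], List[Row]]

    def run(self, source: DataNode, sink: DataNode) -> None:
        """Read from the input node, transform, write to the output node."""
        sink.rows = self.transform(source.rows)


def total_by_user(rows: List[Row]) -> List[Row]:
    """Example aggregation: sum the 'amount' field per user."""
    totals: Dict[str, int] = {}
    for row in rows:
        totals[row["user"]] = totals.get(row["user"], 0) + row["amount"]
    return [{"user": u, "total": t} for u, t in sorted(totals.items())]


def demo() -> DataNode:
    # Input and output nodes are deliberately of different kinds.
    source = DataNode("s3", "raw-orders", [
        {"user": "ann", "amount": 5},
        {"user": "bob", "amount": 3},
        {"user": "ann", "amount": 2},
    ])
    sink = DataNode("rds", "order_totals")
    Activity("aggregate-orders", "daily", total_by_user).run(source, sink)
    return sink


if __name__ == "__main__":
    print(demo().rows)
```

The activity only depends on the rows it receives, which is why the input and output nodes can be of different types.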
35. Benefits only possible in the Cloud
Benefits: Pay as you Go | Lower Overall Costs | Stop Guessing Capacity | Agility / Speed / Innovation | Avoid Undifferentiated Heavy Lifting | Go Global in Minutes
Cloud: ✔ | ✔ | ✔ | ✔ | ✔ | ✔
“Private Cloud” / On Premises: X | X | X | X | X | X
37. Ease of Operation
[Diagram: Compute Infrastructure, Hadoop Configuration, Local Disk, Operating System Config, HDFS, Networking, Hive, Pig, HBase, User Defined Software Installation]
38. Ease of Operation
Multiple Hadoop Distributions - Open Source & MapR
Clusters Launched with 1 Command
Up in 5 Minutes
Hard Partitioned per Customer on CPU, Memory and Disk
Dynamic Cluster Resizing
In any of 8 Regions around the Globe
40. Lower TCO
June 2013 Study by Accenture Technology Labs
Not Sponsored or Funded by Amazon
“Accenture assessed the price-performance ratio between bare-metal Hadoop clusters and Hadoop-as-a-Service on Amazon Web Services…[and] revealed that Hadoop-as-a-Service offers better price-performance ratio…”
http://www.accenture.com/us-en/Pages/insight-hadoop-deployment-comparison.aspx
41. Spot 101 - What are Spot Instances
• Spot allows customers to bid on unused EC2 capacity
• Spot price based on supply/demand of instance types in an Availability Zone
• Customer requests are fulfilled when their bid price is higher than the Spot price
• Instances will be interrupted when the Spot price exceeds the bid price
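The fulfilment and interruption rules on this slide can be traced with a short simulation. This is a sketch of a literal reading of the two rules, not of the EC2 API: it assumes the request stays open after an interruption, and that nothing changes when the Spot price exactly equals the bid (the slide does not cover that case). The function name and price series are made up for illustration.

```python
from typing import List


def simulate_spot(bid: float, spot_prices: List[float]) -> List[str]:
    """Return 'running' or 'waiting' for each Spot price observation."""
    running = False
    states = []
    for price in spot_prices:
        if bid > price:
            running = True      # bid is higher than the Spot price: fulfilled
        elif price > bid:
            running = False     # Spot price exceeds the bid: interrupted
        # price == bid: state is left unchanged (assumption)
        states.append("running" if running else "waiting")
    return states


if __name__ == "__main__":
    prices = [0.12, 0.08, 0.10, 0.15, 0.09]
    for price, state in zip(prices, simulate_spot(0.10, prices)):
        print(f"spot={price:.2f} bid=0.10 -> {state}")
```

A workload run this way has to tolerate being stopped whenever the Spot price rises above the bid.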