3. Problems with this approach
(Diagram: client → application → relational database)
• It doesn’t scale
• Management is hard
• High cost
• Low performance
• Migration is difficult
4. Why do we get these problems?
When all you have is a hammer, everything looks like a nail
(Diagram: client → application → relational database)
6. AWS service and use case mapping
Data use case → AWS service:
• Blob store → Amazon S3
• Hadoop → Amazon EMR
• NoSQL → Amazon DynamoDB
• SQL → Amazon RDS
• Cache → Amazon ElastiCache
• DWH → Amazon Redshift
• ETL → AWS Data Pipeline
• Search → Amazon CloudSearch
8. Social gaming
(Architecture diagram: mobile clients → Elastic Load Balancer → Auto Scaling group → ① Amazon DynamoDB; ② log files to Amazon S3; ③ Amazon Elastic MapReduce)
Social games generate a large volume of transactions, all of which require high performance and extreme scalability.
① Player data is stored in Amazon DynamoDB, which scales in both data volume and performance.
② Long-term usage log files are sent in parallel to S3 for cheap, effectively unlimited storage.
③ Big data analytics run on EMR, which integrates easily with both DynamoDB and S3.
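Point ① above hinges on how player data is keyed for DynamoDB. As a minimal sketch, the snippet below shapes a player record in DynamoDB's low-level wire format, in which every attribute carries a type tag ("S" for string, "N" for number serialized as a string). The table and attribute names ("Players", "player_id", "level", "score") are hypothetical, not from the slides.

```python
# Sketch: shaping a player-data item for DynamoDB's low-level API.
# Attribute names and the "Players" table are hypothetical examples.

def build_player_item(player_id, level, score):
    """Build a DynamoDB-formatted item for a single player."""
    return {
        "player_id": {"S": player_id},   # partition key (string)
        "level":     {"N": str(level)},  # numbers are serialized as strings
        "score":     {"N": str(score)},
    }

item = build_player_item("player-42", 7, 13500)
# With boto3 this item could then be written with:
#   boto3.client("dynamodb").put_item(TableName="Players", Item=item)
print(item["player_id"]["S"])  # player-42
```

Keeping the partition key a high-cardinality value such as a player ID is what lets DynamoDB spread load evenly as the player base grows.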
9. E-commerce site
(Architecture diagram: end users → Auto Scaling web tier → ① ElastiCache → ② RDS master/slave → ③ Amazon DynamoDB → ④ Amazon CloudSearch)
Requirements: high availability, search performance, and the flexibility to rapidly change data structures to meet new business requirements.
① For high-performance, low-latency responses, check the ElastiCache cache first.
② Order and customer information is stored in a traditional but fault-tolerant RDS.
③ Item metadata, such as color and title, is stored in DynamoDB for a very flexible data schema.
④ For scalable search, metadata is indexed into CloudSearch, which handles full-text search easily.
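Point ①, "cache in ElastiCache first", is the classic cache-aside pattern. A minimal sketch, with plain dicts standing in for the Redis/Memcached client and the RDS query layer:

```python
# Cache-aside sketch: try the cache, fall back to the database on a
# miss, then populate the cache for subsequent reads.
# Plain dicts stand in for ElastiCache and RDS here.

cache = {}                                                 # ElastiCache stand-in
database = {"item-1": {"title": "Mug", "color": "blue"}}   # RDS stand-in

def get_item(item_id):
    """Return item data, serving from cache when possible."""
    if item_id in cache:            # cache hit: low-latency path
        return cache[item_id]
    value = database.get(item_id)   # cache miss: query the database
    if value is not None:
        cache[item_id] = value      # warm the cache for next time
    return value

first = get_item("item-1")    # miss: reads the database
second = get_item("item-1")   # hit: served from cache
```

In production the cached entry would also carry a TTL so stale item data eventually expires; that detail is omitted here.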
10. How do I know which service to pick?
The “data temperature” method
11. What is “data temperature”?
http://www.amazon.co.jp/dp/B0016V9FCQ
12. Data temperature
              Hot        Warm     Cold
Volume        MB~GB      GB~TB    PB
Item size     B~KB       KB~MB    KB~TB
Latency       ms         ms-s     min-hr
Durability    Low-high   High     Very high
Request rate  Very high  High     Low
Cost/GB       $$~$       $~¢¢     ¢
The temperature of the data will vary depending on its format and use.
13. The AWS service heat map
(Heat map: services arranged from hot to cold: Amazon ElastiCache, Amazon RDS, Amazon DynamoDB, Amazon S3, Amazon Redshift, Amazon EMR. Moving from hot to cold, data volume and latency go from low to high, while request rate and cost/GB go from high to low.)
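The heat map can be read as a rough lookup from data temperature to candidate services. The mapping below is only an illustration of the slide's ordering, not an official AWS recommendation:

```python
# Illustrative temperature-to-service lookup, following the slide's
# hot-to-cold ordering of services. Not an official AWS mapping.

HEAT_MAP = {
    "hot":  ["Amazon ElastiCache", "Amazon RDS"],
    "warm": ["Amazon DynamoDB", "Amazon S3"],
    "cold": ["Amazon Redshift", "Amazon EMR"],
}

def candidate_services(temperature):
    """Return candidate AWS services for a given data temperature."""
    return HEAT_MAP[temperature.lower()]

print(candidate_services("hot"))   # ['Amazon ElastiCache', 'Amazon RDS']
```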
14. How do I know which service to pick?
The cost estimation method
15. Choosing service based on cost estimate
Example: Should I pick S3 or DynamoDB?
• “I’m currently scoping out a project that will greatly
increase my team’s use of Amazon S3. Hoping you
could answer some questions. The current iteration of
the design calls for many small files, perhaps up to a
billion during peak. The total size would be on the
order of 1.5 TB per month…”
Request rate (writes/s)  Object size (bytes)  Total size (GB/month)  Objects per month
300                      2048                 1483                   777,600,000
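The table's derived columns follow directly from the two inputs: a steady 300 writes/s of 2 KB objects over a 30-day month.

```python
# Reproducing the table: objects per month and total size follow from
# the request rate and object size.

writes_per_second = 300
object_size_bytes = 2048
seconds_per_month = 86_400 * 30          # 30-day month

objects_per_month = writes_per_second * seconds_per_month
total_bytes = objects_per_month * object_size_bytes
total_gib = total_bytes / 2**30          # bytes -> GiB

print(objects_per_month)   # 777600000
print(round(total_gib))    # 1483
```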
16. Choosing service based on cost estimate
Example: Should I pick S3 or DynamoDB?
• Time for …
※: http://calculator.s3.amazonaws.com/index.html?lng=ja_JP
17. Choosing service based on cost estimate
Example: Should I pick S3 or DynamoDB?
Request rate  Object size  Total size  Objects
300           2048         1483        777,600,000

DynamoDB monthly cost:  $669.56
Amazon S3 monthly cost: $4,325.33
18. Choosing service based on cost estimate
Example: Should I pick S3 or DynamoDB?
            Request rate  Object size  Total size  Objects      Winner
Scenario 1  300           2048         1483        777,600,000  DynamoDB
Scenario 2  300           32,768       23,730      777,600,000  Amazon S3
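Why the winner flips with object size: DynamoDB bills writes in 1 KB capacity units, so its cost grows with item size, while S3 bills per request regardless of size. The sketch below uses hypothetical placeholder unit prices (not current AWS pricing); only the shape of the comparison matters.

```python
import math

# Hypothetical unit prices, placeholders only (check current AWS pricing):
S3_PRICE_PER_1000_PUTS = 0.005
S3_STORAGE_PER_GB = 0.03
DDB_PRICE_PER_WRITE_UNIT_MONTH = 0.47
DDB_STORAGE_PER_GB = 0.25

def s3_monthly_cost(objects, total_gb):
    """S3: request cost is independent of object size."""
    return objects / 1000 * S3_PRICE_PER_1000_PUTS + total_gb * S3_STORAGE_PER_GB

def dynamodb_monthly_cost(writes_per_s, object_bytes, total_gb):
    """DynamoDB: one write capacity unit covers a 1 KB write per second."""
    units = writes_per_s * math.ceil(object_bytes / 1024)
    return units * DDB_PRICE_PER_WRITE_UNIT_MONTH + total_gb * DDB_STORAGE_PER_GB

# Scenario 1 (2 KB objects): DynamoDB comes out cheaper.
# Scenario 2 (32 KB objects): write units scale 16x, so S3 wins.
s1_ddb = dynamodb_monthly_cost(300, 2048, 1483)
s1_s3 = s3_monthly_cost(777_600_000, 1483)
s2_ddb = dynamodb_monthly_cost(300, 32_768, 23_730)
s2_s3 = s3_monthly_cost(777_600_000, 23_730)
```

Even with made-up prices, the crossover reproduces: growing the object size inflates DynamoDB's write-capacity bill while leaving S3's request bill unchanged.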
20. Summary
• The era of relational-database-only, on-premises
architecture is over.
• Performance, reliability, and scalability can
all be improved by the cloud, but choosing the
right architecture is a must.
• There are several ways of choosing the right
service for the job
– Use the “data temperature” and use case
– Use the reverse cost estimate method
– Ask AWS sales
21. When in doubt, contact us
https://aws.amazon.com/jp/contact-us/
23. Amazon RDS
A fully managed relational database service
• Create and scale with a few clicks
• Automated backups with point-in-time recovery (to within about five minutes) for DR
• Manual snapshot feature
• Automated security patching
• 4 supported engines
• Monitoring and automatic recovery

(Diagram: master in Availability Zone A, slave in Availability Zone B; data sync, automatic failover, automated backups)
24. Amazon RDS
A fully managed relational database service
When to use:
• Transactions
• Complex queries
• Medium to high query/write rate
  – Up to 30K IOPS (15K reads + 15K writes)
• 100s of GB to low TBs
• Workload fits in a single node
• High durability

When not to use:
• Massive read/write rates
  – Example: 150K write requests per second
• Data size or throughput that demands sharding
  – Example: 10s or 100s of terabytes
• Simple get/put and queries that a NoSQL store can handle
• Complex analytics
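"Transactions" at the top of the use list is what an RDS engine does and a key-value store does not. A minimal sketch of an atomic multi-row order, using Python's built-in sqlite3 as a stand-in for an RDS engine (MySQL, PostgreSQL, etc.): the order row and the stock decrement commit together, or neither does.

```python
import sqlite3

# sqlite3 stands in for an RDS engine to show an atomic transaction.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock (item TEXT PRIMARY KEY, qty INTEGER)")
conn.execute("CREATE TABLE orders (item TEXT, qty INTEGER)")
conn.execute("INSERT INTO stock VALUES ('mug', 10)")

try:
    with conn:  # begins a transaction; commits on success, rolls back on error
        conn.execute("UPDATE stock SET qty = qty - 3 WHERE item = 'mug'")
        conn.execute("INSERT INTO orders VALUES ('mug', 3)")
except sqlite3.Error:
    pass  # on failure the whole transaction is rolled back

remaining = conn.execute("SELECT qty FROM stock WHERE item='mug'").fetchone()[0]
print(remaining)  # 7
```

Getting this all-or-nothing behavior without a relational engine means implementing it by hand, which is exactly the "not to use" case on the DynamoDB slide that follows.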
25. DynamoDB
Fully managed NoSQL service
• Easy administration and
high availability
– No SPOF
– Data is replicated into 3
availability zones
– Storage scales, and data is automatically partitioned
• No limit on storage
– Only pay for the storage you
use
– No need to add nodes or disks
as storage grows
26. DynamoDB
Fully managed NoSQL service
When to use:
• Fast and predictable performance
• Seamless, massive scale
• Autosharding
• Consistent, low latency
• No size or throughput limits
• Very high durability
• Key-value or simple queries

When not to use:
• Need multi-item/row or cross-table transactions
• Need complex queries or joins
• Need real-time analytics on historic data
• Storing cold data
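"Autosharding" means the partition key is hashed and the hash decides which partition stores the item. A conceptual sketch (the 4-partition setup is illustrative; DynamoDB manages partition counts internally and its actual hash scheme is not public):

```python
import hashlib

# Conceptual autosharding: a stable hash of the partition key picks
# the partition. Partition count here is illustrative only.

NUM_PARTITIONS = 4

def partition_for(key):
    """Map a partition key to a partition via a stable hash."""
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS

partitions = {i: [] for i in range(NUM_PARTITIONS)}
for player in ("alice", "bob", "carol", "dave", "erin", "frank"):
    partitions[partition_for(player)].append(player)

# The mapping is deterministic: the same key always lands on the
# same partition, so reads find what writes stored.
```

This is why a high-cardinality partition key matters: with few distinct keys, all traffic hashes to a handful of partitions and throughput stops scaling.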
27. Amazon Redshift
Fully managed data warehouse service
• DWH as a Service: Amazon Redshift
is a fast, fully
managed, petabyte-scale data
warehouse service
• Scalable: 160GB ~ Petabytes
• Fast: Amazon Redshift has a
massively parallel processing
(MPP) architecture, parallelizing
and distributing SQL operations to
take advantage of all available
resources.
• Low cost: No initial cost, no
license fees, and only pay for
what you use.
(Diagram: BI tools connect over JDBC/ODBC to the leader node, the SQL endpoint, which parallelizes queries and assembles results; compute nodes, added as needed, are connected by a 10 GigE mesh and integrate with S3, DynamoDB, and EMR)
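The MPP flow described above can be sketched as a toy model: the leader splits an aggregate across compute nodes, each node scans only its own slice of the data, and the leader combines the partial results. This is a simplification for intuition, not how Redshift is implemented.

```python
# Toy MPP model: distribute rows across "compute nodes", aggregate
# each slice locally, then combine partials at the "leader".

data = list(range(1, 101))     # pretend this is a large fact table
NUM_COMPUTE_NODES = 4

def compute_node_sum(rows):
    """Each compute node aggregates only its local slice."""
    return sum(rows)

# Leader: distribute rows round-robin across compute nodes...
slices = [data[i::NUM_COMPUTE_NODES] for i in range(NUM_COMPUTE_NODES)]
partials = [compute_node_sum(s) for s in slices]

# ...then combine the partial aggregates into the final result.
total = sum(partials)
print(total)  # 5050
```

The key property: each node touches only 1/Nth of the data, so adding nodes shortens scan time, which is why the slide lists "+nodes" as the scaling knob.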
28. Amazon Redshift
Fully managed data warehouse service
When to use:
• Information analysis and reporting
• Complex DW queries that summarize historical data
• Batched large updates, e.g. daily sales totals
• 10s of concurrent queries
• 100s of GB to PB
• Compression
• Column-based storage
• Very high durability

When not to use:
• OLTP workloads
  – 1000s of concurrent users
  – Large numbers of singleton updates
29. Amazon S3
Low-cost, highly reliable object storage service

(Diagram: files A, B, and C automatically replicated across datacenters A, B, and C)

• Designed for 99.999999999% durability, so data is effectively never lost
• Data is automatically replicated
• Choose from more than 9 regions globally
• Just put data; no need to worry about scalability, infrastructure, volume expansion, etc.
• Only pay for what you use
  – Example: 1 GB/month costs roughly 3 yen
30. Amazon S3
Low-cost, highly reliable object storage service

When to use:
• Store large objects
• Key-value store: Get/Put/List
• Unlimited storage
• Versioning
• Very high durability
  – 99.999999999%
• Very high throughput (via parallel clients)
• Storing persistent data
  – Backups
  – Source/target for EMR
  – Blob store, with metadata in SQL or NoSQL

When not to use:
• Complex queries
• Very low latency (ms) requirements
• Search
• Read-after-write consistency for overwrites
• Transactions
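"Very high throughput (via parallel clients)" means S3's aggregate throughput comes from fanning requests out, since each single connection is rate-limited. A minimal sketch using a thread pool; upload_object is a stub standing in for an S3 PUT (with boto3 it would call put_object), and the key names are made up.

```python
from concurrent.futures import ThreadPoolExecutor

def upload_object(key):
    """Stub standing in for an S3 PUT; returns the key on success."""
    # A real client would do: s3.put_object(Bucket=..., Key=key, Body=...)
    return key

keys = [f"logs/2014/part-{i:04d}.gz" for i in range(100)]

# Fan uploads out across 10 workers; aggregate throughput scales with
# the number of parallel clients, not per-connection speed.
with ThreadPoolExecutor(max_workers=10) as pool:
    uploaded = list(pool.map(upload_object, keys))

print(len(uploaded))  # 100
```

The same pattern applies to reads, which is how EMR jobs pull large datasets out of S3 quickly.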