Rethinking the database for the cloud
AWS database services best practices
Amazon Data Services Japan
Rasmus Ekman
Traditional architecture
Client
Application
Relational database
Problems with this approach
Client
Application
Relational database
• It doesn’t scale
• Management is hard
• High cost
• Low performance
• Migration is difficult
Why do we get these problems?
When all you have is a hammer, everything looks like a nail
Client
Application
Relational database
Rethinking the architecture
Client
Application
[Diagram: the application's data is split across Search, NoSQL, SQL, DWH, Cache, Hadoop, Blob store, and ETL]
AWS service and use case mapping
• Search: Amazon CloudSearch
• NoSQL: DynamoDB
• SQL: Amazon RDS
• DWH: Amazon Redshift
• Cache: ElastiCache
• Hadoop: Amazon EMR
• Blob store: Amazon S3
• ETL: AWS Data Pipeline
Sample references
Social gaming
[Diagram: Mobile client, Elastic Load Balancer, Auto Scaling, DynamoDB, Amazon S3 (log files), Amazon Elastic MapReduce]
Social games generate a large number of transactions, all of which require high performance and extreme scalability.
① Player data is stored in Amazon DynamoDB, which scales in both data volume and performance.
② Long-term usage log files are sent in parallel to S3 for unlimited, cheap storage.
③ Big data analytics are done in EMR, which integrates easily with both DynamoDB and S3.
E-commerce site
[Diagram: End users, Auto Scaling, ElastiCache, RDS (Master), RDS (Slave), Amazon DynamoDB, Amazon CloudSearch]
The site needs high availability, search performance, and the flexibility to rapidly change data structures to fit new business requirements.
① For high-performance, low-latency responses, read from the cache in ElastiCache first.
② Order and customer information is stored in a traditional but fault-tolerant RDS.
③ Item metadata, such as color and title, is stored in DynamoDB for a very flexible data schema.
④ For scalable search, metadata is indexed into CloudSearch, which handles full-text search easily.
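The "cache first" step of the e-commerce example can be sketched as a cache-aside read. This is a minimal illustration, not the deck's implementation: the class name `CacheAsideReader`, the in-memory dictionary standing in for ElastiCache, and the `orders` data standing in for an RDS query are all hypothetical.

```python
from typing import Callable, Dict, Optional


class CacheAsideReader:
    """Check the cache first and fall back to the database on a miss."""

    def __init__(self, load_from_db: Callable[[str], Optional[str]]):
        self._cache: Dict[str, str] = {}      # stand-in for ElastiCache
        self._load_from_db = load_from_db     # stand-in for an RDS query
        self.db_reads = 0                     # counts database round trips

    def get(self, key: str) -> Optional[str]:
        if key in self._cache:                # cache hit: no database round trip
            return self._cache[key]
        self.db_reads += 1
        value = self._load_from_db(key)       # cache miss: read the source of truth
        if value is not None:
            self._cache[key] = value          # populate the cache for later reads
        return value


orders = {"order-1": "2 x blue shirt"}        # hypothetical order data
reader = CacheAsideReader(orders.get)
first = reader.get("order-1")                 # miss, loaded from the "database"
second = reader.get("order-1")                # hit, served from the cache
```

Only the first read reaches the database stand-in; repeated reads of the same key are served from memory, which is the effect the slide attributes to caching in ElastiCache.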
How do I know which service to pick?
The “data temperature” method
What is “data temperature”?
Data ?
http://www.amazon.co.jp/dp/B0016V9FCQ
Data temperature
              Hot        Warm     Cold
Volume        MB~GB      GB~TB    PB
Item size     B~KB       KB~MB    KB~TB
Latency       ms         ms-s     min-hr
Durability    Low-high   High     Very high
Request rate  Very high  High     Low
Cost/GB       $$~$       $~¢¢     ¢
The temperature of the data will vary depending on its format and use.
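The data temperature table can be read as a lookup. The sketch below encodes a few of its rows as data and classifies a workload by its latency budget. The function name `classify_by_latency` and the numeric cut-offs (0.1 s and 60 s) are assumptions chosen to approximate the "ms", "ms-s", and "min-hr" columns; the slide gives no exact thresholds.

```python
# Selected rows of the data temperature table, keyed by temperature.
DATA_TEMPERATURE = {
    "hot":  {"volume": "MB~GB", "item_size": "B~KB",  "latency": "ms",     "request_rate": "very high"},
    "warm": {"volume": "GB~TB", "item_size": "KB~MB", "latency": "ms-s",   "request_rate": "high"},
    "cold": {"volume": "PB",    "item_size": "KB~TB", "latency": "min-hr", "request_rate": "low"},
}


def classify_by_latency(latency_budget_s: float) -> str:
    """Return 'hot', 'warm' or 'cold' for a latency budget in seconds.

    The cut-offs are illustrative assumptions, not values from the slide.
    """
    if latency_budget_s < 0.1:     # answers needed within milliseconds
        return "hot"
    if latency_budget_s < 60:      # up to seconds is acceptable
        return "warm"
    return "cold"                  # minutes to hours is acceptable


temperature = classify_by_latency(0.005)
profile = DATA_TEMPERATURE[temperature]
```

In practice latency is only one of the rows; volume, request rate, and cost per GB would be weighed together in the same way.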
The AWS service heat map
[Figure: Amazon ElastiCache, Amazon DynamoDB, Amazon RDS, Amazon S3, Amazon Redshift, and Amazon EMR placed along four axes, each running from low to high: data volume, latency, cost/GB, and request rate]
How do I know which service to pick?
The cost estimation method
Choosing service based on cost estimate
Example: Should I pick S3 or DynamoDB?
• “I’m currently scoping out a project that will greatly
increase my team’s use of Amazon S3. Hoping you
could answer some questions. The current iteration of
the design calls for many small files, perhaps up to a
billion during peak. The total size would be on the
order of 1.5 TB per month…”
Request rate (writes/s)  Object size (bytes)  Total size (GB/month)  Objects per month
300                      2048                 1483                   777,600,000
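The figures in the workload table follow from the request rate and object size. A short sketch of the arithmetic, assuming a 30-day month and binary gigabytes (1024³ bytes), which are the assumptions that reproduce the numbers on the slide; the function name `monthly_workload` is illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_MONTH = 30            # assumption: a 30-day month
BYTES_PER_GB = 1024 ** 3       # assumption: binary gigabytes


def monthly_workload(writes_per_s: int, object_bytes: int):
    """Return (objects per month, total size in GB per month)."""
    objects = writes_per_s * SECONDS_PER_DAY * DAYS_PER_MONTH
    total_gb = objects * object_bytes / BYTES_PER_GB
    return objects, total_gb


objects, total_gb = monthly_workload(300, 2048)
print(f"{objects:,} objects, {total_gb:.0f} GB per month")
# prints "777,600,000 objects, 1483 GB per month"
```

This matches the customer's own estimate of "up to a billion" small files and roughly 1.5 TB per month.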
Choosing service based on cost estimate
Example: Should I pick S3 or DynamoDB?
• Time for …
※: http://calculator.s3.amazonaws.com/index.html?lng=ja_JP
Choosing service based on cost estimate
Example: Should I pick S3 or DynamoDB?
Request rate  Object size  Total size  Objects
300           2048         1483        777,600,000
DynamoDB monthly cost: $669.56
Amazon S3 monthly cost: $4325.33
Choosing service based on cost estimate
Example: Should I pick S3 or DynamoDB?
            Request rate  Object size  Total size  Objects      Result
Scenario 1  300           2048         1483        777,600,000  DynamoDB wins
Scenario 2  300           32,768       23,730      777,600,000  Amazon S3 wins
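One plausible reason the result flips between the two scenarios is that DynamoDB provisioned write capacity grows with item size (one write capacity unit covers one write per second of an item up to 1 KB, rounded up), whereas S3 charges per PUT request regardless of object size. The sketch below computes only the write capacity units; it is an assumption about the main cost driver, not the calculator's exact method, and the function name `dynamodb_write_units` is illustrative.

```python
import math


def dynamodb_write_units(writes_per_s: int, item_bytes: int) -> int:
    """Write capacity units needed, at 1 unit per 1 KB (rounded up) per write."""
    return writes_per_s * math.ceil(item_bytes / 1024)


scenario_1 = dynamodb_write_units(300, 2048)     # 2 KB items
scenario_2 = dynamodb_write_units(300, 32768)    # 32 KB items
print(scenario_1, scenario_2)
# prints "600 9600"
```

The request rate is identical in both scenarios, so the S3 request count stays the same while the DynamoDB write capacity grows sixteenfold.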
Summary
• The era of relational-database-only, on-premises architecture is over.
• Performance, reliability, and scalability can all be improved by the cloud, but choosing the right architecture is a must.
• There are several ways of choosing the right service for the job:
– Use the “data temperature” and use case
– Use the reverse cost estimate method
– Ask AWS sales
When in doubt, contact us
https://aws.amazon.com/jp/contact-us/
APPENDIX
AWS database services -
introduction and best practices
Amazon RDS
A fully managed relational database service
• Create and scale with a few clicks
• Automated backups every 5 minutes for DR
• Manual snapshot feature
• Automated security patching
• 4 supported engines
• Monitoring and automatic recovery
[Diagram: Master in Availability Zone A and Slave in Availability Zone B, with data sync, automatic failover, and automated backup]
Amazon RDS
A fully managed relational database service
When to use
• Transactions
• Complex queries
• Medium to high query/write rate
– Up to 30K IOPS (15K reads + 15K writes)
• 100s of GB to low TBs
• Workload can fit in a single node
• High durability
When not to use
• Massive read/write rates
– Example: 150K write requests per second
• Data size or throughput demands sharding
– Example: 10s or 100s of terabytes
• Simple Get/Put and queries that a NoSQL store can handle
• Complex analytics
DynamoDB
Fully managed NoSQL service
• Easy administration and high availability
– No SPOF
– Data is replicated into 3 availability zones
– Storage scales, and data is automatically partitioned
• No limit on storage
– Only pay for the storage you use
– No need to add nodes or disks as storage grows
[Diagram: a client accessing DynamoDB within a region]
Fully managed NoSQL service
When to use
• Fast and predictable performance
• Seamless/massive scale
• Autosharding
• Consistent/low latency
• No size or throughput limits
• Very high durability
• Key-value or simple queries
When not to use
• Need multi-item/row or cross-table transactions
• Need complex queries, joins
• Need real-time analytics on historic data
• Storing cold data
Amazon Redshift
Fully managed data warehouse service
• DWH as a service: Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse service.
• Scalable: 160 GB to petabytes
• Fast: Amazon Redshift has a massively parallel processing (MPP) architecture, parallelizing and distributing SQL operations to take advantage of all available resources.
• Low cost: no initial cost, no license fees, and you only pay for what you use.
[Diagram: BI tools connect over JDBC/ODBC to the leader node, the SQL endpoint that runs parallel queries and creates results; compute nodes are linked by a 10GigE mesh and more nodes can be added; integrates with S3, DynamoDB, and EMR]
Amazon Redshift
Fully managed data warehouse service
When to use
• Information analysis and reporting
• Complex DW queries that summarize historical data
• Batched large updates, e.g. daily sales totals
• 10s of concurrent queries
• 100s of GB to PB
• Compression
• Column based
• Very high durability
When not to use
• OLTP workloads
– 1000s of concurrent users
– Large number of singleton updates
Amazon S3
low cost, highly reliable object storage service
[Diagram: on the user side, Files A, B, and C are stored; on the infrastructure side, they are replicated across Datacenters A, B, and C]
• Designed not to lose data: 99.999999999% durability
• Data is automatically replicated
• Choose from over 9 regions globally
• Just put data, with no need to worry about scalability, infrastructure, volume expansion, etc.
• Only pay for what you use
Example: 1 GB/month costs about 3 yen
Amazon S3
low cost, highly reliable object storage service
When to use
• Store large objects
• Key-value store: Get/Put/List
• Unlimited storage
• Versioning
• Very high durability
– 99.999999999%
• Very high throughput (via parallel clients)
• Use for storing persistent data
– Backups
– Source/target for EMR
– Blob store with metadata in SQL or NoSQL
When not to use
• Complex queries
• Very low latency (ms)
• Search
• Read-after-write consistency for overwrites
• Need transactions
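To put the 99.999999999% durability figure in perspective, the arithmetic below reads it as a per-object, per-year probability, which is an interpretation assumed here rather than stated on the slide. Exact fractions are used to avoid floating-point rounding at this scale; the function name `expected_annual_losses` is illustrative.

```python
from fractions import Fraction

DURABILITY = Fraction(99_999_999_999, 100_000_000_000)   # 99.999999999%
annual_loss_probability = 1 - DURABILITY                  # 1 in 10**11 per object


def expected_annual_losses(object_count: int) -> Fraction:
    """Expected number of objects lost per year for a given object count."""
    return object_count * annual_loss_probability


losses = expected_annual_losses(10_000_000)
years_per_lost_object = 1 / losses
print(years_per_lost_object)
# prints "10000"
```

Under this reading, storing ten million objects corresponds to an expected loss of one object every 10,000 years.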

Rethinking the database for the cloud (iJAWS)
