In addition to running databases in Amazon EC2, AWS customers can choose among a variety of managed database services. These services save effort, save time, and unlock new capabilities and economies. In this session, we make it easy to understand how they differ, what they have in common, and how to choose one or more. We explain the fundamentals of Amazon DynamoDB, a fully managed NoSQL database service; Amazon RDS, a relational database service in the cloud; Amazon ElastiCache, a fast, in-memory caching service in the cloud; and Amazon Redshift, a fully managed, petabyte-scale data-warehouse solution that can be surprisingly economical. We will cover how each service might help support your application, how much each service costs, and how to get started. We will also have with us Jeongsang Baek, the VP of Engineering from IGAWorks, Korea’s No.1 mobile business platform, who will walk us through their architecture and share with us the key insights that they gained from using the various AWS database technologies to deliver a reliable, efficient and cost-effective experience.
7. If you host your databases on-premises
Power, HVAC, net
Rack and stack
Server maintenance
OS patches
DB s/w patches
Database backups
Scaling
High availability
DB s/w installs
OS installation
you
App optimization
10. If you host your databases in Amazon EC2
OS patches
DB s/w patches
Database backups
Scaling
High availability
DB s/w installs
you
App optimization
Power, HVAC, net
Rack and stack
Server maintenance
OS installation
11. If you choose a managed DB service
Power, HVAC, net
Rack and stack
Server maintenance
OS patches
DB s/w patches
Database backups
App optimization
High availability
DB s/w installs
OS installation
you
Scaling
12. Quick summary of the options
• Self-Managed: You are responsible for the hardware, OS, security, updates, backups, replication, etc., but have full control over it.
• EC2 Instances: You only need to focus on database-level updates, patches, replication, backups, etc., and don’t have to worry about the hardware or the OS installation.
• Fully Managed: You get features such as backup and replication as a packaged service and don’t have to bother with patching and updates.
14. A managed service for each major DB type
• Amazon DynamoDB: Document and Key-Value Store
• Amazon RDS: SQL Database Engines
• Amazon ElastiCache: In-Memory Key-Value Store
• Amazon Redshift: Data Warehouse
17. NoSQL vs. SQL for a new app: how to choose?
NoSQL
• Schema-less, easy reads and writes, simple data model
• Scaling is easy
• Focus on performance and availability at any scale
SQL
• Strong schema, complex relationships, transactions and joins
• Scaling is difficult
• Focus on consistency over scale and availability
20. Popular use cases
• Ad Tech: Ad serving, retargeting, ID lookup, user profile management, session tracking, RTB
• IoT: Tracking state, metadata and readings from millions of devices, real-time notifications
• Gaming: Recording game details, leaderboards, session information, usage history, and logs
• Mobile & Web: Storing user profiles, session details, personalization settings, entity-specific metadata
21. Predictable, low latency performance
Consistent single-digit millisecond latency even at massive scales
22. Automatic replication for rock-solid durability and availability
Writes
• Replicated continuously to 3 AZs
• Persisted to disk (custom SSD)
Reads
• Strongly or eventually consistent
• No latency trade-off
23. Amazon DynamoDB is a schemaless database
• A table holds items; each item has a key and a set of attributes
• Schema-less: the schema is defined per item
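To make "schema is defined per item" concrete, here is a minimal sketch that models a table as a plain Python dictionary. It does not use the DynamoDB API; the helper `put_item` and the attribute names (`user_id`, `name`, `level`, `badges`) are hypothetical and chosen only for illustration.

```python
# A toy "table": items are stored under the value of their key attribute.
# This is an in-memory stand-in, not a DynamoDB client.
table = {}


def put_item(table, key_name, item):
    """Store an item; the only requirement is that the key attribute is present."""
    if key_name not in item:
        raise ValueError(f"item is missing key attribute {key_name!r}")
    table[item[key_name]] = item


# Two items in the same table with different attribute sets (hypothetical data).
put_item(table, "user_id", {"user_id": "u1", "name": "Kim"})
put_item(table, "user_id", {"user_id": "u2", "level": 7, "badges": ["gold"]})

# Each item carries its own "schema": the set of attribute names it happens to have.
attribute_sets = {key: sorted(item) for key, item in table.items()}
```

Only the key is mandatory; every other attribute can differ from item to item, which is the property the slide describes.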
24. Define the desired performance using provisioned throughput
• Read capacity units
• Write capacity units
• 1 RPS > 2.5 M requests in a month
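The "1 RPS > 2.5 M requests in a month" figure is plain arithmetic. The sketch below assumes a 30-day month, which the slide does not state; the function name is illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def requests_per_month(requests_per_second, days=30):
    """Total requests served at a constant rate over a month (30 days assumed)."""
    return requests_per_second * SECONDS_PER_DAY * days


# One request per second, sustained for 30 days.
monthly_requests = requests_per_month(1)
```

With these assumptions, one request per second comes to 2,592,000 requests, which is where the "more than 2.5 M" on the slide comes from.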
25. You pay for the resources that you use
Monthly bill = storage consumed (GB) + write capacity units (WCUs) + read capacity units (RCUs)
Pricing varies by region. Further details at http://aws.amazon.com/dynamodb/pricing/
Free tier:
• Generous free tier of 25 GB, 25 WCUs, and 25 RCUs
• That is, you get over 60M read requests and 60M write requests for free in a month
• The free tier is indefinite: you benefit from this every month
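The billing formula and the free-tier claim can be sketched as follows. The prices passed in the example are placeholders, not real AWS rates (pricing varies by region). The sketch also assumes a 30-day month and that each capacity unit serves one request per second, which holds for small items.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumes a 30-day month


def dynamodb_monthly_bill(storage_gb, wcus, rcus,
                          price_per_gb, price_per_wcu, price_per_rcu,
                          free_gb=25, free_wcus=25, free_rcus=25):
    """Storage + WCUs + RCUs, charging only for usage above the free tier."""
    billable_gb = max(storage_gb - free_gb, 0)
    billable_wcus = max(wcus - free_wcus, 0)
    billable_rcus = max(rcus - free_rcus, 0)
    return (billable_gb * price_per_gb
            + billable_wcus * price_per_wcu
            + billable_rcus * price_per_rcu)


# 25 free capacity units, each assumed to serve one request per second all month.
free_requests_per_month = 25 * SECONDS_PER_MONTH

# Placeholder monthly unit prices, chosen only to make the arithmetic easy to follow.
example_bill = dynamodb_monthly_bill(
    storage_gb=100, wcus=125, rcus=225,
    price_per_gb=0.25, price_per_wcu=0.5, price_per_rcu=0.125,
)
```

Under these assumptions the 25 free capacity units cover 64.8 million requests a month, consistent with the "over 60M" on the slide, and any workload at or below 25 GB, 25 WCUs, and 25 RCUs is billed nothing.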
30. RDS feature matrix
Feature Aurora MySQL PostgreSQL Oracle SQL Server
VPC
High availability
Instance scaling
Encryption Coming soon
Read replicas Oracle GoldenGate Cross region
Max storage 64 TB 6 TB 6 TB 6 TB 4 TB
Scale storage Auto Scaling
Provisioned IOPS NA 30,000 30,000 30,000 20,000
Largest instance R3.8XL R3.8XL R3.8XL R3.8XL R3.8XL
31. Amazon Aurora: Fast, available, and MySQL-compatible
[Diagram labels: SQL; Transactions; Caching; AZ 1, AZ 2, AZ 3; Amazon S3]
• 5x faster than MySQL on same hardware
• Sysbench: 100K writes/sec and 500K reads/sec
• Designed for 99.99% availability
• 6-way replicated storage across 3 AZs
• Scale to 64 TB and 15 read replicas
32. Amazon RDS is simple and fast to scale
• Database instance types offer a range of CPU and memory selections
• Scale up or down among instance types on demand
• Database storage is scalable on demand
33. Amazon RDS offers fast, predictable storage
• General Purpose (SSD) for most workloads
• Provisioned IOPS (SSD) for OLTP workloads up to 30,000 IOPS
• Magnetic for small workloads with infrequent access
34. High availability: Multi-AZ deployments
Enterprise-grade fault-tolerance solution for production databases
35. Choose cross-region replication for enhanced data locality, even more ease of migration
• Even faster recovery in the event of disaster
• Bring data close to your customers
• Promote to a master for easy migration
36. You pay for the resources that you use
Monthly bill = GB of storage consumed (price depends on type of storage) + N × duration for which DB instances were used (price depends on type of DB instance)
Further details at http://aws.amazon.com/rds/pricing/
Free tier (for first 12 months)
• 750 micro DB instance hours
• 20 GB of DB storage
• 20 GB for backups
• 10 million I/O operations
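The RDS formula can be sketched the same way. Reading N as the number of DB instances is an assumption, and the prices in the example are placeholders rather than real AWS rates.

```python
def rds_monthly_bill(storage_gb, price_per_gb,
                     instance_count, hours_used, price_per_instance_hour):
    """Storage charge plus instance-hours charge, before any free tier."""
    storage_cost = storage_gb * price_per_gb
    instance_cost = instance_count * hours_used * price_per_instance_hour
    return storage_cost + instance_cost


# The longest month has 31 days, so one instance running nonstop uses 744 hours.
hours_in_longest_month = 24 * 31
free_tier_covers_one_instance = hours_in_longest_month <= 750

# Placeholder prices: 100 GB of storage and two instances running 720 hours each.
example_bill = rds_monthly_bill(
    storage_gb=100, price_per_gb=0.125,
    instance_count=2, hours_used=720, price_per_instance_hour=0.25,
)
```

The 750 free micro instance hours are enough to keep a single micro DB instance running for a whole month, since even a 31-day month has only 744 hours.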
40. Popular use cases
• Caching layer for performance or cost optimization of an underlying database
• Storage of ephemeral key-value data
• High-performance application patterns such as leaderboards (for gaming users), session management, event counters, in-memory lists
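The "caching layer in front of an underlying database" use case is commonly implemented as a cache-aside read path. The sketch below uses an in-memory dictionary with a time-to-live as a stand-in for a Redis or Memcached node. `TTLCache`, `get_with_cache`, and the `load_from_db` callback are hypothetical names, not part of any ElastiCache client library.

```python
import time


class TTLCache:
    """In-memory stand-in for a cache node: values expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._store[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key, value):
        self._store[key] = (value, self._clock() + self._ttl)


def get_with_cache(cache, key, load_from_db):
    """Cache-aside read: try the cache first, fall back to the database on a miss."""
    value = cache.get(key)
    if value is None:
        value = load_from_db(key)  # the expensive call the cache is meant to offload
        cache.set(key, value)
    return value
```

Repeated reads of the same key within the TTL hit the cache and never reach the database. One limitation of this sketch is that a stored value of `None` is indistinguishable from a miss.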
42. How ElastiCache billing works
Monthly bill = N (number of nodes) × duration for which the nodes were used (price depends on type of node)
Further details at http://aws.amazon.com/elasticache/pricing/
Free tier (for first 12 months): 750 micro cache node hours
45. Amazon Redshift: a lot faster, a lot cheaper, a whole lot simpler
• Relational data warehouse
• Massively parallel; petabyte scale
• Fully managed
• HDD and SSD platforms
• $1,000/TB/year; starts at $0.25/hour
46. Popular use cases
Traditional enterprises
• 10x cheaper
• Easy to provision
• Higher DBA productivity
Companies with big data
• 10x faster
• No programming
• Easily leverage BI tools, Hadoop, machine learning, streaming
SaaS companies
• Analysis in-line with process flows
• Pay as you go, grow as you need
• Managed availability and disaster recovery
47. Amazon Redshift architecture
Leader node
• Simple SQL endpoint
• Stores metadata
• Optimizes query plan
• Coordinates query execution
Compute nodes
• Local columnar storage
• Parallel/distributed execution of all queries, loads, backups, restores, resizes
Start at just $0.25/hour, grow to 2 PB (compressed)
• DC1: SSD; scale 160 GB–326 TB
• DS2: HDD; scale 2 TB–2 PB
[Diagram labels: JDBC/ODBC; 10 GigE (HPC); Ingestion, Backup, Restore]
48. Amazon Redshift is fast
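Dramatically less I/O
• Column storage
• Data compression
• Zone maps
• Direct-attached storage
• Large data block sizes
Zone map illustration (each block of sorted values, followed by its min and max):
10 | 13 | 14 | 26 | … | 100 | 245 | 324 (min 10, max 324)
375 | 393 | 417 | … | 512 | 549 | 623 (min 375, max 623)
637 | 712 | 809 | … | 834 | 921 | 959 (min 637, max 959)
Sample table:
ID | Age | State | Amount
123 | 20 | CA | 500
345 | 25 | WA | 250
678 | 40 | FL | 125
957 | 37 | WA | 375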
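The zone-map idea can be sketched in a few lines: keep the minimum and maximum of each block, and skip any block whose range cannot contain the values a query asks for. This is a toy illustration, not Redshift's implementation. The block contents reuse the values visible on the slide, with the elided ("…") values simply left out.

```python
# Three sorted blocks of one column; elided slide values are omitted here.
blocks = [
    [10, 13, 14, 26, 100, 245, 324],
    [375, 393, 417, 512, 549, 623],
    [637, 712, 809, 834, 921, 959],
]

# The zone map stores only (min, max) per block.
zone_map = [(min(block), max(block)) for block in blocks]


def scan_range(low, high):
    """Return values in [low, high] and how many blocks had to be read."""
    matches = []
    blocks_read = 0
    for (block_min, block_max), block in zip(zone_map, blocks):
        if block_max < low or block_min > high:
            continue  # the zone map proves this block has no matching values
        blocks_read += 1
        matches.extend(value for value in block if low <= value <= high)
    return matches, blocks_read


matches, blocks_read = scan_range(400, 600)
```

For the range 400 to 600, only the middle block overlaps, so one block is read instead of three. That skipped I/O is the effect the slide attributes to zone maps.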
49. Fully managed, continuous/incremental backups
• Multiple copies within cluster
• Continuous and incremental backups to Amazon S3
• Continuous and incremental backups across regions
• Streaming restore
[Diagram labels: Amazon S3 in Region 1; Amazon S3 in Region 2]
50. Amazon Redshift offers rock-solid fault tolerance
[Diagram labels: Amazon S3 in Region 1; Amazon S3 in Region 2]
• Disk failures
• Node failures
• Network failure
• AZ/region level disasters
51. You pay for what you use
Monthly bill = N (number of nodes) × duration for which the nodes were used (price depends on type of node)
Further details at https://aws.amazon.com/redshift/pricing/
• 2-month free trial
• Leader node is free
• No upfront costs, pay as you go
• Price includes three data copies
• Backup storage is free up to 100% of provisioned storage
• 3x data compression on average
52. Redshift has a large ecosystem
• Data Integration
• Systems Integrators
• Business Intelligence
56. IGAWorks provides
• Adbrix: App analytics and marketing attribution
• Adpopcorn: Monetization
• Live Operation: Operating tools for in-app campaigns
• Nanoo, Jiver: In-app engagement
All services are offered at no cost
57. Architecture of legacy service
[Diagram, AWS Tokyo region: Adbrix user; mobile device; Amazon Route 53; EC2 (Analytics) with MSSQL databases (Analytics); EC2 (Tracking API) with MSSQL databases (Activity Storage)]
• Hundreds of EC2 instances
• Dozens of MSSQL instances
• Over 1 PB of EBS storage
59. Use case: Adpopcorn
• Any app can be media for incentivized ads
• Reward a user in exchange for completing an action such as installing or running the advertiser’s app
• Types
• Offerwall
• Lock screen ads
60. Participating in incentivized ads
1. Open offerwall
2. Request available ads
3. Read available ads
4. Check participation logs and de-duplicate ads
5. Respond with available offers
6. Install and run advertiser’s app
7. Send the first-run activity
8. Put a participation log
9. Give promised reward
[Diagram labels: Mobile device; Ad serve API; Ad inventory; Participation logs]
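The de-duplication and reward steps of this flow can be sketched as below. The data shapes and the function names `available_offers` and `record_first_run` are hypothetical. Here the ad inventory is a plain dictionary and the participation log is a set of (device, ad) pairs, whereas the real service keeps them in separate data stores.

```python
def available_offers(ad_inventory, participation_log, device_id):
    """Steps 3 to 5: read the ads, drop those this device already completed."""
    return [
        ad_id
        for ad_id in sorted(ad_inventory)
        if (device_id, ad_id) not in participation_log
    ]


def record_first_run(ad_inventory, participation_log, device_id, ad_id):
    """Steps 7 to 9: log the participation once and return the reward to grant."""
    if ad_id not in ad_inventory:
        raise KeyError(f"unknown ad {ad_id!r}")
    if (device_id, ad_id) in participation_log:
        return 0  # already rewarded; a repeated first-run event pays nothing
    participation_log.add((device_id, ad_id))
    return ad_inventory[ad_id]["reward"]


# Hypothetical inventory with two ads and an empty participation log.
inventory = {"ad-1": {"reward": 100}, "ad-2": {"reward": 50}}
log = set()
```

The participation log is read on every offerwall request and written on every completed action, which is why the deck singles it out as needing high read/write throughput and low latency.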
61. Points to improve performance
• Ad inventory
- Store complex relational data
- Boost DB read requests
• Participation log
- High read/write throughput
- Low latency
62. Re-Architecting Adpopcorn
[Diagram, AWS Tokyo region: Mobile device; Route 53; AWS Elastic Beanstalk (Ad Serve API); ElastiCache (Ad Inventory); Amazon RDS (Ad Inventory); DynamoDB (Participation Log); Amazon Kinesis (Participation Stream); Elastic Beanstalk (ETL Worker); Amazon RDS (Monetization Report)]
63. Use case: Adbrix
• Legacy
- Stored ‘all’ activities to MSSQL EC2 instances
- Expensive to store raw data to Amazon Elastic Block Store
- Hard to scale out and distribute data
- If one EC2 instance went down, then the whole service failed
- Storage size limitation
- Need to constantly monitor whether the storage is full
64. Re-architecting Adbrix
[Diagram, AWS Tokyo region and AWS N. Virginia region with cross-region replication: Adbrix user; mobile device; Route 53; EC2 (Adbrix Analytics); Database (Adbrix Analytics); Elastic Beanstalk (Activity Tracker); Amazon Kinesis; Elastic Beanstalk (Activity Process); Amazon S3 (Activity Storages); AWS Lambda (Micro-batch loading); Amazon Redshift (BI Analysis); EMR-Spark (Daily Batch Analysis); ElastiCache (Ad Inventory); DynamoDB (Participation Log); Amazon RDS (Ad Inventory)]
65. DB heroes!
• Amazon RDS:
- For ad inventory with strong schema, complex relationships, queryable data
- High availability with Multi-AZ deployments
• Amazon DynamoDB:
- For participation log with heavy read/write load
- Single-digit millisecond latency
• Amazon ElastiCache:
- Redis/Memcached for fast, complex caching of ad inventory
- Offloading massive read requests from RDS
• Amazon Redshift:
- For petabyte-scale big data analysis
- Export business insights easily using reporting tools
Fully managed! Low cost! High performance!
66. Monthly cost report
[Chart: IGAWorks cost trend in 2015, January through August, broken down by Amazon ElastiCache, Amazon RDS, Amazon DynamoDB, Amazon Redshift, and others]
67. Result
• Reduced the cost of analysis by 40%
• Scaled out more easily to support 130 million devices
• Guaranteed 2-digit latency for ad serve API responses
+ Recruitment policy has changed
68. Lesson learned
Start your business today.
You may face a difficult problem.
However, AWS already has the solutions.
70. Review: AWS managed DB services
• Amazon DynamoDB: Document and Key-Value Store
• Amazon RDS: SQL Database Engines
• Amazon ElastiCache: In-Memory Key-Value Store
• Amazon Redshift: Data Warehouse
71. Benefits of AWS managed database services
• Pay only for what you use: no up-front cost
• Fully managed services: AWS handles installs, patching, restarts
• Easy to scale: grow as you need
• Designed for use with other AWS services: AWS Data Pipeline, Amazon EC2, Amazon S3, Amazon CloudWatch, Amazon SNS, Amazon VPC
72. Related Sessions
• DAT201 - Introduction to Amazon Redshift
Oct 7 – 1:30pm – 2:30pm
• DAT204 - NoSQL? No Worries: Building Scalable Applications on AWS NoSQL Services
Oct 7 – 1:30pm – 2:30pm
• DAT301 - Amazon Aurora Deep Dive
Oct 7 – 2:40pm – 3:45pm
• DAT407 - Amazon ElastiCache: Deep Dive
Oct 8 – 11am – 12pm