This session begins with an introduction to non-relational (NoSQL) databases and a comparison with relational (SQL) databases. We then cover the fundamentals of Amazon DynamoDB, a fully managed NoSQL database service, and demonstrate the new DynamoDB console first-hand as we discuss common use cases and benefits of this high-performance key-value and JSON document store.
2. Agenda
Brief history of data processing
SQL vs. NoSQL
DynamoDB tables, API, data types, indexes
Scaling
Streams and Triggers
Customer Case Study - Codurance
4. Timeline of database technology
[Diagram: data pressure rising over time, driving each generation of storage technology]
Ledgers → Unit records → Data drums → File systems → RDBMS → NoSQL
5. Data volume since 2010
• 90% of stored data was generated in the last 2 years
• 1 terabyte of data in 2010 equals 6.5 petabytes today
• Linear correlation between data pressure and technical innovation
• No reason these trends will not continue over time
8. Relational vs. non-relational databases
[Diagram] Traditional SQL: a primary database with a secondary; capacity is added by scaling up a single server. NoSQL: many peer database nodes; capacity is added by scaling out across servers.
9. Why NoSQL?
SQL                       NoSQL
Optimized for storage     Optimized for compute
Normalized/relational     Denormalized/hierarchical
Ad hoc queries            Instantiated views
Scale vertically          Scale horizontally
Good for OLAP             Built for OLTP at scale
10. SQL vs. NoSQL schema design
NoSQL design optimises for compute instead of storage.
13. Customer adoption (logos with selected stats)
• Over 200 million users; over 4 billion items stored
• Millions of ads per month; cross-device ad solutions; high-performance ads, requests processed in milliseconds
• 130+ million new users in 1 year
• 150+ million messages per month
• Statcast uses burst scalability for many games on a single day
• Flexibility for fast growth
• Web clickstream insights
• Specialty online and retail stores
• Over 5 billion items processed daily
• About 200 million messages processed daily
• Cognitive training
• Job-matching platform with 5+ million registered users
• Mobile game analytics, 10M global users
• Home security
• Wearable and IoT solutions
• 170,000 concurrent players
15. High availability and durability
Writes
• Replicated continuously to 3 AZs
• Persisted to disk (custom SSD)
Reads
• Strongly or eventually consistent
• No latency trade-off
Designed to support 99.99% availability; built for high durability.
16. How DynamoDB scales
A table is divided into partitions (1 .. N). DynamoDB automatically partitions data:
• The partition key spreads data (and workload) across partitions
• Partitioning happens automatically as data grows and throughput needs increase
A large number of unique hash keys + a uniform distribution of workload across those keys = high-scale apps.
17. Flexibility and low cost
• Customers can configure a table for just a few reads/writes per second or for hundreds of thousands
• Customers only pay for how much they provision
• This provides maximum flexibility to adjust expenditure based on the workload
18. Fully managed service = automated operations
The same operational stack applies in every model; what changes is who manages each layer: app optimisation, scaling, high availability, database backups, DB s/w patches, DB s/w installs, OS patches, OS installation, server maintenance, rack & stack, power/HVAC/networking.
• DB hosted on-premises: you manage the entire stack
• DB hosted on Amazon EC2: AWS manages the facilities and hardware; you manage the OS and everything above it
• Amazon DynamoDB: AWS manages everything except app optimisation
21. Partition keys
• The partition key uniquely identifies an item
• The partition key is used to build an unordered hash index
• This allows the table to be partitioned for scale
Example: items Id = 1 (Name = Jim), Id = 2 (Name = Andy, Dept = Eng) and Id = 3 (Name = Kim, Dept = Ops) hash to different points in the key space: Hash(1) = 7B, Hash(2) = 48, Hash(3) = CD.
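The hash-based spread described above can be sketched in a few lines. This is a simulation for illustration only: DynamoDB's internal hash function is not public, so MD5 and a 3-partition table are assumptions.

```python
import hashlib

def partition_for(key, num_partitions=3):
    """Map a partition key onto one of the table's partitions.

    MD5 here stands in for DynamoDB's internal hash; the point is that
    an unordered hash index spreads keys evenly across the key space.
    """
    digest = hashlib.md5(str(key).encode("utf-8")).hexdigest()
    return int(digest, 16) % num_partitions

# Items with different Ids deterministically land on partitions:
placement = {item_id: partition_for(item_id) for item_id in (1, 2, 3)}
```

Because placement is a pure function of the key, the same item always routes to the same partition, which is what lets DynamoDB scale reads and writes horizontally.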
22. Partition:sort key
• A partition:sort key uses two attributes together to uniquely identify an item
• Within the unordered hash index, data is arranged by the sort key
• No limit on the number of items (∞) per partition key, except if you have local secondary indexes
Example: orders keyed by Customer# (partition) and Order# (sort). Hash(1) = 7B, Hash(2) = 48 and Hash(3) = CD place customers 1, 2 and 3 on partitions 1, 2 and 3. Within each partition key the orders are stored in sort-key order, e.g. Customer# = 2: Order# = 10 (Item = Pen), then Order# = 11 (Item = Shoes).
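A minimal sketch of how a partition:sort key organises an item collection, using plain Python structures in place of partitions (the order data mirrors the example above):

```python
from collections import defaultdict

orders = [
    {"Customer#": 2, "Order#": 11, "Item": "Shoes"},
    {"Customer#": 1, "Order#": 10, "Item": "Toy"},
    {"Customer#": 2, "Order#": 10, "Item": "Pen"},
    {"Customer#": 1, "Order#": 11, "Item": "Boots"},
]

# Group by partition key (Customer#), then arrange each group by
# sort key (Order#), mirroring how DynamoDB stores an item collection.
table = defaultdict(list)
for order in orders:
    table[order["Customer#"]].append(order)
for items in table.values():
    items.sort(key=lambda o: o["Order#"])
```

A range query such as "all orders for customer 2" then becomes a cheap scan of one already-sorted group, which is why queries within a single partition key are so efficient.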
23. Partitions are three-way replicated
Each partition holds a set of items (e.g. Id = 1 / Name = Jim; Id = 2 / Name = Andy / Dept = Eng; Id = 3 / Name = Kim / Dept = Ops), and every partition (1 .. N) is stored as three replicas (Replica 1, Replica 2, Replica 3).
24. Local secondary index (LSI)
• Alternate sort key attribute
• The index is local to a partition key
Table key schema: A1 (partition), A2 (sort), plus attributes A3, A4, A5.
LSIs reuse A1 as the partition key with an alternate sort key, and differ in which attributes are projected:
• KEYS_ONLY: A1 (partition), A3 (sort), A2 (item key)
• INCLUDE A3: A1 (partition), A4 (sort), A2 (item key), A3 (projected)
• ALL: A1 (partition), A5 (sort), A2 (item key), A3 (projected), A4 (projected)
10 GB maximum per partition key; LSIs limit the number of range keys!
25. Global secondary index (GSI)
• Alternate partition and/or sort key
• The index spans all partition keys
Table key schema: A1 (partition), plus attributes A2, A3, A4, A5.
GSIs can use any attributes as keys, again with a choice of projection:
• INCLUDE A3: A5 (partition), A4 (sort), A1 (item key), A3 (projected)
• ALL: A4 (partition), A5 (sort), A1 (item key), A2 (projected), A3 (projected)
• KEYS_ONLY: A2 (partition), A1 (item key)
GSIs support online indexing; read capacity units (RCUs) and write capacity units (WCUs) are provisioned separately for GSIs.
26. How do GSI updates work?
1. The client sends an update request to the primary table.
2. The primary table returns the update response to the client, while the update to the global secondary index proceeds asynchronously (in progress).
If GSIs don’t have enough write capacity, table writes will be throttled!
27. LSI or GSI?
• An LSI can be modelled as a GSI
• If the data size in an item collection may exceed 10 GB, use a GSI
• If eventual consistency is okay for your scenario, use a GSI!
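A GSI is declared when the table is created (or added later via online indexing). Below is a sketch of the parameters in the shape boto3's `create_table` expects; the `Orders` table, its attribute names, and the `StatusIndex` GSI are hypothetical examples, not from the deck. Note the GSI provisions its own RCUs/WCUs, separate from the table's.

```python
# Parameters for a hypothetical Orders table with one GSI, shaped for
# boto3: dynamodb_client.create_table(**create_table_params)
create_table_params = {
    "TableName": "Orders",
    "AttributeDefinitions": [
        {"AttributeName": "CustomerId", "AttributeType": "N"},
        {"AttributeName": "OrderId", "AttributeType": "N"},
        {"AttributeName": "Status", "AttributeType": "S"},
    ],
    "KeySchema": [
        {"AttributeName": "CustomerId", "KeyType": "HASH"},   # partition key
        {"AttributeName": "OrderId", "KeyType": "RANGE"},     # sort key
    ],
    # Table throughput...
    "ProvisionedThroughput": {"ReadCapacityUnits": 10, "WriteCapacityUnits": 5},
    "GlobalSecondaryIndexes": [
        {
            "IndexName": "StatusIndex",
            # Alternate partition key, spanning all partition keys of the table
            "KeySchema": [{"AttributeName": "Status", "KeyType": "HASH"}],
            # Projection: KEYS_ONLY, INCLUDE, or ALL
            "Projection": {"ProjectionType": "INCLUDE",
                           "NonKeyAttributes": ["Item"]},
            # ...and separate throughput for the GSI
            "ProvisionedThroughput": {"ReadCapacityUnits": 5,
                                      "WriteCapacityUnits": 5},
        }
    ],
}
```

If the GSI's `WriteCapacityUnits` is undersized relative to the table's write rate, writes to the base table will be throttled, as the earlier slide warns.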
29. Scaling
Throughput
• Provision any amount of throughput to a table
Size
• Add any number of items to a table
• Maximum item size is 400 KB
• LSIs limit the number of range keys due to 10 GB limit
Scaling is achieved through partitioning
30. Throughput
Provisioned at the table level:
• Write capacity units (WCUs): one 1 KB write per second
• Read capacity units (RCUs): one 4 KB strongly consistent read per second
• Eventually consistent reads cost 1/2 of strongly consistent reads
Read and write throughput limits are independent.
31. Partitioning math
In the future, these details might change…
Number of partitions:
• By capacity: (Total RCU / 3000) + (Total WCU / 1000)
• By size: Total size / 10 GB
• Total partitions: CEILING(MAX(capacity, size))
32. Partitioning example
Table size = 8 GB, RCUs = 5000, WCUs = 500
Number of partitions:
• By capacity: (5000 / 3000) + (500 / 1000) = 2.17
• By size: 8 / 10 = 0.8
• Total partitions: CEILING(MAX(2.17, 0.8)) = 3
RCUs and WCUs are uniformly spread across partitions:
• RCUs per partition = 5000 / 3 = 1666.67
• WCUs per partition = 500 / 3 = 166.67
• Data per partition = 10 / 3 = 3.33 GB
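The example can be reproduced directly from the formula, with the same caveat that these internals may change:

```python
import math

def total_partitions(size_gb, rcus, wcus):
    """Estimate partition count per the deck's formula:
    CEILING(MAX(capacity, size)), where
    capacity = RCU/3000 + WCU/1000 and size = total GB / 10."""
    by_capacity = rcus / 3000 + wcus / 1000
    by_size = size_gb / 10
    return math.ceil(max(by_capacity, by_size))

# The slide's example: 8 GB, 5000 RCUs, 500 WCUs -> 3 partitions
partitions = total_partitions(8, 5000, 500)
```

Because provisioned throughput is divided evenly across partitions, over-provisioning capacity can silently shrink the per-partition throughput available to any single hot key.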
33. To learn more, please attend:
Deep Dive on DynamoDB
Floor 0, Room 3, 14:00–14:45
Andreas Chatzakis, Solutions Architect
43. Serverless architecture — Ireland (eu-west-1)
[Architecture diagram for awsloft.london]
• Amazon CloudFront and AWS WAF front the site; S3 serves the static HTML/CSS and JavaScript content
• An API handles all dynamic content (proxy to AWS Lambda), backed by Amazon DynamoDB and Amazon SES
• Secure: AWS KMS and AWS IAM (users, admin, QR Reader)
• Amazon CloudWatch and AWS CloudTrail for monitoring and auditing
DynamoDB is a piece of the puzzle.
44. Fast feedback …
• Single-server local environment
• Local DynamoDB
• Simulated API Gateway
• Abstracted Lambda API
• Mocked encryption
• Mocked external services (SES, KMS, etc.)
• Microservices-based cloud environment
• Continuously deploy to QA
• One-click deployment to production
• Hot deployment
45. Persistence options
• RDS (Postgres)
• Out-of-the-box backup and recovery
• Out-of-the-box encryption
• Mature development tooling and libraries
• Possible downtime during scaling
• Relatively complicated migrations
• More complicated to model hierarchical structures
• DynamoDB
• Elastic scaling
• Evolutionary schema design
• Easy to get started
• Custom encryption
• Complicated joins
• Backup and recovery using pipelines
46. Lessons
• DynamoDB is easy to get started with
• Runs locally for dev environments
• Tooling and libraries are surprisingly mature
• The API (at least in Clojure / Java) is simple
• Custom encryption is inconvenient but easy to overcome
• Backups using pipelines are straightforward
• Schema migrations are rare
• Possibly more cost-efficient if you plan well or use auto scaling
• Easy to monitor
• … it’s painless