Amazon DynamoDB is a fully managed, highly scalable distributed database service. In this technical talk, we show you how to use Amazon DynamoDB to build high-scale applications like social gaming, chat, and voting. We show you how to use building blocks such as secondary indexes, conditional writes, consistent reads, and batch operations to build the higher-level functionality such as multi-item atomic writes and join queries. We also discuss best practices such as index projections, item sharding, and parallel scan for maximum scalability.
Speakers:
Philip Fitzsimons, AWS Solutions Architect
Richard Freeman, PhD, Senior Data Scientist/Architect, JustGiving
2. Agenda
• Tables, API, data types, indexes
• Scaling
• Data modeling
• Scenarios and best practices
• Customer Story
• DynamoDB Streams
• Reference architecture
3. Amazon DynamoDB
• Managed NoSQL database service
• Supports both document and key-value data models
• Highly scalable
• Consistent, single-digit millisecond latency at any scale
• Highly available—3x replication
• Simple and powerful API
7. Data types
• String (S)
• Number (N)
• Binary (B)
• String Set (SS)
• Number Set (NS)
• Binary Set (BS)
• Boolean (BOOL)
• Null (NULL)
• List (L)
• Map (M)
(List and Map are used for storing nested JSON documents)
8. Hash table
• Hash key uniquely identifies an item
• Hash key is used for building an unordered hash index
• Table can be partitioned for scale
[Diagram: key space 00–FF; items Id = 1 (Name = Jim), Id = 2 (Name = Andy, Dept = Engg), and Id = 3 (Name = Kim, Dept = Ops) hash to 7B, 48, and CD and land in different parts of the key space]
9. Partitions are three-way replicated
[Diagram: Partition 1, Partition 2, … Partition N; each partition's items (Id = 1: Name = Jim; Id = 2: Name = Andy, Dept = Engg; Id = 3: Name = Kim, Dept = Ops) are stored on Replica 1, Replica 2, and Replica 3]
10. Hash-range table
• Hash key and range key together uniquely identify an Item
• Within unordered hash index, data is sorted by the range key
• No limit on the number of items (∞) per hash key
– Except if you have local secondary indexes
[Diagram: key space partitioned across Partition 1–3; within each hash key, items are sorted by the range key, e.g. Customer# = 1 (Order# = 10 Item = Toy; Order# = 11 Item = Boots) at hash 7B, Customer# = 2 (Order# = 10 Item = Pen; Order# = 11 Item = Shoes) at hash 48, Customer# = 3 (Order# = 10 Item = Book; Order# = 11 Item = Paper) at hash CD]
12. Local secondary index (LSI)
• Alternate range key attribute
• Index is local to a hash key (or partition)
[Diagram: base table keyed on A1 (hash) + A2 (range) with attributes A3, A4, A5; three LSIs on alternate range keys — A3 (KEYS_ONLY), A4 (INCLUDE A3), A5 (ALL, projecting A3 and A4) — each also carrying the table key A2]
10 GB max per hash key, i.e. LSIs limit the # of range keys!
13. Global secondary index (GSI)
• Alternate hash (+range) key
• Index is across all table hash keys (partitions)
[Diagram: base table keyed on A1 (hash) with attributes A2–A5; three GSIs — A2 (hash, KEYS_ONLY); A5 (hash) + A4 (range) with INCLUDE A3; A4 (hash) + A5 (range) with ALL — each also carrying the table key A1]
RCUs/WCUs are provisioned separately for GSIs
Online indexing
14. How do GSI updates work?
[Diagram: the client sends (1) an update request to the primary table and receives (2) the update response; the global secondary index is updated asynchronously in the background]
If GSIs don’t have enough write capacity, table writes will be throttled!
15. LSI or GSI?
• LSI can be modeled as a GSI
• If data size in an item collection > 10 GB, use GSI
• If eventual consistency is okay for your scenario, use GSI!
17. Scaling
• Throughput
– Provision any amount of throughput to a table
• Size
– Add any number of items to a table
• Max item size is 400 KB
• LSIs limit the number of range keys due to 10 GB limit
• Scaling is achieved through partitioning
18. Throughput
• Provisioned at the table level
– Write capacity units (WCUs) are measured in 1 KB per second
– Read capacity units (RCUs) are measured in 4 KB per second
• RCUs measure strictly consistent reads
• Eventually consistent reads cost 1/2 of consistent reads
• Read and write throughput limits are independent
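As a minimal sketch of provisioning, the snippet below creates a table with explicit read and write capacity using the Python SDK (boto3); the table name, key schema, and numbers are illustrative, not from the talk.

```python
import boto3

dynamodb = boto3.resource('dynamodb')

# Provisioned throughput is set per table: WCUs are 1 KB writes per second,
# RCUs are 4 KB strongly consistent reads per second.
table = dynamodb.create_table(
    TableName='Users',                       # hypothetical table name
    KeySchema=[{'AttributeName': 'UserId', 'KeyType': 'HASH'}],
    AttributeDefinitions=[{'AttributeName': 'UserId', 'AttributeType': 'S'}],
    ProvisionedThroughput={'ReadCapacityUnits': 1000, 'WriteCapacityUnits': 500},
)
table.wait_until_exists()
```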
21. Getting the most out of DynamoDB throughput
“To get the most out of DynamoDB throughput, create tables where the hash key element has a large number of distinct values, and values are requested fairly uniformly, as randomly as possible.”
—DynamoDB Developer Guide
• Space: access is evenly spread over the key-space
• Time: requests arrive evenly spaced in time
24. How does DynamoDB handle bursts?
• DynamoDB saves 300 seconds of unused capacity per partition
• This is used when a partition runs out of provisioned throughput due to bursts
– Provided excess capacity is available at the node
26. Burst capacity may not be sufficient
[Chart: provisioned vs. consumed vs. attempted capacity units over time; sustained traffic above the provisioned line exhausts the burst capacity of 300 seconds (1,200 CU/s × 300 s = 360,000 CU) and requests are throttled]
Don’t completely depend on burst capacity… provision sufficient throughput
27. What causes throttling?
• If sustained throughput goes beyond provisioned throughput per partition
• From the example before (a table with 3 partitions):
– Table created with 5000 RCUs, 500 WCUs
– RCUs per partition = 1666.67
– WCUs per partition = 166.67
– If sustained throughput > (1666 RCUs or 166 WCUs) per key or partition, DynamoDB may throttle requests
• Solution: Increase provisioned throughput
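A rough sketch of the arithmetic behind that example, using the approximate sizing guidance from the DynamoDB developer guide of that era (one partition supports roughly 3000 RCUs or 1000 WCUs and about 10 GB of data); the helper name and figures are illustrative approximations, not an exact service formula.

```python
import math

def partitions_for(rcu, wcu, size_gb=0):
    """Approximate partition count: ~3000 RCUs or ~1000 WCUs and ~10 GB per partition."""
    by_throughput = math.ceil(rcu / 3000 + wcu / 1000)
    by_size = math.ceil(size_gb / 10)
    return max(by_throughput, by_size, 1)

partitions = partitions_for(5000, 500)        # 5000/3000 + 500/1000 = 2.17 -> 3 partitions
print(5000 / partitions, 500 / partitions)    # ~1666.67 RCUs and ~166.67 WCUs per partition
```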
28. What causes throttling?
• Non-uniform workloads
– Hot keys/hot partitions
– Very large bursts
• Dilution of throughput across partitions caused by mixing hot data with cold data
– Use a table per time period for storing time series data so WCUs and RCUs are applied to the hot data set
30. 1:1 relationships or key-values
• Use a table with a hash key
• Use GetItem or BatchGetItem API
Example: Given a user or email, get attributes
Users Table
Hash key: UserId = bob     Attributes: Email = bob@gmail.com, JoinDate = 2011-11-15
Hash key: UserId = fred    Attributes: Email = fred@yahoo.com, JoinDate = 2011-12-01

Users-Email-GSI
Hash key: Email = bob@gmail.com     Attributes: UserId = bob, JoinDate = 2011-11-15
Hash key: Email = fred@yahoo.com    Attributes: UserId = fred, JoinDate = 2011-12-01
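A minimal sketch of both lookups with boto3, assuming the Users table and Users-Email-GSI shown above (attribute values are the slide's examples).

```python
import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource('dynamodb')
users = dynamodb.Table('Users')

# Given a user, get attributes (GetItem on the hash key).
resp = users.get_item(Key={'UserId': 'bob'})
print(resp.get('Item'))

# Given an email, look the user up through the GSI (GSI reads are eventually consistent).
resp = users.query(
    IndexName='Users-Email-GSI',
    KeyConditionExpression=Key('Email').eq('bob@gmail.com'),
)
print(resp['Items'])
```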
31. 1:N relationships or parent-children
• Use a table with hash and range key
• Use Query API
Example:
– Given a device, find all readings between epoch X, Y
Device-measurements Table
Hash key: DeviceId = 1    Range key: epoch = 5513A97C    Attributes: Temperature = 30, pressure = 90
Hash key: DeviceId = 1    Range key: epoch = 5513A9DB    Attributes: Temperature = 30, pressure = 90
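A minimal sketch of the Query, assuming the Device-measurements table above with DeviceId stored as a number and epoch as a string (the epoch bounds are the slide's example values).

```python
import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource('dynamodb')
measurements = dynamodb.Table('Device-measurements')

# All readings for device 1 between two epochs, newest first.
resp = measurements.query(
    KeyConditionExpression=Key('DeviceId').eq(1) &
                           Key('epoch').between('5513A97C', '5513A9DB'),
    ScanIndexForward=False,
)
for item in resp['Items']:
    print(item['epoch'], item['Temperature'], item['pressure'])
```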
32. N:M relationships
• Use a table and GSI with hash and range key elements switched
• Use Query API
Example: Given a user, find all games. Or given a game, find all users.
User-Games-Table
Hash key: UserId = bob     Range key: GameId = Game1
Hash key: UserId = fred    Range key: GameId = Game2
Hash key: UserId = bob     Range key: GameId = Game3

Game-Users-GSI
Hash key: GameId = Game1    Range key: UserId = bob
Hash key: GameId = Game2    Range key: UserId = fred
Hash key: GameId = Game3    Range key: UserId = bob
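A minimal sketch of both directions of the N:M query, assuming the table and GSI names above.

```python
import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource('dynamodb')
user_games = dynamodb.Table('User-Games-Table')

# Given a user, find all games (query the table).
games_for_bob = user_games.query(
    KeyConditionExpression=Key('UserId').eq('bob'))['Items']

# Given a game, find all users (query the GSI with the key elements switched).
users_for_game1 = user_games.query(
    IndexName='Game-Users-GSI',
    KeyConditionExpression=Key('GameId').eq('Game1'))['Items']
```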
33. Documents (JSON)
• New data types (M, L, BOOL, NULL) introduced to support JSON
• Document SDKs
– Simple programming model
– Conversion to/from JSON
– Java, JavaScript, Ruby, .NET
• Cannot index (S, N) elements of a JSON object stored in M
– They need to be modeled as top-level table attributes to be used in LSIs and GSIs
JavaScript → DynamoDB
string → S
number → N
boolean → BOOL
null → NULL
array → L
object → M
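As a minimal sketch of storing a nested JSON document, the boto3 resource interface marshals Python dicts and lists to the M and L types; the table name and item shape here are illustrative (only the ProductReviews.FiveStar path is taken from the next slide).

```python
import boto3

dynamodb = boto3.resource('dynamodb')
catalog = dynamodb.Table('ProductCatalog')    # hypothetical table

# Nested maps (M) and lists (L) are converted automatically by the SDK.
catalog.put_item(Item={
    'Id': 123,
    'Title': 'Red pen',
    'InStock': True,                          # BOOL
    'RelatedItems': [456, 789],               # L
    'ProductReviews': {                       # M
        'FiveStar': ['Great pen!'],
        'OneStar': ['Ran out of ink'],
    },
})
```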
34. Rich expressions
• Projection expression
– Query/Get/Scan: ProductReviews.FiveStar[0]
• Filter expression
– Query/Scan: #V > :num (#V is a placeholder for the reserved keyword VIEWS)
• Conditional expression
– Put/Update/DeleteItem: attribute_not_exists (#pr.FiveStar)
• Update expression
– UpdateItem: set Replies = Replies + :num
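A minimal sketch combining an update expression with a condition expression in boto3; the Thread table, its key attributes, and the Archived attribute are hypothetical, while the Replies counter mirrors the slide's UpdateItem example.

```python
import boto3
from boto3.dynamodb.conditions import Attr

dynamodb = boto3.resource('dynamodb')
forum = dynamodb.Table('Thread')              # hypothetical table and keys

# Increment the reply counter only if the thread has not been archived.
forum.update_item(
    Key={'ForumName': 'DynamoDB', 'Subject': 'Rich expressions'},
    UpdateExpression='SET Replies = Replies + :num',
    ConditionExpression=Attr('Archived').not_exists(),
    ExpressionAttributeValues={':num': 1},
    ReturnValues='UPDATED_NEW',
)
```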
37. Time series tables
Current (hot) table — Events_table_2015_April: Event_id (hash key), Timestamp (range key), Attribute1 … Attribute N, with high provisioned throughput
Older (cold) tables — Events_table_2015_March, Events_table_2015_February, Events_table_2015_January: same schema, with throughput dialed down (the diagram's values range from RCUs = 10000, WCUs = 10000 down to RCUs = 10, WCUs = 1)
Don’t mix hot and cold data; archive cold data to Amazon S3
38. Use a table per time period
• Pre-create daily, weekly, monthly tables
• Provision required throughput for current table
• Writes go to the current table
• Turn off (or reduce) throughput for older tables
Dealing with time series data
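A minimal sketch of dialing down an older table when a new period starts, assuming the monthly table names above; the capacity numbers are illustrative.

```python
import boto3

client = boto3.client('dynamodb')

# When the current table rolls over, reduce throughput on the previous period's table.
client.update_table(
    TableName='Events_table_2015_March',
    ProvisionedThroughput={'ReadCapacityUnits': 100, 'WriteCapacityUnits': 1},
)
```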
45. Messages app
Messages Table, queried by the Messages app for user David:
Inbox: SELECT * FROM Messages WHERE Recipient='David' LIMIT 50 ORDER BY Date DESC
Outbox: SELECT * FROM Messages WHERE Sender='David' LIMIT 50 ORDER BY Date DESC
46. Large and small attributes mixed
Messages Table (many more messages):
Recipient   Date         Sender   Message
David       2014-10-02   Bob      …
… 48 more messages for David …
David       2014-10-03   Alice    …
Alice       2014-09-28   Bob      …
Alice       2014-10-01   Carol    …
The Inbox query (SELECT * FROM Messages WHERE Recipient='David' LIMIT 50 ORDER BY Date DESC) returns 50 items × 256 KB each: large message bodies and attachments are read along with the small header attributes.
48. Separate the bulk data
Inbox-GSI:
Recipient   Date         Sender   Subject    MsgId
David       2014-10-02   Bob      Hi!…       afed
David       2014-10-03   Alice    RE: The…   3kf8
Alice       2014-09-28   Bob      FW: Ok…    9d2b
Alice       2014-10-01   Carol    Hi!...     ct7r
Messages Table:
MsgId   Body
9d2b    …
3kf8    …
ct7r    …
afed    …
For David's inbox:
1. Query Inbox-GSI: 1 RCU (50 sequential items at 128 bytes)
2. BatchGetItem Messages: 1600 RCU (50 separate items at 256 KB)
Uniformly distributes large item reads
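A minimal sketch of that two-step read pattern in boto3, assuming the Messages table is keyed on MsgId and Inbox-GSI has Recipient (hash) and Date (range); fetching only a handful of bodies here is illustrative.

```python
import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource('dynamodb')
messages = dynamodb.Table('Messages')

# 1. Read the 50 newest inbox headers from the small GSI projection (cheap).
headers = messages.query(
    IndexName='Inbox-GSI',
    KeyConditionExpression=Key('Recipient').eq('David'),
    ScanIndexForward=False,
    Limit=50,
)['Items']

# 2. Fetch only the large message bodies that are actually needed, by MsgId.
keys = [{'MsgId': h['MsgId']} for h in headers[:5]]
bodies = dynamodb.batch_get_item(
    RequestItems={'Messages': {'Keys': keys}}
)['Responses']['Messages']
```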
53. Distribute large items
• Reduce one-to-many item sizes
• Configure secondary index projections
• Use GSIs to model M:N relationship between sender and recipient
Scenario: querying many large items at once (Inbox, Messages, Outbox)
55. Multiplayer online game data
Games Table:
GameId   Date         Host    Opponent   Status
d9bl3    2014-10-02   David   Alice      DONE
72f49    2014-09-30   Alice   Bob        PENDING
o2pnb    2014-10-08   Bob     Carol      IN_PROGRESS
b932s    2014-10-03   Carol   Bob        PENDING
ef9ca    2014-10-03   David   Bob        IN_PROGRESS
56. Query for incoming game requests
• DynamoDB indexes provide hash and range
• What about queries for two equalities and a range?
SELECT * FROM Game
WHERE Opponent='Bob'      (hash)
AND Status='PENDING'      (?)
ORDER BY Date DESC        (range)
57. Approach 1: Query filter
Secondary Index, queried for Opponent = Bob:
Opponent   Date         GameId   Status        Host
Alice      2014-10-02   d9bl3    DONE          David
Carol      2014-10-08   o2pnb    IN_PROGRESS   Bob
Bob        2014-09-30   72f49    PENDING       Alice
Bob        2014-10-03   b932s    PENDING       Carol
Bob        2014-10-03   ef9ca    IN_PROGRESS   David
58. Approach 1: Query filter
SELECT * FROM Game
WHERE Opponent='Bob'
ORDER BY Date DESC
FILTER ON Status='PENDING'
Secondary Index, rows read for Bob (non-PENDING games are filtered out after the read):
Opponent   Date         GameId   Status        Host
Alice      2014-10-02   d9bl3    DONE          David
Carol      2014-10-08   o2pnb    IN_PROGRESS   Bob
Bob        2014-09-30   72f49    PENDING       Alice
Bob        2014-10-03   b932s    PENDING       Carol
Bob        2014-10-03   ef9ca    IN_PROGRESS   David   (filtered out)
60. Use query filter
• Send back less data “on the wire”
• Simplify application code
• Simple SQL-like expressions
– AND, OR, NOT, ()
Scenario: your index isn’t entirely selective
62. Approach 2: Composite key
Secondary Index:
Opponent   StatusDate               GameId   Host
Alice      DONE_2014-10-02          d9bl3    David
Carol      IN_PROGRESS_2014-10-08   o2pnb    Bob
Bob        IN_PROGRESS_2014-10-03   ef9ca    David
Bob        PENDING_2014-09-30       72f49    Alice
Bob        PENDING_2014-10-03       b932s    Carol
63. Approach 2: Composite key
SELECT * FROM Game
WHERE Opponent='Bob'
AND StatusDate BEGINS_WITH 'PENDING'
Secondary Index, only Bob's PENDING rows are read:
Opponent   StatusDate               GameId   Host
Alice      DONE_2014-10-02          d9bl3    David
Carol      IN_PROGRESS_2014-10-08   o2pnb    Bob
Bob        IN_PROGRESS_2014-10-03   ef9ca    David
Bob        PENDING_2014-09-30       72f49    Alice
Bob        PENDING_2014-10-03       b932s    Carol
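A minimal sketch of that composite-key query in boto3; the Games table name comes from the earlier slide, while the index name is a hypothetical label for the Opponent + StatusDate secondary index.

```python
import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource('dynamodb')
games = dynamodb.Table('Games')

# Two equalities plus a range collapse into one range key: StatusDate = "<Status>_<Date>".
pending_for_bob = games.query(
    IndexName='Opponent-StatusDate-GSI',      # hypothetical index name
    KeyConditionExpression=Key('Opponent').eq('Bob') &
                           Key('StatusDate').begins_with('PENDING_'),
    ScanIndexForward=False,
)['Items']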
65. Sparse indexes
Game-scores-table:
Id (Hash)   User    Game   Score   Date         Award
1           Bob     G1     1300    2012-12-23
2           Bob     G1     1450    2012-12-23
3           Jay     G1     1600    2012-12-24
4           Mary    G1     2000    2012-10-24   Champ
5           Ryan    G2     123     2012-03-10
6           Jones   G2     345     2012-03-20
Award-GSI (only items with an Award attribute appear):
Award (Hash)   Id   User   Score
Champ          4    Mary   2000
Scan sparse hash GSIs
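A minimal sketch of scanning the sparse GSI above: because only items carrying an Award attribute are present in the index, the scan touches a handful of items rather than the whole table.

```python
import boto3

dynamodb = boto3.resource('dynamodb')
scores = dynamodb.Table('Game-scores-table')

# Scan the sparse index instead of the base table to find all award winners.
champions = scores.scan(IndexName='Award-GSI')['Items']
print(champions)
```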
66. Replace filter with indexes
• Concatenate attributes to form useful secondary index keys (e.g. Status + Date)
• Take advantage of sparse indexes
Scenario: you want to optimize a query as much as possible
68. AWS Big Data Platform Overview
Richard Freeman, PhD
Senior Data Scientist / Architect
richard.freeman@justgiving.com
69. We are JustGiving
A tech-for-good company connecting the world's causes with people who care.
DONATIONS: $3bn of donations raised since launch, $576m in 2014
CHARITIES: 20,000 charities on the platform
USERS: 23m users have made a transaction
REACH: Givers in 164 countries donate to causes in 12 countries
DATA: Billions of data points
AMAZON WEB SERVICES: Production environment micro-services and big data analytics, reporting and KPIs
70. The challenge
• Data sources: clickstream, logs, data warehouse, and external data sources
• Data too big and SQL queries too complex for the data warehouse
• Machine learning algorithms at scale
• Generate KPIs and interactive reports
New data science use cases (web events):
• High throughput
• Live view in near real-time
• Scale out
• Minimum maintenance
73. The outcome
DynamoDB:
• Live view on the data
• Fast to query and scales out
• Minimum support and maintenance
• Needed a pattern to maximise throughput of temporal data
RAVEN platform (scalable reporting and analytics cloud platform):
• Managed AWS cloud services and SDKs
• New pattern: resilient and incremental data loads into many Redshift clusters in parallel
• One Redshift cluster per stakeholder, e.g. business, data scientists, analysts, DBAs
• Scale out and run complex queries and machine learning algorithms
83. Shard write-heavy hash keys
• Trade off read cost for write scalability (see the sketch below)
• Consider throughput per hash key and per partition
Scenario: your write workload is not horizontally scalable
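A minimal sketch of write sharding with boto3: a hot logical key is spread across several physical hash keys by appending a random suffix, and reads fan out to sum the shards. The Counters table, its key name, and the shard count are hypothetical.

```python
import random
import boto3

dynamodb = boto3.resource('dynamodb')
counters = dynamodb.Table('Counters')         # hypothetical table, hash key 'CounterId'

NUM_SHARDS = 10                               # illustrative shard count

def increment(counter_name):
    # Writes to one hot logical key spread across NUM_SHARDS hash keys (and partitions).
    shard_key = f'{counter_name}_{random.randint(1, NUM_SHARDS)}'
    counters.update_item(
        Key={'CounterId': shard_key},
        UpdateExpression='ADD #v :one',
        ExpressionAttributeNames={'#v': 'Value'},
        ExpressionAttributeValues={':one': 1},
    )

def read_total(counter_name):
    # Reads fan out over all shards: read cost traded for write scalability.
    total = 0
    for i in range(1, NUM_SHARDS + 1):
        item = counters.get_item(Key={'CounterId': f'{counter_name}_{i}'}).get('Item')
        total += int(item['Value']) if item else 0
    return total
```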
84. Correctness in voting
Voter → (1) record vote and de-dupe; retry → RawVotes Table → (2) increment candidate counter → AggregateVotes Table
RawVotes Table:
UserId   Candidate   Date
Alice    A           2013-10-02
Bob      B           2013-10-02
Eve      B           2013-10-02
Chuck    A           2013-10-02
AggregateVotes Table:
Segment   Votes
A_1       23
B_2       12
B_1       14
A_2       25
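A minimal sketch of that write path with boto3, assuming the RawVotes and AggregateVotes tables above: a conditional put de-dupes the vote (so retries are safe), then an atomic ADD increments the sharded candidate counter.

```python
import boto3
from boto3.dynamodb.conditions import Attr
from botocore.exceptions import ClientError

dynamodb = boto3.resource('dynamodb')
raw_votes = dynamodb.Table('RawVotes')
aggregate = dynamodb.Table('AggregateVotes')

def vote(user_id, candidate, segment, date):
    # 1. Record the vote; the condition rejects duplicates, so the client can retry safely.
    try:
        raw_votes.put_item(
            Item={'UserId': user_id, 'Candidate': candidate, 'Date': date},
            ConditionExpression=Attr('UserId').not_exists(),
        )
    except ClientError as err:
        if err.response['Error']['Code'] != 'ConditionalCheckFailedException':
            raise
        return  # already counted

    # 2. Atomically increment the candidate's (sharded) counter.
    aggregate.update_item(
        Key={'Segment': f'{candidate}_{segment}'},
        UpdateExpression='ADD Votes :one',
        ExpressionAttributeValues={':one': 1},
    )
```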
85. Correctness in aggregation?
[Same diagram as above: Voter → RawVotes Table → AggregateVotes Table — how do we keep the aggregate counts consistent with the raw votes?]
87. DynamoDB Streams
• Stream of updates to a table
• Asynchronous
• Exactly once
• Strictly ordered
– Per item
• Highly durable
• Scale with table
• 24-hour lifetime
• Sub-second latency
88. View types
For UpdateItem (Name = John, Destination = Pluto), the stream record contains:
Keys only:                Name = John
Old image—before update:  Name = John, Destination = Mars
New image—after update:   Name = John, Destination = Pluto
Old and new images:       Name = John, Destination = Mars / Name = John, Destination = Pluto
90. Cross-region replication
DynamoDB Streams + open source cross-region replication library
[Diagram: a table in US East (N. Virginia) replicated to Asia Pacific (Sydney) and EU (Ireland) replicas]
97. Performing real-time aggregation and analytics with DynamoDB Streams
• Collect and de-dupe data in DynamoDB
• Aggregate data in-memory and flush periodically
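A minimal sketch of a hypothetical stream consumer for that pattern, written as an AWS Lambda-style handler (the record shape shown is the Lambda event format for DynamoDB Streams); tallying in memory and flushing the result to an aggregate table is the part a real consumer would add.

```python
from collections import Counter

def handler(event, context):
    # Tally new votes from the stream records in memory.
    tally = Counter()
    for record in event['Records']:
        if record['eventName'] != 'INSERT':
            continue
        new_image = record['dynamodb']['NewImage']     # typed attribute map
        tally[new_image['Candidate']['S']] += 1
    # A real consumer would periodically flush these counts to AggregateVotes.
    return dict(tally)
```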