DynamoDB In-Depth: In this technical discussion, learn how to use DynamoDB for your mobile and web apps, and how to pick the right database for your app. We will cover the fundamental concepts and how to go about architecting your app on DynamoDB. Plus, gain key insights to help you make the most out of DynamoDB.
Developer Drill Down: Come learn with live examples of a DynamoDB application integrating with other data services on AWS to enrich your app. This is a developer driven interactive session focused on building real-life applications.
DynamoDB In-depth & Developer Drill Down
1. DynamoDB In-Depth & Developer Drill Down
Peter-Mark Verwoerd
Solutions Architect
Ace Hotel, New York. May 22nd, 2014
2. Overview
• Local Secondary Indexes
• Global Secondary Indexes
• Design Patterns
– User Data
• Demo
• Break
• Design Patterns (continued…)
– Game State
– Save Games
– Global Leaderboard
– High throughput voting
• Data design patterns
4. Local Secondary Indexes
• Alternate Range Key for your table
• More flexible Query patterns
• Local to the Hash Key
local secondary indexes (LSI)
index and table data is co-located (same partition)
5. Use case for Local Secondary Indexes
• Find the recent DynamoDB forum posts
• Table sorted by range key only
Forum Subject LastReplyTime Views Replies Answered
S3 How to set permissions? 2013-04-02 100 20 1
DynamoDB Creating secondary indexes? 2013-02-12 100 20 0
DynamoDB I get an error 2012-11-05 98 3 1
DynamoDB Setting row permissions 2012-06-17 100 8 0
DynamoDB Signature not working 2012-03-28 12 1 1
DynamoDB Transaction support 2013-04-01 5 10 0
6. Use case for Local Secondary Indexes
• Create a local secondary index on LastReplyTime
Forum LastReplyTime Subject Views Replies Answered
S3 2013-04-02 How to set permissions? 100 20 1
DynamoDB 2012-03-28 Signature not working 12 1 1
DynamoDB 2012-06-17 Setting row permissions 100 8 0
DynamoDB 2012-11-05 I get an error 98 3 1
DynamoDB 2013-02-12 Creating secondary indexes? 100 20 0
DynamoDB 2013-04-01 Transaction support 5 10 0
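As a rough sketch of how the index on LastReplyTime might be declared, the CreateTable parameters below keep Forum as the hash key and add LastReplyTime as the alternate range key. The table name ForumPost comes from the next slide; the index name, the KEYS_ONLY projection, and the capacity values are assumptions made for illustration.

```python
# Sketch of CreateTable parameters for a table with one local secondary index.
# "LastReplyTime-index" and the capacity numbers are illustrative assumptions.

def forum_post_table_params():
    return {
        "TableName": "ForumPost",
        "AttributeDefinitions": [
            {"AttributeName": "Forum", "AttributeType": "S"},
            {"AttributeName": "Subject", "AttributeType": "S"},
            {"AttributeName": "LastReplyTime", "AttributeType": "S"},
        ],
        # Table key: Forum (hash) + Subject (range)
        "KeySchema": [
            {"AttributeName": "Forum", "KeyType": "HASH"},
            {"AttributeName": "Subject", "KeyType": "RANGE"},
        ],
        # The LSI reuses the table's hash key and swaps in another range key.
        "LocalSecondaryIndexes": [
            {
                "IndexName": "LastReplyTime-index",
                "KeySchema": [
                    {"AttributeName": "Forum", "KeyType": "HASH"},
                    {"AttributeName": "LastReplyTime", "KeyType": "RANGE"},
                ],
                "Projection": {"ProjectionType": "KEYS_ONLY"},
            }
        ],
        "ProvisionedThroughput": {"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
    }


params = forum_post_table_params()
# With boto3 installed and credentials configured, these could be passed as
# boto3.client("dynamodb").create_table(**params).
```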
7. Write example (behind the scenes)
• Updating the LastReplyTime for a Post
– from “2013-03-17” to “2013-04-02”
DynamoDB
ForumPost
Partition 1
UpdateItem
ReplyTime
Index
Table
ForumPost
Partition 2
8. Write example (behind the scenes)
• Update the attribute(s) in the item in the table
• Update the attribute(s) in the index if necessary
9. Write example (behind the scenes)
• Update the attribute(s) in the item in the table
– Update “How..” date from 2 to 5
Table Index
Forum Q’n Date
S3 Ask… 1
DDB Ask… 5
DDB Help.. 1
DDB How… 2
DDB Using… 3
Forum Date Q’n
S3 1 Ask...
DDB 1 Help…
DDB 2 How…
DDB 3 Using…
DDB 5 Ask…
10. Write example (behind the scenes)
• Update the attribute(s) in the item in the table
– Update “How..” date from 2 to 5
Table Index
Forum Q’n Date
S3 Ask… 1
DDB Ask… 5
DDB Help.. 1
DDB How… 2
DDB Using… 3
Forum Date Q’n
S3 1 Ask...
DDB 1 Help…
DDB 2 How…
DDB 3 Using…
DDB 5 Ask…
5
11. Write example (behind the scenes)
• Update the attribute(s) in the index
– Update “How..” date from 2 to 5
Table Index
Forum Q’n Date
S3 Ask… 1
DDB Ask… 5
DDB Help.. 1
DDB How… 2
DDB Using… 3
Forum Date Q’n
S3 1 Ask...
DDB 1 Help…
DDB 2 How…
DDB 3 Using…
DDB 5 Ask…
DDB 5 How…
5
12. Local Secondary Index Projections
Table: User (hash), File (range), Date, Type, Size, S3Key
Date-index (KEYS_ONLY): User (hash), Date (range), File (key)
Type-index (INCLUDE Date): User (hash), Type (range), File (key), Date (projected)
Size-index (ALL): User (hash), Size (range), File (key), Date (projected), Type (projected), S3Key (projected)
13. Projections
• Pick which attributes are “copied” into the index
• Pros:
– Improves Query performance when querying projected attributes
• Cons:
– Increases write cost when:
• Projected attributes are frequently updated
• Projected attributes are > 1KB
14. Provisioned throughput cost (reads)
• If querying only for projected attributes:
– Query costs the same as a Query on a table
• If querying for non-projected attributes
– Query costs the same as a Query on a table
– Plus, the cost of retrieving each item from the table independently
• (similar to Query + BatchGetItem)
15. Queries that Fetch
• Index: Project KEYS_ONLY
• Query: (“DDB”, “Date >= 3”, “ALL_ATTRIBUTES”)
Table Index
Forum Q’n Date Answered
S3 Ask… 1 1
DDB Ask… 5 0
DDB Help.. 1 1
DDB How… 2 1
DDB Using… 3 0
Forum Date Q’n
S3 1 Ask...
DDB 1 Help…
DDB 2 How…
DDB 3 Using…
DDB 5 Ask…
16. Queries that Fetch
• Index: Project KEYS_ONLY
• Query: (“DDB”, “Date >= 3”, “ALL_ATTRIBUTES”)
Table Index
Forum Q’n Date Answered
S3 Ask… 1 1
DDB Ask… 5 0
DDB Help.. 1 1
DDB How… 2 1
DDB Using… 3 0
Forum Date Q’n
S3 1 Ask...
DDB 1 Help…
DDB 2 How…
DDB 3 Using…
DDB 5 Ask…
2. Fetch items 1. Query Index
17. Queries that Fetch
DynamoDB
ForumPost
Partition 1
1. Query
ReplyTime
Index
Table
ForumPost
Partition 2
2. DynamoDB Queries Index
3. DynamoDB
fetches each item
from the table
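The two-step read above can be modeled in memory. This is only a sketch of the idea, not of DynamoDB internals: the index holds keys only, so asking for all attributes costs one extra table fetch per matching index entry. The question names are shortened versions of the ones on the slide.

```python
# In-memory model of a Query on a KEYS_ONLY index that asks for ALL_ATTRIBUTES.
table = {
    ("S3", "Ask"): {"Date": 1, "Answered": 1},
    ("DDB", "Ask"): {"Date": 5, "Answered": 0},
    ("DDB", "Help"): {"Date": 1, "Answered": 1},
    ("DDB", "How"): {"Date": 2, "Answered": 1},
    ("DDB", "Using"): {"Date": 3, "Answered": 0},
}


def build_keys_only_index(rows):
    # Each index entry carries only the keys: (Forum, Date, Question).
    return sorted((forum, item["Date"], q) for (forum, q), item in rows.items())


def query_all_attributes(rows, index, forum, min_date):
    results, fetches = [], 0
    for idx_forum, date, question in index:
        if idx_forum == forum and date >= min_date:  # step 1: query the index
            item = rows[(forum, question)]           # step 2: fetch from the table
            fetches += 1
            results.append({"Forum": forum, "Question": question, **item})
    return results, fetches


index = build_keys_only_index(table)
results, fetches = query_all_attributes(table, index, "DDB", 3)
```

Projecting Answered into the index would remove the second step for this query, at the price of larger index entries.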
18. Sparse indexes
• “Unanswered” entries are very interesting
Forum Subject LastReplyTime Views Replies Answered
S3 How to set permissions? 2013-04-02 100 20 1
DynamoDB Creating secondary indexes? 2013-02-12 100 20 0
DynamoDB I get an error 2013-04-01 98 3 1
DynamoDB Setting row permissions 2013-04-01 100 8 0
DynamoDB Signature not working 2013-04-01 12 1 0
DynamoDB Using the SDK 2013-04-01 5 10 1
19. Sparse indexes
• The “Unanswered” index contains only unanswered replies
Forum Unanswered Subject LastReplyTime Views Replies
DynamoDB 1 Setting row permissions 2013-04-01 100 8
DynamoDB 1 Signature not working 2013-04-01 12 1
DynamoDB 1 Creating secondary indexes? 2013-02-12 100 20
20. Sparse indexes
• Tip: To get useful sort order, populate Unanswered with LastReplyTime
Forum Unanswered Subject LastReplyTime Views Replies
DynamoDB 2013-02-12 Creating secondary indexes? 2013-02-12 100 20
DynamoDB 2013-04-01 Setting row permissions 2013-04-01 100 8
DynamoDB 2013-04-01 Signature not working 2013-04-01 12 1
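A small in-memory sketch of the sparse index idea, using the posts from the earlier slide: only items that carry the Unanswered attribute appear in the index, and storing the last reply time in that attribute gives a useful sort order. The helper names are hypothetical.

```python
posts = [
    {"Forum": "S3", "Subject": "How to set permissions?", "LastReplyTime": "2013-04-02", "Answered": 1},
    {"Forum": "DynamoDB", "Subject": "Creating secondary indexes?", "LastReplyTime": "2013-02-12", "Answered": 0},
    {"Forum": "DynamoDB", "Subject": "I get an error", "LastReplyTime": "2013-04-01", "Answered": 1},
    {"Forum": "DynamoDB", "Subject": "Setting row permissions", "LastReplyTime": "2013-04-01", "Answered": 0},
    {"Forum": "DynamoDB", "Subject": "Signature not working", "LastReplyTime": "2013-04-01", "Answered": 0},
    {"Forum": "DynamoDB", "Subject": "Using the SDK", "LastReplyTime": "2013-04-01", "Answered": 1},
]


def with_unanswered_marker(post):
    # Write path: set Unanswered only while the post has no answer.
    item = dict(post)
    if not item["Answered"]:
        item["Unanswered"] = item["LastReplyTime"]
    return item


def sparse_index(items, hash_key, range_key):
    # Items that lack the index range key never enter the index.
    entries = [i for i in items if range_key in i]
    return sorted(entries, key=lambda i: (i[hash_key], i[range_key]))


items = [with_unanswered_marker(p) for p in posts]
unanswered_index = sparse_index(items, "Forum", "Unanswered")
```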
22. Global Secondary Indexes
• Alternate Hash and/or Range Key for your table
• Even more flexible Query patterns
23. Global Secondary Index Projections
Urgent
(hash)
Id
(key)
Table
GSIs
INCLUDE
To
(hash)
Date
(range)
Id
(key)
Message
(projected)
From
(projected)
ALL
To
(hash)
From
(range)
Id
(key)
Id
(hash)
Date From To Message Urgent
From
(hash)
To
(range)
Id
(key) KEYS_ONLY
From
(hash)
Date
(range)
Id
(key)
To
24. GSI Query Pattern
• Query covered by GSI
– Query GSI & get the attributes
• Query not covered by GSI
– Query GSI get the table key(s)
– BatchGetItem/GetItem from table
– 2 or more round trips to DynamoDB
Tip: If you need very low latency then project all required attributes into GSI
25. How do GSI updates work
[Diagram: a Client, a Table made up of several boxes labeled “Primary table”, and a Global Secondary Index; step 2 is labeled “Asynchronous update (in progress)”]
26. 1 Table update = 0, 1 or 2 GSI updates
Table operation: number of GSI updates
• Item not in index before or after update: 0
• Update introduces a new indexed attribute: 1
• Update deletes the indexed attribute: 1
• Update changes the value of an indexed attribute from A to B: 2
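The counting rule can be written as a small function. This is a sketch under the assumption that only the indexed key attribute matters; the slide does not cover an update that leaves the indexed attribute unchanged, so that branch is an assumption of this example.

```python
def gsi_updates(old_item, new_item, indexed_attr):
    """Number of GSI writes caused by one table update (0, 1 or 2)."""
    before = old_item.get(indexed_attr)
    after = new_item.get(indexed_attr)
    if before is None and after is None:
        return 0  # item is not in the index before or after
    if before is None or after is None:
        return 1  # entry is added to, or removed from, the index
    if before != after:
        return 2  # old entry deleted, new entry written
    # Not covered by the slide: key unchanged. Assumed here to cost nothing,
    # which holds only if no projected attribute changed either.
    return 0


counts = [
    gsi_updates({"Id": 1}, {"Id": 1, "Views": 3}, "Urgent"),
    gsi_updates({"Id": 1}, {"Id": 1, "Urgent": "yes"}, "Urgent"),
    gsi_updates({"Id": 1, "Urgent": "yes"}, {"Id": 1}, "Urgent"),
    gsi_updates({"Id": 1, "To": "A"}, {"Id": 1, "To": "B"}, "To"),
]
```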
27. Local Secondary Index vs. Global Secondary Index
1.
– LSI: Key = hash key and a range key
– GSI: Key = hash or hash-and-range
2.
– LSI: Hash key is the same attribute as that of the table. Range key can be any scalar table attribute
– GSI: The index hash key and range key (if present) can be any scalar table attributes
3.
– LSI: For each hash key, the total size of all indexed items must be 10 GB or less
– GSI: No size restrictions for global secondary indexes
4.
– LSI: Query over a single partition, as specified by the hash key value in the query
– GSI: Query over the entire table, across all partitions
5.
– LSI: Eventual consistency or strong consistency
– GSI: Eventual consistency only
6.
– LSI: Read and write capacity units consumed from the table
– GSI: Every global secondary index has its own provisioned read and write capacity units
7.
– LSI: Query will automatically fetch non-projected attributes from the table
– GSI: Query can only request projected attributes. It will not fetch any attributes from the table
28. LSI or GSI?
• LSI can be modeled as a GSI
• If data size in an item collection > 10 GB, use GSI
• If GSI will work for your scenario, use GSI! Keep in mind:
– 2 round trips (unless you project the attributes you need)
– Eventual consistency
29. Best Practices
• Provision enough throughput for GSI
– one update to the table may result in two writes to an index
• If GSIs do not have enough write capacity, table writes will eventually be
throttled down to what the "slowest" index can consume
36. Web Identity Federation
Users
AWS IAM
Web identity federation
(Fine-grained access control)
Amazon
DynamoDB
37. Fine-Grained Access Control
• Limit access to particular hash key values
• Limit access to specific attributes
• Use policy substitution variables to write the policy once
38. Fine-Grained Access Control
Images Table
User Image Date Link
Bob aed4c 2013-10-01 s3://…
Bob 5f2e2 2013-09-05 s3://…
Bob f93bae 2013-10-08 s3://…
Alice ca61a 2013-09-12 s3://…
“Allow all authenticated Facebook
users to Query the Images table,
but only on items where their
Facebook ID is the hash key”
39. Fine-Grained Access Control
Images Table
User Image Date Link
Bob aed4c 2013-10-01 s3://…
Bob 5f2e2 2013-09-05 s3://…
Bob f93bae 2013-10-08 s3://…
Alice ca61a 2013-09-12 s3://…
Bob
Bob “logs in” using
web identity federation
AWS
IAM
40. Fine-Grained Access Control
Images Table
User Image Date Link
Bob aed4c 2013-10-01 s3://…
Bob 5f2e2 2013-09-05 s3://…
Bob f93bae 2013-10-08 s3://…
Alice ca61a 2013-09-12 s3://…
Bob
Bob can Query for Images
where User=“Bob”
41. Fine-Grained Access Control
Images Table
User Image Date Link
Bob aed4c 2013-10-01 s3://…
Bob 5f2e2 2013-09-05 s3://…
Bob f93bae 2013-10-08 s3://…
Alice ca61a 2013-09-12 s3://…
Bob
Bob cannot Query for Images
where User=“Alice”
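A policy along these lines could express the rule quoted earlier. The sketch below builds it as a Python dictionary; the table ARN (region and account number) is a placeholder, and the statement is an illustration rather than the presenter's actual policy. The `dynamodb:LeadingKeys` condition key restricts access by hash key value, and `${graph.facebook.com:id}` is the substitution variable for the Facebook user ID.

```python
import json


def images_table_policy(table_arn):
    # Allow Query only on items whose hash key equals the caller's Facebook ID.
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["dynamodb:Query"],
                "Resource": [table_arn],
                "Condition": {
                    "ForAllValues:StringEquals": {
                        "dynamodb:LeadingKeys": ["${graph.facebook.com:id}"]
                    }
                },
            }
        ],
    }


# Placeholder ARN: region and account ID are made up for the example.
policy = images_table_policy("arn:aws:dynamodb:us-east-1:123456789012:table/Images")
policy_json = json.dumps(policy, indent=2)
```

Because the variable is resolved per request, the same policy text serves every user, which is the "write the policy once" point from the earlier slide.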
42. Two-tier Architecture Tradeoffs
• Pros:
– Lower latency
– Lower cost
– Lower operational complexity
• Cons:
Users
– Less visibility into application behavior
– More difficult to make changes to persistence layer
– Requires “scoping” items to a given user Amazon
DynamoDB
45. Tagging App Query Patterns
• Image Table:
– How many votes does this image(URL) have?
– Does an item already exist for this URL?
• Tag Table:
– How many images are tagged with this tag?
• ImageTag table:
– All images with a given tag
– All tags for a given image
– How many votes does this tag have?
47. Image Table
Id DateAdded VoteCount
"http://tag-pics.s3.amazonaws.com/aws-icons/cloudsearch.png" "2014-05-06T05:50:06.371Z" 0
"http://tag-pics.s3.amazonaws.com/aws-icons/dynamodb.png" "2014-05-06T05:03:16.582Z" 3
Attribute Type Value
Id (Hash Key)
String "http://tag-pics.s3.amazonaws.com/aws-icons/cloudsearch.png"
DateAdded String "2014-05-06T05:50:06.371Z"
VoteCount Number 0
48. Tag Table
Tag ImageCount
"new" 2
"database" 1
"nosql" 1
"cloudsearch" 1
"dynamodb" 1
Attribute Type Value
Tag (Hash Key)
String "database"
ImageCount Number 1
50. ImageTag Table Indexes
Local Secondary Index
Index Name Hash Key Range Key Projected Attributes Index Size (Bytes)* Item Count*
VoteCount-index Tag (String) VoteCount (Number) All 369 3
Global Secondary Index
Index Name Hash Key Range Key Projected Attributes Status Read Capacity Units Write Capacity Units Last Decrease Time Last Increase Time Index Size (Bytes)* Item Count*
ImageId-index ImageId (String) Tag (String) Tag, ImageId Active 1 1 - - 222 3
54. Summary: Image Tagging Demo
• Modeling applications on DynamoDB is similar to modeling them on other databases
• Need to plan your schema and indexes around how you are going to query your data
58. Tic Tac Toe Table
Game Table
Id Players O State IsTie Winner Data
abecd [ Alice, Bob ] Alice DONE 1 …
fbdcc [ Alice, Bob ] Alice DONE Alice …
dbace [ Alice, Bob ] Alice STARTED …
59. Tic Tac Toe Table
{
"Data" : [ [ "X", null, "O" ],
[ null, "O", null],
[ "O", null, "X" ]
]
}
Id Players O State IsTie Winner Data
abecd [ Alice, Bob ] Alice DONE 1 …
fbdcc [ Alice, Bob ] Alice DONE Alice …
dbace [ Alice, Bob ] Alice STARTED …
66. State Transitions with Conditional Writes
Bob (1)
Bob (2)
DynamoDB
Bob (3)
State : STARTED,
Turn : Bob,
Top-Right : O
67. State Transitions with Conditional Writes
Bob (1)
Bob (2)
DynamoDB
Bob (3)
Update:
Turn : Alice
Top-Left : X
Update:
Turn : Alice
Low-Right : X
Update:
Turn : Alice
Mid : X
State : STARTED,
Turn : Bob,
Top-Right : O
68. State Transitions with Conditional Writes
Bob (1)
Bob (2)
DynamoDB
Bob (3)
Update:
Turn : Alice
Top-Left : X
Update:
Turn : Alice
Low-Right : X
Update:
Turn : Alice
Mid : X
State : STARTED,
Turn : Alice,
Top-Right : O,
Top-Left : X,
Mid: X,
Low-Right: X
69. Conditional Writes
• Apply an update only if values are as expected
• Otherwise reject the write
70. Conditional Writes
{
Id : abecd,
Players : [ Alice, Bob ],
State : STARTED,
Turn : Bob,
Top-Right: O
}
UpdateItem Id=abecd
Game Item Updates: {
Turn : Alice,
Top-Left: X
}
Expected: {
Turn : Bob,
Top-Left : null,
State : STARTED
}
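The slide shows the request in shorthand. One way it might look with the expression-based UpdateItem parameters is sketched below; the slide itself uses an Expected map, so this is a translation rather than the original request. The table name Game and the helper name are assumptions, and the sketch treats an empty cell as an attribute that does not exist yet.

```python
def move_request(game_id, player, next_player, cell, mark):
    # Placeholders (#name) are used for every attribute because "Top-Left"
    # contains a hyphen and cannot appear directly in an expression.
    return {
        "TableName": "Game",  # assumed table name
        "Key": {"Id": {"S": game_id}},
        "UpdateExpression": "SET #turn = :next, #cell = :mark",
        "ConditionExpression": (
            "#turn = :me AND #state = :started AND attribute_not_exists(#cell)"
        ),
        "ExpressionAttributeNames": {
            "#turn": "Turn",
            "#state": "State",
            "#cell": cell,
        },
        "ExpressionAttributeValues": {
            ":next": {"S": next_player},
            ":me": {"S": player},
            ":mark": {"S": mark},
            ":started": {"S": "STARTED"},
        },
    }


request = move_request("abecd", "Bob", "Alice", "Top-Left", "X")
# With boto3 this could be sent as boto3.client("dynamodb").update_item(**request).
```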
71. State Transitions with Conditional Writes
Bob (1)
Bob (2)
DynamoDB
Bob (3)
Update:
Turn : Alice
Top-Left : X
Expect:
Turn : Bob
Top-Left : null
State : STARTED,
Turn : Bob,
Top-Right : O
Update:
Turn : Alice
Low-Right : X
Expect:
Turn : Bob
Low-Right : null
Update:
Turn : Alice
Mid : X
Expect:
Turn : Bob
Mid : null
72. State Transitions with Conditional Writes
Bob (1)
Bob (2)
DynamoDB
Bob (3)
Update:
Turn : Alice
Top-Left : X
Expect:
Turn : Bob
Top-Left : null
State : STARTED,
Turn : Bob,
Top-Right : O
Update:
Turn : Alice
Low-Right : X
Expect:
Turn : Bob
Low-Right : null
Update:
Turn : Alice
Mid : X
Expect:
Turn : Bob
Mid : null
73. State Transitions with Conditional Writes
Bob (1)
Bob (2)
DynamoDB
Bob (3)
Update:
Turn : Alice
Top-Left : X
Expect:
Turn : Bob
Top-Left : null
State : STARTED,
Turn : Alice,
Top-Right : O,
Top-Left : X
Update:
Turn : Alice
Low-Right : X
Expect:
Turn : Bob
Low-Right : null
Update:
Turn : Alice
Mid : X
Expect:
Turn : Bob
Mid : null
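The sequence above can be replayed with a tiny in-memory model of a conditional write. It is a sketch of the behavior, not of DynamoDB itself: writes to one item are applied one at a time, and each is rejected unless the expected values still hold.

```python
class ConditionFailed(Exception):
    """Raised when an expected value no longer matches the stored item."""


def conditional_update(item, updates, expected):
    # Apply the update only if every expected attribute still has that value.
    # A missing attribute reads as None, matching "Top-Left : null" on the slide.
    for attr, value in expected.items():
        if item.get(attr) != value:
            raise ConditionFailed(attr)
    item.update(updates)


game = {"Id": "abecd", "State": "STARTED", "Turn": "Bob", "Top-Right": "O"}

accepted = []
for cell in ["Top-Left", "Low-Right", "Mid"]:  # Bob's three simultaneous moves
    try:
        conditional_update(
            game,
            updates={"Turn": "Alice", cell: "X"},
            expected={"Turn": "Bob", cell: None},
        )
        accepted.append(cell)
    except ConditionFailed:
        pass  # the turn already passed to Alice, so the write is rejected
```

Only the first write to arrive succeeds; the other two fail the Turn check.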
76. Primary Key Schemas
Primary Key
Hash Key Schema
Id Players O State IsTie Winner Data
abecd [ Alice, Bob ] Alice DONE 1 …
fbdcc [ Alice, Bob ] Alice DONE Alice …
dbace [ Alice, Bob ] Alice STARTED …
77. Primary Key Schemas
Id Turn Players Turn State IsTie Winner Data
abecd 0 [ Alice, Bob ] Alice STARTED …
abecd 1 [ Alice, Bob ] Bob STARTED …
abecd 2 [ Alice, Bob ] Alice STARTED …
abecd 3 [ Alice, Bob ] Bob STARTED …
abecd 4 [ Alice, Bob ] Alice DONE Alice …
dbace 0 [ Alice, Bob ] Bob STARTED
dbace 1 [ Alice, Bob ] Alice STARTED …
Primary Key
Hash and Range Key Schema
78. Primary Key Schemas
Id Turn Players Turn State IsTie Winner Data
abecd 0 [ Alice, Bob ] Alice STARTED …
abecd 1 [ Alice, Bob ] Bob STARTED …
abecd 2 [ Alice, Bob ] Alice STARTED …
abecd 3 [ Alice, Bob ] Bob STARTED …
abecd 4 [ Alice, Bob ] Alice DONE Alice …
dbace 0 [ Alice, Bob ] Bob STARTED
dbace 1 [ Alice, Bob ] Alice STARTED …
Primary Key
79. Primary Key Schemas
• Hash-only
– Key/value lookups only
• Hash and Range
– Given a hash key value, query for items by range key
– Items are sorted by range key within each hash key
80. Primary Key Schemas
Id Turn Players Turn State IsTie Winner Data
abecd 0 [ Alice, Bob ] Alice STARTED …
abecd 1 [ Alice, Bob ] Bob STARTED …
abecd 2 [ Alice, Bob ] Alice STARTED …
abecd 3 [ Alice, Bob ] Bob STARTED …
abecd 4 [ Alice, Bob ] Alice DONE Alice …
dbace 0 [ Alice, Bob ] Bob STARTED
dbace 1 [ Alice, Bob ] Alice STARTED …
Primary Key
Query WHERE Id=abecd, ORDER BY Turn DESC, LIMIT 2
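The pseudo-query on this slide maps onto Query parameters roughly as follows. The table name Game is an assumption; descending order comes from ScanIndexForward set to False. A local stand-in over the rows from the slide shows what the query would return.

```python
def latest_turns_query(game_id, limit=2):
    return {
        "TableName": "Game",  # assumed table name
        "KeyConditionExpression": "#id = :id",
        "ExpressionAttributeNames": {"#id": "Id"},
        "ExpressionAttributeValues": {":id": {"S": game_id}},
        "ScanIndexForward": False,  # ORDER BY Turn DESC
        "Limit": limit,
    }


def query_local(rows, game_id, limit):
    # Local stand-in: same hash key, sorted by range key descending.
    matching = sorted((r for r in rows if r[0] == game_id), key=lambda r: r[1], reverse=True)
    return matching[:limit]


rows = [("abecd", 0), ("abecd", 1), ("abecd", 2), ("abecd", 3), ("abecd", 4),
        ("dbace", 0), ("dbace", 1)]
query = latest_turns_query("abecd")
latest = query_local(rows, "abecd", 2)
```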
82. Game-Wide Leaderboard
• Find the top 10 scores game-wide
HighScore User
1000 Alice
850 Dave
580 Erin
470 Bob
30 Chuck
83. Game-Wide Leaderboard
• Find the top 10 scores game-wide
HighScore User
1000 Alice
850 Dave
580 Erin
470 Bob
30 Chuck
Table Schemas must begin
with a Hash Key
84. Game-Wide Leaderboard
• Find the top 10 scores game-wide
Cannot be Queried
the way we want
User HighScore
Chuck 30
Alice 1000
Bob 470
Dave 850
Erin 580
85. Game-Wide Leaderboard
• Use a constant Hash key?
Constant HighScore-User
1 0001000-Alice
1 0000850-Dave
1 0000580-Erin
1 0000470-Bob
1 0000030-Chuck
Zero-pad strings for sort
stability
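A short sketch of why the padding matters: range keys stored as strings sort character by character, so unpadded scores come out in the wrong order. The helper name and the width of 7 digits (taken from the values on the slide) are assumptions.

```python
def leaderboard_key(score, user, width=7):
    # Zero-pad so that string order matches numeric order.
    return f"{score:0{width}d}-{user}"


scores = {"Alice": 1000, "Dave": 850, "Erin": 580, "Bob": 470, "Chuck": 30}

padded = sorted((leaderboard_key(s, u) for u, s in scores.items()), reverse=True)
unpadded = sorted((str(s) for s in scores.values()), reverse=True)
```

Here `padded` lists Alice first, while `unpadded` puts "1000" last because "1" sorts before "3", "4", "5" and "8".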
96. Scaling on DynamoDB
You
Provision 1200 Write Capacity Units
Partition 1 Partition 2
600 Write Capacity Units (each)
Votes Table
97. Scaling on DynamoDB
You
Provision 1200 Write Capacity Units
Partition 1 Partition 2
(no sharing)
Votes Table
98. Scaling on DynamoDB
You
Provision 200,000 Write Capacity Units
Votes Table
Partition 1
(600 WCU)
Partition K
(600 WCU)
Partition M
(600 WCU)
Partition N
(600 WCU)
99. Scaling bottlenecks
Votes Table
Partition 1
(600 WCU)
Candidate A
Partition K
(600 WCU)
Partition M
(600 WCU)
Partition N
(600 WCU)
Candidate B
Voters
100. Scaling bottlenecks
Votes Table
Partition 1
(600 WCU)
Candidate A
Partition K
(600 WCU)
Partition M
(600 WCU)
Partition N
(600 WCU)
Candidate B
Voters
101. Best Practice: Uniform Workloads
“To achieve the full amount of request
throughput you have provisioned for a table,
keep your workload spread evenly across the
hash key values.”
– DynamoDB Developer Guide
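The arithmetic behind the bottleneck can be sketched with a deliberately simplified model: provisioned capacity is divided evenly across partitions, each hash key value lives on one partition, and nothing is shared between partitions. The function names are hypothetical.

```python
def per_partition_wcu(provisioned_wcu, partitions):
    # Even split of provisioned write capacity, with no sharing.
    return provisioned_wcu / partitions


def usable_wcu(partition_wcu, busy_partitions):
    # Only partitions that actually receive writes contribute throughput.
    return partition_wcu * busy_partitions


each = per_partition_wcu(1200, 2)   # the two-partition Votes table
usable = usable_wcu(600, 2)         # two candidates, one partition each
fraction = usable / 200_000         # share of the provisioned 200,000 WCU
```

Under these assumptions, votes for two candidates can use only 1,200 of the 200,000 provisioned write capacity units.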
113. Data Characteristics: Hot, Warm, Cold
Hot Warm Cold
Volume MB–GB GB–TB PB
Item size B–KB KB–MB KB–TB
Latency ms ms, sec min, hrs
Durability Low–High High Very High
Request rate Very High High Low
Cost/GB $$-$ $-¢¢ ¢
114. [Diagram: Amazon ElastiCache, Amazon DynamoDB, Amazon RDS, Amazon Redshift, Amazon EMR, Amazon S3, and Amazon Glacier placed along axes for Request rate, Cost/GB, Latency, Data Volume, and Structure, each running between Low and High]
115. What data store should I use?
Each line lists: average latency; data volume; item size; request rate; storage cost ($/GB/month); durability
ElastiCache: ms; GB; B–KB; Very High; $$; Low–Moderate
Amazon DynamoDB: ms; GB–TBs (no limit); KB (64 KB max); Very High; ¢¢; Very High
Amazon RDS: ms, sec; GB–TB (3 TB max); KB (~row size); High; ¢¢; High
CloudSearch: ms, sec; GB–TB; KB (1 MB max); High; $; High
Amazon Redshift: sec, min; TB–PB (1.6 PB max); KB (64 K max); Low; ¢; High
Amazon EMR (Hive): sec, min, hrs; GB–PB (~nodes); KB–MB; Low; ¢; High
Amazon S3: ms, sec, min (~size); GB–PB (no limit); KB–GB (5 TB max); Low–Very High (no limit); ¢; Very High
Amazon Glacier: hrs; GB–PB (no limit); GB (40 TB max); Very Low (no limit); ¢; Very High
Spectrum across the stores, in the order listed: Hot Data, Warm Data, Cold Data
116. Use the right tool for the job!
Data Tier
Amazon
CloudSearch
Amazon RDS
Amazon
ElastiCache
Amazon DynamoDB
Amazon
Elastic MapReduce
Amazon S3
Amazon
Glacier
Amazon Redshift AWS Data Pipeline
117. When to use
• Fast and predictable performance
• Seamless/massive scale
• Autosharding
• Consistent/low latency
• No size or throughput limits
• Very high durability
• Key-value or simple queries
When not to use
• Need multi-item/row or cross table
transactions
• Need complex queries, joins
• Need real-time analytics on
historic data
• Storing cold data
Amazon DynamoDB
120. Social Gaming
• Host games
• Invite friends to play
• Find friends’ games to play
• See history of games
121. Social Gaming
HostedGame
Table
Hash: UserId
Range: GameId
Attributes: OpponentId, Date, (rest of game state)
UserId GameId Date OpponentId …
Carol e23f5a 2013-10-08 Charlie …
Alice d4e2dc 2013-10-01 Bob …
Alice e9cba3 2013-09-27 Bob …
Alice f6a3bd 2013-10-08
122. Social Gaming
• Host games
• Invite friends to play
• Find friends’ games to play
• See history of games
123. Social Gaming: find recent games
UserId GameId Date OpponentId …
Carol e23f5a 2013-10-08 Charlie …
Alice d4e2dc 2013-10-01 Bob …
Alice e9cba3 2013-09-27 Bob …
Alice f6a3bd 2013-10-08
Query UserId=Alice
124. Query cost
• Provisioned Throughput: Work / sec allowed on your table
• Capacity Units: Amount of provisioned throughput consumed by an operation
125. Query cost
UserId GameId Date OpponentId …
Carol e23f5a 2013-10-08 Charlie …
Alice d4e2dc 2013-10-01 Bob …
Alice e9cba3 2013-09-27 Bob …
Alice f6a3bd 2013-10-08
(1 item = 600 bytes)
(397 more games for Alice)
126. Query cost
UserId GameId Date OpponentId …
Carol e23f5a 2013-10-08 Charlie …
Alice d4e2dc 2013-10-01 Bob …
Alice e9cba3 2013-09-27 Bob …
Alice f6a3bd 2013-10-08
(1 item = 600 bytes)
(397 more games for Alice)
400 (items evaluated by Query) × 600 (bytes per item) / 1024 (bytes per KB) / 4 (KB per Read Capacity Unit) ≈ 60 Read Capacity Units
127. Local Secondary Indexes
• An alternate range key on a table
HostedGame Table LocalSecondaryIndex on Date
UserId GameId Date
Carol e23f5a 2013-10-08
Alice d4e2dc 2013-10-01
Alice e9cba3 2013-09-27
Alice f6a3bd 2013-10-01
UserId Date GameId
Carol 2013-10-08 e23f5a
Alice 2013-09-27 e9cba3
Alice 2013-10-01 d4e2dc
Alice 2013-10-01 f6a3bd
128. Query cost on Local Secondary Indexes
UserId Date GameId …
Carol 2013-10-08 e23f5a …
Alice (397 older games)
Alice 2013-09-27 e9cba3 …
Alice 2013-10-01 d4e2dc …
Alice 2013-10-01 f6a3bd …
Query for the 10 most recent games
129. Query cost on Local Secondary Indexes
UserId Date GameId …
Carol 2013-10-08 e23f5a …
Alice (397 older games)
Alice 2013-09-27 e9cba3 …
Alice 2013-10-01 d4e2dc …
Alice 2013-10-01 f6a3bd …
Query for the 10 most recent games
10 (items evaluated by Query) × 600 (bytes per item) / 1024 (bytes per KB) / 4 (KB per Read Capacity Unit) ≈ 1.5, rounded up to 2 Read Capacity Units
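The two cost calculations can be reproduced with a few lines of Python. This sketch follows the formula on the slides (total bytes, divided into 4 KB units, rounded up) and ignores the cheaper eventually consistent read; the function name is an assumption. For the full table Query it yields 59, which the slide quotes as roughly 60.

```python
import math


def query_read_capacity_units(items, bytes_per_item, kb_per_rcu=4):
    # Total size evaluated by the Query, in KB, rounded up to whole units.
    total_kb = items * bytes_per_item / 1024
    return math.ceil(total_kb / kb_per_rcu)


all_games = query_read_capacity_units(400, 600)   # Query over all of Alice's games
ten_recent = query_read_capacity_units(10, 600)   # Query on the Date index with a limit of 10
```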
131. Example Local Secondary Indexes
• Find 10 recent matches between Alice and Bob
– Hash: UserId
– Range: OpponentId + Date
Query WHERE UserId=Alice AND OpponentAndDate STARTS_WITH “Bob-”
LIMIT 10 DESC
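A sketch of the composite range key and the matching Query parameters. The attribute name OpponentAndDate comes from the slide; the table name HostedGame is taken from the earlier schema slide, while the index name and helper names are assumptions. `begins_with` is the key condition function that corresponds to STARTS_WITH.

```python
def opponent_and_date(opponent_id, date):
    # Composite range key: opponent first, then an ISO date that sorts as text.
    return f"{opponent_id}-{date}"


def recent_matches_query(user_id, opponent_id, limit=10):
    return {
        "TableName": "HostedGame",
        "IndexName": "OpponentAndDate-index",  # assumed index name
        "KeyConditionExpression": "UserId = :user AND begins_with(OpponentAndDate, :prefix)",
        "ExpressionAttributeValues": {
            ":user": {"S": user_id},
            ":prefix": {"S": f"{opponent_id}-"},
        },
        "ScanIndexForward": False,  # newest first
        "Limit": limit,
    }


# Local check against Alice's games from the earlier slide.
alice_keys = [opponent_and_date("Bob", "2013-10-01"), opponent_and_date("Bob", "2013-09-27")]
matches = sorted((k for k in alice_keys if k.startswith("Bob-")), reverse=True)[:10]
query = recent_matches_query("Alice", "Bob")
```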
132. More example Local Secondary Indexes
• Find a host’s matches without an opponent
133. More example Local Secondary Indexes
• Find a host’s matches without an opponent
– Hash: UserId
– Range: UnmatchedDate
(sparse index)
Query WHERE UserId=Alice LIMIT 10 DESC
134. Local Secondary Index Projections
• Choose what attributes are copied into the index
– ALL, SPECIFIC (INCLUDE), KEYS (KEYS_ONLY)
• Substantially cheaper to Query only projection
• Project the attributes that your use case requires
• Can make writes cheaper too
135. Write cost for Local Secondary Index
• Insert new item
– 1 additional write
• Setting index range key to / from null
– 1 additional write
• Updating a projected attribute
– 1 additional write
• Updating a non-projected attribute
– 0 additional writes
• Updating the index range key
– 2 additional writes
136. Read cost for Query of non-projected attributes
• Regular Query cost
+
• Single-item Get cost for each evaluated item
137. Example Local Secondary Index Projections
• Query Alice’s 10 most recent Games
UserId GameId Date OpponentId …
Carol e23f5a 2013-10-08 Charlie …
Alice d4e2dc 2013-10-01 Bob …
Alice e9cba3 2013-09-27 Bob …
Alice f6a3bd 2013-10-08
138. Example Local Secondary Index Projections
• Query Alice’s 10 most recent Games
– Opponent, Winner, (UserId, GameId, Date)
– Projected item size from 600 bytes to 40 bytes
• Write cost:
– 1 Write Capacity Unit for insert, opponent joining, and completion
– 0 Write Capacity Units for other state transitions
140. Social Gaming: Friends
• Query who you are friends with
• Ask to be friends with someone
• Acknowledge (or decline) friend request
141. Social Gaming: Friends
Friends
Table
Hash: UserId
Range: FriendId
Attributes: Status, Date, etc
UserId FriendId Status Date …
Alice Bob FRIENDS 2013-08-20 …
Bob Alice FRIENDS 2013-08-20 …
Bob Chuck INCOMING 2013-10-08 …
Chuck Bob SENT 2013-10-08 …
142. Becoming Friends: Multi-item Atomic Writes
UserId FriendId Status
Alice Bob FRIENDS
Bob Alice FRIENDS
Bob Chuck INCOMING
Chuck Bob SENT
Bob
A friend request!
143. Becoming Friends: Multi-item Atomic Writes
UserId FriendId Status
Alice Bob FRIENDS
Bob Alice FRIENDS
Bob Chuck INCOMING
Chuck Bob SENT
Bob
1. Update Bob/Chuck record
2. Update Chuck/Bob record
144. Becoming Friends: Multi-item Atomic Writes
UserId FriendId Status
Alice Bob FRIENDS
Bob Alice FRIENDS
Bob Chuck FRIENDS
Chuck Bob SENT
Bob
UpdateItem
Status=FRIENDS
1. Update Bob/Chuck record
2. Update Chuck/Bob record
145. Becoming Friends: Multi-item Atomic Writes
UserId FriendId Status
Alice Bob FRIENDS
Bob Alice FRIENDS
Bob Chuck FRIENDS
Chuck Bob FRIENDS
Bob
UpdateItem
Status=FRIENDS
1. Update Bob/Chuck record
2. Update Chuck/Bob record
147. Becoming Friends: When things go wrong
UserId FriendId Status
Alice Bob FRIENDS
Bob Alice FRIENDS
Bob Chuck INCOMING
Chuck Bob SENT
Bob
A friend request!
1. Update Bob/Chuck record
2. Update Chuck/Bob record
148. Becoming Friends: When things go wrong
UserId FriendId Status
Alice Bob FRIENDS
Bob Alice FRIENDS
Bob Chuck FRIENDS
Chuck Bob SENT
Bob
UpdateItem
Status=FRIENDS
1. Update Bob/Chuck record
2. Update Chuck/Bob record
149. Becoming Friends: When things go wrong
UserId FriendId Status
Alice Bob FRIENDS
Bob Alice FRIENDS
Bob Chuck FRIENDS
Chuck Bob SENT
Bob
UpdateItem
Status=FRIENDS
1. Update Bob/Chuck record
2. Update Chuck/Bob record
150. Multi-item transaction in DynamoDB
• Scan for “stuck” transactions
• Use the Client Transactions Library on the AWS SDK for Java
• Roll your own scheme
153. Client Transactions Usage
• Low contention only
• Don’t mix Tx Client writes with normal writes
• No Query support
• Expensive, slower
• But, easy to use
154. Specialized Transactions
Id Status V1 V2
UserId FriendId Status V
Alice Bob FRIENDS 3
Bob Alice FRIENDS 3
Bob Chuck INCOMING 2
Chuck Bob SENT 2
Bob
1. Read items
2. Write to Tx table
3. Apply writes
4. Delete from Tx table
Transactions Table
A friend request!
155. Specialized Transactions
Id Status V1 V2
UserId FriendId Status V
Alice Bob FRIENDS 3
Bob Alice FRIENDS 3
Bob Chuck INCOMING 2
Chuck Bob SENT 2
Bob BatchGetItem
1. Read items
2. Write to Tx table
3. Apply writes
4. Delete from Tx table
Transactions Table
156. Specialized Transactions
Id Status V1 V2
Bob-Chuck Bob: FRIENDS
Chuck: FRIENDS
2 2
UserId FriendId Status V
Alice Bob FRIENDS 3
Bob Alice FRIENDS 3
Bob Chuck INCOMING 2
Chuck Bob SENT 2
Bob PutItem,
Expect not exists
1. Read items
2. Write to Tx table
3. Apply writes
4. Delete from Tx table
Transactions Table
157. Specialized Transactions
Id Status V1 V2
Bob-Chuck Bob: FRIENDS
Chuck: FRIENDS
2 2
UserId FriendId Status V
Alice Bob FRIENDS 3
Bob Alice FRIENDS 3
Bob Chuck FRIENDS 3
Chuck Bob FRIENDS 3
Bob UpdateItem,
Expect V=Vprev
1. Read items
2. Write to Tx table
3. Apply writes
4. Delete from Tx table
Transactions Table
158. Specialized Transactions
Id Status V1 V2
UserId FriendId Status V
Alice Bob FRIENDS 3
Bob Alice FRIENDS 3
Bob Chuck FRIENDS 3
Chuck Bob FRIENDS 3
Bob DeleteItem,
Expect V1=V1prev,
V2=V2prev,
1. Read items
2. Write to Tx table
3. Apply writes
4. Delete from Tx table
Transactions Table
160. Specialized Transactions
Id Status V1 V2
UserId FriendId Status V
Alice Bob FRIENDS 3
Bob Alice FRIENDS 3
Bob Chuck INCOMING 2
Chuck Bob SENT 2
Bob BatchGetItem
1. Read items
2. Write to Tx table
3. Apply writes
4. Delete from Tx table
Transactions Table
161. Specialized Transactions
Id Status V1 V2
Bob-Chuck Bob: FRIENDS
Chuck: FRIENDS
2 2
UserId FriendId Status V
Alice Bob FRIENDS 3
Bob Alice FRIENDS 3
Bob Chuck INCOMING 2
Chuck Bob SENT 2
Bob PutItem,
Expect not exists
1. Read items
2. Write to Tx table
3. Apply writes
4. Delete from Tx table
Transactions Table
162. Specialized Transactions
Id Status V1 V2
Bob-Chuck Bob: FRIENDS
Chuck: FRIENDS
2 2
UserId FriendId Status V
Alice Bob FRIENDS 3
Bob Alice FRIENDS 3
Bob Chuck FRIENDS 3
Chuck Bob SENT 2
Bob UpdateItem,
Expect V=Vprev
1. Read items
2. Write to Tx table
3. Apply writes
4. Delete from Tx table
Transactions Table
164. Specialized Transactions
Id Status V1 V2
Bob-Chuck Bob: FRIENDS
Chuck: FRIENDS
2 2
UserId FriendId Status V
Alice Bob FRIENDS 3
Bob Alice FRIENDS 3
Bob Chuck FRIENDS 3
Chuck Bob SENT 2
Sweeper Scan
1. Scan for stuck Tx
2. Apply writes
3. Delete from Tx table
Transactions Table
165. Specialized Transactions
Id Status V1 V2
Bob-Chuck Bob: FRIENDS
Chuck: FRIENDS
2 2
UserId FriendId Status V
Alice Bob FRIENDS 3
Bob Alice FRIENDS 3
Bob Chuck FRIENDS 3
Chuck Bob FRIENDS 3
UpdateItem,
Expect V=Vprev
Transactions Table
Sweeper
1. Scan for stuck Tx
2. Apply writes
3. Delete from Tx table
166. Specialized Transactions
Id Status V1 V2
UserId FriendId Status V
Alice Bob FRIENDS 3
Bob Alice FRIENDS 3
Bob Chuck FRIENDS 3
Chuck Bob FRIENDS 3
DeleteItem,
Expect V1=V1prev,
V2=V2prev,
Transactions Table
Sweeper
1. Scan for stuck Tx
2. Apply writes
3. Delete from Tx table
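The whole walk-through, including the crash and the sweeper, can be replayed with an in-memory model. This is a simplified sketch, not the AWS client transactions library: dictionaries stand in for the two tables, the function names are hypothetical, and the final delete is unconditional where the slide shows version checks on it.

```python
friends = {
    ("Bob", "Chuck"): {"Status": "INCOMING", "V": 2},
    ("Chuck", "Bob"): {"Status": "SENT", "V": 2},
}
tx_table = {}


def begin(tx_id, writes):
    # Steps 1 and 2: read item versions, then record the intent (expect not exists).
    if tx_id in tx_table:
        raise RuntimeError("transaction already in progress")
    tx_table[tx_id] = {
        "Writes": dict(writes),
        "Versions": {key: friends[key]["V"] for key in writes},
    }


def apply_writes(tx_id, stop_after=None):
    # Step 3: conditional updates, each expecting V to equal the recorded version.
    tx = tx_table[tx_id]
    for n, (key, status) in enumerate(tx["Writes"].items()):
        if stop_after is not None and n >= stop_after:
            return False  # simulate the client dying mid-transaction
        item = friends[key]
        if item["V"] == tx["Versions"][key]:
            item["Status"] = status
            item["V"] += 1
    return True


def finish(tx_id):
    # Step 4: remove the transaction record.
    del tx_table[tx_id]


def sweep():
    # Sweeper: re-apply any stuck transaction, then delete its record.
    for tx_id in list(tx_table):
        apply_writes(tx_id)
        finish(tx_id)


begin("Bob-Chuck", {("Bob", "Chuck"): "FRIENDS", ("Chuck", "Bob"): "FRIENDS"})
apply_writes("Bob-Chuck", stop_after=1)  # only the Bob/Chuck record is written
after_crash = {key: dict(item) for key, item in friends.items()}
sweep()
```

The version check is what makes the sweeper safe to run: the Bob/Chuck record already moved to version 3, so re-applying the transaction skips it and only completes the Chuck/Bob record.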
167. Transaction advice
• Lock items before modifying
– Including items that don’t exist yet
• Don’t stomp on future writes (use versions)
• Sweep for stuck transactions
• Avoid deadlock
Editor's Notes
Local here means that the index is local to each hash key value.
Instead of being local to the hash key, you can provide a different attribute to use as the hash key for the index.
Think of this as a parallel table asynchronously populated by DynamoDB
Eventually consistent
We’ve talked about some specific examples of user data, like social network and image tagging metadata, but now let’s talk about user data in more general terms.
Which could include user settings or preferences in your app, single-player games, or user event history.
What if you could have your app, say running on your users’ browsers or mobile devices, call into the DynamoDB web service directly? Instead of a 3, or N-tier architecture, this is just two: the client, and the database web service.
Opens up a can of worms around access control. If you give all users access to a database like this, how do you prevent one user from reading, or even modifying another user’s data? How do you distribute credentials for accessing the database to each of your users?
To call DynamoDB, for example, users need AWS Credentials – an AWS Secret key and Access key
To solve that access control problem, we’ll introduce two new concepts in AWS. The first is called Web Identity Federation.
The fact is that each of your app users probably already have an identity in another identity provider, like Facebook, Google, or Amazon.com
With Web Identity Federation, your users first log in to the identity provider of their choice and pass the login token they get back to AWS Identity and Access Management (IAM). IAM verifies that token with the identity provider and returns a unique set of temporary AWS credentials just for that end user.
And using those temporary AWS credentials, the end-users can call DynamoDB. All without having to hardcode or distribute AWS Credentials directly to every single user of your application
Now, that solves the authentication problem of getting credentials to users, but not the authorization problem of preventing users from reading or modifying each other’s items.
You should consider these trade-offs on a use case by use case basis. Maybe there are some use cases like ingestion of user event data where this makes a lot of sense, and other use cases where the added control of your middle tier is better.
Or maybe you want to prototype your application more quickly without the middle tier, and you can release future versions of your application with a different architecture.
Image: This table stores one item for each image tagged in the app, using the URL of the image as the “Id”. Also stores the number of votes for each image. This table is effectively just a key/value store, since it uses a Hash type primary key with no Secondary Indexes. Single-item and batch operations are the only efficient operations on this table. It efficiently answers questions like, “how many votes does this particular item have”, and “does an item already exist for this URL”, but not more complex questions like, “how many image URLs contain the word ‘cat’”.
Tag: This stores one item for each tag. It also stores the number of images tagged with that particular tag. Like the Image table, this is a Hash type primary key with no Secondary Indexes.
ImageTag: This table maintains a many-to-many relationship between images and tags. It is a Hash and Range primary key schema on Tag (hash) and ImageId (range). This Hash and Range index unlocks the Query API, which lets us efficiently query for all images with a given tag.
Query:
All images with a given tag
This table also has a Local Secondary Index on Tag (hash) and VoteCount (range) so that we can efficiently look up all images for a given tag, sorted by their popularity. It also has a Global Secondary Index on ImageId (hash) and Tag (range) so that we can efficiently query for all of the tags for a given image.
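As a rough sketch, the ImageTag schema described here could be written as the parameter dictionary that boto3's `create_table` call accepts. The index names (`VoteCountIndex`, `ImageIdIndex`), the attribute types, the KEYS_ONLY projections, and the throughput numbers are assumptions made for illustration; none of them come from the talk.

```python
# Builds create_table parameters for the ImageTag table described above.
# Index names, attribute types, projections, and throughput are assumptions.
def image_tag_table_params(read_units=5, write_units=5):
    throughput = {"ReadCapacityUnits": read_units, "WriteCapacityUnits": write_units}
    return {
        "TableName": "ImageTag",
        # Every attribute used in a table or index key must be declared here.
        "AttributeDefinitions": [
            {"AttributeName": "Tag", "AttributeType": "S"},
            {"AttributeName": "ImageId", "AttributeType": "S"},
            {"AttributeName": "VoteCount", "AttributeType": "N"},
        ],
        # Table key: Tag (hash) + ImageId (range).
        "KeySchema": [
            {"AttributeName": "Tag", "KeyType": "HASH"},
            {"AttributeName": "ImageId", "KeyType": "RANGE"},
        ],
        # Local index: same hash key, alternate range key.
        "LocalSecondaryIndexes": [{
            "IndexName": "VoteCountIndex",
            "KeySchema": [
                {"AttributeName": "Tag", "KeyType": "HASH"},
                {"AttributeName": "VoteCount", "KeyType": "RANGE"},
            ],
            "Projection": {"ProjectionType": "KEYS_ONLY"},
        }],
        # Global index: the keys flipped, so we can find tags by image.
        "GlobalSecondaryIndexes": [{
            "IndexName": "ImageIdIndex",
            "KeySchema": [
                {"AttributeName": "ImageId", "KeyType": "HASH"},
                {"AttributeName": "Tag", "KeyType": "RANGE"},
            ],
            "Projection": {"ProjectionType": "KEYS_ONLY"},
            "ProvisionedThroughput": dict(throughput),
        }],
        "ProvisionedThroughput": throughput,
    }

params = image_tag_table_params()
```

With boto3 installed and credentials configured, the dictionary could be passed as `client.create_table(**params)`.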
Let’s look and see what this drawing shows:
We have two players in a round of tic-tac-toe. The tic-tac-toe board is stored in a DynamoDB table.
This is a match between Alice and Bob, and we’re going to have Alice go first.
We can see with this move that Bob is not very good at this game. After that move, Alice can guarantee a win by playing in the lower-left.
Let’s say Bob realizes that he is not good at this game, and wants to come up with some other way to win.
Based on the API calls we’ve sketched out, this would work and Bob would win, or crash the game.
Let’s take another look at that, this time in terms of the item and its attributes.
Here all of those will be merged together. UpdateItem lets you pick specific attributes in an item to update, leaving all the rest of the attributes alone.
PutItem on the other hand replaces the whole item, so then it would have been last write wins. That opens another can of worms around being able to “undo” moves, but that’s a different issue that we’ll fix in the same way.
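The difference between the two write styles can be sketched with a plain dictionary standing in for the table. This is an in-memory illustration only, not the DynamoDB API, and the attribute names (`Turn`, `TopLeft`, `TopRight`) are hypothetical.

```python
# In-memory stand-ins for the two write styles; not the DynamoDB API itself.
def put_item(table, key, item):
    """Replace the whole item stored under key."""
    table[key] = dict(item)

def update_item(table, key, changes):
    """Set only the named attributes, leaving all other attributes alone."""
    table.setdefault(key, {}).update(changes)

games = {"abecd": {"Turn": "Alice"}}

# Two updates touching different attributes are merged into one item.
update_item(games, "abecd", {"TopLeft": "X"})
update_item(games, "abecd", {"TopRight": "O"})
merged = dict(games["abecd"])

# A put replaces everything: only the attributes in the last put remain.
put_item(games, "abecd", {"TopLeft": "X"})
replaced = dict(games["abecd"])
```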
Apply the write only if the values in the item are still what the request expected them to be.
But, only one of those writes will arrive first. Writes to each item are serialized by DynamoDB.
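A minimal in-memory sketch of that idea: each write states the values it expects to find, and it is rejected if the item has changed in the meantime. The function and exception names are hypothetical, `None` is used here to mean "square not yet played", and real DynamoDB conditional writes are expressed through request parameters rather than a Python function like this.

```python
class ConditionFailed(Exception):
    """Raised when the item no longer matches what the writer expected."""

def conditional_update(table, key, expected, changes):
    item = table[key]
    # Check every expectation before changing anything.
    for attr, value in expected.items():
        if item.get(attr) != value:
            raise ConditionFailed(attr)
    item.update(changes)

games = {"abecd": {"Turn": "Bob", "TopLeft": "X"}}

# Bob sends two moves, each expecting that it is still his turn.
# Writes to one item are applied one after another, so one goes first.
conditional_update(games, "abecd",
                   expected={"Turn": "Bob", "TopRight": None},
                   changes={"TopRight": "O", "Turn": "Alice"})
try:
    conditional_update(games, "abecd",
                       expected={"Turn": "Bob", "BottomLeft": None},
                       changes={"BottomLeft": "O", "Turn": "Alice"})
    second_applied = True
except ConditionFailed:
    second_applied = False  # the turn already passed to Alice
```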
That covers game state transitions, but often in games, players want to save their progress from time to time so that they can go back if they make a mistake or to explore an alternate approach.
Going back to the Tic Tac Toe example,
saved games could be used for Tic Tac Toe instant replay,
or to train some kind of machine learning system that analyses someone’s game behavior to come up with AI opponents of varying difficulty, or coaching tips that point out if they habitually make poor decisions.
Anyway, how do we model this on DynamoDB?
Let’s go back to that primary key schema we were talking about before. This Hash Key schema supports really only two kinds of operations: Single item operations by primary key, or full table scan.
This schema effectively gives you a key/value store. You can put items, update them, and retrieve them efficiently at any scale, as long as you have the primary key value.
Now let’s look at another type of primary key schema: Hash + Range. Remember that the primary key values uniquely identify an item in the database. So now the game abecd isn’t unique, but each game Id + turn is.
What does this let you do?
This opens up the Query API, where given an exact hash key value, you can efficiently query across the sorted range key values.
For example, this query says, given id=abecd, …
You can query with LT, GT, and BETWEEN, and, for strings or byte arrays, BEGINS_WITH.
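To make the Hash + Range model concrete, here is a small in-memory sketch of a table keyed by game id (hash) and turn (range). The class and method names are hypothetical and this is not the DynamoDB client. It only shows why a range condition is cheap once items under one hash key are kept sorted.

```python
from bisect import bisect_left, bisect_right

class HashRangeTable:
    """Toy model: each hash key owns a list of rows sorted by range key."""

    def __init__(self):
        self._partitions = {}  # hash key -> sorted list of (range key, item)

    def put(self, hash_key, range_key, item):
        rows = self._partitions.setdefault(hash_key, [])
        keys = [r for r, _ in rows]
        i = bisect_left(keys, range_key)
        if i < len(rows) and rows[i][0] == range_key:
            rows[i] = (range_key, item)      # same primary key: overwrite
        else:
            rows.insert(i, (range_key, item))

    def query_between(self, hash_key, low, high):
        """Exact hash key, inclusive range on the sorted range key."""
        rows = self._partitions.get(hash_key, [])
        keys = [r for r, _ in rows]
        return [item for _, item in rows[bisect_left(keys, low):bisect_right(keys, high)]]

turns = HashRangeTable()
for turn in range(1, 6):
    turns.put("abecd", turn, {"GameId": "abecd", "Turn": turn})

middle = [item["Turn"] for item in turns.query_between("abecd", 2, 4)]
```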
Next example is taking that game and adding a social network on top of it.
What’s a data model we could use to satisfy those use cases?
…
We’ve taken that hash-only Game item and changed the primary key so it’s hash of the hosting player, and range of the game id.
This model efficiently supports those first two requirements
Cost would be high to retrieve all games Alice ever hosted just to find the recent games. But what does Cost mean on DynamoDB?
But we want a cost that is constant – 10 rows is 10 rows. Not something that gets more and more expensive as your app grows. We need an index.
(If you flip around the string concatenation and put Date before OpponentId, you can’t query the index this way: it would be sorted by date and then by player.)
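The effect of the concatenation order can be seen with ordinary sorted strings. This is a sketch only: the `#` separator, the helper name, and the sample games are assumptions.

```python
def opponent_date_key(opponent_id, date):
    """Composite range key: opponent first, then an ISO-style date."""
    return f"{opponent_id}#{date}"

games = [("Bob", "2013-04-02"), ("Chuck", "2013-03-01"),
         ("Bob", "2012-11-05"), ("Chuck", "2013-05-09")]

# Opponent first: all of Bob's games sit together, ordered by date,
# so a prefix condition on "Bob#" reads one contiguous slice.
by_opponent = sorted(opponent_date_key(o, d) for o, d in games)
bob_games = [k for k in by_opponent if k.startswith("Bob#")]
bob_positions = [i for i, k in enumerate(by_opponent) if k.startswith("Bob#")]

# Date first: the keys sort by date, so Bob's games are scattered.
by_date = sorted(f"{d}#{o}" for o, d in games)
bob_positions_flipped = [i for i, k in enumerate(by_date) if k.endswith("#Bob")]
```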
We’re going to go a little deeper into secondary indexes, because they also have some options built in that let you really optimize your queries and writes.
To demonstrate what “optimizing cost” means, and to better demonstrate what indexes are doing behind the scenes for you, let’s look at the cost of various operations when you have a local secondary index.
Updating the index range key costs 2 writes, because it is a delete followed by a write.
Now let’s go back to that use case where you want to retrieve a user’s recent games by date. You could just project everything into the index, but that would be more expensive than it needs to be.
But what do you really need when you do this query? Do you need to show the full game state in this view in your app? Or maybe just a few things.
If you need only a few things, like the primary keys (required), who played, and who won, then you could have some pretty decent cost savings by only projecting those things.
This makes queries cheaper by reducing the item sizes,
and makes writes cheaper because many of the writes while playing the game don’t cost anything extra.
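The index write costs discussed above can be sketched as a small function. It counts only the extra index writes for an update to an item that is already in the index, and it assumes every item is small enough that each write is a single unit. The function name and the attribute names are hypothetical.

```python
def extra_index_writes(changed, index_range_key, projected):
    """Extra local secondary index writes caused by one item update.

    changed:         set of attribute names modified by the update
    index_range_key: name of the index's range key attribute
    projected:       set of non-key attributes copied into the index
    """
    if index_range_key in changed:
        return 2  # delete the old index entry, then write the new one
    if changed & projected:
        return 1  # rewrite the projected copy in place
    return 0      # nothing stored in the index changed

projected = {"Opponent", "Winner"}

move_cost = extra_index_writes({"TopLeft"}, "Date", projected)
result_cost = extra_index_writes({"Winner"}, "Date", projected)
redate_cost = extra_index_writes({"Date"}, "Date", projected)
```

Under these assumptions an ordinary move touches only board attributes that are not projected, so it adds no index write, which is the saving a narrow projection buys.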
In DynamoDB, writes are atomic, and reads are isolated for single items, but not cross-items. We’ll show some ways to use those atomic features as building blocks for higher-level transactions. But keep this limitation in mind when choosing the datastore you use for different parts of your applications. Some parts of your app might be really well-suited to DynamoDB, and other parts less so.
Bear in mind that this does add expense and should be considered only if you really need it.
We’ve already talked about the social gaming aspect, but not how people become friends in the first place.
The requirement for a really basic social network is to be able to query for your friends.
Notice here that friendship is a two-way street: There’s a row for Alice being friends with Bob, and Bob being friends with Alice. This way Alice shows up when Bob queries for his friends, and Bob shows up when Alice queries for her friends.
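A minimal in-memory sketch of the two-item friendship model, with the user as hash key and the friend as range key. The function and attribute names are hypothetical. Note that the two writes here are separate steps, not one atomic operation.

```python
friends = {}  # (user, friend) -> item; user is the hash key, friend the range key

def add_friendship(table, a, b):
    """Store both directions so each user finds the other by their own key."""
    table[(a, b)] = {"User": a, "Friend": b}
    table[(b, a)] = {"User": b, "Friend": a}

def query_friends(table, user):
    """All friends of one user: an exact match on the hash key."""
    return sorted(friend for (u, friend) in table if u == user)

add_friendship(friends, "Alice", "Bob")
add_friendship(friends, "Bob", "Chuck")
```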
Remember that we have this symmetrical friend relationship – two items for friendship.
Here we have Bob, who discovers this incoming friend request
Now in order for Bob to confirm that, we need to update both the Bob-Chuck record, and the Chuck-Bob record.
In systems like this, machines can crash or lose power, network requests can time out, and so on. Let’s see what happens when there’s a failure like this while accepting a friend request.
Let’s look at that first “scan for stuck transactions” example. To make it easier to find stuck transactions, we’ll add some transition states. Here we have the bob/chuck half of the friendship in the first row, and the chuck/bob half in the second row.
When Chuck goes to send a friend request the chuck/bob item.
This doesn’t cover if they each send a friend request to each other simultaneously, but we can ignore that for now.
Let’s go through that same example where Bob wants to accept Chuck’s incoming friend request. This time we have a new Transactions table to support it all.
It is important to sort the two user ids when constructing the id.
Writing this transaction item will effectively guarantee that it will eventually be driven to success, without contention from any of the other friend requests or accepts, because any modification to bob/chuck or chuck/bob will need to go through this exact same row.
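Sorting the ids is what makes both directions of the friendship map to the same transaction row. A small sketch, where the `friend:` prefix, the `:` separator, and the function name are assumptions:

```python
def friend_transaction_id(user_a, user_b):
    """Same id whichever user starts the request, because the ids are sorted."""
    first, second = sorted([user_a, user_b])
    return f"friend:{first}:{second}"

from_bob = friend_transaction_id("bob", "chuck")
from_chuck = friend_transaction_id("chuck", "bob")
```

Without the sort, bob/chuck and chuck/bob would produce two different ids, and two concurrent operations on the same friendship would no longer be serialized through one row.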
Here’s what the sweeper is going to do. Something that can be running in the background, or it’s code we can invoke if we ever happen across a stuck transaction.
Building atomic multi-item operations into your application can be tricky. Here are a few general guiding principles we’ve talked about today, but this doesn’t cover every edge case.
You’ll often find that there is some place in your schema, some item somewhere, that you can use to serialize and lock. This is really the tip of the iceberg in terms of approaches for making atomic writes, so keep your eyes open for other examples of how applications are dealing with this in the real world.