Advanced Design Patterns for Amazon DynamoDB - Workshop (DAT404-R1) - AWS re:Invent 2018

© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Advanced Design Patterns for Amazon DynamoDB – Workshop
DAT404
Padma Malligarjunan
Senior Product Manager, AWS
Sean Shriver
Senior Solutions Architect, AWS

Set up the lab environment
1. Prerequisites: Laptop, Amazon DynamoDB basics
2. Setup time: 15 minutes
3. Go to https://amazon.qwiklabs.com
4. Choose Sign In and log on using your credentials, or click Join Here to create a new
account.
   1. If you click Create New Account or the Forgot Password link, you will
   receive an email from noreply@qwiklabs.com.
5. Enter “Advanced Design Patterns Using Amazon DynamoDB” in the top Search bar.
Click on the lab to open it.
6. Click Start Lab. Enter the Lab Access Code provided to you.
7. On the right side of the page is the outline for the lab. Note that there are 8 exercises.
8. Review and complete the “Setup” and “Start Lab” sections.
9. Stop when you see “You have Completed the Setup!” and wait. Do not start
Exercise 1.

Agenda
• Prerequisites for the workshop
• Key DynamoDB concepts
• Tenets of NoSQL data modeling
• Workshop (2 hours, 7 exercises)
• Design patterns - m:n relationships, index overloading, sparse
indexes, adjacency lists, Amazon DynamoDB Streams

Amazon DynamoDB – Key concepts
Document or key-value
Scales to any workload
Fully managed NoSQL
Access control
Event-driven programming
Fast and consistent
Table
Items
Attributes
Partition Key
Sort Key
• Global secondary index
• Local secondary index

Amazon DynamoDB – Key concepts
Selecting a Partition Key
• Large number of distinct
values
• Requests are uniformly
distributed
Examples
• Bad: Status, Year
• Good: CustomerId, DeviceId
Selecting a Sort Key
• Model 1:n and m:n
relationships
• Used to order data on disk
• Efficient/selective queries
• Range queries
Examples
• OrderId and OrderItemId per
CustomerId

Tenets of DynamoDB data modeling
• Understand the use case
• Identify the access patterns
• Read/Write workloads
• Query dimensions and
aggregations
• Data modeling
• Using NoSQL design patterns
• Review -> Repeat -> Review

Understand the use case
• Nature of the data
• OLTP / OLAP / full-text search
• Relationships between the
entities
• What does concurrent access
look like?
• Time series data
• Archiving needs, etc.

Identify the access patterns
• Source data analysis (write
workload)
• Reading one item versus
multiple items (read workload)
• Query aggregations and KPIs

Data modeling
• 1:1, 1:n, m:n relationships
• 1 application = 1 table
• Avoid unnecessary fetches
• Simplify access patterns
• Identify primary key
• Partition key and Sort key
• Query dimensions using LSIs and
GSIs

Capacity planning for indexes
Local Secondary Index
• Index is local to the partition
• Data size per partition key value <10 GB
• Consumes RCUs/WCUs from the table’s provisioning
Global Secondary Index
• Index across partitions
• Eventually consistent
• RCUs/WCUs are provisioned separately for GSIs
If GSIs don’t have enough write capacity, all table writes will be
throttled!

Throughput
• Provision any amount of throughput to a table
Size
• Add any number of items to a table
• Tables can scale to any size
• Max item size is 400 KB
Scaling is achieved through partitioning
Horizontal scaling

Burst capacity is built-in
[Chart: capacity units (0 to 1600) over time, provisioned versus consumed]
• “Save up” unused capacity
• Consume saved-up capacity
• Burst: 300 seconds (1200 × 300 = 360K CU)
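The burst arithmetic can be sketched with a small simulation. This is a simplified model written for illustration, not DynamoDB's actual algorithm: it assumes unused capacity accumulates second by second in a bucket capped at provisioned × 300 and is spent when demand exceeds the provisioned rate. The function name `simulate_burst` is hypothetical.

```python
def simulate_burst(provisioned, demand, window_seconds=300):
    """Return per-second (served, throttled) pairs for a demand trace.

    Simplified model (an assumption): unused capacity is saved in a
    bucket capped at provisioned * window_seconds and drained when
    demand exceeds the provisioned rate.
    """
    bucket_cap = provisioned * window_seconds
    bucket = 0
    results = []
    for wanted in demand:
        available = provisioned + bucket
        served = min(wanted, available)
        throttled = wanted - served
        # Whatever was not used this second is saved, up to the cap.
        bucket = min(bucket_cap, available - served)
        results.append((served, throttled))
    return results


# The figure from the slide: 1200 CU provisioned over a 300-second window.
max_saved = 1200 * 300  # 360,000 capacity units

# With nothing saved up, a 1600 CU spike is partly throttled.
cold = simulate_burst(1200, [1600])
# After 300 idle seconds the same spike is served in full.
warm = simulate_burst(1200, [0] * 300 + [1600])
```

In this model `cold` ends with 400 units throttled, while the last second of `warm` is served entirely from saved-up capacity.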

Adaptive capacity

Capacity planning guidelines
• Calculate provisioned throughput for all access patterns to be
supported by the table & secondary indexes
• Design the table for uniform access across a large number of
logical partition keys
• The key pressure for each logical partition key should be
under 3,000 RCUs and 1,000 WCUs.
• Enable DynamoDB auto scaling for normal traffic
• Understand planning for marketing events or Prime Day type
activities
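A rough way to do the first calculation is sketched below. It relies on the standard unit definitions (one WCU covers one write per second for an item up to 1 KB; one RCU covers one strongly consistent read per second for an item up to 4 KB, or two eventually consistent reads). The function names are hypothetical, and transactional requests are not modeled.

```python
import math


def write_capacity_units(item_size_bytes, writes_per_second):
    # One WCU = one standard write per second for an item up to 1 KB.
    return math.ceil(item_size_bytes / 1024) * writes_per_second


def read_capacity_units(item_size_bytes, reads_per_second,
                        strongly_consistent=True):
    # One RCU = one strongly consistent read per second up to 4 KB,
    # or two eventually consistent reads of the same size.
    units = math.ceil(item_size_bytes / 4096) * reads_per_second
    return units if strongly_consistent else math.ceil(units / 2)


def within_partition_key_limits(rcus, wcus):
    # Thresholds taken from the guideline above.
    return rcus <= 3000 and wcus <= 1000


# Example: 1.5 KB items written 100 times per second,
# 6 KB items read 100 times per second.
wcus = write_capacity_units(1500, 100)
rcus = read_capacity_units(6000, 100)
rcus_eventual = read_capacity_units(6000, 100, strongly_consistent=False)
```

Summing these figures over every access pattern, for the table and for each secondary index, gives a starting point for provisioning.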

Workshop plan
1. Seven exercises, skipping exercise #2
2. ~15 minutes per exercise
3. Raise your hands for questions
4. Stay in sync with the class
5. Review the Python scripts

Exercise #1 - DynamoDB Capacity Units & Partitioning (20 minutes)
What you’ll learn:
• Tables with a small number of WCUs
• Write a smaller data set
• Write larger data sets at a higher rate
• Burst capacity
• Behavior when increasing provisioned capacity
• Under-provisioned WCUs on a GSI
• Viewing these metrics on Amazon CloudWatch
Workshop

DynamoDB Scan operations
• Access every item in a table or an index
• Read up to 1 MB of data in each operation
• Use LastEvaluatedKey to continue
• Reads up to the maximum throughput of a single partition
• Parallel scans versus sequential scans
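A minimal sketch of the LastEvaluatedKey loop follows. It assumes `table` behaves like a boto3 `Table` resource, whose `scan` call accepts `ExclusiveStartKey` and returns `Items` and, when more data remains, `LastEvaluatedKey`. The helper name `scan_all` is hypothetical.

```python
def scan_all(table, **scan_kwargs):
    """Yield every item from a Scan, following LastEvaluatedKey."""
    while True:
        response = table.scan(**scan_kwargs)
        yield from response.get("Items", [])
        last_key = response.get("LastEvaluatedKey")
        if last_key is None:
            # No key returned: the scan has reached the end.
            break
        # Resume the next page where the previous one stopped.
        scan_kwargs["ExclusiveStartKey"] = last_key
```

Usage might look like `for item in scan_all(table, Limit=100): ...`, where each page still reads at most 1 MB before any filter is applied.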

Parallel Scan
• Read all the items from a table faster
• Take advantage of the table’s provisioned capacity
• Set TotalSegments = number of application workers; each worker scans a
different segment
All Data items
Segment 0 Segment 1 Segment 2
Worker-0 Worker-1 Worker-2
Application - Main thread
Worker-3
Segment 3

Sequential versus parallel Scan
Scenario:
Scan server logs data for response codes other than 200 (OK)
• Sequential Scan
    # The filter is applied after the read, so every scanned item consumes RCUs
    fe = "responsecode <> :f"
    eav = {":f": 200}
    response = table.scan(
        FilterExpression=fe,
        ExpressionAttributeValues=eav,
        Limit=pageSize
    )
• Parallel Scan
    # Each worker thread passes its own Segment (0 to TotalSegments - 1)
    fe = "responsecode <> :f"
    eav = {":f": 200}
    response = table.scan(
        FilterExpression=fe,
        ExpressionAttributeValues=eav,
        Limit=pageSize,
        TotalSegments=totalsegments,
        Segment=threadsegment
    )
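The Parallel Scan call above covers a single request for a single segment. One way to drive it from worker threads is sketched below, under these assumptions: `make_table` is a hypothetical callable that returns an object with a boto3-style `scan` method, and each worker creates its own object because boto3 resources are not guaranteed to be thread-safe. This is not the workshop's own script.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_segment(make_table, segment, total_segments, **scan_kwargs):
    """Scan one segment to completion and return its items."""
    table = make_table()  # one table object per worker thread
    kwargs = dict(scan_kwargs, Segment=segment, TotalSegments=total_segments)
    items = []
    while True:
        response = table.scan(**kwargs)
        items.extend(response.get("Items", []))
        if "LastEvaluatedKey" not in response:
            return items
        kwargs["ExclusiveStartKey"] = response["LastEvaluatedKey"]


def parallel_scan(make_table, total_segments, **scan_kwargs):
    """Run one worker per segment and merge the results."""
    with ThreadPoolExecutor(max_workers=total_segments) as pool:
        futures = [
            pool.submit(scan_segment, make_table, seg, total_segments,
                        **scan_kwargs)
            for seg in range(total_segments)
        ]
        return [item for future in futures for item in future.result()]
```

The filter arguments from the slide can be passed straight through, for example `parallel_scan(make_table, 4, FilterExpression=fe, ExpressionAttributeValues=eav)`.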

Sequential versus parallel Scan
Scenario:
• Sequential Scan

Sequential versus Parallel Scan
Scenario:
• Parallel Scan

Exercise #2 – DynamoDB Sequential and Parallel table scan (10 minutes)
What you’ll learn
• Time a Sequential (simple) scan versus a Parallel scan.
• Populate a table with a large data set.
• Scan and compare run times.
• Time difference will be significant for larger data sets.
• Don’t forget to review the Python scripts
Workshop

Write sharding – Voting example
Partition 1
1000 WCUs
Partition K
1000 WCUs
Partition M
1000 WCUs
Partition N
1000 WCUs
Votes Table
Candidate A Candidate B
Voters
Provision 200,000 WCUs

Write sharding – Scaling writes
[Diagram: a Voter writes to the Votes Table, where each candidate’s votes are spread across sharded keys Candidate A_1 to Candidate A_8 and Candidate B_1 to Candidate B_8]

Write sharding – Scaling writes
[Diagram: a Voter writes to one of the sharded candidate keys (Candidate A_1, Candidate A_2, …, Candidate B_1, Candidate B_2, …) in the Votes Table]
UpdateItem: “CandidateA_” + rand(0, 10)
ADD 1 to Votes
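The sharded UpdateItem can be sketched as a helper that builds the request arguments. The key attribute name `CandidateId` and the shard range 1 to `shard_count` are assumptions made for illustration; the slide only specifies a random suffix and an ADD of 1 to Votes.

```python
import random


def sharded_vote_update(candidate, shard_count=10):
    """Build update_item arguments that add one vote to a random shard."""
    shard = random.randint(1, shard_count)  # inclusive on both ends
    return {
        # "CandidateId" is a hypothetical partition key attribute name.
        "Key": {"CandidateId": f"{candidate}_{shard}"},
        # ADD creates the counter if it does not exist yet.
        "UpdateExpression": "ADD #votes :one",
        "ExpressionAttributeNames": {"#votes": "Votes"},
        "ExpressionAttributeValues": {":one": 1},
    }


request = sharded_vote_update("CandidateA")
```

With a boto3 `Table` resource the call would be `table.update_item(**request)`. Because each vote lands on a random key, writes for one popular candidate spread across many partition key values instead of one.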

Write sharding – Merging results
[Diagram: Votes Table with sharded candidate keys, a Periodic Process, and the Voter]
Periodic Process
1. Sum
2. Store
Candidate A
Total: 2.5M
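The "Sum" step of the periodic process might look like the sketch below. It assumes each shard item carries its key in a hypothetical `CandidateId` attribute of the form `<candidate>_<shard>` and its counter in `Votes`; the sample counts are made up.

```python
from collections import defaultdict


def merge_shard_totals(shard_items):
    """Sum the per-shard vote counters into one total per candidate."""
    totals = defaultdict(int)
    for item in shard_items:
        # Strip the trailing "_<shard>" suffix to recover the candidate.
        candidate, _, _shard = item["CandidateId"].rpartition("_")
        totals[candidate] += item["Votes"]
    return dict(totals)


# Made-up shard counters for illustration.
shards = [
    {"CandidateId": "Candidate A_1", "Votes": 10},
    {"CandidateId": "Candidate A_2", "Votes": 5},
    {"CandidateId": "Candidate B_1", "Votes": 7},
]
totals = merge_shard_totals(shards)
```

The "Store" step would then write each total back as a single item so that readers do not have to touch every shard.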

Write sharding – Trade-in orders
ShipmentService

Exercise #3 – GSI Write Sharding (15 minutes)
• Use the same server logs data from the previous lab, and query the selected
list of 404 error records
• The GSI has already been created; review the GSI definition
• Query the GSI to retrieve the records selectively; also include the sort key
• Review the Python scripts
Workshop

Data modeling: GSI overloading
• Working with the 5 GSI default limit
• Overload attribute values based on an item’s context
Example: Query Employee Database by
(1) Employee Name
(2) Desk
(3) Hire Date
(4) Quarterly Sales
(5) Current Job role
(6) Employees in a City
(7) Warehouse Location
(8) Employee ID
… and many more

Partition Key = “Employee ID”
Sort Key = (Let’s look at final design)
DynamoDB Table design…
(1) Employee Name (2) Desk (3) Hire Date (4) Quarterly Sales

Partition Key = “Employee ID”
Sort Key = Use a generic attribute-name “A”
Value is based on the item type.
Create a GSI on “A”.
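The idea can be illustrated with a small in-memory model. Everything below is an assumption made for illustration: the sample employees, the attribute name `EmployeeID`, and the type prefixes (`NAME#`, `DESK#`, `HIRED#`) placed in the generic attribute "A". It only mimics how one index keyed on "A" can answer several different lookups.

```python
from collections import defaultdict

# Each employee is stored as several items that share the partition key.
# The generic attribute "A" holds a different kind of value per item type.
table = [
    {"EmployeeID": "E1", "A": "NAME#Jane Doe"},
    {"EmployeeID": "E1", "A": "DESK#B-114"},
    {"EmployeeID": "E1", "A": "HIRED#2015-03-02"},
    {"EmployeeID": "E2", "A": "NAME#John Roe"},
    {"EmployeeID": "E2", "A": "DESK#B-115"},
]


def build_gsi(items, key_attr="A"):
    """Group items by the overloaded attribute, like a GSI keyed on it."""
    index = defaultdict(list)
    for item in items:
        index[item[key_attr]].append(item)
    return index


def query_gsi(index, value):
    """Return the employee IDs whose overloaded attribute equals value."""
    return [item["EmployeeID"] for item in index.get(value, [])]


gsi = build_gsi(table)
by_name = query_gsi(gsi, "NAME#Jane Doe")
by_desk = query_gsi(gsi, "DESK#B-115")
```

A single index serves both the name lookup and the desk lookup, which is what keeps the design within the GSI limit.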

In the upcoming lab…

Exercise #4 – GSI Key Overloading (15 minutes)
• Load the Employee data, create an overloaded GSI, and query the results!
Workshop

Data modeling: Sparse indexes
• Saves WCUs
• Selective querying
Example: Investigators use a GSI to query “Assigned” Cases
AUDITS
Primary Key
Attributes
CaseID Item
CASE1
DETAILS1
Assignee Tags ASIN OrderID
INV1
Assignee Metadata
INV1#AUDIT1
Assignee Status_Date Type TicketID
ASSIGNED_2017-08-08 T1
INV1#AUDIT2
COMPLETED_2017-08-08 T1
INV1#AUDIT3
POST-AUDIT_2017-08-08 T1
INV1#AUDIT4
ARCHIVED_2017-08-08 T1
S31
S3Datapoints

AUDITS
Primary Key
Attributes
CaseID Item
CASE1
DETAILS1
Other attributes
INV1
Other attributes
INV1#AUDIT1
ASSIGNEE STATUS_DATE
INV1#AUDIT2
INV1#AUDIT3
INV1#AUDIT4

AUDITS
Primary Key
Attributes
CaseID Item
CASE1
DETAILS1
Other attributes
INV1
Other attributes
INV1#AUDIT1
ASSIGNEE (GSI-PK) STATUS_DATE (GSI-SK)
INV1#AUDIT2
INV1#AUDIT3
INV1#AUDIT4

AUDITS
Primary Key
Attributes
CaseID Item
CASE1
DETAILS1
Other attributes
INV1
Other attributes
INV1#AUDIT1
ASSIGNEE GSI_STATUS_DATE Status_Date
ASSIGNED_2017-08-08 T1 ASSIGNED_2017-08-08 T1
INV1#AUDIT2
Assignee Status_Date
INV1#AUDIT3
INV1#AUDIT4

Data modeling: Sparse indexes – another example
Game-scores-table
Id (Partition)  User   Game  Score  Date        Award
1               Bob    G1    1300   2012-12-23
2               Bob    G1    1450   2012-12-23
3               Jay    G1    1600   2012-12-24
4               Mary   G1    2000   2012-10-24  Champ
5               Ryan   G2    123    2012-03-10
6               Jones  G2    345    2012-03-20
Award-GSI
Award (Partition)  Id  User  Score
Champ              4   Mary  2000
Scan sparse GSIs
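The behavior in this example can be mimicked in memory: only items that carry the index key attribute appear in the index. The rows are the ones from the Game-scores-table above; the helper name `project_sparse_gsi` is hypothetical.

```python
game_scores = [
    {"Id": 1, "User": "Bob", "Game": "G1", "Score": 1300, "Date": "2012-12-23"},
    {"Id": 2, "User": "Bob", "Game": "G1", "Score": 1450, "Date": "2012-12-23"},
    {"Id": 3, "User": "Jay", "Game": "G1", "Score": 1600, "Date": "2012-12-24"},
    {"Id": 4, "User": "Mary", "Game": "G1", "Score": 2000,
     "Date": "2012-10-24", "Award": "Champ"},
    {"Id": 5, "User": "Ryan", "Game": "G2", "Score": 123, "Date": "2012-03-10"},
    {"Id": 6, "User": "Jones", "Game": "G2", "Score": 345, "Date": "2012-03-20"},
]


def project_sparse_gsi(items, key_attr, projected):
    """Keep only items that have key_attr, with the projected attributes."""
    return [
        {name: item[name] for name in (key_attr, *projected)}
        for item in items
        if key_attr in item
    ]


award_gsi = project_sparse_gsi(game_scores, "Award", ("Id", "User", "Score"))
```

Here `award_gsi` holds one entry out of six items, which is why scanning a sparse GSI is cheap compared with scanning the table.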

Exercise #5 – Sparse Indexes (15 minutes)
• Create a GSI on an existing table (this step takes a few minutes)
• Review the use case for using a sparse index
• Take advantage of sparse indexes!
Workshop

Data modeling: Composite sort keys
• Define hierarchical relationships
• Execute selective queries

Multivalue sorts and filters
Secondary index
Opponent Date GameId Status Host
Alice 2014-10-02 d9bl3 DONE David
Carol 2014-10-08 o2pnb IN_PROGRESS Bob
Bob 2014-09-30 72f49 PENDING Alice
Bob 2014-10-03 b932s PENDING Carol
Bob 2014-10-03 ef9ca IN_PROGRESS David
Bob
Partition key Sort key

Approach 1 : Query Filter
Secondary index
Opponent Date GameId Status Host
Alice 2014-10-02 d9bl3 DONE David
Carol 2014-10-08 o2pnb IN_PROGRESS Bob
Bob 2014-09-30 72f49 PENDING Alice
Bob 2014-10-03 b932s PENDING Carol
Bob 2014-10-03 ef9ca IN_PROGRESS David
SELECT * FROM Game
WHERE Opponent='Bob'
ORDER BY Date DESC
FILTER ON Status='PENDING'
Bob
(filtered out)

Approach 2 : Composite sort keys
Status + Date = StatusDate
Status       Date        StatusDate
DONE         2014-10-02  DONE_2014-10-02
IN_PROGRESS  2014-10-08  IN_PROGRESS_2014-10-08
IN_PROGRESS  2014-10-03  IN_PROGRESS_2014-10-03
PENDING      2014-09-30  PENDING_2014-09-30
PENDING      2014-10-03  PENDING_2014-10-03

Secondary Index
Opponent StatusDate GameId Host
Alice DONE_2014-10-02 d9bl3 David
Carol IN_PROGRESS_2014-10-08 o2pnb Bob
Bob IN_PROGRESS_2014-10-03 ef9ca David
Bob PENDING_2014-09-30 72f49 Alice
Bob PENDING_2014-10-03 b932s Carol
Partition key Sort key

Opponent StatusDate GameId Host
Alice DONE_2014-10-02 d9bl3 David
Carol IN_PROGRESS_2014-10-08 o2pnb Bob
Bob IN_PROGRESS_2014-10-03 ef9ca David
Bob PENDING_2014-09-30 72f49 Alice
Bob PENDING_2014-10-03 b932s Carol
Secondary index
Bob
SELECT * FROM Game
WHERE Opponent='Bob'
AND StatusDate BEGINS_WITH 'PENDING'
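The query above can be tried against the rows of the secondary index in memory, alongside one plausible set of request parameters. The index name `OpponentStatusDateIndex` is hypothetical, and the in-memory function only imitates what a key condition with `begins_with` selects.

```python
games = [
    {"Opponent": "Alice", "StatusDate": "DONE_2014-10-02", "GameId": "d9bl3", "Host": "David"},
    {"Opponent": "Carol", "StatusDate": "IN_PROGRESS_2014-10-08", "GameId": "o2pnb", "Host": "Bob"},
    {"Opponent": "Bob", "StatusDate": "IN_PROGRESS_2014-10-03", "GameId": "ef9ca", "Host": "David"},
    {"Opponent": "Bob", "StatusDate": "PENDING_2014-09-30", "GameId": "72f49", "Host": "Alice"},
    {"Opponent": "Bob", "StatusDate": "PENDING_2014-10-03", "GameId": "b932s", "Host": "Carol"},
]


def query_index(items, opponent, status_prefix):
    """Imitate a Query: exact partition key, sort key prefix, sorted output."""
    matches = [
        item for item in items
        if item["Opponent"] == opponent
        and item["StatusDate"].startswith(status_prefix)
    ]
    return sorted(matches, key=lambda item: item["StatusDate"])


# Arguments a boto3 Table.query call could take for the same request.
query_kwargs = {
    "IndexName": "OpponentStatusDateIndex",  # hypothetical index name
    "KeyConditionExpression": "Opponent = :o AND begins_with(StatusDate, :s)",
    "ExpressionAttributeValues": {":o": "Bob", ":s": "PENDING"},
}

pending_for_bob = query_index(games, "Bob", "PENDING")
```

Unlike Approach 1, nothing is read and then filtered out: only Bob's two PENDING games are selected, already ordered by date within the status.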

Example: Query Amazon buildings by Country, State, City, Office Location
Partition Key = “USA”
Sort Key = CONCAT ( “TX”, “AUSTIN”, “AUS-007” )

Example: Query Amazon buildings by State, City, Office Location

• Use composite sort key to define a hierarchy
• Highly selective queries with sort conditions
• Reduce query complexity
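A hierarchy like this can be sketched with two small helpers. The `#` delimiter is an assumption (the slide only shows a CONCAT of the levels), and every office code other than AUS-007 is made up for illustration.

```python
SEP = "#"  # assumed delimiter between hierarchy levels


def location_sort_key(state, city, office):
    """Concatenate the hierarchy levels into one sort key value."""
    return SEP.join((state, city, office))


def level_prefix(*levels):
    """Prefix for a begins_with condition covering everything under levels.

    The trailing delimiter keeps a prefix such as TX#AUS from also
    matching TX#AUSTIN.
    """
    return SEP.join(levels) + SEP


offices = [
    location_sort_key("TX", "AUSTIN", "AUS-007"),
    location_sort_key("TX", "DALLAS", "DAL-001"),   # made-up office
    location_sort_key("WA", "SEATTLE", "SEA-042"),  # made-up office
]

in_texas = [key for key in offices if key.startswith(level_prefix("TX"))]
in_austin = [key for key in offices if key.startswith(level_prefix("TX", "AUSTIN"))]
```

Dropping trailing levels from the prefix widens the query from one office, to a city, to a whole state, all with the same key condition.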

Exercise #6 – Composite keys (15 minutes)
• Query employee data using State, City, Department.
• Create a GSI with PartitionKey=State; SortKey=City#Dept.
• If the attribute doesn’t exist, you have to add it to the table.
• Recap: In the real world, understand all the access patterns before you design your
table
Workshop

Data modeling: Adjacency lists
• Easily model many-to-many relationships
• Without excessive data duplication
Example: An invoice contains many bills. A bill can be broken up into
multiple invoices.
Desired access patterns: (1) Get bills by invoice ID
(2) Get invoices by bill ID

Example: An invoice contains many bills. A bill can be broken up into multiple invoices.
Desired access patterns: (1) Get bills by invoice ID, (2) Get invoices by bill ID
Primary Key          Attributes
ID        Item
INVOICE1  INVOICE1   Other Invoice Attributes
INVOICE1  BILL1      Bill-Invoice Attributes
INVOICE1  BILL2      Bill-Invoice Attributes
BILL1     BILL1      Other Bill Attributes
BILL2     BILL2      Other Bill Attributes

Primary Key               Attributes
ID        Item (GSI-PK)
INVOICE1  INVOICE1        Other Invoice Attributes
INVOICE1  BILL1
INVOICE1  BILL2
BILL1     BILL1
BILL2     BILL2

Secondary index
GSI Primary Key  Projected Attributes
Item             ID        Item      Attributes
INVOICE1         INVOICE1  INVOICE1  Other Invoice Attributes
BILL1            INVOICE1  BILL1     Bill-Invoice Attributes
BILL1            BILL1     BILL1     Other Bill Attributes
BILL2            INVOICE1  BILL2     Bill-Invoice Attributes
BILL2            BILL2     BILL2     Other Bill Attributes
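Both access patterns can be traced with an in-memory copy of the five items. The functions imitate a Query on the table (partition key ID) and a Query on the GSI (partition key Item); their names are hypothetical, and the non-key attributes are omitted.

```python
table = [
    {"ID": "INVOICE1", "Item": "INVOICE1"},  # the invoice itself
    {"ID": "INVOICE1", "Item": "BILL1"},     # invoice-to-bill edge
    {"ID": "INVOICE1", "Item": "BILL2"},     # invoice-to-bill edge
    {"ID": "BILL1", "Item": "BILL1"},        # the bill itself
    {"ID": "BILL2", "Item": "BILL2"},        # the bill itself
]


def bills_for_invoice(items, invoice_id):
    # Table query: partition key ID = invoice_id, skipping the invoice's own item.
    return [i["Item"] for i in items
            if i["ID"] == invoice_id and i["Item"] != invoice_id]


def invoices_for_bill(items, bill_id):
    # GSI query: partition key Item = bill_id, skipping the bill's own item.
    return [i["ID"] for i in items
            if i["Item"] == bill_id and i["ID"] != bill_id]


bills = bills_for_invoice(table, "INVOICE1")
invoices = invoices_for_bill(table, "BILL1")
```

Each relationship is stored once as an edge item, and the GSI reverses the direction of the lookup without duplicating the bill or invoice attributes.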

Exercise #7 – Adjacency Lists (10 minutes)
• Model many-to-many relationships between Invoices, Bills & Customers using
adjacency lists and an overloaded GSI partition key.
• Query all Bills for an Invoice.
• Query all Invoices for a Bill.
Workshop

DynamoDB Streams and AWS Lambda

DynamoDB Streams as triggers
Lambda function
Notify change
Item/Table Level Metrics
Amazon CloudSearch
Amazon Kinesis Firehose

DynamoDB Streams
Near-real-time metrics and aggregations

Exercise #8 – DynamoDB Streams & Lambda (10 minutes)
• Build a replication workflow
• Enable Streams on the tlog table
• Create a replication target
• Create a Lambda function (with AWS Identity and Access Management
[IAM] policies) to read the streams and write to the target
Workshop
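A replication function along these lines might start from the sketch below. It is not the lab's function. It assumes the stream view type includes new images, and it takes a hypothetical `apply_operation` callback in place of real writes so the logic can be run without AWS access. The `Records`, `eventName`, `dynamodb`, `NewImage` and `Keys` fields follow the DynamoDB Streams event format.

```python
def stream_records_to_operations(event):
    """Translate a DynamoDB Streams event into replication operations."""
    operations = []
    for record in event.get("Records", []):
        change = record["dynamodb"]
        if record["eventName"] in ("INSERT", "MODIFY"):
            # Replay inserts and updates as a full put of the new image.
            operations.append(("put", change["NewImage"]))
        elif record["eventName"] == "REMOVE":
            # Replay deletes using the item's key attributes.
            operations.append(("delete", change["Keys"]))
    return operations


def lambda_handler(event, context, apply_operation=print):
    """Lambda entry point; apply_operation is a hypothetical write hook."""
    operations = stream_records_to_operations(event)
    for operation in operations:
        apply_operation(operation)
    return {"processed": len(operations)}
```

In a real deployment `apply_operation` would call `put_item` or `delete_item` on the low-level boto3 DynamoDB client for the target table, since stream images already use the typed attribute-value format that client expects, and the function's IAM role would need permission to read the stream and write to the target.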

Thank you!
Padma Malligarjunan - @theRealPadma
Sean Shriver - @sean_shriver
