This document summarizes a presentation about using Amazon DynamoDB Streams and AWS Lambda for real-time data processing. It discusses how DynamoDB Streams provides a log of item-level changes in DynamoDB tables and how Lambda functions can be triggered by these streams to run custom code. Examples are given of building applications that replicate data, perform auditing, and send notifications in real-time. The presentation includes a demo of setting up cross-region replication and data auditing using DynamoDB Streams and Lambda.
2. Amazon DynamoDB Streams – a time-ordered sequence of item-level changes
• Time and partition ordered log
• Provides a stream of inserts, updates, and deletes; each record contains:
  • Old item
  • New item
  • Primary key
  • Change type
• Stream items delivered exactly once
• Streams are asynchronous
• Scales with your table
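The contents above can be made concrete with a sketch of a single stream record. Field names follow the DynamoDB Streams record format; the table, key, and attribute values here are hypothetical:

```python
# A single DynamoDB Streams record for a MODIFY event, with the
# NEW_AND_OLD_IMAGES stream view type. Values are hypothetical.
record = {
    "eventName": "MODIFY",                               # change type: INSERT, MODIFY, or REMOVE
    "dynamodb": {
        "Keys":     {"UserId": {"S": "u-123"}},          # primary key
        "OldImage": {"UserId": {"S": "u-123"},
                     "Score":  {"N": "10"}},             # item before the change
        "NewImage": {"UserId": {"S": "u-123"},
                     "Score":  {"N": "42"}},             # item after the change
        "StreamViewType": "NEW_AND_OLD_IMAGES",
    },
}

def change_summary(rec):
    """Extract (change type, primary key, new image) from a stream record."""
    d = rec["dynamodb"]
    return rec["eventName"], d["Keys"], d.get("NewImage")

print(change_summary(record))
```

A REMOVE record would carry an old image but no new image, which is why consumers read `NewImage` defensively.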
3. Benefits of DynamoDB Streams for real-time data processing
Durability & high availability
• High throughput consensus protocol
• Replicated across multiple AZs
Managed streams
• Simply enable streaming
Performance
• Designed for sub-second latency
Native integration with AWS Lambda
• DynamoDB Triggers invoke a Lambda
function to run your custom code
4. AWS Lambda: A compute service that runs your code in response to events
Lambda functions: Stateless, trigger-based code execution
Triggered by events:
• Direct Sync and Async invocations
• Put to an Amazon S3 bucket
• Table update on Amazon DynamoDB
• And many more …
Makes it easy to
• Build back-end services that perform at scale
• Perform data-driven auditing, analysis, and notification
5. Benefits of AWS Lambda for building a serverless data processing engine
“Productivity focused compute platform to build powerful, dynamic, modular applications in the cloud”
1. No infrastructure to manage: Focus on business logic, not infrastructure. You upload code; AWS Lambda handles everything else.
2. Bring your own code: Run code in a choice of standard languages. Use threads, processes, files, and shell scripts normally.
3. High performance at any scale, cost-effective and efficient: Pay only for what you use. Lambda automatically matches capacity to your request rate; compute is purchased in 100 ms increments.
6. DynamoDB Streams + Lambda = Database Triggers
Run multiple real-time applications in parallel
• DynamoDB Streams natively supports Cross-Region Replication
• Triggers enable filtering, monitoring, auditing, notifications, aggregation, etc.
• No charge for reads/polls that your AWS Lambda function makes to the DynamoDB
Stream associated with the table
7. Walkthrough of a simple stream logging application workflow
New table updates → Amazon DynamoDB → DynamoDB Streams → AWS Lambda → Amazon CloudWatch Logs
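A minimal sketch of such a logging function in Python. In Lambda, anything written to stdout lands in CloudWatch Logs; the event shape follows the DynamoDB Streams record format, and the sample event at the bottom is fabricated for a local smoke test:

```python
def lambda_handler(event, context):
    """Log each item-level change from a DynamoDB Streams batch.

    In AWS Lambda, print() output is captured in CloudWatch Logs.
    """
    records = event.get("Records", [])
    for record in records:
        change = record["eventName"]        # INSERT, MODIFY, or REMOVE
        keys = record["dynamodb"]["Keys"]   # primary key of the changed item
        print(f"{change}: {keys}")
    return {"processed": len(records)}

# Local smoke test with a fabricated one-record event:
sample_event = {"Records": [
    {"eventName": "INSERT", "dynamodb": {"Keys": {"Id": {"S": "item-1"}}}}
]}
result = lambda_handler(sample_event, None)
print(result)
```

Because the function only inspects the event it receives, it needs no SDK calls at all; the execution role's CloudWatch Logs permissions cover the logging.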
10–18. Walkthrough of setting up DynamoDB Triggers and Lambda functions through the AWS Console
19. Today’s demo: Workflow of cross-region replication and real-time data auditing
Original table (Amazon DynamoDB) → data stream (DynamoDB Streams) → AWS Lambda → replica table (Amazon DynamoDB) + Amazon SNS notification
20. Loop through the event’s Records array
Replicate each item to a different table
Send a notification if a record is suspicious
In both cases, wait for the callbacks to complete before exiting
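The demo described above was written with Node.js callbacks; the same control flow can be sketched in Python with the replicate and notify actions injected as callables. In a real function these would wrap `put_item` on the replica table and `publish` on an SNS topic; the suspicion rule and the sample event here are hypothetical:

```python
def is_suspicious(new_image):
    """Hypothetical audit rule: flag items whose 'Amount' exceeds 10000."""
    amount = new_image.get("Amount", {}).get("N")
    return amount is not None and float(amount) > 10000

def process_batch(event, replicate, notify):
    """Loop through the event's Records, replicate each new image to the
    destination table, and send a notification for suspicious records."""
    flagged = 0
    for record in event.get("Records", []):
        new_image = record["dynamodb"].get("NewImage")
        if new_image is None:          # REMOVE events carry no new image
            continue
        replicate(new_image)           # e.g. replica_table.put_item(...)
        if is_suspicious(new_image):
            notify(new_image)          # e.g. sns.publish(...)
            flagged += 1
    return flagged

# Local smoke test with fabricated records and list-append stand-ins:
sample_event = {"Records": [
    {"eventName": "INSERT", "dynamodb": {"NewImage": {"Amount": {"N": "50"}}}},
    {"eventName": "MODIFY", "dynamodb": {"NewImage": {"Amount": {"N": "20000"}}}},
    {"eventName": "REMOVE", "dynamodb": {"Keys": {"Id": {"S": "x"}}}},
]}
replicated, notified = [], []
flagged = process_batch(sample_event, replicated.append, notified.append)
```

With boto3, `replicate` and `notify` would call DynamoDB and SNS synchronously, which plays the role of "waiting for callbacks" before the handler returns.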
21. Demo: Cross region replication and
real-time data auditing using Amazon
DynamoDB and AWS Lambda
22. Attaching Lambda functions to DynamoDB Streams
• Automatic shards: one Lambda function is invoked concurrently per DynamoDB shard
• Each individual shard follows ordered processing
• A given key is present in at most one concurrently active shard
• All changes (insert, remove, modify) are available on a rolling 24-hour window
DynamoDB Streams scales by grouping records into shards; Lambda scales automatically. Each shard in the source stream gets its own poller and Lambda function invocation, fanning out to the destination tables.
23. Attaching Lambda functions to DynamoDB Streams
• Reading the stream: Stream is exposed via the familiar Amazon Kinesis Client Library
interface
• Read the stream using https://github.com/awslabs/dynamodb-streams-kinesis-adapter
• Records can be retrieved at ~2x rate of the table’s provisioned write capacity
• Automatic scaling: Both DynamoDB and Lambda scale automatically with PUT rates
• Default limit of 100 concurrent Lambda function executions; can be raised through the AWS Support Center
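Besides the Kinesis adapter, the stream can also be read with the low-level DynamoDB Streams API (DescribeStream, GetShardIterator, GetRecords). A sketch that walks each shard once; the client is passed in, so in practice it would be `boto3.client("dynamodbstreams")`, while here a minimal stub stands in so the logic can run anywhere:

```python
def read_stream_once(streams_client, stream_arn):
    """Walk every shard of the stream once and collect available records,
    using the low-level DescribeStream / GetShardIterator / GetRecords calls."""
    records = []
    description = streams_client.describe_stream(StreamArn=stream_arn)
    for shard in description["StreamDescription"]["Shards"]:
        iterator = streams_client.get_shard_iterator(
            StreamArn=stream_arn,
            ShardId=shard["ShardId"],
            ShardIteratorType="TRIM_HORIZON",  # start from the oldest record
        )["ShardIterator"]
        while iterator:
            page = streams_client.get_records(ShardIterator=iterator)
            records.extend(page["Records"])
            iterator = page.get("NextShardIterator")
            if not page["Records"]:            # caught up with this shard
                break
    return records

class FakeStreams:
    """Minimal stub mimicking the three DynamoDB Streams calls used above."""
    def describe_stream(self, StreamArn):
        return {"StreamDescription": {"Shards": [{"ShardId": "shard-0001"}]}}
    def get_shard_iterator(self, StreamArn, ShardId, ShardIteratorType):
        return {"ShardIterator": "iter-1"}
    def get_records(self, ShardIterator):
        if ShardIterator == "iter-1":
            return {"Records": [{"eventName": "INSERT"}],
                    "NextShardIterator": "iter-2"}
        return {"Records": [], "NextShardIterator": None}

demo_records = read_stream_once(FakeStreams(), "arn:aws:dynamodb:::stream/demo")
```

For production consumers, the KCL adapter linked above handles checkpointing and shard lifecycle for you; the raw loop is useful mainly for one-off inspection.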
24. Performance tuning DynamoDB as an event source
• Batch size: the maximum number of records that AWS Lambda retrieves from the stream per invocation of your function
  • Increasing the batch size means fewer Lambda function invocations, each processing more data
• Starting position: the position in the stream where Lambda starts reading
  • Set to “Trim Horizon” to start with the oldest record
  • Set to “Latest” to start with the most recent data
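Both knobs are properties of the event source mapping that attaches the function to the stream. A sketch that builds the parameters (the stream ARN and function name are placeholders); with boto3 they would be passed to `lambda_client.create_event_source_mapping(**params)`:

```python
def event_source_mapping_params(stream_arn, function_name,
                                batch_size=100,
                                starting_position="TRIM_HORIZON"):
    """Build parameters for attaching a Lambda function to a DynamoDB stream.

    starting_position is "TRIM_HORIZON" (oldest record) or "LATEST"
    (most recent data); a larger batch_size means fewer invocations,
    each processing more records.
    """
    return {
        "EventSourceArn": stream_arn,
        "FunctionName": function_name,
        "BatchSize": batch_size,
        "StartingPosition": starting_position,
    }

# Hypothetical stream ARN and function name:
params = event_source_mapping_params(
    "arn:aws:dynamodb:us-east-1:123456789012:table/MyTable/stream/label",
    "myStreamProcessor",
    batch_size=250,
)
# boto3.client("lambda").create_event_source_mapping(**params)
```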
25. Best practices for creating Lambda functions
• Memory: CPU is allocated in proportion to the memory configured
  • Increasing memory makes your code execute faster (if CPU bound)
• Timeout: a higher timeout allows longer-running functions, but more waiting in case of errors
• Retries: for DynamoDB Streams, Lambda retries without limit (until the data expires)
• Permission model: Lambda pulls data from DynamoDB, so no resource policy is needed; only an execution role allowing Lambda access to DynamoDB
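Concretely, the execution role needs the stream-read actions plus CloudWatch Logs permissions for logging. A sketch of the policy document as a Python dict; the stream ARN is a hypothetical placeholder, and in practice you would scope it to your table's stream:

```python
import json

# Hypothetical stream ARN; scope this to your table's actual stream.
STREAM_ARN = "arn:aws:dynamodb:us-east-1:123456789012:table/MyTable/stream/*"

execution_role_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {   # Let Lambda poll and read the DynamoDB stream
            "Effect": "Allow",
            "Action": [
                "dynamodb:DescribeStream",
                "dynamodb:GetRecords",
                "dynamodb:GetShardIterator",
                "dynamodb:ListStreams",
            ],
            "Resource": STREAM_ARN,
        },
        {   # Let the function write its logs to CloudWatch Logs
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents",
            ],
            "Resource": "arn:aws:logs:*:*:*",
        },
    ],
}

print(json.dumps(execution_role_policy, indent=2))
```

No resource policy on the stream itself is required, because Lambda polls the stream on the function's behalf using this role.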
26. Monitoring and Debugging Lambda functions
• Console Dashboard
• Lists all Lambda functions
• Easy editing of resources,
event sources and other
settings
• At-a-glance metrics
• Metrics in CloudWatch
• Requests
• Errors
• Latency
• Throttles
• Logging in CloudWatch Logs
27. Three Next Steps
1. Enable DynamoDB Streams for your existing DynamoDB tables. DynamoDB
Streams provides a time-ordered sequence of item-level changes made to data in a
table in the last 24 hours.
2. Create and test your first Lambda function. With AWS Lambda, there are no new
languages, tools, or frameworks to learn. You can use any third party library, even
native ones.
3. Use AWS Lambda with DynamoDB Streams to create DynamoDB Triggers … no infrastructure to manage, and you get a clean, lightweight implementation of database triggers, NoSQL style!
28. Thank you!
Visit http://aws.amazon.com/dynamodb,
the AWS blog, and the DynamoDB
forum to learn more and get started
using DynamoDB.
Visit http://aws.amazon.com/lambda, the
AWS Compute blog, and the Lambda
forum to learn more and get started
using Lambda.