Diese Präsentation wurde erfolgreich gemeldet.
Wir verwenden Ihre LinkedIn Profilangaben und Informationen zu Ihren Aktivitäten, um Anzeigen zu personalisieren und Ihnen relevantere Inhalte anzuzeigen. Sie können Ihre Anzeigeneinstellungen jederzeit ändern.
© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Cecilia Deng
Software Developer
AWS Lambda
12/01...
What to Expect from the Session
• What kinds of real time events can trigger lambda?
• How does Lambda pull and process st...
Flavors of real time event sources
Asynchronous Invoke
Push Event Source
Synchronous Invoke
Push Event Source
Stream
Pull ...
Real-time push
Real-time push
Who?
• Any integrator that uses AWS Lambda invoke API
• E.g., Amazon S3, Amazon SNS, Amazon Alexa, AWS IoT
...
Real-time push
Synchronous Invoke
Push Event Source
Asynchronous Invoke
Push Event Source
Real-time pull
Real-time pull
Who?
• Amazon Kinesis and DynamoDB update streams
What?
• Lambda grabbing events from a stream for processi...
Real-time pull
Stream
Pull Event Source
Processing streams
Processing streams: Kinesis setup
Streams
▪ Made up of shards
▪ Each shard supports writes up to 1 MB/s
▪ Each shard suppo...
Processing streams: Lambda setup
Memory
▪ CPU is proportional to the memory
configured
▪ More memory means faster executio...
Processing streams: event source setup
Batch size
▪ Max number of records that Lambda will send in one invocation
▪ Not eq...
Processing streams: event source setup
Starting Position:
▪ The position in the stream where Lambda starts reading
▪ Set t...
Processing streams: event source setup
Amazon
Kinesis 1
AWS
Lambda 1
Amazon
CloudWatch
Amazon
DynamoDB
AWS
Lambda 2 Amazon...
Processing streams: under the hood
Event received by Lambda function is a collection of records from the stream
{ "Records...
Processing streams: under the hood
Polling
▪ Concurrent polling and processing per shard
▪ Polls every 250 ms if no record...
Processing streams: under the hood
Per Shard:
▪ Lambda calls GetRecords with max limit from Kinesis (10 k or 10 MB)
▪ If n...
Processing streams: how it works
▪ Lambda blocks on ordered processing for each individual shard
▪ Increasing # of shards ...
Processing streams: under the hood
▪ Retry execution failures until the record is expired
▪ Retry with exponential backoff...
Processing streams: under the hood
▪ Maximum theoretical throughput:
# shards * 2 MB / (s)
▪ Effective theoretical through...
Processing streams: how it looks
•GetRecords (effective throughput): bytes, latency, records, etc.
•PutRecord: bytes, late...
Processing streams: how it looks
Amazon CloudWatch Metrics
• Invocation count
• Duration
• Error count
• Throttle count
Am...
Processing streams: how it looks
Common observations:
▪ Effective batch size may be less than configured during low throug...
ANALYSING USAGE OF THOMSON
REUTERS PRODUCTS WITH AWS
Anders Fritz & Marco Pierleoni
CHALLENGE
To identify and define a solution for usage analytics tracking that enables product teams to
take ownership of t...
SOLUTION
SOLUTION
SOLUTION
SOLUTION
SOLUTION
SOLUTION
SOLUTION
• Product Insight is live – adoption rate high.
• Tested 4,000 requests per second while targeting 5bn requests / month.
•...
EVENTS CAPTURED
UK EU Referendum June 23rd (BrExit)
time
#events
EVENTS CAPTURED
US Elections November 8th
time
#events
Thank you!
Remember to complete
your evaluations!
Nächste SlideShare
Wird geladen in …5
×

AWS re:Invent 2016: Real-time Data Processing Using AWS Lambda (SVR301)

1.437 Aufrufe

Veröffentlicht am

Serverless architecture can eliminate the need to provision and manage servers required to process files or streaming data in real time.
In this session, we will cover the fundamentals of using AWS Lambda to process data in real-time from push sources such as AWS Iot and pull sources such as Amazon DynamoDB Streams or Amazon Kinesis. We will walk through sample use cases and demonstrate how to set up some of these real-time data processing solutions. We'll also discuss best practices and do a deep dive into AWS Lambda real-time stream processing.

You also hear from speakers from Thomson Reuters, who discuss how the company leverages AWS for its Product Insight service. The service provides insights to collect usage analytics for Thomson Reuters products. The speakers walk through its architecture and demonstrate how they leverage Amazon Kinesis Streams, Amazon Kinesis Firehose, AWS Lambda, Amazon S3, Amazon Route 53, and AWS KMS for near real-time access to data being collected around the globe. They also outline how applying AWS methodologies benefited its business, such as time-to-market and cross-region ingestion, auto-scaling capabilities, low-latency, security features, and extensibility.

Veröffentlicht in: Technologie

AWS re:Invent 2016: Real-time Data Processing Using AWS Lambda (SVR301)

  1. 1. © 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Cecilia Deng Software Developer AWS Lambda 12/01/2016 SVR301 Real-Time Processing Using AWS Lambda Anders Fritz Senior Manager ThomsonReuters Marco Pierleoni Lead Software Developer Thomson Reuters
  2. 2. What to Expect from the Session • What kinds of real time events can trigger lambda? • How does Lambda pull and process streams? • What are some stream processing behaviors? • Hear how Thomson Reuters went real time with AWS Lambda
  3. 3. Flavors of real time event sources Asynchronous Invoke Push Event Source Synchronous Invoke Push Event Source Stream Pull Event Source S3 async invoke Alexa skill sync invoke Pull then sync invoke DynamoDB Update Stream
  4. 4. Real-time push
  5. 5. Real-time push Who? • Any integrator that uses AWS Lambda invoke API • E.g., Amazon S3, Amazon SNS, Amazon Alexa, AWS IoT What? • Event sources sending events to Lambda for processing How? • Real-time triggered events owned by event source • Real-time processing owned by Lambda invoke methods
  6. 6. Real-time push Synchronous Invoke Push Event Source Asynchronous Invoke Push Event Source
  7. 7. Real-time pull
  8. 8. Real-time pull Who? • Amazon Kinesis and DynamoDB update streams What? • Lambda grabbing events from a stream for processing How? • Mapping maintained by Lambda • Real-time triggered events owned by DDB or Kinesis producer • Real-time processing owned by Lambda stream polling component and invoke methods
  9. 9. Real-time pull Stream Pull Event Source
  10. 10. Processing streams
  11. 11. Processing streams: Kinesis setup Streams ▪ Made up of shards ▪ Each shard supports writes up to 1 MB/s ▪ Each shard supports reads up to 2 MB/s ▪ Each shard supports 5 reads/s Data ▪ All data is stored and replayable for 24 hours by default ▪ Make sure partition key distribution is even to optimize parallel throughput ▪ Pick a key with more groups than shards
  12. 12. Processing streams: Lambda setup Memory ▪ CPU is proportional to the memory configured ▪ More memory means faster execution, if CPU bound ▪ More memory means larger sized record batches can be processed Timeout • Increasing timeout allows for longer functions, but more wait in case of errors Permission model • The execution role defined for Lambda must have permission to access the stream
  13. 13. Processing streams: event source setup Batch size ▪ Max number of records that Lambda will send in one invocation ▪ Not equivalent to how many records Lambda gets from Kinesis ▪ Effective batch size is MIN(records available, batch size, 6 MB) ▪ Increasing batch size allows fewer Lambda function invocations with more data processed per function
  14. 14. Processing streams: event source setup Starting Position: ▪ The position in the stream where Lambda starts reading ▪ Set to “Trim Horizon” for reading from start of stream (all data) ▪ Set to “Latest” for reading most recent data (LIFO) (latest data)
  15. 15. Processing streams: event source setup Amazon Kinesis 1 AWS Lambda 1 Amazon CloudWatch Amazon DynamoDB AWS Lambda 2 Amazon S3 • Multiple functions can be mapped to one stream • Multiple streams can be mapped to one Lambda function • Each mapping is a unique key pair Kinesis stream to Lambda function • Each mapping has unique shard iterators Amazon Kinesis 2
  16. 16. Processing streams: under the hood Event received by Lambda function is a collection of records from the stream { "Records": [ { "kinesis": { "partitionKey": "partitionKey-3", "kinesisSchemaVersion": "1.0", "data": "SGVsbG8sIHRoaXMgaXMgYSB0ZXN0IDEyMy4=", "sequenceNumber": "49545115243490985018280067714973144582180062593244200961" }, "eventSource": "aws:kinesis", "eventID": "shardId- 000000000000:49545115243490985018280067714973144582180062593244200961", "invokeIdentityArn": "arn:aws:iam::account-id:role/testLEBRole", "eventVersion": "1.0", "eventName": "aws:kinesis:record", "eventSourceARN": "arn:aws:kinesis:us-west-2:35667example:stream/examplestream", "awsRegion": "us-west-2" } ] }
  17. 17. Processing streams: under the hood Polling ▪ Concurrent polling and processing per shard ▪ Polls every 250 ms if no records found ▪ Grab as much as possible in one GetRecords call Batching ▪ Sub batch in memory for invocation payload Synchronous invocation ▪ Batches invoked as synchronous RequestResponse type ▪ Lambda honors Kinesis at least once semantics ▪ Each shard blocks on in order synchronous invocation
  18. 18. Processing streams: under the hood Per Shard: ▪ Lambda calls GetRecords with max limit from Kinesis (10 k or 10 MB) ▪ If no record, wait 250 ms ▪ From in memory, sub batches and formats records into Lambda payload ▪ Invoke Lambda with synchronous invoke … … Source Kinesis Lambda Polling Logic Shards Lambda will scale automaticallyScale Kinesis by adding shards Batch sync invokesPolls
  19. 19. Processing streams: how it works ▪ Lambda blocks on ordered processing for each individual shard ▪ Increasing # of shards with even distribution allows increased concurrency ▪ Batch size may impact duration if the Lambda function takes longer to process more records … … Source Kinesis Lambda Polling Logic Shards Lambda will scale automaticallyScale Kinesis by adding shards Batch sync invokesPolls
  20. 20. Processing streams: under the hood ▪ Retry execution failures until the record is expired ▪ Retry with exponential backoff up to 60 s ▪ Throttles and errors impacts duration and directly impacts throughput Kinesis … Source Scale Kinesis by adding shards Lambda Polling Logic Lambda will scale automatically Polls invoke fail invoke fail invoke success Batch sync invokes
  21. 21. Processing streams: under the hood ▪ Maximum theoretical throughput: # shards * 2 MB / (s) ▪ Effective theoretical throughput: ( # shards * batch size (MB) ) / ( function duration (s) * retries until expiry) ▪ If put / ingestion rate is greater than the theoretical throughput, consider increasing number of shards of optimizing function duration to increase throughput
  22. 22. Processing streams: how it looks •GetRecords (effective throughput): bytes, latency, records, etc. •PutRecord: bytes, latency, records, etc. •GetRecords.IteratorAgeMilliseconds: how old your last processed records were. If high, processing is falling behind. If close to 24 hours, records are close to being dropped.
  23. 23. Processing streams: how it looks Amazon CloudWatch Metrics • Invocation count • Duration • Error count • Throttle count Amazon CloudWatch Logs • All Metrics • Custom logs • RAM consumed
  24. 24. Processing streams: how it looks Common observations: ▪ Effective batch size may be less than configured during low throughput ▪ Effective batch size will increase during higher throughput ▪ Increased Lambda duration -> decreased # of invokes and GetRecord calls ▪ Too many consumers of your stream may compete with Kinesis read limits and induce ReadProvisionedThroughputExceeded errors and metrics
  25. 25. ANALYSING USAGE OF THOMSON REUTERS PRODUCTS WITH AWS Anders Fritz & Marco Pierleoni
  26. 26. CHALLENGE To identify and define a solution for usage analytics tracking that enables product teams to take ownership of the usage data collected. In addition to tracking and visualizing usage data it had to; 1. Cross reference Usage with Business data 4. Require Limited Maintenance. 3. Auto Scale as data flow fluctuates. 2. Follow TR Security & Compliance rules. 5. Launch in 5 months.
  27. 27. SOLUTION
  28. 28. SOLUTION
  29. 29. SOLUTION
  30. 30. SOLUTION
  31. 31. SOLUTION
  32. 32. SOLUTION
  33. 33. SOLUTION
  34. 34. • Product Insight is live – adoption rate high. • Tested 4,000 requests per second while targeting 5bn requests / month. • Since March – very little maintenance required • No Outages • No Downtime • Cloudwatch monitor everything. • Latency – Data visible on chart within 10 seconds • BrExit and US elections tested autoscaling. • US elections ~16m events – normally ~ 6-8m events / day. • UK EU referendum (BrExit) ~ 10m events – normally ~ 5m events / day OUTCOME
  35. 35. EVENTS CAPTURED UK EU Referendum June 23rd (BrExit) time #events
  36. 36. EVENTS CAPTURED US Elections November 8th time #events
  37. 37. Thank you!
  38. 38. Remember to complete your evaluations!

×