Serverless Stream Processing Tips & Tricks (ANT358) - AWS re:Invent 2018
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
2. Serverless Stream Processing Tips and Tricks
ANT358
Rajeev Chakrabarti, Principal Enterprise Architect, AWS
Girish Sood, Senior Product Manager, AWS
3. Serverless means …
• No servers to provision or manage
• Scales with usage
• Never pay for idle
• Availability and fault tolerance built in
4. Serverless applications
• EVENT SOURCE: changes in data state, requests to endpoints, changes in resource state
• FUNCTION: Node.js, Python, Java, C#, Go
• SERVICES (ANYTHING)
5. Using AWS Lambda
Bring your own code
• Node.js, Java, Python, C#, Go
• Bring your own libraries (even native ones)
Simple resource model
• Select power rating from 128 MB to 3 GB
• CPU and network allocated proportionately
Flexible use
• Synchronous or asynchronous
• Integrated with other AWS services
Flexible authorization
• Securely grant access to resources and VPCs
• Fine-grained control for invoking your functions
6. Fine-grained pricing
• Buy compute time in 100 ms increments
• Low request charge
• No hourly, daily, or monthly minimums
• No per-device fees
• Never pay for idle
AWS Free Tier
• 1M requests and 400,000 GB-seconds of compute
• Every month, every customer
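As a rough illustration of the 100 ms billing increment and the 400,000 GB-second free compute allowance, the sketch below rounds a duration up to the next increment and estimates how many invocations fit in the allowance. The function names are hypothetical, and the calculation assumes 1 GB = 1024 MB and ignores the separate request allowance.

```python
import math

FREE_TIER_GB_SECONDS = 400_000  # monthly free compute allowance
BILLING_INCREMENT_MS = 100      # billing granularity


def billed_ms(duration_ms: float) -> int:
    """Round a duration up to the next 100 ms billing increment."""
    return math.ceil(duration_ms / BILLING_INCREMENT_MS) * BILLING_INCREMENT_MS


def free_invocations(duration_ms: float, memory_mb: int) -> int:
    """Invocations per month that fit in the free compute allowance."""
    # Work in integer MB-milliseconds to avoid floating-point rounding.
    free_mb_ms = FREE_TIER_GB_SECONDS * 1024 * 1000
    return free_mb_ms // (memory_mb * billed_ms(duration_ms))


print(billed_ms(101))              # 200
print(free_invocations(100, 128))  # 32000000
```

At the smallest memory setting the 1M free requests would run out long before the compute allowance does.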
7. Processing Amazon Kinesis Data Streams with Lambda
8. Processing a data stream with Lambda
[Diagram: Data producer → Kinesis Data Streams → Lambda service → Lambda function A and Lambda function B; Amazon SNS also shown]
• Data producer: continuously streams data
• Lambda service: continuously polls for new data, one poll per second
• Lambda service: automatically invokes your function(s) when data is found
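A minimal sketch of a function the Lambda service could invoke with a batch of stream records. It relies on the Kinesis event shape, in which each entry of `Records` carries a base64-encoded payload under `kinesis.data`; the assumption that payloads are JSON is ours, not the deck's.

```python
import base64
import json


def handler(event, context):
    """Decode every Kinesis record in the batch and report how many were seen."""
    payloads = []
    for record in event["Records"]:
        raw = base64.b64decode(record["kinesis"]["data"])
        payloads.append(json.loads(raw))  # assumes JSON payloads
    print(f"Processed {len(payloads)} records")
    return {"processed": len(payloads)}
```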
9. Benefits of serverless stream processing
• No servers to manage
• No stream consumption costs when there are no new records to process
• Automatically scales during re-shard operations
10. How it works
[Diagram: Data producer → Kinesis Data Streams (Shards 1-4) → Lambda service → Function A (instances 1-4), Function B (instances 1-4), Function C (instances 1-4)]
11. Important considerations
• Lambda polls each shard once per second
• A data stream shard supports five reads per second
To maximize throughput, limit the number of different Lambda functions reading from the same stream to five*
*or fewer, if the stream has other consumers such as Amazon Kinesis Data Firehose or Amazon Kinesis Data Analytics
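The limit follows from simple arithmetic: each function uses one of a shard's five reads per second. A small sketch, where `other_reads_per_second` is a hypothetical parameter for reads used by non-Lambda consumers:

```python
SHARD_READS_PER_SECOND = 5   # per-shard read limit
LAMBDA_POLLS_PER_SECOND = 1  # polls per function, per shard


def max_lambda_consumers(other_reads_per_second: int = 0) -> int:
    """Number of distinct Lambda functions a stream can feed without throttling reads."""
    remaining = SHARD_READS_PER_SECOND - other_reads_per_second
    return max(remaining // LAMBDA_POLLS_PER_SECOND, 0)


print(max_lambda_consumers())   # 5
print(max_lambda_consumers(2))  # 3
```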
12. Optimize batch size for maximum throughput
• Lambda polls each shard once per second
• Lambda's maximum execution time is 15 minutes
Adjust the batch size (maximum 1000) to ensure execution time is optimal
[Diagram: Data producer → Kinesis Data Streams → Lambda service → Function A (instance 1), invoked serially; batch size = 200, 300 records]
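To make the diagram's numbers concrete, the sketch below splits 300 pending records into batches of at most 200, which one function instance would then handle one after another. The helper is illustrative only; how the Lambda service actually groups records is an assumption here.

```python
def split_into_batches(records, batch_size):
    """Split records into consecutive batches of at most batch_size items."""
    return [records[i:i + batch_size] for i in range(0, len(records), batch_size)]


pending = list(range(300))
batches = split_into_batches(pending, 200)
print([len(batch) for batch in batches])  # [200, 100]
```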
13. Beware poison messages
• Lambda checkpoints upon the success of each batch
• Failed batches are retried indefinitely (until the bad record expires from the shard)
[Diagram: Data producer → Kinesis Data Streams → Lambda service → Function A (instance 1); batch size = 200, 300 records; retries continue until record expiration]
14. Capture and log exceptions
• Catch exceptions and log for offline analysis
• Return successfully from the Lambda function
[Diagram: Data producer → Kinesis Data Streams → Lambda service → Function A (instance 1), which catches exceptions and logs to Amazon CloudWatch Logs; batch size = 200, 300 records]
Ensure processing moves forward by catching exceptions and returning successfully!
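A hedged sketch of this pattern: each record is processed inside its own try/except, failures are logged (Lambda forwards log output to CloudWatch Logs), and the handler returns normally so the batch is checkpointed. `process` is a hypothetical stand-in for real business logic, and JSON payloads are assumed.

```python
import base64
import json
import logging

logger = logging.getLogger(__name__)


def process(payload):
    """Hypothetical business logic; raises on a record it cannot handle."""
    if "type" not in payload:
        raise ValueError("missing 'type' field")
    return payload["type"]


def handler(event, context):
    failed = 0
    for record in event["Records"]:
        try:
            payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
            process(payload)
        except Exception:
            # Log the bad record for offline analysis instead of failing the batch.
            failed += 1
            sequence = record.get("kinesis", {}).get("sequenceNumber")
            logger.exception("Skipping record %s", sequence)
    # Returning normally lets Lambda checkpoint past the poison record.
    return {"failed": failed}
```

The trade-off is that skipped records are only recoverable from the logs, so the log entry should carry enough detail to replay them.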
15. Monitor to ensure processing is moving forward
• Monitor Lambda's IteratorAge metric for your function(s)
  • Age, in milliseconds, of the last record in a batch: the time between when the record was written to the stream and when Lambda received the batch
• If the value is growing unbounded, consider the following:
  • Increase the stream's retention period to avoid data loss
  • Deploy a new, optimized Lambda function
  • Add more shards (will not cause IteratorAge to drop for existing records, but may help for future records)
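One way to watch this metric is a CloudWatch alarm. The sketch below only builds the alarm parameters; the alarm name, period, evaluation count, and threshold are illustrative assumptions to tune per workload.

```python
def iterator_age_alarm(function_name: str, threshold_ms: int) -> dict:
    """Build keyword arguments for a CloudWatch alarm on Lambda's IteratorAge."""
    return {
        "AlarmName": f"{function_name}-iterator-age",  # hypothetical naming scheme
        "Namespace": "AWS/Lambda",
        "MetricName": "IteratorAge",
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "Statistic": "Maximum",
        "Period": 60,            # seconds per datapoint (assumed)
        "EvaluationPeriods": 5,  # consecutive breaching datapoints (assumed)
        "Threshold": threshold_ms,
        "ComparisonOperator": "GreaterThanThreshold",
    }
```

With boto3 installed, the result could be passed as `boto3.client("cloudwatch").put_metric_alarm(**iterator_age_alarm("my-function", 60_000))`, where `my-function` is a placeholder.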
16. When does serverless not make sense?
• Stateful stream processing, such as windowed aggregations
  • Consider Kinesis Data Analytics, a custom KCL application, or open source libraries such as Flink
• Buffering large volumes of streaming data before writing elsewhere
  • Consider using Kinesis Data Firehose
17. New: AWS Lambda supports Kinesis Data Streams Enhanced Fan-Out and HTTP/2 for faster streaming
• Enhanced fan-out allows customers to scale the number of functions reading from a stream in parallel while maintaining performance.
• HTTP/2 data retrieval API improves data delivery speed between data producers and Lambda functions by more than 65%
18. Streaming ingest-transform-load (ITL) with Kinesis Data Firehose
19. Streaming ITL
Perform typical ETL operations on the fly
• Enrich
• Filter
• Convert
20. Enrich streaming data
[Diagram: Data producer → Kinesis Data Firehose]
Input record:
{
"ip_addr": "1.2.3.4",
..
}
Enriched record:
{
"ip_addr": "1.2.3.4",
"city": "Boston",
"state": "MA",
..
}
21. Filter streaming data
[Diagram: Data producer → Kinesis Data Firehose]
Input records:
{
"type": "info",
..
}
{
"type": "error",
..
}
Filtered output:
{
"type": "error",
..
}
22. Convert streaming data
[Diagram: Data producer → Kinesis Data Firehose]
Input record:
[Wed Oct 11 14:32:52 2017] [error] [client 127.0.0.1]
Converted record:
{
"date": "2017/10/11 14:32:52",
"status": "error",
"source": "127.0.0.1"
}
23. Filter, enrich, and convert data while it is streaming
[Diagram components: Data producer, Kinesis Data Firehose, Kinesis Data Analytics service, Geo-IP service]
Input records:
[Wed Oct 11 14:32:52 2017] [error] [client 127.0.0.1]
[Wed Oct 11 14:32:53 2017] [info] [client 127.0.0.1]
Converted and enriched record:
{
"date": "2017/10/11 14:32:52",
"status": "error",
"source": "127.0.0.1",
"city": "Boston",
"state": "MA"
}
Transformation result:
{
"recordId": "1",
"result": "Ok",
"data": {
"date": "2017/10/11 14:32:52",
"status": "error",
"source": "127.0.0.1",
"city": "Boston",
"state": "MA"
}
},
{
"recordId": "2",
"result": "Dropped"
}
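The filter, convert, and enrich steps could be combined in one transformation function along these lines. This is a sketch, not the presenters' code: `GEO_IP` is a hypothetical in-memory stand-in for a Geo-IP service, the log pattern is inferred from the sample lines, and the record contract (base64 `data`, `recordId`, and a `result` of `Ok`, `Dropped`, or `ProcessingFailed`) should be checked against the Firehose documentation.

```python
import base64
import json
import re
from datetime import datetime

LOG_PATTERN = re.compile(
    r"\[(?P<date>[^\]]+)\] \[(?P<status>\w+)\] \[client (?P<source>[^\]]+)\]"
)

# Hypothetical stand-in for a call to a Geo-IP service.
GEO_IP = {"127.0.0.1": {"city": "Boston", "state": "MA"}}


def transform(record):
    line = base64.b64decode(record["data"]).decode("utf-8")
    match = LOG_PATTERN.search(line)
    if match is None:
        # Unparseable input: hand the original data back marked as failed.
        return {"recordId": record["recordId"], "result": "ProcessingFailed", "data": record["data"]}
    if match.group("status") != "error":
        # Filter: keep only error entries.
        return {"recordId": record["recordId"], "result": "Dropped", "data": record["data"]}
    # Convert: log line -> JSON document with a normalized date.
    when = datetime.strptime(match.group("date"), "%a %b %d %H:%M:%S %Y")
    doc = {
        "date": when.strftime("%Y/%m/%d %H:%M:%S"),
        "status": match.group("status"),
        "source": match.group("source"),
    }
    # Enrich: add location fields when the lookup knows the address.
    doc.update(GEO_IP.get(doc["source"], {}))
    data = base64.b64encode((json.dumps(doc) + "\n").encode("utf-8")).decode("ascii")
    return {"recordId": record["recordId"], "result": "Ok", "data": data}


def handler(event, context):
    return {"records": [transform(record) for record in event["records"]]}
```

Unlike the slide's simplified result, the `data` field returned here is base64-encoded, which is the form Firehose exchanges with the function.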
24. Behind the scenes
[Diagram: Data producer → Kinesis Data Firehose → S3; failed batches or failed transformations → S3]
• Maximum payload size: 3 MB
• Configured buffer size: 100 MB
• Delivered payload size: 100 MB
• Asynchronously invokes transformation Lambda function until buffer is full (or timeout expired)
• On invocation failure, retries three times before skipping the batch
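For a sense of scale, the figures on this slide imply roughly how many transformation invocations go into one delivered object. The sketch assumes every invocation carries a full payload and that the buffer fills before the timeout, which real traffic will not always satisfy.

```python
import math


def invocations_to_fill_buffer(buffer_mb: float, max_payload_mb: float) -> int:
    """Minimum invocations needed to fill the buffer with full-size payloads."""
    return math.ceil(buffer_mb / max_payload_mb)


print(invocations_to_fill_buffer(100, 3))  # 34
```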
25. Monitor Kinesis Data Firehose data transformation
• Monitor Data Firehose's ExecuteProcessing.Success metric for your delivery stream
  • Ratio of successful Lambda invocations to all Lambda invocation attempts
  • Should be at or near 1.0 consistently
  • If not, investigate Lambda failure reasons
• Monitor Data Firehose's DeliveryTo*.Success metric
  • Should be at or near 1.0 consistently
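The success metrics are ratios, so a health check reduces to a division and a comparison. A minimal sketch; the 0.99 floor is an assumed reading of "at or near 1.0", not a value from the deck.

```python
def success_ratio(successes: int, attempts: int) -> float:
    """Ratio of successful invocations to all attempts (1.0 when idle)."""
    return 1.0 if attempts == 0 else successes / attempts


def needs_investigation(successes: int, attempts: int, minimum: float = 0.99) -> bool:
    """True when the ratio falls below the assumed acceptable floor."""
    return success_ratio(successes, attempts) < minimum


print(needs_investigation(1000, 1000))  # False
print(needs_investigation(900, 1000))   # True
```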
26. Streaming data analysis with Kinesis Data Analytics
27. Pre-process data before analysis
Typical pre-processing operations
• Convert
  • Kinesis Data Analytics requires structured input
  • Kinesis Data Analytics prefers CSV or JSON input
• Enrich
  • Add or modify fields using external data sources
28. Pre-process data before analysis
[Diagram: Data producer → Kinesis Data Streams | Kinesis Data Firehose → Kinesis Data Analytics (Pre-processor → SQL engine)]
• Maximum payload size: 2 MB
• Asynchronously invokes pre-processing Lambda function
• On invocation failure, retries indefinitely
Ensure processing moves forward by catching exceptions and returning successfully
29. Handling failed records
[Diagram: Data producer → Kinesis Data Streams | Kinesis Data Firehose → Kinesis Data Analytics (Pre-processor → SQL engine)]
Pre-processor response:
{
"recordId": "1",
"result": "Ok",
"data": {…}
},
{
"recordId": "2",
"result": "ProcessingFailed",
"data": {…}
}
• Records with result "Ok" go to the in-application source stream
• Records with result "ProcessingFailed" go to the in-application error stream
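A pre-processing function that produces these results might look like the sketch below. The input format (space-separated `key=value` pairs with a numeric `price`) is purely hypothetical; the point is that exceptions are caught per record and reported as `ProcessingFailed` rather than failing the whole invocation.

```python
import base64
import json


def preprocess(record):
    try:
        text = base64.b64decode(record["data"]).decode("utf-8")
        # Convert hypothetical "key=value key=value" text into a JSON document.
        fields = dict(pair.split("=", 1) for pair in text.split())
        fields["price"] = float(fields["price"])
        data = base64.b64encode(json.dumps(fields).encode("utf-8")).decode("ascii")
        return {"recordId": record["recordId"], "result": "Ok", "data": data}
    except Exception:
        # Return the original data so the failed record can be inspected later.
        return {"recordId": record["recordId"], "result": "ProcessingFailed", "data": record["data"]}


def handler(event, context):
    return {"records": [preprocess(record) for record in event["records"]]}
```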
30. Deliver results to Lambda
Enables post-processing before delivery
• Enrich; transform
Enables additional delivery destinations
• Write the results of the analysis to any data store
• Forward results to downstream services
31. Deliver results to Lambda
[Diagram: Data producer → Kinesis Data Streams | Kinesis Data Firehose → Kinesis Data Analytics]
• Tumbling windows: Lambda invoked at the end of the window
• Sliding windows or continuous queries: invoked about once per second
• Data is chunked into < 6 MB batches before delivering to Lambda
Failing invocations will be retried indefinitely and may result in backpressure!
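A sketch of a destination function for analysis results. `DATA_STORE` is a hypothetical in-memory stand-in for a real data store, rows are assumed to be JSON, and the response shape (a `recordId` plus a `result` per record) should be verified against the Kinesis Data Analytics documentation. Malformed rows are logged and acknowledged so that one bad row does not trigger endless retries.

```python
import base64
import json
import logging

logger = logging.getLogger(__name__)

DATA_STORE = []  # hypothetical stand-in for the real destination


def handler(event, context):
    results = []
    for record in event["records"]:
        try:
            row = json.loads(base64.b64decode(record["data"]))
            DATA_STORE.append(row)
        except Exception:
            # Log and move on; failing here would cause retries and backpressure.
            logger.exception("Skipping undeliverable record %s", record["recordId"])
        results.append({"recordId": record["recordId"], "result": "Ok"})
    return {"records": results}
```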
32. Monitor Kinesis Data Analytics metrics
• Monitor the MillisBehindLatest metric for your Kinesis Data Analytics application
  • Milliseconds between the time a record was processed by Kinesis Data Analytics and the time it arrived in the source Kinesis Data Streams/Kinesis Data Firehose
  • If it is growing unbounded:
    • Identify the bad actor: Lambda function (pre-processing or destination) or SQL application
    • Optimize Lambda and/or SQL; increase input parallelism
• Monitor Kinesis Data Analytics' InputProcessing.Duration metric
• Monitor Kinesis Data Analytics' LambdaDelivery.Duration metric
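"Growing unbounded" can be approximated by checking whether the most recent datapoints of a lag metric rise without interruption. The helper below is one simple heuristic under that assumption; the window size is arbitrary and the datapoints are presumed to be ordered oldest to newest.

```python
def is_growing(datapoints, window=5):
    """True when the last `window` datapoints are strictly increasing."""
    recent = datapoints[-window:]
    if len(recent) < window:
        return False  # not enough history to judge
    return all(later > earlier for earlier, later in zip(recent, recent[1:]))


print(is_growing([100, 90, 200, 400, 800, 1600, 3200]))  # True
print(is_growing([100, 200, 150, 300, 250]))             # False
```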
33. Thank you!
Rajeev Chakrabarti
Principal Enterprise Architect