© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Serverless Stream Processing Tips and Tricks
ANT358
Rajeev Chakrabarti, Principal Enterprise Architect, AWS
Girish Sood, Senior Product Manager, AWS
© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Serverless means …
• No servers to provision or manage
• Scales with usage
• Never pay for idle
• Availability and fault tolerance built in
Serverless applications
EVENT SOURCE → FUNCTION → SERVICES (ANYTHING)
• Event sources: changes in data state, requests to endpoints, changes in resource state
• Function runtimes: Node.js, Python, Java, C#, Go
Using AWS Lambda
Bring your own code
• Node.js, Java, Python, C#, Go
• Bring your own libraries (even native ones)
Simple resource model
• Select a memory size from 128 MB to 3 GB
• CPU and network allocated proportionately
Flexible use
• Synchronous or asynchronous invocation
• Integrated with other AWS services
Flexible authorization
• Securely grant access to resources and VPCs
• Fine-grained control over who can invoke your functions
Fine-grained pricing
• Buy compute time in 100 ms increments
• Low request charge
• No hourly, daily, or monthly minimums
• No per-device fees
• Never pay for idle
AWS Free Tier
• 1M requests and 400,000 GB-seconds of compute every month, for every customer
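The pricing model above comes down to simple arithmetic on GB-seconds. The sketch below assumes the 100 ms rounding stated on the slide (the 2018 billing granularity; check current Lambda pricing before relying on it) and the free-tier figures quoted there. It computes compute usage only, not dollar amounts, and the example memory size and duration are made up for illustration.

```python
import math

# Figures quoted on the slide; the 100 ms rounding reflects the 2018 pricing model
BILLING_INCREMENT_MS = 100
FREE_TIER_REQUESTS = 1_000_000
FREE_TIER_GB_SECONDS = 400_000


def billed_gb_seconds(memory_mb, duration_ms, invocations):
    """GB-seconds billed when each invocation is rounded up to the next 100 ms."""
    billed_ms = math.ceil(duration_ms / BILLING_INCREMENT_MS) * BILLING_INCREMENT_MS
    return (memory_mb / 1024) * (billed_ms / 1000) * invocations


def within_free_tier(memory_mb, duration_ms, invocations):
    """True if both the request count and the compute fit in the monthly free tier."""
    return (invocations <= FREE_TIER_REQUESTS
            and billed_gb_seconds(memory_mb, duration_ms, invocations) <= FREE_TIER_GB_SECONDS)


if __name__ == "__main__":
    # Hypothetical workload: 512 MB function running 130 ms (billed as 200 ms), 1M invocations
    print(billed_gb_seconds(512, 130, 1_000_000))
    print(within_free_tier(512, 130, 1_000_000))
```

Rounding each invocation up is why short functions that land just past a 100 ms boundary cost noticeably more under this model.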
Processing Amazon Kinesis Data Streams with Lambda
Processing a data stream with Lambda
Data producer (Amazon SNS) → continuously streams data → Kinesis Data Streams → Lambda service → Lambda function A, Lambda function B
• The Lambda service continuously polls for new data, one poll per second
• It automatically invokes your function(s) when data is found
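When the Lambda service finds data, it invokes your function with a batch of records. A minimal Python handler might look like the sketch below. The event shape (`Records[].kinesis.data`, base64-encoded) is that of Lambda's Kinesis event; the assumption that producers write JSON payloads is ours.

```python
import base64
import json


def handler(event, context):
    """Decode each Kinesis record in the batch Lambda was invoked with."""
    items = []
    for record in event["Records"]:
        # Kinesis record payloads arrive base64-encoded under record["kinesis"]["data"]
        payload = base64.b64decode(record["kinesis"]["data"])
        items.append(json.loads(payload))  # assumes producers write JSON
    print(f"Decoded {len(items)} records")  # stdout ends up in CloudWatch Logs
    return items
```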
Benefits of serverless stream processing
• No servers to manage
• No stream consumption costs when there are no new records to process
• Automatically scales during re-shard operations
How it works
Data producer → Kinesis Data Streams (Shards 1, 2, 3, 4) → Lambda service
• Function A: instances 1 to 4, one per shard
• Function B: instances 1 to 4, one per shard
• Function C: instances 1 to 4, one per shard
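The diagram's model, in which each function gets one instance per shard, can be sketched as a mapping. This assumes exactly that one-invocation-per-shard-per-function model; the function and shard names are the diagram's labels, used here as plain strings.

```python
def instance_map(functions, shards):
    """Map each function to one instance per shard, numbered as in the diagram."""
    return {
        fn: {shard: f"{fn} (instance {i})" for i, shard in enumerate(shards, start=1)}
        for fn in functions
    }


def max_concurrency(functions, shards):
    """Concurrent executions when every function processes every shard at once."""
    return len(functions) * len(shards)
```

With three functions on a four-shard stream, up to twelve instances can run at the same time, so re-sharding changes your concurrency as well as your throughput.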
Important considerations
• Lambda polls each shard once per second
• A data stream shard supports five reads (GetRecords calls) per second, shared by all of its standard consumers
To maximize throughput, limit the number of different Lambda functions reading from the same stream to five*
*or fewer, if the stream has other consumers such as Amazon Kinesis Data Firehose or Amazon Kinesis Data Analytics
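The read budget above can be checked with a few lines of arithmetic. This is a sketch: `other_reads_per_second` is a hypothetical input for how many GetRecords calls per second other consumers (such as Firehose) make on the shard, which in practice you would have to estimate.

```python
SHARD_READS_PER_SECOND = 5   # GetRecords calls per shard per second, shared by standard consumers
LAMBDA_POLLS_PER_SECOND = 1  # per shard, per function (per the slide)


def max_lambda_functions(other_reads_per_second=0):
    """How many Lambda functions can poll a shard without exceeding the read limit."""
    remaining = SHARD_READS_PER_SECOND - other_reads_per_second
    return max(0, remaining // LAMBDA_POLLS_PER_SECOND)
```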
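Optimize batch size for maximum throughput
• Lambda polls each shard once per second
• Lambda's maximum execution time is 15 minutes
Adjust the batch size (maximum 10,000) so that each invocation processes as many records as possible while finishing comfortably within the function timeout
Data producer → Kinesis Data Streams → Lambda service → Function A (instance 1)
With batch size = 200, 300 records on a shard are delivered as two batches (200, then 100), and the two invocations of Function A (instance 1) run serially.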
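The batching behavior above can be estimated before tuning. In this sketch, `per_record_ms`, `overhead_ms`, and `safety` are hypothetical inputs you would measure or choose yourself, and `hard_max` is the event source's maximum batch size.

```python
def plan_batches(record_count, batch_size):
    """Split a shard's pending records into batch sizes, in delivery order."""
    full, rest = divmod(record_count, batch_size)
    return [batch_size] * full + ([rest] if rest else [])


def serial_seconds(record_count, batch_size, per_record_ms, overhead_ms=0):
    """Batches from one shard run one after another, so their durations add up."""
    batches = plan_batches(record_count, batch_size)
    return sum(n * per_record_ms + overhead_ms for n in batches) / 1000


def max_batch_for_timeout(timeout_s, per_record_ms, hard_max, safety=0.5):
    """Largest batch that uses at most `safety` of the function timeout."""
    budget_ms = timeout_s * 1000 * safety
    return max(1, min(hard_max, int(budget_ms // per_record_ms)))
```

Because batches from one shard never overlap, per-invocation overhead is paid once per batch. That is the main argument for larger batches, as long as the slowest batch still fits the timeout.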
Beware poison messages
• Lambda checkpoints upon the success of each batch
• Failed batches are retried indefinitely (until the bad record expires from the shard)
Data producer → Kinesis Data Streams → Lambda service → Function A (instance 1)
With batch size = 200 and 300 records, a batch containing a bad record is delivered to Function A (instance 1) again and again. This continues until the record expires, and later records on that shard are not processed in the meantime.
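The poison-message behavior above can be simulated offline. This is a toy model, not Lambda itself: `max_attempts` stands in for "until the record expires from the shard", and `reject_poison` is a hypothetical handler that fails on any batch containing a `"poison"` record.

```python
def consume_shard(records, batch_size, process_batch, max_attempts):
    """Simulate checkpoint-on-success consumption of a single shard.

    Returns (checkpoint, invocations): the index of the first unprocessed
    record and the total number of handler invocations.
    """
    checkpoint = 0
    invocations = 0
    while checkpoint < len(records):
        batch = records[checkpoint:checkpoint + batch_size]
        for _ in range(max_attempts):
            invocations += 1
            try:
                process_batch(batch)
                break  # success: the checkpoint can advance
            except Exception:
                continue  # failure: the same batch is delivered again
        else:
            return checkpoint, invocations  # retries exhausted: the shard was stuck here
        checkpoint += len(batch)
    return checkpoint, invocations


def reject_poison(batch):
    """Hypothetical handler that fails on any batch containing a poison record."""
    if "poison" in batch:
        raise ValueError("poison record")
```

With 300 records and one bad record at position 250, the first batch of 200 succeeds and every later attempt repeats the same second batch, so the checkpoint never passes 200.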
Capture and log exceptions
• Catch exceptions and log them for offline analysis
Data producer → Kinesis Data Streams → Lambda service → Function A (instance 1) → Amazon CloudWatch Logs
With batch size = 200 and 300 records, Function A (instance 1) catches exceptions, logs them to CloudWatch Logs, and returns successfully from the Lambda function, so both batches are checkpointed (✔ ✔).
Ensure processing moves forward by catching exceptions and returning successfully!
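A minimal sketch of the catch-and-log pattern above, assuming JSON payloads. `process` is a hypothetical placeholder for your business logic. The handler catches failures per record so that one bad record does not fail the whole batch.

```python
import base64
import json
import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)


def process(item):
    """Hypothetical business logic; raises on records it cannot handle."""
    return item["value"] * 2


def handler(event, context):
    ok = failed = 0
    for record in event["Records"]:
        kinesis = record["kinesis"]
        try:
            process(json.loads(base64.b64decode(kinesis["data"])))
            ok += 1
        except Exception:
            failed += 1
            # Lambda forwards log output to CloudWatch Logs; include what is needed to find the record later
            logger.exception("Failed record: partitionKey=%s sequenceNumber=%s",
                             kinesis.get("partitionKey"), kinesis.get("sequenceNumber"))
    # Returning normally (instead of raising) lets Lambda checkpoint the batch and move on
    return {"ok": ok, "failed": failed}
```

Logging the partition key and sequence number gives you enough to locate and replay a failed record later, for as long as the stream retains it.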
Monitor to ensure processing is moving forward
• Monitor Lambda's IteratorAge metric for your function(s)
• The age, in milliseconds, of the last record in each batch: the time between when the record was written to the stream and when Lambda received the batch
• If the value grows without bound, consider the following:
• Increase the stream's retention period to avoid data loss
• Deploy a new, optimized Lambda function
• Add more shards (this will not lower IteratorAge for existing records, but may help for future records)
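Checking IteratorAge can be scripted. The sketch below only builds the keyword arguments for boto3's CloudWatch `get_metric_statistics` call and does not call it. It then applies a simple least-squares slope to decide whether the series is "growing". The slope test and the one-hour, 60-second window are our assumptions, not an official threshold.

```python
from datetime import datetime, timedelta, timezone


def iterator_age_query(function_name, hours=1, period_s=60):
    """Keyword arguments for boto3 CloudWatch get_metric_statistics (not called here)."""
    end = datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/Lambda",
        "MetricName": "IteratorAge",
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": period_s,
        "Statistics": ["Maximum"],
    }


def is_growing(ages_ms, min_slope_ms_per_point=0.0):
    """True if the least-squares slope of the series exceeds the given threshold."""
    n = len(ages_ms)
    if n < 2:
        return False
    mean_x = (n - 1) / 2
    mean_y = sum(ages_ms) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ages_ms))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den > min_slope_ms_per_point
```

With boto3 installed and credentials configured, you could pass `iterator_age_query("my-function")` (a hypothetical function name) to `boto3.client("cloudwatch").get_metric_statistics(**...)`. Then sort the returned `Datapoints` by `Timestamp` and feed their `Maximum` values to `is_growing`.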
When does serverless not make sense?
• Stateful stream processing, such as windowed aggregations
• Consider Kinesis Data Analytics, a custom Kinesis Client Library (KCL) application, or open-source frameworks such as Apache Flink
• Buffering large volumes of streaming data before writing them elsewhere
• Consider using Kinesis Data Firehose
New: AWS Lambda supports Amazon Kinesis Data Streams enhanced fan-out and HTTP/2 for faster streaming
• Enhanced fan-out allows customers to scale the number of functions reading from a stream in parallel while maintaining performance
• The HTTP/2 data retrieval API improves data delivery speed between data producers and Lambda functions by more than 65%
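The difference enhanced fan-out makes can be shown with per-shard arithmetic. Standard consumers share a shard's 2 MB/s of read throughput, while each enhanced fan-out consumer gets its own 2 MB/s per shard. This sketch deliberately ignores other limits, such as the five-reads-per-second cap for standard consumers and the per-stream cap on registered consumers.

```python
SHARD_READ_MB_PER_S = 2.0  # read throughput of one shard


def per_consumer_read_mb_per_s(consumers, enhanced_fan_out):
    """Read throughput each consumer gets from one shard."""
    if consumers < 1:
        raise ValueError("need at least one consumer")
    if enhanced_fan_out:
        return SHARD_READ_MB_PER_S  # each registered consumer gets its own 2 MB/s per shard
    return SHARD_READ_MB_PER_S / consumers  # standard consumers share the shard's 2 MB/s
```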
Streaming ingest-transform-load (ITL) with Kinesis Data Firehose
Streaming ITL
Perform typical ETL operations on the fly
• Enrich
• Filter
• Convert
Enrich streaming data
Data producer → Kinesis Data Firehose
Incoming record:
{
  "ip_addr": "1.2.3.4",
  ..
}
Enriched record:
{
  "ip_addr": "1.2.3.4",
  "city": "Boston",
  "state": "MA",
  ..
}
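The enrichment step above can be sketched as a pure function. `GEO_IP` is a hypothetical in-memory table standing in for a real Geo-IP service or database. Injecting the lookup keeps the function testable and lets you swap in a cached client.

```python
# Hypothetical lookup table standing in for a real Geo-IP service or database
GEO_IP = {"1.2.3.4": {"city": "Boston", "state": "MA"}}


def enrich(record, lookup=GEO_IP.get):
    """Return a copy of the record with geo fields added when the IP is known."""
    geo = lookup(record.get("ip_addr")) or {}
    return {**record, **geo}
```

Because Firehose waits on the transformation function, keeping lookups local or cached generally matters more than it would in an offline batch job.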
Filter streaming data
Data producer → Kinesis Data Firehose
Incoming records:
{ "type": "info", .. }
{ "type": "error", .. }
Delivered record:
{ "type": "error", .. }
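A Firehose transformation function that implements the filter above might look like this sketch. It follows the Firehose data-transformation contract: each output record echoes its `recordId`, sets `result` to `"Ok"` or `"Dropped"`, and carries base64 `data`. Keeping only `"error"` records mirrors the slide; the JSON payload format is an assumption.

```python
import base64
import json


def handler(event, context):
    """Firehose data-transformation sketch: keep only "error" records."""
    output = []
    for rec in event["records"]:
        item = json.loads(base64.b64decode(rec["data"]))
        # "Dropped" tells Firehose the record was filtered out on purpose, not lost
        result = "Ok" if item.get("type") == "error" else "Dropped"
        output.append({"recordId": rec["recordId"], "result": result, "data": rec["data"]})
    return {"records": output}
```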
Convert streaming data
Data producer → Kinesis Data Firehose
Raw log line:
[Wed Oct 11 14:32:52 2017] [error] [client 127.0.0.1]
Converted record:
{
  "date": "2017/10/11 14:32:52",
  "status": "error",
  "source": "127.0.0.1"
}
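The conversion above can be done with a regular expression and `datetime`. This sketch assumes the Apache-style error-log prefix shown on the slide and ignores any message text after it. The field names follow the slide's output.

```python
import re
from datetime import datetime

# Matches the "[timestamp] [level] [client ip]" prefix shown on the slide
LOG_LINE = re.compile(r"^\[(?P<ts>[^\]]+)\] \[(?P<level>\w+)\] \[client (?P<ip>[^\]]+)\]")


def convert(line):
    """Turn one raw log line into the structured record shown on the slide."""
    m = LOG_LINE.match(line)
    if m is None:
        raise ValueError(f"unrecognized log line: {line!r}")
    ts = datetime.strptime(m["ts"], "%a %b %d %H:%M:%S %Y")
    return {"date": ts.strftime("%Y/%m/%d %H:%M:%S"), "status": m["level"], "source": m["ip"]}
```

Raising on unrecognized lines lets the calling Firehose handler mark those records as `ProcessingFailed` instead of silently emitting malformed output.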
Filter, enrich, and convert data while it is streaming
Components: data producer, Kinesis Data Firehose, Kinesis Data Analytics Service, Geo-IP service
Incoming log lines:
[Wed Oct 11 14:32:52 2017] [error] [client 127.0.0.1]
[Wed Oct 11 14:32:53 2017] [info] [client 127.0.0.1]
Converted and enriched record:
{
  "date": "2017/10/11 14:32:52",
  "status": "error",
  "source": "127.0.0.1",
  "city": "Boston",
  "state": "MA"
}
Transformation results (the info line is dropped):
[
  {
    "recordId": "1",
    "result": "Ok",
    "data": {
      "date": "2017/10/11 14:32:52",
      "status": "error",
      "source": "127.0.0.1",
      "city": "Boston",
      "state": "MA"
    }
  },
  {
    "recordId": "2",
    "result": "Dropped"
  }
]
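Combining the three steps above in one Firehose transformation function might look like the following sketch. It converts each log line, drops non-error lines, enriches the rest, and marks unparseable input as `ProcessingFailed`. `GEO_IP` is a hypothetical stand-in for the Geo-IP service that simply reproduces the slide's example values. Unlike the slide, which shows decoded JSON for readability, the returned `data` is base64-encoded as Firehose expects.

```python
import base64
import json
import re
from datetime import datetime

LOG_LINE = re.compile(r"^\[(?P<ts>[^\]]+)\] \[(?P<level>\w+)\] \[client (?P<ip>[^\]]+)\]")
# Hypothetical stand-in for the Geo-IP service (values copied from the slide)
GEO_IP = {"127.0.0.1": {"city": "Boston", "state": "MA"}}


def transform(line):
    """Convert, filter, and enrich one log line; None means "drop it"."""
    m = LOG_LINE.match(line)
    if m is None:
        raise ValueError("unrecognized log line")
    if m["level"] != "error":
        return None  # filter: only errors are delivered
    ts = datetime.strptime(m["ts"], "%a %b %d %H:%M:%S %Y")
    item = {"date": ts.strftime("%Y/%m/%d %H:%M:%S"), "status": m["level"], "source": m["ip"]}
    item.update(GEO_IP.get(m["ip"], {}))  # enrich
    return item


def handler(event, context):
    out = []
    for rec in event["records"]:
        try:
            item = transform(base64.b64decode(rec["data"]).decode("utf-8"))
        except ValueError:  # also covers UnicodeDecodeError
            out.append({"recordId": rec["recordId"], "result": "ProcessingFailed", "data": rec["data"]})
            continue
        if item is None:
            out.append({"recordId": rec["recordId"], "result": "Dropped", "data": rec["data"]})
        else:
            # Newline-delimited JSON keeps records separable once Firehose concatenates them in S3
            encoded = base64.b64encode((json.dumps(item) + "\n").encode("utf-8")).decode("ascii")
            out.append({"recordId": rec["recordId"], "result": "Ok", "data": encoded})
    return {"records": out}
```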
Behind the scenes
Data producer → Kinesis Data Firehose → S3
• Maximum payload size per transformation invocation: 3 MB
• Configured buffer size: 100 MB; delivered payload size: 100 MB
• Kinesis Data Firehose asynchronously invokes the transformation Lambda function until the buffer is full (or the buffer interval expires)
• On invocation failure, it retries three times before skipping the batch
• Failed batches, or failed transformations, are written to S3
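The figures above imply how many transformation invocations go into one delivery. This sketch assumes the transformation does not change record sizes. In practice, enrichment grows records and filtering shrinks them, so treat the result as an estimate.

```python
import math


def invocations_per_delivery(buffer_mb, lambda_payload_mb=3.0):
    """Transformation invocations needed to fill one delivery buffer."""
    return math.ceil(buffer_mb / lambda_payload_mb)
```

With a 100 MB buffer and 3 MB payloads, that is 34 invocations per delivered object, so the transformation function's latency directly affects how quickly each buffer fills.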
Monitor Kinesis Data Firehose data transformation
• Monitor Kinesis Data Firehose's ExecuteProcessing.Success metric for your delivery stream
• Ratio of successful Lambda invocations to all Lambda invocation attempts
• Should be at or near 1.0 consistently
• If not, investigate the reasons for the Lambda failures
• Monitor Kinesis Data Firehose's DeliveryTo*.Success metrics (for example, DeliveryToS3.Success)
• Should be at or near 1.0 consistently
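One way to act on these metrics is a CloudWatch alarm. The sketch below builds keyword arguments for boto3's `put_metric_alarm` without calling it. The alarm name, 0.99 threshold, 5-minute period, and three evaluation periods are illustrative choices, not recommendations from the slide.

```python
def success_ratio_alarm(delivery_stream, metric_name="ExecuteProcessing.Success", threshold=0.99):
    """Keyword arguments for boto3 CloudWatch put_metric_alarm (not called here)."""
    return {
        "AlarmName": f"{delivery_stream}-{metric_name}-low",
        "Namespace": "AWS/Firehose",
        "MetricName": metric_name,
        "Dimensions": [{"Name": "DeliveryStreamName", "Value": delivery_stream}],
        "Statistic": "Average",
        "Period": 300,
        "EvaluationPeriods": 3,
        "Threshold": threshold,
        "ComparisonOperator": "LessThanThreshold",  # alarm when the ratio drops below ~1.0
    }
```

The same helper covers the delivery metrics, for example `success_ratio_alarm("my-stream", "DeliveryToS3.Success")`, where `my-stream` is a hypothetical delivery stream name.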
Streaming data analysis with Kinesis Data Analytics
Pre-process data before analysis
Typical pre-processing operations
• Convert
• Kinesis Data Analytics requires structured input
• Kinesis Data Analytics prefers CSV or JSON input
• Enrich
• Add or modify fields using external data sources
Pre-process data before analysis
Data producer → Kinesis Data Streams | Kinesis Data Firehose → Kinesis Data Analytics (pre-processor → SQL engine)
• Maximum payload size: 2 MB
• Kinesis Data Analytics asynchronously invokes the pre-processing Lambda function
• On invocation failure, it retries indefinitely
Ensure processing moves forward by catching exceptions and returning successfully
Handling failed records
Data producer → Kinesis Data Streams | Kinesis Data Firehose → Kinesis Data Analytics pre-processor
Records returned by the pre-processor:
[
  {
    "recordId": "1",
    "result": "Ok",
    "data": {…}
  },
  {
    "recordId": "2",
    "result": "ProcessingFailed",
    "data": {…}
  }
]
• Records marked "Ok" go to the in-application source stream, which feeds the SQL engine
• Records marked "ProcessingFailed" go to the in-application error stream
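A pre-processing function that follows the pattern above might look like this sketch. It never raises, because failed invocations are retried indefinitely. Instead, it marks bad input as `ProcessingFailed` so those records are routed to the in-application error stream. The `sensor_id`/`reading` schema is hypothetical and only illustrates turning loosely typed input into structured records.

```python
import base64
import json

REQUIRED = ("sensor_id", "reading")  # hypothetical schema for illustration


def preprocess(raw):
    """Validate and type one JSON record; raises on anything unusable."""
    item = json.loads(raw)
    missing = [k for k in REQUIRED if k not in item]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    item["reading"] = float(item["reading"])  # typed input is easier for the SQL schema
    return json.dumps(item).encode("utf-8")


def handler(event, context):
    out = []
    for rec in event["records"]:
        try:
            data = base64.b64encode(preprocess(base64.b64decode(rec["data"]))).decode("ascii")
            out.append({"recordId": rec["recordId"], "result": "Ok", "data": data})
        except Exception:
            # Never raise: a failed invocation would be retried indefinitely and stall the application
            out.append({"recordId": rec["recordId"], "result": "ProcessingFailed", "data": rec["data"]})
    return {"records": out}
```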
Deliver results to Lambda
Enables post-processing before delivery
• Enrich; transform
Enables additional delivery destinations
• Write the results of the analysis to any data store
• Forward results to downstream services
Deliver results to Lambda
Data producer → Kinesis Data Streams | Kinesis Data Firehose → Kinesis Data Analytics → Lambda
• Tumbling windows: Lambda is invoked at the end of each window
• Sliding windows or continuous queries: Lambda is invoked roughly once per second
• Data is chunked into batches smaller than 6 MB before delivery to Lambda
Failing invocations will be retried indefinitely and may result in backpressure!
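A destination function can limit the backpressure risk above by asking for redelivery only when a retry could plausibly succeed. In this sketch, each record is answered with `"Ok"` or `"DeliveryFailed"`. `STORE`, `write_to_store`, and `TransientError` are hypothetical stand-ins for a real data store client and its retryable errors. Permanent failures are logged and acknowledged, so they cannot be retried forever.

```python
import base64
import json
import logging

logger = logging.getLogger()
STORE = []  # hypothetical in-memory stand-in for the destination data store


class TransientError(Exception):
    """Hypothetical retryable failure (for example throttling or a timeout)."""


def write_to_store(item):
    """Hypothetical destination write."""
    if item.get("throttle"):
        raise TransientError("store throttled the write")
    STORE.append(item)


def handler(event, context):
    out = []
    for rec in event["records"]:
        try:
            write_to_store(json.loads(base64.b64decode(rec["data"])))
            result = "Ok"
        except TransientError:
            result = "DeliveryFailed"  # ask Kinesis Data Analytics to redeliver this record
        except Exception:
            # Permanent failure: log and acknowledge so it cannot cause endless retries and backpressure
            logger.exception("Dropping undeliverable record %s", rec["recordId"])
            result = "Ok"
        out.append({"recordId": rec["recordId"], "result": result})
    return {"records": out}
```

Whether a given error is transient depends on your data store. The split here is a design choice to adapt, not a rule from the service.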
Monitor Kinesis Data Analytics metrics
• Monitor the MillisBehindLatest metric for your Kinesis Data Analytics application
• Milliseconds between the time a record was processed by Kinesis Data Analytics and the time it arrived in the source Data Streams/Data Firehose
• If it is growing without bound:
• Identify the bad actor: Lambda function (pre-processing or destination) or SQL application
• Optimize the Lambda function and/or the SQL; increase input parallelism
• Monitor Kinesis Data Analytics' InputProcessing.Duration metric (time spent in pre-processing Lambda invocations)
• Monitor Kinesis Data Analytics' LambdaDelivery.Duration metric (time spent in destination Lambda invocations)
Thank you!
Rajeev Chakrabarti
Principal Enterprise Architect
Serverless Stream Processing Tips & Tricks (ANT358) - AWS re:Invent 2018

  • 1.
  • 2. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Serverless Stream Processing Tips and Tricks Rajeev Chakrabarti Principal Enterprise Architect, AWS A N T - 3 5 8 Girish Sood Senior Product Manager, AWS
  • 3. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. No servers to provision or manage Scales with usage Never pay for idle Availability and fault tolerance built in Serverless means …
  • 4. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. FUNCTION Node.js Python Java C# Go SERVICES (ANYTHING) Changes in data state Requests to endpoints Changes in resource state EVENT SOURCE Serverless applications
  • 5. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Using AWS Lambda Bring your own code • Node.js, Java, Python, C#, Go • Bring your own libraries (even native ones) Simple resource model • Select power rating from 128 MB to 3 GB • CPU and network allocated proportionately Flexible use • Synchronous or asynchronous • Integrated with other AWS services Flexible authorization • Securely grant access to resources and VPCs • Fine-grained control for invoking your functions
  • 6. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Fine-grained pricing • Buy compute time in 100ms increments • Low request charge • No hourly, daily, or monthly minimums • No per-device fees • Never pay for idleAWS Free Tier 1M requests and 400,000 GBs of compute Every month, every customer
  • 7. Processing Amazon Kinesis Data Streams with Lambda
  • 8. Processing a data stream with Lambda (diagram: data producer, Amazon SNS, Kinesis Data Streams, Lambda service, Lambda function A, Lambda function B) • Data producers continuously stream data into Kinesis Data Streams • The Lambda service continuously polls for new data, one poll per second, and automatically invokes your function(s) when data is found
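A minimal Python sketch of a function attached to a Kinesis event source, assuming producers write JSON. The base64-encoded `Records[].kinesis.data` field is how Lambda delivers Kinesis records; the handler body and the returned summary dict are illustrative only.

```python
import base64
import json


def handler(event, context):
    """Decode and process every Kinesis record in the batch Lambda delivers."""
    items = []
    for record in event["Records"]:
        # Kinesis payloads arrive base64-encoded under record["kinesis"]["data"]
        payload = base64.b64decode(record["kinesis"]["data"])
        items.append(json.loads(payload))  # assumes producers write JSON
    # Returning normally (instead of raising) marks the batch as successful,
    # so Lambda checkpoints it; the dict itself is only for illustration.
    return {"processed": len(items)}
```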
  • 9. Benefits of serverless stream processing • No servers to manage • No stream consumption costs when there are no new records to process • Automatically scales during re-shard operations
  • 10. How it works: a data producer writes to a Kinesis data stream with four shards (Shard 1 to Shard 4). For each of Functions A, B, and C, the Lambda service runs four instances (instance 1 to instance 4), one per shard.
  • 11. Important considerations • Lambda polls each shard once per second • A data stream shard supports five reads per second. To maximize throughput, limit the number of different Lambda functions reading from the same stream to five* (*or fewer, if the stream has other consumers such as Amazon Kinesis Data Firehose or Amazon Kinesis Data Analytics)
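The read budget above is simple arithmetic. This sketch assumes, as the footnote implies, that every other consumer also spends one read per second per shard; the constants and function names are illustrative. The second helper reproduces the "How it works" diagram: one instance per shard per function.

```python
SHARD_READS_PER_SECOND = 5   # per-shard read limit from the slide
LAMBDA_POLLS_PER_SECOND = 1  # Lambda polls each shard once per second


def max_lambda_functions(other_consumers: int = 0) -> int:
    """Distinct functions that can poll one stream without exceeding the read limit.

    Assumes each other consumer (e.g. Firehose, Data Analytics) uses one read
    per second per shard.
    """
    remaining = SHARD_READS_PER_SECOND - other_consumers
    return max(remaining // LAMBDA_POLLS_PER_SECOND, 0)


def concurrent_instances(shards: int, functions: int) -> int:
    """One function instance per shard, for every function reading the stream."""
    return shards * functions
```

For the diagram's 4 shards and 3 functions, `concurrent_instances(4, 3)` gives 12 instances.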
  • 12. Optimize batch size for maximum throughput • Lambda polls each shard once per second • Lambda's maximum execution time is 15 minutes. Adjust the batch size (maximum 1000) so that each invocation's execution time stays optimal and well within that limit. Example: with 300 records on a shard and a batch size of 200, the Lambda service invokes Function A (instance 1) twice, serially.
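A rough way to reason about batch size, assuming a hypothetical, measured per-record processing time and a 50% safety margin below the 15-minute limit (both are assumptions, not Lambda settings; `max_batch` uses the slide's maximum of 1000). The first helper reproduces the slide's example of 300 records with a batch size of 200.

```python
import math


def invocations_per_shard(records: int, batch_size: int) -> int:
    """Serial invocations needed to drain `records` from one shard."""
    return math.ceil(records / batch_size)


def largest_safe_batch(per_record_ms: float, timeout_s: int = 900,
                       safety: float = 0.5, max_batch: int = 1000) -> int:
    """Largest batch whose estimated run time uses at most `safety` of the timeout."""
    budget_ms = timeout_s * 1000 * safety
    return max(1, min(max_batch, int(budget_ms // per_record_ms)))
```

A slow, 2-second-per-record function with a 60-second timeout would be capped at a batch of 15.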
  • 13. Beware poison messages • Lambda checkpoints upon the success of each batch • Failed batches are retried indefinitely (until the bad record expires from the shard). Example: with 300 records and a batch size of 200, a batch containing a bad record is passed to Function A (instance 1) again and again, continuing until the record expires.
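The poison-message behaviour can be modelled with a toy loop. This is a simulation, not Lambda's implementation: `max_attempts` stands in for the retention period, and `process` is any hypothetical per-record function that raises on a bad record.

```python
def simulate_shard(records, batch_size, process, max_attempts):
    """Toy model of per-shard checkpointing.

    A batch is checkpointed only if `process` succeeds for every record in it;
    otherwise the same batch is retried. Returns (records checkpointed,
    invocations made).
    """
    checkpoint, invocations = 0, 0
    while checkpoint < len(records) and invocations < max_attempts:
        batch = records[checkpoint:checkpoint + batch_size]
        invocations += 1
        try:
            for record in batch:
                process(record)
        except Exception:
            continue  # no checkpoint: the next invocation gets the same batch
        checkpoint += len(batch)
    return checkpoint, invocations
```

With a bad record at position 250, the first batch of 200 is checkpointed and every remaining attempt is spent on the second batch.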
  • 14. Capture and log exceptions • Catch exceptions and log them to Amazon CloudWatch Logs for offline analysis • Return successfully from the Lambda function. Ensure processing moves forward by catching exceptions and returning successfully: with 300 records and a batch size of 200, both invocations of Function A (instance 1) succeed.
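A sketch of the catch-and-log pattern in Python, assuming JSON payloads and a hypothetical `process` function standing in for real business logic. In the Lambda Python runtime, output from the standard `logging` module ends up in CloudWatch Logs.

```python
import base64
import json
import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)


def process(item):
    """Hypothetical business logic; raises on records it cannot handle."""
    if "id" not in item:
        raise ValueError("missing id")
    return item["id"]


def handler(event, context):
    ok, failed = 0, 0
    for record in event["Records"]:
        raw = record["kinesis"]["data"]
        try:
            process(json.loads(base64.b64decode(raw)))
            ok += 1
        except Exception:
            # Log the sequence number and raw payload for offline analysis
            logger.exception("Failed record seq=%s data=%s",
                             record["kinesis"].get("sequenceNumber"), raw)
            failed += 1
    # Return normally so the batch is checkpointed and processing moves forward
    return {"ok": ok, "failed": failed}
```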
  • 15. Monitor to ensure processing is moving forward • Monitor Lambda's IteratorAge metric for your function(s) • IteratorAge is the age, in milliseconds, of the last record in each batch: the time between when Lambda received the batch and when that record was written to the stream • If the value is growing unbounded, consider the following: increase the stream's retention period to avoid data loss; deploy a new, optimized Lambda function; add more shards (this will not cause IteratorAge to drop for existing records, but may help for future records)
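Lambda computes IteratorAge for you; this sketch only approximates it inside a function from each record's `approximateArrivalTimestamp` (seconds since the epoch in the Kinesis event). The `growing_unbounded` check is a hypothetical alerting heuristic, not a CloudWatch feature.

```python
import time


def batch_iterator_age_ms(event, now=None):
    """Approximate age of the last record in the batch, in milliseconds."""
    now = time.time() if now is None else now
    last = event["Records"][-1]["kinesis"]["approximateArrivalTimestamp"]
    return int((now - last) * 1000)


def growing_unbounded(samples_ms, window=5):
    """Hypothetical heuristic: the last `window` samples are strictly increasing."""
    recent = samples_ms[-window:]
    return len(recent) == window and all(a < b for a, b in zip(recent, recent[1:]))
```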
  • 16. When does serverless not make sense? • Stateful stream processing, such as windowed aggregations: consider Kinesis Data Analytics, a custom KCL application, or open source frameworks such as Apache Flink • Buffering large volumes of streaming data before writing elsewhere: consider Kinesis Data Firehose
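To see why windowed aggregation is stateful, here is a plain-Python tumbling-window counter (an illustrative sketch, not a Kinesis Data Analytics or Flink API). The `open_windows` dictionary must survive from one batch to the next, which a stateless function invocation does not guarantee.

```python
from collections import defaultdict


class TumblingWindowCounter:
    """Counts events per key in fixed, non-overlapping windows of `window_s` seconds."""

    def __init__(self, window_s: int):
        self.window_s = window_s
        # State that must outlive any single batch of records
        self.open_windows = defaultdict(lambda: defaultdict(int))

    def add(self, ts: float, key: str) -> None:
        start = int(ts // self.window_s) * self.window_s
        self.open_windows[start][key] += 1

    def close_before(self, watermark: float):
        """Emit and drop every window that ends at or before `watermark`."""
        done = sorted(s for s in self.open_windows if s + self.window_s <= watermark)
        return [(s, dict(self.open_windows.pop(s))) for s in done]
```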
  • 17. New: AWS Lambda supports Kinesis Data Streams enhanced fan-out and HTTP/2 for faster streaming. Enhanced fan-out allows customers to scale the number of functions reading from a stream in parallel while maintaining performance. The HTTP/2 data retrieval API improves data delivery speed between data producers and Lambda functions by more than 65%.
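The throughput difference behind enhanced fan-out, as a worked sketch assuming the standard Kinesis limit of 2 MB/s of reads per shard: shared by all standard consumers, but dedicated to each consumer registered for enhanced fan-out. The function name is illustrative.

```python
SHARD_READ_MB_PER_S = 2.0  # per-shard read throughput


def per_consumer_read_mb_s(consumers: int, enhanced_fan_out: bool) -> float:
    """Read throughput each consumer can expect from one shard."""
    if consumers < 1:
        raise ValueError("need at least one consumer")
    if enhanced_fan_out:
        return SHARD_READ_MB_PER_S  # dedicated throughput per registered consumer
    return SHARD_READ_MB_PER_S / consumers  # shared among all standard consumers
```

With four functions on one stream, each gets about 0.5 MB/s per shard without enhanced fan-out and 2 MB/s with it.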
  • 18. Streaming ingest-transform-load (ITL) with Kinesis Data Firehose
  • 19. Streaming ITL: perform typical ETL operations on the fly • Enrich • Filter • Convert
  • 20. Enrich streaming data: the data producer sends { "ip_addr": "1.2.3.4", .. } to Kinesis Data Firehose, which delivers { "ip_addr": "1.2.3.4", "city": "Boston", "state": "MA", .. }
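A minimal sketch of the enrich step, with a hypothetical in-memory table (`GEO_IP`) standing in for a real geo-IP lookup service.

```python
import json

# Hypothetical lookup table standing in for a geo-IP service
GEO_IP = {"1.2.3.4": {"city": "Boston", "state": "MA"}}


def enrich(record_json: str) -> str:
    """Add city/state for a known ip_addr; leave unknown addresses unchanged."""
    record = json.loads(record_json)
    record.update(GEO_IP.get(record.get("ip_addr"), {}))
    return json.dumps(record)
```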
  • 21. Filter streaming data: the data producer sends { "type": "info", .. } and { "type": "error", .. } to Kinesis Data Firehose, which delivers only { "type": "error", .. }
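A Firehose data-transformation handler that implements this filter, as a sketch. Firehose passes base64-encoded `records`, each with a `recordId`, and expects every record back with the same `recordId`, a `result` of `Ok`, `Dropped`, or `ProcessingFailed`, and base64 `data`. The `type` field comes from the slide's example.

```python
import base64
import json


def handler(event, context):
    """Keep only records whose "type" is "error"; drop everything else."""
    output = []
    for record in event["records"]:
        item = json.loads(base64.b64decode(record["data"]))
        # Dropped records count as processed but are not delivered
        result = "Ok" if item.get("type") == "error" else "Dropped"
        # Echo the original base64 data back unchanged
        output.append({"recordId": record["recordId"], "result": result,
                       "data": record["data"]})
    return {"records": output}
```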
  • 22. Convert streaming data: the data producer sends [Wed Oct 11 14:32:52 2017] [error] [client 127.0.0.1] to Kinesis Data Firehose, which delivers { "date": "2017/10/11 14:32:52", "status": "error", "source": "127.0.0.1" }
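The conversion can be done with a regular expression. This sketch assumes every line follows the exact bracketed format shown on the slide; the pattern and the `convert` name are illustrative.

```python
import json
import re
from datetime import datetime

# Pattern for the bracketed error-log format shown on the slide (assumed)
LOG_LINE = re.compile(
    r"\[(?P<ts>[^\]]+)\] \[(?P<status>\w+)\] \[client (?P<source>[^\]]+)\]")


def convert(line: str) -> str:
    """Turn one log line into the JSON document shown on the slide."""
    m = LOG_LINE.match(line)
    if m is None:
        raise ValueError(f"unrecognised log line: {line!r}")
    ts = datetime.strptime(m["ts"], "%a %b %d %H:%M:%S %Y")
    return json.dumps({"date": ts.strftime("%Y/%m/%d %H:%M:%S"),
                       "status": m["status"], "source": m["source"]})
```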
  • 23. Filter, enrich, and convert data while it is streaming (diagram: data producer, Kinesis Data Firehose, Kinesis Data Analytics Service, Geo-IP service). Input records: [Wed Oct 11 14:32:52 2017] [error] [client 127.0.0.1] and [Wed Oct 11 14:32:53 2017] [info] [client 127.0.0.1]. Transformation result: { "recordId": "1", "result": "Ok", "data": { "date": "2017/10/11 14:32:52", "status": "error", "source": "127.0.0.1", "city": "Boston", "state": "MA" } }, { "recordId": "2", "result": "Dropped" }. Delivered record: { "date": "2017/10/11 14:32:52", "status": "error", "source": "127.0.0.1", "city": "Boston", "state": "MA" }
  • 24. Behind the scenes: data producer → Kinesis Data Firehose → Amazon S3. Maximum payload size per Lambda invocation: 3 MB; configured buffer size: 100 MB; delivered payload size: 100 MB • Firehose invokes the transformation Lambda function repeatedly until the buffer is full (or the buffer interval expires) • On invocation failure, it retries three times before skipping the batch • Failed batches and failed transformations are written to S3
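The buffering and retry numbers on this slide, as a sketch. `invoke_with_retries` is a hypothetical model of the described behaviour (one attempt plus three retries, then skip), not Firehose code; with 3 MB payloads, filling a 100 MB buffer takes about 34 invocations.

```python
import math


def invocations_to_fill(buffer_mb: float, payload_mb: float = 3.0) -> int:
    """Lambda invocations needed to accumulate one delivery buffer."""
    return math.ceil(buffer_mb / payload_mb)


def invoke_with_retries(fn, batch, retries: int = 3):
    """Call fn(batch); on failure retry up to `retries` more times.

    Returns (result, attempts). A result of None means the batch is skipped,
    i.e. it would be written to the S3 error location.
    """
    for attempt in range(1, retries + 2):
        try:
            return fn(batch), attempt
        except Exception:
            continue
    return None, retries + 1
```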
  • 25. Monitor Kinesis Data Firehose data transformation • Monitor Kinesis Data Firehose's ExecuteProcessing.Success metric for your delivery stream: the ratio of successful Lambda invocations to all Lambda invocation attempts; it should be at or near 1.0 consistently, and if not, investigate the reasons for the Lambda failures • Monitor Kinesis Data Firehose's DeliveryTo*.Success metric (for example, DeliveryToS3.Success): it should also be at or near 1.0 consistently
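One way to watch these metrics is a CloudWatch alarm. This sketch builds keyword arguments for boto3's `put_metric_alarm`; the `AWS/Firehose` namespace and `DeliveryStreamName` dimension are the standard ones for delivery streams, while the 0.99 threshold, three 5-minute periods, and alarm-name scheme are assumptions to tune.

```python
def success_alarm_params(stream_name: str,
                         metric: str = "ExecuteProcessing.Success",
                         threshold: float = 0.99) -> dict:
    """Arguments for an alarm that fires when the average success ratio
    stays below `threshold` for three consecutive 5-minute periods."""
    return {
        "AlarmName": f"{stream_name}-{metric}-low",
        "Namespace": "AWS/Firehose",
        "MetricName": metric,
        "Dimensions": [{"Name": "DeliveryStreamName", "Value": stream_name}],
        "Statistic": "Average",
        "Period": 300,
        "EvaluationPeriods": 3,
        "Threshold": threshold,
        "ComparisonOperator": "LessThanThreshold",
    }
```

Usage, with `my-stream` as a placeholder: `boto3.client("cloudwatch").put_metric_alarm(**success_alarm_params("my-stream"))`, and again with `metric="DeliveryToS3.Success"` for the delivery side.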
  • 26. Streaming data analysis with Kinesis Data Analytics
  • 27. Pre-process data before analysis. Typical pre-processing operations: • Convert: Kinesis Data Analytics requires structured input and prefers CSV or JSON • Enrich: add or modify fields using external data sources
  • 28. Pre-process data before analysis: data producer → Kinesis Data Streams | Kinesis Data Firehose → Kinesis Data Analytics (pre-processor → SQL engine). Maximum payload size: 2 MB • Asynchronously invokes the pre-processing Lambda function • On invocation failure, retries indefinitely. Ensure processing moves forward by catching exceptions and returning successfully.
  • 29. Handling failed records: data producer → Kinesis Data Streams | Kinesis Data Firehose → Kinesis Data Analytics pre-processor, which returns { "recordId": "1", "result": "Ok", "data": {…} }, { "recordId": "2", "result": "ProcessingFailed", "data": {…} }. Records marked Ok continue to the in-application source stream and the SQL engine; records marked ProcessingFailed go to the in-application error stream.
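A pre-processing handler sketch that marks unparseable records as `ProcessingFailed` instead of raising, so the rest of the batch keeps moving. The `key=value` input format and the `to_structured` helper are hypothetical; the `recordId`/`result`/`data` response shape is the one shown on the slide, with `data` base64-encoded.

```python
import base64
import json


def to_structured(raw: bytes) -> dict:
    """Hypothetical conversion: comma-separated key=value pairs to a dict."""
    pairs = (field.split("=", 1) for field in raw.decode("utf-8").split(","))
    return {k.strip(): v.strip() for k, v in pairs}  # ValueError if no "="


def handler(event, context):
    output = []
    for record in event["records"]:
        try:
            parsed = to_structured(base64.b64decode(record["data"]))
            data = base64.b64encode(json.dumps(parsed).encode()).decode()
            output.append({"recordId": record["recordId"], "result": "Ok", "data": data})
        except Exception:
            # Report the failure per record instead of failing the invocation
            output.append({"recordId": record["recordId"],
                           "result": "ProcessingFailed", "data": record["data"]})
    return {"records": output}
```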
  • 30. Deliver results to Lambda • Enables post-processing before delivery: enrich, transform • Enables additional delivery destinations: write the results of the analysis to any data store, or forward results to downstream services
  • 31. Deliver results to Lambda: data producer → Kinesis Data Streams | Kinesis Data Firehose → Kinesis Data Analytics → Lambda • Tumbling windows: Lambda is invoked at the end of each window • Sliding windows or continuous queries: Lambda is invoked about once per second • Data is chunked into batches of less than 6 MB before delivery to Lambda. Failing invocations will be retried indefinitely and may result in backpressure!
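A destination-function sketch, assuming the per-record response shape Kinesis Data Analytics uses for Lambda outputs: each `recordId` with a `result` of `Ok` or `DeliveryFailed`. `write_to_store` is a hypothetical sink. Records reported as `DeliveryFailed` are retried, so a sink that fails persistently can produce the backpressure warned about above.

```python
import base64
import json


def write_to_store(item: dict) -> None:
    """Hypothetical sink; replace with a real data store or downstream call."""
    if "value" not in item:
        raise KeyError("value")


def handler(event, context):
    results = []
    for record in event["records"]:
        try:
            write_to_store(json.loads(base64.b64decode(record["data"])))
            results.append({"recordId": record["recordId"], "result": "Ok"})
        except Exception:
            # Report the failure for this record rather than failing the batch
            results.append({"recordId": record["recordId"], "result": "DeliveryFailed"})
    return {"records": results}
```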
  • 32. Monitor Kinesis Data Analytics metrics • Monitor the MillisBehindLatest metric for your Kinesis Data Analytics application: the milliseconds between the time a record arrived in the source (Kinesis Data Streams or Kinesis Data Firehose) and the time it was processed by Kinesis Data Analytics • If it is growing unbounded: identify the bad actor (a Lambda function, pre-processing or destination, or the SQL application); optimize the Lambda function and/or the SQL; increase input parallelism • Monitor Kinesis Data Analytics' InputProcessing.Duration metric • Monitor Kinesis Data Analytics' LambdaDelivery.Duration metric
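A hypothetical triage helper for the "identify the bad actor" step, assuming you have average InputProcessing.Duration and LambdaDelivery.Duration values in milliseconds and choose a per-invocation budget (1 second by default here, an assumption). It is a rule of thumb, not an AWS tool.

```python
def likely_bottleneck(input_processing_ms: float, lambda_delivery_ms: float,
                      budget_ms: float = 1000.0) -> str:
    """Guess which stage is holding back MillisBehindLatest."""
    if input_processing_ms > budget_ms and input_processing_ms >= lambda_delivery_ms:
        return "pre-processing Lambda"
    if lambda_delivery_ms > budget_ms:
        return "destination Lambda"
    # Neither Lambda stage looks slow: suspect the SQL or input parallelism
    return "SQL application or input parallelism"
```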
  • 33. Thank you! Rajeev Chakrabarti, Principal Enterprise Architect