Building Big Data Applications with
Serverless Architectures
AWS Monthly Webinar Series
June 15, 2017
Allan MacInnis
Solutions Architect, AWS
Agenda
What’s Serverless Real-Time Data Processing?
Serverless Processing of Real-Time Streaming Data
Serverless Streaming ETL & Analytics
Demo: Streaming ETL
Serverless Data Processing with Distributed Computing
What’s Serverless Real-Time
Data Processing?
AWS Lambda
Efficient performance at scale: Easy to author, deploy,
maintain, secure & manage. Focus on business logic
to build back-end services that perform at scale.
Bring Your Own Code: Stateless, event-driven code
with native support for Node.js, Java, Python and C#
languages.
No Infrastructure to manage: Compute without
managing infrastructure like Amazon EC2 instances
and Auto Scaling groups.
Cost-effective: Automatically matches capacity to
request rate. Compute is billed in 100 ms increments.
Triggered by events: Direct Sync & Async API calls,
AWS service integrations, and 3rd party triggers.
Lambda Event Sources
Categories: DATA STORES, ENDPOINTS, CONFIGURATION REPOSITORIES, EVENT/MESSAGE SERVICES
Services shown: Amazon S3, Amazon DynamoDB, Amazon Kinesis, AWS CloudFormation, AWS CloudTrail, Amazon CloudWatch, Amazon Cognito, Amazon SNS, Amazon SES, Cron events, AWS CodeCommit, Amazon API Gateway, Amazon Alexa, AWS IoT, AWS Step Functions
… and many more!
Serverless Real-Time Data Processing Is...
[Diagram: an EVENT SOURCE captures data streams (IoT data, financial data, log data, clickstream data); a FUNCTION (Node.js, Python, Java, C#) processes the data streams with no servers to provision or manage; output data goes to a DATABASE or CLOUD SERVICES.]
Lambda Real-Time Event Sources: HOW IT WORKS
PUSH MODEL
• Asynchronous push (async invocation): Amazon S3, Amazon SNS
• Synchronous push (sync invocation): Amazon Alexa, AWS IoT
• Mapping owned by Event Source
• Invokes Lambda via Event Source API
• Resource-based policy permissions
• Concurrent executions
STREAM PULL MODEL
• Sync invocation: Amazon DynamoDB, Amazon Kinesis
• Mapping owned by Lambda
• Lambda polls the streams
• Lambda function invokes when new records are found on the stream
• Lambda execution role policy permissions
Serverless Processing of
Real-Time Streaming Data
Amazon Kinesis
Real-Time: Collect real-time data streams and
promptly respond to key business events and
operational triggers. Real-time latencies.
Easy to use: Focus on quickly launching data
streaming applications instead of managing
infrastructure.
Amazon Kinesis Offering: Managed services for
streaming data ingestion and processing.
• Amazon Kinesis Streams: Build applications
that process or analyze streaming data.
• Amazon Kinesis Firehose: Load streaming
data into Amazon S3, Amazon Redshift, and
Amazon Elasticsearch Service.
• Amazon Kinesis Analytics: Analyze data
streams using SQL queries.
Processing Real-Time Streams: Lambda + Amazon Kinesis
Streaming data sent to Amazon
Kinesis and stored in shards
Multiple Lambda functions can be
triggered to process same Amazon
Kinesis stream for “fan out”
Lambda can process data and store
results, e.g., in DynamoDB or S3
Lambda can aggregate data to
services like Amazon Elasticsearch
Service for analytics
Lambda sends event data and
function info to Amazon CloudWatch
for capturing metrics and monitoring
[Diagram: Amazon Kinesis, AWS Lambda (two functions), Amazon CloudWatch, Amazon DynamoDB, Amazon Elasticsearch Service, Amazon S3]
Processing Streams: Set Up Amazon Kinesis Stream
Streams
Made up of Shards
Each shard ingests (accepts writes of) up to 1 MB/sec
Each shard emits (serves reads of) up to 2 MB/sec
Each shard supports 5 read transactions/sec
Data
By default, data is stored for 24 hours, configurable up to 7 days
Make sure partition key distribution is even to optimize parallel throughput
The partition key is used to distribute PUTs across shards; choose a key with more distinct values than shards
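To see why key choice matters: Kinesis assigns a record to a shard by taking the MD5 hash of its partition key as a 128-bit integer and finding the shard whose hash key range contains it. The sketch below assumes the hash space is split evenly across shards, which may not hold after splits or merges; the function names are illustrative only.

```python
import hashlib
from collections import Counter

HASH_SPACE = 2 ** 128  # MD5 yields a 128-bit value


def shard_for_key(partition_key: str, shard_count: int) -> int:
    """Index of the shard an evenly split stream would use for this key (assumption)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hashed = int.from_bytes(digest, "big")
    return hashed * shard_count // HASH_SPACE


def distribution(keys, shard_count):
    """Count how many records land on each shard for the given keys."""
    counts = Counter(shard_for_key(key, shard_count) for key in keys)
    return [counts.get(i, 0) for i in range(shard_count)]
```

Calling `distribution` on a sample of real partition keys gives a rough picture of skew. A single repeated key always lands on one shard, no matter how many shards the stream has.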
Best Practice
Determine an initial size/shards to plan for expected maximum demand
• Leverage “Help me decide how many shards I need” option in Console
• Use formula for Number Of Shards:
max(incoming_write_bandwidth_in_KB / 1000, outgoing_read_bandwidth_in_KB / 2000)
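The shard formula above can be written as a small helper. Rounding up to a whole shard and enforcing a minimum of one shard are assumptions added here; the function and parameter names are illustrative.

```python
import math


def required_shards(incoming_write_kb_per_sec: float, outgoing_read_kb_per_sec: float) -> int:
    """Estimate shard count from peak write and read bandwidth in KB/sec."""
    write_shards = incoming_write_kb_per_sec / 1000  # 1 MB/sec ingest per shard
    read_shards = outgoing_read_kb_per_sec / 2000    # 2 MB/sec read per shard
    # Assumption: round up to a whole shard, never fewer than one.
    return max(1, math.ceil(max(write_shards, read_shards)))
```

For example, 5,000 KB/sec of writes and 8,000 KB/sec of reads is write-bound: the write side needs 5 shards, the read side only 4.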
Processing Streams: Create Lambda functions
Memory
CPU allocation proportional to the memory configured
Increasing memory makes your code execute faster (if CPU bound)
Increasing memory allows larger records to be processed
Timeout
Increasing timeout allows for longer functions, but longer wait in case of errors
Permission model
Execution role defined for Lambda must have permission to access the stream
Retries
With Amazon Kinesis, Lambda retries until the data expires
(24 hours by default)
Best Practice
Write Lambda function code to be stateless
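A minimal sketch of a stateless Python handler for a Kinesis event source. Lambda delivers each batch under `event["Records"]`, with the record payload base64-encoded in `record["kinesis"]["data"]`. The JSON payload and its `device_id` field are hypothetical; all state lives in local variables, so nothing carries over between invocations.

```python
import base64
import json


def handler(event, context):
    """Decode every record in the batch and count readings per device."""
    counts = {}
    for record in event["Records"]:
        payload = base64.b64decode(record["kinesis"]["data"])
        reading = json.loads(payload)
        device = reading["device_id"]  # hypothetical field in the payload
        counts[device] = counts.get(device, 0) + 1
    # A real function would write results to a store such as DynamoDB or S3.
    return counts
```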
Processing Streams: Configure Event Source
Amazon Kinesis mapped as event source in Lambda
Batch size
Max number of records that Lambda will send to one invocation
Not equivalent to effective batch size
Effective batch size is determined on each poll (every 250 ms) and calculated as:
MIN(records available, batch size, 6 MB)
Increasing batch size allows fewer Lambda function invocations with more
data processed per function
Best Practices
Set starting position to “Trim Horizon” to read from the start of the
stream (all data)
Set starting position to “Latest” to read only the most recent data (records arriving after the mapping is created)
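One way to read the effective batch size rule is: take records in order until either the configured batch size or the 6 MB payload limit is reached. The sketch below follows that interpretation only; it does not describe how the Lambda poller is actually implemented, and the names are illustrative.

```python
MAX_PAYLOAD_BYTES = 6 * 1024 * 1024  # 6 MB invocation payload limit cited above


def effective_batch(available_records, batch_size, max_bytes=MAX_PAYLOAD_BYTES):
    """Return the leading records (bytes objects) that fit both limits."""
    batch, total = [], 0
    for record in available_records:
        if len(batch) >= batch_size or total + len(record) > max_bytes:
            break
        batch.append(record)
        total += len(record)
    return batch
```

With few records available the batch is smaller than configured; with many small records the configured batch size is the limit; with large records the payload cap is.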
Processing streams: How It Works
Polling
Concurrent polling and processing per shard
Lambda polls every 250 ms if no records found
Will grab as much data as possible in one GetRecords call (Batch)
Batching
Batches are passed for invocation to Lambda through
function parameters
Batch size may impact duration if the Lambda function
takes longer to process more records
Sub batch in memory for invocation payload
Synchronous invocation
Batches invoked as synchronous RequestResponse type
Lambda honors Amazon Kinesis at-least-once semantics
Each shard blocks on its synchronous invocation, so batches are processed in order
Processing streams: Tuning throughput
If put / ingestion rate is greater than the theoretical throughput, your
processing is at risk of falling behind
Maximum theoretical throughput
# shards * 2MB / Lambda function duration (s)
Effective theoretical throughput
# shards * batch size (MB) / Lambda function duration (s)
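The two throughput formulas can be checked with a few lines of Python. The function names and the `is_falling_behind` helper are illustrative additions; the arithmetic follows the formulas as stated.

```python
def max_theoretical_throughput(shards: int, function_duration_sec: float) -> float:
    """MB/sec if every invocation reads the full 2 MB a shard can emit."""
    return shards * 2.0 / function_duration_sec


def effective_theoretical_throughput(shards: int, batch_size_mb: float,
                                     function_duration_sec: float) -> float:
    """MB/sec for a given batch size in MB and function duration."""
    return shards * batch_size_mb / function_duration_sec


def is_falling_behind(ingest_mb_per_sec: float, shards: int, batch_size_mb: float,
                      function_duration_sec: float) -> bool:
    """True when ingestion outpaces the effective processing rate."""
    return ingest_mb_per_sec > effective_theoretical_throughput(
        shards, batch_size_mb, function_duration_sec)
```

For 4 shards, 0.5 MB batches, and a 0.5 s function, the effective rate is 4 MB/sec, so an ingest rate of 5 MB/sec would fall behind.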
[Diagram: Source → Amazon Kinesis (shards) → Lambda (functions) → Destination 1, Destination 2. Labels: polls a batch; waits for response. Scale Amazon Kinesis by splitting or merging shards; Lambda will scale automatically.]
Processing streams: Tuning Throughput w/ Retries
Retries
Lambda will retry on execution failures until the record expires
Throttles and errors impact duration and directly impact throughput
Best Practice
Retry with exponential backoff of up to 60s
Effective theoretical throughput with retries
( # shards * batch size (MB) ) / ( function duration (s) * retries until expiry)
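A minimal sketch of the backoff best practice, for calls a function makes to a downstream destination. The helper names, the default of 8 attempts, and the 1 s base delay are assumptions; only the 60 s cap comes from the slide. The `sleep` parameter is injectable so the logic can be exercised without waiting.

```python
import time


def backoff_delays(max_attempts, base_delay=1.0, cap=60.0):
    """Yield the wait before each retry: base, 2x, 4x, ... capped at `cap` seconds."""
    for attempt in range(max_attempts - 1):
        yield min(cap, base_delay * (2 ** attempt))


def call_with_backoff(operation, max_attempts=8, base_delay=1.0, cap=60.0,
                      sleep=time.sleep):
    """Run `operation`, retrying on any exception with capped exponential backoff."""
    delays = backoff_delays(max_attempts, base_delay, cap)
    while True:
        try:
            return operation()
        except Exception:
            delay = next(delays, None)
            if delay is None:
                raise  # out of attempts: surface the failure to Lambda
            sleep(delay)
```

Time spent sleeping counts toward function duration, so the total wait has to fit within the configured timeout and directly lowers throughput per the formula above.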
[Diagram: Source → Amazon Kinesis (shards) → Lambda (functions) → Destination 1, Destination 2. Labels: polls a batch; receives error; receives error; receives success. Scale Amazon Kinesis by splitting or merging shards; Lambda will scale automatically.]
Processing streams: Common observations
Effective batch size may be less than configured during low throughput
Effective batch size will increase during higher throughput
Increased Lambda duration -> decreased # of invokes and GetRecords calls
Too many consumers of your stream may compete with Amazon Kinesis read
limits and induce ReadProvisionedThroughputExceeded errors and metrics
Processing streams: Monitoring with CloudWatch
Monitoring Amazon Kinesis Streams
• GetRecords: (effective throughput)
• PutRecord: bytes, latency, records, etc.
• GetRecords.IteratorAgeMilliseconds: how old your
last processed records were
Monitoring Lambda functions
• Invocation count: Number of times the function was invoked
• Duration: Execution/processing time
• Error count: Number of errors
• Throttle count: Number of times the function was throttled
• Iterator Age: Time elapsed between the final record in a batch being
written to the stream and Lambda receiving the batch
Debugging
• Review all metrics
• Make custom logs
• View RAM consumed
• Search for log events
Serverless Streaming ETL
Streaming ETL: What is it?
Traditional ETL
• Batch-oriented
• Servers run scheduled jobs
• Latent
Streaming ETL
• Process data as it gets created
• Runs continuously
• Can produce real-time results
Streaming ETL: Amazon Kinesis Firehose
• Zero administration
• Direct-to-data store integration
• Seamless elasticity
Streaming ETL: Firehose Data Transformation
• Firehose buffers up to 3MB of ingested data
• When buffer is full, automatically invokes Lambda function,
passing array of records to be processed
• Lambda function processes and returns array of transformed
records, with status of each record
• Transformed records are saved to configured destination
Records passed to the Lambda function:
[
  {
    "recordId": "1234",
    "data": "encoded-data"
  },
  {
    "recordId": "1235",
    "data": "encoded-data"
  }
]
Records returned by the Lambda function:
[
  {
    "recordId": "1234",
    "result": "Ok",
    "data": "encoded-data"
  },
  {
    "recordId": "1235",
    "result": "Dropped",
    "data": "encoded-data"
  }
]
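A minimal sketch of a Firehose transformation function in Python. Firehose passes the buffered records under `event["records"]` with base64-encoded `data`, and expects the response wrapped as `{"records": [...]}`, each entry carrying the original `recordId`, a `result`, and re-encoded `data`. The transformation itself (upper-casing text and dropping blank records) is a placeholder assumption.

```python
import base64


def handler(event, context):
    """Transform each Firehose record and report a per-record status."""
    output = []
    for record in event["records"]:
        text = base64.b64decode(record["data"]).decode("utf-8")
        if not text.strip():
            # Blank payload: tell Firehose to drop it rather than deliver it.
            output.append({"recordId": record["recordId"],
                           "result": "Dropped",
                           "data": record["data"]})
            continue
        transformed = text.strip().upper() + "\n"  # placeholder transformation
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": base64.b64encode(transformed.encode("utf-8")).decode("utf-8"),
        })
    return {"records": output}
```

Every incoming `recordId` must appear in the response. A third status, `ProcessingFailed`, marks records the function could not transform.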
Streaming ETL: Firehose delivery architecture
with transformations
[Diagram: data source → source records → Firehose delivery stream ↔ data transformation function → transformed records → Amazon Elasticsearch Service. Source records, transformation failures, and delivery failures go to an S3 bucket.]
Streaming ETL: Amazon Kinesis Analytics
• Apply SQL on streams
• Build real time, stream processing applications
• Easy Scalability
Streaming Analytics: Kinesis Analytics
Easily write SQL code to process
streaming data
Connect to streaming source
Continuously deliver SQL results
Connect to streaming source
• Single streaming source, Kinesis Firehose or Streams
• Single reference source up to 1 GB from S3
• Input formats include JSON, CSV, unstructured text (log
files)
• Each input has a schema
• Schema is inferred but you can edit
• Deep nesting (2+ levels) and multiple event types
supported
• Best practice for formats not supported: pre-process
data using an AWS Lambda function for transformation
Streaming Analytics: Kinesis Analytics
Write SQL code
• Build streaming applications with one to many SQL
statements
• Extensions to the SQL standard to work seamlessly with
streaming data (STREAM, Windows, ROWTIME)
• Robust SQL support including large number of functions
including:
• Simple mathematical operators (AVG, STDEV, etc.)
• String manipulations (SUBSTRING, POSITION)
• Advanced analytics (random sampling, anomaly detection)
• Support for at-least-once processing semantics
Streaming Analytics: Kinesis Analytics
Example SQL code
Streaming Analytics: Kinesis Analytics
SELECT STREAM
    DEVICE_ID,
    MAX(MEASURED_TEMP) AS HIGH_TEMP
FROM SOURCE_SQL_STREAM
GROUP BY
    DEVICE_ID,
    STEP(SOURCE_SQL_STREAM.ROWTIME BY INTERVAL '15' MINUTE);
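To make the windowing concrete, here is a Python sketch of what the query computes: the highest temperature per device in each 15-minute tumbling window, where `STEP` rounds the row time down to the start of its interval. It runs over an in-memory list rather than a stream, and the tuple layout is an assumption for illustration.

```python
WINDOW_SECONDS = 15 * 60


def high_temps(readings, window_seconds=WINDOW_SECONDS):
    """readings: iterable of (epoch_seconds, device_id, measured_temp).

    Returns {(device_id, window_start): highest temperature in that window}.
    """
    highs = {}
    for ts, device_id, temp in readings:
        window_start = ts - (ts % window_seconds)  # same rounding as STEP(...)
        key = (device_id, window_start)
        if key not in highs or temp > highs[key]:
            highs[key] = temp
    return highs
```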
Continuously deliver SQL results
• Up to three outputs, including S3, Redshift, and
Elasticsearch (through Kinesis Firehose), Kinesis Streams.
• Firehose allows Kinesis Analytics to separate data processing from
data delivery
• Delivery speed will be heavily dependent upon your SQL queries (i.e.
simple ETL versus 10 minute aggregations)
• Output formats include JSON, CSV, variable column (TSV,
pipe-delimited)
• Best practice for destinations not supported: emit data to a
stream and use AWS Lambda for delivery
Streaming Analytics: Kinesis Analytics
Real-time analytical patterns
• Pre-processing: filtering, transformations
• Basic Analytics: Simple counts, aggregates over
windows
• Advanced Analytics: Detecting anomalies, event
correlation
• Post-processing: Alerting, triggering, final filters
Streaming Analytics: Kinesis Analytics
[Diagram: pipeline stages Ingest, Streaming ETL, Persist, Analyze, on a latency scale from 0 msec through seconds to under 5 minutes. Services shown: Amazon Kinesis Streams, Amazon Kinesis Firehose, AWS Lambda, Amazon Kinesis Analytics, Amazon S3, Amazon Redshift, Amazon Elasticsearch, Amazon Athena, Amazon Redshift Spectrum.]
Streaming Analytics: Big Picture
Data’s destination is S3, Redshift, or ES?
• Consider Kinesis Firehose. Transform streaming data with Lambda
before it gets delivered.
Require stateful processing, such as aggregations over a time
period?
• Consider Kinesis Analytics. Persist your aggregated data using Kinesis
Streams with Lambda, or Kinesis Firehose.
Require stateless processing, with varied destinations?
• Consider Kinesis Streams with Lambda.
Serverless Stream Processing: Which Approach?
Serverless Data Processing with
Distributed Computing
Serverless Distributed Computing: Map-Reduce Model
Why Serverless Data Processing with Distributed
Computing?
Remove difficult infrastructure management
• Cluster administration
• Complex configuration tools
Enable simple, elastic, user-friendly distributed data
processing
• Eliminate complexity of state management
• Bring distributed computing power to the masses
Serverless Distributed Computing: Map-Reduce Model
Why Serverless Data Processing with Distributed
Computing?
Eliminate utilization concerns
• Makes code simpler by removing the complexities of multi-threaded
processing to optimize server usage
• Cost-effective option to run ad hoc MapReduce jobs
Easier, automatic horizontal scaling
• Provide ability to process scientific and analytics
applications
Serverless Distributed Computing: MapReduce
[Diagram: serverless MapReduce flow in six numbered steps. Components: input bucket, driver, job state, mapper functions (map phase), S3 event source, mapper output, coordinator, reducer step 1, reducer output, recursively created n'th reducer step, final reducer and result (reduce phase).]
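The flow in the diagram can be sketched locally in plain Python: map every input chunk, then reduce the partial outputs in repeated steps until a single result remains. This is not the reference architecture's code and invokes no Lambda functions or S3 buckets; the word-count job, the function names, and the `fan_in` parameter are all illustrative assumptions.

```python
from collections import Counter


def mapper(chunk: str) -> Counter:
    """Map phase: count words in one input chunk (stands in for one input object)."""
    return Counter(chunk.split())


def reducer(partials) -> Counter:
    """One reducer step: merge a group of partial counts."""
    merged = Counter()
    for partial in partials:
        merged.update(partial)
    return merged


def run_job(chunks, fan_in=2) -> Counter:
    """Driver and coordinator: map all chunks, then reduce step by step."""
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    outputs = [mapper(chunk) for chunk in chunks]
    while len(outputs) > 1:
        # Each pass is one reducer step; repeat until one output remains.
        outputs = [reducer(outputs[i:i + fan_in])
                   for i in range(0, len(outputs), fan_in)]
    return outputs[0] if outputs else Counter()
```

In a serverless version each `mapper` and `reducer` call would be a separate function invocation, with intermediate outputs held in an object store rather than in memory.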
Serverless Distributed Computing: PyWren
PyWren Prototype Developed at University of California, Berkeley
Uses Python with AWS Lambda stateless functions for large scale data
analytics
Achieved approximately 30-40 MB/s write and read performance per core to the S3
object store
Scaled to 60-80 GB/s across 2800 simultaneous functions
Serverless Distributed Computing: Benchmark
Using Amazon MapReduce Reference Architecture Framework
with Lambda
Queries:
• Scan query (90 M Rows, 6.36 GB of data)
• Select query on Page Rankings
• Aggregation query on UserVisits (775M rows, ~127GB of data)
Dataset
Rankings (rows) | Rankings (bytes) | UserVisits (rows) | UserVisits (bytes) | Documents (bytes)
90 Million | 6.38 GB | 775 Million | 126.8 GB | 136.9 GB
Serverless Distributed Computing: Benchmark
Using Amazon MapReduce Reference Architecture Framework
with Lambda
A subset of the AMPLab benchmark was run to compare with other data
processing frameworks
Performance Benchmarks: Execution time for each workload in seconds
TECHNOLOGY | SCAN 1A | SCAN 1B | AGGREGATE 2A
Amazon Redshift (HDD) | 2.49 | 2.61 | 25.46
Serverless MapReduce | 39 | 47 | 200
Impala - Disk - 1.2.3 | 12.015 | 12.015 | 113.72
Impala - Mem - 1.2.3 | 2.17 | 3.01 | 84.35
Shark - Disk - 0.8.1 | 6.6 | 7 | 151.4
Shark - Mem - 0.8.1 | 1.7 | 1.8 | 83.7
Hive - 0.12 YARN | 50.49 | 59.93 | 730.62
Tez - 0.2.0 | 28.22 | 36.35 | 377.48
Demo: Streaming ETL
Requirements and Architecture
• BI team needs to run ad-hoc queries against all order data.
• Real-time dashboard needs to display top 20 products sold
in the past 5 minutes.
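As an illustration of the dashboard requirement only, the computation is a top-N aggregation over a 5-minute window. The sketch below works on an in-memory list of order events; the tuple layout and function name are assumptions, and it is not the demo's implementation.

```python
from collections import Counter


def top_products(orders, now, window_seconds=300, n=20):
    """orders: iterable of (epoch_seconds, product_id, quantity).

    Returns the n best-selling (product_id, units) pairs in the last window.
    """
    cutoff = now - window_seconds
    sold = Counter()
    for ts, product_id, quantity in orders:
        if cutoff < ts <= now:  # keep only orders inside the window
            sold[product_id] += quantity
    return sold.most_common(n)
```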
Streaming ETL Demo: E-commerce Events
[Diagram: demo architecture. Components: e-commerce orders, Amazon Kinesis Firehose, Amazon Kinesis Streams, Amazon Kinesis Analytics, AWS Lambda, Amazon Redshift, Amazon DynamoDB, BI users, reports, dashboards.]
Next Steps
Serverless Big Data Applications: Next steps
• Learn more about AWS Serverless at
https://aws.amazon.com/serverless
• Explore the AWS Lambda Reference Architecture on GitHub:
  • Real-Time Streaming:
    https://github.com/awslabs/lambda-refarch-streamprocessing
  • Distributed Computing Reference Architecture
    (serverless MapReduce)
    https://github.com/awslabs/lambda-refarch-mapreduce
• Create an Amazon Kinesis stream or Kinesis Firehose delivery
stream. Visit the Amazon Kinesis Console and configure a stream
to receive data.
• Send test data to your stream using the Amazon Kinesis Data
Generator.
• Create & test a Lambda function to process streams from Amazon
Kinesis by visiting the Lambda console. First 1M requests each month
are on us!
Serverless Big Data Applications: Next steps
• Read the Developer Guide and try the Lambda and Amazon
Kinesis Tutorial:
  • http://docs.aws.amazon.com/lambda/latest/dg/with-kinesis.html
• Send questions, comments, feedback to the AWS Lambda Forums
Thank You!

 
Real-Time Event Processing
Real-Time Event ProcessingReal-Time Event Processing
Real-Time Event Processing
 
Em tempo real: Ingestão, processamento e analise de dados
Em tempo real: Ingestão, processamento e analise de dadosEm tempo real: Ingestão, processamento e analise de dados
Em tempo real: Ingestão, processamento e analise de dados
 
Amazon Kinesis Data Streams Vs Msk (1).pptx
Amazon Kinesis Data Streams Vs Msk (1).pptxAmazon Kinesis Data Streams Vs Msk (1).pptx
Amazon Kinesis Data Streams Vs Msk (1).pptx
 
AWS Real-Time Event Processing
AWS Real-Time Event ProcessingAWS Real-Time Event Processing
AWS Real-Time Event Processing
 
AWS Summit Seoul 2015 - AWS 클라우드를 활용한 빅데이터 및 실시간 스트리밍 분석
AWS Summit Seoul 2015 -  AWS 클라우드를 활용한 빅데이터 및 실시간 스트리밍 분석AWS Summit Seoul 2015 -  AWS 클라우드를 활용한 빅데이터 및 실시간 스트리밍 분석
AWS Summit Seoul 2015 - AWS 클라우드를 활용한 빅데이터 및 실시간 스트리밍 분석
 
Choose Right Stream Storage: Amazon Kinesis Data Streams vs MSK
Choose Right Stream Storage: Amazon Kinesis Data Streams vs MSKChoose Right Stream Storage: Amazon Kinesis Data Streams vs MSK
Choose Right Stream Storage: Amazon Kinesis Data Streams vs MSK
 

Building Big Data Applications with Serverless Architectures - June 2017 AWS Online Tech Talks

  • 1. Building Big Data Applications with Serverless Architectures AWS Monthly Webinar Series June 15, 2017 Allan MacInnis Solutions Architect, AWS
  • 2. Agenda What’s Serverless Real-Time Data Processing? Serverless Processing of Real-Time Streaming Data Serverless Streaming ETL & Analytics Demo: Streaming ETL Serverless Data Processing with Distributed Computing
  • 4. AWS Lambda: Efficient performance at scale. Easy to author, deploy, maintain, secure, and manage; focus on business logic to build back-end services that perform at scale. Bring your own code: stateless, event-driven code with native support for Node.js, Java, Python, and C#. No infrastructure to manage: compute without managing infrastructure such as Amazon EC2 instances and Auto Scaling groups. Cost-effective: automatically matches capacity to request rate; compute is billed in 100 ms increments. Triggered by events: direct sync and async API calls, AWS service integrations, and third-party triggers.
  • 5. Amazon S3 Amazon DynamoDB Amazon Kinesis AWS CloudFormation AWS CloudTrail Amazon CloudWatch Amazon Cognito Amazon SNS Amazon SES Cron events DATA STORES ENDPOINTS CONFIGURATION REPOSITORIES EVENT/MESSAGE SERVICES Lambda Event Sources … and many more! AWS CodeCommit Amazon API Gateway Amazon Alexa AWS IoT AWS Step Functions
  • 6. Serverless Real-Time Data Processing Is.. Capture Data Streams IoT Data Financial Data Log Data No servers to provision or manage EVENT SOURCE Node.js Python Java C# Process Data Streams FUNCTION Clickstream Data Output Data DATABASE CLOUD SERVICES
  • 7. Amazon DynamoDB Amazon Kinesis Amazon S3 Amazon SNS ASYNCHRONOUS PUSH MODEL STREAM PULL MODEL Lambda Real-Time Event Sources Amazon Alexa AWS IoT SYNCHRONOUS PUSH MODEL Mapping owned by Event Source Mapping owned by Lambda Invokes Lambda via Event Source API Lambda function invokes when new records found on stream Resource-based policy permissions Lambda Execution role policy permissions Concurrent executions Sync invocation Async Invocation Sync invocation Lambda polls the streams HOW IT WORKS
  • 9. Amazon Kinesis Real-Time: Collect real-time data streams and promptly respond to key business events and operational triggers. Real-time latencies. Easy to use: Focus on quickly launching data streaming applications instead of managing infrastructure. Amazon Kinesis Offering: Managed services for streaming data ingestion and processing. • Amazon Kinesis Streams: Build applications that process or analyze streaming data. • Amazon Kinesis Firehose: Load streaming data into Amazon S3, Amazon Redshift, and Amazon Elasticsearch Service. • Amazon Kinesis Analytics: Analyze data streams using SQL queries.
  • 10. Processing Real-Time Streams: Lambda + Amazon Kinesis. Streaming data is sent to Amazon Kinesis and stored in shards. Multiple Lambda functions can be triggered to process the same Amazon Kinesis stream for “fan out”. Lambda can process data and store results, e.g. to DynamoDB or S3. Lambda can aggregate data to services like Amazon Elasticsearch Service for analytics. Lambda sends event data and function info to Amazon CloudWatch for capturing metrics and monitoring. (Diagram components: Amazon Kinesis, AWS Lambda, Amazon CloudWatch, Amazon DynamoDB, Amazon Elasticsearch Service, Amazon S3.)
  • 11. Processing Streams: Set Up Amazon Kinesis Stream. Streams: Made up of shards. Each shard ingests (writes) data at up to 1 MB/sec. Each shard emits (reads) data at up to 2 MB/sec. Each shard supports 5 reads/sec. Data: All data is stored for 24 hours, configurable up to 7 days. The partition key is used to distribute PUTs across shards; choose a key with more groups than shards and make sure its distribution is even to optimize parallel throughput. Best practice: Determine an initial number of shards sized for expected maximum demand. Leverage the “Help me decide how many shards I need” option in the console, or use the formula Number of shards = max(incoming_write_bandwidth_in_KB / 1000, outgoing_read_bandwidth_in_KB / 2000).
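The shard-count formula on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `estimate_shards` is hypothetical, and rounding up to a whole shard with a floor of one shard is an assumption the slide does not state.

```python
import math


def estimate_shards(write_kb_per_sec: float, read_kb_per_sec: float) -> int:
    """Apply max(write_KB / 1000, read_KB / 2000) from the slide.

    Rounding up and the minimum of one shard are assumptions added here.
    """
    needed = max(write_kb_per_sec / 1000, read_kb_per_sec / 2000)
    return max(1, math.ceil(needed))


if __name__ == "__main__":
    # 3,500 KB/s in and 4,000 KB/s out: the write side dominates.
    print(estimate_shards(3500, 4000))
```

In this reading, whichever side (write or read) needs more shards decides the stream size.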
  • 12. Processing Streams: Create Lambda Functions. Memory: CPU allocation is proportional to the memory configured. Increasing memory makes your code execute faster (if CPU bound). Increasing memory allows larger record sizes to be processed. Timeout: Increasing the timeout allows for longer-running functions, but a longer wait in case of errors. Permission model: The execution role defined for Lambda must have permission to access the stream. Retries: With Amazon Kinesis, Lambda retries until the data expires (24 hours by default). Best practice: Write Lambda function code to be stateless.
  • 13. Processing Streams: Configure Event Source. Amazon Kinesis is mapped as an event source in Lambda. Batch size: the maximum number of records that Lambda will send to one invocation. It is not equivalent to the effective batch size, which is evaluated every 250 ms and calculated as MIN(records available, batch size, 6 MB). Increasing the batch size allows fewer Lambda function invocations with more data processed per invocation. Best practices (starting position): Set to “Trim Horizon” to read from the start of the stream (all data). Set to “Latest” to read only the most recent data.
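One way to read MIN(records available, batch size, 6 MB) is as a count cap and a byte cap applied together. The sketch below is an interpretation, not Lambda's actual implementation: `effective_batch` is a hypothetical name, and treating 6 MB as 6 * 1024 * 1024 bytes is an assumption.

```python
MAX_PAYLOAD_BYTES = 6 * 1024 * 1024  # assumed reading of the slide's "6 MB"


def effective_batch(record_sizes, batch_size, max_bytes=MAX_PAYLOAD_BYTES):
    """Return how many of the available records fit in one invocation.

    record_sizes: sizes in bytes of the records currently available.
    batch_size:   the configured maximum number of records per invocation.
    """
    count = 0
    total = 0
    for size in record_sizes:
        # Stop at whichever limit is reached first.
        if count >= batch_size or total + size > max_bytes:
            break
        count += 1
        total += size
    return count
```

With few records available the result is below the configured batch size, which matches the slide's point that the two are not equivalent.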
  • 14. Processing Streams: How It Works. Polling: Concurrent polling and processing per shard. Lambda polls every 250 ms if no records are found. It will grab as much data as possible in one GetRecords call (batch). Batching: Batches are passed for invocation to Lambda through function parameters. Batch size may impact duration if the Lambda function takes longer to process more records. Sub-batches are held in memory for the invocation payload. Synchronous invocation: Batches are invoked as synchronous RequestResponse type. Lambda honors Amazon Kinesis at-least-once semantics. Each shard blocks in order of synchronous invocation.
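A batch reaches the function as the `event` parameter. The following is a minimal sketch of a stateless Python handler, assuming each record's payload is a JSON object; the returned summary and the "business logic" placeholder are hypothetical. Kinesis record data arrives base64-encoded under `Records[].kinesis.data`.

```python
import base64
import json


def handler(event, context):
    """Decode every Kinesis record in the batch and return a summary.

    Assumes JSON payloads. An unhandled exception here would fail the
    whole batch, which Lambda then retries for that shard.
    """
    payloads = []
    for record in event["Records"]:
        raw = base64.b64decode(record["kinesis"]["data"])
        payloads.append(json.loads(raw))
    # Hypothetical business logic would go here (e.g. write to a data store).
    return {"processed": len(payloads), "payloads": payloads}
```

Because nothing is kept between invocations, the same batch can safely be delivered again, which fits the at-least-once semantics noted above as long as downstream writes are idempotent.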
  • 15. Processing Streams: Tuning Throughput. If the put/ingestion rate is greater than the theoretical throughput, your processing is at risk of falling behind. Maximum theoretical throughput: # shards * 2 MB / Lambda function duration (s). Effective theoretical throughput: # shards * batch size (MB) / Lambda function duration (s). (Diagram: a source writes to Amazon Kinesis shards; Lambda functions poll a batch, wait for the response, and write to Destination 1 and Destination 2. Scale Amazon Kinesis by splitting or merging shards; Lambda will scale automatically.)
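The two throughput formulas on this slide translate directly into code. The function names below are hypothetical, and the numbers in the usage note are made-up inputs rather than measurements.

```python
def max_throughput_mb_s(shards: int, duration_s: float) -> float:
    """Maximum theoretical throughput: # shards * 2 MB / function duration."""
    return shards * 2.0 / duration_s


def effective_throughput_mb_s(shards: int, batch_mb: float, duration_s: float) -> float:
    """Effective theoretical throughput: # shards * batch size (MB) / duration."""
    return shards * batch_mb / duration_s


def falling_behind(ingest_mb_s: float, shards: int, batch_mb: float, duration_s: float) -> bool:
    """True when ingestion outpaces the effective processing rate."""
    return ingest_mb_s > effective_throughput_mb_s(shards, batch_mb, duration_s)
```

For example, 4 shards with 0.5 MB batches and a 0.5 s function give an effective 4 MB/s, so a 5 MB/s ingestion rate would fall behind while 3 MB/s would not.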
  • 16. Processing Streams: Tuning Throughput with Retries. Retries: Lambda will retry on execution failures until the record expires. Throttles and errors impact duration and directly impact throughput. Best practice: Retry with exponential backoff of up to 60 s. Effective theoretical throughput with retries: (# shards * batch size (MB)) / (function duration (s) * retries until expiry). (Diagram: same pipeline as the previous slide; Lambda polls a batch, receives an error, receives an error again, then receives success.)
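A rough sketch of the backoff schedule and the retry-adjusted formula follows. The base delay of 1 s, the doubling factor, and the function names are assumptions; the slide only specifies the 60 s ceiling. Production code commonly adds random jitter, which is omitted here to keep the output deterministic.

```python
def backoff_delays(attempts: int, base_s: float = 1.0, cap_s: float = 60.0) -> list:
    """Delay before each retry: base * 2**attempt, capped at cap_s."""
    return [min(cap_s, base_s * 2 ** attempt) for attempt in range(attempts)]


def throughput_with_retries(shards: int, batch_mb: float, duration_s: float, retries: int) -> float:
    """(# shards * batch size (MB)) / (function duration (s) * retries)."""
    return (shards * batch_mb) / (duration_s * retries)
```

Under these assumptions the delays grow 1, 2, 4, ... seconds and flatten at 60 s from the seventh retry onward, and a batch that needs two attempts halves the effective throughput.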
  • 17. Processing Streams: Common Observations. Effective batch size may be less than configured during low throughput. Effective batch size will increase during higher throughput. Increased Lambda duration -> decreased number of invokes and GetRecords calls. Too many consumers of your stream may compete for Amazon Kinesis read limits and induce ReadProvisionedThroughputExceeded errors and metrics.
  • 18. Processing Streams: Monitoring with CloudWatch. Monitoring Amazon Kinesis Streams: • GetRecords: effective throughput • PutRecord: bytes, latency, records, etc. • GetRecords.IteratorAgeMilliseconds: how old your last processed records were. Monitoring Lambda functions: • Invocation count: number of times the function was invoked • Duration: execution/processing time • Error count: number of errors • Throttle count: number of times the function was throttled • Iterator age: time elapsed between the final record being written to the stream and the batch being received. Debugging: • Review all metrics • Make custom logs • View RAM consumed • Search for log events
  • 20. Streaming ETL: What is it? Traditional ETL • Batch-oriented • Servers run scheduled jobs • Latent Streaming ETL • Process data as it gets created • Runs continuously • Can produce real-time results
  • 21. Streaming ETL: Amazon Kinesis Firehose • Zero administration • Direct-to-data store integration • Seamless elasticity
  • 22. Streaming ETL: Firehose Data Transformation • Firehose buffers up to 3 MB of ingested data • When the buffer is full, it automatically invokes the Lambda function, passing an array of records to be processed • The Lambda function processes and returns an array of transformed records, with the status of each record • Transformed records are saved to the configured destination. Input: [ { "recordId": "1234", "data": "encoded-data" }, { "recordId": "1235", "data": "encoded-data" } ] Output: [ { "recordId": "1234", "result": "Ok", "data": "encoded-data" }, { "recordId": "1235", "result": "Dropped", "data": "encoded-data" } ]
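The record contract on this slide (a `recordId`, a base64 `data` field, and a per-record `result`) might be implemented along these lines. This is a sketch under assumptions: payloads are taken to be JSON objects, the `level == "DEBUG"` filter and the added `processed` field are hypothetical, and the `ProcessingFailed` status is a third result value Firehose accepts that the slide does not show.

```python
import base64
import json


def handler(event, context):
    """Transform a Firehose batch, returning one result per input record."""
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
        except ValueError:
            # Undecodable input: hand the original data back, marked as failed.
            output.append({"recordId": record["recordId"],
                           "result": "ProcessingFailed",
                           "data": record["data"]})
            continue
        if payload.get("level") == "DEBUG":  # hypothetical filter rule
            result, data = "Dropped", record["data"]
        else:
            payload["processed"] = True  # hypothetical enrichment
            encoded = json.dumps(payload).encode("utf-8")
            result, data = "Ok", base64.b64encode(encoded).decode("utf-8")
        output.append({"recordId": record["recordId"],
                       "result": result,
                       "data": data})
    return {"records": output}
```

Every input `recordId` appears exactly once in the response, so Firehose can match each transformed record to its source.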
  • 23. Streaming ETL: Firehose delivery architecture with transformations. (Diagram labels: data source, source records, Firehose delivery stream, data transformation function, transformed records, Amazon Elasticsearch Service, S3 bucket, transformation failure, delivery failure.)
  • 24. Streaming ETL: Amazon Kinesis Analytics • Apply SQL on streams • Build real time, stream processing applications • Easy Scalability
  • 25. Streaming Analytics: Kinesis Analytics Easily write SQL code to process streaming data Connect to streaming source Continuously deliver SQL results
  • 26. Streaming Analytics: Kinesis Analytics. Connect to streaming source • Single streaming source: Kinesis Firehose or Kinesis Streams • Single reference source of up to 1 GB from S3 • Input formats include JSON, CSV, and unstructured text (log files) • Each input has a schema • Schema is inferred, but you can edit it • Deep nesting (2+ levels) and multiple event types supported • Best practice for formats not supported: pre-process data using an AWS Lambda function for transformation
  • 27. Streaming Analytics: Kinesis Analytics. Write SQL code • Build streaming applications with one to many SQL statements • Extensions to the SQL standard to work seamlessly with streaming data (STREAM, Windows, ROWTIME) • Robust SQL support, including a large number of functions: • Simple mathematical operators (AVG, STDEV, etc.) • String manipulations (SUBSTRING, POSITION) • Advanced analytics (random sampling, anomaly detection) • Support for at-least-once processing semantics
  • 28. Example SQL code Streaming Analytics: Kinesis Analytics SELECT STREAM DEVICE_ID, MAX(MEASURED_TEMP) AS HIGH_TEMP FROM SOURCE_SQL_STREAM GROUP BY DEVICE_ID, STEP(SOURCE_SQL_STREAM.ROWTIME BY INTERVAL '15' MINUTE);
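To make the windowing in this query concrete, here is a rough Python emulation of the same grouping: each reading is assigned to a 15-minute tumbling window by rounding its timestamp down, and the maximum temperature is kept per device and window. This is a batch sketch over an in-memory list, not how Kinesis Analytics executes the query; the tuple layout and the function name are assumptions made for illustration.

```python
WINDOW_SECONDS = 15 * 60  # matches INTERVAL '15' MINUTE in the query


def high_temp_per_window(readings):
    """Return {(device_id, window_start): max temperature}.

    `readings` is an iterable of (epoch_seconds, device_id, measured_temp)
    tuples; this layout is an assumption for the sketch.
    """
    highs = {}
    for ts, device_id, temp in readings:
        # Round the timestamp down to the start of its 15-minute window.
        window_start = ts - (ts % WINDOW_SECONDS)
        key = (device_id, window_start)
        if key not in highs or temp > highs[key]:
            highs[key] = temp
    return highs


if __name__ == "__main__":
    sample = [(0, "dev-1", 20.0), (60, "dev-1", 25.0), (900, "dev-1", 18.0)]
    for (device, start), high in sorted(high_temp_per_window(sample).items()):
        print(device, start, high)
```

The streaming engine emits one row per device as each window closes, whereas this sketch returns all windows at once after reading the whole input.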
  • 29. Continuously deliver SQL results • Up to three outputs, including S3, Redshift, and Elasticsearch (through Kinesis Firehose), and Kinesis Streams • Firehose allows Kinesis Analytics to separate data processing from data delivery • Delivery speed depends heavily on your SQL queries (e.g. simple ETL versus 10-minute aggregations) • Output formats include JSON, CSV, and variable column (TSV, pipe-delimited) • Best practice for unsupported destinations: emit data to a stream and use AWS Lambda for delivery Streaming Analytics: Kinesis Analytics
  • 30. Real-time analytical patterns • Pre-processing: filtering, transformations • Basic Analytics: Simple counts, aggregates over windows • Advanced Analytics: Detecting anomalies, event correlation • Post-processing: Alerting, triggering, final filters Streaming Analytics: Kinesis Analytics
  • 31. Streaming Analytics: Big Picture Stages: Ingest, Streaming ETL, Persist, Analyze Latency scale: 0 msec, seconds, < 5 minutes Services: Amazon Kinesis Streams, Amazon Kinesis Firehose, AWS Lambda, Amazon Kinesis Analytics, Amazon S3, Amazon Redshift, Amazon Elasticsearch, Amazon Athena, Amazon Redshift Spectrum
  • 32. Data’s destination is S3, Redshift, or ES? • Consider Kinesis Firehose. Transform streaming data with Lambda before it gets delivered. Require stateful processing, such as aggregations over a time period? • Consider Kinesis Analytics. Persist your aggregated data using Kinesis Streams with Lambda, or Kinesis Firehose. Require stateless processing, with varied destinations? • Consider Kinesis Streams with Lambda. Serverless Stream Processing: Which Approach?
  • 33. Serverless Data Processing with Distributed Computing
  • 34. Serverless Distributed Computing: Map-Reduce Model Why Serverless Data Processing with Distributed Computing? Remove difficult infrastructure management • Cluster administration • Complex configuration tools Enable simple, elastic, user-friendly distributed data processing • Eliminate complexity of state management • Bring distributed computing power to the masses
  • 35. Serverless Distributed Computing: Map-Reduce Model Why Serverless Data Processing with Distributed Computing? Eliminate utilization concerns • Makes code simpler by removing the complexities of multi-threaded processing to optimize server usage • Cost-effective option to run ad hoc MapReduce jobs Easier, automatic horizontal scaling • Provide ability to process scientific and analytics applications
  • 36. Serverless Distributed Computing: MapReduce [Architecture diagram, steps 1-6: input bucket, driver, job state, mapper functions (map phase), S3 event source, mapper output, coordinator, reducer step 1, reducer output, recursively created n-th reducer step, final reducer (reduce phase), result]
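The map phase and the recursive reducer steps named on this slide can be sketched locally in plain Python. This is only an illustration of the control flow under simplifying assumptions: functions are called in-process instead of as Lambda invocations, lists stand in for S3 objects, and the word-count workload, `run_job`, and `fan_in` are hypothetical names, not part of the reference architecture.

```python
from collections import Counter


def mapper(lines):
    """Map phase: turn one input chunk into a partial word count."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def reducer(partials):
    """One reducer step: merge a batch of partial counts into one."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def run_job(chunks, fan_in=2):
    """Driver and coordinator rolled into one local function."""
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    outputs = [mapper(chunk) for chunk in chunks]  # map phase
    # Reduce phase: keep adding reducer steps until one output remains.
    while len(outputs) > 1:
        outputs = [
            reducer(outputs[i:i + fan_in])
            for i in range(0, len(outputs), fan_in)
        ]
    return outputs[0] if outputs else Counter()


if __name__ == "__main__":
    print(run_job([["a b", "a"], ["b c"], ["a c c"]]))
```

Limiting each reducer to `fan_in` inputs mirrors the reason for recursive reducer steps: no single function has to hold every mapper output at once.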
  • 37. Serverless Distributed Computing: PyWren PyWren prototype developed at the University of California, Berkeley Uses Python with stateless AWS Lambda functions for large-scale data analytics Achieved 30-40 MB/s write and read performance per core to the S3 object store Scaled to 60-80 GB/s across 2800 simultaneous functions
  • 38. Serverless Distributed Computing: Benchmark Using Amazon MapReduce Reference Architecture Framework with Lambda Dataset Queries: • Scan query (90 M rows, 6.36 GB of data) • Select query on Page Rankings • Aggregation query on UserVisits (775 M rows, ~127 GB of data) Dataset sizes: Rankings: 90 million rows, 6.38 GB; UserVisits: 775 million rows, 126.8 GB; Documents: 136.9 GB
  • 39. Serverless Distributed Computing: Benchmark Using Amazon MapReduce Reference Architecture Framework with Lambda A subset of the AMPLab benchmark was run to compare with other data processing frameworks Performance benchmarks: execution time for each workload in seconds (Scan 1A / Scan 1B / Aggregate 2A) Amazon Redshift (HDD): 2.49 / 2.61 / 25.46; Serverless MapReduce: 39 / 47 / 200; Impala - Disk - 1.2.3: 12.015 / 12.015 / 113.72; Impala - Mem - 1.2.3: 2.17 / 3.01 / 84.35; Shark - Disk - 0.8.1: 6.6 / 7 / 151.4; Shark - Mem - 0.8.1: 1.7 / 1.8 / 83.7; Hive - 0.12 YARN: 50.49 / 59.93 / 730.62; Tez - 0.2.0: 28.22 / 36.35 / 377.48
  • 41. Requirements and Architecture • BI team needs to run ad-hoc queries against all order data. • Real-time dashboard needs to display top 20 products sold in the past 5 minutes. Streaming ETL Demo: E-commerce Events Amazon Kinesis Firehose Amazon Kinesis Streams Amazon Kinesis Analytics Amazon Redshift AWS Lambda Amazon DynamoDB E-commerce Orders BI Users Reports Dashboards
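The dashboard requirement (top 20 products sold in the past 5 minutes) is a sliding-window top-N problem. The sketch below shows one way to compute it in plain Python, assuming order events arrive in timestamp order; the class name, its methods, and the event fields are hypothetical and do not describe how the demo itself is implemented.

```python
from collections import Counter, deque


class TopProducts:
    """Track units sold per product over a sliding time window."""

    def __init__(self, window_seconds=300, top_n=20):
        self.window_seconds = window_seconds
        self.top_n = top_n
        self._events = deque()    # (timestamp, product_id, quantity)
        self._counts = Counter()  # product_id -> units inside the window

    def add(self, timestamp, product_id, quantity=1):
        """Record one order line; timestamps must be non-decreasing."""
        self._events.append((timestamp, product_id, quantity))
        self._counts[product_id] += quantity

    def top(self, now):
        """Expire events older than the window, then return the top N."""
        cutoff = now - self.window_seconds
        while self._events and self._events[0][0] <= cutoff:
            _, product_id, quantity = self._events.popleft()
            self._counts[product_id] -= quantity
            if self._counts[product_id] <= 0:
                del self._counts[product_id]
        return self._counts.most_common(self.top_n)
```

A dashboard could call `top(now)` on each refresh; the deque keeps expiry cheap because only the oldest events are ever inspected.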
  • 43. Serverless Big Data Applications: Next steps • Learn more about AWS Serverless at https://aws.amazon.com/serverless • Explore the AWS Lambda Reference Architectures on GitHub: • Real-Time Streaming: https://github.com/awslabs/lambda-refarch-streamprocessing • Distributed Computing Reference Architecture (serverless MapReduce): https://github.com/awslabs/lambda-refarch-mapreduce
  • 44. Serverless Big Data Applications: Next steps • Create an Amazon Kinesis stream or Kinesis Firehose delivery stream. Visit the Amazon Kinesis Console and configure a stream to receive data. • Send test data to your stream using the Amazon Kinesis Data Generator. • Create and test a Lambda function to process streams from Amazon Kinesis by visiting the Lambda console. First 1M requests each month are on us!
  • 45. Serverless Big Data Applications: Next steps • Read the Developer Guide and try the Lambda and Amazon Kinesis Tutorial: • http://docs.aws.amazon.com/lambda/latest/dg/with-kinesis.html • Send questions, comments, and feedback to the AWS Lambda Forums