Serverless technologies let you build and scale applications and services rapidly without the need to provision or manage servers. In this session, we show you how to incorporate serverless concepts into your big data architectures. We explore the concepts behind and benefits of serverless architectures for big data, looking at design patterns to ingest, store, process, and visualize your data. Along the way, we explain when and how you can use serverless technologies to streamline data processing, minimize infrastructure management, and improve agility and robustness and share a reference architecture using a combination of cloud and open source technologies to solve your big data problems. Topics include: use cases and best practices for serverless big data applications; leveraging AWS technologies such as Amazon DynamoDB, Amazon S3, Amazon Kinesis, AWS Lambda, Amazon Athena, and Amazon EMR; and serverless ETL, event processing, ad hoc analysis, and real-time analytics.
2. Agenda
Serverless – what and why?
Serverless – which service when?
Common big data applications
Fitting serverless into big data applications
Next steps…
4. No servers to provision
or manage
Scales with usage
Never pay for idle Availability and
fault-tolerance built in
Serverless characteristics
5. Serverless nicely fits into big data platforms
• Mix and match serverless, managed, and virtualized
• Leverage services to easily
• Rapidly ingest, categorize, and discover your data
• Allow easy query and analysis of your data
• Transform and Load data
• Provide custom event based handlers
• Serverless allows you to focuses more analytics and
not on infrastructure or servers
6. Lambda
• Run your code in the cloud—fully
managed and highly available
• Triggered through API or state
changes in your setup
• Scales automatically to match the
incoming event rate
• Node.js (JavaScript), Python, Java,
and C#
• Charged per 100-ms execution time
Serverless compute
8. AWS Glue
Serverless catalog and ETL/ELT service
Data Catalog
Job Authoring
Job Execution
Crawl, discover, and organize data
Integration with managed and serverless analytics
Serverless ETL – Pay for what you consume
9. Kinesis Streams
• For technical developers
• Build your own custom
applications that process
or analyze streaming
data
Kinesis Firehose
• For all developers and
data scientists
• Easily load massive
volumes of streaming data
into Amazon S3, Amazon
Redshift, and Amazon ES
Kinesis Analytics
• For all developers and data
scientists
• Easily analyze data
streams using standard
SQL queries
Amazon Kinesis: Streaming data made easy
Services make it easy to capture, deliver, and process streams on AWS
11. Characteristics of a big data applications
Future
Proof
Flexible
Access
Dive in
Anywhere
Collect
Anything
12. Components of big data applications
Catalog
& Search
Protect
& Secure
Access &
User InterfaceIngest & Store
Prepare &
Transform
Analyze &
Reason
13. Components of big data applications
Catalog
& Search
Protect
& Secure
Access &
User InterfaceIngest & Store
S3
Prepare &
Transform
Analyze &
Reason
Amazon
Kinesis
14. Components of big data applications
Protect
& Secure
Access &
User InterfaceIngest & Store
S3
Prepare &
Transform
AWS Glue
Data Catalog
Analyze &
Reason
Amazon
Kinesis
15. Components of big data applications
Protect
& Secure
Access &
User InterfaceIngest & Store
Amazon
Kinesis
S3
AWS Lambda
AWS Glue
Prepare &
Transform
AWS Glue
Data Catalog
Analyze &
Reason
16. Components of big data applications
Protect
& Secure
Access &
User Interface
Prepare &
TransformIngest & Store
Amazon
Kinesis
S3
AWS Lambda
AWS Glue
Kinesis Analytics
Amazon
Athena
AWS Glue
Data Catalog
Analyze &
Reason
17. Components of big data applications
Protect
& Secure
Access &
User Interface
Prepare &
TransformIngest & Store
Amazon
Kinesis
S3
AWS Lambda
AWS Glue
Kinesis Analytics
Amazon
Athena
Amazon
QuickSight
AWS Glue
Data Catalog
Analyze &
Reason
18. Components of big data applications
Protect
& Secure
Access &
User Interface
Prepare &
TransformIngest & Store
Amazon
Kinesis
S3
AWS Lambda
AWS Glue
Kinesis Analytics
Amazon
Athena
QuickSight
Glue Data
Catalog
Analyze &
Reason
19. Serverless real-time analytics
Prepare &
TransformIngest & Store
Kinesis AWS Lambda Kinesis Analytics
Preprocessing Logic
to transform real-
time data
Serverless Streams
and Firehose
SQL-based analytics
on streaming data
QuickSight
Analyze &
Reason
29. Components of big data applications
Protect
& Secure
Access &
User Interface
Prepare &
TransformIngest & Store
Kinesis
S3
AWS Lambda
AWS Glue
Kinesis Analytics
Amazon
Athena
QuickSight
AWS Glue
Data Catalog
Integrated Pipeline
Analyze &
Reason
30. Catalog & Search
Access and search metadata
Access & User Interface
Give your users easy and secure access
DynamoDB Elasticsearch API Gateway Identity & Access
Management
Cognito
QuickSight Amazon AI EMR Redshift
Athena Kinesis RDS
Central Storage
Secure, cost-effective
Storage in Amazon S3
S3
Snowball Database Migration
Service
Kinesis Firehose Direct Connect
Data Ingestion
Get your data into S3
Quickly and securely
Protect and Secure
Use entitlements to ensure data is secure and users’ identities are verified
Processing & Analytics
Use of predictive and prescriptive
analytics to gain better understanding
Security Token
Service
CloudWatch CloudTrail Key Management
Service
Data lake reference architecture
Catalog & Search
Access and search metadata
Access & User Interface
Give your users easy and secure access
DynamoDB Elasticsearch API Gateway Identity & Access
Management
Cognito
QuickSight A
Central Storage
Secure, cost-effective
Storage in Amazon S3
Glue ETL
31. The right tool for the right job
Machine Learning/Deep Learning
Business
Reporting
Data Scientists
Data Engineer
IDE
Data
Catalog
Central
Storage
33. The right tool for the right job
Machine Learning/Deep Learning
Business
Reporting
Data Scientists
Data Engineer
IDE
Data
Catalog
Central
Storage
34. Serverless nicely fits into big data platforms
• Mix and match serverless, managed, and virtualized services
• Leverage services to easily
• Rapidly ingest, categorize, and discover your data
• Allow easy query and analysis of your data
• Transform and Load data
• Provide custom event based handlers
• Serverless allows you to focuses more analytics and not on
infrastructure or servers
• Pay only for what you use