SlideShare ist ein Scribd-Unternehmen logo
1 von 14
Downloaden Sie, um offline zu lesen
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Serverless Data Lake Day with AWS
Tame Your Big Data with Kinesis Firehose, S3, Glue, Athena, Quicksight
Kim Kao, Solutions Architect
2
Big Data still challenges most enterprises
Store AnalyzeIngest
1 4
0 9
5CUSTOMER &
OPERATIONAL
DATA
CUSTOMER &
OPERATIONAL
INSIGHTS
ANALYTICS PIPELINE
Rigid data
ingest
Inability to process
new types of data
Limited analytics
platforms
Inability to generate
new types of insights
Data doesn’t
“connect”
Siloed data
prevents single
source of truth
3
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Data Lake concepts
4
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
The concept of a Data Lake
• All data in one place, a single source of truth
• Handles structured/semi-structured/unstructured/raw data
• Supports fast ingestion and consumption
• Schema on read
• Designed for low-cost storage
• Decouples storage and compute
• Supports protection and security rules
5
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
A data lake is a centralized repository that allows
you to store all your structured and unstructured
data at any scale
6
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Typical steps of building a data lake
Setup storage1
Move data2
Cleanse, prep,
and catalog data
3
Configure and enforce
security and compliance
policies
4
Make data available
for analytics and
consumption
5
7
Workshop Solution Architecture
Architecture Highlights
• Serverless
• Hybrid
PB3: Note: Draw.io source for the diagram will be shared at the Immersion Day
8
Service Used in this Lab
S3: Store data at Any Scalable with Unmatched Durability & Availability
• Built to store any amount of data
• Runs on the world’s largest global
cloud infrastructure
• Designed to deliver 99.999999999% durability
• Geographic redundancy & automatic replication
• Seamlessly replicates data between any region
• Tiered storage to optimize price/performance:
Store data at $0.023/GB/month at S3
($0.004/GB/month at Glacier)
S3
Standard
S3 Standard
Infrequent Access
S3 One Zone-IA
Glacier
9
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Process Data in Place…
Amazon S3
Amazon Athena Amazon Redshift
Spectrum
Amazon SageMaker AWS Glue
10
Amazon Kinesis—Real Time
time
Load data streams
into AWS data stores
Kinesis Data
Firehose
Build custom
applications that
analyze data streams
Kinesis Data
Streams
Capture, process,
and store video
streams for analytics
Kinesis Video
Streams
Analyze data streams
with SQL
Kinesis Data
Analytics
SQL
11
Capture and submit
streaming data to Firehose
Analyze streaming data using your
favorite BI tools
Firehose loads streaming data
continuously into S3, Amazon Redshift,
and Amazon ES
Service Used in this Lab
Amazon Kinesis Firehose
Zero administration: Capture and deliver streaming data to Amazon S3, Amazon Redshift, or
Amazon Elasticsearch Service without writing an app or managing infrastructure.
Direct-to-data-store integration: Batch, compress, and encrypt streaming data for delivery
in as little as 60 seconds.
Seamless elasticity: Seamlessly scales to match data throughput without intervention.
12
Kinesis Data Firehose—How it Works
Ingest Transform Deliver
Amazon S3
Amazon Redshift
Amazon Elasticsearch Service
AWS IoT
Amazon Kinesis Agent
Amazon Kinesis Streams
Amazon CloudWatch Logs
Amazon CloudWatch Events
Apache Kafka
13
Kinesis Streams Kinesis Firehose
Use case Capture and expose data streams to build
arbitrary stream processing applications
Fully-managed service for automatic
loading of data streams into S3,
Redshift, ElasticSearch etc.
Provisioning and
Resource Management
Provisioned model via Streams No provisioning of the underlying
resource called a
DeliveryStream
Customer-controllable construct of ‘Shards’ No Shards (not a customer visible
construct)
Scaling Customer owns explicit scaling of stream/
shards via Split/Merge APIs
Firehose is completely elastic – no
explicit scaling actions needed.
Ingestion (Put Data) Customer owns “partition key’ as part of
PUT* API call
Different and simplified Put* API that
doesn’t need a partition key
Retrieval (Get Data) Get API to retrieve data directly from stream No Get* API – Data appears in
destination (like S3/ Redshift) after
meeting configuration criteria
Full-flexibility to write/ manage own
application with Kinesis Client Library and
connector libraries
No application to be written or
managed. Customers specify simple
set of pre-defined configurations on the
console or API
Amazon Kinesis Streams vs Firehose
14
Firehose Highlights
• Records: The maximum size of a record (before Base64-encoding)
is 1000 KB.
• Compression: Amazon Kinesis Data Firehose allows you to
compress your data before delivering it to Amazon S3.
• Transformation: Firehose can invoke an AWS Lambda function to
transform incoming data before delivering it to destinations.
• Buffering
• Buffer size is in MBs and ranges from 1MB to 128MB.
• Buffer interval is in seconds and ranges from 60 seconds to 900
seconds.
© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Source: Firehose FAQ

Weitere ähnliche Inhalte

Was ist angesagt?

Streaming ETL for Data Lakes using Amazon Kinesis Firehose - May 2017 AWS Onl...
Streaming ETL for Data Lakes using Amazon Kinesis Firehose - May 2017 AWS Onl...Streaming ETL for Data Lakes using Amazon Kinesis Firehose - May 2017 AWS Onl...
Streaming ETL for Data Lakes using Amazon Kinesis Firehose - May 2017 AWS Onl...Amazon Web Services
 
(BDT317) Building A Data Lake On AWS
(BDT317) Building A Data Lake On AWS(BDT317) Building A Data Lake On AWS
(BDT317) Building A Data Lake On AWSAmazon Web Services
 
How to build a data lake with aws glue data catalog (ABD213-R) re:Invent 2017
How to build a data lake with aws glue data catalog (ABD213-R)  re:Invent 2017How to build a data lake with aws glue data catalog (ABD213-R)  re:Invent 2017
How to build a data lake with aws glue data catalog (ABD213-R) re:Invent 2017Amazon Web Services
 
ABD318_Architecting a data lake with Amazon S3, Amazon Kinesis, AWS Glue and ...
ABD318_Architecting a data lake with Amazon S3, Amazon Kinesis, AWS Glue and ...ABD318_Architecting a data lake with Amazon S3, Amazon Kinesis, AWS Glue and ...
ABD318_Architecting a data lake with Amazon S3, Amazon Kinesis, AWS Glue and ...Amazon Web Services
 
ABD324_Migrating Your Oracle Data Warehouse to Amazon Redshift Using AWS DMS ...
ABD324_Migrating Your Oracle Data Warehouse to Amazon Redshift Using AWS DMS ...ABD324_Migrating Your Oracle Data Warehouse to Amazon Redshift Using AWS DMS ...
ABD324_Migrating Your Oracle Data Warehouse to Amazon Redshift Using AWS DMS ...Amazon Web Services
 
Optimizing Storage for Big Data Analytics Workloads
Optimizing Storage for Big Data Analytics WorkloadsOptimizing Storage for Big Data Analytics Workloads
Optimizing Storage for Big Data Analytics WorkloadsAmazon Web Services
 
Building Data Lake on AWS | AWS Floor28
Building Data Lake on AWS | AWS Floor28Building Data Lake on AWS | AWS Floor28
Building Data Lake on AWS | AWS Floor28Amazon Web Services
 
Business Intelligence in Minutes with Amazon Athena and Amazon QuickSight - A...
Business Intelligence in Minutes with Amazon Athena and Amazon QuickSight - A...Business Intelligence in Minutes with Amazon Athena and Amazon QuickSight - A...
Business Intelligence in Minutes with Amazon Athena and Amazon QuickSight - A...Amazon Web Services
 
Building a Server-less Data Lake on AWS - Technical 301
Building a Server-less Data Lake on AWS - Technical 301Building a Server-less Data Lake on AWS - Technical 301
Building a Server-less Data Lake on AWS - Technical 301Amazon Web Services
 
The Power of Big Data - AWS Summit Bahrain 2017
The Power of Big Data - AWS Summit Bahrain 2017The Power of Big Data - AWS Summit Bahrain 2017
The Power of Big Data - AWS Summit Bahrain 2017Amazon Web Services
 
Taking the Performance of your Data Warehouse to the Next Level with Amazon R...
Taking the Performance of your Data Warehouse to the Next Level with Amazon R...Taking the Performance of your Data Warehouse to the Next Level with Amazon R...
Taking the Performance of your Data Warehouse to the Next Level with Amazon R...Amazon Web Services
 
Modern Data Architectures for Business Insights at Scale
Modern Data Architectures for Business Insights at ScaleModern Data Architectures for Business Insights at Scale
Modern Data Architectures for Business Insights at ScaleAmazon Web Services
 
Building Data Lakes and Analytics on AWS; Patterns and Best Practices - BDA30...
Building Data Lakes and Analytics on AWS; Patterns and Best Practices - BDA30...Building Data Lakes and Analytics on AWS; Patterns and Best Practices - BDA30...
Building Data Lakes and Analytics on AWS; Patterns and Best Practices - BDA30...Amazon Web Services
 
Building Your Data Lake on AWS - Level 200
Building Your Data Lake on AWS - Level 200Building Your Data Lake on AWS - Level 200
Building Your Data Lake on AWS - Level 200Amazon Web Services
 
Getting Started with Amazon QuickSight
Getting Started with Amazon QuickSightGetting Started with Amazon QuickSight
Getting Started with Amazon QuickSightAmazon Web Services
 
Real-time Analytics using Data from IoT Devices - AWS Online Tech Talks
Real-time Analytics using Data from IoT Devices - AWS Online Tech TalksReal-time Analytics using Data from IoT Devices - AWS Online Tech Talks
Real-time Analytics using Data from IoT Devices - AWS Online Tech TalksAmazon Web Services
 
NEW LAUNCH! Introducing AWS Batch: Easy and efficient batch computing
 	  NEW LAUNCH! Introducing AWS Batch: Easy and efficient batch computing 	  NEW LAUNCH! Introducing AWS Batch: Easy and efficient batch computing
NEW LAUNCH! Introducing AWS Batch: Easy and efficient batch computingAmazon Web Services
 
Fast Track to Your Data Lake on AWS
Fast Track to Your Data Lake on AWSFast Track to Your Data Lake on AWS
Fast Track to Your Data Lake on AWSAmazon Web Services
 

Was ist angesagt? (20)

Streaming ETL for Data Lakes using Amazon Kinesis Firehose - May 2017 AWS Onl...
Streaming ETL for Data Lakes using Amazon Kinesis Firehose - May 2017 AWS Onl...Streaming ETL for Data Lakes using Amazon Kinesis Firehose - May 2017 AWS Onl...
Streaming ETL for Data Lakes using Amazon Kinesis Firehose - May 2017 AWS Onl...
 
(BDT317) Building A Data Lake On AWS
(BDT317) Building A Data Lake On AWS(BDT317) Building A Data Lake On AWS
(BDT317) Building A Data Lake On AWS
 
How to build a data lake with aws glue data catalog (ABD213-R) re:Invent 2017
How to build a data lake with aws glue data catalog (ABD213-R)  re:Invent 2017How to build a data lake with aws glue data catalog (ABD213-R)  re:Invent 2017
How to build a data lake with aws glue data catalog (ABD213-R) re:Invent 2017
 
ABD318_Architecting a data lake with Amazon S3, Amazon Kinesis, AWS Glue and ...
ABD318_Architecting a data lake with Amazon S3, Amazon Kinesis, AWS Glue and ...ABD318_Architecting a data lake with Amazon S3, Amazon Kinesis, AWS Glue and ...
ABD318_Architecting a data lake with Amazon S3, Amazon Kinesis, AWS Glue and ...
 
ABD324_Migrating Your Oracle Data Warehouse to Amazon Redshift Using AWS DMS ...
ABD324_Migrating Your Oracle Data Warehouse to Amazon Redshift Using AWS DMS ...ABD324_Migrating Your Oracle Data Warehouse to Amazon Redshift Using AWS DMS ...
ABD324_Migrating Your Oracle Data Warehouse to Amazon Redshift Using AWS DMS ...
 
Optimizing Storage for Big Data Analytics Workloads
Optimizing Storage for Big Data Analytics WorkloadsOptimizing Storage for Big Data Analytics Workloads
Optimizing Storage for Big Data Analytics Workloads
 
Building Data Lake on AWS | AWS Floor28
Building Data Lake on AWS | AWS Floor28Building Data Lake on AWS | AWS Floor28
Building Data Lake on AWS | AWS Floor28
 
Business Intelligence in Minutes with Amazon Athena and Amazon QuickSight - A...
Business Intelligence in Minutes with Amazon Athena and Amazon QuickSight - A...Business Intelligence in Minutes with Amazon Athena and Amazon QuickSight - A...
Business Intelligence in Minutes with Amazon Athena and Amazon QuickSight - A...
 
Using Data Lakes
Using Data LakesUsing Data Lakes
Using Data Lakes
 
Building a Server-less Data Lake on AWS - Technical 301
Building a Server-less Data Lake on AWS - Technical 301Building a Server-less Data Lake on AWS - Technical 301
Building a Server-less Data Lake on AWS - Technical 301
 
The Power of Big Data - AWS Summit Bahrain 2017
The Power of Big Data - AWS Summit Bahrain 2017The Power of Big Data - AWS Summit Bahrain 2017
The Power of Big Data - AWS Summit Bahrain 2017
 
Taking the Performance of your Data Warehouse to the Next Level with Amazon R...
Taking the Performance of your Data Warehouse to the Next Level with Amazon R...Taking the Performance of your Data Warehouse to the Next Level with Amazon R...
Taking the Performance of your Data Warehouse to the Next Level with Amazon R...
 
Modern Data Architectures for Business Insights at Scale
Modern Data Architectures for Business Insights at ScaleModern Data Architectures for Business Insights at Scale
Modern Data Architectures for Business Insights at Scale
 
Building Data Lakes and Analytics on AWS; Patterns and Best Practices - BDA30...
Building Data Lakes and Analytics on AWS; Patterns and Best Practices - BDA30...Building Data Lakes and Analytics on AWS; Patterns and Best Practices - BDA30...
Building Data Lakes and Analytics on AWS; Patterns and Best Practices - BDA30...
 
Building Your Data Lake on AWS - Level 200
Building Your Data Lake on AWS - Level 200Building Your Data Lake on AWS - Level 200
Building Your Data Lake on AWS - Level 200
 
Log Analytics with AWS
Log Analytics with AWSLog Analytics with AWS
Log Analytics with AWS
 
Getting Started with Amazon QuickSight
Getting Started with Amazon QuickSightGetting Started with Amazon QuickSight
Getting Started with Amazon QuickSight
 
Real-time Analytics using Data from IoT Devices - AWS Online Tech Talks
Real-time Analytics using Data from IoT Devices - AWS Online Tech TalksReal-time Analytics using Data from IoT Devices - AWS Online Tech Talks
Real-time Analytics using Data from IoT Devices - AWS Online Tech Talks
 
NEW LAUNCH! Introducing AWS Batch: Easy and efficient batch computing
 	  NEW LAUNCH! Introducing AWS Batch: Easy and efficient batch computing 	  NEW LAUNCH! Introducing AWS Batch: Easy and efficient batch computing
NEW LAUNCH! Introducing AWS Batch: Easy and efficient batch computing
 
Fast Track to Your Data Lake on AWS
Fast Track to Your Data Lake on AWSFast Track to Your Data Lake on AWS
Fast Track to Your Data Lake on AWS
 

Ähnlich wie Serverless Datalake Day with AWS

BDA308 Serverless Analytics with Amazon Athena and Amazon QuickSight, featuri...
BDA308 Serverless Analytics with Amazon Athena and Amazon QuickSight, featuri...BDA308 Serverless Analytics with Amazon Athena and Amazon QuickSight, featuri...
BDA308 Serverless Analytics with Amazon Athena and Amazon QuickSight, featuri...Amazon Web Services
 
Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...
Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...
Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...AWS Riyadh User Group
 
Best Practices Using Big Data on AWS | AWS Public Sector Summit 2017
Best Practices Using Big Data on AWS | AWS Public Sector Summit 2017Best Practices Using Big Data on AWS | AWS Public Sector Summit 2017
Best Practices Using Big Data on AWS | AWS Public Sector Summit 2017Amazon Web Services
 
Architecting a Serverless Data Lake on AWS
Architecting a Serverless Data Lake on AWSArchitecting a Serverless Data Lake on AWS
Architecting a Serverless Data Lake on AWSAmazon Web Services
 
AWS Data Lake: data analysis @ scale
AWS Data Lake: data analysis @ scaleAWS Data Lake: data analysis @ scale
AWS Data Lake: data analysis @ scaleAmazon Web Services
 
Creare e gestire Data Lake e Data Warehouses
Creare e gestire Data Lake e Data WarehousesCreare e gestire Data Lake e Data Warehouses
Creare e gestire Data Lake e Data WarehousesAmazon Web Services
 
Using data lakes to quench your analytics fire - AWS Summit Cape Town 2018
Using data lakes to quench your analytics fire - AWS Summit Cape Town 2018Using data lakes to quench your analytics fire - AWS Summit Cape Town 2018
Using data lakes to quench your analytics fire - AWS Summit Cape Town 2018Amazon Web Services
 
Data Catalog & ETL - Glue & Athena
Data Catalog & ETL - Glue & AthenaData Catalog & ETL - Glue & Athena
Data Catalog & ETL - Glue & AthenaAmazon Web Services
 
Data Catalog & ETL - Glue & Athena
Data Catalog & ETL - Glue & AthenaData Catalog & ETL - Glue & Athena
Data Catalog & ETL - Glue & AthenaAmazon Web Services
 
Analyze your Data Lake, Fast @ Any Scale - AWS Online Tech Talks
Analyze your Data Lake, Fast @ Any Scale - AWS Online Tech TalksAnalyze your Data Lake, Fast @ Any Scale - AWS Online Tech Talks
Analyze your Data Lake, Fast @ Any Scale - AWS Online Tech TalksAmazon Web Services
 
Implementazione di una soluzione Data Lake.pdf
Implementazione di una soluzione Data Lake.pdfImplementazione di una soluzione Data Lake.pdf
Implementazione di una soluzione Data Lake.pdfAmazon Web Services
 
Big data journey to the cloud rohit pujari 5.30.18
Big data journey to the cloud   rohit pujari 5.30.18Big data journey to the cloud   rohit pujari 5.30.18
Big data journey to the cloud rohit pujari 5.30.18Cloudera, Inc.
 
Building a Modern Data Platform on AWS
Building a Modern Data Platform on AWSBuilding a Modern Data Platform on AWS
Building a Modern Data Platform on AWSAmazon Web Services
 
Success has Many Query Engines- Tel Aviv Summit 2018
Success has Many Query Engines- Tel Aviv Summit 2018Success has Many Query Engines- Tel Aviv Summit 2018
Success has Many Query Engines- Tel Aviv Summit 2018Amazon Web Services
 
Build Data Lakes and Analytics on AWS: Patterns & Best Practices
Build Data Lakes and Analytics on AWS: Patterns & Best PracticesBuild Data Lakes and Analytics on AWS: Patterns & Best Practices
Build Data Lakes and Analytics on AWS: Patterns & Best PracticesAmazon Web Services
 
Build Data Lakes & Analytics on AWS: Patterns & Best Practices
Build Data Lakes & Analytics on AWS: Patterns & Best PracticesBuild Data Lakes & Analytics on AWS: Patterns & Best Practices
Build Data Lakes & Analytics on AWS: Patterns & Best PracticesAmazon Web Services
 
Building a Data Lake in Amazon S3 & Amazon Glacier (STG401-R1) - AWS re:Inven...
Building a Data Lake in Amazon S3 & Amazon Glacier (STG401-R1) - AWS re:Inven...Building a Data Lake in Amazon S3 & Amazon Glacier (STG401-R1) - AWS re:Inven...
Building a Data Lake in Amazon S3 & Amazon Glacier (STG401-R1) - AWS re:Inven...Amazon Web Services
 
Analyzing Data Streams in Real Time with Amazon Kinesis: PNNL's Serverless Da...
Analyzing Data Streams in Real Time with Amazon Kinesis: PNNL's Serverless Da...Analyzing Data Streams in Real Time with Amazon Kinesis: PNNL's Serverless Da...
Analyzing Data Streams in Real Time with Amazon Kinesis: PNNL's Serverless Da...Amazon Web Services
 

Ähnlich wie Serverless Datalake Day with AWS (20)

BDA308 Serverless Analytics with Amazon Athena and Amazon QuickSight, featuri...
BDA308 Serverless Analytics with Amazon Athena and Amazon QuickSight, featuri...BDA308 Serverless Analytics with Amazon Athena and Amazon QuickSight, featuri...
BDA308 Serverless Analytics with Amazon Athena and Amazon QuickSight, featuri...
 
Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...
Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...
Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...
 
Best Practices Using Big Data on AWS | AWS Public Sector Summit 2017
Best Practices Using Big Data on AWS | AWS Public Sector Summit 2017Best Practices Using Big Data on AWS | AWS Public Sector Summit 2017
Best Practices Using Big Data on AWS | AWS Public Sector Summit 2017
 
Architecting a Serverless Data Lake on AWS
Architecting a Serverless Data Lake on AWSArchitecting a Serverless Data Lake on AWS
Architecting a Serverless Data Lake on AWS
 
AWS Data Lake: data analysis @ scale
AWS Data Lake: data analysis @ scaleAWS Data Lake: data analysis @ scale
AWS Data Lake: data analysis @ scale
 
Implementing a Data Lake
Implementing a Data LakeImplementing a Data Lake
Implementing a Data Lake
 
Creare e gestire Data Lake e Data Warehouses
Creare e gestire Data Lake e Data WarehousesCreare e gestire Data Lake e Data Warehouses
Creare e gestire Data Lake e Data Warehouses
 
Using data lakes to quench your analytics fire - AWS Summit Cape Town 2018
Using data lakes to quench your analytics fire - AWS Summit Cape Town 2018Using data lakes to quench your analytics fire - AWS Summit Cape Town 2018
Using data lakes to quench your analytics fire - AWS Summit Cape Town 2018
 
Data Catalog & ETL - Glue & Athena
Data Catalog & ETL - Glue & AthenaData Catalog & ETL - Glue & Athena
Data Catalog & ETL - Glue & Athena
 
Data Catalog & ETL - Glue & Athena
Data Catalog & ETL - Glue & AthenaData Catalog & ETL - Glue & Athena
Data Catalog & ETL - Glue & Athena
 
Analyze your Data Lake, Fast @ Any Scale - AWS Online Tech Talks
Analyze your Data Lake, Fast @ Any Scale - AWS Online Tech TalksAnalyze your Data Lake, Fast @ Any Scale - AWS Online Tech Talks
Analyze your Data Lake, Fast @ Any Scale - AWS Online Tech Talks
 
Implementazione di una soluzione Data Lake.pdf
Implementazione di una soluzione Data Lake.pdfImplementazione di una soluzione Data Lake.pdf
Implementazione di una soluzione Data Lake.pdf
 
Big data journey to the cloud rohit pujari 5.30.18
Big data journey to the cloud   rohit pujari 5.30.18Big data journey to the cloud   rohit pujari 5.30.18
Big data journey to the cloud rohit pujari 5.30.18
 
Building a Modern Data Platform on AWS
Building a Modern Data Platform on AWSBuilding a Modern Data Platform on AWS
Building a Modern Data Platform on AWS
 
ABD217_From Batch to Streaming
ABD217_From Batch to StreamingABD217_From Batch to Streaming
ABD217_From Batch to Streaming
 
Success has Many Query Engines- Tel Aviv Summit 2018
Success has Many Query Engines- Tel Aviv Summit 2018Success has Many Query Engines- Tel Aviv Summit 2018
Success has Many Query Engines- Tel Aviv Summit 2018
 
Build Data Lakes and Analytics on AWS: Patterns & Best Practices
Build Data Lakes and Analytics on AWS: Patterns & Best PracticesBuild Data Lakes and Analytics on AWS: Patterns & Best Practices
Build Data Lakes and Analytics on AWS: Patterns & Best Practices
 
Build Data Lakes & Analytics on AWS: Patterns & Best Practices
Build Data Lakes & Analytics on AWS: Patterns & Best PracticesBuild Data Lakes & Analytics on AWS: Patterns & Best Practices
Build Data Lakes & Analytics on AWS: Patterns & Best Practices
 
Building a Data Lake in Amazon S3 & Amazon Glacier (STG401-R1) - AWS re:Inven...
Building a Data Lake in Amazon S3 & Amazon Glacier (STG401-R1) - AWS re:Inven...Building a Data Lake in Amazon S3 & Amazon Glacier (STG401-R1) - AWS re:Inven...
Building a Data Lake in Amazon S3 & Amazon Glacier (STG401-R1) - AWS re:Inven...
 
Analyzing Data Streams in Real Time with Amazon Kinesis: PNNL's Serverless Da...
Analyzing Data Streams in Real Time with Amazon Kinesis: PNNL's Serverless Da...Analyzing Data Streams in Real Time with Amazon Kinesis: PNNL's Serverless Da...
Analyzing Data Streams in Real Time with Amazon Kinesis: PNNL's Serverless Da...
 

Mehr von Amazon Web Services

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Amazon Web Services
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Amazon Web Services
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateAmazon Web Services
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSAmazon Web Services
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Amazon Web Services
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Amazon Web Services
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...Amazon Web Services
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsAmazon Web Services
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareAmazon Web Services
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSAmazon Web Services
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAmazon Web Services
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareAmazon Web Services
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWSAmazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckAmazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without serversAmazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceAmazon Web Services
 

Mehr von Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Serverless Datalake Day with AWS

  • 1. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Serverless Data Lake Day with AWS Tame Your Big Data with Kinesis Firehose, S3, Glue, Athena, Quicksight Kim Kao, Solutions Architect
  • 2. 2 Big Data still challenges most enterprises Store AnalyzeIngest 1 4 0 9 5CUSTOMER & OPERATIONAL DATA CUSTOMER & OPERATIONAL INSIGHTS ANALYTICS PIPELINE Rigid data ingest Inability to process new types of data Limited analytics platforms Inability to generate new types of insights Data doesn’t “connect” Siloed data prevents single source of truth
  • 3. 3 © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Data Lake concepts
  • 4. 4 © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. The concept of a Data Lake • All data in one place, a single source of truth • Handles structured/semi-structured/unstructured/raw data • Supports fast ingestion and consumption • Schema on read • Designed for low-cost storage • Decouples storage and compute • Supports protection and security rules
  • 5. 5 © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. A data lake is a centralized repository that allows you to store all your structured and unstructured data at any scale
  • 6. 6 © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Typical steps of building a data lake Setup storage1 Move data2 Cleanse, prep, and catalog data 3 Configure and enforce security and compliance policies 4 Make data available for analytics and consumption 5
  • 7. 7 Workshop Solution Architecture Architecture Highlights • Serverless • Hybrid PB3: Note: Draw.io source for the diagram will be shared at the Immersion Day
  • 8. 8 Service Used in this Lab S3: Store data at Any Scalable with Unmatched Durability & Availability • Built to store any amount of data • Runs on the world’s largest global cloud infrastructure • Designed to deliver 99.999999999% durability • Geographic redundancy & automatic replication • Seamlessly replicates data between any region • Tiered storage to optimize price/performance: Store data at $0.023/GB/month at S3 ($0.004/GB/month at Glacier) S3 Standard S3 Standard Infrequent Access S3 One Zone-IA Glacier
  • 9. 9 © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Process Data in Place… Amazon S3 Amazon Athena Amazon Redshift Spectrum Amazon SageMaker AWS Glue
  • 10. 10 Amazon Kinesis—Real Time time Load data streams into AWS data stores Kinesis Data Firehose Build custom applications that analyze data streams Kinesis Data Streams Capture, process, and store video streams for analytics Kinesis Video Streams Analyze data streams with SQL Kinesis Data Analytics SQL
  • 11. 11 Capture and submit streaming data to Firehose Analyze streaming data using your favorite BI tools Firehose loads streaming data continuously into S3, Amazon Redshift, and Amazon ES Service Used in this Lab Amazon Kinesis Firehose Zero administration: Capture and deliver streaming data to Amazon S3, Amazon Redshift, or Amazon Elasticsearch Service without writing an app or managing infrastructure. Direct-to-data-store integration: Batch, compress, and encrypt streaming data for delivery in as little as 60 seconds. Seamless elasticity: Seamlessly scales to match data throughput without intervention.
  • 12. 12 Kinesis Data Firehose—How it Works Ingest Transform Deliver Amazon S3 Amazon Redshift Amazon Elasticsearch Service AWS IoT Amazon Kinesis Agent Amazon Kinesis Streams Amazon CloudWatch Logs Amazon CloudWatch Events Apache Kafka
  • 13. 13 Kinesis Streams Kinesis Firehose Use case Capture and expose data streams to build arbitrary stream processing applications Fully-managed service for automatic loading of data streams into S3, Redshift, ElasticSearch etc. Provisioning and Resource Management Provisioned model via Streams No provisioning of the underlying resource called a DeliveryStream Customer-controllable construct of ‘Shards’ No Shards (not a customer visible construct) Scaling Customer owns explicit scaling of stream/ shards via Split/Merge APIs Firehose is completely elastic – no explicit scaling actions needed. Ingestion (Put Data) Customer owns “partition key’ as part of PUT* API call Different and simplified Put* API that doesn’t need a partition key Retrieval (Get Data) Get API to retrieve data directly from stream No Get* API – Data appears in destination (like S3/ Redshift) after meeting configuration criteria Full-flexibility to write/ manage own application with Kinesis Client Library and connector libraries No application to be written or managed. Customers specify simple set of pre-defined configurations on the console or API Amazon Kinesis Streams vs Firehose
  • 14. 14 Firehose Highlights • Records: The maximum size of a record (before Base64-encoding) is 1000 KB. • Compression: Amazon Kinesis Data Firehose allows you to compress your data before delivering it to Amazon S3. • Transformation: Firehose can invoke an AWS Lambda function to transform incoming data before delivering it to destinations. • Buffering • Buffer size is in MBs and ranges from 1MB to 128MB. • Buffer interval is in seconds and ranges from 60 seconds to 900 seconds. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Source: Firehose FAQ