©	2017,	Amazon	Web	Services,	Inc.	or	its	Affiliates.	All	rights	reserved
Pop-up Loft
Building Data Lakes with AWS
John Mallory
Storage BD
Rethink how to become a data-driven business
• Business outcomes - start with the insights and actions you want to drive,
then work backwards to a streamlined design
• Experimentation - start small, test many ideas, keep the good ones and
scale those up, paying only for what you consume
• Agile and timely - deploy data processing infrastructure in minutes, not
months. Take advantage of a rich platform of services to respond quickly to
changing business needs
Finding Value in Data is a Journey
Business Monitoring
Business Insights
New Business Opportunity
Business Optimization
Business Transformation
Evolving Tools and Infrastructure
Often Undertaken with Silos of Tools and Data
[Diagram. Labels: Hadoop, Spark, NoSQL, Storage Arrays, Databases, Data Warehouse, Structured Data, SQL, Raw Data, ETL, Advanced Analytics]
Legacy Data Warehouses & RDBMS
• Complex to set up and manage
• Do not scale
• Take months to add new
data sources
• Queries take too long
• Cost $MM upfront
This Leads to Friction & Pain
• Challenging to move data across silos
• Forced to keep multiple copies of data
• Complex data transformation & governance
• Users struggle to find data they need
• Slows innovation and evolution
• Expensive
Enter the Data Lake Architecture
A data lake is a new and increasingly
popular architecture for storing and
analyzing massive volumes and
heterogeneous types of data.
Benefits of a Data Lake
• All Data in One Place
• Quick Ingest & Transformation
• Bring Functionality to the Data
• Schema on Read
Consideration 1 – S3 for the Data Lake
Consolidate Data / Separate Storage & Compute
• Amazon S3 as the data lake storage tier; not a single analytics
tool like Hadoop or a data warehouse
• Decoupled storage and compute is cheaper and more efficient to
operate
• Decoupled storage and compute allow us to evolve to clusterless
architectures (e.g., AWS Lambda, Amazon Athena, Redshift
Spectrum, AWS Glue, Amazon Macie)
• Do not build data silos in Hadoop or an EDW
• Gain the flexibility to use all the analytics tools in the ecosystem
around S3 & future proof the architecture
An AWS Data Lake Architecture
Serverless Compute
• AWS Glue: ETL & Data Catalog
• AWS Lambda: Trigger-based Code Execution
• Amazon Redshift Spectrum: Fast @ Exabyte scale
• Amazon Athena: Interactive Query
Data Processing
• Amazon EMR: Managed Hadoop Applications
• Amazon Redshift: Petabyte-scale Data Warehousing
Storage
• Amazon S3: Exabyte-scale Object Storage
• AWS Glue Data Catalog: Hive-compatible Metastore
• Nasdaq implements an S3 data lake + Redshift data warehouse
architecture
• Most recent two years of data is kept in the Redshift data
warehouse and snapshotted into S3 for disaster recovery
• Data between two and five years old is kept in S3
• Presto on EMR is used to ad-hoc query data in S3
• Transitioned from an on-premises data warehouse to Amazon
Redshift & S3 data lake architecture
• Over 1,000 tables migrated
• Average daily ingest of over 7B rows
• Migrated off legacy DW to AWS (start to finish) in 7 man-months
• AWS costs were 43% of legacy budget for the same data set
(~1100 tables)
Nasdaq uses Presto on Amazon EMR and Amazon Redshift
as a tiered data lake
Full	Presentation:	https://www.youtube.com/watch?v=LuHxnOQarXU
Why Choose Amazon S3 for Data Lake?
Durable
§ Designed for 11 9s of durability
Secure
§ Multiple Encryption Options
§ Robust/Highly Flexible Access Controls
High performance
§ Multipart upload
§ Range GET
§ Scalable Throughput
Integrated
§ Amazon EMR
§ Amazon Redshift/Spectrum
§ Amazon DynamoDB
§ Amazon Athena
§ Amazon Rekognition
§ AWS Glue
Easy to use
§ Simple REST API
§ AWS SDKs
§ Read-after-create consistency
§ Event notification
§ Lifecycle policies
§ Simple Management Tools
§ Hadoop compatibility
Scalable
§ Store as much as you need
§ Scale storage and compute independently
§ Scale without limits
§ Affordable
Optimize Costs with Data Tiering
• Use HDFS for very frequently accessed
(hot) data
• Use Amazon S3 Standard for frequently
accessed data
• Use Amazon S3 Standard – IA for less
frequently accessed data
• Use Amazon Glacier for archiving cold data
• Use Amazon S3 Analytics (new) for storage class
analysis
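The tiering above can be partly automated with an S3 lifecycle configuration. The sketch below only builds the configuration document in the shape the S3 lifecycle API accepts (for example as the `LifecycleConfiguration` argument of boto3's `put_bucket_lifecycle_configuration`). The `logs/` prefix, the rule ID, and the 30-day and 365-day thresholds are illustrative assumptions, not values from the talk.

```python
import json


def tiering_rule(prefix, ia_after_days=30, glacier_after_days=365):
    """Build one lifecycle rule that moves objects under `prefix` to colder tiers."""
    if not 0 < ia_after_days < glacier_after_days:
        raise ValueError("expected 0 < ia_after_days < glacier_after_days")
    return {
        "ID": f"tier-{prefix.strip('/') or 'all'}",  # hypothetical naming scheme
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [
            # Less frequently accessed data moves to Standard - IA first...
            {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
            # ...and cold data is archived to Glacier later.
            {"Days": glacier_after_days, "StorageClass": "GLACIER"},
        ],
    }


lifecycle_configuration = {"Rules": [tiering_rule("logs/")]}
print(json.dumps(lifecycle_configuration, indent=2))
```

Storage class analysis can then inform the thresholds instead of guessing them up front.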
Implement the right security controls in S3
Security
§ Identity and Access Management (IAM) policies
§ Bucket policies
§ Access Control Lists (ACLs)
§ Private VPC endpoints to Amazon S3
Encryption
§ SSL endpoints
§ Server Side Encryption with S3-managed keys (SSE-S3)
§ Server Side Encryption with customer-provided keys (SSE-C) or AWS KMS-managed keys (SSE-KMS)
§ Client-side Encryption
Compliance
§ Bucket access logs
§ Lifecycle Management Policies
§ Access Control Lists (ACLs)
§ Versioning & MFA deletes
§ Certifications – HIPAA, PCI, SOC 1/2/3 etc.
Manage your data
S3 object Tags
Manage storage based on object tags
• Classify your data
• Tag your objects with key-value pairs
• Write policies once based on the type of data
Discoverability, Lifecycle Policy, Access Control
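As a rough illustration of tagging objects with key-value pairs: S3 takes an object's tag set as a URL-encoded query string (for example in the `Tagging` argument of boto3's `put_object`). The sketch below only builds that string. The tag names and values are hypothetical, and the check reflects the S3 limit of 10 tags per object.

```python
from urllib.parse import urlencode

MAX_TAGS_PER_OBJECT = 10  # S3 limit on tags per object


def encode_tagging(tags):
    """Encode a dict of tags as the URL query string S3 expects for a tag set."""
    if len(tags) > MAX_TAGS_PER_OBJECT:
        raise ValueError(f"S3 allows at most {MAX_TAGS_PER_OBJECT} tags per object")
    return urlencode(tags)


# Hypothetical classification tags for one object.
tagging = encode_tagging({"classification": "HIPAA", "retention": "7 years"})
print(tagging)
```

Policies for access control and lifecycle can then be written once against the tag rather than against individual keys.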
Manage S3 Security
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject"
      ],
      "Resource": "arn:aws:s3:::EXAMPLE-BUCKET-NAME/*",
      "Condition": {"StringEquals": {"s3:ExistingObjectTag/HIPAA": "True"}}
    }
  ]
}
Manage	permissions	with	tags
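A tag-scoped policy like this one can also be generated per bucket and tag. The sketch below is one way to do that, not the presenter's method: it uses the `s3:ExistingObjectTag/<key>` condition key, which is the S3 condition key for matching tags on existing objects. The function name and its parameters are illustrative assumptions.

```python
import json


def tag_scoped_read_policy(bucket, tag_key, tag_value):
    """Allow s3:GetObject only on objects in `bucket` carrying the given tag."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
                "Condition": {
                    "StringEquals": {f"s3:ExistingObjectTag/{tag_key}": tag_value}
                },
            }
        ],
    }


policy_json = json.dumps(
    tag_scoped_read_policy("EXAMPLE-BUCKET-NAME", "HIPAA", "True"), indent=2
)
print(policy_json)
```

Serializing with `json.dumps` also guarantees the document is syntactically valid JSON before it is attached to a user or role.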
Macie:	A	New	Approach
Amazon	Macie
• Understand Your Data: Natural Language Processing (NLP)
• Understand Data Access: Machine Learning
Amazon	Macie	Uses	Machine	Learning
• Understand	behavioral	analytics	to	baseline	normal	behavior
• Train	and	develop	contextualized	alerts	by	understanding	the	
value	of	data	being	accessed
• Context	for	content
Business	Critical	Data	in	Amazon	S3
• Static	website	content
• Source	code
• SSL	certificates,	private	keys
• iOS	and	Android	app	signing	
keys
• Database	backups
• OAuth	and	Cloud	SAAS	API	
Keys
Consideration 2 – Ingest & Catalog
Choose the Right Ingestion Methods
AWS Snowball & Snowmobile
• Accelerate PBs with AWS-provided
appliances
• 50, 80, 100 TB models
• 100PB Snowmobile
AWS Storage Gateway
• Instant hybrid cloud
• Up to 120 MB/s cloud upload rate
(4x improvement), and
Amazon Kinesis Firehose
• Ingest device streams directly
into AWS data stores
AWS Direct Connect
• COLO to AWS
• Use native copy tools
Native/ISV Connectors
• Sqoop, Flume, DistCp
• Commvault, Veritas, etc
Amazon S3 Transfer Acceleration
• Move data up to 300% faster
using AWS’s private network
Amazon Kinesis Firehose
Load massive volumes of streaming data into Amazon S3, Redshift
and Elasticsearch
• Zero administration: Capture and deliver streaming data into Amazon S3, Amazon Redshift, and other
destinations without writing an application or managing infrastructure.
• Direct-to-data store integration: Batch, compress, and encrypt streaming data for delivery into data
destinations in as little as 60 secs using simple configurations.
• Seamless elasticity: Scales automatically to match data throughput without intervention
• Serverless ETL using AWS Lambda - Firehose can invoke your Lambda function to transform
incoming source data.
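The last bullet can be sketched as a Lambda handler. Firehose passes the function a batch of base64-encoded records and expects one result per `recordId`, marked `Ok`, `Dropped`, or `ProcessingFailed`. The enrichment itself (adding a `source` field to JSON payloads) is a hypothetical transformation chosen only to show the contract.

```python
import base64
import json


def lambda_handler(event, context):
    """Firehose transformation: return exactly one result per incoming record."""
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
        except ValueError:
            payload = None  # not valid UTF-8 JSON
        if not isinstance(payload, dict):
            # Hand the record back untouched and let Firehose treat it as failed.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
            continue
        payload["source"] = "firehose-demo"  # hypothetical enrichment field
        line = (json.dumps(payload) + "\n").encode("utf-8")
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": base64.b64encode(line).decode("ascii"),
        })
    return {"records": output}
```

Appending a newline to each transformed record keeps the objects Firehose writes to S3 line-delimited, which most query engines read more easily.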
Capture and submit streaming data → Firehose loads streaming data continuously into Amazon S3, Redshift and Elasticsearch → Analyze streaming data using your favorite BI tools
Catalog Your Data
Data Sources → Put data in S3 → Extract metadata with Lambda → Amazon DynamoDB, Amazon Elasticsearch Service (search capabilities)
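A minimal sketch of the "extract metadata with Lambda" step, assuming the function is triggered by S3 event notifications. It only derives catalog items from the event. Writing them to DynamoDB or Elasticsearch is left out, and the item field names are hypothetical.

```python
import posixpath
from urllib.parse import unquote_plus


def extract_metadata(event):
    """Turn an S3 event notification into a list of catalog items."""
    items = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(s3["object"]["key"])
        items.append({
            "bucket": s3["bucket"]["name"],
            "key": key,
            "size_bytes": s3["object"].get("size", 0),
            "etag": s3["object"].get("eTag"),
            "event_time": record.get("eventTime"),
            "extension": posixpath.splitext(key)[1].lstrip(".").lower(),
        })
    return items


def lambda_handler(event, context):
    items = extract_metadata(event)
    # A real function would persist each item to the catalog store here.
    return {"indexed": len(items), "items": items}
```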
Catalog with AWS Glue
Glue Data	Catalog
Manage	table	metadata	through	a	Hive	metastore	API	or	Hive	SQL.	Supported	by	
tools	like	Hive,	Presto,	Spark	etc.
We	added	a	few	extensions:
§ Search over	metadata	for	data	discovery
§ Connection	info – JDBC	URLs,	credentials
§ Classification for	identifying	and	parsing	files
§ Versioning of	table	metadata	as	schemas	evolve	and	other	metadata	are	updated
Populate	using	Hive	DDL,	bulk	import,	or	automatically	through	Crawlers.
Data Catalog: Crawlers
Automatically discover new data and extract schema definitions
• Detect schema changes and version tables
• Detect Hive style partitions on Amazon S3
Built-in classifiers for popular types; custom classifiers using Grok expressions
Run ad hoc or on a schedule; serverless – only pay when crawler runs
Crawlers	automatically	build	your	Data	Catalog	and	keep	it	in	sync
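To make "Hive style partitions" concrete: they are `name=value` path segments in the object key. The helper below is a hypothetical illustration of how such segments can be read from a key. It is not how Glue crawlers are implemented.

```python
def hive_partitions(key):
    """Return the name=value partition segments found in an S3 object key."""
    partitions = {}
    for segment in key.split("/")[:-1]:  # the last segment is the object name
        name, sep, value = segment.partition("=")
        if sep and name and value:
            partitions[name] = value
    return partitions


print(hive_partitions("logs/year=2017/month=11/day=13/part-0000.parquet"))
```

Keys laid out this way let query engines skip whole partitions when a query filters on `year`, `month`, or `day`.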
AWS Glue Data Catalog
Bring	in	metadata	from	a	variety	of	data	sources	(Amazon	S3,	Amazon	Redshift,	etc.)	into	a	single	categorized	
list	that	is	searchable
Consideration 3 – Optimizing Performance
Getting high Throughput Performance with S3
• S3 can scale to many thousands of requests per second
• Need a good key naming scheme
• Only at scale do you need to consider your key naming scheme
• What are Partitions?
• Why?
• Spread Keys Lexicographically
• Goal of Partitioning is to spread the heat
• Prevent Hot Spots
Distributing key names
• Add randomness to the beginning of the key name…
<my_bucket>/6213-2013_11_13.jpg
<my_bucket>/4653-2013_11_13.jpg
<my_bucket>/9873-2013_11_13.jpg
<my_bucket>/4657-2013_11_13.jpg
<my_bucket>/1256-2013_11_13.jpg
<my_bucket>/8345-2013_11_13.jpg
<my_bucket>/0321-2013_11_13.jpg
<my_bucket>/5654-2013_11_13.jpg
<my_bucket>/2345-2013_11_13.jpg
<my_bucket>/7567-2013_11_13.jpg
<my_bucket>/3455-2013_11_13.jpg
<my_bucket>/4313-2013_11_13.jpg
Partitions:
<my_bucket>/0
<my_bucket>/1
<my_bucket>/2
<my_bucket>/3
<my_bucket>/4
<my_bucket>/5
<my_bucket>/6
<my_bucket>/7
<my_bucket>/8
<my_bucket>/9
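One way to add randomness to the beginning of a key name is to derive the prefix from a hash of the key, so the same name always maps to the same prefix and can be recomputed later. This is a sketch under that assumption. The slide shows numeric prefixes, while this version uses four hex characters from SHA-256.

```python
import hashlib


def prefixed_key(key, prefix_len=4):
    """Prepend a short, deterministic hash prefix to spread keys across partitions."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_len]}-{key}"


for name in ("2013_11_13-a.jpg", "2013_11_13-b.jpg", "2013_11_13-c.jpg"):
    print(prefixed_key(name))
```

Because the prefix is a function of the key, readers that know the original name can rebuild the full key without a lookup table.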
Data Recommendations for EMR and S3
Performance Best Practices:
• Reduce Number of S3 objects by aggregating small files
into larger ones (s3distcp – group-by option)
• Goal: Files >128MB
• Use EMRFS with Consistent View
• Parquet with Snappy compression is emerging as the preferred
combination of file format and compression codec
• Reverse partition scheme to HOUR, DAY, MONTH, YEAR
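The small-file recommendation can be pictured as a planning step: group objects until each group reaches the 128 MB goal, then merge each group into one file. The sketch below only does the grouping, with a simple greedy pass. It is not s3distcp, and the function and file names are hypothetical.

```python
TARGET_BYTES = 128 * 1024 * 1024  # the ">128MB" goal from the list above


def plan_batches(files, target_bytes=TARGET_BYTES):
    """Greedily group (key, size_in_bytes) pairs into batches of at least target_bytes."""
    batches, current, current_size = [], [], 0
    for key, size in files:
        current.append(key)
        current_size += size
        if current_size >= target_bytes:
            batches.append(current)
            current, current_size = [], 0
    if current:
        batches.append(current)  # the final batch may fall short of the target
    return batches


MB = 1024 * 1024
small_files = [(f"part-{i:04d}", 50 * MB) for i in range(7)]
print(plan_batches(small_files))
```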
Use the Right Data Formats
• Pay by the amount of data scanned per query
• Use Compressed Columnar Formats
• Parquet
• ORC
• Easy to integrate with wide variety of tools
Dataset | Size on Amazon S3 | Query Run time | Data Scanned | Cost
Logs stored as Text files | 1 TB | 237 seconds | 1.15 TB | $5.75
Logs stored in Apache Parquet format* | 130 GB | 5.13 seconds | 2.69 GB | $0.013
Savings | 87% less with Parquet | 34x faster | 99% less data scanned | 99.7% cheaper
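The cost column follows directly from the data scanned. The sketch below assumes a price of $5 per TB scanned, which is not stated on the slide but reproduces both cost figures in the table. Treat the price as an assumption and check current pricing before relying on it.

```python
PRICE_PER_TB_USD = 5.00  # assumed per-TB scan price; not stated in the table


def scan_cost_usd(gb_scanned, price_per_tb=PRICE_PER_TB_USD):
    """Cost of a query that scans `gb_scanned` gigabytes (1 TB = 1024 GB)."""
    return gb_scanned / 1024 * price_per_tb


text_cost = scan_cost_usd(1.15 * 1024)  # 1.15 TB scanned for the text logs
parquet_cost = scan_cost_usd(2.69)      # 2.69 GB scanned for the Parquet logs
savings = 1 - parquet_cost / text_cost

print(f"text: ${text_cost:.2f}  parquet: ${parquet_cost:.3f}  savings: {savings:.1%}")
```

Columnar formats cut cost twice: compression shrinks the dataset, and column pruning means a query reads only the columns it references.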
Consideration 4 – Query in Place
Amazon Analytics End to End Architecture
[Architecture diagram. Components: S3, Data Catalog, Glue, Athena, EMR, Redshift Spectrum, Amazon ML / MXNet, RDS, QuickSight, Kinesis, Database Migration Service, IAM, Other Sources]
Explore Your Data Without ETL
• Amazon Athena is an interactive query service
that makes it easy to analyze data directly from
Amazon S3 using Standard SQL
Athena is Serverless
• No Infrastructure or
administration
• Zero Spin up time
• Transparent upgrades
Query Data Directly from Amazon S3
• No loading of data
• Query data in its raw format
• Athena supports multiple data formats
• Text, CSV, TSV, JSON, weblogs, AWS service logs
• Or convert to an optimized form like ORC or Parquet for the best performance
and lowest cost
• No ETL required
• Stream data directly from Amazon S3
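Querying in place starts with a table definition that points at the data's S3 location. The sketch below generates a Hive-style `CREATE EXTERNAL TABLE` statement of the kind Athena accepts. The table name, columns, and bucket path are hypothetical, and identifiers are not validated or escaped.

```python
def external_table_ddl(table, columns, partitions, location, stored_as="PARQUET"):
    """Build a CREATE EXTERNAL TABLE statement for data that stays in S3."""
    cols = ",\n  ".join(f"`{name}` {ctype}" for name, ctype in columns)
    parts = ", ".join(f"`{name}` {ctype}" for name, ctype in partitions)
    ddl = f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n  {cols}\n)\n"
    if parts:
        ddl += f"PARTITIONED BY ({parts})\n"
    ddl += f"STORED AS {stored_as}\nLOCATION '{location}'"
    return ddl


ddl = external_table_ddl(
    "logs",
    columns=[("request_id", "string"), ("bytes", "bigint")],
    partitions=[("year", "int")],
    location="s3://EXAMPLE-BUCKET-NAME/logs/",
)
print(ddl)
```

The statement only registers metadata. No data is loaded or moved, which is what makes the schema-on-read approach possible.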
Familiar Technologies Under the Covers
Presto
• Used for SQL Queries
• In-memory distributed query engine
• ANSI-SQL compatible with extensions
Hive
• Used for DDL functionality
• Complex data types
• Multitude of formats
• Supports data partitioning
What About ETL?
Raw Data Assets Transformed Into Usable Ones
ETL is the most time-consuming part of analytics
ETL (80% of time spent here) → Data Warehousing (Amazon Redshift) → Business Intelligence (Amazon QuickSight)
AWS Glue
Simple, flexible, cost-effective ETL
§ AWS Glue is a fully managed ETL (extract, transform, and load) service
§ Categorize your data, clean it, enrich it and move it reliably between various data stores
§ Once catalogued, your data is immediately searchable and queryable across your data silos
§ Simple and cost-effective
§ Serverless; runs on a fully managed, scale-out Spark environment
Build event-driven ETL pipelines
Amazon Redshift: Fully Managed Petabyte-scale Data Warehouse
• Relational data warehouse
• Massively parallel; Petabyte scale
• Fully managed
• Supports Standard ANSI SQL
• High Performance
• A lot faster, a lot simpler, a lot cheaper
Amazon Redshift Spectrum
Run SQL queries directly against data in S3 using thousands of nodes
• Fast @ exabyte scale
• Elastic & highly available
• On-demand, pay-per-query
• High concurrency: Multiple clusters access same data
• No ETL: Query data in-place using open file formats
• Full Amazon Redshift SQL support
Real-time Analytics
[Architecture diagram. Stages: Stream, process, store. Components: Amazon Kinesis, KCL app, AWS Lambda, Spark Streaming, Amazon EMR, Amazon Kinesis Analytics, Amazon SNS, Amazon ML, Amazon ElastiCache (Redis), Amazon DynamoDB, Amazon RDS, Amazon ES, Amazon S3. Other labels: Notifications, Alerts, App state, Real-time prediction, KPI, Log, Fan out]
Case Study: Clickstream Analysis
Hearst	Corporation	monitors	trending	content	for	over	250	digital	properties	worldwide	
and	processes	more	than	30TB	of	data	per	day,	using	an	architecture	that	includes	
Amazon	Kinesis	and	Spark	running	on	Amazon	EMR.
Store → Process	|	Analyze	 →	 Answers
[Architecture diagram. Components: Clickstream, Amazon Kinesis, Amazon EMR, Amazon Redshift, Elasticsearch]
Interactive & Batch Analytics
[Architecture diagram. Stages: Stream, store, process, Consume. Components: Amazon Kinesis Firehose, Amazon Kinesis Analytics, Amazon S3, Amazon EMR (Hive, Pig, Spark, Presto), Amazon Redshift, Amazon Athena, Amazon ML. Other labels: Files, Batch, Interactive, Batch prediction, Real-time prediction]
[Architecture diagram. Stages: Ingest/Collect, Store, Process/analyze, Consume/visualize, Answers & Insights. Components: Amazon Kinesis, Amazon S3 data lake, Amazon EMR, Amazon Redshift, Amazon DynamoDB. Data: Users, Properties, Agents, Real Time Data. Outputs: Hot Homes, User Profile, Recommendation, Similar Homes, Agent Follow-up, Agent Scorecard, Marketing, A/B Testing, BI / Reporting]
Choose the Right Tools
• Amazon Redshift, Spectrum: Enterprise Data Warehouse
• Amazon EMR: Hadoop/Spark
• Amazon Athena: Clusterless SQL
• AWS Glue: Clusterless ETL
• Amazon Aurora: Managed Relational Database
• Amazon Machine Learning: Predictive Analytics
• Amazon QuickSight: Business Intelligence/Visualization
• Amazon Elasticsearch Service: Elasticsearch
• Amazon ElastiCache: Redis In-memory Datastore
• Amazon DynamoDB: Managed NoSQL Database
• Amazon Rekognition & Amazon Polly: Image Recognition & Text-to-Speech AI APIs
• Amazon Lex: Voice or Text Chatbots
You Don’t Have to Choose
[Architecture diagram: AWS Data Lake. Components: Amazon S3 (Data Lake), Amazon Kinesis Streams & Firehose, Amazon EMR (Hadoop / Spark), Amazon Redshift (Data Warehouse), Amazon DynamoDB (NoSQL Database), Amazon Aurora (Relational Database), Amazon Elasticsearch Service, Amazon ElastiCache (Redis), Amazon Athena (Clusterless SQL Query), AWS Glue (Clusterless ETL), Amazon Machine Learning (Predictive Analytics), AWS Lambda, Spark Streaming on EMR, Apache Storm on EMR, Apache Flink on EMR, Amazon Kinesis Analytics, Any Open Source Tool of Choice on EC2. Other labels: Data Sources, Transactional Data, Streaming Analytics Tools, Serving Tier, Data Science Sandbox, Visualization / Reporting]
Evolve as Needed
• Use S3 as the storage repository for your data lake,
instead of a Hadoop cluster or data warehouse
• Decoupled storage and compute is cheaper and more
efficient to operate
• Decoupled storage and compute allow us to evolve to
clusterless architectures like Athena
• Do not build data silos in Hadoop or the Enterprise DW
• Gain flexibility to use all the analytics tools in the ecosystem
around S3 & future proof the architecture
Pop-up Loft

Weitere ähnliche Inhalte

Was ist angesagt?

Visualization with Amazon QuickSight
Visualization with Amazon QuickSightVisualization with Amazon QuickSight
Visualization with Amazon QuickSightAmazon Web Services
 
Data Catalog & ETL - Glue & Athena
Data Catalog & ETL - Glue & AthenaData Catalog & ETL - Glue & Athena
Data Catalog & ETL - Glue & AthenaAmazon Web Services
 
Migrating Databases to the Cloud: Introduction to AWS DMS - SRV215 - Chicago ...
Migrating Databases to the Cloud: Introduction to AWS DMS - SRV215 - Chicago ...Migrating Databases to the Cloud: Introduction to AWS DMS - SRV215 - Chicago ...
Migrating Databases to the Cloud: Introduction to AWS DMS - SRV215 - Chicago ...Amazon Web Services
 
Building a Data Lake on AWS
Building a Data Lake on AWSBuilding a Data Lake on AWS
Building a Data Lake on AWSGary Stafford
 
ABD315_Serverless ETL with AWS Glue
ABD315_Serverless ETL with AWS GlueABD315_Serverless ETL with AWS Glue
ABD315_Serverless ETL with AWS GlueAmazon Web Services
 
Building a Data Lake on S3 for IoT Workloads
Building a Data Lake on S3 for IoT WorkloadsBuilding a Data Lake on S3 for IoT Workloads
Building a Data Lake on S3 for IoT WorkloadsAmazon Web Services
 
Snowflake Architecture.pptx
Snowflake Architecture.pptxSnowflake Architecture.pptx
Snowflake Architecture.pptxchennakesava44
 
Amazon Aurora and AWS Database Migration Service
Amazon Aurora and AWS Database Migration ServiceAmazon Aurora and AWS Database Migration Service
Amazon Aurora and AWS Database Migration ServiceAmazon Web Services
 
A tour of Amazon Redshift
A tour of Amazon RedshiftA tour of Amazon Redshift
A tour of Amazon RedshiftKel Graham
 
Amazon Athena Capabilities and Use Cases Overview
Amazon Athena Capabilities and Use Cases Overview Amazon Athena Capabilities and Use Cases Overview
Amazon Athena Capabilities and Use Cases Overview Amazon Web Services
 
Best Practices for Building a Data Lake in Amazon S3 and Amazon Glacier, with...
Best Practices for Building a Data Lake in Amazon S3 and Amazon Glacier, with...Best Practices for Building a Data Lake in Amazon S3 and Amazon Glacier, with...
Best Practices for Building a Data Lake in Amazon S3 and Amazon Glacier, with...Amazon Web Services
 

Was ist angesagt? (20)

Introduction to Amazon Redshift
Introduction to Amazon RedshiftIntroduction to Amazon Redshift
Introduction to Amazon Redshift
 
Visualization with Amazon QuickSight
Visualization with Amazon QuickSightVisualization with Amazon QuickSight
Visualization with Amazon QuickSight
 
Building-a-Data-Lake-on-AWS
Building-a-Data-Lake-on-AWSBuilding-a-Data-Lake-on-AWS
Building-a-Data-Lake-on-AWS
 
Data Catalog & ETL - Glue & Athena
Data Catalog & ETL - Glue & AthenaData Catalog & ETL - Glue & Athena
Data Catalog & ETL - Glue & Athena
 
Amazon ElastiCache and Redis
Amazon ElastiCache and RedisAmazon ElastiCache and Redis
Amazon ElastiCache and Redis
 
Migrating Databases to the Cloud: Introduction to AWS DMS - SRV215 - Chicago ...
Migrating Databases to the Cloud: Introduction to AWS DMS - SRV215 - Chicago ...Migrating Databases to the Cloud: Introduction to AWS DMS - SRV215 - Chicago ...
Migrating Databases to the Cloud: Introduction to AWS DMS - SRV215 - Chicago ...
 
Building a Data Lake on AWS
Building a Data Lake on AWSBuilding a Data Lake on AWS
Building a Data Lake on AWS
 
Introduction to Amazon Athena
Introduction to Amazon AthenaIntroduction to Amazon Athena
Introduction to Amazon Athena
 
Amazon QuickSight
Amazon QuickSightAmazon QuickSight
Amazon QuickSight
 
ABD315_Serverless ETL with AWS Glue
ABD315_Serverless ETL with AWS GlueABD315_Serverless ETL with AWS Glue
ABD315_Serverless ETL with AWS Glue
 
Building a Data Lake on S3 for IoT Workloads
Building a Data Lake on S3 for IoT WorkloadsBuilding a Data Lake on S3 for IoT Workloads
Building a Data Lake on S3 for IoT Workloads
 
AWS RDS
AWS RDSAWS RDS
AWS RDS
 
Snowflake Architecture.pptx
Snowflake Architecture.pptxSnowflake Architecture.pptx
Snowflake Architecture.pptx
 
Amazon Aurora and AWS Database Migration Service
Amazon Aurora and AWS Database Migration ServiceAmazon Aurora and AWS Database Migration Service
Amazon Aurora and AWS Database Migration Service
 
Introduction to Amazon Aurora
Introduction to Amazon AuroraIntroduction to Amazon Aurora
Introduction to Amazon Aurora
 
Amazon Aurora: Under the Hood
Amazon Aurora: Under the HoodAmazon Aurora: Under the Hood
Amazon Aurora: Under the Hood
 
A tour of Amazon Redshift
A tour of Amazon RedshiftA tour of Amazon Redshift
A tour of Amazon Redshift
 
Amazon Athena Capabilities and Use Cases Overview
Amazon Athena Capabilities and Use Cases Overview Amazon Athena Capabilities and Use Cases Overview
Amazon Athena Capabilities and Use Cases Overview
 
Best Practices for Building a Data Lake in Amazon S3 and Amazon Glacier, with...
Best Practices for Building a Data Lake in Amazon S3 and Amazon Glacier, with...Best Practices for Building a Data Lake in Amazon S3 and Amazon Glacier, with...
Best Practices for Building a Data Lake in Amazon S3 and Amazon Glacier, with...
 
Amazon Redshift
Amazon Redshift Amazon Redshift
Amazon Redshift
 

Ähnlich wie Building Data Lakes with AWS

Architecting an Open Data Lake for the Enterprise
Architecting an Open Data Lake for the EnterpriseArchitecting an Open Data Lake for the Enterprise
Architecting an Open Data Lake for the EnterpriseAmazon Web Services
 
Storage Data Management: Tools and Templates to Seamlessly Automate and Optim...
Storage Data Management: Tools and Templates to Seamlessly Automate and Optim...Storage Data Management: Tools and Templates to Seamlessly Automate and Optim...
Storage Data Management: Tools and Templates to Seamlessly Automate and Optim...Amazon Web Services
 
Implementazione di una soluzione Data Lake.pdf
Implementazione di una soluzione Data Lake.pdfImplementazione di una soluzione Data Lake.pdf
Implementazione di una soluzione Data Lake.pdfAmazon Web Services
 
STG206_Big Data Data Lakes and Data Oceans
STG206_Big Data Data Lakes and Data OceansSTG206_Big Data Data Lakes and Data Oceans
STG206_Big Data Data Lakes and Data OceansAmazon Web Services
 
Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...
Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...
Cutting to the chase for Machine Learning Analytics Ecosystem & AWS Lake Form...AWS Riyadh User Group
 
Amazon S3 & Amazon Glacier - Object Storage Overview
Amazon S3 & Amazon Glacier - Object Storage OverviewAmazon S3 & Amazon Glacier - Object Storage Overview
Amazon S3 & Amazon Glacier - Object Storage OverviewAmazon Web Services
 
ABD201-Big Data Architectural Patterns and Best Practices on AWS
ABD201-Big Data Architectural Patterns and Best Practices on AWSABD201-Big Data Architectural Patterns and Best Practices on AWS
ABD201-Big Data Architectural Patterns and Best Practices on AWSAmazon Web Services
 
Building a Modern Data Platform on AWS
Building a Modern Data Platform on AWSBuilding a Modern Data Platform on AWS
Building a Modern Data Platform on AWSAmazon Web Services
 
Build Data Lakes & Analytics on AWS: Patterns & Best Practices
Build Data Lakes & Analytics on AWS: Patterns & Best PracticesBuild Data Lakes & Analytics on AWS: Patterns & Best Practices
Build Data Lakes & Analytics on AWS: Patterns & Best PracticesAmazon Web Services
 
Build Data Lakes and Analytics on AWS: Patterns & Best Practices
Build Data Lakes and Analytics on AWS: Patterns & Best PracticesBuild Data Lakes and Analytics on AWS: Patterns & Best Practices
Build Data Lakes and Analytics on AWS: Patterns & Best PracticesAmazon Web Services
 
21st Century Analytics with Zopa
21st Century Analytics with Zopa21st Century Analytics with Zopa
21st Century Analytics with ZopaAmazon Web Services
 
Migrating your traditional Data Warehouse to a Modern Data Lake
Migrating your traditional Data Warehouse to a Modern Data LakeMigrating your traditional Data Warehouse to a Modern Data Lake
Migrating your traditional Data Warehouse to a Modern Data LakeAmazon Web Services
 
Building Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSBuilding Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSAmazon Web Services
 
Building Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSBuilding Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSAmazon Web Services
 
Using AWS CloudTrail to Enhance Governance and Compliance of Amazon S3 - DEV3...
Using AWS CloudTrail to Enhance Governance and Compliance of Amazon S3 - DEV3...Using AWS CloudTrail to Enhance Governance and Compliance of Amazon S3 - DEV3...
Using AWS CloudTrail to Enhance Governance and Compliance of Amazon S3 - DEV3...Amazon Web Services
 
Data Lake Implementation: Processing and Querying Data in Place (STG204-R1) -...
Data Lake Implementation: Processing and Querying Data in Place (STG204-R1) -...Data Lake Implementation: Processing and Querying Data in Place (STG204-R1) -...
Data Lake Implementation: Processing and Querying Data in Place (STG204-R1) -...Amazon Web Services
 
Building+your+Data+Project+on+AWS+-+Luke+Anderson.pdf
Building+your+Data+Project+on+AWS+-+Luke+Anderson.pdfBuilding+your+Data+Project+on+AWS+-+Luke+Anderson.pdf
Building+your+Data+Project+on+AWS+-+Luke+Anderson.pdfSasikumarPalanivel3
 
Building+your+Data+Project+on+AWS+-+Luke+Anderson.pdf
Building+your+Data+Project+on+AWS+-+Luke+Anderson.pdfBuilding+your+Data+Project+on+AWS+-+Luke+Anderson.pdf
Building+your+Data+Project+on+AWS+-+Luke+Anderson.pdfsaidbilgen
 

Ähnlich wie Building Data Lakes with AWS (20)

Securing Your Big Data on AWS
Securing Your Big Data on AWSSecuring Your Big Data on AWS
Securing Your Big Data on AWS
 
Architecting an Open Data Lake for the Enterprise
Architecting an Open Data Lake for the EnterpriseArchitecting an Open Data Lake for the Enterprise
Architecting an Open Data Lake for the Enterprise
 
Storage Data Management: Tools and Templates to Seamlessly Automate and Optim...
Storage Data Management: Tools and Templates to Seamlessly Automate and Optim...Storage Data Management: Tools and Templates to Seamlessly Automate and Optim...
Storage Data Management: Tools and Templates to Seamlessly Automate and Optim...
 
Implementazione di una soluzione Data Lake.pdf
Implementazione di una soluzione Data Lake.pdfImplementazione di una soluzione Data Lake.pdf
Implementazione di una soluzione Data Lake.pdf
 
STG206_Big Data Data Lakes and Data Oceans
Building Data Lakes with AWS