SlideShare ist ein Scribd-Unternehmen logo
1 von 27
Downloaden Sie, um offline zu lesen
1
todd.reichmuth@snowflake.com Solution Engineering
Data Marriage of the Lake and Warehouse
(why you need both)
2
Data Lake
a place to store ALL (not only structured) data in native format that can inexpensively grow without limits and
can exist until the end of time
Data Warehouse
a place to store “enhanced” data that is integrated, governed, cleansed, and well defined. Typically the data is
multi-dimensional, historical, and non-volatile.
Data Mart
a subset of the data warehouse, oriented to a specific business line or team.
Big Data
Data characterized by the 3 (sometimes up to 5) V’s.
Volume (high), Velocity (near real time), Variety (everything), Veracity (trustworthy), Value (actionable insights)
Usually satisfied by Data Lake, sometimes by Data Warehouse
Let’s agree on terminology
2
3
In the beginning, ”analytics” were run against operational systems. This led to single purpose
decision support systems. It became apparent that this was not sustainable. Siloed
environments caused multiple interpretations of the truth, duplicated effort, high cost, and slow
time to value.
An enterprise solution was needed.
• 1979 – Bill Inmon introduces the term Data Warehouse
• 1983 – Teradata introduces DBC/1012 specifically designed for decision support
• 1986 – Ralph Kimball founds Red Brick Systems
• 1988 – IBM publishes an article introducing the term “Business Data Warehouse”
• 1992 – Bill Inmon publishes a book ”Building the Data Warehouse”
• 1996 – Ralph Kimball publishes a book “The Data Warehouse Toolkit”
Evolution of the Data warehouse
4
Data Warehouse architecture in the early days
OLTP
System
ERP
CRM
Data
Warehouse
Flat
Files
ETL
OLAP
Reporting
Mining
5
Data warehouse architecture evolution
OLTP
System
ERP
CRM
Data
Warehouse
Flat
Files
ETL
Dashboarding
Reporting
Analytics
Data Science
External Users
Data
Warehouse
???
Data
Warehouse
???
Web &
Mobile
External
Sources
IoT
ELT
Stream
MobileXML / JSON
High Volume
Low Latency
Data velocity increasing (machine generated + IoT) and in new shapes and sizes. Silo’ing of
data came back in different ways. Less tolerance for long analytical cycles. The existing
data warehouse could not support this evolution.
Structured
High Performance
6
2010 – James Dixon
CTO Pentaho
“If you think of a
datamart as a store of
bottled water –
cleansed and packaged
and structured for easy
consumption – the data
lake is a large body of
water in a more natural
state. The contents of
the data lake stream in
from a source to fill the
lake, and various users
of the lake can come to
examine, dive in, or
take samples.”
Emergence of the Data lake
7
Big Data V’s: Warehouse & Lake comparison
7
Data warehouse Data lake
Volume
Expensive for large data,
capacity is often capped
Low cost and high capacity
Variety
Structured
Schema on write
Structured, semi-structured,
unstructured
Schema on read
Velocity
Batch ETL/ELT
Hourly or daily latency
Streaming
Near real time
Veracity
Highly curated, governed.
Source of truth
Raw, uncleansed data.
Source of fact
Value
Measure things, detect trends
Q: What has worked well?
Find relationships, make
predictions
Q: What will work well?
8
Absolutely not (in most cases) !!!
They each fulfill a different purpose. Each is optimized for different users,
skills, workloads, and service level agreements.
The data lake is an “exploratory” layer with low data latency and no
preconceived boundaries on how data will be used. Perfect for data scientists
and those willing to do their own data preparation and discovery.
A data warehouse is a trusted and curated layer for making “bet your career”
decisions. Loved by business professionals. Built for easy consumption and
ready to publish.
Can the data lake eliminate the warehouse?
9
• Certain users need/want data in a way the warehouse can best provide
• Integrated
• Subject oriented
• Curated
• Time variant
• Relational
• Ready to consume
• Data may need to be enhanced to support particular user needs
• Some business functions will require high performance, high concurrency
• Enterprise level historical data needs to be carefully managed
Why a Lake should not replace a Warehouse
10
Modern data management landscape
OLTP
System
ERP
CRM
Data
Warehouse
Flat
Files
ETL
Dashboarding
Reporting
Analytics
Data Science
External Users
Data Lake
Web &
Mobile
External
Sources
IoT
ELT
Stream
Mobile
Fill the lake with all data and curate to the warehouse to satisfy known requirements
11
Building the modern data management environment
Start filling the lake
Low cost pure
capture
environment
Built separate from
core IT
Data is stored raw
Stage 1
Landing zone for
raw data
Stage 2
Data science
environment
Stage 3
Offload for data
warehouse
Stage 4
Key component of
data operations
Actively used as
platform for
experimentation
Test and learn
environment
Data scientist given
access
Governance = “just
enough”
Large datasets, low
intensity workload
moved to lake
Needle in the
haystack searches
that do not require
indexes
Lake becomes a
source for EDW
Lake becomes core
part of data
management
Attempt to enforce
strong governance
and tight security
12
Data Warehouse
• SMP or MPP
• Popularity of the appliance
• In some cases Hadoop
• Oracle, Teradata, Netezza, Vertica, Redshift,
etc.
Data Lake
• File system with abstraction
• Hadoop, Amazon S3, Azure Data Lake Storage,
Google Cloud Storage
Technology tools of choice
1
Can a single technology platform fulfill the
needs of both data management frameworks?
13
Lake
• Cheap and plentiful storage
• Ability to handle all data types
• Optimized for ingestion
Warehouse
• Supports commonly used tools
• Handles high concurrency
• Reliable, Available, Stable
• Data protection (secure & recoverable)
Desirable capabilities
• Low administration
• Support for multi-tenant/concurrency
• Easily scale up/down, in/out
• Low total cost of ownership
• No paying for unused capacity
• Sharing of data with external entities
Technical requirements to satisfy both frameworks
1414
Modern data management architecture
SMP Shared-disk MPP Shared-nothing
Single Server
Resizing cluster requires redistributing data.
Impact to user.
Reads/Writes at the same time
cripples the system
Each cluster requires its own copy of data
(ex: test/dev, HA)
Replication requires
additional hardware
Processes needed to maintain
sort and distribution for performance
Resizing cluster requires redistributing data.
Impact to user.
Traditional architectures
© 2018 Snowflake Computing Inc. All Rights Reserved.
Cloud based, Multi-cluster,
shared data
Centralized, elastic storage
Multiple, Independent compute clusters
Unlimited storage and compute, scaled
independently gives flexibility
15
Cloud-based architecture for big data analytics
Data Warehousing Cloud Service
Separated Storage, Compute, &
Metadata, Global Services
One Compute cluster,
multiple Databases
One Database, multiple
Compute clusters
Compute clusters
scale independently
Multi-Cluster compute farms
support high concurrency
Workloads don't compete
with each other
Compute
cluster
Compute
cluster
Compute
cluster
Compute
cluster
Compute
cluster
S
S
M
XXLL
Database
Compute
cluster
Sales
Users
Data
Scientist
Finance
Users
ETL &
ELT
Services
Test/Dev
Users
Marketing
Users
XL
s
Metadata
Azure Data
Factory
1616
Relational database extended to semi-structured data
> SELECT … FROM …Semi-structured data
(JSON, Avro, XML, Parquet, ORC)
Structured data
(e.g., CSV, TSV, …)
Storage optimization
Transparent discovery and storage optimization of
repeated elements
Query optimization
Full database optimization for queries
on semi-structured data
+
1717© 2018 Snowflake Computing Inc. All Rights Reserved.
Streaming and Batch Loading
Data Sources
Web Servers
Cloud Storage
On Premises
Data Lake
BI Tools
Apps
Source Raw Data
Augmented Data
Snowflake
Database
S3
S3
Transformation
Transformation
Kafka/Kinesis
COPY
PUT + COPY
Snowpipe
REST API
Azure
Blob
GCS
Azure
Blob
18
A better way to share data
Data Providers Data Consumers
No data movement
Share with unlimited number
of consumers Live access
Data consumers immediately see
all updates
Ready to use
Consumers can immediately
start querying
1919
Near Zero Administration
NO Infrastructure
NO Tuning
NO Upgrades/Patches
NO Indexing
NO Storage worries
NO Reorg/Vacuuming
NO Partitioning
NO Stats gathering
NO Workload mgmt.
NO Manual backups
2020
Comprehensive data protection
Protection against infrastructure failures
All data transparently & synchronously replicated 3+ ways
across multiple datacenters
Long-term data protection
Zero-copy clones + optional export to S3 enable
user-managed data copies
Protection against corruption & user errors
“Time travel” feature enables instant roll-back to any point in time
during chosen retention window
New data Modified data
T0 T1 T2
SELECT * FROM T0…
Daily
Weekly
© 2018 Snowflake Computing Inc. All Rights Reserved.
2121© 2018 Snowflake Computing Inc. All Rights Reserved.
“Time travel” for data
Previous versions of data
automatically retained
Retention period selected by customer
Accessed via SQL extensions
AS OF for selection
CLONE to recreate
UNDROP recovers from accidental deletion
New data Modified data
T0 T1 T2
> SELECT * FROM mytable AS OF T0
> SELECT * FROM mytable AS OF T1
2222
Zero-copy data cloning
Instant data cloning operations
Databases, schema, tables, etc
Metadata-only operation
Modified data stored as new blocks
Unmodified data stored only once
No data copying required, no cost!
Instant test/dev environments
Test code on your entire production dataset
Swap tables into production when ready
CLONE 1
MODIFIED DATA
© 2018 Snowflake Computing Inc. All Rights Reserved.
23
Data Science
Reporting
Ad-hoc analytics
ETL and
Processing
Pay for what you actually use…down to the second
Morning Noon Night
Workload
Morning Noon Night
Workload
Morning Noon Night
Workload
Morning Noon Night
Workload
S 2S
Autoscaling Multi-cluster Warehouse
L
Autosuspend/Autoresume
M
Autosuspend/Autoresume
M 2M
Autoscaling Multi-cluster Warehouse
2424
Access control
Role-based access
control model
Granular privileges on
all objects & actions
Authentication
Embedded
multi-factor authentication
Federated
authentication available
Data encryption
All data encrypted,
always, end-to-end
Encryption keys
managed automatically
External validation
Certified against
enterprise-class requirements
Protected by industrial-strength security
© 2018 Snowflake Computing Inc. All Rights Reserved.
25
Big Data V’s: Warehouse & Lake comparison
2
Data warehouse Data lake
Volume
Expensive for large data,
capacity is often capped
Low cost and high capacity
Variety
Structured
Schema on write
Structured, semi-structured,
unstructured
Schema on read
Velocity
Batch ETL/ELT
Hourly or daily latency
Streaming
Near real time
Veracity
Highly curated, governed.
Source of truth
Raw, uncleansed data.
Source of fact
Value
Measure things, detect trends
Q: What has worked well?
Find relationships, make
predictions
Q: What will work well?
26
Data Management Modernization using the Cloud
Modern Data
Sources
Streaming
Data
Platforms
Compute for Data
Scientists
Compute for Ad
Hoc Users
Compute for BI
& Analytics
Data lake
Data warehouse
Native JDBC /
ODBC
Connector
Web UI
SQL Analysts
Data science
users and tools
Native JDBC /
ODBC
Connector
BI & analytics tools
Compute for Data
Loading
Compute for Data
Transformations
Blob
Extract, Load & Transform
Tools (ELT)
Extract, Transform & Load
Tools (ETL)
Traditional
Data Sources
Key
NO ETL
ETL / ELT
Relational
Databases
Database Replication
Compute for loading
Transformed Data
© 2018 Snowflake Computing Inc. All Rights Reserved.
Thank you!

Weitere ähnliche Inhalte

Was ist angesagt?

Building the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake AnalyticsBuilding the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake AnalyticsKhalid Salama
 
Modernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureModernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureDatabricks
 
Pipelines and Packages: Introduction to Azure Data Factory (DATA:Scotland 2019)
Pipelines and Packages: Introduction to Azure Data Factory (DATA:Scotland 2019)Pipelines and Packages: Introduction to Azure Data Factory (DATA:Scotland 2019)
Pipelines and Packages: Introduction to Azure Data Factory (DATA:Scotland 2019)Cathrine Wilhelmsen
 
Snowflake + Power BI: Cloud Analytics for Everyone
Snowflake + Power BI: Cloud Analytics for EveryoneSnowflake + Power BI: Cloud Analytics for Everyone
Snowflake + Power BI: Cloud Analytics for EveryoneAngel Abundez
 
A 30 day plan to start ending your data struggle with Snowflake
A 30 day plan to start ending your data struggle with SnowflakeA 30 day plan to start ending your data struggle with Snowflake
A 30 day plan to start ending your data struggle with SnowflakeSnowflake Computing
 
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...Cathrine Wilhelmsen
 
Snowflake Data Loading.pptx
Snowflake Data Loading.pptxSnowflake Data Loading.pptx
Snowflake Data Loading.pptxParag860410
 
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...DataScienceConferenc1
 
Modern Data architecture Design
Modern Data architecture DesignModern Data architecture Design
Modern Data architecture DesignKujambu Murugesan
 
Building Dynamic Pipelines in Azure Data Factory (SQLSaturday Oslo)
Building Dynamic Pipelines in Azure Data Factory (SQLSaturday Oslo)Building Dynamic Pipelines in Azure Data Factory (SQLSaturday Oslo)
Building Dynamic Pipelines in Azure Data Factory (SQLSaturday Oslo)Cathrine Wilhelmsen
 
Snowflake for Data Engineering
Snowflake for Data EngineeringSnowflake for Data Engineering
Snowflake for Data EngineeringHarald Erb
 
Actionable Insights with AI - Snowflake for Data Science
Actionable Insights with AI - Snowflake for Data ScienceActionable Insights with AI - Snowflake for Data Science
Actionable Insights with AI - Snowflake for Data ScienceHarald Erb
 
Azure DataBricks for Data Engineering by Eugene Polonichko
Azure DataBricks for Data Engineering by Eugene PolonichkoAzure DataBricks for Data Engineering by Eugene Polonichko
Azure DataBricks for Data Engineering by Eugene PolonichkoDimko Zhluktenko
 
Five Things to Consider About Data Mesh and Data Governance
Five Things to Consider About Data Mesh and Data GovernanceFive Things to Consider About Data Mesh and Data Governance
Five Things to Consider About Data Mesh and Data GovernanceDATAVERSITY
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Databricks
 
Snowflake: The Good, the Bad, and the Ugly
Snowflake: The Good, the Bad, and the UglySnowflake: The Good, the Bad, and the Ugly
Snowflake: The Good, the Bad, and the UglyTyler Wishnoff
 
Free Training: How to Build a Lakehouse
Free Training: How to Build a LakehouseFree Training: How to Build a Lakehouse
Free Training: How to Build a LakehouseDatabricks
 
Data platform modernization with Databricks.pptx
Data platform modernization with Databricks.pptxData platform modernization with Databricks.pptx
Data platform modernization with Databricks.pptxCalvinSim10
 

Was ist angesagt? (20)

Building the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake AnalyticsBuilding the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake Analytics
 
Elastic Data Warehousing
Elastic Data WarehousingElastic Data Warehousing
Elastic Data Warehousing
 
Snowflake Datawarehouse Architecturing
Snowflake Datawarehouse ArchitecturingSnowflake Datawarehouse Architecturing
Snowflake Datawarehouse Architecturing
 
Modernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureModernizing to a Cloud Data Architecture
Modernizing to a Cloud Data Architecture
 
Pipelines and Packages: Introduction to Azure Data Factory (DATA:Scotland 2019)
Pipelines and Packages: Introduction to Azure Data Factory (DATA:Scotland 2019)Pipelines and Packages: Introduction to Azure Data Factory (DATA:Scotland 2019)
Pipelines and Packages: Introduction to Azure Data Factory (DATA:Scotland 2019)
 
Snowflake + Power BI: Cloud Analytics for Everyone
Snowflake + Power BI: Cloud Analytics for EveryoneSnowflake + Power BI: Cloud Analytics for Everyone
Snowflake + Power BI: Cloud Analytics for Everyone
 
A 30 day plan to start ending your data struggle with Snowflake
A 30 day plan to start ending your data struggle with SnowflakeA 30 day plan to start ending your data struggle with Snowflake
A 30 day plan to start ending your data struggle with Snowflake
 
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
 
Snowflake Data Loading.pptx
Snowflake Data Loading.pptxSnowflake Data Loading.pptx
Snowflake Data Loading.pptx
 
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
 
Modern Data architecture Design
Modern Data architecture DesignModern Data architecture Design
Modern Data architecture Design
 
Building Dynamic Pipelines in Azure Data Factory (SQLSaturday Oslo)
Building Dynamic Pipelines in Azure Data Factory (SQLSaturday Oslo)Building Dynamic Pipelines in Azure Data Factory (SQLSaturday Oslo)
Building Dynamic Pipelines in Azure Data Factory (SQLSaturday Oslo)
 
Snowflake for Data Engineering
Snowflake for Data EngineeringSnowflake for Data Engineering
Snowflake for Data Engineering
 
Actionable Insights with AI - Snowflake for Data Science
Actionable Insights with AI - Snowflake for Data ScienceActionable Insights with AI - Snowflake for Data Science
Actionable Insights with AI - Snowflake for Data Science
 
Azure DataBricks for Data Engineering by Eugene Polonichko
Azure DataBricks for Data Engineering by Eugene PolonichkoAzure DataBricks for Data Engineering by Eugene Polonichko
Azure DataBricks for Data Engineering by Eugene Polonichko
 
Five Things to Consider About Data Mesh and Data Governance
Five Things to Consider About Data Mesh and Data GovernanceFive Things to Consider About Data Mesh and Data Governance
Five Things to Consider About Data Mesh and Data Governance
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
Snowflake: The Good, the Bad, and the Ugly
Snowflake: The Good, the Bad, and the UglySnowflake: The Good, the Bad, and the Ugly
Snowflake: The Good, the Bad, and the Ugly
 
Free Training: How to Build a Lakehouse
Free Training: How to Build a LakehouseFree Training: How to Build a Lakehouse
Free Training: How to Build a Lakehouse
 
Data platform modernization with Databricks.pptx
Data platform modernization with Databricks.pptxData platform modernization with Databricks.pptx
Data platform modernization with Databricks.pptx
 

Ähnlich wie The Marriage of the Data Lake and the Data Warehouse and Why You Need Both

Designing modern dw and data lake
Designing modern dw and data lakeDesigning modern dw and data lake
Designing modern dw and data lakepunedevscom
 
Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)James Serra
 
Data Lakes: A Logical Approach for Faster Unified Insights
Data Lakes: A Logical Approach for Faster Unified InsightsData Lakes: A Logical Approach for Faster Unified Insights
Data Lakes: A Logical Approach for Faster Unified InsightsDenodo
 
Data Lakes: A Logical Approach for Faster Unified Insights (ASEAN)
Data Lakes: A Logical Approach for Faster Unified Insights (ASEAN)Data Lakes: A Logical Approach for Faster Unified Insights (ASEAN)
Data Lakes: A Logical Approach for Faster Unified Insights (ASEAN)Denodo
 
ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris...
ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris...ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris...
ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris...DATAVERSITY
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)James Serra
 
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...DATAVERSITY
 
Using Data Platforms That Are Fit-For-Purpose
Using Data Platforms That Are Fit-For-PurposeUsing Data Platforms That Are Fit-For-Purpose
Using Data Platforms That Are Fit-For-PurposeDATAVERSITY
 
Data lake-itweekend-sharif university-vahid amiry
Data lake-itweekend-sharif university-vahid amiryData lake-itweekend-sharif university-vahid amiry
Data lake-itweekend-sharif university-vahid amirydatastack
 
Got data?… now what? An introduction to modern data platforms
Got data?… now what?  An introduction to modern data platformsGot data?… now what?  An introduction to modern data platforms
Got data?… now what? An introduction to modern data platformsJamesAnderson599331
 
Data Virtualization: An Essential Component of a Cloud Data Lake
Data Virtualization: An Essential Component of a Cloud Data LakeData Virtualization: An Essential Component of a Cloud Data Lake
Data Virtualization: An Essential Component of a Cloud Data LakeDenodo
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureDATAVERSITY
 
What is Data Lake and its Benefits?
What is Data Lake and its Benefits?What is Data Lake and its Benefits?
What is Data Lake and its Benefits?V2Soft
 
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization Denodo
 
Vikram Andem Big Data Strategy @ IATA Technology Roadmap
Vikram Andem Big Data Strategy @ IATA Technology Roadmap Vikram Andem Big Data Strategy @ IATA Technology Roadmap
Vikram Andem Big Data Strategy @ IATA Technology Roadmap IT Strategy Group
 
Difference between Database vs Data Warehouse vs Data Lake
Difference between Database vs Data Warehouse vs Data LakeDifference between Database vs Data Warehouse vs Data Lake
Difference between Database vs Data Warehouse vs Data Lakejeetendra mandal
 
Logical Data Lakes: From Single Purpose to Multipurpose Data Lakes (APAC)
Logical Data Lakes: From Single Purpose to Multipurpose Data Lakes (APAC)Logical Data Lakes: From Single Purpose to Multipurpose Data Lakes (APAC)
Logical Data Lakes: From Single Purpose to Multipurpose Data Lakes (APAC)Denodo
 
Strata San Jose 2017 - Ben Sharma Presentation
Strata San Jose 2017 - Ben Sharma PresentationStrata San Jose 2017 - Ben Sharma Presentation
Strata San Jose 2017 - Ben Sharma PresentationZaloni
 
So You Want to Build a Data Lake?
So You Want to Build a Data Lake?So You Want to Build a Data Lake?
So You Want to Build a Data Lake?David P. Moore
 

Ähnlich wie The Marriage of the Data Lake and the Data Warehouse and Why You Need Both (20)

Designing modern dw and data lake
Designing modern dw and data lakeDesigning modern dw and data lake
Designing modern dw and data lake
 
Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)
 
Data Lakes: A Logical Approach for Faster Unified Insights
Data Lakes: A Logical Approach for Faster Unified InsightsData Lakes: A Logical Approach for Faster Unified Insights
Data Lakes: A Logical Approach for Faster Unified Insights
 
Data Lakes: A Logical Approach for Faster Unified Insights (ASEAN)
Data Lakes: A Logical Approach for Faster Unified Insights (ASEAN)Data Lakes: A Logical Approach for Faster Unified Insights (ASEAN)
Data Lakes: A Logical Approach for Faster Unified Insights (ASEAN)
 
ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris...
ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris...ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris...
ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris...
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)
 
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha...
 
Benefits of a data lake
Benefits of a data lake Benefits of a data lake
Benefits of a data lake
 
Using Data Platforms That Are Fit-For-Purpose
Using Data Platforms That Are Fit-For-PurposeUsing Data Platforms That Are Fit-For-Purpose
Using Data Platforms That Are Fit-For-Purpose
 
Data lake-itweekend-sharif university-vahid amiry
Data lake-itweekend-sharif university-vahid amiryData lake-itweekend-sharif university-vahid amiry
Data lake-itweekend-sharif university-vahid amiry
 
Got data?… now what? An introduction to modern data platforms
Got data?… now what?  An introduction to modern data platformsGot data?… now what?  An introduction to modern data platforms
Got data?… now what? An introduction to modern data platforms
 
Data Virtualization: An Essential Component of a Cloud Data Lake
Data Virtualization: An Essential Component of a Cloud Data LakeData Virtualization: An Essential Component of a Cloud Data Lake
Data Virtualization: An Essential Component of a Cloud Data Lake
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
 
What is Data Lake and its Benefits?
What is Data Lake and its Benefits?What is Data Lake and its Benefits?
What is Data Lake and its Benefits?
 
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
 
Vikram Andem Big Data Strategy @ IATA Technology Roadmap
Vikram Andem Big Data Strategy @ IATA Technology Roadmap Vikram Andem Big Data Strategy @ IATA Technology Roadmap
Vikram Andem Big Data Strategy @ IATA Technology Roadmap
 
Difference between Database vs Data Warehouse vs Data Lake
Difference between Database vs Data Warehouse vs Data LakeDifference between Database vs Data Warehouse vs Data Lake
Difference between Database vs Data Warehouse vs Data Lake
 
Logical Data Lakes: From Single Purpose to Multipurpose Data Lakes (APAC)
Logical Data Lakes: From Single Purpose to Multipurpose Data Lakes (APAC)Logical Data Lakes: From Single Purpose to Multipurpose Data Lakes (APAC)
Logical Data Lakes: From Single Purpose to Multipurpose Data Lakes (APAC)
 
Strata San Jose 2017 - Ben Sharma Presentation
Strata San Jose 2017 - Ben Sharma PresentationStrata San Jose 2017 - Ben Sharma Presentation
Strata San Jose 2017 - Ben Sharma Presentation
 
So You Want to Build a Data Lake?
So You Want to Build a Data Lake?So You Want to Build a Data Lake?
So You Want to Build a Data Lake?
 

Mehr von Adaryl "Bob" Wakefield, MBA

Mehr von Adaryl "Bob" Wakefield, MBA (9)

Big Data Technologies and Why They Matter To R Users
Big Data Technologies and Why They Matter To R UsersBig Data Technologies and Why They Matter To R Users
Big Data Technologies and Why They Matter To R Users
 
Getting Started With Sparklyr
Getting Started With SparklyrGetting Started With Sparklyr
Getting Started With Sparklyr
 
Perfecting Your Streaming Skills with Spark and Real World IoT Data
Perfecting Your Streaming Skills with Spark and Real World IoT DataPerfecting Your Streaming Skills with Spark and Real World IoT Data
Perfecting Your Streaming Skills with Spark and Real World IoT Data
 
Reproducible Research with R, The Tidyverse, Notebooks, and Spark
Reproducible Research with R, The Tidyverse, Notebooks, and SparkReproducible Research with R, The Tidyverse, Notebooks, and Spark
Reproducible Research with R, The Tidyverse, Notebooks, and Spark
 
Architecting Your First Big Data Implementation
Architecting Your First Big Data ImplementationArchitecting Your First Big Data Implementation
Architecting Your First Big Data Implementation
 
Hands On: Introduction to the Hadoop Ecosystem
Hands On: Introduction to the Hadoop EcosystemHands On: Introduction to the Hadoop Ecosystem
Hands On: Introduction to the Hadoop Ecosystem
 
Hands On: Linux survival skills
Hands On: Linux survival skillsHands On: Linux survival skills
Hands On: Linux survival skills
 
In Memory Databases: A Real Time Analytics Solution
In Memory Databases: A Real Time Analytics SolutionIn Memory Databases: A Real Time Analytics Solution
In Memory Databases: A Real Time Analytics Solution
 
Not Just Another Overview of Apache Hadoop
Not Just Another Overview of Apache HadoopNot Just Another Overview of Apache Hadoop
Not Just Another Overview of Apache Hadoop
 

Kürzlich hochgeladen

Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxolyaivanovalion
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...amitlee9823
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightDelhi Call girls
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxMohammedJunaid861692
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...amitlee9823
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxolyaivanovalion
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023ymrp368
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionfulawalesam
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxfirstjob4
 

Kürzlich hochgeladen (20)

(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptx
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 

The Marriage of the Data Lake and the Data Warehouse and Why You Need Both

  • 1. 1 todd.reichmuth@snowflake.com Solution Engineering Data Marriage of the Lake and Warehouse (why you need both)
  • 2. 2 Data Lake a place to store ALL (not only structured) data in native format that can inexpensively grow without limits and can exist until the end of time Data Warehouse a place to store “enhanced” data that is integrated, governed, cleansed, and well defined. Typically the data is multi-dimensional, historical, and non-volatile. Data Mart a subset of the data warehouse, oriented to a specific business line or team. Big Data Data characterized by the 3 (sometimes up to 5) V’s. Volume (high), Velocity (near real time), Variety (everything), Veracity (trustworthy), Value (actionable insights) Usually satisfied by Data Lake, sometimes by Data Warehouse Let’s agree on terminology 2
  • 3. 3 In the beginning, ”analytics” were run against operational systems. This led to single purpose decision support systems. It became apparent that this was not sustainable. Siloed environments caused multiple interpretations of the truth, duplicated effort, high cost, and slow time to value. An enterprise solution was needed. • 1979 – Bill Inmon introduces the term Data Warehouse • 1983 – Teradata introduces DBC/1012 specifically designed for decision support • 1986 – Ralph Kimball founds Red Brick Systems • 1988 – IBM publishes an article introducing the term “Business Data Warehouse” • 1992 – Bill Inmon publishes a book ”Building the Data Warehouse” • 1996 – Ralph Kimball publishes a book “The Data Warehouse Toolkit” Evolution of the Data warehouse
  • 4. 4 Data Warehouse architecture in the early days OLTP System ERP CRM Data Warehouse Flat Files ETL OLAP Reporting Mining
  • 5. 5 Data warehouse architecture evolution OLTP System ERP CRM Data Warehouse Flat Files ETL Dashboarding Reporting Analytics Data Science External Users Data Warehouse ??? Data Warehouse ??? Web & Mobile External Sources IoT ELT Stream MobileXML / JSON High Volume Low Latency Data velocity increasing (machine generated + IoT) and in new shapes and sizes. Silo’ing of data came back in different ways. Less tolerance for long analytical cycles. The existing data warehouse could not support this evolution. Structured High Performance
  • 6. 6 2010 – James Dixon CTO Pentaho “If you think of a datamart as a store of bottled water – cleansed and packaged and structured for easy consumption – the data lake is a large body of water in a more natural state. The contents of the data lake stream in from a source to fill the lake, and various users of the lake can come to examine, dive in, or take samples.” Emergence of the Data lake
  • 7. 7 Big Data V’s: Warehouse & Lake comparison 7 Data warehouse Data lake Volume Expensive for large data, capacity is often capped Low cost and high capacity Variety Structured Schema on write Structured, semi-structured, unstructured Schema on read Velocity Batch ETL/ELT Hourly or daily latency Streaming Near real time Veracity Highly curated, governed. Source of truth Raw, uncleansed data. Source of fact Value Measure things, detect trends Q: What has worked well? Find relationships, make predictions Q: What will work well?
  • 8. 8 Absolutely not (in most cases) !!! They each fulfill a different purpose. Each is optimized for different users, skills, workloads, and service level agreements. The data lake is an “exploratory” layer with low data latency and no preconceived boundaries on how data will be used. Perfect for data scientists and those willing to do their own data preparation and discovery. A data warehouse is a trusted and curated layer for making “bet your career” decisions. Loved by business professionals. Built for easy consumption and ready to publish. Can the data lake eliminate the warehouse?
  • 9. 9 • Certain users need/want data in a way the warehouse can best provide • Integrated • Subject oriented • Curated • Time variant • Relational • Ready to consume • Data may need to be enhanced to support particular user needs • Some business functions will require high performance, high concurrency • Enterprise level historical data needs to be carefully managed Why a Lake should not replace a Warehouse
  • 10. 10 Modern data management landscape OLTP System ERP CRM Data Warehouse Flat Files ETL Dashboarding Reporting Analytics Data Science External Users Data Lake Web & Mobile External Sources IoT ELT Stream Mobile Fill the lake with all data and curate to the warehouse to satisfy known requirements
  • 11. 11 Building the modern data management environment Start filling the lake Low cost pure capture environment Built separate from core IT Data is stored raw Stage 1 Landing zone for raw data Stage 2 Data science environment Stage 3 Offload for data warehouse Stage 4 Key component of data operations Actively used as platform for experimentation Test and learn environment Data scientist given access Governance = “just enough” Large datasets, low intensity workload moved to lake Needle in the haystack searches that do not require indexes Lake becomes a source for EDW Lake becomes core part of data management Attempt to enforce strong governance and tight security
  • 12. 12 Data Warehouse • SMP or MPP • Popularity of the appliance • In some cases Hadoop • Oracle, Teradata, Netezza, Vertica, Redshift, etc. Data Lake • File system with abstraction • Hadoop, Amazon S3, Azure Data Lake Storage, Google Cloud Storage Technology tools of choice 1 Can a single technology platform fulfill the needs of both data management frameworks?
  • 13. 13 Lake • Cheap and plentiful storage • Ability to handle all data types • Optimized for ingestion Warehouse • Supports commonly used tools • Handles high concurrency • Reliable, Available, Stable • Data protection (secure & recoverable) Desirable capabilities • Low administration • Support for multi-tenant/concurrency • Easily scale up/down, in/out • Low total cost of ownership • No paying for unused capacity • Sharing of data with external entities Technical requirements to satisfy both frameworks
  • 14. 1414 Modern data management architecture SMP Shared-disk MPP Shared-nothing Single Server Resizing cluster requires redistributing data. Impact to user. Reads/Writes at the same time cripples the system Each cluster requires its own copy of data (ex: test/dev, HA) Replication requires additional hardware Processes needed to maintain sort and distribution for performance Resizing cluster requires redistributing data. Impact to user. Traditional architectures © 2018 Snowflake Computing Inc. All Rights Reserved. Cloud based, Multi-cluster, shared data Centralized, elastic storage Multiple, Independent compute clusters Unlimited storage and compute, scaled independently gives flexibility
  • 15. 15 Cloud-based architecture for big data analytics Data Warehousing Cloud Service Separated Storage, Compute, & Metadata, Global Services One Compute cluster, multiple Databases One Database, multiple Compute clusters Compute clusters scale independently Multi-Cluster compute farms support high concurrency Workloads don't compete with each other Compute cluster Compute cluster Compute cluster Compute cluster Compute cluster S S M XXLL Database Compute cluster Sales Users Data Scientist Finance Users ETL & ELT Services Test/Dev Users Marketing Users XL s Metadata Azure Data Factory
  • 16. 1616 Relational database extended to semi-structured data > SELECT … FROM …Semi-structured data (JSON, Avro, XML, Parquet, ORC) Structured data (e.g., CSV, TSV, …) Storage optimization Transparent discovery and storage optimization of repeated elements Query optimization Full database optimization for queries on semi-structured data +
  • 17. 1717© 2018 Snowflake Computing Inc. All Rights Reserved. Streaming and Batch Loading Data Sources Web Servers Cloud Storage On Premises Data Lake BI Tools Apps Source Raw Data Augmented Data Snowflake Database S3 S3 Transformation Transformation Kafka/Kinesis COPY PUT + COPY Snowpipe REST API Azure Blob GCS Azure Blob
  • 18. 18 A better way to share data Data Providers Data Consumers No data movement Share with unlimited number of consumers Live access Data consumers immediately see all updates Ready to use Consumers can immediately start querying
  • 19. 1919 Near Zero Administration NO Infrastructure NO Tuning NO Upgrades/Patches NO Indexing NO Storage worries NO Reorg/Vacuuming NO Partitioning NO Stats gathering NO Workload mgmt. NO Manual backups
  • 20. 2020 Comprehensive data protection Protection against infrastructure failures All data transparently & synchronously replicated 3+ ways across multiple datacenters Long-term data protection Zero-copy clones + optional export to S3 enable user-managed data copies Protection against corruption & user errors “Time travel” feature enables instant roll-back to any point in time during chosen retention window New data Modified data T0 T1 T2 SELECT * FROM T0… Daily Weekly © 2018 Snowflake Computing Inc. All Rights Reserved.
  • 21. 2121© 2018 Snowflake Computing Inc. All Rights Reserved. “Time travel” for data Previous versions of data automatically retained Retention period selected by customer Accessed via SQL extensions AS OF for selection CLONE to recreate UNDROP recovers from accidental deletion New data Modified data T0 T1 T2 > SELECT * FROM mytable AS OF T0 > SELECT * FROM mytable AS OF T1
  • 22. 2222 Zero-copy data cloning Instant data cloning operations Databases, schema, tables, etc Metadata-only operation Modified data stored as new blocks Unmodified data stored only once No data copying required, no cost! Instant test/dev environments Test code on your entire production dataset Swap tables into production when ready CLONE 1 MODIFIED DATA © 2018 Snowflake Computing Inc. All Rights Reserved.
  • 23. 23 Data Science Reporting Ad-hoc analytics ETL and Processing Pay for what you actually use…down to the second Morning Noon Night Workload Morning Noon Night Workload Morning Noon Night Workload Morning Noon Night Workload S 2S Autoscaling Multi-cluster Warehouse L Autosuspend/Autoresume M Autosuspend/Autoresume M 2M Autoscaling Multi-cluster Warehouse
  • 24. 2424 Access control Role-based access control model Granular privileges on all objects & actions Authentication Embedded multi-factor authentication Federated authentication available Data encryption All data encrypted, always, end-to-end Encryption keys managed automatically External validation Certified against enterprise-class requirements Protected by industrial-strength security © 2018 Snowflake Computing Inc. All Rights Reserved.
  • 25. 25 Big Data V’s: Warehouse & Lake comparison 2 Data warehouse Data lake Volume Expensive for large data, capacity is often capped Low cost and high capacity Variety Structured Schema on write Structured, semi-structured, unstructured Schema on read Velocity Batch ETL/ELT Hourly or daily latency Streaming Near real time Veracity Highly curated, governed. Source of truth Raw, uncleansed data. Source of fact Value Measure things, detect trends Q: What has worked well? Find relationships, make predictions Q: What will work well?
  • 26. 26 Data Management Modernization using the Cloud Modern Data Sources Streaming Data Platforms Compute for Data Scientists Compute for Ad Hoc Users Compute for BI & Analytics Data lake Data warehouse Native JDBC / ODBC Connector Web UI SQL Analysts Data science users and tools Native JDBC / ODBC Connector BI & analytics tools Compute for Data Loading Compute for Data Transformations Blob Extract, Load & Transform Tools (ELT) Extract, Transform & Load Tools (ETL) Traditional Data Sources Key NO ETL ETL / ELT Relational Databases Database Replication Compute for loading Transformed Data
  • 27. © 2018 Snowflake Computing Inc. All Rights Reserved. Thank you!