Weitere ähnliche Inhalte Ähnlich wie Delivering Data Democratization in the Cloud with Snowflake (20) Mehr von Kent Graziano (20) Kürzlich hochgeladen (20) Delivering Data Democratization in the Cloud with Snowflake1. © 2019 Snowflake Inc. All Rights Reserved
DELIVERING DATA
DEMOCRATIZATION
IN THE CLOUD
WITH SNOWFLAKE
KENT GRAZIANO, CHIEF TECHNICAL EVANGELIST I @KentGraziano
2. © 2019 Snowflake Inc. All Rights Reserved
• Chief Technical Evangelist, Snowflake Computing
• Oracle ACE Director, Alumni (DW/BI)
• OakTable Network
• Blogger – The Data Warrior
• Certified Data Vault Master and DV 2.0 Practitioner
• Former Member: Boulder BI Brain Trust (#BBBT)
• Member: DAMA Houston & DAMA International
• Data Architecture and Data Warehouse Specialist
• 30+ years in IT
• 25+ years of Oracle-related work
• 20+ years of data warehousing experience
• Author & Co-Author of a bunch of books (Amazon)
• Past-President of ODTUG and Rocky Mountain Oracle User Group
My Bio
3. © 2019 Snowflake Inc. All Rights Reserved
3 years in stealth + 4 years GA
3
1700+ employees
Over 3400
customers today
Over $1.4B in venture
funding from leading
investors
First customers
2014, general
availability 2015
Founded 2012 by
industry veterans
with over 120
database patents
Queries processed in
Snowflake per day:
406 million
Largest single
table:
46.7 trillion rows
Largest number of
tables single DB:
3.1 million!
Single customer
most data:
> 19.3PB
Single customer
most users:
> 11,000
Fun facts:
5. © 2019 Snowflake Inc. All Rights Reserved
AGENDA
Data Challenges
What is a Cloud Data Platform?
Real World Use Cases
9. It’s not the data itself
it’s how you take full advantage of the insight it provides
Web ERP3rd party apps Enterprise apps IoTMobile
10. © 2020 Snowflake Inc. All Rights Reserved
Most firms don’t consistently turn data into action
Source: Forrester
All Possible data All Possible Action
of firms
aspire to be
data-driven.
73%
of firms are
good at
turning data
into action.
29%
11. © 2020 Snowflake Inc. All Rights Reserved
JOURNEY TO A CLOUD DATA PLATFORM
On Premises
EDW
Data Lake,
Hadoop
1st Gen Cloud
EDW
Cloud Data
Platform
All Data
All Users
Fast Answers
SQL Database
Value
of
Data
Time
12. © 2020 Snowflake Inc. All Rights Reserved 12
DATA
SOURCES
OLTP DATABASES
ENTERPRISE
APPLICATIONS
THIRD-PARTY
WEB/LOG DATA
IoT
DATA
CONSUMERS
DATA MONETIZATION
OPERATIONAL
REPORTING
AD HOC ANALYSIS
REAL-TIME ANALYTICS
SNOWFLAKE CLOUD DATA PLATFORM
13. © 2020 Snowflake Inc. All Rights Reserved
Rethink
transformation
with robust
and integrated
data pipelines
Simplify and
accelerate your
data lake with
one platform for
all your data
Develop apps
with fast and
scalable analytics
that delight
customers
Deliver
analytics at
scale with
a modern
data warehouse
Empower your
ecosystem
to securely
collaborate
across all data
Simplify and
accelerate
machine learning
and artificial
intelligence
ONE PLATFORM, ONE COPY OF DATA,
MANY WORKLOADS
15. © 2020 Snowflake Inc. All Rights Reserved
Delivering compelling results
Simpler data pipeline
Replace noSQL database with Snowflake
for storing & transforming JSON event data
Faster analytics
Replace on-premises data warehouse with
Snowflake for analytics workload
Significantly lower cost
Improved performance while adding new
workloads--at a fraction of the cost
noSQL data base:
8 hours to prepare data
Snowflake:
1.5 minutes
Data warehouse
appliance: 20+ hours
Snowflake:
45 minutes
Data warehouse
appliance:
$5M + to expand
Snowflake:
added 2 new workloads for $50K
16. © 2020 Snowflake Inc. All Rights Reserved
Simplifying the Data pipeline
EDW
Game Event Data
Internal Data
Third-party Data
Analysts & BI
Tools
Staging
noSQL
Database
Existing
EDW
Game Event Data
Internal Data
Third-party Data
Analysts & BI
Tools
SnowflakeKinesis
Cleanse Normalize Transform
11-24 hours 15 minutes
Scenario
Complex pipeline slowing down analytics
Pain Points
• Fragile data pipeline
• Delays in getting updated data
• High cost and complexity
• Limited data granularity
Solution
Send data from Kinesis to S3 to Snowflake with
schemaless ingestion and easy querying
Snowflake Value
• >50x faster data updates
• 80% lower costs
• Nearly eliminated pipeline failures
• Able to retain full data granularity
17. © 2020 Snowflake Inc. All Rights Reserved
JSON Support with SQL – No Programming
17
> SELECT … FROM
…
Semi-structured data
(JSON, Avro, XML, Parquet, ORC)
Structured data
(e.g., CSV, TSV, …)
Storage optimization
Transparent discovery and storage optimization
of repeated elements
Query optimization
Full database optimization for queries
on semi-structured data
+
select v:lastName::string as last_name
from json_demo;
18. © 2020 Snowflake Inc. All Rights Reserved
Load data and run queries,
we do all the rest
Zero infrastructure and admin costs
Secure and highly available
Fully managed with no knobs
or tuning required
No indexes, distribution keys,
partitioning, or vacuuming
18
Automatic Query Optimization
20. © 2020 Snowflake Inc. All Rights Reserved
DATA ANALYTICS AT EXTREME SCALE
20
Scenario Pain Points
Snowflake Value
120x faster –
from 20 hours
to 45 minutes
Ad-Hoc analytics
available to all users
in minutes
Deployed in a week
during the busiest
time of year
• Financial institution with
a huge focus on security
• Overburdened staff
• Business needs to run monthly reports
that span 10 years of historical data
• No way to analyze
semi-structured data
• Quoted $500,000 to replace their
existing hardware appliance
• 20+ hours to run reports
• Could not continue to scale
• Users unable to query while
performing ETL
Profiles
Ratings
Loses
Microstrategy
1-2 days
1-2 mins
Legacy DW
21. © 2020 Snowflake Inc. All Rights Reserved
SNOWFLAKE - SECURE BY DESIGN
21
Snowflake
Operational Controls
• NIST 800-53
• SOC2 Type 2
• HIPAA
• PCI
• FedRAMP
Access
• All communication
secured & encrypted
• TLS 1.2 encryption
in both trusted and
untrusted networks
• IP Whitelisting
Authentication
• Password Policy
enforcement
• Multifactor Authentication
• SAML 2.0 support for
Federated Authentication
Application
• Flexible user
management
• Role-based
access control for
granular control
• RBAC for data
and actions
Data
• Encrypted at rest
• Hierarchical key model
rooted in Cloud HSM
• Automatic key rotation
• Time Travel 1-90 days
• Tri-Secret Secure
• Query statement
encryption
Infrastructure
• AWS, Azure Physical Security
• AWS, Azure Redundancy
• Regional Data Centers
▪ US
▪ EU
▪ AP
22. © 2020 Snowflake Inc. All Rights Reserved
DATA SECURITY & GOVERNANCE
Ensure data privacy and restrict access to source data
22
SECURE VIEWS
Cell-level security in
multi-tenant situations
SECURE UDFs
Prevent consumers
from seeing
underlying data
SECURE JOINS
Securely mask data
during join operations
SECURE
MATERIALIZED VIEWS
Secure,
precomputed views
24. © 2020 Snowflake Inc. All Rights Reserved
Previous bottlenecks
Slow to modify due to so many layers
Legacy database limited size and scalability
Limited datasets to increase stability
Limited enterprise access
Expensive to build and maintain
SITUATION BEFORE: DATA WAREHOUSE
24
Data
Warehouse
Landing
Database
BI
Analytics
Tools
Source
Systems
25. © 2020 Snowflake Inc. All Rights Reserved
Previous bottlenecks
Not scalable
Too many points of failure
Developers and Change Control still required
Expensive to build and maintain
Too many users
SITUATION BEFORE: DATA SERVICES
25
ODataETL Repo
Database
BI
Analytics
Tools
Transform
and Store
Data Quality
Tool
Host
API
Source
Systems
26. © 2020 Snowflake Inc. All Rights Reserved
Previous bottlenecks
Opaque solutions
Difficult to support
Duplication of effort from business users
Proprietary solution and code
SITUATION BEFORE: SELF-SERVICE HADOOP
26
SQL
Datamart
HDFS
BI
Analytics
Tools
Transform
and Store
ETL Tool
HostSource
Systems
27. © 2020 Snowflake Inc. All Rights Reserved
SNOWFLAKE IMPLEMENTATION
27
Centralized raw and enterprise data
Simplified data transformation
Existing tools integrate seamlessly
Scales for the workload
Can handle all of the enterprise’s data
Source
Systems
Attunity
Replicate
Attunity (M)
Azure Blob
OData (L) SAS (M) Power BI (L)Databricks (XL)
Excel SAS Power BIDatabricks
28. © 2020 Snowflake Inc. All Rights Reserved
“SNOWFLAKE IS HERE
TO STAY, RIGHT?”
IMMEDIATE WINS
DATA MANAGEMENT
PROFESSIONAL
EHS Greenhouse Gas Report
Regulatory report originally ran in 48 hours
in legacy db, reduced to minutes
Data Prep for loading
15 hour process to prepare data for
loading into subsurface application
reduced to 30 minutes (mostly the data
load time)
Massive Scale
2,000 simultaneous and random queries
against a 40 billion record table all return in
under 10 seconds
29. © 2020 Snowflake Inc. All Rights Reserved
Separation of Storage & Compute
New multi-cluster, shared data architecture
29
Multi-Tenant Snowflake Deployment
Semi-structured
data
Structured data
Data Science (L)
Database Services
Management Optimization Security Availability Transactions Metadata
Easy-to-use Service with No Management
Unlimited Compute Clusters to Serve Every Use
Flexible Cloud Storage For All Kinds of Data
SQL Queries (S)
Data Transformation (M)
BI/Reporting
Multi-Cluster (M)
30. © 2020 Snowflake Inc. All Rights Reserved
How Snowflake transformed and simplified our data
30
Have all of our data in one place
Separated workloads to never
have downtime
Easy adoption because everything
looks like a traditional database
Give users a consistent location for
data, regardless of tool or use case
Tremendous scalability for users
of a large enterprise
Comparatively low cost
31. © 2020 Snowflake Inc. All Rights Reserved.
“2000 Simultaneous Queries
against a 40 Billion record table
all return in under 10 seconds.”
Larry Querbach,
Enterprise Data Architect
33. © 2020 Snowflake Inc. All Rights Reserved
SITUATION
• 13 data warehouses, some
end-of-life
• Huge data silos caused
ambiguity and reporting
disparities
• Poor consumer insight
VALUE
• Diverse data ingested for
analytics, data science
• Improved agility, scalability
with compute on demand
• Much faster – 6 hour queries
running in 3 seconds
• In depth customer knowledge
and improved services
34. © 2020 Snowflake Inc. All Rights Reserved
SITUATION / PAIN
• Painfully slow analytics cycles
• Limited ability to answer
complex questions
• Inability to provide business
continuity and ensure security
SOLUTION / VALUE
• Hundreds of newly
empowered analysts
• Improved scalability and
faster query times
• Guaranteed security and data
availability, 24/7/365
35. © 2020 Snowflake Inc. All Rights Reserved.
“In terms of processing data,
we are 10X faster and
60% cheaper.”
Rene Greiner,
VP of Data Integration
37. © 2020 Snowflake Inc. All Rights Reserved
What does a Cloud Data
Platform Provide?
Cost effective storage and analysis of
GBs, TBs, or even PB’s
Lightning fast query performance
Continuous data loading without impacting
query performance
Unlimited user concurrency
Full SQL relational support of both
structured and semi-structured data
Support for the tools and languages you
already useJava
>_
Scripting
38. Big Data does not have to
equal Big Effort
Web ERP3rd party
apps
Enterprise
apps
IoTMobile
39. © 2020 Snowflake Inc. All Rights Reserved
Kent Graziano
Snowflake Computing
Kent.graziano@snowflake.com
On Twitter @KentGraziano
More info at
http://snowflake.com
Visit my blog at
http://kentgraziano.com
Contact Info