Modern Data Warehouse Fundamentals Part 2
3. © Cloudera, Inc. All rights reserved.
SPEAKERS
Greg Rahn
Director of Product Management
grahn@cloudera.com
Santosh Kumar
Senior Product Manager
skumar@cloudera.com
4.
BEYOND DATA WAREHOUSING
The modern platform for machine learning and analytics optimized for the cloud
ANALYTICS, DATA SCIENCE, DATA ENGINEERING, OPERATIONAL DATABASE, EXTENSIBLE SERVICES
Core Services: DATA CATALOG, SECURITY, GOVERNANCE, WORKLOAD MANAGEMENT, INGEST & REPLICATION
Storage Services: Amazon S3, Microsoft ADLS, HDFS, KUDU
5. Confidential-Restricted – For Discussion Purposes Only
WITH A CLOUD NATIVE OPTION - ALTUS DW
● Quick time to value - no software or clusters to manage
● Bring the warehouse to the data with zero-copy simplicity
● Use your security policies with your data - no proprietary stacks
● Apply enterprise governance to transient workloads
● Shared data experience with SDX
● Optimized for Azure & AWS
DATA WAREHOUSE
GOVERNANCE, SECURITY
ALTUS CONTROL PLANE, LIFECYCLE MANAGEMENT
MULTI-CLOUD: Amazon S3, Microsoft ADLS
MULTI-CLOUD PAAS SOLUTION
6.
KOMATSU MINING: Optimize Machine Performance
CHALLENGES
Create an Industrial IoT (IIoT) solution for optimizing mining equipment utility and build better next-generation products
Current system couldn’t handle:
• Scale of IoT data
• Demand for new users and use cases
• 30 TB/month data growth
RESULTS
• 2X increase in production hours on key equipment
• Design next-generation equipment: environmentally smarter, more productive, at lower cost
• Meet or exceed all KPIs: “Deliver all of the data with less complexity and significant cost savings”
SOLUTION
Cloud-based IIoT analytics for a full view of mining operations
• Quickly and easily analyze huge volume and variety (time-series, sensor, event, and more) of data
• More use cases and users: “democratizing analytics for different user groups”
• Scale quickly and easily in the cloud
https://www.cloudera.com/more/news-and-blogs/press-releases/2017-11-15-komatsu-helps-improve-mining-performance.html
8.
BUSINESS DRIVERS FOR DATA WAREHOUSING IN THE CLOUD
• Scalability (51%)
• Flexibility (41%)
• Business agility / reduce IT involvement (39%)
• Cost (37%)
*BI, Analytics, and the Cloud: Strategies for Business Agility, TDWI, 2016
9.
TECHNOLOGY DRIVERS IN THE CLOUD
1. Cost-effective, scalable storage in a single, shared repository (Amazon S3, Azure Data Lake Storage)
2. Access to limitless utility-based compute (Amazon EC2, Azure Virtual Machines)
3. Open and modular architectures (Apache Impala)
10.
KEY STAKEHOLDERS
KNOWLEDGE WORKERS
• Instant, self-service access to data and resources
• Application performance
• Job-oriented tools
• Choice
INFRASTRUCTURE TEAM
• Secure, controlled provisioning
• Predictable costs
• Systems-oriented tools
• Standards and portability
DATA TEAM
• Advance strategic initiatives
• Link analytics to business
• Reduce admin burden
• Integrated solutions
11.
KEY BENEFITS
Modern Data Warehouse
High-Performance SQL
Self-Service Flexibility
Cost-Effective Scale
Open Architecture for SQL and Beyond
12.
ADVANTAGES OF A MODERN DATA WAREHOUSE
Data Flexibility
• Iterative modeling and self-service accessibility
• Portability: No proprietary formats or storage lock-in
Go Beyond SQL
• Consolidate data silos with an open architecture
• Shared data across SQL and non-SQL workloads
High-Performance SQL and …
Cost-Effective Scalability
• Elastic scale in any environment
• Cloud-native integration for optimized pay-per-use costs
• Proven at massive scale
Hybrid Decoupled Architecture
• Runs across multi-cloud & on-prem for zero lock-in
• Multi-storage over S3, ADLS, HDFS, Kudu, Isilon, etc.
Shared Data
13.
COMMON CLOUD ANALYTIC PATTERNS
Shared Object Storage (Cloud)
DATA ENGINEERING: ETL/ELT
DATA ENGINEERING: ETL/ELT
DATA WAREHOUSE: Ad Hoc / Exploratory
DATA WAREHOUSE: Sales Reporting
DATA WAREHOUSE: Marketing Dashboard
Only pay for what you need, when you need it
• Transient workloads
• Contention-free isolation
Self-service flexibility at any scale
• Elastic scale on-demand
• Multi-tenant isolation
14.
COMMON CLOUD ANALYTIC PATTERNS
Beware of data silos without shared metadata
15.
INTELLIGENT DATA CONTEXT - SHARED DATA EXPERIENCE
16.
Stateful Context, Shared Experience
INTELLIGENT DATA CONTEXT
17.
With Cloudera Altus Data Warehouse and SDX running on Microsoft ADLS, we were able to establish our Telekom Data Intelligence Hub: a trusted, fully governed platform and ecosystem where our users are empowered to exchange and analyse data and develop multi-function, data-driven applications easier and securely. - Sven Löffler, BizDev Executive
18.
CUSTOMER STORIES
Couldn’t solve predictive maintenance goals
EDH delivers:
• Ingest telematics in real-time
• Machine learning to predict failures
• Analytics to minimize service downtime
• Protect sensitive and regulated data
• Consistent security and governance
• “SDX is the key to making that happen” - CIO
Drug R&D too slow and expensive
EDH delivers:
• Self-service analytics
• Meet HIPAA regulations
• >5 petabytes from 2100 silos
• Using Spark, Impala, & Search side-by-side
• With Anaconda, AtScale, Cloudwick, Kinetica, StreamSets, Tamr, Trifacta, & Zoomdata
19.
CHALLENGES WITH MULTIPLE DEPLOYMENT MODELS
How are you managing your Data Warehouse today?
How do you share datasets?
Do you copy things around?
How do you audit accesses across copies?
Have you lost track of the source of truth?
How do you propagate access permissions on copies?
Have you ended up with multiple silos in the process?
20.
BUSINESS IMPACT OF SILOED SYSTEMS
Lost Revenue
Inaccurate and duplicated data directly impact the bottom line of 88% of all companies.
Limits of Legacy
Legacy systems keep organizations from taking advantage of data-driven opportunities.
Costly Compliance
By 2023, regulated organizations will spend over 5% of revenue on compliance.
21.
Cloudera Enterprise with SDX provides maximum cloud flexibility, enabling enterprise IT to control workloads anywhere, managed any way, and deliver the shared data experience that business and data professionals demand.
22.
Of course! We have our internal EDH cluster. That would be easy!
Charles: With increased focus on … business insights.. dashboard … FAST...
Charles, SVP, Emerging Businesses
Mulyadi, Data Scientist
Pipelines! Workloads! Queries! More pipelines. More workloads! More queries! Even more….
Alan, Internal EDH Data Platform Manager
Adding more workloads to Internal EDH clusters is risky and adds uncertainty to existing SLA-sensitive workloads.
Maybe a separate cluster with “required” data?
Why not!!
23.
Data Migration Cost Grows Exponentially
Internal EDH, Emerging Businesses Analytics, Sales Analytics, Support, Product, Training, Finance
Diagram labels: 37, 15, 47, 27, 27, 15
No single source of truth
Synchronization overhead
Stale data
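One rough way to see why copy-based sharing becomes costly: if every cluster keeps its own copy and any cluster may need data from any other, the number of point-to-point copy jobs rises much faster than the number of clusters. The sketch below uses a deliberately simple model in which each ordered pair of clusters is one potential copy job. The model and the function name `copy_paths` are illustrative assumptions and are not taken from the slide; under this particular model the growth is quadratic.

```python
def copy_paths(n_clusters: int) -> int:
    """Potential directed copy jobs when every cluster may copy to every other one.

    Hypothetical model for illustration only: one job per ordered pair of clusters.
    """
    if n_clusters < 0:
        raise ValueError("n_clusters must be non-negative")
    return n_clusters * (n_clusters - 1)


# Cluster names as they appear on the slide's diagram.
clusters = [
    "Internal EDH",
    "Emerging Businesses Analytics",
    "Sales Analytics",
    "Support",
    "Product",
    "Training",
    "Finance",
]

# Print how the number of potential copy jobs grows as clusters are added.
for n in range(1, len(clusters) + 1):
    print(f"{n} cluster(s): {copy_paths(n)} potential copy jobs")
```

With a single shared copy of the data, none of these copy jobs is needed, which is the contrast the next slide draws.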
24.
Embrace unification of data and data context via SDX
Internal EDH, Emerging Businesses Analytics, Sales Analytics, Support, Product, Training, Finance
25.
MODERN DATA WAREHOUSE REQUIREMENTS
Modeling: Transform to make it easy to combine datasets
Governance: Audit trail, lineage, etc.
Authorization: Ensure the right people have the right permissions
Preparation: Cleanse, filter, standardize to enable wider acceptance
Ingestion: Collect data from various sources in varied formats
Schema, Permissions, Gov artifacts
26.
DATA WAREHOUSE IN CLOUD DEPLOYMENTS
Data Sources, Cloud Store, ETL Tool, Cloud Store, Analytics DB, BI Tools
“Glueing Tools”
27.
THREE THINGS TO REMEMBER ABOUT SDX
• SDX is a differentiated capability offered only by Cloudera
• SDX enables a shared data experience across multiple deployment models
• SDX provides the shared data context essential for a global enterprise, including schema, access permissions, and governance
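The idea of a shared data context can be sketched in a few lines: schema, access permissions, and an audit trail live in one place, and every cluster consults that single copy instead of keeping its own. This is a toy model under stated assumptions, not Cloudera's SDX API; the names `SharedContext`, `Cluster`, `define_table`, `grant`, and `read` are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SharedContext:
    """Hypothetical shared catalog: one copy of schemas, grants, and audit log."""
    schemas: dict[str, list[str]] = field(default_factory=dict)
    grants: set[tuple[str, str]] = field(default_factory=set)  # (user, table)
    audit_log: list[tuple[str, str, str, bool]] = field(default_factory=list)

    def define_table(self, table: str, columns: list[str]) -> None:
        self.schemas[table] = list(columns)

    def grant(self, user: str, table: str) -> None:
        self.grants.add((user, table))


@dataclass
class Cluster:
    """Hypothetical compute cluster that holds no metadata of its own."""
    name: str
    context: SharedContext

    def read(self, user: str, table: str) -> list[str]:
        # Permission check and audit entry both go through the shared context.
        allowed = table in self.context.schemas and (user, table) in self.context.grants
        self.context.audit_log.append((self.name, user, table, allowed))
        if not allowed:
            raise PermissionError(f"{user} may not read {table}")
        return self.context.schemas[table]


# Usage: define and grant once, then read from two independent clusters.
context = SharedContext()
context.define_table("sales", ["region", "amount"])
context.grant("ana", "sales")

reporting = Cluster("reporting", context)
exploration = Cluster("exploration", context)
print(reporting.read("ana", "sales"))
print(exploration.read("ana", "sales"))
print(len(context.audit_log), "audited accesses in one shared log")
```

Because both clusters point at the same context object, a grant or schema change made once is visible everywhere, and every access lands in a single audit log rather than one log per copy.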
29.
CLOUDERA ALTUS DATA WAREHOUSE
BRING THE WAREHOUSE TO YOUR DATA
✓ No software to install or clusters to manage
✓ Get multiple workloads up and running within minutes
✓ Enable self-service across your organization
✓ Fully secure, automated, with identity preserved across functions
✓ Optimized for both AWS and Azure
✓ Pay only for what you use
MULTI FUNCTION: DATA ENGINEERING, DATA WAREHOUSE, DATA SCIENCE*
DATA CATALOG, GOVERNANCE, SECURITY
CONTROL PLANE, LIFECYCLE MANAGEMENT
MULTI CLOUD: Amazon S3, Microsoft ADLS
* roadmap
30.
ALTUS DATA WAREHOUSE
The first data warehouse cloud service to bring the warehouse to the data, delivering instant analytics to anyone
For business analysts:
• Run reports and queries at any time, with fast, predictable performance
• Get self-service analytic access on demand, using the same preferred tools and SQL skills
• Power reports, BI, exploratory analytics, and ad hoc queries, all over the same shared data and schemas
• Extend insights to data science teams, data engineers, production applications, and more
For IT:
• Eliminate data movement across workloads with lock-in-free open architecture
• Provision isolated resources as they’re needed, with just a few clicks
• Easily manage unlimited tenants, and maintain consistent security and governance with the Shared Data Experience
• Support transient and long-running workloads with elastic scale, all with a single view into cloud costs and usage
31.
MODERN DATA WAREHOUSING WITH ALTUS
Elastic and decoupled by design
Altus Data Warehouse: Sales & Marketing BI
Altus Data Engineering: Data Prep / ELT
Altus Data Warehouse: Exploratory Queries
Shared data in object store (S3 or ADLS)
Metadata, Security, Governance, Workload Management, Ingest & Replication
32.
WHAT’S MISSING FROM YOUR CLOUD DATA WAREHOUSE?
Does data need to be copied/loaded into the database?
Is upfront modeling or a proprietary data format required?
Can you scale compute and storage independently?
What’s required to grow/shrink your cluster?
Is data shared across workloads or do non-SQL workloads require different data silos?
Are object stores a native storage layer?
Can the database span on-prem and multiple cloud environments?
Flexibility
Hybrid
Scale
Beyond SQL
Shared Data
34.
CLOUDERA ENTERPRISE
The modern platform for machine learning and analytics optimized for the cloud
DATA WAREHOUSE, DATA SCIENCE, DATA ENGINEERING, OPERATIONAL DATABASE, EXTENSIBLE SERVICES
Core Services: DATA CATALOG, SECURITY, GOVERNANCE, WORKLOAD MANAGEMENT, INGEST & REPLICATION
Storage Services: Amazon S3, Microsoft ADLS, HDFS, KUDU
35.
CLOUDERA ALTUS
Data warehousing in the cloud – multiple clusters over single shared data
DATA WAREHOUSE: Discovery (raw)
DATA WAREHOUSE: Exploration (curated)
DATA ENGINEERING: Prep - New Report
DATA WAREHOUSE: BI/New Reporting
DATA SCIENCE: Model Build/Test
DATA ENGINEERING: Prep – Known
DATA WAREHOUSE: Regular Reporting
Shared Object Storage (S3, ADLS)
Shared Metadata, Security, Governance
36.
Q&A
ALTUS FREE TRIAL
https://cloudera.com/altus