© Cloudera, Inc. All rights reserved. 1
MODERN DATA WAREHOUSE
FUNDAMENTALS
Part II: Exploring the Move to Cloud and Maintaining a Common Data Context
December, 2018
SPEAKERS
Greg Rahn
Director of Product Management
grahn@cloudera.com
Santosh Kumar
Senior Product Manager
skumar@cloudera.com
BEYOND DATA WAREHOUSING
The modern platform for machine learning and analytics optimized for the cloud
ANALYTICS | DATA SCIENCE | EXTENSIBLE SERVICES | OPERATIONAL DATABASE | DATA ENGINEERING
Core Services: SECURITY | GOVERNANCE | WORKLOAD MANAGEMENT | INGEST & REPLICATION | DATA CATALOG
Storage Services: Amazon S3 | Microsoft ADLS | HDFS | KUDU
WITH A CLOUD NATIVE OPTION - ALTUS DW
● Quick time to value - no software or clusters to manage
● Bring the warehouse to the data with zero-copy simplicity
● Use your security policies with your data - no proprietary stacks
● Apply enterprise governance to transient workloads
● Shared data experience with SDX
● Optimized for Azure & AWS
DATA WAREHOUSE
ALTUS CONTROL PLANE: SECURITY | GOVERNANCE | LIFECYCLE MANAGEMENT
MULTI-CLOUD: Amazon S3 | Microsoft ADLS
MULTI-CLOUD PAAS SOLUTION
KOMATSU MINING: Optimize Machine Performance
CHALLENGES
Create an Industrial IoT (IIoT) solution for optimizing mining equipment utility and build better next-generation products
Current system couldn’t handle:
• Scale of IoT data
• Demand for new users and use cases
• 30 TB/month data growth
RESULTS
• 2X increase in production hours on key equipment
• Design next-generation equipment: environmentally smarter, more productive, at lower cost
• Meet or exceed all KPIs: “Deliver all of the data with less complexity and significant cost savings”
SOLUTION
Cloud-based IIoT analytics for a full view of mining operations
• Quickly and easily analyze a huge volume and variety of data (time-series, sensor, event, and more)
• More use cases and users: “democratizing analytics for different user groups”
• Scale quickly and easily in the cloud
https://www.cloudera.com/more/news-and-blogs/press-releases/2017-11-15-komatsu-helps-improve-mining-performance.html
MOTIVATIONS FOR THE CLOUD
BUSINESS DRIVERS FOR DATA WAREHOUSING IN THE CLOUD*
• Scalability (51%)
• Flexibility (41%)
• Business agility / reduce IT involvement (39%)
• Cost (37%)
*BI, Analytics, and the Cloud: Strategies for Business Agility, TDWI, 2016
TECHNOLOGY DRIVERS IN THE CLOUD
1. Cost-effective, scalable storage in a single, shared repository (Amazon S3, Azure Data Lake Storage)
2. Access to limitless utility-based compute (Amazon EC2, Azure Virtual Machines)
3. Open and modular architectures (Apache Impala)
KEY STAKEHOLDERS
KNOWLEDGE WORKERS
• Instant, self-service access to data and resources
• Application performance
• Job-oriented tools
• Choice
INFRASTRUCTURE TEAM
• Secure, controlled provisioning
• Predictable costs
• Systems-oriented tools
• Standards and portability
DATA TEAM
• Advance strategic initiatives
• Link analytics to business
• Reduce admin burden
• Integrated solutions
KEY BENEFITS
Modern Data Warehouse
High-Performance SQL
Self-Service Flexibility
Cost-Effective Scale
Open Architecture for SQL and Beyond
ADVANTAGES OF A MODERN DATA WAREHOUSE
Data Flexibility
• Iterative modeling and self-service accessibility
• Portability: no proprietary formats or storage lock-in
Go Beyond SQL
• Consolidate data silos with an open architecture
• Shared data across SQL and non-SQL workloads
High-Performance SQL and …
Cost-Effective Scalability
• Elastic scale in any environment
• Cloud-native integration for optimized pay-per-use costs
• Proven at massive scale
Hybrid Decoupled Architecture
• Runs across multi-cloud and on-prem for zero lock-in
• Multi-storage over S3, ADLS, HDFS, Kudu, Isilon, etc.
Shared Data
COMMON CLOUD ANALYTIC PATTERNS
Diagram: two DATA ENGINEERING clusters (ETL/ELT) and three DATA WAREHOUSE clusters (Ad Hoc / Exploratory, Sales Reporting, Marketing Dashboard) over shared cloud object storage
Only pay for what you need, when you need it
• Transient workloads
• Contention-free isolation
Self-service flexibility at any scale
• Elastic scale on demand
• Multi-tenant isolation
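The “only pay for what you need” pattern comes down to simple arithmetic: with pay-per-use compute, cost scales with the node-hours actually consumed. This sketch compares an always-on cluster with a transient one. The node counts, hours, and the price per node-hour are hypothetical placeholders, not real cloud prices.

```python
def compute_cost(nodes: int, hours_per_day: float, days: int, rate_per_node_hour: float) -> float:
    """Pay-per-use compute: billed only for node-hours actually running."""
    return nodes * hours_per_day * days * rate_per_node_hour


RATE = 0.50  # hypothetical price per node-hour, not a real cloud list price

# An always-on 10-node cluster versus the same cluster run 4 hours a day.
always_on = compute_cost(nodes=10, hours_per_day=24, days=30, rate_per_node_hour=RATE)
transient = compute_cost(nodes=10, hours_per_day=4, days=30, rate_per_node_hour=RATE)

# Elastic burst: 4x the nodes for a quarter of the time costs the same,
# assuming the workload parallelizes perfectly (real workloads rarely do).
burst = compute_cost(nodes=40, hours_per_day=1, days=30, rate_per_node_hour=RATE)

print(always_on, transient, burst)  # 3600.0 600.0 600.0
```

The burst case is the “elastic scale on demand” point: under the stated perfect-scaling assumption, more nodes finish the same work sooner without raising the bill.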
Beware of data silos
without shared
metadata
INTELLIGENT DATA CONTEXT - SHARED DATA EXPERIENCE
Stateful Context, Shared Experience
INTELLIGENT DATA CONTEXT
With Cloudera Altus Data Warehouse and SDX running on Microsoft ADLS, we were able to establish
our Telekom Data Intelligence Hub: a trusted, fully governed platform and ecosystem where our
users are empowered to exchange and analyse data and develop multi-function, data-driven
applications easier and securely. - Sven Löffler, BizDev Executive
CUSTOMER STORIES
Couldn’t solve predictive maintenance goals
EDH delivers:
• Ingest telematics in real-time
• Machine learning to predict failures
• Analytics to minimize service downtime
• Protect sensitive and regulated data
• Consistent security and governance
• “SDX is the key to making that happen” - CIO
Drug R&D too slow and expensive
EDH delivers:
• Self-service analytics
• Meet HIPAA regulations
• >5 petabytes from 2100 silos
• Using Spark, Impala, & Search side-by-side
• With Anaconda, AtScale, Cloudwick, Kinetica,
StreamSets, Tamr, Trifacta, & Zoomdata
CHALLENGES WITH MULTIPLE DEPLOYMENT MODELS
How are you managing your Data Warehouse today?
How do you share datasets?
Do you copy things around?
How do you audit accesses across copies?
Have you lost track of the source of truth?
How do you propagate access permissions on copies?
Have you ended up with multiple silos in the process?
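The permission-propagation question can be illustrated with a toy model. In this sketch, two clusters either carry their own copies of the access grants or point at one shared policy store. All class, table, and user names are hypothetical, and this is not the SDX API; it only shows why revoking access in one place works only in the shared case.

```python
from dataclasses import dataclass, field


@dataclass
class PolicyStore:
    """Hypothetical policy store: table name -> set of users allowed to read it."""
    grants: dict[str, set[str]] = field(default_factory=dict)

    def grant(self, table: str, user: str) -> None:
        self.grants.setdefault(table, set()).add(user)

    def revoke(self, table: str, user: str) -> None:
        self.grants.get(table, set()).discard(user)

    def allowed(self, table: str, user: str) -> bool:
        return user in self.grants.get(table, set())


@dataclass
class Cluster:
    """A compute cluster that checks access against whichever store it is given."""
    name: str
    policies: PolicyStore

    def can_read(self, table: str, user: str) -> bool:
        return self.policies.allowed(table, user)


# Siloed: every cluster carries its own copy of the grants.
siloed = [Cluster(n, PolicyStore()) for n in ("sales", "finance")]
for c in siloed:
    c.policies.grant("customers", "alice")
siloed[0].policies.revoke("customers", "alice")  # revoked on one copy only

# Shared: every cluster points at the same store.
shared_store = PolicyStore()
shared_store.grant("customers", "alice")
shared = [Cluster(n, shared_store) for n in ("sales", "finance")]
shared_store.revoke("customers", "alice")  # one revocation, seen everywhere

print([c.can_read("customers", "alice") for c in siloed])  # [False, True]
print([c.can_read("customers", "alice") for c in shared])  # [False, False]
```

In the siloed case the finance copy still grants access after the revocation; keeping every copy in line is exactly the propagation and audit burden the questions above point at.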
BUSINESS IMPACT OF SILOED SYSTEMS
Lost Revenue
Inaccurate and duplicated data directly impacts the bottom line of 88% of all companies.
Limits of Legacy
Legacy systems keep organizations from taking advantage of data-driven opportunities.
Costly Compliance
By 2023, regulated organizations will spend over 5% of revenue on compliance.
Cloudera Enterprise with SDX
provides maximum cloud flexibility, enabling enterprise IT to control workloads anywhere, managed any way, and to deliver the shared data experience that business and data professionals demand
Charles, SVP, Emerging Businesses
Mulyadi, Data Scientist
Alan, Internal EDH Data Platform Manager
Charles: With increased focus on … business insights … dashboard … FAST...
“Of course! We have our internal EDH cluster. That would be easy!”
“Pipelines! Workloads! Queries! More pipelines. More workloads! More queries! Even more….”
“Adding more workloads to internal EDH clusters is risky and adds uncertainty to existing SLA-sensitive workloads.”
“Maybe a separate cluster with the ‘required’ data?”
“Why not!!”
Data Migration Cost Grows Exponentially
Diagram: separate copies of Internal EDH data for Emerging Businesses Analytics, Sales Analytics, Support, Product, Training, and Finance
• No single source of truth
• Synchronization overhead
• Stale data
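Why copies go stale: each team’s snapshot diverges from the source at the first update, and every later update has to be pushed to every copy. A minimal sketch, with a hypothetical table and the team names borrowed from the diagram:

```python
import copy

# Hypothetical source table on the internal EDH cluster.
source = {"orders": [{"id": 1, "status": "open"}]}

# Each downstream team takes its own full (deep) copy of the data.
teams = ["support", "product", "training", "finance"]
copies = {team: copy.deepcopy(source) for team in teams}


def sync_jobs_needed(num_copies: int, num_updates: int) -> int:
    """Every source update has to be re-applied to every copy."""
    return num_copies * num_updates


# A single change at the source...
source["orders"][0]["status"] = "closed"

# ...leaves every copy out of date until it is re-synchronized.
stale = [team for team, snapshot in copies.items() if snapshot != source]

print(stale)  # ['support', 'product', 'training', 'finance']
print(sync_jobs_needed(len(copies), num_updates=30))  # 120
```

The deep copy matters: a shallow copy would share the inner row dictionaries with the source and hide the staleness, which is not how physically separate clusters behave.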
Embrace unification of data and data context via SDX
Diagram: Internal EDH, Emerging Businesses Analytics, Sales Analytics, Support, Product, Training, and Finance connected through SDX
MODERN DATA WAREHOUSE REQUIREMENTS
Modeling: transform data to make it easy to combine datasets
Governance: audit trail, lineage, etc.
Authorization: ensure the right people have the right permissions
Preparation: cleanse, filter, and standardize data to enable wider acceptance
Schema | Permissions | Governance artifacts
Ingestion: collect data from various sources in varied formats
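These requirements share one theme: schema, permissions, and governance artifacts should live in one place that every engine consults. The sketch below is a hypothetical, in-memory stand-in for such a shared context, not how SDX is implemented. Each read, from any engine, is checked against the same grants and recorded in one audit trail, and lineage is kept next to the schema. The engine labels "impala" and "spark" are plain strings here.

```python
from dataclasses import dataclass, field


@dataclass
class TableContext:
    """Hypothetical shared context for one table."""
    schema: dict[str, str]  # column name -> type
    readers: set[str]  # users allowed to read
    upstream: list[str] = field(default_factory=list)  # lineage


@dataclass
class SharedContext:
    """Hypothetical catalog consulted by every engine before reading data."""
    tables: dict[str, TableContext] = field(default_factory=dict)
    audit: list[tuple[str, str, str, bool]] = field(default_factory=list)

    def read(self, engine: str, user: str, table: str) -> dict[str, str]:
        ctx = self.tables[table]
        allowed = user in ctx.readers
        # Every attempt, allowed or not, from any engine, lands in one audit trail.
        self.audit.append((engine, user, table, allowed))
        if not allowed:
            raise PermissionError(f"{user} may not read {table}")
        return ctx.schema


catalog = SharedContext()
schema = {"id": "bigint", "amount": "decimal(10,2)"}
catalog.tables["raw_orders"] = TableContext(schema, readers={"etl"})
catalog.tables["orders"] = TableContext(schema, readers={"etl", "analyst"}, upstream=["raw_orders"])

catalog.read("impala", "analyst", "orders")
try:
    catalog.read("spark", "analyst", "raw_orders")
except PermissionError:
    pass  # denied, but still audited

print(catalog.audit)
# [('impala', 'analyst', 'orders', True), ('spark', 'analyst', 'raw_orders', False)]
print(catalog.tables["orders"].upstream)  # ['raw_orders']
```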
DATA WAREHOUSE IN CLOUD DEPLOYMENTS
Diagram: data sources, cloud stores, an ETL tool, an analytics DB, and BI tools, stitched together with “gluing tools”
THREE THINGS TO REMEMBER ABOUT SDX
• SDX is a differentiated capability offered only by Cloudera
• SDX enables a shared data experience across multiple deployment models
• SDX provides the shared data context essential for global enterprises, including schema, access permissions, and governance
CLOUDERA ALTUS FOR DATA WAREHOUSING
✓ No software to install or clusters to manage
✓ Get multiple workloads up and running within minutes
✓ Enable self-service across your organization
✓ Fully secure and automated, with identity preserved across functions
✓ Optimized for both AWS and Azure
✓ Pay only for what you use
MULTI-FUNCTION: DATA ENGINEERING | DATA WAREHOUSE | DATA SCIENCE*
DATA CATALOG
CONTROL PLANE: SECURITY | GOVERNANCE | LIFECYCLE MANAGEMENT
MULTI-CLOUD: Amazon S3 | Microsoft ADLS
CLOUDERA ALTUS DATA WAREHOUSE
BRING THE WAREHOUSE TO YOUR DATA
* roadmap
ALTUS DATA WAREHOUSE
The first data warehouse cloud service to bring the warehouse to the data, delivering instant analytics to anyone
For business analysts:
• Run reports and queries at any time, with fast, predictable performance
• Get self-service analytic access on demand, using the same preferred tools and SQL skills
• Power reports, BI, exploratory analytics, and ad hoc queries, all over the same shared data and schemas
• Extend insights to data science teams, data engineers, production applications, and more
For IT:
• Eliminate data movement across workloads with a lock-in-free open architecture
• Provision isolated resources as they’re needed, with just a few clicks
• Easily manage unlimited tenants, and maintain consistent security and governance with the Shared Data Experience
• Support transient and long-running workloads with elastic scale, all with a single view into cloud costs and usage
MODERN DATA WAREHOUSING WITH ALTUS
Elastic and decoupled by design
Diagram: Altus Data Warehouse (Sales & Marketing BI), Altus Data Engineering (Data Prep / ELT), and Altus Data Warehouse (Exploratory Queries), all working on shared data in an object store (S3 or ADLS)
Shared services: Metadata | Security | Governance | Workload Management | Ingest & Replication
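“Elastic and decoupled” means compute clusters hold no data of their own: they read and write a shared object store, so each can be sized independently and terminated without losing anything. A toy sketch, with an in-memory dict standing in for S3 or ADLS and hypothetical cluster names taken from the diagram:

```python
from dataclasses import dataclass, field


@dataclass
class ObjectStore:
    """In-memory stand-in for a shared bucket (S3 or ADLS); it outlives every cluster."""
    objects: dict[str, list[dict]] = field(default_factory=dict)


@dataclass
class Cluster:
    """Hypothetical compute cluster: sized on its own, holds only a reference to the store."""
    name: str
    nodes: int
    store: ObjectStore
    running: bool = True

    def _check(self) -> None:
        if not self.running:
            raise RuntimeError(f"cluster {self.name} has been terminated")

    def write(self, table: str, rows: list[dict]) -> None:
        self._check()
        self.store.objects.setdefault(table, []).extend(rows)

    def count(self, table: str) -> int:
        self._check()
        return len(self.store.objects.get(table, []))

    def terminate(self) -> None:
        # Releasing compute leaves the shared storage untouched.
        self.running = False


store = ObjectStore()

# A large transient data-prep cluster loads data, then shuts down.
prep = Cluster("data-prep", nodes=20, store=store)
prep.write("sales", [{"id": 1}, {"id": 2}])
prep.terminate()

# Smaller, independently sized clusters still see the same data.
bi = Cluster("sales-marketing-bi", nodes=4, store=store)
explore = Cluster("exploratory", nodes=8, store=store)
print(bi.count("sales"), explore.count("sales"))  # 2 2
```

Because no cluster owns the table, terminating the 20-node prep cluster costs nothing in data, which is what makes transient workloads practical.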
WHAT’S MISSING FROM YOUR CLOUD DATA WAREHOUSE?
Does data need to be copied/loaded into the database?
Is upfront modeling or a proprietary data format required?
Can you scale compute and storage independently?
What’s required to grow/shrink your cluster?
Is data shared across workloads or do non-SQL workloads require different data silos?
Are object stores a native storage layer?
Can the database span on-prem and multiple cloud environments?
Flexibility
Hybrid
Scale
Beyond SQL
Shared Data
SUMMARY
CLOUDERA ENTERPRISE
The modern platform for machine learning and analytics optimized for the cloud
DATA WAREHOUSE | DATA SCIENCE | EXTENSIBLE SERVICES | OPERATIONAL DATABASE | DATA ENGINEERING
Core Services: SECURITY | GOVERNANCE | WORKLOAD MANAGEMENT | INGEST & REPLICATION | DATA CATALOG
Storage Services: Amazon S3 | Microsoft ADLS | HDFS | KUDU
CLOUDERA ALTUS
Data warehousing in the cloud: multiple clusters over single shared data
• DATA WAREHOUSE: Discovery (raw)
• DATA WAREHOUSE: Exploration (curated)
• DATA ENGINEERING: Prep - New Report
• DATA WAREHOUSE: BI/New Reporting
• DATA SCIENCE: Model Build/Test
• DATA ENGINEERING: Prep - Known
• DATA WAREHOUSE: Regular Reporting
Shared Object Storage (S3, ADLS)
Shared Metadata, Security, Governance
Q&A
ALTUS FREE TRIAL
https://cloudera.com/altus
THANK YOU
https://www.cloudera.com/products/data-warehouse.html
© Cloudera, Inc. All rights reserved. 38

Weitere ähnliche Inhalte

Was ist angesagt?

Cloudera Fast Forward Labs: The Vision and the Challenge of Applied Machine L...
Cloudera Fast Forward Labs: The Vision and the Challenge of Applied Machine L...Cloudera Fast Forward Labs: The Vision and the Challenge of Applied Machine L...
Cloudera Fast Forward Labs: The Vision and the Challenge of Applied Machine L...
Cloudera, Inc.
 
Making Self-Service BI a Reality in the Enterprise
Making Self-Service BI a Reality in the EnterpriseMaking Self-Service BI a Reality in the Enterprise
Making Self-Service BI a Reality in the Enterprise
Cloudera, Inc.
 

Was ist angesagt? (20)

Introducing Workload XM 8.7.18
Introducing Workload XM 8.7.18Introducing Workload XM 8.7.18
Introducing Workload XM 8.7.18
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019
 
Cloudera SDX
Cloudera SDXCloudera SDX
Cloudera SDX
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18
 
Big data journey to the cloud maz chaudhri 5.30.18
Big data journey to the cloud   maz chaudhri 5.30.18Big data journey to the cloud   maz chaudhri 5.30.18
Big data journey to the cloud maz chaudhri 5.30.18
 
The 6th Wave of Automation: Automation of Decisions | Cloudera Analytics & Ma...
The 6th Wave of Automation: Automation of Decisions | Cloudera Analytics & Ma...The 6th Wave of Automation: Automation of Decisions | Cloudera Analytics & Ma...
The 6th Wave of Automation: Automation of Decisions | Cloudera Analytics & Ma...
 
How to Build Multi-disciplinary Analytics Applications on a Shared Data Platform
How to Build Multi-disciplinary Analytics Applications on a Shared Data PlatformHow to Build Multi-disciplinary Analytics Applications on a Shared Data Platform
How to Build Multi-disciplinary Analytics Applications on a Shared Data Platform
 
Comment développer une stratégie Big Data dans le cloud public avec l'offre P...
Comment développer une stratégie Big Data dans le cloud public avec l'offre P...Comment développer une stratégie Big Data dans le cloud public avec l'offre P...
Comment développer une stratégie Big Data dans le cloud public avec l'offre P...
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18
 
PaaS or Fail: Rule the Cloud with Altus
PaaS or Fail: Rule the Cloud with AltusPaaS or Fail: Rule the Cloud with Altus
PaaS or Fail: Rule the Cloud with Altus
 
Customer Best Practices: Optimizing Cloudera on AWS
Customer Best Practices: Optimizing Cloudera on AWSCustomer Best Practices: Optimizing Cloudera on AWS
Customer Best Practices: Optimizing Cloudera on AWS
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18
 
Multidisziplinäre Analyseanwendungen auf einer gemeinsamen Datenplattform ers...
Multidisziplinäre Analyseanwendungen auf einer gemeinsamen Datenplattform ers...Multidisziplinäre Analyseanwendungen auf einer gemeinsamen Datenplattform ers...
Multidisziplinäre Analyseanwendungen auf einer gemeinsamen Datenplattform ers...
 
Cloudera Fast Forward Labs: The Vision and the Challenge of Applied Machine L...
Cloudera Fast Forward Labs: The Vision and the Challenge of Applied Machine L...Cloudera Fast Forward Labs: The Vision and the Challenge of Applied Machine L...
Cloudera Fast Forward Labs: The Vision and the Challenge of Applied Machine L...
 
Big Data Fundamentals
Big Data FundamentalsBig Data Fundamentals
Big Data Fundamentals
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the Platform
 
Making Self-Service BI a Reality in the Enterprise
Making Self-Service BI a Reality in the EnterpriseMaking Self-Service BI a Reality in the Enterprise
Making Self-Service BI a Reality in the Enterprise
 

Ähnlich wie Modern Data Warehouse Fundamentals Part 2

Ähnlich wie Modern Data Warehouse Fundamentals Part 2 (20)

A deep dive into running data analytic workloads in the cloud
A deep dive into running data analytic workloads in the cloudA deep dive into running data analytic workloads in the cloud
A deep dive into running data analytic workloads in the cloud
 
Hybrid is the New Normal
Hybrid is the New NormalHybrid is the New Normal
Hybrid is the New Normal
 
The new big data
The new big dataThe new big data
The new big data
 
Cloud-Native Machine Learning: Emerging Trends and the Road Ahead
Cloud-Native Machine Learning: Emerging Trends and the Road AheadCloud-Native Machine Learning: Emerging Trends and the Road Ahead
Cloud-Native Machine Learning: Emerging Trends and the Road Ahead
 
Cloudera Altus: Big Data in der Cloud einfach gemacht
Cloudera Altus: Big Data in der Cloud einfach gemachtCloudera Altus: Big Data in der Cloud einfach gemacht
Cloudera Altus: Big Data in der Cloud einfach gemacht
 
Cloudera Altus: Big Data in the Cloud Made Easy
Cloudera Altus: Big Data in the Cloud Made EasyCloudera Altus: Big Data in the Cloud Made Easy
Cloudera Altus: Big Data in the Cloud Made Easy
 
Gartner Data and Analytics Summit: Bringing Self-Service BI & SQL Analytics ...
 Gartner Data and Analytics Summit: Bringing Self-Service BI & SQL Analytics ... Gartner Data and Analytics Summit: Bringing Self-Service BI & SQL Analytics ...
Gartner Data and Analytics Summit: Bringing Self-Service BI & SQL Analytics ...
 
Five Tips for Running Cloudera on AWS
Five Tips for Running Cloudera on AWSFive Tips for Running Cloudera on AWS
Five Tips for Running Cloudera on AWS
 
Optimize your cloud strategy for machine learning and analytics
Optimize your cloud strategy for machine learning and analyticsOptimize your cloud strategy for machine learning and analytics
Optimize your cloud strategy for machine learning and analytics
 
Big Data LDN 2018: MICROSOFT AZURE AND CLOUDERA – FLEXIBLE CLOUD, WHATEVER TH...
Big Data LDN 2018: MICROSOFT AZURE AND CLOUDERA – FLEXIBLE CLOUD, WHATEVER TH...Big Data LDN 2018: MICROSOFT AZURE AND CLOUDERA – FLEXIBLE CLOUD, WHATEVER TH...
Big Data LDN 2018: MICROSOFT AZURE AND CLOUDERA – FLEXIBLE CLOUD, WHATEVER TH...
 
Turning Data into Business Value with a Modern Data Platform
Turning Data into Business Value with a Modern Data PlatformTurning Data into Business Value with a Modern Data Platform
Turning Data into Business Value with a Modern Data Platform
 
Accelerate Migration to the Cloud using Data Virtualization (APAC)
Accelerate Migration to the Cloud using Data Virtualization (APAC)Accelerate Migration to the Cloud using Data Virtualization (APAC)
Accelerate Migration to the Cloud using Data Virtualization (APAC)
 
How Big Data Can Enable Analytics from the Cloud (Technical Workshop)
How Big Data Can Enable Analytics from the Cloud (Technical Workshop)How Big Data Can Enable Analytics from the Cloud (Technical Workshop)
How Big Data Can Enable Analytics from the Cloud (Technical Workshop)
 
Big Data LDN 2018: CONSISTENT SECURITY, GOVERNANCE AND FLEXIBILITY FOR ALL WO...
Big Data LDN 2018: CONSISTENT SECURITY, GOVERNANCE AND FLEXIBILITY FOR ALL WO...Big Data LDN 2018: CONSISTENT SECURITY, GOVERNANCE AND FLEXIBILITY FOR ALL WO...
Big Data LDN 2018: CONSISTENT SECURITY, GOVERNANCE AND FLEXIBILITY FOR ALL WO...
 
Part 2: Cloudera’s Operational Database: Unlocking New Benefits in the Cloud
Part 2: Cloudera’s Operational Database: Unlocking New Benefits in the CloudPart 2: Cloudera’s Operational Database: Unlocking New Benefits in the Cloud
Part 2: Cloudera’s Operational Database: Unlocking New Benefits in the Cloud
 
Stl meetup cloudera platform - january 2020
Stl meetup   cloudera platform  - january 2020Stl meetup   cloudera platform  - january 2020
Stl meetup cloudera platform - january 2020
 
Machine Learning Model Deployment: Strategy to Implementation
Machine Learning Model Deployment: Strategy to ImplementationMachine Learning Model Deployment: Strategy to Implementation
Machine Learning Model Deployment: Strategy to Implementation
 
Cloudera GoDataFest Deploying Cloudera in the Cloud
Cloudera GoDataFest Deploying Cloudera in the CloudCloudera GoDataFest Deploying Cloudera in the Cloud
Cloudera GoDataFest Deploying Cloudera in the Cloud
 
The 5 Biggest Data Myths in Telco: Exposed
The 5 Biggest Data Myths in Telco: ExposedThe 5 Biggest Data Myths in Telco: Exposed
The 5 Biggest Data Myths in Telco: Exposed
 
Big data journey to the cloud 5.30.18 asher bartch
Big data journey to the cloud 5.30.18   asher bartchBig data journey to the cloud 5.30.18   asher bartch
Big data journey to the cloud 5.30.18 asher bartch
 

Mehr von Cloudera, Inc.

Mehr von Cloudera, Inc. (12)

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptx
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360
 
Spark and Deep Learning Frameworks at Scale 7.19.18
Spark and Deep Learning Frameworks at Scale 7.19.18Spark and Deep Learning Frameworks at Scale 7.19.18
Spark and Deep Learning Frameworks at Scale 7.19.18
 
How Cloudera SDX can aid GDPR compliance
How Cloudera SDX can aid GDPR complianceHow Cloudera SDX can aid GDPR compliance
How Cloudera SDX can aid GDPR compliance
 
When SAP alone is not enough
When SAP alone is not enoughWhen SAP alone is not enough
When SAP alone is not enough
 
Multi task learning stepping away from narrow expert models 7.11.18
Multi task learning stepping away from narrow expert models 7.11.18Multi task learning stepping away from narrow expert models 7.11.18
Multi task learning stepping away from narrow expert models 7.11.18
 
Cloudera training secure your cloudera cluster 7.10.18
Cloudera training secure your cloudera cluster 7.10.18Cloudera training secure your cloudera cluster 7.10.18
Cloudera training secure your cloudera cluster 7.10.18
 
Delivering improved patient outcomes through advanced analytics 6.26.18
Delivering improved patient outcomes through advanced analytics 6.26.18Delivering improved patient outcomes through advanced analytics 6.26.18
Delivering improved patient outcomes through advanced analytics 6.26.18
 

Kürzlich hochgeladen

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Kürzlich hochgeladen (20)

Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 

Modern Data Warehouse Fundamentals Part 2

  • 1. © Cloudera, Inc. All rights reserved. 1
  • 2. MODERN DATA WAREHOUSE FUNDAMENTALS Part II: Exploring the Move to Cloud and Maintaining a Common Data Context December, 2018
  • 3. © Cloudera, Inc. All rights reserved. 3 SPEAKERS Greg Rahn Director of Product Management grahn@cloudera.com Santosh Kumar Senior Product Manager skumar@cloudera.com
  • 4. 4 © Cloudera, Inc. All rights reserved. BEYOND DATA WAREHOUSING The modern platform for machine learning and analytics optimized for the cloud Amazon S3 Microsoft ADLS HDFS KUDU SECURITY GOVERNANCE WORKLOAD MANAGEMENT INGEST & REPLICATION DATA CATALOG Core Services Storage Services ANALYTICSDATA SCIENCE EXTENSIBLE SERVICES OPERATIONAL DATABASE DATA ENGINEERING
  • 5. Confidential-Restricted – For Discussion Purposes Only5 © Cloudera, Inc. All rights reserved. WITH A CLOUD NATIVE OPTION - ALTUS DW ● Quick time to value - no software or clusters to manage ● Bring warehouse to the data with zero copy simplicity ● Use your security policies with your data - no proprietary stacks ● Apply enterprise governance to transient workloads ● Shared data experience with SDX ● Optimized for Azure & AWS DATA WAREHOUSE GOVERNANCESECURITY ALTUS CONTROL PLANE LIFECYCLE MANAGEMENT MULTI-CLOUD Amazon S3 Microsoft ADLS MULTI-CLOUD PAAS SOLUTION
  • 6. 6 © Cloudera, Inc. All rights reserved. KOMATSU MINING: Optimize Machine Performance CHALLENGES Create an Industrial IoT (IIoT) solution for optimizing mining equipment utility and build better next-generation products Current system couldn’t handle: • Scale of IoT data • Demand for new users and use cases • 30TB/month data growth RESULTS • 2X Increase in production hours on key equipment • Design next-generation equipment: environmentally smarter, more productive, at lower cost • Meet or exceed all KPIs: “Deliver all of the data with less complexity and significant cost savings” SOLUTION Cloud-based IIoT analytics for a full view of mining operations • Quickly and easily analyze huge volume and variety (time-series, sensor, event, and more) of data • More use cases and users: “democratizing analytics for different user groups” • Scale quickly and easily in the cloud https://www.cloudera.com/more/news-and-blogs/press-releases/2017-11-15-komatsu-helps-improve-mining-performance.html
  • 7. MOTIVATIONS FOR THE CLOUD
  • 8. BUSINESS DRIVERS FOR DATA WAREHOUSING IN THE CLOUD*: Scalability (51%), Flexibility (41%), Business agility / reduced IT involvement (39%), Cost (37%). *Source: BI, Analytics, and the Cloud: Strategies for Business Agility, TDWI, 2016
  • 9. TECHNOLOGY DRIVERS IN THE CLOUD: 1. Cost-effective, scalable storage in a single, shared repository (Azure Data Lake Storage, Amazon S3) 2. Access to limitless utility-based compute (Amazon EC2, Azure Virtual Machines) 3. Open and modular architectures (Apache Impala)
  • 10. KEY STAKEHOLDERS: KNOWLEDGE WORKERS: Instant, self-service access to data and resources • Application performance • Job-oriented tools • Choice. INFRASTRUCTURE TEAM: Secure, controlled provisioning • Predictable costs • Systems-oriented tools • Standards and portability. DATA TEAM: Advance strategic initiatives • Link analytics to business • Reduce admin burden • Integrated solutions.
  • 11. KEY BENEFITS OF A MODERN DATA WAREHOUSE: High-Performance SQL • Self-Service Flexibility • Cost-Effective Scale • Open Architecture for SQL and Beyond
  • 12. ADVANTAGES OF A MODERN DATA WAREHOUSE: Data Flexibility • Iterative modeling and self-service accessibility • Portability: no proprietary formats or storage lock-in. Go Beyond SQL • Consolidate data silos with an open architecture • Shared data across SQL and non-SQL workloads. High-Performance SQL and … Cost-Effective Scalability • Elastic scale in any environment • Cloud-native integration for optimized pay-per-use costs • Proven at massive scale. Hybrid Decoupled Architecture • Runs across multi-cloud & on-prem for zero lock-in • Multi-storage over S3, ADLS, HDFS, Kudu, Isilon, etc. Shared Data
  • 13. COMMON CLOUD ANALYTIC PATTERNS: Two Data Engineering clusters (Cloud ETL/ELT, ETL/ELT) and three Data Warehouse clusters (Ad Hoc / Exploratory, Sales Reporting, Marketing Dashboard), all over shared object storage. Only pay for what you need, when you need it • Transient workloads • Contention-free isolation. Self-service flexibility at any scale • Elastic scale on demand • Multi-tenant isolation
  • 14. COMMON CLOUD ANALYTIC PATTERNS (same diagram as slide 13, with a warning): Beware of data silos without shared metadata
  • 15. INTELLIGENT DATA CONTEXT - SHARED DATA EXPERIENCE
  • 16. INTELLIGENT DATA CONTEXT: Stateful Context, Shared Experience
  • 17. “With Cloudera Altus Data Warehouse and SDX running on Microsoft ADLS, we were able to establish our Telekom Data Intelligence Hub: a trusted, fully governed platform and ecosystem where our users are empowered to exchange and analyse data and develop multi-function, data-driven applications easier and securely.” - Sven Löffler, BizDev Executive
  • 18. CUSTOMER STORIES: (1) Couldn’t solve predictive maintenance goals. EDH delivers: • Ingest telematics in real time • Machine learning to predict failures • Analytics to minimize service downtime • Protect sensitive and regulated data • Consistent security and governance • “SDX is the key to making that happen” - CIO. (2) Drug R&D too slow and expensive. EDH delivers: • Self-service analytics • Meet HIPAA regulations • >5 petabytes from 2,100 silos • Using Spark, Impala, & Search side by side • With Anaconda, AtScale, Cloudwick, Kinetica, StreamSets, Tamr, Trifacta, & Zoomdata
  • 19. CHALLENGES WITH MULTIPLE DEPLOYMENT MODELS: How are you managing your data warehouse today? How do you share datasets? Do you copy things around? How do you audit access across copies? Have you lost track of the source of truth? How do you propagate access permissions to copies? Have you ended up with multiple silos in the process?
  • 20. BUSINESS IMPACT OF SILOED SYSTEMS: Lost Revenue: inaccurate and duplicated data directly impacts the bottom line of 88% of all companies. Limits of Legacy: legacy systems keep organizations from taking advantage of data-driven opportunities. Costly Compliance: by 2023, regulated organizations will spend over 5% of revenue on compliance.
  • 21. Cloudera Enterprise with SDX provides maximum cloud flexibility, enabling enterprise IT to control workloads anywhere, managed any way, and to deliver the shared data experience that business and data professionals demand.
  • 22. Cast: Charles, SVP, Emerging Businesses; Mulyadi, Data Scientist; Alan, Internal EDH Data Platform Manager. Charles: “With increased focus on … business insights … dashboard … FAST…” “Of course! We have our internal EDH cluster. That would be easy!” “Pipelines! Workloads! Queries! More pipelines. More workloads! More queries! Even more….” “Adding more workloads to internal EDH clusters is risky and adds uncertainty to existing SLA-sensitive workloads. Maybe a separate cluster with the ‘required’ data?” “Why not!!”
  • 23. DATA MIGRATION COST GROWS EXPONENTIALLY: [Diagram: data copied from the Internal EDH into separate Emerging Businesses Analytics, Sales Analytics, Support, Product, Training and Finance environments] No single source of truth • Synchronization overhead • Stale data
  • 24. EMBRACE UNIFICATION OF DATA AND DATA CONTEXT VIA SDX: [Diagram: the Internal EDH, Emerging Businesses Analytics, Sales Analytics, Support, Product, Training and Finance environments unified through SDX]
  • 25. MODERN DATA WAREHOUSE REQUIREMENTS: Ingestion: collect data from various sources in varied formats. Preparation: cleanse, filter, and standardize to enable wider acceptance. Modeling: transform to make it easy to combine datasets. Authorization: ensure the right people have the right permissions. Governance: audit trail, lineage, etc. Shared context produced along the way: schema, permissions, governance artifacts.
  • 26. DATA WAREHOUSE IN CLOUD DEPLOYMENTS: [Diagram: data sources, two cloud stores, an ETL tool, an analytics DB and BI tools, held together by “gluing tools”]
  • 27. THREE THINGS TO REMEMBER ABOUT SDX • SDX is a differentiated capability offered only by Cloudera • SDX enables a shared data experience across multiple deployment models • SDX provides the shared data context essential for a global enterprise, including schema, access permissions and governance
  • 28. CLOUDERA ALTUS FOR DATA WAREHOUSING
  • 29. CLOUDERA ALTUS DATA WAREHOUSE: BRING THE WAREHOUSE TO YOUR DATA ✓ No software to install or clusters to manage ✓ Get multiple workloads up and running within minutes ✓ Enable self-service across your organization ✓ Fully secure, automated, with identity preserved across functions ✓ Optimized for both AWS and Azure ✓ Pay only for what you use. Diagram: Data Engineering, Data Warehouse, Data Science* (multi-function) with Data Catalog, Security, Governance, Control Plane, Lifecycle Management and Multi-Cloud over Amazon S3 and Microsoft ADLS. (*roadmap)
  • 30. ALTUS DATA WAREHOUSE: The first data warehouse cloud service to bring the warehouse to the data, delivering instant analytics to anyone. For business analysts: • Run reports and queries at any time, with fast, predictable performance • Get self-service analytic access on demand, using the same preferred tools and SQL skills • Power reports, BI, exploratory analytics, and ad hoc queries, all over the same shared data and schemas • Extend insights to data science teams, data engineers, production applications, and more. For IT: • Eliminate data movement across workloads with a lock-in-free open architecture • Provision isolated resources as they’re needed, with just a few clicks • Easily manage unlimited tenants, and maintain consistent security and governance with the Shared Data Experience • Support transient and long-running workloads with elastic scale, all with a single view into cloud costs and usage
  • 31. MODERN DATA WAREHOUSING WITH ALTUS: Elastic and decoupled by design. Altus Data Warehouse (Sales & Marketing BI), Altus Data Engineering (Data Prep / ELT) and Altus Data Warehouse (Exploratory Queries) all work on shared data in an object store (S3 or ADLS), with shared Metadata, Security, Governance, Workload Management, and Ingest & Replication.
  • 32. WHAT’S MISSING FROM YOUR CLOUD DATA WAREHOUSE? Does data need to be copied/loaded into the database? Is upfront modeling or a proprietary data format required? Can you scale compute and storage independently? What’s required to grow/shrink your cluster? Is data shared across workloads, or do non-SQL workloads require different data silos? Are object stores a native storage layer? Can the database span on-prem and multiple cloud environments? (Categories: Flexibility, Hybrid, Scale, Beyond SQL, Shared Data)
  • 33. SUMMARY
  • 34. CLOUDERA ENTERPRISE: The modern platform for machine learning and analytics optimized for the cloud. Extensible services: Data Engineering, Data Science, Data Warehouse, Operational Database. Core services: Security, Governance, Workload Management, Ingest & Replication, Data Catalog. Storage services: Amazon S3, Microsoft ADLS, HDFS, Kudu.
  • 35. CLOUDERA ALTUS: Data warehousing in the cloud, with multiple clusters over a single set of shared data. Clusters: Data Warehouse (Discovery, raw) • Data Warehouse (Exploration, curated) • Data Engineering (Prep – New Report) • Data Warehouse (BI/New Reporting) • Data Science (Model Build/Test) • Data Engineering (Prep – Known) • Data Warehouse (Regular Reporting). All run over Shared Object Storage (S3, ADLS) with Shared Metadata, Security, Governance.
  • 36. Q&A · ALTUS FREE TRIAL: https://cloudera.com/altus
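The "multiple clusters over single shared data" architecture on slide 35 can be illustrated with a toy model. This is a minimal Python sketch. `SharedCatalog` and `ComputeCluster` are hypothetical classes, not part of SDX or any Cloudera API, and the bucket path is a placeholder. Several transient compute clusters hold no data or metadata of their own and consult one shared catalog for table locations, grants and auditing. As a result, a permission revoked once is enforced by every cluster, and all access lands in one audit trail. This is the opposite of the copied silos on slide 23.

```python
from dataclasses import dataclass, field


@dataclass
class SharedCatalog:
    """Hypothetical stand-in for a shared metadata/security layer (not a Cloudera API)."""
    tables: dict = field(default_factory=dict)     # table name -> object-store path
    grants: dict = field(default_factory=dict)     # table name -> set of roles allowed to read
    audit_log: list = field(default_factory=list)  # (cluster, role, table, allowed) records

    def register(self, table: str, path: str) -> None:
        self.tables[table] = path
        self.grants.setdefault(table, set())

    def grant(self, table: str, role: str) -> None:
        self.grants[table].add(role)

    def revoke(self, table: str, role: str) -> None:
        self.grants[table].discard(role)


@dataclass
class ComputeCluster:
    """A transient compute cluster that owns no data or metadata of its own."""
    name: str
    catalog: SharedCatalog

    def read(self, table: str, role: str) -> str:
        # Every cluster checks the same grants and writes to the same audit log.
        allowed = role in self.catalog.grants.get(table, set())
        self.catalog.audit_log.append((self.name, role, table, allowed))
        if not allowed:
            raise PermissionError(f"{role} may not read {table}")
        return self.catalog.tables[table]


catalog = SharedCatalog()
catalog.register("sales", "s3://example-bucket/warehouse/sales/")  # placeholder path
catalog.grant("sales", "analyst")

bi = ComputeCluster("bi-reporting", catalog)
adhoc = ComputeCluster("exploratory", catalog)

print(bi.read("sales", "analyst"))  # prints s3://example-bucket/warehouse/sales/

# One revoke in the shared catalog is enforced by every cluster.
catalog.revoke("sales", "analyst")
try:
    adhoc.read("sales", "analyst")
except PermissionError as err:
    print(err)  # prints analyst may not read sales

print(len(catalog.audit_log))  # prints 2
```

With per-cluster copies, the same revoke would have to be repeated, and audited, once per copy.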