1© Cloudera, Inc. All rights reserved.
A deep dive into running data analytic workloads in the cloud
Strata San Jose 2018
Jason Wang | Altus Engineering
Aishwarya Venkataraman | Altus Engineering
Stefan Salandy | Systems Engineering
Mala Ramakrishnan | Senior Director, Altus Product & Marketing
2© Cloudera, Inc. All rights reserved.
Who we are
Jason Stefan
Aishwarya Mala
3© Cloudera, Inc. All rights reserved.
Agenda
- Introduction
- Cloudera Altus
- Introducing today’s lab
- Hands-on data pipeline
- Running analytic database as a PaaS
- Workload Analytics
- Conclusion
4© Cloudera, Inc. All rights reserved.
Introduction
5© Cloudera, Inc. All rights reserved.
The Big Shift
In 2017: 58% on-premises, 11% private cloud, 25% public cloud
By 2019: 38% on-premises, 15% private cloud, 41% public cloud
Source: 451 Research, Voice of the Enterprise: Workloads and Key Projects, Cloud Transformation, 2017.
6© Cloudera, Inc. All rights reserved.
The cloud has redefined our world
Role of VP of Data Management
Old Job
- Buy databases in bulk and rent back to departments
- Load data into and out of individual data silos as needed
- Add storage to each platform as needed
New Job
- Departments buy their own databases
- Safe, collaborative environment for every department to access centralized, shared data
- Departments rent their storage needs
Most deployments are a hybrid of the old and new
[Diagram: The New World: SDX, compute, storage]
7© Cloudera, Inc. All rights reserved.
The market is diverging toward 4 distinct environments
¼ PaaS
¼ Public Cloud / IaaS
¼ Private Cloud
¼ Non-Cloud
8© Cloudera, Inc. All rights reserved.
Perfectly valid reasons for each environment
I want to maximize
- Non-Cloud: Cost-efficiency
- Private Cloud: Control, elasticity, and convenience
- Public Cloud / IaaS: Control, elasticity, and convenience
- PaaS: Agility
I want to minimize
- Non-Cloud: Dependence on unproven technology
- Private Cloud: Resource contention between departments
- Public Cloud / IaaS: Dependence on data center floor space
- PaaS: Dependence on IT and therefore need as simple as possible
I want to standardize
- Non-Cloud: On whatever provides the best ROI
- Private Cloud: On a single environment for the entire data center
- Public Cloud / IaaS: On a single cloud provider for all infrastructure needs
- PaaS: On whatever is easiest to use
I want to store my data
- Non-Cloud: On premises because cheaper and/or more secure
- Private Cloud: On premises due to company / government mandate
- Public Cloud / IaaS: In the cloud because easier
- PaaS: In the cloud because easier
9© Cloudera, Inc. All rights reserved.
Which environment do you want?
Non-Cloud | Private Cloud | Public Cloud / IaaS | PaaS
“I need huge scale in a single cluster”
“I want to separate compute and storage”
“I want to configure and troubleshoot my environment”
“I’m done hiring my own admins”
“I have a ton of cold data”
“I have unmet demand for ad hoc workloads”
“We’ve already done a scan of AWS and that’s where we’re moving”
“My team has limited skills”
“My existing cluster utilization is 90%”
“Bare metal is not an option and I’m not allowed to move to the cloud”
“My annual chargeback per server is outrageous”
“I get no love from central IT”
10© Cloudera, Inc. All rights reserved.
● The modern platform for machine
learning and analytics
● with multiple deployment options
● and one shared data experience
11© Cloudera, Inc. All rights reserved.
Cloudera Enterprise
The modern platform for machine learning and analytics optimized for the cloud
[Diagram: platform layers]
- Core services: Data Engineering, Operational Database, Analytic Database, Data Science
- Extensible services
- Data Catalog, Security, Governance, Workload Management, Ingest & Replication
- Storage services: S3, ADLS, HDFS, Kudu
- Deployment options: bare metal, private cloud, infrastructure services
12© Cloudera, Inc. All rights reserved.
Who is this tutorial for?
The Data Management Infrastructure Model
https://www.gartner.com/doc/3817571/solve-data-challenges-data-management
13© Cloudera, Inc. All rights reserved.
Who is this tutorial for?
Data Management Infrastructure Model Roles and Skills
https://www.gartner.com/doc/3817571/solve-data-challenges-data-management
14© Cloudera, Inc. All rights reserved.
Traditional on-premises workloads generally share a cluster
HDFS
15© Cloudera, Inc. All rights reserved.
Cloud workloads: Separation of storage and compute
Object Store (S3, ADLS)
Dedicated compute
Shared data
16© Cloudera, Inc. All rights reserved.
Technology drivers for workloads in the cloud
1. Scalable and cost-effective storage in a single repository
2. Access to utility-based compute
3. Open and modular architectures
[Logos: Amazon EC2, Azure Data Lake Storage, Amazon S3, Azure Virtual Machine]
17© Cloudera, Inc. All rights reserved.
Types of clusters
[Diagram axes: lifecycle (transient, permanent); single tenant, multi tenant]
[Diagram labels: Data Engineering Pipeline; Analytics Cluster; authorization, configuration, performance, troubleshooting, upgrade, metadata]
18© Cloudera, Inc. All rights reserved.
Data Engineering in the Cloud
Transient Batch
Spin up clusters as needed.
● On-demand/spot instances
● Usage-based pricing
● Sized for workload
● Cluster per tenant/user
Long-running Batch
Persistent clusters for frequent ETL.
● Reserved instances
● Node-based pricing
● Grow/shrink
● Cluster per tenant group
Persistent Batch on HDFS
Top performance for frequent ETL.
● Reserved instances
● Node-based pricing
● Shared across tenant groups
● Lift-and-shift
[Diagram: batch clusters and persistent clusters on hyperscale cloud storage; a persistent cluster on HDFS; PaaS]
19© Cloudera, Inc. All rights reserved.
Analytics in the cloud
Transient Analytics (infrequent usage)
Spin up clusters when needed
● On-demand instances
● Usage-based pricing
● Grow/shrink
● Cluster per tenant or user
Persistent Analytics (regular usage)
Persistent clusters for BI any time
● Reserved instances
● Usage-based pricing
● Grow/shrink
● Cluster per tenant group
Persistent Analytics with Local Storage (fastest)
Max speed for more regular workloads
● Reserved instances
● Node-based pricing
● Less frequent grow/shrink
● Shared cluster for shared local data
[Diagram: transient and persistent clusters on object storage; a persistent cluster on HDFS and/or Kudu; PaaS]
20© Cloudera, Inc. All rights reserved.
Primary analytic workloads in the cloud
scale, agility, and cost-efficiencies
ETL / Data Preparation
Only pay for what you need, when you need it
• Transient workloads
• Contention-free isolation
• Cloud-native integration
BI / SQL Analytics
Self-service flexibility at any scale
• Elastic scale
• Multi-tenant isolation
• Cloud-native or local
Shared, Open Storage
21© Cloudera, Inc. All rights reserved.
Introduction to Cloudera Altus
22© Cloudera, Inc. All rights reserved.
Cloudera Altus
Multi-cloud Platform-as-a-Service (PaaS) offering
Built to analyze and process data at scale in public cloud infrastructure
Altus services: Data Engineering, Operational Database, Analytic Database, Data Science
Extensible services
24© Cloudera, Inc. All rights reserved.
Altus Data Engineering (DE)
What is it?
- Short-lived
- Single tenant
- Hive, Spark, or MapReduce Cluster
Used for things like
- ETL jobs
- batch processing
- with data living in S3 or ADLS
- Provides fast and easy job submission without cluster management
Available on AWS and Azure
25© Cloudera, Inc. All rights reserved.
Altus Analytic Database (ADB)
What is it?
- Long-lived
- Multi tenant
- Impala Cluster
Used for things like
- data warehousing
- analytics
- with data living in S3 or ADLS
- Provides fast and easy analytics without cluster management
Available on AWS
26© Cloudera, Inc. All rights reserved.
Cloudera Shared Data Experience (SDX)
What is it?
- Cloud native shared metadata store with metadata living in S3 or ADLS
Used for things like
- Shared cataloging to define and preserve structure and business context of data
- Provides unified security across transient and recurring workloads
- Enables consistent governance across all data to increase compliance
[Diagram: several Data Engineering and Analytic Database clusters over S3 or ADLS]
27© Cloudera, Inc. All rights reserved.
Altus Features
Focus on the workload, not the infrastructure.
Let Altus do the heavy lifting.
Low cost
• Per-node/per-hour pricing
• Create clusters as needed
• Terminate clusters when they’re not in use
End-user focused
• Manages your cluster so you don’t have to
• Submit jobs via the UI/CLI/API
• Built-in workload troubleshooting and analytics
Easy to use
• Self-service for end-users
• Built on your familiar cloud infrastructure
• Cluster provisioning in minutes
Cloud-native
• Runs on AWS and Azure
• Read/write against ADLS and S3
• Decouple storage from compute
Integrated Platform
• Same Cloudera platform on-premises and in the cloud
• Many different services like DE and ADB
• Share metadata across clusters with SDX
Secure
• Integrated with Azure and AWS security models
• Cloudera NEVER has access to your data
• Backed by native cloud storage
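To make the per-node/per-hour pricing point concrete, here is a minimal arithmetic sketch comparing a cluster that is left running with one that is terminated when idle. The price, node count, and hours are hypothetical values chosen for illustration, not Altus list prices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterUsage:
    """How long a cluster of a given size is kept running (hypothetical inputs)."""
    nodes: int
    hours_per_day: float
    days: int


def cluster_cost(usage: ClusterUsage, price_per_node_hour: float) -> float:
    """Cost under per-node/per-hour pricing: nodes x hours x price."""
    return usage.nodes * usage.hours_per_day * usage.days * price_per_node_hour


# Assumed rate for illustration only.
PRICE_PER_NODE_HOUR = 0.50

always_on = ClusterUsage(nodes=10, hours_per_day=24, days=30)
terminated_when_idle = ClusterUsage(nodes=10, hours_per_day=3, days=30)

if __name__ == "__main__":
    print("always on:           ", cluster_cost(always_on, PRICE_PER_NODE_HOUR))
    print("terminated when idle:", cluster_cost(terminated_when_idle, PRICE_PER_NODE_HOUR))
```

With these assumed numbers the only difference between the two cases is hours of use, which is the lever the "terminate clusters when they're not in use" bullet refers to.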
28© Cloudera, Inc. All rights reserved.
Altus Workflow
29© Cloudera, Inc. All rights reserved.
Altus Workflow: create environment
30© Cloudera, Inc. All rights reserved.
Altus Workflow: create cluster
31© Cloudera, Inc. All rights reserved.
Altus Workflow: run a job
32© Cloudera, Inc. All rights reserved.
Altus Data Engineering Workflow: short-lived
33© Cloudera, Inc. All rights reserved.
Altus Analytic Database Workflow: long-lived
34© Cloudera, Inc. All rights reserved.
What is an Environment?
An Environment is an encapsulation of the cloud provider resources and the cross account trust needed to deploy Cloudera clusters.
What are Clusters?
A Cluster is a Cloudera Cluster (CM + Master + Worker nodes) optimized for running specific workloads.
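The relationship between the two definitions can be sketched as a small data model. All class and field names below are hypothetical and are not the Altus API; the sketch also assumes exactly one CM node and one master node per cluster, which the slide does not state.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Environment:
    """Cloud provider resources plus the delegated-access trust (hypothetical fields)."""
    name: str
    cloud_provider: str        # "aws" or "azure"
    delegated_access_ref: str  # e.g. an IAM role or a service principal identifier
    network_ref: str           # e.g. a VPC or virtual network identifier


@dataclass(frozen=True)
class Cluster:
    """A Cloudera cluster deployed into one Environment for a specific workload."""
    name: str
    environment: Environment
    workload: str              # e.g. "hive", "spark", "impala"
    worker_count: int

    def node_roles(self) -> List[str]:
        # Assumption: one CM node and one master node, then N workers.
        return ["cm", "master"] + ["worker"] * self.worker_count


env = Environment("lab-env", "aws", "example-role", "example-vpc")
de_cluster = Cluster("etl", env, "spark", worker_count=3)
adb_cluster = Cluster("bi", env, "impala", worker_count=5)
```

Several clusters can point at the same Environment, which mirrors the lab setup of one Data Engineering cluster and one Analytic Database cluster.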
35© Cloudera, Inc. All rights reserved.
AWS vs. Azure
1. Security Model for Delegated Access
2. Networking
3. Cloud Storage Data Access
36© Cloudera, Inc. All rights reserved.
AWS Model for Delegated Access: IAM Roles
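As an illustration of cross-account delegated access with IAM roles, the sketch below builds a standard AWS trust policy document that lets another account assume a role, guarded by an external ID. The account ID and external ID are placeholders, and the exact policy Altus asks for is not shown in the slides, so treat this as the general AWS pattern rather than the Altus setup.

```python
import json
from typing import Any, Dict


def cross_account_trust_policy(trusted_account_id: str, external_id: str) -> Dict[str, Any]:
    """Build an IAM role trust policy allowing another AWS account to assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # The account that is being trusted (placeholder value at the call site).
                "Principal": {"AWS": f"arn:aws:iam::{trusted_account_id}:root"},
                "Action": "sts:AssumeRole",
                # The external ID guards against the confused-deputy problem.
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder identifiers, not real accounts.
    policy = cross_account_trust_policy("123456789012", "example-external-id")
    print(json.dumps(policy, indent=2))
```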
37© Cloudera, Inc. All rights reserved.
Azure Model for Delegated Access: Service Principal
38© Cloudera, Inc. All rights reserved.
AWS Networking
39© Cloudera, Inc. All rights reserved.
Azure Networking
40© Cloudera, Inc. All rights reserved.
AWS S3 Data Access: Instance Profile
41© Cloudera, Inc. All rights reserved.
Azure ADLS Data Access: MSI
42© Cloudera, Inc. All rights reserved.
Today’s Lab:
Solving a Business Need With Cloudera Altus
43© Cloudera, Inc. All rights reserved.
Setting the Scene
- We work for an outdoor clothing retail company and website sales are struggling
- We need to figure out whether sales orders correlate with website visits and what steps to take to improve sales
- We’ll use Altus DE and Altus ADB to solve this
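The question of whether orders correlate with visits can be sketched with a plain Pearson correlation over per-product counts. The numbers below are made up for illustration; in the lab the real values come from the sales orders and the web logs.

```python
from math import sqrt
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical per-product counts.
    visits = [120, 340, 90, 410, 60]
    orders = [12, 30, 11, 8, 5]
    print(f"correlation between visits and orders: {pearson(visits, orders):.2f}")
```

A product with many visits but few orders pulls the coefficient down, and that kind of product is what the analytics part of the lab sets out to find.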
44© Cloudera, Inc. All rights reserved.
Already Set Up: Raw Data Ingestion
Sales Orders Raw Logs
45© Cloudera, Inc. All rights reserved.
Part One: Data Engineering
Sales Orders Raw Logs Tokenized logs
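The raw-logs-to-tokenized-logs step can be sketched as a regular-expression tokenizer. This assumes the raw logs are in the Apache common log format, which the slides do not confirm; the field names and the sample line are illustrative. In the lab the equivalent work runs as a job on the Data Engineering cluster.

```python
import re
from typing import Dict, Optional

# Assumed layout: host ident user [time] "method path protocol" status size
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def tokenize(line: str) -> Optional[Dict[str, str]]:
    """Split one raw log line into named fields, or return None if it does not parse."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


SAMPLE = (
    '203.0.113.7 - - [10/Mar/2018:13:55:36 -0800] '
    '"GET /products/jacket HTTP/1.1" 200 2326'
)

if __name__ == "__main__":
    print(tokenize(SAMPLE))
```

Returning None for unparseable lines lets a batch job count or quarantine bad records instead of failing the whole run.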
46© Cloudera, Inc. All rights reserved.
Part Two: Analytics
Sales Orders Raw Logs Tokenized logs
47© Cloudera, Inc. All rights reserved.
What this will look like in today’s lab
[Diagram: the lab pipeline, steps 1 to 4]
48© Cloudera, Inc. All rights reserved.
Hands-on Data Pipeline
49© Cloudera, Inc. All rights reserved.
But first, go get the handout
https://tinyurl.com/y9zxxzkm
50© Cloudera, Inc. All rights reserved.
Handout overview
When you see this hand it means look at your handout for a hands-on task.
https://tinyurl.com/y9zxxzkm
51© Cloudera, Inc. All rights reserved.
Log in to Altus
1
console.altus.cloudera.com
https://tinyurl.com/y9zxxzkm
52© Cloudera, Inc. All rights reserved.
Create Altus clusters
2
Create one cluster for Data Engineering and one cluster for Analytic Database. While the clusters are being created, take a break!
https://tinyurl.com/y9zxxzkm
53© Cloudera, Inc. All rights reserved.
Perform ETL using Altus Data Engineering
3
https://tinyurl.com/y9zxxzkm
54© Cloudera, Inc. All rights reserved.
Altus Analytic Database
Altus Analytic DB Architecture
● Impala running on EC2 nodes
● Data stored in S3
● Data can be accessed by multiple clusters
[Diagram: EC2, S3]
56© Cloudera, Inc. All rights reserved.
Explore data using Altus Analytic Database
4
https://tinyurl.com/y9zxxzkm
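The kind of exploratory query this step involves can be sketched locally. The example uses Python's built-in sqlite3 module purely as a stand-in for Impala, and the table names, column names, and rows are hypothetical; the lab handout defines the real schema.

```python
import sqlite3
from typing import List, Tuple

SETUP = """
CREATE TABLE orders (product TEXT, quantity INTEGER);
CREATE TABLE tokenized_logs (product TEXT, visitor TEXT);
INSERT INTO orders VALUES ('jacket', 3), ('boots', 1);
INSERT INTO tokenized_logs VALUES
  ('jacket', 'a'), ('jacket', 'b'), ('boots', 'a'),
  ('tent', 'c'), ('tent', 'd'), ('tent', 'e');
"""

# Page views per product next to units ordered, most viewed first.
QUERY = """
SELECT v.product, v.views, COALESCE(o.units, 0) AS units
FROM (SELECT product, COUNT(*) AS views
      FROM tokenized_logs GROUP BY product) v
LEFT JOIN (SELECT product, SUM(quantity) AS units
           FROM orders GROUP BY product) o
  ON o.product = v.product
ORDER BY v.views DESC, v.product
"""


def views_vs_orders() -> List[Tuple[str, int, int]]:
    """Load the toy data into an in-memory database and run the comparison query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SETUP)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for product, views, units in views_vs_orders():
        print(product, views, units)
```

In this toy data the most viewed product has no orders at all, which is the sort of mismatch the lab scenario is looking for.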
58© Cloudera, Inc. All rights reserved.
Altus Workload Analytics
59© Cloudera, Inc. All rights reserved.
Altus Workload Analytics
● Get insight into causes of job failure
● Size clusters and optimize job performance
● Identify issues even when they don’t show up as errors
60© Cloudera, Inc. All rights reserved.
Troubleshooting failed jobs
Hive invalid query
5
62© Cloudera, Inc. All rights reserved.
Troubleshooting failed jobs
Spark Out of Memory issue
5
64© Cloudera, Inc. All rights reserved.
Optimize Performance
Example: Skewed join
- Workload Analytics (WA) lists outlier tasks that wait a long time before they start
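One simple way to surface outlier tasks of this kind is to compare each task's wait time with the median across the stage. This is an illustrative heuristic with a hypothetical threshold factor, not a description of how Workload Analytics detects skew.

```python
from statistics import median
from typing import Dict, List


def outlier_tasks(wait_seconds: Dict[str, float], factor: float = 3.0) -> List[str]:
    """Return task ids whose wait before start exceeds `factor` times the median wait."""
    if not wait_seconds:
        return []
    threshold = median(wait_seconds.values()) * factor
    return sorted(task for task, wait in wait_seconds.items() if wait > threshold)


if __name__ == "__main__":
    # Hypothetical wait times for the tasks of one stage.
    waits = {"task-1": 2.0, "task-2": 3.0, "task-3": 2.5, "task-4": 40.0}
    print(outlier_tasks(waits))
```

The median is used instead of the mean so that a single very late task does not raise the baseline it is being compared against.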
66© Cloudera, Inc. All rights reserved.
Track history
● Track history of recurring workloads over time
● Performance trends of each individual stage
● Automatic detection of abnormal behavior of recurring workloads (too fast or too slow)
● Drilling down can show differences between data input / output size
● Group by jobs
DEMO
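The idea of flagging a recurring workload as too fast or too slow can be sketched by comparing the latest run against the history of earlier runs. The mean plus or minus k standard deviations rule and the default k are assumptions for illustration; the slides do not say which method Workload Analytics uses.

```python
from statistics import mean, stdev
from typing import Sequence


def classify_run(history: Sequence[float], latest: float, k: float = 2.0) -> str:
    """Label the latest duration relative to past runs of the same recurring job."""
    if len(history) < 2:
        return "unknown"  # not enough history to form a baseline
    baseline = mean(history)
    spread = stdev(history)
    if latest > baseline + k * spread:
        return "too slow"
    if latest < baseline - k * spread:
        return "too fast"
    return "normal"


if __name__ == "__main__":
    # Hypothetical durations in seconds for previous runs of one job.
    past_runs = [100, 102, 98, 101, 99]
    for duration in (150, 50, 100):
        print(duration, classify_run(past_runs, duration))
```

A run that finishes much faster than usual is flagged as well, since it can mean the job processed far less input than expected.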
69© Cloudera, Inc. All rights reserved.
Execution details of a job
- Number of Map/Reduce jobs generated
- Log files for each individual task
- Metrics for each stage
- Browse and search configuration properties
DEMO
70© Cloudera, Inc. All rights reserved.
Conclusion
71© Cloudera, Inc. All rights reserved.
Key benefits of PaaS
Spin up working environments ad hoc
Bring your own data and tools
Adjust resources on-demand
Pay for your actual consumption of resources
72© Cloudera, Inc. All rights reserved.
cloudera.com/altus
73© Cloudera, Inc. All rights reserved.
Thank you
cloudera.com/altus
75© Cloudera, Inc. All rights reserved.
The key benefits of a modern analytic database
High-performance BI and SQL analytics
Flexibility for data and use case variety
Cost-effective scale for today and tomorrow
Go beyond SQL with an open architecture
76© Cloudera, Inc. All rights reserved.
Advantages of a modern approach
decoupled for cloud and on-premises
Go Beyond SQL
• Consolidate data silos with an open architecture
• Shared data across SQL and non-SQL workloads
Data Flexibility
• Iterative modeling and self-service accessibility
• Portability: No proprietary formats or storage lock-in
Cost-Effective Scalability
• Elastic scale in any environment
• Cloud-native integration for optimized pay-per-use costs
• Proven at massive scale
Hybrid
• Runs across multi-cloud & on-prem for zero lock-in
• Multi-storage over S3, ADLS, HDFS, Kudu, Isilon, etc.
Shared Data
77© Cloudera, Inc. All rights reserved.
Key benefits translated for the cloud
- High-performance BI and SQL analytics: Same SQL engine native across any cloud and on-prem
- Flexibility for data and use case variety: Self-service access directly on object stores, without the silos
- Cost-effective scale for today and tomorrow: Elasticity on-demand through decoupled compute and object storage
- Go beyond SQL with an open architecture: Converge workloads over shared data, with zero lock-in

Weitere ähnliche Inhalte

Was ist angesagt?

Part 2: Cloudera’s Operational Database: Unlocking New Benefits in the Cloud
Part 2: Cloudera’s Operational Database: Unlocking New Benefits in the CloudPart 2: Cloudera’s Operational Database: Unlocking New Benefits in the Cloud
Part 2: Cloudera’s Operational Database: Unlocking New Benefits in the CloudCloudera, Inc.
 
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...Cloudera, Inc.
 
One Hadoop, Multiple Clouds - NYC Big Data Meetup
One Hadoop, Multiple Clouds - NYC Big Data MeetupOne Hadoop, Multiple Clouds - NYC Big Data Meetup
One Hadoop, Multiple Clouds - NYC Big Data MeetupAndrei Savu
 
Hadoop Distributed File System (HDFS) Encryption with Cloudera Navigator Key ...
Hadoop Distributed File System (HDFS) Encryption with Cloudera Navigator Key ...Hadoop Distributed File System (HDFS) Encryption with Cloudera Navigator Key ...
Hadoop Distributed File System (HDFS) Encryption with Cloudera Navigator Key ...Cloudera, Inc.
 
Part 1: Lambda Architectures: Simplified by Apache Kudu
Part 1: Lambda Architectures: Simplified by Apache KuduPart 1: Lambda Architectures: Simplified by Apache Kudu
Part 1: Lambda Architectures: Simplified by Apache KuduCloudera, Inc.
 
Spark One Platform Webinar
Spark One Platform WebinarSpark One Platform Webinar
Spark One Platform WebinarCloudera, Inc.
 
Introduction to Machine Learning on Apache Spark MLlib by Juliet Hougland, Se...
Introduction to Machine Learning on Apache Spark MLlib by Juliet Hougland, Se...Introduction to Machine Learning on Apache Spark MLlib by Juliet Hougland, Se...
Introduction to Machine Learning on Apache Spark MLlib by Juliet Hougland, Se...Cloudera, Inc.
 
Data Science and CDSW
Data Science and CDSWData Science and CDSW
Data Science and CDSWJason Hubbard
 
Risk Management for Data: Secured and Governed
Risk Management for Data: Secured and GovernedRisk Management for Data: Secured and Governed
Risk Management for Data: Secured and GovernedCloudera, Inc.
 
Hadoop on Cloud: Why and How?
Hadoop on Cloud: Why and How?Hadoop on Cloud: Why and How?
Hadoop on Cloud: Why and How?Cloudera, Inc.
 
Extreme Sports & Beyond: Exploring a new frontier in data with GoPro
Extreme Sports & Beyond: Exploring a new frontier in data with GoProExtreme Sports & Beyond: Exploring a new frontier in data with GoPro
Extreme Sports & Beyond: Exploring a new frontier in data with GoProCloudera, Inc.
 
Data Science at Scale Using Apache Spark and Apache Hadoop
Data Science at Scale Using Apache Spark and Apache HadoopData Science at Scale Using Apache Spark and Apache Hadoop
Data Science at Scale Using Apache Spark and Apache HadoopCloudera, Inc.
 
Hadoop for the Data Scientist: Spark in Cloudera 5.5
Hadoop for the Data Scientist: Spark in Cloudera 5.5Hadoop for the Data Scientist: Spark in Cloudera 5.5
Hadoop for the Data Scientist: Spark in Cloudera 5.5Cloudera, Inc.
 
Running Hadoop as Service in AltiScale Platform
Running Hadoop as Service in AltiScale PlatformRunning Hadoop as Service in AltiScale Platform
Running Hadoop as Service in AltiScale PlatformInMobi Technology
 
Part 3: Models in Production: A Look From Beginning to End
Part 3: Models in Production: A Look From Beginning to EndPart 3: Models in Production: A Look From Beginning to End
Part 3: Models in Production: A Look From Beginning to EndCloudera, Inc.
 
Apache Spark Operations
Apache Spark OperationsApache Spark Operations
Apache Spark OperationsCloudera, Inc.
 
Five Tips for Running Cloudera on AWS
Five Tips for Running Cloudera on AWSFive Tips for Running Cloudera on AWS
Five Tips for Running Cloudera on AWSCloudera, Inc.
 

Was ist angesagt? (20)

Part 2: Cloudera’s Operational Database: Unlocking New Benefits in the Cloud
Part 2: Cloudera’s Operational Database: Unlocking New Benefits in the CloudPart 2: Cloudera’s Operational Database: Unlocking New Benefits in the Cloud
Part 2: Cloudera’s Operational Database: Unlocking New Benefits in the Cloud
 
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...
 
One Hadoop, Multiple Clouds - NYC Big Data Meetup
One Hadoop, Multiple Clouds - NYC Big Data MeetupOne Hadoop, Multiple Clouds - NYC Big Data Meetup
One Hadoop, Multiple Clouds - NYC Big Data Meetup
 
Hadoop Distributed File System (HDFS) Encryption with Cloudera Navigator Key ...
Hadoop Distributed File System (HDFS) Encryption with Cloudera Navigator Key ...Hadoop Distributed File System (HDFS) Encryption with Cloudera Navigator Key ...
Hadoop Distributed File System (HDFS) Encryption with Cloudera Navigator Key ...
 
Big Data Fundamentals
Big Data FundamentalsBig Data Fundamentals
Big Data Fundamentals
 
Part 1: Lambda Architectures: Simplified by Apache Kudu
Part 1: Lambda Architectures: Simplified by Apache KuduPart 1: Lambda Architectures: Simplified by Apache Kudu
Part 1: Lambda Architectures: Simplified by Apache Kudu
 
Spark One Platform Webinar
Spark One Platform WebinarSpark One Platform Webinar
Spark One Platform Webinar
 
Introduction to Machine Learning on Apache Spark MLlib by Juliet Hougland, Se...
Introduction to Machine Learning on Apache Spark MLlib by Juliet Hougland, Se...Introduction to Machine Learning on Apache Spark MLlib by Juliet Hougland, Se...
Introduction to Machine Learning on Apache Spark MLlib by Juliet Hougland, Se...
 
Data Science and CDSW
Data Science and CDSWData Science and CDSW
Data Science and CDSW
 
Intro to Apache Spark
Intro to Apache SparkIntro to Apache Spark
Intro to Apache Spark
 
Risk Management for Data: Secured and Governed
Risk Management for Data: Secured and GovernedRisk Management for Data: Secured and Governed
Risk Management for Data: Secured and Governed
 
Hadoop on Cloud: Why and How?
Hadoop on Cloud: Why and How?Hadoop on Cloud: Why and How?
Hadoop on Cloud: Why and How?
 
Extreme Sports & Beyond: Exploring a new frontier in data with GoPro
Extreme Sports & Beyond: Exploring a new frontier in data with GoProExtreme Sports & Beyond: Exploring a new frontier in data with GoPro
Extreme Sports & Beyond: Exploring a new frontier in data with GoPro
 
Data Science at Scale Using Apache Spark and Apache Hadoop
Data Science at Scale Using Apache Spark and Apache HadoopData Science at Scale Using Apache Spark and Apache Hadoop
Data Science at Scale Using Apache Spark and Apache Hadoop
 
Facial recognition
Facial recognitionFacial recognition
Facial recognition
 
Hadoop for the Data Scientist: Spark in Cloudera 5.5
Hadoop for the Data Scientist: Spark in Cloudera 5.5Hadoop for the Data Scientist: Spark in Cloudera 5.5
Hadoop for the Data Scientist: Spark in Cloudera 5.5
 
Running Hadoop as Service in AltiScale Platform
Running Hadoop as Service in AltiScale PlatformRunning Hadoop as Service in AltiScale Platform
Running Hadoop as Service in AltiScale Platform
 
Part 3: Models in Production: A Look From Beginning to End
Part 3: Models in Production: A Look From Beginning to EndPart 3: Models in Production: A Look From Beginning to End
Part 3: Models in Production: A Look From Beginning to End
 
Apache Spark Operations
Apache Spark OperationsApache Spark Operations
Apache Spark Operations
 
Five Tips for Running Cloudera on AWS
Five Tips for Running Cloudera on AWSFive Tips for Running Cloudera on AWS
Five Tips for Running Cloudera on AWS
 

Ähnlich wie Running data analytic workloads in the cloud with Cloudera Altus

Multidisziplinäre Analyseanwendungen auf einer gemeinsamen Datenplattform ers...
Multidisziplinäre Analyseanwendungen auf einer gemeinsamen Datenplattform ers...Multidisziplinäre Analyseanwendungen auf einer gemeinsamen Datenplattform ers...
Multidisziplinäre Analyseanwendungen auf einer gemeinsamen Datenplattform ers...Cloudera, Inc.
 
How to Build Multi-disciplinary Analytics Applications on a Shared Data Platform
How to Build Multi-disciplinary Analytics Applications on a Shared Data PlatformHow to Build Multi-disciplinary Analytics Applications on a Shared Data Platform
How to Build Multi-disciplinary Analytics Applications on a Shared Data PlatformCloudera, Inc.
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Cloudera, Inc.
 
Cloudera Altus: Big Data in der Cloud einfach gemacht
Cloudera Altus: Big Data in der Cloud einfach gemachtCloudera Altus: Big Data in der Cloud einfach gemacht
Cloudera Altus: Big Data in der Cloud einfach gemachtCloudera, Inc.
 
Big data journey to the cloud 5.30.18 asher bartch
Big data journey to the cloud 5.30.18   asher bartchBig data journey to the cloud 5.30.18   asher bartch
Big data journey to the cloud 5.30.18 asher bartchCloudera, Inc.
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Cloudera, Inc.
 
Cloud Data Warehousing with Cloudera Altus 7.24.18
Cloud Data Warehousing with Cloudera Altus 7.24.18Cloud Data Warehousing with Cloudera Altus 7.24.18
Cloud Data Warehousing with Cloudera Altus 7.24.18Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Cloudera, Inc.
 
Cloudera GoDataFest Deploying Cloudera in the Cloud
Cloudera GoDataFest Deploying Cloudera in the CloudCloudera GoDataFest Deploying Cloudera in the Cloud
Cloudera GoDataFest Deploying Cloudera in the CloudGoDataDriven
 
Cloudera - The Modern Platform for Analytics
Cloudera - The Modern Platform for AnalyticsCloudera - The Modern Platform for Analytics
Cloudera - The Modern Platform for AnalyticsCloudera, Inc.
 
High-Performance Analytics in the Cloud with Apache Impala
High-Performance Analytics in the Cloud with Apache ImpalaHigh-Performance Analytics in the Cloud with Apache Impala
High-Performance Analytics in the Cloud with Apache ImpalaCloudera, Inc.
 
Cloudera Altus: Big Data in the Cloud Made Easy
Cloudera Altus: Big Data in the Cloud Made EasyCloudera Altus: Big Data in the Cloud Made Easy
Cloudera Altus: Big Data in the Cloud Made EasyCloudera, Inc.
 
Gartner Data and Analytics Summit: Bringing Self-Service BI & SQL Analytics ...
 Gartner Data and Analytics Summit: Bringing Self-Service BI & SQL Analytics ... Gartner Data and Analytics Summit: Bringing Self-Service BI & SQL Analytics ...
Gartner Data and Analytics Summit: Bringing Self-Service BI & SQL Analytics ...Cloudera, Inc.
 
Self-service Big Data Analytics on Microsoft Azure
Self-service Big Data Analytics on Microsoft AzureSelf-service Big Data Analytics on Microsoft Azure
Self-service Big Data Analytics on Microsoft AzureCloudera, Inc.
 
Cloud-Native Machine Learning: Emerging Trends and the Road Ahead
Cloud-Native Machine Learning: Emerging Trends and the Road AheadCloud-Native Machine Learning: Emerging Trends and the Road Ahead
Cloud-Native Machine Learning: Emerging Trends and the Road AheadDataWorks Summit
 
Comment développer une stratégie Big Data dans le cloud public avec l'offre P...
Comment développer une stratégie Big Data dans le cloud public avec l'offre P...Comment développer une stratégie Big Data dans le cloud public avec l'offre P...
Comment développer une stratégie Big Data dans le cloud public avec l'offre P...Cloudera, Inc.
 
Optimize your cloud strategy for machine learning and analytics
Optimize your cloud strategy for machine learning and analyticsOptimize your cloud strategy for machine learning and analytics
Optimize your cloud strategy for machine learning and analyticsCloudera, Inc.
 
Introduction to cloud computing
Introduction to cloud computingIntroduction to cloud computing
Introduction to cloud computingPUBLEAD (R)
 

Ähnlich wie Running data analytic workloads in the cloud with Cloudera Altus (20)

Multidisziplinäre Analyseanwendungen auf einer gemeinsamen Datenplattform ers...
Multidisziplinäre Analyseanwendungen auf einer gemeinsamen Datenplattform ers...Multidisziplinäre Analyseanwendungen auf einer gemeinsamen Datenplattform ers...
Multidisziplinäre Analyseanwendungen auf einer gemeinsamen Datenplattform ers...
 
How to Build Multi-disciplinary Analytics Applications on a Shared Data Platform
How to Build Multi-disciplinary Analytics Applications on a Shared Data PlatformHow to Build Multi-disciplinary Analytics Applications on a Shared Data Platform
How to Build Multi-disciplinary Analytics Applications on a Shared Data Platform
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19
 
Cloudera Altus: Big Data in der Cloud einfach gemacht
Cloudera Altus: Big Data in der Cloud einfach gemachtCloudera Altus: Big Data in der Cloud einfach gemacht
Cloudera Altus: Big Data in der Cloud einfach gemacht
 
Big data journey to the cloud 5.30.18 asher bartch
Big data journey to the cloud 5.30.18   asher bartchBig data journey to the cloud 5.30.18   asher bartch
Big data journey to the cloud 5.30.18 asher bartch
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18
 
Cloud Data Warehousing with Cloudera Altus 7.24.18
Cloud Data Warehousing with Cloudera Altus 7.24.18Cloud Data Warehousing with Cloudera Altus 7.24.18
Cloud Data Warehousing with Cloudera Altus 7.24.18
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Running data analytic workloads in the cloud with Cloudera Altus

  • 1. A deep dive into running data analytic workloads in the cloud. Strata San Jose 2018. Jason Wang (Altus Engineering), Aishwarya Venkataraman (Altus Engineering), Stefan Salandy (Systems Engineering), Mala Ramakrishnan (Senior Director, Altus Product & Marketing). © Cloudera, Inc. All rights reserved.
  • 2. Who we are: Jason, Stefan, Aishwarya, Mala
  • 3. Agenda: Introduction; Cloudera Altus; Introducing today’s lab; Hands-on data pipeline; Running analytic database as a PaaS; Workload Analytics; Conclusion
  • 4. Introduction
  • 5. The Big Shift. In 2017: 58% on-premises, 11% private cloud, 25% public cloud. By 2019: 38% on-premises, 15% private cloud, 41% public cloud. Source: 451 Research, Voice of the Enterprise: Workloads and Key Projects, Cloud Transformation, 2017.
  • 6. The cloud has redefined our world. Role of VP of Data Management. Old Job: buy databases in bulk and rent back to departments; load data into and out of individual data silos as needed; add storage to each platform as needed. New Job: departments buy their own databases; safe, collaborative environment for every department to access centralized, shared data; departments rent their storage needs. The New World: SDX, compute, storage. Most deployments are a hybrid of the old and new.
  • 7. The market is diverging toward 4 distinct environments: ¼ PaaS, ¼ Public Cloud / IaaS, ¼ Private Cloud, ¼ Non-Cloud
  • 8. Perfectly valid reasons for each environment (columns: Non-Cloud | Private Cloud | Public Cloud / IaaS | PaaS). I want to maximize: cost-efficiency | control, elasticity, and convenience | control, elasticity, and convenience | agility. I want to minimize: dependence on unproven technology | resource contention between departments | dependence on data center floor space | dependence on IT, so it needs to be as simple as possible. I want to standardize: on whatever provides the best ROI | on a single environment for the entire data center | on a single cloud provider for all infrastructure needs | on whatever is easiest to use. I want to store my data: on premises because it is cheaper and/or more secure | on premises due to company / government mandate | in the cloud because it is easier | in the cloud because it is easier.
  • 9. Which environment do you want? (Non-Cloud, Private Cloud, Public Cloud / IaaS, PaaS) “I need huge scale in a single cluster” “I want to separate compute and storage” “I want to configure and troubleshoot my environment” “I’m done hiring my own admins” “I have a ton of cold data” “I have unmet demand for ad hoc workloads” “We’ve already done a scan of AWS and that’s where we’re moving” “My team has limited skills” “My existing cluster utilization is 90%” “Bare metal is not an option and I’m not allowed to move to the cloud” “My annual chargeback per server is outrageous” “I get no love from central IT”
  • 10. The modern platform for machine learning and analytics, with multiple deployment options and one shared data experience
  • 11. The modern platform for machine learning and analytics optimized for the cloud (Cloudera Enterprise). Data catalog, security, governance, workload management, ingest & replication. Extensible services. Core services: data engineering, operational database, analytic database, data science. Storage services: S3, ADLS, HDFS, Kudu. Deployment options: private cloud, bare metal, infrastructure services.
  • 12. Who is this tutorial for? The Data Management Infrastructure Model. https://www.gartner.com/doc/3817571/solve-data-challenges-data-management
  • 13. Who is this tutorial for? Data Management Infrastructure Model: Roles and Skills. https://www.gartner.com/doc/3817571/solve-data-challenges-data-management
  • 14. Traditional on-premises workloads generally share a cluster (HDFS)
  • 15. Cloud workloads: Separation of storage and compute. Object store (S3, ADLS); dedicated compute; shared data
  • 16. Technology drivers for workloads in the cloud: 1. Scalable and cost-effective storage in a single repository 2. Access to utility-based compute 3. Open and modular architectures (Amazon EC2, Azure Data Lake Storage, Amazon S3, Azure Virtual Machine)
  • 17. Types of clusters. Lifecycle: transient or permanent; single tenant or multi tenant. Data Engineering Pipeline; Analytics Cluster. Authorization, configuration, performance, troubleshooting, upgrade, metadata
  • 18. Data Engineering in the Cloud (hyperscale cloud storage). Transient Batch: spin up clusters as needed; on-demand/spot instances; usage-based pricing; sized for workload; cluster per tenant/user. Long-running Batch: persistent clusters for frequent ETL; reserved instances; node-based pricing; grow/shrink; cluster per tenant group. Persistent Batch on HDFS: top performance for frequent ETL; reserved instances; node-based pricing; shared across tenant groups; lift-and-shift. Diagram labels: Batch Cluster, Persistent Cluster, HDFS, PaaS.
  • 19. Analytics in the cloud (object storage). Transient Analytics (infrequent usage): spin up clusters when needed; on-demand instances; usage-based pricing; grow/shrink; cluster per tenant or user. Persistent Analytics (regular usage): persistent clusters for BI any time; reserved instances; usage-based pricing; grow/shrink; cluster per tenant group. Persistent Analytics with Local Storage (fastest): max speed for more regular workloads; reserved instances; node-based pricing; less frequent grow/shrink; shared cluster for shared local data. Diagram labels: Transient Cluster, Persistent Cluster, HDFS and/or Kudu, PaaS.
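The choice on slides 18 and 19 between usage-based pricing for transient clusters and node-based pricing on reserved instances largely depends on how many hours per month a cluster is actually busy. The following is a minimal sketch of that arithmetic, assuming a 730-hour month and hypothetical hourly rates; none of these figures come from the slides or from any real price list.

```python
HOURS_PER_MONTH = 730  # assumption: average number of hours in a month


def transient_cost(nodes: int, on_demand_rate: float, busy_hours: float) -> float:
    """Usage-based pricing: pay only for the hours the cluster exists."""
    return nodes * on_demand_rate * busy_hours


def persistent_cost(nodes: int, reserved_rate: float) -> float:
    """Node-based pricing: pay for every hour of the month."""
    return nodes * reserved_rate * HOURS_PER_MONTH


def break_even_hours(on_demand_rate: float, reserved_rate: float) -> float:
    """Busy hours per month above which the persistent cluster is cheaper."""
    return HOURS_PER_MONTH * reserved_rate / on_demand_rate


# Hypothetical rates in dollars per node-hour, for illustration only.
ON_DEMAND_RATE, RESERVED_RATE = 0.50, 0.25

print("transient, 100 busy hours:", transient_cost(10, ON_DEMAND_RATE, 100))
print("persistent, full month:   ", persistent_cost(10, RESERVED_RATE))
print("break-even busy hours:    ", break_even_hours(ON_DEMAND_RATE, RESERVED_RATE))
```

Under these assumed rates, a workload that is busy for fewer hours than the break-even point favors a transient cluster; a workload that is busy for more favors a persistent one.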
  • 20. Primary analytic workloads in the cloud: scale, agility, and cost-efficiencies. Shared, open storage. ETL / Data Preparation: only pay for what you need, when you need it (transient workloads; contention-free isolation; cloud-native integration). BI / SQL Analytics: self-service flexibility at any scale (elastic scale; multi-tenant isolation; cloud-native or local).
  • 21. Introduction to Cloudera Altus
  • 22. Cloudera Altus: multi-cloud Platform-as-a-Service (PaaS) offering, built to analyze and process data at scale in public cloud infrastructure. Altus services: data engineering, operational database, analytic database, data science. Extensible services.
  • 24. Altus Data Engineering (DE). What is it? A short-lived, single-tenant Hive, Spark, or MapReduce cluster. Used for things like ETL jobs and batch processing, with data living in S3 or ADLS. Provides fast and easy job submission without cluster management. Available on AWS and Azure.
  • 25. Altus Analytic Database (ADB). What is it? A long-lived, multi-tenant Impala cluster. Used for things like data warehousing and analytics, with data living in S3 or ADLS. Provides fast and easy analytics without cluster management. Available on AWS.
  • 26. Cloudera Shared Data Experience (SDX). What is it? A cloud-native shared metadata store with metadata living in S3 or ADLS. Used for things like shared cataloging to define and preserve structure and business context of data. Provides unified security across transient and recurring workloads. Enables consistent governance across all data to increase compliance. Diagram: several Data Engineering and Analytic Database clusters over S3 or ADLS.
  • 27. Altus Features. Focus on the workload, not the infrastructure. Let Altus do the heavy lifting. Low cost: per-node/per-hour pricing; create clusters as needed; terminate clusters when they’re not in use. End-user focused: manages your cluster so you don’t have to; submit jobs via the UI/CLI/API; built-in workload troubleshooting and analytics. Easy to use: self-service for end-users; built on your familiar cloud infrastructure; cluster provisioning in minutes. Cloud-native: runs on AWS and Azure; read/write against ADLS and S3; decouple storage from compute. Integrated platform: same Cloudera platform on-premises and in the cloud; many different services like DE and ADB; share metadata across clusters with SDX. Secure: integrated with Azure and AWS security models; Cloudera NEVER has access to your data; backed by native cloud storage.
  • 28. Altus Workflow
  • 29. Altus Workflow: create environment
  • 30. Altus Workflow: create cluster
  • 31. Altus Workflow: run a job
  • 32. Altus Data Engineering Workflow: short-lived
  • 33. Altus Analytic Database Workflow: long-lived
  • 34. What is an Environment? What are Clusters? An Environment is an encapsulation of the cloud provider resources and the cross-account trust needed to deploy Cloudera clusters. A Cluster is a Cloudera cluster (CM + Master + Worker nodes) optimized for running specific workloads.
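The relationship between an Environment and its Clusters on slide 34 can be sketched as two small data types. This is an illustrative model only, not the Altus API: every field name is hypothetical, and the default of one Cloudera Manager node and one master node per cluster is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Environment:
    """Cloud provider resources plus the cross-account trust (hypothetical fields)."""
    name: str
    provider: str           # "aws" or "azure"
    region: str
    delegated_access: str   # e.g. an IAM role ARN or a service principal ID


@dataclass(frozen=True)
class Cluster:
    """A Cloudera cluster deployed into exactly one Environment."""
    name: str
    environment: Environment
    service: str            # e.g. "data-engineering" or "analytic-database"
    workers: int
    masters: int = 1        # assumption: one master node
    cm_nodes: int = 1       # assumption: one Cloudera Manager (CM) node

    @property
    def node_count(self) -> int:
        """Total nodes: CM + master + worker."""
        return self.cm_nodes + self.masters + self.workers


# One environment can back several clusters, each sized for its workload.
env = Environment("lab-env", "aws", "us-west-2",
                  "arn:aws:iam::123456789012:role/example-role")
de = Cluster("etl", env, "data-engineering", workers=3)
adb = Cluster("bi", env, "analytic-database", workers=5)
print(de.node_count, adb.node_count)
```

Keeping the cloud-specific details on the Environment means a cluster definition only has to name the workload type and its size.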
  • 35. AWS vs. Azure: 1. Security Model for Delegated Access 2. Networking 3. Cloud Storage Data Access
  • 36. AWS Model for Delegated Access: IAM Roles
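Delegated access with IAM roles generally relies on a trust policy that lets a principal in another AWS account call `sts:AssumeRole`. The sketch below builds such a policy document in the standard IAM JSON format. The account ID and external ID are placeholders, and the exact policy Altus requires is not shown on the slide, so treat this as a generic cross-account example rather than the Altus setup.

```python
import json


def cross_account_trust_policy(trusted_account_id: str, external_id: str) -> dict:
    """Build an IAM trust policy that lets another AWS account assume a role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # The account that is allowed to assume the role.
                "Principal": {"AWS": f"arn:aws:iam::{trusted_account_id}:root"},
                "Action": "sts:AssumeRole",
                # An external ID guards against the confused-deputy problem.
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


# Placeholder values, for illustration only.
policy = cross_account_trust_policy("123456789012", "example-external-id")
print(json.dumps(policy, indent=2))
```

The role's permissions (what the trusted account may do once it has assumed the role) live in a separate permissions policy that is not sketched here.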
  • 37. Azure Model for Delegated Access: Service Principal
  • 38. AWS Networking
  • 39. Azure Networking
  • 40. AWS S3 Data Access: Instance Profile
  • 41. Azure ADLS Data Access: MSI
  • 42. Today’s Lab: Solving a Business Need With Cloudera Altus
  • 43. Setting the Scene. We work for an outdoor clothing retail company and website sales are struggling. We need to figure out whether sales orders correlate with website visits and what steps to take to improve sales. We’ll use Altus DE and Altus ADB to solve this.
  • 44. Already Set Up: Raw Data Ingestion (Sales Orders, Raw Logs)
  • 45. Part One: Data Engineering (Sales Orders, Raw Logs, Tokenized logs)
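Part One turns raw web logs into tokenized logs. In the lab this runs as a job on an Altus Data Engineering cluster; the plain-Python sketch below only illustrates the tokenizing step itself. It assumes the raw logs are in the Apache common log layout, which the slides do not state, and the sample line is made up.

```python
import re
from typing import Optional

# Assumed layout: ip ident user [timestamp] "method url protocol" status size
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
)


def tokenize(line: str) -> Optional[dict]:
    """Split one raw log line into named fields, or return None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    return record


# Made-up sample line, for illustration only.
sample = '10.0.0.1 - - [14/Jun/2014:10:30:13 -0400] "GET /product/1346 HTTP/1.1" 200 1234'
print(tokenize(sample))
```

Returning `None` for lines that do not match lets the caller count or quarantine malformed records instead of failing the whole job.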
  • 46. Part Two: Analytics (Sales Orders, Raw Logs, Tokenized logs)
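The business question for Part Two is whether sales orders correlate with website visits. One simple way to quantify that, once page views and orders have been counted per product, is a Pearson correlation coefficient. The sketch below is an assumption about how the comparison could be made, not the query used in the lab, and the counts are invented.

```python
from math import sqrt
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Invented per-product counts, for illustration only.
page_views = [120, 80, 45, 10]
orders = [12, 9, 4, 1]
print(round(pearson(page_views, orders), 3))
```

A value close to 1 would suggest that heavily viewed products also sell well; products that are viewed often but rarely ordered would pull the value down and are worth a closer look.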
  • 47. What this will look like in today’s lab (steps 1 to 4)
  • 48. Hands-on Data Pipeline
  • 49. But first, go get the handout: https://tinyurl.com/y9zxxzkm
  • 50. Handout overview. When you see this hand, look at your handout for a hands-on task.
  • 51. Step 1: Log in to Altus at console.altus.cloudera.com
  • 52. Step 2: Create Altus clusters. Create one cluster for Data Engineering and one cluster for Analytic Database. While these clusters are being created, take a break!
  • 53. Step 3: Perform ETL using Altus Data Engineering
  • 54. Altus Analytic Database
  • 55. Altus Analytic DB Architecture (S3, EC2): Impala running on EC2 nodes; data stored in S3; data can be accessed by multiple clusters
  • 56. Step 4: Explore data using Altus Analytic Database
  • 58. Altus Workload Analytics
  • 59. Altus Workload Analytics: get insight into causes of job failure; size clusters and optimize job performance; identify issues even when they don’t show up as errors
  • 60. Step 5: Troubleshooting failed jobs. Hive invalid query
  • 62. Step 5: Troubleshooting failed jobs. Spark Out of Memory issue
  • 64. Optimize Performance. Example: skewed join. WA lists outlier tasks that have a long wait before they start
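The idea of listing outlier tasks with a long wait before they start can be illustrated with a simple median-based rule. How Workload Analytics actually identifies outliers is not described on the slide; the threshold factor, function name, and sample wait times below are all assumptions.

```python
from statistics import median


def long_wait_outliers(wait_seconds: dict, factor: float = 3.0) -> list:
    """Return the IDs of tasks whose wait exceeds `factor` times the median wait."""
    if not wait_seconds:
        return []
    threshold = factor * median(wait_seconds.values())
    return sorted(task for task, wait in wait_seconds.items() if wait > threshold)


# Invented wait times in seconds, for illustration only.
waits = {"task-1": 1.0, "task-2": 1.2, "task-3": 0.9, "task-4": 15.0}
print(long_wait_outliers(waits))
```

The median is used instead of the mean so that a single very slow task does not raise the threshold enough to hide itself.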
  • 66. Track history (demo): track history of recurring workloads over time; performance trends of each individual stage; automatic detection of abnormal behavior of recurring workloads (too fast or too slow); drilling down can show differences between data input / output size; group by jobs
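Detecting a recurring workload that ran too fast or too slow amounts to comparing the latest run against the history of earlier runs. The sketch below uses a z-score against the historical mean and standard deviation. This is one common approach chosen for illustration, not the method Workload Analytics is documented to use, and the limit of 3 standard deviations is an assumption.

```python
from statistics import mean, stdev


def classify_run(history: list, duration: float, z_limit: float = 3.0) -> str:
    """Label a run as 'too fast', 'too slow', or 'normal' relative to past runs."""
    if len(history) < 2:
        return "normal"  # not enough history to judge
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        # All past runs took exactly the same time; any difference stands out.
        if duration == mu:
            return "normal"
        return "too slow" if duration > mu else "too fast"
    z = (duration - mu) / sigma
    if z > z_limit:
        return "too slow"
    if z < -z_limit:
        return "too fast"
    return "normal"


# Invented durations in seconds, for illustration only.
past_runs = [100, 104, 98, 102, 96]
for new_duration in (101, 150, 60):
    print(new_duration, classify_run(past_runs, new_duration))
```

A run that finishes much faster than usual is flagged as well, since it can mean the job processed far less input than expected.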
  • 69. Execution details of a job (demo): number of Map/Reduce jobs generated; log files for each individual task; metrics for each stage; browse and search configuration properties
  • 70. Conclusion
  • 71. Key benefits of PaaS: spin up working environments ad hoc; bring your own data and tools; adjust resources on demand; pay for your actual consumption of resources
  • 72. cloudera.com/altus
  • 73. Thank you. cloudera.com/altus
  • 75. The key benefits of a modern analytic database: high-performance BI and SQL analytics; flexibility for data and use case variety; cost-effective scale for today and tomorrow; go beyond SQL with an open architecture
  • 76. Advantages of a modern approach, decoupled for cloud and on-premises (shared data). Go Beyond SQL: consolidate data silos with an open architecture; shared data across SQL and non-SQL workloads. Data Flexibility: iterative modeling and self-service accessibility; portability with no proprietary formats or storage lock-in. Cost-Effective Scalability: elastic scale in any environment; cloud-native integration for optimized pay-per-use costs; proven at massive scale. Hybrid: runs across multi-cloud & on-prem for zero lock-in; multi-storage over S3, ADLS, HDFS, Kudu, Isilon, etc.
  • 77. Key benefits translated for the cloud. High-performance BI and SQL analytics: same SQL engine native across any cloud and on-prem. Flexibility for data and use case variety: self-service access directly on object stores, without the silos. Cost-effective scale for today and tomorrow: elasticity on demand through decoupled compute and object storage. Go beyond SQL with an open architecture: converge workloads over shared data, with zero lock-in.