SlideShare ist ein Scribd-Unternehmen logo
1 von 27
Downloaden Sie, um offline zu lesen
ENTERPRISE MACHINE LEARNING ON K8S:
LESSONS LEARNED AND THE ROAD AHEAD
Tim Chen, Cloudera
Tristan Zajonc, Cloudera
2 © Cloudera, Inc. All rights reserved.
Tim Chen
Cloudera
@tnachen
WHO ARE WE?
Lead for Cloudera Machine Learning
Former Hyperpilot CEO/Cofounder
Apache PMC for Drill, Mesos
Tristan Zajonc
Cloudera
@tristanzajonc
CTO for Machine Learning at Cloudera
Former Sense CEO/Cofounder
3 © Cloudera, Inc. All rights reserved.
DISCLAIMER
The information in this document is proprietary to Cloudera. No part of this document may be reproduced or
disclosed without the express prior written permission of Cloudera.
This document is not binding upon Cloudera in any way (including with respect to Cloudera’s course of
business, product strategy, future releases, or development commitments), and is not a part of or subject to
any license agreement or other agreement you may have with Cloudera. Cloudera makes no commitments
about any future developments or functionality in any Cloudera product. The development, release, and timing
of release of any software features or functionality described in this document remains at the discretion of
Cloudera and you should not rely on any statements about development plans or anticipated functionality in
this document when making any purchasing decisions.
Cloudera assumes no responsibility for errors or omissions in this document. Cloudera does not warrant the
accuracy or completeness of the information, text, graphics, links or other items contained within this
material. This document is provided without a warranty of any kind, either express or implied, including but
not limited to the implied warranties of merchantability, fitness for a particular purpose or non-infringement.
Cloudera shall have no liability for damages of any kind including without limitation direct, special, indirect or
consequential damages that may result from the use of these materials.”
4 © Cloudera, Inc. All rights reserved.
+
5 © Cloudera, Inc. All rights reserved.
CLOUDERA DATA SCIENCE WORKBENCH
Collaborative data science experience powered by Kubernetes
• Built with Docker and Kubernetes
• Isolated, reproducible user environments
• Supports both big and small data
• Local Python, R, Scala runtimes
• Connect to any data source
• Schedule & share GPU resources
• Scale to CDH with Spark, Impala, Hive
• Secure and governed by default
• Easy, audited access to Kerberized clusters
• Leverages shared platform services
• Deployed with Cloudera Manager
CDH CDH
Cloudera Manager
gateway node(s) CDH nodes
Hive, HDFS, ...
CDSW CDSW
...
Master
...
Engine
EngineEngine
EngineEngine
6 © Cloudera, Inc. All rights reserved.
CLOUDERA MACHINE LEARNING (PREVIEW)
Cloud-native machine learning platform for the enterprise
DATA SCIENCE DATA ENGINEERING MODEL OPERATIONS
CLOUDERA ML RUNTIME
Python/R, Spark, TensorFlow, etc, optimized for CPU/GPU
Interactive Development Batch Pipelines Predictive APIs
End-to-end ML lifecycle for
teams to build at scale
Rapid provisioning and
elastic autoscaling with
multi-cloud portability
powered by Kubernetes
Containerized workloads for
open data science with easy
dependency management
Distributed GPU training for
accelerated deep learning
Secure data access to HDFS
or cloud object storage with
shared schema, governance
KUBERNETES
Amazon EKS, Microsoft AKS, Google GKE, Red Hat OpenShift
DATA CATALOG
GOVERNANCESECURITY LIFECYCLEWORKLOAD XM
STORAGE Amazon
S3
Microsoft
ADLS
HDFS KUDU
7 © Cloudera, Inc. All rights reserved.
SOME LESSONS LEARNED
8 © Cloudera, Inc. All rights reserved.
LESSON 1: ENTERPRISE ML REQUIRES BIGGER PICTURE
Streaming
Ingest
Batch Ingest
Machine
Learning Tools
BI Tools and
SQL Editors
Data Products
DATA, METADATA, SECURITY, GOVERNANCE, WORKLOAD MANAGEMENT
MACHINE
LEARNING
DATA
ENGINEERING
DATA
WAREHOUSE
OPERATIONAL
DATABASE
9 © Cloudera, Inc. All rights reserved.
UBER
Michelangelo: Uber’s Machine Learning Platform
Source: https://eng.uber.com/michelangelo/
10 © Cloudera, Inc. All rights reserved.
NETFLIX
Netflix recommendation infrastructure
Source: https://medium.com/netflix-techblog/netflix-at-spark-ai-summit-2018-5304749ed7fa
11 © Cloudera, Inc. All rights reserved.
FACEBOOK
Facebook’s AI infrastructure
Source: https://www.slideshare.net/KarthikMurugesan2/facebook-machine-learning-infrastructure-2018-slides
12 © Cloudera, Inc. All rights reserved.
EMERGING CONSENSUS FOR ENTERPRISE ML AT SCALE
Cloud Native
Infrastructure
Big Data
Platform
ML/AI
Frameworks
13 © Cloudera, Inc. All rights reserved.
Data scientists want to use every library under the sun, platform should support that
Workbench
(interactive)
ANALYZ
E
Experiments
(batch, ad-hoc)
TRAIN
Reports
(shared output)
SHARE
Jobs
(batch, scheduled)
REPEAT
Models
(RPC APIs)
SERVE
LOOSELY COUPLED, ML/AI FRAMEWORK AGNOSTIC
LESSON 2: FOCUS ON WORKFLOWS NOT FRAMEWORKS
14 © Cloudera, Inc. All rights reserved.
Cloudera ML is built around one Operator and four “CRDs”:
• Session (Interactive)
• Experiment (Batch)
• Job (Scheduled)
• Model (Online API)
No long-lived operators for particular libraries: TFJob, SparkJob, etc
Kubernetes is not exposed to data scientist. Goal is serverless experience.
WHAT DOES THIS LOOK LIKE IN K8S?
Let’s talk operators and CRDs….
15 © Cloudera, Inc. All rights reserved.
cldr ml run --cmd “python train.py”
16 © Cloudera, Inc. All rights reserved.
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
results = sc.parallelize(xrange(0, 1000)) 
.map(f).collect()
17 © Cloudera, Inc. All rights reserved.
ROAD AHEAD
18 © Cloudera, Inc. All rights reserved.
• Unified workflow for distributed data engineering and machine learning
• Simple dependency management, particularly for ML workloads
• python: pip install, R: install.packages, etc
• Elastic compute (CPU / GPU) in cloud
• Improved utilization and multi-tenancy
WHY SPARK ON KUBERNETES?
19 © Cloudera, Inc. All rights reserved.
SPARK ON KUBERNETES
• Spark has native integration with Kubernetes since 2.3+
• Runs Spark fully containerized on K8s
• Allows Spark to add / remove executors with user specified Spark configuration
• Able to leverages Kubernetes features to facilitate launching Spark
(e.g: Node Affinities, RBAC, Namespaces, etc).
20 © Cloudera, Inc. All rights reserved.
Kubernetes
CMLCML
Engine
Spark
Driver
(Client
Mode)
Spark
Executor
Fluentd Fluentd
Project
Volume
Project
Volume
Spark
Executor
Project
Volume
Hive, HDFS,
HMS, Sentry,
Navigator- Create service account and namespace for
quota
- Configure Spark driver / executor
configuration to connect to k8s. Also
populate kerberos related information for
accessing HDFS and other CDH services.
- Populate dependency management related
configurations using pod template: volume,
image, user information, etc.
- Fluentd configured to pick up logging and
metric information to external storage
SPARK ON KUBERNETES IN CML
21 © Cloudera, Inc. All rights reserved.
DEMO
22 © Cloudera, Inc. All rights reserved.
• Kerberos support
• Dynamic allocation
• GPU support
• Scale and automation tests
• Advanced scheduling (FairScheduler, Hierarchical Queues, etc)
SPARK AND K8S UPSTREAM ONGOING WORK
23 © Cloudera, Inc. All rights reserved.
Proactively optimize Workloads, Application Performance, and Infrastructure Capacity
for Data Warehousing, Data Engineering, and Machine Learning Environments
Migrate Analyze Optimize Manage
AIOPS - AI-DRIVEN OPTIMIZATION AT SCALE
24 © Cloudera, Inc. All rights reserved.
AIOPS ON K8S - NODE BOTTLENECK ANALYSIS
25 © Cloudera, Inc. All rights reserved.
AIOPS ON K8S - CLUSTER AUTOMATION
26 © Cloudera, Inc. All rights reserved.
Questions?
Want to play around with this stuff?
tiny.cloudera.com/CML
Relevant Sigs
sig-big-data, sig-ml, sig-scheduling
THANK YOU

Weitere ähnliche Inhalte

Was ist angesagt?

Breaking the Monolith
Breaking the MonolithBreaking the Monolith
Breaking the MonolithVMware Tanzu
 
Kubernetes data science and machine learning
Kubernetes data science and machine learningKubernetes data science and machine learning
Kubernetes data science and machine learningKublr
 
High-Precision GPS Positioning for Spring Developers
High-Precision GPS Positioning for Spring DevelopersHigh-Precision GPS Positioning for Spring Developers
High-Precision GPS Positioning for Spring DevelopersVMware Tanzu
 
Secure Infrastructure Provisioning with Terraform Cloud, Vault + GitLab CI
Secure Infrastructure Provisioning with Terraform Cloud, Vault + GitLab CISecure Infrastructure Provisioning with Terraform Cloud, Vault + GitLab CI
Secure Infrastructure Provisioning with Terraform Cloud, Vault + GitLab CIMitchell Pronschinske
 
Manage thousands of k8s applications with minimal efforts using kube carrier
Manage thousands of k8s applications with minimal efforts using kube carrierManage thousands of k8s applications with minimal efforts using kube carrier
Manage thousands of k8s applications with minimal efforts using kube carrierLibbySchulze
 
The Good, the Bad and the Ugly of Migrating Hundreds of Legacy Applications ...
 The Good, the Bad and the Ugly of Migrating Hundreds of Legacy Applications ... The Good, the Bad and the Ugly of Migrating Hundreds of Legacy Applications ...
The Good, the Bad and the Ugly of Migrating Hundreds of Legacy Applications ...Josef Adersberger
 
Webinar: Operating Kubernetes at Scale
Webinar: Operating Kubernetes at ScaleWebinar: Operating Kubernetes at Scale
Webinar: Operating Kubernetes at ScaleMesosphere Inc.
 
An Open, Open source way to enable your Cloud Native Journey
An Open, Open source way to enable your Cloud Native JourneyAn Open, Open source way to enable your Cloud Native Journey
An Open, Open source way to enable your Cloud Native Journeyinwin stack
 
Top Considerations For Operating a Kubernetes Environment at Scale
Top Considerations For Operating a Kubernetes Environment at ScaleTop Considerations For Operating a Kubernetes Environment at Scale
Top Considerations For Operating a Kubernetes Environment at ScaleSignalFx
 
The Evolution of your Kubernetes Cluster
The Evolution of your Kubernetes ClusterThe Evolution of your Kubernetes Cluster
The Evolution of your Kubernetes ClusterKublr
 
Distributed architecture in a cloud native microservices ecosystem
Distributed architecture in a cloud native microservices ecosystemDistributed architecture in a cloud native microservices ecosystem
Distributed architecture in a cloud native microservices ecosystemZhenzhong Xu
 
Kirin User Story: Migrating Mission Critical Applications to OpenStack Privat...
Kirin User Story: Migrating Mission Critical Applications to OpenStack Privat...Kirin User Story: Migrating Mission Critical Applications to OpenStack Privat...
Kirin User Story: Migrating Mission Critical Applications to OpenStack Privat...Motoki Kakinuma
 
Overseeing Ship's Surveys and Surveyors Globally Using IoT and Docker by Jay ...
Overseeing Ship's Surveys and Surveyors Globally Using IoT and Docker by Jay ...Overseeing Ship's Surveys and Surveyors Globally Using IoT and Docker by Jay ...
Overseeing Ship's Surveys and Surveyors Globally Using IoT and Docker by Jay ...Docker, Inc.
 
Continuous Everything in a Multi-cloud and Multi-platform Environment
Continuous Everything in a Multi-cloud and Multi-platform EnvironmentContinuous Everything in a Multi-cloud and Multi-platform Environment
Continuous Everything in a Multi-cloud and Multi-platform EnvironmentVMware Tanzu
 
Storage os kubernetes clusters need persistent data
Storage os   kubernetes clusters need persistent dataStorage os   kubernetes clusters need persistent data
Storage os kubernetes clusters need persistent dataLibbySchulze
 
運用高效、敏捷全新平台極速落實雲原生開發
運用高效、敏捷全新平台極速落實雲原生開發運用高效、敏捷全新平台極速落實雲原生開發
運用高效、敏捷全新平台極速落實雲原生開發inwin stack
 
CF Summit North America 2017 - Technical Keynote
CF Summit North America 2017 - Technical KeynoteCF Summit North America 2017 - Technical Keynote
CF Summit North America 2017 - Technical KeynoteChip Childers
 
PaaS application in Heroku
PaaS application in HerokuPaaS application in Heroku
PaaS application in HerokuDileepa Jayakody
 

Was ist angesagt? (20)

Breaking the Monolith
Breaking the MonolithBreaking the Monolith
Breaking the Monolith
 
Intro to Google Cloud Platform Data Engineering.- Endpoints
Intro to Google Cloud Platform Data Engineering.- EndpointsIntro to Google Cloud Platform Data Engineering.- Endpoints
Intro to Google Cloud Platform Data Engineering.- Endpoints
 
Kubernetes data science and machine learning
Kubernetes data science and machine learningKubernetes data science and machine learning
Kubernetes data science and machine learning
 
High-Precision GPS Positioning for Spring Developers
High-Precision GPS Positioning for Spring DevelopersHigh-Precision GPS Positioning for Spring Developers
High-Precision GPS Positioning for Spring Developers
 
Secure Infrastructure Provisioning with Terraform Cloud, Vault + GitLab CI
Secure Infrastructure Provisioning with Terraform Cloud, Vault + GitLab CISecure Infrastructure Provisioning with Terraform Cloud, Vault + GitLab CI
Secure Infrastructure Provisioning with Terraform Cloud, Vault + GitLab CI
 
Manage thousands of k8s applications with minimal efforts using kube carrier
Manage thousands of k8s applications with minimal efforts using kube carrierManage thousands of k8s applications with minimal efforts using kube carrier
Manage thousands of k8s applications with minimal efforts using kube carrier
 
The Good, the Bad and the Ugly of Migrating Hundreds of Legacy Applications ...
 The Good, the Bad and the Ugly of Migrating Hundreds of Legacy Applications ... The Good, the Bad and the Ugly of Migrating Hundreds of Legacy Applications ...
The Good, the Bad and the Ugly of Migrating Hundreds of Legacy Applications ...
 
Webinar: Operating Kubernetes at Scale
Webinar: Operating Kubernetes at ScaleWebinar: Operating Kubernetes at Scale
Webinar: Operating Kubernetes at Scale
 
An Open, Open source way to enable your Cloud Native Journey
An Open, Open source way to enable your Cloud Native JourneyAn Open, Open source way to enable your Cloud Native Journey
An Open, Open source way to enable your Cloud Native Journey
 
Top Considerations For Operating a Kubernetes Environment at Scale
Top Considerations For Operating a Kubernetes Environment at ScaleTop Considerations For Operating a Kubernetes Environment at Scale
Top Considerations For Operating a Kubernetes Environment at Scale
 
The Evolution of your Kubernetes Cluster
The Evolution of your Kubernetes ClusterThe Evolution of your Kubernetes Cluster
The Evolution of your Kubernetes Cluster
 
Distributed architecture in a cloud native microservices ecosystem
Distributed architecture in a cloud native microservices ecosystemDistributed architecture in a cloud native microservices ecosystem
Distributed architecture in a cloud native microservices ecosystem
 
Kirin User Story: Migrating Mission Critical Applications to OpenStack Privat...
Kirin User Story: Migrating Mission Critical Applications to OpenStack Privat...Kirin User Story: Migrating Mission Critical Applications to OpenStack Privat...
Kirin User Story: Migrating Mission Critical Applications to OpenStack Privat...
 
Overseeing Ship's Surveys and Surveyors Globally Using IoT and Docker by Jay ...
Overseeing Ship's Surveys and Surveyors Globally Using IoT and Docker by Jay ...Overseeing Ship's Surveys and Surveyors Globally Using IoT and Docker by Jay ...
Overseeing Ship's Surveys and Surveyors Globally Using IoT and Docker by Jay ...
 
Continuous Everything in a Multi-cloud and Multi-platform Environment
Continuous Everything in a Multi-cloud and Multi-platform EnvironmentContinuous Everything in a Multi-cloud and Multi-platform Environment
Continuous Everything in a Multi-cloud and Multi-platform Environment
 
Storage os kubernetes clusters need persistent data
Storage os   kubernetes clusters need persistent dataStorage os   kubernetes clusters need persistent data
Storage os kubernetes clusters need persistent data
 
Building cloud native apps
Building cloud native appsBuilding cloud native apps
Building cloud native apps
 
運用高效、敏捷全新平台極速落實雲原生開發
運用高效、敏捷全新平台極速落實雲原生開發運用高效、敏捷全新平台極速落實雲原生開發
運用高效、敏捷全新平台極速落實雲原生開發
 
CF Summit North America 2017 - Technical Keynote
CF Summit North America 2017 - Technical KeynoteCF Summit North America 2017 - Technical Keynote
CF Summit North America 2017 - Technical Keynote
 
PaaS application in Heroku
PaaS application in HerokuPaaS application in Heroku
PaaS application in Heroku
 

Ähnlich wie Enterprise machine learning on k8s lessons learned and the road ahead

Cloud-Native Machine Learning: Emerging Trends and the Road Ahead
Cloud-Native Machine Learning: Emerging Trends and the Road AheadCloud-Native Machine Learning: Emerging Trends and the Road Ahead
Cloud-Native Machine Learning: Emerging Trends and the Road AheadDataWorks Summit
 
Edge to AI: Analytics from Edge to Cloud with Efficient Movement of Machine ...
Edge to AI:  Analytics from Edge to Cloud with Efficient Movement of Machine ...Edge to AI:  Analytics from Edge to Cloud with Efficient Movement of Machine ...
Edge to AI: Analytics from Edge to Cloud with Efficient Movement of Machine ...Timothy Spann
 
The Edge to AI Deep Dive Barcelona Meetup March 2019
The Edge to AI Deep Dive Barcelona Meetup March 2019The Edge to AI Deep Dive Barcelona Meetup March 2019
The Edge to AI Deep Dive Barcelona Meetup March 2019Timothy Spann
 
Part 2: A Visual Dive into Machine Learning and Deep Learning 

Part 2: A Visual Dive into Machine Learning and Deep Learning 
Part 2: A Visual Dive into Machine Learning and Deep Learning 

Part 2: A Visual Dive into Machine Learning and Deep Learning 
Cloudera, Inc.
 
Edge to ai analytics from edge to cloud with efficient movement of machine data
Edge to ai  analytics from edge to cloud with efficient movement of machine dataEdge to ai  analytics from edge to cloud with efficient movement of machine data
Edge to ai analytics from edge to cloud with efficient movement of machine dataTimothy Spann
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Cloudera, Inc.
 
Machine Learning in the Enterprise 2019
Machine Learning in the Enterprise 2019   Machine Learning in the Enterprise 2019
Machine Learning in the Enterprise 2019 Timothy Spann
 
How to go into production your machine learning models? #CWT2017
How to go into production your machine learning models? #CWT2017How to go into production your machine learning models? #CWT2017
How to go into production your machine learning models? #CWT2017Cloudera Japan
 
One Hadoop, Multiple Clouds - NYC Big Data Meetup
One Hadoop, Multiple Clouds - NYC Big Data MeetupOne Hadoop, Multiple Clouds - NYC Big Data Meetup
One Hadoop, Multiple Clouds - NYC Big Data MeetupAndrei Savu
 
One Hadoop, Multiple Clouds
One Hadoop, Multiple CloudsOne Hadoop, Multiple Clouds
One Hadoop, Multiple CloudsCloudera, Inc.
 
Meetup Streaming Data Pipeline Development
Meetup Streaming Data Pipeline DevelopmentMeetup Streaming Data Pipeline Development
Meetup Streaming Data Pipeline DevelopmentTimothy Spann
 
Future of Data Milwaukee Meetup Streaming Data Pipeline Development 28 June 2023
Future of Data Milwaukee Meetup Streaming Data Pipeline Development 28 June 2023Future of Data Milwaukee Meetup Streaming Data Pipeline Development 28 June 2023
Future of Data Milwaukee Meetup Streaming Data Pipeline Development 28 June 2023ssuser73434e
 
Cloudera Analytics and Machine Learning Platform - Optimized for Cloud
Cloudera Analytics and Machine Learning Platform - Optimized for Cloud Cloudera Analytics and Machine Learning Platform - Optimized for Cloud
Cloudera Analytics and Machine Learning Platform - Optimized for Cloud Stefan Lipp
 
Big Data Fundamentals 6.6.18
Big Data Fundamentals 6.6.18Big Data Fundamentals 6.6.18
Big Data Fundamentals 6.6.18Cloudera, Inc.
 
Hadoop on Cloud: Why and How?
Hadoop on Cloud: Why and How?Hadoop on Cloud: Why and How?
Hadoop on Cloud: Why and How?Cloudera, Inc.
 
26Oct2023_Adding Generative AI to Real-Time Streaming Pipelines_ NYC Meetup
26Oct2023_Adding Generative AI to Real-Time Streaming Pipelines_ NYC Meetup26Oct2023_Adding Generative AI to Real-Time Streaming Pipelines_ NYC Meetup
26Oct2023_Adding Generative AI to Real-Time Streaming Pipelines_ NYC MeetupTimothy Spann
 
PartnerSkillUp_Enable a Streaming CDC Solution
PartnerSkillUp_Enable a Streaming CDC SolutionPartnerSkillUp_Enable a Streaming CDC Solution
PartnerSkillUp_Enable a Streaming CDC SolutionTimothy Spann
 
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024Timothy Spann
 
NOVA Data Science Meetup 2-21-2018 Presentation Cloudera Data Science Workbench
NOVA Data Science Meetup 2-21-2018 Presentation Cloudera Data Science WorkbenchNOVA Data Science Meetup 2-21-2018 Presentation Cloudera Data Science Workbench
NOVA Data Science Meetup 2-21-2018 Presentation Cloudera Data Science WorkbenchNOVA DATASCIENCE
 

Ähnlich wie Enterprise machine learning on k8s lessons learned and the road ahead (20)

Cloud-Native Machine Learning: Emerging Trends and the Road Ahead
Cloud-Native Machine Learning: Emerging Trends and the Road AheadCloud-Native Machine Learning: Emerging Trends and the Road Ahead
Cloud-Native Machine Learning: Emerging Trends and the Road Ahead
 
Edge to AI: Analytics from Edge to Cloud with Efficient Movement of Machine ...
Edge to AI:  Analytics from Edge to Cloud with Efficient Movement of Machine ...Edge to AI:  Analytics from Edge to Cloud with Efficient Movement of Machine ...
Edge to AI: Analytics from Edge to Cloud with Efficient Movement of Machine ...
 
The Edge to AI Deep Dive Barcelona Meetup March 2019
The Edge to AI Deep Dive Barcelona Meetup March 2019The Edge to AI Deep Dive Barcelona Meetup March 2019
The Edge to AI Deep Dive Barcelona Meetup March 2019
 
Part 2: A Visual Dive into Machine Learning and Deep Learning 

Part 2: A Visual Dive into Machine Learning and Deep Learning 
Part 2: A Visual Dive into Machine Learning and Deep Learning 

Part 2: A Visual Dive into Machine Learning and Deep Learning 

 
Edge to ai analytics from edge to cloud with efficient movement of machine data
Edge to ai  analytics from edge to cloud with efficient movement of machine dataEdge to ai  analytics from edge to cloud with efficient movement of machine data
Edge to ai analytics from edge to cloud with efficient movement of machine data
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19
 
Machine Learning in the Enterprise 2019
Machine Learning in the Enterprise 2019   Machine Learning in the Enterprise 2019
Machine Learning in the Enterprise 2019
 
How to go into production your machine learning models? #CWT2017
How to go into production your machine learning models? #CWT2017How to go into production your machine learning models? #CWT2017
How to go into production your machine learning models? #CWT2017
 
One Hadoop, Multiple Clouds - NYC Big Data Meetup
One Hadoop, Multiple Clouds - NYC Big Data MeetupOne Hadoop, Multiple Clouds - NYC Big Data Meetup
One Hadoop, Multiple Clouds - NYC Big Data Meetup
 
One Hadoop, Multiple Clouds
One Hadoop, Multiple CloudsOne Hadoop, Multiple Clouds
One Hadoop, Multiple Clouds
 
Meetup Streaming Data Pipeline Development
Meetup Streaming Data Pipeline DevelopmentMeetup Streaming Data Pipeline Development
Meetup Streaming Data Pipeline Development
 
Future of Data Milwaukee Meetup Streaming Data Pipeline Development 28 June 2023
Future of Data Milwaukee Meetup Streaming Data Pipeline Development 28 June 2023Future of Data Milwaukee Meetup Streaming Data Pipeline Development 28 June 2023
Future of Data Milwaukee Meetup Streaming Data Pipeline Development 28 June 2023
 
Cloudera Analytics and Machine Learning Platform - Optimized for Cloud
Cloudera Analytics and Machine Learning Platform - Optimized for Cloud Cloudera Analytics and Machine Learning Platform - Optimized for Cloud
Cloudera Analytics and Machine Learning Platform - Optimized for Cloud
 
Big Data Fundamentals 6.6.18
Big Data Fundamentals 6.6.18Big Data Fundamentals 6.6.18
Big Data Fundamentals 6.6.18
 
Big Data Fundamentals
Big Data FundamentalsBig Data Fundamentals
Big Data Fundamentals
 
Hadoop on Cloud: Why and How?
Hadoop on Cloud: Why and How?Hadoop on Cloud: Why and How?
Hadoop on Cloud: Why and How?
 
26Oct2023_Adding Generative AI to Real-Time Streaming Pipelines_ NYC Meetup
26Oct2023_Adding Generative AI to Real-Time Streaming Pipelines_ NYC Meetup26Oct2023_Adding Generative AI to Real-Time Streaming Pipelines_ NYC Meetup
26Oct2023_Adding Generative AI to Real-Time Streaming Pipelines_ NYC Meetup
 
PartnerSkillUp_Enable a Streaming CDC Solution
PartnerSkillUp_Enable a Streaming CDC SolutionPartnerSkillUp_Enable a Streaming CDC Solution
PartnerSkillUp_Enable a Streaming CDC Solution
 
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
 
NOVA Data Science Meetup 2-21-2018 Presentation Cloudera Data Science Workbench
NOVA Data Science Meetup 2-21-2018 Presentation Cloudera Data Science WorkbenchNOVA Data Science Meetup 2-21-2018 Presentation Cloudera Data Science Workbench
NOVA Data Science Meetup 2-21-2018 Presentation Cloudera Data Science Workbench
 

Kürzlich hochgeladen

TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providermohitmore19
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...ICS
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comFatema Valibhai
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsJhone kinadey
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnAmarnathKambale
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfkalichargn70th171
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...Health
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVshikhaohhpro
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionSolGuruz
 
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfintroduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfVishalKumarJha10
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...panagenda
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerThousandEyes
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...OnePlan Solutions
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Modelsaagamshah0812
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfkalichargn70th171
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsAlberto González Trastoy
 
Exploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdfExploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdfproinshot.com
 
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionIntroducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionOnePlan Solutions
 

Kürzlich hochgeladen (20)

CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learn
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with Precision
 
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfintroduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
Exploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdfExploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdf
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionIntroducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
 

Enterprise machine learning on k8s lessons learned and the road ahead

  • 1. ENTERPRISE MACHINE LEARNING ON K8S: LESSONS LEARNED AND THE ROAD AHEAD Tim Chen, Cloudera Tristan Zajonc, Cloudera
  • 2. 2 © Cloudera, Inc. All rights reserved. Tim Chen Cloudera @tnachen WHO ARE WE? Lead for Cloudera Machine Learning Former Hyperpilot CEO/Cofounder Apache PMC for Drill, Mesos Tristan Zajonc Cloudera @tristanzajonc CTO for Machine Learning at Cloudera Former Sense CEO/Cofounder
  • 3. 3 © Cloudera, Inc. All rights reserved. DISCLAIMER The information in this document is proprietary to Cloudera. No part of this document may be reproduced or disclosed without the express prior written permission of Cloudera. This document is not binding upon Cloudera in any way (including with respect to Cloudera’s course of business, product strategy, future releases, or development commitments), and is not a part of or subject to any license agreement or other agreement you may have with Cloudera. Cloudera makes no commitments about any future developments or functionality in any Cloudera product. The development, release, and timing of release of any software features or functionality described in this document remains at the discretion of Cloudera and you should not rely on any statements about development plans or anticipated functionality in this document when making any purchasing decisions. Cloudera assumes no responsibility for errors or omissions in this document. Cloudera does not warrant the accuracy or completeness of the information, text, graphics, links or other items contained within this material. This document is provided without a warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability, fitness for a particular purpose or non-infringement. Cloudera shall have no liability for damages of any kind including without limitation direct, special, indirect or consequential damages that may result from the use of these materials.”
  • 4. 4 © Cloudera, Inc. All rights reserved. +
  • 5. 5 © Cloudera, Inc. All rights reserved. CLOUDERA DATA SCIENCE WORKBENCH Collaborative data science experience powered by Kubernetes • Built with Docker and Kubernetes • Isolated, reproducible user environments • Supports both big and small data • Local Python, R, Scala runtimes • Connect to any data source • Schedule & share GPU resources • Scale to CDH with Spark, Impala, Hive • Secure and governed by default • Easy, audited access to Kerberized clusters • Leverages shared platform services • Deployed with Cloudera Manager CDH CDH Cloudera Manager gateway node(s) CDH nodes Hive, HDFS, ... CDSW CDSW ... Master ... Engine EngineEngine EngineEngine
  • 6. 6 © Cloudera, Inc. All rights reserved. CLOUDERA MACHINE LEARNING (PREVIEW) Cloud-native machine learning platform for the enterprise DATA SCIENCE DATA ENGINEERING MODEL OPERATIONS CLOUDERA ML RUNTIME Python/R, Spark, TensorFlow, etc, optimized for CPU/GPU Interactive Development Batch Pipelines Predictive APIs End-to-end ML lifecycle for teams to build at scale Rapid provisioning and elastic autoscaling with multi-cloud portability powered by Kubernetes Containerized workloads for open data science with easy dependency management Distributed GPU training for accelerated deep learning Secure data access to HDFS or cloud object storage with shared schema, governance KUBERNETES Amazon EKS, Microsoft AKS, Google GKE, Red Hat OpenShift DATA CATALOG GOVERNANCESECURITY LIFECYCLEWORKLOAD XM STORAGE Amazon S3 Microsoft ADLS HDFS KUDU
  • 7. 7 © Cloudera, Inc. All rights reserved. SOME LESSONS LEARNED
  • 8. 8 © Cloudera, Inc. All rights reserved. LESSON 1: ENTERPRISE ML REQUIRES BIGGER PICTURE Streaming Ingest Batch Ingest Machine Learning Tools BI Tools and SQL Editors Data Products DATA, METADATA, SECURITY, GOVERNANCE, WORKLOAD MANAGEMENT MACHINE LEARNING DATA ENGINEERING DATA WAREHOUSE OPERATIONAL DATABASE
  • 9. 9 © Cloudera, Inc. All rights reserved. UBER Michelangelo: Uber’s Machine Learning Platform Source: https://eng.uber.com/michelangelo/
  • 10. 10 © Cloudera, Inc. All rights reserved. NETFLIX Netflix recommendation infrastructure Source: https://medium.com/netflix-techblog/netflix-at-spark-ai-summit-2018-5304749ed7fa
  • 11. 11 © Cloudera, Inc. All rights reserved. FACEBOOK Facebook’s AI infrastructure Source: https://www.slideshare.net/KarthikMurugesan2/facebook-machine-learning-infrastructure-2018-slides
  • 12. 12 © Cloudera, Inc. All rights reserved. EMERGING CONSENSUS FOR ENTERPRISE ML AT SCALE Cloud Native Infrastructure Big Data Platform ML/AI Frameworks
  • 13. 13 © Cloudera, Inc. All rights reserved. Data scientists want to use every library under the sun, platform should support that Workbench (interactive) ANALYZ E Experiments (batch, ad-hoc) TRAIN Reports (shared output) SHARE Jobs (batch, scheduled) REPEAT Models (RPC APIs) SERVE LOOSELY COUPLED, ML/AI FRAMEWORK AGNOSTIC LESSON 2: FOCUS ON WORKFLOWS NOT FRAMEWORKS
  • 14. 14 © Cloudera, Inc. All rights reserved. Cloudera ML is built around one Operator and four “CRDs”: • Session (Interactive) • Experiment (Batch) • Job (Scheduled) • Model (Online API) No long-lived operators for particular libraries: TFJob, SparkJob, etc Kubernetes is not exposed to data scientist. Goal is serverless experience. WHAT DOES THIS LOOK LIKE IN K8S? Let’s talk operators and CRDs….
  • 15. 15 © Cloudera, Inc. All rights reserved. cldr ml run --cmd “python train.py”
  • 16. 16 © Cloudera, Inc. All rights reserved. from pyspark.sql import SparkSession spark = SparkSession.builder.getOrCreate() results = sc.parallelize(xrange(0, 1000)) .map(f).collect()
  • 17. 17 © Cloudera, Inc. All rights reserved. ROAD AHEAD
  • 18. 18 © Cloudera, Inc. All rights reserved. • Unified workflow for distributed data engineering and machine learning • Simple dependency management, particularly for ML workloads • python: pip install, R: install.packages, etc • Elastic compute (CPU / GPU) in cloud • Improved utilization and multi-tenancy WHY SPARK ON KUBERNETES?
  • 19. 19 © Cloudera, Inc. All rights reserved. SPARK ON KUBERNETES • Spark has native integration with Kubernetes since 2.3+ • Runs Spark fully containerized on K8s • Allows Spark to add / remove executors with user specified Spark configuration • Able to leverages Kubernetes features to facilitate launching Spark (e.g: Node Affinities, RBAC, Namespaces, etc).
  • 20. 20 © Cloudera, Inc. All rights reserved. Kubernetes CMLCML Engine Spark Driver (Client Mode) Spark Executor Fluentd Fluentd Project Volume Project Volume Spark Executor Project Volume Hive, HDFS, HMS, Sentry, Navigator- Create service account and namespace for quota - Configure Spark driver / executor configuration to connect to k8s. Also populate kerberos related information for accessing HDFS and other CDH services. - Populate dependency management related configurations using pod template: volume, image, user information, etc. - Fluentd configured to pick up logging and metric information to external storage SPARK ON KUBERNETES IN CML
  • 21. 21 © Cloudera, Inc. All rights reserved. DEMO
  • 22. 22 © Cloudera, Inc. All rights reserved. • Kerberos support • Dynamic allocation • GPU support • Scale and automation tests • Advanced scheduling (FairScheduler, Hierarchical Queues, etc) SPARK AND K8S UPSTREAM ONGOING WORK
  • 23. 23 © Cloudera, Inc. All rights reserved. Proactively optimize Workloads, Application Performance, and Infrastructure Capacity for Data Warehousing, Data Engineering, and Machine Learning Environments Migrate Analyze Optimize Manage AIOPS - AI-DRIVEN OPTIMIZATION AT SCALE
  • 24. 24 © Cloudera, Inc. All rights reserved. AIOPS ON K8S - NODE BOTTLENECK ANALYSIS
  • 25. 25 © Cloudera, Inc. All rights reserved. AIOPS ON K8S - CLUSTER AUTOMATION
  • 26. 26 © Cloudera, Inc. All rights reserved. Questions? Want to play around with this stuff? tiny.cloudera.com/CML Relevant Sigs sig-big-data, sig-ml, sig-scheduling