SlideShare ist ein Scribd-Unternehmen logo
1 von 15
Training and Serving ML models using Kubeflow
• Subtitle or speaker name
Jayesh Sharma
Machine Learning Stages
@aronchick
Make it easy for everyone to develop, deploy,
and manage portable, scalable ML everywhere
Why Kubeflow?
● Composability
○ Choose from existing popular tools
● Portability
○ Build using cloud native, portable Kubernetes APIs
● Scalability
○ TF already supports CPU/GPU/distributed
○ K8s scales to 5k nodes with same stack
What’s in the Box?
● Jupyter Hub - for collaborative & interactive training
● A TensorFlow Training Controller
● A TensorFlow Serving Deployment
● Argo for workflows
● Much more
What’s in the Box?
Kubeflow today
Kubeflow is composable
Training
• Perform distributed training with TF-Jobs
• Run pipelines with regular containers as steps.
• Run pipelines with TF-Jobs and other CRDs as steps.
Serving
• KF-Serving, Seldon Core
• Azure ML Service and other frameworks.
Kubeflow
Architecture
TF-Job: Distributed Training
A distributed TensorFlow job typically contains 0 or more of the following
processes:
• Chief: The chief is responsible for orchestrating training and performing
tasks like checkpointing the model.
• PS: The ps are parameter servers; these servers provide a distributed
data store for the model parameters.
• Worker: The workers do the actual work of training the model. In some
cases, worker 0 might also act as the chief.
• Evaluator: The evaluators can be used to compute
evaluation metrics as the model is trained.
An example
TF-Job YAML
Parameter Server option
Worker specification
Image with your code
Command to begin
training
Kubeflow Pipelines
• A user interface (UI) for managing and tracking experiments, jobs, and
runs.
• An engine for scheduling multi-step ML workflows.
• An SDK for defining and manipulating pipelines and components.
• Notebooks for interacting with the system using the SDK.
Anatomy of a pipeline
• Containerized implementations of ML Tasks
• Pre-built components: Just provide params or code snippets. Create
your own components from code or libraries
• Use any runtime, framework, data types
• Attach k8s objects - volumes, secrets
• Specification of the sequence of steps
• Specified via Python DSL
• Inferred from data dependencies on input/output
• Input Parameters
• A “Run” = Pipeline invoked w/ specific parameters
• Schedules
• Invoke a single run or create a recurring scheduled pipeline
Training And Serving ML Model Using Kubeflow by Jayesh Sharma
Training And Serving ML Model Using Kubeflow by Jayesh Sharma

Weitere ähnliche Inhalte

Was ist angesagt?

MLFlow: Platform for Complete Machine Learning Lifecycle
MLFlow: Platform for Complete Machine Learning Lifecycle MLFlow: Platform for Complete Machine Learning Lifecycle
MLFlow: Platform for Complete Machine Learning Lifecycle
Databricks
 
MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ...
 MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ... MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ...
MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ...
Databricks
 
How to Utilize MLflow and Kubernetes to Build an Enterprise ML Platform
How to Utilize MLflow and Kubernetes to Build an Enterprise ML PlatformHow to Utilize MLflow and Kubernetes to Build an Enterprise ML Platform
How to Utilize MLflow and Kubernetes to Build an Enterprise ML Platform
Databricks
 

Was ist angesagt? (20)

Kubeflow Pipelines (with Tekton)
Kubeflow Pipelines (with Tekton)Kubeflow Pipelines (with Tekton)
Kubeflow Pipelines (with Tekton)
 
KFServing - Serverless Model Inferencing
KFServing - Serverless Model InferencingKFServing - Serverless Model Inferencing
KFServing - Serverless Model Inferencing
 
MLOps - Build pipelines with Tensor Flow Extended & Kubeflow
MLOps - Build pipelines with Tensor Flow Extended & KubeflowMLOps - Build pipelines with Tensor Flow Extended & Kubeflow
MLOps - Build pipelines with Tensor Flow Extended & Kubeflow
 
MLOps Using MLflow
MLOps Using MLflowMLOps Using MLflow
MLOps Using MLflow
 
Introduction to MLflow
Introduction to MLflowIntroduction to MLflow
Introduction to MLflow
 
Simplifying Model Management with MLflow
Simplifying Model Management with MLflowSimplifying Model Management with MLflow
Simplifying Model Management with MLflow
 
MLFlow: Platform for Complete Machine Learning Lifecycle
MLFlow: Platform for Complete Machine Learning Lifecycle MLFlow: Platform for Complete Machine Learning Lifecycle
MLFlow: Platform for Complete Machine Learning Lifecycle
 
Serving models using KFServing
Serving models using KFServingServing models using KFServing
Serving models using KFServing
 
Kubernetes Monitoring & Best Practices
Kubernetes Monitoring & Best PracticesKubernetes Monitoring & Best Practices
Kubernetes Monitoring & Best Practices
 
Kubernetes Helm (Boulder Kubernetes Meetup, June 2016)
Kubernetes Helm (Boulder Kubernetes Meetup, June 2016)Kubernetes Helm (Boulder Kubernetes Meetup, June 2016)
Kubernetes Helm (Boulder Kubernetes Meetup, June 2016)
 
Pythonsevilla2019 - Introduction to MLFlow
Pythonsevilla2019 - Introduction to MLFlowPythonsevilla2019 - Introduction to MLFlow
Pythonsevilla2019 - Introduction to MLFlow
 
ONNX and Edge Deployments
ONNX and Edge DeploymentsONNX and Edge Deployments
ONNX and Edge Deployments
 
Reproducible AI using MLflow and PyTorch
Reproducible AI using MLflow and PyTorchReproducible AI using MLflow and PyTorch
Reproducible AI using MLflow and PyTorch
 
Event-driven autoscaling through KEDA and Knative Integration | DevNation Tec...
Event-driven autoscaling through KEDA and Knative Integration | DevNation Tec...Event-driven autoscaling through KEDA and Knative Integration | DevNation Tec...
Event-driven autoscaling through KEDA and Knative Integration | DevNation Tec...
 
MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ...
 MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ... MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ...
MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ...
 
Room 1 - 4 - Phạm Tường Chiến & Trần Văn Thắng - Deliver managed Kubernetes C...
Room 1 - 4 - Phạm Tường Chiến & Trần Văn Thắng - Deliver managed Kubernetes C...Room 1 - 4 - Phạm Tường Chiến & Trần Văn Thắng - Deliver managed Kubernetes C...
Room 1 - 4 - Phạm Tường Chiến & Trần Văn Thắng - Deliver managed Kubernetes C...
 
Kubernetes a comprehensive overview
Kubernetes   a comprehensive overviewKubernetes   a comprehensive overview
Kubernetes a comprehensive overview
 
Helm - Application deployment management for Kubernetes
Helm - Application deployment management for KubernetesHelm - Application deployment management for Kubernetes
Helm - Application deployment management for Kubernetes
 
Intro to Helm for Kubernetes
Intro to Helm for KubernetesIntro to Helm for Kubernetes
Intro to Helm for Kubernetes
 
How to Utilize MLflow and Kubernetes to Build an Enterprise ML Platform
How to Utilize MLflow and Kubernetes to Build an Enterprise ML PlatformHow to Utilize MLflow and Kubernetes to Build an Enterprise ML Platform
How to Utilize MLflow and Kubernetes to Build an Enterprise ML Platform
 

Ähnlich wie Training And Serving ML Model Using Kubeflow by Jayesh Sharma

[DSC Europe 23] Milos Grubjesic Empowering Business with Pepsico s Advanced M...
[DSC Europe 23] Milos Grubjesic Empowering Business with Pepsico s Advanced M...[DSC Europe 23] Milos Grubjesic Empowering Business with Pepsico s Advanced M...
[DSC Europe 23] Milos Grubjesic Empowering Business with Pepsico s Advanced M...
DataScienceConferenc1
 

Ähnlich wie Training And Serving ML Model Using Kubeflow by Jayesh Sharma (20)

Containerized architectures for deep learning
Containerized architectures for deep learningContainerized architectures for deep learning
Containerized architectures for deep learning
 
Distributed Tensorflow with Kubernetes - data2day - Jakob Karalus
Distributed Tensorflow with Kubernetes - data2day - Jakob KaralusDistributed Tensorflow with Kubernetes - data2day - Jakob Karalus
Distributed Tensorflow with Kubernetes - data2day - Jakob Karalus
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Kubeflow: portable and scalable machine learning using Jupyterhub and Kuberne...
Kubeflow: portable and scalable machine learning using Jupyterhub and Kuberne...Kubeflow: portable and scalable machine learning using Jupyterhub and Kuberne...
Kubeflow: portable and scalable machine learning using Jupyterhub and Kuberne...
 
[DSC Europe 23] Milos Grubjesic Empowering Business with Pepsico s Advanced M...
[DSC Europe 23] Milos Grubjesic Empowering Business with Pepsico s Advanced M...[DSC Europe 23] Milos Grubjesic Empowering Business with Pepsico s Advanced M...
[DSC Europe 23] Milos Grubjesic Empowering Business with Pepsico s Advanced M...
 
Deploy your machine learning models to production with Kubernetes
Deploy your machine learning models to production with KubernetesDeploy your machine learning models to production with Kubernetes
Deploy your machine learning models to production with Kubernetes
 
MLflow with Databricks
MLflow with DatabricksMLflow with Databricks
MLflow with Databricks
 
Mlflow with databricks
Mlflow with databricksMlflow with databricks
Mlflow with databricks
 
Running Apache Spark Jobs Using Kubernetes
Running Apache Spark Jobs Using KubernetesRunning Apache Spark Jobs Using Kubernetes
Running Apache Spark Jobs Using Kubernetes
 
Hot to build continuously processing for 24/7 real-time data streaming platform?
Hot to build continuously processing for 24/7 real-time data streaming platform?Hot to build continuously processing for 24/7 real-time data streaming platform?
Hot to build continuously processing for 24/7 real-time data streaming platform?
 
Trenowanie i wdrażanie modeli uczenia maszynowego z wykorzystaniem Google Clo...
Trenowanie i wdrażanie modeli uczenia maszynowego z wykorzystaniem Google Clo...Trenowanie i wdrażanie modeli uczenia maszynowego z wykorzystaniem Google Clo...
Trenowanie i wdrażanie modeli uczenia maszynowego z wykorzystaniem Google Clo...
 
Vertex AI Presentation
Vertex AI PresentationVertex AI Presentation
Vertex AI Presentation
 
Bodywork - GitOps for Machine Learning
Bodywork - GitOps for Machine LearningBodywork - GitOps for Machine Learning
Bodywork - GitOps for Machine Learning
 
Democratizing machine learning on kubernetes
Democratizing machine learning on kubernetesDemocratizing machine learning on kubernetes
Democratizing machine learning on kubernetes
 
Parallel Programming
Parallel ProgrammingParallel Programming
Parallel Programming
 
Neptune @ SoCal
Neptune @ SoCalNeptune @ SoCal
Neptune @ SoCal
 
Caffe2
Caffe2Caffe2
Caffe2
 
ML Platform Q1 Meetup: Airbnb's End-to-End Machine Learning Infrastructure
ML Platform Q1 Meetup: Airbnb's End-to-End Machine Learning InfrastructureML Platform Q1 Meetup: Airbnb's End-to-End Machine Learning Infrastructure
ML Platform Q1 Meetup: Airbnb's End-to-End Machine Learning Infrastructure
 
DAIS Europe Nov. 2020 presentation on MLflow Model Serving
DAIS Europe Nov. 2020 presentation on MLflow Model ServingDAIS Europe Nov. 2020 presentation on MLflow Model Serving
DAIS Europe Nov. 2020 presentation on MLflow Model Serving
 
Scale machine learning deployment
Scale machine learning deploymentScale machine learning deployment
Scale machine learning deployment
 

Mehr von CodeOps Technologies LLP

Mehr von CodeOps Technologies LLP (20)

AWS Serverless Event-driven Architecture - in lastminute.com meetup
AWS Serverless Event-driven Architecture - in lastminute.com meetupAWS Serverless Event-driven Architecture - in lastminute.com meetup
AWS Serverless Event-driven Architecture - in lastminute.com meetup
 
Understanding azure batch service
Understanding azure batch serviceUnderstanding azure batch service
Understanding azure batch service
 
DEVOPS AND MACHINE LEARNING
DEVOPS AND MACHINE LEARNINGDEVOPS AND MACHINE LEARNING
DEVOPS AND MACHINE LEARNING
 
SERVERLESS MIDDLEWARE IN AZURE FUNCTIONS
SERVERLESS MIDDLEWARE IN AZURE FUNCTIONSSERVERLESS MIDDLEWARE IN AZURE FUNCTIONS
SERVERLESS MIDDLEWARE IN AZURE FUNCTIONS
 
BUILDING SERVERLESS SOLUTIONS WITH AZURE FUNCTIONS
BUILDING SERVERLESS SOLUTIONS WITH AZURE FUNCTIONSBUILDING SERVERLESS SOLUTIONS WITH AZURE FUNCTIONS
BUILDING SERVERLESS SOLUTIONS WITH AZURE FUNCTIONS
 
APPLYING DEVOPS STRATEGIES ON SCALE USING AZURE DEVOPS SERVICES
APPLYING DEVOPS STRATEGIES ON SCALE USING AZURE DEVOPS SERVICESAPPLYING DEVOPS STRATEGIES ON SCALE USING AZURE DEVOPS SERVICES
APPLYING DEVOPS STRATEGIES ON SCALE USING AZURE DEVOPS SERVICES
 
BUILD, TEST & DEPLOY .NET CORE APPS IN AZURE DEVOPS
BUILD, TEST & DEPLOY .NET CORE APPS IN AZURE DEVOPSBUILD, TEST & DEPLOY .NET CORE APPS IN AZURE DEVOPS
BUILD, TEST & DEPLOY .NET CORE APPS IN AZURE DEVOPS
 
CREATE RELIABLE AND LOW-CODE APPLICATION IN SERVERLESS MANNER
CREATE RELIABLE AND LOW-CODE APPLICATION IN SERVERLESS MANNERCREATE RELIABLE AND LOW-CODE APPLICATION IN SERVERLESS MANNER
CREATE RELIABLE AND LOW-CODE APPLICATION IN SERVERLESS MANNER
 
CREATING REAL TIME DASHBOARD WITH BLAZOR, AZURE FUNCTION COSMOS DB AN AZURE S...
CREATING REAL TIME DASHBOARD WITH BLAZOR, AZURE FUNCTION COSMOS DB AN AZURE S...CREATING REAL TIME DASHBOARD WITH BLAZOR, AZURE FUNCTION COSMOS DB AN AZURE S...
CREATING REAL TIME DASHBOARD WITH BLAZOR, AZURE FUNCTION COSMOS DB AN AZURE S...
 
WRITE SCALABLE COMMUNICATION APPLICATION WITH POWER OF SERVERLESS
WRITE SCALABLE COMMUNICATION APPLICATION WITH POWER OF SERVERLESSWRITE SCALABLE COMMUNICATION APPLICATION WITH POWER OF SERVERLESS
WRITE SCALABLE COMMUNICATION APPLICATION WITH POWER OF SERVERLESS
 
Deploy Microservices To Kubernetes Without Secrets by Reenu Saluja
Deploy Microservices To Kubernetes Without Secrets by Reenu SalujaDeploy Microservices To Kubernetes Without Secrets by Reenu Saluja
Deploy Microservices To Kubernetes Without Secrets by Reenu Saluja
 
Leverage Azure Tech stack for any Kubernetes cluster via Azure Arc by Saiyam ...
Leverage Azure Tech stack for any Kubernetes cluster via Azure Arc by Saiyam ...Leverage Azure Tech stack for any Kubernetes cluster via Azure Arc by Saiyam ...
Leverage Azure Tech stack for any Kubernetes cluster via Azure Arc by Saiyam ...
 
YAML Tips For Kubernetes by Neependra Khare
YAML Tips For Kubernetes by Neependra KhareYAML Tips For Kubernetes by Neependra Khare
YAML Tips For Kubernetes by Neependra Khare
 
Must Know Azure Kubernetes Best Practices And Features For Better Resiliency ...
Must Know Azure Kubernetes Best Practices And Features For Better Resiliency ...Must Know Azure Kubernetes Best Practices And Features For Better Resiliency ...
Must Know Azure Kubernetes Best Practices And Features For Better Resiliency ...
 
Monitor Azure Kubernetes Cluster With Prometheus by Mamta Jha
Monitor Azure Kubernetes Cluster With Prometheus by Mamta JhaMonitor Azure Kubernetes Cluster With Prometheus by Mamta Jha
Monitor Azure Kubernetes Cluster With Prometheus by Mamta Jha
 
Jet brains space intro presentation
Jet brains space intro presentationJet brains space intro presentation
Jet brains space intro presentation
 
Functional Programming in Java 8 - Lambdas and Streams
Functional Programming in Java 8 - Lambdas and StreamsFunctional Programming in Java 8 - Lambdas and Streams
Functional Programming in Java 8 - Lambdas and Streams
 
Distributed Tracing: New DevOps Foundation
Distributed Tracing: New DevOps FoundationDistributed Tracing: New DevOps Foundation
Distributed Tracing: New DevOps Foundation
 
"Distributed Tracing: New DevOps Foundation" by Jayesh Ahire
"Distributed Tracing: New DevOps Foundation" by Jayesh Ahire  "Distributed Tracing: New DevOps Foundation" by Jayesh Ahire
"Distributed Tracing: New DevOps Foundation" by Jayesh Ahire
 
Improve customer engagement and productivity with conversational ai
Improve customer engagement and productivity with conversational aiImprove customer engagement and productivity with conversational ai
Improve customer engagement and productivity with conversational ai
 

Kürzlich hochgeladen

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
Enterprise Knowledge
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
Earley Information Science
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
giselly40
 

Kürzlich hochgeladen (20)

🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdf
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 

Training And Serving ML Model Using Kubeflow by Jayesh Sharma

  • 1. Training and Serving ML models using Kubeflow • Subtitle or speaker name Jayesh Sharma
  • 3. Make it easy for everyone to develop, deploy, and manage portable, scalable ML everywhere
  • 4. Why Kubeflow? ● Composability ○ Choose from existing popular tools ● Portability ○ Build using cloud native, portable Kubernetes APIs ● Scalability ○ TF already supports CPU/GPU/distributed ○ K8s scales to 5k nodes with same stack
  • 5. What’s in the Box? ● Jupyter Hub - for collaborative & interactive training ● A TensorFlow Training Controller ● A TensorFlow Serving Deployment ● Argo for workflows ● Much more
  • 8. Kubeflow is composable Training • Perform distributed training with TF-Jobs • Run pipelines with regular containers as steps. • Run pipelines with TF-Jobs and other CRDs as steps. Serving • KF-Serving, Seldon Core • Azure ML Service and other frameworks.
  • 10. TF-Job: Distributed Training A distributed TensorFlow job typically contains 0 or more of the following processes: • Chief: The chief is responsible for orchestrating training and performing tasks like checkpointing the model. • PS: The ps are parameter servers; these servers provide a distributed data store for the model parameters. • Worker: The workers do the actual work of training the model. In some cases, worker 0 might also act as the chief. • Evaluator: The evaluators can be used to compute evaluation metrics as the model is trained.
  • 11. An example TF-Job YAML Parameter Server option Worker specification Image with your code Command to begin training
  • 12. Kubeflow Pipelines • A user interface (UI) for managing and tracking experiments, jobs, and runs. • An engine for scheduling multi-step ML workflows. • An SDK for defining and manipulating pipelines and components. • Notebooks for interacting with the system using the SDK.
  • 13. Anatomy of a pipeline • Containerized implementations of ML Tasks • Pre-built components: Just provide params or code snippets. Create your own components from code or libraries • Use any runtime, framework, data types • Attach k8s objects - volumes, secrets • Specification of the sequence of steps • Specified via Python DSL • Inferred from data dependencies on input/output • Input Parameters • A “Run” = Pipeline invoked w/ specific parameters • Schedules • Invoke a single run or create a recurring scheduled pipeline