SlideShare a Scribd company logo
1 of 21
Download to read offline
Machine Learning on
Kubernetes
13 Dec 2017
Anirudh Ramanathan
Software Engineer on Kubernetes
Twitter: @anirudh4444
Disclaimer
I’m not a Machine Learning expert.
I work on infrastructure and distributed systems for a
living.
Kubernetes a year ago...
● Was used primarily for stateless workloads
● Needed an understanding of several core concepts to operate
● Applications had to be written to fit into core controller abstractions
Kubernetes today...
● Has abstractions to support Stateful applications and now data
processing and machine learning.
● Has a wide range of extension points including ones that allow API
extensions and custom controllers.
● Has support for building higher level abstractions and APIs to hide
infrastructure & operational complexity.
What’s changed?
● Workload controller abstractions moving to GA/stable.
● Custom Resource Definitions & Aggregated API Servers
● Kubernetes Operators
● Community support for external frameworks
● Work on scheduling and resource management (ongoing)
Machine Learning
Solving problems without explicitly knowing
how to create solutions
Machine Learning Infrastructure
TFX: A TensorFlow-Based Production-Scale Machine Learning Platform (KDD 2017)
Machine Learning Infrastructure
TFX: A TensorFlow-Based Production-Scale Machine Learning Platform (KDD 2017)
Kubeflow
https://github.com/google/kubeflow/
Our goal is not to recreate other services, but to provide a straightforward
way for spinning up best of breed OSS solutions.
● A JupyterHub to create & manage interactive Jupyter notebooks
● A Tensorflow Training Controller that can be configured to use CPUs
or GPUs, and adjusted to the size of a cluster with a single setting
● A TF Serving container
JupyterHub
● A single hub & proxy for managing interactive sessions
● Can run entirely within Kubernetes - notebooks are backed by
Kubernetes pods
● Can request required resources - CPUs, GPUs, etc
● Has pluggable authentication (oauth, kdc, etc)
Made possible by: https://github.com/jupyterhub/kubespawner
Tensorflow Training Controller
● A Kubernetes “operator” to help run distributed/non-distributed TF
training.
● Exposes an API through a CustomResourceDefinition
● Controller manages complexity of distributed training using
Tensorflow.
Made possible by: https://github.com/tensorflow/k8s
Tensorflow Serving
● A Kubernetes Deployment that can serve saved models
● Deployment - replicas can be scaled.
Future work:
● Custom metrics & Autoscaling
But there were so many stages!
● Clearly there are many other challenges faced by people building
Machine Learning infrastructure.
● How do I preprocess data?
● How do I describe my pipeline?
● How do I orchestrate my pipeline?
● We have some ideas.
Apache Spark
● Spark on Kubernetes is an ongoing effort since Dec 2016.
● It is being upstreamed into Spark and expected to land in Spark 2.3
(due sometime in January).
● The changes make Spark itself aware of a new Kubernetes Scheduler
that can directly run Spark applications for the user.
Apache Spark
Spark Core Kubernetes Scheduler Backend
Kubernetes
Cluster
add executors
rm executors
configuration
Apache Spark
Kubernetes Scheduler for Spark
● Spark 2.3 will support
○ Running Java/Scala jobs
○ Static allocation of executors
○ Some dependency management
● Our fork (github.com/apache-spark-on-k8s/spark) has several
additional features which we’re slowly upstreaming.
○ It’s being run by several organizations right now.
Apache Airflow
● A DAG scheduler.
● Has a rich ecosystem of “operators” to allow interacting with different
applications.
● Community working on a Kubernetes native executor for Airflow.
● Currently in the process of being upstreamed.
Apache Airflow
BashOperator(
task_id = ‘account-test’,
bash_command = ‘run-something.sh’,
dag = dag,
executor_config = {
‘request_memory’: ‘128Mi’,
‘limit_memory’: ‘128Mi’
‘image’: ‘airflow/scipy:1.1.5’
}
)
The operators can specify various Kubernetes executor constraints within each DAG step.
For example:
Putting it all together
HDFS
or GCS/S3
Spark
Airflow Pipeline
JupyterHub
Tensorflow
Other ML
Frameworks
Get Involved
Kubeflow
● Slack Channel (See https://github.com/google/kubeflow for joining instructions)
● Twitter (http://twitter.com/kubeflow)
● Mailing List (https://groups.google.com/forum/#!forum/kubeflow-discuss)
SIG Big Data
● Slack Channel (https://kubernetes.slack.com/messages/sig-big-data)
● Mailing list (https://groups.google.com/forum/#!forum/kubernetes-sig-big-data)
● Weekly meeting (https://github.com/kubernetes/community/tree/master/sig-big-data)
Questions?

More Related Content

What's hot

Introducing Kubeflow (w. Special Guests Tensorflow and Apache Spark)
Introducing Kubeflow (w. Special Guests Tensorflow and Apache Spark)Introducing Kubeflow (w. Special Guests Tensorflow and Apache Spark)
Introducing Kubeflow (w. Special Guests Tensorflow and Apache Spark)DataWorks Summit
 
Advanced Model Inferencing leveraging Kubeflow Serving, KNative and Istio
Advanced Model Inferencing leveraging Kubeflow Serving, KNative and IstioAdvanced Model Inferencing leveraging Kubeflow Serving, KNative and Istio
Advanced Model Inferencing leveraging Kubeflow Serving, KNative and IstioAnimesh Singh
 
Machine learning at scale by Amy Unruh from Google
Machine learning at scale by  Amy Unruh from GoogleMachine learning at scale by  Amy Unruh from Google
Machine learning at scale by Amy Unruh from GoogleBill Liu
 
What's New in H2O Driverless AI? - Arno Candel - H2O AI World London 2018
What's New in H2O Driverless AI? - Arno Candel - H2O AI World London 2018What's New in H2O Driverless AI? - Arno Candel - H2O AI World London 2018
What's New in H2O Driverless AI? - Arno Candel - H2O AI World London 2018Sri Ambati
 
Kubeflow at Spotify (For the Kubeflow Summit)
Kubeflow at Spotify (For the Kubeflow Summit)Kubeflow at Spotify (For the Kubeflow Summit)
Kubeflow at Spotify (For the Kubeflow Summit)Josh Baer
 
running Tensorflow in Production
running Tensorflow in Productionrunning Tensorflow in Production
running Tensorflow in ProductionMatthias Feys
 
Productionizing Machine Learning Pipelines with Databricks and Azure ML
Productionizing Machine Learning Pipelines with Databricks and Azure MLProductionizing Machine Learning Pipelines with Databricks and Azure ML
Productionizing Machine Learning Pipelines with Databricks and Azure MLDatabricks
 
Kyryl Truskovskyi: Kubeflow for end2end machine learning lifecycle
Kyryl Truskovskyi: Kubeflow for end2end machine learning lifecycleKyryl Truskovskyi: Kubeflow for end2end machine learning lifecycle
Kyryl Truskovskyi: Kubeflow for end2end machine learning lifecycleLviv Startup Club
 
AI Pipeline Optimization using Kubeflow
AI Pipeline Optimization using KubeflowAI Pipeline Optimization using Kubeflow
AI Pipeline Optimization using KubeflowSteve Guhr
 
Getting Started with Visual Studio Tools for AI
Getting Started with Visual Studio Tools for AIGetting Started with Visual Studio Tools for AI
Getting Started with Visual Studio Tools for AIMicrosoft Tech Community
 
Yannis Zarkadas. Enterprise data science workflows on kubeflow
Yannis Zarkadas. Enterprise data science workflows on kubeflowYannis Zarkadas. Enterprise data science workflows on kubeflow
Yannis Zarkadas. Enterprise data science workflows on kubeflowMarynaHoldaieva
 
Accelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDS
Accelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDSAccelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDS
Accelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDSDatabricks
 
Build and Monitor Machine Learning Services in Kubernetes
Build and Monitor Machine Learning Services in KubernetesBuild and Monitor Machine Learning Services in Kubernetes
Build and Monitor Machine Learning Services in KubernetesKP Kaiser
 
Boolan machine learning summit
Boolan machine learning summitBoolan machine learning summit
Boolan machine learning summitAdam Gibson
 
Serverless with Knative - Mete Atamel (Google)
Serverless with Knative - Mete Atamel (Google)Serverless with Knative - Mete Atamel (Google)
Serverless with Knative - Mete Atamel (Google)Shift Conference
 

What's hot (20)

Introducing Kubeflow (w. Special Guests Tensorflow and Apache Spark)
Introducing Kubeflow (w. Special Guests Tensorflow and Apache Spark)Introducing Kubeflow (w. Special Guests Tensorflow and Apache Spark)
Introducing Kubeflow (w. Special Guests Tensorflow and Apache Spark)
 
Advanced Model Inferencing leveraging Kubeflow Serving, KNative and Istio
Advanced Model Inferencing leveraging Kubeflow Serving, KNative and IstioAdvanced Model Inferencing leveraging Kubeflow Serving, KNative and Istio
Advanced Model Inferencing leveraging Kubeflow Serving, KNative and Istio
 
Machine learning at scale by Amy Unruh from Google
Machine learning at scale by  Amy Unruh from GoogleMachine learning at scale by  Amy Unruh from Google
Machine learning at scale by Amy Unruh from Google
 
What's New in H2O Driverless AI? - Arno Candel - H2O AI World London 2018
What's New in H2O Driverless AI? - Arno Candel - H2O AI World London 2018What's New in H2O Driverless AI? - Arno Candel - H2O AI World London 2018
What's New in H2O Driverless AI? - Arno Candel - H2O AI World London 2018
 
Kubeflow at Spotify (For the Kubeflow Summit)
Kubeflow at Spotify (For the Kubeflow Summit)Kubeflow at Spotify (For the Kubeflow Summit)
Kubeflow at Spotify (For the Kubeflow Summit)
 
running Tensorflow in Production
running Tensorflow in Productionrunning Tensorflow in Production
running Tensorflow in Production
 
Productionizing Machine Learning Pipelines with Databricks and Azure ML
Productionizing Machine Learning Pipelines with Databricks and Azure MLProductionizing Machine Learning Pipelines with Databricks and Azure ML
Productionizing Machine Learning Pipelines with Databricks and Azure ML
 
Kyryl Truskovskyi: Kubeflow for end2end machine learning lifecycle
Kyryl Truskovskyi: Kubeflow for end2end machine learning lifecycleKyryl Truskovskyi: Kubeflow for end2end machine learning lifecycle
Kyryl Truskovskyi: Kubeflow for end2end machine learning lifecycle
 
"Remote development of Quarkus applications"
"Remote development of Quarkus applications""Remote development of Quarkus applications"
"Remote development of Quarkus applications"
 
AI Pipeline Optimization using Kubeflow
AI Pipeline Optimization using KubeflowAI Pipeline Optimization using Kubeflow
AI Pipeline Optimization using Kubeflow
 
"Kubernetes as Driver of Generic IT Automation"
"Kubernetes as Driver of Generic IT Automation""Kubernetes as Driver of Generic IT Automation"
"Kubernetes as Driver of Generic IT Automation"
 
How to set up Kubernetes for all your machine learning workflows
How to set up Kubernetes for all your machine learning workflowsHow to set up Kubernetes for all your machine learning workflows
How to set up Kubernetes for all your machine learning workflows
 
Getting Started with Visual Studio Tools for AI
Getting Started with Visual Studio Tools for AIGetting Started with Visual Studio Tools for AI
Getting Started with Visual Studio Tools for AI
 
Yannis Zarkadas. Enterprise data science workflows on kubeflow
Yannis Zarkadas. Enterprise data science workflows on kubeflowYannis Zarkadas. Enterprise data science workflows on kubeflow
Yannis Zarkadas. Enterprise data science workflows on kubeflow
 
Using AML Python SDK
Using AML Python SDKUsing AML Python SDK
Using AML Python SDK
 
Accelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDS
Accelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDSAccelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDS
Accelerated Machine Learning with RAPIDS and MLflow, Nvidia/RAPIDS
 
Operator development made easy with helm
Operator development made easy with helmOperator development made easy with helm
Operator development made easy with helm
 
Build and Monitor Machine Learning Services in Kubernetes
Build and Monitor Machine Learning Services in KubernetesBuild and Monitor Machine Learning Services in Kubernetes
Build and Monitor Machine Learning Services in Kubernetes
 
Boolan machine learning summit
Boolan machine learning summitBoolan machine learning summit
Boolan machine learning summit
 
Serverless with Knative - Mete Atamel (Google)
Serverless with Knative - Mete Atamel (Google)Serverless with Knative - Mete Atamel (Google)
Serverless with Knative - Mete Atamel (Google)
 

Similar to Machine learning on kubernetes

PySpark on Kubernetes @ Python Barcelona March Meetup
PySpark on Kubernetes @ Python Barcelona March MeetupPySpark on Kubernetes @ Python Barcelona March Meetup
PySpark on Kubernetes @ Python Barcelona March MeetupHolden Karau
 
Democratizing machine learning on kubernetes
Democratizing machine learning on kubernetesDemocratizing machine learning on kubernetes
Democratizing machine learning on kubernetesDocker, Inc.
 
Flink Forward San Francisco 2019: Managing Flink on Kubernetes - FlinkK8sOper...
Flink Forward San Francisco 2019: Managing Flink on Kubernetes - FlinkK8sOper...Flink Forward San Francisco 2019: Managing Flink on Kubernetes - FlinkK8sOper...
Flink Forward San Francisco 2019: Managing Flink on Kubernetes - FlinkK8sOper...Flink Forward
 
ML Platform Q1 Meetup: Airbnb's End-to-End Machine Learning Infrastructure
ML Platform Q1 Meetup: Airbnb's End-to-End Machine Learning InfrastructureML Platform Q1 Meetup: Airbnb's End-to-End Machine Learning Infrastructure
ML Platform Q1 Meetup: Airbnb's End-to-End Machine Learning InfrastructureFei Chen
 
DevOps Days Boston 2017: Real-world Kubernetes for DevOps
DevOps Days Boston 2017: Real-world Kubernetes for DevOpsDevOps Days Boston 2017: Real-world Kubernetes for DevOps
DevOps Days Boston 2017: Real-world Kubernetes for DevOpsAmbassador Labs
 
Kubernetes Forum Seoul 2019: Re-architecting Data Platform with Kubernetes
Kubernetes Forum Seoul 2019: Re-architecting Data Platform with KubernetesKubernetes Forum Seoul 2019: Re-architecting Data Platform with Kubernetes
Kubernetes Forum Seoul 2019: Re-architecting Data Platform with KubernetesSeungYong Oh
 
AirBNB's ML platform - BigHead
AirBNB's ML platform - BigHeadAirBNB's ML platform - BigHead
AirBNB's ML platform - BigHeadKarthik Murugesan
 
Bighead: Airbnb’s End-to-End Machine Learning Platform with Krishna Puttaswa...
 Bighead: Airbnb’s End-to-End Machine Learning Platform with Krishna Puttaswa... Bighead: Airbnb’s End-to-End Machine Learning Platform with Krishna Puttaswa...
Bighead: Airbnb’s End-to-End Machine Learning Platform with Krishna Puttaswa...Databricks
 
Infrastructure Agnostic Machine Learning Workload Deployment
Infrastructure Agnostic Machine Learning Workload DeploymentInfrastructure Agnostic Machine Learning Workload Deployment
Infrastructure Agnostic Machine Learning Workload DeploymentDatabricks
 
KubeCon NA - 2021 Tools That I Wish Existed 3 Years Ago To Build a SaaS Offering
KubeCon NA - 2021 Tools That I Wish Existed 3 Years Ago To Build a SaaS OfferingKubeCon NA - 2021 Tools That I Wish Existed 3 Years Ago To Build a SaaS Offering
KubeCon NA - 2021 Tools That I Wish Existed 3 Years Ago To Build a SaaS OfferingMauricio (Salaboy) Salatino
 
Productionizing Machine Learning - Bigdata meetup 5-06-2019
Productionizing Machine Learning - Bigdata meetup 5-06-2019Productionizing Machine Learning - Bigdata meetup 5-06-2019
Productionizing Machine Learning - Bigdata meetup 5-06-2019Iulian Pintoiu
 
Kubernetes: Managed or Not Managed?
Kubernetes: Managed or Not Managed?Kubernetes: Managed or Not Managed?
Kubernetes: Managed or Not Managed?Mathieu Herbert
 
Nugwc k8s session-16-march-2021
Nugwc k8s session-16-march-2021Nugwc k8s session-16-march-2021
Nugwc k8s session-16-march-2021Avanti Patil
 
Containerized architectures for deep learning
Containerized architectures for deep learningContainerized architectures for deep learning
Containerized architectures for deep learningAntje Barth
 
Distributed tensorflow on kubernetes
Distributed tensorflow on kubernetesDistributed tensorflow on kubernetes
Distributed tensorflow on kubernetesinwin stack
 
Distributed tensorflow on kubernetes
Distributed tensorflow on kubernetesDistributed tensorflow on kubernetes
Distributed tensorflow on kubernetesinwin stack
 
Kubernetes for Beginners
Kubernetes for BeginnersKubernetes for Beginners
Kubernetes for BeginnersDigitalOcean
 
Running Apache Spark on Kubernetes: Best Practices and Pitfalls
Running Apache Spark on Kubernetes: Best Practices and PitfallsRunning Apache Spark on Kubernetes: Best Practices and Pitfalls
Running Apache Spark on Kubernetes: Best Practices and PitfallsDatabricks
 
introduction to micro services
introduction to micro servicesintroduction to micro services
introduction to micro servicesSpyros Lambrinidis
 
Introduction to kubernetes
Introduction to kubernetesIntroduction to kubernetes
Introduction to kubernetesGabriel Carro
 

Similar to Machine learning on kubernetes (20)

PySpark on Kubernetes @ Python Barcelona March Meetup
PySpark on Kubernetes @ Python Barcelona March MeetupPySpark on Kubernetes @ Python Barcelona March Meetup
PySpark on Kubernetes @ Python Barcelona March Meetup
 
Democratizing machine learning on kubernetes
Democratizing machine learning on kubernetesDemocratizing machine learning on kubernetes
Democratizing machine learning on kubernetes
 
Flink Forward San Francisco 2019: Managing Flink on Kubernetes - FlinkK8sOper...
Flink Forward San Francisco 2019: Managing Flink on Kubernetes - FlinkK8sOper...Flink Forward San Francisco 2019: Managing Flink on Kubernetes - FlinkK8sOper...
Flink Forward San Francisco 2019: Managing Flink on Kubernetes - FlinkK8sOper...
 
ML Platform Q1 Meetup: Airbnb's End-to-End Machine Learning Infrastructure
ML Platform Q1 Meetup: Airbnb's End-to-End Machine Learning InfrastructureML Platform Q1 Meetup: Airbnb's End-to-End Machine Learning Infrastructure
ML Platform Q1 Meetup: Airbnb's End-to-End Machine Learning Infrastructure
 
DevOps Days Boston 2017: Real-world Kubernetes for DevOps
DevOps Days Boston 2017: Real-world Kubernetes for DevOpsDevOps Days Boston 2017: Real-world Kubernetes for DevOps
DevOps Days Boston 2017: Real-world Kubernetes for DevOps
 
Kubernetes Forum Seoul 2019: Re-architecting Data Platform with Kubernetes
Kubernetes Forum Seoul 2019: Re-architecting Data Platform with KubernetesKubernetes Forum Seoul 2019: Re-architecting Data Platform with Kubernetes
Kubernetes Forum Seoul 2019: Re-architecting Data Platform with Kubernetes
 
AirBNB's ML platform - BigHead
AirBNB's ML platform - BigHeadAirBNB's ML platform - BigHead
AirBNB's ML platform - BigHead
 
Bighead: Airbnb’s End-to-End Machine Learning Platform with Krishna Puttaswa...
 Bighead: Airbnb’s End-to-End Machine Learning Platform with Krishna Puttaswa... Bighead: Airbnb’s End-to-End Machine Learning Platform with Krishna Puttaswa...
Bighead: Airbnb’s End-to-End Machine Learning Platform with Krishna Puttaswa...
 
Infrastructure Agnostic Machine Learning Workload Deployment
Infrastructure Agnostic Machine Learning Workload DeploymentInfrastructure Agnostic Machine Learning Workload Deployment
Infrastructure Agnostic Machine Learning Workload Deployment
 
KubeCon NA - 2021 Tools That I Wish Existed 3 Years Ago To Build a SaaS Offering
KubeCon NA - 2021 Tools That I Wish Existed 3 Years Ago To Build a SaaS OfferingKubeCon NA - 2021 Tools That I Wish Existed 3 Years Ago To Build a SaaS Offering
KubeCon NA - 2021 Tools That I Wish Existed 3 Years Ago To Build a SaaS Offering
 
Productionizing Machine Learning - Bigdata meetup 5-06-2019
Productionizing Machine Learning - Bigdata meetup 5-06-2019Productionizing Machine Learning - Bigdata meetup 5-06-2019
Productionizing Machine Learning - Bigdata meetup 5-06-2019
 
Kubernetes: Managed or Not Managed?
Kubernetes: Managed or Not Managed?Kubernetes: Managed or Not Managed?
Kubernetes: Managed or Not Managed?
 
Nugwc k8s session-16-march-2021
Nugwc k8s session-16-march-2021Nugwc k8s session-16-march-2021
Nugwc k8s session-16-march-2021
 
Containerized architectures for deep learning
Containerized architectures for deep learningContainerized architectures for deep learning
Containerized architectures for deep learning
 
Distributed tensorflow on kubernetes
Distributed tensorflow on kubernetesDistributed tensorflow on kubernetes
Distributed tensorflow on kubernetes
 
Distributed tensorflow on kubernetes
Distributed tensorflow on kubernetesDistributed tensorflow on kubernetes
Distributed tensorflow on kubernetes
 
Kubernetes for Beginners
Kubernetes for BeginnersKubernetes for Beginners
Kubernetes for Beginners
 
Running Apache Spark on Kubernetes: Best Practices and Pitfalls
Running Apache Spark on Kubernetes: Best Practices and PitfallsRunning Apache Spark on Kubernetes: Best Practices and Pitfalls
Running Apache Spark on Kubernetes: Best Practices and Pitfalls
 
introduction to micro services
introduction to micro servicesintroduction to micro services
introduction to micro services
 
Introduction to kubernetes
Introduction to kubernetesIntroduction to kubernetes
Introduction to kubernetes
 

Recently uploaded

Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 

Recently uploaded (20)

Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 

Machine learning on kubernetes

  • 1. Machine Learning on Kubernetes 13 Dec 2017 Anirudh Ramanathan Software Engineer on Kubernetes Twitter: @anirudh4444
  • 2. Disclaimer I’m not a Machine Learning expert. I work on infrastructure and distributed systems for a living.
  • 3. Kubernetes a year ago... ● Was used primarily for stateless workloads ● Needed an understanding of several core concepts to operate ● Applications had to be written to fit into core controller abstractions
  • 4. Kubernetes today... ● Has abstractions to support Stateful applications and now data processing and machine learning. ● Has a wide range of extension points including ones that allow API extensions and custom controllers. ● Has support for building higher level abstractions and APIs to hide infrastructure & operational complexity.
  • 5. What’s changed? ● Workload controller abstractions moving to GA/stable. ● Custom Resource Definitions & Aggregated API Servers ● Kubernetes Operators ● Community support for external frameworks ● Work on scheduling and resource management (ongoing)
  • 6. Machine Learning Solving problems without explicitly knowing how to create solutions
  • 7. Machine Learning Infrastructure TFX: A TensorFlow-Based Production-Scale Machine Learning Platform (KDD 2017)
  • 8. Machine Learning Infrastructure TFX: A TensorFlow-Based Production-Scale Machine Learning Platform (KDD 2017)
  • 9. Kubeflow https://github.com/google/kubeflow/ Our goal is not to recreate other services, but to provide a straightforward way for spinning up best of breed OSS solutions. ● A JupyterHub to create & manage interactive Jupyter notebooks ● A Tensorflow Training Controller that can be configured to use CPUs or GPUs, and adjusted to the size of a cluster with a single setting ● A TF Serving container
  • 10. JupyterHub ● A single hub & proxy for managing interactive sessions ● Can run entirely within Kubernetes - notebooks are backed by Kubernetes pods ● Can request required resources - CPUs, GPUs, etc ● Has pluggable authentication (oauth, kdc, etc) Made possible by: https://github.com/jupyterhub/kubespawner
  • 11. Tensorflow Training Controller ● A Kubernetes “operator” to help run distributed/non-distributed TF training. ● Exposes an API through a CustomResourceDefinition ● Controller manages complexity of distributed training using Tensorflow. Made possible by: https://github.com/tensorflow/k8s
  • 12. Tensorflow Serving ● A Kubernetes Deployment that can serve saved models ● Deployment - replicas can be scaled. Future work: ● Custom metrics & Autoscaling
  • 13. But there were so many stages! ● Clearly there are many other challenges faced by people building Machine Learning infrastructure. ● How do I preprocess data? ● How do I describe my pipeline? ● How do I orchestrate my pipeline? ● We have some ideas.
  • 14. Apache Spark ● Spark on Kubernetes is an ongoing effort since Dec 2016. ● It is being upstreamed into Spark and expected to land in Spark 2.3 (due sometime in January). ● The changes make Spark itself aware of a new Kubernetes Scheduler that can directly run Spark applications for the user.
  • 15. Apache Spark Spark Core Kubernetes Scheduler Backend Kubernetes Cluster add executors rm executors configuration
  • 16. Apache Spark Kubernetes Scheduler for Spark ● Spark 2.3 will support ○ Running Java/Scala jobs ○ Static allocation of executors ○ Some dependency management ● Our fork (github.com/apache-spark-on-k8s/spark) has several additional features which we’re slowly upstreaming. ○ It’s being run by several organizations right now.
  • 17. Apache Airflow ● A DAG scheduler. ● Has a rich ecosystem of “operators” to allow interacting with different applications. ● Community working on a Kubernetes native executor for Airflow. ● Currently in the process of being upstreamed.
  • 18. Apache Airflow BashOperator( task_id = ‘account-test’, bash_command = ‘run-something.sh’, dag = dag, executor_config = { ‘request_memory’: ‘128Mi’, ‘limit_memory’: ‘128Mi’ ‘image’: ‘airflow/scipy:1.1.5’ } ) The operators can specify various Kubernetes executor constraints within each DAG step. For example:
  • 19. Putting it all together HDFS or GCS/S3 Spark Airflow Pipeline JupyterHub Tensorflow Other ML Frameworks
  • 20. Get Involved Kubeflow ● Slack Channel (See https://github.com/google/kubeflow for joining instructions) ● Twitter (http://twitter.com/kubeflow) ● Mailing List (https://groups.google.com/forum/#!forum/kubeflow-discuss) SIG Big Data ● Slack Channel (https://kubernetes.slack.com/messages/sig-big-data) ● Mailing list (https://groups.google.com/forum/#!forum/kubernetes-sig-big-data) ● Weekly meeting (https://github.com/kubernetes/community/tree/master/sig-big-data)