ML solutions in production span everything from data ingestion up to the actual deployment step. We want this workflow to be scalable, portable and simple. Containers and Kubernetes handle the first two well, but not the third if you aren't a DevOps practitioner. We'll explore how you can leverage the Kubeflow project to deploy best-of-breed open-source systems for ML to diverse infrastructures.
2. Agenda
- The need for DevOps in ML and Data Science (DataOps)
- Containers and Kubernetes for ML
- Opportunities and challenges
- Kubeflow: composable, portable and scalable ML
- Components
- Low bar, high ceiling
- Issues and roadmap
- Summary and demo
4. Current ML workflow
The reality
Source: https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf
5. DataOps - DevOps in Data Science and ML
DataOps is an automated, process-oriented methodology, used by analytic and data teams to improve the quality and reduce the cycle time of data analytics.
DataOps manifesto: http://dataopsmanifesto.org
9. Containers
● Containers let you package an application's code, configuration, and dependencies into portable, easy-to-use building blocks.
● These building blocks deliver environmental consistency, operational efficiency, developer productivity, and version control.
● To put it simply, your code runs the same way in any environment that can run the container!
11. Kubernetes
● Kubernetes is an orchestration system for containers.
● It manages compute, networking and storage for containerized workloads.
● Simply put, it makes your life easier when running containers at scale.
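To make "compute, networking and storage" concrete, here is a minimal sketch of a Kubernetes Pod manifest built as a plain Python dict. The pod name, image, CPU/memory sizes, port 8080 and the `training-data` PersistentVolumeClaim are hypothetical placeholders; the field names follow the standard core/v1 Pod schema, and the printed JSON could be applied with `kubectl apply -f`.

```python
import json


def make_training_pod(name: str, image: str, cpu: str, memory: str,
                      claim_name: str) -> dict:
    """Build a minimal core/v1 Pod manifest for a training container."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "labels": {"app": name}},
        "spec": {
            # Batch-style training: don't restart the container once it exits.
            "restartPolicy": "Never",
            "containers": [{
                "name": "trainer",
                "image": image,
                # Compute: the scheduler only places the pod on a node with
                # enough allocatable CPU and memory left for these requests.
                "resources": {
                    "requests": {"cpu": cpu, "memory": memory},
                    "limits": {"cpu": cpu, "memory": memory},
                },
                # Network: declare the port the container listens on
                # (8080 is a placeholder).
                "ports": [{"containerPort": 8080}],
                # Storage: mount a persistent volume for data and checkpoints.
                "volumeMounts": [{"name": "data", "mountPath": "/data"}],
            }],
            "volumes": [{
                "name": "data",
                "persistentVolumeClaim": {"claimName": claim_name},
            }],
        },
    }


if __name__ == "__main__":
    pod = make_training_pod("mnist-train", "example.com/mnist-trainer:0.1",
                            cpu="2", memory="4Gi", claim_name="training-data")
    # kubectl accepts JSON manifests as well as YAML.
    print(json.dumps(pod, indent=2))
```

Writing the manifest declaratively is the point: you state what the workload needs, and Kubernetes decides where and how to run it.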
18. Kubeflow
● ML toolkit for Kubernetes
● Open-source and community-driven
● Support for multiple ML frameworks
● End-to-end workflows which can be shared, scaled and deployed
Source: https://github.com/kubeflow/kubeflow/issues/187
19. Low bar, high ceiling
● Low bar: allow data science practitioners to get up and running on a Kubernetes cluster even without DevOps know-how.
● High ceiling: allow sysadmins and DevOps practitioners to modify defaults and extend the framework as needed.
20. Components
● JupyterHub (collaboration and interactivity)
● K8s-native TensorFlow training controller (model building)
● K8s-native TensorFlow Serving deployment (model deployment)
● Ambassador (reverse proxy)
● Current and upcoming components for model tuning, model building and much more...
● Out-of-the-box setup for putting all of this together!
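The K8s-native TensorFlow controller (tf-operator) is driven by a `TFJob` custom resource. The sketch below builds one as a Python dict, assuming the `kubeflow.org/v1` TFJob schema used by later tf-operator releases (the 2018-era API versions were named differently); the job name and image are hypothetical.

```python
import json


def make_tfjob(name: str, image: str, workers: int, ps: int = 0) -> dict:
    """Build a TFJob manifest (kubeflow.org/v1 schema, assumed)."""

    def replica(count: int) -> dict:
        return {
            "replicas": count,
            # Restart a replica's container if it exits with an error.
            "restartPolicy": "OnFailure",
            "template": {"spec": {"containers": [{
                # tf-operator looks for the container named "tensorflow".
                "name": "tensorflow",
                "image": image,
            }]}},
        }

    if workers < 1:
        raise ValueError("need at least one worker")
    specs = {"Worker": replica(workers)}
    if ps > 0:
        # Parameter servers are only needed for PS-style distributed training.
        specs["PS"] = replica(ps)
    return {
        "apiVersion": "kubeflow.org/v1",
        "kind": "TFJob",
        "metadata": {"name": name},
        "spec": {"tfReplicaSpecs": specs},
    }


if __name__ == "__main__":
    job = make_tfjob("mnist-dist", "example.com/mnist-trainer:0.1",
                     workers=2, ps=1)
    print(json.dumps(job, indent=2))
```

Applied with `kubectl apply -f`, a manifest like this asks the controller to create the worker and parameter-server pods and wire them together, so the data scientist never writes those per-pod specs by hand.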
22. TensorFlow
- Open-source library for numerical computing and ML
- Developed by Google, open-sourced in 2015
- Huge community and ecosystem
- Support for multiple ML models
- TensorFlow Serving (model deployment), TensorBoard (training visualization), etc.
- Supports distributed training and deployment of models
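Distributed TensorFlow processes discover each other through the `TF_CONFIG` environment variable: a JSON document listing every task in the cluster plus the role of the current process. In a Kubeflow training job, the controller sets it in each replica's pod. A minimal sketch of what that value looks like, with hypothetical host names:

```python
import json


def make_tf_config(cluster: dict, task_type: str, task_index: int) -> str:
    """Return the TF_CONFIG JSON string for one process in a TF cluster."""
    if task_type not in cluster:
        raise ValueError(f"unknown task type: {task_type}")
    if not 0 <= task_index < len(cluster[task_type]):
        raise ValueError(f"task index {task_index} out of range for {task_type}")
    return json.dumps({
        "cluster": cluster,  # identical for every process
        "task": {"type": task_type, "index": task_index},  # differs per process
    })


if __name__ == "__main__":
    # Hypothetical host:port addresses for one parameter server and two workers.
    cluster = {
        "ps": ["mnist-ps-0:2222"],
        "worker": ["mnist-worker-0:2222", "mnist-worker-1:2222"],
    }
    for i in range(len(cluster["worker"])):
        print(make_tf_config(cluster, "worker", i))
```

Each process gets the same `cluster` map and a different `task`; TensorFlow's distributed APIs (for example `tf.estimator.RunConfig`) read the variable at start-up.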
23. Why Kubeflow?
Based on current functionality you should consider using Kubeflow if:
● You want to train/serve TensorFlow models in different environments (e.g. local, on-prem, and cloud)
● You want to use Jupyter notebooks to manage TensorFlow training jobs
● You want to launch training jobs that use resources, such as additional CPUs or GPUs, that aren’t available on your personal computer
● You want to combine TensorFlow with other processes
○ For example, you may want to use tensorflow/agents to run simulations to generate data for training reinforcement learning models.
Refer to https://www.kubeflow.org/docs/started/getting-started/ for more info.
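Requesting a GPU your laptop lacks comes down to a few lines in the container's resource spec. The sketch below assumes the cluster runs the NVIDIA device plugin, which exposes GPUs as the `nvidia.com/gpu` extended resource; the CPU/memory figures are hypothetical. GPUs are specified under `limits` (Kubernetes then uses the limit as the request), and only whole GPUs can be requested.

```python
import json


def gpu_resources(cpu: str, memory: str, gpus: int = 0) -> dict:
    """Resource spec for a container that may need NVIDIA GPUs."""
    if not isinstance(gpus, int) or gpus < 0:
        raise ValueError("gpus must be a non-negative whole number")
    resources = {
        "requests": {"cpu": cpu, "memory": memory},
        "limits": {"cpu": cpu, "memory": memory},
    }
    if gpus:
        # GPUs go under limits; Kubernetes then uses the limit as the request.
        # Assumes the NVIDIA device plugin advertises "nvidia.com/gpu".
        resources["limits"]["nvidia.com/gpu"] = gpus
    return resources


if __name__ == "__main__":
    print(json.dumps(gpu_resources("4", "16Gi", gpus=1), indent=2))
```

The returned dict can be dropped into the `resources` field of any container spec; the scheduler will then only place that container on a node with a free GPU.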
24. Demo
- Kubeflow tutorial using a sequence-to-sequence model
- Based on Hamel Husain’s wonderful post: "How to create data products that are magical using sequence-to-sequence models"
- GitHub repo: https://github.com/kubeflow/examples/tree/master/github_issue_summarization
- Let’s get started!
26. Road ahead
- Lower the entry (bar)rier
- Multi-tenancy on Kubernetes
- Support for different ML libraries/packages
  - PyTorch
  - Caffe2
  - MXNet
- v1.0 to be launched by December 2018
27. Find out more
- Official website: https://www.kubeflow.org/
- GitHub: https://github.com/kubeflow/kubeflow
- Katacoda tutorials: https://www.katacoda.com/kubeflow/
28. Reach out at
Email: akashtndn.acm@gmail.com, akash@socialcops.com
Twitter: @AkashTandon
GitHub: @analyticalmonk