SlideShare ist ein Scribd-Unternehmen logo
1 von 52
Downloaden Sie, um offline zu lesen
Build and Monitor
Machine Learning
Services in Kubernetes
Kirk Kaiser
@burningion
Dedicated hardware
for inference
Availability of large,
labeled datasets
Original gains with
CNN to detect objects
in images
There is (almost) nothing new in
universal turing machines.
Why is machine learning
suddenly a thing?
DeepMind &
AlphaGo Zero
OpenAI & Dota 2
DeepFakes
& Nicholas Cage
Build, clean, and label
a dataset
Build a model Deploy and Evaluate
How does it work?
???
How to build a deep learning
model
model
build a deep learning
model
Actual model used for a project:
Build, clean, and label
a dataset
Build a model Deploy and Evaluate
How does it work?
Real world data is
filled with garbage
and messy
Models tend to be
brittle and opaque,
difficult to debug
How do you
measure
‘performance’?
ML Development Lifecycle
is Different from Traditional
Development
Suddenly the machine you’re
running on matters.
(GPUs and TPUs necessary for
training quickly on massive
datasets.)
Driver updates can speed up your
code 300% (?!)
Training cost to reproduce GPT-2
text generation model (via
OpenGPT-2) from scratch ~$50,000
on GCP*
* https://blog.usejournal.com/opengpt-2-we-replicated-
gpt-2-because-you-can-too-45e34e6d36dc
Training Computational
Needs Can Be Staggering
Computational costs for running
AlphaGo Zero estimated to cost
~$35 million*
* https://www.yuzeh.com/data/agz-cost.html
Cell phones are all adding
acceleration to native chips
Bandwidth and latency limitations
on cameras forces inference
computation to edge
Inference Gets Pushed Out to
the Edge
Gather, clean, and label data Once a model is deployed, can
then be used to bootstrap better
data
Datasets Become Integral
Part of Your Code
Web Software
Development
is Changing
More Container Adoption
Especially in Larger
Environments
And Containers Usually Means
Orchestration
Moving from managing and deploying
to individual servers to pushing code to
a “sea of infinite computation”
But there is a
tradeoff...
– etcd, default database to
manage kubernetes state,
recommends 5 (!) server
instances to durability
– Growing list of subpackages
to control networking layers,
load balancing (Istio, Envoy)
– Completely rethink
development, testing,
deployment as kubernetes
grows beyond single dev
machine
Kubernetes
comes with extra
complexity
etcd
Docker
Kubernetes
Envoy
Minikube
Istio
containerd
Spinnaker
That complexity is trade off in move
from building software as a service
to software as a utility.
More Pieces to Manage and
Deploy
Kubeflow
ML toolkit for
Kubernetes
from Google
TensorRT Inference
Server
Custom Inference server
with Optimizations for
NVIDIA Hardware
Pachyderm
Version control for data,
and data pipelines
Kubernetes Native ML Tools
Kubeflow supports both TensorRT
and PyTorch for Inference
That’s a lot of
moving pieces!
How do I get
started?
Simplify first.
Run Kubernetes
locally, with GPU
acceleration.
@sterlingcrispin
+
NVIDIA Container Registry
https://ngc.nvidia.com
PyTorch & FFMPEG w/ Hardware
Acceleration Dockerfile
Example YAML
Manifest for
Kubernetes
Observability in systems become a
necessity to understand what’s
happening in software with so many
moving pieces.
For example, pipelines.
See complete units of work, as they
pass through your entire system,
especially useful in Pipelines with
multiple steps.
Add tags to be able to see specific
customers, organizations, and their
direct experience with your
systems.
Compress Distributed System
Complexity with Traces
Individual Trace
See bottlenecks in CPU, disk space,
memory usage.
For GPU / TPU, see hardware level
metrics
Correlate with logs and traces to
isolate errors to software or
hardware level issues.
See History of Machine State
with Metrics
Add Observability with
DaemonSets
Ingested logs show the history of
work on each individual system
component.
Correlate with traces and metrics to
isolate errors to library, software
dependency level
See Auditable Trail of Side
Effects with Logs
– Either of these advances take
in isolation adds to a
platform
– Taken together, we have a
chance to rethink the way
software behaves. (Images as
code and APIs.)
Object detection
alone opens up
new platforms
Bin-e, a self
sorting recycling
bin
Dab & T-Pose
controlled lights
Jetson Nano
$99 GPU Accelerated
Machine
Thank you!
Datadog is hiring!
Example Kubernetes Project:
https://dtdg.co/ml-kubernetes

Weitere ähnliche Inhalte

Was ist angesagt?

Was ist angesagt? (20)

Distributed tensorflow on kubernetes
Distributed tensorflow on kubernetesDistributed tensorflow on kubernetes
Distributed tensorflow on kubernetes
 
Cloud Native CI/CD with GitOps
Cloud Native CI/CD with GitOpsCloud Native CI/CD with GitOps
Cloud Native CI/CD with GitOps
 
PuppetConf 2017: Kubernetes in the Cloud w/ Puppet + Google Container Engine-...
PuppetConf 2017: Kubernetes in the Cloud w/ Puppet + Google Container Engine-...PuppetConf 2017: Kubernetes in the Cloud w/ Puppet + Google Container Engine-...
PuppetConf 2017: Kubernetes in the Cloud w/ Puppet + Google Container Engine-...
 
5 Habits of High-Velocity Teams Using Kubernetes
5 Habits of High-Velocity Teams Using Kubernetes5 Habits of High-Velocity Teams Using Kubernetes
5 Habits of High-Velocity Teams Using Kubernetes
 
How to make cloud native platform by kubernetes
How to make cloud native platform by kubernetesHow to make cloud native platform by kubernetes
How to make cloud native platform by kubernetes
 
Enterprise Kubernetes from Canonical
Enterprise Kubernetes from CanonicalEnterprise Kubernetes from Canonical
Enterprise Kubernetes from Canonical
 
Genomics data insights
Genomics data insightsGenomics data insights
Genomics data insights
 
How to integrate Kubernetes in OpenStack: You need to know these project
How to integrate Kubernetes in OpenStack: You need to know these projectHow to integrate Kubernetes in OpenStack: You need to know these project
How to integrate Kubernetes in OpenStack: You need to know these project
 
What is Kubernets
What is  KubernetsWhat is  Kubernets
What is Kubernets
 
Pulumi iac on gcp
Pulumi iac on gcpPulumi iac on gcp
Pulumi iac on gcp
 
Bitfusion Nimbix Dev Summit Heterogeneous Architectures
Bitfusion Nimbix Dev Summit Heterogeneous Architectures Bitfusion Nimbix Dev Summit Heterogeneous Architectures
Bitfusion Nimbix Dev Summit Heterogeneous Architectures
 
Introducing Pico - A Deep Learning Platform using Docker & IoT - Sangam Biradar
Introducing Pico - A Deep Learning Platform using Docker & IoT - Sangam BiradarIntroducing Pico - A Deep Learning Platform using Docker & IoT - Sangam Biradar
Introducing Pico - A Deep Learning Platform using Docker & IoT - Sangam Biradar
 
Making cloud native platform by kubernetes
Making cloud native platform by kubernetesMaking cloud native platform by kubernetes
Making cloud native platform by kubernetes
 
PuppetConf 2017: Zero to Kubernetes -Scott Coulton, Puppet
PuppetConf 2017: Zero to Kubernetes -Scott Coulton, PuppetPuppetConf 2017: Zero to Kubernetes -Scott Coulton, Puppet
PuppetConf 2017: Zero to Kubernetes -Scott Coulton, Puppet
 
Webinar kubernetes and-spark
Webinar  kubernetes and-sparkWebinar  kubernetes and-spark
Webinar kubernetes and-spark
 
Google Cloud - Stand Out Features
Google Cloud - Stand Out FeaturesGoogle Cloud - Stand Out Features
Google Cloud - Stand Out Features
 
Yannis Zarkadas. Enterprise data science workflows on kubeflow
Yannis Zarkadas. Enterprise data science workflows on kubeflowYannis Zarkadas. Enterprise data science workflows on kubeflow
Yannis Zarkadas. Enterprise data science workflows on kubeflow
 
DevOps with Azure, Kubernetes, and Helm Webinar
DevOps with Azure, Kubernetes, and Helm WebinarDevOps with Azure, Kubernetes, and Helm Webinar
DevOps with Azure, Kubernetes, and Helm Webinar
 
Jenkins X intro (from google app dev conference)
Jenkins X intro (from google app dev conference)Jenkins X intro (from google app dev conference)
Jenkins X intro (from google app dev conference)
 
KubeCon China 2019 - Building Apps with Containers, Functions and Managed Ser...
KubeCon China 2019 - Building Apps with Containers, Functions and Managed Ser...KubeCon China 2019 - Building Apps with Containers, Functions and Managed Ser...
KubeCon China 2019 - Building Apps with Containers, Functions and Managed Ser...
 

Ähnlich wie Build and Monitor Machine Learning Services in Kubernetes

Stateful, Stateless and Serverless - Running Apache Kafka® on Kubernetes
Stateful, Stateless and Serverless - Running Apache Kafka® on KubernetesStateful, Stateless and Serverless - Running Apache Kafka® on Kubernetes
Stateful, Stateless and Serverless - Running Apache Kafka® on Kubernetes
confluent
 

Ähnlich wie Build and Monitor Machine Learning Services in Kubernetes (20)

DevOps Days Boston 2017: Real-world Kubernetes for DevOps
DevOps Days Boston 2017: Real-world Kubernetes for DevOpsDevOps Days Boston 2017: Real-world Kubernetes for DevOps
DevOps Days Boston 2017: Real-world Kubernetes for DevOps
 
Productionizing Machine Learning - Bigdata meetup 5-06-2019
Productionizing Machine Learning - Bigdata meetup 5-06-2019Productionizing Machine Learning - Bigdata meetup 5-06-2019
Productionizing Machine Learning - Bigdata meetup 5-06-2019
 
Containerized architectures for deep learning
Containerized architectures for deep learningContainerized architectures for deep learning
Containerized architectures for deep learning
 
S8277 - Introducing Krylov: AI Platform that Empowers eBay Data Science and E...
S8277 - Introducing Krylov: AI Platform that Empowers eBay Data Science and E...S8277 - Introducing Krylov: AI Platform that Empowers eBay Data Science and E...
S8277 - Introducing Krylov: AI Platform that Empowers eBay Data Science and E...
 
Nugwc k8s session-16-march-2021
Nugwc k8s session-16-march-2021Nugwc k8s session-16-march-2021
Nugwc k8s session-16-march-2021
 
Democratizing machine learning on kubernetes
Democratizing machine learning on kubernetesDemocratizing machine learning on kubernetes
Democratizing machine learning on kubernetes
 
How to build streaming data pipelines with Akka Streams, Flink, and Spark usi...
How to build streaming data pipelines with Akka Streams, Flink, and Spark usi...How to build streaming data pipelines with Akka Streams, Flink, and Spark usi...
How to build streaming data pipelines with Akka Streams, Flink, and Spark usi...
 
Stateful, Stateless and Serverless - Running Apache Kafka® on Kubernetes
Stateful, Stateless and Serverless - Running Apache Kafka® on KubernetesStateful, Stateless and Serverless - Running Apache Kafka® on Kubernetes
Stateful, Stateless and Serverless - Running Apache Kafka® on Kubernetes
 
Deploying deep learning models with Docker and Kubernetes
Deploying deep learning models with Docker and KubernetesDeploying deep learning models with Docker and Kubernetes
Deploying deep learning models with Docker and Kubernetes
 
"Kubernetes as Driver of Generic IT Automation"
"Kubernetes as Driver of Generic IT Automation""Kubernetes as Driver of Generic IT Automation"
"Kubernetes as Driver of Generic IT Automation"
 
"Kubernetes as Driver of Generic IT Automation"
"Kubernetes as Driver of Generic IT Automation""Kubernetes as Driver of Generic IT Automation"
"Kubernetes as Driver of Generic IT Automation"
 
DevOps Spain 2019. David Cañadillas -Cloudbees
DevOps Spain 2019. David Cañadillas -CloudbeesDevOps Spain 2019. David Cañadillas -Cloudbees
DevOps Spain 2019. David Cañadillas -Cloudbees
 
Yannis Zarkadas. Stefano Fioravanzo. Enterprise data science workflows on kub...
Yannis Zarkadas. Stefano Fioravanzo. Enterprise data science workflows on kub...Yannis Zarkadas. Stefano Fioravanzo. Enterprise data science workflows on kub...
Yannis Zarkadas. Stefano Fioravanzo. Enterprise data science workflows on kub...
 
How To Build Efficient ML Pipelines From The Startup Perspective (GTC Silicon...
How To Build Efficient ML Pipelines From The Startup Perspective (GTC Silicon...How To Build Efficient ML Pipelines From The Startup Perspective (GTC Silicon...
How To Build Efficient ML Pipelines From The Startup Perspective (GTC Silicon...
 
Jacopo Nardiello - From CI to Prod: Running Magento at scale with Kubernetes
Jacopo Nardiello - From CI to Prod: Running Magento at scale with KubernetesJacopo Nardiello - From CI to Prod: Running Magento at scale with Kubernetes
Jacopo Nardiello - From CI to Prod: Running Magento at scale with Kubernetes
 
Distributed Tensorflow with Kubernetes - data2day - Jakob Karalus
Distributed Tensorflow with Kubernetes - data2day - Jakob KaralusDistributed Tensorflow with Kubernetes - data2day - Jakob Karalus
Distributed Tensorflow with Kubernetes - data2day - Jakob Karalus
 
給 RD 的 Kubernetes 初體驗
給 RD 的 Kubernetes 初體驗給 RD 的 Kubernetes 初體驗
給 RD 的 Kubernetes 初體驗
 
Continuous Lifecycle London 2018 Event Keynote
Continuous Lifecycle London 2018 Event KeynoteContinuous Lifecycle London 2018 Event Keynote
Continuous Lifecycle London 2018 Event Keynote
 
Kubernetes at Google Cloud Community Copenhagen
Kubernetes at Google Cloud Community CopenhagenKubernetes at Google Cloud Community Copenhagen
Kubernetes at Google Cloud Community Copenhagen
 
Monitoring kubernetes across data center and cloud
Monitoring kubernetes across data center and cloudMonitoring kubernetes across data center and cloud
Monitoring kubernetes across data center and cloud
 

Kürzlich hochgeladen

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Kürzlich hochgeladen (20)

Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source Milvus
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 

Build and Monitor Machine Learning Services in Kubernetes