There are many options for building and deploying machine learning models to production at scale. We’ll walk through the growing suite of tools that centralize model building and deployment at scale, with technologies like Kubeflow, Seldon, and TensorRT. Finally, we’ll address techniques for monitoring these new services.
You’ll walk away with an understanding of a complete development lifecycle for scalable machine learning services.
3. Why is machine learning suddenly a thing?
– Dedicated hardware for inference
– Availability of large, labeled datasets
– Original gains with CNNs detecting objects in images
– There is (almost) nothing new in universal Turing machines
10. How does it work?
– Build, clean, and label a dataset
– Build a model
– Deploy and evaluate
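The three steps above can be sketched end to end. This is a toy illustration, not the talk's actual pipeline: the nearest-centroid "model", the data, and all function names are made up for the example.

```python
# A minimal sketch of the build / train / evaluate loop using a toy
# one-feature nearest-centroid classifier. Everything here is illustrative.

def clean(rows):
    """Drop records with missing features or labels."""
    return [(x, y) for x, y in rows if x is not None and y is not None]

def train(rows):
    """'Model' = per-class mean of a single numeric feature."""
    sums, counts = {}, {}
    for x, y in rows:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def predict(model, x):
    """Assign the class whose centroid is nearest."""
    return min(model, key=lambda label: abs(model[label] - x))

def evaluate(model, rows):
    """Accuracy on held-out labeled data."""
    hits = sum(1 for x, y in rows if predict(model, x) == y)
    return hits / len(rows)

# 1. Build, clean, and label a dataset
raw = [(1.0, "low"), (1.2, "low"), (9.0, "high"), (None, "high"), (8.8, "high")]
data = clean(raw)

# 2. Build a model
model = train(data)

# 3. Deploy and evaluate (here: just score a held-out set)
print(evaluate(model, [(0.9, "low"), (9.1, "high")]))  # 1.0
```

In a real system each step is its own service or pipeline stage; collapsing them into one script only works on a laptop, which is exactly the gap tools like Kubeflow aim to close.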
Real-world data is messy and filled with garbage.
Models tend to be brittle and opaque, and difficult to debug.
How do you measure ‘performance’?
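One concrete answer to the performance question: accuracy alone can mislead (especially on imbalanced data), so precision and recall are often reported instead. A hand-computed sketch, with made-up labels and predictions:

```python
# Precision and recall for a binary classifier, computed by hand.
# precision = tp / (tp + fp): of everything flagged positive, how much was right?
# recall    = tp / (tp + fn): of all true positives, how many did we catch?

def precision_recall(y_true, y_pred, positive=1):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

y_true = [1, 1, 1, 0, 0, 0, 1, 0]   # illustrative ground truth
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]   # illustrative model output
p, r = precision_recall(y_true, y_pred)
print(p, r)  # 0.75 0.75
```

Which metric matters depends on the product: a spam filter may prize precision, a fraud detector recall.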
14. Training Computational Needs Can Be Staggering
– Training cost to reproduce the GPT-2 text generation model (via OpenGPT-2) from scratch: ~$50,000 on GCP*
– Computational cost of running AlphaGo Zero: estimated at ~$35 million**
* https://blog.usejournal.com/opengpt-2-we-replicated-gpt-2-because-you-can-too-45e34e6d36dc
** https://www.yuzeh.com/data/agz-cost.html
15. Inference Gets Pushed Out to the Edge
– Cell phones are all adding ML acceleration to their native chips
– Bandwidth and latency limitations on cameras force inference computation to the edge
16. Datasets Become an Integral Part of Your Code
– Gather, clean, and label data
– Once a model is deployed, it can then be used to bootstrap better data
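One common way a deployed model bootstraps better data is by pre-labeling: keep only high-confidence predictions as candidate labels for human review. A sketch under stated assumptions: `fake_model`, the threshold, and the sample texts are all made up for illustration.

```python
# Sketch of dataset bootstrapping: a deployed model's high-confidence
# predictions become candidate labels, cutting down manual labeling work.

def fake_model(text):
    """Stand-in for a deployed model endpoint: returns (label, confidence)."""
    return ("spam", 0.97) if "win" in text else ("ham", 0.55)

def bootstrap_labels(samples, threshold=0.9):
    """Keep only predictions confident enough to propose as labels."""
    candidates = []
    for text in samples:
        label, confidence = fake_model(text)
        if confidence >= threshold:
            candidates.append({"text": text, "label": label, "confidence": confidence})
    return candidates

batch = ["win a prize now", "meeting at noon", "win big today"]
for candidate in bootstrap_labels(batch):
    print(candidate["text"], "->", candidate["label"])
```

The low-confidence remainder is where human labeling effort is best spent, which is why the dataset and the model end up versioned together.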
23. Kubernetes comes with extra complexity
– etcd, the default database for managing Kubernetes state, recommends 5 (!) server instances for durability
– Growing list of subpackages to control networking layers and load balancing (Istio, Envoy)
– Development, testing, and deployment must be completely rethought as Kubernetes grows beyond a single dev machine
27. Kubernetes Native ML Tools
– Kubeflow: ML toolkit for Kubernetes, from Google
– TensorRT Inference Server: custom inference server with optimizations for NVIDIA hardware
– Pachyderm: version control for data, and data pipelines
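Seldon Core, mentioned in the abstract, belongs in this family too: it serves a model written as a plain Python class exposing a `predict` method, and builds the REST/gRPC microservice and Kubernetes resources around it. A minimal sketch of the wrapper class convention; the model logic and threshold are invented, and real deployments involve packaging and a SeldonDeployment manifest not shown here.

```python
# Sketch of a Seldon-style model wrapper: a plain class with predict().
# The "model" here is a toy threshold; a real class would load trained
# weights in __init__ and run them in predict().

class ToyClassifier:
    def __init__(self):
        # Pretend this threshold was learned offline.
        self.threshold = 2.5

    def predict(self, X, features_names=None):
        # X is a 2-D array-like of feature rows; return one score per row.
        return [[1.0] if row[0] > self.threshold else [0.0] for row in X]

model = ToyClassifier()
print(model.predict([[3.1, 0.2], [1.4, 0.9]]))  # [[1.0], [0.0]]
```

Because the serving contract is just a method on a class, the same code is unit-testable on a laptop before it is containerized and deployed to the cluster.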
40. Compress Distributed System Complexity with Traces
– See complete units of work as they pass through your entire system; especially useful in pipelines with multiple steps
– Add tags to be able to see specific customers, organizations, and their direct experience with your systems
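The idea can be shown with a hand-rolled miniature: one trace id shared by tagged, timed spans, one span per pipeline step. In practice you would use a real tracing stack (e.g. OpenTelemetry with Jaeger); this toy only illustrates the data model, and all names are invented.

```python
# A toy trace: a shared trace id, user-supplied tags (customer, org, ...),
# and one timed span per unit of work in the pipeline.

import time
import uuid

class Trace:
    def __init__(self, **tags):
        self.trace_id = uuid.uuid4().hex
        self.tags = tags              # e.g. customer="acme"
        self.spans = []               # (name, duration_seconds)

    def span(self, name):
        trace = self

        class _Span:
            def __enter__(self):
                self.start = time.monotonic()
                return self

            def __exit__(self, *exc_info):
                trace.spans.append((name, time.monotonic() - self.start))
                return False          # never swallow exceptions

        return _Span()

# A two-step pipeline, traced end to end and tagged with a customer.
trace = Trace(customer="acme")
with trace.span("preprocess"):
    time.sleep(0.01)
with trace.span("inference"):
    time.sleep(0.01)

print(trace.tags, [name for name, _ in trace.spans])
```

Querying by tag is what makes this "compression": one customer id pulls back every step of every request that customer touched.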
42. See History of Machine State with Metrics
– See bottlenecks in CPU, disk space, and memory usage
– For GPU / TPU, see hardware-level metrics
– Correlate with logs and traces to isolate errors to software- or hardware-level issues
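"History of machine state" reduces to time series of samples per metric name. A minimal sketch of that shape; real systems would scrape and store these with something like Prometheus, and the metric names and values below are made up.

```python
# A toy metrics recorder: each gauge is a time series of (timestamp, value)
# samples, so spikes and trends in machine state stay visible after the fact.

import time

class Metrics:
    def __init__(self):
        self.series = {}  # metric name -> list of (timestamp, value)

    def gauge(self, name, value):
        self.series.setdefault(name, []).append((time.time(), value))

    def latest(self, name):
        return self.series[name][-1][1]

metrics = Metrics()
metrics.gauge("cpu_percent", 41.5)
metrics.gauge("disk_free_gb", 118.0)
metrics.gauge("cpu_percent", 87.2)   # later sample: a spike worth investigating
print(metrics.latest("cpu_percent"))  # 87.2
```

Keeping the full series (not just the latest value) is what lets you line a CPU spike up against the traces and logs from the same window.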
45. See Auditable Trail of Side Effects with Logs
– Ingested logs show the history of work on each individual system component
– Correlate with traces and metrics to isolate errors to the library or software-dependency level
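The correlation with traces usually works by stamping a trace id onto every log record. One way to do that with Python's standard `logging` module is a `Filter`; the logger name, trace id value, and messages below are illustrative.

```python
# Stamp a trace id onto every log record via a logging.Filter, so log
# lines can later be joined against traces and metrics from the same request.

import logging

class TraceIdFilter(logging.Filter):
    def __init__(self, trace_id):
        super().__init__()
        self.trace_id = trace_id

    def filter(self, record):
        record.trace_id = self.trace_id   # attach the id to every record
        return True                       # never drop the record

logger = logging.getLogger("inference")
handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter("%(trace_id)s %(levelname)s %(message)s"))
logger.addHandler(handler)
logger.addFilter(TraceIdFilter("abc123"))
logger.setLevel(logging.INFO)

logger.info("model loaded")
logger.warning("fallback to CPU: GPU unavailable")
```

With the id in every line, a single grep over ingested logs reconstructs the side effects of one request across components.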
47. Object detection alone opens up new platforms
– Either of these advances, taken in isolation, adds to a platform
– Taken together, we have a chance to rethink the way software behaves (images as code and APIs)