A lightning talk discussing some important challenges facing ML engineers and how the introduction of Kubeflow Pipelines will help.
Full slides w/ speaker notes here: https://docs.google.com/presentation/d/12dwhS_x4568G6XQjI9SEUacD-n4hFQczBcRBLdbHNEM/edit
6. How long does it take to build an ML prototype?
Most teams spend 1-3 "sprints" getting an initial prototype out.
How many product teams will wait this long to get initial learnings?
7. How long does it take to go from prototype -> production-grade solution?
Over 30% of ML practitioners spend more than a quarter turning an idea into production software.
10. Phases: Problem Definition, Prototype, Productionize, Measure
Step durations: 4w, 2w, 1w, 2w, 1w, 2w, 2w
14 weeks to go from a defined problem to a production solution!
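The 14-week figure is the sum of the step durations shown on the slide. A minimal Python sketch of that arithmetic follows; the name `step_weeks` is an illustrative label, not something from the talk.

```python
# Step durations as listed on the slide, in weeks: 4w 2w 1w 2w 1w 2w 2w.
step_weeks = [4, 2, 1, 2, 1, 2, 2]

total_weeks = sum(step_weeks)
print(f"Total: {total_weeks} weeks")  # Total: 14 weeks
```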
11. Other Challenges
Difficult to Collaborate: Keeping track of projects, artifacts and lineage was difficult.
No common way of building workflows: Teams using N different frameworks in different ways. No shared learnings.
Slow feedback loops: Data analysis was separate from model training and model analysis. Each step is custom.
12. Kubeflow Pipelines
- Started discussing it with Google in early 2018
- Aligned our infra tooling with their direction
- Product launched in late 2018
- Promising early results!
13. TensorFlow Extended (TFX)
- Evaluated in mid 2018
- Decided to replace our Scala-based ML tooling with TFX
14. Kubeflow + TFX at Spotify
- Launched a team to make Kubeflow Pipelines work for Spotify
- Thin internal layer to help development speed and integrate with the Spotify ecosystem
15. "Spotify" Kubeflow Setup
Test Cluster: Internal development cluster to test upgrades, run integration tests.
Development Cluster: For running ad-hoc jobs, developing new workflows.
Production Cluster: For regularly scheduled workloads. Higher availability SLA.
16. Other Spotify Kubeflow Features
Caching: Quicker resumption of failed tasks.
Central Metadata: Keep track of what's being built and run Spotify-wide.
Command Line Tooling: Allows for scheduling and execution of jobs via Luigi (Spotify orchestration).
Shared-VPC Integration: Connect with other Spotify services.
Common TFX Components: Easily run TFX-based pipelines.
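To illustrate why caching gives "quicker resumption of failed tasks", here is a minimal standard-library sketch. It is not Spotify's or Kubeflow's implementation. The names `run_step`, `cache_key`, `_cache`, `flaky`, and the toy `validate` and `train` steps are all hypothetical. The idea is to key each step's output on a hash of the step name and its inputs, so rerunning a pipeline after a failure skips the steps that already succeeded.

```python
import hashlib
import json

_cache = {}  # hypothetical in-memory store; a real platform would persist this


def cache_key(step_name, inputs):
    """Derive a stable key from the step name and its JSON-serializable inputs."""
    payload = json.dumps({"step": step_name, "inputs": inputs}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def run_step(step_name, func, inputs):
    """Run func(**inputs) unless an identical call already succeeded."""
    key = cache_key(step_name, inputs)
    if key in _cache:
        return _cache[key]
    result = func(**inputs)  # a failing step raises here and is never cached
    _cache[key] = result
    return result


calls = []  # records which steps actually executed
flaky = {"fail_next": True}  # makes the toy training step fail exactly once


def validate(rows):
    calls.append("validate")
    return len(rows)


def train(n_rows):
    calls.append("train")
    if flaky["fail_next"]:
        flaky["fail_next"] = False
        raise RuntimeError("simulated training failure")
    return f"model trained on {n_rows} rows"


def pipeline():
    n_rows = run_step("validate", validate, {"rows": [1, 2, 3]})
    return run_step("train", train, {"n_rows": n_rows})


try:
    pipeline()  # first attempt: validate succeeds, train fails
except RuntimeError:
    pass

result = pipeline()  # retry: validate comes from the cache, only train reruns
print(calls)  # ['validate', 'train', 'train']
```

On the retry only the failed step executes again, which is the behaviour the slide describes. A real system would also need to decide when cached results become stale, for example when the step's code or container image changes.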
18. Machine Learning Journey - Updated
Kubeflow Pipelines
Phases: Problem Definition, Prototype, Productionize, Measure
Step durations: 4w, 1w, 2d, 1d, 1d, 2d, 1d
Shorter iteration cycles => faster time to production => better ML in our products
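To put the updated journey next to the earlier one (4w 2w 1w 2w 1w 2w 2w, 14 weeks), the sketch below converts both duration lists to days. It assumes a 5-day working week, which the slides do not state. The helper `to_days` and the variable names are illustrative only.

```python
def to_days(token: str, days_per_week: int = 5) -> int:
    """Convert a duration token such as '4w' or '2d' into days.

    days_per_week=5 is an assumption (working days); the slides do not say.
    """
    value, unit = int(token[:-1]), token[-1]
    if unit == "w":
        return value * days_per_week
    if unit == "d":
        return value
    raise ValueError(f"unknown unit in {token!r}")


before = "4w 2w 1w 2w 1w 2w 2w".split()  # original journey
after = "4w 1w 2d 1d 1d 2d 1d".split()   # journey with Kubeflow Pipelines

before_days = sum(to_days(t) for t in before)
after_days = sum(to_days(t) for t in after)

print(before_days)  # 70 working days (14 weeks)
print(after_days)   # 32 working days (about 6.4 weeks)
```

Under this assumption the first step (4w) is unchanged in both lists, so the remaining steps account for the whole reduction: from 50 working days to 12.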
19. Recent Progress
- During hack week, over 1000 runs of pipeline experiments
- Developers are loving the integration of data validation, training and model analysis
Speaker notes: Mention hack week last week:
- Nearly 1000 runs
- (Maybe add a quote)
20. Our Kubeflow Timeline
Aug 2018: Kubeflow Pipeline launches.
Jan 2019: First teams trying out Kubeflow. Start focusing infra efforts.
August 2019: "Spotify" Kubeflow Pipeline Platform launched in alpha.
Jan 2020: Launch beta - open it up to the entire Spotify community.
We're here
21. Our Vision for Kubeflow Pipelines
D. Sculley, G. Holt, D. Golovin, E. Davydov, T. Phillips, D. Ebner, V. Chaudhary, M. Young, and J.-F. Crespo. Hidden technical debt in machine learning systems. In Neural Information Processing Systems (NIPS). 2015.