SlideShare ist ein Scribd-Unternehmen logo
1 von 89
Human CentricHuman Centric
Machine Learning InfrastructureMachine Learning Infrastructure
@@
Ville Tuulos
QCon SF, November 2018
InfoQ.com: News & Community Site
• Over 1,000,000 software developers, architects and CTOs read the site world-
wide every month
• 250,000 senior developers subscribe to our weekly newsletter
• Published in 4 languages (English, Chinese, Japanese and Brazilian
Portuguese)
• Post content from our QCon conferences
• 2 dedicated podcast channels: The InfoQ Podcast, with a focus on
Architecture and The Engineering Culture Podcast, with a focus on building
• 96 deep dives on innovative topics packed as downloadable emags and
minibooks
• Over 40 new content items per week
Watch the video with slide
synchronization on InfoQ.com!
https://www.infoq.com/presentations/
netflix-ml-infrastructure
Purpose of QCon
- to empower software development by facilitating the spread of
knowledge and innovation
Strategy
- practitioner-driven conference designed for YOU: influencers of
change and innovation in your teams
- speakers and topics driving the evolution and innovation
- connecting and catalyzing the influencers and innovators
Highlights
- attended by more than 12,000 delegates since 2007
- held in 9 cities worldwide
Presented at QCon San Francisco
www.qconsf.com
Meet Alex, a new chief data scientist at Caveman CupcakesMeet Alex, a new chief data scientist at Caveman Cupcakes
You are hired!
We need a
dynamic pricing
model.
We need a
dynamic pricing
model.
Optimal pricing model
Great job!
The model works
perfectly!
Could you
predict churn
too?
Optimal pricing modelOptimal churn model
Alex's model
Optimal pricing modelOptimal churn model
Alex's model
Good job again!
Promising
results!
Can you include
a causal attribution
model for
marketing?
Optimal pricing model
Optimal churn model
Alex's model
Attribution
model
Are you sure
these results
make sense?
Take twoTake two
Meet the new data science team at Caveman CupcakesMeet the new data science team at Caveman Cupcakes
You are hired!
Pricing model
Churn modelAttribution
model
VS
the human is the bottleneckthe human is the bottleneck
VS
the human is the bottleneckthe human is the bottleneck the human is the bottleneckthe human is the bottleneck
VS
BuildBuild
Data WarehouseData Warehouse
BuildBuild
Data WarehouseData Warehouse
Compute ResourcesCompute Resources
BuildBuild
Data WarehouseData Warehouse
Compute ResourcesCompute Resources
Job SchedulerJob Scheduler
BuildBuild
Data WarehouseData Warehouse
Compute ResourcesCompute Resources
Job SchedulerJob Scheduler
VersioningVersioning
BuildBuild
Data WarehouseData Warehouse
Compute ResourcesCompute Resources
Job SchedulerJob Scheduler
VersioningVersioning
Collaboration ToolsCollaboration Tools
BuildBuild
Data WarehouseData Warehouse
Compute ResourcesCompute Resources
Job SchedulerJob Scheduler
VersioningVersioning
Collaboration ToolsCollaboration Tools
Model DeploymentModel Deployment
BuildBuild
Data WarehouseData Warehouse
Compute ResourcesCompute Resources
Job SchedulerJob Scheduler
VersioningVersioning
Collaboration ToolsCollaboration Tools
Model DeploymentModel Deployment
Feature EngineeringFeature Engineering
BuildBuild
Data WarehouseData Warehouse
Compute ResourcesCompute Resources
Job SchedulerJob Scheduler
VersioningVersioning
Collaboration ToolsCollaboration Tools
Model DeploymentModel Deployment
Feature EngineeringFeature Engineering
ML LibrariesML Libraries
BuildBuild
Data WarehouseData Warehouse
Compute ResourcesCompute Resources
Job SchedulerJob Scheduler
VersioningVersioning
Collaboration ToolsCollaboration Tools
Model DeploymentModel Deployment
Feature EngineeringFeature Engineering
ML LibrariesML Libraries
How muchHow much
data scientistdata scientist
carescares
BuildBuild
Data WarehouseData Warehouse
Compute ResourcesCompute Resources
Job SchedulerJob Scheduler
VersioningVersioning
Collaboration ToolsCollaboration Tools
Model DeploymentModel Deployment
Feature EngineeringFeature Engineering
ML LibrariesML Libraries
How muchHow much
data scientistdata scientist
carescares
How muchHow much
infrastructureinfrastructure
is neededis needed
BuildBuild
DeployDeploy
DeployDeploy
No plan survives contact with enemyNo plan survives contact with enemy
DeployDeploy
No plan survives contact with enemyNo plan survives contact with enemy
No model survives contact with realityNo model survives contact with reality
our ML infra supportsour ML infra supports
twotwo humanhuman activities:activities:
buildingbuilding andand deployingdeploying
data science workflows.data science workflows.
Screenplay Analysis Using NLP
Fraud Detection
Title Portfolio Optimization
Estimate Word-of-Mouth Effects
Incremental Impact of Marketing
Classify Support Tickets
Predict Quality of Network
Content Valuation
Cluster Tweets
Intelligent Infrastructure Machine Translation
Optimal CDN Caching
Predict Churn
Content Tagging
Optimize Production Schedules
Notebooks:Notebooks: NteractNteract
Job Scheduler:Job Scheduler: MesonMeson
Compute Resources:Compute Resources: TitusTitus
Query Engine:Query Engine: SparkSpark
Data Lake:Data Lake: S3S3
ML LibML Libraries: Rraries: R, XGBoost, TF etc., XGBoost, TF etc.
Notebooks:Notebooks: NteractNteract
Job Scheduler:Job Scheduler: MesonMeson
Compute Resources:Compute Resources: TitusTitus
Query Engine:Query Engine: SparkSpark
Data Lake:Data Lake: S3S3
ML LibML Libraries: Rraries: R, XGBoost, TF etc., XGBoost, TF etc.
Notebooks:Notebooks: NteractNteract
Job Scheduler:Job Scheduler: MesonMeson
Compute Resources:Compute Resources: TitusTitus
Query Engine:Query Engine: SparkSpark
Data Lake:Data Lake: S3S3
{
data
compute
prototyping
ML LibML Libraries: Rraries: R, XGBoost, TF etc., XGBoost, TF etc.models
Bad Old DaysBad Old Days
Bad Old DaysBad Old Days
Data Scientist built an NLP model in Python. Easy and fun!Data Scientist built an NLP model in Python. Easy and fun!
Bad Old DaysBad Old Days
Data Scientist built an NLP model in Python. Easy and fun!Data Scientist built an NLP model in Python. Easy and fun!
How to run at scale?
Custom Titus executor.
Bad Old DaysBad Old Days
Data Scientist built an NLP model in Python. Easy and fun!Data Scientist built an NLP model in Python. Easy and fun!
How to run at scale?
Custom Titus executor.
How to access data at scale?
Slow!
Bad Old DaysBad Old Days
Data Scientist built an NLP model in Python. Easy and fun!Data Scientist built an NLP model in Python. Easy and fun!
How to run at scale?
Custom Titus executor.
How to schedule the model to update daily?
Learn about the job scheduler.
How to access data at scale?
Slow!
Bad Old DaysBad Old Days
Data Scientist built an NLP model in Python. Easy and fun!Data Scientist built an NLP model in Python. Easy and fun!
How to run at scale?
Custom Titus executor.
How to schedule the model to update daily?
Learn about the job scheduler.
How to access data at scale?
Slow!
How to expose the model to a custom UI?
Custom web backend.
Bad Old DaysBad Old Days
Data Scientist built an NLP model in Python. Easy and fun!Data Scientist built an NLP model in Python. Easy and fun!
How to run at scale?
Custom Titus executor.
How to schedule the model to update daily?
Learn about the job scheduler.
How to access data at scale?
Slow!
How to expose the model to a custom UI?
Custom web backend.
Time to production:
4 months
Bad Old DaysBad Old Days
Data Scientist built an NLP model in Python. Easy and fun!Data Scientist built an NLP model in Python. Easy and fun!
How to run at scale?
Custom Titus executor.
How to schedule the model to update daily?
Learn about the job scheduler.
How to access data at scale?
Slow!
How to expose the model to a custom UI?
Custom web backend.
Time to production:
4 months
How to monitor models in production?
How to iterate on a new version without
breaking the production version?
How to let another data scientist iterate
on her version of the model safely?
How to debug yesterday's failed
production run?
How to backfill historical data?
How to make this faster?
Notebooks:Notebooks: NteractNteract
Job Scheduler:Job Scheduler: MesonMeson
Compute Resources:Compute Resources: TitusTitus
Query Engine:Query Engine: SparkSpark
Data Lake:Data Lake: S3S3
{
data
compute
prototyping
ML LibML Libraries: Rraries: R, XGBoost, TF etc., XGBoost, TF etc.models
ML Wrapping:ML Wrapping: MetaflowMetaflow
MetaflowMetaflow
BuildBuild
def compute(input):
output = my_model(input)
return outputoutputinput compute
How toHow to get started?get started?
def compute(input):
output = my_model(input)
return outputoutputinput compute
How toHow to get started?get started?
# python myscript.py
from metaflow import FlowSpec, step
class MyFlow(FlowSpec):
@step
def start(self):
self.next(self.a, self.b)
@step
def a(self):
self.next(self.join)
@step
def b(self):
self.next(self.join)
@step
def join(self, inputs):
self.next(self.end)
MyFlow()
How toHow to structure my code? structure my code?
start
B
A
join end
from metaflow import FlowSpec, step
class MyFlow(FlowSpec):
@step
def start(self):
self.next(self.a, self.b)
@step
def a(self):
self.next(self.join)
@step
def b(self):
self.next(self.join)
@step
def join(self, inputs):
self.next(self.end)
MyFlow()
How toHow to structure my code? structure my code?
start
B
A
join end
# python myscript.py run
metaflow("MyFlow") %>%
step(
step = "start",
next_step = c("a", "b")
) %>%
step(
step = "A",
r_function = r_function(a_func),
next_step = "join"
) %>%
step(
step = "B",
r_function = r_function(b_func),
next_step = "join"
) %>%
step(
step = "Join",
r_function = r_function(join,
join_step = TRUE),
How toHow to deal with models deal with models
written in R?written in R?
start
B
A
join end
metaflow("MyFlow") %>%
step(
step = "start",
next_step = c("a", "b")
) %>%
step(
step = "A",
r_function = r_function(a_func),
next_step = "join"
) %>%
step(
step = "B",
r_function = r_function(b_func),
next_step = "join"
) %>%
step(
step = "Join",
r_function = r_function(join,
join_step = TRUE),
How toHow to deal with models deal with models
written in R?written in R?
start
B
A
join end
# RScript myscript.R
MetaflowMetaflow adoptionadoption
at Netflixat Netflix
134 projects on Metaflow
as of November 2018
@step
def start(self):
self.x = 0
self.next(self.a, self.b)
@step
def a(self):
self.x += 2
self.next(self.join)
@step
def b(self):
self.x += 3
self.next(self.join)
@step
def join(self, inputs):
self.out = max(i.x for i in inputs)
self.next(self.end)
How toHow to prototype and testprototype and test
my code locally?my code locally?
start
B
A
join end
x=0
x+=2
x+=3
max(A.x, B.x)
@step
def start(self):
self.x = 0
self.next(self.a, self.b)
@step
def a(self):
self.x += 2
self.next(self.join)
@step
def b(self):
self.x += 3
self.next(self.join)
@step
def join(self, inputs):
self.out = max(i.x for i in inputs)
self.next(self.end)
How toHow to prototype and testprototype and test
my code locally?my code locally?
start
B
A
join end
# python myscript.py resume B
x=0
x+=2
x+=3
max(A.x, B.x)
@titus(cpu=16, gpu=1)
@step
def a(self):
tensorflow.train()
self.next(self.join)
@titus(memory=200000)
@step
def b(self):
massive_dataframe_operation()
self.next(self.join)
How toHow to get access to more CPUs, get access to more CPUs,
GPUs, or memory?GPUs, or memory?
start
B
A
join end
16 cores, 1GPU
200GB RAM
@titus(cpu=16, gpu=1)
@step
def a(self):
tensorflow.train()
self.next(self.join)
@titus(memory=200000)
@step
def b(self):
massive_dataframe_operation()
self.next(self.join)
How toHow to get access to more CPUs, get access to more CPUs,
GPUs, or memory?GPUs, or memory?
start
B
A
join end
16 cores, 1GPU
200GB RAM
# python myscript.py run
@step
def start(self):
self.grid = [’x’,’y’,’z’]
self.next(self.a, foreach=’grid’)
@titus(memory=10000)
@step
def a(self):
self.x = ord(self.input)
self.next(self.join)
@step
def join(self, inputs):
self.out = max(i.x for i in inputs)
self.next(self.end)
How toHow to distribute work over distribute work over
many parallel jobs?many parallel jobs?
start
A
join end
40%40% of projects run steps outside their dev environment.of projects run steps outside their dev environment.
How quickly they start using Titus?How quickly they start using Titus?
from metaflow import Table
@titus(memory=200000, network=20000)
@step
def b(self):
# Load data from S3 to a dataframe
# at 10Gbps
df = Table('vtuulos', 'input_table')
self.next(self.end)
How toHow to access large amounts of access large amounts of
input data?input data?
start
B
A
join end
S3
Case Study:Case Study: Marketing Cost per Incremental WatcherMarketing Cost per Incremental Watcher
1. Build a separate model for every new title with marketing spend. 
Parallel foreach.
2. Load and prepare input data for each model.
Download Parquet directly from S3.
Total amount of model input data: 890GB.
3. Fit a model.
Train each model on an instance with 400GB of RAM, 16 cores.
The model is written in R.
4. Share updated results.
Collect results of individual models, write to a table.
Results shown on a Tableau dashboard.
DeployDeploy
# Access Savin's runs
namespace('user:savin')
run = Flow('MyFlow').latest_run
print(run.id) # = 234
print(run.tags) # = ['unsampled_model']
# Access David's runs
namespace('user:david')
run = Flow('MyFlow').latest_run
print(run.id) # = 184
print(run.tags) # = ['sampled_model']
# Access everyone's runs
namespace(None)
run = Flow('MyFlow').latest_run
print(run.id) # = 184
How toHow to version my results and version my results and
access results by others?access results by others?
start
B
A
join end
david: sampled_model
savin: unsampled_model
How toHow to deploy my workflow to deploy my workflow to
production?production?
start
B
A
join end
How toHow to deploy my workflow to deploy my workflow to
production?production?
start
B
A
join end
#python myscript.py meson create
26%26% of projects get deployed to the production scheduler.of projects get deployed to the production scheduler.
How quickly the first deployment happens?How quickly the first deployment happens?
How toHow to monitor models andmonitor models and
examine results?examine results?
start
B
A
join end
x=0
x+=2
x+=3
max(A.x, B.x)
How toHow to deploy results asdeploy results as
a microservice?a microservice?
 
start
B
A
join end
x=0
x+=2
x+=3
max(A.x, B.x)
Metaflow
hosting
from metaflow import WebServiceSpec
from metaflow import endpoint
class MyWebService(WebServiceSpec):
@endpoint
def show_data(self, request_dict):
# TODO: real-time predict here
result = self.artifacts.flow.x
return {'result': result}
How toHow to deploy results asdeploy results as
a microservice?a microservice?
 
start
B
A
join end
x=0
x+=2
x+=3
max(A.x, B.x)
Metaflow
hosting
from metaflow import WebServiceSpec
from metaflow import endpoint
class MyWebService(WebServiceSpec):
@endpoint
def show_data(self, request_dict):
# TODO: real-time predict here
result = self.artifacts.flow.x
return {'result': result}
{"result": 3}{
# curl http://host/show_data 
Case Study:Case Study: Launch Date Schedule OptimizationLaunch Date Schedule Optimization
1. Batch optimize launch date schedules for new titles daily. 
Batch optimization deployed on Meson.
2. Serve results through a custom UI.
Results deployed on Metaflow Hosting.
3. Support arbitrary what-if scenarios in the custom UI.
Run optimizer in real-time in a custom web endpoint.
MetaflowMetaflow
diversediverse problemsproblems
diversediverse problemsproblems
diversediverse peoplepeople
diversediverse problemsproblems
diversediverse peoplepeople
diversediverse modelsmodels
diversediverse problemsproblems
diversediverse peoplepeople
helphelp peoplepeople buildbuild
diversediverse modelsmodels
diversediverse problemsproblems
diversediverse peoplepeople
helphelp peoplepeople buildbuild
helphelp peoplepeople deploydeploy
diversediverse modelsmodels
diversediverse problemsproblems
diversediverse peoplepeople
helphelp peoplepeople buildbuild
helphelp peoplepeople deploydeploy
diversediverse modelsmodels
happyhappy peoplepeople, healthy, healthy businessbusiness
thank you!thank you!
 
@vtuulos@vtuulos
vtuulos@netflix.comvtuulos@netflix.com
Bruno Coldiori
https://www.flickr.com/photos/br1dotcom/8900102170/
https://www.maxpixel.net/Isolated-Animal-Hundeportrait-Dog-Nature-3234285
Photo CreditsPhoto Credits
Watch the video with slide
synchronization on InfoQ.com!
https://www.infoq.com/presentations/
netflix-ml-infrastructure

Weitere ähnliche Inhalte

Mehr von C4Media

Shifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CDShifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CDC4Media
 
CI/CD for Machine Learning
CI/CD for Machine LearningCI/CD for Machine Learning
CI/CD for Machine LearningC4Media
 
Fault Tolerance at Speed
Fault Tolerance at SpeedFault Tolerance at Speed
Fault Tolerance at SpeedC4Media
 
Architectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep SystemsArchitectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep SystemsC4Media
 
ML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.jsML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.jsC4Media
 
Build Your Own WebAssembly Compiler
Build Your Own WebAssembly CompilerBuild Your Own WebAssembly Compiler
Build Your Own WebAssembly CompilerC4Media
 
User & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix ScaleUser & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix ScaleC4Media
 
Scaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's EdgeScaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's EdgeC4Media
 
Make Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home EverywhereMake Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home EverywhereC4Media
 
The Talk You've Been Await-ing For
The Talk You've Been Await-ing ForThe Talk You've Been Await-ing For
The Talk You've Been Await-ing ForC4Media
 
Future of Data Engineering
Future of Data EngineeringFuture of Data Engineering
Future of Data EngineeringC4Media
 
Automated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and MoreAutomated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and MoreC4Media
 
Navigating Complexity: High-performance Delivery and Discovery Teams
Navigating Complexity: High-performance Delivery and Discovery TeamsNavigating Complexity: High-performance Delivery and Discovery Teams
Navigating Complexity: High-performance Delivery and Discovery TeamsC4Media
 
High Performance Cooperative Distributed Systems in Adtech
High Performance Cooperative Distributed Systems in AdtechHigh Performance Cooperative Distributed Systems in Adtech
High Performance Cooperative Distributed Systems in AdtechC4Media
 
Rust's Journey to Async/await
Rust's Journey to Async/awaitRust's Journey to Async/await
Rust's Journey to Async/awaitC4Media
 
Opportunities and Pitfalls of Event-Driven Utopia
Opportunities and Pitfalls of Event-Driven UtopiaOpportunities and Pitfalls of Event-Driven Utopia
Opportunities and Pitfalls of Event-Driven UtopiaC4Media
 
Datadog: a Real-Time Metrics Database for One Quadrillion Points/Day
Datadog: a Real-Time Metrics Database for One Quadrillion Points/DayDatadog: a Real-Time Metrics Database for One Quadrillion Points/Day
Datadog: a Real-Time Metrics Database for One Quadrillion Points/DayC4Media
 
Are We Really Cloud-Native?
Are We Really Cloud-Native?Are We Really Cloud-Native?
Are We Really Cloud-Native?C4Media
 
CockroachDB: Architecture of a Geo-Distributed SQL Database
CockroachDB: Architecture of a Geo-Distributed SQL DatabaseCockroachDB: Architecture of a Geo-Distributed SQL Database
CockroachDB: Architecture of a Geo-Distributed SQL DatabaseC4Media
 
A Dive into Streams @LinkedIn with Brooklin
A Dive into Streams @LinkedIn with BrooklinA Dive into Streams @LinkedIn with Brooklin
A Dive into Streams @LinkedIn with BrooklinC4Media
 

Mehr von C4Media (20)

Shifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CDShifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CD
 
CI/CD for Machine Learning
CI/CD for Machine LearningCI/CD for Machine Learning
CI/CD for Machine Learning
 
Fault Tolerance at Speed
Fault Tolerance at SpeedFault Tolerance at Speed
Fault Tolerance at Speed
 
Architectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep SystemsArchitectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep Systems
 
ML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.jsML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.js
 
Build Your Own WebAssembly Compiler
Build Your Own WebAssembly CompilerBuild Your Own WebAssembly Compiler
Build Your Own WebAssembly Compiler
 
User & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix ScaleUser & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix Scale
 
Scaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's EdgeScaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's Edge
 
Make Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home EverywhereMake Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home Everywhere
 
The Talk You've Been Await-ing For
The Talk You've Been Await-ing ForThe Talk You've Been Await-ing For
The Talk You've Been Await-ing For
 
Future of Data Engineering
Future of Data EngineeringFuture of Data Engineering
Future of Data Engineering
 
Automated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and MoreAutomated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and More
 
Navigating Complexity: High-performance Delivery and Discovery Teams
Navigating Complexity: High-performance Delivery and Discovery TeamsNavigating Complexity: High-performance Delivery and Discovery Teams
Navigating Complexity: High-performance Delivery and Discovery Teams
 
High Performance Cooperative Distributed Systems in Adtech
High Performance Cooperative Distributed Systems in AdtechHigh Performance Cooperative Distributed Systems in Adtech
High Performance Cooperative Distributed Systems in Adtech
 
Rust's Journey to Async/await
Rust's Journey to Async/awaitRust's Journey to Async/await
Rust's Journey to Async/await
 
Opportunities and Pitfalls of Event-Driven Utopia
Opportunities and Pitfalls of Event-Driven UtopiaOpportunities and Pitfalls of Event-Driven Utopia
Opportunities and Pitfalls of Event-Driven Utopia
 
Datadog: a Real-Time Metrics Database for One Quadrillion Points/Day
Datadog: a Real-Time Metrics Database for One Quadrillion Points/DayDatadog: a Real-Time Metrics Database for One Quadrillion Points/Day
Datadog: a Real-Time Metrics Database for One Quadrillion Points/Day
 
Are We Really Cloud-Native?
Are We Really Cloud-Native?Are We Really Cloud-Native?
Are We Really Cloud-Native?
 
CockroachDB: Architecture of a Geo-Distributed SQL Database
CockroachDB: Architecture of a Geo-Distributed SQL DatabaseCockroachDB: Architecture of a Geo-Distributed SQL Database
CockroachDB: Architecture of a Geo-Distributed SQL Database
 
A Dive into Streams @LinkedIn with Brooklin
A Dive into Streams @LinkedIn with BrooklinA Dive into Streams @LinkedIn with Brooklin
A Dive into Streams @LinkedIn with Brooklin
 

Kürzlich hochgeladen

Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 

Kürzlich hochgeladen (20)

Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 

Human-centric Machine Learning Infrastructure @Netflix

  • 1. Human CentricHuman Centric Machine Learning InfrastructureMachine Learning Infrastructure @@ Ville Tuulos QCon SF, November 2018
  • 2. InfoQ.com: News & Community Site • Over 1,000,000 software developers, architects and CTOs read the site world- wide every month • 250,000 senior developers subscribe to our weekly newsletter • Published in 4 languages (English, Chinese, Japanese and Brazilian Portuguese) • Post content from our QCon conferences • 2 dedicated podcast channels: The InfoQ Podcast, with a focus on Architecture and The Engineering Culture Podcast, with a focus on building • 96 deep dives on innovative topics packed as downloadable emags and minibooks • Over 40 new content items per week Watch the video with slide synchronization on InfoQ.com! https://www.infoq.com/presentations/ netflix-ml-infrastructure
  • 3. Purpose of QCon - to empower software development by facilitating the spread of knowledge and innovation Strategy - practitioner-driven conference designed for YOU: influencers of change and innovation in your teams - speakers and topics driving the evolution and innovation - connecting and catalyzing the influencers and innovators Highlights - attended by more than 12,000 delegates since 2007 - held in 9 cities worldwide Presented at QCon San Francisco www.qconsf.com
  • 4.
  • 5. Meet Alex, a new chief data scientist at Caveman CupcakesMeet Alex, a new chief data scientist at Caveman Cupcakes You are hired!
  • 6. We need a dynamic pricing model.
  • 7. We need a dynamic pricing model. Optimal pricing model
  • 8.
  • 9.
  • 10. Great job! The model works perfectly!
  • 12. Optimal pricing modelOptimal churn model Alex's model
  • 13. Optimal pricing modelOptimal churn model Alex's model
  • 15. Can you include a causal attribution model for marketing?
  • 16. Optimal pricing model Optimal churn model Alex's model Attribution model
  • 17. Are you sure these results make sense?
  • 19. Meet the new data science team at Caveman CupcakesMeet the new data science team at Caveman Cupcakes You are hired!
  • 21. VS
  • 22. the human is the bottleneckthe human is the bottleneck VS
  • 23. the human is the bottleneckthe human is the bottleneck the human is the bottleneckthe human is the bottleneck VS
  • 24.
  • 27. Data WarehouseData Warehouse Compute ResourcesCompute Resources BuildBuild
  • 28. Data WarehouseData Warehouse Compute ResourcesCompute Resources Job SchedulerJob Scheduler BuildBuild
  • 29. Data WarehouseData Warehouse Compute ResourcesCompute Resources Job SchedulerJob Scheduler VersioningVersioning BuildBuild
  • 30. Data WarehouseData Warehouse Compute ResourcesCompute Resources Job SchedulerJob Scheduler VersioningVersioning Collaboration ToolsCollaboration Tools BuildBuild
  • 31. Data WarehouseData Warehouse Compute ResourcesCompute Resources Job SchedulerJob Scheduler VersioningVersioning Collaboration ToolsCollaboration Tools Model DeploymentModel Deployment BuildBuild
  • 32. Data WarehouseData Warehouse Compute ResourcesCompute Resources Job SchedulerJob Scheduler VersioningVersioning Collaboration ToolsCollaboration Tools Model DeploymentModel Deployment Feature EngineeringFeature Engineering BuildBuild
  • 33. Data WarehouseData Warehouse Compute ResourcesCompute Resources Job SchedulerJob Scheduler VersioningVersioning Collaboration ToolsCollaboration Tools Model DeploymentModel Deployment Feature EngineeringFeature Engineering ML LibrariesML Libraries BuildBuild
  • 34. Data WarehouseData Warehouse Compute ResourcesCompute Resources Job SchedulerJob Scheduler VersioningVersioning Collaboration ToolsCollaboration Tools Model DeploymentModel Deployment Feature EngineeringFeature Engineering ML LibrariesML Libraries How muchHow much data scientistdata scientist carescares BuildBuild
  • 35. Data WarehouseData Warehouse Compute ResourcesCompute Resources Job SchedulerJob Scheduler VersioningVersioning Collaboration ToolsCollaboration Tools Model DeploymentModel Deployment Feature EngineeringFeature Engineering ML LibrariesML Libraries How muchHow much data scientistdata scientist carescares How muchHow much infrastructureinfrastructure is neededis needed BuildBuild
  • 37. DeployDeploy No plan survives contact with enemyNo plan survives contact with enemy
  • 38. DeployDeploy No plan survives contact with enemyNo plan survives contact with enemy No model survives contact with realityNo model survives contact with reality
  • 39. our ML infra supportsour ML infra supports twotwo humanhuman activities:activities: buildingbuilding andand deployingdeploying data science workflows.data science workflows.
  • 40.
  • 41. Screenplay Analysis Using NLP Fraud Detection Title Portfolio Optimization Estimate Word-of-Mouth Effects Incremental Impact of Marketing Classify Support Tickets Predict Quality of Network Content Valuation Cluster Tweets Intelligent Infrastructure Machine Translation Optimal CDN Caching Predict Churn Content Tagging Optimize Production Schedules
  • 42. Notebooks:Notebooks: NteractNteract Job Scheduler:Job Scheduler: MesonMeson Compute Resources:Compute Resources: TitusTitus Query Engine:Query Engine: SparkSpark Data Lake:Data Lake: S3S3 ML LibML Libraries: Rraries: R, XGBoost, TF etc., XGBoost, TF etc.
  • 43. Notebooks:Notebooks: NteractNteract Job Scheduler:Job Scheduler: MesonMeson Compute Resources:Compute Resources: TitusTitus Query Engine:Query Engine: SparkSpark Data Lake:Data Lake: S3S3 ML LibML Libraries: Rraries: R, XGBoost, TF etc., XGBoost, TF etc.
  • 44. Notebooks:Notebooks: NteractNteract Job Scheduler:Job Scheduler: MesonMeson Compute Resources:Compute Resources: TitusTitus Query Engine:Query Engine: SparkSpark Data Lake:Data Lake: S3S3 { data compute prototyping ML LibML Libraries: Rraries: R, XGBoost, TF etc., XGBoost, TF etc.models
  • 45. Bad Old DaysBad Old Days
  • 46. Bad Old DaysBad Old Days Data Scientist built an NLP model in Python. Easy and fun!Data Scientist built an NLP model in Python. Easy and fun!
  • 47. Bad Old DaysBad Old Days Data Scientist built an NLP model in Python. Easy and fun!Data Scientist built an NLP model in Python. Easy and fun! How to run at scale? Custom Titus executor.
  • 48. Bad Old DaysBad Old Days Data Scientist built an NLP model in Python. Easy and fun!Data Scientist built an NLP model in Python. Easy and fun! How to run at scale? Custom Titus executor. How to access data at scale? Slow!
  • 49. Bad Old DaysBad Old Days Data Scientist built an NLP model in Python. Easy and fun!Data Scientist built an NLP model in Python. Easy and fun! How to run at scale? Custom Titus executor. How to schedule the model to update daily? Learn about the job scheduler. How to access data at scale? Slow!
  • 50. Bad Old DaysBad Old Days Data Scientist built an NLP model in Python. Easy and fun!Data Scientist built an NLP model in Python. Easy and fun! How to run at scale? Custom Titus executor. How to schedule the model to update daily? Learn about the job scheduler. How to access data at scale? Slow! How to expose the model to a custom UI? Custom web backend.
  • 51. Bad Old DaysBad Old Days Data Scientist built an NLP model in Python. Easy and fun!Data Scientist built an NLP model in Python. Easy and fun! How to run at scale? Custom Titus executor. How to schedule the model to update daily? Learn about the job scheduler. How to access data at scale? Slow! How to expose the model to a custom UI? Custom web backend. Time to production: 4 months
  • 52. Bad Old DaysBad Old Days Data Scientist built an NLP model in Python. Easy and fun!Data Scientist built an NLP model in Python. Easy and fun! How to run at scale? Custom Titus executor. How to schedule the model to update daily? Learn about the job scheduler. How to access data at scale? Slow! How to expose the model to a custom UI? Custom web backend. Time to production: 4 months How to monitor models in production? How to iterate on a new version without breaking the production version? How to let another data scientist iterate on her version of the model safely? How to debug yesterday's failed production run? How to backfill historical data? How to make this faster?
  • 53. Notebooks:Notebooks: NteractNteract Job Scheduler:Job Scheduler: MesonMeson Compute Resources:Compute Resources: TitusTitus Query Engine:Query Engine: SparkSpark Data Lake:Data Lake: S3S3 { data compute prototyping ML LibML Libraries: Rraries: R, XGBoost, TF etc., XGBoost, TF etc.models ML Wrapping:ML Wrapping: MetaflowMetaflow
  • 56. def compute(input): output = my_model(input) return outputoutputinput compute How toHow to get started?get started?
  • 57. def compute(input): output = my_model(input) return outputoutputinput compute How toHow to get started?get started? # python myscript.py
  • 58. from metaflow import FlowSpec, step class MyFlow(FlowSpec): @step def start(self): self.next(self.a, self.b) @step def a(self): self.next(self.join) @step def b(self): self.next(self.join) @step def join(self, inputs): self.next(self.end) MyFlow() How toHow to structure my code? structure my code? start B A join end
  • 59. from metaflow import FlowSpec, step class MyFlow(FlowSpec): @step def start(self): self.next(self.a, self.b) @step def a(self): self.next(self.join) @step def b(self): self.next(self.join) @step def join(self, inputs): self.next(self.end) MyFlow() How toHow to structure my code? structure my code? start B A join end # python myscript.py run
  • 60. metaflow("MyFlow") %>% step( step = "start", next_step = c("a", "b") ) %>% step( step = "A", r_function = r_function(a_func), next_step = "join" ) %>% step( step = "B", r_function = r_function(b_func), next_step = "join" ) %>% step( step = "Join", r_function = r_function(join, join_step = TRUE), How toHow to deal with models deal with models written in R?written in R? start B A join end
  • 61. metaflow("MyFlow") %>% step( step = "start", next_step = c("a", "b") ) %>% step( step = "A", r_function = r_function(a_func), next_step = "join" ) %>% step( step = "B", r_function = r_function(b_func), next_step = "join" ) %>% step( step = "Join", r_function = r_function(join, join_step = TRUE), How toHow to deal with models deal with models written in R?written in R? start B A join end # RScript myscript.R
  • 62. MetaflowMetaflow adoptionadoption at Netflixat Netflix 134 projects on Metaflow as of November 2018
  • 63. @step def start(self): self.x = 0 self.next(self.a, self.b) @step def a(self): self.x += 2 self.next(self.join) @step def b(self): self.x += 3 self.next(self.join) @step def join(self, inputs): self.out = max(i.x for i in inputs) self.next(self.end) How toHow to prototype and testprototype and test my code locally?my code locally? start B A join end x=0 x+=2 x+=3 max(A.x, B.x)
  • 64. @step def start(self): self.x = 0 self.next(self.a, self.b) @step def a(self): self.x += 2 self.next(self.join) @step def b(self): self.x += 3 self.next(self.join) @step def join(self, inputs): self.out = max(i.x for i in inputs) self.next(self.end) How toHow to prototype and testprototype and test my code locally?my code locally? start B A join end # python myscript.py resume B x=0 x+=2 x+=3 max(A.x, B.x)
  • 65. @titus(cpu=16, gpu=1) @step def a(self): tensorflow.train() self.next(self.join) @titus(memory=200000) @step def b(self): massive_dataframe_operation() self.next(self.join) How toHow to get access to more CPUs, get access to more CPUs, GPUs, or memory?GPUs, or memory? start B A join end 16 cores, 1GPU 200GB RAM
  • 66. @titus(cpu=16, gpu=1) @step def a(self): tensorflow.train() self.next(self.join) @titus(memory=200000) @step def b(self): massive_dataframe_operation() self.next(self.join) How toHow to get access to more CPUs, get access to more CPUs, GPUs, or memory?GPUs, or memory? start B A join end 16 cores, 1GPU 200GB RAM # python myscript.py run
  • 67. @step def start(self): self.grid = [’x’,’y’,’z’] self.next(self.a, foreach=’grid’) @titus(memory=10000) @step def a(self): self.x = ord(self.input) self.next(self.join) @step def join(self, inputs): self.out = max(i.x for i in inputs) self.next(self.end) How toHow to distribute work over distribute work over many parallel jobs?many parallel jobs? start A join end
  • 68. 40%40% of projects run steps outside their dev environment.of projects run steps outside their dev environment. How quickly they start using Titus?How quickly they start using Titus?
  • 69. from metaflow import Table @titus(memory=200000, network=20000) @step def b(self): # Load data from S3 to a dataframe # at 10Gbps df = Table('vtuulos', 'input_table') self.next(self.end) How toHow to access large amounts of access large amounts of input data?input data? start B A join end S3
  • 70. Case Study:Case Study: Marketing Cost per Incremental WatcherMarketing Cost per Incremental Watcher 1. Build a separate model for every new title with marketing spend.  Parallel foreach. 2. Load and prepare input data for each model. Download Parquet directly from S3. Total amount of model input data: 890GB. 3. Fit a model. Train each model on an instance with 400GB of RAM, 16 cores. The model is written in R. 4. Share updated results. Collect results of individual models, write to a table. Results shown on a Tableau dashboard.
  • 72. # Access Savin's runs namespace('user:savin') run = Flow('MyFlow').latest_run print(run.id) # = 234 print(run.tags) # = ['unsampled_model'] # Access David's runs namespace('user:david') run = Flow('MyFlow').latest_run print(run.id) # = 184 print(run.tags) # = ['sampled_model'] # Access everyone's runs namespace(None) run = Flow('MyFlow').latest_run print(run.id) # = 184 How toHow to version my results and version my results and access results by others?access results by others? start B A join end david: sampled_model savin: unsampled_model
  • 73. How toHow to deploy my workflow to deploy my workflow to production?production? start B A join end
  • 74. How toHow to deploy my workflow to deploy my workflow to production?production? start B A join end #python myscript.py meson create
  • 75. 26%26% of projects get deployed to the production scheduler.of projects get deployed to the production scheduler. How quickly the first deployment happens?How quickly the first deployment happens?
  • 76. How toHow to monitor models andmonitor models and examine results?examine results? start B A join end x=0 x+=2 x+=3 max(A.x, B.x)
  • 77. How toHow to deploy results asdeploy results as a microservice?a microservice?   start B A join end x=0 x+=2 x+=3 max(A.x, B.x) Metaflow hosting from metaflow import WebServiceSpec from metaflow import endpoint class MyWebService(WebServiceSpec): @endpoint def show_data(self, request_dict): # TODO: real-time predict here result = self.artifacts.flow.x return {'result': result}
  • 78. How toHow to deploy results asdeploy results as a microservice?a microservice?   start B A join end x=0 x+=2 x+=3 max(A.x, B.x) Metaflow hosting from metaflow import WebServiceSpec from metaflow import endpoint class MyWebService(WebServiceSpec): @endpoint def show_data(self, request_dict): # TODO: real-time predict here result = self.artifacts.flow.x return {'result': result} {"result": 3}{ # curl http://host/show_data 
  • 79. Case Study:Case Study: Launch Date Schedule OptimizationLaunch Date Schedule Optimization 1. Batch optimize launch date schedules for new titles daily.  Batch optimization deployed on Meson. 2. Serve results through a custom UI. Results deployed on Metaflow Hosting. 3. Support arbitrary what-if scenarios in the custom UI. Run optimizer in real-time in a custom web endpoint.
  • 84. diversediverse problemsproblems diversediverse peoplepeople helphelp peoplepeople buildbuild diversediverse modelsmodels
  • 85. diversediverse problemsproblems diversediverse peoplepeople helphelp peoplepeople buildbuild helphelp peoplepeople deploydeploy diversediverse modelsmodels
  • 86. diversediverse problemsproblems diversediverse peoplepeople helphelp peoplepeople buildbuild helphelp peoplepeople deploydeploy diversediverse modelsmodels happyhappy peoplepeople, healthy, healthy businessbusiness
  • 89. Watch the video with slide synchronization on InfoQ.com! https://www.infoq.com/presentations/ netflix-ml-infrastructure