What are the Unique Challenges and
Opportunities in Systems for ML?
Matei Zaharia
AI Researcher 😀: AI is going to change all of computing!
AI Researcher 😀: It's intelligent, and you don't need to program anymore; you just differentiate things...
AI Researcher 😀: How does it affect your research field?
Systems Researcher 🙂: Umm, I figured out a way to shave off some system calls!
AI Researcher 😀: How does it affect your research field?
Networking Researcher 🙂: I came up with a new congestion control scheme.
AI Researcher 😐
Motivation
ML workloads can certainly influence a lot of systems,
but what are the unique research challenges they raise?
Turns out there are a lot! ML is very different from
traditional software, and we should look at how
My Perspective
Research lab focused on infrastructure for
usable machine learning
Data & ML platform for 2000+ orgs
How Does ML Differ from Traditional Software?
| | Traditional Software | Machine Learning |
|---|---|---|
| Goal | Meet a functional specification | Optimize a metric (e.g., accuracy) |
| Quality depends on | Only the application code | Input data and tuning parameters |
| Behavior | Mostly deterministic | Stochastic |
Some Interesting Opportunities
ML Platforms: software for managing and productionizing ML
Data-oriented model training, QA and debugging tools
Optimizations leveraging the stochastic nature of ML
ML-Aware System Optimization:
NoScope & BlazeIt
The ML Inference Bottleneck
Overall, inference cost is often 100x higher than training cost, and it greatly limits deployments
Example: processing 1 video stream in real time with CNNs requires a $1000 GPU
Inference Optimization in NoScope
Idea: optimize execution of ML models for a specific application or query
• Model specialization: train a small DNN to recognize the specific class in the dataset (e.g., "buses in street video")
• Query optimization: tune a cascade of models to achieve a target accuracy
[Figure: a user query over a dataset is answered by a cascade of a specialized model and the target model]
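A minimal sketch of a two-threshold cascade, assuming the specialized model emits a confidence score in [0, 1] and that the target model's labels on held-out data are treated as ground truth. The names `cascade_predict` and `tune_thresholds` and the grid search are hypothetical illustrations of the idea on this slide, not NoScope's actual implementation or optimizer.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def cascade_predict(frames: Sequence, cheap_score: Callable, target_model: Callable,
                    low: float, high: float) -> Tuple[List[bool], int]:
    """Label frames with a two-threshold cascade; return labels and # target-model calls."""
    labels: List[bool] = []
    target_calls = 0
    for frame in frames:
        score = cheap_score(frame)  # specialized model's confidence that the class is present
        if score <= low:
            labels.append(False)  # confidently absent: skip the expensive model
        elif score >= high:
            labels.append(True)  # confidently present: skip the expensive model
        else:
            labels.append(target_model(frame))  # uncertain: fall back to the target model
            target_calls += 1
    return labels, target_calls


def tune_thresholds(scores: Sequence[float], truth: Sequence[bool], target_accuracy: float,
                    grid: Sequence[float]) -> Optional[Tuple[float, float, int]]:
    """Grid-search (low, high) on held-out data to minimize target-model calls.

    `truth` holds the target model's labels, so any frame routed to the
    target model counts as correct.
    """
    best: Optional[Tuple[float, float, int]] = None
    for low in grid:
        for high in grid:
            if low >= high:
                continue  # need a non-empty "uncertain" band between the thresholds
            correct, calls = 0, 0
            for s, t in zip(scores, truth):
                if s <= low:
                    correct += int(not t)
                elif s >= high:
                    correct += int(bool(t))
                else:
                    correct += 1
                    calls += 1
            meets_target = correct / len(scores) >= target_accuracy
            if meets_target and (best is None or calls < best[2]):
                best = (low, high, calls)
    return best
```

Widening the band between `low` and `high` raises accuracy but sends more frames to the expensive model; the search picks the cheapest band that still meets the accuracy target.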
NoScope Results
VLDB ‘17, github.com/stanford-futuredata/noscope
Optimizing ML + SQL in BlazeIt
[Kang et al, CIDR 2019]
[Figure: SQL Query; Frames from Video; Query Plan with Specialized DNNs; Object Detection DNN; ResNet-50]
BlazeIt Optimizations
Aggregation queries: accelerate approximate queries by using the specialized model's output as a control variate for sampling
  E.g.: find the average # of cars per frame
Limit queries: use specialized models to sort frames by their likelihood of matching the query, then run the full model on frames in that order
  E.g.: SELECT * FROM frames WHERE #(red buses) > 3 LIMIT 5
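A sketch of both optimizations, assuming a cheap proxy (the specialized model) can be evaluated on every frame while the expensive model is only run where needed. The function names are hypothetical, and the control-variate coefficient is the textbook sample estimate; this is not BlazeIt's code.

```python
import random
from typing import Callable, List, Sequence, Tuple


def control_variate_mean(frames: Sequence, expensive: Callable[[object], float],
                         proxy: Callable[[object], float], sample_size: int,
                         seed: int = 0) -> float:
    """Estimate the mean of `expensive` over all frames from a random sample,
    using `proxy` (cheap, evaluated on every frame) as a control variate."""
    proxy_all = [proxy(f) for f in frames]
    mu_proxy = sum(proxy_all) / len(proxy_all)  # exact proxy mean over all frames

    idx = random.Random(seed).sample(range(len(frames)), sample_size)
    y = [expensive(frames[i]) for i in idx]  # expensive model only on the sample
    x = [proxy_all[i] for i in idx]
    mean_y, mean_x = sum(y) / len(y), sum(x) / len(x)

    var_x = sum((xi - mean_x) ** 2 for xi in x)
    if var_x == 0:
        return mean_y  # proxy is constant on this sample, so it cannot correct anything
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    c = cov_xy / var_x  # sample estimate of the variance-minimizing coefficient
    return mean_y - c * (mean_x - mu_proxy)


def limit_query(frames: Sequence, proxy_score: Callable[[object], float],
                matches: Callable[[object], bool], limit: int) -> Tuple[List[int], int]:
    """Return indices of up to `limit` matching frames and the # of full-model calls."""
    order = sorted(range(len(frames)), key=lambda i: proxy_score(frames[i]), reverse=True)
    found: List[int] = []
    calls = 0
    for i in order:
        calls += 1
        if matches(frames[i]):  # the expensive full model decides
            found.append(i)
            if len(found) == limit:
                break
    return found, calls
```

The better the proxy correlates with the expensive model, the more variance the correction removes (aggregation) and the fewer full-model calls are wasted on non-matching frames (limit).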
BlazeIt Results
[Charts: Aggregation Queries, Limit Queries]
Quality Assurance for ML with
Model Assertions
Motivation
ML applications fail in complex, hard-to-debug ways
• Tesla cars crashing into lane dividers
• Gender classification that is incorrect depending on race
How can we test and improve quality of ML apps?
Model Assertions
Predicates on input/output of an ML application
(similar to software assertions)
[Kang, Raghavan et al, NeurIPS MLSys 2018]
[Figure: Frames 1, 2, and 3 of a video, annotated with assert(cars should not flicker in and out); assertion results feed runtime monitoring and improved training (data selection & weak supervision)]
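A minimal sketch of the flicker assertion as a predicate over model outputs, assuming detections have already been associated into per-frame sets of track IDs. The function name and the exact rule (present in frames t-1 and t+1 but missing in frame t) are illustrative choices, not the paper's definition.

```python
from typing import List, Set, Tuple


def flicker_assertion(tracks_per_frame: List[Set[str]]) -> List[Tuple[int, str]]:
    """Return (frame_index, track_id) pairs where an object vanishes for exactly one
    frame: present in frames t-1 and t+1 but missing in frame t."""
    failures: List[Tuple[int, str]] = []
    for t in range(1, len(tracks_per_frame) - 1):
        prev, cur, nxt = tracks_per_frame[t - 1], tracks_per_frame[t], tracks_per_frame[t + 1]
        for track_id in sorted((prev & nxt) - cur):
            failures.append((t, track_id))
    return failures
```

Each failure points at a specific frame, so the same output can drive a runtime alert or select frames for relabeling.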
Example Assertions
| Problem Domain | Assertion |
|---|---|
| Video analytics | Objects should not flicker in and out across frames |
| Autonomous vehicles | LIDAR and video object detectors should agree |
| Heart rhythm classification | Output class should not change frequently |
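The heart-rhythm assertion in the table can be sketched as a sliding-window check. The `window` and `max_changes` parameters are hypothetical knobs; the source does not say how "frequently" is defined.

```python
from typing import Hashable, List, Sequence


def frequent_change_assertion(labels: Sequence[Hashable], window: int,
                              max_changes: int) -> List[int]:
    """Return start indices of every `window`-long stretch of outputs in which the
    predicted class changes more than `max_changes` times."""
    violations: List[int] = []
    for start in range(len(labels) - window + 1):
        segment = labels[start:start + window]
        changes = sum(1 for a, b in zip(segment, segment[1:]) if a != b)
        if changes > max_changes:
            violations.append(start)
    return violations
```

For example, with `window=4` and `max_changes=1`, the output sequence N N A N A N N N violates the assertion in every window except the last.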
Using Model Assertions
Inference time
» Runtime monitoring
» Corrective action
Training time
» Active learning
» Weak supervision via correction rules
Active Learning with Assertions:
Can assertions help select data to label & train on?
Key idea: new active learning algorithm samples data that
is most likely to reduce # failing assertions
Using assertions for active learning improves model quality.
[Chart: mAP by selection method for 2000 new labels]
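To make "sample data that is most likely to reduce # failing assertions" concrete, here is a deliberately simplified greedy heuristic: it assumes items that violate more distinct assertions are the most valuable to label. This is a hypothetical stand-in, not the active learning algorithm from the paper.

```python
from typing import Dict, List


def select_for_labeling(failures: Dict[str, List[str]], budget: int) -> List[str]:
    """Pick up to `budget` items, preferring those that violate the most distinct assertions.

    `failures` maps an item ID to the names of the assertions it fails.
    Ties are broken by item ID so the selection is deterministic.
    """
    flagged = [item for item, names in failures.items() if names]
    flagged.sort(key=lambda item: (-len(set(failures[item])), item))
    return flagged[:budget]
```

A fixed ranking like this ignores which assertions actually led to improvements after retraining; an adaptive method would update its preferences between labeling rounds.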
Weak Supervision with Assertions:
Can assertions improve quality without human labeling?
Key idea: consistency constraints API lets devs say which
attributes should stay constant across outputs in a dataset
E.g. “each tracked object should always have same class”,
“each person should have consistent detected gender”
| Task | Pretrained | Weakly Supervised |
|---|---|---|
| AV perception (mAP) | 10.6 | 14.1 (+33%) |
| Object detection (mAP) | 34.4 | 49.9 (+45%) |
| ECG (% accuracy) | 70.7 | 72.1 (+2%) |
[Figure: Model Quality After Retraining, comparing the Original SSD Model with the Retrained SSD Model]
[Kang, Raghavan et al, NeurIPS MLSys 2018]
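A minimal sketch of one consistency constraint, "each tracked object should always have the same class", turned into weak labels by majority vote within each track. The function name and the majority-vote correction rule are assumptions for illustration; the source does not specify how the consistency constraints API resolves conflicts.

```python
from collections import Counter
from typing import Dict, List, Tuple


def consistency_weak_labels(outputs: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Enforce "each tracked object should always have the same class".

    `outputs` holds one (track_id, predicted_class) pair per detection. Each detection
    is relabeled with its track's majority class; the result can serve as weak labels
    for retraining without human annotation.
    """
    votes: Dict[str, Counter] = {}
    for track_id, cls in outputs:
        votes.setdefault(track_id, Counter())[cls] += 1
    majority = {track_id: c.most_common(1)[0][0] for track_id, c in votes.items()}
    return [(track_id, majority[track_id]) for track_id, _ in outputs]
```

The same pattern applies to other attributes that should stay constant across a track, such as a person's detected gender.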
ML Platforms: Programming and
Deployment Systems for ML
ML at Industrial Scale
Today, ML development is ad-hoc:
• Hard to track experiments & metrics: users do it best-effort
• Hard to reproduce results: won’t happen by default
• Hard to share & deploy models: different dev & deploy stacks
Each app takes months to build, and then it needs to be continuously maintained!
ML Platforms
A new class of systems to manage the ML lifecycle
Pioneered by company-specific platforms: Facebook FBLearner, Uber Michelangelo, Google TFX, etc.
+ Standardize the data prep / training / deploy cycle: if you work with the platform, you get these!
– Limited to a few algorithms or frameworks
– Tied to one company's infrastructure
MLflow from Databricks
Open source, open-interface ML platform (mlflow.org)
• Works with any existing ML library and deployment service
[Figure: the three MLflow components.
Experiment Tracking: your_code.py calls log_param("alpha", 0.5), log_metric("rmse", 0.2), and log_model(my_model); results go to a Tracking Server with a UI and API.
Reproducible Projects: a Project bundles a Project Spec with code, deps, and params.
Deployment Targets: inference code, bulk scoring, cloud serving tools, and REST API serving.]
MLflow Projects: Reproducible Runs
Simple packaging format for code + dependencies
my_project/
├── MLproject
├── conda.yaml
├── main.py
└── model.py
...
MLproject file:
conda_env: conda.yaml
entry_points:
  main:
    parameters:
      training_data: path
      lr: {type: float, default: 0.1}
    command: "python main.py {training_data} {lr}"
Running it:
$ mlflow run git://<my_project>
mlflow.run("git://<my_project>", ...)
Composing Projects
r1 = mlflow.run("ProjectA", params)
if r1 > 0:
    r2 = mlflow.run("ProjectB", …)
else:
    r2 = mlflow.run("ProjectC", …)
r3 = mlflow.run("ProjectD", r2)
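The slide's composition code is pseudocode: `mlflow.run` returns a submitted-run handle, not a number. A runnable sketch of the same branching workflow injects a `run_project(uri, params)` function that returns one scalar result. `compose_projects`, `mlflow_runner`, and the `"input"` parameter name are hypothetical; the adapter assumes each project logs a metric named `metric_name` and uses `mlflow.run` and `mlflow.get_run`.

```python
from typing import Any, Callable, Dict

# Hypothetical contract: run a project and return one scalar result metric.
RunProject = Callable[[str, Dict[str, Any]], float]


def compose_projects(run_project: RunProject, params: Dict[str, Any]) -> float:
    """Branching workflow from the slide: A, then B or C depending on A's result, then D."""
    r1 = run_project("ProjectA", params)
    if r1 > 0:
        r2 = run_project("ProjectB", {"input": r1})  # hypothetical parameter name
    else:
        r2 = run_project("ProjectC", {"input": r1})
    return run_project("ProjectD", {"input": r2})


def mlflow_runner(metric_name: str) -> RunProject:
    """Adapter that runs a project with MLflow and reads back one logged metric."""
    def run_project(uri: str, params: Dict[str, Any]) -> float:
        import mlflow  # imported lazily so the sketch itself does not require MLflow
        submitted = mlflow.run(uri, parameters=params)  # synchronous by default
        return mlflow.get_run(submitted.run_id).data.metrics[metric_name]
    return run_project
```

Injecting the runner keeps the workflow logic testable with a fake, while `compose_projects(mlflow_runner("rmse"), {...})` would execute real projects.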
MLflow Tracking: Logging for ML
[Figure: notebooks, local apps, and cloud jobs log to a Tracking Server (UI + API) through a REST API]
mlflow.log_param("alpha", 0.5)
mlflow.log_metric("accuracy", 0.9)
...
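To show what a tracking server records per run, here is a hypothetical in-memory stand-in (not MLflow's implementation): params are write-once strings, metrics keep a history of (step, value) pairs, and runs can be compared by a metric as one would in the tracking UI. Auto-incrementing the step when none is given is a choice of this toy only.

```python
from typing import Dict, List, Optional, Tuple


class ToyRun:
    """Hypothetical in-memory record of one tracked run."""

    def __init__(self, name: str):
        self.name = name
        self.params: Dict[str, str] = {}
        self.metrics: Dict[str, List[Tuple[int, float]]] = {}

    def log_param(self, key: str, value) -> None:
        text = str(value)  # params are stored as strings
        if key in self.params and self.params[key] != text:
            raise ValueError(f"param {key!r} already logged as {self.params[key]!r}")
        self.params[key] = text

    def log_metric(self, key: str, value: float, step: Optional[int] = None) -> None:
        history = self.metrics.setdefault(key, [])
        if step is None:
            step = len(history)  # toy choice: auto-increment the step
        history.append((step, float(value)))

    def latest(self, key: str) -> float:
        return self.metrics[key][-1][1]


def best_run(runs: List[ToyRun], metric: str) -> ToyRun:
    """Pick the run whose most recent value of `metric` is highest."""
    candidates = [r for r in runs if metric in r.metrics]
    return max(candidates, key=lambda r: r.latest(metric))
```

Keeping params immutable while metrics accumulate history is what makes runs comparable after the fact.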
Tracking UI: Inspecting Runs
MLflow Models: Packaging Models
Packaging format that includes arbitrary code (not just model weights)
[Figure: a Model Format with multiple flavors (ONNX Flavor, Python Flavor, ...) wrapping the model logic, consumed by Batch Inference, REST Serving, and Testing & Debug Tools such as LIME and TCAV]
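The idea of flavors can be sketched as one saved model exposing several interfaces, with each consuming tool loading the first flavor it understands. The dictionary-of-callables representation and `load_flavor` are hypothetical simplifications; real MLflow models are directories described by a metadata file.

```python
from typing import Callable, Dict, List, Sequence

Predict = Callable[[List[float]], List[float]]


def load_flavor(flavors: Dict[str, Predict], supported: Sequence[str]) -> Predict:
    """Return the first flavor, in the consumer's order of preference, that the model provides."""
    for name in supported:
        if name in flavors:
            return flavors[name]
    raise LookupError(f"model offers {sorted(flavors)}, consumer supports {list(supported)}")
```

For example, a serving tool that prefers `"onnx"` but also accepts `"python"` still works with a model saved only in the Python flavor.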
MLflow Community Growth
140 contributors from >50 companies since June 2018
850K downloads/month
Major external contributions:
• Docker & Kubernetes execution
• R API
• Integrations with PyTorch, H2O, HDFS, GCS, …
• Plugin system
Other ML-Specific Research Opportunities
Data validation and monitoring (e.g. TFX Data Validation)
Supervision-oriented systems (e.g. Snorkel, Overton)
Leveraging the numeric nature of ML for optimization, security, etc. (e.g. TASO, HogWild, SSP, federated ML)
Conclusion
Many systems problems specific to ML are not
heavily studied in research
• App lifecycle, data quality & monitoring, model QA, etc
These are also major problems in practice!
Follow DAWN’s research at dawn.cs.stanford.edu

VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learn
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
Exploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdfExploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdf
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
 

What are the Unique Challenges and Opportunities in Systems for ML?

  • 1. What are the Unique Challenges and Opportunities in Systems for ML? Matei Zaharia
  • 2. 😀 🙂 AI is going to change all of computing! AI Researcher Systems Researcher
  • 3. 😀 🙂 It’s intelligent and you don’t need to program anymore and you just differentiate things... AI Researcher Systems Researcher
  • 4. 😀 🙂 How does it affect your research field? AI Researcher Systems Researcher
  • 5. 😀 🙂 How does it affect your research field? AI Researcher Systems Researcher Umm, I figured out a way to shave off some system calls!
  • 6. 😀 🙂 How does it affect your research field? AI Researcher Networking Researcher I came up with a new congestion control scheme
  • 7. 😐 🙂 AI Researcher Networking Researcher I came up with a new congestion control scheme
  • 8. Motivation: ML workloads can certainly influence a lot of systems, but what are the unique research challenges they raise? Turns out there are a lot! ML is very different from traditional software, and we should look at how
  • 9. My Perspective: a research lab focused on infrastructure for usable machine learning; a data & ML platform for 2000+ orgs
  • 10. How Does ML Differ from Traditional Software?
      Aspect | Traditional Software | Machine Learning
      Goal | Meet a functional specification | Optimize a metric (e.g. accuracy)
      Quality depends on | Application code only | Input data and tuning parameters
      Determinism | Mostly deterministic | Stochastic
  • 11. Some Interesting Opportunities
      ML platforms: software for managing and productionizing ML
      Data-oriented model training, QA and debugging tools
      Optimizations leveraging the stochastic nature of ML
  • 13. The ML Inference Bottleneck: inference cost is often 100x higher than training overall and greatly limits deployments. Example: processing 1 video stream in real time with CNNs requires a $1000 GPU.
  • 14. Inference Optimization in NoScope
      Idea: optimize execution of ML models for a specific application or query
      • Model specialization: train a small DNN to recognize the specific class in the dataset (e.g. “buses in street video”)
      • Query optimization: tune a cascade of models to achieve a target accuracy
      (Diagram: User Query, Dataset, Specialized Model, Target Model)
  • 15. NoScope Results: VLDB ’17, github.com/stanford-futuredata/noscope
  • 16. Optimizing ML + SQL in BlazeIt [Kang et al., CIDR 2019]
      (Diagram: SQL Query, Frames from Video, Query Plan with Specialized DNNs, ResNet-50, Object Detection DNN)
  • 17. BlazeIt Optimizations
      Aggregation queries: accelerate approximate queries by using the specialized model’s output as a control variate for sampling. E.g.: find the average # of cars per frame.
      Limit queries: use specialized models to sort frames by likelihood of matching the query, then run the full model. E.g.: SELECT * FROM frames WHERE #(red buses) > 3 LIMIT 5
  • 19. Quality Assurance for ML with Model Assertions
  • 20. Motivation: ML applications fail in complex, hard-to-debug ways
      • Tesla cars crashing into lane dividers
      • Gender classification whose accuracy varies by race
      How can we test and improve the quality of ML apps?
  • 21. Model Assertions: predicates on the input/output of an ML application (similar to software assertions) [Kang, Raghavan et al., NeurIPS MLSys 2018]
      (Figure: Frame 1, Frame 2, Frame 3 with assert(cars should not flicker in and out))
      Uses: improved training (data selection & weak supervision); runtime monitoring
  • 22. Example Assertions
      Problem Domain | Assertion
      Video analytics | Objects should not flicker in and out across frames
      Autonomous vehicles | LIDAR and video object detectors should agree
      Heart rhythm classification | Output class should not change frequently
  • 23. Using Model Assertions
      Inference time
        » Runtime monitoring
        » Corrective action
      Training time
        » Active learning
        » Weak supervision via correction rules
  • 24. Active Learning with Assertions: Can assertions help select data to label & train on?
      Key idea: a new active learning algorithm samples the data most likely to reduce the number of failing assertions
  • 25. Active Learning with Assertions: Can assertions help select data to label & train on?
      Using assertions for active learning improves model quality.
      (Chart: mAP by selection method for 2000 new labels)
  • 26. Weak Supervision with Assertions: Can assertions improve quality without human labeling?
      Key idea: a consistency-constraints API lets developers state which attributes should stay constant across outputs in a dataset, e.g. “each tracked object should always have the same class” or “each person should have a consistent detected gender”
  • 27. Weak Supervision with Assertions: Can assertions improve quality without human labeling?
      Task | Pretrained | Weakly Supervised
      AV perception (mAP) | 10.6 | 14.1 (+33%)
      Object detection (mAP) | 34.4 | 49.9 (+45%)
      ECG (% accuracy) | 70.7 | 72.1 (+2%)
  • 28. Model Quality After Retraining (figure panels: Original SSD Model, Retrained SSD Model) [Kang, Raghavan et al., NeurIPS MLSys 2018]
  • 29. ML Platforms: Programming and Deployment Systems for ML
  • 30. ML at Industrial Scale
      Today, ML development is ad hoc:
      • Hard to track experiments & metrics: users do it best-effort
      • Hard to reproduce results: won’t happen by default
      • Hard to share & deploy models: different dev & deploy stacks
      Each app takes months to build, and then needs to be continuously maintained!
  • 31. ML Platforms: a new class of systems to manage the ML lifecycle
      Pioneered by company-specific platforms: Facebook FBLearner, Uber Michelangelo, Google TFX, etc.
      + Standardize the data prep / training / deploy cycle: if you work with the platform, you get these!
      – Limited to a few algorithms or frameworks
      – Tied to one company’s infrastructure
  • 32. MLflow from Databricks
      Open source, open-interface ML platform (mlflow.org)
      • Works with any existing ML library and deployment service
      (Diagram panels:
       Experiment Tracking: your_code.py calls log_param("alpha", 0.5), log_metric("rmse", 0.2), log_model(my_model), recorded by a Tracking Server with UI, API, and REST API
       Reproducible Projects: a Project Spec bundling code, Deps, and Params
       Deployment Targets: Inference Code, Bulk Scoring, Cloud Serving, Tools)
  • 33. MLflow Projects: Reproducible Runs
      Simple packaging format for code + dependencies
        my_project/
        ├── MLproject
        ├── conda.yaml
        ├── main.py
        └── model.py
        ...
      MLproject file:
        conda_env: conda.yaml
        entry_points:
          main:
            parameters:
              training_data: path
              lr: {type: float, default: 0.1}
            command: python main.py {training_data} {lr}
      Run from the command line or from Python:
        $ mlflow run git://<my_project>
        mlflow.run("git://<my_project>", ...)
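  • 34. Composing Projects
        # schematic: mlflow.run returns a SubmittedRun object, so a real workflow
        # would compare a metric read back from the tracking server, not r1 itself
        r1 = mlflow.run("ProjectA", params)
        if r1 > 0:
            r2 = mlflow.run("ProjectB", …)
        else:
            r2 = mlflow.run("ProjectC", …)
        r3 = mlflow.run("ProjectD", r2)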
  • 35. MLflow Tracking: Logging for ML
      Notebooks, local apps, and cloud jobs log to a Tracking Server (UI, API, REST API):
        mlflow.log_param("alpha", 0.5)
        mlflow.log_metric("accuracy", 0.9)
        ...
  • 37. MLflow Models: Packaging Models
      Packages arbitrary code (not just model weights)
      (Diagram: Model Format with ONNX Flavor and Python Flavor around Model Logic in a Packaging Format, used by Batch Inference, REST Serving, and Testing & Debug Tools such as LIME and TCAV)
  • 38. MLflow Community Growth
      140 contributors from >50 companies since June 2018; 850K downloads/month
      Major external contributions:
      • Docker & Kubernetes execution
      • R API
      • Integrations with PyTorch, H2O, HDFS, GCS, …
      • Plugin system
  • 39. Other ML-Specific Research Opportunities
      Data validation and monitoring (e.g. TFX Data Validation)
      Supervision-oriented systems (e.g. Snorkel, Overton)
      Leveraging the numeric nature of ML for optimization, security, etc. (e.g. TASO, HogWild, SSP, federated ML)
  • 40. Conclusion
      Many systems problems specific to ML are not heavily studied in research
      • App lifecycle, data quality & monitoring, model QA, etc.
      These are also major problems in practice!
      Follow DAWN’s research at dawn.cs.stanford.edu
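NoScope's idea on slide 14 is to put a cheap specialized model in front of the expensive target model and tune the cascade to a target accuracy. The sketch below illustrates that idea and is not NoScope's code. It makes three assumptions. First, the specialized model returns a probability. Second, the target model's outputs on a validation set are treated as ground truth. Third, the two confidence thresholds are found by brute-force search. `Cascade` and `tune_thresholds` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass
class Cascade:
    """Cheap specialized model first; fall back to the expensive target model when unsure."""
    cheap: Callable[[object], float]     # hypothetical: returns P(target class in frame)
    expensive: Callable[[object], bool]  # hypothetical: slow but trusted target model
    low: float                           # scores <= low are answered "absent" by the cheap model
    high: float                          # scores >= high are answered "present" by the cheap model

    def predict(self, frame: object) -> Tuple[bool, bool]:
        """Return (label, whether the expensive model was called)."""
        p = self.cheap(frame)
        if p <= self.low:
            return False, False
        if p >= self.high:
            return True, False
        return self.expensive(frame), True


def tune_thresholds(scores: Sequence[float], labels: Sequence[bool],
                    max_errors: int) -> Tuple[float, float]:
    """Brute-force search for (low, high) letting the cheap model answer as many
    validation frames as possible with at most max_errors mistakes.
    `labels` are the target model's outputs, treated as ground truth."""
    cands = sorted(set(scores))
    lows = [-1.0] + cands   # -1.0 means "never answer absent"
    highs = cands + [2.0]   # 2.0 means "never answer present"
    best, best_answered = (-1.0, 2.0), 0
    for low in lows:
        for high in highs:
            if high <= low:
                continue
            errors = sum(1 for s, y in zip(scores, labels)
                         if (s <= low and y) or (s >= high and not y))
            answered = sum(1 for s in scores if s <= low or s >= high)
            if errors <= max_errors and answered > best_answered:
                best, best_answered = (low, high), answered
    return best
```

Only frames scoring between the two thresholds reach the expensive model, and that is where the speedup comes from. Raising `max_errors` lets the cheap model answer more frames at some cost in accuracy.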
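Slide 17's two BlazeIt optimizations can be sketched as follows. The sketch assumes a cheap specialized-model output exists for every frame. The first function is the standard control-variate estimator ŷ = ȳ − c(x̄ − μ_X), with c = Cov(X, Y) / Var(X) estimated from the sample. The second function scans frames in descending proxy-score order for a LIMIT query. The function names are hypothetical, and BlazeIt's actual implementation may differ.

```python
from statistics import mean
from typing import Any, Callable, Iterable, List, Sequence, Tuple


def control_variate_mean(y_sample: Sequence[float], x_sample: Sequence[float],
                         x_population_mean: float) -> float:
    """Estimate the mean of y (expensive full-model output, e.g. cars per frame)
    from a sample, corrected by x (cheap specialized-model output known for all
    frames). Needs at least two sampled frames."""
    mx, my = mean(x_sample), mean(y_sample)
    n = len(x_sample)
    cov = sum((x - mx) * (y - my) for x, y in zip(x_sample, y_sample)) / (n - 1)
    var = sum((x - mx) ** 2 for x in x_sample) / (n - 1)
    c = cov / var if var > 0 else 0.0  # variance-minimizing coefficient
    return my - c * (mx - x_population_mean)


def limit_query(frames: Iterable[Any], proxy_score: Callable[[Any], float],
                full_model_matches: Callable[[Any], bool],
                limit: int) -> Tuple[List[Any], int]:
    """Run the full model on frames in descending proxy-score order until `limit`
    matches are found. Returns (matches, number of full-model calls)."""
    matches, calls = [], 0
    for frame in sorted(frames, key=proxy_score, reverse=True):
        calls += 1
        if full_model_matches(frame):
            matches.append(frame)
            if len(matches) == limit:
                break
    return matches, calls
```

The more closely the specialized output tracks the full-model output, the more variance the correction removes. For the LIMIT query, a good proxy ordering means the full model runs on only a few frames before the limit is reached.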
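The flicker assertion on slide 21 can be written as an ordinary predicate over consecutive frames. This sketch assumes detections already carry track IDs, so the same car keeps the same ID from frame to frame; the slide does not state this. `flicker_assertion` is a hypothetical name.

```python
from typing import List, Set


def flicker_assertion(frames: List[Set[str]]) -> List[int]:
    """Model assertion: an object tracked in frames t-1 and t+1 should also be
    detected in frame t. Returns the indices t where the assertion fails."""
    failures = []
    for t in range(1, len(frames) - 1):
        vanished = (frames[t - 1] & frames[t + 1]) - frames[t]  # seen around t, missing at t
        if vanished:
            failures.append(t)
    return failures
```

For example, `flicker_assertion([{"car1"}, set(), {"car1"}])` flags frame 1, where `car1` briefly disappears.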
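The heart-rhythm assertion from slide 22 ("output class should not change frequently") fits the same pattern. The window length and change threshold are illustrative parameters, not values from the talk, and the function name is hypothetical.

```python
from typing import Hashable, List, Sequence


def class_changes_assertion(labels: Sequence[Hashable], window: int,
                            max_changes: int) -> List[int]:
    """Model assertion: within any `window` consecutive predictions, the class
    should change at most `max_changes` times. Returns failing window starts."""
    flagged = []
    for start in range(len(labels) - window + 1):
        seg = labels[start:start + window]
        changes = sum(1 for a, b in zip(seg, seg[1:]) if a != b)  # adjacent label switches
        if changes > max_changes:
            flagged.append(start)
    return flagged
```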
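Slide 23 describes using assertions at inference time for runtime monitoring and corrective action. That pattern can be sketched as a wrapper around the model. The `on_failure` callback stands in for whatever corrective action an application takes, such as falling back to a default output. All names here are hypothetical.

```python
from typing import Any, Callable, Dict, List, Tuple

Check = Callable[[Any, Any], bool]  # hypothetical: assertion over (input, output)


def monitored_predict(model: Callable[[Any], Any], x: Any,
                      assertions: Dict[str, Check],
                      on_failure: Callable[[Any, Any, List[str]], Any]) -> Tuple[Any, List[str]]:
    """Run the model, evaluate every assertion on (input, output), and pass the
    output to a corrective-action callback if any assertion fails."""
    y = model(x)
    failed = [name for name, check in assertions.items() if not check(x, y)]
    if failed:
        y = on_failure(x, y, failed)  # corrective action chosen by the application
    return y, failed
```

Returning the names of failed assertions alongside the output also supports monitoring, because a service can log or count them over time.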
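For slide 24, here is a deliberately simple stand-in for assertion-driven data selection. It ranks unlabeled examples by how many assertions they fail and sends the top ones for labeling. This greedy heuristic is an assumption made for illustration and is not the active learning algorithm from the paper. The example assertions in the usage below are hypothetical.

```python
from typing import Any, Callable, List, Sequence


def select_for_labeling(examples: Sequence[Any],
                        assertions: Sequence[Callable[[Any], bool]],
                        budget: int) -> List[int]:
    """Greedy heuristic: return indices of up to `budget` examples that fail the
    most assertions, skipping examples that fail none."""
    scored = [(sum(1 for ok in assertions if not ok(ex)), i)
              for i, ex in enumerate(examples)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # most failures first, then index
    return [i for fails, i in scored[:budget] if fails > 0]
```

For example, with assertions such as "fewer than 50 objects per frame" and "no flicker", frames that violate both are chosen before frames that violate only one.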
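Slide 26's consistency constraints can be sketched as follows. Outputs are grouped by an identity attribute (here a track ID), and any output whose class disagrees with the rest of its group is corrected. Resolving disagreements by majority vote is an assumption made for this sketch; the slide only says which attributes should stay constant. The field names `track_id` and `cls` are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Any, Dict, List, Tuple


def consistency_weak_labels(outputs: List[Dict[str, Any]], key: str = "track_id",
                            attr: str = "cls") -> Tuple[List[Dict[str, Any]], List[int]]:
    """Consistency constraint: outputs sharing the same `key` should share `attr`.
    Violations are resolved by majority vote within each group; the corrected
    records can serve as weak labels for retraining."""
    groups = defaultdict(list)
    for i, rec in enumerate(outputs):
        groups[rec[key]].append(i)
    corrected = [dict(rec) for rec in outputs]  # copy so the input is untouched
    changed = []
    for idxs in groups.values():
        majority, _ = Counter(outputs[i][attr] for i in idxs).most_common(1)[0]
        for i in idxs:
            if outputs[i][attr] != majority:
                corrected[i][attr] = majority
                changed.append(i)
    return corrected, sorted(changed)
```

The returned `changed` indices are the model outputs that violated the constraint. These are the examples for which the sketch generates a label without human input.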