FrugalML: Using ML APIs More Accurately and Cheaply

Databricks
FrugalML: Using ML Prediction APIs More Accurately and Cheaply
Lingjiao Chen
Joint work with James Zou and Matei Zaharia
Outline
- Introduction to MLaaS
- FrugalML: How to save up to 90% using cloud ML APIs?
  - The main idea
  - How to use it
  - Empirical evaluation on real-world ML APIs
- What is next?
Copyright@Lingjiao Chen,
https://lchen001.github.io/
Machine Learning as a Service (MLaaS)
- Goal: mitigate low-level overheads
  - e.g., model training, data labelling, etc.
- Participants: [not preserved in the extracted text]
- Market value:
  - Previous: USD 1.0 billion in 2019
  - Expected: USD 8.48 billion by 2025
  - CAGR, 2019-2024: 43%
Source: Mordor Intelligence
Example: facial emotion recognition (FER) via the Google Vision API
Cost: $0.0015/image
Problem: Which API to use?
- ML prediction APIs: a data point -> a label (plus a cost)
  - e.g., Google API: images -> facial emotions, $0.0015/image
- Many commercial APIs with the same functionality
- Heterogeneity in performance and cost
… …
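The abstraction on this slide (a data point goes in, a label comes out, and each call has a price) can be sketched as a small Python type. This is only an illustration: the `PredictionAPI` class, the `total_cost` helper, the stub service, and the quality score it returns are all hypothetical names and values, not part of any vendor SDK or of the FrugalML code. Only the $0.0015/image price comes from the slide.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# A prediction is modelled as (label, quality score); this shape is an assumption.
Prediction = Tuple[str, float]


@dataclass(frozen=True)
class PredictionAPI:
    """Hypothetical wrapper: one call maps a data point to a label at a fixed price."""
    name: str
    cost_per_call: float                     # dollars per data point
    predict: Callable[[bytes], Prediction]   # stand-in for a real network call


def total_cost(api: PredictionAPI, n_points: int) -> float:
    """Cost of sending n_points data points to a single API."""
    return api.cost_per_call * n_points


# Stub in place of a real service; the price is the one quoted on the slide,
# the returned label and score are made up.
google_like = PredictionAPI("google-like", 0.0015, lambda image: ("happy", 0.9))

if __name__ == "__main__":
    print(f"10,000 images cost about ${total_cost(google_like, 10_000):.2f}")
```

Wrapping every service behind the same small interface is what makes it possible to compare APIs with the same functionality on accuracy and cost alone.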
Our Proposed Solution: FrugalML
- Optimize for the best sequential strategy under a budget constraint
- Up to 90% cost savings, or 5% better accuracy at the same cost, across all tasks and datasets evaluated
FrugalML: How to use it?
- Call a base service first
- Take the predicted quality score (QS) and predicted label from the
base service as features to decide
- i) if the prediction should be accepted
- ii) if and which additional API should be invoked.
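The decision procedure above can be sketched in a few lines of Python. This is a simplified illustration, not the released FrugalML code: it assumes a deterministic policy that maps each base label to one quality-score threshold and at most one add-on service. The function name `frugal_predict`, the policy layout, the stub services, and the $0.0001 price are all hypothetical; only the $0.0015 price appears in the slides.

```python
from typing import Callable, Dict, Optional, Tuple

Prediction = Tuple[str, float]                 # (label, quality score in [0, 1])
Service = Callable[[object], Prediction]
Policy = Dict[str, Tuple[float, Optional[str]]]


def frugal_predict(x, base: Tuple[Service, float],
                   addons: Dict[str, Tuple[Service, float]],
                   policy: Policy) -> Tuple[str, float]:
    """Return (final label, dollars spent) for one data point.

    policy maps a base label to (threshold, add-on name or None): the base
    prediction is accepted when its quality score reaches the threshold,
    otherwise the named add-on is invoked and its label is returned.
    """
    base_service, base_cost = base
    label, score = base_service(x)             # always call the base service first
    threshold, addon_name = policy.get(label, (0.0, None))
    if addon_name is None or score >= threshold:
        return label, base_cost                # i) accept the base prediction
    addon_service, addon_cost = addons[addon_name]
    addon_label, _ = addon_service(x)          # ii) pay for one additional API
    return addon_label, base_cost + addon_cost


# Stub services with made-up outputs; only 0.0015 is a price from the slides.
def cheap(x) -> Prediction:
    return ("happy", 0.95) if x == "easy" else ("sad", 0.40)


def strong(x) -> Prediction:
    return ("surprise", 0.99)


base = (cheap, 0.0001)
addons = {"strong": (strong, 0.0015)}
policy = {"happy": (0.8, "strong"), "sad": (0.6, "strong")}

if __name__ == "__main__":
    print(frugal_predict("easy", base, addons, policy))   # confident: base only
    print(frugal_predict("hard", base, addons, policy))   # unsure: add-on called
```

The point of the design is that the expensive service is only paid for on inputs where the cheap one reports low confidence.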
FrugalML: How to use it? (continued)
[Figure: FrugalML training, FrugalML deploying, and Google API deploying]
FrugalML: How to train it?
Goal: Pick the optimal base/add-on services, thresholds, etc.
Combinatorial optimization problem: provably efficient solver?
Statistically: How many samples are needed?
Computationally: How long does it take for training?
FrugalML: A provably efficient solver
✔ Key lemma: base/add-on services from <3 services (sparsity)
✔ An approx. solver: O(1/N) accuracy loss guarantee
✔ Sample complexity: N samples annotated by APIs
✔ Computational cost: O(N)
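To make the training problem concrete, the sketch below picks a base service, an add-on, and a threshold by plain enumeration over samples that have already been annotated by every API. It is deliberately naive and is not the provably efficient solver described on this slide: it uses a single global threshold on a fixed grid, and all names (`evaluate`, `fit_strategy`), the toy samples, and the costs are hypothetical.

```python
from itertools import product


def evaluate(samples, costs, base, addon, threshold):
    """Average accuracy and average cost of: call base; if score < threshold, call addon."""
    correct = spent = 0.0
    for truth, preds in samples:
        label, score = preds[base]
        spent += costs[base]
        if addon is not None and score < threshold:
            label, _ = preds[addon]
            spent += costs[addon]
        correct += (label == truth)
    n = len(samples)
    return correct / n, spent / n


def fit_strategy(samples, costs, budget, grid=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Brute-force search; returns (accuracy, cost, base, addon, threshold) or None."""
    best = None
    names = list(costs)
    for base, addon, t in product(names, names + [None], grid):
        if addon == base:
            continue
        acc, cost = evaluate(samples, costs, base, addon, t)
        if cost <= budget and (best is None or acc > best[0]):
            best = (acc, cost, base, addon, t)
    return best


# Toy data: (true label, {api name: (predicted label, quality score)}).
samples = [
    ("happy", {"cheap": ("happy", 0.9), "strong": ("happy", 0.9)}),
    ("sad",   {"cheap": ("happy", 0.3), "strong": ("sad", 0.8)}),
    ("sad",   {"cheap": ("sad", 0.9),   "strong": ("sad", 0.9)}),
    ("happy", {"cheap": ("sad", 0.2),   "strong": ("happy", 0.7)}),
]
costs = {"cheap": 1.0, "strong": 10.0}   # made-up cost units per call

if __name__ == "__main__":
    # With an average budget of 6 units per sample, "strong" alone (10 units) is
    # unaffordable, but routing only low-confidence inputs to it fits the budget.
    print(fit_strategy(samples, costs, budget=6.0))
```

On this toy data the search selects the cheap service as base and the strong one as add-on, which is the kind of strategy the slides describe; the enumeration itself scales poorly, which is why an efficient solver matters.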
Learned FrugalML Strategy
Case Study on a facial emotion dataset, FER+
Budget: $5 (=cheapest commercial API)
FrugalML works well in practice
[Figure: learned strategy; visible cost labels: $15, $10, $0.01]
Accuracy and Cost Comparison
Case study on a facial emotion dataset, FER+
[Figure: cost (dollars) and accuracy (%) comparison]
Accuracy Budget Trade-offs
Case study on a facial emotion dataset, FER+
[Figure: accuracy (%) versus budget; legend includes Microsoft API, GitHub API, Face++ API, and Google API]
FrugalML’s cost savings (%) while matching the best commercial API’s accuracy
[Figure: panels for Vision, NLP, and Speech tasks]
FrugalML’s accuracy improvement (%) while matching the best commercial API’s cost
[Figure: panels for Vision, NLP, and Speech tasks]
Conclusions and Open Problems
Question: how best to use the ML APIs on the market within a budget?
Our solution: FrugalML
- Provable performance and efficiency guarantees
- Up to 90% cost savings or 5% better accuracy at the same cost
- Dataset with 612,139 samples annotated by APIs, and code, released
Open problems: many exist in this under-explored area
- More complicated tasks?
- API performance shift?
- Other requirements (fairness, latency, …)?
Code and Data: github.com/lchen001/FrugalML
More on theoretical analysis, empirical results:
Please visit our project website and/or full paper!