SlideShare ist ein Scribd-Unternehmen logo
1 von 52
Downloaden Sie, um offline zu lesen
Automated
Hyperparameter Tuning
June 20th, 2019
Logistics
• We can’t hear you…
• Recording will be available…
• Slides will be available…
• Code samples and notebooks will be available…
• Queue up Questions…
• Bookmark databricks.com/blog
About our speakers
Yifan Cao, Sr. Product Manager, Machine Learning at Databricks
• Product Area: ML/DL algorithms and Databricks Runtime for Machine
Learning
• Built and grew two ML products to multi-million dollars in annual
revenue
• B.S. Engineering from UC Berkeley; MBA from MIT
Joseph Bradley, Software Engineer, Machine Learning at Databricks
• Apache Spark PMC member
• Postdoc at UC Berkeley
• Ph.D. in Machine Learning from Carnegie Mellon
Accelerate innovation by unifying data science,
engineering and business
• Original creators of
• 2000+ global companies use our platform across big
data & machine learning lifecycle
VISION
WHO WE
ARE
Unified Analytics PlatformSOLUTION
DATA
ENGINEERS
x
Data & ML Tech and People are in Silos
DATA
SCIENTISTS
Hiring Data Scientists is a Key Blocker
“My team needs to build 100+
models this year, but it has
only got to 20%.”
What is Automated ML (AutoML)?
● Excel-like tool that enables anyone
to do machine learning
● Productivity tools for
data scientists
Raw Data
Model
Exploration
Feature
Engineering
ETL
Model
Scoring
Hyperparam
eter Tuning
Alerting &
Monitoring
Cross
Validation
Where does AutoML fit on Databricks?
DATA
ENGINEERS
DATA
SCIENTISTS
AutoML
Great Training
AutoML on Databricks (1/3)
AutoML librariesUSER CONTROL
Watch it now >
https://dbricks.co/zynga
Custom Solution: Zynga
Automating Predictive Modeling at Zynga with Pandas UDFs
Great Training
AutoML on Databricks (2/3)
AutoML libraries
PartnershipsAUTOMATION
USER CONTROL
Databricks
ETL & ML
Databricks
ML Test & Model
Enable data scientists and citizen data scientists to accelerate and scale
the development and delivery of predictive models.
Run and deploy ML
models at Scale
14
Databricks and DataRobot Integration
Watch it now >
https://dbricks.co/datarobot
Great Training
AutoML on Databricks (3/3)
AutoML libraries
Partnerships
Hyperopt
AUTOMATION
USER CONTROL
AUTOMATION +
CONTROL
Integrations MLlib
Today's Content
Great Training
A simple analogy
Manual Transmission
Semi AutonomousAUTOMATION
USER CONTROL
AUTOMATION +
CONTROL
Automatic Transmission
Today's Content
Use Case #1: Hyperparameter Tuning
Model
Exploration
Feature
Engineering
Model
Scoring
Hyperparam
eter Tuning
Alerting &
Monitoring
Cross
Validation
Scenarios:
● Automated hyperparameter search to select models after cross validation
● Automated hyperparameter search to optimize models in production
Our Offerings:
● Distributed Hyperopt + Automated MLflow Tracking
Raw Data ETL
Use Case #2: Model Search
Model
Exploration
Feature
Engineering
Model
Scoring
Hyperparam
eter Tuning
Alerting &
Monitoring
Cross
Validation
Scenarios:
● Automated model search by exploring different combinations of featuresets, algos,
hyperparameters
● Automated model search by extending a baseline model to 1000+ custom models
Our Offerings:
● MLlib + Automated MLflow Tracking
● Distributed Hyperopt + Automated MLflow Tracking, with conditional hyperparameter tuning
Raw Data ETL
Scenarios:
● Automated end-to-end Machine Learning model generation pipelines incorporating
customer-specified logics
Our Offerings:
● Leverage existing Databricks internal tools & frameworks on top of Databricks Runtime
ML
Use Case #3: End-to-end ML Pipeline
Model
Exploration
Feature
Engineering
Model
Scoring
Hyperparam
eter Tuning
Alerting &
Monitoring
Cross
Validation
Raw Data ETL
Hyperparameters
Hyperparameters
Express high-level concepts, such as statistical assumptions
E.g.: regularization
Are fixed before training or are hard to learn from data
E.g.: neural net architecture
Affect objective, test time performance, computational cost
E.g.: # iterations or epochs
Tuning hyperparameters
E.g.: Fitting a
polynomial
Common goals:
• More flexible modeling process
• Reduced generalization error
• Faster training
• Plug & play ML
Challenges in tuning
Curse of dimensionality
Non-convex optimization
Computational cost
Unintuitive hyperparameters
Data prep: train-validation-test splits
Data
Data prep: train-validation-test splits
Training Data Test Data
ML Model
Data prep: train-validation-test splits
Training
Data
Validation
Data
Test Data
Final
ML Model
ML Model 1
ML Model 2
ML Model 3
A practical definition of tuning
ML Model
Featurization
Model family
selection
Hyperparameter
tuning
Parameters: configs which your ML library learns from data
Hyperparameters: configs which your ML library does not learn from data
Tuning Methods
Overview of tuning methods
•Manual search
•Grid search
•Random search
•Population-based algorithms
•Bayesian algorithms
Manual search
Select hyperparameter settings to try based on human intuition.
2 hyperparameters:
•[0, ..., 5]
•{A, B, ..., F}
A B C D E F
0
1
2
3
4
5
Expert knowledge tells us to try:
(2,C), (2,D), (2,E), (3,C), (3,D), (3,E)
Grid Search
Try points on a grid defined by ranges and step sizes
X-axis: {A,...,F}
Y-axis: 0-5, step = 1
A B C D E F
0
1
2
3
4
5
A B C D E F
0
1
2
3
4
5
Random Search
Sample from distributions over ranges
X-axis: Uniform({A,...,F})
Y-axis: Uniform([0,5])
Start with random search, then iterate:
•Use the previous “generation” to
inform the next generation
•E.g., sample from best performers &
then perturb them
Population Based Algorithms
A B C D E F
0
1
2
3
4
5
Start with random search, then iterate:
•Use the previous “generation” to
inform the next generation
•E.g., sample from best performers &
then perturb them
Population Based Algorithms
A B C D E F
0
1
2
3
4
5
Start with random search, then iterate:
•Use the previous “generation” to
inform the next generation
•E.g., sample from best performers &
then perturb them
Population Based Algorithms
A B C D E F
0
1
2
3
4
5
Model the loss function:
Hyperparameters ⇒ loss
Iteratively search space, trading off
between exploration and exploitation
A B C D E F
0
1
2
3
4
5
Bayesian Optimization
Get samples: Test new points in
hyperparameter space
Bayesian Optimization
A B C D E F
0
1
2
3
4
5
A B C D E F
0
1
2
3
4
5
Get samples: Test new points in
hyperparameter space
Update model of space:
Hyperparameters ⇒ loss
Bayesian Optimization
Comparing tuning methods
Iterative /
adaptive?
# evaluations
for P params
Model of
param space
Grid search No O(c^P) none
Random search No O(k) none
Population-based Yes O(k) implicit
Bayesian Yes O(k) explicit
Open-source tools for tuning
Grid
search
Random
search
Population
-based
Bayesian PyPi
downloads
last month
Github
stars
License
scikit-learn Yes Yes --- --- BSD
MLlib Yes --- --- Apache 2.0
scikit-opti
mize
Yes 49,189 1,278 BSD
Hyperopt Yes Yes 98,282 3,286 BSD
DEAP Yes 26,700 2,789 LGPL v3
TPOT Yes 9,057 5,609 LGPL v3
GPyOpt Yes 4,959 451 BSD
As of mid-April 2019
Tracking Tuning Workflows
MLflow Overview
42
Tracking
Record and query
experiments: code,
data, config, results
Projects
Packaging format
for reproducible runs
on any platform
Models
General model format
that supports diverse
deployment tools
mlflow.org github.com/mlflow twitter.com/MLflowdatabricks.com/mlflow
Organizing with
Training Data Validation Data Test Data
Final ML ModelML Model 1
ML Model 2
ML Model 3
Experiment
Main run
Child runs
Tip: Tune full pipeline, not 1 model.
Instrumenting tuning with
MLflow concepts for tracking runs
Params: hyperparameters
Metrics: training & validation, loss & objective, multiple objectives
Tags: provenance, simple metadata
Artifacts: serialized model, large metadata
Analyzing how tuning performs
Questions to answer
• Am I tuning the right hyperparameters?
• Am I exploring the right parts of the search space?
• Do I need to do another round of tuning?
Examining results
• Simple case: visualize param vs metric
• Challenges: multiple params and metrics, iterative experimentation
Auto-tracking MLlib with
Training Data Validation Data Test Data
Final ML ModelML Model 1
ML Model 2
ML Model 3
Experiment
Main run
Child runs
In Databricks
• CrossValidator &
TrainValidationSplit
• 1 run per setting of
hyperparameters
• Avg metrics for CV folds(demo)
Scaling Tuning Workflows
Hyperopt
Hyperparameter tuning in Python ML workflows
● Usable with any Python ML library
● Tuning algorithms:
○ Random search
○ Bayesian (Tree of Parzen Estimators)
● Open source (3-clause BSD license)
https://github.com/hyperopt/hyperopt
Distribute tuning across Spark clusters
● Each Spark task trains & evaluates 1 model (hyperparameter setting)
○ Applicable to single-machine ML workloads
● Via new SparkTrials plugin
● Contributing to open source Hyperopt:
github.com/hyperopt/hyperopt/pull/509
With automated MLflow tracking in Databricks
Available now in Databricks Runtime 5.4 ML
Hyperopt on Apache Spark
(demo)
Related Content
Blog:
• Hyperparameter Tuning with MLflow,
Apache Spark MLlib and Hyperopt
Webinar:
• How to Automate Machine Learning and
Scale Delivery
Tutorials
● Hyperparameter Tuning Documentation
● MLflow integrations with H20.ai GPyOpt,
HyperOpt
Notebooks
● MLlib + Automated MLflow Tracking
● Distributed Hyperopt + Automated MLflow
Tracking
● Basic Introduction to DataRobot via API
Videos
● Automating Predictive Modeling at Zynga
with PySpark and Pandas UDFs
● Best Practices for Hyperparameter Tuning
with MLflow
● Advanced Hyperparameter Optimization
for Deep Learning with MLflow
Getting started
MLflow
Managed MLflow
Generally Available in
Databricks
MLlib + automated
MLflow tracking
Public preview in
Databricks Runtime 5.4
& 5.4ML
Distributed Hyperopt
+ automated MLflow
tracking
Public preview in
Databricks Runtime 5.4ML
https://docs.databricks.com/spark/latest/mllib/index.html#hyperparameter-tuning
https://docs.azuredatabricks.net/spark/latest/mllib/index.html#hyperparameter-tuning
https://mlflow.org/
Thank you
Q&A
52

Weitere ähnliche Inhalte

Was ist angesagt?

Machine Learning Pipelines
Machine Learning PipelinesMachine Learning Pipelines
Machine Learning Pipelinesjeykottalam
 
Best Practices for Hyperparameter Tuning with MLflow
Best Practices for Hyperparameter Tuning with MLflowBest Practices for Hyperparameter Tuning with MLflow
Best Practices for Hyperparameter Tuning with MLflowDatabricks
 
Machine Learning Deep Learning AI and Data Science
Machine Learning Deep Learning AI and Data Science Machine Learning Deep Learning AI and Data Science
Machine Learning Deep Learning AI and Data Science Venkata Reddy Konasani
 
Random forest and decision tree
Random forest and decision treeRandom forest and decision tree
Random forest and decision treeAAKANKSHA JAIN
 
Machine Learning in 10 Minutes | What is Machine Learning? | Edureka
Machine Learning in 10 Minutes | What is Machine Learning? | EdurekaMachine Learning in 10 Minutes | What is Machine Learning? | Edureka
Machine Learning in 10 Minutes | What is Machine Learning? | EdurekaEdureka!
 
Machine-Learning-A-Z-Course-Downloadable-Slides-V1.5.pdf
Machine-Learning-A-Z-Course-Downloadable-Slides-V1.5.pdfMachine-Learning-A-Z-Course-Downloadable-Slides-V1.5.pdf
Machine-Learning-A-Z-Course-Downloadable-Slides-V1.5.pdfMaris R
 
Real World End to End machine Learning Pipeline
Real World End to End machine Learning PipelineReal World End to End machine Learning Pipeline
Real World End to End machine Learning PipelineSrivatsan Srinivasan
 
Linear models for classification
Linear models for classificationLinear models for classification
Linear models for classificationSung Yub Kim
 
Boosting Approach to Solving Machine Learning Problems
Boosting Approach to Solving Machine Learning ProblemsBoosting Approach to Solving Machine Learning Problems
Boosting Approach to Solving Machine Learning ProblemsDr Sulaimon Afolabi
 
Deep Dive into Hyperparameter Tuning
Deep Dive into Hyperparameter TuningDeep Dive into Hyperparameter Tuning
Deep Dive into Hyperparameter TuningShubhmay Potdar
 
Machine learning with scikitlearn
Machine learning with scikitlearnMachine learning with scikitlearn
Machine learning with scikitlearnPratap Dangeti
 
Python Pandas
Python PandasPython Pandas
Python PandasSunil OS
 
Presentation on unsupervised learning
Presentation on unsupervised learning Presentation on unsupervised learning
Presentation on unsupervised learning ANKUSH PAL
 
L2. Evaluating Machine Learning Algorithms I
L2. Evaluating Machine Learning Algorithms IL2. Evaluating Machine Learning Algorithms I
L2. Evaluating Machine Learning Algorithms IMachine Learning Valencia
 
[Paper Reading] Attention is All You Need
[Paper Reading] Attention is All You Need[Paper Reading] Attention is All You Need
[Paper Reading] Attention is All You NeedDaiki Tanaka
 
Python Anaconda Tutorial | Edureka
Python Anaconda Tutorial | EdurekaPython Anaconda Tutorial | Edureka
Python Anaconda Tutorial | EdurekaEdureka!
 
Learn Python Programming | Python Programming - Step by Step | Python for Beg...
Learn Python Programming | Python Programming - Step by Step | Python for Beg...Learn Python Programming | Python Programming - Step by Step | Python for Beg...
Learn Python Programming | Python Programming - Step by Step | Python for Beg...Edureka!
 

Was ist angesagt? (20)

Machine Learning Pipelines
Machine Learning PipelinesMachine Learning Pipelines
Machine Learning Pipelines
 
Best Practices for Hyperparameter Tuning with MLflow
Best Practices for Hyperparameter Tuning with MLflowBest Practices for Hyperparameter Tuning with MLflow
Best Practices for Hyperparameter Tuning with MLflow
 
Machine Learning Deep Learning AI and Data Science
Machine Learning Deep Learning AI and Data Science Machine Learning Deep Learning AI and Data Science
Machine Learning Deep Learning AI and Data Science
 
Random forest and decision tree
Random forest and decision treeRandom forest and decision tree
Random forest and decision tree
 
Machine Learning in 10 Minutes | What is Machine Learning? | Edureka
Machine Learning in 10 Minutes | What is Machine Learning? | EdurekaMachine Learning in 10 Minutes | What is Machine Learning? | Edureka
Machine Learning in 10 Minutes | What is Machine Learning? | Edureka
 
Machine-Learning-A-Z-Course-Downloadable-Slides-V1.5.pdf
Machine-Learning-A-Z-Course-Downloadable-Slides-V1.5.pdfMachine-Learning-A-Z-Course-Downloadable-Slides-V1.5.pdf
Machine-Learning-A-Z-Course-Downloadable-Slides-V1.5.pdf
 
Deep learning presentation
Deep learning presentationDeep learning presentation
Deep learning presentation
 
Real World End to End machine Learning Pipeline
Real World End to End machine Learning PipelineReal World End to End machine Learning Pipeline
Real World End to End machine Learning Pipeline
 
Support Vector Machines ( SVM )
Support Vector Machines ( SVM ) Support Vector Machines ( SVM )
Support Vector Machines ( SVM )
 
Linear models for classification
Linear models for classificationLinear models for classification
Linear models for classification
 
Boosting Approach to Solving Machine Learning Problems
Boosting Approach to Solving Machine Learning ProblemsBoosting Approach to Solving Machine Learning Problems
Boosting Approach to Solving Machine Learning Problems
 
Deep learning ppt
Deep learning pptDeep learning ppt
Deep learning ppt
 
Deep Dive into Hyperparameter Tuning
Deep Dive into Hyperparameter TuningDeep Dive into Hyperparameter Tuning
Deep Dive into Hyperparameter Tuning
 
Machine learning with scikitlearn
Machine learning with scikitlearnMachine learning with scikitlearn
Machine learning with scikitlearn
 
Python Pandas
Python PandasPython Pandas
Python Pandas
 
Presentation on unsupervised learning
Presentation on unsupervised learning Presentation on unsupervised learning
Presentation on unsupervised learning
 
L2. Evaluating Machine Learning Algorithms I
L2. Evaluating Machine Learning Algorithms IL2. Evaluating Machine Learning Algorithms I
L2. Evaluating Machine Learning Algorithms I
 
[Paper Reading] Attention is All You Need
[Paper Reading] Attention is All You Need[Paper Reading] Attention is All You Need
[Paper Reading] Attention is All You Need
 
Python Anaconda Tutorial | Edureka
Python Anaconda Tutorial | EdurekaPython Anaconda Tutorial | Edureka
Python Anaconda Tutorial | Edureka
 
Learn Python Programming | Python Programming - Step by Step | Python for Beg...
Learn Python Programming | Python Programming - Step by Step | Python for Beg...Learn Python Programming | Python Programming - Step by Step | Python for Beg...
Learn Python Programming | Python Programming - Step by Step | Python for Beg...
 

Ähnlich wie Automated Hyperparameter Tuning, Scaling and Tracking

Building High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning ApplicationsBuilding High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning ApplicationsYalçın Yenigün
 
The Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it WorkThe Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it WorkIvo Andreev
 
Augmenting Machine Learning with Databricks Labs AutoML Toolkit
Augmenting Machine Learning with Databricks Labs AutoML ToolkitAugmenting Machine Learning with Databricks Labs AutoML Toolkit
Augmenting Machine Learning with Databricks Labs AutoML ToolkitDatabricks
 
Big Data Spain 2018: How to build Weighted XGBoost ML model for Imbalance dat...
Big Data Spain 2018: How to build Weighted XGBoost ML model for Imbalance dat...Big Data Spain 2018: How to build Weighted XGBoost ML model for Imbalance dat...
Big Data Spain 2018: How to build Weighted XGBoost ML model for Imbalance dat...Alok Singh
 
Reproducible AI Using PyTorch and MLflow
Reproducible AI Using PyTorch and MLflowReproducible AI Using PyTorch and MLflow
Reproducible AI Using PyTorch and MLflowDatabricks
 
Tuning ML Models: Scaling, Workflows, and Architecture
Tuning ML Models: Scaling, Workflows, and ArchitectureTuning ML Models: Scaling, Workflows, and Architecture
Tuning ML Models: Scaling, Workflows, and ArchitectureDatabricks
 
Spark Summit EU 2015: Combining the Strengths of MLlib, scikit-learn, and R
Spark Summit EU 2015: Combining the Strengths of MLlib, scikit-learn, and RSpark Summit EU 2015: Combining the Strengths of MLlib, scikit-learn, and R
Spark Summit EU 2015: Combining the Strengths of MLlib, scikit-learn, and RDatabricks
 
When We Spark and When We Don’t: Developing Data and ML Pipelines
When We Spark and When We Don’t: Developing Data and ML PipelinesWhen We Spark and When We Don’t: Developing Data and ML Pipelines
When We Spark and When We Don’t: Developing Data and ML PipelinesStitch Fix Algorithms
 
Apache Spark Model Deployment
Apache Spark Model Deployment Apache Spark Model Deployment
Apache Spark Model Deployment Databricks
 
Guiding through a typical Machine Learning Pipeline
Guiding through a typical Machine Learning PipelineGuiding through a typical Machine Learning Pipeline
Guiding through a typical Machine Learning PipelineMichael Gerke
 
Open and Automated Machine Learning
Open and Automated Machine LearningOpen and Automated Machine Learning
Open and Automated Machine LearningJoaquin Vanschoren
 
Drifting Away: Testing ML Models in Production
Drifting Away: Testing ML Models in ProductionDrifting Away: Testing ML Models in Production
Drifting Away: Testing ML Models in ProductionDatabricks
 
Augmenting Machine Learning with Databricks Labs AutoML Toolkit
Augmenting Machine Learning with Databricks Labs AutoML ToolkitAugmenting Machine Learning with Databricks Labs AutoML Toolkit
Augmenting Machine Learning with Databricks Labs AutoML ToolkitDatabricks
 
How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....
How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....
How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....Databricks
 
Big Data Meets Learning Science: Keynote by Al Essa
Big Data Meets Learning Science: Keynote by Al EssaBig Data Meets Learning Science: Keynote by Al Essa
Big Data Meets Learning Science: Keynote by Al EssaSpark Summit
 
Spark MLlib - Training Material
Spark MLlib - Training Material Spark MLlib - Training Material
Spark MLlib - Training Material Bryan Yang
 
XGBoost @ Fyber
XGBoost @ FyberXGBoost @ Fyber
XGBoost @ FyberDaniel Hen
 
Reproducible AI using MLflow and PyTorch
Reproducible AI using MLflow and PyTorchReproducible AI using MLflow and PyTorch
Reproducible AI using MLflow and PyTorchDatabricks
 

Ähnlich wie Automated Hyperparameter Tuning, Scaling and Tracking (20)

Building High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning ApplicationsBuilding High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning Applications
 
The Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it WorkThe Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it Work
 
Augmenting Machine Learning with Databricks Labs AutoML Toolkit
Augmenting Machine Learning with Databricks Labs AutoML ToolkitAugmenting Machine Learning with Databricks Labs AutoML Toolkit
Augmenting Machine Learning with Databricks Labs AutoML Toolkit
 
Big Data Spain 2018: How to build Weighted XGBoost ML model for Imbalance dat...
Big Data Spain 2018: How to build Weighted XGBoost ML model for Imbalance dat...Big Data Spain 2018: How to build Weighted XGBoost ML model for Imbalance dat...
Big Data Spain 2018: How to build Weighted XGBoost ML model for Imbalance dat...
 
Reproducible AI Using PyTorch and MLflow
Reproducible AI Using PyTorch and MLflowReproducible AI Using PyTorch and MLflow
Reproducible AI Using PyTorch and MLflow
 
Tuning ML Models: Scaling, Workflows, and Architecture
Tuning ML Models: Scaling, Workflows, and ArchitectureTuning ML Models: Scaling, Workflows, and Architecture
Tuning ML Models: Scaling, Workflows, and Architecture
 
Spark Summit EU 2015: Combining the Strengths of MLlib, scikit-learn, and R
Spark Summit EU 2015: Combining the Strengths of MLlib, scikit-learn, and RSpark Summit EU 2015: Combining the Strengths of MLlib, scikit-learn, and R
Spark Summit EU 2015: Combining the Strengths of MLlib, scikit-learn, and R
 
When We Spark and When We Don’t: Developing Data and ML Pipelines
When We Spark and When We Don’t: Developing Data and ML PipelinesWhen We Spark and When We Don’t: Developing Data and ML Pipelines
When We Spark and When We Don’t: Developing Data and ML Pipelines
 
Apache Spark Model Deployment
Apache Spark Model Deployment Apache Spark Model Deployment
Apache Spark Model Deployment
 
Guiding through a typical Machine Learning Pipeline
Guiding through a typical Machine Learning PipelineGuiding through a typical Machine Learning Pipeline
Guiding through a typical Machine Learning Pipeline
 
Open and Automated Machine Learning
Open and Automated Machine LearningOpen and Automated Machine Learning
Open and Automated Machine Learning
 
Drifting Away: Testing ML Models in Production
Drifting Away: Testing ML Models in ProductionDrifting Away: Testing ML Models in Production
Drifting Away: Testing ML Models in Production
 
Machine learning
Machine learningMachine learning
Machine learning
 
Augmenting Machine Learning with Databricks Labs AutoML Toolkit
Augmenting Machine Learning with Databricks Labs AutoML ToolkitAugmenting Machine Learning with Databricks Labs AutoML Toolkit
Augmenting Machine Learning with Databricks Labs AutoML Toolkit
 
Ai in finance
Ai in financeAi in finance
Ai in finance
 
How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....
How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....
How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....
 
Big Data Meets Learning Science: Keynote by Al Essa
Big Data Meets Learning Science: Keynote by Al EssaBig Data Meets Learning Science: Keynote by Al Essa
Big Data Meets Learning Science: Keynote by Al Essa
 
Spark MLlib - Training Material
Spark MLlib - Training Material Spark MLlib - Training Material
Spark MLlib - Training Material
 
XGBoost @ Fyber
XGBoost @ FyberXGBoost @ Fyber
XGBoost @ Fyber
 
Reproducible AI using MLflow and PyTorch
Reproducible AI using MLflow and PyTorchReproducible AI using MLflow and PyTorch
Reproducible AI using MLflow and PyTorch
 

Mehr von Databricks

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDatabricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of HadoopDatabricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDatabricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceDatabricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringDatabricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixDatabricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationDatabricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchDatabricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesDatabricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesDatabricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsDatabricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkDatabricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkDatabricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesDatabricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkDatabricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeDatabricks
 

Mehr von Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 

Kürzlich hochgeladen

Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...shivangimorya083
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Delhi Call girls
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...shambhavirathore45
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...amitlee9823
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...Pooja Nehwal
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxMohammedJunaid861692
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxfirstjob4
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxolyaivanovalion
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxolyaivanovalion
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxolyaivanovalion
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
 

Kürzlich hochgeladen (20)

Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptx
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 

Automated Hyperparameter Tuning, Scaling and Tracking

  • 2. Logistics • We can’t hear you… • Recording will be available… • Slides will be available… • Code samples and notebooks will be available… • Queue up Questions… • Bookmark databricks.com/blog
  • 3. About our speakers Yifan Cao, Sr. Product Manager, Machine Learning at Databricks • Product Area: ML/DL algorithms and Databricks Runtime for Machine Learning • Built and grew two ML products to multi-million dollars in annual revenue • B.S. Engineering from UC Berkeley; MBA from MIT Joseph Bradley, Software Engineer, Machine Learning at Databricks • Apache Spark PMC member • Postdoc at UC Berkeley • Ph.D. in Machine Learning from Carnegie Mellon
  • 4. Accelerate innovation by unifying data science, engineering and business • Original creators of • 2000+ global companies use our platform across big data & machine learning lifecycle VISION WHO WE ARE Unified Analytics PlatformSOLUTION
  • 5. DATA ENGINEERS x Data & ML Tech and People are in Silos DATA SCIENTISTS
  • 6.
  • 7. Hiring Data Scientists is a Key Blocker
  • 8. “My team needs to build 100+ models this year, but it has only got to 20%.”
  • 9. What is Automated ML (AutoML)? ● Excel-like tool that enables anyone to do machine learning ● Productivity tools for data scientists
  • 10. Raw Data Model Exploration Feature Engineering ETL Model Scoring Hyperparam eter Tuning Alerting & Monitoring Cross Validation Where does AutoML fit on Databricks? DATA ENGINEERS DATA SCIENTISTS AutoML
  • 11. Great Training AutoML on Databricks (1/3) AutoML librariesUSER CONTROL
  • 12. Watch it now > https://dbricks.co/zynga Custom Solution: Zynga Automating Predictive Modeling at Zynga with Pandas UDFs
  • 13. Great Training AutoML on Databricks (2/3) AutoML libraries PartnershipsAUTOMATION USER CONTROL
  • 14. Databricks ETL & ML Databricks ML Test & Model Enable data scientists and citizen data scientists to accelerate and scale the development and delivery of predictive models. Run and deploy ML models at Scale 14 Databricks and DataRobot Integration Watch it now > https://dbricks.co/datarobot
  • 15. Great Training AutoML on Databricks (3/3) AutoML libraries Partnerships Hyperopt AUTOMATION USER CONTROL AUTOMATION + CONTROL Integrations MLlib Today's Content
  • 16. Great Training A simple analogy Manual Transmission Semi AutonomousAUTOMATION USER CONTROL AUTOMATION + CONTROL Automatic Transmission Today's Content
  • 17. Use Case #1: Hyperparameter Tuning Model Exploration Feature Engineering Model Scoring Hyperparam eter Tuning Alerting & Monitoring Cross Validation Scenarios: ● Automated hyperparameter search to select models after cross validation ● Automated hyperparameter search to optimize models in production Our Offerings: ● Distributed Hyperopt + Automated MLflow Tracking Raw Data ETL
  • 18. Use Case #2: Model Search Model Exploration Feature Engineering Model Scoring Hyperparam eter Tuning Alerting & Monitoring Cross Validation Scenarios: ● Automated model search by exploring different combinations of featuresets, algos, hyperparameters ● Automated model search by extending a baseline model to 1000+ custom models Our Offerings: ● MLlib + Automated MLflow Tracking ● Distributed Hyperopt + Automated MLflow Tracking, with conditional hyperparameter tuning Raw Data ETL
  • 19. Scenarios: ● Automated end-to-end Machine Learning model generation pipelines incorporating customer-specified logics Our Offerings: ● Leverage existing Databricks internal tools & frameworks on top of Databricks Runtime ML Use Case #3: End-to-end ML Pipeline Model Exploration Feature Engineering Model Scoring Hyperparam eter Tuning Alerting & Monitoring Cross Validation Raw Data ETL
  • 21. Hyperparameters Express high-level concepts, such as statistical assumptions E.g.: regularization Are fixed before training or are hard to learn from data E.g.: neural net architecture Affect objective, test time performance, computational cost E.g.: # iterations or epochs
  • 22. Tuning hyperparameters E.g.: Fitting a polynomial Common goals: • More flexible modeling process • Reduced generalization error • Faster training • Plug & play ML
  • 23. Challenges in tuning Curse of dimensionality Non-convex optimization Computational cost Unintuitive hyperparameters
  • 25. Data prep: train-validation-test splits Training Data Test Data ML Model
  • 26. Data prep: train-validation-test splits Training Data Validation Data Test Data Final ML Model ML Model 1 ML Model 2 ML Model 3
  • 27. A practical definition of tuning ML Model Featurization Model family selection Hyperparameter tuning Parameters: configs which your ML library learns from data Hyperparameters: configs which your ML library does not learn from data
  • 29. Overview of tuning methods •Manual search •Grid search •Random search •Population-based algorithms •Bayesian algorithms
  • 30. Manual search Select hyperparameter settings to try based on human intuition. 2 hyperparameters: •[0, ..., 5] •{A, B, ..., F} A B C D E F 0 1 2 3 4 5 Expert knowledge tells us to try: (2,C), (2,D), (2,E), (3,C), (3,D), (3,E)
  • 31. Grid Search Try points on a grid defined by ranges and step sizes X-axis: {A,...,F} Y-axis: 0-5, step = 1 A B C D E F 0 1 2 3 4 5
  • 32. A B C D E F 0 1 2 3 4 5 Random Search Sample from distributions over ranges X-axis: Uniform({A,...,F}) Y-axis: Uniform([0,5])
  • 33. Start with random search, then iterate: •Use the previous “generation” to inform the next generation •E.g., sample from best performers & then perturb them Population Based Algorithms A B C D E F 0 1 2 3 4 5
  • 34. Start with random search, then iterate: •Use the previous “generation” to inform the next generation •E.g., sample from best performers & then perturb them Population Based Algorithms A B C D E F 0 1 2 3 4 5
  • 35. Start with random search, then iterate: •Use the previous “generation” to inform the next generation •E.g., sample from best performers & then perturb them Population Based Algorithms A B C D E F 0 1 2 3 4 5
  • 36. Model the loss function: Hyperparameters ⇒ loss Iteratively search space, trading off between exploration and exploitation A B C D E F 0 1 2 3 4 5 Bayesian Optimization
  • 37. Get samples: Test new points in hyperparameter space Bayesian Optimization A B C D E F 0 1 2 3 4 5
  • 38. A B C D E F 0 1 2 3 4 5 Get samples: Test new points in hyperparameter space Update model of space: Hyperparameters ⇒ loss Bayesian Optimization
  • 39. Comparing tuning methods Iterative / adaptive? # evaluations for P params Model of param space Grid search No O(c^P) none Random search No O(k) none Population-based Yes O(k) implicit Bayesian Yes O(k) explicit
  • 40. Open-source tools for tuning Grid search Random search Population -based Bayesian PyPi downloads last month Github stars License scikit-learn Yes Yes --- --- BSD MLlib Yes --- --- Apache 2.0 scikit-opti mize Yes 49,189 1,278 BSD Hyperopt Yes Yes 98,282 3,286 BSD DEAP Yes 26,700 2,789 LGPL v3 TPOT Yes 9,057 5,609 LGPL v3 GPyOpt Yes 4,959 451 BSD As of mid-April 2019
  • 42. MLflow Overview 42 Tracking Record and query experiments: code, data, config, results Projects Packaging format for reproducible runs on any platform Models General model format that supports diverse deployment tools mlflow.org github.com/mlflow twitter.com/MLflowdatabricks.com/mlflow
  • 43. Organizing with Training Data Validation Data Test Data Final ML ModelML Model 1 ML Model 2 ML Model 3 Experiment Main run Child runs Tip: Tune full pipeline, not 1 model.
  • 44. Instrumenting tuning with MLflow concepts for tracking runs Params: hyperparameters Metrics: training & validation, loss & objective, multiple objectives Tags: provenance, simple metadata Artifacts: serialized model, large metadata
  • 45. Analyzing how tuning performs Questions to answer • Am I tuning the right hyperparameters? • Am I exploring the right parts of the search space? • Do I need to do another round of tuning? Examining results • Simple case: visualize param vs metric • Challenges: multiple params and metrics, iterative experimentation
  • 46. Auto-tracking MLlib with Training Data Validation Data Test Data Final ML ModelML Model 1 ML Model 2 ML Model 3 Experiment Main run Child runs In Databricks • CrossValidator & TrainValidationSplit • 1 run per setting of hyperparameters • Avg metrics for CV folds(demo)
  • 48. Hyperopt Hyperparameter tuning in Python ML workflows ● Usable with any Python ML library ● Tuning algorithms: ○ Random search ○ Bayesian (Tree of Parzen Estimators) ● Open source (3-clause BSD license) https://github.com/hyperopt/hyperopt
  • 49. Distribute tuning across Spark clusters ● Each Spark task trains & evaluates 1 model (hyperparameter setting) ○ Applicable to single-machine ML workloads ● Via new SparkTrials plugin ● Contributing to open source Hyperopt: github.com/hyperopt/hyperopt/pull/509 With automated MLflow tracking in Databricks Available now in Databricks Runtime 5.4 ML Hyperopt on Apache Spark (demo)
  • 50. Related Content Blog: • Hyperparameter Tuning with MLflow, Apache Spark MLlib and Hyperopt Webinar: • How to Automate Machine Learning and Scale Delivery Tutorials ● Hyperparameter Tuning Documentation ● MLflow integrations with H20.ai GPyOpt, HyperOpt Notebooks ● MLlib + Automated MLflow Tracking ● Distributed Hyperopt + Automated MLflow Tracking ● Basic Introduction to DataRobot via API Videos ● Automating Predictive Modeling at Zynga with PySpark and Pandas UDFs ● Best Practices for Hyperparameter Tuning with MLflow ● Advanced Hyperparameter Optimization for Deep Learning with MLflow
  • 51. Getting started MLflow Managed MLflow Generally Available in Databricks MLlib + automated MLflow tracking Public preview in Databricks Runtime 5.4 & 5.4ML Distributed Hyperopt + automated MLflow tracking Public preview in Databricks Runtime 5.4ML https://docs.databricks.com/spark/latest/mllib/index.html#hyperparameter-tuning https://docs.azuredatabricks.net/spark/latest/mllib/index.html#hyperparameter-tuning https://mlflow.org/