Automated Hyperparameter Tuning, Scaling and Tracking

Automated Machine Learning (AutoML) has received significant interest recently. We believe that the right automation would bring significant value and dramatically shorten time-to-value for data science teams. Databricks is automating the Data Science and Machine Learning process through a combination of product offerings, partnerships, and custom solutions. This talk will focus on how Databricks can help automate hyperparameter tuning.

For both traditional Machine Learning and modern Deep Learning, tuning hyperparameters can dramatically increase model performance and improve training times. However, tuning can be a complex and expensive process. In this talk, we'll start with a brief survey of the most popular techniques for hyperparameter tuning (e.g., grid search, random search, and Bayesian optimization). We will then discuss open source tools that implement each of these techniques, helping to automate the search over hyperparameters.

Finally, we will discuss and demo improvements we built for these tools in Databricks, including integration with MLflow:

• Apache PySpark MLlib integration with MLflow for automatically tracking tuning
• Hyperopt integration with Apache Spark to distribute tuning and with MLflow for automatic tracking

Recording and notebooks will be provided after the webinar so that you can practice at your own pace.

Presenters
Joseph Bradley, Software Engineer, Databricks
Joseph Bradley is a Software Engineer and Apache Spark PMC member working on Machine Learning at Databricks. Previously, he was a postdoc at UC Berkeley after receiving his Ph.D. in Machine Learning from Carnegie Mellon in 2013.
Yifan Cao, Senior Product Manager, Databricks
Yifan Cao is a Senior Product Manager at Databricks. His product area spans ML/DL algorithms and Databricks Runtime for Machine Learning. Prior to Databricks, Yifan worked on two Machine Learning products, applying NLP to find metadata and applying machine learning to predict equipment failures. He helped build the products from the ground up to multiple millions of dollars in ARR. Yifan started his career as a researcher in quantum computing. He received his B.S. from UC Berkeley and his master's degree from MIT.

Published in: Data & Analytics

Automated Hyperparameter Tuning, Scaling and Tracking

  1. Automated Hyperparameter Tuning. June 20th, 2019
  2. Logistics • We can’t hear you… • Recording will be available… • Slides will be available… • Code samples and notebooks will be available… • Queue up Questions… • Bookmark databricks.com/blog
  3. About our speakers. Yifan Cao, Sr. Product Manager, Machine Learning at Databricks • Product Area: ML/DL algorithms and Databricks Runtime for Machine Learning • Built and grew two ML products to multi-million dollars in annual revenue • B.S. Engineering from UC Berkeley; MBA from MIT. Joseph Bradley, Software Engineer, Machine Learning at Databricks • Apache Spark PMC member • Postdoc at UC Berkeley • Ph.D. in Machine Learning from Carnegie Mellon
  4. VISION: Accelerate innovation by unifying data science, engineering and business. WHO WE ARE: • Original creators of • 2000+ global companies use our platform across big data & machine learning lifecycle. SOLUTION: Unified Analytics Platform
  5. Data & ML Tech and People are in Silos: DATA ENGINEERS x DATA SCIENTISTS
  6. Hiring Data Scientists is a Key Blocker
  7. “My team needs to build 100+ models this year, but it has only got to 20%.”
  8. What is Automated ML (AutoML)? ● Excel-like tool that enables anyone to do machine learning ● Productivity tools for data scientists
  9. Where does AutoML fit on Databricks? [Pipeline diagram: Raw Data, ETL, Model Exploration, Feature Engineering, Hyperparameter Tuning, Cross Validation, Model Scoring, Alerting & Monitoring; labels: DATA ENGINEERS, DATA SCIENTISTS, AutoML]
  10. AutoML on Databricks (1/3). [Diagram labels: Great Training, AutoML libraries, USER CONTROL]
  11. Custom Solution: Zynga. Automating Predictive Modeling at Zynga with Pandas UDFs. Watch it now > https://dbricks.co/zynga
  12. AutoML on Databricks (2/3). [Diagram labels: Great Training, AutoML libraries, Partnerships, AUTOMATION, USER CONTROL]
  13. Databricks and DataRobot Integration. Enable data scientists and citizen data scientists to accelerate and scale the development and delivery of predictive models. Run and deploy ML models at scale. [Diagram labels: Databricks ETL & ML; Databricks ML Test & Model] Watch it now > https://dbricks.co/datarobot
  14. AutoML on Databricks (3/3). [Diagram labels: Great Training, AutoML libraries, Partnerships, Hyperopt, Integrations, MLlib, AUTOMATION, USER CONTROL, AUTOMATION + CONTROL, Today's Content]
  15. A simple analogy. [Diagram labels: Great Training, Manual Transmission, Semi Autonomous, Automatic Transmission, AUTOMATION, USER CONTROL, AUTOMATION + CONTROL, Today's Content]
  16. Use Case #1: Hyperparameter Tuning. Scenarios: ● Automated hyperparameter search to select models after cross validation ● Automated hyperparameter search to optimize models in production. Our Offerings: ● Distributed Hyperopt + Automated MLflow Tracking. [Pipeline diagram: Raw Data, ETL, Model Exploration, Feature Engineering, Hyperparameter Tuning, Cross Validation, Model Scoring, Alerting & Monitoring]
  17. Use Case #2: Model Search. Scenarios: ● Automated model search by exploring different combinations of feature sets, algorithms, and hyperparameters ● Automated model search by extending a baseline model to 1000+ custom models. Our Offerings: ● MLlib + Automated MLflow Tracking ● Distributed Hyperopt + Automated MLflow Tracking, with conditional hyperparameter tuning. [Pipeline diagram: Raw Data, ETL, Model Exploration, Feature Engineering, Hyperparameter Tuning, Cross Validation, Model Scoring, Alerting & Monitoring]
  18. Use Case #3: End-to-end ML Pipeline. Scenarios: ● Automated end-to-end Machine Learning model generation pipelines incorporating customer-specified logic. Our Offerings: ● Leverage existing Databricks internal tools & frameworks on top of Databricks Runtime ML. [Pipeline diagram: Raw Data, ETL, Model Exploration, Feature Engineering, Hyperparameter Tuning, Cross Validation, Model Scoring, Alerting & Monitoring]
  19. Hyperparameters
  20. Hyperparameters • Express high-level concepts, such as statistical assumptions (e.g., regularization) • Are fixed before training or are hard to learn from data (e.g., neural net architecture) • Affect objective, test time performance, computational cost (e.g., # iterations or epochs)
  21. Tuning hyperparameters. E.g.: Fitting a polynomial. Common goals: • More flexible modeling process • Reduced generalization error • Faster training • Plug & play ML
  22. Challenges in tuning • Curse of dimensionality • Non-convex optimization • Computational cost • Unintuitive hyperparameters
  23. Data prep: train-validation-test splits. [Diagram: Data]
  24. Data prep: train-validation-test splits. [Diagram: Training Data, Test Data; ML Model]
  25. Data prep: train-validation-test splits. [Diagram: Training Data, Validation Data, Test Data; ML Model 1, ML Model 2, ML Model 3; Final ML Model]
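
The three-way split on the slides above can be sketched in a few lines of plain Python. This is a minimal illustration, not code from the talk: the function name `train_val_test_split`, the 60/20/20 fractions, and the fixed seed are all assumptions chosen for the example.

```python
import random


def train_val_test_split(rows, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle rows and cut them into train, validation, and test lists.

    The fractions and the seed are illustrative defaults, not values from the talk.
    """
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = int(len(rows) * test_frac)
    n_val = int(len(rows) * val_frac)
    test = rows[:n_test]                    # held out until the very end
    val = rows[n_test:n_test + n_val]       # used to compare candidate models
    train = rows[n_test + n_val:]           # used to fit each candidate model
    return train, val, test


train, val, test = train_val_test_split(range(100))
print(len(train), len(val), len(test))
```

Candidate models (ML Model 1, 2, 3 on the slide) would each be fit on `train` and compared on `val`; `test` is touched only once, for the final model.
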
  26. A practical definition of tuning. [Diagram: ML Model; Featurization, Model family selection, Hyperparameter tuning] Parameters: configs which your ML library learns from data. Hyperparameters: configs which your ML library does not learn from data.
  27. Tuning Methods
  28. Overview of tuning methods • Manual search • Grid search • Random search • Population-based algorithms • Bayesian algorithms
  29. Manual search. Select hyperparameter settings to try based on human intuition. 2 hyperparameters: • [0, ..., 5] • {A, B, ..., F}. Expert knowledge tells us to try: (2,C), (2,D), (2,E), (3,C), (3,D), (3,E). [Figure: grid with x-axis A to F, y-axis 0 to 5]
  30. Grid Search. Try points on a grid defined by ranges and step sizes. X-axis: {A,...,F}. Y-axis: 0-5, step = 1. [Figure: grid with x-axis A to F, y-axis 0 to 5]
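
A grid search over the slide's example space (x in {A,...,F}, y in 0 to 5 with step 1) can be sketched with `itertools.product`. The loss function here, `toy_loss`, is hypothetical and stands in for "train a model with these settings and measure validation loss"; its minimum at (D, 2) is an arbitrary choice for the example.

```python
import itertools

X_VALUES = ["A", "B", "C", "D", "E", "F"]
Y_VALUES = [0, 1, 2, 3, 4, 5]


def toy_loss(x, y):
    # Hypothetical loss with its minimum at (D, 2); a real loss would
    # train a model with these settings and score it on validation data.
    return abs(X_VALUES.index(x) - 3) + abs(y - 2)


def grid_search(loss_fn, x_values, y_values):
    """Evaluate every point on the grid and return the best point and the evaluation count."""
    grid = list(itertools.product(x_values, y_values))
    best = min(grid, key=lambda point: loss_fn(*point))
    return best, len(grid)


best_point, n_evals = grid_search(toy_loss, X_VALUES, Y_VALUES)
print(best_point, n_evals)
```

Every grid point is evaluated independently, so the search is not adaptive, and the evaluation count is the product of the per-axis sizes.
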
  31. Random Search. Sample from distributions over ranges. X-axis: Uniform({A,...,F}). Y-axis: Uniform([0,5]). [Figure: grid with x-axis A to F, y-axis 0 to 5]
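
Random search over the same space replaces the grid with independent draws from Uniform({A,...,F}) and Uniform([0,5]). The sketch below uses only the standard library; `toy_loss`, the trial budget of 20, and the seed are assumptions made for illustration.

```python
import random

X_VALUES = ["A", "B", "C", "D", "E", "F"]


def toy_loss(x, y):
    # Hypothetical loss with its minimum at (D, 2.0); stands in for
    # training and validating a model with these hyperparameters.
    return abs(X_VALUES.index(x) - 3) + abs(y - 2.0)


def random_search(loss_fn, n_trials, seed=0):
    """Draw n_trials independent samples and return the best trial plus all trials."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        x = rng.choice(X_VALUES)      # Uniform({A,...,F})
        y = rng.uniform(0.0, 5.0)     # Uniform([0, 5])
        trials.append((loss_fn(x, y), x, y))
    return min(trials), trials


best, trials = random_search(toy_loss, n_trials=20)
print(best)
```

Unlike grid search, the budget `n_trials` is chosen directly and does not grow with the number of hyperparameters.
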
  32-34. Population Based Algorithms (slides 32 to 34 carry the same text). Start with random search, then iterate: • Use the previous “generation” to inform the next generation • E.g., sample from best performers & then perturb them. [Figure: grid with x-axis A to F, y-axis 0 to 5]
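
The "sample from best performers, then perturb them" loop can be sketched as a small elitist evolutionary search. This is one simple instance of a population-based method, not the specific algorithm of any library named in the talk. The loss `toy_loss`, the population sizes, and the perturbation rule (which treats the categories A to F as ordered) are all assumptions for the example.

```python
import random

X_VALUES = ["A", "B", "C", "D", "E", "F"]


def toy_loss(x, y):
    # Hypothetical loss with its minimum at (D, 2.0).
    return abs(X_VALUES.index(x) - 3) + abs(y - 2.0)


def perturb(point, rng):
    """Move x to a neighbouring category (or keep it) and jitter y, clamped to [0, 5]."""
    x, y = point
    i = X_VALUES.index(x) + rng.choice([-1, 0, 1])
    i = min(max(i, 0), len(X_VALUES) - 1)
    y = min(max(y + rng.gauss(0.0, 0.5), 0.0), 5.0)
    return X_VALUES[i], y


def population_search(loss_fn, pop_size=8, n_generations=5, n_survivors=2, seed=0):
    rng = random.Random(seed)
    # Generation 0 is plain random search.
    population = [(rng.choice(X_VALUES), rng.uniform(0.0, 5.0)) for _ in range(pop_size)]
    history = []  # best loss at the start of each generation
    for _ in range(n_generations):
        ranked = sorted(population, key=lambda p: loss_fn(*p))
        history.append(loss_fn(*ranked[0]))
        survivors = ranked[:n_survivors]  # best performers are kept unchanged
        children = [perturb(rng.choice(survivors), rng)
                    for _ in range(pop_size - n_survivors)]
        population = survivors + children
    best = min(population, key=lambda p: loss_fn(*p))
    return best, history


best, history = population_search(toy_loss)
print(best, history)
```

Because the survivors are carried over unchanged, the best loss never gets worse from one generation to the next.
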
  35. Bayesian Optimization. Model the loss function: Hyperparameters ⇒ loss. Iteratively search space, trading off between exploration and exploitation. [Figure: grid with x-axis A to F, y-axis 0 to 5]
  36. Bayesian Optimization. Get samples: Test new points in hyperparameter space. [Figure: grid with x-axis A to F, y-axis 0 to 5]
  37. Bayesian Optimization. Get samples: Test new points in hyperparameter space. Update model of space: Hyperparameters ⇒ loss. [Figure: grid with x-axis A to F, y-axis 0 to 5]
  38. Comparing tuning methods (columns: iterative / adaptive? | # evaluations for P params | model of param space)
      • Grid search: No | O(c^P) | none
      • Random search: No | O(k) | none
      • Population-based: Yes | O(k) | implicit
      • Bayesian: Yes | O(k) | explicit
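
To make the O(c^P) entry concrete, here is a short worked calculation. It assumes c is the number of candidate values per hyperparameter and P the number of hyperparameters, and reuses c = 6 from the six-by-six example grid on the earlier slides:

```latex
\[
N_{\text{grid}} = c^{P}, \qquad
c = 6:\quad
P = 2 \Rightarrow 6^{2} = 36, \quad
P = 3 \Rightarrow 6^{3} = 216, \quad
P = 5 \Rightarrow 6^{5} = 7776 .
\]
```

The other three methods run a budget of k evaluations that the user picks, which is why their cost is listed as O(k) regardless of P.
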
  39. Open-source tools for tuning, as of mid-April 2019 (columns: methods supported | PyPI downloads last month | GitHub stars | license)
      • scikit-learn: grid search, random search | --- | --- | BSD
      • MLlib: grid search | --- | --- | Apache 2.0
      • scikit-optimize: Bayesian | 49,189 | 1,278 | BSD
      • Hyperopt: random search, Bayesian | 98,282 | 3,286 | BSD
      • DEAP: population-based | 26,700 | 2,789 | LGPL v3
      • TPOT: population-based | 9,057 | 5,609 | LGPL v3
      • GPyOpt: Bayesian | 4,959 | 451 | BSD
  40. Tracking Tuning Workflows
  41. MLflow Overview • Tracking: Record and query experiments: code, data, config, results • Projects: Packaging format for reproducible runs on any platform • Models: General model format that supports diverse deployment tools. Links: mlflow.org, github.com/mlflow, twitter.com/MLflow, databricks.com/mlflow
  42. Organizing with MLflow. Experiment: Main run, Child runs. Tip: Tune full pipeline, not 1 model. [Diagram: Training Data, Validation Data, Test Data; ML Model 1, ML Model 2, ML Model 3; Final ML Model]
  43. Instrumenting tuning with MLflow. MLflow concepts for tracking runs: • Params: hyperparameters • Metrics: training & validation, loss & objective, multiple objectives • Tags: provenance, simple metadata • Artifacts: serialized model, large metadata
  44. Analyzing how tuning performs. Questions to answer: • Am I tuning the right hyperparameters? • Am I exploring the right parts of the search space? • Do I need to do another round of tuning? Examining results: • Simple case: visualize param vs metric • Challenges: multiple params and metrics, iterative experimentation
  45. Auto-tracking MLlib with MLflow (demo). Experiment: Main run, Child runs. In Databricks: • CrossValidator & TrainValidationSplit • 1 run per setting of hyperparameters • Avg metrics for CV folds. [Diagram: Training Data, Validation Data, Test Data; ML Model 1, ML Model 2, ML Model 3; Final ML Model]
  46. Scaling Tuning Workflows
  47. Hyperopt. Hyperparameter tuning in Python ML workflows ● Usable with any Python ML library ● Tuning algorithms: ○ Random search ○ Bayesian (Tree of Parzen Estimators) ● Open source (3-clause BSD license) https://github.com/hyperopt/hyperopt
  48. Hyperopt on Apache Spark (demo). Distribute tuning across Spark clusters ● Each Spark task trains & evaluates 1 model (hyperparameter setting) ○ Applicable to single-machine ML workloads ● Via new SparkTrials plugin ● Contributing to open source Hyperopt: github.com/hyperopt/hyperopt/pull/509. With automated MLflow tracking in Databricks. Available now in Databricks Runtime 5.4 ML
  49. Related Content. Blog: • Hyperparameter Tuning with MLflow, Apache Spark MLlib and Hyperopt. Webinar: • How to Automate Machine Learning and Scale Delivery. Tutorials: ● Hyperparameter Tuning Documentation ● MLflow integrations with H2O.ai, GPyOpt, HyperOpt. Notebooks: ● MLlib + Automated MLflow Tracking ● Distributed Hyperopt + Automated MLflow Tracking ● Basic Introduction to DataRobot via API. Videos: ● Automating Predictive Modeling at Zynga with PySpark and Pandas UDFs ● Best Practices for Hyperparameter Tuning with MLflow ● Advanced Hyperparameter Optimization for Deep Learning with MLflow
  50. Getting started • MLflow: Managed MLflow, Generally Available in Databricks • MLlib + automated MLflow tracking: Public preview in Databricks Runtime 5.4 & 5.4 ML • Distributed Hyperopt + automated MLflow tracking: Public preview in Databricks Runtime 5.4 ML. Links: https://docs.databricks.com/spark/latest/mllib/index.html#hyperparameter-tuning https://docs.azuredatabricks.net/spark/latest/mllib/index.html#hyperparameter-tuning https://mlflow.org/
  51. Thank you. Q&A
