SlideShare ist ein Scribd-Unternehmen logo
1 von 21
Using H2O Random Grid Search for
Hyper-parameters Optimization
Jo-fai (Joe) Chow
Data Scientist
joe@h2o.ai
WHO AM I
• Customer Data Scientist at H2O.ai
• Background
o Telecom (Virgin Media)
o Data Science Platform (Domino Data Lab)
o Water Engineering + Machine Learning Research
(STREAM Industrial Doctorate Centre)
ABOUT H2O
• Company
o Team: 50 (45 shown)
o Founded in 2012, Mountain View,
California.
o Venture capital backed
• Products
o Open-source machine learning
platform.
o Flow (Web), R, Python, Spark,
Hadoop interfaces.
ABOUT THIS TALK
• Story of a baker and a data scientist
o Why you should care
• Hyper-parameters optimization
o Common techniques
o H2O Python API
• Other H2O features for streamlining workflow
STORY OF A BAKER
• Making a cake
o Source
• Ingredients
o Process:
• Mixing
• Baking
• Decorating
o End product
• A nice looking cake
Credit: www.dphotographer.co.uk/image/201305/baking_a_cake
STORY OF A DATA SCIENTIST
• Making a data product
o Source
• Raw data
o Process:
• Data munging
• Analyzing / Modeling
• Reporting
o End product
• Apps, graphs or reports
Credit: www.simranjindal.com
BAKER AND DATA SCIENTIST
• What do they have in common?
o Process is important to bakers and data scientists.
Yet, most customers do not appreciate the effort.
o Most customers only care about raw materials
quality and end products.
WHY YOU SHOULD CARE
• We can use machine/software to automate
some laborious tasks.
• We can spend more time on quality assurance
and presentation.
• This talk is about making one specific task,
hyper-parameters tuning, more efficient.
HYPER-PARAMETERS OPTIMISATION
• Overview
o Optimizing an algorithm’s performance.
• e.g. Random Forest, Gradient Boosting Machine (GBM)
o Trying different sets of hyper-parameters within a
defined search space.
o No rules of thumb.
HYPER-PARAMETERS OPTIMISATION
• Example of hyper-parameters in H2O
o Random Forest:
• No. of trees, depth of trees, sample rate …
o Gradient Boosting Machine (GBM):
• No. of trees, depth of trees, learning rate, sample rate …
o Deep Learning:
• Activation, hidden layer sizes, L1, L2, dropout ratios …
COMMON TECHNIQUES
• Manual search
o Tuning by hand - inefficient
o Expert opinion (not always reliable)
• Grid search
o Automated search within a defined space
o Computationally expensive
• Random grid search
o More efficient than manual / grid search
o Equal performance in less time
RANDOM GRID SEARCH – DOES IT WORK?
• Random Search for Hyper-Parameter
Optimization
o Journal of Machine Learning Research (2012)
o James Bergstra and Yoshua Bengio
o “Compared with deep belief networks configured by a
thoughtful combination of manual search and grid
search, purely random search found statistically equal
performance on four of seven data sets, and superior
performance on one of seven.”
RELATED FEATURE – EARLY STOPPING
• A technique for regularization.
• Avoid over-fitting the training set.
• Useful when combined with hyper-parameter
search:
o Additional controls (e.g. time constraint,
tolerance)
H2O RANDOM GRID SEARCH
• Objectives
o Optimize model performance based on evaluation
metric.
o Explore the defined search space randomly.
o Use early-stopping for regularization and
additional controls.
RANDOM GRID SEARCH (PYTHON API)
RANDOM GRID SEARCH
• Outputs
o Best model based on metric
o A set of hyper-parameters for the best model
• Other APIs
o R, REST, Java (see documentation on GitHub)
OTHER H2O FEATURES
• h2oEnsemble
o Better predictive performance
• Sparkling Water = Spark + H2O
• Plain Old Java Object (POJO)
o Productionize H2O models
CONCLUSIONS
• Most people only care about the end product.
• Use H2O random grid search to save time on
hyper-parameters tuning.
• Spend more time on quality assurance and
presentation.
CONCLUSIONS
• H2O Random Grid Search
o An efficient way to tune hyper-parameters
o APIs for Python, R, Java, REST
o Do check out the code examples on GitHub
• Combine with other H2O features
o Streamline data science workflow
ACKNOWLEDGEMENTS
• GoDataDriven
• Conference sponsors
• My colleagues at H2O.ai
THANK YOU
• Resources
o Slides + code – github.com/h2oai/h2o-meetups
o Download H2O – www.h2o.ai
o Documentation – www.h2o.ai/docs/
o joe@h2o.ai
• We are hiring – www.h2o.ai/careers/

Weitere ähnliche Inhalte

Was ist angesagt?

Building a Knowledge Graph with Spark and NLP: How We Recommend Novel Drugs t...
Building a Knowledge Graph with Spark and NLP: How We Recommend Novel Drugs t...Building a Knowledge Graph with Spark and NLP: How We Recommend Novel Drugs t...
Building a Knowledge Graph with Spark and NLP: How We Recommend Novel Drugs t...
Databricks
 
Application of machine learning in industrial applications
Application of machine learning in industrial applicationsApplication of machine learning in industrial applications
Application of machine learning in industrial applications
Anish Das
 
Big Data Analytics | What Is Big Data Analytics? | Big Data Analytics For Beg...
Big Data Analytics | What Is Big Data Analytics? | Big Data Analytics For Beg...Big Data Analytics | What Is Big Data Analytics? | Big Data Analytics For Beg...
Big Data Analytics | What Is Big Data Analytics? | Big Data Analytics For Beg...
Simplilearn
 

Was ist angesagt? (20)

Anomaly detection
Anomaly detectionAnomaly detection
Anomaly detection
 
Anomaly detection Workshop slides
Anomaly detection Workshop slidesAnomaly detection Workshop slides
Anomaly detection Workshop slides
 
Data Mining Technique - CRISP-DM
Data Mining Technique - CRISP-DMData Mining Technique - CRISP-DM
Data Mining Technique - CRISP-DM
 
Active Governance Across the Delta Lake with Alation
Active Governance Across the Delta Lake with AlationActive Governance Across the Delta Lake with Alation
Active Governance Across the Delta Lake with Alation
 
Data Management Best Practices
Data Management Best PracticesData Management Best Practices
Data Management Best Practices
 
Outlier Detection
Outlier DetectionOutlier Detection
Outlier Detection
 
Getting Started with Azure AutoML
Getting Started with Azure AutoMLGetting Started with Azure AutoML
Getting Started with Azure AutoML
 
Dimensional Modeling
Dimensional ModelingDimensional Modeling
Dimensional Modeling
 
Building a Knowledge Graph with Spark and NLP: How We Recommend Novel Drugs t...
Building a Knowledge Graph with Spark and NLP: How We Recommend Novel Drugs t...Building a Knowledge Graph with Spark and NLP: How We Recommend Novel Drugs t...
Building a Knowledge Graph with Spark and NLP: How We Recommend Novel Drugs t...
 
Application of machine learning in industrial applications
Application of machine learning in industrial applicationsApplication of machine learning in industrial applications
Application of machine learning in industrial applications
 
Big Data Analytics | What Is Big Data Analytics? | Big Data Analytics For Beg...
Big Data Analytics | What Is Big Data Analytics? | Big Data Analytics For Beg...Big Data Analytics | What Is Big Data Analytics? | Big Data Analytics For Beg...
Big Data Analytics | What Is Big Data Analytics? | Big Data Analytics For Beg...
 
data science @NYT ; inaugural Data Science Initiative Lecture
data science @NYT ; inaugural Data Science Initiative Lecturedata science @NYT ; inaugural Data Science Initiative Lecture
data science @NYT ; inaugural Data Science Initiative Lecture
 
Predictive Analytics Using R | Edureka
Predictive Analytics Using R | EdurekaPredictive Analytics Using R | Edureka
Predictive Analytics Using R | Edureka
 
Metadata Strategies
Metadata StrategiesMetadata Strategies
Metadata Strategies
 
Best Practices in Metadata Management
Best Practices in Metadata ManagementBest Practices in Metadata Management
Best Practices in Metadata Management
 
Data engineering zoomcamp introduction
Data engineering zoomcamp  introductionData engineering zoomcamp  introduction
Data engineering zoomcamp introduction
 
Anomaly Detection and Spark Implementation - Meetup Presentation.pptx
Anomaly Detection and Spark Implementation - Meetup Presentation.pptxAnomaly Detection and Spark Implementation - Meetup Presentation.pptx
Anomaly Detection and Spark Implementation - Meetup Presentation.pptx
 
Presentation on Big Data Analytics
Presentation on Big Data AnalyticsPresentation on Big Data Analytics
Presentation on Big Data Analytics
 
Ethics of Analytics and Machine Learning
Ethics of Analytics and Machine LearningEthics of Analytics and Machine Learning
Ethics of Analytics and Machine Learning
 
Big Data na globo.com
Big Data na globo.comBig Data na globo.com
Big Data na globo.com
 

Ähnlich wie Using H2O Random Grid Search for Hyper-parameters Optimization

02-Lifecycle.pptx
02-Lifecycle.pptx02-Lifecycle.pptx
02-Lifecycle.pptx
Shree Shree
 
Scalable Machine Learning in R and Python with H2O
Scalable Machine Learning in R and Python with H2OScalable Machine Learning in R and Python with H2O
Scalable Machine Learning in R and Python with H2O
Sri Ambati
 

Ähnlich wie Using H2O Random Grid Search for Hyper-parameters Optimization (20)

H2O Random Grid Search - PyData Amsterdam
H2O Random Grid Search - PyData AmsterdamH2O Random Grid Search - PyData Amsterdam
H2O Random Grid Search - PyData Amsterdam
 
How and why you need to build a big data lab
How and why you need to build a big data labHow and why you need to build a big data lab
How and why you need to build a big data lab
 
Scalable Automatic Machine Learning with H2O
Scalable Automatic Machine Learning with H2OScalable Automatic Machine Learning with H2O
Scalable Automatic Machine Learning with H2O
 
Applied Machine learning using H2O, python and R Workshop
Applied Machine learning using H2O, python and R WorkshopApplied Machine learning using H2O, python and R Workshop
Applied Machine learning using H2O, python and R Workshop
 
Experimentation Platform on Hadoop
Experimentation Platform on HadoopExperimentation Platform on Hadoop
Experimentation Platform on Hadoop
 
eBay Experimentation Platform on Hadoop
eBay Experimentation Platform on HadoopeBay Experimentation Platform on Hadoop
eBay Experimentation Platform on Hadoop
 
Open Platform for AI & ML modeling
Open Platform for AI & ML modelingOpen Platform for AI & ML modeling
Open Platform for AI & ML modeling
 
Reproducible AI using MLflow and PyTorch
Reproducible AI using MLflow and PyTorchReproducible AI using MLflow and PyTorch
Reproducible AI using MLflow and PyTorch
 
Scalable Ensemble Machine Learning @ Harvard Health Policy Data Science Lab
Scalable Ensemble Machine Learning @ Harvard Health Policy Data Science LabScalable Ensemble Machine Learning @ Harvard Health Policy Data Science Lab
Scalable Ensemble Machine Learning @ Harvard Health Policy Data Science Lab
 
Analysing GitHub commits with R
Analysing GitHub commits with RAnalysing GitHub commits with R
Analysing GitHub commits with R
 
Intro to Machine Learning with H2O and AWS
Intro to Machine Learning with H2O and AWSIntro to Machine Learning with H2O and AWS
Intro to Machine Learning with H2O and AWS
 
Introduction to H2O and Model Stacking Use Cases
Introduction to H2O and Model Stacking Use CasesIntroduction to H2O and Model Stacking Use Cases
Introduction to H2O and Model Stacking Use Cases
 
H2O at Poznan R Meetup
H2O at Poznan R MeetupH2O at Poznan R Meetup
H2O at Poznan R Meetup
 
02-Lifecycle.pptx
02-Lifecycle.pptx02-Lifecycle.pptx
02-Lifecycle.pptx
 
Improving Model Predictions via Stacking and Hyper-parameters Tuning
Improving Model Predictions via Stacking and Hyper-parameters TuningImproving Model Predictions via Stacking and Hyper-parameters Tuning
Improving Model Predictions via Stacking and Hyper-parameters Tuning
 
Scalable Machine Learning in R and Python with H2O
Scalable Machine Learning in R and Python with H2OScalable Machine Learning in R and Python with H2O
Scalable Machine Learning in R and Python with H2O
 
H2O at Berlin R Meetup
H2O at Berlin R MeetupH2O at Berlin R Meetup
H2O at Berlin R Meetup
 
Berlin R Meetup
Berlin R MeetupBerlin R Meetup
Berlin R Meetup
 
Three signs your architecture is too small for big data. Camp IT December 2014
Three signs your architecture is too small for big data.  Camp IT December 2014Three signs your architecture is too small for big data.  Camp IT December 2014
Three signs your architecture is too small for big data. Camp IT December 2014
 
The Effect of Third Party Implementations on Reproducibility
The Effect of Third Party Implementations on ReproducibilityThe Effect of Third Party Implementations on Reproducibility
The Effect of Third Party Implementations on Reproducibility
 

Mehr von Jo-fai Chow

Kaggle competitions, new friends, new skills and new opportunities
Kaggle competitions, new friends, new skills and new opportunitiesKaggle competitions, new friends, new skills and new opportunities
Kaggle competitions, new friends, new skills and new opportunities
Jo-fai Chow
 
A Systematic, Multi-Criteria Decision Support Framework for Sustainable Drain...
A Systematic, Multi-Criteria Decision Support Framework for Sustainable Drain...A Systematic, Multi-Criteria Decision Support Framework for Sustainable Drain...
A Systematic, Multi-Criteria Decision Support Framework for Sustainable Drain...
Jo-fai Chow
 
Udacity Statement (Introduction to Statistics, August 2012)
Udacity Statement (Introduction to Statistics, August 2012)Udacity Statement (Introduction to Statistics, August 2012)
Udacity Statement (Introduction to Statistics, August 2012)
Jo-fai Chow
 
Coursera Statement (Computational Investing, Part I,
Coursera Statement (Computational Investing, Part I, Coursera Statement (Computational Investing, Part I,
Coursera Statement (Computational Investing, Part I,
Jo-fai Chow
 

Mehr von Jo-fai Chow (20)

Making Multimillion-Dollar Baseball Decisions with H2O AutoML, LIME and Shiny
Making Multimillion-Dollar Baseball Decisions with H2O AutoML, LIME and ShinyMaking Multimillion-Dollar Baseball Decisions with H2O AutoML, LIME and Shiny
Making Multimillion-Dollar Baseball Decisions with H2O AutoML, LIME and Shiny
 
Automatic and Interpretable Machine Learning in R with H2O and LIME
Automatic and Interpretable Machine Learning in R with H2O and LIMEAutomatic and Interpretable Machine Learning in R with H2O and LIME
Automatic and Interpretable Machine Learning in R with H2O and LIME
 
Automatic and Interpretable Machine Learning with H2O and LIME
Automatic and Interpretable Machine Learning with H2O and LIMEAutomatic and Interpretable Machine Learning with H2O and LIME
Automatic and Interpretable Machine Learning with H2O and LIME
 
H2O at BelgradeR Meetup
H2O at BelgradeR MeetupH2O at BelgradeR Meetup
H2O at BelgradeR Meetup
 
Introduction to Machine Learning with H2O and Python
Introduction to Machine Learning with H2O and PythonIntroduction to Machine Learning with H2O and Python
Introduction to Machine Learning with H2O and Python
 
Introduction to Machine Learning with H2O and Python
Introduction to Machine Learning with H2O and PythonIntroduction to Machine Learning with H2O and Python
Introduction to Machine Learning with H2O and Python
 
From Kaggle to H2O - The True Story of a Civil Engineer Turned Data Geek
From Kaggle to H2O - The True Story of a Civil Engineer Turned Data GeekFrom Kaggle to H2O - The True Story of a Civil Engineer Turned Data Geek
From Kaggle to H2O - The True Story of a Civil Engineer Turned Data Geek
 
H2O Deep Water - Making Deep Learning Accessible to Everyone
H2O Deep Water - Making Deep Learning Accessible to EveryoneH2O Deep Water - Making Deep Learning Accessible to Everyone
H2O Deep Water - Making Deep Learning Accessible to Everyone
 
H2O Deep Water - Making Deep Learning Accessible to Everyone
H2O Deep Water - Making Deep Learning Accessible to EveryoneH2O Deep Water - Making Deep Learning Accessible to Everyone
H2O Deep Water - Making Deep Learning Accessible to Everyone
 
Project "Deep Water"
Project "Deep Water"Project "Deep Water"
Project "Deep Water"
 
H2O Machine Learning Use Cases
H2O Machine Learning Use CasesH2O Machine Learning Use Cases
H2O Machine Learning Use Cases
 
Kaggle Competitions, New Friends, New Skills and New Opportunities
Kaggle Competitions, New Friends, New Skills and New OpportunitiesKaggle Competitions, New Friends, New Skills and New Opportunities
Kaggle Competitions, New Friends, New Skills and New Opportunities
 
Introduction to Generalised Low-Rank Model and Missing Values
Introduction to Generalised Low-Rank Model and Missing ValuesIntroduction to Generalised Low-Rank Model and Missing Values
Introduction to Generalised Low-Rank Model and Missing Values
 
Kaggle competitions, new friends, new skills and new opportunities
Kaggle competitions, new friends, new skills and new opportunitiesKaggle competitions, new friends, new skills and new opportunities
Kaggle competitions, new friends, new skills and new opportunities
 
Deploying your Predictive Models as a Service via Domino
Deploying your Predictive Models as a Service via DominoDeploying your Predictive Models as a Service via Domino
Deploying your Predictive Models as a Service via Domino
 
A Systematic, Multi-Criteria Decision Support Framework for Sustainable Drain...
A Systematic, Multi-Criteria Decision Support Framework for Sustainable Drain...A Systematic, Multi-Criteria Decision Support Framework for Sustainable Drain...
A Systematic, Multi-Criteria Decision Support Framework for Sustainable Drain...
 
Designing Sustainable Drainage Systems
Designing Sustainable Drainage SystemsDesigning Sustainable Drainage Systems
Designing Sustainable Drainage Systems
 
Developing a New Decision Support System for SuDS
Developing a New Decision Support System for SuDSDeveloping a New Decision Support System for SuDS
Developing a New Decision Support System for SuDS
 
Udacity Statement (Introduction to Statistics, August 2012)
Udacity Statement (Introduction to Statistics, August 2012)Udacity Statement (Introduction to Statistics, August 2012)
Udacity Statement (Introduction to Statistics, August 2012)
 
Coursera Statement (Computational Investing, Part I,
Coursera Statement (Computational Investing, Part I, Coursera Statement (Computational Investing, Part I,
Coursera Statement (Computational Investing, Part I,
 

Kürzlich hochgeladen

Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
amitlee9823
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
shivangimorya083
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
amitlee9823
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
shivangimorya083
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
shivangimorya083
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
Lars Albertsson
 

Kürzlich hochgeladen (20)

Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 

Using H2O Random Grid Search for Hyper-parameters Optimization

  • 1. Using H2O Random Grid Search for Hyper-parameters Optimization Jo-fai (Joe) Chow Data Scientist joe@h2o.ai
  • 2. WHO AM I • Customer Data Scientist at H2O.ai • Background o Telecom (Virgin Media) o Data Science Platform (Domino Data Lab) o Water Engineering + Machine Learning Research (STREAM Industrial Doctorate Centre)
  • 3. ABOUT H2O • Company o Team: 50 (45 shown) o Founded in 2012, Mountain View, California. o Venture capital backed • Products o Open-source machine learning platform. o Flow (Web), R, Python, Spark, Hadoop interfaces.
  • 4. ABOUT THIS TALK • Story of a baker and a data scientist o Why you should care • Hyper-parameters optimization o Common techniques o H2O Python API • Other H2O features for streamlining workflow
  • 5. STORY OF A BAKER • Making a cake o Source • Ingredients o Process: • Mixing • Baking • Decorating o End product • A nice looking cake Credit: www.dphotographer.co.uk/image/201305/baking_a_cake
  • 6. STORY OF A DATA SCIENTIST • Making a data product o Source • Raw data o Process: • Data munging • Analyzing / Modeling • Reporting o End product • Apps, graphs or reports Credit: www.simranjindal.com
  • 7. BAKER AND DATA SCIENTIST • What do they have in common? o Process is important to bakers and data scientists. Yet, most customers do not appreciate the effort. o Most customers only care about raw materials quality and end products.
  • 8. WHY YOU SHOULD CARE • We can use machine/software to automate some laborious tasks. • We can spend more time on quality assurance and presentation. • This talk is about making one specific task, hyper-parameters tuning, more efficient.
  • 9. HYPER-PARAMETERS OPTIMISATION • Overview o Optimizing an algorithm’s performance. • e.g. Random Forest, Gradient Boosting Machine (GBM) o Trying different sets of hyper-parameters within a defined search space. o No rules of thumb.
  • 10. HYPER-PARAMETERS OPTIMISATION • Example of hyper-parameters in H2O o Random Forest: • No. of trees, depth of trees, sample rate … o Gradient Boosting Machine (GBM): • No. of trees, depth of trees, learning rate, sample rate … o Deep Learning: • Activation, hidden layer sizes, L1, L2, dropout ratios …
  • 11. COMMON TECHNIQUES • Manual search o Tuning by hand - inefficient o Expert opinion (not always reliable) • Grid search o Automated search within a defined space o Computationally expensive • Random grid search o More efficient than manual / grid search o Equal performance in less time
  • 12. RANDOM GRID SEARCH – DOES IT WORK? • Random Search for Hyper-Parameter Optimization o Journal of Machine Learning Research (2012) o James Bergstra and Yoshua Bengio o “Compared with deep belief networks configured by a thoughtful combination of manual search and grid search, purely random search found statistically equal performance on four of seven data sets, and superior performance on one of seven.”
  • 13. RELATED FEATURE – EARLY STOPPING • A technique for regularization. • Avoid over-fitting the training set. • Useful when combined with hyper-parameter search: o Additional controls (e.g. time constraint, tolerance)
  • 14. H2O RANDOM GRID SEARCH • Objectives o Optimize model performance based on evaluation metric. o Explore the defined search space randomly. o Use early-stopping for regularization and additional controls.
  • 15. RANDOM GRID SEARCH (PYTHON API)
  • 16. RANDOM GRID SEARCH • Outputs o Best model based on metric o A set of hyper-parameters for the best model • Other APIs o R, REST, Java (see documentation on GitHub)
  • 17. OTHER H2O FEATURES • h2oEnsemble o Better predictive performance • Sparkling Water = Spark + H2O • Plain Old Java Object (POJO) o Productionize H2O models
  • 18. CONCLUSIONS • Most people only care about the end product. • Use H2O random grid search to save time on hyper-parameters tuning. • Spend more time on quality assurance and presentation.
  • 19. CONCLUSIONS • H2O Random Grid Search o An efficient way to tune hyper-parameters o APIs for Python, R, Java, REST o Do check out the code examples on GitHub • Combine with other H2O features o Streamline data science workflow
  • 20. ACKNOWLEDGEMENTS • GoDataDriven • Conference sponsors • My colleagues at H2O.ai
  • 21. THANK YOU • Resources o Slides + code – github.com/h2oai/h2o-meetups o Download H2O – www.h2o.ai o Documentation – www.h2o.ai/docs/ o joe@h2o.ai • We are hiring – www.h2o.ai/careers/

Hinweis der Redaktion

  1. New face in PyData conference Thanks for having me I am Joe. I work for H2O. Only joined 6 weeks ago Not really a contributor to their code base – more like a long-term user (using H2O for my previous jobs since 2014) Today’s talk is about one specific feature in H2O Instead of giving you a full demo Tell you a story from a user’s perspective
  2. A little bit more background of me H2O – helping customers leverage H2O with their data, build smart apps Before I joined H2O VM – building data apps for CEO and CFO – answering specific business questions Domino – evangelist – blogging and going to meetups and conferences Before data science journey, I was a civil engineer – working on water related research
  3. Team is growing rapidly Based in Mountain View – I am the only guy in UK/Europe VC: $34M in total
  4. Start with a story of baker / data scientist Background information Explain why you should care about the main topic – hyper-para tuning What is it Techniques API Other useful features in H2O
  5. Start with a story Or a workflow of making a cake
  6. Raw data – structured / unstructured / different databases
  7. Customers don’t see much of the process