SlideShare ist ein Scribd-Unternehmen logo
1 von 27
Downloaden Sie, um offline zu lesen
Machine Learning Scientist
@ledell / erin@h2o.ai
Erin LeDell
Automatic Machine Learning
AutoML
MEET
THE MAKERS
ERIN LEDELL
Machine Learning Scientist
NAVDEEP GILL
Software Engineer

& Data Scientist
RAY PECK
Director of Product
Engineering
• Intro to Automatic Machine Learning (AutoML)
• Random Grid Search & Stacked Ensembles
• H2O’s AutoML (R, Python, GUI)
• H2O-3 Roadmap
• Hands-on Tutorial
Agenda
Intro to AutoML
Automatic Machine Learning
Aspects of Automatic ML
Model

Generation
EnsemblesData Prep
Data Prep
• Imputation of missing data
• Standardization of numeric features
• One-hot encoding of categorical features
• Count/Label/Target encoding of categorical features
• Feature selection and/or feature extraction (e.g. PCA)
• Feature engineering
Model Generation
• Cartesian grid search
• Random grid search
• Tune individual models via Early Stopping
• Bayesian Hyperparameter Optimization
Ensembles
• Bagging / Averaging
• Stacking / Super Learning
• Ensemble Selection
Random Stacking
Random Grids + Stacked Ensembles
Stacked Ensembles
• Specify L base learners (with model params).
• Specify a metalearner (just another algo).
• Perform k-fold cross-validation on the base learners.
Stacked Ensembles
• Collect cross-validated predicted values from base
learners.
• Train a second-level metalearning algorithm to find
the optimal combination of base learners.
• Metalearner requires only a small amount of compute
on top of the cross-validation process (it’s cheap).
Random Grid Search + Stacking
• Random Grid Search combined with Stacked
Ensembles is a powerful combination.
• Stacked Ensembles perform particularly well if the
models they are based on (1) are individually
strong, and (2) make uncorrelated errors.
• Random Grid Search is an excellent way to create
a diverse of models for the ensemble.
H2O AutoML
Automatic Machine Learning in H2O
H2O Machine Learning Platform
• Distributed (multi-core + multi-node) implementations of
cutting edge ML algorithms.
• Core algorithms written in high performance Java.
• APIs available in R, Python, Scala & web GUI.
• Works on Hadoop, Spark, EC2, your laptop, etc.
• Easily deploy models to production as pure Java code.
H2O AutoML (first cut)
• Imputation, one-hot encoding, standardization.
• Random Grid Search over a custom hyperparameter
space, defined by expert data scientists.
• Early stopping of individual models and random grids.
• GBMs, Random Forests, Deep Neural Nets, GLMs
• Multiple Stacked Ensembles of models.
• Leaderboard for ranking.
H2O AutoML in R
library(h2o)
h2o.init()



train <- h2o.importFile("train.csv")


aml <- h2o.automl(y = "response_colname", 

training_frame = train,

max_runtime_secs = 600)
lb <- aml@leaderboard

H2O AutoML in Python
import h2o
from h2o.automl import H2OAutoML
h2o.init()



train = h2o.import_file("train.csv")



aml = H2OAutoML(max_runtime_secs = 600)

aml.train(y = "response_colname", 

training_frame = train)



lb = aml.leaderboard
H2O AutoML in Flow
Example Leaderboard for binary classification
H2O AutoML Leaderboard
H2O-3 Roadmap
Coming Soon to H2O
Feature Q1 Q2
New Algorithm: Cox-Proportional Hazards
GLM: Ordinal Regression
GBM: Quasibinomial
NLP Improvements, TF-IDF
Stacked Ensemble: Custom Metalearner
AutoML: New Ensembles
AutoML: Add XGBoost
Distributed XGBoost
New Algorithm: Factorization Machines
H2O-3 Roadmap
https://tinyurl.com/h2o-automl-jira
• Documentation: http://docs.h2o.ai
• Tutorials: https://github.com/h2oai/h2o-tutorials
• Slidedecks: https://github.com/h2oai/h2o-meetups
• Videos: https://www.youtube.com/user/0xdata
• Events & Meetups: http://h2o.ai/events
• Stack Overflow: https://stackoverflow.com/tags/h2o
• Google Group: https://tinyurl.com/h2ostream
• Gitter: http://gitter.im/h2oai/h2o-3
Hands-on Tutorial
DEMO
First-time Qwiklab Account Setup
• Go to http://h2oai.qwiklab.com
• Click on “JOIN”
• Create a new account with a valid email address
• You will receive a confirmation email
• Click on the link in the confirmation email
• Go back to http://h2oai.qwiklab.com and log in
• Go to the Catalog on the left bar
• Choose “Introduction to AutoML in H2O”
• Wait for instructions
https://tinyurl.com/automl-h2oworld17
Code and data available here
H2O AutoML Tutorial

Weitere ähnliche Inhalte

Was ist angesagt?

LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attent...
LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attent...LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attent...
LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attent...
Po-Chuan Chen
 
Machine Learning presentation.
Machine Learning presentation.Machine Learning presentation.
Machine Learning presentation.
butest
 
How to Utilize MLflow and Kubernetes to Build an Enterprise ML Platform
How to Utilize MLflow and Kubernetes to Build an Enterprise ML PlatformHow to Utilize MLflow and Kubernetes to Build an Enterprise ML Platform
How to Utilize MLflow and Kubernetes to Build an Enterprise ML Platform
Databricks
 

Was ist angesagt? (20)

How to fine-tune and develop your own large language model.pptx
How to fine-tune and develop your own large language model.pptxHow to fine-tune and develop your own large language model.pptx
How to fine-tune and develop your own large language model.pptx
 
Building a MLOps Platform Around MLflow to Enable Model Productionalization i...
Building a MLOps Platform Around MLflow to Enable Model Productionalization i...Building a MLOps Platform Around MLflow to Enable Model Productionalization i...
Building a MLOps Platform Around MLflow to Enable Model Productionalization i...
 
Machine Learning - Splitting Datasets
Machine Learning - Splitting DatasetsMachine Learning - Splitting Datasets
Machine Learning - Splitting Datasets
 
Frontiers of Natural Language Processing
Frontiers of Natural Language ProcessingFrontiers of Natural Language Processing
Frontiers of Natural Language Processing
 
Chain-of-thought Prompting.pptx
Chain-of-thought Prompting.pptxChain-of-thought Prompting.pptx
Chain-of-thought Prompting.pptx
 
Ml ops intro session
Ml ops   intro sessionMl ops   intro session
Ml ops intro session
 
LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attent...
LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attent...LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attent...
LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attent...
 
Getting Started with Azure AutoML
Getting Started with Azure AutoMLGetting Started with Azure AutoML
Getting Started with Azure AutoML
 
Machine Learning presentation.
Machine Learning presentation.Machine Learning presentation.
Machine Learning presentation.
 
XGBoost & LightGBM
XGBoost & LightGBMXGBoost & LightGBM
XGBoost & LightGBM
 
How to Utilize MLflow and Kubernetes to Build an Enterprise ML Platform
How to Utilize MLflow and Kubernetes to Build an Enterprise ML PlatformHow to Utilize MLflow and Kubernetes to Build an Enterprise ML Platform
How to Utilize MLflow and Kubernetes to Build an Enterprise ML Platform
 
Testing and Deployment - Full Stack Deep Learning
Testing and Deployment - Full Stack Deep LearningTesting and Deployment - Full Stack Deep Learning
Testing and Deployment - Full Stack Deep Learning
 
Microsoft Introduction to Automated Machine Learning
Microsoft Introduction to Automated Machine LearningMicrosoft Introduction to Automated Machine Learning
Microsoft Introduction to Automated Machine Learning
 
Handling High Cardinality Categoricals via Target Encodings
Handling High Cardinality Categoricals via Target Encodings Handling High Cardinality Categoricals via Target Encodings
Handling High Cardinality Categoricals via Target Encodings
 
Machine Learning
Machine LearningMachine Learning
Machine Learning
 
ML Infra for Netflix Recommendations - AI NEXTCon talk
ML Infra for Netflix Recommendations - AI NEXTCon talkML Infra for Netflix Recommendations - AI NEXTCon talk
ML Infra for Netflix Recommendations - AI NEXTCon talk
 
Manufacturing Data Analytics
Manufacturing Data AnalyticsManufacturing Data Analytics
Manufacturing Data Analytics
 
Integrating systems in the age of Quarkus and Camel
Integrating systems in the age of Quarkus and CamelIntegrating systems in the age of Quarkus and Camel
Integrating systems in the age of Quarkus and Camel
 
MLOps Using MLflow
MLOps Using MLflowMLOps Using MLflow
MLOps Using MLflow
 
MLOps with Azure DevOps
MLOps with Azure DevOpsMLOps with Azure DevOps
MLOps with Azure DevOps
 

Ähnlich wie Intro to AutoML + Hands-on Lab - Erin LeDell, Machine Learning Scientist, H2O.ai

Scalable Automatic Machine Learning in H2O
Scalable Automatic Machine Learning in H2OScalable Automatic Machine Learning in H2O
Scalable Automatic Machine Learning in H2O
Sri Ambati
 

Ähnlich wie Intro to AutoML + Hands-on Lab - Erin LeDell, Machine Learning Scientist, H2O.ai (20)

Open Platform for AI & ML modeling
Open Platform for AI & ML modelingOpen Platform for AI & ML modeling
Open Platform for AI & ML modeling
 
Scalable Automatic Machine Learning with H2O
Scalable Automatic Machine Learning with H2OScalable Automatic Machine Learning with H2O
Scalable Automatic Machine Learning with H2O
 
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...
 
Scalable Data Science in Python and R on Apache Spark
Scalable Data Science in Python and R on Apache SparkScalable Data Science in Python and R on Apache Spark
Scalable Data Science in Python and R on Apache Spark
 
Scalable Automatic Machine Learning with H2O” by Erin LeDell, Chief Machine L...
Scalable Automatic Machine Learning with H2O” by Erin LeDell, Chief Machine L...Scalable Automatic Machine Learning with H2O” by Erin LeDell, Chief Machine L...
Scalable Automatic Machine Learning with H2O” by Erin LeDell, Chief Machine L...
 
Building a SIMD Supported Vectorized Native Engine for Spark SQL
Building a SIMD Supported Vectorized Native Engine for Spark SQLBuilding a SIMD Supported Vectorized Native Engine for Spark SQL
Building a SIMD Supported Vectorized Native Engine for Spark SQL
 
Building Machine Learning models with Apache Spark and Amazon SageMaker | AWS...
Building Machine Learning models with Apache Spark and Amazon SageMaker | AWS...Building Machine Learning models with Apache Spark and Amazon SageMaker | AWS...
Building Machine Learning models with Apache Spark and Amazon SageMaker | AWS...
 
Low Latency Polyglot Model Scoring using Apache Apex
Low Latency Polyglot Model Scoring using Apache ApexLow Latency Polyglot Model Scoring using Apache Apex
Low Latency Polyglot Model Scoring using Apache Apex
 
Spark + H20 = Machine Learning at scale
Spark + H20 = Machine Learning at scaleSpark + H20 = Machine Learning at scale
Spark + H20 = Machine Learning at scale
 
R4ML: An R Based Scalable Machine Learning Framework
R4ML: An R Based Scalable Machine Learning FrameworkR4ML: An R Based Scalable Machine Learning Framework
R4ML: An R Based Scalable Machine Learning Framework
 
Hadoop spark online demo
Hadoop spark online demoHadoop spark online demo
Hadoop spark online demo
 
Scalable Automatic Machine Learning in H2O
Scalable Automatic Machine Learning in H2OScalable Automatic Machine Learning in H2O
Scalable Automatic Machine Learning in H2O
 
The Nitty Gritty of Advanced Analytics Using Apache Spark in Python
The Nitty Gritty of Advanced Analytics Using Apache Spark in PythonThe Nitty Gritty of Advanced Analytics Using Apache Spark in Python
The Nitty Gritty of Advanced Analytics Using Apache Spark in Python
 
Automate Machine Learning Pipeline Using MLBox
Automate Machine Learning Pipeline Using MLBoxAutomate Machine Learning Pipeline Using MLBox
Automate Machine Learning Pipeline Using MLBox
 
How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....
How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....
How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....
 
Ml pipelines with Apache spark and Apache beam - Ottawa Reactive meetup Augus...
Ml pipelines with Apache spark and Apache beam - Ottawa Reactive meetup Augus...Ml pipelines with Apache spark and Apache beam - Ottawa Reactive meetup Augus...
Ml pipelines with Apache spark and Apache beam - Ottawa Reactive meetup Augus...
 
Building Machine Learning Inference Pipelines at Scale (July 2019)
Building Machine Learning Inference Pipelines at Scale (July 2019)Building Machine Learning Inference Pipelines at Scale (July 2019)
Building Machine Learning Inference Pipelines at Scale (July 2019)
 
Reproducible AI using MLflow and PyTorch
Reproducible AI using MLflow and PyTorchReproducible AI using MLflow and PyTorch
Reproducible AI using MLflow and PyTorch
 
Scalable AutoML for Time Series Forecasting using Ray
Scalable AutoML for Time Series Forecasting using RayScalable AutoML for Time Series Forecasting using Ray
Scalable AutoML for Time Series Forecasting using Ray
 
Analytics Metrics delivery and ML Feature visualization: Evolution of Data Pl...
Analytics Metrics delivery and ML Feature visualization: Evolution of Data Pl...Analytics Metrics delivery and ML Feature visualization: Evolution of Data Pl...
Analytics Metrics delivery and ML Feature visualization: Evolution of Data Pl...
 

Mehr von Sri Ambati

Mehr von Sri Ambati (20)

H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
Generative AI Masterclass - Model Risk Management.pptx
Generative AI Masterclass - Model Risk Management.pptxGenerative AI Masterclass - Model Risk Management.pptx
Generative AI Masterclass - Model Risk Management.pptx
 
AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek
 
LLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5thLLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5th
 
Building, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for ProductionBuilding, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for Production
 
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
 
Risk Management for LLMs
Risk Management for LLMsRisk Management for LLMs
Risk Management for LLMs
 
Open-Source AI: Community is the Way
Open-Source AI: Community is the WayOpen-Source AI: Community is the Way
Open-Source AI: Community is the Way
 
Building Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2OBuilding Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2O
 
Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical
 
Cutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM PapersCutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM Papers
 
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
 
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
 
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
 
LLM Interpretability
LLM Interpretability LLM Interpretability
LLM Interpretability
 
Never Reply to an Email Again
Never Reply to an Email AgainNever Reply to an Email Again
Never Reply to an Email Again
 
Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)
 
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
 
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
 
AI Foundations Course Module 1 - An AI Transformation Journey
AI Foundations Course Module 1 - An AI Transformation JourneyAI Foundations Course Module 1 - An AI Transformation Journey
AI Foundations Course Module 1 - An AI Transformation Journey
 

Kürzlich hochgeladen

Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Victor Rentea
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Victor Rentea
 

Kürzlich hochgeladen (20)

Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 

Intro to AutoML + Hands-on Lab - Erin LeDell, Machine Learning Scientist, H2O.ai

  • 1.
  • 2. Machine Learning Scientist @ledell / erin@h2o.ai Erin LeDell
  • 4. MEET THE MAKERS ERIN LEDELL Machine Learning Scientist NAVDEEP GILL Software Engineer
 & Data Scientist RAY PECK Director of Product Engineering
  • 5. • Intro to Automatic Machine Learning (AutoML) • Random Grid Search & Stacked Ensembles • H2O’s AutoML (R, Python, GUI) • H2O-3 Roadmap • Hands-on Tutorial Agenda
  • 6. Intro to AutoML Automatic Machine Learning
  • 7. Aspects of Automatic ML Model
 Generation EnsemblesData Prep
  • 8. Data Prep • Imputation of missing data • Standardization of numeric features • One-hot encoding of categorical features • Count/Label/Target encoding of categorical features • Feature selection and/or feature extraction (e.g. PCA) • Feature engineering
  • 9. Model Generation • Cartesian grid search • Random grid search • Tune individual models via Early Stopping • Bayesian Hyperparameter Optimization
  • 10. Ensembles • Bagging / Averaging • Stacking / Super Learning • Ensemble Selection
  • 11. Random Stacking Random Grids + Stacked Ensembles
  • 12. Stacked Ensembles • Specify L base learners (with model params). • Specify a metalearner (just another algo). • Perform k-fold cross-validation on the base learners.
  • 13. Stacked Ensembles • Collect cross-validated predicted values from base learners. • Train a second-level metalearning algorithm to find the optimal combination of base learners. • Metalearner requires only a small amount of compute on top of the cross-validation process (it’s cheap).
  • 14. Random Grid Search + Stacking • Random Grid Search combined with Stacked Ensembles is a powerful combination. • Stacked Ensembles perform particularly well if the models they are based on (1) are individually strong, and (2) make uncorrelated errors. • Random Grid Search is an excellent way to create a diverse of models for the ensemble.
  • 15. H2O AutoML Automatic Machine Learning in H2O
  • 16. H2O Machine Learning Platform • Distributed (multi-core + multi-node) implementations of cutting edge ML algorithms. • Core algorithms written in high performance Java. • APIs available in R, Python, Scala & web GUI. • Works on Hadoop, Spark, EC2, your laptop, etc. • Easily deploy models to production as pure Java code.
  • 17. H2O AutoML (first cut) • Imputation, one-hot encoding, standardization. • Random Grid Search over a custom hyperparameter space, defined by expert data scientists. • Early stopping of individual models and random grids. • GBMs, Random Forests, Deep Neural Nets, GLMs • Multiple Stacked Ensembles of models. • Leaderboard for ranking.
  • 18. H2O AutoML in R library(h2o) h2o.init()
 
 train <- h2o.importFile("train.csv") 
 aml <- h2o.automl(y = "response_colname", 
 training_frame = train,
 max_runtime_secs = 600) lb <- aml@leaderboard

  • 19. H2O AutoML in Python import h2o from h2o.automl import H2OAutoML h2o.init()
 
 train = h2o.import_file("train.csv")
 
 aml = H2OAutoML(max_runtime_secs = 600)
 aml.train(y = "response_colname", 
 training_frame = train)
 
 lb = aml.leaderboard
  • 21. Example Leaderboard for binary classification H2O AutoML Leaderboard
  • 23. Feature Q1 Q2 New Algorithm: Cox-Proportional Hazards GLM: Ordinal Regression GBM: Quasibinomial NLP Improvements, TF-IDF Stacked Ensemble: Custom Metalearner AutoML: New Ensembles AutoML: Add XGBoost Distributed XGBoost New Algorithm: Factorization Machines H2O-3 Roadmap https://tinyurl.com/h2o-automl-jira
  • 24. • Documentation: http://docs.h2o.ai • Tutorials: https://github.com/h2oai/h2o-tutorials • Slidedecks: https://github.com/h2oai/h2o-meetups • Videos: https://www.youtube.com/user/0xdata • Events & Meetups: http://h2o.ai/events • Stack Overflow: https://stackoverflow.com/tags/h2o • Google Group: https://tinyurl.com/h2ostream • Gitter: http://gitter.im/h2oai/h2o-3
  • 26. First-time Qwiklab Account Setup • Go to http://h2oai.qwiklab.com • Click on “JOIN” • Create a new account with a valid email address • You will receive a confirmation email • Click on the link in the confirmation email • Go back to http://h2oai.qwiklab.com and log in • Go to the Catalog on the left bar • Choose “Introduction to AutoML in H2O” • Wait for instructions
  • 27. https://tinyurl.com/automl-h2oworld17 Code and data available here H2O AutoML Tutorial