SlideShare ist ein Scribd-Unternehmen logo
1 von 16
Downloaden Sie, um offline zu lesen
H2O AutoML Roadmap 2016.10
Raymond Peck
Director of Product Engineering, H2O.ai
rpeck@h2o.ai
© H2O.ai, 2016 1
What Will We Cover?
• What is AutoML?
• What is the roadmap for H2O AutoML?
© H2O.ai, 2016 2
What is AutoML?
H2O AutoML automates parts of data preparation and model
training in order to help both Machine Learning / Data Science
experts and complete novices.
Other AutoML projects concentrate on novices.
© H2O.ai, 2016 3
Outside AutoML Projects
• auto-sklearn
• AutoCompete
• TPOT
• DataRobot
• Automatic Statistician
• BigML
• et al...
© H2O.ai, 2016 4
Who is the Target Audience?
• "Big green button" for novice users such as software
developers and business analysts;
• Iterative, interactive use and controls for expert users:
• Machine Learning experts
• Descriptive Data Scientists
© H2O.ai, 2016 5
What Are the Pieces?
• data cleaning
• feature engineering / feature generation
• feature selection
• for both the original and generated features
• model hyperparameter tuning
• automatic smart ensemble generation
© H2O.ai, 2016 6
Prior Work @ H2O
• ensembles (stacking), from Erin LeDell
• random hyperparameter search with automatic stopping,
from Raymond Peck
• some dataset characterization and feature engineering,
from Spencer Aiello
• hyperopt Bayesian hyperparameter optimization, from
Abhishek Malali
© H2O.ai, 2016 7
Current Work
• random hyperparameter search with parameter values
based on open datasets
• moving ensembles into the back end
• working on basic metalearning for hyperparameter vectors,
starting with 140 OpenML datasets
© H2O.ai, 2016 8
Future Work
• feature selection
• feature engineering for IID data
• Bayesian hyperparameter search with warm start
• feature engineering for non-IID data, e.g. time series
• iterate w/ larger datasets that are typical for our customers
• distribution guesser for regression
© H2O.ai, 2016 9
How Do We Evaluate Our Work?
• public datasets from
• OpenML
• ChaLearn AutoML challenge
• Kaggle
• our own Data Scientists' work with customer datasets
• customer feedback (soon)
© H2O.ai, 2016 10
Data Cleaning
• outlier analysis (with user feedback)
• sentinel value detection
• as a side-effect of outlier analysis
• type-based heuristics (e.g., 999999, 1970.01.01)
• identifier detection (e.g., customer ID)
• smart imputation
© H2O.ai, 2016 11
Feature Generation
We will be using several techniques including:
• type-based heuristics
• date/time expansion
• log and other transforms of numerics
• interactions (product, ratio, etc)
• feature generation with Deep Learning deepfeatures()
• clustering
© H2O.ai, 2016 12
Feature Selection
We will be evaluating several techniques including:
• Mutual Information (non-linear correlation)
• variable importance from GBM and Deep Learning
• PCA
• GLM with Elastic Net / LASSO
Perhaps different selectors for initial data and transforms / interactions
to trade off speed and the detection of non-linear relationships.
© H2O.ai, 2016 13
Hyperparameter Tuning
• currently do random hyperparameter search with metric-based
smart stopping
• hyperparameter values taken from hand-tuning 140 OpenML
datasets
• soon adding simple "nearest neighbors" warm start (basic
metalearning)
• then adding Bayesian hyperparameter optimization
• possibly integrating hyperopt into the back end
© H2O.ai, 2016 14
Automatic Smart Ensemble
Generation
• currently adding Erin LeDell's stacking / SuperLearner into the back end
• initially, ensemble top N models from hyperparameter searches
• optional "use original features"
• smarter ensemble generation for faster scoring, less overfitting:
• greedy ensemble creation
• ensemble models with uncorrelated residuals
© H2O.ai, 2016 15
Possible Futures
• multiple concurrent H2O clusters for speed
• freeze/thaw model training
• outlier analysis with user feedback
• residuals analysis with user feedback
• composite models using pre-clustering step
• try to predict accuracy from dataset metadata
• training time prediction
• scoring time prediction
© H2O.ai, 2016 16

Weitere ähnliche Inhalte

Andere mochten auch

400 million Search Results -Predict Contextual Ad Clicks
400 million Search Results -Predict Contextual Ad Clicks 400 million Search Results -Predict Contextual Ad Clicks
400 million Search Results -Predict Contextual Ad Clicks Sri Ambati
 
Robsonalves fotografia Fine Art 2016-2
Robsonalves fotografia Fine Art 2016-2Robsonalves fotografia Fine Art 2016-2
Robsonalves fotografia Fine Art 2016-2Robson Alves
 
Alice Lindorfer
Alice LindorferAlice Lindorfer
Alice LindorferAOtaki
 
Introduction to Deducer
Introduction to DeducerIntroduction to Deducer
Introduction to DeducerKazuki Yoshida
 
Data mining with Rattle For R
Data mining with Rattle For RData mining with Rattle For R
Data mining with Rattle For RAkhil Anil
 
Automating Machine Learning - Is it feasible?
Automating Machine Learning - Is it feasible?Automating Machine Learning - Is it feasible?
Automating Machine Learning - Is it feasible?Manuel Martín
 
R and Rcmdr Statistical Software
R and Rcmdr Statistical SoftwareR and Rcmdr Statistical Software
R and Rcmdr Statistical Softwarearttan2001
 
Automatic Machine Learning, AutoML
Automatic Machine Learning, AutoMLAutomatic Machine Learning, AutoML
Automatic Machine Learning, AutoMLHimadri Mishra
 

Andere mochten auch (10)

400 million Search Results -Predict Contextual Ad Clicks
400 million Search Results -Predict Contextual Ad Clicks 400 million Search Results -Predict Contextual Ad Clicks
400 million Search Results -Predict Contextual Ad Clicks
 
Robsonalves fotografia Fine Art 2016-2
Robsonalves fotografia Fine Art 2016-2Robsonalves fotografia Fine Art 2016-2
Robsonalves fotografia Fine Art 2016-2
 
Alice Lindorfer
Alice LindorferAlice Lindorfer
Alice Lindorfer
 
Introduction to Deducer
Introduction to DeducerIntroduction to Deducer
Introduction to Deducer
 
Data mining with Rattle For R
Data mining with Rattle For RData mining with Rattle For R
Data mining with Rattle For R
 
Installing R and R-Studio
Installing R and R-StudioInstalling R and R-Studio
Installing R and R-Studio
 
Automating Machine Learning - Is it feasible?
Automating Machine Learning - Is it feasible?Automating Machine Learning - Is it feasible?
Automating Machine Learning - Is it feasible?
 
R and Rcmdr Statistical Software
R and Rcmdr Statistical SoftwareR and Rcmdr Statistical Software
R and Rcmdr Statistical Software
 
Automatic Machine Learning, AutoML
Automatic Machine Learning, AutoMLAutomatic Machine Learning, AutoML
Automatic Machine Learning, AutoML
 
R-Studio Vs. Rcmdr
R-Studio Vs. RcmdrR-Studio Vs. Rcmdr
R-Studio Vs. Rcmdr
 

Ähnlich wie H2O Machine Learning AutoML Roadmap 2016.10

MongoDB .local Chicago 2019: MongoDB – Powering the new age data demands
MongoDB .local Chicago 2019: MongoDB – Powering the new age data demandsMongoDB .local Chicago 2019: MongoDB – Powering the new age data demands
MongoDB .local Chicago 2019: MongoDB – Powering the new age data demandsMongoDB
 
LIMS Implementation
LIMS ImplementationLIMS Implementation
LIMS ImplementationRobin Emig
 
AI as a Service, Build Shared AI Service Platforms Based on Deep Learning Tec...
AI as a Service, Build Shared AI Service Platforms Based on Deep Learning Tec...AI as a Service, Build Shared AI Service Platforms Based on Deep Learning Tec...
AI as a Service, Build Shared AI Service Platforms Based on Deep Learning Tec...Databricks
 
Building a Real-Time Security Application Using Log Data and Machine Learning...
Building a Real-Time Security Application Using Log Data and Machine Learning...Building a Real-Time Security Application Using Log Data and Machine Learning...
Building a Real-Time Security Application Using Log Data and Machine Learning...Sri Ambati
 
MongoDB World 2019: High Performance Auditing of Changes Based on MongoDB Cha...
MongoDB World 2019: High Performance Auditing of Changes Based on MongoDB Cha...MongoDB World 2019: High Performance Auditing of Changes Based on MongoDB Cha...
MongoDB World 2019: High Performance Auditing of Changes Based on MongoDB Cha...MongoDB
 
Comcast Labs Connect - PHLAI Conference Philadelphia 2018
Comcast Labs Connect - PHLAI Conference Philadelphia 2018 Comcast Labs Connect - PHLAI Conference Philadelphia 2018
Comcast Labs Connect - PHLAI Conference Philadelphia 2018 Open Data Group
 
Machine Learning - Eine Challenge für Architekten
Machine Learning - Eine Challenge für ArchitektenMachine Learning - Eine Challenge für Architekten
Machine Learning - Eine Challenge für ArchitektenHarald Erb
 
Techniques for scaling application with security and visibility in cloud
Techniques for scaling application with security and visibility in cloudTechniques for scaling application with security and visibility in cloud
Techniques for scaling application with security and visibility in cloudAkshay Mathur
 
Algolytics company Overview 2015
Algolytics company Overview 2015Algolytics company Overview 2015
Algolytics company Overview 2015Algolytics
 
Big Data LDN 2018: HOW AUTOMATION CAN ACCELERATE THE DELIVERY OF MACHINE LEAR...
Big Data LDN 2018: HOW AUTOMATION CAN ACCELERATE THE DELIVERY OF MACHINE LEAR...Big Data LDN 2018: HOW AUTOMATION CAN ACCELERATE THE DELIVERY OF MACHINE LEAR...
Big Data LDN 2018: HOW AUTOMATION CAN ACCELERATE THE DELIVERY OF MACHINE LEAR...Matt Stubbs
 
An Introduction to Graph: Database, Analytics, and Cloud Services
An Introduction to Graph:  Database, Analytics, and Cloud ServicesAn Introduction to Graph:  Database, Analytics, and Cloud Services
An Introduction to Graph: Database, Analytics, and Cloud ServicesJean Ihm
 
Making advertising personal, 4th NL Recommenders Meetup
Making advertising personal, 4th NL Recommenders MeetupMaking advertising personal, 4th NL Recommenders Meetup
Making advertising personal, 4th NL Recommenders MeetupOlivier Koch
 
Why Your Data Science Architecture Should Include a Data Virtualization Tool ...
Why Your Data Science Architecture Should Include a Data Virtualization Tool ...Why Your Data Science Architecture Should Include a Data Virtualization Tool ...
Why Your Data Science Architecture Should Include a Data Virtualization Tool ...Denodo
 
What is the future of data strategy?
What is the future of data strategy?What is the future of data strategy?
What is the future of data strategy?Denodo
 
SharePoint Site Redesign : Information Architecture and User-centered Design ...
SharePoint Site Redesign : Information Architecture and User-centered Design ...SharePoint Site Redesign : Information Architecture and User-centered Design ...
SharePoint Site Redesign : Information Architecture and User-centered Design ...arsathe
 

Ähnlich wie H2O Machine Learning AutoML Roadmap 2016.10 (20)

MongoDB .local Chicago 2019: MongoDB – Powering the new age data demands
MongoDB .local Chicago 2019: MongoDB – Powering the new age data demandsMongoDB .local Chicago 2019: MongoDB – Powering the new age data demands
MongoDB .local Chicago 2019: MongoDB – Powering the new age data demands
 
LIMS Implementation
LIMS ImplementationLIMS Implementation
LIMS Implementation
 
AI as a Service, Build Shared AI Service Platforms Based on Deep Learning Tec...
AI as a Service, Build Shared AI Service Platforms Based on Deep Learning Tec...AI as a Service, Build Shared AI Service Platforms Based on Deep Learning Tec...
AI as a Service, Build Shared AI Service Platforms Based on Deep Learning Tec...
 
Building a Real-Time Security Application Using Log Data and Machine Learning...
Building a Real-Time Security Application Using Log Data and Machine Learning...Building a Real-Time Security Application Using Log Data and Machine Learning...
Building a Real-Time Security Application Using Log Data and Machine Learning...
 
DevOps Days Rockies MLOps
DevOps Days Rockies MLOpsDevOps Days Rockies MLOps
DevOps Days Rockies MLOps
 
MongoDB World 2019: High Performance Auditing of Changes Based on MongoDB Cha...
MongoDB World 2019: High Performance Auditing of Changes Based on MongoDB Cha...MongoDB World 2019: High Performance Auditing of Changes Based on MongoDB Cha...
MongoDB World 2019: High Performance Auditing of Changes Based on MongoDB Cha...
 
Comcast Labs Connect - PHLAI Conference Philadelphia 2018
Comcast Labs Connect - PHLAI Conference Philadelphia 2018 Comcast Labs Connect - PHLAI Conference Philadelphia 2018
Comcast Labs Connect - PHLAI Conference Philadelphia 2018
 
Machine Learning - Eine Challenge für Architekten
Machine Learning - Eine Challenge für ArchitektenMachine Learning - Eine Challenge für Architekten
Machine Learning - Eine Challenge für Architekten
 
Bigowl aitech
Bigowl aitechBigowl aitech
Bigowl aitech
 
Techniques for scaling application with security and visibility in cloud
Techniques for scaling application with security and visibility in cloudTechniques for scaling application with security and visibility in cloud
Techniques for scaling application with security and visibility in cloud
 
Algolytics company Overview 2015
Algolytics company Overview 2015Algolytics company Overview 2015
Algolytics company Overview 2015
 
Algolytics company Overview 2015
Algolytics company Overview 2015Algolytics company Overview 2015
Algolytics company Overview 2015
 
Big Data LDN 2018: HOW AUTOMATION CAN ACCELERATE THE DELIVERY OF MACHINE LEAR...
Big Data LDN 2018: HOW AUTOMATION CAN ACCELERATE THE DELIVERY OF MACHINE LEAR...Big Data LDN 2018: HOW AUTOMATION CAN ACCELERATE THE DELIVERY OF MACHINE LEAR...
Big Data LDN 2018: HOW AUTOMATION CAN ACCELERATE THE DELIVERY OF MACHINE LEAR...
 
An Introduction to Graph: Database, Analytics, and Cloud Services
An Introduction to Graph:  Database, Analytics, and Cloud ServicesAn Introduction to Graph:  Database, Analytics, and Cloud Services
An Introduction to Graph: Database, Analytics, and Cloud Services
 
Machine Learning
Machine LearningMachine Learning
Machine Learning
 
Internship Presentation.pdf
Internship Presentation.pdfInternship Presentation.pdf
Internship Presentation.pdf
 
Making advertising personal, 4th NL Recommenders Meetup
Making advertising personal, 4th NL Recommenders MeetupMaking advertising personal, 4th NL Recommenders Meetup
Making advertising personal, 4th NL Recommenders Meetup
 
Why Your Data Science Architecture Should Include a Data Virtualization Tool ...
Why Your Data Science Architecture Should Include a Data Virtualization Tool ...Why Your Data Science Architecture Should Include a Data Virtualization Tool ...
Why Your Data Science Architecture Should Include a Data Virtualization Tool ...
 
What is the future of data strategy?
What is the future of data strategy?What is the future of data strategy?
What is the future of data strategy?
 
SharePoint Site Redesign : Information Architecture and User-centered Design ...
SharePoint Site Redesign : Information Architecture and User-centered Design ...SharePoint Site Redesign : Information Architecture and User-centered Design ...
SharePoint Site Redesign : Information Architecture and User-centered Design ...
 

Kürzlich hochgeladen

Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...nirzagarg
 
SR-101-01012024-EN.docx Federal Constitution of the Swiss Confederation
SR-101-01012024-EN.docx  Federal Constitution  of the Swiss ConfederationSR-101-01012024-EN.docx  Federal Constitution  of the Swiss Confederation
SR-101-01012024-EN.docx Federal Constitution of the Swiss ConfederationEfruzAsilolu
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNKTimothy Spann
 
Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1ranjankumarbehera14
 
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangePredicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangeThinkInnovation
 
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...gajnagarg
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...nirzagarg
 
Switzerland Constitution 2002.pdf.........
Switzerland Constitution 2002.pdf.........Switzerland Constitution 2002.pdf.........
Switzerland Constitution 2002.pdf.........EfruzAsilolu
 
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格q6pzkpark
 
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制vexqp
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...nirzagarg
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Researchmichael115558
 
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制vexqp
 
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样wsppdmt
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...ZurliaSoop
 
7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.pptibrahimabdi22
 
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制vexqp
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...gajnagarg
 
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...gajnagarg
 

Kürzlich hochgeladen (20)

Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
 
SR-101-01012024-EN.docx Federal Constitution of the Swiss Confederation
SR-101-01012024-EN.docx  Federal Constitution  of the Swiss ConfederationSR-101-01012024-EN.docx  Federal Constitution  of the Swiss Confederation
SR-101-01012024-EN.docx Federal Constitution of the Swiss Confederation
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
 
Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1
 
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangePredicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
 
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Switzerland Constitution 2002.pdf.........
Switzerland Constitution 2002.pdf.........Switzerland Constitution 2002.pdf.........
Switzerland Constitution 2002.pdf.........
 
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
 
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
 
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt
 
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
 
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
 

H2O Machine Learning AutoML Roadmap 2016.10

  • 1. H2O AutoML Roadmap 2016.10 Raymond Peck Director of Product Engineering, H2O.ai rpeck@h2o.ai © H2O.ai, 2016 1
  • 2. What Will We Cover? • What is AutoML? • What is the roadmap for H2O AutoML? © H2O.ai, 2016 2
  • 3. What is AutoML? H2O AutoML automates parts of data preparation and model training in order to help both Machine Learning / Data Science experts and complete novices. Other AutoML projects concentrate on novices. © H2O.ai, 2016 3
  • 4. Outside AutoML Projects • auto-sklearn • AutoCompete • TPOT • DataRobot • Automatic Statistician • BigML • et al... © H2O.ai, 2016 4
  • 5. Who is the Target Audience? • "Big green button" for novice users such as software developers and business analysts; • Iterative, interactive use and controls for expert users: • Machine Learning experts • Descriptive Data Scientists © H2O.ai, 2016 5
  • 6. What Are the Pieces? • data cleaning • feature engineering / feature generation • feature selection • for both the original and generated features • model hyperparameter tuning • automatic smart ensemble generation © H2O.ai, 2016 6
  • 7. Prior Work @ H2O • ensembles (stacking), from Erin LeDell • random hyperparameter search with automatic stopping, from Raymond Peck • some dataset characterization and feature engineering, from Spencer Aiello • hyperopt Bayesian hyperparameter optimization, from Abhishek Malali © H2O.ai, 2016 7
  • 8. Current Work • random hyperparameter search with parameter values based on open datasets • moving ensembles into the back end • working on basic metalearning for hyperparameter vectors, starting with 140 OpenML datasets © H2O.ai, 2016 8
  • 9. Future Work • feature selection • feature engineering for IID data • Bayesian hyperparameter search with warm start • feature engineering for non-IID data, e.g. time series • iterate w/ larger datasets that are typical for our customers • distribution guesser for regression © H2O.ai, 2016 9
  • 10. How Do We Evaluate Our Work? • public datasets from • OpenML • ChaLearn AutoML challenge • Kaggle • our own Data Scientists' work with customer datasets • customer feedback (soon) © H2O.ai, 2016 10
  • 11. Data Cleaning • outlier analysis (with user feedback) • sentinel value detection • as a side-effect of outlier analysis • type-based heuristics (e.g., 999999, 1970.01.01) • identifier detection (e.g., customer ID) • smart imputation © H2O.ai, 2016 11
  • 12. Feature Generation We will be using several techniques including: • type-based heuristics • date/time expansion • log and other transforms of numerics • interactions (product, ratio, etc) • feature generation with Deep Learning deepfeatures() • clustering © H2O.ai, 2016 12
  • 13. Feature Selection We will be evaluating several techniques including: • Mutual Information (non-linear correlation) • variable importance from GBM and Deep Learning • PCA • GLM with Elastic Net / LASSO Perhaps different selectors for initial data and transforms / interactions to trade off speed and the detection of non-linear relationships. © H2O.ai, 2016 13
  • 14. Hyperparameter Tuning • currently do random hyperparameter search with metric-based smart stopping • hyperparameter values taken from hand-tuning 140 OpenML datasets • soon adding simple "nearest neighbors" warm start (basic metalearning) • then adding Bayesian hyperparameter optimization • possibly integrating hyperopt into the back end © H2O.ai, 2016 14
  • 15. Automatic Smart Ensemble Generation • currently adding Erin LeDell's stacking / SuperLearner into the back end • initially, ensemble top N models from hyperparameter searches • optional "use original features" • smarter ensemble generation for faster scoring, less overfitting: • greedy ensemble creation • ensemble models with uncorrelated residuals © H2O.ai, 2016 15
  • 16. Possible Futures • multiple concurrent H2O clusters for speed • freeze/thaw model training • outlier analysis with user feedback • residuals analysis with user feedback • composite models using pre-clustering step • try to predict accuracy from dataset metadata • training time prediction • scoring time prediction © H2O.ai, 2016 16