SlideShare ist ein Scribd-Unternehmen logo
1 von 27
QCon London
H2O Driverless AI
An AI that creates AI!
Marios Michailidis
InfoQ.com: News & Community Site
Watch the video with slide
synchronization on InfoQ.com!
https://www.infoq.com/presentations/
h2o-driverless-ai
• Over 1,000,000 software developers, architects and CTOs read the site world-
wide every month
• 250,000 senior developers subscribe to our weekly newsletter
• Published in 4 languages (English, Chinese, Japanese and Brazilian
Portuguese)
• Post content from our QCon conferences
• 2 dedicated podcast channels: The InfoQ Podcast, with a focus on
Architecture and The Engineering Culture Podcast, with a focus on building
• 96 deep dives on innovative topics packed as downloadable emags and
minibooks
• Over 40 new content items per week
Purpose of QCon
- to empower software development by facilitating the spread of
knowledge and innovation
Strategy
- practitioner-driven conference designed for YOU: influencers of
change and innovation in your teams
- speakers and topics driving the evolution and innovation
- connecting and catalyzing the influencers and innovators
Highlights
- attended by more than 12,000 delegates since 2007
- held in 9 cities worldwide
Presented at QCon London
www.qconlondon.com
Background
• Competitive data scientist at H2O.ai
• PhD in ensemble methods at UCL
• Former kaggle #1 – over 120+ competitions
Features
Target
Data Quality and
Transformation
Modeling
Table
Model
Building
Model
Data Integration
+
Challenges in the Machine Learning Workflow
Weeks or even Months per Model Optimization
Highly Iterative Process
• Insight – Visualization
• Cross Validation
• Feature Engineering
• Model Selection
• Hyper Parameter Optimization
• Feature Selection
• Ensemble
• Understanding/Interpreting the results
• Deploy/Productionize
Confidential4
Visualizations
outlier detection Correlation graph Heat map
Confidential5
Animal
Dog
Dog
Cat
Fish
Cat
Dog
Fish
Cat
Cat
Fish
L.Encoding
2
2
1
3
1
2
3
1
1
3
Frequency
3
3
4
3
4
3
3
4
4
3
Is_Dog? Is_Cat? Is_Fish?
1 0 0
1 0 0
0 1 0
0 0 1
0 1 0
1 0 0
0 0 1
0 1 0
0 1 0
0 0 1
Cost
100
150
300
50
250
75
25
400
225
40
Animal Avg.cost
Cat 293.75
Dog 108.333
Fish 38.3333
108.33
108.33
293.75
38.33
293.75
108.33
38.33
293.75
293.75
38.33
Feature engineering: Categorical features
Confidential6
Age
40
45
45
45
49
51
56
61
62
71
73
74
77
78
78
Age
40-51
52-56
57-77
78+
Age
40
45
45
45
missing
51
56
61
62
71
73
74
77
78
78
Age
40
45
45
45
61.14
51
56
61
62
71
73
74
77
78
78
Age
40-51
52-56
57-77
78+
missing
Age
40-51
40-51
40-51
40-51
40-51
40-51
52-56
57-77
57-77
57-77
57-77
57-77
57-77
78+
78+
Age
40-51
40-51
40-51
40-51
missing
40-51
52-56
57-77
57-77
57-77
57-77
57-77
57-77
78+
78+
Mean=
61.14
Feature engineering: Numerical features
Age
Income
Confidential7
Age
incom
e
40 40
45 14
45 100
45 33
49 20
51 44
56 65
61 80
62 55
71 15
73 18
74 33
77 32
78 48
78 41
* +
1600 80
630 59
4500 145
1485 78
980 69
2244 95
3640 121
4880 141
3410 117
1065 86
1314 91
2442 107
2464 109
3744 126
3198 119
Animal color
Dog brown
Dog white
Cat black
Fish gold
Cat mixed
Dog black
Fish white
Cat white
Cat black
Fish brown
Animal_color
Dog_brown
Dog_white
Cat_black
Fish_gold
Cat_mixed
Dog_black
Fish_white
Cat_white
Cat_black
Fish_brown
Age Animal
13 Dog
12 Dog
15 Cat
3 Fish
16 Cat
10 Dog
6 Fish
20 Cat
13 Cat
2 Fish
derive
d
11.66
11.66
16
3.66
16
11.66
3.66
16
16
3.66
Animal Age
Dog 11.66
Cat 16
Fish 3.66
Feature engineering: Interactions
Confidential8
This dog is cute! I love dogs
dog this is jedi love cute
empero
r bingo ! I
2 1 1 0 1 1 0 0 1 1
• SVD
• Word2vec
word f1 f2 f3 f4 f5
dog 0.8 0.2 0.1 -0.1 -0.4
this -0.2 0 0.1 0.8 0.3
love 0.1 0.2 0.7 -0.5 0.1
cute 0.2 0.2 0.8 -0.4 0
King – Man =
Queen
Feature engineering: Text
Confidential9
Date
1/1/2018
2/1/2018
3/1/2018
4/1/2018
5/1/2018
6/1/2018
7/1/2018
8/1/2018
9/1/2018
10/1/2018
Day Month Year
weekda
y weeknum
1 1 2018 2 1
2 1 2018 3 1
3 1 2018 4 1
4 1 2018 5 1
5 1 2018 6 1
6 1 2018 7 1
7 1 2018 1 2
8 1 2018 2 2
9 1 2018 3 2
10 1 2018 4 2
Date Sales
1/1/2018 100
2/1/2018 150
3/1/2018 160
4/1/2018 200
5/1/2018 210
6/1/2018 150
7/1/2018 160
8/1/2018 120
9/1/2018 80
10/1/2018 70
Lag1 Lag2
- -
100 -
150 100
160 150
200 160
210 200
150 210
160 150
120 160
80 120
MA2
-
100
125
155
180
205
180
155
140
100
Feature engineering: Time series
Confidential10
Common open source packages used in ML
LIBLINEAR
LIBFFM
LIBSVM
Confidential11
• Optimizing maximum depth
• Tree expansion (loss vs depth)
• Learning rate
• Number of trees
1
2
3 3
2
Hyper parameter tuning
Copyright 2018 H2O.ai Inc. All Rights Reserved.
Validation approach
Train dataPast Future
x0 x1 x2 x3 y
0.94 0.27 0.80 0.34 1
0.02 0.22 0.17 0.84 0
0.83 0.11 0.23 0.42 1
0.74 0.26 0.03 0.41 0
0.08 0.29 0.76 0.37 0
0.71 0.76 0.43 0.95 1
0.08 0.72 0.97 0.04 0
0.84 0.79 0.89 0.05 1
0.94 0.27 0.80 0.34 1
0.02 0.22 0.17 0.84 0
0.83 0.11 0.23 0.42 1
0.74 0.26 0.03 0.41 0
0.08 0.29 0.76 0.37 0
0.71 0.76 0.43 0.95 1
0.08 0.72 0.97 0.04 0
0.84 0.79 0.89 0.05 1
K=4
pred
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
Fold : 1
pred
0.96
0.03
0.00
0.00
0.00
0.00
0.00
0.00
Train
Predict pred
0.96
0.03
0.90
0.12
0.00
0.00
0.00
0.00
Fold : 2Fold : 3
pred
0.96
0.03
0.90
0.12
0.03
0.77
0.00
0.00
Fold : 4
pred
0.96
0.03
0.90
0.12
0.03
0.77
0.18
0.91
Confidential13
Genetic algorithm approach
x1 x2 x3 x4 y
0.14 0.69 0.01 0.71 1
0.22 0.44 0.45 0.69 1
0.12 0.35 0.51 0.23 0
0.22 0.42 0.79 0.60 1
0.93 0.82 0.72 0.50 1
0.32 0.58 0.28 0.22 0
0.95 0.59 0.68 0.09 1
0.34 0.58 0.35 0.81 0
0.05 0.80 0.28 0.86 1
0.23 0.49 0.63 0.03 0
0.05 0.34 0.53 0.73 1
0.74 0.02 0.33 0.56 0
Iteration1/10
X% accuracy
Feature Importance
x4 1
x2 0.5
x3 0.2
x1 0.01
Tuned
Iteration2/10
+ Random Transformation
sqrt(x2)
Z% accuracy
Feature Importance
x2+x4 1
x4 0.4
sqrt(x2) 0.3
x3 0.2
x2 0.05
Confidential14
Feature ranking based on permutations
0.5 0.52 0.69 0.2 1
0.18 0.94 0.57 0.68 0
0.67 0.6 0.76 0.91 1
0.25 0.68 0.57 0.07 1
0.83 0.07 0.76 0.56 0
0.49 0.92 0.64 0.02 0
0.17 0.7 0.57 1
0.96 0.75 0.59 1
0.96 0.14 0.09 0
0.61 0.38 0.07 0
0.07 0.13 0.87 1
0.66 0.27 0.48 1
0.41
0.56
0.21
0.61
0.26
0.02
0.02
0.56
0.21
0.41
0.61
0.26
Train
Validation
fit
score
80% accuracy 70% accuracy
(drop)
Shuffle The 10% difference
is how important that
feature is
Bring back to
normal
Repeat for
all features
Confidential15
Stacking
X0 x1 x2 xn y
0.17 0.25 0.93 0.79 1
0.35 0.61 0.93 0.57 0
0.44 0.59 0.56 0.46 0
0.37 0.43 0.74 0.28 1
0.96 0.07 0.57 0.01 1
A
X0 x1 x2 xn y
0.89 0.72 0.50 0.66 0
0.58 0.71 0.92 0.27 1
0.10 0.35 0.27 0.37 0
0.47 0.68 0.30 0.98 0
0.39 0.53 0.59 0.18 1
B
X0 x1 x2 xn y
0.29 0.77 0.05 0.09 ?
0.38 0.66 0.42 0.91 ?
0.72 0.66 0.92 0.11 ?
0.70 0.37 0.91 0.17 ?
0.59 0.98 0.93 0.65 ?
C
pred0 pred1 pred2 y
0.24 0.72 0.70 0
0.95 0.25 0.22 1
0.64 0.80 0.96 0
0.89 0.58 0.52 0
0.11 0.20 0.93 1
B1
pred0 pred1 pred2 y
0.50 0.50 0.39 ?
0.62 0.59 0.46 ?
0.22 0.31 0.54 ?
0.90 0.47 0.09 ?
0.20 0.09 0.61 ?
C1
Train algorithm 0 on A and make predictions for B and C and save to B1, C1
Train algorithm 1 on A and make predictions for B and C and save to B1, C1
Train algorithm 2 on A and make predictions for B and C and save to B1, C1
Train algorithm 3 on B1 and make predictions for C1
Preds3
0.45
0.23
0.99
0.34
0.05
Consider datasets A,B,C. Target variable (y) is known for A,B
Confidential16
“The ability to explain or to present in understandable terms to a human.”
– Finale Doshi-Velez and Been Kim. “Towards a rigorous science of interpretable
machine learning.” arXiv preprint. 2017. https://arxiv.org/pdf/1702.08608.pdf
FAT*: https://www.fatml.org/resources/principles-for-accountable-algorithms
XAI: https://www.darpa.mil/program/explainable-artificial-intelligence
Machine Learning Interpretability?
Confidential17
Interpretability and accuracy trade-off
Linear Models
Exact explanations for
approximate models.
Machine Learning
Approximate explanations
for exact models.
Confidential18
Decision Tree Surrogate Models
Partial Dependence and Individual Conditional Expectation
https://github.com/jphall663/interpretable_machine_learning_with_python
Machine Learning Interpretability examples
H2O.ai Product Suite
Automatic feature engineering,
machine learning and interpretability
• 100% open source – Apache V2 licensed
• Built for data scientists – interface using R, Python on H2O Flow
(interactive notebook interface)
• Enterprise support subscriptions
• Enterprise software
• Built for domain users, analysts and
data scientists – GUI-based interface
for end-to-end data science
• Fully automated machine learning from
ingest to deployment
H2O AI open source engine
integration with Spark
Lightning fast machine
learning on GPUs
In-memory, distributed
machine learning algorithms
with H2O Flow GUI
Open Source
Confidential20
H2O is the open source leader in AI.
Democratize AI for Everyone.
AI for good.
Confidential21
Will there be a default?
AUC, Accuracy, Precision…
Time, hardware
Insight-visualization
Feature Engineering
Predictions
Model Interpretability
• Some input data
• A target variable
• An objective (or a success metric)
• Some allocated resources
H2O Driverless AI process
x1 x2 x3 x4 y
0.14 0.69 0.01 0.71 1
0.22 0.44 0.45 0.69 1
0.12 0.35 0.51 0.23 0
0.22 0.42 0.79 0.60 1
0.93 0.82 0.72 0.50 1
0.32 0.58 0.28 0.22 0
0.95 0.59 0.68 0.09 1
0.34 0.58 0.35 0.81 0
0.05 0.80 0.28 0.86 1
0.23 0.49 0.63 0.03 0
0.05 0.34 0.53 0.73 1
0.74 0.02 0.33 0.56 0
Confidential22
Scoring Pipeline
• Python
• MOJO (Java)
Confidential23
Demo
Confidential24
Thank you
marios@h2o.ai
in/mariosmichailidis/
@StackNet_
https://www.facebook.com/StackNet/
Watch the video with slide
synchronization on InfoQ.com!
https://www.infoq.com/presentations/
h2o-driverless-ai

Weitere ähnliche Inhalte

Mehr von C4Media

Shifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CDShifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CDC4Media
 
CI/CD for Machine Learning
CI/CD for Machine LearningCI/CD for Machine Learning
CI/CD for Machine LearningC4Media
 
Fault Tolerance at Speed
Fault Tolerance at SpeedFault Tolerance at Speed
Fault Tolerance at SpeedC4Media
 
Architectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep SystemsArchitectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep SystemsC4Media
 
ML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.jsML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.jsC4Media
 
Build Your Own WebAssembly Compiler
Build Your Own WebAssembly CompilerBuild Your Own WebAssembly Compiler
Build Your Own WebAssembly CompilerC4Media
 
User & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix ScaleUser & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix ScaleC4Media
 
Scaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's EdgeScaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's EdgeC4Media
 
Make Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home EverywhereMake Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home EverywhereC4Media
 
The Talk You've Been Await-ing For
The Talk You've Been Await-ing ForThe Talk You've Been Await-ing For
The Talk You've Been Await-ing ForC4Media
 
Future of Data Engineering
Future of Data EngineeringFuture of Data Engineering
Future of Data EngineeringC4Media
 
Automated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and MoreAutomated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and MoreC4Media
 
Navigating Complexity: High-performance Delivery and Discovery Teams
Navigating Complexity: High-performance Delivery and Discovery TeamsNavigating Complexity: High-performance Delivery and Discovery Teams
Navigating Complexity: High-performance Delivery and Discovery TeamsC4Media
 
High Performance Cooperative Distributed Systems in Adtech
High Performance Cooperative Distributed Systems in AdtechHigh Performance Cooperative Distributed Systems in Adtech
High Performance Cooperative Distributed Systems in AdtechC4Media
 
Rust's Journey to Async/await
Rust's Journey to Async/awaitRust's Journey to Async/await
Rust's Journey to Async/awaitC4Media
 
Opportunities and Pitfalls of Event-Driven Utopia
Opportunities and Pitfalls of Event-Driven UtopiaOpportunities and Pitfalls of Event-Driven Utopia
Opportunities and Pitfalls of Event-Driven UtopiaC4Media
 
Datadog: a Real-Time Metrics Database for One Quadrillion Points/Day
Datadog: a Real-Time Metrics Database for One Quadrillion Points/DayDatadog: a Real-Time Metrics Database for One Quadrillion Points/Day
Datadog: a Real-Time Metrics Database for One Quadrillion Points/DayC4Media
 
Are We Really Cloud-Native?
Are We Really Cloud-Native?Are We Really Cloud-Native?
Are We Really Cloud-Native?C4Media
 
CockroachDB: Architecture of a Geo-Distributed SQL Database
CockroachDB: Architecture of a Geo-Distributed SQL DatabaseCockroachDB: Architecture of a Geo-Distributed SQL Database
CockroachDB: Architecture of a Geo-Distributed SQL DatabaseC4Media
 
A Dive into Streams @LinkedIn with Brooklin
A Dive into Streams @LinkedIn with BrooklinA Dive into Streams @LinkedIn with Brooklin
A Dive into Streams @LinkedIn with BrooklinC4Media
 

Mehr von C4Media (20)

Shifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CDShifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CD
 
CI/CD for Machine Learning
CI/CD for Machine LearningCI/CD for Machine Learning
CI/CD for Machine Learning
 
Fault Tolerance at Speed
Fault Tolerance at SpeedFault Tolerance at Speed
Fault Tolerance at Speed
 
Architectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep SystemsArchitectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep Systems
 
ML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.jsML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.js
 
Build Your Own WebAssembly Compiler
Build Your Own WebAssembly CompilerBuild Your Own WebAssembly Compiler
Build Your Own WebAssembly Compiler
 
User & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix ScaleUser & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix Scale
 
Scaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's EdgeScaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's Edge
 
Make Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home EverywhereMake Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home Everywhere
 
The Talk You've Been Await-ing For
The Talk You've Been Await-ing ForThe Talk You've Been Await-ing For
The Talk You've Been Await-ing For
 
Future of Data Engineering
Future of Data EngineeringFuture of Data Engineering
Future of Data Engineering
 
Automated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and MoreAutomated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and More
 
Navigating Complexity: High-performance Delivery and Discovery Teams
Navigating Complexity: High-performance Delivery and Discovery TeamsNavigating Complexity: High-performance Delivery and Discovery Teams
Navigating Complexity: High-performance Delivery and Discovery Teams
 
High Performance Cooperative Distributed Systems in Adtech
High Performance Cooperative Distributed Systems in AdtechHigh Performance Cooperative Distributed Systems in Adtech
High Performance Cooperative Distributed Systems in Adtech
 
Rust's Journey to Async/await
Rust's Journey to Async/awaitRust's Journey to Async/await
Rust's Journey to Async/await
 
Opportunities and Pitfalls of Event-Driven Utopia
Opportunities and Pitfalls of Event-Driven UtopiaOpportunities and Pitfalls of Event-Driven Utopia
Opportunities and Pitfalls of Event-Driven Utopia
 
Datadog: a Real-Time Metrics Database for One Quadrillion Points/Day
Datadog: a Real-Time Metrics Database for One Quadrillion Points/DayDatadog: a Real-Time Metrics Database for One Quadrillion Points/Day
Datadog: a Real-Time Metrics Database for One Quadrillion Points/Day
 
Are We Really Cloud-Native?
Are We Really Cloud-Native?Are We Really Cloud-Native?
Are We Really Cloud-Native?
 
CockroachDB: Architecture of a Geo-Distributed SQL Database
CockroachDB: Architecture of a Geo-Distributed SQL DatabaseCockroachDB: Architecture of a Geo-Distributed SQL Database
CockroachDB: Architecture of a Geo-Distributed SQL Database
 
A Dive into Streams @LinkedIn with Brooklin
A Dive into Streams @LinkedIn with BrooklinA Dive into Streams @LinkedIn with Brooklin
A Dive into Streams @LinkedIn with Brooklin
 

Kürzlich hochgeladen

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfhans926745
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdfChristopherTHyatt
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 

Kürzlich hochgeladen (20)

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdf
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 

H2O's Driverless AI: An AI That Creates AI

  • 1. QCon London H2O Driverless AI An AI that creates AI! Marios Michailidis
  • 2. InfoQ.com: News & Community Site Watch the video with slide synchronization on InfoQ.com! https://www.infoq.com/presentations/ h2o-driverless-ai • Over 1,000,000 software developers, architects and CTOs read the site world- wide every month • 250,000 senior developers subscribe to our weekly newsletter • Published in 4 languages (English, Chinese, Japanese and Brazilian Portuguese) • Post content from our QCon conferences • 2 dedicated podcast channels: The InfoQ Podcast, with a focus on Architecture and The Engineering Culture Podcast, with a focus on building • 96 deep dives on innovative topics packed as downloadable emags and minibooks • Over 40 new content items per week
  • 3. Purpose of QCon - to empower software development by facilitating the spread of knowledge and innovation Strategy - practitioner-driven conference designed for YOU: influencers of change and innovation in your teams - speakers and topics driving the evolution and innovation - connecting and catalyzing the influencers and innovators Highlights - attended by more than 12,000 delegates since 2007 - held in 9 cities worldwide Presented at QCon London www.qconlondon.com
  • 4. Background • Competitive data scientist at H2O.ai • PhD in ensemble methods at UCL • Former kaggle #1 – over 120+ competitions
  • 5. Features Target Data Quality and Transformation Modeling Table Model Building Model Data Integration + Challenges in the Machine Learning Workflow Weeks or even Months per Model Optimization Highly Iterative Process • Insight – Visualization • Cross Validation • Feature Engineering • Model Selection • Hyper Parameter Optimization • Feature Selection • Ensemble • Understanding/Interpreting the results • Deploy/Productionize
  • 7. Confidential5 Animal Dog Dog Cat Fish Cat Dog Fish Cat Cat Fish L.Encoding 2 2 1 3 1 2 3 1 1 3 Frequency 3 3 4 3 4 3 3 4 4 3 Is_Dog? Is_Cat? Is_Fish? 1 0 0 1 0 0 0 1 0 0 0 1 0 1 0 1 0 0 0 0 1 0 1 0 0 1 0 0 0 1 Cost 100 150 300 50 250 75 25 400 225 40 Animal Avg.cost Cat 293.75 Dog 108.333 Fish 38.3333 108.33 108.33 293.75 38.33 293.75 108.33 38.33 293.75 293.75 38.33 Feature engineering: Categorical features
  • 9. Confidential7 Age incom e 40 40 45 14 45 100 45 33 49 20 51 44 56 65 61 80 62 55 71 15 73 18 74 33 77 32 78 48 78 41 * + 1600 80 630 59 4500 145 1485 78 980 69 2244 95 3640 121 4880 141 3410 117 1065 86 1314 91 2442 107 2464 109 3744 126 3198 119 Animal color Dog brown Dog white Cat black Fish gold Cat mixed Dog black Fish white Cat white Cat black Fish brown Animal_color Dog_brown Dog_white Cat_black Fish_gold Cat_mixed Dog_black Fish_white Cat_white Cat_black Fish_brown Age Animal 13 Dog 12 Dog 15 Cat 3 Fish 16 Cat 10 Dog 6 Fish 20 Cat 13 Cat 2 Fish derive d 11.66 11.66 16 3.66 16 11.66 3.66 16 16 3.66 Animal Age Dog 11.66 Cat 16 Fish 3.66 Feature engineering: Interactions
  • 10. Confidential8 This dog is cute! I love dogs dog this is jedi love cute empero r bingo ! I 2 1 1 0 1 1 0 0 1 1 • SVD • Word2vec word f1 f2 f3 f4 f5 dog 0.8 0.2 0.1 -0.1 -0.4 this -0.2 0 0.1 0.8 0.3 love 0.1 0.2 0.7 -0.5 0.1 cute 0.2 0.2 0.8 -0.4 0 King – Man = Queen Feature engineering: Text
  • 11. Confidential9 Date 1/1/2018 2/1/2018 3/1/2018 4/1/2018 5/1/2018 6/1/2018 7/1/2018 8/1/2018 9/1/2018 10/1/2018 Day Month Year weekda y weeknum 1 1 2018 2 1 2 1 2018 3 1 3 1 2018 4 1 4 1 2018 5 1 5 1 2018 6 1 6 1 2018 7 1 7 1 2018 1 2 8 1 2018 2 2 9 1 2018 3 2 10 1 2018 4 2 Date Sales 1/1/2018 100 2/1/2018 150 3/1/2018 160 4/1/2018 200 5/1/2018 210 6/1/2018 150 7/1/2018 160 8/1/2018 120 9/1/2018 80 10/1/2018 70 Lag1 Lag2 - - 100 - 150 100 160 150 200 160 210 200 150 210 160 150 120 160 80 120 MA2 - 100 125 155 180 205 180 155 140 100 Feature engineering: Time series
  • 12. Confidential10 Common open source packages used in ML LIBLINEAR LIBFFM LIBSVM
  • 13. Confidential11 • Optimizing maximum depth • Tree expansion (loss vs depth) • Learning rate • Number of trees 1 2 3 3 2 Hyper parameter tuning
  • 14. Copyright 2018 H2O.ai Inc. All Rights Reserved. Validation approach Train dataPast Future x0 x1 x2 x3 y 0.94 0.27 0.80 0.34 1 0.02 0.22 0.17 0.84 0 0.83 0.11 0.23 0.42 1 0.74 0.26 0.03 0.41 0 0.08 0.29 0.76 0.37 0 0.71 0.76 0.43 0.95 1 0.08 0.72 0.97 0.04 0 0.84 0.79 0.89 0.05 1 0.94 0.27 0.80 0.34 1 0.02 0.22 0.17 0.84 0 0.83 0.11 0.23 0.42 1 0.74 0.26 0.03 0.41 0 0.08 0.29 0.76 0.37 0 0.71 0.76 0.43 0.95 1 0.08 0.72 0.97 0.04 0 0.84 0.79 0.89 0.05 1 K=4 pred 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 Fold : 1 pred 0.96 0.03 0.00 0.00 0.00 0.00 0.00 0.00 Train Predict pred 0.96 0.03 0.90 0.12 0.00 0.00 0.00 0.00 Fold : 2Fold : 3 pred 0.96 0.03 0.90 0.12 0.03 0.77 0.00 0.00 Fold : 4 pred 0.96 0.03 0.90 0.12 0.03 0.77 0.18 0.91
  • 15. Confidential13 Genetic algorithm approach x1 x2 x3 x4 y 0.14 0.69 0.01 0.71 1 0.22 0.44 0.45 0.69 1 0.12 0.35 0.51 0.23 0 0.22 0.42 0.79 0.60 1 0.93 0.82 0.72 0.50 1 0.32 0.58 0.28 0.22 0 0.95 0.59 0.68 0.09 1 0.34 0.58 0.35 0.81 0 0.05 0.80 0.28 0.86 1 0.23 0.49 0.63 0.03 0 0.05 0.34 0.53 0.73 1 0.74 0.02 0.33 0.56 0 Iteration1/10 X% accuracy Feature Importance x4 1 x2 0.5 x3 0.2 x1 0.01 Tuned Iteration2/10 + Random Transformation sqrt(x2) Z% accuracy Feature Importance x2+x4 1 x4 0.4 sqrt(x2) 0.3 x3 0.2 x2 0.05
  • 16. Confidential14 Feature ranking based on permutations 0.5 0.52 0.69 0.2 1 0.18 0.94 0.57 0.68 0 0.67 0.6 0.76 0.91 1 0.25 0.68 0.57 0.07 1 0.83 0.07 0.76 0.56 0 0.49 0.92 0.64 0.02 0 0.17 0.7 0.57 1 0.96 0.75 0.59 1 0.96 0.14 0.09 0 0.61 0.38 0.07 0 0.07 0.13 0.87 1 0.66 0.27 0.48 1 0.41 0.56 0.21 0.61 0.26 0.02 0.02 0.56 0.21 0.41 0.61 0.26 Train Validation fit score 80% accuracy 70% accuracy (drop) Shuffle The 10% difference is how important that feature is Bring back to normal Repeat for all features
  • 17. Confidential15 Stacking X0 x1 x2 xn y 0.17 0.25 0.93 0.79 1 0.35 0.61 0.93 0.57 0 0.44 0.59 0.56 0.46 0 0.37 0.43 0.74 0.28 1 0.96 0.07 0.57 0.01 1 A X0 x1 x2 xn y 0.89 0.72 0.50 0.66 0 0.58 0.71 0.92 0.27 1 0.10 0.35 0.27 0.37 0 0.47 0.68 0.30 0.98 0 0.39 0.53 0.59 0.18 1 B X0 x1 x2 xn y 0.29 0.77 0.05 0.09 ? 0.38 0.66 0.42 0.91 ? 0.72 0.66 0.92 0.11 ? 0.70 0.37 0.91 0.17 ? 0.59 0.98 0.93 0.65 ? C pred0 pred1 pred2 y 0.24 0.72 0.70 0 0.95 0.25 0.22 1 0.64 0.80 0.96 0 0.89 0.58 0.52 0 0.11 0.20 0.93 1 B1 pred0 pred1 pred2 y 0.50 0.50 0.39 ? 0.62 0.59 0.46 ? 0.22 0.31 0.54 ? 0.90 0.47 0.09 ? 0.20 0.09 0.61 ? C1 Train algorithm 0 on A and make predictions for B and C and save to B1, C1 Train algorithm 1 on A and make predictions for B and C and save to B1, C1 Train algorithm 2 on A and make predictions for B and C and save to B1, C1 Train algorithm 3 on B1 and make predictions for C1 Preds3 0.45 0.23 0.99 0.34 0.05 Consider datasets A,B,C. Target variable (y) is known for A,B
  • 18. Confidential16 “The ability to explain or to present in understandable terms to a human.” – Finale Doshi-Velez and Been Kim. “Towards a rigorous science of interpretable machine learning.” arXiv preprint. 2017. https://arxiv.org/pdf/1702.08608.pdf FAT*: https://www.fatml.org/resources/principles-for-accountable-algorithms XAI: https://www.darpa.mil/program/explainable-artificial-intelligence Machine Learning Interpretability?
  • 19. Confidential17 Interpretability and accuracy trade-off Linear Models Exact explanations for approximate models. Machine Learning Approximate explanations for exact models.
  • 20. Confidential18 Decision Tree Surrogate Models Partial Dependence and Individual Conditional Expectation https://github.com/jphall663/interpretable_machine_learning_with_python Machine Learning Interpretability examples
  • 21. H2O.ai Product Suite Automatic feature engineering, machine learning and interpretability • 100% open source – Apache V2 licensed • Built for data scientists – interface using R, Python on H2O Flow (interactive notebook interface) • Enterprise support subscriptions • Enterprise software • Built for domain users, analysts and data scientists – GUI-based interface for end-to-end data science • Fully automated machine learning from ingest to deployment H2O AI open source engine integration with Spark Lightning fast machine learning on GPUs In-memory, distributed machine learning algorithms with H2O Flow GUI Open Source
  • 22. Confidential20 H2O is the open source leader in AI. Democratize AI for Everyone. AI for good.
  • 23. Confidential21 Will there be a default? AUC, Accuracy, Precision… Time, hardware Insight-visualization Feature Engineering Predictions Model Interpretability • Some input data • A target variable • An objective (or a success metric) • Some allocated resources H2O Driverless AI process x1 x2 x3 x4 y 0.14 0.69 0.01 0.71 1 0.22 0.44 0.45 0.69 1 0.12 0.35 0.51 0.23 0 0.22 0.42 0.79 0.60 1 0.93 0.82 0.72 0.50 1 0.32 0.58 0.28 0.22 0 0.95 0.59 0.68 0.09 1 0.34 0.58 0.35 0.81 0 0.05 0.80 0.28 0.86 1 0.23 0.49 0.63 0.03 0 0.05 0.34 0.53 0.73 1 0.74 0.02 0.33 0.56 0
  • 27. Watch the video with slide synchronization on InfoQ.com! https://www.infoq.com/presentations/ h2o-driverless-ai