H2O: A Platform for Big Math
From a single laptop to hundreds of nodes, H2O gives you a Single System Image: easy aggregation of all the memory and all the cores, and a simple coding style that scales wide at in-memory speeds. H2O is easily 1000x faster than disk-based clustering solutions and often 10x faster than best-of-breed in-memory alternatives, and it works directly on your existing Hadoop cluster. H2O ingests a wide variety of formats in parallel, distributed across the cluster, stores the data highly compressed, and then lets you do scale-out math at memory-bandwidth speeds (on compressed data!), making terabyte-scale munging an interactive experience. This is a technical talk on the insides of H2O, focusing on the Single-System-Image aspect: how we write single-threaded code and have H2O auto-parallelize and auto-scale-out to hundreds of nodes and thousands of cores.
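The "write single-threaded code, let the platform parallelize it" idea can be illustrated with a small map/reduce sketch. This is not H2O's API: the function names (`split_into_chunks`, `map_chunk`, `reduce_pair`, `distributed_mean`) are hypothetical, and a local thread pool stands in for the nodes of a cluster. The user writes only the per-chunk `map_chunk` and the associative `reduce_pair`; the scheduling is generic.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def split_into_chunks(values, chunk_size):
    """Partition a column into contiguous chunks (stand-ins for per-node data)."""
    return [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]


def map_chunk(chunk):
    """Single-threaded user code: partial (sum, count) for one chunk."""
    return (sum(chunk), len(chunk))


def reduce_pair(a, b):
    """Combine two partial results; associative, so the order of merging is free."""
    return (a[0] + b[0], a[1] + b[1])


def distributed_mean(values, chunk_size=4, workers=4):
    """Run map_chunk on every chunk in parallel, then fold the partial results."""
    chunks = split_into_chunks(values, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    total, count = reduce(reduce_pair, partials, (0.0, 0))
    if count == 0:
        raise ValueError("cannot take the mean of an empty column")
    return total / count


if __name__ == "__main__":
    print(distributed_mean(list(range(1, 11))))
```

Because `reduce_pair` is associative, the partial results could just as well be merged pairwise in a tree across machines, which is the property that lets this style scale out.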
Arno is the Chief Architect of H2O, a distributed and scalable open-source machine learning platform. He is also the main author of H2O’s Deep Learning. Before joining H2O.ai, Arno was a founding Senior MTS at Skytree where he designed and implemented high-performance machine learning algorithms. He has over a decade of experience in HPC with C++/MPI and had access to the world’s largest supercomputers as a Staff Scientist at SLAC National Accelerator Laboratory where he participated in US DOE scientific computing initiatives and collaborated with CERN on next-generation particle accelerators. Arno holds a PhD and Masters summa cum laude in Physics from ETH Zurich, Switzerland. He has authored dozens of scientific papers and is a sought-after conference speaker. Arno was named "2014 Big Data All-Star" by Fortune Magazine. Follow him on Twitter: @ArnoCandel.
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
Arno Candel: H2O, A Platform for Big Math (Hadoop Summit, June 2016)
1. H2O.ai
Machine Intelligence
H2O: A Platform for Big Math
Arno Candel, PhD
Chief Architect
or: How to make A.I. and TensorFlow work for you
2. H2O.ai
Who Am I?
Arno Candel
Chief Architect,
Physicist & Hacker at H2O.ai
PhD Physics, ETH Zurich 2005
10+ yrs Supercomputing (HPC)
6 yrs at SLAC (Stanford Linear Accelerator)
4.5 yrs Machine Learning
2.5 yrs at H2O.ai
Fortune Magazine Big Data All Star
Follow me @ArnoCandel
5. H2O.ai
Brief History of A.I., ML and DL
John McCarthy
Princeton, Bell Labs, Dartmouth, later: MIT, Stanford
1955: “A proposal for the Dartmouth summer
research project on Artificial Intelligence”
with Marvin Minsky (MIT), Claude Shannon
(Bell Labs) and Nathaniel Rochester (IBM)
http://www.asiapacific-mathnews.com/04/0403/0015_0020.pdf
A step back: A.I. was coined over 60 years ago
8. H2O.ai
Step 3: Big Data + In-Memory Clusters
2011: Jeopardy (IBM Watson)
In-Memory Analytics/ML
4 TB of data (incl. wikipedia), 90 servers,
16 TB RAM, Hadoop, 6 million logic rules
https://www.youtube.com/watch?v=P18EdAKuC1U https://en.wikipedia.org/wiki/Watson_(computer)
Note: IBM Watson received the question in electronic written form, and was
often able to press the answer button faster than the competing humans.
“No computer will ever answer random questions!?”
9. H2O.ai
“No computer will ever speak any language!?”
2014: Google
(acquired Quest Visual)
Deep Learning
Convolutional and Recurrent
Neural Networks,
with training data from users
Step 4: Deep Learning
• Translate between 103 languages by typing
• Instant camera translation: Use your camera to translate text instantly in 29 languages
• Camera Mode: Take pictures of text for higher-quality translations in 37 languages
• Conversation Mode: Two-way instant speech translation in 32 languages
• Handwriting: Draw characters instead of using the keyboard in 93 languages
10. H2O.ai
Step 5: Augmented Deep Learning
2014: Atari Games (DeepMind)
2016: AlphaGo (Google DeepMind)
Deep Learning
+ reinforcement learning, tree search,
Monte Carlo, GPUs, playing against itself, …
https://deepmind.com
Go board has approx. 2E170 (2 × 10^170) possible positions.
trained from raw pixel values, no human rules
“No computer will ever beat the best Go master!?”
11. H2O.ai
Microsoft had won the Visual Recognition challenge:
http://image-net.org/challenges/LSVRC/2015/
Step 6: A.I. Chatbots have Opinions too!
17. H2O.ai
Gradient Boosting Machine
Tree Model
(nano-fast)
Auto-generated
Java scoring code
to easily
Operationalize Data Science
Easily Bring Models into Production
18. H2O.ai
Spark + H2O = Sparkling Water
Sparkling Water 2.0
• Spark 2.0 API compatibility
• Use H2O algorithms in conjunction with, or instead of, MLlib algorithms on Spark
• Build ensembles using H2O and MLlib algorithms
• Visual Intelligence for Spark. Run Spark, MLlib, Scala in Flow
• Export MLlib models as POJOs
• Toolchain for ML pipelines and debugging support
20. H2O.ai
Significant Performance Gains with Deep Learning
Predict departure delay (Y/N) on 20 years of airline flight data
(116M rows, 12 cols, categorical + numerical data with missing values)
10 nodes: Dual E5-2650 (8 cores, 2.6GHz), 10GbE
H2O Elastic Net (GLM): 10 secs, AUC: 0.656
alpha=0.5, lambda=1.379e-4 (auto)
H2O Deep Learning: 45 secs, AUC: 0.703
4 hidden ReLU layers of 20 neurons, 1 epoch
(AUC: higher is better, ranges from 0.5 to 1)
Feature importances
Features have non-linear impact
Chicago, Atlanta, Dallas: often delayed
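For readers unfamiliar with the AUC metric quoted on this slide, here is a minimal sketch of how it can be computed. It uses the pairwise definition (the probability that a randomly chosen positive example is scored above a randomly chosen negative one, with ties counted as half). The function name `auc` and the toy labels and scores are illustrative assumptions, not taken from the talk, and the quadratic loop is for clarity only; it would not be practical on 116M rows.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: iterable of 0/1 class labels (1 = positive, e.g. "delayed").
    scores: iterable of predicted scores, higher means more likely positive.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data, purely illustrative.
    print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

A model that assigns the same score to every row gets 0.5 under this definition, and a model that ranks every positive above every negative gets 1.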
25. Distributed Gradient Boosting Machine
• H2O: First open-source implementation of scalable, distributed Gradient Boosting Machine - fully featured
• Parallelized Individual Tree Construction
• Discretization (binning) for speedup without loss of accuracy
[Figure: a tree node splits "all data" on "age < 25 ?" (Y / N) by finding the optimal split (feature & value). Axes: age (12 to 118), income (1k to 1M). Panels: analytical error landscape (best split: age 25) and H2O's version discretized into bins (split at age 25).]
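The binning idea on this slide can be sketched as follows. This is an illustration under stated assumptions, not H2O's implementation: the function names are hypothetical, the error measure is plain squared error for a numeric response, and the "nodes" are just separate Python lists. Each chunk builds a small histogram of (count, sum, sum of squares) per bin, histograms are merged by addition, and the best split is found by scanning bin edges instead of every distinct value.

```python
def build_histogram(rows, lo, hi, n_bins):
    """Per-chunk pass: accumulate [count, sum(y), sum(y^2)] for each bin of x."""
    hist = [[0, 0.0, 0.0] for _ in range(n_bins)]
    width = (hi - lo) / n_bins
    for x, y in rows:
        b = min(int((x - lo) / width), n_bins - 1)  # clamp x == hi into last bin
        hist[b][0] += 1
        hist[b][1] += y
        hist[b][2] += y * y
    return hist


def merge_histograms(a, b):
    """Reduce step: histograms from different chunks simply add up."""
    return [[p[0] + q[0], p[1] + q[1], p[2] + q[2]] for p, q in zip(a, b)]


def sse(n, s, ss):
    """Sum of squared errors around the mean, from the three sufficient statistics."""
    return ss - s * s / n if n > 0 else 0.0


def best_split(hist, lo, hi):
    """Scan bin edges; return the threshold t minimizing SSE(x < t) + SSE(x >= t)."""
    n_bins = len(hist)
    width = (hi - lo) / n_bins
    total = [sum(h[k] for h in hist) for k in range(3)]
    left = [0, 0.0, 0.0]
    best_err, best_threshold = float("inf"), None
    for b in range(n_bins - 1):
        for k in range(3):
            left[k] += hist[b][k]
        right = [total[k] - left[k] for k in range(3)]
        if left[0] == 0 or right[0] == 0:
            continue  # a split must put rows on both sides
        err = sse(*left) + sse(*right)
        if err < best_err:
            best_err, best_threshold = err, lo + (b + 1) * width
    return best_threshold


if __name__ == "__main__":
    # Toy (age, response) rows held by two hypothetical nodes.
    node_a = [(18, 1.0), (22, 1.0), (30, 0.0)]
    node_b = [(24, 1.0), (40, 0.0), (65, 0.0), (118, 0.0)]
    merged = merge_histograms(build_histogram(node_a, 12, 118, 106),
                              build_histogram(node_b, 12, 118, 106))
    print(best_split(merged, 12, 118))
```

Only the fixed-size histograms travel between nodes, never the rows themselves, which is what makes this style of split search cheap to distribute.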
29. H2O.ai
Digital Marketing - Campaigns
“H2O gave us the capability to do Big
Modeling. There is no limit to scaling in H2O.”
“Working with the H2O
team has been amazing.”
“The business value that we have gained
from advanced analytics is enormous.”
30. H2O.ai
Matching TV Watching Behavior with Buying Behavior
“Unlike other systems where I had
to buy the whole package and just
use 10-20%, I can customize H2O
to suit my needs.”
“I am a big fan of open source. H2O is
the best fit in terms of cost as well as ease
of use and scalability and usability.”
31. H2O.ai
Insurance - Risk Assessment
“Predictive analytics is the differentiator
for insurance companies going forward
in the next couple of decades.”
“Advanced analytics was one of the
key investments that we decided to
make.”
32. H2O.ai
Fintech - Fraud/Risk/Churn/etc.
“H2O is a great solution because it's
designed to be enterprise ready and
can operate on very large datasets.”
“H2O has been a one-stop shop that helps
us do all our modeling in one framework.”
“H2O is the best solution to be able to
iterate very quickly on large datasets
and produce meaningful models.”
Today’s Keynote!
39. H2O.ai
H2O OPEN TOUR www.open.h2o.ai
We’re coming to a town near you in NYC / TX
Visit our Booth Today!
40. H2O.ai
A.I. and Deep Learning are hot (again)!
Make your own smart data products with H2O!
Try H2O today - installs in minutes!
h2o.ai/download
https://www.youtube.com/user/0xdata/videos
https://github.com/h2oai/h2o-3
H2O Google Group
@h2oai
Summary
We’re hiring: h2o.ai/careers/