Michal Malohlava talks about the PySparkling Water package for Spark and Python users.
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
3. H2O.ai
Machine Intelligence
H2O
Open-Source In-Memory Data Science Platform
• Highly optimized Java code (in-house)
• Distributed in-memory K-V store and map/reduce computation framework
• Data parser (HDFS, S3, NFS, HTTP, local drives, etc.)
• Read/write access to distributed data frames (R/Pandas-style)
• ML algos - Deep Learning, GBM, DRF, GLM, GLRM, K-Means, PCA, CoxPH, Ensembles
• REST API: clients Interactive UI/R/Python
H2O Python client
PySparkling
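To make the bullets above concrete, here is a minimal sketch of the H2O Python client parsing a file into a distributed frame and training a GBM on it. It is an illustration under stated assumptions, not code from the talk: the function names, the 80/20 split, the seed, and the tree count are all hypothetical choices, and it assumes the `h2o` Python package and a Java runtime are installed, and that the label column is categorical.

```python
def feature_columns(columns, label_col):
    """Return every column name except the label (hypothetical helper)."""
    return [c for c in columns if c != label_col]


def train_gbm_from_file(path, label_col, ntrees=50):
    """Parse a file with H2O and train a GBM classifier on it.

    Imports are done lazily so that this module can be loaded
    even where the h2o package is not installed.
    """
    import h2o
    from h2o.estimators.gbm import H2OGradientBoostingEstimator

    # Start a local H2O node, or attach to one that is already running.
    h2o.init()

    # The parser accepts local paths as well as HDFS, S3, and HTTP URLs.
    frame = h2o.import_file(path)

    # Assumption: the label is categorical, so convert it to a factor.
    frame[label_col] = frame[label_col].asfactor()

    # Assumed 80/20 split into training and validation frames.
    train, valid = frame.split_frame(ratios=[0.8], seed=42)

    model = H2OGradientBoostingEstimator(ntrees=ntrees, seed=42)
    model.train(
        x=feature_columns(frame.columns, label_col),
        y=label_col,
        training_frame=train,
        validation_frame=valid,
    )
    return model
```

Calling `train_gbm_from_file` with a CSV path and the name of its label column returns the trained model; the data itself stays in the H2O cluster, and the Python client talks to it over the REST API.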
4. H2O.ai
PySparkling
Provides
Transparent integration of H2O machine learning platform with Spark ecosystem (PySpark)
Transparent use of H2O data structures (H2OFrame) and algorithms with Spark Python API
Excels in existing Spark workflows requiring advanced Machine Learning algorithms
Functionality missing in H2O can be replaced by Spark and vice versa
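The round trip described above (Spark DataFrame to H2OFrame, H2O algorithm, back to Spark) can be sketched as follows. This is a hedged illustration, not code from the talk: `score_spark_df_with_h2o` and `demo` are hypothetical names, the toy data is made up, and the method names `asH2OFrame` / `asSparkFrame` and the argument-free `H2OContext.getOrCreate()` assume a recent PySparkling release (older releases used `as_h2o_frame` / `as_spark_frame` and took a Spark context or session as an argument).

```python
def score_spark_df_with_h2o(hc, spark_df, label_col, estimator):
    """Train an H2O estimator on a Spark DataFrame and return predictions
    as a Spark DataFrame.

    hc        -- an H2OContext (assumed to expose asH2OFrame/asSparkFrame)
    estimator -- any H2O estimator with train() and predict()
    """
    # Spark DataFrame -> H2OFrame
    h2o_frame = hc.asH2OFrame(spark_df)

    features = [c for c in h2o_frame.columns if c != label_col]
    estimator.train(x=features, y=label_col, training_frame=h2o_frame)

    # Predictions are an H2OFrame; hand them back to Spark.
    predictions = estimator.predict(h2o_frame)
    return hc.asSparkFrame(predictions)


def demo():
    """Usage sketch; needs pyspark, pysparkling, and h2o installed."""
    from pyspark.sql import SparkSession
    from pysparkling import H2OContext
    from h2o.estimators.gbm import H2OGradientBoostingEstimator

    spark = SparkSession.builder.appName("pysparkling-sketch").getOrCreate()
    hc = H2OContext.getOrCreate()  # assumption: recent PySparkling API

    # Made-up toy data: two features and a numeric label.
    rows = [(float(i), float(i % 7), float(2 * i + 1)) for i in range(200)]
    df = spark.createDataFrame(rows, ["f1", "f2", "label"])

    gbm = H2OGradientBoostingEstimator(ntrees=20, seed=1)
    score_spark_df_with_h2o(hc, df, "label", gbm).show(5)
```

Passing the context and the estimator in as arguments keeps the conversion logic independent of how the H2O cluster was started, so the same function works whichever side (Spark or H2O) did the data preparation.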
5. H2O.ai
Benefits
• Additional algorithms
• NLP features
• Powerful data munging
• ML Pipelines
• Advanced algorithms
• speed v. accuracy
• advanced parameters
• Fully distributed and parallelized
• Graphical environment
• Fully fledged Python/R interfaces
38. H2O.ai
The Plan
Separation of H2O cluster from Spark infrastructure
✓ Preserving existing API
h2oContext = H2OContext.getOrCreate(ip="…", port=…)
Better integration into PySpark pipelines
✓ Support of H2O Ensembles (currently available only as an R package)
Integration with Steam platform to support model management
DeepWater integration
H2O DeepWater with Python early sneak: Fabrizio Milo, Sunday 3pm
39. H2O.ai
More info
Check out GitHub & Contribute
https://github.com/h2oai/sparkling-water
Check out H2O.ai Training Books
http://h2o.ai/resources
Check out H2O.ai Blog
http://h2o.ai/blog/
Check out H2O.ai YouTube Channel
https://www.youtube.com/user/0xdata
40. H2O.ai
PySparkling is an open-source ML application platform combining the power of PySpark and H2O
Learn more at h2o.ai
Follow us at @h2oai
Come see us at Open Tour in Dallas, TX, on Oct 26th! See open.h2o.ai
Thank you!