This video was recorded in London on October 30th, 2018 and can be viewed here: https://www.youtube.com/watch?v=6PxoLP63CYE&t=0s&list=PLNtMya54qvOHh9LaA08hkusynWVStNEhm&index=20
Spark pipelines are a powerful concept for productionizing machine learning workflows. Their API makes it possible to combine data processing with machine learning algorithms and opens opportunities for integration with various machine learning libraries. However, to benefit from the power of pipelines, users need the freedom to choose and experiment with any machine learning algorithm or library. We therefore developed Sparkling Water, which embeds the H2O machine learning library of advanced algorithms into the Spark ecosystem and exposes them via the pipeline API. Furthermore, the algorithms benefit from H2O MOJOs (Model Object, Optimized), a powerful concept shared across the entire H2O platform for storing and exchanging models. MOJOs are designed for effective model deployment, with a focus on scoring speed, traceability, exchangeability, and backward compatibility. In this talk we will explain the architecture of Sparkling Water, with a focus on its integration with Spark pipelines and MOJOs. We’ll demonstrate how to create pipelines that integrate H2O machine learning models and how to deploy them using Scala or Python. Finally, we will show how to use pre-trained model MOJOs with Spark pipelines.
Bio: Jakub (or “Kuba” as we call him) completed his Bachelor’s Degree in Computer Science and Master’s Degree in Software Systems at Charles University in Prague. For his bachelor’s thesis, Kuba wrote a small platform for distributed computing of arbitrary types of tasks. During his master’s studies, he developed a cluster monitoring tool for JVM-based languages that makes debugging distributed systems and reasoning about their performance easier, using a concept called distributed stack traces. Kuba enjoys dealing with problems and learning new programming languages. At H2O.ai, Kuba works on Sparkling Water.
Aside from programming, Kuba enjoys exploring new cultures and bouldering. He’s also a big fan of tea preparation and the associated ceremony.
LinkedIn: https://www.linkedin.com/in/havaj/
2. #ML4SAIS
Who are we?
• Kuba
• Senior Software Engineer at H2O.ai - Core Sparkling Water
• Master’s at Charles University (CZ)
• Implemented high-performance cluster monitoring tool for JVM-based languages (JNI, JVMTI, instrumentation)
• Michal
• VP of Engineering at H2O.ai
• Creator of Sparkling Water
• Ph.D. at Charles University (CZ), PostDoc at Purdue (US)
8. H2O + Spark
• H2O
• Machine Learning Library
• Distributed Algorithms
• For ML experts
• Sparkling Water
• Integrates H2O & Spark Ecosystems
• Transparent for Spark users
• Based on Spark pipelines & H2O
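The pipeline idea that Sparkling Water builds on can be sketched without Spark at all. The block below is a plain-Python stand-in, not the Spark or Sparkling Water API: the class names (`Scale`, `ThresholdEstimator`, `ThresholdModel`, `Pipeline`, `PipelineModel`) and the toy threshold algorithm are assumptions made for illustration. It only shows the contract that a pipeline relies on: stages with `transform` reshape the data, stages with `fit` are trained and replaced by the model they produce, and the fitted pipeline can then score new rows.

```python
from typing import Dict, List

Row = Dict[str, float]


class Scale:
    """Feature-engineering stage (a transformer): multiplies one column by a factor."""

    def __init__(self, column: str, factor: float) -> None:
        self.column = column
        self.factor = factor

    def transform(self, rows: List[Row]) -> List[Row]:
        return [{**r, self.column: r[self.column] * self.factor} for r in rows]


class ThresholdModel:
    """Fitted model (also a transformer): adds a 0/1 'prediction' column."""

    def __init__(self, feature: str, threshold: float) -> None:
        self.feature = feature
        self.threshold = threshold

    def transform(self, rows: List[Row]) -> List[Row]:
        return [
            {**r, "prediction": 1 if r[self.feature] > self.threshold else 0}
            for r in rows
        ]


class ThresholdEstimator:
    """Toy stand-in for a training algorithm (an estimator).

    Learns the midpoint between the mean feature value of each class.
    """

    def __init__(self, feature: str, label: str) -> None:
        self.feature = feature
        self.label = label

    def fit(self, rows: List[Row]) -> ThresholdModel:
        pos = [r[self.feature] for r in rows if r[self.label] == 1]
        neg = [r[self.feature] for r in rows if r[self.label] == 0]
        threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
        return ThresholdModel(self.feature, threshold)


class PipelineModel:
    """Fitted pipeline: applies every fitted stage in order."""

    def __init__(self, stages: list) -> None:
        self.stages = stages

    def transform(self, rows: List[Row]) -> List[Row]:
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows


class Pipeline:
    """Unfitted pipeline: an ordered list of transformers and estimators."""

    def __init__(self, stages: list) -> None:
        self.stages = stages

    def fit(self, rows: List[Row]) -> PipelineModel:
        fitted = []
        for stage in self.stages:
            if hasattr(stage, "fit"):
                # Estimators are trained on the data produced by earlier stages.
                stage = stage.fit(rows)
            fitted.append(stage)
            rows = stage.transform(rows)
        return PipelineModel(fitted)


training = [
    {"x": 1.0, "label": 0},
    {"x": 2.0, "label": 0},
    {"x": 8.0, "label": 1},
    {"x": 9.0, "label": 1},
]
model = Pipeline([Scale("x", 0.1), ThresholdEstimator("x", "label")]).fit(training)
scored = model.transform([{"x": 3.0}, {"x": 7.0}])
print([row["prediction"] for row in scored])
```

In this sketch the training algorithm is just another stage, so swapping it for a different one does not change the feature-engineering stages or the scoring call. That is the property the slide refers to when it describes Sparkling Water as based on Spark pipelines.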
9. Basic ML Lifecycle: Sparkling Water
[Diagram: ML lifecycle stages (Feature Engineering, Model Training, Algorithm, Deployment, Predictions, Pipeline) annotated with Spark Transformers, AutoML, and H2O MOJO]
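The role a MOJO plays in this lifecycle, a trained model stored as a standalone artifact that can be scored later without the training code, can be illustrated with a small analogy. The block below is a plain-Python sketch and does not use or reproduce the actual MOJO format or any H2O API: the JSON layout, the `format_version` field, and the functions `train`, `export_model`, and `load_scorer` are all hypothetical.

```python
import json
from typing import Callable, Dict, List


def train(rows: List[Dict[str, float]]) -> Dict[str, object]:
    """Toy training step: learn a threshold halfway between the two class means."""
    pos = [r["x"] for r in rows if r["label"] == 1]
    neg = [r["x"] for r in rows if r["label"] == 0]
    threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return {"feature": "x", "threshold": threshold}


def export_model(params: Dict[str, object]) -> str:
    """Serialize the learned parameters into a self-describing artifact."""
    return json.dumps({"format_version": 1, "params": params})


def load_scorer(artifact: str) -> Callable[[Dict[str, float]], int]:
    """Rebuild a scoring function from the artifact alone (no training code needed)."""
    payload = json.loads(artifact)
    if payload["format_version"] != 1:
        # A versioned format lets newer runtimes keep reading older artifacts.
        raise ValueError("unsupported artifact version")
    feature = payload["params"]["feature"]
    threshold = payload["params"]["threshold"]

    def score(row: Dict[str, float]) -> int:
        return 1 if row[feature] > threshold else 0

    return score


artifact = export_model(
    train(
        [
            {"x": 1.0, "label": 0},
            {"x": 2.0, "label": 0},
            {"x": 8.0, "label": 1},
            {"x": 9.0, "label": 1},
        ]
    )
)
score = load_scorer(artifact)
print(score({"x": 3.0}), score({"x": 7.0}))
```

Because the scorer is rebuilt only from the artifact, training and deployment can run in different processes or environments, which is the kind of exchangeability the talk attributes to MOJOs.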
12. H2O Driverless AI
• What if I’m not an expert?
• H2O Driverless AI
• No expert knowledge required
• Automatic Feature Engineering & ML
13. Basic ML Lifecycle: Driverless AI
[Diagram: ML lifecycle stages (Feature Engineering, Model Training, Algorithm, Deployment, Predictions, Pipeline) annotated with Driverless AI Feature Transformations, Driverless AI Model Training, and Driverless AI MOJO]