
Productionizing H2O Models with Apache Spark - Jakub Hava - H2O AI World London 2018

This video was recorded in London on October 30th, 2018 and can be viewed here: https://www.youtube.com/watch?v=6PxoLP63CYE&t=0s&list=PLNtMya54qvOHh9LaA08hkusynWVStNEhm&index=20

Spark pipelines are a powerful concept for productionizing machine learning workflows. Their API makes it possible to combine data processing with machine learning algorithms and opens up opportunities for integration with various machine learning libraries. However, to benefit from the power of pipelines, users need the freedom to choose and experiment with any machine learning algorithm or library. Therefore, we developed Sparkling Water, which embeds the H2O machine learning library of advanced algorithms into the Spark ecosystem and exposes them via the pipeline API. Furthermore, the algorithms benefit from H2O MOJOs (Model Object, Optimized), a concept shared across the entire H2O platform for storing and exchanging models. MOJOs are designed for effective model deployment, with a focus on scoring speed, traceability, exchangeability, and backward compatibility. In this talk we will explain the architecture of Sparkling Water, with a focus on its integration with Spark pipelines and MOJOs. We’ll demonstrate how to create pipelines that integrate H2O machine learning models and how to deploy them using Scala or Python. Furthermore, we will show how to use pre-trained model MOJOs with Spark pipelines.
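The pipeline idea described above (stages that are either stateless transformers or estimators that are fitted into transformers) can be illustrated with a small sketch. This is plain Python, not the Spark or Sparkling Water API: the classes `AddRatio`, `MeanThreshold`, `Pipeline`, and `PipelineModel` are hypothetical stand-ins written for this illustration, and the data is made up.

```python
from typing import Dict, List

Row = Dict[str, float]


class AddRatio:
    """Stateless transformer (hypothetical): adds a derived feature column."""

    def transform(self, rows: List[Row]) -> List[Row]:
        return [{**r, "ratio": r["a"] / r["b"]} for r in rows]


class MeanThresholdModel:
    """Fitted model: predicts 1.0 when 'ratio' exceeds the learned threshold."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold

    def transform(self, rows: List[Row]) -> List[Row]:
        return [
            {**r, "prediction": 1.0 if r["ratio"] > self.threshold else 0.0}
            for r in rows
        ]


class MeanThreshold:
    """Estimator (hypothetical): learns the mean of 'ratio' as a threshold."""

    def fit(self, rows: List[Row]) -> MeanThresholdModel:
        mean = sum(r["ratio"] for r in rows) / len(rows)
        return MeanThresholdModel(mean)


class PipelineModel:
    """Fitted pipeline: applies every fitted stage in order."""

    def __init__(self, stages: list) -> None:
        self.stages = stages

    def transform(self, rows: List[Row]) -> List[Row]:
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows


class Pipeline:
    """Unfitted pipeline: fits estimators in order, feeding each stage's output forward."""

    def __init__(self, stages: list) -> None:
        self.stages = stages

    def fit(self, rows: List[Row]) -> PipelineModel:
        fitted = []
        for stage in self.stages:
            if hasattr(stage, "fit"):
                # Estimator: fit on the data produced by the earlier stages.
                stage = stage.fit(rows)
            fitted.append(stage)
            rows = stage.transform(rows)
        return PipelineModel(fitted)


# Made-up training and scoring data.
train = [{"a": 1.0, "b": 2.0}, {"a": 3.0, "b": 2.0}]
model = Pipeline([AddRatio(), MeanThreshold()]).fit(train)
scored = model.transform([{"a": 4.0, "b": 2.0}, {"a": 1.0, "b": 4.0}])
print([r["prediction"] for r in scored])
```

The point of the design is that the fitted `PipelineModel` carries feature engineering and model together, so scoring new data needs only a single `transform` call. In the talk's setting, a pre-trained MOJO would presumably play the role of an already fitted stage.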

Bio: Jakub (or “Kuba” as we call him) completed his Bachelor’s Degree in Computer Science and Master’s Degree in Software Systems at Charles University in Prague. For his bachelor’s thesis, Kuba wrote a small platform for distributed computation of arbitrary types of tasks. During his master’s studies, he developed a cluster monitoring tool for JVM-based languages that makes it easier to debug and reason about the performance of distributed systems, using a concept called distributed stack traces. Kuba enjoys dealing with problems and learning new programming languages. At H2O.ai, Kuba works on Sparkling Water.

Aside from programming, Kuba enjoys exploring new cultures and bouldering. He’s also a big fan of tea preparation and the associated ceremony.

LinkedIn: https://www.linkedin.com/in/havaj/

Published in: Technology

  1. Productionizing H2O Models with Apache Spark. Jakub Hava, Senior Software Engineer, H2O.ai. https://www.linkedin.com/in/havaj/
  2. Who are we? (#ML4SAIS) • Kuba • Senior Software Engineer at H2O.ai, Core Sparkling Water • Master’s at Charles University (CZ) • Implemented high-performance cluster monitoring tool for JVM-based languages (JNI, JVMTI, instrumentation) • Michal • VP of Engineering at H2O.ai • Creator of Sparkling Water • Ph.D. at Charles University (CZ), PostDoc at Purdue (US)
  3. Machine Learning (ML) Lifecycle
  4. Basic ML Lifecycle (diagram labels): Model Training Algorithm Feature Engineering Model Pipeline Building Training Predictions Data Engineering
  5. Basic ML Lifecycle (diagram labels): Model Training Algorithm Feature Engineering Featurization Pipeline Model Model Pipeline Building Training Predictions Deployment Predictions Data Engineering Model Pipeline Deployment
  6. Example Implementations (table labels): Data Engineering Feature Engineering Training Algorithm Deployment Pipeline Model Spark H2O Spark H2O MOJO Spark H2O Driverless AI Spark H2O Driverless AI MOJO Model Building Model Deployment
  7. H2O + Spark = Sparkling Water
  8. H2O + Spark • H2O • Machine Learning Library • Distributed Algorithms • For ML experts • Sparkling Water • Integrates H2O & Spark Ecosystems • Transparent for Spark users • Based on Spark pipelines & H2O
  9. Basic ML Lifecycle: Sparkling Water (diagram labels): Model Training Algorithm Feature Engineering Spark Transformers H2O MOJO Model Training Predictions Deployment Predictions AutoML Pipeline
  10. Demo: Spark Pipeline
  11. H2O Driverless AI
  12. H2O Driverless AI • What if I’m not an expert? • H2O Driverless AI • No expert knowledge required • Automatic Feature Engineering & ML
  13. Basic ML Lifecycle: Driverless AI (diagram labels): Model Training Algorithm Feature Engineering Driverless AI Feature Transformations Driverless AI Model Training Predictions Deployment Predictions Driverless AI MOJO as Pipeline
  14. Demo: Driverless AI as Spark Pipeline
  15. [no text on slide]
  16. Driverless AI Pipeline
  17. Governed ML Lifecycle
  18. Governed ML Lifecycle (diagram labels): Model Training Algorithm Feature Engineering Featurization Pipeline Model Model Pipeline Building Training Predictions Deployment Predictions Model Management Data Engineering Model Pipeline Deployment Model Monitoring Auto Documentation
  19. Materials: https://bit.ly/2sxowxD
  20. Sparkling Water enables deployment of H2O ML models with Spark Pipelines. Thank you!
