Deploying H2O in Large-Scale Distributed Environments using Containers
Yoann Lechevallier
Senior Systems Engineer for Europe, Middle East, & Africa (EMEA)
BlueData
www.bluedata.ai @BlueData www.linkedin.com/in/yoannlechevallier
Large-Scale Distributed Environments for AI / ML / DL
Distributed ML / DL – Key Requirements
• Access to valuable data: small, big, or both
• Choices of modeling techniques: each problem is different
• Ability to build on datasets, validate on other datasets, iterate, and improve
• Access to GPUs (and CPUs)
• Scale easily on real datasets
• Ability to operationalize in production
Source: https://rohitnarurkar.wordpress.com/2013/11/02/cuda-matrix-multiplication
Distributed ML / DL – Challenges
• Scalability, repeatability, complexity, reproducibility across environments
• Sharing data, not duplicating data
• Deploying distributed platforms, libraries, applications, and versions
• Efficiently sharing expensive resources like GPUs
• Agility to scale up and down compute resources
• Providing a future-proof solution
• Ensuring compatible NVIDIA device kernel module installation
Diagram: Laptop, On-Prem Cluster, Off-Prem Cluster
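
The reproducibility challenge above (the same libraries and versions must be present on every node) can be illustrated with a small check. This is a minimal sketch, assuming each node can report a mapping of package name to version; how that inventory is collected is out of scope, and the node names and version numbers below are made up for illustration.

```python
from collections import defaultdict


def find_version_drift(node_packages):
    """Return {package: {version: [nodes]}} for every package whose version
    is not identical on all nodes. A missing package counts as drift."""
    all_packages = set()
    for packages in node_packages.values():
        all_packages.update(packages)

    drift = {}
    for package in sorted(all_packages):
        by_version = defaultdict(list)
        for node, packages in sorted(node_packages.items()):
            by_version[packages.get(package, "<missing>")].append(node)
        if len(by_version) > 1:  # more than one distinct version seen
            drift[package] = dict(by_version)
    return drift


# Hypothetical inventory: names and versions are illustrative only.
inventory = {
    "laptop": {"h2o": "3.22.0.1", "numpy": "1.15.4"},
    "node-1": {"h2o": "3.22.0.1", "numpy": "1.14.0"},
    "node-2": {"h2o": "3.22.0.1"},
}

drift = find_version_drift(inventory)
for package, versions in drift.items():
    print(package, versions)
```

Running the same container image on every node is one way to make such a check pass by construction, which is the argument the following slides make.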
Deploying H2O in a Distributed Environment, on Containers
Container-Based Architecture for AI / ML / DL
IOBoost™ – Extreme performance and enterprise-grade scalability
ElasticPlane™ – Self-service, multi-tenant containerized environments
DataTap™ – In-place access to data on-prem or in the cloud
Users: Data Scientists, Developers, Data Engineers, Data Analysts
Storage: NFS, HDFS
Compute: CPUs, GPUs
Deployment: On-Premises, Hybrid, Multi-Cloud
BlueData EPIC™ Software Platform
Example of an H2O Pipeline on Containers
H2O Driverless AI
Import Validate
Export
Shared Data Access Layer
… Data Sources …
Challenges Solved, Deployment Accelerated
With H2O + BlueData EPIC, enterprise customers now have:
• Pre-built Docker H2O images with CUDA and automated cluster creation for the entire stack
• Appropriate NVIDIA kernel module surfaced automatically to the containers
• Easy access to resources required (e.g. single node, single GPU, multi-node, multi-GPU combinations)
• UI, CLI, and API access (notebooks, web, SSH)
• NFS mounts surfaced as local drives for sharing assets
Deploy H2O from Pre-Built Images in the BlueData EPIC App Store
Docker images for multiple applications and versions
Ability to create and add new images, and save or restore tested combinations on demand
Multi-Tenant, with Quotas for GPU Resources
Support for multi-tenancy and ability to define quota per tenant
Define ‘flavor’ types used to launch Docker containers
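
To make the idea of per-tenant quotas and container "flavors" concrete, here is a minimal sketch in Python. It is not the BlueData EPIC API: the `Flavor` and `Tenant` classes, their fields, and the example values are all hypothetical, and only GPU accounting is modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flavor:
    """Resource shape of one container (hypothetical fields)."""
    name: str
    cpus: int
    memory_gb: int
    gpus: int


@dataclass
class Tenant:
    """A tenant with a GPU quota and a running count of GPUs in use."""
    name: str
    gpu_quota: int
    gpus_in_use: int = 0

    def can_launch(self, flavor, nodes):
        """True if `nodes` containers of this flavor fit within the quota."""
        return self.gpus_in_use + flavor.gpus * nodes <= self.gpu_quota

    def launch(self, flavor, nodes):
        """Reserve GPUs for a new cluster, or fail if the quota is exceeded."""
        if not self.can_launch(flavor, nodes):
            raise RuntimeError(f"GPU quota exceeded for tenant {self.name}")
        self.gpus_in_use += flavor.gpus * nodes

    def release(self, flavor, nodes):
        """Return GPUs to the tenant's pool when a cluster is deleted."""
        self.gpus_in_use = max(0, self.gpus_in_use - flavor.gpus * nodes)


# Illustrative values only.
gpu_flavor = Flavor(name="gpu-large", cpus=8, memory_gb=64, gpus=2)
tenant = Tenant(name="data-science", gpu_quota=4)

tenant.launch(gpu_flavor, nodes=2)              # uses all 4 GPUs
blocked = not tenant.can_launch(gpu_flavor, 1)  # a third node would exceed the quota
tenant.release(gpu_flavor, nodes=2)             # GPUs become available again
print(blocked, tenant.gpus_in_use)
```

The release step reflects the point made later in the deck that GPUs should be freed for other applications and users once a containerized job is done.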
Spin Up Multiple Environments
Quick launch templates for one-click cluster creation
Run multiple clusters, with different versions or combinations of tools, side by side
On-Demand Cluster Creation
Pick from a list of pre-built and tested images
Assign specific resources (GPUs, CPUs) to the cluster, depending on the use case (e.g. for Driverless AI)
Define number of nodes, here for H2O and Sparkling Water
Run Driverless AI on Containers with GPUs
• The user authenticates on Driverless AI
• Import datasets from BlueData DataTap with DataTap connector, optimised access with BlueData IOBoost
• Analyse the data
• Run experiments
• Build models, save them …
• Validate against other datasets from DataTap …
• Export model for production
dtap
Work with Sparkling Water Cluster and HDFS
• Optionally initialise Sparkling Water against an existing H2O cluster created previously [external backend]
• Pass to Sparkling Water the appropriate jar to use for the HDFS connectivity
• Work on your dataset using the HDFS connectivity
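
The Sparkling Water steps on this slide can be sketched as a `spark-submit` command line assembled in Python. This is a hedged illustration, not the exact commands from the talk: the `spark.ext.h2o.*` property names are the external-backend settings as documented for Sparkling Water and should be checked against the version you deploy, while the application name, cluster name, H2O node address, and jar path are hypothetical placeholders.

```python
import shlex


def build_spark_submit(app, cluster_name, h2o_node, extra_jars):
    """Assemble a spark-submit command that points Sparkling Water at an
    already-running H2O cluster (external backend, manual start)."""
    conf = {
        # Property names assumed from the Sparkling Water configuration docs.
        "spark.ext.h2o.backend.cluster.mode": "external",
        "spark.ext.h2o.external.start.mode": "manual",
        "spark.ext.h2o.cloud.name": cluster_name,
        "spark.ext.h2o.cloud.representative": h2o_node,
    }
    command = ["spark-submit"]
    if extra_jars:
        # --jars takes a comma-separated list, e.g. the HDFS connectivity jar.
        command += ["--jars", ",".join(extra_jars)]
    for key, value in conf.items():
        command += ["--conf", f"{key}={value}"]
    command.append(app)
    return command


# All values below are placeholders for illustration.
command = build_spark_submit(
    app="score_dataset.py",
    cluster_name="h2o-cluster",
    h2o_node="10.0.0.5:54321",
    extra_jars=["/opt/jars/hdfs-connector.jar"],
)
print(shlex.join(command))
```

Building the argument list first and quoting it only for display keeps the command safe to pass to `subprocess.run` without a shell.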
Simplify H2O Deployments at Scale in Minutes
• BlueData EPIC automatically deploys the environments
• Using persistent containers
• Providing true multi-tenancy
• Access to shared resources (CPU, RAM, GPUs, storage)
• Pre-built H2O images in the BlueData EPIC App Store
• Enterprise-grade security (integration with AD / LDAP / TDE)
Enable Compute / Storage Separation
BlueData DataTap
BlueData IOBoost
Connect the clusters to different datasets without copying the data, and with performance optimised
Integrate H2O with Production Environment
From the BlueData EPIC App Store, deploy more application clusters to connect to H2O
Lessons Learned – H2O on Containers
• Infrastructure for distributed ML / DL is complex (CPUs, GPUs, data …)
• This complexity can be abstracted from data science teams with self-service provisioning and automation, using containers
• GPU access can be effectively used by the containerized application, then released for other applications and users
• For a flexible and scalable solution, data resources should be decoupled from compute
• H2O, Driverless AI, and Sparkling Water can be deployed at scale on containers – whether on-premises, on any public cloud, or hybrid
• BlueData + H2O proven in production with Global 2000
Thank you!
www.bluedata.ai
Yoann Lechevallier
Senior Systems Engineer for Europe, Middle East, & Africa (EMEA)
BlueData
@BlueData www.linkedin.com/in/yoannlechevallier

Deploying H2O in Large-Scale Distributed Environments - Yoann L. - H2O AI World London


Editor's notes

  1. Deep learning uses general learning algorithms. These algorithms need to build the layers of an artificial neural network from training data. Processing this training data requires a lot of computation, mostly matrix multiplications.
  2. The #1 challenge in bringing the DevOps mindset to Big Data is scalability, reproducibility, and repeatability. It's easy enough for developers to work on their laptops. Data scientists sometimes prototype the entire pipeline on a powerful laptop with a whatever-it-takes, “make it work” mentality. You can take a single-node VM, install a bunch of libraries, and work on smallish datasets. But will that same program successfully deploy and work in a real environment that uses multi-node clusters, potentially different versions and libraries, and, more importantly, significantly larger volumes of data? This last aspect is unique to Big Data and is one of the biggest reasons that data teams are unable to iterate rapidly. ML / DL local: single-node VM, local libraries, limited data (10s of GB), “It works on my laptop”. Multi-node environments: different versions, different environment variables, libraries and dependencies that must exist on all nodes, Big Data (TBs of data).
  3. Virtualization ushered in cost savings through reduced footprint, faster server provisioning, and improved disaster recovery (DR), because the DR site hardware no longer had to mirror the primary data center. Do you need a full platform that can house multiple services? Go with a virtual machine. Do you need a single service that can be clustered and deployed at scale? Go with a container.
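
The first note above says that training a neural network comes down to a large number of matrix multiplications. As a minimal sketch of that idea (pure Python, no GPU, with made-up numbers), the forward pass of one dense layer is a matrix product followed by an activation function. The function names and values are illustrative assumptions, not taken from the talk.

```python
def matmul(a, b):
    """Multiply matrix a (rows x inner) by matrix b (inner x cols)."""
    rows, inner, cols = len(a), len(b), len(b[0])
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions do not match")
    return [
        [sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(rows)
    ]


def relu(matrix):
    """Element-wise rectified linear unit."""
    return [[max(0.0, value) for value in row] for row in matrix]


# Two input samples with two features each; a layer with three units.
batch = [[1.0, 2.0], [0.5, -1.0]]
weights = [[0.5, -1.0, 0.0], [0.25, 0.5, 1.0]]

activations = relu(matmul(batch, weights))
multiply_adds = len(batch) * len(weights) * len(weights[0])
print(activations, multiply_adds)
```

The multiply-add count grows with the product of batch size, input width, and layer width, which is why the deck stresses access to GPUs for real datasets.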