This video was recorded in London on October 30, 2018 and can be viewed here: https://youtu.be/dLfYjbWo9IA
This session will discuss how to get up and running quickly with containerized H2O environments (H2O Flow, Sparkling Water, and Driverless AI) at scale, in a multi-tenant architecture with a shared pool of resources using CPUs and/or GPUs. See how you can spin up (and tear down) your H2O environments on-demand, with just a few mouse clicks. Find out how to enable quota management of GPU resources for greater efficiency, and easily connect your compute to your datasets for large-scale distributed machine learning. Learn how to operationalize your machine learning pipelines and deliver faster time-to-value for your AI initiative, while ensuring enterprise-grade security and high performance.
Bio: Yoann Lechevallier is a Senior Systems Engineer at BlueData, where he focuses on helping enterprise customers deploy AI, machine learning, and big data analytics applications running on containers. Yoann has deep expertise in systems integration, performance tuning, and data analysis. He recently built containerized environments for H2O Flow, Sparkling Water, and Driverless AI for deployment with the BlueData EPIC software platform. He also developed a data connector for H2O Driverless AI to enable compute / storage separation with BlueData. Prior to BlueData, Yoann held positions in consulting, benchmark engineering, and professional services at Splunk, IBM, Bull SAS, Seanodes, and Sun Microsystems. Yoann has extensive experience working with leading enterprises throughout Europe, the Middle East, and Africa, including financial services and insurance (Barclays, RBS, HSBC, Vanquis, Lloyds, BNP, UBS, KBC, JPMC, Prudential, Royal London), telecommunications (BT, H3G, Nokia), and healthcare (HSCIC, Sidra). Yoann holds a Master of Science degree from INSA in Rouen, France, as well as a Master's degree in Embedded Computing from SUPAERO in Toulouse, France.
Deploying H2O in Large-Scale Distributed Environments - Yoann L. - H2O AI World London
1. Deploying H2O in Large-Scale Distributed Environments using Containers
Yoann Lechevallier
Senior Systems Engineer for Europe, Middle East, & Africa (EMEA)
BlueData
www.bluedata.ai @BlueData www.linkedin.com/in/yoannlechevallier
3. • Access to valuable data: small, big, or both
• Choices of modeling techniques: each problem is different
• Ability to build on datasets, validate on other datasets, iterate, and improve
• Access to GPUs (and CPUs)
• Scale easily on real datasets
• Ability to operationalize in production
Source: https://rohitnarurkar.wordpress.com/2013/11/02/cuda-matrix-multiplication
Distributed ML / DL – Key Requirements
4. • Scalability, repeatability, complexity, reproducibility across environments
• Sharing data, not duplicating data
• Deploying distributed platforms, libraries, applications, and versions
• Efficiently sharing expensive resources like GPUs
• Agility to scale up and down compute resources
• Providing a future-proof solution
• Ensuring compatible NVIDIA device kernel module installation
Distributed ML / DL – Challenges
Laptop | On-Prem Cluster | Off-Prem Cluster
6. Container-Based Architecture for AI / ML / DL
IOBoost™ – Extreme performance and enterprise-grade scalability
ElasticPlane™ – Self-service, multi-tenant containerized environments
DataTap™ – In-place access to data on-prem or in the cloud
Data Scientists Developers Data Engineers Data Analysts
NFS HDFS
Compute
Storage
On-Premises
CPUs GPUs
Hybrid Multi-Cloud
BlueData EPIC™ Software Platform
7. Example of an H2O Pipeline on Containers
H2O Driverless AI
Import Validate
Export
Shared Data Access Layer
… Data Sources …
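The Import → Validate → Export flow sketched on this slide can be expressed as a minimal pipeline skeleton. These are plain-Python stand-ins for illustration only, not the Driverless AI API:

```python
# Toy sketch of the Import -> Validate -> Export stages from the slide.
# Plain-Python stand-ins for illustration, not the Driverless AI API.

def import_dataset(rows):
    """Stand-in for pulling a dataset from the shared data access layer."""
    return [dict(r) for r in rows]

def validate(dataset):
    """Drop rows with missing values before training or scoring."""
    return [r for r in dataset if all(v is not None for v in r.values())]

def export_model(dataset):
    """Stand-in for exporting a trained model artifact for production."""
    return {"trained_on_rows": len(dataset)}

raw = [{"x": 1, "y": 2}, {"x": None, "y": 3}]
model = export_model(validate(import_dataset(raw)))
print(model)  # {'trained_on_rows': 1}
```

The point of the shared data access layer is that each stage reads from the same datasets without copying them between clusters.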
8. With H2O + BlueData EPIC, enterprise customers now have:
• Pre-built Docker H2O images with CUDA and automated cluster creation for the entire stack
• Appropriate NVIDIA kernel module surfaced automatically to the containers
• Easy access to resources required (e.g. single node, single GPU, multi-node, multi-GPU combinations)
• UI, CLI, and API access (notebooks, web, SSH)
• NFS mounts surfaced as local drives for sharing assets
Challenges Solved, Deployment Accelerated
9. Deploy H2O from Pre-Built Images in the BlueData EPIC App Store
Docker images for multiple applications and versions
Ability to create and add new images, and save or restore tested combinations on demand
10. Multi-Tenant, with Quotas for GPU Resources
Support for multi-tenancy and ability to define quota per tenant
Define ‘flavor’ types used to launch Docker containers
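As an illustration of what per-tenant GPU quota enforcement means in practice, here is a toy sketch in Python. The class, names, and logic are hypothetical stand-ins for the concept, not the BlueData EPIC API:

```python
# Toy sketch of per-tenant GPU quota enforcement.
# Hypothetical illustration only, not the BlueData EPIC API.

class GpuQuota:
    def __init__(self, limits):
        self.limits = dict(limits)            # tenant -> max GPUs allowed
        self.in_use = {t: 0 for t in limits}  # tenant -> GPUs currently held

    def allocate(self, tenant, gpus):
        """Grant the request only if it stays within the tenant's quota."""
        if self.in_use[tenant] + gpus > self.limits[tenant]:
            return False
        self.in_use[tenant] += gpus
        return True

    def release(self, tenant, gpus):
        """Return GPUs to the shared pool when a cluster is torn down."""
        self.in_use[tenant] = max(0, self.in_use[tenant] - gpus)

quota = GpuQuota({"data-science": 4, "engineering": 2})
print(quota.allocate("data-science", 3))  # True: 3 of 4 GPUs in use
print(quota.allocate("data-science", 2))  # False: would exceed the quota of 4
```

Tearing down a cluster releases its GPUs back to the pool, which is what lets the shared resources be reused by other tenants.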
11. Spin Up Multiple Environments
Quick launch templates for one-click cluster creation
Run multiple clusters, with different versions or combinations of tools, side by side
12. Pick from a list of pre-built and tested images
Assign specific resources (GPUs, CPUs) to the cluster, depending on the use case (e.g. for Driverless AI)
Define number of nodes, here for H2O and Sparkling Water
On-Demand Cluster Creation
13. • The user authenticates on Driverless AI
• Import datasets from BlueData DataTap with the DataTap connector, with access optimised by BlueData IOBoost
• Analyse the data
• Run experiments
• Build models, save them …
• Validate against other datasets from DataTap …
• Export model for production
Run Driverless AI on Containers with GPUs
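DataTap exposes datasets through dtap:// URIs. As an illustration (the example path layout is an assumption for this sketch, not taken from BlueData documentation), such a URI can be split into its parts with Python's standard library:

```python
# Splitting a dtap:// URI into its parts with the standard library.
# The path layout shown is an illustrative assumption, not from BlueData docs.
from urllib.parse import urlparse

uri = "dtap://TenantStorage/datasets/train.csv"
parts = urlparse(uri)
print(parts.scheme)  # dtap
print(parts.netloc)  # TenantStorage
print(parts.path)    # /datasets/train.csv
```

The same scheme-based addressing is what lets a connector route reads to the shared data access layer instead of local disk.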
14. • Optionally initialise Sparkling Water against an existing H2O cluster created previously
[external backend]
• Pass Sparkling Water the appropriate JAR to use for HDFS connectivity
• Work on your dataset using the HDFS connectivity
Work with Sparkling Water Cluster and HDFS
15. • BlueData EPIC automatically deploys the environments
• Using persistent containers
• Providing true multi-tenancy
• Access to shared resources (CPU, RAM, GPUs, storage)
• Pre-built H2O images in the BlueData EPIC App Store
• Enterprise-grade security (integration with AD / LDAP / TDE)
Simplify H2O Deployments at Scale in Minutes
16. BlueData DataTap
BlueData IOBoost
Enable Compute / Storage Separation
Connect the clusters to different datasets without copying the data, and with performance optimised
17. From the BlueData EPIC App Store, deploy more application clusters to connect to H2O
Integrate H2O with Production Environment
18. • Infrastructure for distributed ML / DL is complex (CPUs, GPUs, data …)
• This complexity can be abstracted from data science teams with self-service provisioning and automation, using containers
• GPU access can be effectively used by the containerized application, then released for other applications and users
• For a flexible and scalable solution, data resources should be decoupled from compute
• H2O, Driverless AI, and Sparkling Water can be deployed at scale on containers – whether on-premises, on any public cloud, or hybrid
• BlueData + H2O proven in production with Global 2000
Lessons Learned – H2O on Containers
19. Thank you!
www.bluedata.ai
Yoann Lechevallier
Senior Systems Engineer for Europe, Middle East, & Africa (EMEA)
BlueData
@BlueData www.linkedin.com/in/yoannlechevallier
Editor's notes
Deep learning uses general learning algorithms.
The algorithms need to build the layers of an artificial neural network from training data.
Processing this training data requires lots of computation: above all, matrix multiplications.
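To make the matrix-multiplication point concrete: the naive algorithm for two n × n matrices performs n³ scalar multiplications, which is exactly the kind of highly parallel workload GPUs accelerate. A minimal pure-Python sketch:

```python
# Why matrix multiplication dominates training cost: multiplying two n x n
# matrices takes n**3 scalar multiplications with the naive algorithm,
# the workload that GPUs parallelise well.

def matmul(a, b):
    n, m, p = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]

a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
print(matmul(a, b))  # [[19, 22], [43, 50]]
# For n x n inputs the cost is n**3: doubling n means 8x the multiplications.
```

Neural-network training repeats such multiplications for every layer and every batch, which is why GPU access is a key requirement in the slides above.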
The #1 challenge in bringing the DevOps mindset to Big Data is scalability, reproducibility, and repeatability.
It’s easy enough for developers to work on their laptops. Data scientists sometimes prototype the entire pipeline on a powerful laptop with a “whatever it takes, make it work” mentality: take a single-node VM, install a bunch of libraries, and work on smallish data sets.
But will that same program successfully deploy and work in a real environment that uses multi-node clusters, potentially different versions and libraries, and, more importantly, significantly larger volumes of data? This last aspect is unique to Big Data and is one of the single biggest reasons that data teams are unable to iterate rapidly.
ML / DL local: single-node VM; local libraries; limited data (10s of GB); “It works on my laptop”
Real environment: multi-node; different versions; different environment variables; libraries and dependencies must exist on all nodes; Big Data (TBs of data)
Virtualization ushered in cost savings through reduced footprint, faster server provisioning, and improved disaster recovery (DR), because the DR site hardware no longer had to mirror the primary data center.
Do you need a full platform that can house multiple services? Go with a virtual machine.
Do you need a single service that can be clustered and deployed at scale? Go with a container.