This video was recorded in London on October 30, 2018 and can be viewed here: https://youtu.be/dLfYjbWo9IA
This session will discuss how to get up and running quickly with containerized H2O environments (H2O Flow, Sparkling Water, and Driverless AI) at scale, in a multi-tenant architecture with a shared pool of resources using CPUs and/or GPUs. See how you can spin up (and tear down) your H2O environments on-demand, with just a few mouse clicks. Find out how to enable quota management of GPU resources for greater efficiency, and easily connect your compute to your datasets for large-scale distributed machine learning. Learn how to operationalize your machine learning pipelines and deliver faster time-to-value for your AI initiative, while ensuring enterprise-grade security and high performance.
Bio: Yoann Lechevallier is a Senior Systems Engineer at BlueData, where he focuses on helping enterprise customers deploy AI, machine learning, and big data analytics applications running on containers. Yoann has deep expertise in systems integration, performance tuning, and data analysis. He recently built containerized environments for H2O Flow, Sparkling Water, and Driverless AI for deployment with the BlueData EPIC software platform. He also developed a data connector for H2O Driverless AI to enable compute / storage separation with BlueData. Prior to BlueData, Yoann held positions in consulting, benchmark engineering, and professional services at Splunk, IBM, Bull SAS, Seanodes, and Sun Microsystems. Yoann has extensive experience working with leading enterprises throughout Europe, the Middle East, and Africa, including financial services and insurance (Barclays, RBS, HSBC, Vanquis, Lloyds, BNP, UBS, KBC, JPMC, Prudential, Royal London), telecommunications (BT, H3G, Nokia), and healthcare (HSCIC, Sidra). Yoann holds a Master of Science degree from INSA in Rouen, France, as well as a Master's degree in Embedded Computing from SUPAERO in Toulouse, France.
Deploying H2O in Large-Scale Distributed Environments - Yoann L. - H2O AI World London
1. Deploying H2O in Large-Scale Distributed Environments using Containers
Yoann Lechevallier
Senior Systems Engineer for Europe, Middle East, & Africa (EMEA)
BlueData
www.bluedata.ai @BlueData www.linkedin.com/in/yoannlechevallier
3. • Access to valuable data: small, big, or both
• Choices of modeling techniques: each problem is different
• Ability to build on datasets, validate on other datasets, iterate, and improve
• Access to GPUs (and CPUs)
• Scale easily on real datasets
• Ability to operationalize in production
Source: https://rohitnarurkar.wordpress.com/2013/11/02/cuda-matrix-multiplication
Distributed ML / DL – Key Requirements
4. • Scalability, repeatability, complexity, reproducibility across environments
• Sharing data, not duplicating data
• Deploying distributed platforms, libraries, applications, and versions
• Efficiently sharing expensive resources like GPUs
• Agility to scale up and down compute resources
• Providing a future-proof solution
• Ensuring compatible NVIDIA device kernel module installation
Distributed ML / DL – Challenges
Laptop | On-Prem Cluster | Off-Prem Cluster
6. Container-Based Architecture for AI / ML / DL
IOBoost™ – Extreme performance and enterprise-grade scalability
ElasticPlane™ – Self-service, multi-tenant containerized environments
DataTap™ – In-place access to data on-prem or in the cloud
Data Scientists Developers Data Engineers Data Analysts
NFS HDFS
Compute
Storage
On-Premises
CPUs GPUs
Hybrid Multi-Cloud
BlueData EPIC™ Software Platform
7. Example of an H2O Pipeline on Containers
H2O Driverless AI
Import Validate
Export
Shared Data Access Layer
… Data Sources …
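The Import → Validate → Export flow sketched on this slide can be expressed as a minimal pipeline skeleton. These are plain-Python stand-ins for illustration only, not the Driverless AI API:

```python
# Toy sketch of the Import -> Validate -> Export stages from the slide.
# Plain-Python stand-ins for illustration, not the Driverless AI API.

def import_dataset(rows):
    """Stand-in for pulling a dataset from the shared data access layer."""
    return [dict(r) for r in rows]

def validate(dataset):
    """Drop rows with missing values before training or scoring."""
    return [r for r in dataset if all(v is not None for v in r.values())]

def export_model(dataset):
    """Stand-in for exporting a trained model artifact for production."""
    return {"trained_on_rows": len(dataset)}

raw = [{"x": 1, "y": 2}, {"x": None, "y": 3}]
model = export_model(validate(import_dataset(raw)))
print(model)  # {'trained_on_rows': 1}
```

The point of the shared data access layer is that each stage reads from the same datasets without copying them between clusters.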
8. With H2O + BlueData EPIC, enterprise customers now have:
• Pre-built Docker H2O images with CUDA and automated cluster creation for the entire stack
• Appropriate NVIDIA kernel module surfaced automatically to the containers
• Easy access to resources required (e.g. single node, single GPU, multi-node, multi-GPU combinations)
• UI, CLI, and API access (notebooks, web, SSH)
• NFS mounts surfaced as local drives for sharing assets
Challenges Solved, Deployment Accelerated
9. Deploy H2O from Pre-Built Images in the BlueData EPIC App Store
Docker images for multiple applications and versions
Ability to create and add new images, and save or restore tested combinations on demand
10. Multi-Tenant, with Quotas for GPU Resources
Support for multi-tenancy and ability to define quota per tenant
Define ‘flavor’ types used to launch Docker containers
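As an illustration of what per-tenant GPU quota enforcement means in practice, here is a toy sketch in Python. The class, names, and logic are hypothetical stand-ins for the concept, not the BlueData EPIC API:

```python
# Toy sketch of per-tenant GPU quota enforcement.
# Hypothetical illustration only, not the BlueData EPIC API.

class GpuQuota:
    def __init__(self, limits):
        self.limits = dict(limits)            # tenant -> max GPUs allowed
        self.in_use = {t: 0 for t in limits}  # tenant -> GPUs currently held

    def allocate(self, tenant, gpus):
        """Grant the request only if it stays within the tenant's quota."""
        if self.in_use[tenant] + gpus > self.limits[tenant]:
            return False
        self.in_use[tenant] += gpus
        return True

    def release(self, tenant, gpus):
        """Return GPUs to the shared pool when a cluster is torn down."""
        self.in_use[tenant] = max(0, self.in_use[tenant] - gpus)

quota = GpuQuota({"data-science": 4, "engineering": 2})
print(quota.allocate("data-science", 3))  # True: 3 of 4 GPUs in use
print(quota.allocate("data-science", 2))  # False: would exceed the quota of 4
```

Tearing down a cluster releases its GPUs back to the pool, which is what lets the shared resources be reused by other tenants.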
11. Spin Up Multiple Environments
Quick launch templates for one-click cluster creation
Run multiple clusters, with different versions or combinations of tools, side by side
12. Pick from a list of pre-built and tested images
Assign specific resources (GPUs, CPUs) to the cluster, depending on the use case (e.g. for Driverless AI)
Define number of nodes, here for H2O and Sparkling Water
On-Demand Cluster Creation
13. • The user authenticates on Driverless AI
• Import datasets from BlueData DataTap with the DataTap connector, with access optimised by BlueData IOBoost
• Analyse the data
• Run experiments
• Build models, save them …
• Validate against other datasets from DataTap …
• Export model for production
Run Driverless AI on Containers with GPUs
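DataTap exposes datasets through dtap:// URIs. As an illustration (the example path layout is an assumption for this sketch, not taken from BlueData documentation), such a URI can be split into its parts with Python's standard library:

```python
# Splitting a dtap:// URI into its parts with the standard library.
# The path layout shown is an illustrative assumption, not from BlueData docs.
from urllib.parse import urlparse

uri = "dtap://TenantStorage/datasets/train.csv"
parts = urlparse(uri)
print(parts.scheme)  # dtap
print(parts.netloc)  # TenantStorage
print(parts.path)    # /datasets/train.csv
```

The same scheme-based addressing is what lets a connector route reads to the shared data access layer instead of local disk.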
14. • Optionally initialise Sparkling Water against an existing H2O cluster created previously
[external backend]
• Pass Sparkling Water the appropriate JAR to use for HDFS connectivity
• Work on your dataset using the HDFS connectivity
Work with Sparkling Water Cluster and HDFS
15. • BlueData EPIC automatically deploys the environments
• Using persistent containers
• Providing true multi-tenancy
• Access to shared resources (CPU, RAM, GPUs, storage)
• Pre-built H2O images in the BlueData EPIC App Store
• Enterprise-grade security (integration with AD / LDAP / TDE)
Simplify H2O Deployments at Scale in Minutes
16. BlueData DataTap
BlueData IOBoost
Enable Compute / Storage Separation
Connect the clusters to different datasets without copying the data, and with performance optimised
17. From the BlueData EPIC App Store, deploy more application clusters to connect to H2O
Integrate H2O with Production Environment
18. • Infrastructure for distributed ML / DL is complex (CPUs, GPUs, data …)
• This complexity can be abstracted from data science teams with self-service provisioning and automation, using containers
• GPU access can be effectively used by the containerized application, then released for other applications and users
• For a flexible and scalable solution, data resources should be decoupled from compute
• H2O, Driverless AI, and Sparkling Water can be deployed at scale on containers – whether on-premises, on any public cloud, or hybrid
• BlueData + H2O proven in production with Global 2000
Lessons Learned – H2O on Containers
19. Thank you!
www.bluedata.ai
Yoann Lechevallier
Senior Systems Engineer for Europe, Middle East, & Africa (EMEA)
BlueData
@BlueData www.linkedin.com/in/yoannlechevallier
Editor's notes
Deep learning uses general learning algorithms.
The algorithms need to build the layers of an artificial neural network from training data.
Processing this training data requires lots of computation: above all, matrix multiplications.
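To make the matrix-multiplication point concrete: the naive algorithm for two n × n matrices performs n³ scalar multiplications, which is exactly the kind of highly parallel workload GPUs accelerate. A minimal pure-Python sketch:

```python
# Why matrix multiplication dominates training cost: multiplying two n x n
# matrices takes n**3 scalar multiplications with the naive algorithm,
# the workload that GPUs parallelise well.

def matmul(a, b):
    n, m, p = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]

a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
print(matmul(a, b))  # [[19, 22], [43, 50]]
# For n x n inputs the cost is n**3: doubling n means 8x the multiplications.
```

Neural-network training repeats such multiplications for every layer and every batch, which is why GPU access is a key requirement in the slides above.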
The #1 challenge in bringing the DevOps mindset to Big Data is scalability, reproducibility, and repeatability.
It’s easy enough for developers to work on their laptops. Data scientists sometimes prototype the entire pipeline on a powerful laptop with a “whatever it takes, make it work” mentality: take a single-node VM, install a bunch of libraries, and work on smallish data sets.
But will that same program successfully deploy and work in a real environment that uses multi-node clusters, potentially different versions and libraries, and, more importantly, significantly larger volumes of data? This last aspect is unique to Big Data and is one of the single biggest reasons that data teams are unable to iterate rapidly.
ML / DL local: single-node VM; local libraries; limited data (10s of GB); “It works on my laptop”
Real environment: multi-node; different versions; different environment variables; libraries and dependencies must exist on all nodes; Big Data (TBs of data)
Virtualization ushered in cost savings through reduced footprint, faster server provisioning, and improved disaster recovery (DR), because the DR site hardware no longer had to mirror the primary data center.
Do you need a full platform that can house multiple services? Go with a virtual machine.
Do you need a single service that can be clustered and deployed at scale? Go with a container.