Univa and UberCloud demonstrate how to manage and execute Ansys Application Containers in a hybrid cloud, powered by Navops Launch and Univa Grid Engine.
UberCloud and Univa have partnered to provide a comprehensive container orchestration environment together with a growing set of containerized application software from ANSYS, CD-adapco, Gromacs, NICE DCV, Numeca, OpenFOAM, PSPP, and Scilab.
www.theubercloud.com/cloud-hpc
2. Who is Univa?
• Software-defined computing infrastructure (SDI) solutions to modernize mission-
critical workloads and accelerate hybrid cloud migration
• Focused on Global 2000 customers across every type of infrastructure and
software stack
• Thousands of diverse applications, workflows and millions of tasks every day
• Founded in 2011 / Privately held / 50 employees
• Offices in Toronto, Chicago and Munich
Largest independent provider of cluster management and
orchestration software
250
Global Customers
1,000
Integrated Applications
2,500,000
Compute Cores
7,000,000
Jobs Per Day
3. Univa market presence by industry
Industry leaders use Univa products to optimize performance and
manage mission-critical applications
Energy Gov’t Financial Life Science Manufacturing Transportation
4. Univa product lines
Univa provides cluster management and workload orchestration solutions
that optimize legacy infrastructure and facilitate hybrid cloud operations
Maximize
Existing Resources
High Performance
Computing (HPC)
Modernize
Cloud Strategy
Cloud, Hybrid-Cloud,
Cloud-Native
Infrastructure
5. Univa product details
High Performance Computing
Create a single compute pool across
distributed data center resources
Univa Grid Engine features:
• Efficiently manage HPC workloads
• Track and measure resource
utilization
• License sharing according to business
objectives
• Run Docker containers in a Grid
Engine cluster and blend containers
with other workloads
Cloud, Hybrid Cloud, Cloud-Native
Migrate to cloud to increase efficiencies
and run containers at scale
Harness hybrid, public and private cloud
resources
Run containerized and non-containerized
workloads more efficiently on Kubernetes
Run Mesos frameworks on native
Kubernetes or Grid Engine clusters
LAUNCH
COMMAND
URB
6. Univa HPC Market Perspectives
Univa customers are overweighted
toward the enterprise
Customer cloud interest has
grown 10-fold since last year
Movement to cloud is a recent but
strong trend: 61% of Univa
customers surveyed are planning a cloud
deployment or have already deployed to cloud
Open source project Tortuga
New frameworks, containers &
cloud are changing how enterprises
look at technical computing
Cloud Native HPC
“In the past 18 months, 100% of hundreds of
HPC inquiries were regarding cloud migration”
Gartner HPC Research Director, 2018
Univa Customer Survey, 2018
7. Brief History of UberCloud
Timeline, 2012 to 2018, in order:
• Started Cloud Experiments (200+ so far)
• HPC Technology Development
• Univa Reseller Relationship
• CAE Software Containers
• ANSYS Cloud Hosting Partner
8. Software Containers: what are they?
• Core feature of Linux
• Virtualizes the operating system (containers share the host kernel)
• Low overhead, small image size
• Docker is the most common standard
9. UberCloud Container Technology
• Based on Docker, enhanced for
engineering & scientific applications
• Application software is pre-installed,
configured, and tested by UberCloud
• Includes all the tools an engineer
needs such as MPI and remote viz
• Available as self service or as a fully
managed service
12. What is Navops Launch?
Capabilities
• Deploy and manage HPC clusters
• Physical, virtual, cloud,
hybrid or any combination of
these
• Fully automated configuration
management of nodes
Key Features
• Policy-based launching of cloud
instances
• Automate the use of Cloud
• Deep integrations to cloud
provider HPC oriented offerings
• Integrates well with existing
“brownfield” corporate IT tools
13. Policy-based cloud bursting
With Navops Launch you can easily create custom policies to
dynamically burst and manage your hybrid cloud
Public or Private
clouds
On-premise Cluster
Workloads burst based on
configurable policies
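As a rough illustration of what a bursting policy of this kind could look like, here is a minimal Python sketch. It is not Navops Launch syntax or its API: the `BurstPolicy` fields and the `nodes_to_request` function are hypothetical names invented for this example, and the rule itself (burst when the pending queue exceeds a threshold, capped by a node limit) is an assumption.

```python
import math
from dataclasses import dataclass


@dataclass
class BurstPolicy:
    # All fields are hypothetical knobs, not real Navops Launch settings.
    pending_threshold: int  # burst only when more jobs than this are waiting
    jobs_per_node: int      # jobs one cloud node is assumed to absorb
    max_cloud_nodes: int    # hard cap on cloud nodes, to bound spend


def nodes_to_request(pending_jobs: int, current_cloud_nodes: int,
                     policy: BurstPolicy) -> int:
    """Return how many extra cloud nodes the policy asks for (0 = no burst)."""
    excess = pending_jobs - policy.pending_threshold
    if excess <= 0:
        return 0
    # Enough nodes to absorb the excess, rounded up.
    wanted = math.ceil(excess / policy.jobs_per_node)
    # Never exceed the configured cap.
    headroom = max(policy.max_cloud_nodes - current_cloud_nodes, 0)
    return min(wanted, headroom)
```

With a threshold of 100 pending jobs, 8 jobs per node and a cap of 10 nodes, a queue of 120 jobs would ask for 3 nodes, while a queue of 90 would ask for none.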
14. Seamless Hybrid Cloud
Virtual
Machines
Bare metal
On Prem.
Storage
Traditional Batch / Machine Learning / Analytic Workloads - Containerized and Non-containerized
Policy engine /
Cloud automation
Cloud Provider API
Cloud provider provisions and
de-provisions instances
Cloud Infrastructure
Cloud hosts scale based on workload demand
and provisioning policies
Data Management
Cache, synch, mirror,
transfer,..
VPN, VPC
Local network extended to
cloud-based resources
Cloud
Storage
On-premise infrastructure
15. Case Study: Hybrid Cloud Bursting
The Wharton School is globally known for intellectual leadership
and innovation in business education.
Wharton was able to avoid substantial infrastructure costs and user
training while transparently tripling its core count
CHALLENGE:
• Diverse research & analytic workloads
• Growing demands by students and faculty
• Workload requirements hard to predict
• Challenges growing HPC infrastructure
SOLUTION:
• Univa Grid Engine
• Navops Launch
• Seamless bursting to public cloud
GPUs and TensorFlow
CUSTOMER VALUE:
• Significant cost avoidance on new infrastructure
• Improved capacity and strategic flexibility
• Hybrid, pay-per-use model for cost efficiency
16. Case Study: Hybrid Cloud Bursting
Improving EDA license usage by bursting jobs selectively
By leveraging Unicloud, Mellanox can extend their
local cluster to the cloud, providing increased
capacity and maximizing on-prem EDA license
usage by shifting cloud-friendly workloads to Azure
CHALLENGE:
• Critical simulation & verification workloads
• Need to defer/reduce CapEx
• Unmet demand during peak periods
• On-premises EDA licenses underutilized
SOLUTION:
• Univa Grid Engine
• Univa License Orchestrator
• Univa Unicloud with auto-scaling
• Hybrid cloud environment
CUSTOMER VALUE:
• Avoid resource bottlenecks during tape-out periods
• Reduce cost by paying for capacity only when needed
• Maximize on-premises EDA license usage by shifting
non-licensed work to the cloud, improving productivity
Hyperion (2017) reported that the middleware segment of HPC is going to grow at a 6.9% CAGR from 2016 to 2021.
The cloud category is the fastest-growing HPC category at 14%-18% CAGR but represents only 20% of the market at present.
11 have purchased Navops Launch, 47 in pipeline
In 2012 the cloud was already mainstream in many areas. AWS was 10 years old.
What we realized was that technical computing was lagging in the adoption of some of these models.
We set about finding out what the challenges were and then addressing them one by one. The way we do this is through what we call experiments.
These are short projects, about 1 month long. It is a completely voluntary effort in which we bring together a hardware vendor, a software vendor and an end user, and we help them work on a real-world problem.
50/25 - 50/50 (average time: 3mo) (average time: under 1 week)
The resulting cases are published as a compendium that is available to everyone for free. It’s a way of building the knowledge base of the whole community.
Based on what we learned, we decided to build HPC containers and spent the next 18 months developing them.
During that time we signed partnerships with ANSYS and Univa. Our partnership with Univa goes back to November of 2015. And in 2015 we released our CAE software containers, including one from ANSYS. These are available, for example, in the Microsoft Azure Marketplace.
And finally, this year we’ve deepened our partnership with Univa and become an ANSYS advanced solution partner.
What are these containers that we’re talking about?
Shipping container metaphor - Prior to that no standardization
Containers are a type of virtualization technology that is based on the Linux kernel.
How are containers different from virtual machines? VMs virtualize the hardware. For example, they take 1 CPU and make it look like 5.
Containers virtualize the operating system. So they operate at one level higher than VMs.
This makes containers very lightweight since they don’t carry around a whole OS.
It also makes them very portable. VMs need the same technology on both sides for mobility, e.g. vMotion.
Anywhere there’s a Linux OS with a Docker runtime, these containers can run.
That makes them very suitable for heterogeneous infrastructure, for example a hybrid cloud scenario: on-prem + cloud.
Docker is a container standard.
Containers are an incredibly popular technology for microservices (a design pattern for developing web-facing applications).
But they’re not exclusive to microservices and can be used for regular applications.
UberCloud has taken this technology and adapted it for HPC workloads, enhancing it for engineering & scientific application software such as ANSYS.
These are packages that contain everything an engineer needs to complete a task on a remote server as if it were running on their desktop.
Up-to-date OS, libraries, MPI, utilities for remote viz, data storage, engineering applications.
Software packages designed to deliver the tools that an engineer needs
Ready to execute, in an instant. No need to install software, deal with complex OS commands, or configure.
Hybrid Infrastructure. Containers let you run your ANSYS workload on-prem and in the cloud.
All the benefits of cloud including the ability to scale up and down
Great GUI capabilities
ANSYS has Workbench, which lets you take a drag-and-drop approach to setting up complex, multi-physics simulations.
Now you can have that same workstation-like capability with HPC infrastructure.
Users don’t have the frustration of having to learn a new and unfamiliar access model.
A simple GUI run within a web browser allows for the maximum level of ease of use for new users. Workstation
Pre and post processing in the cloud. Minimize data transfer
SSH command-line access is also available for interactive use.
Batch users can access a batch interface as well. That is enabled through a three-way integration between UberCloud, Univa and CycleCloud from Microsoft.
Microsoft provides the HPC hardware, CycleCloud creates HPC clusters, and UGE orchestrates the UberCloud ANSYS containers. Complete solution.
ANSYS Elastic Licensing
Hybrid infrastructure needs a corresponding licensing structure
ANSYS has innovated here and introduced elastic licenses
These are licenses that can be consumed on demand. They can be mixed with paid-up licenses.
In a hybrid scenario, you have perpetual licenses for the on-premise cluster and elastic licensing for the cloud overflow.
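To make the overflow idea concrete, here is a minimal Python sketch of the arithmetic. It assumes demand can be counted in generic "license units"; how ANSYS actually meters elastic licenses is not covered in these notes, and `split_license_demand` is a hypothetical helper written only for this illustration.

```python
from typing import Tuple


def split_license_demand(units_needed: int, perpetual_units: int) -> Tuple[int, int]:
    """Split demand into (units covered by perpetual licenses, elastic overflow).

    'Units' are an assumed generic measure, not an ANSYS licensing term.
    """
    if units_needed < 0 or perpetual_units < 0:
        raise ValueError("license counts must be non-negative")
    # Use the already-paid perpetual licenses first.
    on_perpetual = min(units_needed, perpetual_units)
    # Anything beyond that is consumed on demand as elastic licensing.
    return on_perpetual, units_needed - on_perpetual
```

For example, a peak demand of 160 units against 128 perpetual units would leave 32 units to be covered elastically, and a demand of 100 units would need no elastic licensing at all.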
The Seamless Hybrid Cloud is what many of our customers are looking for: a system that combines the ‘best of both worlds’. They continue to have an on-premise HPC cluster that runs their day-to-day workload, but they can extend that cluster ‘on demand into the cloud based on rules and policy’ to meet the needs of their users or projects.
To do this, one additional Univa product is needed, and that is Navops Launch. If we start at the left of this diagram, you can see that Navops Launch is the cloud automation and policy engine. It interfaces with the cloud infrastructure and requests new compute resources as needed via the cloud vendor API. This part of the hybrid cloud is just one piece of the whole puzzle. As nodes are created in the cloud and added to the cluster dynamically, they are automatically configured using configuration management tools, then connected to a network infrastructure that is extended to the cloud vendor over a VPC or a ‘Direct Hardware VPN connection’. Finally, the customer has to determine how the data for their applications will be synchronized between on premise and the cloud. There are many potential solutions, from the simple, such as ‘rsync’, to the more complex (and expensive), such as an Avere caching filer.
So you see, a Hybrid Cloud as we define it extends Univa Grid Engine to a cloud provider and adds the compute to the cluster ‘transparently’. Adding and managing nodes must be as similar as possible on prem and in the cloud. Univa products such as Navops Launch abstract this a bit to make it ‘look the same’, even if underneath the code does something different for on premise and cloud.
You may be thinking, ‘Why would someone go to all this trouble?’ Why not just create a new cluster in the cloud with a new workload manager (SLURM or others) and run the work there? This seems like a lot of work.
It is a bit of work, but the reason they do it is that the hundreds of thousands of applications that have ‘thousands of scripts and hooks written around them’, created over decades, *DO NOT HAVE TO CHANGE*. Put simply, the workflow, the ‘sticky and messy stuff’, works without change in the hybrid cloud.
We have several customers who are using, or starting the process of creating, a ‘Seamless Hybrid Cloud’, and a small number of customers are looking beyond that to incorporate ‘Cloud-Native HPC’.
By shifting jobs that do not consume EDA licenses to the cloud, Mellanox ensures there are more scheduling slots available for jobs that consume (Cadence / Synopsys) licenses, thus keeping their license features more fully utilized (which is more consequential from a cost standpoint than just keeping hardware busy).
Unicloud is versatile enough in its design to allow Mellanox to use their own custom machine images on Azure, providing the flexibility to deploy the same image on-premises or in the cloud and avoiding the need to build a new virtual machine image specific to the cloud provider. Competing solutions like Bright Cluster Manager do not allow this.
Mellanox has developed custom scripts: when the number of pending jobs exceeds a threshold, the scripts dynamically pick jobs from the pending queue and move them to the cloud-bursting queue. They move only jobs that do not need EDA licenses, because they are not entitled to use the licenses in the cloud. They also dispatch jobs to appropriately sized VMs on Azure depending on the resource requirements of the job, thus ensuring that they’re using the most cost-effective resources to get the job done.
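The selection logic described above can be sketched roughly as follows. This is not Mellanox’s script, which is not shown in these notes. The `Job` and `VmSize` types, the size names, and the choice to move only as many jobs as exceed the threshold are all assumptions made for illustration, and nothing here calls Grid Engine or Azure.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Job:
    job_id: int
    needs_eda_license: bool
    cores: int
    mem_gb: int


@dataclass
class VmSize:
    name: str            # hypothetical label, not a real Azure SKU
    cores: int
    mem_gb: int
    cost_per_hour: float


def cheapest_fit(job: Job, sizes: List[VmSize]) -> Optional[VmSize]:
    """Pick the lowest-cost VM size that satisfies the job's requirements."""
    fits = [s for s in sizes if s.cores >= job.cores and s.mem_gb >= job.mem_gb]
    return min(fits, key=lambda s: s.cost_per_hour) if fits else None


def plan_burst(pending: List[Job], threshold: int,
               sizes: List[VmSize]) -> List[Tuple[int, str]]:
    """Return (job_id, vm_size_name) pairs for jobs to move to the burst queue."""
    overflow = len(pending) - threshold
    if overflow <= 0:
        return []  # queue is below the threshold, nothing to burst
    plan: List[Tuple[int, str]] = []
    for job in pending:
        if len(plan) >= overflow:
            break
        if job.needs_eda_license:
            continue  # licensed jobs stay on-prem
        size = cheapest_fit(job, sizes)
        if size is not None:  # skip jobs no VM size can hold
            plan.append((job.job_id, size.name))
    return plan
```

A real script would then hand each selected job to the scheduler’s cloud-bursting queue; that step is left out because the notes do not describe how it is done.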
In the next several slides I am going to talk at a high level about the Univa products that form ‘part of the Hybrid Cloud’ and where we are going with them.