Thought Leadership White Paper
IBM Systems and Technology Group November 2012
Could the “C” in HPC
stand for Cloud?
By Christopher N. Porter, IBM Corporation
porterc@us.ibm.com
Introduction
Most IaaS (infrastructure as a service) vendors such as
Rackspace, Amazon and Savvis use various virtualization
technologies to manage the underlying hardware they
build their offerings on. Unfortunately, the virtualization
technologies used vary from vendor to vendor and are
sometimes kept secret. Therefore, the question about
virtual machines versus physical machines for high
performance computing (HPC) applications is germane
to any discussion of HPC in the cloud.
This paper examines aspects of computing important in HPC
(compute and network bandwidth, compute and network
latency, memory size and bandwidth, I/O, and so on) and how
they are affected by various virtualization technologies. The
benchmark results presented will illuminate areas where cloud
computing, as a virtualized infrastructure, is sufficient for some
workloads and inappropriate for others. In addition, the paper
provides a quantitative assessment of the performance
differences between a sample of applications running on
various hypervisors so that data-based decisions can be made
for datacenter and technology adoption planning.
A business case for HPC clouds
HPC architects have been slow to adopt virtualization
technologies for two reasons:
1.	The common assumption that virtualization impacts
application performance so severely that any gains in
flexibility are far outweighed by the loss of application
throughput.
2.	Utilization on traditional HPC infrastructure is very high
(between 80 and 95 percent). Therefore, the typical driving
business cases for virtualization (for example, hardware
utilization, server consolidation or license utilization) simply
did not carry enough weight to justify the added complexity
and expense of running workloads on virtualized resources.
In many cases, however, HPC architects would be willing to
lose some small percentage of application performance to
achieve the flexibility and resilience that virtual machine-based
computing would allow. There are several reasons architects
may make this compromise, including:
•	 Security: Some HPC environments require data and host
isolation between groups of users or even between the users
themselves. In these situations, VMs and VLANs can be used
in concert to isolate users from each other and restrict data to
the users who should have access to it.
•	 Application stack control: In a mixed application
environment where multiple applications share the same
physical hardware, it can be difficult to satisfy the
configuration requirements of each application, including OS
versions, updates and libraries. Using virtualization makes that
task easier since the whole stack can be deployed as part of the
application.
•	 High value asset maximization: In a heterogeneous HPC
system the newest machines are often in highest demand. To
manage this demand, some organizations use a reservation
system to minimize conflicts between users. When using VMs
for computing, however, the migration facility available within
most hypervisors allows opportunistic workloads to use high
value assets even after a reservation window opens for a
different user. If the reserving user submits workload against a
reservation, then the opportunistic workload can be migrated
to other assets to continue processing without losing any CPU
cycles.
•	 Utilization improvement: If the losses in application
performance are very small (single-digit percentages), then
adopting virtualization may enable incremental gains in
overall utilization, and with them an increase in total
throughput for the HPC environment.
•	 Large execution time jobs: Several HPC applications offer
no checkpoint-restart capability. VM technology, however, can
capture and checkpoint the entire state of the virtual machine,
providing a checkpoint capability for these applications. If job
run times approach the mean time between failures (MTBF) of
the solution as a whole, then the checkpoint facility available
within virtual machines may be very attractive. Additionally, if
server maintenance is a common or predictable occurrence, then
checkpoint migration or suspension of a long-running job
within a VM could prevent loss of compute time (see the sketch
after this list).
•	 Increases in job reliability: Virtual machines, if used on a 1:1
basis with batch jobs (meaning each job runs within a VM
container), provide a barrier between their own environment,
the host environment and any other virtual machine
environments running on the hypervisor. As such, “rogue”
jobs that try to access more memory or CPU cores than
expected can be isolated from well-behaved jobs that use only
the resources allocated to them. Without virtual machine
containment, jobs that share a physical host often cause
problems in the form of slowdowns, swapping or even
OS crashes.
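Referring back to the checkpoint idea in the “Large execution time jobs” item above, the following Python sketch shows one way such a checkpoint could be driven from a maintenance script using the standard libvirt command line tool (virsh save and virsh restore). The domain name, state-file path and maintenance step are assumptions made for illustration; they are not part of any IBM Platform Computing interface.

# Hypothetical maintenance helper: checkpoint a long-running job's VM with
# "virsh save" and bring it back afterwards with "virsh restore".
# Domain name and state-file path are invented for this example.
import subprocess

DOMAIN = "job-42-vm"                                  # VM hosting the long job
STATE_FILE = "/var/lib/checkpoints/job-42-vm.sav"     # saved machine state

def checkpoint_vm():
    # Suspends the guest and writes its full memory and device state to disk.
    subprocess.run(["virsh", "save", DOMAIN, STATE_FILE], check=True)

def resume_vm():
    # Recreates the guest from the saved image; the job inside continues
    # from exactly where it was checkpointed.
    subprocess.run(["virsh", "restore", STATE_FILE], check=True)

if __name__ == "__main__":
    checkpoint_vm()
    # ... perform host maintenance or migration here ...
    resume_vm()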
Management tools
Achieving HPC in a cloud environment requires a few
well-chosen tools, including a hypervisor platform, a workload
manager and an infrastructure management toolkit. The
management toolkit provides the policy definition,
enforcement, provisioning management, resource reservation
and reporting. The hypervisor platform provides the
foundation for the virtual portion of cloud resources and the
workload manager provides the task management.
The cloud computing management tools of IBM® Platform
Computing™—IBM® Platform™ Cluster Manager –
Advanced Edition and IBM® Platform™ Dynamic Cluster—
turn static clusters, grids and datacenters into dynamic shared
computing environments. The products can be used to create
private internal clouds or hybrid private clouds, which use
external public clouds for peak demand. This is commonly
referred to as “cloud bursting” or “peak shaving.”
Platform Cluster Manager – Advanced Edition creates a cloud
computing infrastructure to efficiently manage application
workloads applied to multiple virtual and physical platforms. It
does this by uniting diverse hypervisor and physical
environments into a single dynamically shared infrastructure.
Although this document describes the properties of virtual
machines, Platform Cluster Manager – Advanced Edition is
not in any way limited to managing virtual machines. It
unlocks the full computing potential lying dormant in existing
heterogeneous virtual and physical resources according to
workload-intelligent and resource-aware policies.
Platform Cluster Manager – Advanced Edition optimizes
infrastructure resources dynamically based on perceived
demand and critical resource availability using an API or a web
interface. This allows users to enjoy the following business
benefits:
•	 By eliminating silos, resource utilization can be improved
•	 Batch job wait times are reduced because of additional
resource availability or flexibility
•	 Users perceive a larger resource pool
•	 Administrator workload is reduced through multiple layers of
automation
•	 Power consumption and server proliferation are reduced
Subsystem benchmarks
Hardware environment and settings
KVM and OVM testing
Physical hardware: (2) HP ProLiant BL465c G5 with dual-socket quad-core AMD 2382 + AMD-V and 16 GB RAM
OS Installed: RHEL 5.5 x86_64
Hypervisor(s): KVM in RHEL 5.5, OVM 2.2, RHEL 5.5 Xen (para-virtualized)
Number of VMs per physical node: Unless otherwise noted, benchmarks were run on a 4 GB memory VM.
Interconnects: The interconnect between VMs or hypervisors was never used to run the benchmarks. The hypervisor hosts were connected to a 1000baseT network.
Citrix Xen testing
Physical hardware: (2) HP ProLiant BL2x220c in a c3000 chassis with dual-socket quad-core 2.83 GHz Intel® CPUs and 8 GB RAM
OS Installed: CentOS Linux 5.3 x86_64
Storage: Local disk
Hypervisor: Citrix Xen 5.5
VM Configuration: (Qty 1) 8 GB VM with 8 cores; (Qty 2) 4 GB VMs with 4 cores; (Qty 4) 2 GB VMs with 2 cores; (Qty 8) 1 GB VMs with 1 core
NetPIPE
NetPIPE is an acronym that stands for Network Protocol
Independent Performance Evaluator.1
It is a useful tool for
measuring two important characteristics of networks: latency
and bandwidth. HPC application performance is becoming
increasingly dependent on the interconnect between compute
servers. Because of this trend, not only does parallel application
performance need to be examined, but so does the performance
of the network itself, from both the latency and the bandwidth
standpoints.
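For readers who want a feel for how such latency and bandwidth figures are produced, the sketch below times a simple TCP ping-pong between two hosts in the spirit of NetPIPE. It is not NetPIPE itself; the port, message sizes and repetition count are arbitrary choices for illustration.

# Minimal TCP ping-pong measurement, loosely in the spirit of NetPIPE.
# Start "python pingpong.py server" on one host, then run
# "python pingpong.py client <server-host>" on the other.
import socket, sys, time

PORT = 5000                         # assumed free port
SIZES = [1, 1024, 1024 * 1024]      # message sizes in bytes
REPS = 100                          # round trips per message size

def recv_exact(sock, n):
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += chunk
    return buf

def server():
    with socket.create_server(("", PORT)) as srv:
        conn, _ = srv.accept()
        with conn:
            for size in SIZES:
                for _ in range(REPS):
                    conn.sendall(recv_exact(conn, size))   # echo back

def client(host):
    with socket.create_connection((host, PORT)) as sock:
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        for size in SIZES:
            payload = b"x" * size
            start = time.perf_counter()
            for _ in range(REPS):
                sock.sendall(payload)
                recv_exact(sock, size)
            elapsed = time.perf_counter() - start
            rtt = elapsed / REPS
            # Half the round trip approximates one-way latency; the bytes
            # moved in both directions over the elapsed time approximate bandwidth.
            print(f"{size:>8} B  latency ~{rtt / 2 * 1e6:8.1f} usec  "
                  f"bandwidth ~{2 * size * REPS / elapsed / 1e6:8.2f} MB/s")

if __name__ == "__main__":
    server() if sys.argv[1] == "server" else client(sys.argv[2])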
The terms used for each data series in this section are defined
as follows:
•	 no_bkpln: Refers to communications happening over a
1000baseT Ethernet network
•	 same_bkpln: Refers to communications traversing a
backplane within a blade enclosure
•	 diff_hyp: Refers to virtual machine to virtual machine
communication occurring between two separate physical
hypervisors
•	 pm2pm: Physical machine to physical machine
•	 vm2pm: Virtual machine to physical machine
•	 vm2vm: Virtual machine to virtual machine
Figures 1 and 2 illustrate that the closer together the two
communicating entities are, the higher the bandwidth and the
lower the latency between them. Additionally, they show that when there
is a hypervisor layer between the entities, the communication is
slowed only slightly, and latencies stay in the expected range
for 1000baseT communication (60 - 80 µsec). When two
different VMs on separate hypervisors communicate—even
when the backplane is within the blade chassis—the latency is
more than double. The story gets even worse (by about 50
percent) when the two VMs do not share a backplane and
communicate over TCP/IP.
This benchmark illustrates that not all HPC workloads are
suitable for a virtualized environment. When applications run
in parallel and are latency sensitive (as many MPI-based
applications are), virtualized resources are best avoided. If
there is no choice but to use
virtualized resources, then the scheduler must have the ability
to choose resources that are adjacent to each other on the
network or the performance is likely to be unacceptable. This
conclusion also applies to transactional applications where
latency can be the largest part of the ‘submit to receive cycle
time.’
Figure 1: Network bandwidth between machines
Figure 2: Network latency between machines
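To make the scheduling requirement above concrete, the hypothetical sketch below groups candidate hosts by an assumed enclosure (backplane) attribute and places a parallel job only when one enclosure can hold all of its slots. The host names, attributes and slot counts are invented for the example and do not reflect any particular scheduler's API.

# Hypothetical placement helper: keep all ranks of a parallel job on hosts
# that share a blade backplane so MPI traffic avoids the slower 1000baseT
# uplink. Host and enclosure names are invented for illustration.
from collections import defaultdict

HOSTS = {
    "vmhost01": "chassis-A", "vmhost02": "chassis-A",
    "vmhost03": "chassis-B", "vmhost04": "chassis-B",
}

def place_job(slots_needed, free_slots):
    """Return hosts for the job, all within one enclosure, or None."""
    by_enclosure = defaultdict(list)
    for host, enclosure in HOSTS.items():
        by_enclosure[enclosure].append(host)
    for hosts in by_enclosure.values():
        available = [h for h in hosts if free_slots.get(h, 0) > 0]
        if sum(free_slots[h] for h in available) >= slots_needed:
            return available        # enough adjacent slots on one backplane
    return None                     # no adjacent fit: queue rather than spread

# Example: an 8-way job fits on chassis-A (4 + 4 free slots).
print(place_job(8, {"vmhost01": 4, "vmhost02": 4, "vmhost03": 8}))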
IOzone
IOzone is a file system benchmarking tool, which generates
and measures a variety of file operations.2
In this benchmark,
IOzone was only run for write, rewrite, read and reread to
mimic the most popular functions an I/O subsystem performs.
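As a rough, hedged illustration of those four operations, the sketch below times write, rewrite, read and reread of a scratch file using plain Python I/O. The file path and size are assumptions and are far smaller than the 32 GB files used in the actual benchmark, so page-cache effects will dominate the read numbers.

# Rough timing of write/rewrite/read/reread on a scratch file, in the
# spirit of the IOzone subset used here. Path and size are illustrative.
import os, time

PATH = "/tmp/iobench.dat"       # assumed scratch location
SIZE_MB = 256                   # far smaller than the 32 GB benchmark files
CHUNK = 1024 * 1024
block = os.urandom(CHUNK)

def timed(label, fn):
    start = time.perf_counter()
    fn()
    print(f"{label:8s} {SIZE_MB / (time.perf_counter() - start):8.1f} MB/s")

def write():
    with open(PATH, "wb") as f:
        for _ in range(SIZE_MB):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())    # push the data to disk, not just page cache

def read():
    with open(PATH, "rb") as f:
        while f.read(CHUNK):
            pass

timed("write", write)
timed("rewrite", write)          # second write pass over the same file
timed("read", read)
timed("reread", read)            # likely served from the page cache
os.remove(PATH)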
This steady-state I/O test clearly demonstrates that the KVM
hypervisor is severely lacking when it comes to disk I/O, for
both reads and writes. Even in the best case for OVM, I/O
performance degrades by nearly 40 percent. Write performance
for Citrix Xen is also limited.
However, read performance exceeds that of the physical
machine by over 7 percent. This can only be attributed to a
read-ahead function in Xen, which worked better than the
native Linux read-ahead algorithm.
Figure 3: IOzone 32 GB file (Local disk)
Figure 4: IOzone 32 GB file (Local disk)
Regardless, this benchmark, more than others, provides a
warning to early HPC cloud adopters of the performance risks
of virtual technologies. HPC users running I/O bound
applications (Nastran, Gaussian, certain types of ABAQUS
jobs, and so on) should steer clear of virtualization until these
issues are resolved.
Application benchmarks
Software compilation
Compiler used: gcc-4.1.2
Compilation target: Linux kernel 2.6.34 (with the ‘defconfig’ option).
All transient files were put in a run-specific subdirectory using
the ‘O’ option in make. Thus the source tree is kept read-only
and all writes go to the run-specific subdirectory.
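A sketch of how such a run could be scripted follows; the source and build paths, the parallel job count and the timing wrapper are assumptions for illustration rather than the exact harness used for these results.

# Timing an out-of-tree kernel build, roughly as described above.
# Source path, object directory and -j value are assumptions.
import pathlib, subprocess, time

SRC = pathlib.Path("/nfs/src/linux-2.6.34")     # read-only source tree
OBJ = pathlib.Path("/scratch/kbuild-run01")     # per-run writable directory
OBJ.mkdir(parents=True, exist_ok=True)

def make(*args):
    subprocess.run(["make", f"O={OBJ}", *args], cwd=SRC, check=True)

start = time.perf_counter()
make("defconfig")        # generate the default configuration in OBJ
make("-j8")              # build with 8 parallel jobs, writing only into OBJ
print(f"elapsed build time: {time.perf_counter() - start:.1f} s")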
Figure 5 shows the difference in compilation performance for a
physical machine running a compile on an NFS volume
compared to Citrix Xen doing the same thing on the same
NFS volume. Citrix Xen is roughly 11 percent slower than the
physical machine performing the task. Also included is the
difference between compiling to a local disk target versus
compiling to the NFS target on the physical machine. The
results illustrate how NFS performance can significantly affect
a job’s elapsed time. This is of crucial importance because most
virtualized private cloud implementations use NFS rather than
local drives as the file system, in order to facilitate VM migration.
SIMULIA® Abaqus
SIMULIA® Abaqus3
is the standard of the manufacturing industry for implicit and
explicit non-linear finite element solutions. SIMULIA publishes
a benchmark suite that hardware vendors use to distinguish
their products.4 “e2” and “s6” were used for these benchmarks.
Figure 5: Compilation of kernel 2.6.34
Figure 6: Parallel ABAQUS explicit (e2.inp)
The ABAQUS explicit distributed parallel runs were
performed using HP MPI (2.03.01) and scratch files were
written to local scratch disk. This comparison, unlike the
others presented in this paper, was done in two different ways:
1.	The data series called “Citrix” is for a single 8 GB RAM VM
with 8 cores where the MPI ranks communicated within a
single VM.
2.	The data series called “Citrix – Different VMs” represents
multiple separate VMs defined on the hypervisor host
intercommunicating.
Figure 7: Parallel ABAQUS standard (s6.inp)
As expected, the additional layers of virtualized networking
slowed the communication speeds (also shown in the NetPIPE
results) and reduced scalability when the job had higher rank
counts. In addition, for communications within a VM, the
performance of the virtual machine was almost identical to
that of the physical machine.
ABAQUS uses a different algorithm, called “ABAQUS Standard,”
for solving implicit Finite Element Analysis (FEA) problems.
This method does not run distributed parallel, but it can be run
SMP parallel, which was done for the “s6” benchmark.
Figure 8: Serial FLUENT 12.1
Typically ABAQUS Standard does considerably more I/O to
scratch disk than its explicit counterpart. However, this is
dependent upon the amount of memory available in the
execution environment. It is clear again that when an
application is only CPU or memory constrained, a virtual
machine has almost no detectable performance impact.
ANSYS® FLUENT
ANSYS® FLUENT5
belongs to a large class of HPC
applications referred to as computational fluid dynamics (CFD)
codes. The “aircraft_2m” FLUENT model was selected based
on size and run for 25 iterations. The “sedan_4m” model was
chosen as a suitably sized model for running in parallel;
one hundred iterations were performed using this model.
Figure 9: Distributed parallel FLUENT 12.1 (sedan_4m - 100 iterations)
Though CFD codes such as FLUENT are rarely run serially
because of memory or solution time requirements, the
comparison in Figure 8 shows that the solution times for a
physical machine and a virtual machine differ by only 1.9
percent, with the virtual machine being the slower of the two.
The “aircraft_2m” model was simply too small to scale well in
parallel and produced strangely varying results, so the
sedan_4m model was used.6
The results for the parallel case (Figure 9) illustrate that at two
CPUs the virtual machine outperforms the physical machine.
This is most likely caused by the native Linux scheduler
moving processes around on the physical host. If the
application had been bound to particular cores, then this effect
would disappear. In the four and eight CPU runs the difference
between physical and virtual machines is negligible. This
supports the theory that the Linux CPU scheduler is affecting
the two-CPU job.
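One way to test that explanation is to pin the solver to specific cores so the Linux scheduler cannot move it; the sketch below shows such pinning with Python's os.sched_setaffinity (Linux only). The core set and the placeholder command are chosen purely for illustration.

# Pin a compute process to fixed cores so the Linux scheduler cannot
# migrate it (Linux only; cores and command are examples).
import os, subprocess

CORES = {0, 1}                      # cores to restrict the job to

def launch_pinned(cmd):
    # Set the affinity in the child before exec, so the process only ever
    # runs on CORES (equivalent to "taskset -c 0,1 <cmd>").
    proc = subprocess.Popen(
        cmd, preexec_fn=lambda: os.sched_setaffinity(0, CORES))
    return proc.wait()

launch_pinned(["sleep", "10"])      # placeholder for the actual solver command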
LS-DYNA®
LS-DYNA®7
is a transient dynamic finite element analysis
program capable of solving complex real-world time-domain
problems on serial, SMP parallel, and distributed parallel
computational engines. The “refined_neon_30ms” model was
chosen for the benchmarks reviewed in this section. HP MPI
2.03.01, now owned by IBM Platform Computing, was the
message-passing library used.
Figure 10: LS-DYNA - MPP971 - Refined Neon
The MPP-DYNA application responds well when run in a low
latency environment. This benchmark supports the notion that
distributed parallel LS-DYNA jobs are still very sensitive to
network latency, even when using a backplane of a VM. A serial
run shows a virtual machine is 1 percent slower. Introduce
message passing, however, and at eight CPUs the virtual
machine is nearly 40 percent slower than the physical machine.
The expectation is that if the same job were run on multiple
VMs, as was done for the ABAQUS Explicit parallel jobs, the
effect would be even greater, with physical machines
significantly outperforming virtual machines.
Conclusion
As with most legends, there is some truth to the notion that
VMs are inappropriate for HPC applications. The benchmark
results demonstrate that latency sensitive and I/O bound
applications would perform at levels unacceptable to HPC
users. However, the results also show that CPU and memory
bound applications and parallel applications that are not
latency sensitive perform well in a virtual environment. HPC
architects who dismiss virtualization technology entirely may
therefore be missing an enormous opportunity to inject
flexibility and even a performance edge into their HPC
designs.
The power of Platform Cluster Manager – Advanced Edition
and IBM® Platform™ LSF® is their ability to work in concert
to manage both of these types of workload simultaneously in a
single environment. These tools allow their users to maximize
resource utilization and flexibility through provisioning and
control at the physical and virtual levels. Only IBM Platform
Computing technology allows for environment optimization at
the job-by-job level, and only Platform Cluster Manager –
Advanced Edition continues to optimize that environment
after jobs have been scheduled and new jobs have been
submitted. Such an environment could realize orders of
magnitude increases in efficiency and throughput while
reducing the overhead of IT maintenance.
Significant results
•	 The KVM hypervisor significantly outperforms the OVM hypervisor on AMD servers, especially when several VMs run
simultaneously.
•	 Citrix Xen I/O reads and rereads are very fast on Intel servers.
•	 OVM outperforms KVM by a significant margin for I/O intensive applications running on AMD servers.
•	 I/O intensive and latency sensitive parallel applications are not a good fit for virtual environments today.
•	 Memory and CPU bound applications are at performance parity between physical and virtual machines.
For more information
To learn more about IBM Platform Computing, please
contact your IBM marketing representative or IBM
Business Partner, or visit the following website:
ibm.com/platformcomputing
© Copyright IBM Corporation 2012
IBM Corporation
Systems and Technology Group
Route 100
Somers, NY 10589
Produced in the United States of America
November 2012
IBM, the IBM logo, ibm.com, Platform Computing, Platform Cluster
Manager, Platform Dynamic Cluster and Platform LSF are trademarks of
International Business Machines Corp., registered in many jurisdictions
worldwide. Other product and service names might be trademarks of IBM
or other companies. A current list of IBM trademarks is available on the
web at “Copyright and trademark information” at
ibm.com/legal/copytrade.shtml
Intel, Intel logo, Intel Inside, Intel Inside logo, Intel Centrino, Intel
Centrino logo, Celeron, Intel Xeon, Intel SpeedStep, Itanium, and Pentium
are trademarks or registered trademarks of Intel Corporation or its
subsidiaries in the United States and other countries.
Linux is a registered trademark of Linus Torvalds in the United States,
other countries, or both.
This document is current as of the initial date of publication and may be
changed by IBM at any time. Not all offerings are available in every country
in which IBM operates.
The performance data discussed herein is presented as derived under
specific operating conditions. Actual results may vary. THE
INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS”
WITHOUT ANY WARRANTY, EXPRESS OR IMPLIED,
INCLUDING WITHOUT ANY WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE
AND ANY WARRANTY OR CONDITION OF NON-
INFRINGEMENT. IBM products are warranted according to the terms
and conditions of the agreements under which they are provided.
Actual available storage capacity may be reported for both uncompressed
and compressed data and will vary and may be less than stated.
1. http://www.scl.ameslab.gov/netpipe/
2. http://www.iozone.org/
3. ABAQUS is a trademark of Simulia and Dassault Systemes (http://www.simulia.com)
4. See http://www.simulia.com/support/v67/v67_performance.html for a description of the benchmark models and their availability
5. Fluent is a trademark of ANSYS, Inc. (http://www.fluent.com)
6. The largest model provided by ANSYS, “truck_14m”, was not an option for this benchmark as the model was too large to fit into memory.
7. LS-DYNA is a trademark of LSTC (http://www.lstc.com/)
Please Recycle
DCW03038-USEN-00
