Develop, Deploy, and Innovate with Intel® Cluster Ready
1. 1
Develop, Deploy and Innovate
with Intel® Cluster Ready
cluster architecture for distributed performance
June 19, 2013
Werner Krotz-Vogel
2. 2
Intel® Cluster Ready
• Intel® Cluster Ready - an “industry standard architecture for
Intel-based Linux clusters”
Intel Cluster Ready implements an industry-standard cluster architecture that lets
you select, deploy and manage a high-performance cluster almost as easily as a
desktop PC. This architecture helps to ensure interoperability across the complete
HPC solution stack.
• Intel® Cluster Checker tool included with Intel Cluster Ready
systems
After installation, users can easily check system health with Intel Cluster Checker,
and system administrators can easily perform preventative maintenance and cluster
diagnostics.
3. 3
Intel® Cluster Ready – New Specification 1.3
Evolution of the Intel® Cluster Ready architecture
Main new features
• Adds Intel® Xeon Phi™ (and coprocessors / accelerators in general)
• Updates the versions of the set of components
• New reference design incorporating Intel® Xeon Phi™
Available July 12, 2013
4. 4
Intel® Cluster Checker – New Version 2.1
Update to the second generation Intel® Cluster Checker
Important new features
• Verifies Intel® Cluster Ready 1.3 compliance (and maintains support for
1.2)
• Fully supports Intel® Xeon Phi™ based clusters
• Easily extensible (no coding necessary!)
Available July 12, 2013
5. 5
Intel® Cluster Checker 2.1 with
Intel® Xeon Phi™ coprocessor support
• The micinfo test module checks that coprocessor information is correct
and uniform across nodes. Any error, undefined value, or abnormal
difference among coprocessors that may impact cluster productivity is
reported.
• The miccheck test module checks the sanity of the coprocessor cards by
running the miccheck diagnostic tools on every node in parallel.
• To run a benchmark that offloads work to a coprocessor:
$ OFFLOAD_REPORT=2 MKL_MIC_ENABLE=1 clck -I micinfo -I miccheck -I dgemm
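The idea behind a "uniform across nodes" check can be illustrated with a minimal Python sketch. This is not how Intel Cluster Checker implements micinfo; the node names, field names, and values below are hypothetical, chosen only to show how a missing value or an outlier among otherwise identical nodes might be flagged.

```python
from collections import Counter


def find_nonuniform_fields(node_info):
    """Return {field: {node: value}} for every field that is not uniform.

    A field is reported when nodes disagree on its value or when at least
    one node does not define it (shown as None). Only the nodes that
    deviate from the most common value are listed.
    """
    fields = sorted({field for info in node_info.values() for field in info})
    report = {}
    for field in fields:
        values = {node: info.get(field) for node, info in node_info.items()}
        if len(set(values.values())) > 1:
            majority_value, _ = Counter(values.values()).most_common(1)[0]
            report[field] = {
                node: value
                for node, value in values.items()
                if value != majority_value
            }
    return report


# Hypothetical per-node coprocessor information (illustrative values only).
node_info = {
    "node01": {"driver_version": "A", "card_count": "2", "memory_size": "8 GB"},
    "node02": {"driver_version": "A", "card_count": "2", "memory_size": "8 GB"},
    "node03": {"driver_version": "B", "card_count": "2"},
}

for field, outliers in find_nonuniform_fields(node_info).items():
    print(field, outliers)
```

In this sample, node03 is flagged twice: once for a differing driver_version and once for an undefined memory_size, while card_count is uniform and is not reported.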
6. 6
Intel® Cluster Checker 2.1
Faster Execution Time
Execution time is reduced 2x vs. v1.8; a 256-node certification takes nearly 30 minutes.
Results have been estimated based on internal Intel analysis and are provided for informational purposes only. Any difference in system hardware or software design
or configuration may affect actual performance.
[Chart: Execution Time in Seconds (axis 0 to 1600) vs. Node Quantity (8, 16, 32, 64, 128, 256)]
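As a rough worked example of what the 2x claim implies, the sketch below treats "nearly 30 minutes" as an upper bound of 1800 seconds, which is an assumption rather than a measured figure, and derives the corresponding v1.8 run time and the average time per node.

```python
def runtime_before_speedup(current_seconds, speedup_factor):
    """Estimate the earlier run time given the current one and a speedup factor."""
    return current_seconds * speedup_factor


# Assumption: "nearly 30 minutes" is taken as at most 30 * 60 seconds.
CURRENT_SECONDS = 30 * 60
NODES = 256
SPEEDUP_VS_V1_8 = 2.0  # the "2x vs. v1.8" figure from the slide

previous_seconds = runtime_before_speedup(CURRENT_SECONDS, SPEEDUP_VS_V1_8)
seconds_per_node = CURRENT_SECONDS / NODES

print(f"Estimated v1.8 run time: {previous_seconds / 60:.0f} minutes")
print(f"Average time per node: {seconds_per_node:.2f} seconds")
```

Under that assumption, the same 256-node certification would have taken about an hour with v1.8.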
7. 7
Intel® Cluster Studio XE 2013
Scale Forward, Scale Faster – for HPC Clusters
Relentless Pursuit of Compute Capacity
Software Development Solutions Must Scale
Scale Performance: MPI Latency, Parallel Models, Analysis Tools, Compilers, Performance Libraries
Scale Forward: MPI Scalability, Parallel Models, Multicore, Many-core
Efficiency: Memory & Thread Correctness, MPI Correctness, Parallel Models
• Industry Leading Commercial MPI
Latency & Scalability
• Industry Leading Compiler
Performance
• Industry Leading Threading &
Performance Analysis Tools
Integrated for MPI Analysis
• Powerful Parallel Programming
Models
2x Moore’s Law
8. 8
Intel® Cluster Studio XE 2013
Tools to Scale Forward, Scale Faster – for HPC Clusters
Phase | Product | Feature
Build | Intel® MPI Library | High Performance Message Passing (MPI) Library
Build | Intel® Composer XE | C/C++ and Fortran compilers and performance libraries: Intel® Threading Building Blocks, Intel® Cilk™ Plus, Intel® Math Kernel Library, Intel® Integrated Performance Primitives
Verify | Intel® Inspector XE | Memory & threading dynamic analysis; static analysis for code quality
Verify & Tune | Intel® Trace Analyzer & Collector | MPI performance profiler for understanding application correctness & behavior
Tune | Intel® VTune™ Amplifier XE | Performance profiler for optimizing application performance and scalability