© 2013, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks of Intel Corporation in the U.S.
and/or other countries. *Other names and brands may be claimed as the property of others.
Intel MPI Library,
Trace Analyzer and Collector,
and tuning tips
in cluster architectures for distributed performance
August, 2013
Werner Krotz-Vogel
Legal Disclaimer
INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL® PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR
OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL’S TERMS AND CONDITIONS OF
SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO
SALE AND/OR USE OF INTEL® PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE,
MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.
Intel may make changes to specifications and product descriptions at any time, without notice.
All products, dates, and figures specified are preliminary based on current expectations, and are subject to change without notice.
Intel processors, chipsets, and desktop boards may contain design defects or errors known as errata, which may cause the product to deviate from
published specifications. Current characterized errata are available on request.
Sandy Bridge and other code names featured are used internally within Intel to identify products that are in development and not yet publicly announced
for release. Customers, licensees and other third parties are not authorized by Intel to use code names in advertising, promotion or marketing of any
product or services and any such use of Intel's internal code names is at the sole risk of the user.
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as
SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those
factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated
purchases, including the performance of that product when combined with other products. For more information go to http://www.intel.com/performance
Intel, Core, Xeon, VTune, Cilk, Intel Sponsors of Tomorrow., the Intel Sponsors of Tomorrow. logo, and the Intel logo are trademarks of Intel
Corporation in the United States and other countries.
*Other names and brands may be claimed as the property of others.
Copyright ©2011 Intel Corporation.
Hyper-Threading Technology: Requires an Intel® HT Technology enabled system, check with your PC manufacturer. Performance will vary depending on
the specific hardware and software used. Not available on all Intel® Core™ processors. For more information including details on which processors
support HT Technology, visit http://www.intel.com/info/hyperthreading
Intel® 64 architecture: Requires a system with a 64-bit enabled processor, chipset, BIOS and software. Performance will vary depending on the specific
hardware and software you use. Consult your PC manufacturer for more information. For more information, visit http://www.intel.com/info/em64t
Intel® Turbo Boost Technology: Requires a system with Intel® Turbo Boost Technology capability. Consult your PC manufacturer. Performance varies
depending on hardware, software and system configuration. For more information, visit http://www.intel.com/technology/turboboost
Objectives
• Intel® MPI execution models on the Intel® Many Integrated Core (MIC) Architecture
• Pure MPI or hybrid MPI applications on MIC
• Analysis of Intel® MPI codes with the Intel® Trace Analyzer and Collector (ITAC) on MIC
• Load balancing on heterogeneous systems
• Debugging Intel® MPI codes on MIC
• Intel® Cluster Checker v2 with support for MIC
Outline
• Overview
• Installation of Intel® MPI
• Programming Models
• Hybrid Computing
• Intel® Trace Analyzer and Collector
• Load Balancing
• Debugging
• Intel® Cluster Checker
Outline: Overview
Intel® MPI Library Overview
• Intel is a leading vendor of MPI implementations and tools
• Optimized MPI application performance
  – Application-specific tuning
  – Automatic tuning
• Lower latency
  – Industry-leading latency
• Interconnect independence & runtime selection
  – Multi-vendor interoperability
  – Performance-optimized support for the latest OFED capabilities through DAPL 2.0
• More robust MPI applications
  – Seamless interoperability with Intel® Trace Analyzer and Collector
Range of models to meet application needs
Spectrum of Programming Models and Mindsets, from Multi-Core Centric (Xeon) to Many-Core Centric (MIC):
• Multi-Core Hosted: general purpose serial and parallel computing
• Offload: codes with highly-parallel phases
• Symmetric: codes with balanced needs
• Many-Core Hosted: highly-parallel codes
[Figure: for each model, where Main( ), Foo( ) and MPI_*( ) run: on the Xeon host, on the MIC coprocessor, or on both]
Levels of communication
• Current clusters are not homogeneous regarding communication speed:
  • Inter-node (InfiniBand, Ethernet, etc.)
  • Intra-node
    • Inter-socket (QuickPath Interconnect)
    • Intra-socket
• Two additional levels come with the MIC coprocessor:
  • Host-MIC communication
  • Inter-MIC communication
Intel® MPI Library Architecture & Staging
[Figure: architecture and release staging (Pre-Alpha, Alpha, Beta/Gold) of the Intel® MPI Library on MIC. Components shown: Application; MPI-2.2; MPICH2* upper layer; ADI3*; CH3* device layer; Nemesis*; Netmod*; shm (mmap(2)); tcp; dapl, ofa (OFED verbs/core, HCA‡ driver); user SCIF† and kernel SCIF]
†: Symmetric Communications Interface
‡: Host Channel Adapter
Selecting network fabrics
• Intel® MPI automatically selects the best available network fabric.
• Use I_MPI_FABRICS to select a different communication device explicitly
• The best fabric is usually based on InfiniBand (dapl, ofa) for inter-node communication and on shared memory for intra-node communication
• Available for KNC:
  • shm, tcp, ofa, dapl
  • Availability is checked in the order shm:dapl, shm:ofa, shm:tcp (intra:inter)
• Set I_MPI_SSHM_SCIF=1 to enable the shm fabric between host and MIC
Intel® MPI 4.1
what’s NOT in it for Xeon Phi coprocessors?
• Features not provided for Xeon Phi
coprocessors:
• Dynamic process management
• MPI file I/O
• mpirun -perhost option
• mpitune
• ILP64 mode
• Deprecated feature not supported on Xeon Phi coprocessors:
• MPD process manager
Outline: Installation of Intel® MPI
Installation
Download the latest Intel® MPI Library, included in Intel® Cluster Studio XE, from the Intel Registration Center:
l_mpi_p_4.1.0.030.tgz (later: l_itac_b_8.1.0.016.tgz)
Unpack the tar file and execute the installation script:
# tar zxf l_mpi_p_4.1.0.030.tgz
# cd l_mpi_p_4.1.0.030
# ./install.sh
Follow the installation instructions.
Both root and user installations are possible.
The resulting directory structure has intel64 and mic subdirectories:
/opt/intel/impi/4.1.0.030/intel64/{bin,etc,include,lib}
/opt/intel/impi/4.1.0.030/mic/{bin,etc,include,lib}
Only one user environment setup is required; it serves both architectures.
Prerequisites
Assumption: the hostname host-mic0 is associated with an IP address, specified in /etc/hosts or $HOME/.ssh/config
The tools directory /opt/intel is NFS-mounted on the MIC card
If NFS is not available: upload the Intel® MPI libraries onto the card(s)
# cd /opt/intel/impi/4.1.0.030/mic/lib
# scp libmpi.so.4.1 mic0:/lib64/libmpi.so.4
...
Execute as root or as a user with sudo rights (if that is not possible, copy the libraries to a user directory)
This has to be repeated after every reboot of the KNC card
Prerequisites per User
Set the compiler environment
# source <compiler_installdir>/bin/compilervars.sh intel64
Identical for Host and MIC
Set the Intel® MPI environment
# source /opt/intel/impi/4.1.0.030/intel64/bin/mpivars.sh
Identical for Host and MIC
mpirun needs ssh access to MIC!
– Already done: the user's ssh key ~/.ssh/id_rsa.pub is copied to the MIC card at driver boot time.
Compiling and Linking for MIC
Compile MPI sources using the Intel® MPI scripts
For Xeon with potential offload (latest compiler):
# mpiicc -o test test.c
For Xeon without offload, as usual:
# mpiicc [-no-offload] -o test test.c
For native execution on MIC, add the "-mmic" flag, i.e. the usual compiler flag also controls the MPI compilation:
# mpiicc -mmic -o test test.c
The linker verbose mode "-v" shows:
Without "-mmic", linkage with the intel64 libraries:
ld ... -L/opt/intel/impi/4.1.0.030/intel64/lib ...
With "-mmic", linkage with the MIC libraries:
ld ... -L/opt/intel/impi/4.1.0.030/mic/lib ...
Outline: Programming Models
Co-processor only Programming Model
• MPI ranks on Intel® MIC (only)
• All messages into/out of Intel® MIC coprocessors
• Intel® Cilk™ Plus, OpenMP*, Intel® Threading Building Blocks, Pthreads used directly within MPI processes
• Intermediate step: all MPI processes run on a single Intel® MIC coprocessor only
Build the Intel® MIC binary using the Intel® MIC compiler.
Upload the binary to the Intel® MIC Architecture.
Run instances of the MPI application on Intel® MIC nodes.
[Figure: nodes with two CPUs and a MIC card each; MPI data exchange between the MIC cards over the network]
Homogeneous network of many-core CPUs
Co-processor-only Programming Model
MPI ranks on the MIC coprocessor(s) only
MPI messages into/out of the MIC coprocessor(s)
Threading possible
• Build the application for the MIC Architecture
# mpiicc -mmic -o test_hello.MIC test.c
• Upload the MIC executable (only needed without NFS)
# scp ./test_hello.MIC mic0:/tmp/test_hello.MIC
– Remark: if NFS is available, no explicit upload is required (a plain copy into the NFS-mounted directory is enough)!
• Launch the application on the coprocessor from the host
# mpirun -n 2 -wdir /tmp -host mic0 /tmp/test_hello.MIC
• Alternatively: log in to the MIC card and run the already uploaded mpirun there!
Symmetric Programming Model
• MPI ranks on Intel® MIC Architecture and host CPUs
• Messages to/from any core
• Intel® Cilk™ Plus, OpenMP*, Intel® Threading Building Blocks, Pthreads* used directly within MPI processes
• Intermediate step: all MPI processes run on 1 host CPU and 1 Intel® MIC coprocessor only
• Available in the Intel® MPI Library for Intel® MIC Alpha (1 host, 1 coprocessor).
Build the Intel® 64 and Intel® MIC Architecture binaries by using the respective compilers targeting Intel® 64 and Intel® MIC Architecture.
Upload the Intel® MIC binary to the Intel® MIC Architecture.
Run instances of the MPI application on different mixed nodes.
[Figure: nodes with two CPUs and a MIC card each; MPI data exchange among host CPUs and MIC cards over the network]
Heterogeneous network of homogeneous CPUs
Symmetric model
MPI ranks on the MIC coprocessor(s) and host CPU(s)
MPI messages into/out of the MIC(s) and host CPU(s)
Threading possible
• Build the application for Intel® 64 and the MIC Architecture separately
# mpiicc -o test_hello test.c
# mpiicc -mmic -o test_hello.MIC test.c
• Upload the MIC executable
# scp ./test_hello.MIC mic0:/tmp/test_hello.MIC
• Launch the application on the host and the coprocessor from the host
# mpirun -n 2 -host <hostname> ./test_hello : -wdir /tmp -n 2 -host mic0 /tmp/test_hello.MIC
MPI+Offload Programming Model
• MPI ranks on Intel® Xeon® processors (only)
• All messages into/out of host CPUs
• Offload models used to accelerate MPI ranks
• Intel® Cilk™ Plus, OpenMP*, Intel® Threading Building Blocks, Pthreads* within Intel® MIC
Build the Intel® 64 executable with included offload by using the Intel® 64 compiler.
Run instances of the MPI application on the host, offloading code onto MIC.
Advantages of more cores and wider SIMD for certain applications
[Figure: nodes with two CPUs and a MIC card each; MPI communication between the host CPUs over the network, offload from each host to its MIC card]
Homogeneous network of heterogeneous nodes
MPI+Offload Programming Model
MPI ranks on the host CPUs only
MPI messages into/out of the host CPUs
Intel® MIC Architecture as an accelerator
• Compile for MPI and internal offload
# mpiicc -o test test.c
• The latest compiler compiles for offload by default if an offload construct is detected!
– Switch this off with the -no-offload flag
• Execute on the host(s) as usual
# mpirun -n 2 ./test
• MPI processes will offload code for acceleration
Offloading to Intel® MIC Architecture
Examples
C/C++ Offload Pragma
#pragma offload target (mic)
#pragma omp parallel for reduction(+:pi)
for (i=0; i<count; i++) {
float t = (float)((i+0.5)/count);
pi += 4.0/(1.0+t*t);
}
pi /= count;
MKL Implicit Offload
// MKL implicit offload requires no source code
// changes; simply link with the offload MKL
// library.
MKL Explicit Offload
#pragma offload target (mic) \
in(transa, transb, N, alpha, beta) \
in(A:length(matrix_elements)) \
in(B:length(matrix_elements)) \
in(C:length(matrix_elements)) \
out(C:length(matrix_elements) alloc_if(0))
sgemm(&transa, &transb, &N, &N, &N, &alpha,
A, &N, B, &N, &beta, C, &N);
Fortran Offload Directive
!dir$ omp offload target(mic)
!$omp parallel do
do i=1,10
A(i) = B(i) * C(i)
enddo
!$omp end parallel do
C/C++ Language Extensions
class _Shared common {
int data1;
char *data2;
class common *next;
void process();
};
_Shared class common obj1, obj2;
…
_Cilk_spawn _Offload obj1.process();
_Cilk_spawn obj2.process();
…
Intel Confidential - Use under NDA only
Outline: Hybrid Computing
Traditional Cluster Computing
• MPI is »the« portable cluster solution
• Parallel programs use MPI over cores inside
the nodes
– Homogeneous programming model
– "Easily" portable to different sizes of clusters
– No threading issues like »False Sharing«
(common cache line)
– Maintenance costs only for one parallelization model
Traditional Cluster Computing
(contd.)
• Hardware trends
• Increasing number of cores per node - plus cores on co-
processors
• Increasing number of nodes per cluster
• Consequence: Increasing number of MPI processes per
application
• Potential MPI limitations
• Memory consumption per MPI process: the sum may exceed the node memory
• Limited scalability due to exhausted interconnects (e.g.
MPI collectives)
• Load balancing is often challenging in MPI
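A back-of-the-envelope check makes the memory limit concrete. With P MPI processes per node (or per coprocessor), each needing m of private memory (replicated data, halos, MPI buffers), the total has to fit into the node memory M_node. The numbers below are hypothetical (240 ranks, one per hardware thread of a 60-core coprocessor, 40 MB per rank, 8 GB of card memory); the hybrid line assumes the per-rank data does not grow when a rank runs 60 threads instead of one, and ignores per-thread stacks.

```latex
\begin{aligned}
P \cdot m &\le M_{\text{node}} \\
\text{pure MPI: } 240 \cdot 40\,\text{MB} &= 9600\,\text{MB} > 8192\,\text{MB} \\
\text{hybrid, 4 ranks with 60 threads each: } 4 \cdot 40\,\text{MB} &= 160\,\text{MB} \ll 8192\,\text{MB}
\end{aligned}
```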
Hybrid Computing
• Combine MPI programming model with threading model
• Overcome MPI limitations by adding threading:
• Potential memory gains in threaded code
• Better scalability (e.g. less MPI
communication)
• Threading offers smart load balancing
strategies
• Result: maximize performance by fully exploiting the hardware (including coprocessors)
Example: MPI Load Imbalance
[Figure: nodes with 4 cores per node; MPI processes Proc 0 … Proc 5 each own one block of the (i, j) grid; dark red = high load]
Difficult to implement load balancing in nodes with MPI
Example: Hybrid Load Balance
[Figure: 4 threads per node on 4 cores; MPI process Proc 0 distributes its rows of the (i, j) grid interleaved over OpenMP threads Thread0 … Thread3; dark red = high load]
Interleaved OpenMP threads improve the total load balancing
Options for Thread Parallelism
[Figure axes: programmer control vs. ease of use / code maintainability]
• Intel® Math Kernel Library
• OpenMP*
• Intel® Threading Building Blocks
• Intel® Cilk™ Plus
• Pthreads* and other threading libraries
Choice of unified programming to target Intel® Xeon and Intel® MIC Architecture!
Intel® MPI Support of Hybrid Codes
Intel® MPI is strong in mapping control
• Sophisticated defaults or user control
• I_MPI_PIN_PROCESSOR_LIST for pure MPI
• For hybrid codes (takes precedence):
  I_MPI_PIN_DOMAIN=<size>[:<layout>]
  <size> =
    omp       Adjust to OMP_NUM_THREADS
    auto      #CPUs/#MPIprocs
    <n>       Explicit domain size (number of logical CPUs)
  <layout> =
    platform  According to BIOS numbering
    compact   Close to each other
    scatter   Far away from each other
• Naturally extends to hybrid codes on MIC
* Although locality issues apply as well, multicore threading runtimes are by far more expressive, richer, and with lower overhead.
Intel® MPI Support of Hybrid Codes
Define I_MPI_PIN_DOMAIN to split the logical processors into non-overlapping subsets
Mapping rule: 1 MPI process per 1 domain
Pin the OpenMP threads inside the domain with KMP_AFFINITY (or in the code)
Intel® MPI Environment Support
The execution command mpirun of Intel® MPI reads argument sets from the command line:
• Sections between ":" define an argument set (alternatively, a line in a configfile specifies a set)
• Host, number of processes, and also the environment can be set independently in each argument set
# mpirun -env I_MPI_PIN_DOMAIN 4 -host myXEON ... : -env I_MPI_PIN_DOMAIN 16 -host myMIC
Adapt the important environment variables to the architecture:
• OMP_NUM_THREADS, KMP_AFFINITY for OpenMP
• CILK_NWORKERS for Intel® Cilk™ Plus
Co-Processor only and Symmetric Support
The full hybrid support of Intel® MPI on Intel® Xeon extends to Intel® MIC
KMP_AFFINITY=balanced (only on MIC) in addition to scatter and compact
Recommendations:
• Explicitly control where MPI processes and threads run in a hybrid application (according to the threading model and the application)
• Avoid splitting cores among MPI processes, i.e. I_MPI_PIN_DOMAIN should be a multiple of 4 (the number of hardware threads per MIC core)
• Try different KMP_AFFINITY settings for your application
OS Thread Affinity Mapping
• The Intel® MIC coprocessor has N cores, each with 4 hardware thread contexts, for a total of M=4*N threads
• The OS maps “procs” to the M hardware threads:
  • The OS runs on proc 0, which lives on MIC core (N-1)!
• Rule of thumb: avoid using OS procs 0, (M-3), (M-2), and (M-1) to avoid contention with the OS
  • Less than 2% of the resources stay unused (1/#cores)
  • Especially important when using the offload model due to data transfer activity!
• But: non-offload applications may slightly benefit from running on core (N-1)
MIC core       | 0       | 1   | … | (N-2) | (N-1)
MIC HW thread  | 0 1 2 3 | 0 1 | … | 3     | 0 1 2 3
OS “proc”      | 1 2 3 4 | 5 6 | … | (M-4) | 0 (M-3) (M-2) (M-1)
OS Thread Affinity Mapping (ctd.)
The OpenMP library maps its threads to the OS “procs”
Examples (for non-offload apps which benefit from core N-1):
KMP_AFFINITY=granularity=thread,compact
KMP_AFFINITY=granularity=thread,balanced with OMP_NUM_THREADS=n, where n=M/2
compact (M threads):
MIC core       | 0       | 1   | … | (N-2) | (N-1)
MIC HW thread  | 0 1 2 3 | 0 1 | … | 3     | 0 1 2 3
OS “proc”      | 1 2 3 4 | 5 6 | … | (M-4) | 0 (M-3) (M-2) (M-1)
OpenMP thread  | 0 1 2 3 | 4 5 | … | (M-5) | (M-4) (M-3) (M-2) (M-1)

balanced (n = M/2 threads):
MIC core       | 0       | 1   | … | (N-2) | (N-1)
MIC HW thread  | 0 1 2 3 | 0 1 | … | 3     | 0 1 2 3
OS “proc”      | 1 2 3 4 | 5 6 | … | (M-4) | 0 (M-3) (M-2) (M-1)
OpenMP thread  | 0 1     | 2 3 | … |       | (n-2) (n-1)
MPI+Offload Support
How can the mapping of threads onto the MIC card be controlled?
How can the offload of the first MPI process be kept from interfering with the offload of the second MPI process, i.e. by using the same MIC cores/threads?
Default: no special support (for now). Offloads from MPI processes are handled by the system like offloads from independent processes (or users).
Define the thread affinity manually per single MPI process (pseudo syntax!):
# export OMP_NUM_THREADS=4
# mpirun -env KMP_AFFINITY=[1-4] -n 1 -host myMIC ... :
-env KMP_AFFINITY=[5-8] -n 1 -host myMIC ... :
...
Outline: Intel® Trace Analyzer and Collector
Intel® Trace Analyzer and Collector
[Figure: compare the event timelines of two communication profiles (blue = computation, red = communication), with a chart showing how the MPI processes interact]
Intel® Trace Analyzer and Collector Overview
• Intel® Trace Analyzer and
Collector helps the developer:
• Visualize and understand parallel
application behavior
• Evaluate profiling statistics and load
balancing
• Identify communication hotspots
• Features
• Event-based approach
• Low overhead
• Excellent scalability
• Comparison of multiple profiles
• Powerful aggregation and filtering
functions
• Fail-safe MPI tracing
• Provides API to instrument user code
• MPI correctness checking
• Idealizer
Full ITAC Functionality on MIC
ITAC Prerequisites
Upload the ITAC library manually:
# sudo scp /opt/intel/itac/8.1.0.016/mic/slib/libVT.so mic0:/lib64/
Set the ITAC environment (per user):
# source /opt/intel/itac/8.1.0.016/intel64/bin/itacvars.sh impi4
– Identical for host and MIC
ITAC Usage with Xeon Phi
Run with the -trace flag (no relinking required) to create a trace file
MPI+Offload
# mpirun -trace -n 2 ./test
Co-processor only
# mpirun -trace -n 2 -wdir /tmp -host mic0 /tmp/test_hello.MIC
Symmetric
# mpirun -trace -n 2 -host michost ./test_hello : -wdir /tmp -n 2 -host mic0 /tmp/test_hello.MIC
The flag "-trace" implicitly preloads libVT.so (which in turn calls libmpi.so to execute the MPI call)
Set VT_LOGFILE_FORMAT=stfsingle to create a single trace file
ITAC Usage with Xeon Phi
Compilation Support
Compile and link with the "-trace" flag
# mpiicc -trace -o test_hello test.c
# mpiicc -trace -mmic -o test_hello.MIC test.c
Links the libVT library
Compile with the "-tcollect" flag
# mpiicc -tcollect -o test_hello test.c
# mpiicc -tcollect -mmic -o test_hello.MIC test.c
• Links the libVT library
• Instruments your code fully, i.e. all user functions will be visible in the trace file
• Maximal insight, but also maximal overhead
Use the VT API of ITAC to manually instrument your code.
Run as a usual Intel® MPI program, without the "-trace" flag
# mpirun ...
ITAC Analysis
Start the ITAC analysis GUI with the trace file (or load it)
# traceanalyzer test_hello.single.stf
Start the analysis, usually by inspecting the Flat Profile (default
chart), the Event Timeline, and the Message Profile
• Select “Charts->Event Timeline”
• Select “Charts->Message Profile”
• Zoom into the Event Timeline
• Click into it, keep the button pressed, move to the right,
and release the mouse
• Use the Navigate menu to get back
• Right-click and select “Group MPI->Ungroup MPI”.
Outline
• Overview
• Installation of Intel® MPI
• Programming Models
• Hybrid Computing
• Intel® Trace Analyzer and Collector
• Load Balancing
• Debugging
• Intel® Cluster Checker
Intel® Xeon Phi Coprocessor Becomes a Network Node
[Figure: on each node, the Intel® Xeon® processor and the Intel® Xeon Phi coprocessor are linked by a virtual network connection]
Intel® MIC Architecture + Linux enables IP addressability
Load Balancing
• Situation
• Host and Xeon Phi coprocessor computation performance differ
• Host and Xeon Phi coprocessor internal communication speeds differ
• MPI in symmetric mode is like running on a heterogeneous cluster
• Codes that are load balanced on a homogeneous cluster may become imbalanced!
• Solution? There is no general solution!
• Approach 1: Adapt the MPI mapping of the (hybrid) code to the performance
characteristics: #m processes per host, #n processes per Xeon Phi
coprocessor
• Approach 2: Change the code-internal mapping of the workload to MPI processes
• Example: uneven split of the calculation grid between MPI processes on the host and on the Xeon Phi
coprocessor(s)
• Approach 3: ...
• Analyze the load balance of the application with ITAC
• Ideal Interconnect Simulator
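Approach 2 can be sketched as a proportional split of grid rows. This minimal Python sketch assumes that each rank's relative throughput has been measured beforehand (for example, in a short calibration run); `split_rows` and `row_ranges` are hypothetical helpers, not part of Intel MPI or ITAC. Largest-remainder rounding keeps the total exactly equal to the number of rows.

```python
def split_rows(n_rows, speeds):
    """Split n_rows grid rows across ranks in proportion to their speed.

    speeds[i] is the relative throughput of rank i (e.g. rows per second
    measured in a calibration run). Largest-remainder rounding makes the
    counts sum exactly to n_rows.
    """
    if n_rows < 0 or not speeds or any(s <= 0 for s in speeds):
        raise ValueError("need n_rows >= 0 and positive speeds")
    total = sum(speeds)
    exact = [n_rows * s / total for s in speeds]
    counts = [int(x) for x in exact]
    # Give the rows lost to truncation to the ranks with the largest fractional parts.
    leftover = n_rows - sum(counts)
    order = sorted(range(len(speeds)), key=lambda i: exact[i] - counts[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts


def row_ranges(counts):
    """Turn per-rank row counts into half-open [start, end) row ranges."""
    ranges, start = [], 0
    for c in counts:
        ranges.append((start, start + c))
        start += c
    return ranges
```

For example, with two host ranks and two coprocessor ranks, if the coprocessor ranks were measured at 1.5 times the host rate (an illustrative ratio), `split_rows(1000, [1, 1, 1.5, 1.5])` returns `[200, 200, 300, 300]`.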
Improving Load Balance: Real World Case
[Figure: load chart, data collapsed per node and Xeon Phi coprocessor]
Host: 16 MPI procs x 1 OpenMP thread
Xeon Phi coprocessor: 8 MPI procs x 28 OpenMP threads
Too high load on host = too low load on Xeon Phi coprocessor
Improving Load Balance: Real World Case
[Figure: load chart, data collapsed per node and Xeon Phi coprocessor]
Host: 16 MPI procs x 1 OpenMP thread
Xeon Phi coprocessor: 24 MPI procs x 8 OpenMP threads
Too low load on host = too high load on Xeon Phi coprocessor
Improving Load Balance: Real World Case
[Figure: load chart, data collapsed per node and Xeon Phi coprocessor]
Host: 16 MPI procs x 1 OpenMP thread
Xeon Phi coprocessor: 16 MPI procs x 12 OpenMP threads
Perfect balance
Host load = Xeon Phi coprocessor load
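The three slides amount to a manual sweep over the coprocessor's (MPI ranks x OpenMP threads) layout. The sweep can also be scripted. In this Python sketch, `measure` is a hypothetical callback that runs the application with a given layout and returns the busy time of the host side and of the coprocessor side. The sketch assumes that, because the ranks synchronize, each step lasts as long as the slower side, so it keeps the layout with the smallest maximum. `pick_balanced`, `diagnose`, and the 5 % tolerance are illustrative choices.

```python
def pick_balanced(configs, measure):
    """Return (config, host_time, phi_time) for the layout with the shortest step.

    configs: iterable of (ranks_on_coprocessor, threads_per_rank) tuples.
    measure(config) -> (host_time, phi_time) for one run with that layout.
    Assumes step time = max(host_time, phi_time), since the slower side
    makes the faster side wait in MPI.
    """
    best = None
    for cfg in configs:
        host_t, phi_t = measure(cfg)
        if best is None or max(host_t, phi_t) < max(best[1], best[2]):
            best = (cfg, host_t, phi_t)
    if best is None:
        raise ValueError("no configurations given")
    return best


def diagnose(host_t, phi_t, tol=0.05):
    """Classify one run the way the three slides do (tol is a relative tolerance)."""
    if host_t > phi_t * (1 + tol):
        return "host overloaded"         # too high load on host
    if phi_t > host_t * (1 + tol):
        return "coprocessor overloaded"  # too high load on Xeon Phi
    return "balanced"
```

In practice, `measure` would wrap an mpirun launch plus an ITAC-based or timer-based measurement. That wrapper is left out here because it depends on the application.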
Ideal Interconnect Simulator (IIS)
What is the Ideal Interconnect Simulator (IIS)?
Using an ITAC trace of an MPI application,
simulate it under ideal conditions:
• Zero network latency
• Infinite network bandwidth
• Zero MPI buffer copy time
• Infinite MPI buffer size
The only limiting factors are concurrency rules,
e.g.,
• A message cannot be received before it is sent
• An all-to-all collective may end only when the last
participating process has entered it
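To make the idea concrete, here is a toy Python replay of point-to-point traffic under the ideal conditions listed above. It is not ITAC's implementation. The operation format (`compute`, `send`, `recv` tuples) and the name `idealize` are hypothetical. Sends cost nothing, and a receive completes as soon as the receiver has reached it and the matching send has been issued.

```python
from collections import defaultdict, deque


def idealize(programs):
    """Replay per-rank operation lists on an ideal interconnect.

    programs[r] is a list of operations for rank r:
      ("compute", seconds)  local work, kept unchanged
      ("send", dst)         completes instantly (zero latency, infinite
                            bandwidth, infinite buffering)
      ("recv", src)         completes once the matching send was issued
    Messages between a pair of ranks are matched in FIFO order.
    Returns the idealized finish time of every rank.
    """
    n = len(programs)
    clock = [0.0] * n
    pc = [0] * n                     # index of the next operation per rank
    in_flight = defaultdict(deque)   # (src, dst) -> issue times of unmatched sends
    progress = True
    while progress:
        progress = False
        for r in range(n):
            while pc[r] < len(programs[r]):
                op = programs[r][pc[r]]
                if op[0] == "compute":
                    clock[r] += op[1]
                elif op[0] == "send":
                    in_flight[(r, op[1])].append(clock[r])
                elif op[0] == "recv":
                    queue = in_flight[(op[1], r)]
                    if not queue:
                        break        # blocked until the sender issues the message
                    # A message cannot be received before it is sent.
                    clock[r] = max(clock[r], queue.popleft())
                else:
                    raise ValueError(f"unknown operation {op!r}")
                pc[r] += 1
                progress = True
    if any(pc[r] < len(programs[r]) for r in range(n)):
        raise RuntimeError("deadlock: some receives are never matched")
    return clock
```

For example, suppose rank 0 computes for 2 s and then sends to rank 1, while rank 1 computes for 1 s, receives, and computes 1 s more. In that case `idealize` returns `[2.0, 3.0]`. Rank 1's 1 s wait survives even on a perfect network, so it is load imbalance rather than interconnect cost.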
Ideal Interconnect Simulator (Idealizer)
[Figure: actual trace compared with the idealized trace]
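Building Blocks: Elementary Messages
[Figure: timelines of two processes, P1 and P2, exchanging one message via MPI_Isend and MPI_Recv, in the actual and in the idealized trace]
• Early send / late receive: MPI_Isend is issued before MPI_Recv is entered; in the idealized trace both calls have zero duration
• Late send / early receive: MPI_Recv is entered before MPI_Isend is issued; in the idealized trace MPI_Isend has zero duration, but MPI_Recv still waits for the sender, and that remaining wait is load imbalance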
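The two elementary cases can be told apart from trace timestamps alone. The Python sketch below uses the hypothetical name `classify_message` with timestamps in seconds. It splits the receiver's time in MPI_Recv into the part that would remain on an ideal interconnect (load imbalance) and the rest (interconnect cost). It assumes that clocks are synchronized across processes.

```python
def classify_message(send_enter, recv_enter, recv_leave):
    """Split the receiver's MPI_Recv time into (kind, imbalance, interconnect).

    send_enter: time the sender called MPI_Isend
    recv_enter, recv_leave: times the receiver entered and left MPI_Recv
    """
    if recv_leave < recv_enter:
        raise ValueError("recv_leave must not precede recv_enter")
    in_recv = recv_leave - recv_enter
    if send_enter > recv_enter:
        kind = "late send / early receive"
        # Waiting for the sender remains even with zero latency: load imbalance.
        # min() guards against clock skew between nodes.
        imbalance = min(send_enter - recv_enter, in_recv)
    else:
        kind = "early send / late receive"
        imbalance = 0.0  # message already issued; ideally the call takes no time
    return kind, imbalance, in_recv - imbalance
```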
Building Blocks: Collective Operations
[Figure: actual trace (Gigabit Ethernet) and simulated trace (ideal interconnect), same timescale in both figures; the same MPI_Alltoallv is marked in both]
Legend:
257 = MPI_Alltoallv
506 = User_Code
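For a collective, the ideal-interconnect rule is simple: every rank leaves an all-to-all at the moment the last rank enters it. This hypothetical Python helper applies that rule to one MPI_Alltoallv instance. Its input, lists of per-rank entry and exit timestamps, is an assumed format and not an ITAC export format.

```python
def idealize_alltoall(enter, leave):
    """Split each rank's time in one all-to-all into (imbalance, interconnect).

    enter[r] and leave[r] are the actual entry and exit timestamps of rank r.
    On an ideal interconnect every rank leaves when the last rank enters, so
    the wait up to that moment is load imbalance; anything beyond it is
    attributed to the interconnect.
    """
    if len(enter) != len(leave) or not enter:
        raise ValueError("need one (enter, leave) pair per rank")
    last_entry = max(enter)
    result = []
    for t_in, t_out in zip(enter, leave):
        imbalance = last_entry - t_in
        interconnect = max(0.0, (t_out - t_in) - imbalance)
        result.append((imbalance, interconnect))
    return result
```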
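Application Imbalance Diagram: Total
[Figure: total time split into "calculation" and MPI time; the MPI time is further split into "load imbalance" and "interconnect"]
• To reduce "interconnect": faster network
• To reduce "load imbalance": change parallel decomposition
• To reduce "calculation": change algorithm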
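Summing per-rank quantities gives the three bars of the diagram. This minimal Python sketch assumes that per-rank totals from the actual trace and from the idealized trace are available. The function name `imbalance_diagram` and its inputs are hypothetical. The non-MPI time counts as calculation, the MPI time that survives idealization counts as load imbalance, and the remainder counts as interconnect. The sketch also reports the remedy suggested for the largest share.

```python
REMEDY = {
    "calculation": "change algorithm",
    "load imbalance": "change parallel decomposition",
    "interconnect": "faster network",
}


def imbalance_diagram(wall, mpi_actual, mpi_ideal):
    """Return ({share: seconds summed over ranks}, remedy for the largest share).

    wall[r]        elapsed time of rank r in the actual run
    mpi_actual[r]  time rank r spent inside MPI in the actual trace
    mpi_ideal[r]   time rank r spends inside MPI in the idealized trace
    """
    if not (len(wall) == len(mpi_actual) == len(mpi_ideal)):
        raise ValueError("need one value per rank in every list")
    shares = {
        "calculation": sum(w - a for w, a in zip(wall, mpi_actual)),
        "load imbalance": sum(mpi_ideal),
        "interconnect": sum(a - i for a, i in zip(mpi_actual, mpi_ideal)),
    }
    dominant = max(shares, key=shares.get)
    return shares, REMEDY[dominant]
```

By construction, the three shares add up to the summed wall time of all ranks.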
Application Imbalance Diagram: Breakdown
[Figure: "load imbalance" and "interconnect" shares broken down per MPI function: MPI_Recv, MPI_Allreduce, MPI_Alltoallv]
Outline
• Overview
• Installation of Intel® MPI
• Programming Models
• Hybrid Computing
• Intel® Trace Analyzer and Collector
• Load Balancing
• Debugging
• Intel® Cluster Checker
Debugging Intel® MPI Application
Use environment variables:
I_MPI_DEBUG to set the debug level
I_MPI_DEBUG_OUTPUT to specify a file for output redirection
Use format specifiers such as %r, %p, or %h to add the rank, process ID,
or host name to the file name, respectively
Usage:
# export I_MPI_DEBUG=<debug level>
or:
# mpirun -env I_MPI_DEBUG <debug level> \
  -n <# of processes> ./a.out
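To illustrate which file names the %r, %p, and %h specifiers produce, here is a small Python re-implementation of the substitution. It is not Intel MPI code: `expand_debug_output` is a hypothetical name, and any other `%` sequence is left untouched in this sketch.

```python
import os
import socket


def expand_debug_output(template, rank, pid=None, host=None):
    """Illustrative expansion of %r (rank), %p (process ID), %h (host name)."""
    pid = os.getpid() if pid is None else pid
    host = socket.gethostname() if host is None else host
    subs = {"r": str(rank), "p": str(pid), "h": host}
    out, i = [], 0
    while i < len(template):
        if template[i] == "%" and i + 1 < len(template) and template[i + 1] in subs:
            out.append(subs[template[i + 1]])
            i += 2
        else:
            out.append(template[i])
            i += 1
    return "".join(out)
```

With a template such as `debug_%h_%r.log`, each rank writes its own file, so debug output from different ranks is not interleaved.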
Processor information utility in Intel® MPI:
# cpuinfo
Aggregates /proc/cpuinfo information
GDB* on Intel® Xeon Phi™ Coprocessor
• GDB* supports Intel® Xeon Phi™ Coprocessor
• Intel upstreams features and capabilities to GNU*
community
• Broad enabling of developers and software tools ecosystem
• Available from Intel at http://software.intel.com
8/19/2013
The GNU* Project Debugger and
Intel® Xeon Phi™ Coprocessor
• Native and cross-debugger versions of GDB*
exist for the Intel® Xeon Phi™ coprocessor
• They are part of the Intel® Manycore Platform
Software Stack (Intel® MPSS)
• http://software.intel.com/en-us/articles/intel-manycore-platform-software-stack-mpss
You can debug with it either as root or as a regular user
Intel Confidential – NDA presentation
Native debugging on the Intel® Xeon
Phi™ Coprocessor with GDB*
• Run GDB* on the Intel® Xeon Phi™ Coprocessor
ssh -t mic0 /usr/bin/gdb
– To attach to a running application via its
process ID
(gdb) shell pidof my_application
42
(gdb) attach 42
– To run an application directly from GDB*
(gdb) file /target/path/to/application
(gdb) start
Remote debugging with GDB*
for Intel® Xeon Phi™ Coprocessor
• Run GDB* on your localhost
/usr/linux-k1om-4.7/bin/x86_64-k1om-linux-gdb
Start gdbserver on the Intel® Xeon Phi™ Coprocessor
• To debug remotely using | ssh
(gdb) target extended-remote | ssh -T mic0 gdbserver --multi IP:port
• To debug remotely using stdio
(gdb) target extended-remote | ssh -T mic0 gdbserver --multi -
To attach to a running application via its process ID (PID)
(gdb) file /local/path/to/application
(gdb) attach <remote-pid>
To run an application directly from GDB*
(gdb) file /local/path/to/application
(gdb) set remote exec-file /target/path/to/application
Explore Intel® Xeon Phi™
Coprocessor Architecture Features
List all new vector and mask registers
(gdb) info registers zmm
k0 0x0 0
⁞
zmm31 {v16_float = {0x0 <repeats 16 times>}, v8_double = {0x0,
0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0},
v64_int8 = {0x0 <repeats 64 times>},
v32_int16 = {0x0 <repeats 32 times>},
v16_int32 = {0x0 <repeats 16 times>},
v8_int64 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0},
v4_uint128 = {0x0, 0x0, 0x0, 0x0}}
Disassemble Instructions
(gdb) disassemble $pc, +10
Dump of assembler code from 0x11 to 0x24:
0x0000000000000011 <foobar+17>: vpackstorelps %zmm0,-0x10(%rbp){%k1}
0x0000000000000018 <foobar+24>: vbroadcastss -0x10(%rbp),%zmm0
Outline
• Overview
• Installation of Intel® MPI
• Programming Models
• Hybrid Computing
• Intel® Trace Analyzer and Collector
• Load Balancing
• Debugging
• Intel® Cluster Checker
Intel® Cluster Checker 2.0 with
Intel® Xeon Phi™ coprocessor support
• The new micinfo test module checks that coprocessor
information is correct and uniform across nodes. Any error,
undefined value, or abnormal difference among coprocessors is
reported when it may impact cluster productivity.
• The new miccheck test module checks the sanity of the
coprocessor cards by running the miccheck diagnostic tool on
every node in parallel.
• To run a benchmark that offloads work to a coprocessor:
$ OFFLOAD_REPORT=2 MKL_MIC_ENABLE=1 \
clck -I micinfo -I miccheck -I dgemm
http://software.intel.com/en-us/articles/using-intel-cluster-checker-20-to-check-intel-xeon-phi-support
Intel® Cluster Checker 2.0
Faster Execution Time
Execution time is reduced 2x vs. v1.8; a 256-node certification takes nearly 30 minutes
Results have been estimated based on internal Intel analysis and are provided for informational purposes only.
Any difference in system hardware or software design or configuration may affect actual performance.
[Figure: execution time in seconds (0 to 1600) vs. node quantity (8, 16, 32, 64, 128, 256)]
Summary
The ease of use of Intel® MPI and related tools
like the Intel Trace Analyzer and Collector
extends from the Intel Xeon architecture to the
Intel MIC architecture.
“Everything must be made as simple as possible. But not simpler.”
― Albert Einstein
Principais conceitos e técnicas em vetorização
 
Notes on NUMA architecture
Notes on NUMA architectureNotes on NUMA architecture
Notes on NUMA architecture
 
Benchmarking para sistemas de alto desempenho
Benchmarking para sistemas de alto desempenhoBenchmarking para sistemas de alto desempenho
Benchmarking para sistemas de alto desempenho
 
Yocto no 1 IoT Day da Telefonica/Vivo
Yocto no 1 IoT Day da Telefonica/VivoYocto no 1 IoT Day da Telefonica/Vivo
Yocto no 1 IoT Day da Telefonica/Vivo
 
Html5 fisl15
Html5 fisl15Html5 fisl15
Html5 fisl15
 
IoT FISL15
IoT FISL15IoT FISL15
IoT FISL15
 
IoT TDC Floripa 2014
IoT TDC Floripa 2014IoT TDC Floripa 2014
IoT TDC Floripa 2014
 
Otávio Salvador - Yocto project reduzindo -time to market- do seu próximo pr...
Otávio Salvador - Yocto project  reduzindo -time to market- do seu próximo pr...Otávio Salvador - Yocto project  reduzindo -time to market- do seu próximo pr...
Otávio Salvador - Yocto project reduzindo -time to market- do seu próximo pr...
 
Html5 tdc floripa_2014
Html5 tdc floripa_2014Html5 tdc floripa_2014
Html5 tdc floripa_2014
 
Desenvolvimento e análise de performance de jogos Android com Coco2d-HTML5
Desenvolvimento e análise de performance de jogos Android com Coco2d-HTML5Desenvolvimento e análise de performance de jogos Android com Coco2d-HTML5
Desenvolvimento e análise de performance de jogos Android com Coco2d-HTML5
 
O uso de tecnologias Intel na implantação de sistemas de alto desempenho
O uso de tecnologias Intel na implantação de sistemas de alto desempenhoO uso de tecnologias Intel na implantação de sistemas de alto desempenho
O uso de tecnologias Intel na implantação de sistemas de alto desempenho
 
Escreva sua App Android sem gastar energia - Intel Sw Day
Escreva sua App Android sem gastar energia - Intel Sw DayEscreva sua App Android sem gastar energia - Intel Sw Day
Escreva sua App Android sem gastar energia - Intel Sw Day
 

Kürzlich hochgeladen

SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 

Kürzlich hochgeladen (20)

SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 

Intel® Trace Analyzer and Collector (ITAC) - Intel Software Conference 2013

  • 1. © 2013, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others.
    Intel MPI Library, Trace Analyzer and Collector, and tuning tips in cluster architectures for distributed performance
    August, 2013
    Werner Krotz-Vogel
  • 2. Legal Disclaimer
    INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL® PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL’S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL® PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.
    Intel may make changes to specifications and product descriptions at any time, without notice. All products, dates, and figures specified are preliminary based on current expectations, and are subject to change without notice.
    Intel, processors, chipsets, and desktop boards may contain design defects or errors known as errata, which may cause the product to deviate from published specifications. Current characterized errata are available on request.
    Sandy Bridge and other code names featured are used internally within Intel to identify products that are in development and not yet publicly announced for release. Customers, licensees and other third parties are not authorized by Intel to use code names in advertising, promotion or marketing of any product or services and any such use of Intel's internal code names is at the sole risk of the user.
    Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to http://www.intel.com/performance
    Intel, Core, Xeon, VTune, Cilk, Intel and Intel Sponsors of Tomorrow. and Intel Sponsors of Tomorrow. logo, and the Intel logo are trademarks of Intel Corporation in the United States and other countries. *Other names and brands may be claimed as the property of others. Copyright ©2011 Intel Corporation.
    Hyper-Threading Technology: Requires an Intel® HT Technology enabled system, check with your PC manufacturer. Performance will vary depending on the specific hardware and software used. Not available on all Intel® Core™ processors. For more information including details on which processors support HT Technology, visit http://www.intel.com/info/hyperthreading
    Intel® 64 architecture: Requires a system with a 64-bit enabled processor, chipset, BIOS and software. Performance will vary depending on the specific hardware and software you use. Consult your PC manufacturer for more information. For more information, visit http://www.intel.com/info/em64t
    Intel® Turbo Boost Technology: Requires a system with Intel® Turbo Boost Technology capability. Consult your PC manufacturer. Performance varies depending on hardware, software and system configuration. For more information, visit http://www.intel.com/technology/turboboost
  • 3. Objectives
    • Intel® MPI execution models on Intel® Many Integrated Core (MIC) Architecture
    • Pure MPI or hybrid MPI applications on MIC
    • Analysis of Intel® MPI codes with the Intel® Trace Analyzer and Collector (ITAC) on MIC
    • Load balancing on heterogeneous systems
    • Debugging Intel® MPI codes on MIC
    • Intel® Cluster Checker v2 with support for MIC
  • 4. Outline
    • Overview
    • Installation of Intel® MPI
    • Programming Models
    • Hybrid Computing
    • Intel® Trace Analyzer and Collector
    • Load Balancing
    • Debugging
    • Intel® Cluster Checker
  • 6. Intel® MPI Library Overview
    • Intel is a leading vendor of MPI implementations and tools
    • Optimized MPI application performance: application-specific tuning, automatic tuning
    • Lower latency: industry-leading latency
    • Interconnect independence and runtime selection: multi-vendor interoperability; performance-optimized support for the latest OFED capabilities through DAPL 2.0
    • More robust MPI applications: seamless interoperability with Intel® Trace Analyzer and Collector
  • 7. Range of models to meet application needs
    [Figure: Spectrum of Programming Models and Mindsets, from multi-core centric (Xeon) to many-core centric (MIC)]
    • Multi-Core Hosted: general purpose serial and parallel computing
    • Offload: codes with highly-parallel phases
    • Symmetric: codes with balanced needs
    • Many-Core Hosted: highly-parallel codes
  • 8. Levels of communication
    • Current clusters are not homogeneous regarding communication speed:
      • Inter-node (InfiniBand, Ethernet, etc.)
      • Intra-node
        • Inter-socket (QuickPath Interconnect)
        • Intra-socket
    • The MIC coprocessor adds two more levels:
      • Host-MIC communication
      • Inter-MIC communication
  • 9. Intel® MPI Library Architecture & Staging
    [Figure: software stack from the application through the MPI-2.2 layer (MPICH2* upper layer), ADI3*, the CH3* device layer and Nemesis* down to the network modules (Netmod*): shm (mmap(2)), user/kernel SCIF†, tcp, and dapl/ofa over OFED verbs/core and the HCA‡ driver; components are labeled by release stage (Pre-Alpha, Alpha, Beta/Gold)]
    †: Symmetric Communications Interface
    ‡: Host Channel Adapter
  • 10. Selecting network fabrics
    • Intel® MPI automatically selects the best available network fabric it can find.
    • Use I_MPI_FABRICS to select a different communication device explicitly.
    • The best fabric is usually InfiniBand-based (dapl, ofa) for inter-node communication and shared memory for intra-node communication.
    • Available for KNC: shm, tcp, ofa, dapl
    • Availability is checked in the order shm:dapl, shm:ofa, shm:tcp (intra:inter)
    • Set I_MPI_SSHM_SCIF=1 to enable the shm fabric between host and MIC
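The fallback rule above (an explicit I_MPI_FABRICS value wins, otherwise the first available pair in the order shm:dapl, shm:ofa, shm:tcp) can be modeled in a few lines of C. This is a hypothetical sketch of the documented behavior, not Intel MPI's internal code: the names `split_fabrics`, `select_fabrics` and `DEFAULT_ORDER` are invented for illustration, and only the two-part `intra:inter` form of the variable is handled.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the documented default order; NOT Intel MPI's
   actual implementation. */
static const char *const DEFAULT_ORDER[] = {"shm:dapl", "shm:ofa", "shm:tcp"};

/* Split an "intra:inter" value into its two parts.
   Returns 0 on success, -1 if there is no ':' or a part does not fit. */
int split_fabrics(const char *spec, char *intra, size_t intra_len,
                  char *inter, size_t inter_len)
{
    const char *colon = strchr(spec, ':');
    if (colon == NULL)
        return -1;
    size_t n_intra = (size_t)(colon - spec);
    size_t n_inter = strlen(colon + 1);
    if (n_intra >= intra_len || n_inter >= inter_len)
        return -1;
    memcpy(intra, spec, n_intra);
    intra[n_intra] = '\0';
    memcpy(inter, colon + 1, n_inter + 1); /* also copies the '\0' */
    return 0;
}

/* Return 1 if 'name' appears in the NULL-terminated list 'available'. */
static int is_available(const char *name, const char *const *available)
{
    for (; *available != NULL; ++available)
        if (strcmp(*available, name) == 0)
            return 1;
    return 0;
}

/* An explicit request wins; otherwise return the first default pair whose
   intra- and inter-node fabrics are both available, or NULL if none is. */
const char *select_fabrics(const char *requested, const char *const *available)
{
    if (requested != NULL && requested[0] != '\0')
        return requested;
    for (size_t i = 0; i < sizeof DEFAULT_ORDER / sizeof DEFAULT_ORDER[0]; ++i) {
        char intra[16], inter[16];
        if (split_fabrics(DEFAULT_ORDER[i], intra, sizeof intra,
                          inter, sizeof inter) == 0
            && is_available(intra, available)
            && is_available(inter, available))
            return DEFAULT_ORDER[i];
    }
    return NULL;
}
```

In a program the request would typically come from `getenv("I_MPI_FABRICS")`, which returns NULL when the variable is unset, so the default order is used.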
  • 11. Intel® MPI 4.1: what's NOT in it for Xeon Phi coprocessors?
    • Features not provided for Xeon Phi coprocessors:
      • Dynamic process management
      • MPI file I/O
      • mpirun -perhost option
      • mpitune
      • ILP64 mode
    • Deprecated feature not supported on Xeon Phi coprocessors:
      • MPD process manager
  • 13. Installation
    • Download the latest Intel® MPI Library, included in Intel Cluster Studio XE and available from the Intel Registration Center: l_mpi_p_4.1.0.030.tgz (later: l_itac_b_8.1.0.016.tgz)
    • Unpack the tar file and execute the installation script:
      # tar zxf l_mpi_p_4.1.0.030.tgz
      # cd l_mpi_p_4.1.0.030
      # ./install.sh
    • Follow the installation instructions. Root or user installation is possible.
    • The resulting directory structure has intel64 and mic subdirectories:
      /opt/intel/impi/4.1.0.030/intel64/{bin,etc,include,lib}
      /opt/intel/impi/4.1.0.030/mic/{bin,etc,include,lib}
    • Only one user environment setup is required; it serves both architectures.
  • 14. Prerequisites
    • Assumption: the hostname host-mic0 is associated with an IP address, specified in /etc/hosts or $HOME/.ssh/config
    • The tools directory /opt/intel is mounted onto the MIC via NFS
    • If NFS is not available, upload the Intel® MPI libraries onto the card(s):
      # cd /opt/intel/impi/4.1.0.030/mic/lib
      # scp libmpi.so.4.1 /lib64/libmpi.so.4
      ...
    • Execute as root or as a user with sudo rights (if that is not possible, copy to a user directory)
    • The upload has to be repeated after every reboot of the KNC card
  • 15. Prerequisites per User
    • Set the compiler environment (identical for host and MIC):
      # source <compiler_installdir>/bin/compilervars.sh intel64
    • Set the Intel® MPI environment (identical for host and MIC):
      # source /opt/intel/impi/4.1.0.030/intel64/bin/mpivars.sh
    • mpirun needs ssh access to the MIC. This is already taken care of: the user's ssh key ~/.ssh/id_rsa.pub is copied to the MIC at driver boot time.
  • 16. Compiling and Linking for MIC
    • Compile MPI sources using the Intel® MPI scripts
    • For Xeon with potential offload (latest compiler):
      # mpiicc -o test test.c
    • For Xeon without potential offload, as usual:
      # mpiicc [-no-offload] -o test test.c
    • For native execution on MIC, add the -mmic flag; the usual compiler flag also controls the MPI compilation:
      # mpiicc -mmic -o test test.c
    • The linker verbose mode "-v" shows:
      • Without -mmic, linkage with the intel64 libraries: ld ... -L/opt/intel/impi/4.1.0.030/intel64/lib ...
      • With -mmic, linkage with the MIC libraries: ld ... -L/opt/intel/impi/4.1.0.030/mic/lib ...
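The linker output above shows that the single -mmic flag decides which library tree the wrapper links against. The following C sketch models only that decision; it is hypothetical (the real mpiicc is a shell script and is not reproduced here), and `targets_mic`, `library_flag` and `IMPI_ROOT` are invented names, with the root path taken from the installation layout on the slides.

```c
#include <stdio.h>
#include <string.h>

/* Installation root as shown on the slides; adjust for other versions. */
#define IMPI_ROOT "/opt/intel/impi/4.1.0.030"

/* Return 1 if the compiler arguments contain the -mmic flag. */
int targets_mic(int argc, const char *const argv[])
{
    for (int i = 0; i < argc; ++i)
        if (strcmp(argv[i], "-mmic") == 0)
            return 1;
    return 0;
}

/* Write the -L option a wrapper would pass to ld into 'out'.
   Returns snprintf's result (the length it wanted to write). */
int library_flag(int argc, const char *const argv[], char *out, size_t out_len)
{
    const char *arch = targets_mic(argc, argv) ? "mic" : "intel64";
    return snprintf(out, out_len, "-L%s/%s/lib", IMPI_ROOT, arch);
}
```

Because the architecture is derived from the compiler flags alone, one environment setup can serve both targets, matching the installation notes.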
  • 18. Co-processor-only Programming Model
    • MPI ranks on Intel® MIC (only)
    • All messages into/out of the Intel® MIC coprocessors
    • Intel® Cilk™ Plus, OpenMP*, Intel® Threading Building Blocks, and Pthreads used directly within MPI processes
    • Intermediate step: all MPI processes run on a single Intel® MIC Architecture coprocessor only
    Build the Intel® MIC binary using the Intel® MIC compiler. Upload the binary to the Intel® MIC Architecture. Run instances of the MPI application on Intel® MIC nodes.
    [Figure: homogeneous network of many-core CPUs; each node has host CPUs and a MIC, with data and MPI traffic on the MIC side across the network]
  • 19. Co-processor-only Programming Model
    MPI ranks on the MIC coprocessor(s) only; MPI messages into/out of the MIC coprocessor(s); threading possible
    • Build the application for the MIC Architecture:
      # mpiicc -mmic -o test_hello.MIC test.c
    • Upload the MIC executable (only needed without NFS):
      # scp ./test_hello.MIC mic0:/tmp/test_hello.MIC
      Remark: if NFS is available, no explicit uploads are required (just copies)
    • Launch the application on the coprocessor from the host:
      # mpirun -n 2 -wdir /tmp -host mic0 /tmp/test_hello.MIC
    • Alternatively, log in to the MIC and execute the already uploaded mpirun there
  • 20. Symmetric Programming Model
    • MPI ranks on Intel® MIC Architecture and host CPUs
    • Messages to/from any core
    • Intel® Cilk™ Plus, OpenMP*, Intel® Threading Building Blocks, Pthreads* used directly within MPI processes
    • Intermediate step: all MPI processes run on one host CPU and one Intel® MIC Architecture coprocessor only
    • Available in Intel® MPI Library for Intel® MIC Alpha (1 host, 1 coprocessor)
    Build Intel® 64 and Intel® MIC Architecture binaries by using the respective compilers targeting Intel® 64 and Intel® MIC Architecture. Upload the Intel® MIC binary to the Intel® MIC Architecture. Run instances of the MPI application on different mixed nodes.
    [Figure: heterogeneous network of homogeneous CPUs; MPI traffic between host CPUs and MICs across the network]
  • 21. Symmetric model
    MPI ranks on the MIC coprocessor(s) and host CPU(s); MPI messages into/out of the MIC(s) and host CPU(s); threading possible
    • Build the application for Intel® 64 and the MIC Architecture separately:
      # mpiicc -o test_hello test.c
      # mpiicc -mmic -o test_hello.MIC test.c
    • Upload the MIC executable:
      # scp ./test_hello.MIC mic0:/tmp/test_hello.MIC
    • Launch the application on the host and the coprocessor from the host:
      # mpirun -n 2 -host <hostname> ./test_hello : -wdir /tmp -n 2 -host mic0 /tmp/test_hello.MIC
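In the symmetric launch above, the ':' separates two argument sets (2 host ranks, 2 MIC ranks) that share one MPI_COMM_WORLD. Assuming ranks are numbered consecutively across the sets in command-line order (the usual MPMD convention), the set a rank belongs to follows from the -n counts alone. The helper below is a hypothetical illustration, not an Intel MPI function.

```c
/* Hypothetical helper: given the -n counts of the ':'-separated argument
   sets, return the index of the set that global rank 'rank' belongs to.
   Assumes consecutive numbering across sets in command-line order.
   Returns -1 for an out-of-range rank. */
int segment_of_rank(int rank, const int *counts, int nsegments)
{
    if (rank < 0)
        return -1;
    int first = 0; /* first global rank of the current set */
    for (int s = 0; s < nsegments; ++s) {
        if (rank < first + counts[s])
            return s;
        first += counts[s];
    }
    return -1;
}
```

With counts {2, 2}, ranks 0 and 1 map to set 0 (the host binary) and ranks 2 and 3 to set 1 (the .MIC binary), which is useful when assigning larger work shares to one side for load balancing.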
  • 22. MPI+Offload Programming Model
    • MPI ranks on Intel® Xeon® processors (only)
    • All messages into/out of host CPUs
    • Offload models used to accelerate MPI ranks
    • Intel® Cilk™ Plus, OpenMP*, Intel® Threading Building Blocks, Pthreads* within Intel® MIC
    Build an Intel® 64 executable with included offload by using the Intel® 64 compiler. Run instances of the MPI application on the host, offloading code onto MIC. Certain applications benefit from the additional cores and wider SIMD.
    [Figure: homogeneous network of heterogeneous nodes; MPI traffic between host CPUs across the network, each host offloading to its MIC]
  • 23. MPI+Offload Programming Model
    MPI ranks on the host CPUs only; MPI messages into/out of the host CPUs; Intel® MIC Architecture as an accelerator
    • Compile for MPI and internal offload:
      # mpiicc -o test test.c
    • The latest compiler compiles for offloading by default if an offload construct is detected; switch this off with the -no-offload flag
    • Execute on the host(s) as usual:
      # mpirun -n 2 ./test
    • MPI processes will offload code for acceleration
  • 24. Offloading to Intel® MIC Architecture Examples
    C/C++ Offload Pragma:
      #pragma offload target (mic)
      #pragma omp parallel for reduction(+:pi)
      for (i=0; i<count; i++) {
        float t = (float)((i+0.5)/count);
        pi += 4.0/(1.0+t*t);
      }
      pi /= count;
    MKL Implicit Offload:
      // MKL implicit offload requires no source code changes; simply link with the offload MKL library.
    MKL Explicit Offload:
      #pragma offload target (mic) in(transa, transb, N, alpha, beta) in(A:length(matrix_elements)) in(B:length(matrix_elements)) in(C:length(matrix_elements)) out(C:length(matrix_elements) alloc_if(0))
      sgemm(&transa, &transb, &N, &N, &N, &alpha, A, &N, B, &N, &beta, C, &N);
    Fortran Offload Directive:
      !dir$ omp offload target(mic)
      !$omp parallel do
      do i=1,10
        A(i) = B(i) * C(i)
      enddo
      !$omp end parallel do
    C/C++ Language Extensions:
      class _Shared common { int data1; char *data2; class common *next; void process(); };
      _Shared class common obj1, obj2;
      …
      _Cilk_spawn _Offload obj1.process();
      _Cilk_spawn obj2.process();
      …
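The C/C++ offload example on this slide computes pi with the midpoint rule for the integral of 4/(1+t²) over [0,1]. The sketch below is a host-only version of that kernel: the Intel-specific `#pragma offload target (mic)` line is deliberately left out so it builds with any C compiler, and `t` is kept in double precision rather than float. With OpenMP enabled (for example -fopenmp or -qopenmp) the reduction runs in parallel; otherwise the pragma is ignored and the loop runs serially with the same result up to rounding. The function name `compute_pi` is our own.

```c
/* Host-side sketch of the pi kernel; no offload pragma. */
double compute_pi(long count)
{
    double pi = 0.0;
    /* Each thread accumulates a private partial sum; OpenMP adds them up. */
#pragma omp parallel for reduction(+:pi)
    for (long i = 0; i < count; i++) {
        double t = (i + 0.5) / count; /* midpoint of the i-th subinterval */
        pi += 4.0 / (1.0 + t * t);
    }
    return pi / count; /* multiply the sum by the width 1/count */
}
```

Keeping the kernel a plain OpenMP loop is what makes the offload variant cheap: adding the offload pragma in front of it only moves where the loop executes.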
  • 25. Outline • Overview • Installation of Intel® MPI • Programming Models • Hybrid Computing • Intel® Trace Analyzer and Collector • Load Balancing • Debugging • Intel® Cluster Checker
  • 26. Traditional Cluster Computing • MPI is "the" portable cluster solution • Parallel programs use MPI over the cores inside the nodes – Homogeneous programming model – "Easily" portable to clusters of different sizes – No threading issues such as false sharing (threads writing to a common cache line) – Maintenance costs for only one parallelization model
  • 27. Traditional Cluster Computing (contd.) • Hardware trends • Increasing number of cores per node, plus cores on coprocessors • Increasing number of nodes per cluster • Consequence: increasing number of MPI processes per application • Potential MPI limitations • Memory consumption per MPI process: the sum can exceed the node memory • Limited scalability due to exhausted interconnects (e.g. MPI collectives) • Load balancing is often challenging in MPI
  • 28. Hybrid Computing • Combine the MPI programming model with a threading model • Overcome MPI limitations by adding threading: • Potential memory savings in threaded code • Better scalability (e.g. less MPI communication) • Threading offers smart load-balancing strategies • Result: maximize performance by exploiting the hardware (including coprocessors)
  • 29. Example: MPI Load Imbalance [Figure: a 2-D domain (indices i, j) decomposed over MPI processes Proc 0 to Proc 5 on nodes with 4 cores each; dark red = high load.] Load balancing within the nodes is difficult to implement with MPI.
  • 30. Example: Hybrid Load Balance [Figure: the same domain (indices i, j) handled by one MPI process per node (Proc 0) running 4 OpenMP threads on 4 cores (Thread0 to Thread3), with rows interleaved among the threads; dark red = high load.] Interleaved OpenMP threads improve the total load balance.
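Why interleaving helps can be shown with a toy workload. This sketch assumes a hypothetical cost model in which the cost of row i grows linearly with i (standing in for the dark red, high-load region), and compares contiguous blocks, similar to OpenMP schedule(static), with round-robin rows, similar to schedule(static, 1). The function names are illustrative only.

```c
#include <assert.h>

/* Hypothetical workload model: the cost of row i grows with i, so the
 * last contiguous block is the most expensive one. */
static long row_cost(int i) { return i; }

/* Load per thread when rows are split into contiguous equal blocks
 * (like OpenMP schedule(static) when rows divide evenly). */
void block_loads(int rows, int threads, long *load)
{
    assert(threads > 0 && rows % threads == 0);
    int chunk = rows / threads;
    for (int t = 0; t < threads; t++) load[t] = 0;
    for (int i = 0; i < rows; i++) load[i / chunk] += row_cost(i);
}

/* Load per thread when rows are interleaved round-robin
 * (like OpenMP schedule(static, 1)). */
void cyclic_loads(int rows, int threads, long *load)
{
    for (int t = 0; t < threads; t++) load[t] = 0;
    for (int i = 0; i < rows; i++) load[i % threads] += row_cost(i);
}

/* The slowest thread determines the time of the parallel region. */
long max_load(const long *load, int threads)
{
    long m = load[0];
    for (int t = 1; t < threads; t++)
        if (load[t] > m) m = load[t];
    return m;
}
```

With 8 rows on 4 threads, the block split gives loads {1, 5, 9, 13} while the interleaved split gives {4, 6, 8, 10}, so the slowest thread drops from 13 to 10 cost units.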
  • 31. Options for Thread Parallelism [Figure: Intel® Math Kernel Library, OpenMP*, Intel® Threading Building Blocks, Intel® Cilk™ Plus, and Pthreads* and other threading libraries, arranged on a scale from ease of use / code maintainability to programmer control.] Choice of unified programming to target both Intel® Xeon® and Intel® MIC Architecture!
  • 32. Intel® MPI Support of Hybrid Codes
• Intel® MPI is strong in mapping control: sophisticated defaults or user control
– I_MPI_PIN_PROCESSOR_LIST for pure MPI
– For hybrid codes (takes precedence): I_MPI_PIN_DOMAIN=<size>[:<layout>]
    <size>:   omp = adjust to OMP_NUM_THREADS | auto = #CPUs / #MPI processes | <n> = number of logical CPUs
    <layout>: platform = according to BIOS numbering | compact = close to each other | scatter = far away from each other
• Naturally extends to hybrid codes on MIC
* Although locality issues apply as well, multicore threading runtimes are by far more expressive and richer, and have lower overhead.
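The auto domain size is simple arithmetic, and so is the domain a rank receives if domains are laid out back to back. This is a minimal sketch assuming domains follow plain logical-CPU order; on real systems the BIOS numbering and the compact or scatter layouts can place a domain's CPUs differently, so the function names and the layout are illustrative assumptions, not Intel MPI's implementation.

```c
#include <assert.h>

/* Domain size for I_MPI_PIN_DOMAIN=auto: #CPUs / #MPI processes
 * (integer division; leftover CPUs stay unassigned in this sketch). */
int auto_domain_size(int logical_cpus, int mpi_procs)
{
    assert(mpi_procs > 0 && logical_cpus >= mpi_procs);
    return logical_cpus / mpi_procs;
}

/* Hypothetical helper: first and last logical CPU of rank r's domain,
 * assuming domains are placed back to back in logical-CPU order. */
void domain_range(int rank, int domain_size, int *first, int *last)
{
    assert(rank >= 0 && domain_size > 0);
    *first = rank * domain_size;
    *last = *first + domain_size - 1;
}
```

For example, 32 logical CPUs shared by 4 MPI processes give domains of 8, and rank 2 would own CPUs 16 to 23 under this layout; with <size>=omp the domain size would instead equal OMP_NUM_THREADS.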
  • 33. Intel® MPI Support of Hybrid Codes • Define I_MPI_PIN_DOMAIN to split the logical processors into non-overlapping subsets • Mapping rule: 1 MPI process per domain • Pin the OpenMP threads inside each domain with KMP_AFFINITY (or in the code)
  • 34. Intel® MPI Environment Support • The Intel® MPI execution command mpirun reads argument sets from the command line: sections separated by ":" define argument sets (alternatively, each line in a configuration file specifies a set) • Host, number of processes, and also the environment can be set independently in each argument set: # mpirun -env I_MPI_PIN_DOMAIN 4 -host myXEON ... : -env I_MPI_PIN_DOMAIN 16 -host myMIC • Adapt the important environment variables to the architecture: OMP_NUM_THREADS and KMP_AFFINITY for OpenMP, CILK_NWORKERS for Intel® Cilk™ Plus
  • 35. Coprocessor-Only and Symmetric Support • The full hybrid support of Intel® MPI on Intel® Xeon® extends to Intel® MIC • KMP_AFFINITY=balanced (only on MIC) in addition to scatter and compact • Recommendations: – Explicitly control where MPI processes and threads run in a hybrid application (according to the threading model and the application) – Avoid splitting cores among MPI processes, i.e. I_MPI_PIN_DOMAIN should be a multiple of 4 (the number of hardware threads per MIC core) – Try different KMP_AFFINITY settings for your application
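The "multiple of 4" recommendation can be checked by counting how many cores end up shared between two ranks' domains. This sketch assumes contiguous domains in logical-CPU order and 4 hardware threads per core; count_shared_cores is a hypothetical helper, not part of Intel MPI.

```c
#include <assert.h>

/* Counts physical cores whose hardware threads fall into more than one
 * MPI rank's pin domain, assuming contiguous domains of domain_size
 * logical CPUs and ctx_per_core hardware threads per core. */
int count_shared_cores(int ranks, int domain_size, int ctx_per_core)
{
    assert(ranks > 0 && domain_size > 0 && ctx_per_core > 0);
    int total = ranks * domain_size;                   /* logical CPUs in use */
    int cores = (total + ctx_per_core - 1) / ctx_per_core;
    int shared = 0;
    for (int c = 0; c < cores; c++) {
        int first = c * ctx_per_core;
        int last = first + ctx_per_core - 1;
        if (last > total - 1) last = total - 1;
        /* The core is split if its first and last used CPU belong to
         * different ranks (domains are contiguous). */
        if (first / domain_size != last / domain_size) shared++;
    }
    return shared;
}
```

With 4 ranks, domains of 4 or 8 logical CPUs leave every core to a single rank, while domains of 6 make two cores serve two ranks each.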
  • 36. OS Thread Affinity Mapping
• The Intel® MIC coprocessor has N cores, each with 4 hardware thread contexts, for a total of M=4*N threads
• The OS maps "procs" to the M hardware threads:
    MIC core:       0       | 1       | … | (N-2)                   | (N-1)
    MIC HW thread:  0 1 2 3 | 0 1 2 3 | … | 0 1 2 3                 | 0 1 2 3
    OS "proc":      1 2 3 4 | 5 6 7 8 | … | (M-7) (M-6) (M-5) (M-4) | 0 (M-3) (M-2) (M-1)
• The OS runs on proc 0, which lives on MIC core (N-1)!
• Rule of thumb: avoid OS procs 0, (M-3), (M-2), and (M-1) to avoid contention with the OS
– This leaves less than 2% of the resources unused (1/#cores)
– Especially important when using the offload model, due to data transfer activity!
– But: non-offload applications may slightly benefit from running on core (N-1)
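The core/proc table can be expressed as a small mapping function, which is handy when building explicit proc lists. This is a sketch that encodes only the numbering shown on this slide; the function names are hypothetical, and actual numbering should be confirmed on the card (e.g. from /proc/cpuinfo).

```c
#include <assert.h>

/* Maps (core, hardware thread) on a MIC coprocessor with n_cores cores
 * (4 threads each, M = 4*n_cores) to the OS "proc" number as in the
 * table: procs 1..M-4 cover cores 0..N-2 in order, while the last core
 * holds procs 0, M-3, M-2 and M-1. */
int mic_os_proc(int n_cores, int core, int hw_thread)
{
    assert(core >= 0 && core < n_cores && hw_thread >= 0 && hw_thread < 4);
    int m = 4 * n_cores;
    if (core < n_cores - 1)
        return 4 * core + hw_thread + 1;
    return hw_thread == 0 ? 0 : m - 4 + hw_thread;
}

/* Nonzero if the proc lives on core N-1, the core shared with the OS. */
int mic_proc_on_os_core(int n_cores, int proc)
{
    int m = 4 * n_cores;
    return proc == 0 || proc >= m - 3;
}
```

For a 61-core card (M=244), core 0 thread 0 is proc 1, core 59 thread 3 is proc 240 (= M-4), and core 60 holds procs 0, 241, 242 and 243, so a proc list of 1 to 240 avoids the OS core entirely.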
  • 37. OS Thread Affinity Mapping (contd.)
• The OpenMP library maps its threads to the OS "procs"
• Examples (for non-offload applications that benefit from core N-1):
– KMP_AFFINITY=granularity=thread,compact with M OpenMP threads:
    MIC core:       0       | 1       | … | (N-2)                   | (N-1)
    MIC HW thread:  0 1 2 3 | 0 1 2 3 | … | 0 1 2 3                 | 0 1 2 3
    OS "proc":      1 2 3 4 | 5 6 7 8 | … | (M-7) (M-6) (M-5) (M-4) | 0 (M-3) (M-2) (M-1)
    OpenMP thread:  0 1 2 3 | 4 5 6 7 | … | (M-8) (M-7) (M-6) (M-5) | (M-4) (M-3) (M-2) (M-1)
– KMP_AFFINITY=granularity=thread,balanced with OMP_NUM_THREADS=n, where n=M/2: each core runs two consecutive OpenMP threads (0 and 1 on core 0, 2 and 3 on core 1, …, (n-2) and (n-1) on core (N-1))
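The difference between the two placements can be summarized as "which core does OpenMP thread i land on". This is a simplified model that reproduces the two tables above for n threads on N cores with n >= N; it is an approximation for illustration, not the Intel OpenMP runtime's actual placement algorithm, and the function names are hypothetical.

```c
#include <assert.h>

/* Simplified model of thread placement on a MIC card with 4 hardware
 * threads per core, for n threads on n_cores cores with n >= n_cores.
 * Core numbers are MIC core indices, not OS procs. */

/* compact: fill all 4 hardware threads of a core before moving on. */
int compact_core(int i)
{
    assert(i >= 0);
    return i / 4;
}

/* balanced (approximation): spread threads evenly over all cores while
 * keeping consecutive thread numbers on the same core. */
int balanced_core(int i, int n, int n_cores)
{
    assert(n >= n_cores && i >= 0 && i < n);
    return (int)((long)i * n_cores / n);
}
```

With 60 cores and 120 threads, balanced places threads 0 and 1 on core 0, thread 2 on core 1 and thread 119 on core 59, whereas compact packs thread 119 onto core 29 and leaves the upper half of the cores idle.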
  • 38. MPI+Offload Support • How can the MIC mapping of threads be controlled? How do I prevent the offload of the first MPI process from interfering with the offload of the second MPI process, e.g. by using identical MIC cores/threads? • Default: no special support (at present). Offloads from MPI processes are handled by the system like offloads from independent processes (or users). • Define the thread affinity manually per MPI process (pseudo syntax!): # export OMP_NUM_THREADS=4 # mpirun -env KMP_AFFINITY=[1-4] -n 1 -host myMIC ... : -env KMP_AFFINITY=[5-8] -n 1 -host myMIC ... : ...
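The per-rank proc blocks [1-4], [5-8], … follow a simple pattern that a launch script could generate. This sketch assumes the Intel OpenMP explicit-affinity form granularity=fine,proclist=[…],explicit with the range notation used on the slide; rank_affinity is a hypothetical helper, and the caller must keep ranks*threads at or below M-4 so that no block reaches the OS core.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: give MPI rank `rank` its own block of `threads`
 * consecutive OS procs on the coprocessor, starting at proc 1 so that
 * proc 0 (on the OS core) is skipped, and format it as an explicit
 * KMP_AFFINITY value. Returns snprintf's result (characters needed). */
int rank_affinity(char *buf, size_t len, int rank, int threads)
{
    assert(rank >= 0 && threads > 0);
    int first = 1 + rank * threads;
    int last = first + threads - 1;
    return snprintf(buf, len, "granularity=fine,proclist=[%d-%d],explicit",
                    first, last);
}
```

For OMP_NUM_THREADS=4, rank 0 gets procs 1 to 4 and rank 1 gets procs 5 to 8, matching the two argument sets in the pseudo command line above.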
  • 40. Intel® Trace Analyzer and Collector [Figure: event timelines of two communication profiles compared; blue = computation, red = communication; a chart shows how the MPI processes interact.]
  • 41. Intel® Trace Analyzer and Collector Overview • Intel® Trace Analyzer and Collector helps the developer: • Visualize and understand parallel application behavior • Evaluate profiling statistics and load balancing • Identify communication hotspots • Features • Event-based approach • Low overhead • Excellent scalability • Comparison of multiple profiles • Powerful aggregation and filtering functions • Fail-safe MPI tracing • Provides an API to instrument user code • MPI correctness checking • Idealizer
  • 42. Full ITAC Functionality on MIC
  • 43. ITAC Prerequisites • Upload the ITAC library to the coprocessor manually: # sudo scp /opt/intel/itac/8.1.0.016/mic/slib/libVT.so mic0:/lib64/ • Set the ITAC environment (per user): # source /opt/intel/itac/8.1.0.016/intel64/bin/itacvars.sh impi4 – identical for host and MIC
  • 44. ITAC Usage with Xeon Phi • Run with the -trace flag (no relinking needed) to create a trace file: – MPI+Offload: # mpirun -trace -n 2 ./test – Coprocessor only: # mpirun -trace -n 2 -wdir /tmp -host mic0 /tmp/test_hello.MIC – Symmetric: # mpirun -trace -n 2 -host michost ./test_hello : -wdir /tmp -n 2 -host mic0 /tmp/test_hello.MIC • The -trace flag implicitly preloads libVT.so (which in turn calls libmpi.so to execute the MPI call) • Set VT_LOGFILE_FORMAT=stfsingle to create a single trace file
  • 45. ITAC Usage with Xeon Phi: Compilation Support • Compile and link with the -trace flag (links the libVT library): # mpiicc -trace -o test_hello test.c # mpiicc -trace -mmic -o test_hello.MIC test.c • Compile with the -tcollect flag: # mpiicc -tcollect -o test_hello test.c # mpiicc -tcollect -mmic -o test_hello.MIC test.c – Links the libVT library – Fully instruments your code, i.e. all user functions will be visible in the trace file – Maximal insight, but also maximal overhead • Use the VT API of ITAC to instrument your code manually • Run as a usual Intel® MPI program without the -trace flag: # mpirun ...
  • 46. ITAC Analysis • Start the ITAC analysis GUI with the trace file (or load the file from the GUI): # traceanalyzer test_hello.single.stf • Start the analysis, usually by inspecting the Flat Profile (default chart), the Event Timeline, and the Message Profile • Select "Charts->Event Timeline" • Select "Charts->Message Profile" • Zoom into the Event Timeline: – Click into it, keep the mouse button pressed, move to the right, and release the button – Use the Navigate menu to get back • Right-click "Group MPI" and select "Ungroup MPI".
  • 48. Intel® Xeon Phi™ Coprocessor Becomes a Network Node [Figure: each Intel® Xeon® processor is connected to its Intel® Xeon Phi™ coprocessor through a virtual network connection, repeated across the nodes.] Intel® MIC Architecture + Linux enables IP addressability
  • 49. Load Balancing • Situation • Host and Xeon Phi coprocessor computation performance differ • Host and Xeon Phi coprocessor internal communication speeds differ • MPI in symmetric mode is like running on a heterogeneous cluster • Codes that are load balanced (on a homogeneous cluster) may become imbalanced! • Solution? No general solution! • Approach 1: Adapt the MPI mapping of the (hybrid) code to the performance characteristics: #m processes per host, #n processes per Xeon Phi coprocessor • Approach 2: Change the code-internal mapping of the workload to MPI processes • Example: uneven split of the calculation grid between MPI processes on the host vs. the Xeon Phi coprocessor(s) • Approach 3: ... • Analyze the load balance of the application with ITAC • Ideal Interconnect Simulator
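Approach 2 can be sketched as a weighted split of the grid rows. This assumes a hypothetical per-rank speed weight (for instance derived from an ITAC profile of a first run); split_rows is an illustrative helper, and cumulative rounding guarantees the row counts add up exactly to the total.

```c
#include <assert.h>

/* Hypothetical helper for an uneven grid split: give each rank a share
 * of `rows` proportional to its speed weight, so faster ranks (e.g.
 * host vs. coprocessor) get more rows. Rank r owns the rows between the
 * rounded cumulative boundaries, so the counts always sum to `rows`. */
void split_rows(int rows, int ranks, const int *weight, int *count)
{
    long long total = 0;
    for (int r = 0; r < ranks; r++) total += weight[r];
    assert(rows >= 0 && total > 0);
    long long cum = 0;
    int prev_end = 0;
    for (int r = 0; r < ranks; r++) {
        cum += weight[r];
        int end = (int)(rows * cum / total);  /* exclusive end row of rank r */
        count[r] = end - prev_end;
        prev_end = end;
    }
}
```

With weights {1, 1, 2} and 100 rows the ranks receive 25, 25 and 50 rows; the weights themselves are the tuning knob that the real-world case on the following slides adjusts by trial.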
  • 50. Improving Load Balance: Real-World Case [Figure: ITAC view with data collapsed per node and per Xeon Phi coprocessor.] Host: 16 MPI procs x 1 OpenMP thread. Xeon Phi coprocessor: 8 MPI procs x 28 OpenMP threads. Too high a load on the host = too low a load on the Xeon Phi coprocessor.
  • 51. Improving Load Balance: Real-World Case [Figure: ITAC view with data collapsed per node and per Xeon Phi coprocessor.] Host: 16 MPI procs x 1 OpenMP thread. Xeon Phi coprocessor: 24 MPI procs x 8 OpenMP threads. Too low a load on the host = too high a load on the Xeon Phi coprocessor.
  • 52. Improving Load Balance: Real-World Case [Figure: ITAC view with data collapsed per node and per Xeon Phi coprocessor.] Host: 16 MPI procs x 1 OpenMP thread. Xeon Phi coprocessor: 16 MPI procs x 12 OpenMP threads. Perfect balance: host load = Xeon Phi coprocessor load.
  • 53. Ideal Interconnect Simulator (IIS) • What is the Ideal Interconnect Simulator (IIS)? Using an ITAC trace of an MPI application, it simulates the application under ideal conditions: – Zero network latency – Infinite network bandwidth – Zero MPI buffer copy time – Infinite MPI buffer size • The only limiting factors are concurrency rules, e.g.: – A message cannot be received before it is sent – An all-to-all collective may end only when the last participating process has started it
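The two concurrency rules are enough to compute idealized end times for single operations. This is a simplified model of those rules for illustration, not ITAC's simulator; the function names are hypothetical and times are in arbitrary units.

```c
#include <assert.h>

/* With zero latency, infinite bandwidth and no buffer copies, a receive
 * posted at recv_post completes as soon as the matching send has
 * started, but never before the receive itself was posted. Any time
 * between recv_post and send_start is pure waiting (load imbalance). */
double ideal_recv_end(double send_start, double recv_post)
{
    return send_start > recv_post ? send_start : recv_post;
}

/* A collective can complete only when the last process has entered it;
 * on an ideal interconnect it completes exactly then for every process. */
double ideal_collective_end(const double *entry, int nprocs)
{
    assert(nprocs > 0);
    double last = entry[0];
    for (int p = 1; p < nprocs; p++)
        if (entry[p] > last) last = entry[p];
    return last;
}
```

For a late send at t=5 and a receive posted at t=2, the idealized receive still ends at t=5: the 3 time units of waiting survive on a perfect network and are attributed to load imbalance, while an early send makes the receive complete instantly.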
  • 54. Ideal Interconnect Simulator (Idealizer) [Figure: actual trace vs. idealized trace.]
  • 55. Building Blocks: Elementary Messages [Figure: processes P1 (MPI_Isend) and P2 (MPI_Recv), actual vs. idealized. Early send / late receive: in the idealized trace both MPI_Isend and MPI_Recv have zero duration. Late send / early receive: MPI_Isend has zero duration, while the remaining MPI_Recv wait time is load imbalance.]
  • 56. Building Blocks: Collective Operations [Figure: actual trace (Gigabit Ethernet) vs. simulated trace (ideal interconnect), same timescale in both figures, same MPI_Alltoallv highlighted. Legend: 257 = MPI_Alltoallv, 506 = User_Code.]
  • 57. Application Imbalance Diagram: Total [Figure: total run time split into "calculation" and MPI time, with the MPI time further split into "load imbalance" and "interconnect"; suggested remedies: faster network (interconnect), change parallel decomposition (load imbalance), change algorithm (calculation).]
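The three bars of the diagram can be derived from two runs of the same trace: the actual one and the idealized one. This sketch assumes that interpretation, namely that MPI time disappearing in the idealized replay is charged to the interconnect and MPI time that remains is charged to load imbalance; the struct and function names are hypothetical.

```c
#include <assert.h>

/* Breakdown of total run time into the three categories of the
 * Application Imbalance Diagram, under the interpretation stated above. */
struct imbalance_breakdown {
    double calculation;     /* time outside MPI: change the algorithm */
    double load_imbalance;  /* MPI time left in the idealized trace:
                               change the parallel decomposition */
    double interconnect;    /* MPI time removed by the idealizer:
                               a faster network would help */
};

struct imbalance_breakdown breakdown(double total, double mpi_actual,
                                     double mpi_ideal)
{
    assert(0.0 <= mpi_ideal && mpi_ideal <= mpi_actual && mpi_actual <= total);
    struct imbalance_breakdown b;
    b.calculation = total - mpi_actual;
    b.load_imbalance = mpi_ideal;
    b.interconnect = mpi_actual - mpi_ideal;
    return b;
}
```

For a 100 s run with 40 s in MPI, of which 15 s remain after idealization, the split is 60 s calculation, 15 s load imbalance and 25 s interconnect.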
  • 58. Application Imbalance Diagram: Breakdown [Figure: "load imbalance" and "interconnect" time broken down per MPI function: MPI_Recv, MPI_Allreduce, MPI_Alltoallv.]
  • 60. Debugging Intel® MPI Applications • Use environment variables: – I_MPI_DEBUG to set the debug level – I_MPI_DEBUG_OUTPUT to specify a file for output redirection; use format strings such as %r, %p, or %h to add the rank, PID, or host name to the file name, respectively • Usage: # export I_MPI_DEBUG=<debug level> or: # mpirun -env I_MPI_DEBUG <debug level> -n <# of processes> ./a.out • Processor information utility in Intel® MPI: # cpuinfo – aggregates /proc/cpuinfo information
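The effect of the %r, %p and %h format strings is easiest to see by expanding a file-name pattern by hand. This is an illustration of the substitution described on the slide, not Intel MPI's code; expand_debug_name is hypothetical, and unknown % sequences are simply copied through.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Expands %r (rank), %p (process ID) and %h (host name) in an
 * I_MPI_DEBUG_OUTPUT-style file name pattern into `out`, truncating
 * safely if `out` is too small. Other characters are copied as-is. */
void expand_debug_name(const char *pattern, int rank, long pid,
                       const char *host, char *out, size_t len)
{
    size_t n = 0;
    assert(len > 0);
    out[0] = '\0';
    for (const char *p = pattern; *p != '\0' && n + 1 < len; p++) {
        int w;
        if (p[0] == '%' && p[1] == 'r')      w = snprintf(out + n, len - n, "%d", rank);
        else if (p[0] == '%' && p[1] == 'p') w = snprintf(out + n, len - n, "%ld", pid);
        else if (p[0] == '%' && p[1] == 'h') w = snprintf(out + n, len - n, "%s", host);
        else { out[n++] = *p; out[n] = '\0'; continue; }
        p++;                            /* skip the format letter */
        n += (size_t)w;
        if (n >= len) n = len - 1;      /* output was truncated */
    }
}
```

With the pattern dbg_%r_%h.%p.log, rank 3 on host node01 with PID 4242 would write to dbg_3_node01.4242.log, so every rank gets its own debug file.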
  • 61. GDB* on Intel® Xeon Phi™ Coprocessor • GDB* supports the Intel® Xeon Phi™ coprocessor • Intel upstreams features and capabilities to the GNU* community • Broad enabling of developers and the software tools ecosystem • Available from Intel at http://software.intel.com 8/19/2013
  • 62. The GNU* Project Debugger and Intel® Xeon Phi™ Coprocessor • Native and cross-debugger versions of GDB* exist for the Intel® Xeon Phi™ coprocessor • They are part of the Intel® Manycore Platform Software Stack (Intel® MPSS): http://software.intel.com/en-us/articles/intel-manycore-platform-software-stack-mpss • You can debug with it as either root or a regular user
  • 63. Native Debugging on the Intel® Xeon Phi™ Coprocessor with GDB*
• Run GDB* on the Intel® Xeon Phi™ coprocessor:
    ssh -t mic0 /usr/bin/gdb
– To attach to a running application via its process ID:
    (gdb) shell pidof my_application
    42
    (gdb) attach 42
– To run an application directly from GDB*:
    (gdb) file /target/path/to/application
    (gdb) start
  • 64. Remote Debugging with GDB* for the Intel® Xeon Phi™ Coprocessor
• Run GDB* on your local host:
    /usr/linux-k1om-4.7/bin/x86_64-k1om-linux-gdb
• Start gdbserver on the Intel® Xeon Phi™ coprocessor:
– To remote debug using ssh:
    (gdb) target extended-remote | ssh -T mic0 gdbserver --multi IP:port
– To remote debug using stdio:
    (gdb) target extended-remote | ssh -T mic0 gdbserver --multi -
• To attach to a running application via its process ID (pid):
    (gdb) file /local/path/to/application
    (gdb) attach <remote-pid>
• To run an application directly from GDB*:
    (gdb) file /local/path/to/application
    (gdb) set remote exec-file /target/path/to/application
  • 65. Explore Intel® Xeon Phi™ Coprocessor Architecture Features
• List all new vector and mask registers:
    (gdb) info registers zmm
    k0    0x0    0
    ⁞
    zmm31 {v16_float = {0x0 <repeats 16 times>}, v8_double = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v64_int8 = {0x0 <repeats 64 times>}, v32_int16 = {0x0 <repeats 32 times>}, v16_int32 = {0x0 <repeats 16 times>}, v8_int64 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_uint128 = {0x0, 0x0, 0x0, 0x0}}
• Disassemble instructions:
    (gdb) disassemble $pc, +10
    Dump of assembler code from 0x11 to 0x24:
    0x0000000000000011 <foobar+17>: vpackstorelps %zmm0,-0x10(%rbp){%k1}
    0x0000000000000018 <foobar+24>: vbroadcastss -0x10(%rbp),%zmm0
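The many fields gdb prints for a single zmm register are just different views of the same 64 bytes. This sketch models them as a C union; the type names are hypothetical, and because standard C has no 128-bit integer type, gdb's v4_uint128 view is modeled as four pairs of 64-bit halves. The opmask model assumes 16-bit k registers, one bit per 32-bit lane.

```c
#include <assert.h>
#include <stdint.h>

/* One 512-bit zmm register seen through the same views gdb prints. */
typedef union {
    float    v16_float[16];
    double   v8_double[8];
    int8_t   v64_int8[64];
    int16_t  v32_int16[32];
    int32_t  v16_int32[16];
    int64_t  v8_int64[8];
    uint64_t v4_uint128[4][2];   /* 4 x 128 bits as pairs of 64-bit halves */
} zmm_view;

/* Opmask registers k0-k7: one bit per 32-bit vector lane. */
typedef uint16_t kmask_view;
```

Writing v16_int32[0] and reading v8_int64[0] touches the same bytes, which is why gdb can show all views of one register at once; the {%k1} suffix in the disassembly selects lanes through such a 16-bit mask.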
  • 67. Intel® Cluster Checker 2.0 with Intel® Xeon Phi™ Coprocessor Support • The new micinfo test module checks that coprocessor information is correct and uniform across nodes. Any error, undefined value, or abnormal difference among coprocessors is reported when it may impact cluster productivity. • The new miccheck test module checks the sanity of the coprocessor cards by running the miccheck diagnostic tool on every node in parallel. • To run a benchmark that offloads work to a coprocessor: $ OFFLOAD_REPORT=2 MKL_MIC_ENABLE=1 clck -I micinfo -I miccheck -I dgemm • More information: http://software.intel.com/en-us/articles/using-intel-cluster-checker-20-to-check-intel-xeon-phi-support
  • 68. Intel® Cluster Checker 2.0: Faster Execution • Execution time is reduced 2x vs. v1.8; a 256-node certification takes nearly 30 minutes [Figure: execution time in seconds (0 to 1600) vs. node quantity (8, 16, 32, 64, 128, 256).] Results have been estimated based on internal Intel analysis and are provided for informational purposes only. Any difference in system hardware or software design or configuration may affect actual performance.
  • 69. Summary The ease of use of Intel® MPI and related tools like the Intel Trace Analyzer and Collector extends from the Intel Xeon architecture to the Intel MIC architecture. "Everything must be made as simple as possible. But not simpler." ― Albert Einstein