Software & Services Group, Developer Products Division
Copyright© 2010, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
Intel® Parallel Studio 2011
Essential Performance
Design | Build & Debug | Verify | Tune
Kirill Mavrodiev
kirill.mavrodiev@intel.com
EMEA Compiler TCE (Software Engineer)
SSG DPD ICL
AGENDA
•Intel® Parallel Studio 2011
•Intel® Parallel Composer
–Intel® Cilk Plus Keywords
–Array Notation
–Guided Auto-Parallelization (GAP)
–Intel® Parallel Debugger Extension
Intel® Parallel Studio 2011 Introduction
Three Product Lines for Diverse Needs
• Essential Performance: C/C++ developers; Microsoft Visual Studio*; take advantage of multicore
• Advanced Performance: C++ and Fortran developers; Windows* and Linux*; high performance, cross platform apps
• Distributed Performance: C++ and Fortran developers on Windows* and Linux*; high performance MPI clusters
intel.com/software/products
What’s New in Intel® Parallel Studio 2011
Intel® Parallel Building Blocks: comprehensive, portable, reliable, future-proof parallel models for both data and task parallelism
• Intel® Threading Building Blocks
• Intel® Cilk Plus
• Intel® Array Building Blocks (Beta)
Intel® Parallel Advisor: parallelism design guide
• Demystifies and speeds parallel application design
• Gives parallelism design insight and analysis through Explorer and Modeler analysis tools
Now includes Intel® Premier Support
• Unlimited technical support and upgrades for one year
Enhancements
• Intel® Threading Building Blocks 3.0
• Compiler improvements
• Microsoft Visual Studio* 2010 integration
Intel® Parallel Studio 2011
• All-in-one toolset for the software development lifecycle
• Microsoft Visual Studio* plug-in
– 2005, 2008 and 2010
Intel® Parallel Advisor
Step by Step Guidance
Focus on the hot call trees and loops as locations to experiment with parallelism.
Insert Advisor annotations into the source code to describe each parallel experiment.
Evaluate each parallel experiment by viewing the performance projection for each parallel site and how each site's performance impacts the entire program.
Identify the data issues (races) of each parallel experiment.
Intel® Parallel Composer
BUILD & DEBUG PHASE
Develop high performance applications with an optimized C/C++ compiler and comprehensive threaded libraries
Intel® Integrated Performance Primitives
Extensive library of highly optimized software functions for digital media and data-processing applications
Improve Performance
Easier, faster performance for Windows* apps
Intel® Parallel Building Blocks
Comprehensive set of parallel development models that support multiple approaches to parallelism.
Intel’s Family of Parallel Programming Models
• Intel® Parallel Building Blocks (PBB): Intel® Cilk Plus, Intel® Threading Building Blocks (TBB), Intel® Array Building Blocks (ArBB)
• Fixed Function Libraries: Intel® Math Kernel Library (MKL), Intel® Integrated Performance Primitives (IPP)
• Established Standards: MPI, OpenMP*, OpenCL*
• Research and Exploration: Intel® Concurrent Collections
Intel® Cilk Plus, Intel® TBB: Part of Intel® Parallel Studio 2011
Intel® Array Building Blocks: Known by code names "Intel Ct" or "Intel Firetown"; public beta started around mid-September 2010
Intel® Parallel Building Blocks
Details
Intel® Parallel Inspector
VERIFY PHASE
Identify memory issues in serial and parallel applications in addition to threading errors.
Find Memory Errors
Find a wide variety of memory errors
Find Threading Errors
Find data races and deadlocks
Improve Reliability
Ensure application reliability with proactive memory and threading error checking
Intel® Parallel Amplifier
TUNE PHASE
Hotspot Analysis
Where does my program spend most of the time?
Concurrency Analysis
Where and why doesn't my program utilize all available cores?
Locks & Wait Analysis
Where and why does my program wait?
Optimize serial and parallel application performance with three easy-to-use, powerful analysis methods
Intel® Parallel Studio 2011
Summary
• For over a year, Intel® Parallel Studio has made it easier for Windows* developers to create fast, reliable applications for multicore
• This release is a major update
– Intel® Parallel Building Blocks adds significant new parallelism models
– Intel® Parallel Advisor empowers software architects with parallelism design insight and analysis for building reliable, high performance applications for multicore
– Other enhancements, including support for Visual Studio* 2010
Try it right now: www.intel.com/go/parallel
Intel® Parallel Composer
Intel® Cilk™ Plus
Language extensions to simplify task & data parallelism
• C++ language extension that provides three simple keywords to write parallel code
– Loop-type data parallelism using cilk_for
– General parallelism using cilk_spawn and cilk_sync
• Unambiguous semantics and a strict fork–join model, enforced through compiler support
• The parallel control flow is easy for the programmer to understand
• Automatic load balancing via work stealing
• Low-overhead task spawning encourages programmers to create many small tasks
• A program with many small tasks gives the task scheduler the opportunity both to load balance and to scale forward to larger core counts
Intel® Cilk™ Plus
Intel® Cilk Plus adds the following new keywords:
• _Cilk_spawn spawns a function call that may execute asynchronously,
• _Cilk_sync marks a synchronization point that waits for all children spawned inside that function,
• _Cilk_for marks a for-loop whose iterations may execute in parallel.
Including cilk/cilk.h provides the shorter spellings cilk_spawn, cilk_sync and cilk_for.
Cilk Plus includes reducers for lock-free access to global data:
• Use built-in reducers for common types: strings, summation, min/max, logical operations, and more.
• Write custom reducers to manage any data type.
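The reducer idea can be illustrated without a Cilk Plus compiler. The sketch below is plain standard C++ and is not the Cilk Plus reducer API: the function name parallel_sum and the fixed split into chunks are assumptions made for illustration. Each worker accumulates into its own private view, and the views are merged only after all workers have joined, so no lock is needed on the shared total.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical helper: sums data using one private "view" per worker thread.
long long parallel_sum(const std::vector<int>& data, unsigned workers) {
    if (workers == 0) workers = 1;                 // fall back to a single worker
    std::vector<long long> views(workers, 0);      // one private accumulator per worker
    std::vector<std::thread> pool;
    const std::size_t chunk = (data.size() + workers - 1) / workers;

    for (unsigned w = 0; w < workers; ++w) {
        pool.emplace_back([&, w] {
            const std::size_t begin = std::min(data.size(), w * chunk);
            const std::size_t end = std::min(data.size(), begin + chunk);
            assert(begin <= end);
            long long local = 0;
            for (std::size_t i = begin; i < end; ++i) local += data[i];
            views[w] = local;                      // each worker writes only its own slot
        });
    }
    for (auto& t : pool) t.join();                 // plays the role of the sync point

    // Merge step: combine the private views into the final value.
    return std::accumulate(views.begin(), views.end(), 0LL);
}
```

A real reducer differs in that views are created lazily by the work-stealing runtime rather than one per thread up front; the merge-after-join structure is the part this sketch is meant to show.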
Simple Divide and Conquer Example
#include <stdio.h>
#include <cilk/cilk.h>

int fib(int n) {
    int x, y;
    if (n < 2) return n;
    x = cilk_spawn fib(n-1);   // Allow fib(n-1) to run in parallel with fib(n-2)
    y = fib(n-2);
    cilk_sync;                 // Ensure that all parallel work is complete before using the result
    return x + y;
}

int main() {
    printf("Fib of 40 is %d\n", fib(40));
    return 0;
}
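For readers without a Cilk Plus-capable compiler, the same spawn/sync shape can be approximated in standard C++ with std::async. This is only a sketch under stated assumptions: std::async with std::launch::async starts a thread per call instead of using a work-stealing scheduler, so a depth cutoff (the hypothetical depth parameter below) is added to keep the number of threads small. The names fib_serial and fib_parallel are illustrative.

```cpp
#include <cassert>
#include <future>

// Serial reference version, also used below the cutoff.
int fib_serial(int n) {
    assert(n >= 0);
    return n < 2 ? n : fib_serial(n - 1) + fib_serial(n - 2);
}

// Fork-join version. 'depth' is a hypothetical cutoff: only the top 'depth'
// levels of the recursion fork a new task.
int fib_parallel(int n, int depth) {
    if (n < 2) return n;
    if (depth <= 0) return fib_serial(n);

    // Roughly corresponds to: x = cilk_spawn fib(n-1);
    std::future<int> x = std::async(std::launch::async, fib_parallel, n - 1, depth - 1);
    // The continuation runs in the current thread, as in the Cilk version.
    int y = fib_parallel(n - 2, depth - 1);
    // Roughly corresponds to cilk_sync: wait for the child before using x.
    return x.get() + y;
}
```

The cutoff is the main design difference: Cilk Plus spawns are cheap enough that no cutoff is written in the slide's code, whereas one thread per spawn would not scale.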
Intel® Cilk Plus Tachyon Implementation
Array Notations for data parallelism
• New array extension to the C/C++ language
• Specify parallel operations on arrays (instead of sequential loops)
• Predictable performance based on mapping parallel constructs to the underlying multi-threading/SIMD hardware
• Works seamlessly with existing C/C++ frameworks and runtimes: Intel® TBB, OpenMP*, MPI, Intel® Cilk™ Plus, Pthreads, etc.
Array Section Notation
• Syntax: <array base> [<lower_bound> : <length> : <stride>]*
– ":<stride>" is optional (defaults to stride=1)
– A missing ":<length>:<stride>" implies length=1
– A simple ":" selects all elements of this dimension
– Note the syntax difference from a Fortran section, which is lower_bound : upper_bound [: stride]
• Samples:
A[:]          // All elements of vector A
B[2:6]        // Elements 2 to 7 of vector B
C[:][5]       // Column 5 of matrix C
D[0:3:2]      // Elements 0, 2, 4 of vector D
E[0:3][0:4]   // 12 elements from E[0][0] to E[2][3]
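Because the second field is a length rather than an upper bound, it can help to expand a section by hand. The following standard C++ sketch is not part of Cilk Plus; section_indices and fortran_length are hypothetical helper names. It lists the indices a one-dimensional section selects and converts a Fortran-style lower:upper:stride triple into the equivalent length.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: indices selected by the section [lower : length : stride].
std::vector<std::size_t> section_indices(std::size_t lower, std::size_t length,
                                         std::size_t stride = 1) {
    assert(stride > 0);
    std::vector<std::size_t> idx;
    idx.reserve(length);
    for (std::size_t n = 0; n < length; ++n) {
        idx.push_back(lower + n * stride);   // n-th selected element
    }
    return idx;
}

// Hypothetical helper: length of the Fortran-style section lower:upper:stride,
// i.e. the value to write in the second field of the C/C++ notation.
std::size_t fortran_length(std::size_t lower, std::size_t upper, std::size_t stride = 1) {
    assert(stride > 0);
    return upper < lower ? 0 : (upper - lower) / stride + 1;
}
```

For example, section_indices(2, 6) yields 2 through 7, matching B[2:6], and a Fortran section 2:7 has fortran_length(2, 7) equal to 6.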
Intel® Cilk Plus implementation
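// tmp, rhs, x, ID() and STRIDE are defined elsewhere in the application
cilk_for (int k = 0; k < nz; k++) {
    for (int j = 0; j < ny; j++)
        for (int i = 0; i < nx; i += STRIDE) {
            // Residual for STRIDE consecutive points, written with array sections
            tmp[:] = rhs[ID(i,j,k) : STRIDE] + x[ID(i,j,k) : STRIDE] * 6.0 -
                     (x[ID(i-1,j,k) : STRIDE] + x[ID(i+1,j,k) : STRIDE] +
                      x[ID(i,j-1,k) : STRIDE] + x[ID(i,j+1,k) : STRIDE] +
                      x[ID(i,j,k-1) : STRIDE] + x[ID(i,j,k+1) : STRIDE]);
            // Maximum norm of the residual, accumulated in a max reducer
            residueConvergeStrongCReducer = cilk::max_of(
                residueConvergeStrongCReducer, __sec_reduce_max(fabs(tmp[:])));
            // Sum of squares of the residual, accumulated in an add reducer
            residueConvergeStrongL2Reducer += __sec_reduce_add(tmp[:] * tmp[:]);
        }
}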
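To make the array-section code easier to follow, here is a serial, scalar sketch of the same residual computation in standard C++. It is an illustration under assumptions, not the application's code: the row-major index helper id stands in for the slide's ID macro, only interior points are visited (the slide does not show boundary handling), and the two reducers become two plain fields of a result struct.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Residual {
    double max_abs;  // what the max reducer accumulates
    double sum_sq;   // what the add reducer accumulates
};

// Hypothetical stand-in for ID(i,j,k): row-major layout with i varying fastest.
inline std::size_t id(std::size_t i, std::size_t j, std::size_t k,
                      std::size_t nx, std::size_t ny) {
    return (k * ny + j) * nx + i;
}

// Serial 7-point stencil residual over the interior of an nx*ny*nz grid.
Residual residual_norms(const std::vector<double>& x, const std::vector<double>& rhs,
                        std::size_t nx, std::size_t ny, std::size_t nz) {
    assert(x.size() == nx * ny * nz && rhs.size() == nx * ny * nz);
    Residual r{0.0, 0.0};
    for (std::size_t k = 1; k + 1 < nz; ++k)
        for (std::size_t j = 1; j + 1 < ny; ++j)
            for (std::size_t i = 1; i + 1 < nx; ++i) {
                const double tmp =
                    rhs[id(i, j, k, nx, ny)] + x[id(i, j, k, nx, ny)] * 6.0 -
                    (x[id(i - 1, j, k, nx, ny)] + x[id(i + 1, j, k, nx, ny)] +
                     x[id(i, j - 1, k, nx, ny)] + x[id(i, j + 1, k, nx, ny)] +
                     x[id(i, j, k - 1, nx, ny)] + x[id(i, j, k + 1, nx, ny)]);
                r.max_abs = std::max(r.max_abs, std::fabs(tmp));
                r.sum_sq += tmp * tmp;
            }
    return r;
}
```

In the Cilk Plus version the innermost loop body handles STRIDE points at once through array sections, and the outer k loop is split across workers by cilk_for.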
Intel® Cilk Plus implementation
GAP: Guided Auto-parallelization
• Targeted for Mainstream and HPC Users
• Advice to change code for more auto-vectorization, auto-parallelization and data transformations
• Diagnostic guidance generated when invoked
• Advice may involve
– suggestions for source-change
– adding pragmas
– adding new options
• Simple source changes that assert new properties
– Add a new pragma for loop if semantics are satisfied
– Use a local-variable for the upper-bound of a loop
– Initialize scalar variable unconditionally at top of loop
– Reorder fields of a structure (or split into two)
• Desired behavior
– Each advice is specific using source-level variable names
– User does semantic analysis – apply or reject each advice
– Advice should be as localized as possible
– Following the advice should result in better optimizations
GAP – How it Works
Selection of the most relevant switches
Multiple compiler switches activate and fine-tune the guidance analysis
• Activate messages individually for vectorization, parallelization, data transformations, or all three
-guide-vec[=level]
-guide-par[=level]
-guide-data-trans[=level]
-guide[=level]
The optional argument level=1,2,3,4 controls the extent of the analysis; Intel Composer only supports up to level 3
• Control the part of the source code for which analysis is done
-guide-opts=<arg>
Samples:
-guide-opts="convert.c,'funca(int)'"
-guide-opts="bar.f90,'module_1::func_solve'"
• Control where the messages go
-guide-file=<file_name> -guide-file-append=<file_name>
GAP Case Study
#include <math.h>

extern int num_nodes;

typedef struct TEST_STRUCT {
    // Coordinates of city1
    float latitude1;
    float longitude1;
    int city_id1;
    int stops[10000]; // Currently unused field
    // Coordinates of city2
    float latitude2;
    float longitude2;
    int city_id2;
} test_struct;

extern float *distances;
extern test_struct **nodes;

void process_nodes(void)
{
    float const R = 3964.0;
    float temp, lat1, lat2, long1, long2, result;
    int temp1 = num_nodes;
    // Commented out: the loop form that follows the GAP advice
    //#pragma loop count min(16)
    //#pragma parallel
    // for (int k=0; k < temp1; k++) {
    for (int k=0; k < num_nodes; k++) {
        lat1 = nodes[k]->latitude1;
        lat2 = nodes[k]->latitude2;
        long1 = nodes[k]->longitude1;
        long2 = nodes[k]->longitude2;
        // Compute the distance between the two cities
        temp = sin(lat1) * sin(lat2) +
               cos(lat1) * cos(lat2) * cos(long1-long2);
        result = 2.0 * R * atan(sqrt((1.0-temp)/(1.0+temp)));
        // Store the distance computed in the distances array
        distances[k] = result;
    }
}
[c:/test2/usability2] icl -c distance.cpp -Qguide=4 -Qparallel
GAP REPORT LOG OPENED ON Wed Mar 03 18:34:01 2010
c:\test2\usability2\distance.h(2): remark #30755: (DTRANS) Reordering the fields of the structure 'TEST_STRUCT' will improve data locality. Suggested field order: 'stops, latitude1, longitude1, latitude2, longitude2, city_id1, city_id2'. [VERIFY] The suggestion is based on the field references in current compilation. Please make sure that the restructured code satisfies the original program semantics.
c:\test2\gap_examples\usability2\distance.cpp(30): remark #30534: (LOOP) Add -Qansi-alias option for better type-based disambiguation analysis by the compiler if appropriate (option will apply for entire compilation). This will improve optimizations for the loop at line 30 [VERIFY] Make sure that the semantics of this option is obeyed for entire compilation.
c:\test2\usability2\distance.cpp(29): remark #30519: (PAR) Use "#pragma parallel" to parallelize the loop at line 29, if these arrays in the loop do not have cross-iteration dependencies: nodes, distances. [VERIFY] A cross-iteration dependency exists if a memory location is modified in an iteration of the loop and accessed (a read or a write) in another iteration of the loop. Make sure that there are no such dependencies.
c:\test2\gap_examples\usability2\distance.cpp(29): remark #30525: (PAR) If the trip count of the loop at line 29 is greater than 16, then use "#pragma loop count min(16)" to parallelize this loop. [VERIFY] Make sure that the loop has a minimum of 16 iterations.
c:\test2\gap_examples\usability2\distance.cpp(48): remark #30525: (PAR) If the trip count of the loop at line 48 is greater than 751, then use "#pragma loop count min(751)" to parallelize this loop. [VERIFY] Make sure that the loop has a minimum of 751 iterations.
END OF GAP REPORT LOG
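One way the source-level advice from this report might be applied is sketched below in standard C++. It is an illustration, not the presenter's final code: the struct name TestStruct, the vector-based signature and the const pointers are assumptions made to keep the example self-contained. The fields follow the order suggested by remark #30755, the loop bound is held in a local variable, and the Intel-specific pragmas are left as comments because they only apply once the [VERIFY] conditions have been checked by the user.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Field order suggested by remark #30755: the fields read in the loop sit
// next to each other instead of being separated by the large 'stops' array.
struct TestStruct {
    int stops[10000];   // currently unused field
    float latitude1;
    float longitude1;
    float latitude2;
    float longitude2;
    int city_id1;
    int city_id2;
};

// Hypothetical signature: containers replace the slide's extern globals.
void process_nodes(const std::vector<const TestStruct*>& nodes,
                   std::vector<float>& distances) {
    const float R = 3964.0f;
    const std::size_t count = nodes.size();   // local upper bound, as GAP suggests
    distances.resize(count);

    // With the Intel compiler, "#pragma loop count min(16)" and "#pragma parallel"
    // could be placed here after verifying that the loop has at least 16 iterations
    // and that nodes and distances have no cross-iteration dependencies.
    for (std::size_t k = 0; k < count; ++k) {
        assert(nodes[k] != nullptr);
        const float lat1 = nodes[k]->latitude1;
        const float lat2 = nodes[k]->latitude2;
        const float long1 = nodes[k]->longitude1;
        const float long2 = nodes[k]->longitude2;
        // Compute the distance between the two cities (same formula as the slide)
        const float temp = std::sin(lat1) * std::sin(lat2) +
                           std::cos(lat1) * std::cos(lat2) * std::cos(long1 - long2);
        distances[k] = 2.0f * R * std::atan(std::sqrt((1.0f - temp) / (1.0f + temp)));
    }
}
```

Declaring the temporaries inside the loop body also removes the loop-carried scalars of the original, which is in the spirit of the advice to initialize scalars unconditionally at the top of the loop.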
Intel® Debugger & Intel® Parallel Debugger Extension
• Linux*: Intel® Debugger (IDB), in Intel® C++ Composer XE, Intel® Fortran Composer XE, and Intel® Cluster Toolkit Compiler Edition
• Mac OS*: Intel® Debugger (IDB), in Intel® C++ Composer XE and Intel® Fortran Composer XE
• Windows*: Intel® Parallel Debugger Extension, in Intel® C++ Composer XE, Intel® Visual Fortran Composer XE, and Intel® Parallel Composer
Key Features
• Thread Shared Data Event Detection
– Break on Thread Shared Data Access (read/write)
• Re-entrant Function Detection
• SIMD SSE Registers Window
• Enhanced OpenMP* Support
– Serialize OpenMP threaded application execution on the fly
– Insight into thread groups, barriers, locks, wait lists etc.
Questions?
Legal Disclaimer
INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS”. NO LICENSE, EXPRESS OR
IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS
IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER
AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS
INFORMATION INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR
A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT,
COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.
Performance tests and ratings are measured using specific computer systems and/or
components and reflect the approximate performance of Intel products as measured
by those tests. Any difference in system hardware or software design or
configuration may affect actual performance. Buyers should consult other sources of
information to evaluate the performance of systems or components they are
considering purchasing. For more information on performance tests and on the
performance of Intel products, reference www.intel.com/software/products.
Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and other
countries.
*Other names and brands may be claimed as the property of others.
Copyright © 2010. Intel Corporation.
http://www.intel.com/software/products
 
Intel NFVi Enabling Kit Demo/Lab
Intel NFVi Enabling Kit Demo/LabIntel NFVi Enabling Kit Demo/Lab
Intel NFVi Enabling Kit Demo/LabMichelle Holley
 
How to create a high quality, fast texture compressor using ISPC
How to create a high quality, fast texture compressor using ISPC How to create a high quality, fast texture compressor using ISPC
How to create a high quality, fast texture compressor using ISPC Gael Hofemeier
 
Intel® Trace Analyzer e Collector (ITAC) - Intel Software Conference 2013
Intel® Trace Analyzer e Collector (ITAC) - Intel Software Conference 2013Intel® Trace Analyzer e Collector (ITAC) - Intel Software Conference 2013
Intel® Trace Analyzer e Collector (ITAC) - Intel Software Conference 2013Intel Software Brasil
 

Ähnlich wie Кирилл Мавродиев, Intel – Обзор современных возможностей по распараллеливанию и векторизации приложений с использованием Parallels Composer (20)

Developing Multi-OS Native Mobile Applications with Intel INDE
Developing Multi-OS Native Mobile Applications with Intel INDEDeveloping Multi-OS Native Mobile Applications with Intel INDE
Developing Multi-OS Native Mobile Applications with Intel INDE
 
Getting the maximum performance in distributed clusters Intel Cluster Studio XE
Getting the maximum performance in distributed clusters Intel Cluster Studio XEGetting the maximum performance in distributed clusters Intel Cluster Studio XE
Getting the maximum performance in distributed clusters Intel Cluster Studio XE
 
Intel® XDK Разработка мобильных HTML5 приложений. Максим Хухро, Intel
Intel® XDK Разработка мобильных HTML5 приложений. Максим Хухро, Intel Intel® XDK Разработка мобильных HTML5 приложений. Максим Хухро, Intel
Intel® XDK Разработка мобильных HTML5 приложений. Максим Хухро, Intel
 
Multi-OS Engine Technology Overview
Multi-OS Engine Technology OverviewMulti-OS Engine Technology Overview
Multi-OS Engine Technology Overview
 
Intel XDK - Philly JS
Intel XDK - Philly JSIntel XDK - Philly JS
Intel XDK - Philly JS
 
Intel Ultrabook Software Development Tools - Intel AppLab Berlin
Intel Ultrabook Software Development Tools - Intel AppLab BerlinIntel Ultrabook Software Development Tools - Intel AppLab Berlin
Intel Ultrabook Software Development Tools - Intel AppLab Berlin
 
Accelerate Your Python* Code through Profiling, Tuning, and Compilation Part ...
Accelerate Your Python* Code through Profiling, Tuning, and Compilation Part ...Accelerate Your Python* Code through Profiling, Tuning, and Compilation Part ...
Accelerate Your Python* Code through Profiling, Tuning, and Compilation Part ...
 
Intel tools to optimize HPC systems
Intel tools to optimize HPC systemsIntel tools to optimize HPC systems
Intel tools to optimize HPC systems
 
MeeGo Overview DeveloperDay Munich
MeeGo Overview DeveloperDay MunichMeeGo Overview DeveloperDay Munich
MeeGo Overview DeveloperDay Munich
 
Intel® VTune™ Amplifier - Intel Software Conference 2013
Intel® VTune™ Amplifier - Intel Software Conference 2013Intel® VTune™ Amplifier - Intel Software Conference 2013
Intel® VTune™ Amplifier - Intel Software Conference 2013
 
Using JavaScript to Build HTML5 Tools (Ian Maffett)
Using JavaScript to Build HTML5 Tools (Ian Maffett)Using JavaScript to Build HTML5 Tools (Ian Maffett)
Using JavaScript to Build HTML5 Tools (Ian Maffett)
 
Across the Silicon Spectrum: Xeon Phi to Quark – Unleash the Performance in Y...
Across the Silicon Spectrum: Xeon Phi to Quark – Unleash the Performance in Y...Across the Silicon Spectrum: Xeon Phi to Quark – Unleash the Performance in Y...
Across the Silicon Spectrum: Xeon Phi to Quark – Unleash the Performance in Y...
 
Ready access to high performance Python with Intel Distribution for Python 2018
Ready access to high performance Python with Intel Distribution for Python 2018Ready access to high performance Python with Intel Distribution for Python 2018
Ready access to high performance Python with Intel Distribution for Python 2018
 
Explore, design and implement threading parallelism with Intel® Advisor XE
Explore, design and implement threading parallelism with Intel® Advisor XEExplore, design and implement threading parallelism with Intel® Advisor XE
Explore, design and implement threading parallelism with Intel® Advisor XE
 
Introduction ciot workshop premeetup
Introduction ciot workshop premeetupIntroduction ciot workshop premeetup
Introduction ciot workshop premeetup
 
Build HTML5 VR Apps using Intel® XDK
Build HTML5 VR Apps using Intel® XDKBuild HTML5 VR Apps using Intel® XDK
Build HTML5 VR Apps using Intel® XDK
 
【視覺進化論】AI智慧視覺運算技術論壇_2_ChungYeh
【視覺進化論】AI智慧視覺運算技術論壇_2_ChungYeh【視覺進化論】AI智慧視覺運算技術論壇_2_ChungYeh
【視覺進化論】AI智慧視覺運算技術論壇_2_ChungYeh
 
Intel NFVi Enabling Kit Demo/Lab
Intel NFVi Enabling Kit Demo/LabIntel NFVi Enabling Kit Demo/Lab
Intel NFVi Enabling Kit Demo/Lab
 
How to create a high quality, fast texture compressor using ISPC
How to create a high quality, fast texture compressor using ISPC How to create a high quality, fast texture compressor using ISPC
How to create a high quality, fast texture compressor using ISPC
 
Intel® Trace Analyzer e Collector (ITAC) - Intel Software Conference 2013
Intel® Trace Analyzer e Collector (ITAC) - Intel Software Conference 2013Intel® Trace Analyzer e Collector (ITAC) - Intel Software Conference 2013
Intel® Trace Analyzer e Collector (ITAC) - Intel Software Conference 2013
 

Mehr von Media Gorod

Iidf market watch_2013
Iidf market watch_2013Iidf market watch_2013
Iidf market watch_2013Media Gorod
 
E travel 2013 ufs-f
E travel 2013 ufs-fE travel 2013 ufs-f
E travel 2013 ufs-fMedia Gorod
 
Travel shop 2013
Travel shop 2013Travel shop 2013
Travel shop 2013Media Gorod
 
Kozyakov pay u_e-travel2013
Kozyakov pay u_e-travel2013Kozyakov pay u_e-travel2013
Kozyakov pay u_e-travel2013Media Gorod
 
13909772985295c7a772abc7.11863824
13909772985295c7a772abc7.1186382413909772985295c7a772abc7.11863824
13909772985295c7a772abc7.11863824Media Gorod
 
As e-travel 2013
As   e-travel 2013As   e-travel 2013
As e-travel 2013Media Gorod
 
Ishounkina internet research-projects
Ishounkina internet research-projectsIshounkina internet research-projects
Ishounkina internet research-projectsMedia Gorod
 
Orlova pay u group_290813_
Orlova pay u group_290813_Orlova pay u group_290813_
Orlova pay u group_290813_Media Gorod
 
Ep presentation (infographic 2013)
Ep presentation (infographic 2013)Ep presentation (infographic 2013)
Ep presentation (infographic 2013)Media Gorod
 
Iway slides e-travel_2013-11_ready
Iway slides e-travel_2013-11_readyIway slides e-travel_2013-11_ready
Iway slides e-travel_2013-11_readyMedia Gorod
 
Data insight e-travel2013
Data insight e-travel2013Data insight e-travel2013
Data insight e-travel2013Media Gorod
 
Электронное Правительство как Продукт
Электронное Правительство как ПродуктЭлектронное Правительство как Продукт
Электронное Правительство как ПродуктMedia Gorod
 
Lean мышление / Специфика Lean Startup
Lean мышление / Специфика Lean StartupLean мышление / Специфика Lean Startup
Lean мышление / Специфика Lean StartupMedia Gorod
 
Глобальный взгляд на мобильный мир (Nielsen)
 Глобальный взгляд на мобильный мир (Nielsen) Глобальный взгляд на мобильный мир (Nielsen)
Глобальный взгляд на мобильный мир (Nielsen)Media Gorod
 
Как россияне используют смартфоны (Nielsen)
 Как россияне используют смартфоны (Nielsen) Как россияне используют смартфоны (Nielsen)
Как россияне используют смартфоны (Nielsen)Media Gorod
 
Мобильный интернет в России (MailRuGroup)
Мобильный интернет в России (MailRuGroup) Мобильный интернет в России (MailRuGroup)
Мобильный интернет в России (MailRuGroup) Media Gorod
 
Karlovyvaryparti 130406024405-phpapp02
Karlovyvaryparti 130406024405-phpapp02Karlovyvaryparti 130406024405-phpapp02
Karlovyvaryparti 130406024405-phpapp02Media Gorod
 

Mehr von Media Gorod (20)

Itogi2013
Itogi2013Itogi2013
Itogi2013
 
Moneytree rus 1
Moneytree rus 1Moneytree rus 1
Moneytree rus 1
 
Iidf market watch_2013
Iidf market watch_2013Iidf market watch_2013
Iidf market watch_2013
 
E travel 2013 ufs-f
E travel 2013 ufs-fE travel 2013 ufs-f
E travel 2013 ufs-f
 
Travel shop 2013
Travel shop 2013Travel shop 2013
Travel shop 2013
 
Kozyakov pay u_e-travel2013
Kozyakov pay u_e-travel2013Kozyakov pay u_e-travel2013
Kozyakov pay u_e-travel2013
 
13909772985295c7a772abc7.11863824
13909772985295c7a772abc7.1186382413909772985295c7a772abc7.11863824
13909772985295c7a772abc7.11863824
 
As e-travel 2013
As   e-travel 2013As   e-travel 2013
As e-travel 2013
 
Ishounkina internet research-projects
Ishounkina internet research-projectsIshounkina internet research-projects
Ishounkina internet research-projects
 
Orlova pay u group_290813_
Orlova pay u group_290813_Orlova pay u group_290813_
Orlova pay u group_290813_
 
Ep presentation (infographic 2013)
Ep presentation (infographic 2013)Ep presentation (infographic 2013)
Ep presentation (infographic 2013)
 
Iway slides e-travel_2013-11_ready
Iway slides e-travel_2013-11_readyIway slides e-travel_2013-11_ready
Iway slides e-travel_2013-11_ready
 
Data insight e-travel2013
Data insight e-travel2013Data insight e-travel2013
Data insight e-travel2013
 
Электронное Правительство как Продукт
Электронное Правительство как ПродуктЭлектронное Правительство как Продукт
Электронное Правительство как Продукт
 
Lean мышление / Специфика Lean Startup
Lean мышление / Специфика Lean StartupLean мышление / Специфика Lean Startup
Lean мышление / Специфика Lean Startup
 
Глобальный взгляд на мобильный мир (Nielsen)
 Глобальный взгляд на мобильный мир (Nielsen) Глобальный взгляд на мобильный мир (Nielsen)
Глобальный взгляд на мобильный мир (Nielsen)
 
Как россияне используют смартфоны (Nielsen)
 Как россияне используют смартфоны (Nielsen) Как россияне используют смартфоны (Nielsen)
Как россияне используют смартфоны (Nielsen)
 
Мобильный интернет в России (MailRuGroup)
Мобильный интернет в России (MailRuGroup) Мобильный интернет в России (MailRuGroup)
Мобильный интернет в России (MailRuGroup)
 
Meta Mass Media
Meta Mass MediaMeta Mass Media
Meta Mass Media
 
Karlovyvaryparti 130406024405-phpapp02
Karlovyvaryparti 130406024405-phpapp02Karlovyvaryparti 130406024405-phpapp02
Karlovyvaryparti 130406024405-phpapp02
 

Kirill Mavrodiev, Intel: An overview of current capabilities for parallelizing and vectorizing applications with Parallel Composer

  • 1. Software & Services Group, Developer Products Division. Copyright © 2010, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners. Intel® Parallel Studio 2011: Essential Performance. Design | Build & Debug | Verify | Tune. Kirill Mavrodiev, kirill.mavrodiev@intel.com, EMEA Compiler TCE (Software Engineer), SSG DPD ICL
  • 2. Agenda
    • Intel® Parallel Studio 2011
    • Intel® Parallel Composer
      – Intel® Cilk Plus keywords
      – Array Notation
      – Guided Auto-Parallelization (GAP)
      – Intel® Parallel Debugger Extension
  • 3. Three Product Lines for Diverse Needs (intel.com/software/products)
    • Essential Performance: C/C++ developers; Microsoft Visual Studio*; take advantage of multicore
    • Advanced Performance: C++ and Fortran developers; Windows* and Linux*; high performance, cross-platform apps
    • Distributed Performance: C++ and Fortran developers on Windows* and Linux*; high performance MPI clusters
  • 4. What's New in Intel® Parallel Studio 2011
    • Intel® Parallel Building Blocks: a comprehensive, portable, reliable, future-proof set of parallel models for both data and task parallelism
      – Intel® Threading Building Blocks
      – Intel® Cilk Plus
      – Intel® Array Building Blocks (Beta)
    • Intel® Parallel Advisor: parallelism design guide
      – Demystifies and speeds parallel application design
      – Gives parallelism design insight and analysis through the Explorer and Modeler analysis tools
    • Now includes Intel® Premier Support
      – Unlimited technical support and upgrades for one year
    • Enhancements
      – Intel® Threading Building Blocks 3.0
      – Compiler improvements
      – Microsoft Visual Studio* 2010 integration
  • 5. Intel® Parallel Studio 2011
    • All-in-one toolset for the software development lifecycle
    • Microsoft Visual Studio* plug-in for versions 2005, 2008 and 2010
  • 6. Intel® Parallel Advisor: Step by Step Guidance
    • Focus on the hot call trees and loops as locations to experiment with parallelism.
    • Add Advisor annotations to the source code to describe the parallel experiment.
    • Evaluate the performance of the parallel experiment: Advisor displays the performance projection for each parallel site and shows how each site's performance affects the entire program.
    • Identify data issues (races) in each parallel experiment.
  • 7. Intel® Parallel Composer (Build & Debug Phase): Develop high performance applications with an optimized C/C++ compiler and comprehensive threaded libraries
    • Intel® Integrated Performance Primitives: extensive library of highly optimized software functions for digital media and data-processing applications
    • Improve Performance: easier, faster performance for Windows* apps
    • Intel® Parallel Building Blocks: comprehensive set of parallel development models that support multiple approaches to parallelism
  • 8. Intel's Family of Parallel Programming Models
    • Intel® Parallel Building Blocks (PBB): Intel® Cilk Plus, Intel® Threading Building Blocks (TBB), Intel® Array Building Blocks (ArBB)
    • Fixed Function Libraries: Intel® Math Kernel Library (MKL), Intel® Integrated Performance Primitives (IPP)
    • Established Standards: MPI, OpenMP*, OpenCL*
    • Research and Exploration: Intel® Concurrent Collections
    • Intel® Cilk Plus and Intel® TBB are part of Intel® Parallel Studio 2011
    • Intel® Array Building Blocks is known by the code names 'Intel Ct' or 'Intel Firetown'; its public beta started around mid-September 2010
  • 9. Intel® Parallel Building Blocks Details
  • 10. Intel® Parallel Inspector (Verify Phase): Identify memory issues in serial and parallel applications in addition to threading errors
    • Find Memory Errors: find a wide variety of memory errors
    • Find Threading Errors: find data races and deadlocks
    • Improve Reliability: ensure application reliability with proactive memory and threading error checking
  • 11. Intel® Parallel Amplifier (Tune Phase): Optimize serial and parallel application performance with 3 easy-to-use, powerful analysis methods
    • Hotspot Analysis: Where does my program spend most of the time?
    • Concurrency Analysis: Where and why doesn't my program utilize all available cores?
    • Locks & Wait Analysis: Where and why does my program wait?
  • 12. Intel® Parallel Studio 2011 Summary
    • For over a year, Intel® Parallel Studio has made it easier for Windows* developers to create fast, reliable applications for multicore
    • This release is a major update
      – Intel® Parallel Building Blocks adds significant new parallelism models
      – Intel® Parallel Advisor empowers software architects with parallelism design insight and analysis for building reliable, high performance applications for multicore
      – Other enhancements, including support for Visual Studio* 2010
    • Try it right now: www.intel.com/go/parallel
  • 13. Intel® Parallel Composer
  • 14. Intel® Cilk™ Plus: Language extensions to simplify task & data parallelism
    • C++ language extension that provides three simple keywords to write parallel code
      – Loop-type data parallelism using cilk_for
      – General parallelism using cilk_spawn and cilk_sync
    • Unambiguous semantics and a strict fork-join model via compiler support
    • Easiest for the programmer to understand the parallel control flow
    • Automatic load balancing via work stealing
    • Low-overhead task spawning, which encourages programmers to create many small tasks
    • A program with many small tasks gives the task scheduler the opportunity both to balance load and to scale forward to larger core counts
  • 15. Intel® Cilk™ Plus
    • Cilk adds the following new keywords:
      – _Cilk_spawn spawns a function call that executes asynchronously
      – _Cilk_sync is a synchronization point that waits for the children spawned inside that function
      – _Cilk_for is a parallel for-loop that executes its iterations in parallel
    • Cilk includes reducers for lock-free access to global data:
      – Use built-in reducers for common types: strings, summation, min/max, logical operations, and more
      – Write custom reducers to manage any data type
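The reducer idea on this slide can be illustrated without Cilk Plus itself. The following is a minimal sketch in standard C++, assuming plain `std::thread` workers in place of Cilk strands; `parallel_sum` is a hypothetical helper, not a Cilk Plus API. Each worker accumulates into its own private slot (its "view"), and the slots are merged only after every worker has finished, so no lock is needed.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical helper (not part of Cilk Plus): sums `data` with `workers`
// threads. Every thread writes only to its own element of `partial`, so the
// accumulation needs no lock; the partial results are merged after the joins.
long long parallel_sum(const std::vector<int>& data, std::size_t workers) {
    if (workers == 0) workers = 1;
    std::vector<long long> partial(workers, 0);  // one private view per worker
    std::vector<std::thread> pool;
    const std::size_t chunk = (data.size() + workers - 1) / workers;

    for (std::size_t w = 0; w < workers; ++w) {
        pool.emplace_back([&, w] {
            const std::size_t begin = std::min(w * chunk, data.size());
            const std::size_t end = std::min(begin + chunk, data.size());
            long long local = 0;                 // accumulate privately
            for (std::size_t i = begin; i < end; ++i) local += data[i];
            partial[w] = local;                  // publish once, to a private slot
        });
    }
    for (std::thread& t : pool) t.join();

    // Merge step: combine the per-worker views into the final value.
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

A Cilk Plus reducer hides this bookkeeping behind an ordinary-looking variable; the sketch only makes the private-view-then-merge pattern visible.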
  • 16. Simple Divide and Conquer Example
        #include <stdio.h>
        #include "cilk/cilk.h"

        int fib(int n)
        {
            int x, y;
            if (n < 2) return n;
            x = cilk_spawn fib(n-1);  // Allow fib(n-1) to run in parallel with fib(n-2)
            y = fib(n-2);
            cilk_sync;                // Ensure that all parallel work is complete before using the result
            return x+y;
        }

        int main()
        {
            printf("Fib of 40 is %d\n", fib(40));
            return 0;
        }
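For readers without a Cilk Plus compiler, the same fork-join shape can be approximated in standard C++. This is a sketch under stated assumptions, not how the Cilk runtime works: `std::async` stands in for `cilk_spawn`, `future::get` stands in for `cilk_sync`, and the `depth` cutoff is a hypothetical parameter added here because each `std::async` call may start a full thread, whereas Cilk's work-stealing scheduler makes spawns cheap.

```cpp
#include <future>

// Serial reference with the same recursion as the slide.
int fib_serial(int n) {
    return n < 2 ? n : fib_serial(n - 1) + fib_serial(n - 2);
}

// Fork-join variant. `depth` (an assumption of this sketch) limits how many
// recursion levels fork a new asynchronous task; below that the work is serial.
int fib_forkjoin(int n, int depth = 3) {
    if (n < 2) return n;
    if (depth == 0) return fib_serial(n);

    // "spawn": fib(n-1) may now run in parallel with the code that follows.
    std::future<int> x =
        std::async(std::launch::async, fib_forkjoin, n - 1, depth - 1);
    int y = fib_forkjoin(n - 2, depth - 1);

    // "sync": wait for the spawned call before using its result.
    return x.get() + y;
}
```

Calling `fib_forkjoin(40)` mirrors the `fib(40)` call in the slide's `main`.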
  • 17. Intel® Cilk Plus Tachyon Implementation
  • 18. Array Notations for data parallelism
    • New array extension to the C/C++ language
    • Specify parallel operations on arrays (instead of sequential loops)
    • Predictable performance based on mapping parallel constructs to the underlying multi-threading/SIMD hardware
    • Works seamlessly with existing C/C++ frameworks and runtimes: Intel® TBB, OpenMP*, MPI, Intel® Cilk™ Plus, Pthreads, etc.
  • 19. Array Section Notation
    • Syntax: <array base> [<lower_bound> : <length> : <stride>]*
      – ':<stride>' is optional (defaults to stride=1)
      – A missing ':<length>:<stride>' implies length=1
      – A simple ':' selects all elements of this dimension
      – Note the syntax difference from a Fortran section, which is lower_bound : upper_bound [: stride]
    • Samples:
        A[:]         // All elements of vector A
        B[2:6]       // Elements 2 to 7 of vector B
        C[:][5]      // Column 5 of matrix C
        D[0:3:2]     // Elements 0,2,4 of vector D
        E[0:3][0:4]  // 12 elements from E[0][0] to E[2][3]
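Because the second field of a section is a length rather than an upper bound, it is easy to misread the samples. The sketch below, in standard C++, simply enumerates the indices a one-dimensional section selects; `section_indices` is a hypothetical helper written for this illustration and is not part of the array notation extension.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: returns the indices selected by
// base[lower : length : stride], i.e. lower, lower+stride, ...,
// with exactly `length` entries.
std::vector<std::size_t> section_indices(std::size_t lower,
                                         std::size_t length,
                                         std::size_t stride = 1) {
    std::vector<std::size_t> idx;
    idx.reserve(length);
    for (std::size_t n = 0; n < length; ++n) {
        idx.push_back(lower + n * stride);
    }
    return idx;
}
```

With this helper, `section_indices(2, 6)` yields 2 through 7 (the `B[2:6]` sample) and `section_indices(0, 3, 2)` yields 0, 2, 4 (the `D[0:3:2]` sample).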
  • 20. Intel® Cilk Plus implementation
        cilk_for (int k = 0; k < nz; k++) {
            for (int j = 0; j < ny; j++)
                for (int i = 0; i < nx; i += STRIDE) {
                    tmp[:] = rhs[ID(i,j,k) : STRIDE] + x[ID(i,j,k) : STRIDE] * 6.0
                           - (x[ID(i-1,j,k) : STRIDE] + x[ID(i+1,j,k) : STRIDE]
                            + x[ID(i,j-1,k) : STRIDE] + x[ID(i,j+1,k) : STRIDE]
                            + x[ID(i,j,k-1) : STRIDE] + x[ID(i,j,k+1) : STRIDE]);
                    residueConvergeStrongCReducer =
                        cilk::max_of(residueConvergeStrongCReducer,
                                     __sec_reduce_max(fabs(tmp[:])));
                    residueConvergeStrongL2Reducer += __sec_reduce_add(tmp[:] * tmp[:]);
                }
        }
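The two section reductions in this loop compute a maximum norm and a squared L2 norm of `tmp`. As a rough standard-C++ equivalent (a sketch only; `max_abs` and `sum_of_squares` are names invented here, and `std::vector<double>` is an assumed stand-in for the slide's `tmp` array), they amount to:

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Stand-in for __sec_reduce_max(fabs(tmp[:])): largest absolute value.
double max_abs(const std::vector<double>& tmp) {
    double m = 0.0;
    for (double v : tmp) m = std::max(m, std::fabs(v));
    return m;
}

// Stand-in for __sec_reduce_add(tmp[:] * tmp[:]): sum of squared elements.
double sum_of_squares(const std::vector<double>& tmp) {
    double s = 0.0;
    for (double v : tmp) s += v * v;
    return s;
}
```

In the slide's code these per-section results are then folded into reducers, so the partial results from different `cilk_for` iterations combine without explicit locking.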
  • 21. Intel® Cilk Plus implementation
  • 22. GAP: Guided Auto-Parallelization
    • Targeted at mainstream and HPC users
    • Advice to change code for more auto-vectorization, auto-parallelization and data transformations
    • Diagnostic guidance is generated when invoked
    • Advice may involve
      – suggestions for source changes
      – adding pragmas
      – adding new options
    • Simple source changes that assert new properties
      – Add a new pragma for a loop if the semantics are satisfied
      – Use a local variable for the upper bound of a loop
      – Initialize a scalar variable unconditionally at the top of a loop
      – Reorder the fields of a structure (or split it into two)
    • Desired behavior
      – Each piece of advice is specific and uses source-level variable names
      – The user does the semantic analysis and applies or rejects each piece of advice
      – Advice should be as localized as possible
      – Following the advice should result in better optimizations
  • 23. GAP: How it Works (selection of the most relevant switches). Multiple compiler switches activate and fine-tune the guidance analysis.
    • Activate messages individually for vectorization, parallelization, data transformations, or all three:
        -guide-vec[=level]
        -guide-par[=level]
        -guide-data-trans[=level]
        -guide[=level]
      The optional argument level=1,2,3,4 controls the extent of the analysis; Intel Composer only supports up to level 3
    • Control the part of the source code for which the analysis is done:
        -guide-opts=<arg>
      Samples:
        -guide-opts="convert.c,'funca(int)'"
        -guide-opts="bar.f90,'module_1::func_solve'"
    • Control where the messages go:
        -guide-file=<file_name>
        -guide-file-append[=<file_name>]
  • 24. GAP Case Study
        #include <math.h>  // sin, cos, atan, sqrt

        extern int num_nodes;

        typedef struct TEST_STRUCT {
            // Coordinates of city1
            float latitude1;
            float longitude1;
            int city_id1;
            int stops[10000];  // Currently unused field
            // Coordinates of city2
            float latitude2;
            float longitude2;
            int city_id2;
        } test_struct;

        extern float *distances;
        extern test_struct** nodes;

        void process_nodes(void)
        {
            float const R = 3964.0;
            float temp, lat1, lat2, long1, long2, result;
            int temp1 = num_nodes;
            //#pragma loop count min(16)
            //#pragma parallel
            // for (int k=0; k < temp1; k++) {
            for (int k=0; k < num_nodes; k++) {
                lat1 = nodes[k]->latitude1;
                lat2 = nodes[k]->latitude2;
                long1 = nodes[k]->longitude1;
                long2 = nodes[k]->longitude2;
                // Compute the distance between the two cities
                temp = sin(lat1) * sin(lat2) + cos(lat1) * cos(lat2) * cos(long1-long2);
                result = 2.0 * R * atan(sqrt((1.0-temp)/(1.0+temp)));
                // Store the distance computed in the distances array
                distances[k] = result;
            }
        }

    Compiler invocation and GAP report:

        [c:/test2/usability2] icl -c distance.cpp -Qguide=4 -Qparallel
        GAP REPORT LOG OPENED ON Wed Mar 03 18:34:01 2010

        c:\test2\usability2\distance.h(2): remark #30755: (DTRANS) Reordering the fields of the structure 'TEST_STRUCT' will improve data locality. Suggested field order: 'stops, latitude1, longitude1, latitude2, longitude2, city_id1, city_id2'. [VERIFY] The suggestion is based on the field references in current compilation. Please make sure that the restructured code satisfies the original program semantics.

        c:\test2\gap_examples\usability2\distance.cpp(30): remark #30534: (LOOP) Add -Qansi-alias option for better type-based disambiguation analysis by the compiler if appropriate (option will apply for entire compilation). This will improve optimizations for the loop at line 30 [VERIFY] Make sure that the semantics of this option is obeyed for entire compilation.

        c:\test2\usability2\distance.cpp(29): remark #30519: (PAR) Use "#pragma parallel" to parallelize the loop at line 29, if these arrays in the loop do not have cross-iteration dependencies: nodes, distances. [VERIFY] A cross-iteration dependency exists if a memory location is modified in an iteration of the loop and accessed (a read or a write) in another iteration of the loop. Make sure that there are no such dependencies.

        c:\test2\gap_examples\usability2\distance.cpp(29): remark #30525: (PAR) If the trip count of the loop at line 29 is greater than 16, then use "#pragma loop count min(16)" to parallelize this loop. [VERIFY] Make sure that the loop has a minimum of 16 iterations.

        c:\test2\gap_examples\usability2\distance.cpp(48): remark #30525: (PAR) If the trip count of the loop at line 48 is greater than 751, then use "#pragma loop count min(751)" to parallelize this loop. [VERIFY] Make sure that the loop has a minimum of 751 iterations.

        END OF GAP REPORT LOG
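To make two of the report's suggestions concrete, here is a minimal sketch in standard C++ of what the code might look like after applying them. It is an illustration under assumptions, not the presenter's revised source: the `Node` name, the `std::vector` interface, and the returned result vector are introduced here for self-containment, and the Intel-specific `#pragma parallel` and `#pragma loop count` lines are left out. The struct uses the field order from remark #30755, and the loop bound is held in a local variable as in the slide's commented-out `temp1` loop.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Field order suggested by remark #30755: the four coordinates read in the
// loop are now adjacent instead of being separated by the large `stops` array.
struct Node {
    int   stops[10000];  // unused in the loop
    float latitude1;
    float longitude1;
    float latitude2;
    float longitude2;
    int   city_id1;
    int   city_id2;
};

// Hypothetical variant of process_nodes: takes the nodes by reference and
// returns the distances instead of writing to a global array.
std::vector<float> process_nodes(const std::vector<Node>& nodes) {
    const float R = 3964.0f;
    const std::size_t count = nodes.size();  // loop bound kept in a local
    std::vector<float> distances(count);

    for (std::size_t k = 0; k < count; ++k) {
        const Node& n = nodes[k];
        // Same distance formula as the slide.
        const float temp = std::sin(n.latitude1) * std::sin(n.latitude2)
                         + std::cos(n.latitude1) * std::cos(n.latitude2)
                         * std::cos(n.longitude1 - n.longitude2);
        distances[k] =
            2.0f * R * std::atan(std::sqrt((1.0f - temp) / (1.0f + temp)));
    }
    return distances;
}
```

Each iteration writes only `distances[k]` and reads only `nodes[k]`, which is the no-cross-iteration-dependency property that remark #30519 asks the user to verify before adding `#pragma parallel`.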
  • 25. Intel® Debugger & Intel® Parallel Debugger Extension
    • Linux*: Intel® Debugger (IDB) in Intel® C++ Composer XE, Intel® Fortran Composer XE, Intel® Cluster Toolkit Compiler Edition
    • Mac OS*: Intel® Debugger (IDB) in Intel® C++ Composer XE, Intel® Fortran Composer XE
    • Windows*: Intel® Parallel Debugger Extension in Intel® C++ Composer XE, Intel® Visual Fortran Composer XE, Intel® Parallel Composer
  • 26. Key Features
    • Thread Shared Data Event Detection: break on thread shared data access (read/write)
    • Re-entrant Function Detection
    • SIMD SSE Registers Window
    • Enhanced OpenMP* Support
      – Serialize OpenMP threaded application execution on the fly
      – Insight into thread groups, barriers, locks, wait lists, etc.
  • 27. Questions?
  • 28. Legal Disclaimer: INFORMATION IN THIS DOCUMENT IS PROVIDED "AS IS". NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance. Buyers should consult other sources of information to evaluate the performance of systems or components they are considering purchasing. For more information on performance tests and on the performance of Intel products, reference www.intel.com/software/products. Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and other countries. *Other names and brands may be claimed as the property of others. Copyright © 2010. Intel Corporation. http://www.intel.com/software/products