SlideShare ist ein Scribd-Unternehmen logo
1 von 10
Post-processing SAR images on Xeon Phi – a
porting exercise
Martin Hilgeman
HPC Consultant EMEA
Research Computing
• Xeon Phi porting checklist
• Post processing of SAR images
• Questions
Agenda
Dell Confidential
Research Computing
SAR interferometry is used to monitor terrain displacements by using satellite data
The phase difference between two SAR images is calculated that have been acquired at:
› different times
› with slightly different view angles
› different observation dates
Dell Confidential
Case study: Geospatial imaging
Research Computing
Application characteristics:
– Written in C90 and C++, ~ 150,000 lines of code
– Serial application, using Intel MKL DFTI functions
– Iterative scheme
• MKL FFTs are already being used, little room for improvement
– FFTs are too small to run in threaded mode
• Application only runs in serial mode, needs to be parallelized in order to run on a Phi in
reasonable speed
• Phase images are divided into patches, which is suitable for parallelization
– Parallelize the main loop using OpenMP
Dell Confidential
Code details
Research Computing
Test Setup
Confidential5
Jobs ran at the TACC Stampede system
PowerEdge C8220 with Intel® Xeon® E5-2680 2.7GHz
– 16 cores
– 32 GB memory
– RHEL 6.3
– Intel® Xeon Phi™ 7120P
Research Computing
Parallelization of big main loop
count = 1;
if (lastGCP != NULL) {
app = lastGCP->next;
} else {
app = bufferGCP;
}
while (app) {
app->loc [0] = (int) app->loc [0];
app->loc [1] = (int) app->loc [1];
app->offset[0] = (int) app->offset[0];
app->offset[1] = (int) app->offset[1];
<lot of work>
lastGCP = app;
app = app->next;
count++;
}
6 Confidential
count = 1;
if (lastGCP != NULL) {
app = lastGCP->next;
} else {
app = bufferGCP;
}
int i = 0;
int nthreads, me;
#if defined _OPENMP
#pragma omp parallel 
private(i,me,app2,err,suboffset,qc,m_block,s_block)
{
nthreads = omp_get_num_threads();
me = omp_get_thread_num();
#else
nthreads = 1;
me = 0;
#endif
#pragma omp for schedule(static)
for (i = 0; i < buffer_ngcp; i++) {
#pragma omp critical
{
app2 = app;
app = app->next;
}
app2->loc [0] = (int) app2->loc [0];
app2->loc [1] = (int) app2->loc [1];
app2->offset[0] = (int) app2->offset[0];
app2->offset[1] = (int) app2->offset[1];
<lot of work>
lastGCP = app2;
count++;
}
#if defined _OPENMP
}
#endif
Research ComputingDell Confidential
Results
Wall Clock Time (ss:00)
Original version on E5440 2.83 GHz 102.00
Original version on E5-2680 2.70 GHz 27.43
Additional source optimizations 25.83
2 threads 17.37
4 threads 9.97
6 threads 7.17
8 threads 5.77
12 threads 5.57
Research Computing
• Now runs multi-threaded using OpenMP directives
– The number of patches is greater than the number of threads on the Phi
• Memory footprint is small, so should fit on the card
– Running in native mode is possible
• Intel MKL has complete support for Xeon Phi
– Assume that the FFTs are making optimal use of the resources
Dell Confidential
How about Intel® Xeon Phi™ performance?
Research Computing
Module Summary
--------------------------------------------------------------------------------
Samples Self % Total % Module
247 59.81% 59.81% /lib64/libc-2.12.so
79 19.13% 78.93% /scratch/dell-guest/app
38 9.20% 88.14% /opt/intel/mkl/lib/intel64/libmkl_core.so
34 8.23% 96.37% /opt/intel/mkl/lib/intel64/libmkl_intel_thread.so
10 2.42% 98.79% /opt/intel/mkl/lib/intel64/libmkl_intel_lp64.so
5 1.21% 100.00% /lib64/ld-2.12.so
File Summary
--------------------------------------------------------------------------------
Samples Self % Total % File
332 80.39% 80.39% ??
64 15.50% 95.88% malloc.c
10 2.42% 98.31% interp.c
3 0.73% 99.03% bsearch.c
2 0.48% 99.52% SRC/patch.c
2 0.48% 100.00% printf_fp.c
Function Summary
--------------------------------------------------------------------------------
Samples Self % Total % Function
126 30.51% 30.51% brk
69 16.71% 47.22% ??
60 14.53% 61.74% _int_malloc
43 10.41% 72.15% pmatch
34 8.23% 80.39% memcpy
13 3.15% 83.54% get_correlation_real_mkl
10 2.42% 85.96% extract2double
8 1.94% 87.89% __intel_new_memcpy
Dell Confidential
A profile
Research ComputingDell Confidential
Questions?

Weitere ähnliche Inhalte

Was ist angesagt?

Computer Performance Microscopy with SHIM
Computer Performance Microscopy with SHIMComputer Performance Microscopy with SHIM
Computer Performance Microscopy with SHIMhiyangxi
 
MLOps Case Studies: Building fast, scalable, and high-accuracy ML systems at ...
MLOps Case Studies: Building fast, scalable, and high-accuracy ML systems at ...MLOps Case Studies: Building fast, scalable, and high-accuracy ML systems at ...
MLOps Case Studies: Building fast, scalable, and high-accuracy ML systems at ...Masashi Shibata
 
Two C++ Tools: Compiler Explorer and Cpp Insights
Two C++ Tools: Compiler Explorer and Cpp InsightsTwo C++ Tools: Compiler Explorer and Cpp Insights
Two C++ Tools: Compiler Explorer and Cpp InsightsAlison Chaiken
 
VkRunner: a simple Vulkan shader script test utility [Lightning Talk] (Lightn...
VkRunner: a simple Vulkan shader script test utility [Lightning Talk] (Lightn...VkRunner: a simple Vulkan shader script test utility [Lightning Talk] (Lightn...
VkRunner: a simple Vulkan shader script test utility [Lightning Talk] (Lightn...Igalia
 
190111 tf2 preview_jwkang_pub
190111 tf2 preview_jwkang_pub190111 tf2 preview_jwkang_pub
190111 tf2 preview_jwkang_pubJaewook. Kang
 
function* - ES6, generators, and all that (JSRomandie meetup, February 2014)
function* - ES6, generators, and all that (JSRomandie meetup, February 2014)function* - ES6, generators, and all that (JSRomandie meetup, February 2014)
function* - ES6, generators, and all that (JSRomandie meetup, February 2014)Igalia
 
Include
IncludeInclude
Includezniker
 
Cypher for Gremlin
Cypher for GremlinCypher for Gremlin
Cypher for GremlinopenCypher
 
Unity best practices (2013)
Unity best practices (2013)Unity best practices (2013)
Unity best practices (2013)Benjamin Robert
 

Was ist angesagt? (14)

Computer Performance Microscopy with SHIM
Computer Performance Microscopy with SHIMComputer Performance Microscopy with SHIM
Computer Performance Microscopy with SHIM
 
MLOps Case Studies: Building fast, scalable, and high-accuracy ML systems at ...
MLOps Case Studies: Building fast, scalable, and high-accuracy ML systems at ...MLOps Case Studies: Building fast, scalable, and high-accuracy ML systems at ...
MLOps Case Studies: Building fast, scalable, and high-accuracy ML systems at ...
 
Two C++ Tools: Compiler Explorer and Cpp Insights
Two C++ Tools: Compiler Explorer and Cpp InsightsTwo C++ Tools: Compiler Explorer and Cpp Insights
Two C++ Tools: Compiler Explorer and Cpp Insights
 
Sorter
SorterSorter
Sorter
 
VkRunner: a simple Vulkan shader script test utility [Lightning Talk] (Lightn...
VkRunner: a simple Vulkan shader script test utility [Lightning Talk] (Lightn...VkRunner: a simple Vulkan shader script test utility [Lightning Talk] (Lightn...
VkRunner: a simple Vulkan shader script test utility [Lightning Talk] (Lightn...
 
190111 tf2 preview_jwkang_pub
190111 tf2 preview_jwkang_pub190111 tf2 preview_jwkang_pub
190111 tf2 preview_jwkang_pub
 
function* - ES6, generators, and all that (JSRomandie meetup, February 2014)
function* - ES6, generators, and all that (JSRomandie meetup, February 2014)function* - ES6, generators, and all that (JSRomandie meetup, February 2014)
function* - ES6, generators, and all that (JSRomandie meetup, February 2014)
 
Include
IncludeInclude
Include
 
Snug
SnugSnug
Snug
 
Cypher for Gremlin
Cypher for GremlinCypher for Gremlin
Cypher for Gremlin
 
Peephole Optimization
Peephole OptimizationPeephole Optimization
Peephole Optimization
 
Numba Overview
Numba OverviewNumba Overview
Numba Overview
 
Circuit Simplifier
Circuit SimplifierCircuit Simplifier
Circuit Simplifier
 
Unity best practices (2013)
Unity best practices (2013)Unity best practices (2013)
Unity best practices (2013)
 

Ähnlich wie Post-processing SAR images on Xeon Phi - a porting exercise

lecture_GPUArchCUDA04-OpenMPHOMP.pdf
lecture_GPUArchCUDA04-OpenMPHOMP.pdflecture_GPUArchCUDA04-OpenMPHOMP.pdf
lecture_GPUArchCUDA04-OpenMPHOMP.pdfTigabu Yaya
 
Exploiting parallelism opportunities in non-parallel architectures to improve...
Exploiting parallelism opportunities in non-parallel architectures to improve...Exploiting parallelism opportunities in non-parallel architectures to improve...
Exploiting parallelism opportunities in non-parallel architectures to improve...GreenLSI Team, LSI, UPM
 
OmpSs – improving the scalability of OpenMP
OmpSs – improving the scalability of OpenMPOmpSs – improving the scalability of OpenMP
OmpSs – improving the scalability of OpenMPIntel IT Center
 
OpenMP tasking model: from the standard to the classroom
OpenMP tasking model: from the standard to the classroomOpenMP tasking model: from the standard to the classroom
OpenMP tasking model: from the standard to the classroomFacultad de Informática UCM
 
Netmap presentation
Netmap presentationNetmap presentation
Netmap presentationAmir Razmjou
 
Umbra Ignite 2015: Jérémy Virga – Dishonored 2 rendering engine architecture ...
Umbra Ignite 2015: Jérémy Virga – Dishonored 2 rendering engine architecture ...Umbra Ignite 2015: Jérémy Virga – Dishonored 2 rendering engine architecture ...
Umbra Ignite 2015: Jérémy Virga – Dishonored 2 rendering engine architecture ...Umbra Software
 
Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...
Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...
Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...Data Con LA
 
Mathematics and development of fast TLS handshakes
Mathematics and development of fast TLS handshakesMathematics and development of fast TLS handshakes
Mathematics and development of fast TLS handshakesAlexander Krizhanovsky
 
“Efficiently Map AI and Vision Applications onto Multi-core AI Processors Usi...
“Efficiently Map AI and Vision Applications onto Multi-core AI Processors Usi...“Efficiently Map AI and Vision Applications onto Multi-core AI Processors Usi...
“Efficiently Map AI and Vision Applications onto Multi-core AI Processors Usi...Edge AI and Vision Alliance
 
SMP implementation for OpenBSD/sgi
SMP implementation for OpenBSD/sgiSMP implementation for OpenBSD/sgi
SMP implementation for OpenBSD/sgiTakuya ASADA
 
Performance Analysis of Lattice QCD on GPUs in APGAS Programming Model
Performance Analysis of Lattice QCD on GPUs in APGAS Programming ModelPerformance Analysis of Lattice QCD on GPUs in APGAS Programming Model
Performance Analysis of Lattice QCD on GPUs in APGAS Programming ModelKoichi Shirahata
 
Topology hiding Multipath Routing Protocol in MANET
Topology hiding Multipath Routing Protocol in MANETTopology hiding Multipath Routing Protocol in MANET
Topology hiding Multipath Routing Protocol in MANETAkshay Phalke
 
Java Jit. Compilation and optimization by Andrey Kovalenko
Java Jit. Compilation and optimization by Andrey KovalenkoJava Jit. Compilation and optimization by Andrey Kovalenko
Java Jit. Compilation and optimization by Andrey KovalenkoValeriia Maliarenko
 
Pragmatic Optimization in Modern Programming - Ordering Optimization Approaches
Pragmatic Optimization in Modern Programming - Ordering Optimization ApproachesPragmatic Optimization in Modern Programming - Ordering Optimization Approaches
Pragmatic Optimization in Modern Programming - Ordering Optimization ApproachesMarina Kolpakova
 
Modeling & Simulation of CubeSat-based Missions'Concept of Operations
Modeling & Simulation of CubeSat-based Missions'Concept of OperationsModeling & Simulation of CubeSat-based Missions'Concept of Operations
Modeling & Simulation of CubeSat-based Missions'Concept of OperationsObeo
 
Profiling & Testing with Spark
Profiling & Testing with SparkProfiling & Testing with Spark
Profiling & Testing with SparkRoger Rafanell Mas
 
Data Analytics and Simulation in Parallel with MATLAB*
Data Analytics and Simulation in Parallel with MATLAB*Data Analytics and Simulation in Parallel with MATLAB*
Data Analytics and Simulation in Parallel with MATLAB*Intel® Software
 

Ähnlich wie Post-processing SAR images on Xeon Phi - a porting exercise (20)

lecture_GPUArchCUDA04-OpenMPHOMP.pdf
lecture_GPUArchCUDA04-OpenMPHOMP.pdflecture_GPUArchCUDA04-OpenMPHOMP.pdf
lecture_GPUArchCUDA04-OpenMPHOMP.pdf
 
Exploiting parallelism opportunities in non-parallel architectures to improve...
Exploiting parallelism opportunities in non-parallel architectures to improve...Exploiting parallelism opportunities in non-parallel architectures to improve...
Exploiting parallelism opportunities in non-parallel architectures to improve...
 
OmpSs – improving the scalability of OpenMP
OmpSs – improving the scalability of OpenMPOmpSs – improving the scalability of OpenMP
OmpSs – improving the scalability of OpenMP
 
OpenMP tasking model: from the standard to the classroom
OpenMP tasking model: from the standard to the classroomOpenMP tasking model: from the standard to the classroom
OpenMP tasking model: from the standard to the classroom
 
Programar para GPUs
Programar para GPUsProgramar para GPUs
Programar para GPUs
 
Netmap presentation
Netmap presentationNetmap presentation
Netmap presentation
 
Umbra Ignite 2015: Jérémy Virga – Dishonored 2 rendering engine architecture ...
Umbra Ignite 2015: Jérémy Virga – Dishonored 2 rendering engine architecture ...Umbra Ignite 2015: Jérémy Virga – Dishonored 2 rendering engine architecture ...
Umbra Ignite 2015: Jérémy Virga – Dishonored 2 rendering engine architecture ...
 
Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...
Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...
Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...
 
Mathematics and development of fast TLS handshakes
Mathematics and development of fast TLS handshakesMathematics and development of fast TLS handshakes
Mathematics and development of fast TLS handshakes
 
“Efficiently Map AI and Vision Applications onto Multi-core AI Processors Usi...
“Efficiently Map AI and Vision Applications onto Multi-core AI Processors Usi...“Efficiently Map AI and Vision Applications onto Multi-core AI Processors Usi...
“Efficiently Map AI and Vision Applications onto Multi-core AI Processors Usi...
 
SMP implementation for OpenBSD/sgi
SMP implementation for OpenBSD/sgiSMP implementation for OpenBSD/sgi
SMP implementation for OpenBSD/sgi
 
Performance Analysis of Lattice QCD on GPUs in APGAS Programming Model
Performance Analysis of Lattice QCD on GPUs in APGAS Programming ModelPerformance Analysis of Lattice QCD on GPUs in APGAS Programming Model
Performance Analysis of Lattice QCD on GPUs in APGAS Programming Model
 
Topology hiding Multipath Routing Protocol in MANET
Topology hiding Multipath Routing Protocol in MANETTopology hiding Multipath Routing Protocol in MANET
Topology hiding Multipath Routing Protocol in MANET
 
Exploring Gpgpu Workloads
Exploring Gpgpu WorkloadsExploring Gpgpu Workloads
Exploring Gpgpu Workloads
 
Java Jit. Compilation and optimization by Andrey Kovalenko
Java Jit. Compilation and optimization by Andrey KovalenkoJava Jit. Compilation and optimization by Andrey Kovalenko
Java Jit. Compilation and optimization by Andrey Kovalenko
 
Pragmatic Optimization in Modern Programming - Ordering Optimization Approaches
Pragmatic Optimization in Modern Programming - Ordering Optimization ApproachesPragmatic Optimization in Modern Programming - Ordering Optimization Approaches
Pragmatic Optimization in Modern Programming - Ordering Optimization Approaches
 
Modeling & Simulation of CubeSat-based Missions'Concept of Operations
Modeling & Simulation of CubeSat-based Missions'Concept of OperationsModeling & Simulation of CubeSat-based Missions'Concept of Operations
Modeling & Simulation of CubeSat-based Missions'Concept of Operations
 
Can FPGAs Compete with GPUs?
Can FPGAs Compete with GPUs?Can FPGAs Compete with GPUs?
Can FPGAs Compete with GPUs?
 
Profiling & Testing with Spark
Profiling & Testing with SparkProfiling & Testing with Spark
Profiling & Testing with Spark
 
Data Analytics and Simulation in Parallel with MATLAB*
Data Analytics and Simulation in Parallel with MATLAB*Data Analytics and Simulation in Parallel with MATLAB*
Data Analytics and Simulation in Parallel with MATLAB*
 

Mehr von Intel IT Center

AI Crash Course- Supercomputing
AI Crash Course- SupercomputingAI Crash Course- Supercomputing
AI Crash Course- SupercomputingIntel IT Center
 
FPGA Inference - DellEMC SURFsara
FPGA Inference - DellEMC SURFsaraFPGA Inference - DellEMC SURFsara
FPGA Inference - DellEMC SURFsaraIntel IT Center
 
High Memory Bandwidth Demo @ One Intel Station
High Memory Bandwidth Demo @ One Intel StationHigh Memory Bandwidth Demo @ One Intel Station
High Memory Bandwidth Demo @ One Intel StationIntel IT Center
 
INFOGRAPHIC: Advantages of Intel vs. IBM Power on SAP HANA solutions
INFOGRAPHIC: Advantages of Intel vs. IBM Power on SAP HANA solutionsINFOGRAPHIC: Advantages of Intel vs. IBM Power on SAP HANA solutions
INFOGRAPHIC: Advantages of Intel vs. IBM Power on SAP HANA solutionsIntel IT Center
 
Disrupt Hackers With Robust User Authentication
Disrupt Hackers With Robust User AuthenticationDisrupt Hackers With Robust User Authentication
Disrupt Hackers With Robust User AuthenticationIntel IT Center
 
Strengthen Your Enterprise Arsenal Against Cyber Attacks With Hardware-Enhanc...
Strengthen Your Enterprise Arsenal Against Cyber Attacks With Hardware-Enhanc...Strengthen Your Enterprise Arsenal Against Cyber Attacks With Hardware-Enhanc...
Strengthen Your Enterprise Arsenal Against Cyber Attacks With Hardware-Enhanc...Intel IT Center
 
Harness Digital Disruption to Create 2022’s Workplace Today
Harness Digital Disruption to Create 2022’s Workplace TodayHarness Digital Disruption to Create 2022’s Workplace Today
Harness Digital Disruption to Create 2022’s Workplace TodayIntel IT Center
 
Don't Rely on Software Alone. Protect Endpoints with Hardware-Enhanced Security.
Don't Rely on Software Alone.Protect Endpoints with Hardware-Enhanced Security.Don't Rely on Software Alone.Protect Endpoints with Hardware-Enhanced Security.
Don't Rely on Software Alone. Protect Endpoints with Hardware-Enhanced Security.Intel IT Center
 
Achieve Unconstrained Collaboration in a Digital World
Achieve Unconstrained Collaboration in a Digital WorldAchieve Unconstrained Collaboration in a Digital World
Achieve Unconstrained Collaboration in a Digital WorldIntel IT Center
 
Intel® Xeon® Scalable Processors Enabled Applications Marketing Guide
Intel® Xeon® Scalable Processors Enabled Applications Marketing GuideIntel® Xeon® Scalable Processors Enabled Applications Marketing Guide
Intel® Xeon® Scalable Processors Enabled Applications Marketing GuideIntel IT Center
 
#NABshow: National Association of Broadcasters 2017 Super Session Presentatio...
#NABshow: National Association of Broadcasters 2017 Super Session Presentatio...#NABshow: National Association of Broadcasters 2017 Super Session Presentatio...
#NABshow: National Association of Broadcasters 2017 Super Session Presentatio...Intel IT Center
 
Identity Protection for the Digital Age
Identity Protection for the Digital AgeIdentity Protection for the Digital Age
Identity Protection for the Digital AgeIntel IT Center
 
Three Steps to Making a Digital Workplace a Reality
Three Steps to Making a Digital Workplace a RealityThree Steps to Making a Digital Workplace a Reality
Three Steps to Making a Digital Workplace a RealityIntel IT Center
 
Three Steps to Making The Digital Workplace a Reality - by Intel’s Chad Const...
Three Steps to Making The Digital Workplace a Reality - by Intel’s Chad Const...Three Steps to Making The Digital Workplace a Reality - by Intel’s Chad Const...
Three Steps to Making The Digital Workplace a Reality - by Intel’s Chad Const...Intel IT Center
 
Intel® Xeon® Processor E7-8800/4800 v4 EAMG 2.0
Intel® Xeon® Processor E7-8800/4800 v4 EAMG 2.0Intel® Xeon® Processor E7-8800/4800 v4 EAMG 2.0
Intel® Xeon® Processor E7-8800/4800 v4 EAMG 2.0Intel IT Center
 
Intel® Xeon® Processor E5-2600 v4 Enterprise Database Applications Showcase
Intel® Xeon® Processor E5-2600 v4 Enterprise Database Applications ShowcaseIntel® Xeon® Processor E5-2600 v4 Enterprise Database Applications Showcase
Intel® Xeon® Processor E5-2600 v4 Enterprise Database Applications ShowcaseIntel IT Center
 
Intel® Xeon® Processor E5-2600 v4 Core Business Applications Showcase
Intel® Xeon® Processor E5-2600 v4 Core Business Applications ShowcaseIntel® Xeon® Processor E5-2600 v4 Core Business Applications Showcase
Intel® Xeon® Processor E5-2600 v4 Core Business Applications ShowcaseIntel IT Center
 
Intel® Xeon® Processor E5-2600 v4 Financial Security Applications Showcase
Intel® Xeon® Processor E5-2600 v4 Financial Security Applications ShowcaseIntel® Xeon® Processor E5-2600 v4 Financial Security Applications Showcase
Intel® Xeon® Processor E5-2600 v4 Financial Security Applications ShowcaseIntel IT Center
 
Intel® Xeon® Processor E5-2600 v4 Telco Cloud Digital Applications Showcase
Intel® Xeon® Processor E5-2600 v4 Telco Cloud Digital Applications ShowcaseIntel® Xeon® Processor E5-2600 v4 Telco Cloud Digital Applications Showcase
Intel® Xeon® Processor E5-2600 v4 Telco Cloud Digital Applications ShowcaseIntel IT Center
 
Intel® Xeon® Processor E5-2600 v4 Tech Computing Applications Showcase
Intel® Xeon® Processor E5-2600 v4 Tech Computing Applications ShowcaseIntel® Xeon® Processor E5-2600 v4 Tech Computing Applications Showcase
Intel® Xeon® Processor E5-2600 v4 Tech Computing Applications ShowcaseIntel IT Center
 

Mehr von Intel IT Center (20)

AI Crash Course- Supercomputing
AI Crash Course- SupercomputingAI Crash Course- Supercomputing
AI Crash Course- Supercomputing
 
FPGA Inference - DellEMC SURFsara
FPGA Inference - DellEMC SURFsaraFPGA Inference - DellEMC SURFsara
FPGA Inference - DellEMC SURFsara
 
High Memory Bandwidth Demo @ One Intel Station
High Memory Bandwidth Demo @ One Intel StationHigh Memory Bandwidth Demo @ One Intel Station
High Memory Bandwidth Demo @ One Intel Station
 
INFOGRAPHIC: Advantages of Intel vs. IBM Power on SAP HANA solutions
INFOGRAPHIC: Advantages of Intel vs. IBM Power on SAP HANA solutionsINFOGRAPHIC: Advantages of Intel vs. IBM Power on SAP HANA solutions
INFOGRAPHIC: Advantages of Intel vs. IBM Power on SAP HANA solutions
 
Disrupt Hackers With Robust User Authentication
Disrupt Hackers With Robust User AuthenticationDisrupt Hackers With Robust User Authentication
Disrupt Hackers With Robust User Authentication
 
Strengthen Your Enterprise Arsenal Against Cyber Attacks With Hardware-Enhanc...
Strengthen Your Enterprise Arsenal Against Cyber Attacks With Hardware-Enhanc...Strengthen Your Enterprise Arsenal Against Cyber Attacks With Hardware-Enhanc...
Strengthen Your Enterprise Arsenal Against Cyber Attacks With Hardware-Enhanc...
 
Harness Digital Disruption to Create 2022’s Workplace Today
Harness Digital Disruption to Create 2022’s Workplace TodayHarness Digital Disruption to Create 2022’s Workplace Today
Harness Digital Disruption to Create 2022’s Workplace Today
 
Don't Rely on Software Alone. Protect Endpoints with Hardware-Enhanced Security.
Don't Rely on Software Alone.Protect Endpoints with Hardware-Enhanced Security.Don't Rely on Software Alone.Protect Endpoints with Hardware-Enhanced Security.
Don't Rely on Software Alone. Protect Endpoints with Hardware-Enhanced Security.
 
Achieve Unconstrained Collaboration in a Digital World
Achieve Unconstrained Collaboration in a Digital WorldAchieve Unconstrained Collaboration in a Digital World
Achieve Unconstrained Collaboration in a Digital World
 
Intel® Xeon® Scalable Processors Enabled Applications Marketing Guide
Intel® Xeon® Scalable Processors Enabled Applications Marketing GuideIntel® Xeon® Scalable Processors Enabled Applications Marketing Guide
Intel® Xeon® Scalable Processors Enabled Applications Marketing Guide
 
#NABshow: National Association of Broadcasters 2017 Super Session Presentatio...
#NABshow: National Association of Broadcasters 2017 Super Session Presentatio...#NABshow: National Association of Broadcasters 2017 Super Session Presentatio...
#NABshow: National Association of Broadcasters 2017 Super Session Presentatio...
 
Identity Protection for the Digital Age
Identity Protection for the Digital AgeIdentity Protection for the Digital Age
Identity Protection for the Digital Age
 
Three Steps to Making a Digital Workplace a Reality
Three Steps to Making a Digital Workplace a RealityThree Steps to Making a Digital Workplace a Reality
Three Steps to Making a Digital Workplace a Reality
 
Three Steps to Making The Digital Workplace a Reality - by Intel’s Chad Const...
Three Steps to Making The Digital Workplace a Reality - by Intel’s Chad Const...Three Steps to Making The Digital Workplace a Reality - by Intel’s Chad Const...
Three Steps to Making The Digital Workplace a Reality - by Intel’s Chad Const...
 
Intel® Xeon® Processor E7-8800/4800 v4 EAMG 2.0
Intel® Xeon® Processor E7-8800/4800 v4 EAMG 2.0Intel® Xeon® Processor E7-8800/4800 v4 EAMG 2.0
Intel® Xeon® Processor E7-8800/4800 v4 EAMG 2.0
 
Intel® Xeon® Processor E5-2600 v4 Enterprise Database Applications Showcase
Intel® Xeon® Processor E5-2600 v4 Enterprise Database Applications ShowcaseIntel® Xeon® Processor E5-2600 v4 Enterprise Database Applications Showcase
Intel® Xeon® Processor E5-2600 v4 Enterprise Database Applications Showcase
 
Intel® Xeon® Processor E5-2600 v4 Core Business Applications Showcase
Intel® Xeon® Processor E5-2600 v4 Core Business Applications ShowcaseIntel® Xeon® Processor E5-2600 v4 Core Business Applications Showcase
Intel® Xeon® Processor E5-2600 v4 Core Business Applications Showcase
 
Intel® Xeon® Processor E5-2600 v4 Financial Security Applications Showcase
Intel® Xeon® Processor E5-2600 v4 Financial Security Applications ShowcaseIntel® Xeon® Processor E5-2600 v4 Financial Security Applications Showcase
Intel® Xeon® Processor E5-2600 v4 Financial Security Applications Showcase
 
Intel® Xeon® Processor E5-2600 v4 Telco Cloud Digital Applications Showcase
Intel® Xeon® Processor E5-2600 v4 Telco Cloud Digital Applications ShowcaseIntel® Xeon® Processor E5-2600 v4 Telco Cloud Digital Applications Showcase
Intel® Xeon® Processor E5-2600 v4 Telco Cloud Digital Applications Showcase
 
Intel® Xeon® Processor E5-2600 v4 Tech Computing Applications Showcase
Intel® Xeon® Processor E5-2600 v4 Tech Computing Applications ShowcaseIntel® Xeon® Processor E5-2600 v4 Tech Computing Applications Showcase
Intel® Xeon® Processor E5-2600 v4 Tech Computing Applications Showcase
 

Kürzlich hochgeladen

Introduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptxIntroduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptxMatsuo Lab
 
UiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPathCommunity
 
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...Aggregage
 
UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8DianaGray10
 
Building AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptxBuilding AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptxUdaiappa Ramachandran
 
Comparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioComparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioChristian Posta
 
UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7DianaGray10
 
Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.YounusS2
 
NIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopNIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopBachir Benyammi
 
Cybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxCybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxGDSC PJATK
 
AI You Can Trust - Ensuring Success with Data Integrity Webinar
AI You Can Trust - Ensuring Success with Data Integrity WebinarAI You Can Trust - Ensuring Success with Data Integrity Webinar
AI You Can Trust - Ensuring Success with Data Integrity WebinarPrecisely
 
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IES VE
 
Videogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfVideogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfinfogdgmi
 
Linked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond OntologiesLinked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond OntologiesDavid Newbury
 
How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?IES VE
 
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration WorkflowsIgniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration WorkflowsSafe Software
 
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...DianaGray10
 
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1DianaGray10
 

Kürzlich hochgeladen (20)

Introduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptxIntroduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptx
 
UiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation Developers
 
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
 
UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8
 
Building AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptxBuilding AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptx
 
Comparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioComparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and Istio
 
UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7
 
Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.
 
NIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopNIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 Workshop
 
Cybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxCybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptx
 
AI You Can Trust - Ensuring Success with Data Integrity Webinar
AI You Can Trust - Ensuring Success with Data Integrity WebinarAI You Can Trust - Ensuring Success with Data Integrity Webinar
AI You Can Trust - Ensuring Success with Data Integrity Webinar
 
20230104 - machine vision
20230104 - machine vision20230104 - machine vision
20230104 - machine vision
 
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
 
Videogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfVideogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdf
 
Linked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond OntologiesLinked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond Ontologies
 
How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?
 
20150722 - AGV
20150722 - AGV20150722 - AGV
20150722 - AGV
 
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration WorkflowsIgniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
 
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
 
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
 

Post-processing SAR images on Xeon Phi - a porting exercise

  • 1. Post-processing SAR images on Xeon Phi – a porting exercise Martin Hilgeman HPC Consultant EMEA
  • 2. Research Computing • Xeon Phi porting checklist • Post processing of SAR images • Questions Agenda Dell Confidential
  • 3. Research Computing SAR interferometry is used to monitor terrain displacements by using satellite data The phase difference between two SAR images is calculated that have been acquired at: › different times › with slightly different view angles › different observation dates Dell Confidential Case study: Geospatial imaging
  • 4. Research Computing Application characteristics: – Written in C90 and C++, ~ 150,000 lines of code – Serial application, using Intel MKL DFTI functions – Iterative scheme • MKL FFTs are already being used, little room for improvement – FFTs are too small to run in threaded mode • Application only runs in serial mode, needs to be parallelized in order to run on a Phi in reasonable speed • Phase images are divided into patches, which is suitable for parallelization – Parallelize the main loop using OpenMP Dell Confidential Code details
  • 5. Research Computing Test Setup Confidential5 Jobs ran at the TACC Stampede system PowerEdge C8220 with Intel® Xeon® E5-2680 2.7GHz – 16 cores – 32 GB memory – RHEL 6.3 – Intel® Xeon Phi™ 7120P
  • 6. Research Computing Parallelization of big main loop count = 1; if (lastGCP != NULL) { app = lastGCP->next; } else { app = bufferGCP; } while (app) { app->loc [0] = (int) app->loc [0]; app->loc [1] = (int) app->loc [1]; app->offset[0] = (int) app->offset[0]; app->offset[1] = (int) app->offset[1]; <lot of work> lastGCP = app; app = app->next; count++; } 6 Confidential count = 1; if (lastGCP != NULL) { app = lastGCP->next; } else { app = bufferGCP; } int i = 0; int nthreads, me; #if defined _OPENMP #pragma omp parallel private(i,me,app2,err,suboffset,qc,m_block,s_block) { nthreads = omp_get_num_threads(); me = omp_get_thread_num(); #else nthreads = 1; me = 0; #endif #pragma omp for schedule(static) for (i = 0; i < buffer_ngcp; i++) { #pragma omp critical { app2 = app; app = app->next; } app2->loc [0] = (int) app2->loc [0]; app2->loc [1] = (int) app2->loc [1]; app2->offset[0] = (int) app2->offset[0]; app2->offset[1] = (int) app2->offset[1]; <lot of work> lastGCP = app2; count++; } #if defined _OPENMP } #endif
  • 7. Research ComputingDell Confidential Results Wall Clock Time (ss:00) Original version on E5440 2.83 GHz 102.00 Original version on E5-2680 2.70 GHz 27.43 Additional source optimizations 25.83 2 threads 17.37 4 threads 9.97 6 threads 7.17 8 threads 5.77 12 threads 5.57
  • 8. Research Computing • Now runs multi-threaded using OpenMP directives – The number of patches is greater than the number of threads on the Phi • Memory footprint is small, so should fit on the card – Running in native mode is possible • Intel MKL has complete support for Xeon Phi – Assume that the FFTs are making optimal use of the resources Dell Confidential How about Intel® Xeon Phi™ performance?
  • 9. Research Computing Module Summary -------------------------------------------------------------------------------- Samples Self % Total % Module 247 59.81% 59.81% /lib64/libc-2.12.so 79 19.13% 78.93% /scratch/dell-guest/app 38 9.20% 88.14% /opt/intel/mkl/lib/intel64/libmkl_core.so 34 8.23% 96.37% /opt/intel/mkl/lib/intel64/libmkl_intel_thread.so 10 2.42% 98.79% /opt/intel/mkl/lib/intel64/libmkl_intel_lp64.so 5 1.21% 100.00% /lib64/ld-2.12.so File Summary -------------------------------------------------------------------------------- Samples Self % Total % File 332 80.39% 80.39% ?? 64 15.50% 95.88% malloc.c 10 2.42% 98.31% interp.c 3 0.73% 99.03% bsearch.c 2 0.48% 99.52% SRC/patch.c 2 0.48% 100.00% printf_fp.c Function Summary -------------------------------------------------------------------------------- Samples Self % Total % Function 126 30.51% 30.51% brk 69 16.71% 47.22% ?? 60 14.53% 61.74% _int_malloc 43 10.41% 72.15% pmatch 34 8.23% 80.39% memcpy 13 3.15% 83.54% get_correlation_real_mkl 10 2.42% 85.96% extract2double 8 1.94% 87.89% __intel_new_memcpy Dell Confidential A profile