Member of the Helmholtz Association
Scalable Parallel
Performance Measurement
with the Scalasca Toolset
Bernd Mohr
June 2013
June 2013 JSC 2
Parallel Architectures: State of the Art
[Figure: nodes N0 … Nk connected by a network or switch, drawn as a grid of routers. Each node contains processors P0 … Pn, elements labeled A0 … Am, and memory, linked by an interconnect, and is labeled SMP or NUMA. Each processor Pi contains cores Core0 … Corer with caches L1_0 … L1_r, L2_0 … L2_r/2, and L3_0.]
Parallel Performance Challenges
• Current and future systems (will) consist of
  - Complex configurations
  - With a huge number of components
  - Very likely heterogeneous
• Deep software hierarchies of large, complex software components will be required to make use of such systems
  - Sophisticated integrated performance measurement, analysis, and optimization capabilities will be required to operate such systems efficiently
  - Tools that provide insight, not just numbers or charts, are needed!
“A picture is worth 1000 words…”
• “Real world” example
• MPI ring program
“What about 1000’s of pictures?”
(with 100’s of menu options)
Example Automatic Analysis: Late Sender
Scalasca: Example MPI Patterns
[Figure: four time-line diagrams with axes "time" and "process" and event types ENTER, EXIT, SEND, RECV, COLLEXIT: (a) Late Sender, (b) Late Receiver, (c) Late Sender / Wrong Order, (d) Wait at N x N]
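As a rough illustration of how patterns (a), (b), and (d) turn event timestamps into waiting time, the sketch below reduces each operation to its ENTER time and ignores message transfer time. The function names and the simplified definitions are assumptions made for this example; they are not taken from Scalasca's implementation.

```python
def late_sender_wait(send_enter, recv_enter):
    """Late Sender: the receive is entered before the matching send,
    so the receiver is blocked until the send starts."""
    return max(0.0, send_enter - recv_enter)


def late_receiver_wait(send_enter, recv_enter):
    """Late Receiver: a synchronous send is entered before the matching
    receive, so the sender is blocked until the receive starts."""
    return max(0.0, recv_enter - send_enter)


def wait_at_nxn(enter_times):
    """Wait at N x N: in an all-to-all style collective, every process
    waits until the last participant has entered the operation."""
    last = max(enter_times)
    return [last - t for t in enter_times]


# Hypothetical timestamps in seconds.
receiver_blocked = late_sender_wait(send_enter=5.0, recv_enter=2.0)
sender_blocked = late_receiver_wait(send_enter=2.0, recv_enter=5.0)
collective_waits = wait_at_nxn([1.0, 4.0, 2.5])
```

Pattern (c), Late Sender / Wrong Order, additionally depends on the order in which messages are received and is not covered by this sketch.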
The Scalasca Project
• Scalable Analysis of Large Scale Applications
• Approach
  - Instrument C, C++, and Fortran parallel applications
  - Based on MPI, OpenMP, SHMEM, or hybrid
  - Option 1: scalable call-path profiling
  - Option 2: scalable event trace analysis
    - Collect event traces
    - Search trace for event patterns representing inefficiencies
    - Categorize and rank inefficiencies found
• Supports MPI 2.2 (P2P, collectives, RMA, IO) and OpenMP 3.0 (excl. nesting)
http://www.scalasca.org/
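The "categorize and rank" step of the trace analysis can be pictured as a simple aggregation. The sketch below assumes that the pattern search has already produced a list of findings, each a tuple of (pattern, call path, rank, waiting time). The tuple layout, the function name, and the call-path strings are hypothetical and only serve the illustration.

```python
from collections import defaultdict


def rank_inefficiencies(findings):
    """Sum waiting time per (pattern, call path) and sort by total, largest first."""
    totals = defaultdict(float)
    for pattern, call_path, _rank, wait in findings:
        totals[(pattern, call_path)] += wait
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical findings; waiting times in seconds.
findings = [
    ("Late Sender", "main/halo_update/MPI_Waitall", 0, 1.5),
    ("Late Sender", "main/halo_update/MPI_Waitall", 1, 0.5),
    ("Wait at N x N", "main/solve/MPI_Allreduce", 0, 0.75),
    ("Late Receiver", "main/exchange/MPI_Ssend", 2, 0.25),
]
ranked = rank_inefficiencies(findings)
```

Grouping by call path rather than by rank puts the most expensive source location first; the per-rank values can still be kept alongside to show how the waiting time is distributed.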
Scalasca Example: CESM Sea Ice Module
Late Sender Analysis
• Finds waiting at MPI_Waitall() inside ice boundary halo update
• Shows distribution of imbalance across system and ranks
Scalasca Example: CESM Sea Ice Module
Late Sender Analysis + Application Topology
• Shows distribution of imbalance over topology
• MPI topologies are automatically captured
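To give an idea of what viewing imbalance "over topology" means, the sketch below places one waiting-time value per rank onto a 2D grid. It assumes a two-dimensional Cartesian process layout with row-major rank numbering, which is the ordering MPI uses for Cartesian topologies. The function names and the sample values are hypothetical.

```python
def rank_to_coords(rank, dims):
    """Map a rank to (row, column) in a row-major 2D Cartesian layout."""
    _rows, cols = dims
    return divmod(rank, cols)


def wait_grid(wait_per_rank, dims):
    """Arrange per-rank waiting times as a rows x cols grid."""
    rows, cols = dims
    if len(wait_per_rank) != rows * cols:
        raise ValueError("number of ranks does not match grid size")
    grid = [[0.0] * cols for _ in range(rows)]
    for rank, wait in enumerate(wait_per_rank):
        row, col = rank_to_coords(rank, dims)
        grid[row][col] = wait
    return grid


# Hypothetical waiting times (seconds) for 6 ranks on a 2 x 3 grid.
grid = wait_grid([0.0, 1.0, 2.0, 3.0, 4.0, 5.0], (2, 3))
```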
Scalasca Root Cause Analysis
• Root-cause analysis
  - Wait states typically caused by load or communication imbalances earlier in the program
  - Waiting time can also propagate (e.g., indirect waiting time)
  - Enhanced performance analysis to find the root cause of wait states
• Approach
  - Distinguish between direct and indirect waiting time
  - Identify call path/process combinations delaying other processes and causing first-order waiting time
  - Identify original delay
[Figure: time-line of processes A, B, and C executing foo and bar with Send and Recv operations; labels mark a DELAY as the cause of a direct wait and an indirect wait at Recv operations.]
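The distinction between direct and indirect waiting time can be made concrete with a small chain of three processes. The sketch below is a simplified model, not Scalasca's algorithm: A sends to B, B receives, works for a while, and then sends to C, and message transfer time is ignored. It splits C's waiting time into the part that would exist even if B had never waited (direct) and the part that exists only because B's own wait pushed its send back (indirect). All names and timestamps are hypothetical.

```python
def late_sender_wait(send_enter, recv_enter):
    """Time a receiver is blocked because the matching send starts later."""
    return max(0.0, send_enter - recv_enter)


def split_wait_chain(a_send, b_recv, b_work, c_recv):
    """Return (wait_b, direct_c, indirect_c) for the chain A -> B -> C.

    a_send: time A enters its send to B
    b_recv: time B enters its receive from A
    b_work: time B computes between the receive and its send to C
    c_recv: time C enters its receive from B
    """
    # B waits directly for the late send on A.
    wait_b = late_sender_wait(a_send, b_recv)

    # B's send to C is pushed back by B's own waiting time.
    b_send = b_recv + wait_b + b_work
    wait_c = late_sender_wait(b_send, c_recv)

    # Hypothetical run in which B never had to wait for A.
    b_send_without_wait = b_recv + b_work
    direct_c = late_sender_wait(b_send_without_wait, c_recv)

    # The remainder of C's wait is propagated from B's wait state.
    indirect_c = wait_c - direct_c
    return wait_b, direct_c, indirect_c


propagated_only = split_wait_chain(a_send=5.0, b_recv=2.0, b_work=1.0, c_recv=4.0)
mixed = split_wait_chain(a_send=5.0, b_recv=2.0, b_work=3.0, c_recv=4.0)
```

In the first call, C's entire wait is indirect: without the delay on A, B would have sent before C was ready to receive. In the second call, B's longer work phase alone would already make C wait, so part of C's wait is direct.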
Scalasca Example: CESM Sea Ice Module
Direct Wait Time Analysis
• Direct wait caused by ranks processing areas near the north and south ice borders
Scalasca Example: CESM Sea Ice Module
Indirect Wait Time Analysis
• Indirect waits occur for ranks processing warmer areas
Scalasca Example: CESM Sea Ice Module
Delay Costs Analysis
• Delays NOT caused on ranks processing ice!
NEW: Scalasca on Intel MIC
Example:
• TACC Stampede
• NAS BT-MZ code
• MPI/OpenMP
• 8x16 CPU threads (2 MPI/node)
• 60x16 MIC threads (15 MPI/MIC)
Supported modes
• Host-only or MIC-only
• Symmetric
Not yet supported modes
• Offload
Acknowledgements
• Scalasca team (JSC) (GRS)
  Michael Knobloch, Bernd Mohr, Peter Philippen, Markus Geimer, Daniel Lorenz, Christian Rössel, David Böhme, Marc-André Hermanns, Pavel Saviankou, Marc Schlütter, Ilja Zhukov, Alexandre Strube, Brian Wylie, Felix Wolf, Anke Visser, Monika Lücke, Aamer Shah, Alexandru Calotoiu, Jie Jiang, Sergei Shudler, Guoyong Mao, Philipp Gschwandtner
• Sponsors
Questions?
• Check out http://www.scalasca.org
• Or contact us at scalasca@fz-juelich.de
