Instrumentation and
  analysis of NPB
         Zafar Gilani
         EMDC 2012
Measurement Tools and Techniques
             UPC
Outline
●   Introduction to benchmark app
●   Testbeds
●   Instrumentation
●   Traces
●   Measurement criteria
●   Evaluation
●   Anomalies
●   Conclusions

    Introduction to benchmark app
    ● NPB = NAS Parallel Benchmarks.
    ● A small set of programs designed to
      evaluate performance of parallel
      supercomputers.
    ● 5 kernels, 3 pseudo applications.
    ● 3 versions: Serial, OpenMP, MPI.
    ● 8 problem classes:
      ○   S - small, for quick tests
      ○   W - workstation size
      ○   A, B, C - standard tests, ~4x size increase from each class to the next
      ○   D, E, F - large tests, ~16x size increase from each class to the next
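
To make the class sizes concrete, here is a minimal sketch that tabulates the number of keys the IS (Integer Sort) kernel sorts per class. The exponents are an assumption recalled from the NPB 3.3 IS source (the TOTAL_KEYS_LOG_2 setting) and should be checked against your own copy; the helper names are hypothetical.

```python
# Key counts for the IS kernel, expressed as powers of two.
# Assumption: exponents recalled from the NPB 3.3 IS source
# (TOTAL_KEYS_LOG_2); verify them against your own copy.
IS_TOTAL_KEYS_LOG2 = {"S": 16, "W": 20, "A": 23, "B": 25, "C": 27, "D": 31}


def total_keys(npb_class):
    """Number of keys sorted by IS for the given problem class."""
    return 2 ** IS_TOTAL_KEYS_LOG2[npb_class]


def growth(from_class, to_class):
    """Size ratio between two problem classes."""
    return total_keys(to_class) // total_keys(from_class)


if __name__ == "__main__":
    for name in IS_TOTAL_KEYS_LOG2:
        print(f"class {name}: {total_keys(name):>13,d} keys")
    print("A->B:", growth("A", "B"), "B->C:", growth("B", "C"), "C->D:", growth("C", "D"))
```

Under these assumed exponents, class C comes to roughly 134 million keys, and each step among A, B and C is a 4x increase while the step from C to D is 16x.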

    Testbeds
                    Local                Remote
     Machine type   Laptop               Server
     Processor      Intel Core i3-330M   Intel Xeon E5645
     Clock (GHz)    2.13                 2.40
     Cores          2                    6
     Cache (MB)     3                    12
     Memory (GB)    3                    24

    Instrumentation
    ● Preload Extrae's MPI trace library
      "libmpitrace.so".
    ● The library intercepts all the MPI calls and
      traces all the MPI events.
    ● Instrumented and executed:
       ○ NPB version 3.3 stable
       ○ NPB3.3-MPI
       ○ IS (Integer Sort) kernel with 2, 4, 8, 16 and 32 procs
    ● Per experiment:
       ○ Size of problem: Class C, 135 million values approx.
       ○ Iterations: 10
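
The preload step above can be sketched as a small helper that builds one launch command per process count. This is an illustration under stated assumptions, not the commands used in the study: the install path, the configuration file name, the binary directory and the `is.<class>.<nprocs>` naming are all hypothetical placeholders to adjust for your system. The only mechanism relied on is the standard `env` utility setting `LD_PRELOAD` and `EXTRAE_CONFIG_FILE` for each MPI rank.

```python
import shlex

# Hypothetical locations; adjust to your own installation.
EXTRAE_HOME = "/opt/extrae"
EXTRAE_CONFIG = "extrae.xml"
NPB_BIN_DIR = "NPB3.3-MPI/bin"


def traced_command(nprocs, npb_class="C"):
    """Build an mpirun command that preloads libmpitrace.so for every rank."""
    # Assumed binary naming: is.<class>.<nprocs>
    binary = f"{NPB_BIN_DIR}/is.{npb_class}.{nprocs}"
    return [
        "mpirun", "-np", str(nprocs),
        # 'env' sets the variables for the traced program only.
        "env",
        f"EXTRAE_CONFIG_FILE={EXTRAE_CONFIG}",
        f"LD_PRELOAD={EXTRAE_HOME}/lib/libmpitrace.so",
        binary,
    ]


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be inspected first.
    for n in (2, 4, 8, 16, 32):
        print(shlex.join(traced_command(n)))
```

Printing rather than executing keeps the sketch safe to run anywhere; passing the list to `subprocess.run` would launch the experiment.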

    Local traces
    [Figure: execution trace, local machine; labelled regions: Exec, Comm]

    Remote traces
    [Figure: execution trace, remote machine]
Evaluation & Comparative
         Analysis

    Measurement criteria
    Metric               Relevance to NPB-MPI Integer Sort
    Computation time     General idea of speed-up.
    Communication time   Impact of increasing the number of processes
                         on communication.
    Load imbalance       Which processes or threads do less work than
                         the others.
    Bottlenecks          Where time is lost, e.g. in long waits.
    L1 cache misses      How often the CPU had to go beyond the L1
                         cache (to lower cache levels or main memory)
                         to find data.

    Computation time
    ● Measured: thread processing time.
    ● Local:
      ○ time increases as nprocs increases
      ○ up to 32 processes
      ○ poor scalability
    ● Remote:
      ○ time decreases as nprocs increases
      ○ up to 32 processes
      ○ good scalability
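
The scalability judgement above is usually quantified as speed-up and parallel efficiency relative to the smallest run. The sketch below shows the arithmetic; the timings in it are made-up placeholders, not measurements from this study, and the function names are hypothetical.

```python
def speedup(times, baseline):
    """Relative speed-up S(p) = T(baseline) / T(p) for each process count p."""
    t0 = times[baseline]
    return {p: t0 / t for p, t in sorted(times.items())}


def efficiency(times, baseline):
    """Parallel efficiency E(p) = S(p) * baseline / p (1.0 means ideal scaling)."""
    return {p: s * baseline / p for p, s in speedup(times, baseline).items()}


if __name__ == "__main__":
    # Placeholder computation times in seconds, keyed by nprocs (NOT measured data).
    example = {2: 8.0, 4: 4.0, 8: 2.5, 16: 2.0, 32: 2.0}
    for p, e in efficiency(example, baseline=2).items():
        print(f"nprocs={p:2d}  efficiency={e:.2f}")
```

An efficiency that stays near 1.0 as nprocs grows corresponds to the "good scalability" case; a steadily falling value corresponds to the poor one.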


    Communication time
    ● Overall communication time is determined
      by the process that takes the longest.
    ● Local:
      ○ rapid increase in time as the number of processes
        increases
    ● Remote:
      ○ small increase in time as the number of processes
        increases
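
The rule that the slowest process sets the overall communication time can be written in a few lines. The per-process values below are hypothetical placeholders, and ranks are numbered from zero purely for illustration.

```python
def overall_comm_time(per_rank_ms):
    """Overall communication time is that of the slowest process."""
    return max(per_rank_ms)


def slowest_rank(per_rank_ms):
    """Index of the process that determines the overall time."""
    return max(range(len(per_rank_ms)), key=per_rank_ms.__getitem__)


if __name__ == "__main__":
    # Placeholder communication times in ms, one entry per process (NOT measured data).
    comm_ms = [120.0, 95.0, 310.0, 101.0]
    print("overall:", overall_comm_time(comm_ms), "ms, set by rank", slowest_rank(comm_ms))
```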


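     Load Imbalance
     ● On boada (the remote server):
       ○ For nprocs = 4, threads 2 and 3 do less work than the rest.
       ○ For nprocs = 16, threads 5, 6, 7, 8 and 12 do less work than the rest.
     [Figure: execution trace; labelled regions: Exec, Wait, Comm]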
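
One simple way to flag under-loaded threads from per-thread execution times is to compare each against the mean. This is a sketch of that idea only; the 80% threshold, the function names and the sample values are assumptions chosen for illustration, not the method or data behind the slide.

```python
from statistics import mean


def lazy_ranks(exec_ms, threshold=0.8):
    """Indices whose execution time is below `threshold` times the mean."""
    avg = mean(exec_ms)
    return [r for r, t in enumerate(exec_ms) if t < threshold * avg]


def imbalance_factor(exec_ms):
    """Ratio of the busiest thread to the mean; 1.0 means perfectly balanced."""
    return max(exec_ms) / mean(exec_ms)


if __name__ == "__main__":
    # Placeholder execution times in ms, indexed from zero (NOT measured data).
    exec_ms = [100.0, 100.0, 50.0, 50.0]
    print("under-loaded:", lazy_ranks(exec_ms))
    print("imbalance factor:", round(imbalance_factor(exec_ms), 2))
```

The indices here are zero-based positions in the list and need not match the thread numbering used in the traces.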

     Bottlenecks
     ● For nprocs = {8, 16, 32}, one or more
       processes take longer than the rest.
       ○ Visible as Wait/Wait All in the traces.
       ○ Typical wait time on the local machine is around 1000 ms.
       ○ Typical wait time on the remote machine is around 250 ms.
         ■ 4x difference (threads on the remote machine have
            shorter wait times).

     [Figure: execution trace; labelled regions: Wait, I/O]

     L1 cache misses
     ● Cache misses on the local machine are more
       expensive: typically costing 5x more time.
       ○ Cache size difference? The local machine has to "look"
          elsewhere more often.
          ■ i3 has 3 MB of (last-level) cache.
          ■ Xeon has 12 MB of (last-level) cache.
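
One way to probe the cache-size question is a rough estimate of how much key data each process holds. The sketch assumes class C has 2^27 keys of 4 bytes each, spread evenly over the processes; both figures are assumptions recalled from the NPB IS source rather than measurements from this study, and the estimate ignores every other buffer the kernel allocates.

```python
KEY_BYTES = 4            # assumption: keys are 4-byte integers for class C
TOTAL_KEYS = 2 ** 27     # assumption: class C key count
MIB = 1024 * 1024


def key_share_mib(nprocs):
    """Approximate size of the key array held by each process, in MiB."""
    return TOTAL_KEYS * KEY_BYTES / nprocs / MIB


if __name__ == "__main__":
    # Cache sizes from the testbed table, treated here as MiB.
    caches = {"local (i3)": 3, "remote (Xeon)": 12}
    for n in (2, 4, 8, 16, 32):
        share = key_share_mib(n)
        fits = [name for name, size in caches.items() if share <= size]
        print(f"nprocs={n:2d}  share={share:6.1f} MiB  fits in: {fits or 'neither'}")
```

If these assumptions hold, the per-process share stays larger than either cache even at 32 processes, so capacity alone may not fully explain the difference; the cost per miss on each machine would also need to be considered.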


     Anomalies
     ● For 32 threads:
       ○ Time taken to spawn the threads varies.
       ○ The remote machine takes less time to spawn 32 threads.
       ○ Possible reason:
         ■ Acquiring locks and switching between resource
              acquisition and release is costly.
     ● Time taken by "other" jobs also varies:
       ○ But this generally varies from system to system.

     [Figure: execution trace; labelled regions: Spawning, Others ??]

     Conclusions
     ● Instrumentation is necessary to gain
       insight into the performance of parallel code.
     ● Extrae offers a handy procedure for
       automatic instrumentation.
     ● Some interesting observations:
       ○ IS does not scale properly on low-end machines
         beyond 16 procs.
       ○ IS scales nicely on a server such as boada.
       ○ IS becomes communication-intensive as
         nprocs increases.
       ○ Some bottlenecks degrade performance.