Parallel Computing Platforms. Ananth Grama, Anshul Gupta, George Karypis, and Vipin Kumar. To accompany the text "Introduction to Parallel Computing", Addison Wesley, 2003.
Topic Overview
Scope of Parallelism
Implicit Parallelism: Trends in Microprocessor Architectures
Pipelining and Superscalar Execution
Superscalar Execution: An Example. Figure: Example of a two-way superscalar execution of instructions.
Superscalar Execution
Superscalar Execution: Issue Mechanisms
Superscalar Execution: Efficiency Considerations
Very Long Instruction Word (VLIW) Processors
Very Long Instruction Word (VLIW) Processors: Considerations
Limitations of Memory System Performance
Memory System Performance: Bandwidth and Latency
Memory Latency: An Example
Improving Effective Memory Latency Using Caches
Impact of Caches: Example
Impact of Caches: Example (continued)
Impact of Caches
Impact of Memory Bandwidth
Impact of Memory Bandwidth: Example
Impact of Memory Bandwidth
Impact of Memory Bandwidth: Example. Figure: Multiplying a matrix with a vector: (a) multiplying column-by-column, keeping a running sum; (b) computing each element of the result as a dot product of a row of the matrix with the vector.
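The two traversal orders named in the figure caption can be sketched as follows. This is a minimal illustration, not the slide's own code; the function names and the sample matrix are assumptions. With row-major storage (as in nested Python lists or C arrays), variant (b) walks each row contiguously, while variant (a) strides across rows for every column.

```python
def matvec_by_columns(A, x):
    """(a) Sweep the matrix column by column, keeping a running sum in y."""
    n_rows = len(A)
    y = [0] * n_rows
    for j, x_j in enumerate(x):          # one column at a time
        for i in range(n_rows):          # strided access down the column
            y[i] += A[i][j] * x_j
    return y


def matvec_by_rows(A, x):
    """(b) Each result element is the dot product of one matrix row with x."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


# Hypothetical sample data, chosen only for illustration.
A = [[1, 2, 3],
     [4, 5, 6]]
x = [1, 0, 2]
print(matvec_by_columns(A, x))  # [7, 16]
print(matvec_by_rows(A, x))     # [7, 16]
```

Both variants compute the same product; they differ only in the order in which matrix elements are touched.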
Impact of Memory Bandwidth: Example (continued)
Memory System Performance: Summary
Alternate Approaches for Hiding Memory Latency
Multithreading for Latency Hiding
Multithreading for Latency Hiding: Example
Multithreading for Latency Hiding
Prefetching for Latency Hiding
Tradeoffs of Multithreading and Prefetching
Explicitly Parallel Platforms
Dichotomy of Parallel Computing Platforms
Control Structure of Parallel Programs
SIMD and MIMD Processors. Figure: A typical SIMD architecture (a) and a typical MIMD architecture (b).
SIMD Processors
Conditional Execution in SIMD Processors. Figure: Executing a conditional statement on an SIMD computer with four processors: (a) the conditional statement; (b) the execution of the statement in two steps.
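The two-step execution described in the caption can be emulated in a few lines. This is a sketch under stated assumptions: the conditional (`if B == 0: C = A, else C = A / B`) and the four sample values are hypothetical stand-ins, since the transcript does not preserve the statement shown on the slide. Each list index plays the role of one processing element, and a mask decides which elements are active in each step.

```python
def simd_conditional(A, B):
    """Emulate `if B == 0: C = A else: C = A / B` on lockstep processing elements."""
    n = len(A)
    C = [None] * n
    mask = [b == 0 for b in B]   # every element evaluates the condition

    # Step 1: elements whose condition holds execute the "then" branch;
    # the others sit idle.
    for pe in range(n):
        if mask[pe]:
            C[pe] = A[pe]

    # Step 2: the remaining elements execute the "else" branch;
    # the first group sits idle.
    for pe in range(n):
        if not mask[pe]:
            C[pe] = A[pe] / B[pe]
    return C, mask


# Hypothetical data for four processing elements.
A = [5, 4, 1, 0]
B = [0, 2, 1, 0]
C, mask = simd_conditional(A, B)
print(C)     # [5, 2.0, 1.0, 0]
print(mask)  # [True, False, False, True]
```

Because the two branches run one after the other, each processing element is idle for one of the two steps.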
MIMD Processors
SIMD-MIMD Comparison
Communication Model of Parallel Platforms
Shared-Address-Space Platforms
NUMA and UMA Shared-Address-Space Platforms. Figure: Typical shared-address-space architectures: (a) Uniform-memory-access shared-address-space computer; (b) Uniform-memory-access shared-address-space computer with caches and memories; (c) Non-uniform-memory-access shared-address-space computer with local memory only.
Shared-Address-Space vs. Shared Memory Machines
Message-Passing Platforms
Message Passing vs. Shared Address Space Platforms
Physical Organization of Parallel Platforms
Architecture of an Ideal Parallel Computer
Physical Complexity of an Ideal Parallel Computer
Interconnection Networks for Parallel Computers
Static and Dynamic Interconnection Networks. Figure: Classification of interconnection networks: (a) a static network; and (b) a dynamic network.
Interconnection Networks
Interconnection Networks: Network Interfaces
Network Topologies
Network Topologies: Buses. Figure: Bus-based interconnects (a) with no local caches; (b) with local memory/caches. Since much of the data accessed by processors is local to the processor, a local memory can improve the performance of bus-based machines.
Network Topologies: Crossbars. Figure: A completely non-blocking crossbar network connecting p processors to b memory banks. A crossbar network uses a p × m grid of switches to connect p inputs to m outputs in a non-blocking manner.
Network Topologies: Multistage Networks. Figure: The schematic of a typical multistage interconnection network.
Network Topologies: Multistage Omega Network. Each stage of the Omega network implements a perfect shuffle. Figure: A perfect shuffle interconnection for eight inputs and outputs.
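The perfect shuffle for eight inputs and outputs can be sketched as a one-bit left rotation of the input's binary label. The function name `perfect_shuffle` is an assumption introduced for illustration; the rotation itself is the standard definition of the perfect shuffle.

```python
def perfect_shuffle(i, p):
    """Output position of input i in a p-input perfect shuffle (p a power of two).

    Equivalent to rotating the log2(p)-bit label of i left by one bit,
    i.e. 2*i for i < p/2 and 2*i + 1 - p otherwise.
    """
    bits = p.bit_length() - 1
    return ((i << 1) | (i >> (bits - 1))) & (p - 1)


for i in range(8):
    print(f"{i:03b} -> {perfect_shuffle(i, 8):03b}")
```

For eight inputs this maps 000, 001, 010, 011 to 000, 010, 100, 110 and 100, 101, 110, 111 to 001, 011, 101, 111.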
Network Topologies: Multistage Omega Network (continued). Figure: Two switching configurations of the 2 × 2 switch: (a) pass-through; (b) cross-over.
Network Topologies: Multistage Omega Network (continued). A complete Omega network combines the perfect shuffle interconnects with the switches. Figure: A complete omega network connecting eight inputs and eight outputs. An omega network has p/2 × log p switching nodes, and the cost of such a network grows as Θ(p log p).
Network Topologies: Multistage Omega Network – Routing. Figure: An example of blocking in omega network: one of the messages (010 to 111 or 110 to 100) is blocked at link AB.
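The blocking example in the caption can be reproduced with a small simulation. This sketch assumes the usual destination-tag routing for omega networks: each stage applies a perfect shuffle, and the 2 × 2 switch then forwards the message to the output whose lowest bit equals the next destination bit, most significant bit first. The function names are hypothetical.

```python
def omega_route(src, dst, p):
    """Return the link index a message occupies after each of the log2(p) stages."""
    stages = p.bit_length() - 1
    pos, links = src, []
    for stage in range(stages):
        # Perfect shuffle: rotate the label left by one bit.
        pos = ((pos << 1) | (pos >> (stages - 1))) & (p - 1)
        # Switch: pass-through or cross-over so that the lowest bit
        # matches the destination bit for this stage (MSB first).
        dst_bit = (dst >> (stages - 1 - stage)) & 1
        pos = (pos & ~1) | dst_bit
        links.append(pos)
    return links


def first_conflict(route_a, route_b):
    """First (stage, link) at which two routes need the same link, or None."""
    for stage, (a, b) in enumerate(zip(route_a, route_b)):
        if a == b:
            return stage, a
    return None


r1 = omega_route(0b010, 0b111, 8)
r2 = omega_route(0b110, 0b100, 8)
print(r1, r2)                  # [5, 3, 7] [5, 2, 4]
print(first_conflict(r1, r2))  # (0, 5)
```

Under these assumptions both messages need output 101 of the first stage, so only one of them can proceed, which is consistent with the blocking shown in the figure.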
Network Topologies: Completely Connected Network
Network Topologies: Completely Connected and Star Connected Networks. Figure: (a) A completely-connected network of eight nodes; (b) a star connected network of nine nodes.
Network Topologies: Star Connected Network
Network Topologies: Linear Arrays, Meshes, and k-d Meshes
Network Topologies: Linear Arrays. Figure: Linear arrays: (a) with no wraparound links; (b) with wraparound link.
Network Topologies: Two- and Three-Dimensional Meshes. Figure: Two- and three-dimensional meshes: (a) 2-D mesh with no wraparound; (b) 2-D mesh with wraparound link (2-D torus); and (c) a 3-D mesh with no wraparound.
Network Topologies: Hypercubes and their Construction. Figure: Construction of hypercubes from hypercubes of lower dimension.
Network Topologies: Properties of Hypercubes
Network Topologies: Tree-Based Networks. Figure: Complete binary tree networks: (a) a static tree network; and (b) a dynamic tree network.
Network Topologies: Tree Properties
Network Topologies: Fat Trees. Figure: A fat tree network of 16 processing nodes.
Evaluating Static Interconnection Networks. Table columns: Network, Diameter, Bisection Width, Arc Connectivity, Cost (No. of links). Table rows: Completely-connected, Star, Complete binary tree, Linear array, 2-D mesh (no wraparound), 2-D wraparound mesh, Hypercube, Wraparound k-ary d-cube.
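Two of the table's metrics, diameter and cost (number of links), can be measured directly on small instances instead of being quoted from formulas. The sketch below builds four of the listed topologies as adjacency lists and computes both metrics by breadth-first search. All function names are assumptions introduced for illustration, and bisection width and arc connectivity are left out to keep the example short.

```python
from collections import deque
from itertools import product


def linear_array(p):
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < p] for i in range(p)}


def mesh_2d(side):
    """side x side mesh with no wraparound links."""
    adj = {}
    for r, c in product(range(side), repeat=2):
        nbrs = [(r + dr, c + dc) for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))]
        adj[(r, c)] = [(a, b) for a, b in nbrs if 0 <= a < side and 0 <= b < side]
    return adj


def hypercube(d):
    """Nodes are d-bit labels; neighbors differ in exactly one bit."""
    return {i: [i ^ (1 << k) for k in range(d)] for i in range(1 << d)}


def completely_connected(p):
    return {i: [j for j in range(p) if j != i] for i in range(p)}


def diameter(adj):
    """Largest shortest-path distance between any two nodes (BFS from every node)."""
    best = 0
    for start in adj:
        dist = {start: 0}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        best = max(best, max(dist.values()))
    return best


def link_count(adj):
    """Each undirected link appears in two adjacency lists."""
    return sum(len(nbrs) for nbrs in adj.values()) // 2


# All four networks have p = 16 nodes.
networks = {
    "Completely-connected": completely_connected(16),
    "Linear array": linear_array(16),
    "2-D mesh, no wraparound": mesh_2d(4),
    "Hypercube": hypercube(4),
}
for name, adj in networks.items():
    print(f"{name}: diameter={diameter(adj)}, links={link_count(adj)}")
```

For 16 nodes this reports diameter 1 with 120 links for the completely-connected network, 15 with 15 for the linear array, 6 with 24 for the 2-D mesh, and 4 with 32 for the hypercube.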
Evaluating Dynamic Interconnection Networks. Table columns: Network, Diameter, Bisection Width, Arc Connectivity, Cost (No. of links). Table rows: Crossbar, Omega Network, Dynamic Tree.
Cache Coherence in Multiprocessor Systems. Figure: Cache coherence in multiprocessor systems: (a) Invalidate protocol; (b) Update protocol for shared variables. When the value of a variable is changed, all its copies must either be invalidated or updated.
Cache Coherence: Update and Invalidate Protocols
Maintaining Coherence Using Invalidate Protocols. Figure: State diagram of a simple three-state coherence protocol.
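A three-state invalidate protocol can be sketched as a small state machine. This is a simplified illustration and not necessarily the exact diagram on the slide: it assumes the states are invalid, shared, and dirty, tracks a single cache line, and ignores data values, flushes, and bus timing. The class and method names are hypothetical.

```python
INVALID, SHARED, DIRTY = "invalid", "shared", "dirty"


class CoherentLine:
    """State of one memory line in each of n_caches caches."""

    def __init__(self, n_caches):
        self.state = [INVALID] * n_caches

    def read(self, who):
        if self.state[who] == INVALID:
            # A dirty copy elsewhere is downgraded to shared
            # before the reader obtains its own shared copy.
            self.state = [SHARED if s == DIRTY else s for s in self.state]
            self.state[who] = SHARED
        # Reads of a shared or dirty line hit locally: no state change.

    def write(self, who):
        # The writer invalidates every other copy and holds the line dirty.
        self.state = [INVALID] * len(self.state)
        self.state[who] = DIRTY


line = CoherentLine(2)
line.read(0);  print(line.state)   # ['shared', 'invalid']
line.read(1);  print(line.state)   # ['shared', 'shared']
line.write(0); print(line.state)   # ['dirty', 'invalid']
line.read(1);  print(line.state)   # ['shared', 'shared']
```

The trace shows the key property of an invalidate protocol: a write leaves exactly one valid copy, and later readers re-fetch the line.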
Maintaining Coherence Using Invalidate Protocols (continued). Figure: Example of parallel program execution with the simple three-state coherence protocol.
Snoopy Cache Systems. How are invalidates sent to the right processors? In snoopy caches, the processors share a broadcast medium; each cache listens to all invalidates and read requests on it and performs the appropriate coherence operations locally. Figure: A simple snoopy bus based cache coherence system.
Performance of Snoopy Caches
Directory Based Systems. Figure: Architecture of typical directory based systems: (a) a centralized directory; and (b) a distributed directory.
Performance of Directory Based Schemes
Communication Costs in Parallel Machines
Message Passing Costs in Parallel Computers
Store-and-Forward Routing
Routing Techniques. Figure: Passing a message from node P0 to P3 (a) through a store-and-forward communication network; (b) and (c) extending the concept to cut-through routing. The shaded regions represent the time that the message is in transit. The startup time associated with this message transfer is assumed to be zero.
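The difference between the two routing techniques in the figure can be made concrete with the cost models commonly used for them. The formulas below are the standard textbook forms rather than text recovered from these slides, and the symbol names are assumptions: `t_s` is the startup time, `t_h` the per-hop time, `t_w` the per-word transfer time, `m` the message size in words, and `l` the number of links traversed.

```python
def store_and_forward_time(m, l, t_s, t_h, t_w):
    """Each intermediate node receives the whole message before forwarding it."""
    return t_s + (m * t_w + t_h) * l


def cut_through_time(m, l, t_s, t_h, t_w):
    """The message is pipelined across links, so the hop and size terms add."""
    return t_s + l * t_h + m * t_w


# Hypothetical numbers: a 1000-word message over the 3 links from P0 to P3,
# with zero startup time as assumed in the figure.
m, l, t_s, t_h, t_w = 1000, 3, 0, 1, 1
print(store_and_forward_time(m, l, t_s, t_h, t_w))  # 3003
print(cut_through_time(m, l, t_s, t_h, t_w))        # 1003
```

With store-and-forward routing the message-size term is multiplied by the number of links, whereas with cut-through routing the two terms are additive.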
Packet Routing
Cut-Through Routing
Simplified Cost Model for Communicating Messages
Cost Models for Shared Address Space Machines
Routing Mechanisms for Interconnection Networks. Figure: Routing a message from node Ps (010) to node Pd (111) in a three-dimensional hypercube using E-cube routing.
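E-cube routing for the example in the caption can be sketched as follows. The sketch assumes the usual dimension-ordered rule: at each step the message crosses the dimension given by the least significant nonzero bit of the XOR of the current node and the destination. The function name is hypothetical.

```python
def ecube_route(src, dst):
    """Return the list of hypercube nodes visited from src to dst."""
    path = [src]
    cur = src
    diff = cur ^ dst
    while diff:
        lowest = diff & -diff   # least significant bit in which cur and dst differ
        cur ^= lowest           # cross that dimension
        path.append(cur)
        diff = cur ^ dst
    return path


path = ecube_route(0b010, 0b111)
print([f"{node:03b}" for node in path])  # ['010', '011', '111']
```

The number of hops equals the number of bits in which the source and destination labels differ, here two.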
Mapping Techniques for Graphs
Mapping Techniques for Graphs: Metrics
Embedding a Linear Array into a Hypercube
Embedding a Linear Array into a Hypercube: Example. Figure: (a) A three-bit reflected Gray code ring; and (b) its embedding into a three-dimensional hypercube.
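The three-bit reflected Gray code ring can be generated and checked in a few lines. The sketch uses the closed form `i ^ (i >> 1)` for the binary-reflected Gray code, which is a swapped-in equivalent of the recursive definition; the helper names are assumptions.

```python
def gray(i):
    """i-th codeword of the binary-reflected Gray code."""
    return i ^ (i >> 1)


def hamming(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


d = 3
ring = [gray(i) for i in range(1 << d)]
print([f"{g:03b}" for g in ring])
# ['000', '001', '011', '010', '110', '111', '101', '100']

# Consecutive ring nodes (including the wraparound pair) land on
# hypercube nodes that differ in exactly one bit, i.e. on direct neighbors.
print(all(hamming(ring[i], ring[(i + 1) % len(ring)]) == 1 for i in range(len(ring))))  # True
```

Node i of the linear array or ring is placed on hypercube node `gray(i)`, so every array link maps to a single hypercube link.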
Embedding a Mesh into a Hypercube. Figure: (a) A 4 × 4 mesh illustrating the mapping of mesh nodes to the nodes in a four-dimensional hypercube; and (b) a 2 × 4 mesh embedded into a three-dimensional hypercube. Once again, the congestion, dilation, and expansion of the mapping are 1.
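The mesh-to-hypercube mapping can be sketched by concatenating the Gray codes of the row and column indices. This assumes a 2^r × 2^s mesh mapped onto an (r + s)-dimensional hypercube; the function names are hypothetical, and the check below covers only the non-wraparound mesh links.

```python
def gray(i):
    """i-th codeword of the binary-reflected Gray code."""
    return i ^ (i >> 1)


def mesh_to_hypercube(i, j, s):
    """Label of mesh node (i, j): Gray code of i followed by the s-bit Gray code of j."""
    return (gray(i) << s) | gray(j)


def max_dilation(r, s):
    """Largest hypercube distance between the images of two adjacent mesh nodes."""
    rows, cols = 1 << r, 1 << s
    worst = 0
    for i in range(rows):
        for j in range(cols):
            here = mesh_to_hypercube(i, j, s)
            for ni, nj in ((i + 1, j), (i, j + 1)):
                if ni < rows and nj < cols:
                    there = mesh_to_hypercube(ni, nj, s)
                    worst = max(worst, bin(here ^ there).count("1"))
    return worst


print(max_dilation(2, 2))  # 4 x 4 mesh into a 4-D hypercube: 1
print(max_dilation(1, 2))  # 2 x 4 mesh into a 3-D hypercube: 1
```

Because the mapping is one-to-one onto all 2^(r+s) hypercube nodes, the expansion is also 1.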
Embedding a Mesh into a Linear Array
Embedding a Mesh into a Linear Array: Example. Figure: (a) Embedding a 16-node linear array into a 2-D mesh; and (b) the inverse of the mapping. Solid lines correspond to links in the linear array and normal lines to links in the mesh.
Embedding a Hypercube into a 2-D Mesh
Embedding a Hypercube into a 2-D Mesh: Example. Figure: Embedding a hypercube into a 2-D mesh.
Case Studies: The IBM Blue-Gene Architecture. Figure: The hierarchical architecture of Blue Gene.
Case Studies: The Cray T3E Architecture. Figure: Interconnection network of the Cray T3E: (a) node architecture; (b) network topology.
Case Studies: The SGI Origin 3000 Architecture. Figure: Architecture of the SGI Origin 3000 family of servers.
Case Studies: The Sun HPC Server Architecture. Figure: Architecture of the Sun Enterprise family of servers.

Chap2 slides

  • 1. Parallel Computing Platforms Ananth Grama, Anshul Gupta, George Karypis, and Vipin Kumar To accompany the text ``Introduction to Parallel Computing'', Addison Wesley, 2003.
  • 2.
  • 3.
  • 4.
  • 5.
  • 6.
  • 7.
  • 8. Superscalar Execution: An Example Example of a two-way superscalar execution of instructions.
  • 9.
  • 10.
  • 11.
  • 12.
  • 13.
  • 14.
  • 15.
  • 16.
  • 17.
  • 18.
  • 19.
  • 20.
  • 21.
  • 22.
  • 23.
  • 24.
  • 25.
  • 26.
  • 27.
  • 28.
  • 29.
  • 30.
  • 31.
  • 32.
  • 33.
  • 34.
  • 35.
  • 36.
  • 37.
  • 39.
  • 40.
  • 41.
  • 42. SIMD and MIMD Processors A typical SIMD architecture (a) and a typical MIMD architecture (b).
  • 43.
  • 44. Conditional Execution in SIMD Processors Executing a conditional statement on an SIMD computer with four processors: (a) the conditional statement; (b) the execution of the statement in two steps.
  • 45.
  • 46.
  • 47.
  • 48.
  • 49. NUMA and UMA Shared-Address-Space Platforms Typical shared-address-space architectures: (a) Uniform-memory access shared-address-space computer; (b) Uniform-memory-access shared-address-space computer with caches and memories; (c) Non-uniform-memory-access shared-address-space computer with local memory only.
  • 50.
  • 51.
  • 52.
  • 53.
  • 54.
  • 55.
  • 56.
  • 57.
  • 58.
  • 59.
  • 60. Static and Dynamic Interconnection Networks Classification of interconnection networks: (a) a static network; and (b) a dynamic network.
  • 61.
  • 62.
  • 63.
  • 64.
  • 65. Network Topologies: Buses Bus-based interconnects (a) with no local caches; (b) with local memory/caches. Since much of the data accessed by processors is local to the processor, a local memory can improve the performance of bus-based machines.
  • 66. Network Topologies: Crossbars A completely non-blocking crossbar network connecting p processors to b memory banks. A crossbar network uses an p×m grid of switches to connect p inputs to m outputs in a non-blocking manner.
  • 67.
  • 68.
  • 69. Network Topologies: Multistage Networks The schematic of a typical multistage interconnection network.
  • 70.
  • 71. Network Topologies: Multistage Omega Network Each stage of the Omega network implements a perfect shuffle as follows: A perfect shuffle interconnection for eight inputs and outputs.
  • 72.
  • 73. Network Topologies: Multistage Omega Network. A complete omega network with the perfect shuffle interconnects and switches can now be illustrated. Figure: a complete omega network connecting eight inputs and eight outputs. An omega network has (p/2) × log p switching nodes, and the cost of such a network grows as Θ(p log p).
  • 75. Network Topologies: Multistage Omega Network – Routing. An example of blocking in an omega network: one of the messages (010 to 111 or 110 to 100) is blocked at link AB.
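The blocking in this example can be reproduced with a small simulation. It is a sketch under stated assumptions: each stage applies a perfect shuffle (a left rotation of the line number), and each switch then sets the low bit of the output line to the next destination bit, most significant bit first. The function name `omega_route` and the link numbering are illustrative; the label AB comes from the figure, which the transcript does not preserve.

```python
def omega_route(src, dst, bits):
    """Return the output line a message occupies after each stage."""
    mask = (1 << bits) - 1
    pos, links = src, []
    for stage in range(bits):
        # Perfect shuffle before the switches: left-rotate the line number.
        pos = ((pos << 1) & mask) | (pos >> (bits - 1))
        # Pass-through or crossover so that the low bit of the output line
        # equals the destination bit examined at this stage (MSB first).
        d_bit = (dst >> (bits - 1 - stage)) & 1
        pos = (pos & ~1) | d_bit
        links.append(pos)
    return links


a = omega_route(0b010, 0b111, 3)
b = omega_route(0b110, 0b100, 3)
# Stages at which both messages need the same output line.
conflicts = [(stage, x) for stage, (x, y) in enumerate(zip(a, b)) if x == y]
print(a, b, conflicts)  # [5, 3, 7] [5, 2, 4] [(0, 5)]
```

Under these assumptions both messages need output line 101 of the first stage, so one of them has to wait.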
  • 77. Network Topologies: Completely Connected and Star Connected Networks. (a) A completely connected network of eight nodes; (b) a star-connected network of nine nodes.
  • 80. Network Topologies: Linear Arrays. Linear arrays: (a) with no wraparound links; (b) with wraparound link.
  • 81. Network Topologies: Two- and Three-Dimensional Meshes. Two- and three-dimensional meshes: (a) 2-D mesh with no wraparound; (b) 2-D mesh with wraparound link (2-D torus); and (c) a 3-D mesh with no wraparound.
  • 82. Network Topologies: Hypercubes and their Construction. Construction of hypercubes from hypercubes of lower dimension.
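The recursive construction can be sketched as follows, assuming the usual scheme: take two copies of the (d-1)-dimensional hypercube, prefix the labels of one copy with 0 and of the other with 1, and link corresponding nodes. The function name `hypercube` and the adjacency-dictionary representation are illustrative choices.

```python
def hypercube(d):
    """Return {label: set_of_neighbor_labels} for a d-dimensional hypercube."""
    if d == 0:
        return {0: set()}                  # a single node, no links
    lower = hypercube(d - 1)
    high = 1 << (d - 1)                    # the new most significant bit
    adj = {}
    for node, nbrs in lower.items():
        # Copy with prefix 0: old links plus a link to the twin node.
        adj[node] = set(nbrs) | {node | high}
        # Copy with prefix 1: relabeled old links plus a link to the twin.
        adj[node | high] = {n | high for n in nbrs} | {node}
    return adj


cube = hypercube(3)
print({f"{u:03b}": sorted(f"{v:03b}" for v in nbrs) for u, nbrs in cube.items()})
```

Every node of the resulting d-dimensional cube has d neighbors, each differing from it in exactly one bit.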
  • 84. Network Topologies: Tree-Based Networks. Complete binary tree networks: (a) a static tree network; and (b) a dynamic tree network.
  • 86. Network Topologies: Fat Trees. A fat tree network of 16 processing nodes.
  • 88. Evaluating Static Interconnection Networks. Table (values not preserved in this transcript). Columns: Network, Diameter, Bisection Width, Arc Connectivity, Cost (No. of links). Rows: Completely-connected, Star, Complete binary tree, Linear array, 2-D mesh (no wraparound), 2-D wraparound mesh, Hypercube, Wraparound k-ary d-cube.
  • 89. Evaluating Dynamic Interconnection Networks. Table (values not preserved in this transcript). Columns: Network, Diameter, Bisection Width, Arc Connectivity, Cost (No. of links). Rows: Crossbar, Omega Network, Dynamic Tree.
  • 91. Cache Coherence in Multiprocessor Systems. When the value of a variable is changed, all its copies must either be invalidated or updated. Figure: cache coherence in multiprocessor systems: (a) invalidate protocol; (b) update protocol for shared variables.
  • 94. Maintaining Coherence Using Invalidate Protocols. State diagram of a simple three-state coherence protocol.
  • 95. Maintaining Coherence Using Invalidate Protocols. Example of parallel program execution with the simple three-state coherence protocol.
  • 96. Snoopy Cache Systems. How are invalidates sent to the right processors? In snoopy cache systems, a broadcast medium (typically a bus) carries all invalidates and read requests; each cache listens to it and performs the appropriate coherence operations locally. Figure: a simple snoopy bus-based cache coherence system.
  • 99. Directory-Based Systems. Architecture of typical directory-based systems: (a) a centralized directory; and (b) a distributed directory.
  • 104. Routing Techniques. Passing a message from node P0 to P3 (a) through a store-and-forward communication network; (b) and (c) extending the concept to cut-through routing. The shaded regions represent the time that the message is in transit. The startup time associated with this message transfer is assumed to be zero.
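The difference between the two schemes can be estimated with the standard cost models. This is a sketch, not taken from the transcript: the symbols t_s (startup time), t_h (per-hop time), t_w (per-word transfer time), m (message length in words), and l (number of links traversed), and the numeric values used, are assumptions for illustration.

```python
def store_and_forward_time(m, l, t_s, t_h, t_w):
    """Each intermediate node receives the whole message before forwarding."""
    return t_s + (m * t_w + t_h) * l


def cut_through_time(m, l, t_s, t_h, t_w):
    """The message is pipelined across links; only the header pays per hop."""
    return t_s + l * t_h + m * t_w


# P0 to P3 crosses three links; startup time is zero as in the figure.
# The message length and unit costs below are hypothetical.
sf = store_and_forward_time(m=4, l=3, t_s=0, t_h=1, t_w=1)
ct = cut_through_time(m=4, l=3, t_s=0, t_h=1, t_w=1)
print(sf, ct)  # 15 7
```

With store-and-forward the message length is multiplied by the hop count, whereas with cut-through the two terms are added, so the gap widens for long messages and distant nodes.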
  • 112. Routing Mechanisms for Interconnection Networks. Routing a message from node Ps (010) to node Pd (111) in a three-dimensional hypercube using E-cube routing.
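E-cube routing can be sketched as dimension-ordered routing: XOR the current and destination labels, then correct the least significant differing bit first. The function name `ecube_route` is illustrative.

```python
def ecube_route(src, dst):
    """Return the list of node labels visited from src to dst."""
    path = [src]
    cur = src
    diff = cur ^ dst                 # bits in which the labels still differ
    while diff:
        lowest = diff & -diff        # least significant differing dimension
        cur ^= lowest                # traverse the link along that dimension
        path.append(cur)
        diff = cur ^ dst
    return path


route = ecube_route(0b010, 0b111)
print([f"{n:03b}" for n in route])  # ['010', '011', '111']
```

The number of hops equals the Hamming distance between the two labels, here two.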
  • 117. Embedding a Linear Array into a Hypercube: Example. (a) A three-bit reflected Gray code ring; and (b) its embedding into a three-dimensional hypercube.
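The three-bit reflected Gray code ring can be generated with the closed form i XOR (i >> 1), which yields the binary-reflected Gray code. The sketch below maps array position i to hypercube node gray(i); the function name `gray` is illustrative.

```python
def gray(i):
    """Binary-reflected Gray code of i."""
    return i ^ (i >> 1)


# Position i of the eight-node ring is placed on hypercube node gray(i).
ring = [gray(i) for i in range(8)]
print([f"{g:03b}" for g in ring])
# ['000', '001', '011', '010', '110', '111', '101', '100']

# Consecutive ring positions (including the wraparound) differ in one bit,
# so every ring link maps onto a single hypercube link.
hops = [bin(ring[i] ^ ring[(i + 1) % 8]).count("1") for i in range(8)]
print(hops)  # [1, 1, 1, 1, 1, 1, 1, 1]
```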
  • 119. Embedding a Mesh into a Hypercube. (a) A 4 × 4 mesh illustrating the mapping of mesh nodes to the nodes in a four-dimensional hypercube; and (b) a 2 × 4 mesh embedded into a three-dimensional hypercube. Once again, the congestion, dilation, and expansion of the mapping are all 1.
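A common way to build this mapping, assumed here, is to concatenate the Gray codes of the row and column indices: node (i, j) of a 2^r × 2^s mesh goes to the hypercube label G(i) followed by G(j). The helper names `mesh_to_hypercube` and `max_dilation` are illustrative.

```python
def gray(i):
    """Binary-reflected Gray code of i."""
    return i ^ (i >> 1)


def mesh_to_hypercube(i, j, s):
    """Label for mesh node (i, j): Gray code of i, then s bits for Gray(j)."""
    return (gray(i) << s) | gray(j)


def max_dilation(rows, cols, s):
    """Largest hypercube distance between images of adjacent mesh nodes."""
    worst = 0
    for i in range(rows):
        for j in range(cols):
            here = mesh_to_hypercube(i, j, s)
            for ni, nj in ((i + 1, j), (i, j + 1)):
                if ni < rows and nj < cols:
                    there = mesh_to_hypercube(ni, nj, s)
                    worst = max(worst, bin(here ^ there).count("1"))
    return worst


print(max_dilation(4, 4, 2))  # 4 x 4 mesh into a 4-D hypercube: 1
print(max_dilation(2, 4, 2))  # 2 x 4 mesh into a 3-D hypercube: 1
```

A dilation of 1 means every mesh link maps onto exactly one hypercube link.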
  • 121. Embedding a Mesh into a Linear Array: Example. (a) Embedding a 16-node linear array into a 2-D mesh; and (b) the inverse of the mapping. Solid lines correspond to links in the linear array and normal lines to links in the mesh.
  • 123. Embedding a Hypercube into a 2-D Mesh: Example. Embedding a hypercube into a 2-D mesh.
  • 124. Case Studies: The IBM Blue-Gene Architecture. The hierarchical architecture of Blue Gene.
  • 125. Case Studies: The Cray T3E Architecture. Interconnection network of the Cray T3E: (a) node architecture; (b) network topology.
  • 126. Case Studies: The SGI Origin 3000 Architecture. Architecture of the SGI Origin 3000 family of servers.
  • 127. Case Studies: The Sun HPC Server Architecture. Architecture of the Sun Enterprise family of servers.