2. Contents
Introduction to Parallel Computing
Motivating Parallelism
Scope of Parallel Computing
Parallel Programming Platforms
Implicit Parallelism
Trends in Microprocessor Architectures
Limitations of Memory System Performance
Dichotomy of Parallel Computing Platforms
Physical Organization of Parallel Platforms
Communication Costs in Parallel Machines
Scalable Design Principles
N-wide Superscalar Architectures
Multi-core Architectures
3. Introduction to Parallel Computing
A parallel computer is a “Collection of processing
elements that communicate and co-operate to solve large
problems fast”.
Processing multiple tasks simultaneously on multiple processors is called parallel processing.
4. What is Parallel Computing?
Traditionally, software has been written for serial computation: to be run on a single computer having a single Central Processing Unit (CPU).
5. What is Parallel Computing?
In the simplest sense, parallel computing is the simultaneous use of
multiple compute resources to solve a computational problem.
6. Serial vs. Parallel Computing
[Figure: a serial processor alternates fetch/store and compute steps on its own; parallel processors each fetch/store and compute while communicating with one another, like players in a cooperative game.]
7. Motivating Parallelism
The role of parallelism in accelerating computing
speeds has been recognized for several decades.
Its role in providing multiplicity of datapaths and
increased access to storage elements has been
significant in commercial applications.
The scalable performance and lower cost of parallel platforms are reflected in the wide variety of applications they support.
8.
Developing parallel hardware and software has traditionally
been time and effort intensive.
If one is to view this in the context of rapidly improving
uniprocessor speeds, one is tempted to question the need for
parallel computing.
However, uniprocessor speeds cannot improve indefinitely; this is the result of a number of fundamental physical and computational limitations.
Moreover, the emergence of standardized parallel programming environments, libraries, and hardware has significantly reduced the time to (parallel) solution.
9. In short
1. Overcome the limits of serial computing
2. Work around limits on increasing transistor density
3. Work around limits on data transmission speed
4. Achieve faster turn-around times
5. Solve larger problems
10. Scope of Parallel Computing
Parallel computing has a great impact on a wide range of applications, commercial as well as scientific. Typical requirements across these applications include:
- Minimal turnaround time
- High performance
- Resource management
- Load balancing
- Dynamic libraries
- Minimal network congestion and latency
11. Applications
Commercial computing.
- Weather forecasting
- Remote sensing, image processing
- Process optimization, operations research.
Scientific and engineering applications.
- Computational chemistry
- Molecular modelling
- Structural mechanics
Business applications.
- E-Governance
- Medical imaging
Internet applications.
- Internet server
- Digital Libraries
12. Parallel Programming Platforms
The main objective is to provide sufficient detail for programmers to be able to write efficient code on a variety of platforms, and to understand the performance of various parallel algorithms.
13. Implicit Parallelism
A programming language is said to be
implicitly parallel if its compiler or interpreter
can recognize opportunities for
parallelization and implement them without
being told to do so.
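As an illustration (the compiler and flags named here are an assumption, not from the slides), a loop whose iterations are independent can be parallelized by the compiler alone; for example, GCC can auto-vectorize such loops at -O3 and can auto-parallelize them across threads with -ftree-parallelize-loops=N:

    /* A loop a compiler may parallelize implicitly: each iteration is
       independent, so the compiler is free to vectorize it or split it
       across threads without any parallel constructs in the source. */
    void scale(float *x, float s, int n)
    {
        for (int i = 0; i < n; i++)
            x[i] = x[i] * s;   /* no cross-iteration dependence */
    }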
15. Dichotomy of Parallel Computing Platforms
We first explore a dichotomy based on the logical and physical organization of parallel platforms.
The logical organization refers to a programmer's
view of the platform while the physical organization
refers to the actual hardware organization of the
platform.
The two critical components of parallel computing
from a programmer's perspective are ways of
expressing parallel tasks and mechanisms for
specifying interaction between these tasks.
The former is sometimes also referred to as the
control structure and the latter as the communication
model.
16. Control Structure of Parallel Platforms
Parallel tasks can be specified at various levels of granularity. At one extreme, each program in a set of programs can be viewed as one parallel task. At the other extreme, individual instructions within a program can be viewed as parallel tasks. Between these extremes lie a range of models for specifying the control structure of programs and the corresponding architectural support for them.
Parallelism from a single instruction on multiple processors
Consider the following code segment that adds two vectors:
for (i = 0; i < 1000; i++)
    c[i] = a[i] + b[i];
In this example, the iterations of the loop are independent of one another; i.e., c[0] = a[0] + b[0];, c[1] = a[1] + b[1];, etc., can all be executed independently. Consequently, if there is a mechanism for executing the same instruction (in this case, an add) on all the processors with appropriate data, we could execute this loop much faster.
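As a minimal sketch of one such mechanism (using OpenMP, which the slides do not name, so treat the directive as an assumption), the same loop can be annotated so that its iterations are distributed across processors; compile with -fopenmp:

    /* Parallel vector addition: the pragma asks the runtime to split
       the independent iterations across the available threads. */
    #define N 1000
    float a[N], b[N], c[N];

    void vector_add(void)
    {
    #pragma omp parallel for
        for (int i = 0; i < N; i++)
            c[i] = a[i] + b[i];   /* iterations run concurrently */
    }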
17. A typical SIMD architecture (a) and a typical MIMD architecture (b).
18. Executing a conditional statement on an SIMD computer with four processors: (a) the conditional statement; (b) the execution of the statement in two steps.
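The figure itself is not reproduced here, but the conditional in such examples is typically of the following form (a representative sketch, not necessarily the slide's exact code). Because all SIMD processors share one instruction stream, the two branches cannot run at once: in the first step, only processors whose b equals 0 execute, while the rest idle; in the second step the roles reverse.

    /* Representative SIMD conditional (an assumed example).
       Step 1: processors with b == 0 execute c = a; others idle.
       Step 2: processors with b != 0 execute c = a / b; others idle. */
    if (b == 0)
        c = a;
    else
        c = a / b;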
19. Communication Model of Parallel Platforms
Shared-Address-Space Platforms
In shared-address-space platforms, part or all of the memory is accessible to all processors, and processors interact by modifying data objects stored in this shared address space.
Typical shared-address-space architectures: (a) uniform-memory-access shared-address-space computer; (b) uniform-memory-access shared-address-space computer with caches and memories; (c) non-uniform-memory-access shared-address-space computer with local memory only.
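A minimal sketch of interaction through a shared address space (using OpenMP as an assumption; the slides name no API): every thread reads and writes the same variable sum, so the update must be coordinated, here via a reduction. Compile with -fopenmp:

    #include <stdio.h>

    int main(void)
    {
        int sum = 0;                /* lives in the shared address space */

    #pragma omp parallel for reduction(+:sum)
        for (int i = 1; i <= 100; i++)
            sum += i;               /* each thread accumulates a share */

        printf("sum = %d\n", sum);  /* prints 5050 */
        return 0;
    }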
20. Message-Passing Platforms
The logical machine view of a message-passing platform consists of p processing nodes, each with its own exclusive address space.
Instances include clustered workstations and non-shared-address-space multicomputers.
On such platforms, interactions between processes running
on different nodes must be accomplished using messages,
hence the name message passing.
This exchange of messages is used to transfer data, work,
and to synchronize actions among the processes.
In its most general form, message-passing paradigms
support execution of a different program on each of the p
nodes.
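A minimal sketch of this model using MPI (an assumption; the slides name no particular message-passing library): two processes hold no common memory, so data moves only through explicit send and receive calls. Run with mpirun -np 2:

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank, value;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* which node am I? */

        if (rank == 0) {
            value = 42;
            /* Node 0 sends one int to node 1. */
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("node 1 received %d\n", value);
        }

        MPI_Finalize();
        return 0;
    }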
21. Physical Organization of Parallel Platforms
Architecture of an Ideal Parallel Computer
The ideal model is the parallel random access machine (PRAM): p processors sharing a global memory with uniform access time. PRAMs are classified by how they handle simultaneous reads and writes to the same memory location:
Exclusive-read, exclusive-write (EREW) PRAM. In this class,
access to a memory location is exclusive. No concurrent read or
write operations are allowed.
Concurrent-read, exclusive-write (CREW) PRAM. In this class, multiple read accesses to a memory location are allowed, but multiple write accesses are serialized.
Exclusive-read, concurrent-write (ERCW) PRAM. Multiple write
accesses are allowed to a memory location, but multiple read
accesses are serialized.
Concurrent-read, concurrent-write (CRCW) PRAM. This class
allows multiple read and write accesses to a common memory
location. This is the most powerful PRAM model.
22. Interconnection Networks for Parallel Computers
▹ Interconnection networks can be classified as static or dynamic. Static networks consist of point-to-point communication links among processing nodes and are also referred to as direct networks. Dynamic networks are built using switches and communication links and are also referred to as indirect networks.
Figure: Classification of interconnection networks: (a) a static network; and (b) a dynamic network.
24. Two- and three-dimensional meshes: (a) 2-D mesh with no wraparound; (b) 2-D mesh with wraparound links (a 2-D torus); and (c) a 3-D mesh with no wraparound.
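To make the wraparound concrete (a small sketch; the (i, j) node numbering on a k x k grid is an assumption): in a 2-D torus every node has exactly four neighbors, because modular arithmetic wraps border nodes around to the opposite edge, whereas in a plain 2-D mesh the border nodes have fewer links.

    /* Neighbors of node (i, j) in a k x k 2-D torus (sketch). */
    void torus_neighbors(int i, int j, int k, int nbr[4][2])
    {
        nbr[0][0] = (i + 1) % k;      nbr[0][1] = j;  /* down  */
        nbr[1][0] = (i - 1 + k) % k;  nbr[1][1] = j;  /* up    */
        nbr[2][0] = i;  nbr[2][1] = (j + 1) % k;      /* right */
        nbr[3][0] = i;  nbr[3][1] = (j - 1 + k) % k;  /* left  */
    }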
27. Scalable Design principles
❖ Avoid single points of failure.
❖ Scale horizontally, not vertically.
❖ Push work as far away from the core as possible.
❖ API first.
❖ Cache everything, always.
❖ Provide data only as fresh as needed.
❖ Design for maintenance and automation.
❖ Asynchronous rather than synchronous.
❖ Strive for statelessness.
28. N-wide superscalar architecture:
❖ A superscalar architecture is called N-wide if it can fetch and dispatch up to N instructions in every cycle.
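As a small illustration (not from the slides), the limit on what an N-wide processor can exploit is data dependence: independent operations can fill the N issue slots in one cycle, while a dependence chain forces serial issue.

    /* A 2-wide superscalar CPU could issue the two independent adds
       below in the same cycle; the final add depends on both results
       and must wait for them. */
    int ilp_demo(int a, int b, int c, int d)
    {
        int x = a + b;   /* independent of y */
        int y = c + d;   /* can issue together with x */
        return x + y;    /* depends on x and y: issued later */
    }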
30. Multi-core architectures:
❖ Many cores fit on a single processor socket.
❖ Also called a chip multiprocessor (CMP).
❖ These cores run in parallel.
❖ The architecture of a multicore processor enables communication between all available cores to ensure that processing tasks are divided and assigned accurately.
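As a closing sketch (OpenMP again, as an assumption; the slides name no API), a program can query how many cores the chip multiprocessor exposes and run one thread on each in parallel; compile with -fopenmp:

    #include <stdio.h>
    #include <omp.h>

    int main(void)
    {
        printf("cores available: %d\n", omp_get_num_procs());

    #pragma omp parallel   /* typically one thread per core */
        printf("hello from thread %d of %d\n",
               omp_get_thread_num(), omp_get_num_threads());
        return 0;
    }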