3. Demand for High-Speed Computers
Technological advancement has its limits.
The solution is replication of processing units,
which leads to parallel computers.
5. Classical Science
[Figure: cycle of Observation, Theory, and Physical Experiment around Nature]
Classical science is based on
• observation,
• theory, and
• physical experimentation.
Observation of a phenomenon leads to a hypothesis.
The scientist develops a theory to explain the phenomenon and designs an experiment to test that theory.
6. Physical experiments are not always feasible because they are:
• too expensive
• time-consuming
• unethical
• impossible to perform
In contrast, modern science is characterized by observation, theory, experimentation, and numerical simulation.
[Figure: cycle of Observation, Theory, and Numerical Simulation around Nature]
7. Numerical simulation creates the experimental environment using mathematical formulas. It is an increasingly important tool for scientists, who often cannot use physical experiments to test theories.
The modern scientist compares the behaviour of a numerical simulation, which implements the theory, to observations of "real-world" phenomena.
Many important scientific problems are so complex that solving them via numerical simulation requires extraordinarily powerful computers.
8. These complex problems are often called grand challenges for science (Levin 1989):
• Quantum chemistry, statistical mechanics, and relativistic physics
• Cosmology and astrophysics
• Computational fluid dynamics and turbulence
• Materials design and superconductivity
• Biology, pharmacology, genome sequencing, genetic engineering, protein folding, enzyme activity, and cell modelling
• Medicine, and modelling of human organs and bones
• Global weather and environmental modelling
10. Daniel Slotnick at the University of Illinois designed two early parallel computers:
Solomon: constructed by the Westinghouse Electric Company in the early 1960s.
ILLIAC IV: assembled at the Burroughs Corporation in the early 1970s.
At Carnegie Mellon University, two parallel computers, C.mmp and Cm*, were constructed during the 1970s.
In the early 1980s, researchers at Caltech built the parallel computer Cosmic Cube.
In the mid-1980s, commercial parallel computers were constructed with microprocessors.
It took more than 20 years for parallel computers to move from the lab to the market.
PP: Parallel Processing
12. The performance growth rate for minicomputers, mainframes, and traditional supercomputers has been just under 20% a year, while the performance growth rate for microprocessors has averaged 35% a year.
13. The performance of a processor can be improved through fundamental architectural advances.
14. Fundamental Architectural Advances
• Bit-parallel memory
• Bit-parallel arithmetic
• Cache memory
• Channels
• Interleaved memory
• Instruction lookahead
• Instruction pipelining
• Multiple functional units
• Pipelined functional units
• Data pipelining
15. Microprocessors have been able to achieve more impressive performance gains because:
• They are at the beginning stage.
• They have not yet incorporated all the architectural advances.
• Their clock speed is much slower.
18. Some of the organizations that delivered commercial parallel computers based on microprocessor CPUs in the ten-year period 1984-1993, and their current status.
22. Parallel computing is the use of a parallel computer to
reduce the time needed to solve a single
computational problem.
Parallel computing is now considered a standard way
for computational scientists and engineers to solve
problems in areas as diverse as galactic evolution,
climate modeling, aircraft design, and molecular
dynamics.
24. A parallel computer is a multiple-processor computer system supporting parallel programming.
Important categories of parallel computers: multicomputers and multiprocessors.
25. A multicomputer is a parallel computer constructed out of multiple computers and an interconnection network.
Each computer has its own memory, accessible only by that computer's processor.
The processors on different computers interact by passing messages to each other.
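The message-passing style above can be sketched in Python, with threads and queues standing in for the separate computers and the interconnection network (a real multicomputer would use a message-passing library such as MPI; the function and variable names here are illustrative):

```python
import threading
from queue import Queue

def worker(inbox, outbox):
    # Each "computer" sees only the messages sent to it,
    # never another worker's private memory.
    data = inbox.get()
    outbox.put(sum(data))

def distributed_sum(data, n_workers=2):
    """Split data across workers that interact only by passing messages."""
    chunk = (len(data) + n_workers - 1) // n_workers
    results = Queue()
    for i in range(n_workers):
        inbox = Queue()
        threading.Thread(target=worker, args=(inbox, results)).start()
        # Send each worker its chunk as a message.
        inbox.put(data[i * chunk:(i + 1) * chunk])
    # Collect one partial-sum message from each worker.
    return sum(results.get() for _ in range(n_workers))

print(distributed_sum(list(range(100))))  # 4950
```

Note that no worker ever reads another worker's data directly; all interaction goes through explicit send (`put`) and receive (`get`) operations.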
26. A multiprocessor is a computer system with two or more CPUs. It is a highly integrated system in which all CPUs share access to a single global memory.
This shared memory supports communication and synchronization among processors.
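In contrast to message passing, shared-memory communication and synchronization can be sketched as follows, with Python threads standing in for CPUs that all see one global memory (a minimal illustration, not a full multiprocessor model):

```python
import threading

counter = 0                # the single global memory all "CPUs" share
lock = threading.Lock()    # synchronization primitive in shared memory

def add(n):
    global counter
    for _ in range(n):
        with lock:         # without the lock, updates could be lost
            counter += 1

threads = [threading.Thread(target=add, args=(10000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 40000
```

Communication here is implicit: every thread reads and writes the same `counter`, and the lock provides the synchronization the slide mentions.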
27. Parallel processing is information processing that emphasizes the concurrent manipulation of data elements belonging to one or more processes solving a single problem.
A parallel computer is a computer capable of parallel processing.
28. Concurrent processing: sequential events or processes which seem to occur or progress at the same time.
Parallel processing: events or processes which occur or progress at the same time.
29. Concurrency: two or more threads are in progress at the same time, but only one is being executed by a single CPU.
Parallelism: two or more threads are executing at the same time.
30. A supercomputer is a general-purpose computer capable of solving individual problems at extremely high computational speeds, compared with other computers built during the same period.
31. The throughput of a device is the number of results it produces per unit time.
There are many ways to improve the throughput of a device, including:
• Speed: reducing the instruction cycle time
• Concurrency: executing more instructions per cycle
32. Speedup is the ratio between the time needed for the most efficient sequential algorithm to perform a computation and the time needed to perform the same computation on a machine incorporating pipelining and/or parallelism.
Speedup = T(sequential) / T(parallel or pipelined)
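The ratio above can be computed directly; the timings below are hypothetical, chosen only to illustrate the formula:

```python
def speedup(t_sequential, t_parallel):
    """Speedup = time of the best sequential algorithm divided by the
    time on the pipelined/parallel machine."""
    return t_sequential / t_parallel

# Hypothetical: 120 s sequentially vs 20 s on a parallel machine.
print(speedup(120.0, 20.0))  # 6.0
```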
33. A pipelined computation is divided into a number of steps called segments or stages. The output of one segment is the input of the next segment.
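A standard timing model (an assumption of equal stage delays, not stated on the slide) makes the benefit concrete: with k stages of delay t each, n computations take n·k·t time units unpipelined but only (k + n − 1)·t pipelined, because the first result emerges after k stages and one more result appears every stage time thereafter.

```python
def unpipelined_time(n, k, t=1):
    # Each of n computations passes through all k stages
    # before the next one starts.
    return n * k * t

def pipelined_time(n, k, t=1):
    # First result after k stage times, then one result
    # per stage time thereafter.
    return (k + n - 1) * t

print(unpipelined_time(10, 3))  # 30
print(pipelined_time(10, 3))    # 12
```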
35. Data parallelism is the use of multiple functional units to apply the same operation simultaneously to elements of a data set.
THE SAME SET OF OPERATIONS APPLIED TO DIFFERENT DATA
36. A k-fold increase in the number of functional units leads to a k-fold increase in the throughput of the system if there is no overhead associated with the parallelism.
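The structure of data parallelism can be sketched with a thread pool, where four workers stand in for k identical functional units (CPython threads illustrate the structure only; the global interpreter lock prevents a real k-fold speedup for this compute-bound example):

```python
from concurrent.futures import ThreadPoolExecutor

def square(x):
    # The single operation applied to every element of the data set.
    return x * x

data = list(range(16))

# Each of the 4 workers applies the SAME operation to DIFFERENT elements.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(square, data))

print(results[:5])  # [0, 1, 4, 9, 16]
```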
37. A processor array is a parallel computer with a set of identical ALUs/processing elements (PEs) that can operate in parallel in lock-step fashion under the control of one control unit, together with a number of memory modules.
38. Three methods to assemble widgets:
a) A sequential widget-assembly machine produces one widget every three units of time.
b) A three-segment pipelined widget-assembly machine produces the first widget in three units of time and successive widgets every time unit thereafter.
c) A three-way data-parallel widget-assembly machine produces three widgets every three units of time.
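The completion times of the three methods above can be tabulated directly from the slide's description (assembly takes three unit-time steps):

```python
import math

K = 3  # units of time to assemble one widget, as in the slide

def sequential(n):
    # (a) One machine: one widget every K time units.
    return n * K

def pipelined(n):
    # (b) Three-segment pipeline: first widget after K units,
    # then one widget per time unit thereafter.
    return K + (n - 1)

def data_parallel(n, ways=3):
    # (c) Three machines side by side: three widgets every K units.
    return math.ceil(n / ways) * K

for f in (sequential, pipelined, data_parallel):
    print(f.__name__, f(9))
# sequential 27, pipelined 11, data_parallel 9
```

For 9 widgets the data-parallel machine finishes first, but for a long run the pipeline approaches one widget per time unit, matching the data-parallel machine's average rate.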
40. Control parallelism is achieved by applying different operations to different data elements simultaneously.
Pipelining is a special case of control parallelism.
41. Most realistic problems can exploit both data and control parallelism.
Problem: weekly maintenance of a lawn
1. Mowing the lawn
2. Edging the lawn
3. Checking the sprinklers
4. Weeding the flower beds
42. Different workers mow the lawn simultaneously (data parallelism), while another team of workers weeds the flower beds in parallel (control parallelism).
[Figure: task graph — Turn off security system → {Check sprinklers, Mow lawn, Edge lawn, Weed garden} → Turn on security system]
43. An algorithm is scalable if the level of parallelism increases at least linearly with the problem size.
An architecture is scalable if it continues to yield the same performance per processor, albeit on a larger problem size, as the number of processors increases.
Data-parallel algorithms are more scalable than control-parallel algorithms: the level of control parallelism is usually a constant, independent of the problem size, while the level of data parallelism is an increasing function of the problem size.
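A hypothetical example (not from the slides) makes the contrast concrete: element-wise addition of two n × n matrices offers n² independent operations, so data parallelism grows with the problem, whereas a fixed set of distinct concurrent tasks stays constant:

```python
def data_parallelism_level(n):
    # Element-wise addition of two n x n matrices:
    # every one of the n*n additions is independent.
    return n * n

def control_parallelism_level(n):
    # A fixed set of 4 distinct concurrent tasks (an assumed
    # example), regardless of problem size n.
    return 4

for n in (10, 100, 1000):
    print(n, data_parallelism_level(n), control_parallelism_level(n))
# 10 100 4
# 100 10000 4
# 1000 1000000 4
```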
44. There are different ways to classify parallel computers. One of the more widely used classifications, in use since 1966, is Flynn's taxonomy.
Flynn's taxonomy distinguishes multiprocessor computer architectures according to how they can be classified along the two independent dimensions of instruction stream and data stream. Each of these dimensions can have only one of two possible states: single or multiple.
46. SISD: a serial (non-parallel) computer
Single instruction: only one instruction stream is acted on per clock cycle.
Single data: only one data stream is used as input per clock cycle.
Deterministic execution.
This is the oldest type of computer.
Examples: older-generation mainframes, minicomputers, workstations, and single-processor/core PCs.
47. SIMD: a type of parallel computer
Single instruction: all processing units execute the same instruction at any given clock cycle.
Multiple data: each processing unit can operate on a different data element.
Two varieties: processor arrays and vector pipelines.
48. Processor arrays: Thinking Machines CM-2, MasPar MP-1 & MP-2, ILLIAC IV
Vector pipelines: IBM 9000, Cray X-MP, Y-MP & C90, Fujitsu VP, NEC SX-2, Hitachi S820, ETA10
Most modern computers, particularly those with graphics processing units (GPUs), employ SIMD instructions and execution units.
50. MISD: a type of parallel computer
Multiple instruction: each processing unit operates on the data independently via separate instruction streams.
Single data: a single data stream is fed into multiple processing units.
Few (if any) actual examples of this class of parallel computer have ever existed.
51. MIMD: a type of parallel computer
Multiple instruction: every processor may be executing a different instruction stream.
Multiple data: every processor may be working with a different data stream.
This is the most common type of parallel computer; most modern supercomputers fall into this category.