System on chip architectures
1. SYSTEM DESIGN
Mr. A. B. Shinde
Assistant Professor,
Electronics Engineering,
PVPIT, Budhgaon.
shindesir.pvp@gmail.com
2. CONCEPT OF SYSTEM
A system is a collection of elements or components that are organized for a
common purpose.
A system is a set of interacting or interdependent components forming an
integrated design.
A system has structure: it contains parts (or components) that are directly
or indirectly related to each other;
A system has behavior: it exhibits processes that fulfill its function or
purpose;
A system has interconnectivity: the parts and processes are connected by
structural and/or behavioral relationships.
3. SYSTEM
Elements of a system
Input: The inputs are the elements fed into the system in order to produce
the output.
Output: The elements that the system produces by processing the inputs
are known as the output.
Processor: It is the operational component of a system which
processes the inputs.
Control: The control element guides the system. It is the decision-
making sub-system that controls activities such as governing
inputs, processing them, and generating output.
Boundary and interface: The limits that identify its components,
processes and interrelationships when it interfaces with another
system.
4. IMPORTANCE OF SYSTEM ARCHITECTURES
A system architecture is the conceptual model that defines the
structure, behavior (functioning), and other views of a system.
A system architecture can comprise:
system components,
the externally visible properties of those components,
the relationships between them.
It can provide a plan from which products can be procured, and
systems developed, that will work together to implement the overall
system.
5. SYSTEM ON CHIP
System-on-a-chip (SoC or SOC) refers to integrating all components
of a computer or other electronic system into a single integrated circuit
(chip).
It may contain digital, analog, or mixed-signal circuitry
– all on one semiconductor chip.
7. SIMD
Single Instruction Multiple Data (SIMD), is a class of parallel computers in
Flynn's taxonomy.
In computing, SIMD is a technique employed to achieve data-level
parallelism.
8. SIMD
SIMD machines are capable of applying the
exact same instruction stream to multiple
streams of data simultaneously.
This type of architecture is perfectly suited to
achieving very high processing rates, as
the data can be split into many different
independent pieces, and the multiple
processing units can all operate on them at
the same time.
For example: each of 64,000 processors in a Thinking
Machines CM-2 would execute the same instruction at the same
time so that you could do 64,000 multiplies on 64,000 pairs of
numbers at a time.
10. SIMD TYPES
Synchronous (lock-step):
These systems are synchronous, meaning that they are built in such a way
as to guarantee that all instruction units will receive the same instruction at
the same time, and thus all will potentially be able to execute the same
operation simultaneously.
Deterministic SIMD architectures:
These are deterministic because, at any one point in time, there is only one
instruction being executed, even though multiple units may be executing it.
So, every time the same program is run on the same data, using the same
number of execution units, exactly the same result is guaranteed at every step
in the process.
Well-suited to instruction/operation level parallelism:
The “single” in single-instruction doesn’t mean that there’s only one
instruction unit, as it does in SISD, but rather that there’s only one instruction
stream, and this instruction stream is executed by multiple processing units
on different pieces of data, all at the same time, thus achieving parallelism.
11. SIMD (ADVANTAGES)
Consider an application where the same value is added to (or subtracted from) a large
number of data points, a common operation in many multimedia applications.
One example would be changing the brightness of an image.
To change the brightness, the R, G, and B values are read from memory, a value is
added to (or subtracted from) them, and the resulting values are written back out to
memory.
The data is understood to be in blocks, and a number of values can be loaded
all at once.
Instead of a series of instructions saying "get this pixel, now get the next pixel",
a SIMD processor will have a single instruction that effectively says "get lots of
pixels". This can take much less time than "getting" each pixel individually, as
with a traditional CPU design.
If the SIMD system works by loading up eight data points at once, the add
operation being applied to the data will happen to all eight values at the same
time.
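To make this concrete, here is a minimal C sketch of the brightness example (added for illustration, not part of the original slides). It assumes an x86 processor with SSE2 and processes 16 pixel bytes per instruction (the exact width depends on the instruction set); the function name and buffer layout are invented for the example.

  /* Sketch only: brighten an 8-bit pixel buffer with SSE2.
     Processes 16 bytes per instruction using a saturating add so
     values do not wrap past 255; a scalar loop handles the tail. */
  #include <emmintrin.h>   /* SSE2 intrinsics */
  #include <stddef.h>
  #include <stdint.h>

  void brighten(uint8_t *pixels, size_t n, uint8_t delta)
  {
      __m128i add = _mm_set1_epi8((char)delta);       /* 16 copies of delta */
      size_t i = 0;
      for (; i + 16 <= n; i += 16) {
          __m128i v = _mm_loadu_si128((const __m128i *)(pixels + i));
          v = _mm_adds_epu8(v, add);                   /* 16 saturating adds at once */
          _mm_storeu_si128((__m128i *)(pixels + i), v);
      }
      for (; i < n; i++) {                             /* remaining pixels, one at a time */
          unsigned s = pixels[i] + delta;
          pixels[i] = (uint8_t)(s > 255 ? 255 : s);
      }
  }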
12. SIMD (DISADVANTAGES)
Not all algorithms can be vectorized.
Implementing an algorithm with SIMD instructions usually requires
human labor; most compilers don't generate SIMD instructions from a typical
C program, for instance.
Programming with particular SIMD instruction sets can involve numerous
low-level challenges.
It has restrictions on data alignment.
Gathering data into SIMD registers and scattering it to the correct
destination locations is tricky and can be inefficient.
Specific instructions like rotations or three-operand addition aren't in some
SIMD instruction sets.
14. SISD
This is the oldest style of computer
architecture, and still one of the most
important: all personal computers fit within this
category.
Single instruction refers to the fact that there
is only one instruction stream being acted on
by the CPU during any one clock tick;
single data means, analogously, that one and
only one data stream is being employed as
input during any one clock tick.
15. SISD
In computing, SISD is a term referring to
a computer architecture in which a single
processor (uniprocessor) executes a single
instruction stream, to operate on data stored in
a single memory.
This corresponds to the Von Neumann
Architecture.
Instruction fetching and pipelined execution of
instructions are common examples found
in most modern SISD computers.
16. CHARACTERISTICS OF SISD
Serial: Instructions are executed one after the other; this
type of sequential execution is commonly called serial, as opposed to
parallel, in which multiple instructions may be processed simultaneously.
Deterministic: Because each instruction has a unique place in the
execution stream, and thus a unique time during which it and it alone is
being processed, the entire execution is said to be
deterministic, meaning that you (can potentially) know exactly what is
happening at all times, and, ideally, you can exactly recreate the process, step
by step, at any later time.
Examples:
All personal computers,
All single-instruction-unit-CPU workstations,
Mini-computers, and
Mainframes.
18. MIMD
In computing, MIMD is a technique
employed to achieve parallelism.
Machines using MIMD have a number
of processors that function
asynchronously and independently.
At any time, different processors may
be executing different instructions on
different pieces of data.
MIMD architectures may be used in a
number of application areas such as
computer-aided design/computer-
aided manufacturing, simulation,
modeling, and as communication
switches.
19. MIMD
MIMD machines can be of either
shared memory or distributed
memory categories.
Shared memory machines
may be of the bus-based,
extended or hierarchical type.
Distributed memory machines
may have hypercube or mesh
interconnection schemes.
20. MIMD: SHARED MEMORY MODEL
The processors are all connected to a "globally available" memory,
via either a software or hardware means. The operating system
usually maintains its memory coherence.
Bus-based:
MIMD machines with shared memory have processors which share a
common, central memory.
Here all processors are attached to a bus which connects them to
memory.
This setup is called bus-based shared memory; it scales only up to the point where
there is too much contention on the bus.
Hierarchical:
MIMD machines with hierarchical shared memory use a hierarchy
of buses to give processors access to each other's memory.
Processors on different boards may communicate through inter-nodal
buses.
Buses support communication between boards.
With this type of architecture, the machine may support over a thousand
processors.
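To make the shared-memory model concrete, here is a minimal C sketch (an illustration added here, not part of the slides): two threads communicate simply by reading and writing the same variable, with a mutex to serialize access. It assumes POSIX threads; the counter and loop count are arbitrary choices.

  /* Sketch: shared-memory MIMD communication - both threads access the
     SAME variable through ordinary loads and stores; a mutex guards it. */
  #include <pthread.h>
  #include <stdio.h>

  static long shared_counter = 0;                    /* lives in the shared memory */
  static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

  static void *worker(void *arg)
  {
      for (int i = 0; i < 100000; i++) {
          pthread_mutex_lock(&lock);
          shared_counter++;                          /* direct access, no messages */
          pthread_mutex_unlock(&lock);
      }
      return NULL;
  }

  int main(void)
  {
      pthread_t t1, t2;
      pthread_create(&t1, NULL, worker, NULL);
      pthread_create(&t2, NULL, worker, NULL);
      pthread_join(t1, NULL);
      pthread_join(t2, NULL);
      printf("counter = %ld\n", shared_counter);     /* prints 200000 */
      return 0;
  }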
21. MIMD: DISTRIBUTED MEMORY MODEL
In distributed memory MIMD machines, each processor has its own
individual memory location. Each processor has no direct
knowledge about other processor's memory.
For data to be shared, it must be passed from one processor to
another as a message. Since there is no shared memory, contention is
not as great a problem with these machines.
It is not economically feasible to connect a large number of processors
directly to each other. A way to avoid this multitude of direct
connections is to connect each processor to just a few others.
The amount of time required for processors to perform simple
message routing can be substantial.
Systems were designed to reduce this time loss; hypercube
and mesh are two of the popular interconnection schemes.
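To contrast with the shared-memory sketch above, here is a minimal message-passing sketch (an illustration, not from the slides) using MPI, a widely used message-passing library. The rank numbers, tag, and payload are arbitrary choices; it assumes an MPI installation, compiled with mpicc and run with two processes.

  /* Sketch: explicit message passing between two processors with no shared memory. */
  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char **argv)
  {
      int rank, value;
      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      if (rank == 0) {
          value = 42;                                   /* data exists only in rank 0's memory */
          MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
      } else if (rank == 1) {
          MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
          printf("rank 1 received %d\n", value);        /* data arrived as a message */
      }

      MPI_Finalize();
      return 0;
  }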
22. MIMD: DISTRIBUTED MEMORY MODEL
Interconnection schemes:
Hypercube interconnection network:
In an MIMD distributed memory machine with a hypercube system
interconnection network containing four processors, a processor and a
memory module are placed at each vertex of a square.
The diameter of the system is the minimum number of steps it takes for
one processor to send a message to the processor that is the farthest
away.
So, for example, in a hypercube system with eight processors, with each
processor and memory module placed at a vertex of a cube, the
diameter is 3. In general, in a system that contains 2^N processors, with each
processor directly connected to N other processors, the diameter of the
system is N.
Mesh interconnection network:
In an MIMD distributed memory machine with a mesh
interconnection network, processors are placed in a two- dimensional grid.
Each processor is connected to its four immediate neighbors. Wrap
around connections may be provided at the edges of the mesh.
One advantage of the mesh interconnection network over the
hypercube is that the mesh system need not be configured in
powers of two.
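For comparison (a standard result, not stated on the slides): an N × N mesh without wrap-around connections has diameter 2(N − 1), the corner-to-corner distance. A 4 × 4 mesh of 16 processors therefore has diameter 6, while a 16-processor hypercube has diameter 4, but the mesh can be built with any grid size rather than only powers of two.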
23. MIMD: CATEGORIES
The most general of all of the major categories, a MIMD machine is
capable of being programmed to operate as if it were in fact any of
the four.
Synchronous or asynchronous: MIMD instruction streams can
potentially be executed either synchronously or asynchronously, i.e.,
either in tightly controlled lock-step or in a more loosely bound “do your
own thing” mode.
Deterministic or non-deterministic: MIMD systems are potentially
capable of deterministic behavior, that is, of reproducing the exact same
set of processing steps every time a program is run on the same data.
Well-suited to block, loop, or subroutine level parallelism: The more
code each processor in an MIMD assembly is given domain over, the
more efficiently the entire system will operate, in general.
Multiple Instruction or Single Program: MIMD-style systems are
capable of running in true “multiple-instruction” mode, with every
processor doing something different, or every processor can be given the
same code; this latter case is called SPMD, “Single Program Multiple
Data”, and is a generalization of SIMD-style parallelism.
25. MISD
In computing, MISD is a type of parallel
computing architecture where many
functional units perform different operations
on the same data.
Pipeline architectures belong to this type.
Fault-tolerant computers executing the
same instructions redundantly in order to
detect and mask errors, in a manner known
as task replication, may be considered
to belong to this type.
Not many instances of this
architecture exist, as MIMD and SIMD
are often more appropriate for common data
parallel techniques.
26. MISD
Another example is an MISD
process that is carried out routinely
at the United Nations.
When a delegate speaks in a
language of his/her choice, his
speech is simultaneously
translated into a number of other
languages for the benefit of
other delegates present. Thus
the delegate’s speech (a single
data) is being processed by a
number of translators
(processors) yielding different
results.
27. MISD
MISD Examples:
Multiple frequency filters operating on a single signal stream.
Multiple cryptography algorithms attempting to crack a single
coded message.
Both of these are examples of this type of processing where
multiple, independent instruction streams are applied simultaneously
to a single data stream.
29. PIPELINING
In computing, a pipeline is a set of data processing
elements connected in series, so that the output of one element is
the input of the next one.
The elements of a pipeline are often executed in parallel or in time-
sliced fashion.
30. PIPELINING (CONCEPT AND MOTIVATION)
Consider the washing of a car:
A car on the washing line can have only one of the three steps done at
once. After the car has its washing, it moves for drying, leaving the
washing facilities available for the next car.
The first car then moves on to polishing, the second car to drying, and a
third car begins to have its washing.
If each operation needs 30 minutes, then finishing all three cars
when only one car can be worked on at a time would take 270 minutes (three cars,
three steps, 30 minutes each). On the other hand, using the washing line, the total
time to complete all three is 150 minutes: the first car is finished after 90 minutes,
and each following car is finished 30 minutes after the one before it. After that, an
additional car comes off the washing line every 30 minutes.
31. PIPELINING (IMPLEMENTATIONS)
Buffered, Synchronous pipelines:
Conventional microprocessors are synchronous circuits that use buffered,
synchronous pipelines. In these pipelines, "pipeline registers" are inserted in-
between pipeline stages, and are clocked synchronously.
Buffered, Asynchronous pipelines:
Asynchronous pipelines are used in asynchronous circuits, and have their
pipeline registers clocked asynchronously. Generally speaking, they use a
request/acknowledge system, wherein each stage can detect when it is finished.
When a stage has finished and the next stage has sent it an "acknowledge" signal, the
stage sends a "request" signal to the next stage, and an "acknowledge" signal to
the previous stage. When a stage receives a "request" signal, it clocks its
input registers, thus reading in the data from the previous stage.
Unbuffered pipelines:
Unbuffered pipelines, called "wave pipelines", do not have registers in-between
pipeline stages.
Instead, the delays in the pipeline are "balanced" so that, for each stage, the
difference between the first stabilized output data and the last is minimized.
32. INSTRUCTION PIPELINE
An instruction pipeline is a
technique used in the design of
computers and other digital
electronic devices to increase
their instruction throughput (the
number of instructions that can be
executed in a unit of time).
The fundamental idea is to split
the processing of a computer
instruction into a series of
independent steps, with storage
at the end of each step. This
allows the computer's control
circuitry to issue instructions at the
processing rate of the slowest
step, which is much faster than
the time needed to perform all
steps at once.
33. INSTRUCTION PIPELINE
For example, the classic RISC
pipeline is broken into five stages
with a set of flip flops between
each stage.
Instruction fetch
Instruction decode and register
fetch
Execute
Memory access
Register write back
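As a rough worked figure (standard pipeline arithmetic, not taken from the slides): once full, a k-stage pipeline completes n instructions in about k + (n − 1) cycles, whereas a design that performs all steps of one instruction before starting the next needs about k × n cycles, so the speedup approaches k for large n. For the five-stage pipeline above, 1000 instructions take roughly 1004 cycles instead of 5000, ignoring stalls and hazards.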
34. PIPELINING (ADVANTAGES AND DISADVANTAGES)
Pipelining does not help in all cases. An instruction pipeline is said to be
fully pipelined if it can accept a new instruction every clock cycle. A
pipeline that is not fully pipelined has wait cycles that delay the progress
of the pipeline.
Advantages of Pipelining:
The cycle time of the processor is reduced, thus increasing the instruction issue rate in
most cases.
Some combinational circuits such as adders or multipliers can be made faster by
adding more circuitry. If pipelining is used instead, it can save circuitry.
Disadvantages of Pipelining:
A non-pipelined processor executes only a single instruction at a time. This prevents
branch delays and problems with serial instructions being executed concurrently.
Consequently the design is simpler and cheaper to manufacture.
The instruction latency in a non-pipelined processor is slightly lower than in a
pipelined equivalent. This is due to the fact that extra flip flops must be added
to the data path of a pipelined processor.
A non-pipelined processor will have a stable instruction bandwidth. The
performance of a pipelined processor is much harder to predict and may vary more
widely between different programs.
36. PARALLEL COMPUTING
Parallel computing is a form of computation in which many
calculations are carried out simultaneously, operating on the principle
that large problems can often be divided into smaller ones, which
are then solved concurrently ("in parallel").
There are several different forms of parallel computing:
bit-level,
instruction level,
data, and
task parallelism.
Parallelism has been employed for many years, mainly in high-
performance computing.
As power consumption by computers has become a concern in recent
years, parallel computing has become the dominant paradigm in
computer architecture, mainly in the form of multicore processors.
37. PARALLEL COMPUTING
Computer software is traditionally written for serial computation. To solve a
problem, an algorithm is constructed and implemented as a serial
stream of instructions. Only one instruction may execute at a time;
after that instruction is finished, the next is executed.
Parallel computing, on the other hand, uses multiple
processing elements simultaneously to solve a problem.
This is accomplished by breaking the problem into independent
parts so that each processing element can execute its part of the
algorithm simultaneously with the others.
The processing elements can be diverse and include resources such
as a single computer with multiple processors, several networked
computers, specialized hardware or any combination of the above.
38. TYPES OF PARALLELISM
Bit-level parallelism:
From the advent of VLSI in the 1970s until about 1986, speed-up in computer
architecture was driven by doubling computer word size— the amount of
information the processor can manipulate per cycle. Increasing the word size
reduces the number of instructions the processor must execute to perform an
operation on variables whose sizes are greater than the length of the word.
Instruction-level parallelism:
A computer program is, in essence, a stream of instructions executed by a processor.
These instructions can be re-ordered and combined into groups which are then
executed in parallel without changing the result of the program. This is known as
instruction-level parallelism.
Data parallelism:
Data parallelism is parallelism inherent in program loops, which focuses on
distributing the data across different computing nodes to be processed in parallel.
Task parallelism:
Task parallelism is the characteristic of a parallel program that "entirely
different calculations can be performed on either the same or different sets of data".
This contrasts with data parallelism, where the same calculation is performed on
the same or different sets of data.
39. TYPES OF PARALLELISM
Bit-level parallelism is a form of parallel computing based on increasing
processor word size.
Increasing the word size reduces the number of instructions the processor
must execute in order to perform an operation on variables whose sizes
are greater than the length of the word.
For example:
Consider a case where an 8-bit processor must add two 16-bit integers. The
processor must first add the 8 lower-order bits from each integer, then add the 8
higher-order bits, requiring two instructions to complete a single operation. A 16-
bit processor would be able to complete the operation with a single instruction; a
small C sketch of the two-step add follows below.
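The two-step add can be sketched in portable C (an illustration added here, not from the slides; the function and variable names are invented).

  /* Sketch: how an 8-bit machine adds two 16-bit integers in two steps
     (low bytes first, then high bytes plus the carry), written in C
     purely to illustrate the idea. */
  #include <stdint.h>
  #include <stdio.h>

  uint16_t add16_on_8bit(uint16_t a, uint16_t b)
  {
      uint8_t a_lo = a & 0xFF, a_hi = a >> 8;
      uint8_t b_lo = b & 0xFF, b_hi = b >> 8;

      uint16_t lo_sum = (uint16_t)a_lo + b_lo;          /* first 8-bit add */
      uint8_t carry  = lo_sum > 0xFF;                   /* carry out of the low byte */
      uint8_t hi_sum = (uint8_t)(a_hi + b_hi + carry);  /* second 8-bit add, with carry */

      return ((uint16_t)hi_sum << 8) | (uint8_t)lo_sum;
  }

  int main(void)
  {
      printf("%u\n", add16_on_8bit(300, 500));          /* prints 800 */
      return 0;
  }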
Historically, 4-bit microprocessors were replaced with 8-bit, then 16-bit, then 32-bit
microprocessors. This trend generally came to an end with the introduction of 32-
bit processors, which were a standard in general purpose computing for
two decades. Only recently, with the advent of x86-64 architectures, have
64-bit processors become commonplace.
40. TYPES OF PARALLELISM
Instruction-level parallelism (ILP) is a measure of how many of the
operations in a computer program can be performed simultaneously.
For example, consider the following program:
1. e = a + b
2. f = c + d
3. g = e * f
Here, Operation 3 depends on the results of operations 1 and 2, so it cannot
be calculated until both of them are completed. However, operations 1 and 2
do not depend on any other operation, so they can be calculated
simultaneously.
If we assume that each operation can be completed in one unit of time
then these three instructions can be completed in a total of two units of time,
giving an ILP of 3/2.
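Written out as C (a trivial illustration added here; the function wrapper is invented, the variable names follow the slide), the independence of the first two statements is exactly what a compiler or superscalar processor can exploit.

  /* The slide's three operations as C. Statements 1 and 2 have no data
     dependence on each other, so they can be issued in the same cycle;
     statement 3 must wait for both results. */
  double ilp_example(double a, double b, double c, double d)
  {
      double e = a + b;   /* cycle 1 */
      double f = c + d;   /* cycle 1 (independent of e) */
      double g = e * f;   /* cycle 2 (depends on e and f) */
      return g;
  }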
41. TYPES OF PARALLELISM
Instruction-level parallelism (ILP):
A goal of compiler and processor designers is to identify
and take advantage of as much ILP as possible.
Ordinary programs are typically written under a sequential
execution model where instructions execute one after the
other and in the order specified by the programmer. ILP allows
the compiler and the processor to overlap the execution of
multiple instructions or even to change the order in which
instructions are executed.
How much ILP exists in programs is very application specific. In
certain fields, such as graphics and scientific computing, the
amount can be very large. However, workloads such as
cryptography exhibit much less parallelism.
42. TYPES OF PARALLELISM
Data parallelism (also known as loop-level
parallelism) is a form of parallelization of computing
across multiple processors in parallel computing
environments.
Data parallelism focuses on distributing the data across
different parallel computing nodes.
In a multiprocessor system executing a single set of
instructions (SIMD), data parallelism is achieved when
each processor performs the same task on different
pieces of distributed data. In some situations, a single
execution thread controls operations on all pieces of
data.
43. TYPES OF PARALLELISM
Data parallelism
For instance, consider a 2-processor system (CPUs A and B) in
a parallel environment, and we wish to do a task on some data
'd'. It is possible to tell CPU A to do that task on one part of 'd'
and CPU B on another part simultaneously, thereby reducing
the duration of the execution.
The data can be assigned using conditional statements.
As a specific example, consider adding two matrices. In a
data parallel implementation, CPU A could add all
elements from the top half of the matrices, while CPU B could
add all elements from the bottom half of the matrices.
Since the two processors work in parallel, the job of
performing matrix addition would take one half the time of
performing the same operation in serial using one CPU
alone.
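A minimal C sketch of that split (added for illustration, not part of the slides): both threads run the same code on different halves of the rows. It assumes POSIX threads; the matrix size and the two-way split are arbitrary choices.

  /* Sketch: data parallelism - two threads run the SAME code on
     different halves of the matrices. */
  #include <pthread.h>
  #include <stdio.h>

  #define N 4
  static double A[N][N], B[N][N], C[N][N];

  struct range { int first_row, last_row; };   /* half assigned to one thread */

  static void *add_rows(void *arg)
  {
      struct range *r = arg;
      for (int i = r->first_row; i < r->last_row; i++)
          for (int j = 0; j < N; j++)
              C[i][j] = A[i][j] + B[i][j];     /* same task, different data */
      return NULL;
  }

  int main(void)
  {
      /* ... fill A and B here ... */
      struct range top = {0, N / 2}, bottom = {N / 2, N};
      pthread_t t1, t2;
      pthread_create(&t1, NULL, add_rows, &top);      /* "CPU A": top half */
      pthread_create(&t2, NULL, add_rows, &bottom);   /* "CPU B": bottom half */
      pthread_join(t1, NULL);
      pthread_join(t2, NULL);
      printf("C[0][0] = %f\n", C[0][0]);
      return 0;
  }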
44. TYPES OF PARALLELISM
Task parallelism (also known as function
parallelism and control parallelism) is a form of parallelization
of computer code across multiple processors in parallel
computing environments.
Task parallelism focuses on distributing execution processes
(threads) across different parallel computing nodes.
In a multiprocessor system, task parallelism is achieved when
each processor executes a different thread (or process) on the
same or different data.
The threads may execute the same or different code. In the
general case, different execution threads communicate with one
another as they work. Communication takes place usually to
pass data from one thread to the next as part of a workflow.
45. TYPES OF PARALLELISM
Task parallelism
As a simple example, if we are running code on a 2-
processor system (CPUs "a" & "b") in a parallel
environment and we wish to do tasks "A" and "B", it is
possible to tell CPU "a" to do task "A" and CPU "b" to do
task "B" simultaneously, thereby reducing the runtime of
the execution.
The tasks can be assigned using conditional
statements.
Task parallelism emphasizes the distributed
(parallelized) nature of the processing (i.e. threads), as
opposed to the data (data parallelism).
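A minimal C sketch of the same idea (again an illustration, not from the slides): two threads execute different functions, i.e. different instruction streams, at the same time. It assumes POSIX threads, and the two placeholder tasks are invented for the example.

  /* Sketch: task parallelism - the two threads run DIFFERENT code at once. */
  #include <pthread.h>
  #include <stdio.h>

  static void *task_A(void *arg)           /* "CPU a" does task A */
  {
      long sum = 0;
      for (long i = 0; i < 1000000; i++)
          sum += i;
      printf("task A: sum = %ld\n", sum);
      return NULL;
  }

  static void *task_B(void *arg)           /* "CPU b" does task B */
  {
      double product = 1.0;
      for (int i = 1; i <= 20; i++)
          product *= i;
      printf("task B: 20! = %.0f\n", product);
      return NULL;
  }

  int main(void)
  {
      pthread_t a, b;
      pthread_create(&a, NULL, task_A, NULL);
      pthread_create(&b, NULL, task_B, NULL);
      pthread_join(a, NULL);
      pthread_join(b, NULL);
      return 0;
  }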