2. • The essence of the superscalar approach is the ability to execute instructions
independently and concurrently in different pipelines.
• In a traditional scalar organization, there is a single pipelined functional unit for integer
operations and one for floating-point operations.
• In the superscalar organization, there are multiple functional units, each of which is
implemented as a pipeline.
• Each individual functional unit provides a degree of parallelism by virtue of its pipelined
structure.
• The use of multiple functional units enables the processor to execute streams of
instructions in parallel, one stream for each pipeline.
• It is the responsibility of the hardware, in conjunction with the compiler, to assure that the
parallel execution does not violate the intent of the program.
Superscalar Architecture
4. • Superpipelining exploits the fact that many pipeline stages perform tasks that require less than
half a clock cycle.
• The base pipeline issues one instruction per clock cycle and can perform one pipeline stage per
clock cycle.
• Note that although several instructions are executing concurrently, only one instruction is in its
execution stage at any one time.
• The next part of the diagram shows a superpipelined implementation that is capable of performing
two pipeline stages per clock cycle.
• An alternative way of looking at this is that the functions performed in each stage can be split into
two nonoverlapping parts and each can execute in half a clock cycle.
• A superpipeline implementation that behaves in this fashion is said to be of degree 2. Finally, the
lowest part of the diagram shows a superscalar implementation capable of executing two
instances of each stage in parallel.
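The three organizations described above can be compared with a small timing model. This is a minimal sketch, assuming an idealized pipeline with no dependencies or stalls and measuring time in base clock cycles; the function names and the 4-stage, 6-instruction example are illustrative assumptions, not values from the slides.

```python
import math


def base_cycles(stages: int, instructions: int) -> float:
    """Base pipeline: one instruction issued per clock cycle."""
    return stages + (instructions - 1)


def superpipelined_cycles(stages: int, instructions: int, degree: int) -> float:
    """Each stage is split into `degree` parts, so a new instruction
    can start every 1/degree of a base clock cycle."""
    return stages + (instructions - 1) / degree


def superscalar_cycles(stages: int, instructions: int, degree: int) -> float:
    """`degree` instructions start together in every clock cycle."""
    groups = math.ceil(instructions / degree)
    return stages + (groups - 1)


if __name__ == "__main__":
    # Assumed example: 4 pipeline stages, 6 instructions, degree 2.
    print(base_cycles(4, 6))               # 9
    print(superpipelined_cycles(4, 6, 2))  # 6.5
    print(superscalar_cycles(4, 6, 2))     # 6
```

Under these assumptions both degree-2 designs finish well ahead of the base pipeline, and the superscalar one is slightly ahead because its first pair of instructions starts at the same time.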
Superpipelining
7. • True Data Dependency : A true data dependency (also called a read-after-write dependency)
exists when the second instruction can be fetched and decoded but cannot execute until the
first instruction executes, because the second instruction needs data produced by the first.
• Procedural Dependencies : The presence of branches in an instruction sequence
complicates the pipeline operation. The instructions following a branch (taken or not
taken) have a procedural dependency on the branch and cannot be executed until the
branch is executed.
• Resource Conflicts : A resource conflict is a competition between two or more instructions
for the same resource at the same time. Examples of resources include memories,
caches, buses, register-file ports, and functional units.
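The three limitations above can be sketched as a check that a two-way issue stage might run on a pair of adjacent instructions. This is a simplified illustration, not how any particular processor does it: the `Instr` record, the unit names, and the assumption that there is exactly one functional unit of each kind are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Instr:
    op: str                # mnemonic, e.g. "add"
    dest: Optional[str]    # register written, or None
    srcs: Tuple[str, ...]  # registers read
    unit: str              # functional unit needed: "alu", "mem" or "branch"


def issue_hazard(first: Instr, second: Instr) -> Optional[str]:
    """Return the reason `second` cannot issue in the same cycle as `first`,
    or None if the pair can issue together (assumes one unit of each kind)."""
    if first.unit == "branch":
        # Instructions after a branch must wait for the branch to execute.
        return "procedural dependency"
    if first.dest is not None and first.dest in second.srcs:
        # `second` reads a value that `first` has not produced yet.
        return "true data dependency"
    if first.unit == second.unit:
        # Both instructions need the same (single) functional unit.
        return "resource conflict"
    return None


if __name__ == "__main__":
    add = Instr("add", "r1", ("r2", "r3"), "alu")
    load_dep = Instr("load", "r4", ("r1",), "mem")
    sub = Instr("sub", "r5", ("r6", "r7"), "alu")
    load_ok = Instr("load", "r8", ("r9",), "mem")
    branch = Instr("beq", None, ("r1", "r2"), "branch")

    print(issue_hazard(add, load_dep))    # true data dependency
    print(issue_hazard(add, sub))         # resource conflict
    print(issue_hazard(branch, load_ok))  # procedural dependency
    print(issue_hazard(add, load_ok))     # None
```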
9. • A taxonomy first introduced by Flynn [FLYN72] is still the most common way of categorizing
systems with parallel processing capability. Flynn proposed the following categories of
computer systems:
• Single instruction, single data (SISD) stream : A single processor executes a single
instruction stream to operate on data stored in a single memory. Uniprocessors fall into
this category.
• Single instruction, multiple data (SIMD) stream : A single machine instruction controls
the simultaneous execution of a number of processing elements on a lockstep basis.
Each processing element has an associated data memory, so that instructions are
executed on different sets of data by different processors. Vector and array processors
fall into this category.
• Multiple instruction, single data (MISD) stream : A sequence of data is transmitted to a
set of processors, each of which executes a different instruction sequence. This
structure is not commercially implemented.
• Multiple instruction, multiple data (MIMD) stream : A set of processors simultaneously
execute different instruction sequences on different data sets.
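The four categories above depend only on whether there is one or more than one instruction stream and data stream. A minimal sketch of that rule follows; the function name `flynn_category` is an assumption introduced for illustration.

```python
def flynn_category(instruction_streams: int, data_streams: int) -> str:
    """Map stream counts to Flynn's category name (SISD, SIMD, MISD, MIMD)."""
    if instruction_streams < 1 or data_streams < 1:
        raise ValueError("a system needs at least one stream of each kind")
    i = "S" if instruction_streams == 1 else "M"
    d = "S" if data_streams == 1 else "M"
    return f"{i}I{d}D"


if __name__ == "__main__":
    print(flynn_category(1, 1))  # SISD, e.g. a uniprocessor
    print(flynn_category(1, 8))  # SIMD, e.g. a vector or array processor
    print(flynn_category(4, 1))  # MISD
    print(flynn_category(4, 4))  # MIMD
```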
10. • For SISD there is some sort of control unit (CU) that provides an instruction stream
(IS) to a processing unit (PU). The processing unit operates on a single data stream
(DS) from a memory unit (MU).
• For SIMD, there is still a single control unit, now feeding a single instruction stream to
multiple PUs. Each PU may have its own dedicated memory or there may be a shared
memory.
11. • Finally, with MIMD, there are multiple control units, each feeding a separate
instruction stream to its own PU. The MIMD may be a shared-memory multiprocessor
or a distributed-memory multicomputer.
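The difference between the SIMD and MIMD organizations above can be sketched as follows. This only models the structure (who supplies the instruction to each PU); the list comprehensions run sequentially in Python, so there is no real parallelism here, and the function names are assumptions.

```python
from typing import Callable, List, Sequence

Op = Callable[[int], int]


def simd_step(instruction: Op, data_streams: Sequence[int]) -> List[int]:
    """One control unit broadcasts a single instruction; every PU applies it
    to its own data item in lockstep."""
    return [instruction(x) for x in data_streams]


def mimd_step(instructions: Sequence[Op], data_streams: Sequence[int]) -> List[int]:
    """Each PU has its own control unit, so each applies its own instruction
    to its own data item."""
    if len(instructions) != len(data_streams):
        raise ValueError("need one instruction stream per data stream")
    return [op(x) for op, x in zip(instructions, data_streams)]


if __name__ == "__main__":
    print(simd_step(lambda x: x + 1, [1, 2, 3]))                      # [2, 3, 4]
    print(mimd_step([lambda x: x + 1, lambda x: x * 2], [10, 20]))    # [11, 40]
```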
12. • RISC stands for Reduced Instruction Set Computer. A RISC design typically uses dedicated
(hardwired) digital circuitry in the control unit to produce all the signals needed to
execute each instruction in the processor's instruction set.
• Examples of RISC processors:
• IBM RS6000, MC88100
• DEC’s Alpha 21064, 21164 and 21264 processors
RISC Architecture
13. • RISC processors use a small and limited number of instructions. This puts emphasis
on software and compiler design due to the relatively simple instruction set.
• RISC machines mostly use a hardwired control unit.
• RISC processors tend to combine low power consumption with high performance. They
are usually heavily pipelined, which keeps the hardware resources of the processor busy
and gives higher throughput; the simpler hardware also tends to draw less power.
• Each instruction is simple and consistent. Most instructions in a RISC instruction
set are simple enough to execute in one clock cycle.
• RISC processors use simple addressing modes.
• RISC instructions are of uniform, fixed length.
• The RISC design philosophy generally incorporates a larger number of registers to
reduce the number of interactions with memory.
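One practical consequence of uniform, fixed-length instructions is that fetch and decode become simple arithmetic and bit slicing. The sketch below assumes a made-up 32-bit format (6-bit opcode, then three 5-bit register fields, remaining bits unused); this layout is hypothetical and does not correspond to any of the processors named above.

```python
from typing import Tuple

WORD_BYTES = 4  # every instruction is one 32-bit word (assumed format)


def instruction_address(base: int, index: int) -> int:
    """With fixed-length instructions, the n-th instruction is at base + 4*n."""
    return base + WORD_BYTES * index


def encode(opcode: int, rd: int, rs1: int, rs2: int) -> int:
    """Pack the fields into a 32-bit word (fields assumed to be in range)."""
    return (opcode << 26) | (rd << 21) | (rs1 << 16) | (rs2 << 11)


def decode(word: int) -> Tuple[int, int, int, int]:
    """Every field sits at a fixed bit position, so decoding is a few
    shifts and masks regardless of which instruction it is."""
    return (
        (word >> 26) & 0x3F,  # opcode
        (word >> 21) & 0x1F,  # destination register
        (word >> 16) & 0x1F,  # first source register
        (word >> 11) & 0x1F,  # second source register
    )


if __name__ == "__main__":
    word = encode(opcode=1, rd=2, rs1=3, rs2=4)
    print(decode(word))                         # (1, 2, 3, 4)
    print(hex(instruction_address(0x1000, 3)))  # 0x100c
```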
14. • CISC stands for Complex Instruction Set Computer. In a CISC design, the control unit
typically contains a set of micro-circuits that generate the control signals, and each
micro-circuit is activated by microcode. The primary goal of CISC architecture is to
complete a task in as few lines of assembly code as possible.
• Examples of CISC processors are:
• Intel 386, 486, Pentium, Pentium Pro, Pentium II, Pentium III
• Motorola’s 68000, 68020, 68040, etc.
CISC Architecture
15. • CISC chips have complex instructions. For example, a CISC processor might come
prepared with a specific multiply instruction (call it "MULT"), so that the entire task of
multiplying two numbers held in memory can be completed with a single MULT instruction.
• MULT is what is known as a "complex instruction." It operates directly on the
computer's memory banks and does not require the programmer to explicitly call any
loading or storing functions. It closely resembles a command in a higher level
language.
• There are a variety of instructions, many of which are complex. Programs therefore need
fewer lines of assembly code, so little RAM is needed to store them.
• CISC machines generally make use of complex addressing modes.
• The decision of CISC processor designers to provide a variety of addressing modes
leads to variable-length instructions. For example, instruction length increases if an
operand is in memory as opposed to in a register.
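To make the MULT example concrete, the sketch below models the same multiplication done two ways: as one memory-to-memory complex instruction, and as a load/store sequence of simple instructions of the kind a RISC machine would use. The memory labels "a" and "b", the function names, and the four-step sequence are illustrative assumptions, not taken from a real instruction set.

```python
from typing import Dict


def cisc_mult(memory: Dict[str, int], addr_a: str, addr_b: str) -> int:
    """One complex instruction: read both operands from memory, multiply,
    and write the product back. Returns the number of instructions executed."""
    memory[addr_a] = memory[addr_a] * memory[addr_b]
    return 1


def risc_mult(memory: Dict[str, int], addr_a: str, addr_b: str) -> int:
    """The same task as explicit loads, a register-to-register multiply,
    and a store. Returns the number of instructions executed."""
    regs: Dict[str, int] = {}
    regs["r1"] = memory[addr_a]           # load first operand
    regs["r2"] = memory[addr_b]           # load second operand
    regs["r1"] = regs["r1"] * regs["r2"]  # multiply in registers
    memory[addr_a] = regs["r1"]           # store the product
    return 4


if __name__ == "__main__":
    m1 = {"a": 2, "b": 3}
    m2 = {"a": 2, "b": 3}
    print(cisc_mult(m1, "a", "b"), m1["a"])  # 1 6
    print(risc_mult(m2, "a", "b"), m2["a"])  # 4 6
```

Both versions leave the same result in memory; the difference is how much of the work is hidden inside one instruction versus spelled out in the program.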
16. • The complex instruction set and smaller assembly code meant less work for the
compiler and thus simplified compiler design.
• CISC machines use a microprogrammed control unit, which consists of microprograms
stored in a control memory such as ROM; the CPU reads them from there and
generates the control signals.
• CISC processors have a limited number of registers, normally only a single set.
Since the addressing modes make provision for memory operands, a limited number of
“costly” registers is sufficient.
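The microprogrammed control unit described above can be pictured as a table lookup: the opcode selects a microprogram in the control memory, and each microinstruction in it names the control signals for one step. The sketch below is a toy model under that assumption; the contents of `CONTROL_STORE` and the signal names are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical control memory (ROM): opcode -> sequence of microinstructions,
# each microinstruction being the control signals asserted in one step.
CONTROL_STORE: Dict[str, List[Tuple[str, ...]]] = {
    "ADD": [
        ("read_registers",),
        ("alu_add",),
        ("write_register",),
    ],
    "MULT": [
        ("read_memory_a",),
        ("read_memory_b",),
        ("alu_multiply",),
        ("write_memory_a",),
    ],
}


def control_signals(opcode: str) -> List[Tuple[str, ...]]:
    """Fetch the microprogram for `opcode` and return its control signals
    in the order they would be generated."""
    try:
        return list(CONTROL_STORE[opcode])
    except KeyError:
        raise ValueError(f"no microprogram for opcode {opcode!r}") from None


if __name__ == "__main__":
    for step, signals in enumerate(control_signals("MULT")):
        print(step, signals)
```

In this picture, adding a new complex instruction means adding an entry to the control memory, whereas a hardwired control unit would need new logic circuitry.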