
Chapter two
Central processing unit

Outlines
3/24/2023 computer architecture and organization 2
Computer arithmetic
Instruction sets, Instruction format and addressing modes
CPU Structure, RISC and CISC
Pipelining
The Control Unit (Hardwired and Microprogrammed Implementations)

CPU
• The part of the computer that performs the bulk of data processing operations
• It interprets and executes machine level instructions
• Controls data transfer from/to Main Memory (MM) and CPU
• Detects any errors

Arithmetic logic unit (ALU)
• The ALU is that part of the computer that actually performs
arithmetic and logic operations on data.
• All other elements of the computer system (control unit, registers, main memory,
and I/O) are there mainly
  – to bring data into the ALU for it to process, and
  – to take the results back out.

ALU Inputs and Outputs
• Data are presented to the ALU in registers, and the results
of an operation are stored in registers.
• These registers are temporary storage locations within the processor.
• The ALU may also set flags as the result of an operation.
For example, an overflow flag is set to 1 if the result of a computation exceeds the
length of the register into which it is to be stored.
• Flags are also stored in registers within the processor.
• The control unit provides signals that control the operation
of the ALU and the movement of the data into and out of
the ALU.
[Figure 10.1 ALU Inputs and Outputs: control signals and operand registers feed the ALU, which writes result registers and flags.]

Computer arithmetic
• The two principal concerns for computer arithmetic are:
  – The way in which numbers are represented (the binary format)
  – The algorithms used for the basic arithmetic operations (add, subtract, multiply, divide)
• Arithmetic instructions operate on binary or decimal data.
• Arithmetic operations are executed in the ALU.
• Computer arithmetic is commonly performed on two very different types of
numbers: integer and floating point.

Cont..
• Integer representation
  – Sign-magnitude representation
  – Twos complement representation
• Integer arithmetic
  – Addition and subtraction
  – Multiplication and division
  – Negation

Integer Representation
• Only the digits 0 and 1 are available to represent numbers
• Positive numbers are stored in binary
• e.g. 41 = 00101001
• There is no minus sign and no radix point.

Sign-Magnitude Representation
• The simplest form of representation that employs a sign bit is the
sign-magnitude representation.
• The leftmost bit is the sign bit
  – 0 means positive
  – 1 means negative
• +18 = 00010010
• -18 = 10010010 (sign magnitude)

Twos Complement Representation
• Like sign magnitude, twos complement representation uses the most
significant bit as a sign bit.
• This makes it easy to test whether an integer is positive or negative.
• It differs from the sign-magnitude representation in the way
that the other bits are interpreted.
 +2 = 00000010
 +1 = 00000001
 +0 = 00000000
 -1 = 11111111
 -2 = 11111110

Integer arithmetic
1. Negation
• Twos complement operation:
  – Take the Boolean complement of each bit of the integer (including the sign bit)
  – Treating the result as an unsigned binary integer, add 1
• The negative of the negative of a number is the number itself:
                 +18 = 00010010 (twos complement)
  bitwise complement = 11101101
                     +        1
                       11101110 = -18

                 -18 = 11101110 (twos complement)
  bitwise complement = 00010001
                     +        1
                       00010010 = +18
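
The two-step negation rule above can be sketched as follows. The function name negate and the use of bit strings are assumptions chosen for readability.

```python
def negate(pattern: str) -> str:
    """Twos complement negation: complement every bit, then add 1."""
    bits = len(pattern)
    mask = (1 << bits) - 1
    complement = int(pattern, 2) ^ mask      # Boolean complement of each bit
    # Add 1 as an unsigned integer; a carry out of the top bit is dropped.
    return format((complement + 1) & mask, f"0{bits}b")
```

Applying negate twice returns the original pattern. One corner case is worth knowing: the most negative value (10000000 in 8 bits) has no positive counterpart, so negating it yields the same pattern again.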

2. Addition and subtraction
Figure 10.3 Addition of Numbers in Twos Complement Representation

(a) (–7) + (+5)          (b) (–4) + (+4)
     1001 = –7                1100 = –4
   + 0101 =  5              + 0100 =  4
     1110 = –2               10000 =  0

(c) (+3) + (+4)          (d) (–4) + (–1)
     0011 = 3                 1100 = –4
   + 0100 = 4               + 1111 = –1
     0111 = 7                11011 = –5

(e) (+5) + (+4)          (f) (–7) + (–6)
     0101 = 5                 1001 = –7
   + 0100 = 4               + 1010 = –6
     1001 = Overflow         10011 = Overflow

In (b), (d), and (f) the fifth (leftmost) bit is the carry out of the sign position; it is ignored, and the result is the low-order four bits.

OVERFLOW RULE: If two numbers are added, and they are
both positive or both negative, then overflow occurs if and
only if the result has the opposite sign.
SUBTRACTION RULE: To subtract one number (subtrahend)
from another (minuend), take the twos complement
(negation) of the subtrahend and add it to the minuend.
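
The overflow and subtraction rules can be checked with a minimal sketch. The function names and the bit-string interface are assumptions; the word width is taken from the length of the operands.

```python
from typing import Tuple


def add(a: str, b: str) -> Tuple[str, bool]:
    """Add two equal-width twos complement bit strings.

    Returns (result, overflow). The carry out of the leftmost bit is discarded.
    """
    bits = len(a)
    mask = (1 << bits) - 1
    result = format((int(a, 2) + int(b, 2)) & mask, f"0{bits}b")
    # OVERFLOW RULE: operands share a sign and the result has the opposite sign.
    overflow = a[0] == b[0] and result[0] != a[0]
    return result, overflow


def subtract(m: str, s: str) -> Tuple[str, bool]:
    """M - S: negate the subtrahend S and add it to the minuend M.

    Simplification: negating the most negative value (e.g. 1000) overflows
    by itself, and this sketch does not detect that corner case.
    """
    bits = len(s)
    mask = (1 << bits) - 1
    neg_s = format((-int(s, 2)) & mask, f"0{bits}b")
    return add(m, neg_s)
```

For example, add("0101", "0100") returns ("1001", True), the overflow case (+5) + (+4), and subtract("0010", "0111") returns ("1011", False), which is 2 – 7 = –5.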

Figure 10.4 Subtraction of Numbers in Twos Complement Representation (M – S)

(a) M = 2 = 0010         (b) M = 5 = 0101
    S = 7 = 0111             S = 2 = 0010
   –S     = 1001            –S     = 1110

     0010 = 2                 0101 = 5
   + 1001 = –7              + 1110 = –2
     1011 = –5               10011 = 3

(c) M = –5 = 1011        (d) M = 5 = 0101
    S = 2 = 0010             S = –2 = 1110
   –S     = 1110            –S      = 0010

     1011 = –5                0101 = 5
   + 1110 = –2              + 0010 = 2
    11001 = –7                0111 = 7

(e) M = 7 = 0111         (f) M = –6 = 1010
    S = –7 = 1001            S = 4 = 0100
   –S      = 0111           –S     = 1100

     0111 = 7                 1010 = –6
   + 0111 = 7               + 1100 = –4
     1110 = Overflow         10110 = Overflow

As in Figure 10.3, a fifth (leftmost) bit is the carry out of the sign position and is ignored.

[Figure 10.6 Block Diagram of Hardware for Addition and Subtraction: the A register feeds the adder directly; the B register feeds it either directly or through the complementer, as selected by SW; the adder sets OF. OF = overflow bit; SW = switch (select addition or subtraction).]
• For addition,
  – the two numbers are presented to the adder from
two registers, the A and B registers. The result
may be stored in one of these registers or in a
third.
• For subtraction,
  – the subtrahend (B register) is passed through a
twos complementer so that its twos complement
is presented to the adder.
• The overflow indication is stored in a 1-bit overflow
flag (0 = no overflow; 1 = overflow).
• Control signals are needed to control whether or not
the complementer is used, depending on whether the
operation is addition or subtraction.

Machine Instruction Characteristics
• The operation of the processor is determined by the instructions it
executes, referred to as machine instructions or computer
instructions.
• The collection of different instructions that the processor can
execute is referred to as the processor’s instruction set.
• Each instruction must contain the information required by the
processor for execution.

Elements of a Machine Instruction
Operation code: Specifies the operation to be performed (e.g., ADD, I/O).
The operation is specified by a binary code, known as the operation code, or op-code.
Source operand reference: The operation may involve one or more source
operands, that is, operands that are inputs for the operation.
Result operand reference: The operation may produce a result.
Next instruction reference: This tells the processor where to fetch the next
instruction after the execution of this instruction is complete.

© 2016 Pearson Education, Inc., Hoboken, NJ. All rights
reserved.
[Figure 12.1 Instruction Cycle State Diagram: instruction address calculation, instruction fetch, instruction operation decoding, operand address calculation, operand fetch (repeated for multiple operands), data operation, operand address calculation, operand store (repeated for multiple results); then either a return for string or vector data or, with the instruction complete, the fetch of the next instruction.]

Source and result operands can be in one of four areas:
• Main or virtual memory: As with next instruction references, the main or virtual
memory address must be supplied.
• Processor register: With rare exceptions, a processor contains one or more registers
that may be referenced by machine instructions.
• Immediate: The value of the operand is contained in a field in the instruction being
executed.
• I/O device: The instruction must specify the I/O module and device for the
operation. If memory-mapped I/O is used, this is just another main or virtual
memory address.

Instruction format
• Instruction: a single step of a program. A program is an ordered collection of
instructions.
  – The CU reads an instruction from memory and executes it.
  – An instruction consists of an opcode, usually with some additional information such as
where operands come from and where results go.
• Each instruction is represented by a sequence of bits.
• The bits of an instruction are divided into groups called fields.
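
As a sketch of how an instruction word divides into fields, the code below packs and unpacks a hypothetical 16-bit format with a 4-bit opcode and two 6-bit operand references. The field widths, the names encode and decode, and the opcode value are assumptions for illustration, not a format defined in these slides.

```python
OPCODE_BITS = 4    # hypothetical width of the opcode field
OPERAND_BITS = 6   # hypothetical width of each operand reference field


def encode(opcode: int, operand1: int, operand2: int) -> int:
    """Pack the three fields into one 16-bit instruction word."""
    assert 0 <= opcode < (1 << OPCODE_BITS)
    assert 0 <= operand1 < (1 << OPERAND_BITS)
    assert 0 <= operand2 < (1 << OPERAND_BITS)
    return (opcode << (2 * OPERAND_BITS)) | (operand1 << OPERAND_BITS) | operand2


def decode(word: int):
    """Split a 16-bit instruction word back into (opcode, operand1, operand2)."""
    operand_mask = (1 << OPERAND_BITS) - 1
    opcode = (word >> (2 * OPERAND_BITS)) & ((1 << OPCODE_BITS) - 1)
    operand1 = (word >> OPERAND_BITS) & operand_mask
    operand2 = word & operand_mask
    return opcode, operand1, operand2


word = encode(1, 5, 9)            # opcode 1 is an arbitrary example value
fields = decode(word)
bit_pattern = format(word, "016b")
```

Decoding is just shifting and masking, which is why fixed field positions make the control unit's job simpler.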

Op-codes are represented by abbreviations called mnemonics.
• Examples include:
  – ADD   Add
  – SUB   Subtract
  – MUL   Multiply
  – DIV   Divide
  – LOAD  Load data from memory
  – STOR  Store data to memory
• Operands are also represented symbolically. For example,
  – ADD R, Y means add the value contained in data location Y to the contents of register R.
• Each symbolic op-code has a fixed binary representation.
• The programmer specifies the location of each symbolic operand.

Instruction Types
Any program written in a high-level language must be translated into machine
language to be executed.
Thus, the set of machine instructions must be sufficient to express any of the
instructions from a high-level language.
Instructions are categorized as follows:
• Data processing: Arithmetic and logic instructions.
• Data storage: Movement of data into or out of register and/or memory locations.
• Data movement: I/O instructions.
  – I/O instructions are needed to transfer programs and data into memory and the results of computations back
out to the user.
• Control: Test and branch instructions.
  – Test instructions are used to test the value of a data word or the status of a computation.
  – Branch instructions are then used to branch to a different set of instructions depending on the decision
made.

Types of instruction
• Note: the number of address fields in the instruction format of a computer depends on the internal
organization of its registers
1. Three addresses instruction
• Operand 1, Operand 2, Result
• format : op X,Y,Z;
E.g. ADD X,Y,Z;
• Not common
• Needs very long words (and a more complex design) to hold three addresses

Types of instruction cont..
2. Two addresses instruction
• One address doubles as operand and result
• format: op X,Y;
E.g. SUB X,Y;
• Most common in commercial computers
• Reduces the length of the instruction
• Requires some extra work
• Needs temporary storage to hold some results

Cont…
3. One address instruction
• Implicit second address
• Uses an accumulator, which contains one of the operands and is used to store the result
• format: op X;
E.g. MUL X;
• Common on early machines
4. 0 (zero) addresses
• All addresses implicit
• Uses a stack
• format: OP;
e.g. DIV;

Example: Evaluate (A+B) × (C+D)
• Three-Address
1. ADD R1, A, B; R1 ← M[A] + M[B]
2. ADD R2, C, D; R2 ← M[C] + M[D]
3. MUL X, R1, R2; M[X] ← R1 × R2
• Two-Address
1. MOV R1, A; R1 ← M[A]
2. ADD R1, B; R1 ← R1 + M[B]
3. MOV R2, C; R2 ← M[C]
4. ADD R2, D; R2 ← R2 + M[D]
5. MUL R1, R2; R1 ← R1 × R2
6. MOV X, R1; M[X] ← R1

• One-Address
1. LOAD A; AC ← M[A]
2. ADD B; AC ← AC + M[B]
3. STORE T; M[T] ← AC
4. LOAD C; AC ← M[C]
5. ADD D; AC ← AC + M[D]
6. MUL T; AC ← AC × M[T]
7. STORE X; M[X] ← AC

• Zero-Address
1. PUSH A; TOS ← A
2. PUSH B; TOS ← B
3. ADD; TOS ← (A + B)
4. PUSH C; TOS ← C
5. PUSH D; TOS ← D
6. ADD; TOS ← (C + D)
7. MUL; TOS ← (C + D) × (A + B)
8. POP X; M[X] ← TOS
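
The zero-address program can be traced with a tiny stack-machine sketch. The interpreter, its name run_stack_program, and the sample values for A to D are assumptions for illustration; only the eight-instruction program comes from the example.

```python
def run_stack_program(program, memory):
    """Execute zero-address instructions; operands are implicit on the stack."""
    stack = []
    for line in program:
        mnemonic, *operands = line.split()
        if mnemonic == "PUSH":                 # TOS <- M[operand]
            stack.append(memory[operands[0]])
        elif mnemonic == "POP":                # M[operand] <- TOS
            memory[operands[0]] = stack.pop()
        elif mnemonic in ("ADD", "MUL"):
            right = stack.pop()                # top of stack
            left = stack.pop()                 # second element of stack
            stack.append(left + right if mnemonic == "ADD" else left * right)
        else:
            raise ValueError(f"unknown instruction: {line}")
    return memory


program = ["PUSH A", "PUSH B", "ADD", "PUSH C", "PUSH D", "ADD", "MUL", "POP X"]
memory = run_stack_program(program, {"A": 2, "B": 3, "C": 4, "D": 5})
```

After the run, memory["X"] holds (2 + 3) × (4 + 5) = 45. Only PUSH and POP name a memory location; ADD and MUL carry no address at all.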

Figure 12.3 Programs to Execute Y = (A – B) / (C + (D × E))

(a) Three-address instructions
Instruction      Comment
SUB Y, A, B      Y ← A – B
MPY T, D, E      T ← D × E
ADD T, T, C      T ← T + C
DIV Y, Y, T      Y ← Y ÷ T

(b) Two-address instructions
Instruction      Comment
MOVE Y, A        Y ← A
SUB Y, B         Y ← Y – B
MOVE T, D        T ← D
MPY T, E         T ← T × E
ADD T, C         T ← T + C
DIV Y, T         Y ← Y ÷ T

(c) One-address instructions
Instruction      Comment
LOAD D           AC ← D
MPY E            AC ← AC × E
ADD C            AC ← AC + C
STOR Y           Y ← AC
LOAD A           AC ← A
SUB B            AC ← AC – B
DIV Y            AC ← AC ÷ Y
STOR Y           Y ← AC
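
The one-address program of Figure 12.3(c) can be traced with a small accumulator-machine sketch. The interpreter, its name, and the sample operand values are assumptions for illustration.

```python
def run_accumulator_program(program, memory):
    """Execute one-address instructions; the accumulator (AC) is implicit."""
    operations = {
        "ADD": lambda ac, x: ac + x,
        "SUB": lambda ac, x: ac - x,
        "MPY": lambda ac, x: ac * x,
        "DIV": lambda ac, x: ac / x,
    }
    ac = 0
    for line in program:
        mnemonic, address = line.split()
        if mnemonic == "LOAD":        # AC <- M[address]
            ac = memory[address]
        elif mnemonic == "STOR":      # M[address] <- AC
            memory[address] = ac
        else:                         # AC <- AC op M[address]
            ac = operations[mnemonic](ac, memory[address])
    return memory


program_c = ["LOAD D", "MPY E", "ADD C", "STOR Y",
             "LOAD A", "SUB B", "DIV Y", "STOR Y"]
result = run_accumulator_program(
    program_c, {"A": 20, "B": 4, "C": 2, "D": 2, "E": 3})
```

With these values the denominator C + D × E = 8 is parked in Y first, then A – B = 16 is divided by it, leaving Y = 2. The extra STOR and LOAD are the price of having a single implicit register.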

Table 12.1 Utilization of Instruction Addresses (Non-branching Instructions)

Number of Addresses   Symbolic Representation   Interpretation
3                     OP A, B, C                A ← B OP C
2                     OP A, B                   A ← A OP B
1                     OP A                      AC ← AC OP A
0                     OP                        T ← (T – 1) OP T

AC = accumulator
T = top of stack
(T – 1) = second element of stack
A, B, C = memory or register locations

Instruction Set Design
The most important of these fundamental design issues include the following:
• Operation repertoire: How many and which operations to provide, and how complex
operations should be.
• Data types: The various types of data upon which operations are performed.
• Instruction format: Instruction length (in bits), number of addresses, size of various fields,
and so on.
• Registers: Number of processor registers that can be referenced by instructions, and their
use.
• Addressing: The mode or modes by which the address of an operand is specified.

Addressing Modes
Addressing modes specify where an operand is located and how to compute the exact memory
address of an operand.
They can specify a constant, a register, or a memory location.
The actual location of an operand is its effective address.
In general, an addressing mode conveys two things:
1. The exact location of an operand
2. How to find that memory address

Types of addressing modes
• An operand reference in an instruction either contains the actual
value of the operand (immediate) or a reference to the address of
the operand.
• A wide variety of addressing modes is used in various instruction
sets, such as:
Immediate
Direct
Indirect
Register
Register Indirect
Displacement (Indexed)
Stack

Immediate Addressing
• Operand is part of instruction
• Operand = address field
e.g. ADD 5: Add 5 to contents of accumulator
5 is operand
• The simplest form of addressing, in which the operand value is present in the
instruction.
• No memory reference to fetch data (saves one memory cycle).
• Fast, but with a limited operand range: the size of the number is restricted to the
size of the address field.

Direct Addressing
• Address field contains address of operand.
• Effective address (EA) = address field (A)
• e.g. ADD A
• Add contents of cell A to accumulator
• Look in memory at address A for operand
• Single memory reference to access data
• No additional calculations to work out effective address
• Limited address space (limitation)

Indirect Addressing
• With direct addressing, the length of the address field is
usually less than the word length, thus limiting the address
range.
• Solution: having the address field refer to the address of a
word in memory, which contains a full-length address of the
operand. This is known as indirect addressing.
• Reference to the address of a word in memory which
contains a full-length address of the operand.
• EA = (A)
• Look in A, find address (A) and look there for operand
• e.g. ADD (A)
• Add contents of cell pointed to by contents of A to
accumulator.

Cont..
• Multiple memory accesses to find operand
• Instruction execution requires two memory references to fetch the operand
• One to get its address and a second to get its value
• Hence slower
• Large address space
• 2^n, where n = word length
• May be nested, multilevel, cascaded
• e.g. EA = (((A)))
• Parentheses are to be interpreted as meaning ‘contents of’

Register Addressing
• Similar to direct addressing. The only difference is that the address field refers
to a register rather than a main memory address.
• The operand is held in the register named in the address field
EA = R
• Very small address field needed
• Shorter instructions
• Faster instruction fetch
• Limited number of registers
• The address space is very limited

Register Addressing …
• Advantage:
• No memory access
• No time-consuming memory references are required
• Very fast execution
• Disadvantage: Very limited address space

Register Indirect Addressing
• Analogous to indirect addressing.
The only difference is whether the address field refers to a
memory location or a register.
EA = (R)
• Operand is in memory cell pointed to by contents of
register R.
• Address space limitation is overcome by having that field
refer to a word-length location containing an address.
• Large address space (2^n)
• Uses one less memory reference than indirect
addressing

Displacement Addressing
• Combines the capabilities of direct addressing and register
indirect addressing.
EA = A + (R)
• Instruction has two address fields
A = base value
R = register that holds displacement or vice versa (a register whose
contents are added to A to produce the effective address)
• Common uses of displacement addressing
 Relative addressing (PC-relative addressing)
 Base-register addressing
 Indexing

Relative Addressing
• The content of the program counter is added
to the address field of the instruction to
obtain the effective address.
• EA = (PC) + address field value
• For a relative branch, the program counter is updated as PC = PC + relative value

Base-Register Addressing
• The base register content is added to the addressing field of the
instruction to obtain the effective address.

Indexing
• The content of index register is added to the address part of the instruction
to obtain the EA.
• The method of calculating the EA is the same as for base-register
addressing
• Auto indexing
• Automatically increment or decrement the index register after each reference to it
• EA = A + (R)
• (R) ← (R) + 1
• Postindexing
• Indexing is performed after the indirection
• EA = (A) + (R)
• Preindexing
• Indexing is performed before the indirection
• EA = (A + (R))
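
The three indexing variants differ only in when the register is added and whether it is updated. A minimal sketch, with memory as a dictionary and all names and sample values assumed for illustration:

```python
def autoindexed_ea(a, r, registers):
    """EA = A + (R), then (R) <- (R) + 1."""
    ea = a + registers[r]
    registers[r] += 1                  # automatic increment after the reference
    return ea


def postindexed_ea(a, r, memory, registers):
    """Indirection first, then indexing: EA = (A) + (R)."""
    return memory[a] + registers[r]


def preindexed_ea(a, r, memory, registers):
    """Indexing first, then indirection: EA = (A + (R))."""
    return memory[a + registers[r]]


memory = {100: 500, 103: 700}          # sample contents: address -> stored value
registers = {"X": 3}                   # sample index register

post = postindexed_ea(100, "X", memory, registers)   # memory[100] + 3
pre = preindexed_ea(100, "X", memory, registers)     # memory[100 + 3]
auto = autoindexed_ea(100, "X", registers)           # 100 + 3, then X becomes 4
```

Here post is 503, pre is 700, and auto is 103 with the index register left at 4, ready for the next array element.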

Stack Addressing
• A stack is a linear array of locations
• Sometimes referred to as a pushdown list or last-in-first-out
queue
• A stack is a reserved block of locations
• Items are appended to the top of the stack so that the block
is partially filled
• The machine instructions need not include a memory
reference but implicitly operate on the top of the
stack
• Operand is (implicitly) on top of stack
 ADD
• Pop top two items from stack, add, push the result onto stack

Summary of Addressing Modes
Mode               Algorithm          Principal Advantage    Principal Disadvantage
Immediate          Operand = A        No memory reference    Limited operand magnitude
Direct             EA = A             Simple                 Limited address space
Indirect           EA = (A)           Large address space    Multiple memory references
Register           EA = R             No memory reference    Limited address space
Register indirect  EA = (R)           Large address space    Extra memory reference
Displacement       EA = A + (R)       Flexibility            Complexity
Stack              EA = top of stack  No memory reference    Limited applicability
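
The algorithms in the summary can be made concrete with a small operand-fetch sketch. Memory and registers are modelled as dictionaries; the function name, mode strings, and sample contents are assumptions for illustration, and stack addressing is left out.

```python
def fetch_operand(mode, memory, registers, a=None, r=None):
    """Return the operand selected by an addressing mode.

    a is the address field of the instruction, r the register it names.
    """
    if mode == "immediate":            # Operand = A
        return a
    if mode == "direct":               # EA = A
        return memory[a]
    if mode == "indirect":             # EA = (A): two memory references
        return memory[memory[a]]
    if mode == "register":             # EA = R: no memory reference
        return registers[r]
    if mode == "register_indirect":    # EA = (R): one memory reference
        return memory[registers[r]]
    if mode == "displacement":         # EA = A + (R)
        return memory[a + registers[r]]
    raise ValueError(f"unknown addressing mode: {mode}")


memory = {10: 20, 20: 99, 25: 7}       # sample contents: address -> value
registers = {"R1": 20, "R2": 15}       # sample register contents
```

With this sample state, direct addressing of 10 yields 20, indirect addressing of 10 yields 99, and displacement addressing with A = 10 and R2 = 15 reads location 25 and yields 7. Counting the memory[...] lookups per branch reproduces the memory-reference column of the table.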

CPU Structure and Function
Processor Organization
Things a CPU must do (CPU requirements):
Fetch instruction: reads an instruction from memory (register, cache, main memory).
Interpret instruction: the instruction is decoded to determine what action is required.
Fetch data: reads data from memory or an I/O module.
Process data: performs some arithmetic or logical operation on data.
Write data: writes the execution result to memory or an I/O module.
A small amount of internal memory, called registers, is needed to fulfill these
requirements (to store some data temporarily).

[Figure 14.1 The CPU with the System Bus: the CPU (ALU, registers, control unit) connects to the system bus, which comprises the control bus, data bus, and address bus.]
• The ALU
  – Does the actual computation or processing of data.
• The control unit
  – Controls the movement of data and instructions into and
out of the processor
  – Controls the operation of the ALU
  – Decodes the op-code

[Figure 14.2 Internal Structure of the CPU: the control unit, the registers, and the arithmetic and logic unit communicate over the internal CPU bus and control paths; the ALU contains status flags, a shifter, a complementer, and arithmetic and Boolean logic.]

Register Organization
Within the processor there is a set of registers that function as a level
of memory above main memory and cache in the hierarchy.
The registers in the processor perform two roles:
• User-visible registers
  – Enable the machine or assembly language programmer to minimize
main memory references by optimizing use of registers.
• Control and status registers
  – Used by the control unit to control the operation of the CPU.
  – Used by privileged operating system programs to control the
execution of programs.
  – Not visible to the user.

User-Visible Registers
General-purpose registers: used for any type of functions.
Data registers: only used to hold data.
Address registers: used to hold address information.
Segment pointers: holds the address of the base of the segment.
Index registers: These are used for indexed addressing and may be auto indexed.
Stack pointer: dedicated register that points to the top of the stack.
Condition codes (also referred to as flags): bits set by the processor hardware
as the result of operations. For example, an
arithmetic operation may produce a positive, negative, zero, or overflow result.

Control and Status Registers
Four registers are essential to instruction execution:
Program counter (PC)
Contains the address of an instruction to be fetched.
Instruction register (IR)
Contains the instruction most recently fetched.
Memory address register (MAR)
Contains the address of a location in memory.
Memory buffer register (MBR)
Contains a word of data to be written to memory or the word most recently
read.

Program Status Word (PSW): Register or set of registers that contain status
information
 Common fields or flags include:
Sign: sign bit of the result of the last arithmetic operation.
Zero: Set when the result is 0.
Carry: Set if an operation resulted in a carry (addition) into or borrow (subtraction) out
of a high-order bit.
Equal: Set if a logical compare result is equality.
Overflow: Used to indicate arithmetic overflow.
Interrupt Enable/Disable: Used to enable or disable interrupts.
Supervisor: Indicates whether the processor is executing in supervisor or user mode.
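
As an illustration of how the arithmetic flags relate to a result, the sketch below adds two 8-bit values and derives the Sign, Zero, Carry, and Overflow bits. The function name, the dictionary layout, and the 8-bit default are assumptions; real processors define their own flag registers.

```python
def add_with_flags(a: int, b: int, bits: int = 8):
    """Add two unsigned bit patterns and derive PSW-style flags."""
    mask = (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    a, b = a & mask, b & mask
    total = a + b
    result = total & mask
    flags = {
        "sign": bool(result & sign_bit),     # sign bit of the result
        "zero": result == 0,                 # result is 0
        "carry": total > mask,               # carry out of the high-order bit
        # Overflow: operands agree in sign but the result's sign differs.
        "overflow": bool(~(a ^ b) & (a ^ result) & sign_bit),
    }
    return result, flags


result_a, flags_a = add_with_flags(0x7F, 0x01)   # +127 + 1
result_b, flags_b = add_with_flags(0xFF, 0x01)   # -1 + 1
```

The first addition sets Sign and Overflow but not Carry; the second sets Zero and Carry but not Overflow, which shows that Carry and Overflow are independent conditions.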

CPU design issues
• Whether to use completely general-purpose registers or to specialize their
use.
  – Specialized registers save bits in instructions because their use can be implicit.
  – General-purpose registers are more flexible.
• The number of registers, either general purpose or data plus address, to
be provided.
  – More registers require more operand specifier bits.
  – Between 8 and 32 registers appears optimum.
• Register length.
  – Address registers must be at least long enough to hold the largest address.
  – Data registers should be able to hold values of most data types.

The Instruction cycle
An instruction cycle includes the following stages:
Fetch: Read the next instruction from memory into the processor.
Execute: Interpret the op-code and perform the indicated operation.
Interrupt: If interrupts are enabled and an interrupt has occurred, save the current process
state and service the interrupt.
The instruction cycle consists of these phases:
– Fetch an instruction from memory
– Decode the instruction
– Read the effective address from memory if the operand has an indirect address.
– Execute the instruction.
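
The fetch, decode, indirect, and execute phases can be sketched as a loop over a toy accumulator machine. The instruction encoding as (opcode, address, indirect) tuples, the HALT opcode, and the sample program are assumptions for illustration; the interrupt stage is not modelled.

```python
def run(memory, pc=0):
    """Run a toy machine until HALT and return the accumulator."""
    ac = 0
    while True:
        # Fetch: PC -> MAR, memory word -> MBR -> IR, then advance the PC.
        mar = pc
        mbr = memory[mar]
        ir = mbr
        pc += 1
        # Decode: split the instruction into its fields.
        opcode, address, indirect = ir
        if opcode == "HALT":
            return ac
        # Indirect: read the effective address from memory if required.
        if indirect:
            address = memory[address]
        # Execute: perform the indicated operation.
        if opcode == "LOAD":
            ac = memory[address]
        elif opcode == "ADD":
            ac += memory[address]
        elif opcode == "STORE":
            memory[address] = ac
        else:
            raise ValueError(f"unknown opcode: {opcode}")


memory = {
    0: ("LOAD", 10, False),    # AC <- M[10]
    1: ("ADD", 11, True),      # AC <- AC + M[M[11]]  (indirect)
    2: ("STORE", 12, False),   # M[12] <- AC
    3: ("HALT", None, False),
    10: 5, 11: 13, 12: 0, 13: 7,
}
final_ac = run(memory)
```

The run leaves 12 in the accumulator and in location 12. Only the ADD instruction goes through the indirect phase, which costs it one extra memory read.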

[Figure 14.4 The Instruction Cycle: Fetch, Indirect, Execute, and Interrupt stages.]

[Figure 14.5 Instruction Cycle State Diagram: instruction address calculation, instruction fetch, instruction operation decoding, operand address calculation, indirection, operand fetch (repeated for multiple operands), data operation, operand address calculation, indirection, operand store (repeated for multiple results), and interrupt check; after the check the cycle either continues with the next instruction fetch (no interrupt) or services the interrupt first. A return path handles string or vector data.]

Instruction Pipelining
• To improve the performance of a CPU we have two options:
1. Improve the hardware by introducing faster circuits.
2. Arrange the hardware such that more than one operation can be performed at
the same time.
• Because there is a limit on the speed of hardware and faster circuits are
costly, the second option is usually preferred.
• Pipelining arranges the hardware elements of the CPU so
that its overall performance is increased.
• In a pipelined processor, more than one instruction is in execution at the
same time.
• This is achieved largely without additional hardware, by letting different parts of the
hardware work on different instructions at the same time.
• Thus, pipelined operation increases the efficiency of a system.

Cont.…
• Data dependencies can be addressed by reordering the instructions
when possible (compiler).
• Processors make use of instruction pipelining to speed up execution
by breaking up the instruction cycle into a number of separate stages that
occur in sequence, such as fetch instruction, decode instruction, determine
operand addresses, fetch operands, execute instruction, and write operand
result.
• Instructions move through these stages, as on an assembly line, so
that in principle, each stage can be working on a different instruction
at the same time.

Cont.…
• It might appear that a greater number of stages always provides better
performance.
• However:
  – A greater number of stages increases the overhead of moving information
between stages and of synchronizing the stages.
  – The complexity of the CPU grows with the number of stages.
  – It is difficult to keep a large pipeline at its maximum rate because of pipeline
hazards.

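[Figure 14.9 Two-Stage Instruction Pipeline: (a) simplified view: instructions enter the Fetch stage, which passes each instruction to the Execute stage, which produces the result; (b) expanded view: either stage may have to wait, and Execute can send a new address back to Fetch, which then discards the instruction it has already fetched.]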

Additional Stages
• Fetch instruction (FI)
• Read the next expected instruction into a
buffer
• Decode instruction (DI)
• Determine the opcode and the operand
specifiers
• Calculate operands (CO)
• Calculate the effective address of each
source operand
• This may involve displacement, register
indirect, indirect, or other forms of
address calculation
• Fetch operands (FO)
• Fetch each operand from memory
• Operands in registers need not be fetched
• Execute instruction (EI)
• Perform the indicated operation and store
the result, if any, in the specified
destination operand location
• Write operand (WO)
• Store the result in memory

Figure 14.10 Timing Diagram for Instruction Pipeline Operation
               Time
               1   2   3   4   5   6   7   8   9   10  11  12  13  14
Instruction 1  FI  DI  CO  FO  EI  WO
Instruction 2      FI  DI  CO  FO  EI  WO
Instruction 3          FI  DI  CO  FO  EI  WO
Instruction 4              FI  DI  CO  FO  EI  WO
Instruction 5                  FI  DI  CO  FO  EI  WO
Instruction 6                      FI  DI  CO  FO  EI  WO
Instruction 7                          FI  DI  CO  FO  EI  WO
Instruction 8                              FI  DI  CO  FO  EI  WO
Instruction 9                                  FI  DI  CO  FO  EI  WO

Figure 14.11 The Effect of a Conditional Branch on Instruction Pipeline Operation
               Time
               1   2   3   4   5   6   7   8   9   10  11  12  13  14
Instruction 1  FI  DI  CO  FO  EI  WO
Instruction 2      FI  DI  CO  FO  EI  WO
Instruction 3          FI  DI  CO  FO  EI  WO
Instruction 4              FI  DI  CO  FO
Instruction 5                  FI  DI  CO
Instruction 6                      FI  DI
Instruction 7                          FI
Instruction 15                             FI  DI  CO  FO  EI  WO
Instruction 16                                 FI  DI  CO  FO  EI  WO
Branch penalty: no instruction completes during time units 9 to 12.

[Figure 14.12 Six-Stage Instruction Pipeline: flowchart of Fetch Instruction (FI), Decode Instruction (DI), Calculate Operands (CO), Fetch Operands (FO), Execute Instruction (EI), and Write Operands (WO), with the decision boxes "Unconditional branch?" and "Branch or interrupt?"; a Yes outcome leads to Update PC and Empty Pipe.]

[Figure 14.14 Speedup Factors with Instruction Pipelining: (a) speedup factor versus number of instructions (log scale) for k = 6, 9, and 12 stages; (b) speedup factor versus number of stages for n = 10, 20, and 30 instructions.]
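
Curves like those in Figure 14.14 follow from a simple idealized model, sketched below. It assumes that every stage takes one time unit and that there are no hazards or branches, so n instructions on a k-stage pipeline need k + (n – 1) time units instead of n × k. The function names are assumptions; the model agrees with Figure 14.10, where 9 instructions on 6 stages span 14 time units.

```python
def pipelined_time(n: int, k: int) -> int:
    """Time units for n instructions on an ideal k-stage pipeline."""
    # The first instruction needs k units; each later one finishes 1 unit after.
    return k + (n - 1)


def unpipelined_time(n: int, k: int) -> int:
    """Time units when each instruction runs all k stages before the next starts."""
    return n * k


def speedup(n: int, k: int) -> float:
    """Ideal speedup factor of the pipelined over the unpipelined machine."""
    return unpipelined_time(n, k) / pipelined_time(n, k)
```

Under these assumptions the speedup approaches k as n grows, which is the flattening visible in panel (a); real pipelines fall short of it because of the hazards discussed next.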

Pipeline Hazards
• Pipeline hazards are situations that prevent the next instruction in the instruction
stream from executing during its designated clock cycle.
• The instruction is said to be stalled.
• When an instruction is stalled, all instructions issued after the stalled
instruction are also stalled.
• Instructions issued before the stalled one can continue.
• No new instructions are fetched during the stall.

Pipeline Hazards
• Occur when the pipeline, or some portion of the pipeline, must stall because conditions do not permit continued execution
• Also referred to as a pipeline bubble
• There are three types of hazards:
  – Resource
  – Data
  – Control

Types of Hazards
• Structural (resource) hazards: arise from hardware resource conflicts, that
is, when the hardware cannot service all the combinations of parallel use
attempted by the stages in the pipeline.
• Data hazards: arise when an instruction depends on the result of
another instruction that has not yet produced it.
• Control hazards: arise from the presence of branches or other instructions
in the pipeline that alter the sequential instruction flow.
  – The instruction fetch depends on the result of an instruction still in the pipeline.

Structural (resource) Hazards
• Occur when a certain resource (memory, functional unit) is requested by more than
one instruction at the same time.
• There are insufficient resources to service the need.
• Sometimes resources are not sufficiently duplicated, e.g., read/write ports.
• Commonly arise when the pipe stages have uneven service rates.
Possible Solutions
  – Stall.
  – Refactor the pipeline, or pipeline the slow pipe stage.
  – Duplicate or split the resource (e.g., split or duplicate caches to alleviate memory pressure).
  – Build instruction buffers to alleviate memory pressure.

Figure 14.15 Example of Resource Hazard

(a) Five-stage pipeline, ideal case
      Clock cycle
      1    2    3    4    5    6    7    8    9
I1    FI   DI   FO   EI   WO
I2         FI   DI   FO   EI   WO
I3              FI   DI   FO   EI   WO
I4                   FI   DI   FO   EI   WO

(b) I1 source operand in memory
      Clock cycle
      1    2    3    4    5    6    7    8    9
I1    FI   DI   FO   EI   WO
I2         FI   DI   FO   EI   WO
I3              Idle FI   DI   FO   EI   WO
I4                        FI   DI   FO   EI   WO
• In Figure 14.15b, assume that the source operand for
instruction I1 is in memory, rather than a register.
Assuming memory can service only one access per cycle, the operand
fetch for I1 and the instruction fetch for I3 conflict in cycle 3.
Therefore, the fetch instruction stage of the pipeline
must idle for one cycle before beginning the
instruction fetch for instruction I3.
• A resource conflict also arises if multiple instructions are ready to
enter the execute instruction phase and there is a single ALU. One
solution is to increase the available resources, such as
having multiple ports into main memory and multiple
ALUs.

Data Hazards
• Occur when the pipeline changes the order of read/write accesses to operands.
• Consider two instructions, I1 and I2, where the execution of I2 starts before I1 has
terminated. If I2 needs the result produced by I1, but this result has not yet been
generated, we have a data hazard.
E.g. I1: MUL R2, R3 ; R2 ← R2 * R3
     I2: ADD R1, R2 ; R1 ← R1 + R2
• An early pipe stage attempts to read an operand value that has not yet been
produced by an instruction in a later pipe stage.
• If nothing is done, the program produces an incorrect result because of the use of
pipelining.
Possible Solutions
  – Stall.
  – Data forwarding, also called bypassing or short-circuiting: pass the result from the later pipe
stage directly to the earlier pipe stage that needs it, in place of the stale value that stage would otherwise fetch.

[Figure 14.16 Example of Data Hazard: timing diagram over clock cycles 1 to 10 for ADD EAX, EBX; SUB ECX, EAX; I3; and I4 through the stages FI, DI, FO, EI, WO. SUB ECX, EAX idles between DI and FO while it waits for the result of ADD.]
The first instruction adds the contents of the 32-bit registers EAX and EBX and stores the result in EAX. The
second instruction subtracts the contents of EAX from ECX and stores the result in ECX.
If the two instructions are executed in strict sequence, no problem occurs. However, if the instructions
are executed in a pipeline, then it is possible for the program to produce an incorrect result, because the
second instruction may read EAX before the first has updated it.

Types of Data Hazard
• Read after write (RAW), or true dependency
• An instruction modifies a register or memory location and a succeeding instruction reads
the data in that memory or register location.
• Hazard occurs if the read takes place before write operation is complete.
• Write after read (WAR), or antidependency
• An instruction reads a register or memory location and a succeeding instruction writes to
the location
• Hazard occurs if the write operation completes before the read operation takes place.
• Write after write (WAW), or output dependency
• Two instructions both write to the same location.
• Hazard occurs if the write operations take place in the reverse order of the intended
sequence.
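
The three dependency types can be detected by comparing the locations each instruction reads and writes. In this sketch an instruction is described by two sets, "reads" and "writes"; that representation and the function name are assumptions for illustration. A reported dependency becomes an actual hazard only if the pipeline lets the accesses happen in the wrong order.

```python
def classify_dependencies(first, second):
    """Return the dependency types between two instructions in program order."""
    found = set()
    if first["writes"] & second["reads"]:
        found.add("RAW")    # second reads what first writes (true dependency)
    if first["reads"] & second["writes"]:
        found.add("WAR")    # second writes what first reads (antidependency)
    if first["writes"] & second["writes"]:
        found.add("WAW")    # both write the same location (output dependency)
    return found


# ADD EAX, EBX : EAX <- EAX + EBX
add_instr = {"reads": {"EAX", "EBX"}, "writes": {"EAX"}}
# SUB ECX, EAX : ECX <- ECX - EAX
sub_instr = {"reads": {"ECX", "EAX"}, "writes": {"ECX"}}

dependencies = classify_dependencies(add_instr, sub_instr)
```

For the ADD and SUB pair of Figure 14.16 the only dependency is RAW on EAX, which is why SUB has to wait for ADD's result.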

Control Hazard
• Also known as a branch hazard (produced by branch instructions)
• A (conditional) branch alters the sequential flow of instructions, and the
pipeline does not know which instruction to fetch next until the branch outcome
is resolved.
Possible Solutions
• Multiple streams: replicate the initial portions of the pipeline and allow the pipeline to
fetch both possible next instructions, making use of two streams.
• Prefetching branch target: when a conditional branch is recognized, the target
of the branch is prefetched, in addition to the instruction following the branch.
• Loop buffer: the most recently fetched instructions are buffered, in sequence. If a branch
is to be taken, the hardware first checks whether the branch target is within the buffer.
• Delayed branch: redefine the runtime behavior of branches so that they take effect only after
the partially fetched/executed instructions flow through the pipeline.
• Branch prediction: predict (statically or dynamically) the outcome of the branch and
fetch from the predicted path.
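As an illustration of dynamic branch prediction, the sketch below models a 2-bit saturating counter, one common scheme. The slides do not say which scheme is intended, so this choice, the class name `TwoBitPredictor`, and the helper `count_correct` are assumptions made for the example.

```python
class TwoBitPredictor:
    """2-bit saturating counter: states 0-1 predict not taken, states 2-3 predict taken."""

    def __init__(self, state: int = 0):
        self.state = state

    def predict(self) -> bool:
        return self.state >= 2

    def update(self, taken: bool) -> None:
        # Move one step toward the observed outcome, saturating at 0 and 3.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def count_correct(outcomes, predictor=None) -> int:
    """Count how many branch outcomes the predictor guesses correctly."""
    if predictor is None:
        predictor = TwoBitPredictor()
    correct = 0
    for taken in outcomes:
        if predictor.predict() == taken:
            correct += 1
        predictor.update(taken)
    return correct


# A loop-closing branch: taken 8 times, then not taken on loop exit.
loop_branch = [True] * 8 + [False]
print(count_correct(loop_branch), "of", len(loop_branch))  # 6 of 9
```

Starting from the "strongly not taken" state, the predictor mispredicts the first two iterations and the loop exit, and predicts the remaining six correctly.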

RISC & CISC
• Instruction set: determines the way that machine language programs are
constructed.
• Its design is an important aspect of computer architecture.
• Based on instruction set design, computers are classified as CISC
(Complex Instruction Set Computing) or RISC (Reduced Instruction Set
Computing).
• Both approaches try to increase CPU performance.
• RISC: reduces the cycles per instruction at the cost of more instructions
per program.
• CISC: minimizes the number of instructions per program at the cost of more
cycles per instruction.
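The trade-off can be made concrete by multiplying the two quantities: total cycles = instructions per program x cycles per instruction. The figures below are hypothetical and chosen only to show the arithmetic. Which approach wins in practice depends on the real instruction counts, the real cycles per instruction, and the clock cycle time.

```python
def total_cycles(instruction_count: int, cycles_per_instruction: int) -> int:
    """Total clock cycles = instructions per program x cycles per instruction."""
    return instruction_count * cycles_per_instruction


# Hypothetical figures, not measurements of any real processor.
cisc_cycles = total_cycles(instruction_count=1_000, cycles_per_instruction=4)
risc_cycles = total_cycles(instruction_count=3_000, cycles_per_instruction=1)

print("CISC cycles:", cisc_cycles)  # CISC cycles: 4000
print("RISC cycles:", risc_cycles)  # RISC cycles: 3000
```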

Complex Instruction Set Computing (CISC)
• The main idea is that a single instruction performs all the loading, evaluating, and storing
operations. For example, a single multiplication instruction loads its operands,
computes the product, and stores the result; hence the instruction is complex.
• Minimizes the number of instructions per program at the cost of more
cycles per instruction.
• Code size is smaller, but each instruction takes more cycles.
• Needs more complex and more powerful hardware.
• Large variety of addressing modes.
• Variable-length instruction formats.
• Relies more on memory (RAM) operands and less on registers.

Characteristics of CISC
• Complex instructions, hence complex instruction decoding.
• Instructions may be larger than one word.
• Instructions may take more than a single clock cycle to execute.
• Fewer general-purpose registers, since operations can be performed
directly in memory.
• Complex addressing modes.
• More data types.

Reduced Instruction Set Computing (RISC)
• The main idea is to make the hardware simpler by using an instruction set
composed of a few basic steps for loading, evaluating, and storing. For example,
a load instruction only loads data and a store instruction only stores it.
• A type of microprocessor architecture that uses a small, highly optimized set of
instructions.
• Reduces the cycles per instruction at the cost of more instructions per
program.
• Designed to execute a smaller set of simple instructions so that it can operate at
higher speeds.
• Code size is larger, but each instruction takes fewer clock cycles.
• Uses more registers and fewer memory (RAM) operands.

Characteristics of RISC
• Simpler instructions, hence simple instruction decoding.
• Each instruction fits within one word.
• Instructions take a single clock cycle to execute.
• More general-purpose registers.
• Simple addressing modes.
• All operations are performed on CPU registers; only load and store instructions access memory.
• Fixed-length and easily decoded instruction format.
• Fewer data types.
• Pipelining is easy to achieve.
• Hardwired control unit.

Cont.….
Advantages of RISC
1) Because each instruction requires only one clock cycle to execute, the
sequence of simple RISC instructions executes in approximately the same amount
of time as the equivalent multi-cycle CISC "MUL" instruction.
2) These RISC "reduced instructions" require fewer transistors of hardware
space than the complex instructions, leaving more room for general-purpose
registers.
3) Because all of the instructions execute in a uniform amount of time (i.e.
one clock cycle), pipelining is possible.

Difference between CISC and RISC
RISC | CISC
Focus on software | Focus on hardware
Uses only a hardwired control unit | Uses both hardwired and microprogrammed control units
Transistors are used for more registers | Transistors are used for storing complex instructions
Fixed-size instructions | Variable-size instructions
Performs only register-to-register arithmetic operations | Performs register-to-register, register-to-memory, or memory-to-memory operations
Requires more registers | Requires fewer registers
Code size is large | Code size is small
An instruction executes in a single clock cycle | An instruction takes more than one clock cycle
An instruction fits in one word | Instructions may be larger than one word

Control Unit
The execution of a program consists of the sequential execution of
instructions.
• Each instruction is executed during an instruction cycle made up of shorter
sub-cycles (fetch, indirect, execute, interrupt).
• Each sub-cycle is made up of a sequence of more fundamental operations, called
micro-operations.
  • A single micro-operation involves a transfer between registers, a transfer between a register and an external
    bus, or a simple ALU operation.
The control unit of a processor performs two tasks:
1. Sequencing: it causes the processor to step through a series of micro-operations in the proper
sequence, based on the program being executed.
2. Execution: it generates the control signals that cause each micro-operation to be executed.
These control signals open and close logic gates, resulting in the transfer of data
to and from registers and the operation of the ALU.

Program Execution
[Figure 20.1 Constituent Elements of a Program Execution: a program execution is a sequence of instruction cycles; each instruction cycle consists of fetch, indirect, execute, and interrupt sub-cycles; each sub-cycle consists of micro-operations (µOPs).]

The following three-step process leads to a characterization of the control unit:
1. Define the basic elements of the processor.
2. Describe the micro-operations that the processor performs.
3. Determine the functions that the control unit must perform to cause the micro-operations to
be performed.
The basic functional elements of the processor are the following:
■ ALU
■ Registers
■ Internal data paths
■ External data paths
■ Control unit
Micro-operations fall into one of the following categories:
• Transfer data from one register to another.
• Transfer data from a register to an external interface (e.g., system bus).
• Transfer data from an external interface to a register.
• Perform an arithmetic or logic operation, using registers for input and output.

[Figure 20.4 Block Diagram of the Control Unit: the inputs are the instruction register, flags, clock, and control signals from the control bus; the outputs are control signals within the CPU and control signals to the control bus.]
The inputs are:
■ Clock: causes one micro-operation (or a set of simultaneous
micro-operations) to be performed for each clock pulse.
■ Instruction register: the opcode and addressing mode of
the current instruction are used to determine which
micro-operations to perform.
■ Flags: used to determine the status of the CPU and the outcome
of previous ALU operations.
■ Control signals from control bus: provide interrupt and
acknowledgment signals to the control unit.
The outputs are as follows:
■ Control signals within CPU: cause data movement from one register to another, and activate specific ALU functions.
■ Control signals to control bus: control signals to memory, and to the I/O modules.
• There are three types of control signals:
  • those that activate an ALU function;
  • those that activate a data path; and
  • those that are signals on the external system bus or other external interface.

The first step in execution (the fetch cycle) is to transfer the contents of the PC to the MAR.
• The control unit activates the control signal that opens the gates between the bits of the PC and the
MAR.
The next step is to read a word from memory into the MBR and increment the PC.
The control unit does this by sending the following control signals simultaneously:
■ A control signal that opens gates, allowing the contents of the MAR onto the address bus;
■ A memory read control signal on the control bus;
■ A control signal that opens the gates, allowing the contents of the data bus to be stored in the MBR;
■ Control signals to logic that adds 1 to the contents of the PC and stores the result back in the PC.
• Finally, the control unit sends a control signal that opens the gates between the MBR and the IR.
• The control unit then decides whether to perform an indirect cycle or an execute cycle next. To decide this,
it examines the IR to see whether an indirect memory reference is made.
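The fetch steps above can be traced with a small simulation. This is a minimal sketch under stated assumptions: the CPU state is a plain dictionary, memory is a Python list of words, and the function name `fetch_cycle` and the sample memory contents are made up for illustration.

```python
def fetch_cycle(cpu: dict, memory: list) -> list:
    """Apply the fetch micro-operations to a toy CPU state and return a trace."""
    trace = []

    # Step 1: gates between PC and MAR are opened.
    cpu["MAR"] = cpu["PC"]
    trace.append("MAR <- PC")

    # Step 2: memory read and PC increment (signalled simultaneously in hardware).
    cpu["MBR"] = memory[cpu["MAR"]]
    cpu["PC"] += 1
    trace.append("MBR <- Memory[MAR]; PC <- PC + 1")

    # Step 3: gates between MBR and IR are opened.
    cpu["IR"] = cpu["MBR"]
    trace.append("IR <- MBR")

    return trace


# Hypothetical initial state: the next instruction word is at address 2.
cpu = {"PC": 2, "MAR": 0, "MBR": 0, "IR": 0}
memory = [0x0000, 0x0000, 0x1940, 0x5941]

for step in fetch_cycle(cpu, memory):
    print(step)
print(cpu)
```

After the call, the IR holds the word that was at address 2 and the PC points to address 3, ready for the next fetch.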

Two major types of Control Unit
Hardwired Control Unit:
• The control logic is implemented with gates, flip-flops, decoders, and other digital circuits.
• + Fast operation.
• - Requires wiring changes if the design has to be modified.
Micro-programmed Control Unit:
• The control information is stored in a control memory, and the control memory is
programmed to initiate the required sequence of micro-operations.
• + Any required change can be made by updating the microprogram in control memory (easy
to modify).
• - Slow operation.

Hardwired Implementation
• The control unit is essentially a combinatorial circuit.
• The control signals are generated by hardware using conventional
logic design techniques.
• The control signals, which specify the micro-operations, are a group of bits that
select the paths in multiplexers, decoders, and arithmetic logic units.
Problems
• Complex sequencing and micro-operation logic.
• Difficult to design and test.
• Inflexible design.
• Difficult to add new instructions.

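[Figure 20.10 Control Unit with Decoded Inputs: the instruction register feeds a decoder that produces I0, I1, ..., Ik; the clock drives a timing generator that produces T1, T2, ..., Tn; these signals and the flags are inputs to the control unit, which outputs control signals C0, C1, ..., Cm.]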

Microprogrammed Implementation
An alternative to a hardwired control unit.
The logic of the control unit is specified by a microprogram.
• A microprogram is a sequence of microinstructions.
• Microinstructions are very simple instructions that specify micro-operations.
Each microinstruction is stored in a control memory in the form of a control word,
which is read to initiate (generate) the required sequence of micro-operations.
A microprogrammed control unit is a relatively simple logic circuit that is capable of
1. sequencing through microinstructions and
2. generating control signals to execute each microinstruction.
• The control signals generated by a microinstruction cause register transfers and ALU
operations.

Cont...
Dynamic microprogramming: control memory = RAM
• RAM is writable, so the microprogram can be changed (writable control memory).
• The microprogram is loaded initially from an auxiliary memory such as a magnetic disk.
Static microprogramming: control memory = ROM
• Control words in ROM are made permanent during hardware production.
Control Memory
• A memory that is part of the control unit.
• Computer memory therefore includes:
  • Main memory: stores user programs (machine instructions and data).
  • Control memory: stores the microprogram (a series of microinstructions).

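[Figure 21.4 Functioning of Microprogrammed Control Unit: the instruction register feeds a decoder and the sequencing logic; the sequencing logic, driven by the clock and the ALU flags, loads the control address register and issues a read to the control memory; the control word is read into the control buffer register, which feeds a second decoder that produces the next-address control, control signals within the CPU, and control signals to the system bus.]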
The control unit functions as follows; all of this happens during one clock pulse:
1. The sequencing logic issues a READ command to the
control memory.
2. The control word whose address is specified in
the control address register is read into the control
buffer register.
3. The content of the control buffer register
generates control signals and next-address
information for the sequencing logic unit.
4. The sequencing logic unit loads a new address
into the control address register based on the
information from the control buffer register and the
ALU flags.
• The upper decoder translates the opcode of the IR into
a control memory address.
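The four steps can be sketched as a loop over a toy control memory. Everything in this example is an assumption made for illustration: the control-word layout (a dictionary with `signals`, `next`, and an optional `branch` on a flag), the sample microprogram, and the function name `run_microprogram` do not come from any real processor.

```python
# Toy control memory: each control word lists the control signals it asserts,
# the default next address, and optionally (flag name, address) for a conditional jump.
CONTROL_MEMORY = [
    {"signals": ["MAR <- PC"], "next": 1},
    {"signals": ["MBR <- Memory[MAR]", "PC <- PC + 1"], "next": 2},
    {"signals": ["IR <- MBR"], "next": None, "branch": ("Z", 3)},
    {"signals": ["hypothetical extra micro-operation"], "next": None},
]


def run_microprogram(control_memory: list, start: int, flags: dict) -> list:
    """Sequence through control words and collect the control signals they issue."""
    car = start  # control address register
    issued = []
    while car is not None:
        # Steps 1-2: read the control word addressed by the CAR into the CBR.
        cbr = control_memory[car]
        # Step 3: the CBR contents generate the control signals.
        issued.extend(cbr["signals"])
        # Step 4: load the next address from the CBR's next-address info and the flags.
        branch = cbr.get("branch")
        if branch is not None and flags.get(branch[0], False):
            car = branch[1]
        else:
            car = cbr["next"]
    return issued


print(run_microprogram(CONTROL_MEMORY, start=0, flags={"Z": False}))
print(run_microprogram(CONTROL_MEMORY, start=0, flags={"Z": True}))
```

With the Z flag clear, the microprogram ends after the third control word. With Z set, the sequencing logic loads address 3 and one more control word is executed.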

Hardwired vs Microprogrammed
Parameter | Hardwired | Microprogrammed
Control signals | Generated using hardware. | Generated using software (microprogram).
Structure | Based on hardware, so it is rigid. | Based on software, so it is flexible.
Modification | Done by redesigning. | Done by reprogramming.
Instruction set | Small and simple. | Large and complex.
Debugging | Difficult | Easy
Emulation | Not possible | Possible
Execution speed | Very fast | Slower
Memory | No memory required | Memory required for microprogram
Cost | Low cost as no memory | High cost due to control memory
Processor | Preferred in RISC | Preferred in CISC
Design process | Sequential circuit | Programming
Chip area | More chip area | Less chip area
Pipelining | Small and efficient | Long and less efficient
