Part III
Multiplication
Parts and Chapters

I. Number Representation
  1. Numbers and Arithmetic
  2. Representing Signed Numbers
  3. Redundant Number Systems
  4. Residue Number Systems

II. Addition / Subtraction
  5. Basic Addition and Counting
  6. Carry-Lookahead Adders
  7. Variations in Fast Adders
  8. Multioperand Addition

III. Multiplication
  9. Basic Multiplication Schemes
  10. High-Radix Multipliers
  11. Tree and Array Multipliers
  12. Variations in Multipliers

IV. Division
  13. Basic Division Schemes
  14. High-Radix Dividers
  15. Variations in Dividers
  16. Division by Convergence

V. Real Arithmetic
  17. Floating-Point Representations
  18. Floating-Point Operations
  19. Errors and Error Control
  20. Precise and Certifiable Arithmetic

VI. Function Evaluation
  21. Square-Rooting Methods
  22. The CORDIC Algorithms
  23. Variations in Function Evaluation
  24. Arithmetic by Table Lookup

VII. Implementation Topics
  25. High-Throughput Arithmetic
  26. Low-Power Arithmetic
  27. Fault-Tolerant Arithmetic
  28. Reconfigurable Arithmetic

Appendix: Past, Present, and Future

(Parts II-IV are grouped under the label "Elementary Operations".)

Apr. 2012 Computer Arithmetic, Multiplication Slide 1
About This Presentation
This presentation is intended to support the use of the textbook
Computer Arithmetic: Algorithms and Hardware Designs (Oxford
U. Press, 2nd ed., 2010, ISBN 978-0-19-532848-6). It is
updated regularly by the author as part of his teaching of the
graduate course ECE 252B, Computer Arithmetic, at the
University of California, Santa Barbara. Instructors can use
these slides freely in classroom teaching and for other
educational purposes. Unauthorized uses are strictly prohibited.
© Behrooz Parhami
Edition   Released    Revised     Revised     Revised     Revised
First     Jan. 2000   Sep. 2001   Sep. 2003   Oct. 2005   May 2007
                      Apr. 2008   Apr. 2009
Second    Apr. 2010   Apr. 2011   Apr. 2012

III Multiplication
Review multiplication schemes and various speedup methods
• Multiplication is heavily used (in arithmetic and in array indexing)
• Division = reciprocation + multiplication
• Multiplication speedup: high-radix, tree, recursive
• Bit-serial, modular, and array multipliers

Topics in This Part
Chapter 9 Basic Multiplication Schemes
Chapter 10 High-Radix Multipliers
Chapter 11 Tree and Array Multipliers
Chapter 12 Variations in Multipliers
“Well, well, for a rabbit, you’re not
very good at multiplying, are you?”

9 Basic Multiplication Schemes
Chapter Goals
Study shift/add or bit-at-a-time multipliers
and set the stage for faster methods and
variations to be covered in Chapters 10-12
Chapter Highlights
Multiplication = multioperand addition
Hardware, firmware, software algorithms
Multiplying 2’s-complement numbers
The special case of one constant operand
Basic Multiplication Schemes: Topics
Topics in This Chapter
9.1 Shift/Add Multiplication Algorithms
9.2 Programmed Multiplication
9.3 Basic Hardware Multipliers
9.4 Multiplication of Signed Numbers
9.5 Multiplication by Constants
9.6 Preview of Fast Multipliers

9.1 Shift/Add Multiplication Algorithms
Notation for our discussion of multiplication algorithms:
  a   Multiplicand       $a_{k-1} a_{k-2} \ldots a_1 a_0$
  x   Multiplier         $x_{k-1} x_{k-2} \ldots x_1 x_0$
  p   Product (a × x)    $p_{2k-1} p_{2k-2} \ldots p_3 p_2 p_1 p_0$

Initially, we assume unsigned operands

[Figure: dot-notation diagram showing the multiplicand a, the multiplier x, the partial products bit-matrix with rows $x_0 a 2^0$, $x_1 a 2^1$, $x_2 a 2^2$, $x_3 a 2^3$, and the product p]
Fig. 9.1 Multiplication of two 4-bit unsigned binary numbers in dot notation.
Multiplication Recurrence

[Figure: Fig. 9.1 repeated, with partial-product rows $x_0 a 2^0$, $x_1 a 2^1$, $x_2 a 2^2$, $x_3 a 2^3$]

Multiplication with right shifts: top-to-bottom accumulation (preferred)
  $p^{(j+1)} = (p^{(j)} + x_j a 2^k)\, 2^{-1}$   with $p^{(0)} = 0$ and $p^{(k)} = p = ax + p^{(0)} 2^{-k}$
  (add $x_j a 2^k$, then shift right)

Multiplication with left shifts: bottom-to-top accumulation
  $p^{(j+1)} = 2 p^{(j)} + x_{k-j-1} a$   with $p^{(0)} = 0$ and $p^{(k)} = p = ax + p^{(0)} 2^k$
  (shift left, then add $x_{k-j-1} a$)
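
The two recurrences can be sketched in a few lines of Python. This is a minimal illustration rather than code from the textbook: the function names are hypothetical, and Python's unbounded integers stand in for the double-width partial-product register.

```python
def multiply_right_shift(a: int, x: int, k: int) -> int:
    """Right-shift recurrence: p(j+1) = (p(j) + x_j * a * 2^k) / 2."""
    p = 0
    for j in range(k):
        x_j = (x >> j) & 1              # multiplier bits, LSB first
        p = (p + x_j * (a << k)) >> 1   # add, then shift right (sum is even)
    return p


def multiply_left_shift(a: int, x: int, k: int) -> int:
    """Left-shift recurrence: p(j+1) = 2 * p(j) + x_(k-j-1) * a."""
    p = 0
    for j in range(k):
        bit = (x >> (k - j - 1)) & 1    # multiplier bits, MSB first
        p = 2 * p + bit * a             # shift left, then add
    return p


if __name__ == "__main__":
    # Operands of the worked example in this chapter: 10 x 11 with k = 4.
    print(multiply_right_shift(10, 11, 4), multiply_left_shift(10, 11, 4))
```

In the right-shift form the addition always lands in the upper k bits of the partial product, which is one reason a k-bit adder suffices in hardware; the left-shift form needs additions across the full 2k-bit width.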

Examples of Basic Multiplication
Right-shift algorithm
========================
a             1 0 1 0
x             1 0 1 1
========================
p(0)          0 0 0 0
+x0a          1 0 1 0
–––––––––––––––––––––––––
2p(1)       0 1 0 1 0
p(1)          0 1 0 1 0
+x1a          1 0 1 0
–––––––––––––––––––––––––
2p(2)       0 1 1 1 1 0
p(2)          0 1 1 1 1 0
+x2a          0 0 0 0
–––––––––––––––––––––––––
2p(3)       0 0 1 1 1 1 0
p(3)          0 0 1 1 1 1 0
+x3a          1 0 1 0
–––––––––––––––––––––––––
2p(4)       0 1 1 0 1 1 1 0
p(4)          0 1 1 0 1 1 1 0
========================

Left-shift algorithm
=======================
a                   1 0 1 0
x                   1 0 1 1
=======================
p(0)                0 0 0 0
2p(0)             0 0 0 0 0
+x3a                1 0 1 0
––––––––––––––––––––––––
p(1)              0 1 0 1 0
2p(1)           0 1 0 1 0 0
+x2a                0 0 0 0
––––––––––––––––––––––––
p(2)            0 1 0 1 0 0
2p(2)         0 1 0 1 0 0 0
+x1a                1 0 1 0
––––––––––––––––––––––––
p(3)          0 1 1 0 0 1 0
2p(3)       0 1 1 0 0 1 0 0
+x0a                1 0 1 0
––––––––––––––––––––––––
p(4)        0 1 1 0 1 1 1 0
=======================

Check: 10 × 11 = 110 = 64 + 32 + 8 + 4 + 2

Fig. 9.2 Examples of sequential multiplication with right and left shifts.
9.2 Programmed Multiplication
{Using right shifts, multiply unsigned m_cand and m_ier,
 storing the resultant 2k-bit product in p_high and p_low.
 Registers: R0 holds 0       Rc for counter
            Ra for m_cand    Rx for m_ier
            Rp for p_high    Rq for p_low}
{Load operands into registers Ra and Rx}
mult:     load    Ra with m_cand
          load    Rx with m_ier
{Initialize partial product and counter}
          copy    R0 into Rp
          copy    R0 into Rq
          load    k into Rc
{Begin multiplication loop}
m_loop:   shift   Rx right 1        {LSB moves to carry flag}
          branch  no_add if carry = 0
          add     Ra to Rp          {carry flag is set to cout}
no_add:   rotate  Rp right 1        {carry to MSB, LSB to carry}
          rotate  Rq right 1        {carry to MSB, LSB to carry}
          decr    Rc                {decrement counter by 1}
          branch  m_loop if Rc ≠ 0
{Store the product}
          store   Rp into p_high
          store   Rq into p_low
m_done:   ...

Fig. 9.3 Programmed multiplication (right-shift algorithm).

Time Complexity of Programmed Multiplication
Assume k-bit words
k iterations of the main loop
6-7 instructions per iteration, depending on the multiplier bit
Thus, 6k + 3 to 7k + 3 machine instructions,
ignoring operand loads and result store
k = 32 implies 200+ instructions on average
This is too slow for many modern applications!
Microprogrammed multiply would be somewhat better
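
The instruction counts above can be checked with a small register-level simulation of the Fig. 9.3 program. This is a sketch under stated assumptions: the function name `programmed_multiply` is hypothetical, registers are modeled as k-bit Python integers with an explicit carry flag, and operand loads and result stores are left out of the count, as on the slide.

```python
def programmed_multiply(m_cand: int, m_ier: int, k: int):
    """Simulate the right-shift program; return (p_high, p_low, instructions)."""
    mask = (1 << k) - 1
    ra, rx = m_cand & mask, m_ier & mask
    rp = rq = 0                      # copy R0 into Rp, copy R0 into Rq
    rc = k                           # load k into Rc
    executed = 3                     # the three initialization instructions
    while True:
        carry = rx & 1               # shift Rx right 1: LSB moves to carry flag
        rx >>= 1
        executed += 2                # shift + branch no_add
        if carry:
            total = rp + ra          # add Ra to Rp
            rp, carry = total & mask, total >> k   # carry flag is set to cout
            executed += 1
        # rotate Rp right 1: carry to MSB, LSB to carry
        rp, carry = (carry << (k - 1)) | (rp >> 1), rp & 1
        # rotate Rq right 1: carry to MSB, LSB to carry
        rq, carry = (carry << (k - 1)) | (rq >> 1), rq & 1
        rc -= 1                      # decr Rc
        executed += 4                # rotate, rotate, decr, branch m_loop
        if rc == 0:
            break
    return rp, rq, executed


if __name__ == "__main__":
    high, low, count = programmed_multiply(10, 11, 4)
    print((high << 4) | low, count)
```

With k = 32, an all-zero multiplier gives the lower bound of 6k + 3 = 195 instructions and an all-ones multiplier gives the upper bound of 7k + 3 = 227, which brackets the "200+ on average" figure.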

9.3 Basic Hardware Multipliers
[Figure labels: Multiplier x (shift register); Double-width partial product p(j) (shift register); Multiplicand a; Mux selecting 0 or a under control of x_j to form x_j a; k-bit Adder with carry-out cout]
$p^{(j+1)} = (p^{(j)} + x_j a 2^k)\, 2^{-1}$   (add, then shift right)
Fig. 9.4 Hardware realization of the sequential multiplication algorithm with additions and right shifts.
Example of Hardware Multiplication
[Figure: the hardware of Fig. 9.4 annotated with example values: multiplicand a = (10)ten, multiplier x = (11)ten, product p = (110)ten]
Fig. 9.4a Hardware realization of the sequential multiplication algorithm with additions and right shifts.
Performing Add and Shift in One Clock Cycle
[Figure labels: Adder’s carry-out; Adder’s sum (k bits); Partial product p(j); Unused part of the multiplier x (k–1 bits); To adder (k bits); To mux control]
Fig. 9.5 Combining the loading and shifting of the double-width register holding the partial product and the partially used multiplier.
Sequential Multiplication with Left Shifts
[Figure labels: Multiplier x (shift register); Double-width partial product p(j) (shift register, 2k bits); Multiplicand a; Mux selecting 0 or a under control of x_{k–j–1} to form x_{k–j–1}a; 2k-bit adder with carry-out cout]
Fig. 9.4b Hardware realization of the sequential multiplication algorithm with left shifts and additions.
9.4 Multiplication of Signed Numbers
Negative multiplicand, positive multiplier:
No change, other than looking out for proper sign extension

============================
a             1 0 1 1 0
x             0 1 0 1 1
============================
p(0)          0 0 0 0 0
+x0a          1 0 1 1 0
–––––––––––––––––––––––––––––
2p(1)       1 1 0 1 1 0
p(1)          1 1 0 1 1 0
+x1a          1 0 1 1 0
–––––––––––––––––––––––––––––
2p(2)       1 1 0 0 0 1 0
p(2)          1 1 0 0 0 1 0
+x2a          0 0 0 0 0
–––––––––––––––––––––––––––––
2p(3)       1 1 1 0 0 0 1 0
p(3)          1 1 1 0 0 0 1 0
+x3a          1 0 1 1 0
–––––––––––––––––––––––––––––
2p(4)       1 1 0 0 1 0 0 1 0
p(4)          1 1 0 0 1 0 0 1 0
+x4a          0 0 0 0 0
–––––––––––––––––––––––––––––
2p(5)       1 1 1 0 0 1 0 0 1 0
p(5)          1 1 1 0 0 1 0 0 1 0
============================

Check: –10 × 11 = –110 = –512 + 256 + 128 + 16 + 2

Fig. 9.6 Sequential multiplication of 2’s-complement numbers with right shifts (positive multiplier).
The Case of a Negative Multiplier
Negative multiplicand, negative multiplier:
In last step (the sign bit), subtract rather than add

Fig. 9.7 Sequential multiplication of 2’s-complement numbers with right shifts (negative multiplier).

Check: –10 × –11 = 110 = 64 + 32 + 8 + 4 + 2

============================
a             1 0 1 1 0
x             1 0 1 0 1
============================
p(0)          0 0 0 0 0
+x0a          1 0 1 1 0
–––––––––––––––––––––––––––––
2p(1)       1 1 0 1 1 0
p(1)          1 1 0 1 1 0
+x1a          0 0 0 0 0
–––––––––––––––––––––––––––––
2p(2)       1 1 1 0 1 1 0
p(2)          1 1 1 0 1 1 0
+x2a          1 0 1 1 0
–––––––––––––––––––––––––––––
2p(3)       1 1 0 0 1 1 1 0
p(3)          1 1 0 0 1 1 1 0
+x3a          0 0 0 0 0
–––––––––––––––––––––––––––––
2p(4)       1 1 1 0 0 1 1 1 0
p(4)          1 1 1 0 0 1 1 1 0
+(−x4a)       0 1 0 1 0
–––––––––––––––––––––––––––––
2p(5)       0 0 0 1 1 0 1 1 1 0
p(5)          0 0 0 1 1 0 1 1 1 0
============================
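
The signed right-shift procedure, with subtraction in the sign-bit step, can be sketched as follows. This is an illustrative sketch, not the textbook's code: `to_signed` and `multiply_twos_complement` are hypothetical names, and Python's arithmetic right shift on negative integers plays the role of sign extension.

```python
def to_signed(bits: int, k: int) -> int:
    """Interpret the low k bits of `bits` as a 2's-complement number."""
    bits &= (1 << k) - 1
    return bits - (1 << k) if bits >> (k - 1) else bits


def multiply_twos_complement(a_bits: int, x_bits: int, k: int) -> int:
    """Right-shift multiplication of two k-bit 2's-complement operands."""
    a = to_signed(a_bits, k)
    p = 0
    for j in range(k):
        x_j = (x_bits >> j) & 1
        term = x_j * (a << k)
        if j == k - 1:
            p -= term        # sign bit of x has negative weight: subtract
        else:
            p += term
        p >>= 1              # arithmetic shift keeps the sign (sign extension)
    return p


if __name__ == "__main__":
    # Operands of Figs. 9.6 and 9.7: a = 10110 (-10), x = 01011 (11) or 10101 (-11).
    print(multiply_twos_complement(0b10110, 0b01011, 5))
    print(multiply_twos_complement(0b10110, 0b10101, 5))
```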

Signed 2’s-Complement Hardware Multiplier
[Figure labels: Multiplier; Partial product (k+1 bits); Multiplicand; Mux (Enable, Select); (k+1)-bit Adder with cout; cin: 0, except in last cycle]
Fig. 9.8 The 2’s-complement sequential hardware multiplier.
Booth’s Recoding
Table 9.1 Radix-2 Booth’s recoding
–––––––––––––––––––––––––––––––––––––––––––––––––––––
x_i   x_{i–1}   y_i   Explanation
–––––––––––––––––––––––––––––––––––––––––––––––––––––
 0      0        0    No string of 1s in sight
 0      1        1    End of string of 1s in x
 1      0       −1    Beginning of string of 1s in x
 1      1        0    Continuation of string of 1s in x
–––––––––––––––––––––––––––––––––––––––––––––––––––––

Example
Operand x              1  0  0  1    1  1  0  1    1  0  1  0    1  1  1  0
Recoded version y (1) −1  0  1  0    0 −1  1  0   −1  1 −1  1    0  0 −1  0

Justification
$2^j + 2^{j-1} + \cdots + 2^{i+1} + 2^i = 2^{j+1} - 2^i$
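
Each row of Table 9.1 amounts to y_i = x_{i–1} − x_i with x_{–1} = 0, which the following sketch applies digit by digit. The function name `booth_recode` is hypothetical, and the input is assumed to be a k-bit 2's-complement bit pattern.

```python
def booth_recode(x_bits: int, k: int):
    """Return radix-2 Booth digits y_0 .. y_(k-1), least significant first."""
    digits = []
    prev = 0                          # x_(-1) is taken to be 0
    for i in range(k):
        cur = (x_bits >> i) & 1
        digits.append(prev - cur)     # y_i = x_(i-1) - x_i, in {-1, 0, 1}
        prev = cur
    return digits


def digits_value(digits):
    """Value of a signed-digit number given least-significant digit first."""
    return sum(d << i for i, d in enumerate(digits))


if __name__ == "__main__":
    y = booth_recode(0b10101, 5)      # the multiplier -11 used in Fig. 9.7
    print(list(reversed(y)), digits_value(y))
```

The recoded digits represent the 2's-complement value of x, so a negative multiplier needs no special last step once it has been recoded.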

Example Multiplication with Booth’s Recoding

––––––––––––––––––
x_i   x_{i–1}   y_i
––––––––––––––––––
 0      0        0
 0      1        1
 1      0       −1
 1      1        0
––––––––––––––––––

============================
a             1 0 1 1 0
x             1 0 1 0 1      Multiplier
y            −1 1 −1 1 −1    Booth-recoded
============================
p(0)          0 0 0 0 0
+y0a          0 1 0 1 0
–––––––––––––––––––––––––––––
2p(1)       0 0 1 0 1 0
p(1)          0 0 1 0 1 0
+y1a          1 0 1 1 0
–––––––––––––––––––––––––––––
2p(2)       1 1 1 0 1 1 0
p(2)          1 1 1 0 1 1 0
+y2a          0 1 0 1 0
–––––––––––––––––––––––––––––
2p(3)       0 0 0 1 1 1 1 0
p(3)          0 0 0 1 1 1 1 0
+y3a          1 0 1 1 0
–––––––––––––––––––––––––––––
2p(4)       1 1 1 0 0 1 1 1 0
p(4)          1 1 1 0 0 1 1 1 0
+y4a          0 1 0 1 0
–––––––––––––––––––––––––––––
2p(5)       0 0 0 1 1 0 1 1 1 0
p(5)          0 0 0 1 1 0 1 1 1 0
============================

Check: –10 × –11 = 110 = 64 + 32 + 8 + 4 + 2

Fig. 9.9 Sequential multiplication of 2’s-complement numbers with right shifts by means of Booth’s recoding.
9.5 Multiplication by Constants
Explicit, e.g.   y := 12 ∗ x + 1

Implicit, e.g.   A[i, j] := A[i, j] + B[i, j]
                 Address of A[i, j] = base + n ∗ i + j
[Figure: an array with rows 0, 1, 2, . . . , m–1 (Row i) and columns 0, 1, 2, . . . , n–1 (Column j)]

Software aspects:
Optimizing compilers replace multiplications by shifts/adds/subs
Produce efficient code using as few registers as possible
Find the best code by a time/space-efficient algorithm

Hardware aspects:
Synthesize special-purpose units such as filters
y[t] = a0 x[t] + a1 x[t – 1] + a2 x[t – 2] + b1 y[t – 1] + b2 y[t – 2]
Multiplication Using Binary Expansion
Example: Multiply R1 by the constant 113 = (1 1 1 0 0 0 1)two
  R2   ← R1 shift-left 1
  R3   ← R2 + R1
  R6   ← R3 shift-left 1
  R7   ← R6 + R1
  R112 ← R7 shift-left 4
  R113 ← R112 + R1

Ri: Register that contains i times (R1)
This notation is for clarity; only one register other than R1 is needed

Shorter sequence using shift-and-add instructions
  R3   ← R1 shift-left 1 + R1
  R7   ← R3 shift-left 1 + R1
  R113 ← R7 shift-left 4 + R1
Multiplication via Recoding
Example: Multiply R1 by 113 = (1 1 1 0 0 0 1)two = (1 0 0 −1 0 0 0 1)two
  R8   ← R1 shift-left 3
  R7   ← R8 – R1
  R112 ← R7 shift-left 4
  R113 ← R112 + R1

Shorter sequence using shift-and-add/subtract instructions
  R7   ← R1 shift-left 3 – R1
  R113 ← R7 shift-left 4 + R1

6 shift or add (3 shift-and-add) instructions needed without recoding
The canonic signed-digit representation of a number contains no
consecutive nonzero digits: average number of shift-adds is O(k/3)
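
A canonic signed-digit recoding can be computed with a short loop. The sketch below uses one common formulation (pick +1 or −1 according to n mod 4); it is an assumption of this example rather than a procedure given on the slides, and `canonic_signed_digits` is a hypothetical name.

```python
def canonic_signed_digits(n: int):
    """Return CSD digits of a nonnegative integer, least significant first."""
    digits = []
    while n != 0:
        if n & 1:
            d = 2 - (n & 3)   # +1 when n mod 4 == 1, -1 when n mod 4 == 3
            n -= d            # after this step n is a multiple of 4 or of 2
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits


if __name__ == "__main__":
    for constant in (113, 119):
        csd = canonic_signed_digits(constant)
        print(constant, list(reversed(csd)))
```

Each nonzero digit other than the leading one costs one shift-and-add or shift-and-subtract, so the number of nonzero digits indicates the length of the instruction sequence.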

Multiplication via Factorization
Example: Multiply R1 by 119 = 7 × 17 = (8 – 1) × (16 + 1)
  R8   ← R1 shift-left 3
  R7   ← R8 – R1
  R112 ← R7 shift-left 4
  R119 ← R112 + R7

Shorter sequence using shift-and-add/subtract instructions
  R7   ← R1 shift-left 3 – R1
  R119 ← R7 shift-left 4 + R7
Requires a scratch register for holding the 7 multiple

119 = (1 1 1 0 1 1 1)two = (1 0 0 0 −1 0 0 −1)two
More instructions may be needed without factorization
Multiplication by Multiple Constants
Example: Multiplying a number by 45, 49, and 65

Separate solutions: 5 shift-add/subtract operations
  R9  ← R1 shift-left 3 + R1
  R45 ← R9 shift-left 2 + R9

  R7  ← R1 shift-left 3 – R1
  R49 ← R7 shift-left 3 – R7

  R65 ← R1 shift-left 6 + R1

A combined solution for all three constants
  R65 ← R1 shift-left 6 + R1
  R49 ← R65 – R1 left-shift 4
  R45 ← R49 – R1 left-shift 2

A programmable block can perform any of the three multiplications
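
The combined solution can be checked directly. In this sketch the function name `times_45_49_65` is hypothetical and ordinary Python integers stand in for the registers.

```python
def times_45_49_65(r1: int):
    """Compute 45*r1, 49*r1, and 65*r1 with three shift-add/subtract steps."""
    r65 = (r1 << 6) + r1      # 64*r1 + r1
    r49 = r65 - (r1 << 4)     # 65*r1 - 16*r1
    r45 = r49 - (r1 << 2)     # 49*r1 - 4*r1
    return r45, r49, r65


if __name__ == "__main__":
    print(times_45_49_65(7))
```

Sharing the intermediate results brings the total to three operations, compared with five for the separate solutions.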

9.6 Preview of Fast Multipliers
Viewing multiplication as a multioperand addition problem,
there are but two ways to speed it up
a. Reducing the number of operands to be added:
Handling more than one multiplier bit at a time
(high-radix multipliers, Chapter 10)
b. Adding the operands faster:
Parallel/pipelined multioperand addition
(tree and array multipliers, Chapter 11)
In Chapter 12, we cover all remaining multiplication topics:
Bit-serial multipliers
Modular multipliers
Multiply-add units
Squaring as a special case

10 High-Radix Multipliers
Chapter Goals
Study techniques that allow us to handle
more than one multiplier bit in each cycle
(two bits in radix 4, three in radix 8, . . .)
Chapter Highlights
High radix gives rise to “difficult” multiples
Recoding (change of digit-set) as remedy
Carry-save addition reduces cycle time
Implementation and optimization methods
High-Radix Multipliers: Topics
Topics in This Chapter
10.1 Radix-4 Multiplication
10.2 Modified Booth’s Recoding
10.3 Using Carry-Save Adders
10.4 Radix-8 and Radix-16 Multipliers
10.5 Multibeat Multipliers
10.6 VLSI Complexity Issues

10.1 Radix-4 Multiplication
[Figure: Fig. 9.1 (modified), with partial-product rows $x_0 a r^0$, $x_1 a r^1$, $x_2 a r^2$, $x_3 a r^3$]

Multiplication with right shifts in radix r: top-to-bottom accumulation (preferred)
  $p^{(j+1)} = (p^{(j)} + x_j a r^k)\, r^{-1}$   with $p^{(0)} = 0$ and $p^{(k)} = p = ax + p^{(0)} r^{-k}$
  (add $x_j a r^k$, then shift right)

Multiplication with left shifts in radix r: bottom-to-top accumulation
  $p^{(j+1)} = r p^{(j)} + x_{k-j-1} a$   with $p^{(0)} = 0$ and $p^{(k)} = p = ax + p^{(0)} r^k$
  (shift left, then add $x_{k-j-1} a$)
Radix-4 Multiplication in Dot Notation
[Figure: Fig. 9.1 (radix 2), with partial-product rows $x_0 a 2^0$, $x_1 a 2^1$, $x_2 a 2^2$, $x_3 a 2^3$, shown beside its radix-4 counterpart, with rows $(x_1 x_0)_{two}\, a\, 4^0$ and $(x_3 x_2)_{two}\, a\, 4^1$]

Number of cycles is halved, but now the “difficult” multiple 3a must be dealt with

Fig. 10.1 Radix-4, or two-bit-at-a-time, multiplication in dot notation.
A Possible Design for a Radix-4 Multiplier
[Figure labels: Multiplier (2-bit shifts); Mux with inputs 0, a, 2a, 3a selected by x_{i+1} x_i = 00, 01, 10, 11; 3a precomputed via shift-and-add (3a = 2a + a); To the adder]
Fig. 10.2 The multiple generation part of a radix-4 multiplier with precomputation of 3a.

k/2 + 1 cycles, rather than k
One extra cycle over k/2 is not too bad, but we would like to avoid it if possible
Solving this problem for radix 4 may also help when dealing with even higher radices
Example Radix-4 Multiplication Using 3a
================================
a                    0 1 1 0
3a               0 1 0 0 1 0
x                    1 1 1 0
================================
p(0)                 0 0 0 0
+(x1x0)two a     0 0 1 1 0 0
–––––––––––––––––––––––––––––––––
4p(1)            0 0 1 1 0 0
p(1)                 0 0 1 1 0 0
+(x3x2)two a     0 1 0 0 1 0
–––––––––––––––––––––––––––––––––
4p(2)            0 1 0 1 0 1 0 0
p(2)                 0 1 0 1 0 1 0 0
================================

Fig. 10.3 Example of radix-4 multiplication using the 3a multiple.
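
The radix-4 right-shift procedure with a precomputed 3a can be sketched as below. The function name `multiply_radix4` is hypothetical, operands are assumed unsigned with an even number of bits k, and a Python list stands in for the 4-input multiplexer.

```python
def multiply_radix4(a: int, x: int, k: int) -> int:
    """Unsigned radix-4 multiplication, two multiplier bits per cycle."""
    assert k % 2 == 0, "k is assumed even so that x splits into radix-4 digits"
    multiples = [0, a, 2 * a, 2 * a + a]     # 3a precomputed as 2a + a
    p = 0
    for j in range(k // 2):
        digit = (x >> (2 * j)) & 3           # (x_(2j+1) x_(2j))_two
        p = (p + (multiples[digit] << k)) >> 2   # add, then 2-bit right shift
    return p


if __name__ == "__main__":
    # Operands of Fig. 10.3: a = 0110 (6), x = 1110 (14).
    print(multiply_radix4(6, 14, 4))
```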

A Second Design for a Radix-4 Multiplier
[Figure labels: Multiplier (2-bit shifts); Mux with inputs 0, a, 2a, –a selected by mux control 00, 01, 10, 11; carry flip-flop (FF) holding c; digit + c mod 4; To the adder]

Mux control = ($x_{i+1} \oplus x_i c$, $x_i \oplus c$)
Set carry = $x_{i+1}(x_i \lor c)$, that is, set if $x_{i+1} = x_i = 1$ or if $x_{i+1} = c = 1$

x_{i+1}   x_i   c     Mux control   Set carry
––––––––––––––––––––––––––––––––––––––––––––
   0       0    0        0 0            0
   0       0    1        0 1            0
   0       1    0        0 1            0
   0       1    1        1 0            0
   1       0    0        1 0            0
   1       0    1        1 1            1
   1       1    0        1 1            1
   1       1    1        0 0            1

Fig. 10.4 The multiple generation part of a radix-4 multiplier based on replacing 3a with 4a (carry into next higher radix-4 multiplier digit) and –a.
10.2 Modified Booth’s Recoding
Table 10.1 Radix-4 Booth’s recoding yielding $(z_{k/2} \ldots z_1 z_0)_{four}$
–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
x_{i+1}  x_i  x_{i–1}    y_{i+1}  y_i    z_{i/2}    Explanation
–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
   0      0     0           0      0        0       No string of 1s in sight
   0      0     1           0      1        1       End of string of 1s
   0      1     0           0      1        1       Isolated 1
   0      1     1           1      0        2       End of string of 1s
   1      0     0          −1      0       −2       Beginning of string of 1s
   1      0     1          −1      1       −1       End a string, begin new one
   1      1     0           0     −1       −1       Beginning of string of 1s
   1      1     1           0      0        0       Continuation of string of 1s
–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
Columns: x_{i+1} x_i are the radix-4 digit and x_{i–1} is its context; y_{i+1} y_i are the recoded radix-2 digits; z_{i/2} is the recoded radix-4 digit.

Example
Operand x              1  0  0  1    1  1  0  1    1  0  1  0    1  1  1  0
Recoded version y (1) −1  0  1  0    0 −1  1  0   −1  1 −1  1    0  0 −1  0
Radix-4 version z (1)   −2    2       −1    2       −1   −1        0   −2
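
Every row of Table 10.1 satisfies z_{i/2} = −2x_{i+1} + x_i + x_{i–1}, which gives a compact way to compute the recoded digits. This is a sketch with a hypothetical function name, assuming a k-bit 2's-complement multiplier with k even and x_{–1} = 0.

```python
def booth_radix4(x_bits: int, k: int):
    """Return radix-4 Booth digits z_0 .. z_(k/2 - 1), least significant first."""
    assert k % 2 == 0, "k is assumed even"
    digits = []
    for i in range(0, k, 2):
        x_prev = (x_bits >> (i - 1)) & 1 if i > 0 else 0   # x_(i-1), with x_(-1) = 0
        x_i = (x_bits >> i) & 1
        x_next = (x_bits >> (i + 1)) & 1
        digits.append(-2 * x_next + x_i + x_prev)          # digit in [-2, 2]
    return digits


if __name__ == "__main__":
    z = booth_radix4(0b1010, 4)       # 1010 is -6 as a 4-bit 2's-complement number
    print(list(reversed(z)), sum(d * 4 ** i for i, d in enumerate(z)))
```

Because every digit lies in [−2, 2], only the multiples 0, ±a, and ±2a are needed, so the "difficult" multiple 3a never has to be formed.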

Example Multiplication via Modified Booth’s Recoding
================================
a                0 1 1 0
x                1 0 1 0
z               −1  −2            Radix-4
================================
p(0)         0 0 0 0 0 0
+z0a         1 1 0 1 0 0
–––––––––––––––––––––––––––––––––
4p(1)        1 1 0 1 0 0
p(1)         1 1 1 1 0 1 0 0
+z1a         1 1 1 0 1 0
–––––––––––––––––––––––––––––––––
4p(2)        1 1 0 1 1 1 0 0
p(2)             1 1 0 1 1 1 0 0
================================

Fig. 10.5 Example of radix-4 multiplication with modified Booth’s recoding of the 2’s-complement multiplier.
Multiple Generation with Radix-4 Booth’s Recoding
[Figure labels: Multiplier (2-bit shift) supplying x_{i+1}, x_i, x_{i–1} (x_{i–1} initially 0); Recoding Logic producing neg, two, non0; Multiplicand (k bits) with sign of a; Mux (Enable = non0, Select = two) choosing 0, a, or 2a (k+1 bits); neg as add/subtract control; output z_{i/2} a to adder input]
Note on the signal "two": could have named this signal one/two

Encoding
Digit   neg   two   non0
 –2      1     1     1
 –1      1     0     1
  0      0     0     0
  1      0     0     1
  2      0     1     1

Fig. 10.6 The multiple generation part of a radix-4 multiplier based on Booth’s recoding.
10.3 Using Carry-Save Adders
[Figure labels: Multiplier; Mux selecting 0 or 2a under control of x_{i+1}; Mux selecting 0 or a under control of x_i; Old Cumulative Partial Product; CSA; Adder; New Cumulative Partial Product]
Fig. 10.7 Radix-4 multiplication with a carry-save adder used to combine the cumulative partial product, $x_i a$, and $2x_{i+1} a$ into two numbers.
Keeping the Partial Product in Carry-Save Form
[Figure, panel (a) Multiplier block diagram: Multiplier; Multiplicand; Mux selecting 0 or the multiplicand; k-bit CSA; Partial Product held as Sum and Carry (k bits each); right shift; k-bit Adder]
[Figure, panel (b) Operation in a typical cycle: Sum; Carry; Upper half of PP; Lower half of PP]
Fig. 10.8 Radix-2 multiplication with the upper half of the cumulative partial product kept in stored-carry form.
Carry-Save Multiplier with Radix-4 Booth’s Recoding
[Figure labels: Multiplier supplying x_{i+1}, x_i, x_{i–1}; multiplicand a; Booth recoder and selector producing z_{i/2} a; Old cumulative partial product; CSA; New cumulative partial product; 2-bit Adder; Adder; FF; Extra “dot”; To the lower half of partial product]
Fig. 10.9 Radix-4 multiplication with a CSA used to combine the stored-carry cumulative partial product and $z_{i/2} a$ into two numbers.
Radix-4 Booth’s Recoding for Parallel Multiplication
[Figure labels: multiplier bits x_{i+2}, x_{i+1}, x_i, x_{i–1}, x_{i–2}; Recoding Logic producing neg, two, non0; Mux (Enable, Select) with inputs a and 2a producing 0, a, or 2a (k+1 bits); Selective Complement producing 0, a, –a, 2a, or –2a (k+2 bits) = z_{i/2} a; Extra "Dot" for Column i]
Fig. 10.10 Booth recoding and multiple selection logic for high-radix or parallel multiplication.
Yet Another Design for Radix-4 Multiplication
[Figure labels: Multiplier; Mux selecting 0 or 2a under control of x_{i+1}; Mux selecting 0 or a under control of x_i; Old Cumulative Partial Product; two CSAs forming a (4; 2)-counter; New Cumulative Partial Product; Adder; FF; 2-Bit Adder; To the Lower Half of Partial Product]
Fig. 10.11 Radix-4 multiplication, with the cumulative partial product, $x_i a$, and $2x_{i+1} a$ combined into two numbers by two CSAs.
10.4 Radix-8 and Radix-16 Multipliers
[Figure labels: Multiplier (4-bit right shift); four muxes selecting 0 or 8a, 4a, 2a, a under control of x_{i+3}, x_{i+2}, x_{i+1}, x_i; four CSAs; Sum and Carry of the Partial Product (Upper Half); 4-Bit Shift; FF; 4-Bit Adder; To the Lower Half of Partial Product]
Fig. 10.12 Radix-16 multiplication with the upper half of the cumulative partial product in carry-save form.
Other High-Radix Multipliers
[Figure: Fig. 10.12 repeated, with two annotations]

Annotation on one mux and CSA of the figure:
Remove this mux & CSA and replace the 4-bit shift (adder) with a 3-bit shift (adder) to get a radix-8 multiplier (cycle time will remain the same, though)

Annotation on the design as a whole:
A radix-16 multiplier design becomes a radix-256 multiplier if radix-4 Booth’s recoding is applied first (the muxes are replaced by Booth recoding and multiple selection logic)
A Spectrum of Multiplier Design Choices
[Figure, three designs arranged between "Economize" and "Speed up":
  Basic binary: Next multiple, Adder, Partial product
  High-radix or partial tree: Several multiples, Small CSA tree, Adder, Partial product
  Full tree: All multiples, Full CSA tree, Adder]
Fig. 10.13 High-radix multipliers as intermediate between sequential radix-2 and full-tree multipliers.
10.5 Multibeat Multipliers
[Figure, panel (a) Sequential machine with FFs: Inputs; Present state; Next-state logic; Next-state excitation; State flip-flops; CLK]
[Figure, panel (b) Sequential machine with latches and 2-phase clock: Inputs; Next-state logic (two blocks); State latches (two sets); PH1; PH2]
Fig. 10.15 Two-phase clocking for sequential logic.

[Timing diagram labels: Begin changing FF contents; Change becomes visible at FF output; One cycle]
Observation: Half of the clock cycle goes to waste
Twin-Beat and Three-Beat Multipliers
[Figure labels: Twin Multiplier Registers; multiples a and 3a; two Pipelined Radix-8 Booth Recoder & Selector blocks; two CSAs; Sum; Carry; FF; Adder; 6-Bit Adder; To the Lower Half of Partial Product]
This radix-64 multiplier runs at the clock rate of a radix-8 design (2X speed)
Fig. 10.14 Twin-beat multiplier with radix-8 Booth’s recoding.

[Figure labels: Node 1, Node 2, Node 3; Beat-1 Input, Beat-2 Input, Beat-3 Input]
Fig. 10.16 Conceptual view of a three-beat multiplier.
10.6 VLSI Complexity Issues
A radix-$2^b$ multiplier requires:
  $bk$ two-input AND gates to form the partial products bit-matrix
  $O(bk)$ area for the CSA tree
  At least $\Theta(k)$ area for the final carry-propagate adder
Total area:  $A = O(bk)$
Latency:     $T = O((k/b) \log b + \log k)$

Any VLSI circuit computing the product of two k-bit integers must satisfy the following constraints:
  $AT$ grows at least as fast as $k^{3/2}$
  $AT^2$ is at least proportional to $k^2$
The preceding radix-$2^b$ implementations are suboptimal, because:
  $AT = O(k^2 \log b + bk \log k)$
  $AT^2 = O((k^3/b) \log^2 b)$
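
The two products follow from multiplying out the area and latency expressions. The expansion below is a worked sketch rather than a derivation given on the slide; it assumes that the stated $AT^2$ bound keeps only the leading term, which dominates when $(k/b)\log b$ is at least on the order of $\log k$.

```latex
\begin{align*}
AT   &= O(bk)\cdot O\!\left(\tfrac{k}{b}\log b + \log k\right)
      = O\!\left(k^2 \log b + bk \log k\right) \\
AT^2 &= O(bk)\cdot O\!\left(\left(\tfrac{k}{b}\log b + \log k\right)^{2}\right) \\
     &= O\!\left(\tfrac{k^3}{b}\log^2 b + k^2 \log b \,\log k + bk \log^2 k\right) \\
     &= O\!\left(\tfrac{k^3}{b}\log^2 b\right)
        \quad\text{when } \tfrac{k}{b}\log b = \Omega(\log k)
\end{align*}
```

For example, $b = \Theta(k)$ gives $AT^2 = O(k^2 \log^2 k)$, which still exceeds the $k^2$ lower bound by a factor of $\log^2 k$.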

Comparing High- and Low-Radix Multipliers
$AT = O(k^2 \log b + bk \log k)$        $AT^2 = O((k^3/b) \log^2 b)$

          Low-Cost       High Speed            AT- or AT^2-
          b = O(1)       b = O(k)              Optimal
–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
AT        $O(k^2)$       $O(k^2 \log k)$       $O(k^{3/2})$
AT^2      $O(k^3)$       $O(k^2 \log^2 k)$     $O(k^2)$

Intermediate designs do not yield better $AT$ or $AT^2$ values;
the multipliers remain asymptotically suboptimal for any b
By the $AT$ measure (indicator of cost-effectiveness), slower radix-2
multipliers are better than high-radix or tree multipliers
Thus, when an application requires many independent multiplications,
it is more cost-effective to use a large number of slower multipliers
High-radix multiplier latency can be reduced from $O((k/b) \log b + \log k)$
to $O(k/b + \log k)$ through more effective pipelining (Chapter 11)

11 Tree and Array Multipliers
Chapter Goals
Study the design of multipliers for highest
possible performance (speed, throughput)
Chapter Highlights
Tree multiplier = reduction tree
+ redundant-to-binary converter
Avoiding full sign extension in multiplying
signed numbers
Array multiplier = one-sided reduction tree
+ ripple-carry adder
Tree and Array Multipliers: Topics
Topics in This Chapter
11.1. Full-Tree Multipliers
11.2. Alternative Reduction Trees
11.3. Tree Multipliers for Signed Numbers
11.4. Partial-Tree and Truncated Multipliers
11.5. Array Multipliers
11.6. Pipelined Tree and Array Multipliers

11.1 Full-Tree Multipliers
[Figures repeated from Chapter 10: the radix-4 dot-notation diagram, with rows $(x_1 x_0)_{two}\, a\, 4^0$ and $(x_3 x_2)_{two}\, a\, 4^1$, and Fig. 10.13 High-radix multipliers as intermediate between sequential radix-2 and full-tree multipliers.]

[Figure labels: Multiplier; multiplicand a; Multiple-Forming Circuits; Partial-Products Reduction Tree (Multi-Operand Addition Tree); Redundant result; Redundant-to-Binary Converter; Higher-order product bits; Some lower-order product bits are generated directly]
Fig. 11.1 General structure of a full-tree multiplier.
Full-Tree versus Partial-Tree Multiplier
[Figure, full tree: All partial products; Large tree of carry-save adders (log depth); Adder; Product]
[Figure, partial tree: Several partial products; Small tree of carry-save adders (log depth); Adder; Product]
Schematic diagrams for full-tree and partial-tree multipliers.
Variations in Full-Tree Multiplier Design
Designs are distinguished by variations in three elements:
1. Multiple-forming circuits
2. Partial products reduction tree
3. Redundant-to-binary converter

[Figure: Fig. 11.1 repeated, showing the Multiple-Forming Circuits, the Partial-Products Reduction Tree (Multi-Operand Addition Tree), the redundant result, and the Redundant-to-Binary Converter; higher-order product bits come from the converter, and some lower-order product bits are generated directly]
Example of Variations in CSA Tree Design
[Figure: Fig. 9.1 repeated, with partial-product rows $x_0 a 2^0$, $x_1 a 2^1$, $x_2 a 2^2$, $x_3 a 2^3$]

Numbers of bits per column (most significant column on the left):

Wallace Tree (5 FAs + 3 HAs + 4-Bit Adder)
     1 2 3 4 3 2 1
     FA FA FA HA
   ––––––––––––––––
     1 3 2 3 2 1 1
     FA HA FA HA
   ––––––––––––––––
   2 2 2 2 1 1 1
     4-Bit Adder
   ––––––––––––––––
   1 1 1 1 1 1 1 1

Dadda Tree (4 FAs + 2 HAs + 6-Bit Adder)
     1 2 3 4 3 2 1
   ––––––––––––––––
     1 3 3 3 3 2 1
   ––––––––––––––––
     2 2 2 2 2 2 1
     6-Bit Adder
   ––––––––––––––––
   1 1 1 1 1 1 1 1
[The per-level FA/HA labels of the Dadda panel, which the slide shows with corrections in red, are garbled in the extracted text.]

Fig. 11.2 Two different binary 4 × 4 tree multipliers.
Details of a CSA Tree

[Figure: seven operands occupying bit positions [0, 6], [1, 7], [2, 8], [3, 9], [4, 10], [5, 11], [6, 12] are reduced by four 7-bit CSAs and one 10-bit CSA, followed by a 10-bit CPA; bit positions 0 through 3 are labeled along the bottom]
The index pair [i, j] means that bit positions from i up to j are involved.
Fig. 11.3 Possible CSA tree for a 7 × 7 tree multiplier.

CSA trees are quite irregular, causing some difficulties in VLSI realization
Thus, our motivation to examine alternate methods for partial products reduction
11.2 Alternative Reduction Trees

[Figure labels: Inputs; full adders (FA) arranged in Levels 1 through 5; Level-1 carries, Level-2 carries, Level-3 carries, Level-4 carry; Outputs]
$11 + \psi_1 = 2\psi_1 + 3$
Therefore, $\psi_1 = 8$ carries are needed
Fig. 11.4 A slice of a balanced-delay tree for 11 inputs.
Binary Tree of 4-to-2 Reduction Modules
Fig. 11.5 Tree multiplier with a more regular structure based on 4-to-2 reduction modules.
(a) Binary tree of (4; 2)-counters
(b) Realization with FAs (4-to-2 compressor)
(c) A faster realization

Due to its recursive structure, a binary tree is more regular than a 3-to-2 reduction tree when laid out in VLSI
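As a hedged illustration of the FA-based realization in Fig. 11.5b, the sketch below models a 4-to-2 module as two cascaded full adders and checks its defining arithmetic identity. The function and signal names (`compressor_4to2`, `cin`, `cout`) are chosen here for illustration and are not taken from the slides.

```python
from itertools import product


def full_adder(x, y, z):
    """Return (sum, carry) for three input bits."""
    total = x + y + z
    return total & 1, total >> 1


def compressor_4to2(x1, x2, x3, x4, cin):
    """4-to-2 reduction of one bit position using two cascaded FAs."""
    s1, cout = full_adder(x1, x2, x3)   # cout goes sideways and is independent of cin
    s, c = full_adder(s1, x4, cin)      # cin comes from the neighboring position
    return s, c, cout


# The five input dots must equal s plus two dots of double weight.
for bits in product((0, 1), repeat=5):
    s, c, cout = compressor_4to2(*bits)
    assert sum(bits) == s + 2 * (c + cout)
```

Because `cout` does not depend on `cin`, a row of such modules has no carry chain, which is what lets it act as a single reduction level from four numbers to two.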
Example Multiplier with 4-to-2 Reduction Tree

Fig. 11.6 Layout of a partial-products reduction tree composed of 4-to-2 reduction modules. Each solid arrow represents two numbers. (Labeled parts: multiplicand, multiple selection signals, multiple generation circuits, redundant-to-binary converter.)

Even if 4-to-2 reduction is implemented using two CSA levels, design regularity potentially makes up for the larger number of logic levels

Similarly, using Booth’s recoding may not yield any advantage, because it introduces irregularity
11.3 Tree Multipliers for Signed Numbers
From Fig. 8.19a: Sign extension in multioperand addition.

---------- Extended positions ----------   Sign   Magnitude positions ---------
xk–1   xk–1   xk–1   xk–1   xk–1           xk–1   xk–2   xk–3   xk–4   ...
yk–1   yk–1   yk–1   yk–1   yk–1           yk–1   yk–2   yk–3   yk–4   ...
zk–1   zk–1   zk–1   zk–1   zk–1           zk–1   zk–2   zk–3   zk–4   ...

The difference in multiplication is the shifting sign positions

Signs and sign extensions (α, β, γ) next to the magnitude bits (x):

α α α α α α α α x x x x x x x
β β β β β β β x x x x x x x x
γ γ γ γ γ γ x x x x x x x x x

Five redundant copies removed; the positions holding α, β, and γ share one FA.

Fig. 11.7 Sharing of full adders to reduce the CSA width in a signed tree multiplier.
Using the Negative-Weight Property of the Sign Bit

Sign extension is a way of converting negatively weighted bits (negabits) to positively weighted bits (posibits) to facilitate reduction, but there are other methods of accomplishing the same without introducing a lot of extra bits

Baugh and Wooley have contributed two such methods

Fig. 11.8 Baugh-Wooley 2’s-complement multiplication. (A prime denotes complementation.)

a. Unsigned

                              a4    a3    a2    a1    a0
×                             x4    x3    x2    x1    x0
------------------------------------------------------------
                              a4x0  a3x0  a2x0  a1x0  a0x0
                        a4x1  a3x1  a2x1  a1x1  a0x1
                  a4x2  a3x2  a2x2  a1x2  a0x2
            a4x3  a3x3  a2x3  a1x3  a0x3
      a4x4  a3x4  a2x4  a1x4  a0x4
------------------------------------------------------------
p9    p8    p7    p6    p5    p4    p3    p2    p1    p0

b. 2's-complement

                                    a4     a3     a2     a1     a0
×                                   x4     x3     x2     x1     x0
----------------------------------------------------------------------
                                   –a4x0   a3x0   a2x0   a1x0   a0x0
                            –a4x1   a3x1   a2x1   a1x1   a0x1
                     –a4x2   a3x2   a2x2   a1x2   a0x2
              –a4x3   a3x3   a2x3   a1x3   a0x3
        a4x4  –a3x4  –a2x4  –a1x4  –a0x4
----------------------------------------------------------------------
 p9     p8     p7     p6     p5     p4     p3     p2     p1     p0

c. Baugh-Wooley

                                   a4     a3     a2     a1     a0
×                                  x4     x3     x2     x1     x0
----------------------------------------------------------------------
                                   a4x0′  a3x0   a2x0   a1x0   a0x0
                            a4x1′  a3x1   a2x1   a1x1   a0x1
                     a4x2′  a3x2   a2x2   a1x2   a0x2
              a4x3′  a3x3   a2x3   a1x3   a0x3
       a4x4   a3′x4  a2′x4  a1′x4  a0′x4
       a4′                         a4
1      x4′                         x4
----------------------------------------------------------------------
p9     p8     p7     p6     p5     p4     p3     p2     p1     p0

d. Modified B-W

                                             a4       a3       a2       a1       a0
×                                            x4       x3       x2       x1       x0
------------------------------------------------------------------------------------------
                                             (a4x0)′  a3x0     a2x0     a1x0     a0x0
                                    (a4x1)′  a3x1     a2x1     a1x1     a0x1
                           (a4x2)′  a3x2     a2x2     a1x2     a0x2
                  (a4x3)′  a3x3     a2x3     a1x3     a0x3
         a4x4     (a3x4)′  (a2x4)′  (a1x4)′  (a0x4)′
1                                   1
------------------------------------------------------------------------------------------
p9       p8       p7       p6       p5       p4       p3       p2       p1       p0
The Baugh-Wooley Method and Its Modified Form

Fig. 11.8, parts c and d, with the term-by-term justification. (A prime denotes complementation.)

c. Baugh-Wooley

–a4x0 = a4(1 – x0) – a4
      = a4x0′ – a4

The term a4x0′ takes the place of –a4x0. The leftover –a4 is written as a4 in the same column plus –a4 in the next column.

Applied to all eight negatively weighted terms of the 5 × 5 example, this replaces each –a4xj by a4xj′ and each –aix4 by ai′x4, and it collects the leftover terms as a4 and x4 in position 4, a4′ and x4′ in position 8, and 1 in position 9.

d. Modified B-W

–a4x0 = (1 – a4x0) – 1
      = (a4x0)′ – 1

The term (a4x0)′ takes the place of –a4x0. The leftover –1 is written as 1 in the same column plus –1 in the next column.

Applied to all eight negatively weighted terms, this replaces each –a4xj by (a4xj)′ and each –aix4 by (aix4)′, and it collects the leftover terms as 1 in position 5 and 1 in position 9.
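The modified Baugh-Wooley bit-matrix (Fig. 11.8d) can be checked exhaustively for the 5 × 5 case. The sketch below is an illustrative model, not a hardware description; the function name `modified_baugh_wooley` and the generalization to a parameter `k` are assumptions made here.

```python
def modified_baugh_wooley(a, x, k=5):
    """Multiply two k-bit 2's-complement integers via the modified B-W bit-matrix."""
    a_bits = [(a >> i) & 1 for i in range(k)]   # also correct for negative Python ints
    x_bits = [(x >> j) & 1 for j in range(k)]
    total = 0
    for i in range(k):
        for j in range(k):
            dot = a_bits[i] & x_bits[j]
            # Dots with exactly one sign-bit factor are negatively weighted:
            # enter them complemented instead.
            if (i == k - 1) != (j == k - 1):
                dot ^= 1
            total += dot << (i + j)
    total += (1 << k) + (1 << (2 * k - 1))      # constant 1s in positions k and 2k-1
    total &= (1 << (2 * k)) - 1                 # keep the 2k product bits
    if total >> (2 * k - 1):                    # reinterpret as a signed value
        total -= 1 << (2 * k)
    return total


assert all(
    modified_baugh_wooley(a, x) == a * x
    for a in range(-16, 16)
    for x in range(-16, 16)
)
```

For k = 5 the two constants land in positions 5 and 9, matching the two 1s in Fig. 11.8d; every entry of the matrix is then a positively weighted bit, so an unsigned reduction tree can be used unchanged.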
Alternate Views of the Baugh-Wooley Methods

Positions 9 through 4 of the two groups of negatively weighted terms (a prime denotes complementation):

   +   0    0   –a4x3    –a4x2    –a4x1    –a4x0
   +   0    0   –a3x4    –a2x4    –a1x4    –a0x4
   --------------------------------------------------
   –   0    0    a4x3     a4x2     a4x1     a4x0
   –   0    0    a3x4     a2x4     a1x4     a0x4
   --------------------------------------------------
   +   1    1   (a4x3)′  (a4x2)′  (a4x1)′  (a4x0)′
   +   1    1   (a3x4)′  (a2x4)′  (a1x4)′  (a0x4)′
                                             1
                                             1
   --------------------------------------------------
   +   a4   a4   a4x3′    a4x2′    a4x1′    a4x0′
   +   x4   x4   a3′x4    a2′x4    a1′x4    a0′x4
                                             a4
                                             x4
   --------------------------------------------------

In the last form, the a4 and x4 entries in positions 9 and 8 are equivalent, modulo 2^10, to a4′ and x4′ in position 8 plus a 1 in position 9.

Fig. 11.8 (a. Unsigned, b. 2's-complement, c. Baugh-Wooley, d. Modified B-W) is repeated on this slide for reference.
11.4 Partial-Tree and Truncated Multipliers
High-radix versus partial-tree multipliers: The difference is quantitative, not qualitative

For small h, say ≤ 8 bits, we view the multiplier of Fig. 11.9 as high-radix

When h is a significant fraction of k, say k/2 or k/4, then we tend to view it as a partial-tree multiplier

Better design through pipelining to be covered in Section 11.6

Fig. 11.9 General structure of a partial-tree multiplier. (Labeled parts: h inputs, upper part of the cumulative partial product in stored-carry form, CSA tree with sum and carry outputs, FF, h-bit adder, lower part of the cumulative partial product.)
Why Truncated Multipliers?
Nearly half of the hardware in array/tree multipliers is there to get the last bit right (1 dot = one FPGA cell)

Fig. 11.10 The idea of a truncated multiplier with 8-bit fractional operands. (Dot diagram of a k-by-k fractional multiplication; a vertical line at the ulp position separates the retained dots from the dropped ones.)

Max error = 8/2 + 7/4 + 6/8 + 5/16 + 4/32 + 3/64 + 2/128 + 1/256 = 7.004 ulp
Mean error = 1.751 ulp
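The two error figures quoted for Fig. 11.10 can be reproduced with exact fractions. This is a minimal sketch under two assumptions stated here rather than on the slide: the dropped column of weight 2^–j ulp holds k – j + 1 dots, and each dot (an AND of two operand bits) is 1 with probability 1/4 for independent, uniformly random operand bits.

```python
from fractions import Fraction

k = 8  # operand width in bits

# Worst case: every dropped dot is 1.
max_error = sum(Fraction(k - j + 1, 2 ** j) for j in range(1, k + 1))

# Average case: each dot is 1 with probability 1/4 (assumption, see above).
mean_error = max_error / 4

print(float(max_error), float(mean_error))   # 7.00390625 1.7509765625
```

Rounded to three decimals these are the 7.004 ulp and 1.751 ulp given on the slide.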
Truncated Multipliers with Error Compensation
We can introduce additional “dots” on the left-hand side to compensate for the removal of dots from the right-hand side

Constant and variable error compensation for truncated multipliers. (Dot diagrams: with constant compensation, a constant 1 is added among the retained dots; with variable compensation, the dots x–1 and y–1 are added.)

               Constant compensation    Variable compensation
Max error      +4 ulp                   +? ulp
Max error      ≅ −3 ulp                 ≅ −? ulp
Mean error     ? ulp                    ? ulp
11.5 Array Multipliers
Fig. 11.11 A basic array multiplier uses a one-sided CSA tree and a ripple-carry adder. (The multiples x0 a through x4 a enter a chain of CSAs, whose outputs go to a ripple-carry adder that produces ax.)

Fig. 11.12 Details of a 5 × 5 array multiplier using FA blocks. (The partial-product bits aixj, for i and j from 0 to 4, enter rows of FAs, with 0s at the unused inputs of the first row; the outputs are the product bits p0 through p9.)
Signed (2’s-complement) Array Multiplier
Fig. 11.13 Modifications in a 5 × 5 array multiplier to deal with 2’s-complement inputs using the Baugh-Wooley method or to shorten the critical path. (Relative to Fig. 11.12, the terms that involve exactly one of the sign bits a4 and x4 carry complement bars, and the entries a4′, x4′, a4, x4, and 1 are added.)
Array Multiplier Built of Modified Full-Adder Cells
Fig. 11.14 Design of a 5 × 5 array multiplier with two additive inputs and full-adder blocks that include AND gates. (Inputs a4 through a0 and x0 through x4; outputs p0 through p9.)
Array Multiplier without a Final Carry-Propagate Adder
Fig. 11.15 Conceptual view of a modified array multiplier that does not need a final carry-propagate adder. (Each level ends in a multiplexer; the last one, at level k, delivers bits [k, 2k–1].)

Fig. 11.16 Carry-save addition, performed in level i, extends the conditionally computed bits of the final product. (Level i takes the dots in row i and the i conditional bits Bi, and produces the dots in row i + 1 and the i + 1 conditional bits Bi+1 of the final product.)

All remaining bits of the final product produced only 2 gate levels after pk–1
11.6 Pipelined Tree and Array Multipliers
Fig. 11.9 General structure of a partial-tree multiplier. (Repeated on this slide for comparison.)

Fig. 11.17 Efficiently pipelined partial-tree multiplier. (Labeled parts: h inputs, pipelined CSA tree with latches, (h + 2)-input CSA tree, two CSAs separated by a latch, sum and carry outputs, FF, h-bit adder, lower part of the cumulative partial product.)
Pipelined Array Multipliers
With latches after every FA level, the maximum throughput is achieved

Latches may be inserted after every h FA levels for an intermediate design

Example: 3-stage pipeline

Fig. 11.18 Pipelined 5 × 5 array multiplier using latched FA blocks. The small shaded boxes are latches. (Inputs a4 through a0 and x0 through x4; outputs p0 through p9; the cells are latched FAs with AND gates, plain FAs, and latches.)
12 Variations in Multipliers
Chapter Goals
Learn additional methods for synthesizing fast multipliers as well as other types of multipliers (bit-serial, modular, etc.)

Chapter Highlights
Building a multiplier from smaller units
Performing multiply-add as one operation
Bit-serial and (semi)systolic multipliers
Using a multiplier for squaring is wasteful
Variations in Multipliers: Topics
Topics in This Chapter
12.1 Divide-and-Conquer Designs
12.2 Additive Multiply Modules
12.3 Bit-Serial Multipliers
12.4 Modular Multipliers
12.5 The Special Case of Squaring
12.6 Combined Multiply-Add Units

12.1 Divide-and-Conquer Designs
Building a wide multiplier from narrower ones

Fig. 12.1 Divide-and-conquer (recursive) strategy for synthesizing a 2b × 2b multiplier from b × b multipliers. (The operands are split into b-bit halves, aH, aL and xH, xL. The four partial products aLxL, aLxH, aHxL, and aHxH are 2b bits wide each. Rearranged partial products in 2b-by-2b multiplication: aHxH and aLxL sit side by side, and aHxL and aLxH are aligned b bits above the least significant position, so the addition spans 3b bits and the lowest b bits pass through.)
General Structure of a Recursive Multiplier
2b × 2b    use (3; 2)-counters
3b × 3b    use (5; 2)-counters
4b × 4b    use (7; 2)-counters

Fig. 12.2 Using b × b multipliers to synthesize 2b × 2b, 3b × 3b, and 4b × 4b multipliers.
Using b × c, rather than b × b Building Blocks
Fig. 12.2 (b × b, 2b × 2b, 3b × 3b, and 4b × 4b multipliers) is repeated on this slide.

2b × 2c    use b × c multipliers and (3; 2)-counters
2b × 4c    use b × c multipliers and (5?; 2)-counters
gb × hc    use b × c multipliers and (?; 2)-counters
Wide Multiplier Built of Narrow Multipliers and Adders
Fig. 12.3 Using 4 × 4 multipliers and 4-bit adders to synthesize an 8 × 8 multiplier. (Four multipliers form aHxH, aLxH, aHxL, and aLxL from the operand halves in bit positions [4, 7] and [0, 3]; 4-bit adders combine their outputs in positions [4, 7], [8, 11], and [12, 15]. The outputs are p[12, 15], p[8, 11], p[4, 7], and p[0, 3].)
Karatsuba Multiplication
2b × 2b multiplication requires four b × b multiplications:

(2^b aH + aL) × (2^b xH + xL) = 2^{2b} aHxH + 2^b (aHxL + aLxH) + aLxL

Karatsuba noted that one of the four multiplications can be removed at the expense of introducing a few additions:

(2^b aH + aL) × (2^b xH + xL) = 2^{2b} aHxH + 2^b [(aH + aL) × (xH + xL) – aHxH – aLxL] + aLxL

(The figure splits each operand into b-bit halves, aH, aL and xH, xL, and labels the three multiplications Mult 1, Mult 2, and Mult 3.)

Benefit is quite significant for extremely wide operands

(4/3)^5 = 4.2
(4/3)^10 = 17.8
(4/3)^20 = 315.3
(4/3)^50 = 1,765,781
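The two identities above translate directly into code. The following sketch performs one level of each decomposition on Python integers, with `*` standing in for the narrower multipliers; the function names `multiply_4` and `multiply_3` are made up for this illustration.

```python
def split(v, b):
    """Split v into its upper part and its lower b bits."""
    return v >> b, v & ((1 << b) - 1)


def multiply_4(a, x, b):
    """2b x 2b product from four b x b products."""
    aH, aL = split(a, b)
    xH, xL = split(x, b)
    return ((aH * xH) << (2 * b)) + ((aH * xL + aL * xH) << b) + aL * xL


def multiply_3(a, x, b):
    """Same product with three multiplications (Karatsuba's rearrangement)."""
    aH, aL = split(a, b)
    xH, xL = split(x, b)
    hi = aH * xH
    lo = aL * xL
    mid = (aH + aL) * (xH + xL) - hi - lo   # equals aH*xL + aL*xH
    return (hi << (2 * b)) + (mid << b) + lo


for a in range(256):
    for x in range(256):
        assert multiply_4(a, x, 4) == multiply_3(a, x, 4) == a * x
```

Note that the third multiplication in `multiply_3` takes operands that can be b + 1 bits wide, which is part of the price paid for saving one multiplication.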
12.2 Additive Multiply Modules
p = ax + y + z

Fig. 12.4 Additive multiply module with 2 × 4 multiplier (ax) plus 4-bit and 2-bit additive inputs (y and z). (a) Block diagram, built around a 4-bit adder with carry-in cin; (b) dot notation.

b × c AMM
b-bit and c-bit multiplicative inputs
b-bit and c-bit additive inputs
(b + c)-bit output

(2^b – 1) × (2^c – 1) + (2^b – 1) + (2^c – 1) = 2^{b+c} – 1
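The identity above says that the largest possible AMM result exactly fills b + c bits. A quick numeric check, with the hypothetical helper `amm` standing in for the module and the widths of Fig. 12.4 (4-bit a and y, 2-bit x and z) used as the example:

```python
def amm(a, x, y, z):
    """Behavioral model of an additive multiply module: p = ax + y + z."""
    return a * x + y + z


b, c = 4, 2
largest = amm(2**b - 1, 2**c - 1, 2**b - 1, 2**c - 1)
assert largest == 2**(b + c) - 1          # 63, the largest 6-bit value

# The same holds for other widths, so the output never overflows b + c bits.
for b in range(1, 9):
    for c in range(1, 9):
        assert amm(2**b - 1, 2**c - 1, 2**b - 1, 2**c - 1) == 2**(b + c) - 1
```

This is why AMMs compose cleanly: the two additive inputs come for free in terms of output width.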
Multiplier Built of AMMs
Understanding an 8 × 8 multiplier built of 4 × 2 AMMs using dot notation

Fig. 12.5 An 8 × 8 multiplier built of 4 × 2 AMMs. Inputs marked with an asterisk carry 0s. (Legend: lines carry either 2 bits or 4 bits. Each AMM multiplies a[0, 3] or a[4, 7] by one of x[0, 1], x[2, 3], x[4, 5], and x[6, 7]; the outputs are p[0, 1], p[2, 3], p[4, 5], p[6, 7], p[8, 9], p[10, 11], and p[12, 15].)

Multiplier Built of AMMs: Alternate Design
This design is more regular than that in Fig. 12.5 and is easily expandable to larger configurations; its latency, however, is greater

Fig. 12.6 Alternate 8 × 8 multiplier design based on 4 × 2 AMMs. Inputs marked with an asterisk carry 0s. (Legend: lines carry either 2 bits or 4 bits. Inputs a[0, 3], a[4, 7], x[0, 1], x[2, 3], x[4, 5], and x[6, 7]; outputs p[0, 1], p[2, 3], p[4, 5], p[6, 7], p[8, 9], p[10, 11], and p[12, 15].)
12.3 Bit-Serial Multipliers
Bit-serial adder (LSB first)
Inputs … x2 x1 x0 and … y2 y1 y0 enter an FA whose carry is held in a flip-flop (FF); the output is … s2 s1 s0

Bit-serial multiplier
Inputs … a2 a1 a0 and … x2 x1 x0; output … p2 p1 p0
(Must follow the k-bit inputs with k 0s; alternatively, view the product as being only k bits wide)

What goes inside the box to make a bit-serial multiplier?
Can the circuit be designed to support a high clock rate?
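The bit-serial adder is simple enough to model in a few lines: one full adder plus a carry flip-flop, with one iteration per clock cycle. The helper names `to_bits` and `from_bits` are illustrative additions, not part of the slides.

```python
def to_bits(value, n):
    """n bits of value, least significant first."""
    return [(value >> i) & 1 for i in range(n)]


def from_bits(bits):
    return sum(bit << i for i, bit in enumerate(bits))


def bit_serial_add(x_bits, y_bits):
    """LSB-first bit-serial addition."""
    carry = 0                          # the flip-flop, cleared before the first bit
    s_bits = []
    for xb, yb in zip(x_bits, y_bits):
        total = xb + yb + carry        # the full adder
        s_bits.append(total & 1)       # sum bit leaves immediately
        carry = total >> 1             # carry is stored for the next cycle
    return s_bits


assert from_bits(bit_serial_add(to_bits(11, 5), to_bits(6, 5))) == 17
```

The operands are padded to 5 bits here so that the final carry has a cycle in which to appear, mirroring the slide's remark about following the inputs with 0s.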
Semisystolic Serial-Parallel Multiplier
Fig. 12.7 Semi-systolic circuit for 4 × 4 multiplication in 8 clock cycles. (The multiplicand bits a3 through a0 are applied in parallel, one per FA cell; the multiplier bits x0, x1, x2, x3 enter serially, LSB first; sum and carry signals link the four FAs, and the product emerges serially.)

This is called “semisystolic” because it has a large signal fan-out of k (k-way broadcasting) and a long wire spanning all k positions
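A cycle-by-cycle behavioral model helps confirm the 8-cycle claim for Fig. 12.7. The sketch below assumes the usual cell structure for such a circuit (an AND gate forming ai·xj, a full adder, a carry flip-flop, and a sum flip-flop feeding the neighbor to the right); these details and the name `semisystolic_multiply` are assumptions, since the figure itself is not reproduced here.

```python
def semisystolic_multiply(a, x, k=4):
    """Serial-parallel multiplication: a in parallel, x serially (LSB first)."""
    a_bits = [(a >> i) & 1 for i in range(k)]
    sum_in = [0] * k      # sum flip-flops: sum_in[i] is the stored input of cell i
    carry = [0] * k       # carry flip-flops, one per cell
    product = 0
    for cycle in range(2 * k):
        xj = (x >> cycle) & 1 if cycle < k else 0   # k zeros follow the k multiplier bits
        s = [0] * k
        for i in range(k):
            total = (a_bits[i] & xj) + sum_in[i] + carry[i]   # FA of cell i
            s[i] = total & 1
            carry[i] = total >> 1                   # kept in place (carry-save)
        product |= s[0] << cycle                    # rightmost cell emits one product bit
        sum_in = s[1:] + [0]                        # sums shift one cell to the right
    return product


assert all(
    semisystolic_multiply(a, x) == a * x
    for a in range(16) for x in range(16)
)
```

The broadcast of `xj` to every cell in the inner loop is the k-way fan-out that keeps this design from being fully systolic.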
Systolic Retiming as a Design Tool
A semisystolic circuit can be converted to a systolic circuit
via retiming, which involves advancing and retarding signals
by means of delay removal and delay insertion in such a
way that the relative timings of various parts are unaffected
Fig. 12.8 Example of retiming by delaying the inputs to CL and advancing the outputs from CL by d units. (A cut separates the subsystems CL and CR. Original delays on the signals crossing the cut: e, f, g, and h. Adjusted delays: e + d, f + d, g – d, and h – d.)
Alternate Explanation of Systolic Retiming
(Timing diagram with delays d1 and d2 placed around a subsystem: in both arrangements, a signal applied at time t emerges at time t + a + d1 + d2.)

Transferring delay from the outputs of a subsystem to its inputs does not change the behavior of the overall system
A First Attempt at Retiming

Fig. 12.7 (repeated on this slide): the semi-systolic multiplier.

Fig. 12.9 A retimed version of our semi-systolic multiplier. (The figure marks three cuts: Cut 1, Cut 2, and Cut 3.)
Deriving a Fully Systolic Multiplier

Fig. 12.7 (repeated on this slide): the semi-systolic multiplier.

Fig. 12.10 Systolic circuit for 4 × 4 multiplication in 15 cycles.
A Direct Design for a Bit-Serial Multiplier
Fig. 12.11 Building block for a latency-free bit-serial multiplier. (Labeled signals: ai, xi, tin, tout, cin, cout, sin, sout; the cell contains a (5; 3)-counter and a multiplexer.)

Fig. 12.12 The cellular structure of the bit-serial multiplier based on the cell in Fig. 12.11. (Cells are chained through their t, c, and s signals; ai and xi are the inputs, LSB first, and pi is the output.)

Fig. 12.13 Bit-serial multiplier design in dot notation.
(a) Structure of the bit-matrix: the terms ai x(i–1), xi a(i–1), and ai xi are added to p(i–1), which is already accumulated into three numbers; the lower product bits have already been output
(b) Reduction after each input bit: p(i–1), ai x(i–1), xi a(i–1), and ai xi are reduced to 2p(i); shift right to obtain p(i)
12.4 Modular Multipliers
Fig. 12.14 Modulo-(2^b – 1) carry-save adder. (A row of FAs.)

Fig. 12.15 Design of a 4 × 4 modulo-15 multiplier. (Labeled parts: four 4-bit inputs, "Divide by 16", two mod-15 CSAs, and a mod-15 CPA with a 4-bit output.)
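One common way to obtain a modulo-(2^b – 1) carry-save adder is to feed the carry out of the most significant position back into the least significant position, because 2^b ≡ 1 (mod 2^b – 1). The sketch below models that idea at the word level; whether Fig. 12.14 is wired exactly this way is an assumption, and the function name is made up for illustration.

```python
def csa_mod_2b_minus_1(x, y, z, b):
    """Carry-save addition of three b-bit words with end-around carry."""
    mask = (1 << b) - 1
    s = x ^ y ^ z                                   # bitwise sum
    carry = (x & y) | (x & z) | (y & z)             # bitwise majority
    # Shift the carries left by one position; the bit leaving position b-1
    # has weight 2^b = 1 (mod 2^b - 1), so it re-enters at position 0.
    carry = ((carry << 1) & mask) | (carry >> (b - 1))
    return s, carry


b = 4
m = (1 << b) - 1                                    # modulus 15
for x in range(16):
    for y in range(16):
        for z in range(16):
            s, c = csa_mod_2b_minus_1(x, y, z, b)
            assert (s + c) % m == (x + y + z) % m
```

Both outputs stay b bits wide, so such adders can be cascaded, as in the two mod-15 CSA levels of Fig. 12.15, before a final modular carry-propagate addition.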
Other Examples of Modular Multiplication

Fig. 12.16 One way to design a 4 × 4 modulo-13 multiplier.

Fig. 12.17 A method for modular multioperand addition. (Labeled parts: n inputs, CSA tree, a table with address input and data output, and a 3-input modulo-m adder producing sum mod m.)
12.5 The Special Case of Squaring
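Multiply x by x

Fig. 12.18 Design of a 5-bit squarer.

Bit-matrix for x × x, listed by column (the product bits are p9 through p0):

Column 8: x4x4
Column 7: x4x3, x3x4
Column 6: x4x2, x3x3, x2x4
Column 5: x4x1, x3x2, x2x3, x1x4
Column 4: x4x0, x3x1, x2x2, x1x3, x0x4
Column 3: x3x0, x2x1, x1x2, x0x3
Column 2: x2x0, x1x1, x0x2
Column 1: x1x0, x0x1
Column 0: x0x0

Simplify, using xixi = xi and xixj + xjxi = 2xixj (one dot in the next higher column):

Column 8: x4x3, x4
Column 7: x4x2
Column 6: x4x1, x3x2, x3
Column 5: x4x0, x3x1
Column 4: x3x0, x2x1, x2
Column 3: x2x0
Column 2: x1x0, x1
Column 1: 0 (so p1 = 0)
Column 0: x0 (so p0 = x0)

In column 2, x1x0 + x1 can be replaced by x1x0 in column 3 and x1x0′ in column 2.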
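The simplified bit-matrix of the squarer can be checked exhaustively. The sketch below sums the 15 remaining dots (5 diagonal dots and 10 merged off-diagonal dots) and compares the result with x·x; the function name `square_5bit` is an illustrative choice.

```python
def square_5bit(x):
    """Square a 5-bit unsigned number using the simplified bit-matrix."""
    bits = [(x >> i) & 1 for i in range(5)]
    total = 0
    for i in range(5):
        total += bits[i] << (2 * i)                       # xi*xi = xi, in column 2i
        for j in range(i + 1, 5):
            # xi*xj + xj*xi = 2*xi*xj: one dot, one column higher
            total += (bits[i] & bits[j]) << (i + j + 1)
    return total


assert all(square_5bit(x) == x * x for x in range(32))
```

Compared with the 25 dots of a general 5 × 5 multiplication, only 15 need to be reduced, which is the sense in which using a full multiplier for squaring is wasteful.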
Divide-and-Conquer Squarers
Building wide squarers from narrower ones

Divide-and-conquer (recursive) strategy for synthesizing a 2b × 2b squarer from b × b squarers and multiplier. (The figure reuses the layout of Fig. 12.1 with aH and aL replaced by xH and xL: the partial products become xLxL, xLxH, xHxL, and xHxH, rearranged as in 2b-by-2b multiplication, with the same 2b-bit, b-bit, and 3b-bit spans.)
12.6 Combined Multiply-Add Units
Multiply-add versus multiply-accumulate
Multiply-accumulate units often have wider additive inputs

Fig. 12.19 Dot-notation representations of various methods for performing a multiply-add operation in hardware.
(a) Additive input combined with the CSA tree output
(b) Carry-save additive input combined with the CSA tree output
(c) Additive input combined with the dot matrix for the 4 × 4 multiplication
(d) Carry-save additive input combined with the dot matrix for the 4 × 4 multiplication

F31 book-arith-pres-pt3

  • 1. Part III Multiplication Parts Chapters 1. 2. 3. 4. Numbers and Arithmetic Representing Signed Numbers Redundant Number Systems Residue Number Systems 5. 6. 7. 8. Basic Addition and Counting Carry-Look ahead Adders Variations in Fast Adders Multioperand Addition III. Multiplication 9. 10. 11. 12. Basic Multiplication Schemes High-Radix Multipliers Tree and Array Multipliers Variations in Multipliers IV. Division 13. 14. 15. 16. Basic Division Schemes High-Radix Dividers Variations in Dividers Division by Convergence 17. 18. 19. 20. Floating-Point Reperesentations Floating-Point Operations Errors and Error Control Precise and Certifiable Arithmetic 21. 22. 23. 24. Square-Rooting Methods The CORDIC Algorithms Variations in Function Evaluation Arithmetic by Table Lookup 25. 26. 27. 28. 28. High-Throughput Arithmetic Low-Power Arithmetic Fault-Tolerant Arithmetic Past, Present, and Future Reconfigurable Arithmetic Elementary Operations I. Number Representation II. Addition / Subtraction V. Real Arithmetic VI. Function Evaluation VII. Implementation Topics Appendix: Past, Present, and Future Apr. 2012 Computer Arithmetic, Multiplicationlide 1 S
  • 2. About This Presentation This presentation is intended to support the use of the textbook Computer Arithmetic: Algorithms and Hardware Designs (Oxford U. Press, 2nd ed., 2010, ISBN 978-0-19-532848-6). It is updated regularly by the author as part of his teaching of the graduate course ECE 252B, Computer Arithmetic, at the University of California, Santa Barbara. Instructors can use these slides freely in classroom teaching and for other educational purposes. Unauthorized uses are strictly prohibited. © Behrooz Parhami Edition Released Revised Revised Revised Revised First Jan. 2000 Sep. 2001 Sep. 2003 Oct. 2005 May 2007 Apr. 2008 Apr. 2009 Apr. 2011 Apr. 2012 Second Apr. 2010 Apr. 2012 Computer Arithmetic, Multiplicationlide 2 S
  • 3. III Multiplication. Review multiplication schemes and various speedup methods: multiplication is heavily used (in arithmetic and array indexing); division = reciprocation + multiplication; multiplication speedup: high-radix, tree, recursive; bit-serial, modular, and array multipliers. Topics in This Part: Chapter 9 Basic Multiplication Schemes; Chapter 10 High-Radix Multipliers; Chapter 11 Tree and Array Multipliers; Chapter 12 Variations in Multipliers.
  • 4. “Well, well, for a rabbit, you’re not very good at multiplying, are you?”
  • 5. 9 Basic Multiplication Schemes. Chapter Goals: study shift/add or bit-at-a-time multipliers and set the stage for faster methods and variations to be covered in Chapters 10-12. Chapter Highlights: multiplication = multioperand addition; hardware, firmware, software algorithms; multiplying 2’s-complement numbers; the special case of one constant operand.
  • 6. Basic Multiplication Schemes: Topics in This Chapter. 9.1 Shift/Add Multiplication Algorithms; 9.2 Programmed Multiplication; 9.3 Basic Hardware Multipliers; 9.4 Multiplication of Signed Numbers; 9.5 Multiplication by Constants; 9.6 Preview of Fast Multipliers.
  • 7. 9.1 Shift/Add Multiplication Algorithms. Notation for our discussion of multiplication algorithms: multiplicand $a = a_{k-1}a_{k-2} \ldots a_1a_0$; multiplier $x = x_{k-1}x_{k-2} \ldots x_1x_0$; product $p = a \times x = p_{2k-1}p_{2k-2} \ldots p_3p_2p_1p_0$. Initially, we assume unsigned operands. Fig. 9.1 Multiplication of two 4-bit unsigned binary numbers in dot notation: the partial products bit-matrix has rows $x_0 a 2^0$, $x_1 a 2^1$, $x_2 a 2^2$, $x_3 a 2^3$, whose sum is the product p.
  • 8. Multiplication Recurrence (bit-matrix of Fig. 9.1, rows $x_0 a 2^0$ through $x_3 a 2^3$). Multiplication with right shifts, top-to-bottom accumulation (preferred): $p^{(j+1)} = (p^{(j)} + x_j a 2^k) 2^{-1}$ (add, then shift right), with $p^{(0)} = 0$ and $p^{(k)} = p = ax + p^{(0)} 2^{-k}$. Multiplication with left shifts, bottom-to-top accumulation: $p^{(j+1)} = 2p^{(j)} + x_{k-j-1} a$ (shift, then add), with $p^{(0)} = 0$ and $p^{(k)} = p = ax + p^{(0)} 2^{k}$.
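As a rough illustration of the two recurrences on slide 8, the following Python sketch runs them on plain integers. The function names, and the use of Python's unbounded integers in place of a 2k-bit register, are assumptions made for illustration and do not come from the slides.

```python
def multiply_right_shift(a, x, k):
    """Right-shift recurrence: p(j+1) = (p(j) + x_j * a * 2^k) / 2, p(0) = 0."""
    p = 0
    for j in range(k):
        x_j = (x >> j) & 1                 # multiplier bits, LSB first
        p = (p + x_j * (a << k)) >> 1      # add to upper half, then shift right
    return p


def multiply_left_shift(a, x, k):
    """Left-shift recurrence: p(j+1) = 2 * p(j) + x_(k-j-1) * a, p(0) = 0."""
    p = 0
    for j in range(k):
        bit = (x >> (k - j - 1)) & 1       # multiplier bits, MSB first
        p = 2 * p + bit * a                # shift left, then add
    return p


if __name__ == "__main__":
    # Operands of Fig. 9.2: a = 1010 (ten), x = 1011 (eleven)
    print(bin(multiply_right_shift(0b1010, 0b1011, 4)))
    print(bin(multiply_left_shift(0b1010, 0b1011, 4)))
```

Both functions return the same 2k-bit product; the right-shift form needs only a k-bit addition per step, which is presumably why the slide marks it as preferred.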
  • 9. Examples of Basic Multiplication. Fig. 9.2 Examples of sequential multiplication with right and left shifts, for a = 1010 and x = 1011. In each step, p(j) has the same digits as 2p(j), moved one position to the right. Right-shift algorithm, $p^{(j+1)} = (p^{(j)} + x_j a 2^k) 2^{-1}$ (add, then shift right): p(0) = 0000; + x0a = 1010 gives 2p(1) = 01010; + x1a = 1010 gives 2p(2) = 011110; + x2a = 0000 gives 2p(3) = 0011110; + x3a = 1010 gives 2p(4) = 01101110, so p(4) = 01101110. Left-shift algorithm: p(0) = 0000, 2p(0) = 00000; + x3a = 1010 gives p(1) = 01010, 2p(1) = 010100; + x2a = 0000 gives p(2) = 010100, 2p(2) = 0101000; + x1a = 1010 gives p(3) = 0110010, 2p(3) = 01100100; + x0a = 1010 gives p(4) = 01101110. Check: 10 × 11 = 110 = 64 + 32 + 8 + 4 + 2.
  • 10. Examples of Basic Multiplication (Continued). The two examples of Fig. 9.2 (a = 1010, x = 1011, right-shift and left-shift algorithms) are shown again, this time highlighting the left-shift recurrence $p^{(j+1)} = 2p^{(j)} + x_{k-j-1} a$ (shift, then add). Both algorithms end with p(4) = 01101110. Check: 10 × 11 = 110 = 64 + 32 + 8 + 4 + 2.
  • 11. 9.2 Programmed Multiplication. Fig. 9.3 Programmed multiplication (right-shift algorithm). Registers: R0 holds 0, Rc counter, Ra multiplicand, Rx multiplier, Rp product (high), Rq product (low).
      {Using right shifts, multiply unsigned m_cand and m_ier,
       storing the resultant 2k-bit product in p_high and p_low.
       Registers: R0 holds 0, Rc for counter, Ra for m_cand,
       Rx for m_ier, Rp for p_high, Rq for p_low}
      {Load operands into registers Ra and Rx}
      mult:    load   Ra with m_cand
               load   Rx with m_ier
      {Initialize partial product and counter}
               copy   R0 into Rp
               copy   R0 into Rq
               load   k into Rc
      {Begin multiplication loop}
      m_loop:  shift  Rx right 1        {LSB moves to carry flag}
               branch no_add if carry = 0
               add    Ra to Rp          {carry flag is set to cout}
      no_add:  rotate Rp right 1        {carry to MSB, LSB to carry}
               rotate Rq right 1        {carry to MSB, LSB to carry}
               decr   Rc                {decrement counter by 1}
               branch m_loop if Rc ≠ 0
      {Store the product}
               store  Rp into p_high
               store  Rq into p_low
      m_done:  ...
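The register-level program of Fig. 9.3 can be mimicked in a few lines of Python. This is only a sketch under stated assumptions: the variable names mirror the registers on the slide, the carry flag is modeled as an ordinary variable, and the function name `programmed_multiply` is invented here.

```python
def programmed_multiply(m_cand, m_ier, k):
    """Simulate the right-shift program of Fig. 9.3 on k-bit registers.

    Returns (p_high, p_low), the two k-bit halves of the product.
    """
    mask = (1 << k) - 1
    ra, rx = m_cand & mask, m_ier & mask   # load operands
    rp = rq = 0                            # copy R0 into Rp and Rq
    for _ in range(k):                     # Rc counts k iterations
        carry = rx & 1                     # shift Rx right 1: LSB -> carry flag
        rx >>= 1
        if carry:                          # skipped when the branch to no_add is taken
            total = rp + ra                # add Ra to Rp
            rp, carry = total & mask, total >> k   # carry flag <- adder carry-out
        # rotate right through carry: carry -> MSB, LSB -> carry
        rp, carry = (carry << (k - 1)) | (rp >> 1), rp & 1
        rq, carry = (carry << (k - 1)) | (rq >> 1), rq & 1
    return rp, rq


if __name__ == "__main__":
    p_high, p_low = programmed_multiply(10, 11, 4)
    print(format(p_high, "04b"), format(p_low, "04b"))
```

When the multiplier bit is 0, the carry flag is already 0 after the shift of Rx, so the rotation of Rp brings in a 0 at the MSB, exactly as in the no_add path of the listing.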
  • 12. Time Complexity of Programmed Multiplication. Assume k-bit words: k iterations of the main loop, with 6 or 7 instructions per iteration, depending on the multiplier bit. Thus, 6k + 3 to 7k + 3 machine instructions, ignoring operand loads and result store. k = 32 implies 200+ instructions on average (between 6 × 32 + 3 = 195 and 7 × 32 + 3 = 227). This is too slow for many modern applications! Microprogrammed multiply would be somewhat better.
  • 13. 9.3 Basic Hardware Multipliers. Fig. 9.4 Hardware realization of the sequential multiplication algorithm with additions and right shifts. Components: shifting multiplier register x; shifting double-width partial product register p(j); multiplicand register a; mux selecting 0 or a under control of x_j; k-bit adder with carry-out c_out. Recurrence implemented: $p^{(j+1)} = (p^{(j)} + x_j a 2^k) 2^{-1}$ (add, then shift right).
  • 14. Example of Hardware Multiplication. Fig. 9.4a Hardware realization of the sequential multiplication algorithm with additions and right shifts, stepping through the example multiplier x = (11)ten, multiplicand a = (10)ten, product p = (110)ten. Recurrence: $p^{(j+1)} = (p^{(j)} + x_j a 2^k) 2^{-1}$ (add, then shift right).
  • 15. Performing Add and Shift in One Clock Cycle. Fig. 9.5 Combining the loading and shifting of the double-width register holding the partial product and the partially used multiplier. Labels in the figure: adder’s carry-out and adder’s sum (k bits) entering the register; partial product p(j) (k bits) going to the adder; unused part of the multiplier x (k-1 bits); one bit going to the mux control.
  • 16. Sequential Multiplication with Left Shifts. Fig. 9.4b Hardware realization of the sequential multiplication algorithm with left shifts and additions. Components: shifting multiplier register x; shifting double-width (2k-bit) partial product register p(j); multiplicand register a; mux selecting 0 or a under control of $x_{k-j-1}$; 2k-bit adder with carry-out c_out.
  • 17. 9.4 Multiplication of Signed Numbers. Negative multiplicand, positive multiplier: no change, other than looking out for proper sign extension. Fig. 9.6 Sequential multiplication of 2’s-complement numbers with right shifts (positive multiplier), for a = 10110 and x = 01011; each p(j) has the same digits as 2p(j), moved one position to the right: p(0) = 00000; + x0a = 10110 gives 2p(1) = 110110; + x1a = 10110 gives 2p(2) = 1100010; + x2a = 00000 gives 2p(3) = 11100010; + x3a = 10110 gives 2p(4) = 110010010; + x4a = 00000 gives 2p(5) = 1110010010, so p(5) = 1110010010. Check: -10 × 11 = -110 = -512 + 256 + 128 + 16 + 2.
  • 18. The Case of a Negative Multiplier. Negative multiplicand, negative multiplier: in the last step (the sign bit), subtract rather than add. Fig. 9.7 Sequential multiplication of 2’s-complement numbers with right shifts (negative multiplier), for a = 10110 and x = 10101; each p(j) has the same digits as 2p(j), moved one position to the right: p(0) = 00000; + x0a = 10110 gives 2p(1) = 110110; + x1a = 00000 gives 2p(2) = 1110110; + x2a = 10110 gives 2p(3) = 11001110; + x3a = 00000 gives 2p(4) = 111001110; + (-x4a) = 01010 gives 2p(5) = 0001101110, so p(5) = 0001101110. Check: -10 × -11 = 110 = 64 + 32 + 8 + 4 + 2.
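The rule illustrated by Figs. 9.6 and 9.7 (add for bits 0 to k-2, subtract for the sign bit) can be sketched in Python as follows. The helper names are hypothetical, and Python's signed integers with arithmetic right shift stand in for the sign-extended partial product register.

```python
def to_signed(bits, k):
    """Interpret a k-bit pattern as a 2's-complement value."""
    return bits - (1 << k) if bits & (1 << (k - 1)) else bits


def multiply_twos_complement(a_bits, x_bits, k):
    """Right-shift multiplication of two k-bit 2's-complement patterns."""
    a = to_signed(a_bits, k)
    p = 0
    for j in range(k):
        term = ((x_bits >> j) & 1) * (a << k)
        if j == k - 1:
            p -= term          # sign bit of x has negative weight: subtract
        else:
            p += term
        p >>= 1                # arithmetic shift keeps the sign extension
    return p


if __name__ == "__main__":
    print(multiply_twos_complement(0b10110, 0b01011, 5))  # operands of Fig. 9.6
    print(multiply_twos_complement(0b10110, 0b10101, 5))  # operands of Fig. 9.7
```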
  • 19. Signed 2’s-Complement Hardware Multiplier. Fig. 9.8 The 2’s-complement sequential hardware multiplier. Components: multiplier register; partial product register; (k+1)-bit multiplicand path; mux with enable and select inputs; (k+1)-bit adder with carry-out c_out and carry-in c_in, where c_in is 0 except in the last cycle.
  • 20. Booth’s Recoding. Table 9.1 Radix-2 Booth’s recoding ($x_i$ $x_{i-1}$ → $y_i$, explanation): 0 0 → 0, no string of 1s in sight; 0 1 → 1, end of string of 1s in x; 1 0 → -1, beginning of string of 1s in x; 1 1 → 0, continuation of string of 1s in x. Example: operand x = 1 0 0 1 1 1 0 1 1 0 1 0 1 1 1 0; recoded version y = (1) -1 0 1 0 0 -1 1 0 -1 1 -1 1 0 0 -1 0. Justification: $2^j + 2^{j-1} + \cdots + 2^{i+1} + 2^i = 2^{j+1} - 2^i$.
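Table 9.1 amounts to the rule $y_i = x_{i-1} - x_i$ with $x_{-1} = 0$. A minimal Python sketch of that rule follows; the function name and the list-of-digits return format are assumptions made for illustration.

```python
def booth_recode(x, k):
    """Radix-2 Booth recoding of a k-bit pattern x.

    Returns digits y_0 .. y_(k-1), least significant first, each in {-1, 0, 1},
    using y_i = x_(i-1) - x_i with x_(-1) = 0.
    """
    digits = []
    prev = 0
    for i in range(k):
        bit = (x >> i) & 1
        digits.append(prev - bit)
        prev = bit
    return digits


if __name__ == "__main__":
    x = 0b1001110110101110                 # operand from the slide's example
    y = booth_recode(x, 16)
    print(list(reversed(y)))               # most significant digit first
    # The digits sum to the 2's-complement value of x; reading x as unsigned
    # adds the extra leading digit shown as (1) on the slide.
    print(sum(d << i for i, d in enumerate(y)), x - (1 << 16))
```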
  • 21. Example Multiplication with Booth’s Recoding. Fig. 9.9 Sequential multiplication of 2’s-complement numbers with right shifts by means of Booth’s recoding ($x_i$ $x_{i-1}$ → $y_i$: 0 0 → 0; 0 1 → 1; 1 0 → -1; 1 1 → 0). Multiplicand a = 10110; multiplier x = 10101; Booth-recoded y = -1 1 -1 1 -1. Each p(j) has the same digits as 2p(j), moved one position to the right: p(0) = 00000; + y0a = 01010 gives 2p(1) = 001010; + y1a = 10110 gives 2p(2) = 1110110; + y2a = 01010 gives 2p(3) = 00011110; + y3a = 10110 gives 2p(4) = 111001110; + y4a = 01010 gives 2p(5) = 0001101110, so p(5) = 0001101110. Check: -10 × -11 = 110 = 64 + 32 + 8 + 4 + 2.
  • 22. 9.5 Multiplication by Constants. Explicit, e.g. y := 12 ∗ x + 1. Implicit, e.g. A[i, j] := A[i, j] + B[i, j], where the address of A[i, j] = base + n ∗ i + j for an array with rows 0 to m-1 and columns 0 to n-1. Software aspects: optimizing compilers replace multiplications by shifts/adds/subs; produce efficient code using as few registers as possible; find the best code by a time/space-efficient algorithm. Hardware aspects: synthesize special-purpose units such as filters, e.g. y[t] = a0 x[t] + a1 x[t-1] + a2 x[t-2] + b1 y[t-1] + b2 y[t-2].
  • 23. Multiplication Using Binary Expansion. Example: multiply R1 by the constant 113 = (1 1 1 0 0 0 1)two. R2 ← R1 shift-left 1; R3 ← R2 + R1; R6 ← R3 shift-left 1; R7 ← R6 + R1; R112 ← R7 shift-left 4; R113 ← R112 + R1. Ri: register that contains i times (R1). This notation is for clarity; only one register other than R1 is needed. Shorter sequence using shift-and-add instructions: R3 ← R1 shift-left 1 + R1; R7 ← R3 shift-left 1 + R1; R113 ← R7 shift-left 4 + R1.
  • 24. Multiplication via Recoding. Example: multiply R1 by 113 = (1 1 1 0 0 0 1)two = (1 0 0 -1 0 0 0 1)two. R8 ← R1 shift-left 3 (shift); R7 ← R8 - R1 (subtract); R112 ← R7 shift-left 4 (shift); R113 ← R112 + R1 (add). Shorter sequence using shift-and-add/subtract instructions: R7 ← R1 shift-left 3 - R1; R113 ← R7 shift-left 4 + R1. 6 shift or add (3 shift-and-add) instructions are needed without recoding. The canonic signed-digit representation of a number contains no consecutive nonzero digits: average number of shift-adds is O(k/3).
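One way to obtain the recoded constant automatically is the standard digit-by-digit conversion to canonic signed-digit (CSD) form. The sketch below is an illustration only; the function names are hypothetical, and the slides do not prescribe this particular conversion procedure.

```python
def canonic_signed_digits(n):
    """CSD digits of a nonnegative integer, least significant first."""
    digits = []
    while n:
        if n & 1:
            d = 2 - (n & 3)    # +1 if n mod 4 == 1, -1 if n mod 4 == 3
            n -= d             # makes n divisible by 4, so the next digit is 0
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits


def multiply_by_constant(r1, constant):
    """Multiply r1 by a constant using only shifts, adds, and subtracts."""
    result = 0
    for shift, d in enumerate(canonic_signed_digits(constant)):
        if d == 1:
            result += r1 << shift
        elif d == -1:
            result -= r1 << shift
    return result


if __name__ == "__main__":
    print(list(reversed(canonic_signed_digits(113))))   # most significant first
    print(multiply_by_constant(5, 113))
```

Because no two adjacent CSD digits are nonzero, the number of add/subtract steps is at most about half the number of bits, which is the property the slide relies on.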
  • 25. Multiplication via Factorization. Example: multiply R1 by 119 = 7 × 17 = (8 - 1) × (16 + 1). R8 ← R1 shift-left 3; R7 ← R8 - R1; R112 ← R7 shift-left 4; R119 ← R112 + R7. Shorter sequence using shift-and-add/subtract instructions: R7 ← R1 shift-left 3 - R1; R119 ← R7 shift-left 4 + R7. Requires a scratch register for holding the 7 multiple. 119 = (1 1 1 0 1 1 1)two = (1 0 0 0 -1 0 0 -1)two. More instructions may be needed without factorization.
  • 26. Multiplication by Multiple Constants. Example: multiplying a number by 45, 49, and 65. Separate solutions (5 shift-add/subtract operations): R9 ← R1 shift-left 3 + R1; R45 ← R9 shift-left 2 + R9; R7 ← R1 shift-left 3 - R1; R49 ← R7 shift-left 3 - R7; R65 ← R1 shift-left 6 + R1. A combined solution for all three constants: R65 ← R1 shift-left 6 + R1; R49 ← R65 - R1 left-shift 4; R45 ← R49 - R1 left-shift 2. A programmable block can perform any of the three multiplications.
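The combined solution on slide 26 can be checked with a few lines of Python. The function name is made up for this sketch; the three statements follow the register transfers on the slide one for one.

```python
def times_45_49_65(r1):
    """Three shift-add/subtract operations yielding 45, 49, and 65 times r1."""
    r65 = (r1 << 6) + r1        # R65 <- R1 shift-left 6 + R1
    r49 = r65 - (r1 << 4)       # R49 <- R65 - R1 left-shift 4
    r45 = r49 - (r1 << 2)       # R45 <- R49 - R1 left-shift 2
    return r45, r49, r65


if __name__ == "__main__":
    print(times_45_49_65(3))
```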
  • 27. 9.6 Preview of Fast Multipliers. Viewing multiplication as a multioperand addition problem, there are but two ways to speed it up: (a) reducing the number of operands to be added, by handling more than one multiplier bit at a time (high-radix multipliers, Chapter 10); (b) adding the operands faster, by parallel/pipelined multioperand addition (tree and array multipliers, Chapter 11). In Chapter 12, we cover all remaining multiplication topics: bit-serial multipliers; modular multipliers; multiply-add units; squaring as a special case.
  • 28. 10 High-Radix Multipliers. Chapter Goals: study techniques that allow us to handle more than one multiplier bit in each cycle (two bits in radix 4, three in radix 8, . . .). Chapter Highlights: high radix gives rise to “difficult” multiples; recoding (change of digit-set) as remedy; carry-save addition reduces cycle time; implementation and optimization methods.
  • 29. High-Radix Multipliers: Topics in This Chapter. 10.1 Radix-4 Multiplication; 10.2 Modified Booth’s Recoding; 10.3 Using Carry-Save Adders; 10.4 Radix-8 and Radix-16 Multipliers; 10.5 Multibeat Multipliers; 10.6 VLSI Complexity Issues.
  • 30. 10.1 Radix-4 Multiplication. Fig. 9.1 (modified): partial products bit-matrix with rows $x_0 a r^0$, $x_1 a r^1$, $x_2 a r^2$, $x_3 a r^3$. Multiplication with right shifts in radix r, top-to-bottom accumulation (preferred): $p^{(j+1)} = (p^{(j)} + x_j a r^k) r^{-1}$ (add, then shift right), with $p^{(0)} = 0$ and $p^{(k)} = p = ax + p^{(0)} r^{-k}$. Multiplication with left shifts in radix r, bottom-to-top accumulation: $p^{(j+1)} = r p^{(j)} + x_{k-j-1} a$ (shift, then add), with $p^{(0)} = 0$ and $p^{(k)} = p = ax + p^{(0)} r^{k}$.
  • 31. Radix-4 Multiplication in Dot Notation. Fig. 9.1 (radix 2): rows $x_0 a 2^0$, $x_1 a 2^1$, $x_2 a 2^2$, $x_3 a 2^3$. Fig. 10.1 Radix-4, or two-bit-at-a-time, multiplication in dot notation: rows $(x_1 x_0)_{\text{two}}\, a\, 4^0$ and $(x_3 x_2)_{\text{two}}\, a\, 4^1$. Number of cycles is halved, but now the “difficult” multiple 3a must be dealt with.
  • 32. A Possible Design for a Radix-4 Multiplier. Fig. 10.2 The multiple generation part of a radix-4 multiplier with precomputation of 3a: the multiplier register shifts 2 bits per cycle, and the bits $x_{i+1} x_i$ = 00, 01, 10, 11 select 0, a, 2a, 3a through a mux whose output goes to the adder. 3a is precomputed via shift-and-add (3a = 2a + a), giving k/2 + 1 cycles, rather than k. One extra cycle over k/2 is not too bad, but we would like to avoid it if possible. Solving this problem for radix 4 may also help when dealing with even higher radices.
  • 33. Example Radix-4 Multiplication Using 3a. Fig. 10.3 Example of radix-4 multiplication using the 3a multiple, for a = 0110, 3a = 010010, x = 1110; each p(j) has the same digits as 4p(j), moved two positions to the right: p(0) = 0000; + $(x_1x_0)_{\text{two}}\,a$ = 001100 gives 4p(1) = 001100; + $(x_3x_2)_{\text{two}}\,a$ = 010010 gives 4p(2) = 01010100, so p(2) = 01010100.
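The radix-4 right-shift recurrence with a precomputed 3a multiple can be sketched as below. The function name and the table-lookup style are assumptions for illustration; k is assumed to be even.

```python
def multiply_radix4(a, x, k):
    """Unsigned radix-4 multiplication: p(j+1) = (p(j) + digit * a * 2^k) / 4."""
    multiples = [0, a, 2 * a, 3 * a]       # 3a precomputed as 2a + a
    p = 0
    for j in range(k // 2):
        digit = (x >> (2 * j)) & 3         # (x_(2j+1) x_(2j))_two
        p = (p + (multiples[digit] << k)) >> 2
    return p


if __name__ == "__main__":
    print(format(multiply_radix4(0b0110, 0b1110, 4), "08b"))  # operands of Fig. 10.3
```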
  • 34. A Second Design for a Radix-4 Multiplier. Fig. 10.4 The multiple generation part of a radix-4 multiplier based on replacing 3a with 4a (carry into next higher radix-4 multiplier digit) and -a. The multiplier register shifts 2 bits per cycle; a mux selects 0, a, 2a, or -a (control 00, 01, 10, 11) for the adder; a carry flip-flop holds c, and the digit used is $(x_{i+1}x_i)_{\text{two}} + c$ mod 4. Mux control bits: $x_{i+1} \oplus (x_i c)$ and $x_i \oplus c$. Set carry: $x_{i+1}(x_i \lor c)$, i.e., set if $x_{i+1} = x_i = 1$ or if $x_{i+1} = c = 1$. Truth table ($x_{i+1}$ $x_i$ c → mux control, set carry): 0 0 0 → 00, 0; 0 0 1 → 01, 0; 0 1 0 → 01, 0; 0 1 1 → 10, 0; 1 0 0 → 10, 0; 1 0 1 → 11, 1; 1 1 0 → 11, 1; 1 1 1 → 00, 1.
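The mux-control and set-carry equations of Fig. 10.4 can be exercised with a small Python model. This is a sketch under assumptions: the function name is hypothetical, and the final carry (weight $2^k$) is folded in with one extra addition after the loop, a detail the slide does not spell out.

```python
def multiply_radix4_with_carry(a, x, k):
    """Unsigned radix-4 multiplication using multiples 0, a, 2a, -a and a carry."""
    multiples = [0, a, 2 * a, -a]          # mux inputs for control 00, 01, 10, 11
    p, c = 0, 0
    for j in range(k // 2):
        x_lo = (x >> (2 * j)) & 1
        x_hi = (x >> (2 * j + 1)) & 1
        select = ((x_hi ^ (x_lo & c)) << 1) | (x_lo ^ c)   # mux control bits
        p = (p + (multiples[select] << k)) >> 2            # add, shift right 2
        c = x_hi & (x_lo | c)                              # set-carry equation
    return p + ((c * a) << k)              # leftover carry has weight 2^k


if __name__ == "__main__":
    print(multiply_radix4_with_carry(6, 14, 4))
```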
  • 35. 10.2 Modified Booth’s Recoding. Table 10.1 Radix-4 Booth’s recoding yielding $(z_{k/2} \ldots z_1 z_0)_{\text{four}}$. Columns: context $x_{i+1}$ $x_i$ $x_{i-1}$ → recoded radix-2 digits $y_{i+1}$ $y_i$ → radix-4 digit $z_{i/2}$, explanation. 0 0 0 → 0 0 → 0, no string of 1s in sight; 0 0 1 → 0 1 → 1, end of string of 1s; 0 1 0 → 0 1 → 1, isolated 1; 0 1 1 → 1 0 → 2, end of string of 1s; 1 0 0 → -1 0 → -2, beginning of string of 1s; 1 0 1 → -1 1 → -1, end a string, begin new one; 1 1 0 → 0 -1 → -1, beginning of string of 1s; 1 1 1 → 0 0 → 0, continuation of string of 1s. Example: operand x = 1 0 0 1 1 1 0 1 1 0 1 0 1 1 1 0; recoded version y = (1) -1 0 1 0 0 -1 1 0 -1 1 -1 1 0 0 -1 0; radix-4 version z = (1) -2 2 -1 2 -1 -1 0 -2.
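Every row of Table 10.1 satisfies $z_{i/2} = -2x_{i+1} + x_i + x_{i-1}$, which gives a compact way to sketch the recoding in Python. The function names are hypothetical, k is assumed even, and the multiplier x is treated as a 2's-complement pattern.

```python
def booth_radix4(x, k):
    """Radix-4 Booth digits of a k-bit 2's-complement pattern, least significant first."""
    digits = []
    prev = 0                               # x_(-1) = 0
    for i in range(0, k, 2):
        x_i = (x >> i) & 1
        x_i1 = (x >> (i + 1)) & 1
        digits.append(-2 * x_i1 + x_i + prev)
        prev = x_i1
    return digits


def multiply_booth_radix4(a, x, k):
    """Multiply signed integer a by the k-bit 2's-complement pattern x."""
    return sum(z * a * 4 ** j for j, z in enumerate(booth_radix4(x, k)))


if __name__ == "__main__":
    print(list(reversed(booth_radix4(0b1001110110101110, 16))))
    print(multiply_booth_radix4(6, 0b1010, 4))   # operands of Fig. 10.5
```

Only the multiples 0, ±a, and ±2a are ever needed, so the difficult multiple 3a disappears.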
  • 36. Example Multiplication via Modified Booth’s Recoding. Fig. 10.5 Example of radix-4 multiplication with modified Booth’s recoding of the 2’s-complement multiplier, for a = 0110, x = 1010, radix-4 recoded z = -1 -2: p(0) = 000000; + z0a = 110100 gives 4p(1) = 110100, p(1) = 11110100 (sign-extended, shifted right two positions); + z1a = 111010 gives 4p(2) = 11011100, so p(2) = 11011100.
  • 37. Multiple Generation with Radix-4 Booth’s Recoding. Fig. 10.6 The multiple generation part of a radix-4 multiplier based on Booth’s recoding. The multiplier register shifts 2 bits per cycle (with $x_{i-1}$ initialized to 0); recoding logic takes $x_{i+1}$, $x_i$, $x_{i-1}$ and produces the signals neg, two, and non0 (the latter two could have been named one/two). A mux with enable (non0) and select (two) inputs picks 0, a, or 2a (k+1 bits) from the multiplicand; neg, together with the sign of a, forms the add/subtract control, and $z_{i/2}\,a$ goes to the adder input. Encoding (digit: neg two non0): -2: 1 1 1; -1: 1 0 1; 0: 0 0 0; 1: 0 0 1; 2: 0 1 1.
  • 38. 10.3 Using Carry-Save Adders. Fig. 10.7 Radix-4 multiplication with a carry-save adder used to combine the cumulative partial product, $x_i a$, and $2x_{i+1} a$ into two numbers. Components: multiplier register; two muxes selecting 0 or a (under $x_i$) and 0 or 2a (under $x_{i+1}$); CSA; adder; old and new cumulative partial product.
  • 39. Keeping the Partial Product in Carry-Save Form. Fig. 10.8 Radix-2 multiplication with the upper half of the cumulative partial product kept in stored-carry form: (a) multiplier block diagram, with multiplier and partial product (sum and carry) registers, multiplicand, 0/a mux, k-bit CSA, right shift, and k-bit adder; (b) operation in a typical cycle, showing the sum and carry of the upper half of the partial product and the lower half of the partial product.
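The idea of keeping the running total as a (sum, carry) pair and deferring carry propagation to a single final addition can be sketched in Python. This is a simplified illustration rather than a model of Fig. 10.8: it aligns each partial product by a left shift instead of shifting the partial product right, and the function names are invented here.

```python
def carry_save_add(x, y, z):
    """3-to-2 reduction: returns (sum, carry) with sum + carry == x + y + z."""
    s = x ^ y ^ z                              # bitwise sum without carries
    c = ((x & y) | (x & z) | (y & z)) << 1     # carries, moved one position left
    return s, c


def multiply_carry_save(a, x, k):
    """Unsigned multiplication with the partial product kept in carry-save form."""
    s = c = 0
    for j in range(k):
        addend = ((x >> j) & 1) * (a << j)     # x_j * a * 2^j
        s, c = carry_save_add(s, c, addend)    # no carry propagation in the loop
    return s + c                               # one carry-propagate addition at the end


if __name__ == "__main__":
    print(multiply_carry_save(10, 11, 4))
```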
  • 40. Carry-Save Multiplier with Radix-4 Booth’s Recoding. Fig. 10.9 Radix-4 multiplication with a CSA used to combine the stored-carry cumulative partial product and $z_{i/2}\,a$ into two numbers. Components: multiplier bits $x_{i+1}$, $x_i$, $x_{i-1}$ feeding a Booth recoder and selector that outputs $z_{i/2}\,a$; CSA combining it with the old cumulative partial product to form the new cumulative partial product; extra “dot”; 2-bit adder with FF delivering bits to the lower half of the partial product; final adder.
  • 41. Radix-4 Booth’s Recoding for Parallel Multiplication. Fig. 10.10 Booth recoding and multiple selection logic for high-radix or parallel multiplication. Recoding logic on the multiplier bits ($x_{i+2}$, $x_{i+1}$, $x_i$, $x_{i-1}$, $x_{i-2}$ are shown) produces neg, two, and non0; a mux with enable and select inputs picks 0, a, or 2a (k+1 bits); a selective complement stage yields $z_{i/2}\,a$ = 0, a, -a, 2a, or -2a (k+2 bits), plus an extra “dot” for column i.
  • 42. Yet Another Design for Radix-4 Multiplication. Fig. 10.11 Radix-4 multiplication, with the cumulative partial product, $x_i a$, and $2x_{i+1} a$ combined into two numbers by two CSAs (together acting as a (4; 2)-counter). Components: multiplier register; muxes selecting 0 or a (under $x_i$) and 0 or 2a (under $x_{i+1}$); two CSAs; old and new cumulative partial product; FF and 2-bit adder delivering bits to the lower half of the partial product; final adder.
  • 43. 10.4 Radix-8 and Radix-16 Multipliers. Fig. 10.12 Radix-16 multiplication with the upper half of the cumulative partial product in carry-save form. Components: multiplier register with 4-bit right shift; four muxes selecting 0 or 8a, 4a, 2a, a under control of $x_{i+3}$, $x_{i+2}$, $x_{i+1}$, $x_i$; four CSAs; sum and carry registers for the upper half of the partial product; 4-bit shift; FF and 4-bit adder delivering 4 bits per cycle to the lower half of the partial product.
  • 44. Other High-Radix Multipliers. Starting from the radix-16 design of Fig. 10.12: remove one mux and its CSA (the pair marked in the figure) and replace the 4-bit shift (adder) with a 3-bit shift (adder) to get a radix-8 multiplier (cycle time will remain the same, though). A radix-16 multiplier design becomes a radix-256 multiplier if radix-4 Booth’s recoding is applied first (the muxes are replaced by Booth recoding and multiple selection logic).
  • 45. A Spectrum of Multiplier Design Choices. Fig. 10.13 High-radix multipliers as intermediate between sequential radix-2 and full-tree multipliers. Basic binary: next multiple, adder, partial product. High-radix or partial tree: several multiples, small CSA tree, adder, partial product. Full tree: all multiples, full CSA tree, adder. Moving toward the full tree speeds up the design; moving toward basic binary economizes.
  • 46. 10.5 Multibeat Multipliers. Fig. 10.15 Two-phase clocking for sequential logic: (a) sequential machine with FFs (inputs and present state feed next-state logic, whose next-state excitation drives state flip-flops clocked by CLK); (b) sequential machine with latches and 2-phase clock (state latches clocked by PH1 and PH2, each followed by next-state logic with its own inputs). Timing within one cycle: begin changing FF contents; change becomes visible at FF output. Observation: half of the clock cycle goes to waste.
  • 47. Twin-Beat and Three-Beat Multipliers. Fig. 10.14 Twin-beat multiplier with radix-8 Booth’s recoding: twin multiplier registers; multiples a and 3a; two pipelined radix-8 Booth recoders and selectors; two CSAs with sum and carry outputs; FF, adder, and 6-bit adder delivering bits to the lower half of the partial product. This radix-64 multiplier runs at the clock rate of a radix-8 design (2X speed). Fig. 10.16 Conceptual view of a three-beat multiplier (nodes 1, 2, and 3 with beat-1, beat-2, and beat-3 inputs).
  • 48. 10.6 VLSI Complexity Issues. A radix-$2^b$ multiplier requires: bk two-input AND gates to form the partial products bit-matrix; O(bk) area for the CSA tree; at least Θ(k) area for the final carry-propagate adder. Total area: A = O(bk). Latency: T = O((k/b) log b + log k). Any VLSI circuit computing the product of two k-bit integers must satisfy the following constraints: AT grows at least as fast as $k^{3/2}$; $AT^2$ is at least proportional to $k^2$. The preceding radix-$2^b$ implementations are suboptimal, because $AT = O(k^2 \log b + bk \log k)$ and $AT^2 = O((k^3/b) \log^2 b)$.
  • 49. Comparing High- and Low-Radix Multipliers. $AT = O(k^2 \log b + bk \log k)$; $AT^2 = O((k^3/b) \log^2 b)$. Low-cost, b = O(1): $AT = O(k^2)$, $AT^2 = O(k^3)$. High speed, b = O(k): $AT = O(k^2 \log k)$, $AT^2 = O(k^2 \log^2 k)$. AT- or $AT^2$-optimal: $AT = O(k^{3/2})$, $AT^2 = O(k^2)$. Intermediate designs do not yield better AT or $AT^2$ values; the multipliers remain asymptotically suboptimal for any b. By the AT measure (indicator of cost-effectiveness), slower radix-2 multipliers are better than high-radix or tree multipliers. Thus, when an application requires many independent multiplications, it is more cost-effective to use a large number of slower multipliers. High-radix multiplier latency can be reduced from O((k/b) log b + log k) to O(k/b + log k) through more effective pipelining (Chapter 11).
  • 50. 11 Tree and Array Multipliers. Chapter Goals: study the design of multipliers for highest possible performance (speed, throughput). Chapter Highlights: tree multiplier = reduction tree + redundant-to-binary converter; avoiding full sign extension in multiplying signed numbers; array multiplier = one-sided reduction tree + ripple-carry adder.
  • 51. Tree and Array Multipliers: Topics in This Chapter. 11.1 Full-Tree Multipliers; 11.2 Alternative Reduction Trees; 11.3 Tree Multipliers for Signed Numbers; 11.4 Partial-Tree and Truncated Multipliers; 11.5 Array Multipliers; 11.6 Pipelined Tree and Array Multipliers.
  • 52. 11.1 Full-Tree Multipliers. Fig. 10.13 (repeated) High-radix multipliers as intermediate between sequential radix-2 and full-tree multipliers. Fig. 11.1 General structure of a full-tree multiplier: the multiplier controls multiple-forming circuits that produce multiples of a; these feed the partial-products reduction tree (multi-operand addition tree), whose redundant result goes to a redundant-to-binary converter that delivers the higher-order product bits; some lower-order product bits are generated directly.
  • 53. Full-Tree versus Partial-Tree Multiplier. Schematic diagrams for full-tree and partial-tree multipliers. Full tree: all partial products enter a large, log-depth tree of carry-save adders, followed by a log-depth adder that yields the product. Partial tree: several partial products at a time enter a small tree of carry-save adders, followed by an adder that yields the product.
  • 54. Variations in Full-Tree Multiplier Design. Designs are distinguished by variations in three elements of Fig. 11.1: 1. multiple-forming circuits; 2. partial products reduction tree (multi-operand addition tree); 3. redundant-to-binary converter. The converter delivers the higher-order product bits; some lower-order product bits are generated directly.
  • 55. Example of Variations in CSA Tree Design. Fig. 11.2 Two different binary 4 × 4 tree multipliers (corrections shown in red on the slide). The partial products bit-matrix ($x_0 a 2^0$ through $x_3 a 2^3$) has column heights 1 2 3 4 3 2 1. Wallace tree (5 FAs + 3 HAs + 4-bit adder): 1 2 3 4 3 2 1 → (FA, FA, FA, HA) → 1 3 2 3 2 1 1 → (FA, HA, FA, HA) → 2 2 2 2 1 1 1 → 4-bit adder → 1 1 1 1 1 1 1 1. Dadda tree (4 FAs + 2 HAs + 6-bit adder): starts from the same column heights 1 2 3 4 3 2 1, applies two levels of FAs and HAs, and finishes with a 6-bit adder that yields 1 1 1 1 1 1 1 1.
  • 56. Details of a CSA Tree. Fig. 11.3 Possible CSA tree for a 7 × 7 tree multiplier, built from 7-bit CSAs, a 10-bit CSA, and a 10-bit CPA. The index pair [i, j] means that bit positions from i up to j are involved. The seven inputs occupy positions [0, 6], [1, 7], [2, 8], [3, 9], [4, 10], [5, 11], [6, 12]; the 10-bit CPA produces positions [4, 13], and product bits 3, 2, 1, 0 emerge from earlier levels. CSA trees are quite irregular, causing some difficulties in VLSI realization; thus, our motivation to examine alternate methods for partial products reduction.
  • 57. 11.2 Alternative Reduction Trees. Fig. 11.4 A slice of a balanced-delay tree for 11 inputs: FAs arranged in levels 1 through 5 between the inputs and the outputs, with level-1, level-2, level-3, and level-4 carries entering from the neighboring slice. Counting level-1 carries: $11 + \psi_1 = 2\psi_1 + 3$; therefore, $\psi_1 = 8$ carries are needed.
  • 58. Binary Tree of 4-to-2 Reduction Modules. Fig. 11.5 Tree multiplier with a more regular structure based on 4-to-2 reduction modules: (a) binary tree of (4; 2)-counters; (b) realization of a 4-to-2 compressor with two FAs, producing outputs c and s; (c) a faster realization using multiplexers, producing outputs c and s. Due to its recursive structure, a binary tree is more regular than a 3-to-2 reduction tree when laid out in VLSI.
  • 59. Example Multiplier with 4-to-2 Reduction Tree. Fig. 11.6 Layout of a partial-products reduction tree composed of 4-to-2 reduction modules; each solid arrow represents two numbers. The layout shows the multiplicand, multiple generation circuits driven by multiple selection signals, and the redundant-to-binary converter. Even if 4-to-2 reduction is implemented using two CSA levels, design regularity potentially makes up for the larger number of logic levels. Similarly, using Booth’s recoding may not yield any advantage, because it introduces irregularity.
  • 60. 11.3 Tree Multipliers for Signed Numbers. From Fig. 8.19a, sign extension in multioperand addition: the sign bits $x_{k-1}$, $y_{k-1}$, $z_{k-1}$ are replicated through the extended positions, to the left of the sign position and the magnitude positions ($x_{k-2}$, $x_{k-3}$, $x_{k-4}$, ... and likewise for y and z). The difference in multiplication is the shifting sign positions. Fig. 11.7 Sharing of full adders to reduce the CSA width in a signed tree multiplier: the sign extensions α, β, γ of three shifted operands are fed to a single FA with inputs α, β, γ, and five redundant copies are removed.
  • 61. Using the Negative-Weight Property of the Sign Bit. Sign extension is a way of converting negatively weighted bits (negabits) to positively weighted bits (posibits) to facilitate reduction, but there are other methods of accomplishing the same without introducing a lot of extra bits. Baugh and Wooley have contributed two such methods. Fig. 11.8 Baugh-Wooley 2’s-complement multiplication, shown for 5-bit operands $a_4a_3a_2a_1a_0$ and $x_4x_3x_2x_1x_0$ with product $p_9 \ldots p_0$ (a prime denotes complement). a. Unsigned: row j of the bit-matrix is $a_4x_j\ a_3x_j\ a_2x_j\ a_1x_j\ a_0x_j$, shifted left by j positions. b. 2’s-complement: the entries $a_4x_j$ (j = 0 to 3) and $a_ix_4$ (i = 0 to 3) carry negative weight; $a_4x_4$ is positive. c. Baugh-Wooley: each $-a_4x_j$ is replaced with $a_4x_j'$ and each $-a_ix_4$ with $a_i'x_4$, and the correction bits $a_4$ and $x_4$ (column 4), $a_4'$ and $x_4'$ (column 8), and 1 (column 9) are added. d. Modified B-W: each $-a_4x_j$ is replaced with $(a_4x_j)'$ and each $-a_ix_4$ with $(a_ix_4)'$, and 1s are added in columns 5 and 9.
  • 62. The Baugh-Wooley Method and Its Modified Form. Fig. 11.8, panels c. Baugh-Wooley and d. Modified B-W. Baugh-Wooley: $-a_4 x_0 = a_4 (1 - x_0) - a_4 = a_4 x_0' - a_4$; each $-a_4$ is then replaced by $a_4$ in the same column and $-a_4$ in the next column. Modified B-W: $-a_4 x_0 = (1 - a_4 x_0) - 1 = (a_4 x_0)' - 1$; each $-1$ is then replaced by $1$ in the same column and $-1$ in the next column.
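The two substitutions on this slide can be checked numerically. The following is a minimal Python sketch, not taken from the slides: the function and helper names are assumptions made for illustration, and the positions of the constant bits are derived here by applying the identities above to every negatively weighted term of a k-bit by k-bit 2's-complement product.

```python
def _bits(v, k):
    """Low k bits of v, least significant first (works for negative ints)."""
    return [(v >> i) & 1 for i in range(k)]


def _signed(v, width):
    """Interpret the low `width` bits of v as a 2's-complement number."""
    v &= (1 << width) - 1
    return v - (1 << width) if v >> (width - 1) else v


def baugh_wooley(a, x, k=5):
    """Baugh-Wooley form: complement one factor of each negative term."""
    A, X, s = _bits(a, k), _bits(x, k), k - 1
    total = sum((A[i] & X[j]) << (i + j) for i in range(s) for j in range(s))
    total += (A[s] & X[s]) << (2 * s)
    for j in range(s):
        total += (A[s] & (1 - X[j])) << (s + j)   # a_s * x_j'
        total += ((1 - A[j]) & X[s]) << (s + j)   # a_j' * x_s
    total += (A[s] + X[s]) << s                   # a_s and x_s in column k-1
    total += (2 - A[s] - X[s]) << (2 * s)         # a_s' and x_s' in column 2k-2
    total += 1 << (2 * k - 1)                     # constant 1 in column 2k-1
    return _signed(total, 2 * k)


def modified_baugh_wooley(a, x, k=5):
    """Modified form: complement the whole AND term of each negative term."""
    A, X, s = _bits(a, k), _bits(x, k), k - 1
    total = 0
    for i in range(k):
        for j in range(k):
            bit = A[i] & X[j]
            if (i == s) != (j == s):              # exactly one sign bit involved
                bit ^= 1
            total += bit << (i + j)
    total += (1 << k) + (1 << (2 * k - 1))        # constant 1s in columns k and 2k-1
    return _signed(total, 2 * k)
```

Both functions only ever add non-negative bits, which is the point of the methods: the bit matrix can be reduced by an ordinary CSA tree or array, and the sign is recovered by reading the 2k-bit result as a 2's-complement number.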
  • 63. Alternate Views of the Baugh-Wooley Methods. The two rows of negatively weighted terms, (–a4x3, –a4x2, –a4x1, –a4x0) and (–a3x4, –a2x4, –a1x4, –a0x4), are shown in successive equivalent forms: as added negative rows, as subtracted positive rows, as added rows with leading 1s and inserted 1s, and as added rows led by a4 and x4 terms with inserted correction bits. Fig. 11.8 (panels a. Unsigned, b. 2's-complement, c. Baugh-Wooley, d. Modified B-W) is repeated for reference.
  • 64. 11.4 Partial-Tree and Truncated Multipliers. High-radix versus partial-tree multipliers: the difference is quantitative, not qualitative. For small h, say ≤ 8 bits, we view the multiplier of Fig. 11.9 as high-radix. When h is a significant fraction of k, say k/2 or k/4, then we tend to view it as a partial-tree multiplier. Better design through pipelining to be covered in Section 11.6. Fig. 11.9 General structure of a partial-tree multiplier (figure labels: h inputs; upper part of the cumulative partial product (stored-carry); CSA tree; sum; carry; FF; h-bit adder; lower part of the cumulative partial product).
  • 65. Why Truncated Multipliers? Nearly half of the hardware in array/tree multipliers is there to get the last bit right (1 dot = one FPGA cell). For k-by-k fractional multiplication with the dots to the right of the ulp position removed: Max error = 8/2 + 7/4 + 6/8 + 5/16 + 4/32 + 3/64 + 2/128 + 1/256 = 7.004 ulp; Mean error = 1.751 ulp. Fig. 11.10 The idea of a truncated multiplier with 8-bit fractional operands.
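The error figures on this slide can be reproduced with a few lines of Python. This is a sketch under two stated assumptions that are not spelled out on the slide: every dot to the right of the ulp position is dropped, and operand bits are independent and uniformly random, so each dot (an AND of two bits) equals 1 with probability 1/4. The function name `truncation_error_bounds` is hypothetical.

```python
from fractions import Fraction


def truncation_error_bounds(k=8):
    """Return (max_error, mean_error) in ulp for a truncated k-by-k multiplier."""
    # Column j to the right of the ulp position has weight 2**-j ulp
    # and contains k + 1 - j dots.
    max_error = sum(Fraction(k + 1 - j, 2 ** j) for j in range(1, k + 1))
    # Assumption: each dot is 1 with probability 1/4 (uniform random operands).
    mean_error = max_error / 4
    return max_error, mean_error


max_err, mean_err = truncation_error_bounds(8)
print(float(max_err), float(mean_err))  # 7.00390625 1.7509765625
```

Rounded to three decimals these are the 7.004 ulp and 1.751 ulp quoted above.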
  • 66. Truncated Multipliers with Error Compensation. We can introduce additional “dots” on the left-hand side to compensate for the removal of dots from the right-hand side. Constant compensation (an added constant 1): Max error = +4 ulp, Max error ≅ −3 ulp, Mean error = ? ulp. Variable compensation (added dots x–1 and y–1): Max error = +? ulp, Max error ≅ −? ulp, Mean error = ? ulp. Figure: Constant and variable error compensation for truncated multipliers.
  • 67. 11.5 Array Multipliers. Fig. 11.11 A basic array multiplier uses a one-sided CSA tree and a ripple-carry adder. Fig. 11.12 Details of a 5 × 5 array multiplier using FA blocks (partial-product bits ai xj feed rows of full adders, followed by a ripple-carry adder; outputs p9 ... p0).
  • 68. Signed (2’s-complement) Array Multiplier. Fig. 11.13 Modifications in a 5 × 5 array multiplier to deal with 2’s-complement inputs using the Baugh-Wooley method or to shorten the critical path.
  • 69. Array Multiplier Built of Modified Full-Adder Cells. Fig. 11.14 Design of a 5 × 5 array multiplier with two additive inputs and full-adder blocks that include AND gates.
  • 70. Array Multiplier without a Final Carry-Propagate Adder. Fig. 11.15 Conceptual view of a modified array multiplier that does not need a final carry-propagate adder. Fig. 11.16 Carry-save addition, performed in level i, extends the conditionally computed bits of the final product. All remaining bits of the final product are produced only 2 gate levels after p_{k–1}.
  • 71. 11.6 Pipelined Tree and Array Multipliers. Fig. 11.9 General structure of a partial-tree multiplier (repeated for comparison). Fig. 11.17 Efficiently pipelined partial-tree multiplier (figure labels: h inputs; pipelined (h + 2)-input CSA tree with latches; CSA; sum and carry latches; FF; h-bit adder; lower part of the cumulative partial product).
  • 72. Pipelined Array Multipliers. With latches after every FA level, the maximum throughput is achieved. Latches may be inserted after every h FA levels for an intermediate design. Example: 3-stage pipeline. Fig. 11.18 Pipelined 5 × 5 array multiplier using latched FA blocks. The small shaded boxes are latches.
  • 73. 12 Variations in Multipliers. Chapter Goals: Learn additional methods for synthesizing fast multipliers as well as other types of multipliers (bit-serial, modular, etc.). Chapter Highlights: Building a multiplier from smaller units; Performing multiply-add as one operation; Bit-serial and (semi)systolic multipliers; Using a multiplier for squaring is wasteful.
  • 74. Variations in Multipliers: Topics. Topics in This Chapter: 12.1 Divide-and-Conquer Designs; 12.2 Additive Multiply Modules; 12.3 Bit-Serial Multipliers; 12.4 Modular Multipliers; 12.5 The Special Case of Squaring; 12.6 Combined Multiply-Add Units.
  • 75. 12.1 Divide-and-Conquer Designs. Building a wide multiplier from narrower ones: each 2b-bit operand is split into b-bit halves (aH, aL and xH, xL), giving the four partial products aL xL, aL xH, aH xL, and aH xH, which are then rearranged for addition (figure labels: rearranged partial products in 2b-by-2b multiplication; 2b bits; b bits; 3b bits). Fig. 12.1 Divide-and-conquer (recursive) strategy for synthesizing a 2b × 2b multiplier from b × b multipliers.
  • 76. General Structure of a Recursive Multiplier. 2b × 2b: use (3; 2)-counters. 3b × 3b: use (5; 2)-counters. 4b × 4b: use (7; 2)-counters. Fig. 12.2 Using b × b multipliers to synthesize 2b × 2b, 3b × 3b, and 4b × 4b multipliers.
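The counter sizes quoted on this slide match the largest number of b × b products that overlap in any one b-bit column block. The Python sketch below counts that overlap for a gb × gb multiplier; it assumes, as an interpretation rather than something the slide states, that the counter must be as tall as the tallest column. The name `max_overlap` is hypothetical.

```python
def max_overlap(g):
    """Largest number of 2b-bit products covering one b-bit column block
    when a gb x gb multiplier is assembled from b x b multipliers."""
    best = 0
    for block in range(2 * g):
        # The 2b-bit product a_i * x_j occupies column blocks i + j and i + j + 1.
        count = sum(1 for i in range(g) for j in range(g)
                    if i + j in (block - 1, block))
        best = max(best, count)
    return best


for g in (2, 3, 4):
    print(g, max_overlap(g))
```

For g = 2, 3, 4 the maximum overlap is 3, 5, 7, which agrees with the (3; 2)-, (5; 2)-, and (7; 2)-counters named above.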
  • 77. Using b × c, rather than b × b Building Blocks. 2b × 2c: use b × c multipliers and (3; 2)-counters. 2b × 4c: use b × c multipliers and (5?; 2)-counters. gb × hc: use b × c multipliers and (?; 2)-counters. (The diagram of Fig. 12.2 with b × b, 2b × 2b, 3b × 3b, and 4b × 4b blocks is shown for comparison.)
  • 78. Wide Multiplier Built of Narrow Multipliers and Adders. Fig. 12.3 Using 4 × 4 multipliers and 4-bit adders to synthesize an 8 × 8 multiplier (four multipliers take the operand halves a[4, 7], a[0, 3], x[4, 7], x[0, 3]; their 8-bit products are combined by 4-bit adders into p[12, 15], p[8, 11], p[4, 7], and p[0, 3]).
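The arithmetic behind Fig. 12.3 can be sketched in Python as follows. The function `mul4x4` is a hypothetical stand-in for one 4 × 4 hardware multiplier, and the final additions are written as ordinary integer additions rather than as the specific 4-bit adder network of the figure.

```python
def mul4x4(a, x):
    """Stand-in for a 4 x 4 multiplier block: 4-bit inputs, 8-bit output."""
    assert 0 <= a < 16 and 0 <= x < 16
    return a * x


def mul8x8(a, x):
    """8 x 8 unsigned multiplication from four 4 x 4 products."""
    aH, aL = a >> 4, a & 0xF
    xH, xL = x >> 4, x & 0xF
    return ((mul4x4(aH, xH) << 8)
            + ((mul4x4(aH, xL) + mul4x4(aL, xH)) << 4)
            + mul4x4(aL, xL))
```

Because aL xL and aH xH do not overlap, they can be concatenated into one 16-bit word, leaving only the two middle products to be added in.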
  • 79. Karatsuba Multiplication. 2b × 2b multiplication requires four b × b multiplications: $(2^b a_H + a_L) \times (2^b x_H + x_L) = 2^{2b} a_H x_H + 2^b (a_H x_L + a_L x_H) + a_L x_L$. Karatsuba noted that one of the four multiplications can be removed at the expense of introducing a few additions: $(2^b a_H + a_L) \times (2^b x_H + x_L) = 2^{2b} a_H x_H + 2^b [(a_H + a_L) \times (x_H + x_L) - a_H x_H - a_L x_L] + a_L x_L$, which needs only three multiplications (labeled Mult 1, Mult 2, and Mult 3 in the figure) on b-bit halves. Benefit is quite significant for extremely wide operands: $(4/3)^5 = 4.2$, $(4/3)^{10} = 17.8$, $(4/3)^{20} = 315.3$, $(4/3)^{50} = 1{,}765{,}781$.
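A recursive Python sketch of the three-multiplication identity is given below. It is illustrative only: the names `karatsuba` and `base_bits` are assumptions, operand widths are taken to be `base_bits` times a power of 2, and the one-bit growth of the sums aH + aL and xH + xL (which real hardware must accommodate) is simply absorbed by Python integers.

```python
def karatsuba(a, x, bits, base_bits=4):
    """Return (a * x, number of base multiplications used)."""
    if bits <= base_bits:
        return a * x, 1                      # stands in for one narrow multiplier
    b = bits // 2
    mask = (1 << b) - 1
    aH, aL = a >> b, a & mask
    xH, xL = x >> b, x & mask
    hh, n1 = karatsuba(aH, xH, b, base_bits)             # aH * xH
    ll, n2 = karatsuba(aL, xL, b, base_bits)             # aL * xL
    # The sums are b + 1 bits wide; this sketch ignores that one-bit growth.
    mm, n3 = karatsuba(aH + aL, xH + xL, b, base_bits)   # (aH + aL) * (xH + xL)
    product = (hh << (2 * b)) + ((mm - hh - ll) << b) + ll
    return product, n1 + n2 + n3


p, count = karatsuba(0xDEADBEEFCAFEBABE, 0x0123456789ABCDEF, 64)
print(count, 4 ** 4)
```

With 64-bit operands and 4-bit base multipliers there are four levels of recursion, so the sketch uses 3^4 = 81 base multiplications where the four-product scheme would use 4^4 = 256, a ratio of (4/3)^4.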
  • 80. 12.2 Additive Multiply Modules. An additive multiply module (AMM) computes p = ax + y + z. Fig. 12.4 Additive multiply module with 2 × 4 multiplier (ax) plus 4-bit and 2-bit additive inputs (y and z): (a) block diagram, (b) dot notation. A b × c AMM has b-bit and c-bit multiplicative inputs, b-bit and c-bit additive inputs, and a (b + c)-bit output, since $(2^b - 1) \times (2^c - 1) + (2^b - 1) + (2^c - 1) = 2^{b+c} - 1$.
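The output-width claim can be illustrated with a small behavioral model in Python. The function name `amm` and its argument order are assumptions made for this sketch.

```python
def amm(a, x, y, z, b, c):
    """Additive multiply module: p = a*x + y + z, which fits in b + c bits."""
    assert 0 <= a < 2 ** b and 0 <= y < 2 ** b   # b-bit inputs
    assert 0 <= x < 2 ** c and 0 <= z < 2 ** c   # c-bit inputs
    return a * x + y + z


# Largest possible output of a 4 x 2 AMM: all inputs at their maximum.
print(amm(15, 3, 15, 3, b=4, c=2))  # 63
```

The all-ones case gives 2^(b+c) − 1 exactly, so the two additive inputs come for free in the sense that they never cause the result to outgrow the width of the plain product.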
  • 81. Multiplier Built of AMMs. Understanding an 8 × 8 multiplier built of 4 × 2 AMMs using dot notation. Fig. 12.5 An 8 × 8 multiplier built of 4 × 2 AMMs. Inputs marked with an asterisk carry 0s. (Each AMM multiplies a[0, 3] or a[4, 7] by one of x[0, 1], x[2, 3], x[4, 5], x[6, 7]; the outputs are p[0, 1], p[2, 3], p[4, 5], p[6, 7], p[8, 9], p[10, 11], and p[12, 15]. Legend: 2 bits, 4 bits.)
  • 82. Multiplier Built of AMMs: Alternate Design. This design is more regular than that in Fig. 12.5 and is easily expandable to larger configurations; its latency, however, is greater. Fig. 12.6 Alternate 8 × 8 multiplier design based on 4 × 2 AMMs. Inputs marked with an asterisk carry 0s. (Inputs: a[0, 3], a[4, 7], x[0, 1], x[2, 3], x[4, 5], x[6, 7]; outputs: p[0, 1], p[2, 3], p[4, 5], p[6, 7], p[8, 9], p[10, 11], p[12, 15]. Legend: 2 bits, 4 bits.)
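To make the idea of composing AMMs concrete, here is a Python sketch of one possible row-by-row arrangement that builds an 8 × 8 multiplier from eight 4 × 2 AMMs. It is an assumption-laden illustration, not a transcription of the wiring in Fig. 12.5 or Fig. 12.6; the names `amm4x2` and `mul8x8_amm` are hypothetical.

```python
def amm4x2(a, x, y, z):
    """4 x 2 AMM: 4-bit a and y, 2-bit x and z, 6-bit result a*x + y + z."""
    assert 0 <= a < 16 and 0 <= y < 16 and 0 <= x < 4 and 0 <= z < 4
    return a * x + y + z


def mul8x8_amm(a, x):
    """8 x 8 unsigned multiplication using two 4 x 2 AMMs per 2-bit slice of x."""
    aL, aH = a & 0xF, a >> 4
    upper = 0        # 8-bit upper part of the running partial product
    product = 0
    for step in range(4):
        xs = (x >> (2 * step)) & 0b11
        low = amm4x2(aL, xs, upper & 0xF, 0)          # 2-bit additive input carries 0
        high = amm4x2(aH, xs, upper >> 4, low >> 4)   # top 2 bits of `low` move up
        total = (high << 4) | (low & 0xF)             # equals a * xs + upper (10 bits)
        product |= (total & 0b11) << (2 * step)       # two product bits become final
        upper = total >> 2
    return product | (upper << 8)
```

In this arrangement the 2-bit additive input of the low AMM in each row is tied to 0, which mirrors the remark that some inputs carry 0s, and each row must wait for the row before it, which is one way the latency of a regular design grows.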
  • 83. 12.3 Bit-Serial Multipliers. Bit-serial adder (LSB first): inputs ... x2 x1 x0 and ... y2 y1 y0 enter an FA whose carry is held in an FF, producing ... s2 s1 s0. Bit-serial multiplier: inputs ... a2 a1 a0 and ... x2 x1 x0, output ... p2 p1 p0 (must follow the k-bit inputs with k 0s; alternatively, view the product as being only k bits wide). What goes inside the box to make a bit-serial multiplier? Can the circuit be designed to support a high clock rate?
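The bit-serial adder described on this slide is simple enough to model cycle by cycle. The Python sketch below is a behavioral model only; the helper names `to_bits` and `from_bits` are assumptions introduced so the example is self-contained.

```python
def to_bits(value, width):
    """LSB-first list of `width` bits."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """Inverse of to_bits."""
    return sum(bit << i for i, bit in enumerate(bits))


def bit_serial_add(x_bits, y_bits):
    """One FA plus a carry flip-flop, fed one bit pair per clock, LSB first."""
    carry = 0                      # contents of the carry FF
    sum_bits = []
    for xb, yb in zip(x_bits, y_bits):
        total = xb + yb + carry    # full adder
        sum_bits.append(total & 1)
        carry = total >> 1         # latched for the next clock cycle
    return sum_bits


print(from_bits(bit_serial_add(to_bits(13, 6), to_bits(11, 6))))  # 24
```

The inputs are padded to 6 bits here so that the final carry has a cycle in which to emerge, the same reason the multiplier's k-bit inputs are followed by k 0s.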
  • 84. Semisystolic Serial-Parallel Multiplier. Fig. 12.7 Semi-systolic circuit for 4 × 4 multiplication in 8 clock cycles (multiplicand a3 a2 a1 a0, parallel in; multiplier x0 x1 x2 x3, serial in, LSB-first; a row of four FAs with sum and carry signals; product, serial out). This is called “semisystolic” because it has a large signal fan-out of k (k-way broadcasting) and a long wire spanning all k positions.
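A cycle-level Python model of a carry-save serial-parallel multiplier is sketched below. It reflects one plausible reading of Fig. 12.7 (each cell's carry is latched and fed back to itself, each cell's sum is latched and passed to its right-hand neighbor, and the current multiplier bit is broadcast to all cells); the latch placement and the name `serial_parallel_multiply` are assumptions, not details confirmed by the slide text.

```python
def serial_parallel_multiply(a, x, k=4):
    """k x k unsigned multiplication in 2k cycles, product bits LSB first."""
    a_bits = [(a >> i) & 1 for i in range(k)]
    s = [0] * (k + 1)    # s[i]: latched sum of cell i; s[k] is a constant 0 input
    c = [0] * k          # c[i]: latched carry of cell i
    product = 0
    for cycle in range(2 * k):
        xb = (x >> cycle) & 1 if cycle < k else 0   # k multiplier bits, then k 0s
        new_s = [0] * (k + 1)
        new_c = [0] * k
        for i in range(k):
            # FA inputs: partial-product bit, left neighbor's sum, own carry.
            total = (a_bits[i] & xb) + s[i + 1] + c[i]
            new_s[i] = total & 1
            new_c[i] = total >> 1
        s, c = new_s, new_c
        product |= s[0] << cycle                    # rightmost sum is the output bit
    return product
```

The broadcast of `xb` to every cell in the inner loop is the k-way fan-out that makes the design only semisystolic.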
  • 85. Systolic Retiming as a Design Tool. A semisystolic circuit can be converted to a systolic circuit via retiming, which involves advancing and retarding signals by means of delay removal and delay insertion in such a way that the relative timings of various parts are unaffected. Fig. 12.8 Example of retiming by delaying the inputs to C_L and advancing the outputs from C_L by d units (a cut separates C_L from C_R; the original delays e, f, g, h on the edges crossing the cut become the adjusted delays e + d, f + d, g – d, h – d).
  • 86. Alternate Explanation of Systolic Retiming. Transferring delay from the outputs of a subsystem to its inputs does not change the behavior of the overall system. (Figure: two equivalent arrangements of delays d1 and d2 around a subsystem, with signal times t, t + a, t + d1, t + a + d1, and t + a + d1 + d2.)
  • 87. A First Attempt at Retiming. Fig. 12.7 (repeated for reference): semi-systolic circuit with multiplicand parallel in, multiplier serial in (LSB-first), and product serial out. Fig. 12.9 A retimed version of our semi-systolic multiplier (the figure marks Cut 1, Cut 2, and Cut 3).
  • 88. Deriving a Fully Systolic Multiplier. Fig. 12.7 (repeated for reference): semi-systolic circuit with multiplicand parallel in, multiplier serial in (LSB-first), and product serial out. Fig. 12.10 Systolic circuit for 4 × 4 multiplication in 15 cycles.
  • 89. A Direct Design for a Bit-Serial Multiplier. Fig. 12.11 Building block for a latency-free bit-serial multiplier (a (5; 3)-counter and a mux, with signals t_in, c_in, s_in and t_out, c_out, s_out). Fig. 12.12 The cellular structure of the bit-serial multiplier based on the cell in Fig. 12.11. Fig. 12.13 Bit-serial multiplier design in dot notation: (a) structure of the bit-matrix, showing the terms a_i x_i, a_i x^(i–1), and x_i a^(i–1) next to the part already accumulated into three numbers and the part already output; (b) reduction after each input bit, where 2p^(i) is shifted right to obtain p^(i).
  • 90. 12.4 Modular Multipliers. Fig. 12.14 Modulo-$(2^b - 1)$ carry-save adder (a row of FAs). Fig. 12.15 Design of a 4 × 4 modulo-15 multiplier (figure labels: four 4-bit inputs; Mod-15 CSA; Divide by 16; Mod-15 CSA; Mod-15 CPA; 4-bit output).
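Modulo-(2^b − 1) arithmetic is convenient because 2^b ≡ 1, so a bit that overflows position b − 1 wraps around to position 0. The Python sketch below uses that fact for b = 4. It accumulates the partial products one at a time with an end-around carry, which is a simplification chosen for this example and not the CSA structure of Fig. 12.15; the names `rotl4` and `mod15_multiply` are hypothetical.

```python
def rotl4(v, j):
    """Rotate a 4-bit value left by j positions: v * 2**j modulo 15."""
    return ((v << j) | (v >> (4 - j))) & 0xF


def mod15_multiply(a, x):
    """4 x 4 multiplication modulo 15 using rotations and end-around carry."""
    acc = 0
    for j in range(4):
        if (x >> j) & 1:
            acc += rotl4(a, j)                # partial product a * 2**j mod 15
            acc = (acc & 0xF) + (acc >> 4)    # end-around carry, since 16 = 1 mod 15
    return 0 if acc == 15 else acc            # 15 is the second representation of 0
```

Each partial product stays 4 bits wide, so the whole bit matrix has only four columns regardless of how many rows are added.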
  • 91. Other Examples of Modular Multiplication. Fig. 12.16 One way to design a 4 × 4 modulo-13 multiplier. Fig. 12.17 A method for modular multioperand addition (figure labels: n inputs; CSA tree; table with address and data; 3-input modulo-m adder; sum mod m).
  • 92. 12.5 The Special Case of Squaring. Multiply x by x: the 5 × 5 bit matrix contains the terms x_i x_j for all i and j, with product bits p9 ... p0. Simplify: each diagonal term x_i x_i reduces to x_i, and each symmetric pair x_i x_j + x_j x_i equals 2 x_i x_j, that is, a single x_i x_j one column to the left. In the simplified matrix p1 = 0 and p0 = x0. Fig. 12.18 Design of a 5-bit squarer.
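The effect of the simplification can be checked with a short Python sketch that sums only the reduced set of terms. The name `square5` is an assumption, and the further column-2 rewriting visible in the figure (involving x1 x0 and its complement) is not modeled here.

```python
def square5(x):
    """Square a 5-bit unsigned number from the simplified bit matrix."""
    b = [(x >> i) & 1 for i in range(5)]
    total = 0
    for i in range(5):
        total += b[i] << (2 * i)                    # x_i * x_i = x_i, column 2i
        for j in range(i + 1, 5):
            total += (b[i] & b[j]) << (i + j + 1)   # x_i x_j + x_j x_i = 2 x_i x_j
    return total
```

Only 10 AND terms and 5 single bits remain, compared with the 25 AND terms of a general 5 × 5 multiplier, which is why using a multiplier for squaring is wasteful.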
  • 93. Divide-and-Conquer Squarers. Building wide squarers from narrower ones: with x split into b-bit halves xH and xL, the partial products are xL xL, xL xH, xH xL, and xH xH, of which the two middle ones are equal (figure labels: rearranged partial products in 2b-by-2b multiplication; 2b bits; b bits; 3b bits). Divide-and-conquer (recursive) strategy for synthesizing a 2b × 2b squarer from b × b squarers and multiplier.
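Written out, the strategy amounts to x² = 2^(2b) xH² + 2^(b+1) xH xL + xL², which needs two b-bit squarers and one b × b multiplier. A minimal Python sketch follows; the name `square_wide` and the use of callables to stand in for the narrow hardware blocks are assumptions made for illustration.

```python
def square_wide(x, b, square_b=lambda v: v * v, mul_b=lambda u, v: u * v):
    """Square a 2b-bit number using b-bit squarers and one b x b multiplier."""
    mask = (1 << b) - 1
    xH, xL = x >> b, x & mask
    # The two equal middle products xH*xL and xL*xH merge into one product
    # shifted one extra position to the left.
    return (square_b(xH) << (2 * b)) + (mul_b(xH, xL) << (b + 1)) + square_b(xL)
```

Applied recursively, the `square_b` blocks can themselves be built the same way from still narrower squarers.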
  • 94. 12.6 Combined Multiply-Add Units. Multiply-add versus multiply-accumulate: multiply-accumulate units often have wider additive inputs. Fig. 12.19 Dot-notation representations of various methods for performing a multiply-add operation in hardware: (a) additive input with the CSA tree output; (b) carry-save additive input with the CSA tree output; (c) additive input with the dot matrix for the 4 × 4 multiplication; (d) carry-save additive input with the dot matrix for the 4 × 4 multiplication.