A High performance unified BCD adder/Subtractor

2009 IEEE Computer Society Annual Symposium on VLSI

A High Performance Unified BCD and Binary
Adder/Subtractor
Anshul Singh, Aman Gupta, Sreehari Veeramachaneni, M.B. Srinivas*
Centre for VLSI and Embedded System Technologies (CVEST),
International Institute of Information Technology (IIIT), Gachibowli, Hyderabad, 500032, India.
* Department of Electronics and Communication Engg, Birla Institute of Technology and Science (BITS),
Hyderabad Campus, Hyderabad, India
Email: {anshul_ singh, aman }@students.iiit.ac.in, srihari@research.iiit.ac.in ,srinivas@iiit.ac.in.
Abstract- Decimal data processing applications have grown
exponentially in recent years thereby increasing the need to have
hardware support for decimal arithmetic. In this paper, an
improved architecture for efficient Binary Coded Decimal (BCD)
addition/subtraction is presented that performs binary
addition/subtraction without any extra hardware. The
architecture works for both signed and unsigned numbers. The
design is runtime reconfigurable and maximum utilization of the
hardware is a feature of the architecture. Simulation results show
that the proposed architecture is at least 32% better in terms of
power-delay product than the existing designs.

I. INTRODUCTION

Fast decimal data processing needs hardware that supports
decimal arithmetic. Recently, specifications for decimal
floating point arithmetic have been added to the draft revision
of the IEEE-P754 standard for floating point arithmetic [2].
Extensive work has been done on BCD arithmetic, especially
on adders/subtractors. Some of the initial contributions came
from Schmookler et al. [1] and Adiletta et al. [3]. Later,
designs of combined BCD and binary adders were presented
by Levine et al. and Anderson. The first BCD sign-magnitude
adder/subtractor was designed by Grupe [11]. An area-efficient
sign-magnitude adder was later developed by Hwang [8], as
shown in Fig. 1. The area occupied by this design was the
least among all the previous designs.

Flora [5] followed the principle of carry select adders and
came up with a design that concurrently calculated two
results, one assuming the presence of an input carry and the
other its absence. Fischer et al. [4] (Fig. 2) later came up
with a compact design that employed only one adder, but
latency was a problem as it had to use an additional correction
block.

Figure 2. Fischer's proposal [4] (diagram labels: a 6 is added when
both N2 and N1 are positive; a 6 is subtracted when necessary)

During the last decade various BCD adder/subtractor
circuits have been developed for the IBM microprocessors
based on the design presented by Haller et al. [6]. This
architecture is shown in Fig. 3. Recently, Haller et al.
optimized the carry chain in the same architecture which
slightly reduced the delay but with an increased area of the
unit.
Figure 1. Hwang's proposal [8]

Figure 3. Haller's proposal [6,7]

978-0-7695-3684-2/09 $25.00 © 2009 IEEE
DOI 10.1109/ISVLSI.2009.40


The Carry Merge (CM) block of the existing P-G block (Fig. 6) is
used for group propagate and generate. It is basically the black
cell used in the proposed design. It can be easily observed that
the delay for generating the carry out from CM1 is the same as that
for the second ripple carry adder, thereby making the CM1 block
redundant. Further, replacing the CM3 block with the grey cell (GC)
will reduce the hardware without affecting the functionality of the
circuit.

The design of the Universal Adder (Fig. 4) proposed by
D. R. Humberto et al. [12] performs effective addition/subtraction
operations on unsigned, sign-magnitude, and various
complement representations. This design overcomes the
limitations of previously reported approaches that produce
some of the results in complement representation when
operating on sign-magnitude numbers. However, the design has
high latency.

Figure 4. Humberto's proposal [12]

Sreehari et al. [9] recently came up with prefix-logic-based
BCD adders and proposed a novel unified BCD/binary
adder-subtractor [10], which is considered the fastest unified
adder in the literature so far. The architecture is divided into
three major parts: the pre-computation stage, the prefix
network and the post-computation stage. This architecture is
illustrated in Fig. 5. The pre-computation block consists of
logic to compute propagate and generate signals for both BCD
and binary addition/subtraction.

A wide variety of prefix networks is available, depending on
the requirements of the designer. The Sklansky network is
chosen by the authors to reduce delay.
The post-computation block proposed by Sreehari et al.
[10] (Fig. 7) uses a 4-bit CLA to add the two numbers and
calculate the sum/difference and the carry-out bits for each
stage. But these bits are already calculated in the
pre-computation block and the prefix network, thereby making
more than half of the post-computation block redundant.
Removing these redundancies from the design can considerably
increase the performance of each stage.

Figure 6. The existing P-G block [9]

Figure 5. Architecture of the existing unified BCD and binary
adder/subtractor [10]

The pre-computation stage of the architecture is not
clearly presented by the authors in [10], and it is
assumed that they have used the same P-G block presented in
[9]. The P-G block uses a Carry Merge (CM) block, as shown in Fig. 6.

Figure 7. Post-computation block [10]

This paper presents a modified version of this unified
adder, which is shown to perform better by at least 32% in
power-delay product. The rest of the paper is organized as
follows: Section II describes the algorithm for the
unified adder, while Section III describes the proposed
architecture. Simulation results for the proposed and existing
circuits are given and compared in Section IV.

Propagate (C + D = 1111 in the lower 4 bits):
    carry-in 0:  1011 1100 + 0010 0011     = 1101 1111
    carry-in 1:  1011 1100 + 0010 0011 + 1 = 1110 0000
Generate (C + D > 1111 in the lower 4 bits):
    carry-in 0:  1011 1100 + 0010 0111     = 1110 0011
    carry-in 1:  1011 1100 + 0010 0111 + 1 = 1110 0100

Figure 10. Example of binary addition/subtraction illustrating the concept of
4-bit propagate and generate for BCD subtraction and binary addition/subtraction


For binary addition/subtraction and BCD subtraction,
$P^* = 1$ if $C + D = 15$ (C and D are 4-bit numbers)
$G^* = 1$ if $C + D > 15$
For the case of subtraction, D is the 2's complement of the
original subtrahend.
For BCD subtraction, P* and G* remain the same as in
binary addition/subtraction because BCD subtraction is treated
as binary subtraction for the first two stages.
These control signals are then sent to the prefix network,
which calculates the group propagate and generate using the
formulas
$G_{i:j} = G_{i:k} + P_{i:k} \cdot G_{k-1:j}$
$P_{i:j} = P_{i:k} \cdot P_{k-1:j}$
where $i \ge k > j$.
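As a rough software illustration of the per-digit signals and the prefix operator (not the authors' circuit), the sketch below computes P/G for BCD addition (threshold 9, as this paper defines for BCD addition) or P*/G* for every other mode (threshold 15), and then folds them with the group formula. The names `digit_pg`, `combine`, and `digit_carries` are hypothetical, digits are listed least significant first, and the serial fold is only a stand-in for a parallel prefix network such as Sklansky's.

```python
def digit_pg(a, b, bcd_add):
    """Return (propagate, generate, 4-bit sum) for one pair of 4-bit digits."""
    limit = 9 if bcd_add else 15  # BCD addition uses 9; all other modes use 15
    total = a + b
    return total == limit, total > limit, total & 0xF


def combine(hi, lo):
    """Prefix operator: G = G_hi + P_hi.G_lo and P = P_hi.P_lo."""
    p_hi, g_hi = hi
    p_lo, g_lo = lo
    return p_hi and p_lo, g_hi or (p_hi and g_lo)


def digit_carries(a_digits, b_digits, bcd_add, carry_in=False):
    """Carry out of every 4-bit stage, least significant stage first."""
    carries = []
    group = (False, carry_in)  # assumption: carry-in is modeled as a generate below stage 0
    for a, b in zip(a_digits, b_digits):
        p, g, _ = digit_pg(a, b, bcd_add)
        group = combine((p, g), group)
        carries.append(group[1])  # group generate of stages k..0 is the carry out of stage k
    return carries
```

For instance, `digit_carries([6, 3], [8, 3], bcd_add=True)` models 36 + 38, where the lower digit generates a carry, while 36 + 33 only propagates an incoming carry.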

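Propagate (A + B = 9 in the lower digit):
    carry-in 0:  36 + 33     = 69
    carry-in 1:  36 + 33 + 1 = 70
Generate (A + B > 9 in the lower digit):
    carry-in 0:  36 + 38     = 74
    carry-in 1:  36 + 38 + 1 = 75

Figure 9. Examples of BCD addition illustrating the concept of 4-bit propagate
and generate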

II. Algorithm for Unified BCD/Binary
Adder/Subtractor

The main objective of the algorithm is to perform efficient
BCD addition/subtraction. In the proposed design, however,
binary addition/subtraction is handled automatically
without any extra hardware. As BCD digits are 4 bits in
length, all the operations, be it BCD addition/subtraction or
binary addition/subtraction, are done on 4-bit numbers. The
algorithm divides the proposed design into three major parts,
the P-G block, the prefix block and the correction block, as
shown in Fig. 8.
The P-G block generates signals named propagate (P) and
generate (G) for every 4 bits. These signals are used by the
prefix network to generate the carry out of each stage.
The P and G of a stage denote whether the stage propagates
or generates the carry/borrow, respectively. Along with
these signals, the sum/difference of the 4-bit numbers is
obtained and is used directly by the correction logic, unlike
in the previous design [10]. The P-G block itself
uses prefix logic to generate the P and G signals for 4-bit
numbers.

Figure 8. Architecture of the unified BCD/binary adder/subtractor

The concepts of propagate and generate for the different cases
are illustrated with equations and examples.
For BCD addition,
$P = 1$ if $A + B = 9$ (A and B are 4-bit numbers)
$G = 1$ if $A + B > 9$

The group $P_{k:0}$ and $G_{k:0}$ bits denote whether the first k
stages propagate or generate the carry/borrow. $G_{k:0}$ is the
carry out of the kth stage, i.e. $C_k = G_{k:0}$, where $C_k$ is
the carry out of the kth stage.
After all the carry/borrow bits are obtained, they are fed
to the correction stage, which, along with the sum/difference
bits from the P-G block, gives the final result. The first
operation in the correction block is to add the incoming
carry/borrow from the previous stage to the sum/difference
bits. This is implemented using a carry select adder to reduce
the delay. After this stage the correct binary outputs are
obtained, but for BCD addition/subtraction further corrections
are needed to obtain the correct BCD result.
For BCD addition, $(0110)_2$, i.e. $(6)_{10}$, is added to the binary
sum if it exceeds $(1001)_2$, i.e. $(9)_{10}$, to get the correct BCD sum.
BCD subtraction in the first block is treated as binary
subtraction, and the difference is obtained by the 2's
complement technique. The only requirement is that the
magnitude of the subtrahend should always be smaller than
that of the minuend. If a digit of the minuend is
greater than that of the subtrahend, the binary output for that
digit is the correct BCD output and no correction is needed.
But if a digit of the subtrahend is greater than that of the
minuend, then $(1010)_2$, i.e. $(10)_{10}$, has to be added to the binary
output for that digit to get the correct BCD difference. To
detect the relative magnitude of the minuend and the
subtrahend in a P-G block, the carry out of that stage is
checked: a carry out means that no correction is needed. The
following example illustrates the above algorithm.
Let A (minuend)    = 5 5 6
    B (subtrahend) = 2 3 9
In BCD format: A = 0101 0101 0110
               B = 0010 0011 1001
Treating these numbers as binary, the 2's complement of B,
say C, is taken:
C = 1101 1100 0111
Next, the complemented subtrahend C is added to the minuend A,
and the correction is done where needed.

The control signal SUB/ADD is 1 when the effective operation is
subtraction and 0 when the effective operation is addition.
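The paper's control equation, SUB/ADD = As XOR Bs XOR OpSelect, can be checked with a one-line sketch. The function name `effective_sub` is hypothetical, and all three inputs are assumed to be single bits (0 or 1) with a sign bit of 1 meaning negative.

```python
def effective_sub(a_sign, b_sign, op_select):
    """Return 1 for an effective subtraction and 0 for an effective addition."""
    return a_sign ^ b_sign ^ op_select


# (+A) - (-B) is carried out as an addition of magnitudes.
print(effective_sub(0, 1, 1))
# (+A) + (-B) is carried out as a subtraction of magnitudes.
print(effective_sub(0, 1, 0))
```

Two operands of like sign with OpSelect = 0 give an effective addition, while flipping exactly one of the three inputs toggles the effective operation.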
III. Proposed Architecture of the Unified BCD/Binary
Adder/Subtractor
The architecture, as discussed before, consists of three
major blocks: the P-G block, the prefix block and the
correction block (Fig. 8).
The architecture of the P-G block is shown in Fig. 12.
Each block takes in 8 bits, 4 bits of each number, and generates
the propagate and generate signals for BCD addition (P and G)
and for BCD subtraction and binary addition/subtraction (P* and
G*), as well as the sum or difference bits (S4 to S1 in the case
shown). The logic diagrams of the full adder, BC (black cell) and
GC (grey cell) are given in Fig. 13. For BCD/binary
subtraction, the 2's complement of the subtrahend is
calculated by inverting the bits of the subtrahend and adding 1
at the adder that generates the least significant bit in the first
(least significant) P-G block, as shown in Fig. 12. The rest of
the P-G blocks only complement the subtrahend bits and do not
add 1. To choose between the two kinds of propagate and
generate, a multiplexer is used at the end of each P-G block.
      A                       0101 0101 0110
    + C                       1101 1100 0111
      Correct binary output   0011 0001 1101
    + Correction                        1010
      Correct BCD output      0011 0001 0111

(The two upper digits produce a carry out, so no correction is needed;
the lowest digit produces no carry out, so correction is needed.)

Figure 11. Illustration of the proposed algorithm for BCD subtraction

Figure 12. The P-G block

Hence the final result = $(317)_{10}$
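The worked example can be reproduced with a small digit-wise sketch. It assumes, as the text requires, that the minuend is not smaller than the subtrahend; the function name `bcd_subtract` and the least-significant-first digit lists are choices made here for illustration, and the carry ripples serially rather than through a prefix network.

```python
def bcd_subtract(minuend, subtrahend):
    """Digit-wise BCD subtraction on lists of decimal digits, least significant first."""
    result = []
    carry = 1  # the +1 of the 2's complement enters the least significant digit
    for a, b in zip(minuend, subtrahend):
        total = a + (~b & 0xF) + carry  # add the inverted subtrahend digit
        carry = total >> 4              # carry out of this 4-bit stage
        digit = total & 0xF             # correct binary difference of this stage
        if not carry:                   # no carry out: subtrahend digit was larger
            digit = (digit + 0b1010) & 0xF  # add (1010)_2 and drop the overflow
        result.append(digit)
    return result


# 556 - 239, digits listed least significant first
print(bcd_subtract([6, 5, 5], [9, 3, 2]))
```

The call returns the digits 7, 1, 3 (least significant first), i.e. 317, and only the lowest digit takes the (1010)_2 correction, as in Fig. 11.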

Signed numbers are handled by the control logic at
the input, which takes the two sign bits and OpSelect
(Operation Select) as inputs and computes the control signal
SUB/ADD that specifies the effective operation to be
performed by the hardware. The effective operation is given by
$\text{SUB/ADD} = A_s \oplus B_s \oplus \text{OpSelect}$
where OpSelect is 1 when the operation is subtraction and 0
when the operation is addition, and $A_s$ and $B_s$ are the sign
bits of the numbers under operation.

Figure 13. (a) Full adder

For the binary addition/subtraction the output of the carry
select adder in the correction block gives the final result.
IV. Simulations and Results
All the architectures tabulated below have been analyzed
through HSPICE simulation runs using a 65 nm CMOS
technology. Simulations are performed for 32-bit
adders/subtractors. All the circuits are simulated at 1.2 V
at a frequency of 50 MHz. The simulation results are shown in
Table I.
TABLE I
Average delay, power, power-delay product and area of various
architectures

Architecture    Delay        Power       Power-delay product   Area
                (10^-10 s)   (10^-4 W)   (10^-14 J)            (no. of MOSFETs)
Humberto [12]   8.106        3.714       35.984                8510
Haller [7]      5.488        5.029       27.911                11056
Sreehari [10]   3.959        2.860       11.486                2902
Proposed        3.268        2.328       7.813                 2500

Figure 15. Gate-level diagram of the correction unit for BCD addition

Figure 16. Gate-level diagram of the correction unit for BCD subtraction

Table I shows that the proposed design improves delay by
17.45% and power by 18.60%, giving it a 32% improvement
in power-delay product over the most efficient architecture in
the literature [10].
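The quoted percentages follow directly from the Sreehari [10] and Proposed rows of Table I. The short sketch below recomputes them; the helper name `improvement` and the dictionary layout are illustrative choices, and the values are copied from the table as printed.

```python
def improvement(existing, proposed):
    """Relative reduction of a metric, in percent."""
    return 100.0 * (existing - proposed) / existing


# Rows of Table I: delay (1e-10 s), power (1e-4 W), PDP (1e-14 J), area (MOSFETs)
sreehari = {"delay": 3.959, "power": 2.860, "pdp": 11.486, "area": 2902}
proposed = {"delay": 3.268, "power": 2.328, "pdp": 7.813, "area": 2500}

gains = {key: round(improvement(sreehari[key], proposed[key]), 2) for key in sreehari}
print(gains)
```

With these inputs the delay and power gains come out at about 17.45% and 18.60%, and the power-delay product gain at about 31.98%, which rounds to the 32% quoted in the text.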


The propagate and generate signals produced by the
P-G block are then sent to the prefix network. The prefix
network can be selected from the wide range available in the
literature according to the requirements of area, power and
delay. For simulation purposes, the Sklansky network [13] is
chosen for the design. The prefix network generates the group
generate for each stage, which is the carry out of that stage.
The carry out of the nth stage is denoted by Cn.
After all the carry/borrow bits are obtained, they are fed
to the correction stage (Fig. 14), which, along with the
sum/difference bits from the P-G block, gives the final result.
The first operation in the correction block is
to add the incoming carry/borrow from the previous stage to
the sum/difference bits. This is implemented using a carry select
adder to reduce the delay. After this stage the correct binary
outputs are obtained, but for BCD addition/subtraction
corrections need to be made to obtain the correct BCD result.
For BCD addition the correction is done by adding $(0110)_2$,
and for BCD subtraction the correction is done by adding
$(1010)_2$ to the correct binary output whenever needed. The
logic diagrams of the two correction units are shown in Fig. 15
and Fig. 16.
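A behavioural sketch of the addition path of the correction stage is given below: each 4-bit stage adds its incoming carry to the binary digit sum and then adds (0110)_2 when the result exceeds 9. The function name `bcd_add` and the least-significant-first digit lists are assumptions made for illustration, and the carries ripple serially here, whereas the paper obtains them from the prefix network.

```python
def bcd_add(a_digits, b_digits):
    """Digit-wise BCD addition, least significant digit first; returns (digits, carry_out)."""
    result = []
    carry = 0
    for a, b in zip(a_digits, b_digits):
        total = a + b + carry          # correct binary sum of this stage
        carry = 1 if total > 9 else 0  # decimal carry out of this stage
        if carry:
            total += 0b0110            # add (0110)_2 to skip the six unused codes
        result.append(total & 0xF)     # keep the 4-bit BCD digit
    return result, carry


# 36 + 38, digits listed least significant first
print(bcd_add([6, 3], [8, 3]))
```

The call returns the digits 4 and 7 (least significant first) with no carry out, i.e. 74, matching the generate example in Fig. 9.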
Figure 13. (b) Grey cell (c) Black cell

Figure 14. Correction block

V. Conclusion
This paper presented a modified architecture for fast BCD
addition/subtraction that performs binary addition/subtraction
without any extra hardware. The design is runtime
reconfigurable and maximum utilization of the hardware is a
feature of the architecture. All the blocks have been designed
to work with least delay. The proposed architecture shows, on
an average, an improvement of 32% in power-delay product
over the most efficient architecture in the literature.

REFERENCES
[1] M. S. Schmookler and A. Weinberger, "Decimal Adder for Directly
Implementing BCD Addition Utilizing Logic Circuitry", International
Business Machines Corporation, US patent 3629565, pages 1-19, Dec
1971.
[2] IEEE standard for floating-point arithmetic. IEEE SC, Oct. 2006 at
http://754r.ucbtest.org/drafts
[3] M. J. Adiletta and V. C. Lamere, "BCD Adder Circuit", Digital
Equipment Corporation, US patent 4805131, pages 1-18, Jul 1989.
[4] H. Fischer and W. Rohsaint, "Circuit Arrangement for Adding or
Subtracting Operands Coded in BCD-Code or Binary-Code", Siemens
Aktiengesellschaft, US patent 5146423, pages 1-9, Sep 1992.
[5] Flora, Laurence P., "Fast BCD/Binary Adder", US Patent 5007010.
[6] W. Haller, U. Krauch, and H. Wetter, "Combined Binary/Decimal Adder
Unit", International Business Machines Corporation, US patent 5928319,
pages 1-9, Jul 1999.
[7] W. Haller, W. H. Li, M. R. Kelly, and H. Wetter, "Highly Parallel
Structure for Fast Cycle Binary and Decimal Adder Unit", International
Business Machines Corporation, US patent 2006/0031289, pages 1-8,
Feb 2006.
[8] S. Hwang, "High-Speed Binary and Decimal Arithmetic Logic Unit",
American Telephone and Telegraph Company, AT&T Bell Laboratories,
US patent 4866656, pages 1-11, Sep 1989.
[9] Sreehari Veeramachaneni, M. Keerthi Krishna, L. Avinesh, P Sreekanth
Reddy, M.B. Srinivas, "Novel High-Speed 16-Digit BCD Adders
Conforming to IEEE 754r Format", IEEE Computer Society Annual
Symposium on VLSI (ISVLSI'07), pages 343-350, Mar 2007.
[10] Sreehari Veeramachaneni, M, Kirthi Krishna; V, Prateek G, S. Subroto,
S, Bharat, M.B.Srinivas, "A Novel Carry-Look Ahead Approach to a
Unified BCD and Binary Adder/Subtractor", 21st International
Conference on VLSI Design 2008, pages 547-552, Jan 2008.
[11] U. Grupe, "Decimal Adder", Vereinigte Flugtechnische Werke-Fokker
gmbH, US patent 3935438, pages 1-11, Jan 1976.
[12] D.R.Humberto Calderón, G. N. Gaydadjiev, S. Vassiliadis,
"Reconfigurable Universal Adder", Proceedings of the IEEE
International Conference on Application-Specific Systems,
Architectures, and Processors (ASAP 07), pages 186-191, July 2007.
[13] J. Sklansky, "Conditional-sum addition logic," IRE Trans. Electronic
Computers, vol. EC-9, pages 226-231, June 1960.
