An improved architecture for efficient Binary Coded Decimal (BCD) addition/subtraction is presented that also performs binary
addition/subtraction without any extra hardware.
in Fig. 6), which is used for group propagate and generate. The
CM block is essentially the black cell used in the proposed
design. It can be easily observed that the delay for generating
the carry out from CM1 is the same as that of the second
ripple carry adder, which makes the CM1 block redundant.
Further, replacing the CM3 block with a grey cell (GC)
reduces the hardware without affecting the functionality of the
circuit.
The Universal Adder (Fig. 4) proposed by
D. R. Humberto et al. [12] performs effective addition/subtraction
operations on unsigned, sign-magnitude, and various
complement representations. This design overcomes the
limitation of previously reported approaches, which produce
some of the results in complement representation when
operating on sign-magnitude numbers. However, the design has high
latency.
Figure 4. Humberto’s Proposal [12]
Sreehari et al. [9] recently introduced prefix-logic-based
BCD adders and proposed a novel unified BCD/binary
adder-subtractor [10], which is considered the fastest unified
adder in the literature so far. The architecture is divided into
three major parts: the pre-computation stage, the prefix
network and the post-computation stage. It is
illustrated in Fig. 5. The pre-computation block consists of
logic to compute the propagate and generate signals for both BCD
and binary addition/subtraction.
A wide variety of prefix networks is available, and the
choice depends on the requirements of the designer. The
authors chose the Sklansky network to reduce delay.
The post-computation block proposed by Sreehari et al.
[10] (Fig. 7) uses a 4-bit CLA to add the two numbers and
calculate the sum/difference and the carry-out bits for each
stage. But these bits are already calculated in the pre-computation
block and the prefix network, which makes
more than half of the post-computation block redundant.
Removing these redundancies from the design can increase the
performance of the architecture considerably for each stage.
Figure 6. The Existing P-G Block [9]
Figure 5. Architecture of the existing Unified BCD and Binary Adder/Subtractor [10]
The pre-computation stage of the architecture is not
clearly presented by the authors in the paper [10] and it is
assumed that they have used the same P-G block presented in
[9]. The P-G block uses a Carry Merge block, CM (as shown
Figure 7. Post Computation Block [10]
This paper presents a modified version of this unified
adder, which is shown to perform better by about 32% in
power-delay product. The rest of the paper is organized as
follows: Section 2 describes the algorithm for the
unified adder, while Section 3 describes the proposed
architecture. Simulation results for the proposed and existing
circuits are given in Section 4, where comparisons are carried out.
Propagate (C + D = 1111): 1011 1100 + 0010 0011 = 1101 1111 with carry-in 0, and 1110 0000 with carry-in 1.
Generate (C + D > 1111): 1011 1100 + 0010 0111 = 1110 0011 with carry-in 0, and 1110 0100 with carry-in 1.
Figure 10. Example of binary addition/subtraction illustrating the concept of 4-bit
propagate and generate for BCD subtraction / binary addition/subtraction
For binary addition/subtraction and BCD subtraction,
P* = 1 if C + D = 15 (C and D are 4 bit numbers)
G* = 1 if C + D > 15
For the case of subtraction, D is the 2’s complement of the
original subtrahend.
For BCD subtraction, P* and G* remain the same as in
binary addition/subtraction because BCD subtraction is treated
as binary subtraction for the first two stages.
These control signals are then sent to the prefix network,
which calculates the group propagate and generate using the
formula
$G_{i:j} = G_{i:k} + P_{i:k} \cdot G_{k-1:j}$
$P_{i:j} = P_{i:k} \cdot P_{k-1:j}$
where $i \ge k > j$
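The group recurrence above can be sketched in Python as follows. This is an illustration only, not the authors' circuit: the names `block_pg_binary`, `combine` and `carries` are hypothetical, blocks are listed least significant first, and the scan is serial, whereas a hardware prefix network evaluates the same operator in parallel.

```python
def block_pg_binary(c, d):
    """Flags of one 4-bit block for binary add/sub and BCD subtraction.

    Returns (G*, P*): G* = (c + d > 15), P* = (c + d == 15).
    """
    total = c + d
    return (total > 15, total == 15)


def combine(hi, lo):
    """Prefix operator: merge the (G, P) pair of a more significant group
    `hi` with that of the adjacent less significant group `lo`."""
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return (g_hi or (p_hi and g_lo), p_hi and p_lo)


def carries(blocks, carry_in=False):
    """Carry out of every 4-bit block, least significant block first."""
    # Assumption: the carry-in is modelled as a group that can only generate.
    group = (carry_in, False)
    out = []
    for pg in blocks:
        group = combine(pg, group)
        out.append(group[0])
    return out


# Operands of Fig. 10: 1011 1100 + 0010 0011 (low block propagates).
blocks = [block_pg_binary(0b1100, 0b0011), block_pg_binary(0b1011, 0b0010)]
print(carries(blocks, carry_in=False), carries(blocks, carry_in=True))
```

With the propagate example, the carry out of the low block simply follows the carry-in, which is the behaviour the P* flag is meant to capture.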
Propagate (A + B = 9): 36 + 33 = 69 with carry-in 0, and 70 with carry-in 1.
Generate (A + B > 9): 36 + 38 = 74 with carry-in 0, and 75 with carry-in 1.
Figure 9. Examples of BCD addition illustrating the concept of 4-bit propagate
and generate
The main objective of the algorithm is to perform efficient
BCD addition/subtraction. In the proposed design, however,
binary addition/subtraction is also handled
without any extra hardware. As BCD digits are 4 bits in
length, all operations, whether BCD addition/subtraction or
binary addition/subtraction, are done on 4-bit numbers. The
algorithm divides the proposed design into three major parts:
the P-G Block, the Prefix Block and the Correction Block, as
shown in Fig. 8.
The P-G block generates signals named propagate (P) and
generate (G) for every 4 bits. These signals are used by the
prefix network to generate the carry out of each stage.
The P and G of a stage denote whether the stage propagates
or generates the carry/borrow, respectively. Along with
these signals, the block produces the sum/difference of the 4-bit
numbers, which is used directly by the correction
logic, unlike in the previous design [10]. The P-G block itself
uses prefix logic to generate the P and G signals for 4-bit
numbers.
II. Algorithm for Unified BCD/Binary
Adder/Subtractor
The group $P_{k:0}$ and $G_{k:0}$ bits denote whether the first k
stages propagate or generate the carry/borrow. $G_{k:0}$ is the
carry out of the kth stage, i.e. $C_k = G_{k:0}$, where $C_k$ is the carry
out of the kth stage.
After all the carry/borrow bits are obtained, they are fed
to the correction stage, which together with the sum/difference
bits from the P-G block produces the final result. The first
operation in the correction block is to add the incoming
carry/borrow from the previous stage to the sum/difference
bits. This is implemented using a carry select adder to reduce
the delay. After this stage the correct binary outputs are
obtained, but for BCD addition/subtraction further corrections
are needed to obtain the correct BCD result.
Figure 8. Architecture of Unified BCD/Binary Adder/Subtractor
The concept of propagate and generate for the different cases
is illustrated below with equations and examples.
For BCD addition,
P = 1 if A + B = 9 (A and B are 4 bit numbers)
G = 1 if A + B > 9
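The BCD addition conditions (P = 1 if A + B = 9, G = 1 if A + B > 9) can be checked with a short Python sketch. The function names are hypothetical and the code models only the digit-level behaviour, not the gate-level P-G block.

```python
def block_pg_bcd(a, b):
    """Flags of one BCD digit pair: (G, P) with G = (a + b > 9), P = (a + b == 9)."""
    total = a + b
    return (total > 9, total == 9)


def digit_carry(a, b, carry_in):
    """Decimal carry out of one digit, derived only from G, P and the carry-in."""
    g, p = block_pg_bcd(a, b)
    return g or (p and carry_in)


# Units digits of the Fig. 9 examples: 6 + 3 propagates, 6 + 8 generates.
print(digit_carry(6, 3, False), digit_carry(6, 3, True), digit_carry(6, 8, False))

# Exhaustive check against ordinary decimal arithmetic for valid BCD digits.
for a in range(10):
    for b in range(10):
        for c in (False, True):
            assert digit_carry(a, b, c) == (a + b + int(c) > 9)
```

The exhaustive loop confirms that, for valid BCD digits, the two flags are enough to decide the decimal carry without adding the carry-in first.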
For BCD addition, (0110)2, i.e. (6)10, is added to the binary
sum if it exceeds (1001)2, i.e. (9)10, to get the correct BCD sum.
BCD subtraction in the first block is treated as binary
subtraction and the difference is obtained by the 2’s
complement technique. The only requirement is that the
magnitude of the subtrahend should always be
smaller than that of the minuend. If a digit of the minuend is
greater than that of the subtrahend, the binary output for that digit
is the correct BCD output and no correction is needed.
But if a digit of the subtrahend is greater than that of the
minuend, then (1010)2, i.e. (10)10, has to be added to the binary
output for that digit to get the correct BCD difference. To
detect the relative magnitude of the minuend and the
subtrahend of a P-G block, the carry out of that stage is
checked. The following example illustrates the above
algorithm.
Let A (minuend) = 5 5 6
B (subtrahend) = 2 3 9
In BCD format: A = 0101 0101 0110
B = 0010 0011 1001
Treating these numbers as binary, the 2’s complement of B,
say C, is taken:
C = 1101 1100 0111
Next, C (the complemented subtrahend) is added to the minuend and the
correction is done where needed.
The control signal SUB/ADD is 1 when the effective operation is subtraction and 0 when the
effective operation is addition.
III. Proposed Architecture of the Unified BCD/Binary
Adder/Subtractor
The architecture, as discussed before, consists of three
major blocks: the P-G block, the Prefix block and the
Correction block (Fig. 8).
The architecture of the P-G block is shown in Fig. 12.
Each block takes in 8 bits, 4 bits of each number, and generates
the propagate and generate signals for BCD addition (P and G)
and for BCD subtraction/binary addition/subtraction (P* and
G*), as well as the sum or difference bits (S4 to S1 in the case
shown). The logic diagrams of the full adder, BC (black cell) and
GC (grey cell) are given in Fig. 13. For
BCD/binary subtraction, the 2’s complement of the subtrahend is
calculated by inverting the bits of the subtrahend and adding 1
at the adder that generates the least significant bit in the first
(least significant) P-G block, as shown in Fig. 12. The remaining
P-G blocks only complement the subtrahend bits and do not
add 1. To choose between the two kinds of propagate and
generate, a multiplexer is used at the end of each P-G block.
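The way the 2’s complement is spread over the P-G blocks can be illustrated with a small Python model. It is a behavioural sketch under stated assumptions: `sub_via_blocks` is a hypothetical name, and the carries ripple from block to block here, whereas the proposed design obtains them from the prefix network.

```python
def sub_via_blocks(a, b, n_blocks):
    """Compute (a - b) mod 2**(4 * n_blocks) one 4-bit block at a time.

    Every block inverts its 4 subtrahend bits; only the least significant
    block receives the extra +1, modelled here as its carry-in.
    Returns (result, final carry out).
    """
    carry = 1                        # the "+1" of the 2's complement
    result = 0
    for k in range(n_blocks):
        x = (a >> (4 * k)) & 0xF
        y = ~(b >> (4 * k)) & 0xF    # 1's complement of this nibble
        s = x + y + carry
        result |= (s & 0xF) << (4 * k)
        carry = s >> 4
    return result, carry


# Operands of the worked example, treated as plain binary words.
result, carry_out = sub_via_blocks(0x556, 0x239, 3)
print(format(result, "012b"), carry_out)
```

For these operands the model returns the binary word 0011 0001 1101, which is the uncorrected binary output used in the BCD subtraction example.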
Figure 12. The P-G block
A = 0101 0101 0110
C = 1101 1100 0111
A + C = 0011 0001 1101 (correct binary output)
The two more significant digits produce a carry out, so no correction is needed; the least significant digit produces no carry out, so the correction (1010)2 is added to it.
Correct BCD output = 0011 0001 0111
Figure 11. Illustrating the proposed algorithm for BCD subtraction
Hence the final result = (317)10
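The BCD subtraction procedure of this example can be traced digit by digit with the following Python sketch. It is a behavioural model with a hypothetical function name; it ripples the carry between digits for simplicity and assumes the minuend is not smaller than the subtrahend.

```python
def bcd_subtract(minuend_digits, subtrahend_digits):
    """Subtract two equal-length BCD digit lists (most significant digit first)."""
    carry = 1                                  # +1 completing the 2's complement
    out = []
    for a, b in zip(reversed(minuend_digits), reversed(subtrahend_digits)):
        s = a + (~b & 0xF) + carry             # binary subtraction of one nibble
        carry = s >> 4
        digit = s & 0xF
        if not carry:                          # no carry out: this digit borrowed
            digit = (digit + 0b1010) & 0xF     # add (1010)2 and drop the overflow
        out.append(digit)
    return out[::-1]


print(bcd_subtract([5, 5, 6], [2, 3, 9]))
```

Only the least significant digit of 556 - 239 fails to produce a carry out, so it is the only digit that receives the (1010)2 correction.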
Signed numbers are handled by control logic at
the input, which takes the two sign bits and OpSelect
(Operation Select) as inputs and computes the control signal
(SUB/ADD) that specifies the effective operation to be
performed by the hardware. The effective operation
is given by the equation
SUB/ADD = As ⊕ Bs ⊕ OpSelect
where OpSelect is 1 when the operation is subtraction and 0
when the operation is addition, and As and Bs are the sign bits
of the numbers under operation.
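The control equation can be tabulated with a few lines of Python. The function name `effective_sub` is hypothetical; the encoding (1 for subtraction, 0 for addition, sign bit 1 for a negative operand) follows the convention stated in the paper for OpSelect and is assumed for the sign bits.

```python
def effective_sub(a_sign, b_sign, op_select):
    """SUB/ADD = As XOR Bs XOR OpSelect (1 means effective subtraction)."""
    return a_sign ^ b_sign ^ op_select


# Full truth table of the control signal.
for a_sign in (0, 1):
    for b_sign in (0, 1):
        for op_select in (0, 1):
            print(a_sign, b_sign, op_select, effective_sub(a_sign, b_sign, op_select))
```

For instance, subtracting a negative number from a positive one (As = 0, Bs = 1, OpSelect = 1) yields SUB/ADD = 0, so the hardware adds the two magnitudes.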
Figure 13. (a) Full adder
For the binary addition/subtraction the output of the carry
select adder in the correction block gives the final result.
IV. Simulations and Results
The architectures tabulated below were analyzed by
simulation in HSPICE
using 65 nm CMOS technology. Simulations are performed for
32-bit adders/subtractors. All the circuits are simulated at 1.2 V
at a frequency of 50 MHz. The simulation results are shown in
Table I.
TABLE I
Average Delay, Power, Power-Delay Product and Area of various
architectures

Architecture     Delay        Power       Power-Delay Product   Area
                 (10^-10 s)   (10^-4 W)   (10^-14 J)            (no. of MOSFETs)
Humberto [12]    8.106        3.714       35.984                8510
Haller [7]       5.488        5.029       27.911                11056
Sreehari [10]    3.959        2.860       11.486                2902
Proposed         3.268        2.328       7.813                 2500

Figure 15. Gate level diagram of correction unit for BCD Addition
Figure 16. Gate level diagram of correction unit for BCD Subtraction
Table I shows that the proposed design
improves delay by 17.45% and power by
18.60%, giving it a 32% improvement
in power-delay product over the most efficient architecture in
the literature [10].
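The quoted percentages follow from the Table I entries for Sreehari [10] and the proposed design, as the short Python check below shows. The helper name `improvement` is hypothetical, and the power-delay product is taken as reported in the table rather than recomputed from the average delay and power.

```python
# Values copied from Table I (delay in 1e-10 s, power in 1e-4 W, PDP in 1e-14).
sreehari = {"delay": 3.959, "power": 2.860, "pdp": 11.486}
proposed = {"delay": 3.268, "power": 2.328, "pdp": 7.813}


def improvement(old, new):
    """Relative reduction of `new` with respect to `old`, in percent."""
    return 100.0 * (old - new) / old


for key in ("delay", "power", "pdp"):
    print(f"{key}: {improvement(sreehari[key], proposed[key]):.2f}%")
```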
The propagate and generate signals produced by the
P-G block are then sent to the prefix network. The
prefix network can be selected according to the requirements
of area, power and delay from the wide range available in the
literature. For simulation purposes, the Sklansky network is chosen
for the design [13]. The prefix network generates the group
generate for each stage, which is the carry out of that stage.
The carry out of the nth stage is denoted by Cn.
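A Sklansky-style prefix computation over the per-block (G, P) pairs can be sketched recursively in Python. This is a behavioural illustration with hypothetical names, not the simulated netlist: each recursion level corresponds to one level of the network, in which every node of the upper half is combined with the last node of the lower half.

```python
def combine(hi, lo):
    """Prefix operator on (G, P) pairs: `hi` is the more significant group."""
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return (g_hi or (p_hi and g_lo), p_hi and p_lo)


def sklansky(pg):
    """Inclusive prefix of (G, P) pairs, least significant block first.

    Entry k of the result is (G_k:0, P_k:0); its G component is the carry
    out of stage k when the carry-in is 0.
    """
    n = len(pg)
    if n == 1:
        return list(pg)
    mid = n // 2
    low = sklansky(pg[:mid])
    high = sklansky(pg[mid:])
    # All upper-half results are merged with the top of the lower half.
    return low + [combine(x, low[-1]) for x in high]


# Four blocks: generate, propagate, propagate, neither.
pairs = [(True, False), (False, True), (False, True), (False, False)]
print([g for g, _ in sklansky(pairs)])
```

The recursion depth grows with the logarithm of the number of blocks, which is why this family of networks is attractive for delay.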
After all the carry/borrow bits are obtained, they are fed
to the correction stage, which together with the sum/difference
bits from the P-G block produces the final result, as
shown in Fig. 14. The first operation in the correction block is
to add the incoming carry/borrow from the previous stage to
the sum/difference bits. This is implemented using a carry select
adder to reduce the delay. After this stage the correct binary
outputs are obtained, but for BCD addition/subtraction
corrections need to be made to obtain the correct BCD result.
For BCD addition the correction is done by adding (0110)2,
and for BCD subtraction the correction is done by adding
(1010)2, to the correct binary output whenever needed. The
logic diagrams of the two correction units are shown in Fig. 15
and Fig. 16.
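The behaviour of one correction block can be modelled in Python as below. This is a sketch under assumptions, not the gate-level logic of Fig. 15 and Fig. 16: the function name and mode labels are hypothetical, and the stage carry out is used as the test for "correction needed" (carry out set for BCD addition, carry out clear for BCD subtraction), which is my reading of the description.

```python
def correct_digit(s4, c_in, c_out, mode):
    """Final 4-bit output of one stage.

    s4    : 4-bit sum/difference from the P-G block (incoming carry not yet added)
    c_in  : carry/borrow arriving from the previous stage
    c_out : carry out of this stage, as produced by the prefix network
    mode  : "bin", "bcd_add" or "bcd_sub" (hypothetical labels)
    """
    # Carry select: both candidates exist up front, c_in only picks one.
    candidates = (s4 & 0xF, (s4 + 1) & 0xF)
    t = candidates[1 if c_in else 0]
    if mode == "bcd_add" and c_out:
        t = (t + 0b0110) & 0xF       # decimal carry occurred: add 6
    elif mode == "bcd_sub" and not c_out:
        t = (t + 0b1010) & 0xF       # digit borrowed: add 10
    return t


# Least significant digit of 556 - 239: block sum 1101, no carry out.
print(correct_digit(0b1101, 0, False, "bcd_sub"))
```

In binary mode the selected candidate is passed through unchanged, matching the statement that the carry select adder output is already the final binary result.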
Figure 13. (b) Grey Cell (c) Black Cell
Figure 14. Correction Block
V. Conclusion
This paper presented a modified architecture for fast BCD
addition/subtraction that performs binary addition/subtraction
without any extra hardware. The design is runtime
reconfigurable, and maximum utilization of the hardware is a
feature of the architecture. All the blocks have been designed
to work with the least delay. The proposed architecture shows, on
average, an improvement of 32% in power-delay product
over the most efficient architecture in the literature.
REFERENCES
[1] M. S. Schmookler and A. Weinberger, “Decimal Adder for Directly Implementing BCD Addition Utilizing Logic Circuitry”, International Business Machines Corporation, US patent 3629565, pages 1-19, Dec 1971.
[2] IEEE standard for floating-point arithmetic, IEEE SC, Oct. 2006, at http://754r.ucbtest.org/drafts
[3] M. J. Adiletta and V. C. Lamere, “BCD Adder Circuit”, Digital Equipment Corporation, US patent 4805131, pages 1-18, Jul 1989.
[4] H. Fischer and W. Rohsaint, “Circuit Arrangement for Adding or Subtracting Operands Coded in BCD-Code or Binary-Code”, Siemens Aktiengesellschaft, US patent 5146423, pages 1-9, Sep 1992.
[5] L. P. Flora, “Fast BCD/Binary Adder”, US patent 5007010.
[6] W. Haller, U. Krauch, and H. Wetter, “Combined Binary/Decimal Adder Unit”, International Business Machines Corporation, US patent 5928319, pages 1-9, Jul 1999.
[7] W. Haller, W. H. Li, M. R. Kelly, and H. Wetter, “Highly Parallel Structure for Fast Cycle Binary and Decimal Adder Unit”, International Business Machines Corporation, US patent 2006/0031289, pages 1-8, Feb 2006.
[8] S. Hwang, “High-Speed Binary and Decimal Arithmetic Logic Unit”, American Telephone and Telegraph Company, AT&T Bell Laboratories, US patent 4866656, pages 1-11, Sep 1989.
[9] Sreehari Veeramachaneni, M. Keerthi Krishna, L. Avinesh, P. Sreekanth Reddy, M. B. Srinivas, “Novel High-Speed 16-Digit BCD Adders Conforming to IEEE 754r Format”, IEEE Computer Society Annual Symposium on VLSI (ISVLSI’07), pages 343-350, Mar 2007.
[10] Sreehari Veeramachaneni, M. Kirthi Krishna, V. Prateek G, S. Subroto, S. Bharat, M. B. Srinivas, “A Novel Carry-Look Ahead Approach to a Unified BCD and Binary Adder/Subtractor”, 21st International Conference on VLSI Design 2008, pages 547-552, Jan 2008.
[11] U. Grupe, “Decimal Adder”, Vereinigte Flugtechnische Werke-Fokker GmbH, US patent 3935438, pages 1-11, Jan 1976.
[12] D. R. Humberto Calderón, G. N. Gaydadjiev, S. Vassiliadis, “Reconfigurable Universal Adder”, Proceedings of the IEEE International Conference on Application-Specific Systems, Architectures, and Processors (ASAP 07), pages 186-191, July 2007.
[13] J. Sklansky, “Conditional-sum addition logic”, IRE Trans. Electronic Computers, vol. EC-9, pages 226-231, June 1960.