I. INTRODUCTION
An area-time efficient hardware-based BCD multiplier is often preferred to much slower software simulations [1] to meet the precision and performance requirements of financial computing. In general, an n-digit-by-n-digit BCD multiplication can be performed as

$$P = X \times Y = \sum_{i=0}^{n-1} X \cdot Y_i \cdot 10^i = \sum_{i=0}^{n-1} PP_i \cdot 10^i, \qquad (1)$$

where $PP_i$ is referred to as the i-th partial product, given by the multiplicand $X$ and the i-th digit of the multiplier, $Y_i$ (i.e., $PP_i = X \cdot Y_i$).
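As a small numerical illustration of (1), the sketch below forms each partial product from one multiplier digit and accumulates the shifted results. The function names and the use of plain Python integers in place of BCD words are assumptions made only for this example; it models the arithmetic, not any hardware.

```python
def partial_products(x: int, y: int, n: int) -> list[int]:
    """Return [PP_0, ..., PP_{n-1}], where PP_i = x * (i-th decimal digit of y)."""
    return [x * ((y // 10**i) % 10) for i in range(n)]


def multiply_by_partial_products(x: int, y: int, n: int) -> int:
    """Shift each partial product by i decimal places and accumulate, as in (1)."""
    return sum(pp * 10**i for i, pp in enumerate(partial_products(x, y, n)))
```

For instance, with x = 1234, y = 5678 and n = 4, the partial products are 1234 times 8, 7, 6 and 5, and their shifted sum equals 1234 * 5678.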
Figure 1. General parallel BCD multiplier architecture.

TABLE 1
Decimal Value   8421   5421   4221          5211
0               0000   0000   0000          0000
1               0001   0001   0001          0001 | 0010
2               0010   0010   0010 | 0100   0011 | 0100
3               0011   0011   0011 | 0101   0101 | 0110
4               0100   0100   0110 | 1000   0111
5               0101   1000   1001 | 0111   1000
6               0110   1001   1100 | 1010   1010 | 1001
7               0111   1010   1101 | 1011   1100 | 1011
8               1000   1011   1110          1110 | 1101
9               1001   1100   1111          1111
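The entries of TABLE 1 follow from the bit weights that give each code its name. The sketch below, with illustrative names (`WEIGHTS`, `code_value`, `codes_for`) that are assumptions of this example, enumerates every 4-bit pattern that evaluates to a given decimal digit. It lists all valid patterns, so for 5421 it also returns alternatives that the table does not show.

```python
# Bit weights, most significant bit first, for each weighted BCD code.
WEIGHTS = {
    "8421": (8, 4, 2, 1),
    "5421": (5, 4, 2, 1),
    "4221": (4, 2, 2, 1),
    "5211": (5, 2, 1, 1),
}


def code_value(bits: str, weights: tuple[int, ...]) -> int:
    """Decimal value of a 4-character bit string under the given weights."""
    return sum(w for b, w in zip(bits, weights) if b == "1")


def codes_for(digit: int, weights: tuple[int, ...]) -> list[str]:
    """All 4-bit patterns whose weighted value equals `digit`."""
    patterns = (f"{c:04b}" for c in range(16))
    return [p for p in patterns if code_value(p, weights) == digit]
```

For example, `codes_for(6, WEIGHTS["5211"])` returns `['1001', '1010']`, and `codes_for(9, WEIGHTS["4221"])` returns `['1111']`, which reflects the redundancy of the 4221 and 5211 codes for the middle digits only.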
A general parallel BCD multiplier architecture is depicted in Figure 1. It has two major stages: (i) the partial product generation (PPG) stage, including a decoder of the multiplier digit $Y_i$ to select the right multiples of the multiplicand as determined by the pre-computation logic, and (ii) the partial product accumulation (PPA) stage, where all the partial products are shifted and added together to obtain the final product.
Although operands in decimal multipliers (Figure 1) are typically represented and computed in the popular 8421 BCD code, the 4221, 5211 and 5421 BCD codes (TABLE 1) can also be employed, alone or along with 8421, to simplify the computation logic and thus improve the area/time efficiency over conventional 8421-based BCD multipliers. For instance, [2] [3] introduce 5421-to-4221 and 8421-to-5211 BCD recoding to speed up the PPG and generate 4221-coded partial products, so that 4221 carry-save adders (CSAs), which have simpler logic than 8421 adders, can be applied in the partial product reduction (PPR). However, as existing 4221 PPRs involve long carry propagations, their overall performance tends to be much worse than that of the 8421 PPR [4] of the same size. Besides, instead of using a 4221 adder to add the two 4221-coded results of the 4221 PPR directly, current practice employs 4221-to-8421 conversion so that classical 8421 full adders can be used to obtain the final 8421 product; this conversion increases both the delay and the chip area.
In this paper, we design three 16-by-16 multipliers based on different combinations of 8421 and 4221 BCD codes. The first, an 8421 multiplier, follows the architecture originally reported in [4], but improves on it significantly by performing the pre-computations with 8421-5421 recoding logic. We also design a 4221 BCD multiplier, which includes modified 4221-5211 recoding for the multiplicand pre-computations and novel 4221 carry-lookahead adders (CLAs) for the PPA. The third design is also a 4221 BCD multiplier, but it is built upon 32:2 4221 CSA trees and a 4221 CLA for the PPA, so that no 4221-8421 conversions are required. We synthesize all three multipliers and compare them against the best known designs in terms of delay and delay-area product.
In what follows, we review previous work in Section 2. The proposed 8421 and 4221 multipliers are described in Sections 3 and 4, respectively. Performance results of the various BCD multiplier designs are reported and analyzed in Section 5. Finally, conclusions are drawn in Section 6.
II. PRIOR WORK
In this section, various design techniques that are applicable to PPGs and PPAs (Figure 1) in decimal multipliers
with different BCD representations will be reviewed.
A. Data Coding and Logic in Partial Product Generation
Existing PPGs decode the multiplier digit $Y_i$ to select the right multiples of the multiplicand available from the pre-computation logic (Figure 1). Several decoding strategies for $Y_i$ have been proposed to simplify the logic of the PPG. In [5] [6], $Y_i$ is written as $Y_i = Y_i^0 + Y_i^1$, where $Y_i^0, Y_i^1 \in \{0,1,2,4,5\}$, so that a large partial product can be computed as summation of two
[Equations (2) and (3): expressions not recoverable from the source]
IV. 4221 BCD MULTIPLIER
To take full advantage of the simplicity of CSAs and of the 9's complement in 4221, we implement a novel 4221 full adder, so that the 4221-to-8421 conversion required in [2] [3] is no longer needed in our 4221 multipliers, and all the inputs and outputs of the multipliers, as well as the internal intermediate results, can be represented directly in 4221. However, due to the redundant and discontinuous nature of the 4221 representation, whenever a decimal value has two representations we only use the code listed in the left sub-column of the 4221 column in TABLE 1. This way, we avoid the so-called many-to-many 5211-to-4221 recoding and the high complexity of the 4221 full adder logic that would otherwise be incurred.
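The simplicity of the 9's complement in 4221 comes from the bit weights summing to 9, so inverting all four bits of a digit maps the value d to 9 - d. The sketch below checks this, and also checks that the left sub-column codes of TABLE 1 are closed under this inversion. The names `LEFT_4221`, `value_4221` and `nines_complement_4221` are illustrative assumptions, not identifiers from the paper.

```python
W4221 = (4, 2, 2, 1)  # bit weights, most significant bit first

# Left sub-column 4221 codes of TABLE 1, indexed by decimal value 0..9.
LEFT_4221 = (0b0000, 0b0001, 0b0010, 0b0011, 0b0110,
             0b1001, 0b1100, 0b1101, 0b1110, 0b1111)


def value_4221(code: int) -> int:
    """Decimal value of a 4-bit 4221 code."""
    return sum(w for k, w in enumerate(W4221) if (code >> (3 - k)) & 1)


def nines_complement_4221(code: int) -> int:
    """9's complement of one 4221 digit: plain bitwise inversion."""
    return ~code & 0xF


def left_codes_closed_under_complement() -> bool:
    """True if inverting a left sub-column code yields the left code of 9 - d."""
    return all(nines_complement_4221(LEFT_4221[d]) == LEFT_4221[9 - d]
               for d in range(10))
```

Under this reading, restricting the design to the left sub-column does not cost the one-inverter-per-bit complement, since the complemented digit is again a left sub-column code.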
A. 4221 BCD PPG
The PPG is similar to what is shown in [4]. Since we force each decimal value to be represented by one unique 4221 BCD code, we now calculate the multiple $2X$ of the multiplicand $X$ in 4221 directly by (4), and $5X$ by left-shifting the 4221-encoded $X$ by 3 bits to become a 5211-encoded $5X$ and recoding it from 5211 to 4221 codes by (5). Obviously, calculating $2X$ in 4221 needs more gates than that in 8421-5421 recoding. In addition, the Boolean expressions for $Y_i^0$, $Y_i^1$ and OP in 4221, (6)~(8) respectively, are also more complicated than those in 8421 [4]. Even so, given the simpler logic for obtaining a number's 9's complement in 4221, the PPG for 4221 may still hold its performance advantage.
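The 3-bit shift used for the quintuple can be checked numerically: a 4221 digit 4b3 + 2b2 + 2b1 + b0, multiplied by 5, contributes 5b0 to its own decimal position and 2b3 + b2 + b1 to the next, which is exactly a 5211 reading of the word shifted left by 3 bits. The sketch below is a behavioral model with assumed helper names (`encode`, `decode`, `times5_by_shift`); it packs digits into a Python integer, 4 bits per digit, and does not reproduce the Boolean expression (5).

```python
W4221 = (4, 2, 2, 1)
W5211 = (5, 2, 1, 1)


def digit_value(code: int, weights: tuple[int, ...]) -> int:
    """Decimal value of one 4-bit code under the given weights (MSB first)."""
    return sum(w for k, w in enumerate(weights) if (code >> (3 - k)) & 1)


def encode(value: int, weights: tuple[int, ...], ndigits: int) -> int:
    """Pack the decimal digits of `value`, 4 bits each, using any valid code."""
    word = 0
    for pos in range(ndigits):
        digit = (value // 10**pos) % 10
        code = next(c for c in range(16) if digit_value(c, weights) == digit)
        word |= code << (4 * pos)
    return word


def decode(word: int, weights: tuple[int, ...], ndigits: int) -> int:
    """Inverse of encode: read `ndigits` 4-bit digits under the given weights."""
    return sum(digit_value((word >> (4 * pos)) & 0xF, weights) * 10**pos
               for pos in range(ndigits))


def times5_by_shift(x: int, ndigits: int) -> int:
    """5X: shift the 4221-coded X left by 3 bits and read the result as 5211."""
    shifted = encode(x, W4221, ndigits) << 3
    return decode(shifted, W5211, ndigits + 1)
```

The property holds for either 4221 code of a digit, so it does not depend on the choice of the left sub-column; the subsequent 5211-to-4221 recoding is what restores the unique representation.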
For each PPG, after obtaining OP and the two intermediate partial products, the second of which might be negative, we sum them up to ensure that the operands for the PPRs are positive. Two addition schemes are used, one for each of the two 4221 PPAs that follow. The first uses a 4221 CLA to add the two intermediate partial products and OP, producing one positive 4221-coded partial product for the subsequent 4221 CLA trees; the second uses a multi-digit 4221 CSA [2] to produce two positive intermediate 4221-coded partial products for the subsequent PPR based on 32:2 CSA trees.
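To illustrate why carry-save addition is simple in 4221, the sketch below applies an ordinary bitwise 3:2 CSA to three 4221-coded words. Because every bit keeps its weight, the sum word S and the carry word H are again 4221 words, and the three inputs add up to S + 2H with no carry propagation. The doubling of H is modeled here by recoding to 5211 and shifting left by one bit, a technique described in the decimal-multiplier literature; treating it as the ×2 step of this design is an assumption of the example, as are all helper names.

```python
from itertools import product

W4221 = (4, 2, 2, 1)
W5211 = (5, 2, 1, 1)


def digit_value(code, weights):
    """Decimal value of one 4-bit code under the given weights (MSB first)."""
    return sum(w for k, w in enumerate(weights) if (code >> (3 - k)) & 1)


def encode(value, weights, ndigits):
    """Pack decimal digits, 4 bits each, using any valid code per digit."""
    word = 0
    for pos in range(ndigits):
        digit = (value // 10**pos) % 10
        code = next(c for c in range(16) if digit_value(c, weights) == digit)
        word |= code << (4 * pos)
    return word


def decode(word, weights, ndigits):
    """Read `ndigits` 4-bit digits under the given weights."""
    return sum(digit_value((word >> (4 * pos)) & 0xF, weights) * 10**pos
               for pos in range(ndigits))


def csa_3to2(a, b, c):
    """Bitwise 3:2 carry-save adder: returns (sum word S, carry word H)."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def times2(h, ndigits):
    """2H in 4221: recode H digit-wise to 5211, then shift left by one bit."""
    return encode(decode(h, W4221, ndigits), W5211, ndigits) << 1


def csa_is_exact(values, ndigits=4):
    """Check a + b + c == S + 2H for every triple drawn from `values`."""
    for x, y, z in product(values, repeat=3):
        s, h = csa_3to2(*(encode(v, W4221, ndigits) for v in (x, y, z)))
        total = (decode(s, W4221, ndigits)
                 + decode(times2(h, ndigits), W4221, ndigits + 1))
        if total != x + y + z:
            return False
    return True
```

The single-bit shift is what introduces a carry-in from the less significant digit and a carry-out to the more significant one, while the CSA itself stays purely bitwise.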
B. 4221 BCD Addition and PPA
Like in [8], the logic of a 1-digit 4221 full adder can be expressed as (9); that is, adding two 1-digit inputs, $a[3:0]$ and $b[3:0]$, and a single-bit carry-in, cin, gives a 1-digit sum $s[3:0]$, a single-bit carry-out, cout, a single-bit carry-generation, gdigit, and a single-bit carry-propagation, pdigit. Let us define $g_i = a_i \,\&\, b_i$, $p_i = a_i \,|\, b_i$, $h_i = a_i \oplus b_i$ (bitwise AND, OR and XOR, respectively), for $i = 0, 1, 2, 3$.
Obviously, the logic of the 4221 full adder is much more complicated than that of the 8421 full adder in [8]. The 4221 CLAs are built with the same carry generation/propagation structure as those in [10].
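The interface of the 1-digit adder can be captured by a behavioral model. The sketch below is not equation (9): it computes the digit-level generate and propagate signals arithmetically (generate when the two digits sum to 10 or more, propagate when they sum to exactly 9), derives the carries from the usual lookahead recurrence, and emits sums in the left sub-column codes of TABLE 1. All function names are assumptions made for illustration, and the carries are evaluated in a loop rather than by parallel lookahead logic.

```python
W4221 = (4, 2, 2, 1)

# Left sub-column 4221 codes of TABLE 1, indexed by decimal value 0..9.
LEFT_4221 = (0b0000, 0b0001, 0b0010, 0b0011, 0b0110,
             0b1001, 0b1100, 0b1101, 0b1110, 0b1111)


def value_4221(code):
    """Decimal value of a 4-bit 4221 code."""
    return sum(w for k, w in enumerate(W4221) if (code >> (3 - k)) & 1)


def full_adder_4221(a, b, cin):
    """Return (s, cout, gdigit, pdigit) for two 4221 digits and a carry-in."""
    total = value_4221(a) + value_4221(b)
    g_digit = int(total >= 10)         # carry generated regardless of cin
    p_digit = int(total == 9)          # carry-in would be propagated
    cout = g_digit | (p_digit & cin)
    s = LEFT_4221[(total + cin) % 10]  # sum digit in its unique 4221 code
    return s, cout, g_digit, p_digit


def add_4221(a_codes, b_codes, cin=0):
    """Add two little-endian lists of 4221 digits; return (sum digits, carry)."""
    gp = [full_adder_4221(a, b, 0)[2:] for a, b in zip(a_codes, b_codes)]
    carries = [cin]
    for g, p in gp:                    # lookahead recurrence c[k+1] = g | p & c[k]
        carries.append(g | (p & carries[-1]))
    sums = [full_adder_4221(a, b, c)[0]
            for a, b, c in zip(a_codes, b_codes, carries)]
    return sums, carries[-1]


def to_codes(value, ndigits):
    """Little-endian list of left sub-column 4221 codes for `value`."""
    return [LEFT_4221[(value // 10**k) % 10] for k in range(ndigits)]


def from_codes(codes):
    """Decimal value of a little-endian list of 4221 codes."""
    return sum(value_4221(c) * 10**k for k, c in enumerate(codes))
```

A quick check is 9999 + 1 with four digits: every digit above the lowest propagates, so the model returns four zero digits and a carry-out of 1.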
[Equations (4)-(9): Boolean expressions not recoverable from the source]

Figure 2. PPR with 4221 CSA trees.

Figure 3. 32:2 4221 CSA tree.

A PPR with 4221 CSA trees is shown in Figure 2. Partial products are separated by horizontal grids, from top to bottom. Each partial product includes its first positive intermediate partial product as a black dot row, and its second positive intermediate partial product as a grey dot row. Each column includes a 32:2 4221 CSA tree, as depicted in Figure 3, where each CSA is a 3:2 4-bit binary CSA and each x2 block, i.e. a multiplication by 2, has a carry-in from the CSA tree of the less significant digit and a carry-out to the more significant one [2]. The x1 block ensures the correct 4221 representation for the 4221 full adder. Each CSA tree adds up the digits of the same weight from the 32 positive intermediate partial products, propagates the carries, and outputs two 4221 digits with the same weight. Finally, a 32-digit 4221 CLA adds up the results of the PPR to get the final 4221 product.

This proposed 4221 PPA architecture is similar to the CSA-based one used in [2], but with one major distinction: no 4221-8421 recoding logic is needed in our design.

V. EVALUATION AND COMPARISON

In this paper, we implement 16-by-16 decimal multipliers for 8421 and 4221, and compare them to those in [2] [4], the two most area-time efficient and high-performance architectures known in the open literature. All the designs are coded in Verilog HDL and synthesized using Synopsys Design Compiler with the 90nm technology from TSMC. We use the product of delay and circuit area as the figure of merit for performance comparisons.

A. Decimal PPG
We first evaluate the logic of generating $2X$ and $5X$, as listed in TABLE 2. Then we synthesize the respective PPGs of all three multipliers (TABLE 3).

The results show that 8421-5421 recoding is the most efficient for $2X$ and $5X$, and that it reduces the 8421 PPG area in [4] by about 10% for the same delay. Since the 4221 PPG takes advantage of both the simple 9's complement and multi-digit 4221 CSAs for the two intermediate partial products, no carry propagation is involved, leading to low delay and circuit area.
TABLE 2 16-DIGIT $2X$ AND $5X$ COMPARISON
Multiple   Multiplier       Delay (ns)   Area      Delay × Area
2X         8421 [5]         0.03         622.34    18.67
2X         8421-5421 [2]    0.03         600.47    18.01
2X         4221-5211        0.05         1281.37   60.07
5X         8421 [5]         0.03         1450.01   45.50
5X         5421-8421 [2]    0.03         751.46    22.54
5X         5211-4221        0.03         975.84    29.28
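The Delay × Area column can be cross-checked against the other two columns. Because the delays are printed with two decimals, a listed product need not equal the printed delay times the area (for example, 0.05 × 1281.37 is about 64.07 against the listed 60.07). The sketch below therefore tests a weaker property: that the delay implied by each listed product rounds to the printed delay. The row grouping into 2X and 5X and all names in the code are assumptions of this example; the numbers are copied from TABLE 2.

```python
# (multiple, recoding, delay_ns, area, delay_x_area), copied from TABLE 2.
TABLE2 = [
    ("2X", "8421 [5]",      0.03,  622.34, 18.67),
    ("2X", "8421-5421 [2]", 0.03,  600.47, 18.01),
    ("2X", "4221-5211",     0.05, 1281.37, 60.07),
    ("5X", "8421 [5]",      0.03, 1450.01, 45.50),
    ("5X", "5421-8421 [2]", 0.03,  751.46, 22.54),
    ("5X", "5211-4221",     0.03,  975.84, 29.28),
]


def implied_delay(area: float, delay_x_area: float) -> float:
    """Delay that would reproduce the listed product for the listed area."""
    return delay_x_area / area


def consistent(row) -> bool:
    """True if the implied delay rounds to the delay printed in the table."""
    _, _, delay, area, prod = row
    return abs(round(implied_delay(area, prod), 2) - delay) < 1e-9


def best(multiple: str) -> str:
    """Recoding with the smallest delay-area product for '2X' or '5X'."""
    rows = [r for r in TABLE2 if r[0] == multiple]
    return min(rows, key=lambda r: r[4])[1]
```

With these values, every row passes the consistency check, and the smallest product in each group belongs to the 8421/5421 recoding rows.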
TABLE 3
                  Delay (ns)   Area      Delay × Area
8421 [4]          0.50         6802.69   3401.35
8421-5421         0.50         6313.00   3156.50
(label missing)   0.50         9439.52   4719.76
(label missing)   0.30         4114.35   1234.31
Full Adder   Delay (ns)   Area      Delay × Area
8421         0.1          282.24    28.22
4221         0.11         823.44    90.58
8421         0.30         3389.00   1016.70
4221         0.30         6236.09   1870.83
Delay (ns)
63,149
63,149
1.23
98,487
121,139
0.73
5,810
3,548
4221 PPR
91,990
91,990
1.26
125,280
157,853
Area (
Delay
Area
REFERENCE
                       Delay (ns)   Area      Delay × Area
(label missing)        1.49         200,903   299,345
8421-5421 Multiplier   1.46         181,873   265,535
(label missing)        1.70         263,089   447,251
(label missing)        1.55         205,103   317,909