Low Power and Fast Adder Implementation with Double Gate MOSFETs

B. Vignesh, P. Sujith
palanisamy.suji@gmail.com, vigneshkiruba@gmail.com
UG Scholars, Department of ECE,
Sri Ramakrishna Institute of Technology, Coimbatore.

A. N. Jayanthi
jaynathimuthuraman@rediffmail.com
Assistant Professor, Department of ECE,
Sri Ramakrishna Institute of Technology, Coimbatore.

Abstract: In this paper we present the implementation of a 32-bit adder using the Quad Carry Look Ahead (QCLA) algorithm in compound domino logic with a Merged Pre-charge Keeper (MPK) transistor and a Statically Skewed Inverter (SSI), built with Double Gate MOSFETs (DGMOSFETs). The worst-case propagation delay of the adder is 220 ps. The average operating power is 186 µW.
Index Terms: DGMOS, carry look-ahead, QCLA, domino logic, propagation delay, power consumption.

I. INTRODUCTION
For the past several years, the semiconductor industry has been following Moore's law of scaling. According to Moore's law, the number of transistors on a chip doubles roughly every 18 to 24 months and performance improves by 30% [1].
However, as technology scales below the 45 nm node, the semiconductor industry faces many challenges, such as high power density and device reliability. Hence, innovative device structures are needed to cope with these challenges and to support high-performance and low-power applications. The double gate MOSFET (DG-MOSFET) is one of the most promising candidates to replace the conventional MOSFET in state-of-the-art chips in the near future [2]. Owing to their smaller size, controllable threshold voltage and reduced power consumption, DG-MOSFETs are very effective devices for high-performance and low-power digital circuits.
Fast and efficient adders are essential for high-performance microprocessors. Such adder architectures typically exploit parallelism and dynamic logic to achieve fast computation. Here we design our adder using the independent gate control of DGMOS devices, applied to domino circuits.
The paper is organized as follows. The fully depleted (FD) double gate (DG) silicon-on-insulator (SOI) technology with planar independent self-aligned gates is briefly explained in Sec. II. The Quad Carry Look Ahead (QCLA) adder algorithm and its implementation are given in Sec. III and Sec. IV, respectively. The DCVSL-based adder cell is discussed in Sec. V, and the results are presented in Sec. VI.
II. DOUBLE GATE FDSOI TECHNOLOGY
A few different double-gate structures have been reported in the literature. The present work is based on the DG-MOSFET designed and fabricated at LETI, France [3]
and Fig. 1 shows a TEM microphotograph of a planar DG transistor fabricated in this process.
Fig. 1 TEM cross section of a double metal-gate transistor [3] and the corresponding DG NMOS symbol.
The essential features of an FD SOI DGMOS are a uniform and thin silicon channel, thick source/drain regions, and aligned top and bottom gates. There are two main types of DG MOSFETs: the symmetric double gate (SDG) device, whose two gates have identical work functions and gate oxide thicknesses, and the asymmetric double gate (ADG) device, whose gates have different work functions and different gate oxide thicknesses. Lin et al. investigated the circuit performance of these two devices [4] and reported that the drive current of the SDG is higher than that of the ADG because of the difference in inversion charge for the same threshold voltage (leading to better conductivity). Furthermore, since the electric field is lower in the SDG, carrier mobility is higher, which directly increases the current. The higher mobility also gives the SDG a lower delay; hence the SDG is preferred over the ADG for the design of logic circuits. We have used the device given in [5], and the supply voltage is maintained at 1.2 V for our adder implementation.
III. QCLA ALGORITHM
Fast adders are usually implemented using a Carry Look Ahead algorithm, which uses the traditional generate and propagate terms [6]. If $a_i$ and $b_i$ are the input operands and $p_i$ and $g_i$ are the propagate and generate signals, respectively, then the sum bits $S_i$ can be described by the following equations:
$$I(0,i) = p_i \, p_{i-1} \, p_{i-2} \cdots p_0,$$
$$G(0,i) = g_i + p_i \, G(0,i-1), \qquad S_i = a_i \oplus b_i \oplus c_i.$$
Here $G(0,i)$ and $I(0,i)$ denote the group generate and group propagate signals, respectively, for the group of bits from position 0 to $i$, and $c_i = G(0,i-1)$ is the carry into bit $i$. The quantity propagated to the next stage is the carry-out at bit $i$. The block diagram of the 16-bit QCLA is given in Fig. 2; it uses binary, ternary and quad convergences to provide the best compromise between delay and power consumption. Hence three types of cells are needed, whose logic equations are given in [7]. The equations for a 4-bit adder are:
$$\mathrm{PG2:}\quad I(0,1) = p_1 p_0, \qquad G(0,1) = g_1 + g_0 p_1$$
$$\mathrm{PG3:}\quad I(0,2) = p_2 p_1 p_0, \qquad G(0,2) = g_2 + g_1 p_2 + g_0 p_2 p_1$$
$$\mathrm{PG4:}\quad I(0,3) = p_3 p_2 p_1 p_0, \qquad G(0,3) = g_3 + g_2 p_3 + g_1 p_3 p_2 + g_0 p_3 p_2 p_1$$
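To make the recursions and the PG cell equations concrete, the following Python sketch models them at the Boolean level and checks the closed forms of PG2, PG3 and PG4 against the bit-serial recursion for every input pattern. It is only a logic model, not a model of the domino circuits; the definitions $p_i = a_i + b_i$ (OR) and $g_i = a_i b_i$, and all function names (`recursive_pg`, `pg2`, `pg3`, `pg4`, `cla_add`), are assumptions made for illustration.

```python
from itertools import product

# Boolean-level sketch only. Assumed definitions:
#   p_i = a_i OR b_i (propagate), g_i = a_i AND b_i (generate).

def recursive_pg(p, g, i):
    """Return (I(0,i), G(0,i)) using I(0,i) = p_i I(0,i-1)
    and G(0,i) = g_i + p_i G(0,i-1)."""
    I_acc, G_acc = p[0], g[0]
    for k in range(1, i + 1):
        I_acc = p[k] & I_acc
        G_acc = g[k] | (p[k] & G_acc)
    return I_acc, G_acc

# Closed-form PG cells; each returns (I, G) for bits 0..1, 0..2, 0..3.
def pg2(p, g):
    return p[1] & p[0], g[1] | (g[0] & p[1])

def pg3(p, g):
    return (p[2] & p[1] & p[0],
            g[2] | (g[1] & p[2]) | (g[0] & p[2] & p[1]))

def pg4(p, g):
    return (p[3] & p[2] & p[1] & p[0],
            g[3] | (g[2] & p[3]) | (g[1] & p[3] & p[2])
            | (g[0] & p[3] & p[2] & p[1]))

def cells_match_recursion() -> bool:
    """Compare each closed form with the recursion for all 2^8 (p, g) patterns."""
    for bits in product((0, 1), repeat=8):
        p, g = bits[:4], bits[4:]
        for cell, i in ((pg2, 1), (pg3, 2), (pg4, 3)):
            if cell(p, g) != recursive_pg(p, g, i):
                return False
    return True

def cla_add(a: int, b: int, n: int):
    """n-bit addition without carry-in: S_i = a_i XOR b_i XOR c_i,
    with c_i = G(0,i-1) and c_0 = 0. Returns (sum, carry_out)."""
    p = [((a >> i) | (b >> i)) & 1 for i in range(n)]
    g = [((a >> i) & (b >> i)) & 1 for i in range(n)]
    s = 0
    for i in range(n):
        c_i = recursive_pg(p, g, i - 1)[1] if i > 0 else 0
        s |= (((a >> i) ^ (b >> i) ^ c_i) & 1) << i
    return s, recursive_pg(p, g, n - 1)[1]
```

For example, `cells_match_recursion()` returns `True`, and `cla_add(0xFFFFFFFF, 1, 32)` returns `(0, 1)` because the single generate at bit 0 propagates through all remaining bits. The model favors readability over speed (it recomputes each carry from scratch), which is acceptable for checking equations.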
Ling's equations [8] are an alternative to the classical CLA. By identifying $p_i g_i = g_i$ (which holds when the propagate signal is defined as $p_i = a_i + b_i$), the generate term $G(0,i)$ can be reformulated as
$$G(0,i) = p_i \left( g_i + G(0,i-1) \right) = p_i \, H(0,i).$$
In Ling's adder, the pseudo-carry $H(0,i)$ is propagated and combined with the remaining terms in the final sum:
$$H(0,i) = g_i + p_{i-1} \, H(0,i-1);$$
$$S_i = \left( p_i \oplus H(0,i) \right) + g_i \, p_{i-1} \, H(0,i-1).$$
The advantage of using Ling's equations appears after expanding the recursions [9]. For instance, expanding the recursions for a group of 4 bits results in
$$H(0,3) = g_3 + g_2 + p_2 g_1 + p_2 p_1 g_0.$$
The $H(0,3)$ term has fewer factors than $G(0,3)$, which in CMOS requires fewer transistors in the stack of the first gate.
However, the sum computation with Ling's pseudo-carry equations is more complex. Ling's equations therefore effectively move complexity from the carry tree into the sum precompute block [10], which is not in the critical path.
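The Ling identities can be checked exhaustively with a short Boolean model. The sketch below assumes $p_i = a_i + b_i$ (the OR form that makes $p_i g_i = g_i$ hold) and uses a hypothetical helper name, `ling_check`. It verifies $G(0,i) = p_i H(0,i)$, the Ling sum expression, and the expanded form of $H(0,3)$ against ordinary binary addition for every pair of small operands.

```python
from itertools import product

def ling_check(n: int = 4) -> bool:
    """Exhaustively check Ling's identities for all pairs of n-bit operands.

    Assumes p_i = a_i OR b_i and g_i = a_i AND b_i, so that p_i g_i = g_i.
    """
    for a_bits in product((0, 1), repeat=n):
        for b_bits in product((0, 1), repeat=n):
            p = [x | y for x, y in zip(a_bits, b_bits)]
            g = [x & y for x, y in zip(a_bits, b_bits)]
            G, H = [], []                           # G[i] = G(0,i), H[i] = H(0,i)
            for i in range(n):
                G_prev = G[i - 1] if i > 0 else 0   # true carry into bit i
                H_prev = H[i - 1] if i > 0 else 0
                p_prev = p[i - 1] if i > 0 else 0
                G.append(g[i] | (p[i] & G_prev))    # G(0,i) = g_i + p_i G(0,i-1)
                H.append(g[i] | (p_prev & H_prev))  # H(0,i) = g_i + p_{i-1} H(0,i-1)
                if G[i] != (p[i] & H[i]):           # G(0,i) = p_i H(0,i)
                    return False
                s_ref = a_bits[i] ^ b_bits[i] ^ G_prev
                s_ling = (p[i] ^ H[i]) | (g[i] & p_prev & H_prev)
                if s_ling != s_ref:                 # Ling sum equation
                    return False
            if n >= 4:                              # expanded 4-bit pseudo-carry
                h03 = g[3] | g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0])
                if h03 != H[3]:
                    return False
    return True
```

Calling `ling_check(4)` returns `True`. Note that the identity $G(0,i) = p_i H(0,i)$ fails if $p_i$ is defined as $a_i \oplus b_i$, which is why the OR form is assumed here.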
IV. IMPLEMENTATION OF QCLA WITH DGMOS
Chiang et al. [11] proposed novel DG circuit techniques for NAND, NOR and similar gates that reduce both area and power, resulting in improved performance. These are the most basic DGMOS techniques and are widely used. In [12], NAND gate circuits with reduced stack height were proposed; these circuits achieve higher density by applying different threshold voltages to the NMOS and PMOS devices. We implemented our basic cells with domino logic, compound domino logic, and compound domino logic with stack height reduction wherever necessary. In [13], DGMOS domino logic circuits were developed with a Merged Pre-charge Keeper (MPK) and a Statically Skewed Inverter (SSI), and with an MPK and a Dynamically Skewed Inverter (DSI). As shown in Fig. 3 and Fig. 4, I(0,3) and H(0,3) are implemented in a compound domino configuration, whereas H(0,15) is implemented in compound domino with stack height reduction for better performance, since these blocks are in the critical path. Our conventional cells are domino gates with a keeper transistor.
Fig. 2 Block diagram of 16-bit QCL Adder.
The advantage of these topologies is that a single cell provides G(0,1), G(0,2) and H(0,3), which are needed to implement PG1, PG2 and PG3. As reported in [14], a compound domino implementation is faster than a dynamic gate. For implementing I(0,3), we exploit the DGMOS property given in [11]: two MOSFETs connected in parallel can be merged and replaced by a single DGMOS. When generating the terms I(0,1), I(0,2) and I(0,3), this saves four transistors compared with a CMOS implementation of the same circuit, and we carefully pre-charge the intermediate nodes to obtain the output, which prevents the charge-sharing problem of dynamic gates. We implemented the term H(0,15) using compound domino with stack height reduction to improve speed, as in [15].
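How the H(0,15) cell combines the 4-bit groups is not spelled out here, so the following is only a sketch under an assumed decomposition. Using $G(0,j-1) = p_{j-1} H(0,j-1)$, the pseudo-carry satisfies $H(0,i) = H(j,i) + I(j-1,i-1)\,H(0,j-1)$, where $H(j,i) = g_i + G(j,i-1)$ is the local pseudo-carry of bits $j$ to $i$. Applying this at $j = 12, 8, 4$ gives a quad convergence of four 4-bit $H$ terms whose propagate groups are shifted down by one bit. The Python model below (all helper names are hypothetical, and $p_i = a_i + b_i$ is assumed) computes $H(0,15)$ that way and checks that $p_{15} H(0,15)$ equals the carry out of a 16-bit addition.

```python
def pg_bits(a: int, b: int, n: int):
    """Per-bit propagate p_i = a_i OR b_i and generate g_i = a_i AND b_i."""
    p = [((a >> i) | (b >> i)) & 1 for i in range(n)]
    g = [((a >> i) & (b >> i)) & 1 for i in range(n)]
    return p, g

def group_I(p, lo: int, hi: int) -> int:
    """Group propagate I(lo, hi) = p_hi ... p_lo."""
    out = 1
    for k in range(lo, hi + 1):
        out &= p[k]
    return out

def group_H(p, g, j: int) -> int:
    """Local Ling pseudo-carry of bits j..j+3 (same form as H(0,3))."""
    return (g[j + 3] | g[j + 2] | (p[j + 2] & g[j + 1])
            | (p[j + 2] & p[j + 1] & g[j]))

def h_0_15(a: int, b: int) -> int:
    """Assumed quad decomposition:
    H(0,15) = H(12,15) + I(11,14) H(8,11) + I(11,14) I(7,10) H(4,7)
              + I(11,14) I(7,10) I(3,6) H(0,3)."""
    p, g = pg_bits(a, b, 16)
    h0, h1, h2, h3 = (group_H(p, g, j) for j in (0, 4, 8, 12))
    i1, i2, i3 = group_I(p, 3, 6), group_I(p, 7, 10), group_I(p, 11, 14)
    return h3 | (i3 & h2) | (i3 & i2 & h1) | (i3 & i2 & i1 & h0)

def carry_out_16(a: int, b: int) -> int:
    """Carry out of bit 15 for 16-bit operands: G(0,15) = p_15 H(0,15)."""
    p15 = ((a >> 15) | (b >> 15)) & 1
    return p15 & h_0_15(a, b)
```

For instance, `carry_out_16(0xFFFF, 0x0001)` returns 1 and `carry_out_16(0x8000, 0x7FFF)` returns 0. The design choice illustrated is that each 4-input convergence only needs AND stacks of at most four devices, which matches the motivation for quad trees in the text.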
V. DCVSL ADDER CELL

Differential Cascode Voltage Switch Logic (DCVSL) is similar to pseudo-NMOS in that all the logic is implemented in the pull-down network (PDN), and the PMOS transistors appear only as load transistors [16], which in DCVSL are cross-coupled in a latch configuration. Its speed is high because switching is done through the NMOS networks, and the logic can be condensed when the two complementary trees share common terms. By adding a PMOS sleep transistor between VDD and the top of the circuit, we developed a 1-bit full adder in DGMOS; its transistor-level schematic is given in Fig. 5. For this adder cell, we assume that each input and its complement are available. In most practical implementations, the complement is made available by a chain of buffers. We compared this adder design with a standard 28-gate full adder (without XOR configuration) [17], designed in DGMOS technology with a 25 nm channel length and double-gate optimization. The results are shown in the chart of Fig. 6. They show that the DCVSL-based adder is about 40% faster than the conventional adder and, owing to the sleep transistor, its leakage current is drastically lower than that of its counterpart. This block generates the sum from $a_i$, $b_i$ and the carry-in using the following equations:
$$S_i = a_i \oplus b_i \oplus G(0,i-1) \qquad \text{if } C_{in} = G(0,i-1)$$
$$S_i = a_i \oplus b_i \oplus \left( p_{i-1} \, H(0,i-1) \right) \qquad \text{if } C_{in} = H(0,i-1)$$
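The two sum equations differ only in how the incoming carry is formed. A Boolean sketch of the DCVSL_SUM function (it models the logic, not the differential circuit or its sleep transistor) can check that feeding some bit positions with a pseudo-carry gives the same sum as feeding them with true carries. Which bit positions receive $H$ rather than $G$ in the actual 32-bit design is not stated, so the `pseudo_bits` argument below is a hypothetical parameter, and $p_i = a_i + b_i$ is assumed.

```python
def sum_from_G(a_i: int, b_i: int, G_prev: int) -> int:
    """S_i = a_i XOR b_i XOR G(0,i-1): the carry-in is a true carry."""
    return a_i ^ b_i ^ G_prev

def sum_from_H(a_i: int, b_i: int, p_prev: int, H_prev: int) -> int:
    """S_i = a_i XOR b_i XOR (p_{i-1} H(0,i-1)): the carry-in is a pseudo-carry."""
    return a_i ^ b_i ^ (p_prev & H_prev)

def mixed_sum(a: int, b: int, n: int, pseudo_bits: set) -> int:
    """n-bit sum (no carry-in) where bit positions in pseudo_bits use H, others use G.

    pseudo_bits is a hypothetical choice for illustration only.
    """
    p = [((a >> i) | (b >> i)) & 1 for i in range(n)]   # assumed OR propagate
    g = [((a >> i) & (b >> i)) & 1 for i in range(n)]
    G, H = [], []
    for i in range(n):
        G_prev = G[i - 1] if i > 0 else 0
        H_prev = H[i - 1] if i > 0 else 0
        p_prev = p[i - 1] if i > 0 else 0
        G.append(g[i] | (p[i] & G_prev))      # true carries
        H.append(g[i] | (p_prev & H_prev))    # Ling pseudo-carries
    s = 0
    for i in range(n):
        a_i, b_i = (a >> i) & 1, (b >> i) & 1
        if i == 0:
            bit = a_i ^ b_i                   # no carry into bit 0
        elif i in pseudo_bits:
            bit = sum_from_H(a_i, b_i, p[i - 1], H[i - 1])
        else:
            bit = sum_from_G(a_i, b_i, G[i - 1])
        s |= bit << i
    return s
```

For example, `mixed_sum(0xDEADBEEF, 0xCAFEBABE, 32, set(range(4, 32, 4)))` equals `(0xDEADBEEF + 0xCAFEBABE) & 0xFFFFFFFF`, because $p_{i-1} H(0,i-1)$ reproduces $G(0,i-1)$ at every position that takes a pseudo-carry.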
Fig. 3 Transistor-level schematic of H(0,3): (a) conventional and (b) MPK/SSI, MPK/DSI, realized in compound domino logic.
Fig. 4 Transistor-level schematic of (a) I(0,3) and (b) H(0,15), implemented in compound domino with stack height reduction.
Fig. 5 DCVSL-based one-bit full adder (DCVSL_SUM) cell with a PMOS sleep transistor.
Fig. 6 Comparison of the two adder cells in terms of (i) total number of transistors, (ii) average propagation delay (ns), and (iii) leakage current (nA).
VI. RESULTS AND DISCUSSION
We implemented H(0,3) with three different domino configurations (Conventional, MPK/SSI and MPK/DSI) and compared them in terms of propagation delay (Tprop) for the worst-case input vector, power in the evaluation phase (Pe) measured after disconnecting the evaluation network from the supply, and power in the pre-charge phase (Pp) measured after disconnecting the evaluation network from the output, as shown in Table I. The results show that MPK/SSI offers good power-delay performance. MPK/DSI performs better in terms of speed and static power consumption, but it consumes more active power due to clock switching. We also implemented the 16-bit QCLA using the three configurations; the results are given in Table II. For the 16-bit QCLA, the power consumption during the pre-charge phase is nearly the same for the three configurations, but there is a significant difference in power consumption during the evaluation phase, which is of interest.
We implemented a 32-bit adder using the cells described above (H(0,3), I(0,3) and H(0,15)) in the MPK/SSI configuration, with the DCVSL-based sum block (Fig. 5) at the output to generate the sum and with CLK as the sleep signal. Since we generate a mix of carries and pseudo-carries using the H(0,3) and H(0,15) cells, we are able to minimize the complexity of generating the sum with our DCVSL_SUM cell, which gives a better trade-off between power and delay. For this 32-bit MPK/SSI QCLA, the worst-case delay is 220 ps and the power consumed in the evaluation phase is 186 µW. In our implementation, the blocks that receive external inputs are footed to handle the non-monotonicity of external inputs. We assume that the least significant bits (LSBs) are more active than the most significant bits, so the LSBs are always kept away from the output in order to reduce unnecessary discharges of the internal nodes.

TABLE I
RESULTS OF H(0,3) CELL FOR DIFFERENT ARCHITECTURES
Architecture | Tprop (ps) | Pe, evaluation (µW) | Pp, pre-charge (nW)
Conventional | 72 | 21 | 10
MPK/SSI | 65 | 11.6 | 11.7
MPK/DSI | 63 | 6.4 | 6.3

TABLE II
RESULTS OF 16-BIT QCLA FOR DIFFERENT ARCHITECTURES
Architecture | Tprop (ps) | Pe, evaluation (µW) | Pp, pre-charge (nW)
Conventional | 164 | 65 | 42
MPK/SSI | 137 | 50 | 41
MPK/DSI | 135 | 43 | 41

VII. CONCLUSION
We implemented a 16-bit QCLA with the conventional, MPK/SSI and MPK/DSI configurations and observed that MPK/SSI is better in terms of power dissipation and propagation delay. Hence, we implemented a 32-bit QCLA adder with MPK/SSI compound domino logic with stack height reduction, and designed a novel DCVSL-based cell to generate the sum at the output. We minimized the complexity by generating a mix of carry and pseudo-carry terms with the H(0,3) and H(0,15) cells and by computing the sum with the DCVSL_SUM block, which resulted in reduced power and delay.

ACKNOWLEDGMENT
This work was funded by the Indo-French Centre for the Promotion of Advanced Research. The authors thank the Laboratoire d'Électronique et de Technologie de l'Information (LETI) of the Commissariat à l'Énergie Atomique (CEA), Grenoble, France, for generously providing their circuit models for double-gate MOSFETs.
REFERENCES
[1] S. Borkar, "Design perspectives on 22nm CMOS and beyond," in Proc. IEEE/ACM 46th Annual Design Automation Conference, San Francisco, CA, pp. 93-94, July 2009.
[2] A. Amara and O. Rozeau (eds.), Planar Double-Gate Transistor: From Technology to Circuit. Dordrecht: Springer, 2009.
[3] M. Vinet et al., "Bonded planar double-metal-gate NMOS transistors down to 10 nm," IEEE Electron Device Lett., vol. 26, no. 5, pp. 317-319, May 2005.
[4] C.-H. Lin, P. Su, Y. Taur, X. Xi, J. He, A. M. Niknejad, M. Chan, and C. Hu, "Circuit performance of double-gate SOI CMOS," in Proc. Semiconductor Device Research Symposium, pp. 266-267, Dec. 2003.
[5] B. Giraud, A. Amara, and A. Vladimirescu, "A comparative study of 6T and 4T SRAM cells in double-gate CMOS with statistical variation," in Proc. IEEE Int. Symp. Circuits and Systems, pp. 3022-3025, May 2007.
[6] J. M. Rabaey, A. Chandrakasan, and B. Nikolic, Digital Integrated Circuits: A Design Perspective, 2nd ed. Englewood Cliffs, NJ: Prentice Hall, 2003.
[7] P. Royannez and A. Amara, "A 1.0 ns 64-bit GaAs adder using quad tree algorithm," in Proc. 6th Great Lakes Symposium on VLSI, pp. 24-28, Mar. 1996.
[8] H. Ling, "High-speed binary adder," IBM J. Res. Dev., vol. 25, no. 3, p. 156, May 1981.
[9] R. W. Doran, "Variants of an improved carry-lookahead adder," IEEE Trans. Computers, vol. 37, pp. 1110-1113, Sep. 1988.
[10] R. Zlatanovici, S. Kao, and B. Nikolic, "Energy-delay optimization of 64-bit carry-lookahead adders with a 240 ps 90 nm CMOS design example," IEEE J. Solid-State Circuits, vol. 44, pp. 569-583, Feb. 2009.
[11] M. H. Chiang, K. Kim, C. Tretz, and C. T. Chuang, "Novel high-density low-power high-performance double-gate logic technique," in Proc. IEEE Int. SOI Conf., pp. 122, Oct. 2004.
[12] M.-H. Chiang, "High-density reduced-stack logic circuit techniques using independent-gate controlled double-gate devices," IEEE Trans. Electron Devices, vol. 53, pp. 2370-2377, Aug. 2006.
[13] H. Mahmoodi et al., "High-performance and low-power domino logic using independent gate control in double-gate SOI MOSFETs," in Proc. IEEE Int. SOI Conf., pp. 67-68, Oct. 2004.
[14] S. Naffziger, "A sub-nanosecond 0.5 µm 64b adder design," in Int. Solid-State Circuits Conf., pp. 210-211, Feb. 1996.
[15] J. Park, H. C. Ngo, J. A. Silberman, and S. H. Dong, "470 ps 64-bit parallel binary adder," in Symp. VLSI Circuits, pp. 192-193, Jun. 2000.
[16] K. M. Chu et al., "A comparison of CMOS circuit techniques: Differential cascode voltage switch logic versus conventional logic," IEEE J. Solid-State Circuits, vol. 22, pp. 528-532, Aug. 1987.
[17] K. S. Yeo and K. Roy, Low-Voltage, Low-Power VLSI Subsystems. New York: McGraw-Hill, 2005.
