Abstract
A high-speed dual-phase domino circuit design technique with
high performance and reliable operation is proposed, together
with a cell-based automatic synthesis flow that supports the
quick design of high-performance chips. A test chip containing
a dual-phase 64-bit high-speed multiplier with a built-in
performance adjustment mechanism has been successfully
validated in TSMC 0.18 µm CMOS technology. The test chip
shows a 2.7x performance improvement over a conventional
static CMOS logic design. In addition, a cell-based
synthesizable CAD design flow that takes skew tolerance into
account has been established. A latched domino cell library
that alleviates noise, charge sharing, and crosstalk was also
developed to support the proposed design flow. Finally, a
built-in performance adjustment mechanism is incorporated
into the design. This mechanism allows performance to be
adjusted after chip fabrication while accounting for clock
skew.
Keywords: pipelined domino circuits, performance
adjustment
I. Introduction
Domino logic offers a smaller circuit area and a higher
operating speed than complementary (static) CMOS logic. In a
conventional domino circuit, single-phase function evaluation
occupies only half of a clock cycle. Every internal gate may
make only a rising signal transition during evaluation;
falling transitions are not permitted. Conventional domino
logic can therefore implement only positive (non-inverting)
functions. For all domino gates to operate correctly, their
fan-in signals must make only low-to-high (rising)
transitions during the evaluation phase. Inverting functions
are conventionally handled by bubble (inverter) pushing,
which greatly increases the circuit area. Table 1 compares
the resulting gate counts of static and domino implementations
of the ISCAS-85 benchmark circuits.
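The bubble-pushing idea described above can be sketched in a few lines of Python. This is an illustrative model only, not the synthesis tool used in this work: the tuple encoding of expressions and the function names push_bubbles, evaluate, and inversions_only_on_inputs are assumptions made for the example. Inversions are moved toward the primary inputs with De Morgan's laws so that the remaining network contains only non-inverting AND/OR gates.

```python
# Expressions are nested tuples: ("and", a, b), ("or", a, b), ("not", a),
# or a plain string naming a primary input. This encoding is hypothetical.

def push_bubbles(expr, negate=False):
    """Return an equivalent expression with inversions only on primary inputs."""
    if isinstance(expr, str):
        return ("not", expr) if negate else expr
    op = expr[0]
    if op == "not":
        # Absorb the inverter by flipping the pending negation.
        return push_bubbles(expr[1], not negate)
    if negate:
        # De Morgan: NOT(a AND b) = NOT a OR NOT b, and vice versa.
        op = "or" if op == "and" else "and"
    return (op,) + tuple(push_bubbles(arg, negate) for arg in expr[1:])

def evaluate(expr, env):
    """Evaluate an expression for a dict mapping input names to booleans."""
    if isinstance(expr, str):
        return env[expr]
    op = expr[0]
    if op == "not":
        return not evaluate(expr[1], env)
    values = [evaluate(arg, env) for arg in expr[1:]]
    return all(values) if op == "and" else any(values)

def inversions_only_on_inputs(expr):
    """True if every 'not' sits directly on a primary input."""
    if isinstance(expr, str):
        return True
    if expr[0] == "not":
        return isinstance(expr[1], str)
    return all(inversions_only_on_inputs(arg) for arg in expr[1:])

# Example: f = NOT(a AND b) OR c
f = ("or", ("not", ("and", "a", "b")), "c")
g = push_bubbles(f)
print(g)  # ('or', ('or', ('not', 'a'), ('not', 'b')), 'c')
```

After the transformation, the internal network is monotonic in the literals a, NOT a, b, NOT b, and c. In a real netlist, a node that is needed in both polarities has to be duplicated, which is a commonly cited source of the gate-count growth that bubble pushing causes.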
Table 1 Gate count comparison of static and domino circuits

ISCAS-85             Static        Dynamic       Gate increase
benchmark circuit    # of gates    # of gates    ratio
C17                  3             13            333.33%
C432                 112           260           132.14%
C499                 312           392           25.64%
C1355                306           416           35.95%
C1908                279           369           32.26%
C3540                463           1085          134.34%
C6288                1651          2789          68.93%
Average              446.57        760.57        70.32%
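The gate increase ratio in Table 1 is consistent with the relative growth (dynamic - static) / static. The short Python check below recomputes it from the gate counts in the table; the variable and function names are chosen for illustration and do not come from the original work.

```python
# (static gates, dynamic gates) per row of Table 1, in table order.
TABLE_1 = [
    (3, 13),
    (112, 260),
    (312, 392),
    (306, 416),
    (279, 369),
    (463, 1085),
    (1651, 2789),
]

def gate_increase_ratio(static_gates, dynamic_gates):
    """Relative gate-count growth of the domino version, in percent."""
    return 100.0 * (dynamic_gates - static_gates) / static_gates

# Per-circuit ratios.
for static_gates, dynamic_gates in TABLE_1:
    print(f"{static_gates:5d} {dynamic_gates:5d} "
          f"{gate_increase_ratio(static_gates, dynamic_gates):7.2f}%")

# Averages over the seven circuits.
avg_static = sum(s for s, _ in TABLE_1) / len(TABLE_1)
avg_dynamic = sum(d for _, d in TABLE_1) / len(TABLE_1)
print(f"{avg_static:.2f} {avg_dynamic:.2f} "
      f"{gate_increase_ratio(avg_static, avg_dynamic):.2f}%")
```

The per-circuit values reproduce the table. The ratio of the two averages comes out near 70.31%, which suggests that the 70.32% in the Average row is the growth of the average gate count (up to rounding) rather than the mean of the per-circuit ratios.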
[Figure: domino gate schematics. Labels: Mp, Mn, Vdd, Vx, Vout, Co, In1 to Ink, Phi, Embedded Latch; precharge phase, evaluate phase, store phase.]
[Figure: (a) Original gate circuit. (b) Proposed gate circuit. (c) Domino latch transistor schematic circuit.]
[Figure: dual-phase pipeline with CLK1, CLK2, and Dynamic + Latch stages G1 and G2. Note in figure: the next stage gets the data after the previous stage is stable.]
[Figure: design flow, Steps 1 to 7. Labels: RTL code; Synopsys presynthesis and optimization; UNGROUP FF and logic gate; adding latch to resolve the inverting function problem; data retiming (synchronous process by adding latches); clock phase assignment (assigns a suitable clock phase to each logic gate); post-synthesis on Synopsys.]
[Figure: DCPG with coarse delay and fine delay stages and multiplexers. Labels: IN, OUT, DCPG_IN0 to DCPG_IN6, DCPG_OUT, phi0, phi1, (Pd+Td) odd stage, (Pd+Td) even stage.]
[Figure: test chip block diagram, Blocks 1 to 3. Labels: ADPLL, LFSR, DCPG, 32x32 bit High Speed Multiplier, ORA, Mechanism (I), Mechanism (II); signals include TPI [63:0], result [63:0], ora_out [63:0], DCPG_IN [6:0], DIV_number [9:0], count [7:0], CLK_SEL, ck1_out, ck2_out.]
V. Conclusion
A dual-phase high-speed domino circuit design technique with
an in-house, cell-based automatic EDA synthesis flow has been
proposed. The clock phase assignment and retiming processes
maintain correct circuit operation. The clock cycle
adjustment mechanism works with the BIST circuit so that the
performance of a high-speed chip can be adjusted after
manufacturing. A test chip containing a dual-phase 64-bit
high-speed multiplier with the performance adjustment
mechanism has been successfully validated.