Escolar Documentos
Profissional Documentos
Cultura Documentos
Winter 2004
Clocking
Website: http://vlsicad.ucsd.edu/courses/ece260b-w04
Combinational
Combinational Combinational logic
logic logic
clk
clk clk
Comb Comb
register
register
register
Logic Logic
register
register
register
register
CL
CL CL
CL CL
CL
A+B
A+B AA BB
100.00
386
20.00
0.00
1982 1987 1993 1998 2004
Year
Q1 Q2
Tmax + Tsetup ≤ T
Q1 Q2
Q1 Q2
critical path,
Tclock1 ~5 logic levels Tclock2
clock
ECE 260B – CSE 241A Clocking 10
Courtesy K. Keutzer et al. UCB 10
Andrew B. Kahng, UCSD
Cycle Time – Flip-Flop Delay (Clock to Q)
Q1 Q2
Q1 Q2
hold time –
D stable
after clock
When signal
may change
FlipFlop
Comb
Comb
Logic
Logic
tper
i Comb o
regA
regB
Comb
Logic
Logic
tpdmax
tskew
Ck Too late!
Ck’
i
Courtesy K. Yang, UCLA tpdmax
o
ECE 260B – CSE 241A Clocking 15 Andrew B. Kahng, UCSD
Example of tpdmin Violation: Race Through
Suppose clock skew causes regA to be clocked before regB
“i” passes through the CL with little delay (tpdmin)
“o” arrives before the rising Ck’ causes the data to be latched
Cannot be fixed by changing frequency have rock instead of chip
Ck Ck’
i o
Comb
regA
regB
Comb
Logic
Logic
tpdmin
tskew
Ck
Ck’ Too early!
i tpdmin
o
Courtesy K. Yang, UCLA
ECE 260B – CSE 241A Clocking 16 Andrew B. Kahng, UCSD
Time Borrowing (Cycle Stealing, Useful Skew)
FlipFlop
FlipFlop
Comb
Comb
Logic
Logic
Subject to:
Process variation from lot-to-lot
Process variation across the die
Radically different loading (ff density) around the die
Metal variation across the die
Power variation across the die (both static IR and dynamic)
Coupling (same and other layers)
ECE 260B – CSE 241A Clocking 20 Andrew B. Kahng, UCSD
Issues in Clock Distribution Network Design
Skew
Process, voltage, and temperature
Data dependence
Noise coupling
Load balancing
Power, CV2f
Clock gating
Flexibility/Tunability
Compactness – fit into existing layout/design
Reliability
Electromigration
Sylvester
ECE/ 260B
Shepard, 2001
– CSE 241A Clocking 23 Andrew B. Kahng, UCSD
Clock Distribution Methods
RC-Tree Grids
Less capacitance Reliable
More accuracy Less data dependency
Flexible wiring Tunable (late in design)
Sylvester
ECE/ 260B
Shepard, 2001
– CSE 241A Clocking 25 Andrew B. Kahng, UCSD
H-Tree
H-tree (Bakoglu)
One large central driver, recursive structure to
match wirelengths
Halve wire width at branching points to reduce
reflections
Disadvantages
Slew degradation along long RC paths
courtesy of P. Zarkesh-Ha
Unrealistically large central driver
- Clock drivers can create large temperature
gradients (ex. Alpha 21064 ~30° C)
Non-uniform load distribution
Inherently non-scalable (wire R growth)
Partial solution: intermediate buffers at branching
points
Sylvester
ECE/ 260B
Shepard, 2001
– CSE 241A Clocking 26 Andrew B. Kahng, UCSD
Buffered H-tree
Advantages
Ideally zero-skew
Can be low power (depending on skew requirements)
Low area (silicon and wiring)
CAD tool friendly (regular)
Disadvantages
Sensitive to process variations
- Devices Want same size buffers at each level of tree
- Wires Want similar segment lengths on each layer in each source-sink
path !!!
Local clocking loads inherently non-uniform
Sylvester
ECE/ 260B
Shepard, 2001
– CSE 241A Clocking 27 Andrew B. Kahng, UCSD
Tree Balancing
Sylvester
ECE/ 260B
Shepard, 2001
– CSE 241A Clocking 28 Andrew B. Kahng, UCSD
Examples From Processor Chips
output mesh
1u m5 ribs every 20 - 30 u
(4 to 6 rows)
4 2
v
6 4
s0 v b 6
Topology
a b
s1 s2 s3 s4
ECE 260B – CSE 241A Clocking 41 Andrew B. Kahng, UCSD
BST-DME Bottom-Up Phase
s0 v
Topology
a b
B=4 s2 s1 s2 s3 s4
s0
s3
s1 a v b
s4
ECE 260B – CSE 241A Clocking 43 Andrew B. Kahng, UCSD
Outline
Why Clocking
Clock Distribution
The “Zero-Skew Tree” Problem
The “Useful” Skew Problem
The Timing Analysis Problem
D : longest path FF
FF
d : shortest path
permissible range
W. Dai,
ECEUC260B
Santa Cruz241A Clocking 45
– CSE Andrew B. Kahng, UCSD
“Useful Skew” Design Robustness
FF FF 6 ns FF T = 6 ns
2 ns
4 0
“0 0 0”: at verge of violation
4 0
“2 0 2”: more safety margin
2 -2
W. Dai,
ECEUC260B
Santa Cruz241A Clocking 46
– CSE Andrew B. Kahng, UCSD
Constraints on Skews
FFi receives clock signal delayed by xi ≥ MIN_DEL
0 < α ≤ 1 ≤ β : if nominal clock delay is xi, then actual clock delay
must fall within interval αxi ≤ x ≤ βxi
For FF to operate correctly when clock edge arrives at time x, the
correct input data must be present and stable during the time
interval (x – SETUP, x + HOLD)
For 1 ≤ i,j ≤ L (#FFs), we compute lower and upper bounds MIN(i,j)
and MAX(i,j) for the time that is required for a signal edge to
propagate from FFi to FFj
Avoid zero-clocking
βxj + SETUP + MAX(i,j) ≤ αxj + P; P = clock period
LP_SAFETY (robustness):
Maximize M s.t.
αxj - βxj – M ≥ HOLD – MIN(i,j)
αxi– βxj – M ≥ SETUP + MAX(i,j) – P
xi ≥ MIN_DEL
Notes
- J. P. Fishburn, “Clock Skew Optimization”, IEEE Trans. Computers 39(7) (1990), pp. 945-951.
- T. G. Szymanski, “Computing Optimal Clock Schedules”, Proc. DAC, June 1992, pp. 399-404.
- Useful Skew optimization is similar to Retiming optimization
- Peak current reductions are a side benefit
Combinational
Combinational Combinational logic
logic logic
clk
clk clk
original circuit
Combinational
logic
extracted block
Courtesy K. Keutzer et al. UCB
ECE 260B – CSE 241A Clocking 50 Andrew B. Kahng, UCSD
Key = Delays Through Combinational Block
X
A(X) dx → z
A(Z)
Z
A(Y) dY → z
Y
A( υ) = max (A(u) + du →υ )
u∈FI( υ )