
Clock Concurrent Optimization

Rethinking Timing Optimization to Target Clocks and Logic at the Same Time
Paul Cunningham, Steev Wilcox, Marc Swinnen
February 2009
Introduction
Ten years ago, the EDA industry faced a crippling divergence in timing between RTL synthesis and placement
caused by rapidly rising wire capacitances relative to gate capacitances. Without some reasonable level of
placement knowledge to give credible estimates of wire length it was becoming impossible to measure design
timing with any accuracy during RTL synthesis. At this time, placement tools were not directly aware of
timing and focused instead on metrics indirectly related to timing such as total wire length. As chip designs
scaled to deep submicron geometries (180nm and 130nm), the change in timing around placement became
so significant and unpredictable that even manual iterations between synthesis and placement no longer
converged. The solution was to reinvent placement, making it both directly aware of timing and also weaving
in many of the logic optimization techniques exploited during RTL synthesis, for example gate sizing and net
buffering. This process was not easy, and ultimately saw a major turnover in the back-end design tool
landscape as a new generation of physical optimization tools was developed, released and proliferated
throughout the chip design community.
Today timing is diverging once again, but for a different set of reasons (on-chip variation, low power, and
design complexity) and at a different point in the design flow (CTS). While this divergence has so far received
little media attention, this paper shows that the divergence is severe, so much so that we believe it is having
a critical impact on the economic viability of migration to the 32nm process node. Clock concurrent
optimization is a revolutionary new approach to timing optimization which comprehensively addresses this
divergence by merging physical optimization into CTS and simultaneously optimizing both clock delays and
logic delays using a single unified cost metric.
This paper begins with a brief overview of some basic concepts in clock-based design, and a brief overview of
the traditional role of CTS within digital design flows. It then explains why and by how much design timing is
diverging around CTS. The concept of clock concurrent optimization is then introduced and its key defining
features outlined. The paper concludes with a summary of the key benefits of clock concurrent optimization
and an explanation of why it comprehensively addresses the divergence in design timing around CTS.
A Brief Overview of Clock-Based Design
Setup and Hold Constraints
Clocking was one of the great innovations which enabled the semiconductor industry to progress to where it
is today. Clocking elegantly quantizes time, enabling transistors to be abstracted to sequential state machines
and from state machines into a simple and intuitive programming paradigm for chip design, the register
transfer language (RTL).
© 2009-2010 Azuro, Inc.

The fundamental assumption made by sequential state machines, and hence by any RTL specification, is the
assumption of sequential execution, i.e. that all parts of a state machine stay in step with respect to each
other. This assumption translates to a set of constraints on delay which must be met by any clock-based
design if it is to function correctly. These constraints split into two classes:
Constraints which ensure that every flip-flop always makes a forward step from state n to state n+1
whenever the clock ticks. These constraints are typically referred to as setup constraints.
Constraints which ensure that no flip-flop ever makes more than one forward step, i.e. from state n to state
n+2, on a single clock tick. These constraints are typically referred to as hold constraints.

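Setup constraint: $L + G_{\max} < T + C$
Hold constraint: $L + G_{\min} > C$

[Figure 1: Setup and hold constraints in clock-based design. A clock with period T reaches flip-flop A
through launch clock delay L and flip-flop B through capture clock delay C; the logic between A and B has
minimum delay $G_{\min}$ and maximum delay $G_{\max}$.]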
One setup and one hold constraint are required for every pair of flip-flops in a design which have at least one
functional logic path between them. Figure 1 summarizes the setup and hold constraints for a pair of
flip-flops, A and B, triggered by a clock with period T. The clock delay to A is denoted by L for launching clock
and the clock delay to B is denoted by C for capturing clock. $G_{\min}$ and $G_{\max}$ denote the minimum
and maximum logic path delays between the two flip-flops. For simplicity, and because it makes no difference
to the arguments we make in this paper, we have assumed the setup time, hold time, and clock-to-Q delay for
the flip-flops are all zero.
The setup constraint is read as follows: the worst-case time taken for a clock tick to reach A, and propagate a
new value to the input of B, must be less than the time taken for the next clock tick to reach B. If this isn't true
then it is possible that B will be clocked when its input does not yet hold the correct next-state value.
The hold constraint is read as follows: the best-case time taken for a clock tick to reach A, and propagate a
new value to the input of B, must be greater than the time taken for that same clock tick to reach B. If this isn't
true then it is possible that a next-state value on the input to A may propagate all the way through to the
output of B in one clock tick, in essence causing B to skip erroneously from state n to state n+2 in one clock
cycle.
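
As a minimal sketch of the two constraints just described, the check below evaluates the setup and hold
conditions for one pair of flip-flops. The class name `PairTiming`, the function names, and the delay values
are illustrative assumptions, not taken from any tool; setup time, hold time, and clock-to-Q delay are taken
as zero, as in the text.

```python
from dataclasses import dataclass


@dataclass
class PairTiming:
    """Delays for one launch/capture flip-flop pair (hypothetical container)."""
    launch: float   # L: clock delay to the launching flip-flop A
    capture: float  # C: clock delay to the capturing flip-flop B
    g_min: float    # minimum logic path delay from A to B
    g_max: float    # maximum logic path delay from A to B


def setup_met(pair: PairTiming, period: float) -> bool:
    # Setup: L + G_max < T + C
    return pair.launch + pair.g_max < period + pair.capture


def hold_met(pair: PairTiming, period: float) -> bool:
    # Hold: L + G_min > C (the period does not appear in the hold constraint)
    return pair.launch + pair.g_min > pair.capture


if __name__ == "__main__":
    balanced = PairTiming(launch=1.0, capture=1.0, g_min=0.2, g_max=0.9)
    late_capture = PairTiming(launch=1.0, capture=1.5, g_min=0.2, g_max=0.9)
    for name, pair in [("balanced", balanced), ("late capture", late_capture)]:
        print(name, "setup:", setup_met(pair, 1.0), "hold:", hold_met(pair, 1.0))
```

Delaying the capture clock relaxes the setup constraint but tightens the hold constraint, which is why both
must be checked together.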
Ideal and Propagated Clocks Timing
In the context of modern digital chip design flows, the setup and hold constraints outlined above are referred
to as a propagated clocks model of timing since the constraints start from the root of the clock and include
the time taken for the clock edge to propagate through the clock tree to each flip-flop. The propagated clocks
model of timing is the definitive criterion for correct chip function, and is the one used by timing signoff tools
in design flows.
An ideal clocks model of timing simplifies the propagated clocks model of timing by assuming that the launch
and capture clock paths have the same delay, i.e. that L=C. In this case the setup and hold constraints simplify
significantly:
          Propagated Clocks                          Ideal Clocks
Setup:    $L + G_{\max} < T + C$    (assume L=C)     $G_{\max} < T$
Hold:     $L + G_{\min} > C$        (assume L=C)     $G_{\min} > 0$
Since $G_{\min} > 0$ is to a first approximation always true, assuming that L=C simplifies the entire problem of
ensuring that a clock-based design will function correctly to $G_{\max} < T$. In this universe there is no need to
worry about clock delays or about minimum logic delays. All that matters is making sure that the maximum
logic path delay in the design, typically referred to as the critical path, is faster than the clock period. In
essence, clocks have been canceled out of the timing optimization problem.
The concept of ideal clocking is so dramatic and powerful as to have enabled an entire ecosystem of
front-end engineers and design tools living in a world of ideal clocks, and the origins of ideal clocking are so
deep rooted in the history books of the semiconductor industry that clock-based design is itself often referred
to as "synchronous design" even though there is nothing fundamentally synchronous about clock-based
design itself!
Clock Skew and Clock Tree Synthesis
If chip design begins in a world where clocks are ideal but ends in a world where clocks are propagated it
follows that at some point in the design flow a transition must be made between these two worlds. This
transition happens at the clock tree synthesis (CTS) step in the flow where clocks are physically built and
inserted into a design: see Figure 2.
Since the ideal clocks model assumes L=C for all setup and hold constraints it follows that the traditional
purpose of CTS is to build clocks such that L=C. If this can be achieved then propagated clocks timing will
match ideal clocks timing and the design flow will be convergent.
If a clock tree has n sinks and a set of paths P[1] to P[n] from its source to each sink then the skew of that
clock is defined as the difference between the shortest and longest of these paths: see Figure 3.
Mainstream CTS tools are architected primarily to build highly efficient balanced buffer trees to a very large
number of sinks with a small skew. Twenty years ago, the motivation and benefits of building balanced clocks
were clear: clock skew was an upper bound on the worst difference between L and C for any pair of flip-flops,
i.e. an upper bound on |L - C| for any possible setup or hold constraint which could apply to a design. If a small
skew could be achieved relative to the clock period then a high degree of similarity between ideal and
propagated clocks timing was guaranteed. But it is important to remember that clock skew and the worst
difference between L and C are not the same thing and that for a modern SoC design at nanometer process
nodes it is entirely possible (in fact very common) for |L - C| to be significantly greater than the clock skew.
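
The historical argument can be sketched in a few lines: when both L and C are simply source-to-sink delays of
the same buffer tree, skew bounds |L - C| for every constrained pair. The function names, sink names, and
delay values below are illustrative assumptions; the sketch deliberately models only that simple case and
says nothing about situations where L or C is not a plain sink delay of one tree.

```python
def skew(sink_delays: dict[str, float]) -> float:
    """Skew = longest minus shortest source-to-sink path delay."""
    values = list(sink_delays.values())
    return max(values) - min(values)


def worst_launch_capture_gap(sink_delays: dict[str, float],
                             pairs: list[tuple[str, str]]) -> float:
    """Worst |L - C| over the (launch, capture) pairs that share a logic path."""
    return max(abs(sink_delays[a] - sink_delays[b]) for a, b in pairs)


if __name__ == "__main__":
    delays = {"A": 1.00, "B": 1.10, "C": 1.25}   # hypothetical sink delays
    constrained = [("A", "B"), ("B", "C")]        # hypothetical logic paths
    print("skew:", skew(delays))
    print("worst |L - C|:", worst_launch_capture_gap(delays, constrained))
```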




[Figure 2: Traditional balanced clocks design flow. Flow steps: RTL, Synthesis, Floorplan, Initial Placement,
Physical Optimization, Clock Tree Synthesis, Post-CTS Optimization, Routing, Post-Route Optimization,
Signoff Verification, Final Layout. The steps before Clock Tree Synthesis use ideal clocks timing; the steps
after it use propagated clocks timing.]

$$\text{Skew} = \max(P[1], P[2], \ldots, P[n]) - \min(P[1], P[2], \ldots, P[n])$$

[Figure 3: Skew of a Clock Tree. A clock source drives sinks through paths P[1], P[2], ..., P[n].]

The distinction between clock skew and |L - C| is a critical foundation stone for this paper. Clock skew is a
concept defined in terms of worst differences in delay between source-to-sink paths in buffer trees. L and C
are delay variables in the setup and hold constraints of a propagated clocks model of timing and are not really
different in this context from the other delay variables, $G_{\min}$ and $G_{\max}$. The somewhat slippery
nature of this distinction and the ease with which a discussion can begin in the context of timing and then
migrate mistakenly into a context of skew is one of the primary reasons why we believe the divergence in design
timing around CTS has for so long gone relatively unnoticed by the chip design and EDA communities.
The purpose of this paper is not to argue that tight skews cannot be achieved for modern nanometer designs.
Nor is it to argue that the skew minimization techniques used by mainstream CTS tools no longer work for
modern nanometer designs. The purpose of this paper is to argue that the ability of tight clock skews to bind
ideal clocks timing to propagated clocks timing is broken; we estimate it broke in a commercial sense around
the 65nm node. No tweak or refinement to the definition of skew can fix this. The only solution is to give up
entirely on the concept of skew and focus CTS instead on the fundamental propagated clocks timing
constraints that will matter post-CTS in the flow. But in this context there is no longer any material distinction
between clock paths (L and C) and logic paths ($G_{\min}$ and $G_{\max}$). Constructively exploiting this
observation is the inspiration behind the clock concurrent approach to optimization.
The Clock Timing Gap
There is no question that clock-based design, ideal clocks timing, the use of RTL to specify a chip, and the
concepts of front-end vs. back-end design are all vital foundation stones for the continued success of the
semiconductor industry: their collective ability to enable design automation and streamline engineering
productivity would almost certainly be impossible to achieve by any other means.
However, the traditional role of CTS to build buffer trees with tight skew only makes sense if achieving these
tight skews reasonably binds ideal clocks timing to propagated clocks timing. If this is not the case then
timing decisions made using ideal clocks have only limited value. Accommodating a change in timing
landscape after CTS requires either accepting degradation in chip speed or accepting delay in time to market
due to increased iterations back to RTL synthesis and physical optimization. If it can be argued that the
divergence between ideal clocks and propagated clocks is both significant and fundamental, i.e. one where
there could never be any formula or metric which could bind them, then the only solution becomes to rethink
CTS as a timing optimization step in the flow which directly targets propagated clocks timing as its objective.
In this section we attempt to define and measure the magnitude of the gap between ideal and propagated
clocks timing. For a particular timing constraint i with launch clock delay L[i], capture clock delay C[i],
minimum and maximum logic path delays $G[i]_{\min}$ and $G[i]_{\max}$, the difference between ideal and
propagated clocks timing for that constraint can easily be seen to be the magnitude of C[i] - L[i]. For example
if i were a setup constraint then we have:
Propagated clocks timing: $L[i] + G[i]_{\max} < T + C[i] \;\Rightarrow\; G[i]_{\max} < T - (L[i] - C[i])$

Ideal clocks timing: $G[i]_{\max} < T$

Difference $= -(L[i] - C[i]) = C[i] - L[i]$
A similar reasoning gives the opposite, L[i] - C[i], for hold constraints. We define the clock timing gap for a
particular set of timing constraints (either setup or hold or a mixture of both) as:
$$\text{Clock Timing Gap} = \sigma\!\left(\frac{L[i] - C[i]}{T}\right)$$

Our choice of standard deviation, $\sigma$, on L[i] - C[i] rather than average or worst |L[i] - C[i]| is important:
we do not want to measure a large clock timing gap if the delta between ideal and propagated clocks timing is
systematic or applies only to a very small number of timing constraints. If this were the case then the gap
would not be a fundamental gap and could easily be worked around by applying a global safety margin (aka
global uncertainty) to the ideal clocks timing model or by manually applying a few individual sink pin offsets
to CTS. What we want to measure are true unsystematic divergences between ideal and propagated clocks
timing which apply to a significant proportion of the timing constraints. These divergences will never be
resolvable with a small amount of manual effort or with any generalizations to the concept of skew.
We divide L[i] - C[i] by the clock period T to normalize our metric so that it is expressed as a percentage of the
clock period. This enables us to meaningfully compare the average clock timing gap across a large number of
designs across a range of clock frequencies and process nodes.
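
The metric can be illustrated with a short sketch. It assumes a population standard deviation (the text does
not say whether a population or sample estimate is used), and the function name and all delay values are
hypothetical.

```python
from statistics import pstdev


def clock_timing_gap(constraints: list[tuple[float, float]], period: float) -> float:
    """Standard deviation of (L[i] - C[i]) / T over a set of timing constraints.

    `constraints` is a list of (launch_delay, capture_delay) pairs.
    Population standard deviation is an assumption of this sketch.
    """
    return pstdev([(launch - capture) / period for launch, capture in constraints])


if __name__ == "__main__":
    # A systematic offset: every constraint has L - C = 0.2, so the gap is ~0
    # and a single global margin would absorb it.
    systematic = [(1.2, 1.0), (2.2, 2.0), (0.7, 0.5)]
    # An unsystematic spread of the same magnitude produces a large gap.
    scattered = [(1.0, 1.0), (1.5, 1.0), (1.0, 1.5)]
    print("systematic:", clock_timing_gap(systematic, period=1.0))
    print("scattered: ", clock_timing_gap(scattered, period=1.0))
```

The first data set shows why standard deviation is the chosen statistic: a uniform offset between L and C
contributes nothing to the gap.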
Figure 4 below summarizes the average clock timing gap for the top 10% worst violated setup constraints
across a portfolio of over 60 real-world commercial chip designs from 180nm to 40/45nm and from 200k to
1.26M placeable instances. It shows that while at 180nm the clock timing gap is small at around 7% of the
clock period, at 40/45nm the gap has widened to around 50% of the clock period. A gap of this magnitude is
sufficient to completely transform the timing landscape of a design beyond recognition between before and
after CTS. Since our measure is one of standard deviation and not average or worst difference this gap truly is
a fundamental divergence which can only be addressed by a fundamental rethink of the role of CTS in the
design flow. Building clocks to meet a tight skew target no longer achieves its purpose, nor will any other
indirect metric ever bind ideal clocks timing to propagated clocks timing since the divergence is unsystematic
and large for a significant number of worst violating timing endpoints. The only solution is to directly target
the propagated clocks timing constraints and treat the launch and capture clock paths (L and C) as
optimization variables with the same significance and similar degrees of freedom to logic path variables
($G_{\min}$ and $G_{\max}$). This is what clock concurrent optimization is all about.
[Figure 4: Clock Timing Gap across a portfolio of over 60 commercial designs. Vertical axis: average clock
timing gap for the 10% worst violated setup constraints (0% to 60%). Horizontal axis: process node (180nm,
130nm, 65nm, 40/45nm).]



Explaining the Clock Timing Gap
There are three key underlying industry trends which are causing ideal and propagated clocks timing to
diverge, and it is the relatively simultaneous onset of all three trends that has caused the clock timing gap to
open up so dramatically at the 65nm node and below. These three trends are on-chip variation, clock gating,
and clock complexity.
On-Chip Variation
On-chip variation (OCV) is a manufacturing-driven phenomenon. Two wires or two transistors which are
designed to be identical almost certainly won't be once printed in silicon due to the lithographic challenges of
printing features smaller than the wavelength of light used to draw them. As a result the performance of two
supposedly identical transistors can differ by an unpredictable amount. This problem is a significant and
growing one, and at 45nm these random manufacturing variations can impact logic path delays by up to 20%.
OCV is particularly relevant for clock paths since the length of clock paths (i.e. the insertion delay of clock
trees) is rising exponentially with respect to clock periods. This is in part because the number of flip-flops in a
design continues to rise exponentially but also because resistances are rising so fast with successive process
shrinks that buffering across long distances, as is typically necessary in the clock, requires more and more
buffering per unit length. At 45nm it is not uncommon to see 3 to 5 times the clock period worth of delay in
launch and capture clock paths. Even if the impact of OCV is only 10% of path delay this still amounts to a
potential change in timing pictures of 30 to 50% of the clock period between ideal and propagated clock
models. The only reason why OCV has not already ground chip design completely to a halt is the fact that it
can be ignored on the common portion of the launch and capture clock paths using a technique known as
common path pessimism removal (CPPR) or clock reconvergence pessimism removal (CRPR): see Figure 5.

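Pair A to B (common clock segment PR):
Setup: $\mathrm{PR} + 1.1\,(\mathrm{RA} + \mathrm{AB}_{\max}) < T + \mathrm{PR} + 0.9\,\mathrm{RB}$
Hold: $\mathrm{PR} + 0.9\,(\mathrm{RA} + \mathrm{AB}_{\min}) > \mathrm{PR} + 1.1\,\mathrm{RB}$

Pair B to C (common clock segment PQ):
Setup: $\mathrm{PQ} + 1.1\,(\mathrm{QB} + \mathrm{BC}_{\max}) < T + \mathrm{PQ} + 0.9\,\mathrm{QC}$
Hold: $\mathrm{PQ} + 0.9\,(\mathrm{QB} + \mathrm{BC}_{\min}) > \mathrm{PQ} + 1.1\,\mathrm{QC}$

[Figure 5: Propagated clocks timing with 10% OCV derates and CPPR. Flip-flops A, B and C are connected by
logic paths AB and BC; the clock network segments are labelled PQ, PR, RA, RB, QB and QC.]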
CPPR is highly constraint dependent, impacting one pair of flip-flops completely differently from another
since it depends crucially on where in the clock tree the launch and capture clock paths converge for a
particular pair of flip-flops.
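
The constraint dependence of CPPR can be sketched numerically. The function below is a simplified
illustration, assuming a single late derate on the launch side and a single early derate on the capture side
(10% each, as in Figure 5), with the setup slack written as T + C - (L + G_max) per the setup constraint
defined earlier. The function name, parameter names, and delay values are hypothetical.

```python
def setup_slack_with_ocv(common: float, launch_branch: float, capture_branch: float,
                         g_max: float, period: float,
                         late: float = 1.1, early: float = 0.9,
                         cppr: bool = True) -> float:
    """Setup slack under OCV derates for one launch/capture pair.

    `common` is the clock delay shared by the launch and capture paths;
    the branch delays are the unshared remainders of each path.
    """
    if cppr:
        # With CPPR the shared clock segment is not derated.
        launch = common + late * (launch_branch + g_max)
        capture = common + early * capture_branch
    else:
        launch = late * (common + launch_branch + g_max)
        capture = early * (common + capture_branch)
    return period + capture - launch


if __name__ == "__main__":
    # Two pairs with identical total clock delay (zero skew) and identical logic delay.
    near = setup_slack_with_ocv(common=3.0, launch_branch=1.0, capture_branch=1.0,
                                g_max=0.5, period=1.0)
    far = setup_slack_with_ocv(common=0.5, launch_branch=3.5, capture_branch=3.5,
                               g_max=0.5, period=1.0)
    print("paths split late in the tree: ", near)
    print("paths split early in the tree:", far)
```

In this sketch both pairs have zero skew and the same ideal clocks slack, yet one meets setup and the other
fails, purely because of where their clock paths separate.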




A traditional measure of clock skew ignores OCV, so even if clock skew is zero, once OCV derates and CPPR
are applied to a design the magnitude of L - C can be large for a significant number of logic paths. Also, since
CPPR makes the impact of OCV on launch and capture clock paths constraint dependent, there is no
meaningful way to predict or model this impact prior to CTS. In this sense OCV modeling, most specifically the
use of CPPR, is a direct and fundamental contributor to the clock timing gap.
Clock Gating
Power has for many modern chip design teams become as significant an economic driver as chip speed and
area, and the clock network is often the biggest single source of dynamic power dissipation in a design. It has
become standard practice for almost all modern designs to manage clock power aggressively through the
extensive use of clock gating to shut down the clock to portions of a design which do not need to be clocked in
a particular clock cycle. Clock gating can be at a very high level, for example shutting down the entire
applications processor on a mobile phone when it is not needed, or at a very fine-grained level, for example
shutting down the top 8 bits of a 16-bit counter when they are not counting. Modern systems on a chip can
contain tens of thousands of clock gating elements.
From a timing perspective, a clock gate is just like a flip-flop, bringing with it its own setup and hold
constraint. But, unlike a flip-flop, clock gates exist inside a clock tree and not at its sinks: see Figure 6. In an
ideal clocks timing model a clock gate typically looks exactly the same as a flip-flop since the clock arrives
instantaneously everywhere (i.e. L=C=0), but in a propagated clocks timing model there can be a massive
difference between the launch and capture clock path delays for a clock gate, especially for architectural clock
gates high up in the clock tree.

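Setup constraint: $L + G_{\max} < T + C$
Hold constraint: $L + G_{\min} > C$

[Figure 6: Clock gate enable timing using a propagated clocks model of timing. A flip-flop with launch clock
delay L drives the enable (en) pin of a clock gate (CG) through logic with delays $G_{\min}$ and $G_{\max}$; the
capture clock delay to the clock gate is C.]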
Every clock gate added to a design adds to the clock timing gap on that design, and the more aggressive the
design team is in managing power, the more this gap is felt. This is because the most common workaround for
the pain caused by clock gating timing diverging post-CTS is to force all clock gates to the bottom of the clock
tree (by cloning them) so that the capture clock path delay is as close as possible to the launching clock
delay, thereby restoring convergence between ideal and propagated clocks timing on clock gates. This is,
however, the worst possible strategy for saving power.



Without insisting that all clock gates lie at the very bottom of the clock tree there is no meaningful way to
predict the relationship between L and C for clock gates, and therefore clock gating is also a direct and
fundamental contributor to the clock timing gap.
Clock Complexity
Modern system-on-a-chip (SoC) designs are big: tens of millions of gates. They also exploit extensive IP
reuse, where pseudo-generic modules such as ARM cores, USB interfaces, PCI interfaces, memory controllers,
DSPs, baseband modems, and graphics processors are instantiated on a single die, configured to run in
particular modes, and stitched together to deliver some form of integrated system capability.
The clock network in these SoCs is not simple. In fact it has become phenomenally complex, often with well
over a hundred interlinked clock signals that can be branched and merged thousands of times. Part of the
complexity is simply a result of stitching together the many IP blocks, but much of the complexity is also
inherent to each of these IP blocks: the same low-power imperatives which drive clock gating are also driving
the deployment of a wide variety of clocks and clocking schemes at a variety of frequencies and voltages to
further control and manage clock activity. Power consumption during built-in system test and time on the
tester during production further impact clock complexity as scan chains continue to be sliced and diced and
ever more intricate clocking schemes are used to capture and shift quickly and power efficiently.
The end result is a dense spaghetti network of clock muxes, clock XORs, and clock generators, entwined with
clock gating elements from the highest levels in the clock tree, where they shut down entire sub-chips, to the
lowest levels in the clock tree, where they may shut down only a handful of flip-flops: see Figure 7.

[Figure 7: Clock network on a modern nanometer SoC. Clock sources ClkA, ClkB, ClkC and ClkD feed a
network that includes an AOI2 cell and sink groups of 1,000, 10,000, 1,000, 2,000 and 500 flip-flops.]

In this world of vast clock complexity, the definition of clock skew is itself non-obvious: rather than a tree or
set of trees we have a network with hundreds of sources and hundreds of thousands of sinks. In such a world
it is easy, and indeed even common, to find oneself constructing scenarios where making L=C for all flip-flops
is mathematically impossible. And even in the cases where it is theoretically possible to achieve this objective
the sheer size of the clock delays that would be required to achieve this would be so large, e.g. 10+ times the
clock period, as to make timing impossible to meet even with the tiniest of OCV margins in place. The
resulting dynamic power consumption and IR drop would also almost certainly render the entire design
useless.
The only way to implement designs of this complexity using a traditional design flow is to spend months of
manual effort carefully crafting a complex set of overlapping balancing constraints, typically referred to as
skew groups, which require that CTS balance certain sets of clock paths with other sets of clock paths. It can
take hundreds of such skew groups to be carefully crafted before any reasonable propagated clocks timing
can be achieved, and such timing is never in practice achieved by ensuring that L=C for all pairs of flip-flops.
In essence, the impact of clock complexity on the clock timing gap has already broken the traditional role of
CTS in design flows, and design teams are already manually working around it at great cost by carefully
crafting highly complex sets of balancing constraints which are designed to achieve acceptable propagated
clocks timing and not L=C for all flip-flops.
Detailed Breakdown of the Clock Timing Gap
Figure 7 below shows the same clock timing gap graph as was shown in Figure 4 but further breaks out the
relative contributions of OCV, clock gating and inter-clock timing to this gap. The figure makes it clear that
each of the three trends contributes materially to the clock timing gap, and also highlights that clock
complexity has the most significant impact on this gap. The graph also highlights our observation that clock
skew, as traditionally defined only between registers on a single clock tree, is not broken. The clock timing
gap restricted only to setup and hold constraints between pairs of flip-flops in the same clock tree is at most
10% of the clock period, even at 40/45nm. But there are other factors ignored by the concept of clock skew
which are progressively eroding its ability to bind ideal to propagated clocks timing in the design flow.
[Figure 7: Breakdown of trends contributing to the Clock Timing Gap. Vertical axis: average clock timing gap
for the 10% worst violated setup constraints (0% to 60%). Horizontal axis: process node (180nm, 130nm,
65nm, 40/45nm). Series: reg-to-reg, clock gates, inter-clock, OCV.]
The message is simple: the role of CTS in the design flow must change. It can no longer be about minimizing
clock skews; it must somehow be about directly transitioning a design from ideal to propagated clocks timing
and using every possible trick there is to counteract the surprises that occur as the true propagated clocks
timing picture emerges, including clock gates, inter-clock paths and OCV margins.


Clock Concurrent Optimization
Before running CTS in the design flow there are no real launch and capture clock paths and timing
optimization focuses exclusively on the slowest logic path, $G_{\max}$, using an ideal clocks model of timing.
Provided clocks can be implemented such that the launch and capture clock paths are almost the same for all
setup and hold constraints, L=C, this focus on $G_{\max}$ makes sense. But, as this paper has shown, OCV, clock
gating, and clock complexity have made balancing L and C, or indeed determining any systematic
relationship between L and C, an impossible goal.
The only meaningful goal for clock construction must therefore be to directly target the propagated clocks
timing constraints, selecting L and C specifically for the purpose of delivering the best possible propagated
clocks timing picture post-CTS. But the best possible L and C depend on the values of $G_{\max}$ and $G_{\min}$,
so we have a chicken-and-egg problem: which comes first? Do we pick $G_{\max}$ and $G_{\min}$ then set L and
C, or vice versa?
Clock concurrent optimization, or CCOpt for short, is the term we use to describe a new class of timing
optimization tools that merge physical optimization with CTS and that directly control all four variables in the
propagated clocks timing constraint equations (L, C, $G_{\min}$, and $G_{\max}$) at the same time.

Figure 8 visualizes the conceptual distinction between a traditional approach to timing optimization and a
clock concurrent approach to timing optimization. We use the term "clock concurrent design" or "clock
concurrent flow" to refer to any design methodology or design flow employing the use of clock concurrent
optimization.

[Figure 8: Illustration of the difference between Physical Optimization and CCOpt. Traditional physical
optimization: the clock period T and the skew are fixed, $G_{\max}$ is variable, and the constraint is
$G_{\max} < T - \text{Skew}$. Clock concurrent optimization: T is fixed, while L, C and $G_{\max}$ are all variable,
and the constraint is $L + G_{\max} < T + C$.]

Logic Chains
Since CCOpt treats both clock delays and logic delays as flexible parameters, the maximum possible speed
that a chip can be clocked at is no longer limited by the slowest logic path in a design. CCOpt allows the
capture clock path to be longer than the launch clock path, in which case the logic path may have more than
the clock period to compute its result. But this extra time is not a free lunch: if C is bigger than L then time has
been borrowed from either the preceding or the subsequent pipeline stage: see Figure 9.
Such time borrowing is iterative across multiple logic stages: if time can be borrowed from logic stage n+1 to
logic stage n, then time can also be borrowed from logic stage n+2 to logic stage n+1 and then again from logic
stage n+1 to logic stage n, and so on both forwards and backwards from logic stage n: see Figure 10. However,
the time borrowing is not unlimited, and must stop either when the chain of logic stages loops back on itself
or when it reaches an IO to the chip: see Figure 11.
[Figure 9: Time borrowing.]

[Figure 10: Multi-stage time borrowing.]

[Figure 11: Different types of logic chain: Input Tadpole Chain, Output Tadpole Chain, Looping Chain, IO
Chain.]
In a world where launch and capture clock paths are flexible optimization parameters, it is these chains of
logic functions which most influence the maximum possible clock speed: a chain with n logic stages has at
most n clock periods of total time available irrespective of the clock delays to each register in the chain.
Provided the worst total logic delay through the entire chain, $\sum_i G[i]_{\max}$, is less than nT, it will be
possible to come up with a set of clock arrival times for each register on the chain that meets propagated
clocks setup constraints. We refer to this relationship as a setup chain constraint:
Setup chain constraint: $$\sum_{i=1}^{n} G[i]_{\max} < nT$$

Only in the highly unusual situation where the most efficient distribution of delay along the chain is one
where each stage has exactly the same delay will the optimum clock network be a balanced clock network.
The traditional assumption of forcing each stage on the loop to have exactly the same amount of time by
balancing clocks is fundamentally unnecessary and furthermore, as this paper has shown, it is also
impossible to achieve on modern chips due to clock complexity, on-chip variation, and clock gating.
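
To make the setup chain constraint concrete, the sketch below constructs one possible set of clock arrival
times for a chain whose first and last registers must see the same clock arrival time (as in a looping chain).
It spreads the available margin equally over the stages, which is only one of many valid choices and is an
assumption of this sketch, not a description of how any tool schedules clocks. Hold constraints and OCV are
ignored, and the function name and delay values are hypothetical.

```python
from typing import Optional


def schedule_chain(stage_delays: list[float], period: float) -> Optional[list[float]]:
    """Return clock arrival times a[0..n] for the registers along a chain.

    Stage i (with worst logic delay g) must satisfy a[i-1] + g < T + a[i].
    Returns None when the setup chain constraint sum(g) < n*T is violated.
    """
    n = len(stage_delays)
    margin = (n * period - sum(stage_delays)) / n  # equal slack given to each stage
    if margin <= 0:
        return None
    arrivals = [0.0]
    for g in stage_delays:
        # Choose a[i] so that stage slack T + a[i] - a[i-1] - g equals `margin`.
        arrivals.append(arrivals[-1] + g - period + margin)
    return arrivals


if __name__ == "__main__":
    # The first stage is slower than the clock period, yet the chain is feasible.
    print(schedule_chain([1.4, 0.6, 0.7], period=1.0))
    # Total delay exceeds 3 clock periods: no schedule exists.
    print(schedule_chain([1.4, 0.9, 0.9], period=1.0))
```

In the first example the slow stage borrows time from the two faster stages that follow it, and the arrival
time at the end of the chain returns to the starting value, so the chain can close on itself.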
Setup Slack and Sequential Slack
The slowest logic function in a design is typically determined by computing the setup slack for each gate in a
design and finding the logic path which comprises those gates with the lowest slack value. If we use the term
Paths[g] to mean the set of all logic paths which pass through a logic gate g, and for each path p in Paths[g] we
use the terms L[p], C[p], and G[p] to refer to the launch clock delay, capture clock delay, and logic delay for
path p respectively, then:
Setup constraint for a path p: $L[p] + G[p] < T + C[p]$
Setup slack for gate g: $\min_{p \,\in\, \mathrm{Paths}[g]} \big( (T + C[p]) - (L[p] + G[p]) \big)$
Setup slack is in essence the woist case maigin by which all setup constiaints which pass thiough g have been
met. If the setup slack at a gate is negative then a setup constiaint must have been violateu, anu the
magnituue of the negativity uenotes the amount by which that setup constiaint has been violateu. The
sequence of gates with the smallest setup slacks uenotes the logic path which is most limiting chip speeu anu
is typically referred to as the worst negative path, worst violated path, or critical path. The slack value of the
gates on the critical path is typically referred to as the Worst Negative Slack (WNS).
It is possible to generalize the concept of setup slack to setup chain constraints, giving the concept of sequential
slack [Pan98, Cong00], where the term sequential is used to emphasize the notion that these slacks can cross
register boundaries. If we use the term Chains[g] to mean the set of all logic chains passing through a gate g,
and for each chain c we use the term n[c] to refer to the number of logic stages in chain c and G[c,i] to refer to
the worst logic delay at stage i in chain c, then:
Setup chain constraint for a chain c: $\sum_{i=1}^{n[c]} G[c,i]_{\max} < n[c]\,T$
Sequential slack for gate g: $\min_{c \in Chains[g]} \Big[ n[c]\,T - \sum_{i=1}^{n[c]} G[c,i]_{\max} \Big]$

It is also helpful to normalize the sequential slack relative to the length of the chain by dividing by n[c] so that
the margin can be thought of as an average margin per logic stage which is therefore independent of chain
length. It also means that sequential slacks are reported on the same scale as traditional setup slacks: if the
normalized sequential slack is 100ps it means that the average setup slack along that chain will also be 100ps,
irrespective of the clock arrival times at each register on the chain.
Normalized sequential slack for gate g: $\min_{c \in Chains[g]} \Big[ T - \frac{1}{n[c]} \sum_{i=1}^{n[c]} G[c,i]_{\max} \Big]$
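The two chain-based slack measures can be sketched in Python as follows. Representing each chain as a plain list of worst-case stage delays is an assumption made for illustration, as are the function names and the example values.

```python
def sequential_slack(chains, period):
    """min over chains of n[c]*T - sum of worst stage delays."""
    return min(len(c) * period - sum(c) for c in chains)


def normalized_sequential_slack(chains, period):
    """min over chains of T - (average worst stage delay)."""
    return min(period - sum(c) / len(c) for c in chains)


# Two hypothetical chains through the same gate, times in ps, T = 1000 ps.
chains_through_g = [
    [950, 950, 950, 950],  # total margin 200 ps, 50 ps per stage
    [900],                 # total margin 100 ps, 100 ps per stage
]
print(sequential_slack(chains_through_g, 1000))
print(normalized_sequential_slack(chains_through_g, 1000))
```

Note that the chain which minimizes the unnormalized slack (the single-stage chain here) is not the one which minimizes the normalized slack (the four-stage chain). This is the chain-length dependence that normalization removes.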

If the smallest sequential slack in a design is negative, then the amount by which the sequential slack is
negative, referred to as the Worst Negative Sequential Slack (WNSS), denotes how far off the circuit is from
achieving its desired clock speed. We use the term critical chain to describe the logic chain with the WNSS,
although the term critical cycle is also used in the literature [Hurst04] to describe the critical chain. We prefer
chain to cycle since it emphasizes that the sequence of gates with the WNSS need not form a loop and may
equally, and indeed often does, terminate in a design IO.
If sequential slack is negative then traditional setup slacks will never be positive irrespective of how the clocks
are implemented. If sequential slack is positive then it will be possible to build a clock network which delivers
clocks to each flip-flop such that traditional setup slacks will also be positive, although this network almost
certainly will not be a balanced network. In this sense, using CCOpt, the maximum speed at which a chip can
be clocked is limited by the critical chain and not the critical path. This denotes a fundamental new degree of
freedom which can be exploited by CCOpt above and beyond physical optimization to make chips faster or
smaller or lower power.

It is however important to note that if sequential slacks are positive this does not imply that setup slacks can
also be made positive: firstly, once clocks are built the sequential slacks themselves will change due to OCV,
clock gating, and inter-clock timing, and it is entirely possible that this change will make the sequential slacks
negative again. Secondly, the clock arrival times necessary to achieve positive setup slacks may not be
achievable with a feasibly sized clock tree in terms of area and power.
Using CCOpt the maximum possible clock speed is limited by the critical chain not the
critical path.
Sequential Optimization and Useful Skew
Design optimization methods which exploit sequential slacks are usually termed sequential optimization
methods. These broadly fall into two camps: retiming approaches, which physically move logic across register
boundaries, and clock scheduling approaches, which intelligently apply delays to the clock tree to improve
setup slacks. Retiming was introduced over twenty years ago [Leiserson83, Leiserson91], but automatic
retiming approaches are flow-invasive because of their impact on formal verification and testability, and have
not gained widespread acceptance. Scheduling approaches have also been around for almost twenty years
[Fishburn90], and are more applicable in today's flows.
Academic papers on clock scheduling tend to split the problem into schedule calculation [e.g. Kourtev99b,
Ravindran03] and schedule implementation [e.g. Kourtev99, Xi97], although some papers [such as Held03]
tackle both halves of the problem separately in the same paper. Commercial EDA tools exploiting clock
scheduling typically favor more robust and direct algorithms which incrementally add buffers to an already
balanced clock tree to borrow time from positive-slack stages to adjacent negative-slack logic stages without
ever precomputing a desired schedule. Although the term useful skew was originally applied to the two-part
calculate-then-implement approach [Xi97, Xi99], it is generally used today to mean any CTS approach that
results in an unbalanced tree for timing reasons, even if the approach used doesn't ever explicitly calculate a
desired clock schedule. In any case, the key feature in common is that timing ultimately drives the clock
arrival times at registers and not a set of CTS balancing constraints.
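The incremental style of time borrowing described above can be illustrated with a deliberately simplified Python sketch. It is not the algorithm of any particular tool: the function name, the choice to equalize the two setup slacks, and the fixed per-buffer delay are all assumptions, and hold constraints are ignored.

```python
def borrow_step(slack_in, slack_out, buffer_delay):
    """Delay one register's clock by whole buffers to borrow time.

    slack_in:  setup slack of the stage captured by this register
    slack_out: setup slack of the stage launched by this register
    Delaying the clock by d raises slack_in by d and lowers slack_out by d.
    Returns (buffers_added, new_slack_in, new_slack_out).
    """
    if slack_out <= slack_in:
        return 0, slack_in, slack_out  # nothing to borrow from the output stage
    # Ideal delay equalizes the two slacks; round down to whole buffers.
    ideal_delay = (slack_out - slack_in) / 2
    buffers = int(ideal_delay // buffer_delay)
    delay = buffers * buffer_delay
    return buffers, slack_in + delay, slack_out - delay


# Input stage fails by 120 ps, output stage has 200 ps spare, 50 ps buffers.
print(borrow_step(-120, 200, 50))
```

In this example three buffers are added, which turns a 120 ps violation into positive slack on both sides of the register without ever computing a global schedule.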
The best way to think about how CTS can be generalized to be driven by timing and not by a set of balancing
constraints is to think of each flip-flop as having four constraints on the arrival time of its clock which are a
simple rearrangement of the propagated clocks timing constraints on its D-pin and on its Q-pin: see Figure
12. These four constraints constrain the permissible arrival time to be within a window which depends on the
arrival times of flops in the logical fan-in and fan-out.

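[Figure 12 (diagram not reproduced): three registers with clock arrival times P[i-1], P[i], and P[i+1] under a clock of period T, separated by logic stages with delays G[i-1]_min, G[i-1]_max and G[i]_min, G[i]_max.]
Setup: $P[i-1] + G[i-1]_{\max} - T < P[i]$
Hold: $P[i-1] + G[i-1]_{\min} > P[i]$
Hold: $P[i] > P[i+1] - G[i]_{\min}$
Setup: $P[i] < P[i+1] - G[i]_{\max} + T$
Combined window:
$\max\big( P[i-1] + G[i-1]_{\max} - T,\; P[i+1] - G[i]_{\min} \big) < P[i] < \min\big( P[i+1] - G[i]_{\max} + T,\; P[i-1] + G[i-1]_{\min} \big)$
Figure 12: Timing driven clock arrival time windows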
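The four constraints of Figure 12 can be combined into a window calculation, sketched here in Python. The function and parameter names are hypothetical; the arithmetic follows the setup and hold inequalities directly, with no OCV derates or CPPR.

```python
def arrival_window(p_prev, p_next, g_prev_min, g_prev_max, g_min, g_max, period):
    """Open interval (lower, upper) of permissible clock arrival times P[i].

    p_prev, p_next: clock arrival times P[i-1] and P[i+1]
    g_prev_*:       min/max logic delay of the stage feeding register i
    g_*:            min/max logic delay of the stage driven by register i
    """
    lower = max(p_prev + g_prev_max - period,  # setup into register i
                p_next - g_min)                # hold at register i+1
    upper = min(p_next - g_max + period,       # setup into register i+1
                p_prev + g_prev_min)           # hold at register i
    return lower, upper


# Neighbors both clocked at 0 ps, T = 1000 ps, input stage 300..1100 ps,
# output stage 200..600 ps.
lower, upper = arrival_window(0, 0, 300, 1100, 200, 600, 1000)
print(lower, upper)
```

Here the window is (100, 300) ps, so a balanced arrival time of 0 ps falls outside it and the clock to register i must be skewed later. If `lower >= upper` the window is empty and no arrival time satisfies all four constraints for the given neighbor arrival times.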
The dependency on the neighboring flops makes it easy to see that these clock arrival time windows are
globally intertwined. If a clock network can be built which delivers the clock to all flip-flops within their
permissible arrival time windows then setup and hold time constraints will be met. This concept of windows
can also easily be generalized to include OCV derates and CPPR, and also can be extended to apply to other
timing endpoints such as clock gates, clock muxes, and clock generator blocks. It can also be applied to
internal nodes in the clock tree by a simple intersection of the windows of sub-nodes.
Although the concept of windowing sounds very powerful, it has one key limitation which we alluded to in the
previous section: it is isolated from the physical optimization of logic paths. Separating the steps of
optimizing the logic and building the clock tree causes two key problems on real world commercial designs.
The first is the clock timing gap, which as we have shown requires that the desired schedule be based on a
true propagated clocks model of timing in order to properly account for OCV, inter-clock paths and clock gate
enable timings. If schedule calculation happens before clocks are built then it cannot be based on a
propagated clocks model of timing!
The second problem is that clocks are not free. They cost area and power, and any increase in insertion delay
causes more setup and hold timing degradation due to OCV on clock paths. While positive sequential slacks
imply that a clock network can theoretically be built to make traditional setup slacks positive, this does not
mean that such a network would in practice have acceptable area and power, nor does it mean that setup
slacks could be met if OCV derates are being applied to clock paths as well as logic paths. In fact, when OCV
derates are considered, it is entirely possible to get into a vicious spiral of increasing insertion delays causing
increasingly tight windows, which cause further increases in insertion delay in order to meet the windows,
culminating in the situation where it is impossible to make setup slacks positive.
CCOpt differentiates itself from traditional approaches to useful skew by bringing both clock scheduling and
physical optimization together under a unified architecture and basing all decisions on a true propagated
clocks model of timing, including inter-clock paths, OCV and clock gate timing. CCOpt treats both clocks and
logic as equally important classes of citizen and understands that in practice either can be the limiting factor
on achievable chip speed. CCOpt must somehow enter the propagated clocks world as soon as possible and
then globally optimize both the clock and logic delays according to some coherent optimization objective
which can be bounded by either logic considerations or clock considerations.
The close marriage of physical optimization and clock construction, together with knowledge of the side
effects of various decisions in each domain, is the most difficult component of the CCOpt problem to solve
well. But it is also the key enabler for mainstream commercial adoption of CCOpt. The relaxation of the
requirement to balance clocks unleashes significant freedom, but this freedom is commercially useless if it is
not exploited wisely and in the context of a full propagated clocks model of timing. Key signs of the failure to
exploit this freedom properly are clock trees that are too large, insertion delays that are too long, and
significant hold timing problems which result in an unreasonable increase in design area once hold fix buffers
have been inserted.

CCOpt's decisions are always based on a true propagated clocks measure of timing
including clock gates, inter-clock paths, OCV derates, and CPPR.
CCOpt will never close timing at the expense of creating an unreasonably large clock
network or an unreasonable number of hold time violations.
Clock Concurrent Design Flow
CCOpt is a replacement for the traditional CTS and post-CTS optimization design flow steps, see Figure 13.
What is done before clock concurrent optimization and what is done after remain the same. There is no
change in timing signoff or in formal verification or in gate level simulation. No special data structures or file
formats are needed, and complex CTS configuration scripts become redundant as there is no longer any need
to specify any balancing constraints. There is a potential impact on the magnitude of scan chain hold
violations but this can easily be managed by enhancing scan chain stitching algorithms to directly consider
hold slacks and not just scan chain wire length. For example, Azuro's Rubix CCOpt tool already includes
such a scan chain re-stitching capability.
[Figure 13 (diagram not reproduced), showing two flows side by side, each annotated with the steps that use ideal clocks timing and the steps that use propagated clocks timing.
Traditional design flow: RTL, Synthesis, Floorplan, Initial Placement, Physical Optimization, Clock Tree Synthesis, Post-CTS Optimization, Routing, Post-Route Optimization, Signoff Verification, Final Layout.
Clock concurrent design flow: RTL, Synthesis, Floorplan, Initial Placement, Physical Optimization, Clock Concurrent Optimization, Routing, Post-Route Optimization, Signoff Verification, Final Layout.]

Figure 13: Clock Concurrent Design Flow






Key Benefits of Clock Concurrent Optimization
This section overviews the key benefits that CCOpt can bring to the digital chip design community.
1. Increased chip speed or reduced chip area and power
Using CCOpt the maximum possible clock speed is limited by the critical chain and not the critical path in a
design. This is a fundamental new degree of freedom to help close timing above and beyond traditional design
flows. If the desired chip speed is already achievable without CCOpt then this same additional degree of
freedom can be exploited to reduce chip area or power. At 65nm and below the achievable increases in clock
speed can be as much as 20%.
2. Reduced IR-drop
Since CCOpt does not balance clocks the peak current drawn by the clock network is significantly reduced. In
fact it is entirely feasible to extend CCOpt to directly consider peak current (or some reasonable estimate of
peak current) as an optimization parameter and specifically skew clocks and adjust logic path delays to
ensure that peak current is controlled to be within a pre-specified limit. At advanced process nodes IR-drop
can have a critical impact on timing signoff and chip packaging cost. CCOpt unshackles chip designers from
the traditional conflict of interest between tight skew being good for timing but terrible for IR-drop.
3. Increased productivity and accelerated time to market
There are two distinct ways in which CCOpt increases designer productivity and accelerates time to market.
The first is due to a lack of any requirement to configure clock tree balancing constraints or manually set
insertion delay offsets for timing critical sink pins. For complex SoC designs composing a complete set of
balancing constraints can take more than a month, and much of this effort often needs repeating every time a
new netlist is provided by the front-end design team.
The second way in which CCOpt increases designer productivity and accelerates time to market is due to a
significant reduction in the number of iterations between the front-end and back-end design teams. Many of
these iterations are for the sole purpose of asking the front-end design team to manually move logic across
register boundaries by changing the RTL. This manual moving of logic is in essence a form of sequential
optimization being performed manually and very inefficiently. Since the entire flow must be rerun to
incorporate the RTL changes, there is no guarantee that the satisfactory aspects of the post-placement timing
picture will persist. Therefore the requested changes may not have the intended benefit. Most of the need to
do this manual logic moving is completely eliminated using CCOpt since it can simply skew the clocks
instead. The time saving from these reduced iterations can be many months.
4. Accelerated migration to 45nm and below
The ability of CCOpt to perform timing optimization is not degraded by the growing clock timing gap. This is
because all decisions it makes are directly based on a propagated clocks model of timing. If architected
correctly, the motto for CCOpt is "if I can time it then I can optimize it". Clock gates, complex clock muxing
configurations, OCV derates, CPPR, multi-corner, and multi-mode should all fall out in the wash so long as the
timing analysis engine is able to consider them. Without the use of a clock concurrent flow the clock timing
gap increasingly cripples timing closure from pre-CTS to post-CTS, and timing optimization steps downstream
from CTS just don't have the horsepower to recover from the damage. Using CCOpt, migration to advanced
process nodes can happen faster and with significantly less pain.


Conclusions
Clocking lies at the heart of commercial chip design flows and is almost as central to the digital chip design
community as the transistor itself. But the traditional assumption that if clocks are balanced then propagated
clocks timing will mirror ideal clocks timing is fundamentally broken. Clock gating, clock complexity and
on-chip variation are the key industry trends causing this divergence, which we refer to as the clock timing gap.
At 40/45nm the clock timing gap can be as much as 50% of the clock period, resulting in an almost complete
rewrite of the timing landscape between ideal and propagated clocks timing.
Clock concurrent optimization gives up on the idea of clock balancing as both restrictive and unhelpful at
advanced process nodes. It merges CTS with physical optimization, building both clocks and optimizing logic
delays at the same time based directly on a propagated clocks model of timing. This unleashes a fundamental
new degree of freedom to borrow time across register boundaries, resulting in chip speed becoming limited
by critical chains not critical paths.
Under the hood, the key challenge which CCOpt must tackle is the potential for clock networks to become
unreasonably large, and addressing this challenge requires the clock construction algorithms to become very
tightly bound with the logic optimization algorithms. The intimate relationship between clock construction
and logic optimization is what differentiates CCOpt from the traditional techniques of sequential
optimization and useful skew.
CCOpt delivers four key types of benefit to the digital chip design community: increased chip speed or
reduced chip area and power; reduced IR-drop; increased productivity and accelerated time-to-market; and
accelerated migration to 45nm and below.


References
[Cong00] J. Cong and S. K. Lim, "Physical planning with retiming," in Digest of Technical Papers of the
IEEE/ACM International Conference on Computer-Aided Design, (San Jose, CA), pp. 1-7, November 2000.
[Fishburn90] J. P. Fishburn, "Clock Skew Optimization," IEEE Trans. on Computers, vol. 39, pp. 945-951, 1990.
[Held03] S. Held, B. Korte, J. Maßberg, M. Ringe and J. Vygen, "Clock scheduling and clocktree construction for
high performance ASICs," Proceedings of the 2003 IEEE/ACM International Conference on Computer-Aided
Design.
[Hurst04] A. P. Hurst, P. Chong, A. Kuehlmann, "Physical placement driven by sequential timing analysis,"
Proc. ICCAD '04, pp. 379-386.
[Kourtev99] I. S. Kourtev and E. G. Friedman, "Synthesis of clock tree topologies to implement nonzero clock
skew schedule," in IEE Proceedings on Circuits, Devices, Systems, vol. 146, pp. 321-326, December 1999.
[Kourtev99b] I. S. Kourtev and E. G. Friedman, "Clock Skew Scheduling for Improved Reliability via Quadratic
Programming," Proc. ICCAD 1999.
[Leiserson83] C. Leiserson and J. Saxe, "Optimizing synchronous systems," Journal of VLSI and Computer
Systems, vol. 1, pp. 41-67, January 1983.
[Leiserson91] C. Leiserson and J. Saxe, "Retiming synchronous circuitry," Algorithmica, vol. 6, pp. 5-35, 1991.
[Pan98] P. Pan, A. K. Karandikar, and C. L. Liu, "Optimal clock period clustering for sequential circuits with
retiming," in IEEE Trans. on CAD, pp. 489-498, 1998.
[Ravindran03] A. K. K. Ravindran and E. Sentovich, "Multi-domain clock skew scheduling," in Proceedings of
the 21st International Conference on Computer Aided Design, ACM, 2003.
[Xi97] J. G. Xi and W. W.-M. Dai, "Useful-skew clock routing with gate sizing for low power design," J. VLSI
Signal Process. Syst., vol. 16, no. 2-3, pp. 163-179, 1997.
[Xi99] J. G. Xi and D. Staepelaere, "Using Clock Skew as a Tool to Achieve Optimal Timing," Integrated System
Design, April 1999.
