IEICE TRANS. FUNDAMENTALS, VOL.E87-A, NO.12 DECEMBER 2004
PAPER
Special Section on VLSI Design and CAD Algorithms
A Parallel Flop Synchronizer and the Handshake Interface for Bridging Asynchronous Domains
Suk-Jin KIM a), Jeong-Gun LEE, Student Members, and Kiseon KIM, Member
SUMMARY
Inter-domain communications on a chip require a synchronizer to resolve the timing problems between an input and the clock of
the destination. This paper presents a parallel flop synchronizer and its interface circuit for transferring asynchronous data to a clock domain. The
proposed scheme uses a bank of independent two-flops in parallel and supports a two-phase handshake protocol. Compared to the conventional two-flop
synchronizer, performance analysis shows that the proposed scheme can reduce latency by up to one and a half clock cycles while keeping its
safety at a tolerable level. All designs have been implemented in a 0.25 µm CMOS technology to verify the performance analysis of the proposed
synchronization.
key words: synchronizer, two-flop, metastability, clock domain
1. Introduction
Along with the rapid increase of integration density and clock frequency, a future System-on-Chip (SoC) may handle
a number of clock domains in a single chip. We therefore envision that the design of the SoC will be based on several
pre-existing modules, driven at different clock frequencies, in a plug-and-play fashion. Communications between clock
domains require a robust synchronizer to resolve the timing problems between an input and the clock of the destination.
If the input arrives within the aperture time near the rising (or falling) edge of the clock, the output of a flip-flop
may enter the metastable state, where it temporarily takes an intermediate value between 0 and 1. Furthermore, the
synchronization scheme should provide a handshake interface to prevent input data from being missed or overwritten at
the destination domain.
It is well known that inter-domain communications can be classified into five types according to the relationship
between the timing events on the input data and the destination clock [1]: synchronous domains have the same frequency
and phase, mesochronous domains have the same frequency but a constant phase difference, plesiochronous domains run at
nearly the same frequency, periodic domains operate at different frequencies, and asynchronous domains have no
restriction on event timing. As clock frequency and integration density increase, the synchronization problem is
becoming an issue for the asynchronous domain.
Recently, diverse schemes for inter-domain communications have been proposed [2]-[8]. Especially for the asynchronous
domain, a simple and common solution is well known, namely the two-flop synchronizer [4], [5], and more elaborate
schemes based on the self-timed pipeline have also been proposed [6], [7]. Note, however, that these schemes suffer from
long synchronization latency due to the serially connected flip-flops. An alternative scheme was proposed to reduce
latency by means of a pointer-based FIFO [8]. This scheme does not require synchronization unless the FIFO is either
full or empty. In practice, however, one side is often faster than the other, and the scheme may then experience the
same latency as the two-flop, which corresponds to one to two clock cycles.

Manuscript received March 18, 2004.
Manuscript revised June 11, 2004.
Final manuscript received July 26, 2004.
The authors are with the Department of Information and Communications, Gwangju Institute of Science and Technology,
Gwangju, 500-712, Korea.
a) E-mail: guesswho@gist.ac.kr

In this paper, we adopt the basic two-flop and present a synchronizer built from a parallel bank of two-flops, together
with a handshake interface circuit for bridging asynchronous domains, in order to reduce the latency of synchronization
without relying on a FIFO. Compared to the conventional two-flop synchronizer, the proposed scheme requires a trade-off
between safety and latency to optimize the overall performance. The latency of the proposed synchronization is analyzed
and compared to that of the conventional two-flop. All designs have been implemented in a 0.25 µm CMOS technology, and
a pre-layout HSPICE simulation was made on a 16-bit wide datapath to verify the analysis.
The rest of the paper is organized as follows. Section 2
introduces the conventional two-flop synchronization. We
propose the parallel flop synchronizer and the handshake interface circuit in Sect. 3 and Sect. 4, respectively. Section 5
analyzes latency of the proposed synchronization scheme
and HSPICE simulation results are discussed in Sect. 6. Finally, we draw the conclusion in Sect. 7.
2. Overview of a Two-Flop Synchronizer
Figure 1 shows the interface of the synchronizer, with an input side for the asynchronous write and an output side which
has a clocked interface for reading. All input signals from a source domain are asynchronous, but the Rrdy (for Read
ready) signal to the destination is valid at the rising edge of the destination clock (CLK).

Fig. 1 Interface of a synchronizer for transferring asynchronous data to a clock domain.
Fig. 2 A two-flop synchronizer and its handshake interface circuit.

The simplest way to synchronize an asynchronous signal is to add a pair of flip-flops in series [1], as shown by the
Two-flop block of Fig. 2. If a request (REQ) signal from the source domain arrives during the aperture time of the first
flop (F1), its output enters a metastable state. The second flop (F2) then allows a whole clock cycle for the
metastability to resolve before latching it. However, this scheme does not guarantee perfect synchronization, since the
time required for a metastable signal to stabilize varies and is unpredictable. Instead, it reduces the probability of
synchronization failure to a tolerable level.
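
As a rough illustration of the serial structure described above, the following Python sketch models the two flops as a two-stage shift register clocked on each rising edge. It is only a cycle-level behavioural model under assumed names (`TwoFlop`, `clock`); metastability itself is not modelled.

```python
class TwoFlop:
    """Cycle-level model of two D flip-flops in series (F1 followed by F2)."""

    def __init__(self, init=0):
        self.f1 = init  # output of the first flop (F1)
        self.f2 = init  # output of the second flop (F2)

    def clock(self, d):
        """Apply one rising clock edge with input d and return the F2 output."""
        # Both flops update on the same edge, so F2 takes the old F1 value.
        self.f2, self.f1 = self.f1, d
        return self.f2


sync = TwoFlop()
outputs = [sync.clock(req) for req in [0, 1, 1, 1]]
print(outputs)  # [0, 0, 1, 1]
```

The input change seen at the second edge reaches the output one full clock cycle later, which, added to the wait for the first sampling edge, gives the one to two clock cycles of latency discussed in the text.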
The performance of the synchronizer can be expressed in terms of latency and safety. The latency of the two-flop is
one to two clock cycles, while its safety is represented by the Mean Time Between Failures (MTBF), which indicates how
often a synchronization failure is expected. The following equation gives the MTBF of the synchronizer [1], [3], [4], [9]:

$$ \mathrm{MTBF} = \frac{1}{P_{fail}\, f_d} = \frac{e^{t/\tau}}{T_w f_c f_d} \tag{1} $$

where $P_{fail}$ is the probability of synchronization failure, $f_d$ is the frequency of data, $\tau$ is the settling
time constant of the flip-flop, $T_w$ is a parameter related to the time window of susceptibility, and $f_c$ is the
clock frequency. $P_{fail}$ is known to decrease exponentially as the waiting time, $t$, increases [9]. For the
two-flop synchronizer, $t$ is simply set to the clock cycle time. It is noteworthy that the MTBF of a practical SoC
using the two-flop is on the order of many eons [4]; we can therefore trade off safety against latency.
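
To make the exponential dependence in Eq. (1) concrete, the sketch below evaluates the MTBF with the example values that the paper quotes later in Sect. 6 (a 500 MHz receiver, one data exchange per five clock cycles, 10 ps settling time constant, 50 ps window). The function and parameter names are illustrative assumptions.

```python
import math


def mtbf_seconds(t, tau, t_w, f_c, f_d):
    """MTBF = exp(t / tau) / (T_w * f_c * f_d), following Eq. (1)."""
    return math.exp(t / tau) / (t_w * f_c * f_d)


# Example values taken from Sect. 6 of the paper; t is one clock cycle.
mtbf = mtbf_seconds(t=2e-9, tau=10e-12, t_w=50e-12, f_c=500e6, f_d=100e6)
print(f"two-flop MTBF: {mtbf:.1e} s")
```

Because the waiting time enters through the exponent, shortening it by even a fraction of a cycle changes the MTBF by many orders of magnitude, which is why the trade-off between safety and latency has to be checked numerically.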
The two-flop synchronizer is generally sufficient to synchronize a signal to a clocked module. However, the signal may
be sampled multiple times by a fast destination clock, or never detected by a slow one. Therefore, a handshake protocol
is necessary to prevent input data from being missed or overwritten at the destination. Figure 2 shows a push-type
two-flop synchronizer and its interface circuit from [5], based on a two-phase bundled data protocol. Note that each up
or down transition on a handshake signal represents a distinct event. When the source domain initiates a write
transaction to a destination domain, it asserts the REQ signal by toggling it. After the REQ is synchronized by the
Two-flop, the transition of its output is detected by an XOR gate and a flip-flop F3. Then, the Rrdy signal becomes high
and the actual data transfer occurs at the next rising edge of the CLK, so that combinational logic can be placed after
the data flop F4. At the same time, an acknowledgement (ACK) signal is toggled to tell the source domain that the
destination is ready for another transaction. Since the handshake signals ensure that the data are stable, the data can
be transferred safely without being synchronized themselves. The maximum latency of the two-flop scheme is three clock
cycles (two cycles for synchronization plus one cycle for the handshake interface).
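
The transition detection described above can be sketched at cycle level as follows. The Two-flop and F3 are modelled as a three-stage shift register, and the XOR of the last two stages flags a REQ toggle for one clock cycle. This is an approximation under assumed names (`detect_toggles`, `req_samples`), not the circuit of Fig. 2; the Rrdy, ACK and data paths are omitted.

```python
def detect_toggles(req_samples):
    """Return one flag per clock edge: 1 where a REQ transition is detected."""
    f1 = f2 = f3 = 0
    events = []
    for req in req_samples:
        # All flops update on the same edge: F3 <- F2, F2 <- F1, F1 <- REQ.
        f3, f2, f1 = f2, f1, req
        # XOR of the synchronized REQ (F2) and its delayed copy (F3).
        events.append(f2 ^ f3)
    return events


# REQ toggles up once and down once: two distinct two-phase events.
print(detect_toggles([0, 1, 1, 1, 0, 0, 0]))  # [0, 0, 1, 0, 0, 1, 0]
```

Both the rising and the falling transition of REQ produce exactly one event, which is the defining property of the two-phase protocol.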
Conversely, when the clock domain has data to send to the asynchronous domain, the ACK signal should be
synchronized instead of the REQ signal.
3. Proposed Parallel Flop Synchronizer
In order to reduce the latency of a synchronizer at the cost of MTBF, we adopt a number of independent two-flops in
parallel, called a parallel flop synchronizer, as shown in Fig. 3. Note that each Two-flop is fed by the same input
signal, REQ, but the clock ($CLK_i$) of the $i$th Two-flop is delayed by $D_i$ from the local CLK,

$$ D_i = i \cdot \frac{T_c}{N} \tag{2} $$

where $T_c$ is the clock cycle time of the destination domain and $N$ is the number of parallel Two-flops. One possible
implementation of the delay element is a chain of inverters in series. Figure 4 shows a basic delay element consisting
of two inverters. To fine-tune the delay value, we append an extra capacitive load, which never draws static current
itself.
When the REQ signal arrives at the clock domain, the first Two-flop to receive a clock edge latches the input within
$T_c/N$ of its arrival. The remaining Two-flops then latch the same REQ one by one at intervals of $T_c/N$. The proposed
parallel bank therefore samples the input signal at $N$ times the local clock rate, and the earliest sampled signal
reduces the latency of synchronization.
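
As an illustration of this staggered sampling, the sketch below finds which Two-flop samples a REQ first, assuming (as in Eq. (2)) that $CLK_i$ rises at $i\,T_c/N$ after each local CLK edge and that the REQ arrives at time $t$ with $0 < t \le T_c$. The function name and the use of integer picoseconds are choices made for this example only.

```python
from fractions import Fraction
import math


def first_sampler(t, t_c, n):
    """Return (index of the Two-flop that samples first, time of that clock edge).

    t and t_c are in the same time unit, measured from a local CLK edge.
    """
    # Exact rational arithmetic avoids rounding errors at interval boundaries.
    k = math.ceil(Fraction(t) * n / Fraction(t_c))  # first edge at k * T_c / N
    return k % n, Fraction(t_c) * k / n


# N = 4 and T_c = 2000 ps, matching the simulation setup of Sect. 6.
print(first_sampler(1200, 2000, 4))  # arrival between 2/4 and 3/4 of T_c
print(first_sampler(1800, 2000, 4))  # arrival between 3/4 of T_c and T_c
```

With these inputs the first call selects the last Two-flop (index 3, edge at 1500 ps) and the second selects index 0 at the next local CLK edge, consistent with the waveforms discussed in Sect. 6.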
If the output of any first flop enters the metastable state, its successor flop allows a clock cycle for the
metastability to resolve, as the conventional two-flop synchronizer does. It is well known that, when a system has a
failure probability of $p$, a parallel bank of $N$ such systems has a failure probability of

$$ P_{fail}(\text{parallel}) = 1 - (1 - p)^N \tag{3} $$

Considering further that $Np$ is negligibly small, Eq. (3) can be approximated as follows:

$$ P_{fail}(\text{parallel}) = 1 - \left[ 1 - Np + \frac{N(N-1)p^2}{2!} - \cdots \right] \approx 1 - [1 - Np] = Np \tag{4} $$

Accordingly, the MTBF of the parallel flop synchronizer becomes $1/N$ times that of a single two-flop synchronizer,
which still needs to be safe enough:

$$ \mathrm{MTBF}_{parallel\_flop} = \frac{1}{P_{fail}(\text{parallel flop})\, f_d} \approx \frac{1}{N P_{fail}(\text{two flop})\, f_d} = \frac{\mathrm{MTBF}_{two\_flop}}{N} \tag{5} $$

Fig. 3 A parallel flop synchronizer.
Fig. 4 A basic delay element.

At the last stage of the parallel flop synchronizer, there is a module for detecting the first transition among the
outputs of the $N$ Two-flops to signal the arrival of a new REQ. Its output, SYNC, becomes high when any input ($R_i$)
deviates from the other inputs, and goes low when all inputs are the same. Therefore, the logic equation for the SYNC
signal can be written as follows:

$$ SYNC = \overline{AllZero(R_i) + AllOne(R_i)} = \overline{(\overline{R_0}\,\overline{R_1} \cdots \overline{R_{N-1}}) + (R_0 R_1 \cdots R_{N-1})} \tag{6} $$

Eq. (6) can be realized with two N-input AND gates and a 2-input NOR gate. In this implementation, the delay of the
N-input gate becomes larger as $N$ increases. In order to design a fast detection circuit efficiently, we apply the
dynamic CMOS implementation of the N-input AND gate in [10], which is shown in Fig. 5. Note that the pull-up path
resistances of both P-type transistors P0 and P1 should be at least five or six times as large as the pull-down
resistance. This can be achieved with minimally sized P-type transistors and wider N-type transistors. A more optimized
circuit for the dynamic N-input gate is introduced in [11].

Fig. 5 The dynamic CMOS implementation of the DETECT module.

Finally, the proposed synchronization requires up to $(1 + \frac{1}{N})$ clock cycles for synchronization plus the
delay of the DETECT module, denoted as $D_{detect}$.

4. The Handshake Interface for the Parallel Flop Synchronizer
Once a parallel flop synchronizer produces a SYNC signal, the REQ signal should hold stable until the source domain
receives an ACK from the destination domain. For the parallel flop synchronizer, we propose a handshake interface
circuit based on the two-phase bundled data protocol, as shown in Fig. 6.
The proposed handshake interface has two functions as follows:
• generating an ACK signal on a corresponding REQ
• notifying the arrival of data to the destination using a Rrdy signal
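
The SYNC condition of Eq. (6), on which this interface relies, can be written as a short logic sketch. It models only the Boolean function (a NOR of the all-zero and all-one conditions), not the dynamic CMOS circuit of Fig. 5; the function name `sync_detect` is an assumption made for illustration.

```python
def sync_detect(r):
    """SYNC of Eq. (6): 1 while the Two-flop outputs R_i disagree, else 0."""
    all_zero = not any(r)  # AND of the inverted inputs
    all_one = all(r)       # AND of the inputs
    return int(not (all_zero or all_one))  # 2-input NOR


print(sync_detect([0, 0, 0, 0]))  # 0: no REQ transition in flight
print(sync_detect([0, 0, 0, 1]))  # 1: one Two-flop has seen the new REQ value
print(sync_detect([1, 1, 1, 1]))  # 0: every Two-flop has caught up
```

SYNC therefore stays high only for the interval between the first and the last Two-flop output changing, which is why the interface has to stretch it until the next CLK edge.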
As soon as the SYNC becomes high (a new REQ is synchronized), the ACK signal is generated by triggering F5, while the
input data are temporarily latched at flip-flop F6. Since a source domain can change data at any time after receiving
the ACK, the transmitted data should be stored simultaneously with the assertion of the ACK. This decoupling of the ACK
generation from the rising edge of the CLK reduces the latency of the handshake protocol at the cost of an intermediate
data flip-flop.
The early sampled SYNC signal may become low before the rising edge of the CLK. Since the destination latches the input
Rdata upon the positive edge of the CLK, we extend the SYNC until the next rising edge of the CLK by means of a
C-element [12] and two series-connected P-type transistors M1 and M2. Initially, a node R and the output of the
C-element are high. Once the SYNC signal becomes high, N-type transistors M3 and M4 pull the node R low. Then, the
inverter chain driving M4 disables the pull-down path shortly afterwards, to prevent the node R from fighting against a
fast P-channel pull-up. Subsequently, the low value of the node R propagates through the C-element when the CLK is low.
The C-element performs the majority function on its two inputs and its previous output. Then, the Rrdy becomes high and
M1 turns on, but M2 must wait for the high phase of the CLK to switch on. At the next rising edge of the CLK, the node R
is finally charged back to its initial state. Since both inputs of the C-element (R and CLK) are now high, the output of
the C-element goes high. Subsequently, the Rrdy becomes low and M1 is turned off.

Fig. 6 The handshake interface circuit for the parallel flop synchronizer.
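
The majority behaviour of the C-element mentioned above can be sketched as a one-line state function. This is a generic model of a two-input Muller C-element, not the transistor-level circuit of Fig. 6; the name `c_element` is an assumption.

```python
def c_element(a, b, prev):
    """Two-input C-element: majority of both inputs and the previous output."""
    return int(a + b + prev >= 2)


# With inputs R and CLK: the output falls only when both are low,
# rises only when both are high, and otherwise holds its previous value.
print(c_element(0, 0, 1))  # 0: R low and CLK low, the output goes low
print(c_element(0, 1, 0))  # 0: inputs differ, the output holds low
print(c_element(1, 1, 0))  # 1: R and CLK both high, the output returns high
```

The hold behaviour is what stretches the short SYNC pulse: once node R has been pulled low and passed through, the output keeps its value until R and CLK are both high again.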
The Rrdy signal indicates the arrival of new data to the destination domain. It is also the enable (En) signal for the
data flip-flop F7, which is triggered by the local CLK. To avoid another synchronization failure at F7, the En signal
and the input of F7 must be stable at least one setup time before the rising edge of the CLK. Let $D_{critical}$ denote
the critical delay from the rising edge of $CLK_{N-1}$ in Fig. 3 to the En or the output of F6; then

$$ D_{critical} \equiv t_{CLK_{N-1} \rightarrow \max[En,\, F6_Q]} \tag{7} $$
$$ D_{critical} = D_{detect} + D_{handshake} \tag{8} $$
$$ D_{critical} < \frac{T_c}{N} - T_{setup} \tag{9} $$

where $D_{handshake}$ is the delay of the handshake interface under the critical condition. This is the worst case, in
which the first REQ sampling is made by Two-flop$_{N-1}$. Note that $D_{critical}$ includes the delay of the DETECT
module, $D_{detect}$. From Eq. (9), the number of parallel Two-flops is limited as follows:

$$ N < \frac{T_c}{D_{critical} + T_{setup}} \tag{10} $$

5. Latency Analysis of the Proposed Scheme
To analyze the performance of the synchronizer, we investigate the latency of the proposed scheme in detail and compare
it to that of the conventional two-flop synchronizer.
We classify the latency of the synchronizer into overall and protocol latency, denoted as $L_{overall}$ and
$L_{protocol}$, respectively. $L_{overall}$ is defined as the time consumed in the synchronizer from a REQ arrival to
the data latch at the destination domain. $L_{overall}$ can be expressed as the sum of synchronization and interface
latency ($L_{overall} = L_{sync} + L_{intf}$). In contrast to the interface in [4], [5], the proposed interface
decouples the ACK generation from the local CLK of the destination. Therefore, it is worthwhile to analyze protocol
latency, defined as the time taken from the REQ arrival to the ACK generation. Similarly, $L_{protocol}$ can be divided
into synchronization and acknowledgement latency ($L_{protocol} = L_{sync} + L_{ack}$). Note that these latency
components vary with the REQ arrival time $t$ after the CLK ticks ($0 < t \le T_c$).
Synchronization latency: The single two-flop scheme incurs one to two clock cycles for synchronization owing to its two
serially connected flip-flops. Therefore, the synchronization latency of the two-flop, denoted as $L_{sync\_two}$, can
be expressed as follows:

$$ L_{sync\_two}(t) = (T_c - t) + T_c = 2T_c - t, \quad \text{for } 0 < t \le T_c \tag{11} $$

On the other hand, the sampling rate of the parallel flop synchronizer is $N$ times the local clock rate. Therefore, the
synchronization latency of the proposed parallel flop, denoted as $L_{sync\_par}$, is a sawtooth function having peak
values at $\frac{n}{N} T_c$ and decreasing linearly in between. The function is periodic with period $\frac{T_c}{N}$.
Therefore, $L_{sync\_par}$ becomes as follows:

$$ L_{sync\_par}(t) = \left( \frac{T_c}{N} - t \right) + T_c + D_{detect} = \frac{N+1}{N} T_c + D_{detect} - t, \quad \text{for } 0 < t \le \frac{T_c}{N} \tag{12} $$

$$ L_{sync\_par}(t) = L_{sync\_par}\!\left( t + \frac{T_c}{N} \right), \quad \text{for } 0 < t \le T_c \tag{13} $$
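
Eqs. (11) to (13) can be evaluated directly, as in the sketch below. It assumes times given as integers in a common unit (picoseconds here) and handles the periodicity of Eq. (13) by locating the first sampling edge at a multiple of $T_c/N$; the function names are illustrative.

```python
from fractions import Fraction
import math


def l_sync_two(t, t_c):
    """Eq. (11): wait for the next CLK edge, then one full cycle."""
    return 2 * t_c - t


def l_sync_par(t, t_c, n, d_detect=0):
    """Eqs. (12)-(13): wait for the next staggered edge, then one full cycle."""
    step = Fraction(t_c, n)            # spacing of the staggered clocks, T_c / N
    k = math.ceil(Fraction(t) / step)  # first sampling edge occurs at k * step
    return (k * step - t) + t_c + d_detect


# T_c = 2000 ps, N = 4, REQ arriving 1200 ps after a CLK edge.
print(l_sync_two(1200, 2000), l_sync_par(1200, 2000, 4))
```

For this arrival time the parallel bank waits only 300 ps for its first sampling edge instead of 800 ps, which is the source of the latency saving.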
It is difficult to generalize $D_{detect}$ in terms of $T_c$. Considering the timing constraint of Eq. (9), however,
$D_{critical}$, which includes $D_{detect}$, must be less than the clock interval $T_c/N$. Therefore, for simplicity we
move $D_{detect}$ into the interface latency, $L_{intf\_par}$ (and the acknowledgement latency, $L_{ack\_par}$). The
synchronization latency of both schemes is then depicted in Fig. 7(a).
Interface latency: The handshake interface of the conventional two-flop scheme always takes a whole clock cycle,
since the destination latches data upon the rising edge of the CLK. Therefore, the interface latency of the two-flop
becomes

$$ L_{intf\_two}(t) = T_c \tag{14} $$

However, the latency of the proposed interface, denoted as $L_{intf\_par}$, varies depending on the first sampling time
of the REQ signal. Considering $D_{detect}$, $L_{intf\_par}$ is defined as the time taken from the first transition of
the parallel bank outputs ($R_i$) to the data store at F7. From Eqs. (8) and (9), if the REQ arrives before the last
Two-flop triggers its clock ($CLK_{N-1}$), F7 can latch the transmitted data (Rdata) at the next rising edge of the CLK.
Otherwise, it waits for one more clock cycle, which incurs the same latency as the two-flop interface. Therefore,
$L_{intf\_par}$ is a stair function shifted by $\frac{T_c}{N}$ as follows:

$$ L_{intf\_par}(t) = \begin{cases} \dfrac{N-k}{N} T_c, & \dfrac{k-1}{N} T_c < t \le \dfrac{k}{N} T_c, \quad k = 1, 2, \ldots, N-1 \\[2mm] T_c, & \dfrac{N-1}{N} T_c < t \le T_c \end{cases} \tag{15} $$

The interface latency of both synchronizers is depicted in Fig. 7(b).

Fig. 7 Latency comparison. (a) Synchronization latency, (b) Interface latency, (c) Acknowledgement latency.

Overall latency: Overall latency is the sum of synchronization and interface latency. Therefore, the overall latency of
the single two-flop scheme becomes

$$ L_{overall\_two}(t) = L_{sync\_two}(t) + L_{intf\_two}(t) = 3T_c - t, \quad \text{for } 0 < t \le T_c \tag{16} $$

while that of the proposed scheme is

$$ L_{overall\_par}(t) = L_{sync\_par}(t) + L_{intf\_par}(t) = \begin{cases} 2T_c - t, & 0 < t \le \dfrac{N-1}{N} T_c \\[2mm] 3T_c - t, & \dfrac{N-1}{N} T_c < t \le T_c \end{cases} \tag{17} $$

Fig. 8 Latency comparison.

Figure 8 shows the overall latency of the two-flop and the parallel flop, respectively. If the REQ arrives before
Two-flop$_{N-1}$ triggers its clock (region 1 and region 2), the proposed synchronization scheme reduces overall latency
by one clock cycle. Otherwise, it incurs the same latency as the conventional scheme.
Acknowledgement latency: In the handshake interface of [4], [5], the ACK signal is always generated upon the rising
edge of the CLK. Therefore, the acknowledgement latency of the conventional two-flop is the same as its interface
latency:

$$ L_{ack\_two}(t) = L_{intf\_two}(t) = T_c \tag{18} $$

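
The piecewise results of Eqs. (15) to (17) can be cross-checked numerically. The sketch below computes the synchronization, interface and overall latency of the parallel flop for a given arrival time, with $D_{detect}$ folded into the interface latency as the text does (and therefore not appearing explicitly). Function and variable names are assumptions for illustration, and times are integers in picoseconds.

```python
from fractions import Fraction
import math


def latencies_par(t, t_c, n):
    """Return (L_sync_par, L_intf_par, L_overall_par) for 0 < t <= t_c."""
    step = Fraction(t_c, n)
    k = math.ceil(Fraction(t) / step)  # arrival interval index, 1..N
    l_sync = (k * step - t) + t_c      # Eqs. (12)-(13)
    # Eq. (15): wait for the next local CLK edge, or a full cycle in the last interval.
    l_intf = (n - k) * step if k < n else Fraction(t_c)
    return l_sync, l_intf, l_sync + l_intf


t_c, n = 2000, 4
for t in (300, 1200, 1800):
    l_overall_two = 3 * t_c - t        # Eq. (16)
    print(t, latencies_par(t, t_c, n)[2], l_overall_two)
```

For arrivals in the first $N-1$ intervals the overall latency comes out as $2T_c - t$, one cycle less than the two-flop; in the last interval both are $3T_c - t$, in agreement with Eq. (17).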
Similar to the calculation of $L_{intf\_par}$, the acknowledgement latency of the proposed interface, $L_{ack\_par}$,
also includes $D_{detect}$. Therefore, $L_{ack\_par}$ is defined as the time taken from the first transition of the
parallel bank outputs ($R_i$) to the ACK generation at F5 of Fig. 6. From Eq. (9), we expect that the time from the
first REQ sampling clock to the F6 output is less than the clock interval, $T_c/N$. Therefore, $L_{ack\_par}$ is at most
$T_c/N$ as follows:

$$ L_{ack\_par}(t) \approx D_{critical} < \frac{T_c}{N} - T_{setup} < \frac{T_c}{N} \tag{19} $$

which is depicted in Fig. 7(c).
Protocol latency: Protocol latency is the sum of synchronization and acknowledgement latency. Since the acknowledgement
latency of the two-flop is the same as its interface latency, the protocol latency of the single two-flop is also the
same as its overall latency:

$$ L_{protocol\_two}(t) = L_{sync\_two}(t) + L_{ack\_two}(t) = 3T_c - t = L_{overall\_two}(t) \tag{20} $$

On the other hand, the protocol latency of the parallel flop becomes as follows:

$$ L_{protocol\_par}(t) = L_{sync\_par}(t) + L_{ack\_par}(t) = \frac{N+2}{N} T_c - t, \quad \text{for } 0 < t \le \frac{T_c}{N} \tag{21} $$

$$ L_{protocol\_par}(t) = L_{protocol\_par}\!\left( t + \frac{T_c}{N} \right), \quad \text{for } 0 < t \le T_c \tag{22} $$

which are also depicted in Fig. 8. From the figure, $L_{protocol\_par}$ is less than $L_{protocol\_two}$ and
$L_{overall\_par}$. If the REQ arrives within region 2, $L_{protocol\_par}$ is the same as $L_{overall\_par}$.
Table 1 summarizes the latency reduction based on Fig. 8. The latency reduction of the proposed synchronization scheme
varies with the REQ arrival time and the number of parallel Two-flops.

Table 1 Latency reduction.
          $L_{overall\_two} - L_{overall\_par}$   $L_{protocol\_two} - L_{protocol\_par}$   $L_{overall\_par} - L_{protocol\_par}$
Region 1  $T_c$                                   $T_c$ to $\frac{2N-2}{N} T_c$             $0$ to $\frac{N-2}{N} T_c$
Region 2  $T_c$                                   $T_c$                                     $0$
Region 3  $0$                                     $\frac{N-1}{N} T_c$                       $\frac{N-1}{N} T_c$

Average gain for latency: We introduce an average gain to compare the latency of the proposed scheme with that of the
single two-flop. Suppose that the REQ arrival time is random and modelled as a uniform random variable; then the average
gain for overall latency is

$$ G_{overall} = E[L_{overall\_two} - L_{overall\_par}] = \frac{N-1}{N} T_c \tag{23} $$

where $E[L]$ denotes the average of $L$, which is a function of time, over the whole time duration.
The average gain for protocol latency can be calculated in the same way as $G_{overall}$:

$$ G_{protocol} = E[L_{protocol\_two} - L_{protocol\_par}] = \frac{3}{2} \cdot \frac{N-1}{N} T_c \tag{24} $$

These trends are shown in Fig. 9. As the number of parallel Two-flops in the proposed scheme increases to infinity,
$G_{overall}$ and $G_{protocol}$ approach $T_c$ and $1.5T_c$, respectively.

Fig. 9 Average gain for latency.
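
The closed forms of Eqs. (23) and (24) can be checked by averaging the latency differences over uniformly spaced arrival times. The sketch below is such a check under stated assumptions: $L_{ack\_par}$ is taken at its upper bound $T_c/N$, as in Eq. (21), $D_{detect}$ is folded into the interface and acknowledgement terms, and all names are illustrative.

```python
from fractions import Fraction
import math


def average_gains(t_c, n, samples_per_interval=10):
    """Return (G_overall, G_protocol) averaged over arrival times in (0, T_c)."""
    step = Fraction(t_c, n)
    m = n * samples_per_interval
    g_overall = g_protocol = Fraction(0)
    for j in range(m):
        t = (Fraction(j) + Fraction(1, 2)) * t_c / m  # midpoint sample, never on a boundary
        k = math.ceil(t / step)
        l_sync_par = (k * step - t) + t_c              # Eqs. (12)-(13)
        l_intf_par = (n - k) * step if k < n else Fraction(t_c)  # Eq. (15)
        l_ack_par = step                               # upper bound used in Eq. (21)
        l_two = 3 * t_c - t                            # Eqs. (16) and (20)
        g_overall += l_two - (l_sync_par + l_intf_par)
        g_protocol += l_two - (l_sync_par + l_ack_par)
    return g_overall / m, g_protocol / m


print(average_gains(2000, 4))  # compare with (3/4) * T_c and (3/2) * (3/4) * T_c
```

Because both differences are piecewise constant over intervals of width $T_c/N$, the midpoint average is exact here and reproduces the closed-form gains for any sampling density.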
6. Simulation Results
To verify the analysis results on latency, the proposed parallel flop synchronizer and the interface circuit have been
implemented in a 0.25 µm CMOS technology. We employed four independent Two-flops ($N = 4$) and a 16-bit wide datapath
for a pre-layout HSPICE simulation.
Figure 10 shows waveforms of the handshake signals when two asynchronous write transactions are made to a clock domain
with $T_c = 2.0$ ns. Initially, all handshake signals are low and a source domain starts a transaction by asserting the
REQ signal. This signal arrives at the destination domain between $\frac{2}{4} T_c$ and $\frac{3}{4} T_c$ (region 2 in
Fig. 8), and it is first sampled by the last Two-flop, whose clock is delayed by $\frac{3}{4} T_c$ from the local CLK
(see R3). Then, the other Two-flops take the same value one by one at intervals of $\frac{T_c}{4}$. As soon as the SYNC
detects the transition of R3, the ACK signal is generated and the Rrdy signal becomes high, since the CLK signal is
already low. Finally, the input data are saved to the destination at the next rising edge of the CLK. In this case, the
overall and protocol latency of the proposed scheme are almost the same (about $1.5T_c$). Compared to the overall
latency of the single two-flop scheme, which finishes the transaction at the next rising edge of the CLK after the R0
transition, the proposed scheme saves one clock cycle in both overall and protocol latency. These results confirm the
analysis shown in region 2 of Fig. 8 and Table 1.
After the source domain receives the ACK from the destination, it toggles the REQ signal to send another data item.
Since the REQ signal arrives between $\frac{3}{4} T_c$ and $T_c$ (region 3 in Fig. 8), the REQ is first sampled by the
first Two-flop, which is triggered by the local CLK (see R0). Therefore, the overall latency of the parallel flop is the
same as that of the single two-flop scheme (less than $\frac{9}{4}$ cycles). However, the ACK signal is generated about
$\frac{3}{4} T_c$ before the destination latches the Rdata. This latency reduction can also be calculated from region 3
of Table 1 by substituting $N = 4$.

Fig. 10 Waveforms of handshake signals.

From our implementation, we have observed that the sum of $D_{critical}$ and $T_{setup}$ was less than 0.5 ns. From
Eq. (10), therefore, the maximum number of parallel Two-flops for the proposed scheme is calculated as four ($N = 4$).
Note, however, that $D_{critical}$ and $T_{setup}$ decrease as the implementation technology advances.
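
The bound of Eq. (10) is a strict inequality, which matters when the ratio is an integer. The sketch below evaluates it with exact arithmetic; the value of 490 ps for $D_{critical} + T_{setup}$ is a hypothetical number chosen only because the text states that the sum was below 0.5 ns, and the function name is an assumption.

```python
from fractions import Fraction
import math


def max_parallel_flops(t_c, d_critical_plus_setup):
    """Largest integer N satisfying N < T_c / (D_critical + T_setup), Eq. (10)."""
    bound = Fraction(t_c) / Fraction(d_critical_plus_setup)
    return math.ceil(bound) - 1  # strict inequality: an integer bound is excluded


print(max_parallel_flops(2000, 490))  # hypothetical 490 ps with T_c = 2000 ps
print(max_parallel_flops(2000, 500))  # exactly 0.5 ns would allow one fewer
```

With any sum slightly below 0.5 ns and $T_c = 2.0$ ns the bound is just above four, so four Two-flops is the largest admissible bank, in line with the text.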
As for the MTBF of the proposed synchronizer, suppose that a SoC has a receiver operating at 500 MHz and that data are
exchanged once every five clock cycles on average. For typical values of $\tau = 10$ ps and $T_w = 50$ ps (conservative
values in 0.18 µm technology [4]), the probability of synchronization failure for the two-flop scheme is calculated as
follows:

$$ P_{fail}(\text{two flop}) = (5 \times 10^{-11})(5 \times 10^{8})\left( e^{-\frac{2 \times 10^{-9}}{10^{-11}}} \right) \approx 3 \times 10^{-89} \tag{25} $$

Since $4 \times P_{fail}(\text{two flop})$ is negligibly small, the MTBF of the parallel flop synchronizer is calculated
by Eq. (5):

$$ \mathrm{MTBF}_{parallel} \approx \frac{1}{(4)(3 \times 10^{-89})(10^{8})} \approx 10^{80}\ \text{seconds} \approx 10^{72}\ \text{years} \tag{26} $$

which is practically safe enough, although it is reduced from that of the conventional two-flop, $10^{73}$ years.
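
The numbers in Eqs. (25) and (26) can be reproduced with a few lines of Python. Because $p$ is far below floating-point resolution relative to 1, the exact expression $1 - (1-p)^N$ of Eq. (3) is evaluated here with `math.log1p` and `math.expm1`; this numerical device and the variable names are choices made for this sketch, not something taken from the paper.

```python
import math

tau, t_w = 10e-12, 50e-12  # settling time constant and window, from the text
f_c = 500e6                # receiver clock frequency
f_d = f_c / 5              # one data exchange per five clock cycles
t = 1 / f_c                # waiting time of one clock cycle
n = 4                      # number of parallel Two-flops

p_two = t_w * f_c * math.exp(-t / tau)       # Eq. (25)
p_par = -math.expm1(n * math.log1p(-p_two))  # 1 - (1 - p)^N of Eq. (3), evaluated stably
mtbf_par_s = 1 / (p_par * f_d)               # Eq. (5)
mtbf_par_years = mtbf_par_s / (365.25 * 24 * 3600)

print(f"P_fail(two flop)  = {p_two:.1e}")
print(f"P_fail(parallel)  = {p_par:.1e}  (N * p = {n * p_two:.1e})")
print(f"MTBF(parallel)    = {mtbf_par_s:.1e} s = {mtbf_par_years:.1e} years")
```

The unrounded results land within an order of magnitude of the rounded figures in Eq. (26), and the exact and approximate failure probabilities agree to floating-point precision, which supports the $Np$ approximation of Eq. (4) for these values.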
7. Conclusion
For data transfer from an asynchronous domain to a clocked module, this paper presents a parallel flop synchronizer and
its handshake interface, designed to reduce latency. Since the MTBF of a practical SoC is on the order of many eons, we
can trade off safety against latency. The proposed synchronization scheme adopts the well-known two-flop and includes a
bank of independent two-flops in parallel. Furthermore, we propose a handshake interface that prevents the synchronizer
from missing or overwriting data, and that also reduces latency by decoupling the acknowledgement generation from the
destination clock at the cost of an intermediate data flip-flop.
Performance analysis shows that the proposed synchronization scheme can reduce overall latency by up to one clock
cycle, and protocol latency by up to one and a half cycles, as the number of parallel Two-flops approaches infinity. The
latency reduction results from the parallel synchronization and from decoupling the generation of a handshake signal.
The MTBF of the proposed scheme decreases to 1/N times that of the conventional two-flop, but it is still safe enough.
All designs have been implemented in a 0.25 µm CMOS technology to verify the analysis results.
Acknowledgments
This work was supported in part by the KAIST/GIST IT-21 Initiative in BK21 of the Ministry of Education, and by the
Korea Science and Engineering Foundation (KOSEF) through the Ultrafast Fiber-Optic Networks Research Center at Gwangju
Institute of Science and Technology in the Republic of Korea.
References
[1] W. Dally and J. Poulton, Digital Systems Engineering, Cambridge University Press, 1998.
[2] A. Chakraborty and M. Greenstreet, "Efficient self-timed interfaces for crossing clock domains," International
Symposium on Advanced Research in Asynchronous Circuits and Systems, pp.78-88, 2003.
[3] R. Kol and R. Ginosar, "Adaptive synchronization for multi-synchronous systems," International Symposium on
Advanced Research in Asynchronous Circuits and Systems, pp.87-96, June 1994.
[4] R. Ginosar, "Fourteen ways to fool your synchronizer," International Symposium on Advanced Research in
Asynchronous Circuits and Systems, pp.89-96, May 2003.
[5] M. Crews and Y. Yuenyongsgool, "Practical design for transferring signals between clock domains,"
http://www.edn.com, Feb. 2003.
[6] J. Seizovic, "Pipeline synchronization," International Symposium on Advanced Research in Asynchronous Circuits and
Systems, pp.87-96, June 1994.
[7] J. Kessels, A. Peeters, and S. Kim, "Bridging clock domains by synchronizing the mice in the mousetrap,"
International Workshop on Power and Timing Modeling, Optimization, and Simulation, pp.141-150, Sept. 2003.
[8] T. Chelcea and S. Nowick, "Robust interfaces for mixed-timing systems with application to latency-insensitive
protocols," Proc. ACM/IEEE Design Automation Conference, pp.21-26, June 2001.
[9] C. Dike and E. Burton, "Miller and noise effects in a synchronizing flip-flop," IEEE J. Solid-State Circuits,
vol.34, no.6, pp.849-855, June 1999.
[10] F. Cheng, "Practical design and performance evaluation of completion detection circuits," Proc. International
Conference on Computer Design, pp.354-359, 1998.
[11] H. Lam and C. Tsui, "High performance and low power completion detection circuit," Proc. International Symposium
on Circuits and Systems, vol.5, pp.V-405-V-408, May 2003.
[12] M. Shames, J. Ebergen, and M. Elmasry, "A comparison of CMOS implementation of an asynchronous circuits primitive:
The C-element," International Symposium on Low Power Electronics and Design, pp.93-96, 1996.
Suk-Jin Kim received the B.S. degree in electronics engineering from Kyunghee University in 1998, and the M.S. degree
from Gwangju Institute of Science and Technology (GIST), Korea, in 2000. He is currently working toward the Ph.D. degree
in the Department of Information and Communications of Gwangju Institute of Science and Technology. His research
interests include synchronization and power saving in Globally Asynchronous Locally Synchronous (GALS) systems.
Jeong-Gun Lee received the B.S. degree in computer engineering from Hallyim University in 1996, and the M.S. degree from
Gwangju Institute of Science and Technology (GIST), Korea, in 1988. He is currently working toward the Ph.D. degree in
the Department of Information and Communications of Gwangju Institute of Science and Technology. His research interests
include Petri-net theory and asynchronous circuits and systems.
Kiseon Kim received the B.Eng. and M.Eng. degrees from Seoul National University, both in electronics engineering, in
1978 and 1980, and the Ph.D. degree from the University of Southern California, Los Angeles, in 1987, in electrical
engineering systems. From 1988 to 1991, he was with Schoumberger in Texas as a senior development engineer, where he was
involved in the development of telemetry systems. From 1991 to 1994, he was a computer communications specialist for
the Superconducting Super Collider Lab. in Texas, where he built telemetry logging and analysis systems for high energy
physics instrumentation. He joined GIST in 1994 and is presently a Professor. His research interests include wideband
digital communications system design, analysis and implementation.
