12 DECEMBER 2004
PAPER
SUMMARY
Inter-domain communication on a chip requires a synchronizer to resolve the timing problems between an input and the clock of
a destination domain. This paper presents a parallel flop synchronizer and its interface circuit for transferring asynchronous data to a clock domain. The
proposed scheme uses a bank of independent two-flops in parallel and supports a two-phase handshake protocol. Compared with the conventional two-flop synchronizer, performance analysis shows that the proposed scheme
can reduce latency by up to one and a half clock cycles while keeping its
safety at a tolerable level. All designs have been implemented in a 0.25 µm
CMOS technology to verify the performance analysis of the proposed synchronization scheme.
key words: synchronizer, two-flop, metastability, clock domain
1. Introduction
cations have been proposed [2]–[8]. Especially for the asynchronous domain, a simple and common solution is well
known, named the two-flop synchronizer [4], [5],
and more elaborate schemes based on
the self-timed pipeline have also been proposed [6], [7]. Note, however, that these
schemes suffer from long synchronization latency because
of the serially connected flip-flops. An alternative scheme
was proposed to reduce latency by means of a pointer-based
FIFO [8]. This scheme does not require synchronization unless the FIFO is either full or empty. In practice, however,
one side is often faster than the other, so the FIFO frequently becomes full or empty, and the transfer then
experiences the same latency as the two-flop, namely one to two clock cycles.
In this paper, we adopt the basic two-flop and present a
synchronizer built from a parallel bank of two-flops, together with a handshake
interface circuit for bridging asynchronous domains, in order to reduce synchronization latency without relying on
a FIFO. Compared with the conventional two-flop synchronizer,
the proposed scheme requires a trade-off between safety and latency to optimize the overall performance. The latency of the
proposed synchronization is analyzed and compared with that of the
conventional two-flop. All designs have been implemented
in a 0.25 µm CMOS technology, and a pre-layout HSPICE
simulation was performed on a 16-bit wide datapath to verify the
analysis.
The rest of the paper is organized as follows. Section 2
introduces the conventional two-flop synchronization. We
propose the parallel flop synchronizer and the handshake interface circuit in Sect. 3 and Sect. 4, respectively. Section 5
analyzes the latency of the proposed synchronization scheme,
and HSPICE simulation results are discussed in Sect. 6. Finally, we draw conclusions in Sect. 7.
2.
KIM et al.: A PARALLEL FLOP SYNCHRONIZER AND THE HANDSHAKE INTERFACE FOR BRIDGING ASYNCHRONOUS DOMAINS
Fig. 2
$$MTBF = \frac{1}{P_{fail}\, f_d} = \frac{e^{t/\tau}}{T_w f_c f_d} \qquad (1)$$
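Eq. (1) can be evaluated directly. The Python sketch below assumes the usual meanings of the symbols (t: resolution time allowed before the output is used, τ: resolution time constant, T_w: metastability window, f_c: clock frequency, f_d: data event rate), since their definitions are not reproduced here; the numbers in the usage line are hypothetical, not the paper's 0.25 µm device parameters.

```python
import math

def mtbf(t_res, tau, t_w, f_c, f_d):
    """Eq. (1): MTBF = 1 / (P_fail * f_d) = exp(t_res / tau) / (T_w * f_c * f_d).

    Symbol meanings are the usual ones (assumed): t_res resolution time [s],
    tau resolution time constant [s], t_w metastability window [s],
    f_c clock frequency [Hz], f_d data event rate [Hz].
    """
    # Probability that one data event leaves the flop unresolved after t_res.
    p_fail = t_w * f_c * math.exp(-t_res / tau)
    return 1.0 / (p_fail * f_d)

# Hypothetical parameters (not from the paper): one 2.0 ns cycle to resolve.
tc = 2.0e-9
print(f"MTBF = {mtbf(t_res=tc, tau=20e-12, t_w=20e-12, f_c=1 / tc, f_d=50e6):.3e} s")
```

Because the resolution time enters through the exponent, giving the synchronizer more time to resolve raises the MTBF far faster than lowering f_c or f_d does.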
synchronizer and its interface circuit in [5], based on a two-phase bundled-data protocol. Note that each up or down
transition on the handshake signals represents a distinct event.
When the source domain initiates a write transaction to a
destination domain, it asserts the REQ signal by toggling it.
After the REQ is synchronized by the Two-flop, the transition of its output is detected by an XOR gate and a flip-flop
F3. The Rrdy signal then becomes high, and the actual data
transfer occurs at the next rising edge of the CLK, so that
combinational logic can be placed after the data flop F4. At
the same time, an acknowledgement (ACK) signal is toggled
to tell the source domain that the destination is ready for another
transaction. Because the handshake signals guarantee that the data
are stable when they are captured, the data can be transferred safely without being synchronized themselves.
The maximum latency of the two-flop scheme is three clock cycles (two cycles for synchronization plus one cycle for the
handshake interface).
Conversely, when the clock domain has data to send to the asynchronous domain, the ACK signal, rather than the REQ signal, must be
synchronized.
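The conventional receive path described above can be sketched at cycle level. The Python model below is a simplification of our own: it treats REQ as already resolved at each rising CLK edge (metastability is not modeled), uses the hypothetical labels F1 and F2 for the two stages of the Two-flop, and reports the edge after which the XOR of F2 and F3 flags a REQ toggle, i.e. when Rrdy goes high; F4 would then capture the data at the following edge.

```python
def two_flop_xor_detect(req_samples):
    """Cycle-level sketch of REQ -> F1 -> F2 (Two-flop) -> XOR with F3.

    req_samples[i] is the REQ level seen at rising CLK edge i (assumed already
    resolved). Returns the edge indices after which F2 XOR F3 is high, i.e.
    one entry per REQ toggle, since each up or down transition is an event.
    """
    f1 = f2 = f3 = 0  # all handshake-related registers start low
    events = []
    for edge, req in enumerate(req_samples):
        # All registers update together at the rising edge: F3 <= F2, F2 <= F1, F1 <= REQ.
        f3, f2, f1 = f2, f1, req
        if f2 ^ f3:  # synchronized REQ differs from its delayed copy: new transaction
            events.append(edge)
    return events

# REQ toggles up before edge 1 and back down before edge 4 (two transactions).
print(two_flop_xor_detect([0, 1, 1, 1, 0, 0, 0]))  # → [2, 5]
```

Each toggle is flagged one edge after F1 first captures it, and the data are taken at the edge after that, which is where the two-cycle synchronization plus one-cycle interface budget comes from.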
3.
Tc
N
(2)
$$P_{fail}(parallel) = 1 - (1 - p)^N \qquad (3)$$
Further assuming that Np is negligibly small (Np ≪ 1), Eq. (3) can be approximated as follows:
Fig. 5
Fig. 3
Fig. 4
$$P_{fail}(parallel) = 1 - \left[1 - Np + \frac{N(N-1)p^2}{2!} - \cdots\right] \approx 1 - [1 - Np] = Np \qquad (4)$$
Accordingly, the MTBF of the parallel flop synchronizer becomes 1/N times that of a single two-flop synchronizer, and this
reduced MTBF must still be large enough to be safe:
$$MTBF_{parallel\,flop} = \frac{1}{P_{fail}(parallel\,flop)\, f_d} \approx \frac{1}{N p\, f_d} \qquad (5)$$
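Eq. (4) relies on Np ≪ 1. The Python sketch below compares the exact form 1 - (1 - p)^N, whose binomial expansion is the bracket in Eq. (4), with the approximation Np, and computes the resulting MTBF ratio of the parallel bank to a single two-flop, which tends to 1/N. The p values are arbitrary illustrations, and the ratio assumes both schemes see the same data rate f_d.

```python
import math

def p_fail_parallel(p, n):
    """Exact failure probability of n independent two-flops, 1 - (1 - p)^n,
    evaluated with log1p/expm1 so that a tiny p does not round to zero."""
    return -math.expm1(n * math.log1p(-p))

def mtbf_ratio(p, n):
    """MTBF of the parallel bank divided by the MTBF of one two-flop (same f_d)."""
    return p / p_fail_parallel(p, n)

for p in (1e-12, 1e-2, 1e-1):
    print(f"p = {p:g}: exact = {p_fail_parallel(p, 4):.4g}, Np = {4 * p:.4g}, "
          f"MTBF ratio = {mtbf_ratio(p, 4):.4f}")
```

For realistic (very small) p the exact value and Np agree to many digits, so the 1/N MTBF penalty is the whole cost; the approximation only breaks down when Np is no longer small.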
4.
(6)
Eq. (6) can be realized with two N-input AND gates and
a 2-input NOR gate. In this implementation, the delay of the
N-input gates grows as N increases.
To design a fast detection circuit efficiently, we apply the dynamic CMOS implementation of N-input AND
Fig. 6
(7)
(8)
(9)
Tc
Dcritical + T setup
(10)
To analyze the performance of the synchronizer, we investigate the latency of the proposed scheme in detail and compare
it with that of the conventional two-flop synchronizer.
$$L^{sync}_{two}(t) = (T_c - t) + T_c = 2T_c - t, \quad \text{for } 0 < t \le T_c \qquad (11)$$
$$L^{intf}_{two}(t) = T_c \qquad (14)$$
Fig. 8 Latency comparison.
Overall latency: Overall latency is the sum of synchronization and interface latency. Therefore, overall latency of the
single two-flop scheme becomes
$$L^{overall}_{two}(t) = L^{sync}_{two}(t) + L^{intf}_{two}(t) = 3T_c - t, \quad \text{for } 0 < t \le T_c \qquad (16)$$
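Eqs. (11), (14) and (16) can be transcribed directly. This Python sketch takes t as the REQ arrival time measured from the preceding rising CLK edge, which is our reading of the (T_c - t) term in Eq. (11), and works in normalized time units.

```python
def two_flop_latency(t, tc):
    """Latencies of the conventional two-flop for a REQ arriving t after the
    preceding rising CLK edge (0 < t <= tc).

    Returns (L_sync, L_intf, L_overall) following Eqs. (11), (14) and (16).
    """
    if not 0 < t <= tc:
        raise ValueError("t must satisfy 0 < t <= tc")
    l_sync = (tc - t) + tc   # wait for the next CLK edge, then one more cycle in the second flop
    l_intf = tc              # data are captured at the following rising CLK edge
    return l_sync, l_intf, l_sync + l_intf

# Normalized units (tc = 1.0): a REQ arriving a quarter cycle after an edge.
print(two_flop_latency(0.25, 1.0))  # → (1.75, 1.0, 2.75)
```

As t approaches 0 the overall latency approaches 3T_c, the three-cycle worst case stated for the conventional scheme.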
$$L^{intf}_{par}(t) = \begin{cases} \frac{N-k}{N}T_c, & \frac{k-1}{N}T_c < t \le \frac{k}{N}T_c, \quad k = 1, 2, \ldots, N-1 \\ T_c, & \frac{N-1}{N}T_c < t \le T_c \end{cases} \qquad (15)$$
$$L^{overall}_{par}(t) = \begin{cases} 2T_c - t, & 0 < t \le \frac{N-1}{N}T_c \\ 3T_c - t, & \frac{N-1}{N}T_c < t \le T_c \end{cases} \qquad (17)$$
Figure 8 shows the overall latency of the two-flop and the parallel flop schemes. If the REQ arrives before the
clock edge of Two-flop_{N-1} (regions 1 and 2), the
proposed synchronization scheme reduces overall latency
by one clock cycle. Otherwise, it incurs the same latency as
the conventional scheme.
Acknowledgement latency: In the handshake interface of
[4], [5], the ACK signal is always generated at the rising
edge of the CLK. Therefore, the acknowledgement latency of
the conventional two-flop is the same as its interface latency:
$$L^{ack}_{two}(t) = L^{intf}_{two}(t) = T_c \qquad (18)$$
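The piecewise latency of the parallel scheme can be checked numerically. This Python sketch states its own assumptions: Two-flop_k (k = 0, ..., N-1) is clocked kT_c/N after CLK, as in the simulation of Sect. 6, the REQ is captured by the next phase edge without metastability, and the detection delay is taken to fit inside one phase interval. Under these assumptions it reproduces the one-cycle saving for t ≤ (N-1)T_c/N and no saving beyond.

```python
import math

def parallel_flop_overall(t, tc, n):
    """Overall latency of the parallel flop scheme for a REQ arriving t after the
    preceding rising CLK edge (0 < t <= tc), following Eqs. (15) and (17).

    Assumptions (ours): Two-flop_k, k = 0..n-1, is clocked k*tc/n after CLK; the
    REQ is first sampled by the next phase edge; detection fits inside one phase
    interval, so it adds no latency in this model.
    """
    if not 0 < t <= tc:
        raise ValueError("t must satisfy 0 < t <= tc")
    k = math.ceil(t * n / tc)                   # first phase edge at or after t is k*tc/n, 1 <= k <= n
    l_sync = (k * tc / n - t) + tc              # wait for that edge, then one cycle through the Two-flop
    l_intf = (n - k) * tc / n if k < n else tc  # Eq. (15); k == n means Two-flop_0 at the next CLK edge
    return l_sync + l_intf

def two_flop_overall(t, tc):
    """Eq. (16): overall latency of the single two-flop."""
    return 3 * tc - t

tc, n = 1.0, 4
for t in (0.1, 0.6, 0.9):  # two arrivals before (n-1)/n * tc and one after it
    saving = two_flop_overall(t, tc) - parallel_flop_overall(t, tc, n)
    print(f"t = {t:.2f} Tc: saving = {saving:.2f} Tc")
```

The interface term (N-k)T_c/N exactly cancels the extra phase wait kT_c/N, which is why the overall latency is 2T_c - t over the whole first N-1 phase intervals.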
As in the calculation of L^{intf}_{par}, the acknowledgement latency of the proposed interface, L^{ack}_{par}, also includes D_{detect}. Therefore, L^{ack}_{par} is defined as the time
Table 1 Latency reduction (overall and protocol latency of the two-flop vs. the parallel flop scheme, Regions 1-3).
taken from the first transition of the parallel bank output (R_i)
to the ACK generation at F5 in Fig. 6. From Eq. (9), we
expect the time from the first REQ sampling
clock edge to the F6 output to be less than the clock interval T_c/N.
Therefore, L^{ack}_{par} is at most T_c/N, as follows:
$$L^{ack}_{par}(t) \le D_{critical} < \frac{T_c}{N} - T_{setup} < \frac{T_c}{N}$$
which is depicted in Fig. 7(c).
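The bound on L^{ack}_{par} limits how many Two-flops can be paralleled. Reading it as D_critical + T_setup < T_c/N (our interpretation of Eqs. (9), (10) and the bound on L^{ack}_{par}), the Python sketch below finds the largest admissible N; the function name is hypothetical and the delay values in the usage line are illustrative, not measured values from the paper.

```python
def max_parallel_flops(tc, d_critical, t_setup):
    """Largest N with d_critical + t_setup < tc / N (same time unit for all).

    Returns 0 if even N = 1 violates the constraint.
    """
    budget = d_critical + t_setup
    if budget <= 0:
        raise ValueError("delays must be positive")
    if budget >= tc:
        return 0
    n = 1
    while budget < tc / (n + 1):  # the next N still leaves enough time in one phase interval
        n += 1
    return n

# Hypothetical delays in ns for Tc = 2.0 ns.
print(max_parallel_flops(2.0, 0.3, 0.15))  # → 4
```

A faster detection path (smaller D_critical) is what allows a larger N, and with it more of the latency gain.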
two (t)
(20)
L protocol
N
Tc
f or 0 < t
N
T
c
,
par (t) = L protocol par t +
N
f or 0 < t T c
par (t)
(21)
(22)
two
Loverall
(19)
Protocol latency: Protocol latency is the sum of synchronization and acknowledgement latency. Since the acknowledgement latency of the two-flop is the same as its interface latency, the protocol latency of the single two-flop is also the same as its
overall latency:
L protocol
Fig. 9
par ]
(23)
par ]
(24)
These trends are shown in Fig. 9. As the number of parallel Two-flops in the proposed scheme increases toward infinity,
G_{overall} and G_{protocol} approach T_c and 1.5T_c, respectively.
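The limits quoted for Fig. 9 can be reproduced under a simple model. The Python sketch below assumes that G_overall and G_protocol are the mean latency reductions over a REQ arrival phase uniformly distributed in (0, T_c], that Two-flop_k is clocked kT_c/N after CLK, and that L^{ack}_{par} is a constant `ack` (zero by default, whereas the text only bounds it by T_c/N). The function name and the `ack` and `samples` parameters are ours.

```python
import math

def gains(tc, n, ack=0.0, samples=100_000):
    """Average latency reductions (G_overall, G_protocol) of the parallel flop
    scheme over a REQ arrival phase t uniform in (0, tc] (midpoint rule).

    Assumptions (ours): Two-flop_k is clocked k*tc/n after CLK, and L_ack_par is
    the constant `ack`.
    """
    g_overall = g_protocol = 0.0
    for i in range(samples):
        t = (i + 0.5) * tc / samples               # midpoint sample of the arrival phase
        k = math.ceil(t * n / tc)                   # first phase edge at or after t
        sync_par = (k * tc / n - t) + tc
        intf_par = (n - k) * tc / n if k < n else tc
        two = 3 * tc - t                            # overall == protocol latency for the two-flop
        g_overall += two - (sync_par + intf_par)
        g_protocol += two - (sync_par + ack)
    return g_overall / samples, g_protocol / samples

for n in (2, 4, 16, 256):
    g_o, g_p = gains(1.0, n)
    print(f"N = {n:3d}: G_overall = {g_o:.3f} Tc, G_protocol = {g_p:.3f} Tc")
```

With ack = 0 the averages equal (N-1)T_c/N and (3N-1)T_c/(2N), which approach T_c and 1.5T_c; a nonzero ack lowers G_protocol by exactly ack, so the T_c/N bound still gives the same limit.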
6. Simulation Results
To verify the latency analysis, the proposed parallel flop synchronizer and its interface circuit have been
implemented in a 0.25 µm CMOS technology. We employed
four independent Two-flops (N = 4) and a 16-bit wide datapath for a pre-layout HSPICE simulation.
Figure 10 shows waveforms of the handshake signals when
two asynchronous write transactions are made to a clock domain with T_c = 2.0 ns. Initially, all handshake signals are
low, and the source domain starts a transaction by asserting the
REQ signal. This signal arrives at the destination domain
between (2/4)T_c and (3/4)T_c (region 2 in Fig. 8), and it is first
sampled by the last Two-flop, whose clock is delayed by (3/4)T_c
from the local CLK (see R3). The subsequent Two-flops then take the same value one by one at intervals of T_c/4.
As soon as the SYNC detects the transition of R3, the ACK
signal is generated, and the Rrdy signal becomes high because the
CLK signal is already low. Finally, the input data are stored
in the destination at the next rising edge of the CLK. In this
case, the overall and protocol latencies of the proposed scheme
are almost the same (about 1.5T_c). Compared with the single
two-flop scheme, which finishes the transaction
at the next rising edge of the CLK after the R0 transition, the proposed scheme
saves a clock cycle in both overall and protocol latency. These results confirm the analysis for region
2 in Fig. 8 and Table 1.
After the source domain receives the ACK from the
destination, it toggles the REQ signal to send the next data.
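The sampling order described for the N = 4 simulation follows from the phase-shifted clocks alone. The Python sketch below assumes Two-flop_k is clocked kT_c/N after the local CLK and ignores metastability; the function name is hypothetical, and the arrival time t = 1.2 ns is a hypothetical point in region 2 for T_c = 2.0 ns.

```python
def sampling_order(t, tc, n):
    """Order in which the parallel Two-flops first sample a REQ arriving t after
    a rising CLK edge (0 < t <= tc). Assumes (ours) Two-flop_k is clocked
    k*tc/n after CLK. Returns [(k, time of first sampling edge), ...] by time.
    """
    first_edges = []
    for k in range(n):
        edge = k * tc / n
        while edge < t:  # step to the first edge of phase k at or after t
            edge += tc
        first_edges.append((k, edge))
    return sorted(first_edges, key=lambda e: e[1])

# Hypothetical arrival at t = 1.2 ns, inside (2/4)Tc < t <= (3/4)Tc for Tc = 2.0 ns.
for k, edge in sampling_order(1.2, 2.0, 4):
    print(f"R{k} samples REQ at {edge:.1f} ns")
```

R3 samples first and R0, R1, R2 follow at T_c/4 = 0.5 ns intervals, matching the order seen in the waveforms; the single two-flop corresponds to waiting for R0 alone.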
Fig. 10
(25)
MTBF_{parallel}
(26)
7. Conclusion
June 1999.
[10] F. Cheng, "Practical design and performance evaluation of completion detection circuits," Proc. International Conference on Computer
Design, pp.354–359, 1998.
[11] H. Lam and C. Tsui, "High performance and low power completion
detection circuit," Proc. International Symposium on Circuits and
Systems, vol.5, pp.V-405–V-408, May 2003.
[12] M. Shames, J. Ebergen, and M. Elmasry, "A comparison of CMOS
implementation of an asynchronous circuits primitive: The C-element," International Symposium on Low Power Electronics and
Design, pp.93–96, 1996.
Suk-Jin Kim
received the B.S. degree in
electronics engineering from Kyunghee University in 1998 and the M.S. degree from Gwangju Institute of Science and Technology (GIST), Korea, in 2000. He is currently working toward
the Ph.D. degree in the Department of Information and
Communications of Gwangju Institute of Science and Technology. His research interests include synchronization and power saving
in Globally Asynchronous Locally Synchronous
(GALS) systems.
Jeong-Gun Lee
received the B.S. degree
in computer engineering from Hallym University in 1996 and the M.S. degree from Gwangju Institute of Science and Technology (GIST), Korea, in 1998. He is currently working toward
the Ph.D. degree in the Department of Information and
Communications of Gwangju Institute of Science and Technology. His research interests include Petri-net theory and asynchronous circuits and systems.
Kiseon Kim
received the B.Eng. and
M.Eng. degrees from Seoul National University, both in
electronics engineering, in 1978 and 1980, and the
Ph.D. degree in electrical engineering systems from the University of Southern
California, Los Angeles, in 1987. From 1988 to 1991,
he was with Schlumberger in Texas as a senior development engineer, where he was
involved in the development of telemetry systems.
From 1991 to 1994, he was a computer communications specialist for the Superconducting Super
Collider Lab. in Texas, where he built telemetry logging and analysis
systems for high energy physics instrumentation. He joined GIST in
1994, where he is presently a Professor. His research interests include wideband
digital communications system design, analysis, and implementation.