Second Draft

Area Optimized CMOS Layouts of a 50 Gb/s Low
Power 4:1 Multiplexer

Abstract: In this work, novel layouts of a 4:1 CMOS transmission
gate multiplexer are presented. The proposed layouts
are realized by following the design rules for 45 nm and 90 nm
CMOS processes, with a supply voltage of 1.2 V. Both layouts
are designed using two different routing strategies: using only one
metal layer, and using two metal layers. The power dissipation
and area are noted and compared in all four cases. It is observed
that layouts utilizing two metal layers have reduced area but
a higher cost of fabrication. The multiplexer layout using two
metal layers for routing generates an output bit rate as high
as 50 Gb/s in 90 nm technology and occupies an area
of 8.91 µm², dissipating 55.966 µW of power. The output bit
rate remains the same for the 45 nm process, with a power consumption
of 33.713 µW and an area of 2.45 µm². The areas occupied by
these layouts are the lowest encountered in the authors' literature
survey.
Keywords: CMOS, Optimization, Layout.
I. INTRODUCTION
A multiplexer (MUX) is a parallel-to-serial data selector
used in many high speed data communication systems. In
such systems, it is often the bottleneck point and therefore,
limits their performance [7]. Thus, high speed multiplexers are
of utmost importance. Earlier, technologies like SiGe, GaAs,
BiCMOS were preferred for designing multiplexer circuits
owing to their high speeds [2]-[4]. Nowadays, deep submicron
CMOS transistors are capable of handling such speeds [1].
CMOS is at a disadvantage against SiGe in terms of speed,
but it consumes relatively little power. This reduction in power
can be attributed to two factors: the low supply voltage of current
CMOS fabrication technologies, and practically zero static
power dissipation. CMOS also enjoys a high immunity against noise.
Since the advent of VLSI technology in the 1970s, transistor
count of processors has increased phenomenally from 2,300 in
1971 to 3.1 billion in 2011 [9], [10]. The Altera Benchmark
set of 120 real customer designs [14] has estimated that
multiplexers generally account for over 25 % of the area of
an FPGA design. Consequently, optimizing an FPGA design
typically requires multiplexer optimization. Recently, an FPGA
transistor count of 6.8 billion has been reported [11]. This
has been possible not only because of shrinking lithography,
but also due to improvements in layout design techniques.
Focusing on the latter, we propose a minimal area layout
of a 4:1 transmission gate MUX using two different routing
configurations: one using a single metal (Metal 1), and the
other using two metals (Metal 1 and Metal 2). Depending on
specific needs, either of them can be used for fabrication.
The layouts are area optimized and DRC clean for 45 nm and
90 nm specifications. Thus, 4 different layouts were designed.
Pseudo Random Binary Sequences (PRBS) were input into
the layouts and the resultant outputs were noted. In all four
cases, a maximum operational output bit rate of 50 Gb/s was
observed.
TABLE I. COMPARISON OF LAYOUT SIZE
Technology   Size of M1 layout (µm²)   Size of M2 layout (µm²)   Percentage Difference
45nm         2.77                      2.45                      11.55 %
90nm         11.07                     8.91                      19.51 %
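The percentage column of Table I can be reproduced from the two area columns. The following is a minimal sketch, assuming the difference is expressed relative to the single-metal (M1) area; the function name `percent_saving` is illustrative and not from the paper.

```python
# Layout areas as listed in Table I (single-metal M1 vs. two-metal M2).
AREAS = {
    "45nm": {"M1": 2.77, "M2": 2.45},
    "90nm": {"M1": 11.07, "M2": 8.91},
}

def percent_saving(m1_area: float, m2_area: float) -> float:
    """Area saved by the two-metal layout, as a percentage of the M1 area.

    Assumption: the table's percentage is taken relative to M1.
    """
    return 100.0 * (m1_area - m2_area) / m1_area

for node, area in AREAS.items():
    print(f"{node}: {percent_saving(area['M1'], area['M2']):.2f} %")
```

Under this assumption the computed values round to the 11.55 % and 19.51 % listed in the table.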
The multiplexer's circuit is explained in Section II by
weighing the merits of a transmission gate logic
(TGL) architecture against one built from basic logic gates.
Section III describes the strategy used by the
authors for devising the layouts. Section IV presents the
layouts and compares the area occupied in each
case; these observations are then explained. Section V focuses
on the post-simulation results of these layouts and draws
inferences from them. A comparison is also drawn against
other works encountered in the literature. Finally, Section VI
concludes the paper with the authors' remarks.
II. MULTIPLEXER ARCHITECTURE
The MUX output is a minterm sum of product (SoP)
expression. The minimized Boolean equation of a 4:1 multiplexer,
with inputs $X_0, X_1, X_2, X_3$ and select lines $C_0, C_1$, is:
$$F = (X_0 \cdot \overline{C_0} \cdot \overline{C_1}) + (X_1 \cdot \overline{C_0} \cdot C_1) + (X_2 \cdot C_0 \cdot \overline{C_1}) + (X_3 \cdot C_0 \cdot C_1)$$
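As a quick sanity check of the sum-of-products form, the sketch below evaluates it for every combination of inputs and select lines and compares it with direct selection of the input indexed by the select bits. The function name `mux4_sop` is illustrative, and treating C0 as the more significant select bit is simply how the product terms above read.

```python
from itertools import product

def mux4_sop(x, c0, c1):
    """4:1 MUX as a sum of products; exactly one term is enabled per (c0, c1)."""
    n0, n1 = 1 - c0, 1 - c1  # complemented select lines
    return (x[0] & n0 & n1) | (x[1] & n0 & c1) | (x[2] & c0 & n1) | (x[3] & c0 & c1)

# Exhaustive check over all 2**6 combinations of four inputs and two selects.
for bits in product((0, 1), repeat=6):
    *x, c0, c1 = bits
    assert mux4_sop(x, c0, c1) == x[2 * c0 + c1]
```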
The most straightforward way to realize the above equation is
to combine the corresponding logic gates, which gives the
multiplexer shown in Fig. 1(a). This circuit uses 7 logic
gates and requires 46 transistors when realized in
CMOS: 23 nMOSFETs and 23 pMOSFETs. A large number of
transistors increases chip area, delay and power dissipation,
and is not conducive to a system on chip (SoC) design.
Conventionally, 4:1 multiplexers utilize a tree type hierarchy
and are made up of three 2:1 multiplexers as shown in Fig.
2(a). The first two multiplexers share a single select bit,
say $C_0$. Their outputs then fan out to the inputs of the third
multiplexer, which is controlled by the second select bit, say
$C_1$. For proper operation, the frequency of $C_0$ should be
exactly half that of $C_1$. The resultant output would then have
double the bit rate of $C_1$.
A transmission gate (TG) consists of a pMOS-nMOS pair
wired in parallel. It selectively blocks or passes a signal level
from the input to the output. Complementary voltages control
the gates of these transistors, so that at any given instant
both transistors are in the same state, either ON or OFF.
A TG based, tree structure, 4:1 MUX is shown in Fig. 2(b). It
uses a total of only 16 CMOS transistors, a 65% reduction
in transistor count with respect to the circuit in Fig. 1(a).
Therefore, the layouts presented in this paper are based on
the architecture shown in Fig. 2(b).
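The transistor saving quoted above is simple arithmetic. The sketch below reproduces it; the totals of 46 and 16 come from the text, while the breakdown of the 16 transistors (three 2:1 multiplexers of two transmission gates each, plus two inverters for the complemented select lines) is an assumption, since the text does not itemise it.

```python
GATE_LEVEL_TRANSISTORS = 46  # logic-gate realization, from the text

# Assumed breakdown of the transmission-gate design (not itemised in the text).
MUX2_COUNT = 3            # three 2:1 multiplexers in the tree
TG_PER_MUX2 = 2           # one transmission gate per data input
TRANSISTORS_PER_TG = 2    # one pMOS and one nMOS in parallel
INVERTER_COUNT = 2        # one per select line, to generate its complement
TRANSISTORS_PER_INVERTER = 2

tg_transistors = (MUX2_COUNT * TG_PER_MUX2 * TRANSISTORS_PER_TG
                  + INVERTER_COUNT * TRANSISTORS_PER_INVERTER)
reduction = 100.0 * (GATE_LEVEL_TRANSISTORS - tg_transistors) / GATE_LEVEL_TRANSISTORS

print(tg_transistors, f"{reduction:.1f} %")
```

Under this breakdown the count matches the 16 transistors stated in the text, and the reduction rounds to the quoted 65%.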
Fig. 1. MUX based on (a) Logic Gates and (b) Transmission Gates
Fig. 2. (a) Tree Structure and (b) CMOS circuit used for layout fabrication
III. LAYOUT OPTIMIZATION
The EDA software Microwind was used for designing the
VLSI layout of the circuit shown in Fig. 2(b). It was observed
that the Verilog-to-layout compiler of the software generated
a layout that occupied an area of 139.34 µm² and dissipated
11.140 µW of power for a 1 Gb/s input, using three metal layers
in the process. It therefore occupied a disproportionately large
area and unnecessarily used an additional metal layer. These
drawbacks lead to higher cost and lower
transistor density in fabrication.
Aiming for area and power optimization, the authors decided
to construct the layout manually. In contrast to the software's
approach, a non-linear approach was used for constructing the
layouts, i.e., they were not created by merely linking several
discrete nMOS and pMOS layouts. Since an Eulerian path was
not defined for our circuit, it was not possible to create a VLSI
layout using a single n+/p+ region for $V_{ss}$/$V_{dd}$, respectively
[8]. The next step was to create 4 different regions for the 2:1
MUX and the inverters. At every step, extreme care was taken
to minimize the width of each layer as well as the distance
between discrete layers, while following all the design rules
for width, spacing and encasing for both 45 nm
and 90 nm CMOS technologies. These design rules for spacing
and width are specified in Table II.
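The minimum-width and minimum-spacing constraints of Table II lend themselves to a simple lookup-based check. The sketch below is only an illustration of that idea and is not Microwind's design rule checker; the helper `drc_violations` is hypothetical, and the values are transcribed from Table II in whatever unit the table uses.

```python
# Minimum (width, spacing) per layer, transcribed from Table II.
DESIGN_RULES = {
    "45nm": {"Poly": (2, 3), "Metal1": (3, 4), "Metal2": (3, 4), "Via": (3, 4),
             "Contact": (2, 3), "DiffN": (4, 4), "DiffP": (4, 4), "Nwell": (10, 11)},
    "90nm": {"Poly": (2, 3), "Metal1": (3, 4), "Metal2": (3, 4), "Via": (2, 4),
             "Contact": (2, 4), "DiffN": (4, 4), "DiffP": (4, 4), "Nwell": (10, 11)},
}

def drc_violations(node, layer, width, spacing):
    """Return a list of rule violations for one shape (hypothetical helper)."""
    min_width, min_spacing = DESIGN_RULES[node][layer]
    problems = []
    if width < min_width:
        problems.append(f"{layer} width {width} < {min_width}")
    if spacing < min_spacing:
        problems.append(f"{layer} spacing {spacing} < {min_spacing}")
    return problems

# The same via dimensions pass in 90 nm but violate the 45 nm width rule.
print(drc_violations("90nm", "Via", width=2, spacing=4))
print(drc_violations("45nm", "Via", width=2, spacing=4))
```

A manual layout flow like the one described would aim to sit exactly on these minimums wherever possible, which is why the differing via rule matters for the area comparison.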
Thus, highly optimized layouts were conceived, including a
layout that occupied an area of 2.45 µm², a reduction by a factor
of 15.63 in comparison to the one automatically generated by
software. Moreover, power dissipation was also reduced by a
factor of 3 with respect to the software generated layout. As stated earlier,
a MUX is the building block for many complex circuits. Hence,
a reduction in its size would lead to an overall decrease in the
size of large circuits and increase their transistor density.
IV. PROPOSED LAYOUTS
The circuit in Fig. 2(b) has been realized utilizing two
different routing strategies: the first uses only one metal layer
for wiring (hereafter referred to as M1) and the other utilizes
two metal layers (hereafter referred to as M2). The layouts M2
and M1 are shown in Fig. 3 and Fig. 4, respectively. Both are
in accordance with the design rules of 90 nm technology. Fig.
5 and Fig. 6 display M1 and M2, respectively, using standards
for 45 nm technology.
TABLE II. DESIGN RULES
Layer(s)     45nm Width   45nm Spacing   90nm Width   90nm Spacing
Poly         2            3              2            3
Metal1       3            4              3            4
Metal2       3            4              3            4
Via          3            4              2            4
Contact      2            3              2            4
DiffN        4            4              4            4
DiffP        4            4              4            4
Nwell        10           11             10           11
Poly-Metal   -            1              -            1
TABLE III. POWER DISSIPATION OF ALL LAYOUTS
Input Bitrate   Output Bitrate   Power Dissipation (µW)
(Gbps)          (Gbps)           45nm M1   45nm M2   90nm M1   90nm M2
1               4                3.683     3.592     5.647     5.310
2               8                6.989     6.391     10.880    10.313
3.2             12.8             9.125     10.041    15.831    15.831
4               16               12.046    12.672    19.951    20.061
5               20               13.684    15.055    23.860    23.860
6.4             25.6             17.620    19.592    30.754    30.754
8               32               20.922    23.243    37.629    37.320
10              40               25.690    27.066    45.973    47.973
11.11           44.44            27.162    30.337    50.232    50.241
12.5            50               30.163    33.713    55.966    55.966
M2 occupies 11.55% less area than M1 in 45 nm technology
and 19.51% less area in 90 nm technology. The results are
displayed in Table I. The cost incurred in fabricating M2 will
be higher than that of M1, because the fabrication process
would include the addition of a silicon nitride layer, a new
metal layer and vias for connecting Metal1 and Metal2 [8].
However, determining this cost difference is outside the scope
of this paper.
As expected, the layout area decreases dramatically when the 45
nm process is used: by 75% in M1 and 72.5% in M2. It is
also observed that the two dimensional layout area of M2 is
smaller in both technologies, since the Metal2 lines can
cross over existing Metal1 lines. Hence, Metal1 wires do not
need additional surface to go around other Metal1 wiring.
Moreover, the percentage change in area (M1 vs. M2) is greater
in 90 nm than in 45 nm technology. This
is because of the different design rule specifications governing
the two fabrication technologies. In both technologies, the
minimum area that has to be occupied by a contact (along with
the required surrounding layers) is 16 λ². In the 90 nm fabrication
process, the minimum area designated for a via (along with its
surrounding layers) is also 16 λ². However, this value shoots
up to 49 λ² in the 45 nm process. Consequently, each via takes
up more of the layout in 45 nm, which offsets part of the area
saved by routing on Metal2.
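The scaling figures in this paragraph follow directly from the areas in Table I and the via footprints quoted above. The short sketch below recomputes them; the function name `shrink_percent` is illustrative, and the via footprints are taken in the same design-rule units as the text.

```python
# Layout areas from Table I, grouped by routing strategy.
AREA = {"M1": {"90nm": 11.07, "45nm": 2.77},
        "M2": {"90nm": 8.91, "45nm": 2.45}}

# Minimum via footprint including surrounding layers, as quoted in the text.
VIA_FOOTPRINT = {"90nm": 16, "45nm": 49}

def shrink_percent(area_90: float, area_45: float) -> float:
    """Percentage of layout area saved by moving from 90 nm to 45 nm."""
    return 100.0 * (area_90 - area_45) / area_90

for layout, a in AREA.items():
    print(layout, f"{shrink_percent(a['90nm'], a['45nm']):.1f} %")

# Relative growth of the via footprint in design-rule units.
via_ratio = VIA_FOOTPRINT["45nm"] / VIA_FOOTPRINT["90nm"]
print(f"via footprint ratio (45nm / 90nm): {via_ratio:.2f}")
```

The roughly threefold larger via footprint in the 45 nm rules is what the paragraph identifies as the reason the two-metal layout gains less there.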
A contact, generally made of aluminium, connects active
regions (source, drain and poly) with an overhead metal layer.
A via is generally made up of tungsten and connects metal
layers that are on different levels.
Fig. 3. (a) MUX with 2 Metal layers in 90nm technology and (b) Legend
Fig. 4. MUX with 1 Metal layer in 90nm technology
Fig. 5. MUX with 1 Metal layer in 45nm technology
Fig. 6. MUX with 2 Metal layers in 45nm technology
V. SIMULATION RESULTS AND INFERENCES
Microwind 3.1.7 was used as the software platform for
all post-layout simulations presented in this work. The design
rule specifications of both processes were adhered to, as
corroborated by the software. All observations were made at
a 5 ns timescale.
Fig. 7. Output of layout M1 with 90nm standard at 1 Gbps Input
TABLE IV. COMPARISON WITH OTHER WORKS
Properties                  This work (90 nm)  This work (45 nm)  By [2]         By [3]         By [7]           By [12]          By [13]
Maximum Output Bitrate      50 Gb/s            50 Gb/s            20 Gb/s        30 Gb/s        10 Gb/s          3.6 Gb/s         200 Gb/s
Supply Voltage              1.2 V              1.2 V              1.5 V          1.5 V          1.8 V            3.3 V            0.7 V
CMOS Technology             90 nm              45 nm              0.13 µm        0.13 µm        0.18 µm          0.35 µm          45 nm
Minimum Chip Size           5.400x1.650 µm²    2.975x0.825 µm²    0.93x0.71 mm²  0.93x0.71 mm²  0.575x0.475 mm²  0.625x0.575 mm²  2.600x2.375 µm²
Maximum Power Dissipation   56 µW              33.7 µW            82 mW          75 mW          53.3 mW          60 mW            1.887 nW
Fig. 8. Power dissipation vs. output bit rate plots for all layouts.
For simulation purposes, 4 Pseudo-Random Binary Sequences
(PRBS) with the same bit rate (say b) were given as inputs ($X_0$,
$X_1$, $X_2$, $X_3$). The MUX operation was synchronised using 2
clocks, $C_0$ and $C_1$. The frequency of $C_0$ was equal to the bit rate
of the input sequences, and $C_1$ ticked at double this value.
The inputs were fed to two 2:1 multiplexers, with $C_0$ as the
common select bit. These two multiplexers generated two signals (say
$O_0$ and $O_1$) with bit rate 2b. $O_0$ and $O_1$ were then input to
the third multiplexer with $C_1$ as its select bit. The third 2:1
multiplexer gives the final output with a bit rate 4 times the input
bit rate. The output bit rates and power dissipation for all the
layouts are listed in Table III. Fig. 8 graphically represents
these values. Fig. 7 depicts the MUX output for M1 at 1 Gbps
input, using 90 nm technology.
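The rate multiplication described above can be illustrated with a small behavioural model. This is a sketch under stated assumptions, not the authors' simulation setup: each input bit period is divided into four equal slots, both clocks start low and are phase aligned, a 2:1 multiplexer passes its first input when its select is low, and the first stage pairs (X0, X2) and (X1, X3) so that the tree agrees with the Boolean equation of Section II. Short fixed bit lists stand in for PRBS inputs.

```python
def serialize(streams):
    """Interleave four equal-length bit lists (X0..X3) into one output stream."""
    out = []
    for x0, x1, x2, x3 in zip(*streams):   # one input bit period
        for slot in range(4):              # four output slots per period
            c0 = slot // 2                 # C0: one cycle per input bit (frequency b)
            c1 = slot % 2                  # C1: two cycles per input bit (frequency 2b)
            o0 = x2 if c0 else x0          # first-stage MUX (assumed pairing)
            o1 = x3 if c0 else x1          # first-stage MUX (assumed pairing)
            out.append(o1 if c1 else o0)   # third MUX, selected by C1
    return out

inputs = [[1, 0], [0, 0], [1, 1], [0, 1]]  # X0, X1, X2, X3, two bits each
output = serialize(inputs)
print(output)                              # [1, 0, 1, 0, 0, 0, 1, 1]
print(len(output) // len(inputs[0]))       # output bits per input bit: 4
```

With these assumptions each input period yields X0, X1, X2, X3 in order, so an input rate of 12.5 Gbps corresponds to the 50 Gbps output rate listed in Table III.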
While observing the output signal, we assumed that a bit
would be detected by the receiver if it satisfied the following
condition:
Let the time period of the signal (1/bit rate) be t. For a bit
to be detectable, the amplitude of the output signal at any time
$nt + (t/2)$ (where n is a positive integer) should belong to the
range $[V_{SS},\ V_{SS} + 20\%] \cup [V_{DD} - 20\%,\ V_{DD}]$. This implies
that for a $V_{DD}$ value of 1.2 V, the output value at $nt + (t/2)$
should be less than 0.24 V to be detected as logic level low and
greater than 0.96 V to be detected as logic level high. Output
values for all frequencies listed in Table III were noted and
observed on the basis of this criterion.
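The detection criterion can be written as a small predicate. The sketch below assumes the 20% margin is taken as a fraction of the supply swing, which reproduces the 0.24 V and 0.96 V thresholds quoted for a 1.2 V supply; the function names are illustrative, and samples below VSS or above VDD are not treated specially.

```python
def sample_time(n: int, bit_rate: float) -> float:
    """Mid-bit sampling instant n*t + t/2, where t = 1 / bit_rate."""
    t = 1.0 / bit_rate
    return n * t + t / 2

def detect_bit(v_sample, vdd=1.2, vss=0.0, margin=0.20):
    """Return 0 or 1 if the sampled voltage is detectable, else None."""
    swing = vdd - vss
    if v_sample <= vss + margin * swing:
        return 0          # within the low band
    if v_sample >= vdd - margin * swing:
        return 1          # within the high band
    return None           # between the bands: not detectable

print(detect_bit(0.10), detect_bit(1.10), detect_bit(0.60))
```

A bit rate would count as operational under this criterion only if every mid-bit sample of the output returns 0 or 1 rather than None.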
The output bit rate vs. power dissipation plots in Fig. 8 are
practically linear. This clearly illustrates that power dissipation
increases linearly with the input bit rate (frequency). Mathematically,
dynamic power dissipation in a low power CMOS
circuit is given as [8]:
$$P = \alpha C f V_{DD}^{2}$$
Where:
$\alpha$ = Activity factor
$C$ = Total active capacitance of electrical nodes (F)
$f$ = Operational frequency of the integrated circuit (Hz)
$V_{DD}$ = Supply Voltage (V)
Our experimental findings are therefore consistent with
theoretical observations. An increase in the input bit rate
increases the number of transitions per second (high-to-low
and low-to-high). Consequently, the dynamic power dissipation
increases. The static power dissipation in a CMOS circuit is
negligible.
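One way to quantify how close the measured trend is to linear is the Pearson correlation between output bit rate and power. The sketch below does this for the 90 nm two-metal layout using the values in Table III; the choice of this column and the helper `pearson_r` are illustrative, and a correlation near 1 only indicates a near-linear trend rather than proving the power model.

```python
from math import sqrt

# Output bit rate (Gbps) and power dissipation for the 90 nm M2 layout, from Table III.
OUTPUT_GBPS = [4, 8, 12.8, 16, 20, 25.6, 32, 40, 44.44, 50]
POWER_90NM_M2 = [5.310, 10.313, 15.831, 20.061, 23.860,
                 30.754, 37.320, 47.973, 50.241, 55.966]

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

r = pearson_r(OUTPUT_GBPS, POWER_90NM_M2)
print(f"r = {r:.4f}")
```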
Moreover, we also observe that in 90 nm CMOS technology,
M1 and M2 have near-identical values of power dissipation,
despite the 20% difference in their sizes. This suggests that
in 90 nm CMOS, there is no direct correlation between size
and power dissipation. In contrast, in 45 nm technology the power
consumption of M2 is marginally higher than that of M1 at input
bit rates of 3.2 Gbps and above (Table III). This can again
be explained by the presence of relatively larger vias in
45 nm CMOS [refer Section III] as compared to 90 nm CMOS.
These vias have a larger cross-sectional area and thus offer lower
resistance. From:
$$P = V_{DD}^{2} / R$$
we see that with a decrease in resistance, the power dissipation
in 45 nm CMOS would increase. As is evident from Table III,
the maximum operational output bit rate in all the 4 cases
turned out to be 50 Gbps with a maximum power dissipation
of 55.966 µW.
In addition to the 2 aforementioned technologies, we also
simulated layouts using a 32 nm process. The resultant MUX
yielded an output bit rate up to 40 Gb/s and dissipated
14.223 µW and 13.630 µW of power in case of M1 and M2,
respectively. The design rules for this technology are identical
to those of 45 nm technology. Hence, the layouts in Fig. 5 and
Fig. 6 were used to evaluate these values.
VI. CONCLUSION
In this work, we have presented 4 optimised layouts of a
4:1 multiplexer in 45 nm and 90 nm standards. They are
functional from DC up to an output bit rate of 50 Gbps. In all
the literature the authors surveyed on 4:1 MUX, their sizes are
the lowest recorded in the respective CMOS standards [1], [2],
[3], [7], [12], [13].
Work was also done on 32 nm CMOS technology, but
not as comprehensively. Although 45 nm technology takes
less area and consumes less power, vias require a relatively
high cross sectional area (in terms of λ). Thus, a layout in
45 nm technology that does not require a via may be more
economically feasible. This conjecture is also supported by
the observation that the single metal layout in this technology
dissipates less power than its dual metal counterpart. Individual
design rule specifications play a critical role in governing
the physical as well as electrical properties of devices.
Power dissipation in the circuit increases with the
operational frequency of the circuit. Scaling down fabrication
technology reduces the current required to switch a transistor
from on to off or vice versa. Therefore, work on reduced
lithography has enormous potential as a future research topic.
More efficient algorithms need to be developed to enable
ECAD software to optimize layouts along the lines of this work.
The next step in the authors' research would be to develop a
sophisticated algorithm capable of designing
minimum-width and minimum-spacing layouts without violating
process design rules. This would reduce human effort
and lead to the development of more efficient devices.
REFERENCES
[1] Kehrer, Daniel, H-D. Wohlmuth, Herbert Knapp, and Arpad L. Scholtz.
A 15 Gb/s 4:1 parallel-to-serial data multiplexer in 0.12 µm CMOS. In
Solid-State Circuits Conference, 2002. ESSCIRC 2002. Proceedings of
the 28th European, pp. 227-230. IEEE, 2002.
[2] Kehrer, Daniel, and H-D. Wohlmuth. A 20 Gb/s 82mW one-stage 4:1
multiplexer in 0.13 µm CMOS. Solid-State Circuits Conference,
2003. ESSCIRC'03. Proceedings of the 29th European. IEEE, 2003.
[3] Kehrer, Daniel, and H-D. Wohlmuth. A 30-Gb/s 70-mW one-stage 4:1
multiplexer in 0.13-µm CMOS. Solid-State Circuits, IEEE Journal of
39.7 (2004): 1140-1147.
[4] S. Tanaka, and H. Hida, 120-Gb/s Multiplexing and 110-Gb/s Demulti-
plexing ICs, IEEE J. Solid-State Circuits, Vol. 39, no. 12, pp. 2397-2402,
Dec. 2004.
[5] Joakim Hallin, Torgil Kjellberg, and Thomas Swahn, A 165-Gb/s 4:1
Multiplexer in InP DHBT technology, IEEE J. Solid-State Circuits, Vol.
41, no. 10, pp. 2209-2214, Oct. 2006.
[6] Guo, Yawei, Zhanpeng Zhang, Wei Hu, and Lianxing Yang. CMOS
Multiplexer and Demultiplexer for Gigabit Ethernet. In Communica-
tions, Circuits and Systems and West Sino Expositions, IEEE 2002
International Conference on, vol. 1, pp. 819-823. IEEE, 2002.
[7] Sun, Xiang, and Jun Feng. A 10 Gb/s Low-power 4:1 Multiplexer
in 0.18 µm CMOS. In Signals Systems and Electronics (ISSSE), 2010
International Symposium on, vol. 1, pp. 1-4. IEEE, 2010.
[8] Uyemura, John Paul. Introduction to VLSI circuits and systems. J. Wiley,
2002.
[9] 60 Years of The Transistor: 1947-2007, Intel,
www.intel.com/cn/technology/timeline.pdf
[10] Riedlinger, Bhatia, Biro, Bowhill, Fetzer, Gronowski and Grutkowski.
A 32nm 3.1 Billion Transistor 12-Wide-Issue Itanium Processor for
Mission-Critical Servers, ISSCC 2011 /Session 4/Enterprise Processors
and Components/4.8
[11] Xilinx's 3D (or 2.5D) packaging enables the world's highest capacity
FPGA device, and one of the most powerful processors on the market,
www.i-micronews.com/upload/pdf/3DPackaging Nov.2011 AC.pdf
[12] Li, Yujun, et al. A 3.6 Gb/s 60 mW 4:1 multiplexer in 0.35-µm CMOS.
Signals Systems and Electronics (ISSSE), 2010 International Symposium
on. Vol. 2. IEEE, 2010.
[13] Mishra, Meenakshi, and Shyam Akashe. High performance, low
power 200 Gb/s 4:1 MUX with TGL in 45 nm technology. Applied
Nanoscience (2013): 1-7.
[14] FPGA Performance Benchmarking Methodology, White Paper,
www.altera.com
