

Characteristics for the unconstrained design approach are:

- The layout is done in detail and allows all aspects of the circuits to be
  optimized, but it is costly in terms of design time. The design involves not
  only schematic or netlist entry, but also detailed layout, design rule
  checking, logic, and electrical simulation.
- The potential device density and switching frequency are very high.
- Turnaround time and cost are the same as for standard- and
  unconstrained-cell designs.
- Changing vendor or VLSI process may be very difficult. Often, the basic
  cells need to be completely redesigned.
- Digital and analog circuits can be mixed. Medium- to relatively large-size
  memories can be implemented using standard processes for digital
  circuits.

12.4 FFT PROCESSOR, CONT.


In this section we estimate the required chip area of the FFT processor assuming a
0.8-µm CMOS process. The FFT processor requires a large memory with a capacity of
1024 × (23+23)-bit words. For the FFT processor we have chosen to partition each of
the two logical memories into four physical memories. This means that we will use
eight memories with 128 × (23+23)-bit words. The partitioning is done to save chip
area since the RAMs are a dominant part of the FFT processor. To save further chip
area, it is possible to let several memory arrays share a row decoder. A decoder to
select the right memory array is also needed. The floor plan for a 128 × (23+23)-bit
memory is shown in Figure 12.11. The required read and write frequency is 31 MHz.
A lower bound for the chip area required for the eight memories in the FFT
processor is
\[ A_{\mathrm{Memory}} = 8 A_{\mathrm{RAM}} = 8 \times 1.1 \times 1.3 \approx 11.4\ \mathrm{mm}^2 \]

Notice that a significant area is required for wiring and a large part of the
chip area is wasted. Figure 12.12 shows the floor plan for the butterfly
processor. A preliminary floor plan for the complete FFT processor is shown in
Figure 12.13. The RAMs will consume most of the chip area.

Figure 12.11 Floor plan for dual-port, single-phase RAM with 128 × (23 + 23) bits

In practice, it is recommended to use at least one power pin, for both VDD and
Gnd, for every eight I/O pins. The FFT processor requires only 32 input/output
pins, and at least three pins for communication and clocking; only four VDD and
four Gnd pins are required. Further, we assume that five pins are needed for
testing purposes. Thus, about 48 pins are required. The pad frame goes around the
whole chip and the width of the pad frame is about 0.26 mm. Large empty areas
are found between the butterfly processing elements. These areas will be used for
address generation, control, decoupling capacitances, etc. The total chip area for
the FFT processor is estimated to be
\[ A_{\mathrm{FFT}} \approx 6 \times 5 \approx 30\ \mathrm{mm}^2 \]
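The pin count above can be reproduced with a few lines of Python. This is only a sketch of the rule of thumb stated in the text (one VDD and one Gnd pin per eight I/O pins); the function name `pin_budget` and its parameter names are illustrative assumptions, not part of the original design flow.

```python
import math


def pin_budget(io_pins, comm_clock_pins=3, test_pins=5, io_per_power_pair=8):
    """Estimate the package pin count.

    Assumes one VDD pin and one Gnd pin for every `io_per_power_pair`
    I/O pins, rounded up, as recommended in the text.
    """
    power_pairs = math.ceil(io_pins / io_per_power_pair)
    total = io_pins + comm_clock_pins + test_pins + 2 * power_pairs
    return {"vdd": power_pairs, "gnd": power_pairs, "total": total}


# FFT processor: 32 I/O pins, 3 communication/clock pins, 5 test pins.
fft = pin_budget(io_pins=32)
print(fft)  # 4 VDD pins, 4 Gnd pins, 48 pins in total
```

The same helper can be applied to other designs by changing `io_pins`; the defaults for communication and test pins are simply the values assumed in this section.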
The power consumption is estimated to be about 150 mW for each RAM and
300 mW for each butterfly processor at 5 V. Allowing 400 mW for control and I/O
yields a total power consumption of about

\[ P \approx 2 P_{\mathrm{Butterfly}} + 8 P_{\mathrm{RAM}} + P_{\mathrm{Control}} + P_{\mathrm{I/O}} \approx 2.2\ \mathrm{W} \]
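As a quick check of the arithmetic, the power budget can be summed in Python. The per-block figures are the estimates quoted in the text; the constant and function names are illustrative assumptions. The 60% factor is the reduction the text attributes to a more power-efficient logic style.

```python
# Per-block power estimates at 5 V, in mW, as quoted in the text.
P_BUTTERFLY_MW = 300
P_RAM_MW = 150
P_CONTROL_IO_MW = 400  # control and I/O together


def fft_power_mw(n_butterflies=2, n_rams=8):
    """Sum the block-level power estimates for the FFT processor."""
    return n_butterflies * P_BUTTERFLY_MW + n_rams * P_RAM_MW + P_CONTROL_IO_MW


total_mw = fft_power_mw()
reduced_mw = 0.6 * total_mw  # more power-efficient logic style

print(f"total:   {total_mw / 1000:.1f} W")    # 2.2 W
print(f"reduced: {reduced_mw / 1000:.1f} W")  # 1.3 W
```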


Note that this is a very crude estimate and the actual power consumption may
deviate significantly. If a more power-efficient logic style [8, 9] is used, the
power consumption may be reduced to about 60%, i.e., only 1.3 W is required.
Further, it should be noted that the power consumption for state-of-the-art CMOS
processes, which today have geometries in the range of 0.25 to 0.35 µm, would be
significantly lower. The chip area would also be smaller due to the smaller
geometry, as well as to the fact that fewer processing elements would be required
since the inherent circuit speed would be higher.

Figure 12.12 Butterfly processor with distributed arithmetic and TSPC logic

Figure 12.13 Preliminary floor plan of the FFT processor



12.5 DCT PROCESSOR, CONT.


Two sets of memories are required for the DCT processor, since two images are
processed concurrently. Each memory is implemented as two dual-port RAMs with
128 × 16 bits. The size of one such RAM is
\[ A_{\mathrm{RAM}} = 0.7 \times 1.3 \approx 0.9\ \mathrm{mm}^2 \]
A floor plan for the RAM is shown in Figure 12.14. The area required for the
memories is estimated to be
\[ A_{\mathrm{Memory}} = 4 A_{\mathrm{RAM}} \approx 3.6\ \mathrm{mm}^2 \]

Figure 12.14 Dual-port RAM with 128 × 16 bits

Figure 12.15 shows the floor plan for the complete DCT processor. The required
chip area is estimated as
\[ A_{\mathrm{DCT}} = 4.4 \times 2.7 \approx 12\ \mathrm{mm}^2 \]

Figure 12.15 Floor plan for the 2-D DCT using TSPC logic

The DCT processor requires 72 input and 72 output pins, and at least 3 pins
for communication and clocking. Here it is necessary to use about 18 VDD and 18
Gnd pins. We assume that 5 pins are needed for testing purposes. Thus, altogether
we need about 185 pins. The active circuitry is about 4.4 × 2.7 mm ≈ 12 mm². A die
of this size can accommodate about 100 pads, so the chip is pad limited. The die
size is therefore increased to 7.5 × 7.5 mm to accommodate this large number of
pads. The required die size, including pads and the necessary scribe margin
between the dice, is
\[ A_{\mathrm{DCT}} = 7.5 \times 7.5 \approx 56\ \mathrm{mm}^2 \]
Hence, about 20% of the die is used for active circuitry. A more efficient design
would involve either multiplexing I/O pins or possibly including more of the
system functions on the chip. Often special, small pads are used for pin-limited
chips to reduce the chip area.
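The utilization figure can be checked directly from the two area estimates. The snippet below is a minimal sketch; the helper name `area_mm2` and the variable names are illustrative assumptions.

```python
def area_mm2(width_mm, height_mm):
    """Area of a rectangular block in square millimetres."""
    return width_mm * height_mm


active = area_mm2(4.4, 2.7)  # active circuitry of the DCT processor
die = area_mm2(7.5, 7.5)     # die including pads and scribe margin

# Fraction of the die occupied by active circuitry (roughly one fifth).
utilization = active / die
print(f"utilization: {utilization:.0%}")
```

A utilization this low is what marks the design as pad limited: the die size is set by the number of pads rather than by the circuitry.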
The power consumption at 5 V is estimated to be 50 mW for the control unit,
60 mW for each RAM, 50 mW for each parallel/serial converter, 30 mW for the
clock, and 170 mW for each distributed arithmetic unit. The total power
consumption is estimated to be
\[ P = 50 + 4 \cdot 60 + 2 \cdot 50 + 30 + 16 \cdot 170\ \mathrm{mW} \approx 3.2\ \mathrm{W} \]
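The budget can be tabulated and summed as follows. The block counts and per-unit figures are the ones appearing in the text and its equation; the dictionary name `DCT_BLOCKS` is an illustrative assumption. The listed terms add up to 3140 mW, which the text quotes as about 3.2 W.

```python
# (number of units, power per unit in mW at 5 V), as listed in the text.
DCT_BLOCKS = {
    "control unit": (1, 50),
    "RAM": (4, 60),
    "parallel/serial converter": (2, 50),
    "clock": (1, 30),
    "distributed arithmetic unit": (16, 170),
}

total_mw = sum(count * power for count, power in DCT_BLOCKS.values())

# Show each block's contribution, largest first.
for name, (count, power) in sorted(
    DCT_BLOCKS.items(), key=lambda item: item[1][0] * item[1][1], reverse=True
):
    print(f"{name:30s} {count * power:5d} mW")
print(f"{'total':30s} {total_mw:5d} mW")
```

The breakdown makes it clear that the sixteen distributed arithmetic units dominate the budget.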

12.6 INTERPOLATOR, CONT.


The adaptor coefficients are 9 bits and the data word length is 21 bits. The floor
plan for the adaptor is shown in Figure 12.16. The main area is occupied by the
multiplier. The serial/parallel multiplier is based on the circuit shown in Figure
11.34. The execution time is
\[ T_{PE} = \left(\max\{(W_c + 2),\, W_d\} + 3\right) T_{CL} \]
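For the word lengths given above (9-bit coefficients, 21-bit data), the execution time expression evaluates to 24 clock periods. The sketch below computes this; the function names are illustrative assumptions, and the clock period is left as a parameter because its value is not stated in this passage.

```python
def pe_execution_cycles(w_c, w_d):
    """Execution time of the processing element in clock periods.

    w_c: coefficient word length in bits
    w_d: data word length in bits
    """
    return max(w_c + 2, w_d) + 3


def pe_execution_time(w_c, w_d, t_cl):
    """Execution time in the same unit as the clock period t_cl."""
    return pe_execution_cycles(w_c, w_d) * t_cl


cycles = pe_execution_cycles(w_c=9, w_d=21)
print(cycles)  # 24
```

Since the data word length exceeds the coefficient word length plus two, the data word length determines the execution time in this case.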
Also a ROM is needed for the six coefficients in each adaptor. However, the
area for this ROM is insignificant. Allowing only 30% overhead for routing, etc., a
9-bit adaptor requires
\[ A_{\mathrm{adaptor}} \approx 0.9 \times 0.3 \approx 0.27\ \mathrm{mm}^2 \]

Figure 12.16 Floor plan for a 9-bit two-port adaptor using TSPC logic

Thus, less than 2 mm² is needed for the four PEs in the interpolator.
Two memories are required for the interpolator: one with five 21-bit words
and the other with ten 21-bit words. Figure 12.17 shows the floor plan for the first
RAM. The area of the first RAM, with eight words of which only five are used, is
estimated as
\[ A_{\mathrm{RAM1}} = 0.5 \times 0.14 \approx 0.07\ \mathrm{mm}^2 \]
The area for the second RAM, with 16 words of which only 10 are used, is
estimated as
\[ A_{\mathrm{RAM2}} = 0.5 \times 0.2 = 0.1\ \mathrm{mm}^2 \]


Of course, the unused cells in the memories and the corresponding parts of
the address decoders are never implemented. The corresponding floor plan is
shown in Figure 12.18. The interpolator requires 16 input pins, 16 output pins,
and at least 3 pins for communication and clocking. We will use 4 VDD and 4 Gnd
pins. Further, we assume that 5 pins are needed for testing purposes. Thus,
altogether we need about 48 pins. The floor plan for the interpolator is shown in
Figure 12.19.

Figure 12.17 Dual-port RAM with 8 x 21 bits

Figure 12.18 Dual-port RAM with 16 × 21 bits (floor plan blocks: drivers, write
port, RAM core, read port, write address decoder, read address decoder; width
0.5 mm)

The active circuitry is about 1.1 × 1.9 mm ≈ 2.1 mm². The pads cannot be
placed with a spacing of less than 135 µm. Hence, the die can accommodate only

Figure 12.19 Floor plan for the interpolator

The die size must therefore be increased so that the circumference of the
active circuitry becomes at least 48 × 0.135 ≈ 6.5 mm. We may choose to increase
the size to 1.9 × 1.4 ≈ 2.7 mm². The pads and the necessary scribe margin add
another 0.5 mm on each side. The required die size is 2.9 × 2.4 ≈ 7.0 mm². Hence,
only 32% of the die is used for active circuitry. The interpolator is obviously a
pad-limited circuit.
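The pad-limit reasoning can be sketched in Python. The calculation assumes, as the text does, that the pads are spread along the circumference of the active circuitry at the minimum pitch; the names used here are illustrative assumptions, and the pad count computed for the original 1.1 × 1.9 mm block is a result of this simplified model rather than a figure taken from the text.

```python
import math

PAD_PITCH_MM = 0.135  # minimum pad spacing
N_PADS = 48           # pins required by the interpolator
PAD_MARGIN_MM = 0.5   # pads plus scribe margin added on each side


def perimeter_mm(width_mm, height_mm):
    """Circumference of a rectangular block."""
    return 2 * (width_mm + height_mm)


# Circumference needed to fit all pads at the minimum pitch.
required_mm = N_PADS * PAD_PITCH_MM

# Pads that fit around the original active area under this model.
pads_original = math.floor(perimeter_mm(1.1, 1.9) / PAD_PITCH_MM)

# Enlarged active area chosen in the text, and the resulting die.
enlarged_mm = perimeter_mm(1.9, 1.4)
die_w = 1.9 + 2 * PAD_MARGIN_MM
die_h = 1.4 + 2 * PAD_MARGIN_MM
die_area = die_w * die_h

print(f"required circumference: {required_mm:.2f} mm")
print(f"pads around 1.1 x 1.9 mm: {pads_original}")
print(f"enlarged circumference: {enlarged_mm:.1f} mm")
print(f"die: {die_w:.1f} x {die_h:.1f} mm = {die_area:.2f} mm2")
```

The enlarged block satisfies the circumference requirement with a small margin, which is why 1.9 × 1.4 mm is a workable choice.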
The power consumption for each adaptor is about 180 mW. The power con-
sumption for the complete interpolator is estimated as
\[ P \approx 0.9\ \mathrm{W} \]

Also in this case, the power consumption is unnecessarily high, mainly
because of an overdesigned clock driver and clock network. Another reason is the
use of a logic style, i.e., TSPC, that requires a lot of power. More power-efficient
logic styles [8, 9] are available that require only about 60% of the power of TSPC,
i.e., only 0.55 W would be required. A more modern CMOS process would also
reduce both the chip area and power consumption significantly.
Since only a small fraction of the available area is used, it may be
advantageous to instead use an architecture with more processing elements and
correspondingly lower speed requirements. The excess speed may then be traded
for lower power consumption by using voltage scaling, i.e., reducing the power
supply voltage until the circuit just meets the requirements.
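To give a feel for what voltage scaling can buy, the sketch below uses the standard first-order model in which dynamic CMOS power is proportional to the square of the supply voltage and to the clock frequency. This model is an assumption added here, not a result from the text: it ignores leakage and short-circuit currents, and the 3.3 V supply in the example is a hypothetical value chosen only for illustration.

```python
def scaled_dynamic_power(p_ref_w, v_ref, v_new, f_ratio=1.0):
    """First-order estimate of dynamic power after voltage scaling.

    Assumes P is proportional to V^2 * f (switched-capacitance model).
    p_ref_w: power at the reference supply voltage, in watts
    v_ref:   reference supply voltage, in volts
    v_new:   scaled supply voltage, in volts
    f_ratio: new clock frequency divided by the reference frequency
    """
    return p_ref_w * (v_new / v_ref) ** 2 * f_ratio


# Interpolator estimate of 0.9 W at 5 V, scaled to a hypothetical 3.3 V
# supply at an unchanged clock frequency.
p_scaled = scaled_dynamic_power(p_ref_w=0.9, v_ref=5.0, v_new=3.3)
print(f"{p_scaled:.2f} W")
```

In practice the achievable supply voltage is bounded by the speed the circuit must still deliver, which is why the text ties voltage scaling to an architecture with excess speed.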

12.7 ECONOMIC ASPECTS


Today, application-specific VLSI circuits are costly and require a long design time,
but this is to a large extent also true for conventional implementation techniques.
It is therefore necessary to address the economic constraints as well as the
designer's productivity and the risks and uncertainties associated with
development of an integrated circuit. A manufacturable, reliable design must be
developed on an aggressive time schedule with a minimum effort to keep pace with
the rapidly changing market.
Most DSP applications require such high performance with respect to power
dissipation, computational throughput, and size that it cannot be met with
conventional implementation technologies. The required number of chips is often
small, which means that the development cost must be shared among just a few
systems. The unit cost is therefore dominated by the development cost. However, it
can be expected that the large investments currently being made in
computer-aided and automatic design tools for DSP and integrated circuits will
significantly reduce the development costs as well as the design and
manufacturing time.
A VLSI package may cost $20 or more in low-volume quantities. The cost of
developing and running test programs on expensive testers is also significant. An
unprocessed wafer costs about $10 and the cost after processing is in the range of
$200 to $800, but a wafer contains one to several hundreds of potentially good
dies. Thus, silicon real estate is relatively cheap even if a large chip size means
fewer chips per wafer and a higher probability of defective chips.

12.7.1 Yield
Integrated circuits are fabricated by batch processing several wafers
simultaneously. Typical lot sizes vary between 20 and 200 wafers. The number
of square dice per wafer is

where Lc = die edge length and Dw = wafer diameter. Today, 6- and 8-inch wafers
are common.
