Architectures
logic were explored. That paper, however, only considered homogeneous memory architectures, i.e., architectures in which each memory array is identical. In this paper, we show that significant density improvements are possible if the FPGA contains a heterogeneous memory architecture, that is, an architecture with more than one size of memory array.

The goals of this paper are as follows:

1. The first goal is to quantify the density improvements that are possible with a heterogeneous memory architecture (compared to a homogeneous memory architecture) when used to implement logic.

2. There are many possible heterogeneous memory architectures (different array sizes, numbers, etc.). The second goal of this paper is to find the heterogeneous memory architecture that can most efficiently implement logic.

The architectural space explored in this paper is described in Section 2. Section 3 describes the experimental methodology and reviews the SMAP algorithm. Finally, Section 4 presents experimental results.

2 Embedded Array Architectures

Table 1 summarizes the parameters that define the FPGA embedded memory array architecture, along with values of these parameters for several commercial devices. In this paper we are considering architectures with two different array sizes; we denote the number of bits in each type of array as B1 and B2. The number of arrays of each type is denoted N1 and N2. We assume that all arrays have the same set of allowable data widths, and denote that set by W_eff. For a fixed size, a wider memory implies fewer memory words in each array. In the Altera FLEX10K, for example, B = 2048 bits and W_eff = {1, 2, 4, 8}, meaning each array can be configured to be one of 2048x1, 1024x2, 512x4, or 256x8.

3 Methodology

To compare memory array architectures, we employed an experimental methodology in which we varied the architectural parameters, and mapped a set of 28 benchmark circuits to each architecture. Each circuit contained between 527 and 6598 4-LUTs. Fifteen of the circuits were sequential. The combinational circuits and 9 of the sequential circuits were obtained from the Microelectronics Corporation of North Carolina (MCNC) benchmark suite, while the remaining sequential circuits were obtained from the University of Toronto and were the result of synthesis from VHDL and Verilog. All circuits were optimized using SIS [15] and mapped to four-input lookup-tables using Flowmap and Flowpack [16]. The SMAP algorithm was then used to pack as much circuit information as possible into the available memory arrays. The number of nodes that can be packed into the available arrays is used as a metric to compare memory array architectures.

The results in this paper depend heavily on the SMAP algorithm, which was originally developed for architectures in which all arrays are the same size. The following subsection reviews SMAP, while the subsequent subsection shows how SMAP can be used to map logic to a heterogeneous memory architecture.

3.1 Review of SMAP

This section briefly reviews SMAP; for more details, see [10].

The SMAP algorithm is based on Flowpack, a post-processing step of Flowmap [16]. Given a seed node, the algorithm finds the maximum-volume k-feasible cut, where k is the number of address inputs to each memory array. A k-feasible cut is a set of no more than k nodes in the fanin network of the seed such that the seed can be expressed entirely as a function of the nodes; the maximum-volume k-feasible cut is the cut which contains the most nodes between the cut and the seed. The nodes that make up the cut become the memory array inputs. Figure 1(a) shows an example circuit along with the maximum 8-feasible cut for seed node A.

a) Original Circuit    b) Final Implementation
Figure 1: Example Mapping to an 8-Input, 3-Output Memory Block

Given a seed node and a cut, SMAP then selects which nodes will become the memory array outputs. Any node that can be expressed as a function of the cut nodes is a potential memory array output. The selection of the outputs
                                      Commercial Devices                         Range in
Parameter  Meaning                    Altera 10K  Vantis VF1  Lattice isp6192    this paper
N1         Number of Type-1 Arrays    3-16        28-48       1                  1-9
N2         Number of Type-2 Arrays    -           -           -                  1-9
B1         Bits per Type-1 Array      2048        128         4608               128-8192
B2         Bits per Type-2 Array      -           -           -                  128-8192
W_eff      Allowable Data Widths      1,2,4,8     4           9,18               1,2,4,8
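
The relationship between the bits-per-array and allowable-data-width parameters can be illustrated with a short sketch. It is not taken from the paper: the helper names `configurations` and `address_inputs` are hypothetical, and the sketch assumes every array holds a power-of-two number of words, as in the FLEX10K example from Section 2.

```python
def configurations(bits, widths):
    """List the (words, width) shapes an array of `bits` bits can take.

    `widths` is the set of allowable data widths; widths that do not
    divide the array size evenly are skipped.
    """
    return [(bits // w, w) for w in sorted(widths) if bits % w == 0]


def address_inputs(words):
    """Number of address lines needed to select one of `words` words.

    Assumes `words` is a power of two (an assumption of this sketch).
    """
    return words.bit_length() - 1


# FLEX10K-style array: 2048 bits, allowable widths 1, 2, 4 and 8.
flex10k = configurations(2048, {1, 2, 4, 8})
for words, width in flex10k:
    print(f"{words}x{width}: {address_inputs(words)} address inputs")
```

Under these assumptions a wider configuration trades address inputs for data outputs: the 256x8 shape has 8 address inputs, while the 2048x1 shape has 11.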
[Figure: three plots with Bits per Array (128 to 8192) on the horizontal axis; the one recoverable vertical-axis label is Packing Ratio.]
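
The k-feasible cut definition reviewed in Section 3.1 can be made concrete with a small check. This is only a sketch of the definition, not the flow-based search that Flowpack and SMAP use to find the maximum-volume cut. The dictionary representation of the circuit, the function names `cone_nodes` and `is_k_feasible`, and the tiny example circuit are all assumptions made for illustration.

```python
def cone_nodes(fanins, seed, cut):
    """Return the nodes between the cut and the seed (seed included).

    `fanins` maps each node to the list of nodes driving it; primary
    inputs map to an empty list. Returns None if some path from the
    seed reaches a primary input without crossing the cut, in which
    case the seed is not a function of the cut nodes alone.
    """
    covered, stack = set(), [seed]
    while stack:
        node = stack.pop()
        if node in cut or node in covered:
            continue
        if not fanins.get(node):      # primary input outside the cut
            return None
        covered.add(node)
        stack.extend(fanins[node])
    return covered


def is_k_feasible(fanins, seed, cut, k):
    """True if `cut` has at most k nodes and fully separates the seed."""
    return len(cut) <= k and cone_nodes(fanins, seed, cut) is not None


# Hypothetical three-gate circuit: A = f(B, C), B = g(i1, i2), C = h(i2, i3).
circuit = {
    "A": ["B", "C"],
    "B": ["i1", "i2"],
    "C": ["i2", "i3"],
    "i1": [], "i2": [], "i3": [],
}

deep_cut = {"i1", "i2", "i3"}
shallow_cut = {"B", "C"}
print(is_k_feasible(circuit, "A", deep_cut, 3))      # feasible for k = 3
print(len(cone_nodes(circuit, "A", deep_cut)))       # volume of the deep cut
print(len(cone_nodes(circuit, "A", shallow_cut)))    # volume of the shallow cut
```

In this example both cuts are valid for k = 3, but the deep cut covers three nodes and the shallow cut only one, so a maximum-volume search would prefer the deep cut.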
[Figure: surface plot of Packing Density against Array 1 size (B1) and Array 2 size (B2), each from 128 to 8192 bits. The values printed with the plot are tabulated here.]

Packing Density
                                    Array 1 size (B1)
Array 2 size (B2)    128    256    512   1024   2048   4096   8192
      128           2.04   2.17   2.67   2.79   3.42   2.43   1.55
      256                  2.10   2.61   2.73   3.33   2.41   1.56
      512                         2.77   2.86   3.27   2.40   1.57
     1024                                2.73   2.98   2.28   1.53
     2048                                       2.63   2.04   1.43
     4096                                              1.63   1.24
     8192                                                     0.99
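
The comparison between the best heterogeneous and best homogeneous architectures can be checked with a short calculation. The sketch below assumes the packing densities printed with the plot form an upper-triangular grid, with each row starting on the diagonal (B1 = B2); that alignment is an interpretation of the extracted values, not something the paper states. The variable names are illustrative.

```python
sizes = [128, 256, 512, 1024, 2048, 4096, 8192]

# Packing densities keyed by B2; each row starts at B1 = B2 (assumed layout).
rows = {
    128:  [2.04, 2.17, 2.67, 2.79, 3.42, 2.43, 1.55],
    256:  [2.10, 2.61, 2.73, 3.33, 2.41, 1.56],
    512:  [2.77, 2.86, 3.27, 2.40, 1.57],
    1024: [2.73, 2.98, 2.28, 1.53],
    2048: [2.63, 2.04, 1.43],
    4096: [1.63, 1.24],
    8192: [0.99],
}

# Expand the triangular rows into a {(B1, B2): density} dictionary.
density = {}
for b2, values in rows.items():
    start = sizes.index(b2)
    for b1, value in zip(sizes[start:], values):
        density[(b1, b2)] = value

best_overall = max((v, key) for key, v in density.items())
best_homogeneous = max((v, key) for key, v in density.items() if key[0] == key[1])
improvement = best_overall[0] / best_homogeneous[0] - 1

print(best_overall, best_homogeneous, f"{improvement:.1%}")
```

Read this way, the peak lies at 2048-bit and 128-bit arrays and the best homogeneous point at 512-bit arrays, which gives an improvement of roughly 23%.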
are four of each kind of array). As the results show, the best packing density occurs when there are four arrays of 2048 bits each, and four arrays of 128 bits each (we did not consider array sizes smaller than 128 bits, since such small arrays would not be suitable for implementing the memory parts of circuits, and thus would not likely be considered by an FPGA manufacturer). The packing density at this point is 23% higher than the best packing density obtained for homogeneous architectures.

We repeated the experiments for several values of N1 and N2 (the numbers of type-1 and type-2 arrays); selected graphical results are shown in Figure 4. In Figure 4(a), one of each type of array is assumed. In this case, the best architecture is a homogeneous architecture in which both arrays contain 2048 bits. This was the only configuration for which a homogeneous architecture was found to be the best.

Results for FPGAs with a 1:2 ratio of type-1 to type-2 arrays (that is, FPGAs for which there are twice as many type-2 arrays as type-1 arrays) are shown in Figures 4(c) and (d). Results for FPGAs with a 1:3 ratio (three times as many type-2 arrays as type-1 arrays) are shown in Figures 4(e) and (f). In both cases, the best architecture was found to consist of 2048-bit arrays and 128-bit arrays (this was the case for all architectures which we investigated, except the one-of-each-type case described above).

It is interesting to note that although an FPGA with both 128-bit arrays and 2048-bit arrays was found to be best, in some cases (Figures 4(c) and (e)) the majority of the arrays should contain 2048 bits, while in other cases the majority of the arrays should contain 128 bits (Figures 4(d) and (f)). This can be observed in the graphs by noticing that in Figures 4(c) and (e), the highest point is to the left of the center of the graph, while in Figures 4(d) and (f), the highest point is to the right of the center of the graph.

We have investigated other architectures with ratios of 1:2 and 1:3, and have confirmed that, as the total number of arrays increases, the preference for smaller arrays increases. Intuitively, if there are more arrays, the SMAP tool is less able to effectively fill the larger arrays with logic.

A second conclusion that can be drawn from the results in Figure 4 (and confirmed by other experiments we have performed) is that as the total number of arrays increases, the advantage due to heterogeneous architectures (compared to homogeneous architectures) tends to increase. If there are only two arrays, a homogeneous architecture is
better, while if there are 12 arrays (Figures 4(d) and (f)), the heterogeneous architecture is considerably better (22% better in each case).

5 Conclusions

Although embedded arrays in FPGAs were developed in order to implement on-chip storage, it is clear that these arrays can also be configured as ROMs and used to implement logic. In this paper, we have shown that significant density improvements are possible if the FPGA contains a heterogeneous memory architecture, that is, an architecture with more than one size of memory array. The amount of improvement depends on how many memory arrays are present; if there are eight arrays, we have shown that the best heterogeneous architecture can implement logic 23% more efficiently than the best homogeneous architecture.

In virtually all cases, we have found that the best heterogeneous architecture consists of some 2048-bit arrays and some 128-bit arrays. The exact number of each size of array depends on the total number of arrays available; the more arrays that are present, the larger the proportion that should be 128 bits.

We have also shown that the benefits of heterogeneous architectures become more significant as the number of arrays increases. This is a compelling argument for heterogeneous memory architectures. Future architectures are likely to contain more memory than they do now; FPGAs with such large memory capacities would benefit significantly if a heterogeneous architecture were used.

References

[1] Xilinx, Inc., Virtex 2.5 V Field Programmable Gate Arrays, ver. 1.6, July 1999.
[2] Altera Corporation, FLEX 10K Embedded Programmable Logic Family Data Sheet, ver. 4.1, Mar 2001.
[3] Altera Corporation, APEX 20K Programmable Logic Device Family Data Sheet, ver. 2.1, Feb 2002.
[4] Altera Corporation, Stratix Programmable Logic Device Family Datasheet, 2002.
[5] Xilinx, Inc., XC4000E and XC4000X Series Field Programmable Gate Arrays, ver. 1.6, May 1999.
[6] Actel Corporation, Datasheet: 3200DX Field-Programmable Gate Arrays, 1995.
[7] Actel Corporation, Actel's Reprogrammable SPGAs, 1996.
[8] Lattice Semiconductor Corporation, Datasheet: ispLSI and pLSI 6192 High Density Programmable Logic with Dedicated Memory and Register/Counter Modules, July 1996.
[9] T. Ngai, J. Rose, and S. J. E. Wilton, "An SRAM-Programmable field-configurable memory," in Proceedings of the IEEE 1995 Custom Integrated Circuits Conference, pp. 499-502, May 1995.
[10] S. J. E. Wilton, "SMAP: heterogeneous technology mapping for FPGAs with embedded memory arrays," in ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, pp. 171-178, February 1998.
[11] S. J. E. Wilton, J. Rose, and Z. G. Vranesic, "Architecture of centralized field-configurable memory," in Proceedings of the ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, pp. 97-103, 1995.
[12] S. J. E. Wilton, J. Rose, and Z. G. Vranesic, "Memory/logic interconnect flexibility in FPGAs with large embedded memory arrays," in Proceedings of the IEEE 1996 Custom Integrated Circuits Conference, pp. 144-147, May 1996.
[13] S. J. E. Wilton, J. Rose, and Z. G. Vranesic, "Memory-to-memory connection structures in FPGAs with embedded memory arrays," in ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, pp. 10-16, February 1997.
[14] S. J. E. Wilton, "Implementing logic in FPGA embedded memory arrays: Architectural implications," in IEEE Custom Integrated Circuits Conference, May 1998.
[15] E. Sentovich, "SIS: A system for sequential circuit synthesis," Tech. Rep. UCB/ERL M92/41, Electronics Research Laboratory, University of California, Berkeley, May 1992.
[16] J. Cong and Y. Ding, "FlowMap: an optimal technology mapping algorithm for delay optimization in lookup-table based FPGA designs," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 13, pp. 1-12, January 1994.
[17] S. J. E. Wilton, Architectures and Algorithms for Field-Programmable Gate Arrays with Embedded Memory. PhD thesis, University of Toronto, 1997.
[Figure 4, panels (a) to (f): surface plots of Packing Density against array sizes B1 and B2, each ranging from 128 to 8192 bits.]
Figure 4: Other Selected Heterogeneous Architecture Results