
A High-Performance FIR Filter Architecture for Fixed and

Reconfigurable Applications

Table of Contents

1. Introduction
   1.1 Purpose
   1.2 Target Audience
   1.3 References
   1.4 Abbreviations
2. Introduction to VLSI Design
   2.1 What is VLSI?
   2.1.1 Dealing with VLSI Circuits
   2.1.2 Introduction to VHDL
   2.1.2.1 Limitations of traditional programming languages
   2.1.3 Electronic Design Automation
3. Literature Survey
4. Project
5. Software Requirement
   5.1 ModelSim - Altera
   5.1.1 Assumptions
   5.1.2 Basic Simulation Flow
   5.1.2.1 Basic Simulation Flow - Overview Lab [Creating the Working Library]
   5.1.3 Project Flow
   5.1.4 Multiple Library Flow
   5.1.5 Debugging Tools
   5.1.6 Basic Simulation
   5.1.6.1 Introduction
   5.1.6.2 Design Files for this Lesson
   5.2 Xilinx ISE
   5.2.1 Pre-requisite
   5.2.2 Simulation Libraries
   5.2.3 Setting up Environment Variables
   5.2.4 Simulation Library Compilation Wizard
6. Software Algorithm
   6.1 Encryption Round
   6.2 Decryption Round
   6.3 Key Round
7. Synthesis Report (Example)
   7.1 Analysis and Synthesis
   7.2 Flow Summary
   7.3 Timing Analyzer Summary
8. Simulation Output (Example)
   8.1 Scalable Encryption
   8.1.1 Binary Format
   8.1.2 Decimal Format
   8.2 Scalable Decryption
   8.2.1 Binary Format
   8.2.2 Decimal Format
   8.3 Input and Output
9. RTL View (Example)
   9.1 Scalable Encryption Algorithm Top Module
   9.2 Scalable Encryption Algorithm Multiplexer
   9.3 Scalable Encryption and Decryption Round
   9.4 Scalable Encryption Algorithm Key Round
10. Conclusion
11. Reference

List of Figures

Figure 2: FPGA Architecture
Figure 3: Examples of FPGA Packages
Figure 4: Basic Simulation Flow
Figure 5: Project Flow
Figure 6: Multiple Library Flow
Figure 7: Analysis and Synthesis
Figure 8: Flow Summary
Figure 9: Timing Analyzer Summary 1
Figure 10: Timing Analyzer Summary 2
Figure 11: Timing Analyzer Summary 3
Figure 12: Scalable Encryption
Figure 13: Scalable Encryption Binary Format
Figure 14: Scalable Encryption Decimal Format
Figure 15: Scalable Decryption
Figure 16: Scalable Decryption Binary Format
Figure 17: Scalable Decryption Decimal Format
Figure 18: Scalable Encryption Algorithm Top Module
Figure 19: Scalable Encryption Algorithm Multiplexer
Figure 20: Scalable Encryption Algorithm Decryption
Figure 21: Scalable Encryption Algorithm Key Round

1.

INTRODUCTION

1.1

Purpose

This document is designed as a VLSI reference covering the project and its software implementation.

1.2

Target Audience

This document is intended for client and student verification.

1.3

References

RTL Hardware Design Using VHDL: Coding for Efficiency, Portability, and Scalability

Circuit Design with VHDL, Volnei A. Pedroni

Spartan-6 Family Overview

ModelSim User's Manual, Software Version 6.5b

1.4

Abbreviations

VLSI - Very Large Scale Integrated Circuits
HDL - Hardware Description Language
VHDL - VHSIC Hardware Description Language
VHSIC - Very High Speed Integrated Circuits
IEEE - Institute of Electrical and Electronics Engineers
VITAL - VHDL Initiative Towards ASIC Libraries
RTL - Register Transfer Level
WAVES - Waveform and Vector Exchange to Support Design and Test Verification
PLA - Programmable Logic Array
PAL - Programmable Array Logic
CPLD - Complex Programmable Logic Device
CLB - Configurable Logic Block
ISE - Integrated Software Environment
EDA - Electronic Design Automation
FPGA - Field-Programmable Gate Array

2.

INTRODUCTION TO VLSI DESIGN:

2.1

What is VLSI?

VLSI stands for "Very Large Scale Integration". This is the field of packing more and more logic devices into smaller and smaller areas. Thanks to VLSI, circuits that would have taken boards full of space can now be fit into a small area a few millimeters across! VLSI circuits are everywhere: your computer, your car, your brand-new state-of-the-art digital camera, cell phones, and so on. All this involves a lot of expertise on many fronts within the same field, which we will look at in later sections.
2.1.1

Dealing with VLSI Circuits

The way normal blocks like latches and gates are implemented is different from what students
have seen so far, but the behavior remains the same. All the miniaturization involves new things
to consider. A lot of thought has to go into actual implementations as well as design.

Circuit Delays: Large, complicated circuits running at very high frequencies have one big problem to tackle: the delays in propagation of signals through gates and wires, even across areas only a few micrometers wide. Operating speeds are so high that, as the delays add up, they can become comparable to the clock period.
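As a back-of-the-envelope sketch of this effect (all numbers invented purely for illustration), one can sum the gate and wire delays along a combinational path and compare the total against the clock period:

```python
# Hypothetical numbers: a path through 10 gates at ~0.5 ns each,
# plus 1.2 ns of wire delay, against a 200 MHz clock.
gate_delay_ns = 0.5
num_gates = 10
wire_delay_ns = 1.2
path_delay_ns = num_gates * gate_delay_ns + wire_delay_ns   # 6.2 ns

clock_mhz = 200
clock_period_ns = 1000 / clock_mhz                          # 5.0 ns

# the accumulated delay exceeds the clock period, so this path fails timing
meets_timing = path_delay_ns <= clock_period_ns
print(path_delay_ns, clock_period_ns, meets_timing)
```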

Power: Another effect of high operating frequencies is increased power consumption. This has a two-fold effect: devices drain batteries faster, and heat dissipation increases. Coupled with the fact that surface areas have decreased, heat poses a major threat to the stability of the circuit itself.

Layout: Laying out the circuit components is a task common to all branches of electronics. What is special in our case is that there are many possible ways to do this: there can be multiple layers of different materials on the same silicon, there can be different arrangements of the smaller parts for the same component, and so on. The choice among them is determined by the way we choose to lay out the circuit components. Layout can also affect the fabrication of VLSI chips, making it either easy or difficult to implement the components on the silicon.
2.1.2

Introduction to VHDL

A digital system can be described at different levels of abstraction and from different points of
view. An HDL should faithfully and accurately model and describe a circuit, whether already
built or under development, from either the structural or behavioral views, at the desired level of
abstraction. Because HDLs are modeled after hardware, their semantics and use are very
different from those of traditional programming languages.
2.1.2.1 Limitations of traditional programming languages

There is a wide variety of computer programming languages, from FORTRAN to C to Java.
Unfortunately, they are not adequate to model digital hardware. To understand their limitations, it
is beneficial to examine the development of a language. A programming language is
characterized by its syntax and semantics. The syntax comprises the grammatical rules used to
write a program, and the semantics is the meaning associated with language constructs. When
a new computer language is developed, the designers first study the characteristics of the
underlying processes and then develop syntactic constructs and their associated semantics to
model and express these characteristics.

Most traditional general-purpose programming languages, such as C, are modeled after a


sequential process. In this process, operations are performed in sequential order, one operation at
a time. Since an operation frequently depends on the result of an earlier operation, the order of
execution cannot be altered at will. The sequential process model has two major benefits. At the
abstract level, it helps the human thinking process to develop an algorithm step by step. At the
implementation level, the sequential process resembles the operation of a basic computer model
and thus allows efficient translation from an algorithm to machine instructions.

The characteristics of digital hardware, on the other hand, are very different from those of the sequential model. A typical digital system is normally built from smaller parts, with customized wiring that connects the input and output ports of these parts. When a signal changes, the parts connected to the signal are activated and a set of new operations is initiated accordingly. These operations are performed concurrently, and each operation will take a specific amount of time,

which represents the propagation delay of a particular part, to complete. After completion, each
part updates the value of the corresponding output port. If the value is changed, the output signal
will in turn activate all the connected parts and initiate another round of operations. This
description shows several unique characteristics of digital systems, including the connections of
parts, concurrent operations, and the concept of propagation delay and timing. The sequential
model used in traditional programming languages cannot capture the characteristics of digital
hardware, and there is a need for special languages (i.e., HDLs) that are designed to model
digital hardware.
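The event-driven, concurrent semantics described above can be sketched as a toy discrete-event simulator, written here in Python purely as an illustration (the netlist and delays are invented): a signal change activates every connected gate, and each gate schedules its new output after its own propagation delay.

```python
import heapq

def simulate(events, gates, signals, t_end):
    """Minimal discrete-event sketch: gates re-evaluate only when an
    input signal changes, each with its own propagation delay."""
    pq = list(events)                 # entries are (time, signal, value)
    heapq.heapify(pq)
    while pq:
        t, sig, val = heapq.heappop(pq)
        if t > t_end or signals.get(sig) == val:
            continue                  # no change: nothing is activated
        signals[sig] = val
        # every gate connected to `sig` is activated concurrently
        for out, (fn, ins, delay) in gates.items():
            if sig in ins:
                new = fn(*(signals[i] for i in ins))
                heapq.heappush(pq, (t + delay, out, new))
    return signals

signals = {"a": 0, "b": 0, "x": 0, "y": 1}
gates = {
    "x": (lambda a, b: a & b, ("a", "b"), 2),   # AND gate, 2 ns delay
    "y": (lambda x: 1 - x,    ("x",),     1),   # inverter, 1 ns delay
}
out = simulate([(0, "a", 1), (0, "b", 1)], gates, signals, 100)
print(out["x"], out["y"])   # prints: 1 0
```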

VHDL includes facilities for describing logical structure and function of digital systems at a
number of levels of abstraction, from system level down to the gate level. It is intended, among
other things, as a modeling language for specification and simulation. We can also use it for
hardware synthesis if we restrict ourselves to a subset that can be automatically translated into
hardware.

VHDL arose out of the United States government's Very High Speed Integrated Circuits
(VHSIC) program. In the course of this program, it became clear that there was a need for a
standard language for describing the structure and function of integrated circuits (ICs). Hence the
VHSIC Hardware Description Language (VHDL) was developed. It was subsequently developed
further under the auspices of the Institute of Electrical and Electronic Engineers (IEEE) and
adopted in the form of the IEEE Standard 1076, Standard VHDL Language Reference Manual, in
1987. This first standard version of the language is often referred to as VHDL-87.

After the initial release, various extensions were developed to facilitate various design and
modeling requirements. These extensions are documented in several IEEE standards:
i. IEEE Standard 1076.1-1999, VHDL Analog and Mixed-Signal Extensions (VHDL-AMS): defines the extension for analog and mixed-signal modeling.
ii. IEEE Standard 1076.2-1996, VHDL Mathematical Packages: defines extra mathematical functions for real and complex numbers.
iii. IEEE Standard 1076.3-1997, Synthesis Packages: defines arithmetic operations over a collection of bits.
iv. IEEE Standard 1076.4-1995, VHDL Initiative Towards ASIC Libraries (VITAL): defines a mechanism to add detailed timing information to ASIC cells.
v. IEEE Standard 1076.6-1999, VHDL Register Transfer Level (RTL) Synthesis: defines a subset that is suitable for synthesis.
vi. IEEE Standard 1164-1993, Multivalue Logic System for VHDL Model Interoperability (std_logic_1164): defines new data types to model multivalued logic.
vii. IEEE Standard 1029.1-1998, VHDL Waveform and Vector Exchange to Support Design and Test Verification (WAVES): defines how to use VHDL to exchange information in a simulation environment.

Figure 1 : Summary of VHDL design flow


2.1.3

Electronic Design Automation

There are several EDA (Electronic Design Automation) tools available for circuit synthesis, implementation, and simulation using VHDL. Some tools (place and route, for example) are offered as part of a vendor's design suite (e.g., Altera's Quartus II, which allows the synthesis of VHDL code onto Altera's CPLD/FPGA chips, or Xilinx's ISE suite, for Xilinx's CPLD/FPGA chips). Other tools (synthesizers, for example), besides being offered as part of the design suites, can also be provided by specialized EDA companies (Mentor Graphics, Synopsys, Synplicity, etc.). Examples of the latter group are Leonardo Spectrum (a synthesizer from Mentor Graphics), Synplify (a synthesizer from Synplicity), and ModelSim (a simulator from Model Technology, a Mentor Graphics company). The designs presented in the book were synthesized onto CPLD/FPGA devices (appendix A) either from Altera or Xilinx. The tools used were either ISE combined with ModelSim, MaxPlus II combined with Advanced Synthesis Software, or Quartus II. Leonardo Spectrum was also used occasionally. Although different EDA tools were used to implement and test the examples presented in the design, we decided to standardize the visual presentation of all simulation graphs. Due to its clean appearance, the waveform editor of MaxPlus II was employed. However, newer simulators like ModelSim and Quartus II offer a much broader set of features, which allow, for example, a more refined timing analysis. For that reason, those tools were adopted when examining the fine details of each design.

2.1.4 Field-Programmable Gate Array

A field-programmable gate array (FPGA) is a semiconductor device that can be configured by


the customer or designer after manufacturing, hence the name "field-programmable". To program an FPGA, one must specify how the chip should work, with a logic circuit diagram
or a source code in a hardware description language (HDL). FPGAs can be used to implement
any logical function that an application-specific integrated circuit (ASIC) could perform, but the
ability to update the functionality after shipping offers advantages for many applications.

FPGAs contain programmable logic components called "logic blocks", and a hierarchy of
reconfigurable interconnects that allow the blocks to be "wired together", somewhat like a one-chip programmable breadboard. Logic blocks can be configured to perform complex
combinational functions, or merely simple logic gates like AND and XOR. In most FPGAs, the
logic blocks also include memory elements, which may be simple flip-flops or more complete
blocks of memory.
For any given semiconductor process, FPGAs are usually slower than their fixed ASIC
counterparts. They also draw more power, and generally achieve less functionality using a given
amount of circuit complexity. But their advantages include a shorter time to market, ability to reprogram in the field to fix bugs, and lower non-recurring engineering costs. Vendors can also
take a middle road by developing their hardware on ordinary FPGAs, but manufacture their final
version so it can no longer be modified after the design has been committed.

Field Programmable Gate Array (FPGA) devices were introduced by Xilinx in the mid-1980s.
They differ from CPLDs in architecture, storage technology, number of built-in features, and
cost, and are aimed at the implementation of high performance, large-size circuits.

Figure 2 : FPGA Architecture

The basic architecture of an FPGA is illustrated in figure 2. It consists of a matrix of CLBs


(Configurable Logic Blocks), interconnected by an array of switch matrices.

The internal architecture of a CLB is different from that of a PLD. First, instead of implementing
SOP expressions with AND gates followed by OR gates (like in SPLDs), its operation is
normally based on a LUT (lookup table). Moreover, in an FPGA the number of flip-flops is much
more abundant than in a CPLD, thus allowing the construction of more sophisticated sequential
circuits. Besides JTAG support and interface to diverse logic levels, other additional features are
also included in FPGA chips, like SRAM memory, clock multiplication (PLL or DLL), PCI
interface, etc. Some chips also include dedicated blocks, like multipliers, DSPs, and
microprocessors.

Another fundamental difference between an FPGA and a CPLD refers to the storage of the
interconnects. While CPLDs are non-volatile (that is, they make use of antifuse, EEPROM,
Flash, etc.), most FPGAs use SRAM, and are therefore volatile. This approach saves space and
lowers the cost of the chip because FPGAs present a very large number of programmable
interconnections, but requires an external ROM. There are, however, non-volatile FPGAs (with
antifuse), which might be advantageous when reprogramming is not necessary.

Figure 3 : Examples of FPGA Packages

FPGAs can be very sophisticated. Chips manufactured with state-of-the-art 0.09 µm CMOS technology, with nine copper layers and over 1,000 I/O pins, are currently available. A few examples of FPGA packages are illustrated in figure 3, which shows one of the smallest FPGA packages on the left (64 pins), a medium-size package in the middle (324 pins), and a large package (1,152 pins) on the right. Several companies manufacture FPGAs, like Xilinx, Actel, Altera, and QuickLogic.

Notice that all Xilinx FPGAs use SRAM to store the interconnects, so they are reprogrammable, but volatile (thus requiring external ROM). On the other hand, Actel FPGAs are non-volatile (they
use antifuse), but are non-reprogrammable (except one family, which uses Flash memory). Since
each approach has its own advantages and disadvantages, the actual application will dictate
which chip architecture is most appropriate.

3.

Literature survey

In "A Low-Power Digit-Based Reconfigurable FIR Filter", a flexible yet compact and low-power solution is provided for FIR filters with a wide range of precisions and tap lengths. Based on the proposed architecture, an 8-digit reconfigurable FIR filter chip is implemented in a single-poly quadruple-metal 0.35-um CMOS technology.
In "Area Efficient and Low Power Reconfigurable FIR Filter", new reconfigurable architectures of low-complexity FIR filters are proposed, namely the programmable shifts method. The method can be modified by replacing the adder architecture with a carry-save adder instead of a normal adder.
In "Area Efficient Reconfigurable Fast Filter Bank for Multi-Standard Wireless Receivers", this is accomplished with an improved modified frequency transformation-based variable digital filter (MFTVDF) at the first stage of the multistage implementation, which offers unabridged control over the cutoff frequency on a wide frequency range, thereby improving the cutoff frequency range, which in turn results in fine control over sub-band bandwidth.
In "Hardware Efficient Reconfigurable FIR Filter", two new reconfigurable architectures of low-complexity FIR filters are proposed, namely the constant shifts method and the programmable shifts method. The proposed FIR filter architecture is capable of operating on filter coefficients of different word lengths without any overhead in the hardware circuitry.

4.

Project

4.1

Existing System:

The output of an FIR filter of length N can be computed using the relation

y(n) = h(0)x(n) + h(1)x(n-1) + ... + h(N-1)x(n-N+1)    (1)

The computation of (1) can be expressed by the recurrence relation

Y(z) = [z^-1(...(z^-1(z^-1 h(N-1) + h(N-2)) + h(N-3)) + ... + h(1)) + h(0)] X(z)    (2)
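As a numerical sanity check (coefficients and inputs invented for illustration), the transpose-form structure implied by the recurrence (2), in which each new sample multiplies every coefficient at once and partial sums ripple through a chain of delay registers, can be compared against the direct sum (1):

```python
def fir_direct(xs, h):
    # y(n) = h(0)x(n) + h(1)x(n-1) + ..., with x(m) = 0 for m < 0
    return [sum(h[k] * (xs[n - k] if n - k >= 0 else 0)
                for k in range(len(h)))
            for n in range(len(xs))]

def fir_transpose(xs, h):
    # transpose form: products of the current sample accumulate through
    # a chain of delay registers toward the output adder
    N = len(h)
    s = [0] * (N + 1)                 # s[1..N-1] are registers, s[N] = 0
    ys = []
    for xn in xs:
        ys.append(h[0] * xn + s[1])   # output adder
        for i in range(1, N):
            s[i] = h[i] * xn + s[i + 1]   # s[i+1] still holds last cycle's value
    return ys

h = [3, -1, 4, 1, -5, 9]              # N = 6, as in the DFG example
xs = [1, 2, 0, -1, 3, 2, 1, 0]
print(fir_transpose(xs, h) == fir_direct(xs, h))   # prints: True
```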

4.1.1 Computational Analysis:


The data-flow graphs (DFG-1 and DFG-2) of the transpose-form FIR filter for filter length N = 6 are shown in Fig. 1, for a block of two successive outputs {y(n), y(n-1)} that are derived from (2). The product values and their accumulation paths in DFG-1 and DFG-2 of Fig. 1 are shown in the data-flow tables (DFT-1 and DFT-2) of Fig. 2. The arrows in DFT-1 and DFT-2 of Fig. 2 represent the accumulation path of the products.

Fig. 1. DFG of transpose-form structure for N = 6. (a) DFG-1 for output y(n). (b) DFG-2 for output y(n-1).

Fig. 2. (a) DFT of multipliers of the DFG shown in Fig. 1(a), corresponding to output y(n). (b) DFT of multipliers of the DFG shown in Fig. 1(b), corresponding to output y(n-1). Arrow: accumulation path of the products.

4.1.2 DFG Transformation:


A transformation modifies a given DFG while preserving the functionality of the original DFG. A transformed DFG usually yields a different hardware structure, which leads to a different structure for testability and power consumption. The following techniques are available for behavioral transformation of DFGs:

Algebraic laws and redundancy manipulation: associativity, commutativity, and common sub-expression elimination.

Temporal transformations: retiming, pipelining, and rephasing.

Control (memory) transformations: loop folding, unrolling, and functional pipelining.

Sub-operation level transformations: add/shift multiplication and multiple constant multiplication.


Among the above techniques, pipelining and loop folding/unrolling are most commonly used and
are described in detail. There are two types of pipelining, structural pipelining and functional
pipelining, which are also explained below.

4.1.3 Structural Pipelining


Pipelined modules are used in structural pipelining. If no data dependency exists between two operations (which implies they do not start in the same control step), pipelined modules can be shared, even if the corresponding operations overlap in their executions. For example, suppose that a pipelined multiplier resource, with an execution delay of 2 and a data introduction interval of 1, is available. Then we can have a data-flow graph with pipelined multipliers as shown in Fig. 3. Note that operations v6, v7, and v8 can share one pipelined multiplier.

Figure 3 A Scheduled DFG using structural pipelining

4.1.4 Functional Pipelining

In functional pipelining, it is assumed that there is a constraint on the global data introduction interval d0, representing the time interval between two consecutive executions. The number of required resources in pipelined implementations depends on d0: the smaller d0 is, the larger the number of operations executing concurrently. The result of the scheduling, using two stages and a data introduction interval of two time steps, is shown in Fig. 4. In the figure, operations in stage 1 and stage 2 are executed concurrently and cannot be shared.

Figure 4 A Scheduled DFG using functional pipelining
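Assuming each operation occupies its unit for one control step, the standard lower bound on the number of functional units of each operation type is ceil(number of operations / d0). A small sketch of this bound:

```python
import math

def min_units(num_ops, d0, occupancy=1):
    """Lower bound on functional units of one operation type under
    functional pipelining: num_ops operations per iteration, each
    occupying a unit for `occupancy` control steps, with new data
    introduced every d0 steps."""
    return math.ceil(num_ops * occupancy / d0)

# e.g. 4 multiplications per iteration: halving d0 doubles the units needed
print(min_units(4, 2), min_units(4, 1))   # prints: 2 4
```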

4.1.5

MCM-Based Implementation of Fixed-Coefficient FIR Filter:


Critical path and shortest path solving contribute most of the computation time in retiming.
Definition 1 (the path solver problem): Let S = {s0, s1, s2, ..., sk}, where k is the maximum number of feasible solutions available for retiming of a considered filter DFG. During retiming of digital filters in high-level synthesis, the shortest path between the nodes must be computed (k+1) times, where k is the number of feasible solutions available for the DFG, which is nothing but the number of unique entries in the path delay matrix. Similarly, the critical path must be computed (k+1) times. General-purpose processors (GPPs), on which the retiming algorithm is implemented, are fully programmable but less efficient in terms of power and performance. Hence, the problem is to improve the performance and power of retiming using FPGA-based path solvers [6][17]. Further, along with retiming, a high-level transformation technique called automatic pipelining is applied to improve the filter speed.
Definition 2 (multiple constant multiplication in digital filters): For the filter coefficient constants in the retimed filters, find the set of multiplierless operations (O1, O2, O3, ..., On) with the minimum number of addition, subtraction, and shift operations, using a multiple constant multiplier architecture to optimize the filter architecture further.
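The multiplierless idea in Definition 2 can be illustrated with a naive binary shift-and-add decomposition; a real MCM or CSD optimizer would share sub-expressions across coefficients and use fewer adders, so this is only a sketch:

```python
def shift_add_terms(c):
    # positions of the 1-bits of a positive constant: c = sum of 2**i
    return [i for i in range(c.bit_length()) if (c >> i) & 1]

def mul_const(x, c):
    # replace x * c by shifts and additions only (no multiplier)
    return sum(x << s for s in shift_add_terms(c))

c = 105                               # 0b1101001 -> shifts {0, 3, 5, 6}
print(shift_add_terms(c))             # prints: [0, 3, 5, 6]
print(mul_const(7, c), 7 * c)         # prints: 735 735
```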
Definition 3 (optimization and automation of filter HDL): An environment needs to be developed to obtain the HDLs of retimed filters, in which the user can choose different data-path element architectures depending on the specifications. This reduces time to market and helps to evaluate many hardware implementation trade-offs. Filter equivalence checking after applying the high-level transformation also needs to be done, and this needs to be developed as part of the optimization environment.

Figure 5 Example for addressing MCM problem in digital filters.

(a)

(b)

(c)
Figure 6: General structure of MCM block for (a) FIR filter, (b) transposed direct form-I IIR
filter, and (c) transposed direct form-II IIR filter.

TABLE I
MCM IN TRANSPOSE FORM BLOCK FIR FILTER OF LENGTH 16 AND BLOCK SIZE 4

As shown in Table I, MCM can be applied in both the horizontal and vertical directions of the coefficient matrix. The sample x(4k-3) appears in four rows or four columns of the following input matrix:
R = (3)
whereas x(4k) appears in only one row or one column. Therefore, all four rows of the coefficient matrix are involved in the MCM for x(4k-3), whereas only the first row of coefficients is involved in the MCM for x(4k). For larger values of N or smaller block sizes, the row size of the coefficient matrix is larger, which results in a larger MCM size across all the samples and hence a larger saving in computational complexity.
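A small sketch (with made-up length-16 coefficients) of computing a block of four outputs at a time, checked against the plain sample-by-sample filter; within each block, one input sample multiplies coefficients from several rows, which is the sharing that MCM exploits:

```python
def fir_direct(xs, h):
    return [sum(h[k] * (xs[n - k] if n - k >= 0 else 0)
                for k in range(len(h)))
            for n in range(len(xs))]

def fir_block(xs, h, L=4):
    # each block of L outputs is one coefficient-matrix-times-input
    # product; the earliest new sample of a block contributes to all L
    # outputs (the sharing MCM exploits), the latest to only one
    N = len(h)
    x = lambda n: xs[n] if 0 <= n < len(xs) else 0
    ys = []
    for n0 in range(0, len(xs), L):
        ys.extend(sum(h[k] * x(n0 + j - k) for k in range(N))
                  for j in range(L))
    return ys[:len(xs)]

h = list(range(1, 17))                 # filter length N = 16, block size 4
xs = [5, -3, 2, 7, 1, 0, -4, 6, 2, 2, 9, -1, 3, 3, 0, 8]
print(fir_block(xs, h, 4) == fir_direct(xs, h))   # prints: True
```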

4.2

Proposed System:

Phase 1:
In phase 1, a 16-tap FIR filter is discussed for low-pass, high-pass, band-pass, and band-stop responses, and the performance, efficiency, speed, and power consumption of each filter type are analysed. Fig. 6 shows the block diagram of the proposed system. The NCO is used for signal generation over the required frequency range; the NCO is used in the modulation block.

4.2.1 FIR filter:


Filters are widely employed in signal processing and communication systems in applications
such as channel equalization, noise reduction, radar, audio processing, video processing,
biomedical signal processing, and analysis of economic and financial data. For example, in a radio receiver, band-pass filters, or tuners, are used to extract the signals from a radio channel. In an audio graphic equalizer, the input signal is filtered into a number of sub-band signals, and
the gain for each sub-band can be varied manually with a set of controls to change the perceived
audio sensation. In a Dolby system pre-filtering and post-filtering are used to minimize the effect
of noise. In hi-fi audio a compensating filter may be included in the preamplifier to compensate
for the non-ideal frequency response characteristics of the speakers. Filters are also used to
create perceptual audio-visual effects for music, films, and broadcast studios. The primary functions of filters are one of the following:
a)
To confine a signal into a prescribed frequency band as in low-pass, high-pass, and bandpass filters.
b)
To decompose a signal into two or more sub-bands as in filter-banks, graphic equalizers,
sub-band coders, frequency multiplexers.
c)
To modify the frequency spectrum of a signal as in telephone channel equalization and
audio graphic equalizers.
d)
To model the input-output relationship of a system such as telecommunication channels,
human vocal tract, and music synthesizers. Depending on the form of the filter equation and the
structure of implementation, filters may be broadly classified into the following classes:
a.

Linear filters versus nonlinear filters.

b.

Time-invariant filters versus time-varying filters.

c.

Adaptive filters versus non-adaptive filters.

d.

Recursive versus non-recursive filters.

e.

Direct-form, cascade-form, parallel-form and lattice structures.

In this chapter we are mainly concerned with linear time-invariant (LTI) filters. These are a
class of filters whose output is a linear combination of the input and whose coefficients do not
vary with time. Time-varying and adaptive filters are considered in later chapters.
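The two defining LTI properties can be checked numerically for a small FIR filter (coefficients and inputs invented for illustration):

```python
def fir(xs, h):
    # LTI FIR: output is a fixed linear combination of current and past inputs
    return [sum(h[k] * (xs[n - k] if n - k >= 0 else 0)
                for k in range(len(h)))
            for n in range(len(xs))]

h = [1, 2, 3]
a, b = [1, 0, 2, -1], [0, 3, 1, 1]

# linearity: filtering 2a + 3b equals 2*filter(a) + 3*filter(b)
lhs = fir([2 * u + 3 * v for u, v in zip(a, b)], h)
rhs = [2 * u + 3 * v for u, v in zip(fir(a, h), fir(b, h))]
print(lhs == rhs)                          # prints: True

# time invariance: delaying the input by one sample delays the output
print(fir([0] + a, h)[1:] == fir(a, h))    # prints: True
```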

4.2.2 Filter types:


Basically, filters are classified into four types based on the frequency range they reject:

Low pass filter

High pass filter

Band pass filter

Band stop filter

A low-pass filter passes frequencies below the given or designed cutoff value and rejects frequencies above it. A high-pass filter passes frequencies above the given or designed cutoff value and rejects frequencies below it. A band-pass filter passes frequencies between two given values and rejects all others, while a band-stop filter rejects frequencies between two given values and passes all others.

4.2.2.1

Low-pass filter

A low-pass filter is a filter that passes signals with a frequency lower than a certain cut-off
frequency and attenuates signals with frequencies higher than the cut-off frequency. The amount
of attenuation for each frequency depends on the filter design. The filter is sometimes called a
high-cut filter, or treble-cut filter in audio applications. A low-pass filter is the opposite of a high-pass filter. A band-pass filter is a combination of a low-pass and a high-pass filter.
Low-pass filters exist in many different forms, including electronic circuits (such as a hiss filter
used in audio), anti-aliasing filters for conditioning signals prior to analog-to-digital conversion,
digital filters for smoothing sets of data, acoustic barriers, blurring of images, and so on. The
moving average operation used in fields such as finance is a particular kind of low-pass filter,
and can be analysed with the same signal processing techniques as are used for other low-pass

filters. Low-pass filters provide a smoother form of a signal, removing the short-term
fluctuations, and leaving the longer-term trend.
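For example, a 4-point moving average, viewed as an FIR low-pass filter whose coefficients are all 1/4, removes a fast alternating component while preserving a slow trend (illustrative values):

```python
def moving_average(xs, M=4):
    # an M-point moving average is an FIR low-pass filter with
    # coefficients 1/M (it behaves as a running mean near the start)
    return [sum(xs[max(0, n - M + 1): n + 1]) / min(n + 1, M)
            for n in range(len(xs))]

signal = [n + (1 if n % 2 else -1) for n in range(12)]  # slow trend + fast jitter
smooth = moving_average(signal, 4)
# once the window fills, the alternating +-1 jitter (the fastest possible
# component) averages out exactly, leaving only the slow trend
print(smooth[5], smooth[9])   # prints: 3.5 7.5
```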

Nowadays, FPGAs are becoming a more and more attractive choice for power-hungry signal processing applications because they're able to process a huge amount of data in parallel. Modern processors that have similar or even higher clock speeds than FPGAs (in terms of GHz clock rate) still have less processing power than FPGAs. Let's look at an example of 8-tap discrete finite impulse response (FIR) filtering, which can be expressed as:

y(n) = h(0)x(n) + h(1)x(n-1) + ... + h(7)x(n-7)

FPGAs can easily perform eight parallel operations to provide results within one clock cycle
using the parallel filtering architecture as shown in Figure 1. In contrast, computer processors
require more than one clock cycle to provide the same result, as illustrated in the following
pseudo-code:
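The pseudo-code itself was an image in the original and did not survive extraction; as a stand-in (tap values invented for illustration), the sequential multiply-accumulate loop a processor would execute might look like:

```python
h = [2, -1, 3, 0, 1, -2, 4, 1]    # 8 made-up tap coefficients
x = [1, 2, 3, 4, 5, 6, 7, 8]      # the 8 most recent input samples

# one multiply-accumulate per loop trip: at least 8 trips (plus
# fetch/decode overhead) per output sample, versus one clock cycle
# for the FPGA's 8 parallel multiplier-adders
acc = 0
for k in range(8):
    acc += h[k] * x[k]
print(acc)                        # prints: 38
```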

This is due to the fact that a processor needs several clock cycles to fetch, decode, and execute a DSP instruction, as opposed to an FPGA, which uses parallel processing [1].
Therefore, repetitive, power-hungry signal processing tasks such as filtering, Fourier transforms, correlation, matrix processing, and so on are very well suited to FPGAs, while less demanding signal processing and sequential processing tasks are well suited to processors. Furthermore, since floating-point, unpredictable, and dynamic parallel processing tasks can't leverage the repetitive processing advantage of FPGAs, they're easier to perform in a processor than in a complex implementation in an FPGA.

Advantages:
1. Reduced filter length
2. Fewer elements used
3. Reduced cycle period

Phase 2:
Replace the 16-tap FIR filter with a 32-tap FIR filter and compare both filters' performance, accuracy, and power consumption.

4.3 Comparison

Taps     Area (um2)       Power (W)       Delay
8 tap    54836            0.114           7.343
16 tap   70521 (28%)      0.120 (05%)     10.176 (38%)
32 tap   93296 (32%)      0.145 (20%)     15.999 (57%)

Figure 4: area comparison chart (comparing 32-tap designs by Mahesh, Park, and Reddy Rani)

Figure 5: power comparison chart (comparing 32-tap designs by Mahesh, Park, and Reddy Rani)


Figure 6: delay comparison chart (8-tap, 16-tap, and 32-tap)

5.

Software Requirement

5.1

MODELSIM - ALTERA

5.1.1

Assumptions
We assume that you are familiar with the use of your operating system. You should also be familiar with the window management functions of your graphical interface: OpenWindows, OSF/Motif, CDE, KDE, GNOME, or Microsoft Windows 2000/XP. We also assume that you
have a working knowledge of the language in which your design and/or test bench is written
(i.e., VHDL, Verilog, etc.). Although ModelSim is an excellent tool to use while learning HDL
concepts and practices, this document is not written to support that goal.

5.1.2 Modelsim introduction


ModelSim is a verification and simulation tool for VHDL, Verilog, System Verilog, and mixed
language designs. This lesson provides a brief conceptual overview of the ModelSim simulation
environment. It is divided into four topics, which you will learn more about in subsequent
lessons.
5.1.3

Basic Simulation Flow


The following diagram shows the basic steps for simulating a design in ModelSim.

Figure 7 : Basic Simulation Flow

5.1.3.1

Basic Simulation Flow - Overview Lab

In ModelSim, all designs are compiled into a library. You typically start a new simulation
in ModelSim by creating a working library called "work," which is the library name the
compiler uses by default as the destination for compiled design units.
Compiling Your Design
After creating the working library, you compile your design units into it. The ModelSim
library format is compatible across all supported platforms, so you can simulate your design
on any platform without having to recompile it.
Loading the Simulator with Your Design and Running the Simulation
With the design compiled, you load the simulator with your design by invoking the simulator
on a top-level module (Verilog) or a configuration or entity/architecture pair (VHDL).
Assuming the design loads successfully, the simulation time is set to zero, and you enter a run
command to begin simulation.
Debugging Your Results
If you don't get the results you expect, you can use ModelSim's robust debugging
environment to track down the cause of the problem.
5.1.4

Project Flow

A project is a collection mechanism for an HDL design under specification or test. Even though
you don't have to use projects in ModelSim, they may ease interaction with the tool and are
useful for organizing files and specifying simulation settings. The following diagram shows the
basic steps for simulating a design within a ModelSim project.

Figure 8 : Project flow

As you can see, the flow is similar to the basic simulation flow. However, there are two
important differences:

You do not have to create a working library in the project flow; it is done for you
automatically.

Projects are persistent. In other words, they will open every time you invoke ModelSim
unless you specifically close them.
5.1.5

Multiple Library Flow

ModelSim uses libraries in two ways: 1) as a local working library that contains the
compiled version of your design; 2) as a resource library. The contents of your working library
will change as you update your design and recompile. A resource library is typically static and
serves as a parts source for your design. You can create your own resource libraries, or they may
be supplied by another design team or a third party (e.g., a silicon vendor). You specify which
resource libraries will be used when the design is compiled, and there are rules that specify the
order in which they are searched. A common example of using both a working library and a
resource library is one where your gate-level design and test bench are compiled into the
working library, and the design references gate-level models in a separate resource library. The
diagram below shows the basic steps for simulating with multiple libraries.

Figure 9 : Multiple Library flow

5.1.6

Debugging Tools

ModelSim offers numerous tools for debugging and analyzing your design. Several of
these tools are covered in subsequent lessons, including:

Using projects

Working with multiple libraries

Setting breakpoints and stepping through the source code

Viewing waveforms and measuring time

Viewing and initializing memories

Creating stimulus with the Waveform Editor

Automating simulation

5.1.7

Basic Simulation

5.1.7.1 Introduction
In this lesson you will go step-by-step through the basic simulation flow:
1. Create the Working Design Library
2. Compile the Design Units
3. Load the Design
4. Run the Simulation
5.1.7.2 Design Files for this Lesson

The sample design for this lesson is a simple 8-bit, binary up-counter with an associated
test bench. The pathnames are as follows:
Verilog <install_dir>/examples/tutorials/verilog/basicSimulation/counter.v and tcounter.v
VHDL <install_dir>/examples/tutorials/vhdl/basicSimulation/counter.vhd and tcounter.vhd
This lesson uses the Verilog files counter.v and tcounter.v. If you have a VHDL license, use
counter.vhd and tcounter.vhd instead. Or, if you have a mixed license, feel free to use the
Verilog test bench with the VHDL counter or vice versa.
5.2

Xilinx ISE

This section describes the steps required to perform behavioral and post-route simulations
using the Xilinx Integrated Software Environment (ISE) and the Mentor Graphics ModelSim
simulator. In the past, Xilinx bundled ISE with a licensed edition of ModelSim, resulting in
little or no friction between the two tools. However, in recent years it appears as if the two
companies are no longer on level terms. As a result, Mentor Graphics does not provide Xilinx
with licenses for ModelSim, and Xilinx, for its part, does not provide pre-compiled device
libraries for ModelSim. This tutorial will show you how to resolve these compatibility issues
so that you can continue to use the features that both tools provide.
As with any software tutorial, there is always more than one way of doing things. If you
come across better solutions or typos or have any suggestions for improving this document
please email the author or contact Dr. Grantner.
5.2.1

Pre-requisite

Before reading any further you should have the following software installed on your
computer:
5.2.2

Simulation Libraries

Simulation Library Compilation Wizard

Compile HDL Libraries through Project Navigator

5.2.3

Setting up Environment Variables

Before we discuss the individual methods, let us first understand how ModelSim accesses
these libraries. A list of the compiled libraries and their locations is stored in the modelsim.ini
file found in the installation directory. When ModelSim starts a simulation it searches for this ini
file: first in the local project folder, then in the installation directory of ModelSim. If a particular
library is not found, the simulation stops with an error. Neither of the methods discussed above
can access the modelsim.ini file directly, so it is important that we set up environment variables
that point to the location of this file. Setting up environment variables requires administrative
privileges on your computer. If you do not have these privileges, skip to the next section.
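For reference, the library mappings ModelSim reads are stored in the [Library] section of modelsim.ini, one logical library name per line mapped to a path. A representative fragment (the Xilinx library paths below are hypothetical examples, not from this setup):

```ini
; modelsim.ini -- logical library name = path to compiled library
[Library]
work = ./work
; Xilinx libraries added by the compilation wizard (example paths)
unisim = C:/Xilinx/13.x/ISE_DS/ISE/vhdl/mti_pe/unisim
simprim = C:/Xilinx/13.x/ISE_DS/ISE/vhdl/mti_pe/simprim
```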

1. The first step is to change the write permissions for the modelsim.ini file. Browse to the
ModelSim install directory. In our setup this is c:\modeltech64 10.0c (this will be different for
other versions). Right-click on modelsim.ini, select Properties and uncheck Read-only.

2. Make a backup of the modelsim.ini file for safety reasons.

3. The easiest way to view or modify the environment variables is by right-clicking on My
Computer > Properties; click Advanced system settings > Environment Variables.

4. Under System variables, click New; enter modelsim for the variable name. For the
variable value enter c:\modeltech64 10.0c\modelsim.ini.

5. Similarly, add another variable named Path with the value c:\modeltech64 10.0c\win64.
In our setup, win64 is where the executable (modelsim.exe) is located. If a variable called Path
already exists, make sure you append to the existing values. Note that multiple values can be
separated by semicolons.

6. Click OK to save these settings.

5.2.4

Simulation Library Compilation Wizard

This method is highly recommended because library compilation can be performed for
more than one device family and/or language.
1. Open the Simulation Library Compilation Wizard. This can be accessed from Start
Menu > Programs > Xilinx ISE Design Suite 13.x > ISE Design Tools > 32-bit Tools.
Note: If you are using the 64-bit version of ModelSim, use the Simulation Library Compilation
Wizard from 64-bit Tools. ModelSim PE Student Version 10.x is only available in the 32-bit
version.
2. The Select Simulator window opens up. Select the appropriate simulator (for the Student
Edition select ModelSim PE); enter c:\modeltech64 10.0c\win64 for the executable location,
compxlib.cfg for the Compxlib Configuration File and compxlib.log for the Compxlib Log File.
3. Next select the HDL used for simulation. If you are unsure, select Both VHDL and Verilog.
However, this will increase the compilation time and the disk space required.
4. Then select all the device families that you will be working with. Again, the more device
families you select, the longer the compilation time and the more disk space required.
Remember that you can always run the compilation wizard at a later time for additional devices.
5. The next window is for selecting libraries for Functional and Timing Simulation. Different
libraries are required for different types of simulation (behavioral, post-route, etc.). We suggest
that you select All Libraries as the default option. Interested users can refer to Chapter 6 of the
Xilinx Synthesis and Simulation Design Guide for additional information.
6. Finally the window for Output directory for compiled libraries is shown. We suggest leaving
the default values that Xilinx picks. Then select Launch Compile Process.
7. Be patient as the compilation can take a long time depending on the options that you have
chosen.
8. The compile process may produce a lot of warnings but should be error-free. We have
not explored the reasons behind these warnings, but they do not appear to affect the simulation of
any of our designs.
9. Once the process is completed, open c:\modeltech64 10.0c\modelsim.ini and verify that there
are libraries pointing to the output directory entered in step 6. This will happen only if you have
set the environment variables.
Library compilation is now complete. If you have not set the environment variables, the
wizard creates a modelsim.ini in the output directory entered in step 6. By default this
location is c:\Xilinx\13.x\ISE DS\ISE. Open this file and verify that it contains the location of
the libraries that were just compiled. This file should be copied into every project you create.

6.

MATLAB FIR filter coefficient

6.1

Low pass filter
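The low-pass coefficients for this project were generated in MATLAB (the window method implemented by fir1). As an illustrative stand-in, the same windowed-sinc design can be sketched in plain Python; the tap count and cutoff below are example values, not taken from this project:

```python
import math

def lowpass_fir(num_taps, cutoff):
    """Windowed-sinc low-pass FIR design with a Hamming window.

    cutoff is the normalized cutoff frequency (0 < cutoff < 0.5,
    as a fraction of the sampling rate).
    """
    M = num_taps - 1
    h = []
    for n in range(num_taps):
        # Ideal low-pass impulse response, centred at M/2.
        if n == M / 2:
            ideal = 2 * cutoff
        else:
            t = n - M / 2
            ideal = math.sin(2 * math.pi * cutoff * t) / (math.pi * t)
        # Hamming window to control side-lobe levels.
        w = 0.54 - 0.46 * math.cos(2 * math.pi * n / M)
        h.append(ideal * w)
    # Normalize for unity gain at DC.
    s = sum(h)
    return [c / s for c in h]

h = lowpass_fir(16, 0.25)  # 16-tap example, cutoff at fs/4
```

The coefficients come out symmetric (linear phase) and sum to 1 (unity DC gain), the two properties normally expected of a low-pass FIR coefficient set.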

7. Synthesis report

8. Simulation Output

9. Area report

10. Power report

11. RTL view

12. Conclusion
In this paper, we have explored the possibility of realizing block FIR filters in the transpose
form configuration for area-delay-efficient realization of both fixed and reconfigurable
applications. A generalized block formulation is presented for the transpose form block FIR
filter, and based on that we have derived a transpose form block filter for reconfigurable
applications. We have presented a scheme to identify the MCM blocks for horizontal and
vertical subexpression elimination in the proposed block FIR filter with fixed coefficients, to
reduce the computational complexity. Performance comparison shows that the proposed
structure involves significantly less ADP and less EPS than the existing block direct-form
structure for medium or large filter lengths, while for short filter lengths the existing block
direct-form structure has less ADP and less EPS than the proposed structure.
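For a concrete sense of the ADP metric used in this comparison: the area-delay product is simply area multiplied by critical-path delay (EPS, energy per sample, is analogously power multiplied by the sample period). A quick illustration using the values from the comparison table in Section 4.3 (these products are illustrative, not the paper's reported ADP figures):

```python
# Values from the comparison table in Section 4.3.
area = {8: 54836, 16: 70521, 32: 93296}    # um^2
delay = {8: 7.343, 16: 10.176, 32: 15.999}

for taps in (8, 16, 32):
    adp = area[taps] * delay[taps]  # area-delay product
    print(f"{taps}-tap: ADP = {adp:.0f}")
```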

REFERENCES
1. K.-H. Chen and T.-D. Chiueh, "A low-power digit-based reconfigurable FIR filter," IEEE
Trans. Circuits Syst. II, Exp. Briefs, vol. 53, no. 8, pp. 617-621, Aug. 2006.
2. R. Mahesh and A. P. Vinod, "New reconfigurable architectures for implementing FIR
filters with low complexity," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 29,
no. 2, pp. 275-288, Feb. 2010.
3. S. Y. Park and P. K. Meher, "Efficient FPGA and ASIC realizations of a DA-based
reconfigurable FIR digital filter," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 61, no. 7, pp.
511-515, Jul. 2014.
4. P. K. Meher, "Hardware-efficient systolization of DA-based calculation of finite digital
convolution," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 53, no. 8, pp. 707-711, Aug. 2006.
5. P. K. Meher, S. Chandrasekaran, and A. Amira, "FPGA realization of FIR filters by
efficient and flexible systolization using distributed arithmetic," IEEE Trans. Signal Process.,
vol. 56, no. 7, pp. 3009-3017, Jul. 2008.
6. P. K. Meher, "New approach to look-up-table design and memory-based realization of
FIR digital filter," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 57, no. 3, pp. 592-603, Mar.
2010.
