Project Report
on
"Design and Implementation of Fast Fourier Transform and Inverse Fast Fourier Transform on FPGA"
Submitted to
INSTITUTE OF TECHNOLOGY AND MANAGEMENT
GWALIOR, INDIA.
Certificate
I hereby certify that the work presented in the major project entitled "Design
and Implementation of Fast Fourier Transform and Inverse Fast Fourier Transform on
FPGA", in partial fulfillment of the requirements for the award of the degree of Bachelor of
Engineering (Electronics and Communication) at the Institute of Technology and
Management, Gwalior, is an authentic record of my own work carried out under the
supervision of the undersigned, and that other researchers' work referred to is duly listed in
the reference section.
The matter embodied in this work has not been submitted for the award of any other degree
of this or any other university.
This is to certify that the above statement made by the candidate is correct and true to the
best of my knowledge.
Abstract
The objective of this project is to design and implement Fast Fourier Transform (FFT) and
Inverse Fast Fourier Transform (IFFT) modules on FPGA hardware. The work covers the
design and mapping of the modules. The design uses an 8-point FFT and IFFT for the
processing module, meaning that the processing block takes 8 input data points. The Fast
Fourier Transform and Inverse Fast Fourier Transform are derived from the underlying
function called the Discrete Fourier Transform (DFT). The reason for using the FFT/IFFT
instead of the DFT is that the computation can be made much faster, speed being the main
criterion for implementation in digital signal processing. A direct N-point DFT computes
each output point one by one, while the FFT/IFFT reuses intermediate results across outputs,
which saves a great deal of time. The project uses the radix-2 DIF-FFT algorithm, which
breaks the entire DFT calculation down into a number of 2-point DFTs; each 2-point DFT is
a simple butterfly (multiply-and-add) operation.
All modules are designed using the VHDL programming language and implemented on an
FPGA board. The board is connected to a computer through a serial port, and development
kit software provides the interface between the user and the hardware. All processing is
executed on the FPGA board, and the user only needs to supply input data to the hardware
through the software. Input and output data are displayed on the computer, and the results are
compared using simulation software. The design and download processes for the FPGA
board use VHDL and the associated development tools.
List of Figures
Figure 4.15: Synchronous: Good Clocking
Figure 4.16: Metastability - The Problem
Figure 5.1: Mapping Module
Figure 5.2: FFT module
Dedicated to
Table of Contents
1. Introduction
1.1 Motivation
1.2 Objective
7. Result and Simulation
References
CHAPTER 1
INTRODUCTION
1.1 Motivation
This chapter covers the project background, objectives, scope, and outline. The introduction
covers the FFT/IFFT implementation method and describes the hardware available for
implementation. The problem statement of the project is also presented in this chapter.
With the rapid growth of digital communication in recent years, the need for high-speed data
transmission has increased. The mobile telecommunications industry faces the problem
of providing technology able to support a variety of services, ranging from voice
communication at a bit rate of a few kbps to wireless multimedia at bit rates of up to 2
Mbps. Many systems have been proposed, and FFT/IFFT-based systems have gained much
attention for different reasons. Although the FFT was first developed in the
1960s, only in recent years has it been recognized as an outstanding method for high-speed
cellular data communication, since its implementation relies on very high-speed digital
signal processing, which has only recently become available at a reasonable
price-to-performance ratio in hardware.
Since the DFT is carried out in the digital domain, there are several ways to implement the
system. One is to use an ASIC (Application-Specific Integrated Circuit). ASICs are the
fastest, smallest, and lowest-power way to implement the DFT in hardware. The main
drawbacks of this approach are the inflexibility of the design process and the longer
time-to-market for the designed chip.
Another option is a general-purpose microprocessor or microcontroller. The PowerPC 7400
and DSP processors are examples of processors capable of fast vector operations. Such
processors are highly programmable and flexible when the DFT design needs to change. The
disadvantages are that they need memory and other peripheral chips to support their
operation; they also consume the most power and memory space, and would be the slowest
of these options in terms of time to produce the output.
1.2 Objective
The aim of this project is to design modules for the FFT (Fast Fourier Transform), the IFFT
(Inverse Fast Fourier Transform), and mapping (modulation), using a hardware description
language (VHDL). These designs were developed in VHDL using design entry software and
then implemented on the FPGA development board. The development board is described in
the methodology chapter.
Implementing the IFFT computation on FPGA hardware requires knowledge of Very
High Speed Integrated Circuit (VHSIC) Hardware Description Language (VHDL)
programming, because the FPGA chip is programmed in VHDL and the core FFT/IFFT
block diagram is implemented in this hardware. The transmitter and receiver are developed
on one FPGA board, so both the IFFT and FFT algorithms must be implemented in the
system.
The work is focused on the design of the core processing block: an 8-point Fast Fourier
Transform (FFT) for the receiver and an 8-point Inverse Fast Fourier Transform (IFFT) for
the transmitter. An earlier attempt to implement this design on FPGA hardware was
unsuccessful for several reasons encountered during the integration from software into the
FPGA hardware.
The earlier project was carried out only up to simulation level using the ModelSim software,
and consisted only of the FFT and IFFT processing modules. One problem encountered was
that the FFT and IFFT design did not fit the FPGA hardware: the design used a large number
of gates, and gate usage grows rapidly as the number of multipliers and dividers increases.
One way to overcome this problem is to reduce the number of multipliers and dividers in the
VHDL design.
Besides that, the design did not include control signals, which made it difficult to control the
data processing in the FFT or IFFT module. A control signal is used to select the process
executed at each computation step in the VHDL design. As a result, that design was not
suitable for hardware implementation on the FPGA development board, and a new design
was required. Since the old design could not be used, this project concentrates on designing
FFT and IFFT modules that can be implemented on the dedicated FPGA board. To ensure
that the program can be implemented, the number of gates used in the design must be small,
or at least no more than the hardware can support; otherwise the design cannot be
implemented on the dedicated board.
The work of the project is focused on the design of the processing block, namely the 8-point
IFFT and FFT functions. The design also includes the mapping block and the serial-to-parallel
and parallel-to-serial block set. All designs need to be verified, to ensure that there are no
errors in the VHDL code, before being simulated. The design process is described in the
methodology chapter.
The second scope is to implement the design on the FPGA hardware development board.
This step is carried out once all designs have been correctly verified and simulated with the
appropriate software. Implementation includes hardware programming of the FPGA
(downloading the hardware design into the FPGA) and software programming.
CHAPTER II
2.1 Introduction
With the rapid growth of digital communication in recent years, the need for high-speed data
transmission has increased. The mobile telecommunications industry faces the problem of
providing technology able to support a variety of services, ranging from voice
communication at a bit rate of a few kbps to wireless multimedia at bit rates of up to 2
Mbps. Many systems have been proposed, and FFT/IFFT-based systems have gained much
attention for different reasons. Although the FFT/IFFT was first developed in the 1960s, only
recently has it been recognized as an outstanding method for high-speed cellular data
communication; its implementation relies on very high-speed digital signal processing,
which has only recently become available at a reasonable hardware price.
Fourier analysis forms the basis for much of digital signal processing. Simply stated, the
Fourier transform (there are actually several members of this family) allows a time domain
signal to be converted into its equivalent representation in the frequency domain. Conversely,
if the frequency response of a signal is known, the inverse Fourier transform allows the
corresponding time domain signal to be determined.
In addition to frequency analysis, these transforms are useful in filter design, since the
frequency response of a filter can be obtained by taking the Fourier transform of its impulse
response. Conversely, if the frequency response is specified, then the required impulse
response can be obtained by taking the inverse Fourier transform of the frequency response.
Digital filters can be constructed based on their impulse response, because the coefficients of
an FIR filter and its impulse response are identical.
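Because an FIR filter's coefficients and impulse response coincide, the frequency response can be obtained by transforming the coefficients directly. The following Python sketch is illustrative only (the report's actual implementation is in VHDL), and the 4-tap moving-average filter is a hypothetical example:

```python
import cmath

def freq_response(h, n_points=8):
    """Frequency response H(k) of an FIR filter: the DFT of its impulse
    response h (zero-padded to n_points bins)."""
    h = list(h) + [0.0] * (n_points - len(h))
    return [sum(h[n] * cmath.exp(-2j * cmath.pi * n * k / n_points)
                for n in range(n_points))
            for k in range(n_points)]

# 4-tap moving-average filter: its coefficients ARE its impulse response
h = [0.25, 0.25, 0.25, 0.25]
H = freq_response(h)
print(abs(H[0]), abs(H[4]))  # DC gain 1.0; Nyquist bin rejected (~0)
```

As expected for a moving average, the DC gain is unity while the Nyquist-frequency bin is nulled.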
The Fourier transform family (Fourier Transform, Fourier Series, Discrete Time Fourier
Series, and Discrete Fourier Transform) is shown in Figure 2.1. These accepted definitions
have evolved (not necessarily logically) over the years and depend upon whether the signal is
continuous–aperiodic, continuous–periodic, sampled–aperiodic, or sampled–periodic. In this
context, the term sampled is the same as discrete (i.e., a discrete number of time samples).
Figure 2.1 Fourier Transform Family
The only member of this family which is relevant to digital signal processing is the Discrete
Fourier Transform (DFT) which operates on a sampled time domain signal which is
periodic. The signal must be periodic in order to be decomposed into the summation of
sinusoids. However, only a finite number of samples (N) are available for inputting into the
DFT. This dilemma is overcome by placing an infinite number of groups of the same N
samples "end-to-end," thereby forcing mathematical (but not real-world) periodicity as
shown in Figure 2.1.
The basic function of the DFT is illustrated in Figure 2.2.
There are two basic types of DFTs: real and complex. In the complex DFT, the input and
output are both complex numbers. Since time domain input samples are real and have no
imaginary part, the imaginary part of the input is always set to zero. The output of the DFT,
X(k), contains a real and an imaginary component, which can be converted into amplitude
and phase.
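The complex DFT described above can be sketched directly from its definition. This Python fragment is illustrative only (the report's implementation is in VHDL); the sample values are arbitrary:

```python
import cmath

def dft(x):
    """Complex DFT: X(k) = sum_n x(n) * exp(-j*2*pi*n*k/N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]

# Real time-domain samples: the imaginary part of each input is set to zero
x = [complex(v, 0.0) for v in [1, 2, 3, 4, 3, 2, 1, 0]]
X = dft(x)
print(X[0])  # DC bin: the sum of the samples, (16+0j)
```

Note that, for a real input, the output bins exhibit conjugate symmetry: X(k) equals the conjugate of X(N−k).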
The real DFT, although somewhat simpler, is basically a simplification of the complex DFT.
Most FFT routines are written using the complex DFT format; therefore, understanding the
complex DFT and how it relates to the real DFT is important. For instance, if you know the
real DFT frequency outputs and want to use a complex inverse DFT to calculate the time
samples, you need to know how to place the real DFT output points into the complex DFT
format before taking the complex inverse DFT. Notice that the cosine and sine terms in the
equation can be expressed in either polar or rectangular coordinates using Euler's equation:

e^(jθ) = cos θ + j sin θ
The DFT output spectrum can be represented in either polar form (magnitude and phase) or
rectangular form (real and imaginary) as shown in Figure 2.3. The conversion between the
two forms is straightforward.
The real and imaginary DFT components are converted into magnitude and phase as follows:

Magnitude: |X(k)| = sqrt(Re{X(k)}^2 + Im{X(k)}^2)
Phase: φ(k) = arctan(Im{X(k)} / Re{X(k)})
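These conversions can be checked numerically. A brief Python illustration (the bin values are arbitrary examples):

```python
import cmath

def to_polar(X):
    """Rectangular (real, imag) -> polar (magnitude, phase) per bin."""
    return [(abs(Xk), cmath.phase(Xk)) for Xk in X]

def to_rect(polar):
    """Polar (magnitude, phase) -> complex rectangular form per bin."""
    return [mag * cmath.exp(1j * ph) for mag, ph in polar]

X = [3 + 4j, 1 - 1j]
polar = to_polar(X)
back = to_rect(polar)
print(polar[0][0])  # magnitude of 3+4j -> 5.0
```

Converting to polar and back recovers the original rectangular values, confirming that the two forms are equivalent.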
Figure 2.4 : Conversion between polar and rectangular Co-ordinates
In order to understand the development of the FFT, consider first the 8-point DFT
expansion shown in Figure 5.10. To simplify the diagram, the quantity W_N is defined as:

W_N = e^(−j2π/N), so that W_N^(nk) = e^(−j2πnk/N).
The twiddle factors are simply the sine and cosine basis functions written in polar
form. Note that the 8-point DFT shown in the diagram requires 64 complex
multiplications. In general, an N-point DFT requires N^2 complex multiplications.
The number of multiplications required is significant because the multiplication
function requires a relatively large amount of DSP processing time. In fact, the total
time required to compute the DFT is directly proportional to the number of
multiplications plus the required amount of overhead.
where W_N = e^(−j2π/N).
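The multiplication counts can be tabulated for a few sizes. A small illustrative Python sketch, using the standard counts of N^2 multiplications for the direct DFT and (N/2)·log2 N butterfly multiplications for the radix-2 FFT:

```python
import math

def dft_mults(N):
    """Direct N-point DFT: N complex multiplications per output bin."""
    return N * N

def fft_mults(N):
    """Radix-2 FFT: log2(N) stages of N/2 butterfly multiplications each."""
    return (N // 2) * int(math.log2(N))

for N in (8, 256, 1024):
    print(N, dft_mults(N), fft_mults(N))
# N = 8 already drops from 64 to 12; the gap widens rapidly with N
```

Since total computation time is roughly proportional to the multiplication count, this table shows directly why the FFT dominates for large N.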
In 1971, Weinstein and Ebert made an important contribution: they proposed the discrete
Fourier transform (DFT) method to perform baseband modulation and demodulation. The
DFT is an efficient signal processing algorithm that eliminates the banks of subcarrier
oscillators. They used a guard space between symbols to combat the ICI and ISI problems.
Originally, multi-carrier systems were implemented using a separate local oscillator to
generate each individual subcarrier. This was both inefficient and costly. With the advent of
cheap, powerful processors, the subcarriers can now be generated using the Fast Fourier
Transform (FFT). The FFT calculates the spectral content of a signal: it moves a signal from
the time domain, where it is expressed as a series of time events, to the frequency domain,
where it is expressed as the amplitude and phase of particular frequencies. The inverse FFT
(IFFT) performs the reciprocal operation.
Direct computation of the DFT is less efficient because it does not exploit the symmetry and
periodicity properties of the phase factor W_N = e^(−j2π/N).
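The symmetry and periodicity properties just mentioned are easy to verify numerically. An illustrative Python check:

```python
import cmath

def W(N, k):
    """Phase factor W_N^k = exp(-j*2*pi*k/N)."""
    return cmath.exp(-2j * cmath.pi * k / N)

N = 8
periodicity = abs(W(N, 3) - W(N, 3 + N))    # W_N^(k+N)  ==  W_N^k
symmetry = abs(W(N, 3 + N // 2) + W(N, 3))  # W_N^(k+N/2) == -W_N^k
print(periodicity, symmetry)  # both ~0
```

The FFT exploits exactly these two identities to reuse partial products instead of recomputing them.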
Table 2.1
2.3 Classification
2.3.1 On the basis of storage of components
According to where the components of the intermediate vector are stored, FFT algorithms
are classified into two groups.
2.3.2 On the basis of decimation
FFT algorithms are also classified according to whether s(n) or S(k) is decimated. Decimation
here means successive decomposition of a sequence into smaller subsequences.
CHAPTER III
Before going further into the FFT and IFFT design, it is worth explaining the Fast Fourier
Transform and Inverse Fast Fourier Transform operations. The Fast Fourier Transform
(FFT) and Inverse Fast Fourier Transform (IFFT) are derived from the underlying function
called the Discrete Fourier Transform (DFT). The reason for using the FFT/IFFT instead of
the DFT is that the computation can be made much faster, speed being the main criterion for
implementation in digital signal processing. A direct N-point DFT computes each output
point one by one, while the FFT/IFFT reuses intermediate results and thereby saves a great
deal of time. The DFT is given by the equation below, from which the FFT/IFFT functions
are derived:

X(k) = Σ_{n=0}^{N−1} x(n) e^(−j2πnk/N),  k = 0, 1, …, N−1
The 8-point decimation-in-time (DIT) FFT algorithm computes the final output in three
stages as shown in Figure 3.2. The eight input time samples are first divided (or decimated)
into four groups of 2-point DFTs. The four 2-point DFTs are then combined into two 4-point
DFTs. The two 4-point DFTs are then combined to produce the final output X(k). The
detailed process is shown in Figure 3.3, where all the multiplications and additions are
shown. Note that the basic two-point DFT butterfly operation forms the basis for all
computation. The computation is done in three stages. After the first stage computation is
complete, there is no need to store any previous results. The first stage outputs can be stored
in the same registers which originally held the time samples x(n). Similarly, when the second
stage computation is completed, the results of the first stage computation can be deleted. In
this way, in-place computation proceeds to the final stage. Note that in order for the
algorithm to work properly, the order of the input time samples, x(n), must be properly re-
ordered using a bit reversal algorithm.
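The bit reversal re-ordering can be sketched as follows (illustrative Python; the sample values are arbitrary):

```python
def bit_reverse_indices(N):
    """Return the bit-reversed ordering of indices 0..N-1 (N a power of 2):
    write each index in binary, reverse the bits, read back as decimal."""
    bits = N.bit_length() - 1
    return [int(format(n, f'0{bits}b')[::-1], 2) for n in range(N)]

order = bit_reverse_indices(8)
print(order)  # [0, 4, 2, 6, 1, 5, 3, 7]

# Re-order the input samples before a DIT FFT:
x = [10, 11, 12, 13, 14, 15, 16, 17]
x_reordered = [x[i] for i in order]
```

In DSP hardware this permutation is typically produced by the data address generator rather than in software, as noted below.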
Figure 3.1: The butterfly computation in the DIT-FFT
The above description shows that for the radix-2 decimation-in-time FFT, the input can be
grouped into even- and odd-indexed samples. Graphically, the operation can be viewed using
the FFT flow graph shown in the figure.
Figure 3.3 DIT-FFT Signal Flow Graph
The bit reversal algorithm used to perform this re-ordering is shown in Figure 2.8. The
decimal index, n, is converted to its binary equivalent; the binary bits are then placed in
reverse order and converted back to a decimal number. Bit reversing is often performed in
DSP hardware by the data address generator (DAG), thereby simplifying the software,
reducing overhead, and speeding up the computations. The computation of the FFT using
decimation-in-frequency (DIF) is shown in Figures 2.9 and 2.10. This method requires that
the bit reversal algorithm be applied to the output X(k). Note that the butterfly for the DIF
algorithm differs slightly from the decimation-in-time butterfly, as shown in Figure 2.5.
It should be noted that the algorithms required to compute the inverse FFT are nearly
identical to those required to compute the FFT, assuming complex FFTs are used. In fact, a
useful method for verifying a complex FFT algorithm consists of first taking the FFT of the
x(n) time samples and then taking the inverse FFT of the X(k). At the end of this process, the
original time samples, Re x(n), should be obtained, and the imaginary part, Im x(n), should
be zero (within the limits of mathematical round-off error).
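This FFT-then-IFFT verification can be sketched with direct DFT/IDFT routines (illustrative Python; the report's modules are in VHDL, and the sample values are arbitrary):

```python
import cmath

def dft(x):
    """Forward DFT: X(k) = sum_n x(n) * exp(-j*2*pi*n*k/N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]

def idft(X):
    """Inverse DFT: x(n) = (1/N) * sum_k X(k) * exp(+j*2*pi*n*k/N)."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * n * k / N) for k in range(N)) / N
            for n in range(N)]

x = [1.0, 2.0, -1.0, 0.5, 0.0, 3.0, -2.0, 1.5]
x_back = idft(dft(x))
# Re x(n) recovers the original samples; Im x(n) is ~0 (round-off only)
print(max(abs(v.imag) for v in x_back))
```

The residual imaginary parts are on the order of machine precision, exactly the behavior the verification method predicts.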
Figure 3.6 DIF-FFT algorithm
The equation above shows that for the radix-2 decimation-in-frequency FFT, the output can
be grouped into even- and odd-indexed samples. Graphically, the operation can be viewed
using the FFT flow graph shown in the figure.
Figure 3.7: 8-Point DIF-FFT Signal Flow Graph using Decimation in Frequency (DIF)
Mathematically, the butterfly process for each stage can be written as follows.

FFT Stage 1
X(0) + X(4) => X'(0)
X(1) + X(5) => X'(1)
X(2) + X(6) => X'(2)
X(3) + X(7) => X'(3)
[X(0) − X(4)]W^0 => X'(4)
[X(1) − X(5)]W^1 => X'(5)
[X(2) − X(6)]W^2 => X'(6)
[X(3) − X(7)]W^3 => X'(7)

FFT Stage 2
X'(0) + X'(2) => X''(0)
X'(1) + X'(3) => X''(1)
[X'(0) − X'(2)]W^0 => X''(2)
[X'(1) − X'(3)]W^2 => X''(3)
X'(4) + X'(6) => X''(4)
X'(5) + X'(7) => X''(5)
[X'(4) − X'(6)]W^0 => X''(6)
[X'(5) − X'(7)]W^2 => X''(7)

FFT Stage 3
X''(0) + X''(1) => Y(0)
X''(0) − X''(1) => Y(1)
X''(2) + X''(3) => Y(2)
X''(2) − X''(3) => Y(3)
X''(4) + X''(5) => Y(4)
X''(4) − X''(5) => Y(5)
X''(6) + X''(7) => Y(6)
X''(6) − X''(7) => Y(7)

The outputs Y appear in bit-reversed order, so they must be re-ordered by bit reversal to obtain X(k) in natural order.
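The three-stage 8-point DIF computation can be checked against a direct DFT. The following Python sketch is illustrative only (the report's design is in VHDL); it uses the standard radix-2 DIF twiddle assignments and undoes the bit-reversed output order at the end:

```python
import cmath

def W(k):
    """Twiddle factor W8^k = exp(-j*2*pi*k/8)."""
    return cmath.exp(-2j * cmath.pi * k / 8)

def dif_fft8(x):
    """8-point radix-2 DIF FFT, three butterfly stages; the output is in
    bit-reversed order, as is standard for decimation-in-frequency."""
    a = [x[i] + x[i + 4] for i in range(4)] + \
        [(x[i] - x[i + 4]) * W(i) for i in range(4)]          # stage 1
    b = [a[0] + a[2], a[1] + a[3],
         (a[0] - a[2]) * W(0), (a[1] - a[3]) * W(2),
         a[4] + a[6], a[5] + a[7],
         (a[4] - a[6]) * W(0), (a[5] - a[7]) * W(2)]          # stage 2
    y = []
    for i in (0, 2, 4, 6):                                    # stage 3
        y += [b[i] + b[i + 1], b[i] - b[i + 1]]
    return y

def dft(x):
    """Direct DFT used as a reference for checking the result."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]

x = [1, 2, 3, 4, 4, 3, 2, 1]
y = dif_fft8(x)
bitrev = [0, 4, 2, 6, 1, 5, 3, 7]          # undo the bit-reversed order
X = [y[bitrev[k]] for k in range(8)]
ref = dft(x)
print(max(abs(X[k] - ref[k]) for k in range(8)))  # ~0 (round-off only)
```

After the bit-reversal re-ordering, the staged computation agrees with the direct DFT to machine precision.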
The FFTs discussed up to this point are radix-2 FFTs, i.e., the computations are based on 2-
point butterflies. This implies that the number of points in the FFT must be a power of 2. If
the number of points in an FFT is a power of 4, however, the FFT can be broken down into a
number of 4-point DFTs as shown in Figure 5.20. This is called a radix-4 FFT. The
fundamental decimation-in-time butterfly for the radix-4 FFT is shown in Figure 5.21.
The radix-4 FFT requires fewer complex multiplications but more additions than the radix-2
FFT for the same number of points. Compared to the radix-2 FFT, the radix-4 FFT trades
more complex data addressing and twiddle factors for less computation. The resulting
savings in computation time varies between DSPs, but a radix-4 FFT can be as much as
twice as fast as a radix-2 FFT on a DSP with an optimal architecture.
Comparing this with the first equation shows that the same FFT algorithm can be used to
compute the IFFT with changes to certain properties: a scaling factor of 1/N is added, and
the twiddle factor (W^nk) is replaced by its complex conjugate (W^−nk) in equation (1) of
the FFT. With these changes, the same FFT flow graph can also be used for the Inverse Fast
Fourier Transform. The table below shows the twiddle factor values for the IFFT.
Table 3.1: Twiddle factors for the 8-point Inverse Fast Fourier Transform (N = 8)

k    W        Value
0    W8^0     1
1    W8^-1    0.7071 + 0.7071j
2    W8^-2    j
3    W8^-3    −0.7071 + 0.7071j
4    W8^-4    −1
5    W8^-5    −0.7071 − 0.7071j
6    W8^-6    −j
7    W8^-7    0.7071 − 0.7071j
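The IFFT twiddle factors are the complex conjugates of the FFT twiddles, which also gives a compact way to reuse a forward transform for the inverse. An illustrative Python sketch (the `dft` helper stands in for any forward FFT routine; the sample values are arbitrary):

```python
import cmath

N = 8
# IFFT twiddles W8^(-k) = exp(+j*2*pi*k/N): conjugates of the FFT twiddles
twiddles = [cmath.exp(2j * cmath.pi * k / N) for k in range(N)]
print(round(twiddles[1].real, 4), round(twiddles[1].imag, 4))  # 0.7071 0.7071

def dft(x):
    """Direct forward DFT, standing in for any forward FFT routine."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * m * k / n) for m in range(n))
            for k in range(n)]

def ifft_via_fft(X):
    """Inverse transform reusing the forward routine: conjugate the input,
    run the forward transform, conjugate the result, scale by 1/N."""
    n = len(X)
    return [v.conjugate() / n for v in dft([Xk.conjugate() for Xk in X])]

x = [1.0, 5.0, -2.0, 3.0, 0.0, 4.0, -1.0, 2.0]
x_back = ifft_via_fft(dft(x))  # recovers x (up to round-off)
```

This conjugate-and-scale identity is why a single FFT flow graph can serve both directions, as stated above.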
A cache memory architecture adds a cache memory between the processor and the memory
to increase the effective memory bandwidth. Baas presented a cached-FFT algorithm which
increases energy efficiency and effectively lowers power consumption.
The dual-memory architecture uses two memories connected to a digital array signal
processor; a programmable array controller generates addresses to the memories in a
ping-pong fashion.
The processor array architecture consists of independent processing elements with local
buffers, connected through an interconnect network.
Pipeline FFT architectures contain log_r N blocks; each block consists of delay lines,
arithmetic units that implement a radix-r FFT butterfly operation, and ROMs for the twiddle
factors. A variety of pipeline FFTs have been implemented. Most pipeline FFT realizations
use delay lines for data reordering between the processing elements. Although this gives a
simple data flow architecture, it causes high power consumption.
Many FFT algorithms relate to the "butterfly structure" first presented by Cooley and
Tukey, where a separate processing element (PE) is assigned to each node of the FFT flow.
FFT algorithms have several stages of so-called butterfly computations, and a number of
butterflies are calculated at each stage. In the pipeline FFT architecture, all the butterflies of
each stage are computed using a single PE, and the PEs assigned to different stages form a
line of processors. It is also possible to map the computation network onto another line of
processing in which the stages of the FFT are sequentially computed by parallel PEs
connected by a perfect shuffle network; all the butterflies of a single stage are computed in
parallel. This is called the iterative architecture.
Applications of the FFT include:
■ Imaging
■ Pattern recognition
■ Filter design
■ Calculating the impulse response from the frequency response
■ Calculating the frequency response from the impulse response
■ Efficient calculation of the DFT
CHAPTER IV
Figure 4.1: FPGA Architecture
Each FPGA vendor has its own FPGA architecture, but in general terms they are all a
variation of that shown in Figure 4.1. The architecture consists of configurable logic blocks,
configurable I/O blocks, and programmable interconnect. Also, there will be clock circuitry
for driving the clock signals to each logic block, and additional logic resources such as
ALUs, memory, and decoders may be available. The two basic types of programmable
elements for an FPGA are Static RAM and anti-fuses.
Figure 4.3: FPGA Configurable I/O Block
Figure 4.4: FPGA Programmable Interconnect
■ Lucent Technologies ORCA family
Write a Specification
Specification Review
Design
Simulate
Design Review
Synthesize
Place and Route
Resimulate
Final Review
Chip Test
Chip Product
■ An external block diagram showing how the chip fits into the system.
■ Input threshold level
■ Package type
■ Target price
■ Test procedures
It is also very important to understand that this is a living document. Many sections will have
best guesses in them, but these will change as the chip is being designed.
■ Use logic that fits well with the architecture of the device you have chosen
■ Macros
■ Synchronous design
Each section of the design should be simulated separately before the sections are hooked
up into larger blocks. There will be many iterations of design and simulation in order to
get the correct functionality. Once design and simulation are finished, another design
review must take place so that the design can be checked. It is important to have others
look over the simulations and make sure that nothing was missed and that no improper
assumption was made. This is one of the most important reviews, because it is only with
correct and complete simulation that you will know that your chip will work correctly in
your system.
4.11.6 Synthesis:
If the design was entered using an HDL, the next step is to synthesize the chip. This
involves using synthesis software to optimally translate your register transfer level
(RTL) design into a gate level design that can be mapped to logic blocks in the FPGA.
This may involve specifying switches and optimization criteria in the HDL code, or
playing with parameters of the synthesis software in order to ensure good timing and
utilization.
4.11.9 Testing:
For a programmable device, you simply program the device and immediately have your
prototypes. You then have the responsibility to place these prototypes in your system
and determine that the entire system actually works correctly. If you have followed
the procedure up to this point, chances are very good that your system will perform
correctly with only minor problems. These problems can often be worked around by
modifying the system or changing the system software. These problems need to be tested
and documented so that they can be fixed on the next revision of the chip. System
integration and system testing are necessary at this point to ensure that all parts of the
system work correctly together. When the chips are put into production, it is necessary to
have some sort of burn-in test of your system that continually tests your system over
some long amount of time. If a chip has been designed correctly, it will only fail
because of electrical or mechanical problems that will usually show up with this kind of
stress testing.
Figure 4.6: Top-Down Design
Top-down design is the preferred methodology for chip design for several reasons.
First, chips often incorporate a large number of gates and a very high level of
functionality. This methodology simplifies the design task and allows more than one
engineer, when necessary, to design the chip. Second, it allows flexibility in the design.
Sections can be removed and replaced with higher-performance or optimized designs
without affecting other sections of the chip. Also important is the fact that simulation
is much simplified using this design methodology. Simulation is an extremely important
consideration in chip design since a chip cannot be blue-wired after production. For this
reason, simulation must be done extensively before the chip is sent for fabrication. A top-
down design approach allows each module to be simulated independently from the rest
of the design [25]. This is important for complex designs where an entire design can
take weeks to simulate and days to debug. Simulation is discussed in more detail later in
this chapter.
the architecture to provide you with faster, more optimal designs.
Figure 4.7: Asynchronous: Race Condition
Figure 4.9: Asynchronous: Delay Dependent Logic
Figure 4.11: Asynchronous: Hold Time Violation
4.16 Glitches:
A glitch can occur due to small delays in a circuit, such as that shown in Figure 4.12. An
inverting multiplexer can glitch when switching between two signals that are both high:
due to the delay in the inverter, the output goes high for a very short time. Synchronizing
this output by sending it through a flip-flop, as shown in Figure 4.13, ensures that this
glitch will not appear on the output and will not affect logic further downstream.
Figure 4.15: Synchronous: Good Clocking
4.18 Metastability:
One of the great buzzwords, and often misunderstood concepts, of synchronous
design is metastability. Metastability refers to a condition which arises when
an asynchronous signal is clocked into a synchronous flip-flop. While chip designers
would prefer a completely synchronous world, the unfortunate fact is that signals coming
into a chip will depend on a user pushing a button or an interrupt from a processor, or will
be generated by a clock which is different from the one used by the chip. In these cases,
the asynchronous signal must be synchronized to the chip clock so that it can be used by
the internal circuitry. The designer must be careful how to do this in order to avoid
metastability problems as shown in Figure 4.16. If the ASYNC_IN signal goes high
around the same time as the clock, we have an unavoidable race condition [25]. The
output of the flip-flop can actually go to an undefined voltage level that is somewhere
between a logic 0 and logic 1. This is because an internal transistor did not have enough
time to fully charge to the correct level. This meta level may remain until the transistor
voltage leaks off or "decays", or until the next clock cycle. During the clock cycle, the
gates that are connected to the output of the flip-flop may interpret this level differently.
In the figure, the upper gate sees the level as logic 1 whereas the lower gate sees it as
logic 0. In normal operation, OUT1 and OUT2 should always be the same value. In this
case, they are not and this could send the logic into an unexpected state from which it
may never return. This metastability can permanently lock up your chip.
Figure 4.16: Metastability - The Problem
This method of timing analysis is growing less and less popular. It involves including
timing information in a functional simulation so that the real behavior of the chip
is simulated. The advantage of this kind of simulation is that timing and functional
problems can be examined and corrected. Also, asynchronous designs must use this type
of analysis because static timing analysis only works for synchronous designs. This is
another reason for designing synchronous chips only. As chips become larger, though,
this type of compute intensive simulation takes longer and longer to run. Also,
simulations can miss particular transitions that result in worst case results. This means that
certain long delay paths never get evaluated and a chip with timing problems can pass
timing simulation. If you do need to perform timing simulation, it is important to do
both worst-case simulation and best-case simulation. The term "best case" can be
misleading: it refers to a chip that, due to voltage, temperature, and process variations, is
operating faster than the typical chip. However, hold time problems become apparent only
under best-case conditions.
CHAPTER V
SYSTEM IMPLEMENTATION
5.1 Introduction
This chapter covers the implementation of the Fast Fourier Transform and Inverse Fast
Fourier Transform design modules and the verification of those modules. Behavioral
synthesis is used to transfer the mathematical algorithm into a VHDL program.
There are various architectures for an IFFT module. Some designs use a DSP chip as the
main part implementing the core processing block, the IFFT computation. This issue was
discussed in the previous chapter, where it was stated that an FPGA is the most
cost-effective way to implement the design. The IFFT transmitter consists of several
blocks or modules that implement the system around the IFFT function. After consulting
various books, white papers and journals, the proposed transmitter design consists of a
serial-to-parallel converter, modulator bank, processing block, parallel-to-serial converter
and cyclic prefix block. This block diagram is close to the standard for all IFFT systems
and is in close accordance with the systems discussed in the primary resource textbooks.
These sources, and several technical papers, served as useful tools to validate the design.
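The transmitter chain described above can be sketched behaviorally. The following is a minimal pure-Python model, not the VHDL implementation: the 8-point IFFT matches the design, but the 2-sample cyclic prefix length and the function names are illustrative assumptions, and the modulator bank is omitted (bit values are used directly as symbols).

```python
import cmath

N = 8  # 8-point IFFT, as in the design

def ifft(symbols):
    # Direct inverse DFT: adequate as a behavioral model of the processing block
    return [sum(symbols[k] * cmath.exp(2j * cmath.pi * k * n / N)
                for k in range(N)) / N for n in range(N)]

def transmit(bitstream, cp_len=2):
    # Serial-to-parallel: group the serial input into blocks of N symbols
    blocks = [bitstream[i:i + N] for i in range(0, len(bitstream), N)]
    out = []
    for block in blocks:
        time_samples = ifft(block)                        # processing block (IFFT)
        prefixed = time_samples[-cp_len:] + time_samples  # cyclic prefix
        out.extend(prefixed)                              # parallel-to-serial
    return out

samples = transmit([1, 0, 1, 1, 0, 0, 1, 0])
print(len(samples))  # 10: eight IFFT outputs plus two cyclic-prefix samples
```

The cyclic prefix simply repeats the tail of each IFFT block at its head, so the first prefix sample equals the third-from-last time-domain sample.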
To prevent data overflow, the data needs to be scaled beforehand, leaving enough extra bits
for growth. Alternatively, the data can be scaled after each stage of the FFT computation.
The technique of scaling data after each pass of the FFT is known as block floating point.
It is called this because the full array of data is scaled as a block, regardless of whether
each element in the block needs to be scaled. The complete block is scaled so that the
relative relationship of each data word remains the same. For example, if each data word
is shifted right by one bit (divided by 2), the absolute values have changed but, relative
to each other, the data stays the same.
The use of a floating-point DSP eliminates the need for data scaling and therefore results
in a simpler FFT routine; the tradeoff is the increased processing time required for the
complex floating-point arithmetic.
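The block floating point idea can be sketched as follows. This is an illustrative model only: the 16-bit signed word width and the helper name `block_scale` are assumptions, not taken from the report.

```python
def block_scale(data, width=16):
    """Shift the whole block right by one bit whenever any element exceeds the
    signed range; return the scaled block and the block exponent."""
    limit = 2 ** (width - 1) - 1
    exponent = 0
    while max(abs(x) for x in data) > limit:
        data = [x >> 1 for x in data]   # every word is shifted, so relative values are kept
        exponent += 1
    return data, exponent

scaled, exp = block_scale([40000, -12000, 700], width=16)
print(scaled, exp)  # [20000, -6000, 350] 1
```

Only one shared exponent is stored per block, which is why the relative relationship between data words is preserved exactly.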
5.3 Implementation
The FFT- and IFFT-based FPGA system design specifies the architecture of the FFT and
IFFT at a symbolic level. This level allows the use of VHDL, which stands for
VHSIC (Very High Speed Integrated Circuit) Hardware Description Language. VHDL
allows many levels of abstraction and permits accurate description of electronic
components ranging from simple logic gates to microprocessors. VHDL provides the tools
needed for description and simulation, which leads to a lower production cost. This
section presents the hardware implementation of the FFT and IFFT described previously.
The FPGA implementation of the FFT and IFFT architecture was developed using VHDL
with 32-bit floating-point arithmetic, because floating point offers the greatest
dynamic range for any application. Unfortunately, there is currently no built-in support
for floating-point arithmetic in VHDL. As a result, a VHDL library was designed for using
the FFT and IFFT algorithms on FPGAs.
The design then proceeds to the implementation stage. The process involved in this stage
is device programming, which programs the FPGA board using software. This process
essentially burns the hardware design into the FPGA board.
The hardware modules are developed in the VHDL language. The modules developed include
FFT/IFFT, and the function of each module is described in the paragraphs below.
Figure 5.2 FFT module
Figure 5.2 shows the block diagram of the FFT module. This basic module has only two
inputs, DataA and DataB. Opcode is used to select the operation performed by the
module, and the result is delivered through the Result port. Several operations are performed
by this hardware, each executed in one clock cycle, and each operation is assigned a
unique opcode value. Referring to the source code, the FFT module implements eight operations,
including addition, subtraction, multiplication, pass-through and conversion from a positive
number to a negative one.
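A behavioral sketch of this opcode-selected operation is given below. The actual opcode encodings are not listed here, so the values in the table are hypothetical, and only five of the eight operations are shown.

```python
# Hypothetical opcode map; the report does not give the actual encodings.
OPS = {
    0b000: lambda a, b: a + b,   # addition
    0b001: lambda a, b: a - b,   # subtraction
    0b010: lambda a, b: a * b,   # multiplication
    0b011: lambda a, b: a,       # pass-through (DataA)
    0b100: lambda a, b: -a,      # positive-to-negative conversion
}

def fft_alu(data_a, data_b, opcode):
    """Behavioral model of the FFT module: one result per opcode, per 'cycle'."""
    return OPS[opcode](data_a, data_b)

print(fft_alu(3, 5, 0b000))  # 8
print(fft_alu(3, 5, 0b100))  # -3
```

In hardware each entry corresponds to one datapath operation selected by the Opcode input and registered on the next clock edge.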
CHAPTER VI
DESIGN WALKTHROUGH
6.1 Introduction
This chapter discusses the methodology of the project and the tools involved in completing
the design of the FFT/IFFT module on the FPGA hardware. It basically covers the usage of
the VHDL software.
The methodology of the project is divided into four main stages. It starts with a study of
the relevant topics, followed by the design, implementation, and test and analysis stages.
All stages are subdivided into several smaller topics or sub-stages, and each stage is
explained in this chapter.
Each software function is discussed in this chapter. For the hardware part, an FPGA is
used; documentation regarding it is given in Chapter 4.
VHDL design is the first step performed in the design process. The FFT/IFFT modules are
programmed in the VHDL language; basically, this process generates the VHDL source
code. After the code is generated, the VHDL software is used to verify it. The
software performs two processes: VHDL analysis and logic synthesis. The VHDL
analyzer output is used for logic synthesis and design verification. In logic synthesis, the
netlist file obtained from the VHDL analyzer is synthesized based on the design constraints
and the technology library available in the software. There are two types of simulation at the
design verification step: functional simulation and timing simulation. Functional
simulation simulates the hardware function; this process was not carried out since the
required software was not available. Timing simulation, however, was performed using the
ModelSim software; it provides the timing behavior of the designed hardware.
6.4 Description of VHDL language
6.4.1 Very High Speed Integrated Circuits Hardware Descriptive Language
VHDL is an acronym which stands for VHSIC Hardware Description Language; VHSIC is yet
another acronym, which stands for Very High Speed Integrated Circuits. VHDL can wear
many hats. It is used for documentation, verification, and synthesis of large digital
designs. This is actually one of the key features of VHDL, since the same VHDL code can
theoretically achieve all three of these goals, saving a lot of effort. In addition to being
used for each of these purposes, VHDL supports three different approaches to
describing hardware: the structural, dataflow and behavioral methods of hardware
description. Most of the time a mixture of the three methods is employed. VHDL is a
standard (VHDL-1076) developed by the IEEE (Institute of Electrical and Electronics
Engineers). VHDL is an industry-standard language used to describe hardware from the
abstract to the concrete level, and it is rapidly being embraced as the universal
communication medium of design. Computer-aided engineering workstation vendors
throughout the industry are standardizing on VHDL as input to and output from their tools.
VHDL contains levels of representation that cover all levels of description, from the
behavioral level down to the bi-directional switch level. VHDL is designed to fulfill a
number of needs in the design process. First, it allows the description of the structure of a
design, that is, how it is decomposed into sub-designs and how those sub-designs are
interconnected. Second, it allows the specification of the functions of designs using
familiar programming-language forms. Third, as a result, it allows a design to be simulated
before being manufactured, so that designers can quickly compare alternatives and test for
correctness without the delay and expense of hardware prototyping. VHDL is intended to
provide a tool that the digital systems community can use to distribute their designs in
a standard format. Using VHDL, they are able to talk to each other about their complex digital
circuits in a common language without the difficulty of revealing technical details. It is a
standard and unambiguous way of exchanging device and system models, so that engineers
get a clear idea early in the design process of where components from separate contractors
may need more work to function properly. It enables manufacturers to document and archive
electronic systems and components in a common format, allowing the various parties to
understand and participate in the system's development.
As a standard description of digital systems, VHDL is used as input and output to various
simulation, synthesis and layout tools. The language provides the ability to describe
systems, networks and components at a very high behavioral level as well as at a very low
gate level. It also supports a top-down methodology and environment. Simulation can be
carried out at any level, from a general functional analysis to a very detailed gate-level
waveform analysis. Synthesis is currently carried out only at the register level. A register
transfer level (RTL) description of hardware consists of a series of Boolean logic
expressions. Once a VHDL design has been decomposed down to the register level, a VHDL
synthesizer can generate the application-specific integrated circuit (ASIC) representation
or the schematic for the PC board layout.
Top-down design first describes the system at a very high level of abstraction, like a
specification. Designers simulate and debug the system at this very high level before refining
it into smaller components. The method describes each component at a high level and debugs
it alone and with the other components in the system. The design continues to be refined and
debugged until it is complete down to its lowest building blocks. Mixed-level design occurs
when some components are at a more detailed level of description than others. The advantage
of the top-down design methodology is that engineers can discover and correct system
requirement and timing problems early; the tedious task of gate-level design can be left to
synthesis tools. The bottom line is reduced cost and faster time to market.
A. ENTITY
All designs are expressed in terms of entities. An entity is the most basic building block in a
design. The uppermost level of the design is the top-level entity. If the design is hierarchical,
then the top-level description will have lower-level descriptions contained in it, and these
will themselves be lower-level entity descriptions.
The syntax for declaring an entity is:
entity_declaration ::=
entity identifier is
entity_header
entity_declarative_part
[ begin
entity_statement_part ]
end [ entity_simple_name ] ;
OR
Entity entity_name is
Port ( port_list);
end entity_name;
eg:
entity and_gate is
port (a: in std_logic;
b: in std_logic;
y: out std_logic);
end and_gate;
eg:
entity deco8_3 is
port (a: in std_logic_vector(2 downto 0);
y: out std_logic_vector(7 downto 0));
end deco8_3;
(iii) mode inout
A value can be assigned to the port and read from it, i.e. it is BIDIRECTIONAL.
Eg: entity and_gate is
port (x: inout std_logic);
end and_gate;
(iv) mode buffer
Output port with internal read capability.
Eg: entity and_gate is
port (w: buffer std_logic);
end and_gate;
B. ARCHITECTURE
All entities that can be simulated have an architecture description. The architecture
describes the behavior of the entity. A single entity can have multiple architectures: one
architecture might be a behavioral description while another might be a structural
description of the design. An architecture body is declared using the syntax:
architecture_body ::=
architecture identifier of entity_name is
architecture_declarative_part
begin
architecture_statement_part
end [ architecture_simple_name ] ;
architecture_declarative_part ::= { block_declarative_item }
architecture_statement_part ::= { concurrent_statement }
C. CONFIGURATION
A configuration statement is used to bind component instances to an entity-architecture pair.
A configuration can be considered like a parts list for a design: it describes which behavior
to use for each entity, much as a parts list describes which part to use in the design.
D. GENERIC
A generic is the VHDL term for a parameter that passes information to an entity. For instance,
if an entity is a gate-level model with rise and fall delays, the values for the rise and fall
delays could be passed into the entity with generics.
E. PROCESS
A process is a basic unit of execution in VHDL. All operations performed in a
simulation of a VHDL description are broken into single or multiple processes.
All designs are created from entities. An entity in VHDL corresponds directly to a symbol
in the traditional CAE workstation design methodology. Let us look at the top-level entity
for the rsff symbol described earlier. An entity for the rsff would look like this:
ENTITY rsff IS
PORT (set, reset: IN BIT;
Q, QB: BUFFER BIT);
END rsff;
The keyword ENTITY signifies that this is the start of an entity statement. In the
descriptions shown throughout this project, keywords of the language and types provided
with the STANDARD package are shown in all CAPITAL letters; for instance, in the
example above the keywords are ENTITY, PORT, IS, BUFFER, etc., and the standard type
provided is BIT. Names of user-created objects, such as rsff in the example above, are
shown in lower case. The name of the entity is rsff, the same as the name of the symbol
described earlier. The entity has four ports in the PORT clause: two ports of mode IN
and two ports of mode BUFFER. The reason for port mode BUFFER instead of just OUT
will be described later. The two input ports correspond directly to the two input pins on the
symbol from the workstation, and the two buffer ports correspond directly to its two output
pins. All of the ports have the type BIT. The entity describes the interface to the outside
world: it specifies the number of ports, the direction of the ports, and the type of the
ports. A lot more information can be put into the entity than is shown here, but this gives
us a foundation upon which we can later build.
an inactive value ('1'), then NAND gate U1 will have two '1' values as input, causing it to
output a value of '0' on the output Q. The rsff will have been reset. The schematic for the
rsff component also has a counterpart in VHDL: it is called the architecture. An architecture
is always related to an entity and describes the behavior of that entity. An architecture for
the rsff device described above would look like this:
ARCHITECTURE netlist OF rsff IS
COMPONENT nand2
PORT (a, b: IN BIT; c: OUT BIT);
END COMPONENT;
BEGIN
U1: nand2 PORT MAP (set, QB, Q);
U2: nand2 PORT MAP (reset, Q, QB);
END netlist;
The keyword ARCHITECTURE signifies that this statement describes an architecture for
an entity. The architecture name is netlist, and the entity that the architecture describes
is called rsff. The reason for the connection between the architecture and the entity is that an
entity can have multiple architectures describing its behavior. For instance, one
architecture could be a behavioral description and another, like the one shown above, could
be a structural description. The textual area between the keyword ARCHITECTURE and the
keyword BEGIN is where local signals and components are declared for later use. In this
example, there are two instances of a nand2 gate placed in the architecture. The compiler
needs to know what the interface to a component placed in the architecture is; the
component declaration statement describes that information to the compiler. The
statement area of the architecture starts with the keyword BEGIN. All the statements
between the BEGIN and the END netlist statement are called concurrent statements, because
all of the statements execute concurrently.
verify that the developed model is correct, with few or no errors being found. It should not
be a means to locate errors in the VHDL code in order to patch them. If the test bench
incorporates models of the components surrounding the model to be tested, those need only
incorporate the functions and interfaces required to operate properly with the model under
test; it is not necessary to develop complete VHDL models for them. If external stimuli or
configuration data are required, they should be implemented by reading an ASCII file, in
order to ensure portability. Someone not involved in the creation of a model or package
should perform its verification, so that a misunderstanding of the functionality is not
masked by the same misunderstanding in the verification. A test bench has three main
purposes:
1. To generate stimulus for simulation (waveforms)
2. To apply this stimulus to the entity under test and collect the output responses
3. To compare output responses with expected values
Stimulus is applied automatically to the entity under test by instantiating the entity in the
test bench model and then driving the appropriate signals.
The main purpose of a model for board-level simulation is verification of a board using the
component, normally together with several other components. This can be seen as the
simulation version of breadboarding. This implies that the model must have acceptable
simulation speed, but it only needs to model the functionality that can affect the board and
other models. The model should be at the register transfer level (RTL) or higher, and it
need not necessarily reflect the actual structure of the component.
The main purpose of system-level simulation is to provide the functionality of a board, a
subsystem, or a protocol, with a simulation speed that allows trade-offs to be performed. No
similarity with any hardware is necessary, as long as the desired functionality is achieved.
The behavior may be approximated with respect to details such as timing aspects, exactly
which clock cycle an event occurs in, the exact numerical value of a result, etc. The VHDL
Modelling Guidelines are a rich and good reference for VHDL modeling. They define
requirements on VHDL models and test benches, and are intended as an application document
for European Space Agency (ESA) developments involving VHDL modeling. They are mainly
focused on digital models; specific requirements for analog modeling are not covered.
End component;
Begin
X1: XOR2 port map (A, B, SUM);
A1: AND2 port map (A, B, CARRY);
End HA_STRUCTURE;
The name of the architecture body is HA_STRUCTURE. The entity declaration for
HALF_ADDER specifies the interface ports for this architecture body. The architecture body is
composed of two parts: the declarative part (before the keyword begin) and the statement
part (after the keyword begin). Two component declarations are present in the declarative
part of the architecture body. These declarations specify the interfaces of the components
used in the architecture body. The components XOR2 and AND2 may either be predefined
components in a library or, if they do not exist, they may later be bound to other components
in a library.
The declared components are instantiated in the statement part of the architecture body
using component instantiation statements; X1 and A1 are component labels for these
instantiations. The first component instantiation statement, labeled X1, shows
that signals A and B (the input ports of HALF_ADDER) are connected to the X and Y input
ports of the XOR2 component, while output Z of this component is connected to output port
SUM of the HALF_ADDER entity. Similarly, in the second component instantiation
statement, signals A and B are connected to ports L and M of the AND2 component, while
port N is connected to the CARRY port of the HALF_ADDER. Note that in this case, the
signals in the port map of a component instantiation and the port signals in the component
declaration are associated by position (this is called positional association). The structural
representation of the HALF_ADDER does not say anything about its functionality.
Separate entity models would be described for the components XOR2 and AND2, each
having its own entity declaration and architecture body.
End component;
Component NAND3
Port (D0, D1, D2: in BIT; DZ: out BIT);
End component;
Signal ABAR, BBAR: BIT;
Begin
V0: INV port map (A, ABAR);
V1: INV port map (B, BBAR);
N0: NAND3 port map (ENABLE, ABAR, BBAR, Z (0));
N1: NAND3 port map (ABAR, B, ENABLE, Z (1));
N2: NAND3 port map (A, BBAR, ENABLE, Z (2));
N3: NAND3 port map (A, B, ENABLE, Z (3));
End DEC_STR;
In this example, the name of the architecture body is DEC_STR, and it is associated with the
entity declaration named DECODER2*4; therefore, it inherits the list of interface
ports from that entity declaration. In addition to the two component declarations
(for INV and NAND3), the architecture body contains a signal declaration that declares two
signals, ABAR and BBAR, of type BIT. These signals, which represent wires, are used to
connect the various components that form the decoder. The scope of these signals is
restricted to the architecture body; therefore, they are not visible outside it. Contrast
these signals with the ports of an entity declaration, which are available for use within any
architecture body associated with that entity declaration. A component instantiation
statement is a concurrent statement; therefore, the order of these statements is not
important. The structural style of modeling describes only an interconnection of components
(viewed as black boxes), without implying any behavior of the components themselves, nor
of the entity that they collectively represent. In the architecture body DEC_STR, the signals
A, B, and ENABLE used in the component instantiation statements are the input ports
declared in the DECODER2*4 entity declaration. For example, in the component
instantiation labeled N3, port A is connected to input D0 of component NAND3, port B is
connected to input port D1, port ENABLE is connected to input port D2, and port Z(3) of
the DECODER2*4 entity is connected to the output port DZ of component NAND3. Again,
positional association is used to map the signals in the port map of the component
instantiation to the ports of the component specified in its declaration. The
behavior of the components NAND3 and INV is not apparent, nor is the behavior of the
decoder entity that the structural model represents.
The dataflow model for the HALF_ADDER is described using two concurrent signal
assignment statements (sequential signal assignment statements are described in the next
section). In a signal assignment statement, the symbol <= implies an assignment of a value
to a signal. The value of the expression on the right-hand side of the statement is computed
and is assigned to the signal on the left-hand side, called the target signal. A concurrent
signal assignment statement is executed only when a signal used in the expression on the
right-hand side has an event on it, that is, when the value of that signal changes.
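This trigger-on-event rule can be illustrated with a toy evaluator written in Python. This is only a conceptual sketch of concurrent signal assignment, not a VHDL simulator: each assignment re-executes only when a signal on its right-hand side changes, and each pass of the loop plays the role of one delta cycle. Signal names follow the decoder example (Z(0)..Z(3) are written Z0..Z3 here).

```python
def nand(*xs):
    return 0 if all(xs) else 1

# (target, function, input signal names) for each concurrent assignment.
# ABAR and BBAR appear first so they are computed before the Z assignments
# read them on the first pass.
assignments = [
    ("ABAR", lambda s: 1 - s["A"],                          ("A",)),
    ("BBAR", lambda s: 1 - s["B"],                          ("B",)),
    ("Z0",   lambda s: nand(s["ABAR"], s["BBAR"], s["EN"]), ("ABAR", "BBAR", "EN")),
    ("Z1",   lambda s: nand(s["ABAR"], s["B"],    s["EN"]), ("ABAR", "B", "EN")),
    ("Z2",   lambda s: nand(s["A"],    s["BBAR"], s["EN"]), ("A", "BBAR", "EN")),
    ("Z3",   lambda s: nand(s["A"],    s["B"],    s["EN"]), ("A", "B", "EN")),
]

def settle(signals):
    changed = set(signals)          # initial events on all driven inputs
    while changed:                  # each pass corresponds to one delta cycle
        events = set()
        for target, fn, inputs in assignments:
            if changed & set(inputs):       # triggered only by an event on an input
                new = fn(signals)
                if signals.get(target) != new:
                    signals[target] = new
                    events.add(target)
        changed = events
    return signals

s = settle({"A": 1, "B": 1, "EN": 1})
print(s["Z3"], s["Z0"])  # prints 0 1 (only output Z3 is driven low)
```

Note how a change on B would trigger only the BBAR, Z1 and Z3 assignments directly, and the resulting change on BBAR would then trigger Z0 and Z2 one delta cycle later, exactly as described in the text.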
The architecture body consists of one signal declaration and six concurrent signal
assignment statements. The signal declaration declares the signals ABAR and BBAR for local
use within the architecture body. In each of the signal assignments, no after clause was
used to specify a delay; in all such cases, a default delay of 0 ns is assumed. This delay of
0 ns is also known as delta delay, and it represents an infinitesimal delay. To understand
the behavior of this architecture body, consider an event happening on one of the input
signals, say input port B at time T. This would cause the concurrent signal assignment
statements 1, 3, and 6 to be triggered. Their right-hand-side expressions would be evaluated,
and the corresponding values would be scheduled to be assigned to the target signals at
time T+delta. When simulation time advances to T+delta, new values are assigned to signals
Z(3), BBAR and Z(1). Since the value of BBAR changes, this in turn triggers signal
assignment statements 2 and 4. Eventually, at time T+2*delta, signals Z(0) and Z(2) are
assigned their new values.
The semantics of this concurrent behavior indicate that simulation, as defined by the
language, is event triggered and that simulation time advances to the next time unit when an
event is scheduled to occur. Simulation time can also advance by multiples of delta time
units. For example, events may have been scheduled to occur at times 1, 3, 4, 4+delta, 5, 6,
6+delta, 6+3*delta, 10, 10+delta, 15, and 15+delta time units. The after clause may be used
to generate a clock signal, as shown in the following concurrent signal assignment statement.
Clk <= not Clk after 10 ns;
(Figure: the resulting Clk waveform, toggling every 10 ns from 10 to 70 ns)
BBAR := not B; ............. statement 2
If ENABLE = '1' then ............. statement 3
Z (3) <= not (A and B); ............. statement 4
Z (0) <= not (ABAR and BBAR); ............. statement 5
Z (2) <= not (A and BBAR); ............. statement 6
Z (1) <= not (ABAR and B); ............. statement 7
Else
Z <= "1111"; ............. statement 8
End if;
End process;
End DEC_SEQUENTIAL;
A process statement also has a declarative part (before the keyword begin) and a statement
part (between the keywords begin and end process). The statements appearing within the
statement part are sequential statements and are executed sequentially. The list of
signals specified within the parentheses after the keyword process constitutes a sensitivity
list, and the process statement is invoked whenever there is an event on any signal in this
list. In the previous example, when an event occurs on signal A, B or ENABLE, the statements
appearing within the process statement are executed sequentially. The variable declaration
(starting with the keyword variable) declares two variables called ABAR and BBAR. A variable
differs from a signal in that it is always assigned a value instantaneously, and the
assignment operator used is the := compound symbol; contrast this with a signal, which is
always assigned a value after a certain delay (user-specified, or the default delta delay),
and whose assignment operator is the <= compound symbol. Variables declared within a process
have their scope limited to that process. Variables can also be declared in subprograms;
variables declared outside of a process or a subprogram are called shared variables, and
these can be updated and read by more than one process. Note that signals cannot be declared
within a process. Signal assignment statements appearing within a process are called
sequential signal assignment statements. Sequential signal assignment statements, including
variable assignment statements, are executed sequentially, independent of whether an event
occurs on any signal in their right-hand-side expressions; contrast this with the execution
of concurrent signal assignment statements in the dataflow modeling style. In the previous
architecture body, if an event occurs on
any signal A, B, or ENABLE, statement 1, which is a variable assignment statement, is
executed first; then statement 2 is executed, and so on. Execution of the third statement, an
if statement, causes control to jump to the appropriate branch based on the value of the
signal ENABLE. If the value of ENABLE is '1', the next four signal assignment statements, 4
through 7, are executed, and the values are assigned to the respective target signals after a
delta delay. If ENABLE has the value '0', a value of '1' is assigned to each of the elements
of the output array Z. When execution reaches the end of the process, the process suspends
itself and waits for another event to occur on a signal in its sensitivity list. It is
possible to use case or loop statements within a process; the semantics and structure of
these statements are very similar to those in other high-level programming languages like C
or PASCAL.
An explicit wait statement can also be used to suspend a process. It can be used to wait for
a certain amount of time, until a certain condition becomes true, or until an event occurs on
one or more signals. Here is an example of a process statement that generates a clock with
different on and off periods.
Process
Begin
CLK <= '0';
Wait for 20ns;
CLK <='1';
Wait for 12ns;
End process;
(Figure: Clk waveform, low for 20 ns and high for 12 ns, repeating with a 32 ns period
over 0 to 96 ns)
This process does not have a sensitivity list because explicit wait statements are present
inside the process. It is important to remember that a process never terminates; it is always
either being executed or in a suspended state. All processes are executed once during the
initialization phase of simulation, until they suspend. Therefore, a process with no
sensitivity list and no explicit wait statement will never suspend itself. A signal can
represent not only a wire but also a placeholder for a value; that is, it can be used to model
a flip-flop. Here is such an example, in which port signal Q models a level-sensitive flip-flop.
Entity LS_DFF is
Port (Q: out BIT; D, CLK: in BIT);
End LS_DFF;
Architecture LS_DFF_BEH of LS_DFF is
Begin
Process (D, CLK)
Begin
If CLK = '1' then
Q <= D;
End if;
End process;
End LS_DFF_BEH;
The process executes whenever there is an event on signal D or CLK. If the value of CLK is
'1', the value of D is assigned to Q; if CLK is '0', no assignment to Q takes place. Thus,
as long as CLK is '1', any change on D appears on Q; once CLK becomes '0', the value
in Q is retained.
It is possible to mix the three modeling styles that we have seen so far in a single
architecture body. That is, within an architecture body, we could use component
instantiation statements (which represent structure), concurrent signal assignment
statements (which represent dataflow), and process statements (which represent behavior).
Here is an example of a mixed-style model for the one-bit full-adder in the figure below.
The full-adder is represented using one component instantiation statement, one process
statement, and one concurrent signal assignment statement. All of these statements are
concurrent statements; therefore, their order of appearance within the architecture body is
not important. Note that a process statement itself is a concurrent statement; however, the
statements within a process statement are always executed sequentially. S1 is a signal
declared locally within the architecture body and is used to pass the value from the output
of component X1 to the expression for signal SUM.
A state machine can usually be modeled using a case statement in a process. The state
information is stored in a signal, and the multiple branches of the case statement contain
the behavior for each state.
The VHDL design modeling process is illustrated by the following Y-chart, which applies in
general to any circuit design.
Figure 6.3 Y-chart
6.5 MATLAB
Matlab is multi-purpose software that is usually used for mathematical computation in the
engineering field. In this project, Matlab is used for verification of the FFT/IFFT modules.
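The cross-check idea used for verification can be sketched in plain Python as a stand-in for Matlab (Matlab itself was the tool actually used): a recursive radix-2 FFT is compared against a direct DFT, which serves as the golden reference model.

```python
import cmath

def dft(x):
    # Direct N^2 discrete Fourier transform: the reference ("golden") model
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N)
                for n in range(N)) for k in range(N)]

def fft(x):
    # Recursive radix-2 FFT; len(x) must be a power of two
    N = len(x)
    if N == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0] * N
    for k in range(N // 2):
        t = cmath.exp(-2j * cmath.pi * k / N) * odd[k]   # twiddle * odd term
        out[k] = even[k] + t
        out[k + N // 2] = even[k] - t
    return out

x = [1, 2, 3, 4, 0, 0, 0, 0]
assert all(abs(a - b) < 1e-9 for a, b in zip(fft(x), dft(x)))
print("FFT matches reference DFT")
```

The same pattern (run the fast implementation, run a slow but obviously correct reference, and compare element by element within a tolerance) is what the Matlab comparison tables in Chapter VII embody.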
CHAPTER VIII
The waveform results for these modules were given in the results and simulation chapter, and
the operation of these modules was also discussed there. The design can be further improved
based on the suggestions discussed in the following sections.
A higher-precision fixed-point representation could be used for the data values. A
floating-point format can also be considered as a solution to reduce number-representation
error, especially for the twiddle factor value of 0.7071. Although floating point increases
processing time and output latency, it is an excellent way to overcome the accuracy problem.
Besides that, it is suggested to create a circuit that detects overflow and indicates it,
with a flag or in some other way, to ensure that the user knows the given input has caused an
error in the system. The user will then notice the problem and change the input value so that
no error occurs.
In this design, the receiver module, which mainly uses the FFT, is good at processing
positive input values. Therefore, any imaginary value should be mapped onto a real value so
that the receiver can process the input data correctly.
For future work, it is suggested to develop other modules such as interleaving, error
correction, QAM or QPSK modulation, the cyclic prefix module and the RF part. These modules
would make a complete transmitter and receiver system.
8.3 Implementation Challenges
This section covers the analysis of the results obtained from the FFT and IFFT modules.
The comparison results between Matlab and these modules were shown in Table 7.1,
Table 7.2 and Table 7.3 in the previous chapter.
As noticed from the comparison results, it can be concluded that the results obtained were
not exactly the same as those produced by Matlab for the FFT/IFFT modules. This section
presents some of the reasons why this problem occurred and suggests the best solutions for
the problems encountered.
8.3.1 Accuracy
The biggest problem when implementing mathematical computation in hardware is accuracy. The main reason hardware computation is less accurate than a software implementation is that multiplication and division use fixed-point numbers instead of floating point. The weakness of fixed-point representation is that the approximation made to represent a number introduces error. For example, the decimal number 0.7071 in binary is approximately 0.10110101. If this binary number is converted back to decimal, the result is 0.7070. This example shows that the twiddle factor is not represented exactly.
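A minimal sketch of this quantization effect, assuming an 8-fractional-bit (Q0.8) format with truncation; the exact word length used in the design may differ:

```python
# Sketch: quantizing the twiddle factor 0.7071 to 8 fractional bits.

def to_fixed(x, frac_bits=8):
    """Quantize x to a fixed-point integer with frac_bits fractional bits."""
    return int(x * (1 << frac_bits))     # truncation, as simple hardware would do

def to_float(n, frac_bits=8):
    """Convert the fixed-point integer back to a real value."""
    return n / (1 << frac_bits)

w = to_fixed(0.7071)                     # 181 -> binary 10110101
print(bin(w))                            # 0b10110101
print(to_float(w))                       # 0.70703125, not 0.7071
```

The stored coefficient already differs from 0.7071 before any multiplication happens, which is the representation error described above.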
Figure 8.1 shows an example of an integer multiplied by the twiddle factor in decimal and binary representation. The decimal result can be shown to an accuracy of 0.0001. In binary, however, the result of the computation can only be displayed in an 8-bit representation, which is 7. Multiplying an 8-bit number by an 8-bit number produces a 16-bit product; since the result register can only store 8 bits, it holds 7 instead of 7.7781, and consequently accuracy is sacrificed. This involves only one operation; since the FFT/IFFT requires many stages and many operations, the final result will be noticeably different at some of the outputs. In Matlab, computation is done entirely in floating point and only the final result is rounded. In the FPGA module, each operation already carries an error, so obviously some outputs will not give the same value. Fixed-point numbers provide faster processing, less circuit complexity and lower memory usage than floating-point representation. The result shown in this example is for a single multiplication; it becomes worse when there are two or more twiddle multiplications.
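The effect in Figure 8.1 can be sketched as follows. Since the figure itself is not reproduced here, the integer operand 11 is an assumption, chosen because 11 × 0.7071 = 7.7781 matches the value quoted in the text:

```python
# Sketch of the Figure 8.1 effect: an 8-bit integer times the 8-bit twiddle
# factor gives a 16-bit product, of which only the integer part survives.

FRAC = 8
w = int(0.7071 * (1 << FRAC))            # twiddle factor in Q0.8 -> 181

def fixed_mul(a, w_fixed, frac_bits=FRAC):
    """Multiply integer a by a Q0.frac_bits coefficient, keep the integer part."""
    product = a * w_fixed                # 16-bit product in hardware
    return product >> frac_bits          # discard fractional bits

print(fixed_mul(11, w))                  # 7, although 11 * 0.7071 = 7.7781
```

Each FFT stage repeats this truncation, so the errors accumulate through the butterfly network.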
Figure 8.2 shows an example of the division process in the IFFT module. In decimal, the result is accurate to 0.001, but in binary representation the result that can be shown is 1. In binary, division is implemented as multiplication because this simplifies the code: division by 8 can be expressed as multiplication by 0.125.
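A sketch of this divide-by-8 scaling, again assuming a Q0.8 coefficient format; 0.125 is exactly representable (32/256), so the precision loss comes only from discarding the fractional bits of the result:

```python
# Sketch: IFFT scaling by 1/8 implemented as multiplication by 0.125.

FRAC = 8
inv8 = int(0.125 * (1 << FRAC))          # 32, exact in Q0.8

def div8(a):
    """Divide a non-negative integer by 8 via fixed-point multiplication."""
    return (a * inv8) >> FRAC            # equivalent to a >> 3

print(div8(9))                           # 1, while 9 / 8 = 1.125
```

For a power-of-two divisor this reduces to a plain right shift, which is why the hardware implementation is so cheap.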
8.3.4 Overflow
Overflow is the main problem in binary representation for arithmetic. In two's complement addition, overflow occurs when the carry into the sign bit is not the same as the carry out of it. Figure 8.3 depicts this clearly.
The example shows the addition of two positive decimal numbers. As mentioned before, the maximum positive value in 8-bit two's complement is 127. In decimal the addition gives 252, but in 8-bit two's complement the value 252 is equal to -4. The result obtained for this addition is therefore erroneous.
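The wrap-around in Figure 8.3 can be sketched as follows. Since the figure is not reproduced here, the operands 126 + 126 are an assumption; any pair of positive values summing to 252 shows the same effect:

```python
# Sketch of the Figure 8.3 overflow case: the true sum 252 exceeds +127,
# so the 8-bit two's complement register reinterprets it as a negative value.

def add8(a, b):
    """8-bit two's complement addition, wrapping like a hardware register."""
    s = (a + b) & 0xFF                   # keep only the low 8 bits
    return s - 256 if s & 0x80 else s    # reinterpret the sign bit

print(add8(126, 126))                    # -4, although 126 + 126 = 252
```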
It can be concluded that the problems discussed above are the reason why the FPGA and Matlab computations give different results. Suggestions to address them were stated in the future-work discussion above.