
CHAPTER 1

INTRODUCTION

1.1 FAST FOURIER TRANSFORM


A Fast Fourier Transform (FFT) is an efficient algorithm to compute the Discrete
Fourier Transform (DFT) and its inverse. There are many distinct FFT algorithms
involving a wide range of mathematics, from simple complex-number arithmetic to group
theory and number theory. The fast Fourier transform is a highly efficient procedure for
computing the DFT of a finite series and requires far fewer computations than direct
evaluation of the DFT. It reduces the computations by taking advantage of the fact
that the calculation of the coefficients of the DFT can be carried out iteratively. For
this reason, the FFT is used in digital spectral analysis, filter simulation,
autocorrelation and pattern recognition.

The FFT is based on decomposing the transform into smaller transforms and
combining them to obtain the total transform. The FFT reduces the computation time
required to compute a discrete Fourier transform and can improve performance by a
factor of 100 or more over direct evaluation of the DFT.

A DFT decomposes a sequence of values into components of different frequencies.
This operation is useful in many fields, but computing it directly from the definition
is often too slow to be practical. An FFT is a way to compute the same result
more quickly: computing a DFT of N points in the obvious way, using the definition,
takes O(N²) arithmetical operations, while an FFT can compute the same result in only
O(N log N) operations.

The difference in speed can be substantial, especially for long data sets where N
may be in the thousands or millions. In practice, the computation time can be reduced by
several orders of magnitude in such cases, and the improvement is roughly proportional
to N / log N. This huge improvement made many DFT-based algorithms practical. FFTs
are of great importance to a wide variety of applications, from digital signal processing
and solving partial differential equations to algorithms for quick multiplication of large
integers.

The best-known FFT algorithms depend upon the factorization of N, but
there are FFTs with O(N log N) complexity for all N, even for prime N. Many FFT
algorithms depend only on the fact that $e^{-j 2\pi / N}$ is a primitive N-th root of
unity, and thus can be applied to analogous transforms over any finite field, such as
number-theoretic transforms.

The fast Fourier transform algorithms exploit the two basic properties of the
twiddle factor, symmetry and periodicity, which reduce the number of complex
multiplications required to perform the DFT.

FFT algorithms are based on the fundamental principle of decomposing the
computation of the discrete Fourier transform of a sequence of length N into successively
smaller discrete Fourier transforms. There are basically two classes of FFT algorithms:
A) Decimation In Time (DIT) algorithm
B) Decimation In Frequency (DIF) algorithm

In decimation-in-time, the sequence for which we need the DFT is successively
divided into smaller sequences and the DFTs of these subsequences are combined in a
certain pattern to obtain the required DFT of the entire sequence. In the
decimation-in-frequency approach, the frequency samples of the DFT are decomposed into
smaller and smaller subsequences in a similar manner.

The number of complex multiplication and addition operations required by the
simple forms of both the Discrete Fourier Transform (DFT) and the Inverse Discrete
Fourier Transform (IDFT) is of order N², as there are N data points to calculate, each of
which requires N complex arithmetic operations.

The discrete Fourier transform (DFT) is defined by the formula:

$$X(K) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi n K / N},$$

where K is an integer ranging from 0 to N − 1.

The algorithmic complexity of the DFT is O(N²), so direct evaluation is not a very
efficient method. If we could not do any better than this, the DFT would not be very
useful for the majority of practical DSP applications. However, there are a number of
different 'Fast Fourier Transform' (FFT) algorithms that enable the Fourier transform of
a signal to be calculated much faster than by direct evaluation of the DFT. As the name
suggests, FFTs are algorithms for quick calculation of the discrete Fourier transform of
a data vector. The FFT is a DFT algorithm which reduces the number of computations
needed for N points from O(N²) to O(N log N), where log is the base-2 logarithm. If the
function to be transformed is not harmonically related to the sampling frequency, the
response of an FFT looks like a 'sinc' function, (sin x)/x.

The radix-2 DIT algorithm rearranges the DFT of the sequence $x_n$ into two parts:
a sum over the even-numbered indices $n = 2m$ and a sum over the odd-numbered indices
$n = 2m + 1$:
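$$X(k) = \sum_{m=0}^{N/2-1} x_{2m}\, e^{-j 2\pi (2m) k / N} \;+\; \sum_{m=0}^{N/2-1} x_{2m+1}\, e^{-j 2\pi (2m+1) k / N}.$$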

One can factor the common multiplier $e^{-j 2\pi k / N}$ out of the second sum. The two
sums are then the DFT of the even-indexed part $x_{2m}$ and the DFT of the odd-indexed
part $x_{2m+1}$ of the sequence $x_n$. Denoting the DFT of the even-indexed inputs
$x_{2m}$ by $E_k$ and the DFT of the odd-indexed inputs $x_{2m+1}$ by $O_k$, we obtain:
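$$X(k) = E_k + e^{-j 2\pi k / N}\, O_k, \qquad k = 0, 1, \ldots, N-1.$$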

However, these smaller DFTs have a length of N/2, so we need to compute only N/2
outputs: thanks to the periodicity properties of the DFT, the outputs for N/2 ≤ k < N from
a DFT of length N/2 are identical to the outputs for 0 ≤ k < N/2. That is,
$E_{k+N/2} = E_k$ and $O_{k+N/2} = O_k$. The phase factor $e^{-j 2\pi k / N}$, called a
twiddle factor, obeys the relation
$e^{-j 2\pi (k+N/2)/N} = e^{-j\pi}\, e^{-j 2\pi k/N} = -e^{-j 2\pi k/N}$,
flipping the sign of the $O_{k+N/2}$ terms. Thus, the whole DFT can be calculated as
follows:
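$$X(k) = E_k + e^{-j 2\pi k / N}\, O_k, \qquad X\!\left(k + \frac{N}{2}\right) = E_k - e^{-j 2\pi k / N}\, O_k, \qquad 0 \le k < \frac{N}{2}.$$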

This result, expressing the DFT of length N recursively in terms of two DFTs of
size N/2, is the core of the radix-2 DIT fast Fourier transform. The algorithm gains its
speed by re-using the results of intermediate computations to compute multiple DFT
outputs. Note that the final outputs are obtained by a +/− combination of $E_k$ and
$e^{-j 2\pi k / N} O_k$, which is simply a size-2 DFT; when this is generalized to larger
radices, the size-2 DFT is replaced by a larger DFT (which itself can be evaluated with
an FFT).

This process is an example of the general technique of divide-and-conquer
algorithms. In many traditional implementations, however, the explicit recursion is
avoided, and instead one traverses the computational tree in breadth-first fashion.

Fig. 1.1: Decimation in Time FFT

In the DIT algorithm, the twiddle multiplication is performed before the butterfly
stage whereas for the DIF algorithm, the twiddle multiplication comes after the Butterfly
stage.

Fig. 1.2: Decimation in Frequency FFT

The 'Radix 2' algorithms are useful when N is a regular power of 2 (N = 2^p). If we
assume that algorithmic complexity provides a direct measure of execution time and that
the relevant logarithm base is 2, then, as shown in Table 1.1, the ratio of execution
times for the DFT versus the radix-2 FFT increases tremendously with increasing N.

The term 'FFT' is actually slightly ambiguous, because there are several
commonly used 'FFT' algorithms. There are two different radix-2 algorithms, the so-
called 'Decimation in Time' (DIT) and 'Decimation in Frequency' (DIF) algorithms. Both
of these rely on the recursive decomposition of an N-point transform into two (N/2)-point
transforms.

Number of    Complex Multiplications in     Complex Multiplications in      Speed Improvement
Points, N    Direct Computation, N^2        FFT Algorithm, (N/2) log2 N     Factor
    4                  16                               4                        4.0
    8                  64                              12                        5.3
   16                 256                              32                        8.0
   32                1024                              80                       12.8
   64                4096                             192                       21.3
  128               16384                             448                       36.6

Table 1.1: Comparison of Execution Times, DFT & Radix-2 FFT

1.2 BUTTERFLY STRUCTURES FOR FFT

FFT algorithms are developed by means of the divide-and-conquer method, which
depends on the decomposition of an N-point DFT into smaller DFTs. If N is factored as
$N = r_1 r_2 r_3 \cdots r_L$ with $r_1 = r_2 = \cdots = r_L = r$, then $r^L = N$, where r
is called the radix of the FFT algorithm.

If r = 2, it is called a radix-2 FFT algorithm, and the basic DFT is of size 2.
The N-point DFT is decimated into 2-point DFTs in two ways, namely the Decimation In
Time (DIT) and the Decimation In Frequency (DIF) algorithms. Both algorithms take
advantage of the periodicity and symmetry properties of the twiddle factor:
$$W_N^{nK} = e^{-j 2\pi n K / N}.$$
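The two properties exploited are, respectively, the periodicity and symmetry of the twiddle factor:

$$W_N^{K+N} = W_N^{K} \quad \text{(periodicity)}, \qquad W_N^{K+N/2} = -W_N^{K} \quad \text{(symmetry)}.$$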
The radix-2 decimation-in-frequency FFT is an important algorithm obtained by
the divide-and-conquer approach. Fig. 1.3 below shows the first stage of the 8-point
DIF algorithm.

Fig. 1.3: First Stage of the 8-point Decimation in Frequency Algorithm.

The decimation, however, causes shuffling of the data. The entire process involves
v = log2 N stages of decimation, where each stage involves N/2 butterflies of the type
shown in Fig. 1.4.

Fig. 1.4: Butterfly Scheme.

Here $W_N^{nk} = e^{-j 2\pi n k / N}$ is the twiddle factor.

Consequently, the computation of the N-point DFT via this algorithm requires
(N/2) log2 N complex multiplications. For illustrative purposes, the eight-point
decimation-in-frequency algorithm is shown in Fig. 1.5. We observe, as previously stated,
that the output sequence occurs in bit-reversed order with respect to the input.
Furthermore, if we abandon the requirement that the computations occur in place, it is
also possible to have both the input and output in normal order.

Fig. 1.5: 8 point Decimation in Frequency Algorithm



CHAPTER 2

ARCHITECTURE

2.1 Comparative Study


Our VHDL code implements an 8-point decimation-in-frequency algorithm using
the butterfly structure. The number of stages v in the structure is v = log2 N. In our
case, N = 8 and hence the number of stages is equal to 3. There are various ways to
implement these three stages. The proposed three-stage pipeline and the iterative
architecture are described below for the implementation of the DFT.

A) Iterative Architecture - Using only one stage iteratively three times, once for every
decimation. This is a hardware-efficient circuit, as there is only one set of adders and
subtractors. The first stage requires only 2 CORDICs; the computation of each CORDIC
takes 8 clock pulses. The second and third stages do not require any CORDIC, although in
this structure they would still need to rotate data by 0° or −90° using the CORDIC,
which would take 16 clock pulses (8 for the second stage and 8 for the third). The
rotation by 0° or −90° can instead be achieved easily by 2's complement and bus exchange,
which requires much less hardware. Besides, while one set of data is being computed, we
have no option but to wait for it to be completely processed, for 36 clock cycles,
before inputting the next set of data. Thus,
Time taken for computation = 24 clock cycles
Number of adders and subtractors = 16

B) Proposed method - Pipeline Architecture - Using three separate stages, one for every
decimation. This is the other extreme, which would require 3 sets of sixteen adders. The
complexity of implementation would definitely be reduced and the delay would be
drastically cut down, as each stage would be separated from the next by a bank of
registers, and one set of data could be serially streamed into the input registers 8
clock pulses after the previous set. The net effect is that we can have all 3 stages
working simultaneously.

However, this architecture is not taken into consideration as a valid option simply
because of the immense hardware required. Besides, it would give an improvement of
merely 1 clock cycle, in terms of the total time taken, over the architecture discussed
above which we have used. Thus,
Time taken for computation = 8 clock cycles
Number of adders and subtractors = 40

2.2 Working

2.2.1. Circuit Implementation

The radix-2, 8-point FFT was designed using VHDL code and simulated in
ModelSim in order to verify its functionality. The design is synthesized using 0.18 μm
technology, with the timing constraint set for an operating frequency of 50 MHz. The FFT
architecture is divided into three main process blocks. The block diagram of the process
block is shown in Fig. 2.1.

Fig. 2.1: Data Process Block

This block consists of data input, butterfly computation and data output. The data
is read on every rising edge of the clock and stored in the memory register. The
butterfly computation block processes the stored data before it goes to the data output
process. The data is kept in the register before it is read out.

The radix-2 FFT processor architecture consists of a butterfly architecture,
memory registers, a control circuit, and serial-to-parallel and parallel-to-serial
converters. Twiddle factors are stored as signed fixed-point words. The block diagram
representation of the FFT architecture design is shown in Fig. 2.2.

Fig 2.2: FFT Processor Architecture

The most important element in the FFT processor is the butterfly structure. It takes two
signed fixed-point data words from the memory register and computes the FFT butterfly
operation. The output results are written back to the same memory locations that held the
inputs; this in-place memory storage reduces the hardware utilization. The butterfly
architecture is shown in Fig. 2.2. The adder/subtractor combines the inputs before the
result is multiplied by the twiddle factor. The multiplier forms the partial products of
the complex multiplication and produces a result twice the input word length. A shift
register then shifts the bits to avoid overflow. The output of this butterfly is kept in
a register for the subsequent stage.
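As a concrete illustration, the following is a minimal sketch of one radix-2 DIF butterfly in VHDL, assuming 16-bit signed fixed-point data, a Q1.15 twiddle factor, and a halving of both outputs to control word growth; the entity, port and signal names are illustrative and are not taken from the actual project code.

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity butterfly_dif is
  port (
    a_re, a_im : in  signed(15 downto 0);  -- upper butterfly input
    b_re, b_im : in  signed(15 downto 0);  -- lower butterfly input
    w_re, w_im : in  signed(15 downto 0);  -- twiddle factor, Q1.15 fixed point
    x_re, x_im : out signed(15 downto 0);  -- sum output: (a + b) / 2
    y_re, y_im : out signed(15 downto 0)   -- difference output: (a - b) * W / 2
  );
end entity butterfly_dif;

architecture rtl of butterfly_dif is
  signal s_re, s_im : signed(16 downto 0);  -- a + b, with one growth bit
  signal d_re, d_im : signed(16 downto 0);  -- a - b, with one growth bit
  signal p_re, p_im : signed(33 downto 0);  -- full-width complex product
begin
  s_re <= resize(a_re, 17) + resize(b_re, 17);
  s_im <= resize(a_im, 17) + resize(b_im, 17);
  d_re <= resize(a_re, 17) - resize(b_re, 17);
  d_im <= resize(a_im, 17) - resize(b_im, 17);

  -- Complex multiplication of the difference by the twiddle factor, using
  -- four real multiplications and two add/subtract operations.
  p_re <= resize(d_re * w_re, 34) - resize(d_im * w_im, 34);
  p_im <= resize(d_re * w_im, 34) + resize(d_im * w_re, 34);

  -- Scale back to 16 bits: shifting right stands in for the shift register
  -- used to avoid overflow (15 fractional bits of the Q1.15 twiddle plus
  -- one extra bit to halve the result, matching the halved sum path).
  x_re <= resize(shift_right(s_re, 1), 16);
  x_im <= resize(shift_right(s_im, 1), 16);
  y_re <= resize(shift_right(p_re, 16), 16);
  y_im <= resize(shift_right(p_im, 16), 16);
end architecture rtl;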

The sequencing of the FFT processor is determined by the control circuit, depending on
the feedback it receives from the surrounding units. A Moore machine approach is adopted,
whereby the output signals depend only on the state of the machine. The design is
synchronous and is controlled by the "CLK" signal. The input signal "RST" is used to
reset the FFT processor, including the input buffer which holds data for the next stage.

CHAPTER 3

HARDWARE DESCRIPTION LANGUAGE

3.1 INTRODUCTION

A Hardware Description Language (HDL) is a language that can describe the
behavior and structure of an electronic system, but it is particularly suited to
describing the structure and behavior of digital electronic hardware designs, such as
ASICs and FPGAs, as well as conventional circuits. An HDL can be used to describe
electronic hardware at many different levels of abstraction, such as algorithm, register
transfer level (RTL) and gate level. The algorithmic level is not synthesizable, RTL is
the input to synthesis, and gate level is the output from synthesis. It is often reported
that a large number of ASIC designs meet their specification the first time, but fail to
work when plugged into a system. HDLs allow this issue to be addressed in two ways: an
HDL specification can be executed in order to achieve a high level of confidence in its
correctness before commencing the design, and the specification for a part can be
simulated in the wider system context (e.g., printed circuit board simulation). This
depends upon how accurately the specification handles aspects such as timing and
initialization.

3.2 ADVANTAGES OF HDL

A design methodology that uses HDLs has several fundamental advantages over the
traditional gate-level design methodology. The following are some of the advantages:

• One can verify functionality early in the design process and immediately simulate
a design written as an HDL description. Design simulation at this high level,
before implementation at the gate level, allows architectural and design decisions
to be tested.

• FPGA synthesis provides logic synthesis and optimization, so one can
automatically convert a VHDL description into a gate-level implementation in a given
technology.

• HDL descriptions provide technology-independent documentation of a design and
its functionality. An HDL description is more easily read and understood than a
netlist or schematic description.

• HDLs typically support a mixed-level description in which structural or netlist
constructs can be mixed with behavioral or algorithmic descriptions. With this
mixed-level capability, one can describe a system architecture at a high level and
refine it down to a gate-level implementation.

3.3 VHDL

VHDL is a hardware description language. It describes the behavior of an
electronic circuit or system, from which the physical circuit or system can then be
realized.

VHDL stands for VHSIC Hardware Description Language. VHSIC is itself an
abbreviation for Very High Speed Integrated Circuits, an initiative funded by the United
States Department of Defense in the 1980s that led to the creation of VHDL. Its first
version was VHDL 87, later upgraded to VHDL 93. VHDL was the first hardware description
language to be standardized by the Institute of Electrical and Electronics Engineers,
through the IEEE 1076 standard. An additional standard, IEEE 1164, was later added to
introduce a multi-valued logic system.

VHDL is intended for circuit synthesis as well as circuit simulation. However,
though VHDL is fully simulatable, not all constructs are synthesizable. The two main
immediate applications of VHDL are in the field of Programmable Logic Devices and in
the field of ASICs (Application Specific Integrated Circuits). Once the VHDL code has
been written, it can be used either to implement the circuit in a programmable device or
can be submitted to a foundry for fabrication of an ASIC chip.

VHDL is a fairly general-purpose language, and it doesn't require a simulator on
which to run the code. There are many VHDL compilers, which build executable
binaries. It can read and write files on the host computer, so a VHDL program can be
written that generates another VHDL program to be incorporated in the design being
developed. Because of this general-purpose nature, it is possible to use VHDL to write a
test bench that verifies the functionality of the design, using files on the host
computer to define stimuli, interacting with the user, and comparing results with those
expected.
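As a small illustration of this, the sketch below shows a bare-bones test bench process that generates a clock and reads integer stimulus values from a text file; the entity name and the file name "stimuli.txt" are illustrative assumptions, and the instantiation of a design under test is omitted.

library ieee;
use ieee.std_logic_1164.all;
use std.textio.all;

entity tb_example is
end entity tb_example;

architecture sim of tb_example is
  signal clk      : std_logic := '0';
  signal stimulus : integer   := 0;
begin
  -- Free-running clock with a 20 ns period.
  clk <= not clk after 10 ns;

  -- Read one stimulus value per clock cycle from a host file.
  process
    file f     : text open read_mode is "stimuli.txt";
    variable l : line;
    variable v : integer;
  begin
    while not endfile(f) loop
      readline(f, l);
      read(l, v);
      wait until rising_edge(clk);
      stimulus <= v;
    end loop;
    report "End of stimulus file reached." severity note;
    wait;
  end process;
end architecture sim;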

The key advantage of VHDL when used for systems design is that it allows the
behavior of the required system to be described (modeled) and verified (simulated) before
synthesis tools translate the design into real hardware (gates and wires). The VHDL
statements are inherently concurrent and the statements placed in a PROCESS,
FUNCTION or PROCEDURE are executed sequentially.

3.4 EDA Tools

There are several EDA (Electronic Design Automation) tools available for circuit
synthesis, implementation and simulation using VHDL. Some tools are offered as part of
a vendor's design suite, such as Altera's Quartus II, which allows the synthesis of VHDL
code onto Altera's CPLD/FPGA chips, or Xilinx's ISE suite for Xilinx's CPLD/FPGA
chips. The tools used here were Xilinx ISE combined with ModelSim.

CHAPTER 4

DESIGN OF FFT

4.1 IMPLEMENTATION OF 8-POINT FFT BLOCKS

The FFT computation is accomplished in three stages. The variables x(0) through x(7)
denote the input values for the FFT computation, and X(0) through X(7) denote the
outputs. The pipeline architecture of the 8-point FFT, consisting of the butterfly
schemes, is shown in Fig. 4.1. There are two operations to complete the computation in
each stage.

Fig 4.1: Pipeline architecture of 8 point FFT.

The upward arrow denotes an addition operation, while the downward arrow denotes a
subtraction operation. The subtracted value is multiplied by the twiddle factor before
being processed in the next stage. These operations are done concurrently and are known
as the butterfly process.

The implementation of the FFT flow graph in VHDL requires three stages; when the final
computation is done, the result is sent to the variables Y(0) to Y(7). The equations in
each stage are used to construct the scheduling diagram. Fig. 4.2 shows the scheduling
diagram of the first stage of the FFT algorithm.
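For reference, the stage-one equations of the standard 8-point DIF flow graph have the form below, where $g(n)$ and $h(n)$ are illustrative names for the intermediate values passed to stage two:

$$g(n) = x(n) + x(n+4), \qquad h(n) = \bigl[x(n) - x(n+4)\bigr]\, W_8^{\,n}, \qquad n = 0, 1, 2, 3.$$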

Fig 4.2: Scheduling Diagram of stage one - FFT

For stage one, the computation is accomplished in three clock cycles, denoted S0 to S2.
The FFT processes both real and imaginary values, and the result of each stage is
represented with real and imaginary parts because of the multiplication by the twiddle
factor. The twiddle factor is a constant defined by the number of points used in the
transform. This scheduling diagram is derived from the equations obtained from the FFT
signal flow graph. The remaining scheduling diagrams can be sketched in the same way as
shown in Fig. 4.2; thus each stage requires a clock cycle, and in total three clock
cycles are needed. Scheduling diagrams are part of the behavioral modeling and synthesis
steps that translate the algorithmic description into a register transfer level (RTL)
description in VHDL.

4.2 DESIGN OF A GENERAL RADIX-2 FFT USING VHDL

As we move to higher-point FFTs, the structure for computing the FFT becomes
more complex and the need for an efficient complex multiplier to be incorporated within
the butterfly structure arises. Hence we propose an algorithm for an efficient complex
multiplier that overcomes the complication of using complex numbers throughout the
process.

A radix-2 FFT can be efficiently implemented using a butterfly processor which
includes, besides the butterfly itself, an additional complex multiplier for the twiddle
factors.

A radix-2 butterfly processor consists of a complex adder, a complex subtractor,
and a complex multiplier for the twiddle factors. The complex multiplication with the
twiddle factor is often implemented with four real multiplications and two add/subtract
operations.

Normal complex operation:

$$(X + jY)(C + jS) = CX + jSX + jCY - YS = (CX - YS) + j\,(SX + CY)$$

Real part: $R = CX - YS$

Imaginary part: $I = SX + CY$
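A minimal combinational sketch of such a twiddle-factor multiplier in VHDL is given below, assuming 16-bit signed operands and keeping the full-width product; the entity and port names are illustrative and are not taken from the actual project code.

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Complex multiplication (X + jY) * (C + jS) with four real multiplications
-- and two add/subtract operations, as in the equations above.
entity complex_mult is
  port (
    x, y : in  signed(15 downto 0);   -- data operand, real and imaginary parts
    c, s : in  signed(15 downto 0);   -- twiddle factor, real and imaginary parts
    r    : out signed(32 downto 0);   -- real part of the product: C*X - Y*S
    i    : out signed(32 downto 0)    -- imaginary part: S*X + C*Y
  );
end entity complex_mult;

architecture rtl of complex_mult is
begin
  r <= resize(c * x, 33) - resize(y * s, 33);
  i <= resize(s * x, 33) + resize(c * y, 33);
end architecture rtl;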

Using the twiddle factor multiplier that has been developed, it is possible to
design a butterfly processor for a radix-2 Cooley-Tukey FFT. Hence this basic radix-2 FFT
structure can be used as a building block to construct higher N-point FFTs. This
structure has been developed as an extension to provide for the computation of larger
FFTs.

CHAPTER 5

VHDL IMPLEMENTATION

5.1 DESIGN SOFTWARE


The implementation has been carried out using the ModelSim software. The
hardware description language used is the Very High Speed Integrated Circuit Hardware
Description Language (VHDL). VHDL is a widely used language for register transfer level
description of hardware. It is used for design entry, compilation and simulation of
digital systems.

5.2 FFT IMPLEMENTATION STATES


The architectural design consists of the data inputs, a control unit, registers, the
twiddle factors and the data outputs. The registers are arrays of four or eight variables
of type real. The FFT implementation in VHDL consists of three states: start, load and
run.

The initial state is start, in which the eight inputs of the FFT are supplied, one per
clock cycle. The serial input variable is declared as real. The input data is sent
serially to the register memory; each input is supplied in one clock cycle and is used as
an array in further stages through the internal registers.

The second state is load, in which the butterfly process is carried out. The butterfly
process requires two clock cycles to complete a single butterfly stage, so in total it
requires six clock cycles to complete the butterfly process.

The third state is run, in which the outputs of the FFT are obtained one by one. Each
output also requires a clock cycle, and the outputs are obtained starting from X(0) up to
X(7). As the outputs consist of real and imaginary values, two registers are used to hold
them.

The FFT process is controlled through the clock and reset inputs. The clock input drives
the pipeline stages of the FFT architecture, and the reset input returns the design to
the initial state. The twiddle factor values are fixed and are defined inside the code
itself.
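A minimal sketch of this three-state control, written as a Moore machine as described in Chapter 2, might look as follows; the state names, counter width and count limits are illustrative assumptions rather than the project's actual code.

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity fft_control is
  port (
    clk, rst : in  std_logic;
    load_en  : out std_logic;   -- asserted while the butterfly stages run
    out_en   : out std_logic    -- asserted while results are streamed out
  );
end entity fft_control;

architecture rtl of fft_control is
  type state_t is (s_start, s_load, s_run);
  signal state : state_t := s_start;
  signal count : unsigned(3 downto 0) := (others => '0');
begin
  -- Moore machine: the outputs depend only on the current state.
  load_en <= '1' when state = s_load else '0';
  out_en  <= '1' when state = s_run  else '0';

  process (clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then
        state <= s_start;
        count <= (others => '0');
      else
        case state is
          when s_start =>                 -- eight inputs, one per clock cycle
            if count = 7 then
              state <= s_load;
              count <= (others => '0');
            else
              count <= count + 1;
            end if;
          when s_load =>                  -- six cycles for the butterfly stages
            if count = 5 then
              state <= s_run;
              count <= (others => '0');
            else
              count <= count + 1;
            end if;
          when s_run =>                   -- eight outputs, X(0) to X(7)
            if count = 7 then
              state <= s_start;
              count <= (others => '0');
            else
              count <= count + 1;
            end if;
        end case;
      end if;
    end if;
  end process;
end architecture rtl;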

CHAPTER 6

RESULTS

The simulation of this whole project has been done using ModelSim version 6.2.
ModelSim is a simulation tool for VLSI ASICs, FPGAs, CPLDs and SoCs. ModelSim provides a
comprehensive simulation and debug environment for complex ASIC and FPGA designs. Support
is provided for multiple languages including Verilog, SystemVerilog, VHDL and SystemC.

6.1 SIMULATION RESULT OBTAINED FOR START STATE



6.2 SIMULATION RESULTS OBTAINED FOR LOAD STATE



6.3 SIMULATION RESULTS OBTAINED FOR RUN STATE

CHAPTER 7

CONCLUSION AND FUTURE SCOPE

7.1. CONCLUSION
This project describes the efficient use of VHDL code for the implementation of a
radix-2 FFT pipelined architecture, and the waveform results of the various stages have
been obtained successfully. Compared to the previous method, it requires only six clock
cycles to perform the butterfly process, and the accuracy of the obtained results has
been increased with the help of efficient coding in VHDL. The accuracy of the results
depends upon the equations obtained from the butterfly diagram and then on the correct
drawing of the scheduling diagrams based on these equations.

7.2. FUTURE SCOPE

The future scope of this project is to implement the proposed FFT architecture on
Field-Programmable Gate Arrays (FPGAs) and also to implement the Decimation In Time (DIT)
algorithm of the FFT.

The FFT (Fast Fourier Transform) processor plays a critical part in the speed and
power consumption of an Orthogonal Frequency Division Multiplexing (OFDM)
communication system. Thus the FFT block can be implemented in OFDM.

