Abstract—The fast Fourier transform (FFT) has been an established research topic for many years across different applications in digital systems. The computational unit of the FFT consists of butterfly units, which perform complex multiplication and complex addition. Because multipliers are the basic computational element and also the slowest and most power-consuming element in the processor, special multipliers are used to overcome this drawback. In this study, we propose a computationally efficient FFT based on the radix-2 and radix-4 decimation-in-frequency algorithms, in which different multipliers are used in the FFT algorithm and the resulting parameters are compared. The entire work is performed on Xilinx ISE 14.7 and implemented on a Xilinx Virtex-6 xc6vlx760-2ff1760 FPGA.

Keywords—Fast Fourier transform (FFT), Butterfly unit, Computational unit, Multipliers.

I. INTRODUCTION

The fast Fourier transform (FFT) is a standard algorithm because it is computationally more efficient than direct evaluation of the discrete Fourier transform (DFT) [1], which is widely used in digital signal processing, communication systems, image processing, biomedical applications, etc. Because the FFT has large computational requirements, it occupies a large area and consumes more power when implemented in hardware. Therefore, an efficient and accurate implementation of the FFT processor is always preferred.

The FFT algorithm was proposed by Cooley and Tukey [2]; it uses a divide-and-conquer approach to reduce the computation. This algorithm led to two main variants: radix-2 decimation in time (DIT) and radix-2 decimation in frequency (DIF). Later, to reduce the computation further, higher-radix algorithms [3] (radix-4, radix-8) were introduced. These reduce the number of complex multiplications, but the butterfly structure becomes more complex as the radix increases. The mixed-radix algorithm [4] was therefore introduced, which utilises the benefits of both the radix-2 and radix-4 algorithms. The FFT algorithm evolved further with the radix-2^k algorithm [5], which offers more computational efficiency. Pipelined FFT architectures [6] were then introduced, a special class of FFT architecture in which the FFT is computed sequentially; they are classified into single-path delay feedback (SDF) and multipath delay commutator (MDC) for efficient hardware utilization.

The FFT algorithm has evolved in order to achieve an efficient processor with low power, small area, high speed and high accuracy. It is noted that the efficiency of each FFT algorithm relative to the others varies with the complex multipliers used in its architecture [7-8]. Multipliers are the most power- and area-consuming elements and are slow to compute; thus, most FFT algorithms focus on reducing the number of multipliers, which results in a complex architecture. Hence, using an efficient complex multiplier in the architecture can reduce the overall power and area consumption, resulting in an efficient FFT processor. In FFT algorithms, fixed-point and floating-point multipliers [9] are most commonly used: fixed-point multipliers are used when the application focuses on speed, power and area, whereas floating-point multipliers are preferred where dynamic range is required, at the expense of higher cost.

Implementation of FFT processors is an old field, but advances are still to be made. This paper presents a low-complexity, area-efficient datapath unit for the FFT obtained by combining algorithms, arithmetic and architecture [10]. The radix-2 and radix-4 FFT algorithms are implemented using different types of multipliers (array multiplier, Wallace tree multiplier, Vedic multiplier, radix-4 Booth multiplier and constant multiplier), which are represented in fixed point (Q-format) [11] and synthesized on a Xilinx Virtex-6 xc6vlx760-2ff1760. Post-place-and-route simulation is performed using Verilog code on Xilinx ISE 14.7 to estimate delay, area and power.

II. REVIEW OF FFT ALGORITHM

The N-point discrete Fourier transform (DFT) [1] is defined by

\[ X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{nk}, \qquad k = 0, 1, \ldots, N-1 \]

where \( W_N = e^{-j 2\pi / N} \) is the twiddle factor. FFT uses the symmetry and periodicity of the complex twiddle factor
International Conference on Energy, Communication, Data Analytics and Soft Computing (ICECDS-2017)
to compute DFT. The Cooley-Tukey FFT algorithm is a divide-and-conquer algorithm which recursively splits the DFT computation into even and odd halves. For the radix-2 DIF algorithm, the even- and odd-indexed outputs are

\[ X(2m) = \sum_{n=0}^{N/2-1} \big( x(n) + x(n+N/2) \big)\, W_{N/2}^{nm} \]

\[ X(2m+1) = \sum_{n=0}^{N/2-1} \big( x(n) - x(n+N/2) \big)\, W_N^{n}\, W_{N/2}^{nm}, \qquad m = 0, 1, \ldots, N/2-1 \]

FFT is calculated by the butterfly operation shown in Fig. 1, whose lower output is (x(n) - x(n+N/2)) W_N^n.

[Signal flow graph: inputs x[0] to x[15] in natural order; twiddle exponents 0 to 7 after the first stage, 0, 2, 4, 6 after the second and 0, 4 after the third; outputs X[0] to X[15]]
Fig. 2. Signal flow graph of the 16-point radix-2 DIF butterfly operation.

In Fig. 2 the outputs appear in bit-reversed order. Each of the numbers between two stages represents the twiddle factor value of the FFT algorithm according to the number N of input samples.
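The radix-2 DIF flow described above can be sketched in software. The following is a minimal Python illustration, not the authors' Verilog design; the function names `dft`, `bit_reverse` and `fft_dif_radix2` are hypothetical, and floating-point complex arithmetic is assumed in place of the fixed-point Q-format used in the hardware.

```python
import cmath


def dft(x):
    """Direct O(N^2) DFT: X(k) = sum over n of x(n) * W_N^(n*k)."""
    n_pts = len(x)
    w = cmath.exp(-2j * cmath.pi / n_pts)  # twiddle factor W_N
    return [sum(x[n] * w ** (n * k) for n in range(n_pts)) for k in range(n_pts)]


def bit_reverse(index, bits):
    """Reverse the lowest `bits` bits of `index`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


def fft_dif_radix2(x):
    """Radix-2 DIF FFT for a power-of-two length, returned in natural order."""
    a = list(x)
    n_pts = len(a)
    bits = n_pts.bit_length() - 1
    assert n_pts == 1 << bits, "length must be a power of two"
    span = n_pts // 2  # distance between the two butterfly inputs
    while span >= 1:
        step = n_pts // (2 * span)  # twiddle exponent stride at this stage
        for start in range(0, n_pts, 2 * span):
            for j in range(span):
                w = cmath.exp(-2j * cmath.pi * (j * step) / n_pts)
                top, bottom = a[start + j], a[start + j + span]
                a[start + j] = top + bottom               # upper butterfly output
                a[start + j + span] = (top - bottom) * w  # lower output times twiddle
        span //= 2
    # Position i now holds X[bit_reverse(i)]; reorder to natural order.
    return [a[bit_reverse(k, bits)] for k in range(n_pts)]
```

For N = 16 the twiddle exponents `j * step` used in the first three stages are 0 to 7, then 0, 2, 4, 6, then 0, 4, which matches the numbers drawn between the stages of the signal flow graph.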
[Signal flow graph: inputs x[0] to x[15] in natural order, STAGE 1 and STAGE 2, outputs X[0], X[4], X[8], X[12], X[1], X[5], X[9], X[13], X[2], X[6], X[10], X[14], X[3], X[7], X[11], X[15]]
Fig. 4. Signal flow graph of the 16-point radix-4 DIF butterfly operation.

Figure 4 shows a 16-point radix-4 FFT, where it is clear that the non-trivial complex multiplications of radix-4 FFT algorithms appear only after every two butterfly stages. As such, it provides better spatial regularity than radix-2 FFT algorithms, which is beneficial for hardware implementation.

III. TYPES OF MULTIPLIERS

Fig. 5. Grouping of numbers by the overlapping technique (bit string shown in the figure: 110101010).

TABLE II. BOOTH RECODED TABLE

Multiplier bits     Encoded            Partial    Operation
Qi+1  Qi  Qi-1      multiplier value   product
0     0   0          0                  0M        No action
0     0   1          1                  1M        Add
0     1   0          1                  1M        Add
0     1   1          2                  2M        Shift left and add
1     0   0         -2                 -2M        Shift left and subtract
1     0   1         -1                 -1M        Subtract
1     1   0         -1                 -1M        Subtract
1     1   1          0                  0M        No action

Table II describes the Booth recoding. Three bits of the multiplier Q are inspected to find the encoded value. The encoded value, which is multiplied with the multiplicand M, is given by Qi-1 + Qi - 2 * Qi+1.
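The recoding rule of Table II can be checked with a short Python sketch. It is an illustration under stated assumptions rather than the paper's hardware: the names `booth_radix4_digits` and `booth_multiply` are hypothetical, the multiplier is assumed to be a signed two's-complement value with an even number of bits, and the bit below the least significant bit is taken as 0.

```python
def booth_radix4_digits(q, bits=8):
    """Recode a signed `bits`-bit multiplier into radix-4 Booth digits.

    Digit i inspects the overlapping triplet (Q[2i+1], Q[2i], Q[2i-1]),
    with Q[-1] = 0, and has the value Q[2i-1] + Q[2i] - 2*Q[2i+1].
    Digits are returned least significant first, each in {-2, ..., 2}.
    """
    assert bits % 2 == 0, "an even word length is assumed"
    pattern = q & ((1 << bits) - 1)  # two's-complement bit pattern of q

    def bit(k):
        return 0 if k < 0 else (pattern >> k) & 1

    return [bit(2 * i - 1) + bit(2 * i) - 2 * bit(2 * i + 1)
            for i in range(bits // 2)]


def booth_multiply(m, q, bits=8):
    """Multiply m by q by summing the Booth partial products d * M * 4**i."""
    return sum(d * m * 4 ** i
               for i, d in enumerate(booth_radix4_digits(q, bits)))
```

Each digit selects one of the partial products 0M, 1M, 2M, -1M or -2M listed in the table, so an n-bit multiplier needs only n/2 partial products.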
[Shift-and-add network: input x, shifted terms X>>2, X>>7 and X>>10 combined by adders and a subtractor, output z]
Fig. 7. Representation of the result in CSD format.

Figure 7 shows the constant multiplier, which uses a shift-and-add approach in which the canonic signed digit (CSD) representation reduces the number of non-zero bits.

Fig. 9. (a) Wallace tree structure for 8x8 multiplication. (b) Reorganized matrix of the Wallace tree.

In the multiplication performed by the Wallace tree, the bits in every column are compressed repeatedly by 3:2 (full adder) or 2:2 (half adder) compressors until the partial-product matrix is left with a depth of only 2. Thus, a Wallace tree multiplier uses more hardware to compress the partial products so that the final product is obtained as quickly as possible. Fig. 9 shows the logic used for 8x8-bit Wallace tree multiplication and the tree structure organised according to the additions performed on the partial products.
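The column compression described for the Wallace tree can be imitated bit by bit in Python. This is a behavioural sketch only: the name `wallace_multiply` is hypothetical, unsigned operands are assumed, and the grouping of bits within a column is one simple choice that need not match the arrangement of Fig. 9.

```python
def wallace_multiply(a, b, bits=8):
    """Multiply two unsigned `bits`-bit integers by column compression."""
    # columns[w] holds every partial-product bit of weight 2**w.
    columns = [[] for _ in range(2 * bits)]
    for i in range(bits):
        for j in range(bits):
            columns[i + j].append((a >> i) & (b >> j) & 1)

    # Compress until no column is deeper than two bits.
    while max(len(col) for col in columns) > 2:
        reduced = [[] for _ in range(len(columns) + 1)]
        for w, col in enumerate(columns):
            k = 0
            while len(col) - k >= 3:  # 3:2 compressor (full adder)
                x, y, z = col[k:k + 3]
                reduced[w].append(x ^ y ^ z)
                reduced[w + 1].append((x & y) | (y & z) | (x & z))
                k += 3
            if len(col) - k == 2:     # 2:2 compressor (half adder)
                x, y = col[k:k + 2]
                reduced[w].append(x ^ y)
                reduced[w + 1].append(x & y)
                k += 2
            reduced[w].extend(col[k:])  # a single leftover bit passes through
        columns = reduced

    # Two rows remain; one carry-propagate addition gives the product.
    row0 = sum(col[0] << w for w, col in enumerate(columns) if len(col) > 0)
    row1 = sum(col[1] << w for w, col in enumerate(columns) if len(col) > 1)
    return row0 + row1
```

Every pass of the `while` loop corresponds to one compressor level in hardware, so the number of passes, rather than the number of partial products, sets the depth before the final adder.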
IV. PROPOSED WORK

FFT processors are among the most important digital devices and play a valuable role in communication, signal processing, biomedical and other applications. There is always a trade-off between speed, area, power and precision, depending on the algorithm, architecture and computational elements used. In this work, a detailed study of FFT architectures and multipliers is carried out to obtain efficient use of speed, area and delay for FFT processors in various applications.

According to Dr. Oscar Gustafsson [10], ideally “one should select FFT algorithms based on the architecture, which in turn should be selected based on the processing applications”. This work concentrates on the data-path components, i.e. the computational elements, of the radix-2 and radix-4 FFT algorithms. The main computational elements are multipliers and adders, of which the multipliers are the important element in implementing an area-, power- and delay-optimized FFT processor.

The presented work focuses on efficient, high-speed multipliers for better performance of the FFT processor. The FFT architecture is implemented using the multipliers of Section III, and their effects on the area and delay of the FFT processor are compared.

V. IMPLEMENTATION AND RESULT ANALYSIS

The radix-2 and radix-4 FFT algorithms are implemented using different multipliers in fixed-point Q-format on a Xilinx Virtex-6 xc6vlx760-2ff1760. Post-place-and-route simulation and synthesis analysis are performed using Verilog code on Xilinx ISE 14.7.

TABLE IV. COMPARISON TABLE OF RADIX-4 DIF FFT WITH DIFFERENT MULTIPLIERS.

Parameter                  Array       Wallace tree  Vedic       Modified Booth  Constant
                           multiplier  multiplier    multiplier  multiplier      multiplier
Area: No. of slice LUTs    14,237      16,207        14,198      16,273          15,080
Delay: Logic               4.5 ns      4.2 ns        4.38 ns     6.54 ns         5.02 ns
Delay: Routing             19 ns       17.9 ns       17.52 ns    24.07 ns        14.37 ns
Delay: Total               23.6 ns     22.18 ns      21.9 ns     30.62 ns        19.39 ns

From the comparison table above, it is observed that the constant multiplier gives the lowest total delay: 19.39 ns, a decrease of roughly 35% relative to the 30.62 ns of the modified Booth multiplier. It also uses roughly 7% fewer slice LUTs than the Wallace tree and modified Booth multipliers. Hence, using a constant multiplier in the FFT architecture provides high speed with comparably low area utilization.

VI. CONCLUSION

This work has presented an efficient computational unit for the FFT processor. The different multipliers described in the paper were used in the radix-2 and radix-4 FFT algorithms to analyze how area and delay depend on the efficiency of the multiplier. It was observed that using a constant multiplier in CSD format provides high speed and low area utilization. Post-place-and-route simulation and synthesis were performed using Verilog code in Xilinx ISE 14.7 and implemented on a Xilinx Virtex-6 xc6vlx760-2ff1760 FPGA.
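The CSD shift-and-add idea behind the constant multiplier can be illustrated with a minimal Python sketch. It is not the authors' design: the names `to_csd` and `csd_multiply` are hypothetical, and non-negative integer constants with left shifts are assumed here, whereas a fractional twiddle constant in Q-format would use right shifts in hardware.

```python
def to_csd(value):
    """Return the CSD digits of a non-negative integer, least significant first.

    Each digit is -1, 0 or +1 and no two adjacent digits are both non-zero.
    """
    assert value >= 0
    digits = []
    while value:
        if value & 1:
            digit = 2 - (value & 3)  # +1 when value % 4 == 1, -1 when value % 4 == 3
            value -= digit
        else:
            digit = 0
        digits.append(digit)
        value >>= 1
    return digits


def csd_multiply(x, constant):
    """Multiply x by a fixed constant using only shifts, adds and subtracts."""
    acc = 0
    for shift, digit in enumerate(to_csd(constant)):
        if digit == 1:
            acc += x << shift
        elif digit == -1:
            acc -= x << shift
    return acc
```

For example, the constant 7 (binary 111, three non-zero bits) becomes 8 - 1 in CSD, so two shifted terms replace three. Fewer non-zero digits mean fewer adders, which is the source of the area and delay savings attributed to the constant multiplier.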
[9] Amir Kaivani and Seokbum Ko, “Floating point butterfly architecture based on binary signed digit representation,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, doi: 10.1109/TVLSI.2015.2437999.
[10] Oscar Gustafsson, Electronics Letters, vol. 49, no. 5, 28 February 2013, doi: 10.1049/el.2013.0549.
[11] Sandesh S. Saokar, R. M. Banakar, Saroja Siddamal, “High Speed Signed Multiplier for Digital Signal Processing Applications,” source: 978-1-4673-1318-6/12/$31.00 ©2012 IEEE.
[12] David Villeger, Vojin G. Oklobdzija, “Evaluation of Booth Encoding Techniques for Parallel Multiplier Implementation,” source: vol. 29, issue 23, 11 November 1993, ISSN 0013-5194.
[13] Aniruddha Kanhe, Shishir Kumar Das, and Ankit Kumar Singh, “Design and implementation of low power multiplier using Vedic multiplication technique,” International Journal of Computer Science and Communication, vol. 3, no. 1, pp. 131-132, 2012.
[14] Sree Nivas A, Kayalvizhi N, “Implementation of Power Efficient Vedic Multiplier,” International Journal of Computer Applications (IJCA), ISSN 0975-8887, vol. 43, pp. 21-24, April 2012.
[15] Kyung-Ju Cho, Suhyun Jo, Yong-Eun Kim, Yi-Nan Xu, Jin-Gyun Chung, “Constant Multiplier Design using Specialized Bit Pattern Adders,” source: 978-1-4244-2182-4/08/$25.00 ©2008 IEEE.
[16] R. Hartley, “Subexpression sharing in filters using canonic signed digit multipliers,” IEEE Trans. Circuits & Syst. II, vol. 43, Oct. 1996.
[17] S. D. Pezaris, “A 40-ns 17-Bit by 17-Bit Array Multiplier,” IEEE Trans. on Computers, pp. 442-447, Apr. 1971.
[18] Ron S. Waters, Earl E. Swartzlander, “A Reduced Complexity Wallace Multiplier Reduction,” IEEE Transactions on Computers, vol. 59, no. 8, pp. 1134-1137, August 2010.