Attribution Non-Commercial (BY-NC)


ABSTRACT

This paper describes the design and implementation of a fully pipelined 64-point Fast Fourier Transform (FFT) in programmable logic. The FFT takes 20-bit fixed-point complex numbers as input and after a known pipeline latency produces 20-bit complex values representing the FFT of the input. It is designed to allow continuous input of samples and is therefore suitable for use in real-time systems. The modular design allows it to be used together with other 64-point FFTs to create larger sizes, much as this design is built using smaller 8-point FFTs. Such a design has many applications in high-speed real-time systems such as wireless networking, spectral analysis, recognition systems, and more.

1. INTRODUCTION

FFTs have use in innumerable signal processing applications and are often an important building block in such systems. Many of these applications require real-time operation in order to be useful. While Digital Signal Processors (DSPs) are available that can perform an FFT fast enough to keep up with many real-time applications, some systems require additional computation or have speed requirements that exceed the capabilities of a DSP alone. It is in these situations that dedicated logic for computing an FFT can be useful. Described in this paper is the interface, design, implementation, and testing of a 64-point FFT implementation that takes advantage of pipelining, memory bank switching, and smaller FFTs to create a design capable of continuous real-time operation at high speeds.

2. INTERFACING

The FFT is designed to take complex values at the input, where the real and imaginary components each have 20 bits of precision. A twos-complement fixed-point format is used, with all numbers scaled to between -1.0 and 1.0. Getting the most accuracy out of the design requires that the inputs be scaled relative to the largest value that will occur in the input data. Additional logic or processing may be required to achieve this condition. The mapping between input numbers and their corresponding hexadecimal values is as follows:

    Input value                Hexadecimal
    ~1 (1 - 1.907x10^-6)       0x7FFFF
    …                          …
    1.907x10^-6                0x00001
    0                          0x00000
    -1.907x10^-6               0xFFFFF
    …                          …
    -1                         0x80000

This mapping is more practically represented by

\[
n =
\begin{cases}
\lfloor 2^{19} s \rfloor, & 1 > s \ge 0 \\
\lfloor 2^{19} (2 + s) \rfloor, & 0 > s \ge -1
\end{cases}
\tag{2.1}
\]

where s is the -1 to 1 scaled input number and n is the decimal value of the binary number to be fed into the input of the FFT.

Output values follow the same format as described above. However, down-scaling is done as a byproduct of some of the internal stages, as described later in this paper. As a result, the output must be multiplied by 256 to correct for this. The mapping from the output of the FFT to its equivalent value is given by

\[
s =
\begin{cases}
\dfrac{n}{2^{19}}, & 0 \le n \le \mathrm{0x7FFFF} \\[1ex]
\dfrac{n}{2^{19}} - 2, & \mathrm{0x80000} \le n \le \mathrm{0xFFFFF}
\end{cases}
\tag{2.2}
\]

where n is the hexadecimal equivalent of the binary output of the FFT and s is the corresponding fractional decimal value.
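As an illustration of equations (2.1) and (2.2), the following Python sketch converts between the scaled value s and the 20-bit code n. The function names (to_fixed, from_fixed, corrected_output) and the use of Python are assumptions made for illustration only; they are not part of the VHDL design described in this paper. The negative branch adds 2^20 after taking the floor, which is algebraically the same as floor(2^19 (2 + s)) but avoids floating-point rounding for very small negative s.

```python
import math

FRAC_BITS = 19                 # 20-bit word: 1 sign bit + 19 fractional bits
WORD_SIZE = 1 << 20            # number of distinct 20-bit codes
SCALE = 1 << FRAC_BITS         # 2^19


def to_fixed(s: float) -> int:
    """Equation (2.1): map s in [-1, 1) to its 20-bit twos-complement code."""
    if not -1.0 <= s < 1.0:
        raise ValueError("s must satisfy -1 <= s < 1")
    if s >= 0:
        return math.floor(SCALE * s)
    # Same value as floor(2^19 * (2 + s)), computed without losing precision.
    return math.floor(SCALE * s) + WORD_SIZE


def from_fixed(n: int) -> float:
    """Equation (2.2): map a 20-bit code back to its fractional value."""
    if not 0 <= n < WORD_SIZE:
        raise ValueError("n must be a 20-bit unsigned code")
    if n <= 0x7FFFF:
        return n / SCALE
    return n / SCALE - 2.0


def corrected_output(n: int) -> float:
    """Undo the factor-of-256 down-scaling described for the FFT output."""
    return 256.0 * from_fixed(n)


if __name__ == "__main__":
    for value in (0.0, 2 ** -19, -(2 ** -19), -1.0):
        print(f"{value:+.6e} -> 0x{to_fixed(value):05X}")
```

Running the script prints the codes for a few of the rows in the mapping table, which gives a quick way to confirm the two branches agree with the tabulated values.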

[Figure 1: Pipeline layout of the 64-point FFT. 20-bit x 2 Complex Input -> Bank Switched Memory A -> 8-point FFT Unit A -> Bank Switched Memory B -> Twiddle Factor Multiplier -> 8-point FFT Unit B -> Bank Switched Memory C -> Complex Output, with a Controller/Memory Address Generator serving the stages.]

Both the input and output busses to the FFT are synchronized to the rising edge of the clock, so that an input value is captured and an output value is available on the rising edge. The maximum clock rate for the design is determined by the speed and design of the programmable logic device (PLD) that will be used. The simulated timing information for this design on an Altera Apex FPGA device is described later in this paper.

3. DESIGN

The core FFT algorithm chosen for this design is the Winograd 8-point FFT. This algorithm significantly reduces the number of multiplications needed versus other algorithms at the expense of an increase in the number of additions and memory needed [1-3]. For PLDs, multiplication is more expensive to implement than addition in terms of computation time and number of gates, and therefore the Winograd algorithm was chosen. The equations that describe how it is computed are shown in Figure 2.

The pipeline layout of the 64-point FFT is shown in Figure 1. There are 6 stages, including two 8-point Winograd FFTs, one twiddle factor multiplier, and three bank switched memories. The 8-point FFT blocks have clocked shift registers at the input and output, but the FFT itself is computed with purely combinatorial logic.

    t(1) = x(1) + x(5)
    t(2) = x(2) + x(6)
    t(3) = x(3) + x(7)
    t(4) = x(4) + x(8)
    t(5) = x(1) − x(5)
    t(6) = x(2) − x(6)
    t(7) = x(3) − x(7)
    t(8) = x(4) − x(8)

    q(1) = t(1) + t(3)
    q(2) = t(2) + t(4)
    q(3) = t(1) − t(3)
    q(4) = t(2) − t(4)
    q(5) = t(5)
    q(6) = t(6) + t(8)
    q(7) = t(7)
    q(8) = t(6) − t(8)

    s(1) = q(1) + q(2)
    s(2) = q(1) − q(2)
    s(3) = q(3) − j q(4)
    s(4) = q(3) + j q(4)
    s(5) = q(5) − j (1/√2) q(6)
    s(6) = q(5) + j (1/√2) q(6)
    s(7) = (1/√2) q(8) − j q(7)
    s(8) = (1/√2) q(8) + j q(7)

    y(1) = s(1)
    y(2) = s(5) + s(7)
    y(3) = s(3)
    y(4) = s(5) − s(7)
    y(5) = s(2)
    y(6) = s(6) − s(8)
    y(7) = s(4)
    y(8) = s(6) + s(8)

Figure 2: 8-point Winograd FFT
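The equations of Figure 2 translate almost line for line into software. The sketch below is a floating-point Python model, not the hardware implementation: it makes no attempt to model the fixed-point word lengths of the actual units, and the names winograd_fft8 and reference_dft are hypothetical. It uses 0-based lists, so x(1) in the figure corresponds to x[0].

```python
import cmath
import math

INV_SQRT2 = 1.0 / math.sqrt(2.0)   # the 1/sqrt(2) constant used in Figure 2


def winograd_fft8(x):
    """8-point FFT following the t, q, s, y stages of Figure 2."""
    if len(x) != 8:
        raise ValueError("expected exactly 8 samples")
    # Stage t: sums and differences of samples that are four apart.
    t = [x[i] + x[i + 4] for i in range(4)] + \
        [x[i] - x[i + 4] for i in range(4)]
    # Stage q: second layer of additions and subtractions.
    q = [t[0] + t[2], t[1] + t[3], t[0] - t[2], t[1] - t[3],
         t[4], t[5] + t[7], t[6], t[5] - t[7]]
    # Stage s: rotations by j and the only true multiplies (by 1/sqrt(2)).
    s = [q[0] + q[1],
         q[0] - q[1],
         q[2] - 1j * q[3],
         q[2] + 1j * q[3],
         q[4] - 1j * INV_SQRT2 * q[5],
         q[4] + 1j * INV_SQRT2 * q[5],
         INV_SQRT2 * q[7] - 1j * q[6],
         INV_SQRT2 * q[7] + 1j * q[6]]
    # Stage y: final combination, already in natural frequency order.
    return [s[0], s[4] + s[6], s[2], s[4] - s[6],
            s[1], s[5] - s[7], s[3], s[5] + s[7]]


def reference_dft(x):
    """Direct O(N^2) DFT, used only to check the result."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n)
                for m in range(n))
            for k in range(n)]


if __name__ == "__main__":
    samples = [complex(k + 1, -0.5 * k) for k in range(8)]
    error = max(abs(a - b) for a, b in
                zip(winograd_fft8(samples), reference_dft(samples)))
    print(f"largest deviation from direct DFT: {error:.2e}")
```

Comparing against a direct DFT on arbitrary input is a convenient way to confirm that the stage equations have been transcribed correctly.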

The only multiplications needed within this stage are a few multiplies by 1/√2, which are also built using straight combinatorial logic units. These units perform the multiplication by using shift-add techniques. The actual multiplication value used is 0.7071. It should be noted that the internal precision of these multiplication units and the rest of the 8-point FFT blocks is 24-bit, and that the output is scaled down by a divisor of 16 as a result of how the algorithm is implemented.

The 8-point FFT units have shift registers on their inputs and outputs, each one with eight positions for a 20-bit by 2 complex number. Each time the input register is loaded with a new group of eight values, it is copied to a latch from where the actual FFT is computed. In this way, the shift register can continue to load itself with new values while the FFT is running. The output register operates in a similar fashion, copying the output of the FFT from a parallel output latch and shifting the values out one at a time.

Combined with reordering and a twiddle factor multiplication stage, the two 8-point FFT units are used to produce the 64-point FFT. The data flow diagram for the algorithm is shown in Figure 3. It should be noted that the implementation of sixteen 8-point FFT units was avoided by instead multiplexing the use of two units at the expense of increased latency.

    Input: x(0..63)    Output: X(0..63)

    x0 -> 8-pt FFT -> s(0..7)
    x1 -> 8-pt FFT -> s(8..15)
    ...
    x7 -> 8-pt FFT -> s(56..63)

    s(0..63) -> Twiddle factor multiplication -> t(0..63)

    t0 -> 8-pt FFT -> X0
    t1 -> 8-pt FFT -> X1
    ...
    t7 -> 8-pt FFT -> X7

    {x, t, X}k = {x, t, X}(n) where (n mod 8) = k

Figure 3: Data flow for 64-point FFT

The three bank switched memory blocks are used to realize the multiplexing as well as to facilitate the sample reordering that is done at three different times in the data flow. Specifically, each memory block consists of two banks, each of which can store 64 20-bit x 2 complex numbers. While one bank is being written with the data from the previous stage, the other bank can be read from separately. When the bank being written to is filled with 64 new samples, the pipeline stages following are timed to be finished reading the 64 samples from the other bank. The banks are then switched, allowing the new data to be read out and the old bank to be loaded with more samples. In this way, continuous operation is possible.

Data reordering is accomplished by controlling the memory access pattern when reading the data out of the memory banks. The controller unit for the pipeline generates the memory read addresses for each block, creating the modulo-8 reordering system. The controller is also responsible for timing the start sequences between each pipeline stage, and generating the proper indices for the ROM in the twiddle factor unit. The controller is implemented simply as a 128-state state machine, using a counter as an address generator for a ROM that stores the values for all the control signals and addresses at each state.

The twiddle factor multiplier is simply a ROM coupled with a complex multiplier. The ROM stores the 64 pre-computed twiddle factors. The complex multiplication is accomplished by breaking the operation down into three multiplies and five additions, as shown in (3.3). The total latency for this stage is 7 cycles.

\[
(x_r + j x_i)(t_r + j t_i)
\tag{3.1}
\]

\[
x_r t_r - t_i x_i + j t_i x_r + j t_r x_i
\tag{3.2}
\]

\[
t_r (x_r - x_i) + x_i (t_r - t_i) + j \bigl( x_r (t_r + t_i) - t_r (x_r - x_i) \bigr)
\tag{3.3}
\]

4. IMPLEMENTATION AND TESTING

VHDL was chosen as the hardware description language with which to build the FFT. The choice was made based mostly on the ready availability of tools to compile and simulate VHDL designs. The Quartus II design system from Altera was used to compile and simulate the system. The target device used for performing the timing simulation was the Apex EP20K600E [4]. This FPGA contains 24320 logic elements (LEs), 7326 of which are used by the FFT. A timing analysis of the worst case propagation time shows that the maximum speed for the FFT design in this FPGA is 33.54 MHz.
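As a rough cross-check of these figures, the short Python sketch below computes the share of logic elements consumed and the time needed to stream one 64-sample block through the design. It assumes one complex sample is accepted per clock cycle; the function names are hypothetical and used only for this illustration.

```python
CLOCK_HZ = 33.54e6       # maximum clock rate from the timing analysis
FFT_POINTS = 64          # samples per transform
LES_USED = 7326          # logic elements used by the FFT
LES_AVAILABLE = 24320    # logic elements in the EP20K600E


def transform_time_us(clock_hz: float, points: int) -> float:
    """Microseconds to stream `points` samples at one sample per clock."""
    return points / clock_hz * 1e6


def utilization_percent(used: int, available: int) -> float:
    """Share of the device's logic elements that the design occupies."""
    return 100.0 * used / available


if __name__ == "__main__":
    print(f"{transform_time_us(CLOCK_HZ, FFT_POINTS):.3f} us per 64-point FFT")
    print(f"{utilization_percent(LES_USED, LES_AVAILABLE):.1f} % of LEs used")
```

Under that one-sample-per-clock assumption the block time is simply 64 clock periods, and the design occupies roughly three tenths of the device.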

At 33.54 MHz, the design can perform a 64-point FFT in 1.908 µs. In contrast, a 64-point FFT on a common DSP chip, the TI TMS320C3X at 75 MHz, takes 19.75 µs [5]. Thus the dedicated hardware FFT is faster by more than a factor of ten. It should be noted, however, that the C3X family of DSPs is 32-bit floating-point, which would mean greater dynamic range than the fixed-point implementation. However, the quoted speed of the C3X does not include the time required to convert the samples to the TMS320-specific floating point format, which may be a concern in actual system implementations and would slow the algorithm down further.

While this design is capable of processing up to 33.54 Msamples/s in at least one type of FPGA, the cost of the pipelined architecture is in the latency of each stage. Each bank switched memory stage adds 66 cycles of latency, due to the number of cycles it takes to load 64 samples in before a bank switch. The 8-point Winograd FFT stages each add 16 cycles. These cycles are the time that it takes to shift in 8 samples as well as another 8 cycles to allow the FFT to complete. It should be noted that the FFT itself, without the attached shift registers, is not pipelined. It is given the maximum amount of time possible, 8 cycles, to compute its result. After 8 cycles, the input shift register would begin to lose data, making this the upper limit. This “wait state” allows the rest of the system to be pipelined while keeping the 8-point FFT combinatorial.

The twiddle factor multiplier stage contributes 7 cycles of latency. This latency arises due to the delays associated with the twiddle factor ROM, as well as the pipelined multipliers used to form the complex multiplication unit. This makes the total latency for the entire 64-point FFT 237 cycles (3 x 66 + 2 x 16 + 7).

To test the FFT, it was given a set of artificially generated complex input values, and the outputs were compared with a software FFT implementation. The first eight points of the results are shown in Table 1. It is noted that the output values of the hardware implementation sometimes differ from the software version by one. This is likely due to the rounding caused by the shift-add implementation of the multiplication by 1/√2 or simply the fact that the software implementation has much higher internal precision, particularly in the multiplication units.

    FFT Output    Software FFT          Hardware FFT
    Sample #      Real      Imag        Real      Imag
    1             0x07D62   0x07D62     0x07D61   0x07D61
    2             0xF7222   0xF6115     0xF7221   0xF6114
    3             0x02E15   0xF7099     0x02E14   0xF7098
    4             0x062ED   0xFDB5E     0x062EC   0xFDB5E
    5             0x03302   0x023AA     0x03301   0x023A9
    6             0xFF3D3   0x01C12     0xFF3D2   0x01C12
    7             0xFE819   0xFED87     0xFE818   0xFED86
    8             0x00A2E   0xFD8DD     0x00A2E   0xFD8DC

Table 1: First 8 points of simulation results

5. CONCLUSIONS

This paper presented an architecture for a pipelined 64-point FFT for implementation in a PLD. It is suitable for relatively high-speed applications where the typical DSP is not sufficiently fast to process the data, and particularly for real-time designs. Further work might include a simple extension of the radix-8 algorithm to the next step, a 512-point design, further investigation of the off-by-one errors, or perhaps further optimizations of the FFT design.

ACKNOWLEDGEMENT

This work was supported in part by the University Scholars Program at the University of Florida.

7. REFERENCES

[1] Oppenheim, A.V., Schafer, R.W., Discrete-time Signal Processing, 2nd ed., Prentice Hall, New Jersey, 1999.

[2] Press, W.H., Flannery, B.P., Teukolsky S.A., and Vetterling W.T., Numerical Recipes in C: The Art of Scientific Computing, Cambridge University Press, January 1993.

[3] Smith, Steven, The Scientist and Engineer’s Guide to Digital Signal Processing, California Technical Publishing, 1997.

[4] “APEX 20K Programmable Logic Device Family”, Product Data Sheet, ver. 4.0, Altera Corporation, August 2001.

[5] “TMS320C3x General-Purpose Applications User Guide”, Texas Instruments, January 1998.
