
Vietnam National University

University of Engineering and Technology

FPGA TECHNOLOGY
Dr. Nguyễn Kiêm Hùng
Email: kiemhung@vnu.edu.vn

Laboratory for Smart Integrated Systems

Objectives

In this lecture you will be introduced to:
Programmable logic technology and the features of the FPGA architecture
Coarse-grained reconfigurable architectures
Reconfigurable computing


Review
Existing Integrated Circuits (ICs) can be classified
into (1):
Standard ICs:
realize some commonly used logic circuits
conform to an agreed-upon standard in terms of
functionality and physical configuration
For example:
7400-series, etc.
Memories, microcontrollers, microprocessors, etc.


Review
Existing Integrated Circuits (ICs) can be classified
into (2):
Programmable Logic Devices (PLD):
Contain a regular structure and a collection of
programmable switches that allow the internal circuitry in
the chip to be configured by the user to implement a wide
range of different logic circuits.
Can be programmed multiple times.
Mask-programmable PLDs and field-programmable PLDs.
Classified into:
Programmable Logic Array (PLA): both the AND and OR
planes are programmable.
Programmable Array Logic (PAL): the AND plane is programmable,
the OR plane is fixed.
Field Programmable Gate Array (FPGA)
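
As a rough illustration of the PLA/PAL distinction above, the following Python sketch models a device as an AND plane feeding an OR plane. The function names, the encoding of product terms, and the example configuration are assumptions made for illustration; they do not correspond to any vendor format.

```python
def eval_and_plane(and_plane, inputs):
    """Each product term maps an input index to the value it requires
    (1 = true literal, 0 = complemented literal)."""
    return [all(inputs[i] == v for i, v in term.items()) for term in and_plane]


def eval_or_plane(or_plane, terms):
    """Each output lists the indices of the product terms it ORs together."""
    return [any(terms[t] for t in selected) for selected in or_plane]


def pla(and_plane, or_plane, inputs):
    """Evaluate all outputs of the two-plane structure for one input vector."""
    return eval_or_plane(or_plane, eval_and_plane(and_plane, inputs))


# Hypothetical configuration:
#   f0 = x0*x1 + (not x0)*x2
#   f1 = x0*x1
AND_PLANE = [{0: 1, 1: 1}, {0: 0, 2: 1}]
OR_PLANE = [[0, 1], [0]]

if __name__ == "__main__":
    for x in [(0, 0, 0), (0, 0, 1), (1, 1, 0)]:
        print(x, pla(AND_PLANE, OR_PLANE, x))
```

In a PLA both AND_PLANE and OR_PLANE would be set by the user; in a PAL only AND_PLANE is programmable and OR_PLANE is fixed by the device.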


Example of a Mask-Programmable PLD

A sea-of-gates gate array

[Figure: a sea-of-gates gate array implementing a function f1 of x1, x2, and x3]


Example of a Field-Programmable PLD


Review
Existing Integrated Circuits (ICs) can be classified
into (3):
Application-Specific ICs (ASICs) or custom-designed chips:
Aim to meet the desired performance or cost objectives.
The chip is designed first and then manufactured by a
company that has the fabrication facilities.
Designed for:
Video processing,
An interface between memory and CPU,
Automotive applications, etc.


Contrasting Architectures
ASIC architecture compared to the Xilinx FPGA architecture
Granularity: Gates vs. LUTs
Delays: Low vs. High
Performance: High vs. Low

Fundamental considerations for selecting ASIC or FPGA

Cost
Size
Performance
Volume
Analog circuitry
Time to market
Reprogrammability


FPGA Applications
Implementing prototypes of ASIC designs
Providing a hardware platform to verify the physical
implementation of new algorithms in:
Digital signal processing (DSP),
Baseband processing in communication,
Software-defined radios,
Radar,
Video and image processing,
Physical-layer communication interfaces, etc.

On-chip embedded processing systems
Functioning as reconfigurable hardware in reconfigurable
computing


What Is an FPGA?
Field-Programmable Gate Array:
Prefabricated digital integrated circuit (IC) devices
Electrically programmed to become
almost any kind of digital circuit or
system
Programming takes place in the
field.

Comprises:
Configurable logic blocks (CLBs),
Programmable routing resources: wires
and switches
I/O blocks.

Adopts one of the following programming technologies:
SRAM-based technology
Flash/EEPROM technology
Antifuse technology

Overview of the FPGA architecture


Programming Technologies
A memory element for storing
configuration information

A basic CLB


Programming Technologies
(1) SRAM-Based Programming Technology
Characteristics:
Static memory cells are used as the basic cells
The dominant approach in existing FPGAs

Advantages:
Reprogrammability; uses a standard CMOS process technology
Higher speed and lower dynamic power consumption

Disadvantages:
Larger area compared to other programming technologies:
an SRAM cell requires 6 transistors
SRAM cells are volatile


Programming Technologies
(2) EEPROM/Flash-Based Programming Technology
Characteristics:
Can be electrically programmed

Advantages:
Nonvolatile
More area-efficient than SRAM-based programming
technology

Disadvantages:
Cannot be reconfigured/reprogrammed an unlimited number of
times
Flash-based technology uses a nonstandard CMOS process


Programming Technologies
(3) Antifuse Programming Technology
Characteristics:
One-time programmable (OTP)

Advantages:
Low area
Nonvolatile

Disadvantages:
Does not use a standard CMOS process
Cannot be reprogrammed


Configuration
When does configuration happen?
On power up: static configuration
On demand: dynamic configuration

Why do FPGAs need to be configured?
The configuration memory of SRAM-based FPGAs is volatile
Configuration data is stored in a PROM or other external
data source

What do you need to know about FPGA configuration?
What happens during configuration
How to set up various configuration modes and daisy chains


Configuration
Cost of ownership is reduced by the ability to
reconfigure the hardware, extending the life of the
product
Reduces the costly physical deployment
of repair technicians
Extends the life of the product
Upgrades
Bug fixes
Adding additional functionality
Faster time to market
Partial reconfiguration


FPGA Configuration Methods
Xilinx cables:
JTAG
Slave Serial
Slave SelectMAP

Microprocessor:
JTAG
Slave Serial
Slave SelectMAP

CompactFlash card:
System ACE

Xilinx PROMs:
Slave/Master Serial
Slave/Master SelectMAP

Commodity flash:
Slave SelectMAP
SPI*
BPI*

*SPI and BPI support is available in the newer Virtex-5 and Spartan-3E families


Five Primary Elements
Xilinx FPGAs:
Configurable logic blocks
Dedicated blocks
Input and output blocks
Routing
*Clocking resources


Configurable Logic Blocks
Logic Block Architecture:
Basic component, provides the basic logic and storage
functionality
Granularity:
Fine-grained: logic gates
Medium-grained: multiplexers, LUTs, flip-flops, etc.
Coarse-grained: processor cores, DSP cores, etc.

Organization:
A single basic logic element (BLE): also called a logic cell
Cluster of locally interconnected BLEs: also called a slice
Specific-purpose hard blocks: memory, multipliers, adders,
DSP blocks, and high-speed input/output (I/O) interfaces
Very efficient at implementing specific functions
Waste a large amount of logic and routing resources if unused


Configurable Logic Blocks

A configurable logic block (CLB) having four BLEs


Logic Cells
Logic cells include
combinatorial logic, arithmetic
logic, and a register

Combinatorial logic is
implemented using Look-Up
Tables (LUTs)
The register can function as a
latch or as a JK, SR, D, or T-type
flip-flop
Arithmetic logic is a dedicated
carry chain for implementing
fast arithmetic operations

[Figure: a logic cell with a LUT, a carry chain (carry in, carry out), and a register with set/reset (S/R)]


LUT: Look-Up Table
Used to implement a small
logic function
Composed of:
Storage cells, which store the values
that produce the output of
the logic function f
Multiplexers, which select the
content of one of the
storage cells as the output
of the LUT
A LUT's size is defined by its
number of inputs
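
The structure described above (storage cells plus selecting multiplexers) can be sketched in Python as follows. This is a minimal behavioral model under stated assumptions: the class name LUT, the bit ordering (the first input is the least significant select bit), and the example XOR contents are all choices made for illustration.

```python
class LUT:
    """A k-input lookup table: 2**k storage cells plus a multiplexer tree."""

    def __init__(self, k, cells):
        if len(cells) != 2 ** k:
            raise ValueError("a k-input LUT needs exactly 2**k storage cells")
        self.k = k
        self.cells = list(cells)

    def read(self, inputs):
        """Select one storage cell; inputs[0] is the least significant select bit."""
        if len(inputs) != self.k:
            raise ValueError("expected one value per LUT input")
        level = self.cells
        for bit in inputs:
            # One layer of 2:1 multiplexers, all steered by the same input bit.
            level = [level[i + 1] if bit else level[i]
                     for i in range(0, len(level), 2)]
        return level[0]


# Hypothetical contents: a 2-input XOR, cell index = 2*b + a.
xor_lut = LUT(2, [0, 1, 1, 0])

if __name__ == "__main__":
    for b in (0, 1):
        for a in (0, 1):
            print(a, b, xor_lut.read((a, b)))
```

Changing the function implemented by the LUT only means changing the stored cells; the multiplexer tree stays the same.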


Combinatorial Logic
LUTs function as a memory

A B C D E F   Z
0 0 0 0 0 0   0
0 0 0 0 0 1   0
0 0 0 0 1 0   0
0 0 0 0 1 1   1
0 0 0 1 0 0   1
. . .
0 0 1 1 0 0   0
0 0 1 1 0 1   0
0 0 1 1 1 0   0
0 0 1 1 1 1   1

They generate the output value for a given set of inputs

[Figure: combinatorial logic with inputs A, B, C, D, E, F mapped into a LUT with output Z]

Constant delay through a LUT
Limited by the number of inputs and
outputs, not by complexity
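
To make the "LUT as a memory" view concrete, the sketch below tabulates two 6-input functions into LUT contents and reads them back by address. Both functions are hypothetical examples, and the convention that the first argument is the most significant address bit is an assumption of this sketch.

```python
from itertools import product


def lut_contents(func, k):
    """Tabulate func over all 2**k input combinations (first argument is the MSB)."""
    return [int(bool(func(*bits))) for bits in product((0, 1), repeat=k)]


def lut_read(contents, *bits):
    """Use the inputs as a memory address into the stored truth table."""
    address = 0
    for b in bits:
        address = (address << 1) | b
    return contents[address]


# Two hypothetical 6-input functions of very different logical complexity.
def simple(a, b, c, d, e, f):
    return a


def complicated(a, b, c, d, e, f):
    return ((a ^ b ^ c) and (d or not e)) or (f and not a)


simple_lut = lut_contents(simple, 6)
complicated_lut = lut_contents(complicated, 6)

if __name__ == "__main__":
    # Both functions occupy the same number of storage cells.
    print(len(simple_lut), len(complicated_lut))
```

Both tables have the same size and are read in the same way, which is why the cost of a LUT depends on the number of inputs rather than on how complicated the function is.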


LUT: A Simple Example


WideInputFunctions
Forwiderinput
functions,LUTs canbe
combinedusinga
multiplexer

LUT

LUT

Thesemuxes are
dedicated,sotheyare
fast
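
One way to picture the combination of LUTs through a multiplexer is the sketch below: a 5-input function is split into two 4-input LUTs, one for each value of the fifth input, and a 2:1 multiplexer chooses between them. The target function f5 and all helper names are hypothetical.

```python
from itertools import product


def make_lut4(func):
    """Return the 16 stored bits of a 4-input LUT for func(a, b, c, d)."""
    return [int(bool(func(a, b, c, d)))
            for a, b, c, d in product((0, 1), repeat=4)]


def read_lut4(cells, a, b, c, d):
    """Address the 16 cells with a as the most significant bit."""
    return cells[(a << 3) | (b << 2) | (c << 1) | d]


def f5(a, b, c, d, e):
    """Hypothetical 5-input target function."""
    return int((a and b) or (c ^ d ^ e))


# One LUT holds f5 with e = 0, the other holds f5 with e = 1.
lut_e0 = make_lut4(lambda a, b, c, d: f5(a, b, c, d, 0))
lut_e1 = make_lut4(lambda a, b, c, d: f5(a, b, c, d, 1))


def f5_from_luts(a, b, c, d, e):
    """The multiplexer picks one LUT output using the fifth input."""
    return read_lut4(lut_e1, a, b, c, d) if e else read_lut4(lut_e0, a, b, c, d)


if __name__ == "__main__":
    ok = all(f5_from_luts(*x) == f5(*x) for x in product((0, 1), repeat=5))
    print("two LUTs plus a mux reproduce f5:", ok)
```

The same split can be repeated with further multiplexer levels for functions with more inputs.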


ASIC Implementation
8-input AND gate
Two four-input NAND gates
feeding a two-input NOR gate

Approximate delay in a standard-cell
ASIC with a 0.13-µm process = 0.47 ns

Beware of ASIC libraries
with very wide gate types!


Xilinx Implementation
8-input AND gate implemented in
three 4-input LUTs and two logic
levels

Approximate max delay in a Spartan-3 FPGA = 0.678 ns
Approximate gate count = 18 gates
Approximate max delay in a Virtex-5 FPGA = 0.435 ns
Approximate gate count = 18 gates
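
The LUT count and number of logic levels quoted above can be estimated with a small calculation. The sketch below assumes a plain tree of k-input LUTs implementing a single wide AND or OR, with no dedicated multiplexers or carry logic; the function name is hypothetical. The count and the depth are computed as separate lower bounds.

```python
def lut_tree_cost(n_inputs, k=4):
    """Return (lut_count, logic_levels) for an n-input AND/OR built from k-input LUTs."""
    if n_inputs < 2 or k < 2:
        raise ValueError("need at least 2 inputs and k >= 2")
    # Each k-input LUT merges up to k signals into 1, removing at most k - 1 signals.
    luts = -(-(n_inputs - 1) // (k - 1))  # ceiling division
    # L levels of k-input LUTs can cover at most k**L inputs.
    levels, reach = 0, 1
    while reach < n_inputs:
        reach *= k
        levels += 1
    return luts, levels


if __name__ == "__main__":
    # The 8-input AND gate from the slide: three 4-input LUTs, two logic levels.
    print(lut_tree_cost(8, k=4))
```

For the 8-input gate the result agrees with the three LUTs and two levels stated on the slide; the same function can be applied to other widths.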


Quiz
How many 4-input LUTs would be
required to implement a 32-input
OR gate?
How many Logic Levels would they
generate?


Carry Logic

An n-bit ripple-carry adder


Carry Logic
The carry logic chain is dedicated
logic that computes high-speed
arithmetic logic functions
The carry chain generally consists
of a multiplexer and an XOR gate
The LUT computes the multiplexer
selector
The multiplexer determines the carry out
The XOR gate computes the addition

[Figure: carry chain cell driven from the LUT]
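
The division of work between the LUT, the multiplexer, and the XOR gate can be sketched as a bit-level model. This is an illustrative model only: it assumes the LUT computes a XOR b as the selector and that the multiplexer passes operand a when the carry is not propagated; the function names are hypothetical.

```python
def carry_cell(a, b, carry_in):
    """One bit of a carry chain: the LUT drives the mux selector, the XOR forms the sum."""
    propagate = a ^ b                          # computed in the LUT
    carry_out = carry_in if propagate else a   # carry multiplexer
    total = propagate ^ carry_in               # dedicated XOR gate
    return total, carry_out


def ripple_add(x, y, width):
    """Add two unsigned integers bit by bit through `width` chained cells."""
    carry, result = 0, 0
    for i in range(width):
        s, carry = carry_cell((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry


if __name__ == "__main__":
    print(ripple_add(9, 5, 4))   # 4-bit sum and final carry out
```

When a and b are equal the carry out does not depend on the carry in, so the multiplexer can forward either operand; otherwise it forwards the incoming carry.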


Routing Network Architecture
Provides connections among logic blocks and I/O blocks
to implement any user-defined circuit
Comprises wires and programmable switches
Must be very flexible to accommodate a wide variety
of circuits
Must be very efficient to offer high performance
Is optimized by taking into account the common
characteristics of these circuits:
Locality: requires abundant short wires
Some distant connections: require sparse long
wires

Can be categorized as:
Island-style
Hierarchical


Routing Network Architecture
Island-style architecture (or mesh-based FPGA architecture):
The most commonly used architecture among commercial FPGAs
Configurable logic blocks look like islands in a sea of routing
interconnect (the routing network occupies 80-90% of the total area)


RoutingNetworkArchitecture
Channelwidth:isthenumberoftracksinroutingchannel
Connectionboxes(CB):connectsLogicblocksandroutingnetwork
FlexibilityofaCB(Fc)isthenumberofroutingtracksofadjacentchannel
whichareconnectedtothepinofablock
Fc(in):theconnectivityofinputpinsoflogicblocks
Fc(out):theconnectivityofoutputpinsoflogicblocks

Switchboxes(SB):connects horizontalandverticalroutingtracks
FlexibilityofaSB(Fs)isthetotalnumberoftrackswhicheverytrackentering
intheswitchboxconnectsto
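
The definitions of Fc and Fs can be illustrated with a small model. The sketch below assumes one particular switch box topology, often called disjoint or subset, in which track i on one side only meets track i on the other three sides; the slide does not say which topology is used, and all names here are hypothetical.

```python
SIDES = ("north", "east", "south", "west")


def disjoint_switch_box(channel_width):
    """Map each (side, track) to the wire ends it can connect to.

    Assumes the disjoint (subset) topology: track i only meets track i.
    """
    return {
        (side, t): [(other, t) for other in SIDES if other != side]
        for side in SIDES
        for t in range(channel_width)
    }


def switch_box_flexibility(box):
    """Fs: number of tracks each entering track can connect to."""
    counts = {len(targets) for targets in box.values()}
    if len(counts) != 1:
        raise ValueError("Fs is only well defined if every track has the same fan-out")
    return counts.pop()


def connection_box_flexibility(pin_tracks):
    """Fc: number of adjacent-channel tracks that a block pin can reach."""
    return len(set(pin_tracks))


if __name__ == "__main__":
    box = disjoint_switch_box(channel_width=4)
    print("Fs =", switch_box_flexibility(box))
    print("Fc =", connection_box_flexibility([0, 2]))
```

With this topology Fs is 3 for any channel width, while Fc depends on how many tracks the connection box wires to each pin.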


Routing Network Architecture
Routing tracks can be
bidirectional or
unidirectional
The channel width of
unidirectional wiring must
be a multiple of 2


Routing Network Architecture
Multi-length wires are created to balance the flexibility, area, and
delay of the routing network

Longer wire segments:
Span multiple blocks and require fewer switches, thereby reducing
routing area and delay
But also decrease routing flexibility, which reduces the probability of
routing a hardware circuit successfully


Routing Network Architecture
Hierarchical architecture:
Exploits locality by dividing logic blocks into separate clusters
Connections between logic blocks within the same cluster are made by
wire segments inside that cluster
Connections between blocks residing in different clusters require the
traversal of one or more levels of the hierarchy
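
How many hierarchy levels a connection has to climb can be estimated with a simple model. The sketch below assumes a perfectly regular tree in which blocks are numbered from 0 and every level groups cluster_size neighbours from the level below; real devices need not be this regular, and the function name is hypothetical.

```python
def levels_to_traverse(block_a, block_b, cluster_size):
    """Count the hierarchy levels a connection must climb to join two blocks.

    Returns 0 for the same block, 1 when both blocks sit in the same
    lowest-level cluster, 2 when their clusters share a parent, and so on.
    """
    if cluster_size < 2:
        raise ValueError("cluster_size must be at least 2")
    levels = 0
    while block_a != block_b:
        block_a //= cluster_size
        block_b //= cluster_size
        levels += 1
    return levels


if __name__ == "__main__":
    print(levels_to_traverse(0, 1, cluster_size=4))   # same cluster
    print(levels_to_traverse(0, 5, cluster_size=4))   # neighbouring clusters
```

Connections with strong locality stay at low levels, which is the property the hierarchical architecture is designed to exploit.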


Routing Network Architecture
Hierarchical Architecture:
Example


NoC-Based Routing Architecture

Network-on-Chip:

[Figure: Network-on-Chip, showing routers, processing elements, network interfaces, input buffers, and unidirectional links]


On-Chip Interconnection Types

Network-on-Chip:

[Figure: Network-on-Chip]


Dedicated Routing
A combination of programmable and
dedicated routing lines
Dedicated routing:
Global clocks with a predefined clock tree
Regional clocks and I/O clocks
Global low-skew routing resources for other
high-fanout signals
Carry chain routing
Dedicated routing among other dedicated
resources

General interconnect:
Routing of local signals between CLBs and
IOBs


IOB Element
Controls the flow of data between the
I/O pins and the internal logic of the
device
Can configure a single interface pin
as input, output, or bidirectional
Includes an input block, an output
block, and an output enable block
A pair of double data rate
(DDR) registers
Two operation modes of the DDR
registers:
Single data rate (SDR): data are
copied into the I/O registers on
the rising clock edge only
Double data rate (DDR): data are
copied into the I/O registers on
both the rising clock edge and the
falling clock edge
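
The difference between the two register modes can be shown with a small sampling model. The sketch below treats the clock and data as equal-length lists of samples, which is a simplification chosen for illustration; the function name and the example waveforms are hypothetical.

```python
def capture(clock, data, mode="SDR"):
    """Sample `data` at clock edges; `clock` and `data` are equal-length 0/1 lists."""
    if mode not in ("SDR", "DDR"):
        raise ValueError("mode must be 'SDR' or 'DDR'")
    if len(clock) != len(data):
        raise ValueError("clock and data must have the same length")
    captured = []
    for i in range(1, len(clock)):
        rising = clock[i - 1] == 0 and clock[i] == 1
        falling = clock[i - 1] == 1 and clock[i] == 0
        # SDR registers react to rising edges only; DDR registers to both edges.
        if rising or (mode == "DDR" and falling):
            captured.append(data[i])
    return captured


clock = [0, 1, 0, 1, 0, 1, 0]
data = [0, 1, 0, 1, 1, 0, 1]

if __name__ == "__main__":
    print("SDR:", capture(clock, data, "SDR"))
    print("DDR:", capture(clock, data, "DDR"))
```

Over the same clock waveform the DDR mode captures twice as many values as the SDR mode.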


Configurable I/O Standards
A standard refers to the electrical
aspects of the signals, such as
their logic 0 and logic 1 voltage
levels
I/O can be configured to accept
and generate signals conforming
to whichever standard is required
I/O signals are split into a
number of banks; each bank can
be configured individually to
support a particular I/O standard
This allows the FPGA to work with
devices using multiple I/O
standards
This allows the FPGA to actually be
used to interface (translate)
between different I/O standards


I/O Translators
Programmable input and output thresholds
Supported standards include
LVCMOS (several classes), LVPECL, HSTL
(several classes), SSTL (several classes), PCI,
PCI-X, LVDS (several classes), GTL, GTL+, and
HyperTransport (LDT) technology
Supported standards vary; check your data sheet

Different I/O standards require a separate input and output
reference voltage for each bank supporting a separate I/O
standard
Generally, each bank can support several standards, as long as
they share the same VREF (input) or VCCO (output)
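
The banking rule above can be expressed as a simple compatibility check. The sketch below is a deliberate simplification: it treats every standard as both an input and an output, and the table holds a few illustrative voltage values that should be confirmed against the device data sheet. The table and function names are hypothetical.

```python
# Illustrative voltages only (volts); always confirm against the device data sheet.
IO_STANDARDS = {
    "LVCMOS33": {"vcco": 3.3, "vref": None},
    "LVCMOS25": {"vcco": 2.5, "vref": None},
    "SSTL2_I": {"vcco": 2.5, "vref": 1.25},
    "HSTL_I": {"vcco": 1.5, "vref": 0.75},
}


def bank_is_compatible(standards, table=IO_STANDARDS):
    """True if all standards in one bank agree on VCCO and on VREF (where used)."""
    vccos = {table[s]["vcco"] for s in standards}
    vrefs = {table[s]["vref"] for s in standards if table[s]["vref"] is not None}
    return len(vccos) <= 1 and len(vrefs) <= 1


if __name__ == "__main__":
    print(bank_is_compatible(["LVCMOS25", "SSTL2_I"]))   # same VCCO, one VREF
    print(bank_is_compatible(["LVCMOS33", "LVCMOS25"]))  # conflicting VCCO
```

A real pin planner applies more detailed rules, for example distinguishing input-only from output pins, so this check should be read only as a picture of the shared-voltage constraint.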


Dedicated Blocks

Hard IP:
Pre-implemented hardware blocks such as microprocessor cores, gigabit
interfaces, multipliers, adders, MAC functions, etc.
Designed to be as efficient as possible in terms of power consumption,
silicon area, and performance

Soft IP:
A source-level library of high-level functions in a hardware description
language (HDL) such as Verilog or VHDL, at the register-transfer level
(RTL) of abstraction

Firm IP:
A library of high-level functions in netlist form (i.e., these functions have
already been optimally mapped, placed, and routed into a group of
programmable logic blocks)


Gigabit Transceivers
Special hardwired transceiver blocks
Use one pair of differential signals to transmit (TX) data and
another pair to receive (RX) data
Can transmit and receive billions of bits of data per second


Memory Blocks
Support single- and dual-port
synchronous operations
In dual-port mode, these RAM blocks
support fully independent ports for
both reading and writing
Each block of RAM can be used
independently, or multiple blocks can
be combined together to implement
larger blocks by dedicated cascade
logic
Blocks of memory are generally spread
out across the die
Dedicated FIFO logic enables each
RAM to be configured as a FIFO
FPGAs contain from tens to hundreds of
these RAM blocks
Total storage capacity ranges from a few hundred
thousand bits up to several million bits


Specific-Purpose Hard Blocks: Xilinx DSP Slice
25x18 multiply
Dedicated A cascading
ALU mode
Independent C input
Pattern detection


Clock Management
Clock parameters and skew:
Clock parameters:

Skew:
Results in missing the data at high frequency


Clock Management
Jitter:
Clock edges may arrive a little early or a little late
If multiple edges are superimposed on top of each other, the result is a
fuzzy clock


Clock Management
Dedicated clock trees are pre-optimized clock networks that balance the
skew and minimize delay
They use special tracks and are separate from the general-purpose
programmable interconnect
A Virtex-5 FPGA has 32 separate clock networks
A Spartan-3 FPGA has 8 separate clock networks

Each can be configured for a built-in clock enable (BUFGCE) or for switching clock sources
(BUFGMUX)


Clock Management
CMT

PLL (Phase-Locked Loop):
Synthesizing clock frequencies
Reducing clock jitter

Digital Clock Manager
(DCM):
Generating clock
frequencies,
correcting clock duty
cycles, and phase-shifting
clocks

A DCM consists of:
Digital Delay-Locked Loop (DLL)
Digital Frequency Synthesis
(DFS)
Digital Phase Shifter (DPS)
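
As a rough numerical picture of frequency synthesis and phase shifting, the sketch below uses a generic multiply/divide model, with the output frequency equal to the input frequency times M/D. The function and parameter names are hypothetical, and the legal ranges of the multiply and divide factors are device specific and not modelled here.

```python
from fractions import Fraction


def synthesized_frequency(f_in_mhz, multiply, divide):
    """Output frequency of a multiply/divide frequency synthesizer, in MHz."""
    if multiply < 1 or divide < 1:
        raise ValueError("multiply and divide must be positive integers")
    return Fraction(f_in_mhz) * multiply / divide


def phase_shift_ns(f_mhz, degrees):
    """Delay in nanoseconds that corresponds to a phase shift of `degrees`."""
    period_ns = Fraction(1000) / Fraction(f_mhz)
    return period_ns * Fraction(degrees) / 360


if __name__ == "__main__":
    print(float(synthesized_frequency(50, multiply=7, divide=2)), "MHz")
    print(float(phase_shift_ns(100, 90)), "ns")
```

Exact fractions are used so that ratios such as 7/2 do not pick up floating-point rounding before the final conversion for printing.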


Dedicated and Special Resources
Clock management (CMT):
DCM and PLL
Dedicated clock trees (not shown)

Test logic:
Built-in JTAG

I/O translators:
Supporting many different thresholds

Other resources:
Double data rate (DDR) registers in the IOB
SERDES resources

Dedicated cores:
Block RAM
DSP slices
Gigabit transceivers, MGTs (all
devices)
Tri-mode Ethernet MAC (all devices)
PCI Express core (all devices)

Additional FXT cores:
PowerPC 440 processors (not
shown)
Faster GTX transceiver (not shown)

The dedicated resources for Virtex-5


EXAMPLES

Spartan-3 Family Architecture


EXAMPLES

Structure of a Xilinx Virtex II Pro FPGA with two PowerPC 405 Processor blocks


FPGA Design Flow
Specifications
High-level description (behavioral VHDL, C)
Structural description (structural VHDL)


FPGA Design Flow
Specifications
High-level description
Structural description
(Synthesis)
Logic description, e.g.:
X = (AB*CD) + (A+D) + (A(B+C))
Y = (A(B+C) + AC + D + A(BC+D))
(Technology mapping)
Gate-level design
(Implementing)
Placed & routed design
(Generating)
Bit-stream
(Programming)


Summary
Concepts and applications of FPGA
FPGA architecture
Configurable Logic Block
Routing Network Architecture

