University of Engineering and Technology
FPGA TECHNOLOGY
Dr. Nguyen Kiem Hung
Email: kiemhung@vnu.edu.vn
Laboratory for Smart Integrated Systems
Objectives
In this lecture you will be introduced to:
Programmable logic technology and the features of FPGA architecture
Coarse-grained reconfigurable architectures
Reconfigurable computing
Review
Existing Integrated Circuits (ICs) can be classified
into (1):
Standard ICs:
realize some commonly used logic circuits
conform to an agreed-upon standard in terms of
functionality and physical configuration
For example:
7400-series, etc.
Memories, microcontrollers, microprocessors, etc.
Review
Existing Integrated Circuits (ICs) can be classified
into (2):
Programmable Logic Devices (PLD):
Contain a regular structure and a collection of
programmable switches that allow the internal circuitry in
the chip to be configured by the user to implement a wide
range of different logic circuits.
Can be programmed multiple times.
PLDs are either mask-programmable or field-programmable.
Classified into:
Programmable Logic Array (PLA): both the AND and OR planes are programmable.
Programmable Array Logic (PAL): the AND plane is programmable, the OR plane is fixed.
Field Programmable Gate Array (FPGA)
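The two-plane structure of a PLA can be modelled in a few lines of Python. This is only an illustrative sketch: the function name `pla_eval`, the plane encoding, and the example product terms are assumptions made for the demonstration and are not taken from the lecture figures.

```python
def pla_eval(and_plane, or_plane, inputs):
    """Evaluate a PLA.

    and_plane: list of product terms; each term maps input index -> required bit
               (inputs missing from a term are not connected to it).
    or_plane:  list of outputs; each output lists the product terms it sums.
    inputs:    tuple of 0/1 input values.
    """
    products = [all(inputs[i] == bit for i, bit in term.items())
                for term in and_plane]
    return [int(any(products[p] for p in terms)) for terms in or_plane]


# Hypothetical programming of the AND plane:
# P0 = x1 x2, P1 = x1 x3', P2 = x1' x2' x3
AND_PLANE = [{0: 1, 1: 1}, {0: 1, 2: 0}, {0: 0, 1: 0, 2: 1}]
# Hypothetical programming of the OR plane:
# f1 = P0 + P1 + P2, f2 = P0 + P2
OR_PLANE = [[0, 1, 2], [0, 2]]

print(pla_eval(AND_PLANE, OR_PLANE, (1, 0, 0)))  # [1, 0]
```

In this model a PAL would correspond to freezing `OR_PLANE` and leaving only `AND_PLANE` programmable.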
Example of a Mask-Programmable PLD
[Figure: mask-programmable PLD realizing a function f1 of the inputs x1, x2, x3]
Example of a Field-Programmable PLD
Review
Existing Integrated Circuits (ICs) can be classified
into (3):
Application-Specific ICs (ASICs) or custom-designed chips
Contrasting Architectures
ASIC architecture compared to the Xilinx FPGA architecture
Granularity: Gates vs. LUTs
Delays: Low vs. High
Performance: High vs. Low
Cost
Size
Performance
Volume
Analog circuitry
Time to market
Reprogrammability
FPGA Applications
Implementing the prototype for ASIC designs
Providing a hardware platform to verify the physical
implementation of new algorithms in:
What is an FPGA?
Field-Programmable Gate Array:
Prefabricated digital IC devices
Electrically programmed to become almost any kind of digital circuit or system
Programming takes place in the field.
Comprises:
Configurable logic blocks (CLBs)
Programmable routing resources: wires and switches
I/O blocks
Adopts one of the following programming technologies:
SRAM-based technology
Flash/EEPROM technology
Antifuse technology
[Figure: overview of the FPGA architecture]
Programming Technologies
A memory element for storing
configuration information
A basic CLB
Programming Technologies
(1) SRAM-Based Programming Technology
Characteristics:
Static memory cells are used as the basic cells
The dominant approach for existing FPGAs
Advantages:
Reprogrammability; the use of standard CMOS process technology
Higher speed and lower dynamic power consumption
Disadvantages:
Larger area compared to other programming technologies: an SRAM cell requires 6 transistors
SRAM cells are volatile
Programming Technologies
(2) EEPROM/Flash-Based Programming Technology
Characteristics:
Can be electrically programmed
Advantages:
Non-volatile
More area-efficient than SRAM-based programming technology
Disadvantages:
Cannot be reconfigured/reprogrammed an unlimited number of times
Flash-based technology uses a non-standard CMOS process
Programming Technologies
(3) Antifuse Programming Technology
Characteristics:
One-time programmable (OTP)
Advantages:
Low area
Non-volatile
Disadvantages:
Does not use a standard CMOS process
Cannot be reprogrammed
Configuration
When does configuration happen?
On power up: static configuration
On demand: dynamic configuration
Configuration
Cost of ownership is reduced by the ability to reconfigure the hardware, extending the life of the product
Reduces the costly physical deployment
of repair technicians
Extends the life of the product
Upgrades
Bug fixes
Adding additional functionality
Faster time to market
Partial reconfiguration
FPGA Configuration Methods
Xilinx cables: JTAG, Slave Serial, Slave SelectMAP
Microprocessor: JTAG, Slave Serial, Slave SelectMAP
Xilinx PROMs: Slave/Master Serial, Slave/Master SelectMAP
Commodity Flash: Slave SelectMAP, SPI*, BPI*
*SPI and BPI support is available in the newer Virtex-5 and Spartan-3E families
Five Primary Elements
Xilinx FPGAs:
Configurable logic blocks
Dedicated blocks
Input and output blocks
Routing
Clocking resources
Configurable Logic Blocks
Logic Block Architecture:
Organization:
Logic Cells
Logic cells include combinatorial logic, arithmetic logic, and a register
Combinatorial logic is implemented using Look-Up Tables (LUTs)
The register can function as a latch or as a JK, SR, D, or T-type flip-flop
Arithmetic logic is a dedicated carry chain for implementing fast arithmetic operations
[Figure: logic cell with a LUT, a carry chain (carry in, carry out), and a register with S/R]
LUT: Look-Up Table
Used to implement a small logic function
Composed of:
Storage cells that store the output values
Combinatorial Logic
LUTs function as a memory: the inputs A to F address the stored values, and the addressed cell generates the output value Z
A B C D E F Z
0 0 0 0 0 0 0
0 0 0 0 0 1 0
0 0 0 0 1 0 0
0 0 0 0 1 1 1
0 0 0 1 0 0 1
. . .
0 0 1 1 0 0 0
0 0 1 1 0 1 0
0 0 1 1 1 0 0
0 0 1 1 1 1 1
Constant delay through a LUT
Limited by the number of inputs and outputs, not by complexity
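The idea that a LUT is a small memory addressed by its inputs can be sketched in Python. The class name `LUT`, its methods, and the 6-input example function are hypothetical and chosen only for illustration; they do not describe any vendor primitive.

```python
class LUT:
    """A k-input look-up table: 2**k one-bit storage cells addressed by the inputs."""

    def __init__(self, k, func):
        self.k = k
        # "Configuration": store the function's output for every input combination.
        self.cells = [int(bool(func(*self._bits(addr)))) for addr in range(2 ** k)]

    def _bits(self, addr):
        # Most significant bit first, so the first bit corresponds to input A.
        return [(addr >> (self.k - 1 - i)) & 1 for i in range(self.k)]

    def read(self, *inputs):
        # The inputs form the address; the output is whatever the cell holds.
        addr = 0
        for bit in inputs:
            addr = (addr << 1) | bit
        return self.cells[addr]


# Hypothetical 6-input function: Z = (A and B) xor (C or D) xor (E and F)
lut6 = LUT(6, lambda a, b, c, d, e, f: (a & b) ^ (c | d) ^ (e & f))
print(len(lut6.cells))              # 64
print(lut6.read(0, 0, 0, 0, 1, 1))  # 1
```

Reading is a single table access regardless of how complicated the stored function is, which is why the delay through a LUT is constant.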
LUT: A Simple Example
Wide-Input Functions
For wider-input functions, LUTs can be combined using a multiplexer
These muxes are dedicated, so they are fast
[Figure: LUTs combined through a dedicated MUX]
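Combining LUTs with a multiplexer amounts to splitting the function on one input. The following sketch builds a 5-input function from two 4-input LUTs and a 2:1 mux; the helper names (`make_lut`, `lut_read`, `wide5`) and the majority function are assumptions introduced for this example.

```python
def make_lut(k, func):
    """Return the 2**k stored bits of a k-input LUT (first argument is the MSB)."""
    table = []
    for addr in range(2 ** k):
        bits = [(addr >> (k - 1 - i)) & 1 for i in range(k)]
        table.append(int(bool(func(*bits))))
    return table


def lut_read(table, bits):
    """Look up the stored bit addressed by the given input bits."""
    addr = 0
    for b in bits:
        addr = (addr << 1) | b
    return table[addr]


def wide5(func):
    """Split a 5-input function into two 4-input LUTs selected by a 2:1 mux."""
    lut_lo = make_lut(4, lambda a, b, c, d: func(a, b, c, d, 0))  # used when e == 0
    lut_hi = make_lut(4, lambda a, b, c, d: func(a, b, c, d, 1))  # used when e == 1

    def evaluate(a, b, c, d, e):
        lo = lut_read(lut_lo, (a, b, c, d))
        hi = lut_read(lut_hi, (a, b, c, d))
        return hi if e else lo  # models the dedicated multiplexer

    return evaluate


# Hypothetical 5-input function: majority of five bits
majority5 = wide5(lambda a, b, c, d, e: a + b + c + d + e >= 3)
print(majority5(1, 1, 0, 0, 1))  # 1
```

The fifth input never enters a LUT; it only drives the mux select, so the function widens by one input at the cost of one extra LUT.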
ASIC Implementation
8-input AND gate
Two four-input NAND gates feeding a two-input NOR gate
Xilinx Implementation
8-input AND gate implemented in
three 4-input LUTs and two logic
levels
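The LUT count for a wide AND or OR gate can be estimated with a short calculation. This is a rough sketch that assumes the gate is built as a simple tree of k-input LUTs with no dedicated muxes or carry logic; the function name `lut_tree` is hypothetical.

```python
def lut_tree(n_inputs, k=4):
    """Return (lut_count, logic_levels) for an n-input AND/OR built from k-input LUTs."""
    if n_inputs <= 1:
        return 0, 0
    # Each k-input LUT merges k signals into 1, i.e. removes k - 1 signals.
    luts = -(-(n_inputs - 1) // (k - 1))  # ceiling division
    # Logic levels: smallest depth d such that k**d >= n_inputs.
    levels, reach = 0, 1
    while reach < n_inputs:
        reach *= k
        levels += 1
    return luts, levels


print(lut_tree(8))  # (3, 2)
```

For the 8-input AND gate this gives three 4-input LUTs in two logic levels, in agreement with the slide.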
Quiz
How many 4-input LUTs would be
required to implement a 32-input
OR gate?
How many Logic Levels would they
generate?
Carry Logic
The carry logic chain is dedicated
logic that computes high-speed
arithmetic logic functions
The carry chain generally consists
of a multiplexer and an XOR gate
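A multiplexer-plus-XOR carry chain can be modelled bit by bit as follows. This is a minimal sketch under the assumption that the LUT in each cell computes the propagate signal `a XOR b`; the function names are made up for this example.

```python
def carry_chain_add(a_bits, b_bits, carry_in=0):
    """Add two little-endian bit lists using a mux + XOR carry chain."""
    sum_bits = []
    carry = carry_in
    for a, b in zip(a_bits, b_bits):
        propagate = a ^ b                   # assumed to come from the LUT
        sum_bits.append(propagate ^ carry)  # XOR gate of the carry chain
        carry = carry if propagate else a   # carry multiplexer
    return sum_bits, carry


def to_bits(value, width):
    """Little-endian bit list of an unsigned integer."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """Unsigned integer from a little-endian bit list."""
    return sum(bit << i for i, bit in enumerate(bits))


s, cout = carry_chain_add(to_bits(11, 4), to_bits(6, 4))
print(from_bits(s), cout)  # 1 1  (11 + 6 = 17 = 0b10001)
```

When the two operand bits differ, the incoming carry is passed on; when they are equal, the carry out equals either operand bit, so the mux needs no extra logic.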
Routing Network Architecture
Provides connections among logic blocks and I/O blocks to implement any user-defined circuit
Comprises wires and programmable switches
Must be very flexible to accommodate a wide variety of circuits
Must be very efficient to offer high performance
Is optimized by taking into account the common characteristics of these circuits:
Locality: most connections are local, requiring abundant short wires
Some distant connections: leading to the need for sparse long wires
Routing Network Architecture
Island-style Architecture (or mesh-based FPGA architecture):
The most commonly used architecture among commercial FPGAs
Configurable logic blocks look like islands in a sea of routing interconnect (the routing network occupies 80-90% of total area)
Routing Network Architecture
Channel width: the number of tracks in a routing channel
Connection boxes (CB): connect logic blocks to the routing network
Flexibility of a CB (Fc) is the number of routing tracks of the adjacent channel that are connected to a pin of a block
Fc(in): the connectivity of the input pins of logic blocks
Fc(out): the connectivity of the output pins of logic blocks
Switch boxes (SB): connect horizontal and vertical routing tracks
Flexibility of an SB (Fs) is the total number of tracks to which every track entering the switch box connects
Routing Network Architecture
Routing tracks can be bidirectional or unidirectional
The channel width of unidirectional wiring must be a multiple of 2
Routing Network Architecture
Multi-length wires are created to balance the flexibility, area, and delay of the routing network
Longer wire segments:
Span multiple blocks and require fewer switches, thereby reducing routing area and delay
But also decrease routing flexibility, which reduces the probability of routing a hardware circuit successfully
Routing Network Architecture
Hierarchical Architecture:
The connections between logic blocks within the same cluster are made by wire segments
The connections between blocks residing in different groups require the traversal of one or more levels of hierarchy
Routing Network Architecture
Hierarchical Architecture: Example
NoC-based Routing Architecture
Network-on-Chip (NoC):
[Figure: Network-on-Chip with routers, processing elements, network interfaces, input buffers, and unidirectional links]
On-chip Interconnection Types
Network-on-Chip
Dedicated Routing
A combination of programmable and dedicated routing lines
Dedicated routing
Global clocks with a predefined clock tree
Regional clocks and I/O clocks
Global low-skew routing resources for other high-fanout signals
Carry chain routing
Dedicated routing among other dedicated resources
General interconnect
Routing of local signals between CLBs and IOBs
IOB Element
Controls the flow of data between the I/O pins and the internal logic of the device
Can configure a single interface pin as input, output, or bidirectional
Includes an input block, an output block, and an output enable block
A pair of Double Data Rate (DDR) registers
Two operation modes of DDR registers:
Single data rate (SDR): data are copied into the I/O registers on the rising clock edge only
Double data rate (DDR): data are copied into the I/O registers on both the rising clock edge and the falling clock edge
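The difference between the SDR and DDR modes can be shown with a small simulation. This is an illustrative model only: the function `capture`, the sampling scheme (one sample per half clock period), and the stimulus values are assumptions, not a description of the actual IOB circuitry.

```python
def capture(samples, mode="SDR"):
    """Return the data captured from (clock_level, data) samples.

    samples: sequence of (clk, data) pairs, one per half clock period.
    mode:    "SDR" captures on rising edges only;
             "DDR" captures on both rising and falling edges.
    """
    captured = []
    prev_clk = samples[0][0]
    for clk, data in samples[1:]:
        rising = prev_clk == 0 and clk == 1
        falling = prev_clk == 1 and clk == 0
        if rising or (mode == "DDR" and falling):
            captured.append(data)
        prev_clk = clk
    return captured


# Hypothetical stimulus: the clock toggles and the data changes at every sample.
stimulus = [(0, "d0"), (1, "d1"), (0, "d2"), (1, "d3"), (0, "d4")]
print(capture(stimulus, "SDR"))  # ['d1', 'd3']
print(capture(stimulus, "DDR"))  # ['d1', 'd2', 'd3', 'd4']
```

With the same clock, DDR mode captures twice as many data words as SDR mode.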
Configurable I/O Standards
"Standard" refers to the electrical aspects of the signals, such as their logic 0 and logic 1 voltage levels
I/O can be configured to accept and generate signals conforming to whichever standard is required
I/O signals are split into a number of banks; each bank can be configured individually to support a particular I/O standard
This allows the FPGA to work with devices using multiple I/O standards
This allows the FPGA to be used to interface (translate) between different I/O standards
I/O Translators
Programmable input and output thresholds
Supported standards include LVCMOS (several classes), LVPECL, HSTL (several classes), SSTL (several classes), PCI, PCI-X, LVDS (several classes), GTL, GTL+, and HyperTransport (LDT) technology
Supported standards vary; check your data sheet
Different I/O standards require a separate input and output reference voltage for each bank supporting a separate I/O standard
Generally, each bank can support several standards, as long as they share the same VREF (input) or VCCO (output)
Dedicated Blocks
Hard IP:
Pre-implemented hardware blocks such as microprocessor cores, gigabit interfaces, multipliers, adders, MAC functions, etc.
Designed to be as efficient as possible in terms of power consumption, silicon area, and performance
Soft IP:
A source-level library of high-level functions in a hardware description language (HDL), such as Verilog or VHDL, at the register transfer level (RTL) of abstraction
Firm IP:
A library of high-level functions in netlist form (i.e., these functions have already been optimally mapped, placed, and routed into a group of programmable logic blocks)
Gigabit Transceivers
Special hard-wired transceiver blocks
Use one pair of differential signals to transmit (TX) data and another pair to receive (RX) data
Can transmit and receive billions of bits of data per second
Memory Blocks
Support single- and dual-port
synchronous operations
In dual-port mode, these RAM blocks
support fully independent ports for
both reading and writing
Each block of RAM can be used
independently, or multiple blocks can
be combined together to implement
larger blocks by dedicated cascade
logic
Blocks of memory are generally spread
out across the die
Dedicated FIFO logic enables each
RAM to be configured as a FIFO
FPGAs contain from tens to hundreds of these RAM blocks
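How a RAM block becomes a FIFO can be sketched with a read pointer, a write pointer, and a fill counter. This is a simplified software model; the class `RamFifo` and its behaviour on overflow and underflow are assumptions made for illustration and do not reflect the exact dedicated FIFO logic of any device.

```python
class RamFifo:
    """FIFO built on a fixed-size RAM with separate read and write pointers."""

    def __init__(self, depth):
        self.ram = [None] * depth  # models the block RAM contents
        self.depth = depth
        self.wr_ptr = 0            # write-port address
        self.rd_ptr = 0            # read-port address
        self.count = 0             # number of stored words

    def full(self):
        return self.count == self.depth

    def empty(self):
        return self.count == 0

    def push(self, value):
        if self.full():
            return False           # assumption: writes are ignored when full
        self.ram[self.wr_ptr] = value
        self.wr_ptr = (self.wr_ptr + 1) % self.depth
        self.count += 1
        return True

    def pop(self):
        if self.empty():
            return None            # assumption: reads return nothing when empty
        value = self.ram[self.rd_ptr]
        self.rd_ptr = (self.rd_ptr + 1) % self.depth
        self.count -= 1
        return value


fifo = RamFifo(depth=4)
for word in (10, 20, 30):
    fifo.push(word)
print(fifo.pop(), fifo.pop())  # 10 20
```

The two pointers map naturally onto the two independent ports of a dual-port RAM: one port only writes, the other only reads.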
Specific-Purpose Hard Blocks: Xilinx DSP Slice
25x18 multiply
Dedicated A cascading
ALU mode
Independent C input
Pattern detection
Clock Management
Clock Parameters and Skew:
Clock parameters:
Skew:
Results in missing the data at high frequency
Clock Management
Jitter:
Clock edges may arrive a little early or a little late
If multiple edges are superimposed on top of each other, the result is a "fuzzy" clock
Clock Management
Dedicated clock trees are pre-optimized clock networks that balance skew and minimize delay
They use special tracks and are separate from the general-purpose programmable interconnect
The Virtex-5 FPGA has 32 separate clock networks
The Spartan-3 FPGA has 8 separate clock networks
Each can be configured for a built-in clock enable (BUFGCE) or for switching clock sources (BUFGMUX)
Clock Management
PLL (Phase-Locked Loop):
Synthesizing clock frequencies
Reducing clock jitter
Digital Clock Manager (DCM):
Generating clock frequencies, correcting clock duty cycles, and phase-shifting clocks
A DCM consists of:
Digital Delay-Locked Loop (DLL)
Digital Frequency Synthesis (DFS)
Digital Phase Shifter (DPS)
Dedicated and Special Resources
Clock management (CMT)
DCM and PLL
Dedicated clock trees (not shown)
Test logic
Built-in JTAG
I/O translators
Supporting many different thresholds
Other resources
Dual Data Rate (DDR) registers in IOB
SERDES resources
Dedicated cores
Block RAM
DSP slices
Gigabit transceivers, MGTs (all devices)
Tri-mode Ethernet MAC (all devices)
PCI Express core (all devices)
Additional FXT cores
PowerPC 440 processors (not shown)
Faster GTX transceiver (not shown)
[Figure: the dedicated resources for Virtex-5]
EXAMPLES
Structure of a Xilinx Virtex-II Pro FPGA with two PowerPC 405 processor blocks
FPGA Design Flow
Specifications
High-level Description (behavioral VHDL, C)
Structural Description (structural VHDL)
FPGA Design Flow
Specifications
High-level Description
Structural Description
Synthesis
Logic Description, for example:
X=(AB*CD)+(A+D)+(A(B+C))
Y = (A(B+C)+AC+D+A(BC+D))
Technology Mapping
Gate-level Design
Implementing
Placed & Routed Design
Generating Bit-stream
Programming
Summary
Concepts and applications of FPGA
FPGA architecture
Configurable Logic Block
Routing Network Architecture