Programmable Logic
The simplest programmable logic devices (PLDs) are PALs (see the 22V10 figure on the next page). My students use them in their first year at PSU.
What is the next step in the evolution of PLDs?
More gates!
How do we get more gates? We could put several PALs on one chip and put an interconnection matrix between them!!
This is called a Complex PLD (CPLD).
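A PAL computes each output as an OR of programmable AND terms (a sum of products). The Python sketch below models that structure only; the dictionary encoding of product terms and the XOR example are illustrative assumptions, not taken from the 22V10 data sheet.

```python
from typing import Dict, List

# A product term maps an input name to the level it must have.
# Inputs that are not listed are "don't care" for that term.
ProductTerm = Dict[str, bool]


def eval_product(term: ProductTerm, inputs: Dict[str, bool]) -> bool:
    """AND plane: true only if every listed input has the required level."""
    return all(inputs[name] == level for name, level in term.items())


def eval_sum_of_products(terms: List[ProductTerm], inputs: Dict[str, bool]) -> bool:
    """OR plane: true if at least one product term is true."""
    return any(eval_product(term, inputs) for term in terms)


# Hypothetical example: XOR as (a AND NOT b) OR (NOT a AND b).
XOR_TERMS: List[ProductTerm] = [
    {"a": True, "b": False},
    {"a": False, "b": True},
]
```

A CPLD, in this picture, is several such blocks on one chip whose outputs can be routed to each other's inputs through the interconnection matrix.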
22V10 PLD
Cypress CPLD
FPGA Technology
Dedicated logic for carry generation, or other arithmetic functions
Phase-locked loops for clock synchronization, division, multiplication
Dedicated memory
16 x 1 LUT
DFF
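The 16 x 1 LUT and DFF pair can be modeled in a few lines. This is a behavioral sketch under simple assumptions: the class names, the bit ordering of the address, and the helper `truth_table_to_init` are hypothetical and do not correspond to any vendor primitive.

```python
class Lut16x1:
    """16 x 1 look-up table: four inputs select one of 16 stored bits."""

    def __init__(self, init: int) -> None:
        if not 0 <= init < (1 << 16):
            raise ValueError("init must fit in 16 bits")
        self.init = init

    def read(self, a3: int, a2: int, a1: int, a0: int) -> int:
        # Assumed ordering: a3 is the most significant address bit.
        address = (a3 << 3) | (a2 << 2) | (a1 << 1) | a0
        return (self.init >> address) & 1


class Dff:
    """D flip-flop: the stored bit changes only when clock() is called."""

    def __init__(self) -> None:
        self.q = 0

    def clock(self, d: int) -> int:
        self.q = d & 1
        return self.q


def truth_table_to_init(func) -> int:
    """Build the 16-bit LUT contents from a Python function of four bits."""
    init = 0
    for address in range(16):
        bits = [(address >> shift) & 1 for shift in (3, 2, 1, 0)]
        if func(*bits):
            init |= 1 << address
    return init


# Any 4-input function fits in one LUT; here a 4-input AND and a 4-input XOR.
AND4_INIT = truth_table_to_init(lambda a3, a2, a1, a0: a3 & a2 & a1 & a0)
XOR4_INIT = truth_table_to_init(lambda a3, a2, a1, a0: a3 ^ a2 ^ a1 ^ a0)
```

Feeding the LUT output into `Dff.clock` gives the registered version of the same function.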
EPROM/EEPROM Technology
EPROM can be reprogrammed; no need for external storage.
EPROM cannot be reprogrammed in circuit.
EEPROM can be reprogrammed in circuit.
EEPROM consumes 2X more area than EPROM.
FPGA
[Table column for FPGA; row labels not recoverable: Segmented; Unpredictable; Long; Moderate; Moderate; Registered logic only. Source: Altera]
FPGAs
What? Programmable logic + programmable routing = FPGAs.
Why? Zero NRE costs, easy bug fixes, and short time-to-market.
How?
Design Considerations
Target architecture. Fixed logic and routing resources. Fixed I/O pins. Slow signal delays.
COSTS of Technologies
Lower Cost
Moore's Law is alive
Smaller geometries, larger wafers, and lower defect density (= higher yield) continue to achieve lower cost per function
On-chip microprocessor(s) and Gbps transceivers
Gate count is really a meaningless metric
System Generator bridges the gap between Matlab/Simulink and FPGA circuit description
Source: IBM
65 nm
ASICs are only for extreme designs: extreme volume, speed, size, low power
SPGA
Allow multiple building blocks. Logic. Memory. Data path.
[Pie chart; segment labels not recoverable. Source: Dataquest]
FPGAs Growth
[Bar chart: millions of US dollars (0 to 2500) by year, 1996 to 2000. Source: Dataquest]
Rapid Prototyping
What? Why? How?
What is prototyping?
Basic components: FPGAs and FPICs. Hardware : boards, boxes, and cabinets. Software: methodologies and CAD tools.
Customer acceptance
Product development
Production
Design Verification
Specification (functionality & requirements) -> ? -> Final product (final functionality & performance)
Design Process
Design steps: Specification, System-level design, RTL design, Logic-level design, Physical-level design, Final chips
Verification methods: Simulation, Fast prototyping, Formal verification, Logic emulation
Verification Alternatives
[Table: verification alternatives compared by modeling accuracy, system integration, and preparation time. Rows: Event-Driven Simulation, Cycle-Based Simulation, Behavioral Simulation, Breadboarding. Surviving cell values: High, Med., Low, Med.; No, No, No.]
Relative run time by verification platform:
1: Actual hardware at 50 MHz
10: Logic emulator or prototype at 5 MHz
100, 2K: HW accelerator at 250M evals/sec
50K (1 month): Cycle-based simulator at 1K insts/sec
120K (3 months): Compiled-code logic simulator at 125 MIPS
800K (1.5 years): Event-driven logic simulator at 125 MIPS
We need FPGA emulation because simulation is too slow
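The slowdown factors above are easier to appreciate as wall-clock time. The sketch below assumes the baseline is one minute of operation on the actual hardware; that baseline is not stated in the slides, but it appears consistent with the "1 Mon.", "3 Mon.", and "1.5 Yr." labels. The HW accelerator is left out because its factor is ambiguous in the source.

```python
MINUTES_PER_DAY = 24 * 60

# Slowdown relative to the actual hardware, as listed in the slide.
SLOWDOWN = {
    "actual hardware": 1,
    "logic emulator or prototype": 10,
    "cycle-based simulator": 50_000,
    "compiled-code logic simulator": 120_000,
    "event-driven logic simulator": 800_000,
}


def wall_clock_days(slowdown: int, real_minutes: float = 1.0) -> float:
    """Days needed to reproduce real_minutes of hardware operation.

    real_minutes defaults to 1.0, which is an assumed baseline.
    """
    return slowdown * real_minutes / MINUTES_PER_DAY


def report(real_minutes: float = 1.0) -> dict:
    """Map each platform name to its wall-clock time in days."""
    return {name: wall_clock_days(factor, real_minutes)
            for name, factor in SLOWDOWN.items()}


if __name__ == "__main__":
    for name, days in report().items():
        print(f"{name:32s} {days:10.3f} days")
```

Under that assumption, one minute of real operation costs about 35 days on the cycle-based simulator and about 556 days on the event-driven simulator, while the emulator needs ten minutes.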
SW: Design, Code, Integration, Debug
HW: Design, Build, Integration, Debug
CHIP: Design, Fab, Debug
SW: Design, Code & SW Debug
HW: Design, Build HW, Integration & Debug
CHIP: Design, Fab, Chip debug
System Integration, Final Integration
Maximum number of I/O?
Package types?
Ball Grid Array (BGA) for high-density I/O
Altera Flex10K/10KE
LEs (logic elements) have a 4-input LUT (look-up table) + 1 FF
Fast carry chain between LEs, cascade chain for logic operations
Large blocks of SRAM available as well
Altera Max7000/Max7000A
EEPROM based, very fast (Tpd = 7.5 ns)
Basically a PLD architecture with programmable interconnect
Max 7000A family is 3.3 V
Cypress CPLDs
Ultra37000 Family
32 to 512 macrocells
Fast (Tpd 5 to 10 ns depending on number of macrocells)
Very good routing resources for a CPLD
Evolution
Year: 1965, 1980, 1995, 2010 (?)
Max clock rate (MHz): 1, 10, 100, 1000
Min IC geometries (µm): (value missing), 5, 0.5, 0.05
# of IC metal layers: 1, 2, 3, 12
PC-board trace width (µm): 2000, 500, 100, 25
# of PC-board layers: 1-2, 2-4, 4-8, 10-20
Every 5 years: system speed doubles, IC geometry shrinks 50%
Every 7-8 years: PC-board min trace width shrinks 50%
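The two scaling rules (speed doubling every 5 years, trace width halving every 7 to 8 years) can be checked against the 15-year steps of the table. A minimal sketch; the function names are hypothetical and 7.5 years is taken as the midpoint of "7-8 years".

```python
def growth_factor(years: float, doubling_period_years: float) -> float:
    """Growth of a quantity that doubles every doubling_period_years."""
    return 2.0 ** (years / doubling_period_years)


def shrink_factor(years: float, halving_period_years: float) -> float:
    """Remaining fraction of a quantity that halves every halving_period_years."""
    return 0.5 ** (years / halving_period_years)


# One table column spans 15 years (1965 -> 1980 -> 1995 -> 2010).
SPEED_GAIN_PER_COLUMN = growth_factor(15, 5)       # doubling every 5 years
TRACE_SHRINK_PER_COLUMN = shrink_factor(15, 7.5)   # halving every 7.5 years
```

Doubling every 5 years gives a factor of 8 per 15 years, close to the round factor of 10 between the clock-rate entries; halving every 7.5 years gives one quarter per 15 years, which matches the step from 2000 to 500 in trace width.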
[Chart by year, 1996 to 2004, showing the XC4000, Spartan-2, and Spartan-3 families]
Speed Doubles Every 5 Years ...but the speed of light never changes
Caused by:
Gate leakage due to 16 Å gate thickness
Sub-threshold leakage current: incomplete turn-off because threshold does not scale
FPGAs in 2003
1000 to 80,000 LUTs and flip-flops, millions of bits in dual-ported RAMs
Low-skew global clocks, frequency synthesis, 50 ps phase control
18 Kbit BlockRAMs and 18 x 18 multipliers
FPGAs are not glue logic anymore
FPGAs in 2003
300+ MHz system clock, 800 MHz I/O
3+ Gigabit transceivers
Embedded hard and soft microprocessors
Design security: Triple-DES encryption
VHDL/Verilog entry, synthesis, auto place and route
FPGAs are a compelling alternative to ASICs
FPGAs in 2004
Integrated Tri-Mode Ethernet MAC cores
Integrated System Monitor
500 MHz Xesium Clocking
500 MHz Xtreme DSP Slice
More Versatility
New innovative functions
Lower Cost
Smaller area = lower cost per function
Better clocking, less skew, more flexibility
Better configuration control, partial reconfiguration
Robust configuration cell, SEU tolerant like 130 nm
32-bit arithmetic
48-bit adders and synchronous loadable counters
Advanced Clocking
Proper clocking is extremely important for performance and reliability
Most designs need many global clock lines with minimal clock delay and clock skew
Digital Clock Manager (DCM) provides:
Four-phase outputs
Frequency multiplication and division
Fine phase adjustment
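The arithmetic behind frequency synthesis and four-phase outputs is simple enough to sketch. The function and parameter names below (`multiply`, `divide`) are hypothetical; the code does not model the legal attribute ranges or lock behavior of a real DCM.

```python
from fractions import Fraction


def synthesized_frequency_mhz(f_in_mhz: int, multiply: int, divide: int) -> Fraction:
    """Output frequency for f_out = f_in * multiply / divide (exact arithmetic)."""
    if multiply <= 0 or divide <= 0:
        raise ValueError("multiply and divide must be positive")
    return Fraction(f_in_mhz) * multiply / divide


def four_phase_delays_ns(f_mhz: Fraction) -> dict:
    """Delay of the 0, 90, 180 and 270 degree outputs relative to the input edge."""
    period_ns = Fraction(1000) / f_mhz
    return {degrees: period_ns * degrees / 360 for degrees in (0, 90, 180, 270)}


# Example with assumed values: a 100 MHz input multiplied by 7 and divided by 2.
F_OUT = synthesized_frequency_mhz(100, 7, 2)
PHASES_AT_100_MHZ = four_phase_delays_ns(Fraction(100))
```

At 100 MHz the period is 10 ns, so the four phase outputs sit 2.5 ns apart.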
Advanced I/O
>50 different output standards (strength, voltage, input threshold, etc.)
Multiple parallel output transistors which are either fully on or fully off
System Synchronous
System-synchronous: the clock arrives simultaneously at all chips
Typically used below 200 MHz clock rate
On-chip clock-distribution DCM: zero clock delay controls set-up time and avoids hold-time requirements
The traditional design methodology
Source Synchronous
Each data bus has its own clock trace
Typically used at 200 to 800 MHz clock rate
On-chip clock-distribution DCM centers the clock in the data eye
Adds more unidirectional-only clock lines
The only way above 300 MHz
32b @ 78 MHz
Virtex-II Pro
Virtex-4
Transmitter: TX Clock Generator, 16X/20X Multiplier, Serial Out
Receiver: RX Clock Generator, Comma Detect and Word Alignment, Serial In
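Comma detection and word alignment on the receive side can be illustrated on a bit string. The sketch assumes 8b/10b coding, which the slides do not name: its 7-bit comma sequences are 0011111 and 1100000, and words are 10 bits long. The function names are hypothetical and the model ignores clock recovery entirely.

```python
from typing import List

# 7-bit comma sequences of the 8b/10b code (assumed encoding).
COMMA_PATTERNS = ("0011111", "1100000")


def find_comma_offset(bits: str) -> int:
    """Index of the first comma sequence in the bit string, or -1 if none."""
    hits = [index for index in (bits.find(p) for p in COMMA_PATTERNS) if index >= 0]
    return min(hits) if hits else -1


def align_words(bits: str, word_len: int = 10) -> List[str]:
    """Cut the stream into word_len-bit words, starting at the first comma.

    Bits before the comma and any incomplete trailing word are dropped.
    """
    start = find_comma_offset(bits)
    if start < 0:
        return []
    aligned = bits[start:]
    usable = len(aligned) - len(aligned) % word_len
    return [aligned[i:i + word_len] for i in range(0, usable, word_len)]


# Illustrative stream: 3 stray bits, a comma character, one payload word, 2 stray bits.
COMMA_CHAR = "0011111010"     # K28.5 with negative running disparity
PAYLOAD_WORD = "1010101010"   # arbitrary example payload
STREAM = "101" + COMMA_CHAR + PAYLOAD_WORD + "01"
```

The receiver does not know where a word starts in the serial stream; once the comma is found, every following group of ten bits is a word.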
Virtex-4 Capabilities
Any type of design runs at >400 MHz
Pipelining provides extra performance for free
Synchronous is best, but 32 clocks are available
Gigabit serial saves pins and board area
On-chip termination for board signal integrity
I/O features support double-data-rate operation and source-synchronous design
Virtex-4 Capabilities
Popular functions are hard-wired for lower cost, higher performance, and ease of use: microprocessors, FIFOs, serial I/O, clock management, etc.
2004 Challenges
Technology moves rapidly: 130, 90, 65 nm
Multiple Vcc, lower voltage, higher current
Lower Vcc makes decoupling very critical