Você está na página 1de 20

LECTURE 16 FINE-GRAINED RECONFIGURABLE COMPUTING: FPGAS JOEL EMER AND DANIEL SANCHEZ

6.888 PARALLEL AND HETEROGENEOUS COMPUTER ARCHITECTURE SPRING 2013

Field Programmable Gate Arrays (FPGA)


And 00 0 0 0 1 00 01 10 11 Or 0 0 1 1

LUT

01

Latch

10 11

RAM

.....

Evolution of FPGA applications


3

Logic Replacement
Low

design cost and effort Low volume applications Often replaced with ASIC as volume increases

Algorithmic computation
Offloads

a general purpose processor Used for multiple algorithms ASIC replacement not expected

6.888 Spring 2013 - Sanchez and Emer L16

Benefits of FPGA computation


4

Custom operations/data types custom operations/data types Flexible flow control - control flow based on arbitrary state machines Local state access - local state elements allows parallel state access Fine grain parallelism - replicated logic permits easy parallelism Custom communication - explicit direct inter-module communication

Reduced memory references more direct reuse of data

Better power efficiency more activity directly applied to computation Better area efficiency more area directly applied to computation

6.888 Spring 2013 - Sanchez and Emer L16

QPI-attached FPGA platform


Four Socket QuickAssist Platform Topology
QuickAssist FPGA

Intel QuickAssist QPI-based FPGA Accelerator Accelerator Hardware Module (AHM) Platform (QAP)

P
QPI Links

QPI Links

M Xilinx Virtex 6 Module

CS
I/O I/O I/O

CS
I/O

FPGA

PCIe attached FPGA


Intel Xeon processor 7000 series
6.888 Spring 2013 - Sanchez and Emer L16

Altera Stratix IV Module

Reed Solomon Results


WiMAX requirement is to support a throughput of 134Mbps

Xilinx IP
Equivalent Gate Count Frequency (MHz) Steady State (Cycles/ Block) Data rate (Mbps) 297,409 145.3 660

CatapultC
596,730 91.2 2073

Bluespec
267,741 108.5 276

392.8

89.7

701.3

Lower is better

Higher is better

Source: MIT, Abhinav Agarwal, Alfred Ng CSG

BORPH
7

Berkeley Operating system for ReProgrammable Hardware OS for reconfigurable computers


Treats

reconfigurable hardware as computational resources

UNIX interface to HW designs


Familiar

to both software and hardware engineers Design language independent

Goal:

Make FPGA-based reconfigurable computers easy to use


6.888 Spring 2013 - Sanchez and Emer L16

Conventional View of FPGA Systems


8
User Process (SW) User Process (SW) User Process (SW)

Software

pipe User Library file OS Kernel Device Driver

socket IPC

Master-Slave Relationship

Hardware

Hardware Platform (Network, UART, HD)


FPGA

coprocessor

6.888 Spring 2013 - Sanchez and Emer 6.888 Spring 2013 - Sanchez and Emer L16 L16

BORPH Layers
User Process (SW) User Process (SW) User Process (SW)

Peer-to-Peer Relationship

User Library Software file pipe BORPH Kernel Device Driver IPC ioreg virtual file socket

Hardware

Hardware Platform (Network, UART, HD)

Hardware User Library


User Process (HW) User Process (HW)

6.888 Spring 2013 - Sanchez and Emer 6.888 Spring 2013 - Sanchez and Emer L16 L16

Overview of BORPH Concepts


Hardware process Hardware syscall interface Interacting with an FPGA


SW

SW

SW

Software

file

pipe

User Library

IPC ioreg

socket

Hardware

ioreg virtual file interface Hardware file I/O

BORPH Kernel Device Driver

Hardware Platform (Network, UART, HD)

Hardware User Library


HW HW

6.888 Spring 2013 - Sanchez and Emer L16

Hardware Process (1)

An executing instance of a hardware design

SW: An executing instance of a program


Software

SW

SW

SW

Normal UNIX process

file

pipe

User Library

IPC ioreg

socket

Has pid, check status with ps, kill, etc

BORPH Kernel Device Driver

Hardware

Unit of management Created when a BORPH Object File (BOF) file is exec-ed

Hardware Platform (Network, UART, HD)

Hardware User Library


HW HW

Kernel selects and configure hardware region automatically

6.888 Spring 2013 - Sanchez and Emer L16

11

Benefits of UNIX Process Model


12

Very easy for user to reason about Enable FPGA designs to become active component of the system
e.g.

an FIR filter: Conventional: a passive entity where software sends/receives data BORPH: an active entity that pulls/pushes data as needed

Enable multiple instances of the same FPGA design running in the system
No

more fixed accelerator concept Works well in true reconfigurable computing systems

6.888 Spring 2013 - Sanchez and Emer L16

HW Processes I/O

I/O managed by kernel

Similar to SW
SW SW SW

Hide details from users

Software

e.g. HW-SW, HW-HW UNIX file pipe

file

pipe

User Library

IPC ioreg

socket

File I/O, pipe, signal ioreg virtual file system

Hardware

Standard UNIX I/O mechanism HW specific service

BORPH Kernel Device Driver

Hardware Platform (Network, UART, HD)

Hardware User Library


HW HW

Dont ask How do I in HW. Think: What if it were SW?


6.888 Spring 2013 - Sanchez and Emer L16

13

ioreg Virtual File System


14

Maps user defined hardware constructs as virtual files under the processs / proc/<pid>/hw/ioreg/ directory

Single word register Memory: On-chip + Off-chip FIFO /proc/123/hw/ioreg/COUNTERVAL

Example:

ioreg information embedded in the executing BOF file read and write system calls translated to message packet by the kernel

Any UNIX program can communicate with hardware processes


n n n n n

Shell: echo 1 > /proc/123/hw/ioreg/enable C: MEM_FILE = fopen(/proc/123/hw/ioreg/MyMemory, r); fread(swbuf, 1, MEM_SIZE, MEM_FILE); Python, Java, etc

6.888 Spring 2013 - Sanchez and Emer L16

Hardware File I/O


15

Access to the general file system from hardware processes Debug by printing

printf

Read test vectors, record output


Analog Frontend

A/D

Baseband Process

Upper Layer

SW/HW processes chained by file pipe


bash$ receiver.bof < file.in > file.out

Decode video.in

Resize

Edge Detect

Encode video.out

bash$ decode video.in | resize | edgdet.bof | encode > video.out

6.888 Spring 2013 - Sanchez and Emer L16

Latency-Insensitive Design: A Higher Semantic


FPGA
Control Partition Control Timing Partition Fetch Decode Exe

Fetch

Decode

Exe

Functional Partition

Inter-module communication by latency insensitive channels

Changing the timing behavior of a module does not affect functional correctness of the program Improved modularity Simplified design-space exploration 16

Many HW designs use this methodology


Implemented with guarded FIFOs in current RTLs

Latency-Insensitive Design: A Higher Semantic


FPGA0

FPGA
Control Partition Control Timing Partition Fetch Decode Exe

FPGA1

Fetch

Decode

Exe

Functional Partition

Behavior of LI channels does not affect functional correctness.

17

Latency-Insensitive Design: A Higher Semantic


FPGA
Control Partition Control Timing Partition Fetch Decode Exe

Fetch

Decode

Exe

Functional Partition

There are many FIFOs in the design

It may not be safe to modify some of them Reasoning about cycle accuracy is difficult But the programmer knows about the LI property
18

Compilers see only wires and registers

A Syntax for LI Design

Programmer needs to differentiate LI channels from normal FIFOs Latency-Insensitive Send/Recv endpoints

module mkTimeP; Send#(Inst) send <- mkSend(Decode); endmodule

Implementation chosen by compiler FIFO order Guaranteed delivery Unspecified buffering & unspecified latency Programmer responsible for correct annotation

mkTimeP RTL

Explicit programmer contract


mkFuncP RTL
module mkFuncP; Recv#(Inst) recv <- mkRecv(Decode); endmodule

Easy to use often a textual substitution!

19

Connected User Application


20
Pla8orm_0 Pla8orm_0 Comms mkApplicaAon Pla8orm_1 Pla8orm_1 Comms mkApplicaAon_Stub mkA mkA_Stub mkB_Stub mkC mkB

mkC_Stub

Key:

User Spring 2013 Library Generated 6.888 - Sanchez and Emer L16

Você também pode gostar