Escolar Documentos
Profissional Documentos
Cultura Documentos
LUT
01
Latch
10 11
RAM
.....
Logic Replacement
Low
design cost and effort Low volume applications Often replaced with ASIC as volume increases
Algorithmic computation
Offloads
a general purpose processor Used for multiple algorithms ASIC replacement not expected
Custom operations/data types custom operations/data types Flexible flow control - control flow based on arbitrary state machines Local state access - local state elements allows parallel state access Fine grain parallelism - replicated logic permits easy parallelism Custom communication - explicit direct inter-module communication
Better power efficiency more activity directly applied to computation Better area efficiency more area directly applied to computation
Intel QuickAssist QPI-based FPGA Accelerator Accelerator Hardware Module (AHM) Platform (QAP)
P
QPI Links
QPI Links
CS
I/O I/O I/O
CS
I/O
FPGA
Xilinx IP
Equivalent Gate Count Frequency (MHz) Steady State (Cycles/ Block) Data rate (Mbps) 297,409 145.3 660
CatapultC
596,730 91.2 2073
Bluespec
267,741 108.5 276
392.8
89.7
701.3
Lower is better
Higher is better
BORPH
7
Goal:
Software
socket IPC
Master-Slave Relationship
Hardware
coprocessor
6.888 Spring 2013 - Sanchez and Emer 6.888 Spring 2013 - Sanchez and Emer L16 L16
BORPH Layers
User Process (SW) User Process (SW) User Process (SW)
Peer-to-Peer Relationship
User Library Software file pipe BORPH Kernel Device Driver IPC ioreg virtual file socket
Hardware
6.888 Spring 2013 - Sanchez and Emer 6.888 Spring 2013 - Sanchez and Emer L16 L16
SW
SW
SW
Software
file
pipe
User Library
IPC ioreg
socket
Hardware
SW
SW
SW
file
pipe
User Library
IPC ioreg
socket
Hardware
Unit of management Created when a BORPH Object File (BOF) file is exec-ed
11
Very easy for user to reason about Enable FPGA designs to become active component of the system
e.g.
an FIR filter: Conventional: a passive entity where software sends/receives data BORPH: an active entity that pulls/pushes data as needed
Enable multiple instances of the same FPGA design running in the system
No
more fixed accelerator concept Works well in true reconfigurable computing systems
HW Processes I/O
Similar to SW
SW SW SW
Software
file
pipe
User Library
IPC ioreg
socket
Hardware
13
Maps user defined hardware constructs as virtual files under the processs / proc/<pid>/hw/ioreg/ directory
Example:
ioreg information embedded in the executing BOF file read and write system calls translated to message packet by the kernel
Shell: echo 1 > /proc/123/hw/ioreg/enable C: MEM_FILE = fopen(/proc/123/hw/ioreg/MyMemory, r); fread(swbuf, 1, MEM_SIZE, MEM_FILE); Python, Java, etc
Access to the general file system from hardware processes Debug by printing
printf
A/D
Baseband Process
Upper Layer
Decode video.in
Resize
Edge Detect
Encode video.out
Fetch
Decode
Exe
Functional Partition
Changing the timing behavior of a module does not affect functional correctness of the program Improved modularity Simplified design-space exploration 16
FPGA
Control Partition Control
Timing Partition Fetch
Decode
Exe
FPGA1
Fetch
Decode
Exe
Functional Partition
17
Fetch
Decode
Exe
Functional Partition
It may not be safe to modify some of them Reasoning about cycle accuracy is difficult But the programmer knows about the LI property
18
Programmer needs to differentiate LI channels from normal FIFOs Latency-Insensitive Send/Recv endpoints
Implementation chosen by compiler FIFO order Guaranteed delivery Unspecified buffering & unspecified latency Programmer responsible for correct annotation
mkTimeP RTL
mkFuncP RTL
module mkFuncP; Recv#(Inst) recv <- mkRecv(Decode); endmodule
19
mkC_Stub
Key:
User Spring 2013 Library Generated 6.888 - Sanchez and Emer L16