Garnet2.0: A Detailed On-Chip Network Model Inside a Full-System Simulator
Tushar Krishna
Assistant Professor, School of ECE and CS
Georgia Institute of Technology
tushar@ece.gatech.edu
gem5 workshop, ARM Research Summit
September 11, 2017
http://synergy.ece.gatech.edu/tools/garnet

Overview
- What are Networks-On-Chip (NoCs)?
- Why model NoCs accurately
- Garnet2.0
  - Configuration
    - Topology
    - Routing
    - Flow-Control
    - Router Microarchitecture
  - System Integration
    - Network Interface
    - Network Parameters
    - Running Garnet with Ruby
    - Ruby Garnet Standalone
  - Output Stats
- Extensions and FAQs
Garnet2.0 Tutorial | ARM Research Summit 2017 Tushar Krishna | Georgia Institute of Technology September 11, 2017 2
Networks-on-Chip

Introduction to NoCs

[Figure: six cores, each with an L1 cache (Core + L1$), connected through an on-chip network to six L2 cache banks (L2$). A load, LD (A) R1, generates request (Req), forward (Fwd), and response (Rsp) messages that cross the network.]

Modern NoCs

[Figure: a “tile” containing a core, L1 D$ and L1 I$, an L2$, an L3$/directory, a network interface, and a router.]

- The network interface converts cache messages (ctrl or data) into packets.
- Packets get broken down into one or more flits, depending on the NoC link width.

Network Architecture
- Topology
- Routing
- Flow Control
- Router Microarchitecture

Topology: how to connect nodes with links
(analogy: the road network)

Routing: which path a message should take
(analogy: the series of road segments from source to destination)

Flow Control: when a message stops or proceeds
(analogy: traffic signals and stop signs at the end of each road segment)

Router Microarchitecture: how to build the routers
(analogy: the design of a traffic intersection, i.e. the number of lanes and the algorithm for turning lights red or green)

Why model NoCs accurately?

[Figure, left panel: “Case Study I: Directory vs. Broadcast Protocols”. Normalized runtime (y-axis, 0 to 1.6) of Full-state Directory, HyperTransport, and Token Coherence on three NoCs: Baseline NoC (Intel SCC), FANOUT + FANIN NoC (Krishna et al, MICRO 2011), and SMART NoC (Krishna et al, HPCA 2013). 64-core CMP with different NoC microarchitectures.]

[Figure, right panel: “Case Study II: Private vs. Shared L2”. Normalized runtime (y-axis, 0 to 1.2) of Shared L2 and Private L2 on two NoCs: Baseline NoC (Intel SCC) and SMART NoC (Krishna et al, HPCA 2013). 64-core CMP with different NoC microarchitectures.]

Different NoC microarchitectures may lead to different microarchitectural decisions and new design optimization opportunities.

64-core CMP running PARSEC workloads in full-system gem5. Average runtime plotted.

Garnet2.0
- Detailed NoC model
  - Currently part of the Ruby memory system in gem5
- Original version (5-stage pipeline) released in 2009
  - Developed by Niket Agarwal (currently at Google) and myself
- New version (1-stage pipeline, more configurability) released in 2016
- Resources
  - Source: src/mem/ruby/network/garnet2.0
  - gem5 wiki page: www.gem5.org/garnet2.0
  - Dev patches + practice labs: http://synergy.ece.gatech.edu/garnet

Topology
- Each topology is a Python file in configs/topologies/
- Step 1: Instantiate routers and connect them to controllers via “Ext Links” (bi-directional)
- Step 2: Instantiate “Int Links” (uni-directional) and connect the routers in the desired topology

[Figure: an example of MeshDirCorners_XY.py. A mesh of routers (R) joined by IntLinks; each router attaches through ExtLinks to its controllers (Core + L1$, L2$, and, at some routers, Dir or DMA).]

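The two steps above can be sketched in plain Python. This is a minimal, self-contained illustration and not gem5's actual topology API: the `Router`, `ExtLink`, and `IntLink` dataclasses and the `make_mesh` helper are hypothetical stand-ins that only borrow parameter names used in this tutorial (`latency`, `weight`, `src_outport`, `dst_inport`). The weights (1 on X links, 2 on Y links) are an assumption taken from the Mesh_XY weight figure in this deck.

```python
from dataclasses import dataclass


@dataclass
class Router:            # hypothetical stand-in, not the gem5 class
    router_id: int
    latency: int = 1     # router latency in cycles


@dataclass
class ExtLink:           # bi-directional: controller <-> router
    link_id: int
    ext_node: str
    int_node: Router
    latency: int = 1


@dataclass
class IntLink:           # uni-directional: router -> router
    link_id: int
    src_node: Router
    dst_node: Router
    src_outport: str
    dst_inport: str
    latency: int = 1
    weight: int = 1


def make_mesh(rows, cols, controllers):
    """Build a rows x cols mesh and attach controllers round-robin."""
    # Step 1: instantiate routers and connect controllers via ExtLinks.
    routers = [Router(i) for i in range(rows * cols)]
    ext_links = [ExtLink(i, name, routers[i % len(routers)])
                 for i, name in enumerate(controllers)]

    # Step 2: instantiate IntLinks, one per direction between neighbors.
    int_links = []

    def connect(src, dst, outport, inport, weight):
        link_id = len(ext_links) + len(int_links)
        int_links.append(IntLink(link_id, routers[src], routers[dst],
                                 outport, inport, weight=weight))

    for r in range(rows):
        for c in range(cols):
            rid = r * cols + c
            if c + 1 < cols:   # X-direction links get weight 1
                connect(rid, rid + 1, "East", "West", 1)
                connect(rid + 1, rid, "West", "East", 1)
            if r + 1 < rows:   # Y-direction links get weight 2
                connect(rid, rid + cols, "North", "South", 2)
                connect(rid + cols, rid, "South", "North", 2)
    return routers, ext_links, int_links
```

For example, `make_mesh(2, 2, ["L1_0", "L1_1", "L1_2", "L1_3"])` yields 4 routers, 4 ExtLinks, and 8 uni-directional IntLinks.
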
Topology Configurable Parameters
- Router
  - router_latency (in cycles)
    - Can be set per router
    - Defined in src/mem/ruby/network/BasicRouter.py
- Link
  - link_latency (in cycles)
    - Can be set per link
    - Defined in src/mem/ruby/network/BasicLink.py
  - weight (i.e., link weight)
    - To bias the routing algorithm [later slides]
  - src_outport (string) and dst_inport (string)
    - Port direction (e.g., “East”)
    - Helps with readability of the config file and with adaptive routing algorithms
  - bw_multiplier (value)
    - Used by Ruby’s simple network model, NOT by Garnet
    - Link bandwidth is set inside Garnet via the ni_flit_size parameter [later slides]

Default Topologies Supported
- Pt2Pt
- Crossbar
- Mesh
  - Mesh_XY
  - Mesh_westfirst
  - MeshDirCorners_XY
- Cluster

Routing
- Routing table
  - Automatically populated based on the topology
  - All messages use the shortest path
  - In case of multiple options, the path with the smaller weight is chosen
    - Deterministic routing
- Custom
  - Users can leverage the outport/inport direction names associated with each port to implement custom algorithms (say, adaptive)

Deadlock Avoidance
- Deadlock: a condition in which a set of agents wait indefinitely trying to acquire a set of resources

[Figure: four routers 0, 1, 2, 3 in a ring with buffers x, u, v, w respectively, occupied by packets D, A, B, C.]

- Packet A holds buffer u (in 1) and wants buffer v (in 2)
- Packet B holds buffer v (in 2) and wants buffer w (in 3)
- Packet C holds buffer w (in 3) and wants buffer x (in 0)
- Packet D holds buffer x (in 0) and wants buffer u (in 1)

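The circular wait in this example can be checked mechanically by following the "holds, wants" relation until a buffer repeats. The sketch below is only an illustration of that idea, assuming each held buffer waits on at most one other buffer; `find_cycle` is a hypothetical helper and is not part of Garnet.

```python
def find_cycle(wants):
    """Return a list of buffers forming a circular wait, or None.

    `wants` maps each held buffer to the buffer its packet is waiting for.
    """
    for start in wants:
        seen, cur = [], start
        # Follow the wait-for chain until it ends or revisits a buffer.
        while cur in wants and cur not in seen:
            seen.append(cur)
            cur = wants[cur]
        if cur in seen:
            return seen[seen.index(cur):]
    return None


# Packets A, B, C, D from the example: held buffer -> wanted buffer.
ring = {"u": "v", "v": "w", "w": "x", "x": "u"}
deadlocked = find_cycle(ring)
```

Removing any one dependency (for instance, if the packet holding `x` could drain without needing `u`) breaks the cycle, which is the intuition behind removing turns in the routing algorithms that follow.
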
Deadlock-free Routing Algorithms
- Deadlocks may occur if the turns taken form a cycle
- Removing some turns can make the routing algorithm deadlock free

[Figure: the turns permitted under the West-First turn model and under the XY model.]

Deadlock-free Routing Algorithms
- Assign weights to bias which links are used first (to ensure no cyclic dependence)
- XY Routing: always go X first, then Y

[Figure: Mesh_XY. A mesh in which the X-direction links carry weight 1 and the Y-direction links carry weight 2, with sources SA, SB, SC and destinations DA, DB, DC marked.]

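How the weights produce XY behavior can be sketched as follows. This is a simplified model, not Garnet's routing code: it assumes routers are addressed by (x, y) mesh coordinates, considers only outports that move a packet closer to its destination, and picks the one with the smallest weight. The helper names and the coordinate convention (East increases x, North increases y) are assumptions made for this example.

```python
# Assumed Mesh_XY weights: X links weigh 1, Y links weigh 2.
XY_WEIGHTS = {"East": 1, "West": 1, "North": 2, "South": 2}
STEP = {"East": (1, 0), "West": (-1, 0), "North": (0, 1), "South": (0, -1)}


def productive_outports(cur, dst):
    """Outports that reduce the distance from cur to dst."""
    (x, y), (dx, dy) = cur, dst
    ports = []
    if dx > x:
        ports.append("East")
    if dx < x:
        ports.append("West")
    if dy > y:
        ports.append("North")
    if dy < y:
        ports.append("South")
    return ports


def route(src, dst, weights):
    """List of outports taken from src to dst, lowest weight first."""
    path, cur = [], src
    while cur != dst:
        port = min(productive_outports(cur, dst), key=weights.__getitem__)
        path.append(port)
        cur = (cur[0] + STEP[port][0], cur[1] + STEP[port][1])
    return path


path = route((0, 0), (2, 1), XY_WEIGHTS)
```

Because every X link is cheaper than every Y link, the packet exhausts its X hops before turning, so no Y-to-X turn is ever taken.
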
Deadlock-free Routing Algorithms
- Assign weights to bias which links are used first (to ensure no cyclic dependence)
- West-first Routing: go W first, then N/S

[Figure: Mesh_westfirst. The same mesh in which one horizontal direction carries weight 1 and all other links carry weight 2, with sources SA, SB, SC and destinations DA, DB, DC marked.]

Flow Control
- Virtual Channels
  - The coherence protocol requires a certain number of virtual networks (vnets) / message classes to avoid protocol deadlocks
    - This is the minimum number of VCs required
  - Within each vnet, there can be more than one VC to boost network performance
  - In Garnet, only one packet can use a VC inside a router at a time
    - VCs in vnets carrying control messages are 1 flit deep
    - VCs in vnets carrying data (cacheline) messages are 4-5 flits deep
- Credits
  - Each VC conveys its buffer availability by sending credits to its upstream router

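The credit mechanism can be illustrated with a small model. This is a sketch of generic credit-based flow control under simplifying assumptions (one VC, zero link delay); the `DownstreamVC` and `UpstreamPort` classes are hypothetical and do not mirror Garnet's classes.

```python
from collections import deque


class DownstreamVC:
    """A VC buffer at the receiving router with a fixed depth in flits."""

    def __init__(self, depth):
        self.depth = depth
        self.buffer = deque()

    def receive(self, flit):
        assert len(self.buffer) < self.depth, "buffer overflow"
        self.buffer.append(flit)

    def drain_one(self):
        """Forward one flit onward; True means a credit goes upstream."""
        if self.buffer:
            self.buffer.popleft()
            return True
        return False


class UpstreamPort:
    """Sender side: tracks how many free slots the downstream VC has."""

    def __init__(self, downstream):
        self.downstream = downstream
        self.credits = downstream.depth  # starts with a full credit count

    def try_send(self, flit):
        if self.credits == 0:
            return False          # stall: no buffer space downstream
        self.credits -= 1
        self.downstream.receive(flit)
        return True

    def receive_credit(self):
        self.credits += 1
```

With a 4-flit-deep VC, four flits can be sent back to back; the fifth stalls until the downstream router drains a flit and returns a credit.
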
Conventional VC Router Microarch

[Figure: a virtual-channel router. Incoming flits are written into input buffers organized as VCs (VC 0 ... VC n) grouped by virtual network (VN 0, VN 1); Route Compute, the VC Allocator, and the SW Allocator control a crossbar switch that connects the input buffers to the output ports.]

Pipeline: BW, RC, VA, SA, BR, ST, LT
- BW: Buffer Write
- RC: Route Compute
- VA: VC Allocation
  - Input VCs arbitrate for “output” VCs (input VCs at the next router)
- SA: Switch Allocation
  - Input ports arbitrate for output ports
- BR: Buffer Read
- ST: Switch Traversal
- LT: Link Traversal

Single-Cycle Router Implementation

Router.cc → wakeup() triggers four steps:
1. InputUnit.cc → wakeup()
   - Receives flits from NetworkLink.cc → wakeup() into VirtualChannel.cc buffers (VN0: VC 0, VC 1; VN1: VC 2, VC 3)
   - Buffer Write
   - Route Compute
   - For a multi-cycle router, add delay in the flit before it can do SA
2. OutputUnit.cc → wakeup()
   - Receive credit (from CreditLink.cc)
   - Update state (OutVCState.cc)
3. SwitchAllocator.cc → wakeup()
   - Arbitrate inports
   - Arbitrate outports
   - VC Allocate: has_free_vc(), select_free_vc()
   - Buffer Read (send flit to switch)
   - Send credit over CreditLink.cc (schedule creditlink wakeup next cycle)
4. CrossbarSwitch.cc → wakeup()
   - Switch Traversal: push winner flits on the link queue (NetworkLink.cc)
   - Schedule output link wakeup for next cycle

System Organization

[Figure: inside the Ruby memory system, the coherence protocol connects to GarnetNetwork.cc. The interface between them is the “MessageBuffers” from Ruby. GarnetNetwork.cc contains the network interfaces, routers, network links, and credit links, and collects stats.]

Network Interface

[Figure: each controller (Core + L1$, L2$, Dir, DMA) attaches to its router (R) through a network interface (NI). Inside the NI (NetworkInterface.cc → wakeup()), a <VN, msg> pair from the cache controller is flitisized and a VC is selected (VN0: VC 0, VC 1; VN1: VC 2, VC 3) before flits go to the router; credits flow back, and ingress and egress paths are shown.]

- Msg_size decides the number of flits
- Additional info can be added in the flit

Garnet Configurable Parameters
- ni_flit_size
  - Default = 16B (128b) → 1-flit ctrl, 5-flit data
  - This sets the bandwidth of each physical link
- vcs_per_vnet
  - Total VCs in each inport = num_vnets * vcs_per_vnet
- buffers_per_data_vc
  - Default = 4
- buffers_per_ctrl_vc
  - Default = 1
- routing_algorithm
  - Weight-based table or custom
- Defined in: src/mem/ruby/network/garnet2.0/GarnetNetwork.py

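The arithmetic behind "1-flit ctrl, 5-flit data" and the per-inport VC count can be written out as a short sketch. The message sizes used here (8 B for a control message, 72 B for a data message, i.e. a 64 B cache line plus an 8 B header) are assumptions chosen to be consistent with the defaults above; the function names are hypothetical.

```python
import math

CTRL_MSG_BYTES = 8    # assumed control message size
DATA_MSG_BYTES = 72   # assumed: 64 B cache line + 8 B header


def flits_per_message(msg_bytes, ni_flit_size=16):
    """Number of flits needed to carry a message of msg_bytes."""
    return math.ceil(msg_bytes / ni_flit_size)


def vcs_per_inport(num_vnets, vcs_per_vnet):
    """Total VCs in each inport = num_vnets * vcs_per_vnet."""
    return num_vnets * vcs_per_vnet


ctrl_flits = flits_per_message(CTRL_MSG_BYTES)   # 16 B flits
data_flits = flits_per_message(DATA_MSG_BYTES)   # 16 B flits
```

Under these assumptions, halving ni_flit_size to 8 B would keep control messages at one flit but stretch a data message to nine flits, which is how the parameter trades link width against serialization latency.
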
Command Line Parameters

Definitions and default values, in src/mem/ruby/network/:
- BasicRouter.py
- BasicLink.py
- Network.py
- garnet2.0/GarnetNetwork.py

Command-line options, in configs/:
- ruby/Ruby.py
- network/Network.py
- common/Options.py
- example/garnet_synth_traffic.py

- These override the default values in Ruby
- All parameters in these .py files can be specified from the command line

Running Garnet with Ruby
- Build the Ruby coherence protocol
    scons build/X86_MOESI_hammer/gem5.opt PROTOCOL=MOESI_hammer
  - The protocol determines the number of message classes/virtual networks required
- Invoke garnet2.0 from the command line with appropriate network parameters
    ./build/X86_MOESI_hammer/gem5.opt configs/example/fs.py \
        --network=garnet2.0 --topology=Mesh_XY ...

Running Garnet Standalone
- Build the Garnet_standalone protocol
    scons build/NULL/gem5.debug PROTOCOL=Garnet_standalone
  - Dummy protocol just for traffic injection via the Garnet Synthetic Traffic tester (next slide)
  - 3 virtual networks: vnet 0 and vnet 1 inject ctrl (1-flit) packets, vnet 2 injects data (5-flit) packets
- Run user-specified synthetic traffic
    ./build/NULL/gem5.debug \
        configs/example/garnet_synth_traffic.py \
        --network=garnet2.0 --topology=... \
        --synthetic=uniform_random \
        --injectionrate=0.01 \
        --num-packets-max=100

Garnet Synthetic Traffic
- Dummy CPU model [only works with the Garnet_standalone protocol]
  - src/cpu/testers/garnet_synthetic_traffic
- Injects packets for a user-specified traffic pattern at a user-specified injection rate
  - uniform_random
  - tornado
  - bit_complement
  - bit_reverse
  - bit_rotation
  - neighbor
  - shuffle
  - transpose
- Ability to inject continuously or a fixed number of packets, and/or for a fixed time, and/or from a fixed source and/or to a fixed destination in one or more vnets
  - Heavy customization is very useful for debugging
- All parameters are described here: http://www.gem5.org/Garnet_Synthetic_Traffic

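To make a few of the pattern names concrete, the sketch below computes destinations using the common textbook definitions of bit_complement, bit_reverse, and transpose. It assumes a power-of-two node count and, for transpose, a square mesh with row-major node numbering; the tester in gem5 may define these patterns differently, so treat this as an illustration rather than a description of its implementation.

```python
def bit_complement(src, num_nodes):
    """Destination is the bitwise complement of the source id."""
    return ~src & (num_nodes - 1)      # num_nodes must be a power of two


def bit_reverse(src, num_nodes):
    """Destination is the source id with its bit order reversed."""
    bits = num_nodes.bit_length() - 1  # log2 for a power-of-two count
    return int(format(src, "0{}b".format(bits))[::-1], 2)


def transpose(src, cols):
    """Node (x, y) sends to node (y, x) on a square cols x cols mesh."""
    x, y = src % cols, src // cols
    return x * cols + y
```

On a 16-node (4x4) network, node 5 sends to node 10 under bit_complement, node 1 sends to node 8 under bit_reverse, and node 1 sends to node 4 under transpose.
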
Sample Simulation
    ./build/NULL/gem5.debug configs/example/garnet_synth_traffic.py \
        --topology=Mesh_XY --num-cpus=16 --num-dirs=16 --mesh-rows=4

The resulting configuration is written to m5out/config.ini.

Output Stats
- Written to m5out/stats.txt
- Packet and flit stats (#injected, #received, queueing latency at NIC, and network latency) per VN and overall
- More stats can be added via GarnetNetwork.cc

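A small script can pull the network-related lines out of stats.txt for plotting or comparison. The sketch below assumes the usual stats.txt layout of a name, a value, and an optional `#` comment on each line; the stat names in `SAMPLE` are made up for illustration and should be replaced with the names that actually appear in your own m5out/stats.txt.

```python
# Hypothetical excerpt; real stat names may differ.
SAMPLE = """---------- Begin Simulation Statistics ----------
sim_ticks 1000 # hypothetical line
system.ruby.network.packets_injected::total 100 # hypothetical line
system.ruby.network.average_packet_latency 12.5 # hypothetical line
"""


def parse_stats(text, keyword="network"):
    """Return {stat_name: value} for lines whose name contains keyword."""
    stats = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop trailing comment
        if len(fields) >= 2 and keyword in fields[0]:
            try:
                stats[fields[0]] = float(fields[1])
            except ValueError:
                pass                             # skip non-numeric values
    return stats


network_stats = parse_stats(SAMPLE)
```

To use it on a real run, read the file first, for example `parse_stats(open("m5out/stats.txt").read())`.
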
Extensions and FAQs
- System Level Modeling
  - How can I integrate Garnet into my own simulator (or with the gem5 classic memory system)?
    - This should not be too hard. It would just require some changes to the NI code on how it receives its inputs. If anyone wants to try it, I would love to give pointers.
  - How do I print a network trace?
    - You can add some code to the NI. I have a patch on my website for reference.
  - Can Garnet read a network trace?
    - You can run Garnet in standalone mode and have it inject traffic from a trace instead of a fixed synthetic pattern. Alternatively, try cpu/testers/traffic_gen.
  - Does Garnet report NoC area and power numbers?
    - The output of Garnet can be fed into DSENT (Sun et al, NOCS 2012), which is present in ext/dsent. We will add an automatic stats.txt parser for it soon.
  - Can we model multiple clock domains?
    - The entire Ruby memory system (including Garnet inside it) is one clock domain. To mimic a multi-clock-domain design, you can schedule wakeups of slow routers at some multiple of the clock rather than every cycle.

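The multi-clock-domain suggestion amounts to rescheduling a slow router every N cycles instead of every cycle. The toy event loop below illustrates that scheduling idea only; it is a standalone sketch with hypothetical names and does not use gem5's event queue.

```python
import heapq


def simulate(routers, end_cycle):
    """Log wakeup cycles for each router.

    `routers` maps a router name to its clock divider: 1 wakes every
    cycle, 2 wakes every other cycle, and so on.
    """
    events = [(0, name) for name in sorted(routers)]
    heapq.heapify(events)
    log = {name: [] for name in routers}
    while events:
        cycle, name = heapq.heappop(events)
        if cycle >= end_cycle:
            continue                  # past the end: do not reschedule
        log[name].append(cycle)       # the router's wakeup work goes here
        heapq.heappush(events, (cycle + routers[name], name))
    return log


wakeups = simulate({"fast_router": 1, "slow_router": 2}, end_cycle=6)
```

Here the slow router wakes on cycles 0, 2, and 4 while the fast router wakes on every cycle, mimicking a router clocked at half the network frequency.
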
Extensions and FAQs
- Topology
  - How do we model multiple BW links in Garnet?
    - Inherently, that would require support in the router to manage multiple flit sizes. Instead, you can add multiple links between the same nodes if you want to model higher bandwidth. If they have the same weight, Garnet will choose randomly between them.
  - Can we model a heterogeneous CPU-GPU system?
    - Yes. The current AMD GPU model models a cluster of CPUs connected to a GPU.
  - Can we model indirect networks such as Clos?
    - Yes, there can be additional routers that are not connected to any controller and act purely as switches.
  - Can we model large-scale HPC networks?
    - Garnet can model a network of any size. You can run 256-node standalone simulations easily. However, beyond that gem5 cannot instantiate more directories (which it uses as destination nodes). I have a patch to run 1024-node synthetic sims on my website, but these run quite slowly.

Extensions and FAQs
- Routing
  - How do we model an adaptive routing algorithm?
    - If you want to use internal NoC metrics (such as the number of credits at an output port) for making routing decisions, do not use table-based routing. Instead, set the routing-algorithm to custom, and implement your own routing function inside RoutingUnit.cc. See outportComputeXY() for reference.
- Flow Control
  - Can we implement alternate deadlock avoidance schemes (such as escape VCs or dateline)?
    - You can update the VC selection scheme inside SwitchAllocator to control which VCs get allocated.
- Microarchitecture
  - Can we model a variable number of VCs in each router?
    - Currently the codebase is closely tied to the global vcs_per_vnet parameter. If you want to model a variable number of VCs, one hack could be to have every router instantiate the same number of VCs, but modify the VC select to never allocate certain VCs.

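As a rough illustration of the adaptive routing answer, the sketch below picks, among the outports that move a packet closer to its destination, the one with the most credits. It is written in Python for brevity and is not Garnet code: a real implementation would live in RoutingUnit.cc, and the coordinate convention, the `credits` dictionary, and the "Local" port name are all assumptions made for this example.

```python
def adaptive_outport(cur, dst, credits):
    """Choose the minimal-path outport with the most free credits.

    cur, dst: (x, y) router coordinates.
    credits:  dict mapping an outport name to its available credits.
    """
    (x, y), (dx, dy) = cur, dst
    candidates = []
    if dx > x:
        candidates.append("East")
    if dx < x:
        candidates.append("West")
    if dy > y:
        candidates.append("North")
    if dy < y:
        candidates.append("South")
    if not candidates:
        return "Local"                 # already at the destination router
    # Prefer the least congested productive direction.
    return max(candidates, key=lambda port: credits.get(port, 0))


choice = adaptive_outport((0, 0), (2, 2), {"East": 1, "North": 3})
```

Note that unrestricted minimal adaptive routing like this can create cyclic dependencies, so it would need to be paired with a deadlock avoidance scheme such as the escape VCs mentioned above.
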
Conclusions
- Garnet2.0 is an open-source research vehicle
  - Use it and contribute to it!
- Being actively maintained by the following people
  - Georgia Tech: Tushar Krishna
  - AMD Research: Brad Beckmann, Onur Kayiran, Jieming Yin, Matt Porembas
- If you have any questions, email the gem5-users or gem5-dev mailing lists
- Would love to have it integrated with the classic memory model
- Useful resources:
  - www.gem5.org/Interconnection_Network
  - www.gem5.org/garnet2.0
  - http://synergy.ece.gatech.edu/garnet

Thank you!
