2013 IEEE Asia Pacific Conference on Postgraduate Research in Microelectronics and Electronics (PrimeAsia)
A Provably Good Performance Centric NoC Topology
Tuhin Subhra Das, Prasun Ghosal

Department of Information Technology
Bengal Engineering and Science University, Shibpur
Howrah 711103, WB, India
tuhinbcrec@gmail.com, prasun@ieee.org
978-1-4799-2751-7/13/$31.00 ©2013 IEEE

Abstract: As chip density increases rapidly with every process generation, Network-on-Chip (NoC) has become the prevalent architecture for SoC, MPSoC, and large-scale CMP (Chip Multi Processor) based designs. Researchers have proposed diverse NoC solutions to meet the growing on-chip communication requirements. Here, the underlying network interconnection architecture (topology), router design, and routing policy play an important role in improving overall system performance. In this work, a 2D hybrid mesh-based star topology is proposed with the objective of providing a system with low latency, low channel contention, and higher throughput. The observed experimental results show a maximum latency benefit of 62% and a throughput increase of 48% for the proposed topology compared to a simple 2D mesh, at the cost of additional area overhead.

Keywords: NoC topology; routing; throughput; latency; load balancing; performance

I. INTRODUCTION

As silicon process technology evolves at an accelerated pace, design reuse and design automation technology are now seen as the major technical barriers to future progress, and the productivity gap between silicon process and design technology is increasing rapidly with time. System-on-a-chip [1] design with its advanced IC process technology has reduced this design productivity gap to some extent. However, incremental changes to current IC design methodologies are not sufficient to enable the full potential of system on chip (SoC) integration. Here, NoC [2] has emerged as an alternative solution by providing a platform based designing (PBD) methodology. This PBD methodology, which builds on intellectual property (IP) blocks, facilitates a hierarchical design methodology starting at the system level. It also provides a clear separation between the architecture design phase and the function design phase. It allows reuse not only of components but also of the system.

Today a NoC with 100 cores [3] already exists, and a prototype with 1000 or more cores has also been proposed recently [4]. Here, the key challenge is to provide a massively parallel distributed communication environment. The underlying network topology plays an important role in improving system performance, as it defines the communication infrastructure between any two routers on chip. The key issues are modelling, design technology, routing, flow control, and deadlock prevention. A properly balanced network offers a large or moderate bisection width [5], whereas other important metrics, viz. network diameter [5] [6] and node degree [5], are expected to be smaller. A hybrid locally mesh globally star (HLMGS) 2D NoC topology is proposed in this paper with the objective of providing a balanced network with low network latency [6] and higher throughput [6]. The channel contention [7] problem is also reduced by providing more alternative paths.

The overall organization of the paper is as follows. Section II describes the background and related work in this area to provide the motivation for the present problem. Section III gives the details of the proposed hybrid architecture. Section IV describes the proposed routing algorithm. Deadlock freeness of the proposed algorithm is presented in Section V. Experimental results are reported in Section VI, and Section VII concludes the paper.

II. BACKGROUND AND MOTIVATION

In a NoC, the topology provides the basic interconnection architecture among the routers. Topologies are broadly categorized into two classes, viz. regular and irregular. A regular topology follows a symmetric pattern throughout its whole structure, whereas an irregular topology is derived by mixing different topology structures in a hybrid, hierarchical, or asymmetric fashion. A regular grid-based 2D mesh is a very popular NoC topology because of its architectural simplicity and its support for a high level of parallelism. However, it has a very long network diameter [5], which leads to higher network latency [5]. A hierarchical star [8] topology offers a very short diameter but is not well suited to parallel architectures because of its smaller bisection width [5]. In some recently proposed topologies, researchers have followed hybridization or hierarchy based techniques to design enhanced topologies. For example, a star-type 2D mesh [9] is proposed by combining a simple 2D mesh and a hierarchical star [8] topology. Similarly, L2STAR [10] and multi-level mesh [11] follow a level-wise hierarchical architecture. The objective is to design a low-latency parallel architecture. But most of them suffer either from higher node degree or from channel contention as the network size increases. For example, a multi-level mesh offers low latency, but its node degree increases almost linearly with the network size. Topologies like star-type 2D mesh [9], L2STAR [10], and SD2D [12] limit the maximum node degree, but they only support a second-level hierarchical routing policy. So making a scalable, low latency and high throughput [6] based reliable
architecture is a challenge to the designer. In this paper, we try to overcome these limitations through the proposed work.

Fig. 1: Proposed LMGS topology when N is 4

Fig. 2: Router architecture when N is 1. (a) Leaf router (b) Non-leaf router

III. PROPOSED HYBRID LMGS TOPOLOGY

This paper demonstrates a hybrid locally mesh globally star (HLMGS) NoC interconnection architecture (see Fig. 1). The proposed topology facilitates both long-distance and short-distance traffic by using two different types of connections at different levels. Mesh links serve short-distance local traffic, whereas star links serve long-distance traffic. In the mesh, a connection is established between two routers at the same level, whereas in the hierarchical star a connection is established between routers at two different levels. Although the extra links and routers require some additional area, they distribute traffic more evenly throughout the network. This helps in load balancing and offers better performance on issues such as congestion control and channel contention [7] by providing more alternative paths.

Some important parameters of the proposed architecture of size $M \times M$ are as follows, where $M = 2^m$ for $m = 2, 3, 4, \ldots, n$ ($m$ and $n$ are positive integers):
Bisection width = $M + 4$
Maximum node degree of non-leaf router = 7
Maximum node degree of leaf router = 9, when N = 4
Maximum node degree of leaf router = 6, when N = 1
Maximum number of IP cores connected to a network = $M \times M \times N$, where N represents the number of IP (Intellectual Property) cores connected to each leaf-level router.

IV. PROPOSED ROUTING ALGORITHM

In the proposed routing algorithm, two variables $X_{diff}$ and $Y_{diff}$ measure the difference between the source and destination nodes (always positive integers). A packet follows global routing in one of the following two situations.
(i) If the difference between the current and destination nodes exceeds a predefined threshold value (4 here), the packet follows global routing.
(ii) When the current node position matches the target destination node but the target destination node is not equal to the original destination node, the packet also follows global routing.

In global routing, when a packet switches from one level to another, the target destination point is also shifted one level upwards or downwards, though the actual destination node remains the same. At the beginning of the algorithm, the current node is initialized to the source node, and the target destination node is initialized to the original destination node. The target destination node is shifted to a new position at every stage of global routing. Each router node position is represented by the notation $(L, X, Y)$, where $L$ represents the level of the router, and $X$, $Y$ the row and column number of that particular node. The current node position is represented by $(L_c, X_{curr}, Y_{curr})$, the original destination node position by $(L_d, X_{dest}, Y_{dest})$, and the target destination node position by $(L'_d, X'_{dest}, Y'_{dest})$.

The variable abbreviations used in the pseudo codes are given in Table I.

A. Proposed Routing

The pseudo code of the proposed routing scheme is given in Algorithm 1.

B. Global Star Routing

The general pseudo code of the global star routing scheme is given in Algorithm 2.
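The routing rules of Section IV can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `Node`, `shift_up`, and `routing_mode` are hypothetical, the differences are assumed to be taken between the current node and the target destination node, the threshold test is read as "any of the three quantities exceeds 4", and halving is assumed to be integer division.

```python
from dataclasses import dataclass

THRESHOLD = 4  # threshold value stated in Section IV


@dataclass(frozen=True)
class Node:
    """Router position (L, X, Y): level, row, column."""
    level: int
    x: int
    y: int


def shift_up(node: Node) -> Node:
    """Move a position one level up; coordinates are halved (integer division assumed)."""
    return Node(node.level + 1, node.x // 2, node.y // 2)


def routing_mode(current: Node, target: Node, actual: Node) -> str:
    """Return 'arrived', 'local_xy' or 'global_star' for one routing step."""
    if current == actual:
        return "arrived"
    if current == target:
        # Situation (ii): the intermediate target is reached,
        # but it is not the original destination.
        return "global_star"
    x_diff = abs(target.x - current.x)
    y_diff = abs(target.y - current.y)
    # Situation (i): the distance exceeds the threshold (assumed reading of the condition).
    if x_diff > THRESHOLD or y_diff > THRESHOLD or x_diff + y_diff > THRESHOLD:
        return "global_star"
    return "local_xy"
```

Under these assumptions, a packet at (0, 0, 0) bound for (0, 3, 3) starts in global star mode, because the combined difference of 6 exceeds the threshold. After one upward shift, the current node and target become (1, 0, 0) and (1, 1, 1), which are close enough for local XY routing at level 1.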
TABLE I: Table of notations

Symbol          Meaning
$X_{dest}$      X co-ordinate of actual destination node.
$Y_{dest}$      Y co-ordinate of actual destination node.
$X'_{dest}$     X co-ordinate of target destination node, initialized to value of $X_{dest}$.
$Y'_{dest}$     Y co-ordinate of target destination node, initialized to value of $Y_{dest}$.
$X_{curr}$      X co-ordinate of current node.
$Y_{curr}$      Y co-ordinate of current node.
$L_c$           Level of current node, initialized to 0.
$L_d$           Level of destination node, initialized to 0.
$L'_d$          Level of target destination node, initialized to 0.
$X_{diff}$      Difference between $X_{dest}$ and $X_{curr}$.
$Y_{diff}$      Difference between $Y_{dest}$ and $Y_{curr}$.

Fig. 3: An abstract view of 4 × 4 hybrid LMGS topology
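For the 4 × 4 case of Fig. 3, the parameters listed in Section III can be tabulated with a short sketch. The function name `hlmgs_parameters` is hypothetical. The per-level router counts are an assumption: they suppose M is a power of two and that each level halves the grid side up to a single root router. This agrees with the 21 switches (16 + 4 + 1) reported for the 4 × 4 HLMGS in Section VI, but the paper does not state it as a general formula.

```python
def hlmgs_parameters(m: int, n: int) -> dict:
    """Parameters of an M x M HLMGS network, with M = 2**m and N IP cores per leaf router."""
    if m < 2 or n < 1:
        raise ValueError("expected m >= 2 and n >= 1")
    side = 2 ** m
    # Assumption: level k is a (side / 2**k) x (side / 2**k) grid of routers.
    routers_per_level = [(side >> k) ** 2 for k in range(m + 1)]
    return {
        "M": side,
        "bisection_width": side + 4,      # M + 4, as stated in Section III
        "max_ip_cores": side * side * n,  # M x M x N, as stated in Section III
        "routers_per_level": routers_per_level,
        "total_routers": sum(routers_per_level),
    }


if __name__ == "__main__":
    for cores_per_leaf in (1, 4):
        print(cores_per_leaf, hlmgs_parameters(2, cores_per_leaf))
```

With m = 2 the sketch gives a bisection width of 8 and per-level router counts of 16, 4, and 1.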

if (current node == actual destination node) then
    packet has reached the destination
else
    if (current node != target destination node) then
        if ($X_{diff}$ or $Y_{diff}$ or ($X_{diff} + Y_{diff}$) > 4) then
            follow global star routing
        else
            follow local XY routing
        end
    else
        follow global star routing
    end
end
Algorithm 1: Proposed routing

During global routing, when the target destination node is shifted one level up, the following section is executed:
    $L_c = L_c + 1$;
    $L'_d = L'_d + 1$;
    $X_{curr} = X_{curr} / 2$;
    $Y_{curr} = Y_{curr} / 2$;
    $X'_{dest} = X'_{dest} / 2$;
    $Y'_{dest} = Y'_{dest} / 2$;

When it is shifted one level downward, the following section is executed:
    $L'_d = L'_d - 1$;
    if $L'_d \neq 0$ then $X'_{dest} = X_{dest} / ((L_c - 1) \times 2)$;
    else $X'_{dest} = X_{dest}$;
    if $L'_d \neq 0$ then $Y'_{dest} = Y_{dest} / ((L_c - 1) \times 2)$;
    else $Y'_{dest} = Y_{dest}$;
    $L_c = L'_d$;
    $X_{curr} = X'_{dest}$;
    $Y_{curr} = Y'_{dest}$;

if (current node != target destination node) then
    shift target destination node and next current node one level up;
    route packet towards the new current node;
else
    shift target destination and next current node one level down;
    route packet to new target destination node;
end
Algorithm 2: Global routing

V. DEADLOCK FREENESS

A deadlock may appear in NoC routing if the routing generates a circular waiting path, as depicted in [13][10], where two different packets or flits wait for each other in a cyclic way for an indefinite time. To avoid this kind of deadlock, researchers have adopted two different policies. One is to restrict packet movement so that no circular waiting path is generated. The other is to split the physical channel into several logical virtual channels [14][15]. We have chosen the first one, as adding virtual channels is not free of cost and needs some additional space, as described in detail in [16]. Simple mesh and star routing are always deadlock free, and the proposed locally mesh based globally star routing does not generate any such circular path either, because within a specific level a packet follows either local mesh routing or global star routing. So the proposed routing is able to avoid this kind of deadlock situation.

VI. EXPERIMENTAL RESULTS AND DISCUSSION

For experimental results and evaluation of the proposed work, we selected NS-2 [17], as suggested in [6][18][19]. It is an object-oriented, discrete-event driven network simulator implemented in C++ and OTcl, and it is well suited to simulating a NoC at a higher abstraction level. The simulator provides a convenient user interface, Network Animator (NAM), which helps us visualize the network operation in real time by tracking the data flow (see Fig. 4). Important performance centric parameters, viz. latency, throughput, and packet drop rate, can be calculated easily from the output trace file. The simulator also lets us observe system performance under different network loads. The important parameters used in our experiments are listed in Table II.

For the experiments, we create four different types of 4 × 4 topology through a Tcl script. Routers and PE cores (i.e. resources) are represented by square and circular nodes respectively, and they are connected by duplex links according to the proposed and the other compared topologies (see Fig. 4). Each router at level-0 is connected to its neighbour router by a maximum channel bandwidth of 1 Mb. Router at higher level
(i.e. other than level-0, or non-leaf router) is connected to another higher-level router by a maximum channel bandwidth of 2 Mb. Thus, for a 4 × 4 proposed topology (HLMGS), a total of eight 2 Mb channels are required. Four channels are used to connect the four level-1 routers in a mesh, and another four are used to connect these four level-1 routers to the level-2 router in star orientation (see Fig. 3). Each leaf router (i.e. router at level-0) is connected to a single resource (i.e. IP core) by a 1 Mb channel. UDP is selected as the communication protocol, as it provides non-guaranteed datagram delivery. Each source node is attached to a UDP agent and each sink is attached to a null agent. Each source (i.e. UDP agent) is attached to an exponential traffic generator. The traffic on and off periods are set to 2 milliseconds and 0.1 millisecond respectively. Each node uses a DropTail queue, whose maximum size is set to 8 as suggested in [19]. The link delay for short channels (i.e. level-0 connections) and long channels (i.e. other than level-0 connections) is set to 0.1 and 0.15 milliseconds respectively.

A communication scenario is defined by selecting some traffic source-sink pairs randomly, and the simulation is run for 15 seconds. Perl scripts are used to retrieve the required information from the trace file, which is then used to analyse network performance. Important performance centric parameters such as network latency, throughput, and packet loss rate have been observed for the different topologies with a varying network load.

TABLE II: NS-2 Simulation Parameter Details

Parameter                         Value
Maximum Channel Bandwidth         1 Mb - 2 Mb
Link delay                        0.1 - 0.15 ms
Topology                          LMGS, L2STAR, SD2D, Mesh
Buffer size                       8 (unit of packets)
Queue Type                        Drop Tail
Packet size                       8 Bytes
Packet injection rate             500K - 1000K
Simulation duration               15 sec
Connection type                   UDP
Traffic Type                      Exponential
Traffic Burst-time (On period)    2 ms
Traffic Idle-time (Off period)    0.1 ms

Fig. 4: Snapshot of NS-2 Network Animator for 4 × 4 proposed topology

Fig. 5: Packet latency as a function of increasing network load for 4 × 4 sized topology

Fig. 6: Maximum throughput as a function of increasing network load for 4 × 4 sized topology

A latency benefit of 49% compared to L2STAR [10] and SD2D [12], and of 62% compared to simple mesh, has been observed for the proposed hybrid topology (see Fig. 5), simply by doubling the channel bandwidth of only 16% of the total links of the whole network. These simulated results may differ from the analytical result, because packet delay is influenced heavily by channel contention rather than by link delay. Link delay is observable only in the ideal situation, i.e. when no packet gets lost at
transmission time. Packet delay for the proposed topology (i.e. HLMGS topology) reaches its threshold value at a comparatively higher load than the others. An increase of 25% and 27% in maximum throughput compared to SD2D and L2STAR, and an increase of 48% compared to simple 2D mesh, has been observed for the proposed topology (see Fig. 6). The packet loss rate is also negligible or very low for the proposed topology (as shown in Fig. 7). A low packet drop rate also signifies low channel contention, which indicates properly distributed traffic throughout the network.

Fig. 7: Packet loss rate as a function of increasing network load for 4 × 4 sized topology

A comparison of required area has been calculated following the method proposed by S. Suboh et al. in [20], where the average required area ($A_v$) is calculated as follows:

$$A_v = N_s (R_s + a_s d_g S_f B_s) + N_c A_c + a_l N_l L_l \qquad (1)$$

The number of switches ($N_s$) is taken as 16 for mesh and 21 for HLMGS, whereas the average node degree for mesh and the proposed HLMGS is taken as 4 and 5.4 respectively. The link length for leaf-level connections is kept at 1 unit, while the link length for non-leaf connections is kept at 2 units. A 24% to 12% area overhead over 2D mesh has been observed for the proposed topology by taking the other variable values as proposed in [20] and varying the number of IP cores (N) connected to each router from 1 to 4. So the required area overhead decreases as the number of IP cores increases.

VII. CONCLUSION

Though the experimental results are quite convincing, many more things are still left to experiment with in future. For example, the results are observed under static routing and under the assumption of a single resource connected to each leaf router. How the topology will perform under dynamic routing, and what the requisite channel bandwidth will be as the number of resources connected to each leaf router increases, need to be addressed in future work. Results under different kinds of traffic patterns also remain to be observed. However, the observed experimental results show the superiority of the proposed topology over the other compared topologies with respect to performance centric measurement parameters like throughput, latency, load balancing, and packet loss rate, at the cost of additional area overhead.

REFERENCES

[1] J. Nurmi, "Network-on-Chip: A New Paradigm for System-on-Chip Design," in International Symposium on System-on-Chip, pp. 2-6, 2005.
[2] V. Rantala, T. Lehtonen, and J. Plosila, "Network on Chip Routing Algorithms," tech. rep., TUCS Technical Reports 779, Turku Centre for Computer Science, 2006.
[3] "Tilera Announces the world's first 100-core processor." Online, October 2009. Available: http://goo.gl/K9c85.
[4] U. of Glasgow, "Scientists Squeeze More Than 1,000 cores on to Computer Chip." Online. Available: http://goo.gl/KdBbW.
[5] S. Kundu, R. P. Dasari, S. Chattopadhyay, and K. Manna, "Mesh-of-Tree Based Scalable Network-on-Chip Architecture," in IEEE Region 10 Colloquium and the Third ICIIS, 2008.
[6] J. Chen, P. Gillard, and C. Li, "Performance evaluation of three Network-on-Chip (NoC) architectures (Invited)," in 1st IEEE International Conference on Communications in China (ICCC), pp. 91-96, 2012.
[7] C. J. Glass and L. M. Ni, "The Turn Model for Adaptive Routing," Journal of the Association for Computing Machinery, vol. 41, pp. 874-902, September 1994.
[8] Z. Song, G. Ma, and D. Song, "Hierarchical Star: An Optimal NoC Topology for High-Performance SoC Design," in International Multi-symposiums on Computer and Computational Sciences (IMSCCS 08), pp. 158-163, 2008.
[9] K.-J. Chen, C.-H. Peng, and F. Lai, "Star-type architecture with low transmission latency for a 2D mesh NOC," in IEEE Asia Pacific Conference on Circuits and Systems (APCCAS), pp. 919-922, 2010.
[10] P. Ghosal and T. S. Das, "L2STAR: A Star Type level-2 2D Mesh architecture for NoC," in Asia Pacific Conference on Postgraduate Research in Microelectronics and Electronics (PrimeAsia), pp. 155-159, 2012.
[11] M. Saneei, A. Afzali-Kusha, and Z. Navabi, "Low-Latency Multi-Level Mesh Topology for NoCs," in The 18th International Conference on Microelectronics (ICM), pp. 36-39, 2006.
[12] P. Ghosal and T. S. Das, "Network-on-chip routing using Structural Diametrical 2D mesh architecture," in Third International Conference on Emerging Applications of Information Technology (EAIT), pp. 471-474, 2012.
[13] Y. Fukushima, M. Fukushi, I. E. Yairi, and T. Hattori, "A Hardware-Oriented Fault-Tolerant Routing Algorithm for Irregular 2D-Mesh Network-on-Chip without Virtual Channels," in IEEE 25th International Symposium on Defect and Fault Tolerance in VLSI Systems (DFT), pp. 52-59, 2010.
[14] Y. M. Boura and C. R. Das, "Fault-tolerant routing in mesh networks," in International Conference on Parallel Processing, pp. I.106-I.109, 1995.
[15] D. H. Linder and J. C. Harden, "An adaptive and fault-tolerant wormhole routing strategy for k-ary n-cubes," IEEE Transactions on Computers, vol. 40, pp. 2-12, 1991.
[16] A. A. Chien, "A cost and speed model for k-ary n-cube wormhole routers," in Hot Interconnects 93, 1993.
[17] "The Network Simulator-NS-2." Online. Available: http://www.isi.edu/nsnam/ns/.
[18] M. Ali, M. Welzl, A. Adnan, and F. Nadeem, "Using the NS-2 Network Simulator for Evaluating Network on Chips (NoC)," in International Conference on Emerging Technologies (ICET 06), pp. 506-512, 2006.
[19] Y.-R. Sun, S. Kumar, and A. Jain, "Simulation and Evaluation for a network on chip architecture using NS-2," in 20th NORCHIP conference, 2002.
[20] S. Suboh, M. Bakhouya, J. Gaber, and T. El-Ghazawi, "Analytical modeling and evaluation of network-on-chip architectures," in International Conference on High Performance Computing and Simulation (HPCS), pp. 615-622, 2010.
