
Deadlock Free Routing Algorithms for Mesh Topology NoC Systems with Regions


Rickard Holsmark¹, Maurizio Palesi² and Shashi Kumar¹
¹Jönköping University, Sweden; ²DIIT, University of Catania, Italy
¹{rickard.holsmark, shashi.kumar}@ing.hj.se; ²mpalesi@diit.unict.it

Abstract
The region concept helps to accommodate cores larger than the tile size in mesh topology NoC architectures. In addition, it offers many new opportunities for NoC design, but it also raises new design issues and challenges. The most important among these is the design of a deadlock free routing algorithm. In this paper, we present and compare two routing algorithms for mesh topology NoC with regions. The first algorithm is borrowed from the area of fault tolerant networks and adapted to the NoC context. We compare it with an algorithm designed using a methodology for the design of application specific routing algorithms for communication networks. Our study shows that the application specific routing algorithm not only provides much higher adaptivity, but also superior performance compared to the other algorithm in all the traffic cases studied.
Keywords: Routing Algorithms, Networks on Chip,
Deadlock, Wormhole Switching, Application Specific
Routing

1. Introduction
Network on Chip (NoC) is slowly being accepted as an important paradigm for implementing communication among the various cores in a SoC. Network topology and routing algorithm are the two most important aspects which distinguish the various proposed NoC architectures [1,2,3,4]. A two dimensional mesh topology based on a fixed tile size is favored by many research groups because of its layout efficiency, good electrical properties and simplicity in addressing on-chip resources. Such a physically homogeneous network is, however, not efficient for incorporating cores of different sizes. In such a network, the tile size must be able to accommodate the physically largest core, such as a shared memory. It will also be hard to reuse previously designed multi-core subsystems within a NoC based on a fixed tile size. To overcome these problems, the concept of a region was proposed in [1]. This concept allows a rectangular area of the mesh, larger than a tile, to be declared as a region. The region is isolated from the outside network using a wrapper, as shown in Fig. 1.

Fig. 1. Region within a mesh topology NoC (labels: NoC router, region wrapper, region, normal sized NoC tile)

In a NoC system with regions, routing of packets becomes more complex. Some network routers are removed from the mesh network to accommodate a large region. In effect, a region acts as an obstacle to the network traffic. This not only results in higher packet latency, but also means that deadlock free routing algorithms designed for a regular mesh network can no longer be used.
Wormhole switching, used in communication networks, has been proposed by several researchers, e.g. [3,4], as the most suitable switching technique for on-chip communication. A drawback of this switching technique is the increased possibility of deadlocks. To solve the deadlock problem, many algorithms for mesh topology networks have been proposed in the literature. For example, the simple X-Y routing algorithm and Turn-model based [5] algorithms such as west-first are deadlock free in mesh networks. However, none of these can be used for meshes with regions, because the restrictions on the allowed turns prevent messages from getting around the regions.
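To make the obstacle effect concrete, the following Python sketch (an illustration added here, not part of the original study) treats the routers removed for a region as a blocked set and checks whether the single path chosen by X-Y routing runs into it. The coordinate convention, the hypothetical helper names `xy_route`, `region_nodes` and `xy_blocked`, and the 7x7 mesh with a 2x2 region at (3, 3) are assumptions chosen for illustration only.

```python
from typing import List, Set, Tuple

Node = Tuple[int, int]  # (x, y) router coordinate; x grows eastwards, y grows northwards


def xy_route(src: Node, dst: Node) -> List[Node]:
    """Return the unique X-Y path: first along the X dimension, then along Y."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


def region_nodes(x0: int, y0: int, width: int, height: int) -> Set[Node]:
    """Routers assumed to be removed to make room for a rectangular region."""
    return {(x, y) for x in range(x0, x0 + width) for y in range(y0, y0 + height)}


def xy_blocked(src: Node, dst: Node, region: Set[Node]) -> bool:
    """True if the X-Y path would have to pass through a removed router."""
    return any(node in region for node in xy_route(src, dst))


region = region_nodes(3, 3, 2, 2)          # hypothetical 2x2 region in a 7x7 mesh
print(xy_blocked((0, 3), (6, 3), region))  # True: the row-3 path runs into the region
print(xy_blocked((0, 0), (6, 6), region))  # False: row 0 first, then column 6
```

Because X-Y routing offers exactly one path per pair, a single blocked router on that path leaves the packet with no legal alternative, which is the situation the region concept creates.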

Bolotin et al. [3] have also proposed a non-homogeneous mesh topology NoC that allows rectangular cores larger than the mesh tile. Their solution for deadlock free routing is to use X-Y routing extended with hard coded paths for traffic affected by regions.
A problem similar to regions occurs when
designing fault-tolerant routing algorithms for mesh
networks. Several of these algorithms consider faults to
be contained in rectangular blocks similar to regions.

Proceedings of the 9th EUROMICRO Conference on Digital System Design (DSD'06)


0-7695-2609-8/06 $20.00 2006

In this category, virtual channels [6] have been used to facilitate the design of such algorithms [7]. Still, the use of virtual channels adds resources and increases design complexity. Some researchers have proposed fault tolerant algorithms that do not use virtual channels. These are based on non-adaptive routing algorithms that are modified to work in the presence of faults or regions. In [8], a modified X-Y routing is used to route around faulty blocks, but it also imposes some restrictions. A less restricted algorithm was proposed in [9].
Duato [10] has proposed a general theory for developing highly adaptive deadlock free routing algorithms for general communication networks that use the wormhole switching technique. The basic idea in Duato's theory is to identify a set of consecutive communication channels in the network which, if used concurrently, can cause a deadlock situation. The solution is then to design the routing function so that this situation is prevented.
Most of the deadlock free routing algorithms proposed in the literature are general purpose and have been designed to handle worst case communication patterns in the network. A NoC system specialized for a set of applications can be regarded as a semi-static system. Here we can know which pairs of cores communicate and which pairs never communicate. This information about the communication topology can be incorporated in Duato's theory to design highly adaptive routing algorithms. We call such algorithms Application Specific Routing Algorithms (APSRAs) [11]. APSRA has not yet been used to develop deadlock free routing for mesh NoC with regions.

2. Region Concept and New Design Issues


The region concept presented in [1] was intended to accommodate larger resources, which do not fit in the fixed size slot of a regular mesh architecture layout. The region concept could also be useful for encapsulating a group of resources with very high and special communication requirements which cannot be supported by the general NoC communication infrastructure. Within such a region, one could have specialized interconnections as well as communication protocols for achieving the required performance. One can also think of encapsulating a group of resources as a region for special requirements such as low power consumption or data security.
The above applications of regions may seem to imply that the region structure has to be physically different in design from its surroundings. That is, however, not necessary; it is also possible to think of the region as a logical structure. In this case the internal hardware design of the region is identical to the outside NoC structure, but the region is in some way isolated from the surrounding network. This assumes that there are configurable routers in the NoC that can be used for defining and maintaining a region.
We believe that the reuse of multi-core subsystems will become a very important application of the region concept in the near future. For example, multi-media solutions currently available as separate SoCs could be reused. It is unlikely that these subsystems will physically fit in the general slot for a core in the mesh NoC. Without the region concept, such a subsystem would need to be redesigned to meet the NoC constraints. The effort required for the redesign may be too high, or the redesigned subsystem may not be able to achieve the required performance in the NoC context.

2.1. Routing in NoC with Regions

Efficient routing of messages within the network is essential in order to fully exploit the power of the computing resources and achieve good performance for the applications running on them. A good routing algorithm should not only provide low latency for messages but should also be deadlock free when the network is concurrently routing multiple messages. However, incorporating regions in mesh networks results in a major change of the communication infrastructure, and the existing mesh routing algorithms cannot be directly reused.
In addition to creating deadlock freedom problems, regions also affect the traffic distribution in the network. Traffic flows that are obstructed by the region have to circumvent it in order to make progress. This could make the border links of the region more heavily used than other links. Adaptive routing is one solution that can reduce the problem of local congestion. Normally, the term adaptive refers to the possibility to sense congestion and take action to divert from it; in this sense it is reactive. When regions are used in a NoC, information about the regions can instead be incorporated in the routing algorithm so that the occurrence of congestion is reduced or avoided.

2.2. Accessing and Addressing Regions

Since a region occupies a larger area than a standard resource, it may be useful to give it several addresses and several access points. A large region may internally provide different types of access mechanisms to its internal resources. The purpose for which the region is used might also affect how the region is designed. A large shared memory perhaps requires several access points distributed around the entire border, whereas a system with many processing elements might be accessed by only a few resources outside the region. When using a region, its access points and addresses must be defined.
The three major options, in order of increasing routing complexity and access capability, are:
1. Use the corner router, which originally had a resource connected to it, as a single access point.
2. Use the routers on the border that originally had connections to resources as multiple access points.
3. Use all the possible routers on the boundary as access points.
Fig. 1 illustrates how a region can be accessed using multiple access points. In this figure, routers on the region boundary connect through the wrapper to the internal region core.

3. Deadlock Free Routing in NoC Systems


The deadlock free algorithms developed for homogeneous mesh networks, like the Odd-Even routing algorithm [12], cannot be directly used in NoC with regions. To be able to reach all destinations, the routing algorithm has to make turns to get around the region. In many situations this will violate the rules that were used to secure the deadlock freedom property in the case of a homogeneous NoC. Breaking these rules in order to reach a destination may result in a deadlock situation.
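The conflict between turn restrictions and regions can be illustrated with the west-first turn model [5], which prohibits the two turns from north or south into west. The minimal Python sketch below is an assumption-laden illustration: the region position, the coordinate convention and the helper names are hypothetical, and 180-degree reversals are not modelled. It shows that both ways around a region that blocks a westward path need a prohibited turn.

```python
from typing import List, Set, Tuple

Node = Tuple[int, int]  # (x, y); x grows eastwards, y grows northwards

# West-first turn model [5]: the two turns from north or south into west are prohibited.
WEST_FIRST_PROHIBITED: Set[Tuple[str, str]] = {("N", "W"), ("S", "W")}


def direction(a: Node, b: Node) -> str:
    """Direction of the hop from router a to its mesh neighbour b."""
    step = (b[0] - a[0], b[1] - a[1])
    return {(1, 0): "E", (-1, 0): "W", (0, 1): "N", (0, -1): "S"}[step]


def turns(path: List[Node]) -> List[Tuple[str, str]]:
    """(incoming, outgoing) direction pairs wherever the path changes direction."""
    dirs = [direction(a, b) for a, b in zip(path, path[1:])]
    return [(d1, d2) for d1, d2 in zip(dirs, dirs[1:]) if d1 != d2]


def prohibited_turns(path: List[Node], prohibited=WEST_FIRST_PROHIBITED) -> List[Tuple[str, str]]:
    """Turns on the path that the given turn model forbids."""
    return [t for t in turns(path) if t in prohibited]


# Hypothetical obstacle: a region occupies routers (2, 3) and (3, 3); a packet at (4, 3)
# must reach (1, 3) on the same row and therefore has to detour north or south.
detour_north = [(4, 3), (4, 4), (3, 4), (2, 4), (1, 4), (1, 3)]
detour_south = [(4, 3), (4, 2), (3, 2), (2, 2), (1, 2), (1, 3)]
print(prohibited_turns(detour_north))  # [('N', 'W')]
print(prohibited_turns(detour_south))  # [('S', 'W')]
```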
In the following subsections we describe the two routing algorithms that we have used in our evaluation of NoC routing performance in the presence of regions. They represent two different approaches that can be used to guarantee deadlock free routing in a NoC both with and without regions. Because of the restrictions on on-chip resources, we present algorithms that do not require virtual channels. However, it is possible to include this feature to increase network performance. The first approach is adopted from the area of fault tolerant routing. It is a general routing algorithm in the sense that it works for any traffic scenario and region placement in a NoC. This results in good scalability, and it supports dynamic changes of both architecture and communication patterns.
The second approach has evolved from knowledge of the design optimization of embedded systems. It relies on the assumption that the communication among tasks in an embedded application is known in advance. This information about the communication is incorporated when designing the routing algorithm. As we need not consider all possible communication patterns, fewer restrictions need to be applied to the routes of the actual communications to avoid deadlocks. Thus, an application specific routing algorithm can have more adaptivity than a general algorithm. However, any change in architecture or communication pattern requires a re-analysis and possibly a re-design of the complete routing algorithm.

3.1. Algorithm from Fault Tolerance Area

Chen and Chiu [9] present a fault tolerant algorithm that can be used for routing in the presence of regions. However, the published algorithm had some errors, which have later been corrected. The improved version of this algorithm has been used in [13] for deadlock free routing in the presence of regions. We describe the basic ideas of the original algorithm here; for a thorough description of the algorithm, see [9]. For our purpose, a faulty block in the original algorithm is equivalent to a region.
Chen and Chiu [9] borrow the idea of rings and chains from [7] to isolate the faulty nodes from the rest of the network. For messages which do not encounter any ring or chain, they allow non-adaptive routes which use at most one turn from source to destination. For messages encountering faulty blocks, it becomes necessary to allow some turns which are forbidden during normal routing. Only a few combinations of forbidden turns are allowed, chosen in a clever manner such that these turns can never combine with each other (or with the allowed normal turns) to form a cycle. When routing on paths not affected by faults, messages are forwarded in the network according to their type, as illustrated in Fig. 2.
Fig. 2. Message types (RF, CF, RO) and corresponding allowed routes in the algorithm
A message is of type row first (RF) if it has its destination to its west. If the destination is to its north or south, it is a column first (CF) message. A message of type RF can thus change to CF when it reaches the column of its destination. If it has its destination to its east, it is of type column first (CF), except when the destination is in the same row, in which case it is row only (RO). A CF message can also change to RO if the destination is in the same row to its east. However, an RO message never changes its type. If a message hits the border of a faulty block, special rules apply depending on the type of the message and on whether the border lies on a fault ring or a fault chain. There are different rules for routing around these, depending on whether the faults are surrounded by an s-chain (a chain that touches the south border only), a non s-chain (a chain that touches only the west border, or the west and south borders) or a ring (all other positions of rings and chains). Fig. 3 illustrates routes for some messages travelling in the presence of faulty blocks (regions). Messages are denoted by their source (Sn) and destination (Dn).
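The fault-free part of these rules can be sketched in Python as follows. This is only an illustration of the message types described above, not Chen and Chiu's implementation: it assumes that "north or south" means the destination lies in the same column, it uses hypothetical helper names, and it omits all the ring and chain rules of [9]. It shows that each route uses at most one turn, with the type changing from RF to CF or from CF to RO.

```python
from typing import List, Optional, Tuple

Node = Tuple[int, int]  # (x, y); x grows eastwards, y grows northwards


def message_type(cur: Node, dst: Node) -> Optional[str]:
    """Type of a message at router cur, following the rules described in the text."""
    dx, dy = dst[0] - cur[0], dst[1] - cur[1]
    if dx == 0 and dy == 0:
        return None                   # arrived
    if dx < 0:
        return "RF"                   # destination to the west: row first
    if dx == 0:
        return "CF"                   # destination straight north or south (assumed: same column)
    return "RO" if dy == 0 else "CF"  # destination to the east


def fault_free_route(src: Node, dst: Node) -> Tuple[List[Node], List[str]]:
    """Route when no ring or chain is met; returns visited routers and the type at each hop."""
    cur, path, types = src, [src], []
    while cur != dst:
        t = message_type(cur, dst)
        types.append(t)
        x, y = cur
        if t in ("RF", "RO"):
            x += 1 if dst[0] > x else -1  # move along the row
        else:
            y += 1 if dst[1] > y else -1  # CF: move along the column
        cur = (x, y)
        path.append(cur)
    return path, types


print(fault_free_route((3, 3), (1, 5)))  # west first, then north: RF, RF, CF, CF
print(fault_free_route((3, 3), (5, 1)))  # south first, then east: CF, CF, RO, RO
```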

Fig. 3. Message routes when encountering fault rings and chains (sources S1, S2, S3 and destinations D1, D2, D3; legend: active nodes, faulty nodes, fault-ring, s-chain, non s-chain, route)

3.2. Application Specific Routing Algorithms
Typical routing algorithms for NoC systems are designed for a specific network topology and are independent of the application which will be mapped on the NoC. If a small variation of the topology occurs (e.g., due to the merging of tiles of a mesh based network to form a region), the routers need to be redesigned. The use of routing tables helps to overcome this problem and makes the router general and configurable.
Routing tables are filled with information which enables the communication between every pair of network nodes. The constraint to be satisfied is that the channel dependency graph (CDG) [10] must not contain any cycle, to ensure that the routing is deadlock free. To achieve this, some of the possible paths that allow two nodes to communicate must be prohibited, which degrades routing adaptiveness. In an embedded system scenario this is, however, a strong limitation, since the designer cannot exploit his knowledge of the application that will be mapped on the NoC.
Often the designer knows which core pairs communicate and which do not. To overcome this limitation, a methodology to generate application specific routing functions has been proposed in [11]. The basic idea of this methodology, known as APSRA (APplication Specific Routing Algorithm), is to extend Duato's theory so as to exploit the designer's knowledge about the communication characteristics of the application being implemented. As a result, an application specific channel dependency graph (ASCDG) is built, incorporating knowledge about the communication topology of the application.

Fig. 4. Overview of APSRA design methodology (inputs: the communication graph of the application to be mapped, tasks T1 to Tn; the network topology, nodes P1 to P13; the mapping function; communication concurrency C1 to Cm; and a memory budget. APSRA generates routing tables, which are then compressed into compressed routing tables)

In [11] it is proved that if the ASCDG is acyclic, then the routing is deadlock free. Since the ASCDG is a sub-graph of the CDG, it is more likely to be acyclic. This likelihood is quite high since, in practical cases, each node of the network communicates with only a small subset of the other nodes. As a result, a number of dependencies that are present in the CDG (which is built by conservatively assuming that all network node pairs communicate) are not present in the ASCDG (which is built by considering only the actual communicating pairs). If the ASCDG is not acyclic, however, a heuristic proposed in [11] breaks all the cycles, with the objective of minimising the impact on the degree of adaptiveness and with the constraint of guaranteeing destination reachability.
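The difference between the CDG and the ASCDG can be illustrated with a small Python sketch on the six-node mesh of Fig. 5(b). Under stated assumptions (the node layout below, minimal fully adaptive routing, and a hypothetical application in which only T1→T5 and T4→T2 communicate, with M(Ti) = Pi), it builds the dependency graph once from all node pairs and once from the communicating pairs only, and tests each for cycles. It does not reproduce APSRA itself or the cycle-breaking heuristic of [11].

```python
from itertools import product
from typing import Dict, List, Set, Tuple

Channel = Tuple[int, int]  # unidirectional link l_{i,j} from node Pi to node Pj

# Assumed layout of the six-node mesh in Fig. 5(b): P1 P2 P3 in one row,
# P4 P5 P6 in the next (P1-P4, P2-P5 and P3-P6 are vertical links).
COORDS = {1: (0, 0), 2: (1, 0), 3: (2, 0), 4: (0, 1), 5: (1, 1), 6: (2, 1)}


def dist(a: int, b: int) -> int:
    """Manhattan distance between two nodes."""
    (ax, ay), (bx, by) = COORDS[a], COORDS[b]
    return abs(ax - bx) + abs(ay - by)


def minimal_paths(src: int, dst: int) -> List[List[int]]:
    """All minimal paths from src to dst, i.e. those allowed by minimal fully adaptive routing."""
    if src == dst:
        return [[dst]]
    paths = []
    for n in COORDS:
        if dist(src, n) == 1 and dist(n, dst) < dist(src, dst):
            paths += [[src] + p for p in minimal_paths(n, dst)]
    return paths


def dependency_graph(pairs) -> Dict[Channel, Set[Channel]]:
    """c1 -> c2 if some minimal path of a communicating pair uses channel c1 and then c2."""
    deps: Dict[Channel, Set[Channel]] = {}
    for src, dst in pairs:
        for path in minimal_paths(src, dst):
            hops = list(zip(path, path[1:]))
            for c1, c2 in zip(hops, hops[1:]):
                deps.setdefault(c1, set()).add(c2)
    return deps


def has_cycle(graph: Dict[Channel, Set[Channel]]) -> bool:
    """Three-colour depth-first search; an edge back to a grey vertex means a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour: Dict[Channel, int] = {}

    def visit(v: Channel) -> bool:
        colour[v] = GREY
        for w in graph.get(v, ()):
            state = colour.get(w, WHITE)
            if state == GREY or (state == WHITE and visit(w)):
                return True
        colour[v] = BLACK
        return False

    return any(colour.get(v, WHITE) == WHITE and visit(v) for v in list(graph))


all_pairs = [(s, d) for s, d in product(COORDS, COORDS) if s != d]
print(has_cycle(dependency_graph(all_pairs)))         # True: CDG, every pair may communicate
print(has_cycle(dependency_graph([(1, 5), (4, 2)])))  # False: hypothetical ASCDG
```

In the full CDG the four dependencies l1,2→l2,5, l2,5→l5,4, l5,4→l4,1 and l4,1→l1,2 already close a cycle, while the two hypothetical communications alone only create open chains.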
Fig. 4 shows an overview of the APSRA design flow. The starting point is the application being implemented, along with the network topology. The application is divided into a graph of concurrent tasks and, using a set of available IPs, the application tasks are assigned and scheduled. Finally, a mapping function is used to decide onto which node of the network each selected IP is mapped.
Using this information, APSRA generates a set of routing tables (one for each router of the NoC) which guarantee both reachability and deadlock freedom, with the objective of maximising the degree of adaptiveness. Information about communication concurrency can also be exploited to improve the adaptiveness. Finally, a compression technique can be used to compress the generated routing tables [14].
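As an illustration of what such per-router tables might contain, the Python sketch below builds, for one router of a 7x7 mesh, a table mapping each destination to its admissible minimal output ports, optionally removing prohibited ports. All names are hypothetical, and `group_entries` is only a naive grouping of identical entries; it is not the compression technique of [14]. In an application specific table, only destinations that are actually communicated with would need entries.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, List, Optional, Tuple

Node = Tuple[int, int]  # (x, y); x grows eastwards, y grows northwards
Ports = FrozenSet[str]


def minimal_ports(cur: Node, dst: Node) -> Ports:
    """Output ports that move a packet one hop closer to dst (minimal routing)."""
    ports = set()
    if dst[0] > cur[0]:
        ports.add("E")
    if dst[0] < cur[0]:
        ports.add("W")
    if dst[1] > cur[1]:
        ports.add("N")
    if dst[1] < cur[1]:
        ports.add("S")
    return frozenset(ports) if ports else frozenset({"LOCAL"})


def routing_table(cur: Node, destinations: Iterable[Node],
                  prohibited: Optional[Dict[Node, Ports]] = None) -> Dict[Node, Ports]:
    """Destination -> admissible output ports. 'prohibited' removes ports per destination,
    e.g. to break dependency cycles; an empty entry would mean lost reachability."""
    prohibited = prohibited or {}
    return {d: minimal_ports(cur, d) - prohibited.get(d, frozenset()) for d in destinations}


def group_entries(table: Dict[Node, Ports]) -> Dict[Ports, List[Node]]:
    """Naive grouping: one entry per distinct port set, listing the destinations sharing it."""
    groups: Dict[Ports, List[Node]] = defaultdict(list)
    for dst, ports in table.items():
        groups[ports].append(dst)
    return dict(groups)


router = (3, 3)
destinations = [(x, y) for x in range(7) for y in range(7) if (x, y) != router]
table = routing_table(router, destinations, prohibited={(5, 5): frozenset({"N"})})
print(len(table), len(group_entries(table)))  # 48 8
print(sorted(table[(5, 5)]))                  # ['E']
```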
As an example, let us consider the communication graph and the topology graph depicted in Fig. 5(a) and 5(b), respectively.
Fig. 5. Comparison of cyclic dependencies without and with APSRA methodology: (a) communication graph of tasks T1 to T6; (b) topology graph of nodes P1 to P6, connected in both directions by the links l1,2, l2,3, l1,4, l2,5, l3,6, l4,5 and l5,6; (c) CDG for minimal fully adaptive routing; (d) ASCDG; (e) dependencies due to the communications T1→T5 and T2→T4
Although the topology in this example is mesh-based, the approach is general and can be applied to any network topology without modification. As mapping function, let us consider M(Ti) = Pi, i = 1, ..., 6.
The CDG for a minimal fully adaptive routing algorithm is shown in Fig. 5(c). Since it contains six cycles, Duato's theorem cannot assure the deadlock freedom of minimal fully adaptive routing for this topology. The number of cycles is reduced to two for the ASCDG, as shown in Fig. 5(d). Although we cannot assure deadlock freedom in this case either, we can simply break the cycles as follows. The application specific channel dependency l4,1 → l1,2 is due to the communication T4→T2. Such communication can be realized by both paths P4→P5→P2 and P4→P1→P2. If the routing function is restricted so that the latter path is prohibited, then the application specific channel dependency l4,1 → l1,2 no longer exists. In a similar way it is possible to break the second cycle by removing, for instance, the dependency l1,4 → l4,5 due to the communication T1→T5. However, these restrictions reduce the degree of adaptiveness of the routing. Now suppose that we have some knowledge about communication concurrency, and suppose that communication T1→T5 and communication T2→T4 do not overlap in time. Fig. 5(e) highlights the dependencies due to these communications. Since these communications are not concurrent, the associated dependencies are not concurrently active either. As a result, the two cycles are actually false cycles. In conclusion, in this latter case a minimal fully adaptive routing is deadlock free.
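One simplified way to read the false-cycle argument, assumed here and not taken from [11], is to group communications into phases that may overlap in time and to check the dependencies of each phase separately. The Python sketch below uses hypothetical dependencies on the mesh of Fig. 5(b) (including a hypothetical T5→T1 communication) whose union forms a cycle; once T1→T5 and T2→T4 are placed in different phases, no phase contains a cycle. The sketch assumes that packets of non-overlapping communications never hold channels at the same time.

```python
from typing import Dict, Iterable, List, Set, Tuple

Channel = Tuple[int, int]                  # link l_{i,j}
Dependency = Tuple[Channel, Channel, str]  # (channel held, channel requested, communication)


def has_cycle(edges: Iterable[Tuple[Channel, Channel]]) -> bool:
    """Three-colour depth-first search over a dependency edge list."""
    graph: Dict[Channel, Set[Channel]] = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
    WHITE, GREY, BLACK = 0, 1, 2
    colour: Dict[Channel, int] = {}

    def visit(v: Channel) -> bool:
        colour[v] = GREY
        for w in graph.get(v, ()):
            state = colour.get(w, WHITE)
            if state == GREY or (state == WHITE and visit(w)):
                return True
        colour[v] = BLACK
        return False

    return any(colour.get(v, WHITE) == WHITE and visit(v) for v in list(graph))


def cyclic_per_phase(deps: List[Dependency], phases: List[Set[str]]) -> List[bool]:
    """For each phase (communications that may overlap in time), check whether the
    dependencies contributed by that phase alone form a cycle."""
    return [has_cycle((a, b) for a, b, comm in deps if comm in phase) for phase in phases]


# Hypothetical dependencies on the mesh of Fig. 5(b), with M(Ti) = Pi.
DEPS: List[Dependency] = [
    ((1, 2), (2, 5), "T1->T5"),  # path P1->P2->P5
    ((2, 5), (5, 4), "T2->T4"),  # path P2->P5->P4
    ((5, 4), (4, 1), "T5->T1"),  # path P5->P4->P1
    ((4, 1), (1, 2), "T4->T2"),  # path P4->P1->P2
]

print(has_cycle((a, b) for a, b, _ in DEPS))  # True: all four together close a cycle
# T1->T5 and T2->T4 never overlap, so they are placed in different phases.
print(cyclic_per_phase(DEPS, [{"T1->T5", "T5->T1", "T4->T2"},
                              {"T2->T4", "T5->T1", "T4->T2"}]))  # [False, False]
```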

4. Evaluation of Algorithms
4.1. Adaptivity Analysis

One metric to characterize an adaptive routing algorithm is the degree of adaptiveness [5]. For a given source-destination pair, the degree of adaptiveness is defined as the ratio between the number of admissible paths and the total number of paths connecting the source node to the destination node. We calculated the adaptiveness for a 7x7 NoC with a 2x2 region placed either in the centre of the NoC (with 4 access points or 1 access point) or in the bottom left corner (with 3 access points or 1 access point). Note that Chen and Chiu's algorithm is actually non-adaptive, and that its reported adaptiveness is for comparison purposes only. In all these configurations, the average degree of adaptiveness exhibited by APSRA exceeded 80%, while the values for Chen and Chiu's algorithm were slightly below 40% in all cases. To compare the algorithms for different region sizes, we define a new adaptivity measure called relative adaptivity. It is the ratio between the number of paths when the region is present and the number of paths without the region.
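Both metrics can be computed by counting paths. The Python sketch below assumes that "paths" means minimal paths in the region-free mesh, models a region as a set of removed routers, and sums path counts over the given pairs for relative adaptivity; the paper does not state these details, and the function names are hypothetical. The number of admissible paths for a given algorithm must be supplied by the caller. A pair whose minimal paths are all blocked contributes zero here, even though a real algorithm would route it non-minimally.

```python
from math import comb
from typing import AbstractSet, Iterable, Tuple

Node = Tuple[int, int]  # (x, y) router coordinate


def count_minimal_paths(src: Node, dst: Node, blocked: AbstractSet[Node] = frozenset()) -> int:
    """Number of minimal paths (shortest in the region-free mesh) from src to dst that
    avoid every router in 'blocked', by dynamic programming over the bounding box."""
    sx = 1 if dst[0] >= src[0] else -1
    sy = 1 if dst[1] >= src[1] else -1
    nx, ny = abs(dst[0] - src[0]), abs(dst[1] - src[1])
    ways = [[0] * (ny + 1) for _ in range(nx + 1)]
    for i in range(nx + 1):
        for j in range(ny + 1):
            if (src[0] + i * sx, src[1] + j * sy) in blocked:
                continue  # no path can pass through a removed router
            if i == 0 and j == 0:
                ways[i][j] = 1
            else:
                ways[i][j] = (ways[i - 1][j] if i else 0) + (ways[i][j - 1] if j else 0)
    return ways[nx][ny]


def degree_of_adaptiveness(admissible: int, src: Node, dst: Node) -> float:
    """Admissible paths divided by all minimal paths in the region-free mesh."""
    nx, ny = abs(dst[0] - src[0]), abs(dst[1] - src[1])
    return admissible / comb(nx + ny, nx)


def relative_adaptivity(pairs: Iterable[Tuple[Node, Node]], region: AbstractSet[Node]) -> float:
    """Paths with the region present divided by paths without it, summed over the pairs."""
    pairs = list(pairs)
    with_region = sum(count_minimal_paths(s, d, region) for s, d in pairs)
    without_region = sum(count_minimal_paths(s, d) for s, d in pairs)
    return with_region / without_region


print(count_minimal_paths((0, 0), (2, 2)))                # 6
print(relative_adaptivity([((0, 0), (2, 2))], {(1, 1)}))  # 0.333... (2 of 6 paths avoid (1, 1))
print(degree_of_adaptiveness(1, (0, 0), (2, 2)))          # 0.1666... (a single admissible path)
```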
Fig. 6. Relative adaptiveness vs. size of region for APSRA and Chen and Chiu's algorithm: (a) region in centre (region sizes 1x1, 2x1, 2x2, 3x2, 3x3; vertical axis 0 to 0.5) and (b) region in bottom left corner (region sizes 1x1, 2x2, 3x2, 3x3, 4x3, 4x4; vertical axis 0 to 0.9)
Fig. 6(a) shows the relative adaptivity for different sizes of a region located at the centre of the NoC, whereas Fig. 6(b) shows this variation for regions located at the bottom left corner of the NoC. In both cases, and for each region, the access point is located at the top right corner of the region. As expected, the relative adaptivity generally decreases as the size of the region increases. For regions located at the corner of the NoC, the relative adaptivity reaches a minimum when the region size is 3x3 (about half the dimension of the mesh NoC). If the region size increases further, the relative adaptivity increases. This effect is caused by the fact that a region located at the bottom left corner of the NoC only obstructs communication between nodes located in the quadrant north of the region and nodes in the quadrant east of it. The number of these nodes is equal for regions of size 3x3, 4x3 and 4x4. For this reason, whilst the number of paths without the region decreases on average (because the access point moves towards the centre of the NoC), the number of paths with the region remains roughly the same when the region size increases from 3x3 to 4x3 and further to 4x4.

4.2. Simulation Based Evaluation
For our evaluation purposes we have developed a model of a 7x7 mesh topology NoC with regions in SDL (Specification and Description Language). We have implemented wormhole switching with a packet size of 10 flits. Every router has a two-flit input buffer and a one-flit output buffer. The router can simultaneously route packets destined to non-conflicting output ports. The minimal link delay is three cycles/flit and the maximum link bandwidth is 0.5 flits/cycle (1 packet/20 cycles). Cores are modeled as traffic generators, and the resource network interface has an output buffer large enough to keep packet generation unaffected by network conditions. The flits of a packet are sent in burst mode at the maximum link bandwidth, and the gap between packets is varied according to a Poisson distribution. The destinations of generated packets are selected randomly, with a hot-spot probability of 60% for the region access points. We compare APSRA and Chen and Chiu's algorithm with a region of size 2x2, either in the bottom left corner with 3 access points (bl_ap3) or in the centre of the network with 4 access points (c_ap4). Simulations were carried out using the Telelogic SDL simulation tool (Tau 4.4).
The following parameters were used to study the performance of the NoC platform. Performance values were collected over 60 000 packets, after a warm-up session of 30 000 packets.
Average Latency: The average delay of a packet from the source (when the header leaves) to the destination (when the tail has arrived).
Blocked Routing Cycles/Router: The total number of routing cycles during which packets were blocked in a router.
Latency values were averaged over 5 random traffic scenarios to get an overall view of how the performance of the network is affected by changes in network configuration and packet injection rate. Blocked Routing Cycles/Router indicates where the network is most congested.
Simulation Results
We classify communication traffic into three types: traffic to the region; other traffic, where a resource other than the region is the destination; and all communications, which is the aggregate of the first two types.

Fig. 7. Average latency for all communications with region placed in bottom left (bl) and centre (c), vs. packet injection rate in % of link bandwidth (curves apsra_bl_ap3, chiu_bl_ap3, apsra_c_ap4, chiu_c_ap4; latency in cycles)
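The traffic parameters of the simulation setup can be sketched as a small Python generator. The arithmetic follows the stated setup: 10 flits per packet at 0.5 flits/cycle gives one packet every 20 cycles at full link bandwidth, so a 10% injection rate means one packet every 200 cycles on average. The exponential gaps (a Poisson arrival process), the uniform choice among non-region nodes and all names are assumptions; this is not the SDL model.

```python
import random
from typing import List, Sequence, Tuple

Node = Tuple[int, int]  # (x, y) router coordinate

PACKET_FLITS = 10   # flits per packet, as in the simulation setup
MAX_LINK_BW = 0.5   # flits per cycle, i.e. one packet every 20 cycles
HOTSPOT_PROB = 0.6  # probability that a packet targets a region access point


def mean_gap_cycles(injection_rate: float) -> float:
    """Mean cycles between packet starts; injection_rate is a fraction of link bandwidth (0.1 = 10%)."""
    packets_per_cycle = injection_rate * MAX_LINK_BW / PACKET_FLITS
    return 1.0 / packets_per_cycle


def next_gap(rng: random.Random, injection_rate: float) -> float:
    """Exponentially distributed gap, giving Poisson packet arrivals (an assumption)."""
    return rng.expovariate(1.0 / mean_gap_cycles(injection_rate))


def pick_destination(rng: random.Random, src: Node,
                     access_points: Sequence[Node], other_nodes: Sequence[Node]) -> Node:
    """Hot-spot choice: a region access point with HOTSPOT_PROB, otherwise a uniformly
    chosen other node (access points are assumed to differ from src)."""
    if rng.random() < HOTSPOT_PROB:
        return rng.choice(access_points)
    candidates: List[Node] = [n for n in other_nodes if n != src]
    return rng.choice(candidates)


rng = random.Random(2006)
print(mean_gap_cycles(1.0), mean_gap_cycles(0.1))  # about 20 and 200 cycles
```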
The first result shows the average latency for all communications in the network, as depicted in Fig. 7. The lowest latency values are obtained for APSRA with a central region (apsra_c_ap4). The second lowest latency values are obtained with Chen and Chiu's algorithm and a central region (chiu_c_ap4). Next comes APSRA with the region in the bottom left corner (apsra_bl_ap3). The worst performance is shown by Chen and Chiu's algorithm with the region in the bottom left corner (chiu_bl_ap3).
In Fig. 8 we give the average latency for traffic with destinations other than the region. The worst configuration from the latency point of view, up to an injection rate of 5%, is Chen and Chiu's algorithm with the region in the centre (chiu_c_ap4). In this range, all the other combinations provide similar latency values. However, when the injection rate is increased above 5%, Chen and Chiu's algorithm with the region in the corner position (chiu_bl_ap3) rapidly saturates. Next to saturate is APSRA with the region in the corner (apsra_bl_ap3). The best result from the saturation point of view is obtained using APSRA with the region in the centre (apsra_c_ap4), although it has slightly higher latency at lower injection rates.

Fig. 8. Average latency for communications destined outside the region, with region in bottom left (bl) and centre (c), vs. injection rate in % of link bandwidth (curves apsra_bl_ap3, chiu_bl_ap3, apsra_c_ap4, chiu_c_ap4; latency in cycles)

Fig. 9. Average latency for communications destined to the region in bottom left (bl) and centre (c), vs. injection rate in % of link bandwidth (curves apsra_bl_ap3, chiu_bl_ap3, apsra_c_ap4, chiu_c_ap4; latency in cycles)

We also give results for traffic destined only to the region (see Fig. 9). In this case too, APSRA with a central region shows the best performance in terms of low latency. Here, however, Chen and Chiu's algorithm with a central region clearly gives better results than both algorithms with the region in the bottom left position. The worst performance is again shown by Chen and Chiu's algorithm with the region in the bottom left corner.
Fig. 10 gives more detail about what causes the difference in latency values. The diagrams show for how many routing cycles packets were blocked in the different routers. These results are from one of the simulations with a 10% packet injection rate, where the difference in latency was very large. Note that the scale of blocked routing cycles is not the same in the two diagrams.

Fig. 10. Blocked routing cycles/router with (a) APSRA algorithm and (b) Chen and Chiu's algorithm
Fig. 10(a) and (b) reveal that the APSRA algorithm causes less blockage than Chen and Chiu's algorithm. Note that Chen and Chiu's algorithm results in more blockage close to the north and west borders of the region. The reason is that these paths are heavily used by the algorithm when routing around the region border. APSRA, on the other hand, is not biased towards specific routes, and thus spreads the traffic more evenly around the border. As APSRA in many situations has several paths to select from, it can also avoid congested routes, which further decreases the blockage.
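The paper does not state how APSRA chooses among several admissible paths. The Python sketch below assumes a simple selection policy, choosing the admissible output port with the most free downstream buffer slots, to illustrate why a router with several admissible ports can avoid blocking where a router with a single admissible port cannot. Port names and buffer counts are hypothetical.

```python
from typing import Dict, Iterable, Optional


def select_output(admissible: Iterable[str], free_slots: Dict[str, int]) -> Optional[str]:
    """Pick, among the admissible output ports, the one whose downstream input buffer
    has the most free slots; ties go to the alphabetically first port.
    Returns None when every admissible port is full, i.e. the packet is blocked this cycle."""
    candidates = sorted(p for p in admissible if free_slots.get(p, 0) > 0)
    if not candidates:
        return None
    return max(candidates, key=lambda p: free_slots[p])


free = {"N": 0, "E": 2, "S": 1}
print(select_output(["N"], free))            # None: a single full admissible port blocks
print(select_output(["N", "E", "S"], free))  # E: an adaptive router diverts around congestion
```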
Discussion on Results
The simulation results show that APSRA has an overall advantage in communication latency for identical traffic scenarios. This is probably an effect of its unbiased behavior, which is less likely to create highly congested routes than Chen and Chiu's algorithm. In addition, the higher adaptivity of the algorithm makes it possible to avoid congested routes. This is especially evident in the results for traffic not destined to the region. In this case, a large difference is shown between APSRA and Chen and Chiu's algorithm for the region in the centre.
Even though the average distance for APSRA is slightly longer for a region in the centre, as indicated by a somewhat higher latency at lower loads, APSRA manages to keep communication below saturation up to an injection rate of approximately 8%. For the same scenario, Chen and Chiu's algorithm has significantly higher latency.
For traffic to the region, the latency is dominated more by the distance from the sources to the destinations, which in this case is shorter with a centrally placed region. As traffic to the region has a probability of 60%, it also dominates the average latency in the all communications case.

5. Conclusions
In this paper we have highlighted the importance of the region concept in mesh topology NoC architectures. We have also listed new issues which a designer will encounter while designing a heterogeneous mesh topology NoC system using multi-port or multi-access point cores. We presented and compared two deadlock free routing algorithms for mesh NoC with regions. Our analysis and simulation based evaluation demonstrate that minimal distance deadlock free algorithms designed using the APSRA methodology outperform the other algorithm, borrowed from the fault tolerance area, in terms of adaptivity and latency.
However, the area of a NoC router required by the APSRA based algorithm is expected to be larger than that of the router for the other algorithm. This is because APSRA requires tables within each router to store routing information, whereas the other algorithm can be implemented as an optimized FSM. The table based implementation of APSRA based algorithms could also be an advantage, because it allows configurability (and even dynamic reconfigurability) of routing algorithms to efficiently handle modifications in the communication requirements of the running applications. Future developments will mainly address the definition of design space exploration strategies to optimally determine region placement, shape, and number of access points.

6. References
[1] Kumar, S., Jantsch, A., Soininen, J.-P., Forsell, M., Millberg, M., Öberg, J., Tiensyrjä, K., Hemani, A.: A network on chip architecture and design methodology. IEEE Annual Symposium on VLSI (April 2002)
[2] Dally, W.J., Towles, B.: Route Packets, Not Wires: On-Chip Interconnection Networks. Design Automation Conference (DAC), Las Vegas, NV (June 2001)
[3] Bolotin, E., Morgenshtein, A., Cidon, I., Ginosar, R., Kolodny, A.: Automatic Hardware-Efficient SoC Integration by QoS Network on Chip. ICECS (2004)
[4] Pande, P.P., Grecu, C., Ivanov, A., Saleh, R.: Design of a Switch for Network on Chip Applications. Proc. Int. Symp. Circuits and Systems (ISCAS), vol. 5, pp. 217-220 (May 2003)
[5] Glass, C.J., Ni, L.M.: The turn model for adaptive routing. Journal of the Association for Computing Machinery, vol. 41, no. 5, pp. 874-902 (1994)
[6] Dally, W.J., Aoki, H.: Deadlock-free adaptive routing in multicomputer networks using virtual channels. IEEE Transactions on Parallel and Distributed Systems, vol. 4, no. 4, pp. 466-475 (April 1993)
[7] Boppana, R.V., Chalasani, S.: Fault-tolerant wormhole routing algorithms for mesh networks. IEEE Transactions on Computers, vol. 44, no. 7 (1995)
[8] Wu, J.: A Fault-Tolerant and Deadlock-Free Routing Protocol in 2D Meshes Based on Odd-Even Turn Model. IEEE Transactions on Computers, vol. 52, no. 9, pp. 1154-1169 (2003)
[9] Chen, K.-H., Chiu, G.-M.: Fault-Tolerant Routing Algorithm for Meshes without Using Virtual Channels. Journal of Information Science and Engineering, vol. 14, no. 4, pp. 765-783 (December 1998)
[10] Duato, J.: A New Theory of Deadlock-Free Adaptive Routing in Wormhole Networks. IEEE Transactions on Parallel and Distributed Systems, vol. 4, no. 12, pp. 1320-1331 (December 1993)
[11] Palesi, M., Holsmark, R., Kumar, S., Catania, V.: APSRA: A methodology for design of application specific routing algorithms for NoC systems. Technical Report DIIT-TR-01-060406, Dip. di Ingegneria Informatica e delle Telecomunicazioni, Univ. di Catania (2006)
[12] Chiu, G.-M.: The Odd-Even Turn Model for Adaptive Routing. IEEE Transactions on Parallel and Distributed Systems, vol. 11, no. 7, pp. 729-738 (2000)
[13] Holsmark, R., Kumar, S.: Design Issues and Performance Evaluation of Mesh NoC with Regions. Norchip 2005, Oulu, Finland (November 2005)
[14] Palesi, M., Kumar, S., Holsmark, R.: A Method for Router Table Compression for Application Specific Routing in Mesh Topology NoC Architectures. SAMOS VI: Embedded Computer Systems: Architectures, Modeling, and Simulation, Samos, Greece (July 17-20, 2006)

Acknowledgements
The research reported in this paper was supported by the project Specialization and Evaluation of Network on Chip Architectures for multi-media applications, funded by the Swedish K.K. Foundation. We thank Prof. Petru Eles for valuable discussions and suggestions.