Improvement The NoC Performance and Fault Tolerant by Dividing Bandwidth in Mesh and Fat-Tree Topologies

JOURNAL OF COMPUTING, VOLUME 3, ISSUE 12, DECEMBER 2011, ISSN 2151-9617 https://sites.google.com/site/journalofcomputing WWW.JOURNALOFCOMPUTING.ORG
Reza Kourdy
Department of Computer Engineering Islamic Azad University, Khorramabad Branch, Iran
Mohammad Reza Nouri rad

Department of Computer Engineering Islamic Azad University, Khorramabad Branch, Iran
Abstract
We propose a bandwidth-dividing routing algorithm that increases fault tolerance and communication load and is suited to multimedia applications in networks-on-chip. We compare the performance of Fat-Tree and 2D-Mesh architectures from the standpoint of on-chip network design methodology. Bandwidth is divided either in the source switch or in all switches, in order to provide additional bandwidth to applications that need more than a single link can supply. We also compare, for the Mesh and Fat-Tree topologies, the effect of link delay on the convergence of the two portions of traffic divided in the source switch. Finally, we carry out a high-level simulation of the on-chip network using NS-2 to verify the analysis.
Keywords- Dividing Bandwidth, Communication Load, Fault-Tolerance, Network-on-Chip, Traffic Convergence.
I. INTRODUCTION
Modern integrated circuits (ICs) are becoming increasingly complex, which makes these high-performance ICs difficult to design, manufacture, and integrate. The advent of multiprocessor systems-on-chip (SoCs) makes it even more challenging for programmers to exploit the full potential of the on-chip computation resources [1]. With increasing reliability concerns for current and next-generation VLSI technologies, fault tolerance is fast becoming an integral part of system-on-chip and multi-core architectures [2]. As CMOS technology scales down into the deep submicron (DSM) domain, devices and interconnects are subject to new types of malfunctions and failures that are harder to predict and avoid with current SoC design methodologies [3]. Network-on-Chip (NoC), a new chip design paradigm proposed concurrently by many research groups [4], [5], [6], is expected to be an important architectural choice for future SoCs [8] and is becoming a standard for on-chip global communication [7]. NoC is likely to become an attractive alternative for implementing SoCs in many application areas, such as real-time multimedia applications. This implies that the underlying on-chip communication network will be required to provide deterministic bounds on delay and throughput for communication among some pairs of cores on the chip [9]. We use the Network Simulator ns-2 [10], [11], which has been used extensively in research on the design and evaluation of public-domain computer networks, to evaluate various design options for the NoC architecture, including the router design, the communication protocol, and the routing algorithms.
II. BACKGROUND
A. Dividing Bandwidth Mechanism
We use an arbiter that forwards data packets toward the destination in a round-robin manner, so that the bandwidth of more than one link can be used to increase the communication load of the cores. As shown in Fig. 1, bandwidth utilization is increased by assigning packets alternately to two outgoing flows according to whether their sequence number is odd or even.
Figure 1. Dividing packets into odd and even flows in the source switch.
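The odd/even division described above can be sketched as follows. This is a minimal Python illustration, not the authors' arbiter: the function name `split_round_robin`, the two-link default, and the use of sequence numbers starting at 1 are assumptions made for the example.

```python
from typing import Dict, List


def split_round_robin(seq_numbers: List[int], num_links: int = 2) -> Dict[int, List[int]]:
    """Assign packets to outgoing links by sequence number, round-robin.

    With num_links == 2 this yields one flow of even-numbered packets
    (link 0) and one flow of odd-numbered packets (link 1).
    Hypothetical helper, for illustration only.
    """
    flows: Dict[int, List[int]] = {link: [] for link in range(num_links)}
    for seq in seq_numbers:
        flows[seq % num_links].append(seq)
    return flows


# Eight consecutive packets leaving the source switch.
flows = split_round_robin(list(range(1, 9)))
even_flow, odd_flow = flows[0], flows[1]
print("odd flow: ", odd_flow)
print("even flow:", even_flow)
```

Each flow carries half of the packets, so each outgoing link sees roughly half of the source rate.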
B. Fault Model
There exist several dimensions along which to classify the faults that can occur during the life cycle of an MPSoC. We list the classification as follows:
Duration. Faults can be classified as permanent or transient [12]. In an MPSoC, both types can occur during the chip's life cycle. Crash failures are permanent faults that occur when a tile halts prematurely or a link disconnects after having behaved correctly until the failure. Transient faults can be either omission failures, in which links lose some messages and tiles intermittently omit to send or receive, or arbitrary failures (also called Byzantine or malicious), in which links and tiles deviate arbitrarily from their specification, corrupting or even generating spurious messages [13].
Location. In general, MPSoC designs consist of two integrated parts, the Processing Elements (PEs) and the Network-on-Chip (NoC), and faults can occur in both. If a fault occurs in the PEs, the computation results will be erroneous. Dynamic fault detection and masking are needed to make sure that the erroneous results do not contaminate the application environment. If a fault occurs in the communication path, such as a link failure or a scrambled message, a fault-tolerant communication protocol suite, including error-resilient coding schemes, is needed to ensure the reliable delivery of on-chip messages on top of an unreliable on-chip communication substrate.
Time to Failure. Faults can occur throughout the lifetime of an IC. Taking the point at which the chip is packaged and tested as the watershed event, we distinguish between before-shelf faults and after-shelf faults. Currently, chips with before-shelf faults, i.e., defects discovered during testing, are invariably discarded, and only dies with no discovered defects are shipped as products. With shrinking feature sizes, it is becoming increasingly difficult to achieve a decent yield at reasonable cost. The low-yield problem will become more acute for the 90 nm technology and beyond.
On the other hand, the potential yield of the manufacturing process can increase tremendously if some defects on the die can be tolerated during the IC's after-shelf life. Static fault masking and isolation techniques, both hardware and software based, make it possible to use chips previously deemed bad in commercial products, such as picoChip [14]. For after-shelf faults, dynamic fault detection and recovery are needed to ensure that the chip functions correctly for as long as possible. Furthermore, graceful degradation of system performance is necessary for some mission-critical applications.
III. SYSTEM ARCHITECTURE
A. Hardware Architectures
The common characteristic of NoC architectures is that the constituent IP cores communicate with each other through switches [15]. Our NoC is a scalable, IP packet-switched communication platform for single-chip systems. The architecture consists of resources that communicate with each other through switches. Resources can be heterogeneous or homogeneous; a resource can be an intellectual property (IP) core. The two promising on-chip network topologies are Mesh and Fat-Tree [16]. Petrini et al. [17] compared the performance of the two topologies with respect to physical constraints; that comparison included the Fat-Tree topology first proposed by Leiserson [18]. We use the Nostrum Mesh architecture, a two-dimensional 4×4 Mesh (see Fig. 2), and a Fat-Tree (see Fig. 4). These topologies are easily scaled to different sizes. We describe our architecture as follows:
1) Mesh Topology
The k-ary d-dimensional Mesh architecture is defined by its dimension d and radix k, which gives a total of $k^d$ switches. The $k^d$ switches are organized in a d-dimensional grid, with k switches in each dimension and wrap-around connections. Since the number of IPs that can be connected to one switch is $d - 1$, the total number of IPs is
$N_{\text{Mesh}} = k^d (d - 1)$.
Denoting by b the one-directional link bandwidth, the total bandwidth is $B_{\text{Mesh}} = 2 k^d b$ [18]. Bandwidth can be divided either in the first switch only or in all switches, as shown below:
Figure 2. Dividing bandwidth in the first switch, with the effect of convergence, in Nostrum NoCs.
As shown in Fig. 2, the traffic between res11 and res44 is divided into two portions in the first switch and forwarded to the destination, which may cause the two portions to converge; we demonstrate this situation in this paper. Alternatively, bandwidth can be divided in all switches, as shown in Fig. 3.
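As a quick check of the sizing formulas, the sketch below evaluates them for the 4-ary 2-dimensional case used in this paper. It assumes that the flattened exponents in the text read as $k^d$ and $k^{d-1}$; the function names are hypothetical, and the Fat-Tree expressions are the ones stated in the Fat-Tree architecture subsection.

```python
def mesh_size(k: int, d: int, b: float = 1.0) -> dict:
    """Switches, IPs and total bandwidth of a k-ary d-dimensional mesh.

    Assumes the formulas read: switches = k^d,
    N_Mesh = k^d * (d - 1), B_Mesh = 2 * k^d * b.
    """
    switches = k ** d
    return {
        "switches": switches,
        "ips": switches * (d - 1),
        "bandwidth": 2 * switches * b,
    }


def fat_tree_size(k: int, d: int, b: float = 1.0) -> dict:
    """Assumes N_FatTree = k^d, S_FatTree = k^(d-1) * d, B_FatTree = 2 * k^d * d * b."""
    return {
        "switches": k ** (d - 1) * d,
        "ips": k ** d,
        "bandwidth": 2 * k ** d * d * b,
    }


# k = 4, d = 2, with bandwidth expressed in units of b (b = 1.0).
mesh = mesh_size(4, 2)
tree = fat_tree_size(4, 2)
print("mesh:    ", mesh)
print("fat-tree:", tree)
```

Under these readings both topologies connect 16 IPs for k = 4 and d = 2, which is the equal-resource setting the comparison relies on.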
Figure 3. Dividing bandwidth in all switches in Mesh NoCs.
As shown in Fig. 3, the traffic near switch1 (ingress) and switch31 (egress) is heavier than in the core of the network.
2) Fat-Tree Architecture
The number of IPs, $N_{\text{Fat-Tree}}$, and the number of switches, $S_{\text{Fat-Tree}}$, are calculated straightforwardly as $N_{\text{Fat-Tree}} = k^d$ and $S_{\text{Fat-Tree}} = k^{d-1} d$, respectively. Therefore, with the one-directional bandwidth denoted by b, the total bandwidth is $B_{\text{Fat-Tree}} = 2 k^d d b$.
Figure 4. Dividing bandwidth in a two-dimensional Fat-Tree topology.
We compare the 4-ary 2-dimensional Mesh above with a Fat-Tree architecture with equal resources.
IV. EVALUATIONS
A. Simulation Framework
We analyze the performance of a Mesh-based NoC and a Fat-Tree NoC in the presence of permanent faults, together with the effect of link delay on the two portions of divided bandwidth. If bandwidth is divided in the source switch, the traffic is divided by two, and if the mechanism is used in all nodes, the traffic is divided by two at each level of switches (see Fig. 3). However, if the bandwidth is divided only in the source switch under the IP protocol, the expected bandwidth for the application may not be reached (see Fig. 2). We consider two problems that may occur in our NoCs under this scheme:
1) Convergence in Divided Bandwidth
As shown in Fig. 2, dividing bandwidth in the source switch under the IP protocol suffers from convergence of the two portions of traffic, because the IP protocol routes each portion along the shortest path. In the Fat-Tree, owing to its multiple diverse paths, this problem does not exist (see Fig. 4).
2) Faults
In our simulation, transient faults have less effect on multimedia traffic at run time, so we treat them as negligible and model only a static permanent fault. The part of the ns-2 script that constructs the permanent fault is shown below:
$ns rtmodel-at 1.0 down $switch1 $switch2
B. Simulation Results
Four parameters are defined for our performance evaluation: fault tolerance, packet reordering, communication load, and the effect of link delay. From the numerical results we conclude the following.
1) Fault Tolerance
We consider a permanent fault that occurs at time 1.0, three hops from the source switch. As shown in Fig. 5, in the Mesh the number of lost packets increases significantly because of the convergence of the two portions of traffic. Without the packet-dividing mechanism, the traffic overflows because the source rate exceeds the bandwidth of one link.
Figure 5. Lost packets in the 2D-Mesh and Fat-Tree topologies.
Thus, depending on when and where the fault takes place, the packets are rerouted to other switches. If the fault occurs near the source or destination node, the number of lost packets is similar to that of traditional routing, because the communication load is higher near the source switch. If our scheme is used in all switches, however, the traffic is divided at each level, so even a permanent fault in the core of the chip can be tolerated: the traffic there is reduced, and the permanent fault may cause no packet loss (see Fig. 3). The permanent fault itself accounts for fewer lost packets than the convergence of the two portions of traffic (see Fig. 5); that is, the losses are caused mainly by the convergence that arises when bandwidth is divided in the source switch.
2) Packet Reordering
In ordinary IP routing, packets follow a single path, so packet reordering is not needed. With bandwidth dividing, packets are reordered at the destination node by the transport layer of the TCP/IP protocol stack. Fig. 6 shows the packet reordering in the different topologies.
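Reordering at the destination can be pictured as a small resequencing buffer. The sketch below only illustrates the idea and is not the transport-layer code used in the simulation; the class name `ReorderBuffer` and the assumption that sequence numbers start at 1 and increase by 1 are hypothetical.

```python
from typing import Dict, List


class ReorderBuffer:
    """Release packets in sequence order, holding back early arrivals."""

    def __init__(self, first_seq: int = 1) -> None:
        self.next_seq = first_seq          # next sequence number to deliver
        self.pending: Dict[int, str] = {}  # packets that arrived too early

    def receive(self, seq: int, payload: str) -> List[str]:
        """Store one packet and return every payload that is now in order."""
        self.pending[seq] = payload
        released: List[str] = []
        while self.next_seq in self.pending:
            released.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return released


# The odd and even flows arrive interleaved and out of order.
buf = ReorderBuffer()
arrivals = [(2, "p2"), (1, "p1"), (4, "p4"), (6, "p6"), (3, "p3"), (5, "p5")]
delivered: List[str] = []
for seq, payload in arrivals:
    delivered.extend(buf.receive(seq, payload))
print(delivered)
```

A packet from the faster flow waits in `pending` until its predecessor from the slower flow arrives, so the delay difference between the two paths determines how much buffering the destination needs.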
Figure 6. Packet reordering in the Mesh and Fat-Tree topologies.
3) Communication Load
We assume that the traffic between the resources (IPs) and the switches is much higher than the switch-to-switch traffic. The traffic source for the multimedia application is UDP at 800 kbit/s, and the bandwidth of every link is 500 kbit/s. Without dividing packets into odd and even flows, the bandwidth is limited to 500 kbit/s (the bandwidth of one link), whereas with our scheme it can increase up to 1 Mbit/s. In traditional communication, packets are forwarded to the destination along one path, so the communication of the cores is limited to the bandwidth of one link.
Figure 7. Bandwidth utilization with the dividing mechanism.
As shown in Fig. 7, the Fat-Tree architecture is suitable for on-chip network switching cores and performs better than the 2D-Mesh architecture. In the Mesh, the bandwidth obtained by division may still be limited to the bandwidth of one link because of convergence. In the Fat-Tree topology, the two portions of traffic do not converge, so there is no such limitation: the network supports the expected bandwidth, and dividing bandwidth is more effective.
4) Effect of Link Delay on Convergence
Link delay influences the convergence of the two portions of traffic: it can cause them to overlap, which limits the communication load of the resources to the bandwidth of one link. In our simulations, we therefore set the link-delay parameter of all links to 5, 10, 15, and 20 milliseconds and examined the effect on the bandwidth utilization of our scheme in the different topologies. The impact of the link delay is evident. Network latency is defined as the time taken to move data from the source PE to the destination PE. It includes the message processing overhead, the link delay, and the data processing delay at the intermediate nodes. Network latency is a function of the network topology (which determines the number of nodes and links comprising a network) and the communication protocol (which determines the processing requirements for routing and flow control) [19]. We therefore use latency as the primary metric for assessing the performance of our scheme in the NoC topologies considered.
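The communication-load figures quoted in this section (an 800 kbit/s source over 500 kbit/s links) can be checked with a small sketch. The model below is a deliberate simplification and not the ns-2 simulation: traffic is split evenly over the given paths, a link shared by several paths carries the sum of their shares, and each path is scaled by its most heavily loaded link. The function name and the link names are hypothetical.

```python
from collections import Counter
from typing import List


def delivered_rate(source_kbps: float, link_kbps: float, paths: List[List[str]]) -> float:
    """Rate reaching the destination when traffic is split evenly over paths.

    Each path is a list of distinct link names. A link used by several
    paths carries the sum of their shares and is capped at link_kbps.
    """
    share = source_kbps / len(paths)
    # How many paths cross each link.
    usage = Counter(link for path in paths for link in path)
    total = 0.0
    for path in paths:
        worst_load = max(usage[link] * share for link in path)
        total += share * min(1.0, link_kbps / worst_load)
    return total


single = delivered_rate(800, 500, [["l1", "l2"]])                   # one path, no division
disjoint = delivered_rate(800, 500, [["l1", "l2"], ["l3", "l4"]])   # two separate paths
converged = delivered_rate(800, 500, [["l1", "lx"], ["l3", "lx"]])  # both flows share link lx
print(single, disjoint, converged)
```

With two disjoint paths each flow carries 400 kbit/s and fits within a 500 kbit/s link, whereas a single shared link brings the total back to the one-link limit, which is the convergence effect described for the Mesh.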
Figure 8. The effect of link delay in the Mesh topology.
As shown in Fig. 8, the maximum link delay for the switch links in the Mesh topology at which our scheme shows no convergence is 10 milliseconds; larger link delays cause the odd and even traffic flows to converge. There is no such limitation in the Fat-Tree topology (see Fig. 9).
Figure 9. The effect of link delay in the Fat-Tree topology.
Therefore, dividing bandwidth in the source switch is more effective in the Fat-Tree topology than in the Mesh topology across the different link delays: the two portions of divided traffic do not converge and do not overflow.
V. CONCLUSIONS AND FUTURE WORK
In this paper, a novel bandwidth-dividing mechanism for on-chip networks has been proposed. Its features make the proposed mechanism suitable for a wide range of applications that need more bandwidth than a single link provides.
REFERENCES
[1] X. Zhu, W. Qin, "Prototyping a Fault-Tolerant Multiprocessor SoC with Run-time Fault Recovery", DAC 2006, July 24-28, 2006, San Francisco, California, USA.
[2] Sumit D. Mediratta, Jeffrey T. Draper, "Characterization of a Fault-tolerant NoC Router", In Proceedings of ISCAS 2007, pp. 381-384.
[3] T. Dumitras, S. Kerner, R. Marculescu, "Towards On-Chip Fault-Tolerant Communication", Department of Electrical and Computer Engineering, Carnegie Mellon University, April 2003.
[4] M. Sgroi, et al., "Addressing the System-on-a-Chip Interconnect Woes Through Communication-based Design", 38th Design Automation Conference, June 2001.
[5] Luca Benini, Giovanni De Micheli, "Network on Chips: A new SoC Paradigm", IEEE Computer, Jan. 2002.
[6] Shashi Kumar, et al., "A Network on Chip Architecture and Design Methodology", IEEE Computer Society Annual Symposium on VLSI, Pittsburgh, Pennsylvania, USA, April 2002.
[7] S. D. Mediratta, J. Draper, "Performance Evaluation of Probe-Send Fault-tolerant Network-on-chip Router", 2007 IEEE.
[8] Y-R. Sun, S. Kumar, and A. Jantsch, "Simulation and evaluation of a network on chip architecture using ns-2", In Proceedings of the IEEE NorChip Conference, November 2002.
[9] L. Singh Sayana, Prof. M. R. Bhujade, Seminar Report on Network On Chip, Computer Science and Engineering, IIT Bombay, April 1, 2008.
[10] LBNL Network Simulator, http://www-nrg.ee.lbl.gov/ns/.
[11] The network simulator ns-2, available at http://www.isi.edu/nsnam/ns/.
[12] D. K. Pradhan, Fault-Tolerant Computer System Design, Prentice-Hall, Inc., 1996.
[13] Ahmed Amine Jerraya, Sungjoo Yoo, Diederik Verkest, Norbert Wehn, "Embedded software for SoC", Springer, pp. 373-386, 2003.
[14] W. Robbins, Redundancy and binning of picoChip processors, Fall Processor Forum, 2004, San Jose, CA.
[15] Nostrum, http://www.imit.kth.se/info/FOFU/Nostrum.
[16] Vu-Duc Ngo, Hae-Wook Choi, "On Chip Network: Topology design and evaluation using NS2", System VLSI Lab, SITI research center, Information and Communication University (ICU), Feb. 2005.
[17] F. Petrini and M. Vanneschi, "Network performance under Physical Constraints", Proceedings of International Conference on Parallel Processing, pp. 34-43, Aug 1997.
[18] C. E. Leiserson, "Fat Trees: Universal networks for hardware efficient supercomputing", IEEE Transactions on Computers, C-34, pp. 892-901, Oct 1985.
[19] J. Nurmi, H. Tenhunen, J. Isoaho, and A. Jantsch, editors, "Interconnect-Centric Design for Advanced SoC and NoC", Kluwer Academic Publishers, 2004.
