Escolar Documentos
Profissional Documentos
Cultura Documentos
a r t i c l e i n f o a b s t r a c t
Article history: Interconnection architectures for hierarchical monitoring communication in parallel System-on-Chip
Available online 5 January 2010 (SoC) platforms are explored. Hierarchical agent monitoring design paradigm is an efficient and scalable
approach for the design of parallel embedded systems. Between distributed agents on different levels,
Keywords: monitoring communication is required to exchange information, which forms a prioritized traffic class
Hierarchical monitoring services over data traffic. The paper explains the common monitoring operations in SoCs, and categorizes them
Network-on-Chip into different types of functionality and various granularities. Requirements for on-chip interconnections
Interconnection architectures
to support the monitoring communication are outlined. Baseline architecture with best-effort service,
Quality of service
time division multiple access (TDMA) and two types of physically separate interconnections are dis-
cussed and compared, both theoretically and quantitatively on a Network-on-Chip (NoC)-based platform.
The simulation uses power estimation of 65 nm technology and NoC microbenchmarks as traffic traces.
The evaluation points out the benefits and issues of each interconnection alternative. In particular, hier-
archical monitoring networks are the most suitable alternative, which decouple the monitoring commu-
nication from data traffic, provide the highest energy efficiency with simple switching, and enable
flexible reconfiguration to tradeoff power and performance.
Ó 2009 Elsevier B.V. All rights reserved.
0141-9331/$ - see front matter Ó 2009 Elsevier B.V. All rights reserved.
doi:10.1016/j.micpro.2009.12.002
L. Guang et al. / Microprocessors and Microsystems 34 (2010) 118–128 119
not differentiate the large variety of monitoring services by their fine-grained monitoring method incurs small area and power
specific requirements. Driven by current technology trend, the de- overhead.
sign of interconnection networks including the realization of QoS Moreover, run-time optimization of specific parameters is an-
needs to consider the emerging considerations. Power and energy other category of monitoring services widely adopted, for instance
efficiency, in particular, has become one of the most important thermal management and power optimization. Thermal manage-
concerns [8,9], since the power density is increasing much faster ment is usually a coarse-grained operation but based on different
than the battery and cooling capacity. thermal conditions specific requirements may be incurred. For in-
This paper analyzes the features of various types of monitoring stance, the prospect of a thermal breakdown requires an urgent
services in the hierarchical agent monitored platform, and outlines monitoring operation, while normal thermal optimization is a
the interconnection requirements to support these features. Sev- comparatively slow process. In [13], the thermal reconfiguration
eral interconnection alternatives, including best-effort architec- interval is set as 167 ls (hundreds of thousands of cycles for oper-
ture, TDMA-based approach and physically separate networks are ating frequency in GHz domain). Power monitoring is more com-
discussed and experimented with diverse settings. The simulation mon for embedded systems. Dynamic voltage and frequency
is performed with 65 nm power and area parameters on an 8 8 scaling (DVFS), as an effective measure to reduce run-time power
on-chip network platform. From these theoretical and quantitative consumption, has been applied in different granularities. Global-
analysis, preferable interconnection alternative under current wise or per-cluster DVFS is a conventional technique, but consid-
technology trend is identified. ered inefficient for certain spatially varying traffic patterns [14].
The rest of the paper is outlined as follows: Section 2 presents Fine-grained per-core DVFS has been proposed with fast voltage
the hierarchical agent monitoring approach and analyzes the fea- switching and low overhead, for instance [14] uses voltage transi-
tures of various monitoring services. Section 3 examines the inter- tion period around 100 ns. Power-gating, which turns off certain
connection requirements to accommodate monitoring components to reduce the static power consumption, is an impor-
communication of different features on the hierarchical agent tant technique for parallel systems. Power-gating can be applied in
monitored system, and discusses four interconnection architec- a coarse-grained manner if a large portion of resources are not ac-
tures in a theoretic manner. Section 4 presents quantitative evalu- tive for a period of time. It can also be applied to fine-grained cir-
ation of these alternatives focusing on communication latency, cuits with less timing penalty. For instance [15], presents on/off
energy and area overhead. Section 5 concludes the paper and dis- channels for interconnection networks, and the transition delay
cusses the on-going work. as low as 10–100 cycles was considered in the experiment.
Monitoring services, as a stand-alone concept, have been pre-
liminarily studied in few previous works. The work in [16] focuses
2. Hierarchical agent monitored System-on-Chip
on the monitoring services for coarse-grained configuration and
debugging operations. The configuration flow exemplified contains
Monitoring services are needed to provide system tracing and
the messages for setting connections, which is an important yet
adjustment at each architectural level. For parallel on-chip sys-
very small part of the potential monitoring traffics on a general-
tems, hierarchical monitoring agents are proposed to perform all
purpose platform. Two other pioneering works [17,18] discuss
types of monitoring operations. Here we will first examine typical
run-time monitoring services targeting NoC-based platform. These
monitoring services in SoCs (Section 2.1), and then present the no-
works focus on functional analysis of the monitoring traffic and the
vel hierarchical agent framework (Section 2.2) .
modification of the design flow with area overhead examined.
However, new considerations, for instance power efficiency of
2.1. Monitoring services for on-chip systems monitoring operations and reconfigurability, have rarely been ad-
dressed in existing works.
Many types of monitoring operations, from system-level to cir-
cuit-level, have been proposed. They are required for resource 2.2. Hierarchical agent monitoring services
management, fault-tolerance, and adjustment of specific run-time
parameters. A scalable approach, hierarchical agent monitoring (Fig. 1), is
Application mapping and network configuration are system-le- proposed to design run-time monitoring services for parallel on-
vel management, which can be performed at run-time given chip systems. An agent is an embedded module, which traces nec-
changing application scenarios and system performance. Properly essary information and correspondingly configures the assigned
mapping instruction flows onto resources is an effective method components dynamically (Fig. 2). The required information may
to improve performance. For instance [10], presents efficient algo- include the status of the assigned components, the messages sent
rithms to map processes onto NoC nodes with minimal expected by other agents, and the environment status if relevant to the per-
energy consumption under performance constraints. This type of formance of the monitored components. The monitoring hierarchy
monitoring service is a coarse-grained operation, and performed is designed based on a hierarchical view of parallel on-chip sys-
infrequently while incurring significant amount of reconfiguration tems. Such a system commonly consists of a pool of components
overhead. There also exist methods of fine-grained application connected by communication interconnections. A fine-grained
mapping, which configure a small number of resources to realize component, either a processing unit or a communication compo-
a relatively simple processing task. For example, nine processors nent, is identified as a cell. A group of cells can be dynamically con-
can be utilized to realize an JPEG encoder on the 36-processor figured into a cluster, a relatively coarse-grained unit of resources.
ASAP chip [11]. A cluster may be assigned for a specific subtask in the application.
Another generic category of monitoring services is fault-toler- The whole platform consists of a number of dynamically assigned
ance, which can be coarse or fine-grained. In a dual or multi-core and configured clusters, as well as the remaining cells which are
platform, the failure of a processing core is a coarse-grained error. either spares or broken components.
With the parallelization of on-chip resources, the fault manage- Hierarchical agents provide monitoring services of various func-
ment for a single core or even a communication channel becomes tionalities (Table 1) on each structural level (Fig. 3).
a fine-grained monitoring operation. For instance [12], describes Application agent specifies the application requirements, func-
using spare wires with local reconfiguration circuitry to deal with tional or non-functional, for instance the expected speedup and
permanent errors in individual communication channels, and the the affordable peak power, to the platform agent. The platform
120 L. Guang et al. / Microprocessors and Microsystems 34 (2010) 118–128
Table 1
Major monitoring services on hierarchical agent monitored SoCs.
agent configures coarse-grained global-wise parameters, for in- by retransmission handled by a monitor on the communication
stance the network topology and universal switching and routing channel [12]. However if whole cells suffer from permanent errors,
techniques. It observes the platform performance, for instance the high-level topology and routing algorithm may need to be rec-
the total power consumption of the platform and the average la- onfigured [19], which belongs to the responsibility of the platform
tency of all traffic flows. The cluster agent is assigned with finer- agent.
grained monitoring operations within a cluster. For instance, the The agents themselves are also victims of errors. The failure of
voltage and frequency of a certain cluster can be set specifically agents makes the corresponding components non-observable.
in case the program running on the cluster requires a faster/slower The failure of the platform agent or a cluster agent significantly
processing speed. The cluster agent observes the performance influence the system operation considering the amount of re-
within a cluster, for instance the thermal hotspot and the traffic sources it is in charge of. This failure needs to be immediately de-
congestion. The lowest level cell agent provides fine-grained local tected and handled. The cell agent failure, though being less
monitoring. Common operations include error detection and critical, still requires fast handling as the local data processing
power gating. and communication are seriously influenced with unpredictable
Besides performance optimization, hierarchical fault-tolerance state. To deal with agent errors, higher level agent regularly checks
is also provided. Each level of agent monitors the failure of local the status of lower level agent. When one agent fails, there can be
components and attempts to fix the error itself. If the errors cannot various mechanisms to ensure proper running of the application.
be fixed by the local agent, the errors will be reported to the higher One alternative is to incorporate backup agents, especially on the
level agent. For instance, transient transmission errors can be fixed level of application and platform agent, where catastrophic conse-
L. Guang et al. / Microprocessors and Microsystems 34 (2010) 118–128 121
quences may happen in case of agent failure. Another alternative is tion. CDMA method decouples traffic classes by using orthogonal
to dynamically assign resources to a healthy agent. The detailed spreading codes. Code generators and demodulators [23] are
discussion of fault-tolerance in hierarchical agent monitored plat- needed in transmitters and receivers respectively. Physically sepa-
form is beyond the scope of this paper. rate networks, instead of using virtual channels, decouple traffic
classes by using dedicated links. Silicon area is sacrificed for sim-
pler switching and arbitration. For instance the TILE64 processor
3. Interconnection architectures for hierarchical monitoring
incorporates five physically separate networks for different types
communication
of traffics [24].
Monitoring communication is a special traffic class to enable
Interconnection architectures require careful analysis to sup-
monitoring operations. Monitoring services, as used to trace and
port hierarchical monitoring communication. The study is driven
adjust system status in case of failure and performance modifica-
by the general concept of quality of service in on-chip communica-
tion (Table 1), are performed constantly during application execu-
tion (Section 3.1), where monitoring communication is treated as a
tion. Thus, the monitoring communication needs to be treated with
prioritized traffic class over data traffic. We will examine the
guaranteed services, decoupled from the data traffic. Several previ-
requirements for this type of traffic class (Section 3.2), and present
ous works have addressed the issues of QoS for certain monitoring
several potential interconnection architectures (Section 3.3).
communication flows. The work in [7] identifies signaling traffic
along with three types of data traffics. The signaling traffic it con-
3.1. Quality of service for on-chip communication siders covers urgent messages which should have the highest pri-
ority and very fast connection. Routing and switching
Quality of service (QoS), in terms of network communication, microarchitectures to provide guaranteed transmission targeting
refers to the methods of providing the required performance to asynchronous systems are studied in [21], though some analysis
specific network clients, for instance a type of traffic [20]. There is applicable to synchronous systems as well. In the context of hier-
are generically two types of QoS services, best-effort and guaran- archical monitoring services, various types of communication may
teed communication. Best-effort service, which provides no special be transmitted on different level of interconnection. Thus the dif-
treatment to individual traffic classes, has the lowest design com- ferentiating these types is necessary and also beneficial since tai-
plexity, while being not able to offer predictable and guaranteed lored configurations can be applied.
performance. Guaranteed services, on the other hand, ensure cer-
tain metrics of performance for a specific traffic class, for instance
guaranteed bandwidth or bounded latency. Usually the provision 3.2. Requirements for hierarchical monitoring communication
of guaranteed services requires the constraints on the traffic itself,
for example an upper boundary on the maximal traffic load. A general-purpose platform is expected to experience various
There exist various methods to provide guaranteed QoS in on- types of monitoring services. Each of these services have require-
chip communication. TDMA, CDMA (code division multiple access) ments with different priorities based on their performance fea-
and physically separate networks are three widely-used tech- tures. Table 2 specifies the priorities of each service type
niques. TDMA reserves time slots for a specific traffic class, usually characterized by its timing features (details of these services can
combined with buffer space reservation [21]. It requires modifica- be found in Table 1). Based on these priorities, several require-
tion of the switching fabrics [22] to offer virtual channel arbitra- ments on the interconnection architectures can be identified.
122 L. Guang et al. / Microprocessors and Microsystems 34 (2010) 118–128
Table 3
Qualitative comparison of interconnection alternatives for hierarchical monitoring communication.
Fig. 6. Positions of cluster and platform agents on the experimental NoC platform.
6.5
4.5
3.5
2.5
(b) TDMA-based Switch
2
0.5 0.55 0.6 0.65 0.7 0.75 0.8 0.85 0.9 0.95 1
Supply Voltage
(c) Unified Monitoring Network Power estimation of on-chip communication is enabled by Ca-
dence simulation and external Orion 2.0 tool [25]. We modeled
the inter-switch links in Cadence Spectre using 65 nm STMicro-
electronics technology, and simulated the maximal switching fre-
quencies within a range of supply voltages (Fig. 7). To enable
tradeoff analysis of different network settings, one high voltage/
frequency pair (6 GHz, 1.0 V) and a low voltage/frequency pair
(2 GHz, 0.5 V) are extracted, with the voltage values slightly higher
than the minimal figures in the curve to give proper design slack. It
should be noted that the pairs are chosen for experimental pur-
poses, and in practice, the voltage and frequency should be config-
ured based on the specific platform and application requirements.
Given the two pairs of voltage and frequency settings, the en-
ergy consumption of the interconnection components can be esti-
mated by Orion 2.0. This tool provides accurate system-level
(d) Hierarchical Monitoring Networks calculation of power consumption for on-chip switches and links,
by using detailed modeling equations and technology parameters
Fig. 5. Interconnection architectures for hierarchical monitoring communication on [25]. With supply voltage and working frequency specified, the en-
NoCs. ergy consumed by flits traversing the interconnection can be esti-
mated. Each switch is modeled with the input-buffered structure
works, X–Y routing is used. The locations of cluster agents and the (Fig. 5b), with matrix crossbar and register-type buffers. Each link
platform agent for experimental purposes are illustrated in Fig. 6 is 1 mm long, as intermediate wirings. For TDMA-based connec-
considering reasonable geographic symmetry. It should be noted tion, two virtual channels are integrated. The monitoring networks
L. Guang et al. / Microprocessors and Microsystems 34 (2010) 118–128 125
are assumed as 8-bit wide. Other parameters are set as default val- Average transmission per-flit latency of each trace (Table 4) un-
ues in Orion tool, using 65 nm technology. der the seven network settings (Table 5) is summarized in Fig. 8.
The latency of monitoring communication between cell and cluster
4.3. Synthetic traffic patterns agents is measured in the area with highest data traffic, for traces
with spatial variation (T2 and T3). The values are normalized with
NoC microbenchmarks characterize traffic patterns potentially the lowest latency measured in the experiments, so that they can
experienced in on-chip networks, and are suitable for early-stage be compared across different traffic traces. We can observe that
architectural explorations of interconnection design [26,27]. To the monitoring communication latency is heavily influenced by
evaluate the presented interconnection alternatives, four traffic the data traffic in the baseline architecture (S0), as the latencies
traces generated from NoC microbenchmarks [27] are used in the are large in heavy data traffic (for instance T1 and T3) while rea-
experiments, categorized into three types of traffic patterns: uni- sonably small in light data traffic (T0). In TDMA-based intercon-
form, locality and hotspot (Table 4). nection (S1), the latency is relatively high but not dependent on
Uniform traffic pattern assumes that every node has the same the data traffic pattern. The latencies for the two types of monitor-
probability to be the transmission destination. T0 and T1 are both ing communication remain constant in each traffic trace since the
uniform traffics though with different amount of traffics, in order bandwidth is ensured. Unified and hierarchical monitoring net-
to evaluate the influence of temporal traffic variation. works (S2–S6) provide low-latency transmission for the monitor-
Locality pattern models the network traffic where adjacent ing communication, as being decoupled from the data traffic. In
nodes have higher probability of mutual transmission. The pattern particular, in hierarchical monitoring networks, the transmission
is modeled by Eq. (1), where PðdÞ is the probability of transmission latency can be set specifically to the requirements of each level
destined to a node with the distance of d. AðDÞ is the normalizing of monitoring communication. For example, if the monitoring com-
factor that makes sure the probabilities sum up to 1. D is the max- munication between the cell and cluster agents requires fast con-
imum distance in the network. Existing network mapping algo- nection, while the higher level communication between the
rithms, for instance [10], locate heavily communicating processes cluster and platform agents allows for longer delay, setting S5
onto nearby processing units, so that the total energy consumption can be configured.
can be minimized because of shorter communication distances. Average per-flit energy consumption of each network setting is
With such mapping algorithms, locality traffic traces are likely to summarized in Fig. 9. The measurement of energy consumption is
appear in the network. not influenced by the traffic patterns since the number of transmis-
sion hops is the same (as we use minimal routing). The values are
PðdÞ ¼ 1=ðAðDÞ 2d Þ ð1Þ normalized with the that of the most energy consuming setting for
X
D the specific level of monitoring communication. We can observe
AðDÞ ¼ ð1=2d Þ from Fig. 9 that TDMA-based interconnection consumes the high-
d¼1
est energy, while unified and hierarchical monitoring networks
Hotspot pattern assigns a high transmission probability to cer- provide significant benefits in energy efficiency. Such improve-
tain regions of the network. Such pattern of communication is ment originates from the difference in energy consumption of flits
likely to appear when certain processors are major data consumers traversing each type of routers (Fig. 10). Hierarchical monitoring
in a parallel computing platform. Eqs. (2) and (3) model the prob- networks, in particular, considerably reduce the energy consump-
abilities of a node in the hotspot region and other region as the tion of the communication between cluster and platform agents,
transmission destination respectively. q is the fraction of the traffic since the specialized network uses low-degree crossbars (Fig. 5d),
targeted to the hotspot region. N hotspot is the number of nodes in the which consume the lowest energy (Fig. 10). In addition, hierarchi-
hotspot region. N network is the total number of nodes in the network. cal monitoring networks allow configurable tradeoff on the two
levels of monitoring communication. For example, if the coarse-
PðhotspotÞ ¼ q=Nhotspot ð2Þ
grained monitoring service on clusters issued by the platform
PðotherÞ ¼ ð1 qÞ=ðNnetwork Nhotspot Þ ð3Þ agent is latency-tolerant, the monitoring communication between
platform and cluster agents can be set with low power configura-
4.4. Simulation results tion while leaving the low level monitoring communication with
fast connection (S5).
To comprehensively analyze the features of the four intercon- The area estimation of each network architecture is summa-
nection alternatives, seven settings (Table 5) are simulated running rized in Table 6. The area of each network component is obtained
each of the four network traces (Table 4). In Table 5, ‘‘HP” stands
for the high voltage/frequency pair, and ‘‘LP” stands for the low Table 5
voltage/frequency pair (Section 4.2). The 1st monitoring network Experimental network settings.
refers to the mesh network for monitoring communication be-
Index Architecture Power supply
tween the cell and cluster agents, and the 2nd monitoring network
S0 Baseline HP
refers to the specialized network for monitoring communication
S1 TDMA HP
between the cluster and platform agents (Section 4.1). S2 Unified monitoring HP for data network HP for monitoring
network network
S3 Unified monitoring HP for data network LP for monitoring network
Table 4 network
Experimental traffic traces. S4 Hierarchical HP for data network HP for the 1st monitoring
monitoring network HP for the 2nd monitoring network
Index Traffic Trace detail
networks
pattern
S5 Hierarchical HP for data network HP for the 1st monitoring
T0 Uniform Uniform destination, low traffic monitoring network LP for the 2nd monitoring network
T1 Uniform Uniform destination, high traffic networks
T2 Locality Locality destination, the region in (0, 3, 0, 2) has high S6 Hierarchical HP for data network LP for the 1st monitoring
traffic monitoring network LP for the 2nd monitoring network
T3 Hotspot 70% of packets are destined to the region (0, 3, 0, 2) networks
126 L. Guang et al. / Microprocessors and Microsystems 34 (2010) 118–128
0.7
2.5 0.6
0.5
2
0.4
0.3
1.5
0.2
1 0.1
0
S0 S1 S2 S3 S4 S5 S6
0.5
(a) Monitoring Communication Between Cell and
0 Cluster Agents
S0 S1 S2 S3 S4 S5 S6
(a)TrafficTraceT3 1
0.9
0.7
Normalized Average Latency
5
0.6
0.5
4
0.4
0.3
3
0.2
0.1
2
0
S0 S1 S2 S3 S4 S5 S6
4.5
3
4.5. Quantitative experiment analysis
2.5
1
6GHz, 1V
0.9 2GHz, 0.5V
0.7
0.6
0.5
0.4
0.3
0.2
0.1
0
5*5 crossbar 5*5 crossbar 2 virtual channel 3*3 crossbar 2*2 crossbar
Fig. 10. Normalized energy consumption of flits traversing different routers (obtained from Orion 2.0).
Table 6
Area estimation for each interconnection architecture (8 8 mesh).
Architecture Switches ðmm2 Þ Wires ðmm2 Þ Total ðmm2 Þ % of a chip area 275 mm2 (TeraFLOPS)
Baseline 7.2 4.5 11.7 4.3
TDM 7.4 4.5 11.9 4.3
Unified monitoring network 7.7 5.6 13.3 4.8
Hierarchical monitoring networks 7.7 5.7 13.4 4.9
integrate low-degree switches. Moreover, hierarchical monitoring the most appropriate solution on the platform with hierarchical
networks allow flexible configuration on each level of monitoring agents. As this article addresses generic discussion of interconnec-
communication (S4–S6). Given different requirements of monitor- tion architectures, we are currently working on specific applica-
ing flows of various granularities (Table 1), such flexible settings tions and systems. After mapping these applications onto
are needed to support multiple monitoring services on the plat- hierarchical agent monitored platforms, we can examine the pre-
form. The area overhead of separate monitoring networks (Table sented architectures with case-dependent results.
6) is moderate considering the constant technology progress in
multi-layer chip fabrication. Acknowledgement
Shalf, Samuel Webb Williams, Katherine A. Yelick, The Landscape of Parallel Ethiopia Nigussie is a researcher working towards a
Computing Research: A View From Berkeley, Technical Report, UC Berkeley, Ph.D. at the University of Turku. Her research interest
2006. includes self-timed on-chip interconnects, on-chip sig-
[9] Jan M. Rabaey, Scaling the Power Wall: Revisiting the Low-power Design Rules, naling techniques, and variations-tolerant circuit and
Keynote Speech at SoC 07 Symposium, Tampere, November 2007.
design techniques for System-on-Chip. She has a M.Sc.
[10] Jingcao Hu, R. Marculescu, Energy and performance-aware mapping for
degree from Royal Institute of Technology (KTH), Swe-
regular NoC architectures, IEEE Transactions on CAD 24 (4) (2005) 551–562.
[11] Zhiyi Yu, M. Meeuwsen, R. Apperson, O. Sattari, M. Lai, J. Webb, E. Work, T. den and B.Sc. degree from Addis Ababa University,
Mohsenin, M. Singh, B. Baas, An asynchronous array of simple processors for Ethiopia.
DSP applications, in: Proc. Digest of Technical Papers, IEEE International Solid-
State Circuits Conference ISSCC 2006, 2006, pp. 1696–1705.
[12] Teijo Lehtonen, Pasi Liljeberg, Juha Plosila, Online reconfigurable self-timed
links for fault tolerant NoC, VLSI Design 2007 (2007) 13.
[13] P. Chaparro, J. Gonzalez, G. Magklis, Cai Qiong, A. Gonzalez, Understanding the
thermal implications of multi-core architectures, IEEE Transactions on Parallel
and Distributed Systems 18 (8) (2007) 1055–1065.
[14] Wonyoung Kim, M.S. Gupta, Gu-Yeon Wei, D. Brooks, System level analysis of Jouni Isoaho received his M.Sc. (Tech) in electrical
fast, per-core DVFS using on-chip switching regulators, in: Proc. IEEE 14th
engineering, and his Lic.Tech. and Dr.Tech. in signal
International Symposium on High Performance Computer Architecture HPCA
processing from Tampere University of Technology,
2008, 16–20 February 2008, pp. 123–134.
Finland, in 1989, 1992 and 1995, respectively. He was
[15] V. Soteriou, Li-Shiuan Peh, Exploring the design space of self-regulating
power-aware on/off interconnection networks, IEEE Transactions on Parallel an associate professor in the Royal Institute of Tech-
and Distributed Systems 18 (3) (2007) 393–408. nology, Stockholm, Sweden, until he was invited to the
[16] C. Ciordas, T. Basten, A. Radulescu, K. Goossens, J. Meerbergen, An event-based communication systems professorship at the University
network-on-chip monitoring service, in: Proc. IEEE High-Level Design of Turku, Finland, in 1999. There he heads the commu-
Validation and Test Workshop, 2004, pp. 149–154. nication systems laboratory. His research interests
[17] C. Ciordas, K. Goossens, A. Radulescu, T. Basten, Noc monitoring: impact on the include future communication system concepts, appli-
design flow, in: Proc. IEEE International Symposium on Circuits and Systems cations and implementation techniques. His special
ISCAS 2006, 2006, pp. 1981–1984. interests relating to embedded systems are dynamically
[18] C. Ciordas, K. Goossens, T. Basten, A. Radulescu, A. Boon, Transaction reconfigurable agent based systems for future communication and multimedia
monitoring in networks on chip: the on-chip run-time perspective, in: Proc. applications including information security and dependability aspects.
International Symposium on Industrial Embedded Systems IES ’06, 2006, pp.
1–10.
Pekka Rantala is a research scientist working towards a
[19] Lei Zhang, Yinhe Han, Qiang Xu, Xiaowei Li, Defect tolerance in homogeneous
manycore processors using core-level redundancy with unified topology, in: Ph.D. at the University of Turku, Finland. His research
Proc. Design, Automation and Test in Europe DATE ’08, 2008, pp. 891–896. interest includes multi-agent systems and routing and
[20] William James Dally, Brian Towles, Principles and Practices of Interconnection trust management issues on both Network-on-Chip and
Networks, Morgan Kaufmann, 2004. Internet. He has a M.Sc. degree from the University of
[21] T. Felicijan, S.B. Furber, Quality of service (QoS) for asynchronous on-chip Turku.
networks, in: Formal Methods for Globally Asynchronous Locally Synchronous
Architecture (FMGALS 2003), 2003.
[22] Kees Goossens, John Dielissen, Jef van Meerbergen, Peter Poplavko, Andrei
Rădulescu, Edwin Rijpkema, Erwin Waterlander, Paul Wielage, Networks on
Chip, Kluwer Academic Publishers, Hingham, MA, USA, 2003 (Chapter
Guaranteeing The Quality of Services in Networks on Chip, pp. 61–82).
[23] Daewook Kim, Manho Kim, G.E. Sobelman, CDMA-based network-on-chip
architecture, in: Proc. IEEE Asia-Pacific Conference on Circuits and Systems,
vol. 1, 2004, pp. 137–140.
[24] D. Wentzlaff, P. Griffin, H. Hoffmann, Liewei Bao, B. Edwards, C. Ramey, M.
Mattina, Chyi-Chang Miao, J.F. Brown, A. Agarwal, On-chip interconnection Hannu Tenhunen received the Diplomas from Helsinki
architecture of the tile processor, IEEE MICRO 27 (5) (2007) 15–31. University of Technology, Finland, in 1982 and the Ph.D.
[25] A.B. Kahng, Bin Li, Li-Shiuan Peh, K. Samadi, Orion 2.0: a fast and accurate NoC degree in microelectronics from Cornell University, NY,
power and area model for early-stage design space exploration, in: Proc. DATE in 1986. In 1985, he joined Signal Processing Laboratory,
’09, 2009, pp. 423–428. Tampere University of Technology, Finland, as an
[26] C. Grecu, A. Ivanov, R. Pande, A. Jantsch, E. Salminen, U. Ogras, R. Marculescu, Associate Professor and later served as professor and
towards open network-on-chip benchmarks, in: Proc. First International department director. He was also a coordinator of the
Symposium on Networks-on-Chip NOCS 2007, pp. 205–205, 2007. National Microelectronics Program of Finland during
[27] Z. Lu, A. Jantsch, E. Salminen, C. Grecu. Network-on-chip Benchmarking 1987–1991. Since 1992, he has been with Royal Insti-
Specification Part 2: Microbenchmark Specification Version 1.0, Technical
tute of Technology (KTH), Sweden, where he is a Pro-
Report, OCP International Partnership Association, Inc., May 2008.
fessor of electronic system design. He served as dean for
School of Information and Communication Technology
at KTH during 2001–2005. Currently he is director of Turku Centre of Computer
Liang Guang is a Ph.D. student in the lab of computer Science, Turku, Finland and at University of Turku. His current research interests are
systems, in the department of information technology VLSI architectures and systems, especially Network-on-Chip and autonomous sys-
from the university of Turku, Finland. His research tems. He has authored or coauthored over 515 presentations and publications, and
interests include self-aware design and low-power has over 16 patents pending or granted.
techniques for System-on-Chip and Network-on-Chip.
He has a M.Sc. degree from Royal Institute of Technol-
ogy (KTH), Sweden.