Zheng Li, Jie Wu, Li Shang, Alan Mickelson, Manish Vachharajani, Dejan Filipovic, Wounjhang Park, and Yihe Sun
TNLIST, Inst. of Microelectronics, Tsinghua University, Beijing 100084, China; Department of Electrical, Computer, and Energy Engineering, University of Colorado, Boulder, CO 80309, U.S.A.
General Terms
Design, Performance
Keywords
Networks-on-Chip, Silicon Photonics
1. INTRODUCTION
While fabrication technology scaling has steadily reduced the size and improved the performance and power efficiency of transistors, communication remains a key challenge for multi-billion-transistor many-core systems. The main considerations are absolute data throughput and message latency versus the total power dissipation of the interconnect.
This paper was supported in part by the NSF under award CCF-0829950, and in part by the National Natural Science Foundation of China (NSFC) under grant #60236020 and the Specialized Research Fund for the Doctoral Program of Higher Education (SRFDP) #20050003083.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ISLPED'09, August 19–21, 2009, San Francisco, California, USA. Copyright 2009 ACM 978-1-60558-684-7/09/08 ...$10.00.
[Figure 1 labels. Cross-section: Si package, through-silicon via, CMOS devices, metal wires, silicon-silicon interconnect, linear waveguides, planar waveguide (Si, SiO2; vertical and horizontal views). Timing diagram: wavelengths W0 to W7 versus time (0 to 7), with arbitration, coherence packets, status update, broadcast/receive, and laser power.]
Figure 1: Schematic depiction of Iris
Iris consists of a low-latency planar-waveguide-based WDM broadcast/multicast nanophotonic subnetwork and a linear-waveguide-based throughput-optimized circuit-switching nanophotonic subnetwork. This dual network design delivers exceptional performance with low power dissipation by:
1. Optimizing the delivery of short latency-bound multicast messages on the low-latency multicast subnetwork.
2. Minimizing circuit setup time for the circuit-switched throughput-optimized network by using multicast messages sent over the low-latency multicast subnetwork.
3. Minimizing power dissipation by using efficient optical communication for circuit setup, multicast, and high-throughput data transfer.
Moreover, each of Iris's subnetworks is carefully designed to address fabrication challenges for nanophotonic interconnects. In particular, we have successfully fabricated the key novel structure, the antennas for broadcasting messages in a planar waveguide, in a CMOS-compatible manner. The net result is an interconnect fabric with excellent throughput, low latency, and low power dissipation. Our evaluation shows that, compared with other recently proposed electrical and photonic alternatives, Iris improves performance by 63.3% and 52.6%, and power efficiency by 56.1% and 72.6%, respectively.
[Figure 3 labels: light source, source waveguides, transmitter, filters, nanophotonic waveguides, receiver, detectors.]
Figure 3: Photonic network structure
wavelengths in the planar waveguide to arbitrate for one shared resource. The arbitration scheme starts with each node being assigned a unique dynamic priority. Each node broadcasts a '1' on the wavelength corresponding to its dynamic priority. As the node sends this one-bit message, it also listens for the combined messages broadcast across all wavelengths. If multiple nodes try to access the same resource, they all see the same bit vector, reflecting the dynamic priorities of all the requesting nodes. Each node is equipped with checking logic that examines whether any bit in the bit vector has a higher priority than its own. If so, the node knows that it has lost the arbitration. After each data transmission, each node changes its priority using a deterministic random number generator. Since all nodes share the same random seed, they agree on the same global priority assignment. To arbitrate for K resources, e.g., multiple broadcast channels, W nodes can still leverage the W wavelengths in the broadcast network. The only difference is that each node is pre-assigned a resource ID for arbitration. The scheme then becomes K parallel (W/K):1 arbitrations. This design sacrifices some arbitration efficiency for reduced arbitration latency and simplified arbitration logic.
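The wavelength-based arbitration described above can be sketched in software. This is a minimal behavioral model, not the hardware logic from the paper: the function names, the way the shared seed is mixed with a round counter, and the `node % k` resource pre-assignment are all assumptions made for illustration.

```python
import random

def priorities(num_nodes, seed, round_no):
    """Return prio[node] = wavelength index; 0 is the highest priority.

    Every node runs this with the same seed and round number, so all
    nodes derive the same permutation without exchanging messages.
    The seed/round mixing below is an illustrative assumption.
    """
    rng = random.Random(seed * 1_000_003 + round_no)
    prio = list(range(num_nodes))
    rng.shuffle(prio)
    return prio

def arbitrate(requesters, num_nodes, seed, round_no):
    """Single-resource arbitration; returns the winning node or None."""
    prio = priorities(num_nodes, seed, round_no)
    # Each requester drives a '1' on the wavelength matching its priority;
    # the broadcast medium delivers the combined bit vector to every node.
    bit_vector = [0] * num_nodes
    for node in requesters:
        bit_vector[prio[node]] = 1
    # Local check at each node: it wins only if no higher-priority bit is set.
    winners = [n for n in requesters if not any(bit_vector[:prio[n]])]
    return winners[0] if winners else None

def arbitrate_k(requesters, num_nodes, k, seed, round_no):
    """K parallel (W/K):1 arbitrations; returns {resource_id: winner}.

    The pre-assignment node -> node % k is a hypothetical choice.
    """
    prio = priorities(num_nodes, seed, round_no)
    winners = {}
    for node in sorted(requesters, key=lambda n: prio[n]):
        winners.setdefault(node % k, node)  # first seen = highest priority
    return winners

if __name__ == "__main__":
    print(arbitrate({1, 3, 5}, num_nodes=8, seed=42, round_no=0))
    print(arbitrate_k({0, 1, 2, 3}, num_nodes=8, k=2, seed=42, round_no=0))
```

Because priorities are unique, exactly one requester finds no higher-priority bit set, so each arbitration yields a single winner in one pass over the bit vector.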
2.
In Iris, the broadcast and throughput nanophotonic subnetworks operate in tandem to provide low-latency, high-throughput communication at low power levels. The interaction is straightforward. Short multicast messages are transmitted via WDM on the planar-waveguide-based broadcast network. Since the network is inherently broadcast-based, multicast requires no additional effort. However, since it is a broadcast medium, arbitration for the WDM channels becomes the primary logical design challenge. For larger, throughput-sensitive messages, the linear-waveguide-based throughput-optimized subnetwork is used, as it can deliver a higher total cut-through bandwidth. To allocate a channel, we use the broadcast network to reserve the appropriate wavelength and the physical waveguides, so that circuits are set up and torn down with very low latency.
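The division of labor between the two subnetworks can be illustrated with a small dispatch sketch. The paper does not give a size threshold, so `SHORT_MESSAGE_BYTES`, the `Message` type, and the function name are hypothetical and serve only to make the policy concrete.

```python
from dataclasses import dataclass
from enum import Enum

class Subnetwork(Enum):
    BROADCAST = "planar-waveguide WDM broadcast"
    CIRCUIT = "linear-waveguide circuit-switched"

@dataclass(frozen=True)
class Message:
    size_bytes: int
    destinations: frozenset  # one or more destination node IDs

# Hypothetical cut-off between "short" and "large" messages (assumption).
SHORT_MESSAGE_BYTES = 16

def select_subnetwork(msg: Message) -> Subnetwork:
    """Pick a subnetwork by message size.

    Short messages use the broadcast medium, where multicast costs
    nothing extra; larger messages use a reserved circuit.
    """
    if msg.size_bytes <= SHORT_MESSAGE_BYTES:
        return Subnetwork.BROADCAST
    return Subnetwork.CIRCUIT

if __name__ == "__main__":
    invalidate = Message(size_bytes=8, destinations=frozenset({1, 2, 3}))
    cache_line = Message(size_bytes=64, destinations=frozenset({5}))
    print(select_subnetwork(invalidate).value)
    print(select_subnetwork(cache_line).value)
```

In this sketch the number of destinations does not influence the choice, which mirrors the observation that multicast comes for free on a broadcast medium.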
2.1
The broadcast subnetwork is used for short, latency-critical, often-multicast messages. In addition, because each broadcast operation has global reach with low latency, all global resource arbitrations (for both the broadcast subnetwork and the circuit-switched subnetwork) are conducted over this network. Another advantage of this approach is that we can leverage the broadcast nature of the subnetwork to provide a global order for events. The key logical design consideration for the broadcast network is how to perform arbitration. To do this, we leverage the broadcast nature of the subnetwork and WDM. As we will see, the net result is a distributed global arbitration scheme that enables one-cycle arbitration. Figure 2 illustrates the arbitration process. W nodes can leverage W
Figure 4: Antenna model and scanning electron microscope images
at the destination. In addition, the source node broadcasts a message to all other nodes, each of which updates its local resource occupation table to reflect the newly established circuit. Coordination, arbitration, and state-update messages all travel through the broadcast network. To minimize network latency and power consumption, the proposed design uses a low-radix mesh topology. A high-radix design can effectively reduce the average number of routing hops. However, given a fixed on-chip communication bandwidth, increasing the radix increases serialization latency. For a photonic interconnect operating at the speed of light, the number of hops does not affect the latency, but the bandwidth does. Therefore, the proposed network uses a low-radix mesh design. This design decision also takes power optimization into account. Since bends and crossings introduce intersection loss, crosstalk, and back reflection, which in turn cause high power consumption, they should be avoided whenever possible. The proposed switch design can effectively minimize optical power loss. In addition, since the switches are operated at the packet level, their power consumption is negligible compared to that of the transceivers, which are operated at the bit level.
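The radix argument above can be made concrete with a simplified model. It assumes, purely for illustration, that a node's fixed bandwidth is divided evenly among its ports; the function name and the numbers in the usage example are hypothetical and are not taken from the paper.

```python
import math

def serialization_cycles(packet_bits, node_bandwidth_bits_per_cycle, radix):
    """Cycles needed to push one packet through a single port.

    Assumes the node's total bandwidth is split evenly across `radix`
    ports, so each link gets narrower as the radix grows.
    """
    link_width = node_bandwidth_bits_per_cycle / radix
    return math.ceil(packet_bits / link_width)

if __name__ == "__main__":
    # Hypothetical 512-bit packet and 256 bits/cycle of node bandwidth.
    for radix in (4, 8, 16):
        print(radix, serialization_cycles(512, 256, radix))
```

Under this model, serialization latency grows linearly with radix. If hop count adds little latency on a photonic link, as the text argues, the narrower links of a high-radix design bring a cost without a matching benefit.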
off the indiffused boundaries. The antenna structure that we attempted to produce is the ideal structure of the Pistolkors [13] diffraction antenna, depicted in Figure 4(a). To fabricate the antennas, a thin layer (~20 nm) of Au is vacuum deposited on the silicon layer of an SOI wafer. A set of square Au islands in a two-dimensional square pattern is then produced by photolithographically processing this Au layer. The lithography process replicates the mask detail reasonably well, except for some loss of edge sharpness, which is undesirable for antenna operation. The optical antenna structures are then formed using a focused ion beam (FIB), which focuses gallium ions onto the gold surface and etches through the deposited layer down to the surface of the silicon. Figure 4(b) shows a scanning electron micrograph of an antenna structure written in 20 nm Au on top of the SOI layer.
4. EXPERIMENTAL RESULTS
In this section, we evaluate Iris, the proposed nanophotonic on-chip network. The following experiments are conducted on a 4 × 4 CMP, using a trace-driven cycle-accurate cache-network simulator. Network traffic traces are gathered using the M5 full-system simulator [14] running several SPLASH2 [15] and ALPBench [16] multithreaded benchmarks. The performance and power efficiency of Iris are evaluated against the following recently proposed electrical and photonic alternatives.
ELE: a packet-switched electrical mesh network equipped with a latency-optimized three-pipeline-stage router design with speculative virtual channel allocation [17].
PLinear: a linear-photonic-waveguide-based circuit-switched mesh network, an approximation of a recently proposed optical network design [7]. It includes a latency-optimized electrical network to set up the photonic circuit-switched path.
PPlanar: a hybrid packet-switched network consisting of a planar-photonic-waveguide-based subnetwork and an electrical mesh subnetwork. Coordination messages are sent via the planar network, as in Iris, but the large data packets are sent via the electrical network.
Figure 5(a) shows the power dissipation breakdown of Iris and the other three alternatives. The electrical network power consumption of each alternative is calculated by synthesizing an RTL router design using Synopsys Design Compiler with the TSMC 65 nm low-power technology library. The power consumption of the electrical network comes from routers (labeled "router") and link circuitry (labeled "link"). The power consumption of the nanophotonic network comes from planar waveguide loss (labeled "p loss"), linear waveguide loss (labeled "l loss"), and transceiver power for multicasting/broadcasting (labeled "p TR") and for unicasting (labeled "l TR"). Detector power is negligible compared to the other power sources and is therefore not shown in the results. This study shows that Iris is the most power-efficient on-chip interconnect solution.
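The way a per-component power breakdown turns into a relative saving can be sketched as follows. The component labels follow Figure 5(a), but the wattages are round placeholder values chosen for illustration. They are not measurements from the paper, and the helper names are hypothetical.

```python
def total_power(breakdown):
    """Sum a {component_label: watts} breakdown."""
    return sum(breakdown.values())

def relative_reduction(baseline_w, proposed_w):
    """Fraction of the baseline power that the proposed design saves."""
    return (baseline_w - proposed_w) / baseline_w

# Placeholder values in watts (assumptions, NOT the paper's data).
electrical = {"router": 6.0, "link": 4.0}
photonic = {"p loss": 1.0, "l loss": 1.0, "p TR": 1.0, "l TR": 1.0}

reduction = relative_reduction(total_power(electrical), total_power(photonic))

if __name__ == "__main__":
    print(f"relative power reduction: {reduction:.1%}")
```

Read in the other direction, a reported reduction of 56.1% means the proposed network dissipates 43.9% of the baseline's power.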
Compared to ELE, the electrical alternative, Iris reduces the power consumption by 56.1%. PLinear and PPlanar, on the other hand, do not show a power benefit over ELE, for the following reasons. With current nanophotonic technology, on-chip waveguides, modulation, and crossings are lossy and power consuming. PLinear's power characterization shows that transmitting small, often-multicast messages over the linear waveguides is not power efficient, due to the high waveguide loss, the transceiver power consumption, and the power consumption of the electrical network used to set up the circuit-switched path of the photonic network. Compared to
3.
As shown in Figure 3, Iris's nanophotonic subnetworks consist of transmitters, waveguides, and receivers. The nanophotonic components are fabricated in separate silicon layers and integrated with the CMOS silicon die through three-dimensional integration. Therefore, the design and fabrication of each layer can be optimized independently. As shown in Figure 1, the broadcast waveguide is a planar SOI waveguide built on the top layer of the nanophotonic stack. Within the planar SOI waveguide layer, an array of nanophotonic antennas, which broadcast and receive optical signals, is printed nanoscopically on the surface of the silicon. These antennas are one of the unique features of Iris, but they are also the key unproven nanophotonic component; however, we discuss how we have successfully fabricated them. The antenna feeds extend through several layers of nanophotonic devices, i.e., ring resonators, that relay messages between the send/receive electronics and the planar waveguide. The layer of linear SOI waveguides lies underneath these devices. The linear waveguides form a mesh-like circuit-switched network, which delivers off-chip laser power to on-chip processing cores, transfers large data messages between on-chip cores, and also communicates with off-chip DRAMs via chip-to-chip fiber. The bottom layer consists of germanium-doped silicon photon detectors, amplifiers, and other electrical components. We choose SOI for both planar and linear waveguides because the index-of-refraction contrast between silicon (nSi ≈ 3.475) and silicon dioxide (nSiO2 ≈ 1.45) at the working bands (1530 nm to 1625 nm) is one of the largest that can be achieved. This large index differential can be used to obtain tight confinement in silicon layers with minimum scattering
[Figure 5 plot residue. Panel (a): y-axis Power (W); legend entries include routers, p loss, p TR, l loss. Panel (b): legend entries include memory. Benchmark labels include fmm, lu, radix, mpgenc, ocean, SPECJbb, TPC-H; bars per benchmark: ELE, PLinear, PPlanar, Iris.]
Figure 5: Performance and power comparison
PLinear, Iris improves the overall network power efficiency by 72.6%. For PPlanar, since most of the throughput-hungry messages are still transferred via the electrical subnetwork, the power consumption of the electrical network is significant. Compared to PPlanar, Iris improves the overall network power efficiency by 64.0%. As technology scales further, the power efficiency of nanophotonic devices and components is expected to improve. The power consumption of the electrical network, on the other hand, is expected to increase. Therefore, a nanophotonic on-chip interconnect becomes increasingly beneficial in terms of power.
Next, we evaluate the performance of Iris. Figure 5(b) shows the average L2 cache miss latency (read and write) of Iris compared to the three alternatives. In this study, latency is decomposed into the following four components: cache miss request latency (labeled "request"), protocol transaction latency (labeled "protocol"), memory access latency (labeled "memory"), and acknowledgment or data reply latency (labeled "ack"). This study shows that the planar waveguide efficiently delivers coherence messages and simplifies protocol transactions, while the linear waveguide network improves the network throughput for large data packets. Therefore, compared with ELE and PLinear, Iris, which is equipped with the new planar waveguide, can significantly reduce the protocol transaction latency and request latency. In addition, the planar waveguide can effectively reduce the setup latency of the circuit-switched network. Overall, Iris provides a high-performance on-chip communication solution, with a 63.3% improvement over ELE, a 52.6% improvement over PLinear, and a 44.1% improvement over PPlanar.
[2] Tilera TILE64 chip-multiprocessor, http://www.tilera.com.
[3] S. Vangal, et al., "An 80-tile 1.28 TFLOPS network-on-chip in 65nm CMOS," in Proc. Int. Solid-State Circuits Conf., Feb. 2007, pp. 98–589.
[4] A. Kumar, et al., "Express virtual channels: towards the ideal interconnection fabric," in Proc. Int. Symp. Computer Architecture, June 2007, pp. 150–161.
[5] R. Beausoleil, et al., "Nanoelectronic and nanophotonic interconnect," Proceedings of the IEEE, vol. 96, no. 2, pp. 230–247, Feb. 2008.
[6] N. Kirman, et al., "On-chip optical technology in future bus-based multicore designs," IEEE Micro, vol. 27, no. 1, pp. 56–66, 2007.
[7] A. Shacham, K. Bergman, and L. Carloni, "Photonic networks-on-chip for future generations of chip multiprocessors," Computers, IEEE Transactions on, vol. 57, no. 9, pp. 1246–1260, Sept. 2008.
[8] D. Vantrease, et al., "Corona: System implications of emerging nanophotonic technology," in Proc. Int. Symp. Computer Architecture, 2008, pp. 153–164.
[9] M. Petracca, et al., "Design exploration of optical interconnection networks for chip multiprocessors," in Proc. Symp. High Performance Interconnects, Aug. 2008, pp. 31–40.
[10] C. Batten, et al., "Building manycore processor-to-DRAM networks with monolithic silicon photonics," in Proc. Symp. High Performance Interconnects, Aug. 2008, pp. 21–30.
[11] N. E. Jerger, L.-S. Peh, and M. Lipasti, "Virtual circuit tree multicasting: A case for on-chip hardware multicast support," in Proc. Int. Symp. Computer Architecture, June 2008.
[12] M. Haurylau, et al., "On-chip optical interconnect roadmap: challenges and critical directions," in Proc. Int. Conf. on Group IV Photonics, Sept. 2005.
[13] A. A. Pistolkors, "Theory of the circular diffraction antenna," Proceedings of the IRE, vol. 36, no. 1, pp. 56–60, 1948.
[14] N. L. Binkert, et al., "The M5 simulator: Modeling networked systems," IEEE Micro, vol. 26, no. 4, pp. 52–60, 2006.
[15] SPLASH2 website, http://www-flash.stanford.edu/apps/SPLASH/.
[16] M.-L. Li, et al., "The ALPBench benchmark suite for complex multimedia applications," in Proc. Int. Symp. Workload Characterization, Oct. 2005, pp. 34–35.
[17] W. J. Dally and B. Towles, Principles and Practices of Interconnection Networks. Morgan Kaufmann Pub., 2003.
5. CONCLUSION
Emerging many-core on-chip systems call for power-efficient, high-performance on-chip communication solutions. In this work, we propose Iris, a nanophotonic on-chip network, which consists of a low-latency planar-waveguide-based WDM broadcast/multicast subnetwork and a throughput-optimized linear-waveguide-based circuit-switching subnetwork. Together, the proposed design provides power-efficient support for both latency-critical and throughput-critical on-chip communication traffic of many-core systems. Experimental study demonstrates that Iris improves power efficiency and performance by 56.1% and 63.3% over the electrical alternative, and by 72.6% and 52.6% over the linear-waveguide-based nanophotonic alternative.
6. REFERENCES
[1] P. Gratz, et al., "On-chip interconnection networks of the TRIPS chip," IEEE Micro, vol. 27, no. 5, pp. 41–50, Sept. 2007.