Chuck Semeria, Marketing Engineer, Juniper Networks
September 16, 1998

Contents

Introduction
Impact of Rapid Growth on ISPs
Delivering Solutions that Permit ISPs to Grow Their Networks
Essential Elements of a Routing System
Key Attributes for an Internet Backbone Router
M40 Internet Backbone Router Architecture
  Routing Software: JUNOS Internet Software
  Packet Lookup: The Internet Processor ASIC
    Route Lookup
    Programmability
    Performance Insurance
    Atomic Updates
    Traffic Visibility
  Switch Fabric: Distributed Buffer Manager ASIC, I/O Manager ASIC, and Shared Memory
    Distributed Buffer Manager ASIC and Shared Memory
    I/O Manager ASIC
  Line Cards
How Packets Traverse the M40 Packet Forwarding Engine
The M40 Internet Backbone Router in an ISP Network
  Rock-Solid Reliability Under Failure Conditions
  M40 System Deployment
Conclusion
Introduction
As we approach the twenty-first century, the Internet continues to experience extraordinary growth. Any way one measures it, the growth is remarkable on all fronts: the number of hosts, the number of users, the amount of traffic, the number of links, the bandwidth of individual links, and the growth rates of Internet Service Provider (ISP) networks.
Formerly, the ISP community adapted general-purpose equipment for use in the core of the Internet. The explosive growth of the Internet has created a market for networking equipment that is specially built to solve the unique problems confronting Internet backbone providers. This new class of equipment is required to scale the Internet not only in terms of aggregate bandwidth and raw throughput, but also in terms of software richness and control. To provide both increased bandwidth and software richness, the designers of new routing systems must achieve the forwarding performance previously found only in switches. However, delivering wire-speed performance for variable-sized packets with longest-match lookups is considerably more complex than the relatively straightforward switching goal of supporting wire-speed performance for fixed-sized cells with fixed-length lookups. In particular, packet processing can no longer remain a microprocessor-based or microprocessor-assisted function; it must transition to an ASIC-based approach that supports the evolution of routing software as the Internet environment changes.

The system challenges facing this new class of Internet backbone router are also complex. New routing systems must be deployed into existing OC-3 and OC-12 based cores with their related intra-POP infrastructure, and they must also support the transition to OC-48 based cores with OC-12 and Gigabit Ethernet based intra-POP infrastructures.

In addition, new routing systems must accelerate the evolution of the Internet from a best-effort service to a fundamentally reliable service. With the Internet emerging in its role as the new public network, users are demanding and expecting increased reliability and stability to support mission-critical applications. The Internet cannot continue to provide erratic best-effort service that generally works but at other times fails.
As they have come to expect with the Plain Old Telephone Service (POTS), Internet users want to hear a dial tone and receive quality service whenever they wish to communicate.

The Juniper Networks M40 Internet backbone router is the world's most complete system for Internet routing. It is the first system that has been specially built to combine Internet scale, Internet control, and unparalleled forwarding performance. By integrating the flexibility and control of router software with the packet-forwarding performance of a switch, and by providing rock-solid stability during exceptional conditions, the M40 system is designed to serve as the foundation of the optical Internet while supporting the Internet's transition from a best-effort to a fundamentally reliable communications system.
Any solution intended for the ISP market must address all these growth issues. If a new system cannot enhance performance and contribute to the fundamental reliability of the network, the solution is doomed to failure.
[Figure: positioning the router along two dimensions, Internet growth and software richness]
To overcome these limitations, several ISPs decided to use an overlay approach, designing their networks with IP routers at the edges to provide software richness and ATM switches at the core to deliver speed. During the transition to OC-12 based cores, ISPs examined the available tools, put them together in complex ways, and achieved the desired result. However, the resulting overlay network solutions create their own set of management problems:

- The physical topology does not match the logical topology of the network.
- The ATM cell tax results in inefficient use of provisioned bandwidth.
- The full mesh of PVCs creates n-squared scalability problems.
- The overlay approach requires coordinated management of two distinct networks: the ATM network and the overlay IP network.
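The n-squared scalability problem is easy to quantify: a full mesh of n edge routers needs n(n-1)/2 bidirectional PVCs, so each additional router adds n-1 new circuits to provision and manage. A minimal sketch:

```python
def full_mesh_pvcs(n: int) -> int:
    """Number of bidirectional PVCs needed to fully mesh n edge routers."""
    return n * (n - 1) // 2

# Growing a POP from 10 to 40 edge routers multiplies the PVC count
# far faster than the router count grows:
for n in (10, 20, 40):
    print(n, "routers ->", full_mesh_pvcs(n), "PVCs")
```

Doubling the router count roughly quadruples the number of PVCs, which is exactly the management burden the overlay approach imposes.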
Despite these limitations, many ISPs have been willing to accept the challenges of deploying an overlay solution because it has been their only option for creating a high-bandwidth network. However, it certainly would be preferable to have a specially built system that simplifies the creation of high-bandwidth networks while eliminating the cost and complexity of the overlay solution. The new class of Internet backbone routers seeks to provide this tool by allowing network designers to move away from deploying redundant devices at different layers just to meet the fundamental need for speed with control. Before discussing the features that must be supported by the new generation of Internet backbone routers, let's take a moment to examine the essential elements of a routing system.
[Figure: the routing process and the packet-forwarding process, with packets flowing in and out]
Any routing system requires four essential elements to implement the routing and packet-forwarding processes: routing software, packet processing, a switch fabric, and line cards (Figure 3). For any system that is designed to operate in the core of the Internet, all four elements must be equally powerful because a high-performance router can be only as strong as its weakest element.
Figure 3: Essential Elements of a Routing System (routing software driving the routing process; the packet-forwarding process built from the switch fabric and line cards). Packet processing may be distributed to each line card, executed in a centralized fashion, or performed by both the line cards and a centralized processor.
Routing software is the part of the system that performs the routing function. It is responsible for maintaining peer relationships, running the routing protocols, building the routing table, and creating the forwarding table that is accessed by the packet-forwarding part of the system. Software also provides system control features including traffic engineering, the user interface, policy, and network management.
Regardless of a router's architecture, each packet entering the system requires a certain amount of processing that is completely independent of the packet's length. The incoming encapsulation must be removed, a longest-match route lookup must be performed, the packet must be queued on the output port, and the outgoing encapsulation must be applied. Performing longest-match lookups and the related packet processing at a high rate is the most difficult challenge to overcome when developing a high-performance routing system.

The switch fabric provides the infrastructure for moving packets between router line cards. Designers have been working with switch fabrics for decades, and the issues are well understood by the vendor community. There are even a number of off-the-shelf chip sets available to router vendors for building a switch fabric. These fabrics might be based on crossbar switches, banyan networks, Clos networks, perfect shuffle networks, and so forth.

A line card terminates circuits of different physical media types, implementing Layer 1 and Layer 2 technologies such as DS-3, ATM, SONET, Frame Relay, and PPP. Issues involving the design and development of router line cards are also well understood by the vendor community. Line cards simply have to be built according to the prevailing standards that define physical interface types, optical characteristics, electrical levels, and so forth.
[Figure: M40 architecture. The Routing Engine runs the JUNOS software and holds the routing table; the Packet Forwarding Engine comprises a wire-speed 40-Mpps packet processor built from computer-scale ASICs, the forwarding table, the switch fabric, and the line cards.]
The remainder of this section focuses on how the M40 Internet backbone router provides leading-edge solutions for each of the key elements in a routers architecture: routing software, ASIC-based packet processing and lookup, switch fabric, and line cards. By providing a system where all four components are equally powerful, the M40 system delivers a complete solution that is unique in its ability to operate successfully in the core of the Internet.
The user interface provides multiple user access levels, configuration change control, support for ASCII configuration files, and the ability to return to previous versions of a configuration. To minimize the chance of software configuration errors, it supports the ability to apply multiple changes to a new configuration in a single step. System security is provided by Secure Shell (SSH) access to the user interface, TCP MD5 authentication for BGP sessions, architectural safeguards against denial-of-service attacks, and so forth.
Juniper Networks understands the importance of routing software for ISPs to control and manage their networks. Accordingly, the JUNOS routing software has already been tested and qualified by the world's largest ISPs. For a complete discussion of the features and benefits of the JUNOS Internet software, including traffic engineering, refer to the Juniper Networks white paper entitled "Optimizing Routing Software for Reliable Internet Growth."
Route Lookup
The Internet Processor ASIC performs longest-match lookups at a rate of 40 million route lookups per second, which is one hundred times faster than microprocessor-based lookups currently deployed in the Internet. A single Internet Processor ASIC provides more than enough power to perform wire-speed lookups for an 8xOC-48 system. In addition, the Internet Processor ASIC can be configured to perform longest-match lookups with per-prefix accounting. Per-prefix accounting provides statistics describing the number of bytes and packets forwarded to each prefix in the IP forwarding table. ISPs can use these statistics to obtain a picture of how traffic is flowing across their network.
Programmability
The Internet Processor is both a fully generic and a fully programmable lookup engine. In the initial release, it supports IPv4 (including IP multicast) and MPLS. Because of the Internet Processors programmability, support can be extended to include other protocols such as IPv6 and Frame Relay, simply by programming the Packet Forwarding Engine, developing new routing software for the Routing Engine, and communicating the new forwarding table to the Internet Processor. The programmability of the Internet Processor allows the existing JUNOS software to continue to evolve and remain fully supported in hardware. As routing continues to develop in a number of unpredictable ways, the programmability of the Internet Processor ASIC allows Juniper Networks to future-proof the Packet Forwarding Engine by allowing it to support new functionality in hardware.
Performance Insurance
A key attribute facilitated by the Internet Processor ASIC is performance insurance. Performance insurance safeguards system stability by segregating, and thus protecting, the operation of the routing and forwarding functions. Whenever a network failure strikes, three events occur:

- The routing process experiences stress because of the amount of routing change. The routing stress is manifested in the need for each router to maintain peer relationships, transmit and process routing update messages, calculate a new shortest-path tree, apply policy to complete the route selection process, and modify the forwarding table used by the Packet Forwarding Engine.
- The packet-forwarding process experiences stress because routers that are still connected to active network links can suddenly find themselves forced to operate at peak capacity or greater, rather than at the usual background level.
- The links and routers forced to carry the additional traffic resulting from the failure become critical to the stability of the network as a whole. If these network elements are unable to carry the increased traffic load and fail, a relatively simple local failure can cascade across the entire service provider's network.
The challenge facing service providers is that traditional router architectures fail to deliver the performance insurance that provides rock-solid system stability during failure conditions. Juniper Networks believes that the M40 system provides the best solution in the industry for accomplishing this critical design objective, supporting performance insurance by:

- Implementing an architecture that distinctly separates the functions performed by the Routing Engine and the Packet Forwarding Engine. This design segregates the components of the M40 system so that stress experienced by one part of the system does not degrade the performance of the other.
- Ensuring that the lookup performance of the Internet Processor ASIC is never compromised. The Internet Processor ASIC is fully sized to perform lookups at a rate of 40 Mpps regardless of how long the lookup takes or how large the routing table grows. The 40 million lookups per second is achieved with 80,000 routes and 80,000 distinct destination addresses, as opposed to artificial benchmarks that do not reflect the current state of the Internet.
- Allowing the Packet Forwarding Engine to maintain forwarding performance under high rates of updates to the forwarding table, by supporting the revolutionary concept of atomic updates to that table.
Atomic Updates
The Internet Processor ASIC supports atomic updates to its centralized forwarding table (Figure 5). In the M40 system, the routing table contains the routes learned from routing protocol exchanges with neighbors and through static configuration. The forwarding table is derived from the routing table and contains an index of IP prefixes (or MPLS labels) that are actively used to associate a prefix (or MPLS label) with an outgoing interface. The Packet Forwarding Engine uses the contents of the forwarding table, not the routing table, to make its forwarding decisions. Typically, modifying a specific route affects only a small portion of the forwarding tables data structure. This means that the Routing Engine simply needs to update a portion of the binary tree in free memory and switch a pointer, and the update is instantaneous. Because forwarding information does not have to be distributed to multiple line cards, the Internet Processor does not require the forwarding table to be locked as the table is modified. Any attempt to perform a route lookup gets either the old tree or the new tree, depending on whether the location is read before or after the pointer has been changed. The benefit of this design is that the forwarding table remains consistent at all times. This means that during periods of route instability, the Packet Forwarding Engine can simultaneously accept updates to its forwarding table while continuing to make forwarding decisions at an extremely rapid rate.
Figure 5: Atomic Updates. The Routing Engine updates the forwarding table (a binary tree data structure) by writing a new subtree in free memory and then switching a pointer from the old (deleted) subtree to the new subtree.
Other high-performance router architectures perform packet lookups on each individual line card. This means that when a routing change occurs and the forwarding table must be modified, the update must be distributed to each of the individual line cards. During the update and distribution process, the forwarding table must be locked to maintain route consistency while it is modified. As a result, packets are forced to queue up and wait until the forwarding table is updated and then unlocked; only then can packet forwarding resume. Clearly, locking the forwarding table can have a negative impact on system performance during exceptional conditions, when a high rate of routing change is coupled with a dramatic increase in traffic flowing through the router.
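The essence of the atomic-update technique is that readers always dereference a single root pointer, so any lookup sees either the entire old table or the entire new one, never a half-modified mixture, and no reader-side lock is needed. A minimal sketch (a Python dict stands in for the binary tree, and attribute assignment stands in for the hardware pointer switch):

```python
class AtomicFib:
    """Sketch of the atomic-update idea from Figure 5. The replacement
    structure is built in free memory first, then published with a single
    pointer switch, so lookups are never blocked by an update."""

    def __init__(self, routes: dict):
        self._root = dict(routes)      # the "old tree"

    def lookup(self, dst: str) -> str:
        root = self._root              # one pointer read; snapshot is consistent
        return root.get(dst, "discard")

    def update(self, new_routes: dict):
        new_root = dict(new_routes)    # build the new tree off to the side...
        self._root = new_root          # ...then switch the pointer atomically
```

A lookup racing with an update simply sees whichever root the pointer referenced at the instant it was read, which is exactly the consistency property the text describes.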
Traffic Visibility
With backbone routers forwarding traffic at 40-Mpps rates, an ISP needs to understand the traffic trends taking place in its network. For example, new applications can affect overall network performance, or the volume of traffic can shift in different parts of the network. The traffic sampling feature of the Internet Processor allows an ISP to sample traffic, say, 1 out of every 1,000 packets. Traffic sampling occurs by sending a notification with a special flag to another part of the system, a process that occurs without impacting the lookup performance of the Internet Processor. A notification is a Juniper Networks-defined data structure that contains all the information needed to process a packet after it has been stored in shared memory. Based on the information contained in the notification, the packet can then be retrieved from shared memory and forwarded to a user process executing in the Routing Engine without impacting the performance of the Packet Forwarding Engine.
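The 1-in-1,000 sampling behavior can be sketched in a few lines. The notification format is Juniper-proprietary, so the dictionary, flag name, and function name below are invented for illustration; only the idea (flag a small random fraction of notifications, leave the forwarding path untouched) comes from the text:

```python
import random

SAMPLE_RATE = 1000  # on average, flag 1 out of every 1,000 packets

def process_notification(notification: dict) -> dict:
    """Illustrative sampling decision: a flagged notification causes the
    packet to later be copied from shared memory to a Routing Engine
    process; the forwarding decision itself is unaffected either way."""
    if random.randrange(SAMPLE_RATE) == 0:
        notification["sampled"] = True
    return notification
```

Over a large packet count the flagged fraction converges on 0.1 percent, giving the ISP a statistically useful traffic picture at negligible cost.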
Switch Fabric: Distributed Buffer Manager ASIC, I/O Manager ASIC, and Shared Memory
The M40 system provides a conservatively rated 40-Gbps switch fabric that is implemented as a shared memory system. In addition to the Internet Processor ASIC, the switch fabric is composed of the Distributed Buffer Manager ASICs, the I/O Manager ASICs, and the shared memory system (Figure 6). In the illustration, note that each packet is fragmented into 64-byte blocks for efficient storage in the shared memory system, while a notification describing the header of the packet is forwarded to the Internet Processor ASIC for route lookup.
Figure 6: M40 Packet Forwarding Engine. A packet arriving on the input interface line card is fragmented into blocks by the I/O Manager and stored in shared memory, while a notification is forwarded to the Internet Processor ASIC for route lookup; the I/O Manager on the output interface later retrieves the blocks and reassembles the packet.
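The fragmentation and placement shown in Figure 6 can be sketched simply: chop a variable-length packet into fixed 64-byte blocks, then spray the blocks round-robin across the memory banks so that no single bank becomes a hot spot. Bank counts and function names here are illustrative, not the M40's actual memory organization:

```python
BLOCK_SIZE = 64

def chop(packet: bytes) -> list:
    """Fragment a variable-length packet into fixed 64-byte blocks
    (the last block may be short) for storage in shared memory."""
    return [packet[i:i + BLOCK_SIZE] for i in range(0, len(packet), BLOCK_SIZE)]

def distribute(blocks: list, n_banks: int, start: int = 0) -> list:
    """Assign blocks to memory banks in round-robin order; returns
    (bank, block) placements so consecutive blocks land on
    consecutive banks."""
    return [((start + i) % n_banks, blk) for i, blk in enumerate(blocks)]
```

A 150-byte packet, for instance, becomes three blocks of 64, 64, and 22 bytes, landing on three consecutive banks.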
From a vendor's perspective, the chief limitation of a shared memory interconnect is that it is technically difficult to design and implement. However, this does not mean that vendors should avoid designing switch fabrics based on this architecture. For a system of this size (that is, 8xOC-48), or even a few times larger, a shared memory architecture is a solid approach that yields numerous system benefits.
I/O Manager ASIC

Similar to both the Internet Processor ASIC and the Distributed Buffer Manager ASIC, the I/O Manager ASIC offers maximum flexibility because it, too, is highly programmable. It can be programmed by Juniper Networks to support Layer 2 decoding and encapsulation schemes such as PPP, Frame Relay, and MPLS. For class-of-service support, the I/O Manager ASIC offers numerous options for assigning a notification to a queue and servicing the output queues, as well as for controlling drop profiles and the packet discard process. Finally, the I/O Manager plays an integral role in supporting the write-once, read-many facility required for efficient forwarding of IP multicast traffic.
Line Cards
The M40 system line cards are implemented with media-specific ASICs. For example, Juniper Networks has integrated full SONET/SDH processing on a single, highly integrated ASIC. On other vendors' systems, SONET/SDH processing typically is performed by a number of different components rather than a single ASIC that performs all required functions. The benefits delivered by the M40 system line card ASICs are increased port density, higher performance, lower power draw, and enhanced reliability. Super-POPs require a router that concurrently supports a large number of different interface types and high port densities, so that the router can grow and evolve as ISP requirements change. As a result, the M40 system can function in a broad range of super-POP environments during any stage of an ISP's development.
The wide variety of line cards and the number of slots enhance the configuration flexibility of the M40 system in a super-POP environment. A fully populated M40 system provides 32 slots (an industry-leading port density of one slot per rack-inch), offering complete mix-and-match flexibility for line card installation. Because the switch fabric has been oversized, all line cards operate at wire rate for all packet sizes. The line cards available in 1998 include:

- OC-48 IP over SONET/SDH
- OC-12 IP over SONET/SDH
- OC-3 IP over SONET/SDH
- OC-12 IP over ATM
- OC-3 IP over ATM
- DS-3 with an internal DSU
- Gigabit Ethernet
The line cards can be mixed and matched in each slot as desired. (The exception is the OC-48 line card, which uses four slots.) Table 1 illustrates the maximum densities for each line card type in a fully populated M40 system.
Table 1: M40 Interface Densities

Interface Type       Maximum Ports Per M40 System   Maximum Ports Per 7-Foot Rack
OC-48 SONET/SDH      8                              16
OC-12 SONET/SDH      32                             64
OC-12 ATM            32                             64
OC-3 SONET/SDH       128                            256
OC-3 ATM             64                             128
DS-3                 128                            256
How Packets Traverse the M40 Packet Forwarding Engine

[Figure: packet flow through the M40 Packet Forwarding Engine, Steps 1 through 10: the media-specific ASIC and I/O Manager on the input interface, the Distributed Buffer Manager and shared memory, the Internet Processor and its IP forwarding table, and the I/O Manager and media-specific ASIC on the output interface.]
A packet arrives at the input interface, where the media-specific ASIC performs the necessary physical-layer and framing operations (Step 1). From the media-specific stage, a serial stream of bytes is passed to the I/O Manager ASIC (Step 2). The I/O Manager determines whether the frame is IPv4 (including IP multicast) or MPLS and identifies the beginning of the Layer 3 packet. The I/O Manager also sets flags in the packet's notification that might be used for differentiated services. Finally, the I/O Manager chops the packet into 64-byte blocks and passes each of the blocks to the Distributed Buffer Manager ASIC (Step 3). These blocks are sized for efficient storage and retrieval from shared memory and are unrelated to 53-byte ATM cells.

The Distributed Buffer Manager ASIC distributes the blocks evenly in a round-robin fashion across the shared memory (Step 4). In parallel with distributing the blocks to shared memory, the Distributed Buffer Manager ASIC also extracts the route lookup key from the blocks it receives and constructs a packet notification. Recall that a packet notification is a Juniper Networks-defined (and programmable) data structure that contains all the information needed to process a packet after the packet has been stored in shared memory. For a unicast IPv4 packet, the Distributed Buffer Manager ASIC determines the incoming interface, destination IP address, source IP address, and protocol value, as well as the source and destination TCP/UDP port numbers. For an MPLS frame, the Distributed Buffer Manager extracts a route lookup key that contains the incoming interface and the value of the MPLS label. After collecting this information, the Distributed Buffer Manager forwards the notification to the Internet Processor ASIC (Step 5) so that it can make a forwarding decision for the packet.
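The lookup-key extraction described above can be sketched against the standard IPv4 and MPLS wire formats. The actual notification layout is Juniper-proprietary; only the choice of fields (interface, addresses, protocol, ports for IPv4; interface and label for MPLS) comes from the text, and the tuple representation is an assumption:

```python
import struct

def ipv4_lookup_key(ifindex: int, packet: bytes) -> tuple:
    """Pull the fields named in the text from a raw IPv4 header:
    incoming interface, destination address, source address, protocol,
    and (for TCP/UDP) the source and destination ports."""
    ihl = (packet[0] & 0x0F) * 4                 # header length in bytes
    proto = packet[9]
    src, dst = struct.unpack_from("!4s4s", packet, 12)
    sport = dport = None
    if proto in (6, 17):                         # TCP or UDP: ports follow the header
        sport, dport = struct.unpack_from("!HH", packet, ihl)
    return (ifindex, dst, src, proto, sport, dport)

def mpls_lookup_key(ifindex: int, frame: bytes) -> tuple:
    """For MPLS, the key is just the incoming interface and the 20-bit
    label from the top of the label stack."""
    label = struct.unpack("!I", frame[:4])[0] >> 12
    return (ifindex, label)
```

Feeding these keys to the lookup stage is what lets the Internet Processor decide the packet's fate without ever touching the packet body in shared memory.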
The Internet Processor ASIC performs the route lookup. For an IPv4 packet, it searches the IP forwarding table (Step 6), looking for the longest match for the destination prefix. For an MPLS frame, the Internet Processor performs an exact-match lookup in the MPLS forwarding table. After performing the lookup, the Internet Processor passes the notification containing the forwarding decision to the Distributed Buffer Manager ASIC (Step 7a), which forwards the notification to the I/O Manager ASIC for the output interface (Step 7b). For an IP multicast frame, the Internet Processor forwards a notification to the I/O Manager for each output port.

The I/O Manager ASIC on the output interface is responsible for managing packet queues. The packet itself is not queued; instead, the notification for the packet is queued while the actual packet remains stored as blocks in shared memory. In the specific case of an IP multicast packet, the I/O Manager for each output interface independently queues a packet notification. Every output port has four queues, each of which has a configured share of the physical link's bandwidth. The I/O Manager on the output interface can take a number of factors into account when deciding how to queue a packet, including the value of the IP precedence bits, utilization of the input interface, the destination address, and RED/WRED algorithms. When the packet notification reaches the front of its queue and is ready for transmission, the I/O Manager issues a request through the Distributed Buffer Manager to read the packet's blocks from shared memory (Step 8). The I/O Manager reassembles the blocks into the packet and forwards the frame to the media-specific ASIC on the output interface (Step 9).
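The four-queue, configured-share model above can be sketched as a simple weighted round-robin over notification queues. The real I/O Manager's scheduler, drop profiles, and RED/WRED behavior are more sophisticated; the weights and class names below are invented for illustration:

```python
from collections import deque

class OutputPort:
    """Sketch of per-port queueing: four queues of notifications (the
    packets themselves stay in shared memory), each queue with a
    configured share of the link's bandwidth. Weights are illustrative."""

    def __init__(self, weights=(4, 3, 2, 1)):
        self.queues = [deque() for _ in weights]
        self.weights = weights

    def enqueue(self, notification, queue_id: int):
        self.queues[queue_id].append(notification)

    def service_round(self) -> list:
        """One weighted round: drain up to `weight` notifications from
        each queue, so over time each queue gets its configured share."""
        sent = []
        for q, w in zip(self.queues, self.weights):
            for _ in range(w):
                if q:
                    sent.append(q.popleft())
        return sent
```

With weights 4:3:2:1, a saturated highest-priority queue sends four notifications per round for every one sent by the lowest-priority queue, approximating the configured bandwidth shares.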
The media-specific ASIC on the output interface performs the necessary media-specific operations, such as PPP-over-SONET scrambling and HDLC framing, places the bits in a SONET frame, identifies the beginning of the payload in the SONET frame, and then serializes the bits on the fiber (Step 10). At this point, the packet leaves the Packet Forwarding Engine for the next hop along the path towards the destination.
The M40 Internet Backbone Router in an ISP Network

Rock-Solid Reliability Under Failure Conditions

To provide an appropriate level of stability, ISPs traditionally design their networks conservatively and try to anticipate the improbable so that customers never become partitioned and suffer a complete loss of service. To do this, ISPs deploy redundant systems, install numerous backup links, and run the network at around 50 percent capacity during normal operating conditions. The ability of the network to accommodate some number of failures while still switching all traffic is a critical design objective. ISPs follow this conservative approach for several reasons:

- If a router or link fails, the rest of the network still has sufficient capacity to carry the additional traffic load. During exceptional conditions, the majority of the network can continue to run at 50 percent capacity, while certain network elements might need to operate at close to 100 percent capacity as the traffic load increases.
- Conservative designs provide ISPs with a bit of wiggle room during periods of high growth and rapid evolution. If an ISP has failed to provision adequate capacity at a point in the network topology, it might be willing to accept some additional risk and let throughput at that point drift up to 60 or 70 percent until adequate transmission capacity is available.
The challenge with this conservative approach is that existing routers generally operate well under low load conditions, but they are not engineered to provide stable performance when the traffic load jumps dramatically to 100 percent. The M40 Internet backbone router is unique in its ability to provide the rock-solid, stable performance that other systems lack during periods of extreme stress:

- The M40 Internet backbone router is fully sized with respect to both route processing and packet forwarding. During exceptional conditions, the Routing Engine continues to receive and transmit routing updates, perform route calculations, maintain peer relationships, and react to interfaces going down. Similarly, the Packet Forwarding Engine continues to switch packets at a rate of 40 Mpps regardless of packet size or load on the system.
- Complementing the architectural separation of the routing and packet-forwarding processes, atomic updates permit the state of the Packet Forwarding Engine to track the state of the Routing Engine without impacting forwarding performance. During exceptional conditions, atomic updates allow the M40 system to avoid destabilizing the links that remain up, eliminating the primary cause of cascading failures.
- The traffic engineering features supported by the JUNOS Internet software allow ISPs to manage around network failures. When the prefailure transmission capacity is no longer available, the M40 system provides tools that let an ISP determine the best way to distribute the current traffic load over the remaining resources without creating congestion and further destabilizing the network.
M40 System Deployment

The M40 Internet backbone router represents an entirely new class of service provider system, with specialized features that are familiar to those working in a telecommunications environment. For example, a craft interface permits a technician to monitor the status of the system, troubleshoot it with help from a remote Network Operations Center (NOC), and perform a number of system functions.
Figure 8: Super-POP Environment. Dense DS-3 customer access routers and legacy DS0/DS1 access feed the POP; intra-POP connectivity runs over OC-3, OC-12, or Gigabit Ethernet switches; M40 systems connect the POP to the OC-48 core backbone.
Conclusion
The M40 Internet backbone router represents an entirely new class of routing system that has been specifically designed to help ISPs negotiate the transition from OC-3 and OC-12 to OC-48 based backbones. The core of the Internet is constantly developing along two dimensions, software richness and bandwidth. The M40 system provides all the features that a next-generation routing system must support:

- The Packet Forwarding Engine is oversized, with packet processing and a switch fabric that effortlessly support 8xOC-48 interfaces at full wire speed. The M40 system provides the forwarding performance previously found only in switches, without sacrificing the elements of network control.
- The Routing Engine executes industrial-strength, full-featured routing protocol and traffic engineering software designed and written by acknowledged industry experts.
- The fundamental architecture of the M40 system (complete separation between the routing and packet-forwarding functions) has been designed with the goal of providing performance insurance to enhance the stability of large ISP networks during exceptional conditions. The M40 system is unique in its ability to contribute to network stability and adapt to highly fluctuating environments without impacting other parts of the network.
- Programmable, computer-scale ASICs allow the JUNOS Internet software to continue to evolve while future-proofing the Packet Forwarding Engine by allowing it to support new functionality without hardware changes.
- The variety and density of interfaces, as well as the mechanicals and serviceability, make the M40 Internet backbone router eminently deployable in the core of large ISP networks.

By providing a routing system where all four fundamental components (routing software, packet processing, the switch fabric, and line cards) are equally powerful, the M40 system delivers a complete solution that is unique in its ability to operate successfully in the core of the Internet.
Juniper Networks is a registered trademark of Juniper Networks, Inc. All other trademarks, service marks, registered trademarks, or registered service marks mentioned in this document are the property of their respective owners. Copyright 1997, Juniper Networks, Inc. All rights reserved. Printed in USA.