
Distributed Control for Decentralized Modular Routers

Markus Hidell, Olof Hagsand, and Peter Sjödin
Laboratory of Communication Networks, KTH IMIT, Stockholm, Sweden
{mahidell, olofh, psj}@imit.kth.se
Abstract: The requirements on IP routers continue to increase, in both the control plane and the forwarding plane. To improve scalability, flexibility, and availability, we investigate new ways to build future routers. This paper presents a system model of a decentralized and modular router architecture. Design alternatives and implementation aspects are discussed, and a system implementation is presented.

A. INTRODUCTION

The evolution of the Internet into a global multiservice network imposes a number of requirements on network systems (such as routers and switches) in terms of flexibility, reliability, and capacity. Current network systems, however, suffer from architectural limitations that make it difficult to meet future requirements and that hold back the introduction of new services and applications.

The monolithic structure is one of the main architectural limitations. Software and hardware are intertwined into a single, complex system, where a change in one subsystem may affect many other subsystems. This limits flexibility and performance, and the inherent complexity makes a monolithic system more prone to failure. Furthermore, as more and more protocols and services are invented, the list of required features keeps growing. Consequently, both control plane software and packet forwarding hardware are becoming increasingly complex.

The purpose of this work is to investigate architectures that allow network systems to be composed from multiple sophisticated modules (or elements), which communicate through open, well-defined interfaces. Our hypothesis is that such architectures can significantly improve scalability, flexibility, and availability. Scalability is improved because modules can be added as capacity requirements increase. Flexibility comes from the ability to dynamically add, remove, and modify modules. Availability is mainly due to two factors. First, modularity makes it possible to use redundancy and replication of critical functionality over multiple modules. Second, the modular structure in itself tends to limit the impact of faults in individual modules, and encourages sound engineering design principles.

In this paper we present a system model of a decentralized modular router. Different design alternatives and implementation aspects are discussed. Finally, a system implementation composed of several elements with different functionalities is described.

B. RELATED WORK

A considerable amount of work has been done on modularization and programmability of network systems in the context of active and programmable networks [1], [2], [4], [5], [6], [7], [8], [9]. A main goal in that work is to support extensibility: the ability to dynamically reconfigure network systems to support new services and applications. The work has mainly been aimed at software extensibility, allowing the packet processing software to be dynamically modified in centralized single-system architectures. In contrast, our work aims at decentralized architectures that allow for physical and logical separation of different heterogeneous modules.

Exploring decentralized architectures is in line with both industry and research efforts to improve the scaling of Internet routers. Recent commercial high-performance routers are based on distributed multi-chassis solutions (e.g., the Juniper T640 [12] and the Cisco CRS-1 [13]). The 100 Tb/s router project at Stanford University [11] targets a distributed architecture where line card chassis are connected to an optical switch fabric using fiber bundles spanning hundreds of yards. However, from a logical viewpoint, these are still closed systems without open, well-defined interfaces.

Within the IETF, the ForCES (Forwarding and Control Element Separation) working group [10] aims at defining a protocol for communication between control functions in a router (routing protocols, management, etc.) and packet forwarding functions. The purpose is to replace proprietary interfaces between control plane and forwarding plane with standardized protocols.

C. SYSTEM MODEL

In our model of a distributed router, depicted in Figure 1, several different elements interoperate to provide, for example, redundancy, load sharing, and/or functional separation between modules. Using the terminology of ForCES [10], a Network Element (NE) consists of Control Elements (CEs) and Forwarding Elements (FEs). CEs implement functions such as routing protocols, signalling protocols, and network management, while FEs perform, for example, forwarding, classification, traffic shaping, and metering. The CE functions are typically performed in software running on a general-purpose CPU, while FEs can be based on a variety of hardware platforms such as ASICs, FPGAs, network processors, and general-purpose CPUs.

[Figure 1: an NE (Network Element) containing control elements CE #1 and CE #2 and forwarding elements FE #1 and FE #2, with data and control paths between them and external ports.]

Figure 1. Model of a Network Element (such as a router) consisting of control and forwarding elements.
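The composition in Figure 1 can be written down as a small data model. The following is only a sketch of the terminology, not part of the described system: the class names, the assignment of platforms to FE #1 and FE #2, and the port names are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class Platform(Enum):
    """Hardware platforms the text lists as possible bases for an FE."""
    ASIC = "ASIC"
    FPGA = "FPGA"
    NETWORK_PROCESSOR = "network processor"
    GENERAL_PURPOSE_CPU = "general-purpose CPU"


@dataclass
class ControlElement:
    name: str
    functions: list = field(default_factory=list)  # e.g. routing protocols


@dataclass
class ForwardingElement:
    name: str
    platform: Platform
    external_ports: list = field(default_factory=list)


@dataclass
class NetworkElement:
    control_elements: list = field(default_factory=list)
    forwarding_elements: list = field(default_factory=list)

    def external_ports(self):
        """All (FE name, port) pairs that the NE exposes to the outside."""
        return [(fe.name, port)
                for fe in self.forwarding_elements
                for port in fe.external_ports]


# Hypothetical instance mirroring Figure 1: two CEs and two FEs.
ne = NetworkElement(
    control_elements=[ControlElement("CE #1", ["routing protocols"]),
                      ControlElement("CE #2", ["network management"])],
    forwarding_elements=[
        ForwardingElement("FE #1", Platform.NETWORK_PROCESSOR, ["port0", "port1"]),
        ForwardingElement("FE #2", Platform.GENERAL_PURPOSE_CPU, ["port0"]),
    ],
)
print(ne.external_ports())
```

Seen from outside, the NE is the union of the external ports of its FEs, regardless of how many CEs and FEs it is built from.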

I. Design Issues

Our approach is to use existing open source software modules and off-the-shelf hardware (such as PCs and network processor development platforms) as functional node elements in the system design. Thus, we will modularize existing components rather than develop modularized components from scratch.

a) Internal Network

The internal network interconnects CEs and FEs, and carries control and data traffic between the elements. It should scale to a large number of FEs and CEs as well as to a geographical area that goes beyond a regular multi-chassis solution. We choose to use two separate internal networks, one for control and one for data. The data network should have high capacity and should be extensible as more forwarding elements are added to the system, while the capacity requirements on the control network are more relaxed. Further, such a separation of data and control prevents control traffic from being starved by large volumes of data passing through the NE.

Another design decision concerns the choice of protocol for the internal network. For simplicity, it is tempting to use UDP, which would also enable the use of IP multicast. However, there are many different types of internal communication. For example, some information requires reliable transport (e.g., route updates), while other information favors timeliness over reliability (e.g., heartbeat messages). There may even be information that is sent from one to many and still requires reliable transport. Our conclusion is that we cannot rely solely on UDP. Some information will be communicated using TCP, and possibly also over a reliable multicast transport protocol.
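The reasoning about transport choice can be summarized as a small decision function. This is a minimal sketch under stated assumptions: the function name, its two boolean parameters, and the `Transport` enumeration are hypothetical and are not taken from the Forz design.

```python
from enum import Enum


class Transport(Enum):
    UDP = "UDP"
    TCP = "TCP"
    RELIABLE_MULTICAST = "reliable multicast"


def choose_transport(needs_reliability: bool, one_to_many: bool) -> Transport:
    """Pick a transport for one class of internal control traffic."""
    if not needs_reliability:
        # Timeliness matters more than delivery guarantees; UDP also
        # permits IP multicast when there are several receivers.
        return Transport.UDP
    if one_to_many:
        # Reliable delivery to many receivers.
        return Transport.RELIABLE_MULTICAST
    # Reliable delivery to a single receiver.
    return Transport.TCP


# The two examples from the text: heartbeats and route updates.
heartbeat = choose_transport(needs_reliability=False, one_to_many=True)
route_update = choose_transport(needs_reliability=True, one_to_many=False)
print(heartbeat.value, route_update.value)
```

The third case, reliable one-to-many delivery, is the one that neither plain UDP nor TCP covers well, which is why the text leaves room for a reliable multicast transport.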

One complication with using IP both internally and externally is that the internal routing must be separated from (and invisible to) the external routing. In addition, a packet cannot be forwarded between external and internal ports using ordinary IP forwarding. Instead, internal forwarding needs to be based on tunneling, using for example IEEE 802.1Q VLANs, MPLS, or IP-in-IP tunnels.

Furthermore, there are several alternatives with respect to the partitioning of functions between ingress and egress FEs. In particular, the partitioning of the lookup operation influences the amount of control information that needs to be communicated over the internal network. One alternative is to let the ingress FE perform the complete lookup. In this case, the ingress FE must communicate the result of the lookup (e.g., outgoing interface and next-hop MAC address) along with the actual packet when it is forwarded internally to the egress FE. This means that the complete forwarding decision can be made in a single lookup operation, which may be advantageous for efficiency reasons.

An alternative solution is to let the ingress FE only look up the egress FE, and leave it to the egress FE to complete the forwarding decision by performing a new lookup to obtain the outgoing interface and next-hop MAC address. The advantage is that no additional information needs to be communicated from ingress FE to egress FE. The drawback is that at least two lookup operations must be done to make a complete forwarding decision.

To summarize: ingress and egress FEs may have to cooperate to make a complete forwarding decision, and it must be possible to communicate information from ingress to egress FE along with the packet.

b) Dynamics

There are several possible approaches for NE configuration, i.e., how to decide which functional elements (CEs and FEs) should cooperate and belong to a certain NE. Our approach is that a trusted entity decides which components should constitute the NE using a basic NE configuration. Our router model can be described as a loosely coupled system, where CEs and FEs are allowed to join and leave the NE in a dynamic fashion. Such a design is both robust (the NE can continue to operate even if a certain element breaks) and flexible (elements can be added and deleted during operation of the NE).

D. IMPLEMENTATION ASPECTS

The starting point for our implementation is an ordinary PC-based router, running UNIX and Zebra open source routing software [3], as shown in Figure 2.

[Figure 2: routing software with a RIB at user level and a forwarder with a FIB at kernel level, connected through netlink; four interfaces (IF) attach to external links.]

Figure 2. A monolithic PC-based router

The routing protocols build up a routing information base (RIB). The information in the RIB is condensed and communicated to the operating system kernel, where a forwarding information base (FIB) is constructed. The FIB is consulted by the kernel-level IP packet forwarder when packets are forwarded from incoming to outgoing interface (IF).

The local communication between Zebra and the UNIX kernel is done through the Netlink interface [14]. Netlink is a message-based Application Programming Interface (API) for networking applications. In general, a number of networking objects reside in the UNIX kernel, such as FIBs, interfaces, and filter rulesets. Networking applications use Netlink to access those objects.

Both Zebra and Netlink are designed to operate within a single UNIX system. In particular, Netlink does not provide support for remote access to networking objects. Our first step is therefore to design a new protocol, called Forz, for communication between physically separated control and forwarding elements. Forz supports CE-FE communication as well as CE-CE and FE-FE communication. One of its duties is to encapsulate Netlink messages in IP packets. Figure 3 illustrates a decentralized router with one FE and one CE, where Forz is used for the internal communication.

[Figure 3: a CE (routing s/w with RIB, netlink, Forz) and an FE (Forz, FE-specific part, forwarder with FIB, two interfaces (IF) to external links), each with a ctl component, communicating via the Forz protocol over an IP network.]

Figure 3. Physical separation between CE and FE.

The implementation aspects and challenges differ between the CE side and the FE side, and in the remaining part of this section we discuss CE implementation and FE implementation separately.

I. CE Implementation Aspects

There are two main alternatives for implementing Forz. The first alternative is to integrate Forz into the control applications. The advantage of this method is that the resulting software can be kept platform-independent and that only user-level programming is needed. The disadvantage is that this approach basically requires the whole protocol stack to be recreated inside the application. To understand why this would be necessary, consider for example a BGP daemon. The BGP daemon expects that the physical interfaces are present locally and that packets to/from the daemon go through the kernel-level TCP/IP stack. Instead, packets to/from the BGP daemon have to go through the user-level Forz software. Thus, a user-level TCP/IP stack is needed, since the kernel-level TCP/IP stack is only aware of the NE's internal IP network.

An alternative method is to integrate Forz into the operating system kernel and create virtual interfaces in the CE kernel serving as local representations of the remote physical interfaces. Such an approach would allow the control applications to remain unchanged. The work to integrate Forz with the OS kernel may be more challenging from a programming perspective, and would also result in platform-dependent software. However, it is a great advantage to be able to use existing control applications without modifications.

II. FE Implementation Aspects

The implementation aspects on the FE side are very different from those on the CE side. The FE may be based on a variety of hardware architectures, from highly specialized ASICs to general-purpose processors, and a decentralized modular router should support heterogeneous FEs. Thus, the implementation of the Forz protocol on the FE side depends on the FE architecture. Generally, the upper side of the Forz implementation is generic and deals with communication with other NE participants. The lower side of the Forz implementation is platform-specific and translates Netlink messages, carried in Forz messages, to a command format that is understandable to the specific FE hardware. In the case of a PC-based FE running Linux, the translation is fairly straightforward. The Forz implementation extracts the Netlink messages and sends them down to the kernel, where the existing kernel-level forwarding is used to forward packets between the physical interfaces.

E. SYSTEM IMPLEMENTATION

We have developed an experimental platform based on the decentralized system model. The platform includes three different CEs (zebra_CE, arp_CE, and linux_CE) and four different FEs (cref_FE, linux_FE, ixp_FE, and x10_FE). CEs and FEs dynamically establish and maintain NE association through the Forz protocol. The system is depicted in Figure 4.
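How CEs and FEs might maintain their association with the NE can be illustrated with a simple membership table driven by join, heartbeat, and leave events. The paper does not specify how Forz does this, so everything here is an assumption for illustration: the class name `Membership`, the method names, and the fixed timeout are hypothetical. Heartbeat messages are mentioned in the text only as an example of internal traffic.

```python
class Membership:
    """Tracks which elements currently belong to the NE (hypothetical)."""

    def __init__(self, timeout: float):
        self.timeout = timeout   # seconds of silence before removal
        self.last_seen = {}      # element name -> time of last message

    def join(self, element: str, now: float) -> None:
        self.last_seen[element] = now

    def heartbeat(self, element: str, now: float) -> bool:
        """Refresh a member; heartbeats from non-members are ignored."""
        if element not in self.last_seen:
            return False
        self.last_seen[element] = now
        return True

    def leave(self, element: str) -> None:
        self.last_seen.pop(element, None)

    def expire(self, now: float) -> list:
        """Remove and return members that have been silent too long."""
        dead = sorted(e for e, t in self.last_seen.items()
                      if now - t > self.timeout)
        for element in dead:
            del self.last_seen[element]
        return dead

    def members(self) -> list:
        return sorted(self.last_seen)


# Time is passed in explicitly so the example is deterministic.
ne = Membership(timeout=3.0)
ne.join("zebra_CE", now=0.0)
ne.join("linux_FE", now=0.0)
ne.heartbeat("linux_FE", now=2.0)
expired = ne.expire(now=4.0)
print(expired, ne.members())
```

In this sketch the NE keeps operating with the remaining members after one element falls silent, which is the robustness property the loosely coupled model aims for.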
[Figure 4: an NE in which zebra_CE, arp_CE, and linux_CE are connected over IP to cref_FE, linux_FE, ixp_FE, and x10_FE.]

Figure 4. System implementation overview.

The zebra_CE is a user-space CE based on a modified version of Zebra. It supports manual configuration of remote network interfaces and static routes. The zebra_CE takes commands, given through Zebra's command line interface or a configuration file, and translates them to Netlink messages that are distributed through the Forz protocol.

The arp_CE is a user-space CE that implements ARP. The arp_CE assists the FEs in resolving IP addresses for outgoing packets and in replying to ARP requests.

The linux_CE is a kernel-based CE. The purpose of this implementation is to allow existing Linux networking tools and routing protocols to be used without modification to control the decentralized router. Virtual interfaces and other objects reflecting the remote physical objects of the FEs are created in the kernel. This means that remote networking objects appear as local entities that can be accessed by standard configuration tools and routing daemons. For instance, the (unmodified) Zebra software operates, through Netlink, on virtual network interfaces in the kernel, and routing daemons use the kernel-level TCP/IP stack to communicate with external peers.

We have developed two forwarding element implementations based on network processors: ixp_FE on Intel IXP1200 network processors, and x10_FE on Xelerated's X10 network processor. The cref_FE is a user-level IPv4 forwarder implemented in a PC running Unix. Although such an implementation is of limited practical use, it has turned out to be invaluable for development of the Forz protocol. Finally, a kernel-based FE has been implemented in Linux (linux_FE). This FE uses the existing Linux kernel forwarder, which gives additional forwarding functionality and better performance compared to the cref_FE.

F. CONCLUSIONS AND FURTHER WORK

We believe that a decentralized modular router design is a promising way to ensure scalability, flexibility, and availability. We have demonstrated that it is practically feasible to modularize existing control and forwarding components in an experimental platform, where several different control elements and heterogeneous forwarding elements interoperate to form a network element. We will continue experimental verification through different measurements and study, for instance, fail-over behavior and load sharing between multiple instances of specific control elements.

REFERENCES

[1] F. Kuhns et al., "Design and Evaluation of a High Performance Dynamically Extensible Router," Proceedings of the DARPA Active Networks Conference and Exposition, 5/02.
[2] D. Decasper, Z. Dittia, G. Parulkar, and B. Plattner, "Router Plugins: A Software Architecture for Next-Generation Routers," IEEE/ACM Transactions on Networking, 8, 2000.
[3] GNU Zebra, URL=http://www.zebra.org
[4] Y. Gottlieb and L. Peterson, "A Comparative Study of Extensible Routers," in 2002 IEEE Open Architectures and Network Programming Proceedings, pages 51-62, June 2002.
[5] M. Handley, O. Hodson, and E. Kohler, "XORP: An Open Platform for Network Research," First Workshop on Hot Topics in Networks, Princeton, New Jersey, October 2002.
[6] G. Hjálmtýsson et al., "Dynamic Packet Processors, A new abstraction for router extensibility," Proceedings of OPENARCH-2003, San Francisco, April 2003.
[7] S. Karlin and L. Peterson, "VERA: An Extensible Router Architecture," Computer Networks, Volume 38, Issue 3, pages 277-293, 2002.
[8] Computer Networks, Special Issue on Programmable Networks, Vol. 38, No. 3, February 2002.
[9] IEEE Journal on Selected Areas in Communications on Active and Programmable Networks, Vol. 19, No. 3, March 2001.
[10] ForCES (Forwarding and Control Element Separation) IETF Working Group, URL=http://www.ietf.org/html.charters/forcescharter.html.
[11] I. Keslassy et al., "Scaling Internet Routers Using Optics," ACM Sigcomm 2003, Karlsruhe, Germany, 2003.
[12] Juniper, "T-Series Routing Platforms: System and Packet Forwarding Architecture," White paper, 2002.
[13] Cisco, "Next Generation Networks and the Cisco Carrier Routing System," White paper, 2004.
[14] J. Salim, A. Kleen, and A. Kuznetsov, "Linux Netlink as an IP Services Protocol," Internet RFC 3549, July 2003.
