Conventional Ethernet protocols struggle to meet the scalability and performance requirements of data centers. Viable replacements have been proposed for data center Ethernet (DCE): link-layer multipathing (MP) is deployed to replace the spanning tree protocol (STP) and thus improve network throughput, and end-to-end link-layer congestion control (CC) is proposed to better guarantee loss-free frame delivery for Ethernet. However, little work has been done to incorporate MP and CC into a more comprehensive solution for DCE. In this paper, we propose a two-tier solution by integrating our dynamic load balancing multipathing (DLBMP) scheme with CC. Instead of using two separate parameters, i.e., path load and buffer level, to trigger MP and CC, our solution only needs to monitor the path load metric to manage MP and CC in an integrated way. Different from a standalone CC mechanism, which generates congestion notifications from the network core, our integrated CC makes use of link load information in access switches, which directly inform sources to control their traffic admission. To minimize overhead and accelerate updates, SDN techniques are employed in our implementation, which decouples routing intelligence from data transmission. Hence, data sources can react more rapidly to congestion, and the network can guarantee loss-free delivery. In addition, our MP scheme is further improved by introducing application-layer flow differentiation. With such a fine flow differentiation (FFD) mechanism, traffic can be more evenly distributed along multiple paths, resulting in better bandwidth utilization. Simulation results show that our combined solution can further improve network throughput with the FFD mechanism and guarantee loss-free delivery with the integrated CC.
Index Terms—Data center Ethernet, dynamic multipath, load balancing, rate control.
The purpose of maintaining such consistent routeID information among all switches is that a switch can uniquely map a routeID to a route recorded in its routingTable. When a data frame enters the network, the access switch chooses one route to forward the frame and attaches the routeID of the chosen route in the frame header. When a non-access switch receives the frame, it simply forwards the frame to its next hop according to the routeID by consulting its routingTable. When the corresponding egress access switch receives the frame, it checks its MAC association table and forwards the frame to the destined end station through the correct Ethernet port.
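To make the two forwarding roles concrete, the following minimal Python sketch (with illustrative names, not taken from the paper) shows how a non-access switch resolves a next hop purely from the routeID, while the egress access switch falls back to its MAC association table:

from dataclasses import dataclass

@dataclass
class Frame:
    route_id: int          # routeID attached by the ingress access switch
    dst_mac: str           # destination end-station MAC address
    payload: bytes = b""

class NonAccessSwitch:
    def __init__(self, routing_table):
        # routingTable: routeID -> outgoing port toward the next hop
        self.routing_table = routing_table

    def forward(self, frame):
        # A non-access switch only consults the routeID; it keeps no flow state.
        return self.routing_table[frame.route_id]

class EgressAccessSwitch:
    def __init__(self, mac_table):
        # MAC association table: end-station MAC -> local Ethernet port
        self.mac_table = mac_table

    def deliver(self, frame):
        # Deliver the frame on the port that reaches the destined end station.
        return self.mac_table[frame.dst_mac]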
Fig. 4. Illustrations of path load updating.

B. Path Load Updating

In our design, the path loads of routes are updated in two phases. In the first phase, a data frame piggybacks the highest link load along the intermediate switches on the route to its egress access switch. In the second phase, ingress access switches receive path load updates through the central controller, which periodically exchanges control information among access switches.

1) Upstream Path Load Updates Using Piggybacking: During operation, each switch continuously measures the number of outgoing frames and the size of each frame coming from its Ethernet ports, so that it can determine the occupied link capacity for all attached links. For a link $l$, the ratio between the utilized link capacity and the total link capacity is defined as the link load $LL(l)$. Hence, for a path $P$ composed of several links $l_1, l_2, \ldots, l_n$, the corresponding path load $PL(P)$ is defined as follows:

$PL(P) = \max\{LL(l_1), LL(l_2), \ldots, LL(l_n)\}$  (1)

where $LL(l_1), LL(l_2), \ldots, LL(l_n)$ represent the link loads of $l_1, l_2, \ldots, l_n$, respectively.
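As a concrete reading of (1), the following sketch computes link loads from byte counts and takes their maximum as the path load; the byte-counting interface and the numbers are our own illustration:

def link_load(bytes_sent, interval_s, capacity_bps):
    """Ratio of utilized link capacity to total link capacity."""
    utilized_bps = bytes_sent * 8 / interval_s
    return utilized_bps / capacity_bps

def path_load(link_loads):
    """Path load PL(P) = max of the link loads along the path, per (1)."""
    return max(link_loads)

# Example: a three-link path where the middle link is the bottleneck.
loads = [link_load(5e8, 1.0, 1e10),   # 0.40
         link_load(9e8, 1.0, 1e10),   # 0.72
         link_load(2e8, 1.0, 1e10)]   # 0.16
assert abs(path_load(loads) - 0.72) < 1e-9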
When forwarding a frame for an end station, besides the routeID of the chosen route, the access switch also piggybacks the pathLoad information in the frame header. Here, pathLoad is the transient link load of the corresponding outgoing link (Step 2 in Fig. 4).

As the frame flows through the path, each intermediate switch checks the pathLoad information recorded in the frame's header. If the recorded pathLoad is higher than the outgoing link's linkLoad, the frame remains intact. Otherwise, the switch replaces the pathLoad field in the frame header with its outgoing link's linkLoad and forwards the frame to its next hop. Upon the arrival of the frame at its destination, the pathLoad recorded in the frame's header always shows the linkLoad of the most heavily loaded link along the path, which properly represents the corresponding pathLoad (Step 3 in Fig. 4). The pathLoad information for the respective routes is recorded in the routingTable of the egress access switches; Table IV gives an example routingTable with such information (Step 4 in Fig. 4). Periodically, the central controller checks the routingTables of egress access switches for the pathLoad information of their associated routes and updates ingress access switches accordingly; details are given in the next section (Step 5 in Fig. 4).
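The piggybacking rule reduces to a running maximum over the hops. A minimal sketch, with illustrative field names:

def update_pathload_field(frame_pathload, outgoing_link_load):
    """Return the pathLoad value the frame should carry to the next hop."""
    if frame_pathload >= outgoing_link_load:
        return frame_pathload          # frame remains intact
    return outgoing_link_load          # replace with the higher linkLoad

# Walking a frame across a path reproduces PL(P) = max of the link loads:
header = 0.0
for ll in (0.40, 0.72, 0.16):
    header = update_pathload_field(header, ll)
assert header == 0.72   # the egress access switch records this as the pathLoad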
2) Downstream Path Load Updates Using Software-Defined Networking: Access switches connect to a central controller, which enables pathLoad exchange among access switches without network delay. Instead of running a purely distributed algorithm, which mixes data traffic and control information with each other and results in performance degradation, or a purely SDN-based protocol, which depends only on a central controller that accesses every switch and holds all the intelligence, our design lets a data frame piggyback control information in its frame header along the data path toward its egress access switch, while ingress access switches obtain pathLoad updates through the control path.

Specifically, only the access switches in our proposal run SDN protocols, which allows the central controller to access the routingTables of all the access switches. When SDN-enabled switches start, they open a secure channel to the central controller. The controller can query, insert, and modify flow entries. The switches maintain statistics in their routingTables, such as pathLoad and lastUpdate, as illustrated in Table IV.

The central controller performs as follows: it obtains the network status from the routingTables of all access switches periodically, and updates ingress access switches accordingly when it detects a pathLoad change on the associated routes (Step 5 and Step 6 in Fig. 4).
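A hedged sketch of the controller's periodic update loop under the description above; the method name update_pathload and the change threshold eps are assumptions, not APIs from the paper:

import time

def controller_loop(egress_switches, ingress_of_route, period_s=1.0, eps=0.01):
    last_seen = {}                                 # routeID -> last pathLoad
    while True:
        for sw in egress_switches:
            for route_id, entry in sw.routing_table.items():
                pl = entry["pathLoad"]
                # Push updates only when the pathLoad actually changed,
                # which keeps control traffic low.
                if abs(pl - last_seen.get(route_id, -1.0)) > eps:
                    ingress_of_route[route_id].update_pathload(route_id, pl)
                    last_seen[route_id] = pl
        time.sleep(period_s)                       # Step 5 and Step 6 in Fig. 4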
The switches perform as follows: if an incoming frame does not match any of the flow entries in the routingTable, the switch inserts a new flow entry with the appropriate output port (based on Section III-C), which allows any subsequent frames to be directly forwarded at line rate in hardware (Step 1 in Fig. 4). Once the traffic volume on a route grows beyond the specified threshold, the access switch may inform the source to adjust its data rate according to the mechanism discussed in Section III-D.

To summarize, an illustration of the updating process is presented in Fig. 4. The process follows seven steps: Step 1, ingress access switch frame forwarding; Step 2, source access switch route selection; Step 3, intermediate switch path load piggybacking; Step 4, egress access switch path load updating; Step 5, central controller path load updating; Step 6, source access switch path load updating; and, optionally, Step 7, CC feedback if the path load exceeds the preset threshold.

Our design differentiates itself from a purely centralized algorithm in two ways. First, only the access switches are SDN-enabled while the rest are still common switches, which permits easy implementation and high scalability. Second, it also runs a distributed algorithm in case of controller failure, which updates ingress access switches through the data paths; this ensures that our system operates even when the controller cannot function well.
TABLE IV
routingTable OF
TABLE V
FLOWINFOTABLE OF S5
C. Load Splitting and Balancing

When an access switch receives a frame belonging to a new flow, it checks the pathLoad information in its routingTable and chooses among all the available routes, with a probability inversely proportional to their pathLoad, to deliver this new flow. This helps to achieve fairer and better utilization of network links and routes.
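A small sketch of this inverse-pathLoad route selection; the floor constant that guards against division by zero on idle routes is our assumption:

import random

def choose_route(routing_table, floor=1e-3):
    """routing_table: routeID -> pathLoad in [0, 1]. Returns a routeID."""
    route_ids = list(routing_table)
    weights = [1.0 / max(routing_table[r], floor) for r in route_ids]
    return random.choices(route_ids, weights=weights, k=1)[0]

# A route at 20% load is picked about 4x more often than one at 80% load.
table = {1: 0.2, 2: 0.8}
picks = sum(choose_route(table) == 1 for _ in range(10000))  # roughly 8000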
1) Per-Flow Forwarding: DLBMP provides dynamic load balancing by splitting traffic across multiple paths. However, if paths have different delays, traffic split at frame granularity can cause a large number of frames to arrive out of order. TCP misinterprets this reordering as a sign of congestion, resulting in degraded performance [23], [24]. Even some UDP-based applications are sensitive to packet reordering. Most importantly, it is critical to preserve in-order delivery for FCoE SAN traffic, as neither FC nor the Small Computer System Interface (SCSI) handles packet reordering well. In DLBMP, all frames of a flow are forwarded on the same path to preserve in-order delivery, and traffic splitting for load balancing occurs only at flow granularity.

Besides the MAC address and EtherType in the Ethernet frame header, application-layer information is also considered in flow differentiation so that flows can be differentiated at a finer granularity. For example, IPv4 packets can be further differentiated by their TCP or UDP port numbers, and FCoE flows can be further differentiated by their OXID along with other SCSI exchange parameters. As a result, application-layer flows can be finely differentiated in DLBMP.
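The following sketch illustrates one plausible flowID construction along these lines; the paper names the fields but not their encoding, so the tuple layout here is an assumption:

def flow_id(frame):
    # Base flow key from the Ethernet header.
    base = (frame["src_mac"], frame["dst_mac"], frame["ethertype"])
    if frame["ethertype"] == 0x0800:      # IPv4: add L4 protocol and ports
        return base + (frame["proto"], frame["src_port"], frame["dst_port"])
    if frame["ethertype"] == 0x8906:      # FCoE: add the SCSI exchange ID (OXID)
        return base + (frame["oxid"],)
    return base                           # fall back to L2 fields only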
In DLBMP, each access switch records all the flows that it currently handles in an information table named flowInfoTable. Table V gives an example flowInfoTable. The flowID field uniquely identifies an application-layer flow; it actually represents the header portion shared by all frames that belong to the flow. The routeID field gives the route used for the corresponding flow, and the lastUpdate field records the time of the last frame received for the flow, to help validate the freshness of the flow. Non-access-layer switches do not maintain such a flowInfoTable, as they always forward a frame according to the routeID indicated in the header of the received frame.

As a data frame enters the network through some access switch, the switch can tell whether the frame belongs to an existing flow or not. If the frame is from an existing flow, the access switch first updates the lastUpdate field of the flow's entry in the flowInfoTable with the current time, and then uses the route referred to by the routeID field to deliver the current data frame. In addition, it attaches the corresponding routeID in the frame header. Otherwise, it selects a route to forward the frame and adds an entry for the new flow in its flowInfoTable, as the frame is likely the first frame of a new flow.
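Putting the pieces together, a sketch of the per-flow forwarding decision at an access switch; it reuses the hypothetical flow_id() and choose_route() sketches above, and time.monotonic() stands in for the switch clock:

import time

class AccessSwitch:
    def __init__(self, routing_table):
        self.routing_table = routing_table   # routeID -> pathLoad
        self.flow_info = {}                  # flowID -> {routeID, lastUpdate}

    def handle(self, frame):
        fid = flow_id(frame)
        entry = self.flow_info.get(fid)
        if entry is None:                    # likely the first frame of a new flow
            entry = {"routeID": choose_route(self.routing_table),
                     "lastUpdate": time.monotonic()}
            self.flow_info[fid] = entry
        else:                                # existing flow: keep its path
            entry["lastUpdate"] = time.monotonic()
        frame["route_id"] = entry["routeID"]  # attach routeID in the frame header
        return entry["routeID"]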
D. Congestion Control

By integrating the CC mechanism into DLBMP, we aim to handle network congestion better and prevent loss by controlling the data rate at the data sources.

1) Congestion Detection and Notification: Instead of generating congestion notifications from any switch with potential congestion, the access switches are equipped with the intelligence for detecting congestion on a route and notifying the corresponding data sources with explicit messages.

Since each access switch records up-to-date pathLoad information for each route originating from itself, it can easily detect a congested route when the pathLoad keeps increasing and exceeds a predefined threshold, and inform all the data sources on that route according to its flowInfoTable, requiring them to reduce the data rate of their flows. Specifically, we implement a sample unit at access switches for congestion detection. The sample unit controls the sample interval using a preset sample byte count. Upon receiving an update in which the pathLoad of any entry exceeds the threshold, indicating potential congestion, the sample unit starts recording the number of bytes that have been sent through the switch. When the accumulated bytes reach the preset count, flows on the routes that exceed the threshold stand a probability of rate reduction. With this sample unit, the sample interval varies: when the traffic volume becomes higher, sampling becomes more frequent, and vice versa.
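A sketch of such a sample unit; the threshold and sample-byte symbols are elided in the source text, so the names and defaults below are stand-ins:

class SampleUnit:
    def __init__(self, threshold=0.8, sample_bytes=1_000_000):
        self.threshold = threshold
        self.sample_bytes = sample_bytes
        self.armed = False        # set when a pathLoad update exceeds the threshold
        self.acc = 0              # bytes accumulated since the last sample

    def on_pathload_update(self, path_load):
        if path_load > self.threshold:
            self.armed = True     # potential congestion: start counting bytes

    def on_frame_sent(self, nbytes):
        """Return True when a congestion-notification sample should fire."""
        if not self.armed:
            return False
        self.acc += nbytes
        if self.acc >= self.sample_bytes:
            self.acc = 0          # higher traffic volume -> more frequent samples
            return True           # flows on overloaded routes may be rate-reduced
        return False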
Similar to TCP, which utilizes explicit congestion notification (ECN) to indicate potential congestion, our mechanism also notifies the source through explicit feedback.

It is important to note that this congestion control mechanism can differentiate among different priorities by selecting different parameters according to the guidelines provided in [22]. However, this topic is not within the scope of this paper, and we do not provide details and experiments due to the space limit.

2) Rate Limiter Control: At the source side, we use rate limiters to control the traffic generation rate from the application layer.

Initially, a rate limiter starts with a sending rate of $R_0$ and increases it exponentially by doubling the rate in each predefined time interval, called a slot. Upon receiving congestion notifications from access switches, rate limiters enter adjustment cycles. Similar to additive increase and multiplicative decrease (AIMD) in TCP, we implement our rate control in such a way that once a source receives a congestion notification, it reduces its rate to a proportion $\beta$ of the current rate and then increases it linearly by $\alpha$ in each slot, as shown in (2).

Let $R(t)$ be the sending rate a flow transmits to the network at time $t$ through a certain route; the source will adjust its load at the next slot by

$R(t+1) = \begin{cases} \beta R(t), & \text{if a congestion notification is received} \\ R(t) + \alpha, & \text{otherwise} \end{cases}$  (2)
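A sketch of the rate limiter's slot-by-slot behavior per (2); R0, alpha, beta, and the link-rate cap are stand-ins for symbols the source elides:

class RateLimiter:
    def __init__(self, r0=1.0, alpha=0.5, beta=0.5, link_rate=10_000.0):
        self.rate = r0                # current sending rate (e.g., Mb/s)
        self.alpha = alpha            # linear increment per slot
        self.beta = beta              # multiplicative reduction factor
        self.link_rate = link_rate    # upper bound on the sending rate
        self.adjusting = False        # False: initial doubling phase

    def next_slot(self, congestion_notified):
        if congestion_notified:
            self.adjusting = True
            self.rate *= self.beta    # multiplicative decrease, per (2)
        elif self.adjusting:
            self.rate += self.alpha   # additive increase, per (2)
        else:
            self.rate *= 2.0          # initial exponential ramp-up
        self.rate = min(self.rate, self.link_rate)
        return self.rate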
The CC mechanism can guarantee loss-free delivery by successfully controlling the data sending rate from the sources in an overloaded network. Hence, the performance gaps become more obvious after the flow number increases beyond 1900 flows.

Second, we show the comparison of throughput on each outgoing port. Due to the space limitation, we list only the statistics collected from the simulation test with 2100 flows; however, they are comparable to those of the tests with other flow numbers. In Fig. 9, we list the total number of frames, in millions, sent from all the switch ports towards the destinations, Pod3 and Pod4. These ports include the uplink ports of SW1, SW2, SW3, and SW4, as well as the downlink ports of SW5, SW6, SW7, SW8, SW9, and SW10. To distinguish the ports in the same switch, we name them L ports, for those on the left, and R ports, for those on the right. In Fig. 9, we can see that the traffic load is most imbalanced under the STP algorithm. TRILL improves significantly as compared to STP. However, our DLBMP and DLBMP+CC schemes outperform TRILL, which is mainly due to their periodic inspection of path load and dynamic traffic distribution. With fine-grained flow control, DLBMP+CC balances traffic most evenly among the four schemes, with the throughput of all ports ranging between 30 and 35 million frames, owing to the introduction of FFD.

Fig. 9. Throughput at each outgoing port with regard to different algorithms: (a) STP; (b) TRILL; (c) DLBMP; (d) DLBMP+CC.

V. CONCLUSION

In this paper, we proposed an integrated Ethernet solution, DLBMP+CC, by combining MP with CC based on our previously proposed DLBMP. As compared with DLBMP, DLBMP+CC improves network throughput because its application-layer flow differentiation can make full utilization of the network bandwidth, and its congestion control can prevent excessive traffic from entering the network.

With the introduction of SDN, our dynamic algorithm can update and react to load imbalance and traffic congestion promptly. In a heavily loaded network, DLBMP+CC can still guarantee load balance and loss-free frame delivery with its fast reaction, as the central controller amasses network status and exchanges information effectively among ingress and egress access switches. Simulation results demonstrated the effectiveness and efficiency of DLBMP+CC in different scenarios.

We use a single measuring parameter, path load, in this paper to track the network traffic load for the integration of MP and CC. However, the buffer utilization level is another important parameter that can indicate network load and congestion; incorporating it is a possible direction for future work.

REFERENCES

[1] S. Gai, Data Center Networks and Fibre Channel over Ethernet (FCoE), 2008.
[2] G. Silvano and D. Claudio, I/O Consolidation in the Data Center: A Complete Guide to Data Center Ethernet and Fibre Channel over Ethernet, 2009.
[3] J. B. Graham Smit, "Converged Enhanced Ethernet-Good for iSCSI SANs," 2008 [Online]. Available: http://bladenetwork.net/userfiles/file/PDFs/WP_NetApp_Enhanced_Ethernet.pdf
[4] A. Benner, P. Pepeljugoski, and R. Recio, "A roadmap to 100 G Ethernet at the enterprise data center," IEEE Commun. Mag., vol. 45, no. 11, pp. 10–17, 2007.
[5] T11-FC-BB-5 Standard, 2010.
[6] 802.1D MAC Bridges [Online]. Available: http://www.ieee802.org/1/pages/802.1D-2003.html
[7] 802.1w—Rapid Reconfiguration of Spanning Tree [Online]. Available: http://www.ieee802.org/1/pages/802.1w.html
[8] C. Raiciu, S. Barre, C. Pluntke, A. Greenhalgh, D. Wischik, and M. Handley, "Improving datacenter performance and robustness with multipath TCP," ACM SIGCOMM Comput. Commun. Rev., vol. 41, no. 4, pp. 266–277, 2011.
[9] Y. Dong, D. Wang, N. Pissinou, and J. Wang, "Multi-path load balancing in transport layer," in Proc. 3rd EuroNGI Conf. Next Generation Internet Netw., 2007, pp. 135–142.
[10] J. Mudigonda, P. Yalagandula, M. Al-Fares, and J. C. Mogul, "SPAIN: COTS data-center Ethernet for multipathing over arbitrary topologies," in Proc. 7th USENIX Conf. Netw. Syst. Design Implement., 2010, pp. 18–18.
[11] R. Niranjan Mysore, A. Pamboris, N. Farrington, N. Huang, P. Miri, S. Radhakrishnan, V. Subramanya, and A. Vahdat, "PortLand: A scalable fault-tolerant layer 2 data center network fabric," ACM SIGCOMM Comput. Commun. Rev., vol. 39, no. 4, pp. 39–50, 2009.
[12] D. Bergamasco, "Ethernet congestion manager (ECM) specification," Cisco Systems, initial draft EDCS-574018, 2007.
[13] 802.1Qau—Congestion Notification, IEEE Standard for Local and Metropolitan Area Networks—Virtual Bridged Local Area Networks—Amendment, 2007 [Online]. Available: http://www.ieee802.org/1/pages/802.1au.html
[14] N. McKeown, T. Anderson, H. Balakrishnan, G. Parulkar, L. Peterson, J. Rexford, S. Shenker, and J. Turner, "OpenFlow: Enabling innovation in campus networks," ACM SIGCOMM Comput. Commun. Rev., vol. 38, no. 2, pp. 69–74, 2008.
[15] Y. Yu, K. Aung, E. Tong, and C. Foh, "Dynamic load balancing multipathing for converged enhanced Ethernet," in Proc. 18th Annu. IEEE Int. Symp. Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS), Miami, FL, USA, 2010.
[16] R. Perlman, "Rbridges: Transparent routing," in Proc. IEEE INFOCOM, 2004, vol. 2, pp. 1211–1218.
[17] Transparent Interconnection of Lots of Links (TRILL), IETF WG [Online]. Available: http://www.ietf.org/html.charters/trill-charter.html
[18] C. Kim, M. Caesar, and J. Rexford, "Floodless in SEATTLE: A scalable Ethernet architecture for large enterprises," ACM SIGCOMM Comput. Commun. Rev., vol. 38, no. 4, pp. 3–14, 2008.
[19] M. Al-Fares, A. Loukissas, and A. Vahdat, "A scalable, commodity data center network architecture," ACM SIGCOMM Comput. Commun. Rev., vol. 38, no. 4, pp. 63–74, 2008.
[20] A. Greenberg, P. Lahiri, D. A. Maltz, P. Patel, and S. Sengupta, "Towards a next generation data center architecture: Scalability and commoditization," in Proc. ACM Workshop Programmable Routers for Extensible Services of Tomorrow, 2008, pp. 57–62.
[21] A. Greenberg, J. Hamilton, N. Jain, S. Kandula, C. Kim, P. Lahiri, D. Maltz, P. Patel, and S. Sengupta, "VL2: A scalable and flexible data center network," ACM SIGCOMM Comput. Commun. Rev., vol. 39, no. 4, pp. 51–62, 2009.
[22] S. Fang, C. Foh, and K. Aung, "Differentiated congestion management of data traffic for data center Ethernet," IEEE Trans. Netw. Service Manag., no. 99, pp. 1–12.
[23] T. Chim, K. Yeung, and K. Lui, "Traffic distribution over equal-cost-multi-paths," Comput. Netw., vol. 49, no. 4, pp. 465–475, 2005.
[24] S. Kandula, D. Katabi, S. Sinha, and A. Berger, "Dynamic load balancing without packet reordering," ACM SIGCOMM Comput. Commun. Rev., vol. 37, no. 2, pp. 51–62, 2007.