Escolar Documentos
Profissional Documentos
Cultura Documentos
Networks
Mukul Goyal
K. K. Ramakrishnan
CIS Department
The Ohio State University
Columbus, OH, USA
mukul@cis.ohio-state.edu
Networking Research
AT&T Labs - Research
Florham Park, NJ, USA
kkrama@research.att.com
I.
INTRODUCTION
Link state protocols, such as OSPF [1] and IS-IS [2] using
shortest path first forwarding are the most commonly used
Interior Gateway Protocols in the Internet today. Each router
knows the topology of the network, and the associated
weights, and uses this information to determine the shortest
paths to different destinations. However, when there is a
failure in the network (link or node failure), these protocols
take some time to detect the failure and re-establish a
consistent view of the new topology. During this transient, the
data traffic forwarded towards the failed device will be
dropped. Additionally, routing loops might emerge leading to
artificial congestion in the network.
In a carriers network, service level assurances (SLA)
provided to customers potentially limit the number of packets
that may be lost or are delayed excessively. To ensure that
SLAs are met, a carrier network often uses lower layer
transport (or data link) layer failure detection and restoration
techniques, so that service is not excessively impacted.
Depending on just the routing layer (and hence OSPF or IS-
Wu-chi Feng
Dept of Computer Science & Engg
Oregon Institute of Technology
Beaverton, OR, USA
wuchi@cse.ogi.edu
TABLE I.
back up again. However, this work did not study the network
wide generation of false alarms caused by congestion as the
HelloInterval is reduced. Basu and Riecke [8] have also
examined using sub-second HelloIntervals to achieve faster
recovery from network failures. This work is similar to ours in
the sense that it also considered the tradeoff between faster
failure detection and increased frequency of false alarms. It
reports 275ms to be an optimal value for HelloInterval
providing fast failure detection while not resulting in too many
false alarms. However, this work did not consider the impact
of different levels of network congestion and topology
characteristics on the optimal HelloInterval value. We believe
these factors impact the setting of the HelloInterval
substantially, as we illustrate in this paper.
False alarms can also be generated if the Hello message
gets queued behind a huge burst of LSAs and can not be
processed in time. The possibility of such an event increases
with reduction in RouterDeadInterval. Large LSA bursts can
be caused by a number of factors such as simultaneous refresh
of a large number of LSAs or several routers going
down/coming up simultaneously. Choudhury et al. [9] studied
this issue and observed that reducing the HelloInterval lowers
the threshold (in terms of number of LSAs) at which an LSA
burst will lead to generation of false alarms. However, the
probability of such events is quite low. In our experiments
with more probable events such as the failure of a single
router, the resulting LSA burst was too small to cause false
alarms. Similarly, we investigated if frequent update of Traffic
Engineering LSAs [10] leads to large enough LSA bursts to
cause false alarms. However, we did not observe any such
effect even for reasonably high update frequency of such
LSAs.
Since the loss and/or delayed processing of Hello
messages can result in false alarms, recently there have been
proposals to give such packets prioritized treatment at the
router interface as well as in the CPU processing queue
[9][11]. An additional proposal is to consider the receipt of
any OSPF packet (e.g. an LSA) from a neighbor as an
indication of the good health of the routers adjacency with the
neighbor [11]. This provision can help avoid false loss of
adjacency in the scenarios where Hello packets get dropped
because of congestion, caused by a large LSA burst, on the
control link between two routers. Such mechanisms should
help mitigate the false alarm problem significantly. However,
it will take some time before these mechanisms are
standardized and widely deployed.
Since SPF calculation using Dijkstras algorithm imposes
significant processing load on the routers, vendors have
introduced delays (spfDelay and spfHoldTime) that limit the
frequency of such operations. These delays ultimately result in
slowing down the failure recovery process. Alaettinoglu et al.
[12] propose eliminating any restrictions on SPF calculations.
They argue that the frequency of SPF calculations can be
reduced by careful filtering of status changes in the
links/routers and the processing time of an SPF calculation
can be reduced by using modern algorithms (such as
[13][14][15]) instead of Dijkstras algorithm. In a related
work, Thorup [16] proposes the use of data structures that will
help routers make a constant time determination of the next
TABLE II.
Topology
A
B
C
Links
72
58
116
Topology
D
E
F
Nodes
37
51
116
Links
88
176
476
[8]
[9]
[10]
[11]
[12]
[13]
[14]
[15]
[16]
[17]
[18]
[19]
[20]
120
180
160
140
120
100
80
60
40
20
0
maxQ_ = 0.5
Average SPF
Calculations Per
Router in 1 Hour
100
maxQ_ = 0.75
maxQ_ = 1
maxQ_= 0.5
maxQ_ = 0.75
maxQ_ = 1
80
60
40
20
0
0.1
1
Hello Interval in seconds
0.1
10
100
250
200
150
100
50
0
0.1
1
Hello Interval in seconds
80
60
40
20
0
10
Figure 2 Change in False Alarm Frequency As High Drop Rates Persist for
Longer Durations.
0.1
300
700
600
maxQ_ = 1.02
maxQ_ = 1.05
maxQ_ = 1.10
400
300
200
100
0
1
Hello Interval in seconds
10
500
10
300
1
Hello Interval in seconds
250
200
150
100
50
0
0.1
1
Hello Interval in Seconds
10
0.1
1
Hello Interval in seconds
10
10
RED Simulations,
{min_pers, max_pers} = {0.1s, 0.5s}
TABLE III.
FAILURE DETECTION TIME (FDT) AND FAILURE
RECOVERY TIME (FRT) FOR A ROUTER FAILURE ON TOPOLOGY C WITH
DIFFERENT HELLOINTERVAL VALUES. (RED SIMULATIONS WITH MAXQ_=1
AND {MIN_PERS, MAX_PERS} = {0.2S, 2S})
Hello
Intvl
0.1
maxQ_ = 0.75
maxQ_ = 1
Seed 1
Seed 2
Seed 3
FDT
FRT
FDT
FRT
FDT
FRT
10s
32.08s
36.60s
39.84s
46.37s
33.02s
38.07s
2s
7.82s
11.68s
7.63s
12.18s
7.79s
12.02s
1s
3.81s
9.02s
3.80s
8.31s
3.84s
10.11s
0.75s
2.63s
7.84s
2.97s
5.08s
2.81s
7.82s
0.5s
1.88s
6.98s
1.82s
6.89s
1.79s
6.85s
0.25s
0.95s
10.24s
0.84s
6.08s
0.99s
13.41s