1 Introduction
2 Identification
2.1 Problem nature
2.2 CPU
2.3 Interfaces
2.4 Load
3 Mitigation / Alleviation
3.1 Processes
3.2 Traffic
3.3 Optimize throughput
3.4 Flow Control
3.5 Active/Active failover
3.6 More hardware
1 Introduction
Quite often, indications of oversubscription or excessive load on a firewall or another
network device are not enough to prove that oversubscription is really happening, and it
can be confusing how to identify and solve such issues. This document presents the basic
troubleshooting steps needed to pinpoint an oversubscription problem on a Cisco ASA
firewall and proposes potential solutions to overcome it. The corresponding document for
the FWSM is located here.
2 Identification
2.1 Problem nature
The most important aspect of solving an oversubscription issue is identifying it. Network
engineers often incorrectly attribute network problems to excessive traffic, which leads to
devices like firewalls being wrongly considered the bottleneck. Other times they focus on
other parts of the network in cases where the firewall's processing power really is not
enough to handle the traffic. There can be multiple indications of load problems on firewall
devices, and putting them together helps us understand whether traffic is indeed the cause
of the problem or whether we should focus elsewhere. That is what this section describes.
2.2 CPU
A "busy" firewall device will almost always show it on its CPU. We can check the CPU use with
the command "show cpu".
A CPU consistently above 80%-90% could indicate high traffic load. As a side note, the output
of "show cpu profile" can also be provided to TAC so that they can identify the processes on
which CPU time is spent.
Also, CPU hogs can show when the CPU is too busy to pull packets off the line:
ASA# show process cpu-hog
Process:     telnet
LASTHOG At:  3.47
PC:          80c2575
Call stack:  ...
In the above example, the telnet process is hogging the CPU. During this time, the CPU is not
available to pull packets off the NIC and route them through the firewall.
2.3 Interfaces
Another important indicator of oversubscription is interface errors. Useful commands
to check the interfaces are "show interface" and "show interface | i errors".
Interface overruns, no buffer and underruns often show that the firewall cannot process all
the traffic it is receiving on its NICs. Overruns and no buffers indicate that too much input
traffic is arriving on a given interface. The interface maintains a receive ring where packets
are stored before they are processed by the ASA. If the NIC is receiving traffic faster than
the ASA can pull packets off the receive ring, packets will be dropped and either the no buffer
or the overrun counter will increment. Underruns behave similarly but relate to the transmit
ring instead.
2.4 Load
Next it is worth checking the traffic that the device is seeing. We need to clear the traffic
statistics ("clear traffic" command) before checking them ("show traffic" command). We do
that because we want to see the traffic while the problem is occurring and thus be able to
tell whether load is related to the problem under investigation. The aggregate traffic output
from "show traffic" carries information since the last reload or the last time the counters
were cleared, so by itself it will not help us identify how much traffic the box is seeing during
the time we are troubleshooting. After the "clear traffic" we let the box collect statistics for
2-5 minutes and then run "show traffic" to see the traffic the interfaces saw. For example:
received:    773519 bytes, 7 pkts/sec, 680 bytes/sec
transmitted: 276317 bytes, 3 pkts/sec, 242 bytes/sec
Monitoring tools and NetFlow can also help in identifying traffic and connection rates.
We can then calculate the aggregate throughput the device is passing by summing the traffic
that all physical interfaces saw (output of "show traffic"), which tells us whether it is being
pushed to its limits. To do that we need to check the device specs:
For the ASA, we can read the numbers from the ASA model comparison document:

Model     Maximum firewall throughput     Maximum firewall   Maximum firewall     Packets per
                                          connections        connections/second   second (64 byte)
5505      150 Mbps                        10,000 / 25,000    4,000                85,000
5510      300 Mbps                        50,000 / 130,000   9,000                190,000
5520      450 Mbps                        280,000            12,000               320,000
5540      650 Mbps                        400,000            25,000               500,000
5550      1 Gbps (real-world HTTP),       650,000            36,000               600,000
          1.2 Gbps (jumbo frames)
5580-20   5 Gbps (real-world HTTP),       1,000,000          90,000               2,500,000
          10 Gbps (jumbo frames)
5580-40   10 Gbps (real-world HTTP),      2,000,000          150,000              4,000,000
          20 Gbps (jumbo frames)

For the 5505 and 5510, the two connection figures are for the Base / Security Plus licenses.
People can have long discussions about whether a firewall, or any other device, is hitting its
traffic processing limits. Experience has shown there is controversy about what the numbers
show and what engineers consider "close to the limit", so it is worth clarifying a few points.
Let's use the ASA5510 as an example. Its rated throughput is 300 Mbps, as we see in the table
above. So the question is: "if my ASA5510 sees about 280 Mbps, should it be at 100% CPU or
not?" A quick answer would be "No". However, we must not forget that many factors are
involved. In the network industry, the rated speeds of devices come out of specific tests:
the tests are repeated and an average is presented as the maximum speed. Real-world traffic,
though, is not always the same as the traffic used in those tests. Take the aforementioned
ASA5510: rated-speed tests usually involve stateless protocols with big packets. For a TCP
web-browsing application the packets are much smaller, and TCP uses ACKs and is a
"synchronized" protocol by nature. That adds more load to the firewall itself, which lowers
the maximum throughput it can achieve. On top of that, if the ASA has http inspection
configured (which does deep packet inspection for http), its maximum processing throughput
will be even lower than 280 Mbps.
It is clear that even though 300 Mbps is indeed a throughput the device can achieve, its
real-world throughput can practically be less, depending on applications, traffic nature and
configuration. That is why our performance documents also try to provide other metrics,
such as "packets per second" (pps) and what is often listed as "real-world HTTP". For example,
the ASA table shows that the 5510 can do 190K pps (small 64-byte packets). These metrics can
be compared against the interface statistics collected from the device to decide whether the
box is pushed to its limits.
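The arithmetic behind this point is worth spelling out. A small sketch, using the 5510 datasheet figures from the table above and made-up traffic rates for the "show traffic" part:

```python
def bytes_per_sec_to_mbps(bps):
    """Convert a 'show traffic' bytes/sec counter to megabits per second."""
    return bps * 8 / 1_000_000

def pps_to_mbps(pps, packet_size_bytes):
    """Throughput implied by a packet rate at a fixed packet size."""
    return pps * packet_size_bytes * 8 / 1_000_000

# ASA5510 datasheet: 300 Mbps max throughput, 190,000 pps at 64 bytes.
# 190K pps of 64-byte packets is only ~97 Mbps -- far below the 300 Mbps
# rated speed, which assumes large packets.
print(round(pps_to_mbps(190_000, 64), 1))  # 97.3 Mbps

# Aggregate load: sum the bytes/sec of all physical interfaces from
# "show traffic" (hypothetical per-interface numbers) and compare.
total_bytes_sec = 11_097_005 + 2_081_004
print(round(bytes_per_sec_to_mbps(total_bytes_sec), 1), "Mbps vs 300 Mbps rated")
```

So a box "well below" its rated Mbps can still be at its pps limit when the traffic mix is small packets.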
Another consideration, on top of traffic load, is connections and connection rates. That is
another field that can trigger various disagreements. The commands we would use to see the
connections on our firewall are "show conn count" and "show resource usage".
Resource         Current    Peak    Limit   Denied  Context
Telnet                 1       1        5        0  System
Syslogs [rate]         1     293      N/A        0  System
Conns                 86            10000        0  System
Xlates               116              N/A        0  System
Hosts                 49              N/A        0  System
Resource         Current    Peak      Limit   Denied  Context
SSH                    1       1         15        0  admin
Syslogs [rate]       118     348  unlimited        0  context1
Conns                 89     893  unlimited        0  context1
Xlates               150    1115  unlimited        0  context1
Hosts                 15      18  unlimited        0  context1
Conns [rate]         103    4694  unlimited        0  context1
...
Now, let's ask one more question about the output from our ASA5510 above: "At the peak I
see a connection rate of about 5K, and in the specifications I read that the maximum
supported rate is 9K conns/second. 5K is much less than 9K, so is the ASA exceeding its
limits?". To answer that question, keep in mind that the rate mentioned in the specifications
is an average rate per second. A few examples explain this better:
Let's say we have a stable rate of 9K per second. This connection rate conforms to the
ASA5510 limits.
Now let's say we have 90K new conns per 10 seconds. That is also a rate of 9K per
second and conforms to the ASA5510 limits.
Now let's say we have 81K new conns in 1 second and 1K in each of the next 9 seconds.
That totals 90K per 10 seconds, which equals an average of 9K per second and conforms
to the 9K conns/second figure. But the ASA was oversubscribed for 1 second while it was
seeing a rate of 81K/second.
So it is obvious that bursts of traffic or connections can affect the performance of a firewall
even if the averages over time do not seem to exceed the limits.
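The burst scenarios above can be checked numerically. A minimal sketch, where 9K conns/sec is the ASA5510 datasheet figure and the per-second samples are the made-up numbers from the text:

```python
LIMIT = 9_000  # ASA5510 datasheet: maximum connections/second

def analyse(per_second_counts):
    """Return (average rate, number of seconds the instantaneous rate exceeded LIMIT)."""
    avg = sum(per_second_counts) / len(per_second_counts)
    oversubscribed = sum(1 for c in per_second_counts if c > LIMIT)
    return avg, oversubscribed

steady = [9_000] * 10             # 9K new conns every second
bursty = [81_000] + [1_000] * 9   # 81K in one second, then 1K/sec

print(analyse(steady))  # (9000.0, 0): average at the limit, no bursts
print(analyse(bursty))  # (9000.0, 1): same average, 1 second oversubscribed
```

Both samples have the same 10-second average, but only the bursty one ever exceeds the device limit, which is exactly what averaged counters hide.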
Additionally, having few connections through the box does not necessarily mean that traffic is
not high. Theoretically, someone could have 10 connections passing 1 Gbps each, thus
oversubscribing an ASA with very few conns.
3 Mitigation / Alleviation
Now it is equally important to mention options for overcoming an oversubscription issue. The
reader should keep in mind that if a device is oversubscribed, it is usually best to add more
processing power by using more, or more powerful, devices. There might be cases, though,
where we can get by with some workarounds after identifying the root cause and the traffic
profiles.
3.1 Processes
When the CPU is high, we can try to see where it is being spent, and then we may be able to
relieve it of the processes that take the most CPU cycles. We can collect the output of the
"show process" command, wait for 1 minute and collect it once more:
    PC        SP        STATE    Runtime  SBASE     Stack      Process
    ...                                0  d59a9b80  7728/8192  Syslog Retry
    ...
Then we can diff the "Runtime" column for all the processes (keeping in mind that a process
might show up twice or more). Sorting the diffs from maximum to minimum shows the
processes that take most of the CPU. On ASA 8.2 and later, the command "show processes
cpu-usage non-zero sorted" can be used instead.
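On releases before 8.2, the runtime diff can be scripted. A rough sketch that parses two captured "show process" snapshots and sorts by runtime delta (the column layout, snapshot text and process names below are assumptions based on the sample above; real output may vary):

```python
import re
from collections import Counter

def runtimes(show_process_output):
    """Sum the Runtime column per process name (a process can appear more than once)."""
    totals = Counter()
    for line in show_process_output.splitlines():
        # Assumed layout: ... Runtime  SBASE(8 hex digits)  Stack(used/total)  Process-name
        m = re.search(r"(\d+)\s+[0-9a-f]{8}\s+\d+/\d+\s+(.+)$", line)
        if m:
            totals[m.group(2).strip()] += int(m.group(1))
    return totals

def top_consumers(before, after, n=5):
    """Diff two snapshots taken ~1 minute apart, largest runtime delta first."""
    delta = runtimes(after) - runtimes(before)  # Counter keeps positive deltas
    return delta.most_common(n)

# Hypothetical snapshots, one minute apart:
snap1 = "  ...   120 d59a9b80 7728/8192 Syslog Retry\n  ...  5000 d59a9c80 1000/8192 Dispatch Unit"
snap2 = "  ...   130 d59a9b80 7728/8192 Syslog Retry\n  ...  9000 d59a9c80 1000/8192 Dispatch Unit"
print(top_consumers(snap1, snap2))  # [('Dispatch Unit', 4000), ('Syslog Retry', 10)]
```

The sorted deltas point at the processes that consumed the CPU during the interval, which is what the 8.2 command reports directly.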
There are cases where, for example, we might see an inspection process or the logging
process taking most of the CPU. In such cases we can disable the inspections if they are not
needed, or turn down the logging level, and save some CPU for the device. Please note that
processes like "Dispatch_Unit" and "interface polling" relate to regular packet processing and
there is not much that can be done to relieve the CPU of them.
3.2 Traffic
If the traffic hitting the firewall is excessive, we can also try to send only the necessary
traffic through it. Although this solution is not practical in most setups, there might be cases
where alternate routes exist for the traffic and not all packets need to be "firewalled". In
such scenarios, policy-based routing (PBR) can be used to divert to the firewall only the
traffic that needs to be "firewalled".
Comments
GigabitEthernet0/1:
received (in 381268.650 secs):
3890040853 packets
4231288232604 bytes
10000 pkts/sec 11097005 bytes/sec
transmitted (in 381268.650 secs):
2709519449 packets
793464758386 bytes
7005 pkts/sec 2081004 bytes/sec
1 minute input rate 15200 pkts/sec, 16836427 bytes/sec
1 minute output rate 10902 pkts/sec, 3050236 bytes/sec
1 minute drop rate, 0 pkts/sec
5 minute input rate 16043 pkts/sec, 17778454 bytes/sec
5 minute output rate 11426 pkts/sec, 3209117 bytes/sec
5 minute drop rate, 0 pkts/sec
GigabitEthernet0/2:
(multi-context "show resource usage" output: per-context Telnet, Syslogs [rate], Conns,
Xlates, Hosts, Conns [rate] and Inspects [rate] counters, all with "unlimited" limits and
0 denied)
Hello Pkampana,
Nice work. It is a good document, it really helps a lot. But regarding the overrun and input
errors, it is different from other Cisco documents. In your example about overrun errors,
input errors are 0 and overrun errors are 3276. However, based on another document,
Input errors = Runts + Giants + CRC + Frame + Overrun + Ignored + Abort. According to your
document, high overrun errors may be caused by oversubscription on an interface. According
to another Cisco troubleshooting guide, high input and overrun errors may be caused by a
speed and duplex mismatch.
We have an ASA 5550 which has a lot of input and overrun errors and L2 decode drops. I
opened TAC SR 615730925. In our case, I do not see a speed and duplex mismatch, as both
sides are configured as auto/auto and "show interface" shows the correct speed and duplex.
But "show traffic" did not show excessive traffic on the interface either.
The response I got from TAC is that we might have an overloaded interface.
Can you help us clarify what is the cause of these high input and overrun errors?
Thanks,
...
Thank you for the feedback Sean. The outputs you see in the snippets are not real. I arbitrarily
chose the numbers that you see, so they are a little inaccurate. I was trying to convey what
the counters mean.
I would suggest eliminating the duplex mismatch case (you already did), checking for a bad
cable, and then looking at the traffic the ASA sees. You would need to "clear traffic" and
"show traffic" as the doc explains. Check how close the overall throughput is to the 5550
rated speeds.
I hope it helps.
PK
Hi,
Great doc.
Is it complete now?
BR
Pavel
Hi Pavel,
Yes, it covers more or less everything that can be done to investigate and try to solve ASA
oversubscription.
Feedback welcome.
Take care,
PK
https://supportforums.cisco.com/document/47506/asa-oversubscription-interface-errrs-troubleshooting