
SolarWinds Technical Reference

Best Practices for Troubleshooting NetFlow
Introduction
NetFlow Overview
Troubleshooting NetFlow Service Status Issues
Troubleshooting NetFlow Source Issues
NetFlow Version 9 Specific Issues
The NTA SQL Server
Troubleshooting Vendor Exporters

IT Management Inspired by You - solarwinds.com

This Technical Reference is intended to help the reader identify and correct issues occurring in an
Orion NetFlow Traffic Analyzer deployment.

Copyright 1995-2011 SolarWinds. All rights reserved worldwide. No part of this document may be reproduced by any means nor
modified, decompiled, disassembled, published or distributed, in whole or in part, or translated to any electronic medium or other
means without the written consent of SolarWinds. All right, title and interest in and to the software and documentation are and shall
remain the exclusive property of SolarWinds and its licensors. SolarWinds Orion, SolarWinds Cirrus, and SolarWinds Toolset
are trademarks of SolarWinds and SolarWinds.net and the SolarWinds logo are registered trademarks of SolarWinds All other
trademarks contained in this document and in the Software are the property of their respective owners.
SOLARWINDS DISCLAIMS ALL WARRANTIES, CONDITIONS OR OTHER TERMS, EXPRESS OR IMPLIED, STATUTORY OR
OTHERWISE, ON SOFTWARE AND DOCUMENTATION FURNISHED HEREUNDER INCLUDING WITHOUT LIMITATION THE
WARRANTIES OF DESIGN, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN
NO EVENT SHALL SOLARWINDS, ITS SUPPLIERS OR ITS LICENSORS BE LIABLE FOR ANY DAMAGES, WHETHER
ARISING IN TORT, CONTRACT OR ANY OTHER LEGAL THEORY EVEN IF SOLARWINDS HAS BEEN ADVISED OF THE
POSSIBILITY OF SUCH DAMAGES.
Microsoft and Windows 2000 are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or
other countries.
Graph Layout Toolkit and Graph Editor Toolkit 1992 - 2001 Tom Sawyer Software, Oakland, California. All Rights Reserved.
Portions Copyright ComponentOne, LLC 1991-2002. All Rights Reserved.
Document Revised: 07/29/2011

Best Practices for Troubleshooting NetFlow 1

Introduction
This paper focuses on NetFlow troubleshooting and also contains information on troubleshooting other flow
analysis technologies, such as sFlow, J-Flow, and IPFIX. Because these flow technologies are architecturally
similar, the troubleshooting techniques described here apply to all of them. The discussion is limited
strictly to troubleshooting NetFlow issues with the Orion NetFlow Traffic Analyzer (NTA).
For information on the CBQoS feature of NTA, see the Orion NetFlow Traffic Analyzer Administrator
Guide. For basic information on NetFlow, see New to Networks Volume 3 NetFlow Basic and
Deployment Strategies.

NetFlow Overview
The term NetFlow is often used loosely to refer to any of the three major components of NetFlow
technology: the NetFlow cache and the NetFlow exporter on the router or switch, and the NetFlow
collector used to analyze the flow information. The NetFlow cache actively monitors traffic and
exists only on a NetFlow-enabled device. The NetFlow exporter sends completed flow
information from the device to a NetFlow collector. Here is what happens, in steps:
1. Users, also known as endpoints, access information via the central router.
2. As the users' data flows into a WAN router's interface, the router creates records about the
flows and saves the records in the NetFlow cache.
3. As flows expire, the WAN router exports the flow information to the NetFlow collector.

This architecture is common across all the previously mentioned flow technologies.
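
The cache-and-export cycle described above can be illustrated with a small model. This is only a sketch under stated assumptions, not how any particular router implements NetFlow: the class names, the single idle timeout, and its value of 15 seconds are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class FlowRecord:
    """Counters kept in the cache for one flow."""
    packets: int = 0
    octets: int = 0
    first_seen: float = 0.0
    last_seen: float = 0.0


class FlowCache:
    """Toy model of the NetFlow cache and exporter on a router."""

    def __init__(self, inactive_timeout=15.0):
        # Idle time in seconds after which a flow is considered expired
        # (illustrative value, not taken from the document).
        self.inactive_timeout = inactive_timeout
        self.flows = {}

    def observe(self, key, size, now):
        """Step 2: account a packet against its flow, creating the record if needed."""
        record = self.flows.get(key)
        if record is None:
            record = FlowRecord(first_seen=now)
            self.flows[key] = record
        record.packets += 1
        record.octets += size
        record.last_seen = now

    def export_expired(self, now):
        """Step 3: remove idle flows from the cache and return them for export."""
        expired = [key for key, record in self.flows.items()
                   if now - record.last_seen >= self.inactive_timeout]
        return [(key, self.flows.pop(key)) for key in expired]


# Two packets of the same flow, then an export pass after the flow has gone idle.
flow_cache = FlowCache(inactive_timeout=15.0)
flow_key = ("10.0.0.5", "192.0.2.10", 6, 51000, 443, 2, 0)
flow_cache.observe(flow_key, 1500, now=0.0)
flow_cache.observe(flow_key, 400, now=1.0)
too_early = flow_cache.export_expired(now=10.0)
exported = flow_cache.export_expired(now=20.0)
```

In this model the collector only ever sees what `export_expired` returns, which is why a collector-side symptom can originate in the cache, the exporter, or the path between them.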


Before troubleshooting a new installation, be certain that the prerequisites for NetFlow are all in place.
These prerequisites include:

The NetFlow Traffic Analyzer module for NPM has been installed without any error messages during
the installation.

The target network devices (exporters) have been configured to export NetFlow to the Orion NTA
server using the appropriate Orion IP address and port.

The SQL database has sufficient available space in accordance with the Orion NetFlow Traffic
Analyzer Administrator Guide.

Problems with NetFlow present themselves at the NetFlow collector, so that is a logical place to begin
looking for the source of an issue. The NetFlow Sources, NetFlow Collector Status, and Last 25 Traffic
Analyzer Events resources on the Orion NetFlow Traffic Analyzer Summary screen are typically the best
places to see the overall health of NetFlow. By customizing the NTA summary page you can place these
three resources together, which lets you assess the NTA status at a glance, as seen below.

To quickly review this screen: the Last Received NetFlow column (top circle) updates each time NetFlow
packets are received from a known NetFlow source. The second section shows the status of the NTA
receiver; in this case the status is Up and the icon is green. If the receiver is not Up, NTA cannot
receive NetFlow at all. The Last 25 Traffic Analyzer Events resource is a good place to view recent
problems with NetFlow.


Probably the most common NetFlow troubleshooting method starts with examining the three resources
seen above. The Traffic Analyzer events screen can display events regarding the NetFlow service,
unknown NetFlow sources, and many other events specific to NTA. The Events window is a good place
to look for obvious problems, while the NetFlow Collector Status indicates obvious problems with the NTA
server.

Troubleshooting NetFlow Service Status Issues


If the NetFlow service fails, the collector is shown as down, as seen below. Note the red status
indicator next to the down collector.

If the NTA collector service is not in the Started state, as seen below, open
Administrative Tools > Services and check whether the SolarWinds NetFlow Service is started. If it is
not, highlight the service and click Start on the right side of the services pane. Also check whether
any other SolarWinds services are not in the Started state, particularly the SolarWinds Orion Module
and the SolarWinds Information Service services.
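
The same check can be scripted when several services need to be watched. The sketch below is a minimal illustration that parses the output of the Windows `sc query` command. It assumes English-language command output, and it takes the internal service name as a parameter because `sc query` expects that name rather than the display name shown in the Services console. The sample text and the name `ExampleService` are made up for the demonstration.

```python
import re
import subprocess


def parse_sc_state(sc_output):
    """Return the STATE word (for example RUNNING or STOPPED) from `sc query` output."""
    match = re.search(r"^\s*STATE\s*:\s*\d+\s+(\w+)", sc_output, re.MULTILINE)
    return match.group(1) if match else None


def service_state(service_name):
    """Run `sc query <name>` on Windows and return the service state, or None if unknown."""
    result = subprocess.run(["sc", "query", service_name],
                            capture_output=True, text=True)
    return parse_sc_state(result.stdout)


# Fabricated sample in the layout printed by `sc query`, used instead of a live call.
SAMPLE_OUTPUT = """
SERVICE_NAME: ExampleService
        TYPE               : 10  WIN32_OWN_PROCESS
        STATE              : 1  STOPPED
        WIN32_EXIT_CODE    : 0  (0x0)
"""
sample_state = parse_sc_state(SAMPLE_OUTPUT)
```

A state other than RUNNING for the NetFlow service, or for the other SolarWinds services named above, points back to the manual steps in this section.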

If the service starts and then stops again, there is an underlying reason causing the service to fail. The
most common issues that cause the NetFlow service to shut down are included here:

The NTA module is a trial which has expired. Normally this will be seen in a yellow banner
at the top of the Orion web console. If you have an expired trial, you can
apply a purchased license from the Customer Portal. Follow the directions in the NetFlow
Administrator Guide, Chapter 2.

There is an issue with the connection to the SQL database or with the functioning of SQL. Check that
the SQL server is up and that it has sufficient CPU and memory available. Also check
the disk write queue length. For more information on where to find these metrics, see the Managing Orion
Performance technical reference.

As a final attempt to reconcile the NetFlow Service being down, run the Configuration Wizard
checking all three boxes, Web, Services, and Database, as shown below.


If these steps do not bring the collector back to a stable Up status, and you have an active maintenance
contract, you will need to open a ticket with Technical Support.

Troubleshooting NetFlow Source Issues


NetFlow sources are the devices exporting flows to the Orion NTA server.
A commonly forgotten step when adding NetFlow sources is to manage, in Orion NPM, the device and
interface that are sending the flow information. When this step is missed, the following event appears
in the NetFlow Events screen.

Correcting this issue is trivial. Click Manage this device in the event message and NPM will guide you
through adding the device and interface. If you add the device but fail to add the interface, the following
event occurs.

To correct this, click Edit this device and NPM will guide you through adding the
interface.


You may notice that flows are not being received or processed properly when all of the resources on a
NetFlow page have messages stating that there is no data for the requested time period, as seen below.

There are two typical causes for this.

The NetFlow exporter was not configured to export during the time you have requested.

There is an issue causing the NTA server to not display the flows.

If you believe that the device is correctly exporting flows but NTA is not displaying data, a packet capture
should be taken from the NTA server interface. Wireshark is a commonly used free tool for capturing
packets and analyzing protocols. For information on installing and using packet capture tools see the
documentation for that particular tool.
The packet capture should be run for about 5 minutes to make sure any NetFlow packets are captured.
The below capture was run for ten minutes, and then the captured packets were sorted by protocol
looking for NetFlow packets. NetFlow is often listed as CFLOW protocol in protocol analyzers.
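
If the protocol analyzer does not label the traffic, the UDP payload itself can be checked. NetFlow version 5 and version 9 export datagrams both begin with a 16-bit version number followed by a 16-bit count, in network byte order. The sketch below reads those two fields; the function name is hypothetical and the sample datagram is fabricated, not taken from a real capture.

```python
import struct


def netflow_header(payload):
    """Return (version, count) from the first four bytes of a NetFlow export datagram."""
    if len(payload) < 4:
        raise ValueError("payload too short to hold a NetFlow header")
    version, count = struct.unpack("!HH", payload[:4])
    return version, count


# Fabricated 24-byte version 5 header announcing 30 flow records.
# Layout: version, count, sysUptime, unix_secs, unix_nsecs, flow_sequence,
# engine_type, engine_id, sampling_interval.
sample_datagram = struct.pack("!HHIIIIBBH", 5, 30, 0, 0, 0, 0, 0, 0, 0)
version, count = netflow_header(sample_datagram)
```

A version value other than 5 or 9 on the collector port suggests that the traffic is not NetFlow, or that the exporter is sending a different flow format.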

Note that there are no CFLOW packets captured. Typically this is because of one of the following:

The exporter is not configured to send NetFlow packets to the NTA server.

There is a device between the exporter and the NTA server blocking the flow data.

NetFlow device configurations vary by model, NetFlow version, and IOS version. All configurations
implement three crucial functions.
Turn on the NetFlow cache
Apply NetFlow to interfaces

Direct flow exports to the collector (NTA server)


Examine the device for the required configurations according to your manufacturer, model, and software
to ensure that each of these functions is in place. For Cisco IOS devices, the following should be in place:
(config)# ip flow-export version {5 | 9}
(config)# ip flow-export destination <NTA-server-IP> <UDP-port>
(config-if)# ip flow ingress
(config-if)# ip flow egress

Show commands are very helpful in quickly assessing NetFlow configuration and operation on a Cisco
device. First we will begin by examining the flow exporter on the device using the abbreviated version of
the # show ip flow export command.

This screen shows us that two of the three crucial NetFlow configurations are in place. These are:

The main cache is enabled for NetFlow v5.

The exporter is enabled and exporting to 192.168.74.117 and 10.110.6.183 on port 2055.

We can also see that 152979 flow records (flows) have been exported in 5101 UDP packets, and that
there are no failed exports. While there are no apparent errors, one item is missing. The NTA
server we are using is at IP address 10.110.6.196, which is not listed as an export destination. The
10.110.6.183 address was the old NTA server IP, so the configuration needs to be updated to export to
the new NTA server IP. The below commands correct this situation.

Notice that we removed the old IP export address prior to adding the new one. NetFlow only allows two
export destinations, so adding the new address without deleting the old will result in the below error.
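
The ordering constraint described above (remove first, then add) can be captured in a small planning helper. This is a sketch under stated assumptions: the function name is hypothetical, and the generated lines use the classic IOS `ip flow-export destination <address> <port>` form, which may differ on other platforms or software versions.

```python
MAX_DESTINATIONS = 2  # export destination limit stated in the text above


def plan_destination_change(current, old, new, port=2055):
    """Return IOS config lines that replace export destination `old` with `new`."""
    commands = []
    remaining = list(current)
    if old in remaining:
        # Remove the stale destination first so the limit is not exceeded.
        commands.append(f"no ip flow-export destination {old} {port}")
        remaining.remove(old)
    if new not in remaining:
        if len(remaining) >= MAX_DESTINATIONS:
            raise ValueError("remove an existing destination before adding a new one")
        commands.append(f"ip flow-export destination {new} {port}")
    return commands


# The situation from the text: 10.110.6.183 is the old NTA server, 10.110.6.196 the new one.
commands = plan_destination_change(
    current=["192.168.74.117", "10.110.6.183"],
    old="10.110.6.183",
    new="10.110.6.196",
)
```

Here `commands` holds the removal line followed by the addition line, mirroring the order used in the text. Asking the helper to add a third destination without removing one raises an error instead of producing a command the device would reject.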


The next step is to check whether NetFlow has been applied to the interface we wish to monitor.
A simple method is to use the # show ip flow interface command, as seen below.

The above screen shows which interfaces have NetFlow applied and the flow direction being monitored.
Ingress is defined as data flowing into the interface from the network, while egress is the opposite.
Now that we have corrected the issue with the exporter IP address and have verified that the interfaces
are configured, we will look at the packet capture on the NTA server again.

We can see that NetFlow packets are being received at the NTA server interface. In the NTA
Summary view seen below, the correct time stamp in the Last Received NetFlow column confirms that
NTA is processing these packets.


Drilling down to the NTA Interface Details page verifies that NetFlow is being properly received,
processed and displayed, as shown below.

NetFlow Version 9 Specific Issues


Cisco implemented version 9 to create a structure for selective flow field exporting. This feature is known
as Flexible NetFlow (FNF). Version 5 exporters send seven key fields and a handful of non-key fields.
Key fields are used to define a unique flow. Most of them are taken from the IP and transport-layer
headers of the packet, while the source interface index is supplied by the device. The version 5
key fields are:

Source IP

Destination IP

Protocol

Source port

Destination port

Source interface index

Type of Service


The below screen shot shows these key fields along with several non-key version 5 fields.

The above flow is for an ICMP (protocol 1) packet. ICMP does not use a port number, so the exporter
sets the port to 0. Version 9 not only allows you to add many additional non-key fields, but it also allows
you to redefine the key fields. This can cause issues for a NetFlow collector attempting to identify unique
flows. NTA processes NetFlow version 9 without issue, unless FNF is configured on the exporter and the
seven version 5 key fields are not included in the configuration.
Some devices only support version 9, and this is typically not an issue. To ensure that FNF exports are
processed by NTA, be certain to include all seven of the version 5 key fields.
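
A planned FNF record definition can be compared against the seven key fields before it is applied. The sketch below is only an illustration: the field labels are the descriptive names used in this document, not Cisco CLI keywords, and the function name is hypothetical. Mapping the labels to the actual statements for a given platform is left to the reader.

```python
# The seven version 5 key fields, using this document's descriptive names.
V5_KEY_FIELDS = {
    "source ip",
    "destination ip",
    "protocol",
    "source port",
    "destination port",
    "source interface index",
    "type of service",
}


def missing_key_fields(configured):
    """Return the version 5 key fields absent from a list of configured field names."""
    normalized = {name.strip().lower() for name in configured}
    return sorted(V5_KEY_FIELDS - normalized)


# A record definition that omits two of the key fields.
missing = missing_key_fields(
    ["Source IP", "Destination IP", "Protocol", "Source port", "Destination port"]
)
```

An empty result means that all seven key fields are present. Any names returned are the ones to add before expecting NTA to process the export.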
FNF-enabled version 9 exporters send two types of flow information: flow templates and flow records.
Both are required for FNF to function. If you have a version 9 exporter and the NetFlow source appears
and disappears randomly, this may be due to the flow template expiring on the NTA server.
The below packet capture shows version 9 FNF data being received but no template is available to define
the flow fields.

The flow is telling the NTA server to refer to template 265, but the server does not have that template in
memory. To fix this issue add the following to the device NetFlow configuration.
(config)#flow-export template timeout-rate 1


This will configure the exporter to send the template every minute.
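
Why a missing or expired template makes flow records unusable can be seen in a toy model of a collector's template store. This is a sketch, not a description of how NTA handles templates internally: the class, the field names, and the 300-second lifetime are all hypothetical.

```python
class TemplateCache:
    """Toy model of a collector's version 9 template store with expiry."""

    def __init__(self, lifetime=300.0):
        # Seconds a template stays valid without being refreshed (hypothetical value).
        self.lifetime = lifetime
        self.templates = {}  # (exporter, template_id) -> (field names, time received)

    def store(self, exporter, template_id, fields, now):
        """Record a template, or refresh the one already held."""
        self.templates[(exporter, template_id)] = (list(fields), now)

    def decode(self, exporter, template_id, values, now):
        """Return a decoded record, or None when no valid template is held."""
        entry = self.templates.get((exporter, template_id))
        if entry is None or now - entry[1] > self.lifetime:
            return None
        fields, _received = entry
        return dict(zip(fields, values))


template_cache = TemplateCache(lifetime=300.0)
record_values = ["10.1.1.1", "10.2.2.2"]

# Data arrives before any template: it cannot be interpreted.
before = template_cache.decode("10.0.0.1", 265, record_values, now=0.0)

# Once template 265 has arrived, the same data decodes.
template_cache.store("10.0.0.1", 265, ["src_ip", "dst_ip"], now=60.0)
after = template_cache.decode("10.0.0.1", 265, record_values, now=61.0)

# If the exporter does not resend the template in time, decoding fails again.
stale = template_cache.decode("10.0.0.1", 265, record_values, now=400.0)
```

Sending the template more often than the collector's expiry interval keeps the stored copy fresh, which is the effect of the one-minute setting described above.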
Some devices re-index their interface index numbers frequently. While Orion and NTA do compensate
for interface re-indexing through device discovery, some devices will lose their NTA IF index due to rapid
re-indexing. This appears in the NetFlow Events resource, as seen below.

Scheduling an automatic rediscovery of these devices to run every 15 minutes will rectify this issue.
Include only the devices having the re-indexing issues in this discovery job.

The NTA SQL Server


NTA is a very data-intensive technology. As such, NTA can overtax some SQL server
implementations. The Managing Orion Performance technical reference contains detailed information on
Orion and SQL performance. You also have several options for improving SQL performance from
the NTA server. These include:

Plan your NetFlow exporters with the network topology in mind. There is not much use in enabling
NetFlow on every NetFlow capable device. Data duplication and unnecessary data will only serve to
make useful flow information difficult to find.

Use reasonable data retention times. One primary use case for NTA is to quickly identify bottlenecks
and the cause behind them. Data overload from unreasonable or unnecessary retention will
eventually slow the NTA server.

Make use of the Top Talkers optimization feature in NTA. This feature can greatly enhance NTA
performance.

Troubleshooting Vendor Exporters


Troubleshooting J-Flow and sFlow exporters varies greatly depending on the device model, software
level, and other factors. Below are useful links for troubleshooting issues on these types of exporters.

Juniper J-Flow

Setting up J-Flow on a J-Series router
http://kb.juniper.net/InfoCenter/index?page=content&id=KB12512

Example working J-Flow configuration [Relevant to J-Series and SRX devices]
http://kb.juniper.net/InfoCenter/index?page=content&id=KB21023

SRX Getting Started - Configure J-Flow
http://kb.juniper.net/InfoCenter/index?page=content&id=KB16677

HP sFlow

HP sFlow Implementation
