Authors: Danny Mongrain
Version Number: 1.6.0
Version Date: 2015-03-10
Status: Final
File Name: Troubleshooting Riverbed WAN OPTIMIZATION.DOC
Revision History
Version | Date       | By             | Comments
1.0     | 2013-06-20 | Danny Mongrain | Initial draft
1.1     | 2013-07-16 | Danny Mongrain | Added section Getting support from Riverbed TAC.
1.2     | 2013-07-16 | Danny Mongrain | Added section Software downgrade.
1.3     | 2014-02-21 | Danny Mongrain | Enforced the requirement to make product aware when configuration is changed locally or if passthrough rule must be kept for a while.
1.4     | 2014-02-21 | Danny Mongrain | Added Secure Peering section.
1.5     | 2014-05-30 | Danny Mongrain | Added No Logon Servers section
1.5.1   | 2014-06-19 | Danny Mongrain | Removed Troubleshooting HTTP problem (RiOS 6.5)
1.5.2   | 2014-06-19 | Danny Mongrain | Added Scheduling a Reboot and Service restart
1.5.3   | 2014-10-01 | Danny Mongrain | Added Service Error
1.6.0   | 2015-01-12 | Danny Mongrain | Renamed CMC for SCC everywhere. Updated screenshots following changes to GUI. Corrected typos, etc.
Page 2 of 30
Table of Contents
1 Component Description
1.2 Scope
1.3 Documentation
1.4 Prerequisites
1.5 Disclaimer
Packet capture
16 Software downgrade
17 Scheduling a reboot
Component Description
Riverbed Steelheads are WAN optimization controllers (WOC) that accelerate TCP traffic.
1.2
Scope
This document contains information that can be useful when operating the Riverbed Steelheads
(the how-to), including the SteelCentral Controller (SCC, formerly CMC) but excluding Steelhead Mobile.
The scope of this document is daily operational tasks and troubleshooting of common problems on both the
Steelheads and the SCC.
1.3
Documentation
All the vendor documentation for this product can be found on the Riverbed web site:
http://support.riverbed.com. A username and password are required to get full access.
1.4
Prerequisites
Ensure that the WOC is installed and configured according to best practices and the Riverbed deployment
guides.
1.5
Disclaimer
This document is NOT an official Riverbed document. When in doubt, always adhere to Riverbed
documentation and follow instructions from Riverbed support. Use at your own risk.
Action
The CSH is the WOC at the same location (site) as the client, which is the system that initiates the TCP
connection (SYN) towards a server. If the client is in a Campus/MAN network, its WOC might be at the central
site that provides WAN connectivity for the MAN.
When in doubt, consult the network diagram of the location where the client is located.
Steelhead Mobile agents are CSH only; they cannot be SSH.
You can't determine which appliance is the CSH, but you know
which one is the server-side Steelhead (SSH)? Connect to
the SSH and go to Report > Current connections. Filter
using the client IP and the ALL connection type. Click on
the magnifying glass of any optimized connection with
your client as source IP. In the screen that opens,
look for Peer Appliance. That is the inpath IP of your
CSH (10.23.255.148 in this example). You don't
know its name but you can always connect to its
inpath IP directly.
TIP: If the Peer Appliance IP is the same as the
source IP issuing the connection then the CSH is a
Steelhead Mobile agent. You might need to refer to
the 705 WAN Optimization Mobile document
depending on what the problem is.
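The TIP above boils down to a single comparison. The sketch below is only an illustration of that rule: the function name and the returned labels are made up for this example, and both addresses are assumed to have been read by hand from the connection details on the SSH.

```python
from ipaddress import ip_address


def classify_client_side_peer(client_ip: str, peer_appliance_ip: str) -> str:
    """Classify the client-side peer of an optimized connection.

    Hypothetical helper: both values are copied manually from the
    connection details screen; nothing here talks to a Steelhead.
    """
    # Parsing with ip_address() rejects malformed input instead of
    # comparing raw strings.
    if ip_address(client_ip) == ip_address(peer_appliance_ip):
        # Peer Appliance equals the connection source: Steelhead Mobile agent.
        return "steelhead-mobile"
    # Otherwise the Peer Appliance is the inpath IP of a CSH appliance.
    return "appliance"
```

For instance, a client at 10.23.1.20 whose Peer Appliance is 10.23.255.148 is classified as "appliance", so 10.23.255.148 is the inpath IP to connect to.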
Action
Go to Report > Current connections and filter using
the client IP and the ALL connection type. Confirm that
the timestamp on the connection you expect to be new
is more recent than your last config change.
If the timestamp is older than your config change then
the connection was never closed, so your change has not
taken effect. If the user cannot close the connection on their
own, you may attempt to reset it from the WOC
interface, but this doesn't work every time, depending
on the OS and software combination.
To reset it, click on the magnifying glass of your connection
and click on the Reset Connection button at the bottom.
You might have to do it multiple times. If the reset doesn't
work then you'll have to ask the user to log off, or to
reboot if the application can't be made to close its TCP
sockets.
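The timestamp check above can be expressed as a small filter. This is a minimal sketch under stated assumptions: the connection labels and start times are assumed to have been noted by hand from the Current connections report, and the function name is hypothetical.

```python
from datetime import datetime
from typing import Dict, List


def stale_connections(connections: Dict[str, datetime],
                      config_changed_at: datetime) -> List[str]:
    """Return the labels of connections that predate the config change.

    A connection that started at or before the change was never
    re-established, so the new configuration does not apply to it yet.
    """
    return sorted(
        label
        for label, started_at in connections.items()
        if started_at <= config_changed_at
    )
```

Any label returned by this function is a candidate for a reset from the WOC interface, or for a client-side log off if the reset does not work.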
Action
Connect to the CSH (section Identifying the client-side
Steelhead (CSH)), go to Report > Current
connections and filter using the client IP and the ALL
connection type.
Locate the TCP connection that is reported as having
problems. Confirm it's the right one by looking at the
server and destination port (service port).
If the connection is not listed then you're on the wrong
CSH.
If the connection is not optimized then the WOC
is not modifying the natural behavior of the connection.
Your problem is most likely elsewhere. If you want to
understand why your connection is not optimized go to
step Troubleshooting a passthrough connection.
Action
The most efficient way to determine if the WOC is
causing the issue is to remove the WOC from the path.
To do so, you'll configure a passthrough rule (bypass)
on the CSH. Doing so on the SSH is useless.
Action
Follow the steps in section How to clear configuration changed alarms to get rid of your temporary rule and the
alarm in one step.
If the problem is gone then the WOC is involved in the problem (though not necessarily its root cause).
Depending on the exact problem, you may have to do one or more of these:
- restart the service and/or the WOC (section Troubleshooting a general failure)
- upgrade the WOC, as your problem might be a bug that has since been fixed
Action
Connect to the CSH (section Identifying the client-side Steelhead (CSH)) and the server-side Steelhead
(SSH). Go to Report > Current connections and filter using the IP or port and the ALL connection type. Locate the
passthrough TCP connection that you are investigating. Confirm it's the right one by looking at the server and
destination port (service port).
Steelheads group passthrough connections into two families: intentional passthrough is considered perfectly
normal from a WOC perspective, while unintentional passthrough is considered a problem.
The most typical passthrough reasons are explained here.
Inpath rule
(intentional passthrough)
This one means a passthrough inpath rule on the CSH
is responsible. The RiOS interface won't tell you exactly which
rule; you have to figure this out on your own.
There are 3 typical situations:
Action
Secure, Interactive or RBT-Proto. To verify
which ports are in these port labels go to
Configure > Networking > Port Labels. Do
not modify these ports, ever.
Action
SYN on WAN side
(unintentional passthrough)
This one means there is a single WOC on the end-to-end
connection. The WOC added its information as
TCP options into the SYN packet (auto-discovery
process) but no other WOC has seen that SYN and
tried to establish an optimized connection.
This usually happens when there is no CSH WOC at
the client location and the connection is established
towards a server at a location with an SSH WOC. The
SSH will be the first and only WOC but the SYN
(without TCP options from a CSH) is seen on its WAN
interface instead of a LAN interface. A Riverbed
Steelhead initiates auto-discovery only on LAN
interfaces but accepts auto-discovery answers on both.
In this case there was no CSH, so the SSH is
effectively the first WOC, but the SYN comes in on a
WAN interface and optimization is denied.
Another scenario is when the LAN and WAN cables are
reversed on the CSH. The LAN clients send their
SYNs to the WAN interface and the CSH refuses to optimize them.
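The interface rule described above can be summarized as a small decision function. This is a deliberately simplified model written for this document, not Riverbed's implementation: the function name, the argument names and the returned strings are all hypothetical, and the only input considered is whether the SYN already carries auto-discovery TCP options from another WOC.

```python
def syn_decision(arrival_interface: str, has_probe_option: bool) -> str:
    """Model what a WOC does with an incoming SYN (simplified sketch).

    arrival_interface: "lan" or "wan", the side where the SYN was seen.
    has_probe_option:  True if another WOC already added its
                       auto-discovery TCP options to the SYN.
    """
    if arrival_interface not in ("lan", "wan"):
        raise ValueError("arrival_interface must be 'lan' or 'wan'")
    if has_probe_option:
        # Auto-discovery from a peer is handled on either side.
        return "answer auto-discovery"
    if arrival_interface == "lan":
        # A plain SYN from a local client: this WOC becomes the CSH.
        return "initiate auto-discovery"
    # A plain SYN on the WAN side: no CSH upstream, or reversed cabling.
    return "passthrough: SYN on WAN side"
```

Both scenarios in the text (no CSH at the client site, reversed LAN/WAN cables) end up in the last branch.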
Action
Starting with RiOS 7, Riverbed introduced automated per-host HTTP auto-configuration. The CSH
compiles and analyzes every HTTP connection. Once it has enough data at hand it decides which optimization
techniques to apply, per HTTP server.
There are rare situations where the auto-configuration causes issues such as web pages not opening,
authentication problems, etc. Your first step should be to diagnose the problem using the Troubleshooting an
Optimized connection section. Follow up with these steps if a passthrough inpath rule clears the problem and
the destination (service port) is 80 (HTTP).
Start by connecting to the CSH (section Identifying the client-side Steelhead (CSH)).
Action
Go to Configure > Optimization > HTTP.
Click on the web server having the issue in the list and
remove all optimization techniques. Click Apply and
Make Static.
Action
exception permanently.
Action
Start by looking at the health status of
the WOC, as the problem might be listed there. Go to
Reports > Diagnostics > Alarm status and Reports
> Diagnostics > System details and check whether anything
reported as wrong could be related to your problem.
If most or all optimized TCP traffic is having severe
problems (CIFS, HTTP, MAPI, etc.) but unoptimized
traffic is OK (Telnet/SSH, RDP, anything to the internet,
IPT), then the WOC as a whole might be causing
general issues. Restarting or stopping its service might
help.
Action
Beware as this will disrupt all optimized traffic. Most
connections will end up unoptimized until they are re-established;
this can take hours or days depending on the
application. You should only restart or stop the service
if things are going really badly at a site.
Go to Configure > Maintenance > Services, and click
Restart. If the problem goes away for a while but
comes back a bit later, try doing a full service Stop. If
the problem is gone then the WOC was causing a
general failure. That is a very rare problem; contact
Riverbed Support.
Rebooting the Steelhead as a whole is not required, as
the effect is the same as a service restart but it takes
longer to complete. Rebooting a WOC is only useful
for a RiOS upgrade. The same goes for powering off a
WOC: the effect is the same as a service stop, except that
a stopped service can be re-enabled over the network.
Never choose the Clear Data Store option
when changing the state of the service. This flushes
the data store cache and will reduce performance
significantly for many days. Use that option only if
instructed by Riverbed support.
Packet capture
Follow this procedure to conduct a packet capture (sniffer trace, tcpdump).
Action
Depending on what your problem is you might need to obtain a capture on the CSH, the SSH or both.
Action
Go to Reports > Diagnostics > TCP Dumps. Click
Add a new TCP Dumps.
Give it a meaningful name including a short problem
description, your name and the date.
Use the IP/port filters as required. Beware: if your filter
is too narrow you might not capture the origin of the
problem; if your filter is too wide your capture files will
be too big and finding the culprit will be difficult.
Apply the capture on the proper LAN interface(s). If
there are several, verify which one will see your traffic by
referring to the Visio diagram and the local routing/ARP table.
Because we use correct addressing, most WAN traffic
will be on Riverbed ports with the CSH and SSH as source
IPs, and the payload won't be readable. That is
why a capture on a WAN interface is rarely useful.
Capture duration: as you wish, but I usually use 0,
which means I'll have to stop the capture myself when I
see fit. It's up to you. Maximum capture size and
number of files to rotate define how much data you'll
keep and in how many files; they are a safety net in case
your filter is too wide and the amount of traffic too high.
Click Add to start the capture.
If you configured an ongoing capture using 0 as its
duration, select it and click Stop Selected Captures
when you're done.
Your capture is ready to be downloaded. There will be
one capture file per selected interface. The name of the
WOC and the interface are automatically prefixed to the
file name.
Please delete all capture files when you're done so that
disk space is not wasted on old files.
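As an illustration of the naming and filtering advice in this procedure, the sketch below builds a capture name and a filter expression. The naming scheme and both function names are assumptions made for this example. The filter string uses standard tcpdump (pcap-filter) syntax; whether you type such an expression or fill in the separate IP/port fields depends on the WOC interface, so treat it as a way to reason about filter width rather than as the exact input.

```python
from datetime import date
from typing import Optional


def capture_name(problem: str, engineer: str, day: date) -> str:
    """Build a capture name: short problem description, your name, the date."""
    slug = "-".join(problem.lower().split())
    return f"{slug}_{engineer.lower()}_{day.isoformat()}"


def capture_filter(client_ip: str, server_ip: str,
                   port: Optional[int] = None) -> str:
    """Build a tcpdump-style filter for one client/server pair.

    Leaving the port out widens the filter (bigger files, fewer blind
    spots); adding it narrows the capture to a single service port.
    """
    expression = f"host {client_ip} and host {server_ip}"
    if port is not None:
        expression += f" and tcp port {port}"
    return expression
```

For example, capture_filter("10.23.1.20", "10.40.2.10", 445) limits the capture to one client, one server and one service port, while omitting the port keeps every conversation between the two hosts.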
Action
Admission control is a state in which a WOC refuses to accelerate new TCP connections. Warnings in the form
of alarms on the SCC are triggered at 85% of the maximum. New TCP connections are denied optimization
once 100% is reached.
There are various reasons for a WOC to be in admission control, and figuring out which one applies can be easy or quite difficult.
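The two thresholds above can be captured in a few lines. This is a sketch only: it assumes "the maximum" refers to the number of optimized connections the appliance model supports, and the function name and state labels are invented for the example.

```python
def admission_state(current_connections: int, max_connections: int) -> str:
    """Map connection usage to the states described in the text.

    Assumption: the limit is a connection count. Integer arithmetic is
    used so that exactly 85% is reported as a warning.
    """
    if max_connections <= 0:
        raise ValueError("max_connections must be positive")
    if current_connections >= max_connections:
        return "admission control"  # new connections are not optimized
    if current_connections * 100 >= 85 * max_connections:
        return "warning"            # alarm raised on the SCC
    return "ok"
```

With a hypothetical limit of 1000 connections, 849 is "ok", 850 is "warning" and 1000 is "admission control".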
Action
team. When it runs it opens thousands of connections,
which can cause WOCs to go into admission control.
Problematic destination?
If the same destination IP (same destination port) is
seen with multiple connections it might be just normal
(e.g. an Exchange server, connected to by every single PC
in the site) or it might not be.
Examples:
An office had a WOC sized for its user count but there
was a local Exchange server used by remote offices.
All the remote PCs had multiple MAPI connections to
the server, causing admission control at the central
site. The WOC had to be upgraded because it was used
both as a client-side Steelhead and a server-side
Steelhead.
Problematic client <> server?
If you see lots of connections with the same source and
destination IPs, and always the same destination port,
then a pair of systems is using a lot of TCP capacity.
This is quite common between a Read-only Domain
Controller (RODC) and a normal DC. This is caused by
a RiOS bug that has yet to be fixed.
Example on the left: a local RODC opens lots of TCP
sockets to a central site DC. All connections are on the
same port, they all look alike, and they are all very small
(2 KB). They never go away. The destination port is
not predictable; hence a passthrough rule cannot be
configured.
Other example: a faulty Outlook client was opening
600+ MAPI connections to Exchange. A new Outlook profile
on the PC fixed the issue.
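Spotting a problematic destination or client/server pair is a counting exercise. The sketch below assumes the connection list was exported by hand from the Current connections report; the `Conn` record and both function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple, Tuple


class Conn(NamedTuple):
    """One row of the connection list (hand-exported, hypothetical layout)."""
    src_ip: str
    dst_ip: str
    dst_port: int


def busiest_destinations(conns: Iterable[Conn], top: int = 5) -> List[Tuple[tuple, int]]:
    """Count connections per (destination IP, destination port)."""
    return Counter((c.dst_ip, c.dst_port) for c in conns).most_common(top)


def busiest_pairs(conns: Iterable[Conn], top: int = 5) -> List[Tuple[tuple, int]]:
    """Count connections per (source IP, destination IP, destination port)."""
    return Counter((c.src_ip, c.dst_ip, c.dst_port) for c in conns).most_common(top)
```

A high count in busiest_destinations with many different sources looks like the Exchange example; a high count in busiest_pairs looks like the RODC or faulty Outlook examples. Whether a given count is normal still has to be judged against the site.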
10
Action
Log on to the SCC, go to the Reports > Topology >
Appliance status page, and click on the Appliances
Needing Attention pane. Your WOC should be there
with the alarm: The configuration on appliance has been
changed.
Action
Go to Manage > Topology > Appliances and select
the checkbox next to your WOC.
Click Appliance Operations at the top right of the
page, leave the default operation Push Policies, leave
all options unchecked, and click Push.
Wait 2 minutes then confirm on your WOC that the
temporary rule is gone.
11
Action
Log on to the Steelhead reporting an SSL
Certificates Expiring alarm on its alarm page. The alarm will
tell you that the issue is with a Certificate Authority
(CA).
12
Action
Sometimes, following a long network outage, the SCC will lose track of a WOC. On the SCC Topology >
Appliance status page it will be seen as Disconnected: unreachable address or Disconnected: invalid
username / password. A manual reconnect may help.
First you must confirm your WOC is reachable. Connect to it using its DNS name or Primary IP. Its Home
page will show CMC (or SCC): not managed instead of the usual CMC (or SCC): [your CMC hostname/IP].
Log on to the SCC, go to Manage > Topology >
Appliances, click on your WOC (not its checkbox), go
to the Appliance Utilities pane, and click Reconnect.
Wait 2 minutes then verify that your WOC Home page
shows it is managed by mc-qcmtl1-05-01.
13
Follow this procedure if an HDD requires a RAID rebuild. If the same drive has the same problem more
than once you should open a ticket with Riverbed support and request an RMA.
Action
enable
configure terminal
14
Action
Steelheads are powerful reporting tools and may be used to investigate traffic trends, top talkers, etc.
You must always understand the local topology when analyzing traffic stats:
- If the WOC is getting the WAN packets by WCCP then you must analyze the WCCP ACL, as it will most
likely ignore a lot of irrelevant traffic, which won't show in your report because the WOC never sees those
packets.
- If the WOC is physically inpath then it sees everything, including internet-bound traffic. This traffic will be
included in the stats in the passthrough category.
- If there is a DMZ at the site, routed on the firewall off the WOC WAN port, the WOC will most likely see
the LAN-to-DMZ traffic. It won't optimize it but this traffic will show in your report.
- Reports on passthrough traffic don't make any distinction between internet traffic and corporate traffic
that couldn't be optimized (e.g. no remote WOC, traffic to a local DMZ, etc.).
All these are bundled together per TCP port.
- If the WOC uses Hardware passthrough (HAP), ignored packets won't show in the reports.
The same stats available on individual WOCs are also on the SCC. Generally speaking the SCC is better for
long-term local trend reporting or aggregated country/regional/global stats, while local WOCs are better for
short-term, local stats.
The SCC aggregates the stats of current WOCs only. If a WOC is removed its stats go away with it. If a WOC
is moved to a different location its historical stats move with it. This may invalidate some reports.
The direction (Bi-Directional, WAN-to-LAN or LAN-to-WAN) only applies to the individual packets, without
regard to the location of the client and server. A local client that downloads from a remote server will look
exactly the same as a remote client that uploads to a local server.
TCP 8779 (SMB2) actually uses TCP 445 (as seen in current connections and TCP dumps). It is reported on its own
port only to separate it from its predecessor SMB1 (CIFS).
LAN statistics represent packets to and from the client and server as they see them (and as the LAN
switches do). WAN statistics represent the same packets after they were optimized by the WOCs: they were
either pre-cached (only the indexes were sent), compressed, or removed (optimization of protocol chattiness).
Action
Per protocol investigation (aggregate)
On the WOC, go to Reports > Networking > Traffic
Summary.
This report gives the sum of traffic per protocol, the
reduction % (caching and compression combined) and
the weight of the protocol compared to all traffic in the
site (using pre-optimized LAN stats).
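To make the two percentages in this report concrete, here is a small worked sketch. The formulas are an assumption based on the usual definition of data reduction (bytes saved on the WAN relative to the bytes seen on the LAN); the document does not state how the WOC computes them, and the function names are illustrative.

```python
def reduction_percent(lan_bytes: int, wan_bytes: int) -> float:
    """Share of LAN bytes that never crossed the WAN (assumed formula)."""
    if lan_bytes <= 0:
        return 0.0
    return (lan_bytes - wan_bytes) / lan_bytes * 100


def traffic_weight_percent(protocol_lan_bytes: int, total_lan_bytes: int) -> float:
    """Weight of one protocol in all site traffic, using pre-optimized LAN bytes."""
    if total_lan_bytes <= 0:
        return 0.0
    return protocol_lan_bytes / total_lan_bytes * 100
```

Under these assumptions, a protocol that moved 1000 bytes on the LAN side and 250 bytes on the WAN side shows a 75% reduction; if the whole site moved 4000 LAN bytes, that protocol weighs 25%.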
Action
Per host investigation (live)
On the WOC, go to Reports > Networking > Current
Connections. Filter with ALL optimized.
This report gives you per-connection statistics. The
connections must exist (active or idle) for stats to be
displayed. Connections are removed if a TCP FIN or
RST is seen, or if the WOC service stops.
You may filter by source or destination, IP or port, or
protocol name (e.g. MAPI uses different ports) by using
the search field. You may sort all columns as you wish.
There is no export tool for the Current Connections table. To export manually, select the content of the columns
you need while holding the CTRL key so that only those columns are selected. Copy and paste into Notepad, save,
open Excel, open your file (filter with all files *.*), accept the format warning, accept the default delimited
column format, click Other and specify : in the box, then click Next and Done. You now have a much more powerful tool to sort,
compile, remove duplicates, etc.
Per host investigation (live)
On the WOC, go to Reports > Networking > Top
talkers.
This report is a lightweight Netflow reporting tool.
This report gives you stats bundled either per source
(Sender), per destination (Receiver), combined
source+destination (Host), per TCP port (Application
ports) or per connection (Conversation).
The period is either last hour, last day or All (two days).
Warning: Passthrough is both internet-bound traffic and
internal corporate traffic that couldn't be optimized (no
remote WOC, local DMZ, etc.).
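The five groupings of this report differ only in the key used to sum the bytes. The sketch below reproduces that idea on hand-collected flow records; the record layout, the view names and the function name are assumptions for illustration, and this is not how the WOC computes its own report.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple

# One flow record: (source IP, destination IP, TCP port, bytes). Hypothetical layout.
Flow = Tuple[str, str, int, int]


def top_talkers(flows: Iterable[Flow], view: str) -> List[tuple]:
    """Sum bytes per key for one view, largest first."""
    totals = defaultdict(int)
    for src, dst, port, nbytes in flows:
        if view == "sender":
            totals[src] += nbytes
        elif view == "receiver":
            totals[dst] += nbytes
        elif view == "host":
            # A host is counted whether it sends or receives.
            totals[src] += nbytes
            totals[dst] += nbytes
        elif view == "port":
            totals[port] += nbytes
        elif view == "conversation":
            totals[(src, dst, port)] += nbytes
        else:
            raise ValueError(f"unknown view: {view}")
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

Switching the view on the same records shows why the same traffic can rank a server first in one view and a client first in another.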
15
Action
If applicable, do a packet capture on the CSH and SSH (section Packet capture).
If applicable, get screenshots of the problem as it is seen by the user.
Once the problem has been reproduced, go to Reports >
Diagnostics > System Dumps. Choose Include
Statistics and Include All Logs then click Generate
System Dump. The dumps will be ready in a few
minutes.
Download the LAN and WAN TCP dumps from both the CSH and SSH (4 TCP dumps in total) taken when the
problem occurs. Do the same with a passthrough rule if it clears the problem (4 more TCP dumps). Download
the System Dumps. Name all files explicitly so that the TAC engineer will know which is CSH, which is SSH,
which is optimized (not working) and which is passthrough (working). Wrap all these into a single ZIP file, and
include any other files you might need such as screenshots, Visio diagrams, etc.
Log in to https://support.riverbed.com. You'll need an individual account to get in. If you don't have one, go
ahead and create one; it will be helpful. It only takes 2 minutes.
Once you're in, go to My Riverbed (top right) then
Cases and RMAs.
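One possible way to name the eight TCP dumps explicitly is sketched below. The naming scheme is only a suggestion made for this example (the text asks for explicit names without prescribing a format), and both function names are hypothetical.

```python
from typing import List


def dump_name(role: str, side: str, state: str) -> str:
    """Build an explicit base name for one TCP dump.

    role:  "csh" or "ssh"
    side:  "lan" or "wan"
    state: "optimized" (problem present) or "passthrough" (problem cleared)
    """
    return f"{role}_{side}_{state}"


def expected_dump_names() -> List[str]:
    """List the eight dumps: 4 while optimized, 4 more with the passthrough rule."""
    return [
        dump_name(role, side, state)
        for state in ("optimized", "passthrough")
        for role in ("csh", "ssh")
        for side in ("lan", "wan")
    ]
```

The resulting list can double as a checklist before zipping: any name without a matching file is a capture that still has to be taken.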
Action
Fill in the necessary information.
Please use a precise yet short description in the
subject field as it cannot be changed afterward.
"Connection not working" or "WOC issue" is too vague.
Use "HTTP timeout after RiOS 8.5 upgrade" or
"Steelhead won't boot after reload" instead.
Priority: Should be P3 if you have a workaround
(passthrough rule until problem is fixed) or P2 if users
are affected by the problem. P1 shall be very rarely
used as it means the company operations as a whole
are severely degraded or stopped due to this problem.
Use the Steelhead serial # in the Product identified
field. To get the serial go to Support.
Attach the ZIP file you created in the previous step only if
it's smaller than 50 MB.
Submit the ticket. Note the case ticket #.
If your ZIP was too big to be uploaded in the web
form, connect to ftp.riverbed.com using anonymous as
user and your email address as the password.
Rename your ZIP as [case ticket #].zip and upload to
the Incoming folder.
A Riverbed support engineer will eventually contact
you, usually by email but sometimes directly by phone.
If you need faster service dial 1.888.782.3822, provide
your case ticket # and ask to get hold of your engineer
ASAP.
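The priority rule and the 50 MB upload rule from this procedure can be summarized as two small decision helpers. This is a sketch under stated assumptions: the function names are hypothetical, 1 MB is taken as 1024 x 1024 bytes, and a case where nobody is affected is assumed to default to P3.

```python
from typing import Optional

# Assumption: the 50 MB web form limit is counted in units of 1024 * 1024 bytes.
WEB_FORM_LIMIT_BYTES = 50 * 1024 * 1024


def pick_priority(company_wide_outage: bool, users_affected: bool,
                  has_workaround: bool) -> str:
    """Choose the case priority following the guidance in the text."""
    if company_wide_outage:
        return "P1"  # operations as a whole severely degraded or stopped
    if users_affected and not has_workaround:
        return "P2"
    return "P3"      # workaround in place (or, by assumption, nobody affected)


def upload_plan(zip_size_bytes: int, case_number: Optional[str] = None) -> str:
    """Decide how the ZIP file reaches Riverbed support."""
    if zip_size_bytes < WEB_FORM_LIMIT_BYTES:
        return "attach to web form"
    if case_number is None:
        # The FTP file name needs the case ticket #, which exists only after submission.
        return "submit the case first, then upload by FTP"
    return f"upload as {case_number}.zip to the Incoming folder"
```

The case number passed to upload_plan is whatever ticket # was noted after submitting the form; no particular format is assumed.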
16
Software downgrade
Follow this procedure if you need to downgrade a recently upgraded WOC because a new
problem has appeared.
Action
Once you have determined that the problem is WOC-related
and that it follows a very recent version upgrade, go
to Configure > Maintenance > Software Upgrade.
Click Switch to Backup Version.
Wait a few minutes for the WOC to reboot. Log back in and confirm it's running its previous version. Verify whether
the problem is gone.
17
Scheduling a reboot
Follow this procedure if a WOC requires a reboot and you need this to happen outside business hours.
Action
Connect to the WOC over HTTP(S) and go to
Configure > Maintenance > Reboot/Shut Down.
Do not check Clear Data Store unless you have
a very good reason to do so.
Click Schedule Later and enter a date/time.
Click Reboot (don't click Shut Down, or else you'll
need a local contact to power it back on).
18
Action
You cannot schedule a service restart on a WOC; it
can only be done from the SCC.
Log on to the SCC and go to Manage Appliances >
Appliances.
Click the checkbox in front of each WOC to be
restarted.
Click Appliance Operations.
Select the operation: Start/Stop Services.
Change Service Actions to Restart.
Do not check Clear Data Store unless you have
a very good reason to do so.
Click Schedule Later and enter a date/time.
19
Action
Connect to the WOC CLI interface by SSH.
Type:
enable