
Analyzing TCP Performance

Stephen Hemminger
Sr. Staff Engineer
Linux Kongress 2004
2004-09-09

Copyright 2004 OSDL, All rights reserved.
Agenda

■ Introduction
■ TCP for muggles

■ Engineering Process

■ Problem examples

■ Network Tools

■ Wrapup


Outside of scope

■ Non TCP protocols


■ SCTP, multicast, etc
■ Queuing theory - “no math”
■ Hardware and product comparisons


My Background

■ Did TCP back in the “old school”


■ BSD 4.2, Ethernet
■ SMP Unix versions of OSI, Netware, Appletalk, ...
■ Plan9 Hypercube communication
■ Linux
■ Incorporation of TCP research in 2.6 kernel
■ Performance tests for LWE
■ Wizard gap


Limits of my knowledge

■ Only worked with current Linux (2.4/2.6)


■ Will mention tools here that I have not used extensively
■ Involved in development of Linux, not deployment or research


Agenda

■ Introduction
■ TCP for muggles

■ Engineering Process

■ Problem examples

■ Network Tools

■ Wrapup


TCP for “muggles”

■ connection establishment
■ slow start

■ windows

■ congestion control

■ silly window


Connection establishment

Client                                   Server

connect   ---- SYN ------------------->
          <--- SYN + ACK --------------
                                         accept
write     ---- Data 1 (10) ----------->
          <--- Ack 11 -----------------
                                         read

ethereal


tcpdump trace

13:28:21.745624 IP 172.20.1.60.38052 > 216.239.39.99.http: S 1765497548:1765497548(0)
    win 5840 <mss 1460,sackOK,timestamp 1563951453 0,nop,wscale 7>
13:28:21.831935 IP 216.239.39.99.http > 172.20.1.60.38052: S 227058185:227058185(0)
    ack 1765497549 win 8190 <mss 1460>
13:28:21.832035 IP 172.20.1.60.38052 > 216.239.39.99.http: . ack 1 win 5840
13:28:21.832321 IP 172.20.1.60.38052 > 216.239.39.99.http: P 1:126(125) ack 1 win 5840
13:28:21.939237 IP 216.239.39.99.http > 172.20.1.60.38052: . ack 126 win 31460
13:28:21.972448 IP 216.239.39.99.http > 172.20.1.60.38052: P 1:485(484) ack 126 win 31460
13:28:21.972529 IP 172.20.1.60.38052 > 216.239.39.99.http: . ack 485 win 6432
13:28:21.973016 IP 172.20.1.60.38052 > 216.239.39.99.http: F 126:126(0) ack 485 win 6432
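
The gap between the SYN and the SYN+ACK in this trace gives a first estimate of the round trip time (roughly 86 ms here). A minimal sketch in Python, assuming tcpdump's default HH:MM:SS.microseconds timestamps; the helper name `handshake_rtt` is made up for illustration and is not part of tcpdump or tcptrace:

```python
from datetime import datetime

def handshake_rtt(syn_time: str, synack_time: str) -> float:
    """Return the seconds between a SYN and the matching SYN+ACK.

    Both arguments are tcpdump-style timestamps (HH:MM:SS.ffffff).
    Assumes both packets were captured on the same day.
    """
    fmt = "%H:%M:%S.%f"
    delta = datetime.strptime(synack_time, fmt) - datetime.strptime(syn_time, fmt)
    return delta.total_seconds()

# Timestamps copied from the first two packets of the trace.
rtt = handshake_rtt("13:28:21.745624", "13:28:21.831935")
print(f"handshake RTT: {rtt * 1000:.3f} ms")
```

This only measures one sample; tcptrace's RTT graph does the same kind of matching for every data/ack pair.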


Flow control

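Sender                                   Receiver

          <--- Ack 1010 (5000) --------
write     ---- Data 1011 (1400) ------>
          ---- Data 2411 (1400) ------>
          ---- Data 3811 (1400) ------>
          ---- Data 5211 (800) ------->
          <--- Ack 6010 (0) -----------
                                         read (1000)
          <--- Ack 6010 (1000) --------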


Retransmission

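Sender                                   Receiver

write     ---- Data 1 ---------------->
          <--- Ack 1 ------------------
          <--- Ack 1 ------------------
Multiple acks = fast retransmit
          ---- Data 2 ---------------->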
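
The "multiple acks" rule can be sketched as a duplicate-ack counter. The threshold of three duplicates is the conventional value from the TCP congestion control RFCs and is an assumption here, since the slide does not give a number; the function name is illustrative.

```python
DUP_ACK_THRESHOLD = 3  # conventional value (assumption; not stated on the slide)

def fast_retransmit_points(acks):
    """Return the indexes in `acks` at which a fast retransmit would fire.

    `acks` is the sequence of cumulative ack numbers seen by the sender.
    A retransmit fires when the same ack number has been repeated
    DUP_ACK_THRESHOLD times after its first appearance.
    """
    points = []
    last_ack = None
    dup_count = 0
    for i, ack in enumerate(acks):
        if ack == last_ack:
            dup_count += 1
            if dup_count == DUP_ACK_THRESHOLD:
                points.append(i)
        else:
            last_ack = ack
            dup_count = 0
    return points

# Ack 1 arrives four times (one original plus three duplicates), then ack 2.
print(fast_retransmit_points([1, 1, 1, 1, 2]))
```

The point of the rule is that the sender can resend the missing segment without waiting for a retransmission timer to expire.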


Tcptrace

http://tcptrace.org
Tool to convert captured data into graphs
■ Time sequence graph
■ Throughput

■ RTT

Far more than there is time to cover here!


Xplot

http://xplot.org
■ Takes plot command scripts

■ Mouse

■ Zoom – drag with the left button


■ Zoom out – click the left button
■ Scroll – drag with middle button
■ Dump – shift-left button produces postscript
■ Shift-middle and shift-right also


Time Sequence Graph


Windows & Buffering

■ Used to isolate TCP from application read/write


■ Used for congestion control

■ Upper bound determined by system parameters


Congestion window

■ slow start
■ Window normally starts small
■ Grows in response to ack
■ congestion control
■ Packet loss = congestion


Silly Window

Sender                                   Receiver

write 8k bytes
          <--- Ack [10] ---------------
"Hey, I am not going to try
and send this data now,
give me a bigger window first"
                                         read 8k bytes
          <--- Ack [2000] -------------
"OK, thanks"
          ---- Data (2000) ----------->


Model of TCP networks

Sender                                        Receiver

[Send Window]  ---- Data ---> Network --->    [Receive Window]
               <--- Ack ----- Network <---

BDP = Bandwidth (bytes/sec) * Delay (sec)


BDP - Bandwidth Delay Product

■ BDP = amount of data in transit

■ Examples

■ DSL/Cable modem (international)
    1,000,000 bit/sec * 1/8 byte/bit * 500 ms = 62,500 bytes
■ Gigabit across US
    1,000,000,000 bit/sec * 1/8 byte/bit * 70 ms = 8.75 Mbytes
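
The two examples can be checked with a few lines of Python. This is only a sketch of the arithmetic on the slide; the function name `bdp_bytes` is illustrative.

```python
def bdp_bytes(bits_per_sec: int, delay_ms: int) -> float:
    """Bandwidth delay product in bytes.

    bits/sec * (1 byte / 8 bits) * (delay_ms / 1000) sec
    """
    return bits_per_sec * delay_ms / 8000

print(bdp_bytes(1_000_000, 500))      # DSL/cable modem, international
print(bdp_bytes(1_000_000_000, 70))   # gigabit across the US
```

A sender needs at least this much window (and buffer) to keep the link full, which is why the later tuning slides compare buffer limits against the BDP.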


Bandwidth Delay Product (BDP)

[Figure: bandwidth (Mbits/sec, 0.1 to 1000) versus delay (ms, 0.1 to 1000), with BDP lines at 8K, 64K and 1M and regions labeled LAN, Research and Broadband]


Internet

■ Router queues
■ Delays

■ Speed of light (70ms coast/coast)


■ Slow routers
■ Packet correlation, sizes
■ DoS


Extensions for larger windows

■ TCP Selective Acknowledgement (SACK), RFC2018
■ Don't have to retransmit everything
■ Window scaling (RFC1323)
■ Window size multiplied by 2^n
■ Protection Against Wrapped Sequence numbers (PAWS)
■ Timestamp inside each packet


TCP options negotiation 1

Window scale by 4
IP 172.20.1.60.32820 > 216.239.39.99.http: S 3599527174:3599527174(0) win 5840
    <mss 1460,sackOK,timestamp 2519711 0,nop,wscale 2>
IP 216.239.39.99.http > 172.20.1.60.32820: S 3820474812:3820474812(0) ack 3599527175
    win 8190 <mss 1460>
IP 172.20.1.60.32820 > 216.239.39.99.http: . ack 1 win 5840
IP 172.20.1.60.32820 > 216.239.39.99.http: P 1:126(125) ack 1 win 5840

But server doesn't support scaling


TCP options negotiation 2

Window scale by 4
IP 172.20.1.60.32823 > 65.172.181.13.http: S 4120108902:4120108902(0) win 5840
    <mss 1460,sackOK,timestamp 3036627 0,nop,wscale 2>
IP 65.172.181.13.http > 172.20.1.60.32823: S 2295773021:2295773021(0) ack 4120108903
    win 5792 <mss 1460,sackOK,timestamp 1818411318 3036627,nop,wscale 0>
IP 172.20.1.60.32823 > 65.172.181.13.http: . ack 1 win 1460
    <nop,nop,timestamp 3036628 1818411318>
IP 172.20.1.60.32823 > 65.172.181.13.http: P 1:144(143) ack 1 win 1460
    <nop,nop,timestamp 3036628 1818411318>

Your scaling is okay, but don't scale mine
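
How the two negotiation traces differ can be sketched as below. The rule that scaling is used only when both SYN packets carry the option, and is never applied to the window field of a SYN itself, comes from RFC1323; the function name and parameters are illustrative.

```python
def effective_window(raw_win: int, sender_shift: int, peer_sent_wscale: bool) -> int:
    """Window in bytes that a non-SYN segment really advertises.

    raw_win          -- 16-bit window field from the TCP header
    sender_shift     -- wscale value the sender of this segment put in its SYN
    peer_sent_wscale -- True if the other side's SYN also carried wscale
    """
    if not peer_sent_wscale:
        return raw_win  # scaling is disabled in both directions
    return raw_win << sender_shift

# Negotiation 1: the server's SYN+ACK has no wscale, so "win 5840" means 5840.
print(effective_window(5840, 2, peer_sent_wscale=False))
# Negotiation 2: both sides sent wscale, so the client's "win 1460" means 1460 * 4.
print(effective_window(1460, 2, peer_sent_wscale=True))
```

Both calls describe the same 5840-byte window, which is why the client's ack shows `win 5840` in the first trace and `win 1460` in the second.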


Linux TCP window tuning

■ Send window - net.ipv4.tcp_wmem


■ three values : min default max
■ default is 4K 16K 128K

■ also limited by net.core.wmem_max


■ Receive window – net.ipv4.tcp_rmem
■ three values : min default max

■ default is 4K 85K 170K

■ also limited by net.core.rmem_max
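
On a Linux system these settings can be read from /proc/sys/net/ipv4/. The sketch below parses such a triple and checks whether its maximum covers a given bandwidth delay product. The helper names are illustrative, and the byte values 4096/87380/174760 are an assumed spelling-out of the "4K 85K 170K" defaults on the slide.

```python
from pathlib import Path

def parse_triple(text: str) -> tuple[int, int, int]:
    """Parse the three whitespace-separated byte counts of tcp_rmem/tcp_wmem."""
    first, default, maximum = (int(field) for field in text.split())
    return first, default, maximum

def covers_bdp(triple: tuple[int, int, int], bdp: int) -> bool:
    """True if the maximum buffer size is at least one bandwidth delay product."""
    return triple[2] >= bdp

# Assumed byte values for the tcp_rmem defaults quoted on the slide.
defaults = parse_triple("4096 87380 174760")
print(covers_bdp(defaults, 62_500))      # 1 Mbit/sec at 500 ms
print(covers_bdp(defaults, 8_750_000))   # 1 Gbit/sec at 70 ms

path = Path("/proc/sys/net/ipv4/tcp_rmem")
if path.exists():  # only present on Linux
    print(parse_triple(path.read_text()))
```

The second check fails, which is the motivation for raising the maximum on high BDP paths.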


Linux TCP window tuning

■ Overall memory – net.ipv4.tcp_mem


■ three values : low pressure max
■ automatic value based on system memory
■ Application window – net.ipv4.tcp_app_win
■ reserved space to handle slow applications


But!

■ Some firewalls and routers are buggy
■ Corrupt the window scale, changing N to 0
■ Forget to track state, or read the RFC wrong
■ Connections will hang because the initial window looks like a silly window
■ 1% of the net is buggy
■ Linux 2.6.9 chooses window scale based on maximum possible receive window
■ Default tcp_rmem => window scale of 2
■ Buggy devices will see ¼ of the real window
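
Why the default tcp_rmem leads to a window scale of 2 can be sketched as follows. This mirrors the idea of picking the smallest shift that lets a 16-bit field describe the largest possible window; it is not a copy of the kernel code, and 174760 is an assumed byte value for the "170K" maximum.

```python
MAX_UNSCALED_WINDOW = 65535  # largest value of the 16-bit window field
MAX_WSCALE = 14              # upper limit on the shift from RFC1323

def choose_wscale(max_window: int) -> int:
    """Smallest shift that lets a 16-bit field describe max_window bytes."""
    shift = 0
    while max_window > MAX_UNSCALED_WINDOW and shift < MAX_WSCALE:
        max_window >>= 1
        shift += 1
    return shift

shift = choose_wscale(174760)    # assumed byte value of the 170K maximum
# Print the shift, and the fraction of the real window that a device
# sees if it resets the scale to 0.
print(shift, 1 / (1 << shift))
```

With a shift of 2, a device that rewrites the scale to 0 interprets every advertised window as one quarter of its real size.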


Break


Agenda

■ Introduction
■ TCP for muggles

■ Engineering Process

■ Problem examples

■ Network Tools

■ Wrapup


Performance Engineering process

■ Define your goal


■ Capture information

■ Analyze and form hypothesis

■ Prototype to validate hypothesis

■ If successful

■ Make changes on production system

■ Report problems or patches to others


Goal setting

■ Know what is possible:


■ bus bandwidth, network latency, etc.

■ Know your application

■ Compare with similar applications


TCP performance testing

■ Goal: Improve TCP performance over high bandwidth * delay links
■ Plan:

■ New TCP congestion control


■ Validate and test


Testing TCP over WAN

■ Want to test performance of TCP over high BDP links
■ Can't afford a 10Gbit trans-continental link
■ Proposal: emulate network delay over 1Gbit Ethernet


Existing network emulation tools

■ Dummynet
http://info.iet.unipi.it/~luigi/ip_dummynet/
I don't want to set up a separate FreeBSD machine
■ NISTnet
http://snad.ncsl.nist.gov/itg/nistnet/
Only on 2.4 and not ready to be in main tree


Netem

TCP
IP

netem
Ethernet (eth0)
http://developer.osdl.org/shemminger/netem
■ Started out as a simple delay-only hack

■ Has grown to cover all the functionality of NISTnet
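
Adding a fixed delay with netem is done through the `tc` command. The sketch below only builds and prints the command line, assuming an iproute2 `tc` with netem support; the helper name is illustrative, and actually running the command needs root.

```python
import shlex

def netem_delay_command(dev: str, delay_ms: int) -> list[str]:
    """Build the tc command line that adds a fixed delay on `dev`."""
    return ["tc", "qdisc", "add", "dev", dev, "root", "netem",
            "delay", f"{delay_ms}ms"]

cmd = netem_delay_command("eth0", 100)
print(shlex.join(cmd))
# To apply it (requires root and the netem qdisc):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Because netem sits between IP and the Ethernet device, the delay applies to every packet leaving that interface, so it is usually set on a dedicated test machine.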


Current TCP research

■ Alternative TCP congestion control


■ Vegas
■ Westwood
■ Binary Increase Congestion Control (BIC)
■ Research community based around Web100


TCP Reno

■ Standard default in 2.4/2.6


■ Adjusts congestion window based on packet loss

■ Slow start – window starts small, then doubles each RTT

■ Additive Increase of window on each Ack (about one segment per RTT)

■ Multiplicative Decrease on loss
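
The Reno rules can be written down in a few lines. This is a textbook-style sketch in units of segments, not the kernel implementation; despite its name, slow start adds one segment per ack, which doubles the window every round trip. Function names and the floor of two segments are assumptions for illustration.

```python
def reno_on_ack(cwnd: float, ssthresh: float) -> float:
    """New congestion window (in segments) after one new ack."""
    if cwnd < ssthresh:
        return cwnd + 1.0          # slow start: doubles every RTT
    return cwnd + 1.0 / cwnd       # additive increase: about +1 segment per RTT

def reno_on_loss(cwnd: float) -> tuple[float, float]:
    """Return (cwnd, ssthresh) after a loss detected by fast retransmit."""
    ssthresh = max(cwnd / 2.0, 2.0)  # multiplicative decrease
    return ssthresh, ssthresh

cwnd, ssthresh = 1.0, 8.0
for _ in range(20):                  # twenty acks without loss
    cwnd = reno_on_ack(cwnd, ssthresh)
print(round(cwnd, 2))
cwnd, ssthresh = reno_on_loss(cwnd)  # one loss halves the window
print(round(cwnd, 2), round(ssthresh, 2))
```

On a high BDP link the additive phase needs many round trips to recover from a single halving, which is the problem the alternative algorithms try to address.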


TCP Vegas

■ Original work by Larry Peterson


■ Patches existed for 2.2, 2.4 and as part of Web100
■ sysctl net.ipv4.tcp_vegas_cong_avoid
■ Measure bandwidth based on RTT
■ Adjust congestion window on bandwidth

■ Avoids packet loss
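
The Vegas idea of steering the window from RTT measurements can be sketched as below. It follows the commonly described scheme of comparing the expected and actual sending rates; the thresholds `alpha` and `beta` are assumed illustrative values, and this is not the Linux patch.

```python
def vegas_adjust(cwnd: float, base_rtt: float, rtt: float,
                 alpha: float = 1.0, beta: float = 3.0) -> float:
    """One Vegas congestion-avoidance step, applied once per RTT.

    cwnd is in segments, RTTs are in seconds. alpha and beta bound the
    number of segments Vegas tries to keep queued in the network
    (assumed values for this sketch).
    """
    expected = cwnd / base_rtt               # rate if nothing were queued
    actual = cwnd / rtt                      # rate actually achieved
    queued = (expected - actual) * base_rtt  # estimated segments in queues
    if queued < alpha:
        return cwnd + 1.0   # path has room: grow
    if queued > beta:
        return cwnd - 1.0   # queues building: back off before loss
    return cwnd

print(vegas_adjust(10.0, base_rtt=0.1, rtt=0.1))    # no queueing seen
print(vegas_adjust(10.0, base_rtt=0.1, rtt=0.2))    # RTT doubled
```

Because the window shrinks as soon as the RTT rises, Vegas can avoid loss, but any error in the base RTT or bandwidth estimate directly limits throughput.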


TCP Westwood

■ Work by Claudio Casetti


■ Patches for 2.4 by Angelo Dell'Aera
■ sysctl net.ipv4.tcp_westwood
■ Focused on wireless
■ packet loss != congestion
■ Measure bandwidth based on RTT
■ Use normal Reno till congestion then adjust congestion window based on bandwidth
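
The last bullet can be sketched as follows: after a loss, instead of halving, the threshold is set to what the estimated bandwidth can carry in one minimum round trip. This follows the usual description of Westwood; the function name, the floor of two segments, and the sample numbers are assumptions, and the bandwidth estimator itself is not shown.

```python
def westwood_ssthresh(bw_estimate: float, rtt_min: float, mss: int) -> int:
    """Slow start threshold (in segments) chosen after a loss.

    bw_estimate -- estimated delivery rate in bytes/sec
    rtt_min     -- smallest RTT seen so far, in seconds
    mss         -- segment size in bytes
    """
    return max(2, int(bw_estimate * rtt_min / mss))

# 10 Mbit/sec estimate (1,250,000 bytes/sec), 100 ms minimum RTT, 1460-byte segments.
print(westwood_ssthresh(1_250_000, 0.1, 1460))
```

On a wireless link where loss is random rather than a sign of congestion, this keeps the window near the path's real capacity instead of cutting it in half each time.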


Binary Increase Congestion Control (BIC)

■ Work by Lisong Xu
■ Patches for Web100 (2.4)
■ sysctl net.ipv4.tcp_bic
■ Designed for high speed networks
■ Modification of Reno
■ Use additive increase when congestion window is large
■ Binary search increase when window is small


Tuning

■ Default TCP parameters are not big enough
■ Need bigger send and receive windows
■ Send window is already autosized based on RTT
■ Receive window autosizing was done in Web100


Receiver Tuning

■ Patches from John Heffner


■ sysctl net.ipv4.tcp_moderate_rcvbuf
■ Dynamic Right Sizing (DRS)
■ adjust receive window based on RTT
■ If application doesn't set window then do it for them
■ Window will grow from default to max
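
A simple version of the right-sizing rule is sketched below: aim for a buffer of twice the data that arrived during the last round trip, never shrink, and stop at the configured maximum. The factor of two follows the general Dynamic Right Sizing idea; the function and its parameters are illustrative and do not reproduce the kernel patch.

```python
def drs_receive_buffer(current: int, bytes_last_rtt: int, maximum: int) -> int:
    """Next receive buffer size in bytes under a simple right-sizing rule.

    current        -- present buffer size
    bytes_last_rtt -- data received during the most recent round trip
    maximum        -- upper limit, for example the third tcp_rmem value
    """
    wanted = 2 * bytes_last_rtt      # leave room for the sender to keep growing
    return min(max(current, wanted), maximum)

# Start at the default and let the buffer follow a growing flow.
size = 87380
for received in (20_000, 50_000, 100_000):
    size = drs_receive_buffer(size, received, maximum=174760)
    print(size)
```

The buffer stays at the default while the flow is slow and then grows until it reaches the maximum, matching the "grow from default to max" behaviour on the slide.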


Receiver auto-tuning

[Figure: throughput (Mbits/sec, 0 to 1000) versus delay (ms, 0 to 200), Default compared with Auto Tuned]

Throughput vs Delay (initial run)

[Figure: bandwidth (Mbits/sec, 0 to 800) versus delay (ms, 0 to 200) for Reno, Vegas, Westwood and BIC]

What's happening

■ NAPI
■ Driver API to allow avoiding interrupts
■ Trades off latency for overall performance
■ E1000 driver
■ Uses NAPI for transmit
Answer: Transmit ring gets full and driver flow blocks
Solution: set TxDescriptors=1000


Throughput vs Delay (rerun)

[Figure: throughput (axis labeled bits/sec, 0 to 800) versus delay (ms, 0 to 200) for Reno, Vegas, Westwood and BIC]

Performance still slow

■ Vegas and Westwood are terrible


■ Not at full link speed

■ Performance falling off with delay


Vegas trace with 100ms delay


Vegas detail


Westwood (70ms)


Westwood detail


BIC trace (100ms)


BIC detail (100ms)


How to squeeze out more performance

■ Large MTU (4k): +63%
■ LAN driver built in, not a module: up to 10%
■ Turn off timestamps: +4%
■ Bind IRQ to processor: varies


Congestion: more work

■ Vegas doesn't use available window


■ Does it underestimate bandwidth?
■ Westwood
■ Another bandwidth problem
■ BIC
■ When does it make it into binary search mode?
■ What is holding back window?
■ Netem
■ Higher resolution? Packet groups?


Break


Agenda

■ Introduction
■ TCP for muggles

■ Engineering Process

■ Problem examples

■ Network Tools

■ Wrapup


Other tools

■ Information about
■ ISP connection
■ Sockets open
■ Testing infrastructure
■ More data capture

■ Monitoring


Tools: basic

■ Network path information


■ Ping – send icmp echo
■ Measure of round trip time and loss

■ Can be blocked by firewall

■ Traceroute – probes with increasing TTL to list each hop
■ IP source routing option is usually blocked now

■ Pathcapture (pcap)
■ Bandwidth and delay measurement


Tools: Network interface

■ ifconfig
■ Basic statistics, packets sent/received/errors
■ ip -stats link
■ Alternate newer, may have more info
■ SNMP
■ Remote access to same information
■ Slightly more work


Tools: Sockets

■ Netstat
■ TCP statistics
■ Open sockets
■ Ss
■ More statistics available (rtt, etc)
■ Recvmsg
■ Application can see TCP info (cmsg)


Tools: test servers

■ SYN test
telnet syntest.psc.edu 7960
■ TCP bandwidth
http://www.epm.ornl.gov/~dunigan/java/misc/tcpbw.html
http://dslreports.com
■ ANL network config
http://miranda.ctd.anl.gov:7123
■ Path MTU
http://www.ncne.org/jumbogram/mtu_discovery.php

Tools: testing

■ Ttcp
■ Basic send/receive throughput
■ Iperf
■ Longer running tests and turnaround
■ Netperf
■ Includes cpu and other statistics
■ Dbs
■ Multiclient testing


Tools: monitoring

■ Ntop
■ Measure of network activity by service
■ Nice web interface
■ Mailgraph
■ Long term mail statistics
■ Web server activity log analysis


Tools: data capture

■ Tcpdump
■ Filter packets by protocol, address, etc
■ Decodes many protocols
■ Ethereal
■ GUI interface
■ RMON
■ Remote monitoring
■ Kismet
■ Wireless activity


Tools: generators

■ Pktgen
■ Kernel level packet generation
■ Can generate maximum hardware packet rate
■ Network packet generator
■ Application level


Tools: simulation

■ Ns
■ Describe overall system
■ Event based simulation
■ Used for protocol analysis
■ SSFnet
■ More detailed models of real hardware


Tools: client simulator

■ Web
■ SPECweb, Apache (ab), httpload
■ NFS
■ Nfsstone
■ FTP
■ Dkftpbench


Conclusion

■ Data capture can provide clues about:


■ Application problems
■ Device problems
■ TCP/IP problems
■ Nothing is ever simple

