Performance
Stephen Hemminger
Sr. Staff Engineer
Linux Kongress 2004
2004-09-09
Copyright 2004 OSDL, All rights reserved.
Agenda
■ Introduction
■ TCP for muggles
■ Engineering Process
■ Problem examples
■ Network Tools
■ Wrapup
Outside of scope
My Background
Limits of my knowledge
extensively
■ Involved in development of Linux, not deployment or research
TCP for “muggles”
■ connection establishment
■ slow start
■ windows
■ congestion control
■ silly window
Connection establishment
Client                          Server
connect   -- SYN -->
          <-- SYN + ACK --
                                accept
write     -- Data 1 (10) -->
          <-- Ack 11 --
                                read
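The connect / accept / write / read calls in the diagram can be tried on a loopback socket. This is a minimal sketch, not part of the original talk: the function names, the 5 second timeout and the 10-byte payload are assumptions chosen for illustration. The kernel performs the SYN, SYN+ACK, ACK exchange inside `connect` and `accept`.

```python
import socket
import threading


def serve_once(server_sock, received):
    # accept() returns after the kernel has finished the handshake
    conn, _addr = server_sock.accept()
    with conn:
        chunks = []
        while True:
            data = conn.recv(1024)  # read: returns b"" once the client closes
            if not data:
                break
            chunks.append(data)
        received.append(b"".join(chunks))


def run_exchange(payload=b"0123456789"):
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.settimeout(5.0)           # avoid blocking forever if connect fails
    server.bind(("127.0.0.1", 0))    # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    received = []
    worker = threading.Thread(target=serve_once, args=(server, received), daemon=True)
    worker.start()

    client = socket.create_connection(("127.0.0.1", port), timeout=5.0)  # connect
    client.sendall(payload)          # write: 10 bytes of data
    client.close()

    worker.join(timeout=5.0)
    server.close()
    return received[0]
```

Running a packet capture on the loopback interface while calling `run_exchange()` should show the same sequence of segments as the diagram.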
ethereal
tcpdump trace
Flow control
          <-- Ack 1010 (5000) --
write     -- Data 1011 (1400) -->
          -- Data 2411 (1400) -->
          -- Data 3811 (1400) -->
          -- Data 5211 (800) -->
          <-- Ack 6010 (0) --
                                read (1000)
          <-- Ack 6010 (1000) --
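The segment sizes in the diagram (1400, 1400, 1400, 800) follow from a sender that never puts more bytes in flight than the receiver has advertised. The sketch below reproduces that arithmetic; `fill_window` is a hypothetical helper, and the 8000-byte write is an assumed value, not something shown on the slide.

```python
def fill_window(next_seq, pending_bytes, window, mss=1400):
    """Split pending data into segments until the advertised window is used up.

    Returns the list of (sequence number, length) pairs sent and the
    window space left over.
    """
    segments = []
    while pending_bytes > 0 and window > 0:
        # A segment is limited by the MSS, the remaining window and the data left
        length = min(mss, window, pending_bytes)
        segments.append((next_seq, length))
        next_seq += length
        window -= length
        pending_bytes -= length
    return segments, window


# Assumed 8000-byte write against the 5000-byte window from the diagram
segments, remaining = fill_window(next_seq=1011, pending_bytes=8000, window=5000)
```

With these inputs the sender stops after four segments with a zero window, and can only continue once the receiver reads data and advertises space again.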
Retransmission
write     -- Data 1 -->
          <-- Ack 1 --
          <-- Ack 1 --
          -- Data 2 -->
Multiple acks = fast retransmit
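The "multiple acks" rule can be sketched as a duplicate-ack counter. The slide does not give a threshold; three duplicates is the conventional value in standard TCP and is used here as an assumption. The function name is hypothetical.

```python
def fast_retransmit_points(acks, threshold=3):
    """Return the ack numbers at which a fast retransmit would be triggered.

    `acks` is the sequence of ack numbers seen by the sender. The first
    occurrence of a value is not a duplicate; each repeat after it is.
    """
    triggers = []
    last_ack = None
    dup_count = 0
    for ack in acks:
        if ack == last_ack:
            dup_count += 1
            if dup_count == threshold:
                # Assumed rule: retransmit once, when the threshold is reached
                triggers.append(ack)
        else:
            last_ack = ack
            dup_count = 0
    return triggers
```

For example, the ack stream 1, 1, 1, 1, 2 contains three duplicates of ack 1 and would trigger one retransmission, while 1, 1, 2 would not.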
Tcptrace
http://tcptrace.org
Tool to convert captured data into graphs
■ Time sequence graph
■ Throughput
■ RTT
Xplot
http://xplot.org
■ Takes plot command scripts
■ Mouse
Time Sequence Graph
Windows & Buffering
Congestion window
■ slow start
■ Window normally starts small
■ Grows in response to ack
■ congestion control
■ Packet loss = congestion
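A toy model of the two phases, one step per round trip, may make the bullets concrete. This is a simplified Reno-style sketch under stated assumptions: the window is counted in segments, the initial threshold of 16 is arbitrary, and a loss simply halves the window. Real stacks are considerably more involved.

```python
def simulate_cwnd(rtts, loss_at=(), initial_cwnd=1, ssthresh=16):
    """Return the congestion window (in segments) at the start of each RTT."""
    cwnd = initial_cwnd
    history = []
    for rtt in range(rtts):
        history.append(cwnd)
        if rtt in loss_at:
            # Packet loss is taken as a sign of congestion: halve the window
            ssthresh = max(cwnd // 2, 2)
            cwnd = ssthresh
        elif cwnd < ssthresh:
            # Slow start: the window doubles every round trip
            cwnd = min(cwnd * 2, ssthresh)
        else:
            # Congestion avoidance: grow by one segment per round trip
            cwnd += 1
    return history
```

Calling `simulate_cwnd(8, loss_at={6})` shows the window starting small, doubling up to the threshold, growing linearly, and dropping after the loss.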
Silly Window
write 8k bytes
[Diagram labels: Ack [10]; Data (2000); "OK, thanks"]
Model of TCP networks
Sender (Send Window)  -- Data -->  Network  -->  Receiver (Receive Window)
Sender (Send Window)  <-- Ack --   Network  <--  Receiver (Receive Window)
BDP = Bandwidth (bytes/sec) * Delay (secs)
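As a worked example of the BDP formula, the sketch below uses integer arithmetic with link speed in bits per second and round-trip delay in milliseconds. The function name and the sample link values are assumptions for illustration.

```python
def bdp_bytes(bandwidth_bits_per_sec, delay_ms):
    """Bandwidth delay product in bytes.

    bits/sec divided by 8 gives bytes/sec; milliseconds divided by 1000
    gives seconds, hence the combined divisor of 8000.
    """
    return bandwidth_bits_per_sec * delay_ms // 8000


# Assumed example links
fast_long_path = bdp_bytes(100_000_000, 100)   # 100 Mbit/s, 100 ms
lan_path = bdp_bytes(1_000_000_000, 1)         # 1 Gbit/s, 1 ms
```

The first link needs about 1.25 MB in flight to stay full, far more than a 64K window allows, while the second needs only 125 KB.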
BDP - Bandwidth Delay Product
Bandwidth Delay Product (BDP)
[Chart: Bandwidth (Mbits/sec, 0.1 to 1000) against Delay (ms, 0.1 to 1000); window-size lines for 8K, 64K and 1M; regions labeled LAN, Research and Broadband]
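The window-size lines on the chart can be read the other way around: a fixed window caps throughput at one window per round trip. The sketch below computes that ceiling. Treating 64K as 65536 bytes is an assumption, and the function name is hypothetical.

```python
def max_throughput_mbits(window_bytes, delay_ms):
    """Upper bound on throughput when at most one window is sent per round trip."""
    bits_per_window = window_bytes * 8
    # bits per millisecond equals kilobits per second; divide by 1000 for Mbits/sec
    return bits_per_window / (delay_ms * 1000)


# Assumed example: a 64K window over a 100 ms path
ceiling = max_throughput_mbits(65536, 100)
```

Under these assumptions a 64K window over a 100 ms path cannot exceed roughly 5 Mbits/sec, whatever the link speed.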
Internet
■ Router queues
■ Delays
Extensions for larger windows
TCP options negotiation 1
Window scale by 4
IP 172.20.1.60.32820 > 216.239.39.99.http: S 3599527174:3599527174(0) win 5840
<mss 1460,sackOK,timestamp 2519711 0,nop,wscale 2>
IP 216.239.39.99.http > 172.20.1.60.32820: S 3820474812:3820474812(0) ack 3599527175
win 8190 <mss 1460>
IP 172.20.1.60.32820 > 216.239.39.99.http: . ack 1 win 5840
IP 172.20.1.60.32820 > 216.239.39.99.http: P 1:126(125) ack 1 win 5840
TCP options negotiation 2
Window scale by 4
IP 172.20.1.60.32823 > 65.172.181.13.http: S 4120108902:4120108902(0) win 5840
<mss 1460,sackOK,timestamp 3036627 0,nop,wscale 2>
IP 65.172.181.13.http > 172.20.1.60.32823: S 2295773021:2295773021(0) ack 4120108903
win 5792
<mss 1460,sackOK,timestamp 1818411318 3036627,nop,wscale 0>
IP 172.20.1.60.32823 > 65.172.181.13.http: . ack 1 win 1460 <nop,nop,timestamp
3036628 1818411318>
IP 172.20.1.60.32823 > 65.172.181.13.http: P 1:144(143) ack 1 win 1460
<nop,nop,timestamp 3036628 1818411318>
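The two traces differ in whether the server answers with a wscale option. Under the standard window scaling rules, scaling applies only when both SYN segments carry the option, and the window field in a SYN itself is never scaled. The sketch below applies those rules to the numbers in the traces. The function and parameter names are hypothetical.

```python
def effective_window(raw_window, own_wscale, peer_sent_wscale, is_syn=False):
    """Window in bytes that a segment actually advertises.

    raw_window       : 16-bit window field as printed by tcpdump
    own_wscale       : shift count the sender of this segment offered
    peer_sent_wscale : True if the other side also sent a wscale option
    is_syn           : SYN segments always carry an unscaled window
    """
    if is_syn or not peer_sent_wscale:
        return raw_window
    return raw_window << own_wscale


# Trace 1: the server's SYN+ACK has no wscale, so the client's "win 5840" is literal
trace1_window = effective_window(5840, own_wscale=2, peer_sent_wscale=False)
# Trace 2: both sides sent wscale, so the client's "win 1460" means 1460 * 4
trace2_window = effective_window(1460, own_wscale=2, peer_sent_wscale=True)
```

Both connections therefore advertise the same 5840 bytes after the handshake, which matches the "window scale by 4" (2^2) in the slide titles.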
Linux TCP window tuning
But!
Break
Performance Engineering process
■ If successful
Goal setting
TCP performance testing
Testing TCP over WAN
Ethernet
Existing network emulation tools
■ Dummynet
http://info.iet.unipi.it/~luigi/ip_dummynet/
I don't want to set up a separate FreeBSD machine
■ NISTnet
http://snad.ncsl.nist.gov/itg/nistnet/
Only on 2.4, and not ready to be in the main tree
Netem
[Diagram: protocol stack, top to bottom: TCP, IP, netem, Ethernet (eth0)]
http://developer.osdl.org/shemminger/netem
■ Started out as a simple delay-only hack
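For readers who want to try the delay feature, netem is configured through the `tc` command from iproute2. The sketch below only builds the argument list and does not run it, since applying a qdisc needs root privileges. The helper name is hypothetical, and the syntax shown is the form accepted by current iproute2, which may differ from the 2004 version described in the talk.

```python
def netem_delay_command(device, delay_ms):
    """Build the tc invocation that attaches netem with a fixed delay to a device."""
    return [
        "tc", "qdisc", "add",
        "dev", device,
        "root", "netem",
        "delay", f"{delay_ms}ms",
    ]


# Assumed example: add 100 ms of delay on eth0, the device shown in the diagram
command = netem_delay_command("eth0", 100)
```

Passing `command` to `subprocess.run` as root would apply the delay to outgoing packets on that device.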
Current TCP research
TCP Reno
TCP Vegas
TCP Westwood
Binary Increase Congestion Control (BIC)
■ Work by Lisong Xu
■ Patches for Web100 (2.4)
■ sysctl net.ipv4.tcp_bic
■ Designed for high speed networks
■ Modification of Reno
■ Additive increase when window is large
■ Binary search increase when window is small
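The binary search idea can be sketched as follows: after a loss the window at which the loss occurred is remembered as `w_max`, and each round trip the window jumps halfway toward it, with the jump capped so that growth is at most linear. This is a simplified illustration, not the kernel code. The cap of 32 segments, the floor of 1 segment and the handling of windows at or above `w_max` are assumptions.

```python
def bic_next_window(cwnd, w_max, s_max=32, s_min=1):
    """One round trip of simplified BIC-style window growth (in segments)."""
    if cwnd >= w_max:
        # Past the old maximum: probe slowly (placeholder for BIC's max probing)
        return cwnd + s_min
    step = (w_max - cwnd) / 2      # binary search: aim for the midpoint
    if step > s_max:
        step = s_max               # far from the target: capped, linear growth
    elif step < s_min:
        step = s_min               # very close: keep a minimum step
    return cwnd + step
```

Far from the target the window grows by the fixed cap each round trip; close to it the steps shrink as the window converges on `w_max`.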
Tuning
Receiver Tuning
Receiver auto-tuning
[Chart: Throughput (Mbits/sec, 0 to 1000) against Delay (ms, 0 to 200); series: Default, Auto Tuned]
Throughput vs Delay (initial run)
[Chart: Bandwidth (Mbits/sec, 0 to 800) against Delay (ms, 0 to 200); series: Reno, Vegas, Westwood, BIC]
What's happening
■ NAPI
■ Driver API to allow avoiding interrupts
■ Trades off latency for overall performance
■ E1000 driver
■ Uses NAPI for transmit
Answer: Transmit ring gets full and driver flow blocks
Solution: set TxDescriptors=1000
Throughput vs Delay (rerun)
[Chart: Throughput (axis labeled bits/sec, 0 to 800) against Delay (ms, 0 to 200); series: Reno, Vegas, Westwood, BIC]
Performance still slow
Vegas trace with 100ms delay
Vegas detail
Westwood (70ms)
Westwood detail
BIC trace (100ms)
BIC detail (100ms)
How to squeeze out more performance
Congestion more work
Break
Other tools
■ Information about
■ ISP connection
■ Sockets open
■ Testing infrastructure
■ More data capture
■ Monitoring
Tools: basic
■ Pathcapture (pcap)
■ Bandwidth and delay measurement
Tools: Network interface
■ ifconfig
■ Basic statistics, packets sent/received/errors
■ ip -stats link
■ Newer alternative, may have more info
■ SNMP
■ Remote access to same information
■ Slightly more work
Tools: Sockets
■ Netstat
■ TCP statistics
■ Open sockets
■ Ss
■ More statistics available (rtt, etc)
■ Recvmsg
■ Application can see TCP info (cmsg)
Tools: test servers
■ SYN test
telnet syntest.psc.edu 7960
■ TCP bandwidth
http://www.epm.ornl.gov/~dunigan/java/misc/tcpbw.html
http://dslreports.com
■ ANL network config
http://miranda.ctd.anl.gov:7123
■ Path MTU
http://www.ncne.org/jumbogram/mtu_discovery.php
Tools: testing
■ Ttcp
■ Basic send/receive throughput
■ Iperf
■ Longer running tests and turnaround
■ Netperf
■ Includes cpu and other statistics
■ Dbs
■ Multiclient testing
Tools: monitoring
■ Ntop
■ Measure of network activity by service
■ Nice web interface
■ Mailgraph
■ Long term mail statistics
■ Web server activity log analysis
Tools: data capture
■ Tcpdump
■ Filter packets by protocol, address, etc
■ Decode many protocols
■ Ethereal
■ GUI interface
■ RMON
■ Remote monitoring
■ Kismet
■ Wireless activity
Tools: generators
■ Pktgen
■ Kernel level packet generation
■ Can generate maximum hardware packet rate
■ Network packet generator
■ Application level
Tools: simulation
■ Ns
■ Describe overall system
■ Event based simulation
■ Used for protocol analysis
■ SSFnet
■ More detailed models of real hardware
Tools: client simulator
■ Web
■ SPECweb, Apache (ab), httpload
■ NFS
■ Nfsstone
■ FTP
■ Dkftpbench
Conclusion