Dan Stromberg wrote:
> Hi folks.
>
> I'm trying to get an AIX 5.1 ML 4 box talking fast over a gigabit network
> to a variety of linux hosts. The linux hosts run Fedora, Redhat
> Enterprise and CentOS.
>
> I'm finding that -some- of the time, the communications with this AIX box
> to various linux boxes get TCP Window Scaling enabled (RFC1323), but other
> times, they do not, and there isn't an amazingly clear pattern in the
> differences yet other than they always involved the AIX box so far.
>
> This, despite ostensibly having RFC1323 and SACKS enabled on either end of
> the communication.
>
> I have a more complete writeup of what we're seeing at:
>
> http://dcs.nac.uci.edu/~strombrg/optiputer-discussion/
>
> The link about TCP windows on the above page is probably the most relevant.
>
> I'm guessing this might be an interoperability problem that linux somehow
> fixed and worked around, since Fedora 4 doesn't have the problem, and RH 4
> didn't seem to either, but RH 3 and CentOS do.
>
> Or is it possible that some router or switch between the two machines is
> messing up the RFC1323 Sometimes?
>
> Any suggestions?
>
> Thanks!
>
> PS: TCP Window Scaling allows your TCP windows to scale past 64K, as was
> the max in earlier TCP due to a 16 bit field. A TCP Window, in turn, is a
> buffer that holds unacknowledged packets, in case they need to be
> retransmitted due to not being acknowledged soon enough/ever.
Well, without knowing the setup -- the TCP variables set -- or the
hardware encountered along the path, even guessing is pretty much a
pointless pastime.
It seems that:
-- the netstat stats cover an extended period (longer than the
tcpdump), so which stats belong to which traffic is, well, difficult to
discern :-)
-- tcptrace does not seem to agree with the netstat stats re: window
scaling, etc.
-- your email chart does not agree with the tcptrace output -- I presume
the chart data was collected from other traces
The tcpdump file can be processed through ethereal to _very_ good
effect: you can watch the connection setup (where window scaling is
negotiated), and it has a _very_ good filtering system for picking out
particular packets (e.g. watching for tcp options, cwnd, ECN, etc.)
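If you would rather script the check than click through a GUI, here is a
rough sketch of what "watching the connection setup" boils down to. It
walks the options area of a TCP header and reports whether the window
scale, SACK-permitted and timestamp options are present. The function
names and the sample bytes are made up for illustration; the option kinds
(3, 4 and 8) are the standard ones. Getting the option bytes out of a
capture file is left out here.

```python
def parse_tcp_options(options: bytes) -> dict:
    """Walk the options area of a TCP header (the bytes after the fixed
    20-byte header) and pick out window scale, SACK-permitted, timestamps."""
    found = {"window_scale": None, "sack_permitted": False, "timestamps": False}
    i = 0
    while i < len(options):
        kind = options[i]
        if kind == 0:                 # end of option list
            break
        if kind == 1:                 # NOP, a single padding byte
            i += 1
            continue
        if i + 1 >= len(options):     # truncated option, stop parsing
            break
        length = options[i + 1]
        if length < 2 or i + length > len(options):
            break                     # malformed length, stop parsing
        if kind == 3 and length == 3:         # window scale: 1-byte shift count
            found["window_scale"] = options[i + 2]
        elif kind == 4 and length == 2:       # SACK permitted
            found["sack_permitted"] = True
        elif kind == 8 and length == 10:      # timestamps
            found["timestamps"] = True
        i += length
    return found


def effective_window(raw_window: int, shift: int) -> int:
    """Scale the 16-bit window field from a non-SYN segment by the shift count."""
    return raw_window << shift


if __name__ == "__main__":
    # Hypothetical SYN options: MSS 1460, SACK permitted, timestamps, NOP,
    # window scale with shift count 7.
    sample = bytes.fromhex("020405b4" "0402" "080a0000000100000000" "01" "030307")
    print(parse_tcp_options(sample))
    print(effective_window(65535, 7))
```

Keep in mind that scaling is only in effect when _both_ the SYN and the
SYN-ACK carry the window scale option, so you need to look at both
packets. If one side's capture shows the option leaving and the other
side's capture shows it missing, suspect something in the path.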
I have not had a chance to get RHEL/CentOS on my system yet, but did
reconfirm that some of this code in the stack has been changing. Thus
different kernel versions _may_ behave differently. BTW, CentOS makes
no changes to the functional RHEL source code: just trademark
cleaning.
As Allen McIntosh said, you will need to at least _look_ at the tcp
variables set in the /proc fs. If you can't get to the box, ask
someone to send you the info. Without it you're pretty much in the
dark. You may have to tweak the values (or have them changed) to
achieve acceptable GigE performance, though it's much better now than
several years ago.
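To make collecting that info less painful, something like the following
could be handed to whoever has access to the Linux boxes. It is only a
sketch: the list of settings is my own pick of the ones most relevant to
window scaling and buffer sizes, and the script assumes a Python with
pathlib is installed. It reads the values and changes nothing. It does
not apply to the AIX side, where the tunables are handled by the `no`
command.

```python
from pathlib import Path

# Settings most relevant to window scaling and buffer sizing on Linux.
# The selection is an assumption on my part, not an exhaustive list.
TCP_SETTINGS = [
    "net/ipv4/tcp_window_scaling",
    "net/ipv4/tcp_sack",
    "net/ipv4/tcp_timestamps",
    "net/ipv4/tcp_rmem",
    "net/ipv4/tcp_wmem",
    "net/core/rmem_max",
    "net/core/wmem_max",
]


def read_tcp_settings(root="/proc/sys"):
    """Return {setting name: list of values, or None if unreadable}."""
    values = {}
    for name in TCP_SETTINGS:
        try:
            values[name] = (Path(root) / name).read_text().split()
        except OSError:
            # Missing on this kernel, or no permission to read it.
            values[name] = None
    return values


if __name__ == "__main__":
    for name, value in read_tcp_settings().items():
        shown = " ".join(value) if value is not None else "(not available)"
        print(f"{name.replace('/', '.')} = {shown}")
```

Run it on each Linux host and diff the output between a host that
negotiates window scaling and one that does not.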
Also note that at GigE speeds, hardware along the path can make a _big_
difference. That _very_ much includes NICs and (especially) the NIC
driver code.
In summary, without the same OS/kernel, OS settings, hardware, and
pathway you are solving for way too many simultaneous variables if you
don't have a good and complete set of data -- both the measured kind,
like ping and traceroute, and the OS tcp settings. Tcpdump is _really_
valuable to distinguish between host problems and pathway problems.
Here are some good links:
TCP Variables
http://ipsysctl-tutorial.frozentux.net/
/usr/share/doc/kernel-doc-2.[-X-]/networking/ip-sysctl.txt
/usr/src/linux-2.[-X-]/Documentation/networking/ip-sysctl.txt
TCP Perf Links
http://www.psc.edu/networking/projects/tcptune/
http://www-didc.lbl.gov/TCP-tuning/TCP-tuning.html
http://www-didc.lbl.gov/TCP-tuning/linux.html
http://www.uninett.no/tcpperf/
http://www.infosyssec.net/infosyssec/netprot1.htm
http://ltp.sourceforge.net/tooltable.php
http://www.csm.ornl.gov/~dunigan/netperf/netlinks.html
http://www.web100.org/
Kernel Networking Notes: (as far back as 7/8/04)
NAPI performance
http://lwn.net/Articles/139208/
Pluggable congestion avoidance modules
http://lwn.net/Articles/128062/
TCP window scaling and broken routers
http://lwn.net/Articles/91976/
hth,
prg