P Gentry wrote:
> Besides the memory adjustments most of the other TCP variables are
> already set to "good" values, IIRC. You might double check, but
> nothing leaps out at me as a self-evident cause for your observations.
> You could also compare 1500 MTU stats to 9000 MTU stats (especially
> the TCPExts) via:
> $ netstat -spc tcp
I don't see any TCPExts numbers listed, but a before-and-after diff
for both MTU sizes shows nothing unexpected.
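
A before/after comparison of that kind can be sketched roughly as follows. This is only an illustration in Python, not the method actually used here: it assumes the snapshots were saved as text from `netstat -s`, that section headers are unindented and counter lines are indented, and the helper names `parse_counters` and `diff_counters` are made up. The sample text is invented, not real output from these machines.

```python
import re

def parse_counters(text):
    """Map each counter description to its value from `netstat -s`-style text."""
    counters = {}
    section = ""
    for line in text.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():
            # Unindented lines such as "Tcp:" or "TcpExt:" start a section.
            section = line.strip().rstrip(":")
            continue
        match = re.search(r"\d+", line)
        if match is None:
            continue
        # Replace the first number with "N" so the key is the same in both snapshots.
        label = (line[:match.start()] + "N" + line[match.end():]).strip()
        counters[section + ": " + label] = int(match.group())
    return counters

def diff_counters(before_text, after_text):
    """Return only the counters that changed, as after-minus-before deltas."""
    before = parse_counters(before_text)
    after = parse_counters(after_text)
    return {key: value - before.get(key, 0)
            for key, value in after.items()
            if value != before.get(key, 0)}

# Invented sample snapshots (hypothetical numbers).
BEFORE = "Tcp:\n    100 segments received\n    5 segments retransmitted\nTcpExt:\n    Quick ack mode was activated 3 times\n"
AFTER = "Tcp:\n    250 segments received\n    5 segments retransmitted\nTcpExt:\n    Quick ack mode was activated 7 times\n"

for key, delta in diff_counters(BEFORE, AFTER).items():
    print(f"{delta:+d}  {key}")
```

Running it once per MTU and comparing the two deltas side by side makes it easier to spot a counter (retransmits, for instance) that moves at 9000 but not at 1500.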
> I wonder if this:
> http://kerneltrap.org/node/view/2969 kernel 2.4.27-pre1
> would help at all or if these are just "general" e1000 fixes?
I applied the e1000 diffs from the pre2 kernel and have seen no
difference in behavior.
> After reading Intel's Readme, Release Notes, and Application Note as
> well as the Napi paper I couldn't begin to guess what configuration
> changes via the driver/tools would help. The interrupt timers look
> interesting, but where do you begin?
Exactly. I contemplated writing a brute-force search script, but once
I got a feel for how many piddly little options *might* affect
performance, the projected test duration rapidly climbed into the
months.
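
To show how quickly such a sweep blows up, here is a back-of-the-envelope sketch in Python. The option names are borrowed from the e1000 driver and sysctl settings discussed in this thread, but the value grids and the five minutes per run are purely hypothetical, chosen only to illustrate the arithmetic.

```python
from math import prod

# Hypothetical value grids; none of these were actually tested.
GRID = {
    "InterruptThrottleRate": [0, 1, 4000, 8000, 16000],
    "RxIntDelay": [0, 32, 64, 128],
    "TxIntDelay": [0, 32, 64, 128],
    "RxDescriptors": [80, 256, 1024, 4096],
    "TxDescriptors": [80, 256, 1024, 4096],
    "mtu": [1500, 9000],
    "rmem_max": [131072, 1048576, 4194304],
    "wmem_max": [131072, 1048576, 4194304],
}

def sweep_cost(grid, minutes_per_run):
    """Return (number of runs, total days) for an exhaustive sweep of the grid."""
    runs = prod(len(values) for values in grid.values())
    days = runs * minutes_per_run / (60 * 24)
    return runs, days

runs, days = sweep_cost(GRID, minutes_per_run=5)
print(f"{runs} runs, about {days:.0f} days of back-to-back testing")
```

With these made-up grids the sweep is 23040 runs, or 80 days at five minutes each, and that is before repeating runs to average out noise.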
> The Readme does rather vaguely and unhelpfully acknowledge
> "Performance Degradation with Jumbo Frames" as a Known Issue. OK ...
> and? I wonder if it's more than just a memory tweak that's needed in
> your case.
I would have *assumed* that this would work out of the box, that gig
was used enough by people that these problems would have been
resolved, or at least someone would document what to change...
> Looking at how "versatile" the controller/interrupts are, I wonder if
> APIC code is not playing well with the driver -- I know RH used to
> have problems backporting APIC patches. But those problems usually
> resulted in barely or wholly non-functional hardware/features.
The machines are running Debian Woody, and I build my own reasonably
stock kernels. These machines do have some patches, and for that
reason I'm going to attempt to reboot to a truly stock 2.4.26 right
now. See below.
> The only other thing I ran across, in Intel's Open Software
> Developer's Manual, was Frame Based Flow Control that generates the
> ethernet pause frames when the nic's receive buffer is nearing full.
> It is likely part of the auto-negotiation between two e1000's -- it's
> available on a "dedicated link". See p.109 here:
> http://sourceforge.net/project/showf...ckage_id=68544
> I seem -- very hazily -- to recall having "problems" with this in the
> past that resulted in unexplained behavior (since we weren't
> looking/aware of it).
I turned off auto/rx/tx pausing and performance dropped to almost
nothing.
UPDATE:
After putting a stock 2.4.26 kernel on both machines, the performance
difference for pure TCP goes away. I can get 112MB/sec with a
1500-byte MTU and 115MB/sec with 9000, *after* raising the rmem/wmem
maximums for both core and ipv4 to 4MB, and quadrupling the
min/default values for ipv4 rmem/wmem.
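
One reading of that tuning recipe, sketched in Python so the numbers are explicit. The starting `tcp_rmem`/`tcp_wmem` triples below are assumed 2.4-era defaults, not values read from these machines (check /proc/sys/net/ipv4/tcp_rmem and tcp_wmem for the real ones), and the helper names are made up for illustration.

```python
MB = 1024 * 1024

# Assumed starting values (min, default, max); read the real ones from /proc.
ASSUMED_TCP_RMEM = (4096, 87380, 174760)
ASSUMED_TCP_WMEM = (4096, 16384, 131072)

def scaled_triple(triple, factor=4, new_max=4 * MB):
    """Multiply min and default by `factor` and replace max with `new_max`."""
    minimum, default, _old_max = triple
    return (minimum * factor, default * factor, new_max)

def sysctl_lines(tcp_rmem, tcp_wmem, new_max=4 * MB):
    """Build sysctl.conf-style lines for the core and ipv4 buffer settings."""
    rmem = scaled_triple(tcp_rmem, new_max=new_max)
    wmem = scaled_triple(tcp_wmem, new_max=new_max)
    return [
        f"net.core.rmem_max = {new_max}",
        f"net.core.wmem_max = {new_max}",
        "net.ipv4.tcp_rmem = " + " ".join(str(v) for v in rmem),
        "net.ipv4.tcp_wmem = " + " ".join(str(v) for v in wmem),
    ]

print("\n".join(sysctl_lines(ASSUMED_TCP_RMEM, ASSUMED_TCP_WMEM)))
```

The same change has to be made on both ends of the link, since the receiver's buffer limits bound the window the sender can fill.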
With that part done, it's on to NFS, where the problem comes right
back!
Previous experiments show that the following mount options are
reasonably optimal:
soft,intr,bg,tcp,async,rsize=8192,wsize=8192
With those mount options and an MTU of 1500, with the above rmem/wmem
tweaks, bonnie++ gets 47MB/sec sequential writes and 68MB/sec
sequential reads over NFS. Run directly on the local machine, bonnie++
gets 65MB/sec and 90MB/sec respectively, as it's on a 4-disk RAID-10
array. Switching the MTU to 9000 drops the NFS numbers to 33MB/sec and
45MB/sec. The really bizarre part is that doing so also *triples* the
processor load while reading (but not writing!).
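
For scale, the relative sizes of those numbers work out as below. This is a small Python sketch using only the bonnie++ figures quoted above; the dictionary layout and function names are just for illustration.

```python
# Sequential throughput in MB/sec, as quoted in this post.
RESULTS = {
    "local":    {"write": 65, "read": 90},
    "nfs_1500": {"write": 47, "read": 68},
    "nfs_9000": {"write": 33, "read": 45},
}

def percent_of(value, baseline):
    """Express `value` as a percentage of `baseline`."""
    return value / baseline * 100

def percent_change(old, new):
    """Signed percentage change going from `old` to `new`."""
    return (new - old) / old * 100

for op in ("write", "read"):
    vs_local = percent_of(RESULTS["nfs_1500"][op], RESULTS["local"][op])
    jumbo = percent_change(RESULTS["nfs_1500"][op], RESULTS["nfs_9000"][op])
    print(f"{op}: NFS at 1500 is {vs_local:.1f}% of local; "
          f"going to 9000 changes it by {jumbo:.1f}%")
```

In other words, NFS at MTU 1500 reaches roughly 72% (write) and 76% (read) of local disk speed, and jumbo frames cost a further 30% and 34% respectively.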
I've done some packet traces and found some rather odd patterns. If
you look at http://pdxcolo.net/~omega/misc/jumbo/ you'll see the
following files:

  daniel-1500.pcap.bz2   - pre-stock-2.4.26 ttcps
  daniel-9000.pcap.bz2
  rtt-1500.png           - screengrab of RTT from daniel-*.pcap
  rtt-9000.png
  throughput-1500.png    - screengrab of "throughput" from daniel-*.pcap
  throughput-9000.png
  nfs-write1500.pcap.bz2 - trace of a 256MB zerofile written on *stock* 2.4.26
  nfs-write9000.pcap.bz2
  nfs-write1500.png      - screengrab of "throughput" from nfs-write*.pcap
  nfs-write9000.png

You'll see that there's a *very* different pattern going on depending
on the MTU, both before and after switching to pure stock kernels. I
haven't done a TTCP trace since switching, but I plan to, to see
whether it retains the odd pattern or shows an entirely new (third)
pattern now that TCP-only throughput is finally normal.
My business partner is at Interop this week. I'll forward this to him
and remind him that he was going to try to corner someone from NetApp
and dig into a few NFS-related things ;-)
- Omega
aka Erik Walthinsen
http://pdxcolo.net/