I'm working on an application that we're trying to switch from a 2.4
kernel to a 2.6 kernel. (We might currently be using 2.6.9; I'm not
completely sure.) One part of the program periodically sends out
chunks of data, each just over 1MB, over TCP.
Frequently, alas, these chunks aren't arriving in a timely fashion.
After instrumenting the code and running tcpdump, this is what we see:
1) The sender uses sendmsg() to send all the data. (In chunks of a
little less than 1.5K, in case it matters.)
2) Most of the data arrives in a timely fashion. There are a few
dropped packets that have to get retransmitted; no big deal. (I
assume this step overlaps somewhat with step 1; also, sometimes all
the data makes it, so we don't progress to step 3.)
3) Occasionally, at some point, the transmission slows way down: the
sender sends out small bits of data (1 or 2 Ethernet frames at a time; I
can't remember which) spaced 200ms apart, each marked with PUSH.
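
To make step 1 concrete, here is a minimal Python sketch of that send
pattern. It is not the application's real code: the function name
send_in_chunks and the 1400-byte chunk size are made up for illustration
(1400 just stands in for "a little less than 1.5K").

```python
import socket

# Assumed value standing in for "a little less than 1.5K".
CHUNK_SIZE = 1400


def send_in_chunks(sock, data, chunk_size=CHUNK_SIZE):
    """Send `data` using one sendmsg() call per chunk.

    Returns the number of sendmsg() calls made. A call may accept fewer
    bytes than requested, so the offset advances by what was actually sent.
    """
    view = memoryview(data)
    offset = 0
    calls = 0
    while offset < len(view):
        chunk = view[offset:offset + chunk_size]
        sent = sock.sendmsg([chunk])  # hands the chunk to the kernel
        offset += sent
        calls += 1
    return calls
```

Once this returns, every byte has been handed to the kernel's socket
buffer, which is the state the sender is in by the time step 3 shows up.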
I don't understand why they'd all be marked with PUSH: by this time, all
the sendmsg() calls have returned, so the sender's kernel should have
all the data, so there should be only one transmission marked with
PUSH. But I'm seeing lots of them. I wouldn't mind that so much,
but the 200ms gaps are killing us.
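
In case it helps anyone reproduce this, the gaps can be picked out of a
capture mechanically. The sketch below assumes the packet timestamps have
already been extracted from the tcpdump output as floats in seconds; the
function name find_stalls and the 0.15s threshold are my own choices, and
the sample timestamps are invented, not taken from our trace.

```python
def find_stalls(timestamps, min_gap=0.15):
    """Return (index, gap) for each packet that follows a long pause.

    `timestamps` are packet send times in seconds, in capture order.
    `min_gap` is an assumed threshold, set a bit under the 200ms gaps.
    """
    stalls = []
    for i in range(1, len(timestamps)):
        gap = timestamps[i] - timestamps[i - 1]
        if gap >= min_gap:
            stalls.append((i, gap))
    return stalls


# Invented timestamps: three back-to-back packets, then two slow ones.
sample = [0.000, 0.001, 0.002, 0.202, 0.402]
for index, gap in find_stalls(sample):
    print(f"packet {index} sent {gap * 1000:.0f}ms after the previous one")
```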
Does this ring any bells? This combination of 200ms gaps and PUSH flags
seems very odd, so I'm hoping that somebody has seen a misconfiguration
or kernel bug that causes these particular symptoms.
Thanks for any suggestions that anybody has.
David Carlton