Linux TCP - unexpected retransmissions

 
 
Francois

 
      05-28-2007, 05:05 PM
This may not be the proper newsgroup but any help would be greatly
appreciated.

We are working on an embedded system that has a number of PowerQUICC
processors running Linux. During normal operation, the processors exchange
small messages (< 100 bytes) using TCP. We have a response time
requirement of about 100 milliseconds, and we observed that messages
between nodes sometimes take a long time to arrive (e.g., > 200
milliseconds across an Ethernet link), so the response time exceeds our
requirement. This latency occurs randomly at different places and on
different interface types. We set the TCP_NODELAY socket option and tried
different settings (proc file ipv4 options) and test programs to isolate
the root cause of the latency, with no success.
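
The socket option referred to here is presumably TCP_NODELAY, set at the IPPROTO_TCP level. As a minimal sketch (the helper names are illustrative and not taken from the test application), this is one way to set it and read it back to confirm it took effect on the sending socket:

```python
import socket


def make_nodelay_socket() -> socket.socket:
    """Create a TCP socket with Nagle's algorithm disabled."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


def nodelay_enabled(sock: socket.socket) -> bool:
    """Read the option back; a non-zero value means it is set."""
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
```

TCP_NODELAY only disables Nagle coalescing on the sender. It does not change the peer's delayed-ACK behaviour or the retransmission timer, so on its own it would not be expected to remove stalls of 200 ms or more.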

We can reproduce the latency using a small application where two
PowerQUICC cards randomly send each other bursts of messages across an
Ethernet link. For this test, we are using the 2.6.16 kernel. We used a
sniffer to capture data across the Ethernet link and found that
sometimes when both TCPs send each other messages at about the same
time (segments 5 and 6 below), for unknown reasons, the second TCP does
not ack the message from the first TCP and a transmission occurs
(segment 8). We also observed that retransmissions sometimes occur
when one TCP is busy transmitting many messages (segment 38 contains
many application messages) while a message is being sent to it; again,
for unknown reasons, that TCP does not ack the message, thus forcing a
retransmission (segment 40).

netstat reports TCP segments being retransmitted but no errors at the
interface level. We have no reason to believe that segments are
dropped at the physical layer. We suspect that segments are dropped at
the TCP layer, but we don't know why or where. Any ideas?
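
On Linux, the retransmission count that netstat -s prints comes from the RetransSegs counter on the Tcp lines of /proc/net/snmp. A small sketch, assuming that /proc layout, that reads the counter directly so it can be sampled before and after a test run (the function names are made up for illustration):

```python
def parse_snmp(text: str) -> dict:
    """Parse /proc/net/snmp-style text into {protocol: {counter: value}}."""
    tables = {}
    rows = [line.split() for line in text.splitlines() if line.strip()]
    # The file comes in pairs: a line of counter names, then a line of values.
    for names, values in zip(rows[0::2], rows[1::2]):
        proto = names[0].rstrip(":")
        tables[proto] = {n: int(v) for n, v in zip(names[1:], values[1:])}
    return tables


def tcp_retrans_segs(path: str = "/proc/net/snmp") -> int:
    """Return the cumulative number of retransmitted TCP segments."""
    with open(path) as f:
        return parse_snmp(f.read())["Tcp"]["RetransSegs"]
```

Calling tcp_retrans_segs() on each card before and after a burst shows which side is actually retransmitting, independently of the sniffer.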

Thanks
Francois

Here is the trace with relative sequence numbers where we capture
three instances of a retransmission.
1 0.000000 172.118.100.102 172.118.100.101 TCP 4124 > 9000 [PSH, ACK] Seq=0 Ack=0 Win=9902 Len=84 TSV=15025917 TSER=16502810
2 0.039817 172.118.100.101 172.118.100.102 TCP 9000 > 4124 [ACK] Seq=0 Ack=84 Win=2896 Len=0 TSV=16502926 TSER=15025917
3 0.080062 172.118.100.101 172.118.100.102 TCP 9000 > 4124 [PSH, ACK] Seq=0 Ack=84 Win=2896 Len=8 TSV=16502936 TSER=15025917
4 0.080103 172.118.100.102 172.118.100.101 TCP 4124 > 9000 [ACK] Seq=84 Ack=8 Win=9902 Len=0 TSV=15025937 TSER=16502936
5 0.583935 172.118.100.102 172.118.100.101 TCP 4124 > 9000 [PSH, ACK] Seq=84 Ack=8 Win=9902 Len=8 TSV=15026063 TSER=16502936
6 0.583940 172.118.100.101 172.118.100.102 TCP 9000 > 4124 [PSH, ACK] Seq=8 Ack=84 Win=2896 Len=8 TSV=16503062 TSER=15025937
7 0.583985 172.118.100.102 172.118.100.101 TCP 4124 > 9000 [ACK] Seq=92 Ack=16 Win=9902 Len=0 TSV=15026063 TSER=16503062
8 0.795861 172.118.100.102 172.118.100.101 TCP [TCP Retransmission] 4124 > 9000 [PSH, ACK] Seq=84 Ack=16 Win=9902 Len=8 TSV=15026116 TSER=16503062
9 0.796059 172.118.100.101 172.118.100.102 TCP 9000 > 4124 [ACK] Seq=16 Ack=92 Win=2896 Len=0 TSV=16503115 TSER=15026116
10 0.797151 172.118.100.101 172.118.100.102 TCP 9000 > 4124 [PSH, ACK] Seq=16 Ack=92 Win=2896 Len=8 TSV=16503115 TSER=15026116
11 0.797194 172.118.100.102 172.118.100.101 TCP 4124 > 9000 [ACK] Seq=92 Ack=24 Win=9902 Len=0 TSV=15026116 TSER=16503115
12 1.088260 172.118.100.102 172.118.100.101 TCP 4124 > 9000 [PSH, ACK] Seq=92 Ack=24 Win=9902 Len=8 TSV=15026189 TSER=16503115

16 6.127280 172.118.100.102 172.118.100.101 TCP 4124 > 9000 [PSH, ACK] Seq=324 Ack=2656 Win=9902 Len=8 TSV=15027449 TSER=16504322
17 6.127289 172.118.100.101 172.118.100.102 TCP 9000 > 4124 [PSH, ACK] Seq=2656 Ack=324 Win=2896 Len=8 TSV=16504448 TSER=15027323
18 6.127334 172.118.100.102 172.118.100.101 TCP 4124 > 9000 [ACK] Seq=332 Ack=2664 Win=9902 Len=0 TSV=15027449 TSER=16504448
19 6.127865 172.118.100.101 172.118.100.102 TCP 9000 > 4124 [PSH, ACK] Seq=2664 Ack=332 Win=2896 Len=8 TSV=16504448 TSER=15027449
20 6.127907 172.118.100.102 172.118.100.101 TCP 4124 > 9000 [ACK] Seq=332 Ack=2672 Win=9902 Len=0 TSV=15027449 TSER=16504448
21 6.631221 172.118.100.102 172.118.100.101 TCP 4124 > 9000 [PSH, ACK] Seq=332 Ack=2672 Win=9902 Len=8 TSV=15027575 TSER=16504448
22 6.631226 172.118.100.101 172.118.100.102 TCP 9000 > 4124 [PSH, ACK] Seq=2672 Ack=332 Win=2896 Len=8 TSV=16504574 TSER=15027449
23 6.631260 172.118.100.102 172.118.100.101 TCP 4124 > 9000 [ACK] Seq=340 Ack=2680 Win=9902 Len=0 TSV=15027575 TSER=16504574
24 6.839618 172.118.100.102 172.118.100.101 TCP [TCP Retransmission] 4124 > 9000 [PSH, ACK] Seq=332 Ack=2680 Win=9902 Len=8 TSV=15027627 TSER=16504574
25 6.840379 172.118.100.101 172.118.100.102 TCP 9000 > 4124 [PSH, ACK] Seq=2680 Ack=340 Win=2896 Len=8 TSV=16504626 TSER=15027627
26 6.840433 172.118.100.102 172.118.100.101 TCP 4124 > 9000 [ACK] Seq=340 Ack=2688 Win=9902 Len=0 TSV=15027627 TSER=16504626
27 7.136158 172.118.100.102 172.118.100.101 TCP 4124 > 9000 [PSH, ACK] Seq=340 Ack=2688 Win=9902 Len=8 TSV=15027701 TSER=16504626
28 7.136163 172.118.100.101 172.118.100.102 TCP 9000 > 4124 [PSH, ACK] Seq=2688 Ack=348 Win=2896 Len=8 TSV=16504700 TSER=15027701
29 7.136164 172.118.100.102 172.118.100.101 TCP 4124 > 9000 [ACK] Seq=348 Ack=2696 Win=9902 Len=0 TSV=15027701 TSER=16504700

31 1106.230079 172.118.100.101 172.118.100.102 TCP 9000 > 4124 [PSH, ACK] Seq=470416 Ack=58388 Win=2896 Len=84 TSV=16779507 TSER=15302381
32 1106.230121 172.118.100.102 172.118.100.101 TCP 4124 > 9000 [ACK] Seq=58388 Ack=470500 Win=14942 Len=0 TSV=15302506 TSER=16779507
33 1106.230402 172.118.100.101 172.118.100.102 TCP 9000 > 4124 [PSH, ACK] Seq=470500 Ack=58388 Win=2896 Len=84 TSV=16779507 TSER=15302381
34 1106.230445 172.118.100.102 172.118.100.101 TCP 4124 > 9000 [ACK] Seq=58388 Ack=470584 Win=14942 Len=0 TSV=15302506 TSER=16779507
35 1106.230716 172.118.100.101 172.118.100.102 TCP 9000 > 4124 [PSH, ACK] Seq=470584 Ack=58388 Win=2896 Len=84 TSV=16779507 TSER=15302381
36 1106.230759 172.118.100.102 172.118.100.101 TCP 4124 > 9000 [ACK] Seq=58388 Ack=470668 Win=14942 Len=0 TSV=15302506 TSER=16779507
37 1106.232746 172.118.100.102 172.118.100.101 TCP 4124 > 9000 [PSH, ACK] Seq=58388 Ack=470668 Win=14942 Len=8 TSV=15302507 TSER=16779507
38 1106.232809 172.118.100.101 172.118.100.102 TCP 9000 > 4124 [PSH, ACK] Seq=470668 Ack=58388 Win=2896 Len=588 TSV=16779507 TSER=15302506
39 1106.272712 172.118.100.102 172.118.100.101 TCP 4124 > 9000 [ACK] Seq=58396 Ack=471256 Win=14942 Len=0 TSV=15302517 TSER=16779507
40 1106.440704 172.118.100.102 172.118.100.101 TCP [TCP Retransmission] 4124 > 9000 [PSH, ACK] Seq=58388 Ack=471256 Win=14942 Len=8 TSV=15302559 TSER=16779507
41 1106.443387 172.118.100.101 172.118.100.102 TCP 9000 > 4124 [PSH, ACK] Seq=471256 Ack=58396 Win=2896 Len=8 TSV=16779560 TSER=15302559
42 1106.443391 172.118.100.102 172.118.100.101 TCP 4124 > 9000 [ACK] Seq=58396 Ack=471264 Win=14942 Len=0 TSV=15302559 TSER=16779560
43 1106.736707 172.118.100.102 172.118.100.101 TCP 4124 > 9000 [PSH, ACK] Seq=58396 Ack=471264 Win=14942 Len=8 TSV=15302633 TSER=16779560
44 1106.737143 172.118.100.101 172.118.100.102 TCP 9000 > 4124 [PSH, ACK] Seq=471264 Ack=58404 Win=2896 Len=8 TSV=16779633 TSER=15302633
45 1106.737196 172.118.100.102 172.118.100.101 TCP 4124 > 9000 [ACK] Seq=58404 Ack=471272 Win=14942 Len=0 TSV=15302633 TSER=16779633
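
The three retransmissions in the trace can be timed against the segments they repeat. The sketch below hard-codes the capture timestamps of segments 5/8, 21/24 and 37/40 and computes the gap for each pair; the variable and function names are illustrative. The gaps come out at roughly 212, 208 and 208 ms. That is just over 200 ms, which happens to be the minimum retransmission timeout in the Linux TCP stack (TCP_RTO_MIN), so the timing is at least consistent with an ordinary retransmission timer firing because no ACK arrived. This is a reading of the numbers, not something the capture proves.

```python
# (original segment, its capture time, retransmission segment, its capture time)
# Timestamps are copied from the trace above, in seconds.
RETRANSMISSIONS = [
    (5, 0.583935, 8, 0.795861),
    (21, 6.631221, 24, 6.839618),
    (37, 1106.232746, 40, 1106.440704),
]


def retransmit_delays_ms(pairs):
    """Return the delay in milliseconds between each segment and its retransmission."""
    return [round((t_retx - t_orig) * 1000.0, 3) for _, t_orig, _, t_retx in pairs]


def describe(pairs):
    """Format one line per retransmission for printing."""
    delays = retransmit_delays_ms(pairs)
    return [
        f"segment {orig} retransmitted as segment {retx} after {delay:.1f} ms"
        for (orig, _, retx, _), delay in zip(pairs, delays)
    ]
```

Printing "\n".join(describe(RETRANSMISSIONS)) gives one summary line per case.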

 
Allen McIntosh

 
      05-29-2007, 02:54 AM
Francois wrote:
> We are working on an embedded system that has a number of PowerQUICC
> processors running Linux. During normal operation, the processors exchange
> small messages (< 100 bytes) using TCP. We have a response time
> requirement of about 100 milliseconds, and we observed that messages
> between nodes sometimes take a long time to arrive (e.g., > 200
> milliseconds across an Ethernet link), so the response time exceeds our
> requirement. This latency occurs randomly at different places and on
> different interface types. We set the TCP_NODELAY socket option and tried
> different settings (proc file ipv4 options) and test programs to isolate
> the root cause of the latency, with no success.
>
> We can reproduce the latency using a small application where two
> PowerQUICC cards randomly send each other bursts of messages across an
> Ethernet link. For this test, we are using the 2.6.16 kernel. We used a
> sniffer to capture data across the Ethernet link and found that
> sometimes when both TCPs send each other messages at about the same
> time (segments 5 and 6 below), for unknown reasons, the second TCP does
> not ack the message from the first TCP and a transmission occurs

re?
> (segment 8). We also observed that retransmissions sometimes occur
> when one TCP is busy transmitting many messages (segment 38 contains
> many application messages) while a message is being sent to it; again,
> for unknown reasons, that TCP does not ack the message, thus forcing a
> retransmission (segment 40).
>
> netstat reports TCP segments being retransmitted but no errors at the
> interface level. We have no reason to believe that segments are
> dropped at the physical layer. We suspect that segments are dropped at
> the TCP layer, but we don't know why or where. Any ideas?


Did you try replacing whatever was in the middle (hub/switch/crossover
cable/...)? I know you said you don't suspect the link layer, but a
little paranoia never hurts.

Did you try using well-tested network cards? The machine I'm using to
write this has a built-in NIC that started mysteriously dropping packets
when I installed FC5. Switching to a well-debugged card/driver made the
problem go away.
 
Francois

 
      05-29-2007, 01:15 PM
On May 28, 10:54 pm, Allen McIntosh <nos...@mouse-potato.com> wrote:
> Did you try replacing whatever was in the middle (hub/switch/crossover
> cable/...)? I know you said you don't suspect the link layer, but a
> little paranoia never hurts.
>
> Did you try using well-tested network cards? The machine I'm using to
> write this has a built-in NIC that started mysteriously dropping packets
> when I installed FC5. Switching to a well-debugged card/driver made the
> problem go away.


Our system is composed of a number of embedded PowerQUICC processors
(VME) located within a number of shelves. Processors communicate using
point-to-point Ethernet links, or through the VME backplane. There is
no hub or switch between them (except when we use a sniffer for
testing purposes). We tried different cables, cards, shelves, etc, to
isolate the root cause of this latency with no success.

After browsing the Linux code for a while (I wish I understood it
better), we realized that the TCP stack optimizes performance by
separating the processing of events between user and kernel space. We
suspect that under certain conditions (a heavy burst of messages, or
messages arriving at the same time), the stack drops or postpones
processing of events (holding locks, buffering), causing timers to
trigger retransmissions.

Thanks
Francois

 
Rick Jones

 
      05-29-2007, 05:29 PM
Francois <(E-Mail Removed)> wrote:
> After browsing the Linux code for a while (I wish I understood it
> better), we realized that the TCP stack optimizes performance by
> separating the processing of events between user and kernel
> space. We suspect that under certain conditions (heavy burst of
> messages, or messages arriving at the same time), the stack drops or
> postpones processing of events (holding locks, buffering) causing
> timers to trigger retransmissions.


ISTR there is a sysctl which controls some of that decision making -
net.ipv4.tcp_low_latency. Maybe that will help, maybe not.
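
For anyone wanting to check the current value of that sysctl from a test program, sysctls are exposed as files under /proc/sys with the dots replaced by directory separators. A minimal sketch (the function name and the root parameter are illustrative; the root argument exists only so the function can be pointed at a different tree when testing):

```python
from pathlib import Path
from typing import Optional


def read_sysctl(name: str, root: str = "/proc/sys") -> Optional[str]:
    """Return the value of a sysctl such as 'net.ipv4.tcp_low_latency'.

    Returns None if the kernel does not expose that name.
    """
    path = Path(root, *name.split("."))
    try:
        return path.read_text().strip()
    except OSError:
        return None
```

On 2.6-era kernels a value of "1" asks the stack to favour latency over throughput. Later kernel documentation describes tcp_low_latency as a legacy option with no effect, so the result is only meaningful on older kernels such as the 2.6.16 discussed here.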

Quite frankly, TCP isn't exactly the right protocol for firm/hard
realtime requirements, as you have learned from experience with lost
traffic and retransmissions. There isn't really a "perfect" protocol
for such things though (IMO).

rick jones
--
a wide gulf separates "what if" from "if only"
these opinions are mine, all mine; HP might not want them anyway...
feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...
 
Tim S

 
      05-29-2007, 05:55 PM
Rick Jones wrote:

> Francois <(E-Mail Removed)> wrote:
>> After browsing the Linux code for a while (I wish I understood it
>> better), we realized that the TCP stack optimizes performance by
>> separating the processing of events between user and kernel
>> space. We suspect that under certain conditions (heavy burst of
>> messages, or messages arriving at the same time), the stack drops or
>> postpones processing of events (holding locks, buffering) causing
>> timers to trigger retransmissions.

>
> ISTR there is a sysctl which controls some of that decision making -
> net.ipv4.tcp_low_latency. Maybe that will help, maybe not.
>
> Quite frankly, TCP isn't exactly the right protocol for firm/hard
> realtime requirements, as you have learned from experience with lost
> traffic and retransmissions. There isn't really a "perfect" protocol
> for such things though (IMO).
>
> rick jones


There's InfiniBand (which I know little about, apart from the fact that it
exists). I dare say it would be an expensive option and totally OTT for the
OP's application.

However, I do wonder if the OP has considered dumping IP and just throwing
raw Ethernet frames around? Hard to say whether it would be better or not -
depends on the hardware setup, but it's worth a thought.

Cheers

Tim
 
Rick Jones

 
      05-29-2007, 06:27 PM
Tim S <(E-Mail Removed)> wrote:
> However, I do wonder if the OP has considered dumping IP and just
> throwing raw ethernet frames around? Hard to say whether it would be
> better or not - depends on the hardware setup, but it's worth a
> thought.


One of those damned if you do, damned if you don't things I suspect.
One could go with direct Ethernet, but then one has to segment
oneself, as well as deal with lost traffic. One does have the
advantage of being able to use one's own retransmission timeouts.
Having done that though, some months later someone will want to be able
to run the application between two sites, without any bridging
available and then the lack of routing (since we've ditched IP) will
come back to haunt.

Also, with direct Ethernet, there are only so many Ethertypes/SAPs one
can use which may make multiple "connections" a bit difficult. The
author might have to write her own connection multiplex/demultiplex.
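
To make the do-it-yourself part concrete, here is a rough sketch of the kind of framing an application would need on raw Ethernet: a small shim header carrying a connection id (for multiplexing several "connections" over one EtherType) and a sequence number (for detecting loss and ordering). Everything about the shim layout is hypothetical. The EtherType 0x88B5 is one of the values IEEE reserves for local experimental use. Actually sending the frame needs a raw packet socket and the privileges to open one, which is not shown.

```python
import struct

ETHERTYPE_LOCAL_EXPERIMENTAL = 0x88B5  # IEEE 802 local experimental EtherType

ETH_HEADER = struct.Struct("!6s6sH")  # destination MAC, source MAC, EtherType
# Hypothetical shim: connection id (16 bit), sequence number (32 bit), payload length (16 bit)
SHIM_HEADER = struct.Struct("!HIH")
MIN_FRAME_LEN = 60  # minimum Ethernet frame size, excluding the FCS


def build_frame(dst_mac: bytes, src_mac: bytes, conn_id: int, seq: int, payload: bytes) -> bytes:
    """Build an Ethernet frame carrying one application message."""
    frame = (
        ETH_HEADER.pack(dst_mac, src_mac, ETHERTYPE_LOCAL_EXPERIMENTAL)
        + SHIM_HEADER.pack(conn_id, seq, len(payload))
        + payload
    )
    # Short frames must be padded; the length field lets the receiver strip the padding.
    return frame.ljust(MIN_FRAME_LEN, b"\x00")


def parse_frame(frame: bytes):
    """Return (conn_id, seq, payload), or raise ValueError for foreign frames."""
    _dst, _src, ethertype = ETH_HEADER.unpack_from(frame, 0)
    if ethertype != ETHERTYPE_LOCAL_EXPERIMENTAL:
        raise ValueError("not one of our frames")
    conn_id, seq, length = SHIM_HEADER.unpack_from(frame, ETH_HEADER.size)
    start = ETH_HEADER.size + SHIM_HEADER.size
    return conn_id, seq, frame[start:start + length]
```

Retransmission, segmentation of messages larger than one frame, and duplicate suppression would all still have to be built on top of this.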

rick jones
--
oxymoron n, commuter in a gas-guzzling luxury SUV with an American flag
these opinions are mine, all mine; HP might not want them anyway...
feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...
 
Tim S

 
      05-29-2007, 07:03 PM
Rick Jones wrote:

> Tim S <(E-Mail Removed)> wrote:
>> However, I do wonder if the OP has considered dumping IP and just
>> throwing raw ethernet frames around? Hard to say whether it would be
>> better or not - depends on the hardware setup, but it's worth a
>> thought.

>
> One of those damned if you do, damned if you don't things I suspect.
> One could go with direct Ethernet, but then one has to segment
> oneself, as well as deal with lost traffic. One does have the
> advantage of being able to use one's own retransmission timeouts.
> Having done that though, some months later someone will want to be able
> to run the application between two sites, without any bridging
> available and then the lack of routing (since we've ditched IP) will
> come back to haunt.
>
> Also, with direct Ethernet, there are only so many Ethertypes/SAPs one
> can use which may make multiple "connections" a bit difficult. The
> author might have to write her own connection multiplex/demultiplex.
>
> rick jones


Yes - I should clarify. I've seriously considered using plain ethernet for a
point-to-point link where one half of the link is hosted by a fairly dumb
embedded system (too dumb to run a "proper" OS, but highly specialised for
its task) and where the link's purpose is to feed data to a more
intelligent but less specialised embedded board.

Cheers

Tim
 
Dan N

 
      05-30-2007, 01:15 AM
On Tue, 29 May 2007 06:15:00 -0700, Francois wrote:

> We
> suspect that under certain conditions (heavy burst of messages, or
> messages arriving at the same time), the stack drops or postpones
> processing of events (holding locks, buffering) causing timers to
> trigger retransmissions.


That sounds like a reasonable explanation to me. Or the link layer drops
data because of timing constraints and/or limited resources, so the TCP
stack never sees it.

Others have suggested using a link layer protocol only, but what about using
UDP?

Dan
 
Francois

 
      05-30-2007, 12:44 PM
On May 29, 9:15 pm, Dan N <d...@localhost.com> wrote:
> On Tue, 29 May 2007 06:15:00 -0700, Francois wrote:
> > We
> > suspect that under certain conditions (heavy burst of messages, or
> > messages arriving at the same time), the stack drops or postpones
> > processing of events (holding locks, buffering) causing timers to
> > trigger retransmissions.

>
> That sounds like a reasonable explanation to me. Or the link layer drops
> data because of timing constraints and/or limited resources, so the TCP
> stack never sees it.
>
> Others have suggested using a link layer protocol only, but what about using
> UDP?
>
> Dan


We have considered using UDP. Although feasible, it would be a
significant amount of work, not so much to implement as to prove
correct. Rightly or wrongly, we made a number of assumptions early
on in the design that were driven by the fact that we used TCP, so
we would need to implement additional services on top of UDP
and prove them correct.
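
As an illustration of the smallest such service, here is a stop-and-wait sketch over UDP with an application-chosen retransmission timeout. The 4-byte sequence header, the function names and the default timeout are all assumptions made for the example. The receiver can be told to drop its first datagrams so that the retransmission path can be exercised on a loopback socket.

```python
import socket
import struct

SEQ_HEADER = struct.Struct("!I")  # hypothetical 4-byte sequence number


def send_reliable(sock, addr, seq, payload, rto=0.02, max_tries=5):
    """Send one datagram and wait for a matching ACK, resending every `rto` seconds.

    Returns the number of transmissions used; raises TimeoutError if none is ACKed.
    """
    packet = SEQ_HEADER.pack(seq) + payload
    sock.settimeout(rto)
    for attempt in range(1, max_tries + 1):
        sock.sendto(packet, addr)
        try:
            data, _ = sock.recvfrom(64)
        except socket.timeout:
            continue  # no ACK within rto: retransmit
        if len(data) >= SEQ_HEADER.size and SEQ_HEADER.unpack_from(data)[0] == seq:
            return attempt
    raise TimeoutError(f"no ACK for seq {seq} after {max_tries} tries")


def serve_once(sock, drop_first=0):
    """Receive datagrams, ignore the first `drop_first`, then ACK one and return its payload."""
    while True:
        data, peer = sock.recvfrom(2048)
        if drop_first > 0:
            drop_first -= 1  # simulate a lost datagram
            continue
        sock.sendto(data[:SEQ_HEADER.size], peer)  # the ACK is just the sequence number
        return data[SEQ_HEADER.size:]
```

This covers only loss of a single outstanding message. Ordering, duplicate delivery after a lost ACK, flow control and connection state are exactly the parts that would need the correctness work mentioned above.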

We first wanted to isolate the root cause of this latency. As
described above, we suspect the problem is related to the TCP stack, but we
have not proven this yet. We were hoping someone on the net would
confirm that either the current design of the Linux TCP stack could
result in such behaviour, or that this is a bug, and even better, point us
towards a fix.

Thanks
Francois

 
Rick Jones

 
      05-30-2007, 05:27 PM
If you have already checked all the stats available in Linux (netstat
-s and ethtool) and they are indeed clean, and then have checked the
stats on the switches (for those situations where switches were used),
and a tcpdump trace, or perhaps better still some external packet
sniffing with a sufficiently powerful third system (and perhaps a
hub), shows actual symptoms of packet loss, then it would seem that you
have encountered a situation where there are points in the stack which
can drop packets but not increment a stat.

That would be a bug.

You may need to start perusing the source of the entire path looking
for places where this might be the case. You would then need to
kludge in some counters of your own (perhaps just simple printk's even
as a start) to see what might be going on. If you get your Linux bits
from a commercial source, you could fire up your support contract and
start getting them to do some of that - the source code perusal and
perhaps quick and dirty counters at least.
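
Before adding counters to the kernel, one cheap way to confirm that the existing ones really stay flat is to snapshot them around a test run and look only at what changed. The sketch below does this for the generic per-interface counters, assuming the kernel exposes them under /sys/class/net/<interface>/statistics; the function names are illustrative, and driver-specific counters reported by ethtool are not covered.

```python
from pathlib import Path


def read_if_stats(ifname: str, root: str = "/sys/class/net") -> dict:
    """Read every counter file for one interface into {counter_name: value}."""
    stats_dir = Path(root, ifname, "statistics")
    return {p.name: int(p.read_text()) for p in stats_dir.iterdir() if p.is_file()}


def changed_counters(before: dict, after: dict) -> dict:
    """Return {counter_name: increase} for counters that moved between two snapshots."""
    return {
        name: after[name] - before.get(name, 0)
        for name in after
        if after[name] != before.get(name, 0)
    }
```

Taking one snapshot with read_if_stats("eth0") (the interface name is an example) before a burst and one after, then passing both to changed_counters, shows at a glance whether anything such as rx_dropped moved while a retransmission was observed.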

rick jones
--
oxymoron n, Hummer H2 with California Save Our Coasts and Oceans plates
these opinions are mine, all mine; HP might not want them anyway...
feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...
 