Networking Forums

Networking Forums > Computer Networking > Linux Networking > Re: Is this an MTU problem?


Re: Is this an MTU problem?

 
 
David Efflandt
      11-14-2003, 03:21 AM
On Thu, 13 Nov 2003 20:40:46 +0000, Simon Dean <(E-Mail Removed)> wrote:
> Seems I have a slight intermittent problem with outgoing connections.


You really mean connections initiated from outside.

> I have a Linux router, connecting over an ADSL line, using the Linux
> speedtouch user space driver (i.e. ppp0). The MTU for the ppp link is
> 1500, and largely causes no noticeable problems.


The eth device that transports PPPoE should have the default MTU of
1500. But PPPoE itself adds an 8-byte header, so its maximum MTU is 1492.
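For reference, those 8 bytes are the 6-byte PPPoE header plus the 2-byte PPP protocol field. A minimal Python sketch of the arithmetic, assuming the standard 1500-byte Ethernet MTU (this applies to PPPoE only; PPPoA does not use this Ethernet encapsulation):

```python
# Sketch: where the 1492-byte PPPoE MTU comes from (framing per RFC 2516).
ETHERNET_MTU = 1500        # default payload size of the Ethernet device carrying PPPoE
PPPOE_HEADER = 6           # version/type, code, session ID, length
PPP_PROTOCOL_FIELD = 2     # identifies the payload, e.g. 0x0021 for IPv4


def pppoe_mtu(ethernet_mtu: int = ETHERNET_MTU) -> int:
    """Largest IP packet that fits in one PPPoE frame on this Ethernet link."""
    return ethernet_mtu - (PPPOE_HEADER + PPP_PROTOCOL_FIELD)


if __name__ == "__main__":
    print(pppoe_mtu())  # prints 1492
```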

> But the only time I've noticed a slowdown is when I'm trying to access
> the computer from another ADSL connection.
>
> e.g. for my brother, who is on AOL, the newsgroup server I run on my
> machine is appallingly slow.


What is the advertised speed (or actual tested speed) of your ADSL? Note
that your upload is slower than your download, so a server on your line
will be much slower for remote users than your own Internet access is.

MTU on your box doing PPPoE should not be an issue unless your firewall
is blocking Path MTU Discovery. Setting the ppp0 MTU lower is not going
to help.
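To see why a firewall that blocks Path MTU Discovery matters, here is a toy Python model (not how any real kernel implements it): each hop has an outgoing link MTU, a packet sent with DF set that is too big for a link is dropped, and the sender only learns the smaller MTU if the ICMP "fragmentation needed" reply gets back to it. The hop lists are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Hop:
    mtu: int                          # MTU of the link leaving this hop
    icmp_reaches_sender: bool = True  # False models an ICMP-blocking firewall


def send_df_packet(path: List[Hop], size: int) -> Tuple[str, Optional[int]]:
    """Send one packet with DF set along the path.

    Returns ("delivered", None), ("too_big", next_hop_mtu) when an ICMP
    'fragmentation needed' message gets back, or ("lost", None) when the
    oversized packet is dropped and the sender hears nothing.
    """
    for hop in path:
        if size > hop.mtu:
            if hop.icmp_reaches_sender:
                return "too_big", hop.mtu
            return "lost", None
    return "delivered", None


def discover_path_mtu(path: List[Hop], first_try: int, max_rounds: int = 10) -> Optional[int]:
    """Shrink the packet size after each ICMP reply; None means a PMTU black hole."""
    size = first_try
    for _ in range(max_rounds):
        outcome, reported_mtu = send_df_packet(path, size)
        if outcome == "delivered":
            return size
        if outcome == "lost":
            return None  # the sender keeps retransmitting full-size packets in vain
        size = reported_mtu
    return None


if __name__ == "__main__":
    ok_path = [Hop(1500), Hop(1492), Hop(1500)]
    blocked_path = [Hop(1500), Hop(1492, icmp_reaches_sender=False), Hop(1500)]
    print(discover_path_mtu(ok_path, 1500))       # 1492
    print(discover_path_mtu(blocked_path, 1500))  # None
```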

However, MTU may be an issue if any of your servers are on a LAN behind
your PPPoE box. I forwarded port 25 to a LAN SMTP server, and it had
trouble receiving outside mail (timeout while waiting for data). Once I
set the MTU of the SMTP server's LAN NIC to match the PPPoE MTU, the mail
flowed in.
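One way to see why the LAN server's MTU matters: the maximum segment size (MSS) a TCP host advertises is normally its interface MTU minus a 20-byte IPv4 header and a 20-byte TCP header (less again if TCP options such as timestamps are carried). A LAN server with MTU 1500 invites 1460-byte segments, whose 1500-byte packets no longer fit a 1492-byte PPPoE link, so delivery then depends on Path MTU Discovery working. A small Python sketch of the arithmetic, assuming IPv4 with no IP options:

```python
IPV4_HEADER = 20   # bytes, assuming no IP options
TCP_HEADER = 20    # bytes, without TCP options


def max_segment_size(mtu: int, tcp_option_bytes: int = 0) -> int:
    """Largest TCP payload per packet for a given MTU."""
    return mtu - IPV4_HEADER - TCP_HEADER - tcp_option_bytes


if __name__ == "__main__":
    for mtu in (1500, 1492, 576):
        print(mtu, max_segment_size(mtu))
    # The timestamp option takes 12 bytes per segment (10 plus 2 bytes of NOP padding).
    print(max_segment_size(1500, tcp_option_bytes=12))
```

This gives 1460 for 1500, 1452 for 1492, 536 for 576, and 1448 with timestamps; the 1460 and 1448 figures match the segment sizes in the tcpdump traces later in this thread.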

To determine the maximum MTU, run the following from any of your Linux
servers (against a host that responds to ping):

ping -s 1472 -M do some.internet.host

If that fails without an error message that tells you the MTU, try
1464 (1464 + 28 = 1492, the PPPoE MTU):

ping -s 1464 -M do some.internet.host

Otherwise, keep making -s smaller until you find the largest -s that
works, then add 28 (the 20-byte IP header plus the 8-byte ICMP header)
to get the maximum MTU. I'm not sure what the Windows ping switch for
"do not fragment" is.
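The "make -s smaller until it works" step can be done as a binary search rather than one size at a time. A sketch in Python, where `probe(size)` is a hypothetical callback standing in for running `ping -s SIZE -M do some.internet.host` and reporting whether a reply came back; here it is simulated with a fixed path MTU:

```python
from typing import Callable

IP_AND_ICMP_HEADERS = 28  # 20-byte IPv4 header + 8-byte ICMP echo header


def find_max_mtu(probe: Callable[[int], bool], low: int = 0, high: int = 1472) -> int:
    """Binary-search the largest ping payload that gets through with DF set.

    high=1472 assumes the local interface MTU is 1500.
    Invariant: probe(low) succeeded and the answer lies in [low, high].
    """
    if not probe(low):
        raise ValueError("even the smallest probe failed; host may not answer pings")
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always makes progress
        if probe(mid):
            low = mid
        else:
            high = mid - 1
    return low + IP_AND_ICMP_HEADERS


if __name__ == "__main__":
    simulated_path_mtu = 1492  # hypothetical PPPoE link somewhere on the path
    print(find_max_mtu(lambda size: size + IP_AND_ICMP_HEADERS <= simulated_path_mtu))
```

Each real probe costs at least one round trip, so searching 0..1472 this way takes about a dozen pings instead of potentially hundreds.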

But you appear to be using a modem that requires a driver, instead of an
ethernet modem, so it could be some other problem I would not be aware of.

--
David Efflandt - All spam ignored http://www.de-srv.com/
 
noone
      11-14-2003, 05:21 AM
David Efflandt wrote:
> On Thu, 13 Nov 2003 20:40:46 +0000, Simon Dean <(E-Mail Removed)> wrote:
>
>>Seems I have a slight intermittent problem with outgoing connections.

>
>
> You really mean connections initiated from outside.
>
>
>>I have a Linux router, connecting over an ADSL line, using the Linux
>>speedtouch user space driver (i.e. ppp0). The MTU for the ppp link is
>>1500, and largely causes no noticeable problems.

>
>
> The mtu of the eth device that transports PPPoE should have default mtu
> 1500. But PPPoE itself has an 8-byte header, so its max mtu is 1492.
>
>
>>But the only time I've noticed a slowdown is when I'm trying to access
>>the computer from another ADSL connection.
>>
>>e.g. for my brother, who is on AOL, the newsgroup server I run on my
>>machine is appallingly slow.

>
>
> What is the advertised speed (or actual test speeds) of your adsl. Note
> that your upload is smaller than download. So a server would be much
> slower than you can access internet from there.
>
> MTU on your box doing pppoe should not be an issue unless your firewall
> is blocking mtu path discovery.


... or any router in between the 2 is blocking MTU path discovery.

> Setting ppp0 mtu lower is not going to
> help.


What if the host initiating the connection tries to turn off MTU path
discovery?

Some useful documents that I have links to are:

http://www.cisco.com/warp/public/105/38.shtml
http://www.netheaven.com/pmtu.html

The last link above seems to describe the problem the OP was describing
under "Example Path MTU Discovery Failure Scenario"

 
Simon Dean
      11-14-2003, 07:46 AM
David Efflandt wrote:
> On Thu, 13 Nov 2003 20:40:46 +0000, Simon Dean <(E-Mail Removed)> wrote:
>
>>Seems I have a slight intermittent problem with outgoing connections.

>
> You really mean connections initiated from outside.


Dang. Ok.

>>I have a Linux router, connecting over an ADSL line, using the Linux
>>speedtouch user space driver (i.e. ppp0). The MTU for the ppp link is
>>1500, and largely causes no noticeable problems.

>
> The mtu of the eth device that transports PPPoE should have default mtu
> 1500. But PPPoE itself has an 8-byte header, so its max mtu is 1492.


Well, I don't have an eth device, since it's all handled through PPP. A
server process running on the Linux server has no eth device as such; it
communicates with the rest of the world through ppp0. The MTU is
assigned by the ISP, and it is assigned as 1500.



>>But the only time I've noticed a slowdown is when I'm trying to access
>>the computer from another ADSL connection.
>>
>>e.g. for my brother, who is on AOL, the newsgroup server I run on my
>>machine is appallingly slow.

>
> What is the advertised speed (or actual test speeds) of your adsl. Note
> that your upload is smaller than download. So a server would be much
> slower than you can access internet from there.


512k inwards, 256k outwards. Of course, I don't expect the speed to be
blisteringly fast, but I don't expect my connections to freeze. Nor do I
expect the connections to freeze when the MTU is at 1000, yet work
fine at 576.
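For scale, the time a single packet occupies a 256 kbit/s upstream can be worked out directly. A Python sketch, assuming 1 kbit = 1000 bits and ignoring the extra PPPoA/ATM cell overhead (which makes the real figures somewhat higher):

```python
def serialization_delay_ms(packet_bytes: int, link_kbps: float) -> float:
    """Milliseconds needed to clock one packet onto a link of link_kbps kbit/s.

    Bits divided by kbit/s gives milliseconds directly.
    """
    return packet_bytes * 8 / link_kbps


if __name__ == "__main__":
    for size in (1500, 1000, 576):
        print(size, serialization_delay_ms(size, 256))
```

A full 1500-byte packet takes under 50 ms at 256 kbit/s, so the slow uplink alone does not obviously account for connections that stall for many seconds.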


> MTU on your box doing pppoe should not be an issue unless your firewall
> is blocking mtu path discovery. Setting ppp0 mtu lower is not going to
> help.


The Linux box is connected directly to the Internet (through the ADSL
Speedtouch USB modem). It's actually PPPoA as opposed to PPPoE, and
believe me, from my tests, in this instance setting the MTU lower on the
ppp0 link (the PPPoA link, the direct link to the Internet) actually does
help. But I agree that it shouldn't help, because the machines should be
capable of doing Path MTU Discovery.

That's the thing that confuses me. From the machine at work, if I ping
with a packet of size 1472 and set the do-not-fragment bit (in DOS,
ping -f -l 1472 my.home.server), I get no warnings, and I get replies
back. So that seems to be OK.


> However, mtu may be an issue if any of your servers is on a LAN behind
> your pppoe box.


Nope. I don't have that setup.

> I forwarded port 25 to a LAN smtp server, and it had
> trouble receiving outside mail (timeout while waiting for data). Once I
> set the LAN nic of smtp server to match pppoe mtu, the mail flowed in.


I've seen that problem before. But with Path MTU Discovery, that should
never be a problem, right?


> To determine max mtu do the following (to a host that responds to ping)
> from any of your Linux servers:
>
> ping -s 1472 -M do some.internet.host


-M doesn't appear to be a valid option. But 1472 appears fine.

[root@simtext sjdean]# ping -s 1472 www.google.com
PING www.google.akadns.net (216.239.39.99): 1472 data bytes
1480 bytes from 216.239.39.99: icmp_seq=0 ttl=51 time=161.0 ms
1480 bytes from 216.239.39.99: icmp_seq=1 ttl=51 time=160.4 ms
1480 bytes from 216.239.39.99: icmp_seq=2 ttl=51 time=160.2 ms


> Or make -s smaller until you find the largest -s that does work, then add
> 28 to the -s get max mtu. Not sure what Windows ping switch is for "do
> not fragment".


That's -f.

Z:\>ping -f -l 1472 81.168.19.133

Pinging 81.168.19.133 with 1472 bytes of data:

Reply from 81.168.19.133: bytes=1472 time=159ms TTL=58
Reply from 81.168.19.133: bytes=1472 time=158ms TTL=58


> But you appear to be using a modem that requires a driver, instead of an
> ethernet modem, so it could be some other problem I would not be aware of.


Yes. I haven't ruled that out.

Transferring files to my website, housed elsewhere on the Internet, I
don't get such freezing in transfer. Certainly it appears that other
servers respond satisfactorily with ACKs.


08:45:52.150043 212.159.113.89.39822 > 213.232.100.35.ftp-data: . 70953:72401(1448) ack 1 win 5792 <nop,nop,timestamp 100727315 7977408> [tos 0x8] (ttl 64, id 1312)
08:45:52.212990 213.232.100.35.ftp-data > 212.159.113.89.39822: . ack 72401 win 63712 <nop,nop,timestamp 7977440 100727315> (DF) (ttl 58, id 26412)
08:45:52.213046 212.159.113.89.39822 > 213.232.100.35.ftp-data: P 72401:73849(1448) ack 1 win 5792 <nop,nop,timestamp 100727321 7977440> [tos 0x8] (ttl 64, id 1313)
08:45:52.213091 212.159.113.89.39822 > 213.232.100.35.ftp-data: . 73849:75297(1448) ack 1 win 5792 <nop,nop,timestamp 100727321 7977440> [tos 0x8] (ttl 64, id 1314)

(212.159.113.89 is mine)

So it could be a problem with the setup of the computer system at work
(which is behind a Windows firewall). Although in that case, that doesn't
explain the other satisfactory, non-freezing connections to other
servers. Perhaps there is something weird in the PPPoA driver that makes
it freeze when I talk to another ADSL modem!

Cheers
Simon

 
Simon Dean
      11-14-2003, 07:56 AM
noone wrote:

> David Efflandt wrote:
>
>> On Thu, 13 Nov 2003 20:40:46 +0000, Simon Dean
>> <(E-Mail Removed)> wrote:
>>
>>> Seems I have a slight intermittent problem with outgoing connections.

>>
>>
>>
>> You really mean connections initiated from outside.
>>
>>
>>> I have a Linux router, connecting over an ADSL line, using the Linux
>>> speedtouch user space driver (i.e. ppp0). The MTU for the ppp link is
>>> 1500, and largely causes no noticeable problems.

>>
>>
>>
>> The mtu of the eth device that transports PPPoE should have default
>> mtu 1500. But PPPoE itself has an 8-byte header, so its max mtu is 1492.
>>
>>
>>> But the only time I've noticed a slowdown is when I'm trying to
>>> access the computer from another ADSL connection.
>>>
>>> e.g. for my brother, who is on AOL, the newsgroup server I run on my
>>> machine is appallingly slow.

>>
>>
>>
>> What is the advertised speed (or actual test speeds) of your adsl. Note
>> that your upload is smaller than download. So a server would be much
>> slower than you can access internet from there.
>>
>> MTU on your box doing pppoe should not be an issue unless your
>> firewall is blocking mtu path discovery.

>
>
> ... or any router in between the 2 is blocking MTU path discovery.
>
>> Setting ppp0 mtu lower is not going to
>> help.

>
>
> What if the host initiating the connection tries to turn off MTU path
> discovery?


I've tried turning off Path MTU Discovery on my server end today, to see
if that makes a difference. You never know. From what I've read, any
fixes can only really come from the server end? I can't expect all the
computers on the Internet to be fixed; there may be machines out there
that ignore DF or requests to fragment, etc.


> Some useful documents that I have links to are:
>
> http://www.cisco.com/warp/public/105/38.shtml
> http://www.netheaven.com/pmtu.html
>
> The last link above seems to describe the problem the OP was describing
> under "Example Path MTU Discovery Failure Scenario"


Thanks for the links. Unfortunately, I don't have PPPoE, but PPPoA. And I
have tried reducing the MTU on the link to the Internet. 1000 doesn't
work. 1488 doesn't work. But 576 does work with the computer at work.

Hopefully I've turned off the DF bit on my outgoing traffic, to see if
that makes anything more reliable.
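On a Linux server, system-wide Path MTU Discovery (and with it the DF bit on outgoing TCP packets) is governed by the `net.ipv4.ip_no_pmtu_disc` sysctl, exposed as `/proc/sys/net/ipv4/ip_no_pmtu_disc`; 0, the default, means discovery is on, and root can set it with `sysctl -w net.ipv4.ip_no_pmtu_disc=1`. The post does not say how the change was made, so this is only one way to check the setting: a read-only Python sketch, assuming a Linux box with /proc mounted.

```python
from pathlib import Path
from typing import Optional

PMTU_SYSCTL = Path("/proc/sys/net/ipv4/ip_no_pmtu_disc")


def parse_sysctl_int(text: str) -> int:
    """Sysctl files hold a number followed by a newline."""
    return int(text.strip())


def read_no_pmtu_disc(path: Path = PMTU_SYSCTL) -> Optional[int]:
    """Return the sysctl value, or None if it cannot be read (e.g. not Linux)."""
    try:
        return parse_sysctl_int(path.read_text())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    value = read_no_pmtu_disc()
    if value is None:
        print("ip_no_pmtu_disc not available on this system")
    elif value == 0:
        print("Path MTU Discovery is on (DF set on outgoing TCP packets)")
    else:
        # Newer kernels also define modes above 1; treat any non-zero value as "not default".
        print(f"ip_no_pmtu_disc = {value}: Path MTU Discovery is disabled or modified")
```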

Cya
Simon

 
Simon Dean
      11-14-2003, 04:00 PM
Here's another tcpdump:

tcpdump: listening on ppp0
11:44:16.440078 212.159.113.89.www > 81.168.19.134.63281: . 1461:2921(1460) ack 500 win 6432 (DF) (ttl 64, id 9272)
11:44:16.677844 81.168.19.134.63281 > 212.159.113.89.www: . ack 2921 win 65535 (DF) [tos 0xa0] (ttl 122, id 29567)
11:44:16.677970 212.159.113.89.www > 81.168.19.134.63281: . 2921:4381(1460) ack 500 win 6432 (DF) (ttl 64, id 9273)
11:44:16.678019 212.159.113.89.www > 81.168.19.134.63281: . 4381:5841(1460) ack 500 win 6432 (DF) (ttl 64, id 9274)

11:44:28.670078 212.159.113.89.www > 81.168.19.134.63281: . 2921:4381(1460) ack 500 win 6432 (DF) (ttl 64, id 9275)
11:44:28.928208 81.168.19.134.63281 > 212.159.113.89.www: . ack 4381 win 65535 (DF) [tos 0xa0] (ttl 122, id 29581)
11:44:28.928298 212.159.113.89.www > 81.168.19.134.63281: . 4381:5841(1460) ack 500 win 6432 (DF) (ttl 64, id 9276)
11:44:28.928352 212.159.113.89.www > 81.168.19.134.63281: . 5841:7301(1460) ack 500 win 6432 (DF) (ttl 64, id 9277)
11:44:52.920072 212.159.113.89.www > 81.168.19.134.63281: . 4381:5841(1460) ack 500 win 6432 (DF) (ttl 64, id 9278)
11:44:53.208927 81.168.19.134.63281 > 212.159.113.89.www: . ack 5841 win 65535 (DF) [tos 0xa0] (ttl 122, id 29592)
11:44:53.209013 212.159.113.89.www > 81.168.19.134.63281: . 5841:7301(1460) ack 500 win 6432 (DF) (ttl 64, id 9279)

Note:

I send a packet, the other computer acknowledges it, my computer sends
two packets, then retries the first of the two twenty seconds later,
then the other computer acknowledges it, then I send two... etc.

Is that indicative of anything?


Cheers
Simon

 
Clifford Kite
      11-15-2003, 06:26 PM
Simon Dean <(E-Mail Removed)> wrote:
> Here's another tcpdump:


> tcpdump: listening on ppp0
> 11:44:16.440078 212.159.113.89.www > 81.168.19.134.63281: . 1461:2921(1460) ack 500 win 6432 (DF) (ttl 64, id 9272)
                                                                   ^^^^
> 11:44:16.677844 81.168.19.134.63281 > 212.159.113.89.www: . ack 2921 win 65535 (DF) [tos 0xa0] (ttl 122, id 29567)


The remote host, referred to as TRH from here on, ACKs the data segment
sent by the local host, referred to as TLH from here on, which we choose
to identify by 2921, the number after the colon in 1461:2921.

> 11:44:16.677970 212.159.113.89.www > 81.168.19.134.63281: . 2921:4381(1460) ack 500 win 6432 (DF) (ttl 64, id 9273)
> 11:44:16.678019 212.159.113.89.www > 81.168.19.134.63281: . 4381:5841(1460) ack 500 win 6432 (DF) (ttl 64, id 9274)


TLH sends two data segments to TRH.

> 11:44:28.670078 212.159.113.89.www > 81.168.19.134.63281: . 2921:4381(1460) ack 500 win 6432 (DF) (ttl 64, id 9275)


After 12 seconds elapse with no ACK from TRH, TLH resends the first
data segment, 4381.

> 11:44:28.928208 81.168.19.134.63281 > 212.159.113.89.www: . ack 4381 win 65535 (DF) [tos 0xa0] (ttl 122, id 29581)
                                                                  ^^^^


And quickly receives an ACK for the data segment from TRH.

> 11:44:28.928298 212.159.113.89.www > 81.168.19.134.63281: . 4381:5841(1460) ack 500 win 6432 (DF) (ttl 64, id 9276)
> 11:44:28.928352 212.159.113.89.www > 81.168.19.134.63281: . 5841:7301(1460) ack 500 win 6432 (DF) (ttl 64, id 9277)


TLH sends two more data segments: 5841 is a repeat of the second of the
pair already sent but not ACKed, and 7301 is new.

> 11:44:52.920072 212.159.113.89.www > 81.168.19.134.63281: . 4381:5841(1460) ack 500 win 6432 (DF) (ttl 64, id 9278)


After ~24 seconds without an ACK, TLH resends data segment 5841 again.

> 11:44:53.208927 81.168.19.134.63281 > 212.159.113.89.www: . ack 5841 win 65535 (DF) [tos 0xa0] (ttl 122, id 29592)


And again quickly receives an ACK from TRH.

> 11:44:53.209013 212.159.113.89.www > 81.168.19.134.63281: . 5841:7301(1460) ack 500 win 6432 (DF) (ttl 64, id 9279)


TLH resends the unACKed data segment 7301.
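The hand annotation above can be automated for longer captures. A rough Python sketch that flags data segments whose sequence range reappears and reports how long after the previous transmission; the regex assumes the old single-line tcpdump output format shown in this thread (newer tcpdump versions print segments differently), and the trace is Simon's, copied verbatim.

```python
import re
from typing import Dict, List, Tuple

# Old-style tcpdump data segment: "HH:MM:SS.usec SRC > DST: FLAGS START:END(LEN) ..."
DATA_SEGMENT = re.compile(r"^(\d+):(\d+):(\d+\.\d+) (\S+) > (\S+): \S+ (\d+):(\d+)\(\d+\)")

TRACE = """\
tcpdump: listening on ppp0
11:44:16.440078 212.159.113.89.www > 81.168.19.134.63281: . 1461:2921(1460) ack 500 win 6432 (DF) (ttl 64, id 9272)
11:44:16.677844 81.168.19.134.63281 > 212.159.113.89.www: . ack 2921 win 65535 (DF) [tos 0xa0] (ttl 122, id 29567)
11:44:16.677970 212.159.113.89.www > 81.168.19.134.63281: . 2921:4381(1460) ack 500 win 6432 (DF) (ttl 64, id 9273)
11:44:16.678019 212.159.113.89.www > 81.168.19.134.63281: . 4381:5841(1460) ack 500 win 6432 (DF) (ttl 64, id 9274)
11:44:28.670078 212.159.113.89.www > 81.168.19.134.63281: . 2921:4381(1460) ack 500 win 6432 (DF) (ttl 64, id 9275)
11:44:28.928208 81.168.19.134.63281 > 212.159.113.89.www: . ack 4381 win 65535 (DF) [tos 0xa0] (ttl 122, id 29581)
11:44:28.928298 212.159.113.89.www > 81.168.19.134.63281: . 4381:5841(1460) ack 500 win 6432 (DF) (ttl 64, id 9276)
11:44:28.928352 212.159.113.89.www > 81.168.19.134.63281: . 5841:7301(1460) ack 500 win 6432 (DF) (ttl 64, id 9277)
11:44:52.920072 212.159.113.89.www > 81.168.19.134.63281: . 4381:5841(1460) ack 500 win 6432 (DF) (ttl 64, id 9278)
11:44:53.208927 81.168.19.134.63281 > 212.159.113.89.www: . ack 5841 win 65535 (DF) [tos 0xa0] (ttl 122, id 29592)
11:44:53.209013 212.159.113.89.www > 81.168.19.134.63281: . 5841:7301(1460) ack 500 win 6432 (DF) (ttl 64, id 9279)
"""


def find_retransmissions(lines: List[str]) -> List[Tuple[int, float]]:
    """Return (end_seq, seconds since that range was last sent) for each repeat."""
    last_sent: Dict[Tuple[str, str, int, int], float] = {}
    repeats: List[Tuple[int, float]] = []
    for line in lines:
        match = DATA_SEGMENT.match(line.strip())
        if match is None:
            continue  # pure ACKs, blank lines, the tcpdump banner
        hours, minutes, seconds, src, dst, start, end = match.groups()
        t = int(hours) * 3600 + int(minutes) * 60 + float(seconds)
        key = (src, dst, int(start), int(end))
        if key in last_sent:
            repeats.append((int(end), t - last_sent[key]))
        last_sent[key] = t
    return repeats


if __name__ == "__main__":
    for end_seq, gap in find_retransmissions(TRACE.splitlines()):
        print(f"segment ending {end_seq} resent after {gap:.1f} s")
```

On this trace it reports four retransmissions (segments 4381, 5841, 5841 and 7301), roughly 12, 12, 24 and 24 seconds after the previous transmission of the same range.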

> Note:


> I send a packet, the other computer acknowledges it, my computer sends
> two packets, then retries the first of the two twenty seconds later,
> then the other computer acknowledges it, then I send two... etc.


> Is that indicative of anything?


It suggests that TCP exponential backoff is occurring on TLH. The very
first data segment above is from TLH and is likely a repeat of an
immediately preceding data segment that was sent about 6 seconds earlier.
I'd guess that is happening because TRH fails to ACK any outstanding
data segments until one is resent.

The first retransmission would happen after about 1.5 seconds without an
ACK, the second after 3 seconds without an ACK, the third after 6 seconds
without an ACK, etc., until a limit is reached, perhaps at 64 seconds,
but it's implementation dependent. After the limit for the time between
retransmissions is reached, the connection will be closed after reaching
another limit on total time with no ACKs received.
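The doubling described above is easy to tabulate. A Python sketch using Clifford's illustrative figures (1.5 s initial timeout, 64 s cap) as assumptions, not any particular kernel's settings:

```python
from typing import List


def backoff_schedule(initial_rto: float, max_rto: float, retries: int) -> List[float]:
    """Wait before each successive retransmission: doubled each time, capped at max_rto."""
    schedule: List[float] = []
    rto = initial_rto
    for _ in range(retries):
        schedule.append(rto)
        rto = min(rto * 2, max_rto)
    return schedule


if __name__ == "__main__":
    print(backoff_schedule(initial_rto=1.5, max_rto=64.0, retries=8))
    # [1.5, 3.0, 6.0, 12.0, 24.0, 48.0, 64.0, 64.0]
```

With these numbers the fourth and fifth waits are 12 s and 24 s, the same gaps seen in the trace, which fits the backoff reading (though the match could be coincidental).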

I'm *not* a TCP/IP expert and don't know why TRH is only ACKing
TLH's retransmissions. I can suggest that you repost the log on
comp.protocols.tcp-ip and include the fact that the problem only
occurs between a host accessing the Internet with your PPPoA ADSL
connection and a host using another ADSL connection elsewhere on the
Internet. It might be a good idea to remark that a reduction in MTU
to 576 (an old network standard) cures it, although I don't think the
root of the problem is directly related to MTU. (If you are lucky
and get an answer from Barry Margolin then believe it, his answers
are always good ones.)

--
Clifford Kite Email: "echo xvgr_yvahk-(E-Mail Removed)|rot13"
PPP-Q&A links, downloads: http://ckite.no-ip.net/


 
Horst Knobloch
      11-15-2003, 06:56 PM
Simon Dean <(E-Mail Removed)> wrote:

> Here's another tcpdump:
>
> tcpdump: listening on ppp0


> 11:44:16.440078 YOU.www > OTHER.63281: . 1461:2921(1460) ack 500 win 6432 (DF) (ttl 64, id 9272)
> 11:44:16.677844 OTHER.63281 > YOU.www: . ack 2921 win 65535 (DF) [tos 0xa0] (ttl 122, id 29567)
> 11:44:16.677970 YOU.www > OTHER.63281: . 2921:4381(1460) ack 500 win 6432 (DF) (ttl 64, id 9273)
> 11:44:16.678019 YOU.www > OTHER.63281: . 4381:5841(1460) ack 500 win 6432 (DF) (ttl 64, id 9274)
> 11:44:28.670078 YOU.www > OTHER.63281: . 2921:4381(1460) ack 500 win 6432 (DF) (ttl 64, id 9275)
> 11:44:28.928208 OTHER.63281 > YOU.www: . ack 4381 win 65535 (DF) [tos 0xa0] (ttl 122, id 29581)
> 11:44:28.928298 YOU.www > OTHER.63281: . 4381:5841(1460) ack 500 win 6432 (DF) (ttl 64, id 9276)
> 11:44:28.928352 YOU.www > OTHER.63281: . 5841:7301(1460) ack 500 win 6432 (DF) (ttl 64, id 9277)
> 11:44:52.920072 YOU.www > OTHER.63281: . 4381:5841(1460) ack 500 win 6432 (DF) (ttl 64, id 9278)
> 11:44:53.208927 OTHER.63281 > YOU.www: . ack 5841 win 65535 (DF) [tos 0xa0] (ttl 122, id 29592)
> 11:44:53.209013 YOU.www > OTHER.63281: . 5841:7301(1460) ack 500 win 6432 (DF) (ttl 64, id 9279)
>

[...]
> Is that indicative of anything?


Yes, it indicates that you do *not* have the typical MTU problem.

It looks like your box, the remote box or a router along the
path is dropping packets for an as yet unknown reason. The most
probable reason is a congested link somewhere along the path.

<WILD GUESS>
Your USB ADSL modem or its PPPoA driver is dropping packets
because of congestion or a fault. This happens with both large
(MTU=1500) and small (MTU=576) packets.

If you use large packets and no TCP timestamps, the packet
loss hurts more, because fewer packets can be sent before
congestion drops packets. Therefore fewer valid RTT samples
can be collected, which means your side can hardly calculate a
proper retransmission timeout (RTO) in this case. This leads to
poor performance due to very long timeouts before the retransmissions.

If you use small packets and no TCP timestamps, the impact
of the packet loss is less severe. Since more packets are
sent before congestion kicks in, more valid RTT samples
can be taken. More RTT samples lead to a better estimate
of the retransmission timeout, and a better timeout means
lost packets are retransmitted sooner. This gives the link
better performance than with large packets.

Packet loss with large packets on connections using timestamps
is also less severe. Because each packet carries a timestamp,
more valid RTT samples can be taken even with fewer packets.
This also leads to a better retransmission timeout.
</WILD GUESS>

If my wild guess is true, then enabling TCP timestamps¹ would
improve the performance. However, the root cause (the excessive
packet loss) is still not fixed, and you need to take care
of that, too.
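Horst's argument rests on how TCP turns RTT samples into a retransmission timeout. A Python sketch of the standard estimator from RFC 2988 (SRTT/RTTVAR smoothing, RTO = SRTT + 4 * RTTVAR, a 1-second floor, a 3-second initial value) together with Karn's rule, which discards samples from retransmitted segments unless timestamps make them unambiguous. Real stacks differ in details (Linux, for example, uses a lower minimum RTO), and the clock-granularity term is omitted here; the 0.24 s RTT in the usage is roughly what Simon's trace shows.

```python
from dataclasses import dataclass
from typing import Optional

ALPHA = 1 / 8       # gain for the smoothed RTT
BETA = 1 / 4        # gain for the RTT variation
K = 4
MIN_RTO = 1.0       # seconds; RFC 2988 floor
INITIAL_RTO = 3.0   # seconds; RFC 2988 value before any sample


@dataclass
class RtoEstimator:
    srtt: Optional[float] = None
    rttvar: float = 0.0
    rto: float = INITIAL_RTO

    def add_sample(self, rtt: float) -> None:
        if self.srtt is None:          # first measurement
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:                          # RTTVAR uses the old SRTT, so update it first
            self.rttvar = (1 - BETA) * self.rttvar + BETA * abs(self.srtt - rtt)
            self.srtt = (1 - ALPHA) * self.srtt + ALPHA * rtt
        self.rto = max(MIN_RTO, self.srtt + K * self.rttvar)

    def on_ack(self, rtt: float, was_retransmitted: bool, timestamps: bool) -> None:
        # Karn's rule: without timestamps, an ACK for a retransmitted segment
        # could belong to either transmission, so it yields no RTT sample.
        if was_retransmitted and not timestamps:
            return
        self.add_sample(rtt)


if __name__ == "__main__":
    est = RtoEstimator()
    est.on_ack(0.24, was_retransmitted=True, timestamps=False)
    print(est.srtt, est.rto)   # no sample taken: None 3.0
    est.on_ack(0.24, was_retransmitted=True, timestamps=True)
    print(est.srtt, est.rto)
```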

So, first make sure that there is no bandwidth hog clogging your
upstream or downstream link on your side. Typical bandwidth hogs
are p2p file sharing programs. Use tcpdump to check that no other
program is using excessive bandwidth while you perform the tests.

If you are in control of the remote site, make sure that no
bandwidth problem exists on the remote link, too. Also use
tcpdump on the remote site to check whether the packets reach
the other side or not.

Perform the test to different sites and use tcpdump to trace
traffic. Look for what the traces have in common among the sites
that perform well, and among those that perform badly. Are all
sites that have the TCP timestamp option enabled performing well?

Perform the test to a site that performs badly and use tcpdump
to trace traffic. Then change your MTU to 576 and repeat the
test. Do you see packet retransmissions in both traces?

Perform the test to a site that performs well with MTU=1500
and use tcpdump to trace traffic. Do you also see packet
retransmissions here?


¹) Another trace showed that your box already has timestamps
enabled. However, for timestamps to be used on a connection,
both sides must have them enabled, so you need to enable them
on the remote side. On Windows you can use "Dr. TCP" for that.


HTH

Ciao, Horst
--
»When pings go wrong (It hurts me too)« E.Clapton/E.James/P.Tscharn
 