getting tcp packets larger than MTU, how is that possible??

 
 
Tobias Skytte

 
      01-12-2009, 10:44 AM
Hi,

While speed-testing various MTU sizes (1500 and 9000), I noticed that
the packet lengths reported by tcpdump vary from the MTU size (1514 and
9014 bytes) up to 62702(!) bytes.
When the length is over the MTU size (e.g. 62702 bytes), the receiving
machine sends back a lot of 66-byte ACKs before receiving the next
packet.
What's up with this? How can a packet be larger than the MTU?

Below is a short excerpt from the tcpdump output, taken with MTU 9000
while transferring a large file (2.2 GB) over FTP.
Both machines run RH 5.2, and each has the following two NICs
installed:
Broadcom Corporation NetXtreme II BCM5708 Gigabit Ethernet
Intel Corporation 82571EB Gigabit Ethernet Controller
The NICs are connected via a crossover cable.

The results and speeds are the same when using either card (e.g.
Broadcom to Broadcom or Intel to Intel).

Any hints would be much appreciated. Thanks.
Tobias
************************************************** ***********************************************
Capture from the sending machine, having MTU 9000 and capturing on
port 20 only:
************************************************** ***********************************************
11:39:20.323198 00:15:17:16:37:4b (oui Unknown) > 00:15:17:12:f0:8f (oui Unknown), ethertype IPv4 (0x0800), length 9014: 192.168.0.2.ftp-data > 192.168.0.4.59912: . 187909:196857(8948) ack 1 win 140 <nop,nop,timestamp 485338 531647>
11:39:20.323200 00:15:17:12:f0:8f (oui Unknown) > 00:15:17:16:37:4b (oui Unknown), ethertype IPv4 (0x0800), length 66: 192.168.0.4.59912 > 192.168.0.2.ftp-data: . ack 143169 win 958 <nop,nop,timestamp 531647 485338>
11:39:20.323204 00:15:17:16:37:4b (oui Unknown) > 00:15:17:12:f0:8f (oui Unknown), ethertype IPv4 (0x0800), length 35858: 192.168.0.2.ftp-data > 192.168.0.4.59912: . 196857:232649(35792) ack 1 win 140 <nop,nop,timestamp 485338 531647>
11:39:20.323443 00:15:17:12:f0:8f (oui Unknown) > 00:15:17:16:37:4b (oui Unknown), ethertype IPv4 (0x0800), length 66: 192.168.0.4.59912 > 192.168.0.2.ftp-data: . ack 161065 win 1014 <nop,nop,timestamp 531648 485338>
11:39:20.323448 00:15:17:12:f0:8f (oui Unknown) > 00:15:17:16:37:4b (oui Unknown), ethertype IPv4 (0x0800), length 66: 192.168.0.4.59912 > 192.168.0.2.ftp-data: . ack 178961 win 903 <nop,nop,timestamp 531648 485338>
11:39:20.323452 00:15:17:16:37:4b (oui Unknown) > 00:15:17:12:f0:8f (oui Unknown), ethertype IPv4 (0x0800), length 53754: 192.168.0.2.ftp-data > 192.168.0.4.59912: . 232649:286337(53688) ack 1 win 140 <nop,nop,timestamp 485338 531648>
11:39:20.323694 00:15:17:12:f0:8f (oui Unknown) > 00:15:17:16:37:4b (oui Unknown), ethertype IPv4 (0x0800), length 66: 192.168.0.4.59912 > 192.168.0.2.ftp-data: . ack 196857 win 1014 <nop,nop,timestamp 531648 485338>
11:39:20.323698 00:15:17:16:37:4b (oui Unknown) > 00:15:17:12:f0:8f (oui Unknown), ethertype IPv4 (0x0800), length 9014: 192.168.0.2.ftp-data > 192.168.0.4.59912: . 286337:295285(8948) ack 1 win 140 <nop,nop,timestamp 485339 531648>
11:39:20.323942 00:15:17:12:f0:8f (oui Unknown) > 00:15:17:16:37:4b (oui Unknown), ethertype IPv4 (0x0800), length 66: 192.168.0.4.59912 > 192.168.0.2.ftp-data: . ack 214753 win 1069 <nop,nop,timestamp 531648 485338>
11:39:20.323952 00:15:17:12:f0:8f (oui Unknown) > 00:15:17:16:37:4b (oui Unknown), ethertype IPv4 (0x0800), length 66: 192.168.0.4.59912 > 192.168.0.2.ftp-data: . ack 241597 win 903 <nop,nop,timestamp 531648 485338>
11:39:20.324193 00:15:17:12:f0:8f (oui Unknown) > 00:15:17:16:37:4b (oui Unknown), ethertype IPv4 (0x0800), length 66: 192.168.0.4.59912 > 192.168.0.2.ftp-data: . ack 259493 win 1014 <nop,nop,timestamp 531648 485338>
11:39:20.324198 00:15:17:16:37:4b (oui Unknown) > 00:15:17:12:f0:8f (oui Unknown), ethertype IPv4 (0x0800), length 17962: 192.168.0.2.ftp-data > 192.168.0.4.59912: P 340025:357921(17896) ack 1 win 140 <nop,nop,timestamp 485339 531648>
11:39:20.324442 00:15:17:12:f0:8f (oui Unknown) > 00:15:17:16:37:4b (oui Unknown), ethertype IPv4 (0x0800), length 66: 192.168.0.4.59912 > 192.168.0.2.ftp-data: . ack 277389 win 1069 <nop,nop,timestamp 531649 485338>
11:39:20.324452 00:15:17:12:f0:8f (oui Unknown) > 00:15:17:16:37:4b (oui Unknown), ethertype IPv4 (0x0800), length 66: 192.168.0.4.59912 > 192.168.0.2.ftp-data: . ack 304233 win 1119 <nop,nop,timestamp 531649 485338>
11:39:20.324456 00:15:17:16:37:4b (oui Unknown) > 00:15:17:12:f0:8f (oui Unknown), ethertype IPv4 (0x0800), length 6402: 192.168.0.2.ftp-data > 192.168.0.4.59912: . 414221:420557(6336) ack 1 win 140 <nop,nop,ti
:
 
Tobias Skytte

 
      01-12-2009, 10:50 AM
I forgot to mention that the kernel version on both machines is
2.6.18-92.el5.

Also, on an FTP transfer of 2.2 GB with MTU 9000 I get the following
packets:
16542 packets of 66 bytes (ACKs from the receiver)
5746 packets of 9014 bytes
2127 packets of 62702 bytes
69 packets of other sizes

Tobias
 
Pascal Hambourg

 
      01-12-2009, 02:08 PM
Hello,

Tobias Skytte wrote:
>
> While speedtesting various MTU sizes (1500 and 9000) I noticed that
> packet lengths, as reported by tcpdump, are varying in size from MTU
> size (1514 and 9014 bytes, up to 62702(!) bytes).
> When the length is over MTU size (e.g. 62702 bytes), the receiving
> machine sends back a lot of 66byte ACKs, before receiving the next
> packet.
> Whats up with this? how can it have a packet size greater than MTU??


Could it be caused by the NIC doing TSO (TCP segmentation offload)?
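
One way to check is `ethtool -k <interface>`, which lists the offload settings, and `ethtool -K <interface> tso off`, which turns TSO off for a comparison run. The sketch below only parses such a listing. The sample text is typed by hand for illustration (it is not output from the machines in this thread), the interface name eth0 is an assumption, and the exact label wording differs between ethtool versions, which is why spaces and hyphens are treated alike.

```python
def parse_offload_settings(ethtool_output):
    """Parse 'name: on/off' lines as printed by `ethtool -k <iface>`."""
    settings = {}
    for line in ethtool_output.splitlines():
        name, sep, value = line.partition(":")
        value = value.strip()
        # Skip the heading line and anything that is not an on/off setting.
        if sep and value.split(" ")[0] in ("on", "off"):
            key = name.strip().replace(" ", "-")
            settings[key] = value.startswith("on")
    return settings


# Hand-written sample, NOT real output from the poster's machines.
SAMPLE = """\
Offload parameters for eth0:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: on
"""

settings = parse_offload_settings(SAMPLE)
print("TSO enabled:", settings.get("tcp-segmentation-offload"))
```

If the oversized frames disappear from the sender's capture after switching TSO off, that would point to the offload as the cause.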
 
Rick Jones

 
      01-12-2009, 05:07 PM
Pascal Hambourg <boite-a-(E-Mail Removed)> wrote:
> Hello,


> Tobias Skytte wrote:
> >
> > While speedtesting various MTU sizes (1500 and 9000) I noticed that
> > packet lengths, as reported by tcpdump, are varying in size from MTU
> > size (1514 and 9014 bytes, up to 62702(!) bytes).
> > When the length is over MTU size (e.g. 62702 bytes), the receiving
> > machine sends back a lot of 66byte ACKs, before receiving the next
> > packet.
> > Whats up with this? how can it have a packet size greater than MTU??


> Could it be caused by the NIC doing TSO (TCP segmentation offload) ?


Most likely, and if one were capturing the entire packet (a full
snaplen), tcpdump would probably report a botched checksum on the
sending side too, thanks to CKO (checksum offload).

Packet tracing on the sending system takes place _before_ the
packet(s) make it to the wire. On the wire, the packets will be the
"correct" size and should have the correct checksum.

If what you want to see is what is actually on the wire, you need to
trace with a third system that is not part of the conversation, which
means some tricks such as configuring monitor ports on switches.
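
The frame sizes in the posted trace fit this explanation. A minimal sketch of the arithmetic, assuming a 14-byte Ethernet header, a 20-byte IPv4 header without options, and a 32-byte TCP header (20 bytes plus the 12 bytes of nop,nop,timestamp options visible in the capture); the function name is made up for illustration:

```python
ETH_HEADER = 14   # Ethernet II header, no VLAN tag (assumption)
IP_HEADER = 20    # IPv4 header without options (assumption)
TCP_HEADER = 32   # 20-byte TCP header + 12 bytes of nop,nop,timestamp


def wire_segments(captured_len, mtu):
    """Split one frame seen by tcpdump on the sender into the TCP payload
    sizes that would go on the wire if the NIC segments it (TSO)."""
    mss = mtu - IP_HEADER - TCP_HEADER
    payload = captured_len - (ETH_HEADER + IP_HEADER + TCP_HEADER)
    full, rest = divmod(payload, mss)
    return [mss] * full + ([rest] if rest else [])


for length in (9014, 17962, 35858, 53754, 62702):
    segments = wire_segments(length, 9000)
    print(length, "->", len(segments), "segment(s) of", segments[0], "bytes")
```

Under these assumptions each large frame in the capture carries a whole number of 8948-byte segments (seven of them for the 62702-byte frames), which is what one would expect if the stack hands the NIC one large buffer and the NIC cuts it into MTU-sized frames.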

rick jones
--
portable adj, code that compiles under more than one compiler
these opinions are mine, all mine; HP might not want them anyway...
feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...
 
Blah Blah Blah

 
      01-12-2009, 06:48 PM
On Mon, 12 Jan 2009 18:07:13 +0000, Rick Jones faxed us with....

> Pascal Hambourg <boite-a-(E-Mail Removed)> wrote:
>> Hello,

>
>> Tobias Skytte wrote:
>> >
>> > While speedtesting various MTU sizes (1500 and 9000) I noticed that
>> > packet lengths, as reported by tcpdump, are varying in size from MTU
>> > size (1514 and 9014 bytes, up to 62702(!) bytes). When the length is
>> > over MTU size (e.g. 62702 bytes), the receiving machine sends back a
>> > lot of 66byte ACKs, before receiving the next packet.
>> > Whats up with this? how can it have a packet size greater than MTU??

>
>> Could it be caused by the NIC doing TSO (TCP segmentation offload) ?

>
> Most likely, and if one were snapping the entire send tcpdump would
> probably report a botched checksum too, thanks to CKO
>
> Packet tracing on the sending system takes-place _before_ the packet(s)
> make it to the wire - on the wire, the packets will be the "correct"
> size and should have the correct checksum.
>
> If what you want to see is the on the wire stuff, you need to trace with
> a third system that is not part of any conversations - and perform some
> tricks with configuring monitor ports on switches and whatnot.
>
> rick jone


This made an interesting read. I was thinking Path MTU Discovery myself,
but this is much more interesting. Can we expand on this a bit?

--
Replica Watches - TRY LIDL - Cheap meds? Visit your GP
 
Rick Jones

 
      01-12-2009, 09:24 PM
Blah Blah Blah <(E-Mail Removed)> wrote:
> On Mon, 12 Jan 2009 18:07:13 +0000, Rick Jones faxed us with....


> > Most likely, and if one were snapping the entire send tcpdump
> > would probably report a botched checksum too, thanks to CKO
> >
> > Packet tracing on the sending system takes-place _before_ the
> > packet(s) make it to the wire - on the wire, the packets will be
> > the "correct" size and should have the correct checksum.
> >
> > If what you want to see is the on the wire stuff, you need to
> > trace with a third system that is not part of any conversations -
> > and perform some tricks with configuring monitor ports on switches
> > and whatnot.
> >
> > rick jone


> This made an interesting read. I was thinking Path MTU Discovery
> myself - but this much more interesting. Can we expand on this a
> bit?


I suppose - in which direction do you seek to see it expand?

rick jones
--
The glass is neither half-empty nor half-full. The glass has a leak.
The real question is "Can it be patched?"
these opinions are mine, all mine; HP might not want them anyway...
feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...
 
Blah Blah Blah

 
      01-13-2009, 08:43 AM
On Mon, 12 Jan 2009 22:24:21 +0000, Rick Jones faxed us with....

> Blah Blah Blah <(E-Mail Removed)> wrote:
>> On Mon, 12 Jan 2009 18:07:13 +0000, Rick Jones faxed us with....

>
>> > Most likely, and if one were snapping the entire send tcpdump would
>> > probably report a botched checksum too, thanks to CKO
>> >
>> > Packet tracing on the sending system takes-place _before_ the
>> > packet(s) make it to the wire - on the wire, the packets will be the
>> > "correct" size and should have the correct checksum.
>> >
>> > If what you want to see is the on the wire stuff, you need to trace
>> > with a third system that is not part of any conversations - and
>> > perform some tricks with configuring monitor ports on switches and
>> > whatnot.
>> >
>> > rick jone

>
>> This made an interesting read. I was thinking Path MTU Discovery myself
>> - but this much more interesting. Can we expand on this a bit?

>
> I suppose - in which direction do you seek to see it expand?
>
> rick jones


Not needed, Rick, but thanks. A quick Google search put that one to bed.

--
Replica Watches - TRY LIDL - Cheap meds? Visit your GP
 
Pascal Hambourg

 
      01-13-2009, 01:59 PM
Maxwell Lol wrote:
> Tobias Skytte <(E-Mail Removed)> writes:
>
>> While speedtesting various MTU sizes (1500 and 9000) I noticed that
>> packet lengths, as reported by tcpdump, are varying in size from MTU
>> size (1514 and 9014 bytes, up to 62702(!) bytes).

>
> TCP and UDP packets can be fragmented into IP fragments.


AFAIK TCP tries not to send segments bigger than the path MTU allows in
order to avoid fragmentation.

> IP reassembles fragments into larger units.


In a "normal" (without offloading) data path, tcpdump sees packets
before they enter and after they leave the IP stack, so it should see
the fragments, not the reassembled datagrams.

> Are you filtering out non-TCP traffic in your TCPdump results?
> If so, you won't see the IP fragments.


Why not? The protocol number is in the IP header of each fragment, so
tcpdump knows the protocol of the datagram a fragment is part of.
 
Pascal Hambourg

 
      01-14-2009, 08:20 AM
Maxwell Lol wrote:
> Pascal Hambourg <boite-a-(E-Mail Removed)> writes:
>
>> Maxwell Lol wrote:

>
>>> Are you filtering out non-TCP traffic in your TCPdump results?
>>> If so, you won't see the IP fragments.

>>
>> Why not ? The protocol number is in the IP header of each fragment, so
>> tcpdump knows the protocol of the datagram a fragment is part of.

>
> I haven't tested this. But this is my reasoning
>
> Tcpdump prints fragments as
>
> (frag id:size@offset+)
> (frag id:size@offset)
>
> It doesn't identify the fragment as UDP, TCP or whatever.


Each fragment contains a complete IP header, and each IP header contains
the protocol number, so in /my/ reasoning nothing prevents tcpdump from
printing the protocol of a fragment. Of course it won't be able to print
other information such as the port numbers or ICMP type/code as they are
in the first (offset 0) fragment only.
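
To illustrate the point, here is a minimal sketch that decodes the fixed 20-byte IPv4 header of two fragments and reads the protocol number from each. The headers are built by hand for the demo: the addresses are placeholders, the checksum is left at zero, and the id, lengths and offsets are simply chosen to resemble a two-fragment UDP datagram. This is not how tcpdump itself is implemented.

```python
import struct

IPV4_FORMAT = "!BBHHHBBH4s4s"  # fixed 20-byte IPv4 header, no options


def parse_ipv4_header(header):
    """Return the fragment-related fields of a 20-byte IPv4 header."""
    (ver_ihl, tos, total_len, ident, flags_frag,
     ttl, proto, checksum, src, dst) = struct.unpack(IPV4_FORMAT, header[:20])
    return {
        "id": ident,
        "total_length": total_len,
        "protocol": proto,                           # present in EVERY fragment
        "more_fragments": bool(flags_frag & 0x2000),  # MF flag
        "offset": (flags_frag & 0x1FFF) * 8,          # stored in 8-byte units
    }


def build_ipv4_header(ident, total_len, proto, more_fragments, offset):
    """Build a demo header; addresses are placeholders, checksum is 0."""
    flags_frag = (0x2000 if more_fragments else 0) | (offset // 8)
    return struct.pack(IPV4_FORMAT, 0x45, 0, total_len, ident, flags_frag,
                       1, proto, 0, bytes([192, 0, 2, 1]), bytes([192, 0, 2, 2]))


first = parse_ipv4_header(build_ipv4_header(35011, 1460, 17, True, 0))
second = parse_ipv4_header(build_ipv4_header(35011, 60, 17, False, 1440))
for frag in (first, second):
    print(frag)
```

Both headers decode to protocol 17 (UDP), even though only the fragment at offset 0 would carry the UDP header with the port numbers.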

> Checking the source, the frag printing routine is in print-ip.c
> and not in print-tcp.c or print-udp.c
>
> Also looking at print-ip.c it has
>
> switch (ipds->nh) {
>
> ---------------[snip]-------------
>     case IPPROTO_TCP:
>         /* pass on the MF bit plus the offset to detect fragments */
>         tcp_print(ipds->cp, ipds->len, (const u_char *)ipds->ip,
>                   ipds->off & (IP_MF|IP_OFFMASK));
>         break;
>
>     case IPPROTO_UDP:
>         /* pass on the MF bit plus the offset to detect fragments */
>         udp_print(ipds->cp, ipds->len, (const u_char *)ipds->ip,
>                   ipds->off & (IP_MF|IP_OFFMASK));
>         break;
> ---------------[snip]-------------
>     case IPPROTO_IPV4:
>         /* DVMRP multicast tunnel (ip-in-ip encapsulation) */
>         ip_print(gndo, ipds->cp, ipds->len);
>         if (! vflag) {
>             ND_PRINT((ndo, " (ipip-proto-4)"));
>             return;
>         }
>         break;


Hmm, it looks like IPPROTO_IPV4 is not the raw IP protocol but IPIP
tunneling encapsulation (protocol number 4?).

> Which tells me that when you use "tcp" as a filter, "ip" is not
> printed (unless you say "tcp and ip")


I cannot easily test with TCP because the MSS limits the size of TCP
segments, but I tested with UDP and ICMP traceroute sending packets of
1500 octets over a link with MTU set to 1460:

zenith:~# tcpdump -ntvi ppp0 udp and host y.y.y.y
tcpdump: listening on ppp0, link-type LINUX_SLL (Linux cooked), capture size 96 bytes
IP (tos 0x0, ttl 1, id 35011, offset 0, flags [+], proto: UDP (17), length: 1460) x.x.x.x.35007 > y.y.y.y.33438: UDP, length 1472
IP (tos 0x0, ttl 1, id 35011, offset 1440, flags [none], proto: UDP (17), length: 60) x.x.x.x > y.y.y.y: udp

zenith:~# tcpdump -ntvi ppp0 icmp and host y.y.y.y
tcpdump: listening on ppp0, link-type LINUX_SLL (Linux cooked), capture size 96 bytes
IP (tos 0x0, ttl 1, id 35013, offset 0, flags [+], proto: ICMP (1), length: 1460) x.x.x.x > y.y.y.y: ICMP echo request, id 35009, seq 4, length 1440
IP (tos 0x0, ttl 1, id 35013, offset 1440, flags [none], proto: ICMP (1), length: 60) x.x.x.x > y.y.y.y: icmp
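
The offsets and lengths in these captures can be reproduced with a little arithmetic. A minimal sketch, assuming a 20-byte IPv4 header without options and the rule that every fragment except the last carries a multiple of 8 payload bytes; the function name is made up for illustration:

```python
def fragment(total_len, mtu, header_len=20):
    """Return (offset, total_length, more_fragments) for each IPv4 fragment
    of a datagram of total_len bytes sent over a link with the given MTU."""
    payload = total_len - header_len
    # Every fragment except the last must carry a multiple of 8 bytes.
    per_fragment = (mtu - header_len) // 8 * 8
    fragments = []
    offset = 0
    while offset < payload:
        size = min(per_fragment, payload - offset)
        more = offset + size < payload
        fragments.append((offset, size + header_len, more))
        offset += size
    return fragments


for offset, length, more in fragment(1500, 1460):
    print("offset", offset, "length", length, "flags", "[+]" if more else "[none]")
```

For a 1500-octet datagram and an MTU of 1460 this gives one fragment of 1460 bytes at offset 0 with the "more fragments" flag set and one of 60 bytes at offset 1440, matching the two lines tcpdump printed in each capture.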
 