Networking Forums


NFS mount hangs when using one specific IP

 
 
Michael Ritzert
Guest
Posts: n/a

 
      05-19-2005, 01:08 PM
Hi all,

I'm setting up a FC3 system to use automount to mount the home
directories. We are required to register MAC addresses for all computers
we use and only get a fixed IP assigned once this process is completed.
I usually set up the computer before I have the IP, so I use the MAC and
IP address of a computer that is currently turned off (standing in my
office, so there is no chance of it being turned on).

This time, again, I proceeded this way and everything was as expected.
Today I received the IP registration, changed IP and MAC and all of a
sudden, I can no longer mount anything from our server. The mount
process just hangs. On the server side the log says that the mount
request has been authenticated.
Listing the mounts on the client does not list the mount I'm trying to
establish.

My first guess was that I somehow managed to configure the server to
restrict access to a specific subnet on another level (since NFS itself
says the mount is OK). The old IP is .203, the new one .210, so I'm
stepping out of a /28 subnet. However, the mount works from .208.
The only IP that doesn't work is just the one that I have to use... I'm
changing nothing but the IP and MAC addresses; firewall, SELinux, etc.
settings all stay the same.
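To make the /28 argument concrete, here is a minimal sketch using Python's standard `ipaddress` module. The `1.2.3.` prefix is the placeholder that appears in the sanitized tcpdump output further down, not the real network, and `same_subnet` is a hypothetical helper.

```python
import ipaddress

def same_subnet(a: str, b: str, prefix: int = 28) -> bool:
    """Return True if addresses a and b fall into the same /prefix network."""
    net = ipaddress.ip_network(f"{a}/{prefix}", strict=False)
    return ipaddress.ip_address(b) in net

# 1.2.3.x is the placeholder prefix used in the sanitized tcpdump output.
for host in ("1.2.3.203", "1.2.3.208", "1.2.3.210"):
    print(host, "->", ipaddress.ip_network(f"{host}/28", strict=False))

print(same_subnet("1.2.3.208", "1.2.3.210"))  # True: the working .208 shares .210's /28
print(same_subnet("1.2.3.203", "1.2.3.210"))  # False
```

Since .208 and .210 land in the same /28 (1.2.3.208/28) while .203 sits in 1.2.3.192/28, a restriction based purely on /28 boundaries would not explain why .208 works and .210 does not.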

All other traffic between the two hosts (ping, ssh, NTP) is just fine,
it's just NFS that's freaking out.

Does anybody have an idea where else I might look? I'm really lost here.
I can provide the tcpdump sniff on request.

Michael
 
Michael Ritzert

 
      05-19-2005, 01:29 PM
Forgot to mention: When I configure another computer on the IP in
question, it also doesn't work. mount just hangs and can't be killed.
So it's really IP specific.

Michael
 
Menno Duursma

 
      05-19-2005, 02:44 PM
On Thu, 19 May 2005 15:29:43 +0200, Michael Ritzert wrote:

> Forgot to mention: When I configure another computer on the IP in
> question, it also doesn't work. mount just hangs and can't be killed.
> So it's really IP specific.


Sounds like a portmapper problem. How are hosts.{allow,deny} configured?
(Remember that for rpc.portmap you can only use IP address lists or "ALL".)

tcpdchk
tcpdmatch

I'd check both client and server settings, and stuff like:

rpcinfo
nfsstat
showmount

HTH.

--
-Menno.

 
Michael Ritzert

 
      05-19-2005, 06:58 PM
Menno Duursma wrote:
> On Thu, 19 May 2005 15:29:43 +0200, Michael Ritzert wrote:
>
>> Forgot to mention: When I configure another computer on the IP in
>> question, it also doesn't work. mount just hangs and can't be killed.
>> So it's really IP specific.

>
> Sounds like a portmapper problem. How are hosts.{allow,deny} configured?
> (Remember that for rpc.portmap you can only use IP address lists or "ALL".)


hosts.allow is empty, hosts.deny contains only
http-rman : ALL EXCEPT LOCAL
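The tcp_wrappers order described in hosts_access(5) is: a match in hosts.allow grants access; otherwise a match in hosts.deny denies it; otherwise access is granted. The following is a much-simplified sketch of that decision, assuming only the patterns ALL, LOCAL (host names without a dot), a trailing-dot address prefix such as "1.2.3.", exact names and one EXCEPT clause; `match_list` and `access_granted` are hypothetical helpers, not libwrap functions.

```python
def match_list(patterns: list[str], item: str) -> bool:
    """Very simplified tcp_wrappers list matching (hypothetical helper)."""
    if "EXCEPT" in patterns:
        idx = patterns.index("EXCEPT")
        # "list_1 EXCEPT list_2": matches list_1 unless it also matches list_2.
        return match_list(patterns[:idx], item) and not match_list(patterns[idx + 1:], item)
    for pat in patterns:
        if pat == "ALL":
            return True
        if pat == "LOCAL" and "." not in item:
            return True
        if pat.endswith(".") and item.startswith(pat):
            return True
        if pat == item:
            return True
    return False

Rule = tuple[list[str], list[str]]  # (daemon patterns, client patterns)

def access_granted(allow: list[Rule], deny: list[Rule], daemon: str, client: str) -> bool:
    # 1. A match in hosts.allow grants access.
    for daemons, clients in allow:
        if match_list(daemons, daemon) and match_list(clients, client):
            return True
    # 2. Otherwise, a match in hosts.deny denies access.
    for daemons, clients in deny:
        if match_list(daemons, daemon) and match_list(clients, client):
            return False
    # 3. Otherwise, access is granted.
    return True

# The configuration above: empty hosts.allow, a single hosts.deny rule.
allow: list[Rule] = []
deny: list[Rule] = [(["http-rman"], ["ALL", "EXCEPT", "LOCAL"])]
print(access_granted(allow, deny, "portmap", "1.2.3.210"))  # True
```

Under these rules the only daemon restricted is http-rman, so portmap requests from .210 fall through to the default "grant", consistent with the later conclusion that tcp_wrappers are not the problem.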

I grepped all of /etc for our IP address prefix and the IP in question and
found nothing of interest.

> I'd check both client and server settings, and stuff like:
>
> rpcinfo
> nfsstat
> showmount


run on the client:
# rpcinfo -t server nfs
program 100003 version 2 ready and waiting
program 100003 version 3 ready and waiting
# rpcinfo -u server nfs
program 100003 version 2 ready and waiting
program 100003 version 3 ready and waiting

# rpcinfo -t server mount
program 100005 version 1 ready and waiting
program 100005 version 2 ready and waiting
program 100005 version 3 ready and waiting
# rpcinfo -u server mount
program 100005 version 1 ready and waiting
program 100005 version 2 ready and waiting
program 100005 version 3 ready and waiting

# rpcinfo -p server
100000 2 tcp 111 portmapper
100000 2 udp 111 portmapper
100003 2 udp 2049 nfs
100003 3 udp 2049 nfs
100227 3 udp 2049 nfs_acl
100003 2 tcp 2049 nfs
100003 3 tcp 2049 nfs
100227 3 tcp 2049 nfs_acl
100021 1 udp 32771 nlockmgr
100021 3 udp 32771 nlockmgr
100021 4 udp 32771 nlockmgr
100024 1 udp 32771 status
100021 1 tcp 32776 nlockmgr
100021 3 tcp 32776 nlockmgr
100021 4 tcp 32776 nlockmgr
100024 1 tcp 32776 status
100005 1 udp 1010 mountd
100005 1 tcp 1013 mountd
100005 2 udp 1010 mountd
100005 2 tcp 1013 mountd
100005 3 udp 1010 mountd
100005 3 tcp 1013 mountd
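Each `rpcinfo -p` line lists program, version, protocol, port and service name. A small sketch that groups the advertised ports per service makes it easy to compare them with the ports the client actually connects to in the dump further down (2049 for nfs, 1013 for mountd over TCP). It assumes the five-column layout shown above and skips any header line; `parse_rpcinfo` is a hypothetical helper.

```python
from collections import defaultdict

def parse_rpcinfo(text: str) -> dict[str, set[tuple[str, int]]]:
    """Map each RPC service name to its (protocol, port) pairs."""
    services: dict[str, set[tuple[str, int]]] = defaultdict(set)
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 5 or not fields[0].isdigit():
            continue  # skip headers and blank lines
        _prog, _vers, proto, port, name = fields
        services[name].add((proto, int(port)))
    return dict(services)

sample = """\
100003 3 tcp 2049 nfs
100005 3 udp 1010 mountd
100005 3 tcp 1013 mountd
"""
ports = parse_rpcinfo(sample)
print(sorted(ports["mountd"]))  # [('tcp', 1013), ('udp', 1010)]
```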

# showmount -e server
Export list for server:
[...]
/home client
[...]

after the mount:
# nfsstat -c
Client rpc stats:
calls retrans authrefrsh
1 0 0
Client nfs v3:
null getattr setattr lookup access readlink
0 0% 0 0% 0 0% 0 0% 0 0% 0 0%
read write create mkdir symlink mknod
0 0% 0 0% 0 0% 0 0% 0 0% 0 0%
remove rmdir rename link readdir readdirplus
0 0% 0 0% 0 0% 0 0% 0 0% 0 0%
fsstat fsinfo pathconf commit
0 0% 1 100% 0 0% 0 0%
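In `nfsstat -c` output, each row of operation names is followed by a row of "count percentage" pairs. A sketch that pairs them shows at a glance that the only call issued was fsinfo. It assumes the whitespace layout shown above (real nfsstat formatting differs between versions); `parse_nfsstat_block` is a hypothetical helper.

```python
def parse_nfsstat_block(lines: list[str]) -> dict[str, int]:
    """Pair each row of operation names with the following row of
    'count pct%' values, as in the nfsstat -c v3 block above."""
    counts: dict[str, int] = {}
    for names_line, values_line in zip(lines[0::2], lines[1::2]):
        names = names_line.split()
        values = values_line.split()
        # Values come in pairs (count, percentage); keep only the counts.
        for name, count in zip(names, values[0::2]):
            counts[name] = int(count)
    return counts

block = [
    "null getattr setattr lookup access readlink",
    "0 0% 0 0% 0 0% 0 0% 0 0% 0 0%",
    "fsstat fsinfo pathconf commit",
    "0 0% 1 100% 0 0% 0 0%",
]
stats = parse_nfsstat_block(block)
print({op: n for op, n in stats.items() if n})  # {'fsinfo': 1}
```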


In the meantime, I installed tcpdump and let it trace the traffic between
the two hosts. In my analysis I can see that the server receives an FSINFO
call, but never answers. That fits the nfsstat output above.
When I perform a mount from another host, the initial sequence is the same
(down to the relative packet numbers), but the server sends the expected
reply to the FSINFO call.

BTW: Should the incorrect checksums sent by the server bother me? Or does
the NIC correct these?

Another idea I have is to set up a second NFS server and see if I can mount
a directory from this one.

Michael

tcpdump output (I stripped all the ACK, SYN, FIN stuff):

No. Time Source Destination Protocol Info
4 0.000148 1.2.3.210 1.2.3.111 Portmap V2 GETPORT Call (Reply In 6) NFS(100003) V:3 TCP

Frame 4 (126 bytes on wire, 126 bytes captured)
Ethernet II, Src: 12:34:56:78:9a:bc, Dst: fe:dc:ba:98:76:54
Internet Protocol, Src Addr: 1.2.3.210 (1.2.3.210), Dst Addr: 1.2.3.111 (1.2.3.111)
Transmission Control Protocol, Src Port: 32781 (32781), Dst Port: sunrpc (111), Seq: 1, Ack: 1, Len: 60
Source port: 32781 (32781)
Destination port: sunrpc (111)
Sequence number: 1 (relative sequence number)
Next sequence number: 61 (relative sequence number)
Acknowledgement number: 1 (relative ack number)
Header length: 32 bytes
Flags: 0x0018 (PSH, ACK)
Window size: 5840 (scaled)
Checksum: 0xc51f (correct)
Options: (12 bytes)
Remote Procedure Call, Type:Call XID:0x295f366c
Portmap GETPORT Call NFS(100003) Version:3 TCP

No. Time Source Destination Protocol Info
6 0.000356 1.2.3.111 1.2.3.210 Portmap V2 GETPORT Reply (Call In 4) Port:2049

Frame 6 (98 bytes on wire, 98 bytes captured)
Ethernet II, Src: fe:dc:ba:98:76:54, Dst: 12:34:56:78:9a:bc
Internet Protocol, Src Addr: 1.2.3.111 (1.2.3.111), Dst Addr: 1.2.3.210 (1.2.3.210)
Transmission Control Protocol, Src Port: sunrpc (111), Dst Port: 32781 (32781), Seq: 1, Ack: 61, Len: 32
Source port: sunrpc (111)
Destination port: 32781 (32781)
Sequence number: 1 (relative sequence number)
Next sequence number: 33 (relative sequence number)
Acknowledgement number: 61 (relative ack number)
Header length: 32 bytes
Flags: 0x0018 (PSH, ACK)
Window size: 5792 (scaled)
Checksum: 0x933b (incorrect, should be 0xca7b)
Options: (12 bytes)
Remote Procedure Call, Type:Reply XID:0x295f366c
Portmap GETPORT Reply Port:2049 Port:2049

No. Time Source Destination Protocol Info
14 0.000660 1.2.3.210 1.2.3.111 NFS V3 NULL Call (Reply In 16)

Frame 14 (110 bytes on wire, 110 bytes captured)
Ethernet II, Src: 12:34:56:78:9a:bc, Dst: fe:dc:ba:98:76:54
Internet Protocol, Src Addr: 1.2.3.210 (1.2.3.210), Dst Addr: 1.2.3.111 (1.2.3.111)
Transmission Control Protocol, Src Port: 32782 (32782), Dst Port: 2049 (2049), Seq: 1, Ack: 1, Len: 44
Source port: 32782 (32782)
Destination port: 2049 (2049)
Sequence number: 1 (relative sequence number)
Next sequence number: 45 (relative sequence number)
Acknowledgement number: 1 (relative ack number)
Header length: 32 bytes
Flags: 0x0018 (PSH, ACK)
Window size: 5840 (scaled)
Checksum: 0x1c2c (correct)
Options: (12 bytes)
Remote Procedure Call, Type:Call XID:0x453d111d
Network File System, NULL Call

No. Time Source Destination Protocol Info
16 0.000688 1.2.3.111 1.2.3.210 NFS V3 NULL Reply (Call In 14)

Frame 16 (94 bytes on wire, 94 bytes captured)
Ethernet II, Src: fe:dc:ba:98:76:54, Dst: 12:34:56:78:9a:bc
Internet Protocol, Src Addr: 1.2.3.111 (1.2.3.111), Dst Addr: 1.2.3.210 (1.2.3.210)
Transmission Control Protocol, Src Port: 2049 (2049), Dst Port: 32782 (32782), Seq: 1, Ack: 45, Len: 28
Source port: 2049 (2049)
Destination port: 32782 (32782)
Sequence number: 1 (relative sequence number)
Next sequence number: 29 (relative sequence number)
Acknowledgement number: 45 (relative ack number)
Header length: 32 bytes
Flags: 0x0018 (PSH, ACK)
Window size: 5792 (scaled)
Checksum: 0x9337 (incorrect, should be 0xa2d3)
Options: (12 bytes)
Remote Procedure Call, Type:Reply XID:0x453d111d
Network File System, NULL Reply

No. Time Source Destination Protocol Info
24 0.000971 1.2.3.210 1.2.3.111 Portmap V2 GETPORT Call (Reply In 26) MOUNT(100005) V:3 TCP

Frame 24 (126 bytes on wire, 126 bytes captured)
Ethernet II, Src: 12:34:56:78:9a:bc, Dst: fe:dc:ba:98:76:54
Internet Protocol, Src Addr: 1.2.3.210 (1.2.3.210), Dst Addr: 1.2.3.111 (1.2.3.111)
Transmission Control Protocol, Src Port: 32783 (32783), Dst Port: sunrpc (111), Seq: 1, Ack: 1, Len: 60
Source port: 32783 (32783)
Destination port: sunrpc (111)
Sequence number: 1 (relative sequence number)
Next sequence number: 61 (relative sequence number)
Acknowledgement number: 1 (relative ack number)
Header length: 32 bytes
Flags: 0x0018 (PSH, ACK)
Window size: 5840 (scaled)
Checksum: 0xc632 (correct)
Options: (12 bytes)
Remote Procedure Call, Type:Call XID:0x5517656d
Portmap GETPORT Call MOUNT(100005) Version:3 TCP

No. Time Source Destination Protocol Info
26 0.001105 1.2.3.111 1.2.3.210 Portmap V2 GETPORT Reply (Call In 24) Port:1013

Frame 26 (98 bytes on wire, 98 bytes captured)
Ethernet II, Src: fe:dc:ba:98:76:54, Dst: 12:34:56:78:9a:bc
Internet Protocol, Src Addr: 1.2.3.111 (1.2.3.111), Dst Addr: 1.2.3.210 (1.2.3.210)
Transmission Control Protocol, Src Port: sunrpc (111), Dst Port: 32783 (32783), Seq: 1, Ack: 61, Len: 32
Source port: sunrpc (111)
Destination port: 32783 (32783)
Sequence number: 1 (relative sequence number)
Next sequence number: 33 (relative sequence number)
Acknowledgement number: 61 (relative ack number)
Header length: 32 bytes
Flags: 0x0018 (PSH, ACK)
Window size: 5792 (scaled)
Checksum: 0x933b (incorrect, should be 0xcf9c)
Options: (12 bytes)
Remote Procedure Call, Type:Reply XID:0x5517656d
Portmap GETPORT Reply Port:1013 Port:1013

No. Time Source Destination Protocol Info
34 0.001390 1.2.3.210 1.2.3.111 MOUNT V3 NULL Call (Reply In 36)

Frame 34 (110 bytes on wire, 110 bytes captured)
Ethernet II, Src: 12:34:56:78:9a:bc, Dst: fe:dc:ba:98:76:54
Internet Protocol, Src Addr: 1.2.3.210 (1.2.3.210), Dst Addr: 1.2.3.111 (1.2.3.111)
Transmission Control Protocol, Src Port: 32784 (32784), Dst Port: 1013 (1013), Seq: 1, Ack: 1, Len: 44
Source port: 32784 (32784)
Destination port: 1013 (1013)
Sequence number: 1 (relative sequence number)
Next sequence number: 45 (relative sequence number)
Acknowledgement number: 1 (relative ack number)
Header length: 32 bytes
Flags: 0x0018 (PSH, ACK)
Window size: 5840 (scaled)
Checksum: 0xb835 (correct)
Options: (12 bytes)
Remote Procedure Call, Type:Call XID:0x2e424981
Mount Service

No. Time Source Destination Protocol Info
36 0.001628 1.2.3.111 1.2.3.210 MOUNT V3 NULL Reply (Call In 34)

Frame 36 (94 bytes on wire, 94 bytes captured)
Ethernet II, Src: fe:dc:ba:98:76:54, Dst: 12:34:56:78:9a:bc
Internet Protocol, Src Addr: 1.2.3.111 (1.2.3.111), Dst Addr: 1.2.3.210 (1.2.3.210)
Transmission Control Protocol, Src Port: 1013 (1013), Dst Port: 32784 (32784), Seq: 1, Ack: 45, Len: 28
Source port: 1013 (1013)
Destination port: 32784 (32784)
Sequence number: 1 (relative sequence number)
Next sequence number: 29 (relative sequence number)
Acknowledgement number: 45 (relative ack number)
Header length: 32 bytes
Flags: 0x0018 (PSH, ACK)
Window size: 5792 (scaled)
Checksum: 0x9337 (incorrect, should be 0x3ee0)
Options: (12 bytes)
Remote Procedure Call, Type:Reply XID:0x2e424981
Mount Service

No. Time Source Destination Protocol Info
44 0.001961 1.2.3.210 1.2.3.111 MOUNT V3 MNT Call (Reply In 46)

Frame 44 (198 bytes on wire, 198 bytes captured)
Ethernet II, Src: 12:34:56:78:9a:bc, Dst: fe:dc:ba:98:76:54
Internet Protocol, Src Addr: 1.2.3.210 (1.2.3.210), Dst Addr: 1.2.3.111 (1.2.3.111)
Transmission Control Protocol, Src Port: 905 (905), Dst Port: 1013 (1013), Seq: 1, Ack: 1, Len: 132
Source port: 905 (905)
Destination port: 1013 (1013)
Sequence number: 1 (relative sequence number)
Next sequence number: 133 (relative sequence number)
Acknowledgement number: 1 (relative ack number)
Header length: 32 bytes
Flags: 0x0018 (PSH, ACK)
Window size: 5840 (scaled)
Checksum: 0xb4bb (correct)
Options: (12 bytes)
Remote Procedure Call, Type:Call XID:0x5f175b37
Mount Service

No. Time Source Destination Protocol Info
46 0.007929 1.2.3.111 1.2.3.210 MOUNT V3 MNT Reply (Call In 44)

Frame 46 (126 bytes on wire, 126 bytes captured)
Ethernet II, Src: fe:dc:ba:98:76:54, Dst: 12:34:56:78:9a:bc
Internet Protocol, Src Addr: 1.2.3.111 (1.2.3.111), Dst Addr: 1.2.3.210 (1.2.3.210)
Transmission Control Protocol, Src Port: 1013 (1013), Dst Port: 905 (905), Seq: 1, Ack: 133, Len: 60
Source port: 1013 (1013)
Destination port: 905 (905)
Sequence number: 1 (relative sequence number)
Next sequence number: 61 (relative sequence number)
Acknowledgement number: 133 (relative ack number)
Header length: 32 bytes
Flags: 0x0018 (PSH, ACK)
Window size: 5792 (scaled)
Checksum: 0x9357 (incorrect, should be 0x1744)
Options: (12 bytes)
Remote Procedure Call, Type:Reply XID:0x5f175b37
Mount Service

No. Time Source Destination Protocol Info
57 3.161871 1.2.3.210 1.2.3.111 NFS V3 FSINFO Call, FH:0x03fa0008

Frame 57 (202 bytes on wire, 202 bytes captured)
Ethernet II, Src: 12:34:56:78:9a:bc, Dst: fe:dc:ba:98:76:54
Internet Protocol, Src Addr: 1.2.3.210 (1.2.3.210), Dst Addr: 1.2.3.111 (1.2.3.111)
Transmission Control Protocol, Src Port: 800 (800), Dst Port: 2049 (2049), Seq: 1, Ack: 1, Len: 136
Source port: 800 (800)
Destination port: 2049 (2049)
Sequence number: 1 (relative sequence number)
Next sequence number: 137 (relative sequence number)
Acknowledgement number: 1 (relative ack number)
Header length: 32 bytes
Flags: 0x0018 (PSH, ACK)
Window size: 5840 (scaled)
Checksum: 0x81c4 (correct)
Options: (12 bytes)
Remote Procedure Call, Type:Call XID:0x9a1b4739
Network File System, FSINFO Call DH:0x03fa0008

No. Time Source Destination Protocol Info
58 3.161879 1.2.3.111 1.2.3.210 TCP 2049 > 800 [ACK] Seq=1 Ack=137 Win=6864 Len=0 TSV=177229225 TSER=4294915050

Frame 58 (66 bytes on wire, 66 bytes captured)
Ethernet II, Src: fe:dc:ba:98:76:54, Dst: 12:34:56:78:9a:bc
Internet Protocol, Src Addr: 1.2.3.111 (1.2.3.111), Dst Addr: 1.2.3.210 (1.2.3.210)
Transmission Control Protocol, Src Port: 2049 (2049), Dst Port: 800 (800), Seq: 1, Ack: 137, Len: 0
Source port: 2049 (2049)
Destination port: 800 (800)
Sequence number: 1 (relative sequence number)
Acknowledgement number: 137 (relative ack number)
Header length: 32 bytes
Flags: 0x0010 (ACK)
Window size: 6864 (scaled)
Checksum: 0x26fb (correct)
Options: (12 bytes)
SEQ/ACK analysis
This is an ACK to the segment in frame: 57
The RTT to ACK the segment was: 0.000008000 seconds
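Every RPC reply carries the XID of the call it answers, so unanswered calls can be found by matching XIDs. A minimal sketch, with the XIDs copied by hand from the decoded frames above; the (frame, type, XID) tuple layout is a hypothetical format, not something tcpdump or Ethereal exports.

```python
# (frame number, message type, XID) as read off the decoded frames above.
rpc_messages = [
    (4, "call", 0x295F366C), (6, "reply", 0x295F366C),
    (14, "call", 0x453D111D), (16, "reply", 0x453D111D),
    (24, "call", 0x5517656D), (26, "reply", 0x5517656D),
    (34, "call", 0x2E424981), (36, "reply", 0x2E424981),
    (44, "call", 0x5F175B37), (46, "reply", 0x5F175B37),
    (57, "call", 0x9A1B4739),  # FSINFO; frame 58 is only a bare TCP ACK
]

def unanswered_calls(messages):
    """Return frame numbers of RPC calls whose XID never appears in a reply."""
    replied = {xid for _, kind, xid in messages if kind == "reply"}
    return [frame for frame, kind, xid in messages
            if kind == "call" and xid not in replied]

print(unanswered_calls(rpc_messages))  # [57]
```

Frame 58 shows the server's TCP stack acknowledging the FSINFO segment after 8 microseconds, so the request did reach the server; what never comes back is the RPC-level reply.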

 
Menno Duursma

 
      05-19-2005, 09:00 PM
On Thu, 19 May 2005 20:58:15 +0200, Michael Ritzert wrote:
> Menno Duursma wrote:
>> On Thu, 19 May 2005 15:29:43 +0200, Michael Ritzert wrote:


[ Snip - libwrap/tcpwrapper not the problem. ]

>> I'd check both client and server settings, and stuff like:


[ Snip - looking OK to me. ]

> # showmount -e server
> Export list for server:
> [...]
> /home client


Not sure but this might have to be:

/home client(rw)

Also don't forget to /etc/init.d/nfsd restart (or similar.)
And look at: sudo tail /var/log/{messages,secure,syslog}
Just for kicks /usr/sbin/exportfs -f -v

> after the mount:
> # nfsstat -c
> Client rpc stats:
> calls retrans authrefrsh
> 1 0 0


So it doesn't get answered: what's in the (server) logs?
BTW: how did you mount? If "-o nolock" works, you'll know it's a
locking problem.

> In the meantime, I installed tcpdump and let it trace the traffic between
> the two hosts. In my analysis I can see that the server receives an FSINFO
> call, but never answers. That fits the nfsstat output above.


Ok.

> When I perform a mount from another host, the initial sequence is the
> same (down to the relative packet numbers), but the server sends the
> expected reply to the FSINFO call.
>
> BTW: Should the incorrect checksums sent by the server bother me?


Well, I dunno, it would bother me...

> Or does the NIC correct these?


I don't think so. If anything it discards the frame and lets the
upper-layer protocol handle a resend of it, so if nothing else it's slow.
TCP does this resending. UDP does not (good for streaming media, where one
doesn't care), but applications might have their own transmission integrity
scheme on top of it.
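For reference, TCP and UDP protect their data with the 16-bit one's-complement "Internet checksum" of RFC 1071; at the TCP layer it is the receiving stack that recomputes it and drops a segment whose sum does not check out. A minimal sketch of the algorithm, using the byte sequence from the numerical example in RFC 1071; `internet_checksum` is a hypothetical helper name.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum as described in RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return ~total & 0xFFFF

# Byte sequence from the numerical example in RFC 1071 (sum 0xddf2).
data = bytes([0x00, 0x01, 0xF2, 0x03, 0xF4, 0xF5, 0xF6, 0xF7])
csum = internet_checksum(data)
print(hex(csum))  # 0x220d

# Receiver side: data plus its checksum yields 0 when nothing was corrupted.
print(internet_checksum(data + csum.to_bytes(2, "big")))  # 0
```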

> Another idea I have is to set up a second NFS server and see if I can
> mount a directory from this one.


If that's an option, good; see what you come up with. Have a blast!

> 1.2.3.111 (1.2.3.111), Dst Addr: 1.2.3.210 (1.2.3.210)


Reply: so the portmapper connection works.

> Checksum: 0x933b (incorrect, should be 0xca7b)
> Options: (12 bytes)


But the checksum sucks, so the packet gets resent:

> 1.2.3.210 (1.2.3.210), Dst Addr: 1.2.3.111 (1.2.3.111)

....
> 1.2.3.111 (1.2.3.111), Dst Addr: 1.2.3.210 (1.2.3.210)


Some more suckage ...

> Checksum: 0x9337 (incorrect, should be 0xa2d3)
> Options: (12 bytes)


Here we go again [ and some more of this. ]

> No. Time Source Destination Protocol Info
> 58 3.161879 1.2.3.111 1.2.3.210 TCP 2049 > 800 [ACK] Seq=1 Ack=137 Win=6864 Len=0 TSV=177229225 TSER=4294915050
>
> Frame 58 (66 bytes on wire, 66 bytes captured)
> Ethernet II, Src: fe:dc:ba:98:76:54, Dst: 12:34:56:78:9a:bc
> Internet Protocol, Src Addr: 1.2.3.111 (1.2.3.111), Dst Addr: 1.2.3.210 (1.2.3.210)
> Transmission Control Protocol, Src Port: 2049 (2049), Dst Port: 800 (800), Seq: 1, Ack: 137, Len: 0
> Source port: 2049 (2049)
> Destination port: 800 (800)
> Sequence number: 1 (relative sequence number)
> Acknowledgement number: 137 (relative ack number)
> Header length: 32 bytes
> Flags: 0x0010 (ACK)
> Window size: 6864 (scaled)
> Checksum: 0x26fb (correct)
> Options: (12 bytes)
> SEQ/ACK analysis
> This is an ACK to the segment in frame: 57
> The RTT to ACK the segment was: 0.000008000 seconds


OK, here the mount request does finally seem to get through all right.
So what do the logs say (timed out?)?

Are you sure the (physical) connection is all right?
What do "netstat -s" and "/sbin/ifconfig eth0" have to say about all
this? And after some "ping"?

Was the other machine (used as test object with the same IP) connected
via the same cable and wall outlet? Try another maybe...

HTH.

--
-Menno.

 
Michael Ritzert

 
      05-20-2005, 07:57 AM
Hi Menno,

thank you for all the time you're spending with my problem.

Menno Duursma wrote:

> On Thu, 19 May 2005 20:58:15 +0200, Michael Ritzert wrote:
>> # showmount -e server
>> Export list for server:
>> [...]
>> /home client

>
> Not sure but this might have to be:
>
> /home client(rw)


It's the same for a client that's working. In /etc/exports, it reads
/home client(rw,root_squash,sync)
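`showmount -e` lists only the exported path and the allowed clients, not their options, which is why "(rw)" does not appear in its output. A simplified sketch of splitting an /etc/exports entry into client and option list, assuming no quoting and no continuation lines; `parse_exports_line` is a hypothetical helper. Per exports(5), a space between the client name and the opening parenthesis changes the meaning (the options then apply to all hosts), so the entry above is written correctly.

```python
def parse_exports_line(line: str) -> tuple[str, dict[str, list[str]]]:
    """Split an /etc/exports entry into the exported path and a mapping of
    client -> option list. Simplified: no quoting, no continuation lines."""
    path, *clients = line.split()
    result: dict[str, list[str]] = {}
    for entry in clients:
        host, _, opts = entry.partition("(")
        result[host] = opts.rstrip(")").split(",") if opts else []
    return path, result

path, clients = parse_exports_line("/home client(rw,root_squash,sync)")
print(path, clients)  # /home {'client': ['rw', 'root_squash', 'sync']}
```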

> Also don't forget to /etc/init.d/nfsd restart (or similar.)


It's a SuSE system. "rcnfsserver reload" reloads the export list.
When I remove the system from the list of allowed clients, I get
mount: server:/home failed, reason given by server: Keine Berechtigung (i.e.
Permission denied). So this is Ok.

> And look at: sudo tail /var/log/{messages,secure,syslog}


OK. Client side (this shows up about 15 minutes after the server-side entry below):
May 19 21:09:46 client kernel: nfs: server server not responding, still trying
That's pretty much exactly what's happening... Nothing in the other logs.

Server side:
May 19 20:54:43 server rpc.mountd: authenticated mount request from client:899 for /home (/home)

Absolutely nothing else.

> Just for kicks /usr/sbin/exportfs -f -v


No output, no change in behavior.

>> after the mount:
>> # nfsstat -c
>> Client rpc stats:
>> calls retrans authrefrsh
>> 1 0 0

>
> So it doesn't get answered: what's in the (server) logs?


see above. Nothing worth mentioning.

> BTW: how did you mount? If "-o nolock" works, you'll know it's a
> locking problem.


The automounter uses -rw,hard,intr. Manually I use no options. Just tried
with -o nolock, but it doesn't change anything.

>> BTW: Should the incorrect checksums sent by the server bother me?

>
> Well, I dunno, it would bother me...


It's a gigabit link to the nearest switch, if that matters.

>> Or does the NIC correct these?

>
> I don't think so. If anything it discards the frame and lets the
> upper-layer protocol handle a resend of it, so if nothing else it's slow.
> TCP does this resending. UDP does not (good for streaming media, where one
> doesn't care), but applications might have their own transmission integrity
> scheme on top of it.


That sounds bad. I heard there are NICs out there that understand TCP and do
their own checksumming.
But as I mentioned, a dump for a working mount is exactly the same,
including these bad checksums.

>> Another idea I have is to set up a second NFS server and see if I can
>> mount a directory from this one.

>
> If that's an option, good; see what you come up with. Have a blast!


I will do this as soon as I physically get to the office.

>> 1.2.3.111 (1.2.3.111), Dst Addr: 1.2.3.210 (1.2.3.210)

>
> Reply: so the portmapper connection works.
>
>> Checksum: 0x933b (incorrect, should be 0xca7b)
>> Options: (12 bytes)

>
> But the checksum sucks, so the packet gets resent:
>
>> 1.2.3.210 (1.2.3.210), Dst Addr: 1.2.3.111 (1.2.3.111)

> ...
>> 1.2.3.111 (1.2.3.111), Dst Addr: 1.2.3.210 (1.2.3.210)


Are you sure these are retransmits? I believe these are negotiations with the
portmapper to get the required port numbers etc., and then the first calls to
the daemons to get things going. (Note: I skipped all the ACKs in my
message.)

> Some more suckage ...
>
>> Checksum: 0x9337 (incorrect, should be 0xa2d3)
>> Options: (12 bytes)

>
> Here we go again [ and some more of this. ]


I will do the same dump again, on the client side and see what arrives over
there.

>> No. Time Source Destination Protocol Info
>> 58 3.161879 1.2.3.111 1.2.3.210 TCP 2049 > 800 [ACK] Seq=1 Ack=137 Win=6864 Len=0 TSV=177229225 TSER=4294915050
>>
>> Frame 58 (66 bytes on wire, 66 bytes captured)
>> Ethernet II, Src: fe:dc:ba:98:76:54, Dst: 12:34:56:78:9a:bc
>> Internet Protocol, Src Addr: 1.2.3.111 (1.2.3.111), Dst Addr: 1.2.3.210 (1.2.3.210)
>> Transmission Control Protocol, Src Port: 2049 (2049), Dst Port: 800 (800), Seq: 1, Ack: 137, Len: 0
>> Source port: 2049 (2049)
>> Destination port: 800 (800)
>> Sequence number: 1 (relative sequence number)
>> Acknowledgement number: 137 (relative ack number)
>> Header length: 32 bytes
>> Flags: 0x0010 (ACK)
>> Window size: 6864 (scaled)
>> Checksum: 0x26fb (correct)
>> Options: (12 bytes)
>> SEQ/ACK analysis
>> This is an ACK to the segment in frame: 57
>> The RTT to ACK the segment was: 0.000008000 seconds

>
> OK, here the mount request does finally seem to get through all right.
> So what do the logs say (timed out?)?


Yes. Timeout on the client side.

> Are you sure the (physical) connection is all right?


yes. 8192 byte pings go through without any loss in 1.78ms. ssh is just
fine. NTP is ok (so I've both TCP and UDP covered).

> What do "netstat -s" and "/sbin/ifconfig eth0" have to say about all
> this? And after some "ping"?


ifconfig: no collisions and no bad packets on either side.
netstat -s: loads of numbers, nothing suspicious on the client side (3
TCPLossUndo, 3 TCPTimeouts; these numbers don't increase after another
try).
On the server side it's impossible to get good numbers due to the high
number of other clients + other section's traffic (I want my subnet, now!).
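The "do the numbers increase after another try?" check can be automated. On Linux, extended TCP counters such as TCPLossUndo and TCPTimeouts come from /proc/net/netstat, which holds a header line of counter names followed by a line of values per section. A sketch, assuming that layout, that diffs two snapshots taken before and after a mount attempt; the helper names are hypothetical and the sample values are the ones quoted above.

```python
def parse_proc_netstat(text: str) -> dict[str, int]:
    """Parse /proc/net/netstat-style text (a header line of names followed
    by a line of values, per section) into {"Section.Counter": value}."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    counters: dict[str, int] = {}
    for header, values in zip(lines[0::2], lines[1::2]):
        section, *names = header.split()
        _, *nums = values.split()
        for name, num in zip(names, nums):
            counters[f"{section.rstrip(':')}.{name}"] = int(num)
    return counters

def changed(before: dict[str, int], after: dict[str, int]) -> dict[str, int]:
    """Return the counters that moved between two snapshots, with their delta."""
    return {k: after[k] - before.get(k, 0)
            for k in after if after[k] != before.get(k, 0)}

# On a real host, read open("/proc/net/netstat").read() before and after.
before = parse_proc_netstat("TcpExt: TCPLossUndo TCPTimeouts\nTcpExt: 3 3\n")
after = parse_proc_netstat("TcpExt: TCPLossUndo TCPTimeouts\nTcpExt: 3 3\n")
print(changed(before, after))  # {}
```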

> Was the other machine (used as test object with the same IP) connected
> via the same cable and wall outlet? Try another maybe...


No, it's even in another office. The situation really is:
- IP .210 in any port => doesn't work
- IPs .203,.207,.208 and loads of other regular clients in any port => works

I can reproduce the tcpdump trace that hangs after the FSINFO call, so maybe it's
time to start digging into the NFS daemon. I hope I can reproduce this on a
server other than our production one...

Michael

 
Michael Ritzert

 
      05-20-2005, 01:15 PM
Michael Ritzert wrote:
>>>BTW: Should the incorrect checksums sent by the server bother me?

>>
>>Well, I dunno, it would bother me...


I just checked with a simultaneous tcpdump on both ends: The server sees
wrong checksums, but they arrive Ok on the client.
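A likely explanation for this, which the thread itself never confirms: with transmit checksum offloading, common on gigabit NICs, the kernel hands the segment to the card before the TCP checksum is final and the NIC fills it in, so a capture on the sending host shows the unfinished value while the receiver sees the correct one. On Linux, `ethtool -k` shows whether tx checksumming is enabled. The receiver verifies the checksum over an IPv4 pseudo-header (source and destination address, protocol 6, TCP length) plus the TCP segment. The sketch below shows that check on a made-up bare ACK modeled on frame 58; all helper names are hypothetical.

```python
import ipaddress
import struct

def ones_complement_sum(data: bytes) -> int:
    """Folded 16-bit one's-complement sum (RFC 1071), before inversion."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return total

def pseudo_header(src: str, dst: str, tcp_length: int) -> bytes:
    """IPv4 pseudo-header: source, destination, zero byte, protocol 6, TCP length."""
    return (ipaddress.IPv4Address(src).packed + ipaddress.IPv4Address(dst).packed
            + struct.pack("!BBH", 0, 6, tcp_length))

def tcp_checksum(src: str, dst: str, segment: bytes) -> int:
    """Sender side: value for bytes 16-17 of the header (which must be zero here)."""
    return ~ones_complement_sum(pseudo_header(src, dst, len(segment)) + segment) & 0xFFFF

def tcp_checksum_ok(src: str, dst: str, segment: bytes) -> bool:
    """Receiver side: summing everything, checksum included, gives 0xFFFF."""
    return ones_complement_sum(pseudo_header(src, dst, len(segment)) + segment) == 0xFFFF

# Hypothetical bare ACK, 2049 -> 800, 20-byte header without options, no payload.
header = struct.pack("!HHIIBBHHH", 2049, 800, 1, 137, 5 << 4, 0x10, 6864, 0, 0)
csum = tcp_checksum("1.2.3.111", "1.2.3.210", header)
segment = header[:16] + struct.pack("!H", csum) + header[18:]
bad_segment = header[:16] + struct.pack("!H", csum ^ 0x0001) + header[18:]

print(tcp_checksum_ok("1.2.3.111", "1.2.3.210", segment))      # True
print(tcp_checksum_ok("1.2.3.111", "1.2.3.210", bad_segment))  # False
```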

>
>>>Another idea I have is to set up a second NFS server and see if I can
>>>mount a directory from this one.

>>
>>If that's an option, good; see what you come up with. Have a blast!

>
>
> I will do this as soon as I physically get to the office.


Of course it works with another computer as server...
I will have to see how to debug this once most of the people around here
have left for the weekend and I can "play" with the server...

Michael
 
Menno Duursma

 
      05-20-2005, 01:21 PM
On Fri, 20 May 2005 09:57:29 +0200, Michael Ritzert wrote:

> thank you for all the time you're spending with my problem.


NP. This is taking more effort than I had hoped for, though.

> Menno Duursma wrote:
>
>> On Thu, 19 May 2005 20:58:15 +0200, Michael Ritzert wrote:


> Server side:
> May 19 20:54:43 server rpc.mountd: authenticated mount request from client:899 for /home (/home)
>
> Absolutely nothing else.


Try adding the "-d all" argument to your mountd startup line, and restart.

>> Just for kicks /usr/sbin/exportfs -f -v

>
> No output, no change in behavior.


Maybe add "-v" to the portmapper startup line and restart that too
(on both the client and server machines).

>> BTW: how did you mount? If "-o nolock" works, you'll know it's a
>> locking problem.

>
> The automounter uses -rw,hard,intr. Manually I use no options. Just tried
> with -o nolock, but it doesn't change anything.


Hmm, that should rule out the portmapper being the problem as well, as nfsd
listens on a well-known port (2049)...

>>> BTW: Should the incorrect checksums sent by the server bother me?

>>
>> Well idunno, it would bother me ...

>
> It's a gigabit link to the nearest switch, if that matters.


Well, it could (depending on its configuration). Probably the simplest
way to rule that out would be connecting directly to the server via a
crossover cable. (If that's not possible because of location, try using a
laptop configured the same way; if the server is in production 24/7, roll back a
recent backup onto a test machine and use that to play with.)

I had this be the problem once: we had NFS mounts between a SCO
and a Sun machine, and it would work with a 3com switch, but not with the
Cisco one it got replaced with (by another admin).

So, having little time, I ended up with something like this:

           SCO     Sun
              \   /
[ Cisco ] - [ 3com ]
  | | | |
other servers

And I never got around to understanding/solving the actual problem (it may
have been proxy-ARP related?), as they were to be replaced by Linux boxen
soon anyway.

>>> Or does the NIC correct these?

>>
>> I don't think so. If anything it discards the frame and lets the
>> upper-layer protocol handle a resend of it, so if nothing else it's slow.
>> TCP does this resending. UDP does not (good for streaming media, where one
>> doesn't care), but applications might have their own transmission integrity
>> scheme on top of it.

>
> That sounds bad. I heard there are NICs out there that understand TCP and do
> their own checksumming.


Probably, but I don't know any.

> But as I mentioned, a dump for a working mount is exactly the same,
> including these bad checksums.


Well, it may not be the problem at hand, but it's something to look at: try
changing the MTU size (and/or settings on the switch, e.g. if it's set up
for cut-through, try store-and-forward instead, stuff like that).
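Before experimenting with the MTU, it helps to know how the frame sizes in the dump break down. Every frame there is 14 bytes of Ethernet header, 20 bytes of IP header and 32 bytes of TCP header ("Header length: 32 bytes", i.e. 12 bytes of options) plus the TCP payload; the captured length evidently excludes the 4-byte Ethernet FCS, since the numbers add up without it. A small sketch, assuming those header sizes stay fixed; the helper names are hypothetical.

```python
ETH_HEADER = 14   # Ethernet II header (FCS not part of the captured length)
IP_HEADER = 20    # IPv4 header without options
TCP_HEADER = 32   # "Header length: 32 bytes" in the dump (20 + 12 bytes of options)

def frame_length(tcp_payload: int) -> int:
    """Captured frame size for a given TCP payload with these headers."""
    return ETH_HEADER + IP_HEADER + TCP_HEADER + tcp_payload

def max_payload(mtu: int) -> int:
    """Largest TCP payload per segment for a given IP MTU with these headers."""
    return mtu - IP_HEADER - TCP_HEADER

# (frame, bytes on wire, TCP Len) taken from the dump.
for frame, wire, payload in [(4, 126, 60), (44, 198, 132), (57, 202, 136), (58, 66, 0)]:
    assert frame_length(payload) == wire, frame

print(max_payload(1500))  # 1448
```

None of the RPC messages in the dump comes close to a 1500-byte MTU, which fits the observation that the hang occurs on a small FSINFO call rather than on a large transfer.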

>> Are you sure the (physical) connection is alright?

>
> yes. 8192 byte pings go through without any loss in 1.78ms. ssh is just
> fine. NTP is ok (so I've both TCP and UDP covered).


I'd opt for "nc" (Netcat) and "md5sum" particularly for UDP testing.
TFTP might do too.
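The nc + md5sum idea boils down to pushing known data through the link and comparing digests on both ends. A rough Python sketch of the same idea using only `socket` and `hashlib`: both ends run in one process over loopback here, whereas a real test would run the receiver on the other machine. The 1024-byte chunk size, the 1-second timeout and the helper name are assumptions for illustration.

```python
import hashlib
import os
import socket

def udp_transfer_digests(payload: bytes, chunk: int = 1024,
                         host: str = "127.0.0.1") -> tuple[str, str]:
    """Send payload in UDP datagrams and return (sent_md5, received_md5).

    A lost datagram ends the transfer early and shows up as a digest mismatch.
    """
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        receiver.bind((host, 0))  # let the OS pick a free port
        receiver.settimeout(1.0)
        addr = receiver.getsockname()
        received = hashlib.md5()
        for i in range(0, len(payload), chunk):
            sender.sendto(payload[i:i + chunk], addr)
            try:
                data, _ = receiver.recvfrom(65535)
            except socket.timeout:
                break  # datagram lost: the digests will differ
            received.update(data)
        return hashlib.md5(payload).hexdigest(), received.hexdigest()
    finally:
        sender.close()
        receiver.close()

sent, got = udp_transfer_digests(os.urandom(64 * 1024))
print("match" if sent == got else "MISMATCH")
```

Unlike a ping, this checks the content of the data end to end, not just whether packets of a given size arrive.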

>> Was the other machine (used as test object with the same IP) connected
>> via the same cable and wall outlet? Try another maybe...

>
> No, it's even in another office. The situation really is:
> - IP .210 in any port => doesn't work
> - IPs .203,.207,.208 and loads of other regular clients in any port => works


So it may be a routing, QoS or ARP problem (apart from the services)...
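For the ARP angle, one quick check on a Linux server is which MAC address its kernel currently associates with the client's IP; /proc/net/arp lists the neighbour table with columns IP address, HW type, Flags, HW address, Mask and Device. A sketch that parses that layout; the sample line reuses the placeholder IP and MAC from the dump, and `parse_proc_arp` is a hypothetical helper.

```python
def parse_proc_arp(text: str) -> dict[str, tuple[str, str]]:
    """Map IP address -> (MAC address, device) from /proc/net/arp-style text."""
    entries: dict[str, tuple[str, str]] = {}
    for line in text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) >= 6:
            ip, _hw_type, _flags, mac, _mask, device = fields[:6]
            entries[ip] = (mac, device)
    return entries

# On the server itself: parse_proc_arp(open("/proc/net/arp").read())
sample = """\
IP address       HW type     Flags       HW address            Mask     Device
1.2.3.210        0x1         0x2         12:34:56:78:9a:bc     *        eth0
"""
arp = parse_proc_arp(sample)
print(arp.get("1.2.3.210"))  # ('12:34:56:78:9a:bc', 'eth0')
```

If the MAC listed on the server differs from the client's real one, the server is still holding stale information about that IP.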

> I can reproduce the tcpdump trace that hangs after the FSINFO call, so maybe
> it's time to start digging into the NFS daemon. I hope I can reproduce
> this on a server other than our production one...


It would be nice if you could roll back a backup from the server onto some
PC for testing.

--
-Menno.

 
Michael Ritzert

 
      05-24-2005, 10:28 AM
Menno Duursma wrote:
> On Fri, 20 May 2005 09:57:29 +0200, Michael Ritzert wrote:
>>thank you for all the time you're spending with my problem.

>
> NP. This is taking more effort than I had hoped for, though.


... and all that time was spent for nothing :-( ...
I had to reboot the server for other reasons, and when I got back to
debugging this, it no longer happened. I can now happily use NFS from
the .210 IP.

I remember that the IP was previously (a few months ago) used by another
computer that has since been removed from our net. This computer was not
allowed to access the NFS mounts (it was not listed in /etc/exports). Maybe
traces of this configuration remained in memory.
We will probably never know... :-(

Thank you once again!
Michael
 