NFS statd/lockd warnings

 
 
Roger Leigh
      12-16-2004, 09:30 AM
Hello,

I'm running a simple NFS setup (one client, one server) using NFSv3 on
Debian sarge (nfs-utils 1.0.6, and Linux 2.4.27 and 2.6.8 Debian
kernels). I'm seeing the same problem with both kernel versions (plus
vanilla kernel.org 2.6.[89]).

In my logs, I keep getting this message (on both the client and server):

Dec 14 11:59:54 till2 rpc.statd[2141]: Received erroneous SM_UNMON
request from till2 for 10.0.0.203

Looking through list archives with Google, it seems that this can be
caused by:

1. /var/lib/nfs/sm* not being writable (it's writable, owned by root, and
rpc.statd runs as root)
2. Name/IP address changes. This isn't happening either.
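
The first cause above can be checked mechanically. This is a minimal sketch, assuming the state directories live under /var/lib/nfs as on the Debian system described here; the function name check_statd_dirs is made up for illustration.

```python
import glob
import os


def check_statd_dirs(base="/var/lib/nfs"):
    """Return {path: writable} for every entry matching base/sm*."""
    results = {}
    for path in sorted(glob.glob(os.path.join(base, "sm*"))):
        # os.access() tests against the real uid/gid of the caller. Root
        # normally gets True for any existing path, so run this as the
        # same user that rpc.statd runs as.
        results[path] = os.access(path, os.W_OK)
    return results


if __name__ == "__main__":
    for path, writable in check_statd_dirs().items():
        print(f"{path}: {'writable' if writable else 'NOT writable'}")
```

An empty result means nothing matched sm* under the base directory, which would itself be worth investigating.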

The file region locking works; using strace I can see that fcntl64 is
used to get and set locks, and attempts to lock the same records on both
systems give the appropriate errors in the application. I get the
SM_UNMON error if I leave the client system alone for 5 minutes with no
activity: when I next try an operation, I get the above error in the
logs on both machines. With continuous operation, I still get the error
periodically, but if I leave it alone for 5 minutes, it always occurs.
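
For anyone wanting to repeat the locking test without the application, the following is a rough sketch of a record-lock probe. It is not the application's code; the function name, the byte range, and the hold parameter are assumptions for illustration. With LOCK_NB set, Python's fcntl.lockf() issues a non-blocking F_SETLK request.

```python
import errno
import fcntl
import os
import time


def try_record_lock(path, start, length, hold=0.0):
    """Try a non-blocking exclusive lock on bytes [start, start + length).

    Returns True if the lock was granted, False if a conflicting lock is
    held elsewhere. The lock is kept for `hold` seconds and is released
    when the file is closed.
    """
    fd = os.open(path, os.O_RDWR)
    try:
        try:
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB,
                        length, start, os.SEEK_SET)
        except OSError as exc:
            # A conflicting lock is reported as EACCES or EAGAIN.
            if exc.errno in (errno.EACCES, errno.EAGAIN):
                return False
            raise
        time.sleep(hold)
        return True
    finally:
        os.close(fd)
```

Running it with a non-zero hold on one host and then with the same range on the other should return False on the second host while the first still holds the lock, which matches the behaviour described above.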

Could anyone suggest how I might debug or fix this?


Many thanks,
Roger


Server Configuration (host till1, ip 10.0.0.203):
/etc/exports:
/srv/epic *(rw,sync,secure_locks,no_root_squash,no_wdelay,no_subtree_check)
[using knfsd]

Client configuration (host till2, ip 10.0.0.204):
I use autofs4, which gives this mount:
automount(pid2099) on /var/autofs/net type autofs
(rw,fd=4,pgrp=2099,minproto=2,maxproto=4)
till1:/srv/epic on /var/autofs/net/till1/srv/epic type nfs
(rw,nosuid,nodev,hard,intr,nfsvers=3,posix,udp,rsize=8192,wsize=8192,addr=10.0.0.203)

This is the log in the server (nlm_debug is 65535):

Dec 14 12:02:25 till1 kernel: lockd: request from 0a0000cc
Dec 14 12:02:25 till1 kernel: lockd: LOCK called
Dec 14 12:02:25 till1 kernel: lockd: nlm_lookup_host(0a0000cc, p=17, v=4)
Dec 14 12:02:25 till1 kernel: lockd: host garbage collection
Dec 14 12:02:25 till1 kernel: lockd: nlmsvc_mark_resources
Dec 14 12:02:25 till1 kernel: lockd: delete host 10.0.0.204
Dec 14 12:02:25 till1 kernel: lockd: nsm_unmonitor(10.0.0.204)
Dec 14 12:02:25 till1 kernel: nsm: xdr_encode_mon(0a0000cc, -1249509120, 67108864, 268435456)
Dec 14 12:02:25 till1 rpc.statd[1421]: Received erroneous SM_UNMON request from till1 for 10.0.0.204
Dec 14 12:02:25 till1 kernel: lockd: creating host entry
Dec 14 12:02:25 till1 kernel: lockd: nsm_monitor(10.0.0.204)
Dec 14 12:02:25 till1 kernel: nsm: xdr_encode_mon(0a0000cc, -1249509120, 67108864, 268435456)
Dec 14 12:02:25 till1 kernel: nsm: xdr_decode_stat_res status 0 state 181
Dec 14 12:02:25 till1 kernel: lockd: nlm_file_lookup(01000001 0400fe00 00010001 00010804 13bb0bdb 00000000)
Dec 14 12:02:25 till1 kernel: lockd: creating file for (01000001 0400fe00 00010001 00010804 13bb0bdb 00000000)
Dec 14 12:02:25 till1 kernel: lockd: found file c51aa1d0 (count 0)
Dec 14 12:02:25 till1 kernel: lockd: nlmsvc_lock(fe04/67588, ty=1, pi=2493, 805306370-805306371, bl=0)
Dec 14 12:02:25 till1 kernel: lockd: nlmsvc_lookup_block f=c51aa1d0 pd=2493 805306370-805306371 ty=1
Dec 14 12:02:25 till1 kernel: lockd: posix_lock_file returned 0
Dec 14 12:02:25 till1 kernel: lockd: LOCK status 0
Dec 14 12:02:25 till1 kernel: lockd: release host 10.0.0.204
Dec 14 12:02:25 till1 kernel: lockd: nlm_release_file(c51aa1d0, ct = 1)
Dec 14 12:02:25 till1 kernel: nlmsvc_retry_blocked(00000000, when=0)
Dec 14 12:02:25 till1 kernel: nlmsvc_retry_blocked(00000000, when=0)
Dec 14 12:02:25 till1 kernel: lockd: request from 0a0000cc


This is the log in the client (nlm_debug is 65535, the time is slightly
behind the server):

Dec 14 11:04:55 till2 kernel: lockd: nlm_lookup_host(0a0000cb, p=17, v=4)
Dec 14 11:04:55 till2 kernel: lockd: get host 10.0.0.203
Dec 14 11:04:55 till2 kernel: lockd: call procedure 4 on 10.0.0.203
Dec 14 11:04:55 till2 kernel: lockd: nlm_bind_host(0a0000cb)
Dec 14 11:04:55 till2 kernel: lockd: server returns status 0
Dec 14 11:04:55 till2 kernel: lockd: clnt proc returns 0
Dec 14 11:04:55 till2 kernel: lockd: release host 10.0.0.203
Dec 14 11:04:55 till2 kernel: lockd: release host 10.0.0.203
Dec 14 11:59:54 till2 kernel: lockd: nlm_lookup_host(0a0000cb, p=17, v=4)
Dec 14 11:59:54 till2 kernel: lockd: host garbage collection
Dec 14 11:59:54 till2 kernel: lockd: nlmsvc_mark_resources
Dec 14 11:59:54 till2 kernel: lockd: delete host 10.0.0.203
Dec 14 11:59:54 till2 kernel: lockd: nsm_unmonitor(10.0.0.203)
Dec 14 11:59:54 till2 rpc.statd[2141]: Received erroneous SM_UNMON request from till2 for 10.0.0.203
Dec 14 11:59:54 till2 kernel: lockd: creating host entry
Dec 14 11:59:54 till2 kernel: lockd: nlm_bind_host(0a0000cb)
Dec 14 11:59:54 till2 kernel: lockd: nsm_monitor(10.0.0.203)


--
Roger Leigh

Printing on GNU/Linux? http://gimp-print.sourceforge.net/
GPG Public Key: 0x25BFB848. Please sign and encrypt your mail.
 
ric@opus1.com
      12-20-2004, 07:29 PM

Roger Leigh wrote:
> In my logs, I keep getting this message (on both the client and
> server):
>
> Dec 14 11:59:54 till2 rpc.statd[2141]: Received erroneous SM_UNMON
> request from till2 for 10.0.0.203
> [...]

I've got the same problem on Slackware 10.0 (2.4.26 kernel), nfs-utils
1.0.6. It appears to me to be a race between automount and lockd. The
erroneous SM_UNMON errors occur reliably on lock requests that happen to
force the mounting of a file system via automount. The client sees no
error on the lock request (open() + fcntl(F_SETLKW) + close()). F_SETLKW
requests made while the file system is already mounted don't produce any
errors on the server. It should also be noted that the open+lock
sequence that triggers the mount does not really lock the file, as
another app can also lock the file at this time. If the first app does
the open+lock sequence a few seconds after the mount, then it really
gets the lock.

I see this a lot on my home directory (shared from a Slack 10 system),
which is mounted by our Solaris 8/SPARC mail server to access my
.forward file, which happens to run procmail. I have been able to
reproduce it on Fedora Core 3.
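
For reference, the open() + fcntl(F_SETLKW) + close() sequence described above looks roughly like this in Python. This is a sketch only, not the code procmail or any particular application uses; lock_once is a hypothetical name. Without LOCK_NB, fcntl.lockf() issues a blocking F_SETLKW request.

```python
import fcntl
import os
import time


def lock_once(path):
    """open() + fcntl(F_SETLKW) + close(); returns the elapsed seconds."""
    t0 = time.monotonic()
    # On an automounted path, this open() is what may trigger the mount.
    fd = os.open(path, os.O_RDWR)
    try:
        # Blocking exclusive lock on the whole file (len=0 means to EOF).
        fcntl.lockf(fd, fcntl.LOCK_EX)
    finally:
        # Closing the descriptor releases the lock.
        os.close(fd)
    return time.monotonic() - t0
```

Calling it once on a path that is not yet mounted, and again a few seconds later, gives the two cases compared above.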

Ric

 
Roger Leigh
      12-22-2004, 02:28 PM
On 2004-12-20, ric@opus1.com wrote:
>
> Roger Leigh wrote:
>>
>> I'm running a simple NFS setup (one client, one server) using NFSv3
>> on Debian sarge (nfs-utils 1.0.6, and Linux 2.4.27 and 2.6.8 Debian
>> kernels). I'm seeing the same problem with both kernel versions
>> (plus vanilla kernel.org 2.6.[89]).
>>
>> In my logs, I keep getting this message (on both the client and
>> server):
>>
>> Dec 14 11:59:54 till2 rpc.statd[2141]: Received erroneous SM_UNMON
>> request from till2 for 10.0.0.203

>
> I've got the same problem on Slackware 10.0 (2.4.26 kernel), nfs-utils
> 1.0.6. It appears to be a race between automount and lockd to me.
> The erroneous SM_UNMON errors occur reliably on lock requests that
> happen to force the mounting of a file system via automount. The
> client sees no error on the lock request (open() + fcntl(F_SETLKW) +
> close()). F_SETLKW requests done while the file system is mounted
> don't produce any errors on the server. It should also be noted that
> the sequence of open+lock that caused the mount does not really lock the
> file, as another app can also lock the file at this time. If the
> first app does the open+lock sequence a few seconds after the mount,
> then it really gets the lock.
>
> I see this a lot on my home directory (shared from a Slack 10 system),
> which is mounted by our Solaris 8/Sparc mail server to access my
> .forward file which happens to run procmail. I have been able to
> reproduce it on Fedora Core 3.


I tested this again, taking autofs/automount out of the equation. I
unmounted all the filesystems, stopped the automounter, then mounted the
filesystem "by hand" using the same mount options:

mount -t nfs -o hard,intr,nodev,nosuid,nfsvers=3,posix,udp,rsize=8192,wsize=8192 ...

I found I still got the SM_UNMON errors in my logs, still after an
approx. 5 minute delay.

Can you reproduce this without autofs as well?


Regards,
Roger

--
Roger Leigh

Printing on GNU/Linux? http://gimp-print.sourceforge.net/
GPG Public Key: 0x25BFB848. Please sign and encrypt your mail.
 