Hi, all,
I'm hoping someone in this group can provide some insight or suggestions on
how to solve a tricky NFS problem.
I use a simple backup script, run from a weekly cron job, to tar the major
file systems onto a USB hard drive. However, since upgrading my server to FC3
(and the 40 or so nodes to either FC2 or FC3), I have been getting NFS statfs
errors.
Specifically, after the backup has been running for a few minutes, the nodes
lose their NFS file handles for one specific file system. A "df" on a given
node shows:
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/sda3             14571488   2404276  11427020  18% /
/dev/sda1              1004024     21748    931272   3% /boot
none                   1037288         0   1037288   0% /dev/shm
/dev/sdb1             70557052  38727272  28245684  58% /tmp1
server:/home/users    70557056  47564176  19408784  72% /home/users
server:/usr/local/athlon1
                             -         -         -   -  /usr/local
In /var/log/messages on the node I see:
Sep 13 12:44:20 node_name kernel: nfs_statfs: statfs error = 116
This message is often repeated many times.
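
For reference, error 116 on Linux/x86 is ESTALE (stale NFS file handle), which fits the symptom. Below is a minimal Python 3 sketch for pulling the error number out of such log lines and naming it. The function name and the lookup table are made up for illustration, and the table hard-codes the Linux value rather than asking the local C library, since the number differs on other platforms.

```python
import re

# Assumption: Linux/x86 errno numbering, where 116 is ESTALE.
# Hard-coded so the result does not depend on the machine running the script.
LINUX_ERRNO_NAMES = {116: "ESTALE (stale NFS file handle)"}

# Matches the kernel message format shown in the log excerpt above.
LOG_PATTERN = re.compile(r"nfs_statfs: statfs error = (\d+)")


def decode_statfs_error(line):
    """Return (code, name) for an nfs_statfs log line, or None if no match."""
    match = LOG_PATTERN.search(line)
    if match is None:
        return None
    code = int(match.group(1))
    return code, LINUX_ERRNO_NAMES.get(code, "unknown")


if __name__ == "__main__":
    sample = "Sep 13 12:44:20 node_name kernel: nfs_statfs: statfs error = 116"
    print(decode_statfs_error(sample))
```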
Loss of access to /usr/local, where many executable programs reside, is fatal
to calculations running on our batch system. As a result, every Sunday morning
when the backup runs, our job queue is wiped out, killing many long-running
tasks.
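
As a stopgap, a node-side check along the following lines could flag the stale mount before jobs are dispatched to that node. This is only a sketch in Python 3, assuming the failure surfaces to user space as ESTALE from statvfs(); the function name and the injectable `statvfs` parameter are hypothetical, the latter there only so the logic can be exercised without a real NFS mount.

```python
import errno
import os


def check_mounts(paths, statvfs=os.statvfs):
    """Probe each mount point and report 'ok', 'stale', or another error."""
    results = {}
    for path in paths:
        try:
            statvfs(path)  # same kind of query that df issues per file system
            results[path] = "ok"
        except OSError as exc:
            if exc.errno == errno.ESTALE:
                results[path] = "stale"
            else:
                results[path] = "error: " + str(exc)
    return results


if __name__ == "__main__":
    # Mount points taken from the df listing above.
    for mount, status in check_mounts(["/home/users", "/usr/local"]).items():
        print(mount, status)
```

A cron job or batch-system prologue could run this and hold the node if any mount reports "stale". It does not address the underlying cause.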
Interesting points to note:
(1) We're running FC3 (kernel 2.6.9-1.667smp) on the server and either the
same release and kernel or FC2 (2.6.10-1.770_FC2smp) on the nodes.
(2) The nodes use the following args in /etc/fstab for the NFS file systems:
nfsvers=2,rsize=8192,wsize=8192,exec,dev,suid,rw
(3) The server exports file systems with the following args in /etc/exports:
rw,no_root_squash,insecure
(4) Although the nodes lose their file handles for the /usr/local file
system during the backup, the backup is actually working on a completely
different file system when the statfs error occurs.
(5) The problem only occurs with the /usr/local file system, and NOT with
the /home/users file system, which is also NFS exported to the nodes. It
doesn't matter which file system is first in /etc/fstab on the nodes; the
problem always occurs with /usr/local.
(6) It doesn't seem to matter that the backup is made to a (slow) USB disk.
I also tried backing up to a file in /tmp on the server, but the same error
occurred.
Any insight you can provide would be greatly appreciated.
-Daniel
--
T. Daniel Crawford Department of Chemistry
Crawdad[AT]vt.edu Virginia Tech
www.chem.vt.edu/faculty/crawford.php
--------------------
PGP Public Key at:
http://www.chem.vt.edu/chem-dept/crawford/publickey.txt