In article <(E-Mail Removed) >, Shi Jin wrote:
>
> We are using two Linux clusters. Both of them have a similar NFS
> problem. The fileserver has a RAID attached to it and it is exporting
> /home over NFS to many Linux client nodes. Whenever the clients
> try to write too much data to the NFS /home, or when somebody is
> doing a lot of IO on the fileserver itself, writes from the clients
> to the NFS /home frequently fail with an error. We tried to tune
What is the error you're getting? What hardware do you have on
the server side? What is your network setup?
> the parameters but still have the same problem. The current mounting
> parameters are
> rw,nosuid,soft,intr,rsize=8192,wsize=8192
The 'hard' mount option is preferred over 'soft', but that's a bit of
a religious issue.
> And the NFS server exporting parameters are simple:
> rw,sync,root_squash
>
> Could anybody tell me whether this is our own problem or whether it is
> common to all Linux NFS servers? Is there a way to solve it?
The Linux NFS HOWTO has a whole chapter on tuning:
http://nfs.sourceforge.net/nfs-howto/
Not knowing what exact error you're getting, I can only guess. One
thing to check is whether you have enough server threads. By default,
e.g., Red Hat only launches 8 nfsd threads, which is definitely not enough
for a busy server. Check the output of 'cat /proc/net/rpc/nfsd'. Look
at the 4th line (the one starting with 'th'), which should look something like this:
th 256 846 2422.580 993.510 436.910 195.680 123.210 54.990 36.530 27.830 19.010 32.410
The first number is the number of server threads (256 in this case). The
second number is the most important one. That's the number of times
requests have had to be queued because all the threads were busy. In
this case, I had been using 128 threads, saw that 846, and then upped the
threads to 256 (without rebooting and thus without resetting the statistics).
The number hasn't increased since then. The rest of the numbers represent
the amount of time a particular percentage of the threads have been busy
at the same time, i.e., 10% of the threads were busy for 2422.58 units (I'm
not sure of the units), 20% were busy for 993.51, etc. Having
the last few numbers be low indicates that you've got enough threads.
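
If you'd rather not eyeball that line, here is a minimal Python sketch that
pulls the same three pieces of information out of it. The field layout is
assumed purely from the sample line above, and the function names
(parse_th_line, read_th_line) are my own, not part of any NFS tool.

```python
def parse_th_line(line):
    """Split a 'th' line into (threads, all_busy_count, histogram).

    Assumed layout, taken from the sample line above:
    th <threads> <times all threads were busy> <ten histogram values>
    """
    fields = line.split()
    if not fields or fields[0] != "th":
        raise ValueError("not a 'th' line: %r" % line)
    threads = int(fields[1])
    all_busy_count = int(fields[2])
    histogram = [float(value) for value in fields[3:]]
    return threads, all_busy_count, histogram


def read_th_line(path="/proc/net/rpc/nfsd"):
    """Return the parsed 'th' line from the nfsd stats file, or None if absent."""
    with open(path) as stats:
        for line in stats:
            if line.startswith("th "):
                return parse_th_line(line)
    return None


if __name__ == "__main__":
    # Sample line from above, so this runs on a machine without an NFS server.
    sample = ("th 256 846 2422.580 993.510 436.910 195.680 123.210 "
              "54.990 36.530 27.830 19.010 32.410")
    threads, all_busy_count, histogram = parse_th_line(sample)
    print("threads:", threads)
    print("times all threads were busy:", all_busy_count)
    print("last three histogram buckets:", histogram[-3:])
```

On the server itself, call read_th_line() instead of using the sample string.
Run it a few times under load: if the second number keeps climbing, the
server probably needs more threads.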
--
Joshua Baker-LePain
Department of Biomedical Engineering
Duke University