NFS - only one client at a time can read files

Discussion in 'Linux Networking' started by David Brown, Sep 20, 2013.

  1. David Brown

    David Brown Guest

    I have a strange problem with NFS.

    I have NFS serving enabled on my Fedora 14 workstation, exporting a
    directory with these options:

    (ro,no_root_squash,sync,no_subtree_check)
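
    The complete line in /etc/exports looks roughly like the following -
    the exported path and client range shown here are placeholders rather
    than my real values:

      /srv/export  192.168.0.0/24(ro,no_root_squash,sync,no_subtree_check)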


    I have an embedded Linux card that is getting its kernel and rootfs from
    this export over NFS. The card then copies a subdirectory of this
    export onto a flash-based file system mounted at /mnt, using rsync
    (roughly "rsync -av /unpacked/ /mnt/").

    When I run with one card, this works fine.

    When I have multiple cards connected, each one gets its kernel and
    mounts its rootfs fine, but it seems that only one client card can read
    at a time. If I watch the progress of the rsyncs, I can see one card
    will run for a bit, then stop and complain about nfs timeouts. Another
    card will run for a bit, before it too stops with a timeout. This goes
    back and forth - at any given time, only one client is successfully reading.


    Any ideas as to what might be wrong, or what I can check, would be
    appreciated.


    David
     
    David Brown, Sep 20, 2013
    #1

  2. unruh

    unruh Guest

    A disk has only one read head, and it cannot be in two places at once.
     
    unruh, Sep 20, 2013
    #2

  3. Chris Davies

    Chris Davies Guest

    This sounds like you don't have anywhere near enough rpc/nfsd daemons
    on your NFS server.

    On my Debian box there's a file /etc/default/nfs-kernel-server that
    defines the number of kernel nfsd "processes" to start at boot time. (I
    don't know where your equivalent configuration file will live.) The
    default on my system is 8, but you probably want to increase it to 32
    or even 64.

    To test the theory, count the number of nfsd processes already running:

      ps -ef | grep -w '[n]fsd' | wc -l

    and then increase it, for example from 8 to 32:

      rpc.nfsd 32

    If this works, you can configure it for boot-time.
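
    For reference, the boot-time setting in /etc/default/nfs-kernel-server
    on my box is the thread count; assuming the same variable name on your
    system, the change would look like this:

      # /etc/default/nfs-kernel-server
      RPCNFSDCOUNT=32
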
    Chris
     
    Chris Davies, Sep 20, 2013
    #3
  4. Tauno Voipio

    Tauno Voipio Guest


    This feels like rsync taking exclusive access to the files concerned,
    to avoid shooting at a moving target.

    For boot disk copying, an image file and dd may be better.
     
    Tauno Voipio, Sep 20, 2013
    #4
  5. Chris Davies

    Chris Davies Guest

    I've never seen rsync grab exclusive access to files. That would be
    more likely over SMB/CIFS, which provides file locking by default,
    but not over NFS.

    Chris
     
    Chris Davies, Sep 21, 2013
    #5
  6. Tauno Voipio

    Tauno Voipio Guest

    Thanks for the correction. I was too lazy to wade through the sources.
     
    Tauno Voipio, Sep 21, 2013
    #6
  7. David Brown

    David Brown Guest

    Disks are not the bottleneck. The whole shared area is small enough to
    sit in the server's RAM cache.
     
    David Brown, Sep 23, 2013
    #7
  8. David Brown

    David Brown Guest

    That would sound right - except that I agree with Chris' point below
    that rsync does not lock files in any way. (I've often seen large
    rsyncs end with a message saying that some files changed during the
    rsync run.)
     
    David Brown, Sep 23, 2013
    #8
  9. David Brown

    David Brown Guest

    This sounds like a possible explanation. A quick check shows that I
    have 8 nfsd threads running. Rsync almost certainly keeps several
    requests in flight while it is working, as it runs through the source
    tree to see what it should be copying - contention for the nfsd
    threads could be the cause.
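
    (As a quick runtime check, the thread count is also exposed through
    the nfsd filesystem, assuming it is mounted at /proc/fs/nfsd:

      cat /proc/fs/nfsd/threads         # current number of threads
      echo 32 > /proc/fs/nfsd/threads   # raise it without a restart

    so I can experiment without restarting the whole NFS service.)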

    I'll have to translate your commands here from "Debian" into "Fedora
    14", but now that I know what I am looking for, Google can help with the
    translation. Later on, this whole thing will run on a Debian server -
    but at the moment it is being prototyped on my (outdated) Fedora desktop.

    Additionally, the mere act of talking about the problem has suggested an
    alternative solution. I am copying a bunch of data from one computer to
    another using rsync. Why not just use an rsync server? (The
    historical answer is that the copy was originally a "cp -a" rather than
    "rsync -a".)

    Thanks for the help,

    David
     
    David Brown, Sep 23, 2013
    #9
  10. David Brown

    David Brown Guest

    I've now changed the thread count in /etc/sysconfig/nfs to 64 and
    restarted the nfs server - it made no difference that I could see, but
    my testing was done with a copy to tmpfs on the clients rather than to
    the NAND filesystem (since that takes 20 seconds rather than 12
    minutes). So I am not convinced that the nfs threads are the whole
    answer, but can't yet rule them out. And it should certainly do no harm
    to leave them at 64.
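
    For the record, the change itself was just the thread count variable
    in /etc/sysconfig/nfs - RPCNFSDCOUNT on my system - followed by a
    restart of the NFS service:

      # /etc/sysconfig/nfs
      RPCNFSDCOUNT=64

      service nfs restart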

    In the end, I am copying a compressed tarball from the server onto the
    client's tmpfs with a simple "cp" on NFS - this takes about 3 seconds.
    It will not matter if it takes x * 3 seconds for "x" cards in parallel.
    Unpacking these tarballs into the NAND is now an entirely local
    operation on the cards, and will therefore be free from any issues with
    the server or network. It is also faster even for one card.
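
    Roughly, what each card does now is the following (file names are
    placeholders):

      # /unpacked is the NFS mount, /tmp is tmpfs on the card
      cp /unpacked/rootfs.tar.gz /tmp/
      # unpacking into the NAND filesystem is then purely local
      tar -xzf /tmp/rootfs.tar.gz -C /mnt/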

    Other than that, I will also do testing with the network setup here.
    The cards are currently running across our main LAN, which is well
    overdue for a reorganisation after many years of "organic" growth.

    But I am happy for now with the tarball copying solution.
     
    David Brown, Sep 23, 2013
    #10
  11. Chris Davies

    Chris Davies Guest

    If your file transfer is network-bound, then rsync running as two
    separate processes (client and server) should be faster than a single
    rsync process accessing a remote filesystem. If the bottleneck is
    elsewhere it won't help, as single-process rsync falls back to a
    basic copy.
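
    In other words, it is roughly the difference between these two
    invocations (paths are illustrative):

      rsync -av /unpacked/ /mnt/                # both ends look local; NFS carries all the reads
      rsync -av rsync://server/unpacked/ /mnt/  # client talks to an rsync daemon on the server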

    Chris
     
    Chris Davies, Sep 27, 2013
    #11
  12. David Brown

    David Brown Guest

    I figured out my problem - I'm noting it here in case anyone ever reads
    this thread in the archives.

    It turned out that there was a configuration fault in the rootfs I had
    mounted, leading to all the cards getting the same fixed IP address
    shortly after root was mounted. Which card was in contact with the
    server at any given moment therefore depended on which one answered the
    ARP requests first - I'm surprised everything worked in the end. The
    fix was quite simple once I had found the problem (thanks to Wireshark
    and a managed switch with port mirroring).
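
    (For anyone hitting the same symptom: one common way to avoid a fixed
    address baked into a shared rootfs is to have each card take its
    address from DHCP via the kernel command line, for example

      ip=dhcp root=/dev/nfs nfsroot=192.168.0.1:/srv/export/rootfs

    where the server address and path are placeholders. That is just one
    possible approach, not necessarily the fix I used.)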
     
    David Brown, Oct 10, 2013
    #12