Networking Forums

slow file opens on nfs mount across high-latency network

 
 
A. Howe

 
      06-28-2005, 05:22 PM
I have searched the mailing lists and Google and have not found any
material describing the NFS file open threshold effect that I am
experiencing.

I have been experimenting with opening files on NFS mounts over varying
network latencies. I notice that there seems to be a threshold on the
number of concurrent nfs file opens as network latency increases. Up to
and including the threshold, nfs file open performance is fine. After the
threshold of concurrent opens, performance degrades at a linear rate.

For example, the graph at http://ahowe_ca.tripod.com/nfsperformance.pdf
shows this threshold effect for various network latencies:
- 0ms network latency - no max limit
- 20ms network latency - 40 maximum concurrent opens
- 40ms network latency - 20 maximum concurrent opens
- 80ms network latency - 15 maximum concurrent opens
- 120ms network latency - 5 maximum concurrent opens

What would cause this "hockey stick" threshold effect shown in
http://ahowe_ca.tripod.com/nfsperformance.pdf?
Are there any settings that would change this effect?

Here are the stats of my experiment:
- testing with Red Hat Enterprise Linux AS 3 servers connected via a
100Mbps switch
- using client options "rw, noexec, nosuid, nodev, noatime, hard, intr,
tcp"
- using server options "rw, async, wdelay, all_squash, root_squash,
anonuid=500, anongid=500"
- inserting latency with NIST Net
- the test process spawns X threads, each of which opens a file on an
NFS mount; the time taken for each file open is recorded
- adjusting rsize, wsize does not affect "hockey stick" threshold effect
- adjusting /proc/sys/net/core/rmem* does not affect "hockey stick"
threshold effect
- adjusting number of nfsd processes does not affect "hockey stick"
threshold effect
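For readers who want to reproduce this kind of measurement, here is a minimal sketch of such a test in Python. It is not the original test program, which was not posted. The function name, the file naming scheme, and the use of a barrier to start all threads together are assumptions made for illustration, and a local temporary directory stands in for the NFS mount point.

```python
import os
import tempfile
import threading
import time


def time_concurrent_opens(directory, num_threads):
    """Open one file per thread in `directory`; return per-open times in seconds."""
    durations = [0.0] * num_threads
    # The barrier releases all threads at once so the opens overlap.
    barrier = threading.Barrier(num_threads)

    def worker(index):
        # Hypothetical file naming scheme, one file per thread.
        path = os.path.join(directory, f"opentest_{index}.tmp")
        barrier.wait()
        start = time.perf_counter()
        fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
        durations[index] = time.perf_counter() - start
        os.close(fd)
        os.remove(path)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return durations


if __name__ == "__main__":
    # A local temp directory is used here; point this at an NFS mount to test NFS.
    with tempfile.TemporaryDirectory() as mount_dir:
        times = time_concurrent_opens(mount_dir, 40)
        print(f"mean open time: {sum(times) / len(times) * 1000:.3f} ms")
```

Only the open call itself is timed, so thread start-up cost does not leak into the recorded figures.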

Thanks in advance for any tips or directions where I can look for more
information on this topic.

Regards,

Anthony Howe
 
 
 
 
 
Stuart Friedberg

 
      06-28-2005, 07:32 PM
On Tue, 28 Jun 2005 10:22:14 -0700, A. Howe <(E-Mail Removed)> wrote:
> there seems to be a threshold on the
> number of concurrent nfs file opens as network latency increases. Up to
> and including the threshold, nfs file open performance is fine. After
> the threshold of concurrent opens, performance degrades at a linear rate.


I took a look at your chart. Those curves sure look like a constant plus
a linear factor, not the sort of exponential explosion suggested by
"hockeystick". And the linear factor seems to be very closely
proportional to the network latency you have introduced (the case for
20ms isn't quite proportionate, but the others are spot on).

The first question that leaps out at me screaming for attention
is: What is the Y axis of your chart actually measuring?

If the answer is "total time to establish X connections under L
latency", then you have no problem whatsoever. As I'm sure you know,
the NFS protocol doesn't have any such thing as an open function.
So "open for write" on the client is going to do some NFS LOOKUPs to
traverse the path followed by a GETATTR and/or a CREATE. Looking at
your chart, I can predict that you are opening files in the current
working directory, because less than two network round-trips are
required for each open. The times on your chart are entirely accounted
for by the latencies you introduced, and your data can be accurately
modeled in the form (constant0 + constant1 * latency). There is
no hockeystick.
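One way to check the (constant0 + constant1 * latency) model against measurements is an ordinary least-squares line fit. The sketch below is only illustrative: the function name is made up, and the sample numbers are synthetic values generated from a known line, not readings taken from the chart.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = intercept + slope * xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


if __name__ == "__main__":
    # Synthetic example only: times built from 5 ms + 1.5 round trips per open.
    latencies_ms = [0, 20, 40, 80, 120]
    open_times_ms = [5 + 1.5 * latency for latency in latencies_ms]
    constant0, constant1 = fit_line(latencies_ms, open_times_ms)
    print(f"constant0 = {constant0:.2f} ms, constant1 = {constant1:.2f} round trips")
```

If real measurements fit a line this well, the slope gives the effective number of round trips per open and the intercept gives the latency-independent overhead.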

If the answer is anything else, there is something terribly screwed
up. And since RHAS 3 was not known for a terminally broken TCP
or NFS implementation, I'd suggest you look at your test program.
Specifically, I'd suggest you make sure you are not inadvertently
carrying out some sort of O(N^2) behavior while doing N opens.
(I once ran into that exact issue when doing select(fdset) N times.
The constant factor for evaluating one bit in the fdset was about
a microsecond. Doing select for N in 1 to 10,000 added seconds
to my test overhead.)
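The arithmetic behind that select() anecdote can be sketched as follows, assuming the k-th call scans k descriptor bits at roughly one microsecond per bit. The function name and the exact cost model are assumptions made for illustration.

```python
def cumulative_scan_cost_seconds(n, cost_per_bit_us=1.0):
    """Total time if call k scans k descriptor bits, for k = 1..n."""
    total_bits = n * (n + 1) // 2  # 1 + 2 + ... + n
    return total_bits * cost_per_bit_us / 1_000_000


if __name__ == "__main__":
    seconds = cumulative_scan_cost_seconds(10_000)
    print(f"overhead for 10,000 calls: about {seconds:.1f} s")
```

Under these assumptions the overhead grows with N squared, which is how a per-bit cost of a microsecond turns into seconds of test overhead.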

Stu Friedberg
 
 
A. Howe

 
      07-06-2005, 08:19 PM
"Stuart Friedberg" <(E-Mail Removed)> wrote in
news:op.ss3hwni9gowx4s@m-stuartf:

> On Tue, 28 Jun 2005 10:22:14 -0700, A. Howe <(E-Mail Removed)> wrote:
>> there seems to be a threshold on the
>> number of concurrent nfs file opens as network latency increases. Up
>> to and including the threshold, nfs file open performance is fine.
>> After the threshold of concurrent opens, performance degrades at a
>> linear rate.

>
> I took a look at your chart. Those curves sure look like a constant
> plus a linear factor, not the sort of exponential explosion suggested
> by "hockeystick". And the linear factor seems to be very closely
> proportional to the network latency you have introduced (the case for
> 20ms isn't quite proportionate, but the others are spot on).
>
> The first question that leaps out at me screaming for attention
> is: What is the Y axis of your chart actually measuring?
>
> If the answer is "total time to establish X connections under L
> latency", then you have no problem whatsoever. As I'm sure you know,
> the NFS protocol doesn't have any such thing as an open function.
> So "open for write" on the client is going to do some NFS LOOKUPs to
> traverse the path followed by a GETATTR and/or a CREATE. Looking at
> your chart, I can predict that you are opening files in the current
> working directory, because less than two network round-trips are
> required for each open. The times on your chart are entirely
> accounted for by the latencies you introduced, and your data can be
> accurately modeled in the form (constant0 + constant1 * latency).
> There is no hockeystick.
>
> If the answer is anything else, there is something terribly screwed
> up. And since RHAS 3 was not known for a terminally broken TCP
> or NFS implementation, I'd suggest you look at your test program.
> Specifically, I'd suggest you make sure you are not inadvertently
> carrying out some sort of O(N^2) behavior while doing N opens.
> (I once ran into that exact issue when doing select(fdset) N times.
> The constant factor for evaluating one bit in the fdset was about
> a microsecond. Doing select for N in 1 to 10,000 added seconds
> to my test overhead.)
>
> Stu Friedberg


I have found a workaround for the latency problem that I described in
the above posting. The workaround is to add more mounts of the same NFS
share and spread the threads evenly among the mounts.
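A minimal sketch of spreading threads evenly across mounts, using round-robin assignment. The function name and the mount paths (/mnt/share0 and so on) are hypothetical and only illustrate the idea; the original test program was not posted.

```python
def assign_threads_to_mounts(num_threads, mount_points):
    """Round-robin thread indices over mount points.

    Returns a dict mapping each mount point to the thread indices using it.
    """
    assignment = {mount: [] for mount in mount_points}
    for index in range(num_threads):
        mount = mount_points[index % len(mount_points)]
        assignment[mount].append(index)
    return assignment


if __name__ == "__main__":
    # Hypothetical mount points, all mounting the same NFS share.
    mounts = [f"/mnt/share{i}" for i in range(4)]
    for mount, thread_ids in assign_threads_to_mounts(40, mounts).items():
        print(mount, len(thread_ids), "threads")
```

Each thread would then open its files under the mount point it was assigned, so no single mount carries more than its even share of concurrent opens.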

This workaround is illustrated by the graph at
http://ahowe_ca.tripod.com/nfsperformance2.pdf. The graph shows the
performance of 40 threads each performing file opens across an NFS share.
Each thread opens a file for write and closes it, repeating this 10
times. The x-axis is the round-trip network latency. The y-axis is the
mean time to open a file across the NFS mount. Each line represents a
different number of mount points. The top line is one mount point, with
all 40 threads concurrently opening files on that mount. The bottom line
represents 40 mounts of the same share, each thread with its own mount.
From this graph I draw the following conclusions:

1. Evenly distributing file access threads across many NFS mounts to the
same share reduces the effects of latency.

2. There seems to be a resource limit per mount point. The higher the
network latency, the sooner this limit is reached. This explains the hockey
stick effect shown in the graph at http://ahowe_ca.tripod.com/nfsperformance.pdf.

3. This resource limit is not related to the hard limit of 256 read or
write operations per mount point since we are only using 40 threads.

4. Removing all network latency eliminates this resource problem: 1
mount and 40 mounts then show the same performance.

Does anyone know what would be causing this resource limit? And is there
any way to improve it so that I don't have to add more mounts to reduce the
effects of latency?

Anthony
 