[NCLUG] NFS Questions

Michael Coffman coffman at ftc.avagotech.com
Wed Sep 2 14:27:44 MDT 2009


Bob,

Thanks for the reply.

On Wed, 2 Sep 2009, Bob Proulx wrote:

> Michael Coffman wrote:
>> Here is the issue (maybe less an issue than expected behavior) that I want
>> to better understand:
>>
>> 1 - On Client A - touch /mnt/c/data/testfile;rm /mnt/c/data/testfile; ssh
>> clientB touch /mnt/c/data/testfile
>> 2 - On Client A - ls /mnt/c/data/testfile
>
> Your team is crossing from one client to another client.  That is
> where this problem is triggered.  If all of the access can be done on
> one client then the problem is mitigated.  But modifying files on
> different distributed clients trips into NFS cache coherency problems
> by the very nature of the design of NFS.  Flip flopping between two
> clients reading and writing a shared file is very hard to handle with
> any type of cache and NFS is notorious for hitting these issues.
>
> I recommend getting a copy of NFS Illustrated by Brent Callaghan.  It
> has a lot of good information about NFS implementations.  Here is a
> small snippet from it:
>
>  8.14.2: Cache Consistency

It's open on my desk, and I read that section before sending the email :)

>
>  Since data obtained from the server may be cached by many clients at
>  different times, there is a possibility that the cached data may be
>  inconsistent between the client and its server or other clients.  The
>  protocol provides no facility that will guarantee that the cached data
>  will always be consistent with the server--instead, clients are
>  expected to make a best-effort attempt to keep their cached data in
>  sync with the server.
>  ...
>  The cache time is a compromise that trades off cache consistency
>  against server and network loading.  If the cache time is small,
>  then the cache consistency will be high, but the server will be
>  consulted frequently to check if the modification time has changed.
>  If the cache time is set to 0, then the server will be consulted
>  whenever the cached data are accessed.  If the cache time is long,
>  then the server will be consulted infrequently, but there's a
>  greater chance that the client may use stale cached
>  data--consistency is low.

Thanks for the description.   It is becoming clearer to me how this 
works.   Do you know how the actual timing of the cache works?  Is there
a timer set for each directory or file?   Does it work anything like 
standard memory paging in how it times out the attributes?
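
To check my own reading of the excerpt, here is a toy model in bash. It
assumes one fixed cache time per cached entry. That is my simplification,
and probably not how Solaris or Linux actually times out attributes, since
both have separate min and max values. The function name is_stale is just
for illustration.

```shell
# Toy model of a fixed-timeout attribute cache (an assumption for
# illustration only; real clients vary the timeout between min and max).
#
# is_stale CACHED_AT MODIFIED_AT READ_AT CACHE_TIME   (all in seconds)
#   CACHED_AT   - when this client last fetched attributes from the server
#   MODIFIED_AT - when another client changed the file on the server
#   READ_AT     - when this client looks at the file again
#   CACHE_TIME  - how long cached attributes are trusted without asking
# Prints "stale" if the client would answer from an out-of-date cache,
# otherwise "fresh".
is_stale() {
    local cached_at=$1 modified_at=$2 read_at=$3 cache_time=$4
    # Still inside the cache window, and the change happened after we
    # cached but before we looked: the client trusts old attributes.
    if (( read_at - cached_at < cache_time &&
          modified_at > cached_at && modified_at <= read_at )); then
        echo stale
    else
        echo fresh
    fi
}
```

With a 60 second cache time, `is_stale 0 5 30 60` prints `stale`; the same
scenario with a cache time of 0 (`is_stale 0 5 30 0`) prints `fresh`, at
the cost of asking the server on every access.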

>
>> Our clients are currently configured using default values for
>> acreg{min,max} and acdir{min,max}.  Currently the above test can
>> take as long as 60 seconds for the new version of testfile to show
>> up in the listing.  The error can be reproduced in a number of ways
>> beyond this example.
>
> You are seeing typical times.
>
>  Solaris Cache Time Ranges
>              Minimum      Maximum
>              Cache Time   Cache Time
>  File        3-seconds    30-seconds
>  Directory   30-seconds   60-seconds
>
>> Questions:
>>
>> - Is it possible to cause an nfs client to flush its directory or file
>>    attribute caches?
>
> As far as I know the only way to do this is to set the cache time to
> zero and disable the cache entirely.  Of course that can have severe
> server side load consequences.

I want to experiment with this, but am concerned about how much the load 
will be increased.
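
Before touching the production clients, my plan is to time step 2 of the
test above on a scratch client, once with the default settings and once
with the cache time set to 0. A minimal polling sketch in bash follows;
wait_visible is a hypothetical helper name and the one second polling
interval is arbitrary. Note that the polling itself may add some lookup
traffic.

```shell
# wait_visible PATH TIMEOUT_SECONDS
# Polls about once a second until PATH exists, then prints the number of
# polling intervals waited (roughly seconds); returns 1 if TIMEOUT_SECONDS
# intervals pass first.
wait_visible() {
    local path=$1 timeout=$2 waited=0
    until [ -e "$path" ]; do
        if (( waited >= timeout )); then
            echo "not visible after ${timeout}s" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    echo "$waited"
}

# Example, run on client A:
#   ssh clientB touch /mnt/c/data/testfile
#   wait_visible /mnt/c/data/testfile 120
```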

>
> Better is to design the application with NFS cache in mind and avoid
> the problem.  I sympathize that if the application authors weren't
> aware of NFS caching issues that they may have created a problem that
> is hard to correct later but I believe that really is the way to solve
> the problem.

I agree with this, but it is not currently an option.  This is where knowing
the actual cache hit ratios would help me understand what effect changing
the timeout values would have.
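
Since the counters apparently can't be zeroed, one workaround is to
snapshot them before and after a test run and subtract. This bash sketch
assumes a Linux client, where the raw counters that nfsstat reports live
in /proc/net/rpc/nfs, and it treats the NFSv3 GETATTR count as a rough
proxy for how often the client goes back to the server. It is not a true
hit ratio, and the helper names are my own.

```shell
# nfs3_getattr_count FILE
# Prints the NFSv3 GETATTR call count from a saved copy of
# /proc/net/rpc/nfs. Assumes the "proc3" line layout
#   proc3 <number of procedures> <NULL count> <GETATTR count> ...
# so the GETATTR count is the fourth field.
nfs3_getattr_count() {
    awk '$1 == "proc3" { print $4 }' "$1"
}

# getattr_delta BEFORE AFTER
# Prints how many GETATTR calls happened between two snapshots.
getattr_delta() {
    local before after
    before=$(nfs3_getattr_count "$1")
    after=$(nfs3_getattr_count "$2")
    echo $(( after - before ))
}

# Example:
#   cat /proc/net/rpc/nfs > /tmp/nfs.before
#   ... run the workload ...
#   cat /proc/net/rpc/nfs > /tmp/nfs.after
#   getattr_delta /tmp/nfs.before /tmp/nfs.after
```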

>
>> - Is there a way to zero out the nfs traffic counters? nfsstat says it
>>    can't zero them.
>> - Is there any way to tell the actual cache hit rate of attributes on the
>>    client?
>
> On these I do not know.
>
>> I know that reducing the attr* values to 0 will achieve what I want, but
>> I am afraid of what that will do to the load on my file servers and am
>> not certain about the best way to approach testing this.
>>
>> Any help or pointers would be greatly appreciated...
>
> I have had to deal with this same problem before and the tactic I took
> was to design the task to do everything that it needed to do from the
> same client.  Then there was only a single client cache and the
> problem of distributed cache coherency was avoided.
>
> It is okay to create a new file on client A and then switch over to
> access the file from client B.  Since this is a new file client B
> won't have it in the cache and will fetch it.  This is okay.  But
> avoid having client A access the file again after client B has
> modified the file.  That way leads to madness.  Okay to switch to a
> third client and have it access the file since in this third client
> the file will not be in cache.

Agreed also, but this is how things are currently being done and it will be 
difficult to make changes in the short term.  Any discussion with the folks 
creating the tools simply brings up the question of "can't you just fix 
NFS"  :(

>
> Note: My experience is all with NFSv2 and NFSv3 and I haven't used
> NFSv4 at all yet.  In fact it has been a few years all around now.  My
> knowledge of current issues is likely growing stale, just like the NFS
> buffer caches in distributed NFS clients. :-)
>

Noted :)  thanks for the input.

> Bob
> _______________________________________________
> NCLUG mailing list       NCLUG at nclug.org
>
> To unsubscribe, subscribe, or modify
> your settings, go to:
> http://www.nclug.org/mailman/listinfo/nclug
>

-- 
-MichaelC


