[NCLUG] NFS Questions

Wed Sep 2 12:11:51 MDT 2009

Michael Coffman wrote:
> Here is the issue (maybe not so much an issue as expected behavior) that I
> want to better understand:
> 
> 1 - On Client A - touch /mnt/c/data/testfile;rm /mnt/c/data/testfile; ssh 
> clientB touch /mnt/c/data/testfile
> 2 - On Client A - ls /mnt/c/data/testfile

Your team is crossing from one client to another client, and that is
where this problem is triggered.  If all of the access can be done on
one client, the problem is mitigated.  But modifying files from
different distributed clients runs into NFS cache coherency problems
by the very nature of NFS's design.  Flip-flopping between two
clients reading and writing a shared file is very hard to handle with
any type of cache, and NFS is notorious for hitting these issues.
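To make the cross-client problem concrete, here is a small Python model of the test from the quoted message.  It is a hypothetical sketch, not how any real NFS client is implemented: each client caches "does this name exist?" answers for a fixed `ttl`, and the names `Server`, `Client`, and `ttl` are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """Authoritative state: the set of names that exist in one directory."""
    names: set = field(default_factory=set)


@dataclass
class Client:
    """Hypothetical client that caches existence answers for ttl seconds.

    Real clients cache directory attributes and lookup results in more
    elaborate ways; this model keeps only the expiry behavior.
    """
    server: Server
    ttl: float
    cache: dict = field(default_factory=dict)  # name -> (exists, fetched_at)

    def exists(self, name: str, now: float) -> bool:
        entry = self.cache.get(name)
        if entry is not None and now - entry[1] < self.ttl:
            return entry[0]  # served from cache; the server is not asked
        answer = name in self.server.names
        self.cache[name] = (answer, now)
        return answer

    def create(self, name: str, now: float) -> None:
        self.server.names.add(name)
        self.cache[name] = (True, now)  # a client knows about its own changes

    def remove(self, name: str, now: float) -> None:
        self.server.names.discard(name)
        self.cache[name] = (False, now)


server = Server()
a = Client(server, ttl=60)
b = Client(server, ttl=60)

# Step 1 of the test: touch and rm on client A, then touch on client B.
a.create("testfile", now=0)
a.remove("testfile", now=0)
b.create("testfile", now=1)

# Step 2: client A looks again.
print(a.exists("testfile", now=2))   # False: A trusts its cached answer
print(a.exists("testfile", now=61))  # True: the entry expired, A asked the server
```

In this model the file stays invisible to client A until A's cached answer ages past `ttl`, even though the server has had the file since B created it.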

I recommend getting a copy of NFS Illustrated by Brent Callaghan.  It
has a lot of good information about NFS implementations.  Here is a
small snippet from it:

  8.14.2: Cache Consistency

  Since data obtained from the server may be cached by many clients at
  different times, there is a possibility that the cached data may be
  inconsistent between the client and its server or other clients.  The
  protocol provides no facility that will guarantee that the cached data
  will always be consistent with the server--instead, clients are
  expected to make a best-effort attempt to keep their cached data in
  sync with the server.
  ...
  The cache time is a compromise that trades off cache consistency
  against server and network loading.  If the cache time is small,
  then the cache consistency will be high, but the server will be
  consulted frequently to check if the modification time has changed.
  If the cache time is set to 0, then the server will be consulted
  whenever the cached data are accessed.  If the cache time is long,
  then the server will be consulted infrequently, but there's a
  greater chance that the client may use stale cached
  data--consistency is low.
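The trade-off described in the excerpt can be counted directly.  The sketch below assumes one client repeatedly reading a single cached attribute with a fixed cache time; `server_calls` and the one-access-per-second workload are hypothetical.  It counts how many of those accesses have to go to the server.

```python
def server_calls(access_times, cache_time):
    """Count how many accesses must consult the server.

    Hypothetical model: one client, one cached attribute, fixed cache time.
    An access is served from cache only if the cached copy is younger than
    cache_time; otherwise the client re-fetches and restarts the clock.
    """
    calls = 0
    fetched_at = None
    for t in access_times:
        if fetched_at is None or t - fetched_at >= cache_time:
            calls += 1
            fetched_at = t
    return calls


# One access per second for ten minutes.
accesses = range(600)
for cache_time in (0, 3, 30, 60):
    print(cache_time, server_calls(accesses, cache_time))
```

With this workload a cache time of 0 sends all 600 accesses to the server, while 3, 30, and 60 seconds send 200, 20, and 10.  The price of the larger values is that a change made elsewhere can go unseen for up to the cache time.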

> Our clients are currently configured using default values for
> acreg{min,max} and acdir{min,max}.  Currently the above test can
> take as long as 60 seconds for the new version of testfile to show
> up in the listing.  The error can be reproduced in a number of ways
> beyond this example.

You are seeing typical times.  The 60-second worst case lines up with
the maximum directory cache time in the table below.

  Solaris Cache Time Ranges
                             Minimum      Maximum
                             Cache Time   Cache Time
  File (acreg{min,max})      3 seconds    30 seconds
  Directory (acdir{min,max}) 30 seconds   60 seconds
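A client still has to pick a cache time somewhere inside each range.  One common heuristic, found in BSD-derived NFS clients and assumed here purely for illustration (it is not a claim about Solaris internals), scales the cache time with how long ago the object was last modified and clamps it to the configured minimum and maximum.  The function name and the divisor of 10 are assumptions.

```python
def attribute_cache_time(now, mtime, minimum, maximum, divisor=10):
    """Pick a cache time from the object's age, clamped to [minimum, maximum].

    Assumption: the age/divisor rule mirrors a heuristic used by BSD-derived
    clients; the divisor and the clamping here are illustrative only.
    """
    age = max(0, now - mtime)
    return min(maximum, max(minimum, age / divisor))


FILE = (3, 30)        # file range from the table above, in seconds
DIRECTORY = (30, 60)  # directory range from the table above, in seconds

# A directory modified one second ago vs one untouched for an hour.
print(attribute_cache_time(100, 99, *DIRECTORY))    # 30
print(attribute_cache_time(3700, 100, *DIRECTORY))  # 60
```

Under this heuristic a directory that has not changed recently sits at the 60-second maximum, which is consistent with a change made from another client taking up to a minute to appear.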

> Questions:
> 
> - Is it possible to cause an NFS client to flush its directory or file
>    attribute caches?

As far as I know, the only way to do this is to set the cache time to
zero, which disables the cache entirely.  Of course, that can have
severe server-side load consequences.

A better approach is to design the application with NFS caching in
mind and avoid the problem.  I sympathize: if the application authors
weren't aware of NFS caching issues, they may have created a problem
that is hard to correct later.  But I believe redesigning really is
the way to solve the problem.

> - Is there a way to zero out the nfs traffic counters that nfsstat says it
>    can't error out?
> - Is there any way to tell the actual cache hit rate of attributes on the
>    client?

I don't know the answer to either of these.

> I know that reducing the attr* values to 0 will achieve what I want, but
> I am afraid of what that will do to the load on my file servers and am
> not certain about the best way to approach testing this.
> 
> Any help or pointers would be greatly appreciated...

I have had to deal with this same problem before, and the tactic I took
was to design the task to do everything it needed to do from the same
client.  Then there was only a single client cache, and the problem of
distributed cache coherency was avoided.

It is okay to create a new file on client A and then switch over to
access the file from client B.  Since this is a new file, client B
won't have it in its cache and will fetch it from the server.  That is
fine.  But avoid having client A access the file again after client B
has modified it.  That way leads to madness.  It is also okay to
switch to a third client and have it access the file, since the file
will not be in that client's cache.
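The workflow rule can be checked with a deliberately simplified model: three clients, each with its own cache of file contents and a fixed 30-second cache time.  The class names, the `ttl` value, and the file name are hypothetical, and the model ignores any revalidation a real client may perform.

```python
from dataclasses import dataclass, field


@dataclass
class FileServer:
    contents: dict = field(default_factory=dict)  # path -> data


@dataclass
class CachingClient:
    """Hypothetical model of one client's data cache with a fixed TTL."""
    server: FileServer
    ttl: float
    cache: dict = field(default_factory=dict)  # path -> (data, fetched_at)

    def read(self, path, now):
        entry = self.cache.get(path)
        if entry is not None and now - entry[1] < self.ttl:
            return entry[0]  # possibly stale
        data = self.server.contents.get(path)
        self.cache[path] = (data, now)
        return data

    def write(self, path, data, now):
        self.server.contents[path] = data
        self.cache[path] = (data, now)


server = FileServer()
a, b, c = (CachingClient(server, ttl=30) for _ in range(3))

a.write("job.out", "v1", now=0)   # create the file on A
print(b.read("job.out", now=1))   # B never cached it, so it fetches v1
b.write("job.out", "v2", now=2)   # B modifies the file
print(c.read("job.out", now=3))   # a third client also fetches: v2
print(a.read("job.out", now=3))   # A reuses its cached copy: v1 (stale)
```

The clients that had never cached the file see current data, while client A, returning to a file it already cached, sees the old contents until its cache time runs out.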

Note: My experience is all with NFSv2 and NFSv3, and I haven't used
NFSv4 at all yet.  In fact, it has been a few years since I worked
with NFS at all.  My knowledge of current issues is likely growing
stale, just like the buffer caches in distributed NFS clients. :-)

Bob