[NCLUG] discrepancy between reiserfs and ext3?

jbass at dmsd.com jbass at dmsd.com
Fri May 23 14:31:59 MDT 2003


Hi Robie,

Tough problem, since several layers of software are interacting here,
and any of them, or the combination, could produce what you describe.

First, can the application be run on the same machine as the
filesystems to remove NFS from the equation? There is a good
chance the problem is with NFS over reiserfs, and not the
reiserfs itself.

This is especially true if multiple applications/processes are
accessing the database concurrently, as local caching and the
ordering of filesystem operations are not as rigid once NFS is added
to the equation, and locking protocols are relaxed via the NFS lock
manager.

As a first stage in the debugging, you might want to work hard to
find the minimal failure sequence, then trace the NFS requests
that generate the failure and compare the sequence and ordering
with the same request stream using ext3. If they appear effectively
the same, then the next step is a bit harder: debugging the path from
the NFS server to the filesystem. A binary comparison of the good and
failed database files under the same workload stream might prove useful.
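
For the binary comparison, something along these lines would do as a
starting point. This is only a rough sketch in Python: the function
names are made up for illustration, and it assumes both copies of the
table are small enough to read into memory. It reports the byte ranges
where the two files differ, which is usually enough to tell a reordering
of records from real corruption.

```python
import sys


def diff_ranges(good: bytes, bad: bytes):
    """Return half-open (start, end) byte ranges where the buffers differ.

    If the lengths differ, the extra tail is reported as its own range.
    """
    ranges = []
    start = None
    common = min(len(good), len(bad))
    for i in range(common):
        if good[i] != bad[i]:
            if start is None:
                start = i          # a differing run begins here
        elif start is not None:
            ranges.append((start, i))  # the run ended at the previous byte
            start = None
    if start is not None:
        ranges.append((start, common))
    if len(good) != len(bad):
        ranges.append((common, max(len(good), len(bad))))
    return ranges


def compare_files(good_path, bad_path):
    """Read both files fully and return their differing byte ranges."""
    with open(good_path, "rb") as f:
        good = f.read()
    with open(bad_path, "rb") as f:
        bad = f.read()
    return diff_ranges(good, bad)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: script.py <table written on ext3> <table written on reiserfs>
    for start, end in compare_files(sys.argv[1], sys.argv[2]):
        print(f"bytes {start}..{end - 1} differ")
```

An empty result means the two files are byte-identical, which would
point away from the on-disk data and toward how the files are read.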

Instrumenting the kernel nfs server to show the sequence of operations
to the filesystem is a bit tougher, and then comparing that to the
generated disk I/O stream is a bit more work ... and you will learn
a lot about kernel filesystems and disk I/O in the process.
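
Whichever level the operations are captured at, the comparison itself
is mechanical once each run is reduced to a list of operations in the
order they occurred. A rough Python sketch, assuming a hypothetical
format of one string per operation (the strings in the usage comments
are invented, not real trace output): it finds the first point where
the two runs diverge, and checks whether they contain the same
operations in a different order.

```python
from collections import Counter


def first_divergence(ext3_ops, reiser_ops):
    """Return the index of the first differing operation, or None.

    If one sequence is a strict prefix of the other, the index just past
    the shorter sequence is returned.
    """
    for i, (a, b) in enumerate(zip(ext3_ops, reiser_ops)):
        if a != b:
            return i
    if len(ext3_ops) != len(reiser_ops):
        return min(len(ext3_ops), len(reiser_ops))
    return None


def same_ops_different_order(ext3_ops, reiser_ops):
    """True when both runs hold the same operations but ordered differently."""
    same_content = Counter(ext3_ops) == Counter(reiser_ops)
    return same_content and list(ext3_ops) != list(reiser_ops)


# Hypothetical usage, with made-up operation strings:
#   ext3_run   = ["LOOKUP tbl", "READ off=0", "READ off=512"]
#   reiser_run = ["LOOKUP tbl", "READ off=512", "READ off=0"]
#   first_divergence(ext3_run, reiser_run)          # index of first mismatch
#   same_ops_different_order(ext3_run, reiser_run)  # ordering-only difference?
```

If the second check comes back true, the two runs do the same work and
only the ordering differs, which narrows things down considerably.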

If this is a Red Hat based system you could punt and simply file a
bug report, but without a Linux-to-Linux job stream that replicates
the error and provides a test case for debugging, it's not likely this
will get debugged and fixed in the near term.

If this is really important you may need to hire a consultant to do
the debugging, if the skills do not exist in house.  You might well be
looking at 3-7 man-days to isolate the problem and maybe another
week or two to develop a fix if the problem is very complex.

Have fun,
John Bass


robiel <robiel at tgstech.com> writes:
> I have a redhat9 box where the root partition is on ext3, and a second drive 
> is reiserfs.  
>
> The problem:  an ArcInfo table (a small binary database table) stored on the 
> ext3 partition returns success after a particular testing script is run on it.  
> The same ArcInfo table fails the same test when stored on the reiserfs 
> partition.  
>
> If I copy the ArcInfo table from the reiserfs part. to the ext3 part., it 
> still fails.  Basically, once the info table has been written to the reiserfs 
> part., it will fail the test.  
>
> The data does not appear to be corrupt, however.  The info table can be read 
> successfully and returns the expected results with other tools whether stored 
> on ext3 or reiserfs.
>
> The problem appears to be that results from the reiserfs part. are returned in 
> a different order than from the ext3 (or hfs) partitions.
>
> As a last note, both partitions are accessed via NFS from an hp-ux11 box.
>
> Does anyone have any ideas why this might be happening and/or ways to correct 
> it?
>
> Thanks, Robie Lutsey.


