[NCLUG] sw raid, recovery after install

Sat Jan 19 02:40:09 MST 2013

On 01/17/2013 02:25 PM, Stephen Warren wrote:
> On 01/17/2013 04:48 AM, Michael Milligan wrote:
>> On 01/16/2013 11:08 PM, Sean Reifschneider wrote:
>>> On 01/16/2013 05:30 PM, Michael Milligan wrote:
>>> It may be worthless data, but I can see situations where it's a useful
>>> data-point.
>>>
>>> My recommendation would be to let the RAID sync...
>>>
>>
>> I'd rather tell it "--assume-clean" since I burn in new drives (which is
>> actually something you taught me a long time ago) and have all their
>> blocks zeroed after that is done, so I already know they will be synced.
> 
> That's an entirely different situation; you were recommending using
> --assume-clean without any qualification that I recall. If instead you

Yes, I was.

> do actually know that the disks are zero'd out first (or at least are
> filled with an identical pattern), then --assume-clean makes perfect
> sense. However, it'd be wrong to use --assume-clean without having
> explicitly caused it to be true first.

Sigh, I disagree, and I seem to have implied something I didn't intend...

For production servers, I check the drives first (badblocks, smartctl).
For a few home-use NASes I've built, I used either old server drives or
new drives (not desktop drives!) that I didn't bother burning in, and in
both cases I used "--assume-clean" to speed up the install.  I've had no
problems because of that choice.
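
As an aside, here is a minimal Python sketch of why zeroed drives can be
treated as already in sync.  It is not md's actual code: the helper
names and the 16-byte block size are made up for illustration, and it
only models XOR parity (RAID5-style) and plain mirroring (RAID1-style).

```python
from functools import reduce

BLOCK_SIZE = 16  # illustrative only; real md chunks are far larger


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks (RAID5-style parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def stripe_is_consistent(data_blocks, parity_block):
    """True when the stored parity equals the XOR of the data blocks."""
    return xor_blocks(data_blocks) == parity_block


def mirror_is_consistent(copies):
    """True when every mirror copy holds identical bytes (RAID1-style)."""
    return all(copy == copies[0] for copy in copies)


zeroed = bytes(BLOCK_SIZE)
# Three zeroed members: two data blocks plus one parity block.
zero_stripe_ok = stripe_is_consistent([zeroed, zeroed], zeroed)

pattern = bytes([0xAA]) * BLOCK_SIZE
# Two mirror members filled with the same non-zero pattern.
pattern_mirror_ok = mirror_is_consistent([pattern, pattern])

# The same pattern on three parity members: 0xAA ^ 0xAA is 0x00, not 0xAA.
pattern_stripe_ok = stripe_is_consistent([pattern, pattern], pattern)

print(zero_stripe_ok, pattern_mirror_ok, pattern_stripe_ok)
```

Run as-is, this prints "True True False": zeroed members pass both
checks, while an identical non-zero pattern passes the mirror check but
not the XOR parity check across three members.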

Again, I just don't care whether blocks that are not in use are in sync
when I install a new system using md.  New data gets written in sync,
and /all/ the blocks get synced up anyway the first time the array
recovers from an event that degrades it, well after I'm done with the
install.  If there are bad cylinders (or a cluster of them) on a drive,
they get discovered in either case when writing to them (or when
reading, in the case of a re-sync).  So the data I care about is
protected from a drive failure any way you look at it.

This was just my MO for installs, and it worked for me.  I never had
data loss because of it.  I've only ever had to recover from hairy RAID5
transient double-drive failures twice, which is all I was trying to
point out -- that it was possible to recover from that situation -- and
as far as I know that has nothing to do with drives being out of sync
for blocks unused by the file system.  Perhaps I just got lucky and
there is a really weird corner-case situation I never thought of and
never tripped over where a sync of all the blocks would have saved me, I
don't know.  That doesn't necessarily make it "wrong".

I just find md's lack of knowledge of which blocks really matter
inefficient, and that is one place where I like ZFS so much better.
Ingo had a reason to put "--assume-clean" in there and I made use of it
where I found it useful, just perhaps not for the same reason Ingo put
it in.  It worked for me; it doesn't sound like it sits well with you.
Just my $0.02.  Do what you think best and stick with md's default
behaviour if it doesn't bother you, but let's move on.  ;-)

Regards,
Mike