[NCLUG] sw raid, recovery after install

Michael Milligan milli at acmeps.com
Wed Jan 16 14:37:12 MST 2013


On 01/02/2013 11:35 PM, Zak Smith wrote:
> Between about 5 and 3 years ago, my brother and I both lost data using
> linux RAID5 in circumstances that had nothing worse than one device
> with a transient error.  (These were generally using the SATA versions
> of enterprise type drives.)   

Software RAID5 has long been stable, including during that time frame.
Sorry for your loss.  But on two occasions I've had two drives "fail"
transiently at the same time, and I recovered from both with no data
loss after putting eyeballs on the situation.  I think perhaps you
didn't try hard enough to inspect and, if necessary, copy the
superblocks, and then force-re-create the arrays (without re-syncing).
It's not trivial, but it's certainly possible when you haven't truly
lost a drive.
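
Much of the inspection step comes down to comparing the metadata that
`mdadm --examine` prints from each member's superblock, in particular
the event counter.  A rough sketch in Python, assuming the counters have
already been read off by hand; the device names and numbers below are
made up for illustration, and `rank_by_events` is a hypothetical helper,
not part of any tool:

```python
def rank_by_events(events):
    """Return (device, events_behind) pairs, freshest member first.

    `events` maps a member device to the event counter read from its
    superblock. A member that is only a few events behind the newest
    one was probably kicked out by a transient error, not a real
    failure.
    """
    newest = max(events.values())
    return sorted(
        ((dev, newest - count) for dev, count in events.items()),
        key=lambda pair: pair[1],
    )


# Hypothetical counters for a four-drive array (illustration only).
events = {
    "/dev/sda1": 48213,
    "/dev/sdb1": 48213,
    "/dev/sdc1": 48207,
    "/dev/sdd1": 48190,
}

for dev, behind in rank_by_events(events):
    print(f"{dev}: {behind} events behind the newest member")
```

The members that are furthest behind hold the stalest data, so they are
the ones to be most careful about before forcing anything.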

That said, I've found ZFSonLinux, the native kind, is stable enough for
my data and have been using it for the past year+.  Its RAID5
equivalent, RAIDZ, does not suffer from the RAID5 write hole.  The
instant file system snapshot feature alone makes it worth the investment
in learning and setting up.  But it's the save-your-ass checksumming on
every block that you'll love it for, i.e., it detects /and corrects/ bit
rot.  RAID5 can detect a mismatch but has no way of knowing which of the
data blocks or the parity block is the bad one (no per-block checksums).
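
To make the parity-versus-checksum point concrete, here is a toy sketch
in Python.  It assumes a stripe of two data blocks plus one XOR parity
block and uses CRC32 as a stand-in checksum; it is not how md or ZFS
actually lay out or checksum data, and all the names in it are made up
for illustration:

```python
import zlib
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together (RAID5-style parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity_consistent(stripe):
    """True if data XOR parity is all zeros, i.e. the stripe agrees."""
    return not any(xor_blocks(stripe))


def find_bad(stripe, checksums):
    """Indexes of blocks whose contents no longer match their checksum."""
    return [i for i, blk in enumerate(stripe) if zlib.crc32(blk) != checksums[i]]


def repair(stripe, checksums):
    """Rebuild the single bad block from the remaining blocks."""
    bad = find_bad(stripe, checksums)
    if len(bad) != 1:
        raise ValueError("can only repair exactly one bad block per stripe")
    i = bad[0]
    fixed = list(stripe)
    fixed[i] = xor_blocks([blk for j, blk in enumerate(stripe) if j != i])
    return fixed


# Toy stripe: two data blocks and their parity block.
data = [b"AAAA", b"BBBB"]
stripe = data + [xor_blocks(data)]

# Per-block checksums, kept apart from the blocks they describe.
checksums = [zlib.crc32(blk) for blk in stripe]

# Bit rot: silently flip one bit in the first data block.
damaged = list(stripe)
damaged[0] = bytes([damaged[0][0] ^ 0x01]) + damaged[0][1:]

# Parity alone only says that something in the stripe disagrees.
print("parity consistent:", parity_consistent(damaged))
# The checksums say which block is wrong, so it can be rebuilt.
print("bad blocks:", find_bad(damaged, checksums))
print("repaired:", repair(damaged, checksums) == stripe)
```

Without the `checksums` list, the only thing left to go on is the
parity mismatch, and that does not say whether a data block or the
parity block is the one that rotted.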

Regards,
Mike



