[NCLUG] sw raid, recovery after install

Bob Proulx bob at proulx.com
Wed Jan 2 16:01:54 MST 2013


Matt Rosing wrote:
> I added a raid 5 to my computer with 3 identical drives. I mounted it. I 
> moved files to it. It looks like it's working. Then I looked at 
> /proc/mdstat and it says it's recovering:

That is normal just as everyone has said.  The data on the disk starts
out uninitialized and needs to be sync'd.
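
You can keep an eye on the rebuild either by watching /proc/mdstat or
by asking mdadm directly (assuming the array is /dev/md0 as in your
mdstat output):

  $ watch -n 60 cat /proc/mdstat
  # mdadm --detail /dev/md0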

> Is it normal to be recovering a disk right after the raid is created? It 
> could very well be that I set it up incorrectly.

I don't think RAID5 is the best choice here.  I would suggest RAID1
across two of the drives, or adding a fourth drive for RAID10.  RAID5
saves a little money up front at the cost of additional risk of data
loss later.  I won't put RAID5 on any of my own machines.

RAID5 is lower performance than RAID1 or RAID10.  Your three drives
(about 1.5T each, going by the mdstat numbers) give you a 3T array
with RAID5.  But four of the same drives would give you the same 3T
much more safely and with higher performance.  Or two larger drives
with RAID1 for not much more money.  Also, when degraded, RAID5 is
much slower than RAID1 or RAID10.
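
If it helps, creating a two-drive RAID1 or a four-drive RAID10 with
mdadm looks roughly like this; the device names below are only
placeholders for whatever drives you would actually use:

  # mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb /dev/sdc
  # mdadm --create /dev/md0 --level=10 --raid-devices=4 \
      /dev/sdb /dev/sdc /dev/sdd /dev/sde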

> md0 : active raid5 sdb[0] sdd[3] sdc[1]
>        2930274304 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/2] [UU_]
>        [========>...]  recovery = 40.6% (595347620/1465137152) finish=235.8min speed=61467K/sec

Also I think you put all of the disk space in one large array.  At
40% done your machine is estimating 235.8 minutes remaining, which
works out to roughly 400 minutes, between six and seven hours, for
the full raid sync.  That is fine.  But if you need to reboot or have
a power failure then the sync will restart and it will be another six
or seven hours from scratch.  (Easy math: at 61467 kB/sec that is
roughly 4.5 hours per 1T.)  But that seems pretty slow.  Probably
limited by speed_limit_min.
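
You can double check that estimate from the numbers in /proc/mdstat
with a little shell arithmetic (total blocks divided by the reported
speed):

  $ echo $(( 1465137152 / 61467 / 60 )) minutes
  397 minutes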

These days I create partitions of about 250G each.  From memory that
is around 30 minutes per 250G partition to sync on my machines.
Being able to check off smaller partitions like that is nicer when
doing a large data recovery.  A traditional partition table can
easily hold 11 partitions of 250G-300G, enough for up to a 3T disk.
Larger disks would need larger partitions.  But splitting the disk
into smaller, though still sizable, chunks splits the recovery into
smaller problems too.  This is just my own personal preference.
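
As a sketch, with hypothetical partition names, each partition pair
becomes its own small array that syncs and recovers independently:

  # mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
  # mdadm --create /dev/md2 --level=1 --raid-devices=2 /dev/sda2 /dev/sdb2

And so on, one array per 250G partition pair.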

The Linux kernel reserves disk bandwidth for the system to operate
while the disks are sync'ing.  That throttling is part of what is
slowing down your sync.  You can see the current speed limits:

  $ cat /proc/sys/dev/raid/speed_limit_min
  1000
  $ cat /proc/sys/dev/raid/speed_limit_max 
  200000

It is safe to use the machine during the sync phase.  But it will be
operating more slowly.  And you don't want to reboot or otherwise
disrupt it because that would restart the sync.  So if you are
leaving the machine idle so that it can finish syncing before you do
anything with it, you might as well increase the sync speed.
Otherwise you are just being slowed down artificially.

  # echo 5000 > /proc/sys/dev/raid/speed_limit_min

The machine will be less usable for doing other things but the raid
sync time will be hugely decreased.  Your motherboard might be able to
go faster too.  Try 10000 or larger.
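
The same knob can also be set through sysctl if you prefer.  Either
way the setting lasts only until the next reboot, which is fine since
you only need it while the sync is running:

  # sysctl -w dev.raid.speed_limit_min=10000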

> ... raid 5 to my computer with 3 identical drives

These words triggered alarm bells in my head, because identical
drives are much more likely to fail at about the same time.  I always
try to decouple the failure modes by using different drives.  Two (or
three) drives of the same type purchased all at once are likely to be
truly identical, all from the same assembly line at the same time,
perhaps even with consecutive serial numbers.  Not good for trying to
have independent failure modes, and RAID depends upon independent
failure modes.
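
If you are curious how close your drives really are, the serial
numbers are easy to check, for example with smartctl:

  # smartctl -i /dev/sdb | grep -i serial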

Twice now I have had identical drives fail within a week of their
sibling drive.  Once was very painful for me because I had to do an
urgent bare metal restore; fortunately I had a good backup.  The
second time I had already sync'd to a replacement disk, but I would
have lost the array if I had not.  The lesson is that if you get a
single disk failure, the limit of your raid fault tolerance, then it
is *urgent* that you replace the failed drive and restore the
redundancy of the system as soon as possible, before the remaining
disk fails and you lose the array.  When the drives are identical the
failures are not independent.
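
When that day comes the swap itself is only a few mdadm commands,
roughly along these lines, with placeholder device names:

  # mdadm /dev/md0 --fail /dev/sdb --remove /dev/sdb
  # mdadm /dev/md0 --add /dev/sde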

I now buy drives split between the three remaining disk drive
vendors.  Only Toshiba, WD and Seagate are left; every other vendor
has been absorbed by one of those three.

  http://en.wikipedia.org/wiki/List_of_defunct_hard_disk_manufacturers

Or I pair a new drive with one that has been running for months in a
different system, in order to offset their ages.  Two disks, one old
and one new, are safer than two identically aged drives.

A final note: RAID is not backup.  I am hoping that even with raid in
hand you are still planning a good backup strategy.

Here are some random references lobbying against RAID5 that I dug up
with a quick search.

  http://www.cyberciti.biz/tips/raid5-vs-raid-10-safety-performance.html

  http://www.miracleas.com/BAARF/RAID5_versus_RAID10.txt

  http://www.reddit.com/r/sysadmin/comments/ydi6i/dell_raid_5_is_no_longer_recommended_for_any/

  http://www.baarf.com/

Bob


