[NCLUG] RAID array not started on re-boot

Stephen Warren swarren at wwwdotorg.org
Mon Sep 9 14:56:04 MDT 2013


On 09/09/2013 02:09 PM, Kevin Olson wrote:
> As you surmised, we are using the partitions /dev/sda1 and /dev/sdb1 to
> form the array, and not the whole disks. There was an issue at some
> point in time in using the full disks, and since then we've always used
> the partitions. If one can reliably use the whole disks, I would think
> life would be easier.
> 
> It is just so strange that the "mdadm --assemble /dev/md1" worked
> without an issue, yet at boot it follows the logic you noted with the
> information from the "mdadm --misc --examine /dev/sda /dev/sdb" command.

I'm not sure, but it's possible the --assemble command gets its
configuration data from the config file, not the super-blocks on disk,
and hence isn't affected by any duplicate super-blocks.
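
If you want to see what the config file would feed to --assemble, something like the sketch below lists its DEVICE and ARRAY lines. The helper name show_md_config is made up for illustration, and I'm assuming the /etc/mdadm.conf path mentioned in this thread; some distributions keep the file elsewhere.

```shell
# Hypothetical helper: print the DEVICE and ARRAY lines from an mdadm
# config file (defaults to /etc/mdadm.conf, which is an assumption).
show_md_config() {
    conf="${1:-/etc/mdadm.conf}"
    if [ -r "$conf" ]; then
        grep -E '^[[:space:]]*(DEVICE|ARRAY)' "$conf" \
            || echo "no DEVICE/ARRAY lines in $conf"
    else
        echo "cannot read $conf"
    fi
}

show_md_config
# For comparison, "mdadm --examine --scan" (as root) prints ARRAY lines
# derived from whatever super-blocks it finds on disk.
```

If the two views disagree, that would at least be consistent with the duplicate super-block theory.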

> From your explanation, I can see where the scan finds the /dev/sda and
> /dev/sdb (the whole disks) and perhaps creates the (separate) arrays
> from those on boot, due to perhaps old data.

Yes, I guess you're ending up with duplicate md devices. I hope there's
no auto-correcting scrub running on /dev/md12[67]. Perhaps it wouldn't
be too much of an issue in practice, but it'd sure be scary.
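
A quick way to check is to read each array's sync_action file in sysfs. This is only a sketch: the function name md_sync_status is made up, and md126/md127 are just the names from this thread. A value of "idle" means nothing is running, "check" is a read-only scrub, and "repair" is the auto-correcting kind.

```shell
# Hypothetical helper: report the current sync action for each of the
# auto-assembled arrays named in this thread.
md_sync_status() {
    for md in md126 md127; do
        f="/sys/block/$md/md/sync_action"
        if [ -r "$f" ]; then
            echo "$md: $(cat "$f")"
        else
            echo "$md: not present"
        fi
    done
}

md_sync_status
```

/proc/mdstat should show the same activity in a more human-readable form.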

> We replaced one of the
> disks a few months ago, so I presume that explains the difference in the
> number of events.

True.

> It is a bit difficult to tear the array apart at the moment, otherwise
> I'd follow your advice and zero the disks. I believe I read somewhere
> that in the /etc/mdadm.conf, it is possible to specify which devices
> should be looked at. Should I consider modifying that configuration to
> explicitly list the partitions? Would this approach make any difference
> on boot?

I don't know how the initrd works in this case. It sounds like it's just
auto-scanning all devices and assembling any arrays it can find, and then
presumably not even looking at e.g. /dev/sda1, since /dev/sda was already
identified as an md device. You might be able to blacklist certain
device name patterns from scanning in the configuration file (though
maybe I'm thinking of LVM rather than md), but I don't know whether that
would influence the initrd...
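
For what it's worth, mdadm.conf(5) does describe a DEVICE line that restricts which devices mdadm scans for array members. A minimal sketch of what that might look like is below. It writes to a scratch file for review rather than touching /etc/mdadm.conf, and the UUID is a placeholder that would have to come from "mdadm --misc --examine /dev/sda1". Whether the initrd's auto-assembly honours this file at all is something I can't confirm.

```shell
# Sketch only: write a candidate config to a scratch file for review.
conf=$(mktemp)
cat > "$conf" <<'EOF'
# Only consider these partitions when looking for array members.
DEVICE /dev/sda1 /dev/sdb1
# Placeholder UUID: substitute the real value reported by
#   mdadm --misc --examine /dev/sda1
ARRAY /dev/md1 UUID=<uuid-of-md1>
EOF
cat "$conf"
```

If the initrd carries its own copy of the config file, it would presumably need to be regenerated after any change before the new settings could take effect at boot.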

Not being able to fix a scary data integrity issue sounds ... scary!

