[NCLUG] RAID array not started on re-boot

Stephen Warren swarren at wwwdotorg.org
Mon Sep 9 13:45:43 MDT 2013


On 09/09/2013 01:26 PM, Kevin Olson wrote:
> Thank you for the reply. When I ran the mdadm --misc --examine, this is
> the output:
> 
> [root at dta network-scripts]# mdadm --misc --examine /dev/sda
> /dev/sda:
...
> Preferred Minor : 126
...
>  Active Devices : 1
> Working Devices : 1
>  Failed Devices : 1
>   Spare Devices : 0
>        Checksum : ef07b9d2 - correct
>          Events : 168

So the md super-block in /dev/sda thinks there's 1 device in the md
array described there, which is part of /dev/md126.

> /dev/sdb:
...
>   Total Devices : 1
> Preferred Minor : 127
...
>  Active Devices : 1
> Working Devices : 1
>  Failed Devices : 1
>   Spare Devices : 0
>        Checksum : 2db5e120 - correct
>          Events : 10

Similarly, that says there's 1 device in the md array described there,
which is part of /dev/md127.

That certainly matches what is happening when you boot; you get two
arrays just like the md superblocks say.

That implies to me that those two disks are not part of the same RAID
array. The different "Events" values (168 vs. 10) also imply those two
devices haven't been bound into an md array anywhere near the same number
of times.

This is interesting though: your fdisk output later says that you have 1
partition on each device, /dev/sda1 and /dev/sdb1. Presumably you're
expecting those two partitions to form the RAID array, not the two raw
disks? That would align better with...

> I had at some other point done "mdadm -Evvvvs" (a lot of 'v'
> characters), and this information was present:
> 
> /dev/sdb1:
...
>      Array UUID : c9ed9248:decfe2c1:125c3342:5fad89df
...
> /dev/sda1:
...
>      Array UUID : c9ed9248:decfe2c1:125c3342:5fad89df

Yes, and I'm confused about why "mdadm -Evvvvs" appears to detect md
superblocks on both /dev/sda1 and /dev/sda.

Perhaps there are some stale md superblocks on the raw disks in the
space before the partitions start, and so everything is getting confused?
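
To see at a glance which superblock claims which array, the identifying
fields can be pulled out of the --examine output for the raw disks and the
partitions side by side. This is only a rough sketch: md_summary is a
made-up helper name, and the field labels it matches are the ones visible
in the output quoted in this thread.

```shell
# Keep only the lines of `mdadm --examine` output that identify an array:
# the device header, the array UUID, the preferred minor, and the event count.
# Reads stdin, so it also works on output saved to a file.
md_summary() {
    grep -E '^[[:space:]]*(/dev/[^:]+:|(Array )?UUID :|Preferred Minor :|Events :)'
}

# Example (needs root; device names are the ones from this thread):
#   for dev in /dev/sda /dev/sda1 /dev/sdb /dev/sdb1; do
#       mdadm --examine "$dev" | md_summary
#   done
```

Members of one healthy array should show the same UUID and the same (or
very nearly the same) Events count.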

So, something is clearly screwed up. It doesn't seem like the power
cycle corrupted the md superblocks, since the checksums match.

To be honest, I'm not sure the md arrays were ever created correctly. My
advice would be:

* Back up complete disk images of the two raw drives (not partitions) in
case something goes wrong in later steps, or to investigate the problem
later.
* Wipe the two drives completely.
* Re-create the RAID array.
* Verify that the "mdadm -Evvvvs" output looks sane, and that the array
auto-assembles during boot.
* Restore data from backups.
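
The steps above might translate into something like the following. It is a
sketch only, and it deliberately prints the commands for review rather than
running them, since several are destructive. Assumptions that are not from
this thread: the backup location (/mnt/backup by default), the new array
name /dev/md0, RAID level 1, and that the members should be /dev/sda1 and
/dev/sdb1. rebuild_plan is a made-up name.

```shell
# Print (never execute) a candidate command sequence for the rebuild.
# Optional argument: directory for the disk images (hypothetical default).
rebuild_plan() {
    backup_dir=${1:-/mnt/backup}
    # Order: stop the two half-arrays, image both raw drives, wipe both
    # drives, re-create the array (RAID level 1 is assumed), then verify.
    cat <<EOF
mdadm --stop /dev/md126
mdadm --stop /dev/md127
dd if=/dev/sda of=$backup_dir/sda.img bs=1M
dd if=/dev/sdb of=$backup_dir/sdb.img bs=1M
dd if=/dev/zero of=/dev/sda bs=1M
dd if=/dev/zero of=/dev/sdb bs=1M
# re-create one partition per drive here (e.g. with fdisk), then:
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
mdadm -Evvvvs
EOF
}

rebuild_plan
```

Check every device name against your own system before running any of the
printed commands as root; the dd lines that write to /dev/sda and /dev/sdb
erase everything on those drives.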

FWIW, below is the "mdadm -Evvvvs" output from my system with 2 HDDs,
each with 2 partitions, forming two md RAID-1 arrays, each comprising 1
partition from each of the two drives.

> [swarren at swarren-lx1 kernel.git]$ sudo mdadm -Evvvvs
> mdadm: No md superblock detected on /dev/dm-3.
> mdadm: No md superblock detected on /dev/dm-2.
> mdadm: No md superblock detected on /dev/dm-1.
> mdadm: No md superblock detected on /dev/dm-0.
> mdadm: No md superblock detected on /dev/md1.
> mdadm: No md superblock detected on /dev/md0.
> /dev/sdb2:
>           Magic : a92b4efc
>         Version : 0.90.00
>            UUID : 4b68b141:faf315b2:949ac743:757a0ce8 (local to host swarren-lx1)
>   Creation Time : Thu Mar 17 16:18:13 2011
>      Raid Level : raid1
>   Used Dev Size : 1952981760 (1862.51 GiB 1999.85 GB)
>      Array Size : 1952981760 (1862.51 GiB 1999.85 GB)
>    Raid Devices : 2
>   Total Devices : 2
> Preferred Minor : 1
> 
>     Update Time : Mon Sep  9 13:40:29 2013
>           State : clean
>  Active Devices : 2
> Working Devices : 2
>  Failed Devices : 0
>   Spare Devices : 0
>        Checksum : db4c3f3 - correct
>          Events : 1189
> 
> 
>       Number   Major   Minor   RaidDevice State
> this     1       8       18        1      active sync   /dev/sdb2
> 
>    0     0       8        2        0      active sync   /dev/sda2
>    1     1       8       18        1      active sync   /dev/sdb2
> /dev/sdb1:
>           Magic : a92b4efc
>         Version : 0.90.00
>            UUID : 7d1f1cdb:e937beee:949ac743:757a0ce8 (local to host swarren-lx1)
>   Creation Time : Thu Mar 17 16:17:42 2011
>      Raid Level : raid1
>   Used Dev Size : 530048 (517.71 MiB 542.77 MB)
>      Array Size : 530048 (517.71 MiB 542.77 MB)
>    Raid Devices : 2
>   Total Devices : 2
> Preferred Minor : 0
> 
>     Update Time : Sun Sep  8 07:52:59 2013
>           State : clean
>  Active Devices : 2
> Working Devices : 2
>  Failed Devices : 0
>   Spare Devices : 0
>        Checksum : b94e26cd - correct
>          Events : 882
> 
> 
>       Number   Major   Minor   RaidDevice State
> this     1       8       17        1      active sync   /dev/sdb1
> 
>    0     0       8        1        0      active sync   /dev/sda1
>    1     1       8       17        1      active sync   /dev/sdb1
> /dev/sdb:
>    MBR Magic : aa55
> Partition[0] :      1060227 sectors at           63 (type fd)
> Partition[1] :   3905963775 sectors at      1060290 (type fd)
> /dev/sda2:
>           Magic : a92b4efc
>         Version : 0.90.00
>            UUID : 4b68b141:faf315b2:949ac743:757a0ce8 (local to host swarren-lx1)
>   Creation Time : Thu Mar 17 16:18:13 2011
>      Raid Level : raid1
>   Used Dev Size : 1952981760 (1862.51 GiB 1999.85 GB)
>      Array Size : 1952981760 (1862.51 GiB 1999.85 GB)
>    Raid Devices : 2
>   Total Devices : 2
> Preferred Minor : 1
> 
>     Update Time : Mon Sep  9 13:40:29 2013
>           State : clean
>  Active Devices : 2
> Working Devices : 2
>  Failed Devices : 0
>   Spare Devices : 0
>        Checksum : db4c3e1 - correct
>          Events : 1189
> 
> 
>       Number   Major   Minor   RaidDevice State
> this     0       8        2        0      active sync   /dev/sda2
> 
>    0     0       8        2        0      active sync   /dev/sda2
>    1     1       8       18        1      active sync   /dev/sdb2
> /dev/sda1:
>           Magic : a92b4efc
>         Version : 0.90.00
>            UUID : 7d1f1cdb:e937beee:949ac743:757a0ce8 (local to host swarren-lx1)
>   Creation Time : Thu Mar 17 16:17:42 2011
>      Raid Level : raid1
>   Used Dev Size : 530048 (517.71 MiB 542.77 MB)
>      Array Size : 530048 (517.71 MiB 542.77 MB)
>    Raid Devices : 2
>   Total Devices : 2
> Preferred Minor : 0
> 
>     Update Time : Sun Sep  8 07:52:59 2013
>           State : clean
>  Active Devices : 2
> Working Devices : 2
>  Failed Devices : 0
>   Spare Devices : 0
>        Checksum : b94e26bb - correct
>          Events : 882
> 
> 
>       Number   Major   Minor   RaidDevice State
> this     0       8        1        0      active sync   /dev/sda1
> 
>    0     0       8        1        0      active sync   /dev/sda1
>    1     1       8       17        1      active sync   /dev/sdb1
> /dev/sda:
>    MBR Magic : aa55
> Partition[0] :      1060227 sectors at           63 (type fd)
> Partition[1] :   3905963775 sectors at      1060290 (type fd)


