[NCLUG] RAID array not started on re-boot

Kevin Olson kholson67 at gmail.com
Mon Sep 9 13:26:51 MDT 2013


Thank you for the reply. When I ran mdadm --misc --examine against the
whole disks, this was the output:

[root@dta network-scripts]# mdadm --misc --examine /dev/sda
/dev/sda:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 6d394207:085c3321:0bfb6203:e9902df7
  Creation Time : Fri Sep  9 21:05:47 2011
     Raid Level : raid1
  Used Dev Size : 976551872 (931.31 GiB 999.99 GB)
     Array Size : 976551872 (931.31 GiB 999.99 GB)
   Raid Devices : 2
  Total Devices : 1
Preferred Minor : 126

    Update Time : Mon Aug 26 09:25:36 2013
          State : clean
 Active Devices : 1
Working Devices : 1
 Failed Devices : 1
  Spare Devices : 0
       Checksum : ef07b9d2 - correct
         Events : 168


      Number   Major   Minor   RaidDevice State
this     0       8        0        0      active sync   /dev/sda

   0     0       8        0        0      active sync   /dev/sda
   1     1       0        0        1      faulty removed
[root@dta network-scripts]# mdadm --misc --examine /dev/sdb
/dev/sdb:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 45d103af:e556b678:cc5ef4f6:b2487784
  Creation Time : Fri Sep  9 21:33:20 2011
     Raid Level : raid1
  Used Dev Size : 976551872 (931.31 GiB 999.99 GB)
     Array Size : 976551872 (931.31 GiB 999.99 GB)
   Raid Devices : 2
  Total Devices : 1
Preferred Minor : 127

    Update Time : Mon Aug 26 09:27:31 2013
          State : clean
 Active Devices : 1
Working Devices : 1
 Failed Devices : 1
  Spare Devices : 0
       Checksum : 2db5e120 - correct
         Events : 10


      Number   Major   Minor   RaidDevice State
this     0       8       16        0      active sync   /dev/sdb

   0     0       8       16        0      active sync   /dev/sdb
   1     1       0        0        1      faulty removed


At some other point I had run "mdadm -Evvvvs" (that is, --examine --scan
with a stack of 'v' flags for verbosity), and this information was present:

/dev/sdb1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : c9ed9248:decfe2c1:125c3342:5fad89df
           Name : dta.example.com:1  (local to host dta.example.com)
  Creation Time : Mon Aug 26 09:34:36 2013
     Raid Level : raid1
   Raid Devices : 2

 Avail Dev Size : 1952840168 (931.19 GiB 999.85 GB)
     Array Size : 976419904 (931.19 GiB 999.85 GB)
  Used Dev Size : 1952839808 (931.19 GiB 999.85 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 90d4fbf2:d6fa78fd:618fc550:369eb45f

    Update Time : Wed Aug 28 08:33:44 2013
       Checksum : 5cbfb0f7 - correct
         Events : 19


   Device Role : Active device 1
   Array State : AA ('A' == active, '.' == missing)
/dev/sdb:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 45d103af:e556b678:cc5ef4f6:b2487784
  Creation Time : Fri Sep  9 21:33:20 2011
     Raid Level : raid1
  Used Dev Size : 976551872 (931.31 GiB 999.99 GB)
     Array Size : 976551872 (931.31 GiB 999.99 GB)
   Raid Devices : 2
  Total Devices : 1
Preferred Minor : 127

    Update Time : Mon Aug 26 09:27:31 2013
          State : clean
 Active Devices : 1
Working Devices : 1
 Failed Devices : 1
  Spare Devices : 0
       Checksum : 2db5e120 - correct
         Events : 10


      Number   Major   Minor   RaidDevice State
this     0       8       16        0      active sync   /dev/sdb

   0     0       8       16        0      active sync   /dev/sdb
   1     1       0        0        1      faulty removed

/dev/sda1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : c9ed9248:decfe2c1:125c3342:5fad89df
           Name : dta.example.com:1  (local to host dta.example.com)
  Creation Time : Mon Aug 26 09:34:36 2013
     Raid Level : raid1
   Raid Devices : 2

 Avail Dev Size : 1952840168 (931.19 GiB 999.85 GB)
     Array Size : 976419904 (931.19 GiB 999.85 GB)
  Used Dev Size : 1952839808 (931.19 GiB 999.85 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : bc0a17d6:8a1426b2:ed0416bc:22046206

    Update Time : Wed Aug 28 08:33:44 2013
       Checksum : 685dc4e - correct
         Events : 19


   Device Role : Active device 0
   Array State : AA ('A' == active, '.' == missing)
/dev/sda:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 6d394207:085c3321:0bfb6203:e9902df7
  Creation Time : Fri Sep  9 21:05:47 2011
     Raid Level : raid1
  Used Dev Size : 976551872 (931.31 GiB 999.99 GB)
     Array Size : 976551872 (931.31 GiB 999.99 GB)
   Raid Devices : 2
  Total Devices : 1
Preferred Minor : 126

    Update Time : Mon Aug 26 09:25:36 2013
          State : clean
 Active Devices : 1
Working Devices : 1
 Failed Devices : 1
  Spare Devices : 0
       Checksum : ef07b9d2 - correct
         Events : 168


      Number   Major   Minor   RaidDevice State
this     0       8        0        0      active sync   /dev/sda

   0     0       8        0        0      active sync   /dev/sda
   1     1       0        0        1      faulty removed

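Note that there are two generations of metadata in that dump: version 0.90
superblocks dated September 2011 on the raw disks (with two different
UUIDs), and version 1.2 superblocks dated August 2013 on the partitions
(sharing a single Array UUID). For what it is worth, a quick way to see
that side by side would be to filter the same examine output, along these
lines:

mdadm -E /dev/sda /dev/sda1 | grep -E 'Version|UUID|Creation Time'
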
The disks show they have the correct partition type (0xfd, Linux raid
autodetect), though as I understand it that type only matters for in-kernel
autodetect of version 0.90 metadata; version 1.2 arrays are assembled from
userspace anyway:
[root@dta onlinebkup]# fdisk -l /dev/sda

Disk /dev/sda: 1000.0 GB, 999989182464 bytes
255 heads, 63 sectors/track, 121575 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1               1      121575   976551156   fd  Linux raid autodetect
[root@dta onlinebkup]# fdisk -l /dev/sdb

Disk /dev/sdb: 1000.0 GB, 999989182464 bytes
255 heads, 63 sectors/track, 121575 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1               1      121575   976551156   fd  Linux raid autodetect


blkid returns (for these two drives):
/dev/sda1: UUID="c9ed9248-decf-e2c1-125c-33425fad89df"
TYPE="linux_raid_member" UUID_SUB="bc0a17d6-8a14-26b2-ed04-16bc22046206"
LABEL="dta.example.com:1"
/dev/sdb1: UUID="c9ed9248-decf-e2c1-125c-33425fad89df"
TYPE="linux_raid_member" UUID_SUB="90d4fbf2-d6fa-78fd-618f-c550369eb45f"
LABEL="dta.example:1"
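
One thing I have not checked yet is whether /etc/mdadm.conf carries an
ARRAY line for this raid1, and whether the copy inside the initramfs
matches; a stale or missing mdadm.conf in the initramfs seems like one way
assembly could go wrong at boot. Something like this is what I have in
mind (the initramfs path is just the standard CentOS 6 location, so treat
that as an assumption):

grep ^ARRAY /etc/mdadm.conf

# the initramfs is a gzipped cpio archive; list it to see if mdadm.conf is inside
zcat /boot/initramfs-$(uname -r).img | cpio -it | grep mdadm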


What I find interesting is how different the mdadm --misc --examine output
is for the raid0 member disks than for what was reported above: on those
disks mdadm finds only the MBR, with no leftover superblock on the raw
device. I also looked at another machine we have with raid1, and its
output from the same command resembles the raid0 shown here:

[root@dta onlinebkup]# mdadm --misc --examine /dev/sde /dev/sdf
/dev/sde:
   MBR Magic : aa55
Partition[0] :   1953518017 sectors at         2048 (type fd)
/dev/sdf:
   MBR Magic : aa55
Partition[0] :   1953520002 sectors at           63 (type fd)


Seems like something is not quite configured correctly at some level with
this raid1.
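
If the version 0.90 superblocks on the raw disks really are stale
leftovers from the arrays created in 2011, my tentative plan (untested,
and only after taking a backup) would be something like the sketch below.
My understanding is that --zero-superblock erases only the md metadata it
finds on the named device, so pointing it at the raw disks should leave
the version 1.2 superblocks inside the partitions alone, but I would
welcome a sanity check on that:

# stop the stray degraded arrays built from the whole-disk superblocks
mdadm --stop /dev/md126 /dev/md127

# erase the stale 0.90 metadata on the raw disks only (not the partitions)
mdadm --zero-superblock /dev/sda /dev/sdb

# assemble the intended raid1 from the partitions
mdadm --assemble /dev/md1 /dev/sda1 /dev/sdb1

# append ARRAY lines for the running arrays, then prune by hand
mdadm --detail --scan >> /etc/mdadm.conf

# rebuild the initramfs so it picks up the new mdadm.conf
dracut -f /boot/initramfs-$(uname -r).img $(uname -r)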

Any thoughts on where to go next? (Stephen, sorry for the duplication)


On Mon, Sep 9, 2013 at 12:40 PM, Stephen Warren <swarren at wwwdotorg.org> wrote:

> On 09/09/2013 12:21 PM, Kevin Olson wrote:
> > Greetings!
> >
> > At work, we are suffering from a strange issue, and I am hoping the
> > collective wisdom of the group can provide insight.
> >
> > We are running a box with CentOS 6.4, fully updated (well, perhaps minus
> > anything in the last week). This machine has two software RAID arrays
> > created with mdadm. One is a RAID1, and one is RAID0. In the normal
> > course
> > of events, the RAID1 runs on /dev/md1 and the RAID0 on /dev/md2. The UUID
> > of each RAID is in /etc/fstab, and mount works when the devices are
> > running.
> >
> > Due to a strange combination of effects, in the past two weeks the
> > machine
> > has twice lost power. In each instance, when power was restored to the
> > machine, the RAID0 was properly built with its two devices, but for some
> > reason the RAID1 was not created; instead, each of the two disks was
> > made into an independent (though running in degraded mode) RAID array.
> >
> > [root@dta ~]$ cat /proc/mdstat
> > Personalities : [raid1] [raid0]
> > md126 : active (auto-read-only) raid1 sdb[0]
> >      976551872 blocks [2/1] [U_]
> >
> > md127 : active (auto-read-only) raid1 sda[0]
> >     976551872 blocks [2/1] [U_]
> >
> > md2 : active raid0 sde1[0] sdf1[1]
> >    1953518592 blocks super 1.2 512k chunks
> >
> > unused devices: <none>
> ...
> > My questions are:
> > * Why upon the restarting of the machine was /dev/md1 not properly
> > created?
>
> Are the partition types still set to Linux RAID auto-detect (0xfd)?
>
> Did the RAID super-blocks get corrupted? Try running the following to
> make sure that the UUIDs in the super-blocks match (although I don't
> know why the --assemble command would work later if they are):
>
> # sudo mdadm --misc --examine /dev/sda
> # sudo mdadm --misc --examine /dev/sdb
>
> > * Why did the system decide to create /dev/md126 and /dev/md127?
>
> I guess it thinks they aren't part of the same RAID array any more?
>
> Perhaps try rebuilding your initrd in case that got corrupt. It seems
> unlikely, but you never know.
>
>

