[NCLUG] sw raid, recovery after install

Sean Reifschneider jafo at tummy.com
Tue Jan 22 11:17:37 MST 2013


On 01/19/2013 06:54 PM, Bob Proulx wrote:
> Sean Reifschneider wrote:
>> ... 32h27m ...
> 
> I think you might be running into the problem of just having a huge
> amount of data to move.  Disks have been getting larger faster than

Yeah, but that 120 hour verify time is certainly a lot of seeks for small
amounts of data, basically a large database job on the file-system
meta-data.

> Having the extra mechanisms should give you much better performance than

Indeed.  A few decades ago they had some drives that had two head
mechanisms in one drive, so it could do twice as many seeks.  Plus it could
also survive some modes of failure of one set of heads.  Bad ass!

> It is really no different than having a 2x 250G system and then
> upgrading them to 2x 500G disks and adding in the extra space as

Yeah, that's true for RAID-1.  I was thinking more of it as a use-case for
RAID-5 with many drives.  My points are really moot for you anyway, since
you are likely to have backups.  But so many people just don't, and also
don't do array monitoring and validation, so the likelihood of problems
goes up and the likelihood of frowning goes up.

> Because it is software raid I always reboot at certain points, makes
> me feel better, but in theory it is possible to do this upgrade by hot

I do the same.  When hot-swapping works, it saves time and is great.
However, when it doesn't work (say you swapped the wrong drive), it can be a
huge deal.  Saving five minutes of reboot time, when the downside is possible
data loss and downtime while a backup is restored, seems like a bad
trade-off...

When you swap a drive while the system is quiescent, you have more
opportunities to catch a mistake before it becomes a problem.  Like checking
the serial number of the drive you removed to make sure it was the correct
one...
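
Something like this, for example (smartctl is from smartmontools, and
/dev/sdb is just a placeholder device here):

    # Before pulling anything, note the serial number of the device you
    # intend to remove, then match it against the label on the drive you
    # actually pulled.
    smartctl -i /dev/sdb | grep -i serial

    # hdparm reports the same thing if smartmontools isn't installed:
    hdparm -I /dev/sdb | grep -i serial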

Though, for our systems with not all the drive bays full, we've taken to
adding a new drive and doing a rebuild, then removing the bad drive.
Removing a drive live from a system that has full consistency is less of an
issue.
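
With mdadm that sequence looks roughly like this, assuming a two-disk RAID-1
on /dev/md0 with the suspect drive still in it (device names made up):

    # Add the new drive and grow to a three-way mirror, so md rebuilds onto
    # the new member while the suspect drive is still providing redundancy.
    mdadm /dev/md0 --add /dev/sdc1
    mdadm --grow /dev/md0 --raid-devices=3

    # Watch /proc/mdstat; once the resync finishes and the array is fully
    # consistent, drop the suspect drive and shrink back to a two-way mirror.
    mdadm /dev/md0 --fail /dev/sdb1 --remove /dev/sdb1
    mdadm --grow /dev/md0 --raid-devices=2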

You still probably want to reboot with software RAID to make sure the boot
sector and boot order are correctly set up, though.  Better to find that out
now than later, when you do an update or something else goes wrong and you
have to remember the drive replacement and then track down and correct this
too.
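
For GRUB 2 on a two-disk mirror that mostly means making sure the boot loader
is on both members (a sketch; adjust devices to taste):

    # Install the boot loader onto each member so the machine can still boot
    # no matter which drive is the one that died.
    grub-install /dev/sda
    grub-install /dev/sdb

    # On Debian/Ubuntu this lets you pick all the install devices, so future
    # grub updates keep both copies current.
    dpkg-reconfigure grub-pc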

> low-tier reliability software raid cost point anyway.  If you need the
> 99.999 five nines uptime then you really want a hardware controller
> system anyway.

I'm really conflicted about hardware RAID.  Most of the controllers are
proprietary, so if you have an issue and need to try to get data off you
often can't figure out the on-disc layout without reverse-engineering it.
RAID-1 requires no intensive computation, so offloading that isn't a win.
But with 3 and 4TB discs, the "carving" feature to set up a /boot can be very
nice so you don't have to deal with GPT...  And with software RAID you don't
have to deal with BBU battery replacements...  On the other hand, the SSD storage
tiering under the new Adaptec controllers sounds VERY interesting.  I
haven't tested one yet though.
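
For reference, "dealing with GPT" for software RAID on a big disc boils down
to something like this (a sketch; /dev/sdb is made up):

    # GPT label, a tiny BIOS boot partition for GRUB's core image, then one
    # big partition for the RAID member.
    parted -s /dev/sdb mklabel gpt
    parted -s /dev/sdb mkpart grub 1MiB 2MiB
    parted -s /dev/sdb set 1 bios_grub on
    parted -s /dev/sdb mkpart md 2MiB 100%
    parted -s /dev/sdb set 2 raid on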

The "zero maintenance BBU" in the newer Adaptec controllers has been
WONDERFUL.  It uses an ultracapacitor and NAND flash, the ultracap can run
the controller long enough to write the cache off to flash, then it just
shuts down.  No battery to replace.

> It is probably ten minutes or so for me from install to finish.  It
> really depends upon how large of a system I am installing.  The
> desktop can take ten minutes just by itself.  But I never install a
> desktop on a server machine.

True.  I really don't put RAID on a desktop...  If I were to get annoyed
about anything wasting my time on Ubuntu or Debian it would be that the
installer is such an attention-whore.  Unless you have the pre-seeds set
up, it will do some of the install, ask the human some questions, do
some more install, ask more questions.  That drives me nuts!
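
For reference, the pre-seeding is just a flat answers file handed to
debian-installer; an illustrative fragment, nowhere near a complete preseed:

    # A few of the questions a preseed answers up front so the installer
    # doesn't have to stop and ask a human.
    d-i debian-installer/locale string en_US.UTF-8
    d-i keyboard-configuration/xkb-keymap select us
    d-i mirror/http/hostname string archive.ubuntu.com
    d-i mirror/http/directory string /ubuntu
    d-i partman-auto/method string raid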

The newer Ubuntu desktop installer, which does install tasks while it asks
you questions, is nice.

> It takes the Linux kernel SATA driver something like two minutes or so
> to detect a disk failure even if it is a hard failure such as yanking
> a cable out.  And during that time the system is not happy.  Things
> are usually blocked waiting behind that stuck drive.  And then when it
> finally times out everything breaks free, runs, and catches up.

This is another case where a hardware controller tends to be a win.  They
usually don't have the same sort of "system hangs while the driver figures
things out" issue.

> as the HP Compaq SmartArray controllers are really nice.  LED
> indicator lights on the drive.  All green means all good.  A drive
> fails and it displays a red light.  Grab the handle and pull it out

Agreed.  This is one thing I like about the Drobo: the main interface is via
the LEDs on the front, which for a desktop-oriented storage array is great.
That's the environment that is least likely to have array monitoring set
up...

> If I am over the ocean then I want a seaplane.  Then if I got into
> real trouble I could taxi the rest of the way.  :-)

:-)

We mentioned you the other night when some folks were talking about Buddy
Holly and The Big Bopper and their plane getting struck by lightning, and how
likely that was; we figured you'd know the stats off the top of your head.
:-)

> Oh...  I was assuming that the notification of a failure was a given
> with any type of raid.  It didn't occur to me that someone would set
> up a raid but then not receive failure notifications from it.

Unfortunately, all too common...
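
For md, at least, the notification side is only a couple of lines; a sketch,
with the address obviously made up:

    # /etc/mdadm/mdadm.conf -- where mdadm --monitor sends failure mail:
    MAILADDR alerts@example.com

    # The Debian/Ubuntu packages run the monitor daemon for you; by hand:
    mdadm --monitor --scan --daemonise

    # And a test message per array, to confirm the mail actually arrives:
    mdadm --monitor --scan --oneshot --test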

>> In our production environment we almost always consider this a "warning
>> sign" and replace the drive.  Since we've been running burn-in testing
>> before deploying drives, the number of drives falling out of arrays has
>> dropped dramatically.
> 
> If you don't mind saying, how long do you burn drives in?

At least 2 days or 5 read/write cycles, I think.  It used to be 10 passes
(or 50 read/write cycles), but with 2TB drives 10 passes takes too darn long.
"badblocks -svw -p 10 /dev/sdX"

> The problem isn't when we have been maintaining a system from birth
> through middle age and it is starting to get geriatric with old age

Indeed.

> Yes.  And a good backup _should_ exercise this just by the nature of
> needing to read the data for backup.

Sure, but if they have backups, the array failing unexpectedly is only an
inconvenience.  Unfortunately, some people seem to set up a RAID controller
and say "I should set up backups, but I've got a RAID controller so that
can wait."

>> Yeah, yeah, I know.  Not very green...  3 4TB 5400RPM drives would be way
>> better...  Not sure how I live with myself.  :-)
> 
> But then you would be having data bottle neck problems with that large
> data through the fewer mechanisms.

That system isn't really a high throughput system.  It does backups of
other systems and is storage for photos and the like...

> I like it!
> 
> But that is bash/ksh not python?  (shock!)  [Since $RANDOM and "=="
> are both ksh/bashisms. :-) ]

Indeed...  The core of that backup job is in shell because it's super well
tested.  I could translate it into Python, but that hasn't been a priority.

Sean


