[NCLUG] Hard disk failure

John L. Bass jbass at dmsd.com
Sat May 19 01:43:10 MDT 2001


When drives crash, the head/platter contamination builds up fast until the drag
causes the drive to detect a spindle fault and it basically shuts down hard.
That generally happens within a few hours - some drives might run a few days.

I generally find it advisable to copy the drive to another drive FIRST before
attempting recovery. This is very risky, since passing the heads across
the entire surface may cause the final crash - so be prepared to get it right
the first time.

I've written the following program a few dozen times, I just don't have a copy
near at hand:

    set all bits in map NSectors long
    set sectors to NSectors		// NSectors is drive size
    set passes to 0

    zero all sectors on copy drive

    while sectors {
        set map pointer at beginning of map
        while search map forward for unread sector {
            read this sector
            if good {
                write sector to copy
                clear map[sector]
                decrement sectors
            }
        }
        while search map backward for unread sector {
            read this sector
            if good {
                write sector to copy
                clear map[sector]
                decrement sectors
            }
        }
        increment passes
        print passes sectors
        if passes greater than 1000 break
    }

    dump map of unread sectors

For the light programmers, each read/write above is preceded by an lseek,
and is one sector in size.
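
For anyone who wants something closer to runnable, here is a minimal Python
sketch of the loop above. It is not my original program. It assumes 512-byte
sectors, that a failed read shows up as an OSError or a short read, and that
both drives are already open as file descriptors. The names rescue,
read_sector, SECTOR_SIZE and max_passes are made up for this sketch.

```python
import os

SECTOR_SIZE = 512  # assumption: 512-byte sectors


def read_sector(fd, sector):
    """Return one sector's bytes, or None if the read fails or comes up short."""
    try:
        os.lseek(fd, sector * SECTOR_SIZE, os.SEEK_SET)
        data = os.read(fd, SECTOR_SIZE)
    except OSError:
        return None
    return data if len(data) == SECTOR_SIZE else None


def write_sector(fd, sector, data):
    """Seek to the sector on the copy and write it."""
    os.lseek(fd, sector * SECTOR_SIZE, os.SEEK_SET)
    os.write(fd, data)


def rescue(src_fd, dst_fd, nsectors, max_passes=1000, read=read_sector):
    """Copy src to dst, sweeping forward then backward until done or out of passes.

    Returns the list of sectors that were never read successfully.
    """
    unread = bytearray(b"\x01") * nsectors  # one flag per sector, 1 = unread
    remaining = nsectors
    passes = 0

    # Zero the copy first so unread sectors are recognizable afterwards.
    zeros = bytes(SECTOR_SIZE)
    for sector in range(nsectors):
        write_sector(dst_fd, sector, zeros)

    while remaining:
        # One forward sweep, then one backward sweep, over the unread sectors.
        for sweep in (range(nsectors), range(nsectors - 1, -1, -1)):
            for sector in sweep:
                if not unread[sector]:
                    continue
                data = read(src_fd, sector)
                if data is None:
                    continue  # still bad, try again on a later sweep
                write_sector(dst_fd, sector, data)
                unread[sector] = 0
                remaining -= 1
        passes += 1
        print(passes, remaining)
        if passes > max_passes:
            break

    return [s for s in range(nsectors) if unread[s]]
```

The returned list plays the role of the "dump map of unread sectors" step. The
read parameter is only there so the retry logic can be exercised with a fake
reader before pointing it at a dying drive.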

This will copy forward and back until the drive fails or all sectors are read.
Normally the drive will fail since it will hold the heads over a bad area.

You can then proceed with recovery on the copy of the drive.
Be sure to track down all the bad sectors to the files that contain them, since
those files are trashed and candidates for reconstruction from backups.
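
Tracking a bad sector back to a file starts with turning the drive sector
number into a filesystem block number. A rough sketch of that arithmetic,
assuming 512-byte sectors and a partition that starts at a known sector. The
function names and the example numbers are made up for illustration, and the
step from block number to file name is filesystem specific and not shown.

```python
SECTOR_SIZE = 512  # assumption: 512-byte sectors


def sector_to_fs_block(sector, partition_start_sector, fs_block_size):
    """Map an absolute drive sector to a block number inside one partition."""
    if sector < partition_start_sector:
        raise ValueError("sector lies before the start of the partition")
    byte_offset = (sector - partition_start_sector) * SECTOR_SIZE
    return byte_offset // fs_block_size


def blocks_hit(bad_sectors, partition_start_sector, fs_block_size):
    """Collapse a list of bad sectors into the distinct blocks they touch."""
    return sorted({
        sector_to_fs_block(s, partition_start_sector, fs_block_size)
        for s in bad_sectors
    })


# Hypothetical numbers: partition starts at sector 63, 4096-byte blocks.
print(blocks_hit([63, 64, 71, 79], 63, 4096))
```

Several bad sectors often land in the same block, so the list of blocks to
chase is usually shorter than the list of bad sectors.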

For the brave, if only a few sectors are bad, you can use a media specific
tool to spare out those sectors. You might get lucky and get a reasonable
period of life out of the drive after recovery. My experience is that you
will probably be repeating this exercise in a few days/weeks unless it is
part of a raid array.

These days I strongly suggest at minimum two drive mirrored operation for
anything you really might want the data from ... personally I have adopted
Raid5 with a rank of 3 to 7 and 1 or 2 parity drives, with the drives in a
6 bay tower case as a fast/gigabit ethernet or fibre channel SAN serving
nfs/samba/atalk.

When a raid5 drive gets flaky it's just an annoyance for a few days till you can
either remap the bad sectors or swap the bad drive out and allow the raid system
to automatically reinitialize it. The down side is you get much less experience
with filesystem recovery :)
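
The reason the array can refill a swapped drive by itself is that raid5 parity
is the XOR of the data blocks in a stripe, so any one missing block is the XOR
of the blocks that are left. A toy illustration, with made-up four-byte
"blocks" and a hypothetical helper name, not how any particular raid system
is coded:

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks of one stripe, each on a different drive (made-up contents).
stripe = [b"disk", b"fail", b"safe"]
parity = xor_blocks(stripe)

# The drive holding stripe[1] dies: rebuild it from the survivors plus parity.
rebuilt = xor_blocks([stripe[0], stripe[2], parity])
print(rebuilt == stripe[1])
```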

John


