[NCLUG] badblocks and smartctl?
John Gilmore
j.arthur.gilmore at gmail.com
Tue May 27 08:17:14 MDT 2014
OK, I found the smartctl -x option, which gives a longer error report.
Based on the new LBA I found there, I ran the following:
sudo dd if=/dev/zero of=/dev/sda bs=512 obs=512 count=1 seek=3667252272
However, the raw value of Offline_Uncorrectable is still 1, so this
had no effect. Maybe dd overflowed its address?
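On the overflow question, a little arithmetic helps. The Python sketch below (the helper names are illustrative, not from any tool) computes the byte offset that `seek=3667252272` with `bs=512` implies and checks it against 32- and 64-bit signed offsets. Assuming dd was built with a 64-bit off_t, which is normal on 64-bit Linux, an offset of roughly 1.88 TB needs only 41 bits and should not overflow.

```python
# Hypothetical helpers: where does `dd bs=512 seek=N` start writing?
# dd skips N output blocks of `bs` bytes, so the byte offset is N * bs.


def dd_byte_offset(seek_blocks: int, block_size: int = 512) -> int:
    """Byte offset dd seeks to in the output file for seek=seek_blocks."""
    return seek_blocks * block_size


def fits_signed(value: int, bits: int) -> bool:
    """True if value fits in a two's-complement signed integer of `bits` bits."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


if __name__ == "__main__":
    lba = 3667252272  # LBA reported by smartctl -x
    offset = dd_byte_offset(lba)
    print(f"seek={lba} -> byte offset {offset} ({offset.bit_length()} bits)")
    for bits in (32, 64):
        print(f"fits in a {bits}-bit signed off_t: {fits_signed(offset, bits)}")
```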
Here's the extended error information:
Error 29 [8] occurred at disk power-on lifetime: 935 hours (38 days + 23 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 00 da 95 d4 30 00 00  Error: UNC at LBA = 0xda95d430 = 3667252272

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  25 00 00 00 08 00 00 da 95 d4 29 e0 00     03:34:16.703  READ DMA EXT
  27 00 00 00 00 00 00 00 00 00 00 e0 00     03:34:16.702  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 00 00 00 00 00 a0 00     03:34:16.694  IDENTIFY DEVICE
  ef 00 03 00 46 00 00 00 00 00 00 a0 00     03:34:16.662  SET FEATURES [Set transfer mode]
  27 00 00 00 00 00 00 00 00 00 00 e0 00     03:34:16.662  READ NATIVE MAX ADDRESS EXT
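The register dump can be cross-checked by hand. This Python sketch assumes the column layout implied by the header rows above (three high-order LBA bytes under LBA_48, then LH, LM, LL) and rebuilds the 48-bit LBA from the hex bytes; the function names are made up for illustration.

```python
# Sketch: rebuild the 48-bit LBA from one row of the `smartctl -x` register dump.
# Column layout assumed from the headers:
#   error row:   ER -- ST COUNT(2) LBA_48(3 high bytes) LH LM LL DV DC
#   command row: CR FEATR(2) COUNT(2) LBA_48(3 high bytes) LH LM LL DV DC


def lba48(high3: list[str], lh: str, lm: str, ll: str) -> int:
    """Concatenate the six LBA bytes, most significant first."""
    value = 0
    for byte in high3 + [lh, lm, ll]:
        value = (value << 8) | int(byte, 16)
    return value


def decode_error_row(row: str) -> int:
    """Return the LBA from an 'After command completion' register row."""
    f = row.split()
    # f[0]=ER f[1]='--' f[2]=ST f[3:5]=COUNT f[5:8]=high bytes f[8:11]=LH LM LL
    return lba48(f[5:8], f[8], f[9], f[10])


def decode_command_row(row: str) -> tuple[int, int]:
    """Return (start LBA, sector count) from a 'Commands leading to...' row."""
    f = row.split()
    # f[0]=CR f[1:3]=FEATR f[3:5]=COUNT f[5:8]=high bytes f[8:11]=LH LM LL
    count = int(f[3] + f[4], 16)
    return lba48(f[5:8], f[8], f[9], f[10]), count


if __name__ == "__main__":
    err = decode_error_row("40 -- 51 00 00 00 00 da 95 d4 30 00 00")
    start, count = decode_command_row("25 00 00 00 08 00 00 da 95 d4 29 e0 00")
    print(f"error LBA {err:#x} = {err}")
    print(f"READ DMA EXT covered LBAs {start}..{start + count - 1}")
```

Decoded this way, the READ DMA EXT asked for 8 sectors starting at 0xda95d429 (3667252265), so the reported error sector 3667252272 is the last sector of that read.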
On Mon, May 26, 2014 at 1:32 PM, Zak Smith <zak at computer.org> wrote:
> On Mon, May 26, 2014 at 06:54:56AM -0600, John Gilmore wrote:
>> So there are no bad blocks? Maybe it was corrected? Below is the
>> output from smartctl -a, which *still* shows an uncorrectable sector
>> (note that the raw_value of both "Current_Pending_Sector" and
>> "Offline_Uncorrectable" is 1).
>>
>> Also note that the error log says "UNC at LBA = 0x0fffffff =
>> 268435455", which *is* in the range that badblocks scanned. I'd go
>> ahead on the assumption that it's accurate and use dd to touch that
>> sector. (I'm using ZFS with a mirror, so I can just scrub and any
>> corruption will be corrected afterwards; I don't have to worry about
>> damaging my data.)
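One detail worth checking when comparing an LBA against a badblocks run: badblocks numbers its blocks in units of its -b block size (1024 bytes by default), while SMART LBAs count 512-byte sectors. The Python sketch below converts between the two; the helper names and the example scan range are hypothetical, and the range is treated as inclusive.

```python
# Sketch: relate a 512-byte-sector LBA to badblocks block numbers.
# badblocks counts in blocks of -b bytes (1024 unless -b is given); its
# last_block / first_block arguments use those same units.

SECTOR = 512


def lba_to_badblocks_block(lba: int, block_size: int = 1024) -> int:
    """badblocks block number that contains the given LBA."""
    return lba * SECTOR // block_size


def scanned(lba: int, first_block: int, last_block: int, block_size: int = 1024) -> bool:
    """True if the LBA falls inside the (inclusive) badblocks range."""
    return first_block <= lba_to_badblocks_block(lba, block_size) <= last_block


if __name__ == "__main__":
    # Hypothetical scan of the first 500,000,000 1 KiB blocks; substitute the real range.
    for lba in (0x0FFFFFFF, 3667252272):
        print(lba, lba_to_badblocks_block(lba), scanned(lba, 0, 499_999_999))
```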
>>
>> However, I'm concerned about doing that because "0x0fffffff" is just
>> such a nice round number that I suspect it's inaccurate, or that
>
> Searching for "UNC at LBA = 0x0fffffff" reports MANY people getting
> this exact same value over a variety of drive types and sizes, so I
> suspect this is a default value in a class of error conditions. It is
> almost certainly not the real sector/LBA.
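Zak's suspicion is consistent with simple arithmetic: 0x0fffffff is 2**28 - 1, the largest value a 28-bit LBA field can hold, and 2**28 sectors of 512 bytes is exactly 128 GiB (about 137 GB), the limit quoted from smartctl(8) in this thread. A minimal Python sketch; `looks_saturated` is a hypothetical heuristic, not something smartctl itself does.

```python
# Sketch: why 0x0fffffff is suspicious. A 28-bit LBA field tops out at
# 2**28 - 1 = 0x0fffffff, which addresses 2**28 * 512 bytes = 128 GiB.

LBA28_MAX = (1 << 28) - 1  # 0x0fffffff == 268435455
SECTOR = 512


def lba28_capacity_bytes() -> int:
    """Bytes addressable with 28-bit LBAs and 512-byte sectors."""
    return (LBA28_MAX + 1) * SECTOR


def looks_saturated(reported_lba: int) -> bool:
    """Heuristic: an LBA equal to the 28-bit maximum is likely a placeholder."""
    return reported_lba == LBA28_MAX


if __name__ == "__main__":
    cap = lba28_capacity_bytes()
    print(f"28-bit limit: {cap} bytes = {cap / 2**30:.0f} GiB = {cap / 1e9:.1f} GB")
    print("0x0fffffff saturated?", looks_saturated(0x0FFFFFFF))
    print("3667252272 beyond 28 bits?", 3667252272 > LBA28_MAX)
```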
>
> The smartctl(8) man page says this:
>
> If the command that caused the error was a READ or WRITE command,
> then the Logical Block Address (LBA) at which the error occurred
> will be printed in base 10 and base 16. The LBA is a linear
> address, which counts 512-byte sectors on the disk, starting from
> zero. (Because of the limitations of the SMART error log, if the
> LBA is greater than 0xfffffff, then either no error log entry will
> be made, or the error log entry will have an incorrect LBA. This
> may happen for drives with a capacity greater than 128 GiB or 137
> GB.) On Linux systems the smartmontools web page has instructions
> about how to convert the LBA address to the name of the disk file
> containing the erroneous disk sector.
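The LBA-to-file conversion the man page mentions usually starts with plain arithmetic: subtract the partition's start sector and divide by the filesystem block size. A Python sketch, assuming 512-byte LBAs, a 4096-byte filesystem block (check the real block size of your filesystem), and a partition start read from sysfs; `sda1` is a placeholder partition name. It does not check the partition's end, and mapping the resulting block to a file is filesystem-specific and not shown.

```python
# Sketch: convert a whole-disk LBA to a block number inside a partition's
# filesystem. Assumes the partition start comes from
# /sys/block/<disk>/<partition>/start, which is in 512-byte units.
from pathlib import Path

SECTOR = 512


def partition_start(disk: str, part: str) -> int:
    """Start sector of a partition, e.g. partition_start('sda', 'sda1')."""
    return int(Path(f"/sys/block/{disk}/{part}/start").read_text())


def fs_block(lba: int, part_start: int, fs_block_size: int = 4096) -> int:
    """Filesystem block containing `lba`; raises if the LBA precedes the partition."""
    if lba < part_start:
        raise ValueError("LBA lies before the start of this partition")
    return (lba - part_start) * SECTOR // fs_block_size


if __name__ == "__main__":
    try:
        start = partition_start("sda", "sda1")  # hypothetical partition name
        print(fs_block(3667252272, start))
    except (OSError, ValueError) as exc:
        print(f"could not map LBA: {exc}")
```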
>
> I believe smartctl can also run a more in-depth self-test (smartctl -t long).
>
> Another thing I would check is whether syslog has any messages
> about sda errors.
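A quick way to do the syslog check is to filter for lines that name the disk and mention an error. This Python sketch assumes /var/log/syslog (some distributions use /var/log/messages) and that relevant kernel messages contain the word "error" and, when known, "sector N"; exact wording varies by kernel version, so treat the pattern as a starting point rather than a complete match.

```python
# Sketch: scan a log for messages naming a disk (or its partitions) and an error.
# The default log path and the message wording are assumptions.
import re
from pathlib import Path

SECTOR_RE = re.compile(r"\bsector (\d+)")


def disk_errors(lines, disk: str = "sda"):
    """Yield (line, sector or None) for lines naming `disk` and an error."""
    disk_re = re.compile(rf"\b{re.escape(disk)}\d*\b")  # sda, sda1, ...
    for line in lines:
        if disk_re.search(line) and "error" in line.lower():
            m = SECTOR_RE.search(line)
            yield line.rstrip("\n"), (int(m.group(1)) if m else None)


if __name__ == "__main__":
    log = Path("/var/log/syslog")  # /var/log/messages on some systems
    try:
        with log.open(errors="replace") as fh:
            for line, sector in disk_errors(fh):
                print(sector, line)
    except OSError as exc:
        print(f"could not read {log}: {exc}")
```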
> --
> # Zak Smith mobile 970-232-4468
>
> _______________________________________________
> NCLUG mailing list NCLUG at lists.nclug.org
>
> To unsubscribe, subscribe, or modify
> your settings, go to:
> http://lists.nclug.org/mailman/listinfo/nclug