[NCLUG] badblocks and smartctl?

Tue May 27 08:17:14 MDT 2014

OK, found the smartctl -x thing, which gave me a longer error report.
Based on the new LBA I found there, I executed the following:

sudo dd if=/dev/zero of=/dev/sda bs=512 obs=512 count=1 seek=3667252272

However, the raw value of offline uncorrectable is still 1, so this
had no effect. Maybe dd overflowed its address?
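
As a quick sanity check on the overflow question, the arithmetic can be sketched as below. This is only a sketch: the 512-byte logical sector size is taken from the bs=512 in the dd command, and the helper names (byte_offset, fits_unsigned, fits_signed) are made up for illustration. It does not show what dd does internally. It only shows which integer widths the sector number and the resulting byte offset would fit into.

```python
# Sketch only: where does the LBA from the extended error log sit relative
# to common integer limits? SECTOR_SIZE is an assumption taken from the
# bs=512 used in the dd command; the helper names are hypothetical.
SECTOR_SIZE = 512
LBA = 0xDA95D430  # "UNC at LBA = 0xda95d430" from the smartctl -x output


def byte_offset(lba, sector_size=SECTOR_SIZE):
    """Byte position on the device that a seek to `lba` corresponds to."""
    return lba * sector_size


def fits_unsigned(value, bits):
    """True if `value` is representable as an unsigned integer of `bits` bits."""
    return 0 <= value < (1 << bits)


def fits_signed(value, bits):
    """True if `value` is representable as a signed integer of `bits` bits."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


if __name__ == "__main__":
    offset = byte_offset(LBA)
    print("LBA (decimal):        ", LBA)
    print("byte offset:          ", offset)
    print("LBA fits in 28 bits:  ", fits_unsigned(LBA, 28))
    print("LBA fits unsigned 32: ", fits_unsigned(LBA, 32))
    print("LBA fits signed 32:   ", fits_signed(LBA, 32))
    print("offset fits signed 64:", fits_signed(offset, 64))
```

The sector number is above the 28-bit limit and above the signed 32-bit limit, but it fits in an unsigned 32-bit value, and the byte offset (about 1.88 TB) fits comfortably in 64 bits. So an overflow would only be expected from a tool that keeps offsets in a signed 32-bit integer, which I assume a current dd on 64-bit Linux does not.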

Here's the extended error information:

Error 29 [8] occurred at disk power-on lifetime: 935 hours (38 days + 23 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 00 da 95 d4 30 00 00  Error: UNC at LBA = 0xda95d430 = 3667252272

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  25 00 00 00 08 00 00 da 95 d4 29 e0 00     03:34:16.703  READ DMA EXT
  27 00 00 00 00 00 00 00 00 00 00 e0 00     03:34:16.703  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 00 00 00 00 00 a0 00     03:34:16.694  IDENTIFY DEVICE
  ef 00 03 00 46 00 00 00 00 00 00 a0 00     03:34:16.662  SET FEATURES [Set transfer mode]
  27 00 00 00 00 00 00 00 00 00 00 e0 00     03:34:16.662  READ NATIVE MAX ADDRESS EXT
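
To double-check the LBA against the raw register bytes, the log rows can be decoded by hand as sketched below. The column split is my reading of the headers above (two COUNT bytes, then six LBA bytes, most significant first), and lba48_from_bytes is a made-up helper, not part of smartmontools.

```python
# Sketch only: decode the LBA fields from the smartctl -x rows above.
# Assumption: the six LBA bytes are printed most significant first, which
# matches the "UNC at LBA = 0xda95d430" text that smartctl prints itself.

def lba48_from_bytes(hex_bytes):
    """Join six space-separated hex bytes into one 48-bit integer."""
    parts = hex_bytes.split()
    if len(parts) != 6:
        raise ValueError("expected six hex bytes, got %d" % len(parts))
    return int("".join(parts), 16)


# Error row: 40 -- 51 | 00 00 | 00 00 da 95 d4 30 | 00 00
error_lba = lba48_from_bytes("00 00 da 95 d4 30")

# READ DMA EXT row: 25 | 00 00 | 00 08 | 00 00 da 95 d4 29 | e0 00
read_start = lba48_from_bytes("00 00 da 95 d4 29")
read_count = int("0008", 16)  # the two COUNT bytes of the command row
read_last = read_start + read_count - 1

if __name__ == "__main__":
    print("error LBA:      ", error_lba, hex(error_lba))
    print("read started at:", read_start, hex(read_start))
    print("read ended at:  ", read_last, hex(read_last))
    print("error is last sector of the read:", error_lba == read_last)
```

Under that reading, the failing READ DMA EXT asked for 8 sectors starting at 0xda95d429, and the reported error LBA is the last sector of that request, so the two rows are consistent with each other.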

On Mon, May 26, 2014 at 1:32 PM, Zak Smith <zak at computer.org> wrote:
> On Mon, May 26, 2014 at 06:54:56AM -0600, John Gilmore wrote:
>> So there are no bad blocks? Maybe it was corrected? So below find the
>> output from smartctl -a, which *still* shows an uncorrectable sector
>> (note that the raw_value of "Current_Pending_Sector" and
>> "Offline_Uncorrectable" is 1)
>>
>> Also note that the error log says "UNC at LBA = 0x0fffffff =
>> 268435455" which *is* in the range that badblocks scanned. I'd go
>> ahead with the assumption that it's accurate, and use dd to touch that
>> sector (I'm using zfs with a mirror, so I can just scrub and any
>> corruption will be corrected afterwards, I don't have to worry about
>> damaging my data)
>>
>> However, I'm concerned about doing that because "0x0fffffff" is just
>> such a nice round number that I suspect it's inaccurate, or that
>
> Searching for "UNC at LBA = 0x0fffffff" reports MANY people getting
> this exact same value over a variety of drive types and sizes, so I
> suspect this is a default value in a class of error conditions.  It is
> almost certainly not the real sector/LBA.
>
> smartctl (8) says this,
>
>    If the command that caused the error was a READ or WRITE command,
>    then the Logical Block Address (LBA) at which the error occurred
>    will be printed in base 10 and base 16.  The LBA is a linear
>    address, which counts 512-byte sectors on the disk, starting from
>    zero.  (Because of the limitations of the SMART error log, if the
>    LBA is greater than 0xfffffff, then either no error log entry will
>    be made, or the error log entry will have an incorrect LBA. This
>    may happen for drives with a capacity greater than 128 GiB or 137
>    GB.) On Linux systems the smartmontools web page has instructions
>    about how to convert the LBA address to the name of the disk file
>    containing the erroneous disk sector.
>
> Smartctl has a more in-depth self test, I believe.
>
> Another thing I would check is if there are any messages in syslog
> about sda errors.
>
> --
> # Zak Smith    mobile 970-232-4468