Is this a bad sign: smartd: 1 Currently unreadable (pending) sectors....?

paleoN · Dec 7, 2012

Requesting a new thread for your completely separate and unrelated issue.

uutzinger · Dec 8, 2012

This is a follow-up for users who hit the smartctl pending-sector problem, replace the disk, and then run into the apparent issue I posted earlier:

uutzinger said:
I replaced the disk and I am going through resilvering.
It does not seem to stop. I use FreeNAS-8.2.0-RELEASE-p1-x64.

When resilvering on FreeNAS 8.2 it might appear that the process is not stopping because the command zpool status shows the following message for several hours:

Code:

zpool status
  pool: Vol1
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress for 47h42m, 100.00% done, 0h0m to go

After I let the resilver continue for another 12 hours, it finally completed.

(It states it resilvered 4.71T although the new drive is only 3T in size.)
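
For anyone who would rather script the check than re-run zpool status by hand, here is a minimal POSIX sh sketch (run it under sh, not csh). It only looks for the "resilver in progress" wording shown in the output above; the function name resilver_state is made up for illustration.

```shell
# Hypothetical helper: reads `zpool status` text on stdin and reports whether
# a resilver is still running, based on the "resilver in progress" wording
# shown in the output above.
resilver_state() {
    if grep -q 'resilver in progress'; then
        echo running
    else
        echo idle
    fi
}

# Usage (pool name Vol1 taken from the post above):
#   zpool status Vol1 | resilver_state
```

As the post describes, "100.00% done" can still count as running for hours, so the sketch deliberately ignores the percentage.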

underpickled · Nov 7, 2013

Joshua Parker Ruehlig said:
I think the sysctl kern.geom.debugflags keeps a user from screwing things up by not letting them write directly to a disk in a raid. You can change it back afterwards; just google how to check the current sysctl values and set it back after everything is working. It will also reset if you ever reboot, as we're just temporarily changing it.

I'd run a long selftest

Code:
smartctl -t long /dev/ada2

Check the SMART information for the unreadable sector; let's call it 'X'.

Code:
smartctl -A /dev/ada2

Change the sysctl and try writing to the sector, replacing the 'X' below with the sector number.

Code:
sysctl kern.geom.debugflags=16
dd if=/dev/zero of=/dev/ada2 bs=4096 count=1 seek=X conv=noerror,sync

Check the SMART information to see if 'Current_Pending_Sector' went to 0. You may need to repeat some of the steps if there are multiple unreadable sectors.

Code:
smartctl -A /dev/ada2

Now run another smart test and hopefully it can complete without error.

Code:
smartctl -t long /dev/ada2
smartctl -A /dev/ada2 #check status to see if it completed

Now run a scrub (either from the gui or with 'zpool scrub poolname').
Check the scrub's status and hopefully it fixes some errors.

Code:
zpool status -v poolname
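
One detail in the quoted dd step is easy to get wrong: smartctl reports error locations as LBAs in logical sectors, while dd's seek= counts blocks of size bs. Below is a minimal sh sketch of the conversion. It assumes 512-byte logical sectors (check what smartctl -i reports for your drive), and lba_to_seek is a name made up here.

```shell
# Hypothetical helper: convert an LBA (in logical sectors) to a dd seek= value
# for a given dd block size. Assumes 512-byte logical sectors by default.
lba_to_seek() {
    lba=$1
    bs=${2:-4096}
    sector=${3:-512}
    # Integer division rounds down to the dd block that contains the sector.
    echo $(( lba * sector / bs ))
}

# Example: LBA 1000 with bs=4096 falls in dd block 125.
#   dd if=/dev/zero of=/dev/ada2 bs=4096 count=1 seek="$(lba_to_seek 1000)" conv=noerror,sync
```

With bs=4096 a single write zeroes the whole 4096-byte block, that is, eight 512-byte sectors, not just the one that failed.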

So just resurrecting this a bit... On one disk I'm getting 2 unreadable (pending) sectors... SMART data shown here:

Code:

[Dylan@freenas] /mnt/ToxServMain# smartctl -A /dev/ada5
smartctl 6.1 2013-03-16 r3800 [FreeBSD 9.1-STABLE amd64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
 
=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       426
  3 Spin_Up_Time            0x0027   181   178   021    Pre-fail  Always       -       5950
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       734
  5 Reallocated_Sector_Ct   0x0033   198   198   140    Pre-fail  Always       -       79
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   092   092   000    Old_age   Always       -       6023
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       40
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       12
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       721
194 Temperature_Celsius     0x0022   119   103   000    Old_age   Always       -       31
196 Reallocated_Event_Count 0x0032   196   196   000    Old_age   Always       -       4
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       2
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       8

Unfortunately, when I check the log as suggested here: http://daemon-notes.com/articles/system/smartmontools/current-pending

I just get this:

Code:

[Dylan@freenas] /mnt/ToxServMain# smartctl -l selftest /dev/ada5
smartctl 6.1 2013-03-16 r3800 [FreeBSD 9.1-STABLE amd64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
 
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      6005         -
# 2  Extended offline    Completed without error       00%      5909         -
# 3  Extended offline    Completed without error       00%      5506         -

Which doesn't indicate the error location...
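
To pull the error location out of that log when one is recorded, a small sh sketch like this can help. It assumes the column layout shown above, with LBA_of_first_error as the last field; first_error_lbas is a made-up name.

```shell
# Hypothetical helper: read `smartctl -l selftest` output on stdin and print
# the LBA_of_first_error column for every test that recorded one.
# Log entries start with "#"; a "-" in the last column means no error LBA.
first_error_lbas() {
    awk '/^#/ && $NF != "-" { print $NF }'
}

# Usage:
#   smartctl -l selftest /dev/ada5 | first_error_lbas
```

For a log like the one above, where every test completed without error, it prints nothing, which is exactly the situation described here.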

Between the 2 unreadables and what looks like a high read error rate in the SMART data, should I RMA this drive? I don't really know how to interpret SMART data well enough to tell if a drive is bad or unreliable.
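
For tracking the sector-related counters over time, here is a hedged sh sketch that extracts just attributes 5, 197 and 198 from smartctl -A output. It assumes the standard table layout shown above (ID in the first column, raw value in the last). The name sector_attrs is made up, and the sketch makes no judgment about what counts as a failing drive.

```shell
# Hypothetical helper: read `smartctl -A` output on stdin and print
# NAME=RAW_VALUE for the three sector-health attributes from this thread:
# 5 (Reallocated_Sector_Ct), 197 (Current_Pending_Sector),
# 198 (Offline_Uncorrectable).
sector_attrs() {
    awk '$1 == 5 || $1 == 197 || $1 == 198 { print $2 "=" $NF }'
}

# Usage:
#   smartctl -A /dev/ada5 | sector_attrs
```

Run against the table above, this prints Reallocated_Sector_Ct=79, Current_Pending_Sector=2 and Offline_Uncorrectable=0, one per line. Saving that output with a date makes it easy to see whether the counts are growing.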

cyberjock · Nov 7, 2013

What I'd do is pull that drive out and run badblocks on it. It'll be time consuming and destructive to the drive's data, but that's what redundancy is for. But something like badblocks -svw -b 4096 -t 0xFF -t 0x00 -t 0xFF /dev/adaX should determine if you have any bad sectors.

If I were in your case I'd just replace the disk. Of course there's no justification for an RMA at this time since it passes all tests so things get a little complex and disk replacement is something you'll have to decide on for yourself.

I'd imagine if you choose to wait the disk will probably have problems that will warrant RMA in the next month or two at the most.

underpickled · Nov 7, 2013

cyberjock said:
You know we just had this discussion twice in the last week? LOL. I'll forgive you though as the search feature of the forums is fairly broken.

What I'd do is pull that drive out and run badblocks on it. It'll be time consuming and destructive to the drive's data, but that's what redundancy is for. But something like badblocks -svw -b 4096 -t 0xFF -t 0x00 -t 0xFF /dev/adaX should determine if you have any bad sectors.

If I were in your case I'd just replace the disk. Of course there's no justification for an RMA at this time since it passes all tests so things get a little complex and disk replacement is something you'll have to decide on for yourself.

I'd imagine if you choose to wait the disk will probably have problems that will warrant RMA in the next month or two at the most.

Hah, well when I did search this is the thread that seemed the most relevant to my initial problem (unreadable sectors). Anyway... from what I'm seeing, badblocks shouldn't actually affect the data... can't I just boot to a live disc and run it on the drive instead of removing it?

I'd prefer to RMA... the disk in question is one of my older ones. Do you think WD would just say "no" to an RMA until it actually fails SMART?

warri · Nov 7, 2013

This test is primarily for testing new drives and is a read-write test. As the pattern is written to every accessible block the device effectively gets wiped. Default is an extensive test with four passes using four different patterns: 0xaa (10101010), 0x55 (01010101), 0xff (11111111) and 0x00 (00000000). For some devices this will take a couple of days to complete.

Source

Or did you mean the data in your pool? ;)

Yes, theoretically you can just run it from a live CD. Just may take a while.

underpickled · Nov 7, 2013

warri said:
Source

Or did you mean the data in your pool? ;)

Yes, theoretically you can just run it from a live CD. Just may take a while.

Ahh yes, well I see using the write option would wipe it... but it looks like I can use -n instead of -w to do a non-destructive read-write test, which would preserve data. It would take longer, but if it could save me a resilvering it might be worth it to spare the other drives the wear.
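
To keep the two modes from getting mixed up, here is a minimal sh sketch that only prints the badblocks command line for review and does not run it. The -w (destructive write) and -n (non-destructive read-write) flags are the ones discussed above; badblocks_cmd and its mode names are made up for illustration.

```shell
# Hypothetical helper: print (not run) a badblocks command line for a device.
#   mode "destructive" -> -w (overwrites every block on the device)
#   mode "preserve"    -> -n (non-destructive read-write test)
badblocks_cmd() {
    dev=$1
    mode=${2:-preserve}
    case $mode in
        destructive) flag=-w ;;
        preserve)    flag=-n ;;
        *) echo "unknown mode: $mode" >&2; return 1 ;;
    esac
    echo "badblocks -sv $flag -b 4096 $dev"
}

# Usage:
#   badblocks_cmd /dev/ada5              # prints the -n variant
#   badblocks_cmd /dev/ada5 destructive  # prints the -w variant
```

Either way the disk should not be in active use by the pool while the test runs, which is why pulling the drive or booting a live CD came up earlier.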

enjoywithme · Dec 23, 2014

I got the message "Device: /dev/ada0, 65535 Currently unreadable (pending) sectors". Does this mean 65535 sectors are bad?

And in the test log:

Code:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   198   136   051    Pre-fail  Always       -       869
  3 Spin_Up_Time            0x0027   253   170   021    Pre-fail  Always       -       975
  4 Start_Stop_Count        0x0032   099   099   000    Old_age   Always       -       1058
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   089   089   000    Old_age   Always       -       8386
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       882
192 Power-Off_Retract_Count 0x0032   199   199   000    Old_age   Always       -       788
193 Load_Cycle_Count        0x0032   001   001   000    Old_age   Always       -       633490
194 Temperature_Celsius     0x0022   118   106   000    Old_age   Always       -       32
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   001   001   000    Old_age   Always       -       65535
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       25

cyberjock · Dec 23, 2014

enjoywithme said:
I got the message "Device: /dev/ada0, 65535 Currently unreadable (pending) sectors". Does this mean 65535 sectors are bad?

Actually, it means more than that. That log only holds 65535 entries. The drive ada0 is clearly crap.
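
The number itself is the tell: 65535 is 2^16 - 1, the largest value a 16-bit counter can hold. Assuming, as stated above, that the count tops out there, a monitoring script could flag it with something like this sh sketch (is_saturated is a made-up name):

```shell
# Hypothetical helper: succeed if a raw SMART counter sits at the 16-bit
# maximum (65535 = 2^16 - 1), where the true count may be higher.
is_saturated() {
    [ "$1" -eq 65535 ]
}

# Usage:
#   is_saturated 65535 && echo "counter at ceiling; real count may be higher"
```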

enjoywithme · Dec 23, 2014

cyberjock said:
Actually, it means more than that. That log only holds 65535 entries. The drive ada0 is clearly crap.

Thanks cyberjock. So I should replace this disk immediately? It is a raidz1 system with 5 disks. Is it possible to convert it to raidz2 directly, without copying the data out and back in again?

cyberjock · Dec 23, 2014

Nope. Can't convert once you've created it. I've covered that limitation in my "noobie guide". ;)

You should replace the disk immediately because you basically have zero redundancy at this moment.

drkokandy · Nov 30, 2015

I am encountering this issue myself. Thanks to you all for a nice walk-through. I don't mean to "bump" an old thread like this, but I just wanted to point out, for the benefit of future people who find this thread, that the article Joshua references here -

Joshua Parker Ruehlig said:
You don't need to replace the disk if you still have extra sectors.

I was able to fix this on my freenas system about a week ago.
You basically need to:
1) run a SMART test and find where the test fails
2) write directly to that sector with dd (this forces the drive to relocate the sector to one of your extra sectors)
3) run a scrub; as long as you have a clean redundant copy, everything should be back to normal.

In step two you need to change a sysctl to allow writing directly to a drive.
http://daemon-notes.com/articles/other/smartmontools/current-pending
Good luck

- is no longer available at that URL. The updated URL is here: http://daemon-notes.com/articles/system/smartmontools/current-pending

Hopefully I will be able to fix this error thanks to this thread and that article.

cattledog · Sep 3, 2017

Hi,

I just got this problem on my FreeNAS box. I have 6 HDDs and only one is showing the problem. What I want to know is: if I start the repair on that HDD, will it take the server offline for everyone else?

Brett

Ericloewe · Jan 17, 2018

For anyone reading this thread in 2018 or beyond:

Don't try to magically "fix" your hard drives. Even if you could, their firmware presents an interface far too abstract for you to accomplish anything meaningful.

If your drives are failing (in other words, presenting bad sectors or read/write errors or the like, not interface issues like CRC errors), replace them:
https://forums.freenas.org/index.php?resources/replacing-a-failed-failing-disk.75/
