Is this a bad sign: smartd: 1 Currently unreadable (pending) sectors....?

Status
Not open for further replies.

paleoN

Wizard
Joined
Apr 22, 2012
Messages
1,403
Requesting a new thread for your completely separate and unrelated issue.
 

uutzinger

Dabbler
Joined
Nov 27, 2011
Messages
43
This is a follow-up for users dealing with the smartctl pending sector problem who replace the disk and then run into the apparent issue I posted earlier:

I replaced the disk and I am going through resilvering.
It does not seem to stop. I use FreeNAS-8.2.0-RELEASE-p1-x64.

When resilvering on FreeNAS 8.2 it might appear that the process is not stopping because the command zpool status shows the following message for several hours:

Code:
zpool status
  pool: Vol1
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress for 47h42m, 100.00% done, 0h0m to go


After letting the system continue resilvering for another 12 hours, it finally completed.

(It states it resilvered 4.71T although the new drive is only 3T in size.)
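A minimal sketch for checking whether the resilver is still running, based only on the "scrub:" line format shown above. The function name resilver_state is hypothetical, and the "resilver completed" wording is an assumption; newer ZFS versions print a "scan:" line with different text, so the patterns may need adjusting.

```shell
#!/bin/sh
# Classify the resilver state from `zpool status` text read on stdin.
# Patterns assume the FreeNAS 8.2-era "scrub:" line shown in this post.
resilver_state() {
    awk '
        /resilver in progress/ { state = "in progress" }
        /resilver completed/ && state == "" { state = "completed" }  # assumed wording
        END { if (state == "") state = "no resilver reported"; print state }
    '
}

# Usage on a live system (pool name Vol1 taken from the post):
#   zpool status Vol1 | resilver_state
```

Polling this every few minutes is a safer signal than the percentage, which sat at 100.00% for hours in the output above.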
 

underpickled

Contributor
Joined
Oct 1, 2013
Messages
167
I think the sysctl kern.geom.debugflags keeps a user from screwing things up by not letting them write directly to a disk in a RAID. You can change it back afterwards: just Google how to check the current sysctl values and set it back after everything is working. It will also reset if you ever reboot, as we're just temporarily changing it.

I'd run a long selftest
Code:
smartctl -t long /dev/ada2


Check the self-test log for the LBA of the unreadable sector (the LBA_of_first_error column); let's call it 'X'.
Code:
smartctl -l selftest /dev/ada2


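Change the sysctl and try writing to the sector. Replace the 'X' below with the sector number.
Code:
sysctl kern.geom.debugflags=16
# seek= counts in units of bs, so X must be expressed in 4096-byte blocks here
dd if=/dev/zero of=/dev/ada2 bs=4096 count=1 seek=X conv=noerror,sync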

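One detail worth checking before running the dd command: seek= counts in units of bs, not in sectors. If X was reported as an LBA in 512-byte logical sectors (an assumption; check the sector sizes that `smartctl -i` reports for the drive), it has to be divided by 8 to address the right 4096-byte block. A sketch of that arithmetic, with a hypothetical helper name and a made-up LBA:

```shell
#!/bin/sh
# Convert an LBA given in 512-byte sectors to a dd seek value for bs=4096.
# Assumes 512-byte logical sectors; the helper name is illustrative only.
lba_to_seek_4k() {
    echo $(( $1 / 8 ))    # 4096 / 512 = 8 sectors per block, rounded down
}

lba=1234567               # made-up example LBA, not taken from the thread
seek=$(lba_to_seek_4k "$lba")
# Print the command instead of running it, since dd overwrites the block.
echo "dd if=/dev/zero of=/dev/ada2 bs=4096 count=1 seek=$seek conv=noerror,sync"
```

Writing one 4096-byte block this way zeroes all eight 512-byte sectors in it, so the later scrub has to repair the whole block from redundancy.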

Check the SMART information to see if 'Current_Pending_Sector' went to 0. You may need to repeat some of the steps if there are multiple unreadable sectors.
Code:
smartctl -A /dev/ada2


Now run another smart test and hopefully it can complete without error.
Code:
smartctl -t long /dev/ada2
smartctl -l selftest /dev/ada2 # check whether the test completed without error


Now run a scrub (either from the gui or with 'zpool scrub poolname').
Check the scrub's status and hopefully it fixes some errors.
Code:
zpool status -v poolname

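The repeated `smartctl -A` checks in the steps above come down to reading one number. A small sketch that pulls the raw Current_Pending_Sector value out of that output; it assumes the 10-column attribute table layout shown in this thread, and the function name is hypothetical.

```shell
#!/bin/sh
# Print the RAW_VALUE of attribute 197 (Current_Pending_Sector) from
# `smartctl -A` output on stdin. Column 1 is the ID, column 10 the raw value.
pending_sectors() {
    awk '$1 == 197 { print $10 }'
}

# Usage on a live system:
#   smartctl -A /dev/ada2 | pending_sectors
```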


So just resurrecting this a bit... On one disk I'm getting 2 unreadable (pending) sectors... SMART data shown here:
Code:
[Dylan@freenas] /mnt/ToxServMain# smartctl -A /dev/ada5
smartctl 6.1 2013-03-16 r3800 [FreeBSD 9.1-STABLE amd64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
 
=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       426
  3 Spin_Up_Time            0x0027   181   178   021    Pre-fail  Always       -       5950
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       734
  5 Reallocated_Sector_Ct   0x0033   198   198   140    Pre-fail  Always       -       79
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   092   092   000    Old_age   Always       -       6023
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       40
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       12
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       721
194 Temperature_Celsius     0x0022   119   103   000    Old_age   Always       -       31
196 Reallocated_Event_Count 0x0032   196   196   000    Old_age   Always       -       4
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       2
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       8


Unfortunately, when I check the log as suggested here: http://daemon-notes.com/articles/system/smartmontools/current-pending

I just get this:
Code:
[Dylan@freenas] /mnt/ToxServMain# smartctl -l selftest /dev/ada5
smartctl 6.1 2013-03-16 r3800 [FreeBSD 9.1-STABLE amd64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
 
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error      00%      6005        -
# 2  Extended offline    Completed without error      00%      5909        -
# 3  Extended offline    Completed without error      00%      5506        -

Which doesn't indicate the error location...
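For reference, a sketch of how the error location could be picked out of the self-test log when one is recorded. It assumes the log layout shown above, where each entry starts with "#" and the last column is either an LBA or "-". The function name is hypothetical.

```shell
#!/bin/sh
# Print the LBA_of_first_error for every self-test log entry that has one.
# Reads `smartctl -l selftest` output on stdin; entries whose last column
# is "-" recorded no failing LBA and are skipped.
first_error_lbas() {
    awk '/^#[ ]*[0-9]+/ && $NF != "-" { print $NF }'
}

# Usage on a live system:
#   smartctl -l selftest /dev/ada5 | first_error_lbas
```

With the three "Completed without error" entries above this prints nothing, which matches the problem described: the pending sectors have not been hit by a self-test, so there is no LBA to feed to dd.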

Between the 2 unreadables and what looks like a high read error rate in the SMART data, should I RMA this drive? I don't really know how to interpret SMART data well enough to tell if a drive is bad or unreliable.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
What I'd do is pull that drive out and run badblocks on it. It'll be time consuming and destructive to the drive's data, but that's what redundancy is for. But something like badblocks -svw -b 4096 -t 0xFF -t 0x00 -t 0xFF /dev/adaX should determine if you have any bad sectors.

If I were in your case I'd just replace the disk. Of course there's no justification for an RMA at this time since it passes all tests so things get a little complex and disk replacement is something you'll have to decide on for yourself.

I'd imagine if you choose to wait the disk will probably have problems that will warrant RMA in the next month or two at the most.
 

underpickled

Contributor
Joined
Oct 1, 2013
Messages
167
cyberjock said:
You know we just had this discussion twice in the last week? LOL. I'll forgive you though as the search feature of the forums is fairly broken.

Hah, well when I did search this is the thread that seemed the most relevant to my initial problem (unreadable sectors). Anyway... from what I'm seeing, badblocks shouldn't actually affect the data... can't I just boot to a live disc and run it on the drive instead of removing it?

I'd prefer to RMA... the disk in question is one of my older ones. Do you think WD would just say "no" to an RMA until it actually fails SMART?
 

warri

Guru
Joined
Jun 6, 2011
Messages
1,193
This test is primarily for testing new drives and is a read-write test. As the pattern is written to every accessible block, the device effectively gets wiped. The default is an extensive test with four passes using four different patterns: 0xaa (10101010), 0x55 (01010101), 0xff (11111111) and 0x00 (00000000). For some devices this will take a couple of days to complete.

Source

Or did you mean the data in your pool? ;)

Yes, theoretically you can just run it from a live CD. Just may take a while.
 

underpickled

Contributor
Joined
Oct 1, 2013
Messages
167
Ahh yes, well I see using the write option would wipe it... but it looks like I can use -n instead of -w to do a non-destructive read-write test, which would preserve data. It would take longer, but if it could save me a resilvering it might be worth it to save the wear on the other drives.
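To make the difference between the two modes concrete, here is a sketch that prints (but does not run) the badblocks command line for either one. The wrapper name and mode names are hypothetical; the flags are standard badblocks options, and the -t patterns from the earlier suggestion are left out because the sketch relies on each mode's default pattern.

```shell
#!/bin/sh
# Print a badblocks command line for the chosen mode without running it.
#   -w  destructive write test: overwrites every block on the disk
#   -n  non-destructive read-write test: restores the original data
# badblocks treats -n and -w as mutually exclusive.
badblocks_cmd() {
    mode=$1
    dev=$2
    case $mode in
        destructive)     flag=-w ;;
        non-destructive) flag=-n ;;
        *) echo "unknown mode: $mode" >&2; return 1 ;;
    esac
    echo "badblocks -sv $flag -b 4096 $dev"
}

badblocks_cmd non-destructive /dev/adaX
```

Either mode writes to the disk while it runs, so the disk should not be part of an imported pool at the time, which is what booting a live disc achieves.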
 

enjoywithme

Dabbler
Joined
Dec 23, 2014
Messages
13
I got the message "Device: /dev/ada0, 65535 Currently unreadable (pending) sectors". Does this mean 65535 sectors are bad?

And in the test log:
Code:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   198   136   051    Pre-fail  Always       -       869
  3 Spin_Up_Time            0x0027   253   170   021    Pre-fail  Always       -       975
  4 Start_Stop_Count        0x0032   099   099   000    Old_age   Always       -       1058
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   089   089   000    Old_age   Always       -       8386
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       882
192 Power-Off_Retract_Count 0x0032   199   199   000    Old_age   Always       -       788
193 Load_Cycle_Count        0x0032   001   001   000    Old_age   Always       -       633490
194 Temperature_Celsius     0x0022   118   106   000    Old_age   Always       -       32
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   001   001   000    Old_age   Always       -       65535
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       25
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Actually, it means more than that. That log only holds 65535 entries. The drive ada0 is clearly crap.
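For context on the number itself: 65535 is 2^16 - 1, the largest value an unsigned 16-bit counter can hold, which fits the reading that the count has hit a ceiling rather than being exact. A small sketch that flags such a value; the 16-bit width is an assumption drawn from this reply rather than from drive documentation, and the function name is hypothetical.

```shell
#!/bin/sh
# Flag a pending-sector raw value that sits at the assumed 16-bit ceiling.
MAX16=$(( (1 << 16) - 1 ))   # 65535

describe_pending() {
    if [ "$1" -ge "$MAX16" ]; then
        echo "$1 (counter at ceiling, real count may be higher)"
    elif [ "$1" -gt 0 ]; then
        echo "$1 pending"
    else
        echo "no pending sectors"
    fi
}

describe_pending 65535
describe_pending 2
```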
 

enjoywithme

Dabbler
Joined
Dec 23, 2014
Messages
13
Thanks cyberjock. So I should replace this disk immediately? It is a raidz1 system with 5 disks. Is it possible to convert it into raidz2 directly, without copying the data out and copying it back later?
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Nope. Can't convert once you've created it. I've covered that limitation in my "noobie guide". ;)

You should replace the disk immediately because you basically have zero redundancy at this moment.
 

drkokandy

Cadet
Joined
Mar 31, 2012
Messages
4
I am encountering this issue myself. Thanks to you all for a nice walk-through. I don't mean to "bump" an old thread like this, but I just wanted to point out, for the benefit of future people who find this thread, that the article Joshua references here -

You don't need to replace the disk if you still have extra sectors.

I was able to fix this on my FreeNAS system about a week ago.
You basically need to:
1) Run a SMART test and find where the test fails.
2) Write directly to that sector with dd (this forces the drive to reallocate the sector to one of your extra sectors).
3) Run a scrub; as long as you have a clean redundant copy, everything should be back to normal.

In step two you need to change a sysctl to allow writing directly to a drive.
http://daemon-notes.com/articles/other/smartmontools/current-pending
Good luck

- is no longer available at that URL. The updated URL is here: http://daemon-notes.com/articles/system/smartmontools/current-pending

Hopefully I will be able to fix this error thanks to this thread and that article.
 

cattledog

Dabbler
Joined
Nov 1, 2013
Messages
36
Hi,

I just got this problem with my FreeNAS. I have 6 HDDs and only one is showing the problem. What I want to know is: if I start the repair on that HDD, will this take the server offline for others?

Brett
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
For anyone reading this thread in 2018 or beyond:

Don't try to magically "fix" your hard drives. Even if you could, their firmware presents an interface far too abstract for you to accomplish anything meaningful.

If your drives are failing (in other words, presenting bad sectors or read/write errors or the like, not interface issues like CRC errors), replace them:
https://forums.freenas.org/index.php?resources/replacing-a-failed-failing-disk.75/
 