Is this a bad sign: smartd: 1 Currently unreadable (pending) sectors....?

Status
Not open for further replies.

HHawk

Contributor
Joined
Jun 8, 2011
Messages
176
Mod note:

For anyone reading this thread in 2018 or beyond:

Don't try to magically "fix" your hard drives. Even if you could, their firmware presents an interface far too abstract for you to accomplish anything meaningful.

If your drives are failing (in other words, presenting bad sectors or read/write errors or the like, not interface issues like CRC errors), replace them:
https://forums.freenas.org/index.php?resources/replacing-a-failed-failing-disk.75/

- Ericloewe




I am running a 6-disk setup in raidz2 (ZFS) and I recently noticed the following message:

smartd[2713]: Device: /dev/ada2, 1 Currently unreadable (pending) sectors

Sometimes the above message shows up a few times (only on ada2), and sometimes it doesn't show up at all.

Does this mean that one of the 6 hard disks is damaged and will soon stop working completely? Or how should I read this?

And does this mean that I need to replace the hard disk soon? Since I am still pretty new to FreeNAS (and FreeBSD): is replacing a hard disk in a raidz2 difficult within FreeNAS? Any pointers and/or advice?

I am really scared to lose any data...

Oh, and FreeNAS' Alert System says: OK: The volume storage (ZFS) status is HEALTHY
Also, scrubbing doesn't show any other errors. I am only getting messages like the one above...
 
Last edited by a moderator:

paleoN

Wizard
Joined
Apr 22, 2012
Messages
1,403
Does this mean that one of the 6 hard disks is damaged and will soon stop working completely? Or how should I read this?
It means there is at least one unreadable (bad) sector on ada2. What's the output of:
Code:
smartctl -q noserial -a /dev/ada2


And does this mean that I need to replace the hard disk soon? Since I am still pretty new to FreeNAS (and FreeBSD): is replacing a hard disk in a raidz2 difficult within FreeNAS? Any pointers and/or advice?
If it's just the one sector, then no. As long as you are on 8.2 or later, the GUI replace should work well, but I would review the docs. Keep in mind that with raidz2 you need to lose two entire disks before you need to worry.

Oh, and FreeNAS' Alert System says: OK: The volume storage (ZFS) status is HEALTHY
Also, scrubbing doesn't show any other errors. I am only getting messages like the one above...
Because the array is healthy and there are no errors on the ZFS side.
 

Joshua Parker Ruehlig

Hall of Famer
Joined
Dec 5, 2011
Messages
5,949
You don't need to replace the disk if it still has spare sectors.

I was able to fix this on my FreeNAS system about a week ago.
You basically need to:
1) run a SMART test and find where the test fails
2) write directly to that sector with dd (this forces the drive to rewrite the sector, or reallocate it to one of its spare sectors if it can't)
3) run a scrub; as long as you have a clean redundant copy, everything should be back to normal.

In step two you need to change a sysctl to allow writing directly to a drive.
http://daemon-notes.com/articles/other/smartmontools/current-pending
Good luck
 

HHawk

Contributor
Joined
Jun 8, 2011
Messages
176
Thank you for your time and answers.

This is the output:

=== START OF INFORMATION SECTION ===
Model Family: Western Digital Caviar Green (Adv. Format)
Device Model: WDC WD20EARX-00PASB0
Firmware Version: 51.0AB51
User Capacity: 2,000,398,934,016 bytes [2.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Device is: In smartctl database [for details use: -P show]
ATA Version is: 8
ATA Standard is: Exact ATA specification draft version not indicated
Local Time is: Wed Nov 14 07:46:52 2012 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (39000) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 376) minutes.
Conveyance self-test routine
recommended polling time: ( 5) minutes.
SCT capabilities: (0x3035) SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 0
3 Spin_Up_Time 0x0027 172 171 021 Pre-fail Always - 6358
4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 320
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
7 Seek_Error_Rate 0x002e 100 253 000 Old_age Always - 0
9 Power_On_Hours 0x0032 095 095 000 Old_age Always - 3753
10 Spin_Retry_Count 0x0032 100 100 000 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 100 100 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 318
192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 79
193 Load_Cycle_Count 0x0032 199 199 000 Old_age Always - 3993
194 Temperature_Celsius 0x0022 124 116 000 Old_age Always - 26
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 1
198 Offline_Uncorrectable 0x0030 200 200 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0008 200 200 000 Old_age Offline - 75

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]


SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
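To keep an eye on just this one value without reading the whole report, the RAW_VALUE of attribute 197 can be pulled out of `smartctl -A` output. A minimal sketch, assuming the column layout shown above (RAW_VALUE is the last field); `pending_sectors` is a hypothetical helper name, not part of smartmontools.

```shell
# Print the RAW_VALUE (last column) of attribute 197, Current_Pending_Sector,
# from `smartctl -A` output read on stdin; exit non-zero if the row is missing.
pending_sectors() {
  awk '$1 == "197" && $2 == "Current_Pending_Sector" { print $NF; found = 1 }
       END { if (!found) exit 1 }'
}

# Usage (as root on the FreeNAS box):
#   smartctl -A /dev/ada2 | pending_sectors
```

Anything above 0 means smartd will keep logging the "Currently unreadable (pending) sectors" message.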
 

HHawk

Contributor
Joined
Jun 8, 2011
Messages
176
Thanks Joshua. I tested it and this is the result:

SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed: read failure 90% 3753 605610841
# 2 Short offline Completed: read failure 90% 3753 605610840
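The LBA_of_first_error column is the number needed for the dd step. A small sketch for extracting it, assuming the self-test log layout shown here (LBA in the last column); `failed_lbas` is a hypothetical helper, not a smartctl feature.

```shell
# Print LBA_of_first_error for every self-test entry that ended in a
# read failure, parsed from `smartctl -l selftest` output on stdin.
# Entry lines start with "#" ("# 1", ..., "#10"); the LBA is the last field.
failed_lbas() {
  awk '$1 ~ /^#/ && /read failure/ { print $NF }'
}

# Usage:
#   smartctl -l selftest /dev/ada2 | failed_lbas
```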

I will also run a long test to be sure.

Also what does this mean:

First, disable GEOM protection and try to refresh the sector as reallocating bad sectors only happens on write.

If I disable anything, should I re-enable it afterwards? And if so, how?
 

HHawk

Contributor
Joined
Jun 8, 2011
Messages
176
Okay, I just did as the tutorial said. After I was done, I reset the following setting:

sysctl kern.geom.debugflags=0

I don't know if this was needed, but I had to set it to "sysctl kern.geom.debugflags=16" before...?

I am now rerunning the long offline test to check for errors. The RAW value for 'Current_Pending_Sector' is back at 0, which is good, right?
 

Joshua Parker Ruehlig

Hall of Famer
Joined
Dec 5, 2011
Messages
5,949
I think kern.geom.debugflags keeps a user from screwing things up by not letting them write directly to a disk that is already in use (for example, by ZFS). You can change it back afterwards: check the current value with 'sysctl kern.geom.debugflags' before changing it, and set it back once everything is working. It will also reset if you ever reboot, since we're only changing it temporarily.
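One way to make sure the old value comes back is to save it, change it, run the command, and restore it. A minimal sketch for `sh` on FreeBSD; `with_geom_debugflags` and the `SYSCTL` override (there only so it can be dry-run or tested off FreeBSD) are hypothetical, and 16 is the value used in this thread.

```shell
# Run a command with kern.geom.debugflags temporarily set to 16, then put
# back whatever value was there before. SYSCTL is a hypothetical override
# that defaults to the real sysctl(8).
with_geom_debugflags() {
  sysctl_cmd=${SYSCTL:-sysctl}
  old=$("$sysctl_cmd" -n kern.geom.debugflags) || return 1   # save current value
  "$sysctl_cmd" kern.geom.debugflags=16 >/dev/null || return 1
  "$@"                                                        # e.g. the dd command
  rc=$?
  "$sysctl_cmd" kern.geom.debugflags="$old" >/dev/null        # restore saved value
  return "$rc"
}

# Usage (X as described in the steps below):
#   with_geom_debugflags dd if=/dev/zero of=/dev/ada2 bs=4096 count=1 seek=X conv=noerror,sync
```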

I'd run a long selftest
Code:
smartctl -t long /dev/ada2


Once it finishes, check the self-test log for the unreadable sector (the LBA_of_first_error column); let's call it 'X'
Code:
smartctl -l selftest /dev/ada2


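Change the sysctl and try writing to the sector. Change the 'X' below to the seek offset for that sector: the self-test log counts 512-byte LBAs, but with bs=4096 dd's seek counts 4096-byte blocks, so use the LBA divided by 8 (rounded down) on this 4K-sector drive.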
Code:
sysctl kern.geom.debugflags=16
dd if=/dev/zero of=/dev/ada2 bs=4096 count=1 seek=X conv=noerror,sync
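Since this drive reports 512-byte logical and 4096-byte physical sectors, the LBA from the self-test log has to be divided by 8 before it goes into seek= with bs=4096. A sketch of that arithmetic, assuming logical sector 0 is aligned to a physical sector boundary (smartctl normally reports an offset if it isn't); the helper names are hypothetical, and `dd_refresh_cmd` only prints the command, it does not run it.

```shell
# Convert a 512-byte LBA (as reported in the SMART self-test log) to the
# index of the 4096-byte physical sector containing it, i.e. dd's seek=
# value when bs=4096. Eight 512-byte LBAs fit in one 4096-byte sector.
lba_to_4k_block() {
  echo $(( $1 / 8 ))
}

# Print (do not run) the dd command that overwrites that whole physical sector.
dd_refresh_cmd() {
  dev=$1 lba=$2
  echo "dd if=/dev/zero of=$dev bs=4096 count=1 seek=$(lba_to_4k_block "$lba") conv=noerror,sync"
}
```

With the two LBAs from the self-test log earlier in this thread, 605610840 and 605610841 both map to block 75701355, so they are the same physical sector.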


Check the SMART information to see whether 'Current_Pending_Sector' went to 0. You may need to repeat some of the steps several times if there are multiple unreadable sectors.
Code:
smartctl -A /dev/ada2


Now run another smart test and hopefully it can complete without error.
Code:
smartctl -t long /dev/ada2
smartctl -l selftest /dev/ada2 # check the self-test log to see if it completed


Now run a scrub (either from the gui or with 'zpool scrub poolname').
Check the scrub's status and hopefully it fixes some errors.
Code:
zpool status -v poolname
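If you want to script that check, the two things to look for in `zpool status` output are the scan line (which says `scrub in progress` while it is still running) and the line `errors: No known data errors`. A minimal sketch; `scrub_clean` is a hypothetical helper and it assumes you pass the status of a single pool.

```shell
# Succeed only if `zpool status` output on stdin reports no known data
# errors and no scrub is still running.
scrub_clean() {
  awk '/scrub in progress/ { busy = 1 }
       /errors: No known data errors/ { clean = 1 }
       END { exit !(clean && !busy) }'
}

# Usage ("poolname" as above):
#   zpool status -v poolname | scrub_clean && echo "scrub finished clean"
```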
 

Joshua Parker Ruehlig

Hall of Famer
Joined
Dec 5, 2011
Messages
5,949
I am now rerunning the long offline test to check for errors. The RAW value for 'Current_Pending_Sector' is back at 0, which is good, right?

Yup, just hope the long self-test completes without error. You did the right thing by going with RAIDZ2; just make sure your box scrubs often (I do once every 5 days) so corruption is found and repaired while you still have redundancy, and set up FreeNAS to email you in case something goes wrong.

By the way, I believe there's a script that does this automatically. I prefer to do it myself; maybe one day they can make this part of the FreeNAS web GUI, that would be sweet =]
 

HHawk

Contributor
Joined
Jun 8, 2011
Messages
176
Thanks Joshua for the detailed information and explanation. Highly appreciated.
I will bookmark this page for future reference as well.

On a side note: it's currently running the long test. Previously it failed within a few seconds, and now it's still testing.
So, so far, so good, I think.

I will run a scrub after the test has been completed (without errors).

Another side note: do I really need to run a scrub every 5 days? I don't have my FreeNAS machine on 24/7, only a few hours in the evening.
But to be safe, I will try to scrub once every weekend (as you suggested).

Thank you once again for your time and answers!
 

Joshua Parker Ruehlig

Hall of Famer
Joined
Dec 5, 2011
Messages
5,949
Another side note: do I really need to run a scrub every 5 days? I don't have my FreeNAS machine on 24/7, only a few hours in the evening.
But to be safe, I will try to scrub once every weekend (as you suggested).

Thank you once again for your time and answers!

I don't really know; I have heard once a week for consumer-level hard drives. If you're OK with a little corruption, then not scrubbing too often is fine. In my experience, the corruption ZFS points out doesn't noticeably change the usability of the file. But it all depends on your application and needs.
 

paleoN

Wizard
Joined
Apr 22, 2012
Messages
1,403
In the future I would zpool offline the disk in question first. ZFS will still notice a problem on read or a scrub, but if you first offline the disk, overwrite the sector and then online the disk it will be synced immediately.

IMHO, scrubbing every 5 days is excessive. In fact, with higher parity levels you can argue that you can scrub less often. The statistical likelihood of having an unrecoverable read on 3 separate disks for the same block of ZFS data is... unlikely, to say the least. Other failure modes begin to dominate.

I would still scrub at least once a month and regularly run long SMART tests to check for read errors on the drives. The SMART tests are faster, also check unused sectors on the drive, and are less stressful on the drives.
 

paleoN

Wizard
Joined
Apr 22, 2012
Messages
1,403
In the future I would zpool offline the disk in question first. ZFS will still notice a problem on read or a scrub, but if you first offline the disk, overwrite the sector and then online the disk it will be synced immediately.
I should have finished my coffee first. ZFS won't magically know if you changed some random block until it tries to read from it. I still think it's cleaner to offline the disk first.
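Put together, the order suggested here looks roughly like the following. This is a dry-run sketch that only prints the commands; the pool name, the vdev name as `zpool status` shows it (on FreeNAS often a gptid/... label rather than ada2), and the raw whole-disk device are all placeholders, and restoring debugflags to 0 assumes that was its value before. The closing scrub is what makes ZFS read, notice, and rewrite the zeroed block, since it won't know about the change until it reads it.

```shell
# Dry run: print (do not execute) offline -> overwrite -> online -> scrub.
# Arguments (all placeholders):
#   $1 pool name, $2 vdev name as listed by `zpool status`,
#   $3 raw whole-disk device (the SMART LBA is relative to the whole disk),
#   $4 LBA of the unreadable sector from the SMART self-test log.
refresh_plan() {
  pool=$1 vdev=$2 dev=$3 lba=$4
  echo "zpool offline $pool $vdev"
  echo "sysctl kern.geom.debugflags=16"
  # 8 x 512-byte LBAs per 4096-byte physical sector
  echo "dd if=/dev/zero of=$dev bs=4096 count=1 seek=$(( $lba / 8 )) conv=noerror,sync"
  # assumes debugflags was 0 before
  echo "sysctl kern.geom.debugflags=0"
  echo "zpool online $pool $vdev"
  echo "zpool scrub $pool"
}

# Example with placeholder names:
#   refresh_plan tank gptid/EXAMPLE /dev/ada2 605610841
```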
 

Joshua Parker Ruehlig

Hall of Famer
Joined
Dec 5, 2011
Messages
5,949
I should have finished my coffee first. ZFS won't magically know if you changed some random block until it tries to read from it. I still think it's cleaner to offline the disk first.

I don't think it matters, but I don't think it hurts to offline it either. I offlined it, but I was learning as I went through this problem, and probably won't in the future.

Thanks for the info about scrubs, I might do every 2 weeks, with short smart tests daily and long tests once a week.
 

HHawk

Contributor
Joined
Jun 8, 2011
Messages
176
Thanks guys for the extra information. It just shows you're never too old to learn... :)

* bookmarked for future reference *
 

leenux_tux

Patron
Joined
Sep 3, 2011
Messages
238
Pending sectors

HHawk,

I had exactly the same issue, but with two drives, not one.

At the time I was running three drives in RAIDZ, so having issues with two drives was not good. Luckily for me, I keep a backup of all my files on an external USB drive. It's not ideal, as I have to connect the drive to my laptop and back up to the external drive via the lappy, but it works.

I tried a number of tests on the disks, to no avail. If you search for posts I have made you might just find the relevant one, fun-and-games, fun-and-games....

In the end, I had to download the manufacturer's tools and do a low-level format of the disks. I even did the drive that was not producing any errors, just in case. Of course, this meant that the whole zpool was destroyed; however, rebuilding wasn't actually that much of an issue, just time-consuming.

The end result ?? No errors whatsoever on my drives, and this all happened around 3 to 4 weeks ago.

From now on, whenever I get a new hard drive, I will always use the manufacturer's tools to format the drive prior to installation.

Leenux_tux
 

HHawk

Contributor
Joined
Jun 8, 2011
Messages
176
Thanks leenux_tux for your response.

Uhmz... After reading your reply, I am a bit worried; I never did a low-level format (through WD's tools). However, a few weeks (or even months) back, I asked whether I should do a low-level format when I was going to do a clean install of FreeNAS. I got several answers like "Why?" and "Waste of time!".

So I did a clean install of FreeNAS after I backed up everything (to several PCs, because of the amount of space I was using). I now wish I had done a low-level format BEFORE doing the clean install of FreeNAS and putting everything back. As you might understand, it was really time-consuming, because the data was on several PCs. It took me 2 full days.

FYI: I first had 6 Blue HDs from WD; 1 or 2 were bad, and after pulling some strings I received 6 Black HDs from WD. However, I sold these and decided to get 6 Green ones instead. Temperature and reliability matter more to me than speed. Anyway, the current drives are working out pretty well so far, other than the error mentioned in my first post.

Anyway, thanks for taking the time to reply. Highly appreciated as usual. Thnx.
 

paleoN

Wizard
Joined
Apr 22, 2012
Messages
1,403
I got several answers like "Why?" and "Waste of time!".
Because it is a waste of time. You assume the drive had a problem a few weeks ago and that the "low-level" format would have noticed it or stopped it from happening. The error was likely new (not pre-existing); smartd notified you and you fixed it. If you are concerned, run long SMART tests regularly. This will proactively read all of the disk's sectors.

If you want to, you can still do a "low-level" format, just one drive at a time.
  1. Offline the drive to be wiped.

  2. Remove drive & place in separate case.

  3. Run manufacturer's tool & wipe the drive.

  4. Put drive back in FreeNAS box.

  5. Online the drive and let it resilver.

  6. Once resilvering is finished, rinse & repeat for the remaining drives one at a time.
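For step 6, one way to wait until resilvering is finished is to poll `zpool status` for its `resilver in progress` scan line. A minimal sketch; `resilver_running` is a hypothetical helper, and `tank` in the usage comment is a placeholder pool name.

```shell
# Succeed while `zpool status` output on stdin shows a resilver still running.
resilver_running() {
  grep -q 'resilver in progress'
}

# Usage sketch: check every 10 minutes until the resilver is done.
#   while zpool status tank | resilver_running; do sleep 600; done
```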
IMO, you gain very little from this and the array is degraded by 1 drive until you finish.

Thanks for the info about scrubs, I might do every 2 weeks, with short smart tests daily and long tests once a week.
To what purpose are you running daily short SMART tests?

You could probably argue that weekly long tests are a bit too often, even though I do that myself. I think it's unreasonable to run them any more often than weekly, though I would run them at least once a month.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
I agree with paleoN. I think it's very unwise (and based on a lot of false assumptions) to expect that a low-level format (hint: there is NO such thing anymore, and hasn't been for decades, but some people wrongly call some formats "low level") will fix the problem you had. paleoN's recommendations are a very good alternative to what you want to call a "low-level format".
 

HHawk

Contributor
Joined
Jun 8, 2011
Messages
176
Okay, apparently the same hard disk does have some problems, because I am now (again) getting several errors:

1 Currently unreadable (pending) sectors

Now I am going to do what 'paleoN' said (on 11-16-2012, 01:42 AM, message #17) by taking the drive offline and running the manufacturer's tool.


Just to make sure: I never did the following:

- Created ZFS Dataset
- Created a Snapshot

Is it important to do these before taking the drive offline as mentioned?


I just checked FreeNAS' control panel and I noticed I can find the option 'Offline' in 'Volume Status' under 'View Volumes'.

So, just to be sure: I select 'Offline', turn my FreeNAS off, take out the drive, put it in another PC or something, and run the manufacturer's tools (WD's, probably). After that (I guess it will take quite a while) I put it back into my FreeNAS machine. Then (guessing here) I can select 'Online' or some similar option, right?

And will this 'resilvering' happen by itself, or do I need to do anything else?
 

leenux_tux

Patron
Joined
Sep 3, 2011
Messages
238
HHawk,

Previous comments regarding "low-level format" are correct. I guess I am showing my age, as I am from an era where "low-level format" was the term for what is now known as writing zeros to the entire drive. If you look at the following web page, "http://knowledge.seagate.com/articles/en_US/FAQ/203931en", you will find a better way of describing what I was attempting to say. Six of one and half a dozen of the other....

In my experience, with two drives having these issues, running the tool and writing out zeros (being very careful not to call it a low-level format :D ) did fix my problems. I went through this process around two months ago now and have not had an issue since.

You can install the drive in another PC; however, if you have the right cables (USB-2-SATA for data, plus a power converter), you can connect your drive that way and still run the tool. And yes, it takes a while.
 