Need help on critical alert: Currently unreadable (pending) sectors

Status
Not open for further replies.

JFD

Dabbler
Joined
Jul 25, 2016
Messages
46
Hello,

I am a beginner with FreeNAS, and I now seem to have a problem with my system. This morning at 03:00 I got a system email saying that the scrub of the pool had started, and at 06:02 I got another system email with a critical alert:

Device: /dev/ada0, 1 Currently unreadable (pending) sectors

I quickly did some online research and found: https://dekoder.wordpress.com/2014/10/08/fixing-freenas-currently-unreadable-pending-sectors-error/ and other similar solutions.

But after running the long test I got these results:

=== START OF INFORMATION SECTION ===
Model Family: Western Digital Red
Device Model: WDC WD40EFRX-68WT0N0
Serial Number: WD-WCC4E5YPD9XZ
LU WWN Device Id: 5 0014ee 2b7d3b7ac
Firmware Version: 82.00A82
User Capacity: 4,000,787,030,016 bytes [4.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5400 rpm
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-2 (minor revision not indicated)
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Fri Sep 1 19:39:09 2017 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 241) Self-test routine in progress...
10% of test remaining.
Total time to complete Offline
data collection: (53760) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 537) minutes.
Conveyance self-test routine
recommended polling time: ( 5) minutes.
SCT capabilities: (0x703d) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 5
3 Spin_Up_Time 0x0027 195 177 021 Pre-fail Always - 7233
4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 23
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
7 Seek_Error_Rate 0x002e 200 200 000 Old_age Always - 0
9 Power_On_Hours 0x0032 086 086 000 Old_age Always - 10392
10 Spin_Retry_Count 0x0032 100 253 000 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 100 253 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 23
192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 10
193 Load_Cycle_Count 0x0032 200 200 000 Old_age Always - 100
194 Temperature_Celsius 0x0022 115 113 000 Old_age Always - 37
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 1
198 Offline_Uncorrectable 0x0030 100 253 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0008 200 200 000 Old_age Offline - 3

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 9981 -
# 2 Extended offline Completed without error 00% 9235 -
# 3 Extended offline Completed without error 00% 8526 -
# 4 Extended offline Completed without error 00% 7783 -
# 5 Extended offline Completed without error 00% 7064 -
# 6 Extended offline Completed without error 00% 6321 -
# 7 Extended offline Completed without error 00% 5650 -
# 8 Extended offline Completed without error 00% 4907 -
# 9 Extended offline Completed without error 00% 4163 -
#10 Extended offline Interrupted (host reset) 20% 3441 -
#11 Extended offline Completed without error 00% 2700 -
#12 Extended offline Completed without error 00% 1987 -
#13 Extended offline Completed without error 00% 1248 -
#14 Extended offline Completed without error 00% 513 -

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.


As far as I can tell, no errors are reported and there is no LBA_of_first_error, so there is no way of knowing which sector is bad. I really need some help.

Can somebody explain to me what to do next?

Forgive me if my English isn't 100% OK :) (I am from Holland)
 

JFD

Dabbler
Joined
Jul 25, 2016
Messages
46
Also, I am on FreeNAS-9.10.2-U5 and using three 4 TB HDDs in RAIDZ1.
 
Joined
May 10, 2017
Messages
838
As far as I can tell, no errors are reported and there is no LBA_of_first_error, so there is no way of knowing which sector is bad. I really need some help.

In my experience it's rather common for WD disks to show "false positive" pending sectors in the SMART report. If the extended SMART test completes without error, there are no real pending sectors, and also no way of making the count go back to zero.
 

Pitfrr

Wizard
Joined
Feb 10, 2014
Messages
1,531
You might want to monitor the SMART results for this drive and make sure the pending sector count does not increase.

On my backup system I have two drives with one pending sector (WD green) each. The count is not increasing (and long SMART tests are ok) but I monitor it carefully.
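If you would rather script this monitoring than read the report by eye, here is a minimal Python sketch. It assumes you have the attribute table printed by `smartctl -A /dev/ada0` available as text; the function name `pending_sectors` is hypothetical, and the sample rows are copied from the report earlier in this thread. It is not something FreeNAS provides.

```python
def pending_sectors(smart_attributes_text):
    """Return the raw value of SMART attribute 197, or None if the row is absent."""
    for line in smart_attributes_text.splitlines():
        fields = line.split()
        # An attribute row has 10 columns: ID, name, flag, value, worst,
        # thresh, type, updated, when_failed, raw value.
        if len(fields) >= 10 and fields[0] == "197" and fields[1] == "Current_Pending_Sector":
            return int(fields[9])
    return None


# Sample rows taken from the SMART report posted above.
sample = """\
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 1
198 Offline_Uncorrectable 0x0030 100 253 000 Old_age Offline - 0
"""

print(pending_sectors(sample))  # prints 1
```

Storing the number after each run and comparing it with the previous one is enough to notice when the count starts to climb.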
 

JFD

Dabbler
Joined
Jul 25, 2016
Messages
46
You might want to monitor the SMART results for this drive and make sure the pending sector count does not increase.

On my backup system I have two drives with one pending sector (WD green) each. The count is not increasing (and long SMART tests are ok) but I monitor it carefully.

So if I understand correctly, you're saying that I shouldn't worry about 1 bad sector? But what about test #10, which didn't complete? And if everything is OK, can I delete this critical error message? That way I will know when there's a new critical error.

How often should I initiate a long test? And how many of these false positives are acceptable?

Thanks both for your help
 

JFD

Dabbler
Joined
Jul 25, 2016
Messages
46
Last night I ran a second long test, with these results:


SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 10402 -
# 2 Extended offline Aborted by host 10% 10392 -
# 3 Extended offline Completed without error 00% 9981 -
# 4 Extended offline Completed without error 00% 9235 -
# 5 Extended offline Completed without error 00% 8526 -
# 6 Extended offline Completed without error 00% 7783 -
# 7 Extended offline Completed without error 00% 7064 -
# 8 Extended offline Completed without error 00% 6321 -
# 9 Extended offline Completed without error 00% 5650 -
#10 Extended offline Completed without error 00% 4907 -
#11 Extended offline Completed without error 00% 4163 -
#12 Extended offline Interrupted (host reset) 20% 3441 -
#13 Extended offline Completed without error 00% 2700 -
#14 Extended offline Completed without error 00% 1987 -
#15 Extended offline Completed without error 00% 1248 -
#16 Extended offline Completed without error 00% 513 -

So this time #2 was aborted by host and #12 was interrupted. I can't explain either one. All other systems were down, so nothing was even trying to connect to the FreeNAS system. I still don't see any errors. Should I just reboot the FreeNAS system and check for the latest updates, and that's it?
 
Joined
May 10, 2017
Messages
838
"Interrupted" or "Aborted by host" is not a disk problem. If the status says "Completed: read failure", then you have one or more bad sectors.
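As a rough illustration of that rule, here is a minimal Python sketch that sorts the Status column of the self-test log into the three cases. The function name `classify_self_test` and the labels it returns are hypothetical. The "Completed: read failure" wording is assumed to be what smartctl prints for a failed read; the other strings are taken from the logs in this thread.

```python
def classify_self_test(status):
    """Map a self-test log status string to a rough verdict."""
    s = status.strip().lower()
    if "read failure" in s:
        # The test finished but hit an unreadable sector.
        return "bad sectors"
    if s.startswith("interrupted") or s.startswith("aborted"):
        # The test was cut short by the host, not by the disk.
        return "not a disk problem"
    if s.startswith("completed without error"):
        return "ok"
    return "unknown"


for status in ("Completed without error",
               "Aborted by host",
               "Interrupted (host reset)",
               "Completed: read failure"):
    print(status, "->", classify_self_test(status))
```

Anything the sketch does not recognise is reported as "unknown" rather than guessed at.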
 

Pitfrr

Wizard
Joined
Feb 10, 2014
Messages
1,531
In my opinion (and it is what I'm doing), I wouldn't worry about 1 sector, but I would monitor it to make sure the count is not increasing.

By "can I delete this critical error message?" I suppose you mean the blinking light in the FreeNAS GUI?
For the GUI, just click on the blinking light and this will open a dialog box and you can just uncheck the error.
As for the message in the SMART report, well, you can't delete it (or at least I don't know how, and I don't need or want to).

I have long tests scheduled every two weeks (and short tests every 5 days).
Here is how it looks on the SMART report:
Code:
SMART Self-test log structure revision number 1
Num  Test_Description	Status				  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline	   Completed without error	   00%	  6952		 -
# 2  Short offline	   Completed without error	   00%	  6838		 -
# 3  Short offline	   Completed without error	   00%	  6718		 -
# 4  Extended offline	Completed without error	   00%	  6657		 -
# 5  Short offline	   Completed without error	   00%	  6598		 -
# 6  Short offline	   Completed without error	   00%	  6463		 -
# 7  Short offline	   Completed without error	   00%	  6343		 -
# 8  Extended offline	Completed without error	   00%	  6282		 -
# 9  Short offline	   Completed without error	   00%	  6223		 -
#10  Short offline	   Completed without error	   00%	  6103		 -
#11  Short offline	   Completed without error	   00%	  5983		 -
#12  Extended offline	Completed without error	   00%	  5922		 -
#13  Short offline	   Completed without error	   00%	  5863		 -
#14  Short offline	   Completed without error	   00%	  5743		 -
#15  Short offline	   Completed without error	   00%	  5623		 -
#16  Extended offline	Completed without error	   00%	  5563		 -
#17  Short offline	   Completed without error	   00%	  5503		 -
#18  Short offline	   Completed without error	   00%	  5497		 -
#19  Short offline	   Completed without error	   00%	  5473		 -
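To check that a schedule is really being followed, the LifeTime(hours) column can be turned into gaps between tests. This is a minimal Python sketch, assuming log rows laid out like the ones above; the name `hours_between_tests` is hypothetical and the sample is the first eight rows of my log.

```python
import re

# One self-test log row: "# N  <Short|Extended> offline  <status>  NN%  <hours>  <LBA>"
LOG_ROW = re.compile(r"^#\s*\d+\s+(Short|Extended) offline\s+.+?\s+\d+%\s+(\d+)\s+\S+\s*$")


def hours_between_tests(log_text, kind):
    """Return the gaps in power-on hours between consecutive tests of one kind."""
    hours = []
    for line in log_text.splitlines():
        match = LOG_ROW.match(line.strip())
        if match and match.group(1) == kind:
            hours.append(int(match.group(2)))
    # The log lists the newest test first, so each entry minus the next is a gap.
    return [newer - older for newer, older in zip(hours, hours[1:])]


sample = """\
# 1  Short offline     Completed without error  00%  6952  -
# 2  Short offline     Completed without error  00%  6838  -
# 3  Short offline     Completed without error  00%  6718  -
# 4  Extended offline  Completed without error  00%  6657  -
# 5  Short offline     Completed without error  00%  6598  -
# 6  Short offline     Completed without error  00%  6463  -
# 7  Short offline     Completed without error  00%  6343  -
# 8  Extended offline  Completed without error  00%  6282  -
"""

print(hours_between_tests(sample, "Extended"))  # prints [375]
print(hours_between_tests(sample, "Short"))     # prints [114, 120, 120, 135, 120]
```

375 power-on hours is a little over 15 days, and 120 hours is 5 days, which matches the schedule described above for a system that stays powered on.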



I don't know how many false positives are acceptable?! Maybe some more experienced members could give their opinion. But I would say maybe a few (like 1 to 5), and I would keep watching the pending sector count and the test results to make sure no error is found.
In my case, I have had one pending sector on one of my drives for at least 2 years. Long tests complete without errors and the count is not increasing, so I do not worry too much... :)

Regarding the aborted tests, it seems like an isolated incident... for now. Configure the SMART tests (if not done yet) and check after completion to see if this happens again.
 
Joined
May 10, 2017
Messages
838
I don't know how many false positives are acceptable?!

As long as they are false positives, i.e., the extended SMART test completes without a read failure, you can have as many as you want. I've seen WD disks report hundreds of pending sectors and still pass the extended test; I believe it's a firmware issue. If they are real pending sectors, and assuming good redundancy and backups, I'd say single digits are acceptable for FreeNAS as long as they remain stable.
 

JFD

Dabbler
Joined
Jul 25, 2016
Messages
46
Thank you all for your help. Do I need to do anything in FreeNAS to schedule the tests? I tried to find settings for the SMART tests but can't find them. Do I need to schedule them, or can I rely on the function that found this false positive to keep me and my data safe?

I think I should schedule those long tests one night each month or so. And can I schedule an email with the results?
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
I tried to find settings for the SMART tests but can't find them.
They're found under Tasks.
Do I need to schedule them
Yes, they don't happen automatically. I'd schedule a short test every 1-3 days, and a long test every 1-3 weeks. The system will email you if one of them results in an error (it doesn't mail on success, that's not the Unix Way).
 