/dev/da1 [SAT], 8 Offline uncorrectable sectors: does this mean the disk is on its last legs?

Status
Not open for further replies.

VioletDragon

Patron
Joined
Aug 6, 2017
Messages
251
Hi folks,

Last night, when my server ran a scrub and backed up to my offsite server, I got an email from my server about /dev/da1 [SAT], 8 Offline uncorrectable sectors. Is this a bad thing? Does it mean my disk is dying? I ran smartctl -q noserial -a /dev/da1 via SSH, and the output I am getting is:
Code:
=== START OF INFORMATION SECTION ===
Model Family:	 Seagate Barracuda 7200.14 (AF)
Device Model:	 ST3000DM001-1ER166
Firmware Version: CC26
User Capacity:	3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:	 512 bytes logical, 4096 bytes physical
Rotation Rate:	7200 rpm
Form Factor:	  3.5 inches
Device is:		In smartctl database [for details use: -P show]
ATA Version is:   ACS-2, ACS-3 T13/2161-D revision 3b
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:	Thu Mar  1 17:16:56 2018 GMT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82)	Offline data collection activity
					was completed without error.
					Auto Offline Data Collection: Enabled.
Self-test execution status:	  (   0)	The previous self-test routine completed
					without error or no self-test has ever
					been run.
Total time to complete Offline
data collection:		 (   80) seconds.
Offline data collection
capabilities:			 (0x7b) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:			(0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:		(0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine
recommended polling time:	 (   1) minutes.
Extended self-test routine
recommended polling time:	 ( 313) minutes.
Conveyance self-test routine
recommended polling time:	 (   2) minutes.
SCT capabilities:			(0x1085)	SCT Status supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME		  FLAG	 VALUE WORST THRESH TYPE	  UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate	 0x000f   118   099   006	Pre-fail  Always	   -	   171354096
  3 Spin_Up_Time			0x0003   094   093   000	Pre-fail  Always	   -	   0
  4 Start_Stop_Count		0x0032   100   100   020	Old_age   Always	   -	   142
  5 Reallocated_Sector_Ct   0x0033   100   100   010	Pre-fail  Always	   -	   0
  7 Seek_Error_Rate		 0x000f   074   060   030	Pre-fail  Always	   -	   25616206
  9 Power_On_Hours		  0x0032   086   086   000	Old_age   Always	   -	   12511
 10 Spin_Retry_Count		0x0013   100   100   097	Pre-fail  Always	   -	   0
 12 Power_Cycle_Count	   0x0032   100   100   020	Old_age   Always	   -	   182
183 Runtime_Bad_Block	   0x0032   100   100   000	Old_age   Always	   -	   0
184 End-to-End_Error		0x0032   100   100   099	Old_age   Always	   -	   0
187 Reported_Uncorrect	  0x0032   100   100   000	Old_age   Always	   -	   0
188 Command_Timeout		 0x0032   100   100   000	Old_age   Always	   -	   0 0 0
189 High_Fly_Writes		 0x003a   063   063   000	Old_age   Always	   -	   37
190 Airflow_Temperature_Cel 0x0022   071   046   045	Old_age   Always	   -	   29 (Min/Max 27/30)
191 G-Sense_Error_Rate	  0x0032   100   100   000	Old_age   Always	   -	   0
192 Power-Off_Retract_Count 0x0032   100   100   000	Old_age   Always	   -	   53
193 Load_Cycle_Count		0x0032   022   022   000	Old_age   Always	   -	   157894
194 Temperature_Celsius	 0x0022   029   054   000	Old_age   Always	   -	   29 (0 15 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000	Old_age   Always	   -	   8
198 Offline_Uncorrectable   0x0010   100   100   000	Old_age   Offline	  -	   8
199 UDMA_CRC_Error_Count	0x003e   200   200   000	Old_age   Always	   -	   0
240 Head_Flying_Hours	   0x0000   100   253   000	Old_age   Offline	  -	   4438h+10m+58.387s
241 Total_LBAs_Written	  0x0000   100   253   000	Old_age   Offline	  -	   137096222233
242 Total_LBAs_Read		 0x0000   100   253   000	Old_age   Offline	  -	   63285713472

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description	Status				  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline	   Completed without error	   00%	 11548		 -
# 2  Short offline	   Completed without error	   00%	 11547		 -
# 3  Short offline	   Completed without error	   00%	 11546		 -
# 4  Short offline	   Completed without error	   00%	 11545		 -
# 5  Short offline	   Completed without error	   00%	 11544		 -
# 6  Short offline	   Completed without error	   00%	 11543		 -
# 7  Short offline	   Completed without error	   00%	 11542		 -
# 8  Short offline	   Completed without error	   00%	 11541		 -
# 9  Short offline	   Completed without error	   00%	 11541		 -
#10  Short offline	   Completed without error	   00%	 11540		 -
#11  Short offline	   Completed without error	   00%	 11539		 -
#12  Short offline	   Completed without error	   00%	 11537		 -
#13  Short offline	   Completed without error	   00%	 11536		 -
#14  Short offline	   Completed without error	   00%	 11535		 -
#15  Short offline	   Completed without error	   00%	 11534		 -
#16  Short offline	   Completed without error	   00%	 11533		 -
#17  Short offline	   Completed without error	   00%	 11532		 -
#18  Short offline	   Completed without error	   00%	 11531		 -
#19  Short offline	   Completed without error	   00%	 11530		 -
#20  Short offline	   Completed without error	   00%	 11529		 -
#21  Short offline	   Completed without error	   00%	 11528		 -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
	1		0		0  Not_testing
	2		0		0  Not_testing
	3		0		0  Not_testing
	4		0		0  Not_testing
	5		0		0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

Any recommendations?

Thanks.
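The number in the smartd email can be read straight off the attribute table above. By default, smartd's offline-uncorrectable warning watches the raw value of attribute 198 and its pending-sector warning watches attribute 197. Here both are 8 while attribute 5 (Reallocated_Sector_Ct) is still 0, which is commonly read as "8 sectors could not be read and have not been remapped yet". The sketch below is a minimal parser under the assumption that the table keeps the column layout shown in the post (ID# first, RAW_VALUE last); the sample rows are copied from the output with whitespace normalized.

```python
# Minimal sketch: pull RAW_VALUE for a few sector-health attributes out of
# the "Vendor Specific SMART Attributes" table printed by `smartctl -a`.
# Assumes the column layout shown in the post (ID# first, RAW_VALUE last).

WATCHED = {
    5: "Reallocated_Sector_Ct",
    197: "Current_Pending_Sector",
    198: "Offline_Uncorrectable",
}

SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       8
"""


def sector_counts(attribute_table: str) -> dict[str, int]:
    """Return {attribute_name: raw_value} for the attributes in WATCHED."""
    counts = {}
    for line in attribute_table.splitlines():
        fields = line.split()
        if not fields or not fields[0].isdigit():
            continue  # skip header lines and blank lines
        attr_id = int(fields[0])
        if attr_id in WATCHED:
            # For these three attributes the raw value is a single integer,
            # which is the last whitespace-separated field on the row.
            counts[WATCHED[attr_id]] = int(fields[-1])
    return counts


counts = sector_counts(SAMPLE)
print(counts)
# {'Reallocated_Sector_Ct': 0, 'Current_Pending_Sector': 8, 'Offline_Uncorrectable': 8}
```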
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
Why were you running SMART tests every hour? And why haven't you run any SMART tests at all for over six weeks? But irrespective of the answer to those questions, the bad blocks are a bad sign.
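The gap since the last self-test can be estimated from two numbers in the posted output: Power_On_Hours (attribute 9) and the LifeTime(hours) of the newest self-test log entry. A minimal sketch, with the values copied from the first post; note that these are power-on hours, which match wall-clock time only if the server runs 24/7.

```python
# Minimal sketch: estimate how long ago the most recent self-test ran,
# using the self-test log's LifeTime(hours) column and attribute 9
# (Power_On_Hours). Values are copied from the output in the first post.

POWER_ON_HOURS = 12511  # attribute 9 raw value

# LifeTime(hours) of log entries #1..#21, newest first, as printed by smartctl
SELF_TEST_LIFETIMES = [
    11548, 11547, 11546, 11545, 11544, 11543, 11542, 11541, 11541, 11540,
    11539, 11537, 11536, 11535, 11534, 11533, 11532, 11531, 11530, 11529, 11528,
]

hours_since_last_test = POWER_ON_HOURS - SELF_TEST_LIFETIMES[0]
days_since_last_test = hours_since_last_test / 24  # only meaningful if powered 24/7

print(f"{hours_since_last_test} power-on hours (~{days_since_last_test:.0f} days) since the last self-test")
# 963 power-on hours (~40 days) since the last self-test
```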
 

VioletDragon

Patron
Joined
Aug 6, 2017
Messages
251
Why were you running SMART tests every hour? And why haven't you run any SMART tests at all for over six weeks? But irrespective of the answer to those questions, the bad blocks are a bad sign.

I haven't been running SMART tests every hour, have I? From looking at this, there aren't any scheduled. What should I do, then? Throw the disk away and buy a new one?

 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504

VioletDragon

Patron
Joined
Aug 6, 2017
Messages
251
Yes, you have; the SMART output you posted makes that clear. But you haven't run any SMART tests at all in six weeks.

You could review any of the approximately 100 threads here asking the same question, or take a look at https://forums.freenas.org/index.ph...bleshooting-guide-all-versions-of-freenas.17/

I'm running a SMART test on the disk as we speak. Some people have mentioned zeroing out the bad sectors, which is something I will try. The machine has been on for more than 6 weeks.
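One detail matters if you try zeroing a bad sector on this drive: smartctl reports 512-byte logical and 4096-byte physical sectors, so one bad physical sector covers 8 logical LBAs, and an overwrite meant to clear it should cover the whole aligned group. The sketch below only does that arithmetic, assuming the drive's first logical LBA is aligned to a physical sector (the usual case). The LBA used is HYPOTHETICAL; a failed long self-test would report the real one in its LBA_of_first_error column. Overwriting a sector on a pool member destroys whatever data was there, so ZFS would then have to repair it from parity.

```python
# Minimal sketch: find the 8-LBA group that makes up one 4K physical sector
# on a drive with 512-byte logical / 4096-byte physical sectors (512e).
# Assumes LBA 0 starts on a physical-sector boundary.

LOGICAL_SECTOR = 512
PHYSICAL_SECTOR = 4096
LBAS_PER_PHYSICAL = PHYSICAL_SECTOR // LOGICAL_SECTOR  # 8


def physical_sector_span(lba: int) -> tuple[int, int]:
    """Return (first_lba, count) of the physical sector containing `lba`."""
    first = lba - (lba % LBAS_PER_PHYSICAL)  # round down to an 8-LBA boundary
    return first, LBAS_PER_PHYSICAL


hypothetical_bad_lba = 1234567  # HYPOTHETICAL, not from this drive
first, count = physical_sector_span(hypothetical_bad_lba)
print(f"overwrite LBAs {first}..{first + count - 1} ({count * LOGICAL_SECTOR} bytes)")
# overwrite LBAs 1234560..1234567 (4096 bytes)
```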

When you are posting a long list of text from the screen, please enclose it in code tags so that it looks like this instead
Code:
...snip... (same smartctl output as in the first post)

Sorry Christ. Will do next time.


The thing keeps bombarding me with emails saying,

Device: /dev/da1 [SAT], 8 Offline uncorrectable sectors
Device: /dev/da1 [SAT], Self-Test Log error count increased from 0 to 1
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
Code:
SMART Self-test log structure revision number 1
Num  Test_Description	Status				  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline	   Completed without error	   00%	 11548		 -
# 2  Short offline	   Completed without error	   00%	 11547		 -
# 3  Short offline	   Completed without error	   00%	 11546		 -
# 4  Short offline	   Completed without error	   00%	 11545		 -
# 5  Short offline	   Completed without error	   00%	 11544		 -
# 6  Short offline	   Completed without error	   00%	 11543		 -
# 7  Short offline	   Completed without error	   00%	 11542		 -
# 8  Short offline	   Completed without error	   00%	 11541		 -
# 9  Short offline	   Completed without error	   00%	 11541		 -
#10  Short offline	   Completed without error	   00%	 11540		 -
#11  Short offline	   Completed without error	   00%	 11539		 -
#12  Short offline	   Completed without error	   00%	 11537		 -
#13  Short offline	   Completed without error	   00%	 11536		 -
#14  Short offline	   Completed without error	   00%	 11535		 -
#15  Short offline	   Completed without error	   00%	 11534		 -
#16  Short offline	   Completed without error	   00%	 11533		 -
#17  Short offline	   Completed without error	   00%	 11532		 -
#18  Short offline	   Completed without error	   00%	 11531		 -
#19  Short offline	   Completed without error	   00%	 11530		 -
#20  Short offline	   Completed without error	   00%	 11529		 -
#21  Short offline	   Completed without error	   00%	 11528		 -
This portion of your message indicates that a short test has been run every hour for a long time and sometimes even twice in a single hour.
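The "every hour, sometimes twice" reading comes from the differences between consecutive LifeTime(hours) values. A minimal sketch with the values from the quoted log (newest first): a gap of 1 means the next hour, a gap of 0 means two tests within the same power-on hour, and the single gap of 2 suggests one hourly test is missing from the sequence.

```python
# Minimal sketch: gaps (in power-on hours) between consecutive entries of
# the self-test log quoted above. The log lists the newest test first,
# so each gap is newer - older.
from collections import Counter

LIFETIMES = [
    11548, 11547, 11546, 11545, 11544, 11543, 11542, 11541, 11541, 11540,
    11539, 11537, 11536, 11535, 11534, 11533, 11532, 11531, 11530, 11529, 11528,
]

gaps = [newer - older for newer, older in zip(LIFETIMES, LIFETIMES[1:])]
print(sorted(Counter(gaps).items()))
# [(0, 1), (1, 18), (2, 1)]
```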

I suggest running the drive through a multi-pass diagnostic write test of all its sectors and then looking at the long SMART test results again. If it has more bad sectors, first check whether it is still under warranty; if it is not, it is time for a replacement drive.
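A multi-pass write test fills every block with a known byte pattern, reads it back, and repeats with the next pattern. The sketch below shows that logic with the four patterns badblocks -w writes by default (0xaa, 0x55, 0xff, 0x00). It is an illustration, not a replacement for a real tool: it runs against a scratch file, and its reads may be served from the OS page cache instead of the disk surface. Never point anything like this at a disk that holds data.

```python
import os
import tempfile

# Byte patterns written on each pass (badblocks -w uses these four by default).
PATTERNS = (0xAA, 0x55, 0xFF, 0x00)
BLOCK_SIZE = 4096  # matches the drive's 4096-byte physical sector


def write_verify(path: str, num_blocks: int) -> list[int]:
    """Destructively write each pattern to every block of `path`, read it
    back, and return the sorted block numbers that ever failed to verify."""
    bad = set()
    for pattern in PATTERNS:
        expected = bytes([pattern]) * BLOCK_SIZE
        # Write pass: fill every block with the pattern.
        with open(path, "r+b") as f:
            for _ in range(num_blocks):
                f.write(expected)
            f.flush()
            os.fsync(f.fileno())
        # Read pass: compare every block against the pattern.
        # (May be satisfied from the page cache; real tools bypass it.)
        with open(path, "rb") as f:
            for block in range(num_blocks):
                if f.read(BLOCK_SIZE) != expected:
                    bad.add(block)
    return sorted(bad)


# Exercise the sketch on a scratch file only.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.truncate(16 * BLOCK_SIZE)
    scratch = tmp.name
result = write_verify(scratch, 16)
print(result)  # [] for an ordinary file
os.remove(scratch)
```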
 

VioletDragon

Patron
Joined
Aug 6, 2017
Messages
251
This portion of your message indicates that a short test has been run every hour for a long time and sometimes even twice in a single hour.

I suggest running the drive through a multi-pass diagnostic write test of all its sectors and then looking at the long SMART test results again. If it has more bad sectors, first check whether it is still under warranty; if it is not, it is time for a replacement drive.

Thanks Christ :). I suppose I should schedule SMART tests more often. The disk is only 3 years old, so I'm not sure whether it's still under warranty. On a side note, I do have my offline backup server up and running now. But if this drive does die, since I have RAIDZ1 I shouldn't lose any data on the server, should I?
 


Spearfoot

He of the long foot
Moderator
Joined
May 13, 2015
Messages
2,478
Hi folks,

Last night, when my server ran a scrub and backed up to my offsite server, I got an email from my server about /dev/da1 [SAT], 8 Offline uncorrectable sectors. Is this a bad thing? Does it mean my disk is dying?
Code:
...snip...
Model Family:	 Seagate Barracuda 7200.14 (AF)
Device Model:	 ST3000DM001-1ER166
...snip...
  9 Power_On_Hours		  0x0032   086   086   000	Old_age   Always	   -	   12511
...snip...


Any recommendations?

Thanks.
You've gotten over 12,000 hours of service out of a Seagate ST3000DM001 -- a disk model which is infamous for failing early and often. My advice is to replace it as soon as possible! :D
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
But if this drive does die, since I have RAIDZ1 I shouldn't lose any data on the server, should I?
A single disk failure should not take out the array, but it does leave you with no fault tolerance. Best bet is to do an online replacement, meaning that you connect the replacement drive while the old drive is still in the system.
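Why one failure is survivable but a second is not can be seen with a toy single-parity model: the parity column is the XOR of the data columns, so any one missing column can be rebuilt from the survivors, but two missing columns leave one equation with two unknowns. This is a simplified model in the spirit of RAIDZ1; it ignores ZFS's real on-disk layout, variable stripe width, and checksums. It also shows why an online replacement helps: while the old disk is still attached, it can still supply its column during the rebuild.

```python
from functools import reduce

# Toy single-parity model: parity = XOR of all data columns.
# Not ZFS's actual RAIDZ layout; illustrative only.


def xor_bytes(*columns: bytes) -> bytes:
    """Byte-wise XOR of equal-length columns."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*columns))


data = [b"disk0data", b"disk1data", b"disk2data"]  # three data columns
parity = xor_bytes(*data)

# Lose one disk (disk1): XOR of the survivors and the parity recovers it.
rebuilt = xor_bytes(data[0], data[2], parity)
print(rebuilt == data[1])  # True

# Lose two disks: only one parity equation remains for two unknown
# columns, so neither can be reconstructed.
```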
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
Device: /dev/da1 [SAT], Self-Test Log error count increased from 0 to 1
That means that the most recent self-test (the one you just started) finished with an error. Posting the output of smartctl -a /dev/da1 again (in code tags) would show this.
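smartd's "Self-Test Log error count increased" email fires when the number of failed entries in the self-test log goes up between two polls. A minimal approximation of that count is sketched below; it treats "Completed: ..." failure statuses (for example "Completed: read failure") and "Fatal or unknown error" as failures, and does not count "Aborted by host" or "Interrupted" entries. The log lines, including the LBA, are HYPOTHETICAL, not from this drive.

```python
# Minimal sketch: approximate the self-test log error count that smartd
# compares between polls. Statuses treated as failures are an assumption
# based on smartctl's status strings.

FAILURE_MARKERS = ("Completed: ", "Fatal or unknown error")


def selftest_error_count(log_text: str) -> int:
    count = 0
    for line in log_text.splitlines():
        # Log entries start with "#"; the column header line does not.
        if line.startswith("#") and any(m in line for m in FAILURE_MARKERS):
            count += 1
    return count


# HYPOTHETICAL logs before and after a failing extended test.
before = """\
# 1  Short offline       Completed without error       00%     11548         -
"""
after = """\
# 1  Extended offline    Completed: read failure       90%     12512         123456
# 2  Short offline       Completed without error       00%     11548         -
"""
print(selftest_error_count(before), "->", selftest_error_count(after))  # 0 -> 1
```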

And, though I'm sure he's a great guy, Chris isn't Christ.
 

VioletDragon

Patron
Joined
Aug 6, 2017
Messages
251
You've gotten over 12,000 hours of service out of a Seagate ST3000DM001 -- a disk model which is infamous for failing early and often. My advice is to replace it as soon as possible! :D

Thanks for all your messages, guys :). Huge help. What do you think about the Seagate 3TB IronWolf 5900RPM disks? I'm looking at replacing the old disk with one.
 

Spearfoot

He of the long foot
Moderator
Joined
May 13, 2015
Messages
2,478
Thanks for all your messages, guys :). Huge help. What do you think about the Seagate 3TB IronWolf 5900RPM disks? I'm looking at replacing the old disk with one.
I've had so much experience with dozens of bad Seagate drives that I will never willingly purchase a Seagate disk product -- I prefer HGST or Western Digital.

Good luck!
 

VioletDragon

Patron
Joined
Aug 6, 2017
Messages
251
I've had so much experience with dozens of bad Seagate drives that I will never willingly purchase a Seagate disk product -- I prefer HGST or Western Digital.

Good luck!

So when I take the old disk out and put the new one in, will I have to do anything funky to add it to the existing array? I noticed Chris said to do it while the old drive is still in the system.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
What do you guys think about the Seagate 3TB IronWolf 5900RPM disks?
Most of my disks are WD, but I have a few Seagates in my current server. One of them has been running for nearly four years (33512 hours, to be exact), 24x7x365, with no problems. I wouldn't mind another Seagate if I needed another disk and the price was right.
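Converting SMART power-on hours to years makes figures like these easier to compare. A minimal sketch using 24 × 365 = 8760 hours per year (leap days ignored); keep in mind that Power_On_Hours counts only time the drive was powered.

```python
# Minimal sketch: convert SMART Power_On_Hours to years of power-on time.
HOURS_PER_YEAR = 24 * 365  # 8760; ignores leap days


def hours_to_years(hours: int) -> float:
    return hours / HOURS_PER_YEAR


for label, hours in [("danb35's Seagate", 33512), ("failing ST3000DM001", 12511)]:
    print(f"{label}: {hours} h = {hours_to_years(hours):.2f} years")
# danb35's Seagate: 33512 h = 3.83 years
# failing ST3000DM001: 12511 h = 1.43 years
```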
 

VioletDragon

Patron
Joined
Aug 6, 2017
Messages
251
The failing disk has been running 24/7 since 16 May 2016, but I will order the new disk as soon as possible next week. One question: can I just take that disk out and put the new one in, or will I have to do something funky? Also, whenever I get a new disk I always zero it in FreeNAS. Is this a good idea?
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
So when I take the old disk out and put the new one in, will I have to do anything funky to add it to the existing array?
Someone ought to write a resource about how to replace a disk. To follow Chris's suggestion (which is a good one), don't offline or remove the failing disk first--install the new one, do the replacement through the GUI, and once that finishes, the old disk will offline automatically. You can then remove it at your convenience.
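At the ZFS level, an online replacement corresponds to zpool replace POOL OLD NEW: the new disk is attached alongside the old one, a resilver copies the data onto it, and the old disk is detached only when the resilver completes. The sketch below just builds and (optionally) runs that command, defaulting to a dry run. The pool and device names are HYPOTHETICAL. On FreeNAS, the GUI replacement is the way to go, since it also partitions the new disk and refers to pool members by gptid labels, which this sketch does not do.

```python
import shlex
import subprocess

# Minimal sketch of the ZFS command behind an online replacement.
# Pool/device names are HYPOTHETICAL; dry run is the default.


def replace_command(pool: str, old: str, new: str) -> list[str]:
    return ["zpool", "replace", pool, old, new]


def run(cmd: list[str], dry_run: bool = True) -> None:
    if dry_run:
        print("would run:", shlex.join(cmd))
        return
    subprocess.run(cmd, check=True)  # starts the resilver


cmd = replace_command("tank", "da1", "da6")  # HYPOTHETICAL names
run(cmd)  # would run: zpool replace tank da1 da6
```

Resilver progress can then be followed with zpool status.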
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
When getting a new disk, I always zero it in FreeNAS. Is this a good idea?
No--there's no reason to zero the new disk. You should, however, test it thoroughly.
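Testing thoroughly takes a while, and a back-of-the-envelope estimate helps plan it. The sketch assumes a sustained 150 MB/s average (real drives are faster on outer tracks and slower on inner ones), a 4-pattern write-and-read test (8 full-disk passes), and one extended SMART test before and one after. The 313-minute figure is what the failing drive's smartctl output recommends; a new disk may report a different value.

```python
# Back-of-the-envelope burn-in time for a 3 TB disk.
# ASSUMPTIONS: 150 MB/s average rate, 4 patterns x (write + read),
# one extended self-test before and one after the write test.

CAPACITY_BYTES = 3_000_592_982_016
ASSUMED_RATE_BYTES_PER_S = 150e6
FULL_DISK_PASSES = 4 * 2   # 4 patterns, each written then read back
EXTENDED_TEST_MINUTES = 313

pass_hours = CAPACITY_BYTES / ASSUMED_RATE_BYTES_PER_S / 3600
write_test_hours = pass_hours * FULL_DISK_PASSES
total_hours = write_test_hours + 2 * EXTENDED_TEST_MINUTES / 60

print(f"one pass ~{pass_hours:.1f} h, write test ~{write_test_hours:.1f} h, total ~{total_hours:.1f} h")
# one pass ~5.6 h, write test ~44.5 h, total ~54.9 h
```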
 

VioletDragon

Patron
Joined
Aug 6, 2017
Messages
251
Someone ought to write a resource about how to replace a disk. To follow Chris's suggestion (which is a good one), don't offline or remove the failing disk first--install the new one, do the replacement through the GUI, and once that finishes, the old disk will offline automatically. You can then remove it at your convenience.

Is that done via a resilver? Sorry for all the questions, guys.

No--there's no reason to zero the new disk. You should, however, test it thoroughly.

I'm guessing a SMART test?
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504

Jailer

Not strong, but bad
Joined
Sep 12, 2014
Messages
4,977
The User Guide is also a great place to start when you have questions on your FreeNAS server.
 