Very newb - How bad are these errors?

Status
Not open for further replies.

EscapeVelocit3y

Dabbler
Joined
Oct 11, 2014
Messages
28
I have a 5-drive NAS with 1 redundant drive. How bad are the errors below? My drives have been getting very hot, and the NAS takes a very long time to boot. It's summer and I totally forgot to move my NAS to a cooler location. I'm not sure how much that affected the life of these drives, but I'm sure it did.

  • CRITICAL: June 13, 2018, 10:15 p.m. - Device: /dev/ada2, 65282 Currently unreadable (pending) sectors
  • CRITICAL: June 13, 2018, 10:15 p.m. - Device: /dev/ada2, 82 Offline uncorrectable sectors
  • CRITICAL: June 13, 2018, 10:15 p.m. - Device: /dev/ada1, 1 Currently unreadable (pending) sectors
  • CRITICAL: June 13, 2018, 10:15 p.m. - Device: /dev/ada1, 1 Offline uncorrectable sectors
Thanks for any help.
 

Redcoat

MVP
Joined
Feb 18, 2014
Messages
2,925
Looks pretty serious to me: two drives with issues and only one redundant drive...

Go into "Shell", maximize the window, run "zpool status" and post the output here in code tags.
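Redcoat's concern comes down to parity arithmetic. A minimal sketch, assuming a single raidz vdev and treating each disk as simply working or failed (the function `raidz_margin` is hypothetical, not a FreeNAS or ZFS command): raidz1 can lose one disk and keep all data, so two troubled disks leave no margin at all.

```python
def raidz_margin(total_disks: int, parity: int, failed: int) -> int:
    """Return how many more whole-disk failures a raidz vdev can absorb.

    `parity` is the raidz level (1 for raidz1, 2 for raidz2). The vdev keeps
    all data as long as no more than `parity` disks are missing; a negative
    result means data has been lost.
    """
    if not 0 <= parity < total_disks:
        raise ValueError("parity must be between 0 and total_disks - 1")
    if not 0 <= failed <= total_disks:
        raise ValueError("failed must be between 0 and total_disks")
    return parity - failed


# The pool in this thread: 5 disks in a single raidz1 vdev.
for failed in range(3):
    margin = raidz_margin(total_disks=5, parity=1, failed=failed)
    if margin > 0:
        state = "redundant"
    elif margin == 0:
        state = "degraded, no redundancy left"
    else:
        state = "data loss"
    print(f"{failed} failed disk(s): margin {margin} -> {state}")
```

This model ignores partial failures such as individual unreadable sectors, which is exactly what ada1 and ada2 are reporting, so read it as a best-case view.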
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
My drives have been getting very hot and the NAS takes a very long time to boot. It's summer and I totally forgot to place my NAS in a cooler location not sure if that had an impact on the life of these drives, I'm sure it did.
Heat is the big enemy of any mechanical device. Even in my computer room at work, where we keep the temperature at 18°C, the hard drives in my servers can get as hot as 48°C depending on where they sit in the chassis. That is still inside their operating range and doesn't worry me, but if your server is in a location with no cooling and the ambient temperature reaches 32°C, imagine how much hotter the drives would be. Yes, you should keep the server in a temperature-controlled location. Most drives are rated for a maximum operating temperature somewhere between 50°C and 60°C, but check the manufacturer's documentation and plan to keep the drives well below that.
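The numbers above imply a rise of about 30°C over ambient (18°C room, drives up to 48°C). A rough sketch, assuming that rise stays constant as the room warms (in practice it depends on airflow, drive position and load; `estimate_drive_temp` is a hypothetical helper):

```python
def estimate_drive_temp(ambient_c: float, ref_ambient_c: float, ref_drive_c: float) -> float:
    """Estimate drive temperature at a new ambient temperature.

    Assumes the drive runs the same number of degrees above ambient as it
    did at the reference point (a simplification; airflow and load matter).
    """
    rise = ref_drive_c - ref_ambient_c
    return ambient_c + rise


# Reference point from the post: 18 °C room, drives up to 48 °C.
estimate = estimate_drive_temp(ambient_c=32.0, ref_ambient_c=18.0, ref_drive_c=48.0)
RATED_MAX_C = 60.0  # upper end of the 50-60 °C range; check the datasheet for your model
print(f"estimated drive temperature in a 32 °C room: {estimate:.0f} °C")
print("above the rated maximum" if estimate > RATED_MAX_C else "within the rated maximum")
```

Even against the generous 60°C end of the range, the estimated 62°C is over the limit.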

If you post the output of the command smartctl -a /dev/ada2, and again for ada1, in code tags, it would help diagnose the condition of the drives. It should look something like this, though not exactly:
Code:
smartctl 6.4 2015-06-04 r4109 [FreeBSD 10.2-RELEASE-p2 amd64] (local build)
Copyright (C) 2002-15, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     INTEL SSDSCKJW120H6
Serial Number:    CVTQ526501Y3120G
LU WWN Device Id: 5 5cd2e4 14c8bf84c
Firmware Version: RG10
User Capacity:    120,034,123,776 bytes [120 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-3 (minor revision not indicated)
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sat Dec 19 23:22:16 2015 HKT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)	Offline data collection activity
				   was never started.
				   Auto Offline Data Collection: Disabled.
Self-test execution status:	  (   0)	The previous self-test routine completed
				   without error or no self-test has ever
				   been run.
Total time to complete Offline
data collection:		 ( 2930) seconds.
Offline data collection
capabilities:			 (0x7f) SMART execute Offline immediate.
				   Auto Offline data collection on/off support.
				   Abort Offline collection upon new
				   command.
				   Offline surface scan supported.
				   Self-test supported.
				   Conveyance Self-test supported.
				   Selective Self-test supported.
SMART capabilities:			(0x0003)	Saves SMART data before entering
				   power-saving mode.
				   Supports SMART auto save timer.
Error logging capability:		(0x01)	Error logging supported.
				   General Purpose Logging supported.
Short self-test routine
recommended polling time:	 (   2) minutes.
Extended self-test routine
recommended polling time:	 (  58) minutes.
Conveyance self-test routine
recommended polling time:	 (   4) minutes.
SCT capabilities:		   (0x0025)	SCT Status supported.
				   SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0032   100   100   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       2187
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       29
170 Unknown_Attribute       0x0033   100   100   010    Pre-fail  Always       -       0
171 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
172 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
174 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       1
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0033   100   100   090    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0032   034   100   000    Old_age   Always       -       34 (Min/Max 24/47)
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       1
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0
225 Unknown_SSD_Attribute   0x0032   100   100   000    Old_age   Always       -       1987
226 Unknown_SSD_Attribute   0x0032   100   100   000    Old_age   Always       -       65535
227 Unknown_SSD_Attribute   0x0032   100   100   000    Old_age   Always       -       53
228 Power-off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       65535
232 Available_Reservd_Space 0x0033   100   100   010    Pre-fail  Always       -       0
233 Media_Wearout_Indicator 0x0032   100   100   000    Old_age   Always       -       0
241 Total_LBAs_Written      0x0032   100   100   000    Old_age   Always       -       1987
242 Total_LBAs_Read         0x0032   100   100   000    Old_age   Always       -       254
249 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       1157

SMART Error Log not supported

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
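To pull the attributes out of output like this, here is a minimal parsing sketch (the names `parse_attributes` and `Attribute` are hypothetical). It assumes the column layout shown: ten whitespace-separated columns, where RAW_VALUE may carry extra text such as `34 (Min/Max 24/47)`. The FreeNAS alerts at the top of the thread correspond to attributes 197 (Current_Pending_Sector) and 198 (Offline_Uncorrectable), so those, along with 5 (Reallocated_Sector_Ct), are the ones to watch. The sample rows are taken from the ada2 output posted later in this thread.

```python
import re
from typing import Dict, NamedTuple


class Attribute(NamedTuple):
    name: str
    value: int
    worst: int
    thresh: int
    when_failed: str
    raw: int  # first number in the RAW_VALUE column


def parse_attributes(smartctl_output: str) -> Dict[int, Attribute]:
    """Parse the 'Vendor Specific SMART Attributes' table of `smartctl -a` output."""
    attrs: Dict[int, Attribute] = {}
    in_table = False
    for line in smartctl_output.splitlines():
        if line.startswith("ID# ATTRIBUTE_NAME"):
            in_table = True
            continue
        if not in_table:
            continue
        if not line.strip():
            break  # a blank line ends the table
        fields = line.split(None, 9)  # keep RAW_VALUE (with any extra text) as one field
        if len(fields) < 10:
            continue
        raw_match = re.match(r"\d+", fields[9])
        attrs[int(fields[0])] = Attribute(
            name=fields[1],
            value=int(fields[3]),
            worst=int(fields[4]),
            thresh=int(fields[5]),
            when_failed=fields[8],
            raw=int(raw_match.group()) if raw_match else 0,
        )
    return attrs


WATCH = {5, 197, 198}

sample = """ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   182   182   140    Pre-fail  Always       -       345
197 Current_Pending_Sector  0x0032   001   001   000    Old_age   Always       -       65289
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       82
"""
for attr_id, attr in parse_attributes(sample).items():
    if attr_id in WATCH and attr.raw > 0:
        print(f"{attr_id} {attr.name}: raw {attr.raw}")
```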
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
not sure if that had an impact on the life of these drives, I'm sure it did.
I am sure it did have a negative impact. As @m0nkey_ said, you need to plan on replacing these drives soon. I would replace the one with the most errors first.
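The "replace the one with the most errors first" rule can be written down directly. A minimal sketch using the counts from the FreeNAS alerts at the top of the thread; summing pending and offline-uncorrectable counts is a crude, assumed score (the two counters may include some of the same sectors), so treat it as an ordering key rather than an exact sector count. `replacement_order` is a hypothetical helper.

```python
from typing import Dict, List, Tuple

# Counts copied from the FreeNAS alerts at the top of the thread.
alerts: Dict[str, Dict[str, int]] = {
    "ada2": {"pending": 65282, "offline_uncorrectable": 82},
    "ada1": {"pending": 1, "offline_uncorrectable": 1},
}


def replacement_order(counts: Dict[str, Dict[str, int]]) -> List[Tuple[str, int]]:
    """Sort devices by their combined bad-sector score, worst first."""
    totals = [(dev, sum(c.values())) for dev, c in counts.items()]
    return sorted(totals, key=lambda item: item[1], reverse=True)


for dev, score in replacement_order(alerts):
    print(f"{dev}: score {score}")
```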
 

EscapeVelocit3y

Dabbler
Joined
Oct 11, 2014
Messages
28

I will definitely need all the support I can get to recover my data and get my NAS back to fully operational status. I have far too many family photos on this NAS to lose (although most are backed up). I recently ran another scrub, which took twice as long to complete as my regular monthly scrubs. Below is the output of zpool status.

Code:
  pool: Volume_1
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://illumos.org/msg/ZFS-8000-9P
  scan: scrub repaired 8.09M in 33h6m with 0 errors on Fri Jun 15 07:46:59 2018
config:

        NAME                                            STATE     READ WRITE CKSUM
        Volume_1                                        ONLINE       0     0     0
          raidz1-0                                      ONLINE       0     0     0
            gptid/afe31a8c-5165-11e4-ba9b-50e549523793  ONLINE       0     0    11
            gptid/6d63fa2a-04dd-11e5-99a1-50e549523793  ONLINE       0     0     0
            gptid/99fa8c04-b759-11e6-bf4a-50e549523793  ONLINE       0     0     0
            gptid/b1eedf5a-5165-11e4-ba9b-50e549523793  ONLINE       0     0     0
            gptid/b2481a0f-5165-11e4-ba9b-50e549523793  ONLINE       0     0     0

errors: No known data errors

  pool: freenas-boot
 state: ONLINE
  scan: scrub repaired 0 in 0h3m with 0 errors on Mon May 14 03:48:18 2018
config:

        NAME          STATE     READ WRITE CKSUM
        freenas-boot  ONLINE       0     0     0
          da0p2       ONLINE       0     0     0
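The only nonzero counter above is CKSUM 11 on the first gptid member, and the pool lists its members by gptid rather than by adaN name. A minimal sketch that scans the config section of `zpool status` output for members with nonzero READ/WRITE/CKSUM counters (`devices_with_errors` is hypothetical; rows whose counters are not plain integers are skipped rather than interpreted):

```python
from typing import List, Tuple


def devices_with_errors(zpool_status: str) -> List[Tuple[str, int, int, int]]:
    """Return (name, read, write, cksum) for config rows with any nonzero counter."""
    hits: List[Tuple[str, int, int, int]] = []
    in_config = False
    for line in zpool_status.splitlines():
        fields = line.split()
        if fields[:2] == ["NAME", "STATE"]:
            in_config = True  # header row of a pool's config table
            continue
        if not fields or fields[0] == "errors:":
            in_config = False  # blank line or errors: line ends the table
            continue
        if in_config and len(fields) >= 5 and all(f.isdigit() for f in fields[2:5]):
            read, write, cksum = (int(f) for f in fields[2:5])
            if read or write or cksum:
                hits.append((fields[0], read, write, cksum))
    return hits


status_output = """config:

        NAME                                            STATE     READ WRITE CKSUM
        Volume_1                                        ONLINE       0     0     0
          raidz1-0                                      ONLINE       0     0     0
            gptid/afe31a8c-5165-11e4-ba9b-50e549523793  ONLINE       0     0    11
            gptid/6d63fa2a-04dd-11e5-99a1-50e549523793  ONLINE       0     0     0

errors: No known data errors
"""
for name, read, write, cksum in devices_with_errors(status_output):
    print(f"{name}: read={read} write={write} cksum={cksum}")
```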
									 


Code:
[root@freenas] ~# smartctl -a /dev/ada2
smartctl 6.5 2016-05-07 r4318 [FreeBSD 10.3-STABLE amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar Green (AF)
Device Model:     WDC WD20EARS-00MVWB0
Serial Number:    WD-WCAZA1262518
LU WWN Device Id: 5 0014ee 25a5b9692
Firmware Version: 51.0AB51
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 2.6, 3.0 Gb/s
Local Time is:    Fri Jun 15 18:48:12 2018 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x85) Offline data collection activity
										was aborted by an interrupting command from host.
										Auto Offline Data Collection: Enabled.
Self-test execution status:	  ( 113) The previous self-test completed having
										the read element of the test failed.
Total time to complete Offline
data collection:				(38880) seconds.
Offline data collection
capabilities:					(0x7b) SMART execute Offline immediate.
										Auto Offline data collection on/off support.
										Suspend Offline collection upon new
										command.
										Offline surface scan supported.
										Self-test supported.
										Conveyance Self-test supported.
										Selective Self-test supported.
SMART capabilities:			(0x0003) Saves SMART data before entering
										power-saving mode.
										Supports SMART auto save timer.
Error logging capability:		(0x01) Error logging supported.
										General Purpose Logging supported.
Short self-test routine
recommended polling time:		(   2) minutes.
Extended self-test routine
recommended polling time:		( 375) minutes.
Conveyance self-test routine
recommended polling time:		(   5) minutes.
SCT capabilities:			  (0x3035) SCT Status supported.
										SCT Feature Control supported.
										SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   196   196   051    Pre-fail  Always       -       7342
  3 Spin_Up_Time            0x0027   181   168   021    Pre-fail  Always       -       5916
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       306
  5 Reallocated_Sector_Ct   0x0033   182   182   140    Pre-fail  Always       -       345
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   054   054   000    Old_age   Always       -       33814
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       267
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       187
193 Load_Cycle_Count        0x0032   001   001   000    Old_age   Always       -       871102
194 Temperature_Celsius     0x0022   099   084   000    Old_age   Always       -       51
196 Reallocated_Event_Count 0x0032   125   125   000    Old_age   Always       -       75
197 Current_Pending_Sector  0x0032   001   001   000    Old_age   Always       -       65289
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       82
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   195   195   000    Old_age   Offline      -       1362

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: read failure       10%     33803         1858360820
# 2  Short offline       Completed: read failure       90%     33719         1696655716
# 3  Short offline       Completed: read failure       90%     33695         1696655716
# 4  Short offline       Completed: read failure       90%     33671         1696655716
# 5  Short offline       Completed: read failure       90%     33647         1696655716
# 6  Short offline       Completed: read failure       90%     33623         1696655716
# 7  Short offline       Completed: read failure       90%     33599         1696655716
# 8  Short offline       Completed: read failure       90%     33575         1696655716
# 9  Short offline       Completed: read failure       90%     33551         1696655716
#10  Short offline       Completed: read failure       90%     33527         1696655716
#11  Short offline       Completed: read failure       90%     33503         1696655716
#12  Short offline       Completed: read failure       90%     33479         1696655716
#13  Short offline       Completed: read failure       90%     33455         1696655716
#14  Short offline       Completed: read failure       90%     33431         1696655716
#15  Short offline       Completed: read failure       90%     33407         1696655716
#16  Short offline       Completed: read failure       90%     33383         1696655716
#17  Short offline       Completed: read failure       90%     33359         1696655716
#18  Short offline       Completed without error       00%     33335         -
#19  Short offline       Completed: read failure       90%     33311         1696655716
#20  Short offline       Completed: read failure       90%     33287         1696655716
#21  Short offline       Completed: read failure       90%     33263         1696655716

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
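The ada2 self-test log is the most telling part of this output: 20 of the 21 logged short tests ended in "Completed: read failure", and 19 of those stopped at the same LBA, 1696655716, across tests run 24 hours apart. That suggests the same sector has stayed unreadable rather than being remapped. A minimal sketch for summarizing such a log (`parse_selftest_log` is hypothetical; it assumes columns separated by two or more spaces, as smartctl prints them):

```python
import re
from collections import Counter
from typing import List, Tuple


def parse_selftest_log(smartctl_output: str) -> List[Tuple[str, str]]:
    """Return (status, lba_of_first_error) for each row of the SMART self-test log."""
    rows: List[Tuple[str, str]] = []
    for line in smartctl_output.splitlines():
        if not re.match(r"#\s*\d+\s", line):
            continue  # log rows start with "# 1", "#10", ...
        fields = re.split(r"\s{2,}", line.strip())
        if len(fields) >= 6:
            rows.append((fields[2], fields[5]))
    return rows


log = """# 1  Short offline       Completed: read failure       10%     33803         1858360820
# 2  Short offline       Completed: read failure       90%     33719         1696655716
# 3  Short offline       Completed: read failure       90%     33695         1696655716
#18  Short offline       Completed without error       00%     33335         -
"""
rows = parse_selftest_log(log)
failed_lbas = [lba for status, lba in rows if "failure" in status]
print(f"{len(failed_lbas)} of {len(rows)} tests failed")
print(Counter(failed_lbas).most_common(1))
```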



Code:
[root@freenas] ~# smartctl -a /dev/ada1
smartctl 6.5 2016-05-07 r4318 [FreeBSD 10.3-STABLE amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda XT
Device Model:     ST32000641AS
Serial Number:    9WM7M1F5
LU WWN Device Id: 5 000c50 03ed1881a
Firmware Version: CC13
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    7200 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 2.6, 6.0 Gb/s
Local Time is:    Fri Jun 15 18:50:04 2018 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
										was completed without error.
										Auto Offline Data Collection: Enabled.
Self-test execution status:	  (   0) The previous self-test routine completed
										without error or no self-test has ever
										been run.
Total time to complete Offline
data collection:				(  617) seconds.
Offline data collection
capabilities:					(0x7b) SMART execute Offline immediate.
										Auto Offline data collection on/off support.
										Suspend Offline collection upon new
										command.
										Offline surface scan supported.
										Self-test supported.
										Conveyance Self-test supported.
										Selective Self-test supported.
SMART capabilities:			(0x0003) Saves SMART data before entering
										power-saving mode.
										Supports SMART auto save timer.
Error logging capability:		(0x01) Error logging supported.
										General Purpose Logging supported.
Short self-test routine
recommended polling time:		(   1) minutes.
Extended self-test routine
recommended polling time:		( 338) minutes.
Conveyance self-test routine
recommended polling time:		(   2) minutes.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   117   099   006    Pre-fail  Always       -       145469591
  3 Spin_Up_Time            0x0003   100   100   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       297
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   084   060   030    Pre-fail  Always       -       305124724
  9 Power_On_Hours          0x0032   064   064   000    Old_age   Always       -       31805
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       274
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   099   000    Old_age   Always       -       8
189 High_Fly_Writes         0x003a   090   090   000    Old_age   Always       -       10
190 Airflow_Temperature_Cel 0x0022   047   031   045    Old_age   Always   In_the_past 53 (Min/Max 47/58 #6630)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       3
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       150
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       299
194 Temperature_Celsius     0x0022   053   069   000    Old_age   Always       -       53 (0 17 0 0 0)
195 Hardware_ECC_Recovered  0x001a   039   019   000    Old_age   Always       -       145469591
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       1
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       32477 (231 115 0)
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       204055816
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       38987883

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     31785         -
# 2  Short offline       Completed without error       00%     31710         -
# 3  Short offline       Completed without error       00%     31686         -
# 4  Short offline       Completed without error       00%     31662         -
# 5  Short offline       Completed without error       00%     31638         -
# 6  Short offline       Completed without error       00%     31614         -
# 7  Short offline       Completed without error       00%     31590         -
# 8  Short offline       Completed without error       00%     31566         -
# 9  Short offline       Completed without error       00%     31542         -
#10  Short offline       Completed without error       00%     31518         -
#11  Short offline       Completed without error       00%     31494         -
#12  Short offline       Completed without error       00%     31470         -
#13  Short offline       Completed without error       00%     31446         -
#14  Short offline       Completed without error       00%     31422         -
#15  Short offline       Completed without error       00%     31398         -
#16  Short offline       Completed without error       00%     31374         -
#17  Short offline       Completed without error       00%     31350         -
#18  Short offline       Completed without error       00%     31326         -
#19  Short offline       Completed without error       00%     31302         -
#20  Short offline       Completed without error       00%     31278         -
#21  Short offline       Completed without error       00%     31254         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
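On ada1 the heat shows up directly: attribute 190 (Airflow_Temperature_Cel) has WHEN_FAILED set to In_the_past, with a normalized VALUE of 047 against a THRESH of 045 while the drive reads 53°C. That flag is likely what the "See vendor-specific Attribute list for marginal Attributes." line refers to. A minimal sketch of how the WHEN_FAILED column is derived, assuming the usual smartmontools rule that an attribute counts as failed when its normalized value is at or below a nonzero threshold (`when_failed` and `Normalized` are hypothetical names):

```python
from typing import NamedTuple


class Normalized(NamedTuple):
    value: int
    worst: int
    thresh: int


def when_failed(attr: Normalized) -> str:
    """Mirror the WHEN_FAILED column from normalized VALUE, WORST and THRESH.

    Assumed rule: a threshold of 0 never triggers; otherwise a current value
    at or below THRESH is failing now, and a worst value at or below THRESH
    means the attribute failed at some point in the past.
    """
    if attr.thresh == 0:
        return "-"
    if attr.value <= attr.thresh:
        return "FAILING_NOW"
    if attr.worst <= attr.thresh:
        return "In_the_past"
    return "-"


# Attribute 190 on ada1: VALUE 047, WORST 031, THRESH 045.
print(when_failed(Normalized(value=47, worst=31, thresh=45)))
# Attribute 5 on ada2: VALUE 182, WORST 182, THRESH 140.
print(when_failed(Normalized(value=182, worst=182, thresh=140)))
```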

 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080