SOLVED: Drive is failing, curious what the error means


theman1

Dabbler
Joined
Nov 8, 2014
Messages
30
At least one of the eight 3TB drives in my FreeNAS setup is failing and I am going to have to replace it. What exactly does the ATA error count mean? Is it better to buy a new drive, or can I use the 3TB external I had lying around?

My setup: 8x3TB in RAIDZ2, SUPERMICRO X10SL7-F, 16 GB ECC RAM

The errors I got were:
Code:
CRITICAL: Nov. 30, 2016, 3:47 p.m. - Device: /dev/da2 [SAT], 88 Currently unreadable (pending) sectors
CRITICAL: Nov. 30, 2016, 3:47 p.m. - Device: /dev/da2 [SAT], 88 Offline uncorrectable sectors
CRITICAL: Dec. 3, 2016, 1 a.m. - Device: /dev/da2 [SAT], ATA error count increased from 0 to 4
CRITICAL: Dec. 2, 2016, 9:18 a.m. - Device: /dev/da6 [SAT], 8 Currently unreadable (pending) sectors
CRITICAL: Dec. 2, 2016, 9:18 a.m. - Device: /dev/da6 [SAT], 8 Offline uncorrectable sectors



zpool status is-

Code:
  pool: ZFS																														
state: ONLINE																													
status: Some supported features are not enabled on the pool. The pool can														 
		still be used, but some features are unavailable.																		 
action: Enable all features using 'zpool upgrade'. Once this is done,															 
		the pool may no longer be accessible by software that does not support													
		the features. See zpool-features(7) for details.																			
  scan: scrub repaired 88K in 35h32m with 0 errors on Sun Dec  4 02:26:35 2016													
config:																															
																																	
		NAME											STATE	 READ WRITE CKSUM												
		ZFS											 ONLINE	   0	 0	 0												
		  raidz2-0									  ONLINE	   0	 0	 0												
			gptid/ce3c98af-6f87-11e4-a0fc-0cc47a30009c  ONLINE	   0	 0	 0												
			gptid/ceec5548-6f87-11e4-a0fc-0cc47a30009c  ONLINE	   0	 0	 0												
			gptid/cfb7fb50-6f87-11e4-a0fc-0cc47a30009c  ONLINE	   0	 0	 0												
			gptid/d086d5af-6f87-11e4-a0fc-0cc47a30009c  ONLINE	   0	 0	 0												
			gptid/d140880e-6f87-11e4-a0fc-0cc47a30009c  ONLINE	   0	 0	 0												
			gptid/d20be4e9-6f87-11e4-a0fc-0cc47a30009c  ONLINE	   0	 0	 0												
			gptid/d2b134ce-6f87-11e4-a0fc-0cc47a30009c  ONLINE	   0	 0	 0												
			gptid/d35490b0-6f87-11e4-a0fc-0cc47a30009c  ONLINE	   0	 0	 0												
																																	
errors: No known data errors																										
																																	
  pool: freenas-boot																												
state: ONLINE																													
  scan: scrub repaired 0 in 0h3m with 0 errors on Tue Nov 29 03:48:52 2016														
config:																															
																																	
		NAME										  STATE	 READ WRITE CKSUM													
		freenas-boot								  ONLINE	   0	 0	 0													
		  gptid/a3052444-9a31-11e6-86ac-bc5ff48e8e55  ONLINE	   0	 0	 0													
																																	
errors: No known data errors
 

Dice

Wizard
Joined
Dec 11, 2015
Messages
1,410
The indications show there are problems on da2 and da6.

Post the output of
Code:
smartctl -a /dev/da2
and
Code:
smartctl -a /dev/da6

This will help in further assessing the severity of the alerts.

da2 appears to have a LOT of problematic sectors (88 pending), which is really not good.
da6 has far fewer (8), but it needs to be watched.

Going by these two indicators alone, if these were my drives, I'd run off to the store to get 2 new drives and burn them in (according to the guide) as soon as possible.
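
For a quick look at the sector-health counters behind these alerts, the relevant rows can be filtered out of the attribute table. This is only a sketch: key_attrs is a hypothetical helper name, and picking IDs 5, 187, 197 and 198 as the ones worth watching is an assumption, not an official list.

```shell
#!/bin/sh
# key_attrs: read `smartctl -A` output on stdin and print name=raw_value
# for a few sector-health attributes.
# Assumption: the raw value is the last whitespace-separated field, which
# holds for these four attributes in the outputs posted in this thread.
key_attrs() {
    # NF >= 8 skips short rows from other tables that also start with "5".
    awk 'NF >= 8 && ($1 == 5 || $1 == 187 || $1 == 197 || $1 == 198) {
        print $2 "=" $NF
    }'
}

# Example on a live system (device name is an assumption):
#   smartctl -A /dev/da2 | key_attrs
```

Non-zero values for Current_Pending_Sector or Offline_Uncorrectable are what triggered the CRITICAL alerts in the first post.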
 

theman1

Dabbler
Joined
Nov 8, 2014
Messages
30
da2:
Code:
When the command that caused the error occurred, the device was active or idle.												
																																	
  After command completion occurred, registers were:																				
  ER ST SC SN CL CH DH																											
  -- -- -- -- -- -- --																											
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455																
																																	
  Commands leading to the command that caused the error were:																	
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name																
  -- -- -- -- -- -- -- --  ----------------  --------------------																
  25 00 b0 ff ff ff 4f 00   2d+08:58:05.816  READ DMA EXT																		
  25 00 80 ff ff ff 4f 00   2d+08:58:05.806  READ DMA EXT																		
  25 00 b8 ff ff ff 4f 00   2d+08:58:05.792  READ DMA EXT																		
  25 00 08 ff ff ff 4f 00   2d+08:58:05.784  READ DMA EXT																		
  25 00 58 ff ff ff 4f 00   2d+08:58:05.784  READ DMA EXT																		
																																	
Error 1 occurred at disk power-on lifetime: 17450 hours (727 days + 2 hours)														
  When the command that caused the error occurred, the device was active or idle.												
																																	
  After command completion occurred, registers were:																				
  ER ST SC SN CL CH DH																											
  -- -- -- -- -- -- --																											
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455																
																																	
  Commands leading to the command that caused the error were:																	
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name																
  -- -- -- -- -- -- -- --  ----------------  --------------------																
  25 00 b0 ff ff ff 4f 00   2d+08:58:05.816  READ DMA EXT																		
  25 00 80 ff ff ff 4f 00   2d+08:58:05.806  READ DMA EXT																		
  25 00 b8 ff ff ff 4f 00   2d+08:58:05.792  READ DMA EXT																		
  25 00 08 ff ff ff 4f 00   2d+08:58:05.784  READ DMA EXT																		
  25 00 58 ff ff ff 4f 00   2d+08:58:05.784  READ DMA EXT																		
																																	
SMART Self-test log structure revision number 1																					
Num  Test_Description	Status				  Remaining  LifeTime(hours)  LBA_of_first_error									
# 1  Short offline	   Completed without error	   00%	 17440		 -													
																																	
SMART Selective self-test log data structure revision number 1																	
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS																						
	1		0		0  Not_testing																								
	2		0		0  Not_testing																								
	3		0		0  Not_testing																								
	4		0		0  Not_testing																								
	5		0		0  Not_testing																								
Selective self-test flags (0x0):																									
  After scanning selected spans, do NOT read-scan remainder of disk.																
If Selective self-test is pending on power-up, resume after 0 minute delay.	


da6:
Code:
Conveyance self-test routine																										
recommended polling time:		(   2) minutes.																					
SCT capabilities:			  (0x1085) SCT Status supported.																	
																																	
SMART Attributes Data Structure revision number: 10																				
Vendor Specific SMART Attributes with Thresholds:																				
ID# ATTRIBUTE_NAME		  FLAG	 VALUE WORST THRESH TYPE	  UPDATED  WHEN_FAILED RAW_VALUE									
  1 Raw_Read_Error_Rate	 0x000f   117   099   006	Pre-fail  Always	   -	   161770584									
  3 Spin_Up_Time			0x0003   093   093   000	Pre-fail  Always	   -	   0											
  4 Start_Stop_Count		0x0032   100   100   020	Old_age   Always	   -	   63										
  5 Reallocated_Sector_Ct   0x0033   100   100   010	Pre-fail  Always	   -	   0											
  7 Seek_Error_Rate		 0x000f   075   060   030	Pre-fail  Always	   -	   34661397996								
  9 Power_On_Hours		  0x0032   080   080   000	Old_age   Always	   -	   17581										
10 Spin_Retry_Count		0x0013   100   100   097	Pre-fail  Always	   -	   0											
12 Power_Cycle_Count	   0x0032   100   100   020	Old_age   Always	   -	   63										
183 Runtime_Bad_Block	   0x0032   100   100   000	Old_age   Always	   -	   0											
184 End-to-End_Error		0x0032   100   100   099	Old_age   Always	   -	   0											
187 Reported_Uncorrect	  0x0032   100   100   000	Old_age   Always	   -	   0											
188 Command_Timeout		 0x0032   100   100   000	Old_age   Always	   -	   0 0 0										
189 High_Fly_Writes		 0x003a   098   098   000	Old_age   Always	   -	   2											
190 Airflow_Temperature_Cel 0x0022   062   052   045	Old_age   Always	   -	   38 (Min/Max 35/44)						
191 G-Sense_Error_Rate	  0x0032   100   100   000	Old_age   Always	   -	   0											
192 Power-Off_Retract_Count 0x0032   100   100   000	Old_age   Always	   -	   41										
193 Load_Cycle_Count		0x0032   100   100   000	Old_age   Always	   -	   1105										
194 Temperature_Celsius	 0x0022   038   048   000	Old_age   Always	   -	   38 (0 16 0 0 0)							
197 Current_Pending_Sector  0x0012   100   100   000	Old_age   Always	   -	   0											
198 Offline_Uncorrectable   0x0010   100   100   000	Old_age   Offline	  -	   0											
199 UDMA_CRC_Error_Count	0x003e   200   200   000	Old_age   Always	   -	   0											
240 Head_Flying_Hours	   0x0000   100   253   000	Old_age   Offline	  -	   17551h+29m+07.730s						
241 Total_LBAs_Written	  0x0000   100   253   000	Old_age   Offline	  -	   25134508388								
242 Total_LBAs_Read		 0x0000   100   253   000	Old_age   Offline	  -	   1961962797260								
																																	
SMART Error Log Version: 1																										
No Errors Logged																													
																																	
SMART Self-test log structure revision number 1																					
No self-tests have been logged.  [To run self-tests, use: smartctl -t]															
																																	
SMART Selective self-test log data structure revision number 1																	
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS																						
	1		0		0  Not_testing																								
	2		0		0  Not_testing																								
	3		0		0  Not_testing																								
	4		0		0  Not_testing																								
	5		0		0  Not_testing																								
Selective self-test flags (0x0):																									
  After scanning selected spans, do NOT read-scan remainder of disk.																
If Selective self-test is pending on power-up, resume after 0 minute delay.


Thanks! Ordering at least 1 more disk. I may rip out the 3TB from the external enclosure that I have lying around.
 

Dice

Wizard
Joined
Dec 11, 2015
Messages
1,410
What the... I'm a bit confused.
As it stands, da6 does not show any sign of problems on ID# 197 and #198 as indicated in the critical report. However - #1 and #7 is through the roof. My drives show 0.
Did you reboot the machine in between posting the first and last?
Because when you do, the daX numbering may change.
If that happened, you may have yet another drive that is looking real ugly.
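
Since daX numbers can move between boots, it may be safer to identify a drive by its serial number before pulling it. A rough sketch, assuming the "Serial Number:" line format that smartctl -i prints (as in the outputs posted in this thread); serial_of is a hypothetical helper name:

```shell
#!/bin/sh
# serial_of: read `smartctl -i` output on stdin and print the serial number.
serial_of() {
    awk -F: '/^Serial Number:/ {
        gsub(/^[ \t]+|[ \t]+$/, "", $2)   # trim surrounding blanks and tabs
        print $2
    }'
}

# Example on a live system (da0..da7 is an assumption for an 8-drive box):
#   for d in da0 da1 da2 da3 da4 da5 da6 da7; do
#       printf '%s ' "$d"; smartctl -i "/dev/$d" | serial_of
#   done
```

On FreeBSD, glabel status should also show which daXpN partition each gptid label from zpool status currently points to.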

The output also indicates that you have not properly configured S.M.A.R.T. tests:
SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]

da2 - I do not recognize this output.

I see four things that need to be addressed:
1. Ensure that a schedule for S.M.A.R.T. tests and scrubs is configured. This is easy to follow and can be copied outright: https://forums.freenas.org/index.php?threads/scrub-and-smart-testing-schedules.20108/
2. Grab a new drive and the one from your external enclosure. Plug them into your machine and run these tests. It may take up to 2-3 days at worst. Ensure proper cooling of the drives (i.e. don't leave them rattling around on a table in the meantime; they will get hot): https://forums.freenas.org/index.php?threads/how-to-hard-drive-burn-in-testing.21451/
3. Perform S.M.A.R.T. tests on all your drives. Start with the short test for each drive. By the time you've typed and sent all those commands, the first drive is likely ready to start its first long test. That will take a lot longer. Allow it to complete.
4. Review the state of the rest of your drives and your pool to assess whether further drives are in worse shape than expected.
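
Step 3 could be scripted roughly as follows. This sketch only prints the commands so they can be reviewed first; smart_test_cmds is a hypothetical helper and the device names are assumptions. smartctl -t short and smartctl -t long are the usual ways to start the short and extended self-tests.

```shell
#!/bin/sh
# smart_test_cmds KIND DEV...: print one `smartctl -t KIND` command per device.
# Nothing is executed here; pipe the output to sh once it looks right.
smart_test_cmds() {
    kind=$1
    shift
    for dev in "$@"; do
        echo "smartctl -t $kind /dev/$dev"
    done
}

# Review first, then run (device names are assumptions):
#   smart_test_cmds short da0 da1 da2 da3 da4 da5 da6 da7
#   smart_test_cmds short da0 da1 da2 da3 da4 da5 da6 da7 | sh
```

Once a test has finished, its result should appear in the self-test log, which smartctl -l selftest /dev/daX prints.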

Further reading worth bookmarking:
https://forums.freenas.org/index.php?resources/hard-drive-troubleshooting-guide.17/
https://en.wikipedia.org/wiki/S.M.A.R.T.
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
However - #1 and #7 is through the roof. My drives show 0.

If it's a Seagate drive it's normal, the value isn't directly human readable.
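
A commonly cited but unofficial reading is that Seagate packs two counters into the 48-bit raw value of attributes #1 and #7: the upper 16 bits count errors and the lower 32 bits count operations. Treat that layout as an assumption; nothing in this thread confirms it. Under that assumption, a minimal sketch (decode_seagate_raw is a hypothetical helper name):

```shell
#!/bin/sh
# decode_seagate_raw RAW: split a 48-bit raw value into two counters.
# ASSUMPTION: upper 16 bits = error count, lower 32 bits = operation count.
decode_seagate_raw() {
    raw=$1
    errors=$(( (raw >> 32) & 65535 ))
    ops=$(( raw & 4294967295 ))
    echo "errors=$errors operations=$ops"
}

# Seek_Error_Rate raw value posted for da6 in this thread
decode_seagate_raw 34661397996
```

For the da6 value this prints errors=8 operations=301659628, which would mean 8 seek errors over roughly 300 million seeks rather than billions of errors.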

da2 - I do not recognize this output.

Looks like the end of a smartctl -x to me.
 

theman1

Dabbler
Joined
Nov 8, 2014
Messages
30
I think the first line was missing when I copy/pasted it..

Code:
Error 2 occurred at disk power-on lifetime: 17450 hours (727 days + 2 hours)														
  When the command that caused the error occurred, the device was active or idle.												   
																																	
  After command completion occurred, registers were:																				
  ER ST SC SN CL CH DH																											 
  -- -- -- -- -- -- --																											 
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455																 
																																	
  Commands leading to the command that caused the error were:																	   
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name																   
  -- -- -- -- -- -- -- --  ----------------  --------------------																   
  25 00 b0 ff ff ff 4f 00   2d+08:58:05.816  READ DMA EXT																		   
  25 00 80 ff ff ff 4f 00   2d+08:58:05.806  READ DMA EXT																		   
  25 00 b8 ff ff ff 4f 00   2d+08:58:05.792  READ DMA EXT																		   
  25 00 08 ff ff ff 4f 00   2d+08:58:05.784  READ DMA EXT																		   
  25 00 58 ff ff ff 4f 00   2d+08:58:05.784  READ DMA EXT																		   
																																	
Error 1 occurred at disk power-on lifetime: 17450 hours (727 days + 2 hours)														
  When the command that caused the error occurred, the device was active or idle.												   
																																	
  After command completion occurred, registers were:																				
  ER ST SC SN CL CH DH																											 
  -- -- -- -- -- -- --																											 
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455																 
																																	
  Commands leading to the command that caused the error were:																	   
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name																   
  -- -- -- -- -- -- -- --  ----------------  --------------------																   
  25 00 b0 ff ff ff 4f 00   2d+08:58:05.816  READ DMA EXT																		   
  25 00 80 ff ff ff 4f 00   2d+08:58:05.806  READ DMA EXT																		   
  25 00 b8 ff ff ff 4f 00   2d+08:58:05.792  READ DMA EXT																		   
  25 00 08 ff ff ff 4f 00   2d+08:58:05.784  READ DMA EXT																		   
  25 00 58 ff ff ff 4f 00   2d+08:58:05.784  READ DMA EXT																		   
																																	
SMART Self-test log structure revision number 1																					 
Num  Test_Description	Status				  Remaining  LifeTime(hours)  LBA_of_first_error									 
# 1  Short offline	   Completed without error	   00%	 17440		 -													 
																																	
SMART Selective self-test log data structure revision number 1																	 
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS																						
	1		0		0  Not_testing																								
	2		0		0  Not_testing																								
	3		0		0  Not_testing																								
	4		0		0  Not_testing																								
	5		0		0  Not_testing																								
Selective self-test flags (0x0):																									
  After scanning selected spans, do NOT read-scan remainder of disk.																
If Selective self-test is pending on power-up, resume after 0 minute delay.											   


Will work on 1-4 from Dice, thanks!
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
No, the full output. All of it. From beginning to end. The beginning looks something like this:
Code:
[root@freenas2] ~# smartctl -a /dev/da0
smartctl 6.5 2016-05-07 r4318 [FreeBSD 10.3-STABLE amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:  Western Digital Green
Device Model:  WDC WD20EARX-008FB0


Edit: If you're using the shell from the web GUI, don't. Enable SSH and use any SSH client written within the last decade or so that has a scrollback buffer so you can review the complete output of this and other commands.
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
Alternatively, send the output to a text file and then copy/paste the contents:
smartctl -x /dev/da0 > da0.txt
 

Sakuru

Guru
Joined
Nov 20, 2015
Messages
527
Better yet, you can pipe the output to ix.io
Code:
smartctl -x /dev/da0 | curl -F 'f:1=<-' ix.io

It's much easier to copy and paste a link than a wall of text :)
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
No, please don't post a link. Please post the information here, so it stays here no matter what happens to some other site. Same goes for pictures. Besides, links to pastebin don't work from my office.
 

Sakuru

Guru
Joined
Nov 20, 2015
Messages
527
No, please don't post a link. Please post the information here, so it stays here no matter what happens to some other site. Same goes for pictures.

Ooh, good point. Sorry, I'm used to it being really handy in IRC.
 

theman1

Dabbler
Joined
Nov 8, 2014
Messages
30
Code:
smartctl 6.5 2016-05-07 r4318 [FreeBSD 10.3-STABLE amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:	 Seagate Barracuda 7200.14 (AF)
Device Model:	 ST3000DM001-1E6166
Serial Number:	Z1F57ZA5
LU WWN Device Id: 5 000c50 06708f5db
Firmware Version: SC48
User Capacity:	3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:	 512 bytes logical, 4096 bytes physical
Rotation Rate:	7200 rpm
Form Factor:	  3.5 inches
Device is:		In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:	Thu Dec 15 16:20:57 2016 PST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM feature is:   Disabled
Rd look-ahead is: Enabled
Write cache is:   Enabled
ATA Security is:  Disabled, NOT FROZEN [SEC1]
Wt Cache Reorder: Unavailable

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82)	Offline data collection activity
					was completed without error.
					Auto Offline Data Collection: Enabled.
Self-test execution status:	  ( 113)	The previous self-test completed having
					the read element of the test failed.
Total time to complete Offline
data collection:		 (  600) seconds.
Offline data collection
capabilities:			  (0x7b) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:			(0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:		(0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine
recommended polling time:	  (   1) minutes.
Extended self-test routine
recommended polling time:	  ( 367) minutes.
Conveyance self-test routine
recommended polling time:	  (   2) minutes.
SCT capabilities:			(0x3081)	SCT Status supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME		  FLAGS	VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate	 POSR--   117   099   006	-	129939256
  3 Spin_Up_Time			PO----   092   091   000	-	0
  4 Start_Stop_Count		-O--CK   100   100   020	-	58
  5 Reallocated_Sector_Ct   PO--CK   100   100   010	-	16
  7 Seek_Error_Rate		 POSR--   067   057   030	-	473025738698
  9 Power_On_Hours		  -O--CK   080   080   000	-	17754
10 Spin_Retry_Count		PO--C-   100   100   097	-	0
12 Power_Cycle_Count	   -O--CK   100   100   020	-	58
183 Runtime_Bad_Block	   -O--CK   100   100   000	-	0
184 End-to-End_Error		-O--CK   100   100   099	-	0
187 Reported_Uncorrect	  -O--CK   098   098   000	-	2
188 Command_Timeout		 -O--CK   100   100   000	-	0 0 0
189 High_Fly_Writes		 -O-RCK   001   001   000	-	109
190 Airflow_Temperature_Cel -O---K   065   047   045	-	35 (Min/Max 34/46)
191 G-Sense_Error_Rate	  -O--CK   100   100   000	-	0
192 Power-Off_Retract_Count -O--CK   100   100   000	-	54
193 Load_Cycle_Count		-O--CK   100   100   000	-	483
194 Temperature_Celsius	 -O---K   035   053   000	-	35 (0 17 0 0 0)
197 Current_Pending_Sector  -O--C-   100   100   000	-	88
198 Offline_Uncorrectable   ----C-   100   100   000	-	88
199 UDMA_CRC_Error_Count	-OSRCK   200   200   000	-	0
240 Head_Flying_Hours	   ------   100   253   000	-	17737h+26m+24.100s
241 Total_LBAs_Written	  ------   100   253   000	-	29093867477
242 Total_LBAs_Read		 ------   100   253   000	-	115854050144
							||||||_ K auto-keep
							|||||__ C event count
							||||___ R error rate
							|||____ S speed/performance
							||_____ O updated online
							|______ P prefailure warning

General Purpose Log Directory Version 1
SMART		   Log Directory Version 1 [multi-sector log support]
Address	Access  R/W   Size  Description
0x00	   GPL,SL  R/O	  1  Log Directory
0x01		   SL  R/O	  1  Summary SMART error log
0x02		   SL  R/O	  5  Comprehensive SMART error log
0x03	   GPL	 R/O	  5  Ext. Comprehensive SMART error log
0x06		   SL  R/O	  1  SMART self-test log
0x07	   GPL	 R/O	  1  Extended self-test log
0x09		   SL  R/W	  1  Selective self-test log
0x11	   GPL	 R/O	  1  SATA Phy Event Counters log
0x21	   GPL	 R/O	  1  Write stream error log
0x22	   GPL	 R/O	  1  Read stream error log
0x80-0x9f  GPL,SL  R/W	 16  Host vendor specific log
0xa1	   GPL,SL  VS	  20  Device vendor specific log
0xa2	   GPL	 VS	4496  Device vendor specific log
0xa8	   GPL,SL  VS	 129  Device vendor specific log
0xa9	   GPL,SL  VS	   1  Device vendor specific log
0xab	   GPL	 VS	   1  Device vendor specific log
0xb0	   GPL	 VS	5176  Device vendor specific log
0xbd	   GPL	 VS	 512  Device vendor specific log
0xbe-0xbf  GPL	 VS   65535  Device vendor specific log
0xc0	   GPL,SL  VS	   1  Device vendor specific log
0xc1	   GPL,SL  VS	  10  Device vendor specific log
0xe0	   GPL,SL  R/W	  1  SCT Command/Status
0xe1	   GPL,SL  R/W	  1  SCT Data Transfer

SMART Extended Comprehensive Error Log Version: 1 (5 sectors)
Device Error Count: 4
	CR	 = Command Register
	FEATR  = Features Register
	COUNT  = Count (was: Sector Count) Register
	LBA_48 = Upper bytes of LBA High/Mid/Low Registers ]  ATA-8
	LH	 = LBA High (was: Cylinder High) Register	]   LBA
	LM	 = LBA Mid (was: Cylinder Low) Register	  ] Register
	LL	 = LBA Low (was: Sector Number) Register	 ]
	DV	 = Device (was: Device/Head) Register
	DC	 = Device Control Register
	ER	 = Error register
	ST	 = Status register
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 4 [3] occurred at disk power-on lifetime: 17450 hours (727 days + 2 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 00 70 5b c0 a8 00 00  Error: UNC at LBA = 0x705bc0a8 = 1885061288

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  25 00 00 00 28 00 00 70 5b c0 a8 40 00  2d+08:58:33.616  READ DMA EXT
  25 00 00 00 58 00 00 70 5b c0 48 40 00  2d+08:58:30.792  READ DMA EXT
  25 00 00 00 58 00 00 70 5c 59 a8 40 00  2d+08:58:30.791  READ DMA EXT
  25 00 00 00 30 00 00 70 5c 59 78 40 00  2d+08:58:30.791  READ DMA EXT
  25 00 00 00 28 00 00 70 5c 59 20 40 00  2d+08:58:30.788  READ DMA EXT

Error 3 [2] occurred at disk power-on lifetime: 17450 hours (727 days + 2 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 00 70 5b c0 a8 00 00  Error: UNC at LBA = 0x705bc0a8 = 1885061288

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  25 00 00 00 28 00 00 70 5b c0 a8 40 00  2d+08:58:33.616  READ DMA EXT
  25 00 00 00 58 00 00 70 5b c0 48 40 00  2d+08:58:30.792  READ DMA EXT
  25 00 00 00 58 00 00 70 5c 59 a8 40 00  2d+08:58:30.791  READ DMA EXT
  25 00 00 00 30 00 00 70 5c 59 78 40 00  2d+08:58:30.791  READ DMA EXT
  25 00 00 00 28 00 00 70 5c 59 20 40 00  2d+08:58:30.788  READ DMA EXT

Error 2 [1] occurred at disk power-on lifetime: 17450 hours (727 days + 2 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 00 70 5b c0 e8 00 00  Error: UNC at LBA = 0x705bc0e8 = 1885061352

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  25 00 00 00 b0 00 00 70 5b c0 e8 40 00  2d+08:58:05.816  READ DMA EXT
  25 00 00 00 80 00 00 da 40 a7 90 40 00  2d+08:58:05.806  READ DMA EXT
  25 00 00 00 b8 00 00 da 40 72 80 40 00  2d+08:58:05.792  READ DMA EXT
  25 00 00 00 08 00 00 71 a1 d0 28 40 00  2d+08:58:05.784  READ DMA EXT
  25 00 00 00 58 00 00 71 62 13 48 40 00  2d+08:58:05.784  READ DMA EXT

Error 1 [0] occurred at disk power-on lifetime: 17450 hours (727 days + 2 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 00 70 5b c0 e8 00 00  Error: UNC at LBA = 0x705bc0e8 = 1885061352

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  25 00 00 00 b0 00 00 70 5b c0 e8 40 00  2d+08:58:05.816  READ DMA EXT
  25 00 00 00 80 00 00 da 40 a7 90 40 00  2d+08:58:05.806  READ DMA EXT
  25 00 00 00 b8 00 00 da 40 72 80 40 00  2d+08:58:05.792  READ DMA EXT
  25 00 00 00 08 00 00 71 a1 d0 28 40 00  2d+08:58:05.784  READ DMA EXT
  25 00 00 00 58 00 00 71 62 13 48 40 00  2d+08:58:05.784  READ DMA EXT

SMART Extended Self-test Log Version: 1 (1 sectors)
Num  Test_Description	Status				  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline	   Completed: read failure	   10%	 17743		 1885061288
# 2  Extended offline	Completed: read failure	   90%	 17623		 1885061288
# 3  Short offline	   Completed without error	   00%	 17440		 -

SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
	1		0		0  Not_testing
	2		0		0  Not_testing
	3		0		0  Not_testing
	4		0		0  Not_testing
	5		0		0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

SCT Status Version:				  3
SCT Version (vendor specific):	   522 (0x020a)
SCT Support Level:				   1
Device State:						Active (0)
Current Temperature:					35 Celsius
Power Cycle Min/Max Temperature:	 34/46 Celsius
Lifetime	Min/Max Temperature:	 16/53 Celsius
Under/Over Temperature Limit Count:   0/0

SCT Data Table command not supported

SCT Error Recovery Control command not supported

Device Statistics (GP/SMART Log 0x04) not supported

SATA Phy Event Counters (GP Log 0x11)
ID	  Size	 Value  Description
0x000a  2			4  Device-to-host register FISes sent due to a COMRESET
0x0001  2			0  Command failed due to ICRC error
0x0003  2			0  R_ERR response for device-to-host data FIS
0x0004  2			0  R_ERR response for host-to-device data FIS
0x0006  2			0  R_ERR response for device-to-host non-data FIS
0x0007  2			0  R_ERR response for host-to-device non-data FIS




(if you want to view it with the formatting intact- http://ix.io/1MCx)

Didn't know I could send it to a txt or ix.io. Pretty cool, thanks. I received my new drive today actually and was going to start running the recommended tests on it.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
It seems to me that the drive has seen 53 degrees Celsius (the lifetime maximum in the SCT status), which is significantly more than it should have.
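
The lifetime maximum can be read from the SCT status section of smartctl -x output; for da2 above that line reads 16/53 Celsius. A small sketch for pulling it out, assuming the line format shown in this thread (lifetime_max_temp is a hypothetical helper name):

```shell
#!/bin/sh
# lifetime_max_temp: read `smartctl -x` output on stdin and print the
# lifetime maximum temperature from the SCT status section.
lifetime_max_temp() {
    awk '/^Lifetime[ \t]+Min\/Max Temperature:/ {
        split($(NF - 1), t, "/")   # second-to-last field looks like "16/53"
        print t[2]
    }'
}

# Example on a live system (device name is an assumption):
#   smartctl -x /dev/da2 | lifetime_max_temp
```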
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421

theman1

Dabbler
Joined
Nov 8, 2014
Messages
30
and da6-

Code:
smartctl 6.5 2016-05-07 r4318 [FreeBSD 10.3-STABLE amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:	 Seagate Barracuda 7200.14 (AF)
Device Model:	 ST3000DM001-1ER166
Serial Number:	W5002SQ7
LU WWN Device Id: 5 000c50 077b86537
Firmware Version: CC41
User Capacity:	3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:	 512 bytes logical, 4096 bytes physical
Rotation Rate:	7200 rpm
Form Factor:	  3.5 inches
Device is:		In smartctl database [for details use: -P show]
ATA Version is:   ACS-2, ACS-3 T13/2161-D revision 3b
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:	Thu Dec 15 20:59:56 2016 PST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM feature is:   Disabled
Rd look-ahead is: Enabled
Write cache is:   Enabled
ATA Security is:  Disabled, NOT FROZEN [SEC1]
Wt Cache Reorder: Unavailable

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82)	Offline data collection activity
					was completed without error.
					Auto Offline Data Collection: Enabled.
Self-test execution status:	  (   0)	The previous self-test routine completed
					without error or no self-test has ever
					been run.
Total time to complete Offline
data collection:		 (   89) seconds.
Offline data collection
capabilities:			  (0x7b) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:			(0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:		(0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine
recommended polling time:	  (   1) minutes.
Extended self-test routine
recommended polling time:	  ( 332) minutes.
Conveyance self-test routine
recommended polling time:	  (   2) minutes.
SCT capabilities:			(0x1085)	SCT Status supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME		  FLAGS	VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate	 POSR--   111   099   006	-	29758368
  3 Spin_Up_Time			PO----   093   093   000	-	0
  4 Start_Stop_Count		-O--CK   100   100   020	-	64
  5 Reallocated_Sector_Ct   PO--CK   100   100   010	-	0
  7 Seek_Error_Rate		 POSR--   075   060   030	-	34666540634
  9 Power_On_Hours		  -O--CK   080   080   000	-	17778
10 Spin_Retry_Count		PO--C-   100   100   097	-	0
12 Power_Cycle_Count	   -O--CK   100   100   020	-	64
183 Runtime_Bad_Block	   -O--CK   100   100   000	-	0
184 End-to-End_Error		-O--CK   100   100   099	-	0
187 Reported_Uncorrect	  -O--CK   100   100   000	-	0
188 Command_Timeout		 -O--CK   100   100   000	-	0 0 0
189 High_Fly_Writes		 -O-RCK   098   098   000	-	2
190 Airflow_Temperature_Cel -O---K   066   052   045	-	34 (Min/Max 30/34)
191 G-Sense_Error_Rate	  -O--CK   100   100   000	-	0
192 Power-Off_Retract_Count -O--CK   100   100   000	-	42
193 Load_Cycle_Count		-O--CK   100   100   000	-	1106
194 Temperature_Celsius	 -O---K   034   048   000	-	34 (0 16 0 0 0)
197 Current_Pending_Sector  -O--C-   100   100   000	-	8
198 Offline_Uncorrectable   ----C-   100   100   000	-	8
199 UDMA_CRC_Error_Count	-OSRCK   200   200   000	-	0
240 Head_Flying_Hours	   ------   100   253   000	-	17748h+21m+34.572s
241 Total_LBAs_Written	  ------   100   253   000	-	29236326124
242 Total_LBAs_Read		 ------   100   253   000	-	1966523305399
							||||||_ K auto-keep
							|||||__ C event count
							||||___ R error rate
							|||____ S speed/performance
							||_____ O updated online
							|______ P prefailure warning

General Purpose Log Directory Version 1
SMART		   Log Directory Version 1 [multi-sector log support]
Address	Access  R/W   Size  Description
0x00	   GPL,SL  R/O	  1  Log Directory
0x01		   SL  R/O	  1  Summary SMART error log
0x02		   SL  R/O	  5  Comprehensive SMART error log
0x03	   GPL	 R/O	  5  Ext. Comprehensive SMART error log
0x06		   SL  R/O	  1  SMART self-test log
0x07	   GPL	 R/O	  1  Extended self-test log
0x09		   SL  R/W	  1  Selective self-test log
0x10	   GPL	 R/O	  1  SATA NCQ Queued Error log
0x11	   GPL	 R/O	  1  SATA Phy Event Counters log
0x21	   GPL	 R/O	  1  Write stream error log
0x22	   GPL	 R/O	  1  Read stream error log
0x30	   GPL,SL  R/O	  9  IDENTIFY DEVICE data log
0x80-0x9f  GPL,SL  R/W	 16  Host vendor specific log
0xa1	   GPL,SL  VS	  20  Device vendor specific log
0xa2	   GPL	 VS	4496  Device vendor specific log
0xa8	   GPL,SL  VS	 129  Device vendor specific log
0xa9	   GPL,SL  VS	   1  Device vendor specific log
0xab	   GPL	 VS	   1  Device vendor specific log
0xb0	   GPL	 VS	5176  Device vendor specific log
0xbe-0xbf  GPL	 VS   65535  Device vendor specific log
0xc0	   GPL,SL  VS	   1  Device vendor specific log
0xc1	   GPL,SL  VS	  10  Device vendor specific log
0xc3	   GPL,SL  VS	   8  Device vendor specific log
0xc4	   GPL,SL  VS	   5  Device vendor specific log
0xe0	   GPL,SL  R/W	  1  SCT Command/Status
0xe1	   GPL,SL  R/W	  1  SCT Data Transfer

SMART Extended Comprehensive Error Log Version: 1 (5 sectors)
No Errors Logged

SMART Extended Self-test Log Version: 1 (1 sectors)
Num  Test_Description	Status				  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline	   Completed without error	   00%	 17762		 -
# 2  Extended offline	Completed without error	   00%	 17648		 -

SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
	1		0		0  Not_testing
	2		0		0  Not_testing
	3		0		0  Not_testing
	4		0		0  Not_testing
	5		0		0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

SCT Status Version:				  3
SCT Version (vendor specific):	   522 (0x020a)
SCT Support Level:				   1
Device State:						Active (0)
Current Temperature:					32 Celsius
Power Cycle Min/Max Temperature:	 30/32 Celsius
Lifetime	Min/Max Temperature:	 16/48 Celsius
Under/Over Temperature Limit Count:   0/0

SCT Data Table command not supported

SCT Error Recovery Control command not supported

Device Statistics (GP/SMART Log 0x04) not supported

SATA Phy Event Counters (GP Log 0x11)
ID	  Size	 Value  Description
0x000a  2			4  Device-to-host register FISes sent due to a COMRESET
0x0001  2			0  Command failed due to ICRC error
0x0003  2			0  R_ERR response for device-to-host data FIS
0x0004  2			0  R_ERR response for host-to-device data FIS
0x0006  2			0  R_ERR response for device-to-host non-data FIS
0x0007  2			0  R_ERR response for host-to-device non-data FIS



(http://ix.io/1MEY)

It looks to have 8 offline uncorrectable sectors. Is this a must-replace because the disk may die soon as well?
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
Now would be a good time to test your backups.

Usually, pending and/or offline uncorrectable in the single digits is not a very urgent cause for concern. However, with this particular model, which has a history of sudden, early failure, I wouldn't take any chances.

Do you have any spare drive ports? If so, you can replace a disk without losing any redundancy. If not, with your RAIDZ2 pool, you can replace one at a time while retaining single-parity redundancy.

I'd start with da2, then da6, and take a close look at any other ST3000DM001.
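For checking the remaining drives, a rough Python sketch like the one below could pull the relevant counters out of the attribute table. It assumes the brief attribute format shown in the smartctl output above (ID, name, six-character flags, then values). The set of watched IDs and the function names are my own choices for illustration, not anything defined by smartmontools or FreeNAS.

```python
import re

# Attribute IDs to flag when the raw value is nonzero. This particular set is an
# assumption chosen for illustration, not an official or complete list.
WATCHED_IDS = {5, 187, 197, 198}


def parse_attributes(text):
    """Parse brief-format SMART attribute rows into {id: (name, raw_value)}."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Expected columns: ID NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE...
        if len(fields) < 8 or not fields[0].isdigit():
            continue
        # Skip anything whose third column is not a flags string like "-O--CK".
        if not re.fullmatch(r"[A-Z-]{6}", fields[2]):
            continue
        # Keep only the leading integer of the raw value (e.g. "17748h+21m" -> 17748).
        raw = re.match(r"\d+", fields[7])
        if raw:
            attrs[int(fields[0])] = (fields[1], int(raw.group()))
    return attrs


def flag_attributes(attrs):
    """Return (id, name, raw) for watched attributes with a nonzero raw value."""
    return [(i, attrs[i][0], attrs[i][1])
            for i in sorted(WATCHED_IDS) if i in attrs and attrs[i][1] > 0]


# Rows copied from the da6 output posted above.
SAMPLE = """\
  5 Reallocated_Sector_Ct   PO--CK   100   100   010    -    0
187 Reported_Uncorrect      -O--CK   100   100   000    -    0
197 Current_Pending_Sector  -O--C-   100   100   000    -    8
198 Offline_Uncorrectable   ----C-   100   100   000    -    8
"""

flagged = flag_attributes(parse_attributes(SAMPLE))
for attr_id, name, raw in flagged:
    print(f"{attr_id} {name} raw={raw}")
```

To use it on a real drive, save the smartctl output to a file and pass its contents to `parse_attributes` in place of `SAMPLE`. A nonzero count only tells you which disks to look at first; it is not a verdict on the disk.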
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Two unhappy drives is not a good situation at all. I really don't care for using desktop hard drives in a server; they die early and have a 2-year warranty, which it looks like you just barely exceeded. When you replace these, spend the extra few bucks for a true NAS drive. My WD Reds have been spinning for 4 years, 1 year beyond the warranty, and they are still going strong (knock on wood).
 