One or more devices has experienced an unrecoverable error

Status
Not open for further replies.

ITGuy1024

Explorer
Joined
Dec 13, 2014
Messages
89
I logged in to my server today to check something and saw this error. Also apparently email alerts are broken but that's another issue.

"The volume PoolB state is ONLINE: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected."

I checked my pools and they show healthy. Are there any diagnostics, or some other way to gather more info about this? If I need to replace a drive, I don't see where it shows me which one is failing.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
SSH into the machine or open a shell and type zpool status -v

Post the result inside [CODE] [/CODE] tags for formatting. Look for errors in the READ/WRITE/CKSUM columns on each drive, and check the line that talks about a scan/scrub.
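On a pool with many drives, scanning those columns by eye gets tedious. Here is a minimal sketch that filters the output for any row with a non-zero READ, WRITE, or CKSUM counter. The helper name flag_zpool_errors is made up for illustration, and it assumes the usual five-column NAME/STATE/READ/WRITE/CKSUM layout.

```shell
# Sketch: list pool members whose READ, WRITE, or CKSUM counter is non-zero.
# flag_zpool_errors is a hypothetical helper name. It reads `zpool status`
# output on stdin, so it also works on output saved to a file.
flag_zpool_errors() {
    awk '
        # Remember which pool the following rows belong to.
        $1 == "pool:" { pool = $2 }
        # Device rows look like: NAME STATE READ WRITE CKSUM
        NF >= 5 && $3 ~ /^[0-9]/ && $4 ~ /^[0-9]/ && $5 ~ /^[0-9]/ {
            if ($3 != "0" || $4 != "0" || $5 != "0")
                printf "%s %s read=%s write=%s cksum=%s\n", pool, $1, $3, $4, $5
        }
    '
}
```

Typical use would be: zpool status -v | flag_zpool_errors. No output means every counter is zero.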
 

ITGuy1024

Explorer
Joined
Dec 13, 2014
Messages
89
Code:
 state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
        still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(7) for details.
  scan: scrub repaired 0 in 0h2m with 0 errors on Wed Aug 15 13:02:27 2018
config:

        NAME                                            STATE     READ WRITE CKSUM
        PoolA                                           ONLINE       0     0     0
          raidz1-0                                      ONLINE       0     0     0
            gptid/fcdd10ed-8276-11e4-9022-40167e67c7a4  ONLINE       0     0     0
            gptid/fdbcfa1f-8276-11e4-9022-40167e67c7a4  ONLINE       0     0     0
            gptid/fe95d512-8276-11e4-9022-40167e67c7a4  ONLINE       0     0     0
            gptid/ff5789f5-8276-11e4-9022-40167e67c7a4  ONLINE       0     0     0

errors: No known data errors

  pool: PoolB
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://illumos.org/msg/ZFS-8000-9P
  scan: scrub repaired 0 in 18h49m with 0 errors on Thu Aug  2 18:49:23 2018
config:

        NAME                                            STATE     READ WRITE CKSUM
        PoolB                                           ONLINE       0     0     0
          raidz1-0                                      ONLINE       0     0     0
            gptid/2ca2e5d7-8669-11e4-88f8-40167e67c7a4  ONLINE       0     0     0
            gptid/2cdcb60f-8669-11e4-88f8-40167e67c7a4  ONLINE       0     0     1
            gptid/2d9de64f-8669-11e4-88f8-40167e67c7a4  ONLINE       0     0     0
            gptid/2e5d7e49-8669-11e4-88f8-40167e67c7a4  ONLINE       0     0     0

errors: No known data errors

  pool: freenas-boot
 state: ONLINE
  scan: scrub repaired 0 in 0h0m with 0 errors on Wed Aug 15 03:45:13 2018
config:

        NAME          STATE     READ WRITE CKSUM
        freenas-boot  ONLINE       0     0     0
          ada4p2      ONLINE       0     0     0

errors: No known data errors
 

IceBoosteR

Guru
Joined
Sep 27, 2016
Messages
503
The error message indicates a checksum error on your PoolB. This can have various causes. Have you checked the SMART data of your drives? I would start there and investigate where the error could come from.
Also have a look in /var/log/messages for anything that could explain the problem. Post the output here in code tags as well.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Code:
gptid/2cdcb60f-8669-11e4-88f8-40167e67c7a4  ONLINE	   0	 0	 1

That tells you that one drive experienced a checksum error, but ZFS did correct it from other data. It might be an early indication of a failing drive, or it might have just been an error that does happen - drives usually state their error rate as "1 in 10^14" or similar - and you just got unlucky.
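To put that "1 in 10^14" figure in perspective, here is a rough back-of-the-envelope sketch. It assumes the rate is quoted per bit read, which is how drive datasheets usually state it, and expected_ure is a made-up helper name. The spec covers read errors the drive itself reports rather than checksum mismatches caught by ZFS, so treat the result as a ballpark only.

```shell
# Sketch: expected number of unrecoverable read errors for one full read of
# a drive, assuming a spec of 1 error per 10^N bits read (N defaults to 14).
# expected_ure is a hypothetical helper; arguments: capacity in bytes, N.
expected_ure() {
    awk -v bytes="$1" -v e="${2:-14}" \
        'BEGIN { printf "%.2f\n", bytes * 8 / (10 ^ e) }'
}

# Capacity taken from the 3 TB drives in this thread.
expected_ure 3000592982016 14   # prints 0.24
```

In other words, under that assumption one complete read of a 3 TB drive has roughly a one-in-four chance of hitting an error, so a single corrected error across several years of scrubs is not alarming on its own.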

Run glabel status, look for that gptid on the left, and find the device (e.g. ada1) from the component (e.g. ada1p2). Then run smartctl -a /dev/ada1 against it and post the result in [CODE] tags.
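The lookup can also be scripted. This is a minimal sketch, assuming the three-column Name/Status/Components layout that glabel status prints and a component name ending in a pN partition suffix. The helper name gptid_to_disk is made up for illustration.

```shell
# Sketch: print the whole-disk device behind a gptid label.
# gptid_to_disk is a hypothetical helper; it reads `glabel status` output
# on stdin and takes the label (e.g. gptid/xxxx) as its only argument.
gptid_to_disk() {
    awk -v label="$1" '
        $1 == label {
            dev = $3                  # component, e.g. da1p2
            sub(/p[0-9]+$/, "", dev)  # strip the partition suffix: da1p2 becomes da1
            print "/dev/" dev
        }
    '
}
```

Typical use would be: glabel status | gptid_to_disk gptid/2cdcb60f-8669-11e4-88f8-40167e67c7a4, then pass the printed device to smartctl -a.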
 

ITGuy1024

Explorer
Joined
Dec 13, 2014
Messages
89
Code:
glabel status
                                      Name  Status  Components
gptid/fcdd10ed-8276-11e4-9022-40167e67c7a4     N/A  ada0p2
gptid/fdbcfa1f-8276-11e4-9022-40167e67c7a4     N/A  ada1p2
gptid/fe95d512-8276-11e4-9022-40167e67c7a4     N/A  ada2p2
gptid/ff5789f5-8276-11e4-9022-40167e67c7a4     N/A  ada3p2
gptid/f1428c05-ea98-11e7-bfaf-40167e67c7a4     N/A  ada4p1
gptid/2ca2e5d7-8669-11e4-88f8-40167e67c7a4     N/A  da0p2
gptid/2cdcb60f-8669-11e4-88f8-40167e67c7a4     N/A  da1p2
gptid/2d9de64f-8669-11e4-88f8-40167e67c7a4     N/A  da2p2
gptid/2e5d7e49-8669-11e4-88f8-40167e67c7a4     N/A  da3p2



Code:
smartctl -a /dev/da1p2
smartctl 6.5 2016-05-07 r4318 [FreeBSD 11.0-STABLE amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

/dev/da1p2: Unable to detect device type
Please specify device type with the -d option.

Use smartctl -h to get a usage summary
 

ITGuy1024

Explorer
Joined
Dec 13, 2014
Messages
89
Code:
smartctl -a /dev/da1
smartctl 6.5 2016-05-07 r4318 [FreeBSD 11.0-STABLE amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               HPT
Product:              DISK 0_1
Revision:             4.00
User Capacity:        3,000,592,982,016 bytes [3.00 TB]
Logical block size:   512 bytes
Device type:          disk
Local Time is:        Wed Aug 15 15:34:32 2018 EDT
SMART support is:     Unavailable - device lacks SMART capability.

=== START OF READ SMART DATA SECTION ===
Request Sense failed, [Input/output error]
Error Counter logging not supported

Device does not support Self Test logging
 

ITGuy1024

Explorer
Joined
Dec 13, 2014
Messages
89
Well if SMART isn't available I'd say don't buy WD Red NAS drives...
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
ITGuy1024 said:
Well if SMART isn't available I'd say don't buy WD Red NAS drives...

That's your HighPoint controller (and it had better be in JBOD mode!), not the drive. You need to pass a device-type argument to smartctl in this format:

Code:
smartctl -a -d hpt,L/M/N /dev/da1


Where L is the controller ID, M is the channel number, and N is the PMPort number if used.

Try:

Code:
smartctl -a -d hpt,1/2 /dev/da1
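If the controller and channel numbers aren't known, one option is to list the candidate commands and try them by hand. This is only a sketch: hpt_candidates is a made-up helper name, and the ranges (controllers 1 to 4, channels 1 to 8) are assumptions that may not match a given card. It prints the commands rather than running them.

```shell
# Sketch: print candidate smartctl invocations for a HighPoint controller,
# one per controller/channel pair, so they can be reviewed and run by hand.
# hpt_candidates is a hypothetical helper. The bounds below (4 controllers,
# 8 channels) are assumptions; adjust them to the actual card.
hpt_candidates() {
    dev="$1"
    for l in 1 2 3 4; do
        for m in 1 2 3 4 5 6 7 8; do
            echo "smartctl -a -d hpt,$l/$m $dev"
        done
    done
}
```

Running hpt_candidates /dev/da1 prints 32 lines, starting with smartctl -a -d hpt,1/1 /dev/da1.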
 

ITGuy1024

Explorer
Joined
Dec 13, 2014
Messages
89
Code:
smartctl -a -d hpt,1/2 /dev/da1
smartctl 6.5 2016-05-07 r4318 [FreeBSD 11.0-STABLE amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

Smartctl open device: /dev/da1 [hpt_disk_1/2/1] failed: Operation not permitted
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Unfortunately, I don't have a HighPoint card with me right now to poke around and find the right parameters to use.

Basically, you need to find a way to pull the SMART stats from that disk and see whether any counters are showing pre-failure indicators. The usual first culprits are 197 Current_Pending_Sector and 198 Offline_Uncorrectable.
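Once SMART data is readable, that check can be sketched as a small filter. It assumes the standard ATA attribute table printed by smartctl -A, where the raw value is the tenth column. The helper name check_pending_sectors and the WARN/ok labels are made up for illustration.

```shell
# Sketch: report attributes 197 and 198 from a smartctl attribute table and
# flag any non-zero raw value. check_pending_sectors is a hypothetical
# helper; it reads `smartctl -A` output on stdin.
check_pending_sectors() {
    awk '
        # Columns: ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        ($1 == "197" || $1 == "198") && NF >= 10 {
            status = ($10 + 0 > 0) ? "WARN" : "ok"
            printf "%s %s raw=%s %s\n", $1, $2, $10, status
        }
    '
}
```

Typical use would be: smartctl -A /dev/ada1 | check_pending_sectors, where /dev/ada1 stands in for whichever device the drive shows up as.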

If there aren't any, or the values are low, you can issue a zpool clear against the device and it should reset the error counter. Keep a close eye on it, though, and if the error pops up again, consider replacing the drive.

Code:
zpool clear PoolB gptid/2cdcb60f-8669-11e4-88f8-40167e67c7a4


The lack of SMART data is a concern, though, because regular SMART tests will pick up stuff like this, and they're an excellent additional measure to ensure the health of your pool.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
As a test, can you connect that individual drive to a regular SATA port so we can get SMART data out of it? If the RocketRaid card didn't do anything crazy to your drive, it should work just as well from a regular SATA port.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Here's a very old thread about Highpoint controllers from @cyberjock

https://forums.freenas.org/index.php?threads/highpoint-controller-info.8217/

Note: SMART data and serial numbers are NOT currently working with ANY Highpoint controllers. This is a feature that is being actively pursued by [Cyberjock] and William Grzybowski(developer). Serial and SMART may or may not work ever.

That was in 2012, and there's still no SMART data coming through Highpoint controllers. I'd strongly suggest putting that card where it belongs (in the trash ;) ) and getting an LSI-based HBA.

(Alternatively you could try shooting a Highpoint with a Hi-Point and see if those two respective piles of garbage can collaborate on something useful for once.)
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
PS. Do you have a hardware budget / is this your system or a client system?
 

ITGuy1024

Explorer
Joined
Dec 13, 2014
Messages
89
Chris Moore said:
PS. Do you have a hardware budget / is this your system or a client system?

This is my system. I was on a budget when I built it, say, 4 years ago, and I had all the parts lying around already. But these days spending money on it wouldn't be an issue.


HoneyBadger said:
Here's a very old thread about Highpoint controllers from @cyberjock

https://forums.freenas.org/index.php?threads/highpoint-controller-info.8217/

That was in 2012, and there's still no SMART data coming through Highpoint controllers. I'd strongly suggest putting that card where it belongs (in the trash ;) ) and getting an LSI-based HBA.

(Alternatively you could try shooting a Highpoint with a Hi-Point and see if those two respective piles of garbage can collaborate on something useful for once.)

Point taken ;)
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
ITGuy1024 said:
This is my system. I was on a budget when I built it, say, 4 years ago, and I had all the parts lying around already. But these days spending money on it wouldn't be an issue.

Were you able to move the suspect drive to a standard SATA port so we can check the SMART status?
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
I would suggest changing from the two RocketRaid cards to this LSI chip based card:

HP H220 6Gbps SAS PCI-E 3.0 HBA LSI 9207-8i P20 IT Mode for ZFS FreeNAS unRAID
https://www.ebay.com/itm/162862201664

and you may need a set of these cables to go with it:

Drive Cables: Mini SAS to 4-SATA SFF-8087 Multi-Lane Forward Breakout Internal Cable - - US $12.99
https://www.ebay.com/itm/371681252206
 

ITGuy1024

Explorer
Joined
Dec 13, 2014
Messages
89
Chris Moore said:
Were you able to move the suspect drive to a standard SATA port so we can check the SMART status?

I wasn't able to do that.

Chris Moore said:
I would suggest changing from the two RocketRaid cards to this LSI chip based card:

HP H220 6Gbps SAS PCI-E 3.0 HBA LSI 9207-8i P20 IT Mode for ZFS FreeNAS unRAID
https://www.ebay.com/itm/162862201664

and you may need a set of these cables to go with it:

Drive Cables: Mini SAS to 4-SATA SFF-8087 Multi-Lane Forward Breakout Internal Cable - - US $12.99
https://www.ebay.com/itm/371681252206

Seems like a pretty good option. The prices have come down on those cards a bit since the last time I looked at them. What's the max breakout you can have from one of those Mini SAS ports?
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
ITGuy1024 said:
Seems like a pretty good option. The prices have come down on those cards a bit since the last time I looked at them. What's the max breakout you can have from one of those Mini SAS ports?

The card can support 256 drives if I recall correctly, but you need SAS expanders for it. Each port is four SAS lanes and, kind of like a network switch does for Ethernet, the SAS expander breaks that connectivity out to many drives. One of the systems I run at work has 4 expansion chassis connected with 16 drives in each, but you can easily go higher than that. I can give you some guidance, whether you are looking to build your own or plan to buy a factory-built chassis. There are so many options.
 