Volume degraded

Status
Not open for further replies.

mst

Explorer
Joined
Aug 17, 2014
Messages
95
Experts,

I have replaced one of the hard drives and the resilvering is still going on, but I am a bit worried since I have already used a third new HDD and it still shows the volume as degraded.

Code:
[root@freenas] ~# zpool status -v
  pool: MAIN
 state: ONLINE
  scan: resilvered 0 in 0h0m with 0 errors on Wed May 30 13:10:26 2018
config:

        NAME                                            STATE     READ WRITE CKSUM
        MAIN                                            ONLINE       0     0     0
          raidz3-0                                      ONLINE       0     0     0
            gptid/844186ca-d7ff-11e6-9efa-a0369f1466cc  ONLINE       0     0     0
            gptid/84e449ec-d7ff-11e6-9efa-a0369f1466cc  ONLINE       0     0     0
            gptid/857e1f75-d7ff-11e6-9efa-a0369f1466cc  ONLINE       0     0     0
            gptid/86163fb3-d7ff-11e6-9efa-a0369f1466cc  ONLINE       0     0     0
            gptid/86af1714-d7ff-11e6-9efa-a0369f1466cc  ONLINE       0     0     0
            gptid/87475013-d7ff-11e6-9efa-a0369f1466cc  ONLINE       0     0     0
            gptid/87dc3aa1-d7ff-11e6-9efa-a0369f1466cc  ONLINE       0     0     0
          raidz3-1                                      ONLINE       0     0     0
            gptid/88860260-d7ff-11e6-9efa-a0369f1466cc  ONLINE       0     0     0
            gptid/8928d9a5-d7ff-11e6-9efa-a0369f1466cc  ONLINE       0     0     0
            gptid/89c997b2-d7ff-11e6-9efa-a0369f1466cc  ONLINE       0     0     0
            gptid/8a62f600-d7ff-11e6-9efa-a0369f1466cc  ONLINE       0     0     0
            gptid/8b062696-d7ff-11e6-9efa-a0369f1466cc  ONLINE       0     0     0
            gptid/8ba5d2fb-d7ff-11e6-9efa-a0369f1466cc  ONLINE       0     0     0
            gptid/8c49327a-d7ff-11e6-9efa-a0369f1466cc  ONLINE       0     0     0
          gptid/764b5ef9-dcc7-11e6-9cc2-a0369f1466cc    ONLINE       0     0     0

errors: No known data errors

  pool: MAIN_BCKP
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
		continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Wed May 30 18:24:08 2018
		575G scanned out of 1.75T at 13.3M/s, 25h57m to go
		67.5G resilvered, 32.11% done
config:

        NAME                                            STATE     READ WRITE CKSUM
        MAIN_BCKP                                       DEGRADED     0     0 2.81K
          raidz2-0                                      DEGRADED     0     0 5.61K
            gptid/1eb6cc45-cdf6-11e7-bd27-a0369f1466cc  DEGRADED     0     0     0  too many errors
            gptid/cad3c87f-6435-11e8-b6c7-a0369f1466cc  DEGRADED     0     0 15.0K  too many errors  (resilvering)
            gptid/7df99082-7d36-11e7-b369-a0369f1466cc  DEGRADED     0     0 19.9K  too many errors  (resilvering)
            gptid/3ffac730-0349-11e7-ad7d-a0369f1466cc  ONLINE       0     0     0  (resilvering)
          raidz2-1                                      ONLINE       0     0     0
            gptid/3f426822-d372-11e7-8a41-a0369f1466cc  ONLINE       0     0     0
            gptid/da04efac-d3b0-11e7-88da-a0369f1466cc  ONLINE       0     0     0
            da16p2                                      ONLINE       0     0     0
            gptid/723e4fb2-d376-11e7-8a41-a0369f1466cc  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

		MAIN_BCKP/MAIN_BCKP:<0x1>

  pool: freenas-boot
 state: ONLINE
  scan: scrub repaired 0 in 0h0m with 0 errors on Thu May 31 03:45:27 2018
config:

		NAME		STATE	 READ WRITE CKSUM
		freenas-boot  ONLINE	   0	 0	 0
		  ada0p2	ONLINE	   0	 0	 0

errors: No known data errors




I assume I should wait for the resilvering to finish and then check the errors again - correct? What worries me:

gptid/1eb6cc45-cdf6-11e7-bd27-a0369f1466cc  DEGRADED  0  0      0  too many errors
gptid/cad3c87f-6435-11e8-b6c7-a0369f1466cc  DEGRADED  0  0  15.0K  too many errors  (resilvering)
gptid/7df99082-7d36-11e7-b369-a0369f1466cc  DEGRADED  0  0  19.9K  too many errors  (resilvering)
gptid/3ffac730-0349-11e7-ad7d-a0369f1466cc  ONLINE    0  0      0  (resilvering)

MAIN has always been OK - I have newer hard drives there. Any thoughts?
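
For reference, a minimal sketch of keeping an eye on the resilver and re-checking the pool once it finishes (pool name taken from the output above; behaviour on FreeNAS 9.x may differ slightly):

Code:
# Watch resilver progress and the per-disk error counters
zpool status -v MAIN_BCKP

# After the resilver completes, verify everything again with a scrub
zpool scrub MAIN_BCKP

# Only if the scrub comes back clean, reset the error counters
zpool clear MAIN_BCKP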
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
What controller are the problem drives connected to?

Sent from my SAMSUNG-SGH-I537 using Tapatalk
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Just to note, @mst is running FreeNAS 9.x.

@mst please post your system specs for this issue. Also, if this issue is why you are considering upgrading to FreeNAS 11 (your question in a different thread), I'd figure out what the problem is here, fix it, and leave the upgrade alone if FreeNAS 9.x works for you.
 

mst

Explorer
Joined
Aug 17, 2014
Messages
95
What controller are the problem drives connected to?

Sent from my SAMSUNG-SGH-I537 using Tapatalk


I have an LSI Logic SAS 9207-8i storage controller, which should be compatible and, as far as I remember, was flashed to the 20.07.00 IT firmware. The storage is a Supermicro chassis with pools of 3TB and 1.5TB drives.
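
As an aside, the flashed firmware version can usually be confirmed from the shell with the LSI utility that ships with FreeNAS (a sketch only; adapter numbering may differ on other systems):

Code:
# Show firmware and BIOS versions for every LSI SAS2 HBA in the box
sas2flash -listall

# More detail for the first adapter
sas2flash -list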
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
All drives connected with the same controller?

Sent from my SAMSUNG-SGH-I537 using Tapatalk
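
One rough way to check this from the FreeNAS shell (output layout varies by system, so this is only a sketch):

Code:
# List every disk along with the controller (SIM) it hangs off
camcontrol devlist -v

# Map the da/ada device names to the gptid labels shown in zpool status
glabel status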
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
You have a couple of big problems:
  • MAIN consists of two seven-disk RAIDZ3 vdevs and one single-disk vdev striped together. When that one disk (gptid/764b5ef9-dcc7-11e6-9cc2-a0369f1466cc) fails, you'll lose your entire pool.
  • You have pool-level metadata corruption on MAIN_BCKP.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
MAIN consists of two seven-disk RAIDZ3 vdevs and one single-disk vdev striped together. When that one disk (gptid/764b5ef9-dcc7-11e6-9cc2-a0369f1466cc) fails, you'll lose your entire pool.
@mst WOW, I didn't notice that when I was looking at it on my phone, but that is a huge problem for you. It kind of blows the whole RAIDz3 redundancy out of the water to have a single disk striped in like that. How did you manage to do that anyhow?
The only fix is to make a full backup of all your data, somewhere reliable, destroy the pool, build it again and then bring the data back.
The MAIN_BCKP pool isn't healthy. I would destroy that pool and run a thorough test on all the drives before making a new pool and copying the data back. I don't know what the issue is, but it may be that you have some old drives here that all started to go bad at once. Last year I had a rash of bad drives in my server because they were all over 5 years old. I was losing a drive every two or three weeks, and at one point I even had two go out on the same day. If you run the MAIN_BCKP pool drives through some destructive testing, it will help to isolate the cause of the problem. If they are older drives, it is a good bet that they are actually having problems.
Do you have regularly scheduled SMART tests running?
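
If not, a minimal sketch of testing one drive by hand from the shell (replace /dev/da0 with each disk in the backup pool; the FreeNAS GUI can schedule the same tests):

Code:
# Start a long (extended) SMART self-test on one drive
smartctl -t long /dev/da0

# Once it finishes, review the self-test log, error log and health attributes
smartctl -a /dev/da0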
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
Look on the bright side, when you destroy this pool:
Code:
		NAME											STATE	 READ WRITE  CKSUM
		MAIN_BCKP									   DEGRADED	 0	 0	  2.81K
		  raidz2-0									  DEGRADED	 0	 0	  5.61K
			gptid/1eb6cc45-cdf6-11e7-bd27-a0369f1466cc  DEGRADED	 0	 0	  0  too many errors
			gptid/cad3c87f-6435-11e8-b6c7-a0369f1466cc  DEGRADED	 0	 0	  15.0K  too many errors  (resilvering)
			gptid/7df99082-7d36-11e7-b369-a0369f1466cc  DEGRADED	 0	 0	  19.9K  too many errors  (resilvering)
			gptid/3ffac730-0349-11e7-ad7d-a0369f1466cc  ONLINE	   0	 0	  0  (resilvering)
		  raidz2-1									  ONLINE	   0	 0	  0
			gptid/3f426822-d372-11e7-8a41-a0369f1466cc  ONLINE	   0	 0	  0
			gptid/da04efac-d3b0-11e7-88da-a0369f1466cc  ONLINE	   0	 0	  0
			da16p2									  ONLINE	   0	 0	  0
			gptid/723e4fb2-d376-11e7-8a41-a0369f1466cc  ONLINE	   0	 0	  0

You can recreate it as a single vdev and you gain two drives worth of storage space.
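
A hedged sketch of what that could look like from the shell, assuming eight freshly tested disks da0 through da7 (in practice the FreeNAS volume manager should build it so the usual swap partitions and gptid labels are created):

Code:
# Illustration only: one 8-disk RAIDZ2 vdev instead of two 4-disk RAIDZ2 vdevs
zpool create MAIN_BCKP raidz2 da0 da1 da2 da3 da4 da5 da6 da7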
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
You have a couple of big problems:
  • MAIN consists of two seven-disk RAIDZ3 vdevs and one single-disk vdev striped together. When that one disk (gptid/764b5ef9-dcc7-11e6-9cc2-a0369f1466cc) fails, you'll lose your entire pool.
  • You have pool-level metadata corruption on MAIN_BCKP.
Great catch!
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
How did you manage to do that anyhow?
Probably another attempted disk replacement gone bad--though how people manage to continue to do this still baffles me.
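
For the record, the classic way it happens at the command line is running zpool add instead of zpool replace; a hedged sketch with placeholder device names:

Code:
# Correct: swap a failed member disk inside the existing RAIDZ3 vdev
zpool replace MAIN <old-gptid> /dev/da20

# Mistake: -f overrides the mismatched-redundancy warning and grafts the new
# disk on as a brand-new single-disk top-level vdev with no redundancy
zpool add -f MAIN /dev/da20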
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080

Like @joeschmuck said, it would make troubleshooting easier if we knew what hardware you are using. It saves some of the "50 questions" bit. Plus, it is in the rules:

Updated Forum Rules 4/11/17
https://forums.freenas.org/index.php?threads/updated-forum-rules-4-11-17.45124/
 

mst

Explorer
Joined
Aug 17, 2014
Messages
95
Oh boy ... it looks like this will be a bigger project. Luckily I use this storage as a backup, but anyway ... recreating both pools would get rid of any corruption - correct?

Indeed, the HDDs are not brand new, but they are all less than 5 years old. Checking so many HDDs one by one will take some time, and recreating the pools is not as easy as it looks.

Thank you all for the advice.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
The only fix is to make a full backup of all your data, somewhere reliable, destroy the pool, build it again and then bring the data back.
...or wait for 11.2, which will allow vdev removal. But I wouldn't want to wait out that risk.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
...or wait for 11.2, which will allow vdev removal. But I wouldn't want to wait out that risk.
Oh no! Beta 1 is scheduled to come out at the end of June, and we all know how well things stick to the schedule. Nope, I'd back up all data and rebuild the pool sooner rather than later. I guess you could back up all your data, wait for the new 11.2, and then give it a real test fixing your issue. Naw, that wouldn't be me.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
recreating both pools would get rid of any corruption - correct?
For the moment, it appears there is no data corruption in the main pool. If the backup pool is a backup of the main pool (guessing) then your data should be relatively safe... Fingers crossed about that single drive striped into main... The thing I would do is use the drive burn-in test on all the drives in the backup pool to try and eliminate any drives there that are defective.

Info on drive burn-in here:

Building, Burn-In, and Testing your FreeNAS system
https://forums.freenas.org/index.php?resources/building-burn-in-and-testing-your-freenas-system.38/

Github repository for FreeNAS scripts, including disk burnin
https://forums.freenas.org/index.ph...for-freenas-scripts-including-disk-burnin.28/

Then build a new backup pool and make a fresh backup of the main pool. After testing that backup pool to ensure it is healthy, (fingers crossed again) you should be able to reconfigure the main pool and copy the data back.

It will likely take days, depending on how much data you need to copy. I am making a backup right now at work that is projected to take 22 days. Your backup will probably be much faster.
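
A hedged sketch of the copy itself using ZFS replication, with a placeholder snapshot name (the FreeNAS replication tasks wrap the same mechanism):

Code:
# Snapshot the source pool recursively
zfs snapshot -r MAIN@migrate1

# Send the whole tree, with properties, into the rebuilt backup pool
zfs send -R MAIN@migrate1 | zfs recv -F MAIN_BCKP/MAIN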
 
Status
Not open for further replies.
Top