Volume degraded

Status
Not open for further replies.

mst

Explorer
Joined
Aug 17, 2014
Messages
95
Experts,

I have replaced one of the hard drives and the resilvering is still going on, but I am a bit worried since I have already used a third new HDD and it still shows the volume as degraded.

Code:
[root@freenas] ~# zpool status -v
  pool: MAIN
 state: ONLINE
  scan: resilvered 0 in 0h0m with 0 errors on Wed May 30 13:10:26 2018
config:

        NAME                                            STATE     READ WRITE CKSUM
        MAIN                                            ONLINE       0     0     0
          raidz3-0                                      ONLINE       0     0     0
            gptid/844186ca-d7ff-11e6-9efa-a0369f1466cc  ONLINE       0     0     0
            gptid/84e449ec-d7ff-11e6-9efa-a0369f1466cc  ONLINE       0     0     0
            gptid/857e1f75-d7ff-11e6-9efa-a0369f1466cc  ONLINE       0     0     0
            gptid/86163fb3-d7ff-11e6-9efa-a0369f1466cc  ONLINE       0     0     0
            gptid/86af1714-d7ff-11e6-9efa-a0369f1466cc  ONLINE       0     0     0
            gptid/87475013-d7ff-11e6-9efa-a0369f1466cc  ONLINE       0     0     0
            gptid/87dc3aa1-d7ff-11e6-9efa-a0369f1466cc  ONLINE       0     0     0
          raidz3-1                                      ONLINE       0     0     0
            gptid/88860260-d7ff-11e6-9efa-a0369f1466cc  ONLINE       0     0     0
            gptid/8928d9a5-d7ff-11e6-9efa-a0369f1466cc  ONLINE       0     0     0
            gptid/89c997b2-d7ff-11e6-9efa-a0369f1466cc  ONLINE       0     0     0
            gptid/8a62f600-d7ff-11e6-9efa-a0369f1466cc  ONLINE       0     0     0
            gptid/8b062696-d7ff-11e6-9efa-a0369f1466cc  ONLINE       0     0     0
            gptid/8ba5d2fb-d7ff-11e6-9efa-a0369f1466cc  ONLINE       0     0     0
            gptid/8c49327a-d7ff-11e6-9efa-a0369f1466cc  ONLINE       0     0     0
          gptid/764b5ef9-dcc7-11e6-9cc2-a0369f1466cc    ONLINE       0     0     0

errors: No known data errors

  pool: MAIN_BCKP
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
		continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Wed May 30 18:24:08 2018
		575G scanned out of 1.75T at 13.3M/s, 25h57m to go
		67.5G resilvered, 32.11% done
config:

        NAME                                            STATE     READ WRITE CKSUM
        MAIN_BCKP                                       DEGRADED     0     0 2.81K
          raidz2-0                                      DEGRADED     0     0 5.61K
            gptid/1eb6cc45-cdf6-11e7-bd27-a0369f1466cc  DEGRADED     0     0     0  too many errors
            gptid/cad3c87f-6435-11e8-b6c7-a0369f1466cc  DEGRADED     0     0 15.0K  too many errors  (resilvering)
            gptid/7df99082-7d36-11e7-b369-a0369f1466cc  DEGRADED     0     0 19.9K  too many errors  (resilvering)
            gptid/3ffac730-0349-11e7-ad7d-a0369f1466cc  ONLINE       0     0     0  (resilvering)
          raidz2-1                                      ONLINE       0     0     0
            gptid/3f426822-d372-11e7-8a41-a0369f1466cc  ONLINE       0     0     0
            gptid/da04efac-d3b0-11e7-88da-a0369f1466cc  ONLINE       0     0     0
            da16p2                                      ONLINE       0     0     0
            gptid/723e4fb2-d376-11e7-8a41-a0369f1466cc  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

		MAIN_BCKP/MAIN_BCKP:<0x1>

  pool: freenas-boot
 state: ONLINE
  scan: scrub repaired 0 in 0h0m with 0 errors on Thu May 31 03:45:27 2018
config:

		NAME		STATE	 READ WRITE CKSUM
		freenas-boot  ONLINE	   0	 0	 0
		  ada0p2	ONLINE	   0	 0	 0

errors: No known data errors




I assume I should wait for the resilvering to finish and then check the errors again - correct? What worries me:

gptid/1eb6cc45-cdf6-11e7-bd27-a0369f1466cc  DEGRADED  0  0      0  too many errors
gptid/cad3c87f-6435-11e8-b6c7-a0369f1466cc  DEGRADED  0  0  15.0K  too many errors  (resilvering)
gptid/7df99082-7d36-11e7-b369-a0369f1466cc  DEGRADED  0  0  19.9K  too many errors  (resilvering)
gptid/3ffac730-0349-11e7-ad7d-a0369f1466cc  ONLINE    0  0      0  (resilvering)

MAIN has always been OK - I have newer hard drives there. Any thoughts?
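
For reference, a minimal sketch of keeping an eye on the resilver and re-checking the pool once it finishes (pool name taken from the output above; behaviour on FreeNAS 9.x may differ slightly):

Code:
# Watch resilver progress and the per-disk error counters
zpool status -v MAIN_BCKP

# After the resilver completes, verify everything again with a scrub
zpool scrub MAIN_BCKP

# Only if the scrub comes back clean, reset the error counters
zpool clear MAIN_BCKP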
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
What controller are the problem drives connected to?

Sent from my SAMSUNG-SGH-I537 using Tapatalk
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Just to note, @mst is running FreeNAS 9.x.

@mst please post your system specs for this issue. Also, if this issue is why you are considering upgrading to FreeNAS 11 (your question in a different thread), I'd figure out what the problem is here, fix it, and leave the upgrade alone if FreeNAS 9.x works for you.
 

mst

Explorer
Joined
Aug 17, 2014
Messages
95
What controller are the problem drives connected to?

Sent from my SAMSUNG-SGH-I537 using Tapatalk


I have an LSI Logic SAS 9207-8i storage controller, which should be compatible and, as far as I remember, was flashed to the 20.07.00 IT firmware. The storage is a Supermicro chassis with pools of 3TB and 1.5TB drives.
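
As an aside, the flashed firmware version can usually be confirmed from the shell with the LSI utility that ships with FreeNAS (a sketch only; adapter numbering may differ on other systems):

Code:
# Show firmware and BIOS versions for every LSI SAS2 HBA in the box
sas2flash -listall

# More detail for the first adapter
sas2flash -list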
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
All drives connected with the same controller?

Sent from my SAMSUNG-SGH-I537 using Tapatalk
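
One rough way to check this from the FreeNAS shell (output layout varies by system, so this is only a sketch):

Code:
# List every disk along with the controller (SIM) it hangs off
camcontrol devlist -v

# Map the da/ada device names to the gptid labels shown in zpool status
glabel status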
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
You have a couple of big problems:
  • MAIN consists of two seven-disk RAIDZ3 vdevs and one single-disk vdev striped together. When that one disk (gptid/764b5ef9-dcc7-11e6-9cc2-a0369f1466cc) fails, you'll lose your entire pool.
  • You have pool-level metadata corruption on MAIN_BCKP.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
MAIN consists of two seven-disk RAIDZ3 vdevs and one single-disk vdev striped together. When that one disk (gptid/764b5ef9-dcc7-11e6-9cc2-a0369f1466cc) fails, you'll lose your entire pool.
@mst WOW, I didn't notice that when I was looking at it on my phone, but that is a huge problem for you. It kind of blows the whole RAIDz3 redundancy out of the water to have a single disk striped in like that. How did you manage to do that anyhow?
The only fix is to make a full backup of all your data, somewhere reliable, destroy the pool, build it again and then bring the data back.
The MAIN_BCKP pool isn't healthy. I would destroy that pool and run a thorough test on all the drives before making a new pool and copying the data back. I don't know what the issue is, but it may be that you have some old drives here that all started to go bad at once. Last year I had a rash of bad drives in my server because they were all over 5 years old. I was losing a drive every two or three weeks, and at one point I even had two go out on the same day. If you run the MAIN_BCKP pool drives through some destructive testing, it will help to isolate the cause of the problem. If they are older drives, it is a good bet that they are actually having problems.
Do you have regularly scheduled SMART tests running?
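
If not, a minimal sketch of testing one drive by hand from the shell (replace /dev/da0 with each disk in the backup pool; the FreeNAS GUI can schedule the same tests):

Code:
# Start a long (extended) SMART self-test on one drive
smartctl -t long /dev/da0

# Once it finishes, review the self-test log, error log and health attributes
smartctl -a /dev/da0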
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
Look on the bright side, when you destroy this pool:
Code:
		NAME											STATE	 READ WRITE  CKSUM
		MAIN_BCKP									   DEGRADED	 0	 0	  2.81K
		  raidz2-0									  DEGRADED	 0	 0	  5.61K
			gptid/1eb6cc45-cdf6-11e7-bd27-a0369f1466cc  DEGRADED	 0	 0	  0  too many errors
			gptid/cad3c87f-6435-11e8-b6c7-a0369f1466cc  DEGRADED	 0	 0	  15.0K  too many errors  (resilvering)
			gptid/7df99082-7d36-11e7-b369-a0369f1466cc  DEGRADED	 0	 0	  19.9K  too many errors  (resilvering)
			gptid/3ffac730-0349-11e7-ad7d-a0369f1466cc  ONLINE	   0	 0	  0  (resilvering)
		  raidz2-1									  ONLINE	   0	 0	  0
			gptid/3f426822-d372-11e7-8a41-a0369f1466cc  ONLINE	   0	 0	  0
			gptid/da04efac-d3b0-11e7-88da-a0369f1466cc  ONLINE	   0	 0	  0
			da16p2									  ONLINE	   0	 0	  0
			gptid/723e4fb2-d376-11e7-8a41-a0369f1466cc  ONLINE	   0	 0	  0

You can recreate it as a single vdev and you gain two drives worth of storage space.
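
A hedged sketch of what that could look like from the shell, assuming eight freshly tested disks da0 through da7 (in practice the FreeNAS volume manager should build it so the usual swap partitions and gptid labels are created):

Code:
# Illustration only: one 8-disk RAIDZ2 vdev instead of two 4-disk RAIDZ2 vdevs
zpool create MAIN_BCKP raidz2 da0 da1 da2 da3 da4 da5 da6 da7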
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
You have a couple of big problems:
  • MAIN consists of two seven-disk RAIDZ3 vdevs and one single-disk vdev striped together. When that one disk (gptid/764b5ef9-dcc7-11e6-9cc2-a0369f1466cc) fails, you'll lose your entire pool.
  • You have pool-level metadata corruption on MAIN_BCKP.
Great catch!
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
How did you manage to do that anyhow?
Probably another attempted disk replacement gone bad--though how people manage to continue to do this still baffles me.
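
For the record, the classic way it happens at the command line is running zpool add instead of zpool replace; a hedged sketch with placeholder device names:

Code:
# Correct: swap a failed member disk inside the existing RAIDZ3 vdev
zpool replace MAIN <old-gptid> /dev/da20

# Mistake: -f overrides the mismatched-redundancy warning and grafts the new
# disk on as a brand-new single-disk top-level vdev with no redundancy
zpool add -f MAIN /dev/da20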
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080

Like @joeschmuck said, it would make troubleshooting easier if we knew what hardware you are using. It saves some of the "50 questions" bit. Plus, it is in the rules:

Updated Forum Rules 4/11/17
https://forums.freenas.org/index.php?threads/updated-forum-rules-4-11-17.45124/
 

mst

Explorer
Joined
Aug 17, 2014
Messages
95
Oh boy ... it looks like this will be a bigger project. Luckily I use this storage as a backup, but anyway ... recreating both pools would get rid of any corruption - correct?

Indeed, the HDDs are not brand new, but they are all less than 5 years old. Checking so many HDDs one by one will take some time, and recreating the pools is not as easy as it looks.

Thank you all for the advice.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
The only fix is to make a full backup of all your data, somewhere reliable, destroy the pool, build it again and then bring the data back.
...or wait for 11.2, which will allow vdev removal. But I wouldn't want to wait out that risk.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
...or wait for 11.2, which will allow vdev removal. But I wouldn't want to wait out that risk.
Oh no! Beta 1 is scheduled to come out at the end of June, and we all know how well things stick to the schedule. Nope, I'd back up all data and rebuild the pool sooner rather than later. I guess you could back up all your data, wait for the new 11.2, and then give it a real test fixing your issue. Naw, that wouldn't be me.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
recreating both pools would get rid of any corruption - correct?
For the moment, it appears there is no data corruption in the main pool. If the backup pool is a backup of the main pool (guessing) then your data should be relatively safe... Fingers crossed about that single drive striped into main... The thing I would do is use the drive burn-in test on all the drives in the backup pool to try and eliminate any drives there that are defective.

Info on drive burn-in here:

Building, Burn-In, and Testing your FreeNAS system
https://forums.freenas.org/index.php?resources/building-burn-in-and-testing-your-freenas-system.38/

Github repository for FreeNAS scripts, including disk burnin
https://forums.freenas.org/index.ph...for-freenas-scripts-including-disk-burnin.28/

Then build a new backup pool and make a fresh backup of the main pool. After testing that backup pool to ensure it is healthy, (fingers crossed again) you should be able to reconfigure the main pool and copy the data back.

It will likely take days, depending on how much data you need to copy. I am making a backup right now at work that is projected to take 22 days. Your backup will probably be much faster.
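
A hedged sketch of the copy itself using ZFS replication, with a placeholder snapshot name (the FreeNAS replication tasks wrap the same mechanism):

Code:
# Snapshot the source pool recursively
zfs snapshot -r MAIN@migrate1

# Send the whole tree, with properties, into the rebuilt backup pool
zfs send -R MAIN@migrate1 | zfs recv -F MAIN_BCKP/MAIN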
 
Status
Not open for further replies.
Top