Replacing a drive in a ZFS Pool - Resilver Loop and can't replace

Status
Not open for further replies.

jmcnal1

Cadet
Joined
Dec 14, 2017
Messages
9
Hi,

I've seen versions of this on the forum, but I haven't found this exact problem. I am using FreeNAS-11.0-U4 and trying to replace a 1TB drive with a new 4TB drive. The rest of the pool is a 2TB and a 4TB drive. The sequence of events so far:

  1. Powered down, connected the new drive to an open port and rebooted
  2. Selected the pool in the GUI
  3. Selected Volume Status
  4. Selected the drive to be replaced and chose replace
  5. Waited for resilver to be completed and powered down
  6. Removed the old drive and put the new drive in its place
  7. Rebooted; the volume and pool could not be identified
  8. Returned the original 1TB drive to its slot and the new 4TB drive to the spare port
  9. Rebooted, and a resilver began on its own
  10. Waited for the resilver to report complete and returned to step 6
This loop has continued ad infinitum. I've gone through it 4-5 times now, and there seems to be no way to break out of it. I'm a moderately experienced user, so I need some step-by-step instructions for troubleshooting. I have a backup, so if I need to just wipe and rebuild that's OK, but I would rather avoid it if possible.

Thanks in advance for your help!
Jim
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
Sorry, if you have any details in your signature, I can't see them through Tapatalk.
What kind of pool configuration is it? Mirror, RAIDZ2, etc.?
Also, why are you doing a shutdown?
The thing I would suggest is to offline the drive to be replaced. Remove it. Install the new drive into the place where you want it. Then resilver the pool.
That's what I have done for years. Mainly because it runs faster than the online replacement you are trying to do.

Sent from my SAMSUNG-SGH-I537 using Tapatalk
 

jmcnal1

Cadet
Joined
Dec 14, 2017
Messages
9
Thanks Chris,

I have seen that approach, but because I'm stuck in this loop I can't use it. If I disconnect the new drive and return to my original physical configuration (the 1TB drive connected) and choose Offline, I get an error. I think I need a way to break out of the replace loop before I can try what you are suggesting.
 

jmcnal1

Cadet
Joined
Dec 14, 2017
Messages
9
zpool status from the CLI:

Code:
[root@mcnallynas ~]# zpool status
  pool: McNallyNAS-1GB
 state: DEGRADED
  scan: resilvered 944M in 2h4m with 0 errors on Thu Dec 14 07:25:38 2017
config:

        NAME                                            STATE     READ WRITE CKSUM
        McNallyNAS-1GB                                  DEGRADED     0     0     0
          gptid/437156dc-bce6-11e6-9cae-d017c2980d87    ONLINE       0     0     0
          replacing-1                                   DEGRADED     0     0     0
            gptid/03c1176b-c09d-11e6-a125-d017c2980d87  ONLINE       0     0     0
            15961056166164311311                        REMOVED      0     0     0  was /dev/gptid/6e62f453-dd0d-11e7-bb7a-1c1b0d6669f0
          gptid/d0093cbc-7e9a-11e7-8f1f-1c1b0d6669f0    ONLINE       0     0     0
          gptid/ab75e627-45e2-11e7-b261-1c1b0d6669f0    ONLINE       0     0     0

errors: No known data errors

  pool: freenas-boot
 state: ONLINE
  scan: scrub repaired 0 in 0h7m with 0 errors on Mon Dec 11 03:52:18 2017
config:

        NAME          STATE     READ WRITE CKSUM
        freenas-boot  ONLINE       0     0     0
          da0p2       ONLINE       0     0     0

errors: No known data errors
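
To make the stuck state easier to see, here is a minimal Python sketch that scans `zpool status` text for a `replacing-N` vdev and lists its members with their states. The function name `find_replacing_members` and the `SAMPLE` excerpt are illustrative assumptions; the excerpt is trimmed from the output above, and the parser relies only on indentation, so it may not cope with every layout `zpool status` can print.

```python
import re

# Excerpt trimmed from the zpool status output in this post.
SAMPLE = """\
        NAME                                            STATE     READ WRITE CKSUM
        McNallyNAS-1GB                                  DEGRADED     0     0     0
          gptid/437156dc-bce6-11e6-9cae-d017c2980d87    ONLINE       0     0     0
          replacing-1                                   DEGRADED     0     0     0
            gptid/03c1176b-c09d-11e6-a125-d017c2980d87  ONLINE       0     0     0
            15961056166164311311                        REMOVED      0     0     0  was /dev/gptid/6e62f453-dd0d-11e7-bb7a-1c1b0d6669f0
          gptid/d0093cbc-7e9a-11e7-8f1f-1c1b0d6669f0    ONLINE       0     0     0
"""


def find_replacing_members(status_text):
    """Return {replacing_vdev: [(member, state), ...]} found in zpool status text."""
    result = {}
    current = None         # replacing-N vdev we are currently inside, if any
    current_indent = 0     # indentation of that vdev's own line
    for line in status_text.splitlines():
        fields = line.split()
        if not fields:
            current = None  # a blank line ends the config table
            continue
        indent = len(line) - len(line.lstrip())
        if re.fullmatch(r"replacing-\d+", fields[0]):
            current, current_indent = fields[0], indent
            result[current] = []
        elif current is not None:
            if indent > current_indent and len(fields) >= 2:
                # Deeper indentation means this device belongs to the replacing vdev.
                result[current].append((fields[0], fields[1]))
            else:
                current = None
    return result


if __name__ == "__main__":
    for vdev, members in find_replacing_members(SAMPLE).items():
        for name, state in members:
            print(vdev, name, state)
```

Run against the excerpt, this reports the old drive as ONLINE and the new one (shown only by its numeric GUID) as REMOVED inside `replacing-1`, which suggests the pool still considers the replacement unfinished.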
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
That explains it. You can't offline the disk because it would destroy your pool. You don't have any redundancy, just a stripe of individual drives, much like a RAID-0. If any one of your drives fails, you lose the whole pool. It is a good thing you have a backup. This is not a good situation to be in.
I don't know why you got stuck in the loop, but a successful resolution is the only way to avoid having to rebuild from a backup.
@joeschmuck any ideas about this?
 

jmcnal1

Cadet
Joined
Dec 14, 2017
Messages
9
OK, if I have to rebuild, how do I avoid this in the future? I'm guessing my novice status when I originally built this is what put me in this situation.
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
Chris Moore said:
That explains it. You can't offline the disk because it would destroy your pool. You don't have any redundancy, just a stripe of individual drives. Kind of like a RAID-0. If any one of your drives fails, you lose the whole pool. It is a good thing you have a backup. This is not a good situation to be in.
I don't know why you got stuck in the loop, but a successful resolution is the only way to avoid having to rebuild from a backup.
@joeschmuck any ideas about this?

You can still replace a drive in a stripe like this. You have to do it online, the way the OP did. My guess is that you might need to offline the old drive before removing it.
 

jmcnal1

Cadet
Joined
Dec 14, 2017
Messages
9
OK, but it is not letting me offline the old drive (I think) because I'm in the resilver loop. I need to find a way to break out of that.
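
For reference, the ZFS command line has a `zpool detach` subcommand, and detaching the new device from a `replacing` vdev is the commonly described way to cancel a replacement. Whether that clears the loop on this particular pool is untested, so treat the following as a sketch under that assumption. It only builds and prints the command lines rather than running them; the helper name `cancel_replace_commands` is made up for illustration, the pool name and GUID are copied from the `zpool status` output earlier in the thread, and on FreeNAS the GUI is normally the preferred way to change a pool.

```python
import shlex


def cancel_replace_commands(pool, stale_member):
    """Build (but do not run) commands to inspect a pool and detach one device.

    stale_member is the device to drop from the replacing vdev, here the
    new disk that shows up as REMOVED. This is an assumption about what
    might break the loop, not a verified fix.
    """
    return [
        ["zpool", "status", pool],                # look before changing anything
        ["zpool", "detach", pool, stale_member],  # cancel the pending replacement
        ["zpool", "status", pool],                # check that replacing-N is gone
    ]


if __name__ == "__main__":
    for cmd in cancel_replace_commands("McNallyNAS-1GB", "15961056166164311311"):
        print(shlex.join(cmd))
```

Because the pool is a stripe with no redundancy, it would be wise to confirm the backup is current before running anything like this by hand.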
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
I thought the resilver finished? If the resilver never finished, you have corruption in your pool. Can you run a scrub, or when was your last scrub? It might not be possible to run a scrub during a resilver.
 

jmcnal1

Cadet
Joined
Dec 14, 2017
Messages
9
It says it's finished in the GUI. However when I swap the drive it doesn't work and when I put the new drive back on the extra port, the resilver begins again.

Scrubs are run weekly and no issues have been identified.
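
One way to cross-check what the GUI reports before pulling the old drive is to look at the raw `zpool status` text. The sketch below is a rough check, assuming two signs of an unfinished replace: the scan line still reads `resilver in progress`, or a `replacing-N` vdev is still listed. The function name `replace_finished` is hypothetical, and the check is deliberately conservative.

```python
def replace_finished(status_text):
    """Return True if the status text shows no sign of a pending replace.

    Signs checked (an assumption, not an exhaustive list):
      * the scan line still reports "resilver in progress"
      * a replacing-N vdev is still present in the config table
    """
    if "resilver in progress" in status_text:
        return False
    for line in status_text.splitlines():
        fields = line.split()
        if fields and fields[0].startswith("replacing-"):
            return False
    return True


if __name__ == "__main__":
    stuck = (
        "  scan: resilvered 944M in 2h4m with 0 errors on Thu Dec 14 07:25:38 2017\n"
        "          replacing-1   DEGRADED     0     0     0\n"
    )
    print(replace_finished(stuck))
```

By this check, the status output posted earlier would count as unfinished even though its scan line reports a completed resilver, because `replacing-1` is still listed.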
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
You need to try and offline the disk first.
 

jmcnal1

Cadet
Joined
Dec 14, 2017
Messages
9
OK, suggestions as to how to offline it? I don't even see it (ada1) in the zpool status.
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
Huh? It's in the GUI as an option, in the same place as when you do the replace.
 

jmcnal1

Cadet
Joined
Dec 14, 2017
Messages
9
Since I first chose replace, offline is no longer an option for that drive. When I disconnect the new drive and return to the original physical config, choosing offline results in an error.
 

rs225

Guru
Joined
Jun 28, 2014
Messages
878
You need all 5 drives online. Then you see if a complete resilver works. If not, you recreate your pool from scratch. You really ought to do that anyway, and this time make sure it is raidz1.
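
To put rough numbers on the stripe versus raidz1 trade-off, here is a small Python sketch. The drive sizes are only illustrative (taken from the sizes mentioned in the first post, not from the actual pool layout), the function name `usable_tb` is made up, and the figures ignore ZFS metadata and padding overhead, so real usable space will be lower.

```python
def usable_tb(sizes_tb, layout):
    """Approximate usable capacity in TB for a single vdev layout.

    stripe : every disk contributes fully; losing any one disk loses the pool.
    raidz1 : one disk's worth of parity; each disk is effectively limited to
             the size of the smallest member; survives one disk failure.
    """
    if layout == "stripe":
        return sum(sizes_tb)
    if layout == "raidz1":
        if len(sizes_tb) < 2:
            raise ValueError("raidz1 needs at least one data disk plus parity")
        return (len(sizes_tb) - 1) * min(sizes_tb)
    raise ValueError("unknown layout: " + layout)


if __name__ == "__main__":
    for sizes in ([1, 2, 4], [4, 2, 4]):
        print(sizes, "stripe:", usable_tb(sizes, "stripe"),
              "raidz1:", usable_tb(sizes, "raidz1"))
```

With mixed sizes, raidz1 gives up a lot of raw capacity in exchange for surviving a single drive failure, which is why matching drive sizes is usually suggested when rebuilding.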
 

jmcnal1

Cadet
Joined
Dec 14, 2017
Messages
9
rs225 said:
You need all 5 drives online. Then you see if a complete resilver works. If not, you recreate your pool from scratch. You really ought to do that anyway, and this time make sure it is raidz1.

Yes, I think this is exactly where I am headed after everyone's input here.
 