SOLVED HELP!!! Filer will not boot after replacing faulted drive

Status
Not open for further replies.

Havock2

Dabbler
Joined
Apr 15, 2016
Messages
25
I just replaced a faulted drive. My steps are as follows:
  1. Find the serial number of the faulted drive
  2. Pull drive
  3. Replace drive in filer
  4. Go to Zpool status in the GUI
  5. Select the faulted drive
  6. Choose replace
  7. Since this drive previously had data on it, select force
  8. Hit ok
  9. Whole pool goes offline with many other faulted drives
  10. Resilver seems stuck after ~20 minutes
  11. GUI and SSH are non-responsive
  12. Force reboot of the filer
  13. Filer starts boot-looping with many messages stating that it's trying to load datasets, scanning them, and then suspending them
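
Before pulling a drive (steps 1 and 2 above), it can help to confirm from a shell which device ZFS itself reports as faulted. The following is a minimal Python sketch, assuming the text of `zpool status` has already been captured into a string. The function name, the sample text, and the `gptid/...` labels are all hypothetical and made up for illustration. It only reads text and changes nothing on the pool.

```python
# States that usually mean a device needs attention (assumed list, not exhaustive).
PROBLEM_STATES = {"FAULTED", "UNAVAIL", "REMOVED", "OFFLINE"}


def find_problem_devices(status_text):
    """Return (name, state) pairs for config rows whose state is a problem state."""
    problems = []
    for line in status_text.splitlines():
        fields = line.split()
        # Config rows have the shape: NAME STATE READ WRITE CKSUM [notes...]
        # Requiring at least 5 fields skips lines such as "state: DEGRADED".
        if len(fields) >= 5 and fields[1] in PROBLEM_STATES:
            problems.append((fields[0], fields[1]))
    return problems


# Hypothetical sample text, not taken from the system discussed in this thread.
SAMPLE_STATUS = """\
  pool: tank
 state: DEGRADED
config:

    NAME            STATE     READ WRITE CKSUM
    tank            DEGRADED     0     0     0
      raidz2-0      DEGRADED     0     0     0
        gptid/aaaa  ONLINE       0     0     0
        gptid/bbbb  FAULTED      0     0     0  too many errors
        gptid/cccc  ONLINE       0     0     0
"""

for name, state in find_problem_devices(SAMPLE_STATUS):
    print(name, state)
```

If many devices show up at once, as in step 9, that points toward something shared (cabling, power, controller) rather than a single bad disk.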
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
How about providing some more details, per our forum rules? Knowing the FreeNAS version, hardware, and pool configuration will really help here.

Have you referred to User Guide section 8.1.10, Replacing a Failed Drive?

At this point, because of the reboots, I'd reinstall the original failed drive and try again, or at least remove the drive you just installed. You must get the system up and running again so it stops rebooting.

EDIT: Do you have a current backup of your config file? There are things we can do there if needed but don't jump the gun if you want your data back.
 

Havock2

Dabbler
Joined
Apr 15, 2016
Messages
25
Thanks for the suggestion of adding the drive back in. That allowed me to successfully boot.

I traced the cause of my problem to the power rail that I use for half of the drives. It was loose on the PSU side. I've now double-checked all of the other cables on both sides. Everything is secure again.

I was following the steps in the User Guide.

I've added the specs of my filer to my sig.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Glad you found that the problem was just a loose power supply connector. An easy fix, but a little nerve-racking, I'm sure.
 

marcevan

Patron
Joined
Dec 15, 2013
Messages
432
Well, it's standard in any release to offline the drive first: in Storage, select your volume name (e.g., tank or media), choose the Volume Status icon at the bottom, and offline the offending drive.

The entry then reverts to showing its vdev GUID, but when you look at your disks, the serial number is your main concern.
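
For matching a device name to the serial printed on the drive label, here is a minimal Python sketch. It assumes smartmontools is installed and that the text of `smartctl -i` for the device (e.g., ada4) has been captured into a string. The helper name and the sample text are hypothetical, made up for illustration only.

```python
import re


def parse_serial(smartctl_info):
    """Return the serial number from `smartctl -i` text, or None if not found."""
    match = re.search(r"^Serial Number:\s*(\S+)\s*$", smartctl_info, re.MULTILINE)
    return match.group(1) if match else None


# Hypothetical sample text, not from a real drive.
SAMPLE_INFO = """\
Device Model:     EXAMPLE-DISK-3TB
Serial Number:    EXAMPLE1234
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
"""

print(parse_serial(SAMPLE_INFO))
```

Writing the serial down before shutting the box off makes it harder to pull the wrong disk.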

One of my 3TB drives (my ada4) keeps reporting 6 sectors with issues, and since I have already replaced one 3TB with a 4TB, here goes the replacement. I offlined it, and Amazon delivers the new one later today. Then I'll shut down the box: even if it's AHCI compatible, it's way easier to remove and replace the drive if you can fully access the box.

I'll swap it, boot back up, check the drive, and hit Replace. Once it resilvers, I'll re-enable S.M.A.R.T. on it. Then I'll be 40% done with the expansion from 3TB to 4TB.

Obviously there's joy when your last drive flips to the bigger size, and it's super easy since you don't care about the serial number: you just find the one remaining small drive, and that's the only one you're replacing with the bigger size. And then your pool inflates.
 