Replacing a failed drive in an encrypted pool

Status
Not open for further replies.

mike117

Cadet
Joined
Jan 21, 2016
Messages
8
I have a raidz1-0 of four 6 TB drives with encryption enabled. It has been running fine for 2 years with no issues. Recently, however, I got an error saying my zpool was DEGRADED, so I bought a new SATA cable, turned off my NAS, found the drive, replaced the cable, and turned it back on.

On boot the zpool was degraded, with 3 drives listed and no fourth.

The volume status page looked like:
volume
raidz1-0
ada5p2
<numbers>
ada1p2
ada0p2

So I clicked on the entry with numbers and replaced it with ada3p2, which shows up under View Disks (this is the same drive that had issues before but wasn't reporting anymore), hoping that the SATA cable was the only issue. It started resilvering, and after a short while the status of ada3p2 changed from ONLINE to FAULTED. It's currently still resilvering the drive even though it says FAULTED.

I'm guessing the drive is actually bad, so I have a new drive on the way. Because I have encryption enabled and have already resilvered once, I'm not sure what to do.

My current plan is to let the resilver finish, then follow http://doc.freenas.org/9.3/freenas_storage.html#replacing-a-failed-drive to rekey the drive, create a passphrase, download the key, and add a recovery key. Then I'd turn off the NAS, take out the faulted drive, put in the new one, and in the GUI replace the faulted ada3p2 (or whatever numbers it assigns to it) with the new drive. Once that resilver completes, I'd follow http://doc.freenas.org/9.3/freenas_storage.html#replacing-a-failed-drive again to rekey the drive, create a passphrase, download the key, and add a recovery key. Does that all sound right?
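To check the same thing from a saved copy of `zpool status` output instead of the GUI, a small parser can list every row of the config table that is not ONLINE. This is only a sketch: the sample text below is illustrative rather than taken from the attached logs, the numeric device name `12345` and the `UNAVAIL` state are placeholders, and the helper name `non_online_devices` is made up for this example.

```python
def non_online_devices(status_text):
    """Return (name, state) pairs for config rows whose state is not ONLINE."""
    rows = []
    in_config = False
    for line in status_text.splitlines():
        fields = line.split()
        # The config table starts right after the NAME/STATE header row.
        if fields[:2] == ["NAME", "STATE"]:
            in_config = True
            continue
        if in_config:
            if not fields:
                break  # a blank line ends the config table
            if len(fields) >= 2 and fields[1] != "ONLINE":
                rows.append((fields[0], fields[1]))
    return rows


# Illustrative sample only; device names follow the post, the rest is assumed.
SAMPLE = """\
  pool: volume
 state: DEGRADED
config:

    NAME          STATE     READ WRITE CKSUM
    volume        DEGRADED     0     0     0
      raidz1-0    DEGRADED     0     0     0
        ada5p2    ONLINE       0     0     0
        12345     UNAVAIL      0     0     0
        ada1p2    ONLINE       0     0     0
        ada0p2    ONLINE       0     0     0

errors: No known data errors
"""

if __name__ == "__main__":
    for name, state in non_online_devices(SAMPLE):
        print(name, state)
```

The pool and vdev rows are reported along with the member device, since they inherit the degraded state.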
 

Attachments

  • zpool logs.txt
    16.9 KB · Views: 234

fabiob

Dabbler
Joined
Dec 4, 2017
Messages
15
4 × 6 TB on encrypted RAIDZ1, you are definitely an adrenaline addict :)
I'd suggest backing up the zpool before touching anything, then switching off, replacing the drive, and starting the rekeying process. No need to rekey an already faulty drive.
Can you please post /var/log/messages for that fault?
 

mike117

Cadet
Joined
Jan 21, 2016
Messages
8
I have a backup from 6 months ago, and I'm working on getting another. /var/log/messages is filled with "CAM status: Uncorrectable parity/CRC error" on ada3. I've attached it. The resilver has finished on the faulted drive.

So you're saying: don't rekey, turn off, replace the drive, turn on, resilver, then rekey? Why isn't the first rekey needed?
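A quick way to see whether those errors are confined to one disk is to count them per device. The sketch below assumes the usual FreeBSD layout where the device name appears in parentheses before the message, e.g. `(ada3:ahcich3:0:0:0): CAM status: ...`; the sample lines, hostname, and timestamps are invented for illustration and are not copied from the attached messages.txt.

```python
import re
from collections import Counter

# Assumed line layout: "(<device>:<bus info>): CAM status: Uncorrectable parity/CRC error"
CAM_ERROR = re.compile(
    r"\((\w+):[^)]*\): CAM status: Uncorrectable parity/CRC error"
)


def crc_errors_by_device(log_text):
    """Count CAM parity/CRC error lines per device name."""
    return Counter(m.group(1) for m in CAM_ERROR.finditer(log_text))


# Hypothetical sample lines, for illustration only.
SAMPLE_LOG = """\
Jan  5 10:00:01 freenas kernel: (ada3:ahcich3:0:0:0): CAM status: Uncorrectable parity/CRC error
Jan  5 10:00:01 freenas kernel: (ada3:ahcich3:0:0:0): Retrying command
Jan  5 10:00:02 freenas kernel: (ada3:ahcich3:0:0:0): CAM status: Uncorrectable parity/CRC error
"""

if __name__ == "__main__":
    for device, count in crc_errors_by_device(SAMPLE_LOG).most_common():
        print(device, count)
```

To run it against a real log, read the file contents (for example from /var/log/messages) and pass the text to `crc_errors_by_device`.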
 

Attachments

  • messages.txt
    182.2 KB · Views: 271

fabiob

Dabbler
Joined
Dec 4, 2017
Messages
15
Rekeying a faulted device is IMHO useless for several reasons:
- according to the manual, there's no need to do it
- if you reboot and rekey, the drive will become FAULTED again, probably before the rekeying completes
- you are going to replace it anyway

This is the plan:
1) Finish your pool backup and store it away
2) OFFLINE the faulted disk with the GUI button. (If you see a "no valid replicas" error, launch a scrub and retry.)
3) Shut down and replace the disk
4) Switch back on
5) Use the "Replace" GUI button and wait for the resilver to complete
6) DON'T REBOOT, and continue with the section "8.1.10.1. Replacing an Encrypted Drive"
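Between steps 5 and 6 it can help to confirm that the resilver has really finished before rekeying. A minimal sketch, assuming the text comes from `zpool status` for a single pool and that an unfinished resilver shows up as "resilver in progress" on the `scan:` line; the helper name `ready_to_rekey` and both sample snippets (including their timestamps and sizes) are hypothetical.

```python
def ready_to_rekey(status_text):
    """True if the pool state is ONLINE and no resilver is in progress."""
    state = None
    resilvering = False
    for line in status_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("state:"):
            state = stripped.split(":", 1)[1].strip()
        elif stripped.startswith("scan:") and "resilver in progress" in stripped:
            resilvering = True
    return state == "ONLINE" and not resilvering


# Hypothetical snippets for illustration; not taken from the thread's logs.
IN_PROGRESS = """\
  pool: volume
 state: DEGRADED
  scan: resilver in progress since Mon Jan  1 10:00:00 2018
"""

DONE = """\
  pool: volume
 state: ONLINE
  scan: resilvered 1.00T in 5h0m with 0 errors on Mon Jan  1 15:00:00 2018
"""

if __name__ == "__main__":
    print(ready_to_rekey(IN_PROGRESS))
    print(ready_to_rekey(DONE))
```

This only reads text; the rekey itself is still done through the GUI as described in the manual section.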
 

mike117

Cadet
Joined
Jan 21, 2016
Messages
8
Thanks, it worked! The backup finally finished, so I felt safe going ahead with your instructions. There wasn't a button to offline the disk, so I skipped step 2. I followed the rest of the steps and tested by rebooting and checking that the drive still decrypts. I'm still not sure why you need to change the encryption key for the volume after resilvering the drive. I'm guessing that when the box resilvers, it rebuilds the drive with a new key different from the other drives in the volume. Is there a reason why we don't use the same key for the rebuild?
 

fabiob

Dabbler
Joined
Dec 4, 2017
Messages
15
Good to hear everything went fine!
I'm not an authority on encrypted pools, but AFAIK keys are based on gptid, which is unique for each disk.
 