FreeNAS 11.2-U5 Replacing encrypted disk in Pool

cmcasanova

Cadet
Joined
Jul 3, 2019
Messages
2
Hello, I have gone through the guides and am looking for confirmation of my steps before replacing a failing disk in my encrypted pool, running FreeNAS 11.2-U5. Note that I am primarily a Cisco network/security engineer who has been treading lightly in the storage space over the past few years, as needed professionally. I have had FreeNAS running for just about 3 years. I have a basic understanding, so feel free to berate me as necessary if I am incorrect or misspeak.

pre-req: passphrase for encryption set (able to unlock with passphrase only, this should be confirmation?), download recovery key (geli_recovery.key), and encryption key (geli.key)
step 1: OFFLINE bad disk from GUI, taking note of serial number (Already OFFLINE'd the failing disk, and Pool now shows DEGRADED, Parity is fine)
step 2: shut down / physically replace disk (WD-WCC4N1DZJL5V) with new disk (same model/size). Boot the system back up.
step 3: From Storage -> Pools -> Pool Status -> Select "ada4" on the right, click REPLACE disk, confirming passphrase for the encrypted pool
step 4: wait for Resilvering to complete (takes a few hours from what I have read from previous posts)
step 5: Restore the encryption keys before the next reboot or access to the pool will be permanently lost.

Notes:
-Pool is RAIDZ1 with 6 total disks at 3TB each.
-SATA ports are full, so a shutdown would be required prior to installing new hard drive (and removing the failing disk)

Outstanding questions I had:
1. After shutting down to install the new disk (they are not hot-swappable and no extra SATA ports exist on the motherboard), and the system boots back up, do I need to unlock my Pool at this point like I normally would after a cold boot?
2. How do you "restore the encryption keys" after the disk has been replaced?
 

kingc

Dabbler
Joined
Jul 2, 2019
Messages
17
The documentation isn't great in this area. In fact, the process isn't great. As it happens, I've been testing recovery scenarios in a virtualised environment recently. This is what I've found. Sorry, it's a bit long. I've done a fair amount of testing on 11.2-U4, but as always, use this information at your own risk.

Background:
--------------
* Each disk is independently encrypted (for now, until native ZFS encryption is included), using separate volume master keys.
* The GELI metadata on each disk contains 2 keyslots:
Slot 0: The master key encrypted with the "normal" key + optional password.
    This "normal" key is stored at /data/geli/****.key and can be downloaded via the "Pool->Download Encryption Key" function in the UI.
    It is downloaded as geli.key.
    Setting a password is optional, but if set, it becomes a component of slot 0, so both the password and the "normal" key are required to unlock the volume.
Slot 1: The master key encrypted with the recovery key. A recovery key is not stored within FreeNAS and there isn't one by default.
    It is generated using the "Pool->Add Recovery Key" function in the UI and automatically downloaded as geli_recovery.key at that time.
    Slot 1 does not use a password component in FreeNAS.

So, to unlock a particular disk, you need to have the "normal" key and optional password OR recovery key FOR THAT DISK. When working with a pool as a whole (i.e. enabling encryption for the first time, or re-keying later), FreeNAS synchronises the relevant keys across all disks. However, when replacing an individual disk, it does not, and we need to trigger this manually. Hence the possibility of data loss if this isn't done: after a reboot, it may not be possible to unlock the replacement disk if its key isn't known.
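To make the synchronisation point concrete, here is a toy model in Python. It is only a sketch of the bookkeeping described above, not how GELI works internally: the Disk class, the key labels and the "unsynchronised-placeholder" value are all made up for illustration, and what actually ends up in the slots of a replacement disk is an assumption here.

```python
from dataclasses import dataclass, field


@dataclass
class Disk:
    """Toy stand-in for one encrypted disk: slot number -> label of the key that opens it."""
    name: str
    slots: dict = field(default_factory=dict)

    def opens_with(self, key_label: str) -> bool:
        return key_label in self.slots.values()


def sync_slot(disks, slot, key_label):
    """Write the same key label into one slot on every disk (a pool-wide operation in this model)."""
    for disk in disks:
        disk.slots[slot] = key_label


def all_disks_open_with(disks, key_label):
    """True only if every member disk accepts the key."""
    return all(disk.opens_with(key_label) for disk in disks)


pool = [Disk(f"ada{i}") for i in range(6)]
sync_slot(pool, 0, "normal-key-A")
sync_slot(pool, 1, "recovery-key-A")

# Replace ada4. ASSUMPTION: the new disk's slots are not in sync with the rest;
# the placeholder label stands for "whatever the replace operation wrote".
pool[4] = Disk("ada4", slots={0: "unsynchronised-placeholder"})
print(all_disks_open_with(pool, "recovery-key-A"))  # False: ada4 has no matching slot

# A pool-wide re-key plus a fresh recovery key brings every disk back in line.
sync_slot(pool, 0, "normal-key-B")
sync_slot(pool, 1, "recovery-key-B")
print(all_disks_open_with(pool, "normal-key-B"), all_disks_open_with(pool, "recovery-key-B"))  # True True
```

The only point of the model is that the old key files stop being sufficient once one disk is out of step, and that a pool-wide re-key followed by a new recovery key puts every disk back on the same pair of keys.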

Disk replacement:
--------------------
You should have:
* A backup of your pool if at all possible.
* The pool "normal" key (geli.key), downloaded from the UI as mentioned above, or copied from /data/geli/***.key.
* The passphrase, if set. Not sure why the documentation requires a passphrase to be set, as it's just a component of slot 0. Nevertheless, go ahead and set one if you want (it'll soon be lost - see later).
* A recovery key (geli_recovery.key), created in the UI, as mentioned above. Not strictly necessary, but good practice.

WARNING: There appears to be a bug (as of 11.2-U4) where setting the FIRST passphrase on a pool erases any recovery key already set. This doesn't happen when changing an existing passphrase. I've logged this as a bug with iX. For this reason, it is best to set a passphrase BEFORE creating a recovery key, if you decide to use both.

From there:
0. Ensure you have a backup of your pool if at all possible.
1. Offline the faulty disk, if it's still operational.
2. Shut down if necessary, then physically replace the faulty disk.
3. If the machine was shut down to replace the disk, start up again and unlock your pool in the usual way.
4. Replace faulty disk with new disk under "Pool->Status". If you had a passphrase set on the pool, you'll need to enter it at this point.
5. The pool will start resilvering. Don't wait for it to complete, rather continue immediately to step 6.
6. For safety, sort out your keys immediately at this point. To do this:
* Select "Pool->Encryption Rekey". This generates a new "normal" key in slot 0 for the pool and synchronises it across all disks in the pool. Because it works on slot 0 as a whole, it also clears out any password that was set. So, after a re-key, you should replace any stored copies of the "normal" key with the new one (i.e. "Pool->Download Encryption Key" again) and set a passphrase again if you need one.
* If you're using recovery keys, add a new recovery key to the pool in slot 1 ("Pool->Add Recovery Key"). This is to ensure that the recovery key is synchronised across all disks in the pool. Again, replace any recovery key you had previously stored with the new geli_recovery.key that will be automatically downloaded.
7. Wait for resilvering to complete.
...
8. Lock and unlock your pool to check your keys.
9. Consume your favourite beverage.

HTH
 

cmcasanova

Cadet
Joined
Jul 3, 2019
Messages
2
kingc,

Thanks for the awesome reply and information, much more than I expected to receive. One follow-on question and an update on where I am.

The resilver started, I re-keyed, and I downloaded both the encryption key and the recovery key. Previously I could unlock with the passphrase alone, so I set the passphrase again (it seemed to be removed when I replaced the disk). I have not tried to lock and unlock yet, because this morning I noticed that FreeNAS brought the new ada4 online but did not remove the old disk: "zpool status" still shows the old disk within the pool. The new disk seems to have taken the encryption, though, as it has .eli on the disk line itself. I did notice that some of the information on the disk line matches across the other 5 disks, but not on this new disk. I am not sure whether this is a problem; I just do not want to lock and unlock and then realize I missed something. Have you seen this during the testing you mentioned?

I also noticed that the resilvering started again this morning, after I noticed that the passphrase was gone again from the pool... At this point I am afraid that I may lose data if I were to lock the pool. Luckily, I have a 90 minute UPS on this guy, so if I were to lose power, I should be alright for a bit.

NAME STATE READ WRITE CKSUM
Disk_Pool DEGRADED 0 0 0
  raidz1-0 DEGRADED 0 0 0
    gptid/5be02082-47e8-11e7-bee8-d050990a6c49.eli ONLINE 0 0 0
    gptid/5c9dcebb-47e8-11e7-bee8-d050990a6c49.eli ONLINE 0 0 0
    gptid/5d61fd78-47e8-11e7-bee8-d050990a6c49.eli ONLINE 0 0 0
    gptid/5e20f328-47e8-11e7-bee8-d050990a6c49.eli ONLINE 0 0 0
    replacing-4 DEGRADED 0 0 0
      10974374115521224185 OFFLINE 0 0 0 was /dev/gptid/5f3c342d-47e8-11e7-bee8-d050990a6c49.eli
      gptid/6668cb79-9e1f-11e9-ba69-d050990a6c49.eli ONLINE 0 0 0
    gptid/6054c5c2-47e8-11e7-bee8-d050990a6c49.eli ONLINE 0 0 0

Chris
 

kingc

Dabbler
Joined
Jul 2, 2019
Messages
17
previously was able to only use a passphrase to unlock, so I set the passphrase again (it seemed to remove it when I replaced the disk)

Yes, as I said, when you re-key, the entirety of slot 0 seems to be replaced - including your previous password. So, adding it again was necessary.

I also noticed that the resilvering started again this morning after I noticed that the passphrase was gone again from the pool

How do you know that without locking and unlocking it, which you say you haven't done?

but "zpool status" still showed the old disk within the pool

It says it's being replaced with your new disk:

replacing-4 DEGRADED 0 0 0
  10974374115521224185 OFFLINE 0 0 0 was /dev/gptid/5f3c342d-47e8-11e7-bee8-d050990a6c49.eli
  gptid/6668cb79-9e1f-11e9-ba69-d050990a6c49.eli ONLINE 0 0 0

Just let it complete resilvering.
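
If you'd rather check this from a script than by eye, here is a rough Python sketch that scans "zpool status" text for a "replacing-N" entry and for OFFLINE or DEGRADED members. The function name summarize_replacement is made up, and the parsing assumes the simple whitespace-separated layout shown in this thread; real output has more sections, and this only skips the "label:" lines among them.

```python
def summarize_replacement(status_text: str) -> dict:
    """Collect replacing vdevs and OFFLINE/DEGRADED entries from zpool status text."""
    summary = {"replacing": [], "offline": [], "degraded": []}
    for raw_line in status_text.splitlines():
        fields = raw_line.split()
        if len(fields) < 2:
            continue
        name, state = fields[0], fields[1]
        if name.endswith(":"):
            # Skip header lines such as "pool:", "state:" or "scan:".
            continue
        if name.startswith("replacing-"):
            summary["replacing"].append(name)
        if state == "OFFLINE":
            summary["offline"].append(name)
        elif state == "DEGRADED":
            summary["degraded"].append(name)
    return summary


sample = """\
NAME STATE READ WRITE CKSUM
Disk_Pool DEGRADED 0 0 0
raidz1-0 DEGRADED 0 0 0
replacing-4 DEGRADED 0 0 0
10974374115521224185 OFFLINE 0 0 0 was /dev/gptid/5f3c342d-47e8-11e7-bee8-d050990a6c49.eli
gptid/6668cb79-9e1f-11e9-ba69-d050990a6c49.eli ONLINE 0 0 0
"""

result = summarize_replacement(sample)
print(result["replacing"])  # ['replacing-4']
print(result["offline"])    # ['10974374115521224185']
```

While a "replacing-N" entry is listed, the old member is still attached alongside the new one; going by the advice above, the thing to do is wait for the resilver to finish and then look again.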

As for the IDs: the .eli device is named after the underlying GPT ID, which is a UUID. So it has nothing to do with encryption.
 