SOLVED Buggy behavior when replacing encrypted disk

hexadecagram

Dabbler
Joined
Jul 15, 2016
Messages
32
Hello fellow FreeNAS fans,

So I just spent about 4 hours trying to replace the ZIL in an iXsystems FreeNAS Mini XL running 11.1-U6. Everything seems to be fine at this point, but I thought I should share my experience, which seems likely to be due to a bug.

Like anyone with a bit of sense, I immediately turned to the User Guide for direction. My first stop was section 8.1.10. Because my pool is encrypted, I was directed from there to section 8.1.10.1, which in turn directed me to 8.1.1.1. One thing to note there is that L2ARC drives are supposedly not encrypted in encrypted pools, yet mine is GELI'ed. This seems to be a new addition to the User Guide, because I don't recall seeing it before. Perhaps that is the cause of this behavior?

Anyhow, section 8.1.1.1 also states that ZIL drives ARE encrypted in encrypted pools. When I check my machine, the ZIL is GELI'ed as well, and my interpretation of what is written is that I need to treat it like any other encrypted drive while replacing it.

So I go back and start re-reading section 8.1.10.1 and see that I need to add a passphrase to my pool to start the process. Okay, done.

Back to section 8.1.10 now to follow the steps as prescribed in 8.1.10.1. Step 1, offline the drive: done, with no complaint from the GUI. Step 2, shut down and replace the drive: done. Step 3, boot back up and click "Replace Disk": only partially done, because now the pool isn't being displayed as Locked as it has been in the past. To my surprise it's marked UNKNOWN!

At that point I ssh into the box, do a zpool status and see that each drive in the pool is marked as UNKNOWN. I fiddle around in the GUI a fair bit and get tracebacks from various python scripts as I try to get information about my pool. Uh oh.

So the first thing I do is check /dev. Come to find out that for some reason GELI didn't attach to the drives on boot. I don't remember everything I did after this point and it took a lot of experimentation on my part to come to this result, but here's a rough summary of what I can remember going through to get the pool back online. Note that at certain points drives were UNAVAIL, but exactly when I can't remember.
  1. Shut the machine down once again, put the old ZIL back in, and it still came back UNKNOWN.
  2. Shut it down again, reinstalled the new ZIL, and booted up, pool is UNKNOWN yet again.
  3. Noticed many repeating log messages such as this:
    ZFS: vdev state changed, pool_guid=mypoolguid vdev_guid=randomnumber
  4. zpool export mypool; ( for prov in /dev/gptid/*; do geli attach -k /data/geli/mypool.key "$prov"; done ); zpool import -m mypool
  5. Clicked "Replace Disk" in the GUI, waited for resilvering to complete, followed the remaining steps in section 8.1.10.1.
  6. Reboot.
  7. Once again GELI devices did not attach on boot (but notice that /dev/mirror/swap[0-3].eli are up).
  8. zpool export mypool; ( for prov in /dev/gptid/*; do geli attach -k /data/geli/mypool.key "$prov"; done ); zpool import mypool
  9. Change passphrase (reused the same one I used when reading section 8.1.10.1).
  10. Reboot.
  11. Yay! Pool is in Locked state.
  12. Unlock and reboot.
  13. Again in Locked state.
  14. Unlock, remove passphrase, and reboot.
  15. Pool is online.
  16. Reboot again.
  17. Pool is online.
As I'm sure you can see, this is a bit more than what is described in the User Guide.
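
For anyone who ends up in the same spot, here is a rough sketch of the manual export / attach / import sequence from steps 4 and 8, wrapped in a function. The pool name and key path are the ones from my box; the DRY_RUN switch and the function names are my own invention for illustration, so treat this as a starting point under those assumptions rather than a supported procedure.

```shell
#!/bin/sh
# Sketch of the manual recovery from steps 4 and 8 above.
# DRY_RUN, run and reattach_pool are hypothetical helpers, not FreeNAS tools.
DRY_RUN="${DRY_RUN:-1}"   # default: only print what would be executed

run() {
    # Print the command in dry-run mode, execute it otherwise.
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

reattach_pool() {
    # usage: reattach_pool POOL KEYFILE PROVIDER...
    pool="$1"; keyfile="$2"; shift 2
    run zpool export "$pool"
    for prov in "$@"; do
        # Skip nodes that are already attached GELI devices.
        case "$prov" in
            *.eli) continue ;;
        esac
        # geli prompts for the passphrase here if one is set on the provider.
        run geli attach -k "$keyfile" "$prov"
    done
    # -m lets the import proceed even if a log device is missing.
    run zpool import -m "$pool"
}
```

With the defaults it only prints the commands. Something like `DRY_RUN=0 reattach_pool mypool /data/geli/mypool.key /dev/gptid/*` would actually run them.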
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
Thanks for documenting it. Encryption has caused many people trouble over the years.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
One thing to note there is that L2ARC drives are supposedly not encrypted in encrypted pools, yet mine is GELI'ed.
They are supposed to be encrypted. It's a docs bug that will be fixed in 11.2.

As for the details of what happened, the encryption management code is pretty fragile. Surprisingly fragile, really. It will probably be nuked in the future. Until then, backups of the keys and GELI metadata, as well as some understanding of how to use GELI, are prerequisites for embarking on encryption in FreeNAS, in my opinion.
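
As a minimal sketch of what such a backup could look like: copy the pool key file and save each provider's GELI metadata with `geli backup`. The destination directory, the `.meta` suffix and the helper names are assumptions for illustration only; the key path is the one mentioned earlier in the thread.

```shell
#!/bin/sh
# Sketch: back up a pool's GELI key file and per-provider metadata.
# DRY_RUN, run and backup_geli are hypothetical helpers.
DRY_RUN="${DRY_RUN:-1}"   # default: only print what would be executed

run() {
    # Print the command in dry-run mode, execute it otherwise.
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

backup_geli() {
    # usage: backup_geli DESTDIR KEYFILE PROVIDER...
    dest="$1"; keyfile="$2"; shift 2
    run mkdir -p "$dest"
    run cp -p "$keyfile" "$dest/"
    for prov in "$@"; do
        # "geli backup" writes the provider's metadata to a file.
        run geli backup "$prov" "$dest/$(basename "$prov").meta"
    done
}
```

The backup should of course live somewhere other than the encrypted pool itself.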
 

hexadecagram

Dabbler
Joined
Jul 15, 2016
Messages
32
As for the details of what happened, the encryption management code is pretty fragile. Surprisingly fragile, really. It will probably be nuked in the future.

If I'm understanding you correctly, this is a FreeNAS issue and not a FreeBSD issue.

Is GELI going to be replaced or will disk encryption be removed entirely?
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
Is GELI going to be replaced or will disk encryption be removed entirely?
There is ZFS 'native' encryption on the way at some point in the future. It has already been developed but needs to be ported to the FreeBSD branch of OpenZFS. I have not tracked how far that has progressed, but I remember reading that the task had been started.

There are several branches of development for OpenZFS: FreeBSD, TrueOS, OS X (Mac), Linux, Illumos, Delphix, Joyent and possibly others. Things (like native encryption) all get upstreamed into OpenZFS at some point, and then filter back down to all the other operating systems. It just takes some time.

The other thing to realize about FreeNAS is that, even though it is based on FreeBSD, the developers of FreeNAS cherry-pick features to incorporate into what is essentially another flavor of the operating system. Not all features of FreeBSD are rolled into FreeNAS, and that has caused issues for some users who are very familiar with FreeBSD, because they expect features to be present that are not and can't be installed.

PS. I went and did some quick searching, and it appears that native encryption is due in FreeBSD 12.
 

hexadecagram

Dabbler
Joined
Jul 15, 2016
Messages
32
Thanks for the info, Chris!

Any idea if there will be a pool conversion feature or should I start thinking about pushing my bits to cloud storage?
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
You can't encrypt everything with native encryption; you'll still need GELI for that sort of thing. Encryption will be per dataset, allowing for things like scrubs on encrypted data and sending encrypted data. It's really cool and someone will probably share a link about it (on my phone, sorry).
 

hexadecagram

Dabbler
Joined
Jul 15, 2016
Messages
32
Found this: https://blog.heckel.xyz/2017/01/08/zfs-encryption-openzfs-zfs-on-linux/

Per dataset might be fine for my use case, which I assume would be a simple matter of creating new, encrypted datasets and copying the bits over from their unencrypted counterparts.

It may be worth pointing out that the article I linked says that volume-level encryption will be possible. But it's also written with a focus on Linux and ZFS may be going a different route on Linux.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Per dataset might be fine for my use case, which I assume would be a simple matter of creating new, encrypted datasets and copying the bits over from their unencrypted counterparts.
Yup, just (regular) send it over.

It may be worth pointing out that the article I linked says that volume-level encryption will be possible. But it's also written with a focus on Linux and ZFS may be going a different route on Linux.
Not sure what that's about. You could conceivably encrypt all datasets (I'm not 100% sure if the top-level dataset can be encrypted, though), but that will always leave a bunch of metadata unencrypted, so that you can do useful things to the data. This applies to all OpenZFS environments.
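
A rough sketch of that migration, assuming OpenZFS native encryption is available on the system (it is not in FreeNAS 11.1): create an encrypted parent dataset, snapshot the source, and do a regular send into a child of the encrypted parent, which then inherits its encryption. The dataset names, the `@migrate` snapshot name and the helper functions are hypothetical.

```shell
#!/bin/sh
# Sketch: copy an unencrypted dataset into a new natively encrypted one.
# DRY_RUN, run, send_recv and migrate_dataset are hypothetical helpers.
DRY_RUN="${DRY_RUN:-1}"   # default: only print what would be executed

run() {
    # Print the command in dry-run mode, execute it otherwise.
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

send_recv() {
    # Regular (non-raw) send; the received child inherits the
    # encryption settings of its encrypted parent dataset.
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ zfs send $1 | zfs receive $2"
    else
        zfs send "$1" | zfs receive "$2"
    fi
}

migrate_dataset() {
    # usage: migrate_dataset SOURCE_DATASET ENCRYPTED_PARENT
    src="$1"; parent="$2"
    snap="$src@migrate"
    # With keyformat=passphrase, zfs create prompts for the passphrase.
    run zfs create -o encryption=on -o keyformat=passphrase "$parent"
    run zfs snapshot "$snap"
    send_recv "$snap" "$parent/$(basename "$src")"
}
```

For example, `migrate_dataset mypool/data mypool/secure` prints the three commands it would run. The original dataset would still need to be verified against the copy and destroyed separately.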
 

pro lamer

Guru
Joined
Feb 16, 2018
Messages
626
like scrubs on encrypted data
It can be a nice feature, especially for datasets with passphrases! A scrub can run without needing to enter the passphrase...

As well as resilvering without need of unlocking (although I don't know enough about native encryption to know if it is possible too)

 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
As well as resilvering without need of unlocking (although I don't know enough about native encryption to know if it is possible too)
Of course it is, a resilver is just a scrub where you're writing data all the time.

The design of the feature is along the lines of "I want that guy to have just enough information to keep the backups of my data safe, including against bit rot, without actually being able to read it."
 