SOLVED Geli Encrypted Recovery Issue

Volte

Dabbler
Joined
Feb 11, 2016
Messages
19
Hi all!

I know what you're thinking: Oh no, another "my data is encrypted and I can't recover it!" post. I understand the risks that come with running encrypted disks. I have a peculiar situation nonetheless and would love a few additional brains on the matter.

(I figured I'd take my long narrative and make it hide/showable. If that is not helpful let me know and I'll change it)
I've got four drives (/dev/ada0 - /dev/ada3) set up as mirror/striped. I run them encrypted, and it's been pretty stable since FreeNAS 9.x.

I've had a couple of close calls before where I've nearly lost all my data because somehow the recovery key I had saved was not working. (I've been lurking on the forums for a while now, and done many searches over the years around this stuff and it seems there is some sort of known issue with the downloaded recovery key, etc, but that may not be particularly relevant right now, because the situation is what it is).

Anyhow, I avoided disaster by scouring all my backups and computers, and I found a rogue geli key lying around that ended up working. Hooray! This was a few years ago.

Fast forward to today, and I'd been getting daily reports about my boot volume being in a degraded state (booo). So, aside from waiting two weeks to find time to address it, I was a relatively good citizen and got myself some new USB drives to move over to. (This is not the first time; in fact, the first close call I mentioned above was similar in that the boot drive would not... boot.) Anyhow, I was running 11.2-U4-1 and figured it would be a good time to upgrade anyway. I have daily backups of my `freenas-v1.db`, so I wasn't worried, but I downloaded a copy just in case. I grabbed a fresh 11.2-U6 image, wrote it to one new USB drive, and installed from that onto a second new USB drive.

Meanwhile (and this was stupid, but spoiler alert: it still boots), I took my degraded 11.2-U4-1 USB drive and ran some diagnostics on it, including block recovery (yes, data corruption can occur). So the fresh install finishes, and I reboot into it. I upload my backed-up settings, and it reboots a few times. Once it comes back up, I mosey on over to the Storage section and see my pool sitting there, locked, as expected. I click the unlock button, add my password, and upload the recovery key (which I'd quadruple checked during and after my last close call, saved in a very safe place, and whose MD5 hash I'd written down and saved as well, just to be sure). Here's what I got:

Code:
Error: Traceback (most recent call last):

  File "/usr/local/lib/python3.6/site-packages/tastypie/resources.py", line 219, in wrapper
    response = callback(request, *args, **kwargs)

  File "./freenasUI/api/resources.py", line 949, in unlock
    form.done(obj)

  File "./freenasUI/storage/forms.py", line 2847, in done
    raise MiddlewareError(msg)

freenasUI.middleware.exceptions.MiddlewareError: [MiddlewareError: Volume could not be imported: 4 devices failed to decrypt]


Well, what the heck!? So I try it a few more times, and I try it from the command line (geli attach -k [geli_key_file] [dev_to_unlock]), which informs me "geli: Wrong key for ada3p2", etc.

At this point, the cold sweats start to appear. OK, so the recovery key that I quadruple checked on new clean installs before storing it away for a rainy day doesn't, in fact, work. I was away from my server at this point, but I had brought the old thumb drive with me. I tried booting from it in VirtualBox, intending to check for and recover a different geli key (maybe I goofed again and saved the wrong one?), but to no avail, so by then I was sure I'd corrupted the drive to the point where it couldn't boot. So I ran a data recovery process on it for 3 days in the hope of finding the right geli recovery key. That proved fruitless, but I didn't want to stop the recovery process until it was done.
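Recording a key's MD5, as described earlier in this post, is a good habit, but a matching hash only proves the file hasn't changed since you hashed it. It says nothing about whether that key (with or without a passphrase) still matches a key slot on the disks. A minimal Python sketch of the integrity check, assuming the key is available as a local file; the function names are hypothetical:

```python
import hashlib
from pathlib import Path


def md5_hex(data: bytes) -> str:
    """Hex MD5 digest of raw bytes (fine for integrity checks, not for security)."""
    return hashlib.md5(data).hexdigest()


def key_matches_record(data: bytes, recorded_md5: str) -> bool:
    """True if the key bytes hash to the MD5 you wrote down.

    The recorded value is compared case-insensitively, ignoring surrounding whitespace.
    """
    return md5_hex(data) == recorded_md5.strip().lower()


def check_key_file(path: Path, recorded_md5: str) -> bool:
    """Hypothetical helper: read a key file from disk and compare it with the recorded hash."""
    return key_matches_record(path.read_bytes(), recorded_md5)
```

Usage would be something like check_key_file(Path("geli_recovery.key"), "<the hash you wrote down>"). A True result means "same file as before", not "this will unlock the pool".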

Once it had finished and I still couldn't find the key, I said screw it and tried booting from it (now that I was back with the server). Sure enough, it booted up (although it began the upgrade process to 11.2-U6, which I stopped halfway). Once it was up, I started poking around. I unlocked the pool with just the password, as usual (I assume it's using the geli key in the /data/geli/ directory). Interestingly, though, the /data/geli/[hash].key geli key was zero bytes. I noticed that after I'd scp'd it to my local computer and thought it was odd. Clearly that's useless (but then how did it unlock the drives?).

So I logged into the GUI and started clicking around. I saw an "Encryption Rekey" option and figured, what the heck, I might as well try it while I still had access to my encrypted pool! So I did that, and it overwrote the zero-byte geli key with a 64-byte one. That looks better! I downloaded that key (both through the GUI and through SCP, to be sure), and, now confident that I was armed with the correct key, I shut down, swapped the USB thumb drives, and booted into the fresh 11.2-U6 install. I headed over to the Storage section, went to unlock my pool, and got the same error as above. I also tried again from the command line.
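A zero-byte /data/geli/[hash].key like the one described above is easy to miss until you need it. A small sketch that lists the .key files in that directory with their sizes and flags empty ones, assuming the /data/geli location mentioned in this thread (the helper names are hypothetical):

```python
from pathlib import Path
from typing import Dict, List


def key_sizes(key_dir: Path = Path("/data/geli")) -> Dict[str, int]:
    """Map each *.key file name in key_dir to its size in bytes.

    A missing directory simply yields an empty dict.
    """
    return {p.name: p.stat().st_size for p in sorted(key_dir.glob("*.key"))}


def empty_keys(sizes: Dict[str, int]) -> List[str]:
    """Names of key files that are zero bytes long (useless as a backup)."""
    return [name for name, size in sizes.items() if size == 0]
```

On the NAS, empty_keys(key_sizes()) coming back non-empty is a signal to rekey (or at least investigate) before copying that file off as your backup.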

So, here I am.

TL;DR:
  1. Degraded boot drive (USB) 11.2-U4-1
  2. Run encrypted pool (four drives, mirrored/striped)
  3. Have quadruple confirmed working (apparently not) geli_recovery.key
  4. Fresh install on new USB (11.2-U6).
  5. Try to unlock pool with passphrase and recovery key, fails with error (above)
  6. Try finding a different geli_recovery.key.
  7. Reboot into degraded (but working) 11.2-U4-1 drive, see that geli key is 0 bytes.
  8. Rekey pool. Download geli_recovery.key (note: different MD5 from "original" one).
  9. New install of 11.2-U6 still doesn't accept key and passphrase.
  10. Have tried the command line as well.
Thoughts?

Very best,

~ Dale
 

Volte

Welp. I feel a little sheepish here. I seem to have figured it out.

I started with the Reusing Encryption Keys & Passphrases resource, which linked me to a post titled Recovery Key Pool Import Puzzle, which has a section about importing the pool with the recovery key (called 1.key in that post). Turns out, supplying a passphrase causes the process to fail, as the recovery key does not use a passphrase. The inner workings are more nuanced than this, having to do with the positions of 0.key and 1.key, etc. I'm still working out the exact next steps to get back to where I was originally, but for now, the pool is unlocking with the recovery key (and no passphrase).

Note (for posterity, on top of the links above): to accomplish this via the command line, pass the -p flag (do not use a passphrase as a key component) to the geli attach command.
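For anyone scripting this, here's a minimal sketch of building the geli attach command line, based on geli(8): -p means the key has no passphrase component (the recovery key case), while -j reads the passphrase component from a file (the primary key plus passphrase case); the two options shouldn't be combined. The function name and paths are hypothetical, and the sketch only builds the argument list; running it (for example with subprocess.run) is left to you.

```python
from typing import List, Optional


def geli_attach_argv(provider: str, key_file: str,
                     passphrase_file: Optional[str] = None) -> List[str]:
    """Hypothetical helper: argv for `geli attach` on one provider (e.g. /dev/gptid/...).

    passphrase_file=None -> add -p: the key has no passphrase component
                            (how the recovery key / 1.key is set up).
    passphrase_file=path -> add -j path: read the passphrase component from
                            a file (primary key / 0.key plus passphrase).
    """
    argv = ["geli", "attach", "-k", key_file]
    if passphrase_file is None:
        argv.append("-p")
    else:
        argv += ["-j", passphrase_file]
    argv.append(provider)  # the provider always comes last
    return argv
```

With the recovery key you'd call geli_attach_argv(provider, "/root/recovery.key") once per pool member, never passing a passphrase file alongside it.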

Frustrating, to say the least, but I'm a bit relieved! What an unfortunate use of three days. :(
 
Joined
Oct 18, 2018
Messages
969
Hi @Volte I'm happy my resource and puzzle helped you in your situation. I can help you get back to the "normal" situation, no problem.

FreeNAS does a poor job of naming their various keys. When you "Download Recovery Key" at pool creation you have downloaded what I call 0.key or the primary key; this one has the passphrase. When you "Add Recovery Key" and download that key you get the 1.key or the recovery key.

When I work with encrypted pools I make sure that I always have an up-to-date copy of both keys. To help me understand your exact situation can you answer the following?

1. Do you have both keys that you expect to work, even if you are only able to unlock the pool using one of them?

2. Are you able to unlock the pool at all?
 

Volte

Hi @PhiloEpisteme,

I appreciate your bid to help here! Sorry for the delayed response.

As for your questions:
1. I do not think I have both keys, as I created the pool many years ago, had a close call with recovering my data which involved a similar fresh FreeNAS install scenario, and didn't know what I know now about all these keys, however...
2. I do have a key (which I believe is the "recovery key", what you call the 1.key), and have unlocked my pool.
 
PhiloEpisteme

So, if you have 1 key that works you can certainly get the pool to a state as if nothing happened.

If the pool is available in the GUI then simply click "Encryption Rekey" and save this key, it is User Key 0 (0.key).

Then, add a passphrase if you desire one.

Then, click "Add Recovery Key", this is User Key 1 and has no passphrase.

Keep both keys some place safe.

If the pool is not available try importing via the GUI first, using only the recovery key.

For posterity, and to address some of the struggles you described, I thought I'd offer a brief follow-up. You may have since learned some of this, so it may be more relevant to future readers than to you directly.

I've had a couple of close calls before where I've nearly lost all my data because somehow the recovery key I had saved was not working.
I think it is best to test basic scenarios with encryption prior to using it on real data. Make sure you know what your keys are, how to use them, etc.

I've been lurking on the forums for a while now, and done many searches over the years around this stuff, and it seems there is some sort of known issue with the downloaded recovery key, etc., but that may not be particularly relevant right now, because the situation is what it is
I haven't seen any issues. What I've seen is folks accidentally not keeping up-to-date backups of their recovery key and not regenerating it when they resilver a disk, expand a pool, or add or remove log/cache devices. I am quite curious, though, whether there are or have been legitimate bugs with the recovery key.

I click the unlock button, add my password, upload the recovery key
As you now know, this is likely what caused the issue. I think it may have worked right away if you hadn't provided the passphrase, since recovery keys don't use one. But you seem to have discovered this. :)

geli: Wrong key for ada3p2
This suggests you were trying to unlock the wrong device. Typically you will unlock /dev/gptid/<device>.
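If all you have is a partition name like ada3p2, glabel status shows which gptid label sits on which partition. A rough Python sketch that parses that output, assuming the usual Name / Status / Components columns with one label per line; the function names are hypothetical and the IDs in the usage are made up:

```python
from typing import Dict, Optional


def gptid_map(glabel_status: str) -> Dict[str, str]:
    """Map partition (e.g. 'ada3p2') -> gptid label, parsed from `glabel status` text.

    Assumes three whitespace-separated columns (Name, Status, Components) and
    one label per line; the header and non-gptid labels are skipped.
    """
    mapping: Dict[str, str] = {}
    for line in glabel_status.splitlines():
        fields = line.split()
        if len(fields) != 3 or not fields[0].startswith("gptid/"):
            continue  # header, blank line, or a non-gptid label
        name, _status, component = fields
        mapping[component] = name
    return mapping


def gptid_device(glabel_status: str, partition: str) -> Optional[str]:
    """Return '/dev/gptid/...' for a partition, or None if no gptid label was found."""
    name = gptid_map(glabel_status).get(partition)
    return None if name is None else "/dev/" + name
```

For example, gptid_device(output, "ada3p2") against a line such as "gptid/00000000-0000-0000-0000-000000000003  N/A  ada3p2" would give "/dev/gptid/00000000-0000-0000-0000-000000000003".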
 

Volte

So, if you have 1 key that works you can certainly get the pool to a state as if nothing happened.

If the pool is available in the GUI then simply click "Encryption Rekey" and save this key, it is User Key 0 (0.key).

Then, add a passphrase if you desire one.

Then, click "Add Recovery Key", this is User Key 1 and has no passphrase.
Alright, I gave this a try. A few notes:
1. Upon "rekeying", I was not offered any key to download. So I clicked "download encrypt key", and got a geli.key, with the MD5 of 1f30048a40acd624a9fc514a3bc6b97d
2. Then I clicked "Add recovery key", which did offer me a key to download, the MD5 of which was 14373c532ee28872977b9afb73ea9e55
3. I exported the pool, then reimported it, and was prompted with passphrase/encryption key dialog.
4. I tried with just the passphrase, no dice.
5. Then I tried passphrase and geli.key (supposed User 0.key, what I call master.key), no dice.
6. Then I tried no passphrase, and the "new" recovery key (supposed User 1.key, what I call recovery.key), and that worked.

Conclusion: I'm still in the same spot, but now with many keys in my downloads directory.

Did I eff something up here?

P.S. I really appreciate your engagement here, and I definitely support your notes for posterity! I can only hope this becomes a useful resource for other folks down the line.

I think it is best to test basic scenarios with encryption prior to using it on real data. Make sure you know what your keys are, how to use them, etc.
I agree!

I haven't seen any issues. What I've seen is folks accidentally not keeping up-to-date backups of their recovery key and not regenerating it when they resilver a disk, expand a pool, or add or remove log/cache devices. I am quite curious, though, whether there are or have been legitimate bugs with the recovery key.
I'd have to go look up the old tickets, but I recall seeing some issues around downloading keys, particularly compared with the instructions presented (and while I respect that you're not an official FreeNAS representative, see above regarding my "key download" bit).

As you now know, this likely is what caused the issue. I think it may have worked right away if you hadn't provided the passphrase since recovery keys don't use them. But you seem to have discovered this. :)
Seems this way, although I still seem to be struggling to distinguish the keys. (I promise I'm not a moron haha!)

This suggests you were trying to unlock the wrong device. Typically you will unlock /dev/gptid/<device>.
Good to know! Thank you for this.
 
PhiloEpisteme

Seems this way, although I still seem to be struggling to distinguish the keys. (I promise I'm not a moron haha!)
FreeNAS uses confusing names, it is easy to lose track.

Conclusion, I'm still in the same spot, but now with many keys in my downloads directory
I suggest you try to keep them separate, that way you know which key is which for each iteration. Basically, with each iteration you should end up with two keys, which it sounds like you have.

1. Upon "rekeying", I was not offered any key to download. So I clicked "download encrypt key", and got a geli.key, with the MD5 of 1f30048a40acd624a9fc514a3bc6b97d
This sounds correct.

4. I tried with just the passphrase, no dice.
I think you may have accidentally forgotten to set the passphrase again. When you rekey a pool, it resets the passphrase component.

Because you imported the pool with the recovery key, the new 0.key is no longer valid.

Try the exact same steps you took above except make sure you do the following steps.

1.5 I wasn't clear above, but right after you rekey the pool, be sure to click Add Passphrase if you want a passphrase. You have to do this or no passphrase is set, because rekeying the pool resets the passphrase.

2.5 I didn't suggest this before, but as proof that the keys work, lock the pool and then try to unlock it with your recovery key. Lock it again and unlock it with geli.key and the passphrase. This will prove to yourself that the keys work as expected.

Make sure that when you import, you use geli.key and the passphrase and not the recovery key; otherwise your keys will still be all funky.

It is worth noting that the passphrase component, if set, always still requires the 0.key; you can't use just the key or just the passphrase. The 0.key is stored on your FreeNAS machine, so when you simply unlock the pool you only need to provide the passphrase and the system provides the key. On pool import your system doesn't have the key, so you have to provide both. Unless you do something very crafty, the recovery key (1.key) should never be given at the same time as the passphrase, because it has no passphrase component and will therefore fail when provided with one.
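To make that rule concrete, here's a deliberately simplified Python model. It is not how geli works internally (geli roughly derives a User Key from the key file and passphrase and uses it to decrypt the Master Key); it only captures the behaviour described in this thread: a slot matches only when both the key file and the passphrase component, or its absence, match. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class KeySlot:
    """Toy stand-in for one user-key slot: key file contents plus an optional passphrase."""
    key_file: bytes
    passphrase: Optional[str]  # None: the slot was set without a passphrase


def can_unlock(slots: Dict[int, KeySlot], key_file: bytes,
               passphrase: Optional[str]) -> bool:
    """Succeeds only if some slot matches BOTH the key file and the passphrase.

    Giving a passphrase for a slot that has none is a mismatch, which is why
    recovery key + passphrase fails while the recovery key alone works.
    """
    return any(slot.key_file == key_file and slot.passphrase == passphrase
               for slot in slots.values())
```

Here slot 0 would stand in for geli.key (0.key) plus passphrase and slot 1 for the recovery key (1.key) with no passphrase.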

When all is well, I recommend you get rid of copies of old keys, as they may accidentally lead to confusion over which key is which. Make sure you keep a few copies of the new, correct keys.
 

Volte

FreeNAS uses confusing names, it is easy to lose track.
Right. For example, when I click "Download Encrypt Key", why does it keep referencing "Recovery Key"?
Screen Shot 2019-11-26 at 10.30.26.png


I suggest you try to keep them separate, that way you know which key is which for each iteration. Basically, with each iteration you should end up with two keys, which it sounds like you have.
Agreed. Mostly mentioned it for humor and to illustrate the frustration of the situation :D

I think you may have accidentally forgotten to set the passphrase again. When you rekey a pool, it resets the passphrase component.

...
I think you're right. I just retried the process again, and here's what I get:
Screen Shot 2019-11-26 at 10.27.58.png

It is worth noting that the passphrase component, if set, always still requires the 0.key. You can't use just the key or passphrase. 0.key is stored on your FreeNAS machine so it knows where to get it such that when simply unlocking you can just provide the passphrase and the system provides the key. On pool import your system doesn't have the key so you have to provide both. Unless you do something very crafty the recovery key (1.key) should never be given at the same time as the passphrase because it has no passphrase component and will therefore fail when provided with one.
Question: Should everything work out, and I do import/unlock with 0.key, will recovery key (1.key) be restored, or shall I trash that one, and add a new recovery key?

Also, if I unlock/import with recovery key, couldn't I simply add a passphrase (if it would work, see above), and be at the state of "0.key with passphrase"?

~ Dale
 
PhiloEpisteme

Also, if I unlock/import with recovery key, couldn't I simply add a passphrase (if it would work, see above), and be at the state of "0.key with passphrase"?
Not quite, no. When you import a pool, User Key 0 is set to whichever key you imported with. Thus, if you import with the recovery key, both User Key 0 and User Key 1 will be the recovery key.

I think you're right. I just retried the process again, and here's what I get:
This is unexpected. What do you get when you grep for "geli setkey" in /var/log?

There is a way to manually set these keys and make sure FreeNAS knows which one to look for, but the fact that you're getting errors is indicative of something else possibly being wrong. What do you get from "zpool status" when the pool is unlocked and imported?
Right, for example when I click "Download Encrypt Key", why does it continue to reference "Recovery Key" :
Very terrible naming. Every key you download is always geli.key (0.key), except the one you download when you click "Add Recovery Key". FreeNAS never stores the recovery key, so it can only be downloaded at that time.
 

Volte

Not quite, no. When you import a pool, User Key 0 is set to whichever key you imported with. Thus, if you import with the recovery key, both User Key 0 and User Key 1 will be the recovery key.
Oh ok, and then you could "Remove Recovery Key", "Add Recovery Key" (download it) and "Set Passphrase" to be back to the desired state, right? I'm just looking for a more succinct path in the future, for example, say one lost the master key (0.key)

This is unexpected. What do you get when you grep for "geli setkey" in /var/log?
Code:
/var/log/middlewared.log.1:[2018/10/13 16:49:25] (DEBUG) middleware.notifier._pipeopen():177 - Popen()ing: geli setkey -n 0 -J /tmp/tmphasx576q -K /data/geli/7391a625-82fa-4a0d-8626-81a3dd8b0fa7.key  /dev/gptid/4eb82f89-9131-11e8-8bdb-94de80aecfcf
/var/log/middlewared.log.1:[2018/10/13 16:49:26] (DEBUG) middleware.notifier._pipeerr():186 - geli setkey -n 0 -J /tmp/tmphasx576q -K /data/geli/7391a625-82fa-4a0d-8626-81a3dd8b0fa7.key  /dev/gptid/4eb82f89-9131-11e8-8bdb-94de80aecfcf -> 0
/var/log/middlewared.log.1:[2018/10/13 16:49:26] (DEBUG) middleware.notifier._pipeopen():177 - Popen()ing: geli setkey -n 0 -J /tmp/tmphasx576q -K /data/geli/7391a625-82fa-4a0d-8626-81a3dd8b0fa7.key  /dev/gptid/1c88233d-0f78-11e4-a8a0-94de80aecfcf
/var/log/middlewared.log.1:[2018/10/13 16:49:27] (DEBUG) middleware.notifier._pipeerr():186 - geli setkey -n 0 -J /tmp/tmphasx576q -K /data/geli/7391a625-82fa-4a0d-8626-81a3dd8b0fa7.key  /dev/gptid/1c88233d-0f78-11e4-a8a0-94de80aecfcf -> 0
/var/log/middlewared.log.1:[2018/10/13 16:49:27] (DEBUG) middleware.notifier._pipeopen():177 - Popen()ing: geli setkey -n 0 -J /tmp/tmphasx576q -K /data/geli/7391a625-82fa-4a0d-8626-81a3dd8b0fa7.key  /dev/gptid/1bbd442b-0f78-11e4-a8a0-94de80aecfcf
/var/log/middlewared.log.1:[2018/10/13 16:49:28] (DEBUG) middleware.notifier._pipeerr():186 - geli setkey -n 0 -J /tmp/tmphasx576q -K /data/geli/7391a625-82fa-4a0d-8626-81a3dd8b0fa7.key  /dev/gptid/1bbd442b-0f78-11e4-a8a0-94de80aecfcf -> 0
/var/log/middlewared.log.1:[2018/10/13 16:49:28] (DEBUG) middleware.notifier._pipeopen():177 - Popen()ing: geli setkey -n 0 -J /tmp/tmphasx576q -K /data/geli/7391a625-82fa-4a0d-8626-81a3dd8b0fa7.key  /dev/gptid/1af623bb-0f78-11e4-a8a0-94de80aecfcf
/var/log/middlewared.log.1:[2018/10/13 16:49:29] (DEBUG) middleware.notifier._pipeerr():186 - geli setkey -n 0 -J /tmp/tmphasx576q -K /data/geli/7391a625-82fa-4a0d-8626-81a3dd8b0fa7.key  /dev/gptid/1af623bb-0f78-11e4-a8a0-94de80aecfcf -> 0
Code:
/var/log/debug.log:Nov 26 10:26:59 localhost uwsgi: [middleware.notifier:179] Popen()ing: geli setkey -n 0 -P -K /data/geli/7391a625-82fa-4a0d-8626-81a3dd8b0fa7.key.tmp  gptid/1af623bb-0f78-11e4-a8a0-94de80aecfcf
/var/log/debug.log:Nov 26 10:26:59 localhost uwsgi: [middleware.notifier:188] geli setkey -n 0 -P -K /data/geli/7391a625-82fa-4a0d-8626-81a3dd8b0fa7.key.tmp  gptid/1af623bb-0f78-11e4-a8a0-94de80aecfcf -> 0
/var/log/debug.log:Nov 26 10:26:59 localhost uwsgi: [middleware.notifier:179] Popen()ing: geli setkey -n 0 -P -K /data/geli/7391a625-82fa-4a0d-8626-81a3dd8b0fa7.key.tmp  gptid/1bbd442b-0f78-11e4-a8a0-94de80aecfcf
/var/log/debug.log:Nov 26 10:26:59 localhost uwsgi: [middleware.notifier:188] geli setkey -n 0 -P -K /data/geli/7391a625-82fa-4a0d-8626-81a3dd8b0fa7.key.tmp  gptid/1bbd442b-0f78-11e4-a8a0-94de80aecfcf -> 0
/var/log/debug.log:Nov 26 10:26:59 localhost uwsgi: [middleware.notifier:179] Popen()ing: geli setkey -n 0 -P -K /data/geli/7391a625-82fa-4a0d-8626-81a3dd8b0fa7.key.tmp  gptid/1c88233d-0f78-11e4-a8a0-94de80aecfcf
/var/log/debug.log:Nov 26 10:26:59 localhost uwsgi: [middleware.notifier:188] geli setkey -n 0 -P -K /data/geli/7391a625-82fa-4a0d-8626-81a3dd8b0fa7.key.tmp  gptid/1c88233d-0f78-11e4-a8a0-94de80aecfcf -> 0
/var/log/debug.log:Nov 26 10:26:59 localhost uwsgi: [middleware.notifier:179] Popen()ing: geli setkey -n 0 -P -K /data/geli/7391a625-82fa-4a0d-8626-81a3dd8b0fa7.key.tmp  gptid/4eb82f89-9131-11e8-8bdb-94de80aecfcf
/var/log/debug.log:Nov 26 10:26:59 localhost uwsgi: [middleware.notifier:188] geli setkey -n 0 -P -K /data/geli/7391a625-82fa-4a0d-8626-81a3dd8b0fa7.key.tmp  gptid/4eb82f89-9131-11e8-8bdb-94de80aecfcf -> 0
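Reading those two log excerpts against geli(8): -n 0 targets User Key 0, -K names the new key file, -J reads the new passphrase from a file, and -P sets the new key with no passphrase. If that reading is right, the 2018 entries set slot 0 with a passphrase, while the Nov 26 rekey set slot 0 with none, which fits "rekeying resets the passphrase". A small Python sketch that pulls those fields out of the middleware's "Popen()ing: geli setkey" lines; the log format is taken from the excerpts above and may differ in other FreeNAS versions, and the names are hypothetical:

```python
import re
import shlex
from dataclasses import dataclass
from typing import Optional

# Matches the command part of lines like:
#   ... Popen()ing: geli setkey -n 0 -P -K /data/geli/<id>.key.tmp  gptid/<id>
SETKEY = re.compile(r"Popen\(\)ing: geli setkey (?P<args>.+)$")
OPTS_WITH_VALUE = ("-n", "-K", "-J", "-j", "-k")


@dataclass
class SetkeyCall:
    provider: str
    slot: Optional[int]          # value of -n; None if not given
    new_key_file: Optional[str]  # value of -K
    with_passphrase: bool        # False when -P ("no passphrase") was used


def parse_setkey(line: str) -> Optional[SetkeyCall]:
    """Extract a geli setkey invocation from one middleware log line, if present."""
    match = SETKEY.search(line.rstrip())
    if match is None:
        return None  # e.g. the "_pipeerr ... -> 0" result lines
    args = shlex.split(match.group("args"))
    provider, options = args[-1], args[:-1]
    slot: Optional[int] = None
    new_key_file: Optional[str] = None
    with_passphrase = True
    i = 0
    while i < len(options):
        opt = options[i]
        if opt in OPTS_WITH_VALUE:
            value = options[i + 1] if i + 1 < len(options) else None
            if opt == "-n" and value is not None:
                slot = int(value)
            elif opt == "-K":
                new_key_file = value
            i += 2  # skip the option and its argument
        else:
            if opt == "-P":
                with_passphrase = False
            i += 1
    return SetkeyCall(provider, slot, new_key_file, with_passphrase)
```

Running parse_setkey over every line of a grep for "geli setkey" gives one record per provider, which makes it easy to see which slot was last set and whether a passphrase was included.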

There is a way to manually set these keys and make sure FreeNAS knows which one to look for, but the fact that you're getting errors is indicative of something else possibly being wrong. What do you get from "zpool status" when the pool is unlocked and imported?
Nothing out of the ordinary from what I can see...
Code:
volte@november: ~ [14:50] zpool status
  pool: freenas-boot
 state: ONLINE
  scan: scrub repaired 0 in 0 days 00:00:30 with 0 errors on Mon Nov 25 03:45:30 2019
config:

    NAME        STATE     READ WRITE CKSUM
    freenas-boot  ONLINE       0     0     0
      da0p2     ONLINE       0     0     0

errors: No known data errors

  pool: raid1
 state: ONLINE
  scan: scrub repaired 0 in 0 days 08:39:26 with 0 errors on Sun Nov 24 08:39:29 2019
config:

    NAME                                                STATE     READ WRITE CKSUM
    raid1                                               ONLINE       0     0     0
      raidz2-0                                          ONLINE       0     0     0
        gptid/1af623bb-0f78-11e4-a8a0-94de80aecfcf.eli  ONLINE       0     0     0
        gptid/1bbd442b-0f78-11e4-a8a0-94de80aecfcf.eli  ONLINE       0     0     0
        gptid/1c88233d-0f78-11e4-a8a0-94de80aecfcf.eli  ONLINE       0     0     0
        gptid/4eb82f89-9131-11e8-8bdb-94de80aecfcf.eli  ONLINE       0     0     0

errors: No known data errors
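The .eli suffix on each raid1 member marks a geli-attached provider; stripping it gives the /dev/gptid/... names you would pass to geli attach, and they are the same four gptids that appear in the setkey logs above. A minimal parsing sketch, assuming zpool status text shaped like the output above (the function name is hypothetical):

```python
import re
from typing import List

# A vdev member line such as "gptid/<uuid>.eli  ONLINE  0 0 0"
ELI_MEMBER = re.compile(r"^\s*(gptid/\S+?)\.eli\s", re.MULTILINE)


def eli_providers(zpool_status: str) -> List[str]:
    """Underlying /dev/gptid/... providers of every geli (.eli) member, in order."""
    return ["/dev/" + m.group(1) for m in ELI_MEMBER.finditer(zpool_status)]
```

Unencrypted members such as da0p2 in freenas-boot don't carry the .eli suffix, so they are skipped.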

Very terrible naming. Every key you download is always geli.key (0.key) except the one you download when you click "Add Recovery Key". FreeNAS never stored the recovery key and so it can only be downloaded at that time.
Oooh... very interesting distinction. I suppose that makes sense, given that you can find the geli.key (0.key, master.key) in /data/geli/.
 
PhiloEpisteme

Oh ok, and then you could "Remove Recovery Key", "Add Recovery Key" (download it) and "Set Passphrase" to be back to the desired state, right? I'm just looking for a more succinct path in the future, for example, say one lost the master key (0.key)
The order I suggest is "Rekey Pool", "Download Encrypt Key", "Add Passphrase", "Add Recovery Key".
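Here's a rough Python model of that order, built only from what's been said in this thread: rekeying creates a new User Key 0 with no passphrase, Add Passphrase adds one to slot 0, Add Recovery Key fills slot 1 with a passphrase-less key that can only be downloaded at that moment, and importing sets User Key 0 to whichever key you imported with. It is a hypothetical illustration, not FreeNAS's middleware. In particular it leaves slot 1 alone on rekey, because the thread doesn't establish what rekeying does to an existing recovery key, and the 64-byte key size just mirrors the file size mentioned earlier.

```python
import secrets
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# (key material, passphrase or None)
Slot = Tuple[bytes, Optional[str]]


@dataclass
class PoolKeys:
    """Hypothetical model of User Key slots 0 and 1 as the GUI steps are described here."""
    slots: Dict[int, Slot] = field(default_factory=dict)

    def unlocks(self, key: bytes, passphrase: Optional[str]) -> bool:
        """A key works only together with exactly the passphrase its slot was set with."""
        return (key, passphrase) in self.slots.values()

    def rekey(self) -> None:
        """New User Key 0 with no passphrase (rekeying resets the passphrase)."""
        self.slots[0] = (secrets.token_bytes(64), None)
        # Slot 1 is deliberately left untouched (an assumption of this model).

    def download_encrypt_key(self) -> bytes:
        """The "Download Encrypt Key" button returns geli.key, i.e. User Key 0."""
        return self.slots[0][0]

    def add_passphrase(self, passphrase: str) -> None:
        """Add a passphrase component to User Key 0."""
        key, _ = self.slots[0]
        self.slots[0] = (key, passphrase)

    def add_recovery_key(self) -> bytes:
        """User Key 1: no passphrase, and only downloadable right now."""
        key = secrets.token_bytes(64)
        self.slots[1] = (key, None)
        return key

    def import_with(self, key: bytes, passphrase: Optional[str]) -> None:
        """Import succeeds if some slot matches; User Key 0 then becomes that key."""
        if not self.unlocks(key, passphrase):
            raise ValueError("devices failed to decrypt")
        self.slots[0] = (key, passphrase)
```

Walking through rekey, download_encrypt_key, add_passphrase and add_recovery_key leaves geli.key plus passphrase in slot 0 and the recovery key in slot 1. A later import_with(recovery_key, None) leaves both slots holding the recovery key, which matches why the freshly downloaded geli.key stopped working.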

I don't see any issues in your logs as is. Can you provide the exact steps you took that led to your most recent error? Start with rekeying the pool and follow the order above.
 