HELP: Extended GELI-encrypted pool, now some disks have different keys

neutralitus (Cadet, joined Dec 16, 2017, 2 messages)
Hi All,

I extended a GELI-encrypted pool on TrueNAS CORE 13.0-U1. Rekeying failed while the pool was being extended, so the old disks were never rekeyed. The pool has 44 disks (7x 6-disk raidz2, 1 spare, 1 cache). When I try to unlock the pool (after stupidly locking it), 38 of the disks unlock with the old key and 6 with the new key.

Is there a way I can mount this manually? I can see in dmesg that between the 2 key files I can unlock all the disks, but as soon as TrueNAS fails to unlock one of the disks, it relocks the GELI disks it DID successfully unlock. If I can attach each GELI container manually, I think I should be able to import the zpool as well.

After I do that, I will need to find a way to get all the GELI keys in sync. I cannot afford to buy 42 more disks and zfs send to a new pool.

Please help me!

Christian
 

artlessknave (Wizard, joined Oct 29, 2016, 1,506 messages)
The people most likely to be able to help with this generally do not reply to posts that haven't followed the forum rules.

You need to redesign your pool, and you NEED to get rid of GELI; it will probably be going away entirely fairly soon. There are instructions out there to decrypt each drive, one by one, until the pool no longer has encrypted drives. You will, however, want a backup first. You don't need to copy the same pool layout; you can make a single striped pool of a few large drives. Having no backup is a recipe for disaster with GELI.
 

neutralitus (Cadet, joined Dec 16, 2017, 2 messages)
Ah, sorry for not doing so; I was panicking.

Anyway, I ended up solving my own problem, mostly by writing bash scripts.

*Read the geli man page at https://www.freebsd.org/cgi/man.cgi?geli(8)
*Try unlocking the pool, and find the /dev paths TrueNAS was using for each GELI-encrypted partition (the disks can be accessed through multiple paths).
Since I was able to attach some of the disks with one key+passphrase and the rest with another key, dmesg showed what each disk needed and its node in /dev. I then ran some regexes over the dmesg output to get the two lists of partitions.
*Use "geli backup" to save metadata of each geli partition (VERY IMPORTANT. I figured that if I made a mistake with "geli setkey" I could always restore the old metadata)
*Figure out what key/password combination each drive needs to geli attach
*"geli attach" all volumes
*"zpool import -R /mnt poolname"
*pool now shows as mounted in webui!

*on each GELI partition: "geli setkey -j filewitholdpassphrase -J filewithnewpassphrase -k filewitholdkey -K filewithnewkey"
note: on some partitions I had to remove -j and add -p, since the webui apparently never set a passphrase on them at all.
also, I ended up reusing the old key file as my new key file. I really should have generated a new one for security, but I was lazy. It's stored in /data/geli.
*lock the pool from the webui (actually I just rebooted, since TrueNAS had decided to use this particular pool to hold my home directories and I was too lazy to mess with that)
*unlock pool from webui with new password: SUCCESS!
*add new recovery key to pool
*reboot again
*unlock pool from webui: still working. All done.
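The backup and re-key steps from the list can be sketched as two shell functions. This is a minimal sketch, not the scripts used above: the function names, the DRY_RUN and NO_OLD_PASS switches, and every path and provider name are hypothetical. Only the geli subcommands and flags (backup, setkey with -j/-J/-k/-K and -p) come from the steps above and geli(8).

```shell
#!/bin/sh
# HYPOTHETICAL sketch: backup directory, key/passphrase file paths and
# provider names are placeholders, not what TrueNAS actually uses.

# Print the command instead of running it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Save each provider's GELI metadata before touching anything.
# A saved file can later be written back with "geli restore file prov".
backup_all() {
    dir=$1; shift
    run mkdir -p "$dir"
    for prov in "$@"; do
        # gptid/abc -> gptid_abc so the provider name is a valid file name
        name=$(echo "$prov" | tr '/' '_')
        run geli backup "$prov" "$dir/$name.eli"
    done
}

# Re-key one provider from the old key+passphrase to the new ones.
# NO_OLD_PASS=1 swaps -j for -p, for providers that never got a passphrase.
rekey() {
    oldkey=$1; oldpass=$2; newkey=$3; newpass=$4; prov=$5
    if [ "${NO_OLD_PASS:-0}" = 1 ]; then
        run geli setkey -p -J "$newpass" -k "$oldkey" -K "$newkey" "$prov"
    else
        run geli setkey -j "$oldpass" -J "$newpass" -k "$oldkey" -K "$newkey" "$prov"
    fi
}
```

Running, for example, `DRY_RUN=1 backup_all /root/geli-backup da0p2 da1p2` first prints the exact commands so they can be checked before anything is written to the disks.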

(I hope this helps other people)

Sorry for not reading the manpages first. I really was freaking out.

I'd love to back up my pools, but I have hundreds of terabytes of 4K camera footage and personal data across a bunch of 6TB SAS datacentre disks in a few raidz2 pools. For a while I had 2 FreeNAS servers and used zfs send over a 10GbE link to send differential snapshots, but I ran out of space and could not afford more disks (and my apartment probably couldn't handle the extra heat anyway).

Even for a backup used only to rebuild a pool, I wouldn't trust anything less than raidz1 or mirrors, even though none of these disks has ever failed on me. I really need to create a bunch of smaller pools and migrate my data to them one at a time, so that a rebuild only ever involves one smaller pool.

I use GELI because, from what I've read, ZFS native encryption does not encrypt all of the metadata, which is probably a no-go for me. I suppose I'll have to switch to vanilla FreeBSD, or Ubuntu with LUKS, at some point, unless I am somehow convinced to trust ZFS native encryption. This experience taught me that it wouldn't be too hard to write a shell script to unlock and import my pool (which I actually did during the process), and it would give me more control. Seeing Python scripts give meaningless errors in the webui scares me!
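An unlock-and-import script of the kind described could look roughly like this. Unlike the web UI, it tries every key/passphrase pair on each provider, leaves providers that did attach attached, and imports the pool only once every provider is attached. This is a hypothetical sketch: the function names, the DRY_RUN switch, the "keyfile:passfile" convention (empty passfile meaning no passphrase) and all paths and pool names are assumptions; the geli attach flags come from geli(8) and the import command from the steps earlier in the thread.

```shell
#!/bin/sh
# HYPOTHETICAL sketch: pool name, provider names and key/passphrase file
# paths are placeholders. Providers that attach are never detached again.

# Print the command instead of running it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Attach one provider: $1 keyfile, $2 passfile (empty = no passphrase), $3 provider.
attach_one() {
    if [ -n "$2" ]; then
        run geli attach -k "$1" -j "$2" "$3"
    else
        run geli attach -p -k "$1" "$3"
    fi
}

# Try each "keyfile:passfile" pair on one provider until one attaches it.
attach_any() {
    p=$1; shift
    if [ -e "/dev/$p.eli" ]; then
        echo "already attached $p"
        return 0
    fi
    for pair in "$@"; do
        keyfile=${pair%%:*}
        passfile=${pair#*:}
        if attach_one "$keyfile" "$passfile" "$p"; then
            echo "attached $p with $keyfile"
            return 0
        fi
    done
    echo "FAILED $p: no key/passphrase pair worked" >&2
    return 1
}

# Attach every provider, then import the pool only if none failed.
# $2 is a space-separated list of keyfile:passfile pairs (no spaces in paths).
unlock_pool() {
    pool=$1; pairs=$2; shift 2
    failed=0
    for prov in "$@"; do
        # $pairs is deliberately unquoted so it splits into one argument per pair
        attach_any "$prov" $pairs || failed=1
    done
    if [ "$failed" -ne 0 ]; then
        echo "not importing $pool: some providers are still locked" >&2
        return 1
    fi
    run zpool import -R /mnt "$pool"
}
```

A dry run such as `DRY_RUN=1 unlock_pool tank "/data/geli/old.key:/root/old.pass /data/geli/new.key:/root/new.pass" da0p2 da1p2` (hypothetical names) prints the commands it would issue, in order.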

Overall, I got VERY lucky. Figuring out what happened, panicking for a bit, reading some man pages, writing prototype scripts and running the process above took under 3 hours.
 

ChrisRJ (Wizard, joined Oct 23, 2020, 1,919 messages)
neutralitus said: "I'd love to back up my pools but I have hundreds of terabytes of 4K camera footage and personal data across a bunch of 6TB SAS datacentre disks in a few raidz2 pools. For awhile I had 2 freenas servers and used zfs send over a 10GbE link to send differential snapshots, but I ran out of space and could not afford more disks"
The "I don't have the money" argument is not a very good one in my view. Without a backup you will sooner or later lose data. Period. So either the data is not important, or you should really do something here. I don't want to be rude, especially not on Christmas Eve :). But the money argument is a bit like saying "I didn't have time to do XYZ", which really means "XYZ was not that important to me".

You could start gradually by sorting the data into categories of criticality. To make up an example, B-roll footage is likely less important than personal photos. A relatively simple start would be to buy 2 USB HDDs on sale for 2 generations of on-site backup, plus an off-site copy with a cloud storage provider.
 