geli - "Cannot read metadata from" "Invalid argument."

bpothier

Cadet
Joined
Apr 30, 2016
Messages
4
This is more of an informational / bug / reference / resolution (for me at least) post.
Recently upgraded to TrueNAS 12 (started on FreeNAS 9.x, went to 10, rolled back to 9, then upgraded through 11.x).
Have 2 GELI pools (with passphrase) running RAIDZ2 (Supermicro server HW, lots of ECC RAM, non-RAID SAS/SATA controller, etc...).
Due to drive age, recently started getting hardware faults on the 3TB disks; resilvered, and even with 2 failing drives there were no issues.
Had drive errors on the 4TB pool (2 equal RAIDZ2 vdevs) and added a hot spare... there was an issue somewhere between adding the hot spare and having a healthy pool, the web UI faulted somewhere, but through some CLI work it eventually "seemed" happy.
Downloaded the GELI key and recovery key when prompted - did not notice that the "recovery" key was 0 bytes! :(
Removed the faulted 4TB disk, everything continued fine.
At some point upgraded from 12.0-U2.1 to 12.0-U3.
On boot, the web UI attempted to unlock the pools, and failed/aborted/detached them.
Trying from SSH, I saw that fewer than half of the disks would attach - this is when I noticed the 0-byte "recovery" key.
<INSERT PANIC>
Found multiple posts that basically said "you backed it up, so don't worry.. right?"
Found this (old) post https://www.truenas.com/community/threads/accidentally-marked-zfs-drives-as-unconfigured-good.69912/
Some "key" pieces of data are about using "dd" to read the "last block" of partition to check for geli header.
Eventually (10 days or so now) I discovered more details from playing with this command, so now will attempt to document in hopes that it may help someone else - or maybe a dev can figure out if there is a bug?
Code:
freenas# geli dump /dev/da3
Cannot read metadata from /dev/da3: Invalid argument.
geli: Not fully done.
freenas# gpart list /dev/da3 | grep Mediasize
   Mediasize: 2147483648 (2.0G)
   Mediasize: 3998639460352 (3.6T)
   Mediasize: 4000787030016 (3.6T)


# The following is copied / modified from the above post/comment by @Ibes
# The number of bytes per block
BLOCK_SIZE=512
# The number of blocks in the da3p2 partition (Mediasize from gpart above)
BLOCK_COUNT=$(( 3998639460352 / ${BLOCK_SIZE} ))
# The number of blocks to skip (count - 1), i.e. the last block
SKIP=$(( BLOCK_COUNT - 1 ))

# dump the entire last block (512 bytes) then pipe it into od to see if "GEOM::ELI" is present
dd if=/dev/da3p2 bs=${BLOCK_SIZE} skip=${SKIP} | od -a
0000000  nul nul nul nul nul nul nul nul nul nul nul nul nul nul nul nul
*
0001000


This was true for all of the disks that would not geli attach!
Eventually I expanded the dd window and found a GELI header!
Eventually found that the GELI header for these partitions was located 4096 bytes from the end instead of 512 bytes from the end!?
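If it helps anyone, this is roughly the kind of command I used to widen the window - just a sketch, the device name and partition size are from my outputs above and will differ on your system:
Code:
# Dump the last 16 sectors (8 KiB) of the partition and look for the
# GEOM::ELI magic; the od offsets are relative to the start of this window.
BLOCK_SIZE=512
PART_SIZE=3998639460352                      # Mediasize of da3p2 from gpart list
BLOCK_COUNT=$(( PART_SIZE / BLOCK_SIZE ))
dd if=/dev/da3p2 bs=${BLOCK_SIZE} skip=$(( BLOCK_COUNT - 16 )) count=16 | od -a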
On the first disk, I wrote out the 512 bytes (starting 4096 bytes from the end) to a file and then re-wrote them to the last 512 bytes of the partition (when reading with dd use "skip=", when writing use "seek=").
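A rough sketch of the two dd commands (the device name, sizes, and the /root/da3p2.gelihdr file name are from my setup / made up - triple-check your numbers before writing anything back to a disk):
Code:
BLOCK_SIZE=512
PART_SIZE=3998639460352                      # Mediasize of da3p2
BLOCK_COUNT=$(( PART_SIZE / BLOCK_SIZE ))
# Read the 512-byte header that (for me) sat 4096 bytes (8 sectors) before the end...
dd if=/dev/da3p2 of=/root/da3p2.gelihdr bs=${BLOCK_SIZE} skip=$(( BLOCK_COUNT - 8 )) count=1
# ...and write it back into the true last sector of the partition.
dd if=/root/da3p2.gelihdr of=/dev/da3p2 bs=${BLOCK_SIZE} seek=$(( BLOCK_COUNT - 1 )) count=1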
Once I did that, doing the dd/od now showed the correct "GEOM::ELI" string!
Code:
0000000    G   E   O   M   :   :   E   L   I nul nul nul nul nul nul nul

Using geli dump even showed geli metadata!
However, when I attempted to attach it, I got:
Code:
freenas# geli dump /dev/da3p2
Metadata on /dev/da3p2:
     magic: GEOM::ELI
   version: 7
     flags: 0x0
     ealgo: AES-XTS
    keylen: 256
  provsize: 3998639456256
sectorsize: 4096
      keys: 0x03
iterations: 123456
      Salt: saltysaltysaltysalty
Master Key: 0keyd0key0keyd0key
  MD5 hash: 5a5a5a5a5a5a5a5a5a5

freenas# geli attach /dev/da3p2
geli: Provider size mismatch.
geli: There was an error with at least one provider.

Comparing it to a working geli device, I saw that the working one had "provsize: 3998639460352" instead of "3998639456256".
Doing some math showed: 3998639460352 - 3998639456256 = 4096

So to fix it, I had to do a geli resize - which reads the GELI header from the "old" offset and writes it to the "new" last sector:
Code:
freenas# geli resize -v -s 3998639456256 /dev/da3p2
Done.
freenas# geli dump /dev/da3p2
Metadata on /dev/da3p2:
     magic: GEOM::ELI
   version: 7
     flags: 0x0
     ealgo: AES-XTS
    keylen: 256
  provsize: 3998639460352
sectorsize: 4096
      keys: 0x03
iterations: 123456
      Salt: saltysaltysaltysalty
Master Key: 0keyd0key0keyd0key
  MD5 hash: 5a5a5a5a5a5a5a5a5a5
freenas# geli attach -C -k /data/geli/0123456-0123-0123-0123-01234567890.key /dev/da3p2
Enter passphrase:
freenas#

SUCCESS!!!!!
So, I just had to repeat this for the remaining drives, which all had a "blank" final sector - did not need to use dd, just geli resize using the discovered "old" size.
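For anyone following along, the repeat boiled down to roughly this - only a sketch, the disk names and the "old" size are from my system and need adjusting for yours:
Code:
# 3998639456256 was the "old" provsize reported by geli dump on my disks;
# da1..da8 / p2 are examples of my pool members, not something to copy blindly.
OLD_SIZE=3998639456256
for disk in da1 da2 da3 da4 da5 da6 da7 da8; do
    # If geli can already read metadata at the current size, leave it alone.
    if geli dump /dev/${disk}p2 > /dev/null 2>&1; then
        echo "${disk}p2: metadata looks fine, skipping"
        continue
    fi
    echo "${disk}p2: relocating GELI metadata from the old provider size"
    geli resize -v -s ${OLD_SIZE} /dev/${disk}p2
done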

So, in summary, I am not sure how or why, but somehow the GELI size or header changed or moved during the resilver / hot-spare expansion / upgrade....
No clue which step "caused" the issue, but hopefully this may help someone else recover.

Also things I discovered:
1) Each disk has its own "master encryption key" that is stored in this 512-byte GELI header, which is *only* stored in the last sector of the partition.
2) This per-disk master key is encrypted with the "key" and "recovery key" before actually being written.
3) The key and recovery key you download are used to decrypt each disk's master key to "create" (attach) the geli/eli device.
4) This "master key" controls the actual encryption and never changes - it is only re-encrypted when the key/recovery key are rekeyed.
5) You can back up this "encrypted master key" with the geli backup command - this must be done for each disk (see the sketch after this list).
6) Losing this GELI header/sector renders the rest of the disk useless.
7) You can backup/dump this info after successfully attaching the geli/eli devices - it is not done by FreeNAS/TrueNAS or as part of its config backup, though there have previously been a few requests to include it...
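Since item 5 came up, here is a sketch of what backing those headers up could look like once the devices attach - the daXp2 names and the backup directory are just my convention:
Code:
# Back up the GELI metadata sector of each pool member to a file.
mkdir -p /root/geli-backups
for disk in da1 da2 da3 da4 da5 da6 da7 da8; do
    geli backup /dev/${disk}p2 /root/geli-backups/${disk}p2.eli
done
# If a header is ever lost again it can be put back with geli restore, e.g.:
#   geli restore /root/geli-backups/da3p2.eli /dev/da3p2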

Hopefully these breadcrumbs help someone else at some point... I don't know too much more about the geli/eli stuff beyond this and other threads, so sorry, I probably can't help you with your specific situation... I am also moving to ZFS native encryption (which doesn't seem to require these silly key/recovery key files in the same way geli/eli did/does).
 

fahrgast

Cadet
Joined
Aug 13, 2018
Messages
7
You would not believe how much this post meant to me when I found it!

I had the same issue when I added a new NVMe drive as an L2ARC read cache to my GELI-encrypted (with passphrase) Z2 pool. I believe it was on TrueNAS 12.0-U5 at the time.

It was all working fine at first. I didn't realize the problem until the first reboot approximately one month later.

Strangely enough, only one of the four drives was able to decrypt when I tried to unlock the pool.

Only thanks to your post do I still have the whole pool intact. Of course I had a backup, but it would have taken days to restore and I would have lost the snapshots.

I too had the GELI header 4096 bytes from the end. I did the geli resize and it all worked fine afterwards.

Since then I've been afraid to add a vdev and haven't modified anything since the incident. I will rebuild the pool with native ZFS encryption as well and move the data over with zfs send / recv.
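The plan is roughly something like this - just a sketch, the pool and dataset names are placeholders and I still need to read up on how -R interacts with the new pool's encryption properties:
Code:
# Snapshot everything recursively, then replicate to the new (encrypted) pool.
zfs snapshot -r oldpool@migrate
zfs send -R oldpool@migrate | zfs recv -uF newpool/migrated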

Thank you so, so much for your diary entry about your solution. I was so happy when I was able to solve this problem with the help of your post. Send me your Bitcoin or Monero address and I will buy you a beer! Cheers
 