POOL LOSS bug 11.1-RC1

Status
Not open for further replies.

rs225 (Guru) · Joined Jun 28, 2014 · Messages: 878
Yesterday a user reported the loss of their encrypted pool after installing FreeNAS 11.1-RC1 to a new boot device. When they attempted to import the pool, critical metadata on the drives was wiped.

I have reproduced this behavior in a virtual machine by installing 11.0-U3, upgrading to 11.0-U4 on a fresh boot device (which worked), and then upgrading to 11.1-RC1 on a fresh boot device, which destroyed the pool when I attempted to import it through the GUI. Because I was expecting this, I had snapshots of the VM, and I was able to confirm that the GELI encryption metadata was wiped and the pool was destroyed with zpool destroy.

STAY CLEAR OF FREENAS 11.1-RC1. If you are already running it, you should consider your next steps.

The bug was already filed as: https://redmine.ixsystems.com/issues/26507
New bug filed by Ericloewe (thanks!): https://redmine.ixsystems.com/issues/26834
 

Jailer (Not strong, but bad) · Joined Sep 12, 2014 · Messages: 4,977
That bug report is for upgrading an encrypted Corral pool to 11. You should file a new bug report with your findings.
 

rs225 (Guru) · Joined Jun 28, 2014 · Messages: 878
Indeed.

I've been testing, and it seems to hit only when importing an encrypted pool. Upgrading from 11.0-U4 to 11.1-RC1 is fine.
I wasn't able to test anything else, but I wanted to make sure this was known, so I only tested a fresh boot device install rather than upgrading the 11.0-U4 boot device to 11.1-RC1. Do you know if it happens either way, or only the way I did it?

Thanks to Ericloewe for filing a new bug, as updated above.
 

m0nkey_ (MVP) · Joined Oct 27, 2015 · Messages: 2,739
Here's a screen capture of the bug. I created an encrypted pool, added a passphrase, and backed up the key and recovery key. I then locked and detached the pool and attempted to re-import it. The import fails and the pool is lost.

 

m0nkey_ (MVP) · Joined Oct 27, 2015 · Messages: 2,739
More debugging: it seems that if a passphrase is set and you import the pool, the middleware basically runs 'zpool destroy'.

Code:
Nov 25 17:32:27 freenas uwsgi: [middleware.notifier:138] Executing: zpool destroy -f tank
Nov 25 17:32:27 freenas uwsgi: [middleware.notifier:159] Executed: zpool destroy -f tank -> 0


Will provide more info when I have it.
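For anyone wanting to check whether their own system hit this, here is a minimal sketch that scans log lines for an executed `zpool destroy`. It assumes the middleware writes lines in exactly the `Executed: zpool destroy -f <pool> -> <rc>` format shown in the log above; the function name `find_pool_destroys` is hypothetical, and which log file holds these messages on your install is not covered here.

```python
import re

# Assumption: lines look like the middleware.notifier output above, e.g.
# "... Executed: zpool destroy -f tank -> 0". The "Executing:" line is ignored
# so each destroy is counted once, with its exit code.
DESTROY_RE = re.compile(r"Executed: zpool destroy(?: -f)? (\S+) -> (\d+)")


def find_pool_destroys(lines):
    """Hypothetical helper: return (pool, exit_code) for each executed 'zpool destroy'."""
    hits = []
    for line in lines:
        match = DESTROY_RE.search(line)
        if match:
            hits.append((match.group(1), int(match.group(2))))
    return hits


if __name__ == "__main__":
    sample = [
        "Nov 25 17:32:27 freenas uwsgi: [middleware.notifier:138] Executing: zpool destroy -f tank",
        "Nov 25 17:32:27 freenas uwsgi: [middleware.notifier:159] Executed: zpool destroy -f tank -> 0",
    ]
    print(find_pool_destroys(sample))
```

An exit code of 0 means the destroy succeeded, which matches what happened here.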
 


Ericloewe (Server Wrangler, Moderator) · Joined Feb 15, 2014 · Messages: 20,194
From what we could figure out, this is more or less what's going on:
  • Someone broke the import of encrypted pools with a database bug, which triggers an exception
  • The exception handler is poorly coded. The function that exports a pool is called something like volume.destroy(), and it does a number of things: it can export pools and it can delete pools (and possibly perform other cleanup tasks). The .destroy nomenclature probably refers to destroying the in-memory object that represents the pool.
    Now, the bug here is a two-parter:
    • The exception handler calls volume.destroy() with no arguments, relying on its default behavior
    • The default behavior is to destroy the pool with extreme prejudice instead of exporting it (we're talking about issuing zpool destroy and then nuking all GELI metadata)

This also means that any exception triggered during pool import would destroy the pool being imported.

Note that all this behavior seems confined to the 11.1 branch, i.e. to changes made after 11.0 was spun off for release.
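To make the two-part failure concrete, here is a hypothetical Python sketch of the pattern described above. `Volume`, `destroy()`, `export_only`, and the two handler functions are illustrative names, not the real FreeNAS middleware API, and the "commands" are only recorded as strings, never executed.

```python
# Hypothetical sketch: Volume, destroy() and export_only are illustrative,
# NOT the actual FreeNAS middleware API. Nothing here runs real commands.


class Volume:
    def __init__(self, name):
        self.name = name
        self.actions = []  # commands the cleanup *would* run

    def destroy(self, export_only=False):
        """Drop the in-memory volume object and clean up on disk.

        The dangerous part is the default: unless export_only=True is
        passed, the pool is destroyed and its GELI metadata is wiped.
        """
        if export_only:
            self.actions.append(f"zpool export {self.name}")
        else:
            self.actions.append(f"zpool destroy -f {self.name}")
            self.actions.append("wipe GELI metadata")


def import_volume_buggy(volume, do_import):
    try:
        do_import()
    except Exception:
        volume.destroy()  # relies on the destructive default
        raise


def import_volume_safe(volume, do_import):
    try:
        do_import()
    except Exception:
        volume.destroy(export_only=True)  # cleanup never touches pool data
        raise


def failing_import():
    # Stands in for the database bug that makes the import raise.
    raise RuntimeError("database error during import")


if __name__ == "__main__":
    for handler in (import_volume_buggy, import_volume_safe):
        vol = Volume("tank")
        try:
            handler(vol, failing_import)
        except RuntimeError:
            pass
        print(handler.__name__, vol.actions)
```

Passing an explicit argument in the handler fixes this one call site; making the non-destructive behavior the default would be the more robust design, since any caller that forgets an argument would then get an export rather than a destroy.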
 

KingPenguin (Cadet) · Joined Nov 25, 2017 · Messages: 2
Is there any method of recovery, given that the data partition sounds like it is untouched? For example, is it possible to recreate the "murdered" metadata (forgive my lack of understanding if that's a dumb question!)? Unfortunately, it appears I have encountered this issue too. I appreciate it may be too late for me, but I'm happy to share more info if it helps resolve this for others. Thanks.
 

m0nkey_ (MVP) · Joined Oct 27, 2015 · Messages: 2,739
If the pool had only been destroyed with zpool destroy, it would be fairly trivial to recover. However, when this exception is triggered, the handler not only destroys the pool but also wipes the GELI metadata. If you have a backup of the GELI metadata, it may be possible to recover. Since FreeNAS doesn't back up the GELI metadata, this is otherwise practically impossible to recover from.
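For the case mentioned above where a GELI metadata backup does exist, here is a minimal sketch of the order a recovery attempt would likely take: restore the metadata, attach each provider, then ask ZFS to import a pool marked as destroyed. It assumes one backup file per provider named `<provider basename>.eli` (a hypothetical layout), a key file, and device names that are placeholders. `geli restore`, `geli attach -k`, and `zpool import -D` are the standard FreeBSD commands, but check geli(8) and zpool(8) for your version before touching real disks. The sketch only builds the command lines; it runs nothing.

```python
import os


def recovery_plan(providers, backup_dir, keyfile, pool):
    """Hypothetical helper: build (but do not run) GELI + ZFS recovery commands.

    Assumptions (hypothetical layout):
      - providers are GELI providers such as "/dev/ada0p2"
      - backup_dir holds one "<provider basename>.eli" metadata backup each
      - keyfile is the pool's GELI key; if a passphrase is set, geli prompts for it
    """
    plan = []
    for prov in providers:
        backup = os.path.join(backup_dir, os.path.basename(prov) + ".eli")
        # Write the saved GELI metadata back to the provider's last sector.
        plan.append(["geli", "restore", backup, prov])
    for prov in providers:
        # Attach the provider so the decrypted "<prov>.eli" device appears.
        plan.append(["geli", "attach", "-k", keyfile, prov])
    # "-D" lets zpool import consider pools that were marked as destroyed.
    plan.append(["zpool", "import", "-D", pool])
    return plan


if __name__ == "__main__":
    for cmd in recovery_plan(["/dev/ada0p2", "/dev/ada1p2"],
                             "/mnt/backup/geli", "/mnt/backup/geli/pool.key", "tank"):
        print(" ".join(cmd))
```

Without the metadata backup, the first step has nothing to restore from, which is why the wiped GELI data is the real problem here.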
 

KingPenguin (Cadet) · Joined Nov 25, 2017 · Messages: 2
Bummer. I had already assumed all was lost, but I will take your comment as closure! If you don't mind answering further: FreeNAS seems to create *.eli files, which I understand could be used with geli restore. Are these wiped upon OS reboot? Could they perhaps be extracted from an OS backup USB key? I've given up on recovery; this is purely educational at this point. Thanks.
 

rs225 (Guru) · Joined Jun 28, 2014 · Messages: 878
A followup. Variants of this bug still seem to be present in 11.1-RC3 (attempting to auto-import a Corral pool, non-encrypted, probably recoverable) and 11.0-U4 (an upgrade that is believed to have run out of USB space, non-encrypted, pool recovered). Anyone with an encrypted pool should not be thinking about upgrading right now, but should be thinking about how nice it is to have a backup, especially of irreplaceable data.
 

m0nkey_ (MVP) · Joined Oct 27, 2015 · Messages: 2,739
Suggest you update the bug ticket: https://redmine.ixsystems.com/issues/26834
 