Replace encrypted drive

Gen8 Runner

Contributor
Joined
Aug 5, 2015
Messages
103
Hi everyone,
This is not an emergency post, just a general question: the last time I expanded my pool capacity with bigger drives, I had serious problems due to full disk encryption.

In the FreeNAS User Guide, 9.4.1.1. Replacing an Encrypted Disk, it says:

"First, make sure that a passphrase has been set using the instructions in Managing Encrypted Pools before attempting to replace the failed drive. Then, follow steps 1 and 2 as described above. During step 3, there will be a prompt to enter and confirm the passphrase for the pool. Enter this information, then click REPLACE DISK. Wait until resilvering is complete.
Next, restore the encryption keys to the pool. If this additional step is not performed before the next reboot, access to the pool might be permanently lost. "


But isn't this procedure really risky?
If your computer crashes during resilvering (a power outage in your city, a hardware fault, a power-supply fault, etc.) and you haven't restored the encryption key yet, what exactly happens?
This happened to me last time. I was just replacing some hard drives with bigger ones when the power suddenly went out and the server switched off. I had severe problems fixing this and needed a really big portion of luck.

1. Is any improvement planned so that almost nothing can go wrong in those cases? (I know 100% safety can never exist.)
2. How can you best protect yourself from this failure at the moment? Resilvering sometimes takes 20 hours or more, and even a big UPS cannot keep the server alive long enough for resilvering to complete so that you can restore the encryption keys safely.
3. Is it possible to start the resilvering process and immediately restore the encryption keys, or is that really only possible at the end of the resilvering process?

So, which guideline would you recommend?
Cheers
 

Apollo

Wizard
Joined
Jun 13, 2013
Messages
1,458
As far as I understand it, and from experience, the documentation isn't very specific or detailed, as you found yourself.
But I seem to remember doing the following:
If you do not have a passphrase, create one and add the new disk. While resilvering takes place, you can rekey the volume, then save the GELI key and the recovery key. You can then remove the passphrase and generate another GELI key.
The rekeying is no longer necessary.
I had an issue once around Coral, where encryption was a complete mess during the upgrade, and this is when I realized the passphrase was required to rekey all the drives.
Theoretically, when a disk replaces a failed drive, the key from the volume should be applied to the new drive, but that may not be the case.
If the system rebooted and no passphrase had been added to rekey the volume, the new drive would start resilvering from scratch after the reboot.

I don't know how much of this is really the right way, or if it was a series of misfortunes caused by the upgrades. Regardless, I never lost an encrypted pool due to a failed replacement procedure. Maybe I am just a special case, or there is more resiliency in place to cover such cases.

A deep dive into encryption and volume replacement would be a great asset to this forum. I think iXsystems should validate and demonstrate the ins and outs.
 

aaron.stjohn

iXsystems
Joined
Jan 10, 2019
Messages
11
Hello, Gen8 Runner. Not too long ago I made a minor change to the documentation dealing specifically with your issue. The change says to restore the encryption keys immediately after clicking REPLACE DISK instead of waiting for the resilver to complete. This is a much safer process.

This change can be found at https://github.com/freenas/freenas-docs/pull/549.
 

Gen8 Runner

Contributor
Joined
Aug 5, 2015
Messages
103
Thanks for your answers, and thanks @aaron.stjohn for the update to the documentation. That short window really can be bridged by a UPS.
So that looks like a working solution for this case.
 

pro lamer

Guru
Joined
Feb 16, 2018
Messages
626

Gen8 Runner

Contributor
Joined
Aug 5, 2015
Messages
103

psoni

Dabbler
Joined
Mar 5, 2019
Messages
13
I have two (2) failed disks in a RAIDZ-2 pool.

I read the documentation to better understand the process of replacing an encrypted disk and have one doubt. I would really appreciate it if someone could confirm the steps below.


Details –
  • Build 11.1-U5
  • RAIDZ-2 Pool with two failed disks (ada0; ada1 – 4TB drives)
  • No passphrase; only recovery key (volume is not locked)
  • Recovery key is missing
Steps –

[1] Go to Storage > View disks and note down the serial numbers
[2] Shut down the system
[3] Install new disk
[4] Power on the system
[5] Go to Storage > Volumes, and click the “Volume Status” button. Select a disk and click the “Replace” button. Choose the new disk as the replacement.
At this stage, it does not ask for a passphrase. There is only a warning message, “recovery key of the volume will be invalidated”, which I think is fine since we will be creating a new recovery key later.
[6] Wait for resilver process to complete
[7] Once resilver is complete, highlight the pool that contains the recently replaced disk and click the Encryption Re-key button in the GUI
[8] Click the Download Key button to save the new encryption key
[9] Click the Add Recovery Key button to save the new recovery key. The old recovery key will no longer function, so it can be safely discarded.

Question - Can I replace both drives at the same time and then go through #5 to #9 OR Should I replace one disk at a time, wait for resilver to finish and then repeat the process (same steps above) for the second disk?

Any help / direction would be greatly appreciated.
 

Apollo

Wizard
Joined
Jun 13, 2013
Messages
1,458
Out of curiosity, how did the 2 disks happen to fail? Is this due to an upgrade from an earlier version of FreeNAS?
If that's the case, the drives may not be failing due to hardware but due to the driver and encryption mechanism.
 

psoni

Dabbler
Joined
Mar 5, 2019
Messages
13
Not sure, we recently took over this system for management.
The degraded volume alert in the GUI shows a 2018 timestamp, so it's been running like this for quite some time.

Any suggestions/recommendations on the steps above?

Thank you
 

pro lamer

Guru
Joined
Feb 16, 2018
Messages
626
  • No passphrase; only recovery key (volume is not locked)
  • Recovery key is missing
Not sure if I understand it correctly - you're not having the keys downloaded... Maybe download them just in case... apparently I misunderstood :)

Sent from my phone
 

Apollo

Wizard
Joined
Jun 13, 2013
Messages
1,458
Just back up the keys as suggested by @pro-lamer.
In my experience with degraded encrypted pools, a pool without a passphrase can be the root of the problem when upgrading FreeNAS to a different release.
If you already have a backup pool set up, it would make things easier.
Before shutting down your system, I would run "zpool status" on the defective pool and figure out what state the failed drives are in.
To me, RAIDZ2 with 2 failed drives indicates a possibility that the upgrade didn't include the redundant drives in the process.
I will need to check the exact procedure. I thought I made a post related to that, but I need to find it.

Normally, having created a passphrase while doing the rebuild is enough to rebuild the pool without rekeying the whole thing.
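
As a rough illustration of the "zpool status" check suggested above, here is a minimal Python sketch that picks the non-ONLINE rows out of the status text. The sample output, the pool name "tank", and the gptid labels are all made up for the example; real output depends on the pool layout and the FreeNAS release, so treat the parsing as an assumption rather than a robust tool.

```python
# Hypothetical sample of "zpool status" output, for illustration only.
SAMPLE = """\
  pool: tank
 state: DEGRADED
config:

    NAME                STATE     READ WRITE CKSUM
    tank                DEGRADED     0     0     0
      raidz2-0          DEGRADED     0     0     0
        gptid/aaa.eli   ONLINE       0     0     0
        1234567890      UNAVAIL      0     0     0  was /dev/gptid/bbb.eli
        gptid/ccc.eli   ONLINE       0     0     0
        9876543210      UNAVAIL      0     0     0  was /dev/gptid/ddd.eli
"""

# Device states that can appear in the STATE column.
STATES = {"ONLINE", "DEGRADED", "FAULTED", "OFFLINE", "UNAVAIL", "REMOVED"}


def unhealthy_devices(status_text):
    """Return (name, state) for every config row whose state is not ONLINE."""
    rows = []
    for line in status_text.splitlines():
        parts = line.split()
        # Config rows look like: NAME STATE READ WRITE CKSUM [note...]
        if len(parts) >= 5 and parts[1] in STATES and parts[1] != "ONLINE":
            rows.append((parts[0], parts[1]))
    return rows


if __name__ == "__main__":
    for name, state in unhealthy_devices(SAMPLE):
        print(f"{name}: {state}")
```

On a live system the text would come from running "zpool status" in a shell (or through Python's subprocess module) instead of the SAMPLE string. Saving that output before shutting down gives you a record of which members were missing.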
 

Apollo

Wizard
Joined
Jun 13, 2013
Messages
1,458

psoni

Dabbler
Joined
Mar 5, 2019
Messages
13
Thank you for the response.

Volume status page has an option to 'download key'. But I assume it's for the Encryption (Geli) key and not Recovery key.
I believe they are both different parameters and there's no option to download Recovery key. PLEASE correct me if I am wrong.

What I meant in my previous post was - we don't have the Recovery key that was configured earlier.

There are five (5) tabs with key icon on volume status page (screenshot attached) which are as follows (from left to right)
1 - Create Passphrase
2 - Download Key
3 - Encryption Re-Key
4 - Add Recovery Key
5 - Remove Recovery Key

Also, when I click on "Replace" disk - it's not asking for a passphrase since passphrase hasn't been configured on the pool.
There is a warning message, however, that states “recovery key of the volume will be invalidated” (see attached).

Based on the description given here - https://www.ixsystems.com/documentation/freenas/11.1/storage.html#managing-encrypted-volumes
I believe the configuration we have is -
  • Key stored locally, no passphrase: the encrypted volume is decrypted and accessible when the system is running. Protects “data at rest” only
There is no passphrase since the volume has no lock icon.
But it seems Recovery Key was configured since I am getting the warning above while replacing a disk.

Questions -
[1] Do I have to create a passphrase before replacing a disk even if it's not configured currently?
[2] Can I replace both drives at the same time and then go through #5 to #9 OR Should I replace one disk at a time, wait for resilver to finish and then repeat the process (same steps above) for the second disk?


Please let me know if I am missing any steps.
Thank you once again.
 

Attachments

  • Volume Status Page.png
  • Replace Disk Window.png
  • Zpool Status Output.png

Apollo

Wizard
Joined
Jun 13, 2013
Messages
1,458
"Download Key" does indeed download the GELI key.
You need to do that.
"Encryption Re-Key" generates a new key different from the one currently in use. So when you do that, I believe you will be able to save the new GELI key.

"Add Recovery Key" actually allows you to download the GELI key with the disk GPID. You won't need the GELI key from the above steps, but you will need to know the passphrase if it is set up.

In my experience, the trouble I have had with encrypted pools when resilvering after a missing disk came from the absence of a passphrase.
It would seem that if the passphrase was not enabled while resilvering was happening, either at the beginning or the end, then upon reboot the newly added disk would not have the correct key and would become unavailable.
I don't know if re-keying would be enough. Most issues related to unavailable disks were the result of an upgrade of the FreeNAS/FreeBSD release, so it is possible the passphrase would not have been required back then.


About the following comments:

  • Key stored locally, no passphrase: the encrypted volume is decrypted and accessible when the system is running. Protects “data at rest” only
This means that when the pool is attached, as it currently is, and doesn't have a passphrase, then once FreeNAS boots the content will be available to anyone with FreeNAS knowledge who has access to the server.
What I think "data at rest" covers is probably the case where the volume is detached from the system. Without the seed on the system, the content of the pool will be literally inaccessible until the recovery key or GELI key is used to import the volume.

Dealing with an encrypted degraded pool is always nerve-racking. If the failure was not hardware related and the system is stable, I would suggest performing a complete replication of it to a new pool running on another system. At least, when that succeeds, you will have less chance of losing everything.

With a backup in place, I would do the following:

Save the GELI keys from the pool.
I suspect the issue is with the 11.1 release back then, as I have experienced it myself.
I would consider upgrading to 11.2, as the encryption issue was fixed at some point. If that is the case, you would not have to replace the disks, assuming they are not mechanically or electrically faulty.
If I remember correctly, upon updating and restarting, the pool will/should be reloaded automatically and go through the resilvering on its own without you needing to do anything. That would be the best-case scenario.

If this doesn't work, you can proceed by replacing the faulty disks. You can replace both at the same time. I don't think it will put more load on the system. I have done it a few times for the reason above.
 

pro lamer

Guru
Joined
Feb 16, 2018
Messages
626
upgrading to 11.2
Upgrading to 11.2 caused some people to lose their data. If an unset passphrase was the issue, I'd set one up temporarily. You just need to be careful not to confuse the keys...


Not sure if I understand it correctly - you're not having the keys downloaded...
So I was wrong :) I'll edit my original post...


Sent from my phone
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
[2] Can I replace both drives at the same time and then go through #5 to #9 OR Should I replace one disk at a time, wait for resilver to finish and then repeat the process (same steps above) for the second disk?
Replace both disks and re-key as soon as the resilver starts.
 

psoni

Dabbler
Joined
Mar 5, 2019
Messages
13
Replace both disks and re-key as soon as the resilver starts.
Thanks, is it safe to re-key as soon as resilver starts? I read in the documentation that I have to wait for resilvering to finish before I can re-key. We are on 11.1-U5. Maybe it changed in 11.2?
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194

Apollo

Wizard
Joined
Jun 13, 2013
Messages
1,458
I think it is safe. If you don't, then the replaced disk will not be available.
I don't know the underlying process to tell what's happening, but make sure you save the new GELI key and recovery key when you do the rekeying.
Place them in a different folder location. It is best to name each file after the pool and the date, so it is easier to track which is which.
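
One way to keep the downloaded keys apart, sketched in Python: build each backup file name from the pool name, the kind of key, and the date. The helper name, the ".key" extension, and the "geli"/"recovery" labels are my own choices for illustration and not anything FreeNAS produces.

```python
from datetime import date
from pathlib import Path


def key_backup_path(backup_dir, pool, kind, day):
    """Build a path like <backup_dir>/<pool>_<kind>_<YYYY-MM-DD>.key."""
    # "geli" and "recovery" are labels chosen for this sketch only.
    if kind not in ("geli", "recovery"):
        raise ValueError("kind must be 'geli' or 'recovery'")
    return Path(backup_dir) / f"{pool}_{kind}_{day.isoformat()}.key"


if __name__ == "__main__":
    # Example: name the two files saved after today's rekey of pool "tank".
    for kind in ("geli", "recovery"):
        print(key_backup_path("key-backups", "tank", kind, date.today()))
```

After each rekey you would save the freshly downloaded files under the new names and keep the older ones until you have confirmed the pool unlocks with the new keys.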
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
I don't know the underlying process to tell what's happening
Prepare to learn:

GELI has two key slots. FreeNAS uses the first one with a key and a passphrase and the second one for the "recovery" key. These keys are used to decrypt the key that actually encrypts the disk.

FreeNAS keeps all disks using the same keys. Rekeying replaces all the keys with a new key, which is then the same for all disks.
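
To make the two-slot idea concrete, here is a toy Python model. It is only a sketch: the XOR-with-a-hash "wrapping" is a stand-in and is not how GELI protects its master key, and the class and function names are invented. Clearing the second slot during a rekey is an assumption, based on the GUI warning quoted earlier in this thread that the recovery key will be invalidated.

```python
import hashlib
import secrets


def _pad(user_key):
    """Derive a 32-byte pad from a user key (toy stand-in for real key wrapping)."""
    return hashlib.sha256(user_key).digest()


def _xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


class ToyDisk:
    """Toy model of one encrypted disk with two key slots (not real GELI crypto)."""

    def __init__(self):
        self._master = secrets.token_bytes(32)  # the key that encrypts the data
        self._check = hashlib.sha256(self._master).digest()
        self.slots = [None, None]  # each slot holds a wrapped copy of the master key

    def set_slot(self, index, user_key):
        self.slots[index] = _xor(self._master, _pad(user_key))

    def clear_slot(self, index):
        self.slots[index] = None

    def unlocks_with(self, user_key):
        """True if user_key unwraps the master key from any slot."""
        for wrapped in self.slots:
            if wrapped is None:
                continue
            candidate = _xor(wrapped, _pad(user_key))
            if hashlib.sha256(candidate).digest() == self._check:
                return True
        return False


def rekey(disks, new_key):
    """Give every disk the same new slot-0 key; the data key itself is untouched."""
    for disk in disks:
        disk.set_slot(0, new_key)
        disk.clear_slot(1)  # assumption: the old recovery key stops working


if __name__ == "__main__":
    pool = [ToyDisk() for _ in range(3)]
    for disk in pool:
        disk.set_slot(0, b"old key")
        disk.set_slot(1, b"recovery key")
    rekey(pool, b"new key")
    print(all(d.unlocks_with(b"new key") for d in pool))
    print(any(d.unlocks_with(b"old key") for d in pool))
```

The point of the model is that a rekey only rewrites the small wrapped copies in the slots. The key that actually encrypts the data stays the same, so nothing on the disks has to be re-encrypted.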
 

Apollo

Wizard
Joined
Jun 13, 2013
Messages
1,458
Will rekeying work when no passphrase is used?
What is the meaning of "adding" and "removing" a recovery key? What exactly is happening in the background?
 