Syncing over internet and encryption methods for stored data

rtX

Cadet
Joined
Apr 26, 2021
Messages
6
I'm new to TrueNAS and ZFS. I'm planning to set up two servers in two geographically separate locations. I want them to mirror each other (via the internet), and at least one to have RAID capability switched on. I want data on both systems to be encrypted, so that if someone gains physical access to the drives and removes them, they are useless for accessing the data. I need this for GDPR and other reasons. I'm aware that OpenZFS has native encryption.

I'm considering the best way to achieve encryption of my data. I could create large Veracrypt file containers (say 1TB each) and mount them on my desktop machine when I need access, and I think this is my preferred course of action. My concern is that each modification will necessitate a massive (whole-container) file upload for syncing with the other server. Does TrueNAS do some kind of delta syncing, where only the part of the file that has changed is synced?

I'm aware of the potential inefficiency of this choice in terms of wasted disk space. I'm familiar with Veracrypt and use it every day, whereas I'm nowhere near as familiar with TrueNAS, ZFS, etc. I'm aware that my fear of trying something different may be driving my choice. I cannot afford to lose the data. I do want to make sure that the best aspects of TrueNAS and ZFS are utilised, and as I understand it LUKS/GELI prevent access to the underlying hardware, which I believe may not work for me. I'm good at keeping access keys/passphrases secure.

Any thoughts and suggestions would be welcome.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
If you cannot afford to lose the data, you need to behave accordingly, and set up ... probably RAIDZ2 or mirrors, depending on how the data is written/rewritten. Mirrors will be more performant, but if you don't really need lots of speed, RAIDZ2 may work fine. Doing without basic data protection is not recommended. Fate likes to victimize the unprepared. If you set up a system that's properly protected, then the other funny thing happens, which is that fate usually leaves you alone, and you feel like you've wasted money.

Replication behaviour will depend on how the application works.

If you use ZFS replication, only blocks that are rewritten will be replicated. If you take a 1TB file, copy it into memory, and rewrite it to disk, this counts as having been rewritten for purposes of this discussion, even though you might want to argue that the actual data has not changed. ZFS sees that the disk blocks have been written to, and replication is driven by that.
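As a concrete sketch (pool, dataset, and host names here are hypothetical), snapshot-based replication only transfers the blocks written between two snapshots:

    # initial snapshot and full send
    zfs snapshot tank/data@monday
    zfs send tank/data@monday | ssh siteb zfs recv backup/data

    # later: incremental send transfers only blocks written since @monday
    zfs snapshot tank/data@tuesday
    zfs send -i tank/data@monday tank/data@tuesday | ssh siteb zfs recv backup/data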

rsync is more tolerant of that kind of thing, because it can be asked to work by analyzing the file contents, and only transfer over changed portions. This incurs a huge I/O penalty, of course, because it has to read the entire file on each side in order to accomplish that trick.
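A hedged example (paths and host are made up): when the destination is remote, rsync uses its delta-transfer algorithm by default, and --inplace updates the changed portions of the destination file rather than rewriting the whole thing:

    rsync -av --inplace /mnt/tank/container.img siteb:/mnt/backup/container.img

Note that both sides still read the entire file to compute the rolling checksums, which is the I/O penalty mentioned above.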

There isn't a good way to "mirror each other". Replication is basically a one-way street. You can set up a datastore at each site and then replicate it to a backup datastore at the other site, but you will be working on separate files. If you are willing to manually manage synchronization, rsync will be happy to push or pull a single file in either direction. Other tools such as syncthing or unison attempt to handle this in various ways too, but with caveats.
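For illustration (hosts and paths hypothetical), pushing or pulling a single file is just a matter of argument order:

    # push the local copy to the remote site
    rsync -av /mnt/tank/docs/report.img siteb:/mnt/tank/docs/
    # pull the remote copy back
    rsync -av siteb:/mnt/tank/docs/report.img /mnt/tank/docs/

Tracking which side holds the current copy is entirely on you; rsync will happily overwrite the newer version if pointed in the wrong direction.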
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
I could create large Veracrypt file containers (say 1TB each) and mount them on my desktop machine when I need access, and I think this is my preferred course of action.
That seems like a byzantine approach. Full-disk encryption should take care of any compliance requirements not met by ZFS native encryption.

Granted, it's not clear that iX wants to keep supporting full-disk encryption going forward in addition to native encryption. Realistically, I don't think they have a choice: ZFS native encryption leaks just enough metadata to require close scrutiny. This is, of course, just my analysis.
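For reference, a minimal sketch of creating a natively encrypted dataset from the shell (pool and dataset names hypothetical; the TrueNAS UI exposes the same options):

    zfs create -o encryption=aes-256-gcm -o keyformat=passphrase tank/secure

You'll be prompted for the passphrase, and zfs load-key / zfs unload-key control access to the dataset thereafter.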
 

rtX

Cadet
Joined
Apr 26, 2021
Messages
6
That seems like a byzantine approach. Full-disk encryption should take care of any compliance requirements not met by ZFS native encryption.
Many thanks. Is there a good resource that I can read that explains the pros and cons of full disk encryption and ZFS native encryption?
 

rtX

Cadet
Joined
Apr 26, 2021
Messages
6
If you use ZFS replication, only blocks that are rewritten will be replicated. If you take a 1TB file, copy it into memory, and rewrite it to disk, this counts as having been rewritten for purposes of this discussion, even though you might want to argue that the actual data has not changed. ZFS sees that the disk blocks have been written to, and replication is driven by that.
Many thanks. I'm pretty sure that VeraCrypt does not read the whole encrypted 'volume' into RAM, just the relevant parts, and writes only those relevant parts back to the drive. That is one of the reasons that I like it.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Many thanks. Is there a good resource that I can read that explains the pros and cons of full disk encryption and ZFS native encryption?
Good question; I'm not aware of any...
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
To summarize the differences between full-disk encryption (I'll refer to GELI for brevity's sake) and ZFS native encryption:
  • Granularity: GELI encrypts entire disks; ZFS encrypts specific datasets.
  • Portability: GELI is specific to FreeBSD; other OSes have analogous solutions. ZFS native encryption is expected to work across OpenZFS Tier-1 platforms without issues.
  • What is encrypted: With GELI, basically everything is encrypted wholesale. ZFS leaks metadata, namely dataset properties (in particular the names) of encrypted datasets.
  • Data at rest: With GELI, data must be decrypted for any sort of management. ZFS allows you to scrub the data without decrypting it, ensuring its safety against bit rot when given redundancy. ZFS also allows you to send an encrypted stream, so that you can have a remote backup at a less-than-trusted provider that they cannot decrypt but can still keep safe (see the sketch after this list).
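To illustrate that last point, a sketch of a raw send (names hypothetical): the destination receives and stores the blocks still encrypted, and can scrub them, but can never read the data without the key:

    zfs snapshot tank/secure@backup1
    zfs send --raw tank/secure@backup1 | ssh untrusted-host zfs recv backup/secure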
 
Joined
Oct 22, 2019
Messages
3,641
If you use ZFS replication, only blocks that are rewritten will be replicated. If you take a 1TB file, copy it into memory, and rewrite it to disk, this counts as having been rewritten for purposes of this discussion, even though you might want to argue that the actual data has not changed. ZFS sees that the disk blocks have been written to, and replication is driven by that.

I did a test with a VeraCrypt container (saved and shared via SMB on its own "playground" dataset). The dataset playground is non-encrypted and uses no compression.

I created a new encrypted container file in VeraCrypt, chose 2GB as its size, and selected the SMB shared folder as its location. Upon completion, there now sits a 2GB file filled with random data (dontmindme.img) that lives under the playground dataset.

I made a snapshot named playground@001 and replicated it to a destination. A total of 2GB was transferred. (As expected.)

With the container still mounted with VeraCrypt, I copied a 100MB file into the folder.

I made a new snapshot named playground@002.

The dataset still only contains a single 2GB file named dontmindme.img.

When I did an incremental replication of playground@001 to playground@002, only 100MB needed to be transferred to the destination.
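For anyone who wants to reproduce this, the steps roughly correspond to the following (pool name and destination are hypothetical; my actual dataset was named playground):

    zfs snapshot tank/playground@001
    zfs send tank/playground@001 | zfs recv tank/backup        # full send: ~2GB

    # ...copy a 100MB file into the mounted VeraCrypt container...

    zfs snapshot tank/playground@002
    zfs send -nv -i tank/playground@001 tank/playground@002    # dry run: estimates ~100MB
    zfs send -i tank/playground@001 tank/playground@002 | zfs recv tank/backup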

This tells me that VeraCrypt writes "in place", regardless of whether the container file is on a local disk or on an SMB network share, and thus it plays friendly with ZFS replication. However, I'm not sure how well this works when multiple users access the container simultaneously (shared over an SMB folder)?

As mentioned before, I don't see much advantage to this approach over native ZFS encryption. Given a gigantic 1TB file filled with nothing but random data, a savvy individual will, not without reason, assume it's an encrypted container. (Why else would someone have a 1TB file that contains nothing but random data without any header information?)

With native ZFS encryption, it's obvious that the encrypted dataset is encrypted, yet you needn't use a third-party application (VeraCrypt) nor initially create a fixed file-size container to achieve the same benefits of encryption and replication. As for it being "obvious" the dataset is encrypted, it's really only "more obvious" than a 1TB file completely filled with random data. :wink:
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Why else would someone have a 1TB file that contains nothing but random data without any header information?

Why, ZFS performance testing, of course. It's necessary to have a high quality random data file if you want to test disk speeds and ZFS compression under worst-case scenarios.
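A quick hedged example (size and path arbitrary) of producing such a file:

    dd if=/dev/urandom of=/mnt/tank/random.bin bs=1M count=1024   # 1 GiB of incompressible data

Because the data is incompressible, it defeats ZFS compression and measures something closer to raw disk throughput.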

You asked.
 