Which tool for local backups?

ragametal

Contributor
Joined
May 4, 2021
Messages
188
I'm new to TrueNAS and I'd like your opinion on which tool to use to create a proper local backup of my data pool (because RAID is not a backup, and snapshots are not a backup).

The backups will be saved on a dedicated internal drive installed in the same machine as the data disks and will only be accessible to the root user.

I'm confused as to whether I should use local replication, rsync or use a tool such as Duplicati.

Ideally this local backup will have versioning (incremental backup), encryption, and allow me to restore files from a non-TrueNAS system.

I know this last requirement is the most difficult one, and I can live without it, but it is important to me because I would like to be able to mount the backup drive on another system and recover my data from there in case of a catastrophic hardware failure of the server. This other system will be either a Windows workstation or a VM running Ubuntu.

Duplicati checks all the right boxes and I'm familiar with it, but it is so incredibly slow at restores that I'm nervous about having a hardware failure during a full restore. That's why I want to evaluate other options such as replication or rsync.

This is what I have found during my research:
  1. Replication: faster, most efficient, has versioning as it is snapshot based. But does it copy all the data or just the snapshots? Could I mount this ZFS-formatted drive on a Linux machine and recover data from it later?
  2. Rsync: efficient, true backup, can be recovered from another machine. But, does it do versioning?
  3. Duplicati: Does everything but it's Slowwwwwwwww. Besides, why should I look for a third party software to do something truenas does natively?
So, what do you think? Would replication be the right tool for me? or maybe rsync? or am I stuck with duplicati?

P.s. I'm still planning on doing remote backups by using cloud sync pointing to a remote sftp server.
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
Replication tasks will satisfy all your needs. In case of a catastrophic failure, and provided you have stored your encryption keys somewhere offline, you can import that single-drive backup pool on any system running a sufficiently recent version of OpenZFS. FreeBSD, Linux, Mac - in theory even Windows, but I don't have any experience with ZFS on Windows and its stability specifically.
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
I use RSync with ZFS snapshots, so I get multiple backups on the same backup disk. After each RSync completes, I snapshot the backup.

It's not perfect. Before each backup I have to check to see if there is enough space. And if not, clear out the oldest snapshot.
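
The check-space-then-prune step described above can be sketched roughly as follows. This is only an illustration in Python, not Arwen's actual script: the `Snapshot` class, its fields, and `snapshots_to_prune` are hypothetical names, and it assumes the space freed by destroying snapshots simply adds up, which real ZFS accounting does not guarantee.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    name: str          # hypothetical snapshot name
    created: int       # creation time as any sortable number
    unique_bytes: int  # assumed space freed if this snapshot is destroyed


def snapshots_to_prune(snapshots, free_bytes, needed_bytes):
    """Return the oldest snapshots to destroy, in order, until enough space is free."""
    to_destroy = []
    for snap in sorted(snapshots, key=lambda s: s.created):
        if free_bytes >= needed_bytes:
            break
        to_destroy.append(snap)
        # Simplification: treat freed space as additive across snapshots.
        free_bytes += snap.unique_bytes
    if free_bytes < needed_bytes:
        raise RuntimeError("not enough space even after destroying every snapshot")
    return to_destroy
```

The function only decides what to remove; actually destroying the snapshots and running the backup would be separate steps.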
 

ragametal

Contributor
Joined
May 4, 2021
Messages
188
@Patrick M. Hausen, thanks for your input, which is consistent with what I have found during my research. Now, everything I have found indicates that the replication tasks are snapshot based. Do you know if they have the base data in addition to the snapshots? I'm sorry if that question is too basic, I'm still learning about this and I just want to be sure I only need the backup drive (and associated encryption keys) to recover my data.

@Arwen , Your approach is very interesting. You are making sure you get a copy of all the data with rsync and also versioning with snapshots. I just don't think that solution is for me, as I see myself forgetting to do the backups because it's a manual process (time is a luxury I no longer have).

Now, based on my research, TrueNAS has several backup tools that provide protection against different risks, as follows:
  1. Snapshots: protection against accidental/intentional deletion or modification of files.
  2. Snapshots: ransomware protection.
  3. RAID: recover from hardware failure of one or more disks in the pool (depending on the configuration).
  4. Local replication: Local backup to another pool or another local TrueNAS server. Recover from losing the main pool. Common occurrence when rebuilding a pool after replacing one damaged drive.
  5. Rsync: same as #4 but using a non-TrueNAS system as the target.
  6. Remote replication: Backup to a remote TrueNAS server. Recover from total loss of the original system (fire, flood, theft) or from losing the local backup pool during a recovery.
  7. Cloud Sync: Same as #6 but using a non-TrueNAS system as the target.
Is my understanding correct?
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
Do you know if they have the base data in addition to the snapshots?
What do you mean by "base data"? The replicated ZFS dataset or ZVOL contains exactly the content of the source dataset at the specific point in time when the snapshot was created. So if you e.g. create a snapshot every hour and keep them for two weeks you can get data as it was on the source two weeks into the past with a granularity of one hour.
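
To put numbers on that example, here is a small Python sketch. The function name `restore_points` and the date used are arbitrary choices for illustration; the interval and retention come from the hourly, two-week example above.

```python
from datetime import datetime, timedelta


def restore_points(now, interval, retention):
    """List the snapshot times still retained at `now`, newest first."""
    count = retention // interval  # timedelta // timedelta gives an int
    return [now - i * interval for i in range(count)]


# Arbitrary example date; hourly snapshots kept for two weeks.
points = restore_points(datetime(2021, 5, 18, 12, 0),
                        timedelta(hours=1),
                        timedelta(weeks=2))
print(len(points))  # 336 restore points, one per hour
```

Each of those points corresponds to a state of the dataset that can be read back from the replicated copy.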

HTH,
Patrick
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Replication: faster, most efficient, has versioning as it is snapshot based. But does it copy all the data or just the snapshots? Could I mount this ZFS-formatted drive on a Linux machine and recover data from it later?
This question comes up every once in a while. I really don't understand it - who in their right mind would write something like ZFS replication and then somehow "only allow snapshots" without a "starting point or base image"? Whom would that help? What does it even mean to "only store snapshots" if you don't have something to diff from?

The best way of visualizing what is happening is as follows:
  • A snapshot is a little bit of metadata that represents the state of a given filesystem (TXG number, name, etc.).
  • ZFS is copy-on-write, so any writes involve allocating new space.
  • Later, blocks are freed if they are not part of a state represented by the live system or any extant snapshot. If they are, they stay exactly where they've been since they were written.
  • ZFS replication replicates a ZFS dataset from side A to side B. Now, since it is not practical to atomically operate on a live filesystem, it uses snapshots as the source, since they're guaranteed to be static and they were atomically created.
  • Incremental snapshots work because ZFS goes and checks the various snapshots and what blocks changed between them (all blocks are written with their creation TXG).
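
The points above can be sketched as a toy model in Python. This is only an illustration of the idea, not how ZFS is implemented: `ToyDataset` and its methods are made-up names, blocks are plain dictionary entries, and deletions are ignored.

```python
class ToyDataset:
    """Toy copy-on-write dataset: each block remembers the TXG that wrote it."""

    def __init__(self):
        self.txg = 0
        self.live = {}       # block id -> (birth_txg, data)
        self.snapshots = {}  # snapshot name -> (txg, frozen block map)

    def write(self, block_id, data):
        self.txg += 1
        # A write creates a new block version; snapshots keep the old one.
        self.live[block_id] = (self.txg, data)

    def snapshot(self, name):
        # A snapshot is just a named, frozen view of the state at this TXG.
        self.snapshots[name] = (self.txg, dict(self.live))

    def send(self, to_snap, from_snap=None):
        """Blocks a receiver needs: everything, or only blocks born after from_snap."""
        _, blocks = self.snapshots[to_snap]
        since = self.snapshots[from_snap][0] if from_snap else 0
        return {bid: data for bid, (birth, data) in blocks.items() if birth > since}


ds = ToyDataset()
ds.write("a", "v1")
ds.write("b", "v1")
ds.snapshot("monday")
ds.write("b", "v2")
ds.write("c", "v1")
ds.snapshot("tuesday")

full = ds.send("monday")                       # first replication: all blocks
incr = ds.send("tuesday", from_snap="monday")  # later run: only changed blocks
print(full)  # {'a': 'v1', 'b': 'v1'}
print(incr)  # {'b': 'v2', 'c': 'v1'}
```

In this model the first send carries every block, so the target holds the full data, and each later send only carries blocks whose birth TXG is newer than the previous snapshot.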
 

ChrisRJ

Wizard
Joined
Oct 23, 2020
Messages
1,919
The way I explain ZFS snapshots to folks is that they can think of it like a differential backup. The initial run makes a full "copy", and any subsequent run copies over the delta.
 

Evertb1

Guru
Joined
May 31, 2016
Messages
700
The backups will be saved on a dedicated internal drive installed in the same machine as the data disks and will only be accessible to the root user.
Pray that nothing happens with your system. I think a backup in the same machine as the original data is a bad idea.
 

Constantin

Vampire Pig
Joined
May 19, 2017
Messages
1,829
Pray that nothing happens with your system. I think a backup in the same machine as the original data is a bad idea.
Agreed. At least an occasional offline and offsite backup.
 

ragametal

Contributor
Joined
May 4, 2021
Messages
188
Thank you all for the responses, especially for the clarification that ZFS replication is indeed a full copy of the source data when the last snapshot was created.

The more I learn about replication the more I’m convinced it is the right tool to do my local backups (at least in truenas).

@Evertb1 and @Constantin , I agree with you 100%. That is why I also intend to use a cloud sync task to do offsite backups to a remote SFTP server in a friend’s house.

My understanding is that when you replace a failed disk in a degraded pool, the remaining disks will be stressed during the resilvering process. This could lead to another hardware failure in the pool which, depending on the configuration of your vdevs, could mean losing the entire pool.

This local backup would be a protection against that particular scenario.
 

Constantin

Vampire Pig
Joined
May 19, 2017
Messages
1,829
Yup. The nice thing about sending ZFS Snapshots vs. individual files is that you can encrypt the local pool and the replicated pool will be encrypted also.

The zfs send is super efficient compared to using rsync and likely most protocols because the snapshots already capture what has changed in a way that is much more efficient than most file replication protocols.

For example, rsync will traverse up and down the directory tree looking for changes. Then all files no longer at their old place are deleted on the target and “new” files are copied. However, even moving files locally will cause them all to be deleted on the target and re-copied, so that is a lot of additional transfer. Snapshots, by contrast, capture the changes in the directory tree and just transfer the change in the metadata.
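
The cost of a move under a path-based sync can be illustrated with a deliberately naive Python model. This is not rsync's actual algorithm: `naive_path_sync` and the file paths are made up for the example, and files are just dictionary entries keyed by path.

```python
def naive_path_sync(source, target):
    """Sync target to source by path; return (bytes_copied, paths_deleted)."""
    copied = 0
    for path, data in source.items():
        if target.get(path) != data:
            target[path] = data  # path missing or changed: copy whole file
            copied += len(data)
    deleted = [p for p in target if p not in source]
    for p in deleted:
        del target[p]            # path no longer on source: delete on target
    return copied, deleted


source = {"photos/2020/img.raw": b"x" * 1000}
target = dict(source)  # target starts fully in sync

# Move the file on the source; its content is unchanged.
source["archive/img.raw"] = source.pop("photos/2020/img.raw")

copied, deleted = naive_path_sync(source, target)
print(copied, deleted)  # 1000 ['photos/2020/img.raw']
```

Even though no file content changed, the path-keyed model re-copies all 1000 bytes and deletes the old path, which is the extra transfer described above.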

So if there is a way to use zfs to send the data, I’d explore that - including for offsite cloud backups. The less you transfer, the less time it takes to complete, the less likely it is to hit transfer limits for both your ISP as well as the cloud provider, etc.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
So if there is a way to use zfs to send the data, I’d explore that - including for offsite cloud backups. The less you transfer, the less time it takes to complete, the less likely it is to hit transfer limits for both your ISP as well as the cloud provider, etc.
Conveniently, rsync.net now seems to support encrypted sends. It's not the cheapest option around, but it's pretty convenient.
 

Constantin

Vampire Pig
Joined
May 19, 2017
Messages
1,829
If they use rsync, I’d wager you’ll need a big pipe with low latency to make it work. Or use disk images. Lots of small files and rsync do not play nice.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
rsync.net does zfs recv in addition to rsync (and a few other things), don't let the name deceive you.
 