rsync with hardlinks or rsync then snapshot?

FlyingHacker

Dabbler
Joined
Jun 27, 2022
Messages
39
One task of our old (non-TrueNAS) NAS was to run some rsync backups using hard links to keep multiple "snapshots" of a remote un*x file system that does not use ZFS. I use rsync with the -H option to preserve hard links to existing files so that the same file is seen in multiple directories, but saved only once on disk.

I'd like to be able to do this with TrueNAS. I am not sure what the best approach is. I could possibly run a standard rsync task followed by a zfs snapshot task? The rsync task would update everything locally from the remote server (including deletions), and then the zfs snapshot would record it for some period of time. At network speeds the rsync generally finishes within 30 minutes, maybe an hour if a lot of files have changed. So perhaps I space the rsync and snapshot tasks twelve hours apart.
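If the custom-script route wins out, the rsync-then-snapshot sequence could be combined into one cron script, roughly like this (the remote path and dataset name are placeholders, and the `RUN="echo"` guard makes it a dry run; this is a sketch, not a TrueNAS-provided task):

```shell
#!/bin/sh
# Sketch only: remote path and dataset name are hypothetical.
# RUN="echo" makes this a dry run; clear it to execute for real.
RUN="echo"

REMOTE="backup@remotehost:/export/home/"
DATASET="tank/backups/remotehost"
DEST="/mnt/${DATASET}"

# Mirror the remote tree, propagating deletions so the local copy
# matches the source at backup time.
$RUN rsync -aH --delete "$REMOTE" "$DEST/"

# Snapshot immediately after rsync exits, instead of guessing at a
# safe gap between two independently scheduled tasks.
SNAP="${DATASET}@backup-$(date +%Y-%m-%d)"
$RUN zfs snapshot "$SNAP"
```

Running the snapshot as the last step of the same script would remove the need to space two separate tasks twelve hours apart.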


Or am I better off with a custom rsync script like I have been using?

I suppose if I use the custom script I should just run it from cron in the UI, right?

Thanks.
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
I use RSync with -H from my Linux clients to my TrueNAS, then snapshot the destination ZFS dataset. My Linux clients all have dedicated datasets in the TrueNAS, so I only need to snapshot that specific one afterward. This allows me to clean up space more appropriately, client by client. My backup-related snapshots are manual ones, so they are not auto-removed after a specific time.

The RSync -H option only preserves the source's hard links in the destination. It does not make a backup tree, (which I think RSync has some options to do, but I've never used them).

As for the snapshot task, you can have a snapshot script that runs from cron every few minutes looking for a specific file in the backup dataset's top level directory. Like "/mnt/mypool/backups/Linux_Client1/make_snapshot". If that file exists, the snapshot script will erase the file and make the snapshot of "mypool/backups/Linux_Client1". Your RSync backup task adds that file as its last step.
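A minimal version of that sentinel-file check, runnable from cron, could look like this (the dataset and file names follow the example above; the `RUN="echo"` guard keeps it a dry run):

```shell
#!/bin/sh
# Cron-driven sketch of the sentinel-file idea above.
# RUN="echo" makes this a dry run; clear it to take real snapshots.
RUN="echo"

DATASET="mypool/backups/Linux_Client1"
SENTINEL="/mnt/${DATASET}/make_snapshot"

if [ -f "$SENTINEL" ]; then
    # Consume the flag first, so a slow snapshot is not taken twice.
    $RUN rm -f "$SENTINEL"
    $RUN zfs snapshot "${DATASET}@backup-$(date +%Y-%m-%d_%H%M)"
fi
```

The client-side rsync job would then simply `touch` the sentinel file through its mount as its last step.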

You can even have that snapshot task auto-clean up after a certain number of snapshots, after a certain age, or if free space gets tight.
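For example, a keep-the-newest-N pruning pass might be sketched like this (the dataset name, the "@backup-" snapshot prefix, and the KEEP count are all assumptions, and the `RUN="echo"` guard makes it a dry run):

```shell
#!/bin/sh
# Prune all but the newest $KEEP backup snapshots of one dataset.
# RUN="echo" makes this a dry run; clear it to destroy for real.
RUN="echo"

DATASET="mypool/backups/Linux_Client1"
KEEP=60

# List this dataset's backup snapshots, oldest first.
SNAPS=$(zfs list -H -t snapshot -o name -s creation -d 1 \
        "$DATASET" 2>/dev/null | grep "@backup-" || true)
COUNT=$(printf '%s\n' "$SNAPS" | grep -c . || true)
PRUNE=$((COUNT - KEEP))

# Destroy everything beyond the newest $KEEP.
if [ "$PRUNE" -gt 0 ]; then
    printf '%s\n' "$SNAPS" | head -n "$PRUNE" | while read -r snap; do
        $RUN zfs destroy "$snap"
    done
fi
```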
 

FlyingHacker

Dabbler
Joined
Jun 27, 2022
Messages
39
I guess I was wondering if I should rsync WITHOUT -H and then snapshot after each rsync. This way the snapshots capture the file changes, and multiple versions of the files, rather than having to rely on the hardlinks from -H.

I really just need a way to make some sort of "snapshot" of the remote machines each day (or week, etc.). Then I need to roll those off (delete them) after a month or two.

So perhaps it is rsync without -H (but with delete enabled) at midnight and snapshot at noon every day (for daily backups). Would the snapshots handle the deduplication themselves, or do I need a dataset with dedup on for this to work? It might not be a bad idea anyway to put them all in a dedup'ed dataset, because there would be some overlap in terms of FreeBSD and Linux system files common among the various systems (I'm backing up a few remote hosts).
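The midnight/noon split described above could be written as two crontab entries along these lines (host, paths, and dataset are placeholders; note that `%` has to be escaped as `\%` inside a crontab):

```
# m h dom mon dow  command
0 0  * * *  rsync -a --delete backup@remotehost:/export/ /mnt/tank/backups/remotehost/
0 12 * * *  zfs snapshot tank/backups/remotehost@daily-$(date +\%Y-\%m-\%d)
```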
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
If you ever want to restore, you want to use RSync's -H option. This has very little to do with ZFS or TrueNAS: hard links simply reduce storage requirements on the Unix client server. You would want to preserve those hard links on the NAS during backups so that any restore recreates them. Plus, it maintains the same reduced storage requirements on the NAS as on the source Unix client server.

You do want to use RSync's "--delete" as that will make the new backup identical to the source at the time of the backup.

Another useful option for RSync backups of Unix OSes is "--sparse". Several files in Unix, specifically the login log files, are written sparsely, (the writer seeks to an offset and writes there). Thus, they can seem huge: I've seen some claim to be 4GB yet really take only several hundred kilobytes. I've even seen some bizarrely huge ones in the terabyte range, (because my company started using 64-bit user or group IDs).
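Putting those options together, a backup invocation along these lines might be used (host and paths are placeholders, and the `RUN="echo"` guard keeps it a dry run rather than a real transfer):

```shell
#!/bin/sh
# Hypothetical invocation combining -H, --delete and --sparse.
# RUN="echo" makes this a dry run; clear it to transfer for real.
RUN="echo"

OPTS="-aH --delete --sparse"
$RUN rsync $OPTS backup@remotehost:/export/home/ \
    /mnt/mypool/backups/remotehost/
```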

Now as for HOW ZFS snapshots work: a snapshot works something like a copy-on-write hard link. Immediately after a ZFS snapshot, the source dataset and the new snapshot are identical, as if the snapshot were 100% hard links into the source dataset. The magic comes into play as the source dataset gets changed: the snapshot remains static, so the source dataset has to allocate new space for any of the changes, breaking that specific pseudo hard link.

Last, ZFS De-Duplication is an advanced feature. Most people just don't have the hardware for it, (it needs lots of RAM). Using ZFS snapshots gives a similar effect for client backups, because only the changes between the snapshots and the source dataset take up extra space. De-Dup does extend that space reduction across all Unix client backups sharing the same ZFS dataset, but at a higher cost.
 