Unveiling ZFS Replication Quirk: Your Destination Snapshots at Risk?

tnuser9999

Dabbler
Joined
Jun 29, 2023
Messages
40
I did a little investigation of syncoid. There is an option to not roll back the destination when the source is rolled back. I am hoping TrueNAS has some similar configuration or, as moderator Honeybadger stated, a configuration that preserves the destination snapshots.
  • --no-rollback
    Do not rollback anything (clones or snapshots) on target host
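As a sketch of how that flag is used (the hostnames and dataset names below are made up for illustration; the --no-rollback flag itself is real):

```shell
# By default syncoid may roll the target back to the newest common
# snapshot so an incremental send can apply cleanly:
syncoid tank/data backupuser@backuphost:backup/data

# With --no-rollback it refuses to discard anything on the target;
# if the target has diverged, the replication fails instead of
# destroying destination state:
syncoid --no-rollback tank/data backupuser@backuphost:backup/data
```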
 
Joined
Jun 15, 2022
Messages
674
Wouldn’t that double the used space of the dataset as it would need to create a parallel universe?
A VBox snapshot contains only the blocks that changed since the prior snapshot. Restoring to a previous snapshot uses no space at first; once you start using the dataset and blocks change, the changed blocks begin taking up space. Taking a snapshot of those changes takes no space either: it saves that point in time as a named checkpoint and opens a new generic diff file. If you snapshot again, that generic diff file gets a name in turn. If you do a rollback without saving the current state, it deletes the generic diff file, changes the base pointer to the snapshot, and creates a new generic diff file.

So VirtualBox in essence is always making differential backups, snapshotting "names and saves" them. It's awesome and instant all the time...until:
  • You delete a center snapshot (a collapse causes a merge).
  • A block on disk or in memory gets corrupted (which seems an eventuality without at least ECC, but really ZFS Best Practices should be followed).
  • Ransomware that a.) understands the disk layout and b.) has Admin privileges gets into the system (mainly affects Windows boxes through privilege escalation).
I imagine ZFS does something similar? Maybe? (This is presumably where ZFS is more like Doc's time travel as there's only one odd genius who understands it and everyone else hopes their idea of how it works is fairly close.)
 

NickF

Guru
Joined
Jun 12, 2014
Messages
763
You can do a ZFS hold on a snapshot so that it NEVER gets deleted. IIRC this is coming to the GUI in SCALE Cobia, but you can run the zfs hold command from basically any version of TrueNAS.
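For anyone following along, the mechanics look roughly like this (the snapshot names are made up and the hold tag is arbitrary):

```shell
# Place a named hold on a snapshot; while any hold exists,
# "zfs destroy" on that snapshot fails with "dataset is busy":
zfs hold keepme tank/data@auto-2023-07-01_00-00

# Inspect existing holds on the snapshot:
zfs holds tank/data@auto-2023-07-01_00-00

# Release the hold once you no longer need the safety net:
zfs release keepme tank/data@auto-2023-07-01_00-00
```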
 
Joined
Jul 3, 2015
Messages
926
Problem is, if you run zfs hold on a snapshot on the receiving end, it breaks auto replication after the rollback: the source wants to remove all unrelated snapshots and it can't because of the hold.
 

tnuser9999

Dabbler
Joined
Jun 29, 2023
Messages
40
I would want a rollback on source to break replication with destination in this case. I would not want destination retention affected by source. As noted in this thread, rollback is a serious step. I think it's a serious enough step that if I wish to perform a rollback on source manually, then I would also take manual steps to rollback destination so replication continues to work if the rollback on source was intentional. TrueNAS replication can't be a reliable secure backup solution if changes to source can change destination. I am looking forward to learning what the moderator finds when testing if this is intended or not. From the previous post it sounded like the expectation was there is an option where source changes did not affect destination.
 

tnuser9999

Dabbler
Joined
Jun 29, 2023
Messages
40
For instance, here is how syncoid handles this:
"--no-rollback" tells syncoid not to attempt to bring the target dataset to the same state as the src in order for the replication to work. So in case the dst dataset is newer than the src (has data written to it), the replication fails with this option, which is the desired outcome.
You need to make sure nothing is written to the backup dataset for the replication to work.

If you roll back the src dataset, like you did, the data on the target dataset is newer, and for the ZFS replication to work, it needs to be rolled back.
 

NickF

Guru
Joined
Jun 12, 2014
Messages
763
Problem is, if you run zfs hold on a snapshot on the receiving end, it breaks auto replication after the rollback: the source wants to remove all unrelated snapshots and it can't because of the hold.
Not if it’s held on both sides
 
Joined
Jul 3, 2015
Messages
926
Destination snapshot retention and rollback are two different things. There is already the ability to make destination retention different to source, but rollback is very different. As I mentioned above, when you roll back you are severing the tie between the existing datasets and those snapshots that are essentially being discarded. It sounds like your desired outcome is to have an up-to-date replicated backup until the day you don't want one, which is kind of tricky. I completely get where you are coming from, but I don't agree this means that 'TrueNAS can't be a secure reliable backup solution'. If someone gets root access to your primary TN, as mentioned above, you are in trouble, so the focus of your attention should primarily be on making sure that doesn't happen, which should involve a secure, isolated network with ACLs and a firewall, complex password authentication, and 2FA.
 

NickF

Guru
Joined
Jun 12, 2014
Messages
763
Destination snapshot retention and rollback are two different things. There is already the ability to make destination retention different to source but rollback is very different.
Yes, you can set differing retentions between source and destination. You can hold it longer on the source or you can hold it longer on the destination, etc.
As I mentioned above when you rollback you are severing the tie between the existing datasets and those snapshots that are essentially being discarded
The "rollback" mechanism is not designed for normal use. It's a "oh shit" button in a disaster recovery situation. An alternative is to restore individual files via SMB and Shadow Copies.
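For context, the "oh shit" button is a one-liner (the dataset and snapshot names below are hypothetical):

```shell
# Discards everything written after the named snapshot. With -r it
# also destroys any snapshots newer than the rollback target, which
# is exactly what severs the chain shared with a replication
# destination:
zfs rollback -r tank/data@auto-2023-06-01_00-00
```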

To be clear, I think we are on the same page here. I'm just making sure that I am being clear for OPs benefit. There is no "quirk" here. This is an exercise in understanding the "nuance" of this particular backup and DR strategy.

Let's take ZFS out of the conversation and talk more generally about this topic for a moment. Let's imagine we are using Veeam to back up a bunch of stuff. Now let's imagine that the Veeam server was set up in such a way that it was compromised along with all of your other systems in a crypto-malware attack. How do you restore your backups? :P

My point is that you as a sysadmin need to understand how the technologies you are using work, and plan around those systemic behaviors.
 
Joined
Jun 15, 2022
Messages
674
I'm going to say at this point Replication is not a reasonable backup strategy. Anything that's as complex/involved as this turned out to be is prone to failure, especially when the staff is under stress from trying to mitigate damages.

Use reliable archival software on a remote system.
KISS: Keep It Stupidly Simple
 

NickF

Guru
Joined
Jun 12, 2014
Messages
763
I'm going to say at this point Replication is not a reasonable backup strategy. Anything that's as complex/involved as this turned out to be is prone to failure, especially when the staff is under stress from trying to mitigate damages.
I fundamentally and completely disagree with you. Whether we are talking about ZFS replication or simply keeping a second copy of your data with RSYNC or some other methodology, having that second copy is literally the definition of a good backup strategy. Also, ZFS replication is one of the few solutions to this problem that preserves ACLs and XATTRs...which is critical for some workloads.

For the particulars of ZFS replication, you can literally use the TN wizard in a "set-and-forget" way using the defaults, or you can tune it to do all sorts of advanced things like we have been talking about in this thread.
Use reliable archival software on a remote system.
KISS: Keep It Stupidly Simple
Pray tell, what "archival" software on a remote system offers better features, performance, reliability, simplicity, etc.? You talked about VirtualBox above and why its differential snapshots are great. This is exactly what ZFS snapshots do. Since replication literally relies on (requires) snapshots, it's not a dissimilar situation.
 

awasb

Patron
Joined
Jan 11, 2021
Messages
415

NickF

Guru
Joined
Jun 12, 2014
Messages
763
I really don't want it to work after a rollback. I want it to break. If it was intentional I would rollback destination manually too.
This answers all of my questions here. I agree with you and I think this is /thread.
TLDR, use a ZFS hold on snaps on both sides if you are worried, or just set the retention time longer on the destination side. :)
 
Joined
Jul 3, 2015
Messages
926
Haha this sounds just like the debate I have at work with the old skool backup folk.
 

NickF

Guru
Joined
Jun 12, 2014
Messages
763
Haha this sounds just like the debate I have at work with the old skool backup folk.
But we didn't even talk about the minimum geographic distances for storing backups like RCLONE (the cloud) or tape yet! :wink:
 
Joined
Jul 3, 2015
Messages
926
use a ZFS hold on snaps on both sides if you are worried, or just set the retention time longer on the destination side. :)
Even if this works, how practical is it when taking potentially hundreds of snapshots? Are you suggesting you somehow auto-hold all snapshots on both sides, and if so, how and for how long?
 
Joined
Oct 22, 2019
Messages
3,641
Are you suggesting you somehow auto-hold all snapshots on both sides, and if so, how and for how long?
Because there's no seamless or integrated means to do this with TrueNAS (yet?), you can do it manually "once in a while" as a simple contingency:
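One manual approach might look like this (a sketch only; the pool name "tank" and the hold tag "contingency" are placeholders):

```shell
# Put a hold named "contingency" on the newest snapshot of every
# dataset under tank. The list is sorted by creation time, so awk
# keeps the last-listed (newest) snapshot per dataset:
zfs list -H -t snapshot -o name -s creation -r tank |
  awk -F@ '{ latest[$1] = $0 } END { for (d in latest) print latest[d] }' |
  xargs -n 1 zfs hold contingency

# Later, release the holds again (errors for snapshots that never
# carried the tag are ignored):
zfs list -H -t snapshot -o name -r tank |
  xargs -n 1 zfs release contingency 2>/dev/null || true
```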

 

NickF

Guru
Joined
Jun 12, 2014
Messages
763
Even if this works, how practical is it when taking potentially hundreds of snapshots? Are you suggesting you somehow auto-hold all snapshots on both sides, and if so, how and for how long?
This is a function of your delta size between snapshots. If you have a dataset which does not dynamically change much (like an archive of files), does it really matter? I've used this strategy for those instances dozens of times. You only need to hold a single snapshot on both sides :P

If it's for a LUN backing VMs? That's a different story.
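Holding a single matching snapshot on both sides could be as simple as (hostnames, datasets, and the hold tag are assumptions for illustration):

```shell
# On the source:
zfs hold keep tank/archive@manual-2023-07-01

# On the destination, the same snapshot exists under the replicated
# dataset, so hold it there too:
ssh backuphost zfs hold keep backup/archive@manual-2023-07-01
```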
 
Joined
Jul 3, 2015
Messages
926
OK, so this is a manual, ad-hoc suggestion? I guess this won't help the situation the OP has raised around being hijacked and someone letting rip and rolling back all the datasets?
 
Joined
Jul 3, 2015
Messages
926
Unless they zfs hold one snap on every dataset on both sides? Then remember to release them from time to time so they don’t run out of space?
 