Unveiling ZFS Replication Quirk: Your Destination Snapshots at Risk?

tnuser9999

Dabbler
Joined
Jun 29, 2023
Messages
40
I did a little investigation of syncoid. There is an option to not roll back the destination when the source is rolled back. I am hoping TrueNAS has some similar configuration or, as moderator Honeybadger stated, a configuration that preserves the destination snapshots.
  • --no-rollback
    Do not rollback anything (clones or snapshots) on target host
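As a sketch of how that flag is used (the hostnames and dataset names below are made up for illustration; the --no-rollback flag itself is real):

```shell
# By default syncoid may roll the target back to the newest common
# snapshot so an incremental send can apply cleanly:
syncoid tank/data backupuser@backuphost:backup/data

# With --no-rollback it refuses to discard anything on the target;
# if the target has diverged, the replication fails instead of
# destroying destination state:
syncoid --no-rollback tank/data backupuser@backuphost:backup/data
```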
 
Joined
Jun 15, 2022
Messages
674
Wouldn’t that double the used space of the dataset as it would need to create a parallel universe?
A VBox snapshot contains only the blocks that changed since the prior snapshot. Restoring to a previous snapshot uses no space at first; once you start using the dataset and blocks change, the changed blocks begin taking up space. Taking a snapshot of those changes takes no space either: it saves that point in time as a named checkpoint and opens a new generic diff file. If you snapshot again, that generic diff file gets a name in turn. If you do a rollback without saving the current state, it deletes the generic diff file, changes the base pointer to the snapshot, and creates a new generic diff file.

So VirtualBox in essence is always making differential backups, snapshotting "names and saves" them. It's awesome and instant all the time...until:
  • You delete a center snapshot (a collapse causes a merge).
  • A block on disk or in memory gets corrupted (which seems an eventuality without at least ECC, but really ZFS Best Practices should be followed).
  • Ransomware that a.) understands the disk layout and b.) has Admin privileges gets into the system (mainly affects Windows boxes through privilege escalation).
I imagine ZFS does something similar? Maybe? (This is presumably where ZFS is more like Doc's time travel as there's only one odd genius who understands it and everyone else hopes their idea of how it works is fairly close.)
 

NickF

Guru
Joined
Jun 12, 2014
Messages
763
You can do a ZFS hold on a snapshot so that it NEVER gets deleted. IIRC this is coming to the GUI in SCALE Cobia, but you can run the zfs hold command from basically any version of TrueNAS.
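For anyone following along, the mechanics look roughly like this (the snapshot names are made up and the hold tag is arbitrary):

```shell
# Place a named hold on a snapshot; while any hold exists,
# "zfs destroy" on that snapshot fails with "dataset is busy":
zfs hold keepme tank/data@auto-2023-07-01_00-00

# Inspect existing holds on the snapshot:
zfs holds tank/data@auto-2023-07-01_00-00

# Release the hold once you no longer need the safety net:
zfs release keepme tank/data@auto-2023-07-01_00-00
```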
 
Joined
Jul 3, 2015
Messages
926
Problem is, if you run zfs hold on a snapshot on the receiving end, it breaks auto replication after the rollback: the source wants to remove all unrelated snapshots and it can't because of the hold.
 

tnuser9999

Dabbler
Joined
Jun 29, 2023
Messages
40
I would want a rollback on source to break replication with destination in this case. I would not want destination retention affected by source. As noted in this thread, rollback is a serious step. I think it's a serious enough step that if I wish to perform a rollback on source manually, then I would also take manual steps to rollback destination so replication continues to work if the rollback on source was intentional. TrueNAS replication can't be a reliable secure backup solution if changes to source can change destination. I am looking forward to learning what the moderator finds when testing if this is intended or not. From the previous post it sounded like the expectation was there is an option where source changes did not affect destination.
 

tnuser9999

Dabbler
Joined
Jun 29, 2023
Messages
40
For instance, here is how syncoid handles this:
"--no-rollback" tells syncoid not to attempt to bring the target dataset to the same state as the src in order for the replication to work. So in case the dst dataset is newer than the src (has data written to it), the replication fails with this option, which is the desired outcome.
You need to make sure nothing is written to the backup dataset for the replication to work.

If you roll back the src dataset, like you did, the data on the target dataset is newer, and for the ZFS replication to work, it needs to be rolled back.
 

NickF

Guru
Joined
Jun 12, 2014
Messages
763
Problem is, if you run zfs hold on a snapshot on the receiving end, it breaks auto replication after the rollback: the source wants to remove all unrelated snapshots and it can't because of the hold.
Not if it’s held on both sides
 
Joined
Jul 3, 2015
Messages
926
Destination snapshot retention and rollback are two different things. There is already the ability to make destination retention different to source, but rollback is very different. As I mentioned above, when you roll back you are severing the tie between the existing datasets and those snapshots that are essentially being discarded. It sounds like your desired outcome is to have an up-to-date replicated backup until the day you don't want one, which is kind of tricky. I completely get where you are coming from, but I don't agree this means that 'TrueNAS can't be a secure reliable backup solution'. If someone gets root access to your primary TN, as mentioned above, you are in trouble, so the focus of your attention should primarily be on making sure that doesn't happen, which should involve a secure, isolated network with ACLs and a firewall, complex password authentication, and 2FA.
 

NickF

Guru
Joined
Jun 12, 2014
Messages
763
Destination snapshot retention and rollback are two different things. There is already the ability to make destination retention different to source but rollback is very different.
Yes, you can set differing retentions between source and destination. You can hold it longer on the source or you can hold it longer on the destination, etc.
As I mentioned above when you rollback you are severing the tie between the existing datasets and those snapshots that are essentially being discarded
The "rollback" mechanism is not designed for normal use. It's a "oh shit" button in a disaster recovery situation. An alternative is to restore individual files via SMB and Shadow Copies.
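For context, the "oh shit" button is a one-liner (the dataset and snapshot names below are hypothetical):

```shell
# Discards everything written after the named snapshot. With -r it
# also destroys any snapshots newer than the rollback target, which
# is exactly what severs the chain shared with a replication
# destination:
zfs rollback -r tank/data@auto-2023-06-01_00-00
```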

To be clear, I think we are on the same page here. I'm just making sure that I am being clear for OPs benefit. There is no "quirk" here. This is an exercise in understanding the "nuance" of this particular backup and DR strategy.

Let's take ZFS out of the conversation and talk more generally about this topic for a moment. Let's imagine we are using Veeam to back up a bunch of stuff. Now let's imagine that the Veeam server was set up in such a way that it was compromised along with all of your other systems in a crypto-malware attack. How do you restore your backups? :P

My point is that you as a sysadmin need to understand how the technologies you are using work, and plan around those systemic behaviors.
 
Joined
Jun 15, 2022
Messages
674
I'm going to say at this point Replication is not a reasonable backup strategy. Anything that's as complex/involved as this turned out to be is prone to failure, especially when the staff is under stress from trying to mitigate damages.

Use reliable archival software on a remote system.
KISS: Keep It Stupidly Simple
 

NickF

Guru
Joined
Jun 12, 2014
Messages
763
I'm going to say at this point Replication is not a reasonable backup strategy. Anything that's as complex/involved as this turned out to be is prone to failure, especially when the staff is under stress from trying to mitigate damages.
I fundamentally and completely disagree with you. Whether we are talking about ZFS replication or simply keeping a second copy of your data with RSYNC or some other methodology, having that second copy is literally the definition of a good backup strategy. Also, ZFS replication is one of the few solutions to this problem that preserves ACLs and XATTRs...which is critical for some workloads.

For the particulars of ZFS replication, you can literally use the TN wizard in a "set-and-forget" way using the defaults, or you can tune it to do all sorts of advanced things like we have been talking about in this thread.
Use reliable archival software on a remote system.
KISS: Keep It Stupidly Simple
Pray tell, what "archival" software on a remote system offers better features, performance, reliability, simplicity, etc.? You talked about VirtualBox above and why its differential snapshots are great. This is exactly what ZFS snapshots do. Since replication literally relies on (requires) snapshots, it's not a dissimilar situation.
 

awasb

Patron
Joined
Jan 11, 2021
Messages
415

NickF

Guru
Joined
Jun 12, 2014
Messages
763
I really don't want it to work after a rollback. I want it to break. If it was intentional I would rollback destination manually too.
This answers all of my questions here. I agree with you and I think this is /thread.
TLDR, use a ZFS hold on snaps on both sides if you are worried, or just set the retention time longer on the destination side. :)
 
Joined
Jul 3, 2015
Messages
926
Haha this sounds just like the debate I have at work with the old skool backup folk.
 

NickF

Guru
Joined
Jun 12, 2014
Messages
763
Haha this sounds just like the debate I have at work with the old skool backup folk.
But we didn't even talk about the minimum geographic distances for storing backups like RCLONE (the cloud) or tape yet! :wink:
 
Joined
Jul 3, 2015
Messages
926
use a ZFS hold on snaps on both sides if you are worried, or just set the retention time longer on the destination side. :)
Even if this works, how practical is it when taking potentially hundreds of snapshots? Are you suggesting you somehow auto-hold all snapshots on both sides, and if so, how and for how long?
 
Joined
Oct 22, 2019
Messages
3,641
Are you suggesting you somehow auto-hold all snapshots on both sides, and if so, how and for how long?
Because there's no seamless or integrated means to do this with TrueNAS (yet?), you can do it manually "once in a while" as a simple contingency:
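One manual approach might look like this (a sketch only; the pool name "tank" and the hold tag "contingency" are placeholders):

```shell
# Put a hold named "contingency" on the newest snapshot of every
# dataset under tank. The list is sorted by creation time, so awk
# keeps the last-listed (newest) snapshot per dataset:
zfs list -H -t snapshot -o name -s creation -r tank |
  awk -F@ '{ latest[$1] = $0 } END { for (d in latest) print latest[d] }' |
  xargs -n 1 zfs hold contingency

# Later, release the holds again (errors for snapshots that never
# carried the tag are ignored):
zfs list -H -t snapshot -o name -r tank |
  xargs -n 1 zfs release contingency 2>/dev/null || true
```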

 

NickF

Guru
Joined
Jun 12, 2014
Messages
763
Even if this works, how practical is it when taking potentially hundreds of snapshots? Are you suggesting you somehow auto-hold all snapshots on both sides, and if so, how and for how long?
This is a function of your delta size between snapshots. If you have a dataset which does not dynamically change much (like an archive of files), does it really matter? I've used this strategy for those instances dozens of times. You only need to hold a single snapshot on both sides :P

If it's for a LUN backing VMs? That's a different story.
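Holding a single matching snapshot on both sides could be as simple as (hostnames, datasets, and the hold tag are assumptions for illustration):

```shell
# On the source:
zfs hold keep tank/archive@manual-2023-07-01

# On the destination, the same snapshot exists under the replicated
# dataset, so hold it there too:
ssh backuphost zfs hold keep backup/archive@manual-2023-07-01
```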
 
Joined
Jul 3, 2015
Messages
926
OK, so this is a manual, ad-hoc suggestion? I guess this won't help the situation the OP has raised around being hijacked and someone letting rip and rolling back all the datasets?
 
Joined
Jul 3, 2015
Messages
926
Unless they zfs hold one snap on every dataset on both sides? Then remember to release them from time to time so they don’t run out of space?
 