Unveiling ZFS Replication Quirk: Your Destination Snapshots at Risk?

tnuser9999

Dabbler
Joined
Jun 29, 2023
Messages
40
I'm currently testing TrueNAS ZFS replication as a backup solution. My aim is to ensure that ZFS snapshots are not deleted on the destination server when they are removed from the source. This is crucial to prevent potential data loss if unauthorized access occurs.

In my replication configuration, I selected a custom snapshot retention policy to prevent source snapshot actions from affecting the snapshots on the destination server. However, during a test where I rolled back the source to a previous snapshot, I noticed that the corresponding snapshots on the destination were also removed. This unexpected behavior occurred consistently in multiple tests.

Could someone please advise me on the correct configuration to ensure that changes on the source do not lead to snapshot removals on the destination? Your insights would be greatly appreciated.

rep3.PNG
 

artlessknave

Wizard
Joined
Oct 29, 2016
Messages
1,506
when you roll back, all snapshots later than the rollback target cease to exist. you have rolled back. any snapshots that were replicated after the point you rolled back to are no longer valid and thus useless. there is nowhere to use them.

the only way i know of to change this is to clone THEN rollback, which will keep all snapshots in any clones. they will then belong only to the clones, not to anything that was rolled back.

or change the replication destination.
 

samarium

Contributor
Joined
Apr 8, 2023
Messages
192
As above, or maybe you could tweak things so the replication breaks in the case of a source rollback and requires manual intervention.

How do you differentiate between a known-good rollback, where you probably want dst to mirror src, and a hostile rollback, where you want to block rollback/delete on dst?

Now extrapolate to n src machines and m datasets. How do you manage it while still getting backups flowing without admin intervention at 0300?

Seems like you could run a pre-script that clones/promotes the remote to cater for your, not necessarily unreasonable, paranoia, so you could then allow the normal snapshot flow.

But as far as I'm concerned it is working properly as expected in line with how snapshots and replication work. So you will need to do the extra work to make it do what you want.
 

artlessknave

Wizard
Joined
Oct 29, 2016
Messages
1,506
additionally, a rollback is a major operation. you would only be doing this for a catastrophic problem, such as everything getting encrypted by ransomware. normally, you would just grab any individual files from snapshots, or clone a snapshot. you could, for example, instead clone the snapshot you were rolling back to. you would then have the original dataset and a clone (clones are of any snapshot, and create a new filesystem tree dependent on said snapshot)
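A minimal sketch of the clone-instead-of-rollback idea above, in Python. It only builds the `zfs clone` command line and prints it; the dataset, snapshot and clone names are hypothetical, and actually running the command would need root on a host with ZFS.

```python
from __future__ import annotations

import shlex


def clone_command(dataset: str, snapshot: str, clone_name: str) -> list[str]:
    """Build a `zfs clone` command that exposes a snapshot as a new, writable dataset.

    The original dataset and all of its later snapshots are left untouched.
    """
    for part in (dataset, snapshot, clone_name):
        if not part or "@" in part:
            raise ValueError(f"invalid name component: {part!r}")
    return ["zfs", "clone", f"{dataset}@{snapshot}", clone_name]


# Hypothetical names, for illustration only.
cmd = clone_command("tank/data", "auto-2023-07-01_00-00", "tank/data-restore")
print(shlex.join(cmd))
```

Because nothing is rolled back, the later snapshots stay on the source, so there is nothing for a later replication run to remove on the destination.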
 
Joined
Jul 3, 2015
Messages
926
"This is crucial to prevent potential data loss if unauthorized access occurs."
Could you elaborate on this please? Might be worth exploring the situation you are trying to prevent to confirm it’s necessary.
 

tnuser9999

Dabbler
Joined
Jun 29, 2023
Messages
40
Are people relying on ZFS replication as backups? To elaborate: suppose you are depending on ZFS replication as a backup of your data. Then if a malicious bad actor has compromised your system and is in and has root access, they could easily execute a rollback command and wipe out most of your data, not only on the source but, after the next replication, on the backup destination TrueNAS server too. Unless you have a backup elsewhere, you will sadly find that the source data is gone and that the data you thought would be on the TrueNAS backup server is gone as well.
 

tnuser9999

Dabbler
Joined
Jun 29, 2023
Messages
40
To add: The preferred behavior for someone in this case would be for replication to fail on a rollback of the source. Then at least the data would still be present on the destination and, while recovery from the attack is happening, the destination could be the primary with all data intact. Or, if it was an intended rollback, could the person not just remove the conflicting snapshots on the destination manually and then replicate again?
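One way to approximate this "fail instead of delete" behavior is a guard that runs before each replication and refuses to continue when the destination holds snapshots newer than anything it shares with the source. The Python sketch below is only an assumption about how such a check could look, not something TrueNAS provides. It expects listings in the tab-separated form printed by `zfs list -H -t snapshot -o name,guid -s creation` (oldest first); the sample listings and the `parse_snapshots` and `find_orphans` names are hypothetical.

```python
from __future__ import annotations


def parse_snapshots(listing: str) -> list[tuple[str, str]]:
    """Parse `name<TAB>guid` lines, oldest first, into (snapshot name, guid) pairs."""
    pairs = []
    for line in listing.splitlines():
        if line.strip():
            name, guid = line.split("\t")
            pairs.append((name.split("@", 1)[1], guid.strip()))
    return pairs


def find_orphans(source: list[tuple[str, str]], dest: list[tuple[str, str]]) -> list[str]:
    """Return destination snapshots newer than the last common one that the source lacks.

    Snapshots are matched by GUID, so a recreated snapshot with a reused name
    still counts as different. An empty result means replication can proceed.
    """
    source_guids = {guid for _, guid in source}
    last_common = -1
    for index, (_, guid) in enumerate(dest):
        if guid in source_guids:
            last_common = index
    return [name for name, _ in dest[last_common + 1:]]


# Hypothetical listings: the source was rolled back to "auto-02" and then took "auto-05".
SOURCE = "tank/data@auto-01\t111\ntank/data@auto-02\t222\ntank/data@auto-05\t555\n"
DEST = (
    "backup/data@auto-01\t111\nbackup/data@auto-02\t222\n"
    "backup/data@auto-03\t333\nbackup/data@auto-04\t444\n"
)

orphans = find_orphans(parse_snapshots(SOURCE), parse_snapshots(DEST))
if orphans:
    print("refusing to replicate; destination-only snapshots:", orphans)
```

Old snapshots pruned on the source by ordinary retention do not trip the guard, because only destination snapshots after the last common one are considered. An intended rollback would still need the manual cleanup on the destination described above before replication resumes.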
 
Joined
Jul 3, 2015
Messages
926
It's a fair point, and there are three thoughts that spring to mind.

1. If we are to assume that your TrueNAS is sat within a secure network that only allows certain IP addresses/ranges access over https, and that your root/admin account has a complex password with 2FA enabled, what is the realistic likelihood of this happening? The would-be attacker would also have to know that rolling back a snapshot of a given dataset would cause much damage. This would most likely be carried out by the systems administrator and as we know if they want to let rip we can't really stop them as they are god in this situation.

2. More practically, you could restrict your replication task to run only between, say, 6pm and 8pm, meaning the window for this unfortunate event to reach the destination is very small, and one would assume you would have noticed the problem before the next replication had a chance to run.

3. zpool checkpoint allows you to take a pool-wide snapshot of your system, but only one at a time. All actions performed after a checkpoint, such as rollbacks and dataset deletions, can be undone by rewinding to it. As the pool's current data diverges from the checkpoint, the space the checkpoint holds grows. You could perhaps have a cron job run every day/week that discards the checkpoint and takes another, as this would give you some sort of DR. You could decide on the most appropriate retention period based on your requirements and the rate of data change on the pool.
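The checkpoint rotation in point 3 could be sketched roughly as follows, in Python. This is a sketch under assumptions rather than a tested job: the pool name `tank` and the function names are hypothetical, and it defaults to a dry run that only returns the `zpool checkpoint` command lines. Note that discarding first leaves a short window with no checkpoint at all.

```python
from __future__ import annotations

import subprocess


def rotation_commands(pool: str) -> list[list[str]]:
    """Commands to drop the existing pool checkpoint and take a fresh one."""
    return [
        ["zpool", "checkpoint", "-d", pool],  # -d discards the current checkpoint
        ["zpool", "checkpoint", pool],
    ]


def rotate_checkpoint(pool: str, dry_run: bool = True) -> list[str]:
    """Run the rotation, or only return the command strings when dry_run is set."""
    executed = []
    discard, create = rotation_commands(pool)
    for command, must_succeed in ((discard, False), (create, True)):
        executed.append(" ".join(command))
        if not dry_run:
            # The discard step may fail on the first run, when there is nothing to discard.
            subprocess.run(command, check=must_succeed)
    return executed


print(rotate_checkpoint("tank"))  # "tank" is a hypothetical pool name
```

A cron entry would call `rotate_checkpoint(pool, dry_run=False)` as root at whatever interval matches the retention period you settle on.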

Once again fair point and good to think these things through. Thanks for sharing your thoughts, I enjoyed this one.
 

NickF

Guru
Joined
Jun 12, 2014
Messages
763
I've discussed part of this at length and have designed what I feel is a very good solution.

I don't see any "quirks" here, but certainly there is nuance to any backup strategy...details matter.
 
Joined
Oct 22, 2019
Messages
3,641
"if a malicious bad actor has compromised your system and is in and has root access"
"This would most likely be carried out by the systems administrator and as we know if they want to let rip we can't really stop them as they are god in this situation."

That's all that needs to be said. Having such access means they can also access your keys to connect to the backup server to outright destroy other datasets.

Just visit System > SSH Connections and SSH Keypairs to find the relevant targets and keys.
 

tnuser9999

Dabbler
Joined
Jun 29, 2023
Messages
40
Thanks for all the discussion! Yes, we all want to think we have protected our systems well. There are no guarantees that there will not be an exploit, now or in the future, where a bad actor does get in. In this case you want a reliable backup that isn't subject to the state of the source. Ideally, what happens on the source doesn't remove data from the destination. If you are doing pull replication, the only keys the bad actor should have on the source are your source keys and maybe some public keys of the destination. Also, if pulling, you can have the destination in a separate firewall zone without any inbound rules into that zone. It's not if bad guys get in, it's when, and how you can recover from it.
 

tnuser9999

Dabbler
Joined
Jun 29, 2023
Messages
40
It sounds like this behavior is by design and unfortunately can't be tuned. It would be nice to have the option for destination snapshots not to be deleted by anything happening on the source, and rather managed at the destination. I love the GUI and the convenience of TrueNAS, but this is probably a deal breaker for me. Has anyone worked with syncoid from the sanoid project? I may look to see if it has the same limitations. Thank you!
 
Joined
Oct 22, 2019
Messages
3,641
"Has anyone worked with syncoid from the sanoid project?"
I have. It's a quick and easy way to create snapshots and send them to a destination. You don't even need to use the "Sanoid" portion.

Syncoid works as an "on-demand" replication. It uses its own self-contained named snapshots. It handles its own pruning as well. However, I never tested out your scenario of "destroying the source" to see what will happen with the destination. It will probably just fail with "no common snapshots exist between source and target."

It's not included with TrueNAS, nor would it be recommended to try to "install" it in the host system.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Hello @tnuser9999

Have you configured your snapshots for PUSH or PULL replication? I'll have to see if I can reproduce the settings here, but there should certainly be a setting that will preserve the snapshots, albeit with an increase in disk consumption on the destination/replication target.
 

tnuser9999

Dabbler
Joined
Jun 29, 2023
Messages
40
Oh, that's good news if this isn't intentional. I have pull replication configured. Let me share all of the configuration. To reproduce the issue, I performed a rollback on the source to an earlier snapshot. Once the source creates a new snapshot after the rollback that matches the replication schedule, the task replicates the newest snapshot and removes the snapshots on the destination that were no longer present on the source after the rollback.

rep3_1.PNG

rep3.PNG
 
Joined
Jul 3, 2015
Messages
926
I can’t quite see how this is going to work tbh. Snapshots reference each other and if you rollback you essentially start a new timeline. Have you ever seen ‘Back to the Future 2’? It’s like that bit with Doc and the chalkboard.
 
Joined
Jun 15, 2022
Messages
674
"I can’t quite see how this is going to work tbh. Snapshots reference each other and if you rollback you essentially start a new timeline. Have you ever seen ‘Back to the Future 2’? It’s like that bit with Doc and the chalkboard."
I haven't tried it in TrueNAS, but in VirtualBox you start a new concurrent timeline.

---
Doc's timeline analysis is suspect (it creates a temporal paradox). If you go back to your past 20 years ago and change it that doesn't change your current self because the timeline you came from is also in your past. (Novikov self-consistency principle)
 
Joined
Jul 3, 2015
Messages
926
Wouldn’t that double the used space of the dataset as it would need to create a parallel universe?
 

tnuser9999

Dabbler
Joined
Jun 29, 2023
Messages
40
"I can’t quite see how this is going to work tbh. Snapshots reference each other and if you rollback you essentially start a new timeline. Have you ever seen ‘Back to the Future 2’? It’s like that bit with Doc and the chalkboard."
I really don't want it to work after a rollback. I want it to break. If the rollback was intentional, I would roll back the destination manually too.
 