Server off longer than snapshot retention period

tnuser9999

Dabbler
Joined
Jun 29, 2023
Messages
40
I noticed during testing that if the source TrueNAS server stays off for longer than the snapshot retention period, all the snapshots get removed, which breaks the replication to the destination server and leaves no common snapshot to resume from. This seems really dangerous if you have a server you take offline for long periods of time. How would you prevent breaking the replication? Do you just set a really long retention policy?
 
Joined
Oct 22, 2019
Messages
3,641
There's no GUI solution for this with TrueNAS.

I had submitted a feature request to leverage the "hold" feature for ZFS snapshots, but it might be a while before they implement it. (Think of a "hold" as a way to protect a snapshot from destruction, no matter what operation attempts to destroy it.) It doesn't need to be limited to a simple checkbox that you manually manage. There can be a way to automate the hold/release of snapshots. But that's for another conversation.

Then there's also the ZFS "bookmark" feature. Again, the GUI does not leverage this.
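Roughly, a bookmark keeps just enough metadata to serve as the incremental source for a later send, even after the snapshot itself has been destroyed. A command-line sketch, with placeholder dataset, snapshot, and host names:
Code:
# Create a bookmark from an existing snapshot (names here are only examples):
zfs bookmark mypool/dataset@auto-2023-08-06_00-00 mypool/dataset#repl-base

# Later, even if that snapshot has been pruned, the bookmark can still act as
# the incremental source, as long as the destination already has the snapshot
# the bookmark was created from:
zfs send -i mypool/dataset#repl-base mypool/dataset@auto-2023-09-01_00-00 | ssh backuphost zfs recv backuppool/dataset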

You can manually manage "holds" to protect snapshots from destruction. But it will be incumbent upon you to do this yourself in the command-line, whether manually or via a script. If you manually manage it, you have to remember to "release" old "holds".
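If you go the script route, a minimal sketch might look like this. The dataset name and tag are placeholders, and it assumes the newest snapshot has already been replicated before the older holds are released:
Code:
#!/bin/sh
# Sketch only: hold the newest snapshot of a dataset under a fixed tag,
# then release that tag from any older snapshots still carrying it.
DATASET="mypool/dataset"   # placeholder dataset
TAG="safeguard"            # placeholder tag

# Newest snapshot of the dataset, sorted by creation time.
NEWEST=$(zfs list -H -o name -t snapshot -s creation "$DATASET" | tail -n 1)
[ -n "$NEWEST" ] || exit 0

# Place the hold (ignore the error if this snapshot is already held under the tag).
zfs hold "$TAG" "$NEWEST" 2>/dev/null

# Release the same tag from every older snapshot that still carries it.
zfs list -H -o name -t snapshot "$DATASET" | while read -r SNAP; do
    [ "$SNAP" = "$NEWEST" ] && continue
    if zfs holds -H "$SNAP" | awk '{print $2}' | grep -qx "$TAG"; then
        zfs release "$TAG" "$SNAP"
    fi
done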
 

tnuser9999

Dabbler
Joined
Jun 29, 2023
Messages
40
There doesn't even seem to be an option in the GUI, in the snapshot task's retention settings, to "do nothing". I wish I could set up retention based on the number of snapshots rather than a time frame. Let's say you have snapshot retention set to a week and you take your server offline for that week; when you return, the replication task on the destination server is broken.
 
Joined
Oct 22, 2019
Messages
3,641
Yup. I have an opinion on this and other related oversights, but it's been done to death on these forums.
 
Joined
Jun 15, 2022
Messages
674
Synology can use "date" and "minimum number of retained snapshots" concurrently, which makes sense. So the competition shows there is value in this feature at the small-office level.

Home-user snapshot retention for offline servers should be 3 months, given vacations. If you have a family emergency, make that 6 months. If you become the emergency, it should be at least a year.
 

Constantin

Vampire Pig
Joined
May 19, 2017
Messages
1,829
Could one way to create a “hold” be to check the “retain snapshots until they have successfully been replicated to a backup server” feature in replication tasks?

Then tell whoever to turn off the backup server when the hold event starts, thereby stopping replications and hence local snapshot destructions?
 

tnuser9999

Dabbler
Joined
Jun 29, 2023
Messages
40
For now I guess I will set the snapshot lifetime to something like 1 year to prevent breaking replication, even though I really don't want to keep them that long. There have been times when the offline backup server hasn't been powered on for months at a time.
 
Joined
Jun 15, 2022
Messages
674
For now I guess I will set the snapshot lifetime to something like 1 year to prevent breaking replication, even though I really don't want to keep them that long. There have been times when the offline backup server hasn't been powered on for months at a time.
Months??? Picture the server power supply popping and frying all the drive controllers..."months" of data lost. Or janky user Herbert plugs a ransomware-infected USB drive into his/her/other PC and infects the server. Or, like what happened to other users recently, the cooling fan runs out of bearing juice and the system runs up to 45°C, causing the HDD bearing juice to evaporate, and now ya got thunky drives hammering out the drumline to "Livin' On A Prayer" by The King of Balsam.
 

tnuser9999

Dabbler
Joined
Jun 29, 2023
Messages
40
Months??? Picture the server power supply popping and frying all the drive controllers..."months" of data lost. Or janky user Herbert plugs a ransomware-infected USB drive into his/her/other PC and infects the server. Or, like what happened to other users recently, the cooling fan runs out of bearing juice and the system runs up to 45°C, causing the HDD bearing juice to evaporate, and now ya got thunky drives hammering out the drumline to "Livin' On A Prayer" by The King of Balsam.
My apologies, I stated that incorrectly. This is just for personal use, and I was away for over 6 months. I had both the backup server and the "NAS" server powered off. I very much wanted to keep the data, but it was static, not changing. One server is onsite and the other remote.

Not the end of the world, but it would be a pain to come back to all the source snapshots gone because they exceeded their "lifetime", with replication to the backup server broken. There are times when both servers are up and active and data is changing. I guess I will go with something like this for snapshots on the source: hourly snapshots with a 1-day lifetime, daily snapshots with a 1-week lifetime, and weekly snapshots with a 1-year lifetime.
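To sanity-check what those lifetimes actually leave on the source, something like this (with a placeholder dataset name) lists the surviving snapshots by creation date:
Code:
zfs list -t snapshot -o name,creation -s creation mypool/dataset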

Then the replication is set up as below:

rep_schedule1.PNG
 

Constantin

Vampire Pig
Joined
May 19, 2017
Messages
1,829
When I have run into this, the easiest solution for me ended up being to bring the remote server to the local one and start the replication process over.

It’s much simpler when both machines are local, and with the data volume that may be involved, it will also take a lot less time. My SSH+netcat replication task was running at several hundred MB/s for hours on end.
 

tnuser9999

Dabbler
Joined
Jun 29, 2023
Messages
40
Could one way to create a “hold” be to check the “retain snapshots until they have successfully been replicated to a backup server” feature in replication tasks?

Then tell whoever to turn off the backup server when the hold event starts, thereby stopping replications and hence local snapshot destructions?
I am not finding this setting under Tasks / Snapshots.
 

Constantin

Vampire Pig
Joined
May 19, 2017
Messages
1,829
See Tasks -> Replication, then set up a replication job with the remote server. It’s in the window where the actual job is set up. The snapshot retention toggle is at the lower left, under the menu tree for what snapshots you want to replicate to the remote server. It’s a checkbox with a description like “retain snapshots until successful replication” or something like that.
 

tnuser9999

Dabbler
Joined
Jun 29, 2023
Messages
40
Ah, I guess this would be for a push setup then. All of the things I am running into make me think about the differences between replication and backups; I am trying to fit a square peg into a round hole. I love the convenience and the speed at which the replication takes place, but I do see the limitations. I may reluctantly look at rsnapshot, restic, or something similar, though the time it takes to back up is painful.
 

Constantin

Vampire Pig
Joined
May 19, 2017
Messages
1,829
Depends on your use case, I suppose, as well as how much free space you have lying around. I like snapshots because, like Time Machine, they make it possible to go back to a specific date before some unhappy event occurred and pretend like nothing happened at all.
 
Joined
Oct 22, 2019
Messages
3,641
If you just want a quick and easy, albeit rudimentary, safeguard in the meantime, you can do this:
  1. Complete a full replication (since incremental is no longer possible, due to the pruned snapshots)
  2. Hold the latest "common" snapshot on the source (and optionally, on the destination as well); see the sketch after this list for finding it
  3. Eventually (a year later?) release that hold, since by then you will hold a more recent common snapshot
  4. The process repeats from Step 2 at your own manual intervention, at your preferred intervals of time
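For Step 2, one rough way to identify the latest common snapshot is to compare snapshot names from both machines over SSH. This sketch uses placeholder host and dataset names, and it assumes the default auto-<date> naming so that lexical sort order matches chronological order:
Code:
# Compare snapshot names (the part after "@") from source and destination,
# then take the newest name the two sides share:
zfs list -H -o name -t snapshot mypool/dataset | sed 's/.*@//' | sort > /tmp/src_snaps
ssh backuphost zfs list -H -o name -t snapshot backuppool/dataset | sed 's/.*@//' | sort > /tmp/dst_snaps
comm -12 /tmp/src_snaps /tmp/dst_snaps | tail -n 1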

The hold command is simple.
Code:
zfs hold tag dataset@snapshot


It will look something like this. (I use "safeguard" as the tag, since it helps explain the purpose.)
Code:
zfs hold safeguard mypool/dataset@auto-2023-08-06_00-00


To release a hold is just as intuitive.
Code:
zfs release tag dataset@snapshot


So in the above example, it looks like this.
Code:
zfs release safeguard mypool/dataset@auto-2023-08-06_00-00


To view a list of snapshots protected with "hold", feed the list command to the holds command.
Code:
zfs list -H -o name -t snapshot mypool/dataset | xargs zfs holds



Now you see why I submitted a feature request to streamline this into the GUI? :wink: More practical with point-and-click and shiny buttons. (Doesn't look like it'll ever be implemented in Core. :confused:)
 