Server off longer than snapshot retention period

tnuser9999

Dabbler
Joined
Jun 29, 2023
Messages
40
I noticed during testing that if the source TrueNAS server stays off for longer than the snapshot retention period, all the snapshots get removed, which breaks the replication to the destination server and leaves no common snapshot to resume from. This seems really dangerous if you have a server you take offline for long periods of time. How would you prevent breaking the replication? Do you just set a really long retention policy?
 
Joined
Oct 22, 2019
Messages
3,641
There's no GUI solution for this with TrueNAS.

I had submitted a feature request to leverage the "hold" feature for ZFS snapshots, but it might be a while before they implement it. (Think of a "hold" as a way to protect a snapshot from destruction, no matter what operation attempts to destroy it.) It doesn't need to be limited to a simple checkbox that you manually manage. There can be a way to automate the hold/release of snapshots. But that's for another conversation.

Then there's also the ZFS "bookmark" feature. Again, the GUI does not leverage this.
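Roughly, a bookmark keeps just enough metadata to serve as the incremental source for a later send, even after the snapshot itself has been destroyed. A command-line sketch, with placeholder dataset, snapshot, and host names:
Code:
# Create a bookmark from an existing snapshot (names here are only examples):
zfs bookmark mypool/dataset@auto-2023-08-06_00-00 mypool/dataset#repl-base

# Later, even if that snapshot has been pruned, the bookmark can still act as
# the incremental source, as long as the destination already has the snapshot
# the bookmark was created from:
zfs send -i mypool/dataset#repl-base mypool/dataset@auto-2023-09-01_00-00 | ssh backuphost zfs recv backuppool/dataset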

You can manually manage "holds" to protect snapshots from destruction. But it will be incumbent upon you to do this yourself in the command-line, whether manually or via a script. If you manually manage it, you have to remember to "release" old "holds".
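If you go the script route, a minimal sketch might look like this. The dataset name and tag are placeholders, and it assumes the newest snapshot has already been replicated before the older holds are released:
Code:
#!/bin/sh
# Sketch only: hold the newest snapshot of a dataset under a fixed tag,
# then release that tag from any older snapshots still carrying it.
DATASET="mypool/dataset"   # placeholder dataset
TAG="safeguard"            # placeholder tag

# Newest snapshot of the dataset, sorted by creation time.
NEWEST=$(zfs list -H -o name -t snapshot -s creation "$DATASET" | tail -n 1)
[ -n "$NEWEST" ] || exit 0

# Place the hold (ignore the error if this snapshot is already held under the tag).
zfs hold "$TAG" "$NEWEST" 2>/dev/null

# Release the same tag from every older snapshot that still carries it.
zfs list -H -o name -t snapshot "$DATASET" | while read -r SNAP; do
    [ "$SNAP" = "$NEWEST" ] && continue
    if zfs holds -H "$SNAP" | awk '{print $2}' | grep -qx "$TAG"; then
        zfs release "$TAG" "$SNAP"
    fi
done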
 

tnuser9999

Dabbler
Joined
Jun 29, 2023
Messages
40
There doesn't even seem to be an option in the GUI, in the snapshot task's retention settings, to "do nothing". I wish I could set up retention based on the number of snapshots rather than a time frame. Let's say you have snapshot retention set to a week and you take your server offline for that week; when you return, the replication task on the destination server is broken.
 
Joined
Oct 22, 2019
Messages
3,641
Yup. I have an opinion on this and other related oversights, but it's been done to death on these forums.
 
Joined
Jun 15, 2022
Messages
674
Synology can use "date" and "minimum number of retained snapshots" concurrently, which makes sense. So the competition shows there is value in this feature at the small-office level.

Home-user snapshot retention for offline servers should be 3 months, given vacations. If you have a family emergency, make that 6 months. If you become the emergency, it should be at least a year.
 

Constantin

Vampire Pig
Joined
May 19, 2017
Messages
1,829
Could one way to create a “hold” be to check the “retain snapshots until they have successfully been replicated to a backup server” feature in replication tasks?

Then tell whoever to turn off the backup server when the hold event starts, thereby stopping replications and hence local snapshot destructions?
 

tnuser9999

Dabbler
Joined
Jun 29, 2023
Messages
40
For now I guess I will set the snapshot lifetime to something like 1 year to prevent breaking replication, even though I really don't want to keep them that long. There have been times when the offline backup server hasn't been powered on for months at a time.
 
Joined
Jun 15, 2022
Messages
674
For now I guess I will set the snapshot lifetime to something like 1 year to prevent breaking replication, even though I really don't want to keep them that long. There have been times when the offline backup server hasn't been powered on for months at a time.
Months??? Picture the server power supply popping and frying all the drive controllers..."months" of data lost. Or janky user Herbert plugs a ransomware-infected USB drive into his/her/other PC and infects the server. Or, like what happened to other users recently, the cooling fan runs out of bearing juice and the system runs up to 45°C, causing the HDD bearing juice to evaporate, and now ya got thunky drives hammering out the drumline to "Livin' On A Prayer" by The King of Balsam.
 

tnuser9999

Dabbler
Joined
Jun 29, 2023
Messages
40
Months??? Picture the server power supply popping and frying all the drive controllers..."months" of data lost. Or janky user Herbert plugs a ransomware-infected USB drive into his/her/other PC and infects the server. Or, like what happened to other users recently, the cooling fan runs out of bearing juice and the system runs up to 45°C, causing the HDD bearing juice to evaporate, and now ya got thunky drives hammering out the drumline to "Livin' On A Prayer" by The King of Balsam.
My apologies, I stated that incorrectly. This is just for personal use, and I was away for over 6 months. I had both the backup server and the "NAS" server powered off. I very much wanted to keep the data, but it was static, not changing. One server is onsite and the other remote.

Not the end of the world, but it would be a pain to come back to all the source snapshots gone because they exceeded their "lifetime", with replication to the backup server broken. There are times when both servers are up and active and data is changing. I guess I will go with something like this for snapshots on the source: hourly snapshots with a 1-day lifetime, daily snapshots with a 1-week lifetime, and weekly snapshots with a 1-year lifetime.
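To sanity-check what those lifetimes actually leave on the source, something like this (with a placeholder dataset name) lists the surviving snapshots by creation date:
Code:
zfs list -t snapshot -o name,creation -s creation mypool/dataset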

Then the replication is set up as below:

rep_schedule1.PNG
 

Constantin

Vampire Pig
Joined
May 19, 2017
Messages
1,829
When I have run into this, the easiest solution for me ended up being to bring the remote server to the local one and start the replication process over.

It’s much simpler when both machines are local, and with the data volume that may be involved, it will also take a lot less time. My SSH+netcat replication task was running at several hundred MB/s for hours on end.
 

tnuser9999

Dabbler
Joined
Jun 29, 2023
Messages
40
Could one way to create a “hold” be to check the “retain snapshots until they have successfully been replicated to a backup server” feature in replication tasks?

Then tell whoever to turn off the backup server when the hold event starts, thereby stopping replications and hence local snapshot destructions?
I am not finding this setting under Tasks / Snapshots.
 

Constantin

Vampire Pig
Joined
May 19, 2017
Messages
1,829
See Tasks -> Replication, then set up a replication job with the remote server. It’s in the window where the actual job is set up. The snapshot retention toggle is at the lower left, under the menu tree for what snapshots you want to replicate to the remote server. It’s a checkbox with a description like “retain snapshots until successful replication” or something like that.
 

tnuser9999

Dabbler
Joined
Jun 29, 2023
Messages
40
Ah, I guess this would be for a push setup then. All of the things I am running into make me think about the differences between replication and backups; I am trying to fit a square peg into a round hole. I love the convenience and the speed at which the replication takes place, but I do see the limitations. I may reluctantly look at rsnapshot, restic, or something similar, though the time it takes to back up is painful.
 

Constantin

Vampire Pig
Joined
May 19, 2017
Messages
1,829
Depends on your use case, I suppose, as well as how much free space you have lying around. I like snapshots because, like Time Machine, they make it possible to go back to a specific date before some unhappy event occurred and pretend like nothing happened at all.
 
Joined
Oct 22, 2019
Messages
3,641
If you just want a quick and easy, albeit rudimentary, safeguard in the meantime, you can do this:
  1. Complete a full replication (since incremental is no longer possible, due to the pruned snapshots)
  2. Hold the latest "common" snapshot on the source (and optionally, on the destination as well); see the sketch after this list for finding it
  3. Eventually (a year later?) release that hold, since by then you will hold a more recent common snapshot
  4. The process repeats from Step 2 at your own manual intervention, at your preferred intervals of time
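For Step 2, one rough way to identify the latest common snapshot is to compare snapshot names from both machines over SSH. This sketch uses placeholder host and dataset names, and it assumes the default auto-<date> naming so that lexical sort order matches chronological order:
Code:
# Compare snapshot names (the part after "@") from source and destination,
# then take the newest name the two sides share:
zfs list -H -o name -t snapshot mypool/dataset | sed 's/.*@//' | sort > /tmp/src_snaps
ssh backuphost zfs list -H -o name -t snapshot backuppool/dataset | sed 's/.*@//' | sort > /tmp/dst_snaps
comm -12 /tmp/src_snaps /tmp/dst_snaps | tail -n 1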

The hold command is simple.
Code:
zfs hold tag dataset@snapshot


It will look something like this. (I use "safeguard" as the tag, since it helps explain the purpose.)
Code:
zfs hold safeguard mypool/dataset@auto-2023-08-06_00-00


To release a hold is just as intuitive.
Code:
zfs release tag dataset@snapshot


So in the above example, it looks like this.
Code:
zfs release safeguard mypool/dataset@auto-2023-08-06_00-00


To view a list of snapshots protected with "hold", feed the list command to the holds command.
Code:
zfs list -H -o name -t snapshot mypool/dataset | xargs zfs holds



Now you see why I submitted a feature request to streamline this into the GUI? :wink: More practical with point-and-click and shiny buttons. (Doesn't look like it'll ever be implemented in Core. :confused:)
 