snapshot retention policy is being ignored

Fab Sidoli

Contributor
Joined
May 15, 2019
Messages
114
Dear All,

It seems as though my snapshot retention policy is being ignored. Before I open a bug report I would like to make sure I'm not being silly.

Under Tasks > Periodic Snapshot Tasks I have set a snapshot lifetime of two weeks. As it stands, I have snapshots going back 19 days (i.e., more than two weeks). The snapshots were set up as part of a replication task and I wonder if this is somehow causing the issue.

I'm not sure which command-line tools I can run to show you how this is configured, but a screenshot is attached in case that helps.
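For what it's worth, I believe something like this would at least list the snapshots and their creation dates from the shell (tank/dataset is a placeholder for the actual dataset):

# List snapshots oldest-first with their creation times.
zfs list -r -t snapshot -o name,creation -s creation tank/dataset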

Thanks,
Fab
Screenshot 2020-08-26 at 15.40.27.png
 
Joined
Jan 4, 2014
Messages
1,644
FreeNAS version?
 

Fab Sidoli

Contributor
Joined
May 15, 2019
Messages
114
Apologies, I thought that was in my signature.

I'm currently running the latest update to 11.3.
 
Joined
Jan 4, 2014
Messages
1,644
As it stands, I have snapshots going back 19 days (i.e., more than two weeks).
On both source and target?

The snapshots were setup as part of a replication task and I wonder if this is somehow causing the issue.
A screenshot of the replication task (Tasks > Replication Tasks > Edit) might be useful.
 

orochics

Cadet
Joined
Aug 30, 2020
Messages
5
I can confirm I'm experiencing the same thing: snapshots are not being deleted, and too many are being replicated to the destination. This is happening on all of the volumes I replicate to multiple destinations. I have a one-week retention, but I have far more than 7 days' worth of snapshots. Things seemed to work properly until I added one destination that took longer than 7 days to finish replication. Now that all are caught up, I would expect the local snapshots older than 7 days to be removed, and on the next sync, the remote ones to be removed as well. I am running FreeNAS-11.3-U4.1.

Example snapshot:
1598809318756.png

Example of one of my replication tasks:
1598809384569.png

Snapshots on FreeNAS server:
1598809433432.png

Snapshots on remote server:
1598809465377.png
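For the record, this is roughly how I'm counting the backlog from the shell (a sketch; tank/dataset is a placeholder for one of the replicated datasets):

# Count snapshots older than the 7-day retention window; -p prints creation as epoch seconds.
NOW=$(date +%s)
zfs get -rHp -t snapshot -o name,value creation tank/dataset | awk -v now="$NOW" 'now - $2 > 7*86400 {print $1}' | wc -l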
 
Joined
Jan 4, 2014
Messages
1,644
Things seemed to work properly until I added one destination that took longer than 7 days to finish replication.
Interesting. This is not a scenario I've subjected my own servers to. For the initial replication, which tends to take the longest, I place both servers on the same gigabit switch, even if the target normally sits across the WAN or is wirelessly connected on the LAN: if it's offsite, I bring it back onsite and put it next to the source server on the same switch, and I do the same for a wirelessly connected server. Done this way, the longest an initial replication has taken me is about 44 hours for 14.5 TB between two servers, or around 330 GB/hr. With this approach, I've not experienced the snapshot lifetime issues that you're observing after replication.

How replication behaves under extreme conditions is not altogether clear. It's a new replication engine in 11.3, and I'm not sure what its operating limits are, how it behaves outside those limits, or what stress testing replication has been subjected to. It's possible that server resources, such as queues, are being exhausted. Community members are often the first to pick up on issues, so if you feel there is a problem, you might want to raise an issue ticket on Jira.
 

Apollo

Wizard
Joined
Jun 13, 2013
Messages
1,458

orochics

Cadet
Joined
Aug 30, 2020
Messages
5
There are no snapshots being held.

[root@pdxfreenas ~]# zfs list -t snapshot | awk '{print $1}' | grep "@" | while read line; do zfs holds $line; done | grep keep
[root@pdxfreenas ~]#

Additionally, I checked for holds while replication jobs run; it doesn't look like holds are added during a replication job and removed afterwards.
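For reference, the way I watched for that was just polling one snapshot in a loop while the job ran, something like this (the snapshot name is a placeholder):

# Poll holds on a single snapshot every 30 seconds while the replication job runs (Ctrl-C to stop).
while true; do zfs holds tank/dataset@auto-2020-08-30_00-00; sleep 30; done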
 

Apollo

Wizard
Joined
Jun 13, 2013
Messages
1,458
There are no snapshots being held.

[root@pdxfreenas ~]# zfs list -t snapshot | awk '{print $1}' | grep "@" | while read line; do zfs holds $line; done | grep keep
[root@pdxfreenas ~]#

Additionally, I checked for holds while replication jobs run; it doesn't look like holds are added during a replication job and removed afterwards.
Try something like:
zfs get -r -t snapshot defer_destroy,userrefs | grep -i defer_destroy | grep -iw on
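For the userrefs half of that check, something along these lines (a sketch) should list any snapshots with a non-zero hold count:

# userrefs is the number of user holds on a snapshot; -p makes it a plain number.
zfs get -rHp -t snapshot -o name,value userrefs | awk '$2 > 0'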
 

ukro

Dabbler
Joined
Mar 6, 2020
Messages
10
Do you have any other server running a replication task that pulls from your server?
Are the snapshots and replication on one server, or do you take snapshots on one server and run the replication task on another?
I partially solved my problem by turning off the replication task on the second server, and the snapshot lifetime seems okay now.

So as far as I can see, the snapshot and replication tasks must be on the same device.
Please confirm so we can add it to your ticket.
Thanks
 

orochics

Cadet
Joined
Aug 30, 2020
Messages
5
- Do you have any other server running a replication task that pulls from your server?
There are no servers pulling snapshots from my FreeNAS box.

- Are the snapshots and replication on one server, or do you take snapshots on one server and run the replication task on another?
My FreeNAS box has five periodic snapshot tasks taking snapshots. I have four replication tasks that send the snapshots: one local, three remote.


Could you clarify: when you have one snapshot task tied to one replication task, does it work as intended?

In my case, I have one snapshot task and four replication tasks that use its snapshots.
 

Fab Sidoli

Contributor
Joined
May 15, 2019
Messages
114
Hi All.

Sorry, I've been on leave, which is why I've been silent.

I only have a replication task set up on the source, and the retention policy is set to "Same as Source", so at least that aspect is working.

My snapshots are being kept for 19 days, not the two weeks I have set them to. It's consistently 19 days, which makes little sense to me. This is true on both the source and target.
 

ukro

Dabbler
Joined
Mar 6, 2020
Messages
10
My snapshots are being kept for 19 days, not the two weeks I have set them to. It's consistently 19 days, which makes little sense to me. This is true on both the source and target.
Ohhhhh, consistently 19 days, LOL.
My snapshots are just piling up xD and I don't know why...
So I guess our situations are different.
 

ukro

Dabbler
Joined
Mar 6, 2020
Messages
10
Could you clarify: when you have one snapshot task tied to one replication task, does it work as intended?
What do you mean?

If I didn't have this problem of snapshots piling up for no reason, I would be happy. :)

In normal logic, do I understand correctly that one takes a snapshot and then replicates it locally or remotely, and that's all? Or is there some magic with drums around it and I need to do something more?
 

Apollo

Wizard
Joined
Jun 13, 2013
Messages
1,458
Ohhhhh, consistently 19 days, LOL.
My snapshots are just piling up xD and I don't know why...
So I guess our situations are different.
Maybe 15 working days, 9am-5pm, excluding weekends? Fifteen weekdays, Monday of week one through Friday of week three, span 19 calendar days.
 