Can't pin down why I keep getting this error with snapshot replication to 2 other FreeNASes on the same LAN

jobsoftinc | Cadet | Joined Oct 27, 2015 | Messages: 7
Not sure what additional info you guys would need, but I've got several periodic snapshots replicating to 2 different FreeNASes at the same rev level on the same gigabit LAN. This HAD been working very smoothly for several months too! But I think after one of the updates back in July, I suddenly started getting the error 'kern.ipc.maxpipekva exceeded', followed by a signal 6 core dump of 'zfs'. If I reboot this main server, it goes back to replicating for a while. I tried to increase this parameter within the FreeNAS GUI, and while it seems to take longer for the error to resurface, resurface it always does.

[Attachment: 1598285013459.png]
[Attachment: 1598285053507.png]
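
For anyone wanting to see whether pipe kernel memory is actually climbing toward that limit while replication runs, here's a rough watcher I can run on the sending box. It's a minimal sketch, assuming only the stock FreeBSD sysctls kern.ipc.pipekva (current usage) and kern.ipc.maxpipekva (the limit the error names); nothing FreeNAS-specific:

Code:
import subprocess, time

def sysctl_bytes(name):
    # Read a numeric sysctl value; both of these report bytes on FreeBSD.
    out = subprocess.run(["sysctl", "-n", name], capture_output=True, text=True, check=True)
    return int(out.stdout.strip())

while True:
    used = sysctl_bytes("kern.ipc.pipekva")       # KVA currently consumed by pipes
    limit = sysctl_bytes("kern.ipc.maxpipekva")   # the boot-time limit the error complains about
    print(f"pipe KVA: {used}/{limit} bytes ({100.0 * used / limit:.1f}%)")
    time.sleep(60)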


The server is a Dell C2100 with dual Xeon L5630s and 48GB of RAM.

[Attachment: 1598285614280.png]


The snapshots are taken every 15 minutes, keeping 2 weeks' worth. Most incrementals are small to nothing in size. And again, this had been going fine with no issues for at least 2 months before this anomaly first appeared.
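
For what it's worth, the incremental sizes can be sanity-checked with a dry-run send. A rough sketch; the dataset and snapshot names below are just placeholders for mine:

Code:
import subprocess

def incremental_size(dataset, from_snap, to_snap):
    # Dry-run (-n) estimate of the incremental stream size; -P asks for parsable output.
    out = subprocess.run(
        ["zfs", "send", "-nvP", "-i", f"{dataset}@{from_snap}", f"{dataset}@{to_snap}"],
        capture_output=True, text=True, check=True)
    for line in (out.stdout + out.stderr).splitlines():
        if line.startswith("size"):        # parsable output includes e.g. "size   73728"
            return int(line.split()[-1])
    return 0

# Placeholder names -- substitute a real dataset and two of its snapshots.
print(incremental_size("tank/nextcloud/alice", "auto-20200824.1200-2w", "auto-20200824.1215-2w"))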

Any thoughts anyone?

Thanks!

Mark
 

jobsoftinc | Cadet | Joined Oct 27, 2015 | Messages: 7
I will add one of these replication task setups. All are more or less set up the same way. I will say that they were originally set up as SSH, but just yesterday I decided to switch them to SSH+NETCAT, as that seems to be the "Add Replication" "Basic" wizard's default now in 11.3.

[Attachment: 1598286729377.png]
 

jobsoftinc | Cadet | Joined Oct 27, 2015 | Messages: 7
Well, I suppose I will be forced to reply to my own post! :)

Thinking that perhaps, because it had been so long since replication last completed successfully, quite a few snapshots were left pending replication, I jacked the value of the tunable 'kern.ipc.maxpipekva' way up to 256MB:

[Attachment: 1598692161799.png]


and rebooted the server. As one would expect, this time it got a LOT further. But after 2 days, the error condition returned. All the while, most of the replication tasks showed "RUNNING". So, when I started to get the errors again, I took a look at one of the tasks still running, and this is what I found:

[Attachment: 1598692127478.png]


Not really thinking initially of the number of pending snapshots here, I decided to see just how many snapshots were pending for this particular task:

[Attachment: 1598693140714.png]


OK, a LOT more than I had thought! That aside, though: what is consuming (opening) all these pipes and not closing them, such that no matter what I've done, I keep running out of this resource? Because of recursion and child datasets in just this one task (each Nextcloud user has a child dataset named after their username, so that I can manipulate each user's dataset independently for rollbacks, etc.), a considerable number of pending snapshots have racked up. But this shouldn't matter, should it? If it has 100,000 snapshots to catch up, it should just run until it's done, right? And I think it would, if it didn't keep erroring out because of, essentially, running out of available pipes to open. As each snapshot is replicated, shouldn't whatever pipe resource was consumed then be released?
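
To try to see what is actually holding the pipes open, I can count pipe descriptors per process with something like this. It's a rough sketch using FreeBSD's fstat (run as root to see every process); the exact output columns can vary between releases:

Code:
import subprocess
from collections import Counter

# fstat lists every open descriptor; pipe descriptors carry the word "pipe" in their line.
out = subprocess.run(["fstat"], capture_output=True, text=True, check=True).stdout

pipes = Counter()
for line in out.splitlines():
    if " pipe " in line:
        fields = line.split()                      # columns start: USER CMD PID FD ...
        pipes[f"{fields[1]} (pid {fields[2]})"] += 1

for proc, count in pipes.most_common(10):
    print(f"{proc}: {count} pipe descriptors")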

These thousands of incremental snapshots are part of our ransomware data protection strategy; we just happen to be using Nextcloud as the sync mechanism. But this anomaly impacts all replication tasks, and I don't want to just purge everything and start it all back up from scratch. BOTTOM LINE is that I should NOT have to!

There is a major bug or design flaw here! I hope someone at iXSystems might be paying attention to this thread. I don't know whether this "leak" bug is in FreeBSD, ZFS or FreeNAS, but it will quickly diminish the usability/suitability of FreeNAS and/or TrueNAS for any IT manager or CIO who might want to use the periodic snapshots and replication offered by FreeNAS/TrueNAS for this same purpose.
 
Joined Jan 4, 2014 | Messages: 1,644
jobsoftinc said: "But this shouldn't matter, should it? If it has 100,000 snapshots to catch up, it should just run until it's done, right?"
I wonder. Not if you're generating snapshots at a faster rate than you can replicate them. Then you have a runaway system, and you will run out of resources at some point.
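
A back-of-the-envelope way to see it, with completely made-up numbers:

Code:
# Hypothetical figures only -- plug in your own dataset count and observed per-snapshot time.
child_datasets = 50                        # e.g. one child dataset per Nextcloud user
created_per_hour = child_datasets * 4      # one snapshot per dataset every 15 minutes
secs_per_incremental = 30                  # assumed average time to replicate one incremental
drained_per_hour = 3600 / secs_per_incremental

print(f"created/hour: {created_per_hour}, replicated/hour: {drained_per_hour:.0f}")
# 200 created vs. 120 replicated per hour means the backlog only ever grows.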
 

jobsoftinc | Cadet | Joined Oct 27, 2015 | Messages: 7
Don't think so. It kept up fine for several months and always showed the green FINISHED each day. To be honest, it seems like it was when 11.2 was upgraded to 11.3 that this error condition first showed up. Daily, I would venture maybe 3,000 periodic snapshots, but replication is ongoing as well, so at any given moment it may only be dealing with a couple of hundred mostly small snapshots. But to have a system that can't recover after some environmental condition (like a downed replication server) is a real problem for the reliability of replication in FreeNAS/TrueNAS.
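
If it helps, this is roughly how I'd count how many snapshots actually get created per day. A sketch; 'tank' stands in for the real pool name:

Code:
import subprocess, time

# -p prints 'creation' as a unix timestamp; -H drops headers for easy tab-separated parsing.
out = subprocess.run(
    ["zfs", "list", "-H", "-p", "-r", "-t", "snapshot", "-o", "name,creation", "tank"],
    capture_output=True, text=True, check=True).stdout

cutoff = time.time() - 24 * 3600
recent = [line for line in out.splitlines() if int(line.split("\t")[1]) >= cutoff]
print(f"snapshots created in the last 24h: {len(recent)}")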
 

jobsoftinc | Cadet | Joined Oct 27, 2015 | Messages: 7
FWIW, I've also now opened a bug report in JIRA. I don't think we're doing anything all that out of the ordinary either. Using mostly the defaults of the replication wizard, I have certain high-priority datasets that I need replicated; that's not an unreasonable ask. If, however, some design aspect (like keeping pipes open for the length of a replication task's duration for, say, performance improvements) can potentially lead to an unrecoverable scenario, the FreeNAS GUI needs to detect and WARN about this potential road hazard, or block a user from getting into this boat in the first place.
 
Joined Jan 4, 2014 | Messages: 1,644
You might want to reconsider your snapshot strategy. This is what I do for nested user datasets under a home root, and I haven't experienced any problems yet with snapshot replication:

Snapshot frequency   Lifetime    Max number of snapshots per child dataset
Every 15 mins        1 hour      4
Every hour           1 day       24
Every day            2 weeks     14
Every week           3 months    12-14
Every month          1 year      12

Maximum number of snapshots per dataset in 1 year: 66-68

I'm able to keep a year's worth of snapshots per dataset. The closer I am to current time, the higher the snapshot frequency; the further away from current time, the lower the snapshot frequency.

With the approach you've adopted, you have 1,344 snapshots per dataset (4 per hour x 24 hours x 14 days) generated in a 2-week period.
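
Roughly, side by side (simple arithmetic, per dataset at steady state):

Code:
flat_15min_for_2_weeks = 4 * 24 * 14          # the current scheme: 1,344 snapshots per dataset
tiered = 4 + 24 + 14 + 13 + 12                # 15-min + hourly + daily + weekly + monthly tiers
print(flat_15min_for_2_weeks, tiered)         # 1344 vs. ~67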
 

jobsoftinc | Cadet | Joined Oct 27, 2015 | Messages: 7
Very interesting temporally-hierarchical approach! :) I'm still curious to see what comes back on the JIRA ticket because, while I see the logic in what you've outlined, I still don't think that, using only the wizard to set up some replications of my datasets, a 1-2 week disruption (with no loss of the snapshots in common on each end) should result in an unrecoverable situation. I can see most Average Joes easily getting into the same boat. It would seem that simply closing a given pipe once its replication step is done (even if it's a 0-byte snapshot) would mean the system never runs out of this resource (even when it's been jacked up to 268 MILLION! :)). And while it may take 2-3 days to resync, it would still recover without having to restart from scratch.

I understand when the snapshots in common expire off on one end or the other, sure. But regardless of my setup's efficiency, a storage solution like TrueNAS/FreeNAS that is touted as an "enterprise capable" NAS had better be able to recover from a scenario like this, especially when the necessary snapshots in common still exist on each end. Or at least warn or block the user in the GUI about it. Thanks for your reply!!
 

Apollo | Wizard | Joined Jun 13, 2013 | Messages: 1,458
In general, only a single replication should be running against a given target pool; otherwise ZFS will have trouble handling the snapshot requests on the target, causing increased delays.
There is also an option for snapshots not to be created if the dataset's filesystem hasn't been modified. This can significantly reduce the number of snapshots.
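
A rough way to see how much that would save is to count snapshots whose USED is zero, i.e. ones that reference no unique data. A sketch; 'tank/nextcloud' is a placeholder, and zero USED is only an approximation of "nothing changed":

Code:
import subprocess

out = subprocess.run(
    ["zfs", "list", "-H", "-p", "-r", "-t", "snapshot", "-o", "name,used", "tank/nextcloud"],
    capture_output=True, text=True, check=True).stdout

# USED == 0 means the snapshot holds no unique blocks; a close (not exact) proxy
# for "the filesystem wasn't modified since the previous snapshot".
empty = [line.split("\t")[0] for line in out.splitlines() if int(line.split("\t")[1]) == 0]
print(f"{len(empty)} of {len(out.splitlines())} snapshots use 0 bytes")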
 