Can't pin down why I keep getting this error with snapshot replication to 2 other FreeNASes on the same LAN

jobsoftinc | Cadet | Joined Oct 27, 2015 | Messages: 7
Not sure what additional info you guys would need, but I've got several periodic snapshots replicating to 2 different FreeNASes at the same rev level on the same gigabit LAN. This HAD been working very smoothly for several months too! But I think after one of the updates back in July, I suddenly started getting the error 'kern.ipc.maxpipekva exceeded', followed by a signal 6 core dump of 'zfs'. If I reboot this main server, it goes back to replicating for a while. I tried to increase this parameter within the FreeNAS GUI, and while it seems to take longer for the error to resurface, resurface it always does.

[Attachment: 1598285013459.png]
[Attachment: 1598285053507.png]
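
For anyone wanting to see whether pipe kernel memory is actually climbing toward that limit while replication runs, here's a rough watcher I can run on the sending box. It's a minimal sketch, assuming only the stock FreeBSD sysctls kern.ipc.pipekva (current usage) and kern.ipc.maxpipekva (the limit the error names); nothing FreeNAS-specific:

Code:
import subprocess, time

def sysctl_bytes(name):
    # Read a numeric sysctl value; both of these report bytes on FreeBSD.
    out = subprocess.run(["sysctl", "-n", name], capture_output=True, text=True, check=True)
    return int(out.stdout.strip())

while True:
    used = sysctl_bytes("kern.ipc.pipekva")       # KVA currently consumed by pipes
    limit = sysctl_bytes("kern.ipc.maxpipekva")   # the boot-time limit the error complains about
    print(f"pipe KVA: {used}/{limit} bytes ({100.0 * used / limit:.1f}%)")
    time.sleep(60)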


The server is a Dell C2100 with dual Xeon L5630s and 48GB of RAM.

[Attachment: 1598285614280.png]


The snapshots are taken every 15 minutes, keeping 2 weeks' worth. Most incrementals are small to nothing in size. And again, this had been going fine with no issues for at least 2 months before this anomaly first appeared.
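
For what it's worth, the incremental sizes can be sanity-checked with a dry-run send. A rough sketch; the dataset and snapshot names below are just placeholders for mine:

Code:
import subprocess

def incremental_size(dataset, from_snap, to_snap):
    # Dry-run (-n) estimate of the incremental stream size; -P asks for parsable output.
    out = subprocess.run(
        ["zfs", "send", "-nvP", "-i", f"{dataset}@{from_snap}", f"{dataset}@{to_snap}"],
        capture_output=True, text=True, check=True)
    for line in (out.stdout + out.stderr).splitlines():
        if line.startswith("size"):        # parsable output includes e.g. "size   73728"
            return int(line.split()[-1])
    return 0

# Placeholder names -- substitute a real dataset and two of its snapshots.
print(incremental_size("tank/nextcloud/alice", "auto-20200824.1200-2w", "auto-20200824.1215-2w"))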

Any thoughts anyone?

Thanks!

Mark
 

jobsoftinc | Cadet | Joined Oct 27, 2015 | Messages: 7
I will add one of these replication task setups. All are more or less set up the same way. I will say that they were originally set up as SSH, but just yesterday I decided to switch them to SSH+NETCAT, as that seems to be the "Add Replication" "Basic" wizard's default now in 11.3.

[Attachment: 1598286729377.png]
 

jobsoftinc | Cadet | Joined Oct 27, 2015 | Messages: 7
Well, I suppose I will be forced to reply to my own post! :)

Thinking that perhaps, because it had been so long since replication last completed successfully, quite a few snapshots were left pending replication, I jacked the value of the tunable 'kern.ipc.maxpipekva' way up to 256MB:

[Attachment: 1598692161799.png]


and rebooted the server. As one would expect, this time it got a LOT further. But after 2 days, the error condition returned. All the while, most of the replication tasks showed "RUNNING". So, when I started to get the errors again, I took a look at one of the tasks still running, and this is what I found:

[Attachment: 1598692127478.png]


Not really thinking initially of the number of pending snapshots here, I decided to see just how many snapshots were pending for this particular task:

[Attachment: 1598693140714.png]


OK, a LOT more than I had thought! That aside, though: what is consuming (opening) all these pipes and not closing them, such that no matter what I've done, I keep running out of this resource? Because of recursion and child datasets in just this one task (each Nextcloud user has a child dataset named after their username, so that I can manipulate each user's dataset independently for rollbacks, etc.), a considerable number of pending snapshots have racked up. But this shouldn't matter, should it? If it has 100,000 snapshots to catch up, it should just run until it's done, right? And I think it would, if it didn't keep erroring out because of, essentially, running out of available pipes to open. As each snapshot is replicated, shouldn't whatever pipe resource was consumed then be released?
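
To try to see what is actually holding the pipes open, I can count pipe descriptors per process with something like this. It's a rough sketch using FreeBSD's fstat (run as root to see every process); the exact output columns can vary between releases:

Code:
import subprocess
from collections import Counter

# fstat lists every open descriptor; pipe descriptors carry the word "pipe" in their line.
out = subprocess.run(["fstat"], capture_output=True, text=True, check=True).stdout

pipes = Counter()
for line in out.splitlines():
    if " pipe " in line:
        fields = line.split()                      # columns start: USER CMD PID FD ...
        pipes[f"{fields[1]} (pid {fields[2]})"] += 1

for proc, count in pipes.most_common(10):
    print(f"{proc}: {count} pipe descriptors")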

These thousands of incremental snapshots are part of our ransomware data protection strategy; we just happen to be using Nextcloud as the sync mechanism. But this anomaly impacts all replication tasks, and I don't want to just purge everything and start it all back up from scratch. BOTTOM LINE is that I should NOT have to!

There is a major bug or design flaw here! I hope someone at iXSystems might be paying attention to this thread. I don't know whether this "leak" bug is in FreeBSD, ZFS or FreeNAS, but it will quickly diminish the usability/suitability of FreeNAS and/or TrueNAS for any IT manager or CIO who might want to use the periodic snapshots and replication offered by FreeNAS/TrueNAS for this same purpose.
 
Joined Jan 4, 2014 | Messages: 1,644
jobsoftinc said: "But this shouldn't matter, should it? If it has 100,000 snapshots to catch up, it should just run until it's done, right?"
I wonder. Not if you're generating snapshots at a faster rate than you can replicate them. Then you have a runaway system, and you will run out of resources at some point.
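
A back-of-the-envelope way to see it, with completely made-up numbers:

Code:
# Hypothetical figures only -- plug in your own dataset count and observed per-snapshot time.
child_datasets = 50                        # e.g. one child dataset per Nextcloud user
created_per_hour = child_datasets * 4      # one snapshot per dataset every 15 minutes
secs_per_incremental = 30                  # assumed average time to replicate one incremental
drained_per_hour = 3600 / secs_per_incremental

print(f"created/hour: {created_per_hour}, replicated/hour: {drained_per_hour:.0f}")
# 200 created vs. 120 replicated per hour means the backlog only ever grows.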
 

jobsoftinc | Cadet | Joined Oct 27, 2015 | Messages: 7
Don't think so. It kept up fine for several months and always showed the green FINISHED each day. To be honest, it seems like it was when 11.2 was upgraded to 11.3 that this error condition first showed up. Daily, I would venture maybe 3,000 periodic snapshots, but replication is ongoing as well, so at any given moment it may only be dealing with a couple of hundred mostly small snapshots. But to have a system that can't recover after some environmental condition (like a downed replication server) is a real problem for the reliability of replication in FreeNAS/TrueNAS.
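
If it helps, this is roughly how I'd count how many snapshots actually get created per day. A sketch; 'tank' stands in for the real pool name:

Code:
import subprocess, time

# -p prints 'creation' as a unix timestamp; -H drops headers for easy tab-separated parsing.
out = subprocess.run(
    ["zfs", "list", "-H", "-p", "-r", "-t", "snapshot", "-o", "name,creation", "tank"],
    capture_output=True, text=True, check=True).stdout

cutoff = time.time() - 24 * 3600
recent = [line for line in out.splitlines() if int(line.split("\t")[1]) >= cutoff]
print(f"snapshots created in the last 24h: {len(recent)}")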
 

jobsoftinc | Cadet | Joined Oct 27, 2015 | Messages: 7
FWIW, I've also now opened a bug report in JIRA. I don't think we're doing anything all that out of the ordinary either. Using mostly the defaults of the replication wizard, I have certain high-priority datasets that I need replicated; that's not an unreasonable ask. If, however, some design aspect (like keeping pipes open for the length of a replication task's duration for, say, performance improvements) can potentially lead to an unrecoverable scenario, the FreeNAS GUI needs to detect and WARN about this potential road hazard, or block a user from getting into this boat in the first place.
 
Joined Jan 4, 2014 | Messages: 1,644
You might want to reconsider your snapshot strategy. This is what I do for nested user datasets under a home root, and I haven't experienced any problems yet with snapshot replication:

Snapshot frequency   Lifetime    Max number of snapshots per child dataset
Every 15 mins        1 hour      4
Every hour           1 day       24
Every day            2 weeks     14
Every week           3 months    12-14
Every month          1 year      12

Maximum number of snapshots per dataset in 1 year: 66-68

I'm able to keep a year's worth of snapshots per dataset. The closer I am to current time, the higher the snapshot frequency; the further away from current time, the lower the snapshot frequency.

With the approach you've adopted, you have 1,344 snapshots per dataset (4 per hour x 24 hours x 14 days) generated in a 2-week period.
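
Roughly, side by side (simple arithmetic, per dataset at steady state):

Code:
flat_15min_for_2_weeks = 4 * 24 * 14          # the current scheme: 1,344 snapshots per dataset
tiered = 4 + 24 + 14 + 13 + 12                # 15-min + hourly + daily + weekly + monthly tiers
print(flat_15min_for_2_weeks, tiered)         # 1344 vs. ~67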
 

jobsoftinc | Cadet | Joined Oct 27, 2015 | Messages: 7
Very interesting temporally-hierarchical approach! :) I'm still curious to see what comes back on the JIRA ticket because, while I see the logic in what you've outlined, I still don't think that, using only the wizard to set up some replications of my datasets, a 1-2 week disruption (with no loss of the snapshots in common on each end) should result in an unrecoverable situation. I can see most Average Joes easily getting into the same boat. It would seem that simply closing a given pipe once its replication step is done (even if it's a 0-byte snapshot) would mean the system never runs out of this resource (even when it's been jacked up to 268 MILLION! :)). And while it may take 2-3 days to resync, it would still recover without having to restart from scratch.

I understand when the snapshots in common expire off on one end or the other, sure. But regardless of my setup's efficiency, a storage solution like TrueNAS/FreeNAS that is touted as an "enterprise capable" NAS had better be able to recover from a scenario like this, especially when the necessary snapshots in common still exist on each end. Or at least warn or block the user in the GUI about it. Thanks for your reply!!
 

Apollo | Wizard | Joined Jun 13, 2013 | Messages: 1,458
In general, only a single replication should be running against a given target pool; otherwise ZFS will have trouble handling the snapshot requests on the target, causing increased delays.
There is also an option for snapshots not to be created if the dataset's filesystem hasn't been modified. This can significantly reduce the number of snapshots.
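
A rough way to see how much that would save is to count snapshots whose USED is zero, i.e. ones that reference no unique data. A sketch; 'tank/nextcloud' is a placeholder, and zero USED is only an approximation of "nothing changed":

Code:
import subprocess

out = subprocess.run(
    ["zfs", "list", "-H", "-p", "-r", "-t", "snapshot", "-o", "name,used", "tank/nextcloud"],
    capture_output=True, text=True, check=True).stdout

# USED == 0 means the snapshot holds no unique blocks; a close (not exact) proxy
# for "the filesystem wasn't modified since the previous snapshot".
empty = [line.split("\t")[0] for line in out.splitlines() if int(line.split("\t")[1]) == 0]
print(f"{len(empty)} of {len(out.splitlines())} snapshots use 0 bytes")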
 