Large number of ZFS replications eventually ends with remote pool corruption

gamebrigada

Dabbler
Joined
Jul 9, 2019
Messages
12
Hey guys,
I've been running a production system for about 8 months now. Everything is pretty stable except for a few quirks.

Primary hardware: 45 x 10TB disks, 128GB ECC RAM, dual Xeon 4108, Broadcom 3108 RAID controller running in passthrough mode.

My snapshot and replication tasks run hourly, and I keep running into the same problem. Eventually the remote system hard resets, the pool becomes corrupted, and I get a Fatal trap 12 when the pool is imported at boot.

I started with some pretty crappy hardware for this, but have since moved to a system identical to the primary, and I've encountered this twice. I'm now out of ideas about what misconfiguration could be corrupting my pool. I'm even more terrified that my primary pool will do the same thing...

I'm willing to pay for support if I can learn from it how to resolve this kind of problem, or if they can figure out what is misconfigured.
 
Joined
Jul 3, 2015
Messages
926
Broadcom 3108 RAID controller running in passthrough mode
Personally I would get rid of this for a start and replace it with the 3008. It may not be your issue but start with getting the right tools for the job.
 

gamebrigada

Shouldn't be a problem; it's working fine on the primary. The 3108 was chosen for its battery backup capability, so it can keep the drives alive long enough to flush their caches before the system goes down in a power failure. When it's working in passthrough mode, it's equivalent to any RAID card running in IT mode.
 

ccav

Dabbler
Joined
Apr 28, 2019
Messages
15
I'm curious, how is your 45 x 10TB storage pool laid out? RAIDZ or mirrors?
 