ZFS replication causing kernel panic

Lebesgue

Dabbler
Joined
Oct 10, 2016
Messages
17
Hi everyone,
I have set up two pools with replication between them; however, shortly after replication starts I hit a kernel panic that reboots the server at roughly 10-minute intervals. Replication has been enabled for months, since v9.10, but the kernel panics only started recently on v11.2-U5.
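For context, the GUI replication task boils down to something like the following send/receive over SSH. This is just a minimal sketch; tank/data, backup/data and backup-host are placeholder names, not my actual setup:

    # take a snapshot of the source dataset
    zfs snapshot tank/data@repl-1
    # initial full send to the remote backup pool
    zfs send tank/data@repl-1 | ssh backup-host zfs receive -F backup/data
    # later runs send only the incremental delta between two snapshots
    zfs send -i tank/data@repl-1 tank/data@repl-2 | ssh backup-host zfs receive backup/data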

I have now disabled ZFS replication and resorted to the less efficient rsync instead. Since then the server has been stable, with neither SMART nor scrub errors reported.
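The rsync fallback is just a file-level copy along these lines (paths and host are placeholders):

    # preserve permissions, hard links, ACLs and xattrs; delete files removed at the source
    rsync -aHAX --delete /mnt/tank/data/ backup-host:/mnt/backup/data/

It loses the snapshot and block-level efficiency of zfs send, but it exercises the same disks.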

The error is "panic: dva_get_dsize_sync() bad DVA []".
I have not been able to find a bug report describing this. Has anyone else experienced it?

Rgds. Thomas
 

toadman

Guru
Joined
Jun 4, 2013
Messages
619
Hi. I have not experienced it. A quick Google search suggests a few others have seen it. I'm not sure of the root cause, but it seems to point to disk corruption.
 

Lebesgue

Dabbler
Joined
Oct 10, 2016
Messages
17
Hi,
an update from my side, still pointing to this being a FreeNAS/BSD kernel or ZFS software issue rather than hardware related.
I initiated a full disk write from within the FreeNAS GUI, overwriting the 4 disks of the backup pool with zeros. It ran for approximately 4 days without any errors reported and without the server rebooting as before.
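For anyone wanting the shell equivalent of that GUI wipe, it is essentially a dd of zeros over each whole disk. Device names below are examples only; verify yours first, since this destroys all data on the disk:

    # check which devices belong to the backup pool before wiping anything
    camcontrol devlist
    # zero one entire disk (repeat per disk; da0 is an example)
    dd if=/dev/zero of=/dev/da0 bs=1m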
I then recreated the pool and created a ZFS replication job, which also ran for some days.

The pools are now in sync and the server has an uptime of 14 days.
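To confirm the sync, comparing the newest snapshot on each side is enough; dataset names are placeholders again:

    # newest snapshot on the source
    zfs list -t snapshot -o name,creation -s creation -r tank/data | tail -1
    # newest snapshot on the backup host
    ssh backup-host zfs list -t snapshot -o name,creation -s creation -r backup/data | tail -1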

Rgds. Thomas
 

flashero

Cadet
Joined
Aug 30, 2019
Messages
6
Hi, how are you? I am experiencing the same error on TWO separate systems (both replicating from a main server). Is this a bug?

Best regards
 

flashero

Cadet
Joined
Aug 30, 2019
Messages
6
All systems are running FreeNAS-11.2-U5. One server is remote, and another local.
 

rvassar

Guru
Joined
May 2, 2018
Messages
972
Hi, how are you? I am experiencing the same error on TWO separate systems (both replicating from a main server). Is this a bug?

Best regards

I would call this a bug. You need to submit the full stack trace. It may, in the end, be a device driver that's not well supported, or flaky hardware. But if it crashes the kernel and provides a stack trace, it's likely something the iX crew will want to see.
 

flashero

Cadet
Joined
Aug 30, 2019
Messages
6
I would call this a bug. You need to submit the full stack trace. It may, in the end, be a device driver that's not well supported, or flaky hardware. But if it crashes the kernel and provides a stack trace, it's likely something the iX crew will want to see.

Thank you very much for the reply. How should I submit the full stack trace?

Best regards!
 

Lebesgue

Dabbler
Joined
Oct 10, 2016
Messages
17
Hi flashero,
what I did at first was simply to disable the ZFS replication job I had enabled in the FreeNAS GUI. This stopped the regular 10-minute crash intervals I had experienced up to that point.
I subsequently set up rsync to stress the disks, which also worked, and I had it running for some days.
Then, as described above, I eventually wiped all disks on the receiving side, overwriting every block with zeros (it took days), before recreating the pool and enabling ZFS replication again.
My FreeNAS server has been stable ever since, although I cannot explain why.

In /data/crash you should find the tarred crash logs. Attach these if you submit a bug report.
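Something along these lines should gather them for the ticket (exact filenames vary per system):

    # list the crash dumps FreeNAS has saved
    ls -lh /data/crash
    # copy them off the box so they can be attached to the bug report
    scp /data/crash/* you@workstation:~/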

Rgds. Thomas
 

flashero

Cadet
Joined
Aug 30, 2019
Messages
6
Hi flashero,
what I did at first was simply to disable the ZFS replication job I had enabled in the FreeNAS GUI. This stopped the regular 10-minute crash intervals I had experienced up to that point.
I subsequently set up rsync to stress the disks, which also worked, and I had it running for some days.
Then, as described above, I eventually wiped all disks on the receiving side, overwriting every block with zeros (it took days), before recreating the pool and enabling ZFS replication again.
My FreeNAS server has been stable ever since, although I cannot explain why.

In /data/crash you should find the tarred crash logs. Attach these if you submit a bug report.

Rgds. Thomas

Hi, I have installed a fresh server with new disks, and the panics remain. I have already submitted the bug report and am waiting.

Best regards!
 

dwoodard3950

Dabbler
Joined
Dec 16, 2012
Messages
18
Similar crash here, which results in a reboot. The message is:
panic: dva_get_dsize_sync(): bad DVA ...
cpuid = 6
KDB: stack backtrace:
db_trace_self_wrapper() at ...

I'm able to reliably duplicate the crash with a specific snapshot on a specific dataset.

Unfortunately, I'm not sure what to do with it, other than wait for this snapshot to rotate out of the source data.
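If anyone wants to check a suspect snapshot without involving the receiver, a send to /dev/null exercises the same read path on the sending side; dataset and snapshot names below are placeholders rather than my real ones:

    # full send of just the suspect snapshot, discarding the stream
    zfs send tank/data@suspect > /dev/null
    # or the incremental from the previous snapshot, which is what replication actually sends
    zfs send -i tank/data@prev tank/data@suspect > /dev/null

If the panic is triggered on the sending side, this should reproduce it without touching the destination pool.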

Machine in question:
Destination server is Supermicro X10SDV-6C-TLN4F with 64GB ECC.
 

Henry L

Dabbler
Joined
Nov 21, 2013
Messages
10
Same crash issue here. Same panic: dva_get_dsize_sync(): bad DVA

My workaround, while not ideal: delete the last 2 snapshots, then replication runs fine.
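For anyone copying this workaround: list first, then destroy, since destroy is irreversible. Dataset and snapshot names below are examples only:

    # list snapshots on the dataset, newest last
    zfs list -t snapshot -s creation -r tank/data
    # delete the two most recent ones (example names)
    zfs destroy tank/data@auto-20191106.1200
    zfs destroy tank/data@auto-20191106.1300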
 

styno

Patron
Joined
Apr 11, 2016
Messages
466
It looks like that nasty bug is finally squashed! (commit here) HAPPY DAYS.
I hope there will be some sort of hotfix and we don't have to wait for the next update cycle.
 