panic: dva_get_dsize_sync(): bad DVA

dak180

Patron
Joined
Nov 22, 2017
Messages
310
Last night my server started rebooting on its own with the following message:

IMG_2410_.png


What exactly is this indicative of?
Is this a fixable issue? If so, how?
What is the best way to keep this from happening in the future?
 

dlavigne

Guest
It's possible the boot device failed. You can confirm if that is the issue by installing the same version of FreeNAS to a new USB stick. If it boots, you can then restore your config.

If the issue persists, it may indicate a more serious hardware failure.
 

dak180

dlavigne said: "It's possible the boot device failed. You can confirm if that is the issue by installing the same version of FreeNAS to a new USB stick. If it boots, you can then restore your config."

As noted in my sig, my boot device is a mirrored pair of Intel SSDs; given that, what sort of hardware failure would you expect to cause something like this?

Also, if I have not already made this clear: the system boots completely, and the panic happens about 1.5 to 2 minutes after boot finishes, at which point it reboots. Is that still consistent with a boot device failure?
 

dlavigne

Guest
Something is not happy. Perhaps there was a power surge?
Trying to boot from a fresh install on a USB stick helps to narrow down whether the boot device is at fault or the problem requires deeper investigation.
 

dak180

dlavigne said: "Something is not happy. Perhaps there was a power surge?"

I would like to think that would not be an issue, since the UPS it is connected to is working properly.

dlavigne said: "Trying to boot from a fresh install of a USB stick helps to narrow down whether or not it is the boot device or requires deeper investigation."

I got a thumb drive install up and running and restored from a backup config; so far it has been up 45 minutes without panics.

What would be the best next steps? (I would like to recover the SSH keys if I can.)

So you will need to reconfigure any tasks requiring host keys and add them to your backup strategy.
 

dak180

SSH keys are not stored in the configuration database and must be backed up separately. System host keys are files with names beginning with ssh_host_ in /usr/local/etc/ssh/. The root user keys are stored in /root/.ssh.
The question is more about what the best steps would be to repair my original boot mirror, or to temporarily mount it so I can get the key files off.

It would also be good to have an idea of how to figure out why and how this happened in the first place.
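
For the "add them to your backup strategy" part of the advice quoted above, here is a minimal sketch of what a separate key backup could look like, assuming Python 3 is available on the box. The two directories are the ones named in the quoted text; the function names and the archive layout are my own and purely illustrative, and this does nothing about mounting the old boot mirror.

```python
import os
import tarfile
from pathlib import Path


def collect_key_files(host_key_dir, root_ssh_dir):
    """Return the files to back up: ssh_host_* files from host_key_dir
    plus every regular file in root_ssh_dir. Missing directories are skipped."""
    host_key_dir = Path(host_key_dir)
    root_ssh_dir = Path(root_ssh_dir)
    files = []
    if host_key_dir.is_dir():
        files += [p for p in host_key_dir.glob("ssh_host_*") if p.is_file()]
    if root_ssh_dir.is_dir():
        files += [p for p in root_ssh_dir.iterdir() if p.is_file()]
    return sorted(files)


def backup_ssh_keys(archive_path,
                    host_key_dir="/usr/local/etc/ssh",
                    root_ssh_dir="/root/.ssh"):
    """Write the collected key files into a gzip-compressed tar archive.
    Returns the list of files that were archived."""
    files = collect_key_files(host_key_dir, root_ssh_dir)
    with tarfile.open(str(archive_path), "w:gz") as tar:
        for path in files:
            # Store each file without the leading slash so it extracts
            # relative to wherever the archive is unpacked.
            tar.add(str(path), arcname=str(path).lstrip("/"))
    # The archive contains private keys, so restrict it to the owner.
    os.chmod(str(archive_path), 0o600)
    return files
```

Calling `backup_ssh_keys("/mnt/tank/backups/ssh_keys.tar.gz")` as root would be one way to use it; that destination path is hypothetical, so point it at a dataset on your data pool rather than at the boot device.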
 

dak180

dak180 said: "I got a thumb drive install up and running and restored from a backup config so far 45 mins up without panics."

Looks like I spoke too soon; I now have a new panic:
IMG_2411.png


Also, SMART tests and scrubs on my original boot drives come back with no issues.
 

dak180

dlavigne said: "Trying to boot from a fresh install of a USB stick helps to narrow down whether or not it is the boot device or requires deeper investigation."

So, since I am still getting panics (although different ones), what are the next investigative steps? Is this something I should file a bug report for?
 

appliance

Explorer
Joined
Nov 6, 2019
Messages
96
I have a similar daily panic, but it is triggered by replication, so I had to turn off this major feature of the whole solution. Interestingly, it happens on the same 480GB SSD you have.
 

styno

Patron
Joined
Apr 11, 2016
Messages
466
I too have experienced panics on the replication target (no SSDs involved in the data pool here).

I typically see that the replication finishes without issue, but the panics are triggered by a scrub (or worse, since you can't stop that one, a resilver).
The panics I've spotted range from 'DVA' to 'blkptr has invalid TYPE', all after the aforementioned ZFS replications and multiple pool rebuilds.
Both systems are built with ECC memory, and the source system never had this issue.
As my tickets are going nowhere, I now do the sync/backup via rsync, which is not optimal at all.

I hope to see this resolved in 11.3 since, as far as I understand, ZFS replication is completely overhauled in that version. Tests with only 11.3 (RC) as the replication target unfortunately still trigger the panics.
 

appliance

I accumulated most of the 'warnings' from tickets and forum threads until the pool blew up with a 'blkptr has invalid TYPE' panic on import (so no longer just on dysfunctional replication). From there, another UI error similar to this sad story happened: the UI reshuffled drives and GELI assignments and wiped another pool. Both pools replicated to each other, so the incredible two-separate-drive "backup" on top of RAID was instantly gone. UPS, self-powered drives, ECC: none of this will save you from bugs. (I am actually trying to recall when a filesystem last crashed a machine in the past 30 years and cannot; perhaps only a bad sector could crash Windows 3.11 in the 90s?)
I have yet another backup, as my setup was only 90% done, but it is not so well organised, and months of work are gone. So I moved to the hated BTRFS, hoping its send/receive functions work, and I will jump back on the ship once native ZFS encryption is ready and the replication issues are finally recognized instead of blaming everyone's hardware. 11.3 will likely stay in this shape given the priority of the tickets. I'm looking forward to v13, which promises, citing, "create a culture of testing" (yes please!), "Forwarding: 100gbps" (yes, let's abandon this slow 10gbit), and ZFS persistent ARC, native encryption, etc.
 

kweevuss

Cadet
Joined
Feb 26, 2020
Messages
1
I am also running into a similar issue. At first it was just replications causing kernel panics. Just recently, for whatever reason, I get a kernel panic with 'blkptr has invalid TYPE' when running a scrub. As a troubleshooting step, I was going through and deleting all the old datasets left over from running replication to this host, and during one delete it kernel panicked again with the same error. Now, unfortunately, it does the same thing on boot when importing the ZFS volume, and I cannot boot it again. It looks like I will be rebuilding soon and restoring from backups.
 