Panic mounting local filesystems

Status
Not open for further replies.

rhinok

Dabbler
Joined
Mar 22, 2013
Messages
18
Hi,

Last night I had a power failure. Now, upon rebooting the box (FreeNAS v9.2.1), I get the following panic (see screenshot_3.jpg):

Code:
panic: solaris assert: 0 == dmu_bonus_hold(os, object, dl, &dl->dl_dbuf) (0x0 == 0x2)


Rebooting the box or swapping the USB key (I tried FreeNAS v9.2.0) produces the same error.

The hardware: Intel Core i3-3225 CPU, 16 GB RAM, 2 x WD Red 3 TB, 1 x WD RE3 1 TB.

Pool layout:

tank0: 2 HDDs in a ZFS mirror (RAID 1), 10 datasets, ~50% full
tank1: a single HDD, 1 dataset, ~15% full

Compression (lz4) is enabled on tank0 and inherited by all of its datasets.
Scrubs are scheduled every 15 days and SMART tests once a month.
No errors or repairs were reported by the last scrub (or any previous scrub) on either pool.

I'm pretty much stuck.

I'll try the latest version of FreeNAS and report back. Is there anything else I could try?

Thank you,
Paul
 

Attachments

  • screenshot_1.jpg (181.2 KB)
  • screenshot_2.jpg (191.4 KB)
  • screenshot_3.jpg (102.6 KB)
  • screenshot_4.jpg (213.9 KB)

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
You didn't have a UPS, did you?
 

rhinok

Dabbler
Joined
Mar 22, 2013
Messages
18
Unfortunately, no.

Using a fresh image (v9.2.1.1) I can reach the console, but when I try to import tank0 nothing happens: no output in the console or in /var/log/messages.

Code:
[root@freenas ~]# zpool import tank0


The command apparently hung after ~5 minutes, and I got screenshot_6.png.
After a reboot, I tried to import the second pool:

Code:
[root@freenas ~]# zpool import tank1
cannot mount '/tank1': failed to create mountpoint
cannot mount '/tank1/media': failed to create mountpoint


Then I retried importing tank0; the command is still running after more than 15 minutes.
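
I suspect the "failed to create mountpoint" messages for tank1 are only because the FreeNAS root filesystem is read-only, so importing to an alternate root under a writable location might at least get past that part. Just a guess on my side, something like:

Code:
zpool import -R /mnt tank1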
 

Attachments

  • screenshot_6.png (140.4 KB)

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
zpool import tank0 is not the proper way to mount a pool in FreeNAS.

Post the output of the following in pastebin:

zpool status
zpool import
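
For reference, the GUI's auto-import does roughly this under the hood (my approximation, not the exact middleware command, and the cache file path is from memory): it imports with an altroot of /mnt and records the pool in the FreeNAS cache file so it comes back after a reboot.

Code:
zpool import -f -R /mnt -o cachefile=/data/zfs/zpool.cache tank0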
 

rhinok

Dabbler
Joined
Mar 22, 2013
Messages
18
Output for import:
Code:
[root@freenas] ~# zpool import
   pool: tank1
     id: 3101957539559678790
  state: ONLINE
 status: Some supported features are not enabled on the pool.
 action: The pool can be imported using its name or numeric identifier, though
some features will not be available without an explicit 'zpool upgrade'.
 config:
 
tank1                                         ONLINE
 gptid/c417a6e8-4e35-11e3-b870-6805ca120ec5  ONLINE
 
   pool: tank0
     id: 774380436556752452
  state: ONLINE
 status: Some supported features are not enabled on the pool.
 action: The pool can be imported using its name or numeric identifier, though
some features will not be available without an explicit 'zpool upgrade'.
 config:
 
tank0                                           ONLINE
 mirror-0                                      ONLINE
   gptid/88e7500d-4d64-11e3-bd3e-6805ca120ec5  ONLINE
   gptid/89a6ad98-4d64-11e3-bd3e-6805ca120ec5  ONLINE
[root@freenas] ~#


Output for status:
Code:
[root@freenas] ~# zpool status
no pools available
 

rhinok

Dabbler
Joined
Mar 22, 2013
Messages
18
This doesn't look good either:

Code:
[root@freenas] ~# zdb -l /dev/ada0
--------------------------------------------
LABEL 0
--------------------------------------------
failed to unpack label 0
--------------------------------------------
LABEL 1
--------------------------------------------
failed to unpack label 1
--------------------------------------------
LABEL 2
--------------------------------------------
failed to unpack label 2
--------------------------------------------
LABEL 3
--------------------------------------------
failed to unpack label 3
 
[root@freenas] ~# zdb -l /dev/ada1
--------------------------------------------
LABEL 0
--------------------------------------------
failed to unpack label 0
--------------------------------------------
LABEL 1
--------------------------------------------
failed to unpack label 1
--------------------------------------------
LABEL 2
--------------------------------------------
failed to unpack label 2
--------------------------------------------
LABEL 3
--------------------------------------------
failed to unpack label 3
 
[root@freenas] ~# zdb -l /dev/ada2
--------------------------------------------
LABEL 0
--------------------------------------------
failed to unpack label 0
--------------------------------------------
LABEL 1
--------------------------------------------
failed to unpack label 1
--------------------------------------------
LABEL 2
--------------------------------------------
failed to unpack label 2
--------------------------------------------
LABEL 3
--------------------------------------------
failed to unpack label 3
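
On second thought, maybe zdb -l simply needs to point at the ZFS partition rather than the whole disk: FreeNAS puts the pool on a GPT partition (usually p2, after the swap partition), so the labels would live there. The partition name below is my guess at the usual layout; the gptid device from the import output should also work:

Code:
zdb -l /dev/ada0p2
zdb -l /dev/gptid/88e7500d-4d64-11e3-bd3e-6805ca120ec5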
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Ugh. You've got a mess on your hands, to say the least. Everything "looks" okay on the surface.

Here's some advice:

- Try mounting each disk individually (basically import the pool as a 'broken' pool) and see if one of them works. Always mount read-only for now! (See the sketch after this list.)
- Go find your backups. There's a fairly good chance you're going to need them.
- Be very, very careful about what you run. As you demonstrated above, the wrong command-line parameters can add unrecoverable damage to an already desperate situation.
- There are a handful of threads on ZFS recovery. Read through them, try to figure out what is going on, and try them if they apply. Don't forget what I said in the previous bullet. If you don't think you can make the distinction between what you should and shouldn't do, then be ready to pay for recovery if your data is very important.
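
For the first bullet, a rough sketch of what I mean (with only one of the mirror members connected, or the other one offlined, and always read-only; the /mnt altroot is just an example):

Code:
# one disk of the mirror physically disconnected; import read-only to an altroot
zpool import -o readonly=on -f -R /mnt tank0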
 

indy

Patron
Joined
Dec 28, 2013
Messages
287
Shouldn't copy-on-write harden ZFS against corruption due to incomplete writes in most cases?
Personally, from browsing the forums, I get the feeling that ZFS is in fact more susceptible to power outages than other filesystems.
Or is this just caused by the lack of an offline repair mechanism?
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
indy said:
Shouldn't copy-on-write harden ZFS against corruption due to incomplete writes in most cases?
Personally, from browsing the forums, I get the feeling that ZFS is in fact more susceptible to power outages than other filesystems.
Or is this just caused by the lack of an offline repair mechanism?

Yes. In fact, it should roll back incomplete transactions automatically on import. The mechanism for failure is not well understood. And as more and more people without a deep understanding of ZFS adopt it, they make more and more mistakes, which makes this more prevalent. One side argues that these issues should be addressed in code; the other side argues that you can't fix these issues with code. I'm in the latter camp: certain things can't be fixed in code no matter what you do. People will find ways around code limitations, people will do exactly what you tell them not to do, and then they'll say "I didn't have the money" or "I thought I mitigated that problem" or whatever. If there were ways to mitigate the problem I'd be discussing them. But I don't, because you can't.
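
Related to that: ZFS does ship one recovery knob for exactly this scenario, the import-time rewind, which discards the last few transaction groups to get back to a consistent state. Whether it helps in this case is anyone's guess, and I'd only try it after the dry run and with a read-only import; roughly:

Code:
# dry run: report whether discarding the last transactions would make the pool importable
zpool import -f -F -n -R /mnt tank0
# if that looks sane, try it for real, still read-only to be safe
zpool import -o readonly=on -f -F -R /mnt tank0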

To be honest, aside from the non-ECC RAM I wouldn't expect corruption of this magnitude with his setup. If the world revolved around me, I'd love to take a look at the system over SSH and TeamViewer and see if something is clearly screwed up in the hardware or software settings, and if not, see if I can get a developer to look at the problem.

The real "problem" with this problem is that if you setup the system following our stickies to the letter, you never see this problem. Never. Not once has someone lost a pool and still done everything right. In my 2 years here any time omeone loses a pool like this they've done many incredibly lame/stupid/whatever-you-want-to-call-it mistakes. Now here's where things get really interesting from my perspective. I'll admit that not using a UPS and using non-ECC RAM are two serious potential mistakes. But, for the conditions I wouldn't expect those mistakes to be causing the problems we are seeing. Yet, somehow, and for some reason, it seems to be. I don't have answers. I just push the "I believe" button and realize that this *is* happening even without a known cause. For that reason I push for the UPS and ECC RAM in my guide. It's all I can do.

OP: If you are interested in seeing if we can recover your data, send me a PM and I'll see if I can get a developer to look at the problem. Other than that, your chances of seeing your data again are incredibly slim based on statistical history.
 

rhinok

Dabbler
Joined
Mar 22, 2013
Messages
18
I tried:

Code:
[root@freenas] ~# zpool import
  pool: tank1
    id: 3101957539559678790
  state: ONLINE
action: The pool can be imported using its name or numeric identifier.
config:
 
    tank1                                        ONLINE
      gptid/c417a6e8-4e35-11e3-b870-6805ca120ec5  ONLINE
 
  pool: tank0
    id: 774380436556752452
  state: ONLINE
status: Some supported features are not enabled on the pool.
action: The pool can be imported using its name or numeric identifier, though
    some features will not be available without an explicit 'zpool upgrade'.
config:
 
    tank0                                          ONLINE
      mirror-0                                      ONLINE
        gptid/88e7500d-4d64-11e3-bd3e-6805ca120ec5  ONLINE
        gptid/89a6ad98-4d64-11e3-bd3e-6805ca120ec5  ONLINE
[root@freenas] ~# zpool status
no pools available
 
[root@freenas] ~# zpool list
no pools available
 
[root@freenas] ~# zpool import -R /tmp/tank1 tank1
 
[root@freenas] ~# df -h
Filesystem            Size    Used  Avail Capacity  Mounted on
/dev/ufs/FreeNASs1a    926M    748M    104M    88%    /
devfs                  1.0k    1.0k      0B  100%    /dev
/dev/md0              4.6M    3.3M    911k    79%    /etc
/dev/md1              823k    1.5k    756k    0%    /mnt
/dev/md2              149M    23M    113M    17%    /var
/dev/ufs/FreeNASs4      19M    724k    17M    4%    /data
tank1                  714G    152k    714G    0%    /var/tmp/tank1/tank1
tank1/media            913G    198G    714G    22%    /var/tmp/tank1/tank1/media
 
[root@freenas] ~# zpool list
NAME    SIZE  ALLOC  FREE    CAP  DEDUP  HEALTH  ALTROOT
tank1  928G  199G  729G    21%  1.00x  ONLINE  /tmp/tank1
 
[root@freenas] ~# zpool import -R /tmp/tank0 tank0
-- the system reboots --


I cannot see the errors, because the output scrolls by too fast and the system reboots after a couple of seconds.

Any other ideas?
Is this it?
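
Next time I'll try to keep the panic on the screen instead of watching it scroll past and reboot. If I have the tunable name right, this can be set at the loader prompt (or in /boot/loader.conf):

Code:
# disable the automatic reboot after a panic so the messages stay readable
# (kern.panic_reboot_wait_time is my best guess at the knob; -1 means wait forever)
set kern.panic_reboot_wait_time=-1
boot -s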

Regards,
Paul
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
I don't have any recommendations. I mean, I do, but they'd be exhaustingly hard to discuss and to carry out in a forum setting.
 

rhinok

Dabbler
Joined
Mar 22, 2013
Messages
18
Running in single user mode, with vfs.zfs.recover=1 and vfs.zfs.debug=1:

Code:
# zdb -e -bcsvL tank0
Assertion failed: 0 == dmu_bonus_hold(os, object, dl, &dl->dl_dbuf) (0x0 == 0x2),
file /tank/home/jkh/checkout/freenas/FreeBSD/src/cddl/lib/libzpool/../../../sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_deadlist.c, line 101.
pid 73 (zdb), uid 0: exited on signal 6
Abort trap


And:
Code:
# zpool import -o readonly=on -R /tmp/t0 tank0
...
cannot import 'tank0': pool may be in use from other system, it was last accessed by stormstone.local (hostid: 0xca8d0318) on Sun Feb 23 04:06:37 2014
use -f to import anyway


Using the -f switch causes the system to panic (without a restart this time).
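
The next thing I plan to try, assuming this ZFS version supports the -N flag (import without mounting any datasets), is a read-only import that skips the mounts, in case the panic only happens while mounting:

Code:
# -N: import the pool but do not mount any datasets (assuming the flag exists here)
zpool import -o readonly=on -N -f -R /tmp/t0 tank0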
 

rhinok

Dabbler
Joined
Mar 22, 2013
Messages
18
The error:

Code:
[root@freenas] ~# zdb -ce tank0
Assertion failed: 0 == dmu_bonus_hold(os, object, dl, &dl->dl_dbuf) (0x0 == 0x2), file /tank/home/jkh/checkout/freenas/FreeBSD/src/cddl/lib/libzpool/../../../sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_deadlist.c, line 101.
Abort
 

rhinok

Dabbler
Joined
Mar 22, 2013
Messages
18
I tried pulling one of the disks from the NAS.

Running in multiuser mode:
Code:
[root@freenas] ~# zpool import
   pool: tank1
     id: 3101957539559678790
  state: ONLINE
 action: The pool can be imported using its name or numeric identifier.
 config:
 
tank1                                         ONLINE
  gptid/c417a6e8-4e35-11e3-b870-6805ca120ec5  ONLINE
 
   pool: tank0
     id: 774380436556752452
  state: DEGRADED
 status: One or more devices are missing from the system.
 action: The pool can be imported despite missing or damaged devices.  The
fault tolerance of the pool may be compromised if imported.
   see: http://illumos.org/msg/ZFS-8000-2Q
 config:
 
tank0                                           DEGRADED
  mirror-0                                      DEGRADED
    gptid/88e7500d-4d64-11e3-bd3e-6805ca120ec5  ONLINE
    6440157375047474756                         UNAVAIL  cannot open


Importing tank0:
Code:
# zpool import -o readonly [-f] -R /mnt tank0

It panics and reboots.

Commands run at the loader prompt:
Code:
# set vfs.zfs.debug=1
# set vfs.zfs.recover=1
# boot -s


Then:
Code:
# sh /etc/rc.initdiskless
# zpool import -o readonly -R /mnt tank0
 
cannot import 'tank0': pool may be in use from other system, it was last accessed by stormstone.local (hostid: 0xca8d0318) on Sun Feb 23 04:06:37 2014
    use '-f' to import anyway
 
# zpool import -o readonly -f -R /mnt tank0
...
KDB: enter: panic
[ thread pid 70 tid 100082 ]
Stopped at      kdb_enter+0x3b: movq    $0,0xb76342(%rip)
db>

It panics and drops to the debugger prompt.

The outcome is the same when I pull the other disk instead.
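
While it is sitting at the db> prompt I will at least try to grab a backtrace and the kernel message buffer; this is my understanding of the ddb commands, which may be incomplete:

Code:
db> bt
db> show msgbuf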
 

indy

Patron
Joined
Dec 28, 2013
Messages
287
I still cannot believe this happened because of a single power outage.
Best of luck with your recovery efforts. :(
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Can you post the output of zpool history -i in pastebin? It'll be very long, so use pastebin.
 

rhinok

Dabbler
Joined
Mar 22, 2013
Messages
18
@indy: My hopes are getting thinner.

@cyberjock: Unfortunately the output is very short, because tank0 cannot be imported no matter what:

Code:
[root@freenas] ~# zpool history -i
no pools available


And importing the pool panics the system. Something is corrupted, but I don't know how to diagnose it. Everything I tried ends with:
Code:
Assertion failed: 0 == dmu_bonus_hold(os, object, dl, &dl->dl_dbuf) (0x0 == 0x2), file /tank/home/jkh/checkout/freenas/FreeBSD/src/cddl/lib/libzpool/../../../sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_deadlist.c, line 101.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Oh. I thought you could get zpool history even if a pool wasn't mounted. My bad. I'm guessing the dev has given up trying to help you?
 

rhinok

Dabbler
Joined
Mar 22, 2013
Messages
18
No, he hasn't given up just yet. It's just me, getting more and more convinced that the data is gone, because nothing seems to work.
 

rhinok

Dabbler
Joined
Mar 22, 2013
Messages
18
Still tinkering with this. Running zdb with -AAA yields the same result:
Code:
# zdb -AAA -e tank0
Configuration for import:
        vdev_children: 1
        version: 5000
        pool_guid: 774380436556752452
        name: 'tank0'
        state: 0
        hostid: 3398239000
        hostname: 'stormstone.local'
        vdev_tree:
            type: 'root'
            id: 0
            guid: 774380436556752452
            children[0]:
                type: 'mirror'
                id: 0
                guid: 13590223622916399775
                metaslab_array: 34
                metaslab_shift: 34
                ashift: 12
                asize: 2998440558592
                is_log: 0
                create_txg: 4
                children[0]:
                    type: 'disk'
                    id: 0
                    guid: 3731903511112360879
                    phys_path: '/dev/gptid/88e7500d-4d64-11e3-bd3e-6805ca120ec5'
                    whole_disk: 1
                    DTL: 180
                    create_txg: 4
                    path: '/dev/gptid/88e7500d-4d64-11e3-bd3e-6805ca120ec5'
                children[1]:
                    type: 'disk'
                    id: 1
                    guid: 6440157375047474756
                    phys_path: '/dev/gptid/89a6ad98-4d64-11e3-bd3e-6805ca120ec5'
                    whole_disk: 1
                    DTL: 179
                    create_txg: 4
                    path: '/dev/gptid/89a6ad98-4d64-11e3-bd3e-6805ca120ec5'
Assertion failed: 0 == dmu_bonus_hold(os, object, dl, &dl->dl_dbuf) (0x0 == 0x2),
file /tank/home/jkh/checkout/freenas/FreeBSD/src/cddl/lib/libzpool/../../../sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_deadlist.c, line 101.
pid 928 (zdb), uid 0: exited on signal 6 (core dumped)


Comparing this with the output from another pool, it looks like the crash happens while reading the MOS configuration.
Code:
Configuration for import:
        vdev_children: 1
        version: 5000
        pool_guid: 3101957539559678790
        name: 'tank1'
        state: 0
        hostid: 3398239000
        hostname: 'freenas.local'
        vdev_tree:
            type: 'root'
            id: 0
            guid: 3101957539559678790
            children[0]:
                type: 'disk'
                id: 0
                guid: 3001617559897053389
                phys_path: '/dev/gptid/c417a6e8-4e35-11e3-b870-6805ca120ec5'
                whole_disk: 1
                metaslab_array: 34
                metaslab_shift: 33
                ashift: 12
                asize: 998052462592
                is_log: 0
                DTL: 56
                create_txg: 4
                path: '/dev/gptid/c417a6e8-4e35-11e3-b870-6805ca120ec5'
 
MOS Configuration:
        version: 5000
        name: 'tank1'
        state: 0
        txg: 1694119
        pool_guid: 3101957539559678790
        hostid: 3398239000
        hostname: 'freenas.local'
...


What does MOS stand for?

Another difference: the output of zdb -l on the disk in tank1 (/dev/ada2p2) includes
Code:
features_for_read:
        com.delphix:hole_birth

but on both disks from tank0 it is empty:
Code:
features_for_read:


Also, the generated core dump file is zdb.core. I tried gdb, but it doesn't get me far (no debugging symbols found). How can I examine this dump?

Edit: Maybe this is a silly question, but is it possible to build the FreeNAS kernel and the ZFS tools (zpool, zdb, etc.) with debugging symbols? Would that be any help?
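
For the record, this is how I plan to poke at the core, assuming zdb lives at /usr/sbin/zdb on this image; without debug symbols the backtrace will only show function names at best:

Code:
# load the core against the zdb binary and print a backtrace
gdb /usr/sbin/zdb zdb.core
(gdb) bt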
 