snapshot causes kernel panic

JDCynical

Contributor
Joined
Aug 18, 2014
Messages
141
FreeNAS-11.2-RELEASE-U1 (I know, it's not the latest, but I'm feeling a bit gun-shy after the 11.2 data loss escapade.)
zpool get version shows a - for the value, so it's possible I'm not running the 'latest' pool version for the installed release.
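From what I understand, the - there is expected when the pool is on feature flags rather than a legacy numbered version; the output I get looks roughly like this:
Code:
# zpool get version storage01
NAME       PROPERTY  VALUE    SOURCE
storage01  version   -        default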

I did check the bug tracker and wasn't able to find anything similar.

I've been able to reproduce this multiple times. System is otherwise stable.

Whenever a snapshot is attempted, whether manually from the GUI or via a scheduled task, the system kernel panics and reboots.

This is what I was able to capture via a remote console screen recording. The system will just be sitting there until the snapshot is attempted:
Code:
panic: solaris assert: zap_add(mos, dsl_dataset_phys(ds)->ds_snapnames_zapobj, snapname, 8, 1, &dsobj, tx) == 0 (0x5 == 0x0), file: /freenas-releng-final/freenas/_BE/os/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_dataset.c, line: 1534
cpuid = 3
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0469cbc3d0
vpanic() at vpanic+0x177/frame 0xfffffe0469cbc430
panic() at panic+0x43/frame 0xfffffe0469cbc490
assfail3() at assfail3+0x2c/frame 0xfffffe0469cbc4b0
dsl_dataset_snapshot_sync_impl() at dsl_dataset_snapshot_sync_impl+0x628/frame 0xfffffe0469cbc560
dsl_dataset_snapshot_sync() at dsl_dataset_snapshot_sync+0xf7/frame 0xfffffe0469cbc6c0
dsl_sync_task_sync() at dsl_sync_task_sync+0xae/frame 0xfffffe0469cbc6f0
dsl_pool_sync() at dsl_pool_sync+0x3b/frame 0xfffffe0469cbc770
spa_sync() at spa_sync+0xad5/frame 0xfffffe0469cbc9a0
txg_sync_thread() at txg_sync_thread+0x208/frame 0xfffffe0469cbcab0
fork_exit() at fork_exit+0x83/frame 0xfffffe0469cbcab0
fork_trampoline() at fork_trampoline+0x83/frame 0xfffffe0469cbcab0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic
[ thread pid 15 tid 101395 ]
stopped at      kdb_enter+0x3b: movq    $0,kdb_why
db:0:kdb.enter.default> write cn_mute 1
cn_mute                0        =           0x1
db:0:kdb.enter.default>  reset
cpu_reset: Restarting BSP
cpu_reset_proxy: Stopped CPU 3

Current system specs:
Supermicro X9SCL+-F
Xeon E3-1230
16 gig ECC
LSI 9211-8i
2x Crossflashed Dell H310
Firmware 20.00.07.00 on all three adapters (covers all available drive bays with an extra port on one of the cards)
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Definitely file a bug report, and please post the ticket link here.
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
SMART test results? What's the output of zpool status? Also, run a scrub.
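I.e., something along these lines (adjust the pool and device names to match your setup):
Code:
# smartctl -a /dev/da0     (repeat for each data disk)
# zpool status -v
# zpool scrub storage01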
 

JDCynical

Contributor
Joined
Aug 18, 2014
Messages
141
SMART test results? What's the output of zpool status? Also, run a scrub.

SMART test results are clean. Scrubs are done weekly and there have been no problems, but I'll run one now to be sure.

EDIT: Scrub is clean. Panic has happened when trying to make a snapshot from an SSH session as well.

Code:
# zpool status
  pool: freenas-boot
state: ONLINE
  scan: scrub repaired 0 in 0 days 00:00:17 with 0 errors on Tue Apr  9 03:45:17 2019
config:

    NAME        STATE     READ WRITE CKSUM
    freenas-boot  ONLINE       0     0     0
      ada0p2    ONLINE       0     0     0

errors: No known data errors

  pool: storage01
state: ONLINE
status: One or more devices are configured to use a non-native block size.
    Expect reduced performance.
action: Replace affected devices with devices that support the
    configured block size, or migrate data to a properly configured
    pool.
  scan: scrub repaired 0 in 0 days 14:42:40 with 0 errors on Sun Apr  7 14:42:44 2019
config:

    NAME                                            STATE     READ WRITE CKSUM
    storage01                                       ONLINE       0     0     0
      raidz1-0                                      ONLINE       0     0     0
        gptid/be0b76d7-9681-11e7-a375-002590a8e53a  ONLINE       0     0     0  block size: 512B configured, 4096B native
        gptid/1eaed483-ab60-11e4-b0ac-003048d45614  ONLINE       0     0     0  block size: 512B configured, 4096B native
        gptid/a92e9f80-9f50-11e2-ba98-003048d45614  ONLINE       0     0     0  block size: 512B configured, 4096B native
        gptid/d917ca93-9e1d-11e2-be47-003048d45614  ONLINE       0     0     0  block size: 512B configured, 4096B native
        gptid/e1625856-9e1d-11e2-be47-003048d45614  ONLINE       0     0     0  block size: 512B configured, 4096B native

errors: No known data errors

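For reference, the SSH attempt is nothing exotic, just the stock command along these lines (the snapshot name is only a placeholder):
Code:
# zfs snapshot -r storage01@manual-test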

The suggestion on the ticket was to update to U3 and see if it still happens. The problem for me is that I wanted a snapshot in place before attempting any kind of upgrade (once bitten, twice shy and all that).
 
Last edited:

JDCynical

Contributor
Joined
Aug 18, 2014
Messages
141
So, I finally got a chance to look at this again (I hate real life and work priorities). Updated to FreeNAS-11.2-U4.1; trying to take a snapshot from the GUI or CLI still causes the kernel panic.

Original bug report updated. Not sure if I need to remake it since it's currently closed, but we will see.
 

JDCynical

Contributor
Joined
Aug 18, 2014
Messages
141
...And lo, a new update arrives: FreeNAS-11.2-U5. Interestingly, making a snapshot of a low-use filesystem works, but a recursive snapshot of the entire pool causes a crash.

I've set up a few individual recursive snapshot tasks instead of one pool-sized recursive task, as sketched below; we'll see if that causes any crashes.
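In CLI terms, what I'm seeing is roughly this (dataset and snapshot names are just placeholders):
Code:
# zfs snapshot storage01/quiet-dataset@test      (completes fine)
# zfs snapshot -r storage01@test                 (kernel panic)

The individual tasks are just recursive snapshots rooted at each child dataset instead of at storage01 itself.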
 