snapshot causes kernel panic

JDCynical

Contributor
Joined
Aug 18, 2014
Messages
141
FreeNAS-11.2-RELEASE-U1 (I know, it's not the latest, but I'm feeling a bit gun-shy after the 11.2 data loss escapade.)
zpool get version shows a - for the value, so it's possible I'm not running the 'latest' pool version for the installed release.
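From what I understand, the - there is expected when the pool is on feature flags rather than a legacy numbered version; the output I get looks roughly like this:
Code:
# zpool get version storage01
NAME       PROPERTY  VALUE    SOURCE
storage01  version   -        default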

I did check the bug tracker and wasn't able to find anything similar.

I've been able to reproduce this multiple times. System is otherwise stable.

Whenever a snapshot is attempted, whether manually from the GUI or via a scheduled task, the system kernel panics and reboots.

This is what I was able to capture via a remote console screen recording. The system will just be sitting there until the snapshot is attempted:
Code:
panic: solaris assert: zap_add(mos, dsl_dataset_phys(ds)->ds_snapnames_zapobj, snapname, 8, 1, &dsobj, tx) == 0 (0x5 == 0x0), file: /freenas-releng-final/freenas/_BE/os/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_dataset.c, line: 1534
cpuid = 3
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0469cbc3d0
vpanic() at vpanic+0x177/frame 0xfffffe0469cbc430
panic() at panic+0x43/frame 0xfffffe0469cbc490
assfail3() at assfail3+0x2c/frame 0xfffffe0469cbc4b0
dsl_dataset_snapshot_sync_impl() at dsl_dataset_snapshot_sync_impl+0x628/frame 0xfffffe0469cbc560
dsl_dataset_snapshot_sync() at dsl_dataset_snapshot_sync+0xf7/frame 0xfffffe0469cbc6c0
dsl_sync_task_sync() at dsl_sync_task_sync+0xae/frame 0xfffffe0469cbc6f0
dsl_pool_sync() at dsl_pool_sync+0x3b/frame 0xfffffe0469cbc770
spa_sync() at spa_sync+0xad5/frame 0xfffffe0469cbc9a0
txg_sync_thread() at txg_sync_thread+0x208/frame 0xfffffe0469cbcab0
fork_exit() at fork_exit+0x83/frame 0xfffffe0469cbcab0
fork_trampoline() at fork_trampoline+0x83/frame 0xfffffe0469cbcab0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic
[ thread pid 15 tid 101395 ]
stopped at      kdb_enter+0x3b: movq    $0,kdb_why
db:0:kdb.enter.default> write cn_mute 1
cn_mute                0        =           0x1
db:0:kdb.enter.default>  reset
cpu_reset: Restarting BSP
cpu_reset_proxy: Stopped CPU 3

Current system specs:
Supermicro X9SCL+-F
Xeon E3-1230
16 gig ECC
LSI 9211-8i
2x Crossflashed Dell H310
Firmware 20.00.07.00 on all three adapters (covers all available drive bays with an extra port on one of the cards)
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Definitely file a bug report, and please post the ticket link here.
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
SMART test results? What's the output of zpool status? Also, run a scrub.
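I.e., something along these lines (adjust the pool and device names to match your setup):
Code:
# smartctl -a /dev/da0     (repeat for each data disk)
# zpool status -v
# zpool scrub storage01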
 

JDCynical

Contributor
Joined
Aug 18, 2014
Messages
141
SMART test results? What's the output of zpool status? Also, run a scrub.

SMART test results are clean. Scrubs are done weekly and there have been no problems, but I'll run one now to be sure.

EDIT: Scrub is clean. Panic has happened when trying to make a snapshot from an SSH session as well.

Code:
# zpool status
  pool: freenas-boot
state: ONLINE
  scan: scrub repaired 0 in 0 days 00:00:17 with 0 errors on Tue Apr  9 03:45:17 2019
config:

    NAME        STATE     READ WRITE CKSUM
    freenas-boot  ONLINE       0     0     0
      ada0p2    ONLINE       0     0     0

errors: No known data errors

  pool: storage01
state: ONLINE
status: One or more devices are configured to use a non-native block size.
    Expect reduced performance.
action: Replace affected devices with devices that support the
    configured block size, or migrate data to a properly configured
    pool.
  scan: scrub repaired 0 in 0 days 14:42:40 with 0 errors on Sun Apr  7 14:42:44 2019
config:

    NAME                                            STATE     READ WRITE CKSUM
    storage01                                       ONLINE       0     0     0
      raidz1-0                                      ONLINE       0     0     0
        gptid/be0b76d7-9681-11e7-a375-002590a8e53a  ONLINE       0     0     0  block size: 512B configured, 4096B native
        gptid/1eaed483-ab60-11e4-b0ac-003048d45614  ONLINE       0     0     0  block size: 512B configured, 4096B native
        gptid/a92e9f80-9f50-11e2-ba98-003048d45614  ONLINE       0     0     0  block size: 512B configured, 4096B native
        gptid/d917ca93-9e1d-11e2-be47-003048d45614  ONLINE       0     0     0  block size: 512B configured, 4096B native
        gptid/e1625856-9e1d-11e2-be47-003048d45614  ONLINE       0     0     0  block size: 512B configured, 4096B native

errors: No known data errors

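For reference, the SSH attempt is nothing exotic, just the stock command along these lines (the snapshot name is only a placeholder):
Code:
# zfs snapshot -r storage01@manual-test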

The suggestion on the ticket was to update to U3 and see if it still happens. The problem for me is that I wanted a snapshot in place before attempting any kind of upgrade (once bitten, twice shy and all that).
 
Last edited:

JDCynical

Contributor
Joined
Aug 18, 2014
Messages
141
So, I finally got a chance to look at this again (I hate real life and work priorities). Updated to FreeNAS-11.2-U4.1; trying to take a snapshot from the GUI or CLI still causes the kernel panic.

Original bug report updated. Not sure if I need to remake it since it's currently closed, but we will see.
 

JDCynical

Contributor
Joined
Aug 18, 2014
Messages
141
...And lo, a new update arrives: FreeNAS-11.2-U5. Interestingly, making a snapshot of a low-use filesystem works, but a recursive snapshot of the entire pool causes a crash.

I've set up a few individual recursive snapshot tasks instead of one pool-sized recursive task, as sketched below; we'll see if that causes any crashes.
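In CLI terms, what I'm seeing is roughly this (dataset and snapshot names are just placeholders):
Code:
# zfs snapshot storage01/quiet-dataset@test      (completes fine)
# zfs snapshot -r storage01@test                 (kernel panic)

The individual tasks are just recursive snapshots rooted at each child dataset instead of at storage01 itself.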
 