SOLVED Kernel panic running truss on suspended process

ascl

This is a strange one, and I'm honestly not sure where to ask about this, or how to proceed with it. But let me give some context and details:
- FreeNAS 11.2 U7
- Supermicro X11SAE-M-O
- 32GB ECC Crucial RAM
- Seasonic Focus Plus Platinum 550W PSU
- half a dozen WD Red HDDs - main storage for media
- misc SSDs (Samsung mainly IIRC) - jails and misc stuff

I have many jails running and have been toying with one for CCTV use, looking at motion and motioneye for this. It mostly works acceptably (especially after updating motion to the latest version). However, one very strange thing happens: after video is recorded, if you navigate to the list of videos, motioneye (Python scripts) calls out to ffmpeg to convert the video, and the first call hangs. It never returns, and it is not using any CPU. The same command run from the command line completes immediately. If I kill the ffmpeg process, it turns zombie and eventually disappears, and then the Python script continues and the subsequent ffmpeg call (for the next video) succeeds.

I have cleaned out the directory several times, and it keeps happening... meaning it isn't just one corrupt video file or something.

I tried to run truss on the hung ffmpeg process, and the kernel panicked!

Code:
Fatal trap 12: page fault while in kernel mode
cpuid = 5; apic id = 05
fault virtual address    = 0x2b0
fault code        = supervisor write data, page not present
instruction pointer    = 0x20:0xffffffff80b17c5a
stack pointer            = 0x28:0xfffffe08556f4520
frame pointer            = 0x28:0xfffffe08556f4640
code segment        = base 0x0, limit 0xfffff, type 0x1b
            = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags    = interrupt enabled, resume, IOPL = 0
current process        = 32635 (truss)


I am somewhat reluctant to repeat this, but I do think the crash is repeatable. I also do not believe this is a memory issue, as I'd expect to see random failures rather than specifically "ffmpeg called by python the first time". I can repeat this python/ffmpeg hang 100% of the time, even across reboots.

The crashing PID looked like this:
Code:
Tracing pid 32635 tid 101441 td 0xfffff80258532620
kern_ptrace() at kern_ptrace+0x138a/frame 0xfffffe08556f4640
sys_ptrace() at sys_ptrace+0x18d/frame 0xfffffe08556f4880
amd64_syscall() at amd64_syscall+0xa38/frame 0xfffffe08556f49b0
fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe08556f49b0
--- syscall (26, FreeBSD ELF64, sys_ptrace), rip = 0x800ae586a, rsp = 0x7fffffffeae8, rbp = 0x7fffffffeb10 ---


The PID I was trying to run truss on looked like this:
Code:
Tracing command ffmpeg pid 29505 tid 101577 td 0xfffff80233b48000
sched_switch() at sched_switch+0x8ad/frame 0xfffffe0855422460
mi_switch() at mi_switch+0xe6/frame 0xfffffe0855422490
thread_suspend_check() at thread_suspend_check+0x2a7/frame 0xfffffe08554224e0
sleepq_catch_signals() at sleepq_catch_signals+0x1e9/frame 0xfffffe0855422540
sleepq_wait_sig() at sleepq_wait_sig+0xf/frame 0xfffffe0855422570
_cv_wait_sig() at _cv_wait_sig+0x167/frame 0xfffffe08554225c0
tty_wait_background() at tty_wait_background+0x19c/frame 0xfffffe0855422690
ttydev_ioctl() at ttydev_ioctl+0x158/frame 0xfffffe08554226e0
devfs_ioctl_f() at devfs_ioctl_f+0x128/frame 0xfffffe0855422740
kern_ioctl() at kern_ioctl+0x26d/frame 0xfffffe08554227b0
sys_ioctl() at sys_ioctl+0x15c/frame 0xfffffe0855422880
amd64_syscall() at amd64_syscall+0xa38/frame 0xfffffe08554229b0
fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe08554229b0
--- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x8043adf7a, rsp = 0x7fffffffe7f8, rbp = 0x7fffffffe840 ---


This system has been stable for about 12 months now, and nothing has changed recently other than adding a jail to run motion/motioneye in.


Typical. As I proofread the post before clicking post, I noticed something: motioneye is running from the terminal in the background, and I had noticed that when I foregrounded it, the process was suspended. That makes sense given the stack trace above (i.e. ffmpeg sleeping inside the tty device ioctl call). This is good, because it means I can avoid the issue: presumably ffmpeg is trying to talk to the terminal (the trace shows a tty ioctl rather than a plain write to stdout), and as a background job it has nowhere to go. I can work around that. The kernel panic is not good, but if it only happens when running truss on a suspended process, I can just avoid doing that.
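For anyone wanting to try the same kind of workaround, here is a minimal sketch in Python. It assumes the hang really is caused by the child touching the controlling terminal through its standard streams. The helper name `run_without_tty` is made up for illustration and is not part of motioneye, and the child command is a stand-in so the example runs without ffmpeg installed.

```python
import subprocess
import sys


def run_without_tty(cmd, timeout=None):
    """Run cmd with none of its standard streams attached to the terminal.

    Hypothetical helper: stdin comes from /dev/null and stdout/stderr go to
    pipes, so the child has no tty on fds 0-2 to read from or ioctl.
    """
    return subprocess.run(
        cmd,
        stdin=subprocess.DEVNULL,   # fd 0 is /dev/null, not the terminal
        stdout=subprocess.PIPE,     # captured instead of written to the tty
        stderr=subprocess.PIPE,
        timeout=timeout,            # optional guard so a hang raises instead of blocking forever
        check=False,
    )


# Stand-in child process: reports whether its stdin is a terminal.
result = run_without_tty(
    [sys.executable, "-c", "import sys; print(sys.stdin.isatty())"]
)
print(result.stdout.decode().strip())
```

With a real ffmpeg command line in place of the stand-in, ffmpeg's own `-nostdin` option (which disables interaction on standard input) may be worth adding as well. I have not verified either approach inside motioneye itself.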

I have tested this on FreeBSD 12 and it does not panic there, so I'm going to call this solved and leave it here in case someone else hits something similar.
 