Lock Order Reversal causing mysterious freezing?

Tom Johnson

Cadet
Joined
Jan 3, 2016
Messages
6
I've played with FreeNAS for years, probably getting to the point where I'm moderately dangerous. For many months I've had a problem where my system freezes and becomes non-responsive. It doesn't happen quickly after boot; in fact, the system often runs fine for hours, but never for a full 24 hours. Nor is the freezing aligned with any process that I can discern.

Running FreeNAS 11.3-U3.1

I am running on a SuperMicro H8DM8E-2 motherboard with a SuperMicro BPN-SAS-846TQ backplane.

I have stress tested my memory (60 GB ECC) and my CPUs (dual Six-Core AMD Opteron(tm) Processor 2425 HE (2110.86-MHz K8-class CPU)) for a minimum of 12 hours each without failure. SMART reports no issues on any of the 24 hard drives.

I was running three SuperMicro AOC-SAT2-MV8 cards (not running RAID), but thinking they might be the problem, I replaced them with a new IBM ServeRAID M1015 (modified to work in FreeNAS) with an IBM 46M0997 16-Port 6GB SAS SATA ServeRAID Expansion Adapter Card, again with no RAID. All cabling between the controllers and the hard drives was replaced.

I am running one jail - Plex Media Server.

Finally, I turned on kernel debugging and got this report about 30 seconds after startup finished:

Code:
Jul 16 21:11:36 freenas lock order reversal:
Jul 16 21:11:36 freenas 1st 0xfffff8014b806d50 zfs (zfs) @ /freenas-releng/freenas/_BE/os/sys/kern/vfs_mount.c:908
Jul 16 21:11:36 freenas 2nd 0xfffff80084fa85f0 devfs (devfs) @ /freenas-releng/freenas/_BE/os/sys/kern/vfs_mount.c:919
Jul 16 21:11:36 freenas stack backtrace:
Jul 16 21:11:36 freenas #0 0xffffffff80b6fbb0 at witness_debugger+0x70
Jul 16 21:11:36 freenas #1 0xffffffff80b6fa46 at witness_checkorder+0xe76
Jul 16 21:11:36 freenas #2 0xffffffff80ae02a1 at lockmgr_lock_fast_path+0x1b1
Jul 16 21:11:36 freenas #3 0xffffffff811ec8e1 at VOP_LOCK1_APV+0xe1
Jul 16 21:11:36 freenas #4 0xffffffff80be9c97 at _vn_lock+0x67
Jul 16 21:11:36 freenas #5 0xffffffff80bd0c1d at vfs_domount+0xd1d
Jul 16 21:11:36 freenas #6 0xffffffff80bcf959 at vfs_donmount+0x7b9
Jul 16 21:11:36 freenas #7 0xffffffff80bcf171 at sys_nmount+0x71
Jul 16 21:11:36 freenas #8 0xffffffff8101b712 at amd64_syscall+0x792
Jul 16 21:11:36 freenas #9 0xffffffff80ff547d at fast_syscall_common+0x101
Jul 16 21:11:37 freenas bridge0: Ethernet address: 02:04:10:24:77:00
Jul 16 21:11:37 freenas kernel: nfe0: promiscuous mode enabled
Jul 16 21:11:37 freenas kernel: bridge0: link state changed to UP
Jul 16 21:11:37 freenas kernel: bridge0: link state changed to UP
Jul 16 21:11:38 freenas epair0a: Ethernet address: 02:ff:10:00:05:0a
Jul 16 21:11:38 freenas epair0b: Ethernet address: 02:f1:4f:00:06:0b
Jul 16 21:11:38 freenas kernel: epair0a: link state changed to UP
Jul 16 21:11:38 freenas kernel: epair0a: link state changed to UP
Jul 16 21:11:38 freenas kernel: epair0b: link state changed to UP
Jul 16 21:11:38 freenas kernel: epair0b: link state changed to UP
Jul 16 21:11:38 freenas kernel: epair0a: changing name to 'vnet0.1'
Jul 16 21:11:39 freenas kernel: vnet0.1: promiscuous mode enabled
Jul 16 21:11:42 freenas kernel: lo0: link state changed to UP
Jul 16 21:11:42 freenas kernel: lo0: link state changed to UP


In this instance, nothing more was logged until the log rolled over to a new file at midnight, but when I checked at 7 AM to see if the system was still running, it had frozen.

This seems to be the standard case: I reboot sometime in the early evening and it runs fine until I go to bed (before or after midnight), but it freezes before I wake.

I can reboot with the IPMI card with no issues every time.

From what I've read about Lock Order Reversal, it would produce "freezing" like I see. Also, from what I've read, it's probably not a hardware issue but something I'll just need to send in as a bug report.
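
As a rough illustration of what a report like this means, here is a toy order checker in Python. It is not the FreeBSD WITNESS code, and the class and method names are made up for this sketch. It only remembers the order in which named locks were taken and flags a later acquisition that contradicts an earlier one. A reversal flagged this way is a warning that two code paths could deadlock if they raced; it does not by itself mean a deadlock happened.

```python
class OrderChecker:
    """Toy model of a lock-order checker (hypothetical, for illustration only)."""

    def __init__(self):
        self.seen = set()      # (first, second) orderings observed so far
        self.held = []         # lock names currently held, in acquisition order
        self.reversals = []    # orderings that contradict an earlier one

    def acquire(self, name):
        # Compare the new lock against every lock already held.
        for earlier in self.held:
            if (name, earlier) in self.seen:
                # The opposite order was seen before: report a reversal.
                self.reversals.append((earlier, name))
            self.seen.add((earlier, name))
        self.held.append(name)

    def release(self, name):
        self.held.remove(name)


checker = OrderChecker()

# One code path takes the devfs lock first, then the zfs lock.
checker.acquire("devfs")
checker.acquire("zfs")
checker.release("zfs")
checker.release("devfs")

# Another path takes zfs first, then devfs, as in the "1st"/"2nd" lines of the report.
checker.acquire("zfs")
checker.acquire("devfs")
checker.release("devfs")
checker.release("zfs")

print(checker.reversals)  # [('zfs', 'devfs')]
```

Nothing blocks in this sketch because only one thread runs; the checker complains purely because the two orderings disagree.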

Any suggestions on further tests I might run to isolate the problem would be helpful. Better yet, if anyone has a solution, that would be best of all.

Thank you and awaiting your insightful replies.
 

Tom Johnson

Good gosh guys and gals, I expected at least one of this community's notoriously prickly replies about what data I was missing, etc.

Please, I need your genius help - I am but a poor acolyte of this wonderful and beautifully complex software.
 

Tom Johnson

Just as a further piece of information, this error repeats at random intervals, but it does not appear to be directly associated with the freezing. I am currently writing timestamped output to an external file to see if I can correlate this report with the actual freeze. Any other suggestions for diagnosing this?
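
One way to do that kind of correlation, sketched in Python under some assumptions: a small loop appends a timestamp to a heartbeat file every few seconds, so the last line in the file approximates the moment of the freeze, and a second helper pulls the "lock order reversal" timestamps out of the syslog text. The function names, the 10-second interval, the year 2020, and the heartbeat times in the demo are all placeholders, not values from the system above.

```python
import os
import re
import tempfile
import time
from datetime import datetime

# Matches syslog lines such as "Jul 16 21:11:36 freenas lock order reversal:"
LOR_RE = re.compile(r"^(\w{3} +\d+ \d\d:\d\d:\d\d) \S+ lock order reversal:")
STAMP_FMT = "%Y-%m-%d %H:%M:%S"


def write_heartbeat(path, now=None):
    """Append one timestamp and force it to disk so it survives a hard freeze."""
    stamp = (now or datetime.now()).strftime(STAMP_FMT)
    with open(path, "a") as fh:
        fh.write(stamp + "\n")
        fh.flush()
        os.fsync(fh.fileno())
    return stamp


def run_heartbeat(path, interval_s=10, beats=None):
    """Write a heartbeat every interval_s seconds; beats=None runs until killed."""
    count = 0
    while beats is None or count < beats:
        write_heartbeat(path)
        count += 1
        time.sleep(interval_s)


def last_heartbeat(path):
    """Return the final timestamp in the heartbeat file."""
    with open(path) as fh:
        lines = [ln.strip() for ln in fh if ln.strip()]
    return datetime.strptime(lines[-1], STAMP_FMT)


def lor_times(log_lines, year):
    """Collect timestamps of reversal reports; syslog omits the year, so pass it in."""
    found = []
    for line in log_lines:
        m = LOR_RE.match(line)
        if m:
            found.append(datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S"))
    return found


def gap_to_freeze(lors, last_beat):
    """Seconds between the last reversal report and the last heartbeat, or None."""
    earlier = [t for t in lors if t <= last_beat]
    if not earlier:
        return None
    return (last_beat - max(earlier)).total_seconds()


# Demo with the first log line from the report and invented heartbeat times.
sample_log = [
    "Jul 16 21:11:36 freenas lock order reversal:",
    "Jul 16 21:11:37 freenas kernel: bridge0: link state changed to UP",
]
with tempfile.TemporaryDirectory() as tmp:
    hb = os.path.join(tmp, "heartbeat.log")
    write_heartbeat(hb, datetime(2020, 7, 16, 23, 59, 50))
    write_heartbeat(hb, datetime(2020, 7, 17, 3, 12, 40))
    gap = gap_to_freeze(lor_times(sample_log, 2020), last_heartbeat(hb))

print(gap)  # 21664.0
```

If the gap is consistently hours long, as in this made-up demo, that would suggest the reversal report and the freeze are not directly related. A gap of a few seconds every time would point the other way.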
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
The debug trace indicates your CPUs tried to unlock file locks within your jail mount(s) into the Plex jail in the wrong order. Typically, file locks need to be unlocked in the reverse order they were applied, to guarantee data safety. I would guess this is because the Opterons tried to reorder threads for better CPU throughput, but are doing things a bit too aggressively, and not fully communicating state between the 2 CPUs. There are many complaints of similar behavior with AMD CPUs on the FreeBSD boards.

You could try disabling one of the two CPUs to see if this improves this behavior. File sharing isn't CPU-limited.
 