TrueNAS 12.0U1 python3.8 crashing

johnwbyrd

Cadet
Joined
Dec 31, 2020
Messages
7
On my install, TrueNAS 12.0U1 has a problem with python3.8 periodically dumping core, which in turn stops the web GUI from running.

The web GUI can be manually restarted by ssh'ing into the box and running "service middlewared restart" as root.

The crashes are restricted to the python3.8 executable, and the hardware passes multiple MemTest86 runs, which makes me think this is a software bug and not a hardware fault.

It's a little disturbing to see python itself dumping core. Python 3.8.5 has been known to dump core -- see for example https://bugs.python.org/issue37135 .

I don't know how the developers are rolling python3.8 for TrueNAS, but I might suggest making sure that python3.8 is built from the latest possible sources.

Additionally, it might be good to have a watchdog process catch when python3.8 has crashed, and restart the middlewared process or at least report the crash.
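A minimal sketch of what such a watchdog could look like, written in Python. It is only an illustration, not how TrueNAS actually supervises middlewared. It assumes that "service middlewared status" exits with 0 while the daemon is up (the usual rc.d convention, but not verified here for this service), and the function names and the 60-second interval are made up for the example.

```python
import subprocess
import time

SERVICE = "middlewared"  # service name taken from the manual restart command


def is_running(run=subprocess.run):
    # Assumption: `service middlewared status` exits 0 while the daemon is up.
    result = run(["service", SERVICE, "status"], capture_output=True)
    return result.returncode == 0


def check_once(run=subprocess.run, log=print):
    """Restart the service if it is down. Returns True if a restart was issued."""
    if is_running(run):
        return False
    log(f"{SERVICE} is not running; restarting")
    run(["service", SERVICE, "restart"], capture_output=True)
    return True


def watch(interval_seconds=60):
    # Hypothetical polling loop; the interval is an arbitrary choice.
    while True:
        check_once()
        time.sleep(interval_seconds)
```

Calling watch() as root would poll forever. The run and log parameters exist only so the check can be exercised without touching the real service.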

Thoughts on how I can get more debug info to the developers?

Some possibly relevant lines from /var/log/messages:

Dec 30 04:18:47 truenas 1 2020-12-30T04:18:47.118906-08:00 truenas collectd 3353 - - nut plugin: nut_connect: upscli_connect (localhost, 3493) failed: Connection failure: Connection refused
Dec 30 09:24:21 truenas kernel: pid 506 (python3.8), jid 0, uid 0: exited on signal 11 (core dumped)
Dec 30 09:24:21 truenas kernel: pid 17779 (python3.8), jid 0, uid 0: exited on signal 11 (core dumped)
Dec 30 09:24:24 truenas kernel: pid 17789 (python3.8), jid 0, uid 0: exited on signal 11 (core dumped)
Dec 30 12:14:38 truenas kernel: pid 18068 (python3.8), jid 0, uid 0: exited on signal 11 (core dumped)
Dec 30 15:41:27 truenas 1 2020-12-30T15:41:27.133678-08:00 truenas collectd 3353 - - nut plugin: nut_connect: upscli_connect (localhost, 3493) failed: Connection failure: Connection refused
Dec 30 16:57:12 truenas 1 2020-12-31T00:57:12.022530+00:00 truenas devd 429 - - notify_clients: send() failed; dropping unresponsive client
Dec 30 16:57:12 truenas kernel: pid 446 (python3.8), jid 0, uid 0: exited on signal 4 (core dumped)
Dec 30 17:01:27 truenas 1 2020-12-30T17:01:27.199768-08:00 truenas collectd 3353 - - Traceback (most recent call last):
  File "/usr/local/lib/collectd_pyplugins/disktemp.py", line 62, in read
    with Client() as c:
  File "/usr/local/lib/python3.8/site-packages/middlewared/client/client.py", line 281, in __init__
    self._ws.connect()
  File "/usr/local/lib/python3.8/site-packages/middlewared/client/client.py", line 124, in connect
    rv = super(WSClient, self).connect()
  File "/usr/local/lib/python3.8/site-packages/ws4py/client/__init__.py", line 216, in connect
    self.sock.connect(self.bind_addr)
ConnectionRefusedError: [Errno 61] Connection refused
Dec 30 17:06:27 truenas 1 2020-12-30T17:06:27.199002-08:00 truenas collectd 3353 - - Traceback (most recent call last):
  File "/usr/local/lib/collectd_pyplugins/disktemp.py", line 62, in read
    with Client() as c:
  File "/usr/local/lib/python3.8/site-packages/middlewared/client/client.py", line 281, in __init__
    self._ws.connect()
  File "/usr/local/lib/python3.8/site-packages/middlewared/client/client.py", line 124, in connect
    rv = super(WSClient, self).connect()
  File "/usr/local/lib/python3.8/site-packages/ws4py/client/__init__.py", line 216, in connect
    self.sock.connect(self.bind_addr)
ConnectionRefusedError: [Errno 61] Connection refused
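
To make the crash pattern in the log easier to see, here is a rough sketch that tallies the kernel "exited on signal" lines per process name and signal number. The regular expression is fitted only to the line format shown above, and the function name is made up for this example. For reference, signal 11 is SIGSEGV and signal 4 is SIGILL.

```python
import re
from collections import Counter

# Matches kernel lines such as:
#   kernel: pid 506 (python3.8), jid 0, uid 0: exited on signal 11 (core dumped)
CRASH_RE = re.compile(
    r"kernel: pid (\d+) \(([^)]+)\), jid \d+, uid \d+: exited on signal (\d+)"
)


def count_crashes(lines):
    """Count crash lines, keyed by (process name, signal number)."""
    counts = Counter()
    for line in lines:
        match = CRASH_RE.search(line)
        if match:
            counts[(match.group(2), int(match.group(3)))] += 1
    return counts
```

It can be fed an open log file directly, for example count_crashes(open("/var/log/messages")), assuming that is where the system log lives on the box.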


The hardware that TrueNAS is running on:

CPU: AMD Ryzen 5 3600 6-Core Processor
Memory: 64 GB
Motherboard: ASUS TUF GAMING X570-PLUS (WI-FI)
Graphics: NVidia GeForce 710
2 Intel PRO/1000 PCI-Express Network interface compatible cards
nvd0: WDS100T3X0C-00SJG0 WD Black SN750 NVMe SSD
nvd1: WDS100T3X0C-00SJG0 WD Black SN750 NVMe SSD
ada0 boot device: Samsung SSD 860 EVO 500GB RVT04B6Q, Serial Number S598NJ0NA16137K
ada1: HGST HUS728T8TALE6L4 V8GAW4J0
ada2: HGST HUS728T8TALE6L4 V8GAW4J0
ada3: HGST HUS728T8TALE6L4 V8GAW4J0
 

johnwbyrd

Cadet
Joined
Dec 31, 2020
Messages
7
I used "gdb /usr/local/bin/python3.8 /python3.8.core" and then "bt" to try to get a call stack, but python3.8 seems to have been compiled without debugging symbols. The crash is coming from deep within _PyEval_EvalFrameDefault(), from /usr/local/lib/libpython3.8.so.1.0:

# gdb /usr/local/bin/python3.8 /python3.8.core
GNU gdb (GDB) 9.1 [GDB v9.1 for FreeBSD]
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-portbld-freebsd12.0".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/local/bin/python3.8...
(No debugging symbols found in /usr/local/bin/python3.8)
[New LWP 100866]

warning: Section `.reg-xstate/100866' in core file too small.
Core was generated by `/usr/local/bin/python3.8 /usr/local/bin/midclt call --job true --job-print descr'.
Program terminated with signal SIGILL, Illegal instruction.

warning: Section `.reg-xstate/100866' in core file too small.
#0 0x0000000800472524 in _PyEval_EvalFrameDefault () from /usr/local/lib/libpython3.8.so.1.0
(gdb) bt
#0 0x0000000800472524 in _PyEval_EvalFrameDefault () from /usr/local/lib/libpython3.8.so.1.0
#1 0x00000008012a6410 in ?? ()
#2 0x0000000800aae9f0 in ?? ()
#3 0x0000000000000004 in ?? ()
#4 0x00000008003fe788 in ?? () from /usr/local/lib/libpython3.8.so.1.0
#5 0x00000008004768d0 in _PyEval_EvalCodeWithName () from /usr/local/lib/libpython3.8.so.1.0
#6 0x00000008003aa26b in _PyFunction_Vectorcall () from /usr/local/lib/libpython3.8.so.1.0
#7 0x0000000800475c04 in ?? () from /usr/local/lib/libpython3.8.so.1.0
#8 0x0000000800472e12 in _PyEval_EvalFrameDefault () from /usr/local/lib/libpython3.8.so.1.0
#9 0x00000008004768d0 in _PyEval_EvalCodeWithName () from /usr/local/lib/libpython3.8.so.1.0
#10 0x00000008003aa26b in _PyFunction_Vectorcall () from /usr/local/lib/libpython3.8.so.1.0
[snip]
#141 0x000000080046c593 in PyEval_EvalCode () from /usr/local/lib/libpython3.8.so.1.0
#142 0x00000008004b81ae in ?? () from /usr/local/lib/libpython3.8.so.1.0
#143 0x00000008004b6dbc in PyRun_FileExFlags () from /usr/local/lib/libpython3.8.so.1.0
#144 0x00000008004b632c in PyRun_SimpleFileExFlags () from /usr/local/lib/libpython3.8.so.1.0
#145 0x00000008004d4afe in Py_RunMain () from /usr/local/lib/libpython3.8.so.1.0
#146 0x00000008004d503b in ?? () from /usr/local/lib/libpython3.8.so.1.0
#147 0x00000008004d50ba in Py_BytesMain () from /usr/local/lib/libpython3.8.so.1.0
#148 0x00000000002017d0 in _start ()
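
For attaching a backtrace to a bug report, the same gdb session can be run non-interactively. This is a small sketch, assuming gdb is on the PATH. It uses gdb's -batch and -ex options with "thread apply all bt", and the helper names and output path are hypothetical.

```python
import subprocess


def gdb_backtrace_command(executable, core):
    # -batch exits after running the -ex commands, so no interactive prompt is needed.
    return ["gdb", "-batch", "-ex", "thread apply all bt", executable, core]


def capture_backtrace(executable, core, out_path):
    """Run gdb on a core file and save the backtrace text to out_path."""
    result = subprocess.run(
        gdb_backtrace_command(executable, core),
        capture_output=True,
        text=True,
    )
    with open(out_path, "w") as handle:
        handle.write(result.stdout)
    return result.returncode
```

For the core above, that would be capture_backtrace("/usr/local/bin/python3.8", "/python3.8.core", "/tmp/python3.8-bt.txt"). Without debugging symbols the output will still be mostly "??" frames, as in the listing above.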
 

johnwbyrd

Cadet
Joined
Dec 31, 2020
Messages
7
I notice, for example, that TrueNAS 12.0U1 is stuck on Python 3.8.5, whereas the FreeBSD portsnap version has moved on to Python 3.8.7...
 

Redcoat

MVP
Joined
Feb 18, 2014
Messages
2,925
Thoughts on how I can get more debug info to the developers?
Welcome to the forums!

To reach the developers, file a bug ticket (link on the masthead) with your info and comments, and attach a debug file generated via System>Support>Generate Debug. The developers don't live here in the forum.

Thanks for taking the time to provide constructive feedback. Happy New Year!
 