On my install, the version of TrueNAS listed above seems to have problems with python3.8 dumping core at a certain point, which in turn stops the GUI from running.
The web GUI can be manually restarted by ssh'ing into the box and running "service middlewared restart" as root.
The crashes are restricted to the python3.8 executable, and the hardware survives multiple MemTest86 passes, which makes me think this is a software bug and not a hardware one.
It's a little disturbing to see python itself dumping core. Python 3.8.5 has been known to dump core -- see for example https://bugs.python.org/issue37135 .
I don't know how the developers are rolling python3.8 for TrueNAS, but I might suggest making sure that python3.8 is built from the latest possible sources.
Additionally, it might be good to have a watchdog process catch when python3.8 has crashed, and restart the middlewared process or at least report the crash.
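To make the watchdog idea concrete, here is a rough sketch in Python, not anything that ships with TrueNAS. It assumes the kernel crash lines land in /var/log/messages in the format shown further down, and that "service middlewared restart" (the command I run by hand) is the right recovery. The function names are made up for illustration, and it reacts to any python3.8 crash, whether or not that process was actually middlewared.

```python
import re
import subprocess
import time

# Matches kernel lines such as:
#   Dec 30 09:24:21 truenas kernel: pid 506 (python3.8), jid 0, uid 0: exited on signal 11 (core dumped)
CRASH_RE = re.compile(
    r"^(?P<when>\w{3}\s+\d+\s[\d:]{8})\s\S+\skernel: pid (?P<pid>\d+) "
    r"\((?P<exe>[^)]+)\), .*?exited on signal (?P<sig>\d+)"
)


def find_crashes(lines, exe="python3.8"):
    """Return (timestamp, pid, signal) for each crash of `exe` in `lines`."""
    crashes = []
    for line in lines:
        m = CRASH_RE.search(line)
        if m and m.group("exe") == exe:
            crashes.append((m.group("when"), int(m.group("pid")), int(m.group("sig"))))
    return crashes


def read_crashes(log_path):
    """Collect every crash record currently present in the log file."""
    with open(log_path, errors="replace") as f:
        return set(find_crashes(f))


def restart_middlewared():
    """Same command I run by hand; must be run as root."""
    subprocess.run(["service", "middlewared", "restart"], check=True)


def check_once(seen, log_path="/var/log/messages", restart=restart_middlewared):
    """Restart once if the log holds crashes not in `seen`; return the updated set."""
    crashes = read_crashes(log_path)
    new = crashes - seen
    if new:
        print(f"python3.8 crashed {len(new)} time(s); restarting middlewared")
        restart()
    return seen | crashes


def watch(log_path="/var/log/messages", interval=60):
    """Poll the log forever. Crashes already in the log at startup are ignored."""
    seen = read_crashes(log_path)
    while True:
        time.sleep(interval)
        seen = check_once(seen, log_path)
```

Calling watch() as root would poll once a minute. Polling the log is crude; a real watchdog would probably supervise the middlewared process directly, but this at least reports each crash and brings the GUI back.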
Thoughts on how I can get more debug info to the developers?
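One thing that might help, as a sketch only: CPython's standard faulthandler module can dump the Python-level traceback when the interpreter receives SIGSEGV, SIGFPE, SIGABRT, SIGBUS or SIGILL, which covers both signals in the log below. I don't know how middlewared is launched, so where this call would go (or whether setting PYTHONFAULTHANDLER=1 in its environment is easier) is an assumption, and the helper name and log path here are hypothetical.

```python
import faulthandler
import sys


def enable_crash_tracebacks(log_path=None):
    """Ask CPython to write Python tracebacks for all threads on a fatal signal.

    The file must stay open for as long as the handler is enabled,
    so the file object is returned to the caller.
    """
    out = open(log_path, "a") if log_path else sys.stderr
    faulthandler.enable(file=out, all_threads=True)
    return out
```

For example, enable_crash_tracebacks("/var/log/python-crash.log") early in the process would leave a traceback next to the core file, showing which Python code was running when the signal arrived.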
Some possibly relevant lines from /var/log/messages:
Dec 30 04:18:47 truenas 1 2020-12-30T04:18:47.118906-08:00 truenas collectd 3353 - - nut plugin: nut_connect: upscli_connect (localhost, 3493) failed: Connection failure: Connection refused
Dec 30 09:24:21 truenas kernel: pid 506 (python3.8), jid 0, uid 0: exited on signal 11 (core dumped)
Dec 30 09:24:21 truenas kernel: pid 17779 (python3.8), jid 0, uid 0: exited on signal 11 (core dumped)
Dec 30 09:24:24 truenas kernel: pid 17789 (python3.8), jid 0, uid 0: exited on signal 11 (core dumped)
Dec 30 12:14:38 truenas kernel: pid 18068 (python3.8), jid 0, uid 0: exited on signal 11 (core dumped)
Dec 30 15:41:27 truenas 1 2020-12-30T15:41:27.133678-08:00 truenas collectd 3353 - - nut plugin: nut_connect: upscli_connect (localhost, 3493) failed: Connection failure: Connection refused
Dec 30 16:57:12 truenas 1 2020-12-31T00:57:12.022530+00:00 truenas devd 429 - - notify_clients: send() failed; dropping unresponsive client
Dec 30 16:57:12 truenas kernel: pid 446 (python3.8), jid 0, uid 0: exited on signal 4 (core dumped)
Dec 30 17:01:27 truenas 1 2020-12-30T17:01:27.199768-08:00 truenas collectd 3353 - - Traceback (most recent call last):
  File "/usr/local/lib/collectd_pyplugins/disktemp.py", line 62, in read
    with Client() as c:
  File "/usr/local/lib/python3.8/site-packages/middlewared/client/client.py", line 281, in __init__
    self._ws.connect()
  File "/usr/local/lib/python3.8/site-packages/middlewared/client/client.py", line 124, in connect
    rv = super(WSClient, self).connect()
  File "/usr/local/lib/python3.8/site-packages/ws4py/client/__init__.py", line 216, in connect
    self.sock.connect(self.bind_addr)
ConnectionRefusedError: [Errno 61] Connection refused
Dec 30 17:06:27 truenas 1 2020-12-30T17:06:27.199002-08:00 truenas collectd 3353 - - Traceback (most recent call last):
  File "/usr/local/lib/collectd_pyplugins/disktemp.py", line 62, in read
    with Client() as c:
  File "/usr/local/lib/python3.8/site-packages/middlewared/client/client.py", line 281, in __init__
    self._ws.connect()
  File "/usr/local/lib/python3.8/site-packages/middlewared/client/client.py", line 124, in connect
    rv = super(WSClient, self).connect()
  File "/usr/local/lib/python3.8/site-packages/ws4py/client/__init__.py", line 216, in connect
    self.sock.connect(self.bind_addr)
ConnectionRefusedError: [Errno 61] Connection refused
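For anyone reading the signal numbers in the excerpt above, this small Python snippet decodes them with the standard signal module. The list of numbers is copied by hand from the five kernel lines in the excerpt; nothing else is assumed.

```python
from collections import Counter
import signal

# Signal numbers from the kernel "exited on signal N" lines in the excerpt.
observed = [11, 11, 11, 11, 4]


def signal_name(number):
    """Translate a numeric signal into its symbolic name."""
    return signal.Signals(number).name


tally = Counter(signal_name(n) for n in observed)
print(dict(tally))  # {'SIGSEGV': 4, 'SIGILL': 1}
```

So four of the crashes are segmentation faults and the last one is an illegal-instruction trap.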
The hardware that TrueNAS is running on:
CPU: AMD Ryzen 5 3600 6-Core Processor
Memory: 64 GB
Motherboard: ASUS TUF GAMING X570-PLUS (WI-FI)
Graphics: NVidia GeForce 710
2 Intel PRO/1000 PCI-Express Network interface compatible cards
nvd0: WDS100T3X0C-00SJG0 WD Black SN750 NVMe SSD
nvd1: WDS100T3X0C-00SJG0 WD Black SN750 NVMe SSD
ada0 boot device: Samsung SSD 860 EVO 500GB RVT04B6Q, Serial Number S598NJ0NA16137K
ada1: HGST HUS728T8TALE6L4 V8GAW4J0
ada2: HGST HUS728T8TALE6L4 V8GAW4J0
ada3: HGST HUS728T8TALE6L4 V8GAW4J0