Sorry, should have been clearer. IPMI itself was fine; I have getty running on the redirected serial console, and I was not able to get to a login prompt when I saw the crash.
Hardware:
* Supermicro X8-series motherboard
* Intel(R) Xeon(R) CPU E31220 @ 3.10GHz
* 16GB memory
* Samsung SSD 840 EVO 120GB as boot drive on motherboard SATA
* SAS2008-based SATA controller
Last entries from /var/log/messages are below. The same messages had been repeating since 14:50 the previous day, with nothing else interesting in the log. If I filter out the service:nas-health warnings (capacity for a volume is at 86%, should be under 80%) and the Samba4Alert warning, the timeline is:
15:18 the previous day, I start getting 'Alert module '<update_check.UpdateCheckAlert object at 0x8161d3d30>' failed: timed out' and '[WARN] consul: error getting server health from "nas": context deadline exceeded'.
18:31 syslog-ng core dumps due to a SIGABRT
then the Samba4Alert, UpdateCheckAlert, and 'error getting server health' continue until the last message.
syslog rotated successfully at 00:00
tail end of syslog:
Code:
Dec 21 01:01:45 nas /alert.py: [system.alert:400] Alert module '<update_check.UpdateCheckAlert object at 0x8161d3d30>' failed: timed out
Dec 21 01:02:46 nas daemon[3264]: 2017/12/21 01:02:46 [WARN] Timed out (30s) running check '/usr/local/etc/consul-checks/freenas_health.sh'
Dec 21 01:03:31 nas daemon[3264]: 2017/12/21 01:03:30 [WARN] consul: error getting server health from "nas": context deadline exceeded
Dec 21 01:05:16 nas daemon[3264]: 2017/12/21 01:05:16 [WARN] Timed out (30s) running check '/usr/local/etc/consul-checks/freenas_health.sh'
Dec 21 01:06:18 nas /alert.py: [system.alert:400] Alert module '<samba4.Samba4Alert object at 0x8161c5d30>' failed: timed out
Dec 21 01:07:17 nas /alert.py: [system.alert:400] Alert module '<update_check.UpdateCheckAlert object at 0x8161d3d30>' failed: timed out
Dec 21 01:07:47 nas daemon[3264]: 2017/12/21 01:07:47 [WARN] Timed out (30s) running check '/usr/local/etc/consul-checks/freenas_health.sh'
Dec 21 01:10:17 nas daemon[3264]: 2017/12/21 01:10:17 [WARN] Timed out (30s) running check '/usr/local/etc/consul-checks/freenas_health.sh'
Dec 21 01:11:49 nas /alert.py: [system.alert:400] Alert module '<samba4.Samba4Alert object at 0x8161c5d30>' failed: timed out
Dec 21 01:12:26 nas daemon[3264]: 2017/12/21 01:12:26 [WARN] consul: error getting server health from "nas": context deadline exceeded
Dec 21 01:12:47 nas daemon[3264]: 2017/12/21 01:12:47 [WARN] Timed out (30s) running check '/usr/local/etc/consul-checks/freenas_health.sh'
Dec 21 01:13:01 nas /alert.py: [system.alert:400] Alert module '<update_check.UpdateCheckAlert object at 0x8161d3d30>' failed: timed out
Dec 21 01:15:18 nas daemon[3264]: 2017/12/21 01:15:18 [WARN] Timed out (30s) running check '/usr/local/etc/consul-checks/freenas_health.sh'
Dec 21 01:17:11 nas /alert.py: [system.alert:400] Alert module '<samba4.Samba4Alert object at 0x8161c5d30>' failed: timed out
Dec 21 01:18:14 nas /alert.py: [system.alert:400] Alert module '<update_check.UpdateCheckAlert object at 0x8161d3d30>' failed: timed out
Dec 21 01:19:12 nas daemon[3264]: 2017/12/21 01:19:12 [WARN] Timed out (30s) running check '/usr/local/etc/consul-checks/freenas_health.sh'
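
In case it helps anyone reproduce the filtering I described, here is a rough Python sketch of how the log could be reduced to a tally of the recurring messages. The labels, the regexes, and the `summarize` helper are my own assumptions based on the lines pasted above, not anything FreeNAS ships; the `service:nas-health` substring is taken from the warning text and may need adjusting.

```python
import re
from collections import Counter

# Substrings for lines to drop entirely (assumed from the warning text).
NOISE = ("service:nas-health",)

# Hypothetical labels mapped to patterns for the recurring messages.
PATTERNS = {
    "UpdateCheckAlert timed out": re.compile(
        r"UpdateCheckAlert object .* failed: timed out"),
    "Samba4Alert timed out": re.compile(
        r"Samba4Alert object .* failed: timed out"),
    "consul check timed out": re.compile(
        r"Timed out \(30s\) running check"),
    "consul server health error": re.compile(
        r"error getting server health"),
}


def summarize(lines):
    """Return (counts per known message, list of unrecognized lines)."""
    counts = Counter()
    other = []
    for line in lines:
        # Skip the known capacity warnings.
        if any(noise in line for noise in NOISE):
            continue
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[label] += 1
                break
        else:
            # Anything not matched is kept so it can be read by hand.
            other.append(line.rstrip("\n"))
    return counts, other
```

To use it, open the log (for example `open("/var/log/messages", errors="replace")`), pass the file object to `summarize`, and print the counts and whatever ends up in the second list. Anything in that list is a line that is neither filtered noise nor one of the known repeating messages, so those are the ones worth reading closely.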