Understand the reasons for an unscheduled system reboot.

DarkCorner · Dec 17, 2022

A TrueNAS SCALE system was in use on Thursday.
On Friday morning it was found completely shut down. To turn it back on, the power cord had to be unplugged and plugged in again.
I don't know whether that's true, but this is what I was told.

So, there is an alert about an unscheduled system reboot.

I have checked various logs, but the messages I find are either from the reboot itself (at the time of the alert) or several days older. I can't find anything that points to something happening during the night.
Or maybe I didn't check the right log.

How can I determine whether a hardware-level condition that would shut down or lock up the system occurred?
As a rule, I don't see any problems in the dashboard, for example with temperatures.

Side question: is there any menu item to check the logs without having to access /var/log from the shell?

morganL · Dec 17, 2022

Forum rules: all hardware and software should be described.

artlessknave · Dec 17, 2022

DarkCorner said:
menu item to check the logs

there is an option, in either general or advanced iirc, to enable a footer of /var/log/messages output. it probably won't help though, and, as noted, no hardware info makes it very difficult to offer many suggestions.
the first thing I would think of is that the power went out and you do not have any bios setting to turn it back on. usually, if the OS crashes, it reboots rather than staying off.

there was a story I read once, on a website with IT stories whose name I cannot remember, where a server kept going down, and nobody could figure out why, until the tech camped out for a stakeout, and observed someone unplugging the server to plug in a coffee machine (or something) and then they would plug the server back in when done with the machine.

the way you report it reminded me of that.

DarkCorner said:
but this is what I was told.

DarkCorner · Dec 18, 2022

Let's start with the story told by @artlessknave
Maybe that guy was related to mine.
The NAS is not in a secure area and although I have placed it where no one should touch it, more than once I have found it moved.
The BIOS is configured to power back on when power returns, but in this case even pressing the power button did not turn it on.
At least that's what they told me on the phone.
When I replied (very angry) that I would go to them to see what was going on, they called me back saying they had unplugged it and plugged it back in.
My suspicion is that they messed with the electrical wires and power strip first, but, of course, "nobody touched anything".

The hardware (sorry for not listing it earlier):

Motherboard: ASRock FM2A85X-ITX
CPU: AMD A10-6700 (Quadcore)
RAM: 8+8GB
Disk: 1x120GB SSD (for Boot), 1x120GB SSD (for Apps), 4x1TB HDDs for storage in RAID-Z2 (Used space 8%), 1x750GB HDD for internal backup.

This NAS is for an office of 3 users, of which only one works permanently in the office and the other two often work outside and only come back to enter their contracts.

/var/log/messages is the first log I checked, and the last messages before the reboot were from 6 days earlier.

...
Dec  3 00:00:33 truenas syslog-ng[4030]: Configuration reload request received, reloading configuration;
Dec  3 00:00:33 truenas syslog-ng[4030]: Configuration reload finished;
Dec 10 00:00:28 truenas syslog-ng[4030]: Configuration reload request received, reloading configuration;
Dec 10 00:00:28 truenas syslog-ng[4030]: Configuration reload finished;
Dec 16 07:02:13 truenas syslog-ng[4300]: syslog-ng starting up; version='3.28.1'
Dec 16 07:01:22 truenas kernel: Linux version 5.10.142+truenas (root@tnsbuilds01.tn.ixsystems.net) (gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP Mon Sep 26 18:20:46 UTC 2022
Dec 16 07:01:22 truenas kernel: Command line: BOOT_IMAGE=/ROOT/22.02.4@/boot/vmlinuz-5.10.142+truenas root=ZFS=boot-pool/ROOT/22.02.4 ro console=ttyS0,9600 console=tty1 libata.allow_tpm=1 systemd.unified_cgroup_hierarchy=0 amd_iommu=on iommu=pt kvm_amd.npt=1 kvm_amd.avic=1 intel_iommu=on zfsforce=1
...

This is the debug log:

...
Nov 30 13:54:11 truenas ntpd[4059]: new interface(s) found: waking up resolver
Dec 16 07:01:22 truenas kernel: e820: update [mem 0x00000000-0x00000fff] usable ==> reserved
...

For the other logs, either there is only post-reboot data, or in any case I don't see any entries from the 15th. What should I look for?
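One thing that can be checked mechanically is where the log goes silent before the boot banner. The following is only a rough sketch, assuming classic syslog lines that start with a stamp like "Dec 16 07:01:22", an English locale for month names, and a single calendar year (the stamps carry no year, so it is passed in by hand). The names parse_ts and find_gaps are made up for this illustration.

```python
from datetime import datetime, timedelta


def parse_ts(line, year):
    """Parse the leading 'Mon DD HH:MM:SS' stamp of a classic syslog line.

    Returns None for lines that do not start with such a stamp.
    """
    try:
        return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
    except ValueError:
        return None


def find_gaps(lines, year, min_gap):
    """Return (last_before, first_after) pairs where logging paused >= min_gap."""
    gaps = []
    prev = None
    for line in lines:
        ts = parse_ts(line, year)
        if ts is None:
            continue  # skip "..." markers and continuation lines
        if prev is not None and ts - prev >= min_gap:
            gaps.append((prev, ts))
        # Boot-time lines can be slightly out of order; keep the latest stamp.
        if prev is None or ts > prev:
            prev = ts
    return gaps


if __name__ == "__main__":
    # Path as quoted in this thread; the year is an assumption you must supply.
    with open("/var/log/messages", errors="replace") as f:
        for before, after in find_gaps(f, year=2022, min_gap=timedelta(hours=6)):
            print(f"silent from {before} to {after} ({after - before})")
```

On a log as quiet as the one above this mostly confirms that nothing was written between the weekly reload and the boot, which by itself cannot distinguish a power loss from a hard lock-up. It is more useful on busier logs, where the last stamp before the gap narrows down when the machine actually stopped.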

artlessknave said:
there is an option, in either general or advanced iirc, to enable a footer of /var/log/messages output

I only see an option to display console messages in the footer.

artlessknave · Dec 18, 2022

it sounds like the first thing you need to sort out is your communication with your users. relying on smarthands of any kind can be really frustrating, and if you can't be sure they are reporting steps taken accurately, the frustrations will multiply.

make sure not to be accusatory, however, as you need to get them to accurately report anything they do. if they do unplug it to plug in a coffee machine, get them to the point of at least giving you an email or something so that you know the reason the hardware went offline, instead of driving yourself nuts.

in this case, it sounds likely to be a one time event. i would suggest waiting and seeing if it does any more mystery reboots or shutdowns. if no, then no more investigation is required. sometimes weird things do just happen, particularly since you appear to be running this on consumer grade hardware to begin with, which is very much not recommended for truenas.

DarkCorner · Dec 18, 2022

@artlessknave
It's true, TrueNAS is currently running on consumer-grade hardware.
For my current needs it's the best solution for the moment, until I can move to a professional platform.

I agree with you on user relations, and that for now it's best to wait and see what happens.

But my initial question remains: in which log file are signals such as an excessive temperature or the state of the disks recorded?
I would like to enable an alert, for example via email or Telegram.

artlessknave · Dec 18, 2022

to my knowledge, no, you would have to log temps to file with a cron or something.
sometimes, things just crash. one of the reasons for recommending server grade is because server grade tends to have less random weird things occur, due to being quality, tested, and burned in hardware. having to deal with more annoyances is one of the costs of cheaping out on the hardware.
you have to calculate out how much time you are spending troubleshooting un-ideal gear vs the cost of known good hardware.
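as a rough idea of what "log temps to file with a cron" could look like, here is a minimal sketch that assumes the sensors are exposed through the standard Linux /sys/class/hwmon tree (values in millidegrees Celsius). the log path, the 70 C threshold, and the function names are hypothetical, and disk temperatures usually do not show up there, so those would need a separate tool.

```python
import glob
import os
import time

HWMON_ROOT = "/sys/class/hwmon"   # standard Linux sensor tree
LOG_PATH = "/var/log/temps.log"   # hypothetical location, pick your own
WARN_C = 70.0                     # hypothetical warning threshold


def read_temps(root=HWMON_ROOT):
    """Return {'chip/tempN': degrees_C} for every readable temp*_input file."""
    temps = {}
    for path in sorted(glob.glob(os.path.join(root, "hwmon*", "temp*_input"))):
        chip_dir = os.path.dirname(path)
        try:
            with open(os.path.join(chip_dir, "name")) as f:
                chip = f.read().strip()
        except OSError:
            chip = os.path.basename(chip_dir)  # fall back to 'hwmon0' etc.
        try:
            with open(path) as f:
                milli = int(f.read().strip())
        except (OSError, ValueError):
            continue  # sensor not readable right now
        sensor = os.path.basename(path)[: -len("_input")]
        temps[f"{chip}/{sensor}"] = milli / 1000.0
    return temps


def format_line(temps, warn_c=WARN_C, now=None):
    """Build one log line; append ' WARN' if any reading reaches warn_c."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    fields = " ".join(f"{k}={v:.1f}C" for k, v in sorted(temps.items()))
    flag = " WARN" if any(v >= warn_c for v in temps.values()) else ""
    return f"{stamp} {fields}{flag}"


if __name__ == "__main__":
    with open(LOG_PATH, "a") as log:
        log.write(format_line(read_temps()) + "\n")
```

run every few minutes from a cron job, this leaves a trail that survives a sudden power-off, so the last line before a gap shows whether temperatures were climbing. the WARN flag is just a marker in the file; hooking it to email or Telegram would be a separate step.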

those of us who try to help really recommend following the hardware recommendations because we are donating our free time, and the consumer grade gear tends to waste that time.
