Collecting logs to diagnose random reboots

Status
Not open for further replies.

anRossi

Dabbler
Joined
Feb 1, 2014
Messages
36
Hello,

I just built a new FreeNAS box and plugged in the drives from my old FreeNAS system, which had been offline for months. I'm also using the same USB stick without reformatting it.

The box will reset and reboot on its own after booting. Clicking on some things in the WebGUI triggers it; other things in the WebGUI don't. If I let it sit long enough, however, it looks like it will trigger on its own.

I'm looking to collect logs to find out which thing is triggering this and maybe investigate what might be the issue. I'm comfortable using SSH to collect the logs if somebody would give me some commands to try.


The next thing, of course, is to try flashing the latest stable image to the flash drive. Currently installed version is:
FreeBSD 9.2-RELEASE #0 r+2315ea3: Fri Dec 20 12:48:50 PST 2013 root@build.ixsystems.com:/tank/home/jkh/checkout/freenas/os-base/amd64/tank/home/jkh/checkout/freenas/FreeBSD/src/sys/FREENAS.amd64
according to uname -v


I ran memtest on the board for ~36 hours, so I don't think it's a memory issue (it's ECC memory). The IPMI interface is showing an "undetermined system hardware failure" event whenever this reboot happens.

Could it be the watchdog timer in the BIOS triggering it? I just now disabled it and it hasn't mysteriously rebooted. I want to look at logs regardless to find out.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Could very well be the watchdog timer. That is supposed to reboot the system in certain circumstances, which can be from FreeNAS. ;)
 

anRossi

Dabbler
Joined
Feb 1, 2014
Messages
36
I thought the watchdog timer was a piece of hardware the OS *could* use. Why would FreeNAS be restarting itself during normal operations? Isn't that kind of a bug?
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
No.

The watchdog timer is a piece of hardware that is nothing more than a timer. Let's say it's 5 minutes. So the watchdog starts at 300 seconds and counts down to 0. If it gets to zero it reboots the box. That sums up the hardware portion.

The OS, if it supports the particular watchdog timer your hardware uses and is properly set up, should reset the timer at regular intervals. Every time it resets, the timer returns to 300 seconds. That's the extent of the software side of the watchdog timer.

Now ideally, when the two are put together, the hardware has this timer that is constantly being reset. In the event that the OS crashes, the timer will no longer be updated. After the 5 minutes is up, poof, the box reboots automatically. This ensures that an obscure problem that may occur from time to time doesn't result in a box that is down for too long. If you're in an enterprise and set up a good monitoring system you should be getting emails, SMTP messages, or *something* that says "hey, box XYZ just rebooted on watchdog timer". If you rack up enough of these then you can safely assume there is a hardware or software problem and examine the server more closely.

The problem: if the software and hardware aren't compatible or aren't set up properly then the box will reboot every time the timer hits zero, which in our example is every 5 minutes after the box POSTs.
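
To make the countdown concrete, here's a toy Python sketch of the behaviour described above. It is not FreeNAS code and doesn't touch any real hardware; `WatchdogTimer`, `pat`, `tick` and `simulate` are made-up names, and the 300-second timeout is just the example figure from this post.

```python
from typing import Optional


class WatchdogTimer:
    """Toy model of a hardware watchdog: a countdown that reboots at zero."""

    def __init__(self, timeout_s: int = 300) -> None:
        self.timeout_s = timeout_s      # hypothetical 5-minute timeout
        self.remaining_s = timeout_s
        self.reboots = 0

    def pat(self) -> None:
        """What a supported, configured OS does: reset the countdown."""
        self.remaining_s = self.timeout_s

    def tick(self) -> None:
        """Advance one second; at zero, count a reboot and start over."""
        self.remaining_s -= 1
        if self.remaining_s <= 0:
            self.reboots += 1
            self.remaining_s = self.timeout_s


def simulate(total_s: int, pat_every_s: Optional[int]) -> int:
    """Run for total_s seconds and return the number of reboots.

    pat_every_s=None models an OS that never resets the timer
    (no driver, or the driver isn't set up).
    """
    wd = WatchdogTimer()
    for t in range(1, total_s + 1):
        if pat_every_s is not None and t % pat_every_s == 0:
            wd.pat()
        wd.tick()
    return wd.reboots


print(simulate(3600, None))  # OS never resets the timer: 12 reboots in an hour
print(simulate(3600, 60))    # OS resets it every minute: 0 reboots
```

In this model an hour with nothing resetting the timer gives one reboot every 5 minutes, which is roughly the symptom in the first post.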

Make more sense now? You're the second or third person to have this problem. Quite literally, the problem at the basic level is that you enabled the watchdog timer in your BIOS when it's not supported, not enabled, or not functioning properly in FreeNAS. In this case FreeNAS doesn't support watchdog timers for many motherboards, yet you enabled the feature when you shouldn't have. Unfortunately there are tons and tons of different watchdog timers out there, so expecting FreeNAS to support them all is virtually impossible.

For the record, I am aware of zero motherboards that enable the watchdog timer by default. The reason? Most OSes can't even complete an install in the watchdog's timeframe, so it's something you leave disabled until you install the OS and set up the applicable driver to manage the watchdog timer, and only then would you enable it. Kind of useless to sell a motherboard that reboots every 5 minutes or so when you're just trying to install an OS. Plus many OSes only have support for a small number of watchdog timers, so for a large number of OSes you can't utilize a watchdog timer even if you wanted to.

So chalk this one up to "user error" and "lesson learned". Don't enable features you don't fully understand. ;)
 

anRossi

Dabbler
Joined
Feb 1, 2014
Messages
36
Heh, thanks for the patient and thorough explanation :)
Yeah, I thought watchdog timers worked differently and were universally supported (i.e. it doesn't start counting down until the OS tells it to start). But yeah, it's an enterprise-level feature I have no need for, so that's never going on.

At least I learned something today.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Many watchdog timers will initially do a long countdown on the first bootup. In my example it might be 10 minutes on first bootup to give time for the OS to fully load. But the basic premise is still the same... if the OS can't or won't reset the timer you'll have a regularly rebooting box... and that sucks (as you saw first-hand).
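
As a rough illustration of that, here's a small Python sketch listing when an unserviced watchdog would fire. The function name `reboot_times` is made up, the 600 and 300 second figures are just the example numbers from this thread, and it assumes the long countdown applies only to the first countdown after power-on. Real hardware may behave differently.

```python
def reboot_times(total_s, first_timeout_s=600, timeout_s=300):
    """Seconds since power-on at which a never-reset watchdog reboots the box.

    Assumption: only the first countdown uses the longer first_timeout_s;
    every countdown after that uses timeout_s.
    """
    times = []
    t = first_timeout_s
    while t <= total_s:
        times.append(t)
        t += timeout_s
    return times


print(reboot_times(1800))  # [600, 900, 1200, 1500, 1800]
```

So even with the longer first countdown, a box whose OS never resets the timer still ends up rebooting at a regular interval.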
 