Random reboots, need help with logs

Matt Funk · Oct 6, 2015

I have a NAS system that periodically loses its network connection. When I go to the console, the screen output suggests that a reboot happened and that the system froze while the OS was loading. Normally we can press Ctrl+Alt+Del and restart the machine that way. Yesterday I had to do a hard shutdown followed by a soft reboot because it kept getting stuck while the OS was loading. I tried to look at the logs today to see what is causing the issue, but any pertinent information is no longer in the active log or messages files and is most likely in the .bz2 files stored alongside them. Can anyone help me open these files so I can check for log entries that would show why these reboots are happening?

anodos · Oct 6, 2015

You can use 7-Zip to unzip them. Post your full hardware specs, FreeNAS version, etc.

Matt Funk · Oct 6, 2015

Intel S5000SVA
Intel Xeon E5405 x2
8GB of RAM
6x2TB WD
FreeNAS 9.3 Stable

Portion of the log attached; it should cover the time around the reboot.

Darren Myers · Oct 6, 2015

How are the drives connected? Directly to the motherboard, or through an LSI 9211-8i HBA or an M1015? Which RAIDZ level are the 6x2TB HDDs in?

Matt Funk · Oct 6, 2015

They are directly connected to the motherboard. They are in a RAIDZ6

dlavigne · Oct 8, 2015

Matt Funk said:
They are directly connected to the motherboard. They are in a RAIDZ6

There's no such thing as RAIDZ6... You can use bzcat to read the zipped log files (no need to unzip first).
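For anyone who would rather script the search than page through bzcat output, here is a minimal Python sketch that reads the compressed logs in place. The function name `search_bz2_logs`, the `*.bz2` glob, and the search pattern are illustrative assumptions, not anything FreeNAS ships.

```python
import bz2
import re
from pathlib import Path


def search_bz2_logs(log_dir, pattern, glob="*.bz2"):
    """Yield (file name, line number, line) for every matching line.

    Hypothetical helper: the directory and pattern are supplied by the caller.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    for path in sorted(Path(log_dir).glob(glob)):
        # bz2.open in text mode decompresses on the fly, so nothing is
        # extracted to disk. Undecodable bytes are replaced, not fatal.
        with bz2.open(path, "rt", encoding="utf-8", errors="replace") as fh:
            for lineno, line in enumerate(fh, start=1):
                if regex.search(line):
                    yield path.name, lineno, line.rstrip("\n")
```

Something like `for hit in search_bz2_logs("/var/log", r"panic|timeout"): print(hit)` would list matches across all rotated files; the `/var/log` location is an assumption about where the rotated logs live.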

Matt Funk · Oct 19, 2015

RAIDZ2; I was thinking of RAID6 when I typed that. I am still having the reboot issue. It happens fairly regularly, and at some point the system gets stuck and is no longer available. Is there anything of use in the logs?

Matt Funk · Oct 19, 2015

Actually, it appears that the reboots are far more frequent than I originally thought.

dlavigne · Oct 20, 2015

Matt Funk said:
Is there anything of use in the logs?

No idea, you haven't posted them yet...

Matt Funk · Oct 20, 2015

They are attached above.

Darren Myers · Oct 20, 2015

Matt Funk said:
They are attached above.

You attached a .txt file. After you log in to FreeNAS, go to System, then Advanced, and at the bottom of the page click "Save Debug". Upload the resulting file in the format it's in.

Matt Funk · Oct 20, 2015

Here is the file that was created.

cyberjock · Oct 20, 2015

First, I'd update. You're many many builds behind. The issue may be fixed.

Second, the box isn't crashing (there's no crash data). It could be a build issue with that build (we did have some builds in Feb/March that didn't create crash data on a crash).

Third, if there is a BIOS (or IPMI) update I would recommend you upgrade.

Fourth, I saw these entries:

warning: KLD '/boot/kernel-debug/profile.ko' is newer than the linker.hints file
warning: KLD '/boot/kernel-debug/systrace_freebsd32.ko' is newer than the linker.hints file
warning: KLD '/boot/kernel-debug/systrace.ko' is newer than the linker.hints file
warning: KLD '/boot/kernel-debug/sdt.ko' is newer than the linker.hints file
warning: KLD '/boot/kernel-debug/lockstat.ko' is newer than the linker.hints file
link_elf_obj: symbol _mtx_assert undefined
linker_load_file: Unsupported file type
KLD dtraceall.ko: depends on fasttrap - not available or version mismatch
linker_load_file: Unsupported file type
ahcich5: Timeout on slot 1 port 0
ahcich5: is 00000000 cs 00000002 ss 00000000 rs 00000002 tfd 58 serr 00000000 cmd 0000c117
ahcich4: Timeout on slot 15 port 0
ahcich4: is 00000000 cs 00008000 ss 00000000 rs 00008000 tfd 58 serr 00000000 cmd 0000cf17
ahcich3: Timeout on slot 21 port 0
ahcich3: is 00000000 cs 00200000 ss 00000000 rs 00200000 tfd 58 serr 00000000 cmd 0000d517
ahcich2: Timeout on slot 31 port 0
ahcich2: is 00000000 cs 80000000 ss 00000000 rs 80000000 tfd 58 serr 00000000 cmd 0000df17
ahcich1: Timeout on slot 24 port 0
ahcich1: is 00000000 cs 01000000 ss 00000000 rs 01000000 tfd 58 serr 00000000 cmd 0000d817
ahcich0: Timeout on slot 24 port 0
ahcich0: is 00000000 cs 01000000 ss 00000000 rs 01000000 tfd 58 serr 00000000 cmd 0000d817
warning: KLD '/boot/kernel-debug/profile.ko' is newer than the linker.hints file
warning: KLD '/boot/kernel-debug/systrace_freebsd32.ko' is newer than the linker.hints file
warning: KLD '/boot/kernel-debug/systrace.ko' is newer than the linker.hints file
warning: KLD '/boot/kernel-debug/sdt.ko' is newer than the linker.hints file
warning: KLD '/boot/kernel-debug/lockstat.ko' is newer than the linker.hints file
link_elf_obj: symbol _mtx_assert undefined
linker_load_file: Unsupported file type
KLD dtraceall.ko: depends on fasttrap - not available or version mismatch
linker_load_file: Unsupported file type
ahcich5: Timeout on slot 31 port 0
ahcich5: is 00000000 cs 80000000 ss 00000000 rs 80000000 tfd 58 serr 00000000 cmd 0000df17
ahcich4: Timeout on slot 12 port 0
ahcich4: is 00000000 cs 00001000 ss 00000000 rs 00001000 tfd 58 serr 00000000 cmd 0000cc17
ahcich3: Timeout on slot 5 port 0
ahcich3: is 00000000 cs 00000020 ss 00000000 rs 00000020 tfd 58 serr 00000000 cmd 0000c517
ahcich2: Timeout on slot 2 port 0
ahcich2: is 00000000 cs 00000004 ss 00000000 rs 00000004 tfd 58 serr 00000000 cmd 0000c217
ahcich1: Timeout on slot 14 port 0
ahcich1: is 00000000 cs 00004000 ss 00000000 rs 00004000 tfd 58 serr 00000000 cmd 0000ce17
ahcich0: Timeout on slot 25 port 0
ahcich0: is 00000000 cs 02000000 ss 00000000 rs 02000000 tfd 58 serr 00000000 cmd 0000d917

The warnings pretty much imply that either you've modified the OS or the OS has some kind of corruption. Likewise, if you've enabled the debug kernel, you've probably made your system less reliable as a result.

Other than those 4 things I don't have anything else to go on. Nothing "looks" wrong.
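To make the ahcich entries easier to eyeball across a long log, the timeouts can be tallied per channel. This is a rough Python sketch, assuming the lines look like the ones quoted above; the helper name `count_ahci_timeouts` is made up for illustration.

```python
import re
from collections import Counter

# Matches lines such as "ahcich5: Timeout on slot 1 port 0".
# search() is used so an optional syslog prefix before "ahcich" is tolerated.
TIMEOUT_RE = re.compile(r"(ahcich\d+): Timeout on slot (\d+) port (\d+)")


def count_ahci_timeouts(lines):
    """Return a Counter mapping each AHCI channel name to its timeout count."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

In the excerpt above, each of ahcich0 through ahcich5 shows up once per burst, so a tally like this would report the same count for all six channels.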

Matt Funk · Oct 20, 2015

I have updated the OS and will monitor it and submit new logs if the issue persists. I have not made any changes to the OS, so the warnings would potentially be from corruption. What is my best bet for preserving my current data if I need to do a reinstallation on the OS to remedy the corruption? Or is there another option to remedy it?

razvanc.mobile · Oct 20, 2015

Can you check in the BIOS whether the watchdog is enabled? In the first logs I saw a period of about 6 minutes with nothing written, which seems consistent with a kernel panic followed by the watchdog rebooting the server after 5 minutes of "inactivity".
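Silent periods like that can be spotted mechanically. Below is a small Python sketch, assuming each log line starts with a classic syslog timestamp such as "Oct  6 14:23:01" (which carries no year, so the caller supplies one) and an English locale for month names. `find_gaps` and its 5-minute default are illustrative choices only.

```python
from datetime import datetime, timedelta


def find_gaps(lines, year, min_gap=timedelta(minutes=5)):
    """Return (last_seen, next_seen) pairs where the log was silent >= min_gap."""
    gaps = []
    prev = None
    for line in lines:
        stamp = line[:15]  # assumed layout: "Mon DD HH:MM:SS"
        try:
            ts = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # skip lines that do not begin with a timestamp
        if prev is not None and ts - prev >= min_gap:
            gaps.append((prev, ts))
        prev = ts
    return gaps
```

A gap only shows that nothing was logged; it does not say why, so it is a starting point for where to look rather than a diagnosis.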

Robert Trevellyan · Oct 20, 2015

Matt Funk said:
What is my best bet for preserving my current data if I need to do a reinstallation on the OS to remedy the corruption?

The data in your storage pool is unaffected by a reinstall. If you want to preserve settings (e.g. shares, scheduled SMART tests and scrubs), just back up your configuration and upload it after the reinstall.

Matt Funk · Oct 22, 2015

I had an unscheduled restart again early this morning, logs attached. I have not yet done a reinstall.

dlavigne · Oct 28, 2015

Again, nothing in the logs. I'd suspect a hardware issue, possibly heat.

cyberjock · Oct 29, 2015

razvanc.mobile said:
Can you check in the BIOS whether the watchdog is enabled? In the first logs I saw a period of about 6 minutes with nothing written, which seems consistent with a kernel panic followed by the watchdog rebooting the server after 5 minutes of "inactivity".

That's not how the watchdog works. The watchdogd service resets the watchdog timer at regular intervals even if the system is 100% idle for days.
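As a toy illustration of that point, here is a simplified Python model of a watchdog timer that is reset ("patted") on a fixed schedule. It is not the FreeBSD watchdogd implementation; the class, the function, and all the numbers are assumptions chosen only to show the principle.

```python
class WatchdogTimer:
    """Toy timer: fires if it has not been patted for `timeout` ticks."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_pat = 0

    def pat(self, now):
        self.last_pat = now

    def expired(self, now):
        return now - self.last_pat >= self.timeout


def first_expiry(timeout, pat_interval, daemon_stops_at, horizon):
    """Return the first tick at which the timer fires, or None if it never does.

    daemon_stops_at=None models a daemon that keeps running for the whole
    horizon, regardless of how idle the rest of the system is.
    """
    timer = WatchdogTimer(timeout)
    for now in range(horizon + 1):
        daemon_alive = daemon_stops_at is None or now < daemon_stops_at
        if daemon_alive and now % pat_interval == 0:
            timer.pat(now)
        if timer.expired(now):
            return now
    return None
```

In this model the timer never fires while the patting daemon keeps running, no matter how little else happens; it fires only once the pats stop for a full timeout period.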
