Random reboots, need help with logs

Status
Not open for further replies.

Matt Funk

Dabbler
Joined
Apr 21, 2015
Messages
21
I have a NAS system that periodically loses its network connection. When I go to the console, the output on the screen appears to show that a reboot happened and that the machine froze partway through loading the OS. Normally we can press Ctrl+Alt+Del to restart it. Yesterday I had to do a hard shutdown followed by a soft reboot because it kept getting stuck while the OS was loading. I tried to look at the logs today to see what is causing the issue, but any pertinent information is no longer in the active log or messages files and is more than likely in one of the .bz2 files in that directory. Can anyone help me open these files so I can see whether any log entries show why these reboots are happening?
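For anyone with the same question: rotated logs on FreeBSD-based systems are normally bzip2-compressed copies such as messages.0.bz2, and `bzcat /var/log/messages.0.bz2 | less` pages through one of them. The Python 3 sketch below searches plain and compressed copies in one pass. The directory, the `messages*` pattern and the keywords are assumptions, so adjust them to your system.

```python
import bz2
import glob
import os

# Assumed location and naming of rotated logs (newsyslog-style messages.N.bz2).
LOG_DIR = "/var/log"
PATTERN = "messages*"

# Hypothetical search terms; change them to whatever you are hunting for.
KEYWORDS = ("panic", "reboot", "watchdog", "timeout")


def open_log(path):
    """Open a plain or bzip2-compressed log file as text."""
    if path.endswith(".bz2"):
        return bz2.open(path, "rt", errors="replace")
    return open(path, "r", errors="replace")


def search_logs(log_dir=LOG_DIR, pattern=PATTERN, keywords=KEYWORDS):
    """Yield (path, line) for every line containing a keyword (case-insensitive).

    Files are visited in name order, which is not strictly chronological
    (messages.10.bz2 sorts before messages.2.bz2).
    """
    lowered = [k.lower() for k in keywords]
    for path in sorted(glob.glob(os.path.join(log_dir, pattern))):
        with open_log(path) as fh:
            for line in fh:
                text = line.lower()
                if any(k in text for k in lowered):
                    yield path, line.rstrip("\n")
```

Usage: `for path, line in search_logs(): print(os.path.basename(path), line)`. Reading /var/log may require root.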
 

Matt Funk

Dabbler
Joined
Apr 21, 2015
Messages
21
Intel S5000SVA
Intel Xeon E5405 x2
8GB of RAM
6x2TB WD
FreeNAS 9.3 Stable

A portion of the log is attached; it should capture the time around the reboot.
 

Attachments

  • NAS Debug.txt (76.7 KB)
Joined
Oct 2, 2014
Messages
925
How are the drives connected? Are they connected to the motherboard directly, or through an HBA such as an LSI 9211-8i or an M1015? Which RAIDZ level are the six 2TB HDDs in?
 

Matt Funk

Dabbler
Joined
Apr 21, 2015
Messages
21
RAIDZ2; I was thinking RAID6 when I typed that. I am still having the reboot issue: it happens fairly regularly, and at some point the system gets stuck and is no longer available. Is there anything useful in the logs?
 

Matt Funk

Dabbler
Joined
Apr 21, 2015
Messages
21
Actually, it appears that the reboots happen far more often than I originally thought.
 
Joined
Oct 2, 2014
Messages
925
Matt Funk said:
They are attached above.
You attached a .txt file. After you log in to FreeNAS, go to System > Advanced, click "Save Debug" at the bottom of the page, and upload the file in the format it's in.
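If you want to look inside a debug archive yourself before uploading it, a .tgz is just a gzip-compressed tar file. A minimal Python 3 sketch using the standard `tarfile` module; the thread doesn't describe the archive's internal layout, so list the members first rather than assuming file names.

```python
import tarfile


def list_debug_archive(path):
    """Return the member names in a gzip-compressed tar archive such as a debug .tgz."""
    with tarfile.open(path, "r:gz") as tar:
        return tar.getnames()


def read_member(path, name):
    """Return one member's text, or None if it is not a regular file.

    Raises KeyError if `name` is not in the archive.
    """
    with tarfile.open(path, "r:gz") as tar:
        fh = tar.extractfile(name)
        if fh is None:
            return None
        return fh.read().decode("utf-8", errors="replace")
```

Usage: `list_debug_archive("debug.tgz")`, where "debug.tgz" is a placeholder for the file that "Save Debug" produced, then `read_member(...)` on whichever entry looks relevant.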
 

Matt Funk

Dabbler
Joined
Apr 21, 2015
Messages
21
Here is the file that was created.
 

Attachments

  • debug-CHallNAS-20151020114841.tgz (605.9 KB)

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
First, I'd update. You're many, many builds behind, and the issue may already be fixed.

Second, the box isn't crashing, or at least there's no crash data. The missing crash data could be a problem with that particular build (we did have some builds in Feb/March that didn't create crash data on a crash).

Third, if there is a BIOS (or IPMI) update, I would recommend you upgrade.

Fourth, I saw these entries:

warning: KLD '/boot/kernel-debug/profile.ko' is newer than the linker.hints file
warning: KLD '/boot/kernel-debug/systrace_freebsd32.ko' is newer than the linker.hints file
warning: KLD '/boot/kernel-debug/systrace.ko' is newer than the linker.hints file
warning: KLD '/boot/kernel-debug/sdt.ko' is newer than the linker.hints file
warning: KLD '/boot/kernel-debug/lockstat.ko' is newer than the linker.hints file
link_elf_obj: symbol _mtx_assert undefined
linker_load_file: Unsupported file type
KLD dtraceall.ko: depends on fasttrap - not available or version mismatch
linker_load_file: Unsupported file type
ahcich5: Timeout on slot 1 port 0
ahcich5: is 00000000 cs 00000002 ss 00000000 rs 00000002 tfd 58 serr 00000000 cmd 0000c117
ahcich4: Timeout on slot 15 port 0
ahcich4: is 00000000 cs 00008000 ss 00000000 rs 00008000 tfd 58 serr 00000000 cmd 0000cf17
ahcich3: Timeout on slot 21 port 0
ahcich3: is 00000000 cs 00200000 ss 00000000 rs 00200000 tfd 58 serr 00000000 cmd 0000d517
ahcich2: Timeout on slot 31 port 0
ahcich2: is 00000000 cs 80000000 ss 00000000 rs 80000000 tfd 58 serr 00000000 cmd 0000df17
ahcich1: Timeout on slot 24 port 0
ahcich1: is 00000000 cs 01000000 ss 00000000 rs 01000000 tfd 58 serr 00000000 cmd 0000d817
ahcich0: Timeout on slot 24 port 0
ahcich0: is 00000000 cs 01000000 ss 00000000 rs 01000000 tfd 58 serr 00000000 cmd 0000d817
warning: KLD '/boot/kernel-debug/profile.ko' is newer than the linker.hints file
warning: KLD '/boot/kernel-debug/systrace_freebsd32.ko' is newer than the linker.hints file
warning: KLD '/boot/kernel-debug/systrace.ko' is newer than the linker.hints file
warning: KLD '/boot/kernel-debug/sdt.ko' is newer than the linker.hints file
warning: KLD '/boot/kernel-debug/lockstat.ko' is newer than the linker.hints file
link_elf_obj: symbol _mtx_assert undefined
linker_load_file: Unsupported file type
KLD dtraceall.ko: depends on fasttrap - not available or version mismatch
linker_load_file: Unsupported file type
ahcich5: Timeout on slot 31 port 0
ahcich5: is 00000000 cs 80000000 ss 00000000 rs 80000000 tfd 58 serr 00000000 cmd 0000df17
ahcich4: Timeout on slot 12 port 0
ahcich4: is 00000000 cs 00001000 ss 00000000 rs 00001000 tfd 58 serr 00000000 cmd 0000cc17
ahcich3: Timeout on slot 5 port 0
ahcich3: is 00000000 cs 00000020 ss 00000000 rs 00000020 tfd 58 serr 00000000 cmd 0000c517
ahcich2: Timeout on slot 2 port 0
ahcich2: is 00000000 cs 00000004 ss 00000000 rs 00000004 tfd 58 serr 00000000 cmd 0000c217
ahcich1: Timeout on slot 14 port 0
ahcich1: is 00000000 cs 00004000 ss 00000000 rs 00004000 tfd 58 serr 00000000 cmd 0000ce17
ahcich0: Timeout on slot 25 port 0
ahcich0: is 00000000 cs 02000000 ss 00000000 rs 02000000 tfd 58 serr 00000000 cmd 0000d917

The warnings pretty much imply that either you've modified the OS, or the OS has some kind of corruption. Likewise, if you've enabled the debug kernel, you've probably made your system less reliable as a result.
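For the linker.hints warnings, our understanding (an assumption, not something stated in this thread) is that the FreeBSD kernel prints them when a module file's modification time is later than that of the linker.hints file in the same directory. A Python 3 sketch that performs the same comparison offline, so you can see which modules are affected:

```python
import os


def modules_newer_than_hints(module_dir):
    """List .ko files whose mtime is later than linker.hints in the same directory.

    Assumed to mirror the mtime comparison behind the kernel's
    "is newer than the linker.hints file" warning.
    """
    hints_mtime = os.stat(os.path.join(module_dir, "linker.hints")).st_mtime
    newer = []
    for name in sorted(os.listdir(module_dir)):
        if name.endswith(".ko"):
            if os.stat(os.path.join(module_dir, name)).st_mtime > hints_mtime:
                newer.append(name)
    return newer
```

Usage: `modules_newer_than_hints("/boot/kernel-debug")`. A newer mtime by itself doesn't distinguish a modified file from a damaged one; it only tells you which files changed after the hints file was generated.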

Other than those 4 things I don't have anything else to go on. Nothing "looks" wrong.
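For reference, the `ahcich` timeout lines in the excerpt above can be decoded mechanically. This Python 3 sketch assumes the field meanings used by FreeBSD's ahci driver: `cs` is the port's Command Issue register (PxCI), `ss` is SATA Active (PxSACT), `rs` is the driver's bitmap of running slots, `tfd` carries the ATA status byte, and `cmd` is PxCMD, whose bits 12:8 hold the current command slot. Treat those mappings as assumptions to check against the driver source.

```python
import re

# Assumed layout of FreeBSD ahci timeout detail lines.
LINE_RE = re.compile(
    r"(?P<chan>ahcich\d+): is (?P<is>[0-9a-f]{8}) cs (?P<cs>[0-9a-f]{8}) "
    r"ss (?P<ss>[0-9a-f]{8}) rs (?P<rs>[0-9a-f]{8}) tfd (?P<tfd>[0-9a-f]+) "
    r"serr (?P<serr>[0-9a-f]{8}) cmd (?P<cmd>[0-9a-f]{8})",
    re.IGNORECASE,
)

# Legacy ATA status register bit names (low byte of tfd).
STATUS_BITS = {0x80: "BSY", 0x40: "DRDY", 0x20: "DF", 0x10: "DSC",
               0x08: "DRQ", 0x04: "CORR", 0x02: "IDX", 0x01: "ERR"}


def slots(mask):
    """Return the command slot numbers whose bits are set in a 32-bit mask."""
    return [i for i in range(32) if (mask >> i) & 1]


def decode(line):
    """Decode one 'is ... cs ... cmd ...' line, or return None if it doesn't match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    tfd = int(m["tfd"], 16)
    cmd = int(m["cmd"], 16)
    return {
        "channel": m["chan"],
        "issued_slots": slots(int(m["cs"], 16)),
        "running_slots": slots(int(m["rs"], 16)),
        "current_slot": (cmd >> 8) & 0x1F,  # PxCMD bits 12:8 (CCS)
        "status": [name for bit, name in STATUS_BITS.items() if tfd & bit],
    }
```

For the line `ahcich0: is 00000000 cs 01000000 ss 00000000 rs 01000000 tfd 58 serr 00000000 cmd 0000d817`, `decode` reports slot 24 in `cs`, `rs` and the current-slot field, matching "Timeout on slot 24", and status bits DRDY, DSC and DRQ. It only restates what the controller reported; it doesn't say why the command timed out.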
 

Matt Funk

Dabbler
Joined
Apr 21, 2015
Messages
21
I have updated the OS and will monitor it and submit new logs if the issue persists. I have not made any changes to the OS, so the warnings could potentially be from corruption. What is my best bet for preserving my current data if I need to do a reinstallation of the OS to remedy the corruption? Or is there another way to remedy it?
 

razvanc.mobile

Dabbler
Joined
Oct 19, 2015
Messages
16
Can you check in BIOS and see if you have watchdog enabled? In the first logs I saw a period of about 6 minutes with nothing written, which seems consistent with a kernel panic that makes the watchdog reboot the server after 5 minutes of "inactivity".

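To look for silent periods like the 6-minute one mentioned above, a Python 3 sketch that scans syslog-style lines and reports gaps longer than a threshold. It assumes each line starts with a classic BSD syslog timestamp (`Oct 20 11:48:41`), which has no year, so a placeholder year is used and gaps across New Year are not handled; the 5-minute default threshold is arbitrary. A gap only shows that nothing was logged, not why.

```python
from datetime import datetime, timedelta


def find_gaps(lines, threshold=timedelta(minutes=5), year=2000):
    """Return (before, after) timestamp pairs more than `threshold` apart.

    Lines that do not start with a 'Mon DD HH:MM:SS' timestamp are skipped.
    `year` is only a placeholder (2000 is a leap year, so Feb 29 parses);
    only the differences between timestamps matter here.
    """
    gaps = []
    prev = None
    for line in lines:
        stamp = line[:15]  # e.g. "Oct 20 11:48:41" or "Oct  2 11:48:41"
        try:
            ts = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # not a timestamped syslog line
        if prev is not None and ts - prev > threshold:
            gaps.append((prev, ts))
        prev = ts
    return gaps
```

Usage: feed it the lines from one log file in order, e.g. `find_gaps(open("/var/log/messages"))`.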
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
Matt Funk said:
What is my best bet for preserving my current data if I need to do a reinstallation of the OS to remedy the corruption?
The data in your storage pool is unaffected by a reinstall. If you want to preserve settings (e.g. shares, scheduled SMART tests and scrubs), just back up your configuration and upload it after the reinstall.
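To check that a downloaded configuration file is intact before relying on it, and assuming (this thread doesn't say so) that the saved configuration is an SQLite database, a Python 3 sketch that copies it with the standard library's online backup API and runs SQLite's integrity check on the copy. The function name is hypothetical; the web UI's save/upload configuration feature remains the way to actually restore settings.

```python
import os
import sqlite3


def copy_and_check_sqlite(src_path, dst_path):
    """Copy an SQLite database with the online backup API, then verify the copy.

    Returns True if SQLite's integrity check on the copy reports 'ok'.
    Raises FileNotFoundError if src_path does not exist (sqlite3.connect
    would otherwise silently create an empty database).
    """
    if not os.path.isfile(src_path):
        raise FileNotFoundError(src_path)
    src = sqlite3.connect(src_path)
    dst = sqlite3.connect(dst_path)
    try:
        src.backup(dst)
        # PRAGMA integrity_check returns a single row ('ok',) for a healthy file.
        result = dst.execute("PRAGMA integrity_check").fetchone()[0]
    finally:
        src.close()
        dst.close()
    return result == "ok"
```

If the file is not an SQLite database at all, `src.backup` raises `sqlite3.DatabaseError`, which is itself a useful signal that the download is not what you expected.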
 

Matt Funk

Dabbler
Joined
Apr 21, 2015
Messages
21
I had an unscheduled restart again early this morning, logs attached. I have not yet done a reinstall.
 

Attachments

  • debug-CHallNAS-20151022101804..tgz (572.3 KB)

dlavigne

Guest
Again, nothing in the logs. I'd suspect a hardware issue, possibly heat.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
razvanc.mobile said:
Can you check in BIOS and see if you have watchdog enabled? In the first logs I saw a period of about 6 minutes with nothing written, which seems consistent with a kernel panic that makes the watchdog reboot the server after 5 minutes of "inactivity".

That's not how the watchdog works. The watchdogd service resets the watchdog timer at regular intervals even if the system is 100% idle for days.
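A toy Python 3 model of the behaviour described above. The timeout and re-arm interval are hypothetical numbers, not FreeNAS or watchdogd defaults: while the daemon keeps running it re-arms the timer, so even a system idle for a day is never reset; only a hang that stops the daemon lets the timer expire.

```python
def watchdog_reset_time(timeout, pet_interval, horizon, hang_at=None):
    """Toy model: return the second at which the watchdog resets the machine, or None.

    A daemon re-arms ("pets") the hardware timer every `pet_interval` seconds
    for as long as it runs, regardless of how busy or idle the system is.
    If the daemon stops at `hang_at` (for example because the kernel hung),
    the timer expires `timeout` seconds after the last re-arm. All values are
    whole seconds and purely illustrative.
    """
    if not 0 < pet_interval < timeout:
        raise ValueError("need 0 < pet_interval < timeout")
    last_pet = 0
    for t in range(horizon + 1):
        daemon_running = hang_at is None or t < hang_at
        if daemon_running and t % pet_interval == 0:
            last_pet = t  # timer re-armed; idleness is irrelevant
        if t - last_pet >= timeout:
            return t  # timer ran out: hardware reset
    return None
```

With `timeout=60` and `pet_interval=10`, an idle day (`horizon=86400`) returns `None`, while `hang_at=95` gives a reset at second 150: the last re-arm happens at second 90 and the timer runs out 60 seconds later.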
 