BMC Watchdog and random resets

Status
Not open for further replies.

Xevus

Dabbler
Joined
Nov 22, 2016
Messages
12
Hi.

I have recently finished my first FreeNAS build on version 11 (latest update). It seems to be working fine, except for some random resets which, as far as I understand, are caused by the BMC watchdog, although the watchdog is disabled in the BIOS. I'm not sure where to start, because the only thing I can currently see in the logs is the message below in the IPMI log:

Code:
Watchdog 2 #0xca Watchdog 2 Timer Interrupt - Asserted
Watchdog 2 #0xca Watchdog 2 Hard Reset - Asserted


There is nothing special in the FreeNAS system log. As you can see, it worked without any issue for 22 hours and then suddenly restarted.
Code:
Oct  8 17:41:34 freenas kernel: igb0: link state changed to UP
Oct  8 17:41:34 freenas kernel: igb0: link state changed to UP
Oct  9 00:00:00 freenas syslog-ng[1507]: Configuration reload request received, reloading configuration;
Oct  9 19:56:51 freenas syslog-ng[1551]: syslog-ng starting up; version='3.7.3'
Oct  9 19:56:51 freenas Copyright (c) 1992-2017 The FreeBSD Project.
Oct  9 19:56:51 freenas Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
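
When the system log shows nothing unusual, the longest silence between two consecutive entries at least bounds the window in which the machine stopped responding. The sketch below is only an illustration of that idea: the function names are made up for this example, the year 2017 is an assumption (syslog stamps carry no year), and the sample lines are copied from the excerpt above.

```python
from datetime import datetime

# Sample lines taken from the log excerpt in this post.
SAMPLE_LOG = [
    "Oct  8 17:41:34 freenas kernel: igb0: link state changed to UP",
    "Oct  8 17:41:34 freenas kernel: igb0: link state changed to UP",
    "Oct  9 00:00:00 freenas syslog-ng[1507]: Configuration reload request received,",
    "reloading configuration;",
    "Oct  9 19:56:51 freenas syslog-ng[1551]: syslog-ng starting up; version='3.7.3'",
    "Oct  9 19:56:51 freenas Copyright (c) 1992-2017 The FreeBSD Project.",
]


def parse_syslog_time(line, year=2017):
    """Parse the leading 'Mon DD HH:MM:SS' stamp.

    Returns None for wrapped continuation lines that carry no timestamp.
    The year is an assumption because classic syslog stamps omit it.
    """
    try:
        return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
    except ValueError:
        return None


def largest_gap(lines, year=2017):
    """Return the (before, after) pair of timestamps with the longest silence.

    Returns None when fewer than two timestamped lines are present.
    """
    stamps = [t for t in (parse_syslog_time(l, year) for l in lines) if t is not None]
    pairs = zip(stamps, stamps[1:])
    return max(pairs, key=lambda pair: pair[1] - pair[0], default=None)


if __name__ == "__main__":
    gap = largest_gap(SAMPLE_LOG)
    if gap is not None:
        before, after = gap
        print(f"last entry before silence: {before}, first entry after: {after}")
```

The gap only says when logging stopped and resumed, not when the hang began; the timestamp of the Watchdog 2 event in the IPMI log would be needed to narrow that down.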

My HW is below. Would really appreciate some direction on how to diagnose the issue.

SuperMicro X10SLM+-F-O
2 x 8 GB DDR3-1600 Crucial ECC (CT102472BD160B)
Pentium G3258
Dell Perc H200, reflashed to LSI2008-IT
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
It looks to me like a hardware fault. Can you list the rest of the hardware you are using with this?

The first thing I would try is to remove the Perc H200. I had a bad SAS controller cause my system to reboot and I had to replace the controller to fix the problem.
You may need to do some testing to see if you can get the system to fault without the H200.
 

Xevus

Dabbler
Joined
Nov 22, 2016
Messages
12
The rest of the HW is a Corsair AX850 PSU and my random collection of 2 TB drives:

3 x WD RE
2 x Toshiba DT01ACA
Hitachi 7K3000
WD Red
Seagate Constellation ES.3

I think a better test might be to disable the HW reset via the watchdog and check whether the system actually hangs. However, I couldn't find a way to disable this watchdog; it seems to turn on automatically.
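
One way to check whether the BMC timer is actually armed is to read its state with `ipmitool mc watchdog get`, and `ipmitool mc watchdog off` stops it. If the timer comes back on by itself, one possibility worth checking is that something in the OS, such as FreeBSD's watchdogd, keeps re-arming it; that is a guess, not something confirmed in this thread. The sketch below just parses the status text. The sample output is illustrative, not captured from this system, and the exact field names may vary between ipmitool versions; the function names are hypothetical.

```python
import subprocess

# Illustrative output only (assumed layout, not captured from this system).
SAMPLE_OUTPUT = """\
Watchdog Timer Use:     SMS/OS (0x44)
Watchdog Timer Is:      Started/Running
Watchdog Timer Actions: Hard Reset (0x01)
Pre-timeout interval:   0 seconds
Initial Countdown:      137 sec
Present Countdown:      131 sec
"""


def parse_watchdog_status(text):
    """Turn 'Key: Value' lines into a dict, ignoring lines without a colon."""
    status = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            status[key.strip()] = value.strip()
    return status


def read_watchdog_status():
    """Run ipmitool locally and parse its report (requires ipmitool and root)."""
    result = subprocess.run(
        ["ipmitool", "mc", "watchdog", "get"],
        capture_output=True, text=True, check=True,
    )
    return parse_watchdog_status(result.stdout)


def watchdog_is_armed(status):
    """Assumes a running timer is reported with the word 'Running'."""
    return "running" in status.get("Watchdog Timer Is", "").lower()


if __name__ == "__main__":
    print(watchdog_is_armed(parse_watchdog_status(SAMPLE_OUTPUT)))
```

Polling this every minute or so after switching the timer off would show whether, and roughly when, it gets re-armed.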
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
Do you already have data on the system or is it still in the test phase?
I don't think the type of drive is causing the problem, but I would seriously suspect the H200. The best way to isolate it as the fault is not to let the system hang, but to remove it and see if the system runs without it, then put it back and see if the fault returns.
If there is no data yet, connect as many of the drives as possible to the system board onboard SATA ports, create a pool and do some transfers to it for load testing. Then put the H200 back in and do the same kind of testing using it. If the rebooting issue only happens with the H200 installed, it is a good bet that is the source of the fault. I personally have had to replace a bad H300 that started causing my system to fault after I had been using the card for almost a year. The testing process is not easy or painless, but it is needed to isolate the fault.
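
For the "do some transfers" part, any repeatable write-and-read-back load will do. Below is a minimal sketch of one such load generator, not a recommended tool: it writes random files into a directory on the pool, reads them back and compares SHA-256 digests. The function name, file sizes and the `/mnt/tank/loadtest` path in the usage note are all hypothetical.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def write_and_verify(target_dir, file_count=4, file_size=1024 * 1024, chunk=64 * 1024):
    """Write random files, read them back, and return paths whose hash differs.

    Files are deleted after verification. Sizes here are tiny defaults;
    a real load test would use far larger values.
    """
    target = Path(target_dir)
    expected = {}
    for i in range(file_count):
        path = target / f"loadtest_{i}.bin"
        digest = hashlib.sha256()
        with open(path, "wb") as fh:
            remaining = file_size
            while remaining > 0:
                block = os.urandom(min(chunk, remaining))
                fh.write(block)
                digest.update(block)
                remaining -= len(block)
            fh.flush()
            os.fsync(fh.fileno())  # push the data towards the disks
        expected[path] = digest.hexdigest()

    mismatches = []
    for path, want in expected.items():
        digest = hashlib.sha256()
        with open(path, "rb") as fh:
            for block in iter(lambda: fh.read(chunk), b""):
                digest.update(block)
        if digest.hexdigest() != want:
            mismatches.append(path)
        path.unlink()
    return mismatches


if __name__ == "__main__":
    # Demo against a temporary directory; point it at the pool for real testing.
    with tempfile.TemporaryDirectory() as scratch:
        print(write_and_verify(scratch, file_count=2, file_size=256 * 1024))
```

On the actual system this would be called in a loop with something like `write_and_verify("/mnt/tank/loadtest", file_count=20, file_size=2 * 1024**3)`, once with the drives on the onboard ports and once on the HBA. Read-back may be served from cache, so this mostly exercises the write path.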
 

Xevus

Dabbler
Joined
Nov 22, 2016
Messages
12
I've tried running the system without the H200 and it seemed to be stable, so I ordered an H310 instead, reflashed it to IT mode and started using the new card. Unfortunately, the freeze behavior continues. What's worse, the system now cannot properly reboot after a watchdog reset: it hangs at "ZFS volume import completed". Could a bad boot flash cause this kind of behavior?
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
The boot ROM is not necessary and can cause problems on the SAS HBA. I suggest you do not use it. If you are booting the system from USB or SATA, you don't need to be able to boot from the SAS controller. I found running without the boot ROM to be faster and more reliable than running with it.

 

Xevus

Dabbler
Joined
Nov 22, 2016
Messages
12
One more thing. I've just realized that my RAM is 1600 MHz and the G3258 only supports 1333 MHz RAM. Could this be an issue?
 