Lockup then reboot, but why?

Status
Not open for further replies.

Visseroth

Guru
Joined
Nov 4, 2011
Messages
546
Well isn't this interesting... Just saw this.....

kernel: sonewconn: pcb 0xfffff801055f3e10: Listen queue overflow: 8 already in queue awaiting acceptance (1 occurrences)

ifconfig....

igb0: flags=8c02<BROADCAST,OACTIVE,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=403bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,VLAN_HWTSO>
        ether 00:25:90:63:a0:7a
        nd6 options=9<PERFORMNUD,IFDISABLED>
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
igb1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=403bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,VLAN_HWTSO>
        ether 00:25:90:63:a0:7b
        inet 10.10.10.248 netmask 0xffffff00 broadcast 10.10.10.255
        nd6 options=9<PERFORMNUD,IFDISABLED>
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
ipfw0: flags=8801<UP,SIMPLEX,MULTICAST> metric 0 mtu 65536
        nd6 options=9<PERFORMNUD,IFDISABLED>
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
        options=600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6>
        inet6 ::1 prefixlen 128
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0xc
        inet 127.0.0.1 netmask 0xff000000
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
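
For anyone comparing a dump like the one above against their own box, here is a rough Python sketch that pulls the flag names out of each interface header line and lists the interfaces that do not report UP. The function names and the SAMPLE string are only illustrative (they are not from any FreeNAS tool), and the script only reads the names inside the angle brackets; it does not decode the hex value.

```python
import re

# Matches an interface header line such as
# "igb0: flags=8c02<BROADCAST,OACTIVE,SIMPLEX,MULTICAST> metric 0 mtu 1500"
HEADER_RE = re.compile(r"^(\S+): flags=[0-9a-fA-F]+<([^>]*)>")

def interface_flags(ifconfig_text):
    """Return {interface name: set of flag names} from ifconfig output."""
    flags = {}
    for line in ifconfig_text.splitlines():
        m = HEADER_RE.match(line.strip())
        if m:
            name, names = m.groups()
            flags[name] = set(names.split(",")) if names else set()
    return flags

def not_up(flags):
    """Return the sorted names of interfaces whose flags lack UP."""
    return sorted(name for name, fl in flags.items() if "UP" not in fl)

# Header lines copied from the ifconfig output in this post.
SAMPLE = """\
igb0: flags=8c02<BROADCAST,OACTIVE,SIMPLEX,MULTICAST> metric 0 mtu 1500
igb1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
ipfw0: flags=8801<UP,SIMPLEX,MULTICAST> metric 0 mtu 65536
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
"""

print(not_up(interface_flags(SAMPLE)))  # ['igb0']
```

In the output above, igb0 is the only interface without UP or RUNNING, and it is also the only one showing OACTIVE.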
 

Visseroth

Current listen queue sizes (qlen/incqlen/maxqlen)
Proto Listen    Local Address
tcp4  0/0/50    127.0.0.1.139
tcp4  0/0/50    127.0.0.1.445
tcp4  0/0/50    10.10.10.248.139
tcp4  0/0/50    10.10.10.248.445
tcp4  0/0/128   *.22
tcp6  0/0/128   *.22
tcp4  0/0/128   127.0.0.1.9042
tcp6  0/0/128   *.80
tcp4  0/0/128   10.10.10.248.80
tcp4  0/0/16    *.3493
tcp4  0/0/128   *.199
unix  0/0/100   /var/run/mdnsd
unix  0/0/5     /var/db/samba4/winbindd_privileged/pipe
unix  0/0/5     /var/run/samba4/winbindd/pipe
unix  0/0/5     /var/run/samba4/nmbd/unexpected
unix  0/0/128   /var/db/syslog-ng.ctl
unix  0/0/30    /var/run/dbus/system_bus_socket
unix  0/0/16    /var/db/nut/snmp-ups-ups
unix  0/0/5     /var/run/freenas-snmpd.sock
unix  0/0/5     /var/run/snmpd.sock
unix  0/0/4     /var/run/devd.pipe
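
The listing above looks like FreeBSD `netstat -L` output (that is my assumption; the post does not name the command). Below is a rough Python sketch for turning it into something you can filter. The names `ListenQueue`, `parse_listen_queues` and `small_queues` are made up for illustration. The filter is only a heuristic: since the kernel message reported 8 connections already queued, it picks out listeners whose maxqlen is below 8 as the first ones to look at. The thread does not confirm which socket actually overflowed.

```python
import re
from typing import NamedTuple

class ListenQueue(NamedTuple):
    proto: str
    qlen: int      # connections waiting to be accepted
    incqlen: int   # incomplete connections
    maxqlen: int   # configured listen backlog
    local: str     # local address or socket path

# Matches data rows such as "tcp4 0/0/50 10.10.10.248.445";
# the two header lines do not match and are skipped.
ROW_RE = re.compile(r"^(\S+)\s+(\d+)/(\d+)/(\d+)\s+(\S+)$")

def parse_listen_queues(text):
    """Parse listen-queue rows into ListenQueue tuples."""
    queues = []
    for line in text.splitlines():
        m = ROW_RE.match(line.strip())
        if m:
            proto, qlen, incqlen, maxqlen, local = m.groups()
            queues.append(
                ListenQueue(proto, int(qlen), int(incqlen), int(maxqlen), local)
            )
    return queues

def small_queues(queues, observed=8):
    """Heuristic: listeners whose backlog limit is below the observed depth."""
    return [q for q in queues if q.maxqlen < observed]

# A few rows copied from the listing in this post.
SAMPLE = """\
Current listen queue sizes (qlen/incqlen/maxqlen)
Proto Listen Local Address
tcp4 0/0/50 10.10.10.248.445
tcp4 0/0/128 *.22
unix 0/0/5 /var/run/samba4/winbindd/pipe
unix 0/0/4 /var/run/devd.pipe
"""

for q in small_queues(parse_listen_queues(SAMPLE)):
    print(q.local, q.maxqlen)
```

Note that every queue in the listing was empty (0/0/...) at the moment it was captured, so a snapshot like this can only suggest candidates; it would have to be sampled while the box is under load to catch the overflow itself.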
 

Visseroth

Is anyone even looking at this post anymore?
 

rogerh

Guru
Joined
Apr 18, 2014
Messages
1,111

Visseroth

Ahh, well, I was hoping for some help. I'm glad to be of interest to someone. Hopefully I'll find an answer. I'm definitely hard at it trying to figure it out and don't really want to give up until I know EXACTLY what is going on.

As an update: my reboots now seem to persist even with the SuperMicro AOC-STGN-I2S card removed from the server. Sometimes it just takes longer to reboot. I had over 12 hours on this last hammering before it finally rebooted at about 12:20pm my time, which gave me another NMI error...

2 01/16/2016 12:23:23 OEM Critical Interrupt Software NMI @ BUS:0 /Dev:1 /Func:0 - Asserted

lspci says that bus 0, device 1 is....
00:01.0 PCI bridge: Intel Corporation 5520/5500/X58 I/O Hub PCI Express Root Port 1 (rev 22)
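
If you have a pile of these IPMI events to go through, the bus/device/function lookup above can be scripted. This is only a sketch: the function name is made up, and it assumes the numbers in the event text are hexadecimal, which I have not verified for this BMC. It just reformats the location into the `bus:dev.func` form that lspci prints so you can match it against the lspci listing.

```python
import re

# Pulls the location out of text like
# "Software NMI @ BUS:0 /Dev:1 /Func:0 - Asserted"
LOCATION_RE = re.compile(
    r"BUS:([0-9A-Fa-f]+)\s*/Dev:([0-9A-Fa-f]+)\s*/Func:([0-9A-Fa-f]+)",
    re.IGNORECASE,
)

def sel_to_pci_address(event):
    """Return an lspci-style 'bb:dd.f' address, or None if no location is found.

    Assumption: the values in the event text are hexadecimal.
    """
    m = LOCATION_RE.search(event)
    if m is None:
        return None
    bus, dev, func = (int(value, 16) for value in m.groups())
    return f"{bus:02x}:{dev:02x}.{func:x}"

# Event line copied from this post.
EVENT = ("2 01/16/2016 12:23:23 OEM Critical Interrupt "
         "Software NMI @ BUS:0 /Dev:1 /Func:0 - Asserted")

print(sel_to_pci_address(EVENT))  # 00:01.0
```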

That is with ACPI v2.0 enabled instead of 3.

I have a SuperMicro Tech working with me on this too. He loaded up a motherboard with FreeNAS and got the same error on startup.

Anyone have any ideas as to what else I should try?
 

Attachments

  • ACPI.JPG (30.5 KB)

rogerh

Assuming SuperMicro don't think it is a hardware fault, would it be a good idea to report it as a bug? That is the only way to get the attention of the FreeNAS developer.
 

Visseroth

I just reported the NMI error as a hardware-related bug and made it clear that my reboots are not what is being reported. I'm reporting the NMI error, which may or may not be related to the reboot issue.
https://bugs.freenas.org/issues/13315

As a side note, I ran memtest for about 4 or 5 days and got about 3 passes on the 48GB of RAM without any errors. I then rebooted the system, started hammering it with traffic, and about 5 or 6 hours later got a reboot.
The SuperMicro Tech did some testing of his own and also managed to get a reboot. He then restarted the test and it ran all night without error.

So I'm not exactly sure why but it seems FreeNAS does not play nice with the X8DTH boards and high loads.

My next test is to put a 10Gb fiber card in, connect it to a 10Gb switch and start hammering it with the onboard NICs disabled (minus the management NIC). I will report back when I am able. I'm currently waiting for a couple 10Gb NICs to come in and I'm setting up the switch.
I ended up getting a Quanta LB4M switch (not sure if I like it or not) that I'm currently trying to wrap my head around. Thus far I'm not liking the noise or the amount of power it uses. It draws 63W at idle.

Quote from SuperMicro Tech...
I mounted an HDD and shared it over CIFS, then mapped the drive on another machine. It looks like FreeNAS stopped receiving files after 5 hours (started at 6:00pm).
I am not sure whether the network went down or the system rebooted. Is there a way to check system activity in FreeNAS?
 

Visseroth

Seems I may have inadvertently made the error go away by applying the changes suggested on this page to my sysctls under "Tunables":
http://forums.ayksolutions.com/foru...questions/656-how-to-improve-scp-rsync-speeds

I disabled the onboard NICs because I was putting in a 10Gb NIC. The error persisted, but I was getting slow speeds, so I went looking for a fix. I tried the posted sysctl settings and noticed that my NMI error went away and the system loaded the OS a bit faster.
Granted, my speeds still are not fixed, but the error is gone, and that is with ACPI 3 enabled. :D
 

Visseroth

As a side note I'm still investigating the lockup/reboot issue. I believe it to be a tuning issue though I'm not 100% sure that is the case. Thus far I have a 10Gb NIC installed and the onboard NICs disabled.

Stressing the system with the onboard NICs disabled will only tell me one of two things: either the onboard NICs are possibly bad, or the NICs are not the problem at all. That's something I'm still trying to figure out.
 

Visseroth

I have since disabled the onboard NICs and am ONLY using my 10Gb NIC, and the errors are completely gone. I likely have a bad onboard NIC or NIC chipset.
Just thought I'd update the post.
 

Visseroth

Found this post again because I'm dealing with another X8 chipset that is having issues.
There is apparently an issue with the X8 series and the 5520 chipsets that causes problems such as the NIC dropping, packet loss, intermittent reboots, PCI Express errors in the IPMI event logs, DIMM errors in the IPMI event logs, etc. It may have to do with the VT-d re-addressing as listed here... https://www.suse.com/support/kb/doc/?id=7014344
or something else, I'm not sure, but the X8 series has some issues.
To anyone else who finds this thread: my recommendation is to replace your board and move on to either the X10 or X11 series or something else altogether.
I replaced my X8 board with the X10SRH-CF and it's been ROCK SOLID! No glitches from the board. The only issue I've run into is SMART issues with an expander, but that's for another thread.
Keep in mind that changing your board also means you'll need to change your CPU, possibly your backplanes and most likely your RAM.
Moving from the X8 to the X10 reduced my power consumption by about 100W and improved system performance.

In short, DON'T USE THE X8 SERIES BOARDS!
 

Visseroth

Sad that I never saw these trash-talking threads; I could have saved some $$. So I made it a point to update this one so that hopefully others will see this thread before they consider an X8. Still, good to know.
 