HP Microserver Gen10 fails network access randomly, second NIC port not working

PhreakShow

Dabbler
Joined
Sep 16, 2020
Messages
17
Hi.

I have two HP Microserver Gen10, X3418, 8 GB RAM, Intel X540-T2 in the bigger slot. Also, there are four 8TB WD Red drives, but I don't think they matter for now.
BIOS is at ZA10A380. It used to be ZA10A360, and I updated it to try to resolve the issue, but nothing changed. I am using FreeNAS-11.3-U4.1 on both units, and they have the same problem.

At first I noticed failed backup jobs that run overnight. I have a Windows Server 2019 machine that copies files to mapped SMB shares on the FreeNAS machines.
Mid-backup, robocopy complained that a network drive was no longer reachable. In this state, I cannot reach the machine via ping, ssh, browser, or share.
Furthermore, if I plug in a monitor in this state, no image appears. Pressing the power button seems to trigger a shutdown (sudden HDD activity on the LED), but it does not appear in the "last" log of shutdowns.

Then I paid more attention to the network devices and the boot process. During startup, one or both NIC ports fail to be recognized.

Code:
Sep 16 13:46:20 freenas02 ix0: <Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 3.2.12-k> mem 0xfffa00000-0xfffbfffff,0xfffe00000-0xfffe03fff irq 35 at device 0.1 on pci3
Sep 16 13:46:20 freenas02 ix0: Using MSI-X interrupts with 5 vectors
Sep 16 13:46:20 freenas02 ix0: Hardware initialization failed
Sep 16 13:46:20 freenas02 device_attach: ix0 attach returned 5


With this message, both ix0 and ix1 fail to get recognized by the OS. No link, no ping, and no status in ifconfig.

Sometimes, only one port fails, showing:

Code:
Sep 15 17:37:06 freenas02 ix1: <Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 3.2.12-k> mem 0xf0c00000-0xf0c03fff irq 35 at device 0.1 on pci3
Sep 15 17:37:06 freenas02 ix1: 0x200000 bytes of rid 0x10 res 3 failed (0, 0xffffffffffffffff).
Sep 15 17:37:06 freenas02 ix1: Unable to allocate bus resource: memory
Sep 15 17:37:06 freenas02 ix1: Allocation of PCI resources failed
Sep 15 17:37:06 freenas02 device_attach: ix1 attach returned 6


The issue is the same on both machines. 8GB is plenty of RAM for a single PCI resource.
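
For readers decoding the two failures above, here is a minimal Python sketch that pulls the device_attach return codes out of such log lines and computes the size of each "mem" window on a probe line. The function names are hypothetical, and the errno table only covers the two codes that appear in these logs (on FreeBSD, 5 is EIO and 6 is ENXIO). One observation, offered as an assumption from the log text rather than a diagnosis: 0x200000 bytes is the same 2 MiB size as the first "mem" window in the earlier log, so the failed allocation looks like a request for PCI memory address space rather than for system RAM.

```python
import re

# errno numbers as defined on FreeBSD; only the two codes seen in the logs
ERRNO_NAMES = {
    5: "EIO (input/output error)",
    6: "ENXIO (device not configured)",
}

ATTACH_RE = re.compile(r"device_attach: (\w+) attach returned (\d+)")
MEM_RE = re.compile(r"\bmem ((?:0x[0-9a-f]+-0x[0-9a-f]+,?)+)")


def attach_failures(log_text):
    """Return (device, code, errno name) for each failed attach in the log."""
    results = []
    for match in ATTACH_RE.finditer(log_text):
        code = int(match.group(2))
        results.append((match.group(1), code, ERRNO_NAMES.get(code, "unknown")))
    return results


def mem_window_sizes(probe_line):
    """Return the size in bytes of each 'mem start-end' window on a probe line."""
    match = MEM_RE.search(probe_line)
    if match is None:
        return []
    sizes = []
    for window in match.group(1).split(","):
        if not window:
            continue
        start, end = window.split("-")
        # Ranges in the log are inclusive, hence the +1
        sizes.append(int(end, 16) - int(start, 16) + 1)
    return sizes


if __name__ == "__main__":
    probe = ("ix0: <...> mem 0xfffa00000-0xfffbfffff,0xfffe00000-0xfffe03fff "
             "irq 35 at device 0.1 on pci3")
    print([hex(size) for size in mem_window_sizes(probe)])
    print(attach_failures("device_attach: ix1 attach returned 6"))
```

Fed the saved output of dmesg or /var/log/messages, this makes it easy to see on which boots a port failed with a hardware initialization error (5) versus a resource allocation error (6).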

I am thinking of abandoning FreeNAS and switching back to Windows, because even dynamic drives with a software RAID 5 seem to be more reliable than this.

Any ideas on how to resolve that, or even grasp the actual issue here?
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,681
Have you tried a different network card? I've seen a number of reports of the X540 which suggest that it can be flaky. The X520 is the go-to Intel card. There are also a lot of knockoff cards in the market. Pull your card and verify that the Yottamark passes validation. If it doesn't, or if there isn't a Yottamark, there's a good chance the card is a knockoff.
 

PhreakShow

These X540 converged cards are the only 10G cards I have available. The other end is connected to a 10G switch and a 10G onboard NIC on my Supermicro server.
They don't have a Yottamark, though. Probably because they are converged OEM cards?

I used both NICs with Windows in a test environment. Back then, I didn't have any issues with them.
 

jgreco

Well, the main thing I know is that every time someone says "Intel" and "10G" and "problem", the fourth thing is almost always "X540".
 

PhreakShow

So your recommendation would be buying two X520-DA2 with a pair of 10GBase-T copper SFP+ transceivers?
 

jgreco

No, my recommendation for many years has been that 10GBase-T is not a good idea. It's a poor technology, it's extremely expensive, and it is very sensitive to issues such as poor cabling, which leads to transient problems of the sort that you seem to be experiencing.

It *might* be a better idea to ditch the whole copper thing and go use some of the stuff suggested in the 10 Gig Networking Primer. This is meant as a way to ease into what might be uncomfortable territory for people who've never worked with non-copper ethernet.

The benefit here is that we know the stuff to work, and it isn't as twitchy as the copper stuff. There have been many people in the forums with problem-free setups involving the recommended X520-SR2, various ethernet switches, vendor optics, and a few bits of fiber. The technology is inexpensively available used via eBay in many cases.

Now, just to be clear, I don't know that this would resolve your problems. However, your original post discusses losing connectivity during busy times, which is typically a sign of an overheating card, or a fake card, or a card with driver bugs, or bad cabling. I can't tell which. However, with copper, you need four perfect pairs which is something that you don't always get, while for fiber stuff 10Gbps is very ordinary and *yawn*. I've also seen a number of people with X540 copper setups have problems, so it is also possible that there is some sort of unidentified driver issue.

So all I can really do is suggest you move to something where there are fewer variables and a lot more "known to work" involved.

Mikrotik makes some totally awesome small SFP+ switches starting at about $160 for four ports. In the spirit of full disclosure, I will mention that I did ask them for an 8 port CRS309-1G-8S+IN evaluation unit and they sent me one, but I also told them I'd tell people what I thought of it whether good or bad. It's an incredible value at the price, with features that indicated to me that people who designed the product had actually used switches in the past. It isn't a high end Cisco, Brocade, or Juniper switch, but for a home or small office setup, it's definitely one of the most intriguing options.

There's also a variety of used 10G enterprise switches available on eBay.
 

PhreakShow

So I followed your advice and bought an X520 with an additional SFP+ module. Turns out, that does not work either:

Code:
Sep 21 16:07:34 freenas02 ix0: <Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 3.2.12-k> port 0xd000-0xd01f mem 0xfffe00000-0xfffe7ffff,0xffff00000-0xffff03fff irq 32 at device 0.1 on pci3
Sep 21 16:07:34 freenas02 ix0: Using MSI-X interrupts with 5 vectors
Sep 21 16:07:34 freenas02 ix0: Hardware initialization failed
Sep 21 16:07:34 freenas02 device_attach: ix0 attach returned 5


What is going on here :/
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
Have you tried blowing out the PCI-E slot, and then reseating the card?
 

PhreakShow

Yes I did that, and I also tried the other PCIe slot. It's just the x1, but still, it should work.
 

Samuel Tai

This smells like a hardware issue with the slots not providing enough power for the cards to fully initialize. Check your power supply to see if it's supplying correct voltages and amps on all leads. Try blowing all the dust off the entire motherboard, especially the capacitor/inductor banks handling the power supply phases.
 

PhreakShow

I also did that with a multimeter, and both machines worked for half a year running Windows Server with a RAID controller and the 10G card.
 