10 Gig Won't Connect to Switch Without Intervention on One of Two Similar Servers

Status
Not open for further replies.

bollar

Patron
Joined
Oct 28, 2012
Messages
411
One of my servers ("Storage" see sig) will not connect to my Ubiquiti Unifi Switch 48 POE-500W without unplugging and replugging the SFP+ module at the switch. Once I do this, it remains connected until the next reboot. The other server ("Replication" see sig) connects fine. Unplugging and replugging the SFP+ module at the card end does nothing.

I did find a note on this site that suggests that auto-negotiation is potentially the problem. I set the switch to manual 10 Gbps, but that didn't work. I also tried setting manual negotiation on the card, but got this error:
Code:
# ifconfig cxgb0 media 1000baseTX mediaopt full-duplex
ifconfig: SIOCSIFMEDIA (media): Device not configured

If there is another way I need to try to configure the card, I haven't found the right way.

Other things I have tried:
  • Change cables
  • Change cards (to the Intel card used on "Replication")
  • Swap SFP+ ports (with the working "Replication")
  • Reinstalling FreeNAS (both upgrade and fresh install of 11.1U1 ISO)
I currently have the Chelsio listed in my sig installed, but I can reinstall the Intel card, if there is a resolution that works only on it.

In all cases, "Replication" continued to connect on reboot and "Storage" did not.

Things I considered, but have not tried:
  • Swapping motherboards
  • Swapping switches -- My other switches are 1 Gig SFP, and I don't have another 10 Gig switch
I have exhausted the things I know how to do. Do any of you august and learned FreeNASsers have other suggestions for me?

ifconfig output on "Storage" in the working state:

Code:
cxgb0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=2c00b9<RXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWTSO,LINKSTATE,RXCSUM_IPV6>
	ether 00:07:43:07:57:8b
	hwaddr 00:07:43:07:57:8b
	inet 10.0.0.5 netmask 0xff000000 broadcast 10.255.255.255
	inet6 fe80::207:43ff:fe07:578b%cxgb0 prefixlen 64 scopeid 0x1
	nd6 options=23<PERFORMNUD,ACCEPT_RTADV,AUTO_LINKLOCAL>
	media: Ethernet Unknown <full-duplex>
	status: active

Also, in the ifconfig output from "Replication", the "media" line is different -- perhaps there's a config line or driver missing on "Storage"?
Code:
ix0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=a400b9<RXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWTSO,RXCSUM_IPV6>
	ether 90:e2:ba:1a:e8:a4
	hwaddr 90:e2:ba:1a:e8:a4
	inet 10.0.0.9 netmask 0xffff0000 broadcast 10.0.255.255
	nd6 options=9<PERFORMNUD,IFDISABLED>
	media: Ethernet autoselect (10Gbase-Twinax <full-duplex,rxpause,txpause>)
	status: active
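
To make the difference between the two "media" lines easier to see, here is a minimal sketch in Python that splits an ifconfig "media:" line into its parts. The helper names `parse_media` and `_split_options` are made up for illustration, and the parser has only been checked against the two line shapes shown above, so treat it as a starting point rather than a general ifconfig parser.

```python
def _split_options(text):
    """Split 'NAME <opt1,opt2>' into ('NAME', ['opt1', 'opt2'])."""
    name, _, opts = text.partition("<")
    options = [o.strip() for o in opts.rstrip("> ").split(",") if o.strip()]
    return name.strip(), options


def parse_media(ifconfig_text):
    """Return the parsed 'media:' line of one interface, or None if absent."""
    for line in ifconfig_text.splitlines():
        line = line.strip()
        if not line.startswith("media:"):
            continue
        desc = line[len("media:"):].strip()
        if desc.startswith("Ethernet"):
            desc = desc[len("Ethernet"):].strip()
        head, paren, tail = desc.partition("(")
        if paren:
            # "autoselect (10Gbase-Twinax <...>)": selected word, then active media
            selected = head.strip()
            active, options = _split_options(tail.rstrip(") "))
        else:
            # "Unknown <full-duplex>": a single media word with its options
            selected, options = _split_options(desc)
            active = selected
        return {"selected": selected, "active": active, "options": options}
    return None


# Sample lines copied from the two outputs above.
storage = parse_media("\tmedia: Ethernet Unknown <full-duplex>\n\tstatus: active")
replication = parse_media(
    "\tmedia: Ethernet autoselect (10Gbase-Twinax <full-duplex,rxpause,txpause>)"
)
print(storage)
print(replication)
```

On these inputs, "Storage" reports `Unknown` as its media while "Replication" reports `autoselect` with `10Gbase-Twinax` active, which is the difference being asked about.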
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Unfortunately, this is likely to be a hardware/firmware issue on the switch's side of things (SFP module included).
 

bollar

Patron
Joined
Oct 28, 2012
Messages
411
One way or another, I'm afraid you're going to be right. I'll float this by the Ubiquiti experts and see what they say.

It is strange, though, that all of the hardware (cards, cables & SFP+ modules) has been swapped between the servers and works in one but not the other. The only differences between the two servers are the specific Xeon processor model and the fact that the server that does not work has dual processors.
 

bollar

Patron
Joined
Oct 28, 2012
Messages
411
Ubiquiti suggested using their U Fiber Multi-Mode SFP 10G modules. They're inexpensive and Amazon can deliver today, so we'll give them a try.
 

Nick2253

Wizard
Joined
Apr 21, 2014
Messages
1,633
Did you also try switching ports? (e.g., if Server 1 used to be in port 1 on the switch, move it to port 2.) If you've really tried switching everything, as incredible as it sounds, it seems like FreeBSD may be to blame.

One possible test that would rule out FreeBSD (assuming the problem is Server 1 plugged in to SFP1 in switch port 1): disconnect the cable from SFP1, reboot the switch, and try plugging the cable back in. If that doesn't work, plug Server 2 in to that same SFP1 and see what happens.
 

bollar

Patron
Joined
Oct 28, 2012
Messages
411
Thanks!

I gave that a try and "Replication" continued to work while "Storage" did not. Odd.

The Unifi SFP+ module arrived and after a few tries, I may have a configuration that's working. The module was not compatible with the Intel 82599ES card -- or at least that's what it reported and it hung the switch and the card (!). I put the Chelsio S320e in and it could identify the module and configure it correctly. It continued to report active media after reboot (which was the original problem).

In the meantime, the other server continued to work with the Intel card and 10Gtek cable.

The good news is that I had never had a reason to play with fiber before, and now I've checked that box.
 

c32767a

Patron
Joined
Dec 13, 2012
Messages
371
If the problem followed the device and did not stay with the switch port, it's likely a configuration issue on the FreeNAS side.

One other thing to note: the old days of autonegotiation problems are behind us. You should ensure that both sides of the link are configured to operate in 10G auto mode. The 10G PHY does not have a "half duplex" mode, so there's no chance of a duplex mismatch at 10G. And by disabling autonegotiation, you're preventing the NIC and switch from exchanging other information needed to establish the link, such as whether flow control is supported.

FWIW, at $dayjob we have Dell, Cisco and Brocade (now Ruckus? blech) switches connected to NICs on FreeNAS boxes and have no issues with the default config. I have a UBNT 16XG switch at home that also works fine with the Intel and Solarflare cards in my home FreeNAS.
 

bollar

Patron
Joined
Oct 28, 2012
Messages
411
Thanks @c32767a -- I had tried a new install of FreeNAS 11.1U2 and the problems on "Storage" remained. I don't envy you guys who deal with this every day. If I were to continue experimenting, I'd look harder at the motherboards since I did no testing on those.

By mixing and matching the parts I had on hand with those recommended to me, I now have a configuration that is working:

Switch: Ubiquiti Unifi Switch 48 POE-500W

Storage:
Card: Chelsio S320e Dual-Port 10GbE
SFP Modules: Ubiquiti U Fiber Multi-Mode SFP 10G
Fiber: Cable Matters 10Gb 40Gb Multimode OM3 Duplex 50/125 OFNP Fiber Patch Cable LC to LC - 30m

# ifconfig cxgb0
cxgb0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=2c00b9<RXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWTSO,LINKSTATE,RXCSUM_IPV6>
ether 00:07:43:07:57:8b
hwaddr 00:07:43:07:57:8b
inet 10.0.0.5 netmask 0xff000000 broadcast 10.255.255.255
nd6 options=9<PERFORMNUD,IFDISABLED>
media: Ethernet 10Gbase-SR <full-duplex>
status: active

# ifconfig cxgb1
cxgb1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=6c07bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWTSO,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
ether 00:07:43:07:57:8c
hwaddr 00:07:43:07:57:8c
inet 192.168.200.20 netmask 0xffffff00 broadcast 192.168.200.255
nd6 options=9<PERFORMNUD,IFDISABLED>
media: Ethernet 10Gbase-Twinax <full-duplex>
status: active

Replication:
Card: Intel 82599ES Dual-Port 10GbE
SFP Modules / Fiber: Fiberstore 10G SFP+ Active Optical Cable - 25m

# ifconfig ix0
ix0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=a400b9<RXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWTSO,RXCSUM_IPV6>
ether 90:e2:ba:1a:e8:a4
hwaddr 90:e2:ba:1a:e8:a4
inet 10.0.0.9 netmask 0xffff0000 broadcast 10.0.255.255
nd6 options=9<PERFORMNUD,IFDISABLED>
media: Ethernet autoselect (10Gbase-Twinax <full-duplex,rxpause,txpause>)
status: active

# ifconfig ix1
ix1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=e407bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
ether 90:e2:ba:1a:e8:a5
hwaddr 90:e2:ba:1a:e8:a5
inet 192.168.200.10 netmask 0xffffff00 broadcast 192.168.200.255
inet6 fe80::92e2:baff:fe1a:e8a5%ix1 prefixlen 64 scopeid 0x6
nd6 options=23<PERFORMNUD,ACCEPT_RTADV,AUTO_LINKLOCAL>
media: Ethernet autoselect (10Gbase-Twinax <full-duplex,rxpause,txpause>)
status: active

I also have the two servers connected directly to each other on a separate subnet (the 192.168.200.x addresses on cxgb1 and ix1 above) using a StarTech.com SFP-H10GB-CU2M 2m cable.
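
One detail in the output above that is easy to miss is that the hex netmasks on the two 10.0.0.x interfaces differ between the servers. The following is a minimal sketch of the arithmetic using Python's standard `ipaddress` module. The helper names `hex_netmask_to_prefix` and `interface_network` are made up for illustration, and nothing in this thread establishes whether the mismatch has anything to do with the link problem.

```python
import ipaddress


def hex_netmask_to_prefix(hex_mask):
    """Convert an ifconfig-style hex netmask (e.g. '0xffff0000') to a prefix length."""
    mask = ipaddress.IPv4Address(int(hex_mask, 16))
    # IPv4Network accepts "address/netmask" and rejects non-contiguous masks.
    return ipaddress.IPv4Network(f"0.0.0.0/{mask}").prefixlen


def interface_network(addr, hex_mask):
    """Return the network an interface address belongs to, given its hex netmask."""
    prefix = hex_netmask_to_prefix(hex_mask)
    return ipaddress.ip_interface(f"{addr}/{prefix}").network


# Values copied from the cxgb0 ("Storage") and ix0 ("Replication") output above.
storage = interface_network("10.0.0.5", "0xff000000")
replication = interface_network("10.0.0.9", "0xffff0000")
print("Storage:    ", storage)
print("Replication:", replication)
print("Same network definition:", storage == replication)
```

With the values above, "Storage" is configured as part of a /8 and "Replication" as part of a /16. Both servers can still reach each other at these particular addresses, but the two hosts disagree about how large the local network is.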
 

hescominsoon

Patron
Joined
Jul 27, 2016
Messages
456
What SFP+ transceiver are you using in the Ubiquiti switch? Also, what version of firmware are you running on the switch? Ubiquiti is very good about keeping their firmware up to date and fixing bugs... but if you are not using the right transceiver you will have issues.
 
Joined
Nov 1, 2017
Messages
6
I have the same problem with a D-Link DGS-1510-28X, where every time a server reboots I have to manually replug the 10GbE SFP+ cables for the connection to be restored. I'm using Ubuntu 17.10 across the board and it happens on every server, though not every time. This leads me to believe it might be a Linux-related issue, especially since the 10GbE standard doesn't support auto-negotiation.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Since it happens on FreeBSD, which is definitely not Linux, I'm inclined to blame the switches.
 