SOLVED 10GbE direct connection FreeNAS<-->Workstation stopped working

Status
Not open for further replies.

MHobbes

Cadet
Joined
Mar 27, 2015
Messages
2
Summary:

I had a working 10GbE connection between a FreeNAS box and a separate dual-boot Windows/Linux system that stopped functioning after I updated the FreeNAS box. Booting the previous version didn't restore functionality. I need some help troubleshooting! :)



I've been running this particular FreeNAS system since January of this year and recently added a spare Chelsio T3 10GbE card for a direct connection (no switch involved) to a workstation. At the time of install, the hardware was running FreeNAS-9.3-STABLE-201503200528. I directly connected the two systems using an SFP+ DAC cable -- the system on the other end has a Qlogic dual port 10GbE card in it -- and configured both sides of the link with a static IP address. Connectivity over the direct link worked well after this, and I was able to transfer data over the link at several hundred megabytes per second, as expected. I used this configuration for several days without incident; the workstation was powered down daily, breaking the link, but the link always came up the next time the workstation was turned on.

Today, the FreeNAS system was updated to FreeNAS-9.3-STABLE-201503270027 (no hardware was touched), and now the link between the systems is only functional for a short time while FreeNAS boots. After a certain point in the boot process, it is almost as if a firewall has come up: the link still appears to be up and active on both sides, but pings in either direction fail. A ping started while the FreeNAS system is booting responds normally at first, but consistently stops working right after the FreeNAS console shows "Starting ntpd." Once the subsequent "Importing account for <user>...ok" messages start appearing, ping starts returning "Destination Host Unreachable". I tried booting the previous version of FreeNAS, but the behavior persisted. I have reseated the cable, reconfigured the interfaces from scratch, disabled and re-enabled the interface on either side, and rebooted both systems countless times, yet I can't get the link working again.

Anyone have any insights as to what might be happening here? I've swapped out the Chelsio T3 card with another similar card and got the same results. Just to rule out a problem on the other side of the link, I swapped out the Qlogic card on that end as well. The behavior doesn't change - there is connectivity initially, which is subsequently lost once the boot process gets to a certain point.

Any pointers as to what my next troubleshooting steps should be would be greatly appreciated. I'm at a loss as to what I've screwed up here since it was all working great prior to today's reboot.

Thanks in advance!

Regards,

Michael

Hardware

Motherboard: SuperMicro X10SL7-F
CPU: Intel Xeon E3-1270 v3
Memory: Crucial 16GB (8GBx2) ECC
Network: Onboard Intel GbE
Network (add-on): Chelsio T3 10GbE
Power Supply: Seasonic 500w
Boot Device: SATA DoM
Storage: WD Red (x4) in RAIDZ2
Other Notes: Onboard LSI 2308 Flashed IT, v16 firmware

Software: FreeNAS-9.3-STABLE-201503270027

00:00.0 Host bridge: Intel Corporation Xeon E3-1200 v3 Processor DRAM Controller (rev 06)
00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor PCI Express x16 Controller (rev 06)
00:01.1 PCI bridge: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor PCI Express x8 Controller (rev 06)
00:14.0 USB controller: Intel Corporation 8 Series/C220 Series Chipset Family USB xHCI (rev 05)
00:1a.0 USB controller: Intel Corporation 8 Series/C220 Series Chipset Family USB EHCI #2 (rev 05)
00:1c.0 PCI bridge: Intel Corporation 8 Series/C220 Series Chipset Family PCI Express Root Port #1 (rev d5)
00:1c.2 PCI bridge: Intel Corporation 8 Series/C220 Series Chipset Family PCI Express Root Port #3 (rev d5)
00:1c.3 PCI bridge: Intel Corporation 8 Series/C220 Series Chipset Family PCI Express Root Port #4 (rev d5)
00:1d.0 USB controller: Intel Corporation 8 Series/C220 Series Chipset Family USB EHCI #1 (rev 05)
00:1f.0 ISA bridge: Intel Corporation C222 Series Chipset Family Server Essential SKU LPC Controller (rev 05)
00:1f.2 SATA controller: Intel Corporation 8 Series/C220 Series Chipset Family 6-port SATA Controller 1 [AHCI mode] (rev 05)
00:1f.3 SMBus: Intel Corporation 8 Series/C220 Series Chipset Family SMBus Controller (rev 05)
00:1f.6 Signal processing controller: Intel Corporation 8 Series Chipset Family Thermal Management Controller (rev 05)
01:00.0 Ethernet controller: Chelsio Communications Inc S310-CR 10GbE Single Port Adapter
02:00.0 Serial Attached SCSI controller: LSI Logic / Symbios Logic SAS2308 PCI-Express Fusion-MPT SAS-2 (rev 05)
03:00.0 PCI bridge: ASPEED Technology, Inc. AST1150 PCI-to-PCI Bridge (rev 03)
04:00.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev 30)
05:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)
06:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)

hostb0@pci0:0:0:0: class=0x060000 card=0x080415d9 chip=0x0c088086 rev=0x06 hdr=0x00
vendor = 'Intel Corporation'
class = bridge
subclass = HOST-PCI

pcib1@pci0:0:1:0: class=0x060400 card=0x080415d9 chip=0x0c018086 rev=0x06 hdr=0x01
vendor = 'Intel Corporation'
class = bridge
subclass = PCI-PCI

pcib2@pci0:0:1:1: class=0x060400 card=0x080415d9 chip=0x0c058086 rev=0x06 hdr=0x01
vendor = 'Intel Corporation'
class = bridge
subclass = PCI-PCI

none0@pci0:0:20:0: class=0x0c0330 card=0x080415d9 chip=0x8c318086 rev=0x05 hdr=0x00
vendor = 'Intel Corporation'
class = serial bus
subclass = USB

ehci0@pci0:0:26:0: class=0x0c0320 card=0x080415d9 chip=0x8c2d8086 rev=0x05 hdr=0x00
vendor = 'Intel Corporation'
class = serial bus
subclass = USB

pcib3@pci0:0:28:0: class=0x060400 card=0x080415d9 chip=0x8c108086 rev=0xd5 hdr=0x01
vendor = 'Intel Corporation'
class = bridge
subclass = PCI-PCI

pcib5@pci0:0:28:2: class=0x060400 card=0x080415d9 chip=0x8c148086 rev=0xd5 hdr=0x01
vendor = 'Intel Corporation'
class = bridge
subclass = PCI-PCI

pcib6@pci0:0:28:3: class=0x060400 card=0x080415d9 chip=0x8c168086 rev=0xd5 hdr=0x01
vendor = 'Intel Corporation'
class = bridge
subclass = PCI-PCI

ehci1@pci0:0:29:0: class=0x0c0320 card=0x080415d9 chip=0x8c268086 rev=0x05 hdr=0x00
vendor = 'Intel Corporation'
class = serial bus
subclass = USB

isab0@pci0:0:31:0: class=0x060100 card=0x080415d9 chip=0x8c528086 rev=0x05 hdr=0x00
vendor = 'Intel Corporation'
class = bridge
subclass = PCI-ISA

ahci0@pci0:0:31:2: class=0x010601 card=0x080415d9 chip=0x8c028086 rev=0x05 hdr=0x00
vendor = 'Intel Corporation'
class = mass storage
subclass = SATA

none1@pci0:0:31:3: class=0x0c0500 card=0x080415d9 chip=0x8c228086 rev=0x05 hdr=0x00
vendor = 'Intel Corporation'
class = serial bus
subclass = SMBus

none2@pci0:0:31:6: class=0x118000 card=0x080415d9 chip=0x8c248086 rev=0x05 hdr=0x00
vendor = 'Intel Corporation'
class = dasp

cxgbc0@pci0:1:0:0: class=0x020000 card=0x00011425 chip=0x00351425 rev=0x00 hdr=0x00
vendor = 'Chelsio Communications Inc'
device = 'S310-CR 10GbE Single Port Adapter'
class = network
subclass = ethernet

mps0@pci0:2:0:0: class=0x010700 card=0x069115d9 chip=0x00861000 rev=0x05 hdr=0x00
vendor = 'LSI Logic / Symbios Logic'
device = 'SAS2308 PCI-Express Fusion-MPT SAS-2'
class = mass storage
subclass = SAS

pcib4@pci0:3:0:0: class=0x060400 card=0x080415d9 chip=0x11501a03 rev=0x03 hdr=0x01
vendor = 'ASPEED Technology, Inc.'
device = 'AST1150 PCI-to-PCI Bridge'
class = bridge
subclass = PCI-PCI

vgapci0@pci0:4:0:0: class=0x030000 card=0x080415d9 chip=0x20001a03 rev=0x30 hdr=0x00
vendor = 'ASPEED Technology, Inc.'
device = 'ASPEED Graphics Family'
class = display
subclass = VGA

igb0@pci0:5:0:0: class=0x020000 card=0x153315d9 chip=0x15338086 rev=0x03 hdr=0x00
vendor = 'Intel Corporation'
class = network
subclass = ethernet

igb1@pci0:6:0:0: class=0x020000 card=0x153315d9 chip=0x15338086 rev=0x03 hdr=0x00
vendor = 'Intel Corporation'
class = network
subclass = ethernet

Code:
mhobbes@nas:~ % ifconfig cxgb0
cxgb0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    options=6c07bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWTSO,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
    ether 00:07:43:06:bc:ad
    inet 10.0.5.50 netmask 0xffffff00 broadcast 10.0.5.255
    nd6 options=9<PERFORMNUD,IFDISABLED>
    media: Ethernet 10Gbase-Twinax <full-duplex>
    status: active


Code:
mhobbes@workstation ~ $ ping nas10
PING nas10 (10.0.5.50) 56(84) bytes of data.
64 bytes from nas10 (10.0.5.50): icmp_seq=16 ttl=64 time=998 ms
64 bytes from nas10 (10.0.5.50): icmp_seq=17 ttl=64 time=0.056 ms
64 bytes from nas10 (10.0.5.50): icmp_seq=18 ttl=64 time=0.090 ms
64 bytes from nas10 (10.0.5.50): icmp_seq=19 ttl=64 time=0.111 ms
64 bytes from nas10 (10.0.5.50): icmp_seq=20 ttl=64 time=0.104 ms
64 bytes from nas10 (10.0.5.50): icmp_seq=21 ttl=64 time=0.105 ms
64 bytes from nas10 (10.0.5.50): icmp_seq=22 ttl=64 time=0.100 ms
64 bytes from nas10 (10.0.5.50): icmp_seq=23 ttl=64 time=0.097 ms
64 bytes from nas10 (10.0.5.50): icmp_seq=24 ttl=64 time=0.091 ms
64 bytes from nas10 (10.0.5.50): icmp_seq=25 ttl=64 time=0.091 ms
64 bytes from nas10 (10.0.5.50): icmp_seq=26 ttl=64 time=0.093 ms
64 bytes from nas10 (10.0.5.50): icmp_seq=27 ttl=64 time=0.113 ms
64 bytes from nas10 (10.0.5.50): icmp_seq=28 ttl=64 time=0.109 ms
64 bytes from nas10 (10.0.5.50): icmp_seq=29 ttl=64 time=0.100 ms
64 bytes from nas10 (10.0.5.50): icmp_seq=30 ttl=64 time=0.099 ms
64 bytes from nas10 (10.0.5.50): icmp_seq=31 ttl=64 time=0.093 ms
64 bytes from nas10 (10.0.5.50): icmp_seq=32 ttl=64 time=0.095 ms
From nas10 (10.0.5.50) icmp_seq=59 Destination Host Unreachable
From nas10 (10.0.5.50) icmp_seq=60 Destination Host Unreachable
From nas10 (10.0.5.50) icmp_seq=61 Destination Host Unreachable
From nas10 (10.0.5.50) icmp_seq=62 Destination Host Unreachable
From nas10 (10.0.5.50) icmp_seq=63 Destination Host Unreachable
From nas10 (10.0.5.50) icmp_seq=64 Destination Host Unreachable
^C
--- nas10 ping statistics ---
65 packets transmitted, 17 received, +6 errors, 73% packet loss, time 64118ms
rtt min/avg/max/mdev = 0.056/58.800/998.059/234.814 ms, pipe 4
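
For anyone trying to line up the moment the link dies with the boot messages on the FreeNAS console, a saved ping log can be summarized mechanically. The following is a minimal Python sketch, assuming the Linux (iputils) ping line format shown in the capture above; the function name `summarize_ping` and the `SAMPLE` excerpt are illustrative only.

```python
import re

# Reply lines look like:
#   64 bytes from nas10 (10.0.5.50): icmp_seq=32 ttl=64 time=0.095 ms
REPLY_RE = re.compile(r"bytes from .*icmp_seq=(\d+) ttl=")
# Error lines look like:
#   From nas10 (10.0.5.50) icmp_seq=59 Destination Host Unreachable
ERROR_RE = re.compile(r"^From .*icmp_seq=(\d+) Destination Host Unreachable")


def summarize_ping(text):
    """Return (last_reply_seq, first_error_seq) from saved ping output.

    Either element is None when no matching line is present.
    """
    replies = []
    errors = []
    for line in text.splitlines():
        line = line.strip()
        match = REPLY_RE.search(line)
        if match:
            replies.append(int(match.group(1)))
            continue
        match = ERROR_RE.match(line)
        if match:
            errors.append(int(match.group(1)))
    last_reply = max(replies) if replies else None
    first_error = min(errors) if errors else None
    return last_reply, first_error


# Short excerpt in the same format as the capture above.
SAMPLE = """\
64 bytes from nas10 (10.0.5.50): icmp_seq=31 ttl=64 time=0.093 ms
64 bytes from nas10 (10.0.5.50): icmp_seq=32 ttl=64 time=0.095 ms
From nas10 (10.0.5.50) icmp_seq=59 Destination Host Unreachable
From nas10 (10.0.5.50) icmp_seq=60 Destination Host Unreachable
"""

print(summarize_ping(SAMPLE))  # prints (32, 59)
```

With ping's default one-second interval, the sequence numbers double as a rough timeline, so the last reply at 32 and the first error at 59 bracket the window in which connectivity was lost.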


Code:
mhobbes@nas:~ % arp -a
nas (10.0.4.50) at 0c:c4:7a:c2:d1:d8 on igb0 permanent [ethernet]
? (10.0.5.25) at 78:e3:b5:f4:c3:d2 on cxgb0 expires in 977 seconds [ethernet]
nas10 (10.0.5.50) at 00:07:43:06:bc:ad on cxgb0 permanent [ethernet]


Code:
mhobbes@workstation ~ $ ifconfig eth2
eth2      Link encap:Ethernet  HWaddr 78:e3:b5:f4:c3:d2 
          inet addr:10.0.5.25  Bcast:10.0.5.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:23 errors:0 dropped:0 overruns:0 frame:0
          TX packets:200 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:2276 (2.2 KB)  TX bytes:26202 (26.2 KB)
          Interrupt:48
 

MHobbes

Cadet
Joined
Mar 27, 2015
Messages
2
I just wanted to follow up in case anyone reads this thread and was curious...

I spent the weekend trying everything and anything I could think of: I reinstalled, used a brand new flash device for FreeNAS, tried different versions of FreeNAS, tested with a completely different client system, played with many different cards and cables, etc.

I stumbled upon the fix after trying a Qlogic card (unsupported by FreeNAS) with a clean install of Linux on the FreeNAS server hardware. That card wouldn't show up in the output of lspci at all, which led me to believe something strange was happening with the PCIe slot it was in. From there, I started digging through the BIOS on the X10SL7-F.

The manual doesn't match up well with all of the settings in the BIOS, but I eventually wound up in:

Code:
Advanced -> Chipset Configuration -> System Agent (SA) Configuration -> PCIe Configuration


Note that this is different from the "PCIe/PCI/PnP Configuration" sub-menu of Advanced, which is much more prominent (and thus where I was looking initially). On this particular screen, the BIOS sets the PCI-E slot6 to "Auto" by default, and offers Gen1-Gen3 as options.

This (Auto) is how mine was set, and the Chelsio T3 card had been working fine with the BIOS default set like this for a while. Once I manually changed the setting to "Gen2", the card worked perfectly, as it did before the fateful reboot last week. Since making this change, I have been unable to reproduce the problem -- it seems stable across reboots, cold boots, etc. and has transferred many, many more GB of data over the 10GbE link without incident. When I change that setting back to Auto, the problems immediately return.

I tested the other options just for the sake of completeness and found that with the slot set manually to Gen1 or Gen2, the card worked perfectly. Setting it to Gen3 or Auto caused the aforementioned problems.
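
One way to confirm what the slot actually negotiated after changing the BIOS setting is to compare the LnkCap (what the device can do) and LnkSta (what was negotiated) lines that `lspci -vv` prints on Linux. The sketch below is a minimal Python parser for those two lines. The sample text is hypothetical and was not captured from the system in this thread, and the exact field layout varies between lspci versions, so treat the regular expression as an assumption. On FreeNAS itself, `pciconf -lc` reports the link width and speed in a different format that this sketch does not parse.

```python
import re

# Per-lane signalling rate in GT/s for each generation the BIOS offers.
GEN_BY_RATE = {2.5: "Gen1", 5.0: "Gen2", 8.0: "Gen3"}

# Matches the "LnkCap:" and "LnkSta:" lines only; the colon directly after
# the field name keeps "LnkCap2:" and "LnkSta2:" from matching.
LINK_RE = re.compile(r"(LnkCap|LnkSta):.*?Speed ([\d.]+)GT/s.*?Width x(\d+)")


def parse_links(lspci_vv_text):
    """Return {'LnkCap': (gen, lanes), 'LnkSta': (gen, lanes)} for one device."""
    result = {}
    for line in lspci_vv_text.splitlines():
        match = LINK_RE.search(line)
        if match:
            field = match.group(1)
            rate = float(match.group(2))
            lanes = int(match.group(3))
            result[field] = (GEN_BY_RATE.get(rate, "unknown"), lanes)
    return result


# Hypothetical excerpt, for illustration only.
SAMPLE = """\
01:00.0 Ethernet controller: (example device)
        LnkCap: Port #0, Speed 2.5GT/s, Width x8, ASPM L0s L1
        LnkSta: Speed 2.5GT/s, Width x8, TrErr- Train- SlotClk+
"""

print(parse_links(SAMPLE))
```

If LnkSta reports a lower generation or fewer lanes than LnkCap, the link trained below what the card supports, which is worth knowing before blaming drivers or cables.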

I have zero idea as to why it worked fine for many TB of data transfers on the "Auto" setting previously, and then suddenly stopped working on a warm reboot. It presented a strange set of symptoms that made it difficult to track down but it certainly seems fixed, at least for now. I'll try to update this thread if any new quirks pop up.


For what it's worth, the T3 card in question is supposedly PCIe 1.1. The Qlogic card that wouldn't show up at all is apparently PCIe 2.0. The X10SL7-F motherboard I'm using is silk-screened v1.01, running the latest BIOS from SuperMicro's English support site (v2.0, dated 4/24/2014).
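
As a sanity check on whether pinning the slot to Gen1 or Gen2 costs any throughput, the raw link bandwidth can be worked out from the per-lane signalling rate and line encoding of each PCIe generation. The Python sketch below does that arithmetic. The lane count of 8 is an assumption for illustration, since the thread does not state the card's link width, and the figures ignore PCIe packet overhead.

```python
# (signalling rate in GT/s, payload bits, total bits) per lane.
# Gen1 and Gen2 use 8b/10b encoding; Gen3 uses 128b/130b.
ENCODING = {
    "Gen1": (2.5, 8, 10),
    "Gen2": (5.0, 8, 10),
    "Gen3": (8.0, 128, 130),
}


def link_gbps(gen, lanes):
    """Raw usable bandwidth in Gbit/s, per direction, before packet overhead."""
    rate, payload_bits, total_bits = ENCODING[gen]
    return rate * payload_bits / total_bits * lanes


LANES = 8  # assumed link width, not stated in the thread

for gen in ("Gen1", "Gen2", "Gen3"):
    print(gen, round(link_gbps(gen, LANES), 1), "Gbit/s")
```

Under that assumption even a Gen1 link offers about 16 Gbit/s per direction, so forcing the slot to Gen1 or Gen2 should still leave headroom for a single 10GbE port.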
 
Joined
Oct 2, 2014
Messages
925
That's actually one of the last things I would look at, unless I had done a BIOS upgrade the night or week before and figured that maybe the upgrade had reset something to default. But it's good that you got it working; stuff's never fun when it isn't working right.
 

Chromatics

Cadet
Joined
Mar 16, 2017
Messages
2
This is an old thread, but I thought it would be better to let others know, as this problem affected me too.

I'm also using a Supermicro X10SL7-F. I installed an Intel 10Gbps card with the 82598 chip (EXPX9501AFXSR) and suffered the same problem.
Following MHobbes's solution solved my issue, too.

This may be a problem specific to the Supermicro X10SL7-F. Also, my MB's firmware version is 3.27.
 