Intel NIC performance (FN 9.2.1.7)

Status
Not open for further replies.

ctd_uk

Cadet
Joined
Sep 27, 2014
Messages
8
(from a complete FN newbie with no Linux or BSD fluency; please treat gently)

I hastily installed a fresh copy of FN onto a less-than-ideal but available machine in order to avert a near-disaster with an inherited, complex Linux/DRBD/ESXi setup that had drives failing and far too steep a learning curve for me to manage crisis avoidance. All critical data has been successfully moved to the FN box, and ESXi is now more or less happily talking to FN.

Many thanks to the FN team for that.

Now waiting on sundry bits of hardware to arrive to create a more suitable platform.
The present scenario has one disc-less machine running ESXi (small Windows SBS, Linux firewall, etc.). The VMs are resident on the (primary) FN machine. There is also a secondary FN machine. Each of these machines is connected to a consumer-grade 8-port gigabit switch (subnet 192.168.218.xxx) which is exclusively for server-to-server traffic. They are also connected via a 24-port gigabit switch to the rest of the LAN (subnet 192.168.16.xxx). In all three machines the system drive is a USB stick, typically 8GB or 16GB.

Almost all of the NICs involved are Intel - integral (motherboard), PCI, or PCIe. One - on the original life-saver machine - is a Realtek. That machine, which will be replaced when the new hardware arrives, is presently the primary FN box. Both of the FN boxes use raidz2. The primary box has a 4-bay hot-swap enclosure containing 1TB drives; the secondary has an 8-bay hot-swap enclosure containing 750GB drives.

The transfer of data from FN1 to FN2 is extremely slow, with Intel NICs at each end. A zfs send/recv of a volume of approximately 300GB delivers about 8MB/s. A copy of the same volume of data from FN2 to another dataset on FN2 achieves about 61MB/s - not great, but acceptable given the read/write contention. This suggests that the FN1-to-FN2 transfer is network-constrained, not disc-constrained. The network utilisation, as shown in the GUI Reporting page, does not exceed 100Mbps.
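For what it's worth, the send/recv measurement was nothing more sophisticated than wrapping the command in 'time'; roughly along these lines - the pool/dataset names are made up for illustration, and ssh stands in for however the stream actually travels between the boxes:

time zfs send tank1/vms@manual-001 | ssh root@192.168.218.50 zfs recv -F tank2/vms

Dividing the stream size by the elapsed time is where the roughly 8MB/s figure comes from.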

The FN1 box also communicates with the ESXi box. Initially, this connection used the Realtek adapter. Every 30 to 40 minutes there is a sustained burst of activity lasting about 7 to 10 minutes. The FN network graph and the vSphere network graph agree: the traffic is about 600Mbps.

Because this 'heartbeat' workload is so predictable, as a test I swapped the cables between the Intel and Realtek NICs and swapped the (fixed) IP addresses. When the Intel NIC was doing the job, the speed again dropped to less than 100Mbps. Equally, when the assignments/cables were reinstated, the network transfer rate went back up to 600Mbps or so. The Realtek was significantly out-performing the Intel. More accurately, the Intel was massively under-performing.

Everything I have read regarding FN suggests that (barring 10GbE) Intel is the way to go for NICs, and that autosense should be good enough. ifconfig output confirms that all NICs are operating at 1000Mbps.
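For reference, this is how I have been checking the negotiated speed from the shell, and (as I understand it - untested on my part) how the media could be forced if autoselect were the culprit; the persistent way would apparently be to put the same media options into the interface's Options field in the GUI:

ifconfig em0 | grep media                             (currently reports 1000baseT <full-duplex> on every NIC)
ifconfig em0 media 1000baseT mediaopt full-duplex     (forces gigabit full duplex; only lasts until reboot)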

Can someone please shed some light on where my actions/thinking have gone astray, and/or what I can do to get the Intel NICs up to acceptable performance levels? I suppose I should add that I have not - as far as I know - altered any config files other than through the GUI.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
If you detail all involved machines and how the NICs interface with them, we might be able to provide some insight.

My hunch is that it's a hardware issue. I imagine the machine you call "the original life-saver" might be the kind of setup that characteristically produces such results.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Cut out any hardware between the two machines, use a direct Ethernet cable, and then retest. Also specify how you are testing the transfer; many folks state certain transfer rates and we later find out they tested incorrectly, so it was just a waste of time. I don't know how reliable zfs send/receive is for this kind of testing either; maybe someone can post whether it's suitable for the purpose.
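If memory serves, iperf ships with FreeNAS, so you can take the discs and ZFS out of the picture entirely and measure the raw network path; something along these lines, substituting your own addresses:

iperf -s                                 (on FN2: start the server side)
iperf -c 192.168.218.50 -t 30 -i 5       (on FN1: push traffic to FN2 for 30 seconds, reporting every 5)

A healthy gigabit link should report somewhere north of 900 Mbits/sec; if you see roughly 94 Mbits/sec, the path is effectively running at 100Mb no matter what ifconfig claims.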
 

ctd_uk

Cadet
Joined
Sep 27, 2014
Messages
8
Thanks for the responses.

@Ericloewe
I hope this is not too much info:
FN1 (Primary)
Mbd: Asus P7Q57-M DO
Bios: 1202 2011/05/11 (chkd 2014-09)
CPU: Intel Core i5 650 (dual-core 3.2GHz)
RAM: 8GB DDR3 1333MHz unbuff ... not ECC
OS: FreeNAS 9.2.1.7 x64 (FreeBSD)
Boot: Kingston Traveller 16GB USB 2.0
LAN: 1 x Intel Pro/1000 (mbd); won't perform well
1 x Realtek 8169/8169S etc. GbE (PCI)
1 x Intel Pro/1000 (PCIe); won't perform well
[root@FreeNAS] ~# ifconfig
em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=4219b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,WOL_MAGIC,VLAN_HWTSO>
ether e0:cb:4e:d3:b4:6b
inet 192.168.16.57 netmask 0xffffff00 broadcast 192.168.16.255
nd6 options=9<PERFORMNUD,IFDISABLED>
media: Ethernet autoselect (1000baseT <full-duplex>)
status: active
re0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=8209b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,WOL_MAGIC,LINKSTATE>
ether 00:0a:cd:1e:62:f5
inet 192.168.218.57 netmask 0xffffff00 broadcast 192.168.218.255
nd6 options=9<PERFORMNUD,IFDISABLED>
media: Ethernet autoselect (1000baseT <full-duplex>)
status: active
ipfw0: flags=8800<SIMPLEX,MULTICAST> metric 0 mtu 65536
nd6 options=9<PERFORMNUD,IFDISABLED>
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
options=600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6>
inet6 ::1 prefixlen 128
inet6 fe80::1%lo0 prefixlen 64 scopeid 0x7
inet 127.0.0.1 netmask 0xff000000
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>

FN2 (Secondary)
Mbd: Asus M4A88T-M/USB3
Bios: 0704 2012/01/30 (chkd 2014-09)
CPU: AMD Phenom II X4 955 (Quad-core 3.2GHz)
RAM: 16GB DDR3 1600MHz ECC unbuff Corsair
OS: FreeNAS 9.2.1.7 x64 (FreeBSD)
Boot: SanDisk 8GB usb 2.0
LAN: 1 x Intel Pro/1000 (PCI); need to measure performance
1 x Intel Pro/1000 (PCIe); need to measure performance
GbE on mbd not used
[root@RA_NAS_01] ~# ifconfig
em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=4219b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,WOL_MAGIC,VLAN_HWTSO>
ether 00:1b:21:a4:c5:fb
inet 192.168.218.50 netmask 0xffffff00 broadcast 192.168.218.255
nd6 options=9<PERFORMNUD,IFDISABLED>
media: Ethernet autoselect (1000baseT <full-duplex>)
status: active
em1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=209b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,WOL_MAGIC>
ether 00:1b:21:a3:60:ab
inet 192.168.16.50 netmask 0xffffff00 broadcast 192.168.16.255
nd6 options=9<PERFORMNUD,IFDISABLED>
media: Ethernet autoselect (1000baseT <full-duplex>)
status: active
ipfw0: flags=8801<UP,SIMPLEX,MULTICAST> metric 0 mtu 65536
nd6 options=9<PERFORMNUD,IFDISABLED>
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
options=600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6>
inet6 ::1 prefixlen 128
inet6 fe80::1%lo0 prefixlen 64 scopeid 0xb
inet 127.0.0.1 netmask 0xff000000
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
@joeschmuck
As I see it, there are two distinct pieces of timing evidence.

One is the declared rate between the ESXi box and the FN primary (using the FN GUI and the vSphere GUI). This is the Intel NIC on the ESXi box and the Realtek NIC on the FN box, reporting approximately 600Mbps. Further evidence of this speed is the perceived performance of the SBS VM running on the ESXi box: it is downright sluggish when the Intel NIC is used on the FN box and acceptable when the Realtek NIC is used.

The second piece of performance data is the zfs send/recv between FN1 and FN2. I genuinely have no idea what is happening at the lowest level - for example, given that the on-disc data is compressed, does send/recv bother to decompress, transmit, and recompress? I choose to ignore that for the present purpose. The gross stats (as declared by the 'time' wrapper that I put around the send/recv) are "273GB stream in 31764 seconds (8.81MB/sec)". The Storage page declares the dataset size as 309GB, so maybe the 273GB is a compressed figure; I don't know. But that difference is for the moment not that significant - not when we are out by a factor of 6 to 9 in performance terms. My measurement of elapsed time is generally in line with the 'time' command.
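If it matters, I believe the size question could be checked with something like the following (dataset name made up); I haven't pursued it because, as I say, the discrepancy is small compared with the speed problem:

zfs get used,referenced,compressratio tank1/vms       (reports on-disc usage and the compression ratio for the dataset)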

By minimising the physical disruption (i.e. literally all I did was move a cable from one NIC to the other and make the compensating IP adjustment), I was trying to eliminate all infrastructure variables. A completely reversible performance change shows the lower performance associated with the Intel NIC and the higher performance associated with the Realtek NIC. I also tried a PCIe Intel NIC and fared no better. I repeat my complete newbie status, but the evidence as I see it strongly hints at a driver and/or config issue with the Intel cards.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Well, those Intel GbE NICs work like a charm for everyone, so something else is up.

On FN2, which NIC connects to FN1? If it's the PCI one, try the PCI-e one. PCI can be a major bottleneck for GbE if there's anything else hanging off the bus.
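If you're not sure which physical slot a given em device actually corresponds to, something like this should show it (the exact output will obviously differ on your boxes):

pciconf -lv | grep -B4 -i ethernet       (lists each network device with its PCI address, vendor and model)
dmesg | grep -E '^(em|re)[0-9]'          (shows which driver attached to which device, and on which bus, at boot)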
 

ctd_uk

Cadet
Joined
Sep 27, 2014
Messages
8
@Ericloewe
"Well, those Intel GbE work like a charm for everyone, so something else is up."

Absolutely. That's why I've been reading everything I could all over the web for many days before posting this enquiry.
I'm just not up to identifying what the relevant "something else" might be.

The 'SAN' subnet (.218.) links FN1 and FN2. On FN2 that's the Intel PCIe. On FN1, that's the Realtek PCI. The 'LAN' subnet (.16.) is served by Intel PCI on both machines.
While the FN1 to FN2 connection is a concern, it uses the same 8-port gigabit switch that FN1 uses to link to the ESXi box.
We use Intel NICs all over the LAN and they work fine.
I have 4 Intel NICs in the ESXi box, and they work fine.
For me, the consistently underperforming combo is Intel NIC (PCI or PCIe) on FN - i.e. the combo that everyone else swears by.

Once again exposing my ignorance ... is there any significance in my having ignored IPv6 in fixing the IP addresses?
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
ctd_uk said: "Once again exposing my ignorance ... is there any significance in my having ignored IPv6 in fixing the IP addresses?"

No, IPv6 is as good as irrelevant at the moment, especially if you're just linking two local machines.

I'd try a direct connection, to rule out some crazy interaction, possibly replacing the NIC with a different one of the same model. Maybe you can shuffle them around to see if the problem follows the NIC?
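For the direct-connection test, nothing fancy is needed - gigabit NICs do auto-MDIX, so a normal patch cable will work. Temporary addresses from the shell are enough for a quick check (these are made-up examples, added as aliases so they don't disturb the existing configuration, and they won't survive a reboot):

ifconfig em0 inet 10.0.0.1 netmask 255.255.255.0 alias     (on FN1, on whichever interface you cabled up)
ifconfig em0 inet 10.0.0.2 netmask 255.255.255.0 alias     (on FN2)
ping 10.0.0.2                                              (from FN1, to confirm the link)

Then rerun the same iperf test over those addresses, and remove the aliases afterwards with the matching 'ifconfig ... -alias' commands.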
 

gpsguy

Active Member
Joined
Jan 22, 2012
Messages
4,472
Edited message to remove incorrect information. I misread the addresses on the NICs.

Are you using iSCSI? If so, you'll need more RAM on FN1.

FN1 only has 8GB of RAM, whereas FN2 has 16GB.
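If you want to see how much of that RAM ZFS is actually getting for its ARC, I believe this sysctl reports it (in bytes):

sysctl kstat.zfs.misc.arcstats.size       (current size of the ZFS ARC)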
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
gpsguy said: "Are you using iSCSI? If so, you'll need more RAM on FN1. FN1 only has 8GB of RAM, whereas FN2 has 16GB."

Valid point, but it doesn't explain why the Realtek is working normally while the Intel is barely working, at sub-100Mb/s speeds.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
8MB/sec looks suspiciously like a link that has negotiated at only 100Mb. You've ruled that out, though.
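One quick sanity check worth doing (not conclusive either way) is the interface error counters; a flaky NIC, port or cable will often show climbing Ierrs/Oerrs:

netstat -i              (per-interface packet and error counts since boot)
netstat -I em0 -w 5     (live counters for em0, updated every 5 seconds)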

Keep in mind that we've had plenty of problems related to NICs at the other end, as well as cabling and network switches that just sucked. It is odd that the Realtek worked better than the Intel, but I'd be more inclined to suspect one of those factors.

I also don't think I need to say it, but buying desktop hardware can lead to "unexpected" and "unexplainable" consequences. No clue why some of them happen, and stuff like this leaves people scratching their heads and wondering what is wrong. But we've found on more than one occasion that simply buying recommended hardware from our stickies resolves the problem, even when reusing the old NIC that was previously working poorly. I've had to push the "I believe" button more than once, because sometimes you can't explain it and it's not worth the time and effort to try.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Sorry, I've never done iSCSI, but if it were me I'd stop using two FN systems while troubleshooting, start with one of them, and conduct throughput testing against a known-fast networked computer. I'd also heed the note about possibly needing more RAM in your 8GB system. Once you are satisfied it works properly, do the same with the second FN computer. After both pass with flying colors, connect them together and pass data between them; it should be fine. I'm certain you will find something wrong with one of those systems or the interfacing network hardware.
 

ctd_uk

Cadet
Joined
Sep 27, 2014
Messages
8
Thanks again for the assistance.

I accept the comments regarding the FN primary. 8GB of non-ECC RAM is the maximum the board will support. The whole FN setup was thrown together in zero time because the then-production system was falling apart by the minute.
With the primitive stop-gap FN setup in place I was able to use vSphere browser to copy data from the failing system to the FN. With that done, I went on to reconfigure the ESXi system to use the FN dataset as the VM repository.
The holding operation was in place.

Still feeling very exposed, and not knowing how effective the FN system was going to be, I built FN2 as a more realistic hardware platform. The early intentions were (a) to get a backup copy of the data and (b) to experiment with ZFS replication to get a handle on timings etc. Again, 16GB of ECC is the maximum for that board. Another machine (FN3) will replace FN1; it will also have 16GB of ECC and a quad-core CPU.

So, I agree that the equipment mix right now is not ideal, and is actively being replaced.
However, it is not simply an academic point that, in my (very limited) experience, every Intel NIC used within the FN machines seems to be giving poor performance, while, using the same cabling and switches, the sole Realtek NIC performs acceptably. As has been commented, it is as though the Intels are being capped at 100Mbps, even though each reports autoselect of 1000Mbps. I obviously need to get this resolved to clarify the proper path forward. It would be nice (for everyone) if it turns out to be a hardware fault; if it is, it's a pretty elusive one.

Because the system is in effect a production system, I have to tread very lightly and with considerable care. I would like to 'promote' FN2 to become the primary, but don't have a full understanding of how to convert the copied snapshot(s) on FN2 into a live dataset (more reading to do, I guess). When that can be achieved, then I can decommission the current FN1 and use FN3 as the backup.
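From the reading I've done so far (please correct me if this is wrong), one way to turn a replicated snapshot into a writable live dataset would be to clone it and then promote the clone; the names below are made up:

zfs clone tank2/vms@auto-latest tank2/vms_live     (writable clone of the most recent replicated snapshot)
zfs promote tank2/vms_live                         (detaches the clone from its origin so it can stand alone)

The other option I've seen mentioned is simply taking the replicated dataset out of read-only with 'zfs set readonly=off', but I still need to confirm which approach is appropriate here.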

Meanwhile, I can try a couple of other Intel NICs in FN1. I'll try different PCI slots and see if there's any variation in result. I suppose I can also try a straight cable connect from FN1 to ESXi.
I'll return with more info as soon as I'm able.
 

ctd_uk

Cadet
Joined
Sep 27, 2014
Messages
8
Well, the wise advisors have it right again ...

One of the Intel PCI NICs has some kind of a fault. Unfortunately, not broken - just slow.
The on-board Intel NIC in FN1 has a similar problem.

One of these two managed to be in each of the (many) tests that I ran, wrecking the performance.
Without either of these faulty units, the network behaviour is much better - typically 750Mbps plus with Intel PCIe, as compared with 600-ish achieved earlier with the Realtek.

As it happens, one of the machines also had a 500W PSU fail today.
It may be that the machine has been wobbling, adding to the confusion.
Having replaced the PSU, everything is up and stable.

Following the advice given, I have completed the build of another machine using 16GB of ECC RAM.
Tomorrow, I shall attempt to promote FN2 to the primary role, install FN3 as the backup and withdraw FN1.

Thanks again for the help - both technical and psychological. Things were getting very confusing there for a while.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Good to know that the problem has been solved, even if the mystery of the misbehaving Intel NICs remains.

Good luck with your migration plan. Hopefully, once everything is properly stabilized, you'll be able to review the whole setup and decide where to go from there.
 