40G NICs not getting full performance

Joined
Dec 29, 2014
Messages
1,135
I just upgraded my two FreeNAS systems from Chelsio T520 NICs to Chelsio T580 40G NICs. Both systems are Cisco C240 M3S units with LSI-9207-8i and LSI-9207-8e HBAs and are running FreeNAS 11.1-U7. The primary unit has dual E5-2637 v2 @ 3.50GHz CPUs and 256G of RAM. The secondary has dual E5-2637 @ 3.00GHz CPUs and 128G of RAM. The switch is a Cisco Nexus 3000 C3064PQ, and I am using 2M OM4 fiber cables and new QSFP+ modules from FS.COM. I guess the performance isn't terrible with iperf3, but it is certainly a lot less than I had hoped/expected.

Code:
Primary as client:
Connecting to host 192.168.252.23, port 5201
[  5] local 192.168.252.27 port 18741 connected to 192.168.252.23 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  1.97 GBytes  16.9 Gbits/sec    0    841 KBytes
[  5]   1.00-2.00   sec  1.73 GBytes  14.9 Gbits/sec    0    845 KBytes
[  5]   2.00-3.00   sec  1.96 GBytes  16.8 Gbits/sec    0    872 KBytes
[  5]   3.00-4.00   sec  1.68 GBytes  14.5 Gbits/sec    0    872 KBytes
[  5]   4.00-5.00   sec  1.71 GBytes  14.7 Gbits/sec    0    897 KBytes
[  5]   5.00-6.00   sec  1.73 GBytes  14.8 Gbits/sec    0    912 KBytes
[  5]   6.00-7.00   sec  1.75 GBytes  15.0 Gbits/sec    0    920 KBytes
[  5]   7.00-8.00   sec  1.87 GBytes  16.1 Gbits/sec    0    958 KBytes
[  5]   8.00-9.00   sec  1.70 GBytes  14.6 Gbits/sec    0    982 KBytes
[  5]   9.00-10.00  sec  1.83 GBytes  15.7 Gbits/sec    0    982 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  17.9 GBytes  15.4 Gbits/sec    0             sender
[  5]   0.00-10.00  sec  17.9 GBytes  15.4 Gbits/sec                  receiver

Secondary as client:
Connecting to host 192.168.252.27, port 5201
[  5] local 192.168.252.23 port 38278 connected to 192.168.252.27 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  2.06 GBytes  17.7 Gbits/sec    0    594 KBytes
[  5]   1.00-2.00   sec  2.05 GBytes  17.6 Gbits/sec    0    594 KBytes
[  5]   2.00-3.00   sec  2.98 GBytes  25.6 Gbits/sec    0   5.44 MBytes
[  5]   3.00-4.00   sec  3.03 GBytes  26.0 Gbits/sec  530    818 KBytes
[  5]   4.00-5.00   sec  2.87 GBytes  24.6 Gbits/sec    0    818 KBytes
[  5]   5.00-6.00   sec  2.80 GBytes  24.1 Gbits/sec    0    818 KBytes
[  5]   6.00-7.00   sec  2.82 GBytes  24.2 Gbits/sec   66    495 KBytes
[  5]   7.00-8.00   sec  2.05 GBytes  17.6 Gbits/sec    0    595 KBytes
[  5]   8.00-9.00   sec  2.05 GBytes  17.6 Gbits/sec    0    595 KBytes
[  5]   9.00-10.00  sec  2.05 GBytes  17.6 Gbits/sec    0    595 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  24.8 GBytes  21.3 Gbits/sec  596             sender
[  5]   0.00-10.02  sec  24.7 GBytes  21.2 Gbits/sec                  receiver

The switch is running NX-OS version 7.0(3)I7(6). I am not using jumbo frames at the moment. The ESXi hosts access the FreeNAS datastores (RAIDZ2 pools) via NFS, but I haven't compared anything with the datastores yet; I am only looking at the iperf3 data.
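
For reference, those numbers are from plain single-stream runs; I didn't capture the command lines above, but they were essentially of this form (server on one box, client on the other):
Code:
# on the box acting as server
iperf3 -s
# on the box acting as client (10-second, single-stream TCP test)
iperf3 -c 192.168.252.23 -t 10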

These are the tunables from the primary:
[attachment: screenshot of tunables on the primary]

And these are the tunables from the secondary.
[attachment: screenshot of tunables on the secondary]


The switch port configs could hardly be more vanilla:
Code:
interface Ethernet1/49
  description FreeNAS2
  switchport access vlan 252
  spanning-tree link-type point-to-point

interface Ethernet1/50
  description FreeNAS
  switchport access vlan 252
  spanning-tree link-type point-to-point

FYI, FreeNAS2 is the primary. Any ideas on where to hunt?
 
Joined
Dec 29, 2014
Messages
1,135
No joy. It works, but iperf throughput is much less than I expected. I wonder if perhaps the boxes are CPU bound, but I am not sure.
 

Jessep

Patron
Joined
Aug 19, 2018
Messages
379
If I recall correctly, 40Gb is actually 4x 10Gb lanes paired together, whereas 100Gb is 4x 25Gb.

Should you expect to see 40Gb on a single stream?

If I'm way off above, the second question would be: can you actually generate 40Gb of traffic? Is it a limitation on send or on receive?
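
For example, iperf3's reverse mode should let you separate the two directions over the same path (just a sketch, using the addresses from your output):
Code:
# client sends to server (tests the client's transmit path)
iperf3 -c 192.168.252.23
# -R reverses the direction: the server sends, the client receives
iperf3 -c 192.168.252.23 -R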
 
Joined
Dec 29, 2014
Messages
1,135
I can definitely get more than 10G in a single stream. What you are saying is true of a LAGG, but I don't think it is true of this particular NIC/switch combination. I can get 15G+ on the slower server and 25G+ on the faster one. That is why I am wondering if it is CPU bound or whether there is some more tuning I need to do. The two FreeNAS boxes are the only ones I have with 40G connections. I have done a lot of 10G, but not a lot of 40G. I can't help but wonder if it is CPU/resource related, since the faster server with more memory can generate significantly more traffic than the slower one, but I am not sure where to go next.
 

Jessep

Patron
Joined
Aug 19, 2018
Messages
379
I was thinking of "lanes".
https://www.theregister.co.uk/2017/02/06/decoding_25gb_ethernet_and_beyond/
Faster Ethernet was eventually needed. It was decided that the next steps were to be 40Gb and 100Gb. With everyone having had so much fun the last time 40Gb is actually 4x 10.3125Gb lanes. Meanwhile, 100Gb can come in either 10x 10.3125Gb lanes or 4x 25.78125Gb lanes. Because of course it can.

Also this link from Mellanox
https://blog.mellanox.com/2016/03/25-is-the-new-10-50-is-the-new-40-100-is-the-new-amazing/
 
Joined
Dec 29, 2014
Messages
1,135
I think there must be something hardware-wise that is limiting me to around 23G.
Code:
Connecting to host 192.168.252.27, port 5201
[  5] local 192.168.252.27 port 14304 connected to 192.168.252.27 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  2.53 GBytes  21.7 Gbits/sec   35   2.26 MBytes
[  5]   1.00-2.00   sec  2.64 GBytes  22.7 Gbits/sec    0   2.26 MBytes
[  5]   2.00-3.00   sec  2.74 GBytes  23.6 Gbits/sec    0   2.26 MBytes
[  5]   3.00-4.00   sec  2.79 GBytes  23.9 Gbits/sec    0   2.26 MBytes
[  5]   4.00-5.00   sec  2.60 GBytes  22.3 Gbits/sec    7   4.05 MBytes
[  5]   5.00-6.00   sec  2.68 GBytes  23.0 Gbits/sec    0   4.05 MBytes
[  5]   6.00-7.00   sec  2.73 GBytes  23.5 Gbits/sec    0   4.05 MBytes
[  5]   7.00-8.00   sec  2.72 GBytes  23.4 Gbits/sec   23   2.15 MBytes
[  5]   8.00-9.00   sec  2.54 GBytes  21.8 Gbits/sec    0   2.15 MBytes
[  5]   9.00-10.00  sec  2.69 GBytes  23.1 Gbits/sec    0   3.46 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  26.7 GBytes  22.9 Gbits/sec   65             sender
[  5]   0.00-10.00  sec  26.7 GBytes  22.9 Gbits/sec                  receiver

This is the faster of my two units talking to itself. I am a little confused by the retries; I haven't seen that when running iperf before. So even if the 40G is multiple lanes, it does appear that it bonds rather than load balances. It feels a little silly to whine about only being able to push 20G on the network, but I am a little disappointed. I would certainly contemplate upgrading some components, but I am not sure which ones those would be.
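
If the retries keep showing up, the system-wide TCP counters are one place to look for retransmits (a quick check, not something I have dug into deeply yet):
Code:
# system-wide TCP statistics; the retransmit counters are the interesting lines
netstat -s -p tcp | grep -i retrans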

It does seem that four streams is where I get the very highest values from iperf; it drops off when I go to eight (the commands are shown after the results below).
4 streams from faster unit.
Code:
[SUM]   0.00-10.00  sec  29.6 GBytes  25.4 Gbits/sec    0             sender
[SUM]   0.00-10.00  sec  29.6 GBytes  25.4 Gbits/sec                  receiver

4 streams on slower unit.
Code:
[SUM]   0.00-10.00  sec  23.1 GBytes  19.9 Gbits/sec    0             sender
[SUM]   0.00-10.00  sec  23.1 GBytes  19.9 Gbits/sec                  receiver
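
The multi-stream runs above were of this general form (a sketch; the other box was running iperf3 -s and only the -P count changed between runs):
Code:
# 4 parallel streams, 10 seconds
iperf3 -c 192.168.252.23 -P 4 -t 10
# 8 parallel streams drops back off
iperf3 -c 192.168.252.23 -P 8 -t 10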
 
Last edited:
Joined
Dec 29, 2014
Messages
1,135
I think there must be something in my hardware (likely the CPU) that is limiting me. I just now decided to do an iperf test to itself on the loopback interface.
Code:
root@freenas2:/nonexistent # iperf3 -c 127.0.0.1
Connecting to host 127.0.0.1, port 5201
[  5] local 127.0.0.1 port 64550 connected to 127.0.0.1 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  2.56 GBytes  22.0 Gbits/sec   26   3.49 MBytes
[  5]   1.00-2.00   sec  2.41 GBytes  20.7 Gbits/sec    0   6.62 MBytes
[  5]   2.00-3.00   sec  2.77 GBytes  23.8 Gbits/sec    0   6.62 MBytes
[  5]   3.00-4.00   sec  2.37 GBytes  20.4 Gbits/sec    0   7.01 MBytes
[  5]   4.00-5.00   sec  2.35 GBytes  20.2 Gbits/sec   95   2.27 MBytes
[  5]   5.00-6.00   sec  2.43 GBytes  20.9 Gbits/sec    0   7.01 MBytes
[  5]   6.00-7.00   sec  2.36 GBytes  20.3 Gbits/sec  126   5.68 MBytes
[  5]   7.00-8.00   sec  2.48 GBytes  21.3 Gbits/sec    0   5.74 MBytes
[  5]   8.00-9.00   sec  2.42 GBytes  20.8 Gbits/sec    0   7.01 MBytes
[  5]   9.00-10.00  sec  2.44 GBytes  21.0 Gbits/sec    0   7.01 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  24.6 GBytes  21.1 Gbits/sec  247             sender
[  5]   0.00-10.00  sec  24.6 GBytes  21.1 Gbits/sec                  receiver

iperf Done.

I was able to get a little higher with 4 streams, but that seems to be as high as it will go.
Code:
[SUM]   0.00-10.00  sec  28.6 GBytes  24.6 Gbits/sec    0             sender
[SUM]   0.00-10.00  sec  28.6 GBytes  24.6 Gbits/sec                  receiver

Does that sound reasonable? I have the fastest CPU I can get in this server, although I could get more cores. I don't think that would help.
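
One thing I can do while a test is running is watch for a single saturated core with FreeBSD's top (per-CPU and per-thread view); if one iperf3 or kernel thread is pinned near 100%, that would point at a CPU limit:
Code:
# -S show system/kernel threads, -H show individual threads, -P per-CPU usage
top -SHP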
 
Joined
Dec 29, 2014
Messages
1,135
I have upgraded the CPUs in my secondary FreeNAS to match the primary, and the results were fairly consistent. I did notice some retries in iperf which seemed to be slowing things down. I was able to clear the retries by changing some sysctl settings.
Code:
sysctl net.inet.tcp.blackhole=2
sysctl net.inet.udp.blackhole=1

What tipped me off to this was the following message:
Code:
Limiting open port RST response from 236 to 200 packets/sec
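
As I understand it, that 200 packets/sec cap comes from FreeBSD's ICMP/RST bandwidth limiter, so the related sysctls can be checked like this (just inspecting values; the blackhole settings above are the only ones I changed):
Code:
# current blackhole settings and the rate limiter behind that log message
sysctl net.inet.tcp.blackhole net.inet.udp.blackhole
sysctl net.inet.icmp.icmplim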

I still appear to be capped in the upper 20G range even on the loopback interface, so I guess that is a hardware limit. Sigh.
 
Joined
Dec 29, 2014
Messages
1,135
Actually, iperf3 was a big part of the problem. iperf (iperf2) works much better. What a kick in the shorts that the problem was the testing software. :-(
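
For anyone comparing, the iperf2 runs were along these lines (a sketch; the stream count and duration are illustrative rather than exactly what I ran):
Code:
# server side
iperf -s
# client side: 4 parallel streams for 10 seconds
iperf -c 192.168.252.23 -P 4 -t 10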
 

JoeAtWork

Contributor
Joined
Aug 20, 2018
Messages
165
Nice, I assume with iperf2 you're getting the 40gig you were looking for?

How hard is it for you to get any firmware updates on the C240?

On your primary FreeNAS node, have you tried using PCIe devices that require bifurcation? A Supermicro AOC-SLG3-2M2 PCIe add-on card and a pair of cheap NVMe disks for L2ARC?
 
Joined
Dec 29, 2014
Messages
1,135
Nice, I assume with iperf2 you're getting the 40gig you were looking for?
Yes, I was happy with the host-to-host speed. My two FreeNAS boxes are the only ones with 40G interfaces. The ESXi hosts have 10G interfaces.
How hard is it for you to get any firmware updates on the C240?
It isn't hard for me, but I work for a Cisco partner. That might not be true for everyone else. You do have to have a service contract to download firmware other than images related to major security advisories.
On your primary FreeNAS node, have you tried using PCIe devices that require bifurcation?
Not sure what you mean there. I have an Intel Optane SLOG that works great; that provided a major boost for my NFS traffic. I don't have any iSCSI volumes configured. I don't see the need to add an L2ARC since I have 256G of RAM in the primary box.
 

JoeAtWork

Contributor
Joined
Aug 20, 2018
Messages
165
No Cisco contract here. :-( Looks like the C240 is nice; the price is very good on eBay as well.

If you use that Supermicro x8 PCIe card you can run two NVMe SSDs, if your system will do the bifurcation. The 16GB PCIe 3.0 x2 Optane gum sticks are very inexpensive. I really do not have anything I could test them with, or I would. The Synology M2D18 card is more like $150, and it is said to be fixed in FreeBSD 12, with some 11.x versions patched as well: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211713
 
Joined
Dec 29, 2014
Messages
1,135
Here are my ARC statistics. I don't feel like I really need to do anything, given my ARC hit ratios.
[attachment: screenshot of ARC statistics]
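
(That screenshot is from the reporting graphs; roughly the same counters are available from the shell if anyone wants to compare:)
Code:
# raw ARC hit/miss counters and current ARC size from the ZFS kstats
sysctl kstat.zfs.misc.arcstats.hits kstat.zfs.misc.arcstats.misses kstat.zfs.misc.arcstats.size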
 

kspare

Guru
Joined
Feb 19, 2015
Messages
507
Did you ever update the software on your 3064? I run those switches too and am looking at upgrading to v9 of the software, but can't really find a REALLY good reason to do it. I am running the same cards, and it's really hard to actually do an svMotion and hit over 10Gb of throughput with 22 drives in mirrors.
 
Joined
Dec 29, 2014
Messages
1,135
I am running NX-OS version 7.0(3)I7(9). I have never been able to get more than about 16G of throughput from a pool. That is good enough for me. I only did the 40G links because I could. :smile:
 

kspare

Guru
Joined
Feb 19, 2015
Messages
507
Yeah, I'd be curious what it takes to get over 16Gb. I have seen svMotion spikes at the end of a move hit about 30-35G, but the main data transfer sits at about 10.
 
Joined
Dec 29, 2014
Messages
1,135
My pool structure (2 x RAID-Z2 vdevs) isn't optimal, but I wasn't willing to take the storage hit to do mirrored vdevs. This is mostly my lab and a couple of small servers for mail, etc., so my needs aren't really that high. I have enough RAM that my ARC hit ratio is 94% or more, so it meets my needs.
 