Performance issue with 10Gbps network


pclausen

Patron
Joined
Apr 19, 2015
Messages
267
I recently upgraded my home network to include a central switch with a pair of SFP+ ports. The switch I got was the 48 port 500W model seen here:

https://www.ubnt.com/unifi-switching-routing/unifi-switch/

I got a pair of Intel X520 NICs. One is installed in my FreeNAS server and connected to the switch via an SFP+ twinax cable.

The other X520 is installed in my Windows 10 workstation and connected to the switch via a 50 ft OM3 cable.

iperf running from the workstation shows the following:

Code:
C:\iperf>iperf -p 5001 -c 10.0.1.50 -w 512k
------------------------------------------------------------
Client connecting to 10.0.1.50, TCP port 5001
TCP window size:  512 KByte
------------------------------------------------------------
[  3] local 10.0.1.53 port 57211 connected with 10.0.1.50 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  5.74 GBytes  4.92 Gbits/sec


When I increase to 6 threads, throughput almost doubles:

Code:
C:\iperf>iperf -p 5001 -c 10.0.1.50 -w 512k -P 6
------------------------------------------------------------
Client connecting to 10.0.1.50, TCP port 5001
TCP window size:  512 KByte
------------------------------------------------------------
[  7] local 10.0.1.53 port 63293 connected with 10.0.1.50 port 5001
[  8] local 10.0.1.53 port 63294 connected with 10.0.1.50 port 5001
[  3] local 10.0.1.53 port 63289 connected with 10.0.1.50 port 5001
[  4] local 10.0.1.53 port 63290 connected with 10.0.1.50 port 5001
[  6] local 10.0.1.53 port 63292 connected with 10.0.1.50 port 5001
[  5] local 10.0.1.53 port 63291 connected with 10.0.1.50 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  2.42 GBytes  2.08 Gbits/sec
[  4]  0.0-10.0 sec  1.54 GBytes  1.32 Gbits/sec
[  5]  0.0-10.0 sec  1.53 GBytes  1.32 Gbits/sec
[  7]  0.0-10.0 sec  1.53 GBytes  1.32 Gbits/sec
[  8]  0.0-10.0 sec  2.42 GBytes  2.08 Gbits/sec
[  6]  0.0-10.0 sec  1.53 GBytes  1.32 Gbits/sec
[SUM]  0.0-10.0 sec  11.0 GBytes  9.42 Gbits/sec


So my first question is: why does single-thread performance only appear to be about 50% of what the link should be capable of?
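
As a rough sanity check (the RTT figure below is an assumption, not something I measured), a single TCP stream can move at most one window per round trip, so the 512k window itself may be the cap:

Code:
REM upper bound for one TCP stream is window / RTT
REM 512 KByte window = 512 * 1024 * 8 = 4,194,304 bits
REM at an assumed 0.8 ms RTT: 4,194,304 bits / 0.0008 s ~= 5.2 Gbits/sec
REM to fill 10 Gbits/sec with one stream, RTT has to stay under ~0.42 ms,
REM or the window has to grow to roughly 1 MByte or more
C:\iperf>ping 10.0.1.50
C:\iperf>iperf -p 5001 -c 10.0.1.50 -w 1M

If the measured round-trip time is well under half a millisecond, the window isn't the limit and the cap is coming from somewhere else (driver, interrupts, CPU).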

When I copy from the server to the workstation, I'm only getting about 600 Mbps as seen here:

copytoworkstation.PNG


When I go the other way, I get about 2 Gbps as seen here:

copytofreenas.PNG


The workstation has 4 Samsung 128GB 840 PROs in RAID0, so that should not be the bottleneck. CrystalDiskMark gives me the following:

crystaldiskmark.PNG


Any ideas about what the issue might be?
 

pclausen

Patron
Joined
Apr 19, 2015
Messages
267
Going the other way, performance is terrible. Single thread:

Code:
[root@freenas] ~# iperf -p 5001 -c 10.0.1.53 -w 512k
------------------------------------------------------------
Client connecting to 10.0.1.53, TCP port 5001
TCP window size:  513 KByte (WARNING: requested  512 KByte)
------------------------------------------------------------
[  3] local 10.0.1.50 port 49666 connected with 10.0.1.53 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec   544 MBytes   456 Mbits/sec


Increasing the thread count increases throughput accordingly. The most it will let me run is 9 threads before I get a connection refused error. Stats with 9 threads:

Code:
[root@freenas] ~# iperf -p 5001 -c 10.0.1.53 -w 512k -P 9
------------------------------------------------------------
Client connecting to 10.0.1.53, TCP port 5001
TCP window size:  513 KByte (WARNING: requested  512 KByte)
------------------------------------------------------------
[  3] local 10.0.1.50 port 38929 connected with 10.0.1.53 port 5001
[  4] local 10.0.1.50 port 38930 connected with 10.0.1.53 port 5001
[  5] local 10.0.1.50 port 38931 connected with 10.0.1.53 port 5001
[  6] local 10.0.1.50 port 38932 connected with 10.0.1.53 port 5001
[  7] local 10.0.1.50 port 38933 connected with 10.0.1.53 port 5001
[  8] local 10.0.1.50 port 38934 connected with 10.0.1.53 port 5001
[ 10] local 10.0.1.50 port 38936 connected with 10.0.1.53 port 5001
[  9] local 10.0.1.50 port 38935 connected with 10.0.1.53 port 5001
[ 11] local 10.0.1.50 port 38937 connected with 10.0.1.53 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec   744 MBytes   624 Mbits/sec
[  5]  0.0-10.0 sec   777 MBytes   652 Mbits/sec
[  6]  0.0-10.0 sec   876 MBytes   735 Mbits/sec
[  7]  0.0-10.0 sec   852 MBytes   714 Mbits/sec
[  8]  0.0-10.0 sec   868 MBytes   728 Mbits/sec
[ 10]  0.0-10.0 sec   617 MBytes   517 Mbits/sec
[  9]  0.0-10.0 sec   876 MBytes   734 Mbits/sec
[ 11]  0.0-10.0 sec   624 MBytes   524 Mbits/sec
[  4]  0.0-10.2 sec   767 MBytes   632 Mbits/sec
[SUM]  0.0-10.2 sec  6.84 GBytes  5.77 Gbits/sec


I'm going to boot the Windows workstation from a FreeBSD USB stick and run the test again to see whether this is a Windows driver issue.
 

Pheran

Patron
Joined
Jul 14, 2015
Messages
280
Yeah, I suspected you'd see something like this, since the direction of your original iperf run didn't match the file transfer direction you're having a problem with. I hope FreeNAS 10 updates to iperf3; it's easier to reverse the traffic flow with that version. Your next test sounds like a good idea.
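
For reference, a rough sketch of the reversed test with iperf3, assuming it's available on both ends (the addresses match the ones used above):

Code:
# on the FreeNAS box (server side)
iperf3 -s

# on the workstation; -R reverses the flow so the server transmits and the client receives
iperf3 -c 10.0.1.50 -R -w 512k
iperf3 -c 10.0.1.50 -R -w 512k -P 6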
 

pclausen

Patron
Joined
Apr 19, 2015
Messages
267
I created a FreeNAS boot USB stick and repeated the tests, this time going from FreeNAS to FreeNAS.

Going from the workstation to the server:

Code:
[root@freenas] ~# iperf -s -p 5001 -w 512k
------------------------------------------------------------
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-10.0 sec  5.22 GBytes  4.48 Gbits/sec

[  5]  0.0-10.0 sec  3.48 GBytes  2.99 Gbits/sec
[  4]  0.0-10.0 sec  3.52 GBytes  3.02 Gbits/sec
[SUM]  0.0-10.0 sec  7.00 GBytes  6.00 Gbits/sec

[  6]  0.0-10.0 sec  1.35 GBytes  1.15 Gbits/sec
[  5]  0.0-10.0 sec  2.72 GBytes  2.33 Gbits/sec
[  4]  0.0-10.0 sec  2.74 GBytes  2.35 Gbits/sec
[  7]  0.0-10.0 sec  1.35 GBytes  1.16 Gbits/sec
[SUM]  0.0-10.0 sec  8.15 GBytes  6.99 Gbits/sec

[  4]  0.0-10.0 sec  1.08 GBytes   929 Mbits/sec
[  8]  0.0-10.0 sec  2.17 GBytes  1.86 Gbits/sec
[  5]  0.0-10.0 sec  2.17 GBytes  1.86 Gbits/sec
[  7]  0.0-10.0 sec  1.08 GBytes   929 Mbits/sec
[  9]  0.0-10.0 sec  2.17 GBytes  1.86 Gbits/sec
[  6]  0.0-10.0 sec  2.17 GBytes  1.86 Gbits/sec
[SUM]  0.0-10.0 sec  10.8 GBytes  9.31 Gbits/sec

[ 10]  0.0-10.1 sec  1.30 GBytes  1.11 Gbits/sec
[  4]  0.0-10.1 sec   635 MBytes   528 Mbits/sec
[  8]  0.0-10.1 sec  1.91 GBytes  1.63 Gbits/sec
[  7]  0.0-10.1 sec  2.26 GBytes  1.92 Gbits/sec
[  6]  0.0-10.1 sec  1.92 GBytes  1.64 Gbits/sec
[  9]  0.0-10.1 sec   635 MBytes   528 Mbits/sec
[  5]  0.0-10.1 sec  1.30 GBytes  1.10 Gbits/sec
[ 11]  0.0-10.1 sec  1.01 GBytes   861 Mbits/sec
[SUM]  0.0-10.1 sec  11.0 GBytes  9.32 Gbits/sec


Going from the server to the workstation:

Code:
[root@freenas] ~# iperf -c 10.0.1.53 -p 5001 -w 512k
------------------------------------------------------------
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.1 sec  2.34 GBytes  2.00 Gbits/sec

[  4]  0.0-10.0 sec  2.10 GBytes  1.80 Gbits/sec
[  3]  0.0-10.0 sec  2.27 GBytes  1.95 Gbits/sec
[SUM]  0.0-10.0 sec  4.37 GBytes  3.75 Gbits/sec

[  5]  0.0-10.0 sec  2.01 GBytes  1.73 Gbits/sec
[  4]  0.0-10.0 sec  2.01 GBytes  1.72 Gbits/sec
[  3]  0.0-10.1 sec  2.16 GBytes  1.83 Gbits/sec
[  6]  0.0-10.2 sec  1.83 GBytes  1.54 Gbits/sec
[SUM]  0.0-10.2 sec  8.01 GBytes  6.75 Gbits/sec

[  3]  0.0-10.0 sec  1.73 GBytes  1.48 Gbits/sec
[  5]  0.0-10.0 sec  1.39 GBytes  1.19 Gbits/sec
[  4]  0.0-10.0 sec  1.57 GBytes  1.35 Gbits/sec
[  7]  0.0-10.0 sec  1.45 GBytes  1.25 Gbits/sec
[  6]  0.0-10.0 sec  1.80 GBytes  1.55 Gbits/sec
[  8]  0.0-10.2 sec  1.85 GBytes  1.57 Gbits/sec
[SUM]  0.0-10.2 sec  9.79 GBytes  8.28 Gbits/sec

[  3]  0.0-10.0 sec  1.39 GBytes  1.20 Gbits/sec
[  4]  0.0-10.0 sec  1.43 GBytes  1.23 Gbits/sec
[  5]  0.0-10.0 sec  1.46 GBytes  1.26 Gbits/sec
[  6]  0.0-10.0 sec  1.33 GBytes  1.14 Gbits/sec
[  7]  0.0-10.0 sec  1.02 GBytes   878 Mbits/sec
[ 10]  0.0-10.0 sec  1.52 GBytes  1.31 Gbits/sec
[  8]  0.0-10.0 sec  1.28 GBytes  1.10 Gbits/sec
[  9]  0.0-10.0 sec  1.29 GBytes  1.10 Gbits/sec
[SUM]  0.0-10.0 sec  10.7 GBytes  9.21 Gbits/sec


So to sum up the performance by thread count and direction, I'm seeing the following:

gbpstest.PNG


So the single-thread performance issue doesn't have anything to do with Windows. I wonder what the next test should be. Perhaps taking the switch out of the equation by connecting the X520 cards directly to each other? I'm not sure I'll be able to pull that off, since the fiber cable on the server end won't reach all the way down to the lower portion of the rack where the FreeNAS box is sitting.
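
If the cabling can be made to reach, a hedged sketch of the direct-connect test would be to put both X520 ports on an isolated subnet and rerun iperf. The interface name and addresses below are assumptions; substitute whatever ifconfig and the Windows adapter list actually report:

Code:
# FreeNAS side (assuming the X520 shows up as ix0)
ifconfig ix0 192.168.100.1 netmask 255.255.255.0

# Windows side (assuming the adapter is named "Ethernet 2"), from an elevated prompt
netsh interface ip set address "Ethernet 2" static 192.168.100.2 255.255.255.0

# then repeat the iperf runs against 192.168.100.x instead of 10.0.1.x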
 

Pheran

Patron
Joined
Jul 14, 2015
Messages
280
I'm confused by your conclusion, as it appears to me that your base performance (server->client) went from 456 Mbps to 2 Gbps when you switched from Windows to FreeBSD. That looks like a substantial difference to me.
 

pclausen

Patron
Joined
Apr 19, 2015
Messages
267
Sorry, what I meant was that Windows was not the reason single-thread performance could not reach 90%+ of the available bandwidth.

You are correct that 2 Gbps vs. 456 Mbps under Windows is a substantial improvement in single-thread performance, but it's still not in the 9+ Gbps range that others are getting with a single thread.
 

Mlovelace

Guru
Joined
Aug 19, 2014
Messages
1,111
What are the hardware specs of the Win 10 box? The PC might not be able to saturate a single-threaded 10 GbE link. If your Windows box's hardware were up to the task, iperf should have shown it, since it takes the protocol and hard drive reads/writes out of the equation.
 

pclausen

Patron
Joined
Apr 19, 2015
Messages
267
Just for grins, I ran a loopback test on both machines. The one with the Xeon 1620 can do 41.1 Gbps; the one with the i7-4770K can only do about a quarter of that.

Code:
[root@freenas] ~# iperf -c 127.0.0.1 -d
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 64.0 KByte (default)
------------------------------------------------------------
------------------------------------------------------------
Client connecting to 127.0.0.1, TCP port 5001
TCP window size:  304 KByte (default)
------------------------------------------------------------
[  5] local 127.0.0.1 port 23233 connected with 127.0.0.1 port 5001
[  4] local 127.0.0.1 port 5001 connected with 127.0.0.1 port 23233
[ ID] Interval       Transfer     Bandwidth
[  5]  0.0-10.0 sec  47.9 GBytes  41.1 Gbits/sec
[  4]  0.0-10.0 sec  47.9 GBytes  41.1 Gbits/sec


Code:
C:\iperf>iperf -c 127.0.0.1 -d
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 63.0 KByte (default)
------------------------------------------------------------
------------------------------------------------------------
Client connecting to 127.0.0.1, TCP port 5001
TCP window size: 63.0 KByte (default)
------------------------------------------------------------
[  4] local 127.0.0.1 port 51051 connected with 127.0.0.1 port 5001
[  5] local 127.0.0.1 port 5001 connected with 127.0.0.1 port 51051
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-10.0 sec  10.7 GBytes  9.18 Gbits/sec
[  5]  0.0-10.0 sec  10.7 GBytes  9.17 Gbits/sec
 

pclausen

Patron
Joined
Apr 19, 2015
Messages
267
What are the hardware specs of the Win 10 box? The PC might not be able to saturate a single-threaded 10 GbE link. If your Windows box's hardware were up to the task, iperf should have shown it, since it takes the protocol and hard drive reads/writes out of the equation.
The Win 10 box has an Intel i7-4770K Haswell CPU, an Asus MAXIMUS VI HERO Z87 motherboard, and 2x 4GB DDR3-2400 memory sticks. The X520 card is sitting in a PCIe x16 slot.

Running the single-thread loopback test on the Win 10 machine as in the post above, I do max out one core, but I get 9.17 Gbps doing so.
 

Mlovelace

Guru
Joined
Aug 19, 2014
Messages
1,111
The Win 10 box has an Intel i7-4770K Haswell CPU, an Asus MAXIMUS VI HERO Z87 motherboard, and 2x 4GB DDR3-2400 memory sticks. The X520 card is sitting in a PCIe x16 slot.

Running the single-thread loopback test on the Win 10 machine as in the post above, I do max out one core, but I get 9.17 Gbps doing so.
Windows' network stack needs some help with 10GbE; it's not very plug-and-play. Check that receive side scaling is enabled and that all logical cores are available (should be 8 for an i7), check that chimney offload is enabled, and try adjusting the receive and transmit buffer sizes. There are several other tweaks, but those are the first ones to do.
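
A hedged sketch of those checks from an elevated command prompt (the chimney setting may be ignored on newer Windows 10 builds, and the buffer property names vary by driver version):

Code:
:: query, then set, the global TCP options mentioned above
netsh int tcp show global
netsh int tcp set global rss=enabled
netsh int tcp set global chimney=enabled

:: RSS queues and buffer sizes live in the NIC driver's advanced properties; from PowerShell
:: (the DisplayName strings below are examples and depend on the Intel driver version):
::   Get-NetAdapterRss
::   Set-NetAdapterAdvancedProperty -DisplayName "Receive Buffers" -DisplayValue 4096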

If you want to do multi-threaded transfers over CIFS, I would recommend using robocopy.
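
For example (the share path and thread count here are placeholders):

Code:
:: /E copies subdirectories, /MT:8 runs 8 copy threads in parallel over SMB
robocopy D:\source \\10.0.1.50\share\dest /E /MT:8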
 

pclausen

Patron
Joined
Apr 19, 2015
Messages
267
Ok, thanks. How do I access the Windows network stack? Are you talking about the settings under Device Manager for the network card?
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
I had to do no tweaking at all to get 10Gb/sec over my LAN connections on Windows.
 

pclausen

Patron
Joined
Apr 19, 2015
Messages
267
Interesting. What hardware are you running?

I ran this command:

Code:
C:\iperf>netsh interface tcp show global
Querying active state...

TCP Global Parameters
----------------------------------------------
Receive-Side Scaling State          : enabled
Chimney Offload State               : disabled
NetDMA State                        : disabled
Direct Cache Access (DCA)           : disabled
Receive Window Auto-Tuning Level    : normal
Add-On Congestion Control Provider  : none
ECN Capability                      : disabled
RFC 1323 Timestamps                 : disabled
Initial RTO                         : 3000
Receive Segment Coalescing State    : disabled
Non Sack Rtt Resiliency             : disabled
Max SYN Retransmissions             : 2
 

pclausen

Patron
Joined
Apr 19, 2015
Messages
267
So I booted the Windows 10 workstation from the FreeNAS boot stick again and ran the iperf loopback test. This time I got the same speed as on the FreeNAS server (in fact even higher), as seen here:

loopbackworkstation.JPG


Booted back into Windows and created a RAM disk using ImDisk.
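
For reference, an ImDisk invocation along these lines creates and formats the RAM disk (the size and drive letter below are just examples):

Code:
REM attach an 8 GB RAM disk as R: and format it NTFS
imdisk -a -s 8G -m R: -p "/fs:ntfs /q /y"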

Result of copying from RAM disk to RAID0 on local machine:

ramdisktoraid0.PNG


So about 8.64 Gbps.

Going from RAID0 to RAM disk:

raid0toramdisk.PNG


So about 11.6 Gbps.

Next test was going from FreeNAS to RAM Disk on Windows machine:

freenastoramdisk.PNG


Absolutely awful performance with hardly any CPU being used...

And finally, going from RAM Disk to FreeNAS:

ramdisktofreenas.PNG


Certainly better than going the other way, but still less than 20% of what the 10 Gig link is capable of, and the CPU load is still next to nothing.

All I can think of is that I'm dealing with some issue with the Windows 10 X520 driver. The driver I'm using is Intel version 3.10.162.1, dated 4/24/2015.
 

pclausen

Patron
Joined
Apr 19, 2015
Messages
267
Windows' network stack needs some help with 10GbE; it's not very plug-and-play. Check that receive side scaling is enabled and that all logical cores are available (should be 8 for an i7), check that chimney offload is enabled, and try adjusting the receive and transmit buffer sizes. There are several other tweaks, but those are the first ones to do.

I did find those options under Device Manager / Network adapters / Intel Ethernet Server Adapter X520 / Advanced / Maximum Number of RSS Queues. It was already set to 8.

The offloading options were also all enabled (a quick PowerShell check is sketched after the list). They were:

IPsec Auth Header & ESP Enabled
IPv4 Checksum Rx & Tx Enabled
TCP Checksum Rx & Tx Enabled
UDP Checksum Rx & Tx Enabled
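
The same driver settings can also be dumped from PowerShell, which makes before/after comparisons easier (the adapter name below is an assumption):

Code:
Get-NetAdapterAdvancedProperty -Name "Ethernet 2" | Format-Table DisplayName, DisplayValue
Get-NetAdapterRss -Name "Ethernet 2"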
 

Mlovelace

Guru
Joined
Aug 19, 2014
Messages
1,111
I did find those options under Device Manager / Network adapters / Intel Ethernet Server Adapter X520 / Advanced / Maximum Number of RSS Queues. It was already set to 8.

The Offloading Options were also all enabled. They were:

IPsec Auth Header & ESP Enabled
IPv4 Checksum Rx & Tx Enabled
TCP Checksum Rx & Tx Enabled
UDP Checksum Rx & Tx Enabled
Do you have anti-virus running on the Windows client?
 

pclausen

Patron
Joined
Apr 19, 2015
Messages
267
No. It's a clean install of Windows 10 Pro 64-bit done just 3 days ago when I set up the RAID0. Nothing else has been installed since except Chrome, Notepad++, OpenOffice, MakeMKV, FileZilla, and a few other utilities. I also have the Windows firewall disabled, since I rely on my EdgeRouter for that. I'm on the fast ring and currently running build 10565. I suppose I could try a clean install of Windows 8.1 to see if the issue is somehow with Windows 10?
 

pclausen

Patron
Joined
Apr 19, 2015
Messages
267
Also, I get the same performance using FTP as I do with Windows File Explorer, so that should rule out it being a Samba/CIFS issue, I would think.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Also, I get the same performance using FTP as I do with Windows File Explorer, so that should rule out it being a Samba/CIFS issue, I would think.

We know it's a network issue and not the storage subsystem or a protocol since iperf needs neither of those.

This is 100% a networking issue of some kind. Hardware, driver, something.
 