Mediocre 10GbE network performance


cookiesowns

Dabbler
Joined
Jun 8, 2014
Messages
31
Hi,

So I just got my ZFS box setup.

HW consists of:
E5-1620 v2
24x 4TB 7.2K Deskstar NAS HDDs in a pool of 4 RAID-Z2 vdevs, 6 drives each
X9SRH-7TF onboard 10GbE, directly connected to a machine running Windows 8 with an X540-T2 copper NIC
64GB RAM
3 LSI HBAs, all direct attached, running R16 firmware

Max performance I've been seeing is around 1.2-2.5 Gbps with single-threaded CIFS.

iperf is the same at 1.2-1.6 Gbps. If I increase the window size or use parallel streams, I can saturate the NIC no problem.
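
For reference (not part of the original post), this is the kind of iperf comparison being described, run from the Windows client against the NAS; the hostname "freenas" is a placeholder:

Code:
iperf -c freenas -t 30              # single stream, default TCP window
iperf -c freenas -t 30 -w 1M        # single stream, larger TCP window
iperf -c freenas -t 30 -P 4         # four parallel streams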

Are there any tuning steps I'm missing for CIFS, or maybe tunables I should try?

The array was able to push 1 GB/s in various benchmarks, even when pushing the ARC to its max.
 

c32767a

Patron
Joined
Dec 13, 2012
Messages
371
Hi,

So I just got my ZFS box setup.

HW consists of:
E5-1620 v2
24x 4TB 7.2K Deskstar NAS HDDs in a pool of 4 RAID-Z2 vdevs, 6 drives each
X9SRH-7TF onboard 10GbE, directly connected to a machine running Windows 8 with an X540-T2 copper NIC
64GB RAM
3 LSI HBAs, all direct attached, running R16 firmware

Max performance I've been seeing is around 1.2-2.5 Gbps with single-threaded CIFS.

iperf is the same at 1.2-1.6 Gbps. If I increase the window size or use parallel streams, I can saturate the NIC no problem.

Are there any tuning steps I'm missing for CIFS, or maybe tunables I should try?

The array was able to push 1 GB/s in various benchmarks, even when pushing the ARC to its max.

I'm missing something..

If you run iperf, you get the same rate as you see with CIFS, except when you increase which window? The TCP window iperf uses?
 

cookiesowns

Dabbler
Joined
Jun 8, 2014
Messages
31
I'm missing something..

If you run iperf, you get the same rate as you see with CIFS, except when you increase which window? The TCP window iperf uses?

Correct.

Single-threaded iperf and CIFS without any tweaking, client -> server, gives me around 1.2-1.6 Gbps maximum.

Server -> client CIFS gives me good performance; with 2 concurrent downloads I can do about 7 Gbps. I did, however, notice something interesting: if one file transfer is going slowly, all other transfers are limited to the same rate. For example, if a file is being copied at 100 MB/s, copying to a faster disk nets 100 MB/s as well; if both destination disks are fast, I can do 500 MB/s each. This is copying the same file from the NAS.

File uploads seem similar as well. Is this something going on with some tunables, or is this just how ZFS/CIFS works?
 

c32767a

Patron
Joined
Dec 13, 2012
Messages
371
Correct.

Single-threaded iperf and CIFS without any tweaking, client -> server, gives me around 1.2-1.6 Gbps maximum.

Server -> client CIFS gives me good performance; with 2 concurrent downloads I can do about 7 Gbps. I did, however, notice something interesting: if one file transfer is going slowly, all other transfers are limited to the same rate. For example, if a file is being copied at 100 MB/s, copying to a faster disk nets 100 MB/s as well; if both destination disks are fast, I can do 500 MB/s each. This is copying the same file from the NAS.

File uploads seem similar as well. Is this something going on with some tunables, or is this just how ZFS/CIFS works?

It's hard to say without breaking everything down and finding the bottleneck. I think it's likely there's a bug or tunable responsible, but the only way to know is to look.
I can say one of the setups I use at $dayjob is almost identical to yours, except we use the X9SRL and a PCIe SolarFlare 10G card instead of Intel. With a 2.4 GHz E5 it does 5-6 Gb/s without breaking a sweat.

In order, I would:

There are tunables for window size; I would check to make sure they're set (see the sketch below).
There is a known issue with TCP offload and the Intel ixgbe driver; I would look at that.
I would validate the network between your devices, which iperf seems to report is OK.
I'd also look at ZFS stats when this performance issue is happening, just to see what your I/O load looks like.
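
A minimal way to check the first two items from the FreeNAS shell (not part of the original reply); the ix0 interface name is an assumption, so check ifconfig for the real one:

Code:
# Show the current socket-buffer / window-size tunables and the autotuning switches
sysctl kern.ipc.maxsockbuf net.inet.tcp.recvbuf_max net.inet.tcp.sendbuf_max
sysctl net.inet.tcp.sendbuf_auto net.inet.tcp.recvbuf_auto
# Temporarily turn off TSO/LRO on the Intel 10GbE interface to rule out the offload issue
ifconfig ix0 -tso -lro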
 

cookiesowns

Dabbler
Joined
Jun 8, 2014
Messages
31
It's hard to say without breaking everything down and finding the bottleneck. I think it's likely there's a bug or tunable responsible, but the only way to know is to look.
I can say one of the setups I use at $dayjob is almost identical to yours, except we use the X9SRL and a PCIe SolarFlare 10G card instead of Intel. With a 2.4 GHz E5 it does 5-6 Gb/s without breaking a sweat.

In order, I would:

There are tunables for window size; I would check to make sure they're set.
There is a known issue with TCP offload and the Intel ixgbe driver; I would look at that.
I would validate the network between your devices, which iperf seems to report is OK.
I'd also look at ZFS stats when this performance issue is happening, just to see what your I/O load looks like.


Do you mind sharing your tunables? I'll look into the TCP offload issue.

Network performance was validated; however, single-thread performance is still abysmal, as mentioned earlier. Do you mind sharing your iperf results with the default window size and with a window size of, say, 100K end to end? I've tried both jumbo frames and the default MTU, and performance seems similar.

ZFS stats look okay.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Here are some sysctls I'm currently experimenting with for 10GbE. Use these at your own risk:

kern.ipc.maxsockbuf=16777216
net.inet.tcp.recvbuf_max=16777216
net.inet.tcp.sendbuf_max=16777216

Note that if you have more than about 80-100ms of latency between the server and clients these will certainly hurt your throughput... severely!
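
One way to try them (not part of the original post): apply them live from a shell first, and only persist the ones that help as permanent sysctls in the FreeNAS GUI. On a direct-connected LAN link the latency caveat above shouldn't come into play.

Code:
# Applied at runtime for testing; these values do not survive a reboot
sysctl kern.ipc.maxsockbuf=16777216
sysctl net.inet.tcp.recvbuf_max=16777216
sysctl net.inet.tcp.sendbuf_max=16777216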
 

cookiesowns

Dabbler
Joined
Jun 8, 2014
Messages
31
I think my limitation right now is CIFS, even though CIFS doesn't seem to use much CPU.

I'll try tweaking the sysctls.

The sysctls by themselves made no difference.

However, after doing some more digging and testing on CIFS/Samba, I noticed it's the receive window of the CIFS server that was slowing transfers down:

aio write size = 8192
aio read size = 8192
write cache size = 262144
socket options = SO_SNDBUF=163840 SO_RCVBUF=163840

Pushing the speed of a single SSD on a single stream is no problem now!

Only on uploads though :( it seems sending-to-server speeds have gone down.
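
A quick way (not part of the original post) to confirm Samba actually picked those auxiliary parameters up after restarting the CIFS service:

Code:
# Print the effective smb.conf and pull out the tuning parameters set above
testparm -s 2>/dev/null | grep -Ei 'aio (read|write) size|write cache size|socket options'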
 

c32767a

Patron
Joined
Dec 13, 2012
Messages
371
I think my limitation right now is CIFS, even though CIFS doesn't seem to use much CPU.

I'll try tweaking the sysctls.

The sysctls by themselves made no difference.

However, after doing some more digging and testing on CIFS/Samba, I noticed it's the receive window of the CIFS server that was slowing transfers down:

aio write size = 8192
aio read size = 8192
write cache size = 262144
socket options = SO_SNDBUF=163840 SO_RCVBUF=163840

Pushing the speed of a single SSD on a single stream is no problem now!

Only on uploads though :( it seems sending-to-server speeds have gone down.

Full disclosure, most of our systems are either NFS or AFP. We don't do much CIFS.

Last I knew, Samba was still single-threaded, so you're going to hit a wall at what a single core can do on your system.
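
A rough way (not part of the original reply) to see whether one core is pegging while a copy is running, assuming shell access on the FreeNAS box:

Code:
# Per-CPU usage, then a per-thread view with idle processes hidden
top -P
top -SHz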

When you say Upload, you mean Client -> FreeNAS and Download is FreeNAS -> Client?

Note that writes will scream until you run out of cache RAM, while reads will be at the mercy of the IOPS of your drives, unless the data is in cache.
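
A simple way (again, not part of the original reply) to watch what the pool is doing during a transfer; "tank" is a placeholder pool name:

Code:
# Per-vdev throughput once per second, and per-disk busy % to spot a straggler
zpool iostat -v tank 1
gstat -p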

kern.ipc.maxsockbuf=2097152
net.inet.tcp.recvbuf_max=2097152
net.inet.tcp.sendbuf_max=2097152

Code:
root@bsd:/usr/ports/benchmarks/iperf # iperf -c nas1
------------------------------------------------------------
Client connecting to nas1, TCP port 5001
TCP window size: 32.5 KByte (default)
------------------------------------------------------------
[  3] local 192.168.1.190 port 35120 connected with 192.168.1.51 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  2.59 GBytes  2.22 Gbits/sec

That's a BSD VM on an ESX 5.5 host talking to a FreeNAS via a 10G link, no tuning. If I twiddle with the window size I can get it to 3-4 Gb/s. Native 10G clients do better, but I don't have any handy here. :)
 

c32767a

Patron
Joined
Dec 13, 2012
Messages
371
I think my limitation right now is CIFS, even though CIFS doesn't seem to use much CPU.

Pushing the speed of a single SSD on a single stream is no problem now!

Only on uploads though :( it seems sending-to-server speeds have gone down.

One other thought: your CPU has Hyper-Threading. Just on a whim, disable HT in your BIOS, reboot, and try your tests again. One long-shot explanation is that I don't know how good BSD is at keeping I/O threads off the hyper-threaded cores. HT used to have limitations on how much I/O the threads could do when you have more threads than physical cores. It's a long shot, but maybe that's in play here.
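
Not part of the original reply, but a quick sanity check after the BIOS change: the E5-1620 v2 has 4 physical cores, so with HT off the OS should report 4 logical CPUs instead of 8:

Code:
sysctl hw.ncpu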
 

cookiesowns

Dabbler
Joined
Jun 8, 2014
Messages
31
One other thought: your CPU has Hyper-Threading. Just on a whim, disable HT in your BIOS, reboot, and try your tests again. One long-shot explanation is that I don't know how good BSD is at keeping I/O threads off the hyper-threaded cores. HT used to have limitations on how much I/O the threads could do when you have more threads than physical cores. It's a long shot, but maybe that's in play here.

Good tip, I'll give that a shot. I'm setting up the second FreeNAS server as we speak, and I'll do some BSD -> BSD testing with iperf and whatnot.
As for your BSD VM above, those are good speeds for the default window size.

The only tweaks you made were those sysctls?
 

c32767a

Patron
Joined
Dec 13, 2012
Messages
371
Good tip, I'll give that a shot. I'm setting up the second FreeNAS server as we speak, and I'll do some BSD -> BSD testing with iperf and whatnot.
As for your BSD VM above, those are good speeds for the default window size.

The only tweaks you made were those sysctls?

Um,

Looks like I missed one:

net.inet.tcp.delayed_ack=0

Otherwise, yeah, it's stock.. :)

This is an X540, right? So your interface is ixbge0 or ixbge1?
 

cookiesowns

Dabbler
Joined
Jun 8, 2014
Messages
31
Um,

Looks like I missed one:

net.inet.tcp.delayed_ack=0

Otherwise, yeah, it's stock.. :)

This is an X540, right? So your interface is ixbge0 or ixbge1?


ixgb0 and ixgb1, if I'm not mistaken. I don't think it's ixbge; I'll check tomorrow once I'm back at work.

As far as the tuning params go, it looks like autotune added all of those for me, so it's a matter of raising them even more, or tuning CIFS/SMB. Maybe I'll try some bigger striped vdevs just to test maximum performance for CIFS and whatnot. The SSDs I have are far too slow.

I'll also be flashing all HBAs to R19 and testing for stability. Is it really necessary to run the same version as the driver?
 

c32767a

Patron
Joined
Dec 13, 2012
Messages
371
You should match the firmware version to the LSI driver version. Having firmware lower than the driver is proven to cause Bad Things to happen. Having the firmware level higher than the driver is also not considered best practice. So yeah, match it up.
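
A quick way (not part of the original reply) to see which firmware and driver versions the mps driver actually reports, so you know what to match:

Code:
# The mps driver logs both firmware and driver versions at attach time
dmesg | grep -i mps
# LSI's flash utility also lists firmware per controller, if it's installed
sas2flash -listall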

Kinda weird that SSDs would be too slow. :)

If you don't mind blowing up your data, try destroying the pool and recreating it as a single 24-drive RAID-Z3 pool and see how that performs. Just for kicks. :P
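
Purely as an illustration (not part of the original reply), and destructive: a rough sketch of that test from the shell. The pool name "tank" and the da0-da23 device names are assumptions, and on FreeNAS the pool would normally be rebuilt through the GUI instead:

Code:
# WARNING: this destroys the existing pool and all data on it
zpool destroy tank
# -f overrides leftover labels from the destroyed pool
zpool create -f tank raidz3 da0 da1 da2 da3 da4 da5 da6 da7 da8 da9 da10 da11 \
    da12 da13 da14 da15 da16 da17 da18 da19 da20 da21 da22 da23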
 

cookiesowns

Dabbler
Joined
Jun 8, 2014
Messages
31
You should match the firmware version to the LSI driver version. Having firmware lower than the driver is proven to cause Bad Things to happen. Having the firmware level higher than the driver is also not considered best practice. So yeah, match it up.

Kinda weird that SSDs would be too slow. :)

If you don't mind blowing up your data, try destroying the pool and recreating it as a single 24-drive RAID-Z3 pool and see how that performs. Just for kicks. :p

Sure.

Yes, the SSDs are slow. I only have 2 of the S3500 120GB. In RAID-0 I'd get around 700 MB/s reads and 300 MB/s writes. Not the fastest.

As for firmware, that would mean I need to downgrade the two HBAs. The HBAs are running R17 with no issues so far. I'll play with fire on the backup server and run R19 on everything for giggles, since people report issues with the Supermicro onboard LSI 2308 SAS on R16... who knows.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Um,

Looks like I missed one:

net.inet.tcp.delayed_ack=0

Otherwise, yeah, it's stock.. :)

This is an X540, right? So your interface is ixbge0 or ixbge1?

Then it's still stock because that setting is already 0....
 