10G Speed issues on Intel X520-DA2

Coolamasta

Dabbler
Joined
Feb 16, 2019
Messages
11
Hey all, I am quite new to FreeNAS, having used it for about a year and been very pleased with it. I have just built a new budget dedicated NAS and am using 10G on it this time, but I'm having some speed issues that are capping my I/O.

Very quickly, my NAS hardware is a Xeon E3-1220v3 with 32GB ECC, an Avago/LSI 9201 16-port HBA (latest FW) and an Intel X520-DA2 dual 10G NIC. Drive pools include a 6 x 8TB RAIDZ2 and 4 x 256GB SSDs in mirrors (RAID 10 equivalent, I assume).

Everything works flawlessly as far as accessing the drives over the network etc.; it's perfect over a 1G network but slow on 10G. The only network option I have changed is "mtu 9000" on the X520 interfaces, and I also set 9000 for jumbo frames on the switch and the Windows client, so it's enabled all round.

The problem is I have a hard cap of around 280MB/s read and 380MB/s write (SMB), and that is to the spinning disk pool OR the SSD pool, same speeds (the client I am testing on has an NVMe SSD, so plenty of I/O). I also ran iperf and the speeds it shows are the same as the real-world copying speeds above, so it definitely seems to be the network rather than disk I/O not keeping up; I would expect to see much higher speeds to the SSD pool.
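(For reference, the iperf run was nothing fancy, roughly along these lines with iperf -s running on the other end - the address is just a placeholder, and the -P 4 run was to see whether the cap is per-connection:)
Code:
# single TCP stream to the FreeNAS box
iperf -c 192.168.1.10 -t 30

# four parallel streams, to see if one stream or the aggregate is what's capped
iperf -c 192.168.1.10 -t 30 -P 4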

I have tried a bunch of things and have spent ages trawling around on here and Google but am not finding a lot. Does anyone have any suggestions as to what to try next? I see that there is the tunables area, but I'm not sure where to start with all that. It seems odd, as I see people have had great success with the X520s.

Any help is hugely appreciated :)
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Try turning jumbo off.

Jumbo is a stupidism from a time when CPU couldn't keep up with the interrupt rate. On modern systems with a modern CPU and LSO/LRO, jumbo shouldn't really make a difference, because the hardware is able to do all the heavy lifting. All jumbo is doing is throwing work back into less-heavily-optimized codepaths in the driver that use a totally different system for managing mbufs, etc., and quite frankly it just makes network building more difficult.
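If you want to back it out just for a test, something along these lines from a shell on the FreeNAS side will do it (the interface name ix0 is just an example - check what your X520 actually shows up as, and make the change permanent in the GUI if you keep it):
Code:
# see what MTU and offloads (TSO4, LRO, etc.) the NIC is currently running
ifconfig ix0

# back to standard frames for the test
ifconfig ix0 mtu 1500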

 

Coolamasta

Dabbler
Joined
Feb 16, 2019
Messages
11
Try turning jumbo off.

Jumbo is a stupidism from a time when CPU couldn't keep up with the interrupt rate. On modern systems with a modern CPU and LSO/LRO, jumbo shouldn't really make a difference, because the hardware is able to do all the heavy lifting. All jumbo is doing is throwing work back into less-heavily-optimized codepaths in the driver that use a totally different system for managing mbufs, etc., and quite frankly it just makes network building more difficult.


Thanks for the info. I just removed the mtu 9000 option from the 10G interface in FreeNAS and set the MTU back to 1500 everywhere else, and I have actually lost speed: I'm now seeing around 25MB/s less than before, so roughly 255MB/s read and 355MB/s write, again exactly the same on the spinning disk and SSD pools :( I do find it strange that my writes seem to be higher than my reads as well! o_O
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
It is totally NOT strange for writes to be faster. All writes are staged to a transaction group in RAM at the speed of RAM, and a transaction group is HUGE. The transaction group is then committed to disk while the next one is being built. There is very little to slow writes down.

Reads that aren't in cache MUST go out to a disk or SSD, which adds additional latency. Therefore reads tend to be slower unless ZFS can predict what you're about to read and have it cached.

If you haven't availed yourself of the 10 Gig Networking Primer, I suggest you do so. There's a lot of information in there including tunables that increase network buffer sizes and other optimizations. You really kinda have to page through it as a lot of practical experience from FreeNAS users is included in all the followups. Note that optimizations often need to be done on both sides (the client as well) in order to get good single-session 10 gig.
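If you want to see whether reads are actually coming out of ARC or going to disk while you copy, a quick check looks something like this (the sysctl names are FreeBSD's ZFS kstats):
Code:
# cumulative ARC hits vs. misses
sysctl kstat.zfs.misc.arcstats.hits kstat.zfs.misc.arcstats.misses

# live per-disk activity; reads served from ARC won't show up here
gstat -p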
 

Coolamasta

Dabbler
Joined
Feb 16, 2019
Messages
11
OK, so I am still stuck with this. I have spent way too long trying to work out what is going on and am not getting anywhere. I appreciate your help @jgreco; I had a good read through that primer thread and there's a lot of good info, but I'm now not sure what else to try!

Leaving my spinning disk pool aside and just using my SSD pool for testing: when I do a pool speed test on FreeNAS itself (dd if=/dev/zero of=tmp.dat bs=2048k count=50k && dd if=tmp.dat of=/dev/null bs=2048k count=50k) I can get around 950 MB/s write and over 2000 MB/s read once converted from bytes, so I am assuming there is no issue with the drives, pool or HBA controller. But no matter what I do, I cannot get past 280MB/s read and 380MB/s write (SMB) over the physical 10G network.
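(One caveat I'm aware of with that dd test: /dev/zero compresses down to nearly nothing when lz4 is on, so the numbers can be inflated. A fairer version is against a scratch dataset with compression off, something like this - the pool/dataset names are just examples:)
Code:
# scratch dataset with compression disabled so dd isn't flattered by lz4
zfs create -o compression=off SSDPool/ddtest
dd if=/dev/zero of=/mnt/SSDPool/ddtest/tmp.dat bs=2048k count=50k
dd if=/mnt/SSDPool/ddtest/tmp.dat of=/dev/null bs=2048k count=50k
zfs destroy SSDPool/ddtest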

I created a RAM disk on a VMware 6.7 VM with a VMXNET3 NIC, through a vSwitch directly connected to the 10G network, and I hit the 280/380 cap. I tried a standalone PC with another Intel X520 NIC directly connected and hit the 280/380 cap. I bypassed my 10G switch and directly connected the FreeNAS box to the standalone PC and still hit the 280/380 cap, and even deleted and rebuilt the SSD pool for good measure.

I've tried all sorts of tunables and settings on the 10G NICs that I have read up on. While I can get file transfers to go a little bit faster (like 10 MB/s more), I am still nowhere near 10G speeds and can't get past 290 read and 290 write. I tried MTU 1500 and jumbo 9014 (jumbo was slightly better).

CPU usage on the FreeNAS box doesn't go over 10-15% while copying files, so I am really not sure what is going on here and it's really starting to do my head in now. Does anyone have any more ideas on this? :(

Below shows some of the tunables I have tried:

sysctl kern.ipc.somaxconn=2048
sysctl kern.ipc.maxsockbuf=16777216
sysctl net.inet.tcp.recvspace=4194304
sysctl net.inet.tcp.sendspace=2097152
sysctl net.inet.tcp.sendbuf_max=16777216
sysctl net.inet.tcp.recvbuf_max=16777216
sysctl net.inet.tcp.sendbuf_auto=1
sysctl net.inet.tcp.recvbuf_auto=1
sysctl net.inet.tcp.sendbuf_inc=16384
sysctl net.inet.tcp.recvbuf_inc=524288
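To confirm the values actually took, I read a few of them back afterwards; as I understand it they need to be added under System > Tunables to survive a reboot, rather than just being set at the shell.
Code:
# read back a few of the buffer limits to make sure they applied
sysctl kern.ipc.maxsockbuf net.inet.tcp.sendbuf_max net.inet.tcp.recvbuf_max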
 

phil1c

Dabbler
Joined
Dec 28, 2019
Messages
21
What version of FreeNAS are you running? I'm experiencing something similar (loosely): I have an R720xd running FreeNAS 11.3-RC1 and an R620 running Proxmox (about to attempt Windows 10 for testing my issue) that are directly connected via DAC on 10GbE cards. For me, iperf3 can get dang near full line speed (9.4Gbps) going box to box (not in VMs) and about 8.5Gbps going Proxmox VM (Ubuntu or Fedora) to FN, but any mounted CIFS shares on VMs --> FreeNAS get about 500MB/s write but only 280MB/s read, no matter what I try. I haven't tried any of your tunables but I can't seem to make the read speed budge. I ran Bonnie++ from a jail on my FN box and that got me about 500MB/s write and 1.3GB/s read (yay, compression), so I know the box can do it, and so can the network, and I'm at such a loss right now.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
What version of FreeNAS are you running? I'm experiencing something similar (loosely): I have an R720xd running FreeNAS 11.3-RC1 and an R620 running Proxmox (about to attempt Windows 10 for testing my issue) that are directly connected via DAC on 10GbE cards. For me, iperf3 can get dang near full line speed (9.4Gbps) going box to box (not in VMs) and about 8.5Gbps going Proxmox VM (Ubuntu or Fedora) to FN, but any mounted CIFS shares on VMs --> FreeNAS get about 500MB/s write but only 280MB/s read, no matter what I try. I haven't tried any of your tunables but I can't seem to make the read speed budge. I ran Bonnie++ from a jail on my FN box and that got me about 500MB/s write and 1.3GB/s read (yay, compression), so I know the box can do it, and so can the network, and I'm at such a loss right now.

What's your pool size, pool layout, memory size, size of any L2ARC, and approximate working set size?
 

phil1c

Dabbler
Joined
Dec 28, 2019
Messages
21
What's your pool size, pool layout, memory size, size of any L2ARC, and approximate working set size?
1) Pool is 84TB usable, comprised of 2x RAIDZ2 vdevs, each with 6x 12TB shucked EasyStore drives (tested with CrystalDisk before adding to the box at around 215MB/s read / 190MB/s write).
2) 128GB of 1600MHz ECC RAM.
3) No L2ARC.
4) By working set, are you referring to the specific size of the dataset I've been testing with or the size of the file(s) I've been testing with? The first answer is roughly 8TB in the dataset; I've been using 5-10GB files to test. The total pool is only about 12% full right now. I should specify that, when I say "that I've been testing with", I mean that I've been testing through the VMs on Proxmox. When I ran Bonnie++ on the FN box, I used a different, empty share to avoid permissions issues.

Also, I just re-ran bonnie++ on FN in the jail with compression on the dataset in question set to "off" (because I should have done that from the start) and got values of 399MB/s write and 451MB/s read, which is perfectly in line with what's expected (based on drive speed and vdev layout).
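(The bonnie++ runs were roughly of this shape - the path is a placeholder, and the -s size is set to about 2x RAM so ARC can't serve the whole test:)
Code:
# -d test directory, -s total file size, -n 0 skips the small-file tests, -u user to run as
bonnie++ -d /mnt/tank/benchmark -s 256g -n 0 -u root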
 

Donny Davis

Contributor
Joined
Jul 31, 2015
Messages
139
Are you hitting a CPU limit with your testing?
In a previous system I was hitting a CPU limit with compression - I turned it off and my 10G issues went away. Not saying that's your case, but I think a little more investigation into what you are bound by (CPU / I/O / network) would help get to the bottom of these types of issues.
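Checking what compression is actually doing on the dataset you're testing against is a one-liner (the dataset name here is just an example):
Code:
zfs get compression,compressratio tank/dataset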
 

phil1c

Dabbler
Joined
Dec 28, 2019
Messages
21
Are you hitting a CPU limit with your testing?
In a previous system I was hitting a CPU limit with compression - I turned it off and my 10G issues went away. Not saying that's your case, but I think a little more investigation into what you are bound by (CPU / I/O / network) would help get to the bottom of these types of issues.
No, not even close. It might jump to overall usage of, like, 7% during tests (I'm going to re-run and make a post, so I'm double-checking).
Interestingly, I just installed Win10 on a second SSD in the Proxmox box and ran SMB transfer tests and got great results, so now I'm worried it's something with Proxmox VMs and SMB, or a combo of Proxmox and FreeNAS, that I don't yet (hopefully "yet") understand.
 

Donny Davis

Contributor
Joined
Jul 31, 2015
Messages
139
Overall system CPU utilization is usually not the bottleneck - in all of my cases it has been single-thread performance. Watch
Code:
top -aSH

while you are running your tests and see if any single threads are hitting 100%.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
1) Pool is 84tb usable, comprised of 2x raidz2 vdevs each with 6x 12tb shucked easystore drives (tested w/CrystalDisk before adding to box at around 215MB/s read/ 190MB/s write).
2) 128GB of 1600mhz ECC RAM.
3) No L2ARC
4) By working set, are you referring to the specific size of the dataset I've been testing with or the size of the file(s) I've been testing with? First answer is roughly 8TB right in the dataset, I've been using 5-10GB files to test. Total pool is only about 12% full right now. I should specify that, when I say "that I've been testing with", I mean that I've been testing with through the VMs on Proxmox. When I ran Bonnie++ on the FN box, I used a different, empty share to avoid permissions issues.

Also, I just re-ran bonnie++ on FN in the jail with compression on the dataset in question set to "off" (because i should have done that from the start) and got values of 399MB/s write and 451MB/s read, which is perfectly inline with expected (based on drive speed and vdev layout.

I think the performance is within the expected realm for 2x RAIDZ2 vdevs being used for VM storage.


RAIDZ vdev IOPS performance tends to approximate the performance of the slowest component device of the vdev, so you effectively have a pool that has the IOPS capacity of just two drives (because - two vdevs). RAIDZ is designed for storage of large sequential files, and is terrible at random I/O. From ZFS's point of view, "random" in this context doesn't really mean what you might think. It means files that have lots of blocks in the middle rewritten, which is the definition of VM virtual disks or database files.

128GB is probably OK for a VM filer but the lack of L2ARC will tend to limit read speeds.

Look, the recipe for VM speed is really simple.

1) Use mirrors
2) if you want typical VM storage IOPS density of 25-50 IOPS/sec per VM, you need one mirror vdev per every 4-8 VM's. Use lots of vdevs.
3) Use massive drives, like 12TB'ers. A 12TB 5400RPM drive is a lot more helpful than a 6TB 7200RPM drive.
4) Keep the pool occupancy low. The sweet spot is 10%-40%. You really want to stay under 50%. This will keep writes fast.
5) Have lots of ARC. 64GB bare minimum.
6) Have lots of L2ARC. Enough to hold the working set is ideal (plus you need to bump up ARC to hold the indirect pointers). This is what makes reads fast, because once you have fragmentation on the HDD pool, there's no other fix.
7) VM storage is an exercise in parallelism. Do not obsess on sequential speed benchmarks.

In my experience, virtually everyone who reports mysterious speed problems with FreeNAS and VM's is violating several of these points.
 

phil1c

Dabbler
Joined
Dec 28, 2019
Messages
21
I think the performance is within the expected realm for 2x RAIDZ2 vdevs being used for VM storage.


RAIDZ vdev IOPS performance tends to approximate the performance of the slowest component device of the vdev, so you effectively have a pool that has the IOPS capacity of just two drives (because - two vdevs). RAIDZ is designed for storage of large sequential files, and is terrible at random I/O. From ZFS's point of view, "random" in this context doesn't really mean what you might think. It means files that have lots of blocks in the middle rewritten, which is the definition of VM virtual disks or database files.

128GB is probably OK for a VM filer but the lack of L2ARC will tend to limit read speeds.

Look, the recipe for VM speed is really simple.

1) Use mirrors
2) if you want typical VM storage IOPS density of 25-50 IOPS/sec per VM, you need one mirror vdev per every 4-8 VM's. Use lots of vdevs.
3) Use massive drives, like 12TB'ers. A 12TB 5400RPM drive is a lot more helpful than a 6TB 7200RPM drive.
4) Keep the pool occupancy low. The sweet spot is 10%-40%. You really want to stay under 50%. This will keep writes fast.
5) Have lots of ARC. 64GB bare minimum.
6) Have lots of L2ARC. Enough to hold the working set is ideal (plus you need to bump up ARC to hold the indirect pointers). This is what makes reads fast, because once you have fragmentation on the HDD pool, there's no other fix.
7) VM storage is an exercise in parallelism. Do not obsess on sequential speed benchmarks.

In my experience, virtually everyone who reports mysterious speed problems with FreeNAS and VM's is violating several of these points.

Hey jgreco. That is very helpful, but I think I should have stated the purpose of my FreeNAS box to save you the time. I'm not using FreeNAS to store VMs. I'm using it as the data repository for things like movies and TV shows that VMs may pull, but not the VMs themselves. The VM virtual disks and databases (as is relevant) are all stored on mirrored SSDs on the R620 Proxmox box itself. The testing I have been doing has been performed using single (one at a time), small to medium (relative to the context of movie files) 10-25GB files, just to test throughput, as this would represent the usage of these large files in "production".

I did a *lot* of testing and retesting (and I think I have to redo some of it again) of different setups yesterday and got some interesting results, so I'll be making a large post here and on the PM forums in the next couple of days to detail it; I just wanted to expand on my particular situation.

EDIT: Error in statement regarding single file testing.
 
Last edited:

phil1c

Dabbler
Joined
Dec 28, 2019
Messages
21
Overall system CPU utilization is usually not the bottleneck - in all of my cases it has been single-thread performance. Watch
Code:
top -aSH

while you are running your tests and see if any single threads are hitting 100%.

Quick update on this. When running any file transfer to FN, smbd on FreeNAS maxes out a core at 100%. However, it does so whether the transfer is running at 200MB/s or 900MB/s, which, to me, says it's not CPU bound. My FN box is running a pair of E5-2630L v2s and my PM box runs a pair of E5-2630L v1s. Is it possible they don't have the single-threaded performance to handle 10Gb transfers?
 

Donny Davis

Contributor
Joined
Jul 31, 2015
Messages
139
SMB is a single-threaded application on FreeBSD last I checked - so it sounds to me like you are hitting a CPU limit.
Have you tried any other protocols?
You can check whether this is an SMB issue by creating a RAM disk / creating a pool from it / then dropping your test file on that. This will assuredly tell you if it's an SMB issue / network issue / or disk read issue.
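Something along these lines on the FreeNAS side will do it (size and names are just examples - make sure you actually have the RAM to spare):
Code:
# 16 GB memory-backed disk, then a throwaway pool on top of it
mdconfig -a -t swap -s 16g -u 0
zpool create ramtest /dev/md0

# tear it down again afterwards
zpool destroy ramtest
mdconfig -d -u 0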

Also, what do local disk benchmarks say? I have a 40G network and have literally never seen SMB move much faster than 1GB/s, so you really aren't too far off.

I know in some of my machines I have had to do things like turn off hyperthreading to get the best possible single core performance from them for applications like this. Tinker around with BIOS settings and see if you can't get that thread performance up there a little bit.
 

phil1c

Dabbler
Joined
Dec 28, 2019
Messages
21
SMB is a single-threaded application on FreeBSD last I checked - so it sounds to me like you are hitting a CPU limit.
Have you tried any other protocols?
You can check whether this is an SMB issue by creating a RAM disk / creating a pool from it / then dropping your test file on that. This will assuredly tell you if it's an SMB issue / network issue / or disk read issue.

Also, what do local disk benchmarks say? I have a 40G network and have literally never seen SMB move much faster than 1GB/s, so you really aren't too far off.

I know in some of my machines I have had to do things like turn off hyperthreading to get the best possible single core performance from them for applications like this. Tinker around with BIOS settings and see if you can't get that thread performance up there a little bit.

Hitting the CPU limit would make sense if I were able to find a cap for file transfers. I've run bare-metal Win 10 on the R620 and set up a 96GB RAM disk to do some testing, and have been able to hit 1GB/s read / 715MB/s write. smbd on the FreeNAS box is pegging a CPU at 100% whether I'm getting those transfer speeds or 500MB/s write / 300MB/s read from a Fedora VM. No, I haven't tried other protocols yet, as SMB will be by far the easiest to use in my environment, so I'd like to avoid having to get NFS running on the Windows or Mac clients. Maybe I'll see about disabling hyperthreading.

I think it's time I made my own post as I've done extensive testing the last couple of days and need some help interpreting all of the data because it isn't straightforward to me what the issue is.
 

Donny Davis

Contributor
Joined
Jul 31, 2015
Messages
139
On the Fedora VM - what network driver are you using?
 

phil1c

Dabbler
Joined
Dec 28, 2019
Messages
21
On the Fedora VM - what network driver are you using?
VirtIO, same as all of the VMs. Fedora has been the best performer of all of the VMs I've tried so far.

I did some quick testing (thank god for remote connections) by disabling HT on the R720 and switching power management (in the BIOS) on both machines to OS controlled, then retested a file transfer in the Fedora VM. Now my writes to FN are the same, but they're at the limit of my PM SSD RAID pool anyway, while the reads have gone up from 235-300MB/s to 390-400MB/s, which I believe is the theoretical limit of the SSD pool as well. During this time, writes to the pool don't cause FreeNAS smbd to break 55% single-threaded CPU usage, while reads can still get up to 90% BUT never reach 100%. Previously, both operations were pegging 100%.

Sounds like I have *another* few hours of retesting to do once I get out of work, but this is the first thing since jumbo frames that is moving me in the right direction. I am still surprised that it may (MAY, as I want to repeat all of my testing with the new settings before I settle on this being the culprit) be my processors' single-threaded performance that is hurting file transfers. I was genuinely unaware that SMB file transfers could be so processor-intensive (again, all of this is preliminary), nor was I aware that a 2630L v2 could be insufficient.
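(For anyone poking at the same power-management angle on the FreeNAS side, I believe the current and available CPU clocks show up under the dev.cpu sysctls, e.g.:)
Code:
# current frequency and available steps for core 0
sysctl dev.cpu.0.freq dev.cpu.0.freq_levels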
 
Last edited:

Donny Davis

Contributor
Joined
Jul 31, 2015
Messages
139
You will find that to be the case with more than just SMB - check this one out.


Glad to see you are making progress though :)
 

phil1c

Dabbler
Joined
Dec 28, 2019
Messages
21
You will find that to be the case with more than just SMB - check this one out.


Glad to see you are making progress though :)

OK, I did a fair bit more testing and implemented the change you suggested @Donny Davis, and I've opted to start my own thread to stop hogging this one. Head on over here to see a VERY long post along with some follow-up.
 