FreeNAS 11.3-RC1 and 10GbE: Leaving sequential read speed on the table

phil1c

Dabbler
Joined
Dec 28, 2019
Messages
21
I felt it best to stop hogging someone else's thread and decided to make my own. The following continues from my posts here. I'll include a tl;dr, since I'm making the full version very detailed to hopefully save time with questions, but hopefully not deter comments.

Tl;dr
Write speeds can saturate 10GbE; read speeds can't break half of it. Any further suggestions?

Much longer version:
Setup/Hardware
I have an R620 that I am using as a virtual machine box. The Proxmox (PM) OS and the VM virtual disks are stored on a single RAID1 virtual disk made of a pair of Samsung 860 EVO SSDs. Specs of the system are as follows:
Proxmox v6.1
2x E5-2630L
128GB 1333MHz DDR3 ECC RAM
H710 Mini Mono RAID card
  • 2x Samsung 860 EVO in RAID1
2x 10Gb SFP+ / 2x 1GbE network daughter card

I also have an R720XD LFF that runs FreeNAS (FN), used for storage of large files (e.g. movies) that are accessed over my LAN or by the VMs for various computing needs (e.g. transcoding for Plex). The 12x 12TB drives are split into a single pool consisting of 2x 6-drive RAIDZ2 vdevs. Specs for this are as follows:
FreeNAS 11.3-RC1
2x E5-2630
128GB 1333MHz DDR3 ECC RAM
LSI 9208-8i HBA
12x WD 12TB EasyStore drives, shucked
  • Before shucking, I tested each in a Windows PC and saw ~225MB/s sequential read, ~200MB/s sequential write
2x 10Gb SFP+ / 2x 1GbE network daughter card

I have one of the 1GbE ports on each machine connected to my local network, and the two machines are directly connected to each other via an SFP+ DAC on a 10Gb port.

All file sharing between devices is done over SMB/CIFS, since the files need to be accessible from Windows machines/VMs and *nix VMs, and SMB is, in my limited experience, the easiest way to do that with FreeNAS.

FN Expectations:
Based on this (https://www.ixsystems.com/blog/zfs-pool-performance-2/) information, and given the R/W speeds I saw prior to shucking, the following should be the theoretical sequential limits for my pool (with compression off):
  • Streaming speed = V * ((N – P) * D)
  • V = number of vdevs in the pool
  • N = total number of drives in each vdev
  • P = parity width (2 for RAIDZ2)
  • D = slowest drive speed in the pool, in MB/s, for the given operation (identical for all drives here)
  • Seq read = 2 * ((6 – 2) * 225) = 1800MB/s
  • Seq write = 2 * ((6 – 2) * 200) = 1600MB/s
I know that compression will help boost these numbers, especially on read, but calculating that theoretically is beyond me, so my assumptions on sequential transfers will be based on these numbers going forward. (Over the network, of course, the 10GbE link itself caps transfers at roughly 1200MB/s, per the iperf numbers below.)
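For anyone checking my math, here is the same formula as a tiny Python sketch; the per-drive 225/200MB/s figures are from my pre-shuck tests, and nothing here is a benchmark, just arithmetic:

```python
# Back-of-the-envelope calculator for the streaming-speed formula above.
# Per-drive speeds (225/200 MB/s) are from my pre-shuck tests; this is just
# arithmetic, not a benchmark.

def pool_streaming_speed(vdevs, drives_per_vdev, parity, drive_speed_mb_s):
    """V * ((N - P) * D) from the iXsystems pool-performance article."""
    return vdevs * (drives_per_vdev - parity) * drive_speed_mb_s

print("Seq read: ", pool_streaming_speed(2, 6, 2, 225), "MB/s")  # 1800
print("Seq write:", pool_streaming_speed(2, 6, 2, 200), "MB/s")  # 1600
```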

VM Setups
Fedora:
Fedora 31 fresh install with latest updates applied.
VirtIO used for NIC drivers
4 CPU cores / 8GB of RAM (RAM may be different where specified in the following data tables)

Ubuntu:
Ubuntu 19.10
VirtIO used for NIC drivers
Docker active: no containers are performing tasks and none access the FreeNAS mount. Plex is running but idle (the test folder was already scanned into the library and no one has access to Plex, so no transcoding is occurring).
4 CPU cores / 8GB of RAM (RAM may be different where specified in the following data tables)

Win10VM:
Win10 fresh install, latest updates applied
VirtIO used for NIC drivers
4 CPU cores / 8GB of RAM (RAM may be different where specified in the following data tables)

Troubleshooting
I've been running a gauntlet of testing through a variety of parameter changes. I won't post ALL of my testing results here; I'll just pick up from where I left off in the referenced post above. At this point, the changes that actually led to an impact are as follows:
1) Jumbo frames on both FreeNAS and Proxmox, as well as inside the VMs themselves (a quick MTU sanity check is sketched after this list).
2) Setting the CPU power profile in both machines' BIOSes to max performance.
* I'll put a special note here that setting the CPU on FreeNAS to max performance had the most impact, and I'll reference this in a bit.
3) Turning off Hyper-Threading on the FreeNAS box (I may have to retest this particular setting, as I changed the CPU performance settings afterward, so it might do nothing).
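For reference, this is roughly how I confirm jumbo frames took effect on the Linux side (Proxmox host and VMs); the interface names are placeholders for mine, and on FreeNAS I just eyeball ifconfig instead:

```python
# Sanity check that jumbo frames (MTU 9000) are actually set on the Linux side.
# Interface names below are placeholders; adjust to your bridge / VM NIC names.
from pathlib import Path

def mtu(iface: str) -> int:
    return int(Path(f"/sys/class/net/{iface}/mtu").read_text())

for iface in ("vmbr1", "ens18"):  # hypothetical Proxmox bridge / VM NIC
    value = mtu(iface)
    print(f"{iface}: MTU {value} ({'OK' if value >= 9000 else 'NOT jumbo'})")
```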

The key tests I have been performing are iperf3, dd, pv copies, and copies within Nautilus. After a lovely suggestion from @Donny Davis , I keep an eye on CPU core utilization of smbd on FreeNAS during all of these tests, though I only recorded the results for some.

The non-DD files that I am copying for testing are compressed movie files, so I would expect they don't compress much in ZFS, if at all.

To start, I run iperf3 to ensure that it's not a network issue.
| Host 1 | Host 2 | iperf Client | iperf Server | Results (Gbps) | Results (MB/s) |
|---|---|---|---|---|---|
| FreeNAS | Proxmox | Proxmox | FreeNAS | 9.9 | 1238 |
| FreeNAS | Proxmox | FreeNAS | Proxmox | 9.6 | 1200 |
"FreeNAS" and "Proxmox" in the client/server columns mean the test was run from that box's web UI shell.
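For clarity, the MB/s column is just the Gbps figure divided by 8 (1 Gbit/s = 125 MB/s in decimal units); the little sketch below is only that conversion, not iperf output:

```python
# The MB/s column is just the iperf3 Gbps figure converted to decimal MB/s
# (1 Gbit/s = 125 MB/s); nothing here comes from iperf3 itself.
def gbps_to_mb_per_s(gbps: float) -> float:
    return gbps * 125

for gbps in (9.9, 9.6):
    print(f"{gbps} Gbps ~ {gbps_to_mb_per_s(gbps):.0f} MB/s")  # 1238, 1200
```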

So, it's not a NIC issue between the boxes. The following are the results I get from DD, Nautilus transfers, and pv copies (note: compression is still on unless specified otherwise):
DD
| Host | File Size Used | Write to FN Speed (MB/s) | Read from FN Speed (MB/s) | FN smbd Observations | Setup Comments |
|---|---|---|---|---|---|
| Fedora VM | 16GB | 753, 820 | 590 | 90% on write, 67% on read | RAM 40GB |

File Transfer Nautilus
| Host | Location | Write to FN Speed (MB/s) | Read from FN Speed (MB/s) | FN smbd Observations | Setup Comments |
|---|---|---|---|---|---|
| Fedora VM | Physical | Too fast (no number shown) | 320, 390 (delete and redo)* | N/A | RAM 40GB |

File Transfer pv
| Host | Location | Write to FN Speed (MB/s) | Read from FN Speed (MB/s) | FN smbd Observations | Setup Comments |
|---|---|---|---|---|---|
| Fedora VM | Physical | 1007 | 345, 410 (delete and redo)* | N/A | RAM 40GB, new file |
| Fedora VM | Physical | 938 | 290, 415 (delete and redo)* | N/A | RAM 40GB, new file |
* "Delete and redo": first value is the first time I copy from FN to the VM. After the copy, I immediately delete from the VM. The second value is copying the same file back to the VM again.

Yes, I intentionally set the VM RAM to 40GB so as to remove the VM's physical disk (a SATA SSD) as a limiting factor. Even with this, I'm still not getting full read speed over the network (10GbE line speed, which is itself below the theoretical pool read speed from "FN Expectations" above). Write speeds to FN, on the other hand, are considerably higher than the reads, except for dd, which I'd actually expect to be the fastest since it's just writing a giant file of zeros that should be VERY compressible; that still seems odd.

What I noticed after that round of testing is that writes to FN will peg an smbd CPU core near 100%, BUT they operate at close to line speed. Reads from FN are far slower, yet never peg a CPU core to 100%.
Even though smbd never used a full core, I still noticed an improvement over my previous testing, where the FN box's CPU performance was NOT set to max, versus the testing above where it was. So, on a whim, I decided to swap the processors between the two boxes (they aren't identical; see "Setup" above) just to see what would happen. I expected some sort of change in my results, mostly worse read results, since the processors going into my FN box are a generation older, though they aren't the low-power version. Those results follow:

IPERF3
| Host 1 | Host 2 | iperf Client | iperf Server | Results (Gbps) | Results (MB/s) |
|---|---|---|---|---|---|
| FreeNAS | Proxmox | Proxmox | FreeNAS | 9.9 | 1238 |
| FreeNAS | Proxmox | FreeNAS | Proxmox | 9.7 | 1213 |
| FreeNAS | Fedora VM | Fedora | FN | 9.8 | 1225 |
| FreeNAS | Fedora VM | FN | Fedora | 9.6 | 1200 |
| FreeNAS | Win10 VM | Win10 | FN | 5.1 | 638 |
| FreeNAS | Win10 VM | FN | Win10 | 2.8 | 350 |

DD
| Chrono | Host | File Size Used | Write to FN Speed (MB/s) | Read from FN Speed (MB/s) | FN smbd Observations | Setup Comments |
|---|---|---|---|---|---|---|
| 30 | Fedora VM | 16GB | 811, 812 | 808, 781 | N/A | RAM 40GB |

File Transfer Nautilus/Explorer (Windows)
| Chrono | Host | Location | Write to FN Speed (MB/s) | Read from FN Speed (MB/s) | FN smbd Observations | Setup Comments |
|---|---|---|---|---|---|---|
| 31 | Fedora VM | Physical | Too fast | 370, 510 (delete and redo)* | N/A | RAM 40GB |
| 32 | Fedora VM | Ramdisk* | 1100 | 445, 580 (delete and redo)* | N/A | RAM 40GB |
| 35 | Win10 VM | Physical | 1000 | 477 | 55% R / 95% W | RAM 40GB |

File Transfer pv
| Chrono | Host | Location | Write to FN Speed (MB/s) | Read from FN Speed (MB/s) | FN smbd Observations | Setup Comments |
|---|---|---|---|---|---|---|
| 33 | Fedora VM | Ramdisk* | 793, 798 | 532–537 | 95% W | RAM 40GB, new file |
| 34 | Fedora VM | Physical | 949 | 476 | 56% R | RAM 40GB, new file |
Interestingly, swapping the older, but not low-power, processors into the FN box improved my write speeds. Still, during reads, FreeNAS's smbd never pegs a core at 100%; if I were CPU bound on reads, I would expect a core to sit at 100% whenever the transfer isn't at full line speed, but that reasoning may be very wrong.

However, given that the CPU change did impact read speeds, I'm still wondering whether I am CPU bound and whether the next step is to move to a CPU with higher per-thread clock speed.

I've also included some testing done in a Win10 VM on Proxmox, which has (by far) the worst performance of all. For the record, I am using VirtIO NIC drivers for all VMs. I haven't done any testing at this stage with my Ubuntu VM, as Fedora was performing a bit better in all tests.

DD on FreeNAS directly
I should have included this earlier, but so it stays separate, here are the results of running dd on FreeNAS directly while FreeNAS still had the 2630L v2s in it (I forgot to re-run with the 2630s):
| Host | File Size Used | Dataset Used | Write Speed (MB/s) | Read Speed (MB/s) |
|---|---|---|---|---|
| FreeNAS shell | 400GB | new, empty dataset, compression = off | 1030 | 800 |
| FreeNAS shell | 800GB | new, empty dataset, compression = off | 840 | 725 |
| FreeNAS shell | 400GB | same Movies dataset as the over-the-network tests, compression = lz4 | 1699 | 5753 |
| FreeNAS shell | 1000GB | same Movies dataset as the over-the-network tests, compression = lz4 | 1711 | 5676 |
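These were plain dd runs from the FreeNAS shell; the sketch below is my reconstruction of them (block size and paths are assumptions, not copied from my shell history). The lz4 rows presumably read back so fast because dd's zeros compress to almost nothing, so most of the "data" never touches the disks.

```python
# Reconstruction of the local dd test: write a large file of zeros into the
# dataset, read it back, and derive throughput from the elapsed time.
# Block size and the dataset path are assumptions, not my actual commands.
import subprocess, time

def run_dd(args):
    start = time.monotonic()
    subprocess.run(["dd"] + args, check=True)
    return time.monotonic() - start

size_mib = 400 * 1024                          # ~400GB test file
testfile = "/mnt/tank/comp_off_test/ddfile"    # hypothetical dataset path

wr = run_dd(["if=/dev/zero", f"of={testfile}", "bs=1m", f"count={size_mib}"])
rd = run_dd([f"if={testfile}", "of=/dev/null", "bs=1m"])
print(f"write: {size_mib / wr:.0f} MB/s, read: {size_mib / rd:.0f} MB/s")
```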

Final Thoughts and Request for Help
I think I've reached the limits of my knowledge and testing as to what is going on. At this point, I'm thinking these are the most likely causes:
1) I am still CPU bound on the FN box and need to step up to higher single thread performance.
2) There is something in Proxmox VMs related to data ingress that is throttling speeds (not sure how to test this; recommendations welcome).
3) A configuration in FN is hindering my read performance over the network, or possibly in general (see the "DD on FreeNAS directly" results).
4) A NIC problem? That's a straight guess.

My two biggest points of confusion are:
1) During reads from FreeNAS over the "network", smbd never fully utilizes a CPU core, the way it does during writes to FreeNAS over the network.
2) Even running dd on FreeNAS directly, my write speeds are higher than my read speeds, though the gap is smaller there (read gets closer to write), while, strangely, the theoretical values say it should be the other way around.

Any observations or expectations I've made are obviously up for debate and refutation, as I'm very new to FreeNAS/ZFS. Any observations that might shed some light are appreciated, and I'll gladly do further testing. I'm just looking to maximize the performance of my setup to help with future use cases.

Cheers
 
Joined
Dec 29, 2014
Messages
1,135
The key tests I have been performing are iperf3
It doesn't look like you are hitting the wall with iperf3 at 10G speeds, but I would stay with iperf (aka iperf2). I was thinking I was hitting a CPU wall or something with my 40G NICs, but the problem turned out to be iperf3. It would max out at ~25G between the FreeNAS boxes, which are the only ones with 40G NICs. iperf2 will consistently get 37.7G in my setup.
I have one of the 1gbe ports on each machine connected to my local network and the two machines are direct connected to each other via SFP+ DAC on a 10Gb port.
What does the IP network look like? It sounds like you are doing this without a switch. Do you have dedicated IP networks between the boxes, or are you using FreeNAS as a bridge? If it is the latter, that could have a negative impact on your performance.
 

phil1c

Dabbler
Joined
Dec 28, 2019
Messages
21
It doesn't look like you are hitting the wall with iperf3 at 10G speeds, but I would stay with iperf (aka iperf2). I was thinking I was hitting a CPU wall or something with my 40G NICs, but the problem turned out to be iperf3. It would max out at ~25G between the FreeNAS boxes, which are the only ones with 40G NICs. iperf2 will consistently get 37.7G in my setup.

What does the IP network look like? It sounds like you are doing this without a switch. Do you have dedicated IP networks between the boxes, or are you using FreeNAS as a bridge? If it is the latter, that could have a negative impact on your performance.

I'll give iperf a shot on the Win10 VM just to see if that gets me more accurate numbers, since the Win10 VM is the only VM where the iperf numbers are low but the actual transfers are higher.

As for the network: I have one of the 1GbE ports on each of the PM and FN boxes connected to my home network (EdgeSwitch 8). The FN and PM boxes are also connected directly to each other via a 10Gb port on each machine. On FN, I assigned the IP address 10.10.250.1/24 to the relevant 10Gb port directly (not on a virtual NIC), and on PM I have a virtual bridge set up that contains the in-use 10Gb port, with IP address 10.10.250.2/24. I need to utilize a bridge so that the VMs can receive traffic on that 10Gb link.

Could it be possible that I am CPU bound in some way on the PM box, but only in the "in" direction (i.e. when reading FROM FN, not writing TO FN)? Such that my read speeds improved not because of putting the non-low-power CPUs into FN, but because I put the newer-generation CPUs into PM? I recall observing (but not recording) cifsd CPU usage in the VMs on PM, and it never reached 100% utilization on any core, but maybe something to do with the bridge is causing CPU-limited throttling on the host itself (outside the VMs).
 
Joined
Dec 29, 2014
Messages
1,135
on PM i have a virtual bridge set up that contains the in-use 10gb port with IP address 10.10.250.2/24.
It sounds like you are using a host as a bridge/switch. That can work, but I would certainly say that is not optimal. You can get a MikroTik 10G switch (CRS305-1G-4S+IN) from Baltic Networks for ~= $125 that has 4 10G SFP+ ports. I can't say for sure that is the problem because I am not able to picture the physical topology as well as the logical.
 

phil1c

Dabbler
Joined
Dec 28, 2019
Messages
21
It sounds like you are using a host as a bridge/switch. That can work, but I would certainly say that is not optimal. You can get a MikroTik 10G switch (CRS305-1G-4S+IN) from Baltic Networks for ~= $125 that has 4 10G SFP+ ports. I can't say for sure that is the problem because I am not able to picture the physical topology as well as the logical.

On PM, in order to have multiple VMs use a physical port, you need to set up a virtual bridge. Connecting the two machines through a switch wouldn't help, because I would still need a bridge on the PM box to connect all of the VMs to this 10Gb port.

But the point about my using a bridge is still taken. Is there something fundamental to how bridges operate, particularly in PM, that could let data out at full speed but limit data in somehow? And would that even be possible given that I'm getting line speed in the iperf testing?
 
Joined
Dec 29, 2014
Messages
1,135
Now I understand better why you have the bridge. I don't see anything that is a smoking gun, so that makes the troubleshooting more difficult. My process is to make sure that the network performs up to snuff on synthetic traffic with iperf, and then move on to troubleshooting other components.
 

phil1c

Dabbler
Joined
Dec 28, 2019
Messages
21
Now I understand better why you have the bridge. I don't see anything that is a smoking gun, so that makes the troubleshooting more difficult. My process is to make sure that the network performs up to snuff on synthetic traffic with iperf, and then move on to troubleshooting other components.

Yeah, that's what's making this so troublesome. iperf3 gets me line speed, or just shy of it, from the Linux VMs to FN and from PM to FN, so I have no reason to believe it's the network. After that, and all my other testing, I'm down to blindly blaming CPU performance (which doesn't make enough sense yet), HBA performance, the NIC, a config somewhere, something inherent to one or both of my machines, or black magic, and all of those are just guesses as to where to start.

Thanks for the input anyway. Hopefully someone else has another avenue to suggest before I have to start buying hardware to test (costly and blind) or formatting and starting over (time consuming).
 
Joined
Dec 29, 2014
Messages
1,135
I don't have as good a suggestion for testing the IO stats on the pool locally, but that is worth doing. I have done some stuff with iozone, but I don't feel like I have a good understanding of it. If the network is good with synthetic traffic and the pool performs well locally, that rules out a number of things. When you are doing network reads, I would look at the ARC hit ratio. Higher is better, and it is hard to go wrong with adding more RAM.
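If it helps, here is one quick way to eyeball the ARC hit ratio from the FreeNAS shell; this is just a rough sketch using the standard FreeBSD ZFS kstat sysctls (cumulative counters since boot), and I believe the Reporting section of the GUI graphs this as well.

```python
# Rough ARC hit-ratio check using the standard FreeBSD ZFS kstat sysctls.
# These counters are cumulative since boot, so watch how they change during
# a read test rather than trusting the lifetime average.
import subprocess

def arcstat(name: str) -> int:
    out = subprocess.check_output(["sysctl", "-n", f"kstat.zfs.misc.arcstats.{name}"])
    return int(out)

hits, misses = arcstat("hits"), arcstat("misses")
print(f"ARC hit ratio: {100 * hits / (hits + misses):.1f}% ({hits} hits / {misses} misses)")
```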
 

phil1c

Dabbler
Joined
Dec 28, 2019
Messages
21
I don't have as good a suggestion for testing the IO stats on the pool locally, but that is worth doing. I have done some stuff with iozone, but I don't feel like I have a good understanding of it. If the network is good with synthetic traffic and the pool performs well locally, that rules out a number of things. When you are doing network reads, I would look at the ARC hit ratio. Higher is better, and it is hard to go wrong with adding more RAM.

I've run dd on the pool locally with compression off just to see what I get for sequential performance (exact numbers are in the first post under "DD on FreeNAS directly"). The short version: read speed is higher than what I get over the network, but write speed is still higher than read. Is there anything local you can think of that might cause reduced read vs. write speed that I can check?

EDIT: Also, I'm more interested in getting the pool to operate at theoretical speeds, not so much in ARC, since sequential reads of large (50+ GB) files are my bigger concern. Plus, I've already got 128GB in the FN box and don't think I'll feel like buying more any time soon, as this box only serves files, with no VM/jail workload.
 