Poor performance - how to find the bottleneck?

hgerstung

Cadet
Joined
Jun 30, 2020
Messages
2
Hi Community,

I just started and set up my first FreeNAS box based on a slightly old Fujitsu MX130 S2 with an AMD Opteron 3280 (8 cores, 2.4 GHz) and 16GB RAM. The CPU has a 65W TDP, so (drives aside) a perfect little low-power storage box for my home office, I thought.

The problem is that I do not see good performance over the network (say, from my Mac) when accessing it via an SMB share.

The pool I created is a "Mirror" (not sure if that is the correct term) using two Seagate IronWolf ST4000 4TB SATA 6Gb/s disks. I also added a log drive (a SanDisk SSD Plus 240GB) and a cache drive (another SanDisk SSD Plus 240GB).

For networking, I found an (also "slightly") outdated Chelsio 10Gb card that I connected via a 10Gb twinax cable to my core switch (Ubiquiti US-16-XG, 12x 10Gb SFP+ plus 4x 1/10Gb RJ45). The Mac is connected to another switch in my office over a 1Gb link. The two switches are connected via a 10Gb fibre link.

Now, to measure the performance properly instead of relying on my gut feeling that file transfers are not as fast as they should be for this setup, I used a tool called fio, both on my Mac and on the FreeNAS box itself.

Here are the results when I run it on the CLI of the FreeNAS server:
READ: bw=3124MiB/s (3276MB/s), 3124MiB/s-3124MiB/s (3276MB/s-3276MB/s), io=8192MiB (8590MB), run=2622-2622msec

And here are the results when I run it on the Mac in a shell, using the mounted Samba share for the very same pool:
READ: bw=47.8MiB/s (50.1MB/s), 47.8MiB/s-47.8MiB/s (50.1MB/s-50.1MB/s), io=8192MiB (8590MB), run=171475-171475msec

I used this fio command:
fio --name=seqwrite --rw=read --direct=1 --bs=8k --numjobs=8 --size=1G --runtime=300 --group_reporting

(Despite the job name "seqwrite", --rw=read makes this a sequential read test.) fio produced a lot of additional output, but I tried to keep this post short - let me know if anything beyond the results I posted would help.

I am happy to share more details, and I would be very grateful if you could point me to a wiki, website or document that describes in more detail how I can find out where the problem is. I obviously do not expect the same performance over the network as when accessing the pool locally, but to be honest I was hoping for more than 50MB/s.

My questions would be:
a) do you see any problems with the hardware I chose?
b) am I measuring the performance correctly, i.e. is fio a good tool for this or should I try something else?
c) I understand that the results of my little test seem to indicate that there is a problem with the network performance and not the hardware itself - is that correct?
d) If c) is correct, what can I try and look for next?

Apologies if this is not the right place to post it, please move my post accordingly or let me know whether I should delete/repost it myself in the right spot. And thank you everyone for your support.
Heiko
 

subhuman

Contributor
Joined
Nov 21, 2019
Messages
121
After what garm said,
The Mac is connected to another switch using a 1GB link in my office.
Right there's gonna be a major bottleneck. The rest of the network being 10Gb won't matter when the computer you're checking from is on 1Gb.

but I was hoping to achieve more than 50MB/s to be honest.
Just keep in mind that 1Gb LAN is going to cap you in the ballpark of 100MB/sec.
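Quick math: 1 Gb/s divided by 8 bits per byte is 125 MB/s on the wire, and TCP/IP plus SMB overhead typically brings that down to roughly 110-115 MB/s in practice.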


Now, on that note, there's a major difference between what you said ("6GB", "10GB" and "1GB") and what they actually are ("6Gb", "10Gb" and "1Gb").
B=byte
b=bit
Did you make a similar mistake here?
READ: bw=3124MiB/s (3276MB/s)
That speed is not possible for SATA. That's roughly the speed of a PCIe 1.0 x16 interface. SATA-III maxes out at around 550 MB/sec.
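Quick math on that: SATA-III signals at 6 Gb/s, but with 8b/10b encoding only 8 of every 10 bits are payload, so the theoretical ceiling is 6 x 0.8 / 8 = 600 MB/s, and ~550 MB/sec is what drives actually deliver.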
If it's a mistake from copying it into the post, that's fine. We can move on. But if the program you're using, fio, is reporting it that way, there's either something fundamentally wrong with fio or it's misinterpreting your hardware configuration.
 

hgerstung

Cadet
Joined
Jun 30, 2020
Messages
2
Thanks for your replies and explanations so far. I tried the same measurement on a pool without a SLOG and L2ARC and came out at the same speed (47MiB/s).

And the 3124MiB/s was not a copy-and-paste error; I assume it was coming directly out of the memory cache. I used 1G files and 8 of them, and as my RAM is 16GB, all the files together fit into RAM. When I use 16 jobs (=16GB) the result is a more realistic 140MB/s ...
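For my next runs, I will size the test data well past RAM so the ARC cannot serve it; something like this should do it (the block size and job count are just my guesses at sensible values):

fio --name=bigread --rw=read --direct=1 --bs=128k --numjobs=4 --size=8G --runtime=300 --group_reporting

That is 4 jobs x 8G = 32G of test data, twice the 16GB of RAM.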

I am not too bothered about the local speed, but as you explained, the 1Gb/s link (thanks for pointing out my mistake regarding GB and Gb, btw) should deliver around 100MB/s, and I am at 50% of that right now.

What I did in the meantime is set up a direct 10GbE link between the FreeNAS box and a Linux server next to it. I mounted the pool via SMB on this Linux machine. This removes my network, the 1GbE NIC of my Mac, macOS and its SMB client implementation from the picture. I got 110 MB/s with a 16G test data read, which is at least closer to the 140MB/s I get locally.
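For reference, the SMB mount on the Linux box looks roughly like this (hostname, share name and user are placeholders):

mount -t cifs //freenas/tank /mnt/tank -o username=heiko,vers=3.0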

Trying the same (Linux -> 10GbE direct twinax -> FreeNAS) with a 4G test set gets me to 200MB/s. This certainly involves some caching, I assume, but it looks OK to me.

Next, I again used the Linux server, but this time over the 10GbE link through the switch, which results in 187MB/s (again with 4GB) - slightly less than the direct link, which sounds logical to me.

So it seems I can at least get FreeNAS to deliver 200MB/s over its 10GbE interface. I see 187MB/s with one of the switches involved, but I do not really have a good way of testing behind the 2nd switch at the moment.

I am a little bit afraid to remove the SLOG and L2ARC drives from the pool; a second pool (also a mirror, but without SLOG and L2ARC drives) gives me the same 187MB/s via the 10GbE link crossing one switch. Do you still think that I should try removing them, @garm? (Sorry to be a coward, I just want to avoid losing the data on this pool if possible.) I would just choose "Remove" on the pool status page, right?
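If the CLI is the safer route, I believe removing the log and cache devices would look something like this (the pool name "tank" and the device names are placeholders; as far as I understand, removing log/cache vdevs does not touch the data vdevs, but please correct me if I am wrong):

zpool status tank              # identify the log and cache device names
zpool remove tank <log-device>
zpool remove tank <cache-device>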

Thanks again for all your help, this was already very helpful!
 

subhuman

Contributor
Joined
Nov 21, 2019
Messages
121
Please read that link - or at least its first post, as the entire thread is several pages long at this point. If you have further questions I'll let someone else take over, as I've never used or needed a SLOG or L2ARC and am very far from an expert. But in your scenario you probably don't need one, and if you do, the SanDisk SSDs are a very wrong choice of hardware.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
If you have removed your L2ARC and SLOG, the next step is a proper test setup to get accurate results. You need to defeat the caching, disable compression, etc., and transfer files which cannot be compressed further and are larger than your ARC. These things are important if you want somewhat accurate results. The link above is well worth reading.
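For example, something along these lines on a scratch dataset keeps compression and the ARC mostly out of the picture (the dataset name is a placeholder, used only for benchmarking):

zfs create tank/fiotest
zfs set compression=off tank/fiotest
zfs set primarycache=metadata tank/fiotest    # cache metadata only, not file data
zfs set secondarycache=none tank/fiotest      # keep the L2ARC out of it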
 

hgerstung

Cadet
Joined
Jun 30, 2020
Messages
2
OK, I read the above link and understand now that I do not need a SLOG (and that it can be counterproductive in my situation). I do not understand why an L2ARC would be a bad thing / might cause performance to go down in my case (and would therefore be a reason to remove it). I appreciate all the answers I have received so far, and I know there is a point where one asks one question too many, but I genuinely want to understand why removing the L2ARC is going to improve my performance (maybe I missed the point in the link posted above...).

I understand that I do not get accurate/realistic results when I access the pool locally on the FreeNAS box with test data that fits nicely into the ARC. But I do not need an exact measurement; I just want to find out what is causing the <50MB/s (=50% of my network link capacity) performance, or whether this is simply what I should expect from my setup.

As it looks (and I am happy to learn otherwise), the bottleneck is the network connection.

I borrowed a 10G Thunderbolt 3 interface for my MacBook Pro and will try some more tests with the SLOG removed.

Again, thank you all for your friendly and useful responses!
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
One main reason I prefer to have people remove the cache devices is to minimize the hardware and the potential issues that get in the way of establishing a true hardware baseline. It's difficult to troubleshoot a problem when we don't know how the system operated without any enhancements. Another way to reduce the impact of RAM caching is to remove all but 16GB of RAM if possible and then transfer files that cannot be compressed and are much larger than the RAM capacity; this will force the system to show its true hardware-limited speed.
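To generate an incompressible test file, reading from the kernel's random device works; for example, something like this creates a 32GB file (the path is a placeholder, and on Linux the block size would be written 1M instead of FreeBSD's 1m):

dd if=/dev/urandom of=/mnt/tank/testfile bs=1m count=32768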

I suspect your limitation is your network connection, but is it the protocol, the NIC driver, the interconnection cable, or the unit you are testing with?
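One way to narrow that down is to take the disks and SMB out of the equation entirely with a raw TCP test, e.g. iperf3 (the address below is a placeholder for the FreeNAS box's IP):

iperf3 -s                          # on the FreeNAS box
iperf3 -c 192.168.1.10 -P 4 -t 30  # on the client: 4 parallel streams for 30 seconds

If that shows close to line rate, the network path is fine and the problem is in SMB or the disks; if not, the bottleneck is in the network itself.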

Have you tried any other protocols?

I've never used fio before; I'll try to give it a test someday, but I still have not unpacked my FreeNAS server since moving. Moving from a 3800 sqft house to a 2400 sqft house has been a real shock.
 