Bottleneck between Z1 pool and NIC

Cristoffer_W

Cadet
Joined
Sep 6, 2020
Messages
3
Hello!

I have an issue where the performance of my NAS somehow gets bottlenecked between my Z1 pool and my NIC.

I’ve benchmarked this by first filling RAM (the cache) with large video files (3 × 20 GB) and then copying an even larger video file (1 × 60 GB), to make sure nothing is being read from the cache.

The benchmarks read as follows:

NAS Z1 (5 discs) pool -> NAS NVMe pool = 550 - 620mb/s
NAS NVMe -> Local computer over 10GbE network = 1.13gb/s
NAS Z1 (5 discs) -> Local computer over 10GbE network = 250 - 350mb/s

As you can see, my Z1 pool performs as expected when copying files to the NVMe pool inside my NAS.
The 10GbE network also performs as expected when copying from the NVMe pool over the network to my local computer.

What doesn't make sense is the result when I copy files from the Z1 pool over the 10GbE network. I cannot figure out what is bottlenecking the performance there.

Any ideas?

Specs:
Asus X99-A II
Intel i7-6800K
4 × 8 GB Corsair 2400 MHz
Intel X540-T2
5 × WD Red 4 TB (CMR) (all connected to the same controller)
1 × Samsung 970 EVO Plus 250 GB

Version: FreeNAS-11.3-U4.1
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
What are you expecting to happen, magic?

ZFS cannot predict exactly what you are going to do next. It does some readahead to compensate for the most likely and predictable request, which is the next section of the file you've been reading, but if you've got a 10GbE connection, you will blow through that read-ahead quickly.

So what's happening here is:

Your PC asks for "first part of file XYZ"

NAS reads it from the disk, along with some of the second part, and sends it off to your PC.

Your PC asks for "next part of file XYZ"

NAS sends the part it previously read ahead, then has to read the remainder from disk and send it off to your PC.

Your PC asks for "third part of file XYZ"

NAS sends anything that had been read ahead during the previous iteration, then reads the remainder from disk and sends it off to your PC.

So what's killing you is that there's a lag every time your PC asks for a bit that ZFS hasn't already read, and it has to go out to disk and retrieve it. These are not *long* delays, but effectively your speeds end up proportional to the speed at which the disk can respond to an I/O request. Since disk isn't fast, neither is your file read.
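To put purely illustrative numbers on it (these are assumed figures, not measurements from your system): say the pool can stream 600MBytes/sec once data is actually flowing, but each serial request also spends about 1ms waiting on the disk before data moves. For 1MByte requests:

transfer time = 1MByte / 600MBytes/sec ~ 1.7ms
wait per request ~ 1.0ms
effective speed = 1MByte / 2.7ms ~ 370MBytes/sec

The wait doesn't make the disk any slower, yet it knocks over a third off your throughput, which is roughly the sort of gap you're seeing.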

There may be some things you can tune to improve this, somewhat.

Also, I can't tell for sure what your speeds are. Please take a few moments to review the Terminology and Abbreviations Primer, because you've been listing your speeds in millibits per second (mb/s). When you are talking about both network and disk transfer speeds, network speeds are typically measured in megabits per second (Mbps) while disk speeds are in megabytes per second (MBytes/s). Your speeds look approximately reasonable if we assume you meant megabytes per second. If you actually meant megabits per second, then there's probably room for substantial improvement.
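For reference, the conversion is 8 bits to the byte: a 10GbE link signals at 10Gbits/sec, which is 1.25GBytes/sec raw, and after ethernet framing and TCP overhead something around 1.1-1.2GBytes/sec of actual payload is as good as it gets. Your 1.13 figure therefore only makes sense as GBytes/sec; 1.13Gbits/sec would be barely a tenth of what the link can do.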
 

Cristoffer_W

Cadet
Joined
Sep 6, 2020
Messages
3
Hi!
Thank you for answering!

I wasn't expecting half the performance of the disks.

I'm sorry for being a total noob and screwing up the terminology; I understand the frustration after reading the doc.
The speeds should of course have been labeled MByte/s & GByte/s.

So if I understand you correctly, there is a difference in how ZFS retrieves data depending on whether the request comes from another PC over a 10GbE NIC compared to locally between pools in the NAS?
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694
The difference is that when you access the ZFS pool across the network, you add network latency and you are limited by the protocol being used. Are you using NFS or SMB?

Copy functions are notoriously sensitive to latency because they are serial, not parallel, processes. To test the real performance of the NAS, you are better off testing reads and writes with a tool like fio.
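For example, something along these lines run in a shell on the NAS takes the network and SMB out of the picture entirely (the dataset path /mnt/tank/fiotest is just a placeholder, and the size is chosen to be well past your 32 GB of RAM so the cache can't serve the whole test):

# sequential read test against the pool itself
fio --name=seqread --directory=/mnt/tank/fiotest --rw=read \
    --bs=1M --size=64G --numjobs=1 --ioengine=posixaio --group_reporting

# and the matching sequential write test
fio --name=seqwrite --directory=/mnt/tank/fiotest --rw=write \
    --bs=1M --size=64G --numjobs=1 --ioengine=posixaio --group_reporting

If those numbers come out much higher than what you see over SMB, the pool itself is fine and the loss is in the network path and protocol.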
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Hi!
Thank you for answering!

I wasn't expecting half the performance of the disks.

You're not *getting* "half the performance" of the disks. You're getting about *twice* the performance of the disks. See discussion below the next quote.

Any modern hard disk is only capable of around 150MBytes/sec average, and that is ONLY if the reads are totally sequential. Please don't make the mistake of trying to point out that you've seen them go faster, because that will have been at the outer edge of the platter, and I'll have to counter with what happens at the inner edge. :smile:

So if I understand you correctly, there is a difference in how ZFS retrieves data depending on whether the request comes from another PC over a 10GbE NIC compared to locally between pools in the NAS?

No, of course there's not a difference in how ZFS retrieves the data. There's a difference in the latency of the steps involved, though.

If you don't understand what latency is, I suggest looking at the following link, even though it is talking about a totally different subsystem, the SLOG/ZIL. https://www.ixsystems.com/community/threads/some-insights-into-slog-zil-with-zfs-on-freenas.13633/

Latency is essentially the sum of the delays involved in handling data at each step of the process. When you are doing something that is a strictly serial workload, as @morganL says, it is MUCH harder to tap the theoretical potential of the hardware. Your SMB client only asks for data in a sequential fashion, and the only thing keeping things moving as fast as they are in this situation is the read-ahead intelligence of ZFS. See my first response.

So here's a bit of a question for you. If you have five HDDs that are capable of an average 150MBytes/sec (and if this is a new NAS you are probably on the outer tracks, which are probably closer to 200MBytes/sec), that's somewhere between 750MBytes/sec and 1GByte/sec of theoretical read speed possible from the pool. An NVMe SSD is typically capable of far more than that. So why would you think the 550-620MBytes/sec in the original post is okay? Where's the rest of that speed vanishing to? It's lost to the latency inherent in a serialized operation.

You will notice this a LOT more when going over the network because the network is a slower medium, and there are multiple points at which the data is delayed if even just a little bit. Most consumer-grade 10G network switches do not do cut-through forwarding, most network cards are not of the low latency type, SMB has a lot of protocol overhead, and if you're running Windows, it isn't really optimized for high performance networking either.
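One way to see how much of this is pure network rather than disk is iperf3 (it ships with FreeNAS, if memory serves; the address below is a placeholder). Run "iperf3 -s" on the NAS, then from the Windows box:

iperf3 -c 192.168.1.100 -t 30

That moves data straight from RAM to RAM, so whatever it reports is your network-plus-TCP ceiling; everything below that in a file copy is disk, protocol, and latency.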

So your 550-620 and 250-350 numbers look fine to me, and if you haven't done any tuning, I'd even consider them to be really good.

Now turn it around. Try *writing* to the pool. What you will find is that write speeds tend to be much better, because ZFS stages writes into transaction groups (txg) in memory before flushing them to disk. The transaction group commit process handles writes to disk asynchronously, and this takes a bunch of links out of the "weakest link" chain that happens during a read. You end up limited by the speed at which your PC and NAS can sling protocol over the ethernet, with the NAS stashing its writes in RAM until the next txg commit. When you do this, you will get much closer to the limit of EITHER your network-plus-protocol OR your actual pool speed.
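You can actually watch this happening if you like. While a big write is running, on the NAS try (pool name is a placeholder):

zpool iostat tank 1

and you'll see the disks sit nearly idle for a few seconds and then burst as each txg commits, while the network side keeps streaming into RAM at full speed the whole time.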

For read, there are a few things you can do. Increase your ARC size (add RAM) so that cacheable data is served from RAM. Try tuning for increased prefetch, though that has the potential to destroy performance for nonsequential workloads, and I have no idea whether it is possible in current ZFS. Make sure you've maximized all the other usual tunables for 10G. These things cannot eliminate latency, but they can fatten up some of the links in the chain.
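As a starting point, these are the sorts of knobs I mean, with the sysctl names from FreeBSD 11-era ZFS; treat them as assumptions to verify on your build, and set anything you change through System -> Tunables in the FreeNAS GUI so it survives a reboot:

# ZFS prefetch distance (bytes to read ahead per stream) - name and default vary by release
sysctl vfs.zfs.zfetch.max_distance

# socket buffer ceilings, typically raised for 10G
sysctl kern.ipc.maxsockbuf
sysctl net.inet.tcp.sendbuf_max
sysctl net.inet.tcp.recvbuf_max

# ARC cap, if you've previously limited it
sysctl vfs.zfs.arc_max

Running them bare like that just prints the current values, which is the safe way to start before you change anything.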

Do note that when your pool gets full and fragmented, you ARE going to end up losing speed due to seeking.
 