10Gb a bit slow - Optimization tips?

GFR · May 24, 2018

I tried searching the forums, but didn't come up with a solution to my problem. Apologies if this has been addressed before, but I missed it through my search.

Had a FreeNAS system for a while, and have decided to upgrade it to larger system. I have been purchasing miscellaneous parts and gradually adding them to my system before transitioning to a larger server - probably going to order a Dell T640 in the near future. For now, the specs of my current system are below:

Lenovo TS140 chassis - CPU: E3-1225 V3
32GB ECC DDR3
Large Storage Array: 3x8TB WD drives (Red and Gold) in RaidZ1
Fast Storage Array: 1x Intel P3605 1.6TB NVMe SSD
FreeNAS installed on a spare 500gb 2.5" drive
Network adapter: Intel XXV710

This post only concerns the "Fast Storage Array" listed above.

This is a private system, so only 1-3 users are using it at a time. I have the NVMe set up as a sort of fast scratch space - mostly for file transfers between systems and active photo/video editing.

The issue I am having is my transfer speeds are a bit slower than I was expecting. It's still fast and workable, but with 10Gb I think I should be able to get better speeds. For all these tests, my client is a MacBook Pro. Connection is via Thunderbolt3 to a Sonnet Echo PCIe chassis containing a Mellanox card (I can get the exact card number later, if it makes a difference). Mellanox card connected via SFP+ DAC to a Netgear GC728X. The other 10Gb port on the switch is connected via SFP+ DAC to the FreeNAS box.

When I set the system up, first thing I did was run an iperf. Results came out to about 7Gb. I then switched on jumbo frames on both systems and the switch, at which point I saw about 9.5Gb:

Code:

./iperf -c 192.168.1.167
------------------------------------------------------------
Client connecting to 192.168.1.167, TCP port 5001
TCP window size:  131 KByte (default)
------------------------------------------------------------
[  4] local 192.168.1.168 port 55058 connected with 192.168.1.167 port 5001
[ ID] Interval	   Transfer	 Bandwidth
[  4]  0.0-10.0 sec  11.1 GBytes  9.54 Gbits/sec

Results are same when I reverse client/server as well.

So, I think the network is working as it should. I then created my storage volume with the NVMe drive. Disabled compression for these tests. First test I did was verify the local disk speed using dd:

Code:

sudo dd if=/dev/zero of=testfile bs=1m count=100000
100000+0 records in
100000+0 records out
104857600000 bytes transferred in 63.211588 secs (1658835082 bytes/sec)

The 1.5GB/sec seems within specs for the drive, so I was happy with that.

Next, shared the drive out via AFP, NFS, and CIFS. Then tested my write speeds from the client using both file transfers and dd. The results were very similar, so I'm only posting up the dd results from CIFS here. CIFS was, surprisingly, the fastest. However, it still topped out around 450MB/sec. The NFS and AFP were about 350MB/sec or so. CIFS results:

Code:

dd if=/dev/zero of=testfile bs=1m count=100000
100000+0 records in
100000+0 records out
104857600000 bytes transferred in 226.087431 secs (463792258 bytes/sec)

My initial thought was that the transfer was banging the CPU too hard. However, during the transfer the smbd process was using only about 33% cpu, according to top.

I then tried running multiple concurrent dd transfers, from between 2 and 4. At 2 concurrent transfers, my speed was nearly the same, for each transfer (about 800MB/sec total). Scaling up to 4 concurrent transfers, my speed dropped to about 210MB/sec each. The smdb task never went past 65% usage in any of these cases.

So, it seems I'm able to hit a wall 800MB/sec total transfer speed, but only if I have multiple concurrent transfers. I would greatly appreciate any suggestions on how I can hit this with a single stream instead of having to force multiple concurrent transfers.

In the course of my troubleshooting, I have also tried adding the sysctl tunables mentioned by jgreco in this thread:
https://forums.freenas.org/index.ph...rdware-smb-cifs-bottleneck.42641/#post-277350
Those tunables didn't seem to have any significant effect - write speed was within about 10MB/sec before and after.

I have also tried to disable jumbo frames on both systems and my switch, but that resulted in close to the same transfer speed as well (about 60MB/sec slower).

I'm not sure what to test next, so any guidance would be appreciated. Sorry for all the text - wanted to preemptively put my system specs and previous testing up here to save time for anyone kind enough to help.

c32767a · May 24, 2018

I've found FreeNAS's defaults for the number of NFS servers to be quite low for most modern CPUs.. For NFS performance, I'd look at that setting in the NFS options and increase it. We typically use a number between 20 and 40 depending on the CPU configuration of a given NAS.

GFR · May 24, 2018

c32767a said:
I've found FreeNAS's defaults for the number of NFS servers to be quite low for most modern CPUs.. For NFS performance, I'd look at that setting in the NFS options and increase it. We typically use a number between 20 and 40 depending on the CPU configuration of a given NAS.

Thank you very much for your suggestion. I just tried changing the value to 30 and doing another write test. Unfortunately, I'm still pegged at a little over 300MB/sec for NFS:

Code:

dd if=/dev/zero of=testfile bs=1m count=10000
10000+0 records in
10000+0 records out
10485760000 bytes transferred in 31.445477 secs (333458449 bytes/sec)

If I understand that option correctly, it would only affect the transfer rate of multiple streams? It's very possible I'm mis-interpreting it as well.

c32767a · May 25, 2018

GFR said:
Thank you very much for your suggestion. I just tried changing the value to 30 and doing another write test. Unfortunately, I'm still pegged at a little over 300MB/sec for NFS:

Code:
dd if=/dev/zero of=testfile bs=1m count=10000 10000+0 records in 10000+0 records out 10485760000 bytes transferred in 31.445477 secs (333458449 bytes/sec)

If I understand that option correctly, it would only affect the transfer rate of multiple streams? It's very possible I'm mis-interpreting it as well.

It depends on the specifics of the transactions. For our workloads, even a single NFS client benefits from increasing the thread count.

I would not expect your test to be doing sync writes, but just in case have you turned off sync writes on the pool and tried testing?

Elliot Dierksen · May 25, 2018

GFR said:
I have the NVMe set up as a sort of fast scratch space - mostly for file transfers between systems and active photo/video editing.

I don't understand exactly what you mean by the above statement. Is that a spare, or part of the pool? I had some similar issues with NFS writes from ESXi being throttled. I added a NVMe card (Intel 900P) to my pool as an SLOG (dedicated ZFS intent log), and my NFS write performance went from ~= 500M to ~= 4G. Try using the NVMe that way and see if that improves your performance.

GFR · May 25, 2018

Elliot Dierksen said:
I don't understand exactly what you mean by the above statement. Is that a spare, or part of the pool? I had some similar issues with NFS writes from ESXi being throttled. I added a NVMe card (Intel 900P) to my pool as an SLOG (dedicated ZFS intent log), and my NFS write performance went from ~= 500M to ~= 4G. Try using the NVMe that way and see if that improves your performance.

Sorry if I wasn't more clear - I meant that my NVMe drive is set up a separate pool. I want this to be a storage pool that is very quick and low latency, for purposes such as photo/video edit and temporary file transfers between other clients. In this case, I don't think setting the drive as an SLOG would be beneficial. It is my understanding an SLOG would improve write performance, but not necessarily read performance. I could be mistaken here, though, as I'm by no means a ZFS guru.

Elliot Dierksen · May 27, 2018

GFR said:
Sorry if I wasn't more clear - I meant that my NVMe drive is set up a separate pool. I want this to be a storage pool that is very quick and low latency, for purposes such as photo/video edit and temporary file transfers between other clients. In this case, I don't think setting the drive as an SLOG would be beneficial. It is my understanding an SLOG would improve write performance, but not necessarily read performance. I could be mistaken here, though, as I'm by no means a ZFS guru.

I think you might be surprised with what an SSD SLOG could do for you. If you have enough drives, you will likely get pretty darn good performance from enterprise SAS or SATA drives. I have a pool of 16 1TB 7.2K SATA drives with 2 RAIDZ2 vdevs of 8 drives each. When reading off that pool, I can routinely get 8G throughput. When I added the NME SSD as an SLOG, I went to ~=450M write performance to ~= 4G write performance. At a really high level, I have always gotten great read performance out of FreeNAS, even on some less than ideal disk configurations (being forced to use a proprietary RAID controller on an enclosure). The key on that appeared to be RAM. RAM helps on non-synchronous writes, but not on synchronous writes. The super-fast SSD SLOG resolved that issue quite nicely.

kdragon75 · May 27, 2018

I have 8 3TB drives and a GEOM RAID10 of SAS SSDs for a SLOG. Running sync always I get just over 500MB/s writes. With sync disabled I easily hit 1+GB/s Granted this is over 8GB fiber channel with two paths. I don't have 10gb Ethernet to test that yet but suspect I should hit near wire speed on my Xeon X5670.

Borja Marcos · May 28, 2018

Try with "gstat -I1s -d" while using the NVMe. It might be TRIM activity hurting performance.

In case it is, try disabling TRIM temporarily and check wether it improves.

In theory, this should enable to disable TRIM on the fly:

sysctl vfs.zfs.vdev.bio_delete_disable=1

And the explanation for it:

https://forums.freenas.org/index.php?threads/anyone-else-mess-around-with-nvme-yet.46884/
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=209571
https://lists.freebsd.org/pipermail/freebsd-fs/2016-May/023134.html

Elliot Dierksen · May 28, 2018

Borja Marcos said:
Try with "gstat -I1s -d" while using the NVMe. It might be TRIM activity hurting performance.

What would be the indication in these statistics that TRIM activity is causing a problem?

Here are some stats from one of my FreeNAS boxes.

dT: 1.003s w: 1.000s
L(q) ops/s r/s kBps ms/r w/s kBps ms/w d/s kBps ms/d %busy Name
0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0| da0
0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0| da1
0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0| da2
0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0| da3
0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0| da4
0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0| da5
0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0| da6
0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0| da7
0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0| da8
0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0| da9
0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0| da10
0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0| da11
0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0| da12
0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0| da13
0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0| da14
0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0| da15
0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0| da16
0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0| da17
0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0| da18
0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0| da19
0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0| da20
0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0| da21
0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0| da22
0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0| da23
0 0 0 0 0.0 0 0 0.0 0 0 0.0 0.0| da24
4 113 0 0 0.0 107 3860 6.7 0 0 0.0 46.8| da25
4 114 0 0 0.0 108 3844 7.1 0 0 0.0 51.9| da26
4 117 0 0 0.0 111 3956 6.6 0 0 0.0 50.4| da27
4 112 0 0 0.0 106 3912 6.9 0 0 0.0 48.6| da28
4 112 0 0 0.0 106 3904 6.9 0 0 0.0 50.9| da29
4 119 0 0 0.0 113 3908 7.0 0 0 0.0 53.8| da30
4 109 0 0 0.0 102 2676 6.9 0 0 0.0 48.8| da31
4 104 0 0 0.0 97 2628 7.1 0 0 0.0 46.6| da32
4 106 0 0 0.0 99 2628 7.3 0 0 0.0 49.4| da33
4 106 0 0 0.0 99 2724 7.1 0 0 0.0 48.2| da34
4 103 0 0 0.0 96 2684 7.6 0 0 0.0 49.0| da35
4 104 0 0 0.0 97 2680 7.3 0 0 0.0 47.9| da36
4 76 0 0 0.0 69 1898 7.4 0 0 0.0 34.7| da37
4 85 0 0 0.0 78 2197 9.7 0 0 0.0 54.7| da38
0 76 0 0 0.0 69 1902 8.2 0 0 0.0 35.2| da39
0 78 0 0 0.0 71 1890 7.8 0 0 0.0 36.7| da40
0 74 0 0 0.0 67 1858 8.3 0 0 0.0 33.1| da41
0 73 0 0 0.0 66 1838 7.4 0 0 0.0 31.5| da42
0 72 0 0 0.0 67 2273 6.3 0 0 0.0 31.7| da43
0 71 0 0 0.0 66 2253 6.4 0 0 0.0 29.8| da44
0 69 0 0 0.0 64 2253 6.8 0 0 0.0 28.9| da45
0 68 0 0 0.0 63 2241 6.5 0 0 0.0 29.7| da46

Borja Marcos · May 28, 2018

High %busy figures together with lots of d/s and maybe slow deletions (ms/d).

In this case there's no deletion (as in TRIM, BIO_DELETE) activity.

Elliot Dierksen · May 28, 2018

Borja Marcos said:
High %busy figures together with lots of d/s and maybe slow deletions (ms/d).

In this case there's no deletion (as in TRIM, BIO_DELETE) activity.

OK, thanks! I ask because I am awaiting the arrival of some used S3500 drives [Damn you, scredfox !!

]

sfcredfox said:
"interesting data"

to see how those work as a mirrored SLOG. My primary FreeNAS is working great using an Intel 900P NVMe as an SLOG.

Borja Marcos · May 28, 2018

It should work fine.

My test case was a really synthetic worst case scenario (running several instances of Bonnie++ in parallel). However, destroying a large snapshot/dataset/chunk of data in a busy server could cause a similar behavior which is undesirable.

NVMes have an additional issue: Not using the CAM subsystem means that TRIM operations are not coalesced, which can make it even worse.

Important Announcement for the TrueNAS Community.

10Gb a bit slow - Optimization tips?

GFR

Cadet

c32767a

Patron

GFR

Cadet

c32767a

Patron

Elliot Dierksen

Guru

GFR

Cadet

Elliot Dierksen

Guru

kdragon75

Wizard

Borja Marcos

Contributor

Elliot Dierksen

Guru

Borja Marcos

Contributor

Elliot Dierksen

Guru

Borja Marcos

Contributor

Similar threads