NFS write performance

wyang

Dabbler
Joined
Jul 8, 2020
Messages
24
Greetings,

My servers' hardware configuration

Type: SuperMicro SYS-6029U-E1CR25M
Processor: Intel(R) Xeon(R) Gold 6226R CPU @ 2.90GHz
Memory: 12 x 32GB Samsung M393A4K40DB2-CVF
Storage:
2 x 256GB NVMe SSD, INTEL SSDPEKKA256GB
12 x 16TB HDD, Seagate ST16000NM001G-2KK103

TrueNAS-12.0-U3.1

Server 1
- Boot: 1 x 2-way mirror (SSD)
- Pool: 2 x 6-wide RAIDZ2 data VDEVs

Server 2
- Boot: 1 x 2-way mirror (SSD)
- Pool: 1 x 12-wide RAIDZ3 data VDEV

Each server has a 2 x 25G lagg0 interface for data traffic.

Specs of the HDDs
Spindle Speed: 7,200 RPM
Interface Access Speed (Gb/s): 6.0, 3.0
Max. Sustained Transfer Rate OD (MB/s, MiB/s): 261, 249
Random Read/Write 4K QD16 WCD (IOPS): 170/440
Interface Ports: Single

Per the reference https://www.ixsystems.com/blog/zfs-pool-performance-2/, the expected performance of server 1 vs server 2 is:
write IOPS: 880 vs 440
streaming write: 2088 MB/s, 1992 MiB/s vs 2349 MB/s, 2241 MiB/s
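(For reference, these figures follow the blog's rules of thumb applied to the drive specs above: streaming write scales roughly with the number of data disks, so server 1 is about 2 vdevs x 4 data disks x 261 MB/s = 2088 MB/s and server 2 is about 9 data disks x 261 MB/s = 2349 MB/s; RAIDZ random write IOPS is roughly one disk's worth per vdev, so 2 x 440 = 880 vs 1 x 440 = 440.)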

I created NFS shares on the two servers, mounted them on Linux clients (bare-metal servers and VMs), and ran fio tests from these clients.

fio --name=test --filename=/mnt/bk1/test40G --size=40g --direct=1 --rw=<rw> --ioengine=libaio --fallocate=none --group_reporting --numjobs=<n> --bs=4k --iodepth=16 --ramp_time=10 --runtime=50
rw=write, randwrite; n=1, 10

Best write IOPS of server 1 vs server 2: 350 vs 320

fio --name=test --filename=/mnt/bk1/test40G --size=40g --direct=1 --rw=<rw> --ioengine=libaio --fallocate=none --group_reporting --numjobs=<n> --bs=4M --iodepth=64 --ramp_time=10 --runtime=50
rw=write, randwrite; n=1, 10

Best write throughput of server 1 vs server 2: 156MiB/s vs 128MiB/s

I'd much appreciate it if you could help me understand the test results:

write IOPS: expected 880 vs 440, measured 350 vs 320
write throughput: expected 1992 MiB/s vs 2241 MiB/s, measured 156 MiB/s vs 128 MiB/s

Thanks very much!
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
One contributing factor I would suspect is that the NFS clients on your Linux guests are likely requesting synchronous write operations (or periodically sending COMMIT/flush commands) - you can test this by forcing sync=disabled on a dataset and running the benchmark remotely again.
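For example, from the shell (the pool/dataset name here is just a placeholder):

# for benchmarking only - async writes can be lost if the server loses power
zfs set sync=disabled tank/nfs-test
# confirm the setting
zfs get sync tank/nfs-test
# restore the default behaviour afterwards
zfs set sync=standard tank/nfs-test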

There will be some overhead of remote vs. local as well - I assume you have validated with iperf or a similar tool to ensure that your network bandwidth between the two endpoints is close to the theoretical 25Gbps?
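For example, with iperf3 (the server IP is a placeholder):

# on the TrueNAS server
iperf3 -s
# on the Linux client, 4 parallel streams for 30 seconds
iperf3 -c 10.0.0.10 -P 4 -t 30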
 

wyang

Dabbler
Joined
Jul 8, 2020
Messages
24
Thanks very much @HoneyBadger for the recommendations.

Network bandwidth is not a bottleneck in the test environment.

Being new to this, I have to ask: would forcing sync=disabled on an NFS dataset impact the data integrity guarantees of the NFS service?
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694
The estimated numbers being reported are for ZFS..... not for NFS.

NFS has its own metadata and locking mechanisms. Are these all random write tests from one client to a single file?
 

wyang

Dabbler
Joined
Jul 8, 2020
Messages
24
I see. Thanks @morganL .

These values are the results of sequential write tests for both IOPS and throughput, which are better than the random write results. It was a single client writing to a server at a time. The results are similar across the different fio options: numjobs=1 or 10, iodepth=16 or 64.
 

ChrisRJ

Wizard
Joined
Oct 23, 2020
Messages
1,919
I assume that the test is supposed to validate the suitability of the setup for one or several scenarios/use cases. Are we talking about something like purely sequential transfer of large files, or is there more?
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694
I see. Thanks @morganL .

These values are the results of sequential write tests for both IOPS and throughput, which are better than the random write results. It was a single client writing to a server at a time. The results are similar across the different fio options: numjobs=1 or 10, iodepth=16 or 64.
It's difficult to evaluate whether you have a client issue or a NAS configuration issue when there is only one NAS client.
I'd check whether sync=always is on, since you don't have a SLOG.
I don't think the settings above are sequential... I'd check with a simple single-file write test.
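For example (the dataset name is a placeholder; /mnt/bk1 is the existing client mount):

# on the server: confirm the effective sync setting
zfs get sync tank/backups
# on the client: single-threaded sequential write of one large file
fio --name=seq --filename=/mnt/bk1/seqtest --size=40g --rw=write --bs=1M --iodepth=1 --numjobs=1 --direct=1 --ioengine=libaio --fallocate=none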

When we test a large system, we generally test with 10-20 clients. For a client test, we test with a simple workload.
 

wyang

Dabbler
Joined
Jul 8, 2020
Messages
24
The test is to validate read/write performance from clients on different network segments (9 client network segments), with a single client request at a time, and with pool compression off or lz4. The immediate use cases will be application data backups from Linux servers, with medium to large files. The objective is to understand read/write performance and, if possible, to select appropriate settings that achieve better performance, with the prerequisite of ensuring no data loss.

With the existing hardware, there is no SLOG on the servers. For the test, the datasets are configured with sync=standard, i.e. using the sync settings requested by the client software.

I reconfigured a dataset to sync=disabled for comparison:
- best write IOPS with fio sequential write vs random write: 27.9k vs 1523
- best write throughput with fio sequential write vs random write: 333 MiB/s vs 332 MiB/s

I did some more reading on NFS sync vs. async; it looks like I still need NFS sync for my use cases.

Thank you all, @HoneyBadger, @morganL and @ChrisRJ
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
The immediate use cases will be application data backups from Linux servers, with medium to large files.

If you're saving data here in a non-real-time scenario (e.g. in case of failure, you could simply re-run the backup job) then you may be able to run sync=disabled on the backup datasets safely. There is a minor risk, as "directory metadata" wouldn't be written synchronously either - but generally speaking, it wouldn't be any less safe than using SMB as a backup target with regular parameters, and if you validate your backups after creation (which you should - what good is a backup you can't restore from?) then you would be informed immediately if there was any issue.

But if you intend to extend the use case beyond just "backup target" - since disabling sync improved throughput, I suspect it's the nature of "remote sync writes" that is limiting you. You may be able to increase overall throughput by adding a high-performance SLOG device, such as an Optane card.
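As a rough sketch from the CLI (pool and device names below are placeholders; on TrueNAS you would normally add the log vdev through the web UI instead):

# single log device
zpool add tank log nvd0
# or a mirrored pair for safety
zpool add tank log mirror nvd0 nvd1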
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694
The test is to validate read/write performance from clients on different network segments (9 client network segments), with a single client request at a time, and with pool compression off or lz4. The immediate use cases will be application data backups from Linux servers, with medium to large files. The objective is to understand read/write performance and, if possible, to select appropriate settings that achieve better performance, with the prerequisite of ensuring no data loss.

With the existing hardware, there is no SLOG on the servers. For the test, the datasets are configured with sync=standard, i.e. using the sync settings requested by the client software.

I reconfigured a dataset to sync=disabled for comparison:
- best write IOPS with fio sequential write vs random write: 27.9k vs 1523
- best write throughput with fio sequential write vs random write: 333 MiB/s vs 332 MiB/s

I did some more reading on NFS sync vs. async; it looks like I still need NFS sync for my use cases.

Thank you all, @HoneyBadger, @morganL and @ChrisRJ
Apart from a SLOG, you should check whether the clients are accessing different files or using different mount points. If the test doesn't do that, you may be creating artificial bottlenecks.
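For example, a sketch that gives each client its own target file on the shared mount:

fio --name=test --filename=/mnt/bk1/test40G_$(hostname) --size=40g --direct=1 --rw=write --ioengine=libaio --fallocate=none --group_reporting --numjobs=1 --bs=4M --iodepth=64 --ramp_time=10 --runtime=50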
 

wyang

Dabbler
Joined
Jul 8, 2020
Messages
24
Thank you all very much for sharing your experience and providing advice!
 