NFS write performance

wyang

Dabbler
Joined
Jul 8, 2020
Messages
24
Greetings,

My servers' hardware configuration

Type: SuperMicro SYS-6029U-E1CR25M
Processor: Intel(R) Xeon(R) Gold 6226R CPU @ 2.90GHz
Memory: 12 x 32GB Samsung M393A4K40DB2-CVF
Storage:
2 x 256GB NVMe SSD, INTEL SSDPEKKA256GB
12 x 16TB HDD, Seagate ST16000NM001G-2KK103

TrueNAS-12.0-U3.1

Server 1
- Boot: 1 x 2-way mirror (SSD)
- Pool: 2 x 6-wide RAIDZ2 data VDEVs

Server 2
- Boot: 1 x 2-way mirror (SSD)
- Pool: 1 x 12-wide RAIDZ3 data VDEV

Each server has a 2 x 25G lagg0 interface for data traffic.

Specs of the HDDs
Spindle Speed: 7,200 RPM
Interface Access Speed (Gb/s): 6.0, 3.0
Max. Sustained Transfer Rate OD (MB/s, MiB/s): 261, 249
Random Read/Write 4K QD16 WCD (IOPS): 170/440
Interface Ports: Single

Per the reference https://www.ixsystems.com/blog/zfs-pool-performance-2/, the expected performance of server 1 vs server 2 is:
write IOPS: 880 vs 440
streaming write: 2088 MB/s, 1992 MiB/s vs 2349 MB/s, 2241 MiB/s
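(For reference, these figures follow the blog's rules of thumb applied to the drive specs above: streaming write scales roughly with the number of data disks, so server 1 is about 2 vdevs x 4 data disks x 261 MB/s = 2088 MB/s and server 2 is about 9 data disks x 261 MB/s = 2349 MB/s; RAIDZ random write IOPS is roughly one disk's worth per vdev, so 2 x 440 = 880 vs 1 x 440 = 440.)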

I created NFS shares on the two servers, mounted them on Linux clients (bare-metal servers and VMs), and ran fio tests from these clients.

fio --name=test --filename=/mnt/bk1/test40G --size=40g --direct=1 --rw=<rw> --ioengine=libaio --fallocate=none --group_reporting --numjobs=<n> --bs=4k --iodepth=16 --ramp_time=10 --runtime=50
rw=write, randwrite; n=1, 10

Best write IOPS of server 1 vs server 2: 350 vs 320

fio --name=test --filename=/mnt/bk1/test40G --size=40g --direct=1 --rw=<rw> --ioengine=libaio --fallocate=none --group_reporting --numjobs=<n> --bs=4M --iodepth=64 --ramp_time=10 --runtime=50
rw=write, randwrite; n=1, 10

Best write throughput of server 1 vs server 2: 156MiB/s vs 128MiB/s

I'd much appreciate it if you could help me understand the test results:

write IOPS: expected 880 vs 440, measured 350 vs 320
write throughput: expected 1992 MiB/s vs 2241 MiB/s, measured 156 MiB/s vs 128 MiB/s

Thanks very much!
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
One contributing factor I would suspect is that the NFS clients on your Linux guests are likely requesting synchronous write operations (or periodically sending COMMIT/flush commands) - you can test this by forcing sync=disabled on a dataset and running the benchmark remotely again.
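For example, from the shell (the pool/dataset name here is just a placeholder):

# for benchmarking only - async writes can be lost if the server loses power
zfs set sync=disabled tank/nfs-test
# confirm the setting
zfs get sync tank/nfs-test
# restore the default behaviour afterwards
zfs set sync=standard tank/nfs-test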

There will be some overhead of remote vs. local as well - I assume you have validated with iperf or a similar tool to ensure that your network bandwidth between the two endpoints is close to the theoretical 25Gbps?
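For example, with iperf3 (the server IP is a placeholder):

# on the TrueNAS server
iperf3 -s
# on the Linux client, 4 parallel streams for 30 seconds
iperf3 -c 10.0.0.10 -P 4 -t 30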
 

wyang

Dabbler
Joined
Jul 8, 2020
Messages
24
Thanks very much @HoneyBadger for the recommendations.

Network bandwidth is not a bottleneck in the test environment.

Being new to this, I have to ask: would forcing sync=disabled on an NFS dataset impact the data integrity guarantees of the NFS service?
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694
The estimated numbers being reported are for ZFS..... not for NFS.

NFS has its own metadata and locking mechanisms. Are these all random write tests from one client to a single file?
 

wyang

Dabbler
Joined
Jul 8, 2020
Messages
24
I see. Thanks @morganL .

These values are the results of sequential write tests for both IOPS and throughput, which are better than the random write results. It was a single client writing to a server at a time. The results are similar across the different fio options: numjobs=1 or 10, iodepth=16 or 64.
 

ChrisRJ

Wizard
Joined
Oct 23, 2020
Messages
1,919
I assume that the test is supposed to validate the suitability of the setup for one or several scenarios/use cases. Are we talking about something like purely sequential transfer of large files, or is there more?
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694
I see. Thanks @morganL .

These values are the results of sequential write tests for both IOPS and throughput, which are better than the random write results. It was a single client writing to a server at a time. The results are similar across the different fio options: numjobs=1 or 10, iodepth=16 or 64.
It's difficult to evaluate whether you have a client issue or a NAS configuration issue when there is only one NAS client.
I'd check whether sync=always is on, since you don't have a SLOG.
I don't think the settings above are sequential... I'd check with a simple single-file write test.
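For example (the dataset name is a placeholder; /mnt/bk1 is the existing client mount):

# on the server: confirm the effective sync setting
zfs get sync tank/backups
# on the client: single-threaded sequential write of one large file
fio --name=seq --filename=/mnt/bk1/seqtest --size=40g --rw=write --bs=1M --iodepth=1 --numjobs=1 --direct=1 --ioengine=libaio --fallocate=none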

When we test a large system, we generally test with 10-20 clients. For a client test, we test with a simple workload.
 

wyang

Dabbler
Joined
Jul 8, 2020
Messages
24
The test is to validate read/write performance from clients on different network segments (9 client network segments), with a single client request at a time, and with pool compression off or lz4. The immediate use cases will be application data backups from Linux servers, with medium to large files. The objective is to understand read/write performance and, if possible, to select appropriate settings that achieve better performance, with the prerequisite of ensuring no data loss.

With the existing hardware, there is no SLOG on the servers. For the test, the datasets are configured with sync=standard, i.e. using the sync settings requested by the client software.

I reconfigured a dataset to sync=disabled for comparison:
- best write IOPS with fio sequential write vs random write: 27.9k vs 1523
- best write throughput with fio sequential write vs random write: 333 MiB/s vs 332 MiB/s

I did some more reading on NFS sync vs. async; it looks like I still need NFS sync for my use cases.

Thank you all, @HoneyBadger, @morganL and @ChrisRJ
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
The immediate use cases will be application data backups from Linux servers, with medium to large files.

If you're saving data here in a non-real-time scenario (e.g. in case of failure, you could simply re-run the backup job) then you may be able to run sync=disabled on the backup datasets safely. There is a minor risk, as "directory metadata" wouldn't be written synchronously either - but generally speaking, it wouldn't be any less safe than using SMB as a backup target with regular parameters, and if you validate your backups after creation (which you should - what good is a backup you can't restore from?) then you would be informed immediately if there was any issue.

But if you intend to extend the use case beyond just "backup target" - since disabling sync improved throughput, I suspect it's the nature of "remote sync writes" that is limiting you. You may be able to increase overall throughput by adding a high-performance SLOG device, such as an Optane card.
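As a rough sketch from the CLI (pool and device names below are placeholders; on TrueNAS you would normally add the log vdev through the web UI instead):

# single log device
zpool add tank log nvd0
# or a mirrored pair for safety
zpool add tank log mirror nvd0 nvd1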
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694
The test is to validate read/write performance from clients on different network segments (9 client network segments), with a single client request at a time, and with pool compression off or lz4. The immediate use cases will be application data backups from Linux servers, with medium to large files. The objective is to understand read/write performance and, if possible, to select appropriate settings that achieve better performance, with the prerequisite of ensuring no data loss.

With the existing hardware, there is no SLOG on the servers. For the test, the datasets are configured with sync=standard, i.e. using the sync settings requested by the client software.

I reconfigured a dataset to sync=disabled for comparison:
- best write IOPS with fio sequential write vs random write: 27.9k vs 1523
- best write throughput with fio sequential write vs random write: 333 MiB/s vs 332 MiB/s

I did some more reading on NFS sync vs. async; it looks like I still need NFS sync for my use cases.

Thank you all, @HoneyBadger, @morganL and @ChrisRJ
Apart from a SLOG, you should check whether the clients are accessing different files or using different mount points. If the test doesn't do that, you may be creating artificial bottlenecks.
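For example, a sketch that gives each client its own target file on the shared mount:

fio --name=test --filename=/mnt/bk1/test40G_$(hostname) --size=40g --direct=1 --rw=write --ioengine=libaio --fallocate=none --group_reporting --numjobs=1 --bs=4M --iodepth=64 --ramp_time=10 --runtime=50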
 

wyang

Dabbler
Joined
Jul 8, 2020
Messages
24
Thank you all very much for sharing your experience and providing advice!
 