~1.7 GB/s pool performance, but expecting more...

Phlesher

Dabbler
Joined
Jan 9, 2022
Messages
16
Looking for some pointers on what I can do to increase throughput, or if perhaps I'm expecting too much and should be content with what I've got. :)

Layout is:
  • 5-disk RAIDZ-1
  • Each disk is a 4 TB Samsung 870 EVO (should have ~510 MB/s sequential write throughput)
  • Record size for the dataset set to 1 MiB (creation settings sketched below)
  • No compression
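For completeness, the creation settings were something along these lines (the pool/dataset names here are placeholders for my real ones):

Code:
# Example only -- pool/dataset names are placeholders
zfs create -o recordsize=1M -o compression=off tank/test-dataset
zfs get recordsize,compression tank/test-dataset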
Executing this to test a basic write to the dataset:

dd if=/dev/zero of=/mnt/test-dataset/testfile bs=4M count=10000

I get about 1700 MB/s (~1.7 GB/s) throughput.

I don't understand enough about the characteristics of the combined writes happening across the drives, including the parity, to determine whether this is about the maximum performance I should expect for such a sequential write. Any advice welcome!


EDIT to reflect other information I should have included originally:
  • Understood that compression is a thing I'll want for the real dataset. This is merely a test dataset, and turning off compression ensures a clean test (bytes written to dataset == bytes written to disks).
  • Understood that mirrors would be more performant. In this case, I am maximizing for storage space (with a bare minimum of fault tolerance), not performance. I want to achieve the best possible performance out of a RAIDZ1 pool. (Aside: I have a separate pool that's a simple mirror, for other use cases.)
 
Joined
Oct 22, 2019
Messages
3,641
No compression
Not your central issue in this thread, but I recommend always enabling some level of compression.

Even for datasets that hold mostly incompressible files, you still gain the benefit of removing the "slack" at the end of records. At minimum, enable LZ4 compression: it has early abort, so incompressible data costs almost nothing, and you still reap the benefit of trimming that record "slack".
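Turning it on is a one-liner and only affects newly written data (the dataset name below is just an example):

Code:
# Enable LZ4 on an existing dataset; already-written records stay as-is until rewritten
zfs set compression=lz4 tank/mydata
# Check what you're getting out of it
zfs get compression,compressratio tank/mydata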
 
Joined
Jul 3, 2015
Messages
926
If you want performance then use mirrors; in your case, 2 x 2.
 

Phlesher

Dabbler
Joined
Jan 9, 2022
Messages
16
Sorry, I should have clarified a few things and will edit the original post to reflect:
  • Understood that compression is a thing I'll want for the real dataset. This is merely a test dataset, and turning off compression ensures a clean test (bytes written to dataset == bytes written to disks).
  • Understood that mirrors would be more performant. In this case, I am maximizing for storage space (with a bare minimum of fault tolerance), not performance. I want to achieve the best possible performance out of a RAIDZ1 pool. (Aside: I have a separate pool that's a simple mirror, for other use cases.)
 

rvassar

Guru
Joined
May 2, 2018
Messages
972
Sorry, I should have clarified a few things and will edit the original post to reflect:
  • Understood that compression is a thing I'll want for the real dataset. This is merely a test dataset, and turning off compression ensures a clean test (bytes written to dataset == bytes written to disks).
  • Understood that mirrors would be more performant. In this case, I am maximizing for storage space (with a bare minimum of fault tolerance), not performance. I want to achieve the best possible performance out of a RAIDZ1 pool. (Aside: I have a separate pool that's a simple mirror, for other use cases.)

You have a misconception about RAIDz1 write performance. A single vdev RAIDz1 requires the write to complete on every device to ensure data integrity. RAIDz1 gives you a small speed bonus on read, and essentially single device write rates.

I suspect you're front loading into RAM, and the overall write rate is equivalent to a single device 510MB/sec rate. Bump up your blocksize*count so the test file size exceeds your RAM, and watch the rate plummet. Alternatively, try dd's "oflag=sync" option.
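Something like either of these, with sizes adjusted to your RAM (paths and counts below are just illustrative):

Code:
# Force synchronous writes so the RAM write cache can't flatter the numbers
dd if=/dev/zero of=/mnt/test-dataset/testfile bs=4M count=10000 oflag=sync

# Or write well past RAM size (~100 GB here) so caching can't hide the disks
dd if=/dev/zero of=/mnt/test-dataset/testfile bs=10M count=10000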
 
Joined
Jul 3, 2015
Messages
926
RAIDZ1 falls between the cracks a bit. Personally, if you want performance, I'd go for mirrors (maybe with a hot spare), and if not, RAID-Z2. Z1 sounds like a compromise, but it isn't; you get the worst of both worlds.
 

Phlesher

Dabbler
Joined
Jan 9, 2022
Messages
16
You have a misconception about RAIDz1 write performance. A single vdev RAIDz1 requires the write to complete on every device to ensure data integrity. RAIDz1 gives you a small speed bonus on read, and essentially single device write rates.

I suspect you're front loading into RAM, and the overall write rate is equivalent to a single device 510MB/sec rate. Bump up your blocksize*count so the test file size exceeds your RAM, and watch the rate plummet. Alternatively, try dd's "oflag=sync" option.

I actually have very few conceptions! Your reply here has helped me understand some things about the RAM possibly acting as a buffer in front of the disks. I'll try more testing along the lines of what you're suggesting here. Thanks!
 

Phlesher

Dabbler
Joined
Jan 9, 2022
Messages
16
RAIDZ1 falls between the cracks a bit. Personally, if you want performance, I'd go for mirrors (maybe with a hot spare), and if not, RAID-Z2. Z1 sounds like a compromise, but it isn't; you get the worst of both worlds.

Yeah, that was understood when I decided to build it this way. What I'm going for here is a little bit of fault tolerance without sacrificing a bunch of disk space. I have limited drive bays to play with and am trying to make everything fit.

Note that I also have a hot spare I can assign to any given vdev in the pool. So I really should say the profile here is "multiple RAIDZ1 vdevs + N hot spares". My understanding is that this gives me another (lesser) level of fault tolerance: a single drive failure will be handled cleanly and without my intervention, over however long it takes the data to resilver onto the hot spare (call it N minutes), while a second drive failure in the same vdev within those same N minutes would be unrecoverable. I felt this was an acceptable level of risk given the kind of data I'm storing here (not critical, and to be backed up remotely anyway).
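(For reference, attaching the spare is just something like this; the pool and device names are placeholders for my real ones:)

Code:
# Add a pool-wide hot spare (device name is a placeholder)
zpool add bulk-storage spare /dev/da6
zpool status bulk-storage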

Please disabuse me of my notions if you find this to be an intolerably stupid setup. :D
 

Phlesher

Dabbler
Joined
Jan 9, 2022
Messages
16
Tested more along the lines of your suggestions, @rvassar.

First test was this, using oflag=sync:

Code:
root@hoard[/mnt/bulk-storage/test-no-compression]# dd if=/dev/zero of=/mnt/bulk-storage/test-no-compression/testfile bs=4M count=10000 oflag=sync
10000+0 records in
10000+0 records out
41943040000 bytes transferred in 59.954501 secs (699581170 bytes/sec)


Second test was this, increasing total data written past the limits of my RAM (64 GB allocated to TrueNAS):

Code:
root@hoard[/mnt/bulk-storage/test-no-compression]# dd if=/dev/zero of=/mnt/bulk-storage/test-no-compression/testfile2 bs=10M count=10000
10000+0 records in
10000+0 records out
104857600000 bytes transferred in 61.379977 secs (1708335590 bytes/sec)


Graph of disk I/O showing both tests (chose just one of the disks -- they all look basically the same, as you'd expect):

[Attachment: Screen Shot 2023-03-06 at 10.03.22 AM.png]


So neither one of those results lines up exactly with what I might have expected: ~700 MB/s in the first case, and the second case still shows the same ~1700 MB/s I was getting in my original test.
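Another way to watch the per-disk rates during a run, instead of the reporting graphs, would be something like this (pool name is mine, the 1-second interval is arbitrary):

Code:
# Per-vdev and per-disk throughput, refreshed every second
zpool iostat -v bulk-storage 1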
 

mav@

iXsystems
Joined
Sep 29, 2011
Messages
1,428
With oflag=sync, speed is likely limited by cache-flush time on the SSDs plus the additional ZIL traffic. The value is not surprising for desktop SATA SSDs and a single-threaded write.

1700 MB/s, though, sounds about right for a RAIDZ1 of 5 SATA SSDs limited to ~500 MB/s each. You should probably not expect much more. There may also be a CPU limit on memory copies in ZFS due to the single-threaded write, depending on CPU frequency, but I'd expect that to be closer to 4 GB/s.
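Rough back-of-envelope, assuming ~500 MB/s per drive: 5 drives * 500 MB/s = 2500 MB/s of raw bandwidth, one drive's worth of that goes to parity in RAIDZ1, leaving about 4 * 500 = 2000 MB/s for data; your ~1700 MB/s is roughly 85% of that ceiling, which is about what is left after ZFS overhead.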
 