~1.7 GB/s pool performance, but expecting more...

Phlesher

Dabbler
Joined
Jan 9, 2022
Messages
16
Looking for some pointers on what I can do to increase throughput, or if perhaps I'm expecting too much and should be content with what I've got. :)

Layout is:
  • 5-disk RAIDZ-1
  • Each disk is a 4 TB Samsung 870 EVO (should have ~510 MB/s sequential write throughput)
  • Record size for the dataset set to 1 MiB (creation settings sketched below)
  • No compression
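For completeness, the creation settings were something along these lines (the pool/dataset names here are placeholders for my real ones):

Code:
# Example only -- pool/dataset names are placeholders
zfs create -o recordsize=1M -o compression=off tank/test-dataset
zfs get recordsize,compression tank/test-dataset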
Executing this to test a basic write to the dataset:

dd if=/dev/zero of=/mnt/test-dataset/testfile bs=4M count=10000

I get about 1700 MB/s (~1.7 GB/s) throughput.

I don't understand enough about the characteristics of the combined writes happening across the drives, including the parity, to determine whether this is about the maximum performance I should expect for such a sequential write. Any advice welcome!


EDIT to reflect other information I should have included originally:
  • Understood that compression is a thing I'll want for the real dataset. This is merely a test dataset, and turning off compression ensures a clean test (bytes written to dataset == bytes written to disks).
  • Understood that mirrors would be more performant. In this case, I am maximizing for storage space (with a bare minimum of fault tolerance), not performance. I want to achieve the best possible performance out of a RAIDZ1 pool. (Aside: I have a separate pool that's a simple mirror, for other use cases.)
 
Joined
Oct 22, 2019
Messages
3,641
No compression
Not your central issue in this thread, but I recommend always enabling some level of compression.

Even for datasets that hold mostly incompressible files, you still gain the benefit of removing the "slack" at the end of records. At minimum, enable LZ4 compression: it has early abort, so incompressible data costs almost nothing, and you still reap the benefit of trimming that record "slack".
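Turning it on is a one-liner and only affects newly written data (the dataset name below is just an example):

Code:
# Enable LZ4 on an existing dataset; already-written records stay as-is until rewritten
zfs set compression=lz4 tank/mydata
# Check what you're getting out of it
zfs get compression,compressratio tank/mydata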
 
Joined
Jul 3, 2015
Messages
926
If you want performance then use mirrors; in your case, 2 x 2.
 

Phlesher

Dabbler
Joined
Jan 9, 2022
Messages
16
Sorry, I should have clarified a few things and will edit the original post to reflect:
  • Understood that compression is a thing I'll want for the real dataset. This is merely a test dataset, and turning off compression ensures a clean test (bytes written to dataset == bytes written to disks).
  • Understood that mirrors would be more performant. In this case, I am maximizing for storage space (with a bare minimum of fault tolerance), not performance. I want to achieve the best possible performance out of a RAIDZ1 pool. (Aside: I have a separate pool that's a simple mirror, for other use cases.)
 

rvassar

Guru
Joined
May 2, 2018
Messages
972
Sorry, I should have clarified a few things and will edit the original post to reflect:
  • Understood that compression is a thing I'll want for the real dataset. This is merely a test dataset, and turning off compression ensures a clean test (bytes written to dataset == bytes written to disks).
  • Understood that mirrors would be more performant. In this case, I am maximizing for storage space (with a bare minimum of fault tolerance), not performance. I want to achieve the best possible performance out of a RAIDZ1 pool. (Aside: I have a separate pool that's a simple mirror, for other use cases.)

You have a misconception about RAIDz1 write performance. A single vdev RAIDz1 requires the write to complete on every device to ensure data integrity. RAIDz1 gives you a small speed bonus on read, and essentially single device write rates.

I suspect you're front loading into RAM, and the overall write rate is equivalent to a single device 510MB/sec rate. Bump up your blocksize*count so the test file size exceeds your RAM, and watch the rate plummet. Alternatively, try dd's "oflag=sync" option.
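Something like either of these, with sizes adjusted to your RAM (paths and counts below are just illustrative):

Code:
# Force synchronous writes so the RAM write cache can't flatter the numbers
dd if=/dev/zero of=/mnt/test-dataset/testfile bs=4M count=10000 oflag=sync

# Or write well past RAM size (~100 GB here) so caching can't hide the disks
dd if=/dev/zero of=/mnt/test-dataset/testfile bs=10M count=10000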
 
Joined
Jul 3, 2015
Messages
926
RAIDZ1 falls between the cracks a bit. Personally, if you want performance, I'd go for mirrors (maybe with a hot spare), and if not, RAID-Z2. Z1 sounds like a compromise, but it isn't; you get the worst of both worlds.
 

Phlesher

Dabbler
Joined
Jan 9, 2022
Messages
16
You have a misconception about RAIDz1 write performance. A single vdev RAIDz1 requires the write to complete on every device to ensure data integrity. RAIDz1 gives you a small speed bonus on read, and essentially single device write rates.

I suspect you're front loading into RAM, and the overall write rate is equivalent to a single device 510MB/sec rate. Bump up your blocksize*count so the test file size exceeds your RAM, and watch the rate plummet. Alternatively, try dd's "oflag=sync" option.

I actually have very few conceptions! Your reply here has helped me understand some things about the RAM possibly acting as a buffer in front of the disks. I'll try more testing along the lines of what you're suggesting here. Thanks!
 

Phlesher

Dabbler
Joined
Jan 9, 2022
Messages
16
RAIDZ1 falls between the cracks a bit. Personally, if you want performance, I'd go for mirrors (maybe with a hot spare), and if not, RAID-Z2. Z1 sounds like a compromise, but it isn't; you get the worst of both worlds.

Yeah, that was understood when I decided to build it this way. What I'm going for here is a little bit of fault tolerance without sacrificing a bunch of disk space. I have limited drive bays to play with and am trying to make everything fit.

Note that I also have a hot spare I can assign to any given vdev in the pool. So I really should say the profile here is "multiple RAIDZ1 vdevs + N hot spares". My understanding is that this gives me another (lesser) level of fault tolerance: a single drive failure will be handled cleanly and without my intervention, over however long it takes the data to resilver onto the hot spare (call it N minutes), while a second drive failure in the same vdev within those same N minutes would be unrecoverable. I felt this was an acceptable level of risk given the kind of data I'm storing here (not critical, and to be backed up remotely anyway).
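(For reference, attaching the spare is just something like this; the pool and device names are placeholders for my real ones:)

Code:
# Add a pool-wide hot spare (device name is a placeholder)
zpool add bulk-storage spare /dev/da6
zpool status bulk-storage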

Please disabuse me of my notions if you find this to be an intolerably stupid setup. :D
 

Phlesher

Dabbler
Joined
Jan 9, 2022
Messages
16
Tested more along the lines of your suggestions, @rvassar.

First test was this, using oflag=sync:

Code:
root@hoard[/mnt/bulk-storage/test-no-compression]# dd if=/dev/zero of=/mnt/bulk-storage/test-no-compression/testfile bs=4M count=10000 oflag=sync
10000+0 records in
10000+0 records out
41943040000 bytes transferred in 59.954501 secs (699581170 bytes/sec)


Second test was this, increasing total data written past the limits of my RAM (64 GB allocated to TrueNAS):

Code:
root@hoard[/mnt/bulk-storage/test-no-compression]# dd if=/dev/zero of=/mnt/bulk-storage/test-no-compression/testfile2 bs=10M count=10000
10000+0 records in
10000+0 records out
104857600000 bytes transferred in 61.379977 secs (1708335590 bytes/sec)


Graph of disk I/O showing both tests (chose just one of the disks -- they all look basically the same, as you'd expect):

[Attachment: Screen Shot 2023-03-06 at 10.03.22 AM.png]


So neither one of those results lines up exactly with what I might have expected: ~700 MB/s in the first case, and the second case still shows the same ~1700 MB/s I was getting in my original test.
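Another way to watch the per-disk rates during a run, instead of the reporting graphs, would be something like this (pool name is mine, the 1-second interval is arbitrary):

Code:
# Per-vdev and per-disk throughput, refreshed every second
zpool iostat -v bulk-storage 1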
 

mav@

iXsystems
Joined
Sep 29, 2011
Messages
1,428
With oflag=sync, speed is likely limited by cache-flush time on the SSDs plus the additional ZIL traffic. The value is not surprising for desktop SATA SSDs and a single-threaded write.

1700 MB/s, though, sounds about right for a RAIDZ1 of 5 SATA SSDs limited to ~500 MB/s each. You should probably not expect much more. There may also be a CPU limit on memory copies in ZFS due to the single-threaded write, depending on CPU frequency, but I'd expect that to be closer to 4 GB/s.
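Rough back-of-envelope, assuming ~500 MB/s per drive: 5 drives * 500 MB/s = 2500 MB/s of raw bandwidth, one drive's worth of that goes to parity in RAIDZ1, leaving about 4 * 500 = 2000 MB/s for data; your ~1700 MB/s is roughly 85% of that ceiling, which is about what is left after ZFS overhead.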
 