VDEV configuration for 24 SAS SSDs

MisterDeeds

Cadet
Joined
May 1, 2023
Messages
5
Dear all

I am new to TrueNAS and would like to replace the Synology systems at our company with TrueNAS. The first system I want to replace is a storage server that primarily hosts virtual machines.

The hardware is a Supermicro CSE-216 with an X10DRI-T4+ board, 2x Intel Xeon E5-2687W v4 (3.00 GHz) and 384 GB registered ECC DDR4 RAM. Three LSI SAS9300-8i storage controllers are installed, so each SSD (24x 1.92 TB 2.5" Samsung PM1643a SAS SSD) receives the full bandwidth. The controllers are configured as HBAs. Two Intel XXV710-DA2 (SFP28) network cards are also installed.

Since the system hosts virtual machines, I naturally want to get maximum performance. However, I also don't want to lose too much disk space.

Now to my specific question: does it make sense to create just 2 VDEVs of 12 SSDs each as RAIDZ1, or are there better configurations?
In general, I would be interested to know whether there are any other suggestions for improving the system. Does it make sense to install an NVMe cache, for example?

I look forward to your responses.

Thank you and best regards
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Does it make sense to create just 2 VDEVs of 12 SSDs each as RAIDZ1?

No, please read the resource on block storage.

 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
Does the fact that SSDs are involved change this analysis? I'd kind of think it would, as seeks are pretty much instant, but I know there's a whole lot I don't know here.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Does the fact that SSDs are involved change this analysis? I'd kind of think it would, as seeks are pretty much instant, but I know there's a whole lot I don't know here.

SSDs make seeks less relevant in some ways, but there's other stuff going on that doesn't directly involve IOPS. For this I will refer to an earlier write-up and just quote it verbatim, because I don't have the strength to go through this in detail tonite.

---------------------------------------------------
The problem with RAIDZ isn't just IOPS, it is space allocation. The manner in which space is allocated on RAIDZ is ... um, well, optimized for sequential file data. If you pick the wrong design, which is, as far as I can tell, MOST of them, you end up with space inefficiencies burning up lots of space.

This is especially hurty when you use a larger ashift (such as 12, which IIRC is the current default) because the recordsize or volblocksize tends not to fit efficiently into the allocations, and there are just some really weird interactions that make my head hurt.

Let's approach this from one of ZFS's classic examples of space amplification: with ashift=12 (4K sector size) and a volblocksize of 4K, if you do RAIDZ3, you need three parity blocks to protect that one 4K "volblock", and you end up using 16K of space (one data, three parity).

But wait! It's even better. With RAIDZ, ZFS rounds every allocation up to a multiple of (parity + 1) sectors, adding "padding" sectors to fill the difference, so you can also create pathological conditions where space is wasted with a small record/volblocksize on RAIDZ2 as well.

Now, you can INCREASE the record/volblocksize, and these problems are reduced, absolutely true. But this increases write amplification to an SSD, because you're writing unchanged stuff back out to the SSD.

Reducing to ashift=9 reduces the scope of the allocation inefficiency but creates other problems. Using mirrors makes it possible to use ashift=12 AND a volblocksize of 4K, and it all works optimally. Your only real chance with ashift=12 RAIDZ is to design your pool with a moderate volblocksize chosen to cooperate with the RAIDZ structure to deliver optimal efficiency; I leave this as an exercise for the reader (a sketch of the arithmetic follows below the quote).

Basically the contortions here make my brain hurt so I don't like thinking of this and now my head hurts and now I hate you all. :smile:
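
To put rough numbers on the quoted behavior, here is a minimal sketch of the commonly described RAIDZ allocation rule (data sectors plus parity sectors, padded up to a multiple of parity + 1). It's an approximation for playing with the arithmetic, not the actual ZFS allocator:

```python
# Minimal sketch of the commonly described RAIDZ allocation rule:
# data sectors + parity sectors, padded up to a multiple of (parity + 1).
# An approximation for illustration, not the actual ZFS allocator.
import math

def raidz_alloc(volblocksize, ashift, width, parity):
    """Return (data, parity, padding) sectors consumed by one block."""
    sector = 1 << ashift                               # ashift=12 -> 4096-byte sectors
    data = math.ceil(volblocksize / sector)            # data sectors needed
    par = parity * math.ceil(data / (width - parity))  # parity per stripe row
    pad = (-(data + par)) % (parity + 1)               # pad to a multiple of parity+1
    return data, par, pad

# The example from the quote: 4K volblocksize, ashift=12, RAIDZ3
print(raidz_alloc(4096, 12, width=8, parity=3))   # (1, 3, 0): 16K on disk for 4K of data

# The RAIDZ2 padding case: 8K volblocksize on a 6-wide RAIDZ2
print(raidz_alloc(8192, 12, width=6, parity=2))   # (2, 2, 2): 24K on disk for 8K of data

# And the "exercise for the reader": overhead vs. volblocksize on a 6-wide RAIDZ1
for vbs in (4096, 8192, 16384, 32768, 65536, 131072):
    d, p, pad = raidz_alloc(vbs, 12, width=6, parity=1)
    print(f"{vbs // 1024:>4} KiB volblock -> {(d + p + pad) * 4096 / vbs:.2f}x allocated")
```

Run it and the small volblocksizes come out at 2.00x allocated per byte stored, which is exactly the point above: at that size a RAIDZ1 is no more space-efficient than a mirror, while the larger volblocksizes settle around 1.25x.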
 

NickF

Guru
Joined
Jun 12, 2014
Messages
763
So...
WHILE YOU ARE NOT WRONG

With mirrors you get a hard 50% storage reduction; in turn you get high performance.

With two 12-wide RAIDZ1 vdevs, you're probably going to get more than 50% usable space no matter how you cut it...
A less-wide vdev makes your argument stronger, and I'm not advocating the OP make a 12-wide Z1... that's crazy talk.

I would say four 6-wide RAIDZ1s would be a good place to start. If you throw a workload at it and you are happy with the performance, then go into production with it! If not, set up some mirrors and accept that mirrors are your friend.

Just don't make a 12-wide RAIDZ1 vdev...
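
To put those percentages side by side for the OP's 24x 1.92 TB drives, here's a back-of-envelope calculation (nominal figures only; the RAIDZ padding overhead quoted above will shave the RAIDZ numbers further for zvols):

```python
# Nominal usable capacity for 24x 1.92 TB drives under a few layouts.
# Ignores RAIDZ padding/allocation overhead, slop space, and metadata.
layouts = {
    "12x 2-way mirrors": (2, 1),   # (vdev width, redundancy disks per vdev)
    "2x 12-wide RAIDZ1": (12, 1),
    "4x 6-wide RAIDZ1":  (6, 1),
}
for name, (width, redundancy) in layouts.items():
    frac = (width - redundancy) / width
    print(f"{name}: ~{24 * 1.92 * frac:.1f} TB nominal ({frac:.0%})")
```

So the 4x 6-wide Z1 layout gives up roughly 4 TB of nominal space versus 2x 12-wide, but doubles the vdev count.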
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
I'll throw a little more fun on the pile and quote myself explaining some "RAIDZ on SSD" peculiarities a few times, specifically the potentially odd behavior regarding space efficiency vs mirrors.



I'd suggest that if you have a robust backup solution, a 3-wide RAIDZ1 instead of 6-wide RAIDZ2 could be a good configuration.

You may still need or want a performance SLOG device, but at the level where you're trying to feed 24x SAS SSDs, you're starting to hit the point where even Optane may not keep up. It's certainly worth testing out.
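
Reading the 3-wide RAIDZ1 suggestion as eight such vdevs across the 24 bays (my assumption about the grouping, not stated above), the trade-off against 4x 6-wide RAIDZ2 looks like this:

```python
# Both layouts spend 8 of 24 disks on parity; they differ in vdev count and
# in how many failures each vdev tolerates. The vdev grouping is assumed.
for name, vdevs, width, parity in [("8x 3-wide RAIDZ1", 8, 3, 1),
                                   ("4x 6-wide RAIDZ2", 4, 6, 2)]:
    print(f"{name}: {vdevs * (width - parity)}/24 data disks, {vdevs} vdevs, "
          f"survives {parity} failure(s) per vdev")
```

Same nominal capacity either way; the 3-wide Z1 layout buys twice as many vdevs (and thus more write IOPS) at the cost of redundancy depth, which is why the robust backup matters.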
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
With mirrors you get a hard 50% storage reduction; in turn you get high performance.

Well, let's actually strive for correctness here. Mirrors get at *least* a 50% storage reduction; it's easy to forget that there are many environments with requirements such as "redundancy shall be maintained", which mandates mirroring three ways, a 66.7% storage reduction, or four ways, which would be 75%. And we can discuss performance more precisely as well. Write performance for a mirror vdev is basically 1x the slowest component's IOPS. Meanwhile, reads are a function of the number of components; read IOPS can generally be as much as Nx the average component IOPS, where N is the number of components. However, ZFS generally does not hit that unless there is some parallelism in the pool access patterns, and there's a valid argument that concurrent reads are better serviced through careful ARC/L2ARC design.

By way of comparison, RAIDZ is very sucky, being limited by the involvement of multiple components in every I/O. This may not be catastrophic, in that small block sizes offer opportunities for up to N/2 simultaneous operations in process on an N-component RAIDZ1, but that is significantly less than mirrors, and actual space efficiency gets tied to the block size selection. Also, rather than improving as the number of components increases, RAIDZ instead tends to see a decrease in performance. RAIDZ is therefore very hard on flash storage, especially if you have a poorly selected block size; you can blow through endurance very rapidly if you get your computations for the time warp wrong. Oh, wait, wrong movie...
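
As a rule-of-thumb model of the above, not a benchmark (the per-SSD IOPS figure is purely illustrative, and as noted, RAIDZ reads of small blocks can do somewhat better than one component per vdev):

```python
# Rough rule-of-thumb: each vdev writes at roughly one component's IOPS;
# mirror reads can scale toward width x per vdev, RAIDZ roughly 1x per vdev.
PER_SSD_IOPS = 50_000   # illustrative placeholder, not a measured PM1643a figure

def pool_iops(vdevs, width, mirror):
    write = vdevs * PER_SSD_IOPS
    read = vdevs * (width if mirror else 1) * PER_SSD_IOPS
    return write, read

for name, vdevs, width, mirror in [("12x 2-way mirrors", 12, 2, True),
                                   ("4x 6-wide RAIDZ1", 4, 6, False),
                                   ("2x 12-wide RAIDZ1", 2, 12, False)]:
    w, r = pool_iops(vdevs, width, mirror)
    print(f"{name}: ~{w:,} write IOPS, up to ~{r:,} read IOPS")
```

The absolute numbers are meaningless; the ratios between the layouts are the point.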
 

MisterDeeds

Cadet
Joined
May 1, 2023
Messages
5
Thank you for the numerous explanations and answers!

Mirrors are out of the question for me because the loss of storage is too big. All data on the system is additionally backed up and replicated, so the risk of data loss is low.

Storage is only a small part of my job, so I'm looking for a simple, workable solution that is easy to implement and doesn't need a lot of tweaking, even if it might not deliver the absolute best performance.

I think I will start with 4x 6-SSD RAIDZ1 vdevs and see how the performance is.

Thanks to all of you!
 