SOLVED Clarification on different vdev types

jblack

Dabbler
Joined
Feb 2, 2023
Messages
19
I've spent the last day or so reading up on the different types of vdevs in a pool and I'm basically just looking for someone to confirm or correct my understanding. I also have a couple of questions that I'll get to after.
vdev types:

Data: Stores the files themselves, and everything else if no special vdevs are used.

Cache: I believe this is what people refer to as L2ARC, basically a pool-specific extension of the RAM-based ARC. Can improve read speeds by caching some data on higher-speed drives. Should not be used on a system with less than 32/64GB of RAM (couldn't find a strong consensus there) or it may hurt performance, since the L2ARC's own index consumes RAM. Should be less than 10x the total system RAM in size. Should be high speed and high endurance (since it's written to a lot), but failure isn't a huge deal as it won't cause data loss. This won't really do anything unless the system is getting a lot of ARC misses (see the hit-ratio one-liner after these definitions).

Log: I believe this is what people refer to as SLOG, a separate, higher-speed vdev for write logs. Can improve speeds for synchronous writes. A synchronous write is when the ZFS write data (not the files themselves, but some sort of ZFS-specific write log) is written to the RAM cache (ARC) and the pool (ZIL, or SLOG if available) at the same time, vs an asynchronous write where it's written to ARC, then eventually gets moved to the pool. SLOG basically replaces the ZIL, but with faster storage, allowing sync writes to complete faster. Should be high speed, but doesn't need to be super high endurance like cache, since it sees far fewer writes. (Edit: I don't actually know this to be true. jgreco's guide on SLOGs says it should be high endurance, so maybe I don't understand exactly what the 'intent log' data is.) Won't do anything for async writes, and general file storing is usually mostly async (see the sync-property example after these definitions).

Hotspare: A backup physical drive (or multiple drives) that is kept running but has no data written to it. In the event of a disk failure, the hot spare can be used to replace the failed disk without needing to physically move any disks around. Hot spares should be the same kind of disks as whatever disks they will replace.

Metadata: A separate vdev for storing just the metadata of the main data vdev(s), allowing it to be kept on much faster storage. This speeds up file browsing and searching, as well as reading lots of files (at least, it speeds up locating the files, not the actual reading itself). If this vdev dies, the whole pool dies, so this should be a 2/3-way mirror. Should be high speed, but doesn't need super high endurance like cache.

Dedup: Stores the deduplication tables for the data vdev(s) on faster storage, (I'm guessing) to speed up deduplication tasks. I haven't really come across many posts about this, so I don't really know what the write frequency looks like.
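
For reference, a quick way to check how often the ARC is actually missing on Linux/SCALE (this just reads the same kernel counters that arc_summary reports; the one-liner is only a sketch):

# Rough overall ARC hit ratio; a consistently low number is when L2ARC starts to make sense
awk '$1 == "hits" {h = $3} $1 == "misses" {m = $3} END {printf "ARC hit ratio: %.1f%%\n", 100 * h / (h + m)}' /proc/spl/kstat/zfs/arcstats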
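
And on the sync-write side, whether a dataset issues sync writes at all is controlled by the sync dataset property (pool/dataset names below are hypothetical):

# 'standard' honors application sync requests; 'always'/'disabled' override them
zfs get sync tank/vms
zfs set sync=always tank/vms        # every write goes through the ZIL/SLOG
zfs set sync=disabled tank/scratch  # sync requests ignored; faster, but risks recent writes on power loss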

I've also seen people mention small-file vdevs as a way to improve read/write speed for lots of small files. I've also seen mentions of special vdevs, which seem to be a combination of metadata and small-file, however, I haven't been able to find anything like that in the current TrueNAS Scale GUI. I'm interested in the small-files vdev and it seems like maybe that's part of what the metadata vdev does, but I can't seem to find a concrete answer on that. Does the small-file vdev still exist? If so, how do I go about that?
Also, I've seen some people saying that a metadata vdev can increase overall speed of a pool, but from my understanding, it only allows the files to be searched and found faster, not actually written or read any faster. Does this increased finding speed actually make a big difference in general pool usage? Or does it speed things up in another way?

Sorry for the long post, I just didn't want to make a bunch of smaller posts since these are all related.
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
Your definitions are basically correct. You can find info on the small-file vdev by looking up fusion pools.

Just note that it's usually possible to use L2ARC to do the work of metadata or dedup vdevs, which is generally suggested unless you really know what you are doing and have a very specific need.

Anyway, why did you post in FreeNAS when you are using SCALE?
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
I've also seen people mention small-file vdevs as a way to improve read/write speed for lots of small files. I've also seen mentions of special vdevs, which seem to be a combination of metadata and small-file, however, I haven't been able to find anything like that in the current TrueNAS Scale GUI. I'm interested in the small-files vdev and it seems like maybe that's part of what the metadata vdev does, but I can't seem to find a concrete answer on that.
Same vdev as metadata. There's a tunable that defines up to which size files will be stored on the special vdev in addition to metadata. So it's really a "special" vdev that always stores metadata and optionally small files, too.
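
On current OpenZFS that tunable is the special_small_blocks dataset property; a minimal sketch, assuming a hypothetical dataset tank/projects:

# Blocks of 32K or smaller go to the special vdev in addition to metadata;
# setting it equal to recordsize would send everything there
zfs set special_small_blocks=32K tank/projects
zfs get special_small_blocks,recordsize tank/projects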

Also, I've seen some people saying that a metadata vdev can increase overall speed of a pool, but from my understanding, it only allows the files to be searched and found faster, not actually written or read any faster. Does this increased finding speed actually make a big difference in general pool usage? Or does it speed things up in another way?
Depends on your application. Serving files to dozens or hundreds of users via SMB with potentially thousands of files in a directory is quite heavy on metadata operations. Add to that the fact that the perceived performance of the server might - depending on the users' expectations - not depend so much on raw file read/write speed. As long as there is a reasonable progress indicator and the task gets done in a couple of seconds, in my experience all is well. But opening that network share or a subdirectory and waiting 30 seconds for the file list to even appear - no way people are going to accept that. That's where a special vdev can improve - as I wrote - "perceived" performance.

Everything in pool layout depends on the use case. File sharing? SLOG is 100% useless. NFS storage for VMware? Go for SLOG and don't even think about using RAIDZn ...
 

jblack

Dabbler
Joined
Feb 2, 2023
Messages
19
Same vdev as metadata. There's a tunable that defines up to which size files will be stored on the special vdev in addition to metadata. So it's really a "special" vdev that always stores metadata and optionally small files, too.
Ah, ok I see that option on my test pool.
Depends on your application. Serving files to dozens or hundreds of users via SMB with potentially thousands of files in a directory is quite heavy on metadata operations. Add to that the fact that the perceived performance of the server might - depending on the users' expectations - not depend so much on raw file read/write speed. As long as there is a reasonable progress indicator and the task gets done in a couple of seconds, in my experience all is well. But opening that network share or a subdirectory and waiting 30 seconds for the file list to even appear - no way people are going to accept that. That's where a special vdev can improve - as I wrote - "perceived" performance.

Everything in pool layout depends on the use case. File sharing? SLOG is 100% useless. NFS storage for VMware? Go for SLOG and don't even think about using RAIDZn ...
That makes sense; for me a metadata vdev probably wouldn't make much difference, but a SLOG might. I'll just have to do some testing once I build my system, then.
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
...
Just note that it's usually possible to use L2ARC to do the work of metadata or dedup vdevs, which is generally suggested unless you really know what you are doing and have a very specific need.
...
And the really nice thing about L2ARC versus a special vDev for metadata is that a Mirror (or 3-way Mirror for RAID-Z2, or 4-way Mirror for RAID-Z3) is not required. Single device, self-populating. Loss of the L2ARC is not fatal, even during normal operation; the server simply starts fetching the data from the pool's data vDevs.

On the other hand, loss of a special vDev for metadata means total loss of the ENTIRE pool. Thus the recommended configuration is:
- For a RAID-Z1 pool, 2 way Mirror for special vDev
- For a RAID-Z2 pool, 3 way Mirror for special vDev
- For a RAID-Z3 pool, 4 way Mirror for special vDev
That maintains the same number of disk failures before pool loss between the data vDevs and the special vDevs.
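
From the command line that would look something like this, with hypothetical device names (the SCALE GUI performs the same operation):

# 3-way mirrored special vdev, matching a RAID-Z2 pool's failure tolerance
zpool add tank special mirror sdb sdc sdd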

To be clear, as far as I know, it is not possible to RAID-Zx or Mirror L2ARC. It can be striped if desired (meaning more than 1 L2ARC device on a pool).
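
For example (again with hypothetical device names), listing more than one cache device is all it takes:

# Cache devices cannot be mirrored; two devices simply stripe the L2ARC
zpool add tank cache nvme0n1 nvme1n1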
 
Joined
Jul 3, 2015
Messages
926
To be clear, as far as I know, it is not possible to RAID-Zx or Mirror L2ARC. It can be striped if desired (meaning more than 1 L2ARC device on a pool).
This is true, and striping the L2 would improve performance *IF* an L2 would be helpful in the first place (which in a lot of cases it wouldn't be).
 

jblack

Dabbler
Joined
Feb 2, 2023
Messages
19
Just note that it's usually possible to use L2ARC to do the work of metadata or dedup vdevs, which is generally suggested unless you really know what you are doing and have a very specific need.
Do you have any suggestions on how to go about doing this? Adding an L2ARC just for metadata/dedup would be a pretty cheap experiment.
This is true and striping the L2 would improve performance *IF* an L2 would be helpful (which in a lot of cases it wouldn’t).
Which is only when there are substantial ARC misses and the system has enough RAM that it doesn't mind giving some up for running the L2ARC, right?
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
Do you have any suggestions on how to go about doing this? Adding an L2ARC just for metadata/dedup would be a pretty cheap experiment.
As the others said, you will have to play with tunables a bit (there is a lot of documentation about this on the forum); you can even make it (sort of) persistent.

Examples of properties would be secondarycache=metadata or l2arc_rebuild_enabled.
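
A minimal sketch of both, assuming a pool named tank (on SCALE these would normally be applied via the GUI or an init script so they survive reboots):

# Keep only metadata, not file data, in this pool's L2ARC
zfs set secondarycache=metadata tank
# Module parameter for persistent L2ARC (already defaults to 1 on OpenZFS 2.0+)
echo 1 > /sys/module/zfs/parameters/l2arc_rebuild_enabled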

A place to start would be the following thread (it's big).

Anyway, you are likely to receive more focused help if you open a thread regarding your use case and the possible configurations.

Which is only when there are substantial ARC misses and the system has enough RAM that it doesn't mind giving some up for running the L2ARC, right?
There is a ratio that should be followed when sizing an L2ARC, which iirc is 1:8 or 1:6 of ARC to L2ARC (so, for example, 32GB of ARC would support roughly 192 to 256GB of L2ARC), and it's usually suggested to max out RAM capacity first, or at least have 64GB of it.

In my experience it's hard to find striped L2ARC on (relatively) small systems such as the ones on this forum; maybe it's more common in big industrial/enterprise applications.
 