I'm trying to decide how to move forward with my ZFS setup (ZFS on Linux -- hopefully that doesn't make this off topic to post here; my questions are all high-level hardware choice and strategy questions).
- I already have the drives -- have had them for months -- and I've just been bogged down in research rabbit holes. One thing I'm realizing is that I ought to just go ahead and spin up the pool and load data into it: if I want to leverage a special metadata vdev, I can add it later and do a series of file moves within the same pool to rewrite the data, which should take care of distributing the metadata onto the new vdev. That should get me out of my procrastinated state on this project. I'd been planning a carefully orchestrated scheme of setting up a degraded RAIDZ2 from the start with one disk missing, so that I can buy one more disk at a later time to bring the additional redundancy online (rough sketch below). This scheme does require a bit of planning ahead, but I've been taking the planning ahead a bit far, hence the rabbit hole.
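For the record, the degraded-RAIDZ2 trick as I understand it: stand in a sparse file for the missing disk, then offline it immediately so nothing real ever lands on it. Device paths and sizes below are placeholders, not my actual layout:
Code:
# sparse file the same size as the real disks, standing in for the missing one
truncate -s 12T /tmp/fake-disk.img

# create the raidz2 with three real disks plus the fake one
zpool create tank raidz2 /dev/disk/by-id/ata-DISK1 /dev/disk/by-id/ata-DISK2 \
    /dev/disk/by-id/ata-DISK3 /tmp/fake-disk.img

# take the fake disk offline right away; the pool now runs degraded
zpool offline tank /tmp/fake-disk.img
rm /tmp/fake-disk.img

# later, when the fourth real disk arrives:
zpool replace tank /tmp/fake-disk.img /dev/disk/by-id/ata-DISK4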
- I'm really curious how far I could push the envelope on metadata access speed. I don't really have a hard use case, but there's something intrinsically satisfying about being able to query a large quantity of metadata quickly.
- What I currently do for my data is periodically (manually, on demand) run the find command below, which takes maybe a minute (probably longer these days, now that I've pointed my MacBook's Time Machine at this pool). Then I can do realtime fuzzy search of my entire file listing with no latency using a tool like fzf: https://github.com/junegunn/fzf This wonderful piece of software lets you search through gigabytes of text with ease. That's actually a strong argument that I don't *need* fast metadata access, but it still makes me really wonder what I could set up to fetch the metadata as fast as possible.
Code:
find /pool -type f -printf "%M %s %t %p\n" > ~/find_zfs
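For completeness, the search side is just feeding that listing into fzf, nothing fancier than this:
Code:
# interactive fuzzy search over the cached file listing
fzf < ~/find_zfs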
- I understand that metadata access with ZFS can be sped up through several methods. If the metadata fits in system RAM, it can be served from ARC. I have 96GB of RAM in the system, but for one reason or another the full listing isn't being cached well by ARC when I dump it with the find command -- probably because I don't have *that* much RAM.
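For anyone who wants to watch the same thing on their own system, the ARC counters are exposed as kstats on ZFS on Linux (exact field names vary a bit across OpenZFS versions, so treat this as a starting point):
Code:
# snapshot of overall ARC size and hit/miss counters
grep -E '^(size|hits|misses|arc_meta_used)' /proc/spl/kstat/zfs/arcstats

# or watch hit rates live while the find runs
arcstat 1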
- L2ARC can accelerate metadata reads. It can't accelerate metadata writes, but metadata writes are never really going to be a bottleneck for me.
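If I go the L2ARC route, my understanding is that it can even be restricted to caching metadata only, along these lines (device name is a placeholder):
Code:
# add an NVMe device as L2ARC
zpool add tank cache /dev/disk/by-id/nvme-EXAMPLE

# keep only metadata (not file data) in L2ARC for this pool's datasets
zfs set secondarycache=metadata tank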
- Using a special vdev for metadata can accelerate access in a guaranteed way. It's a commitment, though: the special vdev becomes critical to the pool (lose it and you lose the pool), so it needs mirrored devices, and it can't be removed again from a pool that contains raidz vdevs.
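The setup itself looks simple enough -- something like the following, with hypothetical device names (special_small_blocks is optional and only matters if you also want small file data on the fast devices):
Code:
# add a mirrored special vdev to an existing pool
zpool add tank special mirror /dev/disk/by-id/nvme-A /dev/disk/by-id/nvme-B

# optionally route small file blocks (not just metadata) to the special vdev
zfs set special_small_blocks=16K tank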
- My question is: will Optane offer any advantage over ordinary NVMe when used as a special vdev (in a mirror, of course)? It seems clear that it has advantages for SLOG, but I'm not terribly interested in SLOG performance because I don't mess around with networked VM storage, and I take care not to litter filesystems with tiny files in any software that I build (as well as avoid, where possible, software that does that kind of thing).
- I also have a separate question: I've read about Optane being beefy enough to host L2ARC and SLOG together, or SLOG and a special vdev together (I think I read that here), but what about all three (SLOG, special metadata vdev, and L2ARC) on one pair of devices? How insane would that actually be? I know this question sort of contradicts my earlier statement that I don't care about small-file write performance, but I'll gladly use it as justification for buying more gadgets I don't need (in this case, Optane drives).
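Just to make the question concrete, I picture the triple-duty layout looking roughly like this (sizes and device names are pure placeholders; SLOG and special need the mirror for safety, while L2ARC is disposable and can just be striped):
Code:
# carve each Optane into a small SLOG, a mid-size special, and the rest as L2ARC
for dev in /dev/disk/by-id/nvme-OPTANE-A /dev/disk/by-id/nvme-OPTANE-B; do
    sgdisk -n1:0:+16G -n2:0:+100G -n3:0:0 "$dev"
done

# mirrored SLOG, mirrored special vdev, striped L2ARC across both drives
zpool add tank log     mirror /dev/disk/by-id/nvme-OPTANE-A-part1 /dev/disk/by-id/nvme-OPTANE-B-part1
zpool add tank special mirror /dev/disk/by-id/nvme-OPTANE-A-part2 /dev/disk/by-id/nvme-OPTANE-B-part2
zpool add tank cache          /dev/disk/by-id/nvme-OPTANE-A-part3 /dev/disk/by-id/nvme-OPTANE-B-part3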
Thanks!