Hi OP,
I think you raise several very important questions that I have also been trying to find the answers to. This is something I have been actively researching myself, so I hope my findings and research can help you. I apologise for the formatting of this, I am on my phone in bed and can't sleep lol.
From my current understanding of the implementation, the way Dedupe works now with a dedicated special allocation VDEV isn't much different from how it worked before.
As we all know, the community has been preaching not to use Dedupe with ZFS for years because the implementation sucks. NetApp, Pure, etc. all do it significantly better, in sane ways that don't eat all of your memory.
The special allocation class doesn't fix the underlying issues with Dedupe; it is a band-aid. They are leveraging the fact that hardware has become faster and using that to cheat. Matt Ahrens, one of the original ZFS team members, proposed some solutions to actually fix Dedupe in 2017:
https://youtu.be/PYxFDBgxFS8 None of that has materialized yet.
So, how did it actually work before the introduction of the special allocation class? Again, this is my current understanding. The Dedupe table needs to be persistent, because if you lose it during a reboot (because it only lived in RAM), your data has holes in it: anything that was deduped would be garbage. Because of this, the table was stored on your main pool's VDEVs like any other data. Similar to the ARC in that way, if the in-memory Dedupe table doesn't have what it's looking for, it goes to disk. But because it's not just a cache like the ARC, the performance penalty is more severe than an ARC miss. You end up in a position where your main data VDEVs are spending a significant amount of their I/O just doing lookups in the Dedupe table.
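If you want to see what the DDT actually looks like on a pool, you can poke at it from the shell (`tank` is just a placeholder pool name here):

```shell
# Print the dedup table histogram: unique vs. duplicate entries,
# plus the estimated on-disk and in-core size per entry.
zdb -DD tank

# Dedup table statistics are also summarized in the pool status output.
zpool status -D tank
```

The entry counts and per-entry sizes these report are what determine whether your DDT fits in RAM or forces those expensive on-disk lookups.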
The special allocation class simply moves this function off of your main VDEVs onto dedicated devices to alleviate some of the performance penalty. You can think of it similarly to an L2ARC, except the data isn't just a cache: if the special VDEV is lost, your data has holes in it and your whole pool is dead.
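For reference, setting this up from the command line looks something like the following (device names and dataset names are placeholders, and because losing this vdev kills the pool, you'd want it mirrored):

```shell
# Add a mirrored pair of fast devices as a dedicated dedup vdev
# to an existing pool (device paths are placeholders).
zpool add tank dedup mirror /dev/nvme0n1 /dev/nvme1n1

# Then enable dedup on the dataset you actually want deduplicated.
zfs set dedup=on tank/vms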
The jury is still out on the effectiveness of this method; no one has really reported on it. Do you still need to follow the old ratio wisdom? I think it depends on how fast your VDEV is and how busy your pool is. You would also need to do some tuning on your ARC to make sure you can fit enough metadata in it to keep your L2ARC populated and your cache hits up.
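On FreeBSD-based Core, that ARC metadata tuning is a sysctl. A sketch of what I mean, with the caveat that the exact tunable name varies between OpenZFS versions and the value below is just an example, not a recommendation:

```shell
# Check the current ARC metadata limit (name varies by OpenZFS
# version; on newer builds it may be vfs.zfs.arc.meta_limit).
sysctl vfs.zfs.arc_meta_limit

# Raise it so more metadata (including DDT blocks) can stay cached.
# 17179869184 bytes = 16 GiB, purely an example value.
sysctl vfs.zfs.arc_meta_limit=17179869184
```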
What makes this all more frustrating is that the operating system doesn't even expose Dedupe ratios in the GUI. In fact, it literally ignores the deduped data and counts it as if it weren't deduped, so at a glance it looks like it's not doing anything at all. There's no way of visualizing its effectiveness without digging into the command line. It will show you the I/O of your Dedupe VDEV though, so you can see how busy it is at any given time and track trends.
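From the command line, at least, the ratio is easy to get at (again, `tank` is a placeholder):

```shell
# The DEDUP column shows the pool-wide dedup ratio the GUI hides.
zpool list tank
zpool get dedupratio tank

# Per-vdev I/O, including the dedup vdev, refreshed every 5 seconds.
zpool iostat -v tank 5
```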
I hate to say it, but if you're here to ask questions of the community, I don't think you're going to get more satisfactory answers. This feature really hasn't been tested by many people here. These forums are a really good place for people to rehash the old adages, but many here are unwilling to go beyond the "orthodoxy", as you have probably noticed by being chastised for even asking the question.
I can at least say that I have tested this new feature in a very limited way, and it does work. I created a test VM, cloned it 4 times, and watched I/O hitting the special VDEV as expected. I can't tell you what performance would be like when it has to go to the special VDEV for reads, because my DDT was only about 1200 bytes so it stayed in RAM. But it definitely is writing to that VDEV instead of the main pool, which saves some I/O that would otherwise be spent on writes. That said, write performance seemed about the same with Dedupe on vs. off, which is a good sign.

You are limited by the special VDEV's speed for your writes, though. If your main pool is 11 VDEVs wide, you would probably need more than a single mirror of NVMe to actually keep up; my guess is you would want 2 special VDEVs for Dedupe. Because you are using spinners here, I'm not really sure though. The 5GB-per-TB guidance assumed there wasn't a second usable tier, but I don't think the devices would have to be particularly large, just particularly fast. There isn't a formula or a reference, so this is all just guessing.
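To at least put rough numbers on the "how large" part: a commonly cited figure is a few hundred bytes of DDT per unique block (I'll assume ~320 bytes here; the real number depends on the OpenZFS version), so the table size is driven by block count, not raw capacity. A back-of-the-envelope sketch, with all inputs being assumptions:

```shell
# Rough DDT sizing estimate. Assumptions: ~320 bytes per unique
# block, 128K average block size, 10 TiB of unique data.
data_bytes=$((10 * 1024 * 1024 * 1024 * 1024))  # 10 TiB
block_size=$((128 * 1024))                       # 128K records
bytes_per_entry=320

blocks=$((data_bytes / block_size))
ddt_bytes=$((blocks * bytes_per_entry))

echo "unique blocks: $blocks"
echo "estimated DDT: $((ddt_bytes / 1024 / 1024 / 1024)) GiB"
# -> estimated DDT: 25 GiB
```

Note that smaller block sizes (like a 16K ZVOL volblocksize) multiply the block count, which is presumably where the scarier per-TB numbers in the old guidance came from.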
Hi all, I've been playing around on my test bench and I'm not really sure what I'm doing wrong. I am using TrueNAS core as a backend storage device for an ESXI host. I installed 2x Samsung 480GB SM953s and set them up as a dedupe VDEV on the pool where the iSCSI target is located. I ZVOL I...
www.ixsystems.com
I think the only way we're going to get a real answer here is if someone does testing in a production environment and actually reports their findings.
I would like to say, though, that Honeybadger does have an interesting point. Doing your workload in VMware Horizon with a pool (not a ZFS pool lol) of either instant or linked clones would effectively solve the same problem at a different layer in the stack. Obviously, that would not help you if you are also doing traditional datacenter workloads beyond your current VDI application, however.