TrueNAS CORE - Metadata drives and/or ZIL L2ARC drives

dpearcefl

Contributor
Joined
Aug 4, 2015
Messages
145
I am excited about the new metadrive feature but confused as to whether ZIL and L2ARC drives should be used at the same time. They sound like they perform different functions, but is there a point of diminishing returns? Are there any guidelines for when a metadrive should be used over ZIL and L2ARC drives?

Thanks.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
They sound like they perform different functions, but is there a point of diminishing returns? Are there any guidelines for when a metadrive should be used over ZIL and L2ARC drives?

All three have a different purpose and are intended to solve different problems.

Do you have too much "hot data" to fit in your RAM, and can't fit or afford more? Deploy L2ARC as a second-level read cache. It's not RAM fast but it beats spinning disk.

Do you have synchronous writes (eg: NFS clients, hypervisors, OLTP DBs) that have to return fast while remaining safe? Add an SLOG device, which will accelerate the response of the "write to stable storage" data flow.
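To make those two concrete, here's a minimal sketch from the shell - the pool name tank and the device names are made up for illustration, and on TrueNAS you'd normally do this through the Pool Manager UI rather than the CLI:

    # L2ARC: second-level read cache; contents are volatile, so losing it only costs speed.
    zpool add tank cache nvd0p1

    # SLOG: separate log device for synchronous writes; use a power-loss-protected SSD.
    zpool add tank log nvd1p1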

Do you do a lot of metadata-heavy operations (directory listings, scans, small updates to many tens or hundreds of thousands of files) and find that they take far too long? Here's where a dedicated metadata vdev may help: rather than your back-end data vdevs spending time handling metadata reads and writes (especially if they're spinning disks), push this data to separate flash devices, which are much faster at the random I/O that's inherent to metadata work.

If your "metadata-heavy" operations primarily result in metadata reads then you can achieve most of the same results with adding L2ARC and setting the secondarycache=meta property. It doesn't help metadata writes - for that, you need the special vdevs. But an important note though is that unlike L2ARC, where all contents are volatile and loss of the device just results in slowdown, a metadata vdev needs redundancy - metadata copies only exist on this vdev, and if you lose it, the whole pool is toast. (Edit: This also means you can't remove it after it's been added. Edit2: Apparently you can, but unless checksum on removal is implemented now, you could end up in a bad spot if you have a read error during a copy of pool metadata.) Mirrors will be heavily recommended and a triple-mirror wouldn't be unreasonable. Depending on your write workload you'll also want to use decently well-rated drives in terms of endurance. While it isn't directly "drinking from the firehose" like an SLOG is for sync writes, you will want to scale it based on how update-heavy your workload is. I'd say "mixed use" SSDs with a 1-3 DWPD rating depending on size are where you'd want to land. Not cheap 0.1 DWPD QLC, but not 25+ DWPD Optane either. (Edit3: Of course, if you can afford Optane, it's the best solution. It also doesn't suffer from increase read latency in a mixed-workload scenario.)

StorageReview did a YouTube podcast with @Kris Moore where he talks a bit about the metadata-only vdevs (sorry, "Fusion Pools" ;) ) and I've linked to that timestamp (hopefully) below.


Edit: You'll notice I didn't mention deduplication here. While TN12 lets you add separate vdevs for metadata and, explicitly, for dedup tables, that doesn't relieve the additional memory pressure or the extra considerations that arise from enabling deduplication. If you truly needed dedup before, you'll be very happy to have these vdevs available, as they will increase performance (possibly significantly); but if you didn't use it before, don't think you can just drop a couple of SSDs into a Fusion Pool as special type=dedup and enable it globally. It's still a recipe for pain if you do it wrong.
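For reference, a rough sketch of what the dedup-table vdev looks like from the shell (hypothetical pool and device names; the same redundancy warning applies as for the metadata vdev):

    # Dedup vdev: holds the dedup table (DDT); losing it loses the pool, so mirror it.
    zpool add tank dedup mirror nvd5p1 nvd6p1

    # Check the DDT entry count and estimated in-core size before relying on it.
    zpool status -D tank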
 

dpearcefl

Contributor
Joined
Aug 4, 2015
Messages
145
Thank you for this most-helpful description. I'm sure many other people will benefit.

Can metadata drives be added after the fact? Or only at creation of the vdev?

Is there a rule-of-thumb for sizing the metadrive?
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
They are a separate vdev and so can be added after the fact. They only cover new writes from that point onwards. They cannot be removed.
An SLOG or an L2ARC can be removed.
 

dpearcefl

Contributor
Joined
Aug 4, 2015
Messages
145
Is there a rule-of-thumb for sizing the metadrive?
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Is there a rule-of-thumb for sizing the metadrive?

"1% of usable storage" but it's only a thumbrule and actual usage can vary wildly.

Like deduplication, the amount of metadata generated depends on the record count, not the amount of data stored. 1T of large files with a 64K average recordsize might only generate 1G of metadata (0.1%), but 1T of ZVOLs used to back VMFS datastores with average 4K and 8K records might generate 10G (1%).
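A quick back-of-the-envelope illustration of why the record count dominates (simple shell arithmetic, not exact ZFS accounting):

    # 1 TiB stored as 64 KiB records vs 8 KiB records:
    echo $(( (1 << 40) / (64 * 1024) ))   # 16777216  (~16.8 million records)
    echo $(( (1 << 40) / (8 * 1024) ))    # 134217728 (~134 million records)

Eight times as many records to track is roughly why the estimate climbs from ~0.1% of the data size toward ~1%.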
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
At a rough estimate of 1%, current technology lets you ridiculously outgrow that - go for, say, 5% and you'll never reach that capacity. 2 TB NVMe M.2 drives ... check. And if this is an environment where that performance boost counts, you probably won't even be wasting money: the complete system will likely be 10 to 20 times the cost of those drives.
 

Stilez

Guru
Joined
Apr 8, 2016
Messages
529
Edit: You'll notice I didn't mention deduplication here. While TN12 lets you add separate vdevs for metadata and, explicitly, for dedup tables, that doesn't relieve the additional memory pressure or the extra considerations that arise from enabling deduplication. If you truly needed dedup before, you'll be very happy to have these vdevs available, as they will increase performance (possibly significantly); but if you didn't use it before, don't think you can just drop a couple of SSDs into a Fusion Pool as special type=dedup and enable it globally. It's still a recipe for pain if you do it wrong.

Adding to this excellent post, specifically about dedup. Dedup places different pressures on the system:

  1. The best-known one is a lot of RAM. ZFS will potentially need access to the dedup table (DDT) records for *every* disk block read/write, so you need enough RAM to hold those records where they can be accessed at very high speed (your ARC cache in RAM), and to be sure they won't be evicted from ARC. That means you need RAM, or at worst a very good L2ARC, possibly configured to be dedicated to metadata.
  2. You also need a good CPU. Deduplication generates and checks data hashes, which puts an extra burden on the CPU. It'll slow you down a bit.
  3. You need storage hardware that handles 4k I/O *really* well, at least until OpenZFS 2.0 progresses a bit more. The cached DDT doesn't stay cached forever, and persistent L2ARC isn't here yet. When you reboot, no matter your RAM and L2ARC, *every* block of data you read or write on the pool will need the related deduplication records fetched first. You can see this, for example, when you try to copy a large (10GB+) file across 10 gigabit ethernet using SMB or iSCSI and simultaneously run gstat on the NAS console. What you may naively *expect* is a flurry of large (1 MB+) disk writes as ZFS very efficiently writes out your file. What you *get* is almost zero disk writes and a *huge* flurry of hundreds of thousands of 4k disk *reads*. That's the relevant blocks of the DDT loading from cold, which ZFS needs before it can do a thing with your actual file operation - plus write amplification, which adds to the burden. Unfortunately, spinning disks suck at 4K I/O, so this can take *minutes*, and eventually your file operation can stall or time out.

    *This* part of the problem is where a dedicated metadata vdev *can* really help. DDT preloading, persistent L2ARC, and other OpenZFS DDT improvements are all being worked on, so we should see a lot happen within a year, maybe two at most. But until they are completed and rolled out, you have no way to force the DDT to reload, and if it's held on HDD it'll be hell on your deduplicated pool until all of the DDT has been needed (and therefore loaded).

    Dedicated vdevs do NOT help with points 1 or 2. (A quick way to gauge how heavy the DDT burden actually is on a given pool is sketched below.)
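Specifically, a couple of stock OpenZFS commands will report the DDT's entry count and estimated on-disk/in-core size (the pool name tank is just a placeholder):

    # Summary, including a dedup table histogram:
    zpool status -D tank

    # More detail on DDT size and distribution:
    zdb -DD tank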
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Just a note: You can remove the entire metadata vdev, under the same conditions as "normal" device removal. Mostly, that means mirrors only.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Just a note: You can remove the entire metadata vdev, under the same conditions as "normal" device removal. Mostly, that means mirrors only.
Really. I hadn't tested this, but thought that special vdev removal wasn't yet supported. I don't think I'd encourage it, though, unless they've started computing checksums during the removal (last I saw, this still wasn't done) - so you'd definitely want a remap/scrub afterwards.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Same as any removal, scrub before and scrub afterwards for maximum safety.

Until we get BPR, but that'll be the day.
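For what that scrub-remove-scrub workflow might look like in practice - a hedged sketch with a hypothetical pool named tank and vdev name mirror-1, and it only applies if the pool qualifies for device removal (no top-level raidz vdevs):

    zpool scrub tank                 # verify everything reads cleanly first
    zpool remove -n tank mirror-1    # dry run: reports what the removal would need, changes nothing
    zpool remove tank mirror-1       # actual removal; contents get remapped onto the remaining vdevs
    zpool status tank                # watch the removal/remap progress
    zpool scrub tank                 # scrub again once it completes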
 

J Ree

Cadet
Joined
Sep 3, 2020
Messages
5
How do I remove the special metadata vdev? I tried using the GUI, and it threw an exception. (Using TN CORE RC1.)
 

Stilez

Guru
Joined
Apr 8, 2016
Messages
529
How do I remove the special metadata vdev? I tried using the GUI, and it threw an exception. (Using TN CORE RC1.)
Can you post the output of zpool list -v?
 

J Ree

Cadet
Joined
Sep 3, 2020
Messages
5
Can you post the output of zpool list -v?
Here you go!

NAME                                             SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
MAIN                                            22.0T  9.72T  12.3T        -         -     0%    44%  1.00x  ONLINE  /mnt
  raidz2                                        21.8T  9.70T  12.1T        -         -     0%  44.5%      -  ONLINE
    gptid/59a35a41-e66b-11ea-9bc0-00155d1b2208      -      -      -        -         -      -      -      -  ONLINE
    gptid/59da79fc-e66b-11ea-9bc0-00155d1b2208      -      -      -        -         -      -      -      -  ONLINE
    gptid/5a7c97c7-e66b-11ea-9bc0-00155d1b2208      -      -      -        -         -      -      -      -  ONLINE
    gptid/5ad54ef6-e66b-11ea-9bc0-00155d1b2208      -      -      -        -         -      -      -      -  ONLINE
special                                             -      -      -        -         -      -      -      -  -
  mirror                                         222G  12.5G   209G        -         -     4%  5.65%      -  ONLINE
    gptid/67c01a06-eda2-11ea-9dd4-00155d1b2208      -      -      -        -         -      -      -      -  ONLINE
    gptid/67d2ef65-eda2-11ea-9dd4-00155d1b2208      -      -      -        -         -      -      -      -  ONLINE
logs                                                -      -      -        -         -      -      -      -  -
  gptid/27153ef4-ee07-11ea-9dd4-00155d1b2208     232G   340K   232G        -         -     0%  0.00%      -  ONLINE
cache                                               -      -      -        -         -      -      -      -  -
  gptid/590763e3-e66b-11ea-9bc0-00155d1b2208     466G   459G  6.88G        -         -     0%  98.5%      -  ONLINE
RSYNC                                           3.62T  4.92G  3.62T        -         -     0%     0%  1.00x  ONLINE  /mnt
  gptid/772f3ff5-e692-11ea-9919-00155d1b2208    3.62T  4.92G  3.62T        -         -     0%  0.13%      -  ONLINE
freenas-boot                                    9.50G  2.52G  6.98G        -         -     6%    26%  1.00x  ONLINE  -
  da0p2                                         9.50G  2.52G  6.98G        -         -     6%  26.5%      -
 

Stilez

Guru
Joined
Apr 8, 2016
Messages
529
How do i remove the special metadata vdev? I tried using the GUI, it threw an exception. (USING TN CORE RC1)
This is from man zpool-remove, the command used to remove disks and vdevs from a pool. My emphasis added:

     zpool remove [-npw] pool device...
             Removes the specified device from the pool. This command
             supports removing hot spare, cache, log, and both mirrored and
             non-redundant primary top-level vdevs, including dedup and
             special vdevs. *When the primary pool storage includes a
             top-level raidz vdev only hot spare, cache, and log devices
             can be removed.*

The Web UI threw an exception because the underlying command failed, although admittedly it should have handled it more gracefully. But the failure is genuine, not a bug.

You can't just remove the special vdev, as best I know, because of that restriction. You'd have to rebuild (replicate) the pool.
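If you want to confirm that on your own pool without risking anything, zpool remove has a dry-run flag (the vdev name mirror-1 below is a guess at what the special mirror is called; check zpool status for the real name):

    # -n makes no changes; with a top-level raidz2 in MAIN, this is expected
    # to refuse, citing the raidz restriction quoted above.
    zpool remove -n MAIN mirror-1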
 

J Ree

Cadet
Joined
Sep 3, 2020
Messages
5
Can you post the output of zpool list -v?

Here is the pic as well
This is from man zpool-remove, the command used to remove disks and vdevs from a pool. My emphasis added:

     zpool remove [-npw] pool device...
             Removes the specified device from the pool. This command
             supports removing hot spare, cache, log, and both mirrored and
             non-redundant primary top-level vdevs, including dedup and
             special vdevs. *When the primary pool storage includes a
             top-level raidz vdev only hot spare, cache, and log devices
             can be removed.*

The Web UI threw an exception because the underlying command failed, although admittedly it should have handled it more gracefully. But the failure is genuine, not a bug.

You can't just remove the special vdev, as best I know, because of that restriction. You'd have to rebuild (replicate) the pool.
OK, thanks. Then why did someone on this post say you can remove it? I am getting contradictory info lol.
 

Stilez

Guru
Joined
Apr 8, 2016
Messages
529
Because not everyone, even the best-intentioned, reads all the small print in man zpool-remove. Honest mistake - special vdevs are new in this version and it's easy not to know the detailed limitations that can apply to removal.
 

J Ree

Cadet
Joined
Sep 3, 2020
Messages
5
Because not everyone, even the best-intentioned, reads all the small print in man zpool-remove. Honest mistake - special vdevs are new in this version and it's easy not to know the detailed limitations that can apply to removal.
Damn. I guess I'm f***ed then. I don't wanna rebuild Q_Q
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
I did say...
You can remove the entire metadata vdev, under the same conditions as "normal" device removal.
The main condition is "the pool may only have mirrors".
 