Redundancy necessary for special Metadata vdev?

ajgnet

Explorer
Joined
Jun 16, 2020
Messages
65
New to the forums, but not new to FreeNAS/TrueNAS. I've been running TrueNAS 12 with Fusion pools and have a few questions about them.

I can't figure out if I need to set up redundant mirrored disks for the special Metadata vdev. If a single-drive Metadata vdev is lost, will the data vdevs still function? Can the Metadata vdev be rebuilt?

Also, what is optimal sizing for a Metadata vdev? Trying to figure out if there is any benefit to using large SSDs for this purpose. My data vdev is around 128 TB.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
OK, so the first thing I notice is the warning when creating a new pool with a mirror for data and a stripe for metadata:
Metadata vdev must be the same type as the data vdevs. First data vdev is a mirror, new Metadata vdev is a stripe.

If I force it to ignore that, I can create the pool.

I then try to offline one of the disks in the stripe after putting a little data on the pool...

But the system throws an error indicating there aren't enough replicas to offline the disk... so no option in the GUI... now for the CLI.

Even the CLI will not offline a disk that's part of that metadata stripe VDEV.

My guess is you can't because that would kill the pool... so inferring from that... you certainly need to have redundancy in the metadata VDEV(s) if you want your pool to survive a disk failure.
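
For the record, the CLI attempt goes something like this (pool and device names here are placeholders, and the exact error wording may vary by OpenZFS version):

Code:
zpool offline testpool gptid/aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee
cannot offline gptid/aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee: no valid replicas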

Also, what is optimal sizing for a Metadata vdev? Trying to figure out if there is any benefit to using large SSDs for this purpose. My data vdev is around 128 TB.
I think the concept was that metadata can be expected to be around 1 GB per TB of data, so for your 128 TB, 128 GB could be enough. I'm certainly not professing to be the ultimate expert on that, having not run anything in production yet.
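
If you'd rather measure than estimate, zdb's block statistics break down usage by type, metadata included - a sketch below, with the pool name as a placeholder (it walks every block, so it can take a very long time on a big pool):

Code:
# rule of thumb: ~1 GB of metadata per 1 TB of data -> roughly 128 GB for 128 TB, plus headroom
zdb -U /data/zfs/zpool.cache -bb tank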
 

Yorick

Wizard
Joined
Nov 4, 2018
Messages
1,912
That’s exactly right. A special allocation vdev is part of the pool and subject to the same rules as any other vdev that’s part of the pool. Lose the special allocation vdev and you lose the pool.

A metadata-only L2ARC doesn’t have those limitations: it can be safely removed, and it can be a single disk. It’s not persistent - yet.
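
Setting that up is just an ordinary cache device plus the secondarycache property - a sketch, with pool and device names as placeholders:

Code:
zpool add tank cache nvd0p2            # L2ARC can be a single disk and can be removed later
zfs set secondarycache=metadata tank   # cache only metadata; child datasets inherit this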
 

ajgnet

Explorer
Joined
Jun 16, 2020
Messages
65
Thanks so much. So if I set up a sufficiently sized L2ARC with the default configuration, will non-persistent metadata be stored on the L2ARC as it's populated?
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
will non-persistent metadata be stored on the L2ARC as it's populated?
You need to tell it to be a metadata-only L2ARC or it won't do that (at least I don't think so).
Have a look at these threads:
 

ajgnet

Explorer
Joined
Jun 16, 2020
Messages
65
Thanks, that's extremely helpful. If I understand it correctly, I can only have an L2ARC set as metadata-only (secondarycache=metadata) or as a read cache for the data itself (secondarycache=all), which excludes storing a cache of the metadata in the L2ARC. Is there no way to do both simultaneously? Perhaps with 2 cache disks?
 

ajgnet

Explorer
Joined
Jun 16, 2020
Messages
65
OK, so the first thing I notice is the warning when creating a new pool with a mirror for data and a stripe for metadata:
Metadata vdev must be the same type as the data vdevs. First data vdev is a mirror, new Metadata vdev is a stripe.

If I force it to ignore that, I can create the pool.
Do I need to worry about the new pool being of a different type? Particularly if using SSDs for the special metadata mirror vdev and platter disks in raidz2 for the data vdev? Example:
Code:
root@ubuntu20zfs:/tank# zpool status
  pool: tank
 state: ONLINE
  scan: none requested
config:

        NAME                 STATE     READ WRITE CKSUM
        tank                 ONLINE       0     0     0
          raidz2-0           ONLINE       0     0     0
            /root/data1.img  ONLINE       0     0     0
            /root/data2.img  ONLINE       0     0     0
            /root/data3.img  ONLINE       0     0     0
            /root/data4.img  ONLINE       0     0     0
            /root/data5.img  ONLINE       0     0     0
        special
          mirror-1           ONLINE       0     0     0
            /root/data8.img  ONLINE       0     0     0
            /root/data9.img  ONLINE       0     0     0

errors: No known data errors
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
Is there no way to do both simultaneously? Perhaps with 2 cache disks?
Unlikely. That setting is done at the level of the ZFS pool, so the same pool can't have 2 different settings for it.

I did see that it might also be settable at the dataset level, so it may be possible to have some datasets caching content and others caching only metadata, but again, not both for the same one.
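
If it does work per dataset, it would look something like this (dataset names made up for the example):

Code:
zfs set secondarycache=metadata tank/media   # L2ARC holds only metadata for this dataset
zfs set secondarycache=all tank/vms          # L2ARC holds data and metadata for this one
zfs get -r secondarycache tank               # check what each dataset ends up with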
 

Yorick

Wizard
Joined
Nov 4, 2018
Messages
1,912
Do I need to worry about the new pool being of a different type? Particularly if using SSDs for the special metadata mirror vdev and platter disks in raidz2 for the data vdev?

No, that's fine. What you want is the same level of risk, give or take. A mirrored SSD vdev is just as resilient as a platter raidz2, arguably more so. Using platters for special would be concerning, but then why would one ever?
 

ajgnet

Explorer
Joined
Jun 16, 2020
Messages
65
How would I set the ashifts correctly if my raidz2 pool is HDD and my special mirror consists of 2 SSDs? Is there a way to set a different ashift for each vdev?
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
How would I set the ashifts correctly if my raidz2 pool is HDD and my special mirror consists of 2 SSDs? Is there a way to set a different ashift for each vdev?
TrueNAS should default to ashift=12 for all devices including the special mirror, and honestly you shouldn't use anything less than that with how common 512e drives are. Are you looking to increase the ashift value?
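
If you want to verify what each vdev actually got, the per-vdev ashift is visible in the cached pool config - a sketch, with the pool name as a placeholder:

Code:
zdb -U /data/zfs/zpool.cache | grep ashift   # per-vdev ashift from the cached config
zpool get ashift tank                        # pool-wide default on OpenZFS 2.0 (0 = auto-detect)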
 

ajgnet

Explorer
Joined
Jun 16, 2020
Messages
65
Could I do something like this (using /zfs/* sparse files as an example)?

Code:
zpool create library raidz2 /zfs/disk[1-8] -o ashift=12 # PLATTER DISKS
zpool add library special mirror /zfs/meta[1-2] -o ashift=13 -f #Mirrored SSDs
zpool add library cache /zfs/cache1 -o ashift=13 #NVME
zpool add library log /zfs/slog11 -o ashift=13  #NVME


And then add the below to put small blocks on the faster SSDs? (What happens if they run out of space?)
Code:
zfs set special_small_blocks=32K library
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Could I do something like this (using /zfs/* sparse files as an example)?

Code:
zpool create library raidz2 /zfs/disk[1-8] -o ashift=12 # PLATTER DISKS
zpool add library special mirror /zfs/meta[1-2] -o ashift=13 -f #Mirrored SSDs
zpool add library cache /zfs/cache1 -o ashift=13 #NVME
zpool add library log /zfs/slog11 -o ashift=13  #NVME

Creating pools from the command line isn't a supported method, but if you want to match the GUI setup you would use something like this for the creation options - note the use of /dev/gptid/ for adding members.

Code:
zpool create -o cachefile=/data/zfs/zpool.cache -o failmode=continue -o autoexpand=on -o ashift=12 -O compression=lz4 -O aclmode=passthrough -O aclinherit=passthrough -f -m /library -o altroot=/mnt library raidz2 /dev/gptid/gpt-ids-go-here


You could then use your lines above to add the special/cache/log vdevs with the increased ashift - but generally speaking, most devices are optimized for 4K blocks. 8K can provide some gains but might add overhead in terms of consumed space.
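
As a sketch, those add commands would look roughly like this - gptid values are placeholders, -o ashift goes before the pool name, and ashift=13 would be the 8K variant:

Code:
zpool add -o ashift=12 library special mirror /dev/gptid/ssd-a /dev/gptid/ssd-b
zpool add -o ashift=12 library cache /dev/gptid/nvme-cache
zpool add -o ashift=12 library log /dev/gptid/nvme-slog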

And then add the below to put small blocks on the faster SSDs? (What happens if they run out of space?)
Code:
zfs set special_small_blocks=32K library

If the special vdev runs out of space, new writes will overflow to the main data vdevs.
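
You can keep an eye on that with the per-vdev capacity listing:

Code:
zpool list -v library   # SIZE/ALLOC/FREE per vdev, including the special mirror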
 

sdellape

Cadet
Joined
Jan 6, 2021
Messages
1
Unlikely. That setting is done at the level of the ZFS pool, so the same pool can't have 2 different settings for it.

I did see that maybe it was at dataset level also, so it may be possible to have some datasets caching content and others doing metadata, but again, not both for the same one.

1. secondarycache=all implies both. There currently is no way to cache data only (although it’s been proposed).

2. This setting is dataset-level.
 