Redundancy necessary for special Metadata vdev?

ajgnet

Explorer
Joined
Jun 16, 2020
Messages
65
New to the forums, but not new to FreeNAS/TrueNAS. I've been running TrueNAS 12 with Fusion pools and have a few questions about them.

I can't figure out if I need to set up redundant mirrored disks for the special Metadata vdev. If a single-drive Metadata vdev is lost, will the data vdevs still function? Can the Metadata vdev be rebuilt?

Also, what is optimal sizing for a Metadata vdev? Trying to figure out if there is any benefit to using large SSDs for this purpose. My data vdev is around 128 TB.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
OK, so the first thing I notice is the warning when creating a new pool with a mirror for data and a stripe for metadata:
Metadata vdev must be the same type as the data vdevs. First data vdev is a mirror, new Metadata vdev is a stripe.

If I force it to ignore that, I can create the pool.

I then try to offline one of the disks in the stripe after putting a little data on the pool...

But the system throws an error indicating there aren't enough replicas to offline the disk... so no option in the GUI... now for the CLI.

Even the CLI will not offline a disk that's part of that metadata stripe VDEV.

My guess is you can't because that would kill the pool... so inferring from that... you certainly need to have redundancy in the metadata VDEV(s) if you want your pool to survive a disk failure.
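
For the record, the CLI attempt goes something like this (pool and device names here are placeholders, and the exact error wording may vary by OpenZFS version):

Code:
zpool offline testpool gptid/aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee
cannot offline gptid/aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee: no valid replicas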

Also, what is optimal sizing for a Metadata vdev? Trying to figure out if there is any benefit to using large SSDs for this purpose. My data vdev is around 128 TB.
I think the concept was that metadata can be expected to be around 1 GB per TB of data, so for your 128 TB, 128 GB could be enough. I'm certainly not professing to be the ultimate expert on that, having not run anything in production yet.
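
If you'd rather measure than estimate, zdb's block statistics break down usage by type, metadata included - a sketch below, with the pool name as a placeholder (it walks every block, so it can take a very long time on a big pool):

Code:
# rule of thumb: ~1 GB of metadata per 1 TB of data -> roughly 128 GB for 128 TB, plus headroom
zdb -U /data/zfs/zpool.cache -bb tank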
 

Yorick

Wizard
Joined
Nov 4, 2018
Messages
1,912
That’s exactly right. A special allocation vdev is part of the pool and subject to the same rules as any other vdev that’s part of the pool. Lose the special allocation vdev and you lose the pool.

A metadata-only L2ARC doesn’t have those limitations: it can be safely removed, and it can be a single disk. It’s not persistent - yet.
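
Setting that up is just an ordinary cache device plus the secondarycache property - a sketch, with pool and device names as placeholders:

Code:
zpool add tank cache nvd0p2            # L2ARC can be a single disk and can be removed later
zfs set secondarycache=metadata tank   # cache only metadata; child datasets inherit this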
 

ajgnet

Explorer
Joined
Jun 16, 2020
Messages
65
Thanks so much. So if I set up a sufficiently sized L2ARC with the default configuration, will non-persistent metadata be stored on the L2ARC as it's populated?
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
will non-persistent metadata be stored on the L2ARC as it's populated?
You need to tell it to be a metadata-only L2ARC or it won't do that (at least I don't think so).
Have a look at these threads:
 

ajgnet

Explorer
Joined
Jun 16, 2020
Messages
65
Thanks, that's extremely helpful. If I understand it correctly, I can only have an L2ARC set as metadata-only (secondarycache=metadata) or as a read cache for the data itself (secondarycache=all), which excludes storing a cache of the metadata in the L2ARC. Is there no way to do both simultaneously? Perhaps with 2 cache disks?
 

ajgnet

Explorer
Joined
Jun 16, 2020
Messages
65
OK, so the first thing I notice is the warning when creating a new pool with a mirror for data and a stripe for metadata:
Metadata vdev must be the same type as the data vdevs. First data vdev is a mirror, new Metadata vdev is a stripe.

If I force it to ignore that, I can create the pool.
Do I need to worry about the new pool being of a different type? Particularly if using SSDs for the special metadata mirror vdev and platter disks in raidz2 for the data vdev? Example:
Code:
root@ubuntu20zfs:/tank# zpool status
  pool: tank
 state: ONLINE
  scan: none requested
config:

        NAME                 STATE     READ WRITE CKSUM
        tank                 ONLINE       0     0     0
          raidz2-0           ONLINE       0     0     0
            /root/data1.img  ONLINE       0     0     0
            /root/data2.img  ONLINE       0     0     0
            /root/data3.img  ONLINE       0     0     0
            /root/data4.img  ONLINE       0     0     0
            /root/data5.img  ONLINE       0     0     0
        special
          mirror-1           ONLINE       0     0     0
            /root/data8.img  ONLINE       0     0     0
            /root/data9.img  ONLINE       0     0     0

errors: No known data errors
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
Is there no way to do both simultaneously? Perhaps with 2 cache disks?
Unlikely. That setting is done at the level of the ZFS pool, so the same pool can't have 2 different settings for it.

I did see that it might also be settable at the dataset level, so it may be possible to have some datasets caching content and others caching only metadata, but again, not both for the same one.
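
If it does work per dataset, it would look something like this (dataset names made up for the example):

Code:
zfs set secondarycache=metadata tank/media   # L2ARC holds only metadata for this dataset
zfs set secondarycache=all tank/vms          # L2ARC holds data and metadata for this one
zfs get -r secondarycache tank               # check what each dataset ends up with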
 

Yorick

Wizard
Joined
Nov 4, 2018
Messages
1,912
Do I need to worry about the new pool being of a different type? Particularly if using SSDs for the special metadata mirror vdev and platter disks in raidz2 for the data vdev?

No, that's fine. What you want is the same level of risk, give or take. A mirrored SSD vdev is just as resilient as a platter raidz2, arguably more so. Using platters for special would be concerning, but then why would one ever?
 

ajgnet

Explorer
Joined
Jun 16, 2020
Messages
65
How would I set the ashifts correctly if my raidz2 pool is HDD and my special mirror consists of 2 SSDs? Is there a way to set a different ashift for each vdev?
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
How would I set the ashifts correctly if my raidz2 pool is HDD and my special mirror consists of 2 SSDs? Is there a way to set a different ashift for each vdev?
TrueNAS should default to ashift=12 for all devices including the special mirror, and honestly you shouldn't use anything less than that with how common 512e drives are. Are you looking to increase the ashift value?
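
If you want to verify what each vdev actually got, the per-vdev ashift is visible in the cached pool config - a sketch, with the pool name as a placeholder:

Code:
zdb -U /data/zfs/zpool.cache | grep ashift   # per-vdev ashift from the cached config
zpool get ashift tank                        # pool-wide default on OpenZFS 2.0 (0 = auto-detect)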
 

ajgnet

Explorer
Joined
Jun 16, 2020
Messages
65
Could I do something like this (using /zfs/* sparse files as an example)?

Code:
zpool create library raidz2 /zfs/disk[1-8] -o ashift=12 # PLATTER DISKS
zpool add library special mirror /zfs/meta[1-2] -o ashift=13 -f #Mirrored SSDs
zpool add library cache /zfs/cache1 -o ashift=13 #NVME
zpool add library log /zfs/slog11 -o ashift=13  #NVME


And then add the below to put small blocks on the faster SSDs? (What happens if they run out of space?)
Code:
zfs set special_small_blocks=32K library
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Could I do something like this (using /zfs/* sparse files as an example)?

Code:
zpool create library raidz2 /zfs/disk[1-8] -o ashift=12 # PLATTER DISKS
zpool add library special mirror /zfs/meta[1-2] -o ashift=13 -f #Mirrored SSDs
zpool add library cache /zfs/cache1 -o ashift=13 #NVME
zpool add library log /zfs/slog11 -o ashift=13  #NVME

Creating pools from the command line isn't a supported method, but if you want to match the GUI setup you would use something like this for the creation options - note the use of /dev/gptid/ for adding members.

Code:
zpool create -o cachefile=/data/zfs/zpool.cache -o failmode=continue -o autoexpand=on -o ashift=12 -O compression=lz4 -O aclmode=passthrough -O aclinherit=passthrough -f -m /library -o altroot=/mnt library raidz2 /dev/gptid/gpt-ids-go-here


You could then use your lines above to add the special/cache/log vdevs with the increased ashift - but generally speaking, most devices are optimized for 4K blocks. 8K can provide some gains but might add overhead in terms of consumed space.
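
As a sketch, those add commands would look roughly like this - gptid values are placeholders, -o ashift goes before the pool name, and ashift=13 would be the 8K variant:

Code:
zpool add -o ashift=12 library special mirror /dev/gptid/ssd-a /dev/gptid/ssd-b
zpool add -o ashift=12 library cache /dev/gptid/nvme-cache
zpool add -o ashift=12 library log /dev/gptid/nvme-slog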

And then add the below to put small blocks on the faster SSDs? (What happens if they run out of space?)
Code:
zfs set special_small_blocks=32K library

If the special vdev runs out of space, new writes will overflow to the main data vdevs.
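
You can keep an eye on that with the per-vdev capacity listing:

Code:
zpool list -v library   # SIZE/ALLOC/FREE per vdev, including the special mirror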
 

sdellape

Cadet
Joined
Jan 6, 2021
Messages
1
Unlikely. That setting is done at the level of the ZFS pool, so the same pool can't have 2 different settings for it.

I did see that maybe it was at dataset level also, so it may be possible to have some datasets caching content and others doing metadata, but again, not both for the same one.

1. secondarycache=all implies both. There currently is no way to cache data only (although it’s been proposed).

2. This setting is dataset-level.
 