Partition an NVMe drive as L2ARC for multiple pools

ilmarmors

Dabbler
Joined
Dec 27, 2014
Messages
25
I have a warm-storage server with two pools: 1) 3 vdevs of 11 disks each in RAIDZ2 (long term, usage close to write once, read many; one dataset with 1M recordsize) and 2) 1 vdev of 2 disks in a mirror (scratch space for incoming data upload, processing and preparation for ingest into the long-term pool; multiple datasets with 128K recordsize). The server has 128GB RAM and an Intel OPAL D7-P4610 1.6T NVMe for L2ARC.

I have FreeNAS-11.3-U5 installed. I can probably migrate to TrueNAS-12.0 when U2 is out, or I might upgrade to TrueNAS-12.0-U1 sooner if there is a good reason to do so.

My main goal is to keep as much filesystem metadata in the L2ARC as possible. I don't have a random-access load - mainly rsync, find, du and streaming of files (which are usually 1-30MB). The biggest gain I have seen is when the ARC has the filesystem metadata cached, but not everything fits in ARC (RAM), and during normal use the metadata in ARC gets displaced by the file data itself.

The ideal solution would be a shared L2ARC, where ZFS takes care of balancing L2ARC usage among multiple pools, but that is not currently possible and probably won't be for some time: https://github.com/openzfs/zfs/issues/9859

The FreeNAS interface only allows adding a whole device as L2ARC to a pool. I have only one NVMe drive, and for reasons beyond my control I won't be able to upgrade this particular server.

How safe or dangerous is the following workaround: partition the NVMe disk manually and add the individual partitions as L2ARC cache to different pools? In my case I created two partitions - 128G for the scratch pool and the remainder for the big tank pool:

root@freenas# gpart create -s GPT /dev/nvd0
nvd0 created
root@freenas# gpart add -t freebsd-zfs -a 1m -l l2arca -s 128G /dev/nvd0
nvd0p1 added
root@freenas# gpart add -t freebsd-zfs -a 1m -l l2arcb /dev/nvd0
nvd0p2 added
root@freenas# zpool add scratch cache nvd0p1
root@freenas# zpool add tank cache nvd0p2


Running zpool status shows nvd0p1 and nvd0p2 under the cache sections of the scratch and tank pools respectively.

On the pool status page the FreeNAS UI shows /dev/nvd0p1 and /dev/nvd0p2 in the cache sections of the respective pools, instead of nvd0 (the full device, which is what can be attached to a single pool via the UI).

Are there any downsides to the approach I took? Something that might bite me down the road? Things I should not forget and should remember during upgrades, config restores or anything else in the future?

Is there any way to let ZFS know that I prefer caching filesystem metadata in the L2ARC? If yes, what is the correct way to configure that? I would like to avoid long sequential file reads or writes evicting metadata from the L2ARC.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
Is there any way to let ZFS know that I prefer caching filesystem metadata in the L2ARC? If yes, what is the correct way to configure that?
You can elect to have it store only metadata... if that's interesting, check here:

Also of interest would be this one (once you upgrade to 12)
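
As a rough sketch, this is presumably the secondarycache dataset property; the pool/dataset names below are just placeholders for your own:

root@freenas# zfs set secondarycache=metadata tank/mydataset   # cache only metadata from this dataset in L2ARC
root@freenas# zfs set secondarycache=metadata tank             # or set it at the pool root; children inherit it
root@freenas# zfs get secondarycache tank                      # verify the current value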
 

Constantin

Vampire Pig
Joined
May 19, 2017
Messages
1,829
L2ARC is duplicate data, so if it blows up, gets corrupted, or whatever, the file system can go back to the pool for the missing data.

I found a metadata-only L2ARC to be a huge benefit for rsync operations. By default, it took three passes for the L2ARC cache to get “hot” with metadata and maximize its benefit. As of TrueNAS 12, the L2ARC can be made persistent.

Another option in TrueNAS 12 is setting up a special VDEV for metadata. However, unlike L2ARC, that sVDEV is essential for the pool. Hence, the sVDEV hardware / configuration should be designed to match the redundancy of your pool. For example, I will use a 3-way mirror of identical Intel SSDs for my sVDEV. Sizing the sVDEV properly is also important.
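
If memory serves, L2ARC persistence in TrueNAS 12 is governed by the l2arc_rebuild_enabled tunable (exposed on FreeBSD as a vfs.zfs.l2arc sysctl) and is on by default; something along these lines should confirm it:

root@freenas# sysctl vfs.zfs.l2arc.rebuild_enabled   # 1 = L2ARC contents are rebuilt (persist) across reboots
# if it ever needs changing, setting it as a tunable in the UI (System -> Tunables) is preferable to the shell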
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
I'd recommend not using nvd0p1 as the device name to add to the pool but gptid/<rawuuid-of-nvd0p1> instead. You can get this with gpart list nvd0.
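
For example, roughly like this (the rawuuid values below are made up; use the ones gpart list nvd0 actually reports for your partitions):

root@freenas# zpool remove scratch nvd0p1      # cache devices can be removed safely
root@freenas# zpool remove tank nvd0p2
root@freenas# gpart list nvd0 | grep rawuuid
   rawuuid: 3b2f1c9a-0d55-11eb-aaaa-0cc47a000001
   rawuuid: 3f7d2e84-0d55-11eb-aaaa-0cc47a000002
root@freenas# zpool add scratch cache gptid/3b2f1c9a-0d55-11eb-aaaa-0cc47a000001
root@freenas# zpool add tank cache gptid/3f7d2e84-0d55-11eb-aaaa-0cc47a000002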
 

Constantin

Vampire Pig
Joined
May 19, 2017
Messages
1,829
Can this be done via the GUI with those UUIDs as well? I ask since I agree with your approach but want to stick to the GUI as much as possible, due to the repeated warnings here not to drop into the shell for this sort of stuff.
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
Nope. Only the entire disk can be added via the UI. But doing so would lead to
  • a partition table with a single partition being created
  • the rawuuid of that partition being used to insert the disk as a cache into the pool
So you already dropped into the shell, didn't you? I only advise using the identifiers TrueNAS would use if it had a feature to partition a disk. TN always uses the UUIDs for vdevs.

Look at the output of zpool status. With the exception of the boot pool it's gptid/something throughout.
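
Purely illustrative (the identifier is invented), a cache section added by gptid then looks something like:

        cache
          gptid/3f7d2e84-0d55-11eb-aaaa-0cc47a000002  ONLINE       0     0     0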
 

Constantin

Vampire Pig
Joined
May 19, 2017
Messages
1,829
Having been bitten by changing ada# designations in the past, I concur with using UUIDs whenever possible. It's the only foolproof way, but it's too bad it has to be done via the shell, since that increases the likelihood that I somehow screw it up... :smile:
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
Like omitting the "cache" keyword, which would lead to an additional unmirrored vdev ...
zpool checkpoint is your friend in this case: take one before you do anything else.
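
For reference, roughly like this (the pool name is just an example):

root@freenas# zpool checkpoint tank                      # take a checkpoint before the risky zpool add
root@freenas# zpool checkpoint -d tank                   # everything checks out: discard the checkpoint
# if it went wrong instead, export and rewind to the checkpoint:
root@freenas# zpool export tank
root@freenas# zpool import --rewind-to-checkpoint tank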
 

Constantin

Vampire Pig
Joined
May 19, 2017
Messages
1,829
Yup.
I will have to educate myself on the "how" of adding a small-files as well as a metadata sVDEV on separate partitions via the shell, then. That's for a different thread. :smile:

Good thing my plan was to nuke the current pool once I have multiple backups set up. That should make verifying the correct setup a lot easier.

Also, I reckon the GUI will show the results as expected once I've set it all up via the shell?
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
Be aware that you must provide some resiliency for metadata vdevs, i.e. at least mirror them, and check the write endurance of the devices you are planning to use. These are not cache devices. If the metadata special vdev is lost, the pool is toast.
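
As a rough sketch only (device identifiers are placeholders, and as discussed above gptid labels would be used in practice; the dataset name is an example):

root@freenas# zpool add tank special mirror gptid/<uuid-ssd1> gptid/<uuid-ssd2> gptid/<uuid-ssd3>   # mirrored metadata sVDEV
root@freenas# zfs set special_small_blocks=32K tank/mydataset   # optional: also store small blocks on the sVDEV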
 

Constantin

Vampire Pig
Joined
May 19, 2017
Messages
1,829
Based on my inquiries here, a 3-way mirror consisting of S3610s seemed to fit the bill for my use case (a pool with largely dormant data). I also have a cold spare on hand. All of these SSDs have been burned in.

Other use cases may need something more robust!
 