Choosing SSD and configuring mirrored pools

pasha-19

Dabbler
Joined
Feb 15, 2021
Messages
19
The SSD pool won't benefit much from a small-file VDEV; using it with the HDD pool has its merits.

A single VDEV can't be in two different pools.




You can set L2ARC as metadata-only. It's a different thing from how fusion pools work. It generally also needs at least 64GB of RAM in order to be beneficial and not harm performance.





L2ARC is a cache; it doesn't protect from corruption. That work is done at the block level.

You should read the following resources.

The small-file vdev was already abandoned for the SSD pool after the earlier posts and is only being considered for the HDD pool. Thanks.

Yes, I agree L2ARC alone does not help with corruption. If an L2ARC were chosen instead of the special metadata/small-file vdev, the potential for that vdev on NVMe to be corrupted (taking down the HDD pool with it) is eliminated, because the special vdev never existed in the first place.

Going back to my earlier thinking, not mentioned here yet: L2ARC did not impress me as a great solution. It does not target specific problem areas so much as whatever was most recently used, meaning a large media file could start pushing out metadata and small-file data. A special metadata/small-file vdev directly attacks two significant problems, metadata and small files, but it is risky to implement, especially on "commercial" NVMe drives, because the main pool does not keep duplicate copies of the data on the special vdev, so corruption of either takes out both. Per the earlier paragraph, an L2ARC would leave the original copies of the metadata and small files on the HDD pool, lowering and maybe eliminating the chance that losing the L2ARC loses any data.

I have 64GB of RAM with a potential total of 128GB; L2ARC really seemed to be for memory-strapped systems. If it turns out that the media files are pushing metadata and small files out of the cache, the easiest way to get the effect of a special metadata and small-file cache is to increase the RAM to 128GB before considering either an L2ARC or a special metadata/small-file vdev. The extra 64GB of RAM uses the same most-recently-used replacement and is faster than any L2ARC could ever be.
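
A quick way to sanity-check that before buying anything is the ARC hit rate; a minimal sketch, assuming TrueNAS CORE (FreeBSD) sysctl names, nothing more:

# Rough ARC hit-rate check from the arcstats counters.
sysctl kstat.zfs.misc.arcstats.hits kstat.zfs.misc.arcstats.misses
# hit rate = hits / (hits + misses); if it stays high, more RAM or an L2ARC will add little.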

(zdb -LbbbA -U ...)
The distribution of my file sizes in the first set of columns for the HDD pool indicates I have 1TB of data at 32K and just short of 6TB at 128K, out of 8TB total. (I expect, as an estimate, that the 32K area could expand upward considerably with videos.) In the small-file area there are roughly 250K files each at 2K or less (0.5GB) and at 4K (1GB), another 400K at 8K, and 100K files (2GB) at 16K. The total small-file space up to 16K is about 8GB. The total in the last column is 8TB. Generations of backups, maybe by Asigra (plugin), for 3 desktops (500GB) will be added. More videos will probably increase the 32K usage to match or exceed the 128K range, and the videos will also increase the small-file usage for video metadata.
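
For anyone repeating this, the histogram above came from a zdb invocation of roughly this form; the cache-file path is the usual TrueNAS CORE location and the pool name is a placeholder, so treat both as assumptions:

# Per-blocksize space usage for a pool; -L skips leak checking so it runs faster.
zdb -LbbbA -U /data/zfs/zpool.cache tank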

Even the small files well beyond my considered cutoff total only 8GB; 64GB of RAM serving four users will probably not evict that data often.

The hardware recommendations guide told me none of the hardware I found was appropriate, if I read it correctly -- no acceptable NVMe drives.
The quick hardware recommendations guide said to press an orange button I never found.

In the end, let's consider two questions:

Can I improve the jails and Jellyfin metadata in what I have called fpool with, say, a mirrored pair of two commercial NVMe drives if the system is attached to a UPS for orderly shutdown, or is there still a corruption risk for that data? If there is any risk, then just the following:

Given the other concerns, perhaps maxing out the memory, when required, is easiest, and then there are no SSDs with the potential to corrupt data.

I may just be too old-fashioned, trying to spread disk access across devices to eliminate bottlenecks when the HDD pool already does some of that.
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
The small-file vdev was already abandoned for the SSD pool after the earlier posts and is only being considered for the HDD pool. Thanks.

Yes, I agree L2ARC alone does not help with corruption. If an L2ARC were chosen instead of the special metadata/small-file vdev, the potential for that vdev on NVMe to be corrupted (taking down the HDD pool with it) is eliminated, because the special vdev never existed in the first place. That could merit some attention.
I understand: you are saying that, unlike losing the special VDEV, losing the L2ARC won't make you lose your pool. That is correct.

Going back to my earlier thinking, not mentioned here yet: L2ARC did not impress me as a great solution. It does not target specific problem areas so much as whatever was most recently used, meaning a large media file could start pushing out metadata and small-file data. A special metadata/small-file vdev directly attacks two significant problems, metadata and small files, but it is risky to implement, especially on "commercial" NVMe drives, because the main pool does not keep duplicate copies of the data on the special vdev, so corruption of either takes out both. Per the earlier paragraph, an L2ARC would leave the original copies of the metadata and small files on the HDD pool, lowering and maybe eliminating the chance that losing the L2ARC loses any data.
As said, L2ARC is cache and as such it contains metadata, most recently used files and most frequently used files... it behaves the same way as the ARC; see it as an extension of it. It can be configured to hold only metadata, but it cannot hold small files in place of the HDD pool the way the special vdev can.
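
For reference, that metadata-only behaviour is just a dataset property; a minimal sketch, assuming a pool called "tank" and a placeholder device name:

# Attach an L2ARC (cache) device to the pool.
zpool add tank cache nvd0
# Limit what the L2ARC may hold for the pool/dataset: all (default), none, or metadata.
zfs set secondarycache=metadata tank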

Commercial NVMe drives are bad because of their poor endurance (low TBW) and a few other things; if you want to look into proper drives, look at the likes of WD's Red, Seagate's IronWolf, or Intel's Optane (which are the best in mixed IOPS).

I have 64GB of RAM with a potential total of 128GB; L2ARC really seemed to be for memory-strapped systems. If it turns out that the media files are pushing metadata and small files out of the cache, the easiest way to get the effect of a special metadata and small-file cache is to increase the RAM to 128GB before considering either an L2ARC or a special metadata/small-file vdev. The extra 64GB of RAM uses the same most-recently-used replacement and is faster than any L2ARC could ever be.
Yes, it's generally suggested to reach your motherboard's maximum RAM capacity before considering L2ARC.

Can I improve the jails and Jellyfin metadata in what I have called fpool with, say, a mirrored pair of two commercial NVMe drives if the system is attached to a UPS for orderly shutdown, or is there still a corruption risk for that data?
I don't see why you think the metadata of an SSD pool needs to be improved. Having a UPS for orderly shutdown won't harm your system in any way... provided that you use a pure sine wave one.
 

pasha-19

Dabbler
Joined
Feb 15, 2021
Messages
19
A pair of WD Red SN700 500GB for the special metadata/small-file vdev (overkill, but cheaper than the 250GB at Amazon today, and it will probably last longer with more room to spread the data and snapshots); TBW 1,000. See above: even the more aggressive cutoff of up to 16K for small files is only 8GB today, for a considerably large number of small files.

A pair of WD Red SN700 2TB for jails and Jellyfin's application and content metadata, with room for future growth (like a database); TBW 2,500 (over double the TBW of what I was looking at).
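
For completeness, a sketch of how those two pairs would be wired up from the CLI, assuming the HDD pool is called "tank", the NVMe pool "fpool", and placeholder device names (on TrueNAS the GUI is the supported route, so treat this as illustration only):

# Mirrored special vdev on the HDD pool; it must be redundant, since losing it loses the pool.
zpool add tank special mirror nvd0 nvd1
# Blocks at or below this size land on the special vdev (set per dataset as needed).
zfs set special_small_blocks=16K tank
# Separate mirrored NVMe pool for jails and Jellyfin.
zpool create fpool mirror nvd2 nvd3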

These would be considered safe for the intended purpose.

The price is about the same as the commercial drives because I oversized one pair of them -- both are probably grossly oversized, but I have a bunch of content and desktop backup versions to add, and the expansion of the physical unit after this may be a new motherboard and controller.


Can I improve the jails and Jellyfin metadata in what I have called fpool with, say, a mirrored pair of two commercial NVMe drives if the system is attached to a UPS for orderly shutdown, or is there still a corruption risk for that data?
I don't see why you think the metadata of an SSD pool needs to be improved. Having a UPS for orderly shutdown won't harm your system in any way... provided that you use a pure sine wave one.

The above references the jail content and Jellyfin's application and content metadata (not ZFS metadata): the extra material Jellyfin displays alongside the media content. Jellyfin is likely to be the highest-use, most resource-intensive app on the server, probably more so than Samba.

Thanks -- I can go any way I want now. Consider this solved.
 

Constantin

Vampire Pig
Joined
May 19, 2017
Messages
1,829
Otherwise... wishes get turned into reality by feature requests (via the Report a bug link at the top of the page here).
Thank you for the instructions re: how to get sVDEV data via the shell. Interesting results - my fragmentation for the sVDEV is at 13% yet fill is only 2.3%. Pool is at 3% fragmentation with 38% fill.

Now, as for this specific feature request, I submitted it in 2021 and like all my other reports, improvement suggestions, etc. it has sat dormant as far as the iXsystems team is concerned. Presumably, the team has more pressing issues to address.
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
"more pressing issues to address" is read as SCALE.

Constantin

Vampire Pig
Joined
May 19, 2017
Messages
1,829
"more pressing issues to address" is read as SCALE.
Maybe.

Though I was particularly horrified by just how bog-awful the error messages re pool-import failures are - both in the UI and CLI.

Improving these messages beyond “your pool needs to be destroyed, hope you have a backup” would be something that iXsystems could leverage across both platforms and would also help sysadmins not needlessly / uselessly nuke their pools when just a simple electrical backplane connection went bad.

It doesn’t even have to be that sophisticated, just a simple tally of what disks the system expected to import vs. what it has “found”, highlighting drives that are missing. Then the sysadmin can investigate - did a controller card burn out, did a connection go bad, etc. Simply telling the admin that the pool is irretrievably done in (when it’s not!) is neither helpful nor does it build confidence in the awesome power of ZFS and/or IXsystems.
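
Even a crude version of that tally could be scripted today; a rough sketch (nothing TrueNAS ships; the pool name and device globs are assumptions), comparing a member list saved while the pool was healthy against what the OS currently sees:

# While healthy: record the pool's member device paths.
zpool status -P tank | grep -o '/dev/[^ ]*' | sort > /root/tank-members.txt
# After a failed import: record the devices that are actually present right now.
ls -1 /dev/gptid/* /dev/nvd* /dev/da* 2>/dev/null | sort > /root/devices-now.txt
# Lines printed here are pool members the system can no longer see.
comm -23 /root/tank-members.txt /root/devices-now.txt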

But based on the nonexistent response by iXsystems, I doubt anything has been done, and the company expects the user to deduce through experience, community outreach, etc. that even if the UI / CLI claims the pool must be destroyed, there might be a simply-remedied issue like a loose connector standing between them and a flawless pool import.

Look, I have nothing but respect for iXsystems re: their support over the years but I will disagree on some of their software development priorities. When something goes catastrophically wrong, it is imperative that the software does the best it can to help the user. Nurturing a TrueNAS back to health should not entail a steep learning curve as (with any luck) few of us should ever have to dive deep into that aspect of administrating a NAS.

Ok time to get off my soapbox now.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
my fragmentation for the sVDEV is at 13% yet fill is only 2.3%.
Which means that 13% of the free space is in fragments created by re-writing of data, leaving holes once snapshots expire. Probably the nature of metadata.
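
For anyone wanting to pull the same numbers, per-vdev fragmentation and fill are reported directly by zpool (pool name is a placeholder):

# FRAG and CAP are shown per vdev, so the sVDEV gets its own line.
zpool list -v tank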
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694
Maybe.

Though I was particularly horrified by just how bog-awful the error messages re pool-import failures are - both in the UI and CLI.

Improving these messages beyond “your pool needs to be destroyed, hope you have a backup” would be something that iXsystems could leverage across both platforms and would also help sysadmins not needlessly / uselessly nuke their pools when just a simple electrical backplane connection went bad.

It doesn’t even have to be that sophisticated, just a simple tally of what disks the system expected to import vs. what it has “found”, highlighting drives that are missing. Then the sysadmin can investigate - did a controller card burn out, did a connection go bad, etc. Simply telling the admin that the pool is irretrievably done in (when it’s not!) is neither helpful nor does it build confidence in the awesome power of ZFS and/or IXsystems.

But based on the nonexistent response by iXsystems, I doubt anything has been done, and the company expects the user to deduce through experience, community outreach, etc. that even if the UI / CLI claims the pool must be destroyed, there might be a simply-remedied issue like a loose connector standing between them and a flawless pool import.

Look, I have nothing but respect for iXsystems re: their support over the years but I will disagree on some of their software development priorities. When something goes catastrophically wrong, it is imperative that the software does the best it can to help the user. Nurturing a TrueNAS back to health should not entail a steep learning curve as (with any luck) few of us should ever have to dive deep into that aspect of administrating a NAS.

Ok time to get off my soapbox now.

It's a useful comment... it is difficult to triage between hardware failures and pool failures.

Even on our own hardware there are limits, but we haven't lost a pool this way in at least 3 years (across many thousands of systems). It's much harder when the hardware config is "random". Has anyone in the open source world solved this issue?

Without that, these forums are probably the best approach. We're open to improving the error messages if someone wants to take on a project to make the change recommendations.
 

Constantin

Vampire Pig
Joined
May 19, 2017
Messages
1,829
Without that, these forums are probably the best approach. We're open to improving the error messages if someone wants to take on a project to make the change recommendations.

I suggested this in my Jira submission 2 years ago: when it comes to the pool import error process, how about starting with the hardware basics - what disks did the import process expect to find & which of those disks are missing?

Pools don’t usually go from being 100% healthy on shutdown to dead & beyond revival on a subsequent startup. If the failed import notes that the following “5 disks are missing w/these serial #’s” then the admin has really good info on where to start looking - did a connector / backplane / PSU / etc go bad.

Conversely, if the hardware is willing and the software is not, that is also really good troubleshooting info. Presently, TrueNAS provides neither, the import process simply quits both in the UI and the CLI.

And this isn’t a “random hardware config” problem, rather it’s a reflection of level of understanding the dev team presently expects the average user to bring to TrueNAS.

At the very least, please consider changing the pool import error message from the default “I won’t tell you what went wrong during import, why it failed, etc. but I will advise you to destroy the pool and start over” to “well, we cannot yet provide good failure info on the import process due to software development priorities but the pool won’t import at the moment. The pool may be dead but before destroying the pool and starting over, why not check all electrical connections and seek help from us if you have a support plan or the community forum if you don’t?”

@morganL, you have a pretty self-selected group of folk here that chose TrueNAS because it was supposed to be bullet-proof. I will bet a ham sandwich that many are refugees from other hardware and software RAID solutions that failed them. Don’t leave them hanging w/non-useful or even destructive error messages when something goes wrong.
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
I'm totally fine with the current OpenZFS messages.

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694
I'm totally fine with the current OpenZFS messages.
Thanks.. can you compare OpenZFS messages with the TrueNAS messages?
That would make a good NAS ticket if we agree that OpenZFS is better.
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
Thanks.. can you compare OpenZFS messages with the TrueNAS messages?
That would make a good NAS ticket if we agree that OpenZFS is better.
I supposed the TN messages to be the standard OpenZFS ones; my point was that I'm fine with TN's current state regarding those messages.
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694
I supposed the TN messages to be the standard OpenZFS ones; my point was that I'm fine with TN's current state regarding those messages.
That was my assumption, but your comment seemed to indicate otherwise. I'll check..
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694
I suggested this in my Jira submission 2 years ago: when it comes to the pool import error process, how about starting with the hardware basics - what disks did the import process expect to find & which of those disks are missing?

Looking at the suggestion, it was marked as iX-private and didn't receive any upvotes. The privacy issue is fixed.

I personally like the idea... but it's a whole new software "module" to compare current and previous configs. Sometimes those changes are legitimate and sometimes not. Perhaps the first step is just to report the changes... either in the UI or in TrueCommand?


At the very least, please consider changing the pool import error message from the default “I won’t tell you what went wrong during import, why it failed, etc. but I will advise you to destroy the pool and start over” to “well, we cannot yet provide good failure info on the import process due to software development priorities but the pool won’t import at the moment. The pool may be dead but before destroying the pool and starting over, why not check all electrical connections and seek help from us if you have a support plan or the community forum if you don’t?”

Sorry, couldn't find the error message... do you have an accurate quote? Which version of the software?
 

Constantin

Vampire Pig
Joined
May 19, 2017
Messages
1,829
So the first time this happened was when I upgraded from 12U6.1 to U7. When I tried to manually import via the CLI, all I got in response was
"I/O Error, destroy and recreate the pool from backup"

Back then, and likely in the latest episode (when my PSU died), an electrical power issue was the root cause. In both cases, all I got was "I/O Error, destroy and recreate the pool from backup". I may be misinterpreting your response, so please bear with me as software is not my métier.

What I would like to see as the error message in response to a failed import is:
  1. "Here are the disks that I should be able to import (by capacity and serial #)"
  2. "Here are the drives I can presently import (by bus ID, capacity, and serial #)"
  3. "Here are the drives that are missing (by capacity and serial #)".
While the pool may need to be destroyed and rebuilt, please first contact
  • customer support if you have a support subscription or
  • turn to the community forums at truenas.com/community for help

If you guys want to really impress users, you could even throw in some diagnostics such as suggesting a controller may be dead if all drives attached to said controller dropped out. Etc. But that's a lot more work and you may consider that too far afield for the kinds of trained professionals that typically use your systems.

At the very least, I would reconsider the error message altogether and replace it with a more hopeful one, i.e. one giving some tips and pointing people to support or the forum for further help rather than advocating for nuking the pool by default. Giving potentially bad advice by default helps no one and undercuts the deservedly great reputation of the TrueNAS platform.
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694
So the first time this happened was when I upgraded from 12U6.1 to U7. When I tried to manually import via the CLI, all I got in response was


Back then, and likely in the latest episode (when my PSU died), an electrical power issue was likely the root cause. In both cases, all I got was "I/O Error, destroy and recreate the pool from backup". I may be misinterpreting your response, so please bear with me as software is not my métier.

What I would like to see as the error message in response to a failed import is:


If you guys want to really impress users, you could even throw in some diagnostics such as suggesting a controller may be dead if all drives attached to said controller dropped out. Etc. But that's a lot more work and you may consider that too far afield for the kinds of trained professionals that typically use your systems.

At the very least, I would reconsider the error message altogether and replace it with a more hopeful one, i.e. one giving some tips and pointing people to support or the forum for further help rather than advocating for nuking the pool by default. Giving potentially bad advice by default helps no one and undercuts the deservedly great reputation of the TrueNAS platform.

These are OpenZFS error messages.

https://github.com/xuanngo2001/cust-live-deb/issues/298

There are some commands which provide a better diagnosis of the state of the drives.

~>zpool import -F
   pool: poolname
     id: 13185988754217356698
  state: DEGRADED
 status: One or more devices contains corrupted data.
 action: The pool can be imported despite missing or damaged devices. The
         fault tolerance of the pool may be compromised if imported.
    see: http://zfsonlinux.org/msg/ZFS-8000-4J
 config:

        poolname                              DEGRADED
          mirror-0                            DEGRADED
            4123668411235123049               FAULTED  corrupted data
            ata-ST8000AS0002-2NA17G_K2239JTL  ONLINE

# Solution:
~>zpool import -FfmX poolname

We'd have to check whether the latest OpenZFS has better messages.
 

Constantin

Vampire Pig
Joined
May 19, 2017
Messages
1,829
FWIW, I tried using the -F flag in one of the instances listed above and the result was zero CLI feedback.

FWIW, I got zero response when I used the "zpool import -F -f -n pool_name" command. If I omit the -n, the pool is declared dead and needing to be destroyed.

However, my stacking of options at the CLI may have been an issue?
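
One hedged suggestion: a bare zpool import (no pool name, no rewind flags) only scans and prints what it can see, which at least shows the per-device state before any -F/-X options come into play:

# List importable pools and the state of each member device, without importing anything.
zpool import
# Point the scan at a specific device directory if labels aren't found under /dev.
zpool import -d /dev/gptid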

Worse, the middleware issues / error messages that the UI responded with were even more obscure (see referenced threads). That’s where I feel the TrueNAS team has more opportunity to fix issues that it has control over - first by having the code not crash and burn as well as offering more constructive feedback.

As far as replication of the error goes, for me it was as simple as electrically omitting the three sVDEV drives in my pool during the import attempt.
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694
FWIW, I tried using the -F flag in one of the instances listed above and the result was zero CLI feedback.



However, my stacking of options at the CLI may have been an issue?

Worse, the middleware issues / error messages that the UI responded with were even more obscure (see referenced threads). That’s where I feel the TrueNAS team has more opportunity to fix issues that it has control over - first by having the code not crash and burn as well as offering more constructive feedback.

As far as replication of the error goes, for me it was as simple as electrically omitting the three sVDEV drives in my pool during the import attempt.
The question is whether this is a ZFS fix... or we need a separate process to look for config changes. I'll see if our support team has seen any issues.
 