Hello Mr.T
This was quite the read, I must say. I'm thankful that your and other contributors' insights have been shared.
I'd like to contribute a bit of a 'list' on how to approach the scenario as it stands today, while also looking ahead.
A lot of advice has already been put on the table, but some missing links can be filled in, and some points deserve extra emphasis.
Let's start with the hardware.
I have a nagging feeling that your drives have been put through quite a harsh life, both in terms of power and possibly cooling.
I'd immediately (that is, mid-crisis) make significant efforts to improve cooling. Drives will never be put to the test as hard as during the resilvers. Now is when you want a proper cooling system sorted.
Step 1: ensure good ventilation around the box and clear all air filters (but don't rub a vacuum cleaner against the chassis with components mounted!)
Step 2: add or replace fans to significantly increase cooling. Potential noise issues can be handled later (via fan scripts) and are fixable. At the moment, cooling is the priority, until all the RAIDZ2 recovery and salvage operations are complete and the system is considered stable, set up properly, and the data maintained.
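To keep an eye on temperatures while the resilvers grind away, something like the sketch below could help. It assumes smartctl (from smartmontools, which FreeNAS ships) and parses the Temperature_Celsius attribute; the 40C threshold and the /dev/da* device names are my assumptions, adjust them to your drives' spec sheets.

```shell
#!/bin/sh
# Sketch only - 'smartctl' comes with smartmontools; the 40C threshold
# and the /dev/da* naming are assumptions, adjust to your drives.
THRESHOLD=40

# Pull the raw Temperature_Celsius value out of 'smartctl -A' output.
parse_temp() {
    awk '/Temperature_Celsius/ {print $10; exit}'
}

# Print a one-line verdict for a drive given its temperature.
report_drive() {
    if [ "$2" -gt "$THRESHOLD" ]; then
        echo "$1: ${2}C - TOO HOT"
    else
        echo "$1: ${2}C - ok"
    fi
}

# Real usage (commented out - needs the actual hardware):
# for dev in /dev/da*; do
#     report_drive "$dev" "$(smartctl -A "$dev" | parse_temp)"
# done

# Demo with a sample 'smartctl -A' line so the sketch runs anywhere:
sample="194 Temperature_Celsius 0x0022 112 105 000 Old_age Always - 38"
report_drive da0 "$(echo "$sample" | parse_temp)"
```

Run it in a loop (or from cron) during the heavy operations and you'll spot a cooling problem before a drive does.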
The other part of my environment-related concerns is power. That multiple old drives have failed on you (more than the norm would suggest) may point to a faulty PSU - which itself may have been damaged by poor-quality power from the grid. This is where a UPS comes into play. Get one. If not a real, proper one (i.e., one that can actually power down your rig cleanly in the event of a power failure), then get
ANY UPS to at least smooth out disturbances from the grid - in particular since you've recently (IIRC) invested in a new PSU. If this were my situation, I'd consider this a high-priority task.
Now, on to setting up the box properly. It is fairly evident that some rudimentary things need to be set up before leaving the box to sit on its own again.
Here is the list of things I find most important - some of which you may already be able to tick off.
1. Set up your email, to receive warnings. (You had this running already, right?)
2. SMART and scrub scheduling. Follow this guide blindly (ensure all drives/pools are included). It is tried and tested:
https://forums.freenas.org/index.php?threads/scrub-and-smart-testing-schedules.20108/
3. Set up a list (not stored on the FreeNAS box itself) that matches each serial number to the proper HDD. There is a script to do this, but it can also be done manually, albeit with more copy/paste work:
look for "Display drives identification infos (Device, GPTID, Serial)". The important part is not to get blinded by the Device ("da1"), since it may change - focus on the GPTID and the serial instead. I match these against a handwritten sticker on each drive containing a number. The list can then be used any time you need to work out which physical device should be the center of attention. This saves a lot of hassle.
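If you'd rather do it by hand than use that script, a sketch of the manual route: 'glabel status' shows the gptid-to-device mapping on FreeBSD, and 'smartctl -i' reports the serial. The device names below are placeholders.

```shell
#!/bin/sh
# Manual fallback for the serial list - 'smartctl -i' reports the serial,
# and on FreeBSD 'glabel status' shows the gptid-to-device mapping.
# Device names below are placeholders.

# Extract the serial number from 'smartctl -i' output.
get_serial() {
    awk -F': *' '/Serial Number/ {print $2; exit}'
}

# Real usage (commented out - needs the actual hardware):
# glabel status                                  # gptid <-> daX mapping
# for dev in /dev/da*; do
#     echo "$dev $(smartctl -i "$dev" | get_serial)"
# done

# Demo with a sample 'smartctl -i' line:
sample="Serial Number:    WD-WCC4E1234567"
echo "da1 $(echo "$sample" | get_serial)"
```

Paste the resulting lines into your off-box list next to the sticker numbers and you're done.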
4. Configure the 'send config' email script. It is LOVELY to know a recent copy of your config is always available in your mail.
Now I'd look into the drives that are scattered around in several pools. Unless there are particularly strong reasons why you want them separated (obviously with no redundancy at all), I'd at least consider the following. Overall: convert them into a 'dirt box' RAIDZ1 containing perhaps a duplicate of the most important files, plus the least important ones (or the ones with the highest turnover rate, which generate the most wear on the drives). Now you see, the overall goal of this maneuver is not only to get some utility out of the drives -
but to let you practice a proper burn-in and drive validation. Do the whole shenanigan as prescribed in the how-to guides:
https://forums.freenas.org/index.php?threads/building-burn-in-and-testing-your-freenas-system.17750/
https://forums.freenas.org/index.php?threads/how-to-hard-drive-burn-in-testing.21451/
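For reference, the burn-in sequence from those guides boils down to roughly the following. This sketch only prints the commands ('badblocks -ws' wipes the drive, so triple-check the device against your serial list before running anything for real); /dev/da9 is a placeholder.

```shell
#!/bin/sh
# Dry-run sketch of the burn-in sequence from the linked guides: it only
# PRINTS the commands. Run them by hand once you have triple-checked the
# device against your serial list - 'badblocks -ws' wipes the drive!
burnin_plan() {
    dev="$1"
    echo "smartctl -t short $dev"       # quick sanity check (minutes)
    echo "smartctl -t conveyance $dev"  # shipping-damage test, if supported
    echo "smartctl -t long $dev"        # full surface read (hours)
    echo "badblocks -ws $dev"           # destructive 4-pass write test (days)
    echo "smartctl -t long $dev"        # long test again after badblocks
    echo "smartctl -A $dev"             # compare SMART attributes before/after
}

burnin_plan /dev/da9   # '/dev/da9' is a placeholder
```

The before/after comparison of SMART attributes (reallocated, pending, and uncorrectable sector counts in particular) is what actually tells you whether a drive passed.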
This can be done while other operations are ongoing. Obviously, the first step is to empty the drives of data. As a side note: you can have multiple datasets on the same pool, so any argument for having four pools (with one dataset each) is out the door.
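To illustrate that one-pool-many-datasets point, here is what the 'dirt box' could look like. Printed as a dry run, and the pool, dataset, and device names are all made up:

```shell
#!/bin/sh
# Dry-run sketch of the 'dirt box' idea: one RAIDZ1 pool, several
# datasets. Pool, dataset, and device names are all made up - this only
# PRINTS the commands, so nothing is touched.
dirtbox_plan() {
    echo "zpool create dirtbox raidz1 /dev/da5 /dev/da6 /dev/da7"
    echo "zfs create dirtbox/important-copies"   # duplicate of critical data
    echo "zfs create dirtbox/scratch"            # high-turnover throwaway files
}

dirtbox_plan
```

Each dataset gets its own properties, quotas, and snapshots, which covers everything four separate pools were doing, with redundancy on top.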
So, the last piece (not necessarily last in the order of execution) would be to reread the basics. At some point, you know, you've skimmed through it but not followed through.
Yeah, the last step I wish you'd take is to read through the newbie materials again: Cyberjock's guide, the ZFS primer (links in my sig), and the documentation. I can guarantee that even if some parts don't appeal much to your interests, a whole new understanding of the system's ins and outs will unfold. Some fragments stick each time; most of the information is lost and needs revisiting to stay fresh. These fragments are the building blocks of knowledge.
Cheers, Dice