Alright, here is the current status of my FreeNAS box.
I transferred about 500GB of data onto it with no issues, warnings or errors. I then ran a scrub, which also completed without errors. All of the original drives are back in the box and everything is exactly the way it was a week ago, before I had any issues.
As far as I can tell there are no issues with any of the drives' SMART data.
Since everything about the server checks out, this leads me to believe something happened that caused the system to temporarily flip out and think that drives were going bad. The most likely causes are:
1. A bad part of some kind (motherboard, PSU, etc.)
2. A power source issue (I use a surge protector and a large pro APC UPS)
3. A heat issue (All fans are working and temps appear to be fine currently)
While I am relieved my system is currently working fine, I would like to know what caused the issue so I can prevent it from happening again. Does anyone have any ideas as to what might have happened to my system? Are there any other tests I can perform to help figure it out?
Desktop grade parts are not really designed for 24/7 operation and it is possible you've used up or worn out some marginal part.
Assuming you didn't find dust bunnies reproducing in your box and all the fans seem clean and happy, heat is probably not the culprit: the values from your SMART data seem to indicate that it was perhaps a little warm but not intolerably so.
If and when you start replacing parts, please refer to the hardware forum sticky to help guide you to the most appropriate parts. The Supermicro boards, for example, are designed to be placed in servers that run 24/7. Your typical motherboard manufacturer doesn't go for the most expensive parts possible because the PC market is insanely competitive and they need to sell products people are willing to buy. So on a range of parts with various qualities, they typically pick one with the intention to run maybe 8 hours a day 5 days a week. Supermicro probably pays a little bit more for parts that are suitable for 24/7 operation.
Also, sad but true, PCs sometimes require disassembly and reseating of boards, cables, etc.
But quite frankly maybe it was just lonely and wanted some time out of the solitary confinement closet. Sometimes there's no obvious explanation.
So a few things to think about, though:
1) Labeling by device name (adaX) is really bad. Correcting that is a bit of a PITA though.
2) Actually set up some SMART tests to run! I do a short test every 4 hours and a long test 3 times a week on the filers here, and this seems to be good at catching failing disks.
3) Consider that your disks are aging and are probably closer to the end of their service life than the beginning. You could consider this an opportunity. If your data is important, it should really be backed up, and so if that's a consideration, maybe there's an opportunity somewhere to make a new filer and then use the current one as a replication target or something like that.
4) Before you put the filer back online, be sure to run memory tests on it "just to be safe."
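On the SMART front (point 2), here is a rough Python sketch of the kind of check you could run by hand between scheduled tests. It assumes the usual ATA attribute table that `smartctl -A` prints, with the raw value in the tenth column, and the names `WATCHED`, `parse_smart_attributes`, `suspicious` and `check_device` are just made up for this example. The watch list is my own pick of attributes that normally sit at zero on a healthy disk; it is not an official list, so adjust it for your drives.

```python
import subprocess

# Attributes whose raw value normally stays at 0 on a healthy disk.
# This selection is an assumption for illustration, not an official list.
WATCHED = {
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
    "UDMA_CRC_Error_Count",  # often points at cabling or connectors
}


def parse_smart_attributes(text):
    """Return {attribute_name: raw_value} from `smartctl -A` ATA output."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) >= 10 and fields[0].isdigit():
            raw = fields[9]  # first token of RAW_VALUE, e.g. "34" in "34 (Min/Max ...)"
            if raw.isdigit():
                attrs[fields[1]] = int(raw)
    return attrs


def suspicious(attrs):
    """Keep only watched attributes with a nonzero raw value."""
    return {name: value for name, value in attrs.items()
            if name in WATCHED and value > 0}


def check_device(dev):
    """Run smartctl on one device (needs root) and return flagged attributes."""
    result = subprocess.run(["smartctl", "-A", dev],
                            capture_output=True, text=True)
    return suspicious(parse_smart_attributes(result.stdout))
```

Calling `check_device("/dev/ada0")` as root for each disk returns an empty dict when nothing on the watch list is nonzero. A climbing `UDMA_CRC_Error_Count` with otherwise clean disks would fit the "reseat everything" theory better than the "dying drive" theory.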