Recent pool import probs - RAM corruption?

Arwen

MVP, joined May 17, 2014, 3,611 messages
Here in the forums we have had a few recent posts from users with pools that won't import: the import either crashes the computer or just won't work. The drives don't appear to have failed.


Have we started to see pools corrupted because of non-ECC RAM?


With the growing number of users running many terabytes of storage on servers with non-ECC RAM, we were bound to see it after a while. But I did not think it would happen this soon, or take the form of import failures. These cases don't appear to be caused by hardware RAID or another obvious known problem (e.g. WD Red SMR drives). Well, except that non-ECC RAM is an obvious factor in both.

Now I am NOT talking about neglected pools, like what Linus Tech Tips did. They got what they deserved: they ignored the archive backup pools for years and ended up with too many failed disks (or other things...).

Here are some of the posts that I am referring to:



It may have come down to a need for a new tool that does work similar to a read-only zpool import (zpool import -o readonly=on), except that it does not actually import the pool. It would simply check the pool for consistency and, if it finds exceptions, report what they are. And of course, it should never crash or take forever before printing some results. In some ways, the mythical off-line scrub would / should do this kind of sanity check.

Or perhaps more detailed help for ZDB on pool import problems, and on how to identify pool corruption (hopefully with a clue as to why it happened).
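To make the idea a bit more concrete, here is a rough sketch (not an existing tool) of a wrapper that asks zdb to look at a pool without importing it, and gives up after a time limit instead of hanging. The flags used are zdb's -e (operate on an exported pool), -p (directory to search for devices), -u (print the uberblock) and -c (verify metadata checksums). The function names, the pool name "tank", the device directory and the timeout values are all made up for illustration. Since zdb runs in user space, a failure there should normally end the zdb process rather than take down the machine, but I would treat that as an assumption, not a guarantee.

```python
import subprocess
import sys
from typing import List, Optional, Tuple


def build_zdb_command(pool: str, search_dir: Optional[str] = None,
                      verify_metadata: bool = False) -> List[str]:
    """Build a zdb command line that inspects a pool without importing it."""
    cmd = ["zdb", "-e"]            # -e: operate on an exported (not imported) pool
    if search_dir:
        cmd += ["-p", search_dir]  # -p: directory to search for the pool's devices
    if verify_metadata:
        cmd.append("-c")           # -c: traverse the pool, verifying metadata checksums (slow)
    else:
        cmd.append("-u")           # -u: only print the active uberblock (quick)
    cmd.append(pool)
    return cmd


def run_with_timeout(cmd: List[str], timeout_s: float) -> Tuple[str, str]:
    """Run cmd and return (status, output); never wait longer than timeout_s."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True,
                              timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return "timeout", ""       # the child process is killed on timeout
    except FileNotFoundError:
        return "missing", ""       # the binary is not installed / not on PATH
    status = "ok" if proc.returncode == 0 else "error"
    return status, proc.stdout + proc.stderr


if __name__ == "__main__":
    # Hypothetical pool name and device directory; adjust for the real system.
    quick = build_zdb_command("tank", search_dir="/dev/disk/by-id")
    status, output = run_with_timeout(quick, timeout_s=60)
    print(status)
    print(output)
```

A quick uberblock check that comes back "ok" says very little about the rest of the pool. The slower -c pass is closer to the sanity check described above, and even that is only as trustworthy as the RAM it runs in.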



So, am I totally off the wall?
Should I request the young gentlemen in white coats pay me a visit with a nice jacket that includes full sleeves which tie in the back?
Is non-ECC RAM potentially the problem here?
 