However, it is worth noting that this has always been a dicey metric to begin with, and probably doesn't translate to useful data, in the same way that MTBF isn't really directly meaningful. Nonrecoverable read errors aren't likely to magically all be 1x10^14 for consumer grade drives and 1x10^15 for the enterprise drives, across many years, underlying technology changes, etc. It MAY be indicative of somewhat better materials/design/etc but it is also definitely indicative of the fact that they'd prefer enterprises to buy the more pricey drives.
The whole point of RAID, however, was to create a redundant array of inexpensive disks, and to tackle the problem that way. The difference between 1x10^14 and 1x10^15 isn't particularly meaningful in that context, because, again, data loss is tied to the probability of two drives losing the same block simultaneously - not just two drives losing some arbitrary unrelated blocks simultaneously.
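To put rough numbers on the "same block" versus "unrelated blocks" distinction, here is a minimal sketch. It assumes the spec-sheet figure is a true, independent per-bit error rate (which is exactly what is being questioned in this thread), and the 4 TB drive size and 4 KiB block size are hypothetical values picked for illustration.

```python
import math

def p_any_ure(drive_bytes, ure_rate_bits):
    """P(at least one URE) when reading a whole drive once.

    Assumes independent bit errors at 1 per ure_rate_bits (an assumption).
    """
    return -math.expm1(drive_bytes * 8 * math.log1p(-1.0 / ure_rate_bits))

def p_same_block_ure(drive_bytes, block_bytes, ure_rate_bits):
    """P(some block is unreadable on BOTH of two drives at once)."""
    # Chance that one given block on one drive contains a URE.
    p_block = -math.expm1(block_bytes * 8 * math.log1p(-1.0 / ure_rate_bits))
    n_blocks = drive_bytes / block_bytes
    # A block is lost only if it is bad on both drives: p_block squared.
    return -math.expm1(n_blocks * math.log1p(-p_block * p_block))

drive = 4e12   # hypothetical 4 TB drive
block = 4096   # hypothetical 4 KiB block

either_somewhere = p_any_ure(drive, 1e14) ** 2   # both drives hit a URE anywhere
same_block = p_same_block_ure(drive, block, 1e14)
print(f"both drives hit a URE somewhere: {either_somewhere:.3f}")
print(f"both drives hit a URE in the same block: {same_block:.2e}")
```

Under these assumptions the two probabilities differ by many orders of magnitude, which is the point being made: unrelated UREs on two drives are survivable as long as they don't land on the same block.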
Which is precisely why I said "claim". I did engineering work for many years. I was one of those poor souls that provide input into the MTBF calculations that we had to go by. They, for all intents and purposes, are basically useless metrics.
What makes this whole conversation uglier isn't what I'm about to quote below, but the fact that if we are going to dismiss the 1x10^15 as a bunch of crap, why should we even trust the 1x10^14 value? Isn't that number just as meaningless then? I'm not saying we should or shouldn't dismiss those numbers. I'm just saying that if we are going to dismiss the lower error rate claimed for enterprise drives, why the hell aren't we also dismissing the higher error rate claimed for consumer drives? Seems to be logical to me...
It's actually really only tangentially related, and I'm kinda surprised you'd say such a thing. What you're actually looking for is the likelihood of data loss on a pool, and how we can affect that in the future.
Bullshit it's only tangentially related. It's totally and unequivocally related. Those failure rates that are sold are the math behind why RAIDZ1/RAID5 died in 2009. That precise math. Nothing else was involved.
RAIDZ1 dies "in 2009" for a very specific reason: the loss of a single disk eliminates redundancy for the pool. When you're rebuilding, you actually do need each and every sector on the remaining drives that contains pool data to be readable, or you will encounter some loss of data. That is very much intertwined with the URE values you're discussing.
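For reference, the "died in 2009" arithmetic can be sketched as below. This takes the quoted 1x10^14 and 1x10^15 figures at face value as independent per-bit error rates, which is an assumption, and the pool layout (four surviving 3 TB drives, all full) is a hypothetical example rather than anything from this thread.

```python
import math

def p_ure_during_rebuild(bytes_to_read, ure_rate_bits):
    """P(at least one URE) while reading bytes_to_read during a rebuild.

    Assumes each bit fails independently with probability
    1 / ure_rate_bits. Real drives are unlikely to behave this neatly.
    """
    bits = bytes_to_read * 8
    # 1 - (1 - p)^bits, written with log1p/expm1 to avoid rounding loss.
    return -math.expm1(bits * math.log1p(-1.0 / ure_rate_bits))

# Hypothetical degraded RAIDZ1: 4 surviving 3 TB drives, all data read.
surviving_bytes = 4 * 3e12

consumer = p_ure_during_rebuild(surviving_bytes, 1e14)
enterprise = p_ure_during_rebuild(surviving_bytes, 1e15)
print(f"consumer (1 per 1e14 bits):   {consumer:.3f}")
print(f"enterprise (1 per 1e15 bits): {enterprise:.3f}")
```

With those assumed numbers it works out to roughly a 62% chance of hitting at least one URE at the consumer rating versus about 9% at the enterprise rating, and with no redundancy left any one of those means some data is lost.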
RAIDZ2, however, retains redundancy. Because of that, the URE values are of less concern. As long as the redundancy is capable of recovering the data, you're still fine. The problem with RAIDZ2 is that if you lose a drive, any block on the remaining drives which falls victim to a URE is still recoverable, but has effectively lost redundancy. Still, it is totally recoverable.
And you don't hear me harping on RAIDZ2 being dead, do you? Because I know better than to assume that a single URE is a death knell for ZFS. For hardware RAID, it could very well be, since many RAID controllers drop a drive at the first sign of a problem.
We do run into a problem with that, however, as the rebuild times increase. The likelihood of a second drive failing during a multi-day rebuild with these modern large drives is substantially greater than the chance of failure striking during the rebuild of a much smaller drive.
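A rough way to see how rebuild time feeds into this is sketched below. It assumes drive failures follow a constant-rate (exponential) model derived from an annualized failure rate. The 5% AFR, the five surviving drives, and the 6-hour and 72-hour rebuild windows are all hypothetical inputs, and the model ignores the extra stress a rebuild puts on the remaining drives.

```python
import math

HOURS_PER_YEAR = 8766  # average year, including leap years

def p_second_failure(n_surviving, rebuild_hours, afr):
    """P(at least one surviving drive fails outright during the rebuild).

    Assumes independent drives with a constant hazard rate chosen so
    that each drive fails within a year with probability afr.
    """
    hazard_per_hour = -math.log1p(-afr) / HOURS_PER_YEAR
    return -math.expm1(-n_surviving * hazard_per_hour * rebuild_hours)

small_drive = p_second_failure(n_surviving=5, rebuild_hours=6, afr=0.05)
large_drive = p_second_failure(n_surviving=5, rebuild_hours=72, afr=0.05)
print(f"6 hour rebuild:  {small_drive:.5f}")
print(f"72 hour rebuild: {large_drive:.5f}")
print(f"ratio: {large_drive / small_drive:.1f}x")
```

In this simple model the risk scales almost linearly with how long the rebuild takes, so a rebuild that runs twelve times longer carries close to twelve times the chance of a second drive dropping out.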
RAIDZ3 extends that out further. At this point, the impact of the URE rate is essentially meaningless, because you're multiply covered even for two failures. Again, as I pointed out earlier on, this is actually a problem in statistics, and statistically speaking, you're very likely to retain availability of a data block as long as you haven't managed to lose access to that block either due to a URE on that same block on the other drive, or lose that drive entirely.
Which I've said many times on this forum...
By the way, I don't know if WD Reds are still being recommended much around here but they don't seem to be doing well on Backblaze's report. Link. Failure rates of up to 13%.
You know that link has been provided twice in the last 7 days, right? Linking to it again doesn't make it any more trustworthy than the other thread that linked to it yesterday afternoon.
Yes, Reds are still recommended around here. Probably in more than 50% of cases compared to all other brands combined. I'm using 10xWD Reds and I have no reason to be concerned, at all. Backblaze has been publicly humiliated for more than one report they've put out in the past, and even admitted that the report wasn't meant to extrapolate assumptions that people started making.