The purpose of RAIDZx is to prevent the need to re-create your zpool from backup if x drives fail simultaneously.
That is to say that RAIDZx is not a standalone backup solution.
I think this is an important perspective for the rest of the post, so I'll say it again:
The purpose of RAIDZx is to prevent the need to re-create your zpool from backup if x drives fail simultaneously. (plus of course protecting not yet backed-up data...)
To this end, there's been some discussion about what 'x' should be in RAIDZx, mostly based on the article below, which raises great points about the problems traditional RAID5 has with current HDDs.
http://www.zdnet.com/blog/storage/why-raid-5-stops-working-in-2009/162
The gist is that with traditional RAID5, an Unrecoverable Bit Error (UBE) during a rebuild (restriping) means the entire RAID volume is likely (certainly?) unrecoverable, hence the need for RAID6.
But ZFS is different: it handles checksumming and resilvering at the block level.
So my ultimate question after learning about all of this was: what exactly happens with ZFS if there is a UBE during the resilver of a RAIDZ1 vdev?
As far as I can tell, the answer is that each UBE costs you one corrupted block, but you keep the rest of your data and the zpool.
Assuming the bit errors are random, multiple errors corrupt multiple blocks and, by extension, multiple files.
What if the UBE hits metadata? That could potentially take a LOT of additional data with it.
Well, ZFS keeps at least 2 copies of all metadata, each with its own parity data (zpool-critical metadata is stored 3 times), so a UBE in metadata can fall back on another copy.
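A rough way to estimate how many UBEs a resilver might run into, as a minimal sketch. It assumes the commonly quoted consumer-drive spec of 1 unrecoverable error per 10^14 bits read, errors that occur independently (so their count is Poisson-distributed), and a hypothetical RAIDZ1 vdev of 4 x 4 TB drives with one drive failed. Real drives often beat their spec sheet, and ZFS only reads allocated blocks during a resilver, so a partly full pool reads less than this.

```python
import math

# Assumed UBE spec: 1 unrecoverable error per 1e14 bits read
# (a commonly quoted consumer HDD figure, not a measured value).
UBE_PER_BIT = 1e-14


def expected_ubes(tb_read: float, ube_per_bit: float = UBE_PER_BIT) -> float:
    """Expected number of UBEs when reading tb_read terabytes (1 TB = 1e12 bytes)."""
    bits = tb_read * 1e12 * 8
    return bits * ube_per_bit


def p_at_least_one_ube(tb_read: float, ube_per_bit: float = UBE_PER_BIT) -> float:
    """Probability of one or more UBEs, treating errors as a Poisson process."""
    return 1.0 - math.exp(-expected_ubes(tb_read, ube_per_bit))


if __name__ == "__main__":
    # Hypothetical RAIDZ1 vdev: 4 x 4 TB drives, one failed, so the resilver
    # reads (at most) the 3 surviving drives in full.
    tb = 3 * 4.0
    print(f"expected UBEs: {expected_ubes(tb):.2f}")
    print(f"P(>=1 UBE):    {p_at_least_one_ube(tb):.1%}")
```

Under these assumptions a full read of the surviving drives has a good chance of hitting at least one UBE; the argument above is that in ZFS each such error costs roughly one block rather than the whole vdev.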
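A back-of-the-envelope sketch of why the extra metadata copies help. It assumes each copy lives on different sectors so UBEs hit the copies independently, the commonly quoted spec of 1 error per 10^14 bits, and a hypothetical 128 KiB block; it also simplifies by treating a copy as lost whenever any bit of it is unreadable.

```python
import math

# Assumed per-bit UBE probability (common consumer HDD spec: 1 in 1e14 bits).
UBE_PER_BIT = 1e-14


def p_copy_unreadable(block_bytes: int, ube_per_bit: float = UBE_PER_BIT) -> float:
    """Probability that one copy of a block contains at least one UBE.

    Computes 1 - (1 - p)^bits via log1p/expm1 to avoid floating-point
    cancellation when p is tiny.
    """
    bits = block_bytes * 8
    return -math.expm1(bits * math.log1p(-ube_per_bit))


def p_all_copies_lost(block_bytes: int, copies: int,
                      ube_per_bit: float = UBE_PER_BIT) -> float:
    """Probability that every copy is unreadable, assuming independent errors."""
    return p_copy_unreadable(block_bytes, ube_per_bit) ** copies


if __name__ == "__main__":
    block = 128 * 1024  # hypothetical 128 KiB block
    for n in (1, 2, 3):
        label = "copy" if n == 1 else "copies"
        print(f"{n} {label}: {p_all_copies_lost(block, n):.3e}")
```

Under these assumptions each extra copy multiplies the loss probability by another factor on the order of 1e-8, which is why a single metadata UBE is unlikely to take the pool down on its own.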
That leaves us with the following conclusion for RAIDZx:
RAIDZx protects against the simultaneous failure of x disks, and a single UBE during the resilver after those x disks fail corrupts 1 block/file without losing the zpool.
If you can tolerate a few corrupted blocks or files, then this is not a problem.
ZFS should tell you which files are corrupted (zpool status -v lists them), and you can restore them from backup.
My point is that I don't see a UBE during resilver as a catastrophe.
I'll let you draw your own conclusions about what this implies for choosing x in RAIDZx.
I don't want to make this too long, but I'll briefly consider the chance of a second disk failing completely during resilver.
I find MTBF and MTTDL basically meaningless (see https://blogs.oracle.com/relling/entry/a_story_of_two_mttdl and https://www.usenix.org/legacy/event/hotstorage10/tech/full_papers/Greenan.pdf).
Instead, you can look at the annual failure rate, which gives a more real-world sense of your risk.
I'll use the data from Backblaze (http://blog.backblaze.com/2014/01/21/what-hard-drive-should-i-buy/).
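This shows an annual failure rate of about 4%. If you assume failures are spread uniformly over the year, that is a rate of ~4.6E-6 per drive per hour (0.04 / 8,760 hours).
For a 48-hour resilver window, that gives a failure probability of ~2.2E-4 (0.022%) per drive, or about 1 failure in ~4,600 resilvers. With several surviving drives in the vdev, the risk scales roughly with their number.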
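The annual-failure-rate arithmetic above as a small Python sketch. It assumes, as the post does, that failures are spread uniformly over the year and uses the same linear per-drive approximation (rate times hours). The drives parameter is a hypothetical extension that combines several surviving drives, assuming they fail independently.

```python
import math

HOURS_PER_YEAR = 365 * 24  # 8760


def hourly_failure_rate(afr: float) -> float:
    """Per-drive hourly failure rate, spreading the AFR uniformly over the year."""
    return afr / HOURS_PER_YEAR


def p_failure_in_window(afr: float, hours: float, drives: int = 1) -> float:
    """Approximate probability that at least one of `drives` fails within `hours`.

    Per drive this is the linear approximation rate * hours. Drives are
    combined as 1 - (1 - p)^drives, assuming independent failures.
    """
    p_one = hourly_failure_rate(afr) * hours
    return -math.expm1(drives * math.log1p(-p_one))


if __name__ == "__main__":
    afr = 0.04   # ~4% annual failure rate from the Backblaze data cited above
    window = 48  # resilver window in hours
    p = p_failure_in_window(afr, window)
    print(f"hourly rate: {hourly_failure_rate(afr):.2e}")
    print(f"per drive:   {p:.2e} (1 in {1 / p:,.0f})")
    # Hypothetical vdev with 5 surviving drives.
    print(f"5 drives:    {p_failure_in_window(afr, window, drives=5):.2e}")
```

With 5 surviving drives the result comes out at just under five times the single-drive figure, which is the "scales roughly with their number" effect.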
For a home user with backups, this may be an acceptable risk. For a critical business application, maybe not.
Thoughts, corrections, and perspectives welcome :)