Chance of a failing resilver?

Status
Not open for further replies.

Blues Guy

Explorer
Joined
Dec 1, 2014
Messages
69
Hi, everybody!

I just read an article explaining why RAID-5 (or, with ZFS, RAIDZ1) is (presumably) dead. What do you think of it?

The core of it was that on a commercial-grade drive (including WD Red and the like), you have one non-correctable read error every 10^14 bits (about every 12TB of read data). So, if you (like me) have a ZFS RAIDZ1 with 4 * 2TB drives and one fails, there is a 50:50 chance of a non-correctable read error while rebuilding the array (resilvering). What would happen in that case? Will the resilver fail and all data be lost? Or does ZFS have some checksum-magic for this case?
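
As a rough check on the 50:50 figure, here is a minimal sketch. It assumes UREs are independent and occur at exactly the spec-sheet rate of one per 10^14 bits read, which real drives do not follow this neatly, and it assumes the pool is completely full. The function name `p_at_least_one_ure` is made up for illustration.

```python
import math

def p_at_least_one_ure(bytes_read, bits_per_error=1e14):
    """Probability of at least one URE while reading bytes_read bytes,
    assuming independent errors at one per bits_per_error bits (an assumption)."""
    bits = bytes_read * 8
    # 1 - (1 - 1/bits_per_error) ** bits, computed stably for tiny rates
    return -math.expm1(bits * math.log1p(-1.0 / bits_per_error))

TB = 10**12  # drive vendors quote decimal terabytes

# Resilvering a 4 x 2TB RAIDZ1 after one failure reads the three
# surviving drives; worst case is a full pool, so 3 * 2TB is read.
p = p_at_least_one_ure(3 * 2 * TB)
print(f"chance of at least one URE: {p:.0%}")
```

Under these assumptions the estimate lands somewhat below 50:50. ZFS only resilvers allocated blocks, so a partly filled pool reads less data and the estimate drops further.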

Thank you!
 

Mlovelace

Guru
Joined
Aug 19, 2014
Messages
1,111
Or does ZFS have some checksum-magic for this case?
ZFS can't have some checksum-magic for this case because if you're resilvering a RAIDZ1 vdev there is no remaining parity to rebuild from. That is why RAIDZ2 is recommended a thousand times on this forum: if you hit corruption during a resilver, ZFS can still do its "checksum-magic" and repair the data from the remaining parity.

Edit: Sorry, I forgot the other bit. If you have a URE when resilvering a RAIDZ1 pool you will lose the entire pool.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
ZFS has no "magic". The "magic" that you speak of would be redundancy, which you have none of in a degraded RAIDZ1 (aka RAID5) solution. That's the Achilles' heel of RAIDZ1 and is why it's not recommended. The answer is to use RAIDZ2+ so you don't have those kinds of problems.

What happens in that case depends on what kind of data gets corrupted. If it's metadata, you might see the system crash and on reboot you can't ever access the pool again (aka all data is gone and you must restore from backup). On the other hand, it might simply flag a file as corrupt. So the question to ask yourself is "how much risk am I willing to incur?"
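
To make the redundancy point concrete, here is a toy single-parity stripe using plain XOR. This is a simplification and not how RAIDZ actually lays out data on disk; the helper names `xor_blocks` and `rebuild` are hypothetical. It only shows that one missing block per stripe can be rebuilt, while a second unreadable block in the same stripe cannot.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def rebuild(stripe):
    """Rebuild the one missing block (marked None) of a single-parity stripe.

    Raises ValueError unless exactly one block is missing, because single
    parity carries only enough information to recover one block.
    """
    missing = [i for i, block in enumerate(stripe) if block is None]
    if len(missing) != 1:
        raise ValueError(f"cannot rebuild: {len(missing)} blocks missing")
    return xor_blocks([block for block in stripe if block is not None])

data = [b"AAAA", b"BBBB", b"CCCC"]
stripe = data + [xor_blocks(data)]   # three data blocks plus one parity block

degraded = list(stripe)
degraded[1] = None                   # one drive has failed
print(rebuild(degraded))             # the lost block is recovered

degraded[0] = None                   # URE on a surviving drive, same stripe
try:
    rebuild(degraded)
except ValueError as err:
    print(err)                       # nothing left to rebuild from
```

With a second parity block (the RAIDZ2 case) the stripe would survive that extra unreadable block, which is the whole argument for RAIDZ2.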
 

Blues Guy

Explorer
Joined
Dec 1, 2014
Messages
69
Thank you for the quick responses. I know that RAIDZ1 can't deal with a URE while resilvering, but as far as I understand, ZFS uses a checksum per file, so it should know when a file can't be recovered. And that's where Mlovelace and cyberjock give me conflicting information. I understand that when having a URE while resilvering, you *will* lose information, and if it hits the wrong spot, you could lose important metadata. But if I'm lucky and it is in a "normal" payload file (for example, an mp3), would I just lose that one file? Or do I lose the whole pool no matter what?

Thank you!
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
No, you won't lose the pool no matter what. It depends on what is corrupted, which I tried to explain in the last paragraph of my previous post.
 

Blues Guy

Explorer
Joined
Dec 1, 2014
Messages
69
I know you did. Thank you. I was just confused because Mlovelace wrote "If you have a URE when resilvering a RAIDZ1 pool you will lose the entire pool."
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
I know you did. Thank you. I was just confused because Mlovelace wrote "If you have a URE when resilvering a RAIDZ1 pool you will lose the entire pool."

Statistically he seems to be more right than wrong. It's a gamble, and if you are in the business of gambling with your data you should use NTFS and hardware RAID. /zing!
 

ixidor

Dabbler
Joined
Jun 23, 2011
Messages
20
Similar question. I have a server chassis with 12 hot-swap bays and am planning on doing two RAID arrays eventually. Currently I have CentOS with 8 x 2TB drives in RAID5 (they are 4 years old and need to go). I am considering RAIDZ2 with something like 6 x 3TB WD Reds, or 5 or 6 x 4TB WD Reds.
Question: with a 1e14 URE rate on most NAS drives (works out to about 12TB), if I make a 16TB RAID array with RAIDZ2 and it hits a URE, will it be able to recover the missing data from the other parity set?
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Similar question. I have a server chassis with 12 hot-swap bays and am planning on doing two RAID arrays eventually. Currently I have CentOS with 8 x 2TB drives in RAID5 (they are 4 years old and need to go). I am considering RAIDZ2 with something like 6 x 3TB WD Reds, or 5 or 6 x 4TB WD Reds.
Question: with a 1e14 URE rate on most NAS drives (works out to about 12TB), if I make a 16TB RAID array with RAIDZ2 and it hits a URE, will it be able to recover the missing data from the other parity set?

Yep, that's precisely why RAIDZ2 is safer than RAIDZ1. Mathematically, sometime around 2020, RAIDZ3 will be the "safe" way to store data. That assumes disk sizes continue to increase and reliability doesn't increase substantially too.

I'm kind of wondering what it's going to be like when we say 'RAIDZ3 or bust'. That's gonna seem like total overkill (and it totally does feel that way right now). Is ZFS going to try to stay ahead by making a RAIDZ4? Obviously there needs to be a technological solution to this problem, and there certainly will be. I just think it's weird to think about that.

Of course, it's weird to think about us having 20TB+ drives in standard desktops in 2020, which is at least possible.
 