The RAIDZ1 thing... scrub vs resilver error rates...?

nickt · Mar 8, 2015

So my parts are in transit for my first FreeNAS build - looking forward to getting it going! I've seen heaps of great advice in these forums, and I plan to follow it, including the use of RAIDZ2 (6 x 3TB drives). I am also using ECC RAM (2x 8 GB) and a good board (I hope: C2750D4i). I will also have a good backup strategy in place.

But I have to say that I am not entirely convinced by the "RAIDZ1 is dead" thing - there are some real world aspects to this that I can't reconcile.

Yes, I absolutely get the basic maths (I'd have a 23.7% chance of recovering from a single drive failure if I used RAIDZ1 assuming 10^-14 URE drives), but there are a whole bunch of assumptions in this calculation...
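For reference, here is a minimal sketch of the arithmetic behind that figure. It assumes every bit read fails independently at the spec-sheet rate of 1 in 10^14 and that drive sizes are decimal terabytes; the function name is made up for illustration. Under those assumptions, 23.7% corresponds to reading 18 TB (all six drives' worth), while reading only the five surviving 3 TB drives (15 TB) comes out at about 30.1%.

```python
import math

def p_no_ure(bytes_read, ure_rate_per_bit=1e-14):
    """Probability of reading `bytes_read` bytes without a single URE,
    assuming independent per-bit failures at `ure_rate_per_bit`."""
    bits = bytes_read * 8
    # (1 - p)^bits, computed through log1p for numerical stability
    return math.exp(bits * math.log1p(-ure_rate_per_bit))

TB = 10**12  # decimal terabyte, as printed on drive labels

print(f"18 TB read: {p_no_ure(18 * TB):.1%}")
print(f"15 TB read: {p_no_ure(15 * TB):.1%}")
```

The independence assumption is the weak point: real read errors tend to cluster on failing drives, which is one of the "whole bunch of assumptions" in question.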

What I can't quite reconcile is how these numbers play out in real life with regular scrubs. I'll be doing those too, but surely the maths suggest that a significant majority of scrubs will detect (and correct) at least one URE.

Is this what actually happens? Do FreeNAS users find that *most* scrubs experience a URE (that needs to be corrected)? And if so, what does this mean? My working assumption was that if a scrub found an error, it was an indication that I had a drive that was on the way out, so I should replace said drive. But does that mean that I'm likely to be replacing a drive every few weeks (depending on my scrub schedule)?

My guess is that most scrubs do not find an error, so I am left wondering about the maths behind the conclusion that RAIDZ1 is dead. (My underlying assumption here is that a scrub and a resilver both require the same intensity of read activity and checksum verification, so the processes are inherently similar.)

Like I say, I plan to "play by the rules", but I'm just not entirely sure about the evils of RAIDZ1. Assuming I need to keep my system below 80% capacity, RAIDZ2 reduces my 16 TB to an effective capacity of 9.6 TB - RAIDZ1 would be nice...
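The capacity trade-off can be sketched roughly as follows, assuming a single vdev of six 3 TB drives, decimal terabytes, and the 80% fill guideline mentioned above. The helper name is hypothetical, and the sketch ignores ZFS metadata and padding overhead, so real figures will be somewhat lower.

```python
def usable_tb(drives, drive_tb, parity, fill_limit=0.8):
    """Rough usable capacity in TB for a single RAIDZ vdev.

    `parity` is 1 for RAIDZ1, 2 for RAIDZ2; `fill_limit` is the
    fraction of the pool intended to be filled.
    """
    data_drives = drives - parity
    return data_drives * drive_tb * fill_limit

print(f"RAIDZ2: {usable_tb(6, 3, parity=2):.1f} TB")
print(f"RAIDZ1: {usable_tb(6, 3, parity=1):.1f} TB")
```

On these assumptions RAIDZ2 gives 9.6 TB and RAIDZ1 gives 12.0 TB, so the second parity drive costs about 2.4 TB of planned capacity.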

Interested in your feedback - what is your experience with scrub error rates? Am I making some wrong assumptions in how scrubs and resilvers work?

SirMaster · Mar 9, 2015

nickt said:
Do FreeNAS users find that *most* scrubs experience a URE (that needs to be corrected)?

I have 28TB of data in my pool and I'm using WD Red disks. I have had many, many scrubs in a row where there is no data that needed to be repaired, so presumably there were no UREs. I've definitely scrubbed much more than 100TB without any data repairing going on.

The spec sheet for my disks states: Non-recoverable read errors per bits read = "< 1 in 10^14". So if I am seeing 1 URE in 100TB scrubbed on average, then that is about 1 in 8 x 10^14 bits (roughly 1 in 10^15), which is still < 1 in 10^14 and still agrees with the spec sheet.

1 in 10^14 is the worst case that the specs say the disk is allowed, but I would assume that most disks are not constantly operating at their worst case as that would be pretty terrible. At least my scrub statistics also seem to agree that my disks are not experiencing anywhere near 1 in 10^14 unrecoverable errors, at least not presently.
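To put a rough number on that, here is a small sketch that asks how likely 100 TB of clean scrubbing would be if the disks really ran at the worst-case rate. It assumes decimal terabytes and independent errors (a Poisson approximation); the variable names are just for illustration.

```python
import math

TB = 10**12        # decimal terabyte
SPEC_RATE = 1e-14  # worst-case UREs per bit read, from the spec sheet

bits_scrubbed = 100 * TB * 8               # 8e14 bits in 100 TB
expected_ures = bits_scrubbed * SPEC_RATE  # errors expected at worst case
p_clean = math.exp(-expected_ures)         # Poisson probability of zero errors

print(f"bits scrubbed: {bits_scrubbed:.1e}")
print(f"expected UREs at spec rate: {expected_ures:.1f}")
print(f"P(no URE in 100 TB at spec rate): {p_clean:.4%}")
```

At the worst-case rate about 8 errors would be expected over 100 TB, and a completely clean run would happen only around 0.03% of the time, which is consistent with the disks performing well inside the spec.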

nickt · Mar 9, 2015

Thanks - that's good feedback...
