Backblaze Hard Drive Reliability Report

Status
Not open for further replies.

cmfisher4

Explorer
Joined
Oct 8, 2013
Messages
51
Yes, very relevant. Especially considering that with a new build, all of your hard drives will hit their peak failure rates at the same time. Depending on the size of your array, it looks like you need to plan for a failure after 3 years, maybe even to the point of preventive measures (like some Navy PMS... if it ain't broke, still fix it). And if two failures line up during resilvering... maybe even RAIDZ2 is in trouble?
Also interesting to see that a one-year warranty may be good enough, because if a drive has made it that far, give or take, it will hang on until after 3 years... making the 3-year warranty not as much of a selling point.

Chris
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
... (like some Navy PMS...if it ain't broke, still fix it) ...

He's a fraud! If he were a Navy Nuke he'd know that the real phrase is "if it ain't broke, fix it till it is".
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Anyway, if you read that Google paper and chart out all of their data, you'll see failure rates increase in three steps: one at 0-1 years, one at 1-3 years, and one at 3+ years.

Not surprisingly, the hard drive manufacturers figured this out years ago (circa 2009?). If you remember, warranties used to be 5 years, then suddenly they were 3 years. Now they are 1 year, with the "premium" drives being 3 years. Sound familiar at all? They knew exactly what they were doing. It's extra profit margin for them to convince you to buy those "premium" drives, since you have only a small chance of a failure between the 1- and 3-year marks. They pocket the money and say "thanks, sucker..." and you get peace of mind that you can RMA the drive if need be. I pointed this out last year sometime when there was a long discussion over the Google white paper on failure trends. I guess nobody listened to me then. It's easy to extrapolate that data if you check out the Google paper...

The only thing I really disagree with is the comment that 24/7 uptime will lead to higher failure rates. Personal experience (as well as that observed by others) has shown that leaving drives up 24/7 seems to extend their life. Not sure why that's the case, or if there are situations where 24/7 uptime is "bad". I've had 3 drives out of 24 fail, all around the 2.9- to 3.7-year mark: 1 disk just up and died (at about the 2.9-year mark) and 2 that I allowed to get too hot (whoops). Of the 2 that got too hot, one was just barely in warranty (the 2.9-year mark) and one was out of warranty (3.4-ish).

Quite frankly, there's nothing about that report that's overly shocking to me.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Yes, but validation of what Google found is helpful. For any number of reasons, it is possible that their findings are not representative. Two relatively independent studies of large drive populations are interesting.
 

cmfisher4

Explorer
Joined
Oct 8, 2013
Messages
51
No, not a fraud. Been a nuke probably a lot longer than you think.
Anyway, yes, I agree that the warranties track reality (or at least this study). Seems like a lot of corporate knowledge being closely guarded. Not that you can blame them, of course.
As for the 24/7 thing, it is probably one of the oldest arguments I remember: whether to let your drives run forever or spin them down. It started around the time that power management really became mainstream. Is it better to "waste" the energy and let the drive spin continuously, or to let it spin down to save some money, at the cost of more wear and tear on the drive? Not sure this one will ever be truly answered until large-scale studies on thousands of drives are done.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
No, not a fraud. Been a nuke probably a lot longer than you think.

I was just teasing you. When I was an EM1 back in a previous life, it seemed like every time we tried to do PMs on something, we'd take it apart and figure out something was broken (not our fault) that prevented reassembly. But when we ran drills like crazy and didn't do the PMs, the equipment never broke (except for the MMs... their crap constantly broke during drills).

Anyway, yes, I agree that the warranties track reality (or at least this study). Seems like a lot of corporate knowledge being closely guarded. Not that you can blame them, of course.
As for the 24/7 thing, it is probably one of the oldest arguments I remember: whether to let your drives run forever or spin them down. It started around the time that power management really became mainstream. Is it better to "waste" the energy and let the drive spin continuously, or to let it spin down to save some money, at the cost of more wear and tear on the drive? Not sure this one will ever be truly answered until large-scale studies on thousands of drives are done.

I just look at the warranty thing for hard drives the same as for those 3rd party laptop warranties and stuff you can buy. They're in it for profit only. They'll make sure they have the upper hand no matter what.
 

Johhhn

Explorer
Joined
Oct 29, 2013
Messages
79
I find it rather amusing that Backblaze uses consumer drives vs. enterprise drives. You'd think they would want to save money (drives last longer, less labor to replace, etc.) and be more green.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Enterprise drives might be half as likely to fail but also cost twice as much... the ST4000NM0033 is running $350 right now; I got my ST4000DM000s for $149.

So are two mirrored drives more reliable than one single enterprise drive?

Put differently, I made a RAIDZ3 out of 11 drives, plus a hot spare, plus a cold spare. 13 drives cost less than $2000. 11 enterprise-class drives would be $3850.
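The dollar figures behind that comparison work out like this; a quick sketch using only the street prices quoted above:

```python
# Build-cost comparison from the post: $149 consumer (ST4000DM000)
# vs $350 enterprise (ST4000NM0033) per 4TB drive.
consumer_build = 13 * 149    # 11-wide RAIDZ3 + hot spare + cold spare
enterprise_build = 11 * 350  # the same 11-drive vdev, enterprise class

print(consumer_build)    # 1937 -> "13 drives cost less than $2000"
print(enterprise_build)  # 3850
```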
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Enterprise drives might be half as likely to fail but also cost twice as much... the ST4000NM0033 is running $350 right now; I got my ST4000DM000s for $149.

So are two mirrored drives more reliable than one single enterprise drive?

Put differently, I made a RAIDZ3 out of 11 drives, plus a hot spare, plus a cold spare. 13 drives cost less than $2000. 11 enterprise-class drives would be $3850.

Precisely. Enterprise-class drives have only one advantage (in my opinion), and that's TLER. With TLER, a drive starts failing and the performance of the RAID drops, but it probably stays fairly acceptable. Without it, in ZFS, every time I had a disk start failing I was happy to get 7MB/sec because of all the time spent retrying.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
7200RPM could be an advantage for heavy loads. TLER is available on the NAS-class drives, though.
 

Johhhn

Explorer
Joined
Oct 29, 2013
Messages
79
Enterprise drives might be half as likely to fail but also cost twice as much... the ST4000NM0033 is running $350 right now; I got my ST4000DM000s for $149.

So are two mirrored drives more reliable than one single enterprise drive?

Put differently, I made a RAIDZ3 out of 11 drives, plus a hot spare, plus a cold spare. 13 drives cost less than $2000. 11 enterprise-class drives would be $3850.

I replied in the context of the original article: Backblaze. Naturally, in a consumer environment, it's much different.

Back to Backblaze (HA!). If an enterprise drive lasts twice as long and costs twice as much, then sure, why isn't it worth it? Most likely they wouldn't see the first-year failures, either.

In addition, Backblaze doesn't care too much about the reliability of drives, just about saving money (as witnessed by their 'shucking' of drives from externals).
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
You do realize that "lasts twice as long" is a logical failure, yes? The information available is annual failure rates.

So if you have 12 consumer drives with an 80% survival rate over three years, you are seeing maybe three of them fail.

With 12 enterprise drives and a 90% survival rate, that could still be two failed drives.

But the thing is, I could take the $2000 savings, hold it for three years, and buy new larger consumer drives at that time ... just at the point where the enterprise class drives would have begun failing at greater rates.
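The survival-rate comparison above can be sketched out; note the 80%/90% three-year rates are the post's own illustrative numbers, not figures from the Backblaze or Google data:

```python
from math import comb

# Expected failures out of 12 drives over three years, given the
# post's hypothetical survival rates (80% consumer, 90% enterprise).
drives = 12
consumer_failures = drives * (1 - 0.80)    # ~2.4 -> "maybe three of them fail"
enterprise_failures = drives * (1 - 0.90)  # ~1.2 -> "could still be two"

# Chance that at least two of the 12 enterprise drives fail anyway
# (binomial, p = 0.10 per drive over the period):
p = 0.10
p_at_least_two = 1 - sum(
    comb(drives, k) * p**k * (1 - p) ** (drives - k) for k in (0, 1)
)  # roughly 0.34, so two enterprise failures is far from unlikely
```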
 

Johhhn

Explorer
Joined
Oct 29, 2013
Messages
79
Sure, it's a logical failure, but so is your follow-up statement of a 90% survival rate (regarding the information available in the article). :)

From personal experience (and when I say personal, I am talking about MY own drives and many of my clients' as well), enterprise drives DO last longer AND have much less frequent 'early' deaths. Is this worth it in an environment that needs to rely on its server? Absolutely. Is it a big deal to Joe Schmoe (no offense!) who is merely using it as a backup machine? Probably not.

But in the case of 25,000 drives, which is what I was talking about originally, I think it's insane.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
For reliability, I'll take two slightly less reliable devices operating redundantly any day over a single much more expensive device. The pair of less expensive devices has more 9's of reliability.

For storage, this rule generalizes out very nicely with ZFS and its capabilities.
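The "more 9's" arithmetic can be sketched with made-up annual failure rates (the 2% and 4% below are illustrative assumptions, not numbers from this thread or the Backblaze report):

```python
# Say an enterprise drive fails 2% of the time in a year and a
# consumer drive 4%. A mirror of two consumer drives loses data only
# if BOTH fail (ignoring rebuild windows and correlated failures).
enterprise_ok = 1 - 0.02       # single enterprise drive: 0.98
mirror_ok = 1 - 0.04 * 0.04    # consumer mirror: 0.9984

assert mirror_ok > enterprise_ok  # the cheaper redundant pair wins
```

The caveat in the comment matters: the independence assumption breaks down if both mirror members age and fail together, which is exactly why burn-in and spares come up later in the thread.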

If early failures impact you, I have to wonder where your burn-in and qualification processes are going awry. But then again I guess we're an unusual shop because we actually have such processes...
 

Johhhn

Explorer
Joined
Oct 29, 2013
Messages
79
1- I agree; I would (and do) do the same. But I'm referring to the Backblaze setup and other enterprise setups.

2- Early failure for some of my clients can be very bad-- and quite a few will make your life very difficult if you don't convince them to buy the more reliable solution and it then fails early. These clients don't like it when you tell them "I told you so". ;)

For reliability, I'll take two slightly less reliable devices operating redundantly any day over a single much more expensive device. The pair of less expensive devices has more 9's of reliability.

For storage, this rule generalizes out very nicely with ZFS and its capabilities.

If early failures impact you, I have to wonder where your burn-in and qualification processes are going awry. But then again I guess we're an unusual shop because we actually have such processes...
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Yes, that's great, but you seem to be failing to grasp that for the same budget, the consumer drives are more reliable ... because you can buy twice as many, which opens up lots of options for increased levels of redundancy.

Backblaze and many other operations are only feasible because they are able to control costs. There is a REASON Backblaze designed their Pod rather than just using the commercially available chassis offerings. It is the same reason they use consumer drives. Done properly, it works reasonably well for them.

I am not particularly afraid of losing our 30TB filer: 11 4TB drives in RAIDZ3, a hot spare so staff can kick off a replace/resilver even when not on site, and a cold spare that has been burned in for 1200+ hours as part of the array. Short SMART tests every four hours. Long tests three times a week. More than 2500 hours of qualification and burn-in. Periodic scrubs.

All of the big boys ASSUME storage components WILL fail. The trick is to leverage that intelligently, using less expensive hardware to create a solution that is MORE reliable at a similar or (better yet!) lower cost.
 

cmfisher4

Explorer
Joined
Oct 8, 2013
Messages
51
So, this discussion has made me rethink my decision (not yet acted upon) to purchase WD Reds. With the increasing popularity of consumer NAS devices (or appliances), are WD and Seagate trying to fill a market niche that really doesn't need filling (just making people think it does)? Good on them if they are successful (and it seems they are). And while the price difference isn't huge between a 3TB WD Red and a 3TB WD Green (~$14 on Amazon right now), it will add up for a large array, of course. Hell, if I'm looking at an 8-drive array, that savings is a stick of 8GB ECC RAM to support the array, or another drive to have on the shelf for when one of them goes.
I know cyberjock uses WD Greens and that he modifies their idle/head-parking timer using wdidle (something like that, would have to find those posts again), and they have been very successful for him (if I read the tone of his posts right). Based on this, do you guys think consumer drives are a just-as-good solution for someone like me who is just running a basic home file server (using email reports and periodic scrubs and all of the preventive stuff)?
And, cyberjock, I have always had a soft spot in my heart for electricians. ETs, we worked in port because if our stuff broke, you pulled in. MMs worked in port because, well, steam is rather harmful. You guys, you worked all the frigging time. They have zero-maintenance battery cells now, though, like your car battery. No more humping gravities. The guys love it and their t-shirts don't get a million little holes in them.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
I generally doubt that there is a physical difference between "red" and "green" drives, despite some claims to the contrary. It seems likely it is purely a matter of firmware (idle/green modes, TLER, etc.), warranty, and of course marketing.

For a home user, where loss of responsiveness is not a critical issue, TLER is not meaningful. Assuming you do not mind messing with wdidle for WD Greens, the substantial remaining factor is warranty.

But if you are looking to "save" money, also consider 4TB drives. Despite the cost premium, if you add up the cost of your NAS and drives and then divide by the usable terabytes (cost per delivered TB), you will usually find that 4TB drives are substantially cheaper on that basis. They also use less power per TB. Of course, this assumes you can use the extra space or adjust your pool design to reduce the number of drives used...
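The cost-per-delivered-TB math looks like this; all drive prices and the $600 base system cost below are made-up illustration values, not quotes:

```python
# Hypothetical comparison: 3TB vs 4TB drives in an 8-wide RAIDZ2 pool.
def cost_per_usable_tb(drive_price, drive_tb, n_drives, parity, base_cost):
    usable_tb = (n_drives - parity) * drive_tb
    return (base_cost + n_drives * drive_price) / usable_tb

three_tb = cost_per_usable_tb(120, 3, 8, 2, 600)  # $1560 / 18 TB ~ $86.7/TB
four_tb = cost_per_usable_tb(180, 4, 8, 2, 600)   # $2040 / 24 TB = $85.0/TB
```

Even with the pricier drives, the 4TB build delivers each usable terabyte for less, because the fixed system cost and the two parity drives are amortized over more capacity.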
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Me personally, at only $14 more each I'd probably go with the Reds just because I could. I realize that adds up to an 8GB stick of RAM, but you could always add RAM later if you really wanted to. Once you buy the drives, you can't change them or upgrade them without significant expense. If you bought the Greens and for some reason they've been changed so they aren't as friendly as I'm used to, I'd be really sad to have to drop more cash to get the "right" disks. I had that issue with over $2000 worth of disks in 2009, so I'm a little wary of choosing "the wrong disk".

To be honest, wdidle was last updated in 2010. WD made a statement years ago that wdidle is no longer supported and may stop working with future firmware versions at any time. I'd hate to end up with a bunch of disks that aren't compatible because WD changed their firmware and the tool no longer works. Now that it's fairly obvious that the Green drives cannibalize sales from the Reds, if WD gets enough incentive to deliberately break wdidle, I won't be surprised. Doing it in 2010 or 2011 might have pushed people to a competitor's product. But now that WD sells a product aimed at people like us, they can easily recommend a marginally more expensive product that is "designed for just your use".
 