3 out of 9 hard drives fail smartctl in less than 3 months -- what's the deal?

Yorick

Wizard
Joined
Nov 4, 2018
Messages
1,912
Ouch. What do the max temp values in SMART tell you? Is this “bad luck”, or do you have high temps in that case?
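
For anyone wanting to check this, here is a minimal Python sketch for pulling the max temp out of attribute 194 (Temperature_Celsius) as printed by `smartctl -A`. It assumes the raw value looks like `43 (Min/Max 20/52)`, which is common but vendor-specific. The function names and the 40°C threshold are my own illustrative choices, not anything smartctl defines.

```python
import re

# Assumed raw-value layout for SMART attribute 194: "43 (Min/Max 20/52)".
# Some drives print only the current temperature, so Min/Max is optional.
_TEMP_RAW = re.compile(r"^(\d+)(?:\s*\(Min/Max\s+(\d+)/(\d+)\))?")


def parse_temperature(raw):
    """Return (current, lifetime_min, lifetime_max) in degrees C.

    lifetime_min and lifetime_max are None when the drive does not report them.
    """
    match = _TEMP_RAW.match(raw.strip())
    if match is None:
        raise ValueError(f"unrecognized temperature value: {raw!r}")
    current, low, high = match.groups()
    return (
        int(current),
        int(low) if low is not None else None,
        int(high) if high is not None else None,
    )


def runs_hot(raw, limit_c=40):
    """True if the highest reported temperature reaches limit_c (hypothetical threshold)."""
    current, _low, high = parse_temperature(raw)
    peak = high if high is not None else current
    return peak >= limit_c
```

Paste the raw value column for attribute 194 into `parse_temperature` for each drive and compare the lifetime max across the pool. A drive that is cool now may still have cooked at some point in the past.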

Your experience reaffirms my idea of continuing to buy WD as long as they remain “shuckable” (8TB and above) - I may as well get enterprise drives at a great price point if I expect failures at the 4 and 5 year mark. I’m only 1.5 years “in” with my shucked HGST He8, so I’ve got a ways to go.
 

Constantin

Vampire Pig
Joined
May 19, 2017
Messages
1,828
That failure rate is well above normal. I’d start considering other causes. For example, are you sure that the power supply is good? Have you played with spin down or similar variables causing excessive wear? Etc.
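
To put a rough number on it, an annualized failure rate is just failures divided by drive-years of exposure. The sketch below assumes all 9 drives ran for the whole window and treats "less than 3 months" as 90 days. Both are simplifications, and the function name is only illustrative.

```python
def annualized_failure_rate(failures, drives, days):
    """Failures per drive-year, assuming every drive ran for the full window."""
    drive_years = drives * days / 365.0
    return failures / drive_years


# 3 failures among 9 drives over an assumed 90-day window
rate = annualized_failure_rate(failures=3, drives=9, days=90)
print(f"{rate:.2f} failures per drive-year")
```

Under those assumptions this works out to about 1.35 failures per drive-year. It is a rate rather than a probability, which is why it can exceed 1.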

43°C is high in my book, but OEMs now specify allowable drive temperatures as high as 60-70°C IIRC. That likely leads to premature drive failure, but from their perspective that’s OK as long as it happens outside the warranty period. These temps are easily reached in cheapie external cases without active cooling during a big backup.

I doubt WD will tell you anything but to pound sand. However, your failure rates are so egregious that they may feel some mercy, so I’d contact them and see if they can do something even though your drives are out of warranty.

FWIW, this is why I don’t take the Backblaze data as gospel. Their drives run at 17-33°C, they are automatically replaced after four years, etc., which is likely a different use case than what we have spinning at home. Many of us run our pool hardware a lot longer than 4 years!

Speaking of hardware, consider getting a better case. My Lian Li A75 keeps the drives at 5°C above ambient. It can hold up to 12 drives, and I simply staggered mine to leave an empty slot for every two drives. Three 120mm fans do the work, and a dedicated fan controller with a feedback thermostat controls their speed.

My NAS drives are helium HGSTs that I bought used; most were 2 years old. One drive was DOA out of the box and another failed two years in. Both elicited a full refund from the seller (3-year warranty). At this point, the remaining drives are 4-5 years old. No issues since. I have a cold spare or two handy.

One benefit of buying used is that you’re unlikely to get drives from the same batch, so failures should be more random.

Lastly, I do not understand the benefit of a hot spare in a home setting. Buy the drive, qualify it (bad blocks, etc.), pull it, set it aside. Why incur wear and tear plus power consumption when you don’t need to? I get using hot spares at a data center or remote backup where physical access is limited, but in a home setting it makes no sense.
 

Scharbag

Guru
Joined
Feb 1, 2012
Messages
620
I know it seems odd, but have you checked your cables? I had a bunch of 3TB drives report SMART problems in a short period and then I found that I had a bad SAS cable. No idea how a SAS cable can go bad, but mine did. So, I changed the SAS cable and things got a lot better in my system.

Cheers,
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,600
This is one reason why half my drives are WD Reds and the other half are WD Red Pros. I don't get the speed improvement since it's a mixed pool. And yes, the Pros run hotter. But hopefully I will get mixed failure rates.
 

revengineer

Contributor
Joined
Oct 27, 2019
Messages
193
To add my two cents, speaking from my own experience over the last 20 years: failures of WD drives after 4-5 years are not uncommon. I have had WD Reds fail after as little as 3 years of light use. I am new here, coming from the Windows world where I used pools created with the Drivebender software. The advantage was that I could buy drives and grow the pool over time to avoid simultaneous failures. If all the drives were bought at the same time and are from the same lot, that may explain several drives failing in a similar time frame. As I transition to FreeNAS, I have been buying drives two at a time with 4 weeks in between so as not to put all my eggs in one basket. I have also switched from WD to Seagate to avoid SMR issues. Not sure if any of this makes a real difference or just makes me feel better. I will let you know in 4-5 years. :)
 

pschatz100

Guru
Joined
Mar 30, 2014
Messages
1,184
Looking for the simplest possible issues first, I would power down the system and double check all the cable connections - both data cables and power cables. A few years ago, I actually had a Molex-to-SATA power adapter fail. I don't know why it failed, but it was very difficult to diagnose because the problem was intermittent.

Something else you might do is change which disks are plugged into which SATA ports. Keep notes on this. If the problem stays with the disk, that points at the disk. If the problem stays with the SATA port, that points at the port, cable, or controller instead.
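
If you keep those notes in a structured form, sorting them out can be mostly mechanical. Here is a rough Python sketch, assuming each note is a hypothetical `(disk, port, had_errors)` tuple that you record by hand after each swap. It flags a disk that showed errors on at least two different ports, or a port that showed errors with at least two different disks.

```python
from collections import defaultdict


def suspects(observations):
    """Group swap-test notes and report what the errors follow.

    observations: iterable of (disk, port, had_errors) tuples (hypothetical format).
    Returns {"disks": [...], "ports": [...]} listing items that showed errors
    every time, across at least two different partners.
    """
    by_disk = defaultdict(list)
    by_port = defaultdict(list)
    for disk, port, had_errors in observations:
        by_disk[disk].append((port, had_errors))
        by_port[port].append((disk, had_errors))

    def always_bad(records):
        # Require two distinct partners so a single bad pairing blames nobody.
        partners = {partner for partner, _ in records}
        return len(partners) >= 2 and all(bad for _, bad in records)

    return {
        "disks": sorted(d for d, recs in by_disk.items() if always_bad(recs)),
        "ports": sorted(p for p, recs in by_port.items() if always_bad(recs)),
    }


# Example notes: disk-A shows errors on both ports, disk-B on neither.
notes = [
    ("disk-A", "port-0", True),
    ("disk-B", "port-1", False),
    ("disk-A", "port-1", True),
    ("disk-B", "port-0", False),
]
print(suspects(notes))
```

With an intermittent fault like the power adapter above, a single clean run proves little, so it is worth letting each arrangement run for a while before writing down the result.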
 