How to minimize the risk of multiple disk failure


ChrisHolzer

Dabbler
Joined
Apr 6, 2017
Messages
23
Hello again!

I am quickly approaching the point where I will order the hardware for my FreeNAS. :)

I want to go with 10TB NAS drives; however, one thing I read frequently is that using the same model/manufacturer for a RAID increases the risk of multiple drives failing at the same time, especially during the rebuild process, which puts extra stress on the remaining (old) disks.

So I was wondering what you guys think about this. Do any of you use drives from different manufacturers to minimize the risk of having multiple drives fail? If so, what are the downsides of doing that?

Thanks again for the help I already got here in the forums! :)
 

Dice

Wizard
Joined
Dec 11, 2015
Messages
1,410
There are very few users who actively buy different drives for this reason, although you are correct that there might be some benefits.
However, following a proper burn-in procedure for the box and drives will ensure that most early drive deaths are caught before you commit data.

The standard reply is: Get quality drives, burn in, you're fine.
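For illustration, a burn-in along these lines might look something like the sketch below (device names like ada0 are placeholders for your actual disks, and badblocks destroys everything on the drive, so only run it before the pool exists):

    smartctl -t long /dev/ada0        # baseline long SMART self-test
    badblocks -b 4096 -ws /dev/ada0   # destructive write/read pass over the whole disk; -b 4096 is needed on large drives
    smartctl -t long /dev/ada0        # second long test after the stress
    smartctl -a /dev/ada0             # compare attributes such as Reallocated_Sector_Ct and Current_Pending_Sector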
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
Get quality drives, burn in, enable regular scrubs, run regular SMART tests, keep drives cool, you're fine.
Fixed that for you. And use at least RAIDZ2.
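In FreeNAS the scrubs and SMART tests are scheduled in the GUI, but roughly speaking the equivalent manual commands look like this (the pool name "tank" and device ada0 are just examples):

    zpool scrub tank                  # start a scrub of the pool
    zpool status -v tank              # check scrub progress and any errors it found
    smartctl -t short /dev/ada0       # short self-test; schedule long tests less frequently
    smartctl -l selftest /dev/ada0    # review the self-test log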
 

rs225

Guru
Joined
Jun 28, 2014
Messages
878
I always recommend spending less time worrying about disk failures, and more time developing your backup plan. As motivation, I recommend RAIDZ1. RAIDZ2 is for a system you may not be able to service for a month. RAIDZ3 is for a system you may not service for a year. (These are examples, not hard numbers.)

The fear of multiple disk failure is a hardware RAID thing. If you have scheduled scrubs and SMART tests, you're covered. A scrub is just as much work as a resilver. Hardware RAID would fail during a rebuild because the array was never scrubbed, so additional problems would be discovered during the rebuild.

As we have seen recently, your data can be lost in a second by screwed-up encryption or by accidentally deleting a dataset. Multiple disk failures should be the least of your worries. Make sure you are notified as soon as a problem is detected.
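FreeNAS can email you alerts itself; as an illustration of the same idea outside FreeNAS, a smartd.conf entry (the address is a placeholder) would be:

    DEVICESCAN -a -m admin@example.com   # monitor all disks, email when SMART reports a problem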
 

nojohnny101

Wizard
Joined
Dec 3, 2015
Messages
1,478
I do not purchase drives from various manufacturers. The advice already stated above is sound. Backups and regular (automated) maintenance are as good a security measure as any.
 

Dice

Wizard
Joined
Dec 11, 2015
Messages
1,410
Thanks for the helpful feedback guys!

What do you think about the http://www.seagate.com/gb/en/internal-hard-drives/hdd/ironwolf/ (8TB Model) for a home NAS?
I'm intrigued by the Ironwolf series for sure, particularly since they are currently a lot cheaper than WD Reds in my area.
The potential downside, IMO, is the different formatting and reporting of SMART status. Seagate uses a different pattern than the WD drives I'm used to, and I find the Seagate SMART data more difficult to read. It might be a minor detail accentuated by habit, but nonetheless one that influences my decision.
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
@ChrisHolzer
I personally ended up using 2 x 4TB WD Reds and 2 x 4TB WD Red Pros for my 4-disk RAID-Z2 pool, with each disk bought separately (like one Red in retail packaging, one Red bought in bulk packaging).

If I were to do it over (even though none of the 4 disks has failed or given any indication of failing any time soon), I would have bought 2 from a different manufacturer.

Please note that in my case, I was looking for 5 years of good, solid, reliable use from my NAS. My prior NAS lasted 7 years with only a memory upgrade and 2 additional disks.

Last, one thing can help during disk replacement: if you plan on having an extra SATA/SAS disk slot (internal or external), you can perform a ZFS disk replacement while the failing disk is still present. Basically, ZFS creates a mirror of the failing disk; for any bad blocks encountered, ZFS uses whatever redundancy is available (parity or mirror). When done, ZFS detaches the failing disk from the vdev. That means less impact on the overall pool, only on the failing disk, which we obviously care less about.
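As a sketch (pool and device names are placeholders, with ada3 the failing disk and ada7 the new disk in the spare slot), the whole operation is one command:

    zpool replace tank ada3 ada7      # mirrors the failing disk onto the new one
    zpool status tank                 # shows a temporary 'replacing' vdev until the resilver completes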

Plus, the free disk slot can be used for backups, especially if it's hot swap, or cold swap in a tray.
...
The potential downside, IMO, is the different formatting
...
Did you mean the Seagate 8TB SMR Archive disks in the above comment?

As far as I know, the Seagate Ironwolf drives do not use SMR technology.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
If you plan on having an extra SATA/SAS disk slot (internal or external), you can perform a ZFS disk replacement while the failing disk is still present.
It only took nine months to get this into the docs, but it's there (kind of). But see Chris Moore (not Kris Moore)'s comment on the bug, noting that the resilver seems to take much longer this way.
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
It only took nine months to get this into the docs, but it's there (kind of). But see Chris Moore (not Kris Moore)'s comment on the bug, noting that the resilver seems to take much longer this way.
Yes, performance using "replace with the existing disk still present" can be slower (because the source is a single disk instead of the rest of the vdev's disks).

It all depends on what you want to achieve. For example, "replace with the existing disk still present" can maintain additional redundancy for the data, except when the failing disk keeps finding new errors and is taking too long. Then it's time to just get it done and ignore the failing disk.
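At that point, something like the following (names again placeholders) takes the failing disk out of the picture and lets the resilver finish from the vdev's remaining redundancy:

    zpool offline tank ada3           # stop using the failing disk; the resilver continues from parity/mirror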
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
Yes, performance using "replace with the existing disk still present" can be slower (because the source is a single disk
I don't think this (the claim that the source is a single disk) is correct. I've done this type of disk replacement a few times, and it looked like the system was hitting the whole vdev quite a bit. It's entirely possible, of course, that there was other stuff going on, but even though the outcome is a replacement disk that's identical to the replaced disk (as of the moment the latter is taken offline), I don't think that's how the process works internally. I'm speaking from a user's perspective, of course, having never looked at the code (and it's unlikely I'd be capable of understanding it anyway).
 

Dice

Wizard
Joined
Dec 11, 2015
Messages
1,410
Did you mean the Seagate 8TB SMR Archive disks in the above comment?

As far as I know, the Seagate Ironwolf do not use SMR technology.
Correct, the Ironwolf drives do not use SMR.
I was referring to the S.M.A.R.T. raw value output formatting.
 