Preventative disk replacement?

Status
Not open for further replies.

MDKAOD

Dabbler
Joined
Mar 11, 2014
Messages
37
Hey all. When I set up my box 18 months ago, the only option I had for the quantity of disks I needed was to order from a single vendor, so I'm currently running 15 disks from the same lot. Over the last couple of months I've been acquiring new disks from different vendors with staggered lot numbers, and I would like to replace the working disks with these tested, staggered-lot disks.

The manual covers replacing failed disks, and I just want to confirm that the procedure is the same for replacing perfectly fine, working disks.

(RaidZ2 configuration) From Volume Status, select disk to be replaced > Offline > Insert new disk > Click Replace disk > Select new disk

Correct? Or is there a better way? Can I resilver to another disk in the chassis without physically removing the working disk until the resilver is complete?
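For anyone curious what the GUI steps correspond to at the command line, the sketch below maps the "Offline, then Replace" sequence onto the standard `zpool offline`, `zpool replace` and `zpool status` subcommands. It only builds and prints the commands rather than running them; the pool name `tank` and the device names `ada3`/`ada15` are hypothetical. The FreeNAS GUI also partitions the new disk before handing it to ZFS, so on FreeNAS the GUI remains the safer route; treat this as an illustration, not a substitute.

```python
from __future__ import annotations

import shlex


def offline_replace_plan(pool: str, old_dev: str, new_dev: str) -> list[list[str]]:
    """Command lines for the 'offline, swap, replace' path (nothing is executed)."""
    return [
        ["zpool", "offline", pool, old_dev],           # stop using the old disk
        # (physically install the new disk at this point)
        ["zpool", "replace", pool, old_dev, new_dev],  # resilver onto the new disk
        ["zpool", "status", pool],                     # check resilver progress
    ]


if __name__ == "__main__":
    # Hypothetical pool and device names, for illustration only.
    for cmd in offline_replace_plan("tank", "ada3", "ada15"):
        print(shlex.join(cmd))
```

If the new disk shows up under the same device name as the old one (same slot), `zpool replace` can be given only the old device, e.g. `zpool replace tank ada3`.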
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
Exactly the same process. If you have a spare SATA port and a place for another disk, you can install the new disk first, do the replacement, and when it finishes, remove the old one. That way there's no reduction in redundancy while the new disk is resilvering.
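The redundancy point can be made concrete. A minimal sketch, assuming a single RAIDZ vdev and that "offline first" means the healthy old disk stops contributing redundancy for the whole resilver: it returns how many further disk failures the vdev can absorb while the resilver runs. The function name `resilver_margin` is hypothetical.

```python
def resilver_margin(parity: int, offline_first: bool) -> int:
    """Further disk failures a RAIDZ vdev can absorb while a replacement resilvers.

    parity: 1, 2 or 3 for RAIDZ1/2/3.
    offline_first: True if the healthy old disk is offlined/pulled before resilvering.
    """
    if parity not in (1, 2, 3):
        raise ValueError("RAIDZ parity must be 1, 2 or 3")
    # Offlining a healthy disk spends one level of parity for the whole resilver;
    # replacing in place keeps the old disk online, so the full parity remains.
    return parity - 1 if offline_first else parity


if __name__ == "__main__":
    print("RAIDZ2, offline first:", resilver_margin(2, offline_first=True))
    print("RAIDZ2, replace in place:", resilver_margin(2, offline_first=False))
```

For RAIDZ2 that is a margin of one extra failure when offlining first versus two when the new disk resilvers while the old one is still attached.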

But that said, it seems needlessly expensive. Implement a comprehensive SMART testing and monitoring protocol and replace disks as needed.
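As a starting point for the SMART monitoring Dan suggests, here is a minimal sketch that scans the attribute table printed by `smartctl -A /dev/<disk>` for a few attributes often treated as early warnings. The watch list is an assumption rather than an official threshold set, and the sample table is made up for illustration. FreeNAS can also schedule SMART tests and email alerts from the GUI, which covers most of this without scripting.

```python
from __future__ import annotations

# Attributes commonly watched as early warnings; this list is an assumption.
WATCHED = {"Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable"}


def smart_warnings(attr_table: str) -> dict[str, int]:
    """Return watched attributes whose raw value is non-zero.

    attr_table is the text printed by `smartctl -A /dev/<disk>` for an ATA disk.
    """
    warnings: dict[str, int] = {}
    for line in attr_table.splitlines():
        parts = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # RAW_VALUE is the 10th column (extra text after it is ignored).
        if len(parts) < 10 or not parts[0].isdigit():
            continue
        name, raw = parts[1], parts[9]
        if name in WATCHED and raw.isdigit() and int(raw) > 0:
            warnings[name] = int(raw)
    return warnings


# Made-up sample in the usual smartctl -A layout.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       8
194 Temperature_Celsius     0x0022   112   100   000    Old_age   Always       -       38 (Min/Max 20/45)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
"""

if __name__ == "__main__":
    print(smart_warnings(SAMPLE))  # {'Reallocated_Sector_Ct': 8}
```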
 

MDKAOD

Dabbler
Joined
Mar 11, 2014
Messages
37
Thanks Dan. This is purely a security blanket. We'll repurpose the current disks, but we host so much internal data (we're a print house) that a true backup solution is even more ludicrously expensive, so we're more interested in making sure the RAID array is safe. None of the data is mission-critical; it's more of a value-added service for our clientele, so if the machine melts in a fire, we have bigger issues to worry about than losing a bunch of ripped/raw pre-processed print data.

Furthering your point about the spare SATA port (which I have): can I resilver to the new disk in the spare port, remove the old disk, then move the new disk to the old port? What are the ramifications of doing so?
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Honestly, I don't subscribe to the idea that multiple lots are bad.

Take a hypothetical server... two RAIDZ2 vdevs of 10 disks each.

You are diligent, so you buy 5 disks from each of 4 different vendors so as to avoid this "firmware is sucky" problem.

So you build that shiny pool and copy all your data to the zpool. You sleep well at night, right?

Ok, so now you have 4 possibilities of failure. All you need is 3 disks in any vdev to fail and you are done. So if any lot turns out to be a bad lot, you are screwed. You just quadrupled the chances of *having* a bad lot.
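The arithmetic behind "quadrupled" can be sketched under two stated assumptions: each lot is independently bad with some small probability `p_bad` (the value below is made up), and a bad lot means every disk from it fails. With 5 disks from one lot spread over two vdevs, at least 3 of them must land in the same vdev, so in that worst case any single bad lot is enough to kill a RAIDZ2 pool. The chance of drawing at least one bad lot then grows roughly in proportion to the number of lots while `p_bad` is small.

```python
def p_any_bad_lot(p_bad: float, lots: int) -> float:
    """Probability that at least one of `lots` independent lots is bad."""
    return 1.0 - (1.0 - p_bad) ** lots


if __name__ == "__main__":
    p_bad = 0.01  # hypothetical per-lot probability, for illustration only
    for lots in (1, 4, 10):
        print(lots, round(p_any_bad_lot(p_bad, lots), 4))
```

With `p_bad = 0.01`, four lots give 0.03940399, close to 4 × 0.01.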

What if you bought all 20 disks from Newegg? You've basically got a zpool that is going to be either rock solid or a disaster. There really won't be any in-between.

I don't know about you, but I haven't heard anything about bad lots in a *very* long time. So I tend to think that those kinds of problems aren't something that you really need to mitigate.

Now, if you went with the same layout, but bought 2 disks from 10 different vendors, and then made sure that each vdev had no 2 disks from the same vendor, then I'd say you just might be better off. I say might because now you've made the chances of a "bad lot" go up tenfold! But nobody buys 2 disks from 10 vendors. That's just silly!
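The "might" can be checked with a small model: given which lot each disk came from, does every RAIDZ2 vdev still have at most 2 failed disks if every disk from the bad lots dies? The lot labels and the exact split across vdevs below are hypothetical; they follow the two scenarios in the post (5 disks from each of 4 vendors versus 2 disks from each of 10 vendors, one per vdev).

```python
from __future__ import annotations


def pool_survives(vdevs: list[list[str]], bad_lots: set[str], parity: int = 2) -> bool:
    """True if no vdev loses more than `parity` disks when every disk from bad_lots fails.

    Each vdev is a list of lot labels, one entry per disk.
    """
    return all(sum(lot in bad_lots for lot in vdev) <= parity for vdev in vdevs)


# Four vendors, 5 disks each: every lot is split 3/2 or 2/3 across the two vdevs.
FOUR_LOTS = [list("AAABBBCCDD"), list("AABBCCCDDD")]
# Ten vendors, 2 disks each: each vdev gets exactly one disk per vendor.
TEN_LOTS = [list("ABCDEFGHIJ"), list("ABCDEFGHIJ")]

if __name__ == "__main__":
    print(any(pool_survives(FOUR_LOTS, {lot}) for lot in "ABCD"))  # False
    print(pool_survives(TEN_LOTS, {"A", "B"}))                      # True
    print(pool_survives(TEN_LOTS, {"A", "B", "C"}))                 # False
```

In this model the 10-vendor layout survives any two bad lots, while in the 4-vendor layout a single bad lot is fatal, which is the trade-off behind "might".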

I realize that there are people that have had first-hand experience that says otherwise. But the past isn't necessarily the present, and there is still the need to be smart and not actually increase your risk of zpool loss. :P
 

MDKAOD

Dabbler
Joined
Mar 11, 2014
Messages
37
> Ok, so now you have 4 possibilities of failure. All you need is 3 disks in any vdev to fail and you are done. So if any lot turns out to be a bad lot, you are screwed. You just quadrupled the chances of *having* a bad lot.

This is exactly the reason I want to swap good disks from one lot for disks from staggered lots. Sure, the disks are fine now, but I'm assuming their build quality is about the same, and they all have a 2-year manufacturer warranty. That puts the manufacturer "guarantee" for every disk on the same month, day and year.

Now, while I don't expect the disks to all die at the same time, I want to minimize that risk, and if I can do that for $50 every couple of months until I've swapped all of them out, it's cheap insurance.
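To see how a staggered swap spreads out disk ages and warranty dates, here is a minimal sketch assuming one disk is replaced per interval (the post doesn't say exactly how many disks each $50 buys, so that is an assumption), 15 disks as in the first post, and a 2-year warranty counted from the swap date. `add_months` and `stagger_schedule` are hypothetical helpers, and the start date is made up.

```python
from __future__ import annotations

from datetime import date


def add_months(d: date, months: int) -> date:
    """Shift a date by whole months; the day is clamped to 28 so every month is valid."""
    total = d.month - 1 + months
    return date(d.year + total // 12, total % 12 + 1, min(d.day, 28))


def stagger_schedule(start: date, disks: int, every_months: int) -> list[date]:
    """Swap date for each disk if one disk is replaced per interval."""
    return [add_months(start, i * every_months) for i in range(disks)]


if __name__ == "__main__":
    swaps = stagger_schedule(date(2015, 6, 1), disks=15, every_months=2)
    warranty_ends = [add_months(d, 24) for d in swaps]  # assumed 2-year warranty
    print("first/last swap:", swaps[0], swaps[-1])
    print("warranty ends spread from", warranty_ends[0], "to", warranty_ends[-1])
```

With one swap every 2 months starting 1 June 2015, the last swap lands on 1 October 2017, so the warranty end dates are spread over 28 months instead of falling on a single day.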
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
> This is exactly the reason I want to swap good disks from one lot for disks from staggered lots. Sure, the disks are fine now, but I'm assuming their build quality is about the same, and they all have a 2-year manufacturer warranty. That puts the manufacturer "guarantee" for every disk on the same month, day and year.

Uh... what? You are claiming to be trying to do, for the benefit of reliability, exactly what I'm saying I wouldn't do, for the benefit of reliability. :P
 

MDKAOD

Dabbler
Joined
Mar 11, 2014
Messages
37
You said you don't believe multiple lots are bad, and I was agreeing with that statement, but on re-reading I follow you. English is such a pesky language.
Wouldn't that logic only work until the drives start to fail? Yes, the array will be rock solid for years, but what happens when the drives reach the end of their mechanical lifespan? That's where the risk lies: losing too many drives while waiting for a resilver to finish. I've read both of your "RAID5 is dead" articles, so I figured you'd support this line of thinking :)
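The resilver-window risk can be estimated with a simple model, with one big caveat: it assumes disks fail independently at a constant annual failure rate (AFR), and same-lot, same-age disks are exactly the case where that independence may not hold, so real risk could be higher. The AFR and resilver time below are hypothetical. For a 4-disk RAIDZ2 vdev that has already lost one disk, the vdev is lost if 2 of the remaining 3 disks fail before the resilver finishes.

```python
from __future__ import annotations

import math

HOURS_PER_YEAR = 8760


def p_disk_fails(afr: float, hours: float) -> float:
    """Chance one disk fails within `hours`, assuming a constant annual failure rate."""
    return 1.0 - (1.0 - afr) ** (hours / HOURS_PER_YEAR)


def p_at_least(k: int, n: int, p: float) -> float:
    """Binomial tail: probability that at least k of n independent disks fail."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def p_vdev_lost_during_resilver(width: int, parity: int, afr: float, hours: float) -> float:
    """Probability that a RAIDZ vdev which already lost one disk loses `parity` more
    (parity + 1 failures in total) before the resilver finishes."""
    survivors = width - 1
    return p_at_least(parity, survivors, p_disk_fails(afr, hours))


if __name__ == "__main__":
    # Hypothetical inputs: 5% AFR, 24-hour resilver, 4-disk RAIDZ2 vdev.
    print(p_vdev_lost_during_resilver(width=4, parity=2, afr=0.05, hours=24))
```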
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Ah. I see what you are saying. Instead of having all of your disks with the same age, which means that in theory you'd likely have all the drives failing from old age simultaneously, you are proactively replacing them in a staggered process.

But I do believe that multiple lots are bad. If a given lot is going to be bad, then unless you plan to buy a bunch of different lots, you are increasing the chance of a bad lot making your zpool unmountable. For that reason I prefer to keep all my eggs in one basket. Sure, you might be putting your eggs in 5 different baskets, but if one basket of eggs breaks you might as well lose them all, right?

To be honest, I really haven't thought about that kind of problem that you are hypothesizing (disks having rapid successive failures). I've been fortunate in that:

- I've never had so many failures that I had to be concerned with this at all. (The drives in my last server were all past their 3-year warranty, and I only saw 3 failures running 24x7.)
- I upgrade so fast that I never really see disks get "really old". To me, "really old" means more than 4 years old.

To be honest, in today's day and age I wouldn't think too much about proactively replacing them until I noticed an uptick in failure rates that made me a little concerned, and at that point I'd probably be looking at most of (or a whole) new server. The stuff I bought when I started FreeNAS in 2012 (X9SCM, 32GB of RAM, etc.) is getting kind of long in the tooth. I've had the system for 3 years, and I probably won't be adding more storage for at least another year. So I'll probably be looking at a whole new server and ZFS replication at that point in time.
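A migration like that (new server plus ZFS replication) usually comes down to a recursive snapshot followed by `zfs send -R` piped into `zfs receive` on the new machine. A minimal sketch, assuming SSH access to the new server; the pool names `tank`/`newtank`, the host `newserver` and the snapshot name are hypothetical. `zfs receive -F` rolls the destination back to match the incoming stream, so point it only at a pool you are prepared to overwrite.

```python
from __future__ import annotations

import shlex
import subprocess


def replication_commands(src_pool: str, snap: str, host: str,
                         dst_pool: str) -> tuple[list[str], list[str], list[str]]:
    """Build the snapshot, send and receive command lines (nothing is run here)."""
    full_snap = f"{src_pool}@{snap}"
    snapshot = ["zfs", "snapshot", "-r", full_snap]            # recursive snapshot of all datasets
    send = ["zfs", "send", "-R", full_snap]                    # replication stream incl. children
    receive = ["ssh", host, "zfs", "receive", "-F", dst_pool]  # apply the stream on the new server
    return snapshot, send, receive


def replicate(src_pool: str, snap: str, host: str, dst_pool: str) -> None:
    """Run the snapshot, then `send | ssh host receive`, raising if any step fails."""
    snapshot, send, receive = replication_commands(src_pool, snap, host, dst_pool)
    subprocess.run(snapshot, check=True)
    sender = subprocess.Popen(send, stdout=subprocess.PIPE)
    receiver = subprocess.Popen(receive, stdin=sender.stdout)
    sender.stdout.close()  # lets the sender see a broken pipe if the receiver exits early
    receiver_rc, sender_rc = receiver.wait(), sender.wait()
    if sender_rc != 0 or receiver_rc != 0:
        raise RuntimeError(f"replication failed (send={sender_rc}, receive={receiver_rc})")


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    snapshot, send, receive = replication_commands("tank", "migrate", "newserver", "newtank")
    print(shlex.join(snapshot))
    print(shlex.join(send), "|", shlex.join(receive))
```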

Do you have reason to think that these drives are going to fail soon? If you haven't had any failures in several months I think I'd wait to start swapping disks. I prefer to try to use hard drives as long as possible before replacing them, especially since hard drive prices on a per-GB basis are constantly dropping. Waiting just a few months longer than you might "want" can often yield significant savings. Especially if you get lucky and can buy during a weekend that often has savings (like this weekend... Memorial Day weekend).

In my experience here at iX, disk failures seem to occur irregularly. Even some systems that are 5 years old still perform extremely well and don't have disks failing frequently.

So what would I do? I'd keep things in your server exactly how they are until you have reason to be concerned that lots of disks may start failing soon. Or you need to add more disk space and you want to replace them to expand the zpool. ;)
 

MDKAOD

Dabbler
Joined
Mar 11, 2014
Messages
37
> Ah. I see what you are saying. Instead of having all of your disks with the same age, which means that in theory you'd likely have all the drives failing from old age simultaneously, you are proactively replacing them in a staggered process.

> <snip>

> Do you have reason to think that these drives are going to fail soon? If you haven't had any failures in several months I think I'd wait to start swapping disks. I prefer to try to use hard drives as long as possible before replacing them, especially since hard drive prices on a per-GB basis are constantly dropping. Waiting just a few months longer than you might "want" can often yield significant savings. Especially if you get lucky and can buy during a weekend that often has savings (like this weekend... Memorial Day weekend).

You're absolutely correct. The array in question scares the hell out of me. I have a striped RaidZ2 (two 4-disk RAIDZ2 vdevs, 8 disks total), all bought at the same time from the same vendor. I don't see outgrowing this array within the next year or so, and I've been picking up identical disks over the last two months from different vendors and want to "fix" this array with a more reliable disk setup, which is what led to this question in the first place.
Can you answer my secondary question about resilvering in a spare SATA slot then moving the drive from one slot to another in post #3?
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
> Furthering your point about the spare SATA port (which I have): can I resilver to the new disk in the spare port, remove the old disk, then move the new disk to the old port? What are the ramifications of doing so?

No problem; FreeNAS doesn't care which port each drive is on. You can shut down the server, swap every drive around, boot, and everything will work fine ;)
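This works because ZFS identifies pool members by the labels it writes on each disk (FreeNAS additionally refers to partitions by their `gptid` labels), not by controller port. After moving disks around it is still worth confirming that every member came back ONLINE with zero error counters. A minimal sketch that scans `zpool status` output for anything else; it assumes the usual NAME/STATE/READ/WRITE/CKSUM table layout, and the sample text and `gptid/disk-*` names are made up for illustration.

```python
from __future__ import annotations


def unhealthy_members(status_text: str) -> list[str]:
    """Names in the `zpool status` config table that are not ONLINE or have errors."""
    problems: list[str] = []
    in_config = False
    for line in status_text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "NAME" and "STATE" in parts:
            in_config = True  # table header: member rows follow
            continue
        if parts[0] == "errors:":
            break  # end of the config table
        if in_config and len(parts) >= 5:
            name, state, *counters = parts[:5]
            if state != "ONLINE" or any(c != "0" for c in counters):
                problems.append(name)
    return problems


# Made-up sample in the usual `zpool status` layout.
SAMPLE_OK = """\
  pool: tank
 state: ONLINE
config:

        NAME              STATE     READ WRITE CKSUM
        tank              ONLINE       0     0     0
          raidz2-0        ONLINE       0     0     0
            gptid/disk-a  ONLINE       0     0     0
            gptid/disk-b  ONLINE       0     0     0

errors: No known data errors
"""

if __name__ == "__main__":
    print(unhealthy_members(SAMPLE_OK))  # []
```

On a live system the input would be the output of `zpool status tank` (or whatever the pool is called).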
 

MDKAOD

Dabbler
Joined
Mar 11, 2014
Messages
37
@Bidule0hm Thank you so much.
You've all been very helpful.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
> You're absolutely correct. The array in question scares the hell out of me. I have a striped RaidZ2 (two 4-disk RAIDZ2 vdevs, 8 disks total), all bought at the same time from the same vendor. I don't see outgrowing this array within the next year or so, and I've been picking up identical disks over the last two months from different vendors and want to "fix" this array with a more reliable disk setup, which is what led to this question in the first place.
> Can you answer my secondary question about resilvering in a spare SATA slot then moving the drive from one slot to another in post #3?

I wouldn't worry about your RAIDZ2 zpool at all. Unless several disks are exhibiting signs of failing soon, I think you're worse off going to multiple brands/models/lots versus keeping what works running.
 