Replacing faulty disks (with bad sectors) with the cloned versions, on FreeNAS 11.3 with 11 disks (RAIDZ2)

szebasztian

Cadet
Joined
Oct 20, 2023
Messages
2
Hi everyone!

My question is a little specific, maybe someone has some experience with it.

We've got a FreeNAS 11.3 install on an HP ProLiant DL380 Gen9 (Intel Xeon E5-2620, 16GB DDR4) with 11 disks (1 x 500 GB Toshiba, some consumer type, and 10 x 8TB Seagate IronWolf). The RAIDZ level is RAIDZ2.
Now, the system, for some reason, didn't report that some of the disks need to be replaced, so we've ended up with 3 critical 8TB disks (at 4%, 14% and 60% health) that have some bad sectors. The others are OK. The whole NAS and its data are still accessible, but sometimes when copying data (for backup purposes) it freezes and needs to be restarted.

My question is: we've just bought 3 new 8TB IronWolf disks. Could I replace the 3 faulty disks with cloned copies of them (cloned sector-by-sector, since the disks can still be accessed)? Would FreeNAS recognize the clones without ruining the pool and losing all the data?

Or what would be the best practice in this case? I'm fairly new to FreeNAS, as the server was passed down to me from the previous IT colleague.

If I'm not clear enough feel free to ask for more info.

Thank you for your help in advance.

Cheers!
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
Off-line cloning for replacement is not an approved method. It may work, or it may not due to the drive's serial number changing.


In your case, with 3 faulty drives in a RAID-Z2, replace in place would be best. If you have a free disk slot, you install the replacement disk in it and tell FreeNAS to replace one of the faulty disks with this new replacement.

Once that is complete, you can then remove the "bad" disk, and put in another replacement disk. Repeat...
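For illustration only, here is a minimal Python sketch of the rotation described above. It only builds the command lines and runs nothing. On FreeNAS the web UI is the usual way to start a replacement; the sketch shows just the underlying ZFS operation, `zpool replace <pool> <old> <new>`, followed by `zpool status` to watch the resilver. The pool name `tank` and the device names `da3`, `da5`, `da7` and `da11` are made up, and real device names may well change after a disk is swapped.

```python
from typing import List, Tuple


def replace_plan(pool: str, swaps: List[Tuple[str, str]]) -> List[List[str]]:
    """Build the commands for replacing disks one at a time.

    Each (old, new) pair becomes a `zpool replace`, followed by a
    `zpool status` to check the resilver before moving on.
    Nothing is executed here; the caller only prints or inspects the lists.
    """
    plan: List[List[str]] = []
    for old, new in swaps:
        plan.append(["zpool", "replace", pool, old, new])
        plan.append(["zpool", "status", pool])
    return plan


if __name__ == "__main__":
    # Hypothetical layout: the first new disk sits in the spare slot (da11).
    # After each resilver, the "bad" disk is pulled and the next new disk
    # takes its slot, assumed here to come back under the same device name.
    swaps = [("da3", "da11"), ("da5", "da3"), ("da7", "da5")]
    for cmd in replace_plan("tank", swaps):
        print(" ".join(cmd))
```

The point of the rotation is that only one replacement runs at a time, and the old disk stays in the pool until its resilver has finished.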

Unfortunately, on rare occasions replace in place can take longer, since it favors reading data from the "bad" disk. It more or less mirrors the "bad" disk until it hits a bad spot. Then it uses RAID-Z2 recovery information from the other disks to rebuild the bad data, and writes good data to the replacement disk.

Some disks people use are still set up for desktop use, or the vendor screwed up. Normally NAS disks use TLER, Time Limited Error Recovery, of about 7 seconds. (Seagate calls TLER something else.) This is because NAS servers can generally get redundant / recovery information from other disks. Desktops, on the other hand, want more than 60 seconds per bad block to see if extreme measures can recover it. That causes delays in recovery for NAS applications.
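To put rough numbers on that, here is a small Python sketch. It assumes smartmontools is available and that the drive supports SCT Error Recovery Control, which is what `smartctl -l scterc` reads and sets (values are in tenths of a second, so 7 seconds is 70). The device path `/dev/da3` and the count of 100 bad blocks are made up for illustration, and the stall figure is a simple upper bound rather than a measurement.

```python
from typing import List


def scterc_command(device: str, seconds: float) -> List[str]:
    """Build a smartctl command that sets the read and write error
    recovery timeouts to the same value. smartctl expects deciseconds."""
    deciseconds = round(seconds * 10)
    return ["smartctl", "-l", f"scterc,{deciseconds},{deciseconds}", device]


def worst_case_stall(bad_blocks: int, timeout_s: float) -> float:
    """Upper bound on time spent waiting if every bad block runs
    into the full recovery timeout."""
    return bad_blocks * timeout_s


if __name__ == "__main__":
    # Hypothetical device path; the command is printed, not executed.
    print(" ".join(scterc_command("/dev/da3", 7)))

    # Compare a 7 second NAS-style timeout with a 60 second desktop-style one.
    for timeout in (7, 60):
        stall = worst_case_stall(100, timeout)
        print(f"{timeout:>2} s timeout, 100 bad blocks: up to {stall / 60:.1f} min")
```

Running `smartctl -l scterc` with no values against a device only reports the current setting, which is a safe first check before changing anything.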


Now what you do if you don't have a free disk slot, I don't know. Some people have temporarily used USB to SATA enclosures for the task. Then, when synced up, shut down and put the new, good disk in place of the bad disk you just replaced. However, USB to SATA enclosures are not all made equal. Since this task would be heavy writing, you want one with fan cooling, and preferably USB 3 with UASP. Even then, there is no certainty this will work.
 

szebasztian

Cadet
Joined
Oct 20, 2023
Messages
2
Thank you for your excellent and detailed answer, Arwen, plus all the extra shared knowledge. Honestly, I hadn't thought of that method.
Fortunately the server has 12 slots, so there is one free and it is possible to try and hopefully succeed.

I'll definitely keep you posted about the progress.
 