Hard Drive Failure and Burnin

Status
Not open for further replies.

Andrew076

Patron
Joined
Apr 5, 2015
Messages
206
I have a 3TB WD Red Hard Drive (WD30EFRX) which failed last night. It was giving me SMART errors but was passing the tests. However, since it was still under warranty, I had already requested a replacement drive (which should arrive today).

I have been researching how to Replace a Drive in Release 11 and it seems pretty straightforward.

My question, however, is whether there are any recommended steps I should take with the new drive before adding it and starting the "Resilver" process.

For example, after I built my system several years ago, I read that people were recommending a burn-in or other procedures for new hard drives before putting them into a new system (something I didn't do, which has always worried me). However, I can't seem to find recommendations for what to do when adding a replacement hard drive. Any help would be appreciated.
 

Redcoat

MVP
Joined
Feb 18, 2014
Messages
2,925
Burn it in before you install it - follow the procedures in the Resources section.
 

Andrew076

Patron
Joined
Apr 5, 2015
Messages
206
Burn it in before you install it - follow the procedures in the Resources section.
Where might this Resources section be located?
 

Redcoat

MVP
Joined
Feb 18, 2014
Messages
2,925
Where might this Resources section be located?
It's the third listing on the masthead...
Look here for one of the burnin references - by Moderator @Spearfoot. Also search for Uncle Fester's Guide (or look for a post by @danb35 and find it in his sig).
 

Andrew076

Patron
Joined
Apr 5, 2015
Messages
206
Thank you very much. Good to know it can be done in the System before replacing the failed drive.
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
Love that script.
 

Andrew076

Patron
Joined
Apr 5, 2015
Messages
206
Okay. The more I reread the more concerned I get (too many warnings about how this could destroy your data...) and with 4TB worth it will be a real pain to have to restore anything.

Can anyone confirm that when I plug in the new drive without adding it to a Volume, it will not be part of the volume, and therefore I don't need to detach anything? I believe I would run the following from the shell to do the badblocks check:

sysctl kern.geom.debugflags=0x10
tmux
badblocks -ws /dev/ada5

This is a 3TB drive so I assume this should take some time to complete. I am also guessing at the ada5 value, as I have not gotten home to install the replacement drive and see what it will output. Currently, when I view the disks, they are listed from ada0 to ada4 (although the drive that went bad was originally ada1, so my assumption is that the values get reassigned?).
 

Spearfoot

He of the long foot
Moderator
Joined
May 13, 2015
Messages
2,478
Okay. The more I reread the more concerned I get (too many warnings about how this could destroy your data...) and with 4TB worth it will be a real pain to have to restore anything.

Can anyone confirm that when I plug in the new drive without adding it to a Volume, it will not be part of the volume, and therefore I don't need to detach anything? I believe I would run the following from the shell to do the badblocks check:

sysctl kern.geom.debugflags=0x10
tmux
badblocks -ws /dev/ada5
Correct: the new drive will not be automatically added to your pool, so it's safe to perform burn-in tests on it.

You must be sure to identify it correctly, though, because drive IDs can change between reboots and you don't want to blow away the wrong disk. So, make a note of the new drive's serial number (it is printed on the drive's label), then run smartctl -a /dev/ada1 | grep Serial and check the serial number it reports. If it doesn't match, repeat the command for /dev/ada2, /dev/ada3, etc., until you find the correct drive.

Again... make sure you're working with the new drive when you run badblocks, as it will overwrite every block on the disk.
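The serial-number check described above can be scripted. What follows is only a minimal sketch, assuming smartmontools is installed and that smartctl -i prints a "Serial Number:" line for each disk; the function names and the serial in the usage note are hypothetical.

```shell
#!/bin/sh
# Sketch: find which device node carries a given serial number.
# Assumes smartmontools is installed; function names are illustrative only.

# Pull the serial number out of `smartctl -i` style output read from stdin.
extract_serial() {
    awk -F': *' '/^Serial Number:/ { print $2; exit }'
}

# Print the first device among the candidates whose serial matches $1.
# Returns non-zero if no candidate matches.
find_disk_by_serial() {
    wanted=$1
    shift
    for dev in "$@"; do
        serial=$(smartctl -i "$dev" 2>/dev/null | extract_serial)
        if [ "$serial" = "$wanted" ]; then
            echo "$dev"
            return 0
        fi
    done
    return 1
}
```

For example, find_disk_by_serial WD-XXXXXXXXXXXX /dev/ada0 /dev/ada1 /dev/ada2 would print the matching device node, where WD-XXXXXXXXXXXX stands in for the serial printed on the new drive's label. It only reads drive information, so it is safe to run against disks that are in the pool.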
This is a 3TB drive so I assume this should take some time to complete. I am also guessing at the ada5 value, as I have not gotten home to install the replacement drive and see what it will output. Currently, when I view the disks, they are listed from ada0 to ada4 (although the drive that went bad was originally ada1, so my assumption is that the values get reassigned?).
The drive IDs are assigned with no particular scheme, rhyme, or reason. So be sure to identify the new drive before you proceed with testing.

I suggest running a short SMART test before proceeding; this may help quickly identify a DOA disk: smartctl -t short /dev/???

Run the conveyance test, too, if the drive supports it: smartctl -t conveyance /dev/???
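These SMART tests run in the background on the drive itself, so the result has to be read back from the self-test log afterwards with smartctl -l selftest. Here is a rough sketch of checking that log, assuming the usual ATA log layout in which the most recent test is the line numbered "# 1"; the function name is hypothetical.

```shell
#!/bin/sh
# Sketch: decide whether the most recent SMART self-test passed.
# Reads `smartctl -l selftest` output on stdin. Assumes the ATA log layout
# where entry "# 1" is the newest test; the function name is illustrative.
latest_selftest_ok() {
    awk '
        /^# *1 / { found = 1; ok = /Completed without error/ }
        END      { exit !(found && ok) }
    '
}
```

Once the estimated run time that smartctl -t prints has passed, something like smartctl -l selftest /dev/??? | latest_selftest_ok && echo "last self-test passed" reports the outcome. If no test has been logged yet, the function returns failure rather than guessing.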

If the drive passes these tests, proceed with your badblocks test. I suggest using the -b option, as shown below; you need it for drives >2TB in size:
Code:
sysctl kern.geom.debugflags=0x10
tmux
badblocks -wsv -b 4096 /dev/???
After the drive passes all of your burn-in tests, you can use it to replace the failed disk in your pool. And yes! This will take a long, long time!
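To put a rough number on "a long, long time": badblocks -w writes four test patterns and reads each one back, which comes to eight full passes over the disk. The sketch below just does that arithmetic. The 150 MB/s figure in the usage note is an assumed average throughput, not a measurement, and the function name is hypothetical.

```shell
#!/bin/sh
# Sketch: estimate whole hours for a `badblocks -w` run.
# $1 = drive size in bytes, $2 = assumed average throughput in MB/s.
# Eight passes = four patterns, each written once and read back once.
estimate_badblocks_hours() {
    size_bytes=$1
    mb_per_sec=$2
    passes=8
    echo $(( size_bytes * passes / (mb_per_sec * 1000000) / 3600 ))
}
```

With those assumptions, estimate_badblocks_hours 3000000000000 150 prints 44, so a 3TB drive would need nearly two days. Real throughput drops toward the inner tracks of the platters, so treat this as a lower bound.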

Good luck!
 

Andrew076

Patron
Joined
Apr 5, 2015
Messages
206
Okay.... added the new drive etc. all is hunky-dory.

I have a second drive that is starting to throw off errors as well and I have a replacement for it, but my MB only has 6 SATA ports and I have six drives. So my question is this. Is there a Windows program that I can use to do the burn-in?
 

Spearfoot

He of the long foot
Moderator
Joined
May 13, 2015
Messages
2,478
Okay.... added the new drive etc. all is hunky-dory.

I have a second drive that is starting to throw off errors as well and I have a replacement for it, but my MB only has 6 SATA ports and I have six drives. So my question is this. Is there a Windows program that I can use to do the burn-in?
There may be, but not that I'm aware of.

My script is a POSIX shell script intended for use on Linux and FreeBSD systems equipped with the badblocks program and the SmartMonTools.

I keep an old Dell Vostro 200 system with Intel Core Duo CPU, 8GB of RAM, and a Dell H200 HBA near my workbench in my shop. It boots Linux from a small SSD. I use this system for burning-in disks, plus whatever other uses I find for it from time to time.

In your case, you could take the failing disk offline, shut down the system, install the replacement drive, boot the system, and run the script against the new disk. But your pool would be degraded the entire time. Not the optimal solution, but you may have no other choice if (a) the disk really is failing, and (b) you don't have another system you can use to burn in the disk.
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
You can boot almost any PC from a FreeNAS thumb drive and do the burn-in there...
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
The drive IDs are assigned with no particular scheme, rhyme, or reason.
Not really--they're assigned in the order that the OS picks up the drives. Most of the time (I'd expect the vast majority of the time) this will follow the order of the ports on your motherboard--port 0 will be ada0, port 1 will be ada1, etc. The vast majority of the time, the device IDs will stay consistent, as long as all the drives are plugged into the same places, and nothing's dropped offline. But that "as long as" is enough reason to double-check, especially before you do anything destructive.
 

Spearfoot

He of the long foot
Moderator
Joined
May 13, 2015
Messages
2,478
Not really--they're assigned in the order that the OS picks up the drives. Most of the time (I'd expect the vast majority of the time) this will follow the order of the ports on your motherboard--port 0 will be ada0, port 1 will be ada1, etc. The vast majority of the time, the device IDs will stay consistent, as long as all the drives are plugged into the same places, and nothing's dropped offline. But that "as long as" is enough reason to double-check, especially before you do anything destructive.
Pardon my perhaps too colorful language. Of course there is rhyme and reason in the way FreeBSD assigns drive labels; it's just that the labels are subject to change, depending on the order in which the drives are detected (as you point out) and especially so when a new drive is added to the system.

I merely intended to emphasize the latter point, i.e., that drive label assignments might change after adding a new drive, and that the new drive's label should be verified before proceeding with any destructive testing.
 