How do you burn in new disks?

rvassar

Guru
Joined
May 2, 2018
Messages
972
I got bit by a WD 1TB Black drive a couple of years ago. I dropped it in my workstation, installed Linux, and "moved in"... completely: SSH keys, PGP keys, etc... Powered it off and went to bed. The next morning, I powered it on and it went "ker-thunk... click", and that was that. I then faced the prospect of RMA'ing a <$100 drive I couldn't wipe, one that contained encryption & security keys. It's been sitting on the shelf now for 2+ years of its 5-year warranty.

Lesson learned: I now spin up the drive and run fio against it for at least 24 hours. Something along the lines of:

Code:
# 24 hours of mixed 4k random read/write straight at the raw device (destroys any data on it)
sudo fio --name=randrw --time_based --runtime=86400 --ioengine=libaio --iodepth=64 --rw=randrw --bs=4k --direct=1 --numjobs=4 --filename=/dev/sdX


But I'm curious, how do you burn in new disks? What do you consider a sufficient amount of time?

Rob
 

kdragon75

Wizard
Joined
Aug 7, 2016
Messages
2,457
It's been sitting on the shelf now for 2+ years of its 5-year warranty
Call them and tell them it's been in service for healthcare, and that for HIPAA compliance you can only send the drive back if it's been physically destroyed; it's a legal requirement. Just leave the SN# intact.
As for the burn-in, try searching; I know there are a number of Resources (black top navigation bar in the forum) that cover this, and even a few awesome scripts.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
I got bit by a WD 1TB Black drive a couple of years ago. I dropped it in my workstation, installed Linux, and "moved in"... completely: SSH keys, PGP keys, etc... Powered it off and went to bed. The next morning, I powered it on and it went "ker-thunk... click", and that was that. I then faced the prospect of RMA'ing a <$100 drive I couldn't wipe, one that contained encryption & security keys. It's been sitting on the shelf now for 2+ years of its 5-year warranty.

Lesson learned: I now spin up the drive and run fio against it for at least 24 hours. Something along the lines of:

Code:
# 24 hours of mixed 4k random read/write straight at the raw device (destroys any data on it)
sudo fio --name=randrw --time_based --runtime=86400 --ioengine=libaio --iodepth=64 --rw=randrw --bs=4k --direct=1 --numjobs=4 --filename=/dev/sdX


But I'm curious, how do you burn in new disks? What do you consider a sufficient amount of time?

Rob
Did you look at the links in my signature?
One is to a group of scripts. One of those scripts is all about burn-in of new drives.

I have also been known to use Darik's Boot and Nuke (DBAN) to do a short DoD wipe with verify between passes as a test of the drives before they are used for data.

 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504

rvassar

Guru
Joined
May 2, 2018
Messages
972
Did you look at the links in my signature?
One is to a group of scripts. One of those scripts is all about burn-in of new drives.

I have, but missed that script. I went and looked at it on GitHub; it actually doesn't look significantly different from what I'm doing. I just have more experience using the fio command, and use it out of habit. But the thing about fio is that it doesn't give you any guarantee that it's touched every sector. It gets there statistically over time, whereas badblocks appears to specifically define that as its end point. The fio snippet I included mixes random reads & writes in a sequence that flings the actuator around. Fio beats the mechanism up, and badblocks comprehensively tests the media. Sounds like we need to do both.
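
For reference, a full write-mode badblocks pass on Linux looks something like this (assuming /dev/sdX as a stand-in and a hypothetical log file name; this wipes the drive):

Code:
# Writes four patterns (0xaa, 0x55, 0xff, 0x00) to every block and reads
# each one back to verify; -s shows progress, -o logs any bad blocks found
sudo badblocks -b 4096 -ws -o sdX-badblocks.log /dev/sdX

fio does have a --verify option for real data checking, but badblocks makes the every-sector guarantee explicit.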


Call them and tell them it's been in service for healthcare, and that for HIPAA compliance you can only send the drive back if it's been physically destroyed; it's a legal requirement. Just leave the SN# intact.

It's not worth the trouble at this point. It was mostly a case of not wanting to hunt down the 40+ systems that had those keys installed, and swap things out. At the time, my time was worth more than the new drive. I've since moved on to SSD for workstation use.


More good stuff!
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
At the time, my time was worth more than the new drive. I've since moved on to SSD for workstation use.
I have a drive like that; I use it as a coaster for my coffee cup.
 

rvassar

Guru
Joined
May 2, 2018
Messages
972
Ok... So I omitted the step that precipitated my historical drive failure: the power-down and return to room temp.

So here's my burn-in list:

1. 24 hours of random fio exercise.
2. Extended SMART self-check.
3. Power-off, long enough to reach room ambient temp., preferably somewhere around 68°F or 20°C.
4. Full badblocks run.
5. Extended SMART self-check. Compare to the previous run, with specific attention to reallocated sectors.

I think this encompasses everything in the community scripts, and adds some extra stress. I did omit the initial SMART self-test, just because I was running manually. If I were automating this, I'd keep it.
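
In shell terms, steps 1, 2, 4, and 5 come out something like the sketch below; a rough outline rather than a polished script, with /dev/sdX as a stand-in for the device under test. The fio and badblocks passes destroy all data on it, and step 3 is a manual power cycle:

Code:
# 1. 24 hours of mixed 4k random I/O against the raw device
sudo fio --name=burnin --time_based --runtime=86400 --ioengine=libaio \
    --iodepth=64 --rw=randrw --bs=4k --direct=1 --numjobs=4 --filename=/dev/sdX

# 2. Extended SMART self-test (returns immediately and runs inside the
#    drive; poll with 'smartctl -a' until it finishes)
sudo smartctl -t long /dev/sdX
sudo smartctl -a /dev/sdX

# 3. Power off and let the drive cool to room temperature (manual step)

# 4. Full destructive badblocks pass: four write+read-verify patterns
sudo badblocks -b 4096 -ws /dev/sdX

# 5. Second extended self-test; compare reallocated/pending sector counts
sudo smartctl -t long /dev/sdX
sudo smartctl -A /dev/sdX | grep -i -E 'reallocated|pending'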
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
I think this encompasses everything in the community scripts, and adds some extra stress. I did omit the initial SMART self-test, just because I was running manually. If I were automating this, I'd keep it.
I like to run the full sweep of the platters a couple times, both write and read, so that the drive has a chance to detect bad sectors.
 

kdragon75

Wizard
Joined
Aug 7, 2016
Messages
2,457
Ok... So I omitted the step that precipitated my historical drive failure: the power-down and return to room temp.

So here's my burn-in list:

1. 24 hours of random fio exercise.
2. Extended SMART self-check.
3. Power-off, long enough to reach room ambient temp., preferably somewhere around 68°F or 20°C.
4. Full badblocks run.
5. Extended SMART self-check. Compare to the previous run, with specific attention to reallocated sectors.

I think this encompasses everything in the community scripts, and adds some extra stress. I did omit the initial SMART self-test, just because I was running manually. If I were automating this, I'd keep it.
I like that you include a few thermal cycles.
 

rvassar

Guru
Joined
May 2, 2018
Messages
972
I like to run the full sweep of the platters a couple times, both write and read, so that the drive has a chance to detect bad sectors.


That's what the 24-hour fio run gets me. But it's fully random, with the heads seeking all over the place. The problem is, you have to adjust the duration to match the size of the drive and the performance rate. For example:

The drive I'm currently testing is 3TB, or 3 × 1000GB × 1000MB = 3,000,000MB. It should be capable of ~140MB/sec sustained write, which means fio should cover the whole storage space in about 21,428 seconds, or ~6 hours. So after 86,400 seconds, fio will have swept the storage space four times. However... consider if I was testing this in a USB enclosure attached to an old laptop stuck at USB 2.0 speeds. That limits the rate to roughly 40MB/sec. Reworking the math, 3,000,000 / 40 = 75,000 seconds, so in that case my 24-hour (86,400 second) run only sweeps the platters a little more than once. Which is probably not enough.
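
The same arithmetic as a quick shell sketch, using the numbers above (swap in your own capacity and sustained rate):

Code:
# Back-of-the-envelope: how many full sweeps does a timed random run get?
SIZE_MB=3000000    # 3TB drive, in MB
RUNTIME=86400      # 24 hours, in seconds
for RATE in 140 40; do   # MB/sec: direct-attached vs. USB 2.0
    SWEEP=$((SIZE_MB / RATE))
    echo "${RATE} MB/sec: one sweep takes ${SWEEP}s; ${RUNTIME}s covers $((RUNTIME / SWEEP)) full sweep(s)"
done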
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
But it's fully random, with the heads seeking all over the place.
My worry about that is actually the 'fully random' part. It may not hit every location. It might hit some locations 5 or 6 times and other locations only 1 or 2 times, possibly even none at all. I like the sequential nature of the badblocks test. Just me, but I would sweep the whole drive once with badblocks, then run the random test with fio long enough to theoretically cover the whole drive, then sweep the whole disk again with badblocks, and run SMART tests in between.

The power cycles are a good test too. I have had more than one drive give me the "click of death" when powered back on after having cooled overnight. It is one of the reasons that I don't like to turn systems off, and if I do, I want them back on as soon as possible.

The thing I have seen in the population of drives at work is that most of the ones that survive the first year will last into the 5-year range. I like Seagate desktop drives for home, because they are cheap, but we use the "enterprise" drives at work. Most of those would still be in warranty at 5 years, but we get them with the servers, and the OEMs will usually cut the warranty off at 3 years, and the drive manufacturer will tell you to go back to the server vendor. Sorry, off topic there.

Mostly, I try to stress the drives out in the first month so I can get any that are going to fail to fail while they are still in warranty, preferably before I even put data on them. If they survive testing, they usually last at least 5 years. At work that means they last the life of the server, but we still keep spares on hand because there is the occasional drive that will not do what it is supposed to do...

Last year I replaced all the drives I had in both of my home FreeNAS systems, because they were all over 5 years of age. It is a massive pain in the wallet to buy that many drives at one time.
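
A rough sketch of that sandwich, assuming Linux, /dev/sdX as a stand-in, and that you wait for each extended self-test to finish before starting the next stage (every pass here destroys data):

Code:
sudo badblocks -b 4096 -ws /dev/sdX   # sequential write+read-verify sweep
sudo smartctl -t long /dev/sdX        # extended self-test between stages

# random phase, run long enough to theoretically cover the whole drive
sudo fio --name=rand --rw=randrw --bs=4k --direct=1 --ioengine=libaio \
    --iodepth=64 --time_based --runtime=86400 --filename=/dev/sdX
sudo smartctl -t long /dev/sdX

sudo badblocks -b 4096 -ws /dev/sdX   # second sequential sweep
sudo smartctl -a /dev/sdX             # final report; check reallocated sectors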

 

rvassar

Guru
Joined
May 2, 2018
Messages
972
My worry about that is actually the 'fully random' part. It may not hit every location. It might hit some locations 5 or 6 times and other locations only 1 or 2 times, possibly even none at all.

That's definitely a concern, which is why I mentioned figuring out the rate at which it covers the storage space, and you'll note I do the badblocks scan afterwards, which does cover the entire space. It's a statistics game. I want the head actuator to get exercised. A bad sector doesn't have to be faulty media; it can also result from the servo mechanism failing to position the head correctly on a seek, or a motion-induced departure from the intended flying height, etc...
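
For what it's worth, the odds are easy to estimate. If each 4k write lands uniformly at random, the fraction of the disk never written after k capacity-equivalents of random I/O is roughly e^-k; a quick check of that model:

Code:
# Uniform-random model: untouched fraction after k full-capacity sweeps ~ exp(-k)
awk 'BEGIN { for (k = 1; k <= 4; k++)
    printf "%d sweep(s): %4.1f%% of the disk never written\n", k, 100 * exp(-k) }'

So even the four theoretical sweeps from a 24-hour run leave roughly 2% of the disk untouched, which is exactly the gap the sequential badblocks pass closes.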

So far this 3TB Toshiba P300 desktop model seems to be holding up OK under test. I picked it up for $70 on sale at Fry's the other day... I expect some ECC RAM to arrive today, and am planning on swapping my T3500 ESXi box with the Optiplex 790 I'm currently running FreeNAS on. When that happens, I'll retire a 2TB drive that is approaching the 50,000-hour mark.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
So far this 3TB Toshiba P300 desktop model seems to be holding up OK under test.
I hope it works better for you over time than the Toshiba drives I bought last year.
I picked up 8 (6 plus 2 spares) of the DT01ACA200 Toshiba drives, and in only 6 months I had 3 hard failures. That made me not trust the drives, so I sold the remaining 5 on eBay and bought Seagate Desktop drives to take their place. That was in the early part of last year (2017).
I still have a photo.
 

rvassar

Guru
Joined
May 2, 2018
Messages
972
I hope it works better for you over time than the Toshiba drives I bought last year.
I picked up 8 (6 plus 2 spares) of the DT01ACA200 Toshiba drives, and in only 6 months I had 3 hard failures. That made me not trust the drives, so I sold the remaining 5 on eBay and bought Seagate Desktop drives to take their place. That was in the early part of last year (2017).
I still have a photo.

Did you try RMA'ing any of them? I know Toshiba has had some financial difficulties recently.

I have trust issues, which is why I'm working on the proper burn-in. This will be the first Toshiba drive in my house, so I'm pairing it up in the vdev with an HGST 3TB drive that has proven to be rock solid.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
Did you try RMA'ing any of them? I know Toshiba has had some financial difficulties recently.
The drives I bought were sold to Dell as OEM drives, but the company that bought the systems pulled them before they even turned the computers on and replaced them with SSDs. So I couldn't get warranty service from Toshiba, and I couldn't get it from Dell either. I got them cheap and sold them for almost what I paid, minus the ones that died. I had high hopes going in, but it didn't work out for me. The one you have is a slightly different model, so it might not suffer from the issues I had.
Over the years, I have seen some HGST drives fail, but they are generally very reliable. I have four 500GB IDE drives from when it was still Hitachi that I ran for over 5 years before I stopped using them. I recently bought some IDE-to-SATA adapters just so I could check them out, and after sitting for over another 5 years, they still survived a round of DoD wiping from DBAN and appear to be working as well as they ever did. I just don't want to throw them out, but they are not good for much at this point. The question in my mind is, what can I do with them?
 

rvassar

Guru
Joined
May 2, 2018
Messages
972
I just don't want to throw them out, but they are not good for much at this point. The question in my mind is, what can I do with them?

I think we all probably have a stack of old drives kicking around the junk box. I have at least three WD 250GB drives, a Seagate 320 in SATA, and another 320 in PATA, and that's just the 3.5" stuff. I have a couple of USB enclosures for them. I keep them around for oddball experiments; if I drop one and it dies, no loss. But with ESXi and now FreeNAS, I find less and less use for them.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
I have trust issues, which is why I'm working on the proper burn-in. This will be the first Toshiba drive in my house, so I'm pairing it up in the vdev with an HGST 3TB drive that has proven to be rock solid.
How did this work out for you?
 

rvassar

Guru
Joined
May 2, 2018
Messages
972
How did this work out for you?

It has just shy of 11,000 hours on it. Back around the 7k-hour mark, I picked up a couple of Easystore 8TB disks and it got rotated out of the mirror pool and paired into my RAIDZ backup / security-cam pool. But the other two pool members were 2TB, so I put the 3TB HGST drive in as active and the Toshiba in as a hot spare, intending to pick up a white-label 3TB drive to rid myself of the older 2TB units when I added another camera.

Then I got RIF'ed...

So it's been sitting spun up but idle since early summer. The odd quirk is, it now throws seek errors that rise to the level of FreeNAS warning me of imminent failure, randomly, every couple of weeks, but it's not really even an active pool member. I run a long SMART test and it just goes away. I'm kind of curious and planning an experiment.

Apologies for being rather absent lately... I find myself working contract, so I'm keeping a low profile.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
Apologies for being rather absent lately... I find myself working contract, so I'm keeping a low profile.
I was working a contract that didn't get renewed, so I found myself out of work at the end of August. It was two months before I was able to get back to work. I am back with the same company, but working on a different contract at a different facility. I sure want to get out of the contract support business. It is just not reliable enough.
 