
Hard Drive Burn-In Testing - Discussion Thread

Matdif

Explorer
Joined
Oct 10, 2014
Messages
59
Could you tell us what the emails said? I'm not exactly keen on scouring 9 smartctl dumps without guidance.

Also, the temperature you should be concerned about is 40, not 30. Your temperatures are fine.

Sorry, I posted that in the wrong thread; I didn't realize it was here.
 

mattlach

Patron
Joined
Oct 14, 2012
Messages
280
I guess my question is, why?

Isn't this what redundancy is for? And doesn't ZFS automatically find and correct for bad blocks?

I usually just plop my new drives in, and if one starts having problems, I pull it and RMA it.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
You mean burning drives in? That's an awfully simplistic view.

The whole premise of RAID and/or ZFS is that drives are reliable by default, with the occasional error or even a complete failure. If you don't test your drives, you might end up with 4 dead drives on your hands in a short period - and bam goes your data.
 

mattlach

Patron
Joined
Oct 14, 2012
Messages
280
Interesting.

I have never seen a "bad batch" of drives. I've had a few fail here and there from old age, and one or two out of box failures over the years, but nothing that would ever have threatened my array.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
I've been similarly lucky, but you can bet a little accident during delivery will wreck multiple drives at once.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Why not screen? Then you can disconnect SSH and run as many as you would like and I find it a lot easier to use than tmux.

[Unless you're a license zealot]

Tmux is included, screen isn't, IIRC.
 

Apollo

Wizard
Joined
Jun 13, 2013
Messages
1,458
Indeed, screen is now available on 9.3.
 

horse_porcupine

Dabbler
Joined
Jun 11, 2013
Messages
22
Maybe a stupid question, but can I set up shares, datasets, permissions, scrubs, SMART tests, etc while running badblocks..? Or will anything I set up be lost? I started badblocks yesterday and it looks like it'll be another ~72hrs before it's done. Good Lord is it difficult not to tinker...
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
No. Badblocks in write mode (the -w flag used for burn-in) is very destructive. Potentially even "Forensics won't get much out of this" destructive. A write-mode badblocks run should only be done on unformatted drives.
 

horse_porcupine

Dabbler
Joined
Jun 11, 2013
Messages
22
Blast...! I'll have to sit on my hands for another few days so... Thanks Ericloewe!
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Look at the bright side: You didn't have to run three and a half sets of badblocks because you could only test four drives at a time, insisted on testing all drives more or less equally and did not use tmux and the remote computer crashed in the middle of the SSH session.
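For anyone wanting to avoid that scenario, here is a minimal sketch of starting one detached tmux session per drive, so the tests survive a dropped SSH connection. The helper name, the session naming scheme and the /dev/adaX device names are assumptions for illustration; the function only builds the commands and does not run anything. Write mode is off by default because -w wipes the drive.

```python
import shlex


def tmux_badblocks_commands(devices, write_mode=False):
    """Build one detached tmux session command per device.

    Returns a list of argv lists. Nothing is executed here.
    """
    commands = []
    for dev in devices:
        # -s shows progress, -v is verbose.
        # -w (write mode) DESTROYS all data on the device.
        flags = "-wsv" if write_mode else "-sv"
        # Hypothetical naming scheme: /dev/ada0 -> bb_dev_ada0
        name = "bb_" + dev.strip("/").replace("/", "_")
        inner = "badblocks {} {}".format(flags, shlex.quote(dev))
        commands.append(["tmux", "new-session", "-d", "-s", name, inner])
    return commands


if __name__ == "__main__":
    for argv in tmux_badblocks_commands(["/dev/ada0", "/dev/ada1"]):
        print(" ".join(shlex.quote(arg) for arg in argv))
```

Each printed line can be pasted into a shell once you have double-checked the device names. A session started this way can be reattached later with tmux attach -t followed by the session name.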
 

Fraoch

Patron
Joined
Aug 14, 2014
Messages
395
Jeez, I thought I was the only one that happened to...

@sogansfire, look on the bright side, at least the time isn't totally wasted - this is intensive drive testing time deducted from your recommended 1000 hours testing.
 

horse_porcupine

Dabbler
Joined
Jun 11, 2013
Messages
22
Ouch...! Tmux definitely makes life easier.

My worry now is that I'm seeing errors on some new drives I bought...
WD Green 4TB - Testing with pattern 0xff: ~65% done, ~42hrs elapsed. (0/0/48 errors)
WD Red 4TB - Testing with pattern 0xff: ~60% done, ~42hrs elapsed. (0/0/48 errors)
WD Green 4TB - Testing with pattern 0xff: ~52% done, ~42hrs elapsed. (0/0/136 errors)

How concerned should I be about this? The most I've done I think is to log in to the Web GUI and check the reporting tab, can't think of anything else that would have caused these. Just have to leave it finish I guess and go from there, but a bit concerning. :(
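For reference, the three numbers in parentheses on a badblocks progress line are read errors, write errors and corruption (data comparison) errors, in that order, so the lines above show zero read and write errors but some blocks that did not read back as written. A small sketch for pulling those counts out of a progress line follows; the function and type names are made up for illustration.

```python
import re
from typing import NamedTuple, Optional


class ErrorCounts(NamedTuple):
    read: int
    write: int
    corruption: int


# Matches the "(R/W/C errors)" suffix of a badblocks progress line.
_ERRORS_RE = re.compile(r"\((\d+)/(\d+)/(\d+) errors\)")


def parse_error_counts(line: str) -> Optional[ErrorCounts]:
    """Return the read/write/corruption counts, or None if absent."""
    match = _ERRORS_RE.search(line)
    if match is None:
        return None
    return ErrorCounts(*(int(group) for group in match.groups()))


if __name__ == "__main__":
    line = ("WD Green 4TB - Testing with pattern 0xff: "
            "~65% done, ~42hrs elapsed. (0/0/48 errors)")
    print(parse_error_counts(line))
```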
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
I'd say that's pretty serious. It shows your disks or storage subsystem has some kind of problem.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
I'd start by taking a look at the SMART data for the affected drives (smartctl -a /dev/adaX) - which you can do right now. It should give an idea of where the problem lies.
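As a rough sketch of what to look for in that output, the snippet below picks a few commonly watched attributes out of the attribute table printed by smartctl -a. The choice of attributes and the function names are assumptions, not an official list, and the parsing assumes the usual ten-column ATA attribute table; drives that report differently will need adjusting.

```python
import subprocess

# Attributes often checked first; this selection is an assumption.
WATCHED = (
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
    "UDMA_CRC_Error_Count",
)


def parse_watched_attributes(smart_text, watched=WATCHED):
    """Map attribute name -> raw value for the watched attributes."""
    found = {}
    for line in smart_text.splitlines():
        fields = line.split()
        # Columns: ID, NAME, FLAG, VALUE, WORST, THRESH, TYPE,
        # UPDATED, WHEN_FAILED, RAW_VALUE
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in watched:
            raw = fields[9]
            found[fields[1]] = int(raw) if raw.isdigit() else raw
    return found


def read_smart(device):
    """Run smartctl -a on a device and return its text output."""
    # smartctl's exit status is a bitmask, so a non-zero code does not
    # necessarily mean the command failed to run; it is not checked here.
    result = subprocess.run(
        ["smartctl", "-a", device], capture_output=True, text=True
    )
    return result.stdout
```

Usage would be something like parse_watched_attributes(read_smart("/dev/ada0")), run as root. Non-zero reallocated or pending sectors point at the disk itself, while a climbing CRC error count more often points at cabling or the controller.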
 

Matdif

Explorer
Joined
Oct 10, 2014
Messages
59
I have a problem with the badblocks test and was wondering if someone could interpret it.

A month ago I had a problem with the motherboard I was using to set up FreeNAS: the C2750D4I from the recommended hardware thread. The bottom 4 SATA ports, on the Marvell controller, could become unreliable when loaded with more than 1 hard drive. I found another person with the same problem, and he says he still had it after getting an RMA replacement. Meanwhile, I had sent my board in for replacement and described the problem to ASRock over the phone. It took a month to get a board back to me, between UPS losing it, finding it, and sending it back to them, plus the holidays.

Now I have a board back, but it looks like an updated model. At least they changed the packaging from a black box to blue and such. The board itself looks much the same, but I was hopeful they had upgraded something or fixed the problem. I hooked everything back up, made sure to fill every Marvell SATA port as a test, and started a badblocks test again on all slots.

The problem is I am getting an error I did not get before.

I just started the tests, and every hard drive's badblocks test immediately gave me back "set_o_direct: Inappropriate ioctl for device". The first board I had did not give this error; I still have a screenshot from before.

You can see it in the pic. The tests seem to be continuing, so I will wait and see if I get any error emails about the Marvell controller becoming unstable like before. In the meantime, I was wondering what everyone thinks of this error. Is the test still good? Or has something changed in the board that will make the test unreliable? Is there something else I should do now to test and make sure this thing will handle a zpool well?
 

Attachments

  • badblocks_ser_o_direct.gif
    157.8 KB · Views: 629

horse_porcupine

Dabbler
Joined
Jun 11, 2013
Messages
22
I've tried running SMART tests on the drives that I'm currently running badblocks on, and they don't seem to be finishing for some reason. They seem to be stuck at 10% remaining. I'm guessing this will correct itself once badblocks has run. Looking at current results without having re-run a short, conveyance, or long test is meaningless, correct? I need a SMART test to successfully complete to properly interpret the results?

I was able to run the SMART tests prior to starting badblocks and everything looked fine, but it'll be interesting to see what they report as soon as badblocks is finished.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Not really. Any errors are immediately logged. A bad sector won't be found until it's read/written, but it'll be found during normal IO and not just tests. The short test does... something... at least we think it does. The long test is supposed to do a full surface scan. The conveyance test is only meant to be used after transport.

SMART failures tend to indicate a bad drive.
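If you want to keep an eye on a self-test that seems stuck, one option is to pull the "remaining" figure out of the smartctl -a text. This is only a sketch: it assumes the output contains a phrase of the form "NN% of test remaining", which is what ATA drives typically print while a test is in progress, and the function name is made up.

```python
import re

# Assumed wording of the in-progress status in smartctl -a output.
_REMAINING_RE = re.compile(r"(\d+)% of test remaining")


def selftest_percent_remaining(smart_text):
    """Return the percent of the self-test remaining, or None if not found."""
    match = _REMAINING_RE.search(smart_text)
    return int(match.group(1)) if match else None
```

A value that does not change over several checks while the drive is under heavy IO is consistent with the test being starved by that load rather than with a fault.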
 

horse_porcupine

Dabbler
Joined
Jun 11, 2013
Messages
22
Ah, I see. Well in that case, they're all "PASSED" and none of the numbers look too unusual. So now I'm not sure what to make of the badblocks errors. They haven't increased in number since earlier, so maybe it was some kind of blip, or I did do something that wrote to the disk and threw them off. Nothing to be done other than keep testing I suppose. Thanks again.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
When the tests are done, paste the SMART output into pastebin and link it here.
 