SOLVED Hard failure of drive, cannot mark as 'offline' per manual

Status
Not open for further replies.

eldo

Explorer
Joined
Dec 18, 2014
Messages
99
Over the last weekend (I know, I know, I should have posted a thread earlier) I received an email that my pool had gone into a degraded state.

When I got home from my business trip and could investigate, a drive had completely dropped off the system; volume status showed it as 'UNAVAIL' and gave no option to mark it as 'offline' per the manual:
http://doc.freenas.org/9.3/freenas_storage.html#replacing-a-failed-drive

I have since received an advance RMA replacement from WD for my 2TB Red (the 2nd failure of a 2TB Red in less than one year, out of 6 drives; not terribly pleased...). The new drive has passed the SMART short, conveyance, and long tests, badblocks, and a second SMART long test with no errors, and I am looking to get it resilvered as soon as I can so I'll sleep better at night.
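For anyone wanting to repeat that test sequence, here is a rough sketch of how it is commonly run from the shell. It only prints the commands rather than running them, and `/dev/ada5` in the usage line is a hypothetical device name, so substitute your own. The exact flags are my assumption of a typical burn-in, not a record of what was typed here.

```shell
#!/bin/sh
# Print (do not run) a burn-in command sequence for one disk.
# WARNING: `badblocks -w` is a destructive write test; if actually run,
# it overwrites the whole disk, so only use it on a disk with no data.
burn_in_plan() {
    dev=$1
    echo "smartctl -t short $dev"        # SMART short self-test
    echo "smartctl -t conveyance $dev"   # SMART conveyance self-test
    echo "smartctl -t long $dev"         # SMART extended self-test
    echo "badblocks -ws $dev"            # write-mode test with progress
    echo "smartctl -t long $dev"         # second extended self-test
}
```

Usage: `burn_in_plan /dev/ada5`. Each SMART self-test runs in the background on the drive, so let one finish (check with `smartctl -a`) before starting the next step.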

On investigation (I admit I may have jumped the gun by powering off the server and removing the drive), the drive's motor completely fails to spin up and the BIOS does not detect the drive when it is plugged in: a complete failure of the drive. I suspect the controller board has failed, but I do not want to risk swapping the old drive's board for the one on the new WD refurb.

My question is how to proceed, since I cannot mark the failed drive as offline.
I do have the 'replace' button when the drive is selected in 'volume status', and I assume I could pick up at 8.1.10.1 by clicking the 'replace' button, but I would like to verify this is the proper method and not risk any other gotchas, uh-ohs, or otherwise unpleasant and unexpected happenings.

I did click on replace, and the new drive shows up as I would expect -- but I did not click continue and instead decided to cancel the repair process unless there is something I am missing.

Any help greatly appreciated, thanks!
 

eldo

Explorer
Joined
Dec 18, 2014
Messages
99
upload_2015-10-17_2-7-58.png


This is what I have seen in the 'volume status' area, for clarity.
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
As you don't have the drive anymore, you can't offline it (I have more or less the same problem: if I plug the failed drive into the server, it won't boot...). Just hit replace and follow the documentation from there ;)
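If you want to double-check from the shell which member ZFS considers missing before hitting replace, a read-only filter over `zpool status` is enough. This is only a sketch: the pool name `tank` in the usage line is hypothetical, and the filter assumes the usual NAME/STATE column layout of the config section.

```shell
#!/bin/sh
# Read `zpool status` text on stdin and print every pool member whose
# STATE column is not ONLINE. The "state:" summary line is skipped.
unhealthy_members() {
    awk '$1 != "state:" && $2 ~ /^(UNAVAIL|FAULTED|OFFLINE|REMOVED|DEGRADED)$/ { print $1, $2 }'
}
```

Usage: `zpool status tank | unhealthy_members`. It changes nothing on the pool; the replacement itself is still done with the replace button in the GUI.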

Can you post the outputs of smartctl -a /dev/adaX (X = 0 to 5) between code tags for readability please? I wonder if you have a temperature or an LCC (load cycle count) problem with your drives, which would explain the failures.
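To save typing the command six times, something like the following could collect all the reports into one file. It is a sketch under the assumption that the drives really are ada0 to ada5; the function names and the output path in the usage line are made up for illustration.

```shell
#!/bin/sh
# Print the device nodes /dev/ada<first> .. /dev/ada<last>, one per line.
ada_devices() {
    first=$1
    last=$2
    i=$first
    while [ "$i" -le "$last" ]; do
        echo "/dev/ada$i"
        i=$((i + 1))
    done
}

# Run `smartctl -a` on ada0..ada5 and write all reports to one file.
# Nothing runs until this function is called; run it as root on the NAS.
collect_smart_reports() {
    out=$1
    : > "$out"                               # start with an empty file
    for dev in $(ada_devices 0 5); do
        echo "===== $dev =====" >> "$out"    # separator between drives
        smartctl -a "$dev" >> "$out" 2>&1
    done
}
```

Usage: `collect_smart_reports /tmp/smartctl-a.txt`, then attach or paste the resulting file.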
 

diedrichg

Wizard
Joined
Dec 4, 2012
Messages
1,319
In the spirit of troubleshooting, have you changed the case or PSU recently?
 

eldo

Explorer
Joined
Dec 18, 2014
Messages
99
Thanks Bidule0hm, that's what I thought, but I wanted to make sure.
diedrichg, I have not made any hardware changes since I put the box together in Dec last year.

I did check the other drives in case I had missed a SMART alert email, but the error counts looked fine.
For reference my smart service settings are: 30, Never, 0, 0, 40.
My temps seem to be 30 - 31 on the other 5 drives.

The smartctl -a /dev/adaX results are in the attachment, as I can't post more than 3000 characters.
 

Attachments

  • 2015-10-17-smartctl-a.txt
    39.8 KB · Views: 196

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
Every drive looks very good, no problem at all.

Did you burn in the drives before using them?
 

eldo

Explorer
Joined
Dec 18, 2014
Messages
99
Great, that's what I thought as well when I looked through them.

Yes, I did the smart/badblocks process on all of my drives prior to putting them into service.

Currently at 75% in the resilver process; once it completes I'll mark the thread as complete.

Thanks much for y'all's eyes.
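For reference, the percentage can also be read from the shell instead of the GUI. A small sketch, assuming `zpool status` reports progress with a "% done" figure as it does on my system; the pool name `tank` in the usage line is hypothetical.

```shell
#!/bin/sh
# Read `zpool status` text on stdin and print the first "NN.NN% done"
# figure as a bare number. Prints nothing if no scan is in progress.
resilver_percent() {
    sed -n 's/.*[ ,]\([0-9][0-9.]*\)% done.*/\1/p' | head -n 1
}
```

Usage: `zpool status tank | resilver_percent`. Running plain `zpool status` shows the same information along with the estimated time remaining.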
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
Ok, so then I'd say you just had bad luck to have 2 drives fail in less than one year.
 

eldo

Explorer
Joined
Dec 18, 2014
Messages
99
Agreed.
I've had plenty of drives, and these are the first two to fail, so I've been pretty lucky up to now.

The resilver is complete and my status shows as healthy. Thanks for the confirmation and for looking over my info.
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
Perfect, you're welcome ;)
 