FreeNAS 8.3 Release not responding after HDD pull

Status
Not open for further replies.

gizimoto

Cadet
Joined
Nov 30, 2012
Messages
9
Hi All

I am new to FreeNAS and ZFS in general, but I have a reasonable level of IT knowledge.

System: 10x3TB RAIDZ2, Xeon E3 12?? V2 CPU, 16GB ECC, SuperMicro X9SCM-F board (BIOS 2.0), 2x Dell SAS (LSI 2008) cards running Dell IT firmware. FreeNAS runs in an ESXi VM.

I worked through all the issues with the cards' firmware, etc. and had everything up and running, and got the update from beta to release sorted as well. It has been working fine for a week or so.

Today I pulled the SATA & power cables from one of the drives to test how to recover from a drive failure - bad idea (or possibly better to learn now than when the system is critical). I am not at risk of losing any data, as the content is 'dummy' data.

Now the web user interface does not respond. Firefox must be getting something back, as it never times out; it just sits at 'connecting'.

The console does not respond either. The last message was "(da10:mps1:0:10:0): lost device - 0 outstanding, 1 refs", and I get nothing back other than my key inputs echoed.

vSphere shows the CPU pegged at 100%.

SSH says host not responding (but I have never accessed the system via this method before either).

Help - what is the method of recovery now?

Cheers,
Mark
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Boy am I glad you wrote "not at risk of losing any data, as the content is 'dummy' data". The manual states clearly never to remove a drive while the system is on without informing the OS of your intent first.

Before I deployed my first server I pulled a drive just as you did, with dummy data. A scrub found no problems after reinserting the drive. I didn't have any problems with the system becoming unresponsive. In fact, the only way I knew the drive had been removed was a 'zpool status'. It would list more and more errors for the removed drive with each access of the zpool.
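For anyone who wants to spot this without eyeballing the output, here is a rough Python sketch that scans the text of 'zpool status' and lists any row that is not ONLINE or has a non-zero READ/WRITE/CKSUM counter. The sample text is made up for illustration (the pool name "tank" and the error counts are assumptions, not output from this system), and the regular expression only assumes the usual NAME/STATE/READ/WRITE/CKSUM column layout.

```python
import re
from typing import List, NamedTuple


class Row(NamedTuple):
    name: str
    state: str
    read: str
    write: str
    cksum: str


# An indented row: name, upper-case state, then three counters that start
# with a digit. The counters stay as strings because large values may be
# printed in abbreviated form. The header row is skipped automatically,
# since "READ" does not start with a digit.
ROW_RE = re.compile(r"^\s+(\S+)\s+([A-Z]+)\s+(\d\S*)\s+(\d\S*)\s+(\d\S*)")


def parse_rows(status_text: str) -> List[Row]:
    """Extract the pool/vdev/disk rows from 'zpool status' text."""
    rows = []
    for line in status_text.splitlines():
        match = ROW_RE.match(line)
        if match:
            rows.append(Row(*match.groups()))
    return rows


def unhealthy(rows: List[Row]) -> List[Row]:
    """Rows that are not ONLINE or that show any non-zero error counter."""
    return [
        r for r in rows
        if r.state != "ONLINE" or (r.read, r.write, r.cksum) != ("0", "0", "0")
    ]


# Hypothetical sample, NOT real output from the system in this thread.
SAMPLE = """\
  pool: tank
 state: DEGRADED
config:

    NAME        STATE     READ WRITE CKSUM
    tank        DEGRADED     0     0     0
      raidz2-0  DEGRADED     0     0     0
        da1     ONLINE       0     0     0
        da10    REMOVED      0    12     0
"""

for row in unhealthy(parse_rows(SAMPLE)):
    print(row.name, row.state, row.read, row.write, row.cksum)
```

On a live box you would feed it the captured output of 'zpool status' instead of SAMPLE. The pool and vdev rows are reported too, since they inherit the degraded state from the missing disk.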

For you, I guess your hardware can't handle a drive being pulled while powered on. But you should be able to reboot the machine with no ill effects except loss of any data that wasn't yet written to the zpool. Technically, there is a difference between pulling a drive with the system on and a hard disk failing. Usually (but not always) a failing hard disk will still communicate with your SATA controller. You also broke the current loop of the SATA signals by disconnecting the drive.

If I were in your shoes and testing a "failed disk" I'd:

1. Reboot your server and see if it works fine (it should).
2. Zero out the drive you removed (because the "removed" disk "failed", so you are using a "new" disk).
3. Then reinstall the new disk with the system online. The new drive may or may not work since your hardware didn't like the drive being pulled under load. The system may even lock up again.
4. Reboot the machine if necessary to make your drive available to the system.
5. Using the FreeNAS manual, follow the steps necessary to rebuild the zpool back to full redundancy.
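As a rough idea of what step 5 amounts to at the ZFS level, here is a small Python sketch that only builds and prints the generic zpool commands, without running anything. The FreeNAS manual's own procedure remains the reference; the pool name "tank" is an assumption, "da10" is taken from the console message earlier in the thread, and whether an explicit offline is needed for a disk that is already gone is also an assumption.

```python
from typing import List, Optional


def replacement_plan(pool: str, old_dev: str,
                     new_dev: Optional[str] = None) -> List[List[str]]:
    """Return, in order, the zpool commands for swapping out one disk.

    Nothing is executed here; the caller decides what to do with the plan.
    If new_dev is omitted, 'zpool replace' is given only the old device,
    which covers a fresh disk installed in the same location.
    """
    replace = ["zpool", "replace", pool, old_dev]
    if new_dev is not None and new_dev != old_dev:
        replace.append(new_dev)
    return [
        ["zpool", "status", pool],            # confirm which disk is missing
        ["zpool", "offline", pool, old_dev],  # tell ZFS the disk is going away
        replace,                              # start the resilver onto the new disk
        ["zpool", "status", pool],            # watch resilver progress
    ]


# Dry run: print the plan for a hypothetical pool called "tank".
for cmd in replacement_plan("tank", "da10"):
    print(" ".join(cmd))
```

Keeping it as a dry run is deliberate: on a system that already locked up once, it is worth reading each command before typing it.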
 

gizimoto

Cadet
Joined
Nov 30, 2012
Messages
9
Ok. I hot-replugged the drive, restarted the VM (not a hard reset), and the system is back online.

My concerns are that the logs report nothing and the pool appears healthy.

I don't have time to dig deeper into the logs tonight, but let me know if there are items I should check (yes - I should re-read the FAQ and manual).

I assume the system should not report Status OK after such a mishap.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
It shouldn't, but it does. There's a thread discussing this issue somewhere in the forum. Basically, you have to look at the nightly emails from FreeNAS and see that one drive has TONS of errors. Or wait until the GUI catches up, which I think it does at midnight or 3am. Can't remember for sure.
 