Replacing drives, may be facing data loss

Status
Not open for further replies.

pdoten

Dabbler
Joined
Mar 27, 2013
Messages
24
Version: FreeNAS 8.3

Back in March one of the 3 drives in my NAS went out (let’s call that drive #1). I replaced it per the instructions and it was easy; everything went smoothly.

Fast-forward to a few days ago: I noticed that my NAS had become pretty slow. It had been fine a few days prior, so I opened the GUI and found that my pool was degraded. I began troubleshooting the hardware, identified the bad drive, ordered a replacement, and shut down the NAS to prevent any further issues.

I was prepping the new drive last night, and when I loaded the GUI to mark the old drive offline, I didn’t have the option. Only the replace option was showing in the UI. I rebooted and launched SeaTools (since drives #2 and #3 are Seagate drives) to grab the serial number of the bad drive. I ran a quick test on the drive I suspected to be bad (#2) and on drive #3 as well. The suspected drive (#2) failed instantly. However, drive #3 also failed, at the end of the quick test.

I restarted FreeNAS once more to see if the GUI would give me the option to mark drive #2 as offline. It still did not. To make things scarier, the console was showing read errors on one drive, which corresponds to drive #3. I shut down and am waiting until I know what to do next.

At this point, facing possible data loss, what is my best course of action? To me it seems that my best course of action may be:

-Replace drive #2 in the GUI. This differs from the workflow in the manual/documentation and from what I did previously, so I am concerned about not having marked the drive offline first.

-Let drive #2 resilver

-Pray

-Once the resilver is complete, shut down

-Follow the replacement procedure for drive #3

Now if there are bad sectors on drive #3 and the only other copy of that data existed on drive #2, which is now a paperweight, I am SOL. I am not particularly familiar with the error handling of FreeNAS, so this concerns me quite a bit. On top of that, my only option in the GUI is the replace function, and since I never marked #2 offline I am worried that using it will mess up the pool even further. I have attached two screenshots of what I am seeing at the moment.

free-nas-1.JPG
free-nas-2.JPG
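
For anyone reading without the screenshots: the same information is available as text from `zpool status`. Below is a minimal Python sketch that picks out rows that are not ONLINE or that have non-zero error counters. The sample output is illustrative only and is not taken from the screenshots; the pool name `tank` and the device `ada0` are assumptions (ada1 and ada2 are the names used later in this thread), and FreeNAS normally shows gptid labels rather than bare device names.

```python
import re

# Illustrative sample only (assumed pool name "tank"); not the poster's real output.
SAMPLE_STATUS = """\
  pool: tank
 state: DEGRADED
config:

    NAME        STATE     READ WRITE CKSUM
    tank        DEGRADED     0     0     0
      raidz1-0  DEGRADED     0     0     0
        ada0    ONLINE       0     0     0
        ada1    UNAVAIL      0     0     0
        ada2    ONLINE      14     0     0

errors: No known data errors
"""

# Matches indented rows of the form: NAME STATE READ WRITE CKSUM.
# Assumes plain integer counters; abbreviated counts such as "1.2K" are not handled.
ROW = re.compile(r"^\s+(\S+)\s+([A-Z]+)\s+(\d+)\s+(\d+)\s+(\d+)")


def suspect_devices(status_text):
    """Return (name, state, read, write, cksum) for rows that are not ONLINE
    or that report any read, write, or checksum errors."""
    suspects = []
    for line in status_text.splitlines():
        match = ROW.match(line)
        if not match:
            continue
        name, state = match.group(1), match.group(2)
        read, write, cksum = (int(match.group(i)) for i in (3, 4, 5))
        if state != "ONLINE" or read or write or cksum:
            suspects.append((name, state, read, write, cksum))
    return suspects


if __name__ == "__main__":
    for row in suspect_devices(SAMPLE_STATUS):
        print(row)
```

Note that the pool and vdev rows are reported too, since they inherit the DEGRADED state from the missing disk. On a real system the text would come from running `zpool status` in the FreeNAS shell rather than from a sample string.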
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
If a drive is already offline you can just continue with the replacement procedure as normal. So you should shut down, replace the drive, start up, click replace in the GUI for the old drive, and select the new drive.
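
For reference, the ZFS operation behind that GUI button is `zpool replace`. The Python sketch below only builds and prints the command as a dry run, to show what the step amounts to. The pool name `tank` and the device name are assumptions, and the helper names are made up for illustration. As far as I know the FreeNAS GUI also prepares and labels the new disk before the replace, so the GUI remains the route to use on FreeNAS; this is not a substitute for it.

```python
import subprocess


def build_replace_command(pool, old_device, new_device=None):
    """Build the argv for `zpool replace pool old_device [new_device]`.

    When new_device is omitted, ZFS replaces the old device with the disk
    now found at the same location.
    """
    argv = ["zpool", "replace", pool, old_device]
    if new_device is not None:
        argv.append(new_device)
    for value in argv[2:]:
        # Reject empty names and anything that would be parsed as an option.
        if not value or value.startswith("-"):
            raise ValueError(f"suspicious argument: {value!r}")
    return argv


def run(argv, dry_run=True):
    """Print the command; execute it only when dry_run is False."""
    print(" ".join(argv))
    if not dry_run:
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Hypothetical names: pool "tank", failed disk "ada1" swapped in place.
    run(build_replace_command("tank", "ada1"))
```

Keeping `dry_run=True` as the default is deliberate: on a degraded RAIDZ1 pool a mistyped device name is the last thing you want to execute by accident.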
 

pdoten

Dabbler
Joined
Mar 27, 2013
Messages
24
"You've lost 1 drive and ada2 is dying. If I were you, at this point, I would try to copy all data out to somewhere else and then try to replace the drives. You might not have another chance."
I did back up my most important stuff but don't really have the storage to back up everything else.
 

mjws00

Guru
Joined
Jul 25, 2014
Messages
798
Sounds like you've done all you can. Time to cross your fingers and replace the drive. Nothing went sideways on the offline/replace procedure. FreeNAS automatically took the drive offline for you. The scary part is resilvering a degraded Z1 array with a bad existing drive. The good news is ZFS is much better at dealing with a bad block while resilvering than traditional RAID. Good luck.
 

pdoten

Dabbler
Joined
Mar 27, 2013
Messages
24
Well, I finished the replacement for drive #2 (ada1). It is currently resilvering. I have the replacement for drive #3 (ada2) arriving tomorrow.

This scene from Jurassic Park comes to mind...
4a5c519396c22eef4722df1c81c2d589854e5bfe5982e1824d7d6e6d8a2650fd.jpg
 

pdoten

Dabbler
Joined
Mar 27, 2013
Messages
24
"Any news?"
Yes!

Drive #2/ada1 finished resilvering yesterday. It took something like 20 hours but wasn't throwing a lot of explicit errors, so maybe it was just slow reads from the dying drive.

I swapped out drive #3/ada2 last night. Resilvering completed overnight without any errors that I saw. I spot-checked some recently accessed files and they were OK, so only further use will really indicate if there was any serious data loss. I think I may be in the clear though. Thanks for the input everyone. Hopefully these WD Reds are better than the Barracudas I replaced.
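
A spot check like that can be widened by reading every file back, since ZFS verifies block checksums on read and returns an I/O error when a block cannot be read or repaired. Here is a rough Python sketch of the idea; the function name and the example path are made up for illustration. It only reports files that fail to read, and reads served from cache may never touch the disks, so it is a weaker check than a scrub.

```python
import os


def unreadable_files(root, chunk_size=1024 * 1024):
    """Read every file under root and return (path, error) pairs for any
    file that raises OSError while being opened or read."""
    failed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as handle:
                    # Read in chunks so large files do not fill memory.
                    while handle.read(chunk_size):
                        pass
            except OSError as exc:
                failed.append((path, str(exc)))
    return failed


if __name__ == "__main__":
    # Hypothetical dataset path; substitute the real mount point.
    for path, error in unreadable_files("/mnt/tank"):
        print(f"{path}: {error}")
```

An empty result only means every file could be read end to end at that moment; it says nothing about metadata or snapshots that the walk never touches.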
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
I'd do a scrub. If you have no errors after the scrub is done you can thank the ZFS for keeping things in your favor. :P
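
When the scrub finishes, its result shows up on the `scan:` line of `zpool status`. The Python sketch below pulls the repaired amount, duration, and error count out of that line. The sample line is illustrative, with an assumed duration and a made-up timestamp, and the exact wording of the line varies between ZFS versions, so treat the pattern as an assumption to check against your own output.

```python
import re

# Pattern for a completed scrub, e.g.
# "scrub repaired 0 in 41h0m with 0 errors on <date>".
SCAN = re.compile(r"scrub repaired (\S+) in (.+?) with (\d+) errors")

# Illustrative line only; the duration and timestamp are made up.
SAMPLE_SCAN = " scan: scrub repaired 0 in 41h0m with 0 errors on Sun Jan  4 17:00:00 2015\n"


def scrub_summary(status_text):
    """Return repaired amount, duration, and error count of the last
    completed scrub, or None if no completed scrub is reported."""
    match = SCAN.search(status_text)
    if match is None:
        return None
    return {
        "repaired": match.group(1),
        "duration": match.group(2),
        "errors": int(match.group(3)),
    }


if __name__ == "__main__":
    print(scrub_summary(SAMPLE_SCAN))
```

A result with `errors` equal to 0 matches what you hope to see; anything else means `zpool status -v` is worth reading in full for the list of affected files.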
 

pdoten

Dabbler
Joined
Mar 27, 2013
Messages
24
cyberjock said: "I'd do a scrub. If you have no errors after the scrub is done you can thank the ZFS for keeping things in your favor. :P"
I ran a scrub and after 41 hours it finished with 0 errors. I have added a scheduled scrub to hopefully help keep things in order and I am dropping my off-site backup into the safe this weekend.

Thanks for the help everyone. Now to get my UPS working correctly.....
 