Replace a Drive

Status
Not open for further replies.

HDNewbie1028

Dabbler
Joined
Oct 8, 2014
Messages
26
Hi, I am new to FreeNAS. I built my box 2.5 years ago and this is a new problem for me.

I am running FreeNAS 9.2.1.8 (64-bit). I have a single RAIDZ2 vdev (raidz2-0) made up of four 4.0 TB WD disks: two are WD Green and two are WD Red.

One of my disks went bad a few days ago. I ordered a new WD Red drive and followed the instructions in the FreeNAS documentation to replace the drive: took it offline, shut down the box, replaced the drive, rebooted the box, brought the disk online, and resilvered. During the course of that process I received two error messages: another drive had apparently died, and a third has four bad sectors.

Not sure if I did something wrong or if all the drives are just failing. The status is currently degraded.

What should I do next please? I don't wish to lose anything and am actually planning on copying everything to an external hard drive to make sure I don't lose anything before the next step.

 

BigDave

FreeNAS Enthusiast
Joined
Oct 6, 2013
Messages
2,479
What should I do next please? I don't wish to lose anything and am actually planning on copying everything to an external hard drive to make sure I don't lose anything before the next step.

  • Make sure the original drive replacement finished the resilvering process before going forward.
  • From SSH, post (in code tags) the results of zpool status
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
Did you have periodic scrubs and SMART tests set up?

Are you 100% sure you removed the right drive? I.e., did you confirm it by serial number?
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
What should I do next please? I don't wish to lose anything and am actually planning on copying everything to an external hard drive to make sure I don't lose anything before the next step.
This is why you have RAIDZ2; your data should be fine. You should have a backup anyway, of course, as bad things happen (like catastrophic hardware failure and natural disasters, not to mention user error). As @BigDave says, show us the output of zpool status; that will help us advise you further.

Edit: Also please give the output of camcontrol devlist.
 

HDNewbie1028

Dabbler
Joined
Oct 8, 2014
Messages
26
  • Make sure the original drive replacement finished the resilvering process before going forward.
  • From SSH, post (in code tags) the results of zpool status
Yes, I thought it was done with the resilvering process. It wasn't - it hadn't even started as I had not detached the old drive. So I am waiting for it to finish. Is there a place in the GUI where I can check the progress of the resilver?
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
Is there a place in the GUI where I can check the progress of the resilver?
The Volume Status page. Go to Storage, click on the name of your pool (the first entry), then click the Volume Status button at the bottom of the screen--it looks like a blank sheet of notebook paper.
 

HDNewbie1028

Dabbler
Joined
Oct 8, 2014
Messages
26
Did you have periodic scrubs and SMART tests set up?

Are you 100% sure you removed the right drive? I.e., did you confirm it by serial number?
I did do monthly scrubs and they always came back fine. I did not have any SMART tests set up.

As for making sure I had the correct drive: I wrote down the serial number for each drive (ada0, ada1, etc.), so I am sure that I removed the correct one. The error said ada0 was the bad drive, and the serial number for ada0 is what I checked and pulled.
 

HDNewbie1028

Dabbler
Joined
Oct 8, 2014
Messages
26
The Volume Status page. Go to Storage, click on the name of your pool (the first entry), then click the Volume Status button at the bottom of the screen--it looks like a blank sheet of notebook paper.

Yes, and when I do that I get a listing of the 4 drives. Each says it is online. However, the Alert button at the top right is flashing and says that the resilver process is still going on?
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
I did not have any SMART tests set up.
That would be a good thing to change once you get your pool back in order. Enable the SMART service, set up email notifications, run short and long SMART self-tests regularly. The schedule is up to you--I run short tests daily and long weekly, but you could probably go as much as weekly/monthly without a real problem.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
However, the Alert button at the top right is flashing and says that the resilver process is still going on?
Let's go back to the earlier questions: What's the output of zpool status? And camcontrol devlist?
 

HDNewbie1028

Dabbler
Joined
Oct 8, 2014
Messages
26
  • Make sure the original drive replacement finished the resilvering process before going forward.
  • From SSH, post (in code tags) the results of zpool status
Hope this is correct...

[root@freenas ~]# zpool status
pool: homeshares
state: ONLINE
status: One or more devices is currently being resilvered. The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
scan: resilver in progress since Sat Jun 10 13:36:19 2017
2.88T scanned out of 3.37T at 375M/s, 0h22m to go
714G resilvered, 85.39% done
config:

NAME STATE READ WRITE CKSUM
homeshares ONLINE 0 0 0
raidz2-0 ONLINE 0 0 0
gptid/4e174663-4dfe-11e7-9803-10c37b9ce984 ONLINE 0 0 0 (resilvering)
gptid/09e19df8-4e72-11e4-8ce3-10c37b9ce984 ONLINE 0 0 0
gptid/0abecec0-4e72-11e4-8ce3-10c37b9ce984 ONLINE 0 0 0
gptid/0ba9766a-4e72-11e4-8ce3-10c37b9ce984 ONLINE 0 0 1 (resilvering)

errors: No known data errors



It doesn't look like I did it with code tags. Please tell me how to do that to make it easier to read. Thank you.
 

HDNewbie1028

Dabbler
Joined
Oct 8, 2014
Messages
26
This is why you have RAIDZ2; your data should be fine. You should have a backup anyway, of course, as bad things happen (like catastrophic hardware failure and natural disasters, not to mention user error). As @BigDave says, show us the output of zpool status; that will help us advise you further.

Edit: Also please give the output of camcontrol devlist.

Again, I am sorry, I don't know the code tags to make this easier to read...

[root@freenas ~]# camcontrol devlist
<WDC WD40EFRX-68N32N0 82.00A82> at scbus0 target 0 lun 0 (pass0,ada0)
<WDC WD40EZRX-00SPEB0 80.00A80> at scbus1 target 0 lun 0 (pass1,ada1)
<WDC WD40EFRX-68WT0N0 80.00A80> at scbus2 target 0 lun 0 (pass2,ada2)
<WDC WD40EZRX-00SPEB0 80.00A80> at scbus3 target 0 lun 0 (pass3,ada3)
<SanDisk Cruzer Fit 1.27> at scbus5 target 0 lun 0 (pass4,da0)
[root@freenas ~]#
 

BigDave

FreeNAS Enthusiast
Joined
Oct 6, 2013
Messages
2,479
[Attached image: codetags.JPG]
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
OK, so you have two disks being resilvered right now. Let them complete (which should only take another ten minutes or so), and see where you are.
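For anyone who wants to sanity-check the progress figures in the zpool status output posted above, the arithmetic is simple. This is only a rough sketch in Python: the function name is made up for illustration, and it assumes zpool's T and M suffixes are binary units (TiB and MiB).

```python
def resilver_estimate(scanned_tib, total_tib, rate_mib_per_s):
    """Return (percent_done, minutes_remaining) from zpool status figures."""
    percent_done = 100.0 * scanned_tib / total_tib
    # Assumption: 1 TiB = 1024 * 1024 MiB, matching zpool's binary suffixes.
    remaining_mib = (total_tib - scanned_tib) * 1024 * 1024
    minutes_remaining = remaining_mib / rate_mib_per_s / 60
    return percent_done, minutes_remaining


# Figures from the status output: 2.88T scanned out of 3.37T at 375M/s
percent, minutes = resilver_estimate(2.88, 3.37, 375)
print(f"{percent:.1f}% done, about {minutes:.0f} minutes to go")
```

With the posted figures this comes out at roughly 85% done and 22 to 23 minutes remaining, in line with what zpool reported. Small differences are expected because the displayed sizes and rate are rounded.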
 

HDNewbie1028

Dabbler
Joined
Oct 8, 2014
Messages
26
Code:
zpool status
  pool: homeshares
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://illumos.org/msg/ZFS-8000-9P
  scan: resilvered 873G in 3h40m with 0 errors on Sat Jun 10 17:17:04 2017
config:

        NAME                                            STATE     READ WRITE CKSUM
        homeshares                                      ONLINE       0     0     0
          raidz2-0                                      ONLINE       0     0     0
            gptid/4e174663-4dfe-11e7-9803-10c37b9ce984  ONLINE       0     0  296K
            gptid/09e19df8-4e72-11e4-8ce3-10c37b9ce984  ONLINE       0     0     0
            gptid/0abecec0-4e72-11e4-8ce3-10c37b9ce984  ONLINE       0     0     0
            gptid/0ba9766a-4e72-11e4-8ce3-10c37b9ce984  ONLINE       0     0  596K

errors: No known data errors
 

HDNewbie1028

Dabbler
Joined
Oct 8, 2014
Messages
26
Code:
camcontrol devlist
<WDC WD40EFRX-68N32N0 82.00A82>    at scbus0 target 0 lun 0 (pass0,ada0)
<WDC WD40EZRX-00SPEB0 80.00A80>    at scbus1 target 0 lun 0 (pass1,ada1)
<WDC WD40EFRX-68WT0N0 80.00A80>    at scbus2 target 0 lun 0 (pass2,ada2)
<WDC WD40EZRX-00SPEB0 80.00A80>    at scbus3 target 0 lun 0 (pass3,ada3)
<SanDisk Cruzer Fit 1.27>          at scbus5 target 0 lun 0 (pass4,da0)
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
For now, try swapping cables between that and a different disk, then do a scrub. See if that changes where the checksum errors are seen.
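To make the before/after comparison easier, you could save the zpool status text before swapping cables and again after the scrub, then pull out just the devices with non-zero checksum counts. A minimal sketch in Python; the function name is made up for illustration, and it assumes the pool members are listed by gptid, as they are in this thread.

```python
def checksum_errors(status_text):
    """Return {device: CKSUM column} for devices whose CKSUM is not '0'."""
    errors = {}
    for line in status_text.splitlines():
        fields = line.split()
        # Device rows look like: NAME STATE READ WRITE CKSUM [notes...]
        # Assumption: member disks are named gptid/..., as in this pool.
        if len(fields) >= 5 and fields[0].startswith("gptid/"):
            cksum = fields[4]
            if cksum != "0":
                errors[fields[0]] = cksum
    return errors


# Config section copied from the zpool status output posted in this thread
sample = """\
NAME                                            STATE     READ WRITE CKSUM
homeshares                                      ONLINE       0     0     0
  raidz2-0                                      ONLINE       0     0     0
    gptid/4e174663-4dfe-11e7-9803-10c37b9ce984  ONLINE       0     0  296K
    gptid/09e19df8-4e72-11e4-8ce3-10c37b9ce984  ONLINE       0     0     0
    gptid/0abecec0-4e72-11e4-8ce3-10c37b9ce984  ONLINE       0     0     0
    gptid/0ba9766a-4e72-11e4-8ce3-10c37b9ce984  ONLINE       0     0  596K
"""

for device, count in checksum_errors(sample).items():
    print(device, count)
```

The counts are kept as the strings zpool prints (for example 296K) rather than converted to numbers, since the point is only to see which devices show errors. If the errors turn up against a different device after the cable swap, that would suggest the cable or port rather than the disk, though that is an inference to confirm with further testing.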
 