SOLVED volume is degraded after taking out a disk

Status
Not open for further replies.

litan

Cadet
Joined
Dec 20, 2017
Messages
7
Hi,

One of the disks was mistakenly pulled out of the NAS, and the pool status now shows as degraded.

Alert message:
Code:
Device: /dev/da4, SMART Failure: HARDWARE IMPENDING FAILURE TOO MANY BLOCK REASSIGNS
Device: /dev/da4, failed to read SMART values
The volume default state is DEGRADED: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected.
Device: /dev/da4, Read SMART Self-Test Log Failed


The web GUI is not accessible at the moment, but SSH still works.

zpool status shows the following:
Code:
root@freenas:~ # zpool status
  pool: default
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
	attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
	using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://illumos.org/msg/ZFS-8000-9P
  scan: scrub repaired 392K in 17h35m with 0 errors on Sun Nov 26 17:35:48 2017
config:

	NAME                                            STATE     READ WRITE CKSUM
	default                                         DEGRADED     0     0     0
	  raidz2-0                                      DEGRADED     0     0     0
	    gptid/87859b23-f7e5-11e5-bbbf-0025909b4d4a  ONLINE       0     0     0
	    gptid/8a77d7c9-f7e5-11e5-bbbf-0025909b4d4a  ONLINE       0     0     0
	    gptid/8d67c8a1-f7e5-11e5-bbbf-0025909b4d4a  ONLINE       0     0     0
	    1255253969284545076                         REMOVED      0     0     0  was /dev/gptid/9057d78c-f7e5-11e5-bbbf-0025909b4d4a
	    gptid/93449ee9-f7e5-11e5-bbbf-0025909b4d4a  ONLINE       0     0     2
	    gptid/962b228d-f7e5-11e5-bbbf-0025909b4d4a  ONLINE       0     0     0
	    gptid/9918cd79-f7e5-11e5-bbbf-0025909b4d4a  ONLINE       0     0     0
	    gptid/9c0990fc-f7e5-11e5-bbbf-0025909b4d4a  ONLINE       0     0     0
	logs
	  gptid/9cad6899-f7e5-11e5-bbbf-0025909b4d4a    ONLINE       0     0     0
	cache
	  ada0p1                                        ONLINE       0     0     0

errors: No known data errors

  pool: freenas-boot
 state: ONLINE
  scan: scrub repaired 0 in 0h1m with 0 errors on Tue Dec 19 03:46:57 2017
config:

	NAME          STATE     READ WRITE CKSUM
	freenas-boot  ONLINE       0     0     0
	  da8p2       ONLINE       0     0     0

errors: No known data errors

Can someone suggest how to fix this? Thanks in advance.
 

litan

Sorry, none of our admins knows much about FreeNAS. The message above suggests running zpool clear, b
Put the disk back in and reboot?
Thanks for the quick response.
The first thing we did was put the disk back in the same place.
However, we can't reboot during business hours, so we'd like to know whether there is anything else we can do to get the pool back to normal.
We are also worried about whether the volume will come back up after a reboot...

Thanks again.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
The first thing we did was put the disk back in the same place.
Is the zpool status output you posted after doing that? If so, try doing zpool online default gptid/9057d78c-f7e5-11e5-bbbf-0025909b4d4a. It should bring the disk back online and resilver whatever needs to be resilvered (likely not much).
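As a rough sketch of that suggestion, using the pool name and gptid from the status output above. The `reonline_disk` wrapper is a hypothetical helper made up for illustration, not part of ZFS; the two `zpool` subcommands it calls are the standard ones.

```shell
# Hypothetical helper: bring a re-inserted disk back online, then show the pool.
# Usage: reonline_disk <pool> <device>
reonline_disk() {
    pool="$1"
    dev="$2"
    # 'zpool online' asks ZFS to start using the device again; a resilver of
    # whatever changed while the disk was out should then start on its own.
    zpool online "$pool" "$dev" || return 1
    # Show the pool state and any resilver progress.
    zpool status "$pool"
}

# Example, with the values from this thread:
# reonline_disk default gptid/9057d78c-f7e5-11e5-bbbf-0025909b4d4a
```

Running `zpool status default` again later shows how far the resilver has got.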
 

litan

Much, much, much... much appreciated. Thanks for your reply.
Here is the current output:
Code:
root@freenas:~ # zpool status
  pool: default
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
	continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Thu Dec 21 12:47:55 2017
		15.4M scanned out of 2.35T at 7.70M/s, 88h42m to go
		1.67M resilvered, 0.00% done
config:

	NAME                                            STATE     READ WRITE CKSUM
	default                                         ONLINE       0     0     0
	  raidz2-0                                      ONLINE       0     0     0
	    gptid/87859b23-f7e5-11e5-bbbf-0025909b4d4a  ONLINE       0     0     0
	    gptid/8a77d7c9-f7e5-11e5-bbbf-0025909b4d4a  ONLINE       0     0     0
	    gptid/8d67c8a1-f7e5-11e5-bbbf-0025909b4d4a  ONLINE       0     0     0
	    gptid/9057d78c-f7e5-11e5-bbbf-0025909b4d4a  ONLINE       0     0     0  (resilvering)
	    gptid/93449ee9-f7e5-11e5-bbbf-0025909b4d4a  ONLINE       0     0     3
	    gptid/962b228d-f7e5-11e5-bbbf-0025909b4d4a  ONLINE       0     0     0
	    gptid/9918cd79-f7e5-11e5-bbbf-0025909b4d4a  ONLINE       0     0     0
	    gptid/9c0990fc-f7e5-11e5-bbbf-0025909b4d4a  ONLINE       0     0     0
	logs
	  gptid/9cad6899-f7e5-11e5-bbbf-0025909b4d4a    ONLINE       0     0     0
	cache
	  ada0p1                                        ONLINE       0     0     0

errors: No known data errors

  pool: freenas-boot
 state: ONLINE
  scan: scrub repaired 0 in 0h1m with 0 errors on Tue Dec 19 03:46:57 2017
config:

	NAME          STATE     READ WRITE CKSUM
	freenas-boot  ONLINE       0     0     0
	  da8p2       ONLINE       0     0     0

errors: No known data errors
 

litan

Is the zpool status output you posted after doing that? If so, try doing zpool online default gptid/9057d78c-f7e5-11e5-bbbf-0025909b4d4a. It should bring the disk back online and resilver whatever needs to be resilvered (likely not much).

By the way, the web GUI is still not accessible. The browser status bar shows: Waiting for available socket...

I assume this will be resolved once the resilvering finishes?
 

danb35

I assume this will be resolved once the resilvering finishes?
Perhaps, or the disk being out might have killed it until either a reboot or until the service is restarted. Was the system running when the disk was removed?
 

litan

Yes, the system seems to have kept running since the accident: we have been able to SSH in and ping it from the moment we took out the disk until now, after following your advice to bring the disk online.
Checking the uptime of the VMs (their iSCSI images are on this NAS) also shows no disconnection since then.

So yes, the system was running, and kept running, when the disk was removed.
 

danb35

So yes, the system was running, and kept running, when the disk was removed.
In that case, if the web GUI comes back up when resilvering finishes, great. Otherwise, that service may have crashed (perhaps it was using some swap which was on the removed disk--this should be addressed in 11.1), but rebooting the server at your next convenient opportunity should bring it back up in any event.
 

litan

Thanks. After resilvering, all disks are online and status looks normal:
Code:
root@freenas:~ # zpool status
  pool: default
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
	attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
	using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://illumos.org/msg/ZFS-8000-9P
  scan: resilvered 292M in 0h8m with 0 errors on Thu Dec 21 12:56:35 2017
config:

	NAME                                            STATE     READ WRITE CKSUM
	default                                         ONLINE       0     0     0
	  raidz2-0                                      ONLINE       0     0     0
	    gptid/87859b23-f7e5-11e5-bbbf-0025909b4d4a  ONLINE       0     0     0
	    gptid/8a77d7c9-f7e5-11e5-bbbf-0025909b4d4a  ONLINE       0     0     0
	    gptid/8d67c8a1-f7e5-11e5-bbbf-0025909b4d4a  ONLINE       0     0     0
	    gptid/9057d78c-f7e5-11e5-bbbf-0025909b4d4a  ONLINE       0     0     1
	    gptid/93449ee9-f7e5-11e5-bbbf-0025909b4d4a  ONLINE       0     0     5
	    gptid/962b228d-f7e5-11e5-bbbf-0025909b4d4a  ONLINE       0     0     0
	    gptid/9918cd79-f7e5-11e5-bbbf-0025909b4d4a  ONLINE       0     0     0
	    gptid/9c0990fc-f7e5-11e5-bbbf-0025909b4d4a  ONLINE       0     0     0
	logs
	  gptid/9cad6899-f7e5-11e5-bbbf-0025909b4d4a    ONLINE       0     0     0
	cache
	  ada0p1                                        ONLINE       0     0     0

errors: No known data errors

  pool: freenas-boot
 state: ONLINE
  scan: scrub repaired 0 in 0h1m with 0 errors on Tue Dec 19 03:46:57 2017
config:

	NAME          STATE     READ WRITE CKSUM
	freenas-boot  ONLINE       0     0     0
	  da8p2       ONLINE       0     0     0

errors: No known data errors

However, the web GUI is still not up. We will follow the advice and reboot in the next maintenance window.

Thanks again.
 

danb35

all disks are online and status looks normal
Not really--the fourth and fifth disks are still showing errors. You'll need to do something about that. But your pool's back to full redundancy at least.
 

rs225

Guru
Joined
Jun 28, 2014
Messages
878
Those errors could be a cable or enclosure problem. I would check the SMART data on those two drives, just to be sure.
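One possible way to do that check from the shell, as a sketch only. It assumes `glabel status` lists each gptid label with its partition in the third column (as on stock FreeBSD) and that `smartctl` can reach the disks directly; behind some controllers it may need a `-d` option. The helper names `gptid_to_disk` and `smart_for_gptid` are made up for illustration.

```shell
# Hypothetical helper: map a gptid label to its whole-disk device name,
# e.g. a label whose component is da3p2 resolves to da3.
gptid_to_disk() {
    glabel status | awk -v id="$1" '$1 == id { sub(/p[0-9]+$/, "", $3); print $3 }'
}

# Hypothetical helper: print the full SMART report for the disk behind a gptid.
smart_for_gptid() {
    disk=$(gptid_to_disk "$1")
    if [ -z "$disk" ]; then
        echo "no disk found for $1" >&2
        return 1
    fi
    smartctl -a "/dev/$disk"
}

# Example, with the two labels that show checksum errors in this thread:
# smart_for_gptid gptid/9057d78c-f7e5-11e5-bbbf-0025909b4d4a
# smart_for_gptid gptid/93449ee9-f7e5-11e5-bbbf-0025909b4d4a
```

In the report, the reallocated and pending sector counts and the self-test log are the usual places to look first.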
 

litan

Not really--the fourth and fifth disks are still showing errors. You'll need to do something about that. But your pool's back to full redundancy at least.
Hi danb35, may I ask which line shows that the 4th and 5th disks still have a problem? Thanks.
 

danb35

Hi danb35, may I ask which line shows that the 4th and 5th disks still have a problem? Thanks.
It's the two lines of the zpool status output for the fourth and fifth disks in your pool (gptid/9057d78c-... and gptid/93449ee9-...). Both show a nonzero value in the CKSUM column (1 and 5), which means data read from the disk didn't match its checksum. Those could be errors with the disks, with the cables, with the backplane, or possibly other places, but something's still not 100% right.
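To pick those lines out without reading the columns by eye, a small filter along these lines could help. It is only a sketch: `nonzero_errors` is a made-up name, and it assumes the usual `NAME STATE READ WRITE CKSUM` column layout shown in the outputs above.

```shell
# Hypothetical filter: read 'zpool status' text on stdin and print every
# device line whose READ, WRITE, or CKSUM counter is not 0.
nonzero_errors() {
    awk '$2 ~ /^(ONLINE|DEGRADED|FAULTED|OFFLINE|REMOVED|UNAVAIL)$/ && NF >= 5 &&
         ($3 != "0" || $4 != "0" || $5 != "0") {
             print $1, "read=" $3, "write=" $4, "cksum=" $5
         }'
}

# Example:
# zpool status default | nonzero_errors
```

Once the cause has been dealt with, `zpool clear default` resets the counters, and a later `zpool scrub default` should show whether the errors come back.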
 