SOLVED volume is degraded after taking out a disk

litan · Dec 20, 2017

Hi,

One of the disks was taken out of the NAS by mistake, and the pool status now shows as degraded.

Alert message:

Code:

Device: /dev/da4, SMART Failure: HARDWARE IMPENDING FAILURE TOO MANY BLOCK REASSIGNS
Device: /dev/da4, failed to read SMART values
The volume default state is DEGRADED: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected.
Device: /dev/da4, Read SMART Self-Test Log Failed

The web GUI is not accessible at the moment, but SSH is still working.

zpool status shows the following:

Code:

root@freenas:~ # zpool status
  pool: default
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
	attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
	using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://illumos.org/msg/ZFS-8000-9P
  scan: scrub repaired 392K in 17h35m with 0 errors on Sun Nov 26 17:35:48 2017
config:

	NAME											STATE	 READ WRITE CKSUM
	default										 DEGRADED	 0	 0	 0
	  raidz2-0									  DEGRADED	 0	 0	 0
		gptid/87859b23-f7e5-11e5-bbbf-0025909b4d4a  ONLINE	   0	 0	 0
		gptid/8a77d7c9-f7e5-11e5-bbbf-0025909b4d4a  ONLINE	   0	 0	 0
		gptid/8d67c8a1-f7e5-11e5-bbbf-0025909b4d4a  ONLINE	   0	 0	 0
		1255253969284545076						 REMOVED	  0	 0	 0  was /dev/gptid/9057d78c-f7e5-11e5-bbbf-0025909b4d4a
		gptid/93449ee9-f7e5-11e5-bbbf-0025909b4d4a  ONLINE	   0	 0	 2
		gptid/962b228d-f7e5-11e5-bbbf-0025909b4d4a  ONLINE	   0	 0	 0
		gptid/9918cd79-f7e5-11e5-bbbf-0025909b4d4a  ONLINE	   0	 0	 0
		gptid/9c0990fc-f7e5-11e5-bbbf-0025909b4d4a  ONLINE	   0	 0	 0
	logs
	  gptid/9cad6899-f7e5-11e5-bbbf-0025909b4d4a	ONLINE	   0	 0	 0
	cache
	  ada0p1										ONLINE	   0	 0	 0

errors: No known data errors

  pool: freenas-boot
 state: ONLINE
  scan: scrub repaired 0 in 0h1m with 0 errors on Tue Dec 19 03:46:57 2017
config:

	NAME		STATE	 READ WRITE CKSUM
	freenas-boot  ONLINE	   0	 0	 0
	  da8p2	 ONLINE	   0	 0	 0

errors: No known data errors

Can someone give me an idea of how to fix this? Thanks in advance.

danb35 · Dec 20, 2017

litan said:
Can someone give any idea how to fix this?

Put the disk back in and reboot?

litan · Dec 20, 2017

Sorry, none of our admins knows much about FreeNAS, though the message above suggests running zpool clear, b

danb35 said:
Put the disk back in and reboot?

Thanks for the quick response.
The first thing we did was put the disk back in the same slot.
However, we can't reboot it during business hours, so we want to see if there is anything else we can do to get it back to normal.
We are also worried about whether the volume will come back up after a reboot...

Thanks again.

danb35 · Dec 20, 2017

litan said:
The first thing we did was put the disk back to same place

Is the zpool status output you posted after doing that? If so, try doing zpool online default gptid/9057d78c-f7e5-11e5-bbbf-0025909b4d4a. It should bring the disk back online and resilver whatever needs to be resilvered (likely not much).
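For anyone landing here with the same symptom, the step above can be sketched as a small dry-run helper. This is only an illustration, not something from the thread: the function name suggest_online is made up, and it assumes the removed disk's row carries a "was /dev/..." note like the one in the status output posted earlier. It prints the command instead of running it, so you can check it first.

```shell
# Hypothetical helper: read `zpool status` text on stdin and print (not run)
# a `zpool online` command for every device reported as REMOVED or OFFLINE.
# Assumption: the original device path appears after the word "was".
suggest_online() {
    pool=$1
    awk -v pool="$pool" '
        $2 == "REMOVED" || $2 == "OFFLINE" {
            dev = $1
            for (i = 1; i < NF; i++)
                if ($i == "was") { dev = $(i + 1); sub("^/dev/", "", dev) }
            print "zpool online " pool " " dev
        }
    '
}

# Usage on the NAS:
#   zpool status default | suggest_online default
```

Once the printed command looks right, run it by hand and check zpool status again to watch the resilver.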

litan · Dec 20, 2017

Much, much, much appreciated. Thank you for your reply.
Here is the current output:

Code:

root@freenas:~ # zpool status
  pool: default
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
	continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Thu Dec 21 12:47:55 2017
		15.4M scanned out of 2.35T at 7.70M/s, 88h42m to go
		1.67M resilvered, 0.00% done
config:

	NAME											STATE	 READ WRITE CKSUM
	default										 ONLINE	   0	 0	 0
	  raidz2-0									  ONLINE	   0	 0	 0
		gptid/87859b23-f7e5-11e5-bbbf-0025909b4d4a  ONLINE	   0	 0	 0
		gptid/8a77d7c9-f7e5-11e5-bbbf-0025909b4d4a  ONLINE	   0	 0	 0
		gptid/8d67c8a1-f7e5-11e5-bbbf-0025909b4d4a  ONLINE	   0	 0	 0
		gptid/9057d78c-f7e5-11e5-bbbf-0025909b4d4a  ONLINE	   0	 0	 0  (resilvering)
		gptid/93449ee9-f7e5-11e5-bbbf-0025909b4d4a  ONLINE	   0	 0	 3
		gptid/962b228d-f7e5-11e5-bbbf-0025909b4d4a  ONLINE	   0	 0	 0
		gptid/9918cd79-f7e5-11e5-bbbf-0025909b4d4a  ONLINE	   0	 0	 0
		gptid/9c0990fc-f7e5-11e5-bbbf-0025909b4d4a  ONLINE	   0	 0	 0
	logs
	  gptid/9cad6899-f7e5-11e5-bbbf-0025909b4d4a	ONLINE	   0	 0	 0
	cache
	  ada0p1										ONLINE	   0	 0	 0

errors: No known data errors

  pool: freenas-boot
 state: ONLINE
  scan: scrub repaired 0 in 0h1m with 0 errors on Tue Dec 19 03:46:57 2017
config:

	NAME		STATE	 READ WRITE CKSUM
	freenas-boot  ONLINE	   0	 0	 0
	  da8p2	 ONLINE	   0	 0	 0

errors: No known data errors

litan · Dec 20, 2017

danb35 said:
Is the zpool status output you posted after doing that? If so, try doing zpool online default gptid/9057d78c-f7e5-11e5-bbbf-0025909b4d4a. It should bring the disk back online and resilver whatever needs to be resilvered (likely not much).

By the way, the web GUI is still not accessible. The browser status bar shows: Waiting for available socket...

Assume this will be resolved after the resilvering?

danb35 · Dec 20, 2017

litan said:
Assume this will be resolved after the resilvering?

Perhaps, or the disk being out might have killed it until either a reboot or until the service is restarted. Was the system running when the disk was removed?

litan · Dec 20, 2017

Yes, the system seems to have kept running since the accident: we have been able to SSH in and ping it from the moment we took out the disk until now, after following your advice to bring the disk online.
Also, judging by the uptime of the VMs (iSCSI images on this NAS), there hasn't been any disconnection since then.

So yes, the system was running, and kept running, when the disk was removed.

danb35 · Dec 20, 2017

litan said:
So yes, the system was running, and kept running, when the disk was removed.

In that case, if the web GUI comes back up when resilvering finishes, great. Otherwise, that service may have crashed (perhaps it was using some swap which was on the removed disk--this should be addressed in 11.1), but rebooting the server at your next convenient opportunity should bring it back up in any event.

litan · Dec 20, 2017

Thanks. After resilvering, all disks are online and status looks normal:

Code:

root@freenas:~ # zpool status
  pool: default
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
	attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
	using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://illumos.org/msg/ZFS-8000-9P
  scan: resilvered 292M in 0h8m with 0 errors on Thu Dec 21 12:56:35 2017
config:

	NAME											STATE	 READ WRITE CKSUM
	default										 ONLINE	   0	 0	 0
	  raidz2-0									  ONLINE	   0	 0	 0
		gptid/87859b23-f7e5-11e5-bbbf-0025909b4d4a  ONLINE	   0	 0	 0
		gptid/8a77d7c9-f7e5-11e5-bbbf-0025909b4d4a  ONLINE	   0	 0	 0
		gptid/8d67c8a1-f7e5-11e5-bbbf-0025909b4d4a  ONLINE	   0	 0	 0
		gptid/9057d78c-f7e5-11e5-bbbf-0025909b4d4a  ONLINE	   0	 0	 1
		gptid/93449ee9-f7e5-11e5-bbbf-0025909b4d4a  ONLINE	   0	 0	 5
		gptid/962b228d-f7e5-11e5-bbbf-0025909b4d4a  ONLINE	   0	 0	 0
		gptid/9918cd79-f7e5-11e5-bbbf-0025909b4d4a  ONLINE	   0	 0	 0
		gptid/9c0990fc-f7e5-11e5-bbbf-0025909b4d4a  ONLINE	   0	 0	 0
	logs
	  gptid/9cad6899-f7e5-11e5-bbbf-0025909b4d4a	ONLINE	   0	 0	 0
	cache
	  ada0p1										ONLINE	   0	 0	 0

errors: No known data errors

  pool: freenas-boot
 state: ONLINE
  scan: scrub repaired 0 in 0h1m with 0 errors on Tue Dec 19 03:46:57 2017
config:

	NAME		STATE	 READ WRITE CKSUM
	freenas-boot  ONLINE	   0	 0	 0
	  da8p2	 ONLINE	   0	 0	 0

errors: No known data errors

However, the web GUI is still not up. We will follow the advice and reboot in the next maintenance window.

Thanks again.

danb35 · Dec 21, 2017

litan said:
all disks are online and status looks normal

Not really--the fourth and fifth disks are still showing errors. You'll need to do something about that. But your pool's back to full redundancy at least.

rs225 · Dec 21, 2017

Those errors could be caused by a cable or enclosure problem. I would check the SMART data on those two drives, just to be sure.
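A rough sketch of how that check might go on FreeNAS, since zpool status lists gptid labels while smartctl wants a disk device. The helper name gptid_to_disk is made up for illustration, and it assumes the usual three-column glabel status layout (Name, Status, Components) with partition names such as da4p2. Which physical disk each label maps to is not shown anywhere in this thread, so look it up rather than guessing.

```shell
# Hypothetical helper: read `glabel status` text on stdin and print the disk
# device that carries the given gptid label.
# Assumption: columns are Name, Status, Components (e.g. "gptid/... N/A da4p2").
gptid_to_disk() {
    awk -v label="$1" '
        $1 == label {
            disk = $3
            sub(/p[0-9]+$/, "", disk)   # strip the partition suffix (da4p2 -> da4)
            print "/dev/" disk
        }
    '
}

# Usage on the NAS, with a label taken from a zpool status row that shows errors:
#   disk=$(glabel status | gptid_to_disk gptid/93449ee9-f7e5-11e5-bbbf-0025909b4d4a)
#   smartctl -a "$disk"
```

Disks behind some controllers may need extra smartctl options, so treat this as a starting point.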

litan · Dec 21, 2017

danb35 said:
Not really--the fourth and fifth disks are still showing errors. You'll need to do something about that. But your pool's back to full redundancy at least.

Hi danb35, may I ask which lines show that the 4th and 5th disks still have a problem? Thanks.

danb35 · Dec 21, 2017

litan said:
Hi danb35, may I ask which lines show that the 4th and 5th disks still have a problem? Thanks.

It's the lines of your last zpool status output for the fourth and fifth disks in the pool (gptid/9057d78c-... and gptid/93449ee9-...). Their CKSUM column shows 1 and 5, while every other disk shows 0. Those are checksum errors, which means the data read from the disk didn't match its checksum. They could come from the disks, the cables, the backplane, or possibly other places, but something's still not 100% right.
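If picking those rows out by eye is awkward, something like the following could do it. It is a minimal sketch with a made-up function name (zpool_error_rows), and it assumes device rows have the NAME STATE READ WRITE CKSUM layout seen in the outputs above.

```shell
# Hypothetical helper: read `zpool status` text on stdin and print every
# device row whose READ, WRITE or CKSUM counter is not zero.
zpool_error_rows() {
    awk '
        # Device rows look like: NAME STATE READ WRITE CKSUM [notes...]
        NF >= 5 && $3 ~ /^[0-9]/ && $4 ~ /^[0-9]/ && $5 ~ /^[0-9]/ {
            if ($3 != "0" || $4 != "0" || $5 != "0")
                printf "%s read=%s write=%s cksum=%s\n", $1, $3, $4, $5
        }
    '
}

# Usage on the NAS:
#   zpool status default | zpool_error_rows
```

An empty result means no device row is reporting errors. Any device it does list is worth a closer look before clearing the counters with zpool clear.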
