Pool is degraded (mystery disk)

Status
Not open for further replies.

marcevan

Patron
Joined
Dec 15, 2013
Messages
432
This morning I got a red alert saying my pool is degraded.

I looked at the volume status and there are my 5 WD 3.0 TB drives plus one I've not seen before.

When I set up the pool I had put in my 5 HDs and nothing else (not even a CD drive or card reader). I believe the USB boot device would never be eligible for the pool anyway.

When I look at the disks, there are only my 5 disks, btw.

zpool status shows the 5 HDs plus the unknown one, with status UNAVAIL and a note saying it "was /dev/gptid/etc", an ID that looks close to the ones shown for my 5 drives.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Output of zpool status, camcontrol devlist and smartctl -a /dev/adaX for all drives, please.
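
For anyone gathering the same information, here is a minimal Python sketch of the collection step. It assumes Python 3.7 or later is available on the box, and the helper names (`diagnostic_commands`, `run_all`) and the device names passed in are illustrative assumptions, not anything FreeNAS provides. Use whatever device names `camcontrol devlist` actually reports.

```python
import subprocess


def diagnostic_commands(devices):
    """Build the three kinds of commands requested above.

    `devices` is a list of device names such as "ada0" (assumed names).
    """
    commands = [["zpool", "status"], ["camcontrol", "devlist"]]
    for dev in devices:
        commands.append(["smartctl", "-a", f"/dev/{dev}"])
    return commands


def run_all(commands):
    """Run each command and return {command string: stdout + stderr}."""
    results = {}
    for cmd in commands:
        key = " ".join(cmd)
        try:
            proc = subprocess.run(cmd, capture_output=True, text=True)
            results[key] = proc.stdout + proc.stderr
        except FileNotFoundError:
            # The tool is not installed on the machine running this sketch.
            results[key] = "(command not found on this system)"
    return results
```

Calling `run_all(diagnostic_commands(["ada0", "ada1"]))` returns a dictionary whose values can be pasted into a reply as-is.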
 

DrKK

FreeNAS Generalissimo
Joined
Oct 15, 2013
Messages
3,630
Indeed. That is one strange problem sir.

Please provide what @Ericloewe asked for, and we will be able to tell you what's going on.

Also, an AMD A4-6300 is certainly not what we'd recommend, generally speaking, for a FreeNAS :) But certainly that's not causing the problem.
 

styno

Patron
Joined
Apr 11, 2016
Messages
466
If I were you, I would back up, power off, and power on, in that order.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Did you do any drive replacements, mess around with the unit at all (move it)? I'm looking at a possible hardware disconnect and reconnect of a hard drive.

Provide the requested data, it will help a lot. Also, are you running FreeNAS 9.10 as indicated in your signature or have you upgraded?
 

marcevan

Patron
Joined
Dec 15, 2013
Messages
432
I replaced a drive that had a bad cable (the SATA cable was cracked near the motherboard).

New cable, new 4TB drive; I ran the replace from the GUI and it resilvered.

Here's the email I got before I did this (yesterday) so note the drive is now different (4TB) and says "UNAVAILABLE" but otherwise looks like below...

Code:
  pool: Media
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
  scan: resilvered 1.07M in 0h0m with 0 errors on Sun Jul 23 17:11:04 2017
config:

        NAME                                            STATE     READ WRITE CKSUM
        Media                                           DEGRADED     0     0     0
          raidz2-0                                      DEGRADED     0     0     0
            gptid/41804705-6726-11e3-831d-6805ca1adbd8  ONLINE       0     0     0
            gptid/41eefd8e-6726-11e3-831d-6805ca1adbd8  ONLINE       0     0     0
            8490885614271992732                         OFFLINE      0     0     0  was /dev/gptid/426f00fc-6726-11e3-831d-6805ca1adbd8
            gptid/dc16e4b0-8c2c-11e3-867a-6805ca1adbd8  ONLINE       0     0     0
            gptid/d9cee92c-9662-11e3-a1dc-6805ca1adbd8  ONLINE       0     0     0
            gptid/43c4e34c-6726-11e3-831d-6805ca1adbd8  ONLINE       0     0     0

errors: No known data errors


So I went about an in-place replacement for 8490885.... and now it's basically the same (new serial # for the new 4TB drive), but instead of OFFLINE it's UNAVAILABLE, and the GUI again gives me only one option.

I checked the manual and it said if replacing a failed drive in place, you do what I did, and it re-silvers and is immediately removed from the pool. Not sure why it does that as I expect if re-silvering, upon success you should be done.

So as of this morning I got this:

Code:
Checking status of zfs pools:
NAME           SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH    ALTROOT
Media         16.2T  6.38T  9.87T         -     9%    39%  1.00x  DEGRADED  /mnt
freenas-boot  14.9G  6.70G  8.17G         -      -    45%  1.00x  ONLINE    -

  pool: Media
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Tue Jul 25 19:40:07 2017
        6.29T scanned out of 6.38T at 249M/s, 0h6m to go
        1.05T resilvered, 98.53% done
config:

        NAME                                            STATE     READ WRITE CKSUM
        Media                                           DEGRADED     0     0     0
          raidz2-0                                      DEGRADED     0     0     0
            gptid/41804705-6726-11e3-831d-6805ca1adbd8  ONLINE       0     0     0
            501206551231718479                          UNAVAIL      0     0     0  was /dev/gptid/41eefd8e-6726-11e3-831d-6805ca1adbd8
            gptid/8f0a7046-7192-11e7-806a-6805ca1adbd8  ONLINE       0     0     0  (resilvering)
            gptid/dc16e4b0-8c2c-11e3-867a-6805ca1adbd8  ONLINE       0     0     0
            gptid/d9cee92c-9662-11e3-a1dc-6805ca1adbd8  ONLINE       0     0     0
            gptid/43c4e34c-6726-11e3-831d-6805ca1adbd8  ONLINE       0     0     0

errors: No known data errors


Note: When this finished (shows 98.53% done but that was some time ago) the 501206551231718479 (4TB HD) is UNAVAIL.

There's no replace available as all 6 SATA ports are in use now. I will try to online it after I read the manual to see if I need to remove it first.
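
To pick the problem rows out of a long status listing like the ones above, a small parser can help. This is a minimal sketch that assumes the column layout shown in the pasted output (NAME, STATE, READ, WRITE, CKSUM, then an optional note); the function name `unhealthy_devices` is made up for illustration. It reports every row that is not ONLINE, so the pool and vdev rows show up alongside the member disk.

```python
def unhealthy_devices(status_text):
    """Return (name, state, note) for each config row whose state is not ONLINE."""
    found = []
    in_config = False
    for line in status_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "config:":
            in_config = True  # device rows follow this line
            continue
        if fields[0] == "errors:":
            in_config = False  # end of this pool's device rows
            continue
        # Skip everything outside the config table, plus its header row.
        if not in_config or fields[0] == "NAME" or len(fields) < 5:
            continue
        name, state = fields[0], fields[1]
        if state != "ONLINE":
            # Anything after the three counters is a note such as "was /dev/gptid/...".
            found.append((name, state, " ".join(fields[5:])))
    return found
```

Fed the second listing above, it would return the `Media` and `raidz2-0` rows as DEGRADED and `501206551231718479` as UNAVAIL together with its "was /dev/gptid/..." note.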
 

melloa

Wizard
Joined
May 22, 2016
Messages
1,749
Your motherboard only has 6 SATA ports, so you can't move that disk to another "free" port. Instead, try swapping the disk with one on a "working" port to see if the problem moves. That way you will check the cable/disk/port.
 

marcevan

Patron
Joined
Dec 15, 2013
Messages
432
The one I replaced was done inline (out goes old drive and cable, in goes new drive and cable) with the replace option that then prompted the resilvering. I doubt it's the port in this case as this all started with a bad cable on the prior drive.

With new cable and drive the 4TB was found, replaced the old, and was resilvered. I checked the manual and apparently after resilvering the new drive is supposed to go UNAVAIL.
 

melloa

Wizard
Joined
May 22, 2016
Messages
1,749
marcevan said: "I checked the manual and apparently after resilvering the new drive is supposed to go UNAVAIL."

After it was replaced and resilvered? I'm going to check on that, because when I did a replacement everything went flawlessly, without any "offline" drives or any other steps needed from the GUI or shell.
 

melloa

Wizard
Joined
May 22, 2016
Messages
1,749
Sorry, @marcevan, I don't see this here.

Test below:

Code:
*** After creation ***

[root@freenas] /mnt/jeep/test# zpool status
  pool: freenas-boot
state: ONLINE
  scan: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        freenas-boot  ONLINE       0     0     0
          da0p2       ONLINE       0     0     0

errors: No known data errors

  pool: jeep
state: ONLINE
  scan: none requested
config:

        NAME                                            STATE     READ WRITE CKSUM
        jeep                                            ONLINE       0     0     0
          mirror-0                                      ONLINE       0     0     0
            gptid/36d5b0f7-7225-11e7-b337-000c29fc2e3c  ONLINE       0     0     0
            gptid/3e042565-7225-11e7-b337-000c29fc2e3c  ONLINE       0     0     0

errors: No known data errors

*** Took one drive off-line ***

[root@freenas] /mnt/jeep/test# zpool status
  pool: freenas-boot
state: ONLINE
  scan: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        freenas-boot  ONLINE       0     0     0
          da0p2       ONLINE       0     0     0

errors: No known data errors

  pool: jeep
state: DEGRADED
status: One or more devices has been taken offline by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
  scan: none requested
config:

        NAME                                            STATE     READ WRITE CKSUM
        jeep                                            DEGRADED     0     0     0
          mirror-0                                      DEGRADED     0     0     0
            gptid/36d5b0f7-7225-11e7-b337-000c29fc2e3c  ONLINE       0     0     0
            12524577004742733701                        OFFLINE      0     0     0  was /dev/gptid/3e042565-7225-11e7-b337-000c29fc2e3c

errors: No known data errors

*** Replaced ***

[root@freenas] /mnt/jeep/test# zpool status
  pool: freenas-boot
state: ONLINE
  scan: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        freenas-boot  ONLINE       0     0     0
          da0p2       ONLINE       0     0     0

errors: No known data errors

  pool: jeep
state: ONLINE
  scan: scrub repaired 0 in 0h0m with 0 errors on Wed Jul 26 10:19:14 2017
config:

        NAME                                            STATE     READ WRITE CKSUM
        jeep                                            ONLINE       0     0     0
          mirror-0                                      ONLINE       0     0     0
            gptid/36d5b0f7-7225-11e7-b337-000c29fc2e3c  ONLINE       0     0     0
            gptid/81e3a185-7226-11e7-b337-000c29fc2e3c  ONLINE       0     0     0

errors: No known data errors
[root@freenas] /mnt/jeep/test#
 

marcevan

Patron
Joined
Dec 15, 2013
Messages
432
I'm at home now and opened up the server, and there are 6 hard drives in there... Yes, 6, including the 4TB that replaced the 3TB that had a bad SATA cable.

I'm in the middle of rebooting after pulling a drive, to see what changes, since I only see 5 drives. I want to isolate the physical drive that isn't showing in my list of disks.

Aha! I got lucky. I pulled the SATA and power cables from one of the disks and refreshed zpool status, which showed no changes. That must be the disk! I replaced it with my already resilvered 4TB drive and booted the machine.

Zpool is now fine with 6 drives!
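
A less hands-on way to narrow this down, instead of pulling cables, might be to compare the gptid labels the pool expects with the labels the system can currently see. The sketch below is a different technique from what I did above and rests on assumptions: it expects the text of `zpool status` and of `glabel status` (assumed here to print Name, Status and Components columns), and the function names are made up for illustration.

```python
import re

# A gptid label: "gptid/" followed by a 36-character UUID.
GPTID = re.compile(r"gptid/[0-9a-f-]{36}")


def pool_gptids(zpool_status_text):
    """All gptid labels the pool references, including 'was /dev/gptid/...' notes."""
    return set(GPTID.findall(zpool_status_text))


def present_gptids(glabel_status_text):
    """Map each gptid label currently visible to the partition that carries it."""
    present = {}
    for line in glabel_status_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0].startswith("gptid/"):
            present[fields[0]] = fields[2]
    return present


def missing_members(zpool_status_text, glabel_status_text):
    """Labels the pool expects that no attached partition currently provides."""
    expected = pool_gptids(zpool_status_text)
    return sorted(expected - set(present_gptids(glabel_status_text)))
```

Any label returned by `missing_members` belongs to a pool member that is not attached or not detected, which is the disk to go looking for inside the case.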
 

melloa

Wizard
Joined
May 22, 2016
Messages
1,749
I don't know of a better way from the shell, but:

smartctl -a /dev/<drive> | grep "Serial"

Will give you the serial number for /dev/<drive>

From the GUI: Storage -> View Disks.
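
If you want the serials for all drives in one go, the same idea can be scripted. This is only a sketch: it assumes you have already captured the text of `smartctl -a` for each device, and that the output has a "Serial Number:" line (ATA drives) or "Serial number:" line (SCSI/SAS drives). The function names are made up for illustration.

```python
import re

# Matches "Serial Number:" or "Serial number:" at the start of a line.
SERIAL_LINE = re.compile(r"^Serial [Nn]umber:\s*(\S+)", re.MULTILINE)


def serial_from_smartctl(output):
    """Return the serial number found in smartctl -a output, or None."""
    match = SERIAL_LINE.search(output)
    return match.group(1) if match else None


def serial_map(outputs_by_device):
    """Turn {device name: smartctl output} into {device name: serial or None}."""
    return {dev: serial_from_smartctl(text) for dev, text in outputs_by_device.items()}
```

The resulting device-to-serial table can be kept next to photos of the drive labels, so the right disk can be found with the case open.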
 

marcevan

Patron
Joined
Dec 15, 2013
Messages
432
Thanks. I'll file that for disk replacement duties since I can power it off, photo each serial and name the serial # image after the SATA port number as a file. I'll likely be stepping up 5 more drives from 3TB to 4TB.
 