Phantom Disks Hung in Array - Replacing-X

DaBuddha

Dabbler
Joined
Jan 24, 2019
Messages
11
First post, need a hand.

Code:
Build:         FreeNAS-11.1-U2
Platform:       Intel(R) Xeon(R) CPU E5645 @ 2.40GHz


I've replaced two disks (and will soon have to replace a third), the replacements resilvered, and the array is actually fine. But the old disks, which are now physically gone, are still listed, and the pool is permanently DEGRADED (I am so ashamed).

zpool status:

Code:
raidz1-1                                        DEGRADED     3     0    26
            gptid/b523a793-4b7e-11e8-b779-0025902b9ca0    ONLINE       0     0     0
            gptid/b62a653f-4b7e-11e8-b779-0025902b9ca0    ONLINE       0     0     0
            gptid/b72c2150-4b7e-11e8-b779-0025902b9ca0    ONLINE       0     0     0
            gptid/b84d3e1f-4b7e-11e8-b779-0025902b9ca0    ONLINE       0     0     0
            gptid/b96533e6-4b7e-11e8-b779-0025902b9ca0    ONLINE       0     0     0
            gptid/ba6bff50-4b7e-11e8-b779-0025902b9ca0    ONLINE       0     0     0
            replacing-6                                   UNAVAIL      0     0     0
              17940202337998090796                        UNAVAIL      0     0     0  was /dev/gptid/bb68588f-4b7e-11e8-b779-0025902b9ca0
              gptid/8b16de9a-1ec6-11e9-bd7c-0025902b9ca0  ONLINE       0     0     0
            gptid/ec325bb4-01b0-11e9-86bf-0025902b9ca0    ONLINE       0     0     0
            gptid/c1818f64-4b7e-11e8-b779-0025902b9ca0    ONLINE       3     0     0
            replacing-9                                   DEGRADED     0     0     0
              10630623345475226511                        UNAVAIL      0     0     0  was /dev/gptid/c2832171-4b7e-11e8-b779-0025902b9ca0


Searching, I found some references to this problem on FreeNAS 9 back in 2015, but no real solution (they linked to a since-fixed bug), plus references to sector-size differences causing trouble on replace. The bad disks got bumped from old Seagate 'cudas to HGST 6TBs (long story; I plan on replacing all the old disks with 6TBs).
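(If sector size is a suspect, FreeBSD's diskinfo will show it; da0 below is just a placeholder device name:)

Code:
diskinfo -v /dev/da0    # reports sectorsize, stripesize, mediasize for the drive
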

How do I exorcise the disks that are listed as being replaced (replacing-6 and replacing-9) but are now gone?
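
From what I've read, the brute-force fix would be to detach the stale halves of the replacing vdevs by their numeric GUIDs, something like the lines below (pool name is DeepHole, GUIDs from the listing above), but I'd rather hear from someone who has seen this before I try it:

Code:
# detach the long-gone original disks from replacing-6 and replacing-9
zpool detach DeepHole 17940202337998090796
zpool detach DeepHole 10630623345475226511
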
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
You should not be able to get into this situation.
What is the disk controller?
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
PS: there are many good bug-fix reasons for you to upgrade to version 11.1-U6.3.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
Thanks. Will do. Didn't see anything that might indicate this is a known bug.
I have not seen anything like what you are experiencing in quite a while, but there is a long list of other bugs fixed between FreeNAS-11.1-U2 and FreeNAS-11.1-U6.3.
The new 11.2 line is not fully stable yet, so I don't suggest going there.

Have you tried running a scrub on the pool since changing these disks?
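
From the shell that's just the following (substitute your actual pool name; "tank" here is a placeholder):

Code:
zpool scrub tank      # start the scrub; progress shows up in zpool status
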
 

DaBuddha

Dabbler
Joined
Jan 24, 2019
Messages
11
I have not seen anything like what you are experiencing in quite a while, but there is a long list of other bugs fixed between FreeNAS-11.1-U2 and FreeNAS-11.1-U6.3.
The new 11.2 line is not fully stable yet, so I don't suggest going there.

Have you tried running a scrub on the pool since changing these disks?

Not yet; I will when I can.

I just did the upgrade and, with that, it is resilvering both of the replacement disks.

Once that is complete I will scrub.
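
(Watching the progress from the shell with:)

Code:
zpool status DeepHole | grep -E 'scan:|resilvered'
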
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Were you on IRC last night, by any chance? If not, that's an odd coincidence.

Something went wrong there; the vdev has a number of checksum errors. That zpool status output is seriously cut off, missing at least another vdev and the header. Can you please post the rest so we can take a look at it?
 

DaBuddha

Dabbler
Joined
Jan 24, 2019
Messages
11
Were you on IRC last night, by any chance? If not, that's an odd coincidence.

Something went wrong there; the vdev has a number of checksum errors. That zpool status output is seriously cut off, missing at least another vdev and the header. Can you please post the rest so we can take a look at it?

Nope, not on the channel.

As mentioned, there is another disk in that array I'll be replacing.

I shortened the status output for brevity (36-disk 4U Supermicro chassis); the other arrays are not seeing issues.

This is the status since the upgrade:

Code:
pool: DeepHole
state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Wed Jan 23 12:42:01 2019
        16.8T scanned at 169M/s, 15.3T issued at 154M/s, 34.0T total
        37.9M resilvered, 44.96% done, 1 days 11:24:03 to go
config:

        NAME                                              STATE     READ WRITE CKSUM
        DeepHole                                          DEGRADED     0     0     0
          raidz1-0                                        ONLINE       0     0     0
            gptid/aaf16082-4b7e-11e8-b779-0025902b9ca0    ONLINE       0     0     0
            gptid/ac02e533-4b7e-11e8-b779-0025902b9ca0    ONLINE       0     0     0
            gptid/ad00a296-4b7e-11e8-b779-0025902b9ca0    ONLINE       0     0     0
            gptid/adf34e84-4b7e-11e8-b779-0025902b9ca0    ONLINE       0     0     0
            gptid/aee7b242-4b7e-11e8-b779-0025902b9ca0    ONLINE       0     0     0
            gptid/afdd97f8-4b7e-11e8-b779-0025902b9ca0    ONLINE       0     0     0
            gptid/b0d3cde7-4b7e-11e8-b779-0025902b9ca0    ONLINE       0     0     0
            gptid/b1d6e3af-4b7e-11e8-b779-0025902b9ca0    ONLINE       0     0     0
            gptid/b2f09c43-4b7e-11e8-b779-0025902b9ca0    ONLINE       0     0     0
            gptid/b3f84504-4b7e-11e8-b779-0025902b9ca0    ONLINE       0     0     0


          raidz1-1                                        DEGRADED     0     0     0
            gptid/b523a793-4b7e-11e8-b779-0025902b9ca0    ONLINE       0     0     0
            gptid/b62a653f-4b7e-11e8-b779-0025902b9ca0    ONLINE       0     0     0
            gptid/b72c2150-4b7e-11e8-b779-0025902b9ca0    ONLINE       0     0     0
            gptid/b84d3e1f-4b7e-11e8-b779-0025902b9ca0    ONLINE       0     0     0
            gptid/b96533e6-4b7e-11e8-b779-0025902b9ca0    ONLINE       0     0     0
            gptid/ba6bff50-4b7e-11e8-b779-0025902b9ca0    ONLINE       0     0     0
            replacing-6                                   DEGRADED     0     0     0
              17940202337998090796                        UNAVAIL      0     0     0  was /dev/gptid/bb68588f-4b7e-11e8-b779-0025902b9ca0
              gptid/8b16de9a-1ec6-11e9-bd7c-0025902b9ca0  ONLINE       0     0     0  (resilvering)
            gptid/ec325bb4-01b0-11e9-86bf-0025902b9ca0    ONLINE       0     0     0
            gptid/c1818f64-4b7e-11e8-b779-0025902b9ca0    ONLINE       0     0     0
            replacing-9                                   DEGRADED     0     0     0
              10630623345475226511                        UNAVAIL      0     0     0  was /dev/gptid/c2832171-4b7e-11e8-b779-0025902b9ca0
              gptid/67f4a8eb-1605-11e9-b785-0025902b9ca0  ONLINE       0     0     0  (resilvering)


          raidz1-2                                        ONLINE       0     0     0
            gptid/c3b6328e-4b7e-11e8-b779-0025902b9ca0    ONLINE       0     0     0
            gptid/c4780266-4b7e-11e8-b779-0025902b9ca0    ONLINE       0     0     0
            gptid/c53beba5-4b7e-11e8-b779-0025902b9ca0    ONLINE       0     0     0
            gptid/c61c3b1d-4b7e-11e8-b779-0025902b9ca0    ONLINE       0     0     0
            gptid/c6f06114-4b7e-11e8-b779-0025902b9ca0    ONLINE       0     0     0
            gptid/c7c9e017-4b7e-11e8-b779-0025902b9ca0    ONLINE       0     0     0
            gptid/c88b0732-4b7e-11e8-b779-0025902b9ca0    ONLINE       0     0     0
            gptid/c94ef6e1-4b7e-11e8-b779-0025902b9ca0    ONLINE       0     0     0
          raidz1-3                                        ONLINE       0     0     0
            gptid/b0804c36-0121-11e9-86bf-0025902b9ca0    ONLINE       0     0     0
            gptid/cc66037e-4b7e-11e8-b779-0025902b9ca0    ONLINE       0     0     0
            gptid/cd98ee9f-4b7e-11e8-b779-0025902b9ca0    ONLINE       0     0     0
            gptid/cf65d71a-4b7e-11e8-b779-0025902b9ca0    ONLINE       0     0     0
            gptid/d20c53f8-4b7e-11e8-b779-0025902b9ca0    ONLINE       0     0     0
            gptid/d3c9f714-4b7e-11e8-b779-0025902b9ca0    ONLINE       0     0     0
            gptid/d4fe3e36-4b7e-11e8-b779-0025902b9ca0    ONLINE       0     0     0
            gptid/d6b1612f-4b7e-11e8-b779-0025902b9ca0    ONLINE       0     0     0

errors: 5 data errors, use '-v' for a list

 
Last edited:

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
The resilver is still in progress, in that output. Apart from the data errors, everything seems normal - and those aren't too weird, given that two to three disks in a single vdev were acting up.
It's going pretty slowly; how full is that pool?
 

DaBuddha

Dabbler
Joined
Jan 24, 2019
Messages
11
The resilver is still in progress, in that output. Apart from the data errors, everything seems normal - and those aren't too weird, given that two to three disks in a single vdev were acting up.
It's going pretty slowly; how full is that pool?

Don't know exactly; the entire pool is a little more than half full.

Yeah, like I said, the array is fine, except for the fact that a 10-disk array now shows twelve disks.

[attached image: temp.JPG]
 
Last edited:

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
The old disks will disappear when the resilver is complete.
 

DaBuddha

Dabbler
Joined
Jan 24, 2019
Messages
11
The old disks will disappear when the resilver is complete.

As shown in the first listing, the resilver had previously finished. I wouldn't have posted if they didn't hang around anomalously.

Those two disks began the resilver process once again upon the upgrade/reboot. Maybe they will etherealize this time?
 
Last edited:

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
That part was a bit unclear; I missed the detail that this resilver started after a reboot.

In any case, I just realized that I jumbled your case with the other one I mentioned. You actually have RAIDZ1 vdevs. That raises an important question: What steps did you take, exactly? Did you remove one disk, resilver, remove another, resilver?

This is very weird territory, because it implies that the disks somehow have just enough information between them to cobble together the data - or most of it, but they're not actually caught up. Do I understand correctly that the old disks were throwing errors, like the one with the three errors in the OP?

We'll also want to know what the actual errors are so far with zpool status -v. If metadata is affected, you might have to restore from backup.
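
That is, once the resilver settles down:

Code:
zpool status -v DeepHole    # permanent errors are listed at the bottom, as file paths or dataset:<0x...> entries
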
 

DaBuddha

Dabbler
Joined
Jan 24, 2019
Messages
11
Yes: Disk 1 replaced, resilvered; a week later, Disk 2 replaced, resilvered.

The first time it happened, with the first disk, I dopily searched for what I thought was another disk gone bad; only after doing a simple count (11, not 10) did I realize it was a phantom.

This is the second FreeNAS array I'm currently running, and my fourth ZFS array overall; of the other two, one is a SmartOS VPS server and the other is an enterprise SAN on OmniOS. I'm also running many, many HW RAID VPS servers (Areca, LSI, HP). I've replaced a fair number of disks and never had this happen before. This home machine is cobbled together, though, unlike the others: leftover pieces and drives all thrown in (a mistake...)

I was hoping someone would say, "Oh yeah, seen that, it happens when... you need to do this, that, and the other thing, and it will be all good." Given that this isn't the case, I'll work it from another angle.


After resilver, just the array with issues:

Code:
         raidz1-1                                        DEGRADED     0     0    29
            gptid/b523a793-4b7e-11e8-b779-0025902b9ca0    ONLINE       0     0     0
            gptid/b62a653f-4b7e-11e8-b779-0025902b9ca0    ONLINE       0     0     0
            gptid/b72c2150-4b7e-11e8-b779-0025902b9ca0    ONLINE       0     0     0
            gptid/b84d3e1f-4b7e-11e8-b779-0025902b9ca0    ONLINE       0     0     0
            gptid/b96533e6-4b7e-11e8-b779-0025902b9ca0    ONLINE       0     0     0
            gptid/ba6bff50-4b7e-11e8-b779-0025902b9ca0    ONLINE       0     0     0
            replacing-6                                   DEGRADED     0     0     0
              17940202337998090796                        UNAVAIL      0     0     0  was /dev/gptid/bb68588f-4b7e-11e8-b779-0025902b9ca0
              gptid/8b16de9a-1ec6-11e9-bd7c-0025902b9ca0  ONLINE       0     0     0
            gptid/ec325bb4-01b0-11e9-86bf-0025902b9ca0    ONLINE       0     0     0
            gptid/c1818f64-4b7e-11e8-b779-0025902b9ca0    ONLINE       1     0     0
            replacing-9                                   DEGRADED     0     0     0
              10630623345475226511                        UNAVAIL      0     0     0  was /dev/gptid/c2832171-4b7e-11e8-b779-0025902b9ca0
              gptid/67f4a8eb-1605-11e9-b785-0025902b9ca0  ONLINE       0     0     0


Scrub underway.
 
Last edited:

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
The scrub isn't likely to change anything, since it's fundamentally the same as the resilver.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
This home machine is cobbled together, though, unlike the others: leftover pieces and drives all thrown in (a mistake...)
What disk controller are you using?
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
We'll also want to know what the actual errors are so far with zpool status -v. If metadata is affected, you might have to restore from backup.
@DaBuddha, did your resilver finish yet?
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
The output of zpool list -v will show us fairly accurately how full each vdev is. Would you share that?
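
i.e.:

Code:
zpool list -v DeepHole    # per-vdev SIZE, ALLOC and FREE
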
 