Phantom Disks Hung in Array - Replacing-X

DaBuddha · Jan 24, 2019

First post, need a hand.

Code:

Build:         FreeNAS-11.1-U2
Platform:       Intel(R) Xeon(R) CPU E5645 @ 2.40GHz

I've replaced two disks (and will have to soon replace a third), the replacements resilver, and array is actually fine. But the old disks, which are now gone, are still listed, and I am permanently DEGRADED (I am so ashamed)

zpool status:

Code:

raidz1-1                                        DEGRADED     3     0    26
            gptid/b523a793-4b7e-11e8-b779-0025902b9ca0    ONLINE       0     0     0
            gptid/b62a653f-4b7e-11e8-b779-0025902b9ca0    ONLINE       0     0     0
            gptid/b72c2150-4b7e-11e8-b779-0025902b9ca0    ONLINE       0     0     0
            gptid/b84d3e1f-4b7e-11e8-b779-0025902b9ca0    ONLINE       0     0     0
            gptid/b96533e6-4b7e-11e8-b779-0025902b9ca0    ONLINE       0     0     0
            gptid/ba6bff50-4b7e-11e8-b779-0025902b9ca0    ONLINE       0     0     0
            replacing-6                                   UNAVAIL      0     0     0
              17940202337998090796                        UNAVAIL      0     0     0  was /dev/gptid/bb68588f-4b7e-11e8-b779-0025902b9ca0
              gptid/8b16de9a-1ec6-11e9-bd7c-0025902b9ca0  ONLINE       0     0     0
            gptid/ec325bb4-01b0-11e9-86bf-0025902b9ca0    ONLINE       0     0     0
            gptid/c1818f64-4b7e-11e8-b779-0025902b9ca0    ONLINE       3     0     0
            replacing-9                                   DEGRADED     0     0     0
              10630623345475226511                        UNAVAIL      0     0     0  was /dev/gptid/c2832171-4b7e-11e8-b779-0025902b9ca0

Searching I found some references to this problem on FreeNAS 9 back in 2015, but no real solution (Linked to a fixed bug), also references to sector size difference issues causing issues on replace. The bad disks got bumped to HGST 6TBs from old ass seagate 'cudas (long story, plan on replacing all the old disks with 6TBs).

How do I exorcise the disks that are listed as being replaced (r-6; r-9) but are now gone?

Chris Moore · Jan 24, 2019

You should not be able to get into this situation.
What is the disk controller?

Chris Moore · Jan 24, 2019

PS. there are many good bug fix reasons for you to upgrade to version 11.1-U6.3

DaBuddha · Jan 24, 2019

What is the disk controller?

LSI 9211 flashed IT (HBA)

Can you point me towards logs that might help clue my ass up?

DaBuddha · Jan 24, 2019

Chris Moore said:
PS. there are many good bug fix reasons for you to upgrade to version 11.1-U6.3

Thanks. Will do. Didn't see anything that might indicate this is a known bug.

Chris Moore · Jan 24, 2019

DaBuddha said:
Thanks. Will do. Didn't see anything that might indicate this is a known bug.

I have not seen anything like what you are experiencing in quite a while, but there were a long list of other bugs fixed between FreeNAS-11.1-U2 and FreeNAS-11.1-U6.3.
The new 11.2 line is not fully stable yet, so I don't suggest going there.

Have you tried running a scrub on the pool since changing these disks?

DaBuddha · Jan 24, 2019

Chris Moore said:
I have not seen anything like what you are experiencing in quite a while, but there were a long list of other bugs fixed between FreeNAS-11.1-U2 and FreeNAS-11.1-U6.3.
The new 11.2 line is not fully stable yet, so I don't suggest going there.

Have you tried running a scrub on the pool since changing these disks?

Not yet, will when I can.

I just did the upgrade and, with that, it is resilvering both of the replacement disks.

Once that is complete I will scrub.

Ericloewe · Jan 24, 2019

Were you on IRC last night, by any chance? If not, that's an odd coincidence.

Something went wrong there, the vdev has a number of checksum errors. That zpool status output is seriously cut off, missing at least another vdev and the header. Can you please post the rest so we can take a look at it?

DaBuddha · Jan 24, 2019

Ericloewe said:
Were you on IRC last night, by any chance? If not, that's an odd coincidence.

Something went wrong there, the vdev has a number of checksum errors. That zpool status output is seriously cut off, missing at least another vdev and the header. Can you please post the rest so we can take a look at it?

Nope not on the channel.

As mentioned there is another disk in that array I'll be replacing.

I shortened the status for brevity (36 disk 4U SM chassis), the other arrays are not seeing issues.

This is since the upgrade.

Code:

pool: DeepHole
state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Wed Jan 23 12:42:01 2019
        16.8T scanned at 169M/s, 15.3T issued at 154M/s, 34.0T total
        37.9M resilvered, 44.96% done, 1 days 11:24:03 to go
config:

        NAME                                              STATE     READ WRITE CKSUM
        DeepHole                                          DEGRADED     0     0     0
          raidz1-0                                        ONLINE       0     0     0
            gptid/aaf16082-4b7e-11e8-b779-0025902b9ca0    ONLINE       0     0     0
            gptid/ac02e533-4b7e-11e8-b779-0025902b9ca0    ONLINE       0     0     0
            gptid/ad00a296-4b7e-11e8-b779-0025902b9ca0    ONLINE       0     0     0
            gptid/adf34e84-4b7e-11e8-b779-0025902b9ca0    ONLINE       0     0     0
            gptid/aee7b242-4b7e-11e8-b779-0025902b9ca0    ONLINE       0     0     0
            gptid/afdd97f8-4b7e-11e8-b779-0025902b9ca0    ONLINE       0     0     0
            gptid/b0d3cde7-4b7e-11e8-b779-0025902b9ca0    ONLINE       0     0     0
            gptid/b1d6e3af-4b7e-11e8-b779-0025902b9ca0    ONLINE       0     0     0
            gptid/b2f09c43-4b7e-11e8-b779-0025902b9ca0    ONLINE       0     0     0
            gptid/b3f84504-4b7e-11e8-b779-0025902b9ca0    ONLINE       0     0     0


          raidz1-1                                        DEGRADED     0     0     0
            gptid/b523a793-4b7e-11e8-b779-0025902b9ca0    ONLINE       0     0     0
            gptid/b62a653f-4b7e-11e8-b779-0025902b9ca0    ONLINE       0     0     0
            gptid/b72c2150-4b7e-11e8-b779-0025902b9ca0    ONLINE       0     0     0
            gptid/b84d3e1f-4b7e-11e8-b779-0025902b9ca0    ONLINE       0     0     0
            gptid/b96533e6-4b7e-11e8-b779-0025902b9ca0    ONLINE       0     0     0
            gptid/ba6bff50-4b7e-11e8-b779-0025902b9ca0    ONLINE       0     0     0
            replacing-6                                   DEGRADED     0     0     0
              17940202337998090796                        UNAVAIL      0     0     0  was /dev/gptid/bb68588f-4b7e-11e8-b779-0025902b9ca0
              gptid/8b16de9a-1ec6-11e9-bd7c-0025902b9ca0  ONLINE       0     0     0  (resilvering)
            gptid/ec325bb4-01b0-11e9-86bf-0025902b9ca0    ONLINE       0     0     0
            gptid/c1818f64-4b7e-11e8-b779-0025902b9ca0    ONLINE       0     0     0
            replacing-9                                   DEGRADED     0     0     0
              10630623345475226511                        UNAVAIL      0     0     0  was /dev/gptid/c2832171-4b7e-11e8-b779-0025902b9ca0
              gptid/67f4a8eb-1605-11e9-b785-0025902b9ca0  ONLINE       0     0     0  (resilvering)


          raidz1-2                                        ONLINE       0     0     0
            gptid/c3b6328e-4b7e-11e8-b779-0025902b9ca0    ONLINE       0     0     0
            gptid/c4780266-4b7e-11e8-b779-0025902b9ca0    ONLINE       0     0     0
            gptid/c53beba5-4b7e-11e8-b779-0025902b9ca0    ONLINE       0     0     0
            gptid/c61c3b1d-4b7e-11e8-b779-0025902b9ca0    ONLINE       0     0     0
            gptid/c6f06114-4b7e-11e8-b779-0025902b9ca0    ONLINE       0     0     0
            gptid/c7c9e017-4b7e-11e8-b779-0025902b9ca0    ONLINE       0     0     0
            gptid/c88b0732-4b7e-11e8-b779-0025902b9ca0    ONLINE       0     0     0
            gptid/c94ef6e1-4b7e-11e8-b779-0025902b9ca0    ONLINE       0     0     0
          raidz1-3                                        ONLINE       0     0     0
            gptid/b0804c36-0121-11e9-86bf-0025902b9ca0    ONLINE       0     0     0
            gptid/cc66037e-4b7e-11e8-b779-0025902b9ca0    ONLINE       0     0     0
            gptid/cd98ee9f-4b7e-11e8-b779-0025902b9ca0    ONLINE       0     0     0
            gptid/cf65d71a-4b7e-11e8-b779-0025902b9ca0    ONLINE       0     0     0
            gptid/d20c53f8-4b7e-11e8-b779-0025902b9ca0    ONLINE       0     0     0
            gptid/d3c9f714-4b7e-11e8-b779-0025902b9ca0    ONLINE       0     0     0
            gptid/d4fe3e36-4b7e-11e8-b779-0025902b9ca0    ONLINE       0     0     0
            gptid/d6b1612f-4b7e-11e8-b779-0025902b9ca0    ONLINE       0     0     0

errors: 5 data errors, use '-v' for a list

Ericloewe · Jan 24, 2019

The resilver is still in progress, in that output. Apart from the data errors, everything seems normal - and those aren't too weird, given that two to three disks in a single vdev were acting up.
It's going pretty slowly, how full is that pool?

DaBuddha · Jan 24, 2019

Ericloewe said:
The resilver is still in progress, in that output. Apart from the data errors, everything seems normal - and those aren't too weird, given that two to three disks in a single vdev were acting up.
It's going pretty slowly, how full is that pool?

Don't really know, the entire pool is a little more than half full.

Ya, like I said, the array is fine, except for the fact that a 10x disk array now has twelve disks.

Ericloewe · Jan 24, 2019

The old disks will disappear when the resilver is complete.

DaBuddha · Jan 24, 2019

Ericloewe said:
The old disks will disappear when the resilver is complete.

As shown in the first listing, the resilver has previously finished. I wouldn't of posted if they didn't hang around anomalously.

Those two disks began the resilver process once again upon upgrade / reboot. Maybe they will etherealize this time?

Ericloewe · Jan 24, 2019

That part was a bit unclear, I missed the detail that this resilver started after a reboot.

In any case, I just realized that I jumbled your case with the other one I mentioned. You actually have RAIDZ1 vdevs. That raises an important question: What steps did you take, exactly? Did you remove one disk, resilver, remove another, resilver?

This is very weird territory, because it implies that the disks somehow have just enough information between them to cobble together the data - or most of it, but they're not actually caught up. Do I understand correctly that the old disks were throwing errors, like the one with the three errors in the OP?

We'll also want to know what the actual errors are so far with zpool status -v. If metadata is affected, you might have to restore from backup.

DaBuddha · Jan 25, 2019

Yes, Disk1: replace, resilver - Week later, Disk2 replace, resilver

The first time it happened, with the first disk, dopily I searched for what I thought was another disk gone bad, only after doing a simple count (11 not 10) did I realize it was a phantom.

This is my second FreeNAS array I'm currently running. Fourth ZFS array, of the other two, one is SmartOS VPS server, and an enterprise SAN is OmniOS. Also running many, many HW RAID VPS servers (Areca, LSI, HP.) Replaced a fair number of disks, never had this happen before. This home machine is cobbled though, unlike the others, leftover pieces and drives all thrown together (a mistake....)

Was hoping someone would say, "oh ya seen that, it happens when.... you need to do this that and the other thing, it will be all good" Given that this isn't the case, I'll work it from another angle.

After resilver, just the array with issues:

Code:

         raidz1-1                                        DEGRADED     0     0    29
            gptid/b523a793-4b7e-11e8-b779-0025902b9ca0    ONLINE       0     0     0
            gptid/b62a653f-4b7e-11e8-b779-0025902b9ca0    ONLINE       0     0     0
            gptid/b72c2150-4b7e-11e8-b779-0025902b9ca0    ONLINE       0     0     0
            gptid/b84d3e1f-4b7e-11e8-b779-0025902b9ca0    ONLINE       0     0     0
            gptid/b96533e6-4b7e-11e8-b779-0025902b9ca0    ONLINE       0     0     0
            gptid/ba6bff50-4b7e-11e8-b779-0025902b9ca0    ONLINE       0     0     0
            replacing-6                                   DEGRADED     0     0     0
              17940202337998090796                        UNAVAIL      0     0     0  was /dev/gptid/bb68588f-4b7e-11e8-b779-0025902b9ca0
              gptid/8b16de9a-1ec6-11e9-bd7c-0025902b9ca0  ONLINE       0     0     0
            gptid/ec325bb4-01b0-11e9-86bf-0025902b9ca0    ONLINE       0     0     0
            gptid/c1818f64-4b7e-11e8-b779-0025902b9ca0    ONLINE       1     0     0
            replacing-9                                   DEGRADED     0     0     0
              10630623345475226511                        UNAVAIL      0     0     0  was /dev/gptid/c2832171-4b7e-11e8-b779-0025902b9ca0
              gptid/67f4a8eb-1605-11e9-b785-0025902b9ca0  ONLINE       0     0     0

Scrub underway.

Ericloewe · Jan 25, 2019

The scrub isn't likely to change anything, since it's fundamentally the same as the resilver.

DaBuddha · Jan 25, 2019

Chris Moore said:
Have you tried running a scrub on the pool since changing these disks?

Shake a dead chicken here, shake a dead chicken there. Just doing as promised.

Chris Moore · Jan 25, 2019

DaBuddha said:
This home machine is cobbled though, unlike the others, leftover pieces and drives all thrown together (a mistake....)

What disk controller are you using?

Chris Moore · Jan 25, 2019

Ericloewe said:
We'll also want to know what the actual errors are so far with zpool status -v. If metadata is affected, you might have to restore from backup.

@DaBuddha , did your resilver finish yet?

Chris Moore · Jan 25, 2019

The output of zpool list -v will show us fairly accurately how full each vdev is, if you would share that?

Important Announcement for the TrueNAS Community.

Phantom Disks Hung in Array - Replacing-X

Dabbler

Hall of Famer

Hall of Famer

Dabbler

Dabbler

Hall of Famer

Dabbler

Server Wrangler

Dabbler

Server Wrangler

Dabbler

Server Wrangler

Dabbler

Server Wrangler

Dabbler

Server Wrangler

Dabbler

Hall of Famer

Hall of Famer

Hall of Famer

Similar threads