SOLVED Can't import full, encrypted pool. vdev state changed and my physical pool disks are really, really hard at work

devnullius

Patron
Joined
Dec 9, 2015
Messages
289
Code:
Mar 28 16:18:50 Freenas ZFS: vdev state changed, pool_guid=4057754529204707565 vdev_guid=421127857251552155
Mar 28 16:18:51 Freenas ZFS: vdev state changed, pool_guid=4057754529204707565 vdev_guid=14799165109861233241
Mar 28 16:18:52 Freenas ZFS: vdev state changed, pool_guid=4057754529204707565 vdev_guid=1092407030297009407
Mar 28 16:18:53 Freenas ZFS: vdev state changed, pool_guid=4057754529204707565 vdev_guid=11554200803400165353
Mar 28 16:18:53 Freenas ZFS: vdev state changed, pool_guid=4057754529204707565 vdev_guid=16833066816084926306
Mar 28 16:18:54 Freenas ZFS: vdev state changed, pool_guid=4057754529204707565 vdev_guid=11437423961006186802
Mar 28 16:18:55 Freenas ZFS: vdev state changed, pool_guid=4057754529204707565 vdev_guid=10804527141880872542
Mar 28 16:18:56 Freenas ZFS: vdev state changed, pool_guid=4057754529204707565 vdev_guid=1041627678558872719
Mar 28 16:18:57 Freenas ZFS: vdev state changed, pool_guid=4057754529204707565 vdev_guid=421127857251552155

My console is being flooded (one line per second) with the message above; only the GUIDs change.

What's going on? This happened after the steps we did here: https://www.ixsystems.com/community...nstallation-fails-because-its-too-full.75047/ (TL;DR: fresh FreeNAS installation; importing the old pool failed from the GUI, so we used the command line to place the decryption .key file in the proper location (and chmod it) and then decrypted the volume manually, disk by disk, eight times in total (roughly the steps sketched below)). Importing still failed (/mnt busy), so I rebooted to start over. After the reboot, these ZFS vdev state errors / warnings / notifications appeared. I should also mention that the pool is at 96% capacity… and that, based on advice from a forum post, I also ran zfs set mountpoint=/mnt ZFS_8x_3TB_RAIDz2_pool. Could that have messed things up?
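For reference, the manual unlock went roughly like this, once per disk (a sketch from memory; the key file name/path below is just an example from my setup and may differ, and the gptid is one of the eight members from zpool status):
Code:
# Unlock one encrypted member; geli prompts for a passphrase if one was set,
# add -p if the provider uses only the key file (key path here is an example).
geli attach -k /data/geli/ZFS_8x_3TB_RAIDz2_pool.key /dev/gptid/7a2fc1c9-08ef-11e8-ab60-0025901159d4
# ...repeated for the other seven gptids, then the import attempt:
zpool import -R /mnt ZFS_8x_3TB_RAIDz2_pool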

I have no clue why, or what to do now. :(

Any help greatly appreciated :) dmesg from the last boot: https://pastebin.com/PXMr8Sbw

Post-edit: http://zfsguru.com/forum/zfsgurusupport/1200; maybe they have an idea of what the vdev messages are...
 
Last edited:

Redcoat

MVP
Joined
Feb 18, 2014
Messages
2,925
Is there a scrub in progress? What's the output of "zpool status"?
 

devnullius

Patron
Joined
Dec 9, 2015
Messages
289
Code:
root@Freenas:~ # zpool list
NAME                     SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
VOLU10TB                9.06T  3.34T  5.73T         -     0%    36%  1.00x  ONLINE  /mnt
ZFS_8x_3TB_RAIDz2_pool      -      -      -         -      -      -      -  UNAVAIL  -
freenas-boot            14.4G   755M  13.6G         -      -     5%  1.00x  ONLINE  -
root@Freenas:~ # zpool status
  pool: VOLU10TB
 state: ONLINE
  scan: scrub repaired 0 in 0 days 04:52:22 with 0 errors on Sun Mar 10 04:52:22 2019
config:

        NAME                                          STATE     READ WRITE CKSUM
        VOLU10TB                                      ONLINE       0     0     0
          gptid/6baca59b-0553-11e8-a1db-0025901159d4  ONLINE       0     0     0

errors: No known data errors

  pool: ZFS_8x_3TB_RAIDz2_pool
 state: UNAVAIL
status: One or more devices could not be opened.  There are insufficient
        replicas for the pool to continue functioning.
action: Attach the missing device and online it using 'zpool online'.
   see: http://illumos.org/msg/ZFS-8000-3C
  scan: none requested
config:

        NAME                      STATE     READ WRITE CKSUM
        ZFS_8x_3TB_RAIDz2_pool    UNAVAIL      0     0     0
          raidz1-0                UNAVAIL      0     0     0
            1092407030297009407   UNAVAIL      0     0     0  was /dev/gptid/7a2fc1c9-08ef-11e8-ab60-0025901159d4.eli
            11554200803400165353  UNAVAIL      0     0     0  was /dev/gptid/7b297d44-08ef-11e8-ab60-0025901159d4.eli
            16833066816084926306  UNAVAIL      0     0     0  was /dev/gptid/7c382ed2-08ef-11e8-ab60-0025901159d4.eli
            11437423961006186802  UNAVAIL      0     0     0  was /dev/gptid/7eade0a6-08ef-11e8-ab60-0025901159d4.eli
          raidz1-1                UNAVAIL      0     0     0
            10804527141880872542  UNAVAIL      0     0     0  was /dev/gptid/85693c1a-08ef-11e8-ab60-0025901159d4.eli
            1041627678558872719   UNAVAIL      0     0     0  was /dev/gptid/888ff6c8-08ef-11e8-ab60-0025901159d4.eli
            421127857251552155    UNAVAIL      0     0     0  was /dev/gptid/8bb7c249-08ef-11e8-ab60-0025901159d4.eli
            14799165109861233241  UNAVAIL      0     0     0  was /dev/gptid/8ec07bd1-08ef-11e8-ab60-0025901159d4.eli

  pool: freenas-boot
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        freenas-boot  ONLINE       0     0     0
          da8p2     ONLINE       0     0     0

errors: No known data errors


This makes my stomach jump up & down :/
 

Redcoat

MVP
Joined
Feb 18, 2014
Messages
2,925
It's above my pay-grade ...
 
Joined
Oct 18, 2018
Messages
969
Just chiming in with a reminder from the other thread: you should expect the pool import to fail as long as your disks are still locked (encrypted).
was /dev/gptid/7a2fc1c9-08ef-11e8-ab60-0025901159d4.eli
This indicates that the disk is likely still locked; the .eli device only exists once the provider has been attached with geli. You'll probably see the raw device at /dev/gptid/7a2fc1c9-08ef-11e8-ab60-0025901159d4 (without the .eli suffix).
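A quick way to double-check which providers are actually unlocked (just a read-only sketch, nothing destructive):
Code:
# Attached (unlocked) geli providers; a locked disk will not show up here.
geli status
# The raw encrypted providers should still be visible without the .eli suffix:
ls /dev/gptid/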
 

devnullius

Patron
Joined
Dec 9, 2015
Messages
289
It's still encrypted, true. But to me that does not explain the vdev errors? Remember, I only rebooted and those messages appeared… and we hadn't seen them until now.

But if you are right, what would be wise to do now? And can I safely ignore those vdev messages?

PS: OH, hello Philo!! :)
 
Last edited:
Joined
Oct 18, 2018
Messages
969
But if you are right, what would be wise to do now? And can I safely ignore those vdev messages?
I found a discussion that might be relevant to these errors. After the work from the last post I have to say I'm not entirely surprised to be seeing errors like this, especially since the pool is currently unavailable due to the disks being encrypted.

Also, I recommend making the title of your post shorter, something like "Can't import full, encrypted pool, vdev state changed" or even shorter. It lets folks see at a glance what your post is about.
 

devnullius

Patron
Joined
Dec 9, 2015
Messages
289
Yes, I found that one too but couldn't get much from it. I'm going to bed; I'll give it another round tomorrow.

xx
 

devnullius

Patron
Joined
Dec 9, 2015
Messages
289
While I patiently await any further insights (hopefully!), I did notice one extra strange thing: today I'm also getting e-mail reports with the following warning/error (and the console is still flashing vdev errors / warnings):
Code:
Snapshot ZFS_8x_3TB_RAIDz2_pool/Pipi/M@auto-20190327.0825-1w failed with the following error:
cannot open 'ZFS_8x_3TB_RAIDz2_pool/Pipi/M': dataset does not exist
usage:
        snapshot|snap [-r] [-o property=value] ... <filesystem|volume>@<snap> ...
For the property list, run: zfs set|get
For the delegated permission list, run: zfs allow|unallow

ALSO: all disks are physically really hard at work; it's not that the disks are doing nothing. I really do not dare to reboot just yet!

Help :(

:)
 
Last edited:

devnullius

Patron
Joined
Dec 9, 2015
Messages
289
Reading this: https://forums.freebsd.org/threads/...-minute-should-i-be-worried.61335/post-353240, and given my e-mails, the vdev errors are probably (!) caused by a scrub running against an offline pool... :|

I removed the scrub for that pool (GUI). Let's see if the vdev errors (and disks) calm the frack down :/

Edit: they didn't. According to https://www.ixsystems.com/community/threads/stop-scrub-or-just-shutdown.22054/post-130197 and https://redmine.ixsystems.com/issues/868, a zpool scrub -s <pool> should stop an already running scrub, and should be safe to do (which I will look into next; see the sketch below). Something that should relax me a bit, though, is https://www.ixsystems.com/community/threads/slow-zfs-scrub-on-one-of-two-pools.17044/post-89741, which states that on heavily fragmented pools a scrub can take a long, long time. Given that my pool is close to 50% fragmented and 96% full, I think it might be a day or so before my disks calm the frack down. I'll then reboot (right?) and try some commands to import and decrypt the pool once more. I'm not stopping the scrub just yet…
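For when I do pull the trigger, this is what I understand the stop command to be (a sketch based on those links; it only does anything if a scrub is really in progress on an imported pool):
Code:
# First check whether a scrub is actually reported as running:
zpool status ZFS_8x_3TB_RAIDz2_pool
# If it is, -s cancels the in-progress scrub:
zpool scrub -s ZFS_8x_3TB_RAIDz2_pool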
 
Last edited:
Joined
Oct 18, 2018
Messages
969
I removed the scrub for that pool (GUI). Let's see if the vdev errors (and disks) calm the frack down :/
Did the disks calm down at least?

Until you get your pool decrypted and imported, you will probably want to turn off any automated tasks like scrubs, snapshots, etc.
 

devnullius

Patron
Joined
Dec 9, 2015
Messages
289
I've tried shutting down the snapshot tasks too, but the GUI times out. I guess I need to delve into the shell again (https://mwl.io/archives/2140). I'm AFK for the next few hours; if someone knows how to disable snapshot schedules from the command line, do share :) All disks are still hard at work and I doubt that will stop any time soon :(
 
Last edited:

devnullius

Patron
Joined
Dec 9, 2015
Messages
289
(Disks still hard at work.) How the frack do I use the console to shut down the scrub? Or should I just reboot? Maybe even do a fresh install without importing my old settings, and then try to import the pool manually / via the GUI?
 
Joined
Oct 18, 2018
Messages
969
(Disks still hard at work.) How the frack do I use the console to shut down the scrub? Or should I just reboot? Maybe even do a fresh install without importing my old settings, and then try to import the pool manually / via the GUI?
Are you certain a scrub is in progress? And are you certain it is the 3TB disks that are so active? zpool iostat may give you some indication of what the pools are doing, and zpool status ZFS_8x_3TB_RAIDz2_pool may also be useful to see what is going on.
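For example, something like this (an interval of 5 seconds; Ctrl+C to stop) will keep printing per-vdev activity so you can see whether the numbers actually move:
Code:
# Per-pool and per-vdev I/O statistics, refreshed every 5 seconds.
zpool iostat -v 5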
 

devnullius

Patron
Joined
Dec 9, 2015
Messages
289
Certain about the scrub? No, not 100%. But I don't see what else could be going on. (I haven't unlocked or decrypted the pool since the reboot, because of the vdev messages and the hard work the 8x 3TB disks are doing; the hardware LEDs make it very clear which disks are active.)

zpool iostat
Code:
 zpool iostat
                 capacity     operations    bandwidth
pool          alloc   free   read  write   read  write
------------  -----  -----  -----  -----  -----  -----
VOLU10TB      3.34T  5.73T      0      0    261  3.47K
freenas-boot   757M  13.6G      0      0    753    125
------------  -----  -----  -----  -----  -----  -----


Code:
zpool status ZFS_8x_3TB_RAIDz2_pool
  pool: ZFS_8x_3TB_RAIDz2_pool
 state: UNAVAIL
status: One or more devices could not be opened.  There are insufficient
        replicas for the pool to continue functioning.
action: Attach the missing device and online it using 'zpool online'.
   see: http://illumos.org/msg/ZFS-8000-3C
  scan: none requested
config:

        NAME                      STATE     READ WRITE CKSUM
        ZFS_8x_3TB_RAIDz2_pool    UNAVAIL      0     0     0
          raidz1-0                UNAVAIL      0     0     0
            1092407030297009407   UNAVAIL      0     0     0  was /dev/gptid/7a2fc1c9-08ef-11e8-ab60-0025901159d4.eli
            11554200803400165353  UNAVAIL      0     0     0  was /dev/gptid/7b297d44-08ef-11e8-ab60-0025901159d4.eli
            16833066816084926306  UNAVAIL      0     0     0  was /dev/gptid/7c382ed2-08ef-11e8-ab60-0025901159d4.eli
            11437423961006186802  UNAVAIL      0     0     0  was /dev/gptid/7eade0a6-08ef-11e8-ab60-0025901159d4.eli
          raidz1-1                UNAVAIL      0     0     0
            10804527141880872542  UNAVAIL      0     0     0  was /dev/gptid/85693c1a-08ef-11e8-ab60-0025901159d4.eli
            1041627678558872719   UNAVAIL      0     0     0  was /dev/gptid/888ff6c8-08ef-11e8-ab60-0025901159d4.eli
            421127857251552155    UNAVAIL      0     0     0  was /dev/gptid/8bb7c249-08ef-11e8-ab60-0025901159d4.eli
            14799165109861233241  UNAVAIL      0     0     0  was /dev/gptid/8ec07bd1-08ef-11e8-ab60-0025901159d4.eli


Code:
root@Freenas:~ # zpool online ZFS_8x_3TB_RAIDz2_pool
missing device name
usage:
        online [-e] <pool> <device> …


xx
 
Joined
Oct 18, 2018
Messages
969
I haven't unlocked or decrypted the pool yet after the reboot; because of the vdev messages and the hard work the 8x 3TB disks are doing: the hardware LEDs are very clear as to which disks are active.
So long as your drives are locked I don't see how FreeNAS or ZFS can be doing anything useful with those drives. You can confirm that they are doing some work with gstat -p. The flag tells gstat to only look at physical disks.
 

devnullius

Patron
Joined
Dec 9, 2015
Messages
289

gstat -p. The flag tells gstat to only look at physical disks.

Code:
dT: 1.031s  w: 1.000s
L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
    0     31     31   3476    0.7      0      0    0.0    0.8| da0
    0     12     12   1304    0.6      0      0    0.0    0.3| da1
    0     12     12   1304    0.7      0      0    0.0    0.3| da2
    0     27     27   3042   25.5      0      0    0.0   23.5| da3
    2     25     25   2824   27.7      0      0    0.0   23.7| da4
    0     12     12   1304    1.2      0      0    0.0    0.5| da5
    0     12     12   1304    1.2      0      0    0.0    0.5| da6
    0     12     12   1304    1.3      0      0    0.0    0.6| da7
    0      8      8    435    8.1      0      0    0.0    1.4| da8
    0     78     62   3476   18.0      0      0    0.0   24.2| da9
    0      8      8    869    1.5      0      0    0.0    0.4| ada0
    0      0      0      0    0.0      0      0    0.0    0.0| cd0
(Only 3 disks that are part of the pool seem to be hard at work, at least at this moment, though all 8 HDD LEDs are flashing.)
 

devnullius

Patron
Joined
Dec 9, 2015
Messages
289
So long as your drives are locked I don't see how FreeNAS or ZFS can be doing anything useful with those drives.

THAT is what's confusing me too!! What are those vdev messages? Why are my disks so hard at work? It's beyond me - which is easily done with FreeNAS, true :)
 
Joined
Oct 18, 2018
Messages
969
Do you have any scheduled SMART tests? Try smartctl -a <device> perhaps? There are also a LOT of tools out there that will tell you which processes are causing a lot of i/o.
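For example (the device name is just a guess based on the busy disks in your gstat output, so adjust as needed):
Code:
# SMART health, attributes and self-test log for one of the busy disks:
smartctl -a /dev/da3
# FreeBSD top can sort processes by I/O instead of CPU:
top -m io -o total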
 