SOLVED Unable to replace Disk

pechkin000

Explorer
Joined
Jan 24, 2014
Messages
59
Hi,
I would really appreciate any guidance on this.
I had a bad disk. I was unable to offline it before replacing it; I kept getting an error that the particular device was not available.
So I shut the system down and inserted a new disk. Here is what my zpool status looks like now:

Code:
replicator# zpool status
  pool: backups
state: DEGRADED
status: One or more devices has been taken offline by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
  scan: scrub repaired 540K in 0 days 05:28:45 with 0 errors on Wed Mar  6 18:51:22 2019
config:

        NAME                                            STATE     READ WRITE CKSUM
        backups                                         DEGRADED     0     0     0
          raidz3-0                                      DEGRADED     0     0     0
            gptid/a38d9d54-2470-11e7-be70-ac220b8c944c  ONLINE       0     0     0
            gptid/6ba58636-4588-11e7-867f-ac220b8c944c  ONLINE       0     0     0
            gptid/e3acea0a-8574-11e4-9c86-ac220b8c944c  ONLINE       0     0     0
            gptid/18b749b3-c0b6-11e7-81f8-ac220b8c944c  ONLINE       0     0     0
            15678995806359064346                        OFFLINE      0     0     0  was /dev/gptid/8016f0cf-5557-11e4-a84e-ac220b8c944c
            gptid/a0fb3bb7-c685-11e4-acbc-ac220b8c944c  ONLINE       0     0     0
            gptid/80f31598-5557-11e4-a84e-ac220b8c944c  ONLINE       0     0     0
            gptid/8163ab25-5557-11e4-a84e-ac220b8c944c  ONLINE       0     0     0

errors: No known data errors

  pool: freenas-boot
state: ONLINE
  scan: scrub repaired 0 in 0 days 00:01:56 with 0 errors on Sat Mar  2 03:46:56 2019
config:

        NAME        STATE     READ WRITE CKSUM
        freenas-boot  ONLINE       0     0     0
          ada0p2    ONLINE       0     0     0

errors: No known data errors


If I try to online it in the GUI, I get this:

Code:
 File "./freenasUI/middleware/notifier.py", line 287, in zfs_online_disk
    c.call('pool.online', volume.id, {'label': label})

  File "./freenasUI/middleware/notifier.py", line 287, in zfs_online_disk
    c.call('pool.online', volume.id, {'label': label})

  File "/usr/local/lib/python3.7/site-packages/middlewared/client/client.py", line 453, in call
    raise ValidationErrors(c.extra)

middlewared.client.client.ValidationErrors: [EINVAL] options.label: Label /dev/gptid/8016f0cf-5557-11e4-a84e-ac220b8c944c not found on this pool.


During handling of the above exception, another exception occurred:


Traceback (most recent call last):

  File "/usr/local/lib/python3.7/site-packages/tastypie/resources.py", line 219, in wrapper
    response = callback(request, *args, **kwargs)

  File "./freenasUI/api/resources.py", line 899, in online_disk
    notifier().zfs_online_disk(obj, deserialized.get('label'))

  File "./freenasUI/middleware/notifier.py", line 289, in zfs_online_disk
    raise MiddlewareError(f'Disk online failed: {str(e)}')

freenasUI.middleware.exceptions.MiddlewareError: [MiddlewareError: Disk online failed: [EINVAL] options.label: Label /dev/gptid/8016f0cf-5557-11e4-a84e-ac220b8c944c not found on this pool.]



If I do zpool online backups /dev/gptid/8016f0cf-5557-11e4-a84e-ac220b8c944c, then I get this:

Code:
replicator# zpool online backups /dev/gptid/806359064346
cannot online /dev/gptid/806359064346: no such device in pool
replicator# zpool online backups /dev/gptid/8016f0cf-5557-11e4-a84e-ac220b8c944c
warning: device '/dev/gptid/8016f0cf-5557-11e4-a84e-ac220b8c944c' onlined, but remains in faulted state
use 'zpool replace' to replace devices that are no longer present
replicator#

After I do this, in the GUI it shows up as unavailable:

[Attached screenshot: Snap1.png]


And if I try to do Disk Replace in the GUI, it asks me if I want to replace disk member da7. I say yes (I tried both the forced and non-forced options) and I get something like this:
Code:
Mar  8 09:58:14 replicator uwsgi: [middleware.exceptions:36] [MiddlewareError: Disk online failed: [EINVAL] options.label: Label /dev/gptid/8016f0cf-5557-11e4-a84e-ac220b8c944c not found on this pool.]
Mar  8 10:00:08 replicator ZFS: vdev state changed, pool_guid=7276832981028910459 vdev_guid=13712933058586299289
Mar  8 10:00:08 replicator ZFS: vdev state changed, pool_guid=7276832981028910459 vdev_guid=4742708301625801581
Mar  8 10:00:08 replicator ZFS: vdev state changed, pool_guid=7276832981028910459 vdev_guid=10722137843091395938
Mar  8 10:00:08 replicator ZFS: vdev state changed, pool_guid=7276832981028910459 vdev_guid=10312215785613305283
Mar  8 10:00:08 replicator ZFS: vdev state changed, pool_guid=7276832981028910459 vdev_guid=15678995806359064346
Mar  8 10:00:08 replicator ZFS: vdev state changed, pool_guid=7276832981028910459 vdev_guid=18414390688458562224
Mar  8 10:00:08 replicator ZFS: vdev state changed, pool_guid=7276832981028910459 vdev_guid=7372792234020972046
Mar  8 10:00:08 replicator ZFS: vdev state changed, pool_guid=7276832981028910459 vdev_guid=14887578890026946931


I tried zpool replace backups /dev/gptid/8016f0cf-5557-11e4-a84e-ac220b8c944c, but all I get is:

Code:
replicator# zpool replace backups /dev/gptid/8016f0cf-5557-11e4-a84e-ac220b8c944c
cannot open '/dev/gptid/8016f0cf-5557-11e4-a84e-ac220b8c944c': No such file or directory
replicator#
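(For reference, the zpool(8) man page gives the general form as zpool replace [-f] pool device [new_device]; with only the old device named, ZFS expects a replacement disk to be sitting in the same location. The daX below is just a placeholder, not one of my actual disks.)

Code:
# general form: point ZFS at the replacement device explicitly
zpool replace backups /dev/gptid/8016f0cf-5557-11e4-a84e-ac220b8c944c /dev/daX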

I would really appreciate any help with this. Thank you guys in advance!!!
 

pechkin000

Explorer
Joined
Jan 24, 2014
Messages
59
In case anyone else runs into this issue: my problem was identifying the device name. The device name changed from the original da1 to da7 after the disk was replaced. Once I realized that, the following got it resilvering:

Code:
zpool replace backups /dev/gptid/8016f0cf-5557-11e4-a84e-ac220b8c944c /dev/da7
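If it helps anyone else, this is roughly how I could have confirmed which daX name the new disk got. It is just a sketch using standard FreeBSD commands, and da7 is specific to my system:

Code:
# list attached drives with their daX names and model strings
camcontrol devlist
# show which gptid labels map to which partitions (a brand-new blank
# disk will not have a gptid yet)
glabel status
# confirm which member of the pool is still offline/unavailable
zpool status backups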
 

pechkin000

Explorer
Joined
Jan 24, 2014
Messages
59
Just one last thought, in case any of the developers read this: the GUI was offering me the correct device to replace, but it kept erroring out. I was only able to do this in the shell with the zpool replace command; it did not work in the GUI. I am on the nightlies, so there must be some sort of issue if I was unable to do it via the GUI but it worked fine on the command line.
 

pro lamer

Guru
Joined
Feb 16, 2018
Messages
626
in case any of the developers read this.
You may file a bug...


da1 to da7 after it was replaced. Once I realized that, the following got it resilvering:
zpool replace backups /dev/gptid/8016f0cf-5557-11e4-a84e-ac220b8c944c /dev/da7
da7 may change to something else in the future :/ It depends on the exact result of your zpool replace command. Disclaimer: I don't know the exact result since I am a bit of a noob. So I just recommend you run the zpool status command and see whether the /dev/daX name is used or not... This way you can learn, and this command is not risky ;)
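Something like this is enough to check (the pool name is taken from your earlier output):

Code:
zpool status backups
# look at the raidz3-0 members: gptid/... labels are the stable ones,
# a bare da7 entry means the raw device name is in use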

Sent from my phone
 

pechkin000

Explorer
Joined
Jan 24, 2014
Messages
59
What threw me off was that it was erroring out in the GUI under the new device name. Once I did it in the shell it worked with no problem. I just assumed (dumb me) that there was something wrong with the new device name (da7), not the GUI... I think I am going to file a bug. It definitely seems like a GUI issue: for whatever reason, when using the GUI to replace, the old device reference (/dev/gptid/8016f0cf-5557-11e4-a84e-ac220b8c944c) was not being recognized. In the shell there were no problems. The GUI offered the correct replacement device, it just couldn't find the old one. I assumed the issue was with the new name...
 

pro lamer

Guru
Joined
Feb 16, 2018
Messages
626
You managed to make things work :)

Your way was not the way recommended on our forums though, i.e.:
1. The CLI is not recommended when the GUI can be used (that is not the case when there is a bug, so let's move on :) )
2. Our forums recommend some more sophisticated steps when replacing a disk using the CLI (including glabel stuff IIRC and some more, but I want to focus on glabel).

The glabel stuff causes the new disk to be referenced by a fixed identifier, unlike da identifiers. da identifiers may change after a reboot (that's what I've heard); it's said to be specific to BSD. If that name changes one day for you, the effect might be similar to losing a disk: the OS would be looking for the disk as da7, but that could change without any warning...

I recommend running zpool status so you can see whether a da identifier or a glabel identifier is now in use (before I recommend anything risky).

PS
think I am going to file a bug.
Please post its link here so we can follow :)

Sent from my phone
 

pechkin000

Explorer
Joined
Jan 24, 2014
Messages
59
I think I did use the gptid of the old disk... The new name wasn't the issue; the GUI wasn't recognizing the gptid of the old disk... I did use glabel to find the gptid of the disk.
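Concretely, the mapping comes from glabel; each gptid/... label is listed next to the partition it lives on:

Code:
replicator# glabel status
# columns are Name, Status and Components: the Name is the gptid/... label,
# Components is the partition (e.g. da0p2) it belongs to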
I filed the bug here:
#79878

I know that I should have offlined the disk before replacing it, but I wasn't able to due to the gptid not being recognized. I should have done it in the CLI since I couldn't in the GUI, but from my previous experience I knew that it really shouldn't be too much of a problem, so I just rushed it and replaced it.

In any case, I am pretty sure it's a GUI issue, since it all worked fine in the CLI. Hopefully the devs will be able to sort this out.
Thanks for your help!
 

pro lamer

Guru
Joined
Feb 16, 2018
Messages
626
I think I did use the gptid of the old disk
This is not what I meant... I'll try to explain below...

BTW
the new name wasn't the issue
But it may become an issue in some specific case in the future. I'm not sure because you haven't posted your current zpool status results :/

Back on track:
The new name visible in the CLI (in the zpool status result, which is different from the GUI list) might now be da7. If it changes in the future it may cause some issues (maybe minor ones, especially since you're using raidz3). If you had used the GUI (which you couldn't have) or used a recommended CLI method (there are some threads describing how to do it; one can find them using, for example, a "clicked partition" search, but that's not really the point since you already have the drive replaced; you can read them anyway if you have time), the new drive would be referenced by gptid (actually, a partition would be referenced).

I'm not trying to emphasize what you did less than ideally, but where you are now. I'm glad your way worked for you.

If da7 is in the zpool status result, it means the gptid is not in use for the new drive. I suppose it might be just a minor issue (unless someone more experienced knows better). If it's minor you may just leave it like this; changing it might not be worth it...

For now I recommend: if the next drive failure occurs before the bug is fixed and you need to use the CLI again, do it by creating a gptid for the new drive's partition (search our forums; I've just done a quick search and there were so many results that I didn't know which one was worth recommending).
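From what I remember of those threads, the CLI steps look roughly like this. Treat it only as a sketch and check a current guide before using it: the 2 GB swap partition, the partition indices and the da7 name are assumptions on my part, not something I've verified on your system.

Code:
# partition the new disk the way the GUI would (sketch only, double-check first!)
gpart create -s gpt da7
gpart add -i 1 -t freebsd-swap -s 2g da7
gpart add -i 2 -t freebsd-zfs da7
# find the gptid that was just created for the ZFS partition (da7p2)
glabel status | grep da7p2
# then replace by gptid so the pool never references the raw daX name
zpool replace backups /dev/gptid/<old-gptid> /dev/gptid/<new-gptid>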

I filed the bug here:
#79878
Thanks. I can't access it right now; I guess it's private. That's on purpose: your logs might contain private data. The dev team usually deletes the logs later and then makes the bug public.

Will you please post here when they resolve it? I can't even enable a watch for the bug while it's private ;)

Sent from my phone
 

pro lamer

Guru
Joined
Feb 16, 2018
Messages
626
Just an update

For now I recommend: if the next drive failure occurs before the bug is fixed and you need to use the CLI again, do it by creating a gptid for the new drive's partition.
Reason for the above: I'm afraid that having many drives in a vdev referenced as /dev/daX may cause some problems. Just a guess.

And I've just read that the CLI method
has changed in the past, and may change at any point in the future, without warning.

It's from an old thread, but the thread may still be interesting. I guess if I had more problems with this I'd start my own new thread...

Sent from my phone
 
