I fat-fingered a zpool replace command and now one pool is resilvering 2 disks at once

Joined
Aug 24, 2022
Messages
9
I have two pools in raidz2 with 8 disks each. Each pool had one disk showing signs of going bad, so I decided to replace them both at the same time.
I marked the two disks as offline in the FreeNAS web UI, removed them, and inserted 2 new disks.
I issued the command to replace the disk in the "tank" pool from the web UI. Everything went fine and it started resilvering.
The web UI then became unresponsive when I tried to view the volume status for the "backups" pool, so I issued the replace command for the other disk over SSH.
I must have done something wrong, because the "tank" pool now reports 9 disks, with 2 marked as resilvering at the same time, while the "backups" pool is as I left it (8 disks, one of which is marked as offline).

Code:
[root@nas01] ~# zpool status -x
  pool: backups
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
  scan: scrub repaired 846K in 521h30m with 0 errors on Sun Aug 14 17:30:42 2022
config:

        NAME                                            STATE     READ WRITE CKSUM
        backups                                         DEGRADED     0     0     0
          raidz2-0                                      DEGRADED     0     0     0
            gptid/1b182b9d-bb16-11ea-be28-003048d504aa  ONLINE       0     0     0
            6335302298534030911                         OFFLINE      0     0     0  was /dev/gptid/5917b5c2-ff88-11e6-87c1-003048d504aa
            gptid/59b11de3-ff88-11e6-87c1-003048d504aa  ONLINE       0     0     0
            gptid/5a4b8236-ff88-11e6-87c1-003048d504aa  ONLINE       0     0     0
            gptid/5adfbcd0-ff88-11e6-87c1-003048d504aa  ONLINE       0     0     0
            gptid/5b768f4d-ff88-11e6-87c1-003048d504aa  ONLINE       0     0     0
            gptid/5c151242-ff88-11e6-87c1-003048d504aa  ONLINE       0     0     0
            gptid/5cb278e0-ff88-11e6-87c1-003048d504aa  ONLINE       0     0     0

errors: No known data errors

  pool: tank
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Mon Aug  8 05:59:14 2022
        13.9T scanned out of 23.8T at 10.3M/s, 278h49m to go
        3.42T resilvered, 58.42% done
config:

        NAME                                              STATE     READ WRITE CKSUM
        tank                                              ONLINE       0     0     0
          raidz2-0                                        ONLINE       0     0     0
            gptid/c6946e38-bb14-11ea-be28-003048d504aa    ONLINE       0     0     0
            gptid/60dd0414-fe04-11e6-87c1-003048d504aa    ONLINE       0     0     0
            gptid/61724eb6-fe04-11e6-87c1-003048d504aa    ONLINE       0     0     0
            replacing-3                                   ONLINE       0     0     0
              da14                                        ONLINE       0     0     0  block size: 512B configured, 4096B native  (resilvering)
              gptid/c9c295a1-1719-11ed-9c40-003048d504aa  ONLINE       0     0     0  block size: 512B configured, 4096B native  (resilvering)
            gptid/62927254-fe04-11e6-87c1-003048d504aa    ONLINE       0     0     0
            gptid/6321d982-fe04-11e6-87c1-003048d504aa    ONLINE       0     0     0
            gptid/63b84530-fe04-11e6-87c1-003048d504aa    ONLINE       0     0     0
            gptid/64497fe6-fe04-11e6-87c1-003048d504aa    ONLINE       0     0     0

errors: No known data errors


Somehow, running "history" over SSH does not show the zpool replace command I ran, but looking at the result I suspect I ran something like
zpool replace tank deviceIdOfTheRemovedDriveInBackups newDeviceName
or
zpool replace backups deviceIdOfTheRemovedDriveInTank newDeviceName
instead of
zpool replace backups deviceIdOfTheRemovedDriveInBackups newDeviceName
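
(For the record: if root's shell kept a history file, something along these lines might dig the exact command back up. This is just a sketch; the file names are guesses and may well not exist on a FreeNAS install, in which case the command is simply lost.)
Code:
ls -l /root/.history /root/.bash_history
grep "zpool replace" /root/.history /root/.bash_history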

The web UI is still unresponsive when I try to view the volume status for the backups pool, but that's a minor concern right now.

How can I remediate this? (I have 2 pools with 8 disks each.)
Thank you!!
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
so I issued the replace command for the other disk over SSH.
Did you partition the disk according to TrueNAS' expectations?

I must have done something wrong, because the "tank" pool now reports 9 disks, with 2 marked as resilvering at the same time, while the "backups" pool is as I left it (8 disks, one of which is marked as offline).
No, it's actually replacing gptid/c9c295a1-1719-11ed-9c40-003048d504aa with da14 (something of an artifact of not following TrueNAS' process when doing this manually), the former of which wasn't offlined or otherwise reporting errors. Not much you can do here other than letting it finish, at which point you'll have a disk not in use and a disk as part of the vdev.

It looks to me like the original replace operation was never actually issued for backups.

What concerns me far more than any of this, however, are your scrub times.
scrub repaired 846K in 521h30m with 0 errors on Sun Aug 14 17:30:42 2022
521 hours. And 30 minutes. That's almost 22 days! Many of us run scrubs every two weeks, which is less time than it takes for your pools to scrub once. Something is very wrong there - it could be the failing disks you mentioned, to some extent, but I'd like to know more about the setup before making any definitive statements.
 
Joined
Aug 24, 2022
Messages
9
thank you for looking into this

Did you partition the disk according to TrueNAS' expectations?
no, I inserted two brand new disks and issued one replace command from the web UI and one from SSH

No, it's actually replacing gptid/c9c295a1-1719-11ed-9c40-003048d504aa with da14 (something of an artifact of not following TrueNAS' process when doing this manually), the former of which wasn't offlined or otherwise reporting errors. Not much you can do here other than letting it finish, at which point you'll have a disk not in use and a disk as part of the vdev.
OK, it looks like it will take another couple of weeks, so I'll just let it finish.

It looks to me like the original replace operation was never actually issued for backups.
Indeed, that's how it appears. But I'm sure I issued two replace commands, one from the browser and one from SSH, which is why I think I messed up the SSH one and am now somehow replacing the same disk twice.

What concerns me far more than any of this, however, are your scrub times.

521 hours. And 30 minutes. That's almost 22 days! Many of us run scrubs every two weeks, which is less time than it takes for your pools to scrub once. Something is very wrong there - it could be the failing disks you mentioned, to some extent, but I'd like to know more about the setup before making any definitive statements.
Yes, that's excessive. There are 16 disks across the 2 pools, and several need attention. I started by replacing the 2 worst ones, but there are more.
Annotated SMART report:
[screenshot attachments: annotated SMART reports]

Today the FreeNAS gods are being kind, and the volume status for the backups pool loaded from the web UI:
[screenshot attachment: backups pool volume status]
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
The disks don’t seem to be quite terrible enough to explain the crawling scrub… Are these SMR drives by any chance?
 
Joined
Aug 24, 2022
Messages
9
I checked a previous report, and da14 had 1093 Current Pending Sectors and 1482 Offline Uncorrectable Sectors on 8/4 (while the last scrub was running).
 
Joined
Aug 24, 2022
Messages
9
No SMR. It's a mixture of WD Gold and Seagate IronWolf Pro 4TB NAS drives (ST4000NE001: 7200 RPM, 256MB cache, CMR, SATA 6.0Gb/s, 3.5").
 

Alex_K

Explorer
Joined
Sep 4, 2016
Messages
64
You may be able to speed up resilvering with the "Resilver Priority" tab under Storage.

For the scrub: what are the values on your system for the tunables mentioned here: https://www.truenas.com/community/threads/scrub-performance-tuning.51959/ ?
You can check them with sysctl, as sketched below.
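
For example (a sketch; these are the usual scrub/resilver tunables of the legacy FreeBSD ZFS in that era of FreeNAS, and some may be absent on newer OpenZFS-based releases, in which case just skip them):
Code:
# Read the current values of the scrub/resilver tunables
sysctl vfs.zfs.scrub_delay vfs.zfs.resilver_delay
sysctl vfs.zfs.resilver_min_time_ms vfs.zfs.top_maxinflight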

Also, what's the normal workload for your backups pool? Maybe it's just busy all the time, and that's why the scrub runs that long. You can check the disk busy graph in Reporting under the Disks tab; use zoom out.

The temperatures on the disks are a little higher than you may want.
 
Joined
Aug 24, 2022
Messages
9
[screenshot attachments: disk busy/reporting graphs; images not preserved]

Alex_K

Explorer
Joined
Sep 4, 2016
Messages
64
OK, so the disks being under this kind of load, together with everything else, will affect scrub/resilver time, but 22 days is still a bit too much.

A prioritized resilver should normally take up to 48 hours for your configuration, depending on the percentage of space used on the disks.

Normally, a resilver is the most stressful, and therefore the most risky, operation for your disks. You want it to complete as fast as possible. To achieve that, I usually try to reduce or pause other loads; if I don't, the resilver makes the other operations crawl anyway.

So if you can, give your backups pool some air (by the way, 43°C on the disks is not good; if you can, increase cooling at least until you finish replacing disks - if it's a server, it can possibly do that via IPMI/iLO/iDRAC etc., depending on the vendor, without a restart).
Pause any other tasks that can be paused for 2-3 days.
Set Resilver Priority to run 24/7.
Then watch whether your resilver time drops, and by how much.

Also, check whether you have errors in your /var/log/messages such as SYNCHRONIZE CACHE or CAM status: Command timeout :rolleyes:
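
A quick way to look, assuming the standard syslog location (adjust the path if your system logs elsewhere):
Code:
# Search the system log for the error strings mentioned above
grep -iE "SYNCHRONIZE CACHE|CAM status" /var/log/messages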
 
Joined
Aug 24, 2022
Messages
9
The backups pool is always busy and had one drive (now removed) with thousands of bad sectors; these factors might explain the long scrub time. Backups is not being resilvered at the moment.

The tank pool, however, gets busy only once a month, for about 8 days, during the scrub with the default priority settings (see the peaks in the graph of da15 in my previous post, for example). The only reason I can think of for the resilver going so slowly on the tank pool is that it's somehow writing to da2 and da14 at the same time, instead of just to da2. I was expecting tank to resilver one disk and backups to resilver one disk, but I messed something up and tank is writing to two disks. My guess is that this forces the healthy drives in the pool to keep seeking to different parts of the disk, and that is causing the slowdown.

[screenshot attachments: disk busy graphs]


I've removed some dust from the disks; let's see if it helps with the temperatures.
 
Joined
Aug 24, 2022
Messages
9
Also, check whether you have errors in your /var/log/messages such as SYNCHRONIZE CACHE or CAM status: Command timeout :rolleyes:
Those 2 messages don't show up; I've only got a handful of alerts from smartd every 30 minutes about the other drives with bad sectors.
[screenshot attachment: smartd alert messages]
 

Alex_K

Explorer
Joined
Sep 4, 2016
Messages
64
OK, it does write to both da2 and da14, but judging by "disk busy" it looks to be limited by their write performance, not by the other disks' read performance.

Could it be that over SSH you issued a command like this?
zpool replace tank deviceIdOfTheRemovedDriveInTank da14
That would explain the current configuration.

As for why it's showing da14 there and not some GUID: the daX name may change, and that's why you want the GUID there.

What you normally have as devices in your pools are not your disks - not entire disks - but partitions.
For each physical disk you want to use in your data pools, the FreeNAS GUI (middleware) partitions it by default like this:
daXp1, 2GB in size - for swap using gmirror (not always used, but always partitioned)
daXp2, the rest of the disk - this partition gets identified by a GUID, and that GUID is what takes part in the pool
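
You can see this layout, and the mapping from the gptid/... labels in zpool status back to daXpY partitions, for yourself. A rough sketch (da1 is just an example device name):
Code:
# Partition table the middleware created on one of the pool disks
gpart show da1
# Map gptid labels to their backing partitions
glabel status | grep gptid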


Now, any change in topology may restart the resilver from the beginning, and given that it's more than half done, you probably don't want that.
But once things settle down, you'll probably want to remove da14 from tank, make sure tank is OK without it, then
"quick wipe" da14 from the GUI, then - carefully - do this:
dd if=/dev/zero of=/dev/da14 bs=1m count=1
then "quick wipe" it again.
For me this is a fast and reliable way to keep the GUI from throwing errors when I replace a failed drive with another drive that was previously part of a ZFS pool on the same server.
Then replace the offline device in backups with this one.

If, however, you want to do it manually, make sure the device name is still da14, then, after quick wiping it:
gpart add -a 4k -b 128 -i 1 -t freebsd-swap -s 2G da14
gpart add -a 4k -i 2 -t freebsd-zfs da14
gpart list da14
Take note of the "rawuuid" of da14p2;
you'll replace the missing member of "backups" with gptid/that_rawuuid (see the sketch below).
You may take a look at gmirror status and maybe add da14p1 there if you feel like it, but even if you don't, FreeNAS probably won't suffer much from the absence of 1 swap partition, and it will rearrange all the gmirrors on the next reboot using the partition you prepared for it.
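
The last step would then be something along these lines (a sketch only: 6335302298534030911 is the GUID of the offlined member shown in your zpool status above, and the rawuuid placeholder has to be replaced with the actual value from gpart list da14):
Code:
# Replace the OFFLINE member of backups with the freshly created da14p2 partition
zpool replace backups 6335302298534030911 gptid/<rawuuid-of-da14p2>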
 
Joined
Aug 24, 2022
Messages
9
Update: the resilver finished on tank. It ended up not quite as we expected (or at least not as I understood it), but everything seems good anyway.
Expected: da14 in the tank pool with a single partition, and da2 gone.
Actual: da2 is in the pool with the 2 partitions, the pool has 8 disks, and it is marked as online.
[screenshot attachment: tank pool status]


The backups pool is as before
[screenshot attachment: backups pool status]

I think the best way to proceed now is to quick wipe da14 and then issue the replace command from the web UI, and backups will eventually be fine as well.
 

Alex_K

Explorer
Joined
Sep 4, 2016
Messages
64
Well, it seems the GUI replace with da2 on tank took precedence, and the GUI did partition da2 properly, so it all ended up just fine.
You can do just a quick wipe on da14 and replace it into backups, and it would probably be fine, but
without the dd and the 2nd quick wipe, when I reuse a drive that was previously in a pool on the same server, I often see something in the log (/var/log/messages) about one of the superblocks on daXX being corrupted and the other one being used. For me that error doesn't show up if I do quick wipe + dd of the beginning of the disk + another quick wipe.
 