New drive stuck in "Replacing".

krakah

Dabbler
Joined
Jun 22, 2011
Messages
20
Hello all. I had a failed drive and purchased an identical one. After I followed the documentation to replace the drive, the new drive is now stuck in "replacing".


Here is the output of zpool status -v:

Code:
root@s83-stor0[~]# zpool status -v POOL1
  pool: POOL1
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://illumos.org/msg/ZFS-8000-8A
  scan: resilvered 3.29T in 0 days 14:56:40 with 339759 errors on Sun Sep 13 02:38:48 2020
config:

        NAME                                              STATE     READ WRITE CKSUM
        POOL1                                             DEGRADED     0     0  857K
          raidz1-0                                        DEGRADED     0     0 2.05M
            gptid/4167c803-245f-11ea-ad31-0cc47aca5b0a    DEGRADED     0     0     0  too many errors
            gptid/470b7fd1-245f-11ea-ad31-0cc47aca5b0a    DEGRADED     0     0     0  too many errors
            gptid/4cbd97cd-245f-11ea-ad31-0cc47aca5b0a    DEGRADED     0     0     0  too many errors
            gptid/523fd061-245f-11ea-ad31-0cc47aca5b0a    DEGRADED     0     0     0  too many errors
            gptid/57ed538e-245f-11ea-ad31-0cc47aca5b0a    DEGRADED     0     0     0  too many errors
            replacing-5                                   DEGRADED     0     0     0
              7000134766729437270                         OFFLINE      0     0     0  was /dev/gptid/5d76883d-245f-11ea-ad31-0cc47aca5b0a
              gptid/1c0196b3-f219-11ea-852a-0cc47aca5b0a  ONLINE       0     0     0
        cache
          gptid/63c2d766-245f-11ea-ad31-0cc47aca5b0a      ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        POOL1/DATA1:<0x69a0f>
        POOL1/DATA1:<0x69b7d>
        POOL1/DATA1:<0x6a9f0>
root@s83-stor0[~]#


It's currently in degraded mode and has been for a couple of weeks, during which it has resilvered several times. I'd rather not have another drive failure. How can I get this new drive to replace the failed one?
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694
What type of drives are you using?
Is the pool down or still functioning?
 

krakah

They are all WD Easystore shucked 10TB drives.

The pool is still functioning. I'm worried about losing another drive, which could fail the pool completely.
 

morganL

So, the symptom seems to be that all drives are reporting errors? Can you confirm?
 

krakah

According to the zpool status output, multiple drives appear to have errors, and those errors affect 3 files which no longer exist.

I'm running a scrub right now, which should finish in a couple of hours. Then maybe I'll try a zpool clear and let it resilver again?
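
To make the "multiple drives have errors" picture easier to compare between scrub and resilver runs, the config section of the status output can be tallied with a short script. This is only a minimal sketch, assuming the column layout shown in the first post (NAME, STATE, READ, WRITE, CKSUM, then an optional note). The names `SAMPLE`, `ROW`, and `parse_config` are hypothetical and not part of any ZFS tooling; the sample is an excerpt of the output posted above.

```python
import re

# Excerpt copied from the `zpool status -v POOL1` output in the first post.
SAMPLE = """\
        NAME                                              STATE     READ WRITE CKSUM
        POOL1                                             DEGRADED     0     0  857K
          raidz1-0                                        DEGRADED     0     0 2.05M
            gptid/4167c803-245f-11ea-ad31-0cc47aca5b0a    DEGRADED     0     0     0  too many errors
            replacing-5                                   DEGRADED     0     0     0
              7000134766729437270                         OFFLINE      0     0     0  was /dev/gptid/5d76883d-245f-11ea-ad31-0cc47aca5b0a
              gptid/1c0196b3-f219-11ea-852a-0cc47aca5b0a  ONLINE       0     0     0
"""

# Assumed row shape: name, upper-case state, three counters, optional trailing note.
ROW = re.compile(
    r"^\s*(?P<name>\S+)\s+(?P<state>[A-Z]+)"
    r"\s+(?P<read>\S+)\s+(?P<write>\S+)\s+(?P<cksum>\S+)"
    r"(?:\s+(?P<note>.*))?$"
)

def parse_config(text):
    """Return one dict per device row, skipping the header and unrelated lines."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m is None or m.group("name") == "NAME":
            continue
        rows.append(m.groupdict())
    return rows

if __name__ == "__main__":
    for row in parse_config(SAMPLE):
        # Print every row that is not cleanly ONLINE, with its checksum count and note.
        if row["state"] != "ONLINE":
            print(row["name"], row["state"], row["cksum"], row["note"] or "")
```

Counters are kept as strings because the output abbreviates them (857K, 2.05M), so this only helps with spotting which rows change between runs, not with exact arithmetic.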
 

morganL

When multiple drives have errors, it's time to be suspicious that something else is causing the issue. I don't know your hardware. For example, do you have ECC RAM?
 

krakah

While I can certainly look at hardware issues, I'm a bit more concerned about how to bring the pool out of its degraded state. It seems stuck on 3 files that it says have permanent errors. I've since deleted those files and they no longer exist. The data isn't incredibly important, but it would be a major pain in the ass to get back. All the critical stuff is backed up already, and offsite. How can I just get this thing to replace the drive?
 

krakah

With about an hour left in the resilver, it always picks up these errors. There used to be 5 files altogether. I deleted two and did a resilver, and it completely forgot about those files. I then deleted the remaining three, but every time, with about an hour left in the resilver (which takes about 14 hours), it still complains about these three files.

Code:

errors: Permanent errors have been detected in the following files:

        POOL1/DATA1:<0x69a0f>
        POOL1/DATA1:<0x69b7d>
        POOL1/DATA1:<0x6a9f0>
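
The entries above are printed as `dataset:<0x...>` and not as paths. As far as I know, that is how zpool status reports an object it can no longer map to a file name, which would be consistent with the files having been deleted. A minimal sketch for pulling the dataset and object number out of such lines, so they can be compared from one resilver to the next, could look like this. `ERRORS`, `ENTRY`, and `parse_object_errors` are hypothetical names, and the sample is copied from the listing above.

```python
import re

# Entries copied from the "Permanent errors" listing above.
ERRORS = """\
        POOL1/DATA1:<0x69a0f>
        POOL1/DATA1:<0x69b7d>
        POOL1/DATA1:<0x6a9f0>
"""

# Matches only the object-number form; entries that are ordinary file paths are ignored.
ENTRY = re.compile(r"^\s*(?P<dataset>[^:\s]+):<0x(?P<obj>[0-9a-fA-F]+)>\s*$")

def parse_object_errors(text):
    """Return (dataset, object_number) pairs for lines shaped like dataset:<0xHEX>."""
    found = []
    for line in text.splitlines():
        m = ENTRY.match(line)
        if m:
            found.append((m.group("dataset"), int(m.group("obj"), 16)))
    return found

if __name__ == "__main__":
    for dataset, obj in parse_object_errors(ERRORS):
        # Show each object number in both decimal and hex.
        print(f"{dataset}: object {obj} (0x{obj:x})")
```

Saving this list after each resilver and comparing the runs would show whether the same three objects keep coming back or whether new ones appear.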


 

morganL

Unless someone else has seen a similar issue... it may be a bug. I'd suggest reporting it as a bug with all the details of hardware and software version. Which version of software are you using?

If you know anything about the files involved it might be useful.
 