Failed to replace hard drive via GUI

Delphinus

Dabbler
Joined
Sep 15, 2011
Messages
14
Hello all, I have a failed drive in one of my raidz2 pools. Normally this is a trivial thing to replace; I've done it a dozen times already, but this time around I cannot get the GUI to replace the failed disk with the new disk I installed. I even tried via the command line (see the sketch at the end of this post) and can't get the replace to take. Anyone have any ideas? Appreciate the help!

Code:
  pool: media
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://illumos.org/msg/ZFS-8000-9P
  scan: scrub repaired 0 in 0 days 13:28:58 with 0 errors on Sun Jun 21 13:28:59 2020
config:

        NAME                                            STATE     READ WRITE CKSUM
        media                                           DEGRADED     0     0     0
          raidz2-0                                      DEGRADED     0     0     0
            gptid/6861e259-1068-11e9-918e-0c9d9264b1d5  ONLINE       0     0     0
            gptid/6994874f-1068-11e9-918e-0c9d9264b1d5  ONLINE       0     0     0
            gptid/6bd32ca7-1068-11e9-918e-0c9d9264b1d5  ONLINE       0     0     0
            gptid/6e20a0df-1068-11e9-918e-0c9d9264b1d5  ONLINE       0     0     0
            4779586579116178481                         OFFLINE      6     0     0  was /dev/gptid/6f54e2db-1068-11e9-918e-0c9d9264b1d5
            gptid/718e607b-1068-11e9-918e-0c9d9264b1d5  ONLINE       0     0     0
            gptid/73fe3fea-1068-11e9-918e-0c9d9264b1d5  ONLINE       0     0     0
            gptid/762e78ac-1068-11e9-918e-0c9d9264b1d5  ONLINE       0     0     0


I try the replace via the GUI and receive this error:
[attached screenshot: GUI error message]


Have no idea how to get around this. It's a brand new disk, just took it out of the anti-static bag, don't think it's DOA...
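For reference, the command-line replace I attempted was along these lines (roughly; using the offline member's GUID from the status output above, with the new disk at da19):

Code:
zpool replace media 4779586579116178481 /dev/da19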
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
What's the exact model of the new drive? You may have picked up an SMR drive by chance.
 

Delphinus

Dabbler
Joined
Sep 15, 2011
Messages
14
Even if it was an SMR drive, I should still be able to use it as a replacement, no?

It's a Seagate 10 TB IronWolf NAS drive. Definitely not an SMR.
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
Even if it was an SMR drive, I should still be able to use it as a replacement, no?

Odds are, no. The resilver would take approximately 10x longer than with a CMR drive, and could end up knocking the drive out of the pool anyway due to timeouts.

It's a Seagate 10 TB IronWolf NAS drive.

OK. The drive is failing the gpart create -s gpt da19 step, most likely because it already has a partition table. Try gpart destroy -F da19 to wipe out any factory partitioning, and watch for any errors from the shell while you do. Then try your disk replacement again.
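Put together, the sequence would look like this (a sketch, assuming the new disk is still da19):

Code:
gpart destroy -F da19    # force-wipe any factory/leftover partition table
gpart create -s gpt da19 # lay down a fresh GPT to confirm the disk accepts it
gpart show da19          # verify the new, empty GPT is visible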
 

Delphinus

Dabbler
Joined
Sep 15, 2011
Messages
14
My point is, the drive would have been added fine even if it was SMR; yes, the resilver would have failed, but the add itself should have worked.

gpart says the device is not configured.
[attached screenshot: "Device not configured" error]
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
Hmm. What happens when you manually run gpart create -s gpt da19?
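For what it's worth, here's what I'd expect from that step (a sketch of gpart's usual behavior):

Code:
gpart create -s gpt da19
# success normally echoes "da19 created"; a "Device not configured" (ENXIO)
# error instead points at the disk or its port, not leftover partitioning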
 

subhuman

Contributor
Joined
Nov 21, 2019
Messages
121
Have no idea how to get around this. It's a brand new disk, just took it out of the anti-static bag, don't think it's DOA...
Sounds like it's time to confirm that the drive is not DOA.
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
Yes, the drive is probably DOA. It won't accept writes.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
Then you shouldn't be putting it in your pool anyway before you've done some burn-in and testing. There are a number of resources for this; I tend to favor the one I host:

Delphinus

Dabbler
Joined
Sep 15, 2011
Messages
14
The drive is not DOA; I plugged it into another system and was able to initialize it and create a volume. It's something with FreeNAS...

[attached screenshot: disk initialized successfully on another system]
 

Delphinus

Dabbler
Joined
Sep 15, 2011
Messages
14
Then you shouldn't be putting it in your pool anyway before you've done some burn-in and testing. There are a number of resources for this; I tend to favor the one I host:

Really? I'm not running an enterprise data center here...if the drive fails on the initial resilver, I warranty it and replace it. And all this just to plug your own burn-in tester?
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
Yes, really.

And all this just to plug your own burn-in tester?
I didn't write it, but I host it. And my "own burn-in tester" consists of a handful of open-source tools that ship with FreeNAS. I don't care which method you use, but it's the link I have most handy.
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
What's the firmware level of this drive? See this earlier thread:
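One way to read the firmware level from the shell (assuming the disk is da19):

Code:
smartctl -i /dev/da19   # identity block includes the "Firmware Version" field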

 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
Does the same behavior persist when you connect the drive directly to a motherboard SATA port, as opposed to a port on your HBA?
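To confirm how the drive enumerates after moving it, camcontrol (part of FreeBSD) can list what's attached:

Code:
camcontrol devlist   # lists attached disks with their bus path and da# node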
 

subhuman

Contributor
Joined
Nov 21, 2019
Messages
121
Really? I'm not running an enterprise data center here...if the drive fails on the initial resilver, I warranty it and replace it. And all this just to plug your own burn-in tester?
Yes, really. Never assume.
Think about what you're saying. FreeNAS worked fine with the old drive until it failed, it's working fine with the other drives in the pool, and you have a problem with this one drive, yet you claim:
It's something with FreeNAS...

Great, you were able to partition it under Windows. But have you so much as run self-tests on the drive?
At bare minimum, you should start with both the short and long SMART self-tests (a sketch follows below). New drives being DOA is not common, but it does happen. If it's never happened to you before, great. But it's possible this one is. Verify, don't assume.
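Something along these lines (a sketch, assuming the new disk is da19):

Code:
smartctl -t short /dev/da19     # short self-test, a few minutes
smartctl -t long /dev/da19      # extended self-test, many hours on a 10 TB disk
smartctl -l selftest /dev/da19  # review the self-test log once they finish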

Your pool is degraded, you want the new drive in there ASAP. I get that. But the worst-case scenario you can run into is to throw in a new drive, make the old drives work hard to resilver, then have the new drive fail and have to immediately go through the resilver again. This is what you want to avoid. This is why so many people recommend such intensive testing of new drives.
 

Delphinus

Dabbler
Joined
Sep 15, 2011
Messages
14
Code:
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%        22         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing


Running the long test now...and yes, I do have 16 other IronWolf 10 TB drives in the NAS already, which is why I don't understand why it's having such a hard time replacing this drive. I've also tried two other IronWolf 10 TB drives that were part of another array, and it won't add those either. So I really don't think it's a drive problem or a "burn-in" issue; for some reason FreeNAS won't accept a drive replacement. Pretty sure I could throw any drive in there and it's gonna throw the same error.
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
There's another possibility: maybe that port was damaged when the old drive was pulled out. To make sure the drive can actually write under FreeNAS, you could try dd if=/dev/zero of=/dev/da19 bs=4096 count=10000 status=progress after your long test.
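Something like this, with a read-back to confirm (a sketch; note it overwrites the start of the disk):

Code:
# Destructive: overwrites the first ~40 MB of da19.
dd if=/dev/zero of=/dev/da19 bs=4096 count=10000
# Read the same region back to confirm the disk returns data cleanly:
dd if=/dev/da19 of=/dev/null bs=4096 count=10000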
 