ATA error count increased, device experienced unrecoverable error

koberulz

Dabbler | Joined Aug 4, 2022 | 45 messages
I rebooted and it now showed the disk as offline, so I shut down and pulled the disk... but it's still showing sde in the pool. Has it renamed everything? None of the current disks show as faulted, and the serial number of the disk I pulled definitely matches all the errors thrown (which I've been getting several times a day for the past day or so).
 

joeschmuck

Old Man | Moderator | Joined May 28, 2011 | 10,994 messages
You beat me to it, good work.
 

koberulz

joeschmuck said: "You beat me to it, good work."
Not sure what I beat you to (rebooting?), but this doesn't address the sde-still-in-the-pool thing.

I'm 99% sure I yanked the right disk and it just got renamed, but 99 is not 100, and I definitely don't want to send the wrong disk back. As it is, I may well end up receiving the replacement after I've left the state for a week and a half, which is not ideal.
 

joeschmuck

Post the output of zpool status in code tags so we can see exactly what you're looking at.
 

koberulz

Code:
  pool: MainStorage
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
  scan: scrub repaired 960K in 19:04:21 with 0 errors on Sun Sep 10 19:04:23 2023
config:

        NAME                                      STATE     READ WRITE CKSUM
        MainStorage                               DEGRADED     0     0     0
          raidz2-0                                DEGRADED     0     0     0
            99a0bc63-87b7-44e2-a174-08535631723d  ONLINE       0     0     0
            53b54420-0eb5-4667-a59a-0fed5e2d788e  ONLINE       0     0     0
            31133de3-dad8-4668-8544-c2b780480239  OFFLINE      0     0     0
            91f76a5f-3418-4ad9-aaab-34c89f4d04f7  ONLINE       0     0     0

errors: No known data errors

  pool: Mirror
 state: ONLINE
  scan: scrub repaired 0B in 1 days 00:54:18 with 0 errors on Mon Sep  4 00:54:20 2023
config:

        NAME                                      STATE     READ WRITE CKSUM
        Mirror                                    ONLINE       0     0     0
          mirror-0                                ONLINE       0     0     0
            0b1a6f66-8b64-49da-aaab-661b22a0e38e  ONLINE       0     0     0
            43e9436d-3f8a-430a-b4ba-fb6028ce96df  ONLINE       0     0     0

errors: No known data errors

  pool: boot-pool
 state: ONLINE
status: Some supported and requested features are not enabled on the pool.
        The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(7) for details.
  scan: scrub repaired 0B in 00:08:30 with 0 errors on Tue Sep  5 03:53:31 2023
config:

        NAME         STATE     READ WRITE CKSUM
        boot-pool    ONLINE       0     0     0
          nvme0n1p3  ONLINE       0     0     0

errors: No known data errors
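In output like this, the member that needs attention is the leaf device whose STATE is not ONLINE (here the OFFLINE partition UUID under raidz2-0). A minimal Python sketch that extracts those leaf devices from zpool status text, assuming the NAME/STATE/READ/WRITE/CKSUM column layout shown above and that deeper indentation means a child vdev. The function name problem_devices is made up for illustration.

```python
from typing import List, Tuple

def problem_devices(status_text: str) -> List[Tuple[str, str, str]]:
    """Return (pool, device, state) for each leaf vdev whose STATE is not ONLINE."""
    results: List[Tuple[str, str, str]] = []
    rows: List[Tuple[int, str, str]] = []  # (indent, name, state) for the current pool
    pool = ""
    in_config = False

    def flush() -> None:
        # A row is a leaf (an actual disk/partition) if the next row is not indented deeper.
        for i, (indent, name, state) in enumerate(rows):
            is_leaf = i + 1 == len(rows) or rows[i + 1][0] <= indent
            if is_leaf and state != "ONLINE":
                results.append((pool, name, state))
        rows.clear()

    for line in status_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("pool:"):
            flush()  # finish the previous pool before switching names
            pool = stripped.split(":", 1)[1].strip()
            in_config = False
        elif stripped.startswith("NAME") and "STATE" in stripped:
            in_config = True  # header row of the config table
        elif stripped.startswith("errors:"):
            in_config = False  # the config table ends here
        elif in_config and stripped:
            parts = stripped.split()
            if len(parts) >= 2:  # skip one-word section labels such as "spares"
                indent = len(line) - len(line.lstrip())
                rows.append((indent, parts[0], parts[1]))
    flush()
    return results
```

On a live system you would feed it the stdout of subprocess.run(["zpool", "status"], capture_output=True, text=True); for the output above it should report only the OFFLINE MainStorage member.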
 

NugentS

MVP | Joined Apr 16, 2020 | 2,947 messages
koberulz said: "Well in this circumstance, the monthly test finished less than a week before the issue cropped up, so it makes no difference.

Given it takes two days to test one disc, testing six discs weekly seems excessive, no? It'd be running tests 24/7."
The tests will all run at the same time. It's the drive testing itself, not the OS testing the drive.
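To illustrate the point: with smartmontools, smartctl -t long asks the drive to start an extended self-test and returns almost immediately; the test itself runs in the drive's firmware. Starting all six drives back to back therefore has them testing in parallel, so the total time is roughly that of the slowest drive rather than the sum of all six. A minimal Python sketch, assuming smartctl is on the PATH and using hypothetical device names; on TrueNAS, scheduling S.M.A.R.T. tests from the UI is the usual way to do this.

```python
import subprocess
from typing import List, Sequence

def long_test_commands(devices: Sequence[str]) -> List[List[str]]:
    """One `smartctl -t long` command per drive."""
    return [["smartctl", "-t", "long", dev] for dev in devices]

def start_long_tests(devices: Sequence[str], dry_run: bool = True) -> List[List[str]]:
    """Start an extended self-test on every drive, one after another.

    smartctl returns as soon as the drive accepts the request; the test then
    runs inside each drive's firmware, so all drives end up testing in parallel.
    """
    cmds = long_test_commands(devices)
    for cmd in cmds:
        if dry_run:
            print(" ".join(cmd))  # show what would run without touching the drives
        else:
            result = subprocess.run(cmd, capture_output=True, text=True)
            # smartctl's exit status is a bit mask, so a nonzero value does not
            # necessarily mean the test failed to start; check the output text.
            print(cmd[-1], "exit status", result.returncode)
    return cmds
```

Run as root with dry_run=False to actually start the tests; smartctl -l selftest /dev/sdX shows each drive's results once they finish.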
 

joeschmuck

Sorry, brain fart on my part. This is easy to answer: the drive you should have pulled is serial number 2LGA639K.
 

joeschmuck

I honestly don't know what to do about the "OFFLINE" drive. Maybe try zpool clear MainStorage and see if that resolves it. Also try a complete power off if you haven't already; stranger things have happened.

Is MainStorage a 3-drive or 4-drive pool? If it's a 4-drive pool, you still need to online the drive. If it's a 3-drive pool, we just need to figure out how to remove the one drive. I have an idea, but someone here should KNOW the correct answer.
 

koberulz

joeschmuck said: "Sorry, brain fart on my part. This is easy to answer: the drive you should have pulled is serial number 2LGA639K."
That's what I figured, but it was listed as sde, and there's still an sde listed in the pool.

joeschmuck said: "I honestly don't know what to do about the 'OFFLINE' drive. Maybe try zpool clear MainStorage and see if that resolves it. Also try a complete power off if you haven't already; stranger things have happened."
It being offline is correct, isn't it? I took it offline, shut down, removed it, and booted up.

joeschmuck said: "Is MainStorage a 3-drive or 4-drive pool? If it's a 4-drive pool, you still need to online the drive. If it's a 3-drive pool, we just need to figure out how to remove the one drive. I have an idea, but someone here should KNOW the correct answer."
It was four, one drive died, now it's three.
 

joeschmuck

koberulz said: "It was four, one drive died, now it's three."
I just need to be clear and not make an assumption. If your pool originally started (before the failure) as a 4-disk pool, then you need to ONLINE the drive that is listed as OFFLINE. If that's the case, can you online the drive via the GUI?
 

joeschmuck

koberulz said: "That's what I figured, but it was listed as sde, and there's still an sde listed in the pool."
sde is not assigned by drive serial number; it's based on the order in which the drives are detected. So sda could be one drive today and a different drive the next time you reboot. This is why we say to identify drives by serial number.
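One way to settle the "did I pull the right disk?" question is to map the current kernel names to serials. A minimal Python sketch, assuming a Linux system (the sde and nvme0n1 names suggest TrueNAS SCALE) with util-linux's lsblk, where lsblk -d -n -o NAME,SERIAL lists whole disks and their serials. If the pulled serial (2LGA639K in this thread) no longer appears at all, the right disk is out, regardless of which disk now answers to sde. The other serials in the example are made up.

```python
import subprocess
from typing import Dict, Optional

def parse_lsblk_serials(text: str) -> Dict[str, str]:
    """Parse `lsblk -d -n -o NAME,SERIAL` output into {kernel_name: serial}."""
    serials: Dict[str, str] = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        # Devices with no reported serial (e.g. loop devices) have one column; skip them.
        if len(parts) == 2:
            serials[parts[0]] = parts[1].strip()
    return serials

def current_serials() -> Dict[str, str]:
    """Ask lsblk for whole-disk names and serials on a live Linux system."""
    out = subprocess.run(
        ["lsblk", "-d", "-n", "-o", "NAME,SERIAL"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_lsblk_serials(out)

def device_for_serial(serials: Dict[str, str], serial: str) -> Optional[str]:
    """Return the kernel name currently carrying `serial`, or None if it's absent."""
    for name, s in serials.items():
        if s == serial:
            return name
    return None
```

For example, device_for_serial(current_serials(), "2LGA639K") returning None after the reboot means that disk is no longer attached, even though some other disk is now called sde.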

koberulz said: "It being offline is correct, isn't it? I took it offline, shut down, removed it, and booted up."
Not if it really was a 4-drive pool before the failure.
 

koberulz

joeschmuck said: "I just need to be clear and not make an assumption. If your pool originally started (before the failure) as a 4-disk pool, then you need to ONLINE the drive that is listed as OFFLINE. If that's the case, can you online the drive via the GUI?"
But there's no actual drive to online? The drive is sitting on my desk.

Are you assuming I have a replacement already installed?
 

joeschmuck

koberulz said: "Are you assuming I have a replacement already installed?"
Yes. Sorry, I'm involved in so many forum threads that I forgot you were shipping the drive off for RMA and will eventually get a replacement.

Your system is reporting correctly. Once you have a replacement drive, you will choose to REPLACE the offlined drive with the new drive.
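For reference, the GUI's Replace action corresponds to the ZFS command zpool replace pool old_device new_device. On TrueNAS the GUI is the recommended route, since the middleware typically partitions the new disk before handing it to ZFS, so treat this Python sketch as illustration only. The old member is the OFFLINE partition UUID from the zpool status output posted earlier (zpool status -g shows vdev GUIDs if you'd rather refer to it that way), and the by-id path for the new disk is hypothetical.

```python
from typing import List

def replace_command(pool: str, old_member: str, new_device: str) -> List[str]:
    """Build the `zpool replace` argv that swaps old_member for new_device.

    Illustration only: on TrueNAS, use the GUI's Replace action instead.
    """
    if not new_device.startswith("/dev/"):
        # This sketch insists on a full path (ideally /dev/disk/by-id/...) because
        # short names like "sdf" can point at a different disk after a reboot.
        raise ValueError(f"expected a /dev path, got {new_device!r}")
    if new_device == old_member:
        raise ValueError("old and new device must differ")
    return ["zpool", "replace", pool, old_member, new_device]
```

Running the resulting command would start a resilver onto the new disk, and zpool status would then show its progress until the OFFLINE entry disappears.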
 