Degraded Pool

LIGISTX

Guru
Joined
Apr 12, 2015
Messages
525
Hey guys, I got an email last night about a degraded pool, but I am not 100% sure what is actually wrong here.

When I run zpool status I get the output below. Should I assume everything has been fixed and just clear the error (I think that is zpool clear)? I did have a drive not show up initially when I rebooted after some hardware changes, but I am not sure if that caused this issue.

Thanks for the help!

Code:
          raidz2-0                                      DEGRADED     0     0     0
            gptid/ab0351e8-44ea-11e8-8cad-e0071bffdaee  ONLINE       0     0     0
            gptid/abbfceac-44ea-11e8-8cad-e0071bffdaee  ONLINE       0     0     0
            gptid/ac8d872a-44ea-11e8-8cad-e0071bffdaee  ONLINE       0     0     0
            gptid/ad4a2436-44ea-11e8-8cad-e0071bffdaee  ONLINE       0     0     0
            gptid/ae0d7e64-44ea-11e8-8cad-e0071bffdaee  ONLINE       0     0     0
            gptid/aeca106f-44ea-11e8-8cad-e0071bffdaee  ONLINE       0     0     0
            gptid/af89686d-44ea-11e8-8cad-e0071bffdaee  ONLINE       0     0     0
            gptid/b04ad4fc-44ea-11e8-8cad-e0071bffdaee  ONLINE       0     0     0
            gptid/b10b6452-44ea-11e8-8cad-e0071bffdaee  ONLINE       0     0     0
            gptid/b1d949c1-44ea-11e8-8cad-e0071bffdaee  DEGRADED     0     0    81  too many errors

errors: No known data errors
 

tfran1990

Patron
Joined
Oct 18, 2017
Messages
294
There might be a problem with that last disk (gptid/b1d949c1-44ea-11e8-8cad-e0071bffdaee).
It could be cable/connector related.
How are you connecting the disk to your FreeNAS box?
 

LIGISTX

Guru
Joined
Apr 12, 2015
Messages
525
FreeNAS is virtualized under ESXi with my HBA passed through to it, so it does see the hardware “bare metal”. From the HBA I run a SAS expander and then SAS to quad SATA breakout cables to my drives. It hasn’t been an issue in the past, but as luck has it, seemingly every time I physically move the box, I get an issue of some sort, usually a plug that wiggled out or something. I have checked the cables and they seem fine though.

I am not extremely well versed in ZFS, so while I think it isn’t an issue, I very easily could be wrong, and I wouldn’t know what to check to confirm one way or another.
 

tfran1990

Patron
Joined
Oct 18, 2017
Messages
294
Reseat the cable and run smartctl -t long on that last device, then see if that 81 changes. You could also move the disk with the 81 errors to a different cable, or even a different port on the expander, to see if the problem follows it.
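The "see if the problem follows" idea can be written down as a tiny decision helper. This is only a rough sketch of the reasoning, not a diagnostic tool: the function name and its two flags are made up for illustration, and it assumes the counters were cleared and a scrub was run after the swap.

```python
def swap_verdict(errors_follow_disk, errors_stay_on_old_path):
    """Interpret a cable/port swap followed by a clear and a scrub.

    errors_follow_disk: new errors appear on the moved disk at its new port.
    errors_stay_on_old_path: new errors appear on whatever now uses the
    old cable/port.
    """
    if errors_follow_disk and errors_stay_on_old_path:
        return "inconclusive: both show errors, look further upstream"
    if errors_follow_disk:
        return "suspect the disk"
    if errors_stay_on_old_path:
        return "suspect the cable or port"
    return "no new errors: the reseat may have fixed it, keep watching"


# Hypothetical outcome: the moved disk is clean, the old cable still errors.
print(swap_verdict(errors_follow_disk=False, errors_stay_on_old_path=True))
```

Treat the verdicts as hints. An intermittent connection can look clean for one scrub and fail on the next.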
 

Jessep

Patron
Joined
Aug 19, 2018
Messages
379
Are you running scheduled SMART tests?
 

LIGISTX

Guru
Joined
Apr 12, 2015
Messages
525
So I actually have a scheduled long SMART test tonight, and will get an email in a few days with the pool status and SMART report. I suppose I should just wait until that is all completed before I make any assumptions?
 

LIGISTX

Guru
Joined
Apr 12, 2015
Messages
525
Actually, I found this old report from right after I physically moved the box and the pool came up degraded at boot. It only saw 9 of my 10 drives, and once I got all 10 showing up again, I got this:

Code:
          raidz2-0                                      ONLINE       0     0     0
            gptid/ab0351e8-44ea-11e8-8cad-e0071bffdaee  ONLINE       0     0     0
            gptid/abbfceac-44ea-11e8-8cad-e0071bffdaee  ONLINE       0     0     0
            gptid/ac8d872a-44ea-11e8-8cad-e0071bffdaee  ONLINE       0     0     0
            gptid/ad4a2436-44ea-11e8-8cad-e0071bffdaee  ONLINE       0     0     0
            gptid/ae0d7e64-44ea-11e8-8cad-e0071bffdaee  ONLINE       0     0     0
            gptid/aeca106f-44ea-11e8-8cad-e0071bffdaee  ONLINE       0     0     0
            gptid/af89686d-44ea-11e8-8cad-e0071bffdaee  ONLINE       0     0     0
            gptid/b04ad4fc-44ea-11e8-8cad-e0071bffdaee  ONLINE       0     0     0
            gptid/b10b6452-44ea-11e8-8cad-e0071bffdaee  ONLINE       0     0     0
            gptid/b1d949c1-44ea-11e8-8cad-e0071bffdaee  ONLINE       0     0    13


Does this mean the drive is in fact getting worse and thus needs to be replaced? Or could it simply be a bad cable and a reboot/cable swap may fix my problem?
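To compare the two reports side by side, here is a minimal Python sketch that parses the vdev rows and shows which devices have a higher CKSUM count in the later one. The helper names are made up for illustration, and the sample text is trimmed to two disks from the outputs above. It assumes the counters were not reset in between (zpool clear resets them, and a reboot may as well), so a rising number only means something if neither happened.

```python
import re

# One vdev row: name, state, READ / WRITE / CKSUM counters, optional note.
ROW = re.compile(r"^\s*(\S+)\s+([A-Z]+)\s+(\d+)\s+(\d+)\s+(\d+)(?:\s+(.*))?$")


def parse_rows(status_text):
    """Map each device name to its (state, read, write, cksum) tuple."""
    rows = {}
    for line in status_text.splitlines():
        m = ROW.match(line)
        if m:
            name, state, read, write, cksum = m.group(1, 2, 3, 4, 5)
            rows[name] = (state, int(read), int(write), int(cksum))
    return rows


def cksum_growth(before_text, after_text):
    """Return {device: increase in CKSUM} for devices whose count rose."""
    before, after = parse_rows(before_text), parse_rows(after_text)
    growth = {}
    for name, (_, _, _, cksum) in after.items():
        old = before.get(name, (None, 0, 0, 0))[3]
        if cksum > old:
            growth[name] = cksum - old
    return growth


EARLIER = """
  raidz2-0                                      ONLINE       0     0     0
    gptid/b10b6452-44ea-11e8-8cad-e0071bffdaee  ONLINE       0     0     0
    gptid/b1d949c1-44ea-11e8-8cad-e0071bffdaee  ONLINE       0     0    13
"""
LATER = """
  raidz2-0                                      DEGRADED     0     0     0
    gptid/b10b6452-44ea-11e8-8cad-e0071bffdaee  ONLINE       0     0     0
    gptid/b1d949c1-44ea-11e8-8cad-e0071bffdaee  DEGRADED     0     0    81  too many errors
"""

growth = cksum_growth(EARLIER, LATER)
for device, delta in growth.items():
    print(f"{device}: CKSUM rose by {delta}")
```

A rising count on a single device says the errors are ongoing. It does not by itself say whether the disk or the cable is to blame.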
 

LIGISTX

Guru
Joined
Apr 12, 2015
Messages
525
reseat the cable and do a smartctl -t long on that last device and see if maybe that 81 changes. you could also switch the disc that has 81 errors to a different cable or even a different port on the expander to see if the problem follows.
Any ideas with this new info ^?

Thanks!
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
You're getting CRC errors, which on their own point to cables rather than disks. If you clear the errors and do a scrub, do they come back?
 

LIGISTX

Guru
Joined
Apr 12, 2015
Messages
525
Clear the errors via zpool clear? Try a different cable, scrub, and see what happens?
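For reference, the clear, scrub, re-check cycle could be laid out as below. This is a hedged sketch that only builds and prints the commands, it does not run anything. The pool name "tank" is a placeholder (the thread never gives the real one), and recheck_plan is a made-up helper name.

```python
import shlex


def recheck_plan(pool, device=None):
    """Return the commands for a clear -> scrub -> status cycle as argv lists."""
    clear = ["zpool", "clear", pool]
    if device:
        # zpool clear takes an optional device to reset only that device's counters.
        clear.append(device)
    return [
        clear,
        ["zpool", "scrub", pool],
        ["zpool", "status", "-v", pool],
    ]


# "tank" is a placeholder pool name, not taken from the thread.
plan = recheck_plan("tank", "gptid/b1d949c1-44ea-11e8-8cad-e0071bffdaee")
for argv in plan:
    print(shlex.join(argv))
```

The scrub runs in the background, so the status check only tells you something once the scrub has finished.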
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
Yes, that's what I intended.
 

LIGISTX

Guru
Joined
Apr 12, 2015
Messages
525
OK, after a scrub and a long SMART test, the numbers were the same. So I ran zpool clear, figured out which disk it was mad about, and reseated its SATA connector, which so far seems to have fixed it. I am running a scrub now, and will run another long test after that and see what info I get.

Thanks!
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
Will run a Long after that as well and see what info I get.
Although a cabling change would not be expected to have any impact on the SMART findings as those tests are run from the onboard controller on the disk itself... no cables in that pathway.
 

LIGISTX

Guru
Joined
Apr 12, 2015
Messages
525
Ah, that’s a good point I hadn’t considered. Thanks for pointing that out!
 