22.02.1 Upgrade to 22.02.4: Pool Failure or Hardware Failure?

majerus

Contributor
Joined
Dec 21, 2012
Messages
126
I upgraded from 22.02.2.1 to 22.02.4, and since then I can no longer decrypt one of my pools.

This pool lives on a DS4246 disk shelf with a single controller, connected to a Supermicro motherboard through an LSI SAS9200-8e HBA, with a single cable from the shelf to the HBA.


After the upgrade everything looked OK. However, when I tried to unlock a dataset, all hell broke loose: as soon as I enter the passphrase and press OK, I get a flood of read errors (shown below). My other two pools, which are not on this disk shelf, are working fine.

So far I have tried the following:

  • Swapped the SAS cable for a new one; rebooted, same failure.
  • Unplugged the top controller on the disk shelf and installed the second one in the middle slot; rebooted, same problem.
  • Removed a 10GbE NIC that sat fairly close to the HBA, in case heat was the issue; rebooted, same problem.
  • Connected the SAS cable to the top port instead of the bottom port on the SAS9200-8e; rebooted, same problem.

At this point I'm not sure what else to do. Maybe get a new SAS controller and hope that fixes it? Open to ideas. Below is a bit more detail on the timing: the pool comes up fine, and it's only after unlocking that it freaks out.

Before Unlocking
Code:
root@Vault:~# zpool status Infinity -v
  pool: Infinity
 state: ONLINE
  scan: resilvered 532K in 00:00:00 with 0 errors on Fri Jul 22 17:52:27 2022
config:

        NAME                                      STATE     READ WRITE CKSUM
        Infinity                                  ONLINE       0     0     0
          raidz2-0                                ONLINE       0     0     0
            ad4febd2-9418-4c98-9a29-6009d93d477d  ONLINE       0     0     0
            d1ae15bd-ea73-4e35-9343-dedd8bbe0e6f  ONLINE       0     0     0
            40bc676e-1411-4fb8-9b2a-e9d62af1dee4  ONLINE       0     0     0
            c8d40c54-0498-49f5-95c7-dfebd5e349a4  ONLINE       0     0     0
            3472a6bf-91c2-4e73-92fe-76479307e952  ONLINE       0     0     0
            84d7b5d5-38ab-4dc2-8b54-d42b1836aa3d  ONLINE       0     0     0
            00f5e183-37a3-4053-964c-e39ae771f17d  ONLINE       0     0     0
            4c39c9db-3bd8-4705-96df-d2f182efd75e  ONLINE       0     0     0
            0911d6f0-cac5-4a1f-92d0-f334e399eab4  ONLINE       0     0     0
            f3c2cd4a-c2a2-487f-baec-0af0f17d1de8  ONLINE       0     0     0
            16926f72-01a9-43ae-b08b-5bc0a90d80d6  ONLINE       0     0     0
            7dbe4bb7-80a6-4557-ae68-19b21f176a4d  ONLINE       0     0     0

errors: No known data errors



After Unlocking
Code:
root@Vault:~# zpool status Infinity -v
  pool: Infinity
 state: ONLINE
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-JQ
  scan: resilvered 532K in 00:00:00 with 0 errors on Fri Jul 22 17:52:27 2022
config:

        NAME                                      STATE     READ WRITE CKSUM
        Infinity                                  ONLINE       0     0     0
          raidz2-0                                ONLINE       3     0     0
            ad4febd2-9418-4c98-9a29-6009d93d477d  ONLINE       4     0     0
            d1ae15bd-ea73-4e35-9343-dedd8bbe0e6f  ONLINE       0     0     0
            40bc676e-1411-4fb8-9b2a-e9d62af1dee4  ONLINE       0     0     0
            c8d40c54-0498-49f5-95c7-dfebd5e349a4  ONLINE       4     0     0
            3472a6bf-91c2-4e73-92fe-76479307e952  ONLINE       4     0     0
            84d7b5d5-38ab-4dc2-8b54-d42b1836aa3d  ONLINE       4     0     0
            00f5e183-37a3-4053-964c-e39ae771f17d  ONLINE       4     0     0
            4c39c9db-3bd8-4705-96df-d2f182efd75e  ONLINE       4     0     0
            0911d6f0-cac5-4a1f-92d0-f334e399eab4  ONLINE       4     0     0
            f3c2cd4a-c2a2-487f-baec-0af0f17d1de8  ONLINE       0     0     0
            16926f72-01a9-43ae-b08b-5bc0a90d80d6  ONLINE       4     0     0
            7dbe4bb7-80a6-4557-ae68-19b21f176a4d  ONLINE       4     0     0

errors: List of errors unavailable: pool I/O is currently suspended




Code:
root@Vault:~# tail -f /var/log/messages
Mar 17 12:43:36 Vault kernel:  sdan: sdan1 sdan2
Mar 17 12:43:36 Vault kernel: sd 0:0:25:0: [sdao] Attached SCSI disk
Mar 17 12:43:36 Vault kernel:  sdam: sdam1 sdam2
Mar 17 12:43:36 Vault kernel: sd 0:0:24:0: [sdan] Attached SCSI disk
Mar 17 12:43:36 Vault kernel:  sdaj: sdaj1 sdaj2
Mar 17 12:43:36 Vault kernel: sd 0:0:23:0: [sdam] Attached SCSI disk
Mar 17 12:43:36 Vault kernel:  sdal: sdal1 sdal2
Mar 17 12:43:36 Vault kernel: sd 0:0:20:0: [sdaj] Attached SCSI disk
Mar 17 12:43:36 Vault kernel: sd 0:0:22:0: [sdal] Attached SCSI disk
Mar 17 12:46:35 Vault kernel: cgroup: cgroup: disabling cgroup2 socket matching due to net_prio or net_cls activation
<INSERTED THIS LINE HERE, NOTICE THESE ERRORS AFTER UNLOCKING>
Mar 17 12:49:00 Vault kernel: zio pool=Infinity vdev=/dev/disk/by-partuuid/84d7b5d5-38ab-4dc2-8b54-d42b1836aa3d error=5 type=1 offset=7307039215616 size=4096 flags=180880
Mar 17 12:49:00 Vault kernel: zio pool=Infinity vdev=/dev/disk/by-partuuid/84d7b5d5-38ab-4dc2-8b54-d42b1836aa3d error=5 type=1 offset=270336 size=8192 flags=b08c1
Mar 17 12:49:00 Vault kernel: zio pool=Infinity vdev=/dev/disk/by-partuuid/84d7b5d5-38ab-4dc2-8b54-d42b1836aa3d error=5 type=1 offset=9794672140288 size=8192 flags=b08c1
Mar 17 12:49:00 Vault kernel: zio pool=Infinity vdev=/dev/disk/by-partuuid/84d7b5d5-38ab-4dc2-8b54-d42b1836aa3d error=5 type=1 offset=9794672402432 size=8192 flags=b08c1
Mar 17 12:49:00 Vault kernel: zio pool=Infinity vdev=/dev/disk/by-partuuid/c8d40c54-0498-49f5-95c7-dfebd5e349a4 error=5 type=1 offset=7307039215616 size=4096 flags=180880
Mar 17 12:49:00 Vault kernel: zio pool=Infinity vdev=/dev/disk/by-partuuid/3472a6bf-91c2-4e73-92fe-76479307e952 error=5 type=1 offset=7307039215616 size=4096 flags=180880
Mar 17 12:49:00 Vault kernel: zio pool=Infinity vdev=/dev/disk/by-partuuid/c8d40c54-0498-49f5-95c7-dfebd5e349a4 error=5 type=1 offset=270336 size=8192 flags=b08c1
Mar 17 12:49:00 Vault kernel: zio pool=Infinity vdev=/dev/disk/by-partuuid/3472a6bf-91c2-4e73-92fe-76479307e952 error=5 type=1 offset=270336 size=8192 flags=b08c1
Mar 17 12:49:00 Vault kernel: zio pool=Infinity vdev=/dev/disk/by-partuuid/c8d40c54-0498-49f5-95c7-dfebd5e349a4 error=5 type=1 offset=9794672140288 size=8192 flags=b08c1
Mar 17 12:49:00 Vault kernel: zio pool=Infinity vdev=/dev/disk/by-partuuid/3472a6bf-91c2-4e73-92fe-76479307e952 error=5 type=1 offset=9794672140288 size=8192 flags=b08c1
Mar 17 12:49:00 Vault kernel: zio pool=Infinity vdev=/dev/disk/by-partuuid/c8d40c54-0498-49f5-95c7-dfebd5e349a4 error=5 type=1 offset=9794672402432 size=8192 flags=b08c1
Mar 17 12:49:00 Vault kernel: zio pool=Infinity vdev=/dev/disk/by-partuuid/3472a6bf-91c2-4e73-92fe-76479307e952 error=5 type=1 offset=9794672402432 size=8192 flags=b08c1
Mar 17 12:49:00 Vault kernel: zio pool=Infinity vdev=/dev/disk/by-partuuid/0911d6f0-cac5-4a1f-92d0-f334e399eab4 error=5 type=1 offset=7156744179712 size=4096 flags=180880
Mar 17 12:49:00 Vault kernel: zio pool=Infinity vdev=/dev/disk/by-partuuid/0911d6f0-cac5-4a1f-92d0-f334e399eab4 error=5 type=1 offset=270336 size=8192 flags=b08c1
Mar 17 12:49:00 Vault kernel: zio pool=Infinity vdev=/dev/disk/by-partuuid/0911d6f0-cac5-4a1f-92d0-f334e399eab4 error=5 type=1 offset=9794672140288 size=8192 flags=b08c1
Mar 17 12:49:00 Vault kernel: zio pool=Infinity vdev=/dev/disk/by-partuuid/0911d6f0-cac5-4a1f-92d0-f334e399eab4 error=5 type=1 offset=9794672402432 size=8192 flags=b08c1
Mar 17 12:49:00 Vault kernel: zio pool=Infinity vdev=/dev/disk/by-partuuid/00f5e183-37a3-4053-964c-e39ae771f17d error=5 type=1 offset=7156744179712 size=4096 flags=180880
Mar 17 12:49:00 Vault kernel: zio pool=Infinity vdev=/dev/disk/by-partuuid/4c39c9db-3bd8-4705-96df-d2f182efd75e error=5 type=1 offset=7156744179712 size=4096 flags=180880
Mar 17 12:49:00 Vault kernel: zio pool=Infinity vdev=/dev/disk/by-partuuid/00f5e183-37a3-4053-964c-e39ae771f17d error=5 type=1 offset=270336 size=8192 flags=b08c1
Mar 17 12:49:00 Vault kernel: zio pool=Infinity vdev=/dev/disk/by-partuuid/4c39c9db-3bd8-4705-96df-d2f182efd75e error=5 type=1 offset=270336 size=8192 flags=b08c1
Mar 17 12:49:00 Vault kernel: zio pool=Infinity vdev=/dev/disk/by-partuuid/00f5e183-37a3-4053-964c-e39ae771f17d error=5 type=1 offset=9794672140288 size=8192 flags=b08c1
Mar 17 12:49:00 Vault kernel: zio pool=Infinity vdev=/dev/disk/by-partuuid/4c39c9db-3bd8-4705-96df-d2f182efd75e error=5 type=1 offset=9794672140288 size=8192 flags=b08c1
Mar 17 12:49:00 Vault kernel: zio pool=Infinity vdev=/dev/disk/by-partuuid/00f5e183-37a3-4053-964c-e39ae771f17d error=5 type=1 offset=9794672402432 size=8192 flags=b08c1
Mar 17 12:49:00 Vault kernel: zio pool=Infinity vdev=/dev/disk/by-partuuid/4c39c9db-3bd8-4705-96df-d2f182efd75e error=5 type=1 offset=9794672402432 size=8192 flags=b08c1
Mar 17 12:49:00 Vault kernel: zio pool=Infinity vdev=/dev/disk/by-partuuid/ad4febd2-9418-4c98-9a29-6009d93d477d error=5 type=1 offset=7308429172736 size=4096 flags=180880
Mar 17 12:49:00 Vault kernel: zio pool=Infinity vdev=/dev/disk/by-partuuid/ad4febd2-9418-4c98-9a29-6009d93d477d error=5 type=1 offset=270336 size=8192 flags=b08c1
Mar 17 12:49:00 Vault kernel: zio pool=Infinity vdev=/dev/disk/by-partuuid/ad4febd2-9418-4c98-9a29-6009d93d477d error=5 type=1 offset=9794672140288 size=8192 flags=b08c1
Mar 17 12:49:00 Vault kernel: zio pool=Infinity vdev=/dev/disk/by-partuuid/ad4febd2-9418-4c98-9a29-6009d93d477d error=5 type=1 offset=9794672402432 size=8192 flags=b08c1
Mar 17 12:49:00 Vault kernel: zio pool=Infinity vdev=/dev/disk/by-partuuid/16926f72-01a9-43ae-b08b-5bc0a90d80d6 error=5 type=1 offset=7308429168640 size=4096 flags=180880
Mar 17 12:49:00 Vault kernel: zio pool=Infinity vdev=/dev/disk/by-partuuid/16926f72-01a9-43ae-b08b-5bc0a90d80d6 error=5 type=1 offset=270336 size=8192 flags=b08c1
Mar 17 12:49:00 Vault kernel: zio pool=Infinity vdev=/dev/disk/by-partuuid/7dbe4bb7-80a6-4557-ae68-19b21f176a4d error=5 type=1 offset=7308429168640 size=4096 flags=180880
Mar 17 12:49:00 Vault kernel: zio pool=Infinity vdev=/dev/disk/by-partuuid/16926f72-01a9-43ae-b08b-5bc0a90d80d6 error=5 type=1 offset=9794672140288 size=8192 flags=b08c1
Mar 17 12:49:00 Vault kernel: zio pool=Infinity vdev=/dev/disk/by-partuuid/7dbe4bb7-80a6-4557-ae68-19b21f176a4d error=5 type=1 offset=270336 size=8192 flags=b08c1
Mar 17 12:49:00 Vault kernel: zio pool=Infinity vdev=/dev/disk/by-partuuid/16926f72-01a9-43ae-b08b-5bc0a90d80d6 error=5 type=1 offset=9794672402432 size=8192 flags=b08c1
Mar 17 12:49:00 Vault kernel: zio pool=Infinity vdev=/dev/disk/by-partuuid/7dbe4bb7-80a6-4557-ae68-19b21f176a4d error=5 type=1 offset=9794672140288 size=8192 flags=b08c1
Mar 17 12:49:00 Vault kernel: zio pool=Infinity vdev=/dev/disk/by-partuuid/7dbe4bb7-80a6-4557-ae68-19b21f176a4d error=5 type=1 offset=9794672402432 size=8192 flags=b08c1
Mar 17 12:49:00 Vault kernel: WARNING: Pool 'Infinity' has encountered an uncorrectable I/O failure and has been suspended.
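In the log above, nine of the twelve raidz2 members report failed reads (error=5 is EIO; type=1 is a read) within the same second, mostly at the same offsets such as 270336 and 9794672140288. That pattern fits a fault in the shared path (HBA, cable, shelf controller) better than nine disks failing at once. A quick way to check this on a log is to tally the zio lines per vdev. This is a minimal sketch assuming the kernel log format shown above; `summarize_zio_errors` is a hypothetical helper, not part of TrueNAS or OpenZFS.

```shell
#!/bin/sh
# Hypothetical helper (not a TrueNAS/OpenZFS tool): count failed zio reads per vdev.
# Assumes kernel log lines shaped like:
#   ... kernel: zio pool=P vdev=/dev/disk/by-partuuid/UUID error=5 type=1 offset=... size=... flags=...
# Reads the log on stdin; prints "<count> <partuuid>", highest count first.
summarize_zio_errors() {
    awk '
        / zio pool=/ {
            vdev = ""; eio = 0; is_read = 0
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^vdev=/) {
                    vdev = substr($i, 6)        # drop the "vdev=" prefix
                    sub(/.*\//, "", vdev)       # keep only the last path component
                } else if ($i == "error=5") {
                    eio = 1                     # errno 5 = EIO on Linux
                } else if ($i == "type=1") {
                    is_read = 1                 # zio type 1 = read
                }
            }
            if (vdev != "" && eio && is_read) count[vdev]++
        }
        END { for (v in count) print count[v], v }
    ' | sort -k1,1nr -k2,2
}
```

Usage: `grep 'zio pool=Infinity' /var/log/messages | summarize_zio_errors`. Run against the excerpt above, it should report 4 failed reads for each of the nine members that show READ errors in `zpool status`, and nothing for d1ae15bd, 40bc676e and f3c2cd4a.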
 

majerus
I also forgot to mention: I tried booting back into the previous build via GRUB and I see the same issues. The SMART status is attached. Judging from the logs, /dev/sdai could be replaced, but that wouldn't explain the read errors on the other disks.
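`zpool status` names the pool members by partition UUID, while the kernel log and smartctl use /dev/sdXX names, so checking whether /dev/sdai is one of the erroring members takes an extra lookup. A minimal sketch, assuming the Linux udev symlinks under /dev/disk/by-partuuid and sdXN-style partition names (it would mangle NVMe names like nvme0n1p1); `partuuid_to_disk` and `BYPARTUUID_DIR` are hypothetical names.

```shell
#!/bin/sh
# Hypothetical helper: print "<partuuid> <whole-disk device>" for each argument.
# Assumes udev symlinks in /dev/disk/by-partuuid (override with BYPARTUUID_DIR)
# and sdXN-style partition names; NVMe names such as nvme0n1p1 are not handled.
# The body runs in a subshell ( ... ) so its variables do not leak to the caller.
partuuid_to_disk() (
    dir=${BYPARTUUID_DIR:-/dev/disk/by-partuuid}
    for uuid in "$@"; do
        link="$dir/$uuid"
        if [ -e "$link" ]; then
            part=$(readlink -f "$link")                        # e.g. /dev/sdai1
            disk=$(printf '%s\n' "$part" | sed 's/[0-9]*$//')  # e.g. /dev/sdai
            printf '%s %s\n' "$uuid" "$disk"
        else
            printf '%s missing\n' "$uuid"
        fi
    done
)
```

Usage: `partuuid_to_disk ad4febd2-9418-4c98-9a29-6009d93d477d 16926f72-01a9-43ae-b08b-5bc0a90d80d6`, then run `smartctl -a` on each device it prints.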


Link to SmartCTL
 

majerus
I bought an LSI 9207-8e and a Noctua NF-A4x10 fan. After installing them, the pool decrypts without issue and is up and "healthy". I am going to run a scrub now to confirm it's OK. Crazy that this might just be a bad card. I added the fan because I wasn't 100% confident the old card was getting the cooling it needed.


 

majerus
It looks like I have one faulted drive; I just ordered a replacement from Amazon. Other than that, the new controller appears to have resolved this completely.


Code:
ZFS has finished a scrub:

eid: 1556
class: scrub_finish
host: Vault
time: 2023-03-24 04:13:46-0500
pool: Infinity
state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
Sufficient replicas exist for the pool to continue functioning in a
degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device
repaired.
scan: scrub repaired 460K in 1 days 14:20:12 with 0 errors on Fri Mar 24 04:13:46 2023
config:

        NAME                          STATE     READ WRITE CKSUM
        Infinity                      DEGRADED     0     0     0
          raidz2-0                    DEGRADED     0     0     0
            ad4febd2-9418-4c98-9a29-  ONLINE       0     0     0
            d1ae15bd-ea73-4e35-9343-  ONLINE       0     0     0
            40bc676e-1411-4fb8-9b2a-  ONLINE       0     0     0
            c8d40c54-0498-49f5-95c7-  ONLINE       0     0     0
            3472a6bf-91c2-4e73-92fe-  ONLINE       0     0     0
            84d7b5d5-38ab-4dc2-8b54-  ONLINE       0     0     0
            00f5e183-37a3-4053-964c-  ONLINE       0     0     0
            4c39c9db-3bd8-4705-96df-  ONLINE       0     0     0
            0911d6f0-cac5-4a1f-92d0-  ONLINE       0     0     0
            f3c2cd4a-c2a2-487f-baec-  ONLINE       0     0     0
            16926f72-01a9-43ae-b08b-  FAULTED     34     0     0  too many errors
            7dbe4bb7-80a6-4557-ae68-  ONLINE       0     0     0

errors: No known data errors
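After the scrub, and again once the replacement drive has resilvered, it helps to see at a glance which rows are not ONLINE or have nonzero error counters. A minimal sketch, assuming the standard NAME/STATE/READ/WRITE/CKSUM layout of `zpool status`; `unhealthy_vdevs` is a hypothetical helper. A faulted member also marks its parent vdev and the pool DEGRADED, so those rows are listed too.

```shell
#!/bin/sh
# Hypothetical helper: read `zpool status` output on stdin and print
# NAME STATE READ WRITE CKSUM for every config row that is not ONLINE
# or has a nonzero error counter (abbreviated counts like 1.2K count as nonzero).
unhealthy_vdevs() {
    awk '
        $1 == "NAME" && $2 == "STATE" { in_cfg = 1; next }   # start of config table
        $1 == "errors:"               { in_cfg = 0 }         # end of config table
        in_cfg && NF >= 5 && $3 ~ /^[0-9.]+[KMGT]?$/ {
            if ($2 != "ONLINE" || $3 != "0" || $4 != "0" || $5 != "0")
                print $1, $2, $3, $4, $5
        }
    '
}
```

Usage: `zpool status Infinity | unhealthy_vdevs`. On the scrub report above it would list Infinity and raidz2-0 as DEGRADED and 16926f72-01a9-43ae-b08b- as FAULTED with 34 read errors.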
 