Alibuba (Cadet; joined Jan 30, 2021; 6 messages)
Hi,
I'm running TrueNAS 12 (not sure of the exact release, but uname -a says 12.2-RELEASE-p12).
TrueNAS is running in a VM hosted on ESXi. The disks (twelve 3 TB WD30EFRX and four 4 TB WD40EFRX) are connected to LSI2008 HBAs that are passed through to TrueNAS. The pool consists of a single 12-disk raidz3 vdev, and the remaining disks are spares in the same pool.
Everything has been running without a hitch for a couple of years.
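For reference, the usable space of that layout can be estimated with simple arithmetic: a raidz3 vdev gives up three disks' worth of space to parity, and every member contributes only as much as the smallest disk. The sketch below is a rough estimate under stated assumptions, not a ZFS tool. It assumes all twelve active members are the 3 TB drives (the post doesn't say which sizes ended up in the vdev) and ignores ZFS metadata, slop space and raidz allocation padding, so the real figure will be somewhat lower.

```python
def raidz_usable(disk_sizes_tb: list[float], parity: int) -> float:
    """Rough usable capacity of a single raidz vdev, in the units of the inputs.

    Simplifications: each member contributes only as much as the smallest
    disk, and metadata, slop space and raidz padding are ignored.
    """
    if parity not in (1, 2, 3):
        raise ValueError("raidz parity must be 1, 2 or 3")
    if len(disk_sizes_tb) <= parity:
        raise ValueError("a raidz vdev needs more disks than parity devices")
    return (len(disk_sizes_tb) - parity) * min(disk_sizes_tb)


# Assumption: all 12 active members are 3 TB (decimal) WD30EFRX drives.
usable_tb = raidz_usable([3.0] * 12, parity=3)
usable_tib = usable_tb * 1e12 / 2**40  # zpool/zfs report sizes in binary units
print(f"{usable_tb:.0f} TB ~ {usable_tib:.1f} TiB")  # prints: 27 TB ~ 24.6 TiB
```

Under the same simplification, swapping a 4 TB spare into the vdev leaves the estimate unchanged, because the smallest member still sets the per-disk contribution.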
Yesterday an active disk in my raidz3 pool failed a SMART test. I detached one of the spares from the pool, took the failed drive offline, and replaced it with the detached spare. Resilvering began as expected.
However, in the morning the pool encountered an I/O failure and the resilver halted.
I can't get timestamps from the dmesg output, but from what I can gather, it looks like there was (or still is) an issue with one of the LSI HBAs.
Here's what I believe is the relevant dmesg output:
Code:
(da3:mps0:0:2:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00 length 0 SMID 136 Command timeout on target 2(0x000d) 60000 set, 60.67265958 elapsed
mps0: Sending abort to target 2 for SMID 136
(da3:mps0:0:2:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00 length 0 SMID 136 Aborting command 0xfffffe00e3d0b6c0
(da16:mps1:0:7:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00 length 0 SMID 795 Command timeout on target 7(0x0009) 60000 set, 60.67633423 elapsed
mps1: Sending abort to target 7 for SMID 795
(da16:mps1:0:7:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00 length 0 SMID 795 Aborting command 0xfffffe00e4042c48
(da5:mps0:0:4:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00 length 0 SMID 257 Command timeout on target 4(0x000b) 60000 set, 60.67919296 elapsed
mps0: Sending abort to target 4 for SMID 257
(da5:mps0:0:4:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00 length 0 SMID 257 Aborting command 0xfffffe00e3d15958
(da6:mps0:0:5:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00 length 0 SMID 1644 Command timeout on target 5(0x0010) 60000 set, 60.68200370 elapsed
mps0: Sending abort to target 5 for SMID 1644
(da6:mps0:0:5:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00 length 0 SMID 1644 Aborting command 0xfffffe00e3d8a120
(da12:mps1:0:3:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00 length 0 SMID 1108 Command timeout on target 3(0x000d) 60000 set, 60.68640428 elapsed
mps1: Sending abort to target 3 for SMID 1108
(da12:mps1:0:3:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00 length 0 SMID 1108 Aborting command 0xfffffe00e405d0e0
mps1: mpssas_action_scsiio: Freezing devq for target ID 3
(da12:mps1:0:3:0): READ(10). CDB: 28 00 bb 9d 05 d0 00 00 08 00
(da12:mps1:0:3:0): CAM status: CAM subsystem is busy
(da12:mps1:0:3:0): Retrying command, 3 more tries remain
mps0: Controller reported scsi ioc terminated tgt 5 SMID 1343 loginfo 31130000
mps0: Controller reported scsi ioc terminated tgt 5 SMID 329 loginfo 31130000
mps0: Controller reported scsi ioc terminated tgt 5 SMID 881 loginfo 31140000
mps0: Controller reported scsi ioc terminated tgt 5 SMID 1414 loginfo 31140000
mps0: Controller reported scsi ioc terminated tgt 5 SMID 251 loginfo 31140000
mps0: Controller reported scsi ioc terminated tgt 5 SMID 122 loginfo 31140000
mps0: Controller reported scsi ioc terminated tgt 5 SMID 795 loginfo 31140000
mps0: Controller reported scsi ioc terminated tgt 5 SMID 2083 loginfo 31140000
mps0: Controller reported scsi ioc terminated tgt 5 SMID 1114 loginfo 31140000
(da6:mps0:0:5:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00
mps0: (da6:mps0:0:5:0): CAM status: Command timeout
(da6:mps0:0:5:0): Retrying command, 0 more tries remain
Controller reported scsi ioc terminated tgt 5 SMID 2019 loginfo 31140000
(da6:mps0:0:5:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00
mps0: (da6:mps0:0:5:0): CAM status: CCB request completed with an error
Finished abort recovery for target 5
(da6:mps0:0:5:0): Retrying command, 0 more tries remain
mps0: Unfreezing devq for target ID 5
(da6:mps0:0:5:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00
(da6:mps0:0:5:0): CAM status: CCB request completed with an error
(da6:mps0:0:5:0): Retrying command, 0 more tries remain
(da6:mps0:0:5:0): READ(16). CDB: 88 00 00 00 00 01 0a 7c c8 30 00 00 00 28 00 00
(da6:mps0:0:5:0): CAM status: CCB request completed with an error
(da6:mps0:0:5:0): Retrying command, 3 more tries remain
(da6:mps0:0:5:0): READ(16). CDB: 88 00 00 00 00 01 0a 7c ab 38 00 00 00 08 00 00
(da6:mps0:0:5:0): CAM status: CCB request completed with an error
(da6:mps0:0:5:0): Retrying command, 3 more tries remain
(da6:mps0:0:5:0): READ(16). CDB: 88 00 00 00 00 01 0a 7c a1 f0 00 00 00 08 00 00
(da6:mps0:0:5:0): CAM status: CCB request completed with an error
(da6:mps0:0:5:0): Retrying command, 3 more tries remain
(da6:mps0:0:5:0): READ(16). CDB: 88 00 00 00 00 01 0a 7a 67 c8 00 00 00 08 00 00
(da6:mps0:0:5:0): CAM status: CCB request completed with an error
(da6:mps0:0:5:0): Retrying command, 3 more tries remain
(da6:mps0:0:5:0): READ(16). CDB: 88 00 00 00 00 01 0a 7b c0 f0 00 00 00 08 00 00
(da6:mps0:0:5:0): CAM status: CCB request completed with an error
(da6:mps0:0:5:0): Retrying command, 3 more tries remain
(da6:mps0:0:5:0): READ(16). CDB: 88 00 00 00 00 01 0a 78 e1 50 00 00 00 08 00 00
(da6:mps0:0:5:0): CAM status: CCB request completed with an error
(da6:mps0:0:5:0): Retrying command, 3 more tries remain
(da6:mps0:0:5:0): READ(16). CDB: 88 00 00 00 00 01 0a 7c f2 88 00 00 00 10 00 00
(da6:mps0:0:5:0): CAM status: CCB request completed with an error
(da6:mps0:0:5:0): Retrying command, 3 more tries remain
(da6:mps0:0:5:0): READ(16). CDB: 88 00 00 00 00 01 0a 78 54 d8 00 00 00 08 00 00
(da6:mps0:0:5:0): CAM status: CCB request completed with an error
(da6:mps0:0:5:0): Retrying command, 3 more tries remain
mps1: Controller reported scsi ioc terminated tgt 7 SMID 936 loginfo 31130000
mps1: (da16:mps1:0:7:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00
(da16:mps1:0:7:0): CAM status: Command timeout
Controller reported scsi ioc terminated tgt 7 SMID 1762 loginfo 31130000
(da16:mps1:0:7:0): Retrying command, 0 more tries remain
mps1: Controller reported scsi ioc terminated tgt 7 SMID 296 loginfo 31140000
mps1: (da16:mps1:0:7:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00
Controller reported scsi ioc terminated tgt 7 SMID 1690 loginfo 31140000
(da16:mps1:0:7:0): CAM status: CCB request completed with an error
(da16:mps1:0:7:0): Retrying command, 0 more tries remain
mps1: Controller reported scsi ioc terminated tgt 7 SMID 1004 loginfo 31140000
(da16:mps1:0:7:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00
mps1: Controller reported scsi ioc terminated tgt 7 SMID 832 loginfo 31140000
(da16:mps1:0:7:0): CAM status: CCB request completed with an error
mps1: (da16:mps1:0:7:0): Retrying command, 0 more tries remain
Controller reported scsi ioc terminated tgt 7 SMID 1709 loginfo 31140000
(da16:mps1:0:7:0): READ(16). CDB: 88 00 00 00 00 01 04 21 b0 78 00 00 00 40 00 00
mps1: (da16:mps1:0:7:0): CAM status: CCB request completed with an error
(da16:mps1:0:7:0): Retrying command, 3 more tries remain
Controller reported scsi ioc terminated tgt 7 SMID 653 loginfo 31140000
(da16:mps1:0:7:0): READ(16). CDB: 88 00 00 00 00 01 04 20 a1 90 00 00 00 08 00 00
mps1: (da16:mps1:0:7:0): CAM status: CCB request completed with an error
(da16:mps1:0:7:0): Retrying command, 3 more tries remain
Controller reported scsi ioc terminated tgt 7 SMID 576 loginfo 31140000
(da16:mps1:0:7:0): READ(16). CDB: 88 00 00 00 00 01 04 23 8b 68 00 00 00 08 00 00
mps1: Controller reported scsi ioc terminated tgt 7 SMID 1270 loginfo 31140000
mps1: Finished abort recovery for target 7
mps1: Unfreezing devq for target ID 7
mps1: Controller reported scsi ioc terminated tgt 3 SMID 388 loginfo 31130000
mps1: Controller reported scsi ioc terminated tgt 3 SMID 974 loginfo 31130000
mps1: Controller reported scsi ioc terminated tgt 3 SMID 1267 loginfo 31140000
mps1: (da16:mps1:0:7:0): CAM status: CCB request completed with an error
(da16:mps1:0:7:0): Retrying command, 3 more tries remain
Controller reported scsi ioc terminated tgt 3 SMID 594 loginfo 31140000
(da16:mps1:0:7:0): READ(16). CDB: 88 00 00 00 00 01 04 23 cd 78 00 00 00 08 00 00
mps1: (da16:mps1:0:7:0): CAM status: CCB request completed with an error
(da16:mps1:0:7:0): Retrying command, 3 more tries remain
Controller reported scsi ioc terminated tgt 3 SMID 1624 loginfo 31140000
(da16:mps1:0:7:0): READ(16). CDB: 88 00 00 00 00 01 04 21 3c d0 00 00 00 30 00 00
mps1: Controller reported scsi ioc terminated tgt 3 SMID 1437 loginfo 31140000
mps1: Controller reported scsi ioc terminated tgt 3 SMID 780 loginfo 31140000
mps1: Controller reported scsi ioc terminated tgt 3 SMID 596 loginfo 31140000
mps1: (da16:mps1:0:7:0): CAM status: CCB request completed with an error
(da16:mps1:0:7:0): Retrying command, 3 more tries remain
Controller reported scsi ioc terminated tgt 3 SMID 346 loginfo 31140000
(da16:mps1:0:7:0): READ(16). CDB: 88 00 00 00 00 01 04 23 8f 28 00 00 00 08 00 00
mps1: (da16:mps1:0:7:0): CAM status: CCB request completed with an error
(da16:mps1:0:7:0): Retrying command, 3 more tries remain
Finished abort recovery for target 3
(da16:mps1:0:7:0): READ(16). CDB: 88 00 00 00 00 01 04 23 a2 a8 00 00 00 08 00 00
mps1: (da16:mps1:0:7:0): CAM status: CCB request completed with an error
(da16:mps1:0:7:0): Retrying command, 3 more tries remain
Unfreezing devq for target ID 3
(da16:mps1:0:7:0): READ(16). CDB: 88 00 00 00 00 01 04 23 dd 40 00 00 00 10 00 00
(da16:mps1:0:7:0): CAM status: CCB request completed with an error
(da16:mps1:0:7:0): Retrying command, 3 more tries remain
(da12:mps1:0:3:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00
(da12:mps1:0:3:0): CAM status: Command timeout
(da12:mps1:0:3:0): Retrying command, 0 more tries remain
(da12:mps1:0:3:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00
(da12:mps1:0:3:0): CAM status: CCB request completed with an error
(da12:mps1:0:3:0): Retrying command, 0 more tries remain
(da12:mps1:0:3:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00
(da12:mps1:0:3:0): CAM status: CCB request completed with an error
(da12:mps1:0:3:0): Retrying command, 0 more tries remain
(da12:mps1:0:3:0): READ(10). CDB: 28 00 bb 9c dd f8 00 00 08 00
(da12:mps1:0:3:0): CAM status: CCB request completed with an error
(da12:mps1:0:3:0): Retrying command, 3 more tries remain
(da12:mps1:0:3:0): READ(10). CDB: 28 00 a1 52 67 c0 00 00 08 00
(da12:mps1:0:3:0): CAM status: CCB request completed with an error
(da12:mps1:0:3:0): Retrying command, 3 more tries remain
(da12:mps1:0:3:0): READ(10). CDB: 28 00 bb 9c 9d c8 00 00 08 00
(da12:mps1:0:3:0): CAM status: CCB request completed with an error
(da12:mps1:0:3:0): Retrying command, 3 more tries remain
(da12:mps1:0:3:0): READ(10). CDB: 28 00 bb 9c ee 30 00 00 28 00
(da12:mps1:0:3:0): CAM status: CCB request completed with an error
(da12:mps1:0:3:0): Retrying command, 3 more tries remain
(da12:mps1:0:3:0): READ(10). CDB: 28 00 73 cb 99 f8 00 00 08 00
(da12:mps1:0:3:0): CAM status: CCB request completed with an error
(da12:mps1:0:3:0): Retrying command, 3 more tries remain
(da12:mps1:0:3:0): READ(10). CDB: 28 00 73 d3 70 c8 00 00 08 00
(da12:mps1:0:3:0): CAM status: CCB request completed with an error
(da12:mps1:0:3:0): Retrying command, 3 more tries remain
(da12:mps1:0:3:0): READ(10). CDB: 28 00 bb 9c dc 78 00 00 08 00
(da12:mps1:0:3:0): CAM status: CCB request completed with an error
(da12:mps1:0:3:0): Retrying command, 3 more tries remain
mps0: Controller reported scsi ioc terminated tgt 4 SMID 1409 loginfo 31130000
mps0: Controller reported scsi ioc terminated tgt 4 SMID 274 loginfo 31130000
mps0: Controller reported scsi ioc terminated tgt 4 SMID 496 loginfo 31140000
(da5:mps0:0:4:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00
mps0: (da5:mps0:0:4:0): CAM status: Command timeout
(da5:mps0:0:4:0): Retrying command, 0 more tries remain
Controller reported scsi ioc terminated tgt 4 SMID 973 loginfo 31140000
mps0: Controller reported scsi ioc terminated tgt 4 SMID 1196 loginfo 31140000
mps0: Controller reported scsi ioc terminated tgt 4 SMID 832 loginfo 31140000
mps0: (da5:mps0:0:4:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00
Controller reported scsi ioc terminated tgt 4 SMID 1016 loginfo 31140000
(da5:mps0:0:4:0): CAM status: CCB request completed with an error
(da5:mps0:0:4:0): Retrying command, 0 more tries remain
(da5:mps0:0:4:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00
mps0: Controller reported scsi ioc terminated tgt 4 SMID 1245 loginfo 31140000
(da5:mps0:0:4:0): CAM status: CCB request completed with an error
(da5:mps0:0:4:0): Retrying command, 0 more tries remain
(da5:mps0:0:4:0): READ(16). CDB: 88 00 00 00 00 01 00 7e 29 08 00 00 00 08 00 00
mps0: (da5:mps0:0:4:0): CAM status: CCB request completed with an error
(da5:mps0:0:4:0): Retrying command, 3 more tries remain
Controller reported scsi ioc terminated tgt 4 SMID 1586 loginfo 31140000
(da5:mps0:0:4:0): READ(16). CDB: 88 00 00 00 00 01 00 7f 89 70 00 00 00 08 00 00
mps0: Controller reported scsi ioc terminated tgt 4 SMID 570 loginfo 31140000
mps0: Finished abort recovery for target 4
mps0: Unfreezing devq for target ID 4
mps0: (da5:mps0:0:4:0): CAM status: CCB request completed with an error
Controller reported scsi ioc terminated tgt 2 SMID 492 loginfo 31130000
(da5:mps0:0:4:0): Retrying command, 3 more tries remain
mps0: Controller reported scsi ioc terminated tgt 2 SMID 1994 loginfo 31130000
(da5:mps0:0:4:0): READ(16). CDB: 88 00 00 00 00 01 00 7e e7 88 00 00 00 08 00 00
mps0: (da5:mps0:0:4:0): CAM status: CCB request completed with an error
(da5:mps0:0:4:0): Retrying command, 3 more tries remain
(da5:mps0:0:4:0): READ(10). CDB: 28 00 ff 79 ae 68 00 00 08 00
Controller reported scsi ioc terminated tgt 2 SMID 1545 loginfo 31140000
(da5:mps0:0:4:0): CAM status: CCB request completed with an error
mps0: (da5:mps0:0:4:0): Retrying command, 3 more tries remain
Controller reported scsi ioc terminated tgt 2 SMID 1988 loginfo 31140000
(da5:mps0:0:4:0): READ(16). CDB: 88 00 00 00 00 01 00 7f 7e 80 00 00 00 08 00 00
mps0: (da5:mps0:0:4:0): CAM status: CCB request completed with an error
(da5:mps0:0:4:0): Retrying command, 3 more tries remain
Controller reported scsi ioc terminated tgt 2 SMID 643 loginfo 31140000
(da5:mps0:0:4:0): READ(16). CDB: 88 00 00 00 00 01 00 7f 83 c8 00 00 00 08 00 00
mps0: Controller reported scsi ioc terminated tgt 2 SMID 591 loginfo 31140000
(da5:mps0:0:4:0): CAM status: CCB request completed with an error
(da5:mps0:0:4:0): Retrying command, 3 more tries remain
(da5:mps0:0:4:0): READ(16). CDB: 88 00 00 00 00 01 00 7f 4f 88 00 00 00 10 00 00
mps0: (da5:mps0:0:4:0): CAM status: CCB request completed with an error
(da5:mps0:0:4:0): Retrying command, 3 more tries remain
Controller reported scsi ioc terminated tgt 2 SMID 1500 loginfo 31140000
mps0: Controller reported scsi ioc terminated tgt 2 SMID 2081 loginfo 31140000
mps0: Controller reported scsi ioc terminated tgt 2 SMID 850 loginfo 31140000
mps0: Controller reported scsi ioc terminated tgt 2 SMID 1057 loginfo 31140000
mps0: Finished abort recovery for target 2
mps0: Unfreezing devq for target ID 2
(da5:mps0:0:4:0): READ(16). CDB: 88 00 00 00 00 01 00 7f 9b 88 00 00 00 10 00 00
(da5:mps0:0:4:0): CAM status: CCB request completed with an error
(da5:mps0:0:4:0): Retrying command, 3 more tries remain
(da3:mps0:0:2:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00
(da3:mps0:0:2:0): CAM status: Command timeout
(da3:mps0:0:2:0): Retrying command, 0 more tries remain
(da3:mps0:0:2:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00
(da3:mps0:0:2:0): CAM status: CCB request completed with an error
(da3:mps0:0:2:0): Retrying command, 0 more tries remain
(da3:mps0:0:2:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00
(da3:mps0:0:2:0): CAM status: CCB request completed with an error
(da3:mps0:0:2:0): Retrying command, 0 more tries remain
(da3:mps0:0:2:0): READ(16). CDB: 88 00 00 00 00 01 04 21 8b 48 00 00 00 08 00 00
(da3:mps0:0:2:0): CAM status: CCB request completed with an error
(da3:mps0:0:2:0): Retrying command, 3 more tries remain
(da3:mps0:0:2:0): READ(16). CDB: 88 00 00 00 00 01 04 23 57 80 00 00 00 10 00 00
(da3:mps0:0:2:0): CAM status: CCB request completed with an error
(da3:mps0:0:2:0): Retrying command, 3 more tries remain
(da3:mps0:0:2:0): READ(16). CDB: 88 00 00 00 00 01 04 20 b6 80 00 00 00 08 00 00
(da3:mps0:0:2:0): CAM status: CCB request completed with an error
(da3:mps0:0:2:0): Retrying command, 3 more tries remain
(da3:mps0:0:2:0): READ(16). CDB: 88 00 00 00 00 01 04 1e 36 90 00 00 00 08 00 00
(da3:mps0:0:2:0): CAM status: CCB request completed with an error
(da3:mps0:0:2:0): Retrying command, 3 more tries remain
(da3:mps0:0:2:0): READ(16). CDB: 88 00 00 00 00 01 04 23 6c f8 00 00 00 08 00 00
(da3:mps0:0:2:0): CAM status: CCB request completed with an error
(da3:mps0:0:2:0): Retrying command, 3 more tries remain
(da3:mps0:0:2:0): READ(16). CDB: 88 00 00 00 00 01 04 23 8b 68 00 00 00 08 00 00
(da3:mps0:0:2:0): CAM status: CCB request completed with an error
(da3:mps0:0:2:0): Retrying command, 3 more tries remain
(da3:mps0:0:2:0): READ(16). CDB: 88 00 00 00 00 01 04 23 4b 30 00 00 00 08 00 00
(da3:mps0:0:2:0): CAM status: CCB request completed with an error
(da3:mps0:0:2:0): Retrying command, 3 more tries remain
(da3:mps0:0:2:0): READ(16). CDB: 88 00 00 00 00 01 04 23 74 c8 00 00 00 08 00 00
(da3:mps0:0:2:0): CAM status: CCB request completed with an error
(da3:mps0:0:2:0): Retrying command, 3 more tries remain
(da16:mps1:0:7:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00
(da16:mps1:0:7:0): CAM status: SCSI Status Error
(da16:mps1:0:7:0): SCSI status: Check Condition
(da16:mps1:0:7:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
(da16:mps1:0:7:0): Error 6, Retries exhausted
(da16:mps1:0:7:0): Invalidating pack
(da12:mps1:0:3:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00
(da12:mps1:0:3:0): CAM status: SCSI Status Error
(da12:mps1:0:3:0): SCSI status: Check Condition
(da12:mps1:0:3:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
(da12:mps1:0:3:0): Error 6, Retries exhausted
(da12:mps1:0:3:0): Invalidating pack
(da5:mps0:0:4:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00
(da5:mps0:0:4:0): CAM status: SCSI Status Error
(da5:mps0:0:4:0): SCSI status: Check Condition
(da6:mps0:0:5:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00
(da5:mps0:0:4:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
(da6:mps0:0:5:0): CAM status: SCSI Status Error
(da5:mps0:0:4:0): Error 6, Retries exhausted
(da6:mps0:0:5:0): SCSI status: Check Condition
(da6:mps0:0:5:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
(da5:mps0:0:4:0): Invalidating pack
(da6:mps0:0:5:0): Error 6, Retries exhausted
(da6:mps0:0:5:0): Invalidating pack
(da3:mps0:0:2:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00
(da3:mps0:0:2:0): CAM status: SCSI Status Error
(da3:mps0:0:2:0): SCSI status: Check Condition
(da3:mps0:0:2:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
(da3:mps0:0:2:0): Error 6, Retries exhausted
(da3:mps0:0:2:0): Invalidating pack
Solaris: WARNING: Pool 'tank' has encountered an uncorrectable I/O failure and has been suspended.
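Because the messages from both controllers are interleaved, it can help to tally the key events per disk and to decode the CDBs into block addresses. The sketch below is a hypothetical helper, not a FreeBSD or TrueNAS tool. It assumes the CAM prefix format seen above, "(daN:mpsX:bus:target:lun):", counts "CAM status: Command timeout" and "Invalidating pack" messages, and decodes the three opcodes that appear in this log using the SCSI block command field layouts (READ(10) 0x28, READ(16) 0x88, SYNCHRONIZE CACHE(10) 0x35). In the excerpt above, "Invalidating pack" is logged for da3, da5 and da6 on mps0 and for da12 and da16 on mps1.

```python
import re
from collections import Counter

# CAM prefix such as "(da6:mps0:0:5:0):" = disk da6 on controller mps0, bus 0, target 5, LUN 0.
# Matching anywhere in the text also works on a paste whose line breaks were lost.
EVENT = re.compile(
    r"\((da\d+):(mps\d+):\d+:\d+:\d+\): "
    r"(CAM status: Command timeout|Invalidating pack)"
)


def tally_events(log: str) -> Counter:
    """Count timeout / 'Invalidating pack' messages per (disk, controller, event)."""
    return Counter(m.groups() for m in EVENT.finditer(log))


def controllers_with(log: str, event: str) -> dict[str, list[str]]:
    """Map each controller to the disks (sorted by unit number) that logged `event`."""
    result: dict[str, set[str]] = {}
    for disk, ctrl, ev in tally_events(log):
        if ev == event:
            result.setdefault(ctrl, set()).add(disk)
    return {ctrl: sorted(disks, key=lambda d: int(d[2:])) for ctrl, disks in sorted(result.items())}


# SBC field layouts: opcode -> (name, LBA byte slice, transfer-length byte slice)
CDB_LAYOUT = {
    0x28: ("READ(10)", slice(2, 6), slice(7, 9)),
    0x88: ("READ(16)", slice(2, 10), slice(10, 14)),
    0x35: ("SYNCHRONIZE CACHE(10)", slice(2, 6), slice(7, 9)),  # 0 blocks = through end of disk
}


def decode_cdb(cdb_hex: str) -> tuple[str, int, int]:
    """Return (command, starting LBA, number of blocks) for a CDB as printed by CAM."""
    cdb = bytes.fromhex(cdb_hex)  # fromhex accepts the space-separated bytes
    name, lba, length = CDB_LAYOUT[cdb[0]]  # KeyError for opcodes not listed above
    return name, int.from_bytes(cdb[lba], "big"), int.from_bytes(cdb[length], "big")


sample = (
    "(da16:mps1:0:7:0): Invalidating pack "
    "(da6:mps0:0:5:0): READ(16). CDB: 88 00 00 00 00 01 0a 7c c8 30 00 00 00 28 00 00 "
    "mps0: (da6:mps0:0:5:0): CAM status: Command timeout "
    "(da6:mps0:0:5:0): Invalidating pack"
)
print(controllers_with(sample, "Invalidating pack"))  # {'mps0': ['da6'], 'mps1': ['da16']}
print(decode_cdb("88 00 00 00 00 01 0a 7c c8 30 00 00 00 28 00 00"))  # ('READ(16)', 4470917168, 40)
```

Pasting the full log into `sample` gives the per-controller picture for all five disks at once.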
The output of zpool status -v is:
Code:
  pool: tank
 state: ONLINE
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Sun Aug 7 22:29:40 2022
        15.2T scanned at 180M/s, 5.18T issued at 61.4M/s, 25.3T total
        437G resilvered, 20.49% done, 3 days 23:29:26 to go
config:
  NAME                                            STATE    READ WRITE CKSUM
  tank                                            ONLINE      0     0     0
    raidz3-0                                      ONLINE      0    16     0
      gptid/25154c0a-9652-11eb-98e6-000c29be6648  ONLINE      0     0   434  (resilvering)
      gptid/254a36d3-9652-11eb-98e6-000c29be6648  ONLINE      0     0   142  (resilvering)
      gptid/26a5958f-9652-11eb-98e6-000c29be6648  ONLINE      0     0     0
      gptid/796d52d2-b6ca-11eb-b288-000c29be6648  ONLINE  1.08K  1005     0
      gptid/268a4690-9652-11eb-98e6-000c29be6648  ONLINE      0     0     0
      gptid/276725a8-9652-11eb-98e6-000c29be6648  ONLINE      0     0     0
      gptid/284d86dc-9652-11eb-98e6-000c29be6648  ONLINE      0     0     0
      gptid/5e647216-b70c-11eb-b288-000c29be6648  ONLINE  4.81K 4.38K     0
      gptid/2826445a-9652-11eb-98e6-000c29be6648  ONLINE  6.70K 7.65K     0
      gptid/287f6272-9652-11eb-98e6-000c29be6648  ONLINE  6.01K 7.04K     0
      gptid/28897e41-9652-11eb-98e6-000c29be6648  ONLINE  5.87K 6.95K     0
      gptid/476d231d-1687-11ed-90d7-000c29be6648  ONLINE      0     0 20.1K  (resilvering)
  spares
    gptid/e19d6609-65ba-11ec-bf4a-000c29be6648    AVAIL
    gptid/e2208018-65ba-11ec-bf4a-000c29be6648    AVAIL
    gptid/e23a6cf8-65ba-11ec-bf4a-000c29be6648    AVAIL
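The READ/WRITE/CKSUM columns are easier to compare once the abbreviated counters are expanded. The sketch below is a hypothetical helper that parses config rows like the ones above and lists gptid members with any non-zero counter; it also redoes the resilver estimate from the scan line as (total - issued) / issue rate. It assumes zpool's abbreviated numbers use 1024-based suffixes and that each device sits on its own line. Because the displayed values are rounded, the results are approximate, although this simple formula lands within a few minutes of the reported "3 days 23:29:26".

```python
import re

# Assumption: zpool abbreviates numbers with 1024-based suffixes ("1.08K", "25.3T").
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}

# One config row: NAME STATE READ WRITE CKSUM [note]
ROW = re.compile(
    r"^\s*(\S+)\s+(ONLINE|DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED)\s+(\S+)\s+(\S+)\s+(\S+)"
)


def parse_nicenum(text: str) -> float:
    """Expand '1005', '1.08K' or '25.3T' into an approximate plain number."""
    if text[-1] in UNITS:
        return float(text[:-1]) * UNITS[text[-1]]
    return float(text)


def disks_with_errors(status: str) -> list[tuple[str, int, int, int]]:
    """Return (device, read, write, cksum) for gptid members with any non-zero counter."""
    rows = []
    for line in status.splitlines():
        m = ROW.match(line)
        if not m or not m.group(1).startswith("gptid/"):
            continue  # header, pool/vdev rows, spares and non-table lines
        read, write, cksum = (round(parse_nicenum(v)) for v in m.group(3, 4, 5))
        if read or write or cksum:
            rows.append((m.group(1), read, write, cksum))
    return rows


def resilver_eta_seconds(issued: str, total: str, rate_per_s: str) -> float:
    """Back-of-envelope ETA: bytes still to issue divided by the issue rate."""
    return (parse_nicenum(total) - parse_nicenum(issued)) / parse_nicenum(rate_per_s)


status = """\
    raidz3-0                                      ONLINE      0    16     0
      gptid/26a5958f-9652-11eb-98e6-000c29be6648  ONLINE      0     0     0
      gptid/796d52d2-b6ca-11eb-b288-000c29be6648  ONLINE  1.08K  1005     0
      gptid/476d231d-1687-11ed-90d7-000c29be6648  ONLINE      0     0 20.1K  (resilvering)
"""
for row in disks_with_errors(status):
    print(row)

eta = resilver_eta_seconds("5.18T", "25.3T", "61.4M")  # figures from the scan: line above
days, rest = divmod(eta, 86400)
print(f"about {int(days)} days {rest / 3600:.1f} h")  # about 3 days 23.4 h
```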
The ESXi host seems to be acting up too (I can't edit the settings of the TrueNAS VM), so this might be an ESXi issue.
So far I haven't touched either TrueNAS or the ESXi host other than to gather logs.
I would appreciate any suggestions for my next steps: 1) investigating and resolving the underlying issue, and 2) bringing the pool back online (preferably with all the data =) ).
Thanks in advance!