Problems with LSI SAS 9207-8i after 12.0-U8 -> 13.0-U1.1 upgrade

Mystik

Cadet
Joined
Aug 26, 2022
Messages
3
Greeting!

After upgrading from 12.0-U8 to 13.0-U1.1, I immediately started having problems with what I believe is my LSI SAS 9207-8i.

TrueNAS: 13.0-U1.1
CPU: i7-3770K
RAM: 16GB DDR3-1600
Controller: LSI SAS 9207-8i PCI-e 3.0 (IT Mode)
HDDs: 6 x WD Red 4TB (raidz2)

After the update, my NAS raised an alert: * Device: /dev/da5 [SAT], not capable of SMART self-check.
Some time later the controller appeared to start failing. If I read the logs correctly, the drives are being dropped one by one and the controller is being reset each time:

Code:
Aug 26 21:33:46 TrueNAS-304 mps0: IOC Fault 0x40000d04, Resetting
Aug 26 21:33:46 TrueNAS-304 mps0: Reinitializing controller
Aug 26 21:33:46 TrueNAS-304 mps0: Firmware: 20.00.07.00, Driver: 21.02.00.00-fbsd
Aug 26 21:33:46 TrueNAS-304 mps0: IOCCapabilities: 5a85c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,MSIXIndex,HostDisc>
Aug 26 21:33:59 TrueNAS-304 mps0: IOC Fault 0x40000d04, Resetting
Aug 26 21:33:59 TrueNAS-304 mps0: Reinitializing controller
Aug 26 21:33:59 TrueNAS-304 mps0: Firmware: 20.00.07.00, Driver: 21.02.00.00-fbsd
Aug 26 21:33:59 TrueNAS-304 mps0: IOCCapabilities: 5a85c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,MSIXIndex,HostDisc>
Aug 26 21:34:16 TrueNAS-304 mps0: IOC Fault 0x40000d04, Resetting
Aug 26 21:34:16 TrueNAS-304 mps0: Reinitializing controller
Aug 26 21:34:16 TrueNAS-304 mps0: Firmware: 20.00.07.00, Driver: 21.02.00.00-fbsd
Aug 26 21:34:16 TrueNAS-304 mps0: IOCCapabilities: 5a85c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,MSIXIndex,HostDisc>
Aug 26 21:34:16 TrueNAS-304 (da5:mps0:0:5:0): Invalidating pack
Aug 26 21:34:16 TrueNAS-304 da5 at mps0 bus 0 scbus0 target 5 lun 0
Aug 26 21:34:16 TrueNAS-304 da5: <ATA WDC WD40EFRX-68N 0A82>  s/n WD-WCC7K2KASH4F detached
Aug 26 21:34:16 TrueNAS-304 GEOM_MIRROR: Device swap0: provider da5p1 disconnected.
Aug 26 21:34:29 TrueNAS-304 mps0: IOC Fault 0x40000d04, Resetting
Aug 26 21:34:29 TrueNAS-304 mps0: Reinitializing controller
Aug 26 21:34:29 TrueNAS-304 mps0: Firmware: 20.00.07.00, Driver: 21.02.00.00-fbsd
Aug 26 21:34:29 TrueNAS-304 mps0: IOCCapabilities: 5a85c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,MSIXIndex,HostDisc>
Aug 26 21:34:45 TrueNAS-304 mps0: IOC Fault 0x40000d04, Resetting
Aug 26 21:34:45 TrueNAS-304 mps0: Reinitializing controller
Aug 26 21:34:45 TrueNAS-304 mps0: Firmware: 20.00.07.00, Driver: 21.02.00.00-fbsd
Aug 26 21:34:45 TrueNAS-304 mps0: IOCCapabilities: 5a85c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,MSIXIndex,HostDisc>
Aug 26 21:34:45 TrueNAS-304 (da2:mps0:0:2:0): Invalidating pack
Aug 26 21:34:45 TrueNAS-304 da2 at mps0 bus 0 scbus0 target 2 lun 0
Aug 26 21:34:45 TrueNAS-304 da2: <ATA WDC WD40EFRX-68N 0A82>  s/n WD-WCC7K0SF95HR detached
Aug 26 21:34:45 TrueNAS-304 GEOM_MIRROR: Device swap1: provider da2p1 disconnected.
Aug 26 21:34:55 TrueNAS-304 (da5:mps0:0:5:0): Periph destroyed
Aug 26 21:34:55 TrueNAS-304 (da2:mps0:0:2:0): Periph destroyed
Aug 26 21:34:55 TrueNAS-304 da2 at mps0 bus 0 scbus0 target 2 lun 0
Aug 26 21:34:55 TrueNAS-304 da2: <ATA WDC WD40EFRX-68N 0A82> Fixed Direct Access SPC-4 SCSI device
Aug 26 21:34:55 TrueNAS-304 da2: Serial Number WD-WCC7K0SF95HR
Aug 26 21:34:55 TrueNAS-304 da2: 600.000MB/s transfers
Aug 26 21:34:55 TrueNAS-304 da2: Command Queueing enabled
Aug 26 21:34:55 TrueNAS-304 da2: 3815447MB (7814037168 512 byte sectors)
Aug 26 21:34:55 TrueNAS-304 da2: quirks=0x8<4K>
Aug 26 21:34:55 TrueNAS-304 da5 at mps0 bus 0 scbus0 target 5 lun 0
Aug 26 21:34:55 TrueNAS-304 da5: <ATA WDC WD40EFRX-68N 0A82> Fixed Direct Access SPC-4 SCSI device
Aug 26 21:34:55 TrueNAS-304 da5: Serial Number WD-WCC7K2KASH4F
Aug 26 21:34:55 TrueNAS-304 da5: 600.000MB/s transfers
Aug 26 21:34:55 TrueNAS-304 da5: Command Queueing enabled
Aug 26 21:34:55 TrueNAS-304 da5: 3815447MB (7814037168 512 byte sectors)
Aug 26 21:34:55 TrueNAS-304 da5: quirks=0x8<4K>
Aug 26 21:35:23 TrueNAS-304 GEOM_MIRROR: Device mirror/swap2 launched (3/3).
Aug 26 21:35:24 TrueNAS-304 GEOM_ELI: Device mirror/swap2.eli created.
Aug 26 21:35:24 TrueNAS-304 GEOM_ELI: Encryption: AES-XTS 128
Aug 26 21:35:24 TrueNAS-304 GEOM_ELI:     Crypto: accelerated software
Aug 26 21:38:13 TrueNAS-304 mps0: IOC Fault 0x40000d04, Resetting
Aug 26 21:38:13 TrueNAS-304 mps0: Reinitializing controller
Aug 26 21:38:13 TrueNAS-304 mps0: Firmware: 20.00.07.00, Driver: 21.02.00.00-fbsd
Aug 26 21:38:13 TrueNAS-304 mps0: IOCCapabilities: 5a85c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,MSIXIndex,HostDisc>
Aug 26 21:38:30 TrueNAS-304 mps0: IOC Fault 0x40000d04, Resetting
Aug 26 21:38:30 TrueNAS-304 mps0: Reinitializing controller
Aug 26 21:38:30 TrueNAS-304 mps0: Firmware: 20.00.07.00, Driver: 21.02.00.00-fbsd
Aug 26 21:38:30 TrueNAS-304 mps0: IOCCapabilities: 5a85c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,MSIXIndex,HostDisc>
Aug 26 21:38:54 TrueNAS-304 mps0: IOC Fault 0x40000d04, Resetting
Aug 26 21:38:54 TrueNAS-304 mps0: Reinitializing controller
Aug 26 21:38:56 TrueNAS-304 mps0: Firmware: 20.00.07.00, Driver: 21.02.00.00-fbsd
Aug 26 21:38:56 TrueNAS-304 mps0: IOCCapabilities: 5a85c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,MSIXIndex,HostDisc>
Aug 26 21:38:56 TrueNAS-304 (da5:mps0:0:5:0): Invalidating pack
Aug 26 21:38:56 TrueNAS-304 da5 at mps0 bus 0 scbus0 target 5 lun 0
Aug 26 21:38:56 TrueNAS-304 da5: <ATA WDC WD40EFRX-68N 0A82>  s/n WD-WCC7K2KASH4F detached
Aug 26 21:38:56 TrueNAS-304 GEOM_MIRROR: Device swap2: provider da5p1 disconnected.
Aug 26 21:39:08 TrueNAS-304 (da5:mps0:0:5:0): Periph destroyed
Aug 26 21:39:08 TrueNAS-304 da5 at mps0 bus 0 scbus0 target 5 lun 0
Aug 26 21:39:08 TrueNAS-304 da5: <ATA WDC WD40EFRX-68N 0A82> Fixed Direct Access SPC-4 SCSI device
Aug 26 21:39:08 TrueNAS-304 da5: Serial Number WD-WCC7K2KASH4F
Aug 26 21:39:08 TrueNAS-304 da5: 600.000MB/s transfers
Aug 26 21:39:08 TrueNAS-304 da5: Command Queueing enabled
Aug 26 21:39:08 TrueNAS-304 da5: 3815447MB (7814037168 512 byte sectors)
Aug 26 21:39:08 TrueNAS-304 da5: quirks=0x8<4K>
Aug 26 21:39:47 TrueNAS-304 mps0: IOC Fault 0x40000d04, Resetting
Aug 26 21:39:47 TrueNAS-304 mps0: Reinitializing controller
Aug 26 21:39:47 TrueNAS-304 mps0: Firmware: 20.00.07.00, Driver: 21.02.00.00-fbsd
Aug 26 21:39:47 TrueNAS-304 mps0: IOCCapabilities: 5a85c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,MSIXIndex,HostDisc>
Aug 26 21:40:03 TrueNAS-304 mps0: IOC Fault 0x40000d04, Resetting
Aug 26 21:40:03 TrueNAS-304 mps0: Reinitializing controller
Aug 26 21:40:03 TrueNAS-304 mps0: Firmware: 20.00.07.00, Driver: 21.02.00.00-fbsd
Aug 26 21:40:03 TrueNAS-304 mps0: IOCCapabilities: 5a85c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,MSIXIndex,HostDisc>
Aug 26 21:40:43 TrueNAS-304 mps0: IOC Fault 0x40000d04, Resetting
Aug 26 21:40:43 TrueNAS-304 mps0: Reinitializing controller
Aug 26 21:40:43 TrueNAS-304 mps0: Firmware: 20.00.07.00, Driver: 21.02.00.00-fbsd
Aug 26 21:40:43 TrueNAS-304 mps0: IOCCapabilities: 5a85c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,MSIXIndex,HostDisc>
Aug 26 21:40:43 TrueNAS-304 (da0:mps0:0:0:0): Invalidating pack
Aug 26 21:40:43 TrueNAS-304 da0 at mps0 bus 0 scbus0 target 0 lun 0
Aug 26 21:40:43 TrueNAS-304 da0: <ATA WDC WD40EFRX-68N 0A82>  s/n WD-WCC7K7EZK9C0 detached
Aug 26 21:40:43 TrueNAS-304 GEOM_MIRROR: Device swap1: provider da0p1 disconnected.
Aug 26 21:40:51 TrueNAS-304 (da0:mps0:0:0:0): Periph destroyed
Aug 26 21:40:51 TrueNAS-304 da0 at mps0 bus 0 scbus0 target 0 lun 0
Aug 26 21:40:51 TrueNAS-304 da0: <ATA WDC WD40EFRX-68N 0A82> Fixed Direct Access SPC-4 SCSI device
Aug 26 21:40:51 TrueNAS-304 da0: Serial Number WD-WCC7K7EZK9C0
Aug 26 21:40:51 TrueNAS-304 da0: 600.000MB/s transfers
Aug 26 21:40:51 TrueNAS-304 da0: Command Queueing enabled
Aug 26 21:40:51 TrueNAS-304 da0: 3815447MB (7814037168 512 byte sectors)
Aug 26 21:40:51 TrueNAS-304 da0: quirks=0x8<4K>
Aug 26 21:40:58 TrueNAS-304 mps0: IOC Fault 0x40000d04, Resetting
Aug 26 21:40:58 TrueNAS-304 mps0: Reinitializing controller
Aug 26 21:40:58 TrueNAS-304 mps0: Firmware: 20.00.07.00, Driver: 21.02.00.00-fbsd
Aug 26 21:40:58 TrueNAS-304 mps0: IOCCapabilities: 5a85c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,MSIXIndex,HostDisc>
Aug 26 21:40:58 TrueNAS-304 (da2:mps0:0:2:0): Invalidating pack
Aug 26 21:40:58 TrueNAS-304 da2 at mps0 bus 0 scbus0 target 2 lun 0
Aug 26 21:40:58 TrueNAS-304 da2: <ATA WDC WD40EFRX-68N 0A82>  s/n WD-WCC7K0SF95HR detached
Aug 26 21:40:58 TrueNAS-304 GEOM_MIRROR: Device swap2: provider da2p1 disconnected.
Aug 26 21:41:11 TrueNAS-304 mps0: IOC Fault 0x40000d04, Resetting
Aug 26 21:41:11 TrueNAS-304 mps0: Reinitializing controller
Aug 26 21:41:11 TrueNAS-304 mps0: Firmware: 20.00.07.00, Driver: 21.02.00.00-fbsd
Aug 26 21:41:11 TrueNAS-304 mps0: IOCCapabilities: 5a85c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,MSIXIndex,HostDisc>
Aug 26 21:41:19 TrueNAS-304 GEOM_ELI: Device mirror/swap2.eli destroyed.
Aug 26 21:41:19 TrueNAS-304 GEOM_MIRROR: Device swap2: provider destroyed.
Aug 26 21:41:19 TrueNAS-304 GEOM_MIRROR: Device swap2 destroyed.
Aug 26 21:41:19 TrueNAS-304 GEOM_ELI: Device mirror/swap1.eli destroyed.
Aug 26 21:41:19 TrueNAS-304 GEOM_MIRROR: Device swap1: provider destroyed.
Aug 26 21:41:19 TrueNAS-304 GEOM_MIRROR: Device swap1 destroyed.
Aug 26 21:41:20 TrueNAS-304 mps0: IOC Fault 0x40000d04, Resetting
Aug 26 21:41:20 TrueNAS-304 mps0: Reinitializing controller
Aug 26 21:41:21 TrueNAS-304 mps0: Firmware: 20.00.07.00, Driver: 21.02.00.00-fbsd
Aug 26 21:41:21 TrueNAS-304 mps0: IOCCapabilities: 5a85c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,MSIXIndex,HostDisc>
Aug 26 21:41:21 TrueNAS-304 (da3:mps0:0:3:0): Invalidating pack
Aug 26 21:41:21 TrueNAS-304 da3 at mps0 bus 0 scbus0 target 3 lun 0
Aug 26 21:41:21 TrueNAS-304 da3: <ATA WDC WD40EFRX-68N 0A82>  s/n WD-WCC7K0SF93CX detached
Aug 26 21:41:29 TrueNAS-304 GEOM_MIRROR: Request failed (error=6). da3p1[READ(offset=512, length=512)]
Aug 26 21:41:29 TrueNAS-304 GEOM_MIRROR: Device swap0: provider da3p1 disconnected.
Aug 26 21:41:30 TrueNAS-304 GEOM_ELI: Device mirror/swap0.eli destroyed.
Aug 26 21:41:30 TrueNAS-304 GEOM_MIRROR: Device swap0: provider destroyed.
Aug 26 21:41:30 TrueNAS-304 GEOM_MIRROR: Device swap0 destroyed.
Aug 26 21:41:30 TrueNAS-304 (da2:mps0:0:2:0): Periph destroyed
Aug 26 21:41:30 TrueNAS-304 (da3:mps0:0:3:0): Periph destroyed
Aug 26 21:41:30 TrueNAS-304 da2 at mps0 bus 0 scbus0 target 2 lun 0
Aug 26 21:41:30 TrueNAS-304 da2: <ATA WDC WD40EFRX-68N 0A82> Fixed Direct Access SPC-4 SCSI device
Aug 26 21:41:30 TrueNAS-304 da2: Serial Number WD-WCC7K0SF95HR
Aug 26 21:41:30 TrueNAS-304 da2: 600.000MB/s transfers
Aug 26 21:41:30 TrueNAS-304 da2: Command Queueing enabled
Aug 26 21:41:30 TrueNAS-304 da2: 3815447MB (7814037168 512 byte sectors)
Aug 26 21:41:30 TrueNAS-304 da2: quirks=0x8<4K>
Aug 26 21:41:30 TrueNAS-304 da3 at mps0 bus 0 scbus0 target 3 lun 0
Aug 26 21:41:30 TrueNAS-304 da3: <ATA WDC WD40EFRX-68N 0A82> Fixed Direct Access SPC-4 SCSI device
Aug 26 21:41:30 TrueNAS-304 da3: Serial Number WD-WCC7K0SF93CX
Aug 26 21:41:30 TrueNAS-304 da3: 600.000MB/s transfers
Aug 26 21:41:30 TrueNAS-304 da3: Command Queueing enabled
Aug 26 21:41:30 TrueNAS-304 da3: 3815447MB (7814037168 512 byte sectors)
Aug 26 21:41:30 TrueNAS-304 da3: quirks=0x8<4K>
Aug 26 21:41:30 TrueNAS-304 GEOM_MIRROR: Device mirror/swap0 launched (3/3).
Aug 26 21:41:30 TrueNAS-304 GEOM_ELI: Device mirror/swap0.eli created.
Aug 26 21:41:30 TrueNAS-304 GEOM_ELI: Encryption: AES-XTS 128
Aug 26 21:41:30 TrueNAS-304 GEOM_ELI:     Crypto: accelerated software
Aug 26 21:41:31 TrueNAS-304 GEOM_MIRROR: Device mirror/swap1 launched (3/3).
Aug 26 21:41:31 TrueNAS-304 GEOM_ELI: Device mirror/swap1.eli created.
Aug 26 21:41:31 TrueNAS-304 GEOM_ELI: Encryption: AES-XTS 128
Aug 26 21:41:31 TrueNAS-304 GEOM_ELI:     Crypto: accelerated software
...

After some time TrueNAS suspends the pool automatically: Aug 26 21:46:36 TrueNAS-304 Solaris: WARNING: Pool 'holvi' has encountered an uncorrectable I/O failure and has been suspended. (I was limited to 3000 characters here, so the entire log is readable at: https://f.mstk.eu/NAS_problem.txt)

After this both the Pools and Disks screens just keep showing the loading icon forever. After a restart the whole system works great for a few minutes, until this starts again.
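
For anyone who wants to summarise a log like the one above instead of reading it line by line, here is a minimal Python sketch. It assumes the syslog format shown in this thread and that all lines come from the same day. The function name `summarize` and the short excerpt are only illustrative and are not part of any TrueNAS tooling.

```python
import re
from collections import Counter

# Short excerpt copied from the log above, used as sample input.
SAMPLE = """\
Aug 26 21:33:46 TrueNAS-304 mps0: IOC Fault 0x40000d04, Resetting
Aug 26 21:33:59 TrueNAS-304 mps0: IOC Fault 0x40000d04, Resetting
Aug 26 21:34:16 TrueNAS-304 mps0: IOC Fault 0x40000d04, Resetting
Aug 26 21:34:16 TrueNAS-304 da5: <ATA WDC WD40EFRX-68N 0A82>  s/n WD-WCC7K2KASH4F detached
Aug 26 21:34:29 TrueNAS-304 mps0: IOC Fault 0x40000d04, Resetting
Aug 26 21:34:45 TrueNAS-304 mps0: IOC Fault 0x40000d04, Resetting
Aug 26 21:34:45 TrueNAS-304 da2: <ATA WDC WD40EFRX-68N 0A82>  s/n WD-WCC7K0SF95HR detached
"""

FAULT_RE = re.compile(
    r"^\w{3} +\d+ (\d\d):(\d\d):(\d\d) \S+ (mps\d+): IOC Fault (0x[0-9a-fA-F]+), Resetting"
)
DETACH_RE = re.compile(
    r"^\w{3} +\d+ \d\d:\d\d:\d\d \S+ (da\d+): <[^>]*> +s/n (\S+) detached"
)


def summarize(text):
    """Count controller faults and drive detaches in a syslog excerpt."""
    fault_seconds = []      # time of day of each fault, in seconds
    fault_values = Counter()
    detaches = Counter()
    for line in text.splitlines():
        m = FAULT_RE.match(line)
        if m:
            h, mi, s = (int(m.group(i)) for i in (1, 2, 3))
            # Assumption: every line is from the same day, so time of day is enough.
            fault_seconds.append(h * 3600 + mi * 60 + s)
            fault_values[m.group(5)] += 1
            continue
        m = DETACH_RE.match(line)
        if m:
            detaches[m.group(1)] += 1
    gaps = [b - a for a, b in zip(fault_seconds, fault_seconds[1:])]
    return {
        "faults": len(fault_seconds),
        "fault_values": dict(fault_values),
        "detaches": dict(detaches),
        "gaps_s": gaps,
    }


if __name__ == "__main__":
    print(summarize(SAMPLE))
```

On this excerpt the faults arrive every 13 to 17 seconds with the same value each time, which points to a repeating controller reset loop rather than one bad disk.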

I would revert to 12.0-U8, but unfortunately I already upgraded my pools... A big mistake on my part. Is there anything else I should still try? I do not have another controller on hand. Maybe reinstall and restore the configuration I saved before the upgrade?
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
Make sure the heat sink is still well connected to the controller chip. You may need to refresh the thermal compound underneath the heat sink. Also, clean the gold fingers with alcohol, and reseat all the cables.
 

ChrisRJ

Wizard
Joined
Oct 23, 2020
Messages
1,919
While the thermal aspect is certainly something to look at in general, is it really that relevant in the context of an upgrade scenario (assuming the OP did not leave out any temporal aspects)?
 

Mystik

Cadet
Joined
Aug 26, 2022
Messages
3
Make sure the heat sink is still well connected to the controller chip. You may need to refresh the thermal compound underneath the heat sink. Also, clean the gold fingers with alcohol, and reseat all the cables.
This certainly doesn't hurt, so I went ahead and cleaned the inside of the NAS and repasted the controller chip. As you might have guessed, the old thermal compound was very dry, so a repaste was definitely due in any case. I also set the case fans to full speed to rule out a possible heat problem while testing.

After booting, the issue remained and the same errors continued:
Code:
Aug 28 18:51:54 TrueNAS-304 mps0: IOC Fault 0x40000d04, Resetting
Aug 28 18:51:54 TrueNAS-304 mps0: Reinitializing controller
Aug 28 18:51:54 TrueNAS-304 mps0: Firmware: 20.00.07.00, Driver: 21.02.00.00-fbsd
Aug 28 18:51:54 TrueNAS-304 mps0: IOCCapabilities: 5a85c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,MSIXIndex,HostDisc>
Aug 28 18:52:20 TrueNAS-304 mps0: IOC Fault 0x40000d04, Resetting
Aug 28 18:52:20 TrueNAS-304 mps0: Reinitializing controller
Aug 28 18:52:20 TrueNAS-304 mps0: Firmware: 20.00.07.00, Driver: 21.02.00.00-fbsd
Aug 28 18:52:20 TrueNAS-304 mps0: IOCCapabilities: 5a85c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,MSIXIndex,HostDisc>
Aug 28 18:52:37 TrueNAS-304 mps0: IOC Fault 0x40000d04, Resetting
Aug 28 18:52:37 TrueNAS-304 mps0: Reinitializing controller
Aug 28 18:52:37 TrueNAS-304 mps0: Firmware: 20.00.07.00, Driver: 21.02.00.00-fbsd
Aug 28 18:52:37 TrueNAS-304 mps0: IOCCapabilities: 5a85c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,MSIXIndex,HostDisc>
Aug 28 18:52:49 TrueNAS-304 mps0: IOC Fault 0x40000d04, Resetting
Aug 28 18:52:49 TrueNAS-304 mps0: Reinitializing controller
Aug 28 18:52:49 TrueNAS-304 mps0: Firmware: 20.00.07.00, Driver: 21.02.00.00-fbsd
Aug 28 18:52:49 TrueNAS-304 mps0: IOCCapabilities: 5a85c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,MSIXIndex,HostDisc>
Aug 28 18:53:12 TrueNAS-304 mps0: IOC Fault 0x40000d04, Resetting
Aug 28 18:53:12 TrueNAS-304 mps0: Reinitializing controller
Aug 28 18:53:12 TrueNAS-304 mps0: Firmware: 20.00.07.00, Driver: 21.02.00.00-fbsd
Aug 28 18:53:12 TrueNAS-304 mps0: IOCCapabilities: 5a85c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,MSIXIndex,HostDisc>
Aug 28 18:53:28 TrueNAS-304 mps0: IOC Fault 0x40000d04, Resetting
Aug 28 18:53:28 TrueNAS-304 mps0: Reinitializing controller
Aug 28 18:53:28 TrueNAS-304 mps0: Firmware: 20.00.07.00, Driver: 21.02.00.00-fbsd
Aug 28 18:53:28 TrueNAS-304 mps0: IOCCapabilities: 5a85c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,MSIXIndex,HostDisc>
Aug 28 18:53:46 TrueNAS-304 mps0: IOC Fault 0x40000d04, Resetting
Aug 28 18:53:46 TrueNAS-304 mps0: Reinitializing controller
Aug 28 18:53:48 TrueNAS-304 mps0: Firmware: 20.00.07.00, Driver: 21.02.00.00-fbsd
Aug 28 18:53:48 TrueNAS-304 mps0: IOCCapabilities: 5a85c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,MSIXIndex,HostDisc>
Aug 28 18:53:59 TrueNAS-304 mps0: IOC Fault 0x40000d04, Resetting
Aug 28 18:53:59 TrueNAS-304 mps0: Reinitializing controller
Aug 28 18:53:59 TrueNAS-304 mps0: Firmware: 20.00.07.00, Driver: 21.02.00.00-fbsd
Aug 28 18:53:59 TrueNAS-304 mps0: IOCCapabilities: 5a85c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,MSIXIndex,HostDisc>
Aug 28 18:54:12 TrueNAS-304 mps0: IOC Fault 0x40000d04, Resetting
Aug 28 18:54:12 TrueNAS-304 mps0: Reinitializing controller
Aug 28 18:54:12 TrueNAS-304 mps0: Firmware: 20.00.07.00, Driver: 21.02.00.00-fbsd
Aug 28 18:54:12 TrueNAS-304 mps0: IOCCapabilities: 5a85c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,MSIXIndex,HostDisc>


This time, however, the disks were not getting dropped, which I believe was due to a resilver trying to complete but constantly failing and restarting because of the controller problems.
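
A note on reading the fault value in these lines: the number printed appears to be the raw doorbell register. As far as I can tell from the MPI 2.0 header (`mpi2.h`) that the `mps` driver builds on, the top nibble is the IOC state and the low 16 bits are the fault code. The mask values below are written from memory and should be treated as an assumption to check against the header. The meaning of the fault code itself is firmware-specific and is not decoded here.

```python
# Assumed constants, recalled from mpi2.h; verify against the header before relying on them.
MPI2_IOC_STATE_MASK = 0xF0000000
MPI2_DOORBELL_FAULT_CODE_MASK = 0x0000FFFF

IOC_STATES = {
    0x00000000: "RESET",
    0x10000000: "READY",
    0x20000000: "OPERATIONAL",
    0x40000000: "FAULT",
}


def decode_doorbell(value):
    """Split a doorbell value into (state name, fault code)."""
    state = IOC_STATES.get(value & MPI2_IOC_STATE_MASK, "UNKNOWN")
    fault_code = value & MPI2_DOORBELL_FAULT_CODE_MASK
    return state, fault_code


if __name__ == "__main__":
    state, code = decode_doorbell(0x40000D04)
    print(state, hex(code))  # prints: FAULT 0xd04
```

So every reset in the log reports the same state (FAULT) and the same code (0x0d04), both before and after the repaste.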

I booted into the TrueNAS-12.0-U8.1 boot environment and was surprised that the upgraded pools apparently didn't matter: the system booted just fine and the pools were attached without problems. My jails don't seem to be working, but at least the pools are fine.

The errors are not present in TrueNAS-12.0-U8.1, which tells me that this is indeed an issue between TrueNAS 13.0-U1.1 and the LSI SAS 9207-8i, or else something went wrong during the update process. I'll let you know after I perform a clean install and then test again with a restored configuration.
 

Mystik

Cadet
Joined
Aug 26, 2022
Messages
3
After the boot to 12.0-U8.1 and then back to 13.0-U1.1, I can no longer reproduce the issue.
 

naz119849

Cadet
Joined
Jun 21, 2022
Messages
3
I encountered the exact same issue as described when upgrading from 12.0-U8.1 to 13.0-U1.1.
After the boot to 12.0-U8.1 and then back to 13.0-U1.1, I can no longer reproduce the issue.

This made no difference for me. A fresh 13.0-U1.1 also reproduced the issue. I've since gone back to 12.0-U8.1 and the problem is gone.

If it helps, here are the controller details (captured when running 12.0-U8.1 again):

Code:
# sas2flash -list
LSI Corporation SAS2 Flash Utility
Version 16.00.00.00 (2013.03.01)
Copyright (c) 2008-2013 LSI Corporation. All rights reserved

Adapter Selected is a LSI SAS: SAS2008(B2)

Controller Number              : 0
Controller                     : SAS2008(B2)
PCI Address                    : 00:02:00:00
SAS Address                    : 500605b-0-044c-8000
NVDATA Version (Default)       : 14.01.00.08
NVDATA Version (Persistent)    : 14.01.00.08
Firmware Product ID            : 0x2213 (IT)
Firmware Version               : 20.00.07.00
NVDATA Vendor                  : LSI
NVDATA Product ID              : SAS9211-8i
BIOS Version                   : 07.31.00.00
UEFI BSD Version               : N/A
FCODE Version                  : N/A
Board Name                     : SAS9211-8i
Board Assembly                 : N/A
Board Tracer Number            : N/A

Finished Processing Commands Successfully.
Exiting SAS2Flash.
 

NatK

Cadet
Joined
Aug 22, 2022
Messages
4
Does this issue still exist in TrueNAS 13.0-U3.1 with the LSI 9207? Can anyone confirm? How about TrueNAS SCALE?

Thanks
Nat
 

naz119849

Cadet
Joined
Jun 21, 2022
Messages
3
I've upgraded from 12.0-U8.1 to 13.0-U5.3 and I'm still seeing the same problem (with a new controller of the same make). I've dropped back to 12.0-U8.1 again.

The current controller has the following output:


Code:
# sas2flash -list
LSI Corporation SAS2 Flash Utility
Version 16.00.00.00 (2013.03.01)
Copyright (c) 2008-2013 LSI Corporation. All rights reserved

Adapter Selected is a LSI SAS: SAS2008(B2)

Controller Number              : 0
Controller                     : SAS2008(B2)
PCI Address                    : 00:02:00:00
SAS Address                    : 500605b-0-095a-7280
NVDATA Version (Default)       : 14.01.00.06
NVDATA Version (Persistent)    : 14.01.00.06
Firmware Product ID            : 0x2213 (IT)
Firmware Version               : 20.00.07.00
NVDATA Vendor                  : LSI
NVDATA Product ID              : SAS9210-8i
BIOS Version                   : 07.39.02.00
UEFI BSD Version               : 07.27.01.01
FCODE Version                  : N/A
Board Name                     : SAS9210-8i
Board Assembly                 : H3-25097-03B
Board Tracer Number            : SP50303427

Finished Processing Commands Successfully.
Exiting SAS2Flash.
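
To see at a glance what differs between the two `sas2flash -list` outputs posted in this thread, a small Python sketch like the following can parse the `key : value` lines and print only the fields that differ. The helper names `parse_sas2flash` and `diff_fields` are made up for this example, and only a handful of fields are copied in to keep it short.

```python
# A few fields copied from the first controller's output in this thread.
FIRST = """\
NVDATA Version (Default)       : 14.01.00.08
Firmware Version               : 20.00.07.00
NVDATA Product ID              : SAS9211-8i
BIOS Version                   : 07.31.00.00
UEFI BSD Version               : N/A
Board Name                     : SAS9211-8i
"""

# The same fields from the replacement controller's output.
SECOND = """\
NVDATA Version (Default)       : 14.01.00.06
Firmware Version               : 20.00.07.00
NVDATA Product ID              : SAS9210-8i
BIOS Version                   : 07.39.02.00
UEFI BSD Version               : 07.27.01.01
Board Name                     : SAS9210-8i
"""


def parse_sas2flash(text):
    """Turn 'key : value' lines into a dict; other lines are ignored."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" : ")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def diff_fields(a, b):
    """Return {key: (value_in_a, value_in_b)} for keys whose values differ."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


if __name__ == "__main__":
    changed = diff_fields(parse_sas2flash(FIRST), parse_sas2flash(SECOND))
    for key, (old, new) in changed.items():
        print(f"{key}: {old} -> {new}")
```

In this comparison the firmware version (20.00.07.00) is the one field the two cards share, while the board name, NVDATA, and BIOS versions all differ.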
 

mardel

Cadet
Joined
Aug 11, 2019
Messages
8
@Samuel Tai

Looks like I have a similar issue :) - "door bell handshake failed".
My setup had been running smoothly for quite some time on TrueNAS Core 12 with two 2008 HBAs and 8x8TB spinners. About a week ago I decided to move to v13, so I upgraded to 12.0-U8.1 (I believe) and then to the latest 13.
That is when "door bell handshake failed" started showing up, and my TrueNAS VM now just shuts down abruptly.
I checked the forum and it seemed like an HBA could be "dead" (though everyone says they are very hard to kill). I found the culprit HBA and physically removed it. The system worked fine for a day without much data movement. I was running badblocks on a new drive and playing with mounting/unmounting SMB shares from the Mac when it happened again :). I disconnected the new drive and tried to copy a large file from the existing working pool (3-way mirror), and it crashed the VM again.
BTW, when my first HBA "failed" I ordered a new one and it should be delivered soon, but now I'm wondering whether TrueNAS Core 13 is the problem. I wish I hadn't upgraded my pool after updating to v13 :) so it will be hard to test again on v12.

Anything I can provide to help diagnose it?
 