Was hoping some folks on the forum might be able to help me troubleshoot this issue.
This is an all-flash system I built a little less than a year ago with the help of this forum. The specs are:
Chassis: SUPERMICRO CSE-216BE2C-R920LPB
Motherboard: SUPERMICRO MBD-X10SRH-CLN4F-O (HBA is an onboard LSI 3008 flashed to IT mode)
CPU: Xeon E5-1650v4
RAM: 4x32GB Samsung DDR4-2400
Boot Drive: Two mirrored SSD-DM064-SMCMVN1 (64GB DOM)
SLOG: Intel P4800X (my performance on that is posted on the SLOG benchmarking thread)
Array Drives: 8 x Samsung 883 DCT 1.92 TB
Network: Chelsio T520-BT
It has worked extremely well in production since then, serving exclusively as the datastore for an ESXi cluster on a 10Gb network. The activity it sees is a few dozen VMs with varying workloads (virtual desktops, network servers, a few light databases) and nightly backups to a few different places. All great; no issues.
About a month ago I upgraded to the 11.3 chain (I think it was RC2 but I'm not positive; it could have been RC1). A little after that, the following error popped up in the daily alerts:
Code:
Device: /dev/da7 [SAT], not capable of SMART self-check.
Went into the console; SMART seemed to be working fine. Ran short and long tests on the drive; no issues.
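For reference, the manual tests were roughly the following (da7 being the flagged drive; substitute the device node as appropriate):
Code:
smartctl -t short /dev/da7     # short self-test
smartctl -t long /dev/da7      # extended self-test (takes a while, even on SSDs)
smartctl -l selftest /dev/da7  # review the self-test log once each run completes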
A little while later, the system reboots. I'd never had a random reboot on this system before. OK, I think (maybe not entirely rationally), this is just bad luck--maybe a cosmic ray hit the controller, whatever. But the reboots keep happening, about once a day. The logs have entries like this around the time of the reboots:
Code:
mpr0: mprsas_action_scsiio: Freezing devq for target ID 24
(da7:mpr0:0:24:0): WRITE(10). CDB: 2a 00 db 9d 83 18 00 00 10 00
(da7:mpr0:0:24:0): CAM status: CAM subsystem is busy
(da7:mpr0:0:24:0): Retrying command
I've been running FreeNAS for probably 6 or 7 years, and the only issues I've ever had are bad drives, so I immediately assume that da7 has just gone bad. Swap it. A few hours later I get:
Code:
Device: /dev/da6 [SAT], not capable of SMART self-check.
da6 is not the replacement drive; it's the next one down in the pool. And the reboots keep happening--about once a day. (BTW, kudos to VMware for how resilient ESXi is to a complete reboot of the datastore; the VMs freeze, but nearly all of them resume after the store comes back--without a reboot.) I don't really know what is going on, but it seems more likely to be a controller or cable issue.
So I migrate all the VMs off this store, pull out the chassis, and take a look. It's fairly dusty (note to self: look at improving the server room environment), so I blow out all the dust, reseat all the cables, and, for good measure, move all the drives to previously unused ports on the backplane (8 drives in a 24-bay chassis, plenty of room). I also take the old da7 drive and add it back as a hot spare (bringing the drive total to 9), as I'm now thinking it wasn't bad.
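(For the record, re-adding the old drive as a hot spare is a one-liner from the CLI; the device name below is hypothetical, since the drive re-enumerated after the port shuffle:)
Code:
zpool add flashpool spare /dev/da8   # old da7, re-added as a hot spare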
Reboot, and all seems well. Drives all pass smartctl. Run a few read/write tests; wait a day; all good. Start migrating some non-essential VMs back onto it. No issues. VMs work fine, backups work fine. Six days of solid uptime. I'm feeling good; migrate a few of the bigger--but still not mission-critical--VMs back onto it. All OK.
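(The smartctl pass was just a quick health sweep over all nine drives, something like:)
Code:
for d in da0 da1 da2 da3 da4 da5 da6 da7 da8; do
  echo "== $d =="
  smartctl -H /dev/$d | grep -i overall-health
done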
Then, this morning (7 days of uptime in), I wake up to the following e-mail:
Current alerts:
* Scrub of pool 'flashpool' finished.
* Scrub of pool 'freenas-boot' finished.
* Device: /dev/da8 [SAT], not capable of SMART self-check.
* Device: /dev/da6 [SAT], Read SMART Self-Test Log Failed.
* Device: /dev/da5 [SAT], not capable of SMART self-check.
* Device: /dev/da5 [SAT], failed to read SMART Attribute Data.
I don't need to tell you that these drives are all fully capable of SMART tests and, in fact, pass them when run manually. The log is attached; you'll see the similar CAM status errors.
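(By "run manually" I mean running the same queries smartd failed on, e.g.:)
Code:
smartctl -l selftest /dev/da6  # the self-test log reads fine by hand
smartctl -A /dev/da5           # attribute data reads fine by hand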
I have not had any reboots, but I feel like that's coming...
Any thoughts or ideas on how to troubleshoot further? At this point, I'm not sure what to try besides swapping the SAS breakout cables, then the controller (annoying, as I'll have to get a PCIe HBA since this one is on the mobo), and, really worst case, the backplane (yuck). Any other/better ideas?
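One cheap check I can do first, in case a firmware/driver mismatch on the SAS3008 is somehow in play (just a guess on my part), is to compare the HBA firmware against the mpr(4) driver version:
Code:
sas3flash -list      # reports the controller's firmware and BIOS versions
dmesg | grep -i mpr  # shows the driver version logged at boot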
What else can I tell you that might be helpful? Let me see:
- At this point I'm on 11.3-RELEASE (I upgraded when I pulled the chassis).
- The only thing "unusual" that happened last night was that I had added an extra backup location in Veeam, so there would have been a little more load on the drives copying to the new repository.
- Drive temperatures are good--34 C max under load.
- I have logs going further back which I'm not posting, (a) to spare you, and (b) because they're handled by a remote server that stores them in an SQL database, so it's a little more work to post them in a usable format (but I'm happy to if it would help; see the query sketch below). To my eye, they look similar to the one I'm posting now.
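Pulling them would look something like this (the table and column names here are made up for illustration; the real schema comes from my remote syslog setup, and the database isn't actually SQLite, so treat this as a sketch):
Code:
# hypothetical schema; adjust table/column names to the real one
sqlite3 syslog.db "SELECT received_at, message FROM logs
                   WHERE host = 'freenas' AND message LIKE '%CAM status%'
                   ORDER BY received_at;"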
As always, I greatly appreciate the patience and insight on these boards.
Best,
Sam