Some CDB errors, followed by an immediate unscheduled reboot?

thirdgen89gta

Dabbler
Joined
May 5, 2014
Messages
32
This happened to me this morning, and I'm wondering whether I have an HBA card issue.

While checking my email this morning, I noticed an email notification from FreeNAS about an unscheduled reboot.

Digging into the log, I found these messages just before the standard boot-up messages.

All of the affected drives are connected to the same SAS HBA, 'mps1', which is an LSI SAS 9211-8i controller. However, they do not all share the same SAS cable or backplane. I don't have any SAS expanders; there is one SAS cable to each backplane.
da11 and da13 are on the same backplane, but da14 is on another backplane.

Code:
Aug  8 00:00:00 Asgard syslog-ng[2491]: Configuration reload request received, reloading configuration;
Aug  8 00:00:00 Asgard syslog-ng[2491]: Configuration reload finished;
Aug  8 04:27:27 Asgard  (da11:mps1:0:9:0): READ(16). CDB: 88 00 00 00 00 01 2b 2f c1 a8 00 00 00 40 00 00 length 32768 SMID 409 Aborting command 0xfffffe00011138d0
Aug  8 04:27:27 Asgard mps1: Sending reset from mpssas_send_abort for target ID 9
Aug  8 04:27:27 Asgard  (da14:mps1:0:26:0): WRITE(10). CDB: 2a 00 0b 02 b7 00 00 00 08 00 length 4096 SMID 194 Aborting command 0xfffffe0001101ea0
Aug  8 04:27:27 Asgard mps1: Sending reset from mpssas_send_abort for target ID 26
Aug  8 04:27:27 Asgard  (da13:mps1:0:25:0): WRITE(10). CDB: 2a 00 0b 02 b7 00 00 00 10 00 length 8192 SMID 353 Aborting command 0xfffffe000110ef50
Aug  8 04:27:27 Asgard mps1: Sending reset from mpssas_send_abort for target ID 25
Aug  8 04:35:49 Asgard syslog-ng[2661]: syslog-ng starting up; version='3.20.1'
Aug  8 04:35:49 Asgard Copyright (c) 1992-2018 The FreeBSD Project.
Aug  8 04:35:49 Asgard Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
Aug  8 04:35:49 Asgard  The Regents of the University of California. All rights reserved.
Aug  8 04:35:49 Asgard FreeBSD is a registered trademark of The FreeBSD Foundation.
Aug  8 04:35:49 Asgard FreeBSD 11.2-STABLE #0 r325575+6aad246318c(HEAD): Mon Jun 24 17:25:47 UTC 2019
Aug  8 04:35:49 Asgard root@nemesis:/freenas-releng/freenas/_BE/objs/freenas-releng/freenas/_BE/os/sys/FreeNAS.amd64 amd64
Aug  8 04:35:49 Asgard FreeBSD clang version 6.0.0 (tags/RELEASE_600/final 326565) (based on LLVM 6.0.0)
Aug  8 04:35:49 Asgard VT(efifb): resolution 1280x1024
Aug  8 04:35:49 Asgard CPU: Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz (4000.08-MHz K8-class CPU)
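
For anyone trying to read the CDB bytes in the aborted commands above, here is a minimal sketch of how they can be decoded. It assumes the standard SCSI layouts for READ/WRITE(10) and READ/WRITE(16) and a 512-byte logical block size (an assumption, though it matches the "length" values in the log). The function name `decode_cdb` is made up for this example.

```python
def decode_cdb(hex_bytes, block_size=512):
    """Decode opcode, LBA and transfer size from a READ/WRITE (10) or (16) CDB.

    block_size is an assumption; 512 matches the 'length' fields in the log.
    """
    cdb = bytes.fromhex(hex_bytes)
    opcode = cdb[0]
    if opcode in (0x28, 0x2A):        # READ(10), WRITE(10)
        lba = int.from_bytes(cdb[2:6], "big")       # 4-byte LBA
        blocks = int.from_bytes(cdb[7:9], "big")    # 2-byte transfer length
    elif opcode in (0x88, 0x8A):      # READ(16), WRITE(16)
        lba = int.from_bytes(cdb[2:10], "big")      # 8-byte LBA
        blocks = int.from_bytes(cdb[10:14], "big")  # 4-byte transfer length
    else:
        raise ValueError(f"unsupported opcode 0x{opcode:02x}")
    return {
        "opcode": opcode,
        "lba": lba,
        "blocks": blocks,
        "bytes": blocks * block_size,
    }


# CDBs copied from the three aborted commands in the log above.
da11 = decode_cdb("88 00 00 00 00 01 2b 2f c1 a8 00 00 00 40 00 00")
da14 = decode_cdb("2a 00 0b 02 b7 00 00 00 08 00")
da13 = decode_cdb("2a 00 0b 02 b7 00 00 00 10 00")

for name, cmd in (("da11", da11), ("da14", da14), ("da13", da13)):
    print(name, hex(cmd["lba"]), cmd["blocks"], cmd["bytes"])
```

Decoded this way, the byte counts line up with the "length" values the driver logged (32768, 4096 and 8192), which suggests the commands themselves were ordinary reads and writes rather than anything malformed.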


Code:
mps1: <Avago Technologies (LSI) SAS2008> port 0xd000-0xd0ff mem 0xf7dc0000-0xf7dc3fff,0xf7d80000-0xf7dbffff irq 17 at device 0.0 on pci2
mps1: Firmware: 20.00.07.00, Driver: 21.02.00.00-fbsd


mps0: IOCCapabilities: 1285c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,HostDisc>

So part of me is wondering: although this has only occurred once, could it be an indication of a failing HBA?

Some information about my NAS.
It's a Norco 4224 chassis with 24 3.5" hot-swap bays, 4 bays wide and 6 rows high. There are 6 backplanes overall, with 1 cable for each backplane. The top 4 backplanes are routed to the 9201-16i. The remaining 2 bottom backplanes are routed to the 9211-8i.
The CPU is an i7-4790K, with 32GB of RAM.

I have 2 pools. One is a 6-drive all-SSD pool configured as RaidZ2, with its drives spread vertically so that each one sits on a different backplane. The 2nd pool is also a 6-drive RaidZ2, made up of 8TB 3.5" drives that are likewise split vertically among the 6 backplanes.

This way, if I have a backplane failure, I only lose one drive from each pool, and it will be obvious because both of the lost drives would be connected to the same backplane.
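
The layout reasoning above can be sketched in a few lines of Python. This is only an illustration under assumptions: backplanes are numbered 0 (top) to 5 (bottom), the pool names are made up, and "survives" is judged purely by counting missing drives against RaidZ2's two-drive tolerance.

```python
# Backplanes numbered 0 (top) to 5 (bottom); the numbering is an assumption.
# Top 4 backplanes go to the 9201-16i, bottom 2 to the 9211-8i.
BACKPLANE_TO_HBA = {bp: "9201-16i" if bp < 4 else "9211-8i" for bp in range(6)}

# Each pool has exactly one drive on every backplane. Pool names are hypothetical.
POOLS = {"ssd_pool": set(range(6)), "hdd_pool": set(range(6))}

RAIDZ2_PARITY = 2  # RaidZ2 tolerates up to two missing drives


def backplanes_behind(hba):
    """Return the set of backplanes cabled to the given HBA."""
    return {bp for bp, name in BACKPLANE_TO_HBA.items() if name == hba}


def drives_lost(pool_backplanes, failed_backplanes):
    """Count the pool's drives that sit on a failed backplane."""
    return len(pool_backplanes & failed_backplanes)


def pool_survives(pool_backplanes, failed_backplanes):
    """True if the number of missing drives is within RaidZ2's tolerance."""
    return drives_lost(pool_backplanes, failed_backplanes) <= RAIDZ2_PARITY


for pool, bps in POOLS.items():
    print(pool, "single backplane down:", drives_lost(bps, {0}), "drive(s) lost")
    for hba in ("9211-8i", "9201-16i"):
        failed = backplanes_behind(hba)
        print(pool, hba, "down:", drives_lost(bps, failed), "lost,",
              "survives" if pool_survives(bps, failed) else "pool unavailable")
```

By this count, a single backplane costs each pool one drive, and losing the 9211-8i costs each pool two drives, which is still within what RaidZ2 can tolerate on paper.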
 

thirdgen89gta

Dabbler
Joined
May 5, 2014
Messages
32
Well, this got more difficult.

Came home from work today to find that nothing on my network could get an IP address unless it was on the WiFi VLAN. I started by looking at the router.

After lots of troubleshooting I decided to see if the NAS had network connectivity. When I turned on the monitor and prepared to look at the console, I noticed it was frozen.

And what do I see on the screen? The system was hard locked, with nothing happening and no control. The soft reset button was unresponsive, so I ended up having to hold the power button and do a hard power cycle. As soon as the FreeNAS box shut down, DHCP on my network magically started functioning again and handing out IP addresses. The DHCP issue only affected devices on the same VLAN as the FreeNAS box.

Code:
Aug  8 11:38:30 Asgard mps1: IOC Fault 0x40002622, Resetting
Aug  8 11:38:30 Asgard mps1: Reinitializing controller,
Aug  8 11:38:30 Asgard mps1: Firmware: 20.00.07.00, Driver: 21.02.00.00-fbsd
Aug  8 11:38:30 Asgard mps1: IOCCapabilities: 1285c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,HostDisc>
Aug  8 11:38:30 Asgard mps1: mps_reinit finished sc 0xfffffe00010ce000 post 4 free 3
Aug  8 11:38:31 Asgard mps1: SAS Address for SATA device = dc271f4dc5b18f80
Aug  8 11:38:31 Asgard mps1: SAS Address from SATA device = dc271f4dc5b18f80
Aug  8 11:38:31 Asgard mps1: SAS Address for SATA device = 78928e54100f94a5
Aug  8 11:38:31 Asgard mps1: SAS Address from SATA device = 78928e54100f94a5
Aug  8 11:38:31 Asgard mps1: SAS Address for SATA device = dd3e2c4dc5b18f80
Aug  8 11:38:31 Asgard mps1: SAS Address from SATA device = dd3e2c4dc5b18f80
Aug  8 11:38:31 Asgard mps1: SAS Address for SATA device = 78928e54100f95a2
Aug  8 11:38:31 Asgard mps1: SAS Address from SATA device = 78928e54100f95a2
Aug  8 11:41:14 Asgard mps1: IOC Fault 0x40002622, Resetting
Aug  8 11:41:14 Asgard mps1: Reinitializing controller,
Aug  8 11:41:14 Asgard mps1: Firmware: 20.00.07.00, Driver: 21.02.00.00-fbsd
Aug  8 11:41:14 Asgard mps1: IOCCapabilities: 1285c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,HostDisc>
Aug  8 11:41:14 Asgard mps1: mps_reinit finished sc 0xfffffe00010ce000 post 4 free 3
Aug  8 11:41:15 Asgard mps1: SAS Address for SATA device = dc271f4dc5b18f80
Aug  8 11:41:15 Asgard mps1: SAS Address from SATA device = dc271f4dc5b18f80
Aug  8 11:41:15 Asgard mps1: SAS Address for SATA device = 78928e54100f94a5
Aug  8 11:41:15 Asgard mps1: SAS Address from SATA device = 78928e54100f94a5
Aug  8 11:41:16 Asgard mps1: SAS Address for SATA device = dd3e2c4dc5b18f80
Aug  8 11:41:16 Asgard mps1: SAS Address from SATA device = dd3e2c4dc5b18f80
Aug  8 11:41:16 Asgard mps1: SAS Address for SATA device = 78928e54100f95a2
Aug  8 11:41:16 Asgard mps1: SAS Address from SATA device = 78928e54100f95a2

This is where I did the hard reset.  You can see the time jump.

Aug  8 16:59:32 Asgard syslog-ng[2504]: syslog-ng starting up; version='3.20.1'
Aug  8 16:59:32 Asgard Copyright (c) 1992-2018 The FreeBSD Project.
Aug  8 16:59:32 Asgard Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
Aug  8 16:59:32 Asgard  The Regents of the University of California. All rights reserved.
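
If anyone wants to check their own logs for the same pattern, here is a rough sketch that counts aborted commands and IOC faults per controller. The regular expressions are written against the exact message formats shown in this thread and may need adjusting for other driver versions; the function name `summarize` and the `SAMPLE` list are just for illustration.

```python
import re
from collections import Counter

# Patterns based on the mps(4) messages quoted in this thread.
ABORT_RE = re.compile(
    r"\((da\d+):(mps\d+):\d+:(\d+):\d+\): (\w+\(\d+\))\. CDB: .* Aborting command"
)
FAULT_RE = re.compile(r"(mps\d+): IOC Fault (0x[0-9a-fA-F]+)")


def summarize(lines):
    """Count aborted commands per (controller, disk) and faults per (controller, code)."""
    aborts = Counter()
    faults = Counter()
    for line in lines:
        m = ABORT_RE.search(line)
        if m:
            disk, ctrl, _target, _op = m.groups()
            aborts[(ctrl, disk)] += 1
            continue
        m = FAULT_RE.search(line)
        if m:
            faults[m.groups()] += 1
    return aborts, faults


# A few lines copied from the logs in this thread.
SAMPLE = [
    "Aug  8 04:27:27 Asgard  (da11:mps1:0:9:0): READ(16). CDB: 88 00 00 00 00 01 2b 2f c1 a8 00 00 00 40 00 00 length 32768 SMID 409 Aborting command 0xfffffe00011138d0",
    "Aug  8 04:27:27 Asgard mps1: Sending reset from mpssas_send_abort for target ID 9",
    "Aug  8 04:27:27 Asgard  (da14:mps1:0:26:0): WRITE(10). CDB: 2a 00 0b 02 b7 00 00 00 08 00 length 4096 SMID 194 Aborting command 0xfffffe0001101ea0",
    "Aug  8 11:38:30 Asgard mps1: IOC Fault 0x40002622, Resetting",
    "Aug  8 11:41:14 Asgard mps1: IOC Fault 0x40002622, Resetting",
]

aborts, faults = summarize(SAMPLE)
print(dict(aborts))
print(dict(faults))
```

To run it against a real log, pass an open file instead of `SAMPLE`, for example `summarize(open("/var/log/messages"))`; the path is an assumption and may differ on your system.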


If it happens again, I'm going to move the drives to other bays to get them off the 9211-8i controller. If it still happens after that, I'm going to remove the 9211-8i from the system and see whether the errors recur.

I really don't want to buy another 9211-8i. Not that they are expensive; I just don't feel like flashing another card or spending any money at all.
 
Joined
May 22, 2019
Messages
5
Were you able to isolate the problem? Did it turn out to be the SAS card? I have the exact same errors. When booting, everything works well about half of the time; the other half, I get stuck in the same loop as you.
 

thirdgen89gta

Dabbler
Joined
May 5, 2014
Messages
32
Yes, it ended up being the 9211-8i. Replacing it fixed the issues.
 