LodestoneJames (Cadet, joined May 5, 2021, 5 messages)
We've just got some iXsystems servers (not TrueNAS Enterprise hardware) that we're loading TrueNAS/FreeNAS Core onto. This is for a lab environment, so it's not really necessary to splurge on the high-end gear yet. However, we're having some problems. I'm really skeptical that it's hardware related, because why would iXsystems sell gear that's not going to work out of the box with TrueNAS (even if it's not TrueNAS Enterprise gear)?
Controller Hardware Specs (iX-2224R-E1CR24L-IXN):
Motherboard: Supermicro X11DPH-T
CPU: 2x Intel Gold 6226R
RAM: 256GB
HBA: 3x 9305-16e w/16.00.12.00 firmware
Expander Specs (iXC-4072DJ-IXN):
3x 72-bay enclosures with 2.5" SAS slots, filled with Samsung 850 EVO 2TB SATA SSDs
We've arranged the cabling such that each enclosure is connected via 2x SFF-8644 connections (1 meter/3.3 feet cables) to each HBA. We aren't interested in multi-pathing and the 9305-16e controllers have two individual SAS controller cores, so if you connect too many cables it causes standard SATA drives to show up via multiple paths and can cause some strange behavior (not good).
What we're running into is random I/O errors. There doesn't seem to be any rhyme or reason to them. They affect all 3 HBAs and drives in all 3 enclosures that were previously working flawlessly. When the I/O errors show up, the ZFS pool starts faulting drives and eventually takes the array offline. The drives don't even have to be in an array or under any load to get I/O errors. Sometimes just a S.M.A.R.T. check is enough to cause an error, after which FreeNAS/TrueNAS marks the drive as not capable of S.M.A.R.T.
Here's what we've tried:
TrueNAS 12.2
TrueNAS 11.1
TrueNAS 11.0 (Kernel Panics booting the installer)
Changing Cables
Changing arrangement of cabling
Changing Drives
The ONLY thing that comes to mind is that we got a bad batch of cables from our Amazon seller?
I've attached a debug - it's a brand new install and there's nothing on here that is sensitive or important.
Here are some of the errors:
(da52:mpr2:0:799:0): CAM status: SCSI Status Error
(da52:mpr2:0:799:0): SCSI status: Check Condition
(da52:mpr2:0:799:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
(da52:mpr2:0:799:0): Retrying command (per sense data)
(da72:mpr2:0:819:0): READ(10). CDB: 28 00 e8 e0 86 20 00 00 e0 00 length 114688 SMID 870 terminated ioc 804b loginfo 31110e03 scsi 0 state c xfer 0
(da72:mpr2:0:819:0): READ(10). CDB: 28 00 e8 e0 84 20 00 00 e0 00 length 114688 SMID 796 terminated ioc 804b loginfo 31110e03 scsi 0 state c xfer 0
(da72:mpr2:0:819:0): READ(10). CDB: 28 00 e8 e0 86 20 00 00 e0 00
(da72:mpr2:0:819:0): READ(6). CDB: 08 00 02 20 e0 00 length 114688 SMID 803 terminated ioc 804b loginfo 31110e03 scsi 0 state c xfer 105480
(da72:mpr2:0:819:0): CAM status: CCB request completed with an error
(da72:mpr2:0:819:0): Retrying command
(da72:mpr2:0:819:0): READ(10). CDB: 28 00 e8 e0 84 20 00 00 e0 00
(da72:mpr2:0:819:0): CAM status: CCB request completed with an error
(da72:mpr2:0:819:0): Retrying command
(da72:mpr2:0:819:0): READ(6). CDB: 08 00 02 20 e0 00
(da72:mpr2:0:819:0): CAM status: CCB request completed with an error
(da72:mpr2:0:819:0): Retrying command
(da72:mpr2:0:819:0): READ(10). CDB: 28 00 e8 e0 86 20 00 00 e0 00
(da72:mpr2:0:819:0): CAM status: SCSI Status Error
(da72:mpr2:0:819:0): SCSI status: Check Condition
(da72:mpr2:0:819:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
(da72:mpr2:0:819:0): Retrying command (per sense data)
(da106:mpr2:0:977:0): READ(10). CDB: 28 00 e8 e0 86 20 00 00 e0 00 length 114688 SMID 962 terminated ioc 804b loginfo 31110e03 scsi 0 state c xfer 0
(da106:mpr2:0:977:0): READ(10). CDB: 28 00 e8 e0 84 20 00 00 e0 00 length 114688 SMID 960 terminated ioc 804b loginfo 31110e03 scsi 0 state c xfer 80904
(da106:mpr2:0:977:0): READ(10). CDB: 28 00 e8 e0 86 20 00 00 e0 00
(da106:mpr2:0:977:0): CAM status: CCB request completed with an error
(da106:mpr2:0:977:0): Retrying command
(da106:mpr2:0:977:0): READ(10). CDB: 28 00 e8 e0 84 20 00 00 e0 00
(da106:mpr2:0:977:0): CAM status: CCB request completed with an error
(da106:mpr2:0:977:0): Retrying command
(da106:mpr2:0:977:0): READ(10). CDB: 28 00 e8 e0 86 20 00 00 e0 00
(da106:mpr2:0:977:0): CAM status: SCSI Status Error
(da106:mpr2:0:977:0): SCSI status: Check Condition
(da106:mpr2:0:977:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
(da106:mpr2:0:977:0): Retrying command (per sense data)
(da108:mpr2:0:979:0): READ(10). CDB: 28 00 e8 e0 86 20 00 00 e0 00 length 114688 SMID 287 terminated ioc 804b loginfo 31110e03 scsi 0 state c xfer 39944
(da108:mpr2:0:979:0): READ(10). CDB: 28 00 e8 e0 86 20 00 00 e0 00
(da108:mpr2:0:979:0): CAM status: CCB request completed with an error
(da108:mpr2:0:979:0): Retrying command
(da108:mpr2:0:979:0): READ(10). CDB: 28 00 e8 e0 86 20 00 00 e0 00
(da108:mpr2:0:979:0): CAM status: SCSI Status Error
(da108:mpr2:0:979:0): SCSI status: Check Condition
(da108:mpr2:0:979:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
(da108:mpr2:0:979:0): Retrying command (per sense data)
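In case it helps anyone reading the debug, here is a rough Python sketch for tallying these messages per drive and decoding the CDBs. It is not part of TrueNAS; the function names `decode_cdb` and `summarize` are made up for illustration. It assumes each kernel message sits on a single line (wrapped or interleaved console lines need to be joined first) and that the `(daN:mprN:bus:target:lun):` prefix looks like the excerpt above.

```python
import re
from collections import Counter

# Prefix used in the excerpt: "(da72:mpr2:0:819:0): message"
LINE_RE = re.compile(r"^\((da\d+):(mpr\d+):\d+:\d+:\d+\): (.*)$")


def decode_cdb(cdb_hex):
    """Decode a READ(6) or READ(10) CDB given as space-separated hex bytes.

    Returns (command name, starting LBA, block count), or None for any
    other opcode.
    """
    b = bytes.fromhex(cdb_hex.replace(" ", ""))
    if len(b) == 10 and b[0] == 0x28:  # READ(10)
        lba = int.from_bytes(b[2:6], "big")
        blocks = int.from_bytes(b[7:9], "big")
        return "READ(10)", lba, blocks
    if len(b) == 6 and b[0] == 0x08:  # READ(6)
        lba = ((b[1] & 0x1F) << 16) | (b[2] << 8) | b[3]
        blocks = b[4] or 256  # a length byte of 0 means 256 blocks
        return "READ(6)", lba, blocks
    return None


def summarize(log_text):
    """Count terminated commands and reset notices per (controller, drive)."""
    terminated = Counter()
    resets = Counter()
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip wrapped fragments and unrelated lines
        dev, ctrl, msg = m.groups()
        if "terminated ioc" in msg:
            terminated[(ctrl, dev)] += 1
        elif "asc:29,0" in msg:  # "Power on, reset, or bus device reset occurred"
            resets[(ctrl, dev)] += 1
    return terminated, resets
```

For example, `decode_cdb("28 00 e8 e0 86 20 00 00 e0 00")` gives a READ(10) of 224 blocks starting at LBA 0xe8e08620. Assuming 512-byte logical blocks, 224 x 512 = 114688, which matches the `length 114688` in the log. Feeding `summarize` the whole of /var/log/messages should show whether the errors cluster on one HBA or enclosure or are spread evenly.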