CAM Status Errors

Status
Not open for further replies.
Joined
Jul 3, 2015
Messages
1
First off, I want to say that I was very hesitant to post on this forum because I was able to find a number of threads where people had similar issues. However, ultimately, it seems like their issues are slightly different from mine so I decided to give it a shot; thank you in advance for your help and understanding.

System:
Motherboard - Supermicro X10SL7-F uATX
Memory - 2x MEM-DR380L-HL03-EU16 Hynix 8GB DDR3-1600 2Rx8 1.35v ECC Un-Buffered
Processor - Intel Xeon Processor E3-1231V3B
Enclosure - Supermicro CSE-825TQ-563LPB
Controller - Built-in Supermicro LSI 2308 w/v20.04 FW
Controller (tried independently of mobo controller) - IBM M1015 converted to 9211 w/v20.04 FW
Boot Drives (have tried) - 2x HP v221w Metal 16GB USB 2.0 (mirror)
Boot Drives (have tried) - 2x SanDisk Cruzer Fit 8GB USB 2.0 (mirror)
Boot Drives (have tried) - 2x Patriot 16GB Autobahn Series Ultra Compact USB 2.0 (mirror)
Hard Drives (have tried in RAIDZ2 & Mirrors) - 6x HGST Deskstar NAS 3.5-Inch 3TB 7200RPM SATA III
Hard Drives (have tried in RAIDZ2) - 6x Seagate 3TB Enterprise Capacity SAS 6Gb/s 128MB Cache (ST3000NM0023)

Issue:
About a month ago I noticed some strange CAM status errors while using the HP USBs as boot devices and the Seagate HDDs in RAIDZ2 above for data. I had been using the system without any issues since May 2015, and I didn't see any notifications in the FreeNAS GUI, but the enclosure SCSI status lights started blinking red for every single drive. Please forgive me: given the errors encountered, I was unable to capture the console output as text, so I used IPMI to take screenshots instead.
Here is the first set of errors I saw:
kdQb4Pa.jpg


I checked the SMART test results of my hard drives and noticed that one had some errors, so I ordered some spares and gracefully shut down the server.

After the drives arrived, I rebooted my system to detach the bad drive and follow the replacement procedures. However, I noticed that those CAM status errors persisted during system startup. I began to conduct some additional testing and found that when I tried accessing data on the NAS via an SMB share, there were near-constant ioc errors (as seen at the top of the previous screenshot) and performance was terribly slow.

At this point, I shut down the server and took out the boot devices and storage devices to start some hardware troubleshooting. For my first test, I installed the 6 HGST hard drives in the enclosure, still using the motherboard LSI controller, and booted from a fresh FreeNAS install on the HP USB drives. I created a RAIDZ2 volume and an SMB share, then started copying data to it. Within a short period of time I saw this error:
XYbGiXB.jpg


I thought it might be a controller issue, so I detached the volume (selecting to destroy the data), shut down the system, and switched 4 drives over to an IBM M1015 converted to LSI 9211 with FW v20.04, connected with different cables (miniSAS to SATA). I created a new 4-disk RAIDZ2 volume and an SMB share to test. After transferring data for a short time, I saw red lights on the enclosure SCSI status indicator and this error:
liGK1mc.jpg


Now I was really confused, so I thought I'd just switch 4 disks over to the motherboard SATA connectors and try a simple mirrored configuration: two mirrors of 2 disks each. After I created the pool and started transferring data to an SMB share, I saw this:
H4agCHr.jpg


My conclusion is that either the motherboard or the enclosure is the problem, and I'm leaning towards the enclosure. Unfortunately, I don't have an easy way to test that since I don't have any spare cases to install my equipment in. I might be able to jury-rig a test with a couple of mirrored SATA drives directly connected to the Supermicro motherboard (bypassing the hot-swap enclosure), but what do you guys think? Is that the right step? At this point I've contacted Supermicro to tell them what's going on and asked them for recommendations as well, but no offense to their customer support, I'm guessing you guys will know a helluva lot more than they will. Thanks for the help!
 

BigDave

FreeNAS Enthusiast
Joined
Oct 6, 2013
Messages
2,479
Unfortunately, I don't have a really easy way to test that since I don't have any spare cases to install my equipment in. I might be able to jury rig a test with a couple of mirrored SATA drives directly connected to the supermicro motherboard (bypassing the hot swap enclosures), but what do you guys think? Is that the right step?
Get the hardware out of the enclosure and spread it out on a non-conductive (anti-static) surface. Hook up as many drives as you can and test again. Without the enclosure involved, it can then only be the board if it continues to fail.
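
To compare the in-enclosure and out-of-enclosure runs, it may help to tally the CAM status lines per device and controller instead of eyeballing screenshots. Below is a minimal Python sketch, assuming the console output has been saved as text and that the lines follow the usual FreeBSD form "(da0:mps0:0:0:0): CAM status: ...". The sample lines are invented for illustration and are not transcribed from the screenshots in this thread; the function name tally_cam_errors and the regular expression are likewise assumptions.

```python
import re
from collections import Counter

# Assumed line shape: "(<device>:<controller>:<bus>:<target>:<lun>): CAM status: <text>"
CAM_LINE = re.compile(
    r"\((?P<dev>[a-z]+\d+):(?P<ctrl>[a-z]+\d+):[\d:]+\): CAM status: (?P<status>.+)"
)


def tally_cam_errors(log_text):
    """Count 'CAM status' lines per (device, controller) pair."""
    counts = Counter()
    for line in log_text.splitlines():
        match = CAM_LINE.search(line)
        if match:
            counts[(match.group("dev"), match.group("ctrl"))] += 1
    return counts


# Hypothetical sample, NOT taken from the poster's system.
SAMPLE_LOG = """\
(da0:mps0:0:0:0): CAM status: SCSI Status Error
(da1:mps0:0:1:0): CAM status: SCSI Status Error
(da0:mps0:0:0:0): READ(10). CDB: 28 00 00 00 10 00 00 00 08 00
(da0:mps0:0:0:0): CAM status: SCSI Status Error
(ada2:ahcich2:0:0:0): CAM status: Uncorrectable parity/CRC error
"""

counts = tally_cam_errors(SAMPLE_LOG)
```

If every drive on every controller shows errors while in the enclosure and the counts drop to zero on the bench, that points at the enclosure; if the errors follow the board onto the bench, the board becomes the more likely suspect.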
 