Hello there,
I've got a several-year-old server that has been throwing "CAM status: Command timeout" errors almost every day at around 3 am (whenever it is on). It throws a few of them over several days, then eventually freezes up and requires a reboot. This has been going on for a long time (over a year).
The server is an ASRock C2550D4I with 16 GB of ECC memory and 2x 5 TB and 2x 3 TB WD Red NAS hard drives. I recently replaced one of the 5 TB drives because it was throwing some smartctl errors. I'm running the latest stable release of 11.2.
Some example errors are:
Code:
May 12 03:02:36 freenas ahcich3: Timeout on slot 17 port 0
May 12 03:02:36 freenas ahcich3: is 00000008 cs 00000000 ss 00000000 rs 00020000 tfd 40 serr 00000000 cmd 10009117
May 12 03:02:36 freenas (ada3:ahcich3:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 08 c0 77 d9 40 fa 00 00 00 00 00
May 12 03:02:36 freenas (ada3:ahcich3:0:0:0): CAM status: Command timeout
May 12 03:02:36 freenas (ada3:ahcich3:0:0:0): Retrying command
May 12 03:04:18 freenas ahcich2: Timeout on slot 30 port 0
May 12 03:04:18 freenas ahcich2: is 00000008 cs 00000000 ss 00000000 rs 40000000 tfd 40 serr 00000000 cmd 10009e17
May 12 03:04:18 freenas (ada2:ahcich2:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 08 b8 30 32 40 83 00 00 00 00 00
May 12 03:04:18 freenas (ada2:ahcich2:0:0:0): CAM status: Command timeout
May 12 03:04:18 freenas (ada2:ahcich2:0:0:0): Retrying command
May 12 03:08:53 freenas ahcich2: Timeout on slot 20 port 0
May 12 03:08:53 freenas ahcich2: is 00000008 cs 00000000 ss 00000000 rs 00100000 tfd 40 serr 00000000 cmd 10009417
May 12 03:08:53 freenas (ada2:ahcich2:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 08 48 d4 32 40 83 00 00 00 00 00
May 12 03:08:53 freenas (ada2:ahcich2:0:0:0): CAM status: Command timeout
May 12 03:08:53 freenas (ada2:ahcich2:0:0:0): Retrying command
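To see at a glance which channels are affected and when, messages like these can be tallied per AHCI channel and per hour. The following is only a minimal sketch in Python under stated assumptions: the function name `tally_timeouts` and the `sample` text are illustrative, and the regex matches only the `ahcichN: Timeout on slot` form shown in the excerpt above. In practice the text would come from wherever the system log is stored.

```python
import re
from collections import Counter

# Matches entries such as:
#   May 12 03:02:36 freenas ahcich3: Timeout on slot 17 port 0
# finditer() is used so this works whether entries are one per line or run together.
TIMEOUT_RE = re.compile(
    r"(?P<month>[A-Z][a-z]{2})\s+(?P<day>\d{1,2})\s+(?P<hour>\d{2}):\d{2}:\d{2}\s+\S+\s+"
    r"(?P<channel>ahcich\d+): Timeout on slot (?P<slot>\d+)"
)

def tally_timeouts(log_text):
    """Count timeout messages per AHCI channel and per (date, hour) bucket."""
    by_channel = Counter()
    by_hour = Counter()
    for m in TIMEOUT_RE.finditer(log_text):
        by_channel[m.group("channel")] += 1
        bucket = f"{m.group('month')} {m.group('day')} {m.group('hour')}:00"
        by_hour[bucket] += 1
    return by_channel, by_hour

# Illustrative sample: the three "Timeout on slot" lines from the excerpt above.
sample = """\
May 12 03:02:36 freenas ahcich3: Timeout on slot 17 port 0
May 12 03:04:18 freenas ahcich2: Timeout on slot 30 port 0
May 12 03:08:53 freenas ahcich2: Timeout on slot 20 port 0
"""

if __name__ == "__main__":
    channels, hours = tally_timeouts(sample)
    for channel, count in channels.most_common():
        print(channel, count)
    for bucket, count in sorted(hours.items()):
        print(bucket, count)
```

If the counts cluster on the same channels and the same hour across many days, that points toward whatever is scheduled at that time rather than a random fault, though the tally by itself does not identify the cause.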
The smartctl output for both drives (ada2 and ada3) is totally clean, with no errors, and attributes 1, 7, and 199 are all 0. zpool status is also clean.
I have replaced the SATA cables on both of these drives. I noticed that the plastic on one of the drives is a little chewed up, so the cable doesn't click in quite right, but other than that everything seemed fine.
I'd really like this server to be reliable again. I've looked into getting a new drive controller (an HBA in IT mode), but a guy on eBay who sells them told me he was skeptical that it would help (he was otherwise very helpful). I suppose I could also replace the drives, but they haven't shown any SMART errors and that would be a lot more expensive. I'd really appreciate any advice on what I should do.