WD Blue SSDs dropping out of pool

MikeyG

Patron
Joined: Dec 8, 2017
Messages: 442
I put together a pool of WD Blue SSDs (WDS200T2B0A) a few months ago, and for the first month or two they worked without any errors. Then, in the last month, I started getting CAM status errors on them about once or twice a week (this seems to have begun after updating from 11.2-U6 to 11.2-U7). Until now they recovered from the errors without actually degrading the pool, but today one finally dropped out:

Code:
Jan  5 03:08:20 nas smartd[23433]: Device: /dev/da9 [SAT], failed to read SMART Attribute Data
Jan  5 03:08:20 nas     (da9:mpr0:0:9:0): WRITE(10). CDB: 2a 00 70 13 ec 60 00 00 08 00 length 4096 SMID 316 Aborting command 0xfffffe0001568640
Jan  5 03:08:20 nas mpr0: Sending reset from mprsas_send_abort for target ID 9
Jan  5 03:08:20 nas     (pass13:mpr0:0:9:0): ATA COMMAND PASS THROUGH(16). CDB: 85 08 0e 00 d0 00 01 00 00 00 4f 00 c2 00 b0 00 length 512 SMID 278 terminated ioc 804b loginfo 31130000 scsi 0 state c xfer 0
Jan  5 03:08:20 nas     (da9:mpr0:0:9:0): ATA COMMAND PASS THROUGH(16). CDB: 85 0d 06 00 01 00 01 00 00 00 00 00 00 40 06 00 length 512 SMID 842 terminated ioc 804b loginfo 31130000 scsi 0 state c xfer 0
Jan  5 03:08:20 nas mpr0: (da9:mpr0:0:9:0): WRITE(10). CDB: 2a 00 70 13 ec 60 00 00 08 00
Jan  5 03:08:20 nas Unfreezing devq for target ID 9
Jan  5 03:08:20 nas (da9:mpr0:0:9:0): CAM status: Command timeout
Jan  5 03:08:20 nas (da9:mpr0:0:9:0): Retrying command
Jan  5 03:08:20 nas (da9:mpr0:0:9:0): ATA COMMAND PASS THROUGH(16). CDB: 85 0d 06 00 01 00 01 00 00 00 00 00 00 40 06 00
Jan  5 03:08:20 nas (da9:mpr0:0:9:0): CAM status: CCB request completed with an error
Jan  5 03:08:20 nas (da9:mpr0:0:9:0): Retrying command
Jan  5 03:08:21 nas     (pass13:mpr0:0:9:0): ATA COMMAND PASS THROUGH(16). CDB: 85 08 0e 00 d5 00 01 00 06 00 4f 00 c2 00 b0 00 length 512 SMID 850 terminated ioc 804b loginfo 31110e00 scsi 0 state c xfer 0
Jan  5 03:08:21 nas     (da9:mpr0:0:9:0): WRITE(10). CDB: 2a 00 70 13 ec 60 00 00 08 00 length 4096 SMID 592 terminated ioc 804b loginfo 31110e00 scsi 0 state c xfer 0
Jan  5 03:08:21 nas     (da9:mpr0:0:9:0): ATA COMMAND PASS THROUGH(16). CDB: 85 0d 06 00 01 00 01 00 00 00 00 00 00 40 06 00 length 512 SMID 792 term(da9:mpr0:0:9:0): WRITE(10). CDB: 2a 00 70 13 ec 60 00 00 08 00
Jan  5 03:08:21 nas inated ioc 804b loginfo 31110e00 scsi 0 state c xfer 0
Jan  5 03:08:21 nas (da9:mpr0:0:9:0): CAM status: CCB request completed with an error
Jan  5 03:08:21 nas (da9:mpr0:0:9:0): Retrying command
Jan  5 03:08:21 nas (da9:mpr0:0:9:0): ATA COMMAND PASS THROUGH(16). CDB: 85 0d 06 00 01 00 01 00 00 00 00 00 00 40 06 00
Jan  5 03:08:21 nas (da9:mpr0:0:9:0): CAM status: CCB request completed with an error
Jan  5 03:08:21 nas (da9:mpr0:0:9:0): Retrying command
Jan  5 03:08:21 nas     (pass13:mpr0:0:9:0): ATA COMMAND PASS THROUGH(16). CDB: 85 08 0e 00 d5 00 01 00 01 00 4f 00 c2 00 b0 00 length 512 SMID 502 terminated ioc 804b loginfo 31110e00 scsi 0 state c xfer 0
Jan  5 03:08:21 nas     (da9:mpr0:0:9:0): WRITE(10). CDB: 2a 00 70 13 ec 60 00 00 08 00 length 4096 SMID 772 terminated ioc 804b loginfo 31110e00 scsi 0 state c xfer 0
Jan  5 03:08:21 nas     (da9:mpr0:0:9:0): ATA COMMAND PASS THROUGH(16). CDB: 85 0d 06 00 01 00 01 00 00 00 00 00 00 40 06 00 length 512 SMID 396 term(da9:mpr0:0:9:0): WRITE(10). CDB: 2a 00 70 13 ec 60 00 00 08 00
Jan  5 03:08:21 nas inated ioc 804b loginfo 31110e00 scsi 0 state c xfer 0
Jan  5 03:08:21 nas (da9:mpr0:0:9:0): CAM status: CCB request completed with an error
Jan  5 03:08:21 nas (da9:mpr0:0:9:0): Retrying command
Jan  5 03:08:21 nas (da9:mpr0:0:9:0): ATA COMMAND PASS THROUGH(16). CDB: 85 0d 06 00 01 00 01 00 00 00 00 00 00 40 06 00
Jan  5 03:08:21 nas (da9:mpr0:0:9:0): CAM status: CCB request completed with an error
Jan  5 03:08:21 nas (da9:mpr0:0:9:0): Retrying command
Jan  5 03:08:22 nas     (pass13:mpr0:0:9:0): ATA COMMAND PASS THROUGH(16). CDB: 85 06 2c 00 00 00 00 00 00 00 00 00 00 00 e5 00 length 0 SMID 849 terminated ioc 804b loginfo 31110e00 scsi 0 state c xfer 0
Jan  5 03:08:22 nas     (da9:mpr0:0:9:0): WRITE(10). CDB: 2a 00 70 13 ec 60 00 00 08 00 length 4096 SMID 791 terminated ioc 804b loginfo 31110e00 scsi 0 state c xfer 0
Jan  5 03:08:22 nas     (da9:mpr0:0:9:0): ATA COMMAND PASS THROUGH(16). CDB: 85 0d 06 00 01 00 01 00 00 00 00 00 00 40 06 00 length 512 SMID 203 term(da9:mpr0:0:9:0): WRITE(10). CDB: 2a 00 70 13 ec 60 00 00 08 00
Jan  5 03:08:22 nas inated ioc 804b loginfo 31110e00 scsi 0 state c xfer 0
Jan  5 03:08:22 nas (da9:mpr0:0:9:0): CAM status: CCB request completed with an error
Jan  5 03:08:22 nas (da9:mpr0:0:9:0): Retrying command
Jan  5 03:08:22 nas (da9:mpr0:0:9:0): ATA COMMAND PASS THROUGH(16). CDB: 85 0d 06 00 01 00 01 00 00 00 00 00 00 40 06 00
Jan  5 03:08:22 nas (da9:mpr0:0:9:0): CAM status: CCB request completed with an error
Jan  5 03:08:22 nas (da9:mpr0:0:9:0): Retrying command
Jan  5 03:08:23 nas (da9:mpr0:0:9:0): WRITE(10). CDB: 2a 00 70 13 ec 60 00 00 08 00
Jan  5 03:08:23 nas (da9:mpr0:0:9:0): CAM status: SCSI Status Error
Jan  5 03:08:23 nas (da9:mpr0:0:9:0): SCSI status: Check Condition
Jan  5 03:08:23 nas (da9:mpr0:0:9:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
Jan  5 03:08:23 nas (da9:mpr0:0:9:0): Error 6, Retries exhausted
Jan  5 03:08:23 nas (da9:mpr0:0:9:0): Invalidating pack
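
To see what da9 was actually doing when it timed out, the hex CDBs in the log can be decoded by hand. Below is a minimal Python sketch that handles only the two opcodes that appear above: WRITE(10) (0x2A) and ATA PASS-THROUGH(16) (0x85). The byte offsets follow my reading of the SBC and SAT layouts, and the ATA command table is a small hand-picked subset, so treat both as assumptions to check against the specs rather than an authoritative decoder. `decode_cdb` is a hypothetical helper name.

```python
# Sketch: decode the CDB hex strings from the da9/mpr0 log lines.
# Assumption: only WRITE(10) and ATA PASS-THROUGH(16) are handled, and the
# ATA command table below is a small hand-picked subset, not a full decoder.

ATA_PROTOCOLS = {3: "non-data", 4: "PIO data-in", 5: "PIO data-out", 6: "DMA"}
ATA_COMMANDS = {0x06: "DATA SET MANAGEMENT", 0xB0: "SMART", 0xE5: "CHECK POWER MODE"}
SMART_SUBCOMMANDS = {0xD0: "READ DATA", 0xD5: "READ LOG"}


def decode_cdb(hex_text: str) -> str:
    """Return a one-line description of a CDB given as space-separated hex."""
    cdb = bytes.fromhex(hex_text)
    opcode = cdb[0]

    if opcode == 0x2A and len(cdb) == 10:
        # WRITE(10): bytes 2-5 = LBA, bytes 7-8 = transfer length in blocks.
        lba = int.from_bytes(cdb[2:6], "big")
        blocks = int.from_bytes(cdb[7:9], "big")
        return f"WRITE(10) lba=0x{lba:08x} blocks={blocks}"

    if opcode == 0x85 and len(cdb) == 16:
        # ATA PASS-THROUGH(16): protocol in byte 1 bits 4:1, features in
        # bytes 3-4, sector count in bytes 5-6, LBA(7:0) in byte 8,
        # ATA command in byte 14.
        protocol = (cdb[1] >> 1) & 0x0F
        features = int.from_bytes(cdb[3:5], "big")
        count = int.from_bytes(cdb[5:7], "big")
        lba_low = cdb[8]
        command = cdb[14]

        name = ATA_COMMANDS.get(command, f"ATA 0x{command:02x}")
        if command == 0xB0:
            sub = features & 0xFF
            name += " " + SMART_SUBCOMMANDS.get(sub, f"0x{sub:02x}")
            if sub == 0xD5:
                name += f" log=0x{lba_low:02x}"  # log address sits in LBA(7:0)
        elif command == 0x06 and features & 0x01:
            name += " (TRIM)"  # DSM feature bit 0 selects TRIM
        proto = ATA_PROTOCOLS.get(protocol, str(protocol))
        return f"ATA PASS-THROUGH(16) {name} protocol={proto} count={count}"

    return f"unhandled opcode 0x{opcode:02x}"


if __name__ == "__main__":
    for cdb in (
        "2a 00 70 13 ec 60 00 00 08 00",
        "85 08 0e 00 d0 00 01 00 00 00 4f 00 c2 00 b0 00",
        "85 08 0e 00 d5 00 01 00 06 00 4f 00 c2 00 b0 00",
        "85 0d 06 00 01 00 01 00 00 00 00 00 00 40 06 00",
        "85 06 2c 00 00 00 00 00 00 00 00 00 00 00 e5 00",
    ):
        print(decode_cdb(cdb))
```

If the opcode table is right, the WRITE(10) is 8 blocks at LBA 0x7013ec60 (8 × 512 bytes matches the `length 4096` in the log, assuming 512-byte logical sectors), and the pass-through command that keeps getting retried decodes as DATA SET MANAGEMENT with the TRIM bit set, alongside smartd's SMART reads and a CHECK POWER MODE.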

This has now happened on 4 of these drives. Searching around, I saw that there can be issues with RZAT and DRAT support behind LSI controllers (mine is a 9305 running firmware 16.00.01.00). As far as I can tell, these are the relevant fields for these drives:

Code:
Data Set Management (DSM/TRIM) yes
DSM - max 512byte blocks       yes              8
DSM - deterministic read       yes              zeroed
Host Protected Area (HPA)      no
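
To compare these fields across every SSD on the HBA (for example the WD Blues against the MX500s), a small parser over `camcontrol identify daN` output can help. This is a minimal sketch that assumes the row labels are exactly the ones shown above and that `zeroed` in the deterministic-read row means the drive reports RZAT; `parse_trim_fields` and `summarize` are hypothetical names. It only reflects what each drive claims about itself, not whether TRIM through the 9305 actually behaves correctly.

```python
from __future__ import annotations

# Sketch: extract TRIM-related rows from `camcontrol identify daN` text.
# Assumption: rows begin with exactly these labels (copied from the output
# above), and "zeroed" in the deterministic-read row means RZAT is reported.
FIELDS = {
    "trim": "Data Set Management (DSM/TRIM)",
    "dsm_max_blocks": "DSM - max 512byte blocks",
    "deterministic_read": "DSM - deterministic read",
    "hpa": "Host Protected Area (HPA)",
}


def parse_trim_fields(identify_text: str) -> dict[str, list[str]]:
    """Map each known label to the whitespace-separated columns after it."""
    found: dict[str, list[str]] = {}
    for line in identify_text.splitlines():
        row = line.strip()
        for key, label in FIELDS.items():
            if row.startswith(label):
                found[key] = row[len(label):].split()
    return found


def _first(cols: list[str]) -> str:
    return cols[0] if cols else ""


def summarize(fields: dict[str, list[str]]) -> dict[str, bool]:
    """Reduce the parsed rows to TRIM / DRAT / RZAT flags as the drive claims them."""
    det = fields.get("deterministic_read", [])
    return {
        "trim": _first(fields.get("trim", [])) == "yes",
        "drat": _first(det) == "yes",
        "rzat": "zeroed" in det,
    }


if __name__ == "__main__":
    sample = (
        "Data Set Management (DSM/TRIM) yes\n"
        "DSM - max 512byte blocks       yes              8\n"
        "DSM - deterministic read       yes              zeroed\n"
        "Host Protected Area (HPA)      no\n"
    )
    print(summarize(parse_trim_fields(sample)))
```

Feeding it the saved output of `camcontrol identify` for each drive gives a quick side-by-side of what every model advertises.
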

I'm sure I would have been better off with enterprise SSDs, and if that's the answer I can accept my mistake, but I'd like some confirmation that these SSDs are actually the problem.

The other area I've explored is cooling of the HBA. There are fans attached to the heatsink on the 9305, which keep it relatively cool: cool enough to touch comfortably shortly after power-on.

I have several MX500s and several ST4000VN000 spinning drives on the 9305 that have not had this issue.
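
To back that up from the logs rather than from memory, you can tally CAM status and `Invalidating pack` lines per daN device. A minimal sketch, assuming kernel log lines shaped like the ones in the log above (`(daN:mprX:bus:target:lun): ...`) and the usual `/var/log/messages` location; `tally` and `report` are hypothetical helper names. Any disk that never shows up in the report logged none of these events.

```python
import os
import re
from collections import Counter

# Sketch: count CAM status / "Invalidating pack" events per daN device.
# Assumes log lines shaped like the ones in this thread, e.g.
#   "... nas (da9:mpr0:0:9:0): CAM status: Command timeout"
# passN lines (the pass-through device smartd uses) are not matched.
EVENT_RE = re.compile(r"\((da\d+):[^)]*\): (CAM status: .+|Invalidating pack)\s*$")


def tally(lines) -> Counter:
    """Return a Counter keyed by (device, event text)."""
    counts: Counter = Counter()
    for line in lines:
        match = EVENT_RE.search(line)
        if match:
            counts[(match.group(1), match.group(2).strip())] += 1
    return counts


def report(path: str = "/var/log/messages") -> None:
    """Print one row per (device, event); the default path is an assumption."""
    if not os.path.isfile(path):
        print(f"log file not found: {path}")
        return
    with open(path, errors="replace") as fh:
        for (device, event), n in sorted(tally(fh).items()):
            print(f"{device:6} {n:5}  {event}")
```

Calling `report()` on the NAS (or `report()` with the path of a rotated log) prints one line per device and event type, which makes it easy to see whether the MX500 and ST4000VN000 disks ever appear.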
 

MikeyG

I should also add that I'm using an ICY Dock MB516SP-B connected with Coboc SFF-8643 cables. The drives that have failed so far are spread across 3 different cables. I was previously using 10Gtek SFF-8643 cables, but their connectors were loose and would occasionally fall out, so I switched. However, I never saw these errors with the 10Gtek cables. I could switch back for testing, but would like other ideas first.
 