Hi All:
As some of you know, we built a new FreeNAS box about a month ago. Since then we have been putting it through its paces, running memory, CPU, and lots of HDD tests, in preparation for it to take over as the main NAS server in our home. This week, we finally started to rsync the data from our QNAP device to the new server. After the copy finished, we decided it would be a good idea to run the server's first scrub. We started the scrub and verified it was progressing before going to bed. We awoke this morning to an email from the server indicating that 6 of the 10 drives had encountered errors.
Code:
jupiter.local kernel log messages:
(da4:mps0:0:4:0): READ(10). CDB: 28 00 8c e6 48 b8 00 00 e8 00 length 118784 SMID 468 terminated ioc 804b scsi 0 state 0 xfer 0
(da4:mps0:0:4:0): READ(10). CDB: 28 00 8c e6 48 b8 00 00 e8 00
(da4:mps0:0:4:0): CAM status: CCB request completed with an error
(da4:mps0:0:4:0): Retrying command
(da4:mps0:0:4:0): READ(10). CDB: 28 00 8c e6 47 d0 00 00 e8 00
(da4:mps0:0:4:0): CAM status: SCSI Status Error
(da4:mps0:0:4:0): SCSI status: Check Condition
(da4:mps0:0:4:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
(da4:mps0:0:4:0): Info: 0x8ce647d0
(da4:mps0:0:4:0): Error 5, Unretryable error
GEOM_ELI: g_eli_read_done() failed (error=5) gptid/13ef3990-fb93-11e6-abcc-0cc47a8668da.eli[READ(offset=1208170225664, length=118784)]
(da4:mps0:0:4:0): READ(10). CDB: 28 00 8c e9 1e 28 00 00 e8 00 length 118784 SMID 361 terminated ioc 804b scsi 0 state 0 xfer 0
(da4:mps0:0:4:0): READ(10). CDB: 28 00 8c e9 1e 28 00 00 e8 00
(da4:mps0:0:4:0): CAM status: CCB request completed with an error
(da4:mps0:0:4:0): Retrying command
(da4:mps0:0:4:0): READ(10). CDB: 28 00 8c e9 1d 48 00 00 e0 00
(da4:mps0:0:4:0): CAM status: SCSI Status Error
(da4:mps0:0:4:0): SCSI status: Check Condition
(da4:mps0:0:4:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
(da4:mps0:0:4:0): Info: 0x8ce91d48
(da4:mps0:0:4:0): Error 5, Unretryable error
GEOM_ELI: g_eli_read_done() failed (error=5) gptid/13ef3990-fb93-11e6-abcc-0cc47a8668da.eli[READ(offset=1208265314304, length=114688)]
(da0:mps0:0:0:0): READ(10). CDB: 28 00 8c e7 86 b8 00 00 e0 00 length 114688 SMID 719 terminated ioc 804b scsi 0 state 0 xfer 0
(da0:mps0:0:0:0): READ(10). CDB: 28 00 8c e7 86 b8 00 00 e0 00
(da0:mps0:0:0:0): CAM status: CCB request completed with an error
(da0:mps0:0:0:0): Retrying command
(da0:mps0:0:0:0): READ(10). CDB: 28 00 8c e7 85 d0 00 00 e8 00
(da0:mps0:0:0:0): CAM status: SCSI Status Error
(da0:mps0:0:0:0): SCSI status: Check Condition
(da0:mps0:0:0:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
(da0:mps0:0:0:0): Info: 0x8ce785d0
(da0:mps0:0:0:0): Error 5, Unretryable error
GEOM_ELI: g_eli_read_done() failed (error=5) gptid/10cd817e-fb93-11e6-abcc-0cc47a8668da.eli[READ(offset=1208211906560, length=118784)]
(da6:mps0:0:6:0): READ(10). CDB: 28 00 8c e8 cf b0 00 00 28 00 length 20480 SMID 531 terminated ioc 804b scsi 0 state 0 xfer 0
(da6:mps0:0:6:0): READ(10). CDB: 28 00 8c e8 cf b0 00 00 28 00
(da6:mps0:0:6:0): CAM status: CCB request completed with an error
(da6:mps0:0:6:0): Retrying command
(da6:mps0:0:6:0): READ(10). CDB: 28 00 8c e8 ce c8 00 00 e8 00
(da6:mps0:0:6:0): CAM status: SCSI Status Error
(da6:mps0:0:6:0): SCSI status: Check Condition
(da6:mps0:0:6:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
(da6:mps0:0:6:0): Info: 0x8ce8cec8
(da6:mps0:0:6:0): Error 5, Unretryable error
GEOM_ELI: g_eli_read_done() failed (error=5) gptid/156d3a72-fb93-11e6-abcc-0cc47a8668da.eli[READ(offset=1208255025152, length=118784)]
(da2:mps0:0:2:0): READ(10). CDB: 28 00 8c e8 d0 50 00 00 20 00 length 16384 SMID 838 terminated ioc 804b scsi 0 state 0 xfer 0
(da2:mps0:0:2:0): READ(10). CDB: 28 00 8c e8 d0 50 00 00 20 00
(da2:mps0:0:2:0): CAM status: CCB request completed with an error
(da2:mps0:0:2:0): Retrying command
(da2:mps0:0:2:0): READ(10). CDB: 28 00 8c e8 cf 68 00 00 e8 00
(da2:mps0:0:2:0): CAM status: SCSI Status Error
(da2:mps0:0:2:0): SCSI status: Check Condition
(da2:mps0:0:2:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
(da2:mps0:0:2:0): Info: 0x8ce8cf68
(da2:mps0:0:2:0): Error 5, Unretryable error
GEOM_ELI: g_eli_read_done() failed (error=5) gptid/1252f2b2-fb93-11e6-abcc-0cc47a8668da.eli[READ(offset=1208255107072, length=118784)]
(da6:mps0:0:6:0): READ(10). CDB: 28 00 8c e9 bb 20 00 00 e8 00 length 118784 SMID 505 terminated ioc 804b scsi 0 state 0 xfer 0
(da6:mps0:0:6:0): READ(10). CDB: 28 00 8c e9 bb 20 00 00 e8 00
(da6:mps0:0:6:0): CAM status: CCB request completed with an error
(da6:mps0:0:6:0): Retrying command
(da6:mps0:0:6:0): READ(10). CDB: 28 00 8c e9 ba 38 00 00 e8 00
(da6:mps0:0:6:0): CAM status: SCSI Status Error
(da6:mps0:0:6:0): SCSI status: Check Condition
(da6:mps0:0:6:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
(da6:mps0:0:6:0): Info: 0x8ce9ba38
(da6:mps0:0:6:0): Error 5, Unretryable error
GEOM_ELI: g_eli_read_done() failed (error=5) gptid/156d3a72-fb93-11e6-abcc-0cc47a8668da.eli[READ(offset=1208285884416, length=118784)]
(ada0:ahcich0:0:0:0): READ_FPDMA_QUEUED. ACB: 60 e8 18 90 e7 40 8c 00 00 00 00 00
(ada0:ahcich0:0:0:0): CAM status: ATA Status Error
(ada0:ahcich0:0:0:0): ATA status: 41 (DRDY ERR), error: 40 (UNC )
(ada0:ahcich0:0:0:0): RES: 41 40 18 90 e7 00 8c 00 00 00 00
(ada0:ahcich0:0:0:0): Retrying command
(da2:mps0:0:2:0): READ(10). CDB: 28 00 8c e7 e8 70 00 00 e8 00 length 118784 SMID 578 terminated ioc 804b scsi 0 state 0 xfer 0
(da2:mps0:0:2:0): READ(10). CDB: 28 00 8c e7 e8 70 00 00 e8 00
(da2:mps0:0:2:0): CAM status: CCB request completed with an error
(da2:mps0:0:2:0): Retrying command
(da2:mps0:0:2:0): READ(10). CDB: 28 00 8c e7 e7 90 00 00 e0 00
(da2:mps0:0:2:0): CAM status: SCSI Status Error
(da2:mps0:0:2:0): SCSI status: Check Condition
(da2:mps0:0:2:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
(da2:mps0:0:2:0): Info: 0x8ce7e790
(da2:mps0:0:2:0): Error 5, Unretryable error
GEOM_ELI: g_eli_read_done() failed (error=5) gptid/1252f2b2-fb93-11e6-abcc-0cc47a8668da.eli[READ(offset=1208224718848, length=114688)]
(da1:mps0:0:1:0): READ(10). CDB: 28 00 8c ea 67 50 00 00 e0 00 length 114688 SMID 768 terminated ioc 804b scsi 0 state 0 xfer 0
(da1:mps0:0:1:0): READ(10). CDB: 28 00 8c ea 67 50 00 00 e0 00
(da1:mps0:0:1:0): CAM status: CCB request completed with an error
(da1:mps0:0:1:0): Retrying command
(da1:mps0:0:1:0): READ(10). CDB: 28 00 8c ea 66 68 00 00 e8 00
(da1:mps0:0:1:0): CAM status: SCSI Status Error
(da1:mps0:0:1:0): SCSI status: Check Condition
(da1:mps0:0:1:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
(da1:mps0:0:1:0): Info: 0x8cea6668
(da1:mps0:0:1:0): Error 5, Unretryable error
GEOM_ELI: g_eli_read_done() failed (error=5) gptid/118f126a-fb93-11e6-abcc-0cc47a8668da.eli[READ(offset=1208308453376, length=118784)]
(da4:mps0:0:4:0): READ(10). CDB: 28 00 8c eb 0f 70 00 00 e8 00 length 118784 SMID 621 terminated ioc 804b scsi 0 state 0 xfer 0
(da4:mps0:0:4:0): READ(10). CDB: 28 00 8c eb 0f 70 00 00 e8 00
(da4:mps0:0:4:0): CAM status: CCB request completed with an error
(da4:mps0:0:4:0): Retrying command
(da4:mps0:0:4:0): READ(10). CDB: 28 00 8c eb 0e 88 00 00 e8 00
(da4:mps0:0:4:0): CAM status: SCSI Status Error
(da4:mps0:0:4:0): SCSI status: Check Condition
(da4:mps0:0:4:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
(da4:mps0:0:4:0): Info: 0x8ceb0e88
(da4:mps0:0:4:0): Error 5, Unretryable error
GEOM_ELI: g_eli_read_done() failed (error=5) gptid/13ef3990-fb93-11e6-abcc-0cc47a8668da.eli[READ(offset=1208330489856, length=118784)]
(ada0:ahcich0:0:0:0): READ_FPDMA_QUEUED. ACB: 60 e0 50 0f eb 40 8c 00 00 00 00 00
(ada0:ahcich0:0:0:0): CAM status: ATA Status Error
(ada0:ahcich0:0:0:0): ATA status: 41 (DRDY ERR), error: 40 (UNC )
(ada0:ahcich0:0:0:0): RES: 41 40 50 0f eb 00 8c 00 00 00 00
(ada0:ahcich0:0:0:0): Retrying command
(da1:mps0:0:1:0): READ(10). CDB: 28 00 8c ec 3f a0 00 00 c0 00 length 98304 SMID 924 terminated ioc 804b scsi 0 state 0 xfer 0
(da1:mps0:0:1:0): READ(10). CDB: 28 00 8c ec 3f a0 00 00 c0 00
(da1:mps0:0:1:0): CAM status: CCB request completed with an error
(da1:mps0:0:1:0): Retrying command
(da1:mps0:0:1:0): READ(10). CDB: 28 00 8c ec 3e b8 00 00 e8 00
(da1:mps0:0:1:0): CAM status: SCSI Status Error
(da1:mps0:0:1:0): SCSI status: Check Condition
(da1:mps0:0:1:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
(da2:mps0:0:2:0): READ(10). CDB: 28 00 8c ec 3f 50 00 00 c0 00 length 98304 SMID 964 terminated ioc 804b scsi 0 state 0 xfer (da1:mps0:0:1:0): Info: 0x8cec3eb8 0 (da1:(da2:mps0:0:2:0): READ(10). CDB: 28 00 8c ec 3f 50 00 00 c0 00 mps0:0:(da2:mps0:0:2:0): CAM status: CCB request completed with an error 1:(da2:0): mps0:0:Error 5, Unretryable error 2:0): GEOM_ELIRetrying command : g_eli_read_done() failed (error=5)(da2:mps0:0:2:0): READ(10). CDB: 28 00 8c ec 3e 68 00 00 e8 00 (da2:mps0:0:2:0): CAM status: SCSI Status Error gptid/118f126a-fb93-11e6-abcc-0cc47a8668da.eli[READ(offset=1208370360320, length=118784)](da2:mps0:0:2:0): SCSI status: Check Condition
(da2:mps0:0:2:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
(da2:mps0:0:2:0): Info: 0x8cec3e68
(da2:mps0:0:2:0): Error 5, Unretryable error
GEOM_ELI: g_eli_read_done() failed (error=5) gptid/1252f2b2-fb93-11e6-abcc-0cc47a8668da.eli[READ(offset=1208370319360, length=118784)]
(ada0:ahcich0:0:0:0): READ_FPDMA_QUEUED. ACB: 60 e8 78 13 a0 40 99 01 00 00 00 00
(ada0:ahcich0:0:0:0): CAM status: ATA Status Error
(ada0:ahcich0:0:0:0): ATA status: 41 (DRDY ERR), error: 40 (UNC )
(ada0:ahcich0:0:0:0): RES: 41 40 78 13 a0 00 99 01 00 00 00
(ada0:ahcich0:0:0:0): Retrying command
(da4:mps0:0:4:0): READ(16). CDB: 88 00 00 00 00 01 99 a0 a9 90 00 00 00 e8 00 00 length 118784 SMID 403 terminated ioc 804b scsi 0 state 0 xfer 0
(da4:mps0:0:4:0): READ(16). CDB: 88 00 00 00 00 01 99 a0 a9 90 00 00 00 e8 00 00
(da4:mps0:0:4:0): CAM status: CCB request completed with an error
(da4:mps0:0:4:0): Retrying command
(da4:mps0:0:4:0): READ(16). CDB: 88 00 00 00 00 01 99 a0 a8 a8 00 00 00 e8 00 00
(da4:mps0:0:4:0): CAM status: SCSI Status Error
(da4:mps0:0:4:0): SCSI status: Check Condition
(da4:mps0:0:4:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
(da4:mps0:0:4:0): Info: 0x199a0a8a8
(da4:mps0:0:4:0): Error 5, Unretryable error
GEOM_ELI: g_eli_read_done() failed (error=5) gptid/13ef3990-fb93-11e6-abcc-0cc47a8668da.eli[READ(offset=3516526514176, length=118784)]
(da4:mps0:0:4:0): READ(16). CDB: 88 00 00 00 00 01 99 a7 44 88 00 00 00 e8 00 00 length 118784 SMID 849 terminated ioc 804b scsi 0 state 0 xfer 0
(da4:mps0:0:4:0): READ(16). CDB: 88 00 00 00 00 01 99 a7 44 88 00 00 00 e8 00 00
(da4:mps0:0:4:0): CAM status: CCB request completed with an error
(da4:mps0:0:4:0): Retrying command
(da4:mps0:0:4:0): READ(16). CDB: 88 00 00 00 00 01 99 a7 43 a8 00 00 00 e0 00 00
(da4:mps0:0:4:0): CAM status: SCSI Status Error
(da4:mps0:0:4:0): SCSI status: Check Condition
(da4:mps0:0:4:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
(da4:mps0:0:4:0): Info: 0x199a743a8
(da4:mps0:0:4:0): Error 5, Unretryable error
GEOM_ELI: g_eli_read_done() failed (error=5) gptid/13ef3990-fb93-11e6-abcc-0cc47a8668da.eli[READ(offset=3516748156928, length=114688)]
-- End of security output --
ada0
da0, da1, da2, da4, da6
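For anyone who wants to double-check the drive list against the log, here is a minimal Python sketch (not something we actually ran on the server) that tallies unrecovered-read messages per device and decodes a READ(10) CDB. The regular expression, the function names, and the shortened SAMPLE_LOG excerpt are illustrative assumptions, and the 512-byte sector size is inferred from the transfer lengths in the log rather than confirmed from the drives.

```python
import re
from collections import Counter

# Shortened excerpt of the kernel log above, one message per line.
SAMPLE_LOG = """\
(da4:mps0:0:4:0): READ(10). CDB: 28 00 8c e6 47 d0 00 00 e8 00
(da4:mps0:0:4:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
(da0:mps0:0:0:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
(ada0:ahcich0:0:0:0): ATA status: 41 (DRDY ERR), error: 40 (UNC )
"""

# The device name is the first field inside the parentheses, e.g. "da4" or "ada0".
# Only the medium-error (SCSI) and DRDY ERR (ATA) messages are counted.
ERROR_RE = re.compile(
    r"\((a?da\d+):[^)]*\): "
    r"(?:SCSI sense: MEDIUM ERROR|ATA status: \w+ \(DRDY ERR\))"
)


def count_read_errors(log_text):
    """Return a Counter mapping device name to number of read-error messages."""
    return Counter(match.group(1) for match in ERROR_RE.finditer(log_text))


def decode_read10(cdb_hex):
    """Decode a READ(10) CDB given as space-separated hex bytes.

    Bytes 2-5 hold the big-endian LBA and bytes 7-8 the transfer
    length in blocks. Returns (lba, blocks).
    """
    cdb = bytes.fromhex(cdb_hex.replace(" ", ""))
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")
    blocks = int.from_bytes(cdb[7:9], "big")
    return lba, blocks


if __name__ == "__main__":
    print(dict(count_read_errors(SAMPLE_LOG)))
    lba, blocks = decode_read10("28 00 8c e6 47 d0 00 00 e8 00")
    # Assumes 512-byte logical sectors.
    print(hex(lba), blocks, blocks * 512)
```

Decoding 28 00 8c e6 47 d0 00 00 e8 00 this way gives LBA 0x8ce647d0 (the same value as the Info: line) and 232 blocks. At 512 bytes per block that is 118784 bytes, which matches the length in the corresponding GEOM_ELI line.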
An error like this came up before, and it seemed to be caused by a faulty fan switch on the back of my case. To Fractal's credit, they replaced the switch, and I did some testing to validate that this resolved the problem. Since we could not reproduce the problem with the new switch, we figured the issue was resolved. That issue is described in a previous post. It seemed that moving the switch back and forth during a badblocks test caused the CAM errors to be thrown.
Fast forward to now, and we are again seeing these errors. Our first thought was that it might be the breakout cables or perhaps the M1015 card; but then we noticed that ada0 (a drive connected directly to the SATA port) also encountered the SATA-equivalent error. Given that this drive is connected via a separate signal cable, independent of the da* series of drives, it seems unlikely (though not impossible) that cabling is the problem.
The bigger oddity is that if smartctl -x is issued, we see these FPDMA errors on the aforementioned drives, but if smartctl -a is run, the errors do not appear. In fact, of the key indicators of drive issues, none are immediately present.
Checked the following attributes on all 10 drives:
Raw_Read_Error_Rate
Reallocated_Event_Ct
Current_Pending_Sector
Offline_Uncorrectable
Multi_Zone_Error_Rate
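A check like this could be scripted along the following lines. This is only a sketch under stated assumptions: it expects the usual ten-column attribute table printed by smartctl -A, the function names are hypothetical, and SAMPLE_OUTPUT is invented purely to exercise the parser (it is not output from any of our drives).

```python
# Attributes from the list above that we want to inspect.
WATCHED = (
    "Raw_Read_Error_Rate",
    "Reallocated_Event_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
    "Multi_Zone_Error_Rate",
)

# Invented sample in the layout of a `smartctl -A` attribute table.
# These values are NOT from the drives discussed in this post.
SAMPLE_OUTPUT = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
196 Reallocated_Event_Ct    0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       2
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0
"""


def watched_raw_values(smartctl_output):
    """Map each watched attribute name to its RAW_VALUE column (as a string)."""
    values = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID; the raw value is column 10.
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in WATCHED:
            values[fields[1]] = fields[9]
    return values


def nonzero(values):
    """Return only the attributes whose raw value is not "0"."""
    return {name: raw for name, raw in values.items() if raw != "0"}


if __name__ == "__main__":
    raw = watched_raw_values(SAMPLE_OUTPUT)
    print(raw)
    print("needs attention:", nonzero(raw))
```

The raw values are kept as strings on purpose, since some drives report raw values in vendor-specific formats that are not plain integers.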
This is starting to look more and more like some kind of power oddity. The investigation is still ongoing, but we look forward to hearing input from the community.
We are in the process of running a new extended test on all 10 drives; the results will be posted shortly after completion for your consideration. The status of the pool will also be included. Everything looks in order except for those CAM errors, and we don't like to see errors unless they can be effectively explained and, if at all possible, solved :)
Thanks,
-Dan