SYNCHRONIZE CACHE command timeout error

El Al

Neophyte
Joined
Oct 30, 2016
Messages
10
Sadly, this fix does not appear to work. I lost a drive today with NCQ disabled.

Code:
# camcontrol tag da6 -v
(pass6:mpr0:0:6:0): dev_openings  1
(pass6:mpr0:0:6:0): dev_active    0
(pass6:mpr0:0:6:0): allocated     0
(pass6:mpr0:0:6:0): queued        0
(pass6:mpr0:0:6:0): held          0
(pass6:mpr0:0:6:0): mintags       2
(pass6:mpr0:0:6:0): maxtags       255

Aug  1 03:18:40 vault   (da6:mpr0:0:6:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00 length 0 SMID 975 Aborting command 0xfffffe00010a6990
Aug  1 03:18:40 vault mpr0: Sending reset from mprsas_send_abort for target ID 6
Aug  1 03:18:40 vault mpr0: Unfreezing devq for target ID 6
Aug  1 03:18:40 vault (da6:mpr0:0:6:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00
Aug  1 03:18:40 vault (da6:mpr0:0:6:0): CAM status: Command timeout
Aug  1 03:18:40 vault (da6:mpr0:0:6:0): Retrying command
Aug  1 03:18:40 vault (da6:mpr0:0:6:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00
Aug  1 03:18:40 vault (da6:mpr0:0:6:0): CAM status: SCSI Status Error
Aug  1 03:18:40 vault (da6:mpr0:0:6:0): SCSI status: Check Condition
Aug  1 03:18:40 vault (da6:mpr0:0:6:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
Aug  1 03:18:40 vault (da6:mpr0:0:6:0): Error 6, Retries exhausted
Aug  1 03:18:40 vault (da6:mpr0:0:6:0): Invalidating pack
Aug  1 03:18:40 vault ZFS: vdev state changed, pool_guid=8222513563341353211 vdev_guid=2829914938609022262
Aug  1 03:18:40 vault ZFS: vdev state changed, pool_guid=8222513563341353211 vdev_guid=2829914938609022262

That is too bad! I have been absolutely error-free since making the change, and before that I sometimes had them crap out in groups. I have done nothing except make sure they never sleep and disable NCQ. Apparently the Seagate saga continues...

Code:
(da13:mpr0:0:4:0): CAM status: Command timeout
    (da13:mpr0:0:4:0): READ(10). CDB: 28 00 63 c4 26 d8 00 00 30 00 length 24576 SMID 281 terminated ioc 804b scsi 0 state c xfer(da13: 0
mpr0:0:    (da13:mpr0:0:4:0): WRITE(16). CDB: 8a 00 00 00 00 02 01 4d 23 e0 00 00 00 58 00 00 length 45056 SMID 325 terminated ioc 804b 4:scsi 0 state c xfer 0
0): mpr0: Retrying command
Unfreezing devq for target ID 4
(da13:mpr0:0:4:0): READ(10). CDB: 28 00 63 c4 25 f8 00 00 28 00
(da13:mpr0:0:4:0): CAM status: CCB request completed with an error
(da13:mpr0:0:4:0): Retrying command
(da13:mpr0:0:4:0): READ(10). CDB: 28 00 63 c4 26 d8 00 00 30 00
(da13:mpr0:0:4:0): CAM status: CCB request completed with an error
(da13:mpr0:0:4:0): Retrying command
(da13:mpr0:0:4:0): WRITE(16). CDB: 8a 00 00 00 00 02 01 4d 23 e0 00 00 00 58 00 00
(da13:mpr0:0:4:0): CAM status: CCB request completed with an error
(da13:mpr0:0:4:0): Retrying command
(da13:mpr0:0:4:0): READ(10). CDB: 28 00 63 c4 26 28 00 00 80 00
(da13:mpr0:0:4:0): CAM status: SCSI Status Error
(da13:mpr0:0:4:0): SCSI status: Check Condition
(da13:mpr0:0:4:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
(da13:mpr0:0:4:0): Retrying command (per sense data)
nfsd: can't register svc name
    (noperiph:mpr0:0:4294967295:0): SMID 2 Aborting command 0xfffffe0000af6040
mpr0: Sending reset from mprsas_send_abort for target ID 7
    (da16:mpr0:0:7:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00 length 0 SMID 709 terminated ioc 804b scsi 0 state c xfer 0
    (da16:mpr0:0:7:0): READ(10). CDB: 28 00 64 d9 9f 98 00 00 88 00 length 69632 SMID 684 terminated ioc 804b scsi 0 state c xfer(da16:mpr0:0:7:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00
0
(da16:mpr0:0:7:0): CAM status: CCB request completed with an error
    (da16:mpr0:0:7:0): READ(16). CDB: 88 00 00 00 00 03 14 e4 b8 b0 00 00 00 30 00 00 length 24576 SMID 192 terminated ioc 804b s(da16:csi 0 state c xfer 0
mpr0:0:    (da16:mpr0:0:7:0): READ(16). CDB: 88 00 00 00 00 02 90 9d ca c8 00 00 00 e0 00 00 length 114688 SMID 431 terminated ioc 804b 7:scsi 0 state c xfer 0
0): mpr0: Retrying command
Unfreezing devq for target ID 7
(da16:mpr0:0:7:0): READ(10). CDB: 28 00 64 d9 9f 98 00 00 88 00
(da16:mpr0:0:7:0): CAM status: CCB request completed with an error
(da16:mpr0:0:7:0): Retrying command
(da16:mpr0:0:7:0): READ(16). CDB: 88 00 00 00 00 03 14 e4 b8 b0 00 00 00 30 00 00
(da16:mpr0:0:7:0): CAM status: CCB request completed with an error
(da16:mpr0:0:7:0): Retrying command
(da16:mpr0:0:7:0): READ(16). CDB: 88 00 00 00 00 02 90 9d cb a8 00 00 00 b0 00 00
(da16:mpr0:0:7:0): CAM status: Command timeout
(da16:mpr0:0:7:0): Retrying command
(da16:mpr0:0:7:0): READ(16). CDB: 88 00 00 00 00 02 90 9d ca c8 00 00 00 e0 00 00
(da16:mpr0:0:7:0): CAM status: CCB request completed with an error
(da16:mpr0:0:7:0): Retrying command
(da16:mpr0:0:7:0): READ(10). CDB: 28 00 64 d9 9f 98 00 00 88 00
(da16:mpr0:0:7:0): CAM status: SCSI Status Error
(da16:mpr0:0:7:0): SCSI status: Check Condition
(da16:mpr0:0:7:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
(da16:mpr0:0:7:0): Retrying command (per sense data)
    (da16:mpr0:0:7:0): WRITE(16). CDB: 8a 00 00 00 00 02 03 25 2d 50 00 00 00 e0 00 00 length 114688 SMID 271 terminated ioc 804b scsi 0 state c xfer 114688
    (da16:mpr0:0:7:0): READ(16). CDB: 88 00 00 00 00 03 14 e4 f3 40 00 00 00 e8 00 00 length 118784 SMID 544 terminated ioc 804b (da16:mpr0:0:7:0): WRITE(16). CDB: 8a 00 00 00 00 02 03 25 2d 50 00 00 00 e0 00 00
scsi 0 state c xfer 0
(da16:mpr0:0:7:0): CAM status: CCB request completed with an error
    (da16:mpr0:0:7:0): READ(16). CDB: 88 00 00 00 00 03 14 e4 f4 28 00 00 00 b0 00 00 length 90112 SMID 325 terminated ioc 804b s(da16:csi 0 state c xfer 0
mpr0:0:    (da16:mpr0:0:7:0): READ(16). CDB: 88 00 00 00 00 03 14 e4 f2 30 00 00 00 e0 00 00 length 114688 SMID 980 terminated ioc 804b 7:scsi 0 state c xfer 0
0): Retrying command
(da16:mpr0:0:7:0): READ(16). CDB: 88 00 00 00 00 03 14 e4 f3 40 00 00 00 e8 00 00
(da16:mpr0:0:7:0): CAM status: CCB request completed with an error
(da16:mpr0:0:7:0): Retrying command
(da16:mpr0:0:7:0): READ(16). CDB: 88 00 00 00 00 03 14 e4 f4 28 00 00 00 b0 00 00
(da16:mpr0:0:7:0): CAM status: CCB request completed with an error
(da16:mpr0:0:7:0): Retrying command
(da16:mpr0:0:7:0): READ(16). CDB: 88 00 00 00 00 03 14 e4 f2 30 00 00 00 e0 00 00
(da16:mpr0:0:7:0): CAM status: CCB request completed with an error
(da16:mpr0:0:7:0): Retrying command
(da16:mpr0:0:7:0): READ(16). CDB: 88 00 00 00 00 03 14 e4 f2 30 00 00 00 e0 00 00
(da16:mpr0:0:7:0): CAM status: SCSI Status Error
(da16:mpr0:0:7:0): SCSI status: Check Condition
(da16:mpr0:0:7:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
(da16:mpr0:0:7:0): Retrying command (per sense data)
    (da16:mpr0:0:7:0): WRITE(16). CDB: 8a 00 00 00 00 06 5d df fd b8 00 00 00 08 00 00 length 4096 SMID 882 terminated ioc 804b scsi 0 state c xfer 0
    (da16:mpr0:0:7:0): WRITE(16). CDB: 8a 00 00 00 00 06 5d df fb b8 00 00 00 08 00 00 length 4096 SMID 875 terminated ioc 804b s(da16:mpr0:0:7:0): WRITE(16). CDB: 8a 00 00 00 00 06 5d df fd b8 00 00 00 08 00 00
csi 0 state c xfer 0
(da16:mpr0:0:7:0): CAM status: CCB request completed with an error
    (da16:mpr0:0:7:0): WRITE(6). CDB: 0a 00 03 b8 08 00 length 4096 SMID 488 terminated ioc 804b scsi 0 state c xfer 0
(da16:mpr0:0:    (da16:mpr0:0:7:0): WRITE(6). CDB: 0a 00 01 b8 08 00 length 4096 SMID 994 terminated ioc 804b scsi 0 state c xfer 0
7:0):     (da16:mpr0:0:7:0): READ(16). CDB: 88 00 00 00 00 01 1a af 61 68 00 00 00 b0 00 00 length 90112 SMID 548 terminated ioc 804b sRetrying command
csi 0 state c xfer 0
(da16:mpr0:0:7:0): WRITE(16). CDB: 8a 00 00 00 00 06 5d df fb b8 00 00 00 08 00 00
    (da16:mpr0:0:7:0): READ(16). CDB: 88 00 00 00 00 01 1a af 60 88 00 00 00 b0 00 00 length 90112 SMID 909 terminated ioc 804b s(da16:mpr0:0:7:0): CAM status: CCB request completed with an error
csi 0 state c xfer 0
(da16:mpr0:0:    (da16:mpr0:0:7:0): READ(16). CDB: 88 00 00 00 00 01 1a af 62 48 00 00 00 30 00 00 length 24576 SMID 707 terminated ioc 804b s7:0): csi 0 state c xfer 0
Retrying command
(da16:mpr0:0:7:0): WRITE(6). CDB: 0a 00 03 b8 08 00
(da16:mpr0:0:7:0): CAM status: CCB request completed with an error
(da16:mpr0:0:7:0): Retrying command
(da16:mpr0:0:7:0): WRITE(6). CDB: 0a 00 01 b8 08 00
(da16:mpr0:0:7:0): CAM status: CCB request completed with an error
(da16:mpr0:0:7:0): Retrying command
(da16:mpr0:0:7:0): READ(16). CDB: 88 00 00 00 00 01 1a af 61 68 00 00 00 b0 00 00
(da16:mpr0:0:7:0): CAM status: CCB request completed with an error
(da16:mpr0:0:7:0): Retrying command
(da16:mpr0:0:7:0): READ(16). CDB: 88 00 00 00 00 01 1a af 60 88 00 00 00 b0 00 00
(da16:mpr0:0:7:0): CAM status: CCB request completed with an error
(da16:mpr0:0:7:0): Retrying command
(da16:mpr0:0:7:0): READ(16). CDB: 88 00 00 00 00 01 1a af 62 48 00 00 00 30 00 00
(da16:mpr0:0:7:0): CAM status: CCB request completed with an error
(da16:mpr0:0:7:0): Retrying command
(da16:mpr0:0:7:0): READ(10). CDB: 28 00 59 ad 3d a0 00 00 28 00
(da16:mpr0:0:7:0): CAM status: SCSI Status Error
(da16:mpr0:0:7:0): SCSI status: Check Condition
(da16:mpr0:0:7:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
(da16:mpr0:0:7:0): Retrying command (per sense data)
    (da16:mpr0:0:7:0): READ(10). CDB: 28 00 60 d6 c6 38 00 00 20 00 length 16384 SMID 397 terminated ioc 804b scsi 0 state c xfer 0
    (da16:mpr0:0:7:0): READ(10). CDB: 28 00 60 d0 0d a8 00 00 b0 00 length 90112 SMID 215 terminated ioc 804b scsi 0 state c xfer(da16:mpr0:0:7:0): READ(10). CDB: 28 00 60 d6 c6 38 00 00 20 00
0
(da16:mpr0:0:7:0): CAM status: CCB request completed with an error
    (da16:mpr0:0:7:0): READ(10). CDB: 28 00 60 c7 5e c0 00 00 20 00 length 16384 SMID 898 terminated ioc 804b scsi 0 state c xfer(da16: 0
mpr0:0:    (da16:mpr0:0:7:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00 length 0 SMID 191 terminated ioc 804b scsi 0 sta7:te c xfer 0
0):     (da16:mpr0:0:7:0): READ(10). CDB: 28 00 59 ad 5a 00 00 00 28 00 length 20480 SMID 969 terminated ioc 804b scsi 0 state c xferRetrying command
0
(da16:mpr0:0:7:0): READ(10). CDB: 28 00 60 d0 0d a8 00 00 b0 00
    (da16:mpr0:0:7:0): READ(10). CDB: 28 00 59 ad 3d a0 00 00 28 00 length 20480 SMID 841 terminated ioc 804b scsi 0 state c xfer(da16:mpr0:0:7:0): CAM status: CCB request completed with an error
0
(da16:mpr0:0:7:0): Retrying command
(da16:mpr0:0:7:0): READ(10). CDB: 28 00 60 c7 5e c0 00 00 20 00
(da16:mpr0:0:7:0): CAM status: CCB request completed with an error
(da16:mpr0:0:7:0): Retrying command
(da16:mpr0:0:7:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00
(da16:mpr0:0:7:0): CAM status: CCB request completed with an error
(da16:mpr0:0:7:0): Retrying command
(da16:mpr0:0:7:0): READ(10). CDB: 28 00 59 ad 5a 00 00 00 28 00
(da16:mpr0:0:7:0): CAM status: CCB request completed with an error
(da16:mpr0:0:7:0): Retrying command
(da16:mpr0:0:7:0): READ(10). CDB: 28 00 59 ad 3d a0 00 00 28 00
(da16:mpr0:0:7:0): CAM status: CCB request completed with an error
(da16:mpr0:0:7:0): Retrying command
(da16:mpr0:0:7:0): READ(10). CDB: 28 00 59 ad 5a 00 00 00 28 00
(da16:mpr0:0:7:0): CAM status: SCSI Status Error
(da16:mpr0:0:7:0): SCSI status: Check Condition
(da16:mpr0:0:7:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
(da16:mpr0:0:7:0): Retrying command (per sense data)
    (noperiph:mpr0:0:4294967295:0): SMID 3 Aborting command 0xfffffe0000b0b2e0
mpr0: Sending reset from mprsas_send_abort for target ID 1
    (da10:mpr0:0:1:0): READ(16). CDB: 88 00 00 00 00 02 c5 df 0d 28 00 00 00 28 00 00 length 20480 SMID 582 terminated ioc 804b scsi 0 state c xfer 0
    (da10:mpr0:0:1:0): READ(16). CDB: 88 00 00 00 00 02 02 a8 75 20 00 00 00 48 00 00 length 36864 SMID 824 terminated ioc 804b s(da10:mpr0:0:1:0): READ(16). CDB: 88 00 00 00 00 02 c5 df 0d 28 00 00 00 28 00 00
csi 0 state c xfer 0
(da10:mpr0:0:1:0): CAM status: CCB request completed with an error
    (da10:mpr0:0:1:0): WRITE(16). CDB: 8a 00 00 00 00 02 04 11 ec 18 00 00 00 e0 00 00 length 114688 SMID 778 terminated ioc 804b(da10: scsi 0 state c xfer 0
mpr0:0:mpr0: 1:Unfreezing devq for target ID 1
0): Retrying command
(da10:mpr0:0:1:0): READ(16). CDB: 88 00 00 00 00 02 02 a8 eb a0 00 00 00 b0 00 00
(da10:mpr0:0:1:0): CAM status: Command timeout
(da10:mpr0:0:1:0): Retrying command
(da10:mpr0:0:1:0): READ(16). CDB: 88 00 00 00 00 02 02 a8 75 20 00 00 00 48 00 00
(da10:mpr0:0:1:0): CAM status: CCB request completed with an error
(da10:mpr0:0:1:0): Retrying command
(da10:mpr0:0:1:0): WRITE(16). CDB: 8a 00 00 00 00 02 04 11 ec 18 00 00 00 e0 00 00
(da10:mpr0:0:1:0): CAM status: CCB request completed with an error
(da10:mpr0:0:1:0): Retrying command
(da10:mpr0:0:1:0): READ(16). CDB: 88 00 00 00 00 02 02 a8 eb a0 00 00 00 b0 00 00
(da10:mpr0:0:1:0): CAM status: SCSI Status Error
(da10:mpr0:0:1:0): SCSI status: Check Condition
(da10:mpr0:0:1:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
(da10:mpr0:0:1:0): Retrying command (per sense data)
    (da10:mpr0:0:1:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00 length 0 SMID 341 terminated ioc 804b scsi 0 state c xfer 0
    (da10:mpr0:0:1:0): READ(16). CDB: 88 00 00 00 00 03 20 f9 67 20 00 00 00 30 00 00 length 24576 SMID 674 terminated ioc 804b s(da10:mpr0:0:1:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00
csi 0 state c xfer 0
(da10:mpr0:0:1:0): CAM status: CCB request completed with an error
    (da10:mpr0:0:1:0): READ(16). CDB: 88 00 00 00 00 03 20 f9 29 60 00 00 00 b0 00 00 length 90112 SMID 372 terminated ioc 804b s(da10:csi 0 state c xfer 0
mpr0:0:    (da10:mpr0:0:1:0): READ(16). CDB: 88 00 00 00 00 03 20 f9 28 80 00 00 00 b0 00 00 length 90112 SMID 483 terminated ioc 804b s1:csi 0 state c xfer 0
0): Retrying command
(da10:mpr0:0:1:0): READ(16). CDB: 88 00 00 00 00 03 20 f9 67 20 00 00 00 30 00 00
(da10:mpr0:0:1:0): CAM status: CCB request completed with an error
(da10:mpr0:0:1:0): Retrying command
(da10:mpr0:0:1:0): READ(16). CDB: 88 00 00 00 00 03 20 f9 29 60 00 00 00 b0 00 00
(da10:mpr0:0:1:0): CAM status: CCB request completed with an error
(da10:mpr0:0:1:0): Retrying command
(da10:mpr0:0:1:0): READ(16). CDB: 88 00 00 00 00 03 20 f9 28 80 00 00 00 b0 00 00
(da10:mpr0:0:1:0): CAM status: CCB request completed with an error
(da10:mpr0:0:1:0): Retrying command
(da10:mpr0:0:1:0): READ(16). CDB: 88 00 00 00 00 03 20 f9 28 80 00 00 00 b0 00 00
(da10:mpr0:0:1:0): CAM status: SCSI Status Error
(da10:mpr0:0:1:0): SCSI status: Check Condition
(da10:mpr0:0:1:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
(da10:mpr0:0:1:0): Retrying command (per sense data)
    (da10:mpr0:0:1:0): READ(16). CDB: 88 00 00 00 00 02 c5 e2 3c 98 00 00 00 28 00 00 length 20480 SMID 242 terminated ioc 804b scsi 0 state c xfer 0
    (da10:mpr0:0:1:0): READ(16). CDB: 88 00 00 00 00 03 20 fa 8b b0 00 00 00 e0 00 00 length 114688 SMID 180 terminated ioc 804b (da10:mpr0:0:1:0): READ(16). CDB: 88 00 00 00 00 02 c5 e2 3c 98 00 00 00 28 00 00
scsi 0 state c xfer 0
(da10:mpr0:0:1:0): CAM status: CCB request completed with an error
    (da10:mpr0:0:1:0): WRITE(16). CDB: 8a 00 00 00 00 02 00 06 3a c0 00 00 00 10 00 00 length 8192 SMID 628 terminated ioc 804b s(da10:csi 0 state c xfer 0
mpr0:0:    (da10:mpr0:0:1:0): READ(16). CDB: 88 00 00 00 00 03 20 fa 8b 00 00 00 00 58 00 00 length 45056 SMID 255 terminated ioc 804b s1:csi 0 state c xfer 0
0):     (da10:mpr0:0:1:0): READ(16). CDB: 88 00 00 00 00 03 20 fa 89 e0 00 00 00 b0 00 00 length 90112 SMID 604 terminated ioc 804b sRetrying command
csi 0 state c xfer 0
(da10:mpr0:0:1:0): READ(16). CDB: 88 00 00 00 00 03 20 fa 8b b0 00 00 00 e0 00 00
(da10:mpr0:0:1:0): CAM status: CCB request completed with an error
(da10:mpr0:0:1:0): Retrying command
(da10:mpr0:0:1:0): WRITE(16). CDB: 8a 00 00 00 00 02 00 06 3a c0 00 00 00 10 00 00
(da10:mpr0:0:1:0): CAM status: CCB request completed with an error
(da10:mpr0:0:1:0): Retrying command
(da10:mpr0:0:1:0): READ(16). CDB: 88 00 00 00 00 03 20 fa 8b 00 00 00 00 58 00 00
(da10:mpr0:0:1:0): CAM status: CCB request completed with an error
(da10:mpr0:0:1:0): Retrying command
(da10:mpr0:0:1:0): READ(16). CDB: 88 00 00 00 00 03 20 fa 89 e0 00 00 00 b0 00 00
(da10:mpr0:0:1:0): CAM status: CCB request completed with an error
(da10:mpr0:0:1:0): Retrying command
(da10:mpr0:0:1:0): READ(16). CDB: 88 00 00 00 00 03 20 fa 89 e0 00 00 00 b0 00 00
(da10:mpr0:0:1:0): CAM status: SCSI Status Error
(da10:mpr0:0:1:0): SCSI status: Check Condition
(da10:mpr0:0:1:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
(da10:mpr0:0:1:0): Retrying command (per sense data)
    (da10:mpr0:0:1:0): READ(10). CDB: 28 00 6c 9e a5 f0 00 00 b0 00 length 90112 SMID 305 terminated ioc 804b scsi 0 state c xfer 0
    (da10:mpr0:0:1:0): READ(16). CDB: 88 00 00 00 00 03 20 fd 6f 30 00 00 00 30 00 00 length 24576 SMID 748 terminated ioc 804b s(da10:mpr0:0:1:0): READ(10). CDB: 28 00 6c 9e a5 f0 00 00 b0 00
csi 0 state c xfer 0
(da10:mpr0:0:1:0): CAM status: CCB request completed with an error
    (da10:mpr0:0:1:0): READ(16). CDB: 88 00 00 00 00 03 20 fc 52 70 00 00 00 b0 00 00 length 90112 SMID 207 terminated ioc 804b s(da10:csi 0 state c xfer 0
mpr0:0:    (da10:mpr0:0:1:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00 length 0 SMID 689 terminated ioc 804b scsi 0 sta1:te c xfer 0
0): Retrying command
(da10:mpr0:0:1:0): READ(16). CDB: 88 00 00 00 00 03 20 fd 6f 30 00 00 00 30 00 00
(da10:mpr0:0:1:0): CAM status: CCB request completed with an error
(da10:mpr0:0:1:0): Retrying command
(da10:mpr0:0:1:0): READ(16). CDB: 88 00 00 00 00 03 20 fc 52 70 00 00 00 b0 00 00
(da10:mpr0:0:1:0): CAM status: CCB request completed with an error
(da10:mpr0:0:1:0): Retrying command
(da10:mpr0:0:1:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00
(da10:mpr0:0:1:0): CAM status: CCB request completed with an error
(da10:mpr0:0:1:0): Retrying command
(da10:mpr0:0:1:0): READ(16). CDB: 88 00 00 00 00 03 20 fc 52 70 00 00 00 b0 00 00
(da10:mpr0:0:1:0): CAM status: SCSI Status Error
(da10:mpr0:0:1:0): SCSI status: Check Condition
(da10:mpr0:0:1:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
(da10:mpr0:0:1:0): Retrying command (per sense data)
    (noperiph:mpr0:0:4294967295:0): SMID 4 Aborting command 0xfffffe0000ad3f90
mpr0: Sending reset from mprsas_send_abort for target ID 4
    (da13:mpr0:0:4:0): READ(16). CDB: 88 00 00 00 00 01 11 c0 20 a8 00 00 00 30 00 00 length 24576 SMID 792 terminated ioc 804b s(da13:mpr0:0:4:0): READ(16). CDB: 88 00 00 00 00 01 11 c0 20 50 00 00 00 28 00 00
csi 0 state c xfer 0
(da13:mpr0:0:4:0): CAM status: Command timeout
    (da13:mpr0:0:4:0): READ(16). CDB: 88 00 00 00 00 01 11 c1 68 90 00 00 00 28 00 00 length 20480 SMID 927 terminated ioc 804b s(da13:csi 0 state c xfer 0
mpr0:0:    (da13:mpr0:0:4:0): WRITE(16). CDB: 8a 00 00 00 00 02 04 95 8a 48 00 00 00 e0 00 00 length 114688 SMID 289 terminated ioc 804b4: scsi 0 state c xfer 0
0): mpr0: Retrying command
Unfreezing devq for target ID 4
(da13:mpr0:0:4:0): READ(16). CDB: 88 00 00 00 00 01 11 c0 20 a8 00 00 00 30 00 00
(da13:mpr0:0:4:0): CAM status: CCB request completed with an error
(da13:mpr0:0:4:0): Retrying command
(da13:mpr0:0:4:0): READ(16). CDB: 88 00 00 00 00 01 11 c1 68 90 00 00 00 28 00 00
(da13:mpr0:0:4:0): CAM status: CCB request completed with an error
(da13:mpr0:0:4:0): Retrying command
(da13:mpr0:0:4:0): WRITE(16). CDB: 8a 00 00 00 00 02 04 95 8a 48 00 00 00 e0 00 00
(da13:mpr0:0:4:0): CAM status: CCB request completed with an error
(da13:mpr0:0:4:0): Retrying command
(da13:mpr0:0:4:0): READ(16). CDB: 88 00 00 00 00 01 11 c0 20 a8 00 00 00 30 00 00
(da13:mpr0:0:4:0): CAM status: SCSI Status Error
(da13:mpr0:0:4:0): SCSI status: Check Condition
(da13:mpr0:0:4:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
(da13:mpr0:0:4:0): Retrying command (per sense data)
    (da13:mpr0:0:4:0): READ(10). CDB: 28 00 6f 37 bc a0 00 00 28 00 length 20480 SMID 309 terminated ioc 804b scsi 0 state c xfer 0
    (da13:mpr0:0:4:0): READ(10). CDB: 28 00 6f 37 bc 18 00 00 30 00 length 24576 SMID 694 terminated ioc 804b scsi 0 state c xfer(da13:mpr0:0:4:0): READ(10). CDB: 28 00 6f 37 bc a0 00 00 28 00
0
(da13:mpr0:0:4:0): CAM status: CCB request completed with an error
    (da13:mpr0:0:4:0): READ(10). CDB: 28 00 6f 37 bc 48 00 00 28 00 length 20480 SMID 337 terminated ioc 804b scsi 0 state c xfer(da13: 0
mpr0:0:    (da13:mpr0:0:4:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00 length 0 SMID 131 terminated ioc 804b scsi 0 sta4:te c xfer 0
0): Retrying command
(da13:mpr0:0:4:0): READ(10). CDB: 28 00 6f 37 bc 18 00 00 30 00
(da13:mpr0:0:4:0): CAM status: CCB request completed with an error
(da13:mpr0:0:4:0): Retrying command
(da13:mpr0:0:4:0): READ(10). CDB: 28 00 6f 37 bc 48 00 00 28 00
(da13:mpr0:0:4:0): CAM status: CCB request completed with an error
(da13:mpr0:0:4:0): Retrying command
(da13:mpr0:0:4:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00
(da13:mpr0:0:4:0): CAM status: CCB request completed with an error
(da13:mpr0:0:4:0): Retrying command
(da13:mpr0:0:4:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00
(da13:mpr0:0:4:0): CAM status: SCSI Status Error
(da13:mpr0:0:4:0): SCSI status: Check Condition
(da13:mpr0:0:4:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
(da13:mpr0:0:4:0): Error 6, Retries exhausted
(da13:mpr0:0:4:0): Invalidating pack
mpr0: mpr_user_pass_thru: user reply buffer (64) smaller than returned buffer (68)
mpr0: mpr_user_pass_thru: user reply buffer (64) smaller than returned buffer (68)
    (noperiph:mpr0:0:4294967295:0): SMID 5 Aborting command 0xfffffe0000af0ee0
mpr0: Sending reset from mprsas_send_abort for target ID 7
    (da16:mpr0:0:7:0): READ(16). CDB: 88 00 00 00 00 03 26 3c d8 08 00 00 00 28 00 00 length 20480 SMID 483 terminated ioc 804b scsi 0 state c xfer 0
    (da16:mpr0:0:7:0): READ(16). CDB: 88 00 00 00 00 03 26 3c d7 d8 00 00 00 30 00 00 length 24576 SMID 277 terminated ioc 804b s(da16:mpr0:0:7:0): READ(16). CDB: 88 00 00 00 00 03 26 3c d8 08 00 00 00 28 00 00
csi 0 state c xfer 0
(da16:mpr0:0:7:0): CAM status: CCB request completed with an error
    (da16:mpr0:0:7:0): READ(16). CDB: 88 00 00 00 00 03 26 3c d7 b0 00 00 00 28 00 00 length 20480 SMID 692 terminated ioc 804b s(da16:csi 0 state c xfer 0
mpr0:0:mpr0: 7:Unfreezing devq for target ID 7
0): Retrying command
(da16:mpr0:0:7:0): READ(16). CDB: 88 00 00 00 00 03 26 3c d7 d8 00 00 00 30 00 00
(da16:mpr0:0:7:0): CAM status: CCB request completed with an error
(da16:mpr0:0:7:0): Retrying command
(da16:mpr0:0:7:0): READ(16). CDB: 88 00 00 00 00 03 26 3c d7 b0 00 00 00 28 00 00
(da16:mpr0:0:7:0): CAM status: CCB request completed with an error
(da16:mpr0:0:7:0): Retrying command
(da16:mpr0:0:7:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00
(da16:mpr0:0:7:0): CAM status: Command timeout
(da16:mpr0:0:7:0): Retrying command
(da16:mpr0:0:7:0): READ(16). CDB: 88 00 00 00 00 03 26 3c d7 b0 00 00 00 28 00 00
(da16:mpr0:0:7:0): CAM status: SCSI Status Error
(da16:mpr0:0:7:0): SCSI status: Check Condition
(da16:mpr0:0:7:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
(da16:mpr0:0:7:0): Retrying command (per sense data)
(da16:mpr0:0:7:0): READ(10). CDB: 28 00 6d a4 8a a0 00 00 30 00 length 24576 SMID 281 terminated ioc 804b scsi 0 state c xfer 0
(da16:mpr0:0:7:0): READ(10). CDB: 28 00 6d a4 8a 70 00 00 30 00 length 24576 SMID 155 terminated ioc 804b scsi 0 state c xfer(da16:mpr0:0:7:0): READ(10). CDB: 28 00 6d a4 8a a0 00 00 30 00
0
(da16:mpr0:0:7:0): CAM status: CCB request completed with an error
    (da16:mpr0:0:7:0): READ(16). CDB: 88 00 00 00 00 01 08 9a 70 60 00 00 00 28 00 00 length 20480 SMID 380 terminated ioc 804b s(da16:csi 0 state c xfer 0
mpr0:0:    (da16:mpr0:0:7:0): WRITE(16). CDB: 8a 00 00 00 00 02 08 96 9b 10 00 00 00 e0 00 00 length 114688 SMID 527 terminated ioc 804b7: scsi 0 state c xfer 0
0): Retrying command
(da16:mpr0:0:7:0): READ(10). CDB: 28 00 6d a4 8a 70 00 00 30 00
(da16:mpr0:0:7:0): CAM status: CCB request completed with an error
(da16:mpr0:0:7:0): Retrying command
(da16:mpr0:0:7:0): READ(16). CDB: 88 00 00 00 00 01 08 9a 70 60 00 00 00 28 00 00
(da16:mpr0:0:7:0): CAM status: CCB request completed with an error
(da16:mpr0:0:7:0): Retrying command
(da16:mpr0:0:7:0): WRITE(16). CDB: 8a 00 00 00 00 02 08 96 9b 10 00 00 00 e0 00 00
(da16:mpr0:0:7:0): CAM status: CCB request completed with an error
(da16:mpr0:0:7:0): Retrying command
(da16:mpr0:0:7:0): READ(16). CDB: 88 00 00 00 00 01 08 17 d1 48 00 00 00 08 00 00
(da16:mpr0:0:7:0): CAM status: SCSI Status Error
(da16:mpr0:0:7:0): SCSI status: Check Condition
(da16:mpr0:0:7:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
(da16:mpr0:0:7:0): Retrying command (per sense data)
    (da16:mpr0:0:7:0): WRITE(16). CDB: 8a 00 00 00 00 02 08 96 9b f0 00 00 00 e8 00 00 length 118784 SMID 378 terminated ioc 804b scsi 0 state c xfer 0
    (da16:mpr0:0:7:0): READ(16). CDB: 88 00 00 00 00 01 08 17 d1 48 00 00 00 08 00 00 length 4096 SMID 455 terminated ioc 804b sc(da16:mpr0:0:7:0): WRITE(16). CDB: 8a 00 00 00 00 02 08 96 9b f0 00 00 00 e8 00 00
si 0 state c xfer 0
(da16:mpr0:0:7:0): CAM status: CCB request completed with an error
    (da16:mpr0:0:7:0): READ(10). CDB: 28 00 6d a4 8a a0 00 00 30 00 length 24576 SMID 232 terminated ioc 804b scsi 0 state c xfer(da16: 0
mpr0:0:    (da16:mpr0:0:7:0): READ(16). CDB: 88 00 00 00 00 01 08 9a 70 60 00 00 00 28 00 00 length 20480 SMID 352 terminated ioc 804b s7:csi 0 state c xfer 0
0):     (da16:mpr0:0:7:0): READ(10). CDB: 28 00 6d a4 8a 70 00 00 30 00 length 24576 SMID 563 terminated ioc 804b scsi 0 state c xferRetrying command
0
(da16:mpr0:0:7:0): READ(16). CDB: 88 00 00 00 00 01 08 17 d1 48 00 00 00 08 00 00
(da16:mpr0:0:7:0): CAM status: CCB request completed with an error
(da16:mpr0:0:7:0): Retrying command
(da16:mpr0:0:7:0): READ(10). CDB: 28 00 6d a4 8a a0 00 00 30 00
(da16:mpr0:0:7:0): CAM status: CCB request completed with an error
(da16:mpr0:0:7:0): Retrying command
(da16:mpr0:0:7:0): READ(16). CDB: 88 00 00 00 00 01 08 9a 70 60 00 00 00 28 00 00
(da16:mpr0:0:7:0): CAM status: CCB request completed with an error
(da16:mpr0:0:7:0): Retrying command
(da16:mpr0:0:7:0): READ(10). CDB: 28 00 6d a4 8a 70 00 00 30 00
(da16:mpr0:0:7:0): CAM status: CCB request completed with an error
(da16:mpr0:0:7:0): Retrying command
(da16:mpr0:0:7:0): READ(10). CDB: 28 00 6d a4 8a 70 00 00 30 00
(da16:mpr0:0:7:0): CAM status: SCSI Status Error
(da16:mpr0:0:7:0): SCSI status: Check Condition
(da16:mpr0:0:7:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
(da16:mpr0:0:7:0): Retrying command (per sense data)
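
In dumps like these, the line that appears to mark a disk actually being dropped is "Invalidating pack" (right after "Retries exhausted"); the rest is retries. A rough sketch for counting those events per disk follows. The function name `dropped_disks` and the log path in the usage comment are assumptions, not something from this thread, and interleaved kernel lines like the ones above may hide some events.

```shell
#!/bin/sh
# Count how often each da device was dropped, judging by the
# "Invalidating pack" lines seen in the logs. Reads log text on stdin.
dropped_disks() {
    awk '/Invalidating pack/ {
        # Pull the device name out of a prefix such as "(da13:mpr0:0:4:0):"
        if (match($0, /\(da[0-9]+:/)) {
            dev = substr($0, RSTART + 1, RLENGTH - 2)
            count[dev]++
        }
    }
    END { for (d in count) print d, count[d] }' | sort
}

# Example (the log path is an assumption; adjust for your system):
#   dropped_disks < /var/log/messages
```

Each output line is a device name followed by the number of times it was invalidated, which makes it easier to compare disks over a few weeks.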
 

DGenerateKane

Member
Joined
Sep 4, 2014
Messages
90
I can concur the fix does not work. It looked like it would: my VM was up for over three days before the first drive faulted, and it had never made it past the 24-hour mark before. After a reboot, two drives faulted within a couple of hours with the VM running. I'm still replacing these crappy Seagate drives.
 

El Al

Neophyte
Joined
Oct 30, 2016
Messages
10
Keep in mind that the change doesn't persist through reboots. Are you using a script to disable NCQ at startup?
 

DGenerateKane

Member
Joined
Sep 4, 2014
Messages
90
That wouldn't explain why a drive faulted the first time since I hadn't rebooted after. But no, I don't use a script because I've never been able to get any type of script to work. I've never found instructions good enough for me to understand how to get them to work.
 

El Al

Neophyte
Joined
Oct 30, 2016
Messages
10
That's right. Again, too bad. I have not had a single issue since May, after struggling with this for months, basically since I got the Seagate drives.

For the script: place it somewhere like /mnt/tank/scriptXXXXXXX.sh
Then go to Tasks > Init/Shutdown Scripts and add it. Set the type to Script with Post Init, then browse to your script and select it.
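
As a rough sketch of what such a script could contain: this assumes the "disable NCQ" change is `camcontrol tags <dev> -N 1`, which is consistent with the `dev_openings  1` output earlier in the thread. The `DEVICES` list, the `DRY_RUN` switch and the function name `limit_tags` are made up for illustration, so fill in your own da numbers.

```shell
#!/bin/sh
# Sketch of a post-init script that limits each listed disk to one
# outstanding command (the "NCQ disabled" state discussed in this thread).
# DEVICES is a placeholder: fill in your own da numbers, e.g. "da6 da13".
DEVICES="${DEVICES:-}"

# Set DRY_RUN=1 to print the commands instead of running them.
limit_tags() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "camcontrol tags $1 -N 1"
    else
        camcontrol tags "$1" -N 1
    fi
}

for dev in $DEVICES; do
    limit_tags "$dev"
done
```

Running it once by hand with DRY_RUN=1 shows exactly which commands would be issued before you hook it into the post-init list. Afterwards `camcontrol tags da6 -v` (with your own device) should report dev_openings 1.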
 

Pheran

Senior Member
Joined
Jul 14, 2015
Messages
276
I still have 6 10TB Ironwolf drives in my system - all of them on SC60 firmware. I just updated one of them to SC61. Everything came back fine but I'm running a scrub as a precaution. If that goes well I'll update another.
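
For anyone wanting to check which firmware each drive reports, here is a small sketch. It assumes smartmontools is installed and that `smartctl -i` prints a "Firmware Version:" line for these drives; the function name `firmware_of` and the device names in the usage comment are placeholders.

```shell
#!/bin/sh
# Extract the firmware revision from `smartctl -i` output read on stdin.
firmware_of() {
    awk -F': *' '/^Firmware Version/ { print $2 }'
}

# Example (device names are placeholders):
#   for d in da0 da1 da2; do
#       printf '%s: ' "$d"; smartctl -i "/dev/$d" | firmware_of
#   done
```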
 

El Al

Neophyte
Joined
Oct 30, 2016
Messages
10
I realize this is mostly an old topic, but a new "fix" for this has potentially surfaced. I opened a new topic for it, which you can check out here.

Thanks for the update! The fix is in line with my experience, as you can read above. Unfortunately this doesn't seem to be a panacea, and some people will have to keep hunting for potentially broken hardware, software, or configuration errors.


Quote from the linked thread:
Now the point of this topic, digging deeper it turns out Seagate released a firmware update for the ST10000VN0004 and ST10000NE0004 last month. They bumped from firmware SC60 to SC61 and in that topic it's stated that this is because of "flush cache timing out bug that was discovered during routine testing" in regards to Synology systems.

As it turns out, write cache (and I believe internally NCQ) had been turned off for these specific drives in Synology systems for a while now because of "stability" issues. Since this firmware update it gets turned on again and all is well.
 

Pheran

Senior Member
Joined
Jul 14, 2015
Messages
276
No issues with the scrub, so I flashed 2 more drives today. My pool is now 2 HGST, 3 Ironwolf SC60, and 3 Ironwolf SC61. I'm going to run like this for a while and see if the frequency of drive failures decreases, and whether any new failures are limited to the SC60 drives.
 

Pheran

Senior Member
Joined
Jul 14, 2015
Messages
276
The system went a week before the first failure, which is unusual though not unprecedented - usually a drive fails every few days. The failed drive was on SC60, so far so good.
 

El Al

Neophyte
Joined
Oct 30, 2016
Messages
10
Thanks for the update!
 

Pheran

Senior Member
Joined
Jul 14, 2015
Messages
276
It's been 3 weeks and I've had 6 drive failures in that time; every single one of them was a drive on the old firmware. I'm really tired of failures at this point so I updated the remaining drives to SC61. Hopefully this system will be stable for the first time in its life. If it makes it to Halloween without another failure I will declare it fixed; going a whole month with no problems would be unprecedented.
 

El Al

Neophyte
Joined
Oct 30, 2016
Messages
10
Sounds very promising. It would be even better if Seagate could come out with a firmware update for all drive sizes. I don't think this would be particularly difficult for them. Maybe something to draw their attention to once you have concluded your testing. It would do them well to be responsive to this and come out with a fix to rebuild their (currently) still very poor reputation with many.
 

Pheran

Senior Member
Joined
Jul 14, 2015
Messages
276
The only interaction I want to have with Seagate at the moment is to give them a big middle finger for releasing a completely broken drive and then taking THREE YEARS to fix the problem with a firmware update.
 

Pheran

Senior Member
Joined
Jul 14, 2015
Messages
276
I'm sad to say that I've had the first failure on a drive with SC61 firmware. However, it looks quite different. I'm not sure if the SC61 firmware still has an issue that's just much less frequent, or if this drive is legitimately failing in some way.

Code:
Oct 11 03:36:37 vault   (pass2:mpr0:0:2:0): ATA COMMAND PASS THROUGH(16). CDB: 85 08 0e 00 d0 00 01 00 00 00 4f 00 c2 00 b0 00 length 512 SMID 477 Aborting command 0xfffffe0001079db0
Oct 11 03:36:37 vault mpr0: Sending reset from mprsas_send_abort for target ID 2
Oct 11 03:36:37 vault smartd[3380]: Device: /dev/da2 [SAT], failed to read SMART Attribute Data
Oct 11 03:36:37 vault   (da2:mpr0:0:2:0): WRITE(16). CDB: 8a 00 00 00 00 02 fa 11 d1 38 00 00 00 10 00 00 length 8192 SMID 352 terminated ioc 804b loginfo 31130000 scsi 0 state c xfer 0
Oct 11 03:36:37 vault   (pass2:mpr0:0:2:0): ATA COMMAND PASS THROUGH(16). CDB: 85 08 0e 00 d0 00 01 00 00 00 4f 00 c2 00 b0 00 length 512 SMID 598 terminated ioc 804b loginfo 31130000 scsi 0 state c xfer 0
Oct 11 03:36:37 vault   (da2:mpr0:0:2:0): WRITE(16). CDB: 8a 00 00 00 00 01 73 77 bf 28 00 00 00 10 00 00 length 8192 SMID 230 terminated ioc 804b loginfo 31130000 scsi 0 state c xfer 0
Oct 11 03:36:37 vault (da2:mpr0:0:2:0): WRITE(16). CDB: 8a 00 00 00 00 02 fa 11 d1 38 00 00 00 10 00 00
Oct 11 03:36:37 vault mpr0: Unfreezing devq for target ID 2
Oct 11 03:36:37 vault (da2:mpr0:0:2:0): CAM status: CCB request completed with an error
Oct 11 03:36:37 vault (da2:mpr0:0:2:0): Retrying command
Oct 11 03:36:37 vault (da2:mpr0:0:2:0): WRITE(16). CDB: 8a 00 00 00 00 01 73 77 bf 28 00 00 00 10 00 00
Oct 11 03:36:37 vault (da2:mpr0:0:2:0): CAM status: CCB request completed with an error
Oct 11 03:36:37 vault (da2:mpr0:0:2:0): Retrying command
Oct 11 03:36:37 vault (da2:mpr0:0:2:0): WRITE(16). CDB: 8a 00 00 00 00 01 73 77 bf 28 00 00 00 10 00 00
Oct 11 03:36:37 vault (da2:mpr0:0:2:0): CAM status: SCSI Status Error
Oct 11 03:36:37 vault (da2:mpr0:0:2:0): SCSI status: Check Condition
Oct 11 03:36:37 vault (da2:mpr0:0:2:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
Oct 11 03:36:37 vault (da2:mpr0:0:2:0): Retrying command (per sense data)
Oct 11 03:36:38 vault   (da2:mpr0:0:2:0): WRITE(16). CDB: 8a 00 00 00 00 03 c8 f7 74 68 00 00 00 20 00 00 length 16384 SMID 645 terminated ioc 804b loginfo 31120440 scsi 0 state c xfer 0
Oct 11 03:36:38 vault (da2:mpr0:0:2:0): WRITE(16). CDB: 8a 00 00 00 00 03 c8 f7 74 68 00 00 00 20 00 00
Oct 11 03:36:38 vault (da2:mpr0:0:2:0): CAM status: CCB request completed with an error
Oct 11 03:36:38 vault (da2:mpr0:0:2:0): Retrying command
Oct 11 03:36:38 vault (da2:mpr0:0:2:0): WRITE(16). CDB: 8a 00 00 00 00 03 c8 f7 74 68 00 00 00 20 00 00
Oct 11 03:36:38 vault (da2:mpr0:0:2:0): CAM status: SCSI Status Error
Oct 11 03:36:38 vault (da2:mpr0:0:2:0): SCSI status: Check Condition
Oct 11 03:36:38 vault (da2:mpr0:0:2:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
Oct 11 03:36:38 vault (da2:mpr0:0:2:0): Retrying command (per sense data)
Oct 11 03:36:39 vault   (da2:mpr0:0:2:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00 length 0 SMID 943 terminated ioc 804b loginfo 31110e03 scsi 0 state c xfer 0
Oct 11 03:36:39 vault (da2:mpr0:0:2:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00
Oct 11 03:36:39 vault (da2:mpr0:0:2:0): CAM status: CCB request completed with an error
Oct 11 03:36:39 vault (da2:mpr0:0:2:0): Retrying command
Oct 11 03:36:39 vault (da2:mpr0:0:2:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00
Oct 11 03:36:39 vault (da2:mpr0:0:2:0): CAM status: SCSI Status Error
Oct 11 03:36:39 vault (da2:mpr0:0:2:0): SCSI status: Check Condition
Oct 11 03:36:39 vault (da2:mpr0:0:2:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
Oct 11 03:36:39 vault (da2:mpr0:0:2:0): Error 6, Retries exhausted
Oct 11 03:36:39 vault (da2:mpr0:0:2:0): Invalidating pack
Oct 11 03:36:39 vault ZFS: vdev state changed, pool_guid=8222513563341353211 vdev_guid=5541660926192300493
Oct 11 03:36:39 vault ZFS: vdev state changed, pool_guid=8222513563341353211 vdev_guid=5541660926192300493
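
To compare a failure like this one against earlier ones, it can help to tally the "terminated" lines by device and loginfo code. The following is a rough sketch only: it assumes the mpr driver's log lines keep the `terminated ioc ... loginfo <code>` layout shown above, and `summarize_terminated` is a name I picked for illustration. I am not claiming to know what the individual loginfo codes mean.

```shell
#!/bin/sh
# Read syslog text on stdin and count "terminated ioc" lines,
# grouped by peripheral name (e.g. da2) and loginfo code.
summarize_terminated() {
    awk '
    / terminated ioc / {
        dev = ""; code = ""
        for (i = 1; i <= NF; i++) {
            # The first field starting with "(" looks like "(da2:mpr0:0:2:0):"
            if (dev == "" && substr($i, 1, 1) == "(") {
                dev = substr($i, 2)     # drop the leading "("
                sub(/:.*/, "", dev)     # keep only the peripheral name
            }
            if ($i == "loginfo") code = $(i + 1)
        }
        # Lines with no complete loginfo field (e.g. interleaved output) are skipped
        if (dev != "" && code != "") count[dev " " code]++
    }
    END { for (k in count) print count[k], k }
    ' | sort
}
```

On FreeBSD this could be fed from the system log, for example `summarize_terminated < /var/log/messages`. Each output line is a count followed by the device and the loginfo code.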


Here's the SMART data for this drive.

Code:
=== START OF INFORMATION SECTION ===
Model Family:     Seagate IronWolf
Device Model:     ST10000VN0004-1ZD101
Serial Number:    ZA21536E
LU WWN Device Id: 5 000c50 0a22fd47b
Firmware Version: SC61
User Capacity:    10,000,831,348,736 bytes [10.0 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Fri Oct 11 09:45:34 2019 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (  575) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 930) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x50bd) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   083   064   044    Pre-fail  Always       -       220415216
  3 Spin_Up_Time            0x0003   089   086   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       24
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       16
  7 Seek_Error_Rate         0x000f   089   060   045    Pre-fail  Always       -       815182664
  9 Power_On_Hours          0x0032   077   077   000    Old_age   Always       -       20756
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       23
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   099   099   000    Old_age   Always       -       4
189 High_Fly_Writes         0x003a   098   098   000    Old_age   Always       -       2
190 Airflow_Temperature_Cel 0x0022   062   052   040    Old_age   Always       -       38 (Min/Max 35/44)
191 G-Sense_Error_Rate      0x0032   093   093   000    Old_age   Always       -       14610
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       55
193 Load_Cycle_Count        0x0032   098   098   000    Old_age   Always       -       4869
194 Temperature_Celsius     0x0022   038   048   000    Old_age   Always       -       38 (0 20 0 0 0)
195 Hardware_ECC_Recovered  0x001a   039   001   000    Old_age   Always       -       220415216
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0023   100   100   001    Pre-fail  Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       20369 (249 53 0)
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       34658940636
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       504404149598

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     20697         -
# 2  Extended offline    Completed without error       00%     20556         -
# 3  Short offline       Completed without error       00%     20361         -
# 4  Short offline       Completed without error       00%     20025         -
# 5  Extended offline    Completed without error       00%     19873         -
# 6  Short offline       Completed without error       00%     19689         -
# 7  Short offline       Completed without error       00%     19353         -
# 8  Extended offline    Interrupted (host reset)      70%     19189         -
# 9  Short offline       Completed without error       00%     18849         -
#10  Short offline       Completed without error       00%     18517         -
#11  Extended offline    Completed without error       00%     18365         -
#12  Short offline       Completed without error       00%     18181         -
#13  Short offline       Completed without error       00%     17846         -
#14  Extended offline    Completed without error       00%     17694         -
#15  Short offline       Completed without error       00%     17510         -
#16  Short offline       Completed without error       00%     17174         -
#17  Extended offline    Completed without error       00%     17031         -
#18  Short offline       Completed without error       00%     16669         -
#19  Short offline       Completed without error       00%     16334         -
#20  Extended offline    Completed without error       00%     16181         -
#21  Short offline       Completed without error       00%     15998         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
 

cobrakiller58

Senior Member
Joined
Jan 18, 2017
Messages
380
Code:
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       16
That would have me questioning its health and verifying my backup. I wouldn't replace the drive yet (unless it is still under warranty), but I would keep a close eye on how quickly this attribute increases.
 

Pheran

Senior Member
Joined
Jul 14, 2015
Messages
276
It seems that several of my drives have counts in that field, particularly da1.
Code:
# for i in {0..7}; do echo -n "da$i "; smartctl -a /dev/da$i | grep 'Reallocated_Sector_Ct' | awk '{ print $10 }'; done
da0 0
da1 680
da2 16
da3 56
da4 0
da5 0
da6 0
da7 32

I'm not sure how critical they are, but I'll start with a reboot/scrub. At least I can more easily monitor the situation without the constant failures I got from the SC60 firmware.
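
To see how fast the counts grow rather than just their current value, something like the sketch below could be run periodically. It makes a few assumptions: the drives are da0 through da7, the raw value is the 10th column of the `smartctl -A` attribute table (as in the output above), and the log path is a hypothetical default you would want to change. The function names are made up for illustration.

```shell
#!/bin/sh
# Print the raw value (10th column) of Reallocated_Sector_Ct from
# `smartctl -A` output supplied on stdin.
realloc_raw() {
    awk '$2 == "Reallocated_Sector_Ct" { print $10 }'
}

# Append one "date device raw_count" line per drive to a log file.
# Assumptions: drives are da0..da7; the default path is hypothetical.
log_realloc() {
    logfile=${1:-/var/tmp/realloc_sectors.log}
    for i in 0 1 2 3 4 5 6 7; do
        printf '%s da%s %s\n' \
            "$(date +%Y-%m-%d)" "$i" \
            "$(smartctl -A "/dev/da$i" | realloc_raw)" >> "$logfile"
    done
}
```

Calling `log_realloc` once a day (as root) builds up a history, so a drive whose count keeps climbing stands out from one that is holding steady.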
 

cobrakiller58

Senior Member
Joined
Jan 18, 2017
Messages
380
680 is high. I would have a spare burned in and ready to go. Are any of those drives still covered by their warranties? Can you post the SMART output for da1?
 

Pheran

Senior Member
Joined
Jul 14, 2015
Messages
276
Sure.

Code:
smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.2-STABLE amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate IronWolf
Device Model:     ST10000VN0004-1ZD101
Serial Number:    ZA215PKR
LU WWN Device Id: 5 000c50 0a25fd00d
Firmware Version: SC61
User Capacity:    10,000,831,348,736 bytes [10.0 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sat Oct 12 12:13:58 2019 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (  575) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 961) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x50bd) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   077   064   044    Pre-fail  Always       -       46080048
  3 Spin_Up_Time            0x0003   089   086   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       25
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       680
  7 Seek_Error_Rate         0x000f   090   060   045    Pre-fail  Always       -       922331801
  9 Power_On_Hours          0x0032   077   077   000    Old_age   Always       -       20868
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       24
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   099   000    Old_age   Always       -       6
189 High_Fly_Writes         0x003a   092   092   000    Old_age   Always       -       8
190 Airflow_Temperature_Cel 0x0022   061   054   040    Old_age   Always       -       39 (Min/Max 35/44)
191 G-Sense_Error_Rate      0x0032   098   098   000    Old_age   Always       -       4620
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       55
193 Load_Cycle_Count        0x0032   098   098   000    Old_age   Always       -       4273
194 Temperature_Celsius     0x0022   039   046   000    Old_age   Always       -       39 (0 20 0 0 0)
195 Hardware_ECC_Recovered  0x001a   033   001   000    Old_age   Always       -       46080048
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0023   100   100   001    Pre-fail  Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       20438 (70 1 0)
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       59988886362
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       520821617534

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     20783         -
# 2  Extended offline    Completed without error       00%     20642         -
# 3  Short offline       Completed without error       00%     20447         -
# 4  Short offline       Completed without error       00%     20111         -
# 5  Extended offline    Completed without error       00%     19959         -
# 6  Short offline       Completed without error       00%     19775         -
# 7  Short offline       Completed without error       00%     19439         -
# 8  Extended offline    Completed without error       00%     19287         -
# 9  Short offline       Completed without error       00%     18934         -
#10  Short offline       Completed without error       00%     18603         -
#11  Extended offline    Completed without error       00%     18451         -
#12  Short offline       Completed without error       00%     18267         -
#13  Short offline       Completed without error       00%     17931         -
#14  Extended offline    Completed without error       00%     17780         -
#15  Short offline       Completed without error       00%     17595         -
#16  Short offline       Completed without error       00%     17259         -
#17  Extended offline    Completed without error       00%     17117         -
#18  Short offline       Completed without error       00%     16755         -
#19  Short offline       Completed without error       00%     16419         -
#20  Extended offline    Completed without error       00%     16267         -
#21  Short offline       Completed without error       00%     16083         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
 