"synchronize cache" timeout

GaMMaLiKKeR

Hi, a while ago a drive reported an error. As I was already waiting for an excuse to replace that drive, I didn't look into it much and just replaced the faulted drive. But after connecting the "failed" drive to my desktop, I noticed it didn't report any errors, even after filling the drive with data. I ended up using the drive for something else, but a month later another drive failed in the pool. This time I looked into it a bit more and found this in dmesg:
root@truenas[~]# dmesg
(da3:mpr0:0:3:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00 length 0 SMID 1497 Command timeout on target 3(0x0009), 60000 set, 60.77154745 elapsed
mpr0: At enclosure level 0, slot 3, connector name ( )
mpr0: Sending abort to target 3 for SMID 1497
(da3:mpr0:0:3:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00 length 0 SMID 1497 Aborting command 0xfffffe00e84896b8
mpr0: Controller reported scsi ioc terminated tgt 3 SMID 743 loginfo 31130000
mpr0: Controller reported scsi ioc terminated tgt 3 SMID 623 loginfo 31130000
mpr0: (da3:mpr0:0:3:0): READ(16). CDB: 88 00 00 00 00 02 51 36 46 90 00 00 00 08 00 00
Controller reported scsi ioc terminated tgt 3 SMID 879 loginfo 31130000
(da3:mpr0:0:3:0): CAM status: CCB request completed with an error
mpr0: Controller reported scsi ioc terminated tgt 3 SMID 2028 loginfo 31130000
(da3:mpr0:0:3:0): Retrying command, 3 more tries remain
mpr0: Finished abort recovery for target 3
(da3:mpr0:0:3:0): READ(16). CDB: 88 00 00 00 00 02 40 7d c6 80 00 00 00 08 00 00
(da3:mpr0:0:3:0): CAM status: CCB request completed with an error
(da3:mpr0:0:3:0): Retrying command, 3 more tries remain
(da3:mpr0:0:3:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00
(da3:mpr0:0:3:0): CAM status: Command timeout
(da3:mpr0:0:3:0): Retrying command, 0 more tries remain
(da3:mpr0:0:3:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00
(da3:mpr0:0:3:0): CAM status: SCSI Status Error
(da3:mpr0:0:3:0): SCSI status: Check Condition
(da3:mpr0:0:3:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
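In case it helps anyone compare drives, here is a minimal Python sketch for tallying the "CAM status" lines per device from saved dmesg output. The function name `tally_cam_status`, the regex, and the `SAMPLE` text (a few lines copied from the log above) are my own illustration, not a TrueNAS tool, and it assumes the "(da3:mpr0:0:3:0):" prefix format shown above.

```python
import re
from collections import Counter, defaultdict

# Assumed prefix format, as seen in the log above: "(da3:mpr0:0:3:0): message"
DEVICE_PREFIX = re.compile(
    r"\((?P<dev>[a-z]+\d+):[a-z]+\d+:\d+:\d+:\d+\):\s*(?P<msg>.*)"
)

# A few lines copied from the dmesg output above, used as sample input.
SAMPLE = """\
(da3:mpr0:0:3:0): CAM status: CCB request completed with an error
(da3:mpr0:0:3:0): Retrying command, 3 more tries remain
mpr0: Finished abort recovery for target 3
(da3:mpr0:0:3:0): CAM status: CCB request completed with an error
(da3:mpr0:0:3:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00
(da3:mpr0:0:3:0): CAM status: Command timeout
(da3:mpr0:0:3:0): CAM status: SCSI Status Error
"""


def tally_cam_status(dmesg_text):
    """Return {device: Counter({cam_status_text: count})} for the given text."""
    counts = defaultdict(Counter)
    for line in dmesg_text.splitlines():
        match = DEVICE_PREFIX.search(line)
        if match is None:
            continue  # controller-level line without a device prefix
        msg = match.group("msg")
        if msg.startswith("CAM status:"):
            status = msg[len("CAM status:"):].strip()
            counts[match.group("dev")][status] += 1
    return counts


if __name__ == "__main__":
    for dev, statuses in sorted(tally_cam_status(SAMPLE).items()):
        for status, n in statuses.most_common():
            print(f"{dev}: {status} x{n}")
```

Running this over the full dmesg output after each incident would show whether the timeouts keep landing on the same device names or move around between drives.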

If I set the drive to offline and then online again, it works just fine after resilvering. But after a couple of days or weeks this exact error happens again, sometimes on 2 drives at a time, but never 2 from the same pool. I found a thread from 2017 with people who had the "SYNCHRONIZE CACHE" issue, but that ended up being a problem with the Seagate firmware. I also found multiple threads from people with the "CAM status: CCB request completed with an error" message, but that was a problem with the flash drive they were using to boot TrueNAS from.


System specs:
TrueNAS Core version: TrueNAS-12.0-U6.1
CPU: Intel(R) Core(TM) i7-9700T
Memory: 2x 16GB Crucial SODIMM
Motherboard: Fujitsu D3633
HBA: LSI SAS 9207-8i
Boot drive: 250GB Samsung 870

Pool:
2x 2-drive mirrors
2x 8TB ATA WDC WD80EMAZ-00W
2x 8TB ATA WDC WD80EDAZ-11T
Note: all the drives are shucked drives. The 3.3V rail is cut on the power supply.

I'm considering moving to RAIDZ1, but I don't want to do that while random drives are failing :/

I hope I provided enough info, but if not, just ask.
 