negabinary
Dabbler
- Joined
- Jun 24, 2016
- Messages
- 11
I am at my wit’s end with this, so I’m hoping that someone here might recognize my problem or at least have some idea of where to go next.
Here’s my setup:
Supermicro X10SLL-F-O
Intel i3-4360
32GB Crucial DDR3L PC3-12800 ECC (4x8GB)
4x WD Red 3TB (two sets of mirrored drives, RAID-10)
8GB SanDisk CZ36 (OS drive)
835W Raidmax 80 Bronze PSU
FreeNAS 9.10-STABLE
3 Jails (Plex, VirtualBox, Generic BSD)
My system has two apparent symptoms: (1) “zpool status” shows many checksum errors and occasionally read/write errors for individual disks (especially under load, like during resilvering), and (2) individual disks periodically show as REMOVED or DEGRADED, causing my pool to become DEGRADED.
The problem seems to affect one or two disks at a time, but (knock on wood) never more than two at once. It also affects different drives at different times; no one drive appears to be more culpable than any other.
All drives pass a SMART long test.
Memtest86 ran for 8 hours with no errors.
I replaced the PSU and drive power cables. No difference.
I tried replacing all the SATA cables – no difference.
Despite the SMART test, I tried replacing the drive with the most errors (at the time) with a brand-new drive (of the same type), and checksum errors immediately appeared for that drive.
I disconnected my drives from the on-board SATA controller, added an LSI SAS9211-8i, flashed it to p20-IT, and connected all my drives to it instead. The problem continues.
So, before I buy a new motherboard and CPU and/or reconfigure my system with a sledgehammer – does anyone have any idea what I might be dealing with here? Could it be settings/configuration instead of hardware? Happy to provide any diagnostic data that might be useful. An example of the kernel log while this is occurring is below.
Thanks
> (ada2:ata2:0:0:0): WRITE_DMA48. ACB: 35 00 18 29 17 40 43 00 00 00 58 00
> (ada2:ata2:0:0:0): CAM status: ATA Status Error
> (ada2:ata2:0:0:0): ATA status: 51 (DRDY SERV ERR), error: 10 (IDNF )
> (ada2:ata2:0:0:0): RES: 51 10 18 29 17 43 43 00 00 58 00
> (ada2:ata2:0:0:0): Retrying command
> (ada2:ata2:0:0:0): FLUSHCACHE48. ACB: ea 00 00 00 00 40 00 00 00 00 00 00
> (ada2:ata2:0:0:0): CAM status: Command timeout
> (ada2:ata2:0:0:0): Retrying command
> (ada2:ata2:0:0:0): WRITE_DMA. ACB: ca 00 c0 bb 2b 43 00 00 00 00 08 00
> (ada2:ata2:0:0:0): CAM status: ATA Status Error
> (ada2:ata2:0:0:0): ATA status: 51 (DRDY SERV ERR), error: 10 (IDNF )
> (ada2:ata2:0:0:0): RES: 51 10 c0 bb 2b 03 03 00 00 08 00
> (ada2:ata2:0:0:0): Retrying command
> (ada3:ata3:0:0:0): WRITE_DMA. ACB: ca 00 08 07 2d 43 00 00 00 00 00 00
> (ada3:ata3:0:0:0): CAM status: ATA Status Error
> (ada3:ata3:0:0:0): ATA status: 51 (DRDY SERV ERR), error: 10 (IDNF )
> (ada3:ata3:0:0:0): RES: 51 10 08 07 2d 03 03 00 00 30 00
> (ada3:ata3:0:0:0): Retrying command
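In case it helps anyone reading the log above, here is a minimal Python sketch that decodes the ATA status and error bytes and tallies error flags per device. The bit names follow the standard ATA register layout as FreeBSD's CAM layer prints them; the helper names (`decode`, `tally`) and the regular expression are my own illustration and assume log lines shaped exactly like the excerpt above.

```python
import re
from collections import Counter

# ATA status register bits, named as they appear in CAM output.
STATUS_BITS = [(0x80, "BSY"), (0x40, "DRDY"), (0x20, "DF"), (0x10, "SERV"),
               (0x08, "DRQ"), (0x04, "CORR"), (0x02, "IDX"), (0x01, "ERR")]

# ATA error register bits; IDNF means "ID not found" (sector address not found).
ERROR_BITS = [(0x80, "ICRC"), (0x40, "UNC"), (0x20, "MC"), (0x10, "IDNF"),
              (0x08, "MCR"), (0x04, "ABRT"), (0x02, "NM"), (0x01, "ILI")]

# Assumed line shape: "(ada2:ata2:0:0:0): ATA status: 51 (...), error: 10 (...)"
LINE_RE = re.compile(
    r"\((\w+):\w+:\d+:\d+:\d+\): ATA status: ([0-9a-f]{2}) \(.*?\), "
    r"error: ([0-9a-f]{2})"
)

def decode(value, table):
    """Return the names of all bits set in value, highest bit first."""
    return [name for mask, name in table if value & mask]

def tally(log_text):
    """Count (device, error flag) pairs across all ATA status lines."""
    counts = Counter()
    for match in LINE_RE.finditer(log_text):
        device = match.group(1)
        error = int(match.group(3), 16)
        for flag in decode(error, ERROR_BITS):
            counts[(device, flag)] += 1
    return counts
```

Feeding it the excerpt above would show whether the IDNF errors cluster on one device or spread across several, which is the pattern I am trying to pin down.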