HDD Error Count increasing

neto_hugo · Jun 14, 2019

Hi guys

I am having some strange problems with my FreeNAS (FN) box and would appreciate your help understanding what is happening.

My system is in the signature.

I have 1 zpool with 5x6TB (on motherboard), 1 zpool with 6x1TB, 1 zpool with 6x6TB and 1 8TB HDD (on HBA board). All are WD Red.

The problem is happening on the one HDD that does not belong to any zpool. I won this drive at a WD event and decided to keep it isolated from the others, using it only for plugins, script experiments, and virtual machines.

Every day SMART shows that the error count (attribute 199) has risen dramatically, generating WARNINGs and at times even reporting the volume status as "Unknown".

At first I thought it was a problem with the HDD and requested an RMA. WD sent me another 8TB drive and I put it in my machine. To my surprise the problem continued, and as if two RMAs were not enough, I am already on my fourth 8TB HDD and the system still reports this problem.

Every time a replacement HDD arrived I ran the badblocks burn-in test (https://www.ixsystems.com/community/resources/hard-drive-burn-in-testing.92/) and every time it passed 100%.

I am running SMART long tests on all the other HDDs and so far all are OK.

Now I am thinking about the following scenarios:
1- Wrong PSU? The calculator at outervision.com/power-supply-calculator recommended a 450W supply when I simulated a machine like mine.
2- HBA board? If so, I am thinking about buying an LSI 9211.
3- Any wrong configuration in the system? Unlikely.
4- Cables? If so, that would be the cheapest fix! LOL

Have you ever been through this situation?
What could be happening?

Thank you for your help

JaimieV · Jun 14, 2019

There's some simple experimentation you can do - ZFS doesn't care what controller any pool drive is connected to, it'll find them wherever. So you can connect the 8T drive to the motherboard or a different HBA cable.

199 UDMA_CRC_Error_Count - these can go up due to bad cables, yes. It's talking about comms between the drive and the host.
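
To track this between cable or port swaps, here is a minimal Python sketch that pulls the raw value of attribute 199 out of `smartctl -A` output. The sample text and its numbers are made up for illustration, and `crc_error_count` is a hypothetical helper name, not part of FreeNAS or smartmontools.

```python
# Illustrative sample of `smartctl -A /dev/da12` output.
# The values here are invented for the example, not taken from the thread.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       1234
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       57
"""


def crc_error_count(smart_output):
    """Return the raw value of SMART attribute 199, or None if it is absent."""
    for line in smart_output.splitlines():
        fields = line.split()
        # Attribute rows have 10 columns; the first is the ID, the last the raw value.
        if len(fields) >= 10 and fields[0] == "199":
            return int(fields[9])
    return None


if __name__ == "__main__":
    print("UDMA_CRC_Error_Count raw value:", crc_error_count(SAMPLE))
```

The raw count is cumulative and does not reset when the cause is fixed, so what matters is whether it keeps climbing after a change, not the absolute number.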

Chris Moore · Jun 14, 2019

neto_hugo said:
I have 1 zpool with 5x6TB (on motherboard), 1 zpool with 6x1TB, 1 zpool with 6x6TB and 1 8TB HDD (on HBA board). All are WD Red.

You have not told us what your pool configuration is. You gave us quantities of drives, but are the pools RAIDz1, z2, something else?

neto_hugo said:
Every day SMART shows that errors (attribute 199) have risen dramatically, generating WARNINGs and at times even accusing the volume status is "Unknown".

As @JaimieV said, CRC errors are usually caused by a bad cable. I have one myself that has a single bad wire and it causes CRC errors. The cable was initially fine but started having errors after being pinched in the case. It is very likely that there was nothing wrong with the drive, just the data cable the drive was connected to.

Useful Commands
https://www.ixsystems.com/community/threads/useful-commands.30314/#post-195192

Hard Drive Troubleshooting Guide (All Versions of FreeNAS)
https://www.ixsystems.com/community...bleshooting-guide-all-versions-of-freenas.17/

neto_hugo · Jun 16, 2019

JaimieV said:
There's some simple experimentation you can do - ZFS doesn't care what controller any pool drive is connected to, it'll find them wherever. So you can connect the 8T drive to the motherboard or a different HBA cable.

199 UDMA_CRC_Error_Count - these can go up due to bad cables, yes. It's talking about comms between the drive and the host.

Thanks JaimieV, I replaced the cables today and I'm going to rerun the tests and observe the behavior!
If something fails, plan B is to switch to the motherboard controller.

Chris Moore said:
You have not told us what your pool configuration is. You gave us quantities of drives, but are the pools RAIDz1, z2, something else?

As @JaimieV said, CRC errors are usually caused by a bad cable. I have one myself that has a single bad wire and it causes CRC errors. The cable was initially fine but started having errors after being pinched in the case. It is very likely that there was nothing wrong with the drive, just the data cable the drive was connected to.

Useful Commands
https://www.ixsystems.com/community/threads/useful-commands.30314/#post-195192

Hard Drive Troubleshooting Guide (All Versions of FreeNAS)
https://www.ixsystems.com/community...bleshooting-guide-all-versions-of-freenas.17/

Thanks for the heads-up, Chris. I have already updated my signature, but to answer your question: all my zpools are raidz2.

neto_hugo · Jul 18, 2019

Guys, after many tests I am still having the same problem I reported before.

I tried 4 new cables of different brands, and even so the warnings are constant, to the point where FN no longer recognizes the HDD.

Here is some information (da12 is the 8TB HDD):

Code:

Jul 17 17:03:55 freenas scsi 0 state 0 xfer 0
Jul 17 17:03:55 freenas (da12:mps0:0:33:0): Retrying command
Jul 17 17:03:55 freenas (da12:mps0:0:33:0): READ(10). CDB: 28 00 18 71 6d 70 00 01 00 00
Jul 17 17:03:55 freenas (da12:mps0:0:33:0): CAM status: CCB request completed with an error
Jul 17 17:03:55 freenas (da12:mps0:0:33:0): Retrying command
Jul 17 17:03:55 freenas (da12:mps0:0:33:0): READ(10). CDB: 28 00 18 71 6e 70 00 01 00 00
Jul 17 17:03:55 freenas (da12:mps0:0:33:0): CAM status: CCB request completed with an error
Jul 17 17:03:55 freenas (da12:mps0:0:33:0): Retrying command
Jul 17 17:03:55 freenas (da12:mps0:0:33:0): READ(10). CDB: 28 00 18 71 70 70 00 01 00 00
Jul 17 17:03:55 freenas (da12:mps0:0:33:0): CAM status: CCB request completed with an error
Jul 17 17:03:55 freenas (da12:mps0:0:33:0): Retrying command
Jul 17 17:03:55 freenas (da12:mps0:0:33:0): READ(10). CDB: 28 00 18 71 69 70 00 01 00 00
Jul 17 17:03:55 freenas (da12:mps0:0:33:0): CAM status: SCSI Status Error
Jul 17 17:03:55 freenas (da12:mps0:0:33:0): SCSI status: Check Condition
Jul 17 17:03:55 freenas (da12:mps0:0:33:0): SCSI sense: ABORTED COMMAND asc:47,3 (Information unit iuCRC error detected)
Jul 17 17:03:55 freenas (da12:mps0:0:33:0): Retrying command (per sense data)
Jul 17 17:03:55 freenas (da12:mps0:0:33:0): READ(10). CDB: 28 00 18 71 6c 70 00 01 00 00 length 131072 SMID 1036 terminated ioc 804b loginfo 31080000 scsi 0 state 0 xfer 0
Jul 17 17:03:55 freenas (da12:mps0:0:33:0): READ(10). CDB: 28 00 18 71 6d 70 00 01 00 00 length 131072 SMID 873 terminated ioc 804b loginfo 31080000 (da12:mps0:0:33:0): READ(10). CDB: 28 00 18 71 6c 70 00 01 00 00
Jul 17 17:03:55 freenas scsi 0 state 0 xfer 0
Jul 17 17:03:55 freenas (da12:mps0:0:33:0): READ(10). CDB: 28 00 18 71 6e 70 00 01 00 00 length 131072 SMID 1006 terminated ioc 804b loginfo 31080000(da12:mps0:0:33:0): CAM status: CCB request completed with an error
Jul 17 17:03:55 freenas (da12:mps0:0:33:0): Error 5, Retries exhausted
Jul 17 17:03:55 freenas (da12:mps0:0:33:0): READ(10). CDB: 28 00 18 71 6d 70 00 01 00 00
Jul 17 17:03:55 freenas (da12:mps0:0:33:0): CAM status: CCB request completed with an error
Jul 17 17:03:55 freenas (da12:mps0:0:33:0): Error 5, Retries exhausted
Jul 17 17:03:55 freenas scsi 0 state 0 xfer 0
Jul 17 17:03:55 freenas (da12:mps0:0:33:0): READ(10). CDB: 28 00 18 71 70 70 00 01 00 00 length 131072 SMID 1061 terminated ioc 804b loginfo 31080000(da12:mps0:0:33:0): READ(10). CDB: 28 00 18 71 6e 70 00 01 00 00
Jul 17 17:03:55 freenas scsi 0 state 0 xfer 0
Jul 17 17:03:55 freenas (da12:mps0:0:33:0): CAM status: CCB request completed with an error
Jul 17 17:03:55 freenas (da12:mps0:0:33:0): Error 5, Retries exhausted
Jul 17 17:03:55 freenas (da12:mps0:0:33:0): READ(10). CDB: 28 00 18 71 70 70 00 01 00 00
Jul 17 17:03:55 freenas (da12:mps0:0:33:0): CAM status: CCB request completed with an error
Jul 17 17:03:55 freenas (da12:mps0:0:33:0): Retrying command
Jul 17 17:03:55 freenas (da12:mps0:0:33:0): READ(10). CDB: 28 00 18 71 6b 70 00 01 00 00
Jul 17 17:03:55 freenas (da12:mps0:0:33:0): CAM status: SCSI Status Error
Jul 17 17:03:55 freenas (da12:mps0:0:33:0): SCSI status: Check Condition
Jul 17 17:03:55 freenas (da12:mps0:0:33:0): SCSI sense: ABORTED COMMAND asc:47,3 (Information unit iuCRC error detected)
Jul 17 17:03:55 freenas (da12:mps0:0:33:0): Error 5, Retries exhausted
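
For anyone reading a log like this, a rough Python sketch that counts the iuCRC sense errors per device and decodes a READ(10) CDB into its starting LBA and block count. The log excerpt is copied from the lines above; the function names are hypothetical, and the byte layout assumed is the standard READ(10) one (opcode 0x28, LBA in bytes 2-5, transfer length in bytes 7-8).

```python
import re
from collections import Counter

# Excerpt copied from the log above.
LOG = """\
Jul 17 17:03:55 freenas (da12:mps0:0:33:0): READ(10). CDB: 28 00 18 71 6b 70 00 01 00 00
Jul 17 17:03:55 freenas (da12:mps0:0:33:0): CAM status: SCSI Status Error
Jul 17 17:03:55 freenas (da12:mps0:0:33:0): SCSI status: Check Condition
Jul 17 17:03:55 freenas (da12:mps0:0:33:0): SCSI sense: ABORTED COMMAND asc:47,3 (Information unit iuCRC error detected)
Jul 17 17:03:55 freenas (da12:mps0:0:33:0): Error 5, Retries exhausted
"""

DEVICE = re.compile(r"\((da\d+):")


def count_crc_errors(log_text):
    """Count 'iuCRC error detected' lines per daN device name."""
    counts = Counter()
    for line in log_text.splitlines():
        if "iuCRC error detected" in line:
            match = DEVICE.search(line)
            if match:
                counts[match.group(1)] += 1
    return counts


def decode_read10(cdb_hex):
    """Return (starting LBA, block count) from a READ(10) CDB given as hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")
    blocks = int.from_bytes(cdb[7:9], "big")
    return lba, blocks


if __name__ == "__main__":
    print(dict(count_crc_errors(LOG)))
    lba, blocks = decode_read10("28 00 18 71 6b 70 00 01 00 00")
    print("LBA", lba, "blocks", blocks)
```

Each CDB in the log has a transfer length of 0x0100, that is 256 blocks. Assuming 512-byte logical sectors, 256 x 512 = 131072 bytes, which matches the "length 131072" in the terminated-command lines. The failing reads also sit at neighbouring LBAs, so this looks like one sequential read failing in transit rather than scattered bad sectors.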

If you still think it might be a cable problem, please suggest brands and models so that I can buy them.

I'm considering buying another HBA card to split the zpools across controllers, something like 1 zpool on one board and 2 zpools on the other...

What do you think? Does that make sense?
Thanks to all who tried to help.

JaimieV · Jul 19, 2019

Perhaps it's the port that's dodgy, not the cable. Plug it into the motherboard and see. If there's no empty port, just swap it with one of the mobo-connected drives - like I mentioned above, you can hang a ZFS drive off anywhere and it'll be fine, won't break the pool or anything.

If that works okay, maybe leave it like that. No need for the pointless "neatness" of having all of a pool on one controller.

It's somewhat possible that there's some conflict between the HDD and the HBA firmware - is the HBA on the latest, 20.00.07.00? Even that's from 2014 iirc so won't be well tested with 8TB drives.
