Specific SATA port causes CRC errors leading to pool degradation

f26

Cadet
Joined
Jun 4, 2023
Messages
6
Hello dear community,

a few weeks ago I decided to finally get into self-hosting. As I had some experience building PCs, I figured I'd go with a self-assembled build instead of just buying a prebuilt.
Unfortunately, I immediately got some write errors after creating the pool for the first time, which led to a degraded pool. The SMART logs showed CRC errors, which after a quick Google search indicated an issue with the SATA cables. No big deal, I thought, but after changing the cables multiple times the issue kept popping up.

After a few days of swapping cables, swapping drives, and running SMART tests, I narrowed it down to SATA_5 on my motherboard. That seems to be the only port introducing those issues. All other ports, including the OCuLink ones, ran for days without a single issue. The only post I found that describes a similar issue appears to be this one.
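
For anyone who wants to repeat this kind of port-by-port comparison, here is a minimal sketch of how the CRC counter could be read from saved `smartctl -A` output before and after moving a drive to a suspect port. It assumes the drive reports interface CRC errors as SMART attribute 199 with the raw value in the tenth column, which is common but not guaranteed for every model. The helper name `crc_error_count` and the two sample rows are made up for illustration and are not taken from the attached log.

```python
import re


def crc_error_count(smartctl_attrs: str):
    """Return the raw value of SMART attribute 199, or None if it is absent.

    Assumption: attribute 199 is the interface CRC error counter and the
    raw value sits in the 10th whitespace-separated column of the row.
    """
    for line in smartctl_attrs.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[0] == "199":
            match = re.match(r"\d+", fields[9])
            return int(match.group()) if match else None
    return None


# Hypothetical sample rows (illustrative values only).
before = "199 CRC_Error_Count 0x003e 100 100 000 Old_age Always - 0"
after = "199 CRC_Error_Count 0x003e 099 099 000 Old_age Always - 37"

delta = crc_error_count(after) - crc_error_count(before)
print(f"CRC errors added while on this port: {delta}")
```

A counter that only climbs while a drive sits on one particular port, and stays flat on every other port, points at the port or its wiring rather than at the drive.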

I have now attached my boot drive to the suspected faulty SATA port, and it did not immediately cause CRC errors. I found that very strange tbh, because with the storage drives I did not even have to wait 20 minutes for the storage pool to degrade.

Do any of you have an idea whether that's simply a bad mobo controller, or is there something else I can do about it?
The biggest issue is that the motherboard is sold out everywhere so my retailer could only offer me a refund and not a replacement.

I've put my setup in my signature for easy referencing.

Br,
Felix
 

Attachments

  • truenas_smart (5).txt
    29.3 KB · Views: 79

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Post your hardware, motherboard, RAM, drives, etc. Off the top of my head I would think you may have SATA_5 shared with an NVMe or something like that. Some MB's have that.
 

f26

Cadet
Joined
Jun 4, 2023
Messages
6
Post your hardware, motherboard, RAM, drives, etc. Off the top of my head I would think you may have SATA_5 shared with an NVMe or something like that. Some MB's have that.
I have attached it to my signature as recommended in the rules (I also stated this in the OP), but you may be right that it's a good idea to mention it in the thread too.

TrueNAS-SCALE-22.12.2
ASRock Rack C246 WSI
i3 9100
64 GB KINGSTON ECC Ram
3x 4TB SAMSUNG PM893 (MZ7L33T8HBLT-00A07) in RAIDZ1
480 GB KINGSTON SEDC500M480G Boot Drive
SilverStone DS380
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Some people unfortunately can't open the link, but I just overlooked it.

The drive data looks fine. Nothing bad. I'm looking into the MB now.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
I have now attached my boot drive to the suspected faulty SATA port, and it did not immediately cause CRC errors. I found that very strange tbh, because with the storage drives I did not even have to wait 20 minutes for the storage pool to degrade.
I just want to be clear and not assume I understand you correctly: you connected the 480GB Kingston drive to SATA_5, rebooted the system, and about 20 minutes later the "boot-pool" showed as degraded? The text data you provided shows the boot-pool as fine. Did you repair it? I'm not doubting the failure, I just want to ensure I understand what is happening. And I do think you did the correct thing in isolating the problem to the SATA_5 port.

What happens if you do not use SATA_5 at all? You could do that since you only have 4 SATA drives in total. I'm not asking you to ignore the SATA_5 issue, but I'm asking if the problem comes back at all. Will it run for days without an issue?
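
To confirm which drives are actually off the suspect port, one option on a Linux-based system such as SCALE is to look at the sysfs path behind each disk, which normally contains an `ataN` component for drives on an AHCI controller. A rough sketch follows; the function names and the example path are made up for illustration, and the kernel's `ataN` numbering is an assumption that does not necessarily match the SATA_5 label printed on the board, so it would need to be cross-checked once by plugging a known drive into that port.

```python
import os
import re


def ata_link(sysfs_path: str):
    """Extract N from the 'ataN' component of a resolved sysfs path, or None."""
    match = re.search(r"/ata(\d+)/", sysfs_path)
    return int(match.group(1)) if match else None


def ata_link_for_disk(disk: str):
    """Resolve /sys/block/<disk> and report its ATA link number (Linux only)."""
    return ata_link(os.path.realpath(f"/sys/block/{disk}"))


# Hypothetical resolved path, shown only to illustrate the layout.
example = (
    "/sys/devices/pci0000:00/0000:00:17.0/ata5/host4/"
    "target4:0:0/4:0:0:0/block/sda"
)
print(ata_link(example))
```

On the real machine, `ata_link_for_disk("sda")` would be called for each pool member. Devices that are not behind an ATA link (NVMe, virtual devices) simply return `None`.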

I would however highly suggest, if the vendor is offering to give you your money back, TAKE IT. You can wait for another motherboard to become available rather than have a bad motherboard. If you still want to keep it, you can, but you would have to accept the problem.

Someone else may be able to offer more help with this motherboard issue but unless something else pops into my mind, I'm out of ideas.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Last thing, the SSDs' temperatures seem a bit high. Are they getting enough airflow? They are not crazy high but mine usually run under 40C.
 

f26

Cadet
Joined
Jun 4, 2023
Messages
6
I just want to be clear and not assume I understand you correctly: you connected the 480GB Kingston drive to SATA_5, rebooted the system, and about 20 minutes later the "boot-pool" showed as degraded? The text data you provided shows the boot-pool as fine. Did you repair it? I'm not doubting the failure, I just want to ensure I understand what is happening. And I do think you did the correct thing in isolating the problem to the SATA_5 port.

What happens if you do not use SATA_5 at all? You could do that since you only have 4 SATA drives in total. I'm not asking you to ignore the SATA_5 issue, but I'm asking if the problem comes back at all. Will it run for days without an issue?

I would however highly suggest, if the vendor is offering to give you your money back, TAKE IT. You can wait for another motherboard to become available rather than have a bad motherboard. If you still want to keep it, you can, but you would have to accept the problem.

Someone else may be able to offer more help with this motherboard issue but unless something else pops into my mind, I'm out of ideas.
Hmm, I see, maybe I phrased that poorly. Many thanks for the help you have provided so far!

The boot pool never showed as degraded; the storage pool (TrueSSD) did.
That is one of the weird things, namely that the boot drive (the Kingston) never showed any CRC errors, even when connected to SATA_5.

The other SSDs, however (the Samsung ones), always run into CRC errors after being connected to SATA_5 (it only takes them a few minutes). Currently, all Samsung drives are connected to the motherboard via the OCuLink connector, which has never caused any issues.

Actually, the error that shows in the UI only states a write error and that the drive connected to SATA_5 is either degraded or faulted. But I can observe that the CRC errors increment, so I guess the write errors are caused by the CRC errors.
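
The same per-device counters the UI summarizes can also be read from `zpool status`. Below is a minimal sketch that pulls out the devices with non-zero READ/WRITE/CKSUM counters from saved output. It assumes the usual column layout (NAME, STATE, READ, WRITE, CKSUM); the function name `devices_with_errors`, the device names, and the counter values are made up for illustration and are not the actual output from this system.

```python
def devices_with_errors(zpool_status: str):
    """Return {device: (read, write, cksum)} for rows with a non-zero counter.

    Rows whose counters are abbreviated (for example '1.2K') are skipped,
    since this sketch only parses plain integers.
    """
    flagged = {}
    in_config = False
    for line in zpool_status.splitlines():
        fields = line.split()
        if fields[:2] == ["NAME", "STATE"]:
            in_config = True  # device table starts after this header row
            continue
        if not in_config or len(fields) < 5:
            continue
        name, counters = fields[0], fields[2:5]
        if all(c.isdigit() for c in counters):
            read, write, cksum = map(int, counters)
            if read or write or cksum:
                flagged[name] = (read, write, cksum)
    return flagged


# Hypothetical sample output (illustrative device names and counts).
sample = """\
  pool: TrueSSD
 state: DEGRADED
config:

    NAME        STATE     READ WRITE CKSUM
    TrueSSD     DEGRADED     0     0     0
      raidz1-0  DEGRADED     0     0     0
        sda     ONLINE       0     0     0
        sdb     ONLINE       0     0     0
        sdc     FAULTED      0    14     0  too many errors

errors: No known data errors
"""

print(devices_with_errors(sample))
```

Lining up the flagged device with the drive whose SMART CRC counter is climbing is what supports the guess that the write errors and the CRC errors share one cause.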

Regarding the logs: yes, I cleared the pool after each cable/port/drive swap to ensure that I catch every new issue, so that's why the pool says it's fine (strangely enough, switching ports often leads to the pool saying it's fine without my having to clear it).

I already wrote to ASRock and I'm hoping for an RMA, as waiting for the motherboard to become available again could take months in my region, so that's not really an option.

Last thing, the SSDs' temperatures seem a bit high. Are they getting enough airflow? They are not crazy high but mine usually run under 40C.

The temps are that high only because the cage currently sits outside of the case, so there is not really any active airflow cooling the drives. When they were in the case, the temps never went above 41.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
You might also look into whether there is a firmware update for the Samsung drives. You currently have version JXTC304Q. If there is an update, I'd recommend updating to see if that fixes the problem.
 

f26

Cadet
Joined
Jun 4, 2023
Messages
6
You might also look into whether there is a firmware update for the Samsung drives. You currently have version JXTC304Q. If there is an update, I'd recommend updating to see if that fixes the problem.
I have already looked into that, but I can't find any firmware for those drives. Very strange.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Your current firmware version is:
Code:
Firmware Version: JXTC304Q


I did a search on the version number and the model number and found nothing either. I was hoping a firmware update would just fix all your problems, but I guess not.

I hope you are able to get the replacement motherboard and it all works great.
 

Constantin.FF

Dabbler
Joined
Apr 6, 2022
Messages
13
I have similar issues with my SSDs.
It started with a single port that I suspected, but a new SSD on another SATA port started giving the same error after a couple of months.
After too many CRC errors, the disks stopped being recognized, even in the BIOS.
I still have not resolved it.
 