ASRock C2550CD4I corrupting files

Status
Not open for further replies.

mortar

Dabbler
Joined
Oct 5, 2015
Messages
25
Hello,

I have a FreeNAS box, luckily still in testing, that, on top of other problems with FreeNAS and crappy network filesystems (good ones don't actually seem to exist), also seems to corrupt files. The box is based on an ASRock C2550CD4I (similar to C2570CD4I) motherboard, which has both Intel and Marvell SATA controllers, and it has 16GB of ECC memory. I have 3 disks in RAIDZ, all connected to the Marvell 88SE9230 AHCI SATA controller. The FreeNAS version is FreeNAS-9.3-STABLE-201509022158.

Some weeks ago, ZFS seems to have hit corruption during a scrub.

Code:
zpool status -v

reports it found the corrupt file on Sun Nov 22 02:40:01 2015.

The only problem in the kernel message log is the following.

Code:
Nov 22 01:20:58 xxx ahcich4: Timeout on slot 20 port 0
Nov 22 01:20:58 xxx ahcich4: is 00000000 cs 00100000 ss 00000000 rs 00100000 tfd 40 serr 00000000 cmd 10009417


The timeouts seem to have happened a few times a month. (The box has not been in heavy use.)
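For anyone trying to read the register dump in the second log line, here is a minimal Python sketch that decodes it. It assumes the fields follow FreeBSD's ahci(4) timeout message, with is = PxIS, cs = PxCI (commands issued but not yet completed), ss = PxSACT, rs = the driver's own running-slot bitmap and tfd = PxTFD; the helper names are made up for illustration. Under that reading, bit 20 set in both cs and rs is the slot-20 command that never completed, and tfd 40 means only DRDY is set (no BSY, no ERR), which would be consistent with a command getting lost between controller and disk rather than the drive reporting an error.

```python
import re

# Assumed mapping (FreeBSD ahci(4) timeout message), all values hexadecimal:
#   is = PxIS, cs = PxCI, ss = PxSACT, rs = driver's running slots,
#   tfd = PxTFD, serr = PxSERR, cmd = PxCMD.
FIELD_RE = re.compile(r"\b(is|cs|ss|rs|tfd|serr|cmd) ([0-9a-fA-F]+)\b")
CHANNEL_RE = re.compile(r"\b(ahcich\d+):")

# Standard ATA status register bits (low byte of PxTFD).
ATA_BSY, ATA_DRDY, ATA_ERR = 0x80, 0x40, 0x01


def set_bits(value: int) -> list[int]:
    """Indices of the bits set in a 32-bit slot bitmap (bit n = command slot n)."""
    return [n for n in range(32) if (value >> n) & 1]


def decode_ahci_timeout(line: str) -> dict:
    """Decode the register dump of an ahcich timeout message (hypothetical helper)."""
    fields = {name: int(value, 16) for name, value in FIELD_RE.findall(line)}
    channel = CHANNEL_RE.search(line)
    status = fields.get("tfd", 0) & 0xFF
    return {
        "channel": channel.group(1) if channel else None,
        "issued_slots": set_bits(fields.get("cs", 0)),
        "running_slots": set_bits(fields.get("rs", 0)),
        "ncq_active_slots": set_bits(fields.get("ss", 0)),
        "busy": bool(status & ATA_BSY),
        "ready": bool(status & ATA_DRDY),
        "error": bool(status & ATA_ERR),
        "serr": fields.get("serr", 0),
    }


if __name__ == "__main__":
    log = ("Nov 22 01:20:58 xxx ahcich4: is 00000000 cs 00100000 ss 00000000 "
           "rs 00100000 tfd 40 serr 00000000 cmd 10009417")
    print(decode_ahci_timeout(log))
```

The same function can be run over every timeout line in the log to see whether the stuck slot and status look the same each time.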

I found postings about similar timeout problems with the Marvell controller, as well as a Linux kernel developer noticing the controller misbehaving with some IOMMUs. So, as is the norm, the purpose-built box turns out to be worse for its purpose (storage) than anything I had before it.

Now, the FreeNAS-specific question I have is: how can this happen? Supposing that the timeout is related to the corrupted file, only one channel (disk) timed out. Shouldn't RAIDZ still manage with two disks out of three without losing anything?
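The expectation behind that question can be illustrated with plain XOR parity. This is a simplified model, not ZFS's actual on-disk layout: real RAIDZ1 uses variable-width stripes and per-block checksums, but with one parity column per stripe, any single missing column (for example, a disk that timed out) can be rebuilt from the other two, while two missing columns cannot. The function names are hypothetical.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def write_stripe(d0: bytes, d1: bytes) -> list:
    """Simplified 3-disk single-parity stripe: two data columns plus XOR parity."""
    return [d0, d1, xor_blocks(d0, d1)]


def read_stripe(columns: list) -> list:
    """Return the two data columns, rebuilding at most one missing column.

    A missing column is passed as None (e.g. the disk timed out). Because
    d0 ^ d1 ^ parity == 0, any one column equals the XOR of the other two.
    """
    missing = [i for i, c in enumerate(columns) if c is None]
    if len(missing) > 1:
        raise ValueError("more than one column missing: stripe unrecoverable")
    columns = list(columns)
    if missing:
        present = [c for c in columns if c is not None]
        columns[missing[0]] = xor_blocks(*present)
    return columns[:2]


if __name__ == "__main__":
    stripe = write_stripe(b"hello", b"world")
    stripe[1] = None  # disk 1 timed out
    print(read_stripe(stripe))  # [b'hello', b'world']
```

In this model, a single timed-out disk costs nothing, which is why a single-channel timeout alone should not have produced a permanent error.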

And another question: should the pool keep working if I switch the disks over to the Intel controller, i.e. with all disks attached to different SATA ports than before and possibly getting different device names? (I'm not that familiar with FreeBSD /dev, though.)

FWIW, the disks are all fine according to extended SMART tests, which found no errors of any kind.

Of course, if anyone has helpful pointers or experience with these motherboards and FreeNAS/FreeBSD, I'd be glad to hear about that as well. I will contact ASRock, but thought I'd ask here first, as this might be specific to FreeBSD (e.g. drivers written from guesswork for lack of documentation).
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Unplug the disks from the Marvell controller and plug them into the Intel. This will have no effect on anything other than to make your life better in a non-Marvell universe.
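Why moving the disks is harmless: ZFS identifies pool members by the labels written on the disks themselves (a pool GUID and a per-vdev GUID), not by /dev names, and FreeNAS additionally builds pools on gptid/ labels rather than raw device nodes. The toy Python model below illustrates only that idea; the Label class, the assemble function, the device paths and the GUID values are all hypothetical, and this is not the real import code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Label:
    """Hypothetical stand-in for the ZFS label stored on each member disk."""
    pool_guid: int
    vdev_guid: int


def assemble(pool_guid: int, expected_vdevs: set[int],
             devices: dict[str, Optional[Label]]) -> dict[int, str]:
    """Map each expected vdev GUID to the device node that currently holds it.

    `devices` maps device paths to the Label read from them (None if the disk
    carries no label). Paths only record where a member happened to be found.
    """
    found = {}
    for path, label in devices.items():
        if label is not None and label.pool_guid == pool_guid and label.vdev_guid in expected_vdevs:
            found[label.vdev_guid] = path
    missing = expected_vdevs - set(found)
    if missing:
        raise LookupError(f"missing vdevs: {sorted(missing)}")
    return found


if __name__ == "__main__":
    POOL = 0x5EED
    MEMBERS = {0xA1, 0xB2, 0xC3}
    # Hypothetical device names on the Marvell ports, then after moving to Intel ports.
    on_marvell = {"/dev/ada3": Label(POOL, 0xA1), "/dev/ada4": Label(POOL, 0xB2),
                  "/dev/ada5": Label(POOL, 0xC3), "/dev/ada0": None}
    on_intel = {"/dev/ada0": Label(POOL, 0xC3), "/dev/ada1": Label(POOL, 0xA1),
                "/dev/ada2": Label(POOL, 0xB2), "/dev/ada3": None}
    print(assemble(POOL, MEMBERS, on_marvell))
    print(assemble(POOL, MEMBERS, on_intel))
```

Both calls find all three members; only the paths they are reported under change.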

Your message doesn't indicate what the corruption was, but yes, RAIDZ should have protected the data unless something went awry on two drives, which is possible.
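The "two drives" caveat can be made concrete. Unlike a timeout, silent corruption does not say which disk returned bad data, so ZFS relies on the block checksum to pick the correct reconstruction. The sketch below assumes two data columns plus one XOR parity column and uses SHA-256 as a stand-in for the block checksum (ZFS of this era typically defaults to fletcher4); the function names are hypothetical. With one bad column, one of the reconstructions matches the checksum and the read heals; with two bad columns none do, which is the kind of case zpool status -v reports as a permanent error in a file.

```python
import hashlib
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


def checksum(data: bytes) -> bytes:
    # Stand-in for the checksum ZFS keeps in the parent block pointer.
    return hashlib.sha256(data).digest()


def self_healing_read(columns: list, expected: bytes) -> bytes:
    """columns = [d0, d1, parity]; expected = checksum of d0 + d1.

    Try the data as read; if the checksum fails, assume each column in turn is
    the bad one, rebuild it from the other two, and return the first candidate
    whose checksum matches.
    """
    d0, d1, _ = columns
    if checksum(d0 + d1) == expected:
        return d0 + d1
    for bad in range(3):
        others = [c for i, c in enumerate(columns) if i != bad]
        rebuilt = list(columns)
        rebuilt[bad] = xor_blocks(*others)
        candidate = rebuilt[0] + rebuilt[1]
        if checksum(candidate) == expected:
            return candidate
    raise ValueError("checksum mismatch in every combination: permanent error")


if __name__ == "__main__":
    d0, d1 = b"good", b"data"
    parity = xor_blocks(d0, d1)
    expected = checksum(d0 + d1)
    print(self_healing_read([d0, b"d4t4", parity], expected))  # one bad column: healed
    try:
        self_healing_read([b"bad!", b"d4t4", parity], expected)
    except ValueError as err:
        print(err)  # two bad columns: unrecoverable
```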
 

HeinzApfel

Dabbler
Joined
Nov 24, 2014
Messages
13
I found the exact same problem with my board. The timeout actually appeared when smartctl was accessing a drive while some copying was going on: the timeout appeared instantly, and the filesystem became corrupted over time.

I then disabled all SATA controllers on the C2550CD4I and installed an LSI SAS 9201-16i HBA. Since then, the system has been running totally stable with 16 disks.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Intel SATA is perfectly reliable, so no need to mess with it.
 

mortar

Dabbler
Joined
Oct 5, 2015
Messages
25
After a couple of months with all storage disks on the Intel controller, as jgreco suggested above, I can say that the problem seems to be solved. ASRock support was very prompt and helpful and suggested updating the Marvell controller's firmware, which I did, but I did not care to start experimenting with it again.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
ASRock's in a bit of a hard place there, because it isn't really their fault the Marvell controller has poor driver support in FreeBSD, and the Marvell controller works fine for some other operating systems. Best avoided if possible.
 