Data corruption issues - help please!

Status
Not open for further replies.

globus999

Contributor
Joined
Jun 9, 2011
Messages
105
Hi,

The system I just built is basically:

Mobo: AX4-SPE-N AOpen
RAM: 2 GB
HDDs:

Bank #1: 4x WD EARS
Bank #2: 4 assorted (2x 1 TB, 1x 2 TB Seagate, 1x 2 TB WD EARS)

2 x PCI-to-SATA controllers: Sil 3114 based.

I created two raidz1 as follows:

tank1: using Bank #1
tank2: using Bank #2

What I am experiencing is low-level corruption that shows up on scrubs, not during normal FreeNAS operation. That is, during use I don't see a single ZFS read, write or cksum error.

When I scrub either tank, I get anywhere from 5 to 25 cksum errors on *all* HDDs, plus the odd unrecoverable error (i.e. the odd file that has to be restored from backup).
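For anyone wanting to track these counts from one scrub to the next, here is a minimal Python sketch that pulls the CKSUM column out of `zpool status`-style text. The sample text, the device names and the counts in it are made up for illustration (they are not from this system), and the parser assumes the usual NAME/STATE/READ/WRITE/CKSUM table with plain integer counts.

```python
def parse_cksum_errors(status_text):
    """Return {device_name: cksum_count} from `zpool status`-style text."""
    counts = {}
    in_table = False
    for line in status_text.splitlines():
        fields = line.split()
        if not in_table:
            # The device table starts right after this header row.
            if fields[:5] == ["NAME", "STATE", "READ", "WRITE", "CKSUM"]:
                in_table = True
            continue
        if not fields:
            break  # a blank line ends the device table
        # Keep only rows whose CKSUM column is a plain integer.
        if len(fields) >= 5 and fields[4].isdigit():
            counts[fields[0]] = int(fields[4])
    return counts


# Illustrative sample only: hypothetical devices and counts.
SAMPLE = """\
  pool: tank1
 state: ONLINE
config:

\tNAME        STATE     READ WRITE CKSUM
\ttank1       ONLINE       0     0     0
\t  raidz1    ONLINE       0     0     0
\t    ada0    ONLINE       0     0    12
\t    ada1    ONLINE       0     0     7
\t    ada2    ONLINE       0     0    25
\t    ada3    ONLINE       0     0     5

errors: No known data errors
"""

errors = parse_cksum_errors(SAMPLE)
# Devices that reported at least one checksum error.
bad = {dev: n for dev, n in errors.items() if n > 0}
print(bad)
```

Saving the result after each scrub makes it easy to see whether the errors stay on the same disks or move around between runs.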

I tried troubleshooting the following:

1 - Thermal issues: the CPU max temp under heavy load is 47 Celsius; the hottest HDD under heavy load is at 37 Celsius
2 - Disabled anything not in use in the BIOS
3 - Ran a RAM memtest (passed without a single error)
4 - Re-seated the SATA controllers

No difference - same error levels.

So then I replaced one SATA controller (tank2) with a Promise S300 TX4. I ran a scrub (it found and corrected some issues) and then ran a scrub a second time. It found more cksum errors on all HDDs!!!

Did the same scrub test with the remaining Sil3114 controller (tank1): ran scrub, fixed errors, ran scrub a second time. Found more cksum errors on all HDDs.

This indicates that the issue is not the controller.

So now I replaced the remaining Sil3114 with another Promise S300 TX4 and scrubbed. Same issues.

This indicates that the issue is not about mismatched controllers.

This would indicate that the errors are:

a - not I/O related
b - created on the fly by what??? I haven't the faintest. It can't be the HDDs, since they are half new / half old and of assorted sizes and brands.

However, now not only do I have cksum errors, but on shutdown (after 30 min; I have spin-down set to 30 min) I get the following error message (the HDD varies from shutdown to shutdown):

(ada4:ata6:0:0:0) Spin-down disk failed.

If I shut down soon after turning on (presumably while all the HDDs are still spinning) I don't get the error.

WTF!

I mean, if I use the cheapo controller I get *FEWER* problems than if I use the expensive one? The one that is widely recommended????

Also, does anybody have any ideas what else I can test? I've just run out....

Any help would be appreciated.
 
Joined
May 27, 2011
Messages
566
try the disks and controllers in a different computer. that motherboard is pretty old. just because you got no memory issues does not mean you don't have I/O issues.

I'd venture a guess that having 8 disks on one pci bus isn't helping matters. you're asking for a lot of I/O on one tiny bus. maybe place 2 on the on-board sata and 2 on one card and see how that goes.
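To put rough numbers on the "tiny bus" point, here is a back-of-the-envelope sketch in Python. It assumes a conventional 32-bit / 33 MHz PCI bus shared by every card on it; the thread does not state the bus spec for this board, so both values are assumptions. The result is a theoretical peak, and real throughput would be lower. It says something about speed only, not about whether a saturated bus can cause checksum errors.

```python
# Assumed bus parameters (not stated in the thread): conventional PCI.
PCI_BUS_WIDTH_BYTES = 4      # 32-bit bus
PCI_CLOCK_HZ = 33.33e6       # 33 MHz clock

# Theoretical peak in MB/s, shared by all devices on the bus.
PEAK_MB_PER_S = PCI_BUS_WIDTH_BYTES * PCI_CLOCK_HZ / 1e6


def per_disk_share(active_disks):
    """Even split of the theoretical peak across simultaneously active disks."""
    if active_disks < 1:
        raise ValueError("need at least one active disk")
    return PEAK_MB_PER_S / active_disks


for disks in (4, 8):
    print(f"{disks} disks: {per_disk_share(disks):.1f} MB/s each at best")
```

A scrub reads every disk in the pool at once, so it is the workload most likely to push against this ceiling.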
 

globus999

Contributor
Joined
Jun 9, 2011
Messages
105
try the disks and controllers in a different computer. that motherboard is pretty old. just because you got no memory issues does not mean you don't have I/O issues.

I'd venture a guess that having 8 disks on one pci bus isn't helping matters. you're asking for a lot of I/O on one tiny bus. maybe place 2 on the on-board sata and 2 on one card and see how that goes.

Re: mobo, yes, it's possible, but if that were the case I would be seeing I/O issues during use, which I don't. And I don't mean just a few of them, I mean I see zilch, zero, nada. It's only when I scrub that I start to see problems. I went through a massive data migration with absolutely no I/O errors. I was pumping data as fast as I could and nope, not one. Strange...

Wrt disks and controllers, yes, 8 on one bus is quite a bit; however, I am not using the data inside massively. Worst case scenario, I am accessing one tank at a time. That would be only 4 HDDs being accessed simultaneously. Heck! My lousy Windoze 2000 can do that, speaking of obsolete tech :smile:. I'll try 2 + 2 and see how it goes, but I am not too optimistic.

I also bought Amphenol SATA cables (gossip has it that crappy cables can also cause this), but I am not holding my breath (haven't received them yet).

At this point, there is very little to replace other than the mobo.... ugh....

Million tx! for your suggestions. Will try and report back.
 

globus999

Contributor
Joined
Jun 9, 2011
Messages
105
I'd venture a guess that having 8 disks on one pci bus isn't helping matters. you're asking for a lot of I/O on one tiny bus. maybe place 2 on the on-board sata and 2 on one card and see how that goes.

Well, nix on the 2+2 test. Same issues. BTW, during this test I switched tank1 to the 2+2 configuration but made sure tank2 wasn't in use (so it was not using PCI bandwidth).

Oh, just in case, I checked the BIOS on the TX4 and it is the latest.
 