Checksum errors returning after scrubs

Status
Not open for further replies.

guldan

Dabbler
Joined
Nov 25, 2012
Messages
29
Hey Guys,

I made a post the other day regarding this, but this is a different question. Essentially, I've had this 4-disk raidz1 running for 1.5 years with no issues. I then upgraded FreeNAS to 8.3 and my zpool to version 28. Within a week all my disks were showing checksum errors; after a full scrub the per-disk counts were 2/7/1/2.

Here is my logic: it can't be the disks or cables, because previously all of them showed 0 errors, and they can't all have been damaged at once. For that reason I doubt it's hardware at all, although if it is, it's probably the RAM or the controller. That same week FreeNAS rebooted randomly (perhaps it crashed), so I am thinking that crash created some file corruption unrelated to the hardware.
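The reasoning above (errors on every disk at once point away from a single disk or cable and toward something the disks share) can be written down as a rough rule of thumb. This is a minimal Python sketch under stated assumptions, not a ZFS feature: `error_pattern` and the device names are hypothetical, and the input is a dictionary you would fill in by hand from the CKSUM column of `zpool status`.

```python
def error_pattern(cksum_by_device):
    """Classify how checksum errors are spread across a pool's disks.

    Rule of thumb (an assumption, not something ZFS reports): errors on a
    single disk point at that disk, its cable or its port; errors on every
    disk point at something they share, such as RAM, the controller or the
    power supply.
    """
    affected = sorted(dev for dev, count in cksum_by_device.items() if count > 0)
    if not affected:
        return "clean"
    if len(affected) == 1:
        return "single device: " + affected[0]
    if len(affected) == len(cksum_by_device):
        return "all devices"
    return "some devices: " + ", ".join(affected)


# Hypothetical device names; the counts are the 2/7/1/2 from the first post.
print(error_pattern({"ada0": 2, "ada1": 7, "ada2": 1, "ada3": 2}))  # all devices
```

A pattern like "all devices" narrows the search but does not prove anything on its own; it only tells you which of the suggestions in this thread to try first.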

I ran a full scrub and it said it repaired 440K of errors. The errors came back; I'm running another full scrub and they are still there. Can anyone shed some light on this?

Thx
 

pete_c20

Dabbler
Joined
Nov 23, 2012
Messages
23
I'm new to FreeNAS (still reading!) but not new to computers.
My first thought is: what changed between working and non-working? That spawns these further thoughts:
1) The upgrade noticed something that was always there but had gone undetected.
2) The upgrade has messed something up.
3) The random reboot was actually the hallmark of a hardware problem.

I can't help you with 1 and 2, but as for 3...

How about running memtest86 (it runs nicely off a bootable USB stick) to see if it shows any errors? It's a simple way of doing a basic hardware integrity check. Errors there may point to the RAM or the power supply (the latter is sometimes a menace to diagnose). Leave it to bake overnight. It's an easy test to run and shouldn't disturb anything.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
That's a great starting point. I want to expand upon it:

1.5 years, eh. Check all your fans. A failed fan could easily result in insufficient airflow, resulting in EZ-Bake Oven time. The cheap fans often found in lots of gear seem to have an expected lifetime of about 12 months. They might not stop spinning; they might just get gummed up and slow down. Or they make the awful, dreaded buzz-saw-of-doom sound. Any of those can cause the silicon to get too warm. Blow out any DUST that might be building up, especially on heatsinks, and especially your CPU's heatsink.

Also, check SMART to see how your drives are doing. You can use "smartctl -a /dev/adaN", where adaN is one of your disk devices. Look at things like Temperature_Celsius to see whether they're running hot. Are any other problems reported?
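Reading the attribute table by eye across several disks gets tedious, so here is a minimal Python sketch that pulls RAW_VALUE out of `smartctl -a` (or `smartctl -A`) text and flags a few attributes. Assumptions: the function names, the 45 °C threshold and the list of watched attributes are our own choices, not anything smartctl defines. UDMA_CRC_Error_Count is included because interface CRC errors often point at cabling or connectors rather than the disk itself.

```python
import re

# Attributes that often flag trouble; this list is our own choice, not smartctl's.
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector",
           "Offline_Uncorrectable", "UDMA_CRC_Error_Count")


def parse_smart_attributes(text):
    """Map ATTRIBUTE_NAME to the leading integer of its RAW_VALUE column."""
    attrs = {}
    in_table = False
    for line in text.splitlines():
        if line.lstrip().startswith("ID#"):
            in_table = True  # header row of the attribute table
            continue
        if not in_table:
            continue
        fields = line.split()
        # Rows have 10+ columns and a numeric ID; anything else ends the table.
        if len(fields) < 10 or not fields[0].isdigit():
            break
        # RAW_VALUE can look like "31 (Min/Max 20/45)", so keep the leading number.
        match = re.match(r"\d+", " ".join(fields[9:]))
        if match:
            attrs[fields[1]] = int(match.group())
    return attrs


def smart_warnings(attrs, max_temp_c=45):
    """List attributes worth a closer look; max_temp_c is an assumed threshold."""
    warnings = []
    temp = attrs.get("Temperature_Celsius")
    if temp is not None and temp > max_temp_c:
        warnings.append(f"Temperature_Celsius={temp}")
    for name in WATCHED:
        if attrs.get(name, 0) > 0:
            warnings.append(f"{name}={attrs[name]}")
    return warnings
```

To use it, capture the output of `smartctl -a /dev/adaN` for each disk (for example with `subprocess.run([...], capture_output=True, text=True)`) and pass the text to `parse_smart_attributes`, then to `smart_warnings`. Attribute names vary somewhat between drive vendors, so a missing attribute is simply skipped rather than reported.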

If that all looks okay, consider your power supply, as pete_c20 suggested. Particularly if it's a generic supply, open it up and look for bulging capacitors (also inspect your mainboard while you're in there). ANY bulging capacitors (Google Images has dandy examples) indicate that something in your system isn't going to work right. This is more likely on mainboards made more than five years ago, but it isn't unheard-of even today, especially if you happen to have cheap, no-name mainboards or power supplies.

It doesn't hurt to remove all your cards and memory modules at a static-safe workstation and reseat them all, either. Wait, am I showing my age and paranoia? :D
 

Ken Almond

Dabbler
Joined
May 11, 2014
Messages
19
When I first installed FreeNAS (running 8.3.1) it worked for a few weeks, and then I got a lot of read/write/checksum errors on one particular drive. I replaced the drive, but still had problems. I replaced the SATA cables with expensive ones; no good. I have 5 drives (raidz1 with a single volume) and 6 SATA connectors on the motherboard, so I moved the 'bad drive' to a different SATA connector (the 'external SATA' connector on the motherboard) and voilà, the problems have been gone for over 6 months now.

Today, after a massive 2TB copy onto the system over the last week, I see about 200 checksum errors after a scrub. After a clear, a second scrub resulted in another 200 checksum errors (the read/write counts are OK). So... I'm wondering how 'bad' this is. I'll try the smartctl tests, re-scrub, and see what happens.
 

pete_c20

Dabbler
Joined
Nov 23, 2012
Messages
23
As the problems are a little strange and, from what you've said, don't respond to some first-line attempts to clear them, have you checked things like your power supply? When things are borderline, it can lead to some very strange problems that seem to be cleared by some actions, only to return later on.
 

Ken Almond

Dabbler
Joined
May 11, 2014
Messages
19
UPDATE: An additional (third) scrub shows all clear, so I don't see any reason to panic yet. I did part of the massive copy (about 1TB of it) with sync=disabled (temporarily) to speed up performance. I wonder whether this could have caused a few checksum errors.
 state: ONLINE
  scan: scrub repaired 0 in 10h29m with 0 errors on Tue May 13 19:35:10 2014
config:

        NAME                                            STATE     READ WRITE CKSUM
        aeraidz                                         ONLINE       0     0     0
          raidz1-0                                      ONLINE       0     0     0
            gptid/299b793d-0c5a-11e3-a299-001cc09fd586  ONLINE       0     0     0
            gptid/6de19cf7-c24a-11e2-99c4-001cc09fd586  ONLINE       0     0     0
            gptid/6e63ebf4-c24a-11e2-99c4-001cc09fd586  ONLINE       0     0     0
            gptid/6ee70e0c-c24a-11e2-99c4-001cc09fd586  ONLINE       0     0     0
            gptid/6f6a81c7-c24a-11e2-99c4-001cc09fd586  ONLINE       0     0     0
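When you are re-scrubbing repeatedly to see whether errors come back, a small script can pick out the scrub summary and any device with non-zero counters from `zpool status` output like the one posted in this thread. This is a minimal Python sketch; the function names are hypothetical, and the regular expression is fitted to the "scrub repaired ... in ... with N errors" wording shown here, so other ZFS versions may need it adjusted.

```python
import re

# Vdev states as printed in the STATE column of `zpool status`.
VDEV_STATES = {"ONLINE", "DEGRADED", "FAULTED", "OFFLINE", "UNAVAIL", "REMOVED"}

# Fitted to "scrub repaired 0 in 10h29m with 0 errors"; the lazy (.+?)
# also tolerates a duration that contains spaces.
SCRUB_RE = re.compile(r"scrub repaired (\S+) in (.+?) with (\d+) errors")


def scrub_summary(status_text):
    """Return the last scrub's repaired amount, duration and error count, or None."""
    match = SCRUB_RE.search(status_text)
    if match is None:
        return None
    return {"repaired": match.group(1), "duration": match.group(2),
            "errors": int(match.group(3))}


def devices_with_errors(status_text):
    """Return {name: (READ, WRITE, CKSUM)} for config rows not all "0".

    Counters stay strings because zpool status may abbreviate large counts
    (e.g. with a K suffix) instead of printing a plain integer.
    """
    flagged = {}
    for line in status_text.splitlines():
        fields = line.split()
        # Config rows look like: NAME STATE READ WRITE CKSUM
        if len(fields) >= 5 and fields[1] in VDEV_STATES:
            counters = tuple(fields[2:5])
            if counters != ("0", "0", "0"):
                flagged[fields[0]] = counters
    return flagged
```

For the output above, `scrub_summary` reports 0 errors and `devices_with_errors` returns an empty dictionary. Running it after each scrub and comparing the results shows whether errors keep landing on the same device.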
 