ECC vs non-ECC RAM and ZFS

Status
Not open for further replies.

panz

Guru
Joined
May 24, 2013
Messages
556
One test I'm thinking about running...
So here's my plan. I have a system, my old FreeNAS box. It has a Westmere socket 1366 Xeon and can use ECC or non-ECC RAM. I'm thinking I might pull out a few spare working drives and make a pool, fill that sucker up with fake data and do some tests to see what is going on. Since all of the hardware will stay the same except the RAM it might be worthwhile to see what the result is. I'm fortunate to have this board and CPU as I can and have used both ECC and non-ECC RAM and I have confirmed that the ECC does work when I use ECC RAM. Hmm.... how devoted am I to this?

I've built the system in my signature + another one with MSI C847MS-E33 and 2 old 1TB HDs (Samsung and Seagate, ZFS mirrored). Both are non-ECC systems.

Before and after each backup of my Windows server, I run the following checks:
  1. zpool scrub
  2. HCI memtest
  3. Memtest86+
  4. Spinrite
The result is zero errors on both systems in all of the above tests. I built both machines myself, and I think one assembly precaution is very important: ESD protection.

RAM is an electrostatic-discharge-sensitive piece of hardware (not to mention the motherboard and other electronic components), so a good SOP is needed: anti-static mat, wrist straps, ESD-certified tools, correct handling, ambient humidity, ESD-certified boxes, etc.
 

Richman

Patron
Joined
Dec 12, 2013
Messages
233
One test I'm thinking about running...

The very first FreeNAS box I built had non-ECC RAM. It had some weird behavior. Every scrub it would find a handful of checksum errors on every disk. Mind you, every disk passed every test I ever threw at them. But every scrub, every disk (all were 1.5TB disks I believe) would have 5-30 CHKSUM errors if you did a zpool status. Usually 200-500Kb were "repaired" and that was that. Nobody really understands what was going on or why it does it.
Now is your chance, man, to really understand. No more sleepless nights, CJ. You can find out and all will be well with the universe again.
Today I noticed something. A friend is using ZFS on Linux with a non-ECC system(don't ask...) and I noticed when I do a zpool status on his Linux box with ZFS you get a "scrub repaired 432k". I save his zpool status every night for historical purposes. Well, I went back and looked and every scrub he's run since his desktop was built 8 months ago has 150-500kb that were "repaired". I'm wondering if there is a link between these "repairs" and non-ECC. Could this be a way to definitively prove that non-ECC is more destructive than we think?
So neither you nor anybody else knows whether the '150-500kb that were "repaired"' were actually repaired or if it's actual damage? Is that what you're saying?

So here's my plan. I have a system, my old FreeNAS box. It has a Westmere socket 1366 Xeon and can use ECC or non-ECC RAM. I'm thinking I might pull out a few spare working drives and make a pool, fill that sucker up with fake data and do some tests to see what is going on. Since all of the hardware will stay the same except the RAM it might be worthwhile to see what the result is. I'm fortunate to have this board and CPU as I can and have used both ECC and non-ECC RAM and I have confirmed that the ECC does work when I use ECC RAM. Hmm.... how devoted am I to this?

You're very devoted. So devoted you can smell it. You can taste it even. Do it CJ, do it. I know you want to. You're itching so sooooo bad to find out once and for all what is going on.
Meanwhile, we will all stay tuned, waiting for next week's episode when CJ will say, finally, after a long, peaceful and restful night's sleep, "Eureka, I figured it out!!"
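
For anyone who wants to follow the idea quoted above (saving zpool status output every night and looking at the "scrub repaired" figure over time), here is a rough Python sketch. The snapshot strings are made up for illustration and are not real output from any system. The pattern only looks for the words "scrub repaired" followed by an amount; the exact wording of that line can differ between ZFS versions, so treat the regex and the function names as assumptions.

```python
import re

# Matches e.g. "scrub repaired 432k" or "scrub repaired 0".
# Assumption: the amount is a number with an optional k/m/g/t suffix attached.
REPAIRED_RE = re.compile(r"scrub repaired\s+(\d+(?:\.\d+)?)([kmgt]?)", re.IGNORECASE)

UNIT_BYTES = {"": 1, "k": 1024, "m": 1024**2, "g": 1024**3, "t": 1024**4}


def repaired_bytes(status_text):
    """Return the bytes repaired by the last scrub, or None if no such line exists."""
    match = REPAIRED_RE.search(status_text)
    if match is None:
        return None
    amount, unit = match.groups()
    return int(float(amount) * UNIT_BYTES[unit.lower()])


def scrubs_with_repairs(snapshots):
    """Given {label: saved status text}, return the labels where a scrub repaired > 0 bytes."""
    flagged = []
    for label, text in sorted(snapshots.items()):
        if repaired_bytes(text):
            flagged.append(label)
    return flagged


# Hypothetical nightly snapshots (illustrative text only).
snapshots = {
    "night-1": "scan: scrub repaired 432k",
    "night-2": "scan: scrub repaired 0",
    "night-3": "no scrub line here",
}
print(scrubs_with_repairs(snapshots))
```

A list like this only tells you which nights had repairs. It can't tell you whether RAM, a disk, a cable or something else was the cause.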
 

fracai

Guru
Joined
Aug 22, 2012
Messages
1,212
I'm still running a non-ECC system (ECC support has been selected, but not yet ordered) and have never seen an error in zpool status. It's been running for almost a year and a half, with scrubs every two weeks and I resilvered four of the drives (capacity upgrade) a few months in. I did run memtest for around a full day (I think that was three or four passes over 16 GB) before installing FreeNAS.

I don't think non-ECC RAM guarantees checksum errors.
 

Richman

Patron
Joined
Dec 12, 2013
Messages
233
I'm still running a non-ECC system (ECC support has been selected, but not yet ordered) and have never seen an error in zpool status. It's been running for almost a year and a half, with scrubs every two weeks and I resilvered four of the drives (capacity upgrade) a few months in. I did run memtest for around a full day (I think that was three or four passes over 16 GB) before installing FreeNAS.

I don't think non-ECC RAM guarantees checksum errors.

Maybe for the majority of non-ECC users who had problems, the problem was actually caused or made worse by low RAM. Any idea or stats on that, or does nobody really care? I know CJ doesn't care and will say, "You're stupid for even asking the question or thinking about it." I feel I have to pre-empt most of my responses since he will tear them all to pieces and throw out judgments and name-calling as if I am a 10 year old special-Ed person.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
I don't think non-ECC RAM guarantees checksum errors.

Let's not be silly here. The vast majority of people running non-ECC are going to be just fine, because non-ECC memory fails at similar rates to ECC memory, and ECC memory only fails infrequently. But if that's maybe 10% of systems failing in a three year period, the question is whether or not you want to gamble becoming a member of that unenviable population. That's really the decision anyone with non-ECC faces. And it's why so many people will say "but I don't do ECC and I'm just fine!" Because they are just fine. It is the poor sorry bastards who have a failure that have the sad stories.
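
To put the "maybe 10% of systems failing in a three year period" figure in perspective, here is a small back-of-the-envelope sketch in Python. It assumes failures are independent and happen at a constant rate, which is a simplification and not something measured in this thread; the 10% itself is only the rough guess from the post above, and the function names are made up.

```python
def per_year_probability(p_period, years):
    """Convert a failure probability over `years` into an equivalent per-year probability.

    Assumes a constant, independent failure rate (simplifying assumption).
    """
    return 1.0 - (1.0 - p_period) ** (1.0 / years)


def at_least_one_failure(p_period, systems):
    """Probability that at least one of `systems` independent boxes fails in the period."""
    return 1.0 - (1.0 - p_period) ** systems


p_three_years = 0.10  # the "maybe 10%" guess from the post, not a measured value

print(f"per-year chance for one box: {per_year_probability(p_three_years, 3):.3f}")
for n in (1, 2, 5):
    print(f"{n} box(es), at least one failure in 3 years: {at_least_one_failure(p_three_years, n):.2f}")
```

Under those assumptions a single box is likely to be fine in any given year, which is consistent with most non-ECC users reporting no problems while a minority still get hit.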
 

Richman

Patron
Joined
Dec 12, 2013
Messages
233
Let's not be silly here. The vast majority of people running non-ECC are going to be just fine, because non-ECC memory fails at similar rates to ECC memory, and ECC memory only fails infrequently.
Didn't understand that last sentence. But I guess the moral of the story is that ECC can withstand a lot of those errors and correct many of them, or halt the system without causing damage, and you don't have to think about it or worry about it. With non-ECC you have to be extra diligent, know how to manage it, and keep an untouched backup of some sort to be able to rebuild. That's something only the at-home hobbyist or media mongrel would care to do, but then not do the needed research and learning to do it right. Probably why CJ always says, to paraphrase, 'Don't use non-ECC or you WILL be sorry'.
But if that's maybe 10% of systems failing in a three year period, the question is whether or not you want to gamble becoming a member of that unenviable population. That's really the decision anyone with non-ECC faces. And it's why so many people will say "but I don't do ECC and I'm just fine!" Because they are just fine. It is the poor sorry bastards who have a failure that have the sad stories.

I think non-ECC is like driving a car on bald tires. You may be just fine, but it compounds the issue when another monkey wrench is thrown in that non-ECC is not meant to deal with, like wet or icy roads. Then you have to be extra careful and even then you could be screwed.

Personally I have been thinking of starting with UFS just to move data around and get used to it. Especially after reading everything I have. At least until I get my ultimate, dream ECC system hardware. But then others have used ZFS and I think I could mitigate tragedy. I probably won't even post anything to say, 'Chalk me up to a statistic' since CJ will flame all over it. I would just quietly rebuild without a peep.
 

fracai

Guru
Joined
Aug 22, 2012
Messages
1,212
Let's not be silly here. The vast majority of people running non-ECC are going to be just fine, because non-ECC memory fails at similar rates to ECC memory, and ECC memory only fails infrequently. But if that's maybe 10% of systems failing in a three year period, the question is whether or not you want to gamble becoming a member of that unenviable population. That's really the decision anyone with non-ECC faces. And it's why so many people will say "but I don't do ECC and I'm just fine!" Because they are just fine. It is the poor sorry bastards who have a failure that have the sad stories.
Just responding to Cyberjock's comment re: "I'm wondering if there is a link between these "repairs" and non-ECC. Could this be a way to definitively prove that non-ECC is more destructive than we think?" I took that to mean non-ECC would always exhibit some amount of error.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
I think that is a poor assumption.

In the old sun3 days we had parity, not ECC, and 24 SIMM modules on a board like the 3/60. A parity error resulted in a deliberate panic, so the "fix" would be to replace the afflicted module. Some would only err every few months. Weeding out the baddies was therefore both an art and a science. Once done, you could usually go 5+ years before another problem appeared. note: we ran one 3/60 for about 20 years...

I have seen little to convince me that modern memory is substantially different. Test your memory, always. If new problems appear, they might be glaringly obvious (stuck bit) but also might just be a monthly single bit flip. ECC lets you detect. Non-ECC doesn't. Whether any given bitflip is harmful to a pool is random bad luck, as hitting a directory filename or file data is not harmful to the pool after all... but that might not be desirable from a user's perspective to have silent magic corruption...
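
As a toy illustration of the parity point above: a single even-parity bit per byte lets you notice one flipped bit, but it can't tell you which bit flipped (so nothing gets corrected), and two flips in the same byte slip through. This is a minimal Python sketch with made-up function names, not a model of how any particular memory controller or the sun3 hardware worked.

```python
def parity_bit(byte):
    """Even parity: 1 if the byte has an odd number of set bits, else 0."""
    return bin(byte & 0xFF).count("1") % 2


def store(byte):
    """Return the byte together with the parity bit computed at write time."""
    return (byte & 0xFF, parity_bit(byte))


def check(stored):
    """True if the stored parity bit still matches the stored byte."""
    byte, parity = stored
    return parity_bit(byte) == parity


def flip_bit(stored, position):
    """Simulate a bit flip in the data byte, leaving the parity bit untouched."""
    byte, parity = stored
    return (byte ^ (1 << position), parity)


word = store(0b10110010)
single_flip = flip_bit(word, 3)
double_flip = flip_bit(single_flip, 5)

print("clean word passes check:", check(word))
print("single flip passes check:", check(single_flip))
print("double flip passes check:", check(double_flip))
```

With no parity or ECC at all there is no check to run, so a flipped bit is simply used as if it were good data.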
 

PenalunWil

Contributor
Joined
Dec 30, 2013
Messages
115
So now that you're convinced that ECC really is that important, you can build a system with ECC for relatively cheap...

Motherboard: Supermicro X9SCM-F ($160ish)
CPU: Pentium G2020 ($60ish)
RAM: KVR16E11K4/32 32GB of DDR3-ECC-1600MHz RAM ($350ish)

So total cost is about $570. Less if you don't want to go to a full 32GB of RAM. If you went with a 2x8GB RAM stick kit you can get the total price down to about $370. The G2020 is a great CPU for FreeNAS.
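
The arithmetic behind those figures, as a quick Python sketch. The part prices are the rough "ish" numbers quoted above. Reading the $370 figure as the reduced total for the whole build (rather than the amount saved) is an assumption on my part, and so is the implied price of the 2x8GB kit that falls out of it.

```python
# Approximate prices quoted in the post, in US dollars.
parts = {
    "Supermicro X9SCM-F": 160,
    "Pentium G2020": 60,
    "KVR16E11K4/32 (32GB ECC kit)": 350,
}

total_32gb = sum(parts.values())

# Assumption: "about $370" is the total with a 2x8GB kit swapped in for the 32GB kit.
assumed_total_16gb = 370
board_and_cpu = parts["Supermicro X9SCM-F"] + parts["Pentium G2020"]
implied_16gb_kit_price = assumed_total_16gb - board_and_cpu

print("Total with 32GB kit:", total_32gb)
print("Implied price of a 2x8GB kit:", implied_16gb_kit_price)
```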

Of course, if you plan to use plugins like Plex that can be CPU intensive for transcoding you will need a little more power. Be wary of what CPUs do and don't support ECC. All Xeons do, and some i3s do. Check Intel's specification sheets to be sure before you spend the money. I use an E3-1230v2 (about $250) and it is AMAZING! No matter what I throw at it I can't get more than about 30% CPU usage. Don't go by the TDP to try to pick the "greenest" CPU either. TDP is for full load heat output. That provides no information on what kind of power usage you will see when idle (which is what your system will probably be doing 99% of the time). My system with no disks installed and the system idle is at about 35W. Unfortunately I can't help with AMD CPUs since I'm not a fan of AMD. I do know that trying to go with "Green" CPUs for AMDs has disappointed many people here. "Green" CPUs perform slower than other CPUs, so be careful and don't buy a CPU that can't perform fast enough to make you happy.


Hi cyberjock,

Can I ask what case and what power supply you used for this build please?
Also are you using a raid card or the mobo's sata connectors?
I'm looking at a build with "hotswap" hard drives.
You may have already answered part of this question, which a total newbie like me has missed. If that's the case (oops sorry about the pun) then please accept my apologies.

Amazing thread btw, I felt like a human standing amongst gods. Most of it went over my head, but you know how you get a sort of gut feeling when you know when someone is talking total sense? Yes... I'm a total convert to ECC RAM, especially after reading those two fail threads. My Dell Optiplex 755 NAS box build has died a death. I think I've got to go down the road of a MoBo which supports ECC.

Wil
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
I'm using a Norco-4224 and a Corsair 1000w PSU.

I wouldn't necessarily recommend the Norco-4224 unless you want to add a bunch of hidden costs. Read the thread I wrote up on overheating drives. ;)

Generally speaking, you're better off just getting a Supermicro case (even eBay is a good place to look!) after you factor in hidden costs and time.
 

PenalunWil

Contributor
Joined
Dec 30, 2013
Messages
115
Thanks Cyberjock,

I think this is overkill for my 2 man office needs but at least I'm starting to sort the wheat from the chaff.

When I'm ready I'll start my own build blog (says he hoping his non-ECC RAM holds out for as long as it takes :) )

Many thanks again.

Wil
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Not coming up for me but I seem to recall having identified that as a Chenbro chassis.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Yes, and can we stop talking about cases in a RAM thread? Thanks!
 

eatkinola

Cadet
Joined
Mar 3, 2013
Messages
5
@cyberjock: Just wanted to thank you for this info about ECC RAM. While I've been fortunate not to have a major problem running FreeNAS ZFS pools with non-ECC RAM, you convinced me to upgrade my hardware. Data -- especially that which is irreplaceable -- is very important to me. I migrated to FreeNAS when my Netgear ReadyNAS Duo failed a year ago ... thankfully I had an offline backup at that time (and still do). Hardware ordered for my new ECC-based system is as follows - set me back a little, but my data is worth it.

AsRock E3C223D2I Mini-ITX (w/SATAx6)
Intel Core i3-4130 Haswell 3.4 GHz Dual-Core
Kingston KVR1333D3E9S/8G (x2) ECC RAM

Anyway, thanks for looking out for the rest of us.
 

Joshua Parker Ruehlig

Hall of Famer
Joined
Dec 5, 2011
Messages
5,949
Just upgraded my system to ECC ram, had to buy a new motherboard ($115) and ram ($260). Tested and ECC is working http://hardforum.com/showthread.php?t=1693051 :)
  • CPU - g530
  • Mobo - MBD-X9SCM-O
  • Ram - 4*8GB HMT41GU7MFR8C-PB
 

DrKK

FreeNAS Generalissimo
Joined
Oct 15, 2013
Messages
3,630
Wow. Nice Time Machine Intel Celeron, sir. Perhaps you might want to update your CPU to something from the Obama administration :)

Seems a shame to have such a nice new motherboard with a CPU that old. :)
 

Joshua Parker Ruehlig

Hall of Famer
Joined
Dec 5, 2011
Messages
5,949
Wow. Nice Time Machine Intel Celeron, sir. Perhaps you might want to update your CPU to something from the Obama administration :)

Seems a shame to have such a nice new motherboard with a CPU that old. :)

I just like that I got it for $35 :p
I could go i3, but I don't feel like paying $200 for a Xeon.
 