Do you use ECC memory?

  • Yes
    Votes: 11 (42.3%)
  • No
    Votes: 15 (57.7%)
  • Total voters: 26
Status
Not open for further replies.

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Intel? Makes the decision? Difficult? How?

Intel makes some great Xeon CPUs and Supermicro makes some great Xeon motherboards.

Supermicro X9SCL-O, $150 (don't forget to account for the fact that this includes dual Intel ethernets, so you don't need to go and buy a decent network card).

Intel E3-1220, $200

or spend a few bucks for my favorite

Intel E3-1230, $230

So that's what, $350 for a competent server grade system board and CPU?

Okay, fine, you can get an E-350 for $60, that's a big difference, but there's a hell of a difference between a system with only 2 or 3 SATA ports, 16GB max, and a Realtek ethernet, and a system with six SATA ports, 32GB max, two Intel ethernets, and two more expansion slots. Difficult decision? :smile:
 

JaimieV

Guru
Joined
Oct 12, 2012
Messages
742
ECC'd up in my HP Microservers. If I'm going to have so many terabytes of data all looked after by ZFS, the least I can do is make sure that the cache RAM is also validated!
 

batpot

Dabbler
Joined
Nov 29, 2011
Messages
10
Yes, I spent a lot of time yesterday researching different options. It used to be that you could count on AMD to run ECC, but that's not true anymore. Most AM3 boards don't support ECC (Asus is one notable exception), and NONE of the FM sockets do.

And I still don't know why there's not an Atom/Fusion solution that uses ECC.

So I think I'm going to build around this 35W proc:
http://www.newegg.com/Product/Product.aspx?Item=N82E16819116407
...but to get ECC support requires a ~$200 motherboard.

I'm surprised to see so many people don't bother with it though.

And why do you need a second NIC?
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Don't think the featureset for that CPU supports ECC.

http://www.cs.toronto.edu/~bianca/papers/sigmetrics09.pdf outlines that ECC is a pretty good idea, errors are more common than you might expect. It's mostly a matter of how much you value your data.

The cold, hard reality is that people are willing to take some risks with their stuff. The likelihood of getting severely corrupted by a single bit error is pretty low. Consumer-grade gear is cheap because it's made in high volume and low quality. Server-grade gear is expensive because it's made in low volume but high quality. The extra cost isn't worth it to many people.
 

batpot

Dabbler
Joined
Nov 29, 2011
Messages
10
jgreco said:
Don't think the featureset for that CPU supports ECC.

Supermicro X9SC series, and Asus P8B series support this proc with ECC.

...but is it safe to assume that just because I can use ECC memory with this proc that the ECC features are enabled?
 

survive

Behold the Wumpus
Moderator
Joined
May 28, 2011
Messages
875
Hi batpot,

No, it's not... ECC support is a function of both the processor & the chipset. If you want ECC, you have to get a C20X series board like the Supermicro X9S series.

-Will
 

batpot

Dabbler
Joined
Nov 29, 2011
Messages
10
hmm...thought I responded earlier, but post isn't showing up.

So... even if Intel doesn't explicitly state that a CPU supports ECC, that doesn't mean it won't support it with the right motherboard.

Note here that ECC is not explicitly un-supported for the i3 or Sandy Bridge Pentium:
http://ark.intel.com/compare/52952,53426,53480,53487

And here where a couple users confirmed ECC is working with the G620 and i3-2120:
http://hardforum.com/showthread.php?t=1693051

And here where Supermicro says it should work (at least for the i3):
http://www.supermicro.com/support/faqs/faq.cfm?faq=10712
 

survive

Behold the Wumpus
Moderator
Joined
May 28, 2011
Messages
875
Hi batpot,

I'm not sure how to answer your question... too many negatives for me to keep straight. That said, take a look at this:

http://cache-www.intel.com/cd/00/00/46/78/467819_467819.pdf

page 5

ECC is supported on Intel's lower-end processors like the Pentiums & i3's when used with the proper chipset. Personally I'm using an i3-2100 in an X9S series with ECC memory.

-Will
 

batpot

Dabbler
Joined
Nov 29, 2011
Messages
10
survive said:
Hi batpot,

I'm not sure how to answer your question... too many negatives for me to keep straight. That said, take a look at this:

http://cache-www.intel.com/cd/00/00/46/78/467819_467819.pdf

page 5

ECC is supported on Intel's lower-end processors like the Pentiums & i3's when used with the proper chipset. Personally I'm using an i3-2100 in an X9S series with ECC memory.

-Will


Funny, because even there it only identifies the i3s, not the Sandy Bridge Pentiums.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
This is intriguing, but also very confusing. I've always understood that with the removal of the FSB the CPU talks directly with the RAM, so "support" for ECC was strictly based on what CPU you are using. If you use a CPU that supports ECC, then you can (and some BIOSes will require you to) use ECC RAM.

What is odd is that Supermicro is saying that this is more of a chipset thing. That conflicts with everything I've always been told. More reading is definitely in order!
 

ramius

Dabbler
Joined
Oct 30, 2012
Messages
17
A few days ago my Supermicro X9SCM-F board finally arrived, and I can assure you that the Pentium G620 works perfectly. I got in touch with Supermicro support and they sent me this chart:

[Attached image: image003.jpg]

The chart only shows Sandy Bridge CPUs, but it gives you an idea of what you can combine with a C2xx chipset and ECC memory, because ECC UDIMM memory is mandatory for C2xx chipsets.
 

thejestre

Cadet
Joined
Oct 26, 2012
Messages
6
I use ECC RAM on my AMD build in my sig. Processor was about $70 and so was the motherboard. Come to think of it, the RAM was about that price also.

A small price to pay for peace of mind. Even if I could have saved $30 by buying non-ECC RAM, that's only 30 bucks. My data's integrity is worth at least that much.

_theJestre
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Ok, so curiosity is killing me. Assuming I'm not using ECC RAM, if all of the RAM is written as 1's or 0's (not sure which is better for this test since I'm sleepy) and I simply keep reading all available memory beginning to end, eventually I should get a bit that is wrong.

So here's my question. Are there any programs that will actually do this?

I'm pretty sure the way memtest does its test is unlikely to catch this, because it tests very small (256kb?) blocks of memory sequentially by writing its test pattern, then reading it. So to have cosmic radiation (or whatever) affect the outcome of the test you'd have to have that small block of RAM written to, then the bit flipped for whatever reason, then the block read. In that order. Considering that the time difference between the write and read is nanoseconds and considering that you are talking about only 256kb(?) or so at a time, it's VERY unlikely that you'll ever find an error because of cosmic radiation.

So how COULD I run a test and actually have it tell me when a bit of RAM is flipped because of radiation? I'm very curious to test this with a machine or two that I have laying around just to do it.
 

Stephens

Patron
Joined
Jun 19, 2012
Messages
496
You had it right the first time. Memtest would detect this. At any given time it knows exactly what should be in a block of memory, just like the test you suggest. It's just that some patterns are more likely to stress memory and cause flipping. But you might have to let it run a VERY long time before it'd detect anything. Then once you detect errors, you have to re-run them to see if it's repeatable (permanent) or not (transient).

DRAM Errors in the Wild: A Large-Scale Field Study

BTW, ECC is better, but it isn't a cure-all, just like RAID-Z1 isn't. Any correction system is only effective as long as errors aren't more severe than what the system is designed to correct. Certainly ZFS and RAIDZ1 are better than nothing, and so is ECC. Also, the more RAM you have, the more likely you are to experience a bitflip somewhere in your working set.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Stephens said:
You had it right the first time. Memtest would detect this. At any given time it knows exactly what should be in a block of memory, just like the test you suggest. It's just that some patterns are more likely to stress memory and cause flipping. But you might have to let it run a VERY long time before it'd detect anything. Then once you detect errors, you have to re-run them to see if it's repeatable (permanent) or not (transient).

How are you coming to this conclusion? I'm 99% sure that Memtest doesn't work in more than 4GB blocks (might be 2GB). On the memtest screen it always has a field for "testing" and that one has always been less than 4GB from what I remember. This means, at least to me, that memtest isn't testing all of the sticks of RAM at the same time but is testing them in small chunks. To catch a memory error from cosmic rays you really need to write a single pattern to all of your RAM and then keep reading all of your RAM over and over until you get a different result somewhere. This isn't how memtest works, so you may have to run memtest for decades (centuries?) to get a single error. Since Intel says that one should expect 150 errors per year, if I wrote a test pattern and then kept rereading it, I'd expect at least 1 error within about a week. I've had machines run for more than a week before with no errors, so I kind of dismiss memtest for multiple reasons.
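To put rough numbers on that "about a week" expectation, here is a quick sketch. It takes the 150-errors-per-year figure above at face value (not independently verified) and assumes errors arrive independently at a constant rate, i.e. a Poisson model. That model and the function names are assumptions for illustration only.

```python
import math

ERRORS_PER_YEAR = 150.0  # figure quoted in the post; taken at face value


def expected_errors(days, errors_per_year=ERRORS_PER_YEAR):
    """Expected number of bit errors over `days`, assuming a constant rate."""
    return errors_per_year * days / 365.0


def prob_at_least_one(days, errors_per_year=ERRORS_PER_YEAR):
    """P(at least one error within `days`) under a Poisson model (an assumption)."""
    return 1.0 - math.exp(-expected_errors(days, errors_per_year))


if __name__ == "__main__":
    for days in (1, 7, 30):
        print(f"{days:3d} days: expect {expected_errors(days):.2f} errors, "
              f"P(at least one) = {prob_at_least_one(days):.3f}")
```

Under those assumptions a week works out to roughly 2.9 expected errors and about a 94% chance of seeing at least one, but only if the test actually holds a known pattern in all of the RAM for the whole week and the quoted rate applies to the machine in question.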

Stephens said:
BTW, ECC is better, but it isn't a cure-all, just like RAID-Z1 isn't. Any correction system is only effective as long as errors aren't more severe than what the system is designed to correct. Certainly ZFS and RAIDZ1 are better than nothing, and so is ECC. Also, the more RAM you have, the more likely you are to experience a bitflip somewhere in your working set.

I believe ECC can correct single-bit errors and detect (but not correct) double-bit errors, so that's about the best you're gonna get considering the harshness of reality.
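For anyone curious how "correct one bit, detect two" works, here is a toy sketch using an extended Hamming (8,4) code. ECC DIMMs typically protect 64 data bits with 8 check bits, so this is a scaled-down illustration of the same idea and not what a memory controller literally runs. The names `encode` and `decode` are made up for the example.

```python
def encode(nibble):
    """Encode 4 data bits into an 8-bit extended Hamming codeword (list of bits)."""
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8                      # index 0: overall parity, 1..7: Hamming(7,4)
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]   # covers positions 1, 3, 5, 7
    code[2] = code[3] ^ code[6] ^ code[7]   # covers positions 2, 3, 6, 7
    code[4] = code[5] ^ code[6] ^ code[7]   # covers positions 4, 5, 6, 7
    code[0] = sum(code[1:]) % 2             # makes the total number of 1s even
    return code


def decode(code):
    """Return (status, nibble); status is 'ok', 'corrected', or 'uncorrectable'."""
    code = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos             # XOR of set positions is 0 for a valid word
    parity_bad = sum(code) % 2 == 1     # odd number of 1s means an odd number of flips
    if syndrome == 0 and not parity_bad:
        status = "ok"
    elif parity_bad:
        code[syndrome] ^= 1             # single flip; syndrome 0 means the parity bit itself
        status = "corrected"
    else:
        return "uncorrectable", None    # nonzero syndrome with even parity: two flips
    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return status, nibble


if __name__ == "__main__":
    word = encode(0b1011)
    one_flip = list(word)
    one_flip[5] ^= 1
    two_flips = list(one_flip)
    two_flips[6] ^= 1
    print(decode(word))        # ('ok', 11)
    print(decode(one_flip))    # ('corrected', 11)
    print(decode(two_flips))   # ('uncorrectable', None)
```

Note that three or more flips in one word can slip past this scheme or get "corrected" to the wrong value, which is the sense in which a correction system only works while errors stay within what it was designed for.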
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Also, memtest does a lot more writing than just to the block it wants to test. It writes specific patterns to "adjacent physical memory location" to help identify RAM that has potential issues with the insulation material. It's actually MUCH more complicated than "write a block, read the block". It does a lot of other stuff to try and rule out a neighboring location's insulation breakdown and other possible failure modes for your RAM. That's one of the reasons why a single pass takes SO long. If all you had to do was write to every block and read every block, the test would take seconds.
 

Stephens

Patron
Joined
Jun 19, 2012
Messages
496
Just a response to acknowledge the discrepancy between what I said and your correct explanation. Not everything you said contradicted what I said, but I'm not going to be able to do any digging/testing/explaining anytime soon so I'll leave it at that. I did have a software memory tester that allowed you to specify the pattern (all 0's if you want, but I used a different pattern) and keep reading it back, but I can't even be sure that was a version of memtest at this point, and if so, which version. I do know for sure I never ran it with more than 4GB because at the time I did it systems were smaller. At the time, I did it for exactly the reason you want to use it: I wanted to see the difference between ECC and non-ECC RAM. Actually, it may have even been parity vs non-parity at that time, not ECC. Anyway, good luck.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Do you know what program you used? I'd love to get a program that writes to 99% of your RAM and then just reads and re-reads until a bit (or bits) gets flipped.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Well hell that's easy enough to write, assuming you don't mind a UNIX kernel running. Allocate a block of memory. Lock it in-core. Write it with your pattern. Read it back in a continuous loop. It isn't going to be as rigorous as something like memtest86 though.
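A rough sketch of that allocate / write / re-read loop, in Python for brevity. Big caveat: this version does NOT lock the buffer in-core, so the OS is free to swap it out, and reads of a small buffer may be served from CPU cache without ever touching DRAM. The function names (`fill`, `scan`, `watch`) and the sizes are made up for illustration.

```python
import time


def fill(size_bytes, pattern=0x00):
    """Allocate a buffer and write the same pattern byte everywhere."""
    return bytearray([pattern]) * size_bytes


def scan(buf, pattern=0x00):
    """Return [(offset, observed_byte), ...] for bytes that no longer match."""
    if buf.count(pattern) == len(buf):   # fast path: nothing has changed
        return []
    return [(i, b) for i, b in enumerate(buf) if b != pattern]


def watch(buf, pattern=0x00, passes=None, delay_s=60.0):
    """Re-read the buffer repeatedly; passes=None means loop until interrupted."""
    done = 0
    while passes is None or done < passes:
        for offset, value in scan(buf, pattern):
            print(f"pass {done}: offset {offset:#x} read {value:#04x}, "
                  f"expected {pattern:#04x}")
            buf[offset] = pattern        # rewrite so the same flip isn't reported twice
        done += 1
        time.sleep(delay_s)


if __name__ == "__main__":
    # Short demo: 16 MiB, two passes. A real run would use most of free RAM
    # and passes=None, left going for days.
    watch(fill(16 * 1024 * 1024, 0x55), pattern=0x55, passes=2, delay_s=0.1)
```

A version that locks the memory would most likely be written in C so it can call `mlock()` on the buffer. Either way it won't be as rigorous as memtest86.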
 