BUILD Working AMD "prosumer" Build

Status
Not open for further replies.

Urs

Dabbler
Joined
Oct 23, 2014
Messages
26
Hi,

Being an "underdog lover", I wanted to build a cheap but moderately powerful system based on the underdog, AMD. ECC should work, so I started googling for ECC-compatible AMD non-server hardware.
I found the following:

CPU: AMD FX-8350
Mainboard: ASUS M5A97-R2.0
RAM: 2x 8GB Kingston DDR3-1600 ECC, (KVR16E11/8)

It worked right out of the box and has been running for 2 months now, receiving a daily rsync of about 50 GB of changed data and snapshotting the whole 1 TB dataset once a day.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Good luck, but you're not out of the woods.

Can you somehow make sure ECC is working?
 

Urs

Dabbler
Joined
Oct 23, 2014
Messages
26
Terminal output on Ubuntu 14.04 (this is on my VirtualBox host with the same hardware, only with 4x 8GB RAM):

sudo dmidecode -t memory
[sudo] password for XXXX:
# dmidecode 2.12
# SMBIOS entry point at 0x000f04c0
SMBIOS 2.7 present.

Handle 0x002C, DMI type 16, 23 bytes
Physical Memory Array
Location: System Board Or Motherboard
Use: System Memory
Error Correction Type: Multi-bit ECC
Maximum Capacity: 32 GB
Error Information Handle: Not Provided
Number Of Devices: 4

Handle 0x002E, DMI type 17, 34 bytes
Memory Device
Array Handle: 0x002C
Error Information Handle: Not Provided
Total Width: 72 bits
Data Width: 64 bits
Size: 8192 MB
Form Factor: DIMM
Set: None
Locator: DIMM0
Bank Locator: BANK0
Type: DDR3
Type Detail: Synchronous Unbuffered (Unregistered)
Speed: 1600 MHz
Manufacturer: Kingston
Serial Number: B81207A
Asset Tag: AssetTagNum0
Part Number: 9965525-024.A00LF
Rank: 2
Configured Clock Speed: 800 MHz

Handle 0x0030, DMI type 17, 34 bytes
Memory Device
Array Handle: 0x002C
Error Information Handle: Not Provided
Total Width: 72 bits
Data Width: 64 bits
Size: 8192 MB
Form Factor: DIMM
Set: None
Locator: DIMM1
Bank Locator: BANK1
Type: DDR3
Type Detail: Synchronous Unbuffered (Unregistered)
Speed: 1600 MHz
Manufacturer: Kingston
Serial Number: CE0D1F1
Asset Tag: AssetTagNum1
Part Number: 9965525-116.A00LF
Rank: 2
Configured Clock Speed: 800 MHz

Handle 0x0032, DMI type 17, 34 bytes
Memory Device
Array Handle: 0x002C
Error Information Handle: Not Provided
Total Width: 72 bits
Data Width: 64 bits
Size: 8192 MB
Form Factor: DIMM
Set: None
Locator: DIMM2
Bank Locator: BANK2
Type: DDR3
Type Detail: Synchronous Unbuffered (Unregistered)
Speed: 1600 MHz
Manufacturer: Kingston
Serial Number: CD12D5A
Asset Tag: AssetTagNum2
Part Number: 9965525-024.A00LF
Rank: 2
Configured Clock Speed: 800 MHz

Handle 0x0034, DMI type 17, 34 bytes
Memory Device
Array Handle: 0x002C
Error Information Handle: Not Provided
Total Width: 72 bits
Data Width: 64 bits
Size: 8192 MB
Form Factor: DIMM
Set: None
Locator: DIMM3
Bank Locator: BANK3
Type: DDR3
Type Detail: Synchronous Unbuffered (Unregistered)
Speed: 1600 MHz
Manufacturer: Kingston
Serial Number: D10D271
Asset Tag: AssetTagNum3
Part Number: 9965525-116.A00LF
Rank: 2
Configured Clock Speed: 800 MHz
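
The fields of interest in the output above are the array's "Error Correction Type" and each DIMM's "Total Width" versus "Data Width" (72 vs. 64 bits, i.e. 8 extra bits per word). A minimal sketch for pulling those out of the text follows; the function name and the returned dictionary layout are illustrative assumptions, and the result only reflects what the firmware reports through SMBIOS, not whether the memory controller really corrects errors.

```python
import re


def summarize_dmidecode_memory(text):
    """Pull the ECC-related fields out of `dmidecode -t memory` output.

    This only reports what the firmware's SMBIOS tables claim; it does not
    prove that the memory controller actually corrects errors.
    """
    summary = {"error_correction_type": None, "devices": []}
    device = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line.startswith("Handle "):
            device = None  # a new SMBIOS structure begins
        elif line == "Memory Device":
            device = {"locator": None, "total_width": None, "data_width": None}
            summary["devices"].append(device)
        elif line.startswith("Error Correction Type:"):
            summary["error_correction_type"] = line.split(":", 1)[1].strip()
        elif device is not None:
            match = re.match(r"(Total Width|Data Width): (\d+) bits$", line)
            if match:
                key = match.group(1).lower().replace(" ", "_")
                device[key] = int(match.group(2))
            elif line.startswith("Locator:"):
                device["locator"] = line.split(":", 1)[1].strip()
    for dev in summary["devices"]:
        widths = (dev["total_width"], dev["data_width"])
        # Bits beyond the data width are where ECC check bits would live.
        dev["check_bits"] = None if None in widths else widths[0] - widths[1]
    return summary
```

Feeding it the saved output of `sudo dmidecode -t memory` should give `check_bits` of 8 for each of the four DIMMs listed above; empty slots that report no numeric width come back as `None`.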
 

DJABE

Contributor
Joined
Jan 28, 2014
Messages
154
I doubt it's really multi-bit ECC...
On my SM / Crucial setup, ECC presents as single-bit ECC... how could that be?

I think that CPU is too strong for a NAS system. I have the same CPU in a hypervisor machine... and it's really good for that. It draws a lot of power, especially if you overclock it like I did..

However, I like your build very much. I wanted to build something similar on the AMD platform, but didn't want to buy ECC RAM for a MoBo where I can't tell for sure whether it's working with all 9 chips, or with just 8 like regular non-ECC memory...
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Why would you ever overclock the CPU on a hypervisor? Bad enough for a single machine... but do you really want to risk instability and corruption with multiple VM's?
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Terminal Ubuntu 14.04 (this is on my Virtualbox-Host with same Hardware, only 4x 8GB Ram):

Yeah, that doesn't prove a darn thing. That's the problem with AMD. The dmidecode works on Intel chipsets (and only specific ones at that). Do you know how many AMD boards can be validated for ECC with dmidecode? Zero. AMD has provided no documentation that makes this possible.
 

DJABE

Contributor
Joined
Jan 28, 2014
Messages
154
Why would you ever overclock the CPU on a hypervisor? Bad enough for a single machine... but do you really want to risk instability and corruption with multiple VM's?
Because raw (horse)power is important in certain (many?) workloads.
Instability is not an issue if you know what you're doing. All of my CPUs (except the NAS box :D) are overclocked by default. 2500 MHz -> 4600, not bad for free :) My FX 8320 goes to 4600 MHz x 8 "cores" = lots of brain power for as many as 32 VMs with low to moderate workloads... but that's a different topic. In short, I've never had any issue with any overclocked machine, going back to the PII/III era.


However, AMD seems to be a black box when it comes to ECC support. Or better said: ECC as a whole is a black box, for the simple reason that you cannot tell for sure whether your ECC function is really working or not...
 

DJABE

Contributor
Joined
Jan 28, 2014
Messages
154
Something comes to mind: has anyone tried an AM3+ CPU on a server-grade MoBo for the Opteron CPU series? Like we do with Intel Pentiums and i3s...
Right now I can't work out which sockets are compatible with which series..
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
I am /so/ confused as to why someone who is willing to risk instability and corruption by running parts out of spec would give a flying f about ECC.
 

anodos

Sambassador
iXsystems
Joined
Mar 6, 2014
Messages
9,554
I am /so/ confused as to why someone who is willing to risk instability and corruption by running parts out of spec would give a flying f about ECC.
What? You haven't ever overclocked a server to make it more fasterer? I only stripe my pools because redundancy is for pansies. AND I power everything with a 10-year-old PSU from an OptiPlex. :)
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
What? You haven't ever overclocked a server to make it more fasterer?

I've been known to underclock things. But basically I expect computers to work reliably. It is easy enough to throw a little money at the problem and "make it more fasterer" that way.
 

anodos

Sambassador
iXsystems
Joined
Mar 6, 2014
Messages
9,554
I've been known to underclock things. But basically I expect computers to work reliably. It is easy enough to throw a little money at the problem and "make it more fasterer" that way.
I know of one small business where the IT guy built the "server" with desktop components and overclocked the CPU. He used the "server" to play games after work hours. :) For some reason, the server was rather unstable. There are times when you don't know whether to laugh or grimace.
 

Urs

Dabbler
Joined
Oct 23, 2014
Messages
26
Mine isn't overclocked ;-) I can play games at home (or on my workstation...).

As for the question of whether ECC is running, I took a screenshot of the BIOS.



The BIOS says ECC is enabled.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Yes, and if you read up on the ECC stuff, being "enabled" is not the same as functional. You can often "enable" VT-d on hardware that is clearly incapable of supporting VT-d.

Unfortunately on AMD there is *no* technical way to validate that ECC is actually functional. So the question is "do you trust that if the motherboard manufacturer claims to use ECC that it does?"

We've had numerous motherboard manufacturers that made that claim and were later found to be lying. ;)
 

Urs

Dabbler
Joined
Oct 23, 2014
Messages
26
Your complaints about how to prove that ECC is working kept me from sleeping. I'm taking this seriously.
Today I ran a recent MemTest86 V5 (the one without the "Plus"). It states that the RAM/system is capable of ECC and that ECC is enabled.
But I will look for more evidence. Here is what I still have to do:

1. I will have a look at edac-utils.
2. MemTest86 V5 has a function to inject ECC errors on certain systems to check whether ECC is really working. I'm in contact with PassMark to find out which systems support ECC error injection before deciding whether to buy the Pro version (I definitely will if my hardware is supported).
3. Ultimately, if 2. does not work, I will build a "single error RAM" stick as described in http://www.passmark.com/forum/showthread.php?4470-MemTest86-ECC-RAM-error-reporting-status
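
For step 1, the Linux EDAC subsystem exposes per-memory-controller counters for corrected (`ce_count`) and uncorrected (`ue_count`) errors under sysfs. Below is a minimal sketch of reading them, assuming a Linux host on which an EDAC driver has actually loaded for the memory controller; whether that happens on this particular board is not established in the thread. The function name and the `root` parameter are illustrative.

```python
from pathlib import Path


def read_edac_counts(root="/sys/devices/system/edac/mc"):
    """Return corrected (ce) and uncorrected (ue) error counts per memory controller."""
    counts = {}
    base = Path(root)
    if not base.is_dir():
        return counts  # no EDAC driver loaded, or not a Linux system
    for mc_dir in sorted(base.glob("mc[0-9]*")):
        entry = {}
        for name in ("ce_count", "ue_count"):
            counter = mc_dir / name
            if counter.is_file():
                entry[name] = int(counter.read_text().strip())
        counts[mc_dir.name] = entry
    return counts
```

An empty result means no controller was registered with EDAC, which by itself says nothing about the hardware. Counters that stay at zero are also inconclusive until an error has been provoked, which is what steps 2 and 3 are for.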
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Here are the problems... (don't keep reading unless you want to lose more sleep)

1. How does Memtest come up with that "ECC supported/enabled" message? If it's going by the same bits as dmidecode, well, we know that's already garbage. If not, what is it using, and how can we validate it is valid or not?
2. There is NO WAY to "insert ECC errors". ECC is handled in hardware and there is no way to "force" ECC errors. You *can* however generate a fake error for the purpose of testing logging functions. There's a command somewhere to do it (dmidecode parameter I believe), but there is no way to insert an ECC error since software doesn't have that kind of control over the hardware. (If you know otherwise be prepared to provide a link because I know quite a few people that would love to see that).
3. Let's assume you build that bad RAM stick (and I'm going to assume that you are not experienced enough to handle this). I've tried to build it twice and failed, despite doing soldering and such for a living. Your test will only validate single-bit errors. That's all fine and dandy, except when RAM fails it's typically more than a single-bit error, which also means you need to be able to rely on the system halting itself with a proper log entry for a multi-bit RAM error. How do you plan to test that? ;)

So even if you have a working "bad RAM stick", and even if it does pass your test, how are you going to prove multi-bit errors?
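
The single-bit versus multi-bit distinction can be illustrated with a toy SECDED (single error correct, double error detect) code. The sketch below uses an extended Hamming code over 4 data bits purely for illustration; ECC DIMMs carry 8 check bits per 64 data bits, and the actual code used by any given memory controller is not described in this thread. All names are hypothetical.

```python
def encode(data_bits):
    """Encode 4 data bits into an 8-bit extended Hamming (SECDED) codeword."""
    d1, d2, d3, d4 = data_bits
    code = [0] * 8  # index 0: overall parity, indices 1..7: Hamming positions
    code[3], code[5], code[6], code[7] = d1, d2, d3, d4
    code[1] = code[3] ^ code[5] ^ code[7]  # covers positions 1, 3, 5, 7
    code[2] = code[3] ^ code[6] ^ code[7]  # covers positions 2, 3, 6, 7
    code[4] = code[5] ^ code[6] ^ code[7]  # covers positions 4, 5, 6, 7
    code[0] = sum(code[1:]) % 2            # makes total parity even
    return code


def decode(codeword):
    """Return (status, data_bits); data_bits is None if uncorrectable."""
    code = list(codeword)
    syndrome = 0
    for position in range(1, 8):
        if code[position]:
            syndrome ^= position  # XOR of set positions is 0 for a valid word
    parity_odd = sum(code) % 2 == 1
    if syndrome == 0 and not parity_odd:
        status = "ok"
    elif parity_odd:
        # An odd number of flips: treat as one flipped bit and fix it.
        # Syndrome 0 here means the overall parity bit (index 0) flipped.
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Non-zero syndrome with even parity: two bits flipped.
        return "uncorrectable", None
    return status, [code[3], code[5], code[6], code[7]]


word = [1, 0, 1, 1]
single = encode(word)
single[5] ^= 1                      # one flipped bit
double = encode(word)
double[5] ^= 1
double[6] ^= 1                      # two flipped bits
print(decode(single))               # ('corrected', [1, 0, 1, 1])
print(decode(double))               # ('uncorrectable', None)
```

The two cases take different paths through the decoder, which is the point being made above: a stick that produces one bad bit exercises only the correction path, and says nothing about whether the detect-and-halt path for multi-bit errors works.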

The hole keeps getting deeper and deeper.

If you've been paying attention to the forums, there are growing fears that AMD is not going to play nice with 9.3. Check the poll in the 9.3 forums. AMD's lack of support in FreeBSD is clearly showing. Right now 9.3 looks like it's not going to work for many AMD users, and 10.1 likely even less. So going AMD is very likely a bad choice if you don't want to be alienated by code that isn't supported on AMD hardware. I've had 3 or 4 people message me saying 9.3 won't even finish booting before panicking. Of the two that answered the poll, one said it would boot and one said it wouldn't.

But hey, it's your choice and your money. ;) Just don't be surprised if someday a version of FreeNAS comes out that says "no AMD support".
 

DJABE

Contributor
Joined
Jan 28, 2014
Messages
154
I can't figure out what CPU architecture has to do with a certain OS?
Both Intel and AMD CPU's are based on well known x86 (i386 / amd64) architecture. You don't need a driver for a certain CPU at an OS level...
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
I can't figure out what CPU architecture has to do with a certain OS?
Both Intel and AMD CPU's are based on well known x86 (i386 / amd64) architecture. You don't need a driver for a certain CPU at an OS level...

Specific instructions aren't present in all processors. For instance, Windows 8 requires SSE2 and won't boot if the CPU doesn't have these instructions.
Something similar may happen with FreeNAS 10, but there's no solid data at the moment.
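
Whether a CPU advertises a given instruction set extension can be checked from its feature flags. Here is a minimal sketch, assuming a Linux host (such as the Ubuntu machine earlier in the thread) where the flags appear on the `flags` line of `/proc/cpuinfo`; FreeNAS itself is FreeBSD-based and does not provide that file in the same form. The function names are illustrative.

```python
def cpu_flags(cpuinfo_text):
    """Return the feature flags from the first `flags` line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_flags(cpuinfo_text, required):
    """List the required flags that the CPU does not advertise, sorted."""
    return sorted(set(required) - cpu_flags(cpuinfo_text))
```

For example, `missing_flags(open("/proc/cpuinfo").read(), ["sse2"])` returns an empty list on any CPU that reports SSE2.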

If you've bought the hardware, don't lose sleep over this. If you haven't, consider reducing the probability of problems in the future.

I'm sure there will be more details publicly available as soon as the dust has settled and some proper investigation has taken place.
 

Urs

Dabbler
Joined
Oct 23, 2014
Messages
26
Hi all,

It's getting REALLY interesting now for those who like to dig deep ;-)

1. At the moment I don't know how MemTest86 does that. A possible tool for Linux is edac-utils; it reads from and writes to the memory controller directly (I want to look into that tomorrow).
3. I could build such a stick at work (I work at a university, and the lab where I am senior engineer is fully equipped for that), but I hope I won't need to ;-)
2. Injection of ECC errors (really writing bad data to RAM!) is supported on AMD family 15h processors (and family 16h as well), as described in the BIOS and Kernel Developer's Guide (BKDG) for AMD Family 15h on page 173. I think this is how PassMark does it!

Edit: I just had a look at Xeons; it is also possible there (according to their manual)!
 