Memory Errors

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Which memtest version/variant should I use?
Oof, that's a can of worms. There was a recent discussion about that: Memtest86 (no plus) finally seems to more or less support ECC features, but you're still dependent on firmware not doing stupid things. Note that Memtest86+ has been abandoned for years... and holy crap, there's an update!?

Did anyone resolve this by reseating the memory?
It won't hurt, but outside of high-vibration or frequent thermal cycling environments it's not likely to be a mechanical issue.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
I am encountering the same MCA Memory error.

Jun 11 22:12:25 fns MCA: Bank 7, Status 0xcc1005c000010091
Jun 11 22:12:25 fns MCA: Global Cap 0x0000000001000c1d, Status 0x0000000000000000
Jun 11 22:12:25 fns MCA: Vendor "GenuineIntel", ID 0x306e4, APIC ID 0
Jun 11 22:12:25 fns MCA: CPU 0 COR (16407) OVER RD channel 1 memory error
Jun 11 22:12:25 fns MCA: Address 0xf3bb22540
Jun 11 22:12:25 fns MCA: Misc 0x1420aca86
Unfortunately this will likely be a hardware failure. I see in your signature that you have a dual-CPU motherboard... How about being a little more specific: provide all the specs on your MB (make/model), CPUs, RAM (make/model), and the specific slots the RAM resides in. Was the system working just fine, and for how long? If the system was working fine, did you upgrade or update FreeNAS or maybe some jails/VMs? And what version of FreeNAS are you running?

My first theory is that you just built this system and are getting these error messages, so with any luck you have the RAM plugged into the wrong slots, or maybe you have the wrong type of RAM altogether. Without knowing anything, I'm making an assumption so you can get started troubleshooting. Please don't take offense at anything I say; it certainly isn't meant to make you feel bad.

Good luck. Troubleshooting RAM problems isn't terribly difficult, but it can be very time consuming. When you do run MemTest86, run it overnight, and if some failures occur, well, hopefully we (the group) can troubleshoot them.
 

Radu

Dabbler
Joined
Mar 7, 2014
Messages
45
First of all, thanks for your help and good intentions.

The answer was PEPSI!
See the attached picture.

Someone did it for me :)

The best way to identify a single-bit ECC error is the BIOS SEL (System Event Log); it tells you the exact CPU and slot. The MCA error message is not that easy to interpret.
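
For anyone who does want to pick the MCA line apart by hand, here is a minimal Python sketch that decodes the Status value from the log quoted earlier in the thread. It assumes the value is the raw IA32_MCi_STATUS register and uses the bit layout from the machine-check chapter of the Intel SDM; the function name and the dictionary keys are made up for illustration. Note that it only gets you as far as the memory channel, not the physical slot, which is why the SEL is still the easier route.

```python
# Sketch only: decode a raw IA32_MCi_STATUS value as printed in a
# FreeBSD "MCA: Bank N, Status 0x..." line. Bit positions are taken
# from the Intel SDM; decode_mca_status is a hypothetical helper name.

# MMM field of a memory-controller error code (0000 0000 1MMM CCCC)
MEM_OPS = {0: "GEN", 1: "RD", 2: "WR", 3: "AC", 4: "MS"}


def decode_mca_status(status: int) -> dict:
    code = status & 0xFFFF  # architectural MCA error code, bits 15:0
    info = {
        "valid": bool((status >> 63) & 1),        # VAL
        "overflow": bool((status >> 62) & 1),     # OVER: earlier errors were lost
        "uncorrected": bool((status >> 61) & 1),  # UC: 0 means ECC corrected it
        "misc_valid": bool((status >> 59) & 1),   # MISCV
        "addr_valid": bool((status >> 58) & 1),   # ADDRV
        # Bits 52:38 hold the corrected error count, but only on CPUs
        # that advertise CMCI support (bit 10 of the Global Cap value).
        "corrected_count": (status >> 38) & 0x7FFF,
        "mca_code": code,
    }
    # Memory controller errors have bit 7 set and bits 15:8 clear.
    if (code & 0xFF80) == 0x0080:
        info["memory_op"] = MEM_OPS.get((code >> 4) & 0x7, "?")
        channel = code & 0xF
        info["channel"] = None if channel == 0xF else channel  # 0xF = unspecified
    return info


if __name__ == "__main__":
    decoded = decode_mca_status(0xCC1005C000010091)  # Status from the log
    for key, value in decoded.items():
        print(f"{key}: {value}")
```

Run against the Status value above, the fields line up with the kernel's own summary line: a corrected (not uncorrected) error, the overflow flag set, a count of 16407, and a memory read on channel 1.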

The motherboard took a shower in the tub, along with its mounting plate, and now I have a 16GB ECC DIMM in the ultrasonic cleaner, the one with the MCA error. I personally saved another 8GB ECC DIMM some time ago with the ultrasonic cleaner; I hope it works this time too.

Just to give an idea of the system:
Dual Xeon 2696v2
128GB ECC DDR3 now minus 32GB :)
10G Chelsio SFP+
2x RAIDZ2 HDDs, 1x RAIDZ1 SSDs, and one NVMe for jails and stuff
2xHBA P20 LSI 2308
Single Corsair HX850 PSU
on some aluminium frame wheeled case
FreeNAS 11.3-U3.2
 

Attachments

  • mb.jpg
    436.8 KB · Views: 291