MCA Memory Error

Status
Not open for further replies.

Matt84

Dabbler
Joined
May 24, 2016
Messages
22
Hi all,

I noticed the following errors on the console today. From my research, they suggest that my ECC memory detected and corrected a single error. The board is a Supermicro MBD-X10SDV-TLN4F-O and the memory is the Samsung M393A4K40BB0-CPB, which is on the board's QVL.

Code:
Mar  8 06:26:24 freenas MCA: Bank 10, Status 0x8c000041000800c1
Mar  8 06:26:24 freenas MCA: Global Cap 0x0000000001000c16, Status 0x0000000000000000
Mar  8 06:26:24 freenas MCA: Vendor "GenuineIntel", ID 0x50663, APIC ID 0
Mar  8 06:26:24 freenas MCA: CPU 0 COR (1) MS channel 1 memory error
Mar  8 06:26:24 freenas MCA: Address 0xc4a2968c0
Mar  8 06:26:24 freenas MCA: Misc 0x90840040004028c
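
The Status word in the log above can be decoded by hand to check what the "COR (1) MS channel 1" line is summarising. The following is a minimal sketch, assuming the architectural IA32_MCi_STATUS bit layout from the Intel SDM (valid bit 63, uncorrected bit 61, corrected-error count in bits 52:38, MCA error code in bits 15:0). The function name and dictionary keys are made up for illustration; the short transaction names mirror the abbreviations FreeBSD prints.

```python
# Status value copied from the "Bank 10" line of the log.
STATUS = 0x8C000041000800C1

# Memory-controller error codes have the form 0000 0000 1MMM CCCC.
# MMM is the transaction type, CCCC the channel (0xF = not specified).
MMM_NAMES = {0: "GEN", 1: "RD", 2: "WR", 3: "AC", 4: "MS"}


def decode_mci_status(status):
    """Return the architectural fields of an MCi_STATUS value as a dict."""
    code = status & 0xFFFF
    fields = {
        "valid": bool((status >> 63) & 1),
        "overflow": bool((status >> 62) & 1),
        "uncorrected": bool((status >> 61) & 1),
        "misc_valid": bool((status >> 59) & 1),
        "addr_valid": bool((status >> 58) & 1),
        "corrected_count": (status >> 38) & 0x7FFF,
        "mca_code": code,
    }
    # Only interpret MMM/CCCC when the code matches the memory-controller form.
    if (code & 0xFF80) == 0x0080:
        fields["transaction"] = MMM_NAMES.get((code >> 4) & 0x7, "??")
        channel = code & 0xF
        fields["channel"] = None if channel == 0xF else channel
    return fields


if __name__ == "__main__":
    for name, value in decode_mci_status(STATUS).items():
        print(name, value)
```

For this value the uncorrected bit is clear, the corrected count is 1, and the error code decodes to a memory scrubbing ("MS") error on channel 1, which agrees with the line FreeBSD printed. An "MS" transaction type means the error was found by the memory scrubber rather than by an ordinary read.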


Is this something I should worry about straight away, or is it only a concern if the errors become more frequent? Also, how can I tell which DIMM had the fault? Other posts on this forum about the same error suggest looking at the Bank Locator value, but I can't see a match in the dmidecode output shown below. Is there any way to match it up conclusively?

Code:
# dmidecode 3.0
Scanning /dev/mem for entry point.
SMBIOS 2.8 present.

Handle 0x0028, DMI type 16, 23 bytes
Physical Memory Array
  Location: System Board Or Motherboard
  Use: System Memory
  Error Correction Type: Multi-bit ECC
  Maximum Capacity: 128 GB
  Error Information Handle: Not Provided
  Number Of Devices: 4

Handle 0x002A, DMI type 17, 40 bytes
Memory Device
  Array Handle: 0x0028
  Error Information Handle: Not Provided
  Total Width: 72 bits
  Data Width: 64 bits
  Size: 32 GB
  Form Factor: DIMM
  Set: None
  Locator: DIMMA1
  Bank Locator: P0_Node0_Channel0_Dimm0
  Type: DDR4
  Type Detail: Synchronous
  Speed: 2133 MHz
  Manufacturer: Samsung
  Serial Number: 32148B75
  Asset Tag: (Date:16/14)
  Part Number: M393A4K40BB0-CPB
  Rank: 2
  Configured Clock Speed: 2133 MHz
  Minimum Voltage: Unknown
  Maximum Voltage: Unknown
  Configured Voltage: 0.003 V

Handle 0x002C, DMI type 17, 40 bytes
Memory Device
  Array Handle: 0x0028
  Error Information Handle: Not Provided
  Total Width: Unknown
  Data Width: Unknown
  Size: No Module Installed
  Form Factor: DIMM
  Set: None
  Locator: DIMMA2
  Bank Locator: P0_Node0_Channel0_Dimm1
  Type: DDR4
  Type Detail: Synchronous
  Speed: Unknown
  Manufacturer: NO DIMM
  Serial Number: NO DIMM
  Asset Tag: NO DIMM
  Part Number: NO DIMM
  Rank: 1
  Configured Clock Speed: Unknown
  Minimum Voltage: Unknown
  Maximum Voltage: Unknown
  Configured Voltage: 0.003 V

Handle 0x002D, DMI type 17, 40 bytes
Memory Device
  Array Handle: 0x0028
  Error Information Handle: Not Provided
  Total Width: 72 bits
  Data Width: 64 bits
  Size: 32 GB
  Form Factor: DIMM
  Set: None
  Locator: DIMMB1
  Bank Locator: P0_Node0_Channel1_Dimm0
  Type: DDR4
  Type Detail: Synchronous
  Speed: 2133 MHz
  Manufacturer: Samsung
  Serial Number: 32147F28
  Asset Tag: (Date:16/14)
  Part Number: M393A4K40BB0-CPB
  Rank: 2
  Configured Clock Speed: 2133 MHz
  Minimum Voltage: Unknown
  Maximum Voltage: Unknown
  Configured Voltage: 0.003 V

Handle 0x002F, DMI type 17, 40 bytes
Memory Device
  Array Handle: 0x0028
  Error Information Handle: Not Provided
  Total Width: Unknown
  Data Width: Unknown
  Size: No Module Installed
  Form Factor: DIMM
  Set: None
  Locator: DIMMB2
  Bank Locator: P0_Node0_Channel1_Dimm1
  Type: DDR4
  Type Detail: Synchronous
  Speed: Unknown
  Manufacturer: NO DIMM
  Serial Number: NO DIMM
  Asset Tag: NO DIMM
  Part Number: NO DIMM
  Rank: 1
  Configured Clock Speed: Unknown
  Minimum Voltage: Unknown
  Maximum Voltage: Unknown
  Configured Voltage: 0.003 V
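
One way to narrow this down is to list the populated slots whose Bank Locator names the channel reported by MCA. This is only a sketch: it assumes that the channel number FreeBSD prints uses the same numbering as the ChannelN part of the Bank Locator string, which nothing in this thread confirms, so the result is a candidate rather than a conclusive match. The function names are hypothetical.

```python
import re


def parse_memory_devices(dmidecode_text):
    """Split dmidecode output into one dict of fields per Memory Device."""
    devices = []
    # Every SMBIOS structure in the output starts with a "Handle ..." line.
    for block in re.split(r"(?m)^Handle ", dmidecode_text):
        lines = [line.strip() for line in block.splitlines()]
        if "Memory Device" not in lines:
            continue
        fields = {}
        for line in lines:
            # Split on the first colon only, so values may contain colons.
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        devices.append(fields)
    return devices


def populated_dimms(dmidecode_text, channel):
    """Return (locator, serial) pairs for installed DIMMs on a channel."""
    found = []
    for dev in parse_memory_devices(dmidecode_text):
        if dev.get("Size", "No Module Installed") == "No Module Installed":
            continue
        match = re.search(r"Channel(\d+)", dev.get("Bank Locator", ""))
        if match and int(match.group(1)) == channel:
            found.append((dev.get("Locator"), dev.get("Serial Number")))
    return found
```

Fed the output above with channel 1, this returns only DIMMB1 (serial 32147F28), since DIMMB2 is empty. With one module per channel, the channel alone is enough to pick a candidate; with two modules per channel it would not be.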


EDIT - Mainboard details provided

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
You would need to know the memory mapping of the RAM slots. Take a look at the user manual for your motherboard; it might have the memory map. You could also use something like MemTest86+ to test the address range, and I think it may list the bank being tested, so that is always something to look at.

You should have posted the motherboard model number.

My advice is to just watch your system and if you feel compelled to, run MemTest for a few days to see how things go.
 

Matt84

Dabbler
Joined
May 24, 2016
Messages
22
I updated my previous post with the motherboard model and also checked the motherboard manual for a memory map, but did not find anything. I did find the memory settings relevant to this: I have Memory RAS Patrol Scrub enabled at the default interval of 24 hours, and Demand Scrub is enabled as well. The Memory Corr. Error Threshold is at the default setting of 10, which must be why the error wasn't written to the board's event log.

http://www.supermicro.com/manuals/motherboard/D/MNL-1726.pdf

I'll monitor this, and if any more errors crop up I might look at MemTest86, although from what I can tell I'd need a paid version to test ECC memory.
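
For the monitoring part, a small script can tally corrected errors per channel from the system log. This is a rough sketch that assumes the log lines keep the exact "MCA: CPU n COR (k) ... channel c memory error" shape seen in the first post; the function name is hypothetical, and the log path in the usage note is an assumption about where the console messages end up.

```python
import re
from collections import Counter

# Matches lines such as:
#   MCA: CPU 0 COR (1) MS channel 1 memory error
COR_LINE = re.compile(
    r"MCA: CPU (\d+) COR \((\d+)\) (\S+) channel (\S+) memory error"
)


def tally_corrected(lines):
    """Sum the corrected-error counts in MCA log lines, keyed by channel."""
    totals = Counter()
    for line in lines:
        match = COR_LINE.search(line)
        if match:
            # group(2) is the count in parentheses, group(4) the channel.
            totals[match.group(4)] += int(match.group(2))
    return totals
```

It could be run as `tally_corrected(open("/var/log/messages"))` from time to time; a total that keeps climbing on one channel would be the signal to test or swap that module.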

Thanks for your help
 