ECC RAM Issue with ASUS Motherboard (Random Reboots)

Status
Not open for further replies.

manifest3r

Dabbler
Joined
Jul 3, 2014
Messages
16
Alright guys, I've had this problem for a while; I just never took the time to actually look into it.

I have the ASUS M5A97 LE R2.0 AM3+ motherboard, which supports ECC RAM. The ECC RAM I am running is a Kingston KVR13E9/8I 8GB 1333MHz DDR3 PC3-10666 ECC module. I ran memtest for about 12 hours with no errors. The CPU is an AMD Athlon II X4 620, which also supports ECC RAM.

So what IS the problem? I haven't been able to put my finger on it, but my guess is some compatibility issue with the RAM. No, I haven't tried non-ECC RAM or another stick of ECC RAM, as I don't have the funds for a spare.

What other troubleshooting steps can I take to make my system more stable or to pinpoint exactly what the issue is?
My FreeNAS box reports no errors when it randomly reboots; it just shuts off, then starts back up. I have tried modifying some BIOS RAM settings, which stretched the interval between reboots from 1-2 hours to 5-16 hours. I have tried moving the RAM to different memory slots, but it still reboots no matter where I put it. I have also tried switching the USB stick FreeNAS is installed on, as well as reinstalling FreeNAS from scratch.
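An empty log at reboot time doesn't fully rule out memory trouble, since corrected ECC errors are usually reported as machine-check events rather than as obvious failures. A hedged way to look for them is to scan the system log. The sketch below assumes FreeBSD's kernel convention of printing machine-check reports as lines like `MCA: Bank N, Status 0x...`, and that `/var/log/messages` survives the reboot; `mca_banks_in_file` and the default path are illustrative assumptions, and the sample lines are hypothetical, not taken from this server. On AMD family 10h CPUs such as the Athlon II X4 620, bank 4 is the northbridge, which is typically where DRAM ECC errors are reported.

```python
import re
from collections import Counter

# Assumption: FreeBSD's kernel logs machine-check reports as
# "MCA: Bank <n>, Status 0x<hex>"; adjust the pattern if your log differs.
MCA_BANK_RE = re.compile(r"MCA: Bank (\d+), Status (0x[0-9a-fA-F]+)")


def mca_banks(lines):
    """Count machine-check reports per MCA bank number in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = MCA_BANK_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


def mca_banks_in_file(path="/var/log/messages"):
    """Hypothetical helper: scan a syslog file (the default path is an assumption)."""
    with open(path, errors="replace") as log:
        return mca_banks(log)


if __name__ == "__main__":
    # Hypothetical sample lines, not real output from this server.
    sample = [
        "kernel: MCA: Bank 4, Status 0x9c2040000000011b",
        "kernel: em0: link state changed to UP",
        "kernel: MCA: Bank 4, Status 0x9c2040000000011b",
    ]
    for bank, count in sorted(mca_banks(sample).items()):
        print(f"bank {bank}: {count} report(s)")
```

An uncorrectable error that resets the machine instantly may never reach the log, so an empty result is weak evidence on its own.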

Here's some info about the RAM:

Code:
# dmidecode 2.12
SMBIOS 2.7 present.

Handle 0x002C, DMI type 17, 34 bytes
Memory Device
        Array Handle: 0x002A
        Error Information Handle: Not Provided
        Total Width: Unknown
        Data Width: 64 bits
        Size: No Module Installed
        Form Factor: DIMM
        Set: None
        Locator: DIMM0
        Bank Locator: BANK0
        Type: Unknown
        Type Detail: Synchronous
        Speed: Unknown
        Manufacturer: Manufacturer0
        Serial Number: SerNum0
        Asset Tag: AssetTagNum0
        Part Number: Array1_PartNumber0
        Rank: Unknown
        Configured Clock Speed: Unknown

Handle 0x002E, DMI type 17, 34 bytes
Memory Device
        Array Handle: 0x002A
        Error Information Handle: Not Provided
        Total Width: 72 bits
        Data Width: 64 bits
        Size: 8192 MB
        Form Factor: DIMM
        Set: None
        Locator: DIMM1
        Bank Locator: BANK1
        Type: DDR3
        Type Detail: Synchronous Unbuffered (Unregistered)
        Speed: 1333 MHz
        Manufacturer: Kingston
        Serial Number: 4424CD7
        Asset Tag: AssetTagNum1
        Part Number: 9965525-026.A00LF
        Rank: 2
        Configured Clock Speed: 667 MHz

Handle 0x0030, DMI type 17, 34 bytes
Memory Device
        Array Handle: 0x002A
        Error Information Handle: Not Provided
        Total Width: Unknown
        Data Width: 64 bits
        Size: No Module Installed
        Form Factor: DIMM
        Set: None
        Locator: DIMM2
        Bank Locator: BANK2
        Type: Unknown
        Type Detail: Synchronous
        Speed: Unknown
        Manufacturer: Manufacturer2
        Serial Number: SerNum2
        Asset Tag: AssetTagNum2
        Part Number: Array1_PartNumber2
        Rank: Unknown
        Configured Clock Speed: Unknown

Handle 0x0032, DMI type 17, 34 bytes
Memory Device
        Array Handle: 0x002A
        Error Information Handle: Not Provided
        Total Width: Unknown
        Data Width: 64 bits
        Size: No Module Installed
        Form Factor: DIMM
        Set: None
        Locator: DIMM3
        Bank Locator: BANK3
        Type: Unknown
        Type Detail: Synchronous
        Speed: Unknown
        Manufacturer: Manufacturer3
        Serial Number: SerNum3
        Asset Tag: AssetTagNum3
        Part Number: Array1_PartNumber3
        Rank: Unknown
        Configured Clock Speed: Unknown
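The telling lines in that output are `Total Width: 72 bits` versus `Data Width: 64 bits` for the Kingston module in DIMM1: the extra 8 bits per 64-bit word are the ECC check bits, so the firmware sees an ECC-capable DIMM. That shows the module carries ECC bits, not that the memory controller actually has ECC enabled. A minimal Python sketch that pulls this out of `dmidecode -t 17` output might look like the following; the function names are illustrative, and `SAMPLE` is an excerpt trimmed from the output above.

```python
import re

# Excerpt trimmed from the dmidecode output above (real output usually indents with tabs).
SAMPLE = """\
Handle 0x002C, DMI type 17, 34 bytes
Memory Device
        Total Width: Unknown
        Data Width: 64 bits
        Size: No Module Installed
        Locator: DIMM0

Handle 0x002E, DMI type 17, 34 bytes
Memory Device
        Total Width: 72 bits
        Data Width: 64 bits
        Size: 8192 MB
        Locator: DIMM1
"""


def parse_memory_devices(text):
    """Split dmidecode type 17 output into one dict per 'Memory Device' block."""
    devices, current = [], None
    for line in text.splitlines():
        if line.strip() == "Memory Device":
            current = {}
            devices.append(current)
        elif current is not None and line[:1] in (" ", "\t") and ":" in line:
            key, _, value = line.strip().partition(":")
            current[key.strip()] = value.strip()
        elif not line.strip():
            current = None  # a blank line ends the current block
    return devices


def width_bits(value):
    """Turn '72 bits' into 72; return None for 'Unknown' or anything else."""
    match = re.match(r"(\d+) bits", value)
    return int(match.group(1)) if match else None


def ecc_modules(devices):
    """List (locator, extra_bits) for installed modules wider than their data bus."""
    result = []
    for dev in devices:
        if dev.get("Size", "").startswith("No Module"):
            continue  # empty slot
        total = width_bits(dev.get("Total Width", ""))
        data = width_bits(dev.get("Data Width", ""))
        if total is not None and data is not None and total > data:
            result.append((dev.get("Locator"), total - data))
    return result


if __name__ == "__main__":
    for locator, extra in ecc_modules(parse_memory_devices(SAMPLE)):
        print(f"{locator}: {extra} extra bits per word (ECC-capable module)")
```

To check a live system, replace `SAMPLE` with the captured output of `dmidecode -t 17`.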


Uptime
Code:
# uptime
9:49PM  up 11:12, 1 user, load averages: 3.05, 1.42, 1.08


EDIT:
10/27: I updated the BIOS from v1903 to v2501, so a HUGE jump. I will update on progress...
10/28: The server rebooted after 16 hours and 30-some minutes. I disabled the onboard audio and serial ports, as suggested by @Urs.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
AMD systems are known for seemingly random issues nobody understands, especially in corner cases (like ECC).
 

DJABE

Contributor
Joined
Jan 28, 2014
Messages
154
Perhaps a PCI device is causing a kernel panic. I had a similar issue with a PCI SATA controller based on a stupid VIA chip. The NAS would just freeze...
 
L

Guest
At times like this I miss Solaris. Solaris/illumos will retire pages of memory that accumulate too many errors.
 

Urs

Dabbler
Joined
Oct 23, 2014
Messages
26
Hi,

I haven't had any stability problems with my setup; it has been running without a hitch (uptime is about 2 months).

Have you checked everything else in your system, such as the power supply and thermals?
I would suggest booting an Ubuntu live USB/CD and running cpuburn for 72 hours to see whether it is a thermal problem. Also run memtest for 72 hours; that should show whether the memory is stable.
Have you disabled all unused components, like the onboard audio?
 

manifest3r

Dabbler
Joined
Jul 3, 2014
Messages
16
The server restarted after 16 hours.

I went and disabled those ports. If it's still unstable, I might have to run cpuburn and memtest like you suggested.
 

Rsulliv1

Cadet
Joined
May 13, 2016
Messages
1
Were you able to get to the bottom of this? That mobo's QVL doesn't list any 8GB ECC sticks; was the issue that you were using an unsupported ECC module size?
 

manifest3r

Dabbler
Joined
Jul 3, 2014
Messages
16
It turned out to be a bad stick of RAM. I upgraded to a workstation server that actually supports unregistered ECC RAM, and right at POST it reported the RAM stick as bad.

I'm running 20GB of non-ECC RAM at the moment (I know, I know), but my data isn't really important, so losing it would be no great loss.
 