SOLVED Help troubleshoot random reboots

Status
Not open for further replies.

bigphil

Patron
Joined
Jan 30, 2014
Messages
486
I would appreciate a little help trying to find the cause of random reboots on a FreeNAS system.

FreeNAS 9.2.1.2 Release
i5-2400S
Intel DQ67EP motherboard
8 GB non-ECC RAM
Intel PRO/1000 NICs

I understand that the hardware is not the greatest here, and if it's a hardware issue, so be it; I would bet on memory. I'd just like some help checking whether anything in the logs points to other possible causes. The reboots are usually random, but they are almost guaranteed to happen when I run a Veeam backup over NFS from an ESXi 5.5 host. The system reboots almost immediately after the job starts, then comes back up and carries on without any issue until the next backup job runs. I have syslog going to the .system dataset and have looked through the logs, but nothing obvious stands out to me. The latest crash was at exactly 6 a.m. this morning, when no Veeam job runs. I've attached some logs in case anyone with knowledge of this would care to look at them. Also, is there any additional logging I can turn on that could help track this down? Thanks!
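One low-effort way to look for clues is to pull out the last few lines syslog wrote before each reboot. Here is a minimal Python sketch, assuming the log is plain text and that FreeBSD's syslogd writes a line containing "kernel boot file is" each time it starts at boot (treated here as the boot marker). The default path /var/log/messages is also an assumption; point it at wherever the .system dataset keeps the log. If the lines before a boot show no panic or shutdown message, that at least suggests an abrupt reset rather than an orderly reboot.

```python
import sys
from collections import deque
from typing import Iterable, List

# Assumed boot marker: FreeBSD's syslogd logs "kernel boot file is ..." when it
# starts during boot, so each occurrence is treated as the start of a new boot.
BOOT_MARKER = "kernel boot file is"


def lines_before_boots(lines: Iterable[str], context: int = 5) -> List[List[str]]:
    """For each boot marker, return up to `context` log lines written just before it.

    Markers with nothing logged before them (e.g. at the very top of the file)
    are skipped.
    """
    recent = deque(maxlen=context)  # rolling window of the most recent lines
    results = []
    for line in lines:
        line = line.rstrip("\n")
        if BOOT_MARKER in line:
            if recent:
                results.append(list(recent))
            recent.clear()  # start a fresh window for the new boot
        else:
            recent.append(line)
    return results


if __name__ == "__main__":
    # Hypothetical default path; pass the real syslog file as the first argument
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/messages"
    with open(path, errors="replace") as f:
        for i, chunk in enumerate(lines_before_boots(f), 1):
            print(f"--- last lines before boot #{i} ---")
            print("\n".join(chunk))
```

Raising `context` (say, to 50) gives more surrounding history, which helps when checking whether NFS or Veeam activity was logged right before the reset.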
 

Attachments

  • Crash.zip
    59.8 KB

Yatti420

Wizard
Joined
Aug 12, 2012
Messages
1,437
Did you properly test the system before deployment? Memtest and hard drive checks?
 

bigphil

Patron
Joined
Jan 30, 2014
Messages
486
Yatti420 said: Did you properly test the system before deployment? Memtest and hard drive checks?

The hard disks are known good, and the smartctl output looks fine for all of them. The system boots from a 4 GB USB stick that previously ran ESXi with zero issues, so I doubt that's the problem. However, as I said in my first post, it could very well be hardware, most likely memory, since it's not ECC and it's different from the memory that was in the machine when it ran ESXi. I can't swap the old memory back in because it's now in my wife's PC and I dare not touch it; the memory from the FreeNAS box wouldn't work in her PC, so I had to swap. I'm just looking for guidance on any additional logging that could be enabled (nfsd, maybe?) or other logs to look at that aren't in the attachment on post #1. I'm planning to build a new box with a spare E5620 chip I have, so I'm not overly concerned; I'm just looking to learn a little more about additional logging.
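For the disk side, a quick scripted check of each drive's overall SMART verdict could look something like this Python sketch. It assumes smartmontools is installed and the script runs as root; the device names /dev/ada0 and /dev/ada1 are hypothetical placeholders. It only reads the overall health line from `smartctl -H`, so it's no substitute for reviewing the attributes or running long self-tests.

```python
import subprocess
from typing import Optional


def parse_smart_health(output: str) -> Optional[bool]:
    """Return True/False from `smartctl -H` text, or None if no verdict line is found."""
    for line in output.splitlines():
        # ATA drives print "SMART overall-health self-assessment test result: PASSED"
        if "overall-health self-assessment test result:" in line:
            return line.rsplit(":", 1)[1].strip() == "PASSED"
        # SAS/SCSI drives print "SMART Health Status: OK" instead
        if line.startswith("SMART Health Status:"):
            return line.rsplit(":", 1)[1].strip() == "OK"
    return None


def check_disk(device: str) -> Optional[bool]:
    # smartctl's exit code is a bitmask, so rely on the parsed text rather than it
    result = subprocess.run(["smartctl", "-H", device], capture_output=True, text=True)
    return parse_smart_health(result.stdout)


if __name__ == "__main__":
    # Hypothetical device names; substitute the disks in your pool
    for dev in ["/dev/ada0", "/dev/ada1"]:
        print(dev, check_disk(dev))
```

A `None` result means no verdict line was found, for example because the device doesn't support SMART or smartctl couldn't open it.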
 

bigphil

Patron
Joined
Jan 30, 2014
Messages
486
Turns out it was a bad DIMM slot on the Intel motherboard. I have a new Supermicro system up and running, and I'm running memtest on it now ;-)
 

DJABE

Contributor
Joined
Jan 28, 2014
Messages
154
So, have you lost your pool, or noticed any data corruption?
I think reboots are actually a good thing in this situation! Your motherboard/CPU/OS immediately halts or reboots the system, so you are warned that something is wrong, which is much better than so-called 'silent corruption'.
 

bigphil

Patron
Joined
Jan 30, 2014
Messages
486
I haven't seen any data corruption on the pool. It's hosting some Veeam backups and a dataset for a test Exchange Server database and they both appear to be good.
 

DJABE

Contributor
Joined
Jan 28, 2014
Messages
154
That's great. I just wonder how your OS (FreeBSD) or your motherboard was aware of the bad RAM...
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
It probably wasn't. If you remove a stick of RAM while the system is running, it reboots. Not sure why, but it does. The bad RAM slot probably made the RAM stick appear to have been removed, triggering a reboot.

Also, it's not necessarily a guarantee that the RAM slot itself is bad. For example, the voltage regulators that supply power to that slot could be fluctuating, giving the appearance of a bad slot. And everyone knows bad voltage regulators don't make for a stable machine.

Correlation isn't necessarily causation. But it may be enough for you to know that the hardware is bad.