Consistent system freezes every week

Status
Not open for further replies.

blackangus

Dabbler
Joined
Aug 11, 2016
Messages
16
I am having an issue where my system is freezing hard (no kernel panics) about every week. I've had it happen about three times now and it's suspiciously consistent. Note that I've never had any kernel panics nor anything go wrong with my ZFS volume. Everything has always been reported as healthy, even the boot volume. I don't run any plugins, only CIFS, SSH, and the usual default services of FreeNAS.

Here is the usual flow of events when this happens: I wake up one day and notice that my server can't be reached at all (HTTP or SSH). I hook up a monitor to the server to see what has happened and don't see anything wrong. Normal messages are showing, such as information about the crypto I'm using for my volume, mDNS messages, etc. Then I go to plug a keyboard into the machine, but it doesn't recognize it or respond at all. I press the power button on the machine and nothing happens (usually this triggers a shutdown when the system is fine). I now have to hold the power button to power cycle the whole thing and get it back up and running.

Here's what I've done so far:
  1. Tested my RAM for an entire weekend with Memtest86+ with no reported errors
  2. Checked SMART on all of my disks (everything's reported fine)
  3. Tried new RAM that I know is good from another running system
  4. Reinstalled FreeNAS
  5. Looked at various logs (dmesg, messages), which show nothing of interest. The entries in /var/log/messages from just before the freeze are sparse and unrelated to it, and they are followed directly by the logs from me starting the system up after the hard reboot. dmesg doesn't give any indication that anything is wrong either.
If anyone has any other ideas for what I can look at or try to do I'd greatly appreciate it! I am much more familiar with Linux than I am with FreeBSD so I'm sure there's something I'm missing.
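
One way to pin down when each freeze happened is to print the last line syslog wrote before every boot. This is a minimal sketch, assuming the kernel's boot banner (a line containing `Copyright (c) 1992-`) shows up in the log at each start-up. The function name `last_line_before_boot` is made up for illustration, and the marker may need adjusting to match your log.

```shell
#!/bin/sh
# Print the last log line written before each boot banner.
# Assumption: every boot logs a kernel line containing "Copyright (c) 1992-".
last_line_before_boot() {
    awk '
        # Boot banner: emit the most recent ordinary line, then start over.
        /Copyright \(c\) 1992-/ { if (prev != "") print prev; prev = ""; next }
        # Any other line: remember it as the latest activity seen so far.
        { prev = $0 }
    ' "$1"
}

# Example: last_line_before_boot /var/log/messages
```

The timestamps it prints can then be lined up against scheduled tasks to see whether the freezes cluster around a particular time.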

Here's a general view of my hardware (note that I'm NOT using any kind of hardware RAID, just RAIDZ1 with ZFS):
 

JoshDW19

Community Hall of Fame
Joined
May 16, 2016
Messages
1,077
Personally I've had several systems that will experience hard freezes when the motherboards are going out. That can be tough to diagnose though considering most of us don't have a ton of extra motherboards sitting around to swap out :).
 

blackangus

Dabbler
Joined
Aug 11, 2016
Messages
16
I really hope it's not my motherboard as I don't have any others to test with at this point. When you've experienced the freezes were they consistent? I can almost guarantee now that my system will freeze after 5-6 days of uptime...
 

JoshDW19

Community Hall of Fame
Joined
May 16, 2016
Messages
1,077
With failing motherboards they were never really consistent for me. It could be up for 2 hours and freeze consistently over and over or go for a couple days at a time with no issues. I had a device similar to this at one time: https://www.amazon.com/dp/B005J1SUIO/?tag=ozlp-20. It could run diagnostics, issue a 2-digit reference code, and tell you exactly what was failing on the board (e.g. the voltage regulator). If the board was broken enough it would just issue a generic error code in some cases, but you still knew it was a defect. Definitely do your homework though if you decide to buy one and make sure you grab a decent one with good ratings.

Before you grab anything like that though I'd encourage you to rule out anything else it could be (especially because your freezes seem consistent), and see if any of our other users have any other ideas. I really hope it's not your motherboard too. Best of luck!
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
Any correlation with scrubs, SMART tests, or any other scheduled activity, either on the FreeNAS or on any other device connected to the same network?
 

blackangus

Dabbler
Joined
Aug 11, 2016
Messages
16
I was actually looking at that earlier today. My long SMART tests are scheduled twice a month, on days when I never crashed (although I don't think I've ever hit this schedule, as it always crashes before the task runs). My ZFS scrubs are also scheduled twice a month, on days when the crashes hadn't happened. I have snapshots scheduled every day, but I only set that up within the last week. I've had these consistent crashes since I built the machine early in July.

Any other ideas? :(

BTW, has anyone heard of or used this before? http://www.inquisitor.ru/about/ It was linked in a few posts on the FreeBSD forum where people were able to find some issues with it. It seems to check a lot of stuff: http://www.inquisitor.ru/doc/tests/index.html

EDIT: I missed something! I do have a short SMART test scheduled to run every 4 days at 6 AM. This closely aligns with how frequently my crashes have been happening. Is there any way in FreeNAS to immediately start a scan and skip the scheduling?
 

styno

Patron
Joined
Apr 11, 2016
Messages
466
Yes, you can run them manually from a shell.
Code:
smartctl -t short -C /dev/daX

Wait a few minutes and review with:
Code:
smartctl -a /dev/daX
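
For several disks at once, something like the sketch below may help. It assumes the disks show up as `/dev/ada0`, `/dev/ada1` and so on, which are placeholder names (`smartctl --scan` lists the real ones). It leaves out `-C`, since the smartctl man page warns that captive mode can make a drive unavailable for the length of the test, which matters for disks in a mounted pool. The function names are made up for illustration, and they only print the commands so you can review them before piping them to `sh`.

```shell
#!/bin/sh
# Print (not run) the commands that start a short self-test on each disk.
smart_short_cmds() {
    for disk in "$@"; do
        echo "smartctl -t short $disk"
    done
}

# Print the commands that show each disk's self-test log afterwards.
smart_log_cmds() {
    for disk in "$@"; do
        echo "smartctl -l selftest $disk"
    done
}

# Example (as root; device names are placeholders):
#   smart_short_cmds /dev/ada0 /dev/ada1 | sh
#   sleep 300
#   smart_log_cmds /dev/ada0 /dev/ada1 | sh
```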


I encountered the same symptoms on an Abit VP6 board years ago; the solution then was to replace some bad capacitors. (Google "bad capacitors motherboard" and do at least a quick visual check on your board to rule this out.)
 

blackangus

Dabbler
Joined
Aug 11, 2016
Messages
16
Thanks for the info, didn't realize FreeNAS just ran those commands exactly as you specified.

I had temporarily changed the schedule so that the short SMART tests ran every hour last night. It didn't seem to cause any issues so I don't think that's related anymore.

I'll take a look at the capacitors on my motherboard and see if I can test with a different motherboard...
 

blackangus

Dabbler
Joined
Aug 11, 2016
Messages
16
Well, I think my NAS is already frozen. I'm not at home but I can't SSH into it via my jump host so I'm assuming it's frozen. This morning when I woke up things seemed fine with it. I figured the short SMART tests weren't causing the issue so I set the schedule back to running them once every 7 days. Now that it's frozen I'm not really sure what to think. This is the first time it's frozen in less than a day. I had powered it on last night around 11:30 PM or so when I kicked off the hourly SMART tests...

I wonder if this is bringing out some issue with the motherboard (memory bus, SATA controller, power fluctuations, etc.) as I don't think my drives are faulty.
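
Since the freeze was only noticed when SSH stopped answering, a small watcher on another machine could record when the box drops off the network. This is a rough sketch, not something from this thread: `freenas.local` is a placeholder host name, the 60-second interval is arbitrary, and it assumes an `nc` that supports `-z` and `-w`.

```shell
#!/bin/sh
# Sketch of a reachability watcher, meant to run on another machine.
# HOST is a placeholder; INTERVAL is an arbitrary choice.
HOST="${HOST:-freenas.local}"
PORT=22
INTERVAL=60

# Succeeds when a TCP connection to host:port opens within 5 seconds.
probe() {
    nc -z -w 5 "$1" "$2" >/dev/null 2>&1
}

# Print a timestamped line only when the state changes.
report_change() {   # usage: report_change OLD NEW LABEL
    [ "$1" = "$2" ] || echo "$(date '+%Y-%m-%d %H:%M:%S') $3 is $2"
}

# Loop forever, logging each up/down transition.
watch_host() {
    state=unknown
    while :; do
        if probe "$HOST" "$PORT"; then new=up; else new=down; fi
        report_change "$state" "$new" "$HOST:$PORT"
        state=$new
        sleep "$INTERVAL"
    done
}

# Example: watch_host >> freeze-watch.log
```

The first "is down" line in the log gives the freeze time to within one interval, which is easier to compare against scheduled tasks than "sometime overnight".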
 

blackangus

Dabbler
Joined
Aug 11, 2016
Messages
16
I'm running FreeNAS-9.10.1 (d989edd)

I had freezes before the 9.10.1 update as well as after it, so I wouldn't think it's related. Although if he is having issues updating from 8.x to 9.10, I wonder if there's an issue with the 9.10.x line...? I've never run anything older than 9.10 since I built this thing less than 2 months ago.
 

blackangus

Dabbler
Joined
Aug 11, 2016
Messages
16
I haven't encountered this before but it looks like my install did not verify correctly after my last freeze:



The following Inconsistencies were found in your Current Install:

List of Files/Directories/Symlinks not Found:

/compat/linux/proc/.donotremove
List of Permission Errors:

/compat/linux/proc Expected MODE: 0755, Got: 0555
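
To re-check a reported mode by hand, a small helper like the one below could be used. It is only a sketch: the function names `mode_of` and `check_mode` are made up, and it tries GNU `stat -c %a` first and falls back to BSD `stat -f %Lp`, so the expected mode is given without the leading zero (755 rather than 0755).

```shell
#!/bin/sh
# Print a path's permission bits in octal, e.g. 755.
# GNU stat uses -c %a; BSD stat (FreeBSD) uses -f %Lp.
mode_of() {
    stat -c %a "$1" 2>/dev/null || stat -f %Lp "$1"
}

# Compare against an expected mode and report a mismatch.
check_mode() {   # usage: check_mode PATH EXPECTED
    got=$(mode_of "$1") || return 2
    if [ "$got" = "$2" ]; then
        echo "$1: OK ($got)"
    else
        echo "$1: expected $2, got $got"
        return 1
    fi
}

# Example: check_mode /compat/linux/proc 755
```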
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Has the system ever failed in less than 24 hours of rebooting? If not, you could add a cron job to force a reboot once every 24 hours, but I personally hate that workaround.

Also you should run a CPU burn in test for 20 minutes, or longer but I wouldn't exceed 1 hour, just my opinion.

Also I hate to burst your bubble, but the A10 CPU is an APU. Others have had compatibility issues with AMD APUs in the past, which is why, if you want to use an AMD product, the FX series will do the job. Given that you have had this problem since day 1, I'd say you have a hardware compatibility issue. To verify something like this, I would recommend proving the system is stable using a different OS like Ubuntu. Disconnect the hard drives and boot up an Ubuntu Live DVD. Run it for a week or two without rebooting it. Does it crash and burn? Yes means you have a hardware failure; no means your hardware is not working with FreeNAS.
 

blackangus

Dabbler
Joined
Aug 11, 2016
Messages
16
I don't think I'd be able to put up with the reboot scenario :) I hear you on the APU issue. I knew it wasn't very well supported by FreeBSD but took the risk due to budget constraints.
I like your idea about running Linux for a few weeks to see what happens. I was about to get that going until an update was posted on this https://forums.freenas.org/index.ph...frequently-daily-after-upgrade-to-9-10.45290/ where it was claimed that the issue never happened on 9.3. It sounds like they also swapped out a fair amount of hardware multiple times and also ran Linux like you had suggested without running into any crashes.

I'm going to install 9.3 and if that continues to do the same thing then I'll try out your Linux suggestion.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Yea, if you have the problem still then you really just need to make sure the hardware isn't just crapping out on you due to some failing component. If it doesn't fail then you may have another option to use FreeNAS, but it isn't ideal. You could try to run FreeNAS in a VM, basically run it on top of ESXi or VirtualBox, to see if it works. I am running ESXi but I hear VirtualBox is nice too; however, I don't know if either one will get you past an incompatible CPU. The last alternative is definitely the least desirable, but since you don't run plugins or jails, only CIFS and SSH (sounds like you are just using this as a simple NAS device), you could try to run FreeNAS 9.2 or an even earlier version. I'll bet version 8.1 would work; we never had an AMD reported issue that I can recall back in the early days.

Something else I didn't mention earlier, I wouldn't run two SSDs for the boot devices, one is good enough. This shouldn't cause any problems BUT there isn't any real benefit either unless you have a true RAID card and create a mirror for the boot drives. Just keep a backup of your configuration file and restoration is very easy.
 

blackangus

Dabbler
Joined
Aug 11, 2016
Messages
16
Sounds good, thanks for your help! I actually switched out my two mirrored SSDs just now for one smaller one. My two other SSDs were kinda big too (256 apiece), so I was wasting an incredible amount of space.
I'll keep the VM idea in mind too in case things go poorly. I would like to start setting up jails and spin some VMs up from FreeNAS so I hope I won't be forced to do that.

This is a bit off topic, but do you know why the latest stable version of FreeNAS is using an RC kernel? I think it's on 10.3 RC3. Seems a bit strange that a stable release would move from a 9.3 kernel in FreeNAS 9.3 to an RC release of a kernel...
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
I have no earthly idea. Them programmers are years beyond me these days.
 

MrToddsFriends

Documentation Browser
Joined
Jan 12, 2015
Messages
1,338
On FreeNAS-9.10.1 (d989edd):
Code:
~ % uname -r
10.3-STABLE

Am I missing something?

As per the release announcement, the very first release of FreeNAS 9.10 was based on FreeBSD 10.3-RC3:
https://forums.freenas.org/index.php?threads/freenas-9-10-release-now-available.42223/

IIRC, I read a comment back then that the FreeNAS devs expected RC3 to be practically final.

Edit: The switch to FreeBSD 10.3-RELEASE was made with 9.10-STABLE-201603252134, a few days after the very first release of FreeNAS 9.10.
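
To tell at a glance which kind of FreeBSD build a given install reports, the suffix of `uname -r` can be classified with a small helper. This is just an illustrative sketch; the function name `branch_of` and the labels it prints are made up here.

```shell
#!/bin/sh
# Classify a FreeBSD version string as printed by `uname -r`.
branch_of() {
    case "$1" in
        *-RC*)      echo "release candidate" ;;
        *-RELEASE*) echo "release" ;;
        *-STABLE*)  echo "stable branch" ;;
        *-CURRENT*) echo "development branch" ;;
        *)          echo "unknown" ;;
    esac
}

# Example: branch_of "$(uname -r)"
```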
 

blackangus

Dabbler
Joined
Aug 11, 2016
Messages
16
You're right, I only looked at the kernel version after I installed the very first version of 9.10.
 

blackangus

Dabbler
Joined
Aug 11, 2016
Messages
16
Now that it's been a while, I can confirm that downgrading to 9.3 has solved my issue. I've been able to set up some more jails for other things I wanted to do, so my overall system load has increased without causing any issues.
 