Jails and stability?


rungekutta

Contributor
Joined
May 11, 2016
Messages
146
So I recently built and set up my system:
SuperMicro X11SSM-F
Intel i3 6100
Samsung 16GB DDR4 ECC 2133MHz
4x 3TB WD Red
2x 3TB Seagate NAS
The 6 drives are arranged in RAIDZ2
32GB SuperDOM for boot

FreeNAS version 9.10-STABLE.

The system has been rock-solid from the start and through burn-in, with the exception of when I try to use jails. Frankly, this is not working well and I'm a little unsure how to debug it.

First time I had a problem was when I played around with setting up a Minecraft server. I found that by starting and stopping jails, the whole server would sometimes reboot.

More recently I tested setting up Plex in a jail, following the instructions from a thread in this forum and installing the Plex server from pkg. It worked ok... until I realised my SMB shares via the CIFS service no longer work from Mac OS X (10.11.5). Finder fails to connect to the server and then drops the whole server off the "Shared" sidebar. It needs a force-restart of Finder itself to resolve, but then consistently fails again. I didn't make the connection at first, but when I stop the Plex jail everything snaps back to life and works as before.

The Plex jail has its own static IP, separate from the static IP of FreeNAS itself, which the CIFS service is also explicitly bound to. VIMAGE is enabled on the jail. Still, they clearly interfere with each other somehow.
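For what it's worth, this is roughly how I've been checking that the two aren't stepping on each other. The interface names are from my box and the JID comes from jls, so treat it as a sketch rather than a recipe:

# list running jails and their JIDs
jls
# show the bridge and epair interfaces warden created for the jail
ifconfig bridge0
ifconfig epair0a
# confirm smbd is only listening on the FreeNAS IP, not the jail's
sockstat -4 -l | grep smbd
# check the jail's own view of its interface (use the JID from jls)
jexec <jid> ifconfig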

The final icing on the cake arrived a few minutes later: after I had stopped the jail, the server decided to spontaneously reboot again. No clues in the logs as to what happened.

This is a relatively fresh install, not much config added yet, just a few datasets and a few users. CIFS, NFS, SMART and SSH services are running.

I'm surprised jails can be this unstable on a fresh and almost vanilla install... unless I have some kind of hardware problem which ONLY manifests itself when running jails? I have otherwise hammered it pretty hard through burn-in with memtest, I/O via dd and the sharing services, and not a glitch.
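For reference, the dd runs during burn-in looked roughly like this (the pool path is from memory, so take it as an illustration rather than the exact commands):

# sequential write of a big test file to the pool, then read it back
dd if=/dev/zero of=/mnt/pool1/testfile bs=1m count=100000
dd if=/mnt/pool1/testfile of=/dev/null bs=1m
rm /mnt/pool1/testfile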

Am I missing something obvious here...?


Jun 20 20:48:23 alaska devd: Executing '/etc/rc.d/dhclient quietstart igb1'
Jun 20 22:56:06 alaska kernel: epair0a: link state changed to DOWN
Jun 20 22:56:06 alaska kernel: epair0a: link state changed to DOWN
Jun 20 22:56:06 alaska kernel: epair0b: link state changed to DOWN
Jun 20 22:56:06 alaska kernel: epair0b: link state changed to DOWN
Jun 20 22:56:06 alaska kernel: igb1: link state changed to DOWN
Jun 20 22:56:06 alaska kernel: igb1: link state changed to DOWN
Jun 20 22:56:06 alaska kernel: bridge0: link state changed to DOWN
Jun 20 22:56:06 alaska kernel: bridge0: link state changed to DOWN
Jun 20 22:56:06 alaska kernel: igb1: promiscuous mode disabled
Jun 20 20:56:10 alaska devd: Executing '/etc/rc.d/dhclient quietstart igb1'
Jun 20 22:56:10 alaska kernel: igb1: link state changed to UP
Jun 20 22:56:10 alaska kernel: igb1: link state changed to UP
Jun 20 22:57:01 alaska manage.py: [common.pipesubr:61] Popen()ing: /sbin/zfs list -H -o name '/mnt/pool1/jails/.warden-template-standard'
Jun 20 22:57:01 alaska manage.py: [common.pipesubr:61] Popen()ing: /sbin/zfs get -H origin '/mnt/pool1/jails/plex'
Jun 20 22:57:01 alaska manage.py: [common.pipesubr:61] Popen()ing: /sbin/zfs list -H -o name '/mnt/pool1/jails/.warden-template-standard'
Jun 20 22:57:01 alaska manage.py: [common.pipesubr:61] Popen()ing: /sbin/zfs get -H origin '/mnt/pool1/jails/plex'
Jun 20 23:02:08 alaska syslog-ng[1510]: syslog-ng starting up; version='3.6.4'
Jun 20 23:02:08 alaska kernel: ifa_del_loopback_route: deletion failed: 48
Jun 20 23:02:08 alaska Freed UMA keg (udp_inpcb) was not empty (120 items). Lost 12 pages of memory.
Jun 20 23:02:08 alaska Freed UMA keg (udpcb) was not empty (1169 items). Lost 7 pages of memory.
Jun 20 23:02:08 alaska Freed UMA keg (tcptw) was not empty (540 items). Lost 12 pages of memory.
Jun 20 23:02:08 alaska Freed UMA keg (tcp_inpcb) was not empty (119 items). Lost 12 pages of memory.
Jun 20 23:02:08 alaska Freed UMA keg (tcpcb) was not empty (44 items). Lost 15 pages of memory.
Jun 20 23:02:08 alaska hhook_vnet_uninit: hhook_head type=1, id=1 cleanup required
Jun 20 23:02:08 alaska hhook_vnet_uninit: hhook_head type=1, id=0 cleanup required
Jun 20 23:02:08 alaska Fatal trap 12: page fault while in kernel mode
Jun 20 23:02:08 alaska cpuid = 0; apic id = 00
 

Nick2253

Wizard
Joined
Apr 21, 2014
Messages
1,633
The system has been rock-solid from the start and through burn-in

What exactly did you do for burn-in?

This problem makes me think that you might have bad memory.
 

Jailer

Not strong, but bad
Joined
Sep 12, 2014
Messages
4,977
What are you running for a power supply?
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
Jails are pretty stable and shouldn't be causing a crash. Do you have any tunables set, or did you use autotune?
 

rungekutta

Contributor
Joined
May 11, 2016
Messages
146
What exactly did you do for burn-in?

This problem makes me think that you might have bad memory.
@Nick2253 to add to the previous response above, perhaps memtest86 is not the right tool; if so, any ideas on a better tool for the job? It would seem odd otherwise that memtest86 can read and write to RAM continuously for 4 days without issues, whereas starting and stopping jails in FreeNAS immediately exposes problems. That must mean that either memtest86 is a poor testing tool, or alternatively that the issue with jails is in software rather than hardware...?

At the moment I'm inclined to think the problem lies with FreeNAS itself in combination with my specific hardware combo, but as mentioned I'm somewhat at a loss to diagnose this further.
 

Nick2253

Wizard
Joined
Apr 21, 2014
Messages
1,633
@Nick2253 to add to the previous response above, perhaps memtest86 is not the right tool; if so, any ideas on a better tool for the job? It would seem odd otherwise that memtest86 can read and write to RAM continuously for 4 days without issues, whereas starting and stopping jails in FreeNAS immediately exposes problems. That must mean that either memtest86 is a poor testing tool, or alternatively that the issue with jails is in software rather than hardware...?

At the moment I'm inclined to think the problem lies with FreeNAS itself in combination with my specific hardware combo, but as mentioned I'm somewhat at a loss to diagnose this further.

Memtest is the right tool.

Did you do any other burn-in? Hard drives, CPU, anything else?

Don't forget, burn-in only catches problems that exist during burn-in, and is no guarantee that problems won't develop later. If you still have it easily accessible, I'd recommend running memtest overnight to see if you have any problems right now.
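If you want to hit the disks as well, something along these lines is a reasonable starting point (device names are just examples; check yours with camcontrol devlist):

# long SMART self-test on each disk (repeat for ada1, ada2, ...)
smartctl -t long /dev/ada0
# view the results once the self-test has finished
smartctl -a /dev/ada0
# full sequential read of the raw disk to shake out read errors
dd if=/dev/ada0 of=/dev/null bs=1m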
 

Mr Snow

Dabbler
Joined
May 22, 2016
Messages
29
Thanks for pointing me to this thread, rungekutta. I'll keep my thread related to just my hardware issue, I think.

I'm seeing a very similar issue related to jails. The system seems to run stably without jails (admittedly I haven't had it up and running without a jail for a decent length of time). I think I can at least eliminate memory as an issue: I've had memtest running for 9 hours now with zero errors.

I found this thread (bug #4058) which suggests these issues may be related to the Intel i210 chipset, which is what the X11SSM board uses.

So, considering FreeNAS 10 is moving away from jails, my next moves are...
  1. Leave the NAS running with no jails for at least 24 hours.
  2. Try to create a bhyve VM (as per the doco here) running boot2docker.
  3. Create some docker instances to run the tools I want (sonarr, nzbget, plex, etc.), roughly as sketched below.
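For step 3, I'm expecting something like the following once boot2docker is up; the linuxserver/plex image and the paths are just the common example, I haven't actually tried it on this box yet:

# run Plex in a container, publishing its default port and mounting media read-only
docker run -d --name plex \
  -p 32400:32400 \
  -v /media:/media:ro \
  linuxserver/plex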
Regards,

CJ
 

maglin

Patron
Joined
Jun 20, 2015
Messages
299
You should check to ensure the MAC address in the jail is different from the physical NIC's. I had issues where one of my jails was using the same MAC address, causing some strange behaviour on my router. No reboots though. I did have an nginx server config cause FreeNAS major issues and eventually cause a reboot. Took me a few hours to sort it out.
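A quick way to compare them from the FreeNAS shell (interface and jail names will obviously differ on your box):

# MAC of the physical NIC on the host
ifconfig igb0 | grep ether
# MAC of each jail's interface
jexec <jailname> ifconfig | grep ether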

I currently have the following running in jails.
Jail 1:
Sabnzbd
Sickrage
Couchpotato
Plexpy
Plex
Htpcmanager

Jail2:
Unbound DNSSec server

Jail 3:
Squid
Privoxy

Jail4:
UniFi Controller

Jail5:
Owncloud

It was owncloud that was giving me issues. I would start by checking MAC addresses, which may fix your CIFS issues.


Sent from my iPhone using Tapatalk
 

Mr Snow

Dabbler
Joined
May 22, 2016
Messages
29
What were the symptoms and solution for the nginx issue? I've seen some mention of nginx in console messages, but figured it was unrelated.

Regards,

CJ
 

maglin

Patron
Joined
Jun 20, 2015
Messages
299
It was a bad config for an older version of nginx. I'm not well versed in nginx and found a newer config to use. My Google-fu was strong. Sorry I can't help more. But I would pull up my IPMI console, and after accessing owncloud a non-stop flood would occur and practically bring the network to a halt. That, and it was the last thing I had changed before the reboots started.


Sent from my iPhone using Tapatalk
 

rungekutta

Contributor
Joined
May 11, 2016
Messages
146
Mr Snow, thanks for the link to bug #4058, very interesting, almost identical symptoms to mine on the same hardware. I ran memtest again overnight with no issues, so I think I'm concluding for now that this is probably a software issue and will leave jails alone altogether. The server works very well apart from this, and I haven't managed to stress it under normal use, let alone shake out any other types of issues.
 

Mr Snow

Dabbler
Joined
May 22, 2016
Messages
29
Ok, so some interesting developments at home today!

Because of the mention of networking and MAC addresses, etc., I decided to change my jail config from using internally assigned IPs to DHCP. For the last 3 hours or so I've been creating and deleting jails and doing all the things that normally caused me a kernel panic, and I've yet to see a single crash. So, tentative yay. I'll hold off on celebrations until this has been stable for a few days.

rungekutta, have you got your jail config set to IPv4 DHCP enabled? If not, give it a try and see what happens.
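If you do switch, it's worth double-checking from the host that the jail really picked up a lease rather than falling back to a warden-assigned address; something like this should show it (substitute your jail's name):

# the address the jail's interface actually ended up with
jexec <jailname> ifconfig
# confirm dhclient is running inside the jail
jexec <jailname> ps ax | grep dhclient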

Regards,

CJ
 

rungekutta

Contributor
Joined
May 11, 2016
Messages
146
Ok, interesting, let us know if it remains stable with DHCP or if you run into any more reboots... Mine is static IP too. It needs to be, as I'm running servers (Plex) and need routing from the firewall. I suppose I could run DHCP and set up a static mapping against the MAC address in the DHCP server if push came to shove, and if this really works around the problem in FreeNAS/FreeBSD.
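If the router runs dnsmasq, that mapping would be a one-liner, roughly like this (MAC and address made up for illustration):

# in dnsmasq.conf: always hand this MAC the same address
dhcp-host=aa:bb:cc:dd:ee:01,192.168.1.50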

Nevertheless, it seems you may be getting nearer to exposing and narrowing down a bug, which can then hopefully be fixed...!
 

Mr Snow

Dabbler
Joined
May 22, 2016
Messages
29
I suppose I could run DHCP and set up a static mapping against the MAC address in the DHCP server if push came to shove

This is exactly what I did. I also need the port forwarding setup (which is why I went static initially).

Uptime is 8 hours now (I had to shut down before I hit the sack last night, to swap the box from my desk to its more permanent home). I'll also raise a bug somewhere (FreeBSD or FreeNAS...) once I'm happy with the stability.

Regards,

CJ
 

Mr Snow

Dabbler
Joined
May 22, 2016
Messages
29
So, I've had over 36 hours of stable use since I swapped to DHCP (I deleted all the jails and deleted the dataset before I made the config change). I think I can (somewhat) confidently say that it was using static IPs (well, FreeNAS internally assigned IPs) that was causing the crashes.

rungekutta, it would be great if you are in a position to verify those results. If it gives you a stable environment, I'll raise a FreeNAS bug (which may actually be a FreeBSD thing, but I'm not the best person to decide that).

Regards,

CJ
 

rungekutta

Contributor
Joined
May 11, 2016
Messages
146
So, I've had over 36 hours of stable use since I swapped to DHCP (I deleted all the jails and deleted the dataset before I made the config change). I think I can (somewhat) confidently say that it was using static IPs (well, FreeNAS internally assigned IPs) that was causing the crashes.

rungekutta, it would be great if you are in a position to verify those results. If it gives you a stable environment, I'll raise a FreeNAS bug (which may actually be a FreeBSD thing, but I'm not the best person to decide that).
Thanks, I'll check it out. It will likely be over the weekend or sometime next week.
 