RancherOS Issues (web interface sometimes not coming up, cannot upgrade, etc)

Status
Not open for further replies.

Sasayaki

Explorer
Joined
Apr 20, 2014
Messages
86
Hi guys,

I've been running 11.1.U4 for a while now, after finally transitioning off Coral and back to jails, and then out of jails and back to Docker (haha). However, I've run into issues with the RancherOS implementation in FreeNAS. Specifically:

- Oftentimes, after rebooting the VM, the web interface (https://<ip>:8080) doesn't come up. The only solution to this seems to be waiting a while to see if it's coming to life, and if it isn't, rebooting until it does. Obviously this is quite frustrating, since "reboot it 3-6 times until it works" is not a great solution. Anyone else having this issue?

- The command 'sudo ros os upgrade' does not appear to be honoured. The version is currently stuck at v1.1.3, yet v1.3.0 is available. When I attempt to upgrade, it confirms the upgrade was complete, but after rebooting, loads into v1.1.3.

- Some containers fail when their config directories are loaded from an NFS drive. Others are totally fine. I'm guessing, based on research, that this is because of the way NFS does file locking with databases; however, when I use SMB shares instead of NFS, they seem to fail to connect sometimes after rebooting, which causes problems with the containers (they load without their config directories and therefore in a 'default' state).

- Whenever the web interface is loaded, one CPU sits at 100% usage constantly. Bizarrely, it doesn't seem to be reported in 'top' on the RancherOS server, but when I use 'top' on the actual bare-metal FreeNAS machine, it reports Bhyve at somewhere around 100%-130%, even when it should nominally be idling. Again, the RancherOS VM reports that it's idling, but the FreeNAS box doesn't.

None of these are huge showstoppers (even if the rebooting thing is highly annoying...), but they are frustrations. Anyone else having similar issues?
 

KrisBee

Wizard
Joined
Mar 20, 2017
Messages
1,288
The command 'sudo ros os upgrade' does not appear to be honoured. The version is currently stuck at v1.1.3, yet v1.3.0 is available. When I attempt to upgrade, it confirms the upgrade was complete, but after rebooting, loads into v1.1.3.

This is a consequence of the implementation used to create and boot the "Docker VM" type. I've posted about this before elsewhere. I don't know if there is a newer outstanding ticket for this issue. If not, raise a new one.

Oftentimes, after rebooting the VM, the web interface (https://<IP>:8080) doesn't come up. The only solution to this seems to be waiting a while to see if it's coming to life, and if it isn't, rebooting until it does. Obviously this is quite frustrating, since "reboot it 3-6 times until it works" is not a great solution. Anyone else having this issue?

If you're running a rancher server on the rancheros base, you can always see how long it takes for all the required elements to load by connecting to the VM with ssh and checking the running rancher docker containers with a simple docker ps and/or, as root, system-docker ps. Also, if you gave your "docker VM" a fixed IP, you might not have noticed that it first makes a DHCP connection before overwriting the IP with the static IP. So on a home network, make sure you don't run out of DHCP leases due to reboots.
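Rather than rebooting blindly, the waiting itself can be scripted. Below is a minimal sketch in Python, assuming the UI answers on port 8080 as described in the original post. The names `http_probe` and `wait_for_ui`, the attempt count, and the delay are illustrative choices, not anything provided by rancher or FreeNAS.

```python
import time
import urllib.error
import urllib.request


def http_probe(url, timeout=5.0):
    """Return True if the URL answers with any HTTP response at all."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server replied with an error status, so the listener is up.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, TLS or DNS failure: treat as not up yet.
        return False


def wait_for_ui(url, attempts=30, delay=10.0, probe=http_probe, sleep=time.sleep):
    """Poll `url` until `probe` succeeds.

    Returns the 1-based attempt number that succeeded, or None if every
    attempt failed. `probe` and `sleep` are injectable so the loop can be
    exercised without a real server.
    """
    for attempt in range(1, attempts + 1):
        if probe(url):
            return attempt
        if attempt < attempts:
            sleep(delay)
    return None


# Hypothetical usage (substitute the VM's real scheme and address):
#   wait_for_ui("http://<ip>:8080")
```

If this returns None after several minutes, that is probably the point to ssh in and look at docker ps rather than reboot again.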

Some containers fail when their config directories are loaded from an NFS drive. Others are totally fine. I'm guessing, based on research, that this is because of the way NFS does file locking with databases; however, when I use SMB shares instead of NFS, they seem to fail to connect sometimes after rebooting, which causes problems with the containers (they load without their config directories and therefore in a 'default' state).

Not making use of rancher/docker at the moment, but I was under the impression that reads/writes to databases mounted over an NFS share are not going to work too well.
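For the case where a share fails to connect after a reboot and the containers come up in a 'default' state, one possible guard is to refuse to start a container until its config directory really is a mounted share. This is only a sketch under assumptions: the function name `config_dir_ready` and the optional sentinel file are hypothetical, and it presumes the share is mounted directly at the config path.

```python
import os


def config_dir_ready(path, sentinel=None):
    """Return True only if `path` is a mount point.

    If `sentinel` is given, also require that file to exist inside the
    share, which catches an empty or wrong share mounted at `path`.
    """
    if not os.path.ismount(path):
        # The share is not mounted; a container started now would see an
        # empty local directory and write a fresh default config into it.
        return False
    if sentinel is not None:
        return os.path.exists(os.path.join(path, sentinel))
    return True
```

A startup script could call this in a loop and only launch the container once it returns True.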

Whenever the web interface is loaded, one CPU sits at 100% usage constantly. Bizarrely, it doesn't seem to be reported in 'top' on the RancherOS server, but when I use 'top' on the actual bare-metal FreeNAS machine, it reports Bhyve at somewhere around 100%-130%, even when it should nominally be idling. Again, the RancherOS VM reports that it's idling, but the FreeNAS box doesn't.

Bhyve stuck at 100% for any length of time could be a bug in rancher or possibly insufficient resources given to the VM. The combo of rancheros + rancher server for the RancherUI requires 2GB+ of memory.
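To check what the VM actually sees, the memory total can be read from /proc/meminfo inside the guest. A minimal sketch, assuming a Linux guest with the standard `MemTotal:` line reported in kB; the helper name `mem_total_gib` is illustrative, and the 2 GiB threshold is simply the figure quoted above.

```python
def mem_total_gib(meminfo_text):
    """Extract MemTotal from /proc/meminfo-style text and return it in GiB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            # Line looks like: "MemTotal:        2097152 kB"
            kib = int(line.split()[1])
            return kib / (1024 * 1024)
    raise ValueError("MemTotal line not found")


# Hypothetical usage inside the VM:
#   with open("/proc/meminfo") as f:
#       if mem_total_gib(f.read()) < 2.0:
#           print("VM has less than the ~2GB suggested above")
```

Note that the kernel reserves some memory, so a VM configured with exactly 2GB will usually report slightly less than 2.0 here.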

Also, things may run more smoothly if you use two "docker VMs": one to run the Rancher UI and a second, separate rancher host for the containers. I've read comments that running containers on the same host as the rancher server can cause problems. Unfortunately, this uses even more resources.

Another alternative would be to set up the rancher server on an Ubuntu VM. Rancher advises 4GB of memory if you do this:

https://rancher.com/docs/rancher/v2.0/en/quick-start-guide/#host-and-node-requirements

Personally, after some experimentation, I gave up on rancher. To create and manage a small number of docker containers, I found using docker & docker compose within a minimal debian VM, managed through portainer, to be less resource hungry, easier to understand and hence easier to maintain.
 

Sasayaki

Explorer
Joined
Apr 20, 2014
Messages
86
Hmm, well, thanks heaps for the comments mate. It sucks that these issues exist, but I'll give some of those suggestions a shot anyway and see how it goes. :)
 