Inaccessible VM after a period of time

Ziggy

Contributor
Joined
Oct 7, 2015
Messages
157
I have installed an Ubuntu Server 20.04 VM and installed WireGuard within it. I have successfully connected to it from outside the LAN via the Android WireGuard app, and SSH is set up and working too. After about 24 hours (sometimes more, sometimes less) both SSH and VPN access are gone, even though the GUI shows the VM as being up. Only a restart fixes the problem. Though I'm not quite a newbie, I am certainly inexperienced in network matters, so I wonder how I would go about investigating this. I have looked at the 'sudo journalctl' output but cannot see anything obvious, though I'm not sure what I should be looking for, or even whether this is where I should look. Grateful for any suggestions.
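Because only a restart brings the VM back, the entries that matter are usually in the boot before the current one, which plain `journalctl` buries under everything else. A minimal sketch, assuming the journal is persistent (with a volatile journal `-b -1` returns nothing) and bearing in mind that the very last entries may never reach disk if the guest hangs hard. The function names are hypothetical helpers; the options are standard journalctl flags:

```shell
#!/bin/sh
# Wrappers around standard journalctl options for looking at the boot that hung.
# Assumption: the journal is persistent, so earlier boots are still available.

# list_boots: one line per recorded boot; offset -1 is the previous boot
list_boots() {
  journalctl --no-pager --list-boots
}

# prev_boot_tail [N]: last N entries (default 200) of the previous boot,
# i.e. what was logged just before the VM was restarted
prev_boot_tail() {
  journalctl --no-pager -b -1 -n "${1:-200}"
}

# prev_boot_problems: only priority "warning" and more severe, previous boot
prev_boot_problems() {
  journalctl --no-pager -b -1 -p warning
}
```

If your user cannot read the system journal, run the same journalctl options directly with sudo (sudo cannot call shell functions).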
 

Ziggy

Postscript: I saw elsewhere here in the forums (I will link to it if we can get this working) that someone had suggested using VirtIO instead of the Intel NIC. I tried this last night and initially all was well, but this morning the login was stuck in some kind of journalctl loop. I had run journalctl last night with the 'less' option but am pretty sure I exited it. In any case, rebooting doesn't fix it; the loop continues.
 

samuel-emrys

Contributor
Joined
Dec 14, 2018
Messages
136
If you open a serial connection to the VM and leave it open, you may get some output when the VM becomes unresponsive. I was having a similar issue in the past that seems to have been rectified by adding more RAM to my server. The errors I was getting were the following:

Code:
[168474.229769] INFO: rcu_sched detected stalls on CPUs/tasks:
[168474.231971]         1-...: (0 ticks this GP) idle=342/0/0 softirq=9278254/9278254 fqs=0
[168474.234501]         (detected by 0, t=5252 jiffies, g=2846663, c=2846662, q=7242)
[168474.236975] rcu_sched kthread starved for 5252 jiffies! g2846663 c2846662 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x1


My reading of these messages is that the CPU was stalling, perhaps deadlocking on this task. Other posts [1][2] with similar error messages pointed to problems allocating memory, which seems consistent with my solution, but my messages are missing the backtrace, so I don't have details on what caused it. I didn't get much further in my research, but hopefully this is helpful to you. My bhyve VM was running Debian 8.
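To check a captured serial log or a previous boot's kernel messages for the same kind of stall, a small filter helps. A sketch with a hypothetical function name: the first two patterns come from the log above, the others are common Linux kernel hang and out-of-memory messages added as assumptions, and the list is not exhaustive:

```shell
#!/bin/sh
# Kernel message fragments that usually accompany a guest hang rather than a
# pure network fault. Assumption: this list is illustrative, not complete.
HANG_PATTERNS='rcu_sched detected stalls|kthread starved|blocked for more than|soft lockup|Out of memory'

# scan_hang_lines: read log text on stdin, print only lines matching a pattern
scan_hang_lines() {
  grep -E "$HANG_PATTERNS"
}
```

For example, `journalctl -k -b -1 --no-pager | scan_hang_lines` inside the guest, or `scan_hang_lines < vm-serial.log` on a saved serial capture.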
 

Ziggy

Thanks for that quick response. Funnily enough, I thought I had allocated 4 GB of RAM but it was only 2, so I have bumped that to 4. I have 2 CPUs allocated as well, so I may increase that if the extra RAM doesn't work. I had already reverted to the Intel NIC too, so I may try another combination of these changes if your proposed solution doesn't work. I have 16 GB of system RAM, so I suppose I could try 8 for the VM if 4 doesn't work?
 

samuel-emrys

See how you go; it might work. I was allocating 2 GB to the VM before, and changing the allocation to 4 GB made no difference for me. The server had 32 GB of RAM installed, and I've since upgraded that to 128 GB. I haven't seen the issue since increasing the system RAM, and I have left the VM allocated at 4 GB. Either way, I think it would be worth attaching a serial session to the VM and watching for error messages to confirm that your error is similar to mine.
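Since the discussion turns on how much memory the VMs get relative to the host, a back-of-the-envelope check can help. A sketch with a hypothetical function name; it only adds up VM allocations and ignores everything else on the host that uses RAM (ZFS cache, jails, services), so a low percentage is not proof of enough headroom:

```shell
#!/bin/sh
# vm_mem_share HOST_MIB VM_MIB...: sum the memory assigned to VMs and print
# it as a share of the host's physical RAM (integer percent, rounded down).
vm_mem_share() {
  host=$1; shift
  [ "$host" -gt 0 ] || return 1
  total=0
  for m in "$@"; do
    total=$((total + m))
  done
  pct=$((total * 100 / host))
  printf 'VMs: %d MiB of %d MiB host RAM (%d%%)\n' "$total" "$host" "$pct"
}
```

From the FreeNAS shell, `vm_mem_share $(( $(sysctl -n hw.physmem) / 1048576 )) 4096` uses the real physical memory figure. With 16384 MiB and a single 4096 MiB VM it prints `VMs: 4096 MiB of 16384 MiB host RAM (25%)`.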
 

Ziggy

Thanks. I had forgotten to ask - how do I attach a serial session to the VM?
 

Ziggy

P.S. Apologies, I had misread your original post: you had added RAM to the host, not the VM.
 

samuel-emrys

You can do it from the FreeNAS GUI, though I'm not sure how stable it is:
Virtual Machines > click the arrow next to your VM > Serial

Alternatively, you can connect from the terminal: note down the "Com Port" shown on the same page as above (it will look something like /dev/nmdm4A) and issue this command from the FreeNAS shell:

Code:
sudo cu -l /dev/nmdm4A


This will bring you to a login prompt. If you run it inside something like tmux, you'll get a more stable session. Try whatever is easiest for you and see how you go. While testing, I've noticed that my issue hasn't actually been fixed; the VM is just a bit more stable than it was before. Regardless, perhaps this gives you a place to start.
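To follow the tmux suggestion and also keep a record of what the console printed before a hang, a sketch for the FreeNAS shell. Assumptions: tmux is available there, `script(1)` is used to append the session to a log, and the session name and log path are hypothetical defaults; pass the Com Port path exactly as the GUI shows it:

```shell
#!/bin/sh
# Keep a serial console to the VM open in a detached tmux session and append
# everything it prints to a log file, so the last messages before a hang are
# kept even if nobody was watching.
SESSION=${SESSION:-vmserial}          # hypothetical session name
LOG=${LOG:-/root/vm-serial.log}       # hypothetical log path

# start_serial_capture COMPORT: run "cu -l COMPORT" under script(1) in tmux
start_serial_capture() {
  if [ -z "${1:-}" ]; then
    echo "usage: start_serial_capture /dev/<com port shown in the GUI>" >&2
    return 2
  fi
  tmux new-session -d -s "$SESSION" "script -a $LOG cu -l $1"
}
```

Reattach with `tmux attach -t vmserial`, detach again with Ctrl-b then d, and end the cu connection itself with `~.`.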
 

soupy

Cadet
Joined
Aug 3, 2020
Messages
5
Hello,

I am also troubleshooting the same issue. I have it on both a FuryOS (BSD) VM and Linux VMs. I have tried Fedora 32, Xubuntu, and Ubuntu Server 20.04. I am currently running CentOS 8, which runs stably for more than 48 hours UNTIL I apply all of the post-installation updates. I have just reinstalled using VirtIO for the disk and the Intel driver for the network (VirtIO networking worked during the install but not after the first reboot).

Hardware-wise, I am running a Xeon T20 with 12 GB of RAM. I have ordered an additional 16 GB, but it is very slow to arrive.
 

soupy

I have tried all combinations of CPU cores and RAM with no change. I've attached a serial session and will update accordingly.
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
My experience is that the VirtIO driver is the best choice for disk and network for both Linux and FreeBSD. One possible cause of the VM being unreachable after the first reboot is Linux network device naming.

If you install Ubuntu with one disk, one network card and VGA/VNC, the network card ends up as enp0s5. If you remove the VGA device after installation it gets renamed to enp0s4. There may be other scenarios that make something like this happen.

If your VM is connected during install but not after reboot, I'd log on via whichever console you prefer and have available (VNC or serial), check with ifconfig -a what the interface is named, and then look at what is actually configured in /etc/netplan. Most likely you have a configuration problem within Linux here.

HTH,
Patrick
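The interface-name check described above can be scripted inside the guest. A minimal sketch, assuming the simple netplan layout the Ubuntu installer writes (interface names as keys directly under `ethernets:`, spaces for indentation); the awk is a line-based approximation, not a YAML parser, and the function names are hypothetical:

```shell
#!/bin/sh
# netplan_ifaces FILE: print the interface names listed under "ethernets:"
netplan_ifaces() {
  awk '
    # the ethernets key itself: remember its indentation and start collecting
    /^ *ethernets: *$/ {
      match($0, /^ */); base = RLENGTH; inside = 1; child = -1; next
    }
    # skip blank and comment lines inside the block
    inside && /^[ \t]*(#.*)?$/ { next }
    inside {
      match($0, /^ */); ind = RLENGTH
      if (ind <= base) { inside = 0; next }   # dedent: block has ended
      if (child < 0) child = ind              # first child sets the key level
      if (ind == child) { key = $1; sub(/:$/, "", key); print key }
    }
  ' "$1"
}

# missing_ifaces FILE [NETDIR]: report configured interfaces the kernel lacks
missing_ifaces() {
  netdir=${2:-/sys/class/net}
  for i in $(netplan_ifaces "$1"); do
    [ -e "$netdir/$i" ] || echo "configured but not present: $i"
  done
}
```

Run `for f in /etc/netplan/*.yaml; do missing_ifaces "$f"; done`. If it reports, say, enp0s5 while the VM now has enp0s4, change the name in that file and apply it with `sudo netplan apply`.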
 

Ziggy

Interesting development. I increased the RAM to 4 GB and updated FreeNAS to 11.3-U4. After the reboot, same problem: the VM went down after about 12 hours. Afterwards I happened to be working on a Windows machine (I think this is incidental, but I want to mention it just in case) and had a PuTTY terminal open to the VM while I worked on Windows for about 8-12 hours. The VM remained connected, so I decided to leave the PuTTY connection running overnight. The following morning it was still connected, so, as I was going away for 4 days, I left the Windows machine on with PuTTY connected to the VM as an experiment. During that time I checked the VPN connection to the VM from outside the LAN and it worked every time. On my return about 3 or 4 days ago I exited the PuTTY session and switched off the Windows machine, and the connection has persisted ever since. Total uptime is now 8 days.
Note, however, that on my return about 4 days ago I also deleted a couple of other VMs which, though they weren't running at the same time, may have contributed to the warnings about overcommitting resources (now gone) that I had been getting whenever I launched the Ubuntu server after each outage. I have also since deleted all the FreeNAS snapshots for the main pool and then recreated a single one.
One other thing I notice is that the 'users' command, run from a Linux terminal connected to the VM, shows 2 users with the same name, the only user besides root (I don't think I created a root user when the VM was built). Since I am not logged in via the GUI shell or any other device, I cannot figure out why the same user is logged in twice. In any case, could this second login be keeping the VM alive somehow? Regarding Patrick's comment above about VirtIO, I am still using Intel. As for samuel-emrys's advice on connecting via a serial session, I tried it but got some kind of error. Unfortunately I didn't log it, so I can't remember what it was. Apologies, and thanks for the help. Will update in due course.
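`users` prints one name per login session, so the same name twice just means two open sessions. `who` shows the terminal and, for remote logins, the originating host of each session. A small helper (hypothetical name) that counts sessions per user from `who` output:

```shell
#!/bin/sh
# sessions_per_user: read `who` output on stdin and print "user count",
# one line per user, sorted by user name
sessions_per_user() {
  awk '{ n[$1]++ } END { for (u in n) print u, n[u] }' | sort
}
```

Run `who | sessions_per_user` for the counts, and plain `who` to see which terminal and host each session belongs to.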
 

LordCrc

Cadet
Joined
Nov 28, 2018
Messages
7
I'm having similar issues, though mine started recently after upgrading from Debian 9 to Debian 10. Running 11.3-U. The VM shows as running but is completely unresponsive. The serial console shows nothing except a message about "running blind".

A restart brings it back to life for a while. It seems to correlate with network activity, and using the VirtIO network driver seems to make it worse (though that might just be chance). It can run fine for 5 minutes or 5 hours.
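Several posts compare the emulated Intel NIC with VirtIO. To confirm from inside a Linux guest which driver an interface is actually bound to after switching the NIC type in the VM settings, a sketch that reads the standard sysfs links (typically `e1000` for the Intel device and `virtio_net` for VirtIO); the function name is hypothetical:

```shell
#!/bin/sh
# nic_drivers [NETDIR]: print "<interface> <driver>" for every interface that
# is backed by a device; NETDIR defaults to the standard sysfs location
nic_drivers() {
  netdir=${1:-/sys/class/net}
  for dev in "$netdir"/*; do
    link="$dev/device/driver"
    # virtual interfaces such as lo or wg0 have no backing device; skip them
    [ -L "$link" ] || continue
    drv=$(readlink "$link")
    printf '%s %s\n' "${dev##*/}" "${drv##*/}"
  done
}
```

Running `nic_drivers` with no argument inspects the live system.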
 

Ziggy

As luck would have it ... gone again! Last night I was having difficulty accessing my shares on an Android TV box and thought FreeNAS was to blame, so after a couple of attempts at restarting services I ended up rebooting. It turned out the router was the problem after all. Doh! Because, sure enough, after that reboot the VM became inaccessible again within about 12 hours. I note that only one user is logged in now when I check from a Linux terminal, so I am still wondering whether the second login had somehow been keeping the VM alive. I am going to try to troubleshoot the logs, but I am very much a noob in this area.
 

Ziggy

UPDATE:
I just logged in from a Linux terminal and appear to be connected (the cursor is blinking). However, when I checked via the FreeNAS GUI (which says the VM is up) and opened its VNC tab, I got the ‘failed to connect to server’ error message. When I went back to the terminal, the cursor was still blinking but the terminal was unresponsive and wouldn’t accept any input (e.g. I tried ‘uptime’). Pity I didn’t try this before attempting the VNC connection to see whether I got a response. The Tlot plickens :smile:
 

spiceygas

Explorer
Joined
Jul 9, 2020
Messages
63
This sounds similar to a problem I've been having with an Ubuntu Server 20.04 VM running in Bhyve. (Posted here) Interestingly, my Ubuntu 20.04 Desktop VMs haven't had similar problems -- only Ubuntu Server. Not sure if that's a relevant distinction.

I'll be watching the conversation, and am hopeful you find a solution that I can steal :smile:.
 

Ziggy

Thanks for your input. No, I had Ubuntu 20.04 Desktop with the same result; that is why I tried the Server edition, but the outcome was the same. The 19 Server edition was no different either. I've had earlier versions of Mint (19) Desktop work without a problem on earlier versions of FreeNAS (11.1 and possibly 11.2). I first noticed the problem with Mint 20.04 and 11.3 (I'm now on 11.3-U4); I couldn't even get it to install. It seemed as if the installer could not see /dev. I wouldn't even bother with this except that I cannot get WireGuard to work in a jail, mostly because I am unable to get past the initial instructions at https://www.ixsystems.com/community/resources/how-to-setup-a-wireguard-vpn-server-in-a-jail.147/. All I'm really looking for is a workable WireGuard VPN, and the VM provides that apart from the dropped connections. If I can't get this sorted soon I may opt for a Raspberry Pi for the VPN. It just seems a waste to run two machines side by side when FreeNAS can probably do this if I can just find the solution.
 

Ziggy

UPDATE: I may be wrong, but I’m beginning to suspect that, for some reason, launching a VNC session is what makes the VM inaccessible. My VM has now been running for two days, and I have only accessed it via a terminal on Linux or an iOS or Android app. Previously I sometimes felt the need to check the VM via the GUI, so I would click on the VNC tab and then get the connection error message. If the VM stays reachable without VNC use, then VNC may very well be the culprit. In the words of Arnie, “I’ll be back.”
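One way to test the VNC theory is to log, from another machine, exactly when the VM stops answering and compare that with when the VNC tab was opened. A sketch, assuming `nc` with `-z`/`-w` support on the watching machine; the address and file name are hypothetical, and it probes the SSH TCP port only, since WireGuard uses UDP and cannot be checked this way:

```shell
#!/bin/sh
# Log the VM's reachability at a fixed interval so outages can be lined up
# against VNC sessions or other events. Run it on another machine.
VM_HOST=${VM_HOST:-192.168.1.50}   # hypothetical VM address
VM_PORT=${VM_PORT:-22}             # SSH, a TCP port nc can probe
INTERVAL=${INTERVAL:-60}           # seconds between probes

# probe HOST PORT: succeed if a TCP connection opens within 5 seconds
probe() {
  nc -z -w 5 "$1" "$2" >/dev/null 2>&1
}

# log_state HOST PORT: print "<timestamp> UP|DOWN host:port"
log_state() {
  if probe "$1" "$2"; then state=UP; else state=DOWN; fi
  printf '%s %s %s:%s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$state" "$1" "$2"
}

# Only loop when started with --watch, so sourcing the file runs nothing
if [ "${1:-}" = "--watch" ]; then
  while :; do
    log_state "$VM_HOST" "$VM_PORT"
    sleep "$INTERVAL"
  done
fi
```

Start it with something like `VM_HOST=<vm address> sh vmwatch.sh --watch >> vmwatch.log`, then compare the first DOWN timestamp with the time the VNC tab was opened.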
 

samuel-emrys

We might be having different issues, but this doesn't match the symptoms I see. I only notice the VM has become unresponsive when the web service I host there is no longer accessible; I don't VNC in.
 

Ziggy

I agree. I’m not really sure VNC is the underlying issue, and of course you and I may have different issues, or there may be a number of issues affecting VMs in 11.3-U4. It will be interesting to see whether TrueNAS has these problems. I’m almost tempted to give it a go, but it’s reportedly still very much in beta.
 