Bhyve with Ubuntu 19.04 - keeps locking up?

victorhooi

Contributor
Joined
Mar 16, 2012
Messages
184
I have FreeNAS 11.2-U6 running on an Atom C3000 machine.

I have a Bhyve VM, previously running Ubuntu 19.04 (and now the 19.10 beta).

Both the disk and NIC are set to VirtIO.

Within this Ubuntu VM, I have qBittorrent running under Docker, and it's also mounting my main FreeNAS ZFS pools via SMB.

The problem is that the Bhyve VM keeps locking up and becoming unresponsive. When this happens, I can no longer access the qBittorrent web interface, and if I try to launch a VNC session to the Bhyve VM, it does not respond. I do see the following output on the local console:

Code:
[ 2378.363427] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[ 2378.365006] rcu: 0-...!: (0 ticks this GP) idle=148/0/0x0 softirq=4885/4885 fqs=1
[ 2378.366726] rcu: rcu_sched kthread starved for 25974 jiffies! g78241 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=1
[ 2378.367388] rcu: RCU grace-period kthread stack dump:
[ 9542.860934] rcu: rcu_sched self-detected stall on CPU
[ 9542.862529] rcu: 0-...!: (4 GPs behind) idle=292/0/0x1 softirq=79946/79947 fqs=0
[ 9570.904083] watchdog: BUG: soft lockup - CPU#0 stuck for 26s! [swapper/0:0]
[10711.881480] watchdog: BUG: soft lockup - CPU#0 stuck for 31s! [swapper/0:0]
[17101.360544] watchdog: BUG: soft lockup - CPU#2 stuck for 1066s! [swapper/2:0]
[53645.015923] watchdog: BUG: soft lockup - CPU#2 stuck for 57s! [swapper/2:0]

Any idea what's going on?
 

dlavigne

Guest
Were you able to determine the cause of the lock ups?
 

victorhooi

Contributor
Joined
Mar 16, 2012
Messages
184
No, I didn't - I tried with both NFS and SMB/CIFS, and it seems to lock up when either one is active.
 

sjieke

Contributor
Joined
Jun 7, 2011
Messages
125
I'm also having this issue, after upgrading to an "Intel(R) Atom(TM) CPU C3758 @ 2.20GHz (8 cores)" board with 64 GB of RAM, so I have plenty to run some VMs.
My old board had an "Intel Avoton C2750 Octa-Core Processor" CPU with only 16 GB of RAM, but running 2 VMs on it was more stable than it is now - just too slow, because they started to swap due to the lack of RAM.

I upgraded the board and memory for better performance, but I hate that I now have to restart my VMs almost daily...

So if anyone can help, it would be really appreciated.
I was thinking it might be some BIOS setting or similar.

No NFS or SMB in the VMs yet. I've only installed Rancher and some nodes, with nothing on them yet except monitoring.
 

vastabo

Cadet
Joined
Dec 24, 2019
Messages
1
I'm seeing a similar issue on much higher-spec'd hardware doing far less:

OS Version:
FreeNAS-11.2-U7
(Build Date: Nov 19, 2019 0:4)
Processor:
Intel(R) Xeon(R) CPU D-1518 @ 2.20GHz (8 cores)
Memory:
64 GiB

The VM is Ubuntu 18.04 with 1 vCPU, 2 GB RAM, AHCI Disk*. The only thing it's doing is running the UniFi Controller.

* This hasn't been a problem lately but I'll switch it to VirtIO just to be sure.
 

Attachments

  • Screenshot_20191224_093940.png (43.5 KB)

apwiggins

Dabbler
Joined
Dec 23, 2016
Messages
41
Were you able to determine the cause of the lock ups?
@dlavigne @Kris Moore ...You do realize that the FreeNAS forum is riddled with reports of this VM lockup bug in FreeNAS? It has been reported dozens of times by others, and I've reported it myself. It's absolutely the most frustrating bug in FreeNAS, yet nobody reports what the root issue is and nobody has fixed it in FreeNAS as of 11.2-U7.

It's appalling that you as a moderator keep coming back with the same softball question -- were you able to determine the cause of the lockups? Has anyone ever reported an actual fix for this issue? If they have, what is it? Where is the FAQ on how to fix it?

I'm convinced that there is a bug in how FreeNAS creates VMs. So, what do we as a user and engineering community need to do to fix this ANNOYING bug?
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
Well, while I understand your frustration - I just finally diagnosed an annoying FreeBSD vnet/epair bug that only shows up in production in rather large network environments (yeah!) after 2 years - please note that I, for instance, run Ubuntu 18.04 in FreeNAS 11.3-RC1 in production (Atlassian Confluence) with uptimes measured in weeks.

So there seems to be something particular to your environment. Please open a bug on the FreeNAS JIRA - there you will get the attention of the developers, who rarely read the forum, and they will hopefully ask the specific questions needed to hunt down this beast.

Kind regards,
Patrick
 

apwiggins

Dabbler
Joined
Dec 23, 2016
Messages
41
@seanm @dlavigne @Chris Moore So, please equip me with some basics to facilitate debugging.

1. In what file / folder is bhyve logging done? I've been trawling through /var/log and not seeing anything, so where is it?
2. To replicate a FreeNAS bhyve launch, where are the command-line options that are used by FreeNAS? Better yet, is there a script that FreeNAS uses to launch a bhyve VM that can be populated with the typical FreeNAS command-line options?
3. In what file / folder is the FreeNAS bhyve configuration for a VM stored?
4. If there are shell scripts for any of the above, where are they in a typical FreeNAS installation?

I would like to be able to launch the same FreeNAS bhyve VM from the command-line with any debugging flags enabled in order to create a long-running process to capture stdout and stderr outputs.

The bug takes ~12-24 hours to replicate. In FreeNAS, I'm not doing anything special other than creating simple VMs, which lock up about 12-24 hours after launch. When I connect to the VMs via VNC or serial, they magically resume. I've been able to reproduce this with multiple guest OSes -- ClearLinux, Ubuntu, Alpine, K3OS, etc.

@apwiggins maybe your investigation needs to move more towards bhyve... perhaps start here: https://wiki.freebsd.org/bhyve#Troubleshooting
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
You probably want to look at /var/log/middlewared.log
It contains at least the full start command line for bhyve.
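
Something along these lines should pull out the most recent launch command - untested, and the exact log string may vary between versions, so adjust the pattern if needed:

Code:
grep "Starting bhyve" /var/log/middlewared.log | tail -n 1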

HTH,
Patrick
 

apwiggins

Dabbler
Joined
Dec 23, 2016
Messages
41
Filed a JIRA bug here: https://jira.ixsystems.com/browse/NAS-104526

There's nothing obvious in the logs. To set the stage, I have a VM running a Docker container with Portainer, using Alpine Linux as the guest OS. After about 12-24 hours, the VM drops off the network in what is best described as a suspended state. The guest OS appears to still be running inside the bhyve VM (though possibly suspended from bhyve's perspective), but it can no longer be pinged and no longer serves the Portainer web interface. Immediately upon connecting to the guest with either the FreeNAS admin web client via serial console or via a VNC web client, the bhyve VM resumes and the guest instantly resumes Portainer services. While in the suspended state, there is no logging of this event in FreeNAS' middlewared.log.
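
In the meantime, to pin down exactly when a VM drops off, a rough watchdog run on the FreeNAS host might help - just a sketch, with 192.168.0.50 standing in for the VM's address (FreeBSD's ping uses -t for the timeout in seconds):

Code:
# log a timestamp whenever the guest stops answering pings, so the drop-off
# time can be lined up against middlewared.log and the guest's dmesg
# (192.168.0.50 is a placeholder for the VM's address)
while true; do
  ping -c 1 -t 2 192.168.0.50 > /dev/null 2>&1 || echo "$(date): no reply"
  sleep 30
done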


Looking at the bhyve flags used at launch, -H (vmexit from the guest on hlt) is potentially causing the behaviour above:
Code:
(DEBUG) VMService.run():266 - Starting bhyve: bhyve -A -H -w -c 1 -m 1024 -s 0:0,hostbridge -s 31,lpc -l com1,/dev/nmdm14A -l bootrom,/usr/local/share/uefi-firmware/BHYVE_UEFI.fd -s 4,e1000,tap1,mac=00:a0:98:17:c3:60 -s 29,fbuf,vncserver,tcp=192.168.0.200:6113,w=1024,h=768,,wait -s 30,xhci,tablet -s 3:0,ahci,hd:/dev/zvol/tank/rancheros-wgkzbe 14_portainerVM4
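
To capture stdout/stderr from a manual run, the plan is roughly the following - untested; it assumes the FreeNAS-managed instance has been stopped first and that the tap/nmdm devices FreeNAS created are still present:

Code:
# clear any existing vmm state for this VM, then re-run the exact command
# middlewared logged, keeping stdout/stderr in a file
bhyvectl --vm=14_portainerVM4 --destroy
bhyve -A -H -w -c 1 -m 1024 -s 0:0,hostbridge -s 31,lpc \
  -l com1,/dev/nmdm14A -l bootrom,/usr/local/share/uefi-firmware/BHYVE_UEFI.fd \
  -s 4,e1000,tap1,mac=00:a0:98:17:c3:60 \
  -s 29,fbuf,vncserver,tcp=192.168.0.200:6113,w=1024,h=768,,wait \
  -s 30,xhci,tablet -s 3:0,ahci,hd:/dev/zvol/tank/rancheros-wgkzbe \
  14_portainerVM4 2>&1 | tee /tmp/14_portainerVM4.log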
 

apwiggins

Dabbler
Joined
Dec 23, 2016
Messages
41

Code:
# runs continuously
bhyve -A -H -w \
  -c 2 \
  -m 2560 \
  -s 0:0,hostbridge \
  -s 31,lpc \
  -l com1,/dev/nmdm5A \
  -l bootrom,/usr/local/share/uefi-firmware/BHYVE_UEFI.fd \
  -s 3,e1000,tap0,mac=00:A0:98:1F:C3:2E \
  -s 29,fbuf,vncserver,tcp=192.168.0.200:5905,w=1280,h=720,, \
  -s 30,xhci,tablet \
  -s 4:0,virtio-blk,/dev/zvol/tank/Gitlab-storage \
  5_Gitlab_VM2


Code:
# suspends after xx hours (no ping, no web service); resumes upon VNC connect
bhyve -A -H -w \
  -c 1 \
  -m 1024 \
  -s 0:0,hostbridge \
  -s 31,lpc \
  -l com1,/dev/nmdm14A \
  -l bootrom,/usr/local/share/uefi-firmware/BHYVE_UEFI.fd \
  -s 4,e1000,tap1,mac=00:a0:98:17:c3:60 \
  -s 29,fbuf,vncserver,tcp=192.168.0.200:6113,w=1024,h=768,,wait \
  -s 30,xhci,tablet \
  -s 3:0,ahci,hd:/dev/zvol/tank/rancheros-wgkzbe \
  14_portainerVM4
 

KrisBee

Wizard
Joined
Mar 20, 2017
Messages
1,288
@apwiggins Why have you left "wait" enabled on the VNC device attached to the VM that's causing a problem? Do you need a VNC device at all if you can connect to your VM via ssh and serial console? Why not use a virtio NIC on your VM?
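
For reference, aside from the port and resolution, the fbuf strings in your two pastes differ only in that trailing wait:

Code:
# VM that runs continuously:
-s 29,fbuf,vncserver,tcp=192.168.0.200:5905,w=1280,h=720,,
# VM that suspends:
-s 29,fbuf,vncserver,tcp=192.168.0.200:6113,w=1024,h=768,,wait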
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
Good point. I only use VNC devices for Windows guests - for FreeBSD and Linux guests, only during installation. Afterwards I remove them and configure a serial console.

Have a great new year, everyone!
Patrick

P.S. Hey, when did I ascend to "FreeNAS Guru"? ;)
 

apwiggins

Dabbler
Joined
Dec 23, 2016
Messages
41
@apwiggins Why have you left "wait" enabled on the VNC device attached to the VM that's causing a problem? Do you need a VNC device at all if you can connect to your VM via ssh and serial console? Why not use a virtio NIC on your VM?
@KrisBee

Two points:
1. Why does it matter whether a VNC device is used or not? The VM suspending/pausing is incorrect system behaviour even if a VNC device is used. To me, this is a bug. In fact, in one case VNC works as expected, and in the second case it fails.
2. I did some experimentation with a virtio NIC yesterday on bhyve command #2 above, prior to your post, and had the same result.
 

apwiggins

Dabbler
Joined
Dec 23, 2016
Messages
41
Good point. I only use VNC devices for Windows guests - for FreeBSD and Linux guests, only during installation. Afterwards I remove them and configure a serial console.

Have a great new year, everyone!
Patrick

P.S. Hey, when did I ascend to "FreeNAS Guru"? ;)
@Patrick M. Hausen Thanks for the tip. If I recall correctly, I used VNC for the install and just left it in place. Because I've been having this issue, ssh would fail and I had to poke the VM via VNC. I'll switch to serial to see if that improves things.

At the end of the day, it should work with VNC, without the VM suspending. That is just a bug in my view. Having to do a bunch of undocumented workarounds to get functional behaviour is a sure sign of a problem.

Thanks again @KrisBee and @Patrick M. Hausen for your help.
 

KrisBee

Wizard
Joined
Mar 20, 2017
Messages
1,288
@KrisBee

Two points:
1. Why does it matter whether a VNC device is used or not? The VM suspending/pausing is incorrect system behaviour even if a VNC device is used. To me, this is a bug. In fact, in one case VNC works as expected, and in the second case it fails.
2. I did some experimentation with a virtio NIC yesterday on bhyve command #2 above, prior to your post, and had the same result.

It may well be a VNC bug, but reports that running VMs with a VNC device increases load even when the VM is idle are enough to discourage any unnecessary use. I only mentioned virtio for the NIC as it's meant to give better performance, rather than it being related to any VNC issue you may have.
 

apwiggins

Dabbler
Joined
Dec 23, 2016
Messages
41
So, I've removed the VNC devices from the VMs, with interesting results, as it invokes all the EFI issues around blind mode... ugh! More troubleshooting.

All output below is from cu -l /dev/nmdmxxX -s 9600 sessions run in tmux on the FreeNAS host, one VM console per pane (3 panes).

Clear Linux - no issue

Ubuntu Linux:
Code:
error: no suitable video mode found.
error: no suitable video mode found.
Booting in blind mode


AlpineLinux:
Code:
  Booting `Alpine Linux, with Linux vanilla'

Loading Linux vanilla ...
Loading initial ramdisk ...
error: no suitable video mode found.
Booting in blind mode
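
For the Ubuntu guest, the usual way around blind mode seems to be pointing GRUB and the kernel at the serial console - a sketch only, not verified here; the values assume a stock Ubuntu install and the 9600 baud cu sessions above (Alpine uses its own bootloader setup, so it needs a different tweak):

Code:
# /etc/default/grub inside the Ubuntu guest
GRUB_TERMINAL="serial console"
GRUB_SERIAL_COMMAND="serial --unit=0 --speed=9600"
GRUB_CMDLINE_LINUX="console=tty1 console=ttyS0,9600"
# then: sudo update-grub && reboot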
 