Linux crashes with VirtIO

brumnas

Dabbler
Joined
Oct 4, 2015
Messages
33
Hello,

My motivation: I want to import a huge database (a few TB; the import takes days or weeks even on fast dedicated hardware) onto a ZFS volume attached to a Linux VM, which runs the import task.

I've set up an Ubuntu 18.04 box with all the database and import tooling. The VM box itself is on my Pool1/zvol; the data disk holding the imported data is on the Pool2/zvol.
=> My VM lives on zvol1 and has a data disk attached on zvol2.

I've seen the FreeNAS UI warning "VirtIO is not so much stable in some OS but it's faster", but I tried it anyway. The VM kept crashing, and crashing very badly: the console was half-broken with I/O pipe errors when I tried to execute a command, the boot filesystem (zvol1, the VM itself) got broken, and so on. I was snapshotting quite heavily to have rollback points for troubleshooting. In the end I was forced to switch to AHCI mode for attaching zvol2 to the VM, and everything has worked smoothly since then.

The Linux drivers seemed to work quite nicely: the kernel detected zvol2 without any issues, and I was able to mkdir, cp, etc. But at random, roughly every few minutes after a reboot, the VM crashed.

I suspect the problem lies in FreeNAS/FreeBSD/bhyve and not in the Linux VirtIO drivers, because:
- The crashes never corrupted the filesystem on zvol2, only the boot filesystem on zvol1
- They never resulted in any kernel-log errors (broken filesystem, unresponsive drive or anything like that) => the OS didn't notice any problems with the I/O
- The VM itself crashed pretty ugly: terminal going crazy, network broken and such

Any ideas? AHCI has been really stable from the very first try, with no issues; the import process has now been running for almost 24 hours. But: it's SLOW. I've analyzed the bottleneck a bit:
- Inside the VM, 50% of CPU time is spent waiting for I/O
- Inside the VM, iostat shows the attached zvol2 drive as almost always 100% saturated
- Outside the VM, FreeNAS tells me the 3 zvol2 disks are reading/writing only a few (1-4) MB/s, which is really peanuts
- FreeNAS reports show that the total load is minimal and ZFS is not heavily used
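
For anyone who wants to reproduce the "CPU time waiting for I/O" figure inside the VM, here is a minimal sketch, assuming a Linux guest where /proc/stat is available. The function names are mine, not from any tool; the idea is to take two samples of the aggregate "cpu" line a few seconds apart and look at how much of the elapsed CPU time went to the iowait column.

```python
def iowait_percent(before: str, after: str) -> float:
    """Percentage of CPU time spent in iowait between two samples of the
    aggregate 'cpu' line from /proc/stat (Linux guest assumed)."""
    def fields(line: str) -> list:
        parts = line.split()
        if not parts or parts[0] != "cpu":
            raise ValueError("expected the aggregate 'cpu' line")
        return [int(x) for x in parts[1:]]

    b, a = fields(before), fields(after)
    deltas = [y - x for x, y in zip(b, a)]
    # Columns: user nice system idle iowait irq softirq steal [guest guest_nice].
    # Guest time is already included in user/nice, so only the first 8 are summed.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total


def read_cpu_line(path: str = "/proc/stat") -> str:
    """Return the first line of /proc/stat, which is the aggregate 'cpu' line."""
    with open(path) as f:
        return f.readline()


# Usage inside the VM (not executed here):
#   first = read_cpu_line(); time.sleep(5); second = read_cpu_line()
#   print(iowait_percent(first, second))
```

A value near 50 over a longer interval would match what I described above; it only says the guest is waiting on the disk, not where on the host side the time is lost.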

My questions:
1. I read the slow import as a bottleneck in AHCI: am I wrong?
2. Why is VirtIO so unstable? It looks like a FreeNAS/bhyve problem, is it? But nothing is logged, neither in the VM nor in FreeNAS.

Thank you for any ideas!
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
Which FreeNAS version are you running, what is the layout of your pool, etc. ...?

I am running two Ubuntu 18.04 VMs in production on FreeNAS 11.3-U2.1, so it *can* be perfectly stable.
 

brumnas

Dabbler
Joined
Oct 4, 2015
Messages
33
FreeNAS-11.3-U2
Pool-1: 8 x 3TB, 2 spare
Pool-2: 3 x 4TB, 1 spare
RAM: ECC 64GB
Supermicro X11SSL-CF
Tunables: vfs.zfs.arc_max = 30GB, so there is still 32GB for the VM.

I'm talking about sharing the zvol to the Ubuntu VM via VirtIO. As I've said, the box otherwise runs stable when the zvol is attached as AHCI, but disk access is rather slow: the disk shows up to 100% busy while writing a lousy 18 MB/s.
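
To put the 18 MB/s in perspective, here is a rough back-of-the-envelope sketch. The 2 TB figure is only an assumed example (the real size is "a few TB"), decimal units are assumed, and it pretends the import is purely limited by sequential write throughput, which a real database import is not.

```python
def import_duration_hours(data_bytes: float, throughput_bytes_per_s: float) -> float:
    """Hours needed to write data_bytes at a constant throughput."""
    if throughput_bytes_per_s <= 0:
        raise ValueError("throughput must be positive")
    return data_bytes / throughput_bytes_per_s / 3600.0


# Assumed example values, not measurements:
ASSUMED_SIZE_BYTES = 2e12   # hypothetical 2 TB of data to write
OBSERVED_RATE = 18e6        # 18 MB/s, the rate mentioned above

hours = import_duration_hours(ASSUMED_SIZE_BYTES, OBSERVED_RATE)
print(f"about {hours:.1f} hours of pure writing")
```

Even under these optimistic assumptions the write phase alone takes more than a day, which is why the AHCI throughput matters so much to me.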

Thank you
 