Hello,
My motivation: to import a huge database (a few TB; even on fast dedicated hardware the import runs for days or weeks) onto a ZFS volume attached to a Linux VM, which runs the import task.
I've set up an Ubuntu 18.04 box with all the database and import tooling. The VM's boot disk is a zvol on Pool1; the data disk holding the imported data is a zvol on Pool2.
=> My VM lives on zvol1 and has a data disk attached on zvol2.
I've seen the FreeNAS UI warning "VirtIO is not so much stable in some OS but it's faster", but I tried it anyway. The VM kept crashing, and crashing badly: the console was half-broken with I/O pipe errors when trying to execute a command, the boot filesystem (zvol1, the VM itself) got corrupted, and so on. I was snapshotting heavily to have rollback points for troubleshooting; in the end I was forced to switch the zvol2 attachment to AHCI mode, and everything has worked smoothly since.
The Linux drivers themselves seemed to work fine: the kernel detected zvol2 without any issues, and I could mkdir/cp etc. But at random, usually within a few minutes of a reboot, the VM crashed.
I suspect the problem lies in FreeNAS/FreeBSD/bhyve rather than in the Linux VirtIO drivers, because:
- The crashes never corrupted the filesystem on zvol2, only the zvol1 boot filesystem
- They never produced any kernel-log errors such as a broken filesystem or an unresponsive drive => the guest OS never noticed any problem with the I/O
- They broke the VM itself in ugly ways: the terminal going crazy, networking dead, and so on
Any ideas? AHCI has been rock-solid from the very first try, no issues, and the import process has now been running for almost 24 hours. But it's SLOW. I've analyzed the bottleneck a bit:
- Inside the VM, 50% of CPU time is spent waiting for I/O
- Inside the VM, iostat shows the attached zvol2 drive at close to 100% utilization almost all the time
- Outside the VM, FreeNAS shows the 3 disks backing zvol2 reading/writing only a few (1-4) MB/s - that's really peanuts
- The FreeNAS reports show minimal total load and ZFS barely used
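For anyone wanting to reproduce my measurements, this is roughly what I collected (a sketch only; the pool/zvol names Pool2/zvol2 are from my setup, and the guarded commands simply skip on machines without sysstat or ZFS):

```shell
#!/bin/sh
# Sanity math on the numbers above: 3 disks at 1-4 MB/s each means the
# pool is pushing at most about 12 MB/s in aggregate.
disks=3
per_disk_max_mbs=4
echo "aggregate upper bound: $((disks * per_disk_max_mbs)) MB/s"

# Inside the guest: per-device utilization and average request size.
# High %util combined with a tiny average request size means many small,
# serialized I/Os - the signature of a single-queue emulated disk.
command -v iostat >/dev/null 2>&1 && iostat -x 5 3

# On the FreeNAS host: raw pool throughput, to compare with the guest view.
command -v zpool >/dev/null 2>&1 && zpool iostat -v Pool2 5 3

# Also on the host: the zvol's block size. If it does not match the guest
# filesystem's block size, every guest write can become a read-modify-write.
command -v zfs >/dev/null 2>&1 && zfs get volblocksize Pool2/zvol2
true
```

The interesting comparison is the guest's near-100% utilization versus the host's few MB/s: the disk is busy with many small requests, not saturated with bandwidth.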
My questions:
1. I read the import performance as a bottleneck in the AHCI emulation: am I wrong?
2. Why is VirtIO so unstable? It looks like a FreeNAS/bhyve problem, doesn't it? But nothing is logged in either the VM or FreeNAS.
Thank you for any ideas!