TempleHasFallen
Dabbler
- Joined
- Jan 27, 2022
- Messages
- 34
This post documents a lot of the troubleshooting that was done and shares the findings, with the goal of getting an ESXi hypervisor to run nested as a VM inside of TrueNAS SCALE.
Short backstory:
I'm in the process of upgrading and migrating from TrueNAS Core (storage) and 2 hypervisors (compute, connected via 10GbE to TN Core) to a single machine to lower power consumption. The TrueNAS box is an X10DRi with dual E5-2680 v3's and 256GB DDR4 ECC, while the hypervisors are Z8NA-D6's with dual X5675's and 64GB DDR3 ECC each. For connectivity between the boxes, Intel X540-T2's were used.
The goal is to keep the VM format ESXi-compatible and mount the NFS datastore directly on the ESXi VM, so that at any time additional hypervisors can be spun up and VMs migrated over if more compute is required, simply by mounting the NFS share.
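For reference, mounting the share from the ESXi side is a one-liner with `esxcli` (the hostname, export path, and datastore name below are placeholders, not the actual ones in use):

```shell
# Mount the TrueNAS NFS export as an ESXi datastore.
# -H: NFS server, -s: exported path, -v: datastore name (all assumed values).
esxcli storage nfs add -H truenas.local -s /mnt/tank/datastore -v nfs-datastore

# Verify the mount:
esxcli storage nfs list
```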
For testing purposes, one of the hypervisors was used with SCALE 22.02.01 as well as Bluefin (but the latter brought up a *lot* of bugs).
Main issues faced:
- The ESXi 6.5 and 6.7 installers PSOD regardless of hardware configuration (tried various CPU modes, NIC, and disk configs)
- ESXi 7.0.3(a-f) does not support the e1000 NIC used by KVM. It seems to be virtualized as an Intel 82540EM, which is deprecated in 7.0
- Preloading the ISO with community drivers did not resolve the issue.
- e1000 support was supposedly added in 7.0.3f but it does not work with SCALE KVM's e1000.
- Booting with the option "preferVMklinux=true", with either the custom or standard ISO, did not effect any change
- VirtIO NICs are not supported at all by ESXi and I could not find any way to make them work
- Passthrough issues with NICs... The dedicated card did not want to pass through to the VM at all (IOMMU enabled, SR-IOV enabled)
- VM fails to start when trying to attach one of the two onboard NICs (separate entries in lspci)
- Code:
group 24 is not viable Please ensure all devices within the iommu_group are bound to their vfio bus driver
- A ticket seems to be open about this already, though without much activity; it should allow isolating/blacklisting PCI devices other than GPUs in the future, as that seemed to be the cause of the issue.
- VM fails to start when trying to attach the dedicated X540-T2 (either both ports or either of them separately)
- Code:
failed to setup container for group : Failed to set iommu for container: Operation not permitted
- For passing through either the dedicated or onboard NICs, I also tried setting custom kernel options and rebooting (downstream and multifunction both together and separately), since the errors suggested it may be an IOMMU grouping issue:
- Code:
midclt call system.advanced.update '{"kernel_extra_options": "pcie_acs_override=downstream,multifunction"}'
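When chasing these "group is not viable" errors, it helps to see how the host has actually grouped the devices. A generic sketch (not SCALE-specific) that dumps every IOMMU group and its PCI devices; if a NIC shares a group with other devices, all of them must be bound to vfio-pci before passthrough can work:

```shell
#!/bin/sh
# List every IOMMU group and the PCI devices it contains.
for g in /sys/kernel/iommu_groups/*; do
    [ -d "$g" ] || continue          # no-op if IOMMU is disabled/absent
    echo "IOMMU group ${g##*/}:"
    for d in "$g"/devices/*; do
        # lspci -nns resolves vendor/device names; fall back to the raw address
        lspci -nns "${d##*/}" 2>/dev/null || echo "  ${d##*/}"
    done
done
```

Running this before and after applying the pcie_acs_override kernel option shows whether the ACS override actually split the groups.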
For ESXi 7.0.3 I can't get it to see a network card or pass through a card.
After changing hardware to an old consumer system (i7-3970X, DDR3), I was able to pass through the X540-T2 without a problem. This suggests the failure is hardware-dependent and passthrough may work on the target system; however, it also means I would effectively have to physically route all the datastore NFS traffic from one X540-T2 attached to the VM to another attached to TrueNAS.
Alternatively, is there any way to add to or modify the KVM adapter options to allow for vmxnet3 adapters? Or to edit the VM configuration directly? There seems to be a ticket already open about this too, but it has basically zero activity as well. I understand that the vmxnet3 adapter libraries are not present at all in SCALE 22.02 (so it's not only an issue of having the option available via GUI/CLI). Modifying the adapter model type to vmxnet3 via virsh did not effect a change either (not to mention it effectively breaks the GUI communication with libvirt).
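For anyone wanting to reproduce the virsh attempt, this is roughly what was tried (the VM name "esxi" is a placeholder; SCALE regenerates the domain XML when it starts the VM, which is part of why the change does not stick):

```shell
# Dump the domain XML of the VM (name is an assumption).
virsh dumpxml esxi > esxi.xml

# In the <interface> section of esxi.xml, change
#   <model type='e1000'/>
# to
#   <model type='vmxnet3'/>
# then redefine the domain from the edited file:
virsh define esxi.xml
```

Even when the redefine succeeds, the VM fails to start because SCALE 22.02 ships no vmxnet3 device model, and the out-of-band edit desyncs the middleware's view of the VM from libvirt's.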
If anyone has any idea how to make the e1000 driver work under ESXi 7.0.3, knows how to get VirtIO to work, or has any other suggestions, please assist.