TrueNAS Core 12: SCSI IO timeout on VMware ESXi "passthrough environment"

josketto

Cadet
Joined
Nov 23, 2020
Messages
5
Hi,
I'm unable to complete the TrueNAS "as VM" installation in the following environment:
  • Server: QuantaGrid D52B-1U
  • CPU: Intel Xeon Silver 4110 @ 2.10 GHz
  • RAM: 96 GB DDR4
  • Onboard PCI Controller with the following disks:
    • 1 x NVMe Samsung MZQLB960HAJR-0007 960 GB
  • Avago LSI 9361-8i 12 Gbps SAS Controller, configured in JBOD mode, with the following disks:
    • 2 x SSD Intel SSDSC2KB48 480 GB
    • 4 x SAS WDC WUH721414AL5201
  • Hardware BIOS Settings: UEFI
  • Operating system: VMware ESXi 7.0.1 (boot from NVMe)
    ESXi Disk.jpg
  • Passthrough device: MegaRAID SAS Invader Controller
    Passtrough.jpg
  • TrueNAS Virtual Machine:
    • Boot virtual disk: 20 GB on NVMe Datastore
    • 32 GB dedicated RAM
    • 4 vCPU
    • PCI Device: MegaRAID SAS Invader Controller
The TrueNAS setup initialization stops after a few minutes with the following message:
mrsas0: initiating target RESET because of SCSI IO timeout
Schermata 2020-11-24 alle 10.21.32.jpg


If I install the same version of TrueNAS directly on the "bare metal" hardware (same BIOS and HBA config), it works fine.

What can I do to make it work in ESXi?

Thanks in advance to anyone who can help.

Francesco
 

josketto

I've compared the boot screens and found the following difference between the physical (working) and virtual (not working) setups:

Avago-Boot-Phisical.jpg


Avago-Boot-Virtual.jpg


I hope this helps toward a resolution.
 

josketto

Hi jgreco, thank you very much for the reply.

I've read your article and I understand that the LSI 9361-8i RAID controller, even when configured in JBOD mode, may keep control of the physical disks, which can cause problems for TrueNAS's direct management of them.

I will keep that in mind, but the server I'm working with is pre-configured by a cloud provider and I can't choose the exact RAID controller model (only the disk configuration), so I can't change the PCI device.

By the way, my "SCSI IO timeout" problem occurred only with the TrueNAS setup in the VMware environment: the physical setup (on the same server) was working fine.

In the end, I was able to solve the VM passthrough problem by following the suggestions of this external article and the TrueNAS post-install guidelines for VMware, adding the following lines to the "/boot/loader.conf" file:
  • if_vmx_load="YES"
  • hw.pci.honor_msi_blacklist=0

    boot-loader.conf.jpg
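
For readers who want to script the change above, here is a minimal sketch, assuming a POSIX shell on the TrueNAS VM. The helper name `ensure_tunable` is made up for illustration and is not part of TrueNAS or FreeBSD. It appends a `key=value` line only when the key is not already set, so running it twice does not duplicate entries.

```shell
#!/bin/sh
# Hypothetical helper (not a TrueNAS/FreeBSD tool): append KEY=VALUE to a
# loader.conf-style file unless a line starting with KEY= already exists.
# Usage: ensure_tunable FILE KEY VALUE
ensure_tunable() {
    file=$1
    key=$2
    value=$3
    # Escape dots so that e.g. "hw.pci.x" is matched literally by grep.
    pattern=$(printf '%s' "$key" | sed 's/\./\\./g')
    if grep -q "^${pattern}=" "$file" 2>/dev/null; then
        echo "unchanged: ${key} is already set in ${file}"
    else
        printf '%s=%s\n' "$key" "$value" >> "$file"
        echo "added: ${key}=${value}"
    fi
}
```

It could be called as `ensure_tunable /boot/loader.conf if_vmx_load '"YES"'` and `ensure_tunable /boot/loader.conf hw.pci.honor_msi_blacklist 0`, ideally against a scratch copy of the file first. On TrueNAS the same two values can probably also be entered as loader-type Tunables in the web UI, which may be the safer route if hand edits to the file get overwritten.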
After the reboot I added the PCI device to the VM (via ESXi passthrough) and all the disks attached to the "MegaRAID SAS Invader Controller" were recognized. You can see the result in the following screenshots:

camcontrol-devlist.jpg


truenas-disks.jpg


Do you think this configuration is enough to assure me that I "... DO HAVE A TRUE HBA ..."?

Thanks again for your response.

Francesco
 


jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
josketto said:
Hi jgreco, thank you very much for the reply.

I've read your article and I understand that the LSI 9361-8i RAID controller, even when configured in JBOD mode, may keep control of the physical disks, which can cause problems for TrueNAS's direct management of them.

No. While that is true, it is also the case that the driver isn't up to it, and that the controller itself can suffer a variety of issues under heavy load.

josketto said:
I will keep that in mind, but the server I'm working with is pre-configured by a cloud provider and I can't choose the exact RAID controller model (only the disk configuration), so I can't change the PCI device.

Don't be silly. Of course you can change the PCI device. Buy from a cloud provider who sells what you need. Or colocate your own server.

It's just like buying a server. If you have high random I/O needs and you need a server with lots of lightning fast SSD, and then you walk in the door of the computer store and they only sell servers with HDD, you just walk out the door. It's very simple. You don't buy the HDD-based server and say "but that's all they had." It being all they had will not magically make it work better or be an acceptable server. Your HDD-based server will suck and bad things may happen. I hope you get what I'm saying here.

josketto said:
By the way, my "SCSI IO timeout" problem occurred only with the TrueNAS setup in the VMware environment: the physical setup (on the same server) was working fine.

Yes, I know that's what you think, but typically the mrsas stuff goes sideways when bad things happen. It's like riding around in a car with no airbags. It gets you where you're going, until maybe one day you head-on someone and swallow your steering wheel, because you had no airbags. My measure of success for a NAS is that it safely and securely stores your data, both under adverse conditions as well as regular conditions.

josketto said:
Do you think this configuration is enough to assure me that I "... DO HAVE A TRUE HBA ..."?

No. I quote,

https://www.truenas.com/community/t...s-and-why-cant-i-use-a-raid-controller.81931/

4) FreeBSD may or may not have good support for other HBA's/RAID controllers.

[...]

If you have come to FreeNAS with the intention of making your NAS into a guinea pig for testing of an unknown and untested controller, then, by all means, go ahead. Just please be aware that the measure of success isn't "I got it to make a pool." It is possible for things to work for weeks, months, even years before something adverse happens.

Yay you got it to make a pool. That is not *my* measure of success.

But, you'll go and do it anyways, most likely. I'm cynical, not naive. I do understand that getting a proper setup can be challenging. There is no magic button I can press to force a bad hardware choice to magically work flawlessly for you, I'm sorry. :-( If you are intent on running mrsas, I suggest running minimal scrubs because they tend to flood the controller, and I suggest having a second server with the data replicated onto it in case things go sideways.
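
The replication suggestion above could look roughly like the following sketch, assuming ZFS on both machines and SSH access between them. The function name `replication_plan` and the example names `tank`, `backup-nas`, and `backup/tank` are placeholders, not values from this thread. The function only prints the commands for one full recursive pass so they can be reviewed before anything is run.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print (do not execute) the commands for one
# full recursive replication of POOL to TARGET on HOST.
# Usage: replication_plan POOL HOST TARGET SNAPNAME
replication_plan() {
    pool=$1
    host=$2
    target=$3
    snap=$4
    # Take a recursive snapshot of the pool and all child datasets.
    echo "zfs snapshot -r ${pool}@${snap}"
    # Send the whole snapshot tree; "receive -F" rolls the target back
    # first, discarding any changes made on the receiving side.
    echo "zfs send -R ${pool}@${snap} | ssh ${host} zfs receive -F ${target}"
}
```

For example, `replication_plan tank backup-nas backup/tank manual-1` prints the two commands for review. TrueNAS also offers replication tasks in its web UI, which is likely the more maintainable option for a recurring schedule.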
 

josketto

Hi jgreco, your reply is indisputable and I THANK YOU very much for being particularly firm towards me.

The limitation in the choice of hardware that the cloud provider imposes is compensated by the fact that I can have new servers available in 120 seconds and a very powerful private network infrastructure with a good quality / price ratio.

I'm not using TrueNAS in a production environment (yet). I am simply experimenting with various hardware options (and limitations) to evaluate whether your product could be used in production in a VMware distributed datacenter network.

Your answers and the articles you linked to me are just what I need in this evaluation phase!

Before testing the "virtual" TrueNAS solution on another type of server (without a RAID controller), let me ask: is it possible to "crossflash" the firmware of the RAID card I mentioned (LSI 9361-8i)? If so, would that protect me from the risk of an "incident"?
 

jgreco

In theory, the high end LSI RAID controllers use a similar overall design to their low end counterparts, but have different firmware loaded onto them to take advantage of the beefier CPU, additional RAM, and lots of additional complexity that the high end controllers are blessed with.

It is apparently possible to crossflash the LSI 2108 and 2208 (926x, 927x, 928x) into a 2308, though I have not done this myself, because 2308's are cheap and I have no desire to bludgeon more expensive controllers into cheap controllers.

https://mywiredhouse.net/blog/flashing-lsi-2208-firmware-use-hba/

This is, of course, an older generation of controllers than your 9361. The point is simply that it isn't entirely outside the realm of possibility. However, I am not aware of anyone having done this with the 93xx controllers, and there are a lot of possible sharp edges and points that might hurt you along the way. I don't recommend trying, but if it worked, you'd end up with a supported controller.
 

josketto

Hi jgreco, thank you once again for the very clear answer.

I have no intention of becoming a "pioneer" of LSI firmware flashing! :smile:

I'd suggest closing the topic here; the technical discussion seems more than exhaustive to me.

I congratulate you and the entire TrueNAS community for the support and the technical evolution you're providing.

It has been a real pleasure to discuss this with you.

Kind regards,

Francesco

PS: Considering the test results and the hardware limitations of the cloud-managed infrastructure, I will soon evaluate a "classic" solution (other than ZFS) for the virtual NAS configuration.
 