nvme0 Missing Interrupt after ESXi and BIOS update

SiCwan

Dabbler
Joined
Dec 4, 2018
Messages
14
Hello! I had been running the setup below on 11.2 U2 until about two weeks ago, when a power outage lasted long enough for my UPS to drain and shut down the NAS. When I booted it back up, everything seemed to work except that I had no network. I spent about a week troubleshooting and finally decided to update the BIOS from 2.0a to 2.0c and ESXi from 6.7 U1 to U3. That didn't help with the network connection issue. When I took the ESXi host out of maintenance mode, I found that FreeNAS wasn't booting. I looked into that and found that one of the BIOS settings had reverted. After correcting that and assigning the AOC-SLG3-2M2 to passthrough in ESXi, I'm stuck with this error when trying to boot FreeNAS:

Mod note: removed external image links
- Ericloewe


Running FreeNAS 11.2U2 inside ESXi 6.7
  • SuperMicro X10SDV-TLN4F-O
  • Intel® Xeon® Processor D-1541
  • 64GB
  • Boot with ESXi 6.7: Samsung - 970 EVO 500GB Internal PCI Express 3.0 x4 (NVMe)
    ZFS default settings for 4 drives: 4 x WD Red Pro 6TB NAS Internal Hard Drive - 7200 RPM Class, SATA 6 Gb/s, 256 MB Cache, 3.5" - WD6003FFBX
    SLOG: Intel Optane SSD 800P Series (60GB, M.2 80mm PCIe 3.0, 3D XPoint) - SSDPEK1W060GAXT attached via AOC-SLG3-2M2
  • Supermicro AOC-SLG3-2M2 with PCI passthrough to FreeNAS
    Onboard Intel Lynx controller (ZFS drives) with PCI passthrough to FreeNAS
    Onboard M.2 M-Key (Boot drive)
  • Dual LAN with Intel® Ethernet Controller I350-AM2
    Dual 10GbE
    Dual LAN with SoC
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,681
Virtualization is a house of cards and if any of a hundred different things isn't exactly right, things won't work. That's why we advise against virtualization.

Try detaching the SLOG device from within ESXi and see what happens.

Also I'm not down with pulling screen caps from random back-alley image sharing sites, as you never know what exploits might be getting embedded with them. You might want to upload your screen caps directly to the forum.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,110

SiCwan

Dabbler
Joined
Dec 4, 2018
Messages
14

JJT211

Patron
Joined
Jul 4, 2014
Messages
323
adding the 2nd thing worked, Thanks!

I'm having this exact same issue with very similar hardware. What exactly did you add to passthru.map? I'm also using the 800P. Did you just add "Intel Optane 800p"? And what are the numbers the OP added after that?
 

SiCwan

Dabbler
Joined
Dec 4, 2018
Messages
14
Log in to the shell console of your ESXi host and edit the .vmx config file for your FreeNAS VM. Assuming the Optane is the first passthru device you added (index 0), add
pciPassthru0.msiEnabled = "FALSE"
You may have to change the 0 to a different number if it's not your first passthrough device.
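For anyone unsure which number to use, here is a minimal Python sketch of the same edit, working on the text of a .vmx file. The helper names and the sample contents are made up for illustration; the only thing taken from this thread is the pciPassthruN.msiEnabled = "FALSE" line. It lists the passthrough slot numbers it finds and then sets the option for one of them. It is probably safest to make the change with the VM powered off and to keep a backup copy of the file.

```python
import re

def passthru_indices(vmx_text):
    """Return the sorted slot numbers N found in pciPassthruN.* keys."""
    found = {
        int(m.group(1))
        for m in re.finditer(r"^[ \t]*pciPassthru(\d+)\.", vmx_text, re.MULTILINE)
    }
    return sorted(found)

def set_vmx_option(vmx_text, key, value):
    """Set key = "value": replace an existing line for key, or append a new one."""
    new_line = f'{key} = "{value}"'
    pattern = re.compile(r"^[ \t]*" + re.escape(key) + r"[ \t]*=.*$", re.MULTILINE)
    if pattern.search(vmx_text):
        # A function replacement avoids backslash handling in the new text.
        return pattern.sub(lambda _m: new_line, vmx_text)
    if vmx_text and not vmx_text.endswith("\n"):
        vmx_text += "\n"
    return vmx_text + new_line + "\n"

# Hypothetical .vmx excerpt with two passthrough devices.
sample_vmx = (
    'pciPassthru0.present = "TRUE"\n'
    'pciPassthru1.present = "TRUE"\n'
)

print(passthru_indices(sample_vmx))
updated_vmx = set_vmx_option(sample_vmx, "pciPassthru1.msiEnabled", "FALSE")
print(updated_vmx)
```

Running set_vmx_option a second time with the same arguments leaves the text unchanged, so the line does not get duplicated if the fix is applied twice.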
 

leadphalanx

Cadet
Joined
Jan 24, 2020
Messages
4
I'm also experiencing a similar issue with a 16GB Optane M10 as a SLOG device. I've applied both fixes. With a reboot of the host, everything is happy until you run a minute or two of reads/writes, then the missing interrupt errors show up. May have to find an alternate SLOG.
 

leadphalanx

Cadet
Joined
Jan 24, 2020
Messages
4
Hopefully I'm not jinxing myself, but I may have sorted it out. In addition to the two steps above, I also added this line to FreeNAS's loader.conf:
Code:
hw.nvme.force_intx=1


For the moment, things are stable under load, whereas before it would crash and burn after 30 seconds of heavy writes.
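For context, hw.nvme.force_intx=1 is the loader tunable that tells the FreeBSD nvme driver to use legacy INTx interrupts. Below is a rough Python sketch for checking whether a loader.conf-style file actually sets it. The function name and the sample contents are hypothetical, and the parser is deliberately simple (it treats every # as the start of a comment, which would break on values that contain one).

```python
def read_tunable(conf_text, name):
    """Return the last value assigned to name in loader.conf-style text, or None."""
    value = None
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding spaces
        if "=" not in line:
            continue
        key, _, val = line.partition("=")
        if key.strip() == name:
            value = val.strip().strip('"')   # accept both 1 and "1"
    return value

# Hypothetical loader.conf contents.
sample_conf = 'autoboot_delay="5"\nhw.nvme.force_intx=1\n'

print(read_tunable(sample_conf, "hw.nvme.force_intx"))
```

On FreeNAS it is probably better to add the tunable through System > Tunables (type "loader") rather than editing loader.conf by hand, since hand edits may not survive an update. After a reboot, running kenv hw.nvme.force_intx from the FreeNAS shell should show whether the loader picked the value up.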
 

leadphalanx

Cadet
Joined
Jan 24, 2020
Messages
4
(am I just not seeing the edit button?)

It appears I was a bit hasty with my earlier post. Things are certainly more stable: I can write several hundred GB of data without any issue, but I'm still getting an 'nvme0 missing interrupt' error popping up and stopping things, usually when a write runs for a couple of minutes. Temps are looking fine across the board, and I'm not running into memory issues (32 GB dedicated to FreeNAS). Unfortunately, most threads and bug reports deal with missing interrupt issues at boot, so I'm at a bit of a dead end.

I'm beginning to think the Optane M10 may just not work for this configuration (FreeNAS 11.3 on ESXi 6.7).
 

RegularJoe

Patron
Joined
Aug 19, 2013
Messages
330

Did you ever get your pass thru working?

1) First of all, most pass thru is tested on enterprise kit, not a small embedded SoC motherboard with just one slot. Do you have a test server you can put all your hardware in, to see whether a larger server works right?
2) Next, when doing pass thru, pass through all devices plugged into the slot. You might never get bifurcated stuff to work, so try a single NVMe disk and boot VMware from a USB drive. I have never tested pass thru with bifurcation, as I have older kit and older VMware that has been working for a long time: ESXi 5.5 and Xeon 55xx with both Dell and HP enterprise 2U servers.
3) Next, you might be cursed by passing chipset devices thru to FreeNAS. I have always had perfect success passing thru NON-chipset devices that plug into a slot, so you may have to avoid using your onboard SATA or NVMe for your FreeNAS VM. If your NVMe is not directly wired to the CPU, it may also be part of the fenced SoC devices behind a PCIe bridge of sorts.

The pass thru I have used would be Fibre Channel cards, dual LSI SAS HBAs and quad SAS HBAs, both PCIe 2.0 and PCIe 3.0 devices. That is a pretty skinny list of hardware you can get pass thru to work on. The pass thru I wish I could use would be a video card and a PCIe USB 3.0 card, to have a virtual workstation running on FreeNAS storage via bhyve, but this is not a reality yet.

Thanks,
Joe
 

richardm1

Dabbler
Joined
Oct 31, 2013
Messages
19
Code:
pciPassthru0.msiEnabled = "FALSE"

Try this for the passthru device in the vmx file. It seems to be working with my Optane 800p.
 

leadphalanx

Cadet
Joined
Jan 24, 2020
Messages
4
Revisited this issue now that I've gotten around to updating to TrueNAS Core, and figured I'd add more info for anyone who runs into this thread in the future. In my particular configuration (FreeNAS 11 virtualized on ESXi, passing through an Optane SLOG), the device crashes under load even with all three NVMe fixes applied. My research turned up one other person in a similar situation, who was able to replicate the issue on FreeBSD 11 and found that it was fixed in FreeBSD 12.

Now that I've updated to FreeNAS 12/TrueNAS I've gone through setting up my Optane SLOG again. So far I have not encountered any crashes under load.

The two fixes in this thread are still required for the Optane to work at all: https://redmine.ixsystems.com/issues/26508#note-68

I also have this line added in TrueNAS's loader.conf (but I'm not sure if this is still necessary):
hw.nvme.force_intx=1
 