FreeNAS crashes at boot when an NVDIMM is populated

nephri

Dabbler
Joined
Sep 20, 2015
Messages
40
Hi,

I have a storage server built as follows:
- FreeNAS 11.2-U5
- Supermicro X10DRH-CT (latest BIOS)
- 2x E5-2643 CPUs (12 cores/24 threads total @ 3.4 GHz)
- 64 GB DDR4 ECC RAM
- 16 GB DDR4 NVDIMM
- A few HBAs and disks

In the BIOS, I set these options:
- Enable ADR [Enabled]
- Erase-Arm NVDIMMs [Disabled]
- Restore NVDIMMs [Enabled]
- ACPI Shutdown Trigger ADR [Enabled]

In FreeNAS (before installing the NVDIMM), I enabled two loader tunables:
- nvdimm_load set to YES
- ioat set to YES
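
For anyone following along, those tunables should correspond to /boot/loader.conf entries along these lines. This is only a sketch: ioat_load is an assumed name based on FreeBSD's usual <module>_load convention, and on FreeNAS the values would normally be entered as Loader-type tunables in the GUI rather than by editing the file by hand.

```shell
# /boot/loader.conf fragment (sketch; ioat_load is an assumed tunable name)
nvdimm_load="YES"   # load nvdimm.ko at boot
ioat_load="YES"     # load ioat.ko at boot
```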

kldstat showed these modules loaded successfully:
Code:
> kldstat
  Id Refs Address            Size     Name
 1   73 0xffffffff80200000 25608a8  kernel
 ...
 6    1 0xffffffff8287f000 cee8     nvdimm.ko
 8    2 0xffffffff82892000 11188    ioat.ko
 ...


With the NVDIMM installed, FreeNAS starts to boot and then crashes with the following error:

crash_freenas_nvdimm.jpg


Does anyone have any idea what to do?

Best Regards,
Sébastien.
 
dlavigne

Guest
If you decide to report this at bugs.ixsystems.com, post the issue number here.
 

Skud

Dabbler
Joined
Mar 26, 2013
Messages
15
Did you ever resolve this issue? I'm experiencing the exact same problem with very similar hardware on 11.3-RC2. I haven't tried other versions yet.

Thanks!!
Riley
 

nephri

Hi, I haven't resolved this issue yet, and my NVDIMMs are sitting on the table...

I still need to boot the box from a Linux live USB key to see what happens (I haven't done it yet).
 

Skud

Thanks for the reply. I'm having some strange issues with my setup.

Originally, I had an X10SRH board, which I found out Supermicro does NOT support for NVDIMMs. However, they did function perfectly fine under Linux (Ubuntu Live 19.10) and I could interact with the /dev/pmemX devices. The only thing that didn't work was the save/restore. The options weren't available in the BIOS, and Supermicro support told me that the single-CPU board physically didn't have the hardware for that.

I picked up an X10DRH-CT and I'm using the same CPU as before (single CPU). I have 16GB RDIMMs in A1 and B1 and 1 x Micron 16GB NVDIMM in C1. I can see all the NVDIMM options in the BIOS and the save/restore appears to be working correctly. The BIOS waits for restore on boot-up and I see the LEDs on the DIMM for save/restore on boot-up and shutdown.

However, I can't get anything to show up in an OS. Linux shows that there is a 16GB persistent memory region (Type 7), but ndctl and ipmctl don't detect anything. Also, FreeNAS hangs when I try to load the nvdimm or ioat modules. If I add them to the loader, it panics during boot.
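
When narrowing this down on Linux, it helps to check each layer separately: whether the firmware memory map reports a persistent range, whether the kernel found an ACPI NFIT table, and whether any device nodes exist. A rough sketch, assuming a Linux live environment; check_pmem_log and count_pmem_nodes are made-up helper names, and the first one reads the kernel log from stdin so it also works on saved dmesg output:

```shell
#!/bin/sh
# Sketch: summarize what a Linux kernel log says about persistent memory.
# check_pmem_log and count_pmem_nodes are hypothetical helper names.

check_pmem_log() {
    log=$(cat)
    # Firmware memory map: Linux prints e820 type 7 ranges as "persistent (type 7)".
    e820=$(printf '%s\n' "$log" | grep -c 'persistent (type 7)' || true)
    # ACPI: the nfit driver can only enumerate DIMMs if an NFIT table exists.
    nfit=$(printf '%s\n' "$log" | grep -c 'ACPI: NFIT' || true)
    echo "e820_pmem_ranges=$e820 nfit_tables=$nfit"
}

# Device nodes: count /dev/pmem* and /dev/nmem* entries (0 if none exist).
count_pmem_nodes() {
    ls /dev/pmem* /dev/nmem* 2>/dev/null | wc -l | tr -d ' '
}
```

It would be run as `dmesg | check_pmem_log`, followed by `count_pmem_nodes`. If the log shows both a persistent range and an NFIT table but there are no device nodes, `ndctl list -B -D -R` is the next thing to look at.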

Still doing some digging....

Riley
 

mav@

iXsystems
Joined
Sep 29, 2011
Messages
1,428
The "Unrecoverable machine check exception" shown probably means that the NVDIMM was previously used in another system and does not contain valid ECC data for this one (this may even depend on the specific memory topology within the system). Any attempt to access it therefore ends in an exception and a subsequent panic, since it is effectively a fatal memory error. What you need to do is forcibly wipe the NVDIMM so that it starts from scratch. How to do that depends on the motherboard. Some have BIOS options for exactly that. Sometimes you may be able to disable saving on S5 and then shut the system down cleanly from the OS. As the most barbaric option, it may help to disconnect the battery backup unit from the NVDIMM module on the running system and then pull power from the system, so that the NVDIMM cannot back up its contents to flash and comes up clean on the next start.
 

nephri

I'll try this as soon as possible.
Thanks for your reply.
 

Skud

The Linux test will be the kicker.... FYI - the latest Ubuntu 19.10 desktop live works just fine with NVDIMM support.

I'm thinking that these boards (X10DRH) have broken NFIT entries in the ACPI table. This is just a wild-ass guess though. Under Linux neither the nfit nor nvdimm modules load properly and the NFIT registers don't look "right".

The NFIT string on the "working" system is:

ACPI: NFIT 0x00000000798CC498 000198 (v01 SUPERM GRANTLEY 01072009 AMI 00000000)

I see the nfit and libnvdimm modules loaded properly:
Code:
> lsmod | grep nfit
nfit        65536   2
libnvdimm   188416  5 dax_pmem_core,nd_btt,nd_pmem,dax_pmem_compat,nfit

...and I have nmem and pmem devices in /dev

The NFIT string on the non-working system (X10DRH) is:

ACPI: NFIT 0x0000000079B21E70 000028 (v01 ALASKA A M I 00000001 ? 00000001)

That entry looks "stock" like it's never been customized from the base AMI BIOS image they built off of. I'm not a BIOS engineer by any means and this is just a wild-ass guess though.

The nfit and nvdimm modules do not load on this system. lsmod | grep nfit shows nothing.

...and there are no nmem or pmem devices created.

I have a case open with SM support. So, we'll see what happens.
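
One detail in the two NFIT lines above fits the guess: the hex number right after the table address is the table length. An NFIT table begins with a 40-byte header (the standard 36-byte ACPI table header plus 4 reserved bytes), so a length of 0x28 means the table holds the header and nothing else, i.e. no address-range or NVDIMM region structures for the driver to enumerate. A small sketch to pull that out of a log line (nfit_len_bytes and nfit_status are made-up names for illustration):

```shell
#!/bin/sh
# Sketch: read the table length out of an "ACPI: NFIT ..." kernel log line.
# nfit_len_bytes and nfit_status are hypothetical helper names.

NFIT_HEADER_BYTES=40   # 36-byte ACPI table header + 4 reserved bytes

nfit_len_bytes() {
    # The length is the hex field right after the table address.
    hex=$(printf '%s\n' "$1" |
        sed -n 's/.*ACPI: NFIT 0x[0-9A-Fa-f]* \([0-9A-Fa-f]*\) .*/\1/p')
    [ -n "$hex" ] || return 1
    echo $((0x$hex))
}

nfit_status() {
    len=$(nfit_len_bytes "$1") || { echo "no NFIT line found"; return 1; }
    if [ "$len" -le "$NFIT_HEADER_BYTES" ]; then
        echo "header only ($len bytes): no NVDIMM structures described"
    else
        echo "populated ($len bytes)"
    fi
}

nfit_status "ACPI: NFIT 0x00000000798CC498 000198 (v01 SUPERM GRANTLEY 01072009 AMI 00000000)"
nfit_status "ACPI: NFIT 0x0000000079B21E70 000028 (v01 ALASKA A M I 00000001 ? 00000001)"
```

On the two lines from this thread, that gives 408 bytes for the working board and 40 bytes (header only) for the X10DRH.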
 

upgrayedd

Cadet
Joined
May 14, 2020
Messages
1
Skud said:
The nfit and nvdimm modules do not load on this system. lsmod | grep nfit shows nothing. ...and there are no nmem or pmem devices created. I have a case open with SM support. So, we'll see what happens.

Same symptom on X10DRi, no nmem or pmem devices.

Did SMC get back to you?
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,702
I see that support for NVDIMMs arrives with TrueNAS 12 (maybe in Core, maybe not).

You might want to try building a boot drive for the nightly train and see what happens.
 

Skud

upgrayedd said:
Same symptom on X10DRi, no nmem or pmem devices. Did SMC get back to you?

Yes, I went back and forth with SM support for a few weeks, and basically they told me that since the X10 boards don't have "official" support for JEDEC NVDIMMs (mine are JEDEC), there is nothing they could do. If I wanted to use NVDIMMs with an X10 board, I would need to use "Legacy" DDR4 NVDIMMs. I wasn't too thrilled, because I bought that board specifically because they said it would work. Here is how it went:

X10SRL - JEDEC NVDIMMs worked perfectly in the OS and I could use them as PMEM devices. However, the ADR/save function didn't work on power-off and the options were missing in the BIOS. SM support told me that while they may "work" in the OS, the single-CPU X10 boards don't have the required hardware to trigger the ADR function on the DIMMs. I was told only the DP boards had that. I asked if the X10DRH worked and they said yes. So...

X10DRH - NVDIMMs didn't work at all. They would show as regular memory but I could never get them to work as PMEM devices. All the BIOS options were present, and there is a bunch of literature out there from JEDEC/Micron/Smart saying that their JEDEC NVDIMMs work with this *exact* board (they used it for demoing their NVDIMMs), but SM support says they don't support JEDEC DDR4 NVDIMMs, only legacy. Only the X11 boards support JEDEC NVDIMMs. I suspect that the companies working on this have a special BIOS with the requisite support. So...

X11SPL - Everything works fine. I had some issues with the ADR function: specifically, the "Erase-ARM NVDIMMs" option in the BIOS has a really poor description. Everything you read says that enabling the "Erase-ARM" option will DISable the NVDIMM's ability to erase-arm-save data on power-off. In reality, this option needs to be ENabled for the save function to work.

Another thing to watch out for is the pull-up resistor on the SAVE pin. There are two types of NVDIMMs out there: ones with the pull-up resistor and ones without.

For Micron NVDIMMs, there is either a P or an X in the part number. The X has the resistor on the DIMM and the P does not. For me, the X parts did not work.
 