Kernel panic running 11.2 w/ 2620v2 + X9DR3-LN4F+ (or X9DRi) board (Supermicro)?

SMnasMAN

Contributor
Joined
Dec 2, 2018
Messages
177
Hi, is anyone running 11.2 on a Supermicro X9DR3-LN4F+ board or a similar Supermicro X9 board?

I ask because I have tried everything to get 11.2 (or the 11.3 nightlies) to run on my test system, but I keep getting panics during boot-up.
(I've tried literally everything, including but not limited to: removing all hardware and disabling the MB SATA / SCU, different heavily tested QVL memory, and booting from USB or HDD. All exhibit the exact same issue. See my bug link below if you want full details and videos of the incident.)

11.1 on this system has been rock solid under 24/7 heavy stress-testing for right at a month (30 days).

(I'm still learning / playing with FN, so I don't have any real data on it yet.)

I'm mainly looking to see whether others with my board type are having this or other issues (or not) with 11.2+ only.


I do have a very detailed bug report filed:
https://redmine.ixsystems.com/issues/66130

thanks
 

rvassar

Guru
Joined
May 2, 2018
Messages
971
Lots of us are running Supermicro X9's, but probably not as many here running dual socket. That is a very interesting stack trace. Since it's down in ZFS code, I'm left wondering if it's panicking trying to swap. Have you validated your boot pool devices and that I/O chain? Maybe split up the boot pool across different controllers?
 

Jailer

Not strong, but bad
Joined
Sep 12, 2014
Messages
4,974
Have you checked to see if you are on the latest BIOS for your motherboard? @danb35 was having a random reboot issue with his X9 board that a BIOS update solved.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,456
I'm still not clear if it was the BIOS update or physically removing power from the machine--but in any event, my problem does appear to be resolved.
 

SMnasMAN

Hi guys, thanks a lot for the replies. To add a bit more info and answer questions (at bottom):
1- To be clear, I'm only having issues with FN 11.2 (and any 11.3 nightly too). When I'm testing this (at this point), I have physically removed everything I can from the system (LSI 9207s, 10Gbit card, HDDs), so that I'm either installing to just a USB stick or to a single HDD (via the MB SATA port). The panic always comes right towards the end of the boot-up process (usually at the exact same spot, too).

I am running the latest BIOS (v3.3).

I'm not sure why I didn't think of this until yesterday, but I have 2x others of the exact same board that I use for memory testing or as spares (so I have 3x X9DR3-LN4F+ boards).

So I tested 11.2 on another X9DR3 and it works! (I stressed FN 11.2 overnight, no issues.) The only difference in that setup is that I only have 1x CPU installed, and it's a Xeon v1 CPU (E5-2620 v1). I also tested both BIOS 3.2 and 3.3 on this setup and both worked, so I don't think it's BIOS related.

As a reminder, the problem setup has 2x CPUs and both are Xeon v2 CPUs (E5-2620 v2).


So I'm now off to try just 1x CPU (and will also try, out of desperation, 1x v1 CPU and 1x v2 CPU) in the problem setup and will report back.
(This would be the first time I've seen OS issues related to a CPU setup though, so I'm a bit skeptical of either 11.2 or my hardware.)


>>Have you validated you boot pool devices and that I/O chain? Maybe split up the boot pool across different controllers?
A: Not 100% clear on your question, but all the hardware was tested extensively with FN 11.1. As for splitting up the boot pool across different controllers: wouldn't trying to install to a USB stick, and also trying to install to a SATA HDD, cover this? (Or am I wrong?)

>>I'm still not clear if it was the BIOS update or physically removing power from the machine--but in any event, my problem does appear to be resolved.
danb: was this on FN 11.2? (Your reboot issue, that is. And were you on 1x CPU or 2x CPUs?)

thanks
 

SMnasMAN

WTF! So with only 1x CPU (same E5-2620 v2) on the problem setup, 11.2 seemed to work! (even with all my hardware connected)

EDIT: While I had 11.2 up and running fine for ~15 minutes (further than I've ever gotten with 2x CPUs), when I went to shut down FN 11.2, it PANIC'd during the shutdown process (fatal trap 9, but I didn't have time to grab a picture).
(Also, iXsystems has updated my bug report, see towards the bottom of the link. They say: "This seems to be similar to some other panics -- initially reported against NFS, but it looks like ZFS. The NULL pointer and offset of 0x328 are the same." They also name zio->io_cv as the likely source of trouble.)
(end of EDIT)
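
As a side note on why a NULL pointer plus an "offset of 0x328" is a useful clue: when code reads a field through a NULL struct pointer, the faulting address is simply that field's offset within the struct, so the same offset showing up in different panics suggests the same field is being touched. The sketch below only illustrates that arithmetic with Python's ctypes. The `FakeZio` layout is made up (padding plus one pointer-sized field named after `io_cv`); it is not the real ZFS `zio` structure, and placing the field at 0x328 is an assumption for the sake of the example.

```python
import ctypes

FIELD_OFFSET = 0x328  # offset quoted in the bug report comment


class FakeZio(ctypes.Structure):
    """Hypothetical stand-in for a kernel struct, NOT the real zio layout."""
    _fields_ = [
        ("padding", ctypes.c_ubyte * FIELD_OFFSET),  # everything before the field
        ("io_cv", ctypes.c_void_p),                   # field placed at FIELD_OFFSET
    ]


def fault_address(base_pointer, field_offset):
    """Address that gets touched when reading base_pointer->field."""
    return base_pointer + field_offset


offset = FakeZio.io_cv.offset
print(hex(offset))                    # offset of the field inside the struct
print(hex(fault_address(0, offset)))  # NULL base: fault address equals the offset
```

With a valid (non-NULL) base pointer the same read would land somewhere inside a real allocation, which is why a tiny fault address like this usually points to a NULL struct pointer rather than random corruption.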


It boots fine, and I'm able to use 11.2 and run disk IO, no probs. So it is directly an issue of having 2x CPUs installed in this setup, and only when using 11.2 or the 11.3 betas.

I'm going to test more on the other duplicate boards I have (i.e. try 2x E5-2620 v1).

But this is def. progress! So as of now, all else being the same:

2x E5-2620v2 CPUs - FN 11.1 works fine for a month (w/ stress), 11.2 (and 11.3 betas) panics during boot-up.
1x E5-2620v2 CPU - FN 11.2 does not panic at boot (and seems fine).


Is anyone running 11.2 with dual CPUs? (I'm sure yes, but I would like to know their config.)

thanks
 

danb35

>>is anyone running 11.2 with dual cpus? (im sure yes, but would like to know their config).
Yes, and my config is in my .sig--motherboard is an X9DRD-7LN4F-JBOD with 2x E5-2670 CPUs.
 

SMnasMAN

>>Yes, and my config is in my .sig--motherboard is an X9DRD-7LN4F-JBOD with 2x E5-2670 CPUs.
Interesting, but you are NOT running v2 CPUs, correct? (i.e. your 2x CPUs are not E5-2670 v2 CPUs.) Everything I'm seeing so far seems to point to this being an issue only with v2 CPUs (whether 1 or 2 sockets are used) and 11.2 <- my opinion
(and ZFS related, according to iXsystems on the bug report)
thanks
 

rvassar

>>Have you validated you boot pool devices and that I/O chain? Maybe split up the boot pool across different controllers?
>>A: not 100% clear on your question, but all the hardware is tested extensively with FN 11.1 , and split up the boot pool/different controllers- Wouldn't trying to install to a USB stick, and also trying to install to a sata HDD, cover this? (or am i wrong?)

I was thinking there was some kind of failure in the device driver between 11.1 and 11.2... But it looks like you have a CPU issue of some sort, if not an outright CPU failure. If you pull a CPU and it suddenly works, what happens when you swap the CPUs and run the one you pulled as a single?
 

SMnasMAN

rvassar- Sorry, I updated the bug report but forgot to update here. It's an E5-26xx v1 vs v2 CPU issue (the number of CPUs does not matter). Below is a paste from my last update to the bug report (https://redmine.ixsystems.com/issues/66130):

So I have narrowed it down to EXACTLY one thing:

Using a v2 CPU will cause the 11.2 crash.

(I only have a few E5-2620 v2s, so I haven't tested any other v2 CPU models, like an E5-2640 v2 for example.)

I currently have 2x E5-2620 v1 CPUs in the same system that was panicking with 11.2 (with all hardware added back), and it has been running fine for over 3 hours with constant random dd load to 2x pools.

If I use 1x E5-2620 v2 CPU in the same system, it will crash as soon as I do any pool disk IO (same goes for my 2nd MB).

If I use 2x E5-2620 v2 CPUs in the same system, it will crash during boot-up (towards the end, as I outlined in posts above).

So the issue follows the v2 CPUs around (and is isolated to them).
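
The elimination above can also be checked mechanically. The sketch below is only an illustration: the rows are the three 11.2 results listed in this post, and `factors_that_explain` is a made-up helper that asks which single factor, on its own, separates the panicking runs from the clean ones.

```python
# Results as reported in this post (FreeNAS 11.2 only).
# Each row: (cpu_generation, cpu_count, panicked)
observations = [
    ("v1", 2, False),  # 2x E5-2620 v1: ran 3+ hours under dd load
    ("v2", 1, True),   # 1x E5-2620 v2: panics on first pool disk IO
    ("v2", 2, True),   # 2x E5-2620 v2: panics near the end of boot
]


def factors_that_explain(rows):
    """Return the factor names whose value alone predicts panic vs. no panic."""
    names = ("cpu_generation", "cpu_count")
    explaining = []
    for idx, name in enumerate(names):
        outcome_by_value = {}
        consistent = True
        for row in rows:
            value, panicked = row[idx], row[-1]
            # The first time a value is seen, remember its outcome;
            # any later disagreement rules this factor out.
            if outcome_by_value.setdefault(value, panicked) != panicked:
                consistent = False
                break
        # A factor only "explains" the result if it is consistent AND
        # at least one value panics while another does not.
        if consistent and len(set(outcome_by_value.values())) == 2:
            explaining.append(name)
    return explaining


print(factors_that_explain(observations))
```

CPU count drops out because 2x CPUs appears in both a clean run and a panicking run, which leaves CPU generation as the only factor that lines up with the panics. Three rows is a small sample, so this narrows the search rather than proving the cause.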
thanks
 

rvassar


Very nice bit of troubleshooting! Out of curiosity, did the actual kernel dump wind up in iXsystems' hands? I know you printed the backtrace, but sometimes they need the whole pile of rubble.
 

SMnasMAN

Thanks. I have not been able to get a kdump, but that's a good point, and you might be able to help me with it a bit, so pls correct me if I'm wrong below:

As I could rarely get 11.2 (w/ 2x v2 CPUs) to even fully boot, I had no way of getting/extracting a kdump, right? (since it's written to a volatile location that is wiped on every reboot, as it's essentially a RAM disk)

But now that I can get 11.2 to boot for a few minutes (and then crash, when using just 1x v2), can I set crashdump to store its kdump files in a location I specify and then let it crash as usual in a few minutes? (i.e. I'll set it to a persistent location, so I can grab the dump and send it to iXsystems.)

(And btw, it was just a random occurrence that I was able to get that single backtrace. Every other time it panic'd it would reboot within 10 seconds, without allowing any input or showing the "db>" prompt. That one time was just luck.)

thanks
(I'm really good with Linux and Windows, but FN is my first run with FreeBSD! Loving FN tho.)

EDIT btw: I've now had the full system that wouldn't run 11.2 (when it had any number of v2 CPUs installed)
running 11.2 for over 18 hours under heavy disk IO + network stressing, no problems. (Of course, that is with 2x v1 CPUs (2x E5-2620 v1), further supporting that the issue is related directly to use of a v2 CPU.)

EDIT2: possibly a related issue with 11.2 (issues during boot) on the same SM X9 MB + v2 CPUs:
https://forums.freenas.org/index.php?threads/11-2-release-u1-wont-install.72644/
 

rvassar

>>(im really good with linux and win, but FN is my first run with freebsd! loving fn tho)

I'm afraid the last time I could claim any significant competence with BSD was SunOS 4.1.4 circa 1993... I've been in the Solaris 2.x+ and Linux worlds since then, and only checked in on the BSD's from time to time to see what they were up to. With FreeNAS it is actually kind of refreshing to get back to my youth, but it also leaves me at a technical disadvantage on details like this.

I did look at the FreeNAS dump config and there's a comment that they handle it in middleware somehow, and by default /etc/rc.conf has 'savecore_enable="NO"', but your crash is happening early enough that the middleware likely isn't even running yet.... So I'm not quite sure how to proceed. In FreeBSD I'd enable that and make sure it pointed somewhere accessible, and see what gets caught.
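
For anyone wanting to check this on their own box, here is a minimal sketch that scans rc.conf-style text for the two stock FreeBSD variables involved in crash dumps, `dumpdev` and `savecore_enable`. It only reads text: the sample fragment is hypothetical, the helper names are made up, and since FreeNAS handles dumps from its middleware, the real file may well be regenerated or overridden.

```python
import shlex


def parse_rc_conf(text):
    """Parse NAME="value" lines from rc.conf-style text into a dict."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, raw = line.partition("=")
        # shlex strips the quotes and drops any trailing "# comment"
        tokens = shlex.split(raw, comments=True)
        settings[name.strip()] = tokens[0] if tokens else ""
    return settings


def crash_dump_notes(settings):
    """Flag crash-dump variables that are missing or explicitly set to NO."""
    notes = []
    for name in ("dumpdev", "savecore_enable"):
        value = settings.get(name)
        if value is None:
            notes.append(f"{name}: not set here, the system default applies")
        elif value.upper() == "NO":
            notes.append(f"{name}: explicitly disabled")
    return notes


SAMPLE = """
# hypothetical fragment, not copied from a real FreeNAS install
dumpdev="AUTO"
dumpdir="/var/crash"
savecore_enable="NO"   # the value mentioned in the post above
"""

for note in crash_dump_notes(parse_rc_conf(SAMPLE)):
    print(note)
```

On a live system you would feed it the contents of /etc/rc.conf instead of `SAMPLE`. It deliberately does not guess at defaults, so a missing variable is reported as such rather than treated as off.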

Maybe one of the FreeNAS developers could offer a clue here?
 

SMnasMAN

Well, still no good news with v2 CPUs and FN 11.2. In my bug post, iXsystems support suggested I try BIOS 3.2 instead of the latest 3.3 I was running (as there were CPU microcode updates + some Spectre vulnerability mitigations that Supermicro applied in BIOS 3.3).
( https://redmine.ixsystems.com/issue...on=54&next_issue_id=66096&prev_issue_id=66144 )

However, the result is exactly the same (same panic, same spot during boot w/ 11.2 when using 2x v2 CPUs on BIOS 3.2).
Just like with BIOS 3.3, FN 11.1 has no issues with my v2 CPUs running BIOS 3.2.

Anyone else have any ideas? Or is there anyone else running FN 11.2 with Intel v2 CPUs (ideally on Supermicro X9 hardware)?

thanks
 

SMnasMAN

So wow, I have an interesting update to this thread. To recap: ONLY with 11.2, not 11.1, I kept getting kernel panics during boot-up or about 30s after boot-up, as soon as any ZFS disk IO occurred. This was on the X9DR3 setup I describe above (and I filed a bug report with iXsystems, as you can see linked above).

Around the same time (about 4 months ago), I had also bought a separate system (different source) with an X9DRi-LN4F+ board, 2x 2640 v2 CPUs and 16GB of RAM from the SM QVL. That system had no issues with 11.2, and I have been running 11.2 on it for months without a single panic or problem of any kind.

(I have also been running the X9DR3 setup for months, but with 2x 2620 v1 CPUs, on 11.2, also without a single panic or problem of any kind.)
(The end assumption by me/iXsystems was that there is some incompatibility specifically between the X9DR3 board and v2 CPUs with FN 11.2.)

Today I wanted to move some parts around, as I'm getting closer to my final/"production" FreeNAS system.

So I put the 2x 2620 v2 into the X9DRi board (which should work fine) and installed the latest 11.2-U2.1 to a USB stick. MUCH to my surprise, the exact same panic started happening again!! So this means the 11.2 issue is actually an issue with both of these X9 boards AND specifically the E5-2620 v2 CPUs (it does not occur with 2640 v2 CPUs!).

Twice, I tried just 1x 2620 v2 (swapping which one was installed), as well as my other set of memory (both my sets of memory are ECC and on the SM QVL for both boards). In all cases the panics occur! (Still no problems on 11.1, and still no problems with the 2640 v2.)

Given all this, I would maybe think there is something wrong with the 2x 2620 v2 CPUs I have, but I doubt it, as I stress tested them under many different OSs and did not see a single issue (this was months ago, before I even first installed FN). Additionally, the same CPUs have run 11.1 for months with no problems.

The other poster a few replies up, who is not having problems with 11.2 on a similar X9 board, is NOT using a 2620 v2 (he is using a 2670 v1).

I'm not sure if I should bother buying another 2620 v2 or buy a different v2 CPU (I want the lowest power possible, which is why I was going for the 2620 v2, which is 6 core/80W, vs the 2640 v2, which is 8 core/95W).

Has anyone else run 11.2 on 26xx v2 CPUs? (If so, pls post your MB type.) Also, I'm going to update my iXsystems bug report with this info.
thanks
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
>>Anyone else have any ideas? or is there anyone else running 11.2 FN with intel v2 CPUs (ideally on supermicro x9 hardware).
I have my test system running 11.2, but it is not used much. I have not reviewed all the posts yet. Is this something that only happens under certain circumstances?
I am sure there's a lot of people running X9 single CPU systems that are not having trouble. Is there any other model v2 processor that has been tested?
 

pro lamer

Guru
Joined
Feb 16, 2018
Messages
626

SMnasMAN

>>Are they both SR1AN sSpec? (A wild guess)

Yes, they are both SR1AN rev. (Partially related FYI: I never use ES CPUs.) But I think I may have found the issue AFTER ALL OF THIS!

I think just ONE of the v2 CPUs I have is somehow partially bad, but it ONLY manifests itself in FN 11.2. It does not show up in 11.1, nor in any of the several other OSs I have run on this CPU and stressed SUPER extensively. And I mean NOT even one hiccup nor anything unusual during my 1-2 months of stressing the system + 2 months of FN 11.1 running with no issues.

About 3 weeks ago I got some more v2 CPUs in to test this issue further. After several hours of swapping CPUs (testing both single and dual CPU configs) among 2x systems (both X9 dual-CPU MBs), it does seem that the issue follows just 1 single 2620 v2 CPU specifically. Somehow that 1 CPU must be bad in a minor way that causes this.

I have a new 2620 v2 on the way to 100% confirm this, but over about 2 weeks of testing 2x setups with 11.2, I have not seen any issues:

setup 1 (11.2 U2.1)- 2x 2640v2, X9DR3-LN4F+, 128GB QVL ECC

setup 2 (11.2 U2.1)- 1x 2620v2 (one from that original pair of CPUs that started all this), X9DRi-LN4F+, 64GB QVL ECC.

On both setups, using that 1x "bad" 2620 v2 CPU in a single-CPU setup, both would panic within 30s of FN 11.2 completing its boot-up (it was always when disk IO of any kind started).
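
The swap testing above boils down to simple set logic: a CPU is a suspect if it was present in every run that panicked and in no run that stayed up. A rough sketch, with made-up labels for the individual chips (the thread never names them) and runs that mirror the results described in this post:

```python
def suspects(runs):
    """runs: list of (set_of_cpu_labels, panicked). Returns the suspect CPUs."""
    in_every_panic = None
    cleared = set()
    for cpus, panicked in runs:
        if panicked:
            # Keep only CPUs that were installed in ALL panicking runs.
            in_every_panic = set(cpus) if in_every_panic is None else in_every_panic & cpus
        else:
            # Any CPU that took part in a clean run is cleared.
            cleared |= cpus
    return (in_every_panic or set()) - cleared


# Hypothetical labels: A and B are the original pair of 2620 v2s,
# C and D are the pair of 2640 v2s.
runs = [
    ({"2620v2_A", "2620v2_B"}, True),   # original pair together: panic
    ({"2620v2_A"}, True),               # the "bad" one alone: panic
    ({"2620v2_B"}, False),              # the other one alone: clean
    ({"2640v2_C", "2640v2_D"}, False),  # 2x 2640 v2: clean
]

print(sorted(suspects(runs)))
```

This only works if the fault shows up reliably in every run that contains the bad part. An intermittent fault would need repeated runs per configuration before clearing anything.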

This will be one WEIRD issue, as I've never come across a bad CPU that only acts up with one specific version of one specific OS, but I guess that is the case! (I will know for sure in the next week and will update here.) I've built or managed well over 200 servers, and bad CPUs alone are rare enough, but this is really rare/unusual (or maybe I'm just not at the scale/server count to see frequent CPU issues).
 
Joined
Apr 20, 2019
Messages
1
I had to laugh when I found your bug report (thanks for linking to this thread inside of it). I've been getting kernel panics on 11.2 as well. I wanted to create a bug report, but for some reason I can't. Instead, I searched through the bug history for "fatal trap 12." I read through a couple with nothing interesting to go on. Before giving up and closing the tab, I happened to see the bug you reported. Guess what? I have the exact same motherboard (X9DR3-LN4F+) and the exact same CPUs (Xeon 2620 v2s). Small world!

Like you, I ran through the gamut. I had everything in a 2U 216 Supermicro chassis with the SAS2 backplane. I moved everything over to the same chassis with a SAS3 backplane, and I immediately started having the same symptoms as you! I updated to the latest firmware on the 9300-8i I had installed. I also tried using the onboard controller like you. What I'm seeing:

If I have a single SAS2 drive plugged into the backplane, there are no issues. If I plug in any type of SATA drive (SSD or HDD) and try to import or create a new pool on it, I get the kernel panics (mostly 12 but sometimes 9). Interestingly enough, while trying to install FN on different drives, I came across a kernel panic when plugging the BMC virtual CD drive in and out. I was able to repeat this a couple of times, although it's not always repeatable the first time.

I wonder if I somehow have a bad CPU like you, or if there's something in the firmware causing this. I saw you mentioned that you tried the 3.2 BIOS with same results, so I won't try that. However, I'd like to compare an older BIOS with older microcode and see if I can extract that and add it to the 3.3 BIOS to see if this resolves the issue. I want to keep the 3.3 BIOS since I have the AOC-SAS3 card with two NVMe drives on it and I believe only the latest BIOS had the bifurcation option in it.

If you happen to want to exchange more info or diagnose, could be interesting. Seems like maybe you've moved on from this. Either way, I want to give you a big shout out for providing all the info you did! I've been throwing money at this issue, and it's great to see I'm not alone in my battle.
 